Abstract
Synchronous shared memory (SSM) architectures are promising candidates
for future CMP architectures due to their ability to execute general purpose
parallel code efficiently down to the finest granularity and their support for
easy-to-use parallel programming models. While the recent SSM architectures
are tuned for fast execution of parallel workloads and co-exploitation of ILP
and TLP, the solutions used in them do not support efficient execution of
low-TLP code fragments. More generally speaking, this inability of
architectures optimized for parallel execution to efficiently execute
sequential code has been shown to be one of the design bottlenecks in the
theory of architectures.
In this presentation we propose a SW/HW approach for dynamically optimizing
the performance of recent SSM architectures in low-TLP situations. The HW part
includes changes to the processor pipeline and instruction set as well as a
new technique called bunching that combines execution slots of multiple
threads into a single bunch executing a single thread with a speedup
proportional to the number of threads. The SW part includes a language-level
mechanism to support seamless bunching concurrently with parallel execution.
A preliminary evaluation of the approach is given.
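The proportional speedup claimed for bunching can be illustrated with a toy model. The sketch below is not the authors' implementation. It assumes a machine that executes in synchronous steps, where every execution slot issues one instruction of the thread that owns it, and it ignores data dependencies and memory latency inside a bunch. The names `run_steps`, `slot_owner` and `work` are hypothetical and chosen only for this illustration.

```python
def run_steps(slot_owner, work):
    """Simulate synchronous steps until every thread has finished.

    slot_owner: list mapping each execution slot to a thread id
                (None marks an idle slot).
    work:       dict mapping thread id -> number of instructions to run.
    Returns a dict mapping thread id -> step in which the thread finished.

    Assumption (toy model): each slot issues one instruction of its
    owner per step, so a thread owning k slots retires k instructions
    per step.
    """
    remaining = dict(work)
    missing = [t for t, n in remaining.items() if n > 0 and t not in slot_owner]
    if missing:
        raise ValueError(f"threads without an execution slot: {missing}")

    # Threads with no work are finished before the first step.
    finished = {t: 0 for t, n in remaining.items() if n <= 0}
    step = 0
    while len(finished) < len(remaining):
        step += 1
        for owner in slot_owner:
            if remaining.get(owner, 0) > 0:
                remaining[owner] -= 1
                if remaining[owner] == 0:
                    finished[owner] = step
    return finished


# Low-TLP situation on 8 slots: one long sequential thread, four short
# parallel threads, and three slots that would otherwise sit idle.
work = {"seq": 120, "p1": 30, "p2": 30, "p3": 30, "p4": 30}

# Without bunching, the sequential thread owns one slot.
plain = run_steps(["seq", "p1", "p2", "p3", "p4", None, None, None], work)

# With a bunch of four slots, the sequential thread owns four slots.
bunched = run_steps(["seq"] * 4 + ["p1", "p2", "p3", "p4"], work)

print(plain["seq"], bunched["seq"])  # prints: 120 30
```

In this model the sequential thread finishes four times sooner with a bunch of four slots, while the parallel threads keep running in the remaining slots at their usual rate. Real hardware would reach this bound only to the extent that the bunched instructions can be chained within a step.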
Original language | English |
---|---|
Publication status | Published - 2009 |
MoE publication type | Not Eligible |
Event | Scalable Approaches to High-Performance and High-Productivity Computing 2009, ScalPerf’09 - Bertinoro, Italy. Duration: 20 Sept 2009 → 24 Sept 2009 |
Conference
Conference | Scalable Approaches to High-Performance and High-Productivity Computing 2009, ScalPerf’09 |
---|---|
Abbreviated title | ScalPerf’09 |
Country/Territory | Italy |
City | Bertinoro |
Period | 20/09/09 → 24/09/09 |
Keywords
- Parallel computing
- computer architecture
- CMP
- PRAM
- NUMA
- optimization