Synchronous shared memory (SSM) architectures are promising candidates for future chip multiprocessor (CMP) architectures due to their ability to execute general-purpose parallel code efficiently down to the finest granularity and their support for easy-to-use parallel programming models. While recent SSM architectures are tuned for fast execution of parallel workloads and for co-exploitation of instruction-level and thread-level parallelism (ILP and TLP), the solutions they use do not support efficient execution of low-TLP code fragments. More generally, the inability of architectures optimized for parallel execution to execute sequential code efficiently has been shown to be one of the design bottlenecks in the theory of architectures. In this presentation we propose a SW/HW approach for dynamically optimizing the performance of recent SSM architectures in low-TLP situations. The HW part includes changes to the processor pipeline and instruction set, as well as a new technique called bunching, which combines the execution slots of multiple threads into a single bunch that executes a single thread with a speedup proportional to the number of threads. The SW part includes a language-level mechanism that supports seamless bunching concurrently with parallel execution. A preliminary evaluation of the approach is given.
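The speedup claim for bunching can be illustrated with a deliberately simplified model. The sketch below is a hypothetical toy, not the author's design: it assumes a processor with a fixed number of interleaved thread slots, that each slot issues one instruction per step, and that a bunch of `bunch_size` slots issues `bunch_size` consecutive instructions of one thread per step. It ignores data dependencies, pipeline hazards and memory latency, so the speedup it reports is an idealized upper bound. The class and method names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToySSMProcessor:
    """Hypothetical toy model of an SSM processor with interleaved thread slots.

    Assumption: every thread slot issues exactly one instruction per step.
    """

    slots: int

    def steps_unbunched(self, seq_instructions: int) -> int:
        # Without bunching, a sequential (low-TLP) fragment runs in a single
        # thread slot that issues one instruction per step; other slots idle.
        return seq_instructions

    def steps_bunched(self, seq_instructions: int, bunch_size: int) -> int:
        # Assumed bunching behaviour: `bunch_size` slots are combined into one
        # bunch that issues `bunch_size` instructions of the same thread per step.
        if not 1 <= bunch_size <= self.slots:
            raise ValueError("bunch_size must be between 1 and slots")
        return -(-seq_instructions // bunch_size)  # ceiling division

    def speedup(self, seq_instructions: int, bunch_size: int) -> float:
        # Idealized speedup: ignores dependencies between consecutive instructions.
        return self.steps_unbunched(seq_instructions) / self.steps_bunched(
            seq_instructions, bunch_size
        )


if __name__ == "__main__":
    p = ToySSMProcessor(slots=16)
    for b in (1, 4, 8, 16):
        print(b, p.speedup(1600, b))
```

In this idealized model the speedup equals the bunch size, which matches the "proportional to the number of threads" statement only under the stated no-dependency assumption; a real pipeline would see smaller gains whenever consecutive instructions depend on each other.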
| Field | Value |
|---|---|
| Publication status | Published - 2009 |
| MoE publication type | Not Eligible |
| Event | Scalable Approaches to High-Performance and High-Productivity Computing 2009, ScalPerf’09 - Bertinoro, Italy (duration: 20 Sep 2009 → 24 Sep 2009) |
| Conference | Scalable Approaches to High-Performance and High-Productivity Computing 2009, ScalPerf’09 |
| Period | 20/09/09 → 24/09/09 |
- Parallel computing
- Computer architecture
Forsell, M. (2009). SW/HW approach for optimizing the performance of synchronous shared memory architectures to low-TLP situations. Paper presented at Scalable Approaches to High-Performance and High-Productivity Computing 2009, ScalPerf’09, Bertinoro, Italy.