SW/HW approach for optimizing the performance of synchronous shared memory architectures to low-TLP situations

Research output: Contribution to conference > Conference article > Scientific

Abstract

Synchronous shared memory (SSM) architectures are promising candidates for future CMP architectures due to their ability to execute general-purpose parallel code efficiently down to the finest granularity and their support for easy-to-use parallel programming models. While recent SSM architectures are tuned for fast execution of parallel workloads and co-exploitation of ILP and TLP, the solutions used in them do not support efficient execution of low-TLP code fragments. More generally, this inability of architectures optimized for parallel execution to execute sequential code efficiently has been shown to be one of the design bottlenecks in the theory of architectures. In this presentation we propose a SW/HW approach for dynamically optimizing the performance of recent SSM architectures for low-TLP situations. The HW part includes changes to the processor pipeline and instruction set as well as a new technique called bunching, which combines the execution slots of multiple threads into a single bunch executing a single thread with a speedup proportional to the number of threads. The SW part includes a language-level mechanism to support seamless bunching concurrently with parallel execution. A preliminary evaluation of the approach is given.
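The claim that bunching yields a speedup proportional to the number of threads can be illustrated with a toy cycle-count model. The sketch below is not taken from the paper: it assumes a processor that rotates through a fixed number of thread slots, one slot per cycle, and that a thread owning several slots retires one instruction per owned slot in each rotation. The names `cycles_to_finish`, `speedup`, `slots_per_step`, and `bunch_size` are hypothetical and chosen only for illustration.

```python
import math


def cycles_to_finish(instructions: int, slots_per_step: int, bunch_size: int = 1) -> int:
    """Cycles needed by one thread in a toy interleaved-multithreading model.

    Assumption (not from the paper): the processor visits `slots_per_step`
    thread slots in rotation, one slot per cycle, and a thread that owns
    `bunch_size` of those slots retires `bunch_size` instructions per rotation.
    """
    if not 1 <= bunch_size <= slots_per_step:
        raise ValueError("bunch_size must be between 1 and slots_per_step")
    rotations = math.ceil(instructions / bunch_size)
    return rotations * slots_per_step


def speedup(instructions: int, slots_per_step: int, bunch_size: int) -> float:
    """Speedup of a bunched thread over the same thread using a single slot."""
    unbunched = cycles_to_finish(instructions, slots_per_step, 1)
    bunched = cycles_to_finish(instructions, slots_per_step, bunch_size)
    return unbunched / bunched


if __name__ == "__main__":
    # A sequential fragment of 1024 instructions on a 16-slot processor.
    for size in (1, 4, 16):
        print(size, cycles_to_finish(1024, 16, size), speedup(1024, 16, size))
```

Under these assumptions, a 1024-instruction sequential fragment on a 16-slot processor needs 16384 cycles with a single slot and 1024 cycles when all 16 slots are bunched, so the modelled speedup equals the bunch size. Real hardware would add effects this model ignores, such as memory latency and dependencies between instructions.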
Original language: English
Publication status: Published - 2009
MoE publication type: Not Eligible
Event: Scalable Approaches to High-Performance and High-Productivity Computing 2009, ScalPerf’09 - Bertinoro, Italy
Duration: 20 Sep 2009 - 24 Sep 2009

Conference

Conference: Scalable Approaches to High-Performance and High-Productivity Computing 2009, ScalPerf’09
Abbreviated title: ScalPerf’09
Country: Italy
City: Bertinoro
Period: 20/09/09 - 24/09/09

Keywords

  • Parallel computing
  • computer architecture
  • CMP
  • PRAM
  • NUMA
  • optimization

Cite this

Forsell, M. (2009). SW/HW approach for optimizing the performance of synchronous shared memory architectures to low-TLP situations. Paper presented at Scalable Approaches to High-Performance and High-Productivity Computing 2009, ScalPerf’09, Bertinoro, Italy.