Abstract
The parallel random access machine (PRAM) can be implemented efficiently on a chip multiprocessor (CMP) with an emulated shared memory (ESM) architecture, providing the easy parallel programmability that is crucial to wider adoption of CMPs in general-purpose computing. The implementation relies on three things: exploiting the slack of parallel applications, rather than caches, to hide the latency of the memory system; sufficient bisection bandwidth to guarantee high throughput; and hashing to avoid hot spots in intercommunication. Unfortunately, this solution cannot efficiently handle workloads with low thread-level parallelism (TLP), because there is then not enough parallel slackness to hide the latency. In this paper, we show that integrating nonuniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem. We describe the resulting PRAM-NUMA hybrid model and outline its architectural implementation on our Eclipse ESM CMP framework.
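The low-TLP problem described in the abstract can be illustrated with a minimal sketch. It is not the paper's model. It assumes a simplified interleaved-multithreading pipeline in which every instruction issues one shared-memory reference, and a thread cannot issue its next instruction until the reply arrives. The function name `pipeline_utilization` and the 64-cycle latency are hypothetical, chosen only for illustration.

```python
def pipeline_utilization(threads: int, memory_latency: int) -> float:
    """Estimate the fraction of cycles in which a processor does useful work.

    Assumption (not from the paper): each thread issues one instruction,
    then waits `memory_latency` cycles for its memory reply, while the
    processor switches to another ready thread on every cycle.
    """
    if threads < 1 or memory_latency < 1:
        raise ValueError("threads and memory_latency must be positive")
    # With fewer threads than latency cycles, the pipeline runs out of
    # ready threads and idles; with at least as many, latency is hidden.
    return min(1.0, threads / memory_latency)


if __name__ == "__main__":
    latency = 64  # hypothetical round-trip latency in cycles
    for threads in (1, 16, 64, 512):
        print(threads, pipeline_utilization(threads, latency))
```

Under these assumptions, utilization falls in proportion to the missing slack once the thread count drops below the latency. This is the regime that the NUMA support is meant to address.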
Original language | English |
---|---|
Title of host publication | Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010 |
Subtitle of host publication | Atlanta, Georgia, USA, 19-23 April 2010 |
Place of Publication | Piscataway, NJ, USA |
Publisher | IEEE Institute of Electrical and Electronic Engineers |
Number of pages | 8 |
ISBN (Electronic) | 978-1-4244-6534-7 |
ISBN (Print) | 978-1-4244-6533-0, 978-1-4244-6532-3 |
Publication status | Published - 2010 |
MoE publication type | A4 Article in a conference publication |