(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 7, October 2010
THE EVOLUTION OF CHIP MULTI-PROCESSORS AND ITS ROLE IN
HIGH PERFORMANCE AND PARALLEL COMPUTING
A. Neela Madheswari, Research Scholar, Anna University, Coimbatore, India. neela.madheswari@gmail.com
Dr. R.S.D. Wahida Banu, Research Supervisor, Anna University, Coimbatore, India. drwahidabanu@gmail.com
ISSN 1947-5500, http://sites.google.com/site/ijcsis/

Abstract - Today’s computing environments must support a large number of threads and functional units so that multiple processes can run simultaneously. At the same time, processors must not suffer from the excessive heat dissipation caused by raising clock frequencies to gain speed, and they must still deliver high system performance. These requirements led to the emergence and growth of the Chip Multi-Processor (CMP) architecture, which forms the basis of this paper. The paper discusses the role of CMPs in parallel and high performance computing environments and the need to move towards CMP architectures in the near future.

Keywords- CMPs; High Performance computing; Grid Computing; Parallel computing; Simultaneous multithreading.

I. INTRODUCTION

Advances in semiconductor technology enable the integration of billions of transistors on a single chip. Such exponentially increasing transistor counts make reliability an important design challenge, since a processor’s soft error rate grows in direct proportion to the number of devices being integrated [7]. The huge number of transistors, on the other hand, has led to the popularity of multi-core processor, or chip multi-processor, architectures for improved system throughput [13].

Multi-core processors represent an evolutionary change in conventional computing and set the new trend for high performance computing (HPC), but parallelism is nothing new. Intel has a long history with the concept of parallelism and the development of hardware-enhanced threading capabilities, and has been delivering threading-capable products for more than a decade. The move towards chip-level multiprocessing architectures with a large number of cores continues to offer dramatically increased performance and power characteristics [14].

In recent years, Chip Multi-Processing (CMP) architectures have been developed to enhance performance and power efficiency through the exploitation of both instruction-level and thread-level parallelism. For instance, the IBM Power5 processor enables two SMT threads to execute on each of its two cores, and four chips to be interconnected to form an eight-core module [8]. The Intel Montecito, Intel Woodcrest, and AMD AMD64 processors all support dual cores [9]. Sun also shipped eight-core, 32-way Niagara processors in 2006 [10, 15]. Chip Multi-Processors (CMPs) have the following advantages:
1. Parallelism of computation: Multiple processors on a chip can execute process threads concurrently.
2. Processor core density in systems: Highly scalable enterprise-class server systems, as well as rack-mount servers, can be built that fit several processor cores in a small volume.
3. Short design cycle and quick time-to-market: Since CMP chips are based on existing processor cores, product schedules can be short [5].

II. MOTIVATION

For the last few years, the software industry has made significant advances in computing, and the emerging grid computing, cloud computing and Rich Internet Applications are prime examples of distributed applications. Although we are in an era of machine-based computing now, a shift towards human-based computing is also emerging, in which the voice, speech, gestures and commands of a human can be understood by computers, which then act according to those signals. Video conferencing, natural language processing and speech recognition software are examples of human-based computing. These kinds of computing need huge computing power from a number of processors, together with advances in multi-processor technologies.

In this decade, computer architecture has entered a new ‘multi-core’ era with the advent of Chip Multi-Processors (CMPs). Many leading companies, including Intel, AMD and IBM, have successfully released multi-core processor series, such as the Intel IXP network processors [28], the Cell processor [12] and the AMD Opteron™. CMPs have evolved largely because of the increased power consumption in nanoscale technologies, which has forced designers to seek alternatives to device scaling for improving performance. Increasing parallelism with multiple cores is an effective strategy [18].

III. EVOLUTION OF PROCESSOR ARCHITECTURE

Dual and multi-core processor systems are going to change the dynamics of the market and enable innovative new designs that deliver high performance with optimized power characteristics. They drive multithreading and parallelism at a level higher than the instruction level, and bring it to mainstream computing on a massive scale. From the operating system (OS) level, they look like a symmetric multi-processor (SMP) system, but they bring many more advantages than typical dual- or multi-processor systems.

Multi-core processing is a long-term strategy for Intel that began more than a decade ago. Intel has more than 15 multi-core processor projects underway and is on the fast track to deliver multi-core processors in high volume across all of its platform families. Intel’s multi-core architecture will possibly feature dozens or even hundreds of processor cores on a single die. In addition to general-purpose cores, Intel multi-core processors will eventually include specialized cores for processing graphics, speech recognition algorithms, communication protocols, and more. Many new and significant innovations designed to optimize power, performance, and scalability are implemented in the new multi-core processors [14].

According to the number of functional units running simultaneously, processor architectures are classified into three main types:
(1) Single processor architecture, which does not support multiple functional units running simultaneously.
(2) Simultaneous multithreading (SMT) architecture, which supports multiple threads running simultaneously, but does not let two threads use the same functional unit at any particular time.
(3) Multi-core architecture or chip multi-processor (CMP) architecture, which supports multiple functional units running simultaneously and may also support multiple threads simultaneously at any particular time.

A. Single processor architecture

The single processor architecture is shown in Figure 1. Here only one processing unit is present on the chip for performing arithmetic or logical operations. At any particular time, only one operation can be performed.

Figure 1: Single core CPU chip

B. Simultaneous multithreading (SMT) architecture

SMT permits multiple independent threads to execute simultaneously on the same core. If one thread is waiting for a floating-point operation to complete, another thread can use the integer units. Without SMT, only a single thread can run at any given time. In SMT, however, the same functional unit cannot be used by two threads simultaneously: if two threads want to use the integer unit at the same time, this is not possible with SMT. Here all the caches of the system are shared.
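
The three-way classification above can be mimicked with a small toy model. The sketch below is an illustration only and is not taken from the cited sources: it assumes that every operation takes exactly one cycle, that every core has exactly one integer unit and one floating-point unit, and that a CMP deals threads to its cores round-robin. The function names (single_core_cycles, smt_cycles, cmp_cycles) are hypothetical. It counts how many cycles a set of threads needs under each architecture as this section describes them.

```python
from collections import deque


def single_core_cycles(threads):
    """Single processor: one operation per cycle, so threads run back to back."""
    return sum(len(ops) for ops in threads)


def smt_cycles(threads):
    """SMT on one core (assumed: one unit of each kind, one-cycle operations).

    In each cycle every thread tries to issue its next operation. Two threads
    can proceed together only if they need different functional units; a thread
    whose unit is already taken in this cycle has to wait.
    """
    queues = [deque(ops) for ops in threads if ops]
    cycles = 0
    while queues:
        busy = set()  # functional units already claimed in this cycle
        for q in queues:
            if q[0] not in busy:
                busy.add(q.popleft())
        queues = [q for q in queues if q]  # drop finished threads
        cycles += 1
    return cycles


def cmp_cycles(threads, cores):
    """CMP: cores run in parallel; threads sharing a core are time-sliced."""
    if cores < 1:
        raise ValueError("a CMP needs at least one core")
    per_core = [0] * cores
    for i, ops in enumerate(threads):
        per_core[i % cores] += len(ops)  # round-robin placement (assumption)
    return max(per_core)


# Two hypothetical threads, each a sequence of functional-unit requests.
demo = [["fp", "int", "int"], ["int", "int", "fp"]]
print("single core:", single_core_cycles(demo))
print("SMT        :", smt_cycles(demo))
print("CMP, 2 cores:", cmp_cycles(demo, cores=2))
```

For these two threads the model reports 6 cycles on the single processor, 5 with SMT (the threads overlap only in the first cycle, when one needs the floating-point unit and the other the integer unit) and 3 on a two-core CMP. Real processors have several units of each kind and multi-cycle operations, so the numbers only illustrate the ordering of the three designs under the stated assumptions.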
C. Chip Multi-Processor architecture
In multi-core or chip multi-processor architecture, multiple processing units (cores) are present on a single die. Figure 2 shows a multi-core architecture with 3 cores in a single CPU chip. Here all the cores fit in a single processor socket, an arrangement called a Chip Multi-Processor. The cores can run in parallel. Within each core, threads can be time-sliced as in a single processor system [17].

Figure 2: Chip multi-processor architecture

The multi-core architecture with cache and main memory, shown in Figure 3, comprises processor cores 0 to N. Each core has a private L1 cache, which consists of an instruction cache (I-cache) and a data cache (D-cache).

Figure 3: Multi-core architecture with memory

Each L1 cache is connected to the shared L2 cache. The L2 cache is unified and inclusive, i.e. it includes all the lines contained in the L1 caches. The main memory is connected to the L2 cache; if a data request misses in the L2 cache, the data is fetched from main memory [20].

IV. EXISTING ENVIRONMENTS FOR CHIP MULTI-PROCESSOR ARCHITECTURE

Chip multi-processors are used in environments ranging from the desktop to high performance computing. The following subsections show the presence and the main role of CMPs in various computing environments.

A. High Performance Computing

High performance computing uses supercomputers and computer clusters to solve advanced computation problems. A list of the most powerful high-performance computers can be found on the Top500 list.

Top500 is a list of the world’s fastest computers. The list is created twice a year and includes some rather large systems. Not all Top500 systems are clusters, but many of them are built from the same technology. There may be HPC systems that are proprietary or whose owners are not interested in the Top500 ranking. The Top500 list is a wealth of historical data. It was started in 1993 and has data on vendors, organizations, processors, memory, and so on for each entry in the list [22]. The first 10 systems, as listed in June 2010 [23], are given in Table 1.

Table 1: Top 10 supercomputers list

Rank | Processor details | Year
1 | Jaguar - Cray XT5-HE Opteron Six Core 2.6 GHz | 2009
2 | Nebulae - Dawning TC3600 Blade, Intel X5650, NVidia Tesla C2050 GPU | 2010
3 | Roadrunner - BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband | 2009
4 | Kraken XT5 - Cray XT5-HE Opteron Six Core 2.6 GHz | 2009
5 | JUGENE - Blue Gene/P Solution | 2009
6 | Pleiades - SGI Altix ICE 8200EX/8400EX, Xeon HT QC 3.0/Xeon Westmere 2.93 GHz, Infiniband | 2010
7 | Tianhe-1 - NUDT TH-1 Cluster, Xeon E5540/E5450, ATI Radeon HD 4870 2, Infiniband | 2009
8 | BlueGene/L - eServer Blue Gene Solution | 2007
9 | Intrepid - Blue Gene/P Solution | 2007
10 | Red Sky - Sun Blade x6275, Xeon X55xx 2.93 GHz, Infiniband | 2010

Among the top 10 supercomputers, Jaguar and Kraken use multi-core processors that fall under the CMP category. Thus chip multi-processors are already present in high performance computing environments, and their role is likely to grow in the near future since the worldwide HPC market is growing rapidly. Successful HPC applications span many industrial, government and academic sectors.

B. Grid computing

Grid computing has emerged as the next-generation parallel and distributed computing methodology, which aggregates dispersed heterogeneous resources for solving various kinds of large-scale parallel applications in science, engineering and commerce [3]. As per [24], the various grid computing environments are:
1. DAS-2: It is a wide-area distributed computer of 200 dual Pentium-III nodes [26].
2. Grid5000: It is distributed over 9 sites and contains approximately 1500 nodes and approximately 5500 CPUs [29].
3. NorduGrid: It is one of the largest production grids in the world, with more than 30 sites of heterogeneous clusters. Some of the cluster nodes contain dual Pentium III processors [ng].
4. AuverGrid: It is a heterogeneous cluster [30].
5. Sharcnet: It is a cluster of clusters. It consists of 10 sites and has 6828 processors [24].
6. LCG: It contains 24115 processors [24].

In some of these grids the processors involved are of multi-core type. Hence chip multi-processors are also used in grid computing environments.

C. Parallel computing

Parallel computing plays a major role in current trends and in almost all fields. Formerly it was used only to solve very large problems such as weather forecasting. Nowadays the concept of parallel computing is used everywhere from supercomputing environments to modern desktop environments, for example in quad-core processors or in GPUs [25].

As per the Parallel Workloads Archive [21], the parallel computing systems are:
1. CTC IBM SP2: It contains a 512-node IBM SP2 (1996).
2. DAS-2 5-Cluster: It contains 72 nodes, each with dual 1 GHz Pentium-III processors (2003).
3. HPC2N: It contains 120 nodes, each with two AMD Athlon MP2000+ processors, 240 processors in total (2002).
4. KTH IBM SP2: It contains a 100-node IBM SP2 (1996).
5. LANL: It contains a 1024-node Connection Machine CM-5 (1994).
6. LANL O2K: It contains a cluster of 16 Origin 2000 machines with 128 processors each, 2048 in total (1999).
7. LCG: It contains the LHC (Large Hadron Collider) Computing Grid (2005).
8. LLNL Atlas: It contains 1152 nodes, each with 8 AMD Opteron processors (2006).
9. LLNL T3D: It contains 128 nodes, each with two DEC Alpha 21064 processors (1996).
10. LLNL Thunder: It contains 1024 nodes, each with 4 Intel IA-64 Itanium processors (2007).
11. LLNL uBGL: It contains 2048 processors (2006).
12. LPC: It contains 70 dual 3 GHz Pentium-IV Xeon nodes (2004).
13. NASA: It contains 128 nodes (1993).
14. OSC Cluster: It has two types of nodes, 32 quad-processor nodes and 25 dual-processor nodes, for a total of 178 processors (2000).
15. SDSC: It contains 416 nodes (1995).
16. SDSC DataStar: It contains 184 nodes (2004).
17. SDSC Blue Horizon: It contains 144 nodes (2000).
18. SDSC SP2: It contains a 128-node IBM SP2 (1998).
19. SHARCNET: It contains 10 clusters with quad- and dual-core processors (2005).

Hence most of the processors involved in these parallel computing machines are of multi-core type. This implies the involvement of multi-core processors in parallel computing environments.

V. CMP CHALLENGES

The advent of multi-core processors and the emergence of new parallel applications that take advantage of such processors pose difficult challenges to designers.

With relatively constant die sizes, limited on-chip cache, and scarce pin bandwidth, placing more cores on a chip reduces the amount of cache and bus bandwidth available per core, thereby exacerbating the memory wall problem [1]. The designer has to build a processor whose cores provide good single-thread performance in the presence of long-latency cache misses, while enabling as many of these cores as possible to be placed on the same die for high throughput.

Limited on-chip cache area, reduced cache capacity per core, and the increase in application cache footprints as applications scale up with the number of cores will make cache miss stalls more problematic [19].

The problem of shared L2 cache allocation is critical to the effective utilization of multi-core processors. Unbalanced cache allocation sometimes occurs, and it can easily lead to serious problems such as thread starvation and priority inversion, which threaten the processor’s utilization ratio and system performance.

Chip multi-processor (CMP) or multi-core technology has become the mainstream in CPU designs. It embeds multiple processor cores into a single die to exploit thread-level parallelism and achieve higher overall chip-level Instructions-Per-Cycle (IPC) [2, 4, 6, 11, 27]. Combined with increased clock frequency, a multi-core, multithreaded processor chip demands higher on- and off-chip memory bandwidth and suffers longer average memory access delays despite an increasing on-chip cache size. Tremendous pressure is put on memory hierarchy systems to supply the needed instructions and data in a timely manner [16].

Memory and chip memory bandwidth are among the main concerns, and they play an important role in improving system performance in CMP architectures. Similarly, the interconnection of the cores within the single die is also an important consideration.

VI. CONCLUSION

In today’s scenario, it is essential to shift towards chip multi-processor architectures. They are applicable not only to high performance and parallel computing but also to desktops, to meet the challenges of system performance. Day by day, the challenges faced by CMPs become more complicated, but the applications and needs are also increasing. Suitable steps need to be taken to decrease power consumption and leakage current.

References

[1] W. Wulf and S. McKee, “Hitting the Memory Wall: Implications of the Obvious”, ACM SIGARCH Computer Architecture News, 23(1):20-24, March 1995.
[2] L. Hammond, B. A. Nayfeh and K. Olukotun, “A Single-Chip Multiprocessor”, IEEE Computer, Sep. 1997.
[3] I. Foster, C. Kesselman (Eds.), “The Grid: Blueprint for a Future Computing Infrastructure”, Morgan Kaufmann Publishers, 1999.
[4] J. M. Tendler, S. Dodson, S. Fields, H. Le, and B. Sinharoy, “IBM eserver Power4 System Microarchitecture”, IBM White Paper, Oct. 2001.
[5] Ishwar Parulkar, Thomas Ziaja, Rajesh Pendurkar, Anand D’Souza and Amitava Majumdar, “A Scalable, Low Cost Design-For-Test Architecture for UltraSPARC™ Chip Multi-Processors”, International Test Conference, IEEE, 2002, pp.726-735.
[6] Sun Microsystems, “Sun’s 64-bit Gemini Chip”, Sunflash, 66(4), Aug. 2003.
[ng] “NorduGrid – The Nordic Testbed for Wide area computing and data handling”, Final Report, Jan 2003.
[7] S. Mukherjee, J. Emer, and S. Reinhardt, “The soft error problem, an architectural perspective”, HPCA-11, 2005.
[8] B. Sinharoy, R. Kalla, J. Tendler, R. Eickemeyer, and J. Joyner, “Power5 system microarchitecture”, IBM Journal of Research and Development, 49(4/5):505–521, 2005.
[9] C. McNairy and R. Bhatia, “Montecito: A dual-core, dual-thread Itanium processor”, IEEE Micro, 25(2):10–20, 2005.
[10] P. Kongetira, K. Aingaran, and K. Olukotun, “Niagara: A 32-way multithreaded SPARC processor”, IEEE Micro, 25(2):21–29, 2005.
[11] AMD, “Multi-core Processors: The Next Evolution in Computing”, http://multicore.amd.com/WhitePapers/Multi-Core_Processors_WhitePaper.pdf, 2005.
[12] A. Eichenberger, J. O’Brien, et al., “Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture”, IBM Systems Journal, 45:59–84, 2006.
[13] Huiyang Zhou, “A Case for fault tolerance and performance enhancement using Chip Multi-Processors”, IEEE Computer Architecture Letters, Vol.5, 2006.
[14] Pawel Gepner, Michal F. Kowalik, “Multi-Core Processors: New way to achieve high system performance”, in Proceedings of the International Symposium on Parallel Computing in Electrical Engineering, IEEE, 2006.
[15] Fengguang Song, Shirley Moore, Jack Dongarra, “L2 Cache Modeling for Scientific Applications on Chip Multi-Processors”, International Conference on Parallel Processing (ICPP), 2007.
[16] Lu Peng, Jih-Kwon Peir, Tribuvan K. Prakash, Yen-Kuang Chen and David Koppelman, “Memory performance and scalability of Intel’s and AMD’s Dual-Core Processors: A case study”, IEEE, 2007, pp.55-64.
[17] Jernej Barbic, “Multi-core architectures”, 15-213, Spring 2007, May 2007.
[18] Sushu Zhang, Karam S. Chatha, “Automated Techniques for Energy Efficient scheduling on Homogeneous and Heterogeneous Chip Multi-processor architectures”, IEEE, 2008, pp.61-66.
[19] Satyanarayana Nekkalapu, Haitham Akkary, Komal Jothi, Renjith Retnamma, Xiaoyu Song, “A Simple Latency Tolerant Processor”, IEEE, 2008, pp.384-389.
[20] Benhai Zhou, Jianzhong Qiao, Shu-kuan Lin, “Research on fine-grain cache assignment scheduling algorithm for multi-core processors”, IEEE, 2009, pp.1-4.
[21] Dror G. Feitelson, Parallel workloads archive, http://www.cs.huji.ac.il/labs/parallel/workload, March 2009.
[22] Douglas Eadline, “High Performance Computing for Dummies”, Sun and AMD Special Edition, 2009.
[23] Top 10 super computers, http://www.top500.org/, Sep 2010.
[24] Grid computing environments, http://gwa.ewi.tudelft.nl/pmwiki/, June 2010.
[25] A. Neela Madheswari, R.S.D. Wahida Banu, “Important essence of co-scheduling for parallel job scheduling”, Advances in Computational Sciences and Technology, Vol.3, No.1, 2010, pp.49-55.
[26] The Distributed ASCI Supercomputer 2, http://www.cs.vu.nl/das2/, Sep 2010.
[27] Intel, “Inside Intel Core Microarchitecture and Smart Memory Access”, http://download.intel.com/technology/architecture/sma.pdf.
[28] Intel, “Intel IXP2855 network processor - product brief”.
[29] Pierre Riteau, Mauricio Tsugawa, Andrea Matsunaga, Jose Fortes, Tim Freeman, Kate Keahey, “Sky computing on FutureGrid and Grid5000”.
[30] AuverGrid, http://gstat-prod.cern.ch/gstat/site/AUVERGRID/, Sep 2010.

AUTHOR’S PROFILE

A. Neela Madheswari received her Master of Computer Science and Engineering degree from Vinayaka Missions University in June 2006. Currently, she is doing her research in the area of Parallel and Distributed Systems under Anna University, Coimbatore. Earlier, she completed her B.E. in Computer Science and Engineering from Madras University, Chennai, in April 2000. Later, she joined Mahendra Engineering College as a Lecturer in the CSE department in 2002. She completed her M.E. in Computer Science and Engineering from Vinayaka Missions University in 2006 and now serves as Assistant Professor at MET’S School of Engineering, Thrissur. Her research interests include Parallel and Distributed Computing and Web Technologies. She is a member of the Computer Society of India, Salem. She has presented papers in national and international journals and at national and international conferences. She is a reviewer for the journals IJCNS and IJCSIS.