

Man Prakash Gupta¹, Minki Cho², Saibal Mukhopadhyay², Satish Kumar¹
¹Department of Mechanical Engineering
²Department of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, Georgia, USA
Phone: (404) 385 6640
Email:

ABSTRACT
One of the novel methods for the thermal management of multi-core processors is power multiplexing (also known as core hopping), which involves dynamically changing the locations of active cores within the chip at fixed time intervals. Power multiplexing helps reduce the number of hot spots on the chip by providing a spatially uniform thermal profile, which in turn lowers the maximum temperature rise on the chip. We quantify the effects of power multiplexing on the thermal profile of a multi-core processor chip. Different core migration policies have been implemented in an attempt to identify a policy well suited to multiplexing. We observe that an appropriate choice of migration policy and migration rate can efficiently reduce the spatial non-uniformity and the peak temperature on the chip. The ratio of active to total cores has been varied to analyze the effect of varying computing workload. We correlate the cooling power with the peak temperature on the chip and discuss the efficient use of core migration policies in the context of power reduction.

KEY WORDS: Multicore, Timeslice, Multiplexing, Core hopping, Hot spots

NOMENCLATURE
t      time, s
T      temperature, K
V      velocity along inlet flow direction, m/s
Tmax   maximum temperature on chip, K
Tmin   minimum temperature on chip, K
Cp     specific heat, J/kg-K
k      thermal conductivity, W/m-K

Greek symbols
ρ      mass density, kg/m3

978-1-4244-5343-6/10/$26.00 ©2010 IEEE

INTRODUCTION
Driven by the demand for higher-performance processors and by advances in circuit technology, the number of transistors on a single chip has grown considerably and is expected to reach billions [1]. In order to utilize billions of transistors on a single chip efficiently, the semiconductor industry is moving towards chip multiprocessors (CMPs), i.e., multi-core technology [1]. While the number of cores in these processors is expected to reach hundreds or more per die, such large-scale integration and the resulting higher power densities pose a significant heat-dissipation challenge. As traditional air-cooling devices approach their flow and acoustic limits, besides being economically inefficient when applied to many-core technology, maintaining a uniform temperature distribution through efficient heat redistribution techniques can improve energy efficiency and the coefficient of performance (compute power/cooling power) [2,3]. This opens new opportunities for dynamic thermal management techniques for CMPs, whose role in addressing the new power dissipation challenges becomes critical. Various dynamic thermal management (DTM) techniques in the literature, such as clock gating, dynamic voltage and frequency scaling (DVFS) and thread migration for single- and multi-core processors, are reactive: whenever the chip temperature reaches its acceptable limit, they either reduce the amount of dissipated energy or redistribute that energy over the chip area (thread migration) [4-8]. All these reactive methods carry power and performance overheads, in addition to hardware and software implications. To evaluate the effect of DTM techniques on performance, Rao and Vrudhula [4] proposed a technique to predict the steady-state throughput and the corresponding power consumption of a homogeneous multicore processor under a given workload and thermal constraints. Donald and Martonosi [5] proposed a framework for assessing different thermal control options, including DVFS and thread migration.
Mukherjee and Memik [6] proposed algorithms for physically aware frequency scaling, as part of DVFS, for DTM of multicore processors. Hanumaiah et al. [7] presented an analysis for maximizing the performance of a multicore processor using DVFS under thermal constraints. Chaparro et al. [8] presented different DTM techniques, including DVFS and thread migration, for multicore designs. In many-core processors, the distribution of threads to different cores during execution can lead to uneven amounts of activity across cores. Moreover, since not all cores may be active simultaneously, the power density map on the chip becomes spatially and temporally non-uniform, resulting in a non-uniform thermal field. Significant spatiotemporal non-uniformity in the thermal field is detrimental to both performance and reliability [6]. It also increases leakage power, resulting in higher power, cooling and maintenance costs [9, 10]. Hence a uniform distribution of power dissipation over the chip area becomes important for many-core processors, since very high performance can be achieved if the locations of high-temperature zones (hot spots) are optimized [11]. To obtain a high level of thermal uniformity, proactive methods such as power multiplexing can be very effective. These proactive methods can supplement reactive methods for better thermal management of many-core processors. However, little research has explored such proactive methods for analyzing and controlling the thermal field on the chip. In our previous work [12], we analyzed one such proactive method, power multiplexing, as an execution principle for many-core microprocessors to manage the thermal field under varying workload conditions. We analyzed the effects of spatiotemporal uniformity on the leakage and performance of many-core chips and showed that the proposed method provides better energy efficiency. In that work, we used the compact modeling tool HotSpot to estimate the thermal profile on the chip [12]. In this paper, we perform detailed 3D thermal modeling of the electronic package and the attached cooling devices to accurately estimate the thermal gradients on the chip while investigating power multiplexing for thermal management. The present work analyzes the effect of power multiplexing, which involves dynamically rearranging the locations of active (power-generating) cores at fixed time intervals to reduce the peak temperature and make the thermal field more uniform.
We quantify the effect of power multiplexing on the thermal profile of a 256-core processor chip using detailed 3D CFD-thermal modeling of the electronic package and the relevant cooling components. These simulations include the convective cooling effect and give more accurate and detailed results than compact thermal models, which do not include fluid flow and the associated cooling. We aim to help develop suitable power migration policies for better thermal management and a reduced power budget in many-core processors. For this purpose, we have analyzed the effects of different power configurations, timeslices, workloads and core migration policies. We also attempt to relate cooling power to the peak temperature of the chip, to highlight a possible energy-efficient approach to on-chip temperature control that combines migration policies with cooling strategies.

THERMO-FLUIDICS SYSTEM
Electronic package and 3D thermal modeling: In order to accurately quantify and characterize the effect of power multiplexing on the thermal profile of the chip and on cooling efficiency, detailed 3D CFD-thermal modeling of the entire thermo-fluidics system has been performed using ANSYS Fluent [13].


Figure 1. (Top) Flow tunnel with the heat sink and electronic package used for the thermal modeling. (Bottom) Schematic of the heat sink and electronic package of the multicore processor, including heat spreader, thermal interface material, chip and substrate (view along the direction of inlet flow).

Table 1. Properties of the various parts of the system

Component       Material   Density ρ (kg/m3)   Specific heat Cp (J/kg-K)   Thermal conductivity k (W/m-K)
Heat sink       copper     8978                381                         387.6
Heat spreader   aluminum   2719                871                         202.4
TIM             grease     2550                700                         4
Chip            silicon    2330                712                         141.2

This system comprises a flow tunnel, heat sink, heat spreader, thermal interface material (TIM), chip and substrate (see Fig. 1). The properties of the various components of the system are listed in Table 1. In our simulation, a predictive tile-type homogeneous 256-core processor is considered. The cores are arranged in a 16x16 2D array. Each core is assumed to have its own local cache and to run at a 3 GHz clock frequency. The die size is 12 mm x 12 mm, and the power dissipation ranges from 64 W to 192 W. The power numbers are selected based on the International Technology Roadmap for Semiconductors (ITRS) predictions for the 16 nm technology node [14]. Our system model assumes a power of 1.0 W per core, which is reasonable for cores running at 3 GHz in 16 nm technology [12]. A detailed discussion of the estimation of power numbers can be found in [12].
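As a quick consistency check on these power numbers, the arithmetic can be sketched in a few lines of Python. The sketch assumes (our simplification, not stated in the text) that the 12 mm x 12 mm die is split evenly into 256 equal tiles with no uncore area; the helper names are hypothetical.

```python
# Back-of-envelope power budget for the 16x16-core, 12 mm x 12 mm die in the text.
# Assumption (ours): the die is divided evenly into 256 equal tiles, no uncore area.

DIE_EDGE_MM = 12.0
GRID = 16
N_CORES = GRID * GRID  # 256


def core_area_mm2() -> float:
    """Area of one tile when the die is split evenly into GRID x GRID cores."""
    pitch_mm = DIE_EDGE_MM / GRID  # 0.75 mm
    return pitch_mm * pitch_mm


def budget(active_cores: int, watts_per_core: float) -> tuple[float, float]:
    """Return (total power in W, die-averaged power density in W/cm^2)."""
    total_w = active_cores * watts_per_core
    die_area_cm2 = (DIE_EDGE_MM / 10.0) ** 2  # 1.44 cm^2
    return total_w, total_w / die_area_cm2


if __name__ == "__main__":
    for active in (64, 128, 192):  # 25%, 50% and 75% of the 256 cores
        total, density = budget(active, 1.0)
        print(f"{active:3d} active cores x 1.0 W = {total:5.1f} W ({density:5.1f} W/cm^2)")
    # Per-core power needed for a 128 W total with only 25% of the cores active.
    print("W per active core at 25% and 128 W:", 128.0 / (N_CORES // 4))
```

With 1.0 W per core, 64, 128 and 192 active cores give 64 W, 128 W and 192 W, matching the stated range. Conversely, the 128 W cases with only 25% of the cores active correspond to 2 W per active core.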
POWER MULTIPLEXING
Power multiplexing is a dynamic thermal management technique that redistributes the heat generated within the chip. It is implemented by rearranging (also referred to as migrating) the locations of active cores within the chip at regular or specified time intervals. This redistribution of heat lowers the spatiotemporal temperature gradients on the chip and reduces the maximum temperature. The time interval at which core migration takes place is known as the timeslice. The manner in which the cores are rearranged at each migration is determined by the policy being followed. Both timeslice and policy are important parameters of power multiplexing, and we consider different core migration policies and timeslices in the present work. In practice, power multiplexing is useful only when the computational workload on the processor is below 100%. Typically only a fraction of the cores is used for compute work, and the remaining cores are kept free of workload so that they can take over the load from busy cores during migration. In our analysis, we have considered 25% and 50% active cores, with each active core allocated the same power such that the total power is 128 W. Different random and specific configurations of these active cores have been considered for the analysis.

On-Off switching of all cores on a chip: In order to explain the concept of multiplexing, a simple case has been considered in which all cores on the chip are simultaneously switched on and off periodically. In this case the power configuration remains fixed, i.e., no migration of cores takes place; the cores are simply switched on and off at a fixed time interval (referred to as the period). For a longer switching period, the envelope of the maximum temperature (Tmax) rises toward a higher value (see Fig. 2), whereas for a higher switching frequency (i.e., a shorter period) the Tmax envelope stays at a lower value; e.g., at t = 50 s, Tmax differs by 16 °C between periods of 1 s and 10 s.

Figure 2. Effect of periodically turning the cores on and off for different switching periods. The higher the switching frequency, the lower the rise in the maximum temperature on the chip.

EFFECT OF DIFFERENT POWER CONFIGURATIONS
Power configurations change at each migration step during multiplexing. Different power configurations yield different temperature profiles and directly affect the size, location and magnitude of hot spots on the processor.

Figure 3. Different power configurations on a 4x4 2D power array with 50% active cores: (a) Checkerboard, (b) Center, (c) Random.

Figure 4. Temperature profile on a die for different power configurations on a 16x16 multi-core chip: (a) checkerboard, Tmax = 377 K; (b) center, Tmax = 388 K; (c) random 1, Tmax = 382 K; (d) random 2, Tmax = 378 K.

Figure 3 shows selected power configurations on a sample 4x4 power array. The checkerboard configuration (Fig. 3a) has the active cores arranged in a checkerboard pattern, so that they are evenly distributed over the chip. The center configuration (Fig. 3b) places all the active cores near the center of the chip, which gives the highest proximity among active cores. In the random configuration (Fig. 3c), active cores are randomly distributed over the chip.
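The three configuration families can be generated programmatically. The following is a minimal sketch assuming a grid of 0/1 flags (1 = active core); the function names are hypothetical, and the tie-breaking rule for the center configuration (cores ranked by distance from the geometric center, ties broken in scan order) is our assumption rather than the authors' construction.

```python
import random

Grid = list[list[int]]  # 1 = active core, 0 = idle core


def checkerboard(n: int) -> Grid:
    """50% of cores active, alternating like the squares of a checkerboard."""
    return [[1 if (i + j) % 2 == 0 else 0 for j in range(n)] for i in range(n)]


def center(n: int, k: int) -> Grid:
    """The k cores closest to the die center are active (maximum proximity).
    Ties in distance are broken by row-major scan order (our assumption)."""
    c = (n - 1) / 2.0
    cells = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda ij: (ij[0] - c) ** 2 + (ij[1] - c) ** 2)
    grid = [[0] * n for _ in range(n)]
    for i, j in cells[:k]:
        grid[i][j] = 1
    return grid


def random_config(n: int, k: int, rng: random.Random) -> Grid:
    """k active cores placed uniformly at random."""
    grid = [[0] * n for _ in range(n)]
    for idx in rng.sample(range(n * n), k):
        grid[idx // n][idx % n] = 1
    return grid


if __name__ == "__main__":
    rng = random.Random(0)
    for name, g in (("checkerboard", checkerboard(4)),
                    ("center", center(4, 8)),
                    ("random", random_config(4, 8, rng))):
        print(name)
        for row in g:
            print(" ".join("#" if v else "." for v in row))
```

The same functions scale to the 16x16 chip, e.g. `center(16, 64)` for a 25%-active center configuration.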

The effect of the different power configurations (center, checkerboard and random) on the thermal profile is shown in Fig. 4. The checkerboard configuration yields the lowest maximum temperature of all the configurations and also gives the highest spatial uniformity of the thermal profile on the chip (Fig. 4a), owing to the uniform arrangement of active cores. The center configuration gives the highest temperature rise on the chip and produces the largest hot spot at the center, owing to the close proximity of the active cores there (Fig. 4b). It also gives the largest spatial temperature difference (defined as the difference between the maximum and minimum temperature on the chip). Two random configurations are also shown (Fig. 4c, d); their peak temperatures and spatial temperature differences lie between those of the former two configurations.
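Because the spatial temperature difference is used throughout the remaining sections, a small helper makes the definition concrete. This is a sketch assuming the die temperatures are available as a 2D list in kelvin; the function name and the 3x3 sample map are illustrative, not data from this study.

```python
def spatial_temperature_difference(temps: list[list[float]]) -> tuple[float, float, float]:
    """Return (Tmax, Tmin, Tmax - Tmin) for a 2D map of die temperatures in K."""
    flat = [t for row in temps for t in row]
    t_max, t_min = max(flat), min(flat)
    return t_max, t_min, t_max - t_min


if __name__ == "__main__":
    # Hypothetical 3x3 map with a hot spot in the middle (values are illustrative).
    demo = [[310.0, 312.0, 310.0],
            [312.0, 325.0, 312.0],
            [310.0, 312.0, 310.0]]
    print(spatial_temperature_difference(demo))  # (325.0, 310.0, 15.0)
```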

The correlation of spatial temperature difference with Tmax for different random power configurations is shown in Fig. 5. The spatial temperature difference increases with increasing Tmax; in other words, the higher the peak temperature on the chip, the greater the non-uniformity of the thermal profile.

Figure 5. Variation of spatial temperature difference for different power configurations.

The change in the chip's Tmax with time for different power configurations is shown in Fig. 6. After 0.1 s, the differences in Tmax between configurations remain constant until a steady temperature is reached. These characteristics of the power configurations can be further exploited to design efficient power migration policies, in which power is migrated from one configuration to another with lower spatial non-uniformity and lower Tmax.

EFFECT OF TIMESLICE
Power multiplexing involves migration of cores at specified regular time intervals (referred to as timeslices). The frequency of core migration during multiplexing is inversely proportional to the timeslice. To study the effect of timeslice on the thermal profile of the chip, we have considered three cases: (a) always on (i.e., no multiplexing), (b) multiplexing with timeslice = 33e-4 s and (c) multiplexing with timeslice = 33e-5 s. In our model, switching of active core locations is based on the number of clock cycles after which core migration is performed. As the cores run at 3 GHz, the timeslices 33e-4 s and 33e-5 s correspond to roughly 10000K and 1000K clock cycles, respectively [12]. Multiplexing has been implemented using the random policy, which migrates the active cores from one random configuration to another.
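The conversion between timeslice and clock cycles is a single multiplication by the clock frequency. The sketch below (hypothetical helper names, 3 GHz clock from the text) shows that 33e-4 s is 9.9 million cycles and that 10^7 cycles is about 3.33 ms, so the 10000K and 1000K figures quoted above are rounded equivalents.

```python
CLOCK_HZ = 3e9  # 3 GHz core clock, as stated in the text


def timeslice_to_cycles(timeslice_s: float, clock_hz: float = CLOCK_HZ) -> int:
    """Number of clock cycles elapsed in one timeslice (rounded to an integer)."""
    return round(timeslice_s * clock_hz)


def cycles_to_timeslice(cycles: int, clock_hz: float = CLOCK_HZ) -> float:
    """Inverse mapping: timeslice in seconds for a given cycle count."""
    return cycles / clock_hz


if __name__ == "__main__":
    for ts in (33e-4, 33e-5):
        print(f"timeslice {ts:.1e} s -> {timeslice_to_cycles(ts):,} cycles")
    for cycles in (10_000_000, 1_000_000):
        print(f"{cycles:,} cycles -> {cycles_to_timeslice(cycles):.3e} s")
```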

Figure 6. Maximum temperature rise of chip with time for different power configurations on chip. Total power = 128 W.

Figure 7. Effect of timeslice on the maximum temperature rise for random multiplexing. 25% of cores are active with total power = 128 W.

The results suggest that multiplexing helps reduce the peak temperature and provides a more uniform thermal profile on the chip. However, the reduction in maximum temperature and the degree of uniformity depend on the timeslice used for multiplexing. A smaller timeslice yields a larger reduction in peak temperature (Fig. 7) and a smaller spatial temperature difference (Fig. 8). The thermal profile on the chip at three time instants, t = 0.1 s, 0.33 s and 0.66 s, is shown in Fig. 9 for the three cases considered. The thermal profiles of the three cases evolve quite differently in time, and the profile becomes more uniform as the timeslice decreases (Fig. 9). Random power multiplexing with a high migration frequency generates fewer hot spots, reduces the peak temperature and produces a relatively uniform thermal profile on the chip.
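The temporal part of this argument, and the on-off behavior of Fig. 2, can be illustrated with a single lumped thermal node. The sketch below is not the CFD model used in this paper: it assumes one node with made-up thermal resistance R = 0.5 K/W and capacitance C = 2 J/K (time constant 1 s), driven by a 50% duty-cycle square wave of 128 W, and it ignores lateral heat spreading between cores entirely.

```python
def peak_temperature(period_s: float, t_end_s: float = 50.0, power_w: float = 128.0,
                     r_th: float = 0.5, c_th: float = 2.0, t_amb: float = 300.0,
                     dt: float = 1e-3) -> float:
    """Peak temperature of one lumped thermal node driven by a 50% duty-cycle
    square wave of power (on for the first half of each period, off for the second).

    Integrates C dT/dt = P(t) - (T - T_amb) / R with explicit Euler steps.
    All parameter values are illustrative assumptions, not the paper's package.
    """
    temp = t_amb
    peak = temp
    for step in range(int(round(t_end_s / dt))):
        t = step * dt
        p = power_w if (t % period_s) < period_s / 2 else 0.0
        temp += dt * (p - (temp - t_amb) / r_th) / c_th
        peak = max(peak, temp)
    return peak


if __name__ == "__main__":
    # Same average power in both cases; only the switching period differs.
    for period in (1.0, 10.0):
        print(f"period {period:4.1f} s -> peak {peak_temperature(period):.1f} K")
```

Both runs dissipate the same average power. With these assumed parameters, the 1 s period peaks near 340 K, while the 10 s period approaches the full steady-state rise of P·R = 64 K (about 364 K), because a short on-phase ends before the node has time to heat up. Faster switching therefore lowers the peak, which is the qualitative trend reported for shorter periods and timeslices; actual magnitudes depend on the package and must come from the full thermal model.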

Figure 8. Variation of spatial temperature difference with time for different timeslices for random multiplexing. 25% active cores with total power = 128 W.

Figure 9. Thermal profile on the 256-core chip at different time instants (columns at t = 0.099 s, 0.33 s and 0.66 s) for (a) no multiplexing, (b) multiplexing with timeslice = 33e-4 s and (c) multiplexing with timeslice = 33e-5 s, using the random core migration policy. 25% active cores with total power = 128 W.

In order to assess the effect of processor workload on the temperature profile of the chip, we have considered three cases: 25%, 50% and 75% active cores out of a total of 256 cores. As expected, the peak temperature rises as the workload increases (Fig. 10).

Figure 10. Variation of maximum chip temperature with time for different numbers of active cores.

Figure 11. Thermal profile on the 256-core chip at time t = 0.66 s for (a) 64 active cores, Tmax = 315 K, (b) 128 active cores, Tmax = 327 K and (c) 192 active cores, Tmax = 339 K. The random policy has been used for core hopping with timeslice = 33e-4 s.

A higher workload produces a higher temperature rise across the entire chip, as observed in Fig. 11(a)-(c). For a larger number of active cores, the size and strength of hot spots increase, leading to a higher maximum temperature on the chip.

EFFECT OF POLICY
Power multiplexing involves changing the configuration (arrangement) of active cores at each migration step, i.e., the currently active cores are deactivated and another set of cores is activated. The configuration can change in a number of ways, depending on the policy followed during multiplexing. In the random policy, a different random configuration (with the same number of active cores and the same total power) is used after each migration step (i.e., each timeslice). In the cyclic policy, the 256-core chip is divided into smaller blocks of 2x2 cores; the active cores are assigned in a checkerboard fashion and shifted circularly after each timeslice. In this study, we have considered these two policies with 64 active cores and a total power of 128 W.
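The two policies can be sketched as generators that yield one configuration per timeslice. This is a minimal sketch under stated assumptions: `random_policy` and `cyclic_policy` are hypothetical names, and the order in which the single active slot visits the four positions of each 2x2 block is our assumption, since the text only says the assignment is shifted in a circular fashion.

```python
import random
from collections.abc import Iterator

Config = frozenset[tuple[int, int]]  # (row, col) positions of the active cores

# Assumed visiting order of the single active slot inside each 2x2 block;
# the text only says the assignment is shifted "in a circular fashion".
_CYCLE = [(0, 0), (0, 1), (1, 1), (1, 0)]


def random_policy(n: int, k: int, rng: random.Random) -> Iterator[Config]:
    """Random policy: a fresh random set of k active cores every timeslice."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    while True:
        yield frozenset(rng.sample(cells, k))


def cyclic_policy(n: int) -> Iterator[Config]:
    """Cyclic policy sketch: one active core per 2x2 block (25% of an n x n chip,
    n even), with its slot inside the block rotated at every timeslice."""
    step = 0
    while True:
        di, dj = _CYCLE[step % len(_CYCLE)]
        yield frozenset((bi + di, bj + dj)
                        for bi in range(0, n, 2) for bj in range(0, n, 2))
        step += 1


if __name__ == "__main__":
    rng = random.Random(42)
    rand_seq, cyc_seq = random_policy(16, 64, rng), cyclic_policy(16)
    for timeslice in range(3):
        r, c = next(rand_seq), next(cyc_seq)
        print(f"timeslice {timeslice}: random {len(r)} active, cyclic {len(c)} active,"
              f" cyclic starts at {sorted(c)[0]}")
```

In the cyclic sketch, consecutive configurations share no active core and the pattern repeats every four timeslices; the random sketch makes no such guarantee, so successive configurations may overlap or cluster.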

Figure 12. Variation of maximum chip temperature with time for different policies. The timeslice for multiplexing is taken to be 33e-4 s. 25% active cores with total power = 128 W.

Power multiplexing with the cyclic policy yields a lower peak temperature than the random policy (Fig. 12). The spatial temperature difference across the chip is also lower with the cyclic policy (Fig. 13). This observation is consistent with the fact that the cyclic policy uses a checkerboard-like distribution of active cores, which means lower proximity among active cores than in a random configuration.

Figure 13. Variation of spatial temperature difference with time for different policies corresponding to timeslice = 33e-4 s. 25% active cores with total power = 128 W.

Cyclic Policy: The effect of timeslice on power multiplexing with the cyclic policy has been examined for three cases: (a) always on (i.e., no multiplexing), (b) multiplexing with timeslice = 33e-4 s and (c) multiplexing with timeslice = 33e-5 s. These are the same three cases considered for the random policy. The thermal profile becomes uniform only for a considerably small timeslice (Fig. 14). For larger timeslices, small hot spots tend to appear, which shows that fine-grained control over hot spots is relatively difficult with the cyclic policy compared with the random policy.

Figure 14. Thermal profile on chip for the cyclic policy for (a) timeslice = 33e-5 s, (b) timeslice = 33e-4 s, (c) no change in configuration. Higher spatial thermal uniformity is seen for a high frequency of multiplexing/core hopping. 25% active cores with total power = 128 W.

EFFECT OF COOLING POWER
The heat sink attached to the electronic package provides increased surface area for dissipating heat from the processor; this heat is carried away by air through convection. The mass flow rate of air passing through the heat sink directly affects the cooling of the chip. A higher inlet air speed or a lower inlet air temperature reduces the peak temperature on the chip, but also consumes more cooling power (Fig. 15). Our goal is to correlate the peak temperature of the chip with the cooling power, which can provide insights into the efficient use of migration policies for reducing the total power budget without compromising compute performance. The relationship between peak temperature and inlet air speed (or cooling power) is not linear. In fact, as the cooling power increases, the additional reduction in maximum temperature diminishes rapidly (see Fig. 16). This suggests that increasing the cooling power to decrease the peak temperature is not always an energy-efficient solution for on-chip temperature control. Combining migration policies with cooling strategies can help utilize power efficiently.

Figure 15. Variation of maximum temperature with time for different inlet air speeds.

Figure 16. Relation between peak temperature on the chip and cooling power. Different curves correspond to different random power configurations.
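The diminishing return seen in Fig. 16 can be reproduced qualitatively with a toy steady-state model. Every relation in the sketch is an assumption for illustration: fan power is taken to scale with the cube of air speed V (the fan affinity law for a fixed flow path), the sink-to-air thermal resistance with V^-0.8 (a typical exponent for turbulent forced convection), and the conduction resistance of die, TIM and spreader is held fixed; the coefficients are invented, not fitted to the CFD results.

```python
def steady_peak_temperature(fan_power_w: float, chip_power_w: float = 128.0,
                            t_inlet_k: float = 300.0, r_cond: float = 0.15,
                            k_conv: float = 0.5, k_fan: float = 1.0) -> float:
    """Toy steady-state Tmax (K) as a function of fan (cooling) power (W).

    Assumptions (illustrative, not fitted to this paper's results):
      * fan power scales with the cube of air speed: P_fan = k_fan * V**3
      * sink-to-air resistance scales as V**-0.8 (turbulent-type correlation)
      * conduction resistance r_cond (die, TIM, spreader) in K/W is fixed
    """
    speed = (fan_power_w / k_fan) ** (1.0 / 3.0)
    r_conv = k_conv * speed ** -0.8
    return t_inlet_k + chip_power_w * (r_cond + r_conv)


if __name__ == "__main__":
    powers = [1.0, 2.0, 4.0, 8.0, 16.0]
    temps = [steady_peak_temperature(p) for p in powers]
    for p, t in zip(powers, temps):
        print(f"fan power {p:5.1f} W -> Tmax {t:6.1f} K")
    # Each doubling of cooling power buys a smaller reduction in Tmax.
    drops = [a - b for a, b in zip(temps, temps[1:])]
    print("successive reductions (K):", [round(d, 2) for d in drops])
```

Because the convective part of Tmax - T_inlet falls only as about P_fan^-0.27 in this model, on top of a fixed conduction floor, each doubling of cooling power buys a smaller temperature reduction than the previous one.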

CONCLUSION
The spatiotemporal power multiplexing approach has been analyzed in this paper as a prospective thermal management technique for multicore microprocessors. The method exploits lateral heat flow to redistribute the heat generated in a many-core chip by varying the locations of the active cores over time. This results in a lower maximum temperature and better spatiotemporal temperature uniformity. Increasing the frequency of core migration yields a larger reduction in peak temperature on the chip. The reduction in peak temperature and the degree of non-uniformity on the chip also depend on the choice of migration policy. Cyclic migration leads to a lower peak temperature than the random policy, but fine-grained control over hot spots is relatively difficult with the cyclic policy. These migration policies need to be applied together with cooling strategies to improve the energy efficiency and coefficient of performance of computing units.

REFERENCES
[1] L. Peng et al., "Memory Performance and Scalability of Intel's and AMD's Dual-Core Processors: A Case Study," in IPCCC, 2007, pp. 55-64.
[2] P. Rodgers et al., "Limits of Air-Cooling: Status and Challenges," in SEMI-THERM, 2005, pp. 116-124.
[3] S. Krishnan et al., "Towards a Thermal Moore's Law," IEEE Transactions on Advanced Packaging, 2007, pp. 462-474.
[4] R. Rao, S. Vrudhula, and C. Chakrabarti, "Throughput of Multi-Core Processors Under Thermal Constraints," in ISLPED, 2007, pp. 201-206.
[5] J. Donald and M. Martonosi, "Techniques for Multicore Thermal Management: Classification and New Exploration," in ISCA, 2006, pp. 78-88.
[6] R. Mukherjee and S. O. Memik, "Physical Aware Frequency Selection for Dynamic Thermal Management in Multi-Core Systems," in ICCAD, 2006, pp. 547-552.
[7] Hanumaiah et al., "Maximizing performance of thermally constrained multi-core processors by dynamic voltage and frequency control," in ICCAD, 2009, pp. 310-313.
[8] P. Chaparro et al., "Understanding the Thermal Implications of Multicore Architectures," IEEE Transactions on Parallel and Distributed Systems, 2007, pp. 1055-1065.
[9] J. Srinivasan et al., "The Case for Lifetime Reliability-Aware Microprocessors," in ISCA, 2004, pp. 276-287.
[10] E. Kursun et al., "Temperature Variation Characterization and Thermal Management of Multicore Architectures," IEEE Micro, 2009, pp. 116-126.
[11] G. Xu, "Thermal Modeling of Multi-Core Processors," in ITHERM, 2006, pp. 96-100.
[12] M. Cho et al., "Proactive Power Migration to Reduce Maximum Value and Spatiotemporal Non-uniformity of On-chip Temperature Distribution in Homogeneous Many-Core Processors," in SEMI-THERM, 2010, Santa Clara, CA, USA.
[13]
[14] International Technology Roadmap for Semiconductors.
