Because program phases are difficult to define, we compare the techniques using a variety of metrics. Basic block vector (BBV) techniques perform better than the others, providing higher sensitivity and more stable phases. The conditional branch counter technique provides good sensitivity, but is less effective at detecting major phase changes.
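The two detectors compared above can be illustrated with a minimal sketch. It is not the evaluated implementation: the trace format (a list of basic block IDs), the interval length measured in blocks rather than instructions, the unweighted block counts, and the names `bbv_phase_changes`, `branch_counter_phase_changes`, and `branch_blocks` are all assumptions made for illustration.

```python
from collections import Counter


def bbv(interval):
    """Normalized basic block vector: share of the interval spent in each block.

    Simplification: blocks are weighted by execution count only, not by
    the number of instructions they contain.
    """
    counts = Counter(interval)
    total = sum(counts.values())
    return {block: n / total for block, n in counts.items()}


def manhattan(a, b):
    """Manhattan distance between two sparse vectors stored as dicts."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys())


def split(trace, interval_len):
    """Cut the trace into consecutive fixed-length intervals."""
    return [trace[i:i + interval_len] for i in range(0, len(trace), interval_len)]


def bbv_phase_changes(trace, interval_len, threshold):
    """Indices of intervals whose BBV differs from the previous one by more than threshold."""
    vectors = [bbv(iv) for iv in split(trace, interval_len)]
    return [i for i in range(1, len(vectors))
            if manhattan(vectors[i - 1], vectors[i]) > threshold]


def branch_counter_phase_changes(trace, branch_blocks, interval_len, threshold):
    """Indices of intervals whose conditional branch count changes by more than threshold.

    branch_blocks is a hypothetical set of block IDs that end in a
    conditional branch.
    """
    counts = [sum(1 for block in iv if block in branch_blocks)
              for iv in split(trace, interval_len)]
    return [i for i in range(1, len(counts))
            if abs(counts[i] - counts[i - 1]) > threshold]


# Toy trace: the program loops over blocks 0 and 1, then over blocks 2 and 3.
trace = [0, 1] * 50 + [2, 3] * 50
print(bbv_phase_changes(trace, 50, 0.5))
print(branch_counter_phase_changes(trace, {1, 3}, 50, 10))
```

In this toy trace, the BBV comparison flags the interval where the executed code changes. With `branch_blocks = {1, 3}` the branch counter sees the same count in every interval and reports nothing, which shows how a single scalar counter can miss a change that the vector comparison catches.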
As an auxiliary result, we show that techniques based on procedure granularities do not perform as well as those based on instruction or basic block granularities.
These slowdowns are often triggered by various long-latency stalls. Our goal is to develop a systematic approach to accurately predict these stalls and then address them either through countermeasures or by allowing the hardware resources to be used in independent tasks. This in turn [...] We are exploring a range of solutions to the basic research issues in phase detection and exploitation in an advanced execution environment. [...] Based on these techniques, we have obtained encouraging results on dynamic memory management, co-scheduling on multi-threaded chip [...]. At least three papers [10], [13], [25] will appear in [...] in parallel computing, computer architecture, and pro[...].
We have examined several microarchitectural performance bottlenecks, especially for high-performance numerical applications. We found that [...] and executed ahead of the real computation. One angle we looked at is to use binary analysis to identify load instructions that [...]. These instructions are then handled specially to yield precious microarchitectural tracking resources at run-time to reduce stalling [...] the hardware, not only saving energy but also relieving pressure on an important resource and therefore improving performance [13].
The authors' goal is for the identified phases to be valid across different architectures. Because the L2 miss count is architecture-dependent, the cache behavior of threads is represented by an architectural signature that is collected offline. The information needed to identify program phases is therefore obtained from the architectural signature. The output of the proposed method is a phase signature that contains information about the program phases on the architectures considered.
In the second half of the paper, the accuracy of the generated architectural signatures is evaluated, the phases of some Splash benchmarks are identified for different architectures, and the results are compared with the actual ones.
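As a rough illustration of how a phase signature could be derived from architecture-dependent miss counts, the sketch below segments a per-interval L2 miss-count series with a simple relative-change threshold. This is not the paper's method: the data layout (one miss-count list per architecture), the threshold rule, the architecture names, and the function names `segment_phases` and `phase_signature` are all assumptions.

```python
def segment_phases(miss_counts, tolerance):
    """Split a per-interval miss-count series into (start, end) phases, end exclusive.

    A new phase starts when a count deviates from the running mean of the
    current phase by more than `tolerance` (relative to that mean).
    """
    phases = []
    start, total = 0, 0.0
    for i, count in enumerate(miss_counts):
        n = i - start  # intervals already assigned to the current phase
        if n > 0:
            mean = total / n
            if abs(count - mean) > tolerance * max(mean, 1.0):
                phases.append((start, i))
                start, total = i, 0.0
        total += count
    if miss_counts:
        phases.append((start, len(miss_counts)))
    return phases


def phase_signature(architectural_signature, tolerance=0.25):
    """Map each architecture to the phases found in its miss-count series."""
    return {arch: segment_phases(counts, tolerance)
            for arch, counts in architectural_signature.items()}


# Hypothetical per-interval L2 miss counts for one thread on two architectures.
signature = {
    "arch_a": [100, 105, 98, 400, 410, 395],  # e.g. a smaller L2 cache
    "arch_b": [50, 52, 51, 55, 53, 54],       # e.g. a larger L2 cache
}
print(phase_signature(signature))
```

With these made-up numbers the same thread shows two phases on `arch_a` and a single phase on `arch_b`, which is the kind of architecture dependence that motivates producing a phase signature per architecture.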