Efficient Handling of Lock Hand-off in DSM Multiprocessors with Buffering Coherence Controllers |
| |
Authors: | Benjamín Sahelices, Agustín de Dios, Pablo Ibáñez, Víctor Viñals-Yúfera, José María Llabería |
| |
Affiliation: | Benjamín Sahelices¹, Agustín de Dios¹, Pablo Ibáñez² (Member, IEEE), Víctor Viñals-Yúfera² (Member, ACM, IEEE), and José María Llabería³. ¹ Computer Science Department and HiPEAC European Network of Excellence, University of Valladolid, Valladolid, Spain. ² Computer Science and Systems Engineering Department, I3A Research Institute and HiPEAC European Network of Excellence, University of Zaragoza, Zaragoza, Spain. ³ Computer Architecture Department and HiPEAC European Network of Excellence, Polytechnic University of Cataluña, Barcelona, Spain |
| |
Abstract: | Synchronization in parallel programs is a major performance bottleneck in multiprocessor systems. Shared data is protected
by locks and a lot of time is spent on the competition arising at the lock hand-off. In order to be serialized, requests to
the same cache line can either be bounced (NACKed) or buffered in the coherence controller. In this paper, we focus mainly
on systems whose coherence controllers buffer requests. In a lock hand-off, a burst of requests to the same line arrives at
the coherence controller. During lock hand-off, only the requests from the winning processor contribute to the progress of the
computation, since the winning processor is the only one that will advance the work. This key observation leads us to propose
a hardware mechanism we call request bypassing, which allows requests from the winning processor to bypass the requests buffered
in the coherence controller keeping the lock line. We present an inexpensive implementation of request bypassing that reduces
the time spent on all the execution phases of a critical section (acquiring the lock, accessing shared data, and releasing
the lock) and which, as a consequence, speeds up the whole parallel computation. This mechanism requires neither compiler
nor programmer support, nor any ISA or coherence protocol changes. By simulating a 32-processor system, we show that using request
bypassing does not degrade but rather improves performance in three applications with low synchronization rates, while in
those having a large amount of synchronization activity (the remaining four), we see reductions in execution time and in lock
stall time ranging from 14% to 39% and from 52% to 71%, respectively. We compare request bypassing with a previously proposed
technique called read combining and with a system that bounces requests, observing a significantly lower execution time with
the bypassing scheme. Finally, we analyze the sensitivity of our results to some key hardware and software parameters. |
| |
Keywords: | distributed shared memory multiprocessors; synchronization; buffer coherence controller; request bypass |
This article is indexed in CNKI, SpringerLink, and other databases. |
|
|
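The abstract gives no implementation details, so the following is only a toy illustration of the request bypassing idea, not the authors' hardware design. It models a coherence controller that buffers requests to a single lock line in FIFO order and, when a hypothetical `bypass` flag is set, lets requests from the current lock holder move ahead of the buffered ones. The class `BufferingController`, the helper `requests_before_release`, and the assumption that the controller knows which processor holds the lock are all invented for this sketch.

```python
from collections import deque


class BufferingController:
    """Toy model of a controller buffering requests to one lock line."""

    def __init__(self, bypass):
        self.bypass = bypass   # hypothetical switch: enable request bypassing
        self.queue = deque()   # buffered (processor_id, request) pairs
        self.owner = None      # processor holding the lock (assumed known)

    def enqueue(self, proc, req):
        # With bypassing, the lock holder's requests skip the buffered ones;
        # otherwise every request waits its turn in FIFO order.
        if self.bypass and proc == self.owner:
            self.queue.appendleft((proc, req))
        else:
            self.queue.append((proc, req))

    def service_next(self):
        proc, req = self.queue.popleft()
        if req == "acquire" and self.owner is None:
            self.owner = proc          # this processor wins the lock
        elif req == "release" and self.owner == proc:
            self.owner = None          # lock becomes free again
        # An "acquire" that arrives while the lock is held simply fails.
        return proc, req


def requests_before_release(bypass, waiters=4):
    """Count losing requests serviced before the holder's release."""
    ctrl = BufferingController(bypass)
    ctrl.enqueue(0, "acquire")
    ctrl.service_next()                # processor 0 wins the lock
    for proc in range(1, waiters + 1):
        ctrl.enqueue(proc, "acquire")  # burst of competing requests
    ctrl.enqueue(0, "release")         # holder leaves the critical section
    served = 0
    while ctrl.service_next() != (0, "release"):
        served += 1
    return served


if __name__ == "__main__":
    print("FIFO buffering:", requests_before_release(bypass=False))
    print("With bypassing:", requests_before_release(bypass=True))
```

In this model the holder's release waits behind every buffered competing request under plain FIFO buffering, while with the bypass policy it is serviced first. That mirrors, in a very simplified way, the observation that only the winning processor's requests advance the computation.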