Complexity-Effective Superscalar Processors

Subbarao Palacharla, Norman P. Jouppi, J. E. Smith
This paper analyzes the tradeoff between hardware complexity and clock speed in superscalar processors. It defines a generic superscalar pipeline and examines register renaming, instruction window wakeup and selection logic, and operand bypassing. These structures are modeled and simulated for feature sizes of 0.8μm, 0.35μm, and 0.18μm. The results show that window wakeup and selection logic, together with operand bypass logic, are likely to be the most critical in the future.

The paper then proposes a microarchitecture that simplifies wakeup and selection logic. It places chains of dependent instructions into queues and issues instructions from multiple queues in parallel. Simulation shows little slowdown compared to a completely flexible issue window when performance is measured in clock cycles. The microarchitecture also helps minimize the performance degradation caused by slow bypasses in future wide-issue machines.

The paper discusses the sources of complexity in a baseline microarchitecture, focusing on instruction dispatch and issue logic and on data bypass logic. It analyzes potential critical paths in these structures and develops models that quantify their delays. The analysis shows that the logic associated with the issue window and the data bypasses is likely to be a key limiter of clock speed because wire delays dominate its overall delay. The proposed dependence-based microarchitecture groups dependent instructions rather than independent ones, which simplifies the issue window logic while exploiting a level of parallelism similar to that of current superscalar microarchitectures that use more complex logic. An evaluation of the proposed microarchitecture shows improved performance. The paper also describes the methodology used to study the critical pipeline structures, including Hspice circuit simulations for the different feature sizes.
The results show that the delay of the structures increases with issue width and window size, and that wire delays become increasingly important as feature sizes are reduced. The paper concludes that the dependence-based microarchitecture can be clocked faster than typical microarchitectures by reducing the delay of the window logic significantly. This could improve the clock period by up to 39% in 0.18μm technology.
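
The queue-based issue scheme summarized above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: the names `Instr`, `steer`, and `simulate` are hypothetical, registers are assumed to be already renamed (each destination is written once), every instruction has a one-cycle latency, and the steering rule used here (append to a queue whose tail produces one of the instruction's outstanding operands, otherwise start a new chain in an empty queue, otherwise stall) is a simplified heuristic chosen for illustration. The point it tries to show is that only the instruction at the head of each queue is checked for readiness, rather than every entry in a window.

```python
from collections import deque
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str          # destination register (assumed unique after renaming)
    srcs: tuple = ()   # source registers


def steer(instr, fifos, pending):
    """Pick a FIFO index for instr, or None if dispatch must stall.

    pending maps a register to the not-yet-issued instruction that writes it.
    """
    # Prefer a FIFO whose tail is the producer of an outstanding operand,
    # so that a dependence chain stays in one queue.
    for src in instr.srcs:
        producer = pending.get(src)
        if producer is None:
            continue
        for i, q in enumerate(fifos):
            if q and q[-1] is producer:
                return i
    # Otherwise start a new chain in an empty FIFO.
    for i, q in enumerate(fifos):
        if not q:
            return i
    return None  # no suitable FIFO: stall


def simulate(program, num_fifos=4, dispatch_width=4):
    fifos = [deque() for _ in range(num_fifos)]
    pending = {}
    todo = deque(program)
    placement = {}   # instruction name -> FIFO index
    schedule = []    # one list of issued instruction names per cycle
    while todo or any(fifos):
        # Dispatch in program order; stop at the first instruction that stalls.
        for _ in range(dispatch_width):
            if not todo:
                break
            target = steer(todo[0], fifos, pending)
            if target is None:
                break
            instr = todo.popleft()
            fifos[target].append(instr)
            pending[instr.dest] = instr
            placement[instr.name] = target
        # Issue: only the head of each FIFO is examined.
        ready = [q for q in fifos
                 if q and not any(s in pending for s in q[0].srcs)]
        issued = [q.popleft() for q in ready]
        for instr in issued:
            del pending[instr.dest]  # result is visible from the next cycle
        schedule.append([i.name for i in issued])
    return placement, schedule


PROGRAM = [
    Instr("i1", "r1"),
    Instr("i2", "r2", ("r1",)),
    Instr("i3", "r3", ("r2",)),
    Instr("i4", "r4"),
    Instr("i5", "r5", ("r4",)),
    Instr("i6", "r6", ("r3", "r5")),
]

placement, schedule = simulate(PROGRAM)
print(placement)
print(schedule)
```

In this toy program the chain `i1`, `i2`, `i3`, `i6` shares one queue while `i4` and `i5` share another, and the two chains issue in parallel. The example is too small to say anything about the performance results reported in the paper; it only shows the mechanism.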