The system's novel features include a fast channel decomposer, an obstacle-avoiding global router, and an obstacle-avoiding switching router. The router's channel decomposition algorithm relies on a corner-stitched data structure to efficiently produce a small number of large channels. The global router considers obstacles during path generation, trading-off net length and channel complexity to simplify the subsequent channel routing task. While able to cope with obstacles, Magic's switchbox router is still comparable to the best traditional (non-obstacle-avoiding) channel routers.
The router's obstacle-avoidance features rely on two underlying concepts: (1) a preferred direction for crossing an obstacle, and (2) hazards, or areas the routing should avoid. Crossing obstacles in the preferred direction minimizes the creation of blocked areas, which can not be crossed by other routing; this minimizes obstacles' impact on the automatic routing. Crossings in preferred directions are controlled by strategically-placed hazards adjacent to obstacles.
Measurements show that obstacle-avoiding routing is both useful and practical: hand-routing improves the electrical characteristics of the selected nets, while the hand-routing obstacles have only minor effects upon the routing quality of a design as a whole. The improvement in electrical characteristics is due to the decreased net length and increased attention to layer selection possible when nets are prewired by hand.
The clock is the essence of a synchronous digital system. Physically, the clock is distributed from an external pad to all similarly clocked synchronizing elements through a distribution network that encompasses combinational logic and interconnects. It serves to unify the physical and temporal design abstractions by determining the precise instants in time at which the digital machine changes state. Because the clock is important, optimization of the clock signal can have a significant impact on the chip's cycle time, especially in high-performance designs. Non-optimal clock behavior is caused by two phenomena: the routing to the chip's synchronizing elements, and the asymmetric behavior of the clock distribution logic.
Previous work in clock optimization has been contributed by several authors. H-trees have been recognized for years as a technique to help reduce the skew in synchronous systems [FK82] [KGE82] [DFW84] [BWM86]. For regular structures such as systolic arrays the H-tree works well to reduce skew because the synchronizing elements are distributed in a uniform pattern throughout the chip. However, for general design styles, nonuniform distributions of clock pins are common and the H-tree becomes ineffective as a technique for clock routing. The large size of the clock net has led some researchers [DFW84] [Mij87] to perform buffer optimization within the clock distribution tree. [BWM86] have provided an analysis of the clock tree that considers the transmission line properties of the interconnects. [BBB +89] have presented an approach for ASIC clock distribution that integrates buffer optimization into place and route algorithms. However, in all previous work the routing of the clock net is performed using ordinary global routing tools based on minimum spanning or approximate minimum Steiner tree net models and with detailed routers that have little understanding of clock routing problems. This causes non-optimal clock behavior and as region size or the number of pins in the clock net increases, the detrimental behavior is exacerbated. In this paper, we focus on routing techniques for optimizing clock signals in VLSI circuits. We demonstrate the superiority of our algorithm over standard routing techniques for widely varying region size, clock pin distributions, numbers of clock pins and technology feature size.
In section two the preliminaries necessary for understanding the paper are presented. Following this, in section three the problem is defined. Section four illustrates the algorithm for clock routing and section five discusses theoretical results. Next, in section six the experimental results are presented, and in section seven possible avenues for future work and conclusions regarding the approach are discussed.
A communication network constructed from the proposed components is modeled as a set of nodes (components) connected by bidirectional communication links. Because of technological constraints, the total I/O bandwidth of each node is limited to some fixed value, and assumed to be equally divided among the attached links. Increasing the number of links per component leads to a reduction in the average number of hops between nodes, but at the cost of reduced link bandwidth. This "hop count/ link bandwidth" tradeoff is examined in great detail through M/M/1 queueing models and simulations using traffic loads generated by parallel application programs. These results indicate that a small number of links should be used. It is also found that a significant improvement in performance is obtained if a component is allowed to immediately begin forwarding a message when the selected output link becomes idle, regardless of whether or not the end of the message has arrived. Finally, mechanisms which efficiently transmit a single message to multiple destinations are seen to have a significant impact on performance in programs relying on global information.
The complexity of the circuitry required to implement a communication component is examined. Schemes for providing hardware support for communication functions--routing, buffer management, and flow control--are presented. Estimates of the number of buffers and the degree of multiplexing on each communication link are determined. The amount of circuitry to implement a communication component is computed, and it is seen that the proposed communication component could be complemented with technology available today. Design recommendations for the implementation of such a component are made.
Fast Packet Switches for Asynchronous Transfer Mode will form the basis for implementing the Broadband Intergated Services Digital Network of the future. This thesis presents the preliminary design of a switch chip that implements a novel flow control algorithm guaranteeing fair allocation and full utilization of bandwidth even under congestion. The switch chip is intented to be connected via its bit parallel links to other identical switch chip and various interface chips, so that it can be used as a building block for constructing composite switches of arbitrary topology and size. Its architecture is scalable up to a dozen multi-Gbit/sec links, although the first prototype is designed for only 4 bidirectional links of 400 Mbits/sec per link in each direction. The organization of the on-chip buffer memory, along with the back-pressure flow control, and the weighted round robin cell scheduling mechanisms that the chip implements in hardware, provide the network manager a set of powerful tools for tuning traffic flows and guaranteeing service performance. Full bandwidth utilization is achieved by providing dedicated buffer space to some classes of Virtual Circuits, and communication latency is reduced by using ``virtual cut-through''.The realization of this architecture under a 1\ \(*mm CMOS technology has been studied. We present the circuits that are crucial for the size and speed of the chip. These crucial circuits are primarily: the buffer memory, the input-output buffers, the content addressable memory used to select the next VC to be serviced, the circuit that schedules the accesses to the shared buffer, and the chip routing tables. We present the design and layout of the above circuits and the results of their simulation as a proof of the fact that the proposed architecture is realizable using the modern VLSI technology
1991.TR25.Fast_packet_switches.ps.Z
In this report, hierarchical decomposition is examined as a way to attack the general area routing problem. This divide and conquer method is a useful technique for managing complexity and in recent years has been applied to various routing problems. In particular, there are examples of this concept in channel routers [BP82, BP83a, HM85, HM89], switchbox routers [BP83b], and global routers [M584, LTW86, M586, Lau87]. This work describes a set of experiments that outlines the extension of these applications to general area routing problems.
We first consider the problem of global routing in gate-arrays. This problem has important applications in the design of integrated circuits, and can be formulated as a zero-one integer program. We introduce a technique we call randomized rounding for producing a provably good approximation to this integer program from the solution to its relaxation. In order to prove the quality of this approximation, we make use of some new bounds on the tail of the binomial distribution. We present the results of experiments conducted on industrial gate=arrays using our methods; these are encouraging and call for further work.
We then show that our randomized rounding technique can be applied to some problems in combinatorial optimization and operations research. We also describe the relation between the problems we study and a class of combinatorialresults known as "discrete ham-sandwich theorems". This leads to the problem of rounding linear program solutions deterministically mimic the randomized algorithm in a certain precise sense. This leads us to the development of a deterministic polynomial time rounding algorithm that yields the same performance guarantees as the random ized method.
To overcome this limitation, I have developed a model for macro-module layout that employs a high degree of tool interaction. The module floor planner decides the relative placement and shape of function blocks that minimizes external wiring while maximizing the amount of over the cell routing. Cell layouts are provided on demand by the SoGOLaR automatic cell generator; their quality and feasibility are, in turn, dependent on the constraints imposed by the external wiring. This model can easily accommodate modules with non-uniform function blocks.
Inter-tool communication is handled by maintaining a database of layouts based on a data structure that contains both the layout state and associated metrics. The database handles all layout requests, and maintains a history of both successful and unsuccessful synthesis attempts, all while imposing few restrictions on tool interaction.
A separate failure handler is employed to remedy situations where the normal tool interaction sequence produces a floor plan that cannot be routed or no longer meets specified constraints. The handler will undo portions of the layout and re-invoke the synthesis tools, altering their control parameters and constraints to guide them toward a more promising solution. Repetition is prevented by restricting constraint alterations and by checking the layout database for previous failed layout attempts. Remedies are controlled by a greedy strategy that is significantly less complex than general backtracking methods.
A merged hierarichal approach to layout presented that provides a means of investigating the relationship between placement and routing to improve current layout methods. A common data model is used to integrate layout phases and to simplify and enhance information flow within the layout process. The data model induces a structure to the layout process for experimenting with feedback and information management.
A layout system based on a 2x2 grid-graph abstraction that combines placement and global routing has been implemented based on this paradigm. Examples from sea-of-gates designs are used for benchmark comparisons.