The processor chip, invented in 1971, has played a defining role in the computer, and the computer has since evolved in step with the processor chip. The ISA (Instruction Set Architecture) of this "computer on a chip" has come to dominate the world. In 1987 the concept of the system-on-chip (SoC) was proposed to study how computer system design could migrate to system-on-chip design, which would eventually take its place. System chips include bus-interconnected MP (Multi-Processor) system chips and network-interconnected AP (Array Processor) system chips, but the AP system chip has not yet matured, so its design offers an opportunity for competition. We have therefore studied the architecture of MPP (Massively Parallel Processing) system chips. This article examines the development of the array processor system chip in terms of the unification of four aspects: the computation modes that control data flow, the parallel computing array chip, application-evolving mathematical technology, and silicon-based chip manufacturing technology, and discusses how to design an array processor system chip with a unified architecture, referred to as the APU (Array Processing for Unification) system chip.
Unification of the computation modes that control data flow
The Turing abstract machine of 1935 defined computation as controlling the flow of data to complete a calculation. Three computation modes that control data flow have since taken shape: instruction flow, data flow, and structure flow. The dominant mode today is von Neumann's instruction-flow mode, which has four architectures: SISD, SIMD, MISD, and MIMD. Current single-core/multi-core/many-core chips, however, implement only the SISD instruction-flow mode, together with low-parallelism instruction-flow modes such as MMX (SIMD), pipelining (MISD), and VLIW (MIMD). Because the SIMD instruction-flow mode best suits image processing algorithms, SIMD-architecture processors and computers have long since been developed. The data-flow mode is realized by circuit-designed ASIC/ASSP chips or statically reconfigured FPGA chips, while the structure-flow mode is realized by reconfigurable RCDevice (ReConfigurable Device) chips. Their computational efficiency is high, but their application design threshold is also high, they lack programming flexibility, and they require many chip varieties. We have therefore studied and implemented MISD/MIMD instruction-flow computation modes, which combine the computational efficiency of the data-flow/structure-flow modes with programming flexibility, a low application design threshold, and a small variety of chips. The unification of computation modes means using the MISD/MIMD instruction-flow modes to replace the inflexible data-flow/structure-flow modes, so that all computation is unified into the instruction-flow mode.
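As a minimal illustrative sketch (not from the original text), the difference between the SISD mode and the SIMD mode described above can be shown by contrasting a scalar loop, which issues one operation per datum, with a simulated vector operation, which applies one "instruction" across a whole lane group at once. The `lanes` parameter and both function names are hypothetical.

```python
# Illustrative sketch of SISD vs SIMD instruction-flow styles.
# SISD: one instruction operates on one datum at a time.
# SIMD: one instruction is broadcast across a vector of data elements.

def sisd_add(a, b):
    """SISD style: a scalar loop, one add per element pair."""
    result = []
    for x, y in zip(a, b):
        result.append(x + y)          # one instruction, one datum
    return result

def simd_add(a, b, lanes=4):
    """SIMD style: each simulated vector instruction covers `lanes` elements."""
    result = []
    for i in range(0, len(a), lanes):
        # one (simulated) vector instruction handles `lanes` elements at once
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
    return result

a, b = list(range(8)), list(range(8, 16))
assert sisd_add(a, b) == simd_add(a, b)
```

Both functions compute the same sums; the point is the granularity of control, which is why the SIMD mode suits regular data such as images.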
Unification of parallel computing array chips
From the perspective of parallel computing, there are array chips for task-level, data-level, operation-level, and instruction-level parallel computing. Current MPP computers complete their computation mainly through TLP (Task Level Parallelism), implemented with single-core/multi-core/many-core chips; these chips are evolving into MP system chips and AP system chips for TLP computing. TLP computing is MPMD computation that maps tasks (processes/threads) onto cores (processors). Because of the synchronization and mutual exclusion problems between tasks (processes/threads), TLP computation has low efficiency and complex programming. DLP (Data Level Parallelism) computation follows the SIMD mode and is mainly implemented by the SIMD architecture of the instruction-flow mode; GPUs and other system chips already exist, as do MPP computers built from GPUs or CPU+GPU combinations. OLP (Operation Level Parallelism) computation is performed on ASIC/ASSP/FPGA array chips in the data-flow mode and on RCDevice array chips in the structure-flow mode, and offers no programming flexibility. Exploring the 4-dimensional space-time relationship of parallel computing draws on both science and art. The APU system chip uses abutting interconnection between PEs (Processing Elements) to exploit this 4-dimensional space-time parallel computing relationship, realizing both DLP computation and ILP (Instruction Level Parallelism) computation. The unification of array chips means realizing SIMD DLP computation and MISD/MIMD ILP computation with an APU system chip whose processing elements are interconnected by abutting.
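The abutting interconnection described above can be sketched, under assumptions of our own (the `PE` class and its single-accumulate behavior are hypothetical, not the APU's actual datapath), as a one-dimensional chain of PEs in which each element consumes a value from its left neighbor and passes a result to its right neighbor, the classic systolic pattern used by array processors:

```python
# Hypothetical sketch of nearest-neighbor ("abutting") PE communication:
# a linear chain of PEs, each adding its local weight to the value
# received from its left neighbor and forwarding the sum to the right.

class PE:
    def __init__(self, weight):
        self.weight = weight

    def step(self, value_from_left):
        # one "cycle": consume the neighbor's value, emit to the right
        return value_from_left + self.weight

def linear_array(weights, x):
    """Push an input x through a 1-D chain of abutting PEs."""
    pes = [PE(w) for w in weights]
    value = x
    for pe in pes:                 # data flows left -> right, PE to PE
        value = pe.step(value)
    return value

assert linear_array([1, 2, 3], 0) == 6   # 0+1+2+3
```

Because each PE talks only to its immediate neighbors, no global bus or shared memory is needed, which is what keeps the interconnect overhead of such arrays low.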
Unification of application-evolving mathematical technology
Computational science is a "mathematical technology" born of mathematical and engineering thinking, and it has changed the way people think. With chip integration increasing at the pace Moore predicted, mathematical technology has driven new developments in computers across the application evolution of high-performance computing, networked computing, and embedded computing. High-performance computers mainly help humanity understand and create the world through simulation; examples include earth simulators, Blue Storm, cosmology computers, code breakers, and weapons simulators. The names of these computers indicate the evolution of their applications, which require complex mathematical models and experimental or observational databases built with mathematical techniques. The core of simulation is to establish a mathematical model of the real or virtual system under study; the model and its database in turn shape the high-performance computer architecture. The communication function of networked computing has been very successful and has fundamentally changed the world's information infrastructure. Now, with the evolution of mathematical technology, the role of computer networks has grown from communication to resource-sharing services, known as net-centric computing/grid computing and network storage. Supported by high-performance parallel computing and mass storage systems, cloud computing and SaaS (Software as a Service) or HaaS (Hardware as a Service) enable next-generation data centers to play the service role of "data power plant" and "data bank".
Embedded computing is a service model that combines computing technology with the physical world; some call it reified or physicalized computing. It simulates the way humans interact with the physical world: the computer gains sensors (simulating human vision, hearing, and other senses) and actuators (simulating human limbs), and through applied mathematical technology lets industrial machines work autonomously like humans. Although the current mathematical technology of artificial intelligence gives robots only logical thinking and partial image-based thinking, with essentially no creative thinking, it has brought creative methods to robotics research. In form, there are humanoid and non-humanoid robots. The shape-shifting robots of the US Department of Defense use application-evolving mathematical technology to give robots the ability to self-assemble, ensuring that they can successfully land on a planet's surface. In functional realization, there are artificial methods and natural bionic methods. Artificial-method robots include surgical robots and autonomous driving robots; bionic-method robots include air-sonar robots, gravity-walking robots, chemical robots, neuron robots, emotional robots, robots that simulate biological evolution, and molecular robots. Bionic methods make the computation of application-evolving mathematical technology increasingly natural. The rapid development of computing technology is also reflected in the evolution of programming languages, from the early Fortran and Algol, through Basic, to today's C language, which is close to assembly language. Mathematical technology is ultimately mapped onto the computer through assembly language to complete the computation. The advantage of assembly language is high program quality; its disadvantages are poor readability, no compatibility, and lack of uniformity.
Therefore, the ISA of the APU system chip is not described in a mnemonic assembly language. Instead, a mapping language oriented to mathematical technology and instruction definition, called the M language (Mapping/Middle Language), is used to describe the ISA. Unifying mathematical technology into this mapping language improves program reusability.
Unification of silicon-based chip manufacturing technology
Quantum computing and biological computing are still in the exploratory stage; today's computers are built with silicon-based chip manufacturing technology. Silicon-based manufacturing is expected to approach its development limit around 2016, so new technological breakthroughs must be found. For example, expanding the chip area is one new way to raise integration: Wafer Scale Integration (WSI) technology. For another, hybrid integrated circuits offer a miniaturized, high-performance, highly reliable interconnect packaging method, known in China as secondary integration technology. In 1993 the Georgia Institute of Technology proposed the SoP (System on Package) concept, which integrates SoC chips, MEMS chips, and passive components. IC chips developed according to Moore's Law account for only 10% of a system's volume, while SoP addresses the remaining 90%. Notably, in 2007 Intel took the lead in achieving 45nm silicon-based chip production capacity, moving the semiconductor industry into the era of the "material-driven revolution"; 32nm chips integrating nearly 2 billion transistors are close to practical use.
To solve the "red brick wall" problem of deep-submicron technology and meet the miniaturization needs of embedded applications, TSV three-dimensional integration technology for silicon-based chips has been developed. IBM, Intel, and Samsung have all adopted TSV (Through-Silicon Via) three-dimensional integration. According to IBM, TSV technology can shorten the distance chip data must travel by a factor of 1,000, increase the number of connections by a factor of 100, and reduce power consumption by as much as 20%. IBM plans to apply TSV technology to wireless communication chips, Power processors, Blue Gene supercomputer chips, and high-bandwidth memory. The "Sixteen Special Projects" proposed at China's 2006 National Science Conference embody the industrial chain of chip design, manufacturing, and application. Driven by these strategic tasks, China's chip technology is expected to keep pace with "Moore's prophecy". The unification of manufacturing technology means unifying on three-dimensional TSV integration to miniaturize embedded computers and solve the deep-submicron red brick wall problem; it is also the only way to improve China's chip manufacturing capability. In terms of design, the array architecture of the APU system chip, as well as sensor, display, and memory chips, are all arrays, which makes them well suited to TSV technology.
From the perspective of PE interconnection structure, array processors can be divided into four prototypes:
Linear array processor (LAP, Linear Array Processor)
Square array processor (SAP, Square Array Processor)
Pyramid processor (PYR, PYRamid)
Hypercube processor (HPR, HyPeRcube)
Among them, the square array processor appears to match the two-dimensional structure of images better. However, previous studies have found that, with the same number of PEs, the computing efficiency and data throughput of the LAP are no lower than those of the SAP, while the LAP has smaller hardware overhead.
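The hardware-overhead claim above can be made concrete with a back-of-the-envelope link count (our own illustrative sketch, not a result from the original text): for the same number N of PEs, a 1-D chain needs N-1 nearest-neighbor links while a k x k grid (N = k*k) needs 2k(k-1) horizontal and vertical links.

```python
# Illustrative link-count comparison: nearest-neighbor links as a rough
# proxy for interconnect hardware overhead in a linear array (LAP)
# versus a square array (SAP) with the same number N of PEs.
import math

def lap_links(n):
    """1-D chain of n PEs: each PE abuts the next, so n-1 links."""
    return n - 1

def sap_links(n):
    """k x k grid of n = k*k PEs: horizontal plus vertical neighbor links."""
    k = math.isqrt(n)
    assert k * k == n, "SAP needs a perfect-square PE count"
    return 2 * k * (k - 1)      # k rows of k-1 links, plus k columns of k-1

for n in (16, 64, 256):
    print(n, lap_links(n), sap_links(n))   # SAP grows faster than LAP
```

For N = 256 PEs this gives 255 links for the LAP against 480 for the SAP, consistent with the LAP's smaller hardware overhead; actual overhead would also depend on link width and PE port count, which this sketch ignores.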