The central processing unit (CPU) is the core component of an electronic computer. Its main function is to interpret computer instructions and process the data used by software: it reads instructions, decodes them, and executes them. The CPU consists mainly of two parts, the controller and the arithmetic unit, together with high-speed cache memory and the buses that carry data and control signals between them. The three core components of an electronic computer are the CPU, internal memory, and input/output devices. The main functions of the CPU are to process instructions, perform operations, control timing, and process data.
In computer architecture, the CPU is the core hardware unit that controls all of the computer's hardware resources (such as memory and input/output units) and performs general-purpose operations. It is the computing and control core of the computer: the operation of every software layer in the system is ultimately mapped onto CPU operations through the instruction set.
The CPU appeared in the era of large-scale integrated circuits. Iterative updates to processor architecture design and continuous improvements in integrated circuit technology have driven its development. CPUs have evolved from devices dedicated to mathematical calculation to general-purpose processors; from 4-bit to 8-bit, 16-bit, 32-bit, and finally 64-bit designs; and from mutually incompatible products of different manufacturers to a set of distinct instruction set architecture specifications. The CPU has developed rapidly since its inception.
CPU development has a history of about 50 years, which is usually divided into six stages.
(1) The first stage (1971-1973). This was the era of 4-bit and 8-bit low-end microprocessors, represented by the Intel 4004 processor.
In 1971, Intel's 4004 microprocessor integrated the arithmetic unit and the controller on a single chip, marking the birth of the CPU. In 1978, the 8086 processor laid the foundation for the x86 instruction set architecture, and processors based on that architecture subsequently came into wide use in personal computers, high-performance servers, and cloud servers.
(2) The second stage (1974-1977). This was the era of 8-bit mid-range and high-end microprocessors, represented by the Intel 8080. By this time the instruction set was relatively complete.
(3) The third stage (1978-1984). This was the era of 16-bit microprocessors, represented by the Intel 8086. By this stage the microprocessor was relatively mature.
(4) The fourth stage (1985-1992). This was the era of 32-bit microprocessors, represented by the Intel 80386. Processors of this stage were already capable of multi-tasking and multi-user operation.
The 80486 processor, released in 1989, implemented a five-stage scalar pipeline, marking the initial maturity of the CPU and the end of the development phase of the traditional processor.
(5) The fifth stage (1993-2005). This was the era of the Pentium series of microprocessors.
Intel released the Pentium processor in 1993. It was the first x86 processor to adopt a superscalar instruction pipeline, and it also introduced branch prediction. The Pentium Pro, released in November 1995, added out-of-order execution of instructions. Together these techniques greatly improved processor performance, and the superscalar pipeline structure has been adopted by subsequent modern processors such as AMD (Advanced Micro Devices) Ryzen and Intel's Core series.
(6) The sixth stage (2005 to 2021). Processors have gradually developed towards more cores and higher parallelism. Typical representatives are Intel's Core series processors and AMD's Ryzen series processors.
To meet the requirements of the operating system and the software above it, modern processors have further introduced features such as parallelization, multi-core, virtualization, and remote management, which continue to promote the development of higher-level information systems.
The von Neumann architecture is the basis of modern computers. Under this architecture, programs and data are stored in a unified manner: instructions and data are accessed from the same storage space and transmitted over the same bus, so instruction fetches and data accesses cannot overlap. According to the von Neumann model, the CPU's work on each instruction is divided into five stages: instruction fetch, instruction decode, execute, memory access, and result write-back.
Instruction fetch (IF) is the process of fetching an instruction from main memory into the instruction register. The value in the program counter (PC) indicates the position of the current instruction in main memory. When an instruction is fetched, the value in the PC is automatically incremented by the length of the instruction word.
In the instruction decode stage (ID), the instruction decoder splits and interprets the fetched instruction according to the predetermined instruction format, identifying the instruction type and the method of obtaining its operands. Modern CISC processors also split instructions into simpler internal operations at this stage to improve parallelism and efficiency.
The execute stage (EX) carries out the function of the instruction. The different parts of the CPU are connected as needed to perform the required operation.
In the memory access stage (MEM), the CPU accesses main memory if the instruction requires it: it obtains the address of the operand in main memory and reads the operand from memory for use in the calculation. Instructions that do not need to access main memory can skip this stage.
The result write-back stage (WB) is the last stage. It writes the result data of the execute stage back to some form of storage. The result is generally written to an internal CPU register so that subsequent instructions can access it quickly. Many instructions also change the state of the flag bits in the program status word register; these flag bits indicate different properties of the result and can be used to influence the behavior of the program.
After the instruction has been executed and its result written back, if no unexpected event (such as result overflow) has occurred, the computer obtains the address of the next instruction from the program counter and starts a new cycle. In this way, each instruction cycle leads on to the fetch of the next instruction in sequence.
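The five stages described above can be illustrated with a toy simulator. This is only a minimal sketch, not a model of any real processor: the three-field instruction format, the opcode names (LOAD, ADD, STORE, HALT), and the run function are all assumptions made for illustration.

```python
# Toy illustration of the fetch / decode / execute / memory access / write-back cycle.
# The instruction format and opcode names are hypothetical, chosen only for this sketch.

def run(program, memory):
    """Run a list of (opcode, register, operand) tuples and return the register file."""
    regs = {"r0": 0, "r1": 0}
    pc = 0  # program counter: index of the next instruction
    while True:
        # IF: fetch the instruction the PC points at, then advance the PC
        instruction = program[pc]
        pc += 1
        # ID: split the instruction into its fields
        opcode, reg, operand = instruction
        if opcode == "HALT":
            return regs
        result = None
        if opcode == "LOAD":
            # MEM: read the operand from main memory
            result = memory[operand]
        elif opcode == "ADD":
            # EX: add an immediate value to the register
            result = regs[reg] + operand
        elif opcode == "STORE":
            # MEM: write the register value to main memory; no register result
            memory[operand] = regs[reg]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        # WB: write the result back to the destination register, if there is one
        if result is not None:
            regs[reg] = result


memory = {0: 40, 1: 0}
program = [
    ("LOAD", "r0", 0),    # r0 <- memory[0]
    ("ADD", "r0", 2),     # r0 <- r0 + 2
    ("STORE", "r0", 1),   # memory[1] <- r0
    ("HALT", None, None),
]
registers = run(program, memory)
```

Note that ADD skips the memory access stage and STORE skips write-back, which mirrors the remark above that instructions only use the stages they need.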
Performance and structure
Performance measurement indicators
For the CPU, the indicators that affect performance mainly include the clock frequency (main frequency), the number of CPU bits, the CPU cache, the instruction set, the number of CPU cores, and IPC (instructions per cycle). The CPU frequency is the clock frequency; together with IPC it determines how many instructions the CPU can execute per second. Overclocking raises the clock frequency to obtain higher performance. The number of CPU bits refers to the width, in binary bits, of the data that the processor can operate on at one time. Generally, the more bits a CPU has, the faster it operates on wide data. By the 2020s, the CPUs used in personal computers are generally 64-bit, because 64-bit processors can process a larger range of data and natively support a larger memory address space. The instruction set is the set of instructions, implemented in hardware inside the CPU, that direct and optimize its operation. The CPU cache is generally divided into a first-level cache, a second-level cache, and a third-level cache, and cache performance directly affects the processing performance of the CPU. Some CPUs with special functions are equipped with a fourth-level cache.
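How these indicators combine can be sketched numerically. The estimate time = instructions / (IPC * frequency) is a common first-order approximation rather than something stated in this article, and the function names and sample numbers below are assumptions chosen for illustration.

```python
def execution_time_seconds(instruction_count, ipc, clock_hz):
    """First-order estimate: time = instructions / (IPC * clock frequency)."""
    return instruction_count / (ipc * clock_hz)


def addressable_bytes(address_bits):
    """Number of distinct byte addresses that fit in an address of the given width."""
    return 2 ** address_bits


# Hypothetical workload: 6 billion instructions on a 3 GHz core.
baseline = execution_time_seconds(6_000_000_000, ipc=1.0, clock_hz=3_000_000_000)
higher_ipc = execution_time_seconds(6_000_000_000, ipc=2.0, clock_hz=3_000_000_000)

# A 32-bit address reaches 4 GiB; a 64-bit address reaches far more.
gib_32 = addressable_bytes(32) // 2 ** 30
```

In this sketch, doubling IPC at the same clock frequency halves the estimated time, which is why frequency alone does not determine performance.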
Generally speaking, the structure of a CPU can be roughly divided into arithmetic logic components, register components, and control components. The arithmetic logic components are functional units that perform logical operations, shift operations, fixed-point and floating-point arithmetic operations, and address operations and conversions. The register components temporarily store instructions, data, and addresses. The control components analyze instructions and issue the corresponding control signals.
Computer memory can be divided into random access memory (RAM) and read-only memory (ROM). RAM exchanges data directly with the CPU and is also called main memory. It can be read and written at any time, and it is very fast, so it is often used as a temporary data storage medium for the operating system and other running programs. ROM can only be read: its data is stored in advance, the user cannot change or delete it, and it does not disappear when the power is turned off. ROM is widely used in electronic and computer systems whose data does not need to change frequently.
The central processing unit can be regarded as a large-scale integrated circuit whose main task is to process various kinds of data. Traditional computers had relatively small storage capacity, had difficulty processing data at large scale, and processed it relatively inefficiently. With the rapid development of China's information technology, computers with high-end processors have appeared; using such a processor as the control center plays an important role in improving the structure and function of the computer's CPU. The core of the central processing unit is the controller and the arithmetic unit, which play an important role in the overall function of the computer. They provide multiple functions such as register control, logic operation, and signal transmission and reception, and lay a good foundation for improving the performance of the computer.
Integrated circuits carry the control signals in the computer and execute different command tasks according to the user's operating instructions. The central processing unit is a very-large-scale integrated circuit composed of arithmetic units, controllers, registers, and so on, as shown in the figure below. Its key job is the processing of various kinds of data.
Traditional computers have relatively small storage capacity and low efficiency on large data sets. The new generation of computers uses a high-end processor as the control center, and the CPU has considerable room for improvement in structure and function. The central processing unit takes the arithmetic unit and the controller as its main devices and has gradually extended to multiple functions such as logic operations, register control, program coding, and signal transmission and reception. All of these speed up the optimization and upgrading of CPU control performance.
The CPU bus is the fastest bus in a computer system, and it is also the core of the chipset and motherboard. The local bus directly connected to the CPU is usually called the CPU bus or internal bus, and the local buses connected to the general-purpose expansion slots are called the system bus or external bus. A CPU with a relatively simple internal structure often has only one set of data transfer buses, the CPU internal bus, which connects the CPU's internal registers to the arithmetic logic components; this type of bus can therefore also be called the ALU bus. A component-internal bus connects the chips within a component using a set of lines, generally comprising address lines and data lines. The system bus connects the various components inside the system and is the basis for connecting the whole system together; the external bus connects the computer to other devices.
The core part
The arithmetic unit
The arithmetic unit is the part of the computer that performs arithmetic and logical operations, and the arithmetic logic unit is the core part of the central processing unit.
(1) Arithmetic logic unit (ALU). The arithmetic logic unit is a combinational logic circuit that implements multiple arithmetic and logic operations, and it is an important part of the central processing unit. It mainly performs binary arithmetic operations, such as addition, subtraction, and multiplication. During computation, the ALU carries out the arithmetic and logical operations specified by computer instructions. Generally speaking, the ALU can read and write directly; this involves the processor's controller, memory, and input/output devices, with input and output carried over the bus. An input instruction consists of an instruction word that includes the operation code, format code, and so on.
(2) Intermediate register (IR). Its length is 128 bits, and the length actually used is determined by the operand. The IR plays an important role in the "push onto stack and fetch" instruction: during execution of this instruction, the contents of ACC are sent to IR, the operand is then fetched into ACC, and finally the contents of IR are pushed onto the stack.
(3) Operational accumulator (ACC). Current designs generally have a single accumulator with a length of 128 bits. The ACC can be regarded as a variable-length accumulator. In describing instructions, the length of ACC is generally expressed through the value of ACS; the two are directly related, so doubling or halving ACS can be regarded as doubling or halving the length of ACC.
(4) Description word register (DR). It is mainly used to store and modify description words, and its length is 64 bits. Description words play an important role in simplifying the processing of data structures.
(5) Register B. It plays an important role in address modification. The B register is 32 bits long and holds the address modification amount while an address is being modified. A main memory address can only be modified together with a description word. The description word points to the first element of an array, so access to other elements of the array requires a modifier. An array is composed of elements of the same size stored contiguously, and it is commonly accessed through a vector description word. Because the address in a vector description word is a byte address, the conversion must first add the base address. The conversion is performed automatically by hardware, and special attention must be paid to alignment and to staying within the array boundary.
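The behavior of the arithmetic logic unit in item (1), and the flag bits mentioned earlier for the write-back stage, can be sketched in software. This is a minimal illustration, assuming an 8-bit data width; the operation names and the three flags shown are choices made for this sketch, not the layout of any particular processor.

```python
# Minimal sketch of an 8-bit ALU; operation names and flag layout are assumptions.

MASK = 0xFF  # keep results within 8 bits


def alu(op, a, b):
    """Return (result, flags) for an 8-bit operation on unsigned operands a and b."""
    if op == "add":
        raw = a + b
    elif op == "sub":
        raw = a - b
    elif op == "and":
        raw = a & b
    elif op == "or":
        raw = a | b
    elif op == "shl":
        raw = a << b
    else:
        raise ValueError(f"unsupported operation: {op}")
    result = raw & MASK
    flags = {
        "zero": result == 0,
        # carry: the unmasked value did not fit in 8 bits (for sub, a borrow occurred)
        "carry": raw != result,
        # negative: the top bit of the 8-bit result is set
        "negative": bool(result & 0x80),
    }
    return result, flags


sum_result, sum_flags = alu("add", 200, 100)   # wraps around past 255
diff_result, diff_flags = alu("sub", 5, 5)     # produces zero
```

A program can then branch on flags such as zero or carry, which is how the flag bits influence later instructions.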
The controller
In the general sense, a controller is a master device that changes the wiring of a main circuit or control circuit, or the resistance in the circuit, in a predetermined sequence, for example to control the starting, speed, braking, and reversing of a motor. In the CPU, the controller is composed of the program status register (PSR), the system status register (SSR), the program counter (PC), and the instruction register. As the "decision-making body", its main task is to issue commands and to coordinate and direct the operation of the entire computer system. There are two main kinds of controller, the combinational logic controller and the microprogram controller, each with its own advantages and disadvantages. The combinational logic controller has a relatively complex structure but is faster; the microprogram controller has a simple structure, but modifying the function of a machine instruction requires rewriting the corresponding microprogram.
Introduction of related brands
"Loongson" series chips
The "Loongson" series chips are designed and developed by Zhongke Technology Co., Ltd. of the Chinese Academy of Sciences. They use the MIPS architecture and have independent intellectual property rights. The products now include three series, the Loongson 1 small CPU, the Loongson 2 medium CPU, and the Loongson 3 large CPU, in addition to the Loongson 7A1000 bridge chip. The Loongson 1 series 32/64-bit processors are designed specifically for the embedded field and are mainly used in cloud terminals, industrial control, data acquisition, handheld terminals, network security, consumer electronics, and other fields, with low power consumption, high integration, and high cost performance. Among them, the Loongson 1A 32-bit processor and the Loongson 1C 64-bit processor work stably at 266-300 MHz, and the Loongson 1B processor is a lightweight 32-bit chip. The Loongson 1D processor is a dedicated chip for ultrasonic heat meters, water meters, and gas meters. In 2015, the new generation of Beidou navigation satellites was equipped with the Loongson 1E and 1F chips independently developed by China. These two chips are mainly used for the data processing tasks of the inter-satellite link.
The Loongson 2 series are 64-bit high-performance, low-power processors for desktop and high-end embedded applications. Loongson 2 products include the Loongson 2E, 2F, 2H, and 2K1000 chips. The Loongson 2E was the first Loongson chip to be licensed for external production and sales. The average performance of the Loongson 2F is more than 20% higher than that of the Loongson 2E, and it can be used in personal computers, industrial terminals, industrial control, data acquisition, network security, and other fields. The Loongson 2H was officially launched in 2012; it suits the needs of computers, cloud terminals, network equipment, consumer electronics, and other fields, and can be used as a full-featured chip with HT or PCI-e interfaces. In 2018, Loongson launched the Loongson 2K1000, a dual-core processor aimed mainly at network security and mobile intelligence. Its clock frequency can reach 1 GHz, which can meet the needs of the rapidly developing industrial Internet of Things and of an independent and controllable industrial security system.
The Loongson 3 series are multi-core processors for high-performance computers, servers, and high-end desktop applications, featuring high bandwidth, high performance, and low power consumption. The Loongson 3A3000/3B3000 processors adopt an independently designed microarchitecture, and their clock frequency can reach 1.5 GHz or more. The Loongson 3A4000, which was planned for market release in 2019, is the first quad-core chip of Loongson's third-generation products. The chip is based on a 28 nm process and adopts the newly developed GS464V 64-bit high-performance processor core architecture. It implements 256-bit vector instructions, optimizes on-chip interconnection and memory access, and integrates a 64-bit DDR3/4 memory controller and an on-chip security mechanism, so that clock frequency and performance are again greatly improved.
The Loongson 7A1000 is Loongson's first dedicated bridge chipset. Its goal is to replace the AMD RS780+SB710 chipset and provide north bridge and south bridge functions for Loongson processors. It was released in February 2018 and is currently used in a high-performance network platform together with the Loongson 3A3000 and Ziguang 4G DDR3 memory. Compared with the 3A3000+780e platform, overall performance is greatly improved, and the platform offers a high proportion of domestically produced components, high performance, and high reliability.
According to Intel's product line plan, as of 2021 Intel's 11th-generation consumer lineup has six product lines: Core i9, Core i7, Core i5, Core i3, Pentium, and Celeron. In addition, there are Xeon Platinum/Gold/Silver/Bronze processors for servers and the Xeon W series for the HEDT platform.
According to AMD's product line plan, as of 2021 AMD's Ryzen 5000 series processors have four consumer product lines: Ryzen 9, Ryzen 7, Ryzen 5, and Ryzen 3. In addition, there are the third-generation EPYC (Xiaolong) processors for the server market and the Ryzen Threadripper series for the HEDT platform.
Shanghai Zhaoxin Integrated Circuit Co., Ltd. is a state-owned holding company established in 2013. Its processors adopt the x86 architecture. Its products mainly include the Kaixian ZX-A, ZX-C/ZX-C+, ZX-D, KX-5000, and KX-6000, and the Kaisheng ZX-C+, ZX-D, and KH-20000. Among them, the Kaixian KX-5000 series processors use a 28 nm process and are available in 4-core and 8-core versions. Their overall performance is up to 140% higher than that of the previous generation, reaching the level of mainstream international general-purpose processors, which can fully meet the needs of party and government desktop office applications and of a variety of entertainment applications, including 4K ultra-high-definition video. The Kaisheng KH-20000 series processors are CPU products launched by Zhaoxin for servers and similar equipment. The Kaixian KX-6000 series processors have a clock frequency of up to 3.0 GHz and are compatible with the full range of Windows operating systems and with domestic independent and controllable operating systems such as Zhongke Fangde, NeoKylin, and Puhua. Their performance is equivalent to that of Intel's seventh-generation Core i5.
The Shenwei (Sunway) processor, abbreviated "SW processor", derives from DEC's Alpha 21164 and adopts the Alpha architecture, with completely independent intellectual property rights. Its products include the single-core SW-1, the dual-core SW-2, the quad-core SW-410, and the sixteen-core SW-1600/SW-1610. The Sunway BlueLight supercomputer uses 8704 SW-1600 processors and runs the Sunway Ruisi operating system, achieving full localization of software and hardware. After its release in June 2016, the Sunway TaihuLight supercomputer, based on the SW-26010, ranked first in the world's TOP500 supercomputer list four consecutive times, and two applications run on the full TaihuLight machine at the scale of ten million cores won the Gordon Bell Prize, the world's highest award for high-performance computing applications, in 2016 and 2017.
Instruction set method
CPUs can also be classified by instruction set into reduced instruction set computers (RISC) and complex instruction set computers (CISC). RISC instructions have a constant length and execution time, while CISC instructions do not necessarily. RISC instructions lend themselves better to parallel execution, and compilers for them are more efficient. CISC instructions are better optimized for particular tasks, at the cost of more complex circuits and greater difficulty in improving parallelism. The typical CISC instruction set is x86, and the typical RISC instruction set is ARM. However, in modern processor architectures, instructions of both kinds are converted during decoding and split into RISC-like instructions inside the CPU.
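One consequence of the constant instruction length mentioned above is that instruction boundaries are known before decoding, whereas a variable-length stream has to be walked one instruction at a time. The sketch below illustrates this under stated assumptions: the 4-byte fixed width and the "first byte holds the total length" encoding are hypothetical and do not correspond to ARM or x86 encodings.

```python
def split_fixed(code, width=4):
    """Fixed-length encoding: every instruction boundary is known in advance."""
    return [code[i:i + width] for i in range(0, len(code), width)]


def split_variable(code):
    """Variable-length encoding (hypothetical): the first byte of each
    instruction holds that instruction's total length in bytes."""
    instructions = []
    offset = 0
    while offset < len(code):
        length = code[offset]
        if length == 0 or offset + length > len(code):
            raise ValueError("malformed instruction stream")
        instructions.append(code[offset:offset + length])
        # The next boundary is only known after this instruction is examined.
        offset += length
    return instructions


fixed = split_fixed(bytes(range(8)))
variable = split_variable(bytes([2, 0xAA, 3, 0xBB, 0xCC, 1]))
```

In split_fixed every slice could be taken independently, while split_variable is inherently sequential, which hints at why fixed-length instructions are easier to decode in parallel.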
Embedded System CPU
The traditional embedded field covers a very wide range; it is the main application field of processors apart from the server and PC fields. "Embedded" means that the processor is contained inside a chip or device, as if embedded in it, without the user being aware of it.
In recent years, with the further development of new technologies and new fields, the embedded field itself has differentiated into several sub-fields.
First, with the development of smartphones and handheld mobile devices, the mobile field has gradually developed into an independent field that rivals or even exceeds the PC field in scale. Because processors in the mobile field need to run the Linux operating system and involve a complex software ecosystem, the mobile field depends on its software ecosystem as heavily as the PC field does.
Next is the real-time embedded field. Relatively speaking, this field does not depend so heavily on software, so there is no absolute monopoly. However, owing to the successful commercial promotion of ARM processor IP, ARM's processor architecture still accounts for most of the market. Other processor architectures, such as Synopsys ARC, have also achieved good market results.
Finally, there is the deep embedded field. This field is closest to the traditional embedded field mentioned earlier. Demand in this field is very large, but it emphasizes low power consumption, low cost, and high energy efficiency. There is no need to run a large operating system such as Linux; most of the software consists of customized bare-metal programs or simple real-time operating systems. The dependence on the software ecosystem is therefore relatively low.
The mainframe is also known as the large host computer. Mainframes use dedicated processor instruction sets, operating systems, and application software. The term mainframe originally referred to a large computer system housed in a very large framed iron cabinet, to distinguish it from smaller minicomputers and microcomputers.
Reducing mainframe CPU consumption is an important task. Saving each CPU cycle can not only delay hardware upgrades, but also reduce software licensing fees based on the scale of use.
The mainframe architecture has two main characteristics. The first is a high degree of virtualization, with all system resources shared: the mainframe can consolidate a large number of workloads onto one machine and maximize resource utilization. The second is asynchronous I/O: when an I/O operation is performed, the CPU hands the I/O instructions to the I/O subsystem for completion and is itself released to execute other instructions. The host can therefore carry on with other tasks while heavy I/O tasks are in progress.
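The idea of handing an I/O operation off and continuing with other work can be illustrated in software with Python's asyncio. This is only an analogy under stated assumptions, not a model of a mainframe I/O subsystem: the coroutine names are hypothetical, and asyncio.sleep merely stands in for a transfer carried out elsewhere.

```python
import asyncio


async def io_request(log):
    """Stand-in for an I/O operation completed outside the main flow of work."""
    log.append("io started")
    await asyncio.sleep(0.01)  # placeholder for the I/O subsystem doing the transfer
    log.append("io finished")


async def compute(log):
    """Work that proceeds while the I/O request is still outstanding."""
    log.append("compute started")
    total = sum(range(1000))
    log.append("compute finished")
    return total


async def main():
    log = []
    io_task = asyncio.create_task(io_request(log))  # hand the I/O off
    await asyncio.sleep(0)                          # let the request get started
    total = await compute(log)                      # carry on with other work
    await io_task                                   # collect the I/O result later
    return log, total


log, total = asyncio.run(main())
```

The recorded order shows the computation completing between the start and the end of the I/O request, instead of waiting for it.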
The main form of CPU control technology
The powerful data processing capability of the central processing unit has effectively improved the efficiency of the computer. Data processing is not just simple operation: the central processing unit operates on the instruction tasks issued by the computer user, and during their execution the control instructions entered by the user are mapped to corresponding CPU operations. With the rapid development of information technology in China, computers are widely used in daily life, work, and office automation. As the master control device, the CPU has promoted the development of e-commerce networks, and CPU control performance has been greatly improved in the process. Instruction control, actual control, and operation control are the forms in which computer CPU technology is applied.
(1) Selection control. The centralized processing mode operates on the basis of specific program instructions in order to meet the needs of computer users. During operation the CPU can make selections according to the actual situation to meet the user's data flow requirements; this is the important role played by instruction control technology. The calculation method is formulated according to the user's needs so that the ordering of data instruction actions is maintained. Program instructions must be executed by the CPU in a definite sequence, and only then can correct operation of the computer be guaranteed. The CPU's main purpose is to extend the automatic processing of data sets, which is the key to centralized control, and the core of this is instruction control.
(2) Insertion control. The CPU generates operation control signals according to the function of each instruction and controls the components by sending those signals to them. The function of an instruction is accomplished by a sequence of operations performed by the components of the computer. A larger number of small control components is the key to building a centralized processing mode, the purpose being to better complete the CPU's data processing operations.
(3) Time control. Time control means applying timing to the various operations. When an instruction is executed, it must be completed within the specified time. The CPU fetches the instruction from the cache or memory and then decodes it, mainly in the instruction register. Throughout this process the timing of the program must be strictly controlled.
The flourishing development of CPUs has also brought many security issues. The FDIV bug (Pentium floating-point division error) found in the Pentium processor in 1994 caused errors in floating-point division; the F00F erroneous instruction found in the Pentium processor in 1997 could cause the CPU to hang; in 2011, Intel's Trusted Execution Technology (TXT) was found to have a buffer overflow problem that attackers could use for privilege escalation; in 2017, a vulnerability in the Intel Management Engine (ME) component could lead to remote unauthorized arbitrary code execution; and in 2018, the Meltdown and Spectre CPU vulnerabilities affected almost every computing device manufactured in the previous 20 years, leaving the private information stored on billions of devices at risk of being leaked. These security issues have seriously endangered national network security, critical infrastructure security, and information security in important industries, and have caused or will cause huge losses.
Comparison of CPU and GPU
A GPU is a graphics processing unit. The workflow and physical structure of the CPU and GPU are roughly similar, but compared with the CPU, the work of the GPU is more specialized. In most personal computers, the GPU is only used to draw images. If the CPU wants to draw a two-dimensional figure, it only needs to send an instruction to the GPU, and the GPU quickly calculates all the pixels of the figure and draws it at the specified position on the display. Because the GPU generates a lot of heat, the graphics card usually has its own heat sink.
The CPU has powerful arithmetic units that can complete arithmetic calculations in a few clock cycles. It also has a large cache that can hold a lot of data, and complex logic control units that reduce latency by providing branch prediction when a program has multiple branches. The GPU is designed for high throughput, with many arithmetic units and very little cache. It supports a large number of threads running at the same time; if they need to access the same data, the cache merges these accesses, which naturally introduces latency. Despite this latency, the large number of arithmetic units allows the GPU to achieve very high throughput.
Because the CPU has large caches and complex logic control units, it is very good at logic control and serial operations. The GPU, in comparison, has a large number of arithmetic units, so it can perform many calculations at the same time and is good at large-scale concurrent calculation: work that is large in volume but simple and repeated many times. The way to use a GPU to speed up a program follows directly: using the CPU for complex logic control and the GPU for simple but voluminous arithmetic can greatly improve the running speed of the program.
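The trade-off between low latency and high throughput described above can be made concrete with a deliberately crude cost model. Everything in this sketch is an assumption for illustration: the batch_cycles function, the lane counts, the per-operation cycle counts, and the startup cost are hypothetical values, not measurements of any real CPU or GPU.

```python
import math


def batch_cycles(n_ops, lanes, cycles_per_op, startup_cycles):
    """Cycles to finish n_ops independent operations on `lanes` parallel units."""
    return startup_cycles + math.ceil(n_ops / lanes) * cycles_per_op


# Hypothetical parameters, not measurements of any real hardware.
CPU_LIKE = dict(lanes=4, cycles_per_op=1, startup_cycles=0)       # few fast units
GPU_LIKE = dict(lanes=1024, cycles_per_op=4, startup_cycles=100)  # many slower units

small_cpu = batch_cycles(8, **CPU_LIKE)
small_gpu = batch_cycles(8, **GPU_LIKE)
large_cpu = batch_cycles(1_000_000, **CPU_LIKE)
large_gpu = batch_cycles(1_000_000, **GPU_LIKE)
```

Under these assumed numbers the CPU-like configuration finishes a handful of operations sooner, while the GPU-like configuration finishes a million independent operations in far fewer cycles, matching the division of labor described above.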
Future development of CPU
The general-purpose central processing unit (CPU) chip is a basic component of the information industry and a core component of weaponry. China's lack of CPU technology and industry with independent intellectual property rights has not only left its information industry under the control of others, but also means that national security cannot be fully guaranteed. During the "Tenth Five-Year Plan" period, the national "863 Program" began to support independent research and development of CPUs. During the "Eleventh Five-Year Plan" period, the major project "Core Electronic Devices, High-end General-purpose Chips and Basic Software Products" ("He Gao Ji") brought the CPU achievements of the "863 Program" into industry. Since the "Twelfth Five-Year Plan" period, China has carried out applications and pilot projects of independently developed CPUs in many fields, forming an independent technology and industrial system within a certain range that can meet the application needs of weapons, equipment, information technology, and other fields. However, foreign CPUs have long held a monopoly, and it will take some time for China's independently developed CPU products and their market to mature.