The CPU (Central Processing Unit) mainly consists of the Arithmetic and Logic Unit (ALU), the Control Unit (CU), registers, cache, and the data, control, and status buses (BUS) that connect these components.
The GPU (Graphics Processing Unit), also known as the display chip or visual processor, is a specialized microprocessor in personal computers and some mobile devices that performs image- and graphics-related computation.
Most notebooks today pair an Intel processor with an NVIDIA discrete graphics card, a very common configuration.
It is said that hardware is the next big opportunity, and together with the rapid development of deep learning and computer vision in recent years, this has ushered in an era of new processors. The following introduces them one by one.
1.NPU
The NPU (Neural-network Processing Unit) is an embedded neural network processor that uses a "data-driven parallel computing" architecture and is particularly good at processing massive multimedia data such as video and images. For example, the laboratory's ITX3588J 8K AI development board describes its NPU in its specifications as follows.
A quick primer on TOPS, a unit of processor computing power: TOPS stands for Tera Operations Per Second, and 1 TOPS means the processor can perform one trillion (10^12) operations per second.
Similarly, 1 GOPS means the processor can perform one billion (10^9) operations per second, and 1 MOPS means one million (10^6) operations per second. TOPS, GOPS, and MOPS are interconvertible; all express operations per second, just in different units.
In some cases, TOPS/W is also used as a performance metric: it measures how many trillion operations the processor can perform per watt of power consumption.
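To make these unit conversions concrete, here is a small Python sketch that converts between TOPS, GOPS, and MOPS and computes a performance-to-power ratio. The numeric values used are illustrative examples, not figures from any real chip datasheet.

```python
# Converting between processor throughput units (TOPS, GOPS, MOPS)
# and computing a performance-per-watt figure. Example numbers are
# illustrative, not taken from any real chip datasheet.

UNIT_OPS = {
    "TOPS": 10**12,  # tera operations per second
    "GOPS": 10**9,   # giga operations per second
    "MOPS": 10**6,   # mega operations per second
}

def convert(value, src, dst):
    """Convert a throughput value from one unit to another."""
    return value * UNIT_OPS[src] / UNIT_OPS[dst]

def tops_per_watt(tops, watts):
    """Performance-to-power ratio: TOPS delivered per watt consumed."""
    return tops / watts

print(convert(2, "TOPS", "GOPS"))   # 2 TOPS equals 2000 GOPS
print(tops_per_watt(6.0, 3.0))      # a 6 TOPS NPU at 3 W delivers 2.0 TOPS/W
```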
In the industry, computing performance per unit of power consumption (the performance-to-power ratio) is used to compare processor architectures. According to Zhang Yundong, executive director of the lab and chief technology officer of SMIC, the performance-to-power ratio of Starlight Smart One is "at least two or three orders of magnitude" higher than that of the traditional von Neumann architecture, that is, several hundred times higher.
IBM’s Deep Blue in the 20th century and Google’s AlphaGo in 2016 both depended on enormous amounts of computation: the former used a supercomputer, while the latter used server clusters that could not leave their temperature- and humidity-controlled machine rooms, with a single game costing $3,000 in electricity. Zhang Yundong called them "scientific experiments", still a long way from practical deployment.
This highlights the advantages of embedded NPUs (miniaturization, low power consumption, and low cost) in accelerating the real-world deployment of artificial intelligence. Drones, for example, place strict limits on camera weight and power consumption, which would otherwise affect takeoff and endurance. Starlight Smart One, only the size of an ordinary postage stamp and weighing just a few dozen grams, has made it possible for many small devices such as surveillance cameras to become intelligent, moving artificial intelligence a step out of the mysterious machine room and into everyday applications.
2.VPU
The VPU (Video Processing Unit) is a new core engine for video processing platforms, with hardware decoding capability and the ability to offload the CPU. A VPU can also reduce server load and network bandwidth consumption. The term VPU was proposed by ATI to distinguish it from the traditional GPU (Graphics Processing Unit). The GPU in turn includes three main modules: the Video Processing Unit, the External Video Module, and the Post-Processing Module. Simply put, the VPU is a module within the GPU that can be treated separately.
Basic features of the VPU.
Supports a rich set of video codec standards and high-definition images.
Supports multiple error-resilience tools, multi-channel decoding, and full-duplex multiparty calls simultaneously.
Provides programmability, flexibility, and easily upgradable decoding/encoding and host interfaces, since the firmware for both codec processing and the host interface runs on programmable microprocessors.
VPU programming features.
The programming characteristics of the VPU come from an internal DSP called the BIT processor. The operation of this BIT processor is determined by dedicated microcode called BIT firmware. The VPU provides developers with a programming interface with the following features.
Frame-based processing. The BIT processor completes decoding on a frame-by-frame basis. Once the VPU decoder is running, the host processor carries no image-processing burden: the host application performs its own work until it is ready to issue the next image-processing operation, or until it receives an interrupt from the VPU signaling that the operation is complete.
Program memory management. The VPU has its own program memory into which the BIT firmware for application-specific operations is loaded. To use this internal memory efficiently, the BIT firmware supports dynamic reloading, which allows the VPU to get by with a small amount of program memory.
Multiple instances. The VPU supports multiple instances, which is useful for multi-channel decoder applications. When a new instance is created and an image-processing operation starts, the VPU creates and automatically updates a set of internal context parameters. This internal context management scheme allows the tasks of different decoders running on the host processor to control the VPU independently, each through its own instance number. When a new instance is created, the application's task receives a handle identifying that instance, and an application can easily close an instance by calling a termination function with that handle as its parameter.
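As a rough illustration of this multi-instance, handle-based model, the following Python sketch simulates how an application might open several decoder instances, decode frames per instance, and close one instance by its handle. All class and method names here are invented for illustration and do not correspond to any real VPU SDK.

```python
# Hypothetical sketch of the multi-instance VPU model described above.
# All names (Vpu, open_instance, decode_frame, ...) are invented for
# illustration; real VPU SDKs differ.

class VpuInstance:
    def __init__(self, handle):
        self.handle = handle   # handle returned when the instance is created
        self.context = {}      # per-instance context the VPU maintains internally

class Vpu:
    def __init__(self):
        self._next_handle = 0
        self._instances = {}

    def open_instance(self):
        """Create a new decoder instance and return its handle."""
        handle = self._next_handle
        self._next_handle += 1
        self._instances[handle] = VpuInstance(handle)
        return handle

    def decode_frame(self, handle, frame):
        """Decode one frame; the instance's context is updated automatically."""
        inst = self._instances[handle]
        inst.context["frames_decoded"] = inst.context.get("frames_decoded", 0) + 1
        return f"decoded:{frame}"

    def close_instance(self, handle):
        """Terminate a single VPU task by its instance handle."""
        del self._instances[handle]

vpu = Vpu()
h1 = vpu.open_instance()   # one handle per decoder channel
h2 = vpu.open_instance()
vpu.decode_frame(h1, "frame0")
vpu.decode_frame(h2, "frame0")
vpu.decode_frame(h1, "frame1")
vpu.close_instance(h2)     # only this channel's instance is torn down
```

Each decoder channel on the host interacts with the VPU only through its own handle, which is what lets the channels run independently.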
VPU Features and Applications.
The VPU is designed for vision processing applications, with targeted improvements in performance, power consumption, and functionality that bring it closer to real-world application requirements. It is designed with multiple uses in mind and is specifically optimized for vision processing within a hardware system.
In fact, the VPU is an SoC that integrates multiple RISC CPU cores, many hardware accelerator units and vector processor arrays, a high-performance image signal processor (ISP) designed specifically for high-pixel-count vision workloads, and a rich set of high-speed peripheral interfaces. While maintaining strong performance and functionality, VPUs use more advanced integrated-circuit processes to greatly reduce chip size, combined with effective techniques to cut the power consumption of individual computing units.
The best-known example today is the Intel® Movidius™ vision processor, which can efficiently handle demanding computer vision and edge AI workloads. By combining highly parallel programmable compute with workload-specific hardware acceleration in a unique architecture that minimizes data movement, Movidius vision processors balance power efficiency with computational performance. In areas such as retail, security, and industrial automation, VPU technology powers smart cameras, edge servers, and artificial intelligence devices running deep neural network and computer vision applications.
3.APU
APU, short for "Accelerated Processing Unit", is a product that integrates the functions of a CPU and a GPU. The APU is the product of AMD's "fusion" concept: for the first time, a processor and a discrete-class graphics core were placed on a single chip, giving it both the processing performance of a high-performance processor and that of the latest discrete graphics cards, with support for DX11 games and the latest "accelerated computing" applications. The secret of the APU's strong performance lies in its revolutionary core architecture, latest video decoding engine, ultra-small die and ultra-low-power design, and strong graphics performance.
Background of APU emergence.
NVIDIA, Intel, and AMD have all wanted to combine the CPU and GPU on a single chip, and have long since launched such products. The CPU + GPU form can be said to be the future trend of chip design, and this combination is the APU.
AMD has dreamed of APUs since the heyday of the Opteron CPU in 2006, and launched its first APUs for PCs in 2010. Custom APUs subsequently powered the Sony PlayStation 4 and 5 and the Microsoft Xbox Series X|S consoles, and several Opteron-branded APUs were also launched: the X2100 in 2013 and the X3000 in 2017.
APU trends.
AMD's recently announced roadmap shows that it will launch the Instinct MI300 chip in 2023, which AMD calls "the world's first data center APU".
The APU combines CPU and GPU cores in a single package, specifically pairing a Zen 4-based Epyc CPU with a GPU built on AMD's new CDNA 3 architecture.
APU performance gains.
AMD says the Instinct MI300 is expected to deliver more than 8x the AI training performance of its Instinct MI250X, and that the CDNA 3 architecture in the Instinct MI300 will deliver more than a 5x improvement in performance per watt on AI workloads compared with the CDNA 2 architecture behind the Instinct MI200 series.
AMD said the architecture is designed to let APUs use less power than implementations built from discrete CPUs and GPUs.
Intel similarly said its Falcon Shores chip will significantly increase bandwidth, performance per watt, compute density and memory capacity.
4.NVR
NVR stands for Network Video Recorder. It is the storage and forwarding component of a networked video surveillance system: working together with video encoders or network cameras, the NVR records, stores, and forwards video.
The intelligent NVR is the latest generation of NVR. It uses an embedded design that integrates a high-performance NPU module and mature video-structuring algorithms to detect, track, capture, and recognize person, vehicle, and face targets in real-time video, fully exploiting the value of surveillance footage in the big-data era of security.
An intelligent NVR gives traditional network cameras an intelligent brain: the system can understand video content and, on that basis, provide target-based alerting and deployment, content-based search, and big-data mining, turning ordinary network cameras into intelligent cameras.
Main features.
Integrates person, vehicle, and face recognition algorithms on top of the traditional NVR, whose main purpose is video storage.
Supports 8 channels of video structuring, or 4 channels of video structuring plus face recognition.
Supports video retrieval using both structured and unstructured information.
Supports unified storage of video, pictures, and structured and unstructured data for easy management.
Supports retrieval over a dynamic library of 10 million entries, with second-level results for retrieval over a million-entry dynamic library.
Supports video storage, intelligent processing, camera management, platform management, intelligent retrieval, arming and alerting, statistical analysis, event processing, real-time preview, video playback, and other functions.
Supports 4/8/16-disk storage, with single disks of up to 8 TB.
Supports the ONVIF protocol for connecting third-party devices.
Supports HDMI video output at up to 4K resolution.
Supports H.265 and H.264 HP/MP/BP decoding.
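To put the disk figures above in perspective, here is a back-of-the-envelope Python sketch estimating how many days of continuous footage such a disk array could hold. The per-channel bitrate (4 Mbit/s, plausible for an H.265 1080p stream) and the channel count are illustrative assumptions, not values from any specific product.

```python
# Back-of-the-envelope NVR storage estimate. The 4 Mbit/s bitrate and
# 8-channel setup are illustrative assumptions, not product figures.

def recording_days(disks, tb_per_disk, channels, mbit_per_s):
    """Days of continuous recording a disk array can hold."""
    total_bytes = disks * tb_per_disk * 10**12           # decimal TB, as disks are marketed
    bytes_per_day = channels * mbit_per_s * 1e6 / 8 * 86_400
    return total_bytes / bytes_per_day

# Full build-out: 16 disks x 8 TB, 8 camera channels at ~4 Mbit/s each
days = recording_days(disks=16, tb_per_disk=8, channels=8, mbit_per_s=4)
print(round(days, 1))   # roughly a year of continuous footage
```

Halving the bitrate (e.g. by relying on H.265's better compression) or the channel count doubles the retention period, which is why codec support matters as much as raw disk capacity.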