Graphics Card
Overview
- Nvidia Graphics Card
- GPU
- GPC (Graphics Processing Clusters)
- Raster Engine
- TPC (Texture Processing Clusters)
- SM (Streaming Multiprocessor)
- Warp Scheduler
- Dispatch Unit
- L0 Instruction Cache
- L1 Data Cache / Shared Memory
- Register File
- CUDA Core (Compute Unified Device Architecture)
- Tensor Core
- RT Core (Ray Tracing Core)
- Tex (Texture Unit)
- LD/ST (Load/Store Unit)
- SFU (Special Function Unit)
- SM (Streaming Multiprocessor)
- L2 Cache
- NVENC (hardware video encoder)
- NVDEC (hardware video decoder)
- Memory Controller
- PCIe Host Interface
- GPC (Graphics Processing Clusters)
- VRAM
- Interface
- PCIe
- NVLink
- Display Output
- HDMI
- DP (DisplayPort)
- Power
- Cooling
- GPU
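The nesting above (GPU > GPC > TPC > SM > cores) can be read as a simple multiplication of unit counts. The sketch below is only an illustration of that idea: the class name `GpuLayout` and its fields are hypothetical, and the per-level counts are assumed figures for a full AD102 (Ada Lovelace) die, not values stated in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Unit counts per level of the GPU > GPC > TPC > SM hierarchy."""
    gpcs: int
    tpcs_per_gpc: int
    sms_per_tpc: int
    cuda_cores_per_sm: int
    tensor_cores_per_sm: int
    rt_cores_per_sm: int

    @property
    def sms(self) -> int:
        # Every GPC holds several TPCs, and every TPC holds several SMs.
        return self.gpcs * self.tpcs_per_gpc * self.sms_per_tpc

    @property
    def cuda_cores(self) -> int:
        return self.sms * self.cuda_cores_per_sm

    @property
    def tensor_cores(self) -> int:
        return self.sms * self.tensor_cores_per_sm

    @property
    def rt_cores(self) -> int:
        return self.sms * self.rt_cores_per_sm


# Assumed counts for a full AD102 die (illustrative only).
FULL_AD102 = GpuLayout(
    gpcs=12,
    tpcs_per_gpc=6,
    sms_per_tpc=2,
    cuda_cores_per_sm=128,
    tensor_cores_per_sm=4,
    rt_cores_per_sm=1,
)

if __name__ == "__main__":
    print("SMs:", FULL_AD102.sms)
    print("CUDA cores:", FULL_AD102.cuda_cores)
    print("Tensor cores:", FULL_AD102.tensor_cores)
    print("RT cores:", FULL_AD102.rt_cores)
```

Shipping products usually disable some units, so the counts on a retail card tend to be lower than the full-die totals computed here.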
GPU
Architecture
Nvidia Fermi
Nvidia Kepler
Nvidia Maxwell
Nvidia Pascal
- Add NVLink
- e.g. Nvidia P100, Nvidia GTX 10 Series
Nvidia Volta
- Add Tensor Core
- e.g. Nvidia V100
Nvidia Turing
- Add RT Core
- e.g. Nvidia GTX 16 Series (Turing without RT or Tensor Cores), Nvidia RTX 20 Series
Nvidia Ampere
- e.g. Nvidia RTX 30 Series, Nvidia A100
Nvidia Ada Lovelace
- e.g. Nvidia RTX 40 Series
Nvidia Hopper
- e.g. Nvidia H100
Nvidia Blackwell
CUDA Core
- High precision: FP64, FP32, FP16, INT32
Tensor Core
- Low precision: FP16, INT8
- Specialized for matrix multiply-accumulate (D = A × B + C)
- Mixed precision: e.g. FP16 inputs with FP32 accumulation
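To make the mixed-precision idea concrete, here is a minimal software sketch of a matrix multiply-accumulate with low-precision inputs and a selectable accumulator precision. It is an emulation under stated assumptions, not a model of how the hardware is built: the helper names `to_fp16`, `to_fp32`, and `mma` are hypothetical, rounding is done with Python's `struct` half-precision (`e`) and single-precision (`f`) formats, and the result is not guaranteed to be bit-exact with any real Tensor Core.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mma(a, b, c, accumulate=to_fp32):
    """Return D = A x B + C for lists of lists.

    Inputs from A and B are rounded to FP16 before multiplying; the running
    sum is rounded with `accumulate` after every addition.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = accumulate(c[i][j])
            for k in range(inner):
                product = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = accumulate(acc + product)
            d[i][j] = acc
    return d


if __name__ == "__main__":
    small = 2.0 ** -6                # exactly representable in FP16
    a = [[small] * 4]                # 1 x 4
    b = [[small] for _ in range(4)]  # 4 x 1
    c = [[1.0]]                      # 1 x 1
    print("FP32 accumulate:", mma(a, b, c))
    print("FP16 accumulate:", mma(a, b, c, accumulate=to_fp16))
```

With these inputs each product is 2^-12. The FP32 accumulator keeps all four contributions and returns 1.0009765625, while the FP16 accumulator rounds every partial sum back to 1.0. That gap is the reason for accumulating in a wider format than the inputs.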
VRAM
Parameter
- Capacity
- Latency
- Bandwidth
- Refresh Rates
- Memory Bus Width
128-bit = 4/8/16 GB
160-bit = 10 GB
192-bit = 3/6/12 GB
256-bit = 4/8/16 GB
320-bit = 10/20 GB
352-bit = 11/22 GB
384-bit = 6/12/24 GB
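The pairing of bus width and capacity above follows from how memory chips attach to the bus. The sketch below assumes each GDDR chip occupies one 32-bit channel, that chips come in 0.5, 1, or 2 GB densities, and that a "clamshell" layout puts two chips on each channel. The function names and default values are hypothetical, chosen for illustration. The second function applies the usual peak-bandwidth relation (bus width in bytes times per-pin data rate).

```python
CHANNEL_WIDTH_BITS = 32  # assumed interface width of a single GDDR chip


def capacity_options_gb(bus_width_bits, chip_sizes_gb=(0.5, 1, 2), clamshell=False):
    """List the VRAM capacities (GB) reachable with a given bus width."""
    if bus_width_bits % CHANNEL_WIDTH_BITS:
        raise ValueError("bus width must be a multiple of 32 bits")
    chips = bus_width_bits // CHANNEL_WIDTH_BITS
    if clamshell:
        chips *= 2  # two chips share each 32-bit channel
    return [chips * size for size in chip_sizes_gb]


def bandwidth_gb_per_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bytes per transfer times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


if __name__ == "__main__":
    for width in (128, 160, 192, 256, 320, 352, 384):
        print(f"{width}-bit:", capacity_options_gb(width))
    # 128-bit reaches 16 GB only in the assumed clamshell layout.
    print("128-bit clamshell:", capacity_options_gb(128, clamshell=True))
    # 21 Gbps is an illustrative per-pin data rate.
    print("384-bit @ 21 Gbps:", bandwidth_gb_per_s(384, 21), "GB/s")
```

The function returns a superset of the capacities listed above (for example 5.5 GB for 352-bit), because not every chip density was paired with every bus width in shipping products.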
Type
GDDR (Graphics Double Data Rate)
- GDDR5
- GDDR5X
- GDDR6
- GDDR6X
HBM (High Bandwidth Memory)
- HBM
- HBM2
- HBM2e
- HBM3
- HBM3e
Power
- Power Phases
Interconnect
- Hardware
- Intra-Machine
- Shared Memory
- PCIe
- NVLink
- Inter-Machine
- InfiniBand
- TCP/IP Sockets
- RDMA (Remote Direct Memory Access)
- RoCE (RDMA over Converged Ethernet)
- Intra-Machine
- Software
- MPI
- Gloo
- xCCL (vendor collective communication libraries, e.g. NCCL)
RDMA
- CPU Offload
- Kernel Bypass
- Zero Copy
Software Tech
- Ray Tracing
- DLSS (Deep Learning Super Sampling)
- CUDA
- cuDNN (CUDA Deep Neural Network Library)
- TensorRT
References
A Deep Dive into GPU Principles (深入GPU原理)
https://www.bilibili.com/video/BV1bm4y1m7Ki
How GPUs Work (GPU工作原理)
https://www.bilibili.com/video/BV17L4y1a7Xy
RTX 40 Series Graphics Card Review, Prologue: How Much Has the New Ada Architecture Changed? (RTX40系显卡评测序章:ADA新架构变化有多大?)
https://www.bilibili.com/video/BV1W8411W7aM
Pascal Architecture Whitepaper
https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
Volta Architecture Whitepaper
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Turing Architecture Whitepaper
https://images.nvidia.com/aem-dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf
Ampere Architecture Whitepaper
https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.1.pdf
Hopper Architecture Whitepaper
https://resources.nvidia.com/en-us-tensor-core
Ada Lovelace Architecture Whitepaper
https://images.nvidia.com/aem-dam/Solutions/geforce/ada/nvidia-ada-gpu-architecture.pdf
Blackwell Architecture Technical Brief
https://resources.nvidia.com/en-us-blackwell-architecture