Graphics Card

Overview

  • Nvidia Graphics Card
    • GPU
      • GPC (Graphics Processing Clusters)
        • Raster Engine
        • TPC (Texture Processing Clusters)
          • SM (Streaming Multiprocessor)
            • Warp Scheduler
            • Dispatch Unit
            • L0 Instruction Cache
            • L1 Data Cache / Shared Memory
            • Register File
            • CUDA Core (Compute Unified Device Architecture)
            • Tensor Core
            • RT Core (Ray Tracing Core)
            • Tex (Texture Unit)
            • LD/ST (Load/Store Unit)
            • SFU (Special Function Unit)
      • L2 Cache
      • NVENC
      • NVDEC
      • Memory Controller
      • PCIe Host Interface
    • VRAM
    • Interface
      • PCIe
      • NVLink
      • Display Output
        • HDMI
        • DP
    • Power
    • Cooling
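
The GPC > TPC > SM nesting above means unit counts multiply down the hierarchy. A minimal sketch of that arithmetic follows. The `GpuLayout` class is a hypothetical helper, and the example figures are assumed values for a full AD102 die as given in the Ada Lovelace whitepaper; shipping cards usually have some units disabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Unit counts at each level of the GPC > TPC > SM hierarchy (hypothetical helper)."""
    gpcs: int
    tpcs_per_gpc: int
    sms_per_tpc: int
    cuda_cores_per_sm: int

    @property
    def sm_count(self) -> int:
        # SMs live inside TPCs, which live inside GPCs.
        return self.gpcs * self.tpcs_per_gpc * self.sms_per_tpc

    @property
    def cuda_core_count(self) -> int:
        return self.sm_count * self.cuda_cores_per_sm


# Assumed figures for a full AD102 die; not taken from the outline itself.
full_ad102 = GpuLayout(gpcs=12, tpcs_per_gpc=6, sms_per_tpc=2, cuda_cores_per_sm=128)
print(full_ad102.sm_count)         # 144
print(full_ad102.cuda_core_count)  # 18432
```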

GPU

Architecture

  • Nvidia Fermi
  • Nvidia Kepler
  • Nvidia Maxwell
  • Nvidia Pascal
    • Adds NVLink
    • e.g. Nvidia P100, Nvidia GTX 10 Series
  • Nvidia Volta
    • Adds Tensor Core
    • e.g. Nvidia V100
  • Nvidia Turing
    • Adds RT Core
    • e.g. Nvidia GTX 16 Series (without RT Cores), Nvidia RTX 20 Series
  • Nvidia Ampere
    • e.g. Nvidia RTX 30 Series, Nvidia A100
  • Nvidia Ada Lovelace
    • e.g. Nvidia RTX 40 Series
  • Nvidia Hopper
    • e.g. Nvidia H100
  • Nvidia Blackwell

CUDA Core

  • High precision: FP64, FP32, FP16, INT32

Tensor Core

  • Low precision: FP16, INT8
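  • Specialized for matrix multiply-accumulate (D = A × B + C)
  • Mixed precision: low-precision inputs with higher-precision accumulation (e.g. FP16 inputs, FP32 accumulate)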
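
To illustrate why the accumulator precision matters, here is a small software emulation of the multiply-accumulate D = A × B + C on a 4×4 tile. It is only a sketch under stated assumptions: `to_fp16`, `to_fp32`, and `mma` are hypothetical helper names, rounding is emulated with Python's `struct` module, and real Tensor Cores do this in hardware with their own tile shapes and internal rounding behavior.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mma(a, b, c, round_acc):
    """Return D = A x B + C, rounding the accumulator with round_acc after every add.

    Inputs A and B are always rounded to FP16 first; only the accumulator
    precision differs between the two runs below.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = round_acc(c[i][j])
            for k in range(inner):
                product = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = round_acc(acc + product)
            d[i][j] = acc
    return d


ones = [[1.0] * 4 for _ in range(4)]
halves = [[0.5] * 4 for _ in range(4)]
c = [[2048.0] * 4 for _ in range(4)]

# FP16 values near 2048 are spaced 2 apart, so each +0.5 is rounded away.
print(mma(ones, halves, c, to_fp16)[0][0])  # 2048.0
# With an FP32 accumulator the four 0.5 contributions survive.
print(mma(ones, halves, c, to_fp32)[0][0])  # 2050.0
```

The inputs are identical in both runs; only the accumulator width changes, which is the point of mixed precision.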

VRAM

Parameter

  • Capacity
  • Latency
  • Bandwidth
    • Memory Clock (effective data rate)
    • Memory Bus Width

Typical capacities by memory bus width:

128-bit = 4/8/16 GB
160-bit = 10 GB
192-bit = 3/6/12 GB
256-bit = 4/8/16 GB
320-bit = 10/20 GB
352-bit = 11/22 GB
384-bit = 6/12/24 GB
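
The bus-width list above follows from how GDDR memory is attached: each chip occupies a slice of the bus, so the bus width fixes the chip count and the chip density fixes the capacity. The sketch below assumes a 32-bit interface per GDDR chip, chip densities of 0.5, 1, and 2 GB, and an optional clamshell mode in which two chips share one 32-bit slice. These are assumptions for illustration, and the function names are hypothetical. HBM stacks do not follow this rule. The last function shows the usual peak-bandwidth arithmetic: bus width times per-pin data rate, divided by 8 bits per byte.

```python
def chip_count(bus_width_bits: int, chip_width_bits: int = 32) -> int:
    """Number of memory chips needed to fill the bus (assumes 32-bit GDDR chips)."""
    if bus_width_bits % chip_width_bits != 0:
        raise ValueError("bus width must be a multiple of the chip interface width")
    return bus_width_bits // chip_width_bits


def capacities_gb(bus_width_bits: int,
                  chip_densities_gb=(0.5, 1.0, 2.0),
                  clamshell: bool = False) -> list:
    """Possible VRAM capacities in GB for a given bus width."""
    chips = chip_count(bus_width_bits) * (2 if clamshell else 1)
    return [chips * density for density in chip_densities_gb]


def bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bits per transfer * transfers per second / 8."""
    return bus_width_bits * data_rate_gbps / 8


print(capacities_gb(192))                  # [3.0, 6.0, 12.0]
print(capacities_gb(384))                  # [6.0, 12.0, 24.0]
print(capacities_gb(128, clamshell=True))  # [4.0, 8.0, 16.0]

# Assumed example: a 384-bit bus with 21 Gbps per pin.
print(bandwidth_gb_per_s(384, 21.0))       # 1008.0
```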

Type

  • GDDR (Graphics Double Data Rate)
    • GDDR5
    • GDDR5X
    • GDDR6
    • GDDR6X
  • HBM (High Bandwidth Memory)
    • HBM
    • HBM2
    • HBM2e
    • HBM3
    • HBM3e

Power

  • Power Phases

Interconnect

  • Hardware
    • Intra-Machine
      • Shared Memory
      • PCIe
      • NVLink
    • Inter-Machine
      • InfiniBand
      • TCP/IP Sockets
      • RDMA (Remote Direct Memory Access)
        • RoCE
  • Software
    • MPI
    • Gloo
    • XCCL
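
The software layers listed above expose collective operations such as all-reduce, where every participant ends up with the element-wise sum of all participants' buffers. One well-known way to implement it is a ring: a reduce-scatter phase followed by an all-gather phase, each taking N-1 steps for N ranks. The sketch below simulates that in a single process. `ring_all_reduce` is a hypothetical function, not the API of MPI, Gloo, or any XCCL library, and it makes no claim about how those libraries choose their algorithms.

```python
def ring_all_reduce(buffers):
    """Sum equal-length lists across simulated ranks; every rank gets the total."""
    n = len(buffers)
    size = len(buffers[0])
    # Split each rank's vector into n chunks; data[r][c] is rank r's copy of chunk c.
    bounds = [(c * size // n, (c + 1) * size // n) for c in range(n)]
    data = [[list(buf[lo:hi]) for lo, hi in bounds] for buf in buffers]

    # Phase 1: reduce-scatter. At each step rank r sends chunk (r - step) % n
    # to its right neighbour, which adds it to its own copy.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        payloads = [list(data[r][c]) for r, c in sends]  # snapshot: sends are simultaneous
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            data[dst][c] = [x + y for x, y in zip(data[dst][c], payload)]

    # Now rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2: all-gather. Reduced chunks travel around the ring and overwrite stale copies.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [list(data[r][c]) for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            data[(r + 1) % n][c] = payload

    return [[x for chunk in rank for x in chunk] for rank in data]


result = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result[0])  # [111, 222, 333]
```

Each rank only ever talks to its right neighbour, so per-link traffic stays roughly constant as the number of ranks grows.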

RDMA

  • CPU Offload
  • Kernel Bypass
  • Zero Copy

Software Tech

  • Ray Tracing
  • DLSS (Deep Learning Super Sampling)
  • CUDA
    • cuDNN (CUDA Deep Neural Network Library)
    • TensorRT

References

GPU Principles in Depth
https://www.bilibili.com/video/BV1bm4y1m7Ki

How GPUs Work
https://www.bilibili.com/video/BV17L4y1a7Xy

RTX 40 Series Graphics Card Review, Prologue: How Much Has the New Ada Architecture Changed?
https://www.bilibili.com/video/BV1W8411W7aM

Pascal Architecture Whitepaper
https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf

Volta Architecture Whitepaper
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf

Turing Architecture Whitepaper
https://images.nvidia.com/aem-dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

Ampere Architecture Whitepaper
https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.1.pdf

Hopper Architecture Whitepaper
https://resources.nvidia.com/en-us-tensor-core

Ada Lovelace Architecture Whitepaper
https://images.nvidia.com/aem-dam/Solutions/geforce/ada/nvidia-ada-gpu-architecture.pdf

Blackwell Architecture Technical Brief
https://resources.nvidia.com/en-us-blackwell-architecture