GPU Setup & Cloud

> Training on CPU is fine for learning. Training for real needs a GPU.

Type: Build

Languages: Python

Prerequisites: Phase 0, Lesson 01

Time: ~45 minutes

Learning Objectives

The Problem

Most lessons in phases 1-3 run fine on CPU. Once you start training CNNs, transformers, or LLMs (phases 4+), you need GPU acceleration: a training run that takes 8 hours on CPU can finish in roughly 10 minutes on a suitable GPU.

You have three options: local GPU, cloud GPU, or Google Colab (free).

The Concept

Your options:

1. Local NVIDIA GPU
   Cost: $0 (you already have it)
   Setup: Install the NVIDIA driver + a CUDA build of PyTorch (the pip wheels bundle CUDA and cuDNN)
   Best for: Regular use, large datasets

2. Google Colab (free tier)
   Cost: $0
   Setup: None
   Best for: Quick experiments, no GPU at home

3. Cloud GPU (Lambda, RunPod, Vast.ai)
   Cost: $0.20-2.00/hr
   Setup: SSH + install
   Best for: Serious training, large models
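
To put the hourly cloud rates above in context, the arithmetic can be sketched in a few lines. The helper name `estimate_cost` and the 10-hour run length are illustrative assumptions; only the $0.20-2.00/hr range comes from the list.

```python
def estimate_cost(hours: float, rate_per_hour: float) -> float:
    """Return the rental cost in dollars for a run of the given length."""
    return hours * rate_per_hour


# Hypothetical 10-hour training run at both ends of the quoted price range
run_hours = 10
low = estimate_cost(run_hours, 0.20)
high = estimate_cost(run_hours, 2.00)
print(f"{run_hours} h run: ${low:.2f} to ${high:.2f}")  # 10 h run: $2.00 to $20.00
```

Real bills also depend on storage, idle time, and provider-specific fees, which this sketch ignores.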

Build It

Option 1: Local NVIDIA GPU

Check if you have one:

nvidia-smi
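
If you would rather run the same check from a script, the sketch below wraps `nvidia-smi` using only the standard library. The helper name `query_gpus` is hypothetical, and the sketch assumes that a missing or failing `nvidia-smi` means no usable NVIDIA GPU.

```python
import shutil
import subprocess


def query_gpus() -> list[str]:
    """Return one 'name, total memory' line per GPU, or [] if none is visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # nvidia-smi exists but could not talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = query_gpus()
print(gpus if gpus else "No NVIDIA GPU detected")
```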

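Install PyTorch with CUDA support (for example with pip install torch torchvision torchaudio, or the command the official PyTorch install selector gives for your platform), then confirm that PyTorch can see the GPU: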

import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

Option 2: Google Colab

  1. Go to colab.research.google.com
  2. Runtime > Change runtime type > T4 GPU
  3. Run !nvidia-smi to verify

Upload notebooks from this course directly to Colab.

Option 3: Cloud GPU

For Lambda Labs, RunPod, or Vast.ai:

ssh user@your-gpu-instance

pip install torch torchvision torchaudio
python -c "import torch; print(torch.cuda.get_device_name(0))"

No GPU? No problem.

Most lessons work on CPU. The ones that need GPU will say so and include Colab links.

import torch

# Fall back to the CPU when no CUDA GPU is visible
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using: {device}")
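
As a minimal sketch of how that device selection is typically used, the snippet below places a small model and a batch of inputs on whichever device is available. The layer sizes and batch size are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

# Same selection logic as above: GPU if present, otherwise CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Model parameters and input data must live on the same device
model = nn.Linear(4, 2).to(device)
x = torch.randn(8, 4, device=device)

y = model(x)
print(y.shape, y.device)
```

Because nothing here hard-codes "cuda", the same script runs unchanged on a laptop CPU, on Colab, or on a cloud instance.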

Build It: GPU vs CPU benchmark

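import time

import torch

size = 5000

# Two random square matrices on the CPU
a_cpu = torch.randn(size, size)
b_cpu = torch.randn(size, size)

# Time one CPU matrix multiplication
start = time.perf_counter()
c_cpu = a_cpu @ b_cpu
cpu_time = time.perf_counter() - start
print(f"CPU: {cpu_time:.3f}s")

if torch.cuda.is_available():
    a_gpu = a_cpu.to("cuda")
    b_gpu = b_cpu.to("cuda")

    # Warm-up run so one-time CUDA/cuBLAS initialization is not timed
    _ = a_gpu @ b_gpu

    # CUDA kernels launch asynchronously, so synchronize before and after timing
    torch.cuda.synchronize()
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()
    gpu_time = time.perf_counter() - start
    print(f"GPU: {gpu_time:.3f}s")
    print(f"Speedup: {cpu_time / gpu_time:.0f}x")
else:
    print("No CUDA GPU detected; skipping the GPU benchmark")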

Exercises

  1. Run the benchmark above and compare CPU vs GPU times
  2. If you don't have a GPU, run it on Google Colab and compare
  3. Check how much GPU memory you have and estimate the largest model you can fit (rule of thumb: 2 bytes per parameter for fp16 weights; training needs several times more for gradients, optimizer state, and activations)
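
For exercise 3, the rule of thumb can be turned into a quick calculation. This is a rough sketch that counts weights only; the helper names `weights_gb` and `max_params`, the 7B-parameter model, and the 16 GB card are all illustrative assumptions.

```python
def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in GB (2 bytes/param = fp16)."""
    return num_params * bytes_per_param / 1e9


def max_params(vram_gb: float, bytes_per_param: int = 2) -> float:
    """Largest parameter count whose weights alone fit in the given VRAM."""
    return vram_gb * 1e9 / bytes_per_param


# Hypothetical 7B-parameter model stored in fp16
print(f"7B params in fp16: {weights_gb(7e9):.1f} GB")  # 7B params in fp16: 14.0 GB

# Hypothetical 16 GB GPU
print(f"16 GB of VRAM holds about {max_params(16) / 1e9:.0f}B fp16 params")  # about 8B
```

Treat the result as an upper bound: activations, and for training also gradients and optimizer state, take additional memory on top of the weights.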

Key Terms

| Term | What people say | What it actually means |
|------|-----------------|------------------------|
| CUDA | "GPU programming" | NVIDIA's parallel computing platform that lets you run code on the GPU |
| VRAM | "GPU memory" | Video RAM on the GPU, separate from system RAM. Limits model size. |
| fp16 | "Half precision" | 16-bit floating point, uses half the memory of fp32 with minimal accuracy loss |
| Tensor Core | "Fast matrix hardware" | Specialized GPU cores for matrix multiplication, 4-8x faster than regular cores |