Week 26 — Coding Projects

Core

Implement deterministic gradient descent (GD) and stochastic gradient descent (SGD).

  • NumPy: Fit linear regression with full-batch GD and mini-batch SGD. Compare the effect of batch size on convergence. Track loss over iterations.
  • Metal: Compute batch gradients on the GPU; keep the optimizer update on the CPU at first. Use a multi-pass reduction to sum per-sample gradients. · Reading: MBT — batched compute, reductions, multi-pass update structure.
  • Vulkan: Implement compute passes that evaluate the objective and its gradient. · Reading: Vulkan Book — compute passes for objective + gradient evaluation.
  • CUDA: Write a mini-batch SGD kernel that reduces per-thread partial gradients. · Reading: CUDA Book — mini-batch compute, reduction of partial gradients.
  • Stretch: Add momentum or Adam. Compare convergence traces.
  • Verify: SGD is noisier but often cheaper per step · The learning rate matters dramatically · The GPU batch gradient matches the CPU reference.
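
The NumPy exercise above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the synthetic data, function names, and hyperparameters (learning rate 0.1, 100 epochs, batch size 32) are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + small noise.
n, d = 512, 4
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def mse_loss(w):
    r = X @ w - y
    return 0.5 * np.mean(r * r)

def grad(w, idx):
    # MSE gradient over the rows in idx (full batch when idx covers all rows).
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ w - yi) / len(idx)

def fit(batch_size, lr=0.1, epochs=100):
    w = np.zeros(d)
    losses = []
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle each epoch
        for start in range(0, n, batch_size):
            w -= lr * grad(w, perm[start:start + batch_size])
        losses.append(mse_loss(w))
    return w, losses

w_gd, loss_gd = fit(batch_size=n)    # deterministic full-batch GD
w_sgd, loss_sgd = fit(batch_size=32) # mini-batch SGD
```

Comparing `loss_gd` against `loss_sgd` shows the trade-off named in the Verify bullet: the SGD trace is noisier, but each of its steps touches only a fraction of the data. The same loss traces double as the CPU reference when checking the Metal, Vulkan, and CUDA batch gradients.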
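
For the stretch goal, one Adam update step can be sketched like this. Hyperparameter defaults follow the commonly used convention; `adam_step` and the quadratic smoke test are illustrative assumptions, not from the source.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient (m)
    # and its elementwise square (v), with bias correction for step t >= 1.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Smoke test: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.ones(3)
m = np.zeros(3)
v = np.zeros(3)
for t in range(1, 2001):
    w, m, v = adam_step(w, w, m, v, t, lr=0.01)
```

Swapping `adam_step` in place of the plain `w -= lr * grad(...)` update is enough to compare convergence traces; momentum alone is the same idea with only the `m` accumulator.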