Week 33 — Coding Projects

Core

Profile earlier kernels systematically.

  • NumPy: Build a benchmark harness and correctness harness for CPU reference routines. Record problem size, runtime, and error.
  • Metal: Use Metal profiling tools and timestamps. Profile reduction, scan, blur, matmul, and renderer passes. · Reading: MBT — debugging/profiling/performance sections, frame capture workflow.
  • Vulkan: Profile dispatches, barriers, and frame timing. · Reading: Vulkan Book — synchronization cost, frame pacing, resource lifetime and performance sections.
  • CUDA: Profile occupancy, bandwidth, and kernel timing. · Reading: CUDA Book — profiling, occupancy intuition, bottleneck analysis.
  • Stretch: Create a short report ranking kernels by bottleneck type: compute-bound, memory-bound, or synchronization-bound.
  • Verify: Benchmarks are repeatable · You know which kernels are actually slow · Optimizations are justified by measurements.