Week 33 — Coding Projects
Core
Profile earlier kernels systematically.
- NumPy: Build a benchmark harness and correctness harness for CPU reference routines. Record problem size, runtime, and error.
- Metal: Use Metal profiling tools and timestamps. Profile reduction, scan, blur, matmul, and renderer passes. · Reading: MBT — debugging/profiling/performance sections, frame capture workflow.
- Vulkan: Profile dispatches, barriers, and frame timing. · Reading: Vulkan Book — synchronization cost, frame pacing, resource lifetime and performance sections.
- CUDA: Profile occupancy, bandwidth, and kernel timing. · Reading: CUDA Book — profiling, occupancy intuition, bottleneck analysis.
- Stretch: Create a short report ranking kernels by bottleneck type: compute-bound, memory-bound, or synchronization-bound.
- Verify: Benchmarks are repeatable · You know which kernels are actually slow · Optimizations are justified by measurements.