Syllabus

36-Week Detailed Schedule

Full week-by-week plan for the course. Each entry lists the lecture topic, relevant math readings, proof targets, book exercises, a short deliverable connecting the math back to the GPU/graphics project, and a coding exercise (NumPy reference + Metal / Vulkan / CUDA GPU implementation). Lessons are published in Lessons as I complete them.

Note on AI use: The weekly lesson titles and lesson plan summaries in this syllabus were generated with AI assistance. All actual lesson notes, exercises, coding projects, and proofs are written entirely by me — the goal is to learn the material, and that requires producing my own content. All proofs are done by hand on paper first; the handwritten proofs are then converted to LaTeX using AI so I can focus on the mathematics and let the AI handle the typesetting.

Prerequisites: Multivariable calculus (college level) · Introductory discrete mathematics · Basic probability theory

General rule for each week: (1) one proof writeup in full sentences, (2) 3–8 book exercises, (3) one short summary page connecting the math back to the GPU/graphics project, (4) a coding exercise — build the NumPy reference first (see Numerical Python by Johansson for NumPy/SciPy reference), then implement in one GPU stack (Metal, Vulkan, or CUDA). Validate GPU output against the CPU reference. Record correctness result, runtime, and notes on numerical or memory behavior.

Book abbreviations used below:

Linear Algebra + Analysis

  • Axler — Linear Algebra Done Right (Axler)
  • Axler-M — Measure, Integration & Real Analysis (Axler)
  • Ross — Elementary Analysis (Ross)
  • T&B — Numerical Linear Algebra (Trefethen & Bau)
  • BK — An Introductory Course in Functional Analysis (Bowers & Kalton)

Probability

  • BT — Introduction to Probability (Bertsekas & Tsitsiklis)

Geometry + Graphics

  • 3DMP — 3D Math Primer for Graphics and Game Development (Dunn & Parberry)
  • Pressley — Elementary Differential Geometry (Pressley)
  • FCG — Fundamentals of Computer Graphics (Shirley & Marschner)
  • Mendelson — Introduction to Topology (Mendelson)

Signal Processing + Complex Analysis

  • O&S — Discrete-Time Signal Processing (Oppenheim & Schafer)
  • OWN — Signals and Systems (Oppenheim, Willsky & Nawab)
  • CA — Complex Analysis (Ahlfors / Bak & Newman)

Optimization + Information Theory + Algebra

  • Boyd — Convex Optimization (Boyd & Vandenberghe)
  • C&T — Elements of Information Theory (Cover & Thomas)
  • Pinter — A Book of Abstract Algebra (Pinter)

Secondary Enrichment

  • Dasgupta — Algorithms (Dasgupta, Papadimitriou & Vazirani)
  • Sipser — Introduction to the Theory of Computation (Sipser)
  • PR — Probabilistic Robotics (Thrun, Burgard & Fox)

Computing

  • Numerical Python — Numerical Python: Scientific Computing and Data Science Applications with NumPy, SciPy and Matplotlib (Johansson)
  • MBT — Metal by Tutorials
  • Vulkan Book — Mastering Graphics Programming with Vulkan
  • CUDA Book — GPU Programming with C++ and CUDA


Phase 0 · Weeks 1–4

Foundations: GPU + Linear Algebra + Analysis

Tool guidance: NumPy first, then one GPU stack. Metal and CUDA are both natural starting points; Vulkan is viable if you want explicit engine-style foundations from day one.


Week 1 — Vector Spaces and GPU Data Representation

Math: Axler Ch. 1 · Ross Ch. 1–2

Proofs:

  • Prove that the intersection of any collection of subspaces is a subspace.
  • Prove that the union of two subspaces is a subspace iff one is contained in the other.
  • Prove uniqueness of additive identity and additive inverses in a vector space.
  • Write a proof sketch of completeness of ℝ using the least-upper-bound property.

Exercises: Axler Ch. 1 — subspaces, span, linear dependence, bases · Ross Ch. 1–2 — supremum/infimum, absolute value inequalities, basic completeness consequences

Deliverable: Explain why vectors used in shaders/buffers are elements of finite-dimensional vector spaces.

Coding exercise — Core: Build vector math primitives and a first “hello GPU” buffer pipeline.

  • NumPy: Implement vector add, scalar multiply, dot product, and L2 norm. Verify distributive and scalar laws numerically. Store vertices of a triangle as arrays.
  • Metal: Render a triangle and write one compute kernel for vector addition. · Reading: MBT — app setup, Metal device/command queue, buffers, first render pipeline, first compute pipeline.
  • Vulkan: Render a triangle and create a vertex buffer with triangle data uploaded. · Reading: Vulkan Book — project setup, instance/device/swapchain basics, first graphics pipeline, vertex buffers.
  • CUDA: Write a vector-add kernel, launch over a 1D array, and verify against CPU. · Reading: CUDA Book — CUDA execution model, kernel launch basics, memory copies, thread/block indexing.
  • Stretch: Batched vector add for shape (N, D).
  • Verify: Triangle renders · GPU vector add matches CPU exactly for simple inputs.
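The NumPy portion above can be sketched in a few lines (a minimal reference; the function names are illustrative, not prescribed):

```python
import numpy as np

def dot(u, v):
    # Elementwise multiply then sum: <u, v> = sum_i u_i v_i.
    return np.sum(u * v)

def l2_norm(v):
    # ||v||_2 = sqrt(<v, v>).
    return np.sqrt(dot(v, v))

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
a = 2.5

# Vector-space laws checked numerically (within float tolerance):
assert np.allclose(a * (u + v), a * u + a * v)    # distributivity
assert np.allclose(dot(u, v), dot(v, u))          # dot-product symmetry
assert np.isclose(l2_norm(u), np.linalg.norm(u))  # matches NumPy's built-in

# Triangle vertices stored as a (3, 2) array: one row per vertex.
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
```

This CPU reference is what the Week 1 GPU vector-add kernel gets validated against.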

Week 2 — Linear Maps, Transformations, and Program Structure

Math: Axler Ch. 2 · Ross Ch. 3

Secondary enrichment: Sipser — regular languages and finite automata (keep it light and conceptual)

Proofs:

  • Prove that the kernel of a linear map is a subspace.
  • Prove that the image/range of a linear map is a subspace.
  • Prove linearity of composition of linear maps.
  • Prove the ε-δ definition of limit for a simple polynomial map f(x) = x² at an arbitrary point.
  • Sipser enrichment — proof sketch: Prove closure of regular languages under union using the product construction on DFAs.

Exercises: Axler Ch. 2 — linear maps, null spaces, ranges, matrix representations · Ross Ch. 3 — ε-δ limit exercises for polynomials, rational functions, and uniqueness of limits

Sipser enrichment exercises:

  1. Design DFAs or NFAs for simple token patterns (identifiers, numeric literals, keywords).
  2. Sketch how the lexical structure of a shader language (GLSL, MSL) can be described by regular expressions and modeled by finite automata.
  3. Convert a small NFA to an equivalent DFA using the subset construction.
  4. Write a short note: what part of a graphics pipeline “parses” program structure, and how does this relate to automata?

Compute connection: The front end of every shader compiler runs a lexer whose token rules are regular languages. The DFA model is the theory behind that first compilation stage.
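As a toy illustration of that first compilation stage, an identifier-token DFA can be simulated directly (a hand-rolled sketch; real lexers generate these transition tables):

```python
# DFA for identifiers: a letter or underscore followed by letters/digits/underscores.
# States: 0 = start, 1 = accept (inside identifier), -1 = reject sink.
def is_identifier(s):
    state = 0
    for ch in s:
        if state == 0:
            state = 1 if (ch.isalpha() or ch == "_") else -1
        elif state == 1:
            state = 1 if (ch.isalnum() or ch == "_") else -1
        if state == -1:
            return False
    return state == 1

assert is_identifier("normal_matrix")
assert not is_identifier("3dpos")   # cannot start with a digit
```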

Deliverable: Relate kernels/images to graphics transforms and program interface layers.

Coding exercise — Core: Apply linear transforms to point sets.

  • NumPy: Implement matrix-vector and matrix-matrix multiply. Build rotation, scaling, shear, and reflection matrices. Transform a square point cloud.
  • Metal: Pass transform matrices into a shader and animate object transforms. · Reading: MBT — uniforms, matrices, coordinate transforms, per-frame updates, shader parameter passing.
  • Vulkan: Use uniform buffers or push constants for transforms and animate over frames. · Reading: Vulkan Book — descriptor sets, uniform buffers, transform data flow, frame resources.
  • CUDA: Apply a matrix transform to a large point cloud in parallel. · Reading: CUDA Book — device memory layout, grid-stride loops, simple linear algebra kernels.
  • Stretch: Separate model, view, and projection matrices cleanly. Add unit tests for composition order.
  • Verify: Composition order visibly matters · Identity transform leaves points unchanged · Inverse restores original points when invertible.
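The "composition order matters" check in the Verify list can be demonstrated directly in the NumPy reference (a sketch; the 2D case keeps it easy to inspect by hand):

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def scaling(sx, sy):
    return np.diag([sx, sy])

# A unit square as a (4, 2) point cloud; matrices act on column vectors,
# so we apply M @ points.T and transpose back.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

R = rotation(np.pi / 2)
S = scaling(2.0, 1.0)

rotate_then_scale = (S @ R @ square.T).T
scale_then_rotate = (R @ S @ square.T).T

# Composition order matters: these generally differ.
assert not np.allclose(rotate_then_scale, scale_then_rotate)
# Identity leaves points unchanged.
assert np.allclose((np.eye(2) @ square.T).T, square)
```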

Week 3 — Matrix Multiplication and Linear Systems

Math: Axler Ch. 3 · T&B Lectures 1–2

Secondary enrichment: Dasgupta — algorithm analysis and divide-and-conquer (early chapters/sections on recurrence thinking, asymptotic growth, and matrix multiplication as an algorithmic object)

Proofs:

  • Prove the rank-nullity theorem in finite dimensions.
  • Prove that matrix multiplication corresponds to composition of linear maps.
  • Prove uniqueness of LU factorization under appropriate assumptions (unit lower-triangular L).
  • Dasgupta enrichment: Prove correctness of recursive matrix multiplication by induction on matrix size. Solve the divide-and-conquer recurrence T(n) = 8T(n/2) + O(n²) by repeated substitution or a recursion tree.

Exercises: Axler Ch. 3 — bases, coordinates, change of basis, dimension counts · T&B Lect. 1–2 — Gaussian elimination, LU factorization, flop counts, triangular solves

Dasgupta enrichment exercises:

  1. Analyze the runtime of naive triple-loop matrix multiplication; derive the O(n³) bound carefully.
  2. Set up the recurrence for recursive (divide-and-conquer) matmul and solve it using the Master Theorem or a recursion tree.
  3. Compare asymptotic work of direct triangular solve vs. matrix-inverse-based solve in a small example.

Compute connection: When benchmarking naive vs. tiled matmul, include the asymptotic discussion in the writeup: why doesn't the O(n³) operation count tell the whole story about cache behavior, and why is arithmetic intensity the more useful measure there?

Deliverable: Explain how matrix multiplication kernels implement composition at scale.

Coding exercise — Core: Implement dense matmul at multiple optimization levels.

  • NumPy: Write naive triple-loop matmul and a blocked CPU matmul. Compare against numpy.matmul. Implement Gaussian elimination or LU for small dense matrices.
  • Metal: Write a naive compute matmul, then a tiled/threadgroup-memory version. Benchmark both. · Reading: MBT — compute pipelines, threadgroups, threadgroup memory, performance-oriented compute examples.
  • Vulkan: Write a compute shader matmul, then a tiled/shared-memory-style version. · Reading: Vulkan Book — compute pipelines, storage buffers, synchronization basics, compute dispatch structure.
  • CUDA: Write a naive matmul kernel, then a tiled shared-memory version. · Reading: CUDA Book — shared memory, thread blocks, memory coalescing, tiled matrix multiplication.
  • Stretch: Benchmark square and rectangular cases. Add simple performance plots.
  • Verify: Max absolute error versus NumPy is tiny · Tiled version outperforms naive for large enough matrices · Non-square cases handled correctly.
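A minimal CPU reference for the naive case, validated against NumPy's BLAS-backed multiply (a sketch; shapes and names are illustrative):

```python
import numpy as np

def matmul_naive(A, B):
    # Triple loop: C[i, j] = sum_k A[i, k] * B[k, j].
    n, m = A.shape
    m2, p = B.shape
    assert m == m2
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))   # deliberately non-square
B = rng.standard_normal((5, 7))

# Validate the reference against numpy.matmul.
assert np.allclose(matmul_naive(A, B), A @ B)

# Matrix multiplication is composition of linear maps: (AB)x = A(Bx).
x = rng.standard_normal(7)
assert np.allclose(matmul_naive(A, B) @ x, A @ (B @ x))
```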

Week 4 — Eigenvalues, Eigenvectors, and Diagonalization

Math: Axler Ch. 5 · Ross Ch. 5

Proofs:

  • Prove that eigenvectors corresponding to distinct eigenvalues are linearly independent.
  • Prove the criterion for diagonalizability in terms of existence of a basis of eigenvectors.
  • Prove continuity of sums, products, and compositions of continuous functions.

Exercises: Axler Ch. 5 — eigenvalues, eigenspaces, diagonalization, invariant subspaces · Ross Ch. 5 — continuity and standard closure properties of continuous functions

Deliverable: Connect diagonalization to repeated application of transformations and PCA motivation.

Coding exercise — Core: Connect eigendecomposition to 3D object rendering and geometric transforms.

  • NumPy: Use numpy.linalg.eig on small matrices. Repeatedly apply matrix powers to vectors. Show eigenvector directions remain invariant up to scale.
  • Metal: Render a rotating cube with a model-view-projection pipeline. · Reading: MBT — 3D rendering setup, depth buffering, transform stacks, camera basics.
  • Vulkan: Render a rotating cube with depth buffer. · Reading: Vulkan Book — depth testing, 3D pipeline setup, uniform-driven transforms.
  • CUDA: GPU batch application of 3×3 or 4×4 transforms to many points. · Reading: CUDA Book — batched vector/matrix kernels, throughput-oriented data-parallel patterns.
  • Stretch: Add perspective projection. Compare diagonalizable vs. defective examples.
  • Verify: Av ≈ λv for chosen eigenvectors · Cube transform chain is correct · Numerical behavior for repeated powers of A is visible.
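The eigenvector-invariance and repeated-powers checks translate almost directly into NumPy (a sketch using a small symmetric matrix, where `eigh` guarantees real results):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric; eigenvalues 1 and 3
w, V = np.linalg.eigh(A)

# Each eigenvector direction is invariant: A v = lambda v.
for lam, v in zip(w, V.T):
    assert np.allclose(A @ v, lam * v)

# Repeated application aligns a generic vector with the dominant eigenvector.
x = np.array([1.0, 0.0])
for _ in range(50):
    x = A @ x
    x /= np.linalg.norm(x)   # normalize each step to avoid overflow
dominant = V[:, np.argmax(w)]
assert np.isclose(abs(x @ dominant), 1.0, atol=1e-6)
```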

Phase 1 · Weeks 5–8

Probability + Functional Thinking

Tool guidance: NumPy first for statistics. CUDA is the strongest choice for pure compute kernels; Metal is also good if staying Apple-native; Vulkan is viable for explicit compute buffer practice.


Week 5 — Probability Spaces and Normed Vector Spaces

Math: BT Ch. 1–2 · BK Ch. 1

Proofs:

  • Prove basic σ-algebra closure properties.
  • Prove De Morgan’s laws in the language of events.
  • Prove that every norm induces a metric.
  • Prove that any normed vector space is a metric space.

Exercises: BT Ch. 1–2 — sample spaces, events, σ-fields, axioms of probability, inclusion-exclusion, conditional probability basics · BK Ch. 1 — norms, metrics, convergence, examples of normed spaces

Deliverable: Write down why L¹, L², and L∞ all matter for numerical/GPU thinking.

Coding exercise — Core: Compute norms and simulate basic probability experiments.

  • NumPy: Simulate dice and coin experiments. Implement L¹, L², and L∞ norms manually. Plot empirical probability convergence.
  • Metal: Row-wise norm kernel over a large matrix. Optionally count threshold events in parallel. · Reading: MBT — compute buffer processing, reductions or batched compute foundations.
  • Vulkan: Compute shader for row norms. · Reading: Vulkan Book — compute buffer iteration, workgroup sizing, storage buffer patterns.
  • CUDA: Batched norm kernel. · Reading: CUDA Book — reduction-adjacent compute kernels, efficient memory access.
  • Stretch: Histogram of norms in high dimension. Add histogram generation.
  • Verify: GPU norm results match CPU · Empirical frequencies stabilize with larger N.
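The manual norms and the empirical-convergence check might look like this in the NumPy reference (a sketch; the 0.01 tolerance is an illustrative choice for N = 100,000):

```python
import numpy as np

def norms(v):
    # L1, L2, and L-infinity norms computed manually.
    return np.sum(np.abs(v)), np.sqrt(np.sum(v * v)), np.max(np.abs(v))

v = np.array([3.0, -4.0])
l1, l2, linf = norms(v)
assert (l1, l2, linf) == (7.0, 5.0, 4.0)
# Standard ordering on R^n: ||v||_inf <= ||v||_2 <= ||v||_1.
assert linf <= l2 <= l1

# Empirical probability of rolling a six stabilizes near 1/6 as N grows.
rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=100_000)
assert abs(np.mean(rolls == 6) - 1 / 6) < 0.01
```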

Week 6 — Random Variables and Linear Operators

Math: BT Ch. 3 · BK (operator sections)

Proofs:

  • Prove linearity of expectation.
  • Prove E[𝟙_A] = P(A).
  • Prove that a bounded linear operator is continuous.
  • Prove equivalence of continuity at one point and continuity everywhere for linear operators.

Exercises: BT Ch. 3 — discrete random variables, expectation, indicator variables, distribution computations · BK — linear operators, boundedness, continuity, operator norm examples

Deliverable: Explain expectation as reduction/aggregation over random outputs of a kernel.

Coding exercise — Core: Work with random variables and build elementwise operator pipelines.

  • NumPy: Simulate discrete random variables. Estimate expectation and variance. Implement elementwise square, exp, sigmoid, and clamp.
  • Metal: Chain elementwise compute kernels over large buffers. · Reading: MBT — multiple compute passes, resource reuse, command encoding patterns.
  • Vulkan: Compute pipeline chain for elementwise ops with descriptor reuse. · Reading: Vulkan Book — compute dispatch chaining, descriptor reuse, buffer ping-pong.
  • CUDA: Elementwise kernels plus a fused kernel version. · Reading: CUDA Book — kernel fusion intuition, launch overhead, data locality.
  • Stretch: Compare fused vs. multi-pass kernels. Measure cost of extra memory traffic.
  • Verify: Empirical expectation approaches analytic expectation · Chained kernels produce same results as CPU · Fused and unfused outputs agree.
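Both Verify items (empirical vs. analytic expectation, fused vs. multi-pass agreement) have short CPU analogues (a sketch; the scalar loop stands in for a fused kernel):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Discrete random variable: values and probabilities.
vals = np.array([0.0, 1.0, 2.0])
probs = np.array([0.5, 0.3, 0.2])
analytic_mean = np.sum(vals * probs)   # 0.7

rng = np.random.default_rng(2)
samples = rng.choice(vals, size=200_000, p=probs)
assert abs(samples.mean() - analytic_mean) < 0.01   # law of large numbers

# Multi-pass (vectorized, one array traversal per op) vs. "fused"
# (one scalar loop doing all ops) produce matching results.
x = rng.standard_normal(1000)
multi_pass = np.clip(sigmoid(np.square(x)), 0.2, 0.8)
fused = np.array([min(max(sigmoid(t * t), 0.2), 0.8) for t in x])
assert np.allclose(multi_pass, fused)
```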

Week 7 — Covariance, Inner Products, and Numerical Stability

Math: BT Ch. 4 · T&B Ch. 3–4

Secondary enrichment: PR — Gaussian beliefs, covariance interpretation, uncertainty propagation intuition

Proofs:

  • Prove Var(X) = E[X²] − (E[X])².
  • Prove covariance bilinearity properties.
  • Prove Cauchy-Schwarz in an inner-product space.
  • Prove a simple conditioning/error bound for solving Ax = b under perturbations.
  • PR enrichment: Prove that a covariance matrix is symmetric positive semidefinite from the definition Σ = E[(X − μ)(X − μ)ᵀ]. Derive how covariance transforms under a linear map: if Y = AX + b, then Cov(Y) = A Cov(X) Aᵀ.

Exercises: BT Ch. 4 — variance, covariance, correlation, jointly distributed random variables · T&B Ch. 3–4 — orthogonality, QR factorization, least squares, conditioning

PR enrichment exercises:

  1. Compute the covariance matrix of a 2D Gaussian by hand from a small set of samples; verify symmetry and positive semidefiniteness.
  2. Given a 2×2 covariance matrix, compute its eigenvectors and eigenvalues; sketch the corresponding uncertainty ellipse and explain what each axis represents geometrically.
  3. Apply a 2D rotation matrix A to a circular Gaussian (identity covariance); compute the resulting covariance A Cov(X) Aᵀ = A Aᵀ and verify it remains isotropic.
  4. Explain in one paragraph: why does the shape of a covariance ellipse matter for sensor fusion and state estimation?

Compute connection: Implement a NumPy notebook that visualizes 2D Gaussian uncertainty ellipses from 2×2 covariance matrices — draw samples, fit the ellipse via eigendecomposition, and show how the ellipse deforms under linear transforms. This will be directly reused in Week 23’s localization exercises.
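The core of that notebook — ellipse axes from an eigendecomposition, and the rotation-invariance check from enrichment exercise 3 — fits in a few lines (a sketch; plotting omitted):

```python
import numpy as np

def ellipse_axes(cov):
    # Principal axes of the 1-sigma uncertainty ellipse: eigenvectors give
    # the directions, sqrt(eigenvalues) the semi-axis lengths.
    w, V = np.linalg.eigh(cov)
    return np.sqrt(w), V

theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Rotating an isotropic Gaussian: Cov(AX) = A I A^T = A A^T = I.
cov = A @ np.eye(2) @ A.T
lengths, axes = ellipse_axes(cov)
assert np.allclose(cov, np.eye(2))
assert np.allclose(lengths, [1.0, 1.0])   # the ellipse stays a circle
```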

Deliverable: Explain why covariance matrices are symmetric positive semidefinite.

Coding exercise — Core: Compute covariance matrices and study floating-point stability.

  • NumPy: Generate correlated data. Compute covariance manually. Compare one-pass vs. two-pass variance formulas. Show catastrophic cancellation example.
  • Metal: Mean reduction kernel then covariance accumulation. Compare float32 and float16 where available. · Reading: MBT — reductions, parallel accumulation, synchronization concepts in compute chapters.
  • Vulkan: Compute shader for mean and covariance with workgroup shared memory. · Reading: Vulkan Book — workgroup shared memory, barriers, reduction patterns.
  • CUDA: Parallel reduction for means then a covariance kernel. · Reading: CUDA Book — reductions, numeric stability, accumulation strategies.
  • Stretch: Add Kahan summation on CPU reference. Compare float16/float32 effects.
  • Verify: Covariance is symmetric within tolerance · Two-pass formulas are more stable on adversarial data · GPU and CPU results are close.
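The one-pass vs. two-pass comparison on adversarial data can be demonstrated concretely (a sketch; the 1e8 offset is chosen so float64 cancellation is visible):

```python
import numpy as np

def var_two_pass(x):
    mu = x.mean()
    return np.mean((x - mu) ** 2)

def var_one_pass(x):
    # Textbook identity E[X^2] - (E[X])^2; prone to catastrophic
    # cancellation when the mean is large relative to the spread.
    return np.mean(x ** 2) - x.mean() ** 2

# Adversarial data: tiny variance riding on a huge offset.
x = 1e8 + np.array([0.0, 1.0, 2.0])
true_var = 2.0 / 3.0

assert abs(var_two_pass(x) - true_var) < 1e-6
# The one-pass formula loses most of its accuracy here.
assert abs(var_one_pass(x) - true_var) > 1e-3
```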

Week 8 — Spectral Decomposition and Principal Component Analysis

Math: Axler-M Ch. 1–3 · linear algebra spectral material

Proofs:

  • Prove the spectral theorem in the finite-dimensional real symmetric case (orthogonal diagonalization of symmetric matrices).
  • Prove that principal components maximize variance subject to orthonormal constraints.
  • Prove measurability of simple indicator-generated functions from early measure-theory definitions.

Exercises: Axler-M Ch. 1–3 — σ-algebras, measurable spaces, measurable functions, integration/simple functions · Axler — orthogonal eigenbases and diagonalization of self-adjoint/symmetric operators

Deliverable: Write a one-page derivation of PCA from covariance + spectral decomposition.

Coding exercise — Core: Implement PCA from covariance to projection.

  • NumPy: Mean-center data. Compute covariance. Eigendecompose. Project to top-k PCs. Study reconstruction error.
  • Metal: Mean-centering and covariance accumulation on GPU; eigensolve on CPU initially. Feed GPU projection using top-k eigenvectors. · Reading: MBT — compute workflows over large buffers, feeding compute outputs into visualization.
  • Vulkan: Same split: GPU covariance, CPU eigensolve first. · Reading: Vulkan Book — compute-to-graphics data flow, storage buffers for matrix data.
  • CUDA: GPU covariance plus optional power iteration for top eigenvector. · Reading: CUDA Book — iterative kernels, reductions, matrix-vector multiplies.
  • Stretch: Implement power iteration. Reconstruct from top-1 or top-2 PCs.
  • Verify: Principal direction visually matches variance direction · Reconstruction error decreases as k increases.
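The full mean-center → covariance → eigendecompose → project chain is short enough to keep as a single reference function (a sketch; anisotropic synthetic data makes the top direction obvious):

```python
import numpy as np

def pca(X, k):
    # Mean-center, form covariance, eigendecompose, project to top-k PCs.
    Xc = X - X.mean(axis=0)
    cov = (Xc.T @ Xc) / (len(X) - 1)
    w, V = np.linalg.eigh(cov)      # eigenvalues in ascending order
    W = V[:, ::-1][:, :k]           # top-k eigenvectors
    return Xc @ W, W

rng = np.random.default_rng(3)
# Anisotropic 2D data: standard deviation 3 along x, 1 along y.
X = rng.standard_normal((5000, 2)) * np.array([3.0, 1.0])

proj, W = pca(X, 1)
# Top principal direction aligns with the x-axis (up to sign).
assert abs(W[0, 0]) > 0.99

# With k equal to the full dimension, reconstruction is exact.
proj2, W2 = pca(X, 2)
Xc = X - X.mean(axis=0)
assert np.allclose(proj2 @ W2.T, Xc)
```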

Phase 2 · Weeks 9–12

Geometry + Graphics Math

Tool guidance: Metal or Vulkan. These are graphics-heavy weeks — a live graphics app is the best environment. Metal is more concise on Apple hardware; Vulkan gives explicit control and is portable.


Week 9 — Coordinate Systems and Geometric Transformations

Math: 3DMP Ch. 1–4 · Pressley Ch. 1

Proofs:

  • Prove that reparametrization of a regular curve preserves its image but changes speed.
  • Prove that tangent vectors transform linearly under smooth coordinate changes.

Exercises: 3DMP Ch. 1–4 — vectors, points vs. directions, affine combinations, matrices, transformations · Pressley Ch. 1 — parametrized curves, tangents, arc length, regularity

Deliverable: Explain world, view, and clip coordinates as successive coordinate changes.

Coding exercise — Core: Build a camera transform pipeline.

  • NumPy: Implement look-at and perspective projection matrices. Transform world points into camera and clip coordinates. Visualize a simple scene projection.
  • Metal: Build an interactive camera with model/view/projection uniforms. · Reading: MBT — cameras, uniforms, input-driven transforms, scene graph style structure.
  • Vulkan: Interactive camera with uniform buffers and frame synchronization. · Reading: Vulkan Book — camera matrices, descriptor updates, frame synchronization.
  • CUDA: GPU projection of very large point clouds. · Reading: CUDA Book — batched geometry transforms, memory throughput basics.
  • Stretch: Add orbit camera and FPS camera modes.
  • Verify: Moving camera changes scene correctly · Near/far clipping behaves sensibly · Points behind camera are handled correctly.
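A perspective-projection reference, in one common (OpenGL-style) convention, can be checked against the Verify items numerically (a sketch; other clip-space conventions differ in the z row):

```python
import numpy as np

def perspective(fov_y, aspect, near, far):
    # OpenGL-style projection matrix; camera looks down -z.
    f = 1.0 / np.tan(fov_y / 2.0)
    return np.array([
        [f / aspect, 0.0,  0.0,                            0.0],
        [0.0,        f,    0.0,                            0.0],
        [0.0,        0.0,  (far + near) / (near - far),    2 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                            0.0],
    ])

P = perspective(np.pi / 2, 1.0, 0.1, 100.0)

def project(p):
    # Homogeneous transform followed by the perspective divide.
    clip = P @ np.append(p, 1.0)
    return clip[:3] / clip[3]

# A point straight ahead on the -z axis maps to the screen center.
center = project(np.array([0.0, 0.0, -1.0]))
assert np.allclose(center[:2], [0.0, 0.0])

# Foreshortening: the same x offset shrinks as the point moves away.
near_pt = project(np.array([1.0, 0.0, -2.0]))
far_pt = project(np.array([1.0, 0.0, -10.0]))
assert abs(far_pt[0]) < abs(near_pt[0])
```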

Week 10 — Curves, Surfaces, and Curvature

Math: Pressley Ch. 2–3

Proofs:

  • Derive the curvature formula for a plane curve parametrized by arc length.
  • Prove that curvature is invariant under rigid motions.

Exercises: Pressley Ch. 2–3 — Frenet frame, curvature, torsion, surfaces, local parametrizations

Deliverable: Explain where curvature shows up in geometry processing and shading intuition.

Coding exercise — Core: Work with parametric curves and simple surfaces.

  • NumPy: Implement Bézier or spline curve evaluation. Approximate tangents and curvature numerically. Generate a parametric surface grid.
  • Metal: Render a mesh from a parametric surface. Animate deformation over time. · Reading: MBT — mesh generation, vertex buffers, indexed drawing, normals.
  • Vulkan: Render a generated mesh with indexed geometry and normal handling. · Reading: Vulkan Book — indexed geometry, mesh data upload, normal handling.
  • CUDA: Parallel evaluation of curve/surface samples over a grid. · Reading: CUDA Book — batched evaluation kernels, multidimensional launch layouts.
  • Stretch: Color the surface by curvature. Add normal computation.
  • Verify: Tangents and normals are visually sensible · Curvature spikes where expected · Surface mesh indexing is correct.
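Curve evaluation plus numerical curvature fits in a short reference (a sketch using De Casteljau; for the quadratic below the exact curvature at t = 0.5 works out to 2, which the finite-difference estimate should reproduce):

```python
import numpy as np

def bezier(ctrl, t):
    # De Casteljau evaluation of a Bezier curve at parameter t in [0, 1].
    pts = np.array(ctrl, dtype=float)
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

def curvature(ctrl, t, h=1e-4):
    # Numerical curvature via central differences on the parametrization:
    # kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2).
    d1 = (bezier(ctrl, t + h) - bezier(ctrl, t - h)) / (2 * h)
    d2 = (bezier(ctrl, t + h) - 2 * bezier(ctrl, t) + bezier(ctrl, t - h)) / h**2
    return abs(d1[0] * d2[1] - d1[1] * d2[0]) / np.linalg.norm(d1) ** 3

# Quadratic Bezier: endpoints are interpolated exactly.
ctrl = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]
assert np.allclose(bezier(ctrl, 0.0), ctrl[0])
assert np.allclose(bezier(ctrl, 1.0), ctrl[2])
assert np.allclose(bezier(ctrl, 0.5), [1.0, 1.0])   # symmetric midpoint
assert np.isclose(curvature(ctrl, 0.5), 2.0, atol=1e-3)
```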

Week 11 — Surface Normals and Illumination Models

Math: FCG Ch. 4–6

Proofs:

  • Derive the reflection vector formula from projection/orthogonal decomposition.
  • Prove Lambert’s cosine law from dot-product geometry.

Exercises: FCG Ch. 4–6 — geometric optics basics, local illumination, BRDF intuition, rasterization/ray geometry

Deliverable: Write the derivation for Lambert and Blinn-Phong in vector notation.

Coding exercise — Core: Implement basic shading models.

  • NumPy: Given positions/normals/light direction, compute Lambert intensity. Add Blinn-Phong specular. Render a simple shaded sphere image on CPU.
  • Metal: Vertex + fragment shaders for Lambert and Blinn-Phong. Render a sphere with light and camera controls. · Reading: MBT — lighting, normals, fragment shading, material parameters.
  • Vulkan: Equivalent lighting pipeline with shader stages and lighting UBOs. · Reading: Vulkan Book — shader stages, lighting UBOs, material pipeline setup.
  • CUDA: Software shading of many fragments/pixels in parallel. · Reading: CUDA Book — image-parallel kernels, per-pixel compute patterns.
  • Stretch: Add multiple lights. Add gamma correction.
  • Verify: Diffuse intensity matches max(dot(n, l), 0) · Specular highlight moves with camera/light · Normals are normalized before shading.
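A per-point CPU shading reference for the Verify checks (a sketch; `shininess=32` is an illustrative default, and the specular term is gated on positive diffuse as is conventional):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def shade(n, l, v, shininess=32.0):
    # Lambert diffuse + Blinn-Phong specular for one surface point.
    n, l, v = normalize(np.asarray(n, float)), normalize(np.asarray(l, float)), normalize(np.asarray(v, float))
    diffuse = max(np.dot(n, l), 0.0)
    h = normalize(l + v)   # Blinn-Phong half vector
    specular = max(np.dot(n, h), 0.0) ** shininess if diffuse > 0 else 0.0
    return diffuse, specular

# Light hitting the surface head-on: full diffuse, full specular.
d, s = shade([0, 0, 1], [0, 0, 1], [0, 0, 1])
assert np.isclose(d, 1.0) and np.isclose(s, 1.0)

# Light behind the surface: no contribution at all.
d, s = shade([0, 0, 1], [0, 0, -1], [0, 0, 1])
assert d == 0.0 and s == 0.0
```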

Week 12 — Topological Concepts in Geometric Spaces

Math: Mendelson Ch. 1–2

Proofs:

  • Prove that arbitrary unions of open sets are open.
  • Prove that finite intersections of open sets are open.
  • Prove that complements of open sets are closed and vice versa.
  • Prove that metric-open sets satisfy the topology axioms.

Exercises: Mendelson Ch. 1–2 — open/closed sets, neighborhoods, closure, interior, limit points, continuity in topological terms

Deliverable: Explain why topology captures structure preserved under continuous deformation.

Coding exercise — Core: Explore continuity-like ideas numerically through UV mapping.

  • NumPy: Parameterize a square or cylinder with UV coordinates. Sample a texture grid and map values to surface points. Visualize seams and distortions.
  • Metal: Apply a texture to a mesh. Show one good UV map and one intentionally seamed UV map. · Reading: MBT — textures, samplers, UV coordinates, image loading.
  • Vulkan: Texture mapping with sampler and image descriptors. · Reading: Vulkan Book — sampled images, samplers, descriptor image bindings.
  • CUDA: Image resampling or UV-sample kernel. · Reading: CUDA Book — general image-parallel kernels.
  • Stretch: Compute Jacobian-like distortion estimates numerically. Add checkerboard texture to expose stretching.
  • Verify: Texture coordinates line up as intended · Seams are explainable · Distortion is visible and measurable.
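The NumPy UV exercise can start from a procedural checkerboard sampled over a cylinder parametrization (a sketch; `tiles=8` is an illustrative choice, picked so the pattern wraps evenly across the u-seam):

```python
import numpy as np

def checkerboard(uv, tiles=8):
    # Sample a procedural checkerboard at UV coordinates in [0, 1]^2.
    ij = np.floor(uv * tiles).astype(int)
    return (ij.sum(axis=-1) % 2).astype(float)

# Parametrize a cylinder: u wraps around the angle, v runs along the height.
# endpoint=False on u so u = 0 and u -> 1 meet at the seam without duplication.
u, v = np.meshgrid(np.linspace(0, 1, 64, endpoint=False),
                   np.linspace(0, 1, 32))
uv = np.stack([u, v], axis=-1)
tex = checkerboard(uv)

assert tex.shape == (32, 64)
assert set(np.unique(tex)) == {0.0, 1.0}
```

Rendering `tex` as an image makes the seam and any stretching immediately visible, which is the point of the checkerboard stretch goal.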

Phase 3 · Weeks 13–16

GPU Algorithms + Numerical Stability

Tool guidance: CUDA is the strongest choice for studying raw parallel algorithm behavior. Metal and Vulkan are both good for GPU algorithm practice on graphics-integrated platforms.


Week 13 — Parallel Reduction Algorithms and Floating-Point Error

Math: T&B Ch. 5–6

Proofs:

  • Prove a simple backward stability statement for summation or triangular solve.
  • Prove/derive a worst-case accumulation error bound for floating-point summation.

Exercises: T&B Ch. 5–6 — floating-point arithmetic, conditioning, stability, orthogonalization/numerical error

Deliverable: Compare naive reduction vs. tree reduction from an error perspective.

Coding exercise — Core: Implement parallel reduction and study summation error.

  • NumPy: Compare left-fold, pairwise, and NumPy sum. Build adversarial floating-point examples where order matters.
  • Metal: Parallel reduction kernel for sum and max. Multi-stage reduction for large arrays. · Reading: MBT — compute optimization, threadgroup reductions, synchronization.
  • Vulkan: Compute reduction kernel with workgroup-local reductions and barriers. · Reading: Vulkan Book — workgroup-local reductions, barriers, multi-pass reduction structure.
  • CUDA: Standard reduction ladder with warp/block reduction. · Reading: CUDA Book — optimized reductions, warp/block reduction concepts, memory coalescing.
  • Stretch: Add max reduction and norm reduction. Compare float16/float32.
  • Verify: Summation order changes result for adversarial values · GPU reduction matches CPU reference within expected tolerance · GPU speedup appears for large enough arrays.
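The "order matters" adversarial case has a compact CPU demonstration in float32 (a sketch; one huge value swamps the many small ones under left-fold order, while the tree order loses far less):

```python
import numpy as np

def sum_left_fold(x):
    # Sequential accumulation in the array's own precision.
    acc = x.dtype.type(0)
    for v in x:
        acc += v
    return acc

def sum_pairwise(x):
    # Tree reduction in float32: error grows like O(log n), not O(n).
    if len(x) <= 2:
        return np.sum(x, dtype=np.float32)
    mid = len(x) // 2
    return sum_pairwise(x[:mid]) + sum_pairwise(x[mid:])

x = np.full(100_000, 1.0, dtype=np.float32)
x[0] = 1e8                       # adversarial: huge head value
exact = 1e8 + 99_999.0

err_fold = abs(float(sum_left_fold(x)) - exact)
err_pair = abs(float(sum_pairwise(x)) - exact)
assert err_pair < err_fold       # tree order accumulates less error here
```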

Week 14 — Prefix Sums and Parallel Scan Algorithms

Proofs:

  • Prove associativity is the key algebraic condition enabling a parallel scan.
  • Prove correctness of inclusive and exclusive scan by induction on tree levels.

Exercises:

  1. Prove correctness of Blelloch scan.
  2. Show why non-associative floating-point addition complicates reproducibility.
  3. Analyze work and depth of scan.

Deliverable: One-page proof + work-depth analysis.

Coding exercise — Core: Implement inclusive and exclusive scan.

  • NumPy: Implement scan sequentially. Use it to solve stream compaction or histogram prefix problems.
  • Metal: Implement Blelloch scan or Hillis-Steele scan. Use scan for stream compaction. · Reading: MBT — advanced compute patterns, multi-pass buffer algorithms.
  • Vulkan: Compute-based scan with multi-dispatch orchestration. · Reading: Vulkan Book — prefix-scan friendly workgroup structure, multi-dispatch orchestration.
  • CUDA: Parallel scan with hierarchical structure for large arrays. · Reading: CUDA Book — scan/prefix-sum patterns, hierarchical scans.
  • Stretch: Use scan to build radix-sort components. Add segmented scan if ambitious.
  • Verify: Inclusive and exclusive outputs are correct · Stream compaction preserves order · Work decomposition is understood.
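The sequential reference and its stream-compaction use case (a sketch; the scatter-by-scanned-index pattern is exactly what the GPU version parallelizes):

```python
import numpy as np

def exclusive_scan(x):
    # Exclusive prefix sum: out[i] = x[0] + ... + x[i-1], out[0] = 0.
    out = np.zeros_like(x)
    out[1:] = np.cumsum(x)[:-1]
    return out

def compact(values, keep):
    # Stream compaction via exclusive scan: the scan of the keep mask
    # gives each surviving element its output position, preserving order.
    idx = exclusive_scan(keep.astype(np.int64))
    out = np.empty(int(keep.sum()), dtype=values.dtype)
    out[idx[keep]] = values[keep]
    return out

values = np.array([5, 1, 8, 3, 9, 2])
keep = values > 3
assert np.array_equal(exclusive_scan(np.array([1, 0, 1, 0, 1, 0])),
                      np.array([0, 1, 1, 2, 2, 3]))
assert np.array_equal(compact(values, keep), np.array([5, 8, 9]))
```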

Week 15 — Sorting Networks and Parallel Sorting Algorithms

Secondary enrichment: Dasgupta — divide-and-conquer and sorting · Sipser — time complexity basics

Proofs:

  • Prove correctness of a comparator.
  • Prove the zero-one principle for sorting networks.
  • Use the zero-one principle to justify correctness of bitonic sort.
  • Dasgupta enrichment: Prove correctness of mergesort by induction on the size of the input array. Prove the recurrence T(n) = 2T(n/2) + O(n) and solve it to get O(n log n).
  • Sipser enrichment — proof sketch: Prove that a simple sorting language is decidable in polynomial time by describing a concrete algorithm and bounding its step count.

Exercises:

  1. Prove a bitonic merge stage works.
  2. Count comparators in a small sorting network.
  3. Compare work/depth with sequential sorting.

Dasgupta enrichment exercises:

  1. Derive the asymptotic runtime for mergesort via the recursion tree.
  2. Compare sequential sorting complexity O(n log n) with bitonic sort’s work O(n log² n) and depth O(log² n) — when does each regime matter?
  3. Write a short comparison of asymptotic work vs. parallel depth: why is minimizing depth the GPU-relevant measure even when it costs extra total work?

Sipser enrichment exercises:

  1. Write a runtime bound for a sequential sorting algorithm by explicitly describing the machine steps in the language of a TM model, then translate back to a practical asymptotic bound.
  2. Write a short note: why does asymptotic complexity not capture all relevant GPU costs? (Identify at least two effects — e.g., memory latency, branch divergence — that are invisible to the O(·) model.)

Compute connection: CPU mergesort is the natural sequential baseline. Implement it and compare its measured runtime to bitonic sort at varying array sizes. Your writeup should include both the “theory cost” (work and depth) and the “hardware cost” (measured time) and discuss why they sometimes disagree.

Deliverable: A clean correctness proof for the bitonic sorting network.

Coding exercise — Core: Implement bitonic sorting network.

  • NumPy: Simulate comparator stages explicitly. Print intermediate arrays for small n.
  • Metal: Bitonic sort compute kernel with barrier placement between stages. · Reading: MBT — compute passes over buffers, synchronization, staged algorithms.
  • Vulkan: Bitonic sort in compute shaders with barrier placement and staged compare-exchange kernels. · Reading: Vulkan Book — repeated compute dispatches, barrier placement, staged compare-exchange kernels.
  • CUDA: Bitonic sort kernel with branch-friendly compare-exchange structure. · Reading: CUDA Book — sorting kernels, compare-exchange structure, branch behavior.
  • Stretch: Sort key-value pairs. Benchmark against a CPU baseline.
  • Verify: Correctness on random arrays and duplicates · Stage ordering is right · Bitonic constraints are understood.
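The comparator stages are easy to simulate explicitly in the NumPy reference (a sketch of the standard power-of-two bitonic network; the `i & k` bit picks each comparator's direction):

```python
import numpy as np

def bitonic_sort(a):
    # Stage-by-stage bitonic sorting network; len(a) must be a power of two.
    a = np.array(a)
    n = len(a)
    k = 2
    while k <= n:                  # size of bitonic runs being merged
        j = k // 2
        while j >= 1:              # compare-exchange distance within a stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

rng = np.random.default_rng(4)
x = rng.integers(0, 100, size=16)   # duplicates allowed
assert np.array_equal(bitonic_sort(x), np.sort(x))
```

Printing `a` after each inner stage (for small n) is the intended way to watch the bitonic runs form and merge.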

Week 16 — Optimized Matrix Multiplication and Memory Hierarchy

Proofs:

  • Prove correctness of blocked matrix multiplication by partitioned-matrix algebra.
  • Derive arithmetic intensity for naive vs. tiled matmul.

Exercises:

  1. Compute memory traffic for naive matmul.
  2. Compute traffic for tiled matmul.
  3. Explain why shared/local memory tiling helps.

Deliverable: Cache/memory analysis note.

Coding exercise — Core: Optimize matmul for locality and throughput.

  • NumPy: Compare naive loops, blocked loops, and NumPy BLAS-backed matmul. Estimate arithmetic intensity.
  • Metal: Tuned tiled matmul using threadgroup memory carefully. Sweep tile sizes. · Reading: MBT — threadgroup memory, compute performance tuning, buffer layout.
  • Vulkan: Tuned tiled compute matmul with workgroup shape experiments. · Reading: Vulkan Book — compute tuning, workgroup shape experiments, storage buffer layout.
  • CUDA: Shared-memory tiled matmul exploring occupancy vs. bandwidth tradeoffs. · Reading: CUDA Book — optimized GEMM foundations, occupancy vs bandwidth tradeoffs.
  • Stretch: Add transposed operand version. Explore alignment/padding effects. Run tile-size sweep.
  • Verify: Correctness versus NumPy · Performance changes with tile size · Tiling reduces global memory traffic conceptually.
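A short NumPy reference for the blocked algorithm, useful as the correctness baseline before any GPU tuning (the block size and sizes below are arbitrary illustrative choices):

```python
import numpy as np

def blocked_matmul(A, B, bs=32):
    """Blocked matmul: C_ij += A_ik @ B_kj over bs-by-bs tiles.
    Correct by partitioned-matrix algebra; on a GPU each tile pair
    would be staged through shared/threadgroup memory."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2
    C = np.zeros((n, p))
    for i in range(0, n, bs):
        for j in range(0, p, bs):
            for k in range(0, m, bs):   # accumulate over the shared dimension
                C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
    return C

rng = np.random.default_rng(0)
A, B = rng.random((96, 80)), rng.random((80, 64))
assert np.allclose(blocked_matmul(A, B, bs=32), A @ B)
```

The arithmetic-intensity argument is visible in the loop structure: naive matmul re-reads a row of A and a column of B per output element, while each loaded tile here is reused roughly `bs` times, raising flops per byte by about a factor of `bs`.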

Phase 4 · Weeks 17–20

Signals + Fourier + Complex Analysis

Tool guidance: NumPy first for DSP correctness. For GPU: Metal or Vulkan for image-processing integration (Weeks 17–19); CUDA for compute-heavy Monte Carlo work (Week 20). Metal/Vulkan are better if you want rendering integration.


Week 17 — Discrete Convolution and Linear Time-Invariant Systems

Math: O&S Ch. 2 · OWN Ch. 2 · CA Ch. 1–2

Proofs:

  • Prove convolution is commutative.
  • Prove convolution is associative.
  • Prove every discrete-time LTI system is represented by convolution with its impulse response.

Exercises: O&S Ch. 2 — discrete convolution, impulse response, causality, linearity/time invariance · OWN Ch. 2 — LTI systems and signal decompositions · CA early chapters — complex algebra, analytic functions, geometric meaning of multiplication

Deliverable: Explain blur as convolution with a finite impulse response kernel.

Coding exercise — Core: Implement discrete convolution for signals and images.

  • NumPy: 1D convolution for signals. 2D convolution for grayscale images. Implement box blur and Gaussian-like blur.
  • Metal: Image blur compute kernel with separable filter for performance. · Reading: MBT — image processing compute shaders, texture read/write, separable filters.
  • Vulkan: Compute blur over images with image resource layout transitions. · Reading: Vulkan Book — image resources, compute image processing, image barriers/layouts.
  • CUDA: Image blur kernel with shared-memory tiling for neighborhood access. · Reading: CUDA Book — stencil/convolution kernels, shared-memory tiling for neighborhoods.
  • Stretch: Compare direct vs. separable convolution. Add edge handling modes.
  • Verify: Small hand-built convolution tests match exact expected outputs · Separable version is faster for separable kernels.

Week 18 — Discrete Fourier Transform and Frequency Analysis

Math: O&S Ch. 4–5 · OWN Ch. 4 · CA residues

Secondary enrichment: Dasgupta — FFT / divide-and-conquer chapter (canonical algorithmic treatment of FFT as a D&C algorithm)

Proofs:

  • Derive the DFT from sampling the DTFT on Nth roots of unity.
  • Prove orthogonality of the complex exponential basis.
  • Prove the convolution-multiplication duality in the discrete setting.
  • Dasgupta enrichment: Prove the recursive decomposition of the DFT into even-index and odd-index sub-problems (the Cooley-Tukey butterfly identity). Derive and solve the recurrence T(n) = 2T(n/2) + O(n) to establish O(n log n) FFT runtime. Prove that the inverse DFT reconstructs the original vector up to a normalization factor.

Exercises: O&S Ch. 4–5 — DFT properties, circular convolution, symmetry, FFT ideas · OWN Ch. 4 — frequency-domain representation · CA — residue-calculation exercises

Dasgupta enrichment exercises:

  1. Work through one hand-computed FFT for n = 4 or n = 8: write down all butterfly stages explicitly.
  2. Compare O(n²) naive DFT vs. O(n log n) FFT: for n = 1024, how many operations does each require?
  3. Trace the butterfly dependencies on paper for n = 8 and explain why the access pattern is GPU-friendly or GPU-unfriendly.

Compute connection: The coding exercise for this week should explicitly include: (1) naive O(n²) DFT, (2) recursive radix-2 FFT, and (3) a runtime comparison plot showing the crossover point where FFT wins.

Deliverable: One-page note explaining frequency-domain interpretation for graphics/signal workloads.

Coding exercise — Core: Implement DFT first, then FFT.

  • NumPy: Implement O(N²) DFT manually. Compare against numpy.fft.fft. Visualize magnitude spectra of simple signals.
  • Metal: Small DFT or radix-2 FFT prototype using buffer ping-pong for stages. · Reading: MBT — compute-heavy multi-stage algorithms, buffer ping-pong, performance debugging.
  • Vulkan: Compute DFT/FFT prototype with staged compute pipelines. · Reading: Vulkan Book — staged compute pipelines, storage buffers, synchronization between FFT passes.
  • CUDA: Radix-2 FFT prototype with shared-memory staging. · Reading: CUDA Book — multi-stage compute, shared-memory staging, memory access patterns.
  • Stretch: 2D FFT for images. Frequency-domain filtering.
  • Verify: DFT matches NumPy within tolerance · Peaks appear where expected for sinusoidal signals · Inverse transform reconstructs original signal.
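The naive DFT and the recursive radix-2 FFT that the compute connection asks for can start from this NumPy sketch, validated against `numpy.fft` (function names are illustrative):

```python
import numpy as np

def dft(x):
    """Naive O(N^2) DFT straight from the definition X[k] = sum_n x[n] e^{-2*pi*i*k*n/N}."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    W = np.exp(-2j * np.pi * np.outer(n, n) / len(x))   # full DFT matrix
    return W @ x

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; length must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    even, odd = fft(x[0::2]), fft(x[1::2])              # even/odd decomposition
    tw = np.exp(-2j * np.pi * np.arange(N // 2) / N)    # twiddle factors
    return np.concatenate([even + tw * odd, even - tw * odd])

rng = np.random.default_rng(0)
x = rng.random(64)
assert np.allclose(dft(x), np.fft.fft(x))
assert np.allclose(fft(x), np.fft.fft(x))
```

Timing both over a sweep of N gives the crossover plot for the runtime comparison; the `even + tw*odd` / `even - tw*odd` pair is exactly the butterfly traced in the enrichment exercises.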

Week 19 — Sampling Theory and Aliasing

Math: O&S Ch. 7 · OWN Ch. 7

Proofs:

  • Prove the Nyquist criterion in the bandlimited idealized setting.
  • Prove/derive why undersampling creates aliasing via spectral replication.

Exercises: O&S Ch. 7 — sampling, reconstruction, aliasing, ideal low-pass interpolation · OWN Ch. 7 — reconstruction and sampling rates

Deliverable: Explain aliasing in images and rendered scenes in both spatial and frequency language.

Coding exercise — Core: Show aliasing visually and numerically.

  • NumPy: Sample sinusoids at varying rates. Downsample images with and without prefiltering. Plot the difference.
  • Metal: Procedural checker/stripe shader showing aliasing. Vary frequency and show aliasing in motion. · Reading: MBT — textures, sampling, mipmapping or filtering-related sections.
  • Vulkan: Equivalent aliasing demo with sampling, image filtering, and mip levels. · Reading: Vulkan Book — sampling, image filtering, mip levels, texture pipeline behavior.
  • CUDA: Downsampling or prefilter compute kernels. · Reading: CUDA Book — image resampling and filtering style kernels.
  • Stretch: Add mipmap-like image pyramid demo. Compare nearest vs. bilinear vs. prefiltered downsampling.
  • Verify: Undersampling causes visible false frequencies · Prefiltering reduces aliasing · Demo clearly shows Nyquist intuition.
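A numerical sketch of aliasing as spectral folding: at fs = 10 Hz (Nyquist 5 Hz), a 9 Hz sine is sample-for-sample indistinguishable from a negated 1 Hz sine. The specific frequencies are illustrative choices.

```python
import numpy as np

fs = 10.0                                # sample rate; Nyquist frequency is 5 Hz
t = np.arange(100) / fs
x_high = np.sin(2 * np.pi * 9.0 * t)     # 9 Hz tone, above Nyquist
x_alias = -np.sin(2 * np.pi * 1.0 * t)   # the 1 Hz alias it folds onto
assert np.allclose(x_high, x_alias, atol=1e-8)

# the DFT agrees: the only spectral peak sits at the alias frequency, 1 Hz
spec = np.abs(np.fft.rfft(x_high))
freqs = np.fft.rfftfreq(len(t), d=1 / fs)
assert np.isclose(freqs[np.argmax(spec)], 1.0)
```

This is the replication argument made concrete: 9 Hz = 10 − 1 Hz, so the sampled spectrum places a mirrored copy at 1 Hz.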

Week 20 — Monte Carlo Methods in Rendering

Secondary enrichment (optional): Sipser — probabilistic computation (if your edition includes it; this is a theory lens on randomized algorithms, not required reading)

Proofs:

  • Prove that the sample mean estimator is unbiased for the integral/expectation.
  • Prove that variance of the sample mean scales like 1/N for iid samples.
  • Sipser enrichment — proof sketch (optional): Prove correctness or bounded-error guarantees for a toy randomized decision procedure (e.g., a randomized primality test framed as a decision problem); focus on the structure of the error-probability argument.

Exercises:

  1. Derive the unbiased estimator for pixel radiance.
  2. Compare two estimators with different variances.
  3. Show how random variable choice changes convergence.

Sipser enrichment exercises (optional):

  1. Compare “randomized algorithm as numerical estimator” (Monte Carlo rendering) vs. “randomized algorithm as decision procedure” (BPP): what is the analog of “error probability” in each setting?
  2. Write a one-page note distinguishing Monte Carlo rendering from complexity-theoretic randomized computation: what are the respective goals, success criteria, and resource measures?

Compute connection: No extra code required for the Sipser portion — this is conceptual enrichment. The rendering prototype from this week is the right hands-on complement.

Deliverable: A clean proof of unbiasedness for the path-tracing estimator.

Coding exercise — Core: Write a tiny Monte Carlo renderer or integration demo.

  • NumPy: Estimate integrals by random sampling. Build a tiny CPU path tracer or hemisphere-sampling estimator.
  • Metal: Progressive simple path tracer or toy light transport estimator with compute-driven accumulation. · Reading: MBT — ray tracing or advanced rendering sections if included; otherwise compute-driven rendering accumulation.
  • Vulkan: Compute-driven progressive renderer or toy path tracer. · Reading: Vulkan Book — compute/graphics integration, progressive accumulation, advanced rendering architecture.
  • CUDA: Path tracing prototype with per-pixel random state management. · Reading: CUDA Book — Monte Carlo parallelism, random state management, per-pixel kernels.
  • Stretch: Add progressive rendering. Add cosine-weighted hemisphere sampling.
  • Verify: Estimate converges as sample count increases · Noise falls roughly with more samples · Accumulation is unbiased in the simple setup.
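A tiny NumPy demo of both proofs at once (unbiasedness and 1/N variance scaling), using the integral ∫₀¹ x² dx = 1/3 as a target; the integrand, sample counts, and trial counts are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x * x                # integral of x^2 over [0, 1] is 1/3
truth = 1.0 / 3.0

def mc_mean(n):
    """Sample-mean estimator of the integral with n uniform samples."""
    return f(rng.random(n)).mean()

# variance of the sample mean scales like 1/N: check across repeated trials
trials = 2000
var_small = np.var([mc_mean(100) for _ in range(trials)])
var_large = np.var([mc_mean(10_000) for _ in range(trials)])
assert var_large < var_small / 50     # 100x the samples gives ~100x less variance

# unbiasedness: averaging many independent estimates recovers the truth
assert abs(np.mean([mc_mean(100) for _ in range(trials)]) - truth) < 3e-3
```

The same structure (many independent estimates of the same integral) is what the GPU variants parallelize.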

Phase 5 · Weeks 21–24

Probability Deep Dive

Tool guidance: NumPy first for estimator correctness. CUDA is strongest for heavy Monte Carlo compute. Metal/Vulkan are better if you want rendering integration for the sampling experiments.


Week 21 — Law of Large Numbers and Modes of Convergence

Math: BT (LLN section)

Proofs:

  • Prove or sketch weak LLN under iid finite-variance assumptions using Chebyshev.
  • Prove convergence in probability for the sample mean.

Exercises: BT probability chapters — Chebyshev, convergence in probability, sample-average consistency

Deliverable: Explain why Monte Carlo eventually stabilizes but still looks noisy at finite N.

Coding exercise — Core: Empirically study convergence across many parallel trials.

  • NumPy: Run repeated Monte Carlo experiments. Plot sample mean against N. Show confidence-like spread over many trials.
  • Metal: Parallel generation of many trial estimates, aggregate on CPU for plotting. · Reading: MBT — batched compute workloads and reductions.
  • Vulkan: Parallel multi-trial estimator with large-batch compute dispatch. · Reading: Vulkan Book — large-batch compute dispatch and result aggregation.
  • CUDA: Embarrassingly parallel simulation of many trials. · Reading: CUDA Book — embarrassingly parallel simulation structure.
  • Stretch: Compare bounded vs. heavy-tailed examples.
  • Verify: Sample means stabilize with larger N · Variability across trials shrinks · Pathological distributions converge more awkwardly.

Week 22 — Variance Reduction Techniques in Simulation

Secondary enrichment: Dasgupta — randomized algorithms / probabilistic algorithm ideas · PR — particle-filter intuition, importance weights, resampling motivation

Proofs:

  • Prove unbiasedness of importance sampling under the support condition.
  • Prove variance reduction for control variates in the optimal-coefficient formula.
  • Dasgupta enrichment: Prove unbiasedness of a simple randomized estimator (e.g., random sampling for approximate counting or integration). Prove the expectation of a randomized algorithm’s output in a toy example (e.g., expected runtime of randomized QuickSort on a random input).

Exercises:

  1. Derive importance sampling estimator.
  2. Derive antithetic variance relation in a simple example.
  3. Derive optimal control-variate coefficient.

Dasgupta enrichment exercises:

  1. Compare a deterministic vs. randomized strategy on a toy search or integration problem: compute the expected cost and compare to the deterministic worst case.
  2. Compute expected estimator value and variance in a small worked example using a randomized algorithm (not Monte Carlo rendering — pick a discrete toy problem from Dasgupta).
  3. Write a short note: why does variance matter in both simulation (rendering quality) and randomized algorithms (algorithm reliability), and how do the respective error-reduction strategies differ?

Compute connection: Add one small randomized algorithm experiment alongside the importance sampling demo — for instance, randomized selection or a toy probabilistic estimator — and compare its convergence rate to the importance sampling results.

PR enrichment — proofs:

  • Derive normalized importance weights: given unnormalized weights w̃ᵢ = p(xᵢ)/q(xᵢ), prove that the normalized weights wᵢ = w̃ᵢ / Σⱼ w̃ⱼ sum to 1.
  • Derive why weight degeneracy (a few particles carrying nearly all weight) motivates resampling: show that effective sample size ESS ≈ 1 / Σwᵢ² collapses to 1 in the degenerate case.

PR enrichment exercises:

  1. Work through a particle filter update by hand with 6 particles: assign unnormalized weights from a simple likelihood, normalize, compute ESS.
  2. Compare the weight distributions under a good proposal (close to the true posterior) and a bad proposal (far from it); explain the variance implications.
  3. Implement multinomial resampling from a weighted particle set in NumPy and verify that the resulting empirical distribution approximates the target.

PR compute connection: Implement a 1D particle filter in NumPy: propagate particles through a noisy motion model, weight by a noisy sensor likelihood, normalize, and resample. This is the core computational primitive for Week 23 and 24.
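A minimal sketch of that 1D particle filter, under toy assumptions (Gaussian motion noise, Gaussian sensor likelihood, multinomial resampling; all constants illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def pf_step(particles, u, z, motion_std=0.5, sensor_std=1.0):
    """One predict-weight-resample cycle of a 1D bootstrap particle filter."""
    n = len(particles)
    # predict: push every particle through a noisy motion model
    particles = particles + u + rng.normal(0.0, motion_std, n)
    # weight: Gaussian sensor likelihood p(z | x), then normalize
    w = np.exp(-0.5 * ((z - particles) / sensor_std) ** 2)
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)              # effective sample size
    # resample (multinomial) to counter weight degeneracy
    return particles[rng.choice(n, size=n, p=w)], ess

# a robot walks right one unit per step; we observe its position noisily
true_x, particles = 0.0, rng.normal(0.0, 2.0, 1000)
for _ in range(20):
    true_x += 1.0
    z = true_x + rng.normal(0.0, 1.0)
    particles, ess = pf_step(particles, 1.0, z)

assert abs(particles.mean() - true_x) < 2.0   # belief tracks the true state
assert 1.0 <= ess <= len(particles)           # ESS always lies in [1, n]
```

Watching `ess` over time with a deliberately bad proposal (large `motion_std`, tight `sensor_std`) shows the degeneracy that motivates resampling.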

Deliverable: One-page comparison of crude MC vs. importance sampling.

Coding exercise — Core: Compare crude Monte Carlo to variance-reduced methods.

  • NumPy: Implement crude MC and importance sampling for one integral. Compare estimator variance across repeated runs.
  • Metal: Parallel weighted-sample evaluator with compute accumulation pipeline. · Reading: MBT — compute accumulation pipelines, random-sample workloads.
  • Vulkan: Compute accumulation of batched weighted estimates. · Reading: Vulkan Book — compute accumulation, batched weighted estimates.
  • CUDA: Importance sampling kernel with RNG/state layout. · Reading: CUDA Book — weighted Monte Carlo kernels, RNG/state layout.
  • Stretch: Use a rendering-inspired integrand. Try multiple proposal distributions.
  • Verify: Variance changes substantially with proposal choice · Bad proposal can hurt · Unbiasedness is preserved when implemented correctly.
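A NumPy sketch comparing crude Monte Carlo to importance sampling on a rare-event probability, where proposal choice matters enormously. The N(3, 1) proposal and the tail threshold are illustrative; unbiasedness holds because the proposal is positive wherever the integrand is nonzero (the support condition from the proofs above).

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
truth = 0.5 * erfc(3 / sqrt(2))        # P(X > 3) for X ~ N(0,1), about 1.35e-3

n = 20_000
# crude Monte Carlo: very few of the n samples ever land in the tail
x = rng.standard_normal(n)
crude = (x > 3).mean()

# importance sampling from q = N(3, 1), reweighted by w = p/q
y = rng.standard_normal(n) + 3.0
w = np.exp(4.5 - 3.0 * y)              # p(y)/q(y) for these two Gaussians
is_est = np.mean((y > 3) * w)

# same sample budget, far lower variance: IS lands within a few percent
assert abs(is_est - truth) / truth < 0.1
```

Repeating both estimators over many seeds and comparing empirical variances makes the "bad proposal can hurt" verification item concrete: shift the proposal mean far from 3 and watch the variance blow up.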

Week 23 — Markov Chains and Stochastic Processes

Secondary enrichment: PR — Bayes filters, Markov assumption, transition models, hidden-state estimation

Proofs:

  • Prove that a stationary distribution satisfies πP = π.
  • Prove that detailed balance implies stationarity.
  • PR enrichment: Derive the Bayes filter recursion in discrete form: the prediction step bel⁻(xₜ) = Σ p(xₜ | xₜ₋₁, uₜ) bel(xₜ₋₁) and the correction step bel(xₜ) ∝ p(zₜ | xₜ) bel⁻(xₜ). Show how the Markov assumption makes this recursion well-defined: the future is conditionally independent of the past given the present state.

Exercises: Transition matrices, n-step transitions, communicating classes, stationary distributions

PR enrichment exercises:

  1. Write a discrete robot localization model with a 5-cell grid: specify a transition matrix (motion model) and a sensor likelihood table (observation model).
  2. Do one full predict-update cycle by hand: compute the predicted belief after a motion step, then the corrected belief after an observation.
  3. Run the Bayes filter for 5–10 steps in NumPy; show how the belief concentrates over the true cell as more observations arrive.
  4. Explain the difference between the transition probability p(xₜ | xₜ₋₁) and the sensor likelihood p(zₜ | xₜ) — what does each model, and where does each come from in a real system?

PR compute connection: Implement a histogram filter (discrete Bayes filter) in NumPy for 1D grid localization: represent belief as a normalized probability vector, apply motion and observation updates iteratively, and visualize belief concentration. GPU option: batch the belief update across many independent robot trials in parallel.


Deliverable: Explain random walk on a graph and long-run behavior.

Coding exercise — Core: Simulate Markov chains and random walks.

  • NumPy: Build transition matrices. Simulate chains. Estimate empirical stationary distribution. Implement 1D or 2D random walk.
  • Metal: Many random walks in parallel with state update kernels. · Reading: MBT — buffer-based simulation loops, state update kernels.
  • Vulkan: Random-walk state update compute pipeline with iterative state updates. · Reading: Vulkan Book — compute simulation framework, iterative state updates.
  • CUDA: Massive random-walk simulation with random-state management. · Reading: CUDA Book — simulation kernels, random-state management.
  • Stretch: Add absorbing states and hitting-time experiments. Compare empirical and matrix-power distributions.
  • Verify: Row-stochastic transitions sum to 1 · Long-run empirical distribution matches theory in easy cases · Random-walk intuition is visible.
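A NumPy sketch tying πP = π to long-run simulation, on a hand-picked 3-state chain (the matrix is arbitrary but irreducible and aperiodic, so the stationary distribution is unique and the chain mixes):

```python
import numpy as np

# a small row-stochastic transition matrix: rows sum to 1
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
assert np.allclose(P.sum(axis=1), 1.0)

# stationary distribution: left eigenvector for eigenvalue 1, since pi P = pi
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

# long-run visit frequencies of a simulated chain should match pi
rng = np.random.default_rng(0)
cum = P.cumsum(axis=1)                 # per-row CDFs for inverse-CDF sampling
state, counts = 0, np.zeros(3)
for r in rng.random(100_000):
    state = np.searchsorted(cum[state], r)
    counts[state] += 1
emp = counts / counts.sum()
assert np.allclose(emp, pi, atol=0.02)
```

The GPU versions run many such chains in parallel; the per-row CDF plus `searchsorted` trick is the same inverse-CDF sampling a state-update kernel would use.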

Week 24 — Advanced Sampling Techniques for Rendering

Secondary enrichment: PR — particle filters, sequential Monte Carlo, sample impoverishment and resampling

Proofs:

  • Prove multiple-importance-sampling estimator remains unbiased for a standard balance heuristic setup.
  • PR enrichment: Derive the particle filter update equations in simple form: propagate each particle through the motion model, weight by the observation likelihood, normalize, and resample. Show that after resampling, the empirical distribution is a consistent estimator of the posterior belief as particle count grows.

Exercises:

  1. Derive a two-technique MIS estimator.
  2. Compare variance under poor proposal choice.
  3. Explain when stratification helps.

PR enrichment exercises:

  1. Implement multinomial and systematic resampling in NumPy; compare the variance of the resulting particle sets on a small example.
  2. Simulate a 1D or 2D particle filter for a robot moving in a ring or corridor; track particle spread over time as the number of particles varies (10, 100, 1000).
  3. Observe sample impoverishment: run the particle filter with too few particles over many steps and show how belief diversity collapses. Explain why this is the particle-filter analog of high variance in Monte Carlo estimation.

PR compute connection: Implement a full 1D or 2D particle filter with resampling in NumPy. GPU option (CUDA preferred): parallelize likelihood evaluation and particle propagation across all particles simultaneously — this is where GPU acceleration becomes directly motivated by a real estimation workload.

Deliverable: Improved path tracing note.

Coding exercise — Core: Improve the Week 20 renderer/integrator with better sampling.

  • NumPy: Add stratified sampling or MIS in toy form. Compare noise and convergence.
  • Metal: Add better sampling strategy to the rendering prototype with progressive accumulation and per-pixel random states. · Reading: MBT — advanced rendering accumulation / sampling-related sections.
  • Vulkan: Same sampling strategy in a progressive renderer built for iterative refinement. · Reading: Vulkan Book — rendering architecture for iterative refinement.
  • CUDA: Same in CUDA path tracer with improved sampling strategy. · Reading: CUDA Book — improved sampling in Monte Carlo render kernels.
  • Stretch: Add next-event estimation. Compare image variance side by side.
  • Verify: Same sample budget produces cleaner results · Improved estimator still matches reference mean · Changes are measurable, not just aesthetic.
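A small NumPy experiment showing the "when stratification helps" answer empirically: for a smooth integrand, jittered (stratified) sampling at the same budget slashes variance versus iid sampling. The integrand and sample counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)        # integral over [0, 1] is 2/pi
truth = 2 / np.pi
n, trials = 64, 2000

crude_err, strat_err = [], []
for _ in range(trials):
    # crude: n iid uniform samples
    crude_err.append(f(rng.random(n)).mean() - truth)
    # stratified: one jittered sample per cell of a regular grid
    u = (np.arange(n) + rng.random(n)) / n
    strat_err.append(f(u).mean() - truth)

crude_var = np.var(crude_err)
strat_var = np.var(strat_err)
assert strat_var < crude_var / 10      # stratification wins for smooth f
```

Both estimators have the same mean (unbiasedness is preserved), so the improvement shows up purely in the spread across trials — exactly the "measurable, not just aesthetic" criterion above.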

Phase 6 · Weeks 25–28

Optimization + Information Theory

Tool guidance: NumPy primary. CUDA is the best choice if you want big batched numerical kernels. Metal/Vulkan are optional for objective evaluation and heatmap rendering.


Week 25 — Convex Sets and Convex Functions

Math: Boyd Ch. 2

Secondary enrichment: Dasgupta — linear programming chapter/section · PR — estimation-as-optimization viewpoint, least-squares intuition

Proofs:

  • Prove intersection of convex sets is convex.
  • Prove epigraph characterization of convexity.
  • Prove Jensen’s inequality in finite-dimensional form.
  • Dasgupta enrichment: Prove that the feasible set of a linear program (intersection of halfspaces) is convex. Prove that any local optimum of a linear objective over a convex feasible set is also a global optimum.
  • PR enrichment: Derive least-squares state estimation for a linear Gaussian measurement model: given measurements z = Hx + v with v ~ N(0, R), show that minimizing the squared residual ‖z − Hx‖² yields the normal equations HᵀHx = Hᵀz and relate the solution to the maximum likelihood estimate.

Exercises: Boyd Ch. 2 — affine sets, convex hulls, hyperplanes, convex functions, epigraphs

Dasgupta enrichment exercises:

  1. Put a small 2-variable optimization problem into LP standard form; write out the constraints as a matrix inequality Ax ≤ b.
  2. Draw the feasible region for a 2D LP and identify the candidate optimal vertices by inspection.
  3. Relate the geometry of the feasible polytope (its vertices and edges) to the optimization result — why does the optimum always occur at a vertex for a linear objective?

PR enrichment exercises:

  1. Solve a 1D least-squares localization toy problem: given 3 noisy position measurements, write and solve the normal equations; compare to the sample mean.
  2. Relate the negative log-likelihood of a Gaussian measurement model to a squared error objective — show algebraically that maximizing the likelihood is the same as minimizing the weighted squared residual.

Compute connection: Add a 2D LP visualization: plot the feasible polytope, the objective function level sets, and the optimal vertex in a NumPy/Matplotlib notebook. Also implement a NumPy least-squares localization demo: generate noisy observations of a hidden 2D position and recover the estimate via the normal equations.

Deliverable: Explain why least-squares and many ML objectives are convex/nonconvex.

Coding exercise — Core: Visualize convex optimization landscapes.

  • NumPy: Implement quadratic objectives. Plot level sets in 2D. Check convexity via midpoint test examples. Run gradient descent.
  • Metal: GPU-accelerate batched objective/gradient evaluation or render 2D heatmaps. · Reading: MBT — compute pipelines for numerical workloads.
  • Vulkan: Batched objective evaluation with compute shaders. · Reading: Vulkan Book — storage-buffer numerical compute.
  • CUDA: Batched gradient evaluation kernel. · Reading: CUDA Book — optimization-friendly batched kernels.
  • Stretch: Compare convex and nonconvex examples. Add projected steps onto a simple convex set.
  • Verify: GD converges on well-conditioned convex quadratics · Step size affects convergence · Geometry of convexity is visible.
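A minimal NumPy sketch covering the midpoint convexity check and gradient descent on a well-conditioned convex quadratic (the matrix, step-size rule, and iteration count are illustrative choices):

```python
import numpy as np

# convex quadratic f(x) = 1/2 x^T A x - b^T x with A positive definite;
# the minimizer solves A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b)
f = lambda x: 0.5 * x @ A @ x - b @ x

# midpoint convexity spot-check: f((u+v)/2) <= (f(u)+f(v))/2
rng = np.random.default_rng(0)
u, v = rng.standard_normal(2), rng.standard_normal(2)
assert f((u + v) / 2) <= (f(u) + f(v)) / 2 + 1e-12

# gradient descent with step size 1/lambda_max(A) converges linearly here
x, lr = np.zeros(2), 1.0 / np.linalg.eigvalsh(A).max()
for _ in range(500):
    x -= lr * (A @ x - b)      # gradient of f is A x - b
assert np.allclose(x, x_star, atol=1e-8)
```

Sweeping `lr` past 2/λ_max makes the "step size affects convergence" verification item visible: the iterates diverge.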

Week 26 — Gradient-Based Optimization Methods

Math: Boyd Ch. 3–4

Secondary enrichment: PR — optimization-based estimation, MAP-style intuition

Proofs:

  • Derive gradient descent from first-order Taylor approximation.
  • Prove that steepest descent direction under the Euclidean norm is −∇f.
  • Prove first-order optimality condition for differentiable convex functions.
  • PR enrichment: Derive the MAP (maximum a posteriori) objective for a simple Gaussian prior + Gaussian likelihood model: show that −log p(x | z) ∝ (z − Hx)ᵀR⁻¹(z − Hx) + (x − μ₀)ᵀP⁻¹(x − μ₀), and that minimizing this is equivalent to regularized least squares. Derive the gradient of this objective with respect to x.

Exercises: Boyd Ch. 3–4 — gradients, Hessians, unconstrained minimization, norm interpretations, least-squares style objectives

PR enrichment exercises:

  1. Write the MAP objective for a 1D localization problem with Gaussian prior and Gaussian sensor noise; solve analytically and verify the result interpolates between the prior mean and the measurement.
  2. Implement gradient descent for a 2D pose-fitting problem in NumPy: fit a position estimate to noisy range observations by iterating on the gradient of the squared-residual objective.
  3. Compare the closed-form least-squares solution to the gradient-descent solution; verify they agree and compare convergence speed as a function of step size.

Deliverable: Write the derivation of the SGD update and its deterministic ancestor.

Coding exercise — Core: Implement deterministic GD and SGD.

  • NumPy: Fit linear regression with GD and SGD. Compare batch size effects. Track loss over iterations.
  • Metal: Compute batch gradients on GPU, optimizer update on CPU first. Multi-pass reduction for gradient. · Reading: MBT — batched compute, reductions, multi-pass update structure.
  • Vulkan: Compute passes for objective + gradient evaluation. · Reading: Vulkan Book — compute passes for objective + gradient evaluation.
  • CUDA: SGD mini-batch kernel with reduction of partial gradients. · Reading: CUDA Book — mini-batch compute, reduction of partial gradients.
  • Stretch: Add momentum or Adam. Compare convergence traces.
  • Verify: SGD is noisier but often cheaper per step · Learning rate matters dramatically · GPU batch gradient matches CPU reference.
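A NumPy sketch of mini-batch SGD on synthetic linear regression, checked against the closed-form least-squares solution (data shapes, learning rate, and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data: y = X w_true + small noise
n, d = 1000, 3
X = rng.standard_normal((n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(n)

def sgd(batch=32, lr=0.05, epochs=50):
    """Mini-batch SGD on the mean-squared-error objective."""
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle each epoch
        for s in range(0, n, batch):
            idx = order[s:s + batch]
            grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
    return w

w_sgd = sgd()
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]        # closed-form reference
assert np.allclose(w_sgd, w_ls, atol=0.02)
```

Varying `batch` from 1 to `n` interpolates between pure SGD and deterministic GD, which is the batch-size comparison the core exercise asks for; the per-batch `grad` reduction is what the GPU variants compute.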

Week 27 — Duality Theory and Constrained Optimization

Math: Boyd Ch. 5

Secondary enrichment: Dasgupta — remaining LP material, duality intuition · Sipser — NP-completeness / polynomial-time reductions

Proofs:

  • Prove weak duality.
  • Derive the Lagrange dual for one simple constrained optimization problem.
  • Prove KKT conditions in a standard differentiable convex case, or state them carefully and verify on examples.
  • Dasgupta enrichment: Prove weak duality in a finite-dimensional LP setting directly from the primal-dual formulation. Verify complementary-slackness-style conditions on a tiny hand-worked example.
  • Sipser enrichment: Prove one simple polynomial-time reduction in full detail (e.g., 3-SAT ≤p CLIQUE or a simpler reduction from your edition). Prove that if A ≤p B and B ∈ P, then A ∈ P.

Exercises: Boyd Ch. 5 — Lagrangians, dual functions, Slater-type conditions, KKT verification, simple constrained problems

Dasgupta enrichment exercises:

  1. Solve one small LP by hand using the simplex method or by inspecting vertices.
  2. Write the primal and dual for one toy LP problem; verify that the dual objective lower-bounds the primal.
  3. Check that objective values match at the optimum (strong duality) in a worked 2-variable example.

Sipser enrichment exercises:

  1. Work through one classic NP-completeness reduction carefully, following the proof structure in the text.
  2. Identify which optimization problems from this course live in P (e.g., convex QP, LP) and which are NP-hard in their general combinatorial form (e.g., integer programming, graph coloring).
  3. Write a short note: why does convexity serve as such a sharp dividing line between tractable and intractable optimization?

Compute connection: Add a primal/dual checker in NumPy: given a small LP in standard form, compute primal and dual objective values at given points and verify the duality gap. Optional: implement brute-force search for a tiny NP-hard toy problem (e.g., 3-coloring a small graph) and contrast the cost with solving its LP relaxation.

Deliverable: One worked primal-dual example from start to finish.

Coding exercise — Core: Solve a simple constrained problem numerically.

  • NumPy: Implement projected gradient descent for box constraints. Solve a tiny quadratic program. Compare primal objective and dual lower bound.
  • Metal: Batched projection onto box constraints and objective evaluation. · Reading: MBT — numerical compute workflows.
  • Vulkan: Batched candidate evaluation and projection. · Reading: Vulkan Book — batched candidate evaluation / projections.
  • CUDA: Projection + objective kernels for large candidate sets. · Reading: CUDA Book — batched constrained-step kernels.
  • Stretch: Visualize feasible region in 2D. Add Lagrange multiplier iteration in a simple example.
  • Verify: Projected iterates remain feasible · Dual bound behaves as expected · KKT intuition becomes computationally concrete.

Week 28 — Entropy, Divergence, and Information Measures

Math: C&T Ch. 2–3

Secondary enrichment: Dasgupta — greedy algorithms section, especially Huffman coding

Proofs:

  • Prove nonnegativity of KL divergence.
  • Prove entropy is maximized by the uniform distribution over a finite set.
  • Prove chain rule for entropy.
  • Dasgupta enrichment: Prove prefix-freeness of a Huffman-style code construction in the form used in the text. Prove at a proof-sketch level that the greedy merge step in Huffman’s algorithm preserves optimal-substructure: the two least-frequent symbols should always be at the deepest level of an optimal code.

Exercises: C&T Ch. 2–3 — entropy, joint entropy, conditional entropy, mutual information, KL divergence, data-processing manipulations

Dasgupta enrichment exercises:

  1. Build a Huffman tree by hand for a small alphabet with a given probability distribution (4–6 symbols).
  2. Compute the average code length of the Huffman code and compare it to the entropy of the distribution.
  3. Show that the average code length of the Huffman code satisfies H(X) ≤ L̄ < H(X) + 1, and verify this numerically on your example.
  4. Write a short note: why does Huffman coding connect entropy (an information-theoretic quantity) to greedy algorithm design (an algorithmic technique)?

Compute connection: Implement Huffman coding as a small side project: priority-queue-based tree construction, prefix-code table generation, encode and decode a toy string. Report average code length vs. entropy.
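A sketch of that side project in Python: priority-queue-based tree construction, code-table generation, and a numerical check of the Shannon bound (the example string is an arbitrary choice):

```python
import heapq
import math
from collections import Counter

def huffman_code(freqs):
    """Greedy Huffman construction: repeatedly merge the two lightest trees.
    Heap entries carry an insertion counter so trees themselves are never compared."""
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tick, (t1, t2)))
        tick += 1
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: recurse into children
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: record the codeword
            code[tree] = prefix or "0"    # 1-bit code for a 1-symbol alphabet
    walk(heap[0][2], "")
    return code

text = "abracadabra"
freqs = Counter(text)
code = huffman_code(freqs)
avg_len = sum(freqs[s] * len(code[s]) for s in freqs) / len(text)
H = -sum(w / len(text) * math.log2(w / len(text)) for w in freqs.values())
assert H <= avg_len < H + 1     # Shannon bound on optimal prefix-code length
```

Decoding is a walk down the same tree bit by bit; the prefix-free property guarantees the walk is unambiguous.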

Deliverable: Explain why entropy and KL divergence are useful loss/measurement quantities.

Coding exercise — Core: Compute entropy, cross-entropy, and KL divergence over large batches.

  • NumPy: Implement entropy for discrete distributions. Implement KL(p ‖ q) with care around zeros. Compare and plot.
  • Metal: Per-row entropy/KL compute kernel as a batched reduction exercise. · Reading: MBT — reductions and per-buffer statistical kernels.
  • Vulkan: Batched reduction-like numerical kernel for entropy/KL. · Reading: Vulkan Book — batched reduction-like numerical kernels.
  • CUDA: Efficient row-wise reductions with log-heavy compute. · Reading: CUDA Book — efficient row-wise reductions and log-heavy compute.
  • Stretch: Add softmax + cross-entropy pipeline. Interpret classification-style loss examples.
  • Verify: Entropy is nonnegative · KL is near zero only when distributions match · Numerical stability around tiny probabilities is handled carefully.
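A NumPy reference for the verification items above, with the zero-probability convention 0·log 0 = 0 handled explicitly (distributions are illustrative):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; terms with p = 0 contribute 0 by convention."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def kl(p, q):
    """KL(p || q) in bits; requires q > 0 wherever p > 0 (support condition)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

u = np.full(4, 0.25)
p = np.array([0.7, 0.1, 0.1, 0.1])
assert np.isclose(entropy(u), 2.0)          # uniform over 4 outcomes: 2 bits
assert entropy(p) < 2.0                     # entropy is maximized by uniform
assert kl(p, p) == 0.0 and kl(p, u) > 0.0   # nonnegative, zero iff equal
```

On the GPU these become per-row reductions; the same masking logic maps to clamping tiny probabilities before the log to keep the kernel numerically stable.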

Phase 7 · Weeks 29–32

Abstract Algebra + Coding Theory

Tool guidance: NumPy primary. CUDA is useful for batched finite-field and coding experiments. Metal/Vulkan are optional — viable if you want integer/bitwise compute practice in a familiar pipeline.


Week 29 — Groups and Subgroup Structure

Math: Pinter (groups chapters)

Proofs:

  • Prove the subgroup test.
  • Prove uniqueness of identity and inverses in a group.
  • Prove order of an element divides the group order in the finite cyclic case; extend to Lagrange’s theorem.

Exercises: Pinter group chapters — groups, subgroups, cyclic groups, permutation examples, homomorphism basics

Deliverable: Connect modular arithmetic to GPU-friendly finite computations.

Coding exercise — Core: Implement modular arithmetic utilities and explore group structure computationally.

  • NumPy: Implement addition and multiplication mod n. Compute units mod n. Generate Cayley tables for small groups.
  • Metal: Batched modular arithmetic on buffers as an integer compute exercise. · Reading: MBT — integer compute basics, generic compute buffer passes.
  • Vulkan: Integer storage-buffer compute kernels. · Reading: Vulkan Book — integer storage-buffer compute.
  • CUDA: Large-batch modular arithmetic kernels for integer ALU throughput. · Reading: CUDA Book — integer ALU kernels and throughput.
  • Stretch: Visualize subgroup structure for small cyclic groups. Add permutation-group toy examples on CPU.
  • Verify: Operations are closed mod n · Unit elements are correctly detected · Cayley tables match theory for small n.
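A short NumPy sketch of the core exercise: build Cayley tables mod n, detect units, and check group properties computationally (n = 8 is an arbitrary small example):

```python
import numpy as np
from math import gcd

n = 8
elems = np.arange(n)
add_table = (elems[:, None] + elems[None, :]) % n    # Cayley table of (Z_n, +)

# closure + cancellation: every row of a group's Cayley table is a permutation
for row in add_table:
    assert sorted(row.tolist()) == list(range(n))

# units mod n: elements with gcd(a, n) = 1 form the group (Z_n)^x under x
units = [a for a in range(1, n) if gcd(a, n) == 1]
assert units == [1, 3, 5, 7]                         # Euler phi(8) = 4

# closure of the unit group: products of units are units
for a in units:
    for b in units:
        assert (a * b) % n in units
```

Printing `add_table` for small n makes the subgroup structure of cyclic groups visible by eye, which is the stretch goal above.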

Week 30 — Rings, Ideals, and Polynomial Arithmetic

Math: Pinter (rings chapters)

Secondary enrichment: Dasgupta — arithmetic algorithms: Euclid, modular arithmetic, fast exponentiation

Proofs:

  • Prove that kernels of ring homomorphisms are ideals.
  • Prove that principal ideals in ℤ have the form nℤ.
  • Prove Euclidean algorithm correctness in ℤ[x] or ℤ.
  • Dasgupta enrichment: Prove correctness of the integer Euclidean algorithm using the invariant gcd(a, b) = gcd(b, a mod b) at each step, with the base case gcd(a, 0) = a. Prove correctness of modular exponentiation by repeated squaring using the binary representation of the exponent.

Exercises: Pinter ring chapters — ring axioms, ideals, quotient intuition, polynomial arithmetic, gcd/Euclidean algorithm

Dasgupta enrichment exercises:

  1. Compute gcd(84, 36) and gcd(x³ − 1, x² − 1) (polynomial case) by hand using the Euclidean algorithm; verify each step.
  2. Compare naive exponentiation (O(n) multiplications) vs. repeated squaring (O(log n) multiplications) for a^n mod p on a worked example.
  3. Implement gcd and fast modular exponentiation in Python/NumPy and verify against pow(a, n, p).
  4. Explain how the polynomial Euclidean algorithm relates to the integer one: what is the analog of remainder in each case, and why does termination follow in both?

Compute connection: Add gcd and fast modular exponentiation as utility functions. These will be directly useful for the finite-field inverse computation in Week 31 and the Reed–Solomon work in Week 32.
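One possible shape for those two utilities (the loop invariant and the binary-expansion argument are exactly the proofs above):

```python
def gcd(a, b):
    """Euclidean algorithm; loop invariant: gcd(a, b) == gcd(b, a % b)."""
    while b:
        a, b = b, a % b
    return a

def mod_pow(a, n, p):
    """a**n mod p by repeated squaring over the binary digits of n."""
    result, a = 1, a % p
    while n:
        if n & 1:                  # current binary digit of the exponent is 1
            result = result * a % p
        a = a * a % p              # square for the next digit
        n >>= 1
    return result

assert gcd(84, 36) == 12           # the hand-worked example from exercise 1
assert mod_pow(3, 45, 7) == pow(3, 45, 7)
```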

Deliverable: Explain polynomial arithmetic as preparation for coding theory.

Coding exercise — Core: Implement polynomial arithmetic.

  • NumPy: Represent polynomials as coefficient arrays. Implement add, multiply, and long division. Implement Euclidean algorithm for integer or mod-p coefficient polynomials.
  • Metal: Batched polynomial evaluation over many x values using Horner’s method. · Reading: MBT — generic compute over arrays, Horner-style evaluation kernels.
  • Vulkan: Buffer-based batched algebraic evaluation. · Reading: Vulkan Book — buffer-based batched algebraic evaluation.
  • CUDA: Polynomial evaluation and coefficient operations for throughput. · Reading: CUDA Book — throughput kernels over coefficient arrays.
  • Stretch: Work mod p for coding-theory prep. Add Horner evaluation benchmarking.
  • Verify: Division identity holds: f = qg + r · GCD results are consistent · Mod-p arithmetic is handled correctly.
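One way the long-division step could look over GF(p), assuming coefficient lists ordered highest degree first (a convention I chose here, not mandated above):

```python
def poly_divmod(f, g, p):
    """Divide f by g over GF(p); returns (q, r) with f = q*g + r, deg r < deg g.
    Coefficient lists are highest degree first; p must be prime."""
    f = [c % p for c in f]
    g = [c % p for c in g]
    inv_lead = pow(g[0], p - 2, p)          # Fermat inverse of g's leading coeff
    q = []
    while len(f) >= len(g):
        c = f[0] * inv_lead % p
        q.append(c)
        for i in range(len(g)):             # subtract c*g aligned at the top of f
            f[i] = (f[i] - c * g[i]) % p
        f.pop(0)                            # leading term is now zero
    return q, f

q, r = poly_divmod([1, 0, 1], [1, 1], 5)    # x^2 + 1 divided by x + 1 over GF(5)
assert q == [1, 4] and r == [2]             # (x + 1)(x + 4) + 2 = x^2 + 1 mod 5
```

Re-multiplying q by g and adding r checks the division identity f = qg + r, which is exactly the Verify item above.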

Week 31 — Finite Fields and Field Extensions

Math: Pinter (finite field sections)

Secondary enrichment: Dasgupta — number-theoretic algorithms / modular arithmetic / field-adjacent arithmetic sections

Proofs:

  • Prove existence/construction of GF(p) for prime p.
  • Prove that quotient by an irreducible polynomial over GF(p) gives a field.
  • Construct GF(2ⁿ) explicitly for one small irreducible polynomial.
  • Dasgupta enrichment: Prove correctness of repeated squaring in modular arithmetic (building on Week 30). Prove that multiplicative inverses in GF(p) exist and can be computed via the Extended Euclidean algorithm or by Fermat’s little theorem: a^(p−1) ≡ 1 (mod p), so a^(p−2) is the inverse.

Exercises: Pinter finite-field material — irreducibility checks, quotient constructions, field arithmetic, extension degree

Dasgupta enrichment exercises:

  1. Compute the multiplicative inverse of 7 in GF(11) using both the Extended Euclidean algorithm and Fermat’s little theorem; verify both give the same answer.
  2. Implement fast modular exponentiation in GF(p) and use it to compute inverses; test against a direct multiplication-table lookup for a small prime.
  3. Compare computing inverses via: (a) brute-force search through all elements, (b) multiplication table, (c) Fermat’s little theorem, (d) Extended Euclidean. Record operation counts.

Compute connection: Add modular inverse and finite-field helper routines that will be directly reused in Week 32’s Reed–Solomon encoder. These routines should work over both GF(p) and GF(2^m) (the latter using this week’s polynomial-quotient representation).
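A sketch of the two inverse routes from enrichment exercise 1, for GF(p) (helper names are mine):

```python
def ext_gcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def inv_ext(a, p):
    """Inverse of a mod p via the Bezout coefficient from Extended Euclid."""
    g, x, _ = ext_gcd(a % p, p)
    assert g == 1, "a must be a unit mod p"
    return x % p

def inv_fermat(a, p):
    """Inverse of a mod prime p via a^(p-2), by Fermat's little theorem."""
    return pow(a, p - 2, p)

assert inv_ext(7, 11) == inv_fermat(7, 11) == 8    # 7 * 8 = 56 ≡ 1 (mod 11)
```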

Deliverable: Work out one full table of arithmetic in a small finite field.

Coding exercise — Core: Implement finite-field arithmetic for a small extension field.

  • NumPy: Represent field elements as bit patterns. Implement add as XOR. Implement multiply mod an irreducible polynomial. Build multiplication tables for GF(2^m) with small m.
  • Metal: Batched GF(2^m) adds and multiplies as a bitwise compute exercise. · Reading: MBT — integer/bitwise compute patterns.
  • Vulkan: Bitwise compute kernels over integer buffers. · Reading: Vulkan Book — integer compute and buffer workflows.
  • CUDA: GF(2^m) arithmetic kernels with bitwise-heavy operations. · Reading: CUDA Book — bitwise-heavy kernels and batched arithmetic.
  • Stretch: Implement multiplicative inverse. Build log/antilog tables for a small field.
  • Verify: Field axioms hold on tested elements · Nonzero elements have inverses · Arithmetic matches hand-worked examples.
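A bitwise sketch for GF(2^4) using the irreducible polynomial x^4 + x + 1 (one standard choice; any irreducible quartic over GF(2) works):

```python
IRRED, M = 0b10011, 4          # x^4 + x + 1 over GF(2); the field is GF(2^4)

def gf_add(a, b):
    """Addition in GF(2^m) is bitwise XOR (characteristic 2)."""
    return a ^ b

def gf_mul(a, b, irred=IRRED, m=M):
    """Carry-less shift-and-add multiply, reducing mod the irreducible poly."""
    prod = 0
    while b:
        if b & 1:
            prod ^= a              # add the current shifted copy of a
        a <<= 1
        if a & (1 << m):           # degree reached m: subtract (XOR) the modulus
            a ^= irred
        b >>= 1
    return prod

assert gf_mul(0b0010, 0b1001) == 1   # x * (x^3 + 1) = x^4 + x ≡ 1: inverses
```

A quick machine check that nonzero elements have inverses: multiplication by any nonzero element permutes all 16 field elements.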

Week 32 — Error Correcting Codes and Polynomial Methods

Secondary enrichment: Dasgupta — any coding-, polynomial-, or algebraic-algorithm adjacent material in your edition

Proofs:

  • Prove uniqueness of polynomial interpolation through n points over a field.
  • Prove minimum-distance error-detection/correction relationships for a simple code.
  • Dasgupta enrichment: Prove that Lagrange interpolation over a field is correct and unique using the degree argument: a nonzero polynomial of degree < n has at most n − 1 roots, so a degree-(n−1) polynomial is uniquely determined by n distinct evaluation points.

Exercises:

  1. Lagrange interpolation derivation.
  2. Reed–Solomon encoding example.
  3. Syndrome or parity-check reasoning for a small linear code.

Dasgupta enrichment exercises:

  1. Work through a small polynomial interpolation problem by hand over GF(7) or GF(11): given n evaluation points, recover the unique polynomial of degree < n.
  2. Encode a 2-symbol message as a Reed–Solomon codeword over a small finite field by evaluating the message polynomial at several points; corrupt one position and show it can be corrected.
  3. Compare the cost of polynomial evaluation by direct Horner’s method vs. the cost of full Lagrange interpolation; relate this to the encoding/decoding asymmetry in RS codes.

Compute connection: Add an interpolation notebook and a toy Reed–Solomon encoder that uses the finite-field routines from Weeks 30–31. Encode, corrupt, and decode a short message; report whether correction succeeds as a function of the number of errors introduced.
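A possible CPU reference for the interpolation core, with coefficients lowest degree first (my convention; the degree argument in the proof above is what guarantees uniqueness):

```python
def poly_mul_mod(a, b, p):
    """Multiply coefficient lists (lowest degree first) over GF(p)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def lagrange_interp(xs, ys, p):
    """Coefficients (lowest degree first) of the unique degree-<n polynomial
    through the n points (xs[i], ys[i]) over GF(p), p prime."""
    n = len(xs)
    coeffs = [0] * n
    for i in range(n):
        num, denom = [1], 1                       # build basis polynomial ell_i
        for j in range(n):
            if j != i:
                num = poly_mul_mod(num, [(-xs[j]) % p, 1], p)
                denom = denom * (xs[i] - xs[j]) % p
        scale = ys[i] * pow(denom, p - 2, p) % p  # Fermat inverse of denom
        for k in range(n):
            coeffs[k] = (coeffs[k] + scale * num[k]) % p
    return coeffs

def poly_eval(coeffs, x, p):
    """Horner evaluation, coefficients lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

# f(x) = 3 + 2x over GF(7): evaluations at 0, 1, 2 are 3, 5, 0.
assert lagrange_interp([0, 1, 2], [3, 5, 0], 7) == [3, 2, 0]
```

Reed–Solomon encoding-by-evaluation is then just `poly_eval` of the message polynomial at the chosen field points; interpolation is the decoding direction.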

Deliverable: One full Reed–Solomon toy example over a small finite field.

Coding exercise — Core: Implement a toy error-correcting code.

  • NumPy: Implement a simple linear block code or Hamming code. Then implement a tiny Reed–Solomon encoder over a small finite field. Encode, corrupt, and decode small messages.
  • Metal: Batch encode many messages in parallel, or batch syndrome computations. · Reading: MBT — batched compute pipelines over structured data.
  • Vulkan: Encode/syndrome compute passes over codeword buffers. · Reading: Vulkan Book — multi-pass compute processing over codeword buffers.
  • CUDA: Reed–Solomon-style finite-field kernels with syndrome accumulation. · Reading: CUDA Book — batched algebraic kernels and syndrome accumulation.
  • Stretch: Add erasure decoding in a toy setting. Visualize codeword distance behavior.
  • Verify: Valid codewords satisfy parity/syndrome conditions · Small errors are detected or corrected as expected · Finite-field implementation is consistent.
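For the linear-block-code half, a minimal (7,4) Hamming syndrome decoder sketch (the parity-check matrix whose i-th column is the binary expansion of i + 1 is the standard construction; the code itself is my own illustrative version):

```python
import numpy as np

# Column i of H is the binary expansion of i + 1, so the syndrome of a
# single-bit error reads off the error position directly.
H = np.array([[(i + 1) >> b & 1 for i in range(7)] for b in range(3)])

def syndrome(word):
    return H @ word % 2

def correct_single_error(word):
    s = syndrome(word)
    pos = int(s[0] + 2 * s[1] + 4 * s[2])   # 0 means "no error detected"
    if pos:
        word = word.copy()
        word[pos - 1] ^= 1
    return word

codeword = np.zeros(7, dtype=int)           # the all-zeros word is a codeword
received = codeword.copy()
received[4] ^= 1                            # corrupt one position
assert (correct_single_error(received) == codeword).all()
```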

Phase 8 · Weeks 33–36

Systems + Capstone

Tool guidance: Use the stack you want your capstone portfolio to showcase. Keep NumPy as oracle and analysis environment throughout. Metal is best for Apple-native graphics; Vulkan for explicit engine architecture; CUDA for compute-heavy research-style work.


Week 33 — GPU Profiling and Performance Analysis

Secondary enrichment: Sipser — time complexity and space complexity

Proofs:

  • Prove a performance claim using arithmetic intensity or asymptotic work/depth.
  • Sipser enrichment: Prove a basic complexity inclusion such as TIME(f(n)) ⊆ SPACE(f(n)) in the simple form your text presents. Prove or explain why asymptotic complexity is machine-model dependent only up to polynomial factors (the polynomial-equivalence of reasonable models of computation).

Exercises: Redo the two hardest exercises from linear algebra and numerical linear algebra.

Sipser enrichment exercises:

  1. For one kernel from earlier weeks (e.g., matmul or reduction), state its asymptotic complexity in the RAM model; then contrast this with its measured runtime and explain what the O(·) model cannot see (memory latency, bandwidth, occupancy).
  2. Explain why O(n log n) sorting can still lose to O(n²) on small n or on specialized hardware: give a concrete numerical threshold using measured constants.
  3. Write a short “theory vs. measurement” note for one kernel: state the theoretical complexity, measure the actual runtime at several problem sizes, and identify whether the kernel is compute-bound, memory-bound, or synchronization-bound.

Compute connection: Add a “theory vs. measurement” section to the profiling report: for each kernel, record its asymptotic class, its measured arithmetic intensity, and which hardware resource is the bottleneck. This turns the profiling report into a bridge between complexity theory and systems reality.
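A sketch of the CPU-side half of that harness, assuming a 2n³-flop cost model for dense matmul (the GPU timings slot in beside it):

```python
import time
import numpy as np

def measure_matmul(sizes):
    """Time C = A @ B at several sizes; dense matmul does ~2*n^3 flops."""
    rows = []
    for n in sizes:
        a, b = np.random.rand(n, n), np.random.rand(n, n)
        t0 = time.perf_counter()
        a @ b
        dt = time.perf_counter() - t0
        rows.append((n, dt, 2 * n**3 / dt / 1e9))   # (size, seconds, GFLOP/s)
    return rows

for n, dt, gflops in measure_matmul([128, 256, 512]):
    print(f"n={n:4d}  {dt * 1e3:8.2f} ms  {gflops:6.1f} GFLOP/s")
```

This is only the skeleton: repeatable numbers need warm-up runs and the median of several trials, which is part of the exercise.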

Coding exercise — Core: Profile earlier kernels systematically.

  • NumPy: Build a benchmark harness and correctness harness for CPU reference routines. Record problem size, runtime, and error.
  • Metal: Use Metal profiling tools and timestamps. Profile reduction, scan, blur, matmul, and renderer passes. · Reading: MBT — debugging/profiling/performance sections, frame capture workflow.
  • Vulkan: Profile dispatches, barriers, and frame timing. · Reading: Vulkan Book — synchronization cost, frame pacing, resource lifetime and performance sections.
  • CUDA: Profile occupancy, bandwidth, and kernel timing. · Reading: CUDA Book — profiling, occupancy intuition, bottleneck analysis.
  • Stretch: Create a short report ranking kernels by bottleneck type: compute-bound, memory-bound, or synchronization-bound.
  • Verify: Benchmarks are repeatable · You know which kernels are actually slow · Optimizations are justified by measurements.

Week 34 — Integration of Graphics and Compute Pipelines

Proofs:

  • Prove correctness of one interface boundary between graphics and compute stages.

Exercises: Redo the two hardest exercises from analysis and signal processing.

Coding exercise — Core: Combine compute and rendering in one application.

  • NumPy: Prototype the data flow on CPU first: simulation → transform → shading inputs.
  • Metal: Build one app where a compute pass produces data consumed by a render pass — particle system, height field, or image post-process chain. · Reading: MBT — combining compute + graphics, resource sharing, render loop architecture.
  • Vulkan: Integrated engine-style compute + graphics flow with render graph / pass structure. · Reading: Vulkan Book — render graph / pass structure / synchronization concepts.
  • CUDA: Less natural for full graphics integration; use CUDA for a heavy compute stage with CPU/graphics handoff. · Reading: CUDA Book — interop or large compute subsystem patterns.
  • Stretch: Add double buffering or frame-resource management. Add debug visualizations.
  • Verify: Resource transitions/data flow are correct · Compute output visibly affects rendering · Architecture is clean, not ad hoc.
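The NumPy prototype of that data flow can be as small as a toy particle system (every model choice here is mine, purely illustrative):

```python
import numpy as np

def step(pos, vel, dt=0.016):
    """'Compute pass': integrate particle positions with explicit Euler."""
    return pos + vel * dt

def shade_inputs(pos):
    """'Render-pass input': map particle height to an intensity in [0, 1]."""
    y = pos[:, 1]
    return (y - y.min()) / (y.max() - y.min() + 1e-12)

rng = np.random.default_rng(0)
pos = rng.uniform(-1.0, 1.0, (1024, 3))
vel = rng.normal(0.0, 1.0, (1024, 3))
for _ in range(10):                      # simulation → transform
    pos = step(pos, vel)
colors = shade_inputs(pos)               # → shading inputs
assert colors.shape == (1024,) and 0.0 <= colors.min() <= colors.max() <= 1.0
```

On GPU, `step` becomes the compute pass writing a shared buffer and `shade_inputs` moves into the fragment or vertex stage reading it.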

Week 35 — Capstone System Implementation

Proofs:

  • Prove one invariance or conservation property in the final system.

Exercises: Redo the two hardest exercises from probability and optimization.

Coding exercise — Core: Build the capstone.

Suggested capstone options:

  • Tiny graphics engine with compute-based post-processing
  • Path tracer with progressive accumulation
  • Geometry-processing sandbox with PCA/FFT/filtering
  • DSP/image-processing toolkit
  • Coding-theory visualizer
  • PR-inspired options: 1D/2D histogram (grid) localization engine · Particle filter visualizer with live belief heatmap · Kalman filter / extended Kalman filter toy simulator · Sensor fusion demo combining noisy odometry + distance sensor with GPU-accelerated belief update

Recommended stacks: Metal for Apple-native graphics capstone · Vulkan for explicit engine architecture capstone · CUDA for compute-heavy research-style capstone · NumPy as validation/oracle layer throughout.

  • Reading: MBT — whichever advanced chapters align with your capstone · Vulkan Book — engine architecture, advanced rendering, resource systems · CUDA Book — performance, memory hierarchy, advanced kernel tuning.
  • NumPy: Keep a CPU oracle/reference path for every major subsystem.
  • Stretch: Add test scenes and regression tests. Add offline image comparison or numeric result comparison.
  • Verify: End-to-end system works · Major math from earlier weeks appears explicitly in code · Results are reproducible enough for debugging.
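If the PR-inspired capstone is chosen, the belief-update core is tiny; a hedged 1D histogram-localization sketch (all probabilities are toy values of my own):

```python
import numpy as np

def predict(belief, shift, p_exact=0.8, p_slip=0.1):
    """Motion update on a cyclic grid: move belief by `shift` cells with slip."""
    return (p_exact * np.roll(belief, shift)
            + p_slip * np.roll(belief, shift - 1)
            + p_slip * np.roll(belief, shift + 1))

def update(belief, world, measurement, p_hit=0.6, p_miss=0.2):
    """Measurement update: Bayes rule with a simple hit/miss sensor model."""
    likelihood = np.where(world == measurement, p_hit, p_miss)
    posterior = belief * likelihood
    return posterior / posterior.sum()

world = np.array([1, 0, 0, 1, 0])     # landmark map: 1 = door, 0 = wall
belief = np.full(5, 0.2)              # start fully uncertain
belief = update(belief, world, 1)     # robot sees a door
belief = predict(belief, 1)           # robot moves one cell right
belief = update(belief, world, 0)     # robot sees a wall
assert abs(belief.sum() - 1.0) < 1e-9
```

The predict/update loop over the belief buffer is exactly the part that maps onto a GPU compute pass, which is where the "GPU-accelerated belief update" option comes from.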

Week 36 — System Optimization and Final Presentation

Secondary enrichment: Sipser — decidability / reducibility / NP-completeness overview chapters

Proofs:

  • Write a final “mathematical appendix” collecting the 5 best proofs from the course.
  • Sipser enrichment: Choose and write out one full reduction proof from the text. Choose one proof sketch of undecidability or intractability that you personally want to retain long-term (e.g., undecidability of the Halting Problem, or Cook-Levin).

Exercises: Redo the two hardest exercises from abstract algebra.

Sipser enrichment exercises — theory appendix for the capstone:

  1. What problem class does your capstone project mostly live in? (P? numerical/continuous? combinatorial/discrete?)
  2. What part of your capstone is numerical and continuous — where does convexity or smoothness give you tractability guarantees?
  3. What part is combinatorial or discrete — where could complexity-theoretic limits become relevant if the problem scaled?
  4. Where in your capstone do complexity-theoretic limits actually matter or could matter in a production version of the system?

PR enrichment exercises — estimation appendix for the capstone (if applicable):

  1. What was the hidden state in your system? What was the observation model and the transition model?
  2. Was the belief representation discrete (histogram), Gaussian (Kalman), or particle-based? Why did that choice fit the problem?
  3. Where did GPU acceleration help most in the estimation or simulation pipeline?

Deliverable: Final proof portfolio + annotated bibliography.

Coding exercise — Core: Polish, optimize, and present.

  • NumPy: Finalize CPU reference notebooks and plots. Produce benchmark charts and error summaries.
  • Metal: Tune workgroup sizes, memory layout, and pass structure. Capture final profiling screenshots. · Reading: MBT — polishing/debugging/performance sections.
  • Vulkan: Performance and engine-organization polish. · Reading: Vulkan Book — performance and engine-organization sections.
  • CUDA: Tuning, profiling, and bottleneck analysis. · Reading: CUDA Book — tuning, profiling, bottleneck analysis sections.
  • Stretch: Write a mini technical report covering: problem, algorithm, implementation, performance, numerical issues, and future work.
  • Verify: Final demo is stable · Performance claims are measured · Mathematical ideas and systems ideas are both visible.

Implementation Notes

Stack Choice by Phase

  • Weeks 1–4 · Primary: NumPy + Metal or CUDA · Vulkan viable if you want explicit engine foundations from day one
  • Weeks 5–8 · Primary: NumPy + CUDA · Metal also good for Apple-native; Vulkan for compute buffer practice
  • Weeks 9–12 · Primary: Metal or Vulkan · Graphics-heavy weeks; a live graphics app is essential
  • Weeks 13–16 · Primary: CUDA · Best for raw parallel algorithm study; Metal and Vulkan also good
  • Weeks 17–19 · Primary: NumPy first, then Metal/Vulkan/CUDA · Metal/Vulkan for image integration; CUDA for pure compute
  • Weeks 20–24 · Primary: NumPy first, CUDA for heavy MC · Metal/Vulkan if you want rendering integration
  • Weeks 25–28 · Primary: NumPy + CUDA · Metal/Vulkan optional for batched numerical kernels
  • Weeks 29–32 · Primary: NumPy + CUDA · Metal/Vulkan optional for integer/bitwise compute practice
  • Weeks 33–36 · Primary: capstone stack of choice · NumPy as oracle throughout

Practical Split

A clean overall progression: NumPy as the mathematical oracle throughout; Metal for Apple graphics and general GPU learning; CUDA for compute-heavy parallel algorithms; Vulkan for explicit engine and graphics API mastery.

Very practical split:

  • Weeks 1–12: NumPy + Metal
  • Weeks 13–24: add CUDA for compute-heavy kernels
  • Weeks 25–32: mostly NumPy + CUDA
  • Weeks 33–36: final capstone in Metal or Vulkan, with CUDA only if the capstone is more compute than graphics

Swift vs. Objective-C++

Use Swift when: building Metal apps, managing buffers/pipelines cleanly, or writing modern Apple-platform code quickly.

Use Objective-C++ when: mixing Metal with existing C++ math/rendering code, wanting shared C++ core logic across platforms, or planning engine-style infrastructure.

Default: Swift + Metal for Weeks 1–28. Objective-C++ only if you specifically want a reusable engine core or C++ interop.


Final Outcome

A complete, self-built curriculum spanning:

  • GPU systems and parallel programming
  • Computer graphics (rendering pipelines, shading, texturing)
  • Applied mathematics: linear algebra, analysis, probability, signal processing, optimization, information theory, algebra