Week 8 — Convex Sets and Convex Functions
Overview
Convexity is the property that makes an optimization problem solvable: a convex problem has no spurious local minima, so any local optimum is global. This week builds the geometric and analytic vocabulary — convex sets and cones, the operations that preserve convexity, convex functions and their first/second-order characterizations, conjugates, and quasiconvexity. Recognizing convexity (or constructing it) is the single highest-leverage modeling skill in applied optimization, so the emphasis is on the calculus of “what stays convex.”
Readings
- Boyd Ch 2 — Convex sets. Affine and convex sets, important examples (hyperplanes, halfspaces, norm balls, polyhedra, the PSD cone), operations preserving convexity, generalized inequalities, separating and supporting hyperplanes.
- Boyd Ch 3 — Convex functions. Definition and first/second-order conditions, examples, operations preserving convexity, the conjugate function, quasiconvex functions, log-concavity.
Key Concepts
Convex sets and cones
A set \(C\) is convex if the segment between any two of its points stays inside: \(x,y\in C \Rightarrow \theta x + (1-\theta)y \in C\) for \(\theta\in[0,1]\). Key examples: hyperplanes and halfspaces, norm balls, polyhedra \(\{x : Ax \le b\}\), the second-order (ice-cream) cone, and the positive semidefinite cone \(\mathbb{S}^n_+\). A convex cone additionally contains all nonnegative combinations. Convexity is preserved by intersection, affine images and preimages, and perspective/linear-fractional maps — this is how you certify a complicated set is convex without checking the definition directly.
Separating and supporting hyperplanes
Two disjoint convex sets can be separated by a hyperplane; at any boundary point of a convex set there is a supporting hyperplane. These theorems are the geometric root of Lagrange duality (Week 9) and of optimality conditions.
Convex functions
\(f\) is convex if its domain is convex and
\[f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta) f(y).\]
Equivalent characterizations on a differentiable \(f\):
- First order: \(f(y) \ge f(x) + \nabla f(x)^\top (y - x)\) — the function lies above all its tangent planes, so a stationary point is a global minimum.
- Second order: \(\nabla^2 f(x) \succeq 0\) — the Hessian is positive semidefinite everywhere (here the symmetric eigenstructure of Week 2 decides convexity).
Examples: affine functions, norms, \(\exp\), \(-\log\), \(x\log x\), quadratics with PSD matrices, log-sum-exp, and the max of convex functions.
Operations preserving convexity
Nonnegative weighted sums, composition with affine maps, pointwise maximum/supremum, partial minimization over a convex set, and certain scalar compositions (e.g. \(h\circ g\) with \(h\) convex nondecreasing and \(g\) convex) all preserve convexity. As with sets, you almost never prove convexity from scratch — you build it from a known atom through these rules.
Conjugate, quasiconvexity, log-concavity
The conjugate \(f^*(y) = \sup_x (y^\top x - f(x))\) is always convex and is the analytic device behind duality (Week 9). A function is quasiconvex if its sublevel sets are convex (weaker than convex; allows bisection-based solution). Log-concave densities (\(\log f\) concave) include the Gaussian and many others, and log-concavity is preserved under marginalization — useful in statistics and estimation.
Connections
- Backward: convexity arguments lean on Week 5 — continuity, closedness, and the compactness that guarantees a minimizer exists.
- Forward: Week 9 puts these pieces into standard problem forms, derives the dual via the conjugate and supporting hyperplanes, and gives algorithms whose convergence depends on the curvature (Hessian) studied here.
- Across courses: Loss-function design, regularization, and trust-region/optimizer intuition (Courses 1, 3).
Further Reading
- Boyd & Vandenberghe, Convex Optimization, Chapters 2–3 (free PDF from the authors).