Week 15 — Theory of Computation: Automata, Languages, and Complexity

Course 5 syllabus

Overview

This week formalizes what “computation” and “efficient” mean. Finite automata and regular languages model the simplest machines (and are exactly what regular expressions and lexers compute); context-free grammars and pushdown automata model nested structure (parsers, programming-language syntax); and complexity theory makes precise the classes P and NP whose practical face appeared in Week 14. The recurring tools are nondeterminism, the pumping lemmas (for proving a language is not in a class), and reductions.

Readings

  • Sipser Ch 0 — Introduction. Sets, functions, relations, graphs, and proof techniques (induction, contradiction, construction).
  • Sipser Ch 1 — Regular languages. DFAs, NFAs, regular expressions, the pumping lemma for regular languages.
  • Sipser Ch 2 — Context-free languages. CFGs, pushdown automata, the pumping lemma for context-free languages.
  • Sipser Ch 7 — Time complexity. Big-O on Turing machines, the classes P and NP, NP-completeness, polynomial-time reductions.

Key Concepts

Proof techniques and the setup

Chapter 0 fixes the mathematical foundation: precise definitions of sets, functions, relations, strings, and languages (a language is a set of strings over an alphabet \(\Sigma\)), and the proof methods — induction, contradiction, and construction — used throughout. A computational model is judged by which languages it can decide.

Finite automata and regular languages

A DFA has finitely many states and a deterministic transition on each input symbol; it accepts a string if it ends in an accept state. A nondeterministic NFA may branch, but NFAs and DFAs recognize the same class — the regular languages — via the subset construction. Regular languages are closed under union, concatenation, star, intersection, and complement, and Kleene’s theorem says they are exactly the languages described by regular expressions. The pumping lemma gives a property all regular languages share; its contrapositive is the standard tool for proving a language (e.g. \(\{0^n 1^n\}\)) is not regular.

Context-free languages

A context-free grammar generates strings by production rules; the languages so generated are the context-free languages (CFLs), recognized by pushdown automata (a finite control plus a stack). CFLs capture nested/recursive structure — balanced parentheses, arithmetic expressions, programming-language syntax — which regular languages cannot. There is a corresponding pumping lemma for CFLs to prove a language is not context-free (e.g. \(\{a^n b^n c^n\}\)).

Time complexity: P and NP

A Turing machine is the standard model of general computation. Time complexity counts steps as a function of input length. The key classes:

\[\mathrm{P} = \bigcup_k \mathrm{TIME}(n^k), \qquad \mathrm{NP} = \text{languages with polynomial-time verifiers.}\]

P is “efficiently decidable”; NP is “efficiently checkable.” A problem is NP-complete if it is in NP and every NP problem reduces to it in polynomial time — so an efficient algorithm for one would solve all of NP. SAT is NP-complete (Cook–Levin), and further NP-completeness proofs proceed by polynomial-time reduction from a known NP-complete problem. Whether \(\mathrm{P} = \mathrm{NP}\) is the central open question; this is the formal underpinning of Week 14’s practical hardness.

Connections

  • Backward: the P/NP classes and reductions formalize the intractability discussed concretely in Week 14.
  • Across courses: automata and grammars under tokenization, parsing, and formal-language models (Course 3); complexity reasoning and the cost model behind performance engineering (Course 4).

Further Reading

  • Sipser, Introduction to the Theory of Computation, 3rd ed., Chapters 0–2, 7 (and 3–5 for Turing machines, decidability, and reducibility if going deeper).