Syllabus
10-Week NLP + Deep Learning + LLM Course
Full week-by-week plan. Each entry lists the theme and reading. Detailed theory exercises, implementation tasks, experiments, and definitions of done live in the weekly lecture notes.
Prerequisites: Courses 5 and 6. This course assumes the mathematical foundations are in hand from Course 5: probability and Bayes (Naive Bayes, smoothing, calibration), convex optimization and gradient descent (logistic regression, training), information theory (entropy, cross-entropy, perplexity, KL), and linear algebra (embeddings, attention, projections). Course 6 supplies the signal-processing front end (sampling, spectrograms, MFCCs) used in speech. These results are referenced and applied rather than re-derived, which is what compresses the course from 16 weeks to 10.
Note on AI use: The weekly lecture notes and syllabus summaries on this site are drafted with AI assistance, so each week has a consistent structure to study from. The substance is mine: every math proof and exercise solution is worked by hand on paper first and converted to LaTeX/KaTeX/MathJax using AI for typesetting only, and all code is written by me. The goal is to learn the material, which only happens by producing the proofs and the code myself.
Weekend rhythm: Saturday morning = watch lecture material. Saturday afternoon = read textbook sections. Saturday evening = implement the core model. Sunday = finish implementation, run experiments and benchmarks, and write a staff-level note in docs/weekXX-topic.md.
General rule for each week: (1) theory exercises (applying Course 5 results), (2) implementation, (3) experiments with recorded data, (4) staff-level note in docs/.
Goal: By the end of this 10-week course you should understand classical NLP, modern deep-learning NLP, transformers, LLMs, masked language models, instruction tuning, alignment, retrieval-augmented generation, information extraction, and practical NLP/LLM system design. The portfolio artifact is a citation-grounded RAG assistant over course notes, papers, books, and technical documents.
Primary resources: Stanford CS224N (deep learning for NLP), MIT 6.S191 (deep-learning foundations and LLMs), and Jurafsky & Martin, Speech and Language Processing, 3rd ed. draft (core NLP textbook).
Book abbreviations:
NLP/Speech: J&M (Jurafsky & Martin) · CS224N · 6.S191 (MIT 6.S191)
Math (review from Course 5): BT (Bertsekas & Tsitsiklis) · C&T (Cover & Thomas) · Boyd (Boyd & Vandenberghe) · Axler · T&B (Trefethen & Bau) · DPV (Dasgupta, Papadimitriou & Vazirani) · Sipser
Main repository structure:
nlp-llm-course/
  docs/
  nlp_lab/
    tokenization_ngram/ embeddings/ rnn_lm/
    sequence_labeling_parsing/ seq2seq_attention/
    transformer_from_scratch/ llm_mlm/ post_training/
    rag_system/ agents_ie/
  final_project/
    app/ eval/ data/ benchmarks/ reports/
Phase 1 · Weeks 1–3 — Foundations: Text, Tokens, Embeddings, Sequences
Week 1 — Tokenization, Subwords (BPE), and N-Gram Language Models
Theme: Language is messy before it becomes a tensor; the first model of language was counting plus smoothing.
Read: J&M: Ch. 2 Words and Tokens; Ch. 3 N-gram Language Models. · CS224N: intro, tokenization, language modeling. · 6.S191: Lecture 1. (Entropy, cross-entropy, and perplexity: assumed from Course 5 information theory.)
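To make "counting plus smoothing" concrete, here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing and a perplexity function. It is an illustration only, not the week's implementation task: the function names, the whitespace tokenizer, and the toy corpus are assumptions, and unknown-word handling is deliberately left out.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one estimate: (c(prev, word) + 1) / (c(prev) + V)."""
    vocab_size = len(unigrams)  # V includes the <s> and </s> markers
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(sentence, unigrams, bigrams):
    """exp of the average negative log-probability per predicted token."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = sum(math.log(bigram_prob(p, w, unigrams, bigrams)) for p, w in pairs)
    return math.exp(-log_prob / len(pairs))

# Toy corpus (hypothetical); every word below is in-vocabulary.
unigrams, bigrams = train_bigram(["the cat sat", "the dog sat"])
print(bigram_prob("the", "cat", unigrams, bigrams))
print(perplexity("the cat sat", unigrams, bigrams))
print(perplexity("sat cat the", unigrams, bigrams))
```

A sentence seen in training should score a lower perplexity than a scrambled one, because smoothing assigns unseen bigrams only the small add-one mass.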
Week 2 — Embeddings and Dense Representations
Theme: Turn sparse text into dense learned vectors; text classification as the warm-up application.
Read: J&M: Ch. 5 Embeddings; Ch. 6 Neural Networks; Ch. 4 Logistic Regression (skim). · CS224N: word-vector (word2vec/GloVe) material. · 6.S191: Lectures 1–2. (Logistic regression, gradient descent, Naive Bayes, and backprop foundations: assumed from Course 5 optimization/probability.)
Week 3 — RNNs, LSTMs, and Sequence Modeling
Theme: Model language as a sequence before transformers take over.
Read: J&M: Ch. 13 RNNs and LSTMs. · 6.S191: Lecture 2 Deep Sequence Modeling. · CS224N: language models, RNNs, vanishing gradients, LSTMs.
Phase 2 · Weeks 4–6 — Syntax, Attention, and Transformers
Week 4 — Sequence Labeling and Parsing: POS, NER, HMMs, Dependency Parsing
Theme: Assign labels to tokens and recover the structural skeleton of language.
Read: J&M: Ch. 17 Sequence Labeling (POS, NER); Appendix A Hidden Markov Models; Ch. 18 Constituency Parsing; Ch. 19 Dependency Parsing. · CS224N: sequence models and dependency parsing. (Viterbi/CKY as dynamic programming: DPV review from Course 5.)
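As a reminder of the dynamic program behind HMM tagging, the following is a minimal Viterbi sketch over plain probabilities. The two-tag model and all of its numbers are made up for illustration, the function name is an assumption, and a real tagger would work in log space and handle empty input.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence for obs."""
    # best[t][s] = (probability of the best path ending in s at time t, backpointer)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            # Pick the predecessor r that maximizes the path probability into s.
            prob, prev = max((best[t - 1][r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs[t]], prev)
        best.append(column)
    # Follow backpointers from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return best[-1][last][0], path

# Hypothetical toy HMM with two tags; the numbers are illustrative only.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.7, "VERB": 0.3}, "VERB": {"NOUN": 0.4, "VERB": 0.6}}
emit_p = {
    "NOUN": {"dogs": 0.5, "fish": 0.4, "sleep": 0.1},
    "VERB": {"dogs": 0.1, "fish": 0.3, "sleep": 0.6},
}
prob, path = viterbi(["dogs", "fish", "sleep"], states, start_p, trans_p, emit_p)
print(path, prob)
```

The table has one column per token and one cell per tag, so the cost is O(T · |states|²) rather than the exponential cost of enumerating every tag sequence.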
Week 5 — Attention and Seq2Seq Machine Translation
Theme: Attention solves the bottleneck of compressing a sentence into one vector.
Read: J&M: Ch. 12 Machine Translation. · CS224N: seq2seq and attention lectures. · 6.S191: sequence-modeling material.
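The bottleneck claim in the theme can be illustrated with a small sketch of dot-product attention for a single decoder query: instead of one fixed sentence vector, the decoder takes a softmax-weighted average of all encoder states. This is a pure-Python illustration under assumed names (`softmax`, `attend`) and made-up vectors, not the seq2seq model built in the week.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Dot-product attention for one query over a list of encoder states."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Context vector: attention-weighted average of the value vectors.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Hypothetical encoder states: two source positions with 2-d keys and values.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
weights, context = attend([5.0, 0.0], keys, values)
print(weights, context)
```

A query aligned with the first key puts almost all of its weight on the first source position, so the context vector changes at every decoding step instead of being fixed once by the encoder.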
Week 6 — Transformers from Scratch
Theme: Build the core architecture behind modern LLMs.
Read: J&M: Ch. 8 Transformers. · CS224N: self-attention and transformer material. · 6.S191: LLM material. (Linear algebra of attention/projections: Axler review from Course 5.)
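For orientation, the operation at the center of this week is scaled dot-product attention, stated here in its standard matrix form. The symbols are the conventional ones rather than notation taken from the lecture notes: $Q \in \mathbb{R}^{n \times d_k}$, $K \in \mathbb{R}^{m \times d_k}$, $V \in \mathbb{R}^{m \times d_v}$, with the softmax applied row-wise.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
$$

In self-attention all three matrices are linear projections of the same input, $Q = X W_Q$, $K = X W_K$, $V = X W_V$, which is where the Course 5 linear algebra of projections applies.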
Phase 3 · Weeks 7–10 — LLMs, Alignment, Retrieval, and Final Project
Week 7 — Large Language Models, Scaling, and Masked Models (BERT)
Theme: What changes when models become large; when an encoder beats a decoder.
Read: J&M: Ch. 7 Large Language Models; Ch. 10 Masked Language Models. · CS224N: LLMs, scaling, BERT/contextual embeddings. · 6.S191: New Frontiers lecture. (Scaling laws / cross-entropy: C&T review from Course 5.)
Week 8 — Post-Training: Instruction Tuning, RLHF, DPO, LoRA, Alignment
Theme: Pretraining predicts text; post-training makes models usable and aligned.
Read: J&M: Ch. 9 Post-training (instruction tuning, alignment, test-time compute). · CS224N: instruction tuning, RLHF, DPO, LoRA. · 6.S191: fine-tuning lab. (Policy-gradient / preference-optimization objectives: Boyd optimization review from Course 5.)
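As one example of a preference-optimization objective, the sketch below computes the DPO loss for a single (chosen, rejected) pair from summed sequence log-probabilities under the policy and a frozen reference model. The function and argument names are assumptions for illustration, and the inputs are plain floats rather than model outputs.

```python
import math

def neg_log_sigmoid(x):
    """Stable -log(sigmoid(x)), i.e. softplus(-x)."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the policy or the reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors the chosen response more than the reference does.
    return neg_log_sigmoid(beta * (chosen_logratio - rejected_logratio))

# Made-up log-probabilities: the policy has shifted toward the chosen response.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

When the policy equals the reference, both log-ratios are zero and the loss is log 2; it drops below that only as the policy raises the chosen response relative to the rejected one.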
Week 9 — Information Retrieval, RAG, Agents, and Extraction
Theme: LLMs need external memory, citations, structure, and tools.
Read: J&M: Ch. 11 Information Retrieval and RAG; Ch. 20 Information Extraction; Ch. 23 Coreference; Ch. 25 Conversation and Agents. · CS224N: LLM application and evaluation material. (BM25/retrieval and entity linking as algorithms: DPV review from Course 5.)
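For the retrieval half of the week, here is a minimal sketch of BM25 scoring over a tiny in-memory collection. It uses the common IDF variant with +1 inside the logarithm, default values k1 = 1.5 and b = 0.75, and a lowercase whitespace tokenizer; the function name, those choices, and the documents are assumptions for illustration, not the capstone retriever.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

# Hypothetical mini-collection of note titles.
docs = [
    "attention is all you need",
    "viterbi decoding for hidden markov models",
    "byte pair encoding for subword tokenization",
]
scores = bm25_scores("viterbi decoding", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking[0], scores)
```

In a RAG pipeline the top-ranked documents would be passed to the generator along with their identifiers, which is what makes citation grounding possible.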
Week 10 — Capstone: Citation-Grounded RAG Assistant
Theme: Integrate everything into a portfolio-grade, evaluated NLP/LLM system.
Read: Review all prior docs. · CS224N final-project and LLM-evaluation material. · Production NLP/LLM engineering references for serving and evaluation context.
Book Integration Map
| Book (Course 5 review unless noted) | Weeks and topics |
|---|---|
| Bertsekas & Tsitsiklis | 1, 4, 7, 8 — probability, Bayes, Markov models, calibration |
| Cover & Thomas | 1, 7, 8, 9 — entropy, cross-entropy, KL, perplexity |
| Axler / T&B | 2, 5, 6, 9 — embeddings, dot products, attention, projections, SVD |
| Boyd & Vandenberghe | 2, 7, 8 — logistic regression, gradient descent, fine-tuning |
| DPV / Sipser | 1, 4, 9 — edit distance, Viterbi, CKY, retrieval, grammars/parsing |
| Oppenheim (Course 6) | speech front-end (optional Weeks 11+) — spectrograms, MFCCs |
Optional Weeks 11+ Direction
If prioritizing speech and multimodal NLP: phonetics, MFCCs/spectrograms (building on Course 6 DSP), ASR (CTC/attention), TTS (VITS/Bark), Whisper fine-tuning, audio-text alignment, multimodal LLMs. (This is the dedicated speech track that the 16-week version covered inline; the signal-processing front end now lives in Course 6.)
If prioritizing LLM systems: deeper fine-tuning pipelines, quantization/GGUF, flash attention, speculative decoding, multi-GPU inference, production serving (vLLM, TGI), evaluation frameworks.
If prioritizing search and RAG: graph RAG, multi-hop retrieval, long-document QA, structured retrieval over tables, RAG evaluation (RAGAS), vector-database deployment.
If prioritizing agents and extraction: complex tool-calling agents, code-execution agents, multi-agent frameworks, knowledge-graph construction, long-document extraction at scale.