Lectures
Please find below the schedule of our lectures. Each lecture includes a brief description, required readings, and suggested readings for further exploration. The lecture slides and additional resources will be posted on the course website and GitHub repository. Please remember that the schedule may change during the course. If you have any questions or need further assistance, please let me know!
Module 0 — Orientation
January 14, 2026:
- Syllabus and course repository: https://github.com/danilofreire/datasci185.
- Lecture 01: Course introduction.
Readings:
- Stanford University’s Institute for Human-Centered Artificial Intelligence (2025). AI Index Annual Report. An excellent report on the state of AI, with many useful charts and references.
- Our World in Data (2025). Artificial intelligence. A concise overview of AI trends and applications, with links to further resources.
- Menlo Ventures (2025). 2025: the state of consumer AI. How AI is being used in consumer products, with examples and market data.
- Morgan Stanley (2025). AI’s next leap: 5 trends shaping innovation and ROI. A short text on the current state of AI applications in industry.
- McKinsey & Company (2025). The state of AI: how organizations are rewiring to capture value. A business-focused overview of AI adoption and impact.
- The New Yorker (2026). What is Claude? Anthropic doesn’t know, either. A fascinating article about the challenges of building and understanding large language models.
January 19, 2026: Martin Luther King Jr. Day (No Classes)
January 21, 2026:
- Lecture 02: A brief history of AI and the recent shift.
- Assignment 01.
Readings:
- Dick, S. (2019). Artificial intelligence. Harvard Data Science Review, 1(1). A brief history of AI research and applications. Written for a general audience.
- Grzybowski, A., Pawlikowska-Łagód, K., & Lambert, W. C. (2024). A history of artificial intelligence. Clinics in Dermatology, 42(3), 221-229. A “pre-history” overview, focusing on early ideas and milestones.
- Cao, Y., Liu, L., Li, M., Huang, Y., & Li, S. (2023). A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. arXiv preprint. A somewhat technical but comprehensive overview of generative AI methods.
- Halevy, A., Norvig, P., & Pereira, F. (2009). The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8-12. A classic paper arguing that large datasets are often more important than sophisticated algorithms.
- Pradhan, M. (2023). A non-technical introduction to Transformers. A clear explanation of the transformer architecture that underpins many modern AI systems.
- Georgia Tech’s Polo Club of Data Science (2025). Transformer explainer. An interactive visualisation of how transformers work.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. In case you want to read the original paper. Not for the faint of heart.
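The core operation behind the transformer readings above, scaled dot-product attention, can be sketched in a few lines of plain Python. This is only an illustrative toy (all vectors and numbers below are invented, and real models are batched, multi-headed, and run on tensors):

```python
import math

def softmax(xs):
    # Numerically stable softmax: turns scores into weights that sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors (one per token).
    A pure-Python sketch of the core operation in 'Attention is all you need'."""
    d = len(Q[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output vector: attention-weighted average of the value vectors.
        output.append([sum(w * v[i] for w, v in zip(weights, V))
                       for i in range(len(V[0]))])
    return output

# Two toy "token" vectors; every number here is made up for illustration.
Q = [[1.0, 0.0], [0.0, 1.0]]
out = attention(Q, Q, [[1.0, 2.0], [3.0, 4.0]])
print(len(out), len(out[0]))  # 2 2: one context-aware vector per input position
```

Each output position is a weighted mix of all value vectors, which is why the readings describe attention as letting every token “look at” every other token.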
Module 1 — How AI systems are designed
January 26, 2026:
- Lecture 03: Dataset design, labels, and tasks.
- Kahoot Quiz
Readings:
- Athalye, A., Northcutt, C., & Mueller, J. (2024). Dataset creation and curation in Data-Centric AI. A chapter from a forthcoming book on data-centric AI, covering the main ideas in dataset design.
- Rich, A. S., & Gureckis, T. M. (2019). Lessons for artificial intelligence from the study of natural stupidity. Nature Machine Intelligence, 1(4), 174-180. A title this good is hard to find. The article discusses different sources of bias in AI systems.
- Sapien Labs. (2024). Labeling data for machine learning: best practices and quality control. A practical guide to data labelling, with tips and common pitfalls.
- Liang, W., Tadesse, G. A., Ho, D., Fei-Fei, L., Zaharia, M., Zhang, C., & Zou, J. (2022). Advances, challenges and opportunities in creating data for trustworthy AI. Nature Machine Intelligence, 4(8), 669-677. A good discussion on the challenges of creating high-quality datasets.
- Chai, C., & Li, G. (2020). Human-in-the-loop techniques in machine learning. IEEE Data Eng. Bull., 43(3), 37-52. A little more technical, but quite comprehensive.
January 28, 2026:
- Lecture 04: Supervised, unsupervised, and reinforcement learning.
- Kahoot Quiz
- Assignment 01 due (5%).
- Assignment 02.
Readings:
- Jiang, T., Gradus, J. L., & Rosellini, A. J. (2020). Supervised machine learning: a brief primer. Behavior Therapy, 51(5), 675-687. A non-technical introduction to supervised machine learning.
- Amazon Web Services (2025). What is the difference between supervised and unsupervised learning? A practical overview with examples.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Chapter 5: Machine learning basics in Deep Learning. A technical but clear introduction to the main ideas behind machine learning, by some of the founders of deep learning.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning. A widely used book covering supervised and unsupervised learning methods. Included here for reference, as you’ll surely use it in your future career.
- Ghasemi, M., & Ebrahimi, D. (2024). Introduction to reinforcement learning. arXiv preprint. Short introduction to the topic. If you would like to read something more advanced, please refer to Sutton & Barto (2015).
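To make the supervised-learning idea in these readings concrete: a classifier learns from labelled examples and predicts labels for new inputs. A minimal sketch is the nearest-neighbour rule below; the feature vectors and labels are made up for illustration:

```python
def nearest_neighbor_predict(train, query):
    """Classify `query` by copying the label of the closest training point,
    one of the simplest supervised-learning methods."""
    def dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Toy labelled data: (feature vector, label). All values are invented.
train = [([1.0, 1.0], "cat"), ([5.0, 5.0], "dog"), ([1.2, 0.8], "cat")]
print(nearest_neighbor_predict(train, [0.9, 1.1]))  # cat
```

Unsupervised methods, by contrast, would receive only the feature vectors, with no labels to copy.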
February 02, 2026:
- Lecture 05: Metrics, validation and overfitting.
- Kahoot Quiz
Readings:
- Thomas, R. L., & Uminsky, D. (2022). Reliance on metrics is a fundamental challenge for AI. Patterns, 3(5), 100440. A discussion of the limitations of metrics.
- Akinkugbe, A. (2025). The essential guide to model evaluation metrics for classification. A practical guide to common evaluation metrics.
- Confident AI (2025). LLM evaluation metrics: the ultimate LLM evaluation guide. A simple guide on new ways to measure the quality of LLM outputs.
- Raschka, S. (2025). Understanding the 4 main approaches to LLM evaluation. A good blog post about the different ways to evaluate LLMs.
- Coursera (2025). Precision vs. recall in machine learning: What’s the difference?. A short article about those two commonly confused metrics.
- Inie, N., Stray, J., & Derczynski, L. (2023). Summon a demon and bind it: a grounded theory of LLM red teaming in the wild. arXiv preprint. A good overview of red teaming.
- Radharapu, B., Robinson, K., Aroyo, L., & Lahoti, P. (2023). AART: AI-assisted red-teaming with diverse data generation for new LLM-powered applications. arXiv preprint. A method to improve LLM safety with simple adversarial attacks.
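Since precision and recall come up repeatedly in these readings, a small worked example may help. The labels below are invented purely for illustration:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class, the two metrics the Coursera
    reading above calls 'commonly confused'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of items flagged, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were caught
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # made-up ground-truth labels
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # made-up model predictions
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```

Note that a model can trade one metric for the other: predicting “positive” for everything gives perfect recall but poor precision, which is one reason single metrics can mislead.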
Module 2 — Language and perception
February 04, 2026:
- Lecture 06: How machines handle language: tokens, embeddings and context.
- Kahoot Quiz
- Assignment 02 due (5%).
- Assignment 03.
Readings:
- HuggingFace (2025). Tokenizer summary. A really good overview of tokenisation, with examples and videos.
- Neptune.ai (2025). The ultimate guide to word embeddings. A clear and practical introduction to word embeddings.
- Neptune.ai (2025). Tokenization in NLP: types, challenges, examples, tools. Another article by the same group on tokenisation.
- StackOverflow (2023). An intuitive introduction to text embeddings. A practical introduction to embeddings.
- Metzger, S. (2022). A beginner’s guide to tokens, vectors, and embeddings in NLP. Easy to read and with interesting examples.
- Boykis, V. (2022). What are embeddings? A whole book dedicated to embeddings. For those who are interested in a more in-depth treatment. There’s also a GitHub repository with code examples.
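The embedding readings above all rest on one idea: words become vectors, and words with similar meanings get similar vectors, as measured by cosine similarity. A toy sketch with hand-made three-dimensional vectors (real embeddings are learned and have hundreds of dimensions; every number below is invented):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hand-made "embeddings" for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
# Semantically related words should score higher than unrelated ones.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]) >
      cosine_similarity(embeddings["cat"], embeddings["car"]))  # True
```

Tokenisation, covered in the same readings, is the step before this: splitting raw text into the units (words, subwords, or characters) that each get an embedding.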
February 09, 2026:
- Lecture 07: How machines see and hear.
- Kahoot Quiz
Readings:
- O’Shea, K., & Nash, R. (2015). An introduction to convolutional neural networks. arXiv preprint. A clear and concise tutorial.
- Gupta, A. (2025). How computers see the world: a beginner’s guide to CNNs. An illustrated introduction to convolutional neural networks. Feel free to skip the coding section if you are not interested.
- AltexSoft (2025). How AI sound, music, and voice generation works. Another practical introduction. It goes a little beyond what we cover in class.
- Raschka, S. (2024). Understanding Multimodal LLMs. More technical but clear.
- Wu, J., Gan, W., Chen, Z., Wan, S., & Yu, P. S. (2023). Multimodal Large Language Models: A Survey. arXiv preprint.
- Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023, July). Robust speech recognition via large-scale weak supervision. In International conference on machine learning (pp. 28492-28518). PMLR. The original paper for Whisper.
February 11, 2026:
- Lecture 08: Quiz 01.
- Assignment 03 due (5%).
- Assignment 04.
February 16, 2026:
- Lecture 09: Prompting techniques.
- Kahoot Quiz
Readings:
- Anthropic (2025). Prompt engineering overview. Official guide from Claude’s creators.
- OpenAI (2025). Prompt engineering. OpenAI’s best practices.
- Google (2025). Gemini for Workspace Prompting Guide. The persona-task-context-format (PTCF) framework.
- Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
- Wei, J., et al. (2022). Chain-of-Thought prompting elicits reasoning in large language models. The original CoT paper.
- Prompting Guide. Community-maintained reference with examples.
- Google (2025). Prompting essentials. Free course on Coursera.
- Berryman, J., & Ziegler, A. (2024). Prompt engineering for LLMs: the art and science of building large language model–based applications. A recent book about the topic.
Module 3 — Retrieval, generation and pipelines
February 18, 2026:
- Lecture 10: Creativity and hallucination.
- Kahoot Quiz
- Assignment 04 due (5%).
- Assignment 05.
Readings:
- Jiang, X., Tian, Y., Hua, F., Xu, C., Wang, Y., & Guo, J. (2024). A survey on large language model hallucination via a creativity perspective. arXiv preprint. Very interesting paper. If you want to read just one article on this topic, this is a good candidate.
- Wang, C., Liu, X., Yue, Y., Guo, Q., Hu, X., Tang, X., … & Zhang, Y. (2025). Survey on factuality in large language models. ACM Computing Surveys, 58(1), 1-37. Comprehensive and informative. Not technical.
- Anh-Hoang, D., Tran, V., & Nguyen, L. M. (2025). Survey and analysis of hallucinations in large language models: attribution to prompting strategies or model behavior. Frontiers in Artificial Intelligence, 8, 1622292. The authors argue that chain-of-thought prompting reduces hallucinations in prompt-sensitive scenarios.
- Shao, A. (2025). New sources of inaccuracy? A conceptual framework for studying AI hallucinations. Harvard Kennedy School Misinformation Review. Worth reading as well.
- OpenAI (2024). Does ChatGPT tell the truth? Practical advice.
February 23, 2026:
- Lecture 11: Retrieval-augmented generation (RAG) and semantic search.
Readings:
- Amazon Web Services (2025). What is retrieval-augmented generation (RAG)? Beginners’ guide to RAG.
- Mastering LLM (2024). 11 chunking strategies for RAG — simplified & visualized. A nice explanation of chunking strategies.
- Slack (2025). Semantic search explained: how it works, and why it matters. Good explainer.
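Of the chunking strategies covered in the readings, the simplest, fixed-size chunks with overlap, can be sketched in a few lines. The chunk and overlap sizes below are arbitrary; real pipelines tune them to the retriever and the documents:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping fixed-size character chunks, the simplest
    of the chunking strategies discussed in the readings above."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

document = "word " * 200  # stand-in for a real document (1,000 characters)
chunks = chunk_text(document)
# Consecutive chunks share their last/first 50 characters, so a sentence
# split at a chunk boundary still appears whole in at least one chunk.
print(len(chunks), chunks[1][:50] == chunks[0][-50:])  # 7 True
```

In a RAG system, each chunk would then be embedded and stored in a vector index, so the retriever can fetch only the chunks relevant to a query.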
February 25, 2026:
- Lecture 12: Quiz 02.
- Assignment 05 due (5%).
- Assignment 06.
March 02, 2026:
- Lecture 13: Building reliable pipelines: monitoring and testing.
Readings:
- Google (2025). Production ML systems: monitoring pipelines. Practical guide to monitoring ML pipelines.
- Allen, J. (2024). Data pipeline monitoring: best practices for full observability. Practical tips.
- Spot Intelligence (2024). Data drift in machine learning. An accessible overview of how data drift affects ML systems over time.
- Howard, J. (2024). Context degradation syndrome: when large language models lose the plot. A discussion of why model performance degrades over long conversations.
- Nawani, M. (2025). From correctness to confidence: understanding LLM testing. Explains the challenges of testing non-deterministic AI systems.
- ApX Machine Learning (2025). Input validation and sanitization for LLMs. Practical guide to defending AI systems against prompt injection attacks.
- ApX Machine Learning (2025). Output filtering and content moderation. Best practices for validating AI outputs before sending to users.
- AI Incident Database. A collection of real-world AI failures and incidents for case study research.
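As a toy illustration of the output-filtering ideas in the readings above, here is a rule-based output check. The length limit and banned phrases are invented for this sketch; production systems rely on dedicated moderation and validation tooling rather than simple string matching:

```python
def validate_output(text, max_len=500, banned_phrases=("BEGIN SYSTEM PROMPT",)):
    """Return a list of problems found in a model's output before it is
    shown to users; an empty list means the checks passed."""
    issues = []
    if len(text) > max_len:
        issues.append("too long")
    for phrase in banned_phrases:
        # Case-insensitive match against phrases that should never leak.
        if phrase.lower() in text.lower():
            issues.append(f"contains banned phrase: {phrase}")
    return issues

print(validate_output("Hello, here is your summary."))  # []
```

The same pattern, run checks and log every failure, is what makes a pipeline monitorable: the list of issues becomes a metric you can track over time to catch drift or attacks.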
Module 4 — Data ethics and bias
March 04, 2026:
- Lecture 14: Documentation and dataset governance.
- Assignment 06 due (5%).
- Assignment 07.
- Kahoot Quiz
Readings:
- Gebru, T., et al. (2018). Datasheets for datasets. arXiv preprint. The original paper proposing datasheets for datasets.
- Mitchell, M., et al. (2019). Model cards for model reporting. Proceedings of FAccT. The original paper proposing model cards.
- IBM (2025). What is data lineage? A short guide.
- GO-FAIR (2025). FAIR principles. The gold standard for making research data Findable, Accessible, Interoperable, and Reusable.
- OpenAI (2021). CLIP model card. A practical example of model documentation, including disaggregated performance evaluation.
- Lanier, J. (2019). Data dignity and the inversion of AI. A philosophical perspective on data rights and the idea that your data is your labour.
- Hutchinson, B., et al. (2021). Towards accountability for machine learning datasets: Practices from software engineering and infrastructure. FAccT. Bridges software engineering practices to dataset accountability.
- Crawford, K., & Paglen, T. (2019). Excavating AI: The politics of images in machine learning training sets. A compelling visual essay on ImageNet’s problems and the politics of training data.
March 9 and 11, 2026: Spring Break (No Classes)
March 16, 2026:
- Lecture 15: Types of bias and how they arise.
Readings:
- Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and machine learning. Chapter 2: When is automated decision making legitimate?
- Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1-35.
- Silberg, J., & Manyika, J. (2019). Notes from the AI frontier: tackling bias in AI (and in humans). McKinsey Global Institute.
- Hellman, D. (2025). Algorithmic fairness. The Stanford Encyclopedia of Philosophy (Fall 2025 Edition).
- Kantayya, S. (2020). Coded Bias. Documentary film. May or may not be available on Netflix depending on your location.
- Liu Roc, A. (2023). Understanding the impossibility of fairness. Medium. A good, accessible overview of the impossibility results in fairness.
March 18, 2026:
- Lecture 16: AI in finance and healthcare: opportunities and biases.
- Assignment 07 due (5%).
- No assignment this week.
- Instructions for the Final Project.
Readings:
- Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453. An important paper on the topic. We will discuss it in class.
- Li, Y. et al. (2023). Large language models in finance: a survey. ICAIF-23. A comprehensive overview. Easy to read.
- Opel, N., & Breakspear, M. (2026). Transforming mental health research and care through artificial intelligence. Science, 391(6782), 249-258. The authors review ways in which AI may reduce care inequities when deployed responsibly. They discuss how AI can be useful across several stages of care, from prevention to treatment and rehabilitation.
- Agentic Health AI. (2025). Awesome AI agents for healthcare. A curated list of research papers, projects, and resources related to the application of AI agents for healthcare, including medical image analysis, EHR manipulation, counseling, drug discovery, patient dialogue, and healthcare administration.
- Zouq, G. (2026). Awesome AI in Finance. A similar curated list for finance.
- AI4Finance Foundation (2026). AI4Finance. A series of open-source frameworks to build AI models for finance. Requires coding skills.
Module 6 — Applications, limits and projects
April 06, 2026:
Readings:
- Habicht, J., Dina, L. M., McFadyen, J., Stylianou, M., Harper, R., Hauser, T. U., & Rollwage, M. (2025). Generative AI–enabled therapy support tool for improved clinical outcomes and patient engagement in group therapy: real-world observational study. Journal of Medical Internet Research, 27, e60435. An observational study of an AI support tool in group therapy, with positive results.
- Moore, J., Grabb, D., Agnew, W., Klyman, K., Chancellor, S., Ong, D. C., & Haber, N. (2025). Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers. arXiv preprint arXiv:2504.18412. The authors question the idea that LLMs should replace mental health providers.
- OpenAI. (2025). Strengthening ChatGPT’s responses in sensitive conversations. OpenAI’s approach to making ChatGPT more helpful in sensitive conversations.
April 08, 2026:
- Lecture 22: Quiz 04.
- Assignment 09 due (5%).
- Assignment 10.
April 13, 2026:
- Lecture 23: Misinformation, deepfakes and trust online.
Readings:
- Helmus, T. C. (2022). Artificial intelligence, deepfakes, and disinformation: a primer. RAND Corporation. A thorough review of the literature.
- Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146-1151. A seminal paper on the topic of misinformation. It shows that false news spreads faster and farther than true news, and that bots are not the main driver of this phenomenon.
- Lazer, D. M., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., … & Zittrain, J. L. (2018). The science of fake news. Science, 359(6380), 1094-1096. A call to action for researchers to study misinformation and develop solutions.
- Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211-236. A widely cited paper that analyses the economics of fake news.
- Mirsky, Y., & Lee, W. (2020). The creation and detection of deepfakes: a survey. arXiv preprint arXiv:2004.11138. A technical survey on deepfakes. Quite technical, though. If you are interested in the topic but don’t have a technical background, you can skip this one and read the next one instead.
- Chesney, R., & Citron, D. K. (2019). Deepfakes and the new disinformation war. Foreign Affairs. A short article about the risks of deepfakes and how to address them.
April 15, 2026:
- Lecture 24: Long-term safety, the alignment problem and the future of AI.
- Assignment 10 due (5%).
Readings:
- Amodei, D., et al. (2016). Concrete problems in AI safety. arXiv preprint. A highly influential paper on the practical challenges of AI safety.
- Ji, J., Qiu, T., Chen, B., Zhang, B., Lou, H., Wang, K., … & Gao, W. (2023). AI alignment: a comprehensive survey. arXiv preprint. A survey of the field of AI alignment.
- Zhi-Xuan, T., Carroll, M., Franklin, M., & Ashton, H. (2025). Beyond preferences in AI alignment. Philosophical Studies, 182(7), 1813-1863. A philosophical take on the limits of rational choice in AI alignment. A little more abstract, but very interesting.
- Anthropic. (2025). Constitutional classifiers: defending against universal jailbreaks. A jailbreak defense system using input/output classifiers trained on synthetic data. It reduced jailbreak success rates substantially, though some people still found ways around it.
- Christian, B. (2024). The alignment problem: machine learning and human values. Penguin Random House. A great book on the challenges of aligning AI systems with human values.
- Russell, S. (2017). 3 principles for creating safer AI. TED Talk.
- Center for AI Safety. A non-profit organisation working on this topic. They have a lot of resources on their website.
- AI Alignment Forum. A community website for discussing AI alignment. Many of the most important papers in the field are posted here.
April 20, 2026:
- Lecture 25: Course revision.
April 22, 2026:
- Lecture 26: Quiz 05.
April 27, 2026:
- No lecture.
- Q&A session and final project work time.
- Final Project due (20%).