Lectures
Please find below the schedule of our lectures. Each lecture includes a brief description, required readings, and suggested readings for further exploration. The lecture slides and additional resources will be posted on the course website and GitHub repository. Please remember that the schedule may change during the course. If you have any questions or need further assistance, please let me know!
Module 0 — Orientation
January 14, 2026:
- Syllabus and course repository: https://github.com/danilofreire/datasci185.
- Lecture 01: Course introduction.
Readings:
- Stanford University’s Institute for Human-Centered Artificial Intelligence (2025). AI Index Annual Report. An excellent report on the state of AI, with many useful charts and references.
- Our World in Data (2025). Artificial intelligence. A concise overview of AI trends and applications, with links to further resources.
- Menlo Ventures (2025). 2025: the state of consumer AI. How AI is being used in consumer products, with examples and market data.
- Morgan Stanley (2025). AI’s next leap: 5 trends shaping innovation and ROI. A short text on the current state of AI applications in industry.
- McKinsey & Company (2025). The state of AI: how organizations are rewiring to capture value. A business-focused overview of AI adoption and impact.
- The New Yorker (2026). What is Claude? Anthropic doesn’t know, either. A fascinating article about the challenges of building and understanding large language models.
January 19, 2026: Martin Luther King Jr. Day (No Classes)
January 21, 2026:
- Lecture 02: A brief history of AI and the recent shift.
- Assignment 01.
Readings:
- Dick, S. (2019). Artificial intelligence. Harvard Data Science Review, 1(1). A brief history of AI research and applications. Written for a general audience.
- Grzybowski, A., Pawlikowska-Łagód, K., & Lambert, W. C. (2024). A history of artificial intelligence. Clinics in Dermatology, 42(3), 221-229. A “pre-history” overview, focusing on early ideas and milestones.
- Cao, Y., Liu, L., Li, M., Huang, Y., & Li, S. (2023). A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. arXiv preprint. A somewhat technical but comprehensive overview of generative AI methods.
- Halevy, A., Norvig, P., & Pereira, F. (2009). The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8-12. A classic paper arguing that large datasets are often more important than sophisticated algorithms.
- Pradhan, M. (2023). A non-technical introduction to Transformers. A clear explanation of the transformer architecture that underpins many modern AI systems.
- Georgia Tech’s Polo Club of Data Science (2025). Transformer explainer. An interactive visualisation of how transformers work.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. In case you want to read the original paper. Not for the faint of heart.
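The core operation behind the transformer readings above, scaled dot-product attention, can be sketched in a few lines of plain Python. This is only an illustrative toy (all vectors and numbers below are invented, and real models are batched, multi-headed, and run on tensors):

```python
import math

def softmax(xs):
    # Numerically stable softmax: turns scores into weights that sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors (one per token).
    A pure-Python sketch of the core operation in 'Attention is all you need'."""
    d = len(Q[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output vector: attention-weighted average of the value vectors.
        output.append([sum(w * v[i] for w, v in zip(weights, V))
                       for i in range(len(V[0]))])
    return output

# Two toy "token" vectors; every number here is made up for illustration.
Q = [[1.0, 0.0], [0.0, 1.0]]
out = attention(Q, Q, [[1.0, 2.0], [3.0, 4.0]])
print(len(out), len(out[0]))  # 2 2: one context-aware vector per input position
```

Each output position is a weighted mix of all value vectors, which is why the readings describe attention as letting every token “look at” every other token.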
Module 1 — How AI systems are designed
January 26, 2026:
- Lecture 03: Dataset design, labels, and tasks.
- Kahoot Quiz
Readings:
- Athalye, A., Northcutt, C., & Mueller, J. (2024). Dataset creation and curation in Data-Centric AI. A chapter from a forthcoming book on data-centric AI, covering the main ideas in dataset design.
- Rich, A. S., & Gureckis, T. M. (2019). Lessons for artificial intelligence from the study of natural stupidity. Nature Machine Intelligence, 1(4), 174-180. A title this good is hard to find. The article discusses different sources of bias in AI systems.
- Sapien Labs. (2024). Labeling data for machine learning: best practices and quality control. A practical guide to data labelling, with tips and common pitfalls.
- Liang, W., Tadesse, G. A., Ho, D., Fei-Fei, L., Zaharia, M., Zhang, C., & Zou, J. (2022). Advances, challenges and opportunities in creating data for trustworthy AI. Nature Machine Intelligence, 4(8), 669-677. A good discussion on the challenges of creating high-quality datasets.
- Chai, C., & Li, G. (2020). Human-in-the-loop techniques in machine learning. IEEE Data Eng. Bull., 43(3), 37-52. A little more technical, but quite comprehensive.
January 28, 2026:
- Lecture 04: Supervised, unsupervised, and reinforcement learning.
- Kahoot Quiz
- Assignment 01 due (5%).
- Assignment 02.
Readings:
- Jiang, T., Gradus, J. L., & Rosellini, A. J. (2020). Supervised machine learning: a brief primer. Behavior Therapy, 51(5), 675-687. A non-technical introduction to supervised machine learning.
- Amazon Web Services (2025). What is the difference between supervised and unsupervised learning? A practical overview with examples.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Chapter 5: Machine learning basics in Deep Learning. A technical but clear introduction to the main ideas behind machine learning, by some of the founders of deep learning.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning. A widely used book covering supervised and unsupervised learning methods. Included here for reference, as you’ll surely use it in your future career.
- Ghasemi, M., & Ebrahimi, D. (2024). Introduction to reinforcement learning. arXiv preprint. Short introduction to the topic. If you would like to read something more advanced, please refer to Sutton & Barto (2015).
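To make the supervised-learning idea in these readings concrete: a classifier learns from labelled examples and predicts labels for new inputs. A minimal sketch is the nearest-neighbour rule below; the feature vectors and labels are made up for illustration:

```python
def nearest_neighbor_predict(train, query):
    """Classify `query` by copying the label of the closest training point,
    one of the simplest supervised-learning methods."""
    def dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Toy labelled data: (feature vector, label). All values are invented.
train = [([1.0, 1.0], "cat"), ([5.0, 5.0], "dog"), ([1.2, 0.8], "cat")]
print(nearest_neighbor_predict(train, [0.9, 1.1]))  # cat
```

Unsupervised methods, by contrast, would receive only the feature vectors, with no labels to copy.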
February 02, 2026:
- Lecture 05: Metrics, validation and overfitting.
- Kahoot Quiz
Readings:
- Thomas, R. L., & Uminsky, D. (2022). Reliance on metrics is a fundamental challenge for AI. Patterns, 3(5), 100440. A discussion of the limitations of metrics.
- Akinkugbe, A. (2025). The essential guide to model evaluation metrics for classification. A practical guide to common evaluation metrics.
- Confident AI (2025). LLM evaluation metrics: the ultimate LLM evaluation guide. A simple guide on new ways to measure the quality of LLM outputs.
- Raschka, S. (2025). Understanding the 4 main approaches to LLM evaluation. A good blog post about the different ways to evaluate LLMs.
- Coursera (2025). Precision vs. recall in machine learning: What’s the difference?. A short article about those two commonly confused metrics.
- Inie, N., Stray, J., & Derczynski, L. (2023). Summon a demon and bind it: a grounded theory of LLM red teaming in the wild. arXiv preprint. A good overview of red teaming.
- Radharapu, B., Robinson, K., Aroyo, L., & Lahoti, P. (2023). AART: AI-assisted red-teaming with diverse data generation for new LLM-powered applications. arXiv preprint. A method to improve LLM safety with simple adversarial attacks.
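Since precision and recall come up repeatedly in these readings, a small worked example may help. The labels below are invented purely for illustration:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class, the two metrics the Coursera
    reading above calls 'commonly confused'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of items flagged, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were caught
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # made-up ground-truth labels
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # made-up model predictions
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```

Note that a model can trade one metric for the other: predicting “positive” for everything gives perfect recall but poor precision, which is one reason single metrics can mislead.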
Module 2 — Language and perception
February 04, 2026:
- Lecture 06: How machines handle language: tokens, embeddings and context.
- Kahoot Quiz
- Assignment 02 due (5%).
- Assignment 03.
Readings:
- HuggingFace (2025). Tokenizer summary. A really good overview of tokenisation, with examples and videos.
- Neptune.ai (2025). The ultimate guide to word embeddings. A clear and practical introduction to word embeddings.
- Neptune.ai (2025). Tokenization in NLP: types, challenges, examples, tools. Another article by the same group on tokenisation.
- StackOverflow (2023). An intuitive introduction to text embeddings. A practical introduction to embeddings.
- Metzger, S. (2022). A beginner’s guide to tokens, vectors, and embeddings in NLP. Easy to read and with interesting examples.
- Boykis, V. (2022). What are embeddings? A whole book dedicated to embeddings. For those who are interested in a more in-depth treatment. There’s also a GitHub repository with code examples.
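The embedding readings above all rest on one idea: words become vectors, and words with similar meanings get similar vectors, as measured by cosine similarity. A toy sketch with hand-made three-dimensional vectors (real embeddings are learned and have hundreds of dimensions; every number below is invented):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hand-made "embeddings" for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
# Semantically related words should score higher than unrelated ones.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]) >
      cosine_similarity(embeddings["cat"], embeddings["car"]))  # True
```

Tokenisation, covered in the same readings, is the step before this: splitting raw text into the units (words, subwords, or characters) that each get an embedding.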
February 09, 2026:
- Lecture 07: How machines see and hear.
- Kahoot Quiz
Readings:
- O’Shea, K., & Nash, R. (2015). An introduction to convolutional neural networks. arXiv preprint. A clear and concise tutorial.
- Gupta, A. (2025). How computers see the world: a beginner’s guide to CNNs. An illustrated introduction to convolutional neural networks. Feel free to skip the coding section if you are not interested.
- AltexSoft (2025). How AI sound, music, and voice generation works. Another practical introduction. It goes a little beyond what we cover in class.
- Raschka, S. (2024). Understanding Multimodal LLMs. More technical but clear.
- Wu, J., Gan, W., Chen, Z., Wan, S., & Yu, P. S. (2023). Multimodal Large Language Models: A Survey. arXiv preprint.
- Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023, July). Robust speech recognition via large-scale weak supervision. In International conference on machine learning (pp. 28492-28518). PMLR. The original paper for Whisper.
February 11, 2026:
- Lecture 08: Quiz 01.
- Assignment 03 due (5%).
- Assignment 04.
February 16, 2026:
- Lecture 09: Prompting techniques.
- Kahoot Quiz
Readings:
- Anthropic (2025). Prompt engineering overview. Official guide from Claude’s creators.
- OpenAI (2025). Prompt engineering. OpenAI’s best practices.
- Google (2025). Gemini for Workspace Prompting Guide. The persona-task-context-format (PTCF) framework.
- Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
- Wei, J., et al. (2022). Chain-of-Thought prompting elicits reasoning in large language models. The original CoT paper.
- Prompting Guide. Community-maintained reference with examples.
- Google (2025). Prompting essentials. Free course on Coursera.
- Berryman, J., & Ziegler, A. (2024). Prompt engineering for LLMs: the art and science of building large language model–based applications. A recent book about the topic.
Module 3 — Retrieval, generation and pipelines
February 18, 2026:
- Lecture 10: Creativity and hallucination.
- Kahoot Quiz
- Assignment 04 due (5%).
- Assignment 05.
Readings:
- Jiang, X., Tian, Y., Hua, F., Xu, C., Wang, Y., & Guo, J. (2024). A survey on large language model hallucination via a creativity perspective. arXiv preprint. Very interesting paper. If you want to read just one article on this topic, this is a good candidate.
- Wang, C., Liu, X., Yue, Y., Guo, Q., Hu, X., Tang, X., … & Zhang, Y. (2025). Survey on factuality in large language models. ACM Computing Surveys, 58(1), 1-37. Comprehensive and informative. Not technical.
- Anh-Hoang, D., Tran, V., & Nguyen, L. M. (2025). Survey and analysis of hallucinations in large language models: attribution to prompting strategies or model behavior. Frontiers in Artificial Intelligence, 8, 1622292. The authors argue that chain-of-thought prompting reduces hallucinations in prompt-sensitive scenarios.
- Shao, A. (2025). New sources of inaccuracy? A conceptual framework for studying AI hallucinations. Harvard Kennedy School Misinformation Review. Worth reading as well.
- OpenAI (2024). Does ChatGPT tell the truth? Practical advice.
February 23, 2026:
- Lecture 11: Retrieval-augmented generation (RAG) and semantic search.
Readings:
- Amazon Web Services (2025). What is retrieval-augmented generation (RAG)? Beginners’ guide to RAG.
- Mastering LLM (2024). 11 chunking strategies for RAG — simplified & visualized. A nice explanation of chunking strategies.
- Slack (2025). Semantic search explained: how it works, and why it matters. Good explainer.
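Of the chunking strategies covered in the readings, the simplest, fixed-size chunks with overlap, can be sketched in a few lines. The chunk and overlap sizes below are arbitrary; real pipelines tune them to the retriever and the documents:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping fixed-size character chunks, the simplest
    of the chunking strategies discussed in the readings above."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

document = "word " * 200  # stand-in for a real document (1,000 characters)
chunks = chunk_text(document)
# Consecutive chunks share their last/first 50 characters, so a sentence
# split at a chunk boundary still appears whole in at least one chunk.
print(len(chunks), chunks[1][:50] == chunks[0][-50:])  # 7 True
```

In a RAG system, each chunk would then be embedded and stored in a vector index, so the retriever can fetch only the chunks relevant to a query.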
February 25, 2026:
- Lecture 12: Quiz 02.
- Assignment 05 due (5%).
- Assignment 06.
March 02, 2026:
- Lecture 13: Building reliable pipelines: monitoring and testing.
Readings:
- Google (2025). Production ML systems: monitoring pipelines. Practical guide to monitoring ML pipelines.
- Allen, J. (2024). Data pipeline monitoring: best practices for full observability. Practical tips.
- Spot Intelligence (2024). Data drift in machine learning. An accessible overview of how data drift affects ML systems over time.
- Howard, J. (2024). Context degradation syndrome: when large language models lose the plot. A discussion of why model performance degrades over long conversations.
- Nawani, M. (2025). From correctness to confidence: understanding LLM testing. Explains the challenges of testing non-deterministic AI systems.
- ApX Machine Learning (2025). Input validation and sanitization for LLMs. Practical guide to defending AI systems against prompt injection attacks.
- ApX Machine Learning (2025). Output filtering and content moderation. Best practices for validating AI outputs before sending to users.
- AI Incident Database. A collection of real-world AI failures and incidents for case study research.
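As a toy illustration of the output-filtering ideas in the readings above, here is a rule-based output check. The length limit and banned phrases are invented for this sketch; production systems rely on dedicated moderation and validation tooling rather than simple string matching:

```python
def validate_output(text, max_len=500, banned_phrases=("BEGIN SYSTEM PROMPT",)):
    """Return a list of problems found in a model's output before it is
    shown to users; an empty list means the checks passed."""
    issues = []
    if len(text) > max_len:
        issues.append("too long")
    for phrase in banned_phrases:
        # Case-insensitive match against phrases that should never leak.
        if phrase.lower() in text.lower():
            issues.append(f"contains banned phrase: {phrase}")
    return issues

print(validate_output("Hello, here is your summary."))  # []
```

The same pattern, run checks and log every failure, is what makes a pipeline monitorable: the list of issues becomes a metric you can track over time to catch drift or attacks.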
Module 4 — Data ethics and bias
March 04, 2026:
- Lecture 14: Documentation and dataset governance.
- Assignment 06 due (5%).
- Assignment 07.
- Kahoot Quiz
Readings:
- Gebru, T., et al. (2018). Datasheets for datasets. arXiv preprint. The original paper proposing datasheets for datasets.
- Mitchell, M., et al. (2019). Model cards for model reporting. Proceedings of FAccT. The original paper proposing model cards.
- IBM (2025). What is data lineage? A short guide.
- GO-FAIR (2025). FAIR principles. The gold standard for making research data Findable, Accessible, Interoperable, and Reusable.
- OpenAI (2021). CLIP model card. A practical example of model documentation, including disaggregated performance evaluation.
- Lanier, J. (2019). Data dignity and the inversion of AI. A philosophical perspective on data rights and the idea that your data is your labour.
- Hutchinson, B., et al. (2021). Towards accountability for machine learning datasets: Practices from software engineering and infrastructure. FAccT. Bridges software engineering practices to dataset accountability.
- Crawford, K., & Paglen, T. (2019). Excavating AI: The politics of images in machine learning training sets. A compelling visual essay on ImageNet’s problems and the politics of training data.
March 9 and 11, 2026: Spring Break (No Classes)
March 16, 2026:
- Lecture 15: Types of bias and how they arise.
Readings:
- Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and machine learning. Chapter 2: When is automated decision making legitimate?
- Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1-35.
- Silberg, J., & Manyika, J. (2019). Notes from the AI frontier: tackling bias in AI (and in humans). McKinsey Global Institute.
- Hellman, D. (2025). Algorithmic fairness. The Stanford Encyclopedia of Philosophy (Fall 2025 Edition).
- Kantayya, S. (2020). Coded Bias. Documentary film. May or may not be available on Netflix depending on your location.
- Liu Roc, A. (2023). Understanding the impossibility of fairness. Medium. A good, accessible overview of the impossibility results in fairness.
March 18, 2026:
- Lecture 16: AI in finance and healthcare: opportunities and biases.
- Assignment 07 due (5%).
- No assignment this week.
- Instructions for the Final Project.
Readings:
- Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453. An important paper on the topic. We will discuss it in class.
- Li, Y. et al. (2023). Large language models in finance: a survey. ICAIF-23. A comprehensive overview. Easy to read.
- Opel, N., & Breakspear, M. (2026). Transforming mental health research and care through artificial intelligence. Science, 391(6782), 249-258. The authors review ways in which AI may reduce care inequities when deployed responsibly. They discuss how AI can be useful across several stages of care, from prevention to treatment and rehabilitation.
- Agentic Health AI. (2025). Awesome AI agents for healthcare. A curated list of research papers, projects, and resources related to the application of AI agents for healthcare, including medical image analysis, EHR manipulation, counseling, drug discovery, patient dialogue, and healthcare administration.
- Zouq, G. (2026). Awesome AI in Finance. A similar curated list for finance.
- AI4Finance Foundation (2026). AI4Finance. A series of open-source frameworks to build AI models for finance. Requires coding skills.
Module 6 — Applications, limits and projects
April 06, 2026:
Readings:
- Habicht, J., Dina, L. M., McFadyen, J., Stylianou, M., Harper, R., Hauser, T. U., & Rollwage, M. (2025). Generative AI–enabled therapy support tool for improved clinical outcomes and patient engagement in group therapy: real-world observational study. Journal of Medical Internet Research, 27, e60435. An observational study of an AI support tool in group therapy, with positive results.
- Moore, J., Grabb, D., Agnew, W., Klyman, K., Chancellor, S., Ong, D. C., & Haber, N. (2025). Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers. arXiv preprint arXiv:2504.18412. The authors question the idea that LLMs should replace mental health providers.
- OpenAI. (2025). Strengthening ChatGPT’s responses in sensitive conversations. OpenAI’s approach to making ChatGPT more helpful in sensitive conversations.
April 08, 2026:
- Lecture 22: Quiz 04.
- Assignment 09 due (5%).
- Assignment 10.
April 13, 2026:
- Lecture 23: Misinformation, deepfakes and trust online.
Readings:
- Helmus, T. C. (2022). Artificial intelligence, deepfakes, and disinformation: a primer. RAND Corporation. A thorough review of the literature.
- Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146-1151. A seminal paper on the topic of misinformation. It shows that false news spreads faster and farther than true news, and that bots are not the main driver of this phenomenon.
- Lazer, D. M., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., … & Zittrain, J. L. (2018). The science of fake news. Science, 359(6380), 1094-1096. A call to action for researchers to study misinformation and develop solutions.
- Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211-236. A widely cited paper that analyses the economics of fake news.
- Mirsky, Y., & Lee, W. (2020). The creation and detection of deepfakes: a survey. arXiv preprint arXiv:2004.11138. A technical survey on deepfakes. Quite technical, though. If you are interested in the topic but don’t have a technical background, you can skip this one and read the next one instead.
- Chesney, R., & Citron, D. K. (2019). Deepfakes and the new disinformation war. Foreign Affairs. A short article about the risks of deepfakes and how to address them.
April 15, 2026:
- Lecture 24: Long-term safety, the alignment problem and the future of AI.
- Assignment 10 due (5%).
Readings:
- Amodei, D., et al. (2016). Concrete problems in AI safety. arXiv preprint. A highly influential paper on the practical challenges of AI safety.
- Ji, J., Qiu, T., Chen, B., Zhang, B., Lou, H., Wang, K., … & Gao, W. (2023). AI alignment: a comprehensive survey. arXiv preprint. A survey of the field of AI alignment.
- Zhi-Xuan, T., Carroll, M., Franklin, M., & Ashton, H. (2025). Beyond preferences in AI alignment. Philosophical Studies, 182(7), 1813-1863. A philosophical take on the limits of rational choice in AI alignment. A little more abstract, but very interesting.
- Anthropic. (2025). Constitutional classifiers: defending against universal jailbreaks. A jailbreak defense system using input/output classifiers trained on synthetic data. It reduced jailbreak success rates substantially, though some people still found ways around it.
- Christian, B. (2024). The alignment problem: machine learning and human values. Penguin Random House. A great book on the challenges of aligning AI systems with human values.
- Russell, S. (2017). 3 principles for creating safer AI. TED Talk.
- Center for AI Safety. A non-profit organisation working on this topic. They have a lot of resources on their website.
- AI Alignment Forum. A community website for discussing AI alignment. Many of the most important papers in the field are posted here.
April 20, 2026:
- Lecture 25: Course revision.
April 22, 2026:
- Lecture 26: Quiz 05.
April 27, 2026:
- No lecture.
- Q&A session and final project work time.
- Final Project due (20%).