Lectures
The course covers the following topics, with corresponding lecture materials available in the lectures folder. Please refer to the syllabus for additional suggested readings on each topic. Links will be added as the materials are posted.
Module 01: Introduction to Python, Jupyter, and GitHub
Wednesday, August 28:
- Syllabus and course repository: https://github.com/danilofreire/qtm350.
- Lecture 01: Welcome to QTM 350 - Introduction.
- Course Tutorials: How to Install Anaconda, Jupyter, PostgreSQL, VSCode, and Open a Free Educational Account on GitHub.
Suggested references:
- Cleveland, W. S. (2001). Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review, 69(1), 21-26.
- Donoho, D. (2017). 50 Years of Data Science. Journal of Computational and Graphical Statistics, 26(4), 745-766.
- Breiman, L. (2001). Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author). Statistical Science, 16(3), 199-231.
- Brady, H. E. (2019). The Challenge of Big Data and Data Science. Annual Review of Political Science, 22(1), 297-323.
- Zitnik, M., Nguyen, F., Wang, B., Leskovec, J., Goldenberg, A., & Hoffman, M. M. (2019). Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Information Fusion, 50, 71-91.
Monday, September 02: Labour Day (no class)
Wednesday, September 04:
- Lecture 02: Computational Literacy.
- Assignment 01: Problem Set 01.
Suggested references:
- Campbell-Kelly, M., Aspray, W. F., Yost, J. R., Tinn, H., & Díaz, G. C. (2023). Computer: A History of the Information Machine. Routledge.
- Shalf, J. (2020). The Future of Computing beyond Moore’s Law. Philosophical Transactions of the Royal Society A, 378(2166), 20190061.
- Al-Hashimi, H. M. (2023). Turing, von Neumann, and The Computational Architecture of Biological Machines. Proceedings of the National Academy of Sciences, 120(25), e2220022120.
- Wing, J. M. (2006). Computational Thinking. Communications of the ACM, 49(3), 33-35.
- Videos: David J. Malan - Abstraction, Khan Academy - Hexadecimal Number System, Matthias Wandel - Marble Adding Machine, Crash Course - Early Computing and Electronic Computing (the last two are quite entertaining!).
Module 02: Introduction to the Command Line Interface and Version Control
Monday, September 09:
Suggested references:
- Janssens, J. (2021). Data science at the command line: Obtain, scrub, explore, and model data with Unix power tools (2nd ed.). O’Reilly Media.
- Levy, J. (2024). The art of command line. GitHub.
- Shotts, W. (2019). The Linux Command Line: A Complete Introduction. No Starch Press.
- Healy, K. (2019). The Plain Person’s Guide to Plain Text Social Science. Chapters 1-5.
Wednesday, September 11:
- Lecture 04: More Command Line Tools, Text Files, and Scripting.
- Kahoot Quiz.
- Assignment 01 due (5%).
- Assignment 02: Problem Set 02.
Suggested references:
- Kerr, D. (2024). Effective Shell.
- Irianto, I. (2021). Learn Vim (the Smart Way).
- Neil, D. (2015). Practical Vim: Edit Text at the Speed of Thought. Pragmatic Bookshelf.
- Dennis, J. Your problem with Vim is that you don’t grok vi. (Stack Overflow).
- Vim Adventures. (Instructor’s note: this is a fun, albeit cringy, way to learn Vim).
- Videos: freeCodeCamp - Command line crash course, Percy Grunwald - Absolute beginner guide to the macOS terminal, NetworkChuck - 50 macOS tips and tricks using terminal.
Monday, September 16:
- Lecture 05: Version Control with Git and GitHub.
- Kahoot Quiz.
Suggested references:
- Chacon, S. and Straub, B. (2014). Pro Git. Apress. (Instructor’s note: this is the book on Git).
- GitHub tutorials: GitHub skills (recommended), Git guides, GitHub learning lab, Best practices for repositories.
Wednesday, September 18:
- Lecture 06: More Git and GitHub: pull requests, issues, pages, and collaboration features.
- Kahoot Quiz.
- Assignment 02 due (5%).
- Assignment 03: Problem Set 03.
Suggested references:
- Perez-Riverol, Y., Gatto, L., Wang, R., Sachsenberg, T., Uszkoreit, J., Leprevost, F. da V., Fufezan, C., Ternent, T., Eglen, S. J., Katz, D. S., Pollard, T. J., Konovalov, A., Flight, R. M., Blin, K., & Vizcaíno, J. A. (2016). Ten Simple Rules for Taking Advantage of Git and GitHub. PLOS Computational Biology, 12(7), e1004947.
- Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J., & Tackett, M. (2021). Implementing version control with git and GitHub as a learning objective in statistics and data science courses. Journal of Statistics and Data Science Education, 29(sup1), S132-S144.
- Escamilla, E., Klein, M., Cooper, T., Rampin, V., Weigle, M. C., & Nelson, M. L. (2022). The Rise of GitHub in Scholarly Publications. arXiv preprint arXiv:2208.04895.
Monday, September 23:
- Lecture 07: Quiz 01: Git and GitHub (6%).
Module 03: Literate Programming with Markdown, Quarto, and Jupyter
Wednesday, September 25:
- Lecture 08: Using Quarto for Reproducible Reports.
- Assignment 03 due (5%).
- Assignment 04: Problem Set 04.
Suggested references:
- Quarto official website.
- Awesome Quarto: https://github.com/mcanouil/awesome-quarto. Note: this repository contains dozens of tutorials, examples, and resources.
- Çetinkaya-Rundel, M. & Lowndes, J. S. (2022). Keynote talk: Hello Quarto: Share • Collaborate • Teach • Reimagine. Slides and source code. This is one of the nicest Quarto presentations I have seen.
- Getting Started with Quarto (YouTube). Note: Posit (formerly RStudio) has a playlist of Quarto tutorials on its YouTube channel.
- Markdown Guide.
- Jupyter Notebooks Documentation.
- Codecademy - How to use Jupyter Notebooks.
- Course tutorial: Jupyter and Markdown.
Monday, September 30:
Suggested references:
- Quarto Documentation - Presentations and Websites.
- GitHub Pages Documentation.
- French, J. (2023). Creating Websites with Quarto and GitHub Pages (YouTube Playlist).
- Taylor, I. (2022). Publishing a Quarto Site to GitHub Pages.
Wednesday, October 02:
- Lecture 10: Quiz 02: Literate Programming (6%).
- Assignment 05: Problem Set 05.
- Assignment 04 due (5%).
Module 04: AI-Assisted Programming
Monday, October 07:
- Lecture 11: Introduction to AI-Assisted Programming and Chatbots.
Suggested references:
- Cihon, P. & Demirer, M. (2023). How AI-powered software development may affect labor markets. Brookings Institution.
- Poldrack, R. A., Lu, T., & Beguš, G. (2023). AI-assisted Coding: Experiments with GPT-4. arXiv preprint arXiv:2304.13187.
- Lau, S. & Guo, P. (2023). From “Ban It Till We Understand It” to “Resistance is Futile”: How University Programming Instructors Plan to Adapt as More Students Use AI Code Generation and Explanation Tools such as ChatGPT and GitHub Copilot. In Proceedings of the 2023 ACM Conference on International Computing Education Research V.1 (ICER ’23 V1), August 07–11, 2023, Chicago, IL, USA. ACM, New York, NY, USA, 16 pages.
- Linus Torvalds Discusses the Impact of AI on Programming (YouTube).
Wednesday, October 09:
- Lecture 12: AI-Assisted Programming with GitHub Copilot.
- Kahoot Quiz.
- Assignment 05 due (5%).
- Assignment 06: Problem Set 06.
Suggested references:
- GitHub Copilot Documentation.
- Using GitHub Copilot in your IDE: Tips, Tricks, and Best Practices.
- Using GitHub Copilot in the Command Line.
- Coding with an AI Pair Programmer: Getting Started with GitHub Copilot (YouTube).
- GitHub Copilot YouTube Playlist.
- Labadze, L., Grigolia, M., & Machaidze, L. (2023). Role of AI Chatbots in Education: Systematic Literature Review. International Journal of Educational Technology in Higher Education, 20(1), 56.
Module 05: Data Manipulation with Python
Monday, October 14: Fall Break (no class)
Wednesday, October 16:
- Lecture 13: Python Data Types, Boolean Logic, and Control Structures.
- Kahoot Quiz.
- Assignment 06 due (5%).
- Assignment 07: Problem Set 07.
Suggested references:
- Python Documentation: An Informal Introduction to Python.
- Python Documentation: More Control Flow Tools.
- Python Documentation: Compound Statements.
- NumPy Documentation: Quickstart Tutorial.
- Programiz: Math Operations in Python.
- Matthes, E. (2019). Python Crash Course: A Hands-On, Project-Based Introduction to Programming (2nd ed.). No Starch Press. Chapter 02.
- Severance, C. (2016). Python for Everybody: Exploring Data in Python 3. CreateSpace Independent Publishing Platform. Chapters 3-11 (Note: Read only the chapters which interest you).
Monday, October 21:
- Lecture 14: Introduction to Pandas.
- Kahoot Quiz.
Wednesday, October 23:
- Lecture 15: Pandas for Data Analysis: Data Wrangling and Aggregating.
- Kahoot Quiz.
- Assignment 07 due (5%).
- Assignment 08: Problem Set 08.
Suggested references:
- McKinney, W. (2022). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (3rd ed.). O’Reilly Media. Chapter 05: Getting Started with Pandas.
- VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O’Reilly Media. Chapter 3: Data Manipulation with Pandas.
- McKinney, W. (2022). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (3rd ed.). O’Reilly Media. Chapter 07: Data Cleaning and Preparation.
- DataCamp: Pandas Tutorial: DataFrames in Python.
- Real Python: Pandas Tutorial: DataFrames in Python.
Monday, October 28:
- Lecture 16: Quiz 03: Python for Data Analysis (6%).
Module 06: Introduction to SQL Databases
Wednesday, October 30:
- Lecture 17: Introduction to PostgreSQL: Data Types, Tables, and Queries.
- Assignment 08 due (5%).
- Assignment 09: Problem Set 09.
- Instructions for the Final Project.
Suggested references:
- Mode Analytics: SQL Tutorial.
- Real Python: SQL Databases and SQLite.
- Khan Academy: SQL Basics. (Note: Khan Academy is a great resource for learning SQL and other programming languages).
- Coursera: PostgreSQL for Everybody.
- PostgreSQL Tutorial.
- PostgreSQL Documentation: SQL Commands. (Note: For reference only).
Monday, November 04:
Wednesday, November 06:
- Lecture 19: Merging Tables in SQL.
- Kahoot Quiz.
- Assignment 09 due (5%).
- Assignment 10: Problem Set 10.
Suggested references:
Monday, November 11:
- Lecture 20: Quiz 04: SQL Databases (6%).
Module 07: Parallel Computing
Wednesday, November 13:
- Lecture 21: Parallel Computing with Dask.
- Assignment 10 due (5%).
Suggested references:
- Dask Documentation.
- Dask Tutorial.
- Coiled - Intro to Dask Tutorial (YouTube).
- Rocklin, M. (2017). Dask: Flexible Library for Parallel Computing in Python. In Proceedings of the 16th Python in Science Conference (Vol. 126, p. 130).
Monday, November 18:
- Lecture 22: Application: Parallelising Data Analysis with Dask and AutoML.
Suggested references:
- Dask Documentation: Machine Learning.
- He, X., Zhao, K., & Chu, X. (2021). AutoML: A Survey of the State-of-the-Art. Knowledge-Based Systems, 212, 106622.
- TPOT Documentation.
Module 08: Containers and Reproducibility
Wednesday, November 20:
- Lecture 23: Dependency Management, Virtual Environments, and Containers.
Suggested references:
Monday, November 25:
- Lecture 24: Docker for Data Science.
Wednesday, November 27: Thanksgiving Break (no class)
Monday, December 02:
- Lecture 25: Quiz 05: Dask, Docker and Containers (6%).
Wednesday, December 04:
- Lecture 26: Review and Final Project Discussion.
Monday, December 09:
- Final Project due (20%).