Lectures
The course covers the following topics, with corresponding lecture materials available in the lectures folder. Please refer to the syllabus for additional suggested readings on each topic. Links will be added as the materials are posted.
Module 01: Introduction to Python, Jupyter, and GitHub
Wednesday, January 15:
- Syllabus and course repository: https://github.com/danilofreire/qtm350.
- Lecture 01: Welcome to QTM 350 - Introduction.
- Course Tutorials: How to Install Anaconda, Jupyter, PostgreSQL, VSCode, and Open a Free Educational Account on GitHub.
Suggested references:
- Cleveland, W. S. (2001). Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review, 69(1), 21-26.
- Donoho, D. (2017). 50 Years of Data Science. Journal of Computational and Graphical Statistics, 26(4), 745-766.
- Breiman, L. (2001). Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author). Statistical Science, 16(3), 199-231.
- Brady, H. E. (2019). The Challenge of Big Data and Data Science. Annual Review of Political Science, 22(1), 297-323.
- Zitnik, M., Nguyen, F., Wang, B., Leskovec, J., Goldenberg, A., & Hoffman, M. M. (2019). Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Information Fusion, 50, 71-91.
Monday, January 20: Martin Luther King Jr. Day (no class)
Wednesday, January 22:
- Lecture 02: Computational Literacy.
- Assignment 01: Problem Set 01.
Suggested references:
- Campbell-Kelly, M., Aspray, W. F., Yost, J. R., Tinn, H., & Díaz, G. C. (2023). Computer: A History of the Information Machine. Routledge.
- Shalf, J. (2020). The Future of Computing beyond Moore’s Law. Philosophical Transactions of the Royal Society A, 378(2166), 20190061.
- Al-Hashimi, H. M. (2023). Turing, von Neumann, and The Computational Architecture of Biological Machines. Proceedings of the National Academy of Sciences, 120(25), e2220022120.
- Wing, J. M. (2006). Computational Thinking. Communications of the ACM, 49(3), 33-35.
- Videos: David J. Malan - Abstraction, Khan Academy - Hexadecimal Number System, Matthias Wandel - Marble Adding Machine, Crash Course - Early Computing and Electronic Computing (the last two are quite entertaining!).
Module 02: Introduction to the Command-Line Interface and Version Control
Monday, January 27:
Suggested references:
- Janssens, J. (2021). Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools (2nd ed.). O’Reilly Media.
- Levy, J. (2024). The Art of Command Line. GitHub.
- Shotts, W. (2019). The Linux Command Line: A Complete Introduction. No Starch Press.
- Healy, K. (2019). The Plain Person’s Guide to Plain Text Social Science. Chapters 1-5.
Wednesday, January 29:
- Lecture 04: More command line tools, text files and scripting.
- Assignment 01 due (5%).
- Assignment 02: Problem Set 02.
Suggested references:
- Kerr, D. (2024). Effective Shell.
- Irianto, I. (2021). Learn Vim (the Smart Way).
- Neil, D. (2015). Practical Vim: Edit Text at the Speed of Thought. Pragmatic Bookshelf.
- Dennis, J. Your problem with Vim is that you don’t grok vi. (Stack Overflow).
- Vim Adventures. (Instructor’s note: this is a fun, albeit cringy, way to learn Vim).
- Videos: freeCodeCamp - Command line crash course, Percy Grunwald - Absolute beginner guide to the macOS terminal, NetworkChuck - 50 macOS tips and tricks using terminal.
Monday, February 03:
- Lecture 05: Version control with Git and GitHub.
Suggested references:
- Chacon, S. and Straub, B. (2014). Pro Git. Apress. (Instructor’s note: this is the book on Git).
- GitHub tutorials: GitHub skills (recommended), Git guides, GitHub learning lab, Best practices for repositories.
Wednesday, February 05:
- Lecture 06: More Git and GitHub: pull requests, issues, pages, and collaboration features.
- Assignment 02 due (5%).
- Assignment 03: Problem Set 03.
Suggested references:
- Perez-Riverol, Y., Gatto, L., Wang, R., Sachsenberg, T., Uszkoreit, J., Leprevost, F. da V., Fufezan, C., Ternent, T., Eglen, S. J., Katz, D. S., Pollard, T. J., Konovalov, A., Flight, R. M., Blin, K., & Vizcaíno, J. A. (2016). Ten Simple Rules for Taking Advantage of Git and GitHub. PLOS Computational Biology, 12(7), e1004947.
- Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J., & Tackett, M. (2021). Implementing version control with git and GitHub as a learning objective in statistics and data science courses. Journal of Statistics and Data Science Education, 29(sup1), S132-S144.
- Escamilla, E., Klein, M., Cooper, T., Rampin, V., Weigle, M. C., & Nelson, M. L. (2022). The Rise of GitHub in Scholarly Publications. arXiv preprint arXiv:2208.04895.
Monday, February 10:
- Lecture 07: Quiz 01: Git and GitHub (6%).
Module 03: Literate Programming with Markdown, Quarto, and Jupyter
Wednesday, February 12:
- Lecture 08: Using Quarto for Reproducible Reports.
- Assignment 03 due (5%).
- Assignment 04: Problem Set 04.
Suggested references:
- Quarto official website.
- Awesome Quarto: https://github.com/mcanouil/awesome-quarto. Note: this repository contains dozens of tutorials, examples, and resources.
- Çetinkaya-Rundel, M. & Lowndes, J. S. (2022). Keynote talk: Hello Quarto: Share • Collaborate • Teach • Reimagine. Slides and source code. This is one of the nicest Quarto presentations I have seen.
- Getting Started with Quarto (YouTube). Note: Posit (formerly RStudio) has a series of tutorials on Quarto on their YouTube channel. You can find their playlist here.
- Markdown Guide.
- Jupyter Notebooks Documentation.
- Codecademy - How to use Jupyter Notebooks.
- Course tutorial: Jupyter and Markdown.
Monday, February 17:
Suggested references:
- Quarto Documentation - Presentations and Websites.
- GitHub Pages Documentation.
- French, J. (2023). Creating Websites with Quarto and GitHub Pages (YouTube Playlist).
- Taylor, I. (2022). Publishing a Quarto Site to GitHub Pages.
Wednesday, February 19:
- Lecture 10: Quiz 02: Literate Programming (6%).
- Assignment 05: Problem Set 05.
- Assignment 04 due (5%).
Module 04: AI-Assisted Programming
Monday, February 24:
- Lecture 11: Introduction to AI-Assisted Programming and Chatbots.
Suggested references:
- Cihon, P. & Demirer, M. (2023). How AI-powered software development may affect labor markets. Brookings Institution.
- Poldrack, R. A., Lu, T., & Beguš, G. (2023). AI-assisted Coding: Experiments with GPT-4. arXiv preprint arXiv:2304.13187.
- Lau, S. & Guo, P. (2023). From “Ban It Till We Understand It” to “Resistance is Futile”: How University Programming Instructors Plan to Adapt as More Students Use AI Code Generation and Explanation Tools such as ChatGPT and GitHub Copilot. In Proceedings of the 2023 ACM Conference on International Computing Education Research V.1 (ICER ’23 V1), August 07–11, 2023, Chicago, IL, USA. ACM, New York, NY, USA, 16 pages.
- Linus Torvalds Discusses the Impact of AI on Programming (YouTube).
Wednesday, February 26:
- Lecture 12: AI-Assisted Programming with GitHub Copilot.
- Assignment 05 due (5%).
- Assignment 06: Problem Set 06.
Suggested references:
- GitHub Copilot Documentation.
- Using GitHub Copilot in your IDE: Tips, Tricks, and Best Practices.
- Using GitHub Copilot in the Command Line.
- Coding with an AI Pair Programmer: Getting Started with GitHub Copilot (YouTube).
- GitHub Copilot YouTube Playlist.
- Labadze, L., Grigolia, M., & Machaidze, L. (2023). Role of AI Chatbots in Education: Systematic Literature Review. International Journal of Educational Technology in Higher Education, 20(1), 56.
Module 05: Data Manipulation with Python
Monday, March 03:
- Lecture 13: Python Data Types, Boolean Logic, and Control Structures.
- Kahoot Quiz.
- Assignment 06 due (5%).
- Assignment 07: Problem Set 07.
Suggested references:
- Python Documentation: An Informal Introduction to Python.
- Python Documentation: More Control Flow Tools.
- Python Documentation: Compound Statements.
- NumPy Documentation: Quickstart Tutorial.
- Programiz: Math Operations in Python.
- Matthes, E. (2019). Python Crash Course: A Hands-On, Project-Based Introduction to Programming (2nd ed.). No Starch Press. Chapter 02.
- Severance, C. (2016). Python for Everybody: Exploring Data in Python 3. CreateSpace Independent Publishing Platform. Chapters 3-11 (Note: Read only the chapters that interest you).
Wednesday, March 05:
- Lecture 14: Introduction to Pandas.
- Kahoot Quiz.
Monday, March 10: Spring Break (no class)
Wednesday, March 12: Spring Break (no class)
Monday, March 17:
- Lecture 15: Pandas for Data Analysis: Data Wrangling and Aggregating.
- Kahoot Quiz.
- Assignment 07 due (5%).
- Assignment 08: Problem Set 08.
Suggested references:
- McKinney, W. (2022). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (3rd ed.). O’Reilly Media. Chapter 05: Getting Started with Pandas.
- VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O’Reilly Media. Chapter 3: Data Manipulation with Pandas.
- McKinney, W. (2022). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (3rd ed.). O’Reilly Media. Chapter 07: Data Cleaning and Preparation.
- DataCamp: Pandas Tutorial: DataFrames in Python.
- Real Python: Pandas Tutorial: DataFrames in Python.
Wednesday, March 19:
- Lecture 16: Quiz 03: Python for Data Analysis (6%).
Module 06: Introduction to SQL Databases
Monday, March 24:
- Lecture 17: Introduction to PostgreSQL: Data Types, Tables, and Queries.
- Assignment 08 due (5%).
- Assignment 09: Problem Set 09.
- Instructions for the Final Project.
Suggested references:
- Mode Analytics: SQL Tutorial.
- Real Python: SQL Databases and SQLite.
- Khan Academy: SQL Basics. (Note: Khan Academy is a great resource for learning SQL and other programming languages).
- Coursera: PostgreSQL for Everybody.
- PostgreSQL Tutorial.
- PostgreSQL Documentation: SQL Commands. (Note: For reference only).
Wednesday, March 26:
Monday, March 31:
- Lecture 19: Merging Tables in SQL.
- Kahoot Quiz.
- Assignment 09 due (5%).
- Assignment 10: Problem Set 10.
Suggested references:
Wednesday, April 02:
- Lecture 20: Quiz 04: SQL Databases (6%).
Module 07: Parallel Computing
Monday, April 07:
- Lecture 21: Parallel Computing with Dask.
- Assignment 10 due (5%).
Suggested references:
- Dask Documentation.
- Dask Tutorial.
- Coiled - Intro to Dask Tutorial (YouTube).
- Rocklin, M. (2017). Dask: Flexible Library for Parallel Computing in Python. In Proceedings of the 16th Python in Science Conference (Vol. 126, p. 130).
Wednesday, April 09:
Suggested references:
- Dask Documentation: Machine Learning.
- He, X., Zhao, K., & Chu, X. (2021). AutoML: A Survey of the State-of-The-Art. Knowledge-based systems, 212, 106622.
- TPOT Documentation.
Module 08: Containers and Reproducibility
Monday, April 14:
Suggested references:
Wednesday, April 16:
- Lecture 24: Docker for Data Science.
Monday, April 21:
- Lecture 25: Quiz 05: Dask, Docker and Containers (6%).
Wednesday, April 23:
- Lecture 26: Review and Final Project Discussion.
Monday, April 28:
- Final Project due (20%).