Lectures
The course covers the following topics, with corresponding lecture materials available in the lectures folder. Please refer to the syllabus for additional suggested readings on each topic. Links will be added as the materials are posted.
Module 01: Introduction, Computational Literacy, and Command Line Interface (CLI)
Friday, May 16:
- Syllabus and course repository: https://github.com/danilofreire/qtm350-summer.
- Lecture 01: Welcome to QTM 350 - Introduction.
- Lecture 02: Computational Literacy.
- Course Tutorials: How to Install Anaconda, Jupyter, PostgreSQL, VSCode, and Open a Free Educational Account on GitHub.
Suggested references:
- Cleveland, W. S. (2001). Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review, 69(1), 21-26.
- Donoho, D. (2017). 50 Years of Data Science. Journal of Computational and Graphical Statistics, 26(4), 745-766.
- Breiman, L. (2001). Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author). Statistical Science, 16(3), 199-231.
- Brady, H. E. (2019). The Challenge of Big Data and Data Science. Annual Review of Political Science, 22(1), 297-323.
- Zitnik, M., Nguyen, F., Wang, B., Leskovec, J., Goldenberg, A., & Hoffman, M. M. (2019). Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Information Fusion, 50, 71-91.
- Campbell-Kelly, M., Aspray, W. F., Yost, J. R., Tinn, H., & Díaz, G. C. (2023). Computer: A History of the Information Machine. Routledge.
- Shalf, J. (2020). The Future of Computing beyond Moore’s Law. Philosophical Transactions of the Royal Society A, 378(2166), 20190061.
- Al-Hashimi, H. M. (2023). Turing, von Neumann, and The Computational Architecture of Biological Machines. Proceedings of the National Academy of Sciences, 120(25), e2220022120.
- Wing, J. M. (2006). Computational Thinking. Communications of the ACM, 49(3), 33-35.
- Videos: David J. Malan - Abstraction, Khan Academy - Hexadecimal Number System, Matthias Wandel - Marble Adding Machine, Crash Course - Early Computing and Electronic Computing (the last two are quite entertaining!).
Monday, May 19:
- Lecture 03: Encoding Information & Introduction to Programming.
- Lecture 04: Command Line Interface.
- Kahoot Quiz.
- Assignment 01: Problem Set 01.
Suggested references:
- Janssens, J. (2021). Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools (2nd ed.). O’Reilly Media.
- Levy, J. (2024). The Art of Command Line. GitHub.
- Shotts, W. (2019). The Linux Command Line: A Complete Introduction. No Starch Press.
- Healy, K. (2019). The Plain Person’s Guide to Plain Text Social Science. Chapters 1-5.
- Kerr, D. (2024). Effective Shell.
- Irianto, I. (2021). Learn Vim (the Smart Way).
- Neil, D. (2015). Practical Vim: Edit Text at the Speed of Thought. Pragmatic Bookshelf.
- Dennis, J. Your problem with Vim is that you don’t grok vi. (Stack Overflow).
- Vim Adventures. (Instructor’s note: this is a fun, albeit cringy, way to learn Vim).
- Videos: freeCodeCamp - Command line crash course, Percy Grunwald - Absolute beginner guide to the macOS terminal, NetworkChuck - 50 macOS tips and tricks using terminal.
Module 02: Version Control with Git and GitHub
Wednesday, May 21:
- Lecture 05: Command Line Interface Continued.
- Lecture 06: Version control with Git and GitHub.
- Assignment 01 due (5%).
- Assignment 02: Problem Set 02.
Suggested references:
- Chacon, S. and Straub, B. (2014). Pro Git. Apress. (Instructor’s note: this is the book on Git).
- GitHub tutorials: GitHub skills (recommended), Git guides, GitHub learning lab, Best practices for repositories.
Friday, May 23:
- Lecture 07: More Git and GitHub: pull requests, issues, pages, and collaboration features.
- Lecture 08: Practice.
- Kahoot Quiz.
- Assignment 02 due (5%).
- Assignment 03: Problem Set 03.
Suggested references:
- Perez-Riverol, Y., Gatto, L., Wang, R., Sachsenberg, T., Uszkoreit, J., Leprevost, F. da V., Fufezan, C., Ternent, T., Eglen, S. J., Katz, D. S., Pollard, T. J., Konovalov, A., Flight, R. M., Blin, K., & Vizcaíno, J. A. (2016). Ten Simple Rules for Taking Advantage of Git and GitHub. PLOS Computational Biology, 12(7), e1004947.
- Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J., & Tackett, M. (2021). Implementing version control with git and GitHub as a learning objective in statistics and data science courses. Journal of Statistics and Data Science Education, 29(sup1), S132-S144.
- Escamilla, E., Klein, M., Cooper, T., Rampin, V., Weigle, M. C., & Nelson, M. L. (2022). The Rise of GitHub in Scholarly Publications. arXiv preprint arXiv:2208.04895.
Monday, May 26: Memorial Day (no class)
Module 03: Reproducible Research with Quarto
Wednesday, May 28:
- Lecture 09: Introduction to Quarto.
- Lecture 10: Quiz 01 - Command Line Interface and Version Control.
- Assignment 03 due (5%).
- Assignment 04: Problem Set 04.
Suggested references:
- Quarto official website.
- Awesome Quarto: https://github.com/mcanouil/awesome-quarto. Note: this repository contains dozens of tutorials, examples, and resources.
- Çetinkaya-Rundel, M. & Lowndes, J. S. (2022). Keynote talk: Hello Quarto: Share • Collaborate • Teach • Reimagine. Slides and source code. This is one of the nicest Quarto presentations I have seen.
- Getting Started with Quarto (YouTube). Note: Posit (formerly RStudio) has a series of tutorials on Quarto on their YouTube channel. You can find their playlist here.
Friday, May 30:
- Lecture 11: Quarto Continued.
- Lecture 12: Quarto Practice. Jupyter Notebook version.
- Assignment 04 due (5%).
- Assignment 05: Problem Set 05.
Module 04: AI-Assisted Programming
Monday, June 2:
- Lecture 13: Introduction to AI-Assisted Programming.
- Lecture 14: APIs and Agents.
Wednesday, June 4:
- Lecture 15: APIs and Agents Continued.
- Lecture 16: Quiz 02 - Quarto.
- Assignment 05 due (5%).
- Assignment 06: Problem Set 06.
Friday, June 6:
- Lecture 17: AI-Assisted Programming Practice. Jupyter Notebook version.
- Lecture 18: Introduction to Cloud Computing.
- Assignment 06 due (5%).
Monday, June 9:
- Lecture 19: Cloud Computing Continued.
- Lecture 20: Cloud Computing Practice. Jupyter Notebook version.
- Assignment 07: Problem Set 07.
Module 05: Introduction to SQL and Relational Databases
Wednesday, June 11:
- Lecture 21: Introduction to SQL and Relational Databases.
- Lecture 22: Quiz 03 - AI-Assisted Programming and Cloud Computing.
Friday, June 13:
- Lecture 23: SQL in Python: Connecting to Databases with Pandas.
- Lecture 24: Merging Data with SQL.
- Assignment 07 due (5%). (More time to complete this assignment!)
Module 06: Parallel Computing, Dependency Management and Containers
Monday, June 16:
- Lecture 25: SQL Practice. Jupyter Notebook version.
- Lecture 26: Parallel Computing.
- Assignment 08: Problem Set 08.
Wednesday, June 18:
- Lecture 27: Application: Parallelising Data Analysis with Dask and AutoML.
- Lecture 28: Quiz 04 - SQL.
- Assignment 08 due (5%).
Friday, June 20:
- Lecture 29: Dependency Management, Virtual Environments, and Containers.
- Lecture 30: Docker for Data Science.
- Assignment 09: Problem Set 09.
Monday, June 23:
- Lecture 31: Course Revision.
- Lecture 32: Parallel Computing and Docker Practice. Jupyter Notebook version.
- Assignment 09 due (5%).
- Assignment 10: Problem Set 10.
Wednesday, June 25:
- Lecture 33: Quiz 05 - Parallel Computing and Docker.
- Lecture 34: Final Project Q&A Session.
Friday, June 27:
- Final Project due (20%).
- Assignment 10 due (5%).