QTM 151 - Introduction to Statistical Computing II

Course Description

Welcome to QTM 151! This course introduces students to data analysis and statistical computing using Python and SQL. It is ideal for those with little or no programming experience who want to develop skills for data-driven decision making.

Over the semester, we will cover version control for collaborative coding, Jupyter Notebooks for reproducible research, Python programming basics, data wrangling and merging in SQL, data visualisation, and introductions to linear modelling and time series analysis.

Students will work with real-world datasets and problems, gaining practical experience in using these tools to extract insights from data. The course aims to develop both technical skills and critical thinking needed for complex data challenges. By the end, students will be ready for advanced study in quantitative methods and data science.

Learning Objectives

By the end of this course, students will be able to:

  1. Perform basic operations and write functions in Python.
  2. Wrangle and manipulate data using Python libraries such as Pandas.
  3. Merge and manage databases using SQL.
  4. Create visualisations to effectively communicate data insights.
  5. Implement linear models and understand the principles of time series analysis.
  6. Use Jupyter Notebooks for reproducible research.
  7. Develop problem-solving skills relevant to data analysis and statistical computing.

Prerequisites

There are no prerequisites for this course. All students are welcome to join, regardless of their prior experience with programming or data analysis. Please feel free to reach out if you have any questions about the course content or your readiness to take the class.

Materials

This course is designed to be self-contained, providing all the necessary resources and materials to succeed in mastering the core concepts. However, students are encouraged to explore the following suggested books and online courses to deepen their understanding of Python and SQL.

Suggested Books

Online Courses

Additional Resources

Course Information

We will meet every Monday and Wednesday from 16:00 to 16:50 in the Anthropology Building, 303. It is important that you read the materials before class. All information about the course is available on the course’s GitHub repository at https://github.com/danilofreire/qtm151. While I will try to adhere to the course schedule as much as possible, I also want to adapt to your learning pace and style, so the syllabus and course plan may change during the semester. Please check the course repository regularly for updates. I will also announce any changes in class and via email.

Software

We will mainly use Python in this course. Python is a free, versatile, and powerful programming language that is widely used in data science, machine learning, and scientific computing. I recommend using the Anaconda distribution as it comes with many necessary Python libraries for data analysis, such as Pandas, NumPy, and Jupyter.

You can write your Python code in any text editor, but I recommend VS Code with the Python extension. PyCharm is also well-regarded by developers. If you are feeling adventurous, you can also use Neovim with the coc-pyright plugin. That is, if you can exit the editor. :)

We will use PostgreSQL for database management. You can download PostgreSQL from the official website. Please also install pgAdmin and the VS Code extension for PostgreSQL to interact with the database.

We will also use Jupyter Notebooks in class. Jupyter itself comes pre-installed with Anaconda, but please install the Jupyter extension for VS Code as well. We will have a hands-on session to learn how to use Jupyter effectively.

To help you get started, I have prepared a series of tutorials on how to install Anaconda, Jupyter, PostgreSQL, and VS Code, and on how to open a free educational account on GitHub. Please follow these tutorials as soon as possible to ensure that you have all the necessary tools for the course.
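Once the installations are done, a quick way to confirm that the tools are visible on your machine is a short check like the one below. This is only a minimal sketch: the function name `check_setup` is hypothetical, and the module and command names (`jupyter_core` for Jupyter, `psql` for PostgreSQL, `code` for VS Code) are assumptions that may differ depending on your operating system and how each tool was installed.

```python
import importlib.util
import shutil

# Assumed names; adjust them if your installation uses different ones.
PYTHON_MODULES = ["pandas", "numpy", "jupyter_core"]
COMMAND_LINE_TOOLS = ["psql", "code"]


def check_setup(modules=PYTHON_MODULES, tools=COMMAND_LINE_TOOLS):
    """Return a dict mapping each module or tool name to True if it was found."""
    found = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        found[name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the command is not on the PATH
        found[name] = shutil.which(name) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(f"{name:<15} {'found' if ok else 'MISSING'}")
```

A "MISSING" entry does not always mean the tool is absent. For example, VS Code may be installed without its `code` command being added to the PATH.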

Office Hours

I am very flexible with office hours, but it is easier to contact me via email. Feel free to send me a message any time at danilo.freire@emory.edu, and I will likely reply within a few hours. You can also book an appointment with me on Calendly, or just email me your availability. If you prefer, you can meet me in the afternoon at my office. I am in the Department of Quantitative Theory and Methods almost every weekday. My office is in the Psychology and Interdisciplinary Sciences Building, 36 Eagle Row, 5th Floor, room XXXX. If possible, please email me before coming to ensure that no two students book the same time slot.

Academic Integrity

Upon every individual who is a part of Emory University falls the responsibility for maintaining in the life of Emory a standard of unimpeachable honor in all academic work. The Honor Code of Emory College is based on the fundamental assumption that every loyal person of the University not only will conduct his or her own life according to the dictates of the highest honor, but will also refuse to tolerate in others action which would sully the good name of the institution. Academic misconduct is an offense generally defined as any action or inaction which is offensive to the integrity and honesty of the members of the academic community. Any suspected case of academic misconduct will be referred to the Emory Honor Council.

Artificial Intelligence

Students have to submit ten problem sets and complete five in-class quizzes. You are allowed to use AI to assist with your problem sets, but not during the in-class quizzes. I recommend using GitHub Copilot to generate code snippets, as it is free for students and provides good suggestions and explanations. Claude, ChatGPT, and Perplexity AI are also good tools. I am available to provide support and assistance with these tools during office hours or by appointment. However, please note that any errors or omissions resulting from the use of AI tools are your responsibility. Do not rely solely on AI to complete your assignments; you must always double-check your work. Remember to cite all sources used in your problem sets and projects, including AI tools. Please include a note at the end of any document indicating that AI was used in its development.

Special Needs and Accessibility Services

I am committed to providing necessary accommodations to ensure all students have an equal opportunity to succeed in this course. Students with medical or health conditions that may impact their academic performance should visit the Department of Accessibility Services (DAS) to determine eligibility for appropriate accommodations. Those who receive accommodations should provide me with an Accommodation Letter from DAS at the beginning of the semester or as soon as the accommodation is granted. Please note that DAS accommodations, such as extra time or quiet spaces, will apply only to quizzes, not assignments. This is because assignments are released in advance, allowing students to work at their own pace. Athletes and students with other commitments should also inform me of any scheduling conflicts at the beginning of the semester. I will do my best to accommodate these students, but I cannot guarantee that all requests will be granted. If you have any questions or concerns, please contact me.

English Language Learners

Emory University welcomes students from around the country and the world, and the unique perspectives international and multilingual students bring enrich the campus community. To empower multilingual learners, an array of support is available, including language and culture workshops and individual appointments. For more information about English Language Learning support at Emory, please contact the ELLP Specialists at https://writingcenter.emory.edu. No student will be penalised for their command of the English language.

Assignments and Grading Policy

Problem Sets (50%). There will be ten problem sets throughout the course. These assignments are designed to reinforce concepts covered in lectures and readings, and to provide hands-on practice with statistical programming. Problem sets will include a mix of theoretical questions and practical applications. They will be assigned regularly and must be completed individually. You may discuss your work with classmates, but copying entire sentences and changing only a few words is not acceptable. If you worked with other students, please write down their names on your assignment. Please also acknowledge any sources you used in your work, including textbooks, articles, and AI resources. Any assignment submitted after the due date and time will automatically be graded for half credit. To accommodate unexpected circumstances, your lowest assignment grade will be automatically dropped at the end of the semester. The same applies to in-class quizzes. Please submit all assignments in Jupyter Notebook format (.ipynb) via Canvas by midnight on the due date.

Class Quizzes (30%). Students will also take five in-class quizzes throughout the semester. These quizzes will be based on the lectures from the previous weeks. They will be designed to test your understanding of the material and your ability to apply the concepts to new problems. Quizzes will be open-book and open-notes, and students have 50 minutes to complete them. You are not allowed to use AI tools. They are individual assessments, and students are not allowed to discuss the questions with their colleagues in class.

Final Project (20%). The final project will consist of a short report, created in Jupyter using one of the datasets shared on the course GitHub repository. Further instructions will be provided in class. The final project will be due on the last day of class.

Grading Scale

Each student’s final grade will be based on the following scale, after rounding up to the nearest percentage point:

Grade   A          A-        B+        B         B-        C         D         F
Range   91%–100%   86%–90%   81%–85%   76%–80%   71%–75%   66%–70%   60%–65%   <60%
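As a rough illustration, the weights and the scale above could be combined in Python as follows. This is a sketch under stated assumptions, not the official calculation: it assumes every score is on a 0 to 100 scale, that the lowest problem set and the lowest quiz are dropped before averaging, and that "rounding up" means taking the ceiling. The function names are hypothetical, and late penalties are assumed to be already reflected in the scores passed in.

```python
import math

# Component weights in percentage points, from the grading policy.
WEIGHTS = {"problem_sets": 50, "quizzes": 30, "final_project": 20}

# Lower bound of each letter grade, from the grading scale.
GRADE_FLOORS = [(91, "A"), (86, "A-"), (81, "B+"), (76, "B"),
                (71, "B-"), (66, "C"), (60, "D")]


def mean_dropping_lowest(scores):
    """Average of the scores after dropping the single lowest one."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop one")
    kept = sorted(scores)[1:]
    return sum(kept) / len(kept)


def final_percentage(problem_sets, quizzes, final_project):
    """Weighted total, rounded up to the nearest whole point."""
    total = (WEIGHTS["problem_sets"] * mean_dropping_lowest(problem_sets)
             + WEIGHTS["quizzes"] * mean_dropping_lowest(quizzes)
             + WEIGHTS["final_project"] * final_project) / 100
    # Round to 6 decimals first so floating-point noise cannot add a point.
    return math.ceil(round(total, 6))


def letter_grade(points):
    """Map a whole-number percentage to a letter grade."""
    for floor, letter in GRADE_FLOORS:
        if points >= floor:
            return letter
    return "F"


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    problem_sets = [100, 90, 80, 95, 85, 90, 100, 70, 90, 90]
    quizzes = [80, 90, 70, 85, 95]
    points = final_percentage(problem_sets, quizzes, final_project=88)
    print(points, letter_grade(points))  # 90 A-
```

In this made-up example, the weighted total is about 89.41, which rounds up to 90 and maps to an A-.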

Course Outline and Suggested Readings

The lecture notes cover all the necessary material for the course, and the weekly suggested readings are recommended for those who want to deepen their understanding of the course topics. As mentioned above, the course outline is subject to change, and I will update the syllabus if needed. Please remember to check the course GitHub repository regularly.

Module 01: Introduction to Python, Jupyter, and GitHub

Wednesday, August 28:

Weekly suggested readings:

Monday, September 02: Labor Day (no class)

Wednesday, September 04:

Weekly suggested readings:

Module 02: Python Data Types and Control Flow

Monday, September 09:

Wednesday, September 11:

Weekly suggested readings:

Monday, September 16:

Wednesday, September 18:

Weekly suggested readings:

Module 03: Writing and Running Functions

Monday, September 23:

Wednesday, September 25:

Weekly suggested readings:

Monday, September 30:

Wednesday, October 02:

Friday, October 04: (exceptionally)

  • Assignment 04 due (5%).

Weekly suggested readings:

Monday, October 07:

Weekly suggested readings:

Module 04: Data Manipulation with Pandas

Wednesday, October 09:

Monday, October 14: Fall Break (no class)

Wednesday, October 16:

Weekly suggested readings:

Monday, October 21:

Wednesday, October 23:

Weekly suggested readings:

Module 05: Data Manipulation with SQL

Monday, October 28:

Wednesday, October 30:

Weekly suggested readings:

Monday, November 04:

Wednesday, November 06:

Weekly suggested readings:

Module 06: Time Series and Panel Data

Monday, November 11:

Wednesday, November 13:

Weekly suggested readings:

Monday, November 18:

Wednesday, November 20:

Weekly suggested readings:

Module 07: Text Data and Advanced Plots

Monday, November 25:

Wednesday, November 27: Thanksgiving Break (no class)

Monday, December 02:

Wednesday, December 04:

  • Lecture 26: Drop-in Session for the Final Project. No readings.

Weekly suggested readings:

Monday, December 09:

  • Final Project due (20%).