QTM 151 - Introduction to Statistical Computing II
Course Description
Welcome to QTM 151! This course introduces students to data analysis and statistical computing using Python and SQL. It is ideal for those with little or no programming experience who want to develop skills for data-driven decision making.
Over the next three weeks, we will cover version control for collaborative coding, Jupyter Notebooks for reproducible research, Python programming basics, data wrangling and merging in SQL, data visualisation, and introductions to linear modelling and time series analysis.
You will work with real-world datasets and problems, gaining practical experience in using these tools to extract insights from data. The course aims to develop both technical skills and critical thinking needed for complex data challenges. By the end, you will be ready for advanced study in quantitative methods and data science.
Learning Objectives
By the end of this course, you will be able to:
- Perform basic operations and write functions in Python.
- Conduct data wrangling and manipulate data using Python libraries such as Pandas.
- Merge and manage databases using SQL.
- Create visualisations to effectively communicate data insights.
- Implement linear models and understand the principles of time series analysis.
- Use Jupyter Notebooks for reproducible research.
- Develop problem-solving skills relevant to data analysis and statistical computing.
Prerequisites
There are no prerequisites for this course. All students are welcome to join, regardless of their prior experience with programming or data analysis. Please feel free to reach out if you have any questions about the course content or your readiness to take the class.
Materials
This course is designed to be self-contained, providing all the necessary resources and materials to succeed in mastering the core concepts. However, students are encouraged to explore the following suggested books and online courses to deepen their understanding of Python and SQL.
Suggested Books
- Python for Data Analysis by Wes McKinney
- Elements of Data Science by Allen Downey
- Automate the Boring Stuff with Python by Al Sweigart
- Python for Everybody by Charles Severance
- SQL for Data Scientists by Renee M. P. Teate
Online Courses
Additional Resources
Course Information
We will meet every weekday from May 13th to May 30th, 2025, from 11:30 AM to 12:50 PM. Our meetings will be online via Zoom. The link is https://emory.zoom.us/j/1234567890 (please check the course website for the correct link). It is important that you read the materials before class. All information about the course is available on the course’s GitHub repository at https://github.com/danilofreire/qtm151-summer. While I will try to adhere to the course schedule as much as possible, I also want to adapt to your learning pace and style, so the syllabus and course plan may change during the course. Please check the course repository regularly for updates. I will also announce any changes in class and via email.
Software
We will mainly use Python in this course. Python is a free, versatile, and powerful programming language that is widely used in data science, machine learning, and scientific computing. I recommend using the Anaconda distribution as it comes with many necessary Python libraries for data analysis, such as Pandas, NumPy, and Jupyter.
You can write your Python code in any text editor, but I recommend VS Code with the Python extension. PyCharm is also well-regarded by developers. If you are feeling adventurous, you can also use Neovim with the coc-pyright plugin. That is, if you can exit the editor. :)
We will use SQLite for database management. SQLite support already ships with Python through the built-in sqlite3 module, so there is nothing extra to install, but you can read more about it here. I recommend installing the SQLite extension for VS Code to make it easier to work with SQLite databases.
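As a quick way to confirm that SQLite works from Python, here is a minimal sketch using the standard-library sqlite3 module. The table name, column names, and rows are hypothetical and serve only as an illustration; they are not course data.

```python
import sqlite3

# An in-memory database disappears when the connection closes,
# which makes it convenient for quick experiments.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, used only for illustration.
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score REAL)"
)
conn.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Ana", 92.5), ("Ben", 78.0), ("Chloe", 85.0)],
)
conn.commit()

# Select students scoring at least 80, highest score first.
# The "?" placeholder passes the threshold safely as a parameter.
rows = conn.execute(
    "SELECT name, score FROM students WHERE score >= ? ORDER BY score DESC",
    (80,),
).fetchall()

print(rows)
conn.close()
```

Replacing ":memory:" with a file name such as "example.db" (also a hypothetical name) would store the database on disk, where the SQLite extension for VS Code could open it.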
We will also use Jupyter Notebooks in class. Jupyter itself comes pre-installed with Anaconda, but please install the Jupyter extension for VS Code as well. We will have a hands-on session to learn how to use Jupyter effectively. It is also a good idea to install the Data Wrangler extension for VS Code, as it will help you visualise and clean your data.
To help you get started, I have prepared a series of tutorials on how to install Anaconda, Jupyter, and VS Code, and how to open a free educational account on GitHub. There is also a tutorial for PostgreSQL, but it is not required for this course. PostgreSQL is another popular database management system that is widely used in industry, and it is worth learning if you are interested in data science or software development. SQLite is easier and faster to set up (that is why we will use it in class), but if you are interested in learning more about SQL, feel free to check out the tutorial.
Please follow these tutorials to ensure that you have all the necessary tools for the course.
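Once the installation is done, a short script along these lines may help confirm that the main libraries are visible to Python. This is only a sketch: the list of names is an assumption based on the tools mentioned above, and "ipykernel" is assumed here as the importable package that notebook support depends on.

```python
import importlib.util
import sys

# Assumed import names for the tools mentioned in this syllabus.
# "ipykernel" is an assumption: it is the package notebooks use to run Python.
REQUIRED = ["pandas", "numpy", "sqlite3", "ipykernel"]


def check_packages(names):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
for name, found in status.items():
    print(f"{name:10s} {'OK' if found else 'missing'}")
```

If a package is reported as missing, the most likely cause is that VS Code is using a different Python interpreter from the one installed by Anaconda.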
Office Hours
I am very flexible with office hours, but it is easier to contact me via email. Feel free to send me a message any time at danilo.freire@emory.edu, and I will likely reply within a few hours.
Academic Integrity
Upon every individual who is a part of Emory University falls the responsibility for maintaining in the life of Emory a standard of unimpeachable honor in all academic work. The Honor Code of Emory College is based on the fundamental assumption that every loyal person of the University not only will conduct his or her own life according to the dictates of the highest honor, but will also refuse to tolerate in others action which would sully the good name of the institution. Academic misconduct is an offense generally defined as any action or inaction which is offensive to the integrity and honesty of the members of the academic community. Any suspected case of academic misconduct will be referred to the Emory Honor Council.
Artificial Intelligence
Students have to submit problem sets and complete in-class quizzes, as described under Assignments and Grading Policy. You are allowed to use AI to assist with your assignments. I recommend using GitHub Copilot to generate code snippets, as it is free for students and provides good suggestions and explanations. Claude, ChatGPT, and Perplexity AI are also good tools. I am available to provide support and assistance with these tools during office hours or by appointment. However, please note that any errors or omissions resulting from the use of AI tools are your responsibility. Do not rely solely on AI to complete your assignments; you must always double-check your work. Remember to cite all sources used in your problem sets and projects, including AI tools. Please include a note at the end of any document indicating that AI was used in its development.
Special Needs and Accessibility Services
I am committed to providing necessary accommodations to ensure all students have an equal opportunity to succeed in this course. Students with medical or health conditions that may impact their academic performance should visit the Department of Accessibility Services (DAS) to determine eligibility for appropriate accommodations. Those who receive accommodations should provide me with an Accommodation Letter from DAS at the beginning of the semester or as soon as the accommodation is granted. Please note that DAS accommodations, such as extra time or quiet spaces, will apply only to quizzes, not assignments. This is because assignments are released in advance, allowing students to work at their own pace. Athletes and students with other commitments should also inform me of any scheduling conflicts at the beginning of the semester. I will do my best to accommodate these students, but I cannot guarantee that all requests will be granted. If you have any questions or concerns, please contact me.
English Language Learners
Emory University welcomes students from around the country and the world, and the unique perspectives international and multilingual students bring enrich the campus community. To empower multilingual learners, an array of support is available including language and culture workshops and individual appointments. For more information about English Language Learning support at Emory, please contact the ELLP Specialists at https://writingcenter.emory.edu. No student will be penalised for their command of the English language.
Assignments and Grading Policy
Problem Sets (50%). There will be five problem sets throughout the course. These assignments are designed to reinforce concepts covered in lectures and readings, and to provide hands-on practice with statistical programming. Problem sets will include a mix of theoretical questions and practical applications. They will be assigned regularly and must be completed individually. You may discuss your work with colleagues, as long as you do not copy their answers (for example, reproducing entire sentences with only a few words changed). If you worked with other students, please write down their names on your assignment. Please also acknowledge any sources you used in your work, including textbooks, articles, and AI resources. Any assignment submitted after the due date/time will automatically be graded for half points. Please submit all assignments in HTML or PDF format (converted from Jupyter Notebook) on Canvas. You can convert your Jupyter Notebook to HTML or PDF in VS Code. Please check out this link for more information.
Class Quizzes (50%). Students will also take three in-class quizzes throughout the course. These quizzes will be based on the lectures from the previous days. They will be designed to test your understanding of the material and your ability to apply the concepts to new problems. Quizzes will be open-book and open-notes, and you are allowed to use AI tools. They are individual assessments, so you are not allowed to discuss the questions with your colleagues in class.
Grading Scale
Each student’s final grade will be based on the following after rounding up to the nearest point:
Grade | A | A- | B+ | B | B- | C | D | F |
---|---|---|---|---|---|---|---|---|
Range | 91%–100% | 86%–90% | 81%–85% | 76%–80% | 71%–75% | 66%–70% | 60%–65% | <60% |
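To illustrate how the scale above could be applied, here is a small sketch that combines the two 50% components and maps the result to a letter grade. It assumes that "rounding up to the nearest point" means taking the ceiling of the final percentage; the function names and example scores are hypothetical.

```python
import math

# Lower bound of each grade band, taken from the grading table, highest first.
GRADE_FLOORS = [
    (91, "A"), (86, "A-"), (81, "B+"), (76, "B"),
    (71, "B-"), (66, "C"), (60, "D"),
]


def final_percentage(problem_sets, quizzes):
    """Combine the two components, each worth 50% of the final grade."""
    return 0.5 * problem_sets + 0.5 * quizzes


def letter_grade(percentage):
    """Map a percentage to a letter grade.

    Assumption: rounding up means taking the ceiling of the percentage.
    """
    points = math.ceil(percentage)
    for floor, letter in GRADE_FLOORS:
        if points >= floor:
            return letter
    return "F"


# Hypothetical student: 88% on problem sets and 93% on quizzes.
total = final_percentage(88, 93)
print(total, letter_grade(total))
```

Under this reading, a final percentage of 90.5 rounds up to 91 and falls in the A band, while 59.0 stays at 59 and falls below the D band.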