Introduction to Data Science for Public Policy

Course overview

Instructor: Thomas Monk (t.d.spammonk@lse.ac.uk)

Room 2.01 H, Centre for Economic Performance, London School of Economics

This intensive course will introduce students to the Python programming language as a tool for applied data science. In PP455 we worked in Stata, the primary environment economists and public policy academics use for regression analysis. Python is a more general-purpose tool with which we can perform a range of tasks, from data cleaning, transformation, and visualisation to more advanced techniques at the frontier of social science research, such as natural language processing and machine learning.

The two-week course, containing a semester’s worth of material, will take students from the first principles of Python programming to the application of data science packages such as NumPy and Pandas. Each class will be practical and hands-on, with the course focusing on the application of these tools in the public policy space.

Prerequisites

A pass mark in PP455, or equivalent.

Schedule

We meet 11:00-13:00 on each teaching day in NAB 2.09, with an additional class on Tuesday, 30 August 2022 from 14:00 to 16:00.

Syllabus

This course is designed as an intensive two-week introduction to programming and data science for public policy students. The content covered is set out in the course outline below.

Lecture Slides & Problem Sets

Course Outline

Date | Class | Content I | Content II | Application
Tuesday, 30 August 2022 | Class 1 - AM | Intro to Programming | Python basics | Setting up the Python environment
Tuesday, 30 August 2022 | Class 2 - PM | Python basics | Functions & conditionals | Working with notebooks
Wednesday, 31 August 2022 | Class 3 | Lists, strings and dictionaries | Loops and list comprehensions | -
Thursday, 1 September 2022 | Class 4 | Recap: lists, strings and dictionaries | Loops | Nested loops: in the casino
Friday, 2 September 2022 | Class 5 | Data assignment | Data assignment | Chicago city employee data
Tuesday, 6 September 2022 | Class 6 | NumPy | Introduction to Pandas | Wine ratings & crime data
Wednesday, 7 September 2022 | Class 7 | More advanced Pandas | Merging with Pandas | Chicago city employee data
Thursday, 8 September 2022 | Class 8 | Text as Data | Sentiment analysis | Twitter as data
Friday, 9 September 2022 | Class 9 - AM | Introduction to machine learning - linear model | Applied machine learning task | House price data
Friday, 9 September 2022 | Class 10 - PM | Machine learning: non-linear models | Applied machine learning: Random Forests and XGBoost | House price data: better predictions?

Resource list

Programming, NumPy and Pandas

Text as Data

Machine Learning

Other resources