40% | Project

Students are required to submit an individual project that uses some of the methods and tools introduced during the course. Your project will prepare and visualize the data and introduce a model. You will discuss the challenges related to the data you are using, carry out a motivated and valid model implementation, and assess the strengths and weaknesses of your data and model(s).

Context

You are a young data scientist with a strong social science background, working for the New York Times during the 2020 United States presidential campaign. You are tasked with understanding voting behaviour using the ANES 2020 dataset. You are expected to propose a motivated machine learning model explaining the Democratic vote (V201033). The project can use any of the methods introduced during the course, but is not limited to them. You will explain the methods and variables you choose to incorporate in your model and why they are appropriate for predicting your target. You can use theory or data to motivate your choices. You will also discuss the assumptions made by the methods and the data, and the limits of what we can learn from your model.
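
As a starting point, the target could be recoded into a binary indicator before modelling. The following is a minimal sketch, not a prescribed approach. It assumes that code 1 in V201033 denotes the Democratic candidate and that negative codes mark missing or inapplicable answers; both assumptions should be checked against the ANES 2020 codebook. The helper name `recode_democratic_vote` and the toy values are hypothetical.

```python
import pandas as pd


def recode_democratic_vote(raw: pd.Series, democratic_code: int = 1) -> pd.Series:
    """Return 1.0 for the Democratic code, 0.0 for other valid codes, NaN otherwise.

    Assumption (verify in the codebook): negative codes mean missing/inapplicable.
    """
    raw = pd.to_numeric(raw, errors="coerce")
    is_democratic = (raw == democratic_code).astype(float)
    # Keep values only where the raw code is valid (non-negative); the rest become NaN.
    return is_democratic.where(raw >= 0)


# Toy values standing in for the real V201033 column (not actual ANES data).
toy = pd.Series([1, 2, -9, 1, 3, -1], name="V201033")
y = recode_democratic_vote(toy)
print(y)
```

Whether respondents with missing answers are dropped or handled differently is a modelling choice that the project should motivate.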

Objective

  • Leverage theory and data to critically build and evaluate a predictive model.
  • Articulate prose and code to answer a research question.
  • Develop an end-to-end reproducible project.

Instructions

Your submission will consist of a notebook (.ipynb) that is shared with the instructor. The template will be automatically created for you when you accept the assignment.

The Notebook:

  • Contains all the code necessary to reproduce the project.
  • Uses code chunks that are understandable to the reader.
  • Loads, cleans and prepares the data.
  • Makes summary statistics and appropriate visualizations of the target variable and explanatory variables you select.
  • Builds an algorithm using variables from the ANES 2020 dataset.
  • Uses the trained algorithm to make predictions of the target variable.
  • Evaluates and summarises the performance of the algorithm.
    • You will use appropriate tables and visualizations.
  • Uses prose to describe the algorithmic implementation and variable choices to the reader.
    • You will discuss the strengths and weaknesses of the data and your algorithm.
  • Uses relevant citations when appropriate to ground the project theoretically.
  • Cites and references all sources properly using the APA citation style.
  • The project is no longer than 2500 words.
  • The notebook runs successfully!
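
The steps listed above (prepare, summarise, train, predict, evaluate) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the column names `age`, `ideology` and `dem_vote` are hypothetical stand-ins for variables you would select from ANES 2020, and logistic regression is used as one possible model, not as the expected one.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "ideology": rng.integers(1, 8, size=n),  # made-up 1..7 scale
})
logit = 3.0 - 0.8 * df["ideology"] + 0.005 * df["age"]
df["dem_vote"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Summary statistics of the target and explanatory variables.
print(df.describe())

# Hold out a test set so performance is measured on unseen rows.
X = df[["age", "ideology"]]
y = df["dem_vote"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit a logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Predict the target and summarise performance.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(f"Test accuracy: {accuracy:.3f}")
print(pd.DataFrame(cm, index=["true 0", "true 1"], columns=["pred 0", "pred 1"]))
```

Fixing `random_state` and the generator seed keeps the split and the results reproducible, which supports the requirement that the notebook runs end to end.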

Grading

Your project will be graded based on five criteria:

Criteria    | Description
Algorithm   | The computational methods applied to the data are appropriate.
Code        | The code is of good quality and appropriately documented.
Structure   | The document is well formatted and uses the right tools.
Expression  | The methods and variables selected or created are properly motivated using theory and empirics.
Originality | The algorithmic implementation is original and uses a good depth of course material.