
Getting to Know Your Data

Overview

This module introduces the foundational skills of data exploration, cleaning, and visualization for political science research. You will learn how to inspect datasets, identify patterns, handle missing or inconsistent values, and produce simple visual summaries.

Learning Objectives

  • Know common functions for data exploration
  • Subset observations and variables (rows & columns)
  • Compare relationships between variables
  • Summarize relationships using descriptive statistics
  • Create simple data visualizations

The emphasis is on developing an intuitive understanding of the data (its structure, limitations, and quirks) before any formal analysis or modeling begins. Just as you would not draw conclusions about a person you barely know, you should not draw conclusions from data you have not examined closely: effective data analysis requires knowing your data like a best friend. Careful understanding and preprocessing are essential prerequisites to any meaningful modeling effort.

By the end of the module, you will be equipped to prepare and explore political data, laying a rigorous foundation for subsequent analytical work.
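As a rough illustration of the exploration, subsetting, and summary objectives above, here is a minimal sketch. It assumes pandas is the library used in the notebooks (the page does not say so), and the toy dataset with its column names (`respondent`, `age`, `party`, `voted`) is invented for illustration.

```python
import pandas as pd

# Toy survey data; all column names and values are hypothetical.
df = pd.DataFrame({
    "respondent": [1, 2, 3, 4, 5, 6],
    "age": [22, 35, 47, 58, 63, 71],
    "party": ["Dem", "Rep", "Dem", "Ind", "Rep", "Dem"],
    "voted": [0, 1, 1, 1, 1, 1],
})

# Explore: dimensions, first rows, and column types.
n_rows, n_cols = df.shape
print(df.head())
print(df.dtypes)

# Subset: keep respondents aged 45+ (rows) and three variables (columns).
older = df.loc[df["age"] >= 45, ["age", "party", "voted"]]

# Compare: turnout rate within each party group.
turnout_by_party = df.groupby("party")["voted"].mean()

# Summarize: descriptive statistics and a simple correlation.
age_summary = df["age"].describe()
age_voted_corr = df["age"].corr(df["voted"])

print(older)
print(turnout_by_party)
print(age_summary)
```

After each step, compare the result with the original `df` and ask what changed: the number of rows, the number of columns, or the unit being described (respondents versus party groups).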

TL;DR

Many modeling errors in political science, as in any data-related task, arise not from the choice of algorithm but from an inadequate understanding of the underlying data.

Common pitfalls to avoid

  • Interpreting results without checking missing data
  • Assuming variables are coded intuitively
  • Modeling before visualizing relationships
  • Ignoring outliers and implausible values
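Several of these pitfalls can be checked with a few lines before any modeling. The sketch below assumes pandas. The columns, the 1 to 7 ideology scale, the `99` "don't know" code, and the 18 to 110 age range are all hypothetical choices made for illustration.

```python
import pandas as pd

# Toy data with deliberate problems; names, codes, and values are hypothetical.
df = pd.DataFrame({
    "age": [34, 51, None, 29, 430, 67],   # one missing value, one impossible value
    "ideology": [1, 7, 4, 99, 3, None],   # assumed 1-7 scale; 99 stands in for "don't know"
})

# Pitfall 1: check missing data before interpreting anything.
missing_per_column = df.isna().sum()

# Pitfall 2: do not assume coding is intuitive; list every value that occurs.
ideology_codes = df["ideology"].value_counts(dropna=False).sort_index()

# Pitfall 4: flag implausible values (the bounds are an assumption).
implausible_age = df[(df["age"] < 18) | (df["age"] > 110)]

print(missing_per_column)
print(ideology_codes)
print(implausible_age)
```

Here the value counts expose the `99` code, which would inflate the mean of `ideology` if it were treated as a point on the scale.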

Hack Time

We will work on Notebook 4 during this hack time.

Tip

To load and use a notebook in VS Code, follow steps 3 to 5 in 📘 Notebooks in VS Code

  • Focus on understanding what each operation does to the data, not just on producing output. Ask yourself:
    • What changed, and why?

ANES 2020 Codebook

Why codebooks matter

Codebooks reveal:

  • Question wording
  • Response categories
  • Skip patterns
  • Known data limitations
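To show how response categories from a codebook feed into cleaning, here is a minimal sketch, assuming pandas. The `codebook` dictionary, the variable name `party_id_raw`, the labels, and the negative nonresponse codes are hypothetical stand-ins; the actual codes and labels must be taken from the codebook itself.

```python
import pandas as pd

# Hypothetical codebook excerpt for one variable; not copied from any real codebook.
codebook = {
    "labels": {1: "Democrat", 2: "Republican", 3: "Independent"},
    "missing_codes": [-9, -8],  # assumed codes for refusals and "don't know"
}

# Raw responses as they might appear in the data file (invented values).
raw = pd.Series([1, 2, -9, 3, 1, -8], name="party_id_raw")

# Mark the values the codebook defines as nonresponse.
is_missing = raw.isin(codebook["missing_codes"])

# Translate substantive codes to labels; any code without a label becomes NaN.
labeled = raw.map(codebook["labels"])

# Codes that are neither labeled nor documented as missing need a closer look.
unexpected = raw[~is_missing & labeled.isna()]

n_missing = int(is_missing.sum())
print(labeled)
print("Documented nonresponse:", n_missing)
print("Undocumented codes:", unexpected.tolist())
```

Keeping documented nonresponse separate from undocumented codes makes it easier to spot values that the codebook does not explain.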

Additional Resources