From Transformations to Models¶
This module introduces the fundamental concepts of statistical modeling in political science research. We have already explored our data, visualized distributions, and examined relationships between variables. Now, we ask:
- How can we formalize patterns in data?
- How can we make confident claims that go beyond mere observation?
Modeling is the bridge between descriptive insights and data-driven inference. It allows us to quantify uncertainty, test hypotheses, and predict outcomes. Think of models as tools that help you reason rigorously about your data instead of relying solely on intuition.
By the end of this module, you will have a conceptual understanding of modeling and practical skills to begin building your first statistical models.
Theory¶
What is a Model?¶
A model is a simplified representation of reality that allows us to:
- Summarize relationships between variables
- Estimate effects while accounting for uncertainty
- Make predictions for new data
For example, let's say we recorded the time I take to come to the Sciences Po Bordeaux. You might ask: What is the actual time I take to come to work?

Application¶
Modeling with Words¶
Even without code, practice conceptual modeling:
- Identify your outcome variable (dependent variable) - Y (here it's
time_to_iep) - Identify potential predictors (independent variables) - Xs
- Ask: How do other predictors explain variation in outcomes?
Conceptual modeling is essential before jumping to formulas. Models without theory are often misleading.
Common pitfalls to avoid¶
- Modeling before understanding the data
- Neglecting measurement validity
- Misinterpreting statistical significance
Code to reproduce the figure¶
# !pip install "altair[all]"
import altair as alt
import pandas as pd
df = pd.DataFrame(
{
'time_to_iep': [
16.93, 19.49, 18.21, 19.09, 17.67, 18.48, 16.37, 17.57, 19.18,
18.74, 17.15, 17.76, 17.2, 19.78, 18.34, 17.93, 18.09, 17.14,
19.41, 17.99, 16.54, 18.42, 16.65, 19.83, 18.32, 18.13, 16.72,
18.05, 18.5, 19.45, 17.22, 17.32, 19.48, 18.93, 18.69, 18.78,
18.58, 18.8, 18.28, 20.06, 18.12, 18.64, 18.16, 17.44, 18.96,
17.55, 19.09, 17.95, 21.01, 18.19
]
}
)
mean_val = df["time_to_iep"].mean()
hist = alt.Chart(df, title="Distribution of Time to IEP").mark_bar().encode(
x=alt.X("time_to_iep:Q", bin=alt.Bin(maxbins=10), title="Time to IEP"),
y="count()"
)
mean_line = alt.Chart(pd.DataFrame({"x":[mean_val]})).mark_rule(
color="red", strokeDash=[6,4]
).encode(x="x:Q")
mean_text = alt.Chart(pd.DataFrame({"x":[mean_val]})).mark_text(
text=f"{mean_val:.2f}",
dx=20, dy=250, color="red"
).encode(x="x:Q", y=alt.value(0))
hist + mean_line + mean_text
Hack Time¶
During hack time, we will work from Notebook 7.
Tip
To load and use a notebook in VS Code, follow steps 3 to 5 in 📘 Notebooks in VS Code
Focus on understanding how each IV (predictors) is related to the DV (outcome). Ask yourself:
- How does each variable help me explain the outcome (DV)?
- What is the causal mechanism behind it?
- What is the hypothetical direction of the effect (+/-)?
Get Ready for Next Week: Think. Read. Practice.¶
Thinking Ahead
- Prepare a formal model that summarizes your project.
Y ~ IV+CV1+CV2+...
Practice
- Try using
statsmodelsto train the model related to your final project.