From Transformations to Models¶
Agenda¶
- Theory
- What is a model?
- Identify and quantify variable relationships
- Interpret results from simple linear regression models
- Code
- Train a simple linear regression model
- Produce model summary tables
- Make predictions using a trained model
- Application
- Interpret regression table outputs
- Integrate regression tables into markdown documents
Theory¶
Data Modeling¶
We have started to use data to answer questions with descriptive statistics such as df['some_variable'].mean(). We have also learned how to use visualizations to identify variable relationships and trends using df[](). People often say that a picture is worth a thousand words. Images are a great tool for conveying a message; however, our perceptions can be biased [@lewandowsky1989perception]. Let's take a closer look.
Relationships between variables can be complex to see¶
Adding random lines?¶
Is it a strong relationship?¶
Is the relationship observed due to random chance?¶
- To avoid such biases, we can quantify the strength of relationships using statistical models (e.g., linear regression).
What is a model?¶
import pandas as pd
import matplotlib.pyplot as plt

# Create data
my_data = pd.DataFrame({
'time_to_iep': [16.93, 19.49, 18.21, 19.09, 17.67, 18.48, 16.37, 17.57, 19.18, 18.74, 17.15, 17.76, 17.2, 19.78, 18.34,
17.93, 18.09, 17.14, 19.41, 17.99, 16.54, 18.42, 16.65, 19.83, 18.32, 18.13, 16.72, 18.05, 18.5, 19.45,
17.22, 17.32, 19.48, 18.93, 18.69, 18.78, 18.58, 18.8, 18.28, 20.06, 18.12, 18.64, 18.16, 17.44, 18.96,
17.55, 19.09, 17.95, 21.01, 18.19]
})
# Visualize
my_data['time_to_iep'].plot.hist(alpha=0.5, bins=10)
# Add a vertical line at the mean
mean_value = my_data['time_to_iep'].mean()
plt.axvline(mean_value, color='red', linestyle='dashed', linewidth=3.5)
# Annotate the mean value on the plot
plt.text(mean_value, 4.5, f" {mean_value:.2f}", color='red', ha='left', fontsize=16)
plt.xlabel('Time to IEP')
plt.ylabel('Frequency')
plt.title('Histogram of Time to IEP')
# Display the figure (needed when running as a script rather than in a notebook)
plt.show()
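One way to read the histogram (an interpretation, not something these notes state outright) is that the mean acts as the simplest possible model: a single number used to predict every observation. The sketch below, using only the first 15 `time_to_iep` values from the data above, checks that the mean gives a smaller sum of squared errors than two other constant guesses. The helper `sum_squared_errors` and the alternative guesses 17.0 and 19.0 are hypothetical choices made for this illustration.

```python
from statistics import fmean

# First 15 values of time_to_iep from the data above
time_to_iep = [16.93, 19.49, 18.21, 19.09, 17.67, 18.48, 16.37, 17.57,
               19.18, 18.74, 17.15, 17.76, 17.2, 19.78, 18.34]


def sum_squared_errors(guess, values):
    """Total squared distance between one constant guess and every value."""
    return sum((v - guess) ** 2 for v in values)


mean_value = fmean(time_to_iep)

# Compare the mean against two other candidate guesses (hypothetical values)
for guess in (17.0, mean_value, 19.0):
    sse = sum_squared_errors(guess, time_to_iep)
    print(f"guess = {guess:.2f}, SSE = {sse:.2f}")
```

Seen this way, a regression model extends the same idea: instead of one constant, the prediction is allowed to change with another variable.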
Code¶
- Code
- Train a simple linear regression model
- Produce model summary tables
- Make predictions using a trained model
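The three steps listed above could be sketched with `statsmodels` as follows. This is an illustrative example under stated assumptions, not the course's own notebook: the column names `hours` and `score` are hypothetical, and the data are simulated with a known slope of 4 so the fitted result can be sanity-checked.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical): score = 50 + 4 * hours + random noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"hours": np.linspace(1, 10, 30)})
df["score"] = 50 + 4 * df["hours"] + rng.normal(0, 2, size=30)

# 1. Train a simple linear regression model (one outcome, one predictor)
model = smf.ols("score ~ hours", data=df).fit()

# 2. Produce the model summary table
print(model.summary())

# 3. Make predictions for new values of the predictor
new_data = pd.DataFrame({"hours": [11, 12]})
predictions = model.predict(new_data)
print(predictions)
```

In the summary, the `coef` column holds the estimated intercept and slope, `P>|t|` reports the p-value for each coefficient, and `R-squared` is the share of variance in the outcome explained by the model.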
Application¶
- Application
- Interpret regression table outputs
- Integrate regression tables into markdown documents
For Next Time¶
- Use statsmodels to create a model related to your final project.
- Mandatory Reading, Note, & Presentation