Multiple Linear Regression
Load Libraries
```python
import pandas as pd                        # tabular data handling
from statsmodels.formula.api import ols    # R-style formula interface to ordinary least squares
```
Load and Verify Data
```python
# Read the dataset and show the first five rows as a quick sanity check
df = pd.read_csv("data/academicperformance.csv")
df.head()
```
| | GPA | Income | Sleep | Time | Grade |
---|---|---|---|---|---|
0 | 2.9 | 82461 | 6.5 | 47 | 77 |
1 | 3.7 | 61113 | 6.2 | 47 | 94 |
2 | 2.8 | 63632 | 6.2 | 39 | 69 |
3 | 2.0 | 66854 | 7.2 | 49 | 81 |
4 | 2.8 | 82721 | 5.5 | 49 | 78 |
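`df.head()` only shows the first rows. A slightly more explicit check, sketched below under the assumption that the model needs numeric, non-missing `Grade`, `GPA`, `Sleep` and `Time` columns, confirms that those columns exist, are numeric, and contain no nulls. `check_columns` is a hypothetical helper, and the frame is rebuilt from the five rows printed above so the sketch runs without the CSV file.

```python
import pandas as pd

# The five rows shown by df.head() above, rebuilt so the sketch is self-contained.
sample = pd.DataFrame({
    "GPA": [2.9, 3.7, 2.8, 2.0, 2.8],
    "Income": [82461, 61113, 63632, 66854, 82721],
    "Sleep": [6.5, 6.2, 6.2, 7.2, 5.5],
    "Time": [47, 47, 39, 49, 49],
    "Grade": [77, 94, 69, 81, 78],
})

def check_columns(frame, required):
    """Hypothetical helper: return (missing columns, non-numeric columns, null count)."""
    missing = [c for c in required if c not in frame.columns]
    present = [c for c in required if c in frame.columns]
    non_numeric = [c for c in present if not pd.api.types.is_numeric_dtype(frame[c])]
    # Total number of missing cells across the required columns that do exist
    null_count = int(frame[present].isna().sum().sum())
    return missing, non_numeric, null_count

missing, non_numeric, nulls = check_columns(sample, ["Grade", "GPA", "Sleep", "Time"])
print(missing, non_numeric, nulls)  # [] [] 0
```

On the full data the call would be `check_columns(df, ["Grade", "GPA", "Sleep", "Time"])`. The statsmodels formula interface drops rows with missing values by default, so a nonzero null count would show up as fewer observations in the regression summary than rows in `df`.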
Multiple Linear Regression
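```python
# Regress Grade on GPA, Sleep and Time; the formula adds an intercept automatically.
# Income is in the data but is not used as a predictor in this model.
mlr = ols('Grade ~ GPA + Sleep + Time', df).fit()
mlr.summary()
```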
Dep. Variable: | Grade | R-squared: | 0.891 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.891 |
Method: | Least Squares | F-statistic: | 5653. |
Date: | Mon, 27 Dec 2021 | Prob (F-statistic): | 0.00 |
Time: | 16:35:45 | Log-Likelihood: | -6191.4 |
No. Observations: | 2077 | AIC: | 1.239e+04 |
Df Residuals: | 2073 | BIC: | 1.241e+04 |
Df Model: | 3 | | |
Covariance Type: | nonrobust | | |
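Several of these header statistics follow from one another through standard formulas, which allows a quick consistency check. The sketch below recomputes adjusted R-squared, the F-statistic, AIC and BIC from the rounded values in the table, assuming k = 4 estimated parameters (three slopes plus the intercept), which is the count statsmodels uses for an OLS model with a constant.

```python
import math

# Values copied from the summary table above.
n = 2077        # No. Observations
p = 3           # Df Model: slopes for GPA, Sleep, Time
r2 = 0.891      # R-squared, shown rounded to three decimals
llf = -6191.4   # Log-Likelihood

k = p + 1             # parameters counted by AIC/BIC: three slopes plus the intercept
df_resid = n - p - 1  # 2073, matching Df Residuals

# Adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - p - 1)
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
# Overall F-test: (R^2 / p) / ((1 - R^2) / (n - p - 1))
f_stat = (r2 / p) / ((1 - r2) / df_resid)
# Information criteria from the maximised log-likelihood
aic = 2 * k - 2 * llf
bic = k * math.log(n) - 2 * llf

print(f"Adj. R-squared {adj_r2:.3f}  F {f_stat:.0f}  AIC {aic:.4g}  BIC {bic:.4g}")
# Adj. R-squared 0.891  F 5648  AIC 1.239e+04  BIC 1.241e+04
```

Adjusted R-squared, AIC and BIC match the table at its displayed precision. The F-statistic lands near 5648 rather than 5653 only because R-squared is shown to three decimals.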
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -39.7098 | 0.879 | -45.179 | 0.000 | -41.434 | -37.986 |
GPA | 9.0992 | 0.136 | 67.065 | 0.000 | 8.833 | 9.365 |
Sleep | 7.2070 | 0.104 | 69.500 | 0.000 | 7.004 | 7.410 |
Time | 1.0580 | 0.011 | 95.102 | 0.000 | 1.036 | 1.080 |
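Each coefficient is the change in predicted Grade for a one-unit increase in that predictor with the other two held fixed; for example, one more unit of Sleep adds about 7.2 points. The two interval columns are coef ± t(0.975, 2073) × std err, using the 2073 residual degrees of freedom. The sketch below reproduces them from the table; the critical value 1.961 is an assumed, hard-coded approximation rather than a computed quantile.

```python
# Coefficients and standard errors copied from the table above.
coef = {"Intercept": -39.7098, "GPA": 9.0992, "Sleep": 7.2070, "Time": 1.0580}
std_err = {"Intercept": 0.879, "GPA": 0.136, "Sleep": 0.104, "Time": 0.011}

# Assumed critical value: the 0.975 quantile of Student's t with 2073 df is
# about 1.961 (scipy.stats.t.ppf(0.975, 2073) would compute it exactly).
T_CRIT = 1.961

ci = {}
for name, b in coef.items():
    half_width = T_CRIT * std_err[name]
    ci[name] = (b - half_width, b + half_width)
    print(f"{name:>9}: [{ci[name][0]:.3f}, {ci[name][1]:.3f}]")
```

The recomputed bounds agree with the `[0.025` and `0.975]` columns to within about 0.001; the small gaps come from the standard errors being printed to three decimals.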
Omnibus: | 1.358 | Durbin-Watson: | 1.941 |
---|---|---|---|
Prob(Omnibus): | 0.507 | Jarque-Bera (JB): | 1.305 |
Skew: | -0.014 | Prob(JB): | 0.521 |
Kurtosis: | 3.120 | Cond. No. | 344. |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
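`ols('Grade ~ GPA + Sleep + Time', df)` builds a design matrix with a column of ones for the intercept plus one column per predictor, then finds the coefficients that minimise the sum of squared residuals. The NumPy sketch below shows that core computation on synthetic data; the data, the value ranges and the "true" coefficients are all made up (loosely echoing the fit above) so that it runs without the CSV.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic predictors on made-up ranges similar in scale to GPA, Sleep and Time.
n = 200
gpa = rng.uniform(2.0, 4.0, n)
sleep = rng.uniform(5.0, 8.0, n)
time = rng.uniform(30.0, 60.0, n)

# Made-up "true" coefficients: intercept, GPA, Sleep, Time.
true_beta = np.array([-40.0, 9.0, 7.0, 1.0])

# The formula 'y ~ GPA + Sleep + Time' implies this design matrix:
# a column of ones (intercept) followed by one column per predictor.
X = np.column_stack([np.ones(n), gpa, sleep, time])
y = X @ true_beta + rng.normal(0.0, 1.0, n)  # add unit-variance noise

# Ordinary least squares: choose beta to minimise ||y - X @ beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

print("estimated coefficients:", np.round(beta_hat, 3))
print("R-squared:", round(float(r2), 3))
```

Fitting the formula interface to a DataFrame holding these same arrays should return the same coefficients, since the least-squares solution is unique when the design matrix has full column rank. The statsmodels summary layers standard errors, tests and diagnostics on top of this solve.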
Predictions
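```python
# New observations: each group of three rows varies one predictor while holding the other two fixed
data = {'GPA':   [3, 3, 3, 2, 3, 4, 2.5, 2.5, 2.5],
        'Sleep': [5, 6, 7, 6, 6, 6, 5, 5, 5],
        'Time':  [30, 30, 30, 30, 30, 30, 40, 50, 60]}
df_predict = pd.DataFrame(data)
# predict() re-applies the model formula, so only the predictor columns are needed
df_predict['Grade'] = mlr.predict(df_predict).round(1)
df_predict
```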
| | GPA | Sleep | Time | Grade |
---|---|---|---|---|
0 | 3.0 | 5 | 30 | 55.4 |
1 | 3.0 | 6 | 30 | 62.6 |
2 | 3.0 | 7 | 30 | 69.8 |
3 | 2.0 | 6 | 30 | 53.5 |
4 | 3.0 | 6 | 30 | 62.6 |
5 | 4.0 | 6 | 30 | 71.7 |
6 | 2.5 | 5 | 40 | 61.4 |
7 | 2.5 | 5 | 50 | 72.0 |
8 | 2.5 | 5 | 60 | 82.6 |
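`mlr.predict` evaluates the fitted equation Grade = b0 + b1·GPA + b2·Sleep + b3·Time at each new row. The sketch below does the same by hand with the four-decimal coefficients from the summary and reproduces the table above; `predict_grade` is an illustrative helper, not part of statsmodels.

```python
# Coefficients from the regression summary above (printed to four decimals there).
INTERCEPT = -39.7098
B_GPA, B_SLEEP, B_TIME = 9.0992, 7.2070, 1.0580

def predict_grade(gpa, sleep, time):
    """Hypothetical helper: evaluate b0 + b1*GPA + b2*Sleep + b3*Time."""
    return INTERCEPT + B_GPA * gpa + B_SLEEP * sleep + B_TIME * time

# Same nine rows as df_predict above
rows = list(zip([3, 3, 3, 2, 3, 4, 2.5, 2.5, 2.5],
                [5, 6, 7, 6, 6, 6, 5, 5, 5],
                [30, 30, 30, 30, 30, 30, 40, 50, 60]))
grades = [round(predict_grade(*r), 1) for r in rows]
print(grades)
# [55.4, 62.6, 69.8, 53.5, 62.6, 71.7, 61.4, 72.0, 82.6]
```

Because only one predictor changes within each group of rows, the differences read off directly as coefficients: each extra unit of Sleep adds about 7.2 points (55.4, 62.6, 69.8), each GPA point about 9.1 (53.5, 62.6, 71.7), and each 10 units of Time about 10.6 (61.4, 72.0, 82.6).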