Multiple Linear Regression

1. Load Libraries

import pandas as pd
from statsmodels.formula.api import ols

2. Load and Verify Data

df = pd.read_csv("data/academicperformance.csv")
df.head()
   GPA  Income  Sleep  Time  Grade
0  2.9   82461    6.5    47     77
1  3.7   61113    6.2    47     94
2  2.8   63632    6.2    39     69
3  2.0   66854    7.2    49     81
4  2.8   82721    5.5    49     78

3. Run Multiple Linear Regression

mlr = ols('Grade ~ GPA + Sleep + Time', df).fit()
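
The formula 'Grade ~ GPA + Sleep + Time' asks for an intercept plus one coefficient per predictor, chosen so that the sum of squared residuals is as small as possible. As a rough illustration of that idea, the sketch below solves the least-squares normal equations in plain Python on synthetic, noise-free data. It is not how statsmodels computes the fit internally; the data, the `true_coefs` values, and the helper names `solve` and `fit_ols` are all invented for this example.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared residuals."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1.0 column is the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: these coefficients are invented, not the fitted ones.
random.seed(0)
true_coefs = [-40.0, 9.0, 7.0, 1.0]  # intercept, GPA, Sleep, Time
rows = [(random.uniform(2, 4), random.uniform(5, 8), random.uniform(30, 60))
        for _ in range(50)]
y = [true_coefs[0] + sum(b * v for b, v in zip(true_coefs[1:], r)) for r in rows]

estimated = fit_ols(rows, y)
print([round(b, 6) for b in estimated])
```

Because the synthetic grades contain no noise, the estimated coefficients should agree with `true_coefs` up to floating-point error. With real data the fit recovers only estimates, which is why the summary in the next step also reports standard errors and confidence intervals.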

4. Evaluate Model

mlr.summary()
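                            OLS Regression Results
==============================================================================
Dep. Variable:                  Grade   R-squared:                       0.891
Model:                            OLS   Adj. R-squared:                  0.891
Method:                 Least Squares   F-statistic:                     5653.
Date:                Tue, 25 Jan 2022   Prob (F-statistic):               0.00
Time:                        22:45:33   Log-Likelihood:                -6191.4
No. Observations:                2077   AIC:                         1.239e+04
Df Residuals:                    2073   BIC:                         1.241e+04
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -39.7098      0.879    -45.179      0.000     -41.434     -37.986
GPA            9.0992      0.136     67.065      0.000       8.833       9.365
Sleep          7.2070      0.104     69.500      0.000       7.004       7.410
Time           1.0580      0.011     95.102      0.000       1.036       1.080
==============================================================================
Omnibus:                        1.358   Durbin-Watson:                   1.941
Prob(Omnibus):                  0.507   Jarque-Bera (JB):                1.305
Skew:                          -0.014   Prob(JB):                        0.521
Kurtosis:                       3.120   Cond. No.                         344.
==============================================================================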


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
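
Several figures in the summary can be derived from one another, which is a handy way to see what they mean. The sketch below recomputes the residual degrees of freedom, adjusted R-squared, AIC, BIC, and the t statistics from the values printed above, using the standard textbook formulas. The variable names are chosen for this example. Since the inputs are already rounded (the standard errors to three decimals), the recomputed t statistics only approximate the printed ones.

```python
import math

n_obs, n_params = 2077, 4          # observations; intercept + 3 predictors
r_squared, log_lik = 0.891, -6191.4

# Residual degrees of freedom: observations minus estimated parameters.
df_resid = n_obs - n_params

# Adjusted R-squared penalizes R-squared for the number of parameters.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Information criteria computed from the log-likelihood.
aic = 2 * n_params - 2 * log_lik
bic = n_params * math.log(n_obs) - 2 * log_lik

# Each t statistic is the coefficient divided by its standard error.
coefs = {"Intercept": (-39.7098, 0.879), "GPA": (9.0992, 0.136),
         "Sleep": (7.2070, 0.104), "Time": (1.0580, 0.011)}
t_stats = {name: coef / se for name, (coef, se) in coefs.items()}

print(df_resid, round(adj_r_squared, 3), round(aic), round(bic))
for name, t in t_stats.items():
    print(f"{name}: t is roughly {t:.1f}")
```

With more than two thousand observations and only three predictors, the adjustment to R-squared is tiny, which is why both values print as 0.891.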

5. Make Predictions

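# Scenarios: rows 0-2 vary Sleep, rows 3-5 vary GPA, rows 6-8 vary Time;
# the other two predictors are held fixed within each group.
data = {'GPA':[3,3,3,2,3,4,2.5,2.5,2.5],
        'Sleep':[5,6,7,6,6,6,5,5,5],
        'Time':[30,30,30,30,30,30,40,50,60]}
df_predict = pd.DataFrame(data)
# Predict a grade for each scenario and round to one decimal place
df_predict['Grade'] = mlr.predict(df_predict).round(1)
df_predict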
   GPA  Sleep  Time  Grade
0  3.0      5    30   55.4
1  3.0      6    30   62.6
2  3.0      7    30   69.8
3  2.0      6    30   53.5
4  3.0      6    30   62.6
5  4.0      6    30   71.7
6  2.5      5    40   61.4
7  2.5      5    50   72.0
8  2.5      5    60   82.6
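
Each predicted grade is simply the fitted linear equation evaluated at that row's inputs. As a quick sanity check, the sketch below plugs the rounded coefficients from the summary into that equation by hand. The function name `predict_grade` and the choice of scenarios are made up for this example.

```python
# Coefficients as printed in the regression summary (rounded to 4 decimals).
intercept, b_gpa, b_sleep, b_time = -39.7098, 9.0992, 7.2070, 1.0580

def predict_grade(gpa, sleep, time):
    """Evaluate the fitted linear equation for one student."""
    return intercept + b_gpa * gpa + b_sleep * sleep + b_time * time

# Three of the scenarios from the prediction table (rows 0, 1, and 8).
scenarios = [(3, 5, 30), (3, 6, 30), (2.5, 5, 60)]
for gpa, sleep, time in scenarios:
    print(gpa, sleep, time, round(predict_grade(gpa, sleep, time), 1))
```

The results, 55.4, 62.6, and 82.6, match rows 0, 1, and 8 of the table. The gap between the first two scenarios is the Sleep coefficient: one extra hour of sleep raises the predicted grade by about 7.2 points when GPA and Time are held fixed.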