Multiple Linear Regression

1. Load Libraries

import pandas as pd
from statsmodels.formula.api import ols

2. Load and Verify Data

df = pd.read_csv("data/academicperformance.csv")
df.head()

	GPA	Income	Sleep	Time	Grade
0	2.9	82461	6.5	47	77
1	3.7	61113	6.2	47	94
2	2.8	63632	6.2	39	69
3	2.0	66854	7.2	49	81
4	2.8	82721	5.5	49	78
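
Beyond eyeballing df.head(), a few programmatic checks can confirm the columns the model needs are present, numeric, and free of missing values. The sketch below is one possible way to do that. It re-enters the five rows printed above so it runs without the CSV file, and verify is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# The five rows shown by df.head(), typed in by hand so this sketch
# does not depend on data/academicperformance.csv being available.
sample = pd.DataFrame({
    "GPA": [2.9, 3.7, 2.8, 2.0, 2.8],
    "Income": [82461, 61113, 63632, 66854, 82721],
    "Sleep": [6.5, 6.2, 6.2, 7.2, 5.5],
    "Time": [47, 47, 39, 49, 49],
    "Grade": [77, 94, 69, 81, 78],
})


def verify(frame, required=("GPA", "Sleep", "Time", "Grade")):
    """Hypothetical helper: basic sanity checks on the modelling columns."""
    present = [c for c in required if c in frame.columns]
    return {
        # Columns the regression formula needs but the frame lacks
        "missing_columns": [c for c in required if c not in frame.columns],
        # Total number of missing cells across the required columns
        "null_count": int(frame[present].isna().sum().sum()),
        # True only if every required column has a numeric dtype
        "all_numeric": all(
            pd.api.types.is_numeric_dtype(frame[c]) for c in present
        ),
    }


report = verify(sample)
print(report)
```

On the full dataset the same call would be verify(df). Income is loaded with the file but is not used by the model in the next step, so it is left out of the required columns here.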

3. Run Multiple Linear Regression

# Regress Grade on GPA, Sleep, and Time; the formula adds an intercept automatically.
mlr = ols('Grade ~ GPA + Sleep + Time', data=df).fit()
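
To make the fitting step less of a black box, here is a minimal sketch of ordinary least squares with NumPy: build a design matrix with a column of ones for the intercept, then solve for the coefficients. The data is synthetic and noise-free, and the "true" coefficients are made up for illustration (they are not the course dataset or its results), so the solver should simply recover them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors in plausible ranges (assumed, not from the CSV)
gpa = rng.uniform(2.0, 4.0, n)
sleep = rng.uniform(5.0, 8.0, n)
time = rng.uniform(30.0, 60.0, n)

# Hypothetical coefficients: intercept, GPA, Sleep, Time
true_beta = np.array([-40.0, 9.0, 7.0, 1.0])

# Design matrix: the leading column of ones plays the role of the intercept
X = np.column_stack([np.ones(n), gpa, sleep, time])
grade = X @ true_beta  # noise-free response

# Least-squares solution of X @ beta = grade
beta_hat, *_ = np.linalg.lstsq(X, grade, rcond=None)
print(beta_hat.round(3))
```

With real, noisy data the estimates would not match any underlying coefficients exactly, which is why the summary in the next step also reports standard errors and confidence intervals.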

4. Evaluate Model

mlr.summary()

OLS Regression Results
Dep. Variable:	Grade	R-squared:	0.891
Model:	OLS	Adj. R-squared:	0.891
Method:	Least Squares	F-statistic:	5653.
Date:	Tue, 25 Jan 2022	Prob (F-statistic):	0.00
Time:	22:45:33	Log-Likelihood:	-6191.4
No. Observations:	2077	AIC:	1.239e+04
Df Residuals:	2073	BIC:	1.241e+04
Df Model:	3
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	-39.7098	0.879	-45.179	0.000	-41.434	-37.986
GPA	9.0992	0.136	67.065	0.000	8.833	9.365
Sleep	7.2070	0.104	69.500	0.000	7.004	7.410
Time	1.0580	0.011	95.102	0.000	1.036	1.080

Omnibus:	1.358	Durbin-Watson:	1.941
Prob(Omnibus):	0.507	Jarque-Bera (JB):	1.305
Skew:	-0.014	Prob(JB):	0.521
Kurtosis:	3.120	Cond. No.	344.

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
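
Several numbers in the summary can be reproduced from others in the same table, which is a useful way to see what they mean. The sketch below recomputes the residual degrees of freedom, adjusted R-squared, AIC, BIC, and approximate 95% confidence intervals using standard formulas. It works from the rounded values printed above and uses 1.96 (a normal approximation) as the critical value, so results agree with the table only up to rounding.

```python
import math

# Values copied from the summary table above (rounded as printed)
n_obs = 2077
n_predictors = 3           # GPA, Sleep, Time
r_squared = 0.891
log_likelihood = -6191.4

n_params = n_predictors + 1          # slopes plus the intercept
df_resid = n_obs - n_params          # "Df Residuals"

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Information criteria: lower is better when comparing models
aic = 2 * n_params - 2 * log_likelihood
bic = n_params * math.log(n_obs) - 2 * log_likelihood

# (coefficient, standard error) pairs from the coefficient table
coef_table = {
    "Intercept": (-39.7098, 0.879),
    "GPA": (9.0992, 0.136),
    "Sleep": (7.2070, 0.104),
    "Time": (1.0580, 0.011),
}

# Approximate 95% interval: coef +/- 1.96 * std err
intervals = {
    name: (coef - 1.96 * se, coef + 1.96 * se)
    for name, (coef, se) in coef_table.items()
}

print(df_resid, round(adj_r_squared, 3), round(aic, 1), round(bic, 1))
for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.3f}, {high:.3f}]")
```

With only three predictors and 2077 observations, the adjustment barely moves R-squared, which is why both print as 0.891.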

Predictions

# Inputs to score: rows 0-2 vary Sleep, rows 3-5 vary GPA, rows 6-8 vary Time.
data = {'GPA':[3,3,3,2,3,4,2.5,2.5,2.5],
        'Sleep':[5,6,7,6,6,6,5,5,5],
        'Time':[30,30,30,30,30,30,40,50,60]}
df_predict = pd.DataFrame(data)

df_predict['Grade'] = mlr.predict(df_predict).round(1)

df_predict

	GPA	Sleep	Time	Grade
0	3.0	5	30	55.4
1	3.0	6	30	62.6
2	3.0	7	30	69.8
3	2.0	6	30	53.5
4	3.0	6	30	62.6
5	4.0	6	30	71.7
6	2.5	5	40	61.4
7	2.5	5	50	72.0
8	2.5	5	60	82.6
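
Each prediction is just the intercept plus each coefficient times its input. As a check, the sketch below plugs the rounded coefficients from the summary table into that equation for the same nine rows. predict_grade is a hypothetical helper, and because it uses the four-decimal coefficients rather than the fitted model's full precision, agreement with mlr.predict is only expected after rounding to one decimal.

```python
# Coefficients as printed in the summary table (rounded to four decimals)
coef = {"Intercept": -39.7098, "GPA": 9.0992, "Sleep": 7.2070, "Time": 1.0580}


def predict_grade(gpa, sleep, time):
    """Hypothetical helper: evaluate the fitted regression equation by hand."""
    return (
        coef["Intercept"]
        + coef["GPA"] * gpa
        + coef["Sleep"] * sleep
        + coef["Time"] * time
    )


# (GPA, Sleep, Time) for the nine rows of df_predict
rows = [
    (3, 5, 30), (3, 6, 30), (3, 7, 30),
    (2, 6, 30), (3, 6, 30), (4, 6, 30),
    (2.5, 5, 40), (2.5, 5, 50), (2.5, 5, 60),
]

predicted = [round(predict_grade(*row), 1) for row in rows]
print(predicted)
```

Reading the table this way also shows what each coefficient means: in rows 0 to 2 only Sleep changes, and each extra hour raises the predicted Grade by about 7.2 points (55.4, 62.6, 69.8), holding GPA and Time fixed.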

Machine Learning for Absolute Beginners
