Simple Linear Regression

1. Import Libraries

# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from statsmodels.formula.api import ols
import statsmodels.api as sm

2. Load and Verify Dataset

# load dataset and create dataframe
df = pd.read_csv('data/edincome.csv').round(1)

# verify first few records
df.head()

	Education	Income
0	10.0	32.1
1	10.4	36.5
2	10.7	23.9
3	11.1	52.3
4	11.4	30.2

# plot scatterplot of income against education
# (DataFrame.plot.scatter returns a matplotlib Axes, not a Figure)
ax = df.plot.scatter(x="Education", y="Income", figsize=(8, 6))
plt.title('Education vs Income', fontsize=24)
plt.xlabel('Education', fontsize=18)
plt.ylabel('Income', fontsize=18)
plt.grid()

[Figure: scatter plot of Income against Education (_images/SLR_6_0.png)]

3. Run Simple Linear Regression

# fit Income as a linear function of Education by ordinary least squares
slr = ols('Income ~ Education', data=df).fit()

print(slr.params)

Intercept   -23.176365
Education     5.574237
dtype: float64
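
The two numbers printed above are the ordinary least squares estimates of the intercept and slope. As a rough sketch of what the fit computes, the closed-form formulas for a single predictor can be applied directly with NumPy. The arrays below hold only the five rows shown by df.head(), so this is a toy sample and the resulting values will not match the full-dataset coefficients.

```python
import numpy as np

# First five rows shown by df.head(); a toy sample, not the full dataset
education = np.array([10.0, 10.4, 10.7, 11.1, 11.4])
income = np.array([32.1, 36.5, 23.9, 52.3, 30.2])

# Deviations from the sample means
x_dev = education - education.mean()
y_dev = income - income.mean()

# Closed-form OLS estimates for one predictor:
# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
# intercept = y_mean - slope * x_mean
slope = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
intercept = income.mean() - slope * education.mean()

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Running the same formulas on the full Education and Income columns should reproduce slr.params up to floating-point error.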

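# access the intercept by label (positional indexing on a labelled Series is deprecated in recent pandas)
slr.params['Intercept']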

-23.176364855801438

# overlay the fitted regression line on the scatterplot
ax = df.plot.scatter(x="Education", y="Income")
x = np.linspace(10, 22, 100)
y = slr.params['Education'] * x + slr.params['Intercept']
plt.plot(x, y, '-.r')
plt.grid()

[Figure: scatter plot of Income against Education with the fitted regression line (_images/SLR_11_0.png)]

4. Review Results and Evaluate Model

slr.summary()

OLS Regression Results
Dep. Variable:	Income	R-squared:	0.878
Model:	OLS	Adj. R-squared:	0.875
Method:	Least Squares	F-statistic:	238.4
Date:	Tue, 25 Jan 2022	Prob (F-statistic):	1.17e-16
Time:	22:45:36	Log-Likelihood:	-119.61
No. Observations:	35	AIC:	243.2
Df Residuals:	33	BIC:	246.3
Df Model:	1
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	-23.1764	5.918	-3.917	0.000	-35.216	-11.137
Education	5.5742	0.361	15.440	0.000	4.840	6.309

Omnibus:	2.854	Durbin-Watson:	2.535
Prob(Omnibus):	0.240	Jarque-Bera (JB):	1.726
Skew:	0.502	Prob(JB):	0.422
Kurtosis:	3.420	Cond. No.	75.8

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
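
The t and confidence interval columns in the coefficient table follow from the coef and std err columns. A minimal check for the Education row, using the rounded values from the table, might look like this. The critical value 2.0345 is an assumption taken from a Student's t table (two-sided 95%, 33 residual degrees of freedom), not something printed in the summary.

```python
# Values copied from the Education row of the summary table
coef = 5.5742
std_err = 0.361

# Assumption: two-sided 95% critical value of Student's t with
# 33 residual degrees of freedom, approximately 2.0345 (from a t table)
t_crit = 2.0345

# t statistic: the coefficient divided by its standard error
t_stat = coef / std_err

# 95% confidence interval: coefficient plus or minus t_crit standard errors
ci_low = coef - t_crit * std_err
ci_high = coef + t_crit * std_err

print(f"t = {t_stat:.2f}")                          # t = 15.44
print(f"95% CI = [{ci_low:.2f}, {ci_high:.2f}]")    # 95% CI = [4.84, 6.31]
```

The results agree with the table to within the rounding of the inputs.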

print(slr.rsquared)

0.8784032808796992

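# mse_model is the explained sum of squares divided by the model degrees of freedom;
# the residual mean squared error is available separately as slr.mse_resid
print(slr.mse_model)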

13766.191657863852
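
The value above is large because mse_model measures explained variation, not prediction error. As a sketch of how it relates to the other reported figures, the residual MSE and R-squared can be recovered from the F-statistic and the degrees of freedom in the summary. All inputs are copied from the outputs above; the F-statistic is rounded there, so the results are approximate.

```python
# Values copied from the outputs above
mse_model = 13766.191657863852   # slr.mse_model
f_stat = 238.4                   # F-statistic (rounded in the summary)
df_model = 1                     # Df Model
df_resid = 33                    # Df Residuals

# F = mse_model / mse_resid, so the residual MSE follows by division
mse_resid = mse_model / f_stat

# Explained and residual sums of squares
ess = mse_model * df_model
rss = mse_resid * df_resid

# R-squared is the explained share of the total sum of squares
r_squared = ess / (ess + rss)

print(f"residual MSE ~ {mse_resid:.2f}")   # residual MSE ~ 57.74
print(f"R-squared ~ {r_squared:.3f}")      # R-squared ~ 0.878
```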

5. Generate Predictions

# predict new points
data = {'Education': [12,16,18]}
df_predict = pd.DataFrame(data).round(1)

df_predict['Income'] = slr.predict(df_predict).round(1)

df_predict

	Education	Income
0	12	43.7
1	16	66.0
2	18	77.2
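
Each prediction is just the fitted line evaluated at the new Education value. A quick hand check, using the coefficients as printed by slr.params (rounded to six decimals):

```python
# Coefficients copied from the slr.params output
intercept = -23.176365
slope = 5.574237

# Same Education values as in df_predict
education_values = [12, 16, 18]

# Income = intercept + slope * Education, rounded to one decimal
predictions = [round(intercept + slope * e, 1) for e in education_values]

print(predictions)   # [43.7, 66.0, 77.2]
```

These match the Income column returned by slr.predict.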
