KMeans

For the case study, we try to discover meaningful customer groups for market segmentation.

Import Python libraries

# import libraries

import pandas as pd
from sklearn.cluster import KMeans

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")

Load and verify data

Next, we load and verify the data. Although there are four columns, we only use Income and Score for clustering. Income is the customer’s personal income. Score is an indexed score of how much the customer spends at the mall.

df = pd.read_csv("data/mallcustomers.csv")
df.head()
Gender Age Income Score
0 Male 19 15 39
1 Male 21 15 81
2 Female 20 16 6
3 Female 23 16 77
4 Female 31 17 40

Run K-Means

Next, we run \(K\)-Means with \(k\) (number of clusters) set at \(k=5\).

X = df[['Score','Income']]
km = KMeans(n_clusters=5).fit(X)
results = km.predict(X)
clusters = pd.DataFrame(results,columns=['cluster'])
df_c = X.join(clusters, how='outer')
df_c.head()
Score Income cluster
0 39 15 4
1 81 15 3
2 6 16 4
3 77 16 3
4 40 17 4
category = {0:'Enthusiastic', 1:'Conservative', 
            2:'Middle-of-the-Road', 3:'Browsers',4:'Luxury'}
df_c['cat'] = df_c['cluster']
df_c = df_c.replace({'cat':category})

Display Results

plt.figure(figsize=(12,6))
sns.scatterplot(x="Income",y="Score",data=df_c,hue="cat",
                palette="deep", s=80)
<AxesSubplot:xlabel='Income', ylabel='Score'>
_images/KM-CaseStudy_15_1.png