Wine Classification Model


Multiclass wine classification model built with Random Forest.

The model predicts whether a wine is Regular, Good, or Excellent from its levels of alcohol, pH, sulphates, citric acid, etc.

The following machine learning algorithms were tried on the wine dataset:

  • K-Nearest Neighbors

  • Naive Bayes

  • SVC

  • Random Forest

  • Stochastic Gradient Descent

Random Forest gave the best result, with 70% accuracy across the three wine classes.

Modules Needed

import pandas as pd
import sklearn.metrics  # loads the submodule so sklearn.metrics.classification_report is available
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

Loading Dataset

dataset = pd.read_csv("./data/winequalityN.csv")

Mapping Quality Scores 3-9 to Three Quality Classes: 0, 1, 2

dataset.quality = dataset.quality.replace({3: 0, 4: 0, 5: 0, 6: 1, 7: 2, 8: 2, 9: 2})
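To make the mapping concrete, here is a minimal sketch on a toy Series. The scores are made up for illustration and are not taken from the dataset, and the reading of 0, 1, 2 as Regular, Good, Excellent is an assumption based on the description at the top. The sketch uses `Series.map` rather than `replace`, so a score missing from the dictionary would become NaN instead of passing through unchanged.

```python
import pandas as pd

# Hypothetical quality scores, for illustration only
quality = pd.Series([3, 5, 6, 7, 9], name="quality")

# Same mapping as above; assumed meaning: 0 = Regular, 1 = Good, 2 = Excellent
quality_to_class = {3: 0, 4: 0, 5: 0, 6: 1, 7: 2, 8: 2, 9: 2}
classes = quality.map(quality_to_class)

print(classes.tolist())  # [0, 0, 1, 2, 2]
```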

Dropping NA Values in the Dataset

dataset = dataset.dropna()

Transforming String Values to Numeric Values

dataset.type = dataset.type.replace({"white": 1, "red" : 0})

Balancing the Dataset So Each Quality Class Has the Same Number of Rows

df_0 = dataset[dataset['quality']==0]
df_1 = dataset[dataset['quality']==1]
df_2 = dataset[dataset['quality']==2]

df_0 = df_0.sample(1250)
df_1 = df_1.sample(1250)
df_2 = df_2.sample(1250)

dataset = pd.concat([df_0, df_1, df_2])
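The same balancing step could also be written with `groupby(...).sample(...)`. The sketch below is one possible variant, not the code used in this project: the toy data is made up, the sample size is taken from the smallest class instead of the fixed 1250 used above, and the `random_state` is an added assumption so the draw is repeatable.

```python
import pandas as pd

# Toy frame with an imbalanced label column (made-up data)
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.1, 11.2, 12.0, 12.8, 10.5, 9.9, 13.1],
    "quality": [0, 0, 0, 0, 1, 1, 1, 2, 2],
})

# Downsample every class to the size of the smallest one
n_per_class = int(toy["quality"].value_counts().min())
balanced = toy.groupby("quality").sample(n=n_per_class, random_state=100)

# Every class now has the same number of rows
print(balanced["quality"].value_counts().sort_index().to_dict())
```

Fixing the seed matters here because the sampled rows change on every run otherwise, which makes the reported metrics shift slightly between runs.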

Preprocessing Data

X = dataset.iloc[:, 0:-1]
y = dataset.iloc[:, -1]
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, y, test_size=0.2, random_state=100, stratify=y)

numerical_features = X.select_dtypes(include=['float64', 'int64'])

numerical_columns = numerical_features.columns

ct = ColumnTransformer([("only numeric", StandardScaler(), numerical_columns)], remainder='passthrough')

X_Train = ct.fit_transform(X_Train)
X_Test = ct.transform(X_Test)
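The reason for calling `fit_transform` on the training split and only `transform` on the test split is that the scaling statistics should come from the training data alone. A small sketch with made-up numbers (the arrays below are illustrative, not wine data) shows the test value being scaled with the training mean and standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data, for illustration only
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[4.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

# The test value is centered on the training mean (2.0), not on its own mean
expected = (4.0 - train.mean()) / train.std()
print(test_scaled[0, 0], expected)
```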

K-Nearest Neighbors Model

from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors=3)

knn_model.fit(X_Train, Y_Train)
y_predicted = knn_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_predicted))

Naive Bayes Model

from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model = nb_model.fit(X_Train, Y_Train)
y_pred = nb_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))

SVC Model

from sklearn.svm import SVC
svm_model = SVC()
svm_model = svm_model.fit(X_Train, Y_Train)
y_pred = svm_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))

Random Forest Model

from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier()

random_forest.fit(X_Train, Y_Train)
y_pred = random_forest.predict(X_Test)

print(sklearn.metrics.classification_report(Y_Test, y_pred))

Stochastic Gradient Descent

from sklearn.linear_model import SGDClassifier
sgd = SGDClassifier()
sgd.fit(X_Train, Y_Train)
pred_sgd = sgd.predict(X_Test)

print(sklearn.metrics.classification_report(Y_Test, pred_sgd))  # report the SGD predictions, not the Random Forest ones

Of the five models, Random Forest performs best, reaching 70% accuracy across the three wine classes.
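The five models can also be compared side by side in a single loop. The sketch below is a rough illustration under stated assumptions: it runs on synthetic data from `make_classification` rather than the wine dataset, the `random_state` values are added for repeatability, and the accuracies it prints say nothing about the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the wine features (not the real dataset)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=100
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=100, stratify=y_demo
)

# Scale with statistics learned from the training split only
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

models = {
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": GaussianNB(),
    "SVC": SVC(),
    "Random Forest": RandomForestClassifier(random_state=100),
    "SGD": SGDClassifier(random_state=100),
}

# Fit each model and record its accuracy on the held-out split
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))

# Print the models from best to worst
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```

Keeping each model's predictions inside the loop also avoids reusing a prediction variable from a previous model by accident.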

Check It Out

Test the model yourself by running the main.py file, which is built with Streamlit.

streamlit run main.py

Resources: