Table of contents
- Multiclass Wine Classification Model with Random Forest
- Modules Needed
- Loading Dataset
- Mapping the 3-9 Quality Values to Three Classes: 0, 1, 2
- Dropping NA Values in the Dataset
- Transforming String Values to Numeric Values
- Balancing the Dataset so Each Quality Class Has the Same Number of Rows
- Preprocessing Data
- K-Nearest Neighbors Model
- Naive Bayes Model
- SVC Model
- Random Forest Model
- Stochastic Gradient Descent
- Check It Out
Multiclass Wine Classification Model with Random Forest
The model predicts whether a wine is regular, good, or excellent from its physicochemical properties: alcohol, pH, sulphates, citric acid, and so on.
The following machine learning algorithms were trained on the wine dataset:
K-Nearest Neighbors
Naive Bayes
SVC
Random Forest
Stochastic Gradient Descent
The Random Forest model gave the best result, reaching about 70% accuracy across the three wine classes.
Modules Needed
import pandas as pd
import sklearn.metrics  # exposes sklearn.metrics.classification_report used below
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
Loading Dataset
dataset = pd.read_csv("./data/winequalityN.csv")
Mapping the 3-9 Quality Values to Three Classes: 0, 1, 2
dataset.quality = dataset.quality.replace({3: 0, 4: 0, 5: 0, 6: 1, 7: 2, 8: 2, 9: 2})
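To confirm the mapping worked, a quick sanity check (sketch):
# Only the three classes 0, 1, 2 should remain after the mapping
print(dataset.quality.value_counts())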
Dropping NA Values in the Dataset
dataset = dataset.dropna()
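To see how much data dropna removes, the missing values can be inspected first (sketch):
# Count missing values per column; run this before dropna to see what will be dropped
print(dataset.isna().sum())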
Transforming String Values to Numeric Values
dataset.type = dataset.type.replace({"white": 1, "red" : 0})
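Another quick check (sketch) that the encoding took:
# The type column should now contain only 0 and 1
print(dataset.type.unique())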
Balancing the Dataset so Each Quality Class Has the Same Number of Rows
df_0 = dataset[dataset['quality']==0]
df_1 = dataset[dataset['quality']==1]
df_2 = dataset[dataset['quality']==2]
# Downsample each class to 1250 rows so the three classes are balanced
df_0 = df_0.sample(1250)
df_1 = df_1.sample(1250)
df_2 = df_2.sample(1250)
dataset = pd.concat([df_0, df_1, df_2])
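Each class should now hold exactly 1250 rows; train_test_split shuffles by default, so the block-wise concat order is harmless. A quick check (sketch):
print(dataset.quality.value_counts())  # expect 1250 for each of 0, 1, 2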
Preprocessing Data
X = dataset.iloc[:, 0:-1]
y = dataset.iloc[:, -1]
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, y, test_size=0.2, random_state=100, stratify=y)
numerical_features = X.select_dtypes(include=['float64', 'int64'])
numerical_columns = numerical_features.columns
ct = ColumnTransformer([("only numeric", StandardScaler(), numerical_columns)], remainder='passthrough')
X_Train = ct.fit_transform(X_Train)
X_Test = ct.transform(X_Test)
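ColumnTransformer returns NumPy arrays, so a shape check (sketch) confirms the 80/20 split of the 3750 balanced rows:
print(X_Train.shape, X_Test.shape)  # expect 3000 and 750 rows respectively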
K-Nearest Neighbors Model
from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_Train, Y_Train)
y_predicted = knn_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_predicted))
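The choice of n_neighbors=3 is arbitrary; a small sweep (a hedged sketch) can check whether another k works better:
from sklearn.metrics import accuracy_score

# Try a few odd values of k and compare test accuracy
for k in (3, 5, 7, 9, 11):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_Train, Y_Train)
    print(k, accuracy_score(Y_Test, model.predict(X_Test)))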
Naive Bayes Model
from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model = nb_model.fit(X_Train, Y_Train)
y_pred = nb_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))
SVC Model
from sklearn.svm import SVC
svm_model = SVC()
svm_model = svm_model.fit(X_Train, Y_Train)
y_pred = svm_model.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))
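SVC is sensitive to C and gamma; a small grid search (a hedged sketch with arbitrary grid values) may improve on the defaults:
from sklearn.model_selection import GridSearchCV

# 5-fold grid search over a coarse parameter grid
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1]}, cv=5)
grid.fit(X_Train, Y_Train)
print(grid.best_params_, grid.best_score_)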
Random Forest Model
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier()
random_forest.fit(X_Train, Y_Train)
y_pred = random_forest.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, y_pred))
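Since Random Forest comes out on top, it is worth seeing which features it relies on. A sketch, assuming all columns are numeric after the type encoding so the transformed column order matches X.columns:
# Map feature importances back to the original column names
importances = pd.Series(random_forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))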
Stochastic Gradient Descent
from sklearn.linear_model import SGDClassifier
sgd = SGDClassifier()
sgd.fit(X_Train, Y_Train)
pred_sgd = sgd.predict(X_Test)
print(sklearn.metrics.classification_report(Y_Test, pred_sgd))
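A single train/test split is noisy; cross-validation (a hedged sketch) gives a steadier comparison of all five models:
from sklearn.model_selection import cross_val_score

# cross_val_score clones each estimator, so the fitted models above are unaffected
for name, model in [("KNN", knn_model), ("Naive Bayes", nb_model), ("SVC", svm_model),
                    ("Random Forest", random_forest), ("SGD", sgd)]:
    print(name, round(cross_val_score(model, X_Train, Y_Train, cv=5).mean(), 3))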
The Random Forest model gave the best result, reaching about 70% accuracy across the three wine classes.
Check It Out
Test the model yourself by running the main.py file, built with Streamlit:
streamlit run main.py
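main.py ships with the repository; purely for orientation, a hypothetical minimal Streamlit front end for a model like this could look as follows (an illustrative sketch, not the repo's actual code; the slider ranges are made-up assumptions):
# hypothetical_app.py - illustrative only, NOT the repository's main.py
import streamlit as st

st.title("Wine Quality Classifier")
alcohol = st.slider("Alcohol (% vol)", 8.0, 15.0, 10.5)  # assumed range
ph = st.slider("pH", 2.7, 4.0, 3.2)  # assumed range
# ...collect the remaining features the same way, scale them with the fitted
# ColumnTransformer, then call random_forest.predict() and display the class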