Polymodeling

In this example we use VerbaGPT to create multiple models to predict the progression of diabetes from the famous diabetes dataset [1].

User question: Give me 3 best models to predict the iris flower. Please rank their performance, and include confusion matrix as at least one measure of performance.

VerbaGPT response:

Cost of query: $0.00

Decision Tree R2 score: 1.000
Random Forest R2 score: 0.921
Linear Regression R2 score: 0.518

VerbaGPT answers the question correctly and gives us exactly what we asked for. It is kind of incredible that we are at a point where we can ask a question in plain English and, in seconds, have this statistical cake prepared, recipe and all, that tells us all kinds of interesting things and gives us an incredible head start on our analysis.

What a time to be alive!

Check out the full VerbaGPT read-out below.

VerbaGPT Message Window
Here is a Python code snippet that imports the necessary libraries and uses three different models to predict the target variable in the diabetes dataset. The models are ranked in terms of R2 performance:


```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Load data from SQL database
# (`conn` is not defined in this snippet; it is assumed to be an
# already-open database connection available in the session)
query = "SELECT * FROM diabetes"
data = pd.read_sql(query, conn)

# Split data into features and target variable
X = data.drop('target', axis=1)
y = data['target']

# Create and fit models
lr = LinearRegression()
lr.fit(X, y)
dt = DecisionTreeRegressor()
dt.fit(X, y)
rf = RandomForestRegressor()
rf.fit(X, y)

# Make predictions on the training data and calculate R2 scores
lr_pred = lr.predict(X)
lr_r2 = r2_score(y, lr_pred)
dt_pred = dt.predict(X)
dt_r2 = r2_score(y, dt_pred)
rf_pred = rf.predict(X)
rf_r2 = r2_score(y, rf_pred)

# Rank models by R2 score, best first
models = [('Linear Regression', lr_r2), ('Decision Tree', dt_r2), ('Random Forest', rf_r2)]
models.sort(key=lambda x: x[1], reverse=True)

# Print results
for model in models:
    print(f'{model[0]} R2 score: {model[1]:.3f}')
```


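This code uses the `LinearRegression`, `DecisionTreeRegressor`, and `RandomForestRegressor` models from the `sklearn` library to make predictions on the diabetes dataset. The R2 score for each model is calculated using the `r2_score` function from the `sklearn.metrics` module. The models are then ranked by their R2 scores and the results are printed. Because each model is scored on the same rows it was fitted to, these are in-sample R2 values: the Decision Tree's 1.000 reflects a tree that fits its training rows exactly, not necessarily one that predicts new patients well.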
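The snippet above fits and scores each model on the same data. As a rough sketch of how the ranking could be checked on rows the models have not seen, the example below holds out part of the data before fitting. It rests on a few assumptions that are not part of the VerbaGPT output: scikit-learn's bundled `load_diabetes` data stands in for the SQL table, the helper name `rank_models_on_holdout` is hypothetical, and the 25% test split and the seed are arbitrary choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def rank_models_on_holdout(test_size=0.25, seed=0):
    """Fit three regressors and rank them by R2 on held-out rows.

    Hypothetical helper for illustration; not part of VerbaGPT.
    """
    # Assumption: scikit-learn's bundled copy replaces the SQL table.
    X, y = load_diabetes(return_X_y=True, as_frame=True)

    # Keep a portion of the rows aside so scoring uses unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(random_state=seed),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)  # fit on the training rows only
        scores[name] = r2_score(y_test, model.predict(X_test))

    # Best held-out score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_models_on_holdout():
        print(f"{name} held-out R2 score: {score:.3f}")
```

The held-out scores depend on the split and the seed, so the ordering may differ from the in-sample ranking printed earlier. Repeating the run with a few different seeds gives a feel for how stable the ranking is.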
  1. diabetes dataset
