Linear Regression

In this example we ask VerbaGPT to give us a model and associated performance metrics on some data.

Video:

User question: Give me a model to predict Rings. Provide at least two performance metrics, including r2

VerbaGPT response:

Cost of query: $0.00

Answer:

Mean squared error: 5.01
R-squared: 0.54

VerbaGPT answered the question correctly. Since VerbaGPT comes with an embedding framework, it did not have to be told which database (i.e., Db_Test_01*) to look into, or which table within the database is relevant to the user query. In this case, there is a table called “d_Abalone” in the Db_Test_01 database, that has a column called Rings. This column is a proxy for the age of Aabalones (a type of mollusc), and other columns (like shell weight, height, etc.) are characteristics of Abalones that may help us determine age.

VerbaGPT Message Window

In addition to producing the chart, VerbaGPT also provides the user with the following (editable) code. If the user chooses to, they can edit the code before hitting execute button, or simply hit execute without editing the code to produce the results shown above.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load data from the Abalone table into a pandas DataFrame
query = "SELECT * FROM d_Abalone"
data = pd.read_sql(query, conn)

# One-hot-encode the 'Sex' column
data = pd.get_dummies(data, columns=['Sex'])

# Split data into features and target
X = data.drop('Rings', axis=1)
y = data['Rings']

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Create and fit a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate performance metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print performance metrics
print(f'Mean Squared Error: {mse:.2f}')
print(f'R^2 Score: {r2:.2f}')

This is a great looking code and a fantastic starting point for someone who may wish to take the modeling further. To think, this was accomplished by asking a simple question, no jargon, and getting a complete answer with relatively complex statistical modeling (in a typical business context) done in seconds.

LangChain + OpenAI response:

Cost of query: $0.25

The langchain agent’s reponse is…not good. Since the image is hard to read, the full read-out is below. For some reason, it selects the top 5 actual rows of data, and then simply tells me that a model can be created using the other variables in the data table. Sort of restating the question.

Now, we are using the SQL agent since we are specifically interested in querying our SQL assets, there are other langchain agents (python?) that perhaps could do this task in conjunction with the SQL agent, something we haven’t build out / tested yet.

> Entering new SQLDatabaseChain chain...
Give me a model to predict Rings. Provide at least two performance metrics.
SQLQuery:SELECT TOP 5 [Length], [Diameter], [Height], [WholeWeight], [ShuckedWeight], [VisceraWeight], [ShellWeight], [Rings] FROM [d_Abalone]
SQLResult: [(0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15), (0.35, 0.265, 0.09, 0.2255, 0.0995, 0.0485, 0.07, 7), (0.53, 0.42, 0.135, 0.677, 0.2565, 0.1415, 0.21, 9), (0.44, 0.365, 0.125, 0.516, 0.2155, 0.114, 0.155, 10), (0.33, 0.255, 0.08, 0.205, 0.0895, 0.0395, 0.055, 7)]
Answer:A model to predict Rings can be created using the Length, Diameter, Height, WholeWeight, ShuckedWeight, VisceraWeight, and ShellWeight columns from the d_Abalone table. Performance metrics for this model can include accuracy and precision.
> Finished chain.
'A model to predict Rings can be created using the Length, Diameter, Height, WholeWeight, ShuckedWeight, VisceraWeight, and ShellWeight columns from the d_Abalone table. Performance metrics for this model can include accuracy and precision.'

As stated in earlier posts, the langchain agent did need an explicit and direct connection to the relevant database (i.e., Db_Test_01) on our testing system in order to attempt an answer to this question

*Db_Test_01 is a testing database on our system.


Posted

in

,

by

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *