Recently I’ve encountered a client who predicts that “in 6 months AI will be able to do much of the coding instead of humans”.
…in a few years you’ll be able to ask the AI, on the fly, to purchase a server or create a website with X website builder… and basically, I bet it will write code on demand that connects to these tools’ APIs to really make things happen. It could do this now for some easy stuff, but it’s unreliable and will mess up.
Now we’ve encountered an interesting public repo called Sketch. It’s an AI code-writing assistant for pandas (Python) users.
Sketch is a Python library that assists with writing code for data mining. It demonstrates a “standard” (hypothetical) data-analysis workflow, showing a natural-language interface that successfully navigates many tasks in the data-stack landscape.
How it works
Sketch uses efficient approximation algorithms (data sketches) to quickly summarize your data, and feed that information into language models. Right now it does this by summarizing the columns and writing these summary statistics as additional context to be used by the code-writing prompt. In the future we hope to feed these sketches directly into custom made “data + language” foundation models to get more accurate results.
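To make “summarize the columns, then write the summary as prompt context” concrete, here is a minimal, standard-library-only sketch. It is not Sketch’s actual implementation: the K-Minimum-Values (KMV) distinct-count sketch, the `KMVSketch` class and the `summarize_column` helper are hypothetical stand-ins chosen for illustration. KMV keeps only the k smallest hash values it has seen, so its memory stays bounded however long the column is, and it estimates the number of distinct values as (k - 1) divided by the k-th smallest hash.

```python
import hashlib
import heapq


def unit_hash(value) -> float:
    """Map a value to a pseudo-random float in [0, 1) using SHA-1 of its repr()."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Hypothetical K-Minimum-Values sketch for approximate distinct counts."""

    def __init__(self, k: int = 256):
        self.k = k
        self._heap = []     # negated hashes: a max-heap of the k smallest hashes
        self._kept = set()  # hashes currently stored, used to ignore duplicates

    def add(self, value) -> None:
        h = unit_hash(value)
        if h in self._kept:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # Swap out the largest stored hash for the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct values: exact
        return (self.k - 1) / -self._heap[0]  # -heap[0] is the k-th smallest hash


def summarize_column(name, values, k: int = 256, n_examples: int = 3) -> str:
    """One line of summary statistics for a column, usable as prompt context."""
    sketch = KMVSketch(k)
    count = nulls = 0
    examples = []
    for v in values:
        count += 1
        if v is None:
            nulls += 1
            continue
        sketch.add(v)
        if len(examples) < n_examples and v not in examples:
            examples.append(v)
    return (f"{name}: count={count}, nulls={nulls}, "
            f"approx_distinct={round(sketch.estimate())}, examples={examples}")
```

Joining one `summarize_column` line per column gives a compact text description of a table that could be prepended to a prompt, without sending the raw rows to the model.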
Main functionality
1. sketch.ask
ask is a basic question-answering system in Sketch: it returns a text answer based on the summary statistics and description of the data.
First install the library (in a notebook cell):
!pip install sketch
For example, suppose we have loaded a data set into memory:
import sketch
import pandas as pd
sales_data = pd.read_csv("https://gist.githubusercontent.com/bluecoconut/9ce2135aafb5c6ab2dc1d60ac595646e/raw/c93c3500a1f7fae469cba716f09358cfddea6343/sales_demo_with_pii_and_all_states.csv")
sales_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185950 entries, 0 to 185949
Data columns (total 8 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   Order ID          185950 non-null  int64
 1   Product           185950 non-null  object
 2   Quantity Ordered  185950 non-null  float64
 3   Price Each        185950 non-null  float64
 4   Order Date        185950 non-null  object
 5   Purchase Address  185950 non-null  object
 6   Credit Card       185950 non-null  object
 7   SSN               185950 non-null  object
dtypes: float64(2), int64(1), object(5)
memory usage: 11.3+ MB
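Now we can ask Sketch a question about the data:
sales_data.sketch.ask("Which columns are integer type?")
The reply is:
The columns that are integer type are: Order ID, Quantity Ordered, and Index.
The answer is only partly correct: according to info() above, Order ID is the only int64 column, Quantity Ordered is float64, and there is no column named Index.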
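How ask turns a question into a prompt is not shown in this post, so the following is a hypothetical sketch: the `build_ask_prompt` name and the template wording are assumptions, not Sketch’s real prompt. It combines per-column summaries (including dtypes) with the question into a single text prompt; putting the dtypes into the context is what would let a model answer a question like “Which columns are integer type?”. The call to the language model itself is omitted.

```python
from typing import Sequence


def build_ask_prompt(question: str, column_summaries: Sequence[str],
                     description: str = "") -> str:
    """Assemble a hypothetical question-answering prompt from column summaries."""
    lines = ["You are given a pandas DataFrame with the following columns:"]
    lines += [f"- {summary}" for summary in column_summaries]
    if description:
        lines.append(f"Description: {description}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")  # the model is expected to continue from here
    return "\n".join(lines)


summaries = ["Order ID: dtype=int64", "Quantity Ordered: dtype=float64"]
print(build_ask_prompt("Which columns are integer type?", summaries))
```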
2. sketch.howto
This is the basic “code-writing” prompt in Sketch. It returns a code block that you should be able to copy, paste, and use as a starting point. For example:
sales_data.sketch.howto("Plot the sales versus time")
It’ll generate the following code:
# import libraries
import matplotlib.pyplot as plt
import pandas as pd

# read the dataframe
df = pd.read_csv('sales_data.csv')

# convert the Order Date column to datetime format
df['Order Date'] = pd.to_datetime(df['Order Date'])

# create a new column for the month of the order date
df['Month'] = df['Order Date'].dt.month

# group by month and sum up the total sales for each month
monthly_sales = df.groupby('Month').sum()['Quantity Ordered']

# plot the sales versus time
plt.plot(monthly_sales)
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Sales versus Time')
plt.show()
To run the code on our data, replace the line df = pd.read_csv('sales_data.csv') with df = sales_data.
For example outputs, see the video on the GitHub repo page.
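The generated code groups by month number only, so if the data spans more than one year the same month from different years is merged, and it sums Quantity Ordered rather than revenue. Below is a minimal standard-library sketch of the same aggregation keyed by (year, month), with revenue computed as Quantity Ordered × Price Each. It assumes rows are dicts keyed by the column names above and that Order Date looks like `04/19/19 08:46` (format `%m/%d/%y %H:%M`); that format is an assumption, so check it against your data. The `monthly_revenue` helper is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumed Order Date format; adjust it to match your data.
DATE_FORMAT = "%m/%d/%y %H:%M"


def monthly_revenue(rows, date_format: str = DATE_FORMAT) -> dict:
    """Sum Quantity Ordered * Price Each per (year, month), in chronological order."""
    totals = defaultdict(float)
    for row in rows:
        when = datetime.strptime(row["Order Date"], date_format)
        totals[(when.year, when.month)] += row["Quantity Ordered"] * row["Price Each"]
    return dict(sorted(totals.items()))  # (year, month) tuples sort chronologically
```

Keying on (year, month) keeps December 2019 and December 2020 apart, which grouping on dt.month alone would not.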
3. sketch.apply
apply is a more advanced prompt, mainly useful for data generation: use it to parse fields, generate new features, and more. For example:
df['review_keywords'] = df.sketch.apply("Keywords for the review [{{ review_text }}] of product [{{ product_name }}] (comma separated):")
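Judging from the example, the {{ column }} placeholders are filled from each row’s values and the model is prompted once per row, producing one new value per row. The sketch below illustrates only that per-row templating step under this assumption. The double-brace syntax resembles Jinja2 templates, but here a small regex substitution stands in so that only the standard library is needed; `render`, `apply_prompt` and the `complete` callback (a placeholder for a language-model call) are hypothetical names, and the regex only matches identifier-style column names such as review_text.

```python
import re
from typing import Callable, Dict, List

# Matches {{ name }} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, row: Dict[str, object]) -> str:
    """Replace each {{ column }} placeholder with that column's value in the row."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def apply_prompt(template: str, rows: List[Dict[str, object]],
                 complete: Callable[[str], str]) -> List[str]:
    """Render the template once per row, send each prompt to `complete`
    (a stand-in for a language-model call), and collect the stripped replies."""
    return [complete(render(template, row)).strip() for row in rows]


template = ("Keywords for the review [{{ review_text }}] "
            "of product [{{ product_name }}] (comma separated):")
print(render(template, {"review_text": "Great sound", "product_name": "Headphones"}))
```

Passing any function that maps a prompt string to a reply string as `complete` is enough to test the templating without calling a real model.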
GPT-3 based
Sketch is based on GPT-3.
GPT-3 is the third generation of the GPT (Generative Pre-trained Transformer) series. With 175 billion parameters, it is significantly larger and more powerful than its predecessors.
Playground
There is a Sketch playground with examples on Google Colab.
Conclusion
As AI grows in capability, we’ll see more code-writing tools. As for my own machine-learning work, I’d gladly apply Sketch to data-mining problems.