![Vanna AI](https://img.vanna.ai/vanna-full.svg)

This notebook will help you unleash the full potential of AI-powered data analysis at your organization. We'll go through how to "bulk train" Vanna and generate SQL, tables, charts, and explanations, all with minimal code and effort. For more about Vanna, see our [intro blog post](https://medium.com/vanna-ai/intro-to-vanna-a-python-based-ai-sql-co-pilot-218c25b19c6a).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vanna-ai/vanna-py/blob/main/notebooks/vn-full.ipynb)

[![Open in GitHub](https://img.vanna.ai/github.svg)](https://github.com/vanna-ai/vanna-py/blob/main/notebooks/vn-full.ipynb)

# Install Vanna
First we install Vanna from [PyPI](https://pypi.org/project/vanna/) and import it.
Here, we'll also install the Snowflake connector. If you're using a different database, you'll need to install the appropriate connector.

In [None]:
%pip install vanna
%pip install snowflake-connector-python

In [2]:
import vanna as vn
import snowflake.connector
import pandas as pd

# Login
To create a login and get an API key, run the cell below, enter your email, and then enter the code we send you. Check your spam folder if you don't see the code.

In [3]:
api_key = vn.get_api_key('my-email@example.com')
vn.set_api_key(api_key)

# Set your Model
You need to choose a globally unique model name. Try using your company name or another unique string. Each model's data is isolated from every other model's, so there's no leakage.

In [4]:
vn.set_model('my-model') # Enter your model name here. It must be globally unique.

# Add Training Data
Instead of adding question / SQL pairs one by one, let's load a batch from a JSON file all at once. You'll create your own JSON file that represents your data. You can see the [format here](https://github.com/vanna-ai/vanna-training-queries/blob/main/tpc-h/questions.json).

In [None]:
# Path or URL of your training JSON; set this before running the cell.
training_json = ''  #@param {type:"string"}

for _, row in pd.read_json(training_json).iterrows():
  vn.train(question=row.question, sql=row.sql)
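
Before training, it can help to check that every entry in the file has the two fields the loop above reads. The following is a minimal sketch using only the standard library. The helper name `load_training_pairs` and the sample data are hypothetical, and it assumes the JSON is a list of objects with `question` and `sql` keys, as the loop's use of `row.question` and `row.sql` suggests.

```python
import json
import os
import tempfile


def load_training_pairs(path):
    """Return (question, sql) tuples from a JSON file, skipping incomplete entries.

    Assumes the file holds a list of objects with "question" and "sql" keys.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)

    pairs = []
    for entry in entries:
        question = (entry.get("question") or "").strip()
        sql = (entry.get("sql") or "").strip()
        if question and sql:
            pairs.append((question, sql))
    return pairs


# Write a small sample file so the sketch runs on its own (hypothetical data).
sample = [
    {"question": "How many customers are there?",
     "sql": "SELECT count(*) FROM customer;"},
    {"question": "An entry with no SQL is skipped", "sql": ""},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump(sample, tmp)
    sample_path = tmp.name

pairs = load_training_pairs(sample_path)
os.remove(sample_path)
print(len(pairs))  # 1
```

Each surviving pair could then be passed to `vn.train(question=..., sql=...)` exactly as in the cell above.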

# Set Database Connection
These details are only referenced within your notebook. These database credentials are never sent to Vanna's servers.

In [5]:
vn.connect_to_snowflake(account='my-account', username='my-username', password='my-password', database='my-database')
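
Hard-coding a password in a notebook makes it easy to share by accident. One option is to read the connection details from environment variables instead. This is a minimal sketch and not part of Vanna: the helper `snowflake_credentials_from_env` and the `SNOWFLAKE_*` variable names are hypothetical, and the returned keys simply mirror the keyword arguments used in the cell above.

```python
import os


def snowflake_credentials_from_env(environ=os.environ):
    """Collect Snowflake connection details from environment variables.

    The variable names are an assumption made for this sketch.
    """
    names = {
        "account": "SNOWFLAKE_ACCOUNT",
        "username": "SNOWFLAKE_USERNAME",
        "password": "SNOWFLAKE_PASSWORD",
        "database": "SNOWFLAKE_DATABASE",
    }
    missing = [var for var in names.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in names.items()}


# Demonstration with a stand-in mapping rather than the real environment.
demo_env = {
    "SNOWFLAKE_ACCOUNT": "my-account",
    "SNOWFLAKE_USERNAME": "my-username",
    "SNOWFLAKE_PASSWORD": "my-password",
    "SNOWFLAKE_DATABASE": "my-database",
}
creds = snowflake_credentials_from_env(demo_env)
print(sorted(creds))  # ['account', 'database', 'password', 'username']

# In the notebook, with the variables set and vanna imported as vn:
# vn.connect_to_snowflake(**snowflake_credentials_from_env())
```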

# Get Results
This generates the SQL, runs it to get a dataframe, and prints both. Note that we use your connection string to execute the SQL on your warehouse from your local instance. Neither your connection details nor your data are sent to Vanna's servers. For more info on how Vanna works, [see this post](https://medium.com/vanna-ai/how-vanna-works-how-to-train-it-data-security-8d8f2008042).

In [7]:
vn.ask()

Enter a question:  What are the top 10 customers by sales?


SELECT c.c_name as customer_name,
       sum(l.l_extendedprice * (1 - l.l_discount)) as total_sales
FROM   snowflake_sample_data.tpch_sf1.lineitem l join snowflake_sample_data.tpch_sf1.orders o
        ON l.l_orderkey = o.o_orderkey join snowflake_sample_data.tpch_sf1.customer c
        ON o.o_custkey = c.c_custkey
GROUP BY customer_name
ORDER BY total_sales desc limit 10;


|   | CUSTOMER_NAME | TOTAL_SALES |
|---|---|---|
| 0 | Customer#000143500 | 6757566.0218 |
| 1 | Customer#000095257 | 6294115.334 |
| 2 | Customer#000087115 | 6184649.5176 |
| 3 | Customer#000131113 | 6080943.8305 |
| 4 | Customer#000134380 | 6075141.9635 |
| 5 | Customer#000103834 | 6059770.3232 |
| 6 | Customer#000069682 | 6057779.0348 |
| 7 | Customer#000102022 | 6039653.6335 |
| 8 | Customer#000098587 | 6027021.5855 |
| 9 | Customer#000064660 | 5905659.6159 |




AI-generated follow-up questions:
What is the total sales amount for each of the top 10 customers?
Can you provide the details of the transactions for each of the top 10 customers?
Which products did the top 10 customers purchase the most?
What is the average sales amount for all customers?
What is the distribution of sales amounts among all customers?
How does the total sales of the top 10 customers compare to the total sales of the rest of the customers?
Are there any seasonal patterns in the sales of the top 10 customers?
Which regions do the top 10 customers belong to?
What is the percentage contribution of each of the top 10 customers to the total sales?
How have the sales of the top 10 customers changed over time?


# Improve Your Training Data
If the SQL ran in the previous section, that question-to-SQL pair is now part of the training set and will be remembered for future queries. If the SQL didn't run, you can try to improve your training data. You can store the correct SQL for a question using `vn.add_sql`.
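
As a sketch of that correction step, the helper below takes the storing function as a parameter so that it can be tried without an API key. The helper name `store_corrections` and the sample correction are hypothetical, and passing `question=` and `sql=` keywords to `vn.add_sql` is an assumption about its signature, so check the function's documentation in your installed version.

```python
def store_corrections(corrections, add_sql):
    """Pass each corrected question / SQL pair to the given storing function.

    Returns the number of corrections handed over.
    """
    stored = 0
    for correction in corrections:
        add_sql(question=correction["question"], sql=correction["sql"])
        stored += 1
    return stored


# Hypothetical correction for a question whose generated SQL did not run.
corrections = [
    {
        "question": "How many orders are there?",
        "sql": "SELECT count(*) FROM snowflake_sample_data.tpch_sf1.orders;",
    },
]

# Stand-in for vn.add_sql so the sketch runs without a Vanna login.
recorded = []


def fake_add_sql(question, sql):
    recorded.append((question, sql))


count = store_corrections(corrections, fake_add_sql)
print(count)  # 1

# In the notebook (assuming the keyword names match):
# store_corrections(corrections, vn.add_sql)
```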

# Run as a Web App
If you would like to use this functionality in a web app, you can deploy the Vanna Streamlit app and use your own secrets. See [this repo](https://github.com/vanna-ai/vanna-streamlit).