<h1 align="center"> Generative AI Hackathon</h1>
<hr>

**➡️ Your task:** Learn about Generative AI by building your own Analytics Assistant using Python and LangChain!

**❗ Note:** This workshop is designed to be run in Jupyter Notebook. A `credentials.json` key file will be shared with you so that you can run this project.

### Pip install package dependencies

In [None]:
# %pip install --quiet "git+https://github.com/teamdatatonic/gen-ai-hackathon.git@feat/alvaro#egg=dt-gen-ai-analytics-helper"

In [None]:
!poetry install 
!poetry export --format requirements.txt --output requirements.txt
%pip install -r requirements.txt

### Launch Jupyter Notebook

In [None]:
!poetry run jupyter notebook

**❗ Note:** The Jupyter Notebook server will keep running until it is shut down manually.

## Analytics Assistant Hackathon - Start Here

### Vertex AI Endpoint

Currently, Vertex AI LLMs are accessible via Google Cloud projects. 

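1. Set `PROJECT_ID` and `DATASET_ID` to your Google Cloud project ID and BigQuery dataset ID, then set `GOOGLE_APPLICATION_CREDENTIALS` to the filepath of the credentials file (**❗ Note:** on Colab, the `/content/` folder is where uploaded files are stored by default).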

In [None]:
# Google Cloud project ID and BigQuery dataset ID for the hackathon; replace them if you use your own project
PROJECT_ID = 'dt-gen-ai-hackathon-dev'
DATASET_ID = 'database_analytics_demo_v2'

In [None]:
import os

# @title Set project credentials. { run: "auto", display-mode: "form" }
# @markdown Set the filepath to the `.json` credentials file.

GOOGLE_APPLICATION_CREDENTIALS = "credentials.json"  # @param {type:"string"}
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = GOOGLE_APPLICATION_CREDENTIALS

In [None]:
!gcloud config set account dt-gen-ai-hackathon-sa@dt-gen-ai-hackathon-dev.iam.gserviceaccount.com
!gcloud auth activate-service-account --key-file={GOOGLE_APPLICATION_CREDENTIALS}
!gcloud config set project {PROJECT_ID}

### Import packages

In [None]:
from datetime import date
from pathlib import Path
import time

import gradio as gr
import pandas as pd
from langchain import LLMChain, PromptTemplate
from langchain.agents import create_pandas_dataframe_agent, create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.agents.agent_types import AgentType
from langchain.llms import VertexAI
from langchain.sql_database import SQLDatabase
from langchain_experimental.sql.base import SQLDatabaseSequentialChain
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from tabulate import tabulate

## Begin the Hackathon

- Set up the BigQuery database
- Set up the LLM chain
- Example query to LLM

## Task 0

#### Create BigQuery Engine

In [None]:

class Database:
    """Thin wrapper around a SQLAlchemy engine, session factory, and connection."""

    def __init__(self, url: str, schema: str = None):
        print("creating db engine...")
        self.engine = self.create_engine(url)
        print("creating db session...")
        self.base = declarative_base()
        self.sessionmaker = sessionmaker(
            autocommit=True, autoflush=True, bind=self.engine
        )
        self.schema = schema
        print("creating db connection...")
        self.connect = self.engine.connect()

    def create_engine(self, url):
        # Delegates to SQLAlchemy's module-level create_engine
        return create_engine(url)

    @property
    def dialect(self) -> str:
        return self.engine.dialect.name

    def create_session(self):
        return self.sessionmaker()

    def create_connection(self):
        # Returns the connection opened in __init__ rather than opening a new one
        return self.connect
    

class BigQueryDatabase(Database):
    def __init__(
        self,
        project_id=PROJECT_ID,
        dataset_id=DATASET_ID,
    ):
        super().__init__(f"bigquery://{project_id}/{dataset_id}")
        self.schema = dataset_id

#### Create LLM Chain

In [None]:

TEST_PROMPT = '''
You are a GoogleSQL expert. Given an input question, first create a syntactically
correct GoogleSQL query to run, then look at the results of the query and return
the answer to the input question:{question}
'''

def create_sql_chain(llm, db, question):
    """Run a question through a SQL database chain backed by the given LLM.

    Returns the chain's answer and the SQL query it generated.
    """
    
    db_chain = SQLDatabaseSequentialChain.from_llm(
        llm,
        db,
        verbose=True,
        return_intermediate_steps=True,
    )
    test_prompt = PromptTemplate(template=TEST_PROMPT, input_variables=["question"])

    output = db_chain(test_prompt.format(question=question))
    sql_query = output["intermediate_steps"][1]
    response = output["result"]
    
    return response, sql_query


#### Initialize two LLM types - Code generation and Text generation

In [None]:
# Initialize text generation LLM
llm = VertexAI(model_name='text-bison@001',
               temperature=0, max_output_tokens=1024)

# Initialize code generation LLM
code_llm = VertexAI(model_name='code-bison@001', temperature=0, max_output_tokens=1024)


In [None]:
db = BigQueryDatabase(project_id=PROJECT_ID, dataset_id=DATASET_ID)
session = db.create_session()

conn = db.create_connection()

langchain_db = SQLDatabase(
    db.engine, schema=db.schema, sample_rows_in_table_info=0)

#### Test Code generation LLM with Query

In [None]:
# Define a function to query the SQL database chain
def query_database(question, llm, db):
    # Run the question through the chain; the generated SQL query is not used here
    answer, sql_query = create_sql_chain(llm=llm, db=db, question=question)

    return answer


In [None]:

query_database('How many customers are there?', llm=code_llm, db=langchain_db)

## Task 1: Create LLM Chain

This part of the hackathon is yours to implement. We have provided example code for reference, but it's up to you what you create!

#### Step 1 

- Can you write a better prompt?
- How can this prompt be used to improve the performance of the LLM?
- Can we use different LLMs to achieve better performance? For example, code generation and text generation LLMs.
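
As a starting point for the first two questions, here is a minimal sketch of a more explicit prompt. The rule wording, `IMPROVED_PROMPT`, and `build_prompt` are illustrative assumptions rather than part of the provided code, and whether these rules actually help should be checked against your own questions.

```python
from datetime import date

# Hypothetical prompt: the rules below are an illustration, not a tested recipe.
IMPROVED_PROMPT = """You are a GoogleSQL expert working with BigQuery.
Today's date is {today}.

Follow these rules when answering:
1. Only query tables and columns that exist in the schema you are given.
2. Select only the columns needed to answer the question.
3. Limit results to at most {max_rows} rows unless the question asks for more.
4. If the question cannot be answered from the data, say so instead of guessing.

Question: {question}
"""


def build_prompt(question: str, max_rows: int = 10) -> str:
    """Fill the template with the question, a row limit, and today's date."""
    return IMPROVED_PROMPT.format(
        today=date.today().isoformat(),
        max_rows=max_rows,
        question=question.strip(),
    )


print(build_prompt("How many customers are there?"))
```

The formatted string can be passed to the chain in place of the output of `TEST_PROMPT`, which makes it easy to compare the two prompts on the same question.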

#### Step 2: Create Simple Gradio Interface 
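
One possible shape for this step, as a sketch: a single text box in, a single text box out. `make_handler` and `build_interface` are hypothetical helper names, and the commented usage assumes `query_database`, `code_llm`, and `langchain_db` from the earlier cells.

```python
from typing import Callable


def make_handler(ask: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a question-answering callable with basic input checks."""

    def handler(question: str) -> str:
        question = (question or "").strip()
        if not question:
            return "Please enter a question."
        try:
            return str(ask(question))
        except Exception as exc:  # show the error in the UI instead of crashing
            return f"Something went wrong: {exc}"

    return handler


def build_interface(ask: Callable[[str], str]):
    """Build a single text-in, text-out Gradio form around `ask`."""
    import gradio as gr  # imported lazily so the handler can be tested without Gradio

    return gr.Interface(
        fn=make_handler(ask),
        inputs=gr.Textbox(label="Question"),
        outputs=gr.Textbox(label="Answer"),
        title="Analytics Assistant",
    )


# In the notebook (assumes the objects defined in earlier cells):
# demo = build_interface(lambda q: query_database(q, llm=code_llm, db=langchain_db))
# demo.launch()
```

Keeping the handler separate from the UI means the validation logic can be exercised with a stub function before any LLM call is made.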

## Task 2: Create SQL Agent with ToolKit

Now that we have a more capable LLM and a Gradio interface implemented, can we make it better?
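
A minimal sketch of one way to do this, using the `create_sql_agent` and `SQLDatabaseToolkit` imports from the "Import packages" cell. `build_sql_agent` is a hypothetical helper name, and the import paths are assumed to match the LangChain version installed for this hackathon; they may differ in newer releases.

```python
def build_sql_agent(llm, db, verbose: bool = True):
    """Create a SQL agent whose toolkit lets it inspect the database and run queries."""
    # Same import paths as the "Import packages" cell (assumption: they match
    # the installed LangChain version).
    from langchain.agents import create_sql_agent
    from langchain.agents.agent_toolkits import SQLDatabaseToolkit
    from langchain.agents.agent_types import AgentType

    # The toolkit bundles the database tools the agent is allowed to call.
    toolkit = SQLDatabaseToolkit(db=db, llm=llm)
    return create_sql_agent(
        llm=llm,
        toolkit=toolkit,
        agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
        verbose=verbose,
    )


# In the notebook (assumes code_llm and langchain_db from earlier cells):
# agent = build_sql_agent(code_llm, langchain_db)
# agent.run("How many customers are there?")
```

Unlike the sequential chain used in Task 0, an agent decides step by step which tool to call, so it can look at the schema or retry a failed query before answering.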

## Task 3: Create Gradio chatbot
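
A possible starting point, assuming a Gradio release that provides `gr.ChatInterface`. `make_chat_fn` and `build_chatbot` are hypothetical helper names; the sketch answers each message on its own and ignores the chat history, which you may want to change.

```python
from typing import Callable, List


def make_chat_fn(ask: Callable[[str], str]) -> Callable[[str, List], str]:
    """Adapt a single-question callable to Gradio's (message, history) chat signature."""

    def chat(message: str, history: List) -> str:
        # `history` holds the earlier turns; this sketch does not use it.
        message = (message or "").strip()
        if not message:
            return "Please enter a question."
        return str(ask(message))

    return chat


def build_chatbot(ask: Callable[[str], str]):
    """Build a chat UI that sends every user message to `ask`."""
    import gradio as gr  # imported lazily so the chat function can be tested without Gradio

    return gr.ChatInterface(fn=make_chat_fn(ask), title="Analytics Assistant")


# In the notebook (assumes the objects defined in earlier cells):
# demo = build_chatbot(lambda q: query_database(q, llm=code_llm, db=langchain_db))
# demo.launch()
```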

## Task 4: Create Pandas Agent

Now that we have made a SQL agent and a Gradio chatbot, we can go further and make a data analytics agent that can perform analysis and plot relevant charts.
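
As a sketch of where to begin: load a table into a DataFrame and hand it to `create_pandas_dataframe_agent`, which is imported in the "Import packages" cell. `build_pandas_agent` is a hypothetical helper name, the `customers` table in the usage comment is a hypothetical example, and the import path is assumed to match the LangChain version installed for this hackathon.

```python
def build_pandas_agent(llm, df, verbose: bool = True):
    """Create an agent that answers questions by running pandas code against `df`."""
    # Same import path as the "Import packages" cell (assumption: it matches
    # the installed LangChain version).
    from langchain.agents import create_pandas_dataframe_agent

    return create_pandas_dataframe_agent(llm, df, verbose=verbose)


# In the notebook (assumes pd, db and code_llm from earlier cells;
# `customers` is a hypothetical table name):
# df = pd.read_sql("SELECT * FROM customers LIMIT 1000", con=db.engine)
# agent = build_pandas_agent(code_llm, df)
# agent.run("How many rows are in the dataframe?")
```

Limiting the rows pulled into the DataFrame keeps memory use predictable; aggregate in SQL first if the table is large.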