<a href="https://colab.research.google.com/github/yash-sawant/dc-know-it-all/blob/main/DC_Dolly_Expert_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
# Quote the requirement specifiers so the shell does not treat ">" as output redirection.
!pip install "accelerate>=0.12.0" "transformers[torch]==4.25.1"
!pip install langchain
# !python -m spacy download en_core_web_trf

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting langchain
  Downloading langchain-0.0.146-py3-none-any.whl (600 kB)
Collecting SQLAlchemy<2,>=1
  Downloading SQLAlchemy-1.4.47-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
Collecting async-timeout<5.0.0,>=4.0.0
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting openapi-schema-pydantic<2.0,>=1.2
  Downloading openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)
Collecting dataclasses-json<0.6.0,>=0.5.7
  Downloading dataclasses_json-0.5.

In [None]:
!git clone https://github.com/yash-sawant/dc-know-it-all.git

Cloning into 'dc-know-it-all'...
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 8 (delta 1), reused 4 (delta 1), pack-reused 0
Unpacking objects: 100% (8/8), 74.84 KiB | 4.68 MiB/s, done.


In [None]:
!python -m spacy download en_core_web_md

2023-04-21 16:45:33.844582: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-21 16:45:38.418470: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-04-21 16:45:38.419114: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-04-

In [None]:
import torch
from transformers import pipeline
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline

class Dolly:
  def __init__(self):
    # generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
    generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16,
                            trust_remote_code=True, device_map="auto", return_full_text=True)

    # template for an instruction with no input
    prompt = PromptTemplate(
        input_variables=["instruction"],
        template="{instruction}")

    # template for an instruction with input
    prompt_with_context = PromptTemplate(
        input_variables=["instruction", "context"],
        template="{instruction}\n\nInput:\n{context}")

    hf_pipeline = HuggingFacePipeline(pipeline=generate_text)
    self.llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)
    self.llm_context_chain = LLMChain(llm=hf_pipeline, prompt=prompt_with_context)

  def ask(self, ques):
    return self.llm_chain.predict(instruction=ques).lstrip()


  def ask_with_context(self, ques, context):
    return self.llm_context_chain.predict(instruction=ques, context=context).lstrip()
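
To see what text the two templates in the Dolly class produce before it reaches the model, the sketch below renders them with plain `str.format`. It is a stand-in for illustration only: `render_prompt` and the two constant names are hypothetical helpers, not part of LangChain, and the result should match `PromptTemplate` only for simple variables like these.

```python
# Plain-Python stand-ins for the two templates defined in the Dolly class.
INSTRUCTION_TEMPLATE = "{instruction}"
CONTEXT_TEMPLATE = "{instruction}\n\nInput:\n{context}"


def render_prompt(instruction, context=None):
    """Return the prompt text for an instruction, with or without context."""
    if context is None:
        return INSTRUCTION_TEMPLATE.format(instruction=instruction)
    return CONTEXT_TEMPLATE.format(instruction=instruction, context=context)


# Hypothetical question and context, used only to show the layout.
print(render_prompt("What is X?", "X is an example."))
```

Printing the rendered prompt this way is a cheap check that the retrieved context ends up under the `Input:` heading as intended.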


In [None]:
import numpy as np
import pandas as pd
import spacy
from sklearn.metrics.pairwise import cosine_similarity

DATA_PATH = 'dc-know-it-all/dc_data.xlsx'

class DCExpert:
    def __init__(self, path):
        self.df = pd.read_excel(path,
                           header=[0],
                           index_col=[0,1,2]
                           )
        self.df.dropna(inplace=True)
        print('Loading spacy...')
        self.nlp = spacy.load('en_core_web_md')#, disable=["lemmatizer", "tagger", "parser", "ner"])
        print('spacy data loaded')
        self.question_encoding = np.vstack(self.df['questions'].apply(lambda p:self.nlp(p).vector))
        # print(self.question_encoding)

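    def get_context(self, ques):
        # Embed the query and compare it with every stored question vector.
        ques_enc = self.nlp(ques).vector.reshape((1, -1))
        similarity_scores = cosine_similarity(ques_enc, self.question_encoding).reshape(-1)
        # Index of the most similar stored question.
        q_idx = int(np.argmax(similarity_scores))

        # Return the second level of the row's three-level index, used here as the context.
        return self.df.iloc[q_idx, :].name[1]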
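
The lookup in `get_context` amounts to a nearest-neighbour search by cosine similarity. The sketch below shows the same idea on toy three-dimensional vectors, using only the standard library so it runs without the spaCy model or scikit-learn. The vectors and the helper names `cosine` and `best_match` are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(query_vec, stored_vecs):
    """Return (index, score) of the stored vector most similar to the query."""
    scores = [cosine(query_vec, v) for v in stored_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy stand-ins for the stored question embeddings.
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
idx, score = best_match([0.9, 0.8, 0.0], stored)
print(idx, round(score, 3))
```

In the notebook, the returned index selects the row of the data frame whose stored question is closest to the user's question.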




In [None]:
exp1 = DCExpert(DATA_PATH)
dolly = Dolly()

Loading spacy...
spacy data loaded


Downloading (…)lve/main/config.json:   0%|          | 0.00/820 [00:00<?, ?B/s]

Downloading (…)instruct_pipeline.py:   0%|          | 0.00/9.10k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/5.68G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/450 [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/228 [00:00<?, ?B/s]

In [None]:
q = 'How are students chosen for programs that are very difficult to get into?'

dolly.ask_with_context(q,exp1.get_context(q))

'Wellness coaching is a conversation about the things you believe are important. It is an exploration of mindset, perspectives, values and priorities, and aims to help you move forward and toward wellness and self-development goals and/or actions you want to take. Wellness coaching can help you find clarity and confidence in who you are, relationships, making decisions, and goal attainment. Coaching can also be a journey in self-efficacy, boundaries, and holistic, personal care. Wellness coaching is based on the principal that individuals are resourceful and full, and can help people enhance the important areas of their lives. Coaching is not counselling, therapy or academic advising.'
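
The answer above describes wellness coaching even though the question asks about admission, which suggests the closest stored question was not a good match. One possible guard, sketched below under assumptions, is to use the retrieved context only when its similarity score clears a threshold. `pick_context` is a hypothetical helper and the default `min_score=0.8` is an arbitrary illustrative value, not something tuned on this data.

```python
def pick_context(scores, contexts, min_score=0.8):
    """Return the best-scoring context, or None if no score reaches min_score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < min_score:
        return None
    return contexts[best]


# Made-up scores and context labels, for illustration only.
contexts = ["admissions text", "wellness text"]
print(pick_context([0.91, 0.40], contexts))
print(pick_context([0.55, 0.60], contexts))
```

When the helper returns `None`, the caller could fall back to `dolly.ask(q)` instead of passing an unrelated passage as context.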

In [None]:
question = input("\nPlease enter your question: \n")
ans = ''
while True:
    # Retrieve context for the current question (the original reused `q` from the earlier cell).
    context = (ans + "\n" + exp1.get_context(question)).strip()
    ans = dolly.ask_with_context(question, context)
    print(ans)
    # Re-prompt until the reply starts with Y or N; empty replies no longer raise IndexError.
    response = ''
    while response[:1] not in ('Y', 'N'):
        response = input("\nDo you want to ask another question based on this text (Y/N)? ").strip().upper()
    if response[0] == 'N':
        print("\nBye!")
        break
    question = input("\nPlease enter your question: \n")


Please enter your question: 
What are the financial options at Durham College?
The financial options at Durham College include:
- Free personal wellness coaching to help you move forward and toward wellness and self-development goals and/or actions you want to take
- Wellness workshops and events such as Friday Q&A, Healthy Living Series, Master Class and more
- Wellness Based Workshops and Event Organizers receive a fee for logistics, printing and promotional items related to the event
- The Wellness Den offers a range of programs and services responsive to the unique wellness needs of each student. Some of these services include Drop-in Wednesdays between 10:00am-2:00pm to learn more and discover resources such as condoms, pregnancy tests, menstruation products and a healthy snack cupboard.

Do you want to ask another question based on this text (Y/N)? Y

Please enter your question: 
Are there any scholarships available?
The following is a list of scholarships available for undergra