Learning Journey

This repo captures early learning activities acround the building of a local GenAi solution.

The approach taken is iterative using the folowing steps:

Query CHatGPT for guidance
Investigate the usefulness of the guidance
Expand on queries and document findings.

A Jupyter Notebook Environment would dramatically help here.

Approach 1

Creating a domain-specific text analytics application with a natural language interface (NLU) as the user experience (UX).

Goal	ChatGPT Guidance
Establish a comprehensive guide for a beginner to explore a corpus of text data from PDF files.	Prompt 1 Guidance

Observations

Code lacks integration of text-analytics with a model
Proposed code uses deprecated Open SDK APIs

Issues

Problem 1

ModuleNotFoundError: No Module Named openai

Solution 1

Follow the steps below to install the openai package for the current interpreter

Enter the python terminal session using python and then run the following code

import sys
print(sys.executable)

get the current interpreter path

/Users/dag/Code/sandbox/chatgpt-101/text_analytics_env/bin/python

Copy the path and install openai using the following command in the terminal

/Users/dag/Code/sandbox/chatgpt-101/text_analytics_env/bin/python -m pip install openai

Problem 2

OpenAI Deprecated API.

Solution 2

OpenAI SDK Migration

Problem 3

Setting OpenAI API Key

Solution 3

export OPENAI_API_KEY=<ENTER KEY  HERE>

Approach 2

Integration of text-analytics with language model

Creating a domain-specific text analytics application with a natural language interface (NLU) as the user experience (UX).

Goal	ChatGPT Guidance
Integrate the extracted PDF data into the text analytics application and ensure the language model (engine) can provide accurate, domain-specific responses.	Prompt 2 Guidance

Observations

Two options were proposed. Explored Option 2: Using Document Embeddings for Retrieval-Based Q&A which yielded numerous runt-time errors.

Issues

Problem 4

Integrated PDF data

Solution 4

See library docs

pip install -U sentence-transformers

Problem 5

Object of type SentenceTransformer is not JSON serializable

Research Inquiry 1

Creating a domain-specific text analytics application with a natural language interface using Python involves several steps. Here’s a detailed guide for beginners to establish a reusable set of Python scripts to accomplish this task, using the latest versions of OpenAI SDK and Streamlit for the UX.

Goal	ChatGPT Guidance
Propose a solution that manages embeddings and indexes manually with FAISS.	Prompt 1 Guidance

Research Inquiry 2

How would the solution to Approach 3 be modified and improved by using Vector DBs?

Goal	ChatGPT Guidance
Explore Vector DB benefits	Technology Comparison from Prompts 3 and 5

Several Vector DB options, namely, Milvus, Weaviate, Pinecone, Cassio, and MindsDB are considered and compared.

Approach 3

Test an alternative using MindsDB as the Vector DB. This will aloow for the managing and querying of embeddings. MindsDB is particularly suitable for integrating machine learning models with databases, and it can work well with vector search tasks.

Goal	ChatGPT Guidance
Leverage an open-source self managed vector db solution using MindsDB.	Prompt 4 Guidance

Observations

Setup

pip install mindsdb openai streamlit PyPDF2
brew install libmagic # for macOS

Create Virtual Env

python -m venv mindsdb-venv

Activate Virtual environment

source mindsdb-venv/bin/activate

Results

Abandoned approach. MAy still be viable but new insighjts suggest Chroma is a better approach.

Approach 4

Test an alternative using Chroma as the Vector DB. This will allow for the managing and querying of embeddings. Chroma is particularly suitable for integrating machine learning models with databases, and it can work well with vector search tasks.

Goal	ChatGPT Guidance
Leverage an open-source self managed vector db solution using Chroma.	Prompt 1 Guidance

Observations

Chroma Setup
Install and run Docker Hub Image chromadb/chroma:latest
This approach publically shares the PDF data with OpenAI Servers. To avoid this we can consider Local Embedding Generation.

Decision

Explore a Local Embedding Generation solution.

Approach 5

Test an alternative using Chroma as the Vector DB and local embedding. This will prevent the sharing of data and allow for the managing and querying of embeddings. Chroma is particularly suitable for integrating machine learning models with databases, and it can work well with vector search tasks.

Goal	ChatGPT Guidance
Leverage an open-source self managed vector db solution using Chroma and Local Embedding Generation.	Prompt 2 Guidance

Observations

Chroma Setup
Install and run Docker Hub Image chromadb/chroma:latest
See ULIDs
```
pip install py-ulid
```
Good Sentence Transformers Article

Status

Solution works in that it connects a front-end with local vector database that is primed with locally processed data. This will not scale but it helps to learn some of teh solution components.

The solution does not currently yield actual results. It is more of an operational example that needs tect analytics work.

Next Steps

Work on Text Analytics capabilities so that a query will actually result in a list of meaningful results.
Consider chuncking the PDF docs into sentences.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Learning Journey

Approach 1

Observations

Issues

Problem 1

Solution 1

Problem 2

Solution 2

Problem 3

Solution 3

Approach 2

Observations

Issues

Problem 4

Solution 4

Problem 5

Research Inquiry 1

Research Inquiry 2

Approach 3

Observations

Results

Approach 4

Observations

Decision

Approach 5

Observations

Status

Next Steps

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Approach 1		Approach 1
Approach 2		Approach 2
Approach 4		Approach 4
Approach 5		Approach 5
pdf_files		pdf_files
.gitignore		.gitignore
README.md		README.md

vinomaster/chatgpt-stack-tutorial

Folders and files

Latest commit

History

Repository files navigation

Learning Journey

Approach 1

Observations

Issues

Problem 1

Solution 1

Problem 2

Solution 2

Problem 3

Solution 3

Approach 2

Observations

Issues

Problem 4

Solution 4

Problem 5

Research Inquiry 1

Research Inquiry 2

Approach 3

Observations

Results

Approach 4

Observations

Decision

Approach 5

Observations

Status

Next Steps

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages