Arnav Joshi's submission for Assignment #1 of INFO 5940.
Before starting, ensure you have the following installed on your system:
- Docker (Ensure Docker Desktop is running)
- VS Code
- VS Code Dev Containers extension (formerly Remote - Containers)
- Git
- OpenAI API Key
Installation for this assignment is nearly identical to cloning and setting up the course repository straight from GitHub. However, because the setup in that repository currently has minor issues (and some inefficiencies), the instructions here are slightly modified and in a different order.
Open a terminal and run:
git clone https://github.com/joshiarnav/INFO5940-A1.git
cd INFO5940-A1
Since docker-compose.yml expects environment variables, follow these steps:
- Inside the project folder, create a .env file. On Windows (PowerShell):
ni .env
or on Mac/Linux:
touch .env
- Add your API key and base URL, one variable per line:
OPENAI_API_KEY=your-api-key-here
OPENAI_BASE_URL=https://api.ai.it.cornell.edu/
TZ=America/New_York
Now, your API key will be automatically loaded inside the container.
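One way to confirm from inside the container that the variables were actually loaded is a small check like the one below. This is a minimal sketch, not part of the repository: the function name missing_env_vars is hypothetical, and the variable names are the ones from the .env example above.

```python
import os

# Variable names taken from the .env example; adjust if yours differ.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it from a terminal inside the container. If it reports missing variables, the .env file was probably not picked up when the container was built.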
- Open VS Code and navigate to the INFO5940-A1 folder.
- Open the Command Palette (Ctrl+Shift+P, or Cmd+Shift+P on Mac) and search for: Dev Containers: Rebuild and Reopen in Container
- Select this option. VS Code will build and open the project inside the container.
📌 Note: If you don’t see this option, ensure that the Dev Containers extension is installed.
- Navigate to the INFO5940-A1 folder.
- Run:
streamlit run advanced_chat.py
- Open a browser and navigate to http://localhost:8501 (or the port number shown in the terminal if different).
- Upload and chat! The app supports multiple PDF and text files. It uses OpenAI's embeddings to create vector embeddings, FAISS as the vector store and for vector similarity search, and OpenAI's LLMs for chat.
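To illustrate the retrieval step conceptually, the sketch below ranks text chunks by cosine similarity to a query. It is a pure-Python stand-in, not the code in advanced_chat.py: the real app uses OpenAI embeddings and a FAISS index, whereas the hypothetical toy_embed function here just counts words from a tiny fixed vocabulary so the example runs without an API key or extra packages.

```python
import math

# Hypothetical toy vocabulary; a real embedding model returns dense vectors instead.
VOCAB = ("docker", "container", "pdf", "upload", "vector", "search")


def toy_embed(text):
    """Stand-in for an embedding call: count vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


if __name__ == "__main__":
    chunks = [
        "Start the docker container before running the app",
        "Upload a pdf or text file to chat with it",
        "FAISS runs vector search over stored embeddings",
    ]
    print(top_k("how do I upload a pdf", chunks, k=1))
```

In the actual pipeline, the chunks returned by the search are what gets passed to the LLM as context for the chat response. A library such as FAISS does the same ranking job far more efficiently over large collections.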
- Deleted the old poetry.lock file (the existing poetry.lock in the lecture-05 branch of INFO-5940 caused a dependency error on all tested machines).
- Added the faiss-cpu package to the poetry packages (poetry add faiss-cpu).
- Added the PyPDF2 package to the poetry packages (poetry add PyPDF2).
- Deleted any extraneous folders/files for the assignment:
  - /notebooks
  - /data
  - chat_with_pdf.py
  - chat_with_rag.py
  - Chatbot.py
  - poetry.lock (explained above)
  - Old README.md
  - summary.py
  - tokens.py
- Modified README.md:
  - Modified the name of the extension for Dev Containers (Dev Containers: Rebuild and Reopen in Container). The original instructions in the lecture-05 branch of INFO-5940 were for Remote - Containers, which is deprecated.
  - Moved API key setup before starting the container (this avoids rebuilding the container to add the API key).
- Course Material (lecture-05 branch of INFO-5940)
- FAISS Documentation
- OpenAI Platform Page
- Ensure Docker Desktop is running.
- Run docker-compose up --build again.
- If errors persist, delete existing containers with:
docker-compose down
Then restart:
docker-compose up --build
- Ensure you’re using the correct port (8888).
- Run docker ps to check if the container is running.
- Check if .env is correctly created.
- Ensure docker-compose.yml includes env_file: - .env.
- Restart the container after making changes (docker-compose up --build).
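A quick way to check that .env is well formed is to parse it and look for the expected keys. The sketch below is an illustrative helper, not part of the repository: parse_env_text and missing_keys are hypothetical names, and the parser only handles simple KEY=VALUE lines (no quoting or export syntax). It also catches the mistake of putting several assignments on one line, because the later keys then never appear as keys of their own.

```python
from pathlib import Path

# Keys from the .env example in this README.
EXPECTED_KEYS = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "TZ")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text, expected=EXPECTED_KEYS):
    """Return expected keys that are absent or have an empty value."""
    values = parse_env_text(text)
    return [key for key in expected if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print(".env not found in the current directory")
    else:
        missing = missing_keys(env_path.read_text())
        print("Missing keys:", missing if missing else "none")
```

Run it from the project folder on the host machine, before rebuilding the container.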