Welcome to CodeChat, a web app for searching and chatting with a GitHub repository in natural language. CodeChat is built with llama-index, a ChromaDB vector store, and Streamlit.
Click the image above to watch a brief demonstration of CodeChat in action.
Index your GitHub repo by providing its URL.
CodeChat is a web application for developers and collaborators working on GitHub repositories. It combines llama-index for indexing and retrieval, a ChromaDB vector store for embedding-based semantic search, and Streamlit for the user interface, so that querying and discussing code happens in one place.
- Enter GitHub URL: Users start by entering the URL of the GitHub repository they want to explore and communicate with.
- Natural Language Interaction: Once the repository is loaded, users can interact with the codebase using natural language queries. CodeChat leverages a unique combination of keyword search (bm25), vector indexing, and reranking to provide accurate and relevant results.
- Chat Interface: The chat interface allows users to ask questions, seek code snippets, or discuss specific parts of the codebase. Responses are generated based on the RAG pipeline, providing a more contextual and meaningful interaction.
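As an illustration of the first step, here is a minimal sketch of how a repository URL could be split into an owner and a repository name before indexing. The helper name `parse_github_url` is hypothetical and is not taken from CodeChat's source; the app may handle URLs differently.

```python
from urllib.parse import urlparse


def parse_github_url(url: str) -> tuple[str, str]:
    """Return (owner, repo) from a GitHub repository URL.

    Hypothetical helper for illustration; not taken from CodeChat's source.
    """
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"Not a GitHub URL: {url!r}")
    # The path looks like "/owner/repo" or "/owner/repo.git", possibly with extras.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Expected /owner/repo in URL: {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


if __name__ == "__main__":
    print(parse_github_url("https://github.com/zmusaddique/codechat.git"))
    # prints ('zmusaddique', 'codechat')
```

Validating the URL up front lets the app report a bad input before any indexing work starts.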
- llama-index: Framework for indexing the repository and running retrieval queries.
- ChromaDB: Vector store for code embeddings, enabling semantic search over the code.
- Streamlit: User-friendly web application framework.
CodeChat employs a RAG (Retrieval-Augmented Generation) pipeline that involves fusion of keyword and vector search to enhance the interaction between users and code repositories. The pipeline involves:
- Keyword Search (bm25): Initial retrieval of relevant code snippets based on keyword queries.
- Vector Indexing: Embedding-based semantic retrieval of code using ChromaDB's vector store.
- Reranking: Reordering of the combined results by relevance to provide more accurate and context-aware responses.
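The fusion of keyword and vector results can be sketched with reciprocal rank fusion (RRF). This is an assumption for illustration: the README does not say which fusion method CodeChat uses, and the file names below are made up. The sketch covers only the fusion step, not BM25 scoring, embedding lookup, or the final reranker.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (highest first); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical results for one query; the file names are made up.
    bm25_hits = ["auth.py", "utils.py", "db.py"]
    vector_hits = ["db.py", "auth.py", "models.py"]
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
    # prints ['auth.py', 'db.py', 'utils.py', 'models.py']
```

RRF works on ranks rather than raw scores, so BM25 scores and vector similarities do not need to be normalized to a common scale before they are combined.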
To set up CodeChat locally, follow these steps:
- Clone the Repository:
    git clone https://github.com/zmusaddique/codechat.git
    cd codechat
- Install Dependencies:
    pip install -r requirements.txt
- Run the Application:
    streamlit run app.py
- Open in Browser: Open your web browser and go to http://localhost:8501 to access CodeChat.
Before running CodeChat, you need to set up a .env file to configure the necessary environment variables. Follow these steps:
- Modify the .env File: Find the file named .env in the root directory of your CodeChat project.
- Add the Following Variables: Open the .env file in a text editor and add the following variables:
    # GitHub API Token (Generate one at https://github.com/settings/tokens)
    GITHUB_API_TOKEN=your-github-api-token
    # Hugging Face API Token (Generate one at https://huggingface.co/signup)
    HUGGING_FACE_API_TOKEN=your-hugging-face-api-token
  Replace your-github-api-token with your GitHub API token. If you don't have one, you can generate it at https://github.com/settings/tokens. Likewise, replace your-hugging-face-api-token with your Hugging Face API token.
- Save the File: Save the changes to the .env file.
Now, when you run CodeChat, it will automatically read these environment variables to authenticate with the GitHub API and Hugging Face.
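To make the file format concrete, here is a minimal sketch of parsing KEY=VALUE lines and checking that both tokens are present. The functions `parse_env` and `missing_tokens` are hypothetical names for illustration; how CodeChat actually loads the file is not stated here, and a dotenv library such as python-dotenv is a common choice in practice.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict.

    Illustrative only: skips blank lines and '#' comments, and does not
    handle quoting or escape sequences.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_tokens(values, required=("GITHUB_API_TOKEN", "HUGGING_FACE_API_TOKEN")):
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    # Sample file contents with the Hugging Face token left blank.
    sample = """
# GitHub API Token
GITHUB_API_TOKEN=your-github-api-token
HUGGING_FACE_API_TOKEN=
"""
    print(missing_tokens(parse_env(sample)))
    # prints ['HUGGING_FACE_API_TOKEN']
```

A check like this at startup turns a missing token into a clear message instead of an authentication failure later in the run.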
Make sure to keep your .env file secure and never share it publicly, as it may contain sensitive information. If you deploy CodeChat in a production environment, ensure that proper security measures are taken to protect these credentials.
Now you're ready to explore and chat with GitHub repositories using CodeChat!
Feel free to contribute, report issues, or suggest improvements. Happy coding! :)