<a href="https://colab.research.google.com/github/Ghat0tkach/github-triage-bot/blob/main/Github_Triage_bot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install dependencies and clone the target repository before importing
!pip install transformers torch tqdm
!pip install groq
!git clone https://github.com/Ghat0tkach/jlug-lenscape-event-frontend.git

import os
import torch
from transformers import AutoTokenizer, AutoModel
from tqdm import tqdm

In [None]:
# Load pre-trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

In [3]:
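def chunk_code(code, max_length=510):
    """Split source text into chunks of at most `max_length` tokens.

    510 presumably leaves room for the two special tokens the tokenizer
    adds, keeping each chunk within CodeBERT's 512-token input limit.
    """
    tokens = tokenizer.tokenize(code)
    chunks = []
    for i in range(0, len(tokens), max_length):
        chunk = tokens[i:i + max_length]
        chunks.append(tokenizer.convert_tokens_to_string(chunk))
    return chunks

def get_embeddings(code_chunk):
    """Return the first-token (<s>) embedding of a chunk, shape (1, 768)."""
    inputs = tokenizer(code_chunk, return_tensors="pt", truncation=True, max_length=512, padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].numpy()

def process_file(file_path):
    """Chunk one file and embed each chunk."""
    # Read as UTF-8 and skip undecodable bytes so one odd file does not abort the run
    with open(file_path, 'r', encoding='utf-8', errors='ignore') as file:
        content = file.read()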

    chunks = chunk_code(content)
    embeddings = []
    metadata = []

    for i, chunk in enumerate(chunks):
        embedding = get_embeddings(chunk)
        embeddings.append(embedding)
        metadata.append({
            'file_path': file_path,
            'chunk_index': i,
            'chunk_content': chunk
        })

    return embeddings, metadata
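
The `max_length=510` default in `chunk_code` presumably leaves room for the two special tokens (`<s>` and `</s>`) that the CodeBERT tokenizer adds, so each chunk fits the model's 512-token limit. A minimal sketch of the same fixed-window split on a plain list, with no tokenizer involved; `split_into_windows` is a hypothetical helper name, not part of the notebook:

```python
def split_into_windows(tokens, max_length=510):
    """Split a token list into consecutive, non-overlapping windows."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return [tokens[i:i + max_length] for i in range(0, len(tokens), max_length)]


# 1200 stand-in tokens: two full windows plus a shorter final one.
windows = split_into_windows([f"tok{i}" for i in range(1200)])
window_sizes = [len(w) for w in windows]
print(window_sizes)
```

Because `chunk_code` converts each window back to a string, re-tokenizing that string may not reproduce exactly the same tokens; the `truncation=True` argument in `get_embeddings` guards against a chunk that ends up over the limit.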


In [6]:
ls

jlug-lenscape-event-frontend/  sample_data/


In [6]:

def process_repository(repo_path):
    all_embeddings = []
    all_metadata = []

    for root, _, files in os.walk(repo_path):
        for file in tqdm(files, desc="Processing files"):
            if file.endswith(('.py', '.js', '.java', '.cpp', '.c', '.html', '.css', '.jsx', '.tsx', '.ts')):  # Add more extensions as needed
                print("File ", file)
                file_path = os.path.join(root, file)
                embeddings, metadata = process_file(file_path)
                all_embeddings.extend(embeddings)
                all_metadata.extend(metadata)

    return all_embeddings, all_metadata

# Process the repository
repo_path = 'jlug-lenscape-event-frontend'  # Adjust this to the cloned repo's path
embeddings, metadata = process_repository(repo_path)

print(f"Total embeddings generated: {len(embeddings)}")
print(f"Sample embedding shape: {embeddings[0].shape}")
print(f"Sample metadata: {metadata[0]}")

File  tailwind.config.ts
File  useUserStore.ts
File  globals.css
File  page.tsx
File  layout.tsx
File  user.ts
File  post.ts
File  page.tsx
File  layout.tsx
File  page.tsx
File  layout.tsx
File  checkServer.ts
File  posts.tsx
File  userApi.ts
File  page.tsx
File  layout.tsx
File  page.tsx
File  layout.tsx
File  auth.utils.ts
File  utils.ts
File  index.ts
File  postDialogs.tsx
File  faq.tsx
File  teamMembersAndInvitation.tsx
File  likedPosts.tsx
File  googleAnalytics.tsx
File  userInfoCard.tsx
File  participantAnalytics.tsx
File  footer.tsx
File  loader.tsx
File  imageModel.tsx
File  homePageHeader.tsx
File  lockedPosts.tsx
File  alert.tsx
File  container-scroll-animation.tsx
File  input.tsx
File  card.tsx
File  dialog.tsx
File  flip-words.tsx
File  background-lines.tsx
File  toast.tsx
File  tooltip.tsx
File  hero-parallax.tsx
File  checkbox.tsx
File  button.tsx
File  label.tsx
File  use-toast.ts

Total embeddings generated: 158
Sample embedding shape: (1, 768)
Sample metadata: {'file_path': 'jlug-lenscape-event-frontend/tailwind.config.ts', 'chunk_index': 0, 'chunk_content': 'import type { Config } from "tailwindcss"\n\nconst config = {\n  darkMode: ["class"],\n  content: [\n    \'./pages/**/*.{ts,tsx}\',\n    \'./components/**/*.{ts,tsx}\',\n    \'./app/**/*.{ts,tsx}\',\n    \'./src/**/*.{ts,tsx}\',\n\t],\n  prefix: "",\n  theme: {\n    container: {\n      center: true,\n      padding: "2rem",\n      screens: {\n        "2xl": "1400px",\n      },\n    },\n    extend: {\n      colors: {\n        border: "hsl(var(--border))",\n        input: "hsl(var(--input))",\n        ring: "hsl(var(--ring))",\n        background: "hsl(var(--background))",\n        foreground: "hsl(var(--foreground))",\n        primary: {\n          DEFAULT: "hsl(var(--primary))",\n          foreground: "hsl(var(--primary-foreground))",\n        },\n        secondary: {\n          DEFAULT: "hsl(var(--secondar




In [10]:
import numpy as np
import json

# Save embeddings
embeddings_array = np.array([e[0] for e in embeddings])
np.save('embeddings.npy', embeddings_array)

# Save metadata
with open('metadata.json', 'w') as f:
    json.dump(metadata, f)

print("Embeddings saved to 'embeddings.npy'")
print("Metadata saved to 'metadata.json'")

Embeddings saved to 'embeddings.npy'
Metadata saved to 'metadata.json'
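
The retrieval step later pairs row `i` of `embeddings.npy` with entry `i` of `metadata.json`, so the two files must stay aligned. A minimal sketch of a sanity check that could run before saving; `check_alignment` and the toy records are hypothetical and not part of the notebook:

```python
import json


def check_alignment(num_embeddings, metadata):
    """Return True if there is one metadata record per embedding row and
    chunk indices start at 0 and increase by 1 within each file."""
    if num_embeddings != len(metadata):
        return False
    expected_next = {}
    for record in metadata:
        path = record["file_path"]
        if record["chunk_index"] != expected_next.get(path, 0):
            return False
        expected_next[path] = record["chunk_index"] + 1
    return True


# Toy records shaped like the notebook's metadata (contents are made up).
toy_metadata = [
    {"file_path": "a.ts", "chunk_index": 0, "chunk_content": "const a = 1"},
    {"file_path": "a.ts", "chunk_index": 1, "chunk_content": "export default a"},
    {"file_path": "b.ts", "chunk_index": 0, "chunk_content": "const b = 2"},
]

# A JSON round trip keeps list order, so row i still matches record i.
reloaded = json.loads(json.dumps(toy_metadata))
aligned = check_alignment(3, reloaded)
misaligned = check_alignment(2, reloaded)
print(aligned, misaligned)
```

In the notebook this would be called as `check_alignment(len(embeddings), metadata)`.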


In [None]:
import numpy as np
import json
import pandas as pd

# Load embeddings
loaded_embeddings = np.load('embeddings.npy')

# Load metadata
with open('metadata.json', 'r') as f:
    loaded_metadata = json.load(f)

# Create a DataFrame with one named column per embedding dimension
df = pd.DataFrame(
    loaded_embeddings,
    columns=[f'dim_{i}' for i in range(loaded_embeddings.shape[1])]
)

# Add metadata columns
df['file_path'] = [item['file_path'] for item in loaded_metadata]
df['chunk_index'] = [item['chunk_index'] for item in loaded_metadata]

# Display the first few rows
print(df.head())

# Display info about the DataFrame (info() prints directly and returns None)
df.info()

# If you want to see all columns, you can use:
# pd.set_option('display.max_columns', None)
# print(df)

# Save as CSV if needed
df.to_csv('embeddings_with_metadata.csv', index=False)
print("Saved embeddings with metadata to 'embeddings_with_metadata.csv'")

In [14]:
import numpy as np
import json
from scipy.spatial.distance import cosine
from groq import Groq
from transformers import AutoTokenizer, AutoModel
import torch

# Load pre-trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def encode_query(query):
    inputs = tokenizer(query, return_tensors="pt", truncation=True, max_length=512, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].numpy()[0]

# Load embeddings and metadata
embeddings = np.load('embeddings.npy')
with open('metadata.json', 'r') as f:
    metadata = json.load(f)

def find_relevant_snippets(query, embeddings, metadata, top_k=3):
    query_embedding = encode_query(query)
    similarities = [1 - cosine(query_embedding, emb) for emb in embeddings]
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    return [metadata[i] for i in top_indices], [similarities[i] for i in top_indices]

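# Groq client; the key is read from the GROQ_API_KEY environment variable
# rather than being hard-coded in the notebook
import os
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))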

def query_groq(question, context, similarities):
    prompt = f"""You are an AI assistant specialized in answering questions about code.
    Given the following code snippets, their relevance scores, and a question, provide a detailed answer.
    Use the relevance scores to weight the importance of each snippet in your answer.

    Code snippets and their relevance scores:
    {context}

    Question: {question}

    Answer:"""

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="mixtral-8x7b-32768",
        temperature=0.5,
        max_tokens=1024,
    )

    return chat_completion.choices[0].message.content

# Main question-answering loop
while True:
    question = input("Ask a question about the code (or type 'exit' to quit): ")
    if question.lower() == 'exit':
        break

    relevant_snippets, similarities = find_relevant_snippets(question, embeddings, metadata)
    context = "\n\n".join([f"File: {snippet['file_path']}\nRelevance: {sim:.4f}\nChunk: {snippet['chunk_content']}"
                           for snippet, sim in zip(relevant_snippets, similarities)])

    answer = query_groq(question, context, similarities)
    print("\nAnswer:", answer)
    print("\n" + "="*50 + "\n")

Ask a question about the code (or type 'exit' to quit): hi

Answer: Hello! It seems like you've provided a question, but it's quite broad. I'm here to help with specific coding questions. If you have a question related to the code snippets you've given, I'd be happy to help with that. 

For instance, if you have a question about a function or a specific line of code in the files `jlug-lenscape-event-frontend/app/api/userApi.ts`, `jlug-lenscape-event-frontend/components/teamMembersAndInvitation.tsx`, or `jlug-lenscape-event-frontend/components/ui/container-scroll-animation.tsx`, I can provide a detailed answer based on those snippets, weighting the importance of each snippet according to their relevance scores.

However, for a broad question like "hi", it's a bit difficult to provide a meaningful response. If you could provide more details or specify what you'd like to know about the code, I'd be happy to help!


Ask a question about the code (or type 'exit' to quit): can you tell what 

KeyboardInterrupt: Interrupted by user
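
`find_relevant_snippets` ranks chunks by `1 - cosine(...)`: SciPy's `cosine` returns a distance, so subtracting it from 1 gives the cosine similarity. A minimal pure-Python sketch of the same ranking on toy vectors; `cosine_similarity` and `top_k_indices` are hypothetical helper names, and returning 0.0 for a zero vector is an assumption made here, not something the notebook does:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as unrelated
    return dot / (norm_a * norm_b)


def top_k_indices(query, vectors, top_k=3):
    """Return the indices and scores of the top_k most similar vectors."""
    scores = [cosine_similarity(query, v) for v in vectors]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = ranked[:top_k]
    return best, [scores[i] for i in best]


# Toy 2-D vectors standing in for the 768-dimensional chunk embeddings.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
indices, scores = top_k_indices([1.0, 0.0], toy_vectors, top_k=2)
print(indices, scores)
```

The ranking always returns `top_k` chunks, however low their scores, which is why even the query "hi" above was answered with three file names. Filtering by a minimum score before building the prompt would be one way to avoid that.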