# AI Scoring Experiment Notebook

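This notebook demonstrates the text analysis logic used in the Hackflow AI service.
It vectorizes text with TF-IDF and scores each project submission by cosine similarity against three ideal archetypes (innovation, technical, business). It also reports a VADER sentiment score and a lexical diversity ratio for each submission.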
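
To make the scoring idea concrete, here is a minimal standard-library sketch of TF-IDF weighting followed by cosine similarity. It is a simplified illustration, not the code the service runs: the `tokenize`, `fit_idf`, `tfidf`, and `cosine` helpers and the toy archetypes are all hypothetical, and the tokenizer is far cruder than the one in scikit-learn's `TfidfVectorizer`. The IDF formula used is the smoothed form that scikit-learn documents as its default.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    # Term counts weighted by IDF; terms unseen during fitting are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(a, b):
    # Dot product over the product of vector lengths; 0.0 if either vector is empty.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy archetypes (assumption: stand-ins, not the notebook's real descriptions).
toy_archetypes = [
    "scalable secure microservices architecture",
    "clear business model and monetization",
]
idf = fit_idf(toy_archetypes)
archetype_vectors = [tfidf(doc, idf) for doc in toy_archetypes]

submission = tfidf("a microservices architecture with docker", idf)
scores = [cosine(submission, vec) for vec in archetype_vectors]
print(scores)  # first score is about 0.707, second is 0.0
```

The submission shares two of the four terms in the first toy archetype and none with the second, which is why the first score is high and the second is zero. Words such as "docker" that never appeared in the archetypes contribute nothing.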

In [None]:
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

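# Download NLTK data (lightweight).
# nltk.data.find() takes a resource path (category/name), not the bare package id.
try:
    nltk.data.find('sentiment/vader_lexicon.zip')
except LookupError:
    nltk.download('vader_lexicon')

# Tokenizer models for nltk.word_tokenize. Recent NLTK releases may look for
# 'punkt_tab' instead; if word_tokenize raises LookupError, download that too.
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')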

In [None]:
# --- IDEAL PROJECT ARCHETYPES ---
# We compare submissions against these "perfect" descriptions.

IDEAL_INNOVATION = """
This project introduces a novel, groundbreaking approach to solving a complex problem. 
It utilizes state-of-the-art technology, unique algorithms, and patent-pending methods. 
The solution is a game-changer, disrupting the current market with creativity and out-of-the-box thinking.
It leverages generative AI, blockchain, or quantum computing in a way never seen before.
"""

IDEAL_TECHNICAL = """
The system architecture is highly scalable, secure, and optimized for performance. 
It uses microservices, Docker, Kubernetes, and efficient database schemas. 
The code is clean, modular, and follows best practices with low latency and high throughput. 
API endpoints are RESTful or GraphQL, with robust authentication and encryption.
Implementation details show a deep understanding of the tech stack (React, Node, Python, cloud).
"""

IDEAL_BUSINESS = """
The product has a clear target audience and a strong business model. 
It addresses a significant market need with a cost-effective solution. 
The plan includes user acquisition strategies, retention metrics, and a path to monetization.
It demonstrates high ROI, financial feasibility, and real-world impact.
"""

# Combined corpus for vectorization training
CORPUS = [IDEAL_INNOVATION, IDEAL_TECHNICAL, IDEAL_BUSINESS]

# Initialize Vectorizer
vectorizer = TfidfVectorizer(stop_words='english')
# "Train" the vectorizer on our ideal archetypes
tfidf_matrix_archetypes = vectorizer.fit_transform(CORPUS)

# Initialize Sentiment Analyzer
sia = SentimentIntensityAnalyzer()
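
Because the vectorizer is fitted only on the archetype texts, its vocabulary is limited to words that appear in them. The sketch below illustrates the consequence on a toy corpus (an assumption, not the notebook's `CORPUS`): a submission that shares no terms with the archetypes transforms to an all-zero vector and gets a similarity of 0 against every archetype. The names `toy_corpus`, `toy_vectorizer`, and `toy_matrix` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the archetypes (assumption: not the notebook's real corpus).
toy_corpus = [
    "novel groundbreaking algorithms",
    "scalable microservices architecture",
    "clear business model",
]
toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_matrix = toy_vectorizer.fit_transform(toy_corpus)

# The learned vocabulary is the full set of terms the scorer can "see".
print(sorted(toy_vectorizer.vocabulary_))

in_vocab = toy_vectorizer.transform(["a scalable architecture"])
out_of_vocab = toy_vectorizer.transform(["simple website with login page"])

print(cosine_similarity(in_vocab, toy_matrix))      # non-zero for the second archetype
print(cosine_similarity(out_of_vocab, toy_matrix))  # all zeros: no shared terms
print(out_of_vocab.nnz)                             # 0 stored non-zero entries
```

A zero score therefore means "no overlapping vocabulary", which is not the same as "low quality". Longer or more varied archetype texts would widen what the scorer can match.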

In [None]:
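def analyze_project(name, notes_text, extracted_text, github_url):
    """Score a submission against the three archetypes and print the results.

    Returns (innovation, technical, business) cosine similarities, or None when
    there is too little text. `github_url` is accepted but currently unused.
    """
    print(f"--- Analyzing: {name} ---")
    combined_text = (notes_text or "") + "\n" + (extracted_text or "")

    # combined_text always contains the "\n" separator, so test the stripped length.
    if len(combined_text.strip()) < 10:
        print("Text too short to analyze.")
        return None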

    # 1. Similarity Scoring
    submission_vector = vectorizer.transform([combined_text])
    similarities = cosine_similarity(submission_vector, tfidf_matrix_archetypes)
    
    inn_score = similarities[0][0]
    tech_score = similarities[0][1]
    biz_score = similarities[0][2]
    
    # 2. Sentiment Analysis
    sentiment = sia.polarity_scores(combined_text)['compound']
    
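    # 3. Lexical Diversity (type-token ratio: unique tokens / total tokens).
    # Tokens are case-sensitive and include punctuation.
    words = nltk.word_tokenize(combined_text)
    unique_words = set(words)
    diversity = len(unique_words) / len(words) if words else 0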

    print(f"Innovation Match:  {inn_score:.4f}")
    print(f"Technical Match:   {tech_score:.4f}")
    print(f"Business Match:    {biz_score:.4f}")
    print(f"Sentiment Score:   {sentiment:.4f}")
    print(f"Lexical Diversity: {diversity:.4f}")
    
    return inn_score, tech_score, biz_score
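
The lexical diversity step is a type-token ratio: unique tokens divided by total tokens. The sketch below is a hypothetical variant, not the function above. It assumes a simple regex tokenizer in place of `nltk.word_tokenize` and adds an optional lowercasing step, to show how much case handling alone can move the ratio. The helper name `lexical_diversity` and its `lowercase` parameter are illustrative.

```python
import re


def lexical_diversity(text, lowercase=True):
    """Type-token ratio over word-like tokens (regex stand-in for nltk.word_tokenize)."""
    if lowercase:
        text = text.lower()
    # Keep letters, digits, and apostrophes; punctuation is not counted as a token.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


sample = "Docker runs the API. The API runs in docker."
print(lexical_diversity(sample))                   # 5 unique / 9 tokens
print(lexical_diversity(sample, lowercase=False))  # 7 unique / 9 tokens
print(lexical_diversity(""))                       # 0.0 for empty input
```

Type-token ratios also tend to fall as texts get longer, so the value is best compared between submissions of similar length.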

In [None]:
# TEST CASE 1: Strong Submission
strong_notes = "We built a decentralized application using Ethereum smart contracts and IPFS for storage. The system uses Zero-Knowledge Proofs for privacy. It solves the issue of data sovereignty."
strong_tech = "Technically, we used React for frontend, Node.js for backend, and Solidity for contracts. The architecture is microservices based with Docker containerization."

analyze_project("Strong Submission", strong_notes, strong_tech, "http://github.com/repo")

In [None]:
# TEST CASE 2: Weak Submission
weak_notes = "We made a simple website."
weak_tech = "It has a login page."

analyze_project("Weak Submission", weak_notes, weak_tech, "")
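
Since `analyze_project` returns a tuple of three scores for analyzable text and `None` for text that is too short, calling code needs to handle both cases. The following is a minimal sketch of one way to do that. The `best_archetype` helper and `ARCHETYPE_LABELS` are hypothetical, and the score tuples are made-up placeholder values, not outputs of the test cases above.

```python
ARCHETYPE_LABELS = ("innovation", "technical", "business")


def best_archetype(result):
    """Return (label, score) of the closest archetype, or None if nothing matched."""
    if result is None:          # text was too short to analyze
        return None
    top = max(range(len(result)), key=lambda i: result[i])
    if result[top] <= 0.0:      # no shared vocabulary with any archetype
        return None
    return ARCHETYPE_LABELS[top], float(result[top])


# Placeholder score tuples for illustration only.
print(best_archetype((0.05, 0.31, 0.02)))  # ('technical', 0.31)
print(best_archetype((0.0, 0.0, 0.0)))     # None
print(best_archetype(None))                # None
```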