In [1]:
import yaml

# load config
with open("config.yaml", "r") as file:
    config = yaml.safe_load(file)
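
A missing key in `config.yaml` only surfaces later as a bare `KeyError` in whichever cell first uses it. A minimal sketch of an up-front check follows; `require_keys` and `REQUIRED` are hypothetical names, and the key names are copied from the cells further down in this notebook.

```python
def require_keys(config, keys):
    """Raise one clear error listing every expected key that is missing or empty."""
    if not isinstance(config, dict):
        raise TypeError("config.yaml did not parse to a mapping")
    missing = [k for k in keys if not config.get(k)]
    if missing:
        raise KeyError(f"config.yaml is missing: {', '.join(missing)}")
    return config

# Key names exactly as the later cells read them (assumed to be the full set).
REQUIRED = ["chroma_api", "chroma_tenant", "genimi_api"]

# Illustrative values only; the real ones come from config.yaml.
example = {"chroma_api": "x", "chroma_tenant": "y", "genimi_api": "z"}
require_keys(example, REQUIRED)  # passes silently
```

In the notebook this would be called as `require_keys(config, REQUIRED)` right after loading.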

## extracting

In [2]:
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    """Return the text of every non-empty page, separated by blank lines."""
    all_text = []

    # The context manager closes the PDF once extraction is done
    with fitz.open(pdf_path) as doc:
        for page in doc:
            text = page.get_text("text").strip()
            if text:
                all_text.append(text)

    return "\n\n".join(all_text)

In [3]:
path = "doc/Ethan Rasiel, Ph.D., Paul N. Friga - The McKinsey Mind.pdf"
raw_text = extract_text_from_pdf(pdf_path=path)
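
The extracted text keeps the book's line-end hyphenation (the retrieved chunks further down end in fragments such as `prob-` and `man-`), which splits words across chunks. A minimal sketch of one possible cleanup step is below. `join_hyphenated_lines` is a hypothetical helper and the sample string is made up for illustration. The heuristic also removes genuine hyphens that happen to fall at a line end, so it is a trade-off rather than a safe default.

```python
import re

def join_hyphenated_lines(text):
    """Join a word that was split by a hyphen at the end of a line."""
    # A word character, a hyphen, a newline, then another word character:
    # drop the hyphen and the newline so the two halves become one word.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

# Illustrative sample, not taken from the book.
sample = "a clear imple-\nmentation plan"
print(join_hyphenated_lines(sample))
```

It could be applied as `raw_text = join_hyphenated_lines(raw_text)` before chunking.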

## chunking

In [4]:
from langchain.text_splitter import CharacterTextSplitter

# 2️⃣ Chunk your text
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=100,
    chunk_overlap=20
)
chunks = text_splitter.split_text(raw_text)

Created a chunk of size 125, which is longer than the specified 100
Created a chunk of size 127, which is longer than the specified 100
Created a chunk of size 122, which is longer than the specified 100
Created a chunk of size 127, which is longer than the specified 100
Created a chunk of size 119, which is longer than the specified 100
Created a chunk of size 118, which is longer than the specified 100
Created a chunk of size 124, which is longer than the specified 100
Created a chunk of size 121, which is longer than the specified 100
Created a chunk of size 122, which is longer than the specified 100
Created a chunk of size 125, which is longer than the specified 100
Created a chunk of size 127, which is longer than the specified 100
Created a chunk of size 129, which is longer than the specified 100
Created a chunk of size 125, which is longer than the specified 100
Created a chunk of size 122, which is longer than the specified 100
Created a chunk of size 132, which is longer tha
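
These warnings appear because `CharacterTextSplitter` only cuts at the separator: a single line longer than `chunk_size` is kept whole. One option is langchain's `RecursiveCharacterTextSplitter`, which tries several separators in turn. Another is a hard character cut as a fallback, sketched below in plain Python. `split_long_piece` and `enforce_max_size` are hypothetical helpers, not part of langchain.

```python
def split_long_piece(piece, chunk_size, overlap=0):
    """Cut one over-long piece into windows of at most chunk_size characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not piece:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(piece) - overlap, 1)
    return [piece[i:i + chunk_size] for i in range(0, last_start, step)]

def enforce_max_size(chunks, chunk_size, overlap=0):
    """Leave short chunks alone and hard-cut only the ones that are too long."""
    result = []
    for chunk in chunks:
        if len(chunk) <= chunk_size:
            result.append(chunk)
        else:
            result.extend(split_long_piece(chunk, chunk_size, overlap))
    return result

# A 125-character chunk, like the first warning above, with the notebook's settings.
parts = enforce_max_size(["x" * 125], chunk_size=100, overlap=20)
print([len(p) for p in parts])
```

A hard cut can break a word in half, so it is best kept as a last resort after separator-based splitting.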

In [6]:
chunks[:5]

['TEAMFLY\nTeam-Fly®\nT H E\nMIND\nMCKINSEY\nThis page intentionally left blank.',
 'New York   Chicago   San Francisco   Lisbon   London   Madrid   Mexico City',
 'Milan   New Delhi   San Juan   Seoul   Singapore   Sydney   Toronto\nT H E',
 'T H E\nE T H A N  M .  R A S I E L\nAU T H O R  O F  T H E  M C K I N S E Y  WAY\nA N D PAUL N. FRIGA',
 'A N D PAUL N. FRIGA\nUnderstanding and Implementing the Problem-']

## embedding

In [7]:
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
# Embed text
embeddings = embedding_model.encode(chunks)
print(f"Vector length: {len(embeddings[0])}")

Vector length: 384


## store in db

### chroma cloud

In [48]:
import chromadb
  
chroma_client = chromadb.CloudClient(
  api_key=config['chroma_api'],
  tenant=config['chroma_tenant'],
  database='first_rag'
)

In [66]:
# 3️⃣ Create or get collection
collection = chroma_client.get_or_create_collection(name="my_collection")

In [61]:
# # Cloud collections work the same way!
# collection.add(
#     ids=[f"doc_{i}" for i in range(len(chunks))],  # unique IDs
#     embeddings=embeddings.tolist(),               # must be list of lists!
#     documents=chunks,                              # optional, but useful
#     metadatas=[{"source": "example"} for _ in chunks]  # optional metadata
# )

# print("✅ Vectors stored in Chroma Cloud!")

In [76]:
# Store only the first n chunks instead of all of them
n = 300
collection.add(
    ids=[f"doc_{i}" for i in range(n)],
    embeddings=embeddings[:n].tolist(),
    documents=chunks[:n],
    metadatas=[{"source": "example"} for _ in range(n)]
)
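
Instead of truncating to the first `n` chunks, the full set could be uploaded in several smaller `add` calls. This is a sketch under assumptions: `add_in_batches` is a hypothetical helper, the batch size of 100 is arbitrary, and the reason the cell above stores only part of the data is not stated in the notebook. The only Chroma call used is `collection.add` with the same keyword arguments as above.

```python
def add_in_batches(collection, ids, embeddings, documents, metadatas, batch_size=100):
    """Call collection.add once per slice of at most batch_size items."""
    total = len(ids)
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        collection.add(
            ids=ids[start:end],
            embeddings=embeddings[start:end],
            documents=documents[start:end],
            metadatas=metadatas[start:end],
        )

# Usage in this notebook would look like this (left commented out on purpose):
# add_in_batches(
#     collection,
#     ids=[f"doc_{i}" for i in range(len(chunks))],
#     embeddings=embeddings.tolist(),
#     documents=chunks,
#     metadatas=[{"source": "example"} for _ in chunks],
# )
```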

### chroma local

In [8]:
import chromadb

# Local vector DB (PersistentClient = new style!)
chroma_client = chromadb.PersistentClient(path="./vector_database")

collection = chroma_client.get_or_create_collection(name="my_local_collection")

In [9]:
# Local collections use the same API as cloud collections
collection.add(
    ids=[f"doc_{i}" for i in range(len(chunks))],  # unique IDs
    embeddings=embeddings.tolist(),               # must be list of lists!
    documents=chunks,                              # optional, but useful
    metadatas=[{"source": "example"} for _ in chunks]  # optional metadata
)

## retrieve

In [42]:
query = "How to think like McKinsey?"

# Embed locally
query_embedding = embedding_model.encode([query])

# Search whichever Chroma collection is currently bound to `collection`
results = collection.query(
    query_embeddings=query_embedding.tolist(),
    n_results=300
)

# Extract relevant chunks
contexts = results['documents'][0]
context_text = "\n".join(contexts)

In [43]:
contexts[:10]

['really felt at McKinsey like you had an advocate.',
 'the “McKinsey Mind” not to confine itself to brilliant prob-',
 'The McKinsey Mind\nObviously, even more than the lessons on advancing one’s career,',
 'model of the McKinsey Mind—analyzing, presenting, and man-',
 'If there is a stereotype of McKinsey in the minds of business-',
 'The McKinsey Mind\nhave an elevator ride to get your point across to them. What',
 'The McKinsey Mind. Those of you who want to follow up on your',
 'McKinsey has developed a strong, fact-based culture that man-',
 'Is there a question about McKinsey we’ve forgotten to ask? What’s your\nanswer?',
 'in my post-McKinsey career involves the search for the very']
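
The ranking above comes from comparing the query vector with each stored chunk vector. A rough sketch of that idea in plain Python, using cosine similarity on toy 2-D vectors, is below. It is an illustration only: Chroma's distance metric is configurable per collection and it uses an approximate index rather than this brute-force loop. `cosine_similarity` and `top_k` are hypothetical names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, docs, k=3):
    """Return the k (score, document) pairs with the highest similarity."""
    scored = [(cosine_similarity(query_vec, v), d) for v, d in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy data standing in for the 384-dimensional embeddings.
toy_docs = ["a", "b", "c"]
toy_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for score, doc in top_k([1.0, 0.1], toy_vecs, toy_docs, k=2):
    print(f"{doc}: {score:.3f}")
```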

In [36]:
import google.generativeai as genai

# Configure the SDK
genai.configure(api_key=config['genimi_api'])

# Create the Gemini 2.5 Flash model
gemini_model = genai.GenerativeModel("gemini-2.5-flash")

In [40]:
response = gemini_model.generate_content(query)

print(response.text)

Thinking like McKinsey (or any top-tier management consulting firm) isn't about memorizing a checklist; it's about internalizing a *mindset* and a *structured problem-solving approach*. It's a way of looking at complex business challenges and breaking them down into manageable, actionable pieces.

Here's a breakdown of how to cultivate this thinking:

---

**I. The Core Mindset Principles**

1.  **Problem-First & Client-Centric:**
    *   **Always start with the core problem:** Don't jump to solutions. What is the *real* question the client needs to answer? Often, clients present symptoms, not root causes. Use the "5 Whys" technique to dig deeper.
    *   **Focus on the client's needs:** What is their ultimate goal? How will this solution create tangible value for *them*? Understand their context, constraints, and capabilities.

2.  **Hypothesis-Driven:**
    *   **Formulate an educated guess early:** Before deep diving into data, develop a preliminary hypothesis about the problem and 

In [45]:
# Call Gemini again, this time with the retrieved context in the prompt
prompt = f"""Answer the question below using ONLY the context below.

Context:
{context_text}

Question:
{query}

Answer:"""

response = gemini_model.generate_content(prompt)

print(response.text)

To think like McKinsey, one should adopt a problem-solving and decision-making process that is fact-based and hypothesis-driven. This involves:

*   **Framing Problems:** Breaking down complex business problems to make them susceptible to rigorous fact-based analysis.
*   **Structured Thinking:** Utilizing a framework-driven approach to organize thoughts and simplify complex problems into clear representations (e.g., using two-by-two matrices or other simple structures).
*   **Fact-based Analysis:** Relying heavily on facts ("Facts are friendly") for problem-solving.
*   **Data Gathering and Analysis:** Employing techniques to gather, manage, and analyze data to test hypotheses and extract useful conclusions, generating insights.
*   **Communication and Presentation:** Presenting ideas with a flowing, logical structure to convey the full impact and ensure they are accepted, aiming to "make change happen."
*   **Balancing Intuition and Data:** While historically less emphasis on intuiti
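
The prompt above joins all 300 retrieved chunks into the context. A minimal sketch of a prompt builder that drops duplicate chunks and caps the context size is below. `build_prompt`, `max_chunks` and `max_chars` are hypothetical, and the limits are arbitrary starting points rather than tuned values. The prompt wording is copied from the cell above.

```python
def build_prompt(query, contexts, max_chunks=20, max_chars=4000):
    """Build the RAG prompt from the best-ranked unique chunks within a size budget."""
    seen = set()
    selected = []
    used = 0
    for chunk in contexts:  # contexts is assumed to be ordered best match first
        if chunk in seen:
            continue
        if len(selected) >= max_chunks or used + len(chunk) > max_chars:
            break
        seen.add(chunk)
        selected.append(chunk)
        used += len(chunk)
    context_text = "\n".join(selected)
    return (
        "Answer the question below using ONLY the context below.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Question:\n{query}\n\n"
        "Answer:"
    )

# Illustrative call with made-up chunks.
print(build_prompt("example question", ["first", "first", "second", "third"], max_chunks=2))
```

In the notebook this would replace the f-string, for example `prompt = build_prompt(query, contexts)`.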