In [4]:
!pip install langchain_community

In [5]:
!pip install -q cassio datasets langchain openai tiktoken

In [37]:
# LangChain components used in this notebook
from langchain.vectorstores.cassandra import Cassandra
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

# Hugging Face support for dataset retrieval
from datasets import load_dataset

# CassIO is the engine powering the Astra DB integration in LangChain;
# it also initializes the DB connection.
import cassio

In [38]:
!pip install PyPDF2  # reads the text inside a PDF file



In [39]:
from PyPDF2 import PdfReader

In [40]:
## SETUP FOR ASTRA DB

In [41]:
ASTRA_DB_APPLICATION_TOKEN="AstraCS:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ASTRA_DB_ID="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
OPENAI_API_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
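Hardcoding tokens in a notebook cell makes them easy to leak when the notebook is shared. One alternative is to read them from environment variables. This is a minimal sketch, not part of the original notebook: the helper name `load_credentials` is hypothetical, and it assumes the three variables were exported under the same names before Jupyter was started.

```python
import os

# Names of the three values the notebook expects (same names as the cell above).
REQUIRED_KEYS = ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ID", "OPENAI_API_KEY")

def load_credentials(env=os.environ):
    """Return the credentials from `env`, failing early if any is missing or empty.

    `load_credentials` is a hypothetical helper written for this sketch.
    """
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}

# Possible usage in the notebook:
# creds = load_credentials()
# ASTRA_DB_APPLICATION_TOKEN = creds["ASTRA_DB_APPLICATION_TOKEN"]
# ASTRA_DB_ID = creds["ASTRA_DB_ID"]
# OPENAI_API_KEY = creds["OPENAI_API_KEY"]
```

Passing the mapping as a parameter keeps the helper easy to check with a plain dictionary instead of the real environment.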

In [42]:
# # First, create the embedding object:
# embedding = OpenAIEmbeddings()

In [62]:
# Provide the path to the PDF file
pdfreader = PdfReader('query.pdf')

In [63]:
# Read the text from every page of the PDF
raw_text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:  # pages without extractable text yield nothing and are skipped
        raw_text += content

In [64]:
raw_text

'THETHOUSAND BRAINS PROJECT : A N EWPARADIGM FOR\nSENSORIMOTOR INTELLIGENCE\nViviane Clay∗\nNumenta, Inc.\nRedwood City\nCA, United States\nvclay@thousandbrains.org\nNiels Leadholm*\nNumenta, Inc.\nRedwood City\nCA, United States\nnleadholm@thousandbrains.orgJeff Hawkins\nNumenta, Inc.\nRedwood City\nCA, United States\njhawkins@thousandbrains.org\nDecember 25, 2024\nABSTRACT\nArtificial intelligence has advanced rapidly in the last decade, driven primarily by progress in the\nscale of deep-learning systems. Despite these advances, the creation of intelligent systems that\ncan operate effectively in diverse, real-world environments remains a significant challenge. In\nthis white paper, we outline the Thousand Brains Project, an ongoing research effort to develop\nan alternative, complementary form of AI, derived from the operating principles of the neocortex.\nWe present an early version of a thousand-brains system, a sensorimotor agent that is uniquely\nsuited to quickly learn a wide r
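The extracted text keeps the PDF's line breaks, so a word hyphenated at the end of a line (for example `compo-` followed by `nents` in the retrieval output later in this notebook) ends up split in two. The sketch below shows one possible cleanup step. It is an optional addition, not something the original notebook does, and the function name `join_hyphenated_line_breaks` is hypothetical. It also has a known weakness: a genuine hyphenated compound that happens to break at a line end would be joined as well.

```python
import re

def join_hyphenated_line_breaks(text):
    """Rejoin words that were split by a hyphen at the end of a line.

    Only the pattern lowercase letter, hyphen, newline, lowercase letter is
    joined, so spaced dashes and ordinary line breaks are left untouched.
    """
    return re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)

sample = "all compo-\nnents - learning modules"
print(join_hyphenated_line_breaks(sample))
```

If used, it would be applied to `raw_text` before splitting, for example `raw_text = join_hyphenated_line_breaks(raw_text)`.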

Initialize connection to DB

In [65]:
cassio.init(
    token=ASTRA_DB_APPLICATION_TOKEN,
    database_id=ASTRA_DB_ID,
    # secure_connect_bundle="secure-connect-astradb.zip"
)

Create the LangChain embedding and LLM objects for later use

In [66]:
llm = OpenAI(openai_api_key=OPENAI_API_KEY)
embedding = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
# session=None and keyspace=None fall back to the connection set up by cassio.init()
astra_vector_store = Cassandra(
    embedding=embedding,
    table_name="qa_mini_demo",
    session=None,
    keyspace=None
)

In [67]:
from langchain.text_splitter import CharacterTextSplitter
# Split the text on newlines into chunks of at most about 1000 characters (with a 200-character overlap) so that each chunk stays within the model's token limit
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
texts=text_splitter.split_text(raw_text)
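To make the roles of `separator`, `chunk_size`, and `chunk_overlap` concrete, here is a simplified, pure-Python illustration of greedy splitting with overlap. It is a sketch of the general idea only and should not be read as LangChain's actual implementation; the function name `split_with_overlap` is hypothetical, and lengths are measured in characters, as with `length_function=len` above.

```python
def split_with_overlap(text, separator="\n", chunk_size=1000, chunk_overlap=200):
    """Pack separator-delimited pieces into chunks of at most chunk_size characters.

    Each new chunk starts with trailing pieces of the previous chunk, as long as
    that carried-over tail is at most chunk_overlap characters. A single piece
    longer than chunk_size is kept whole, so it can exceed the limit.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def length(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Drop leading pieces until the tail fits the overlap budget
            # and leaves room for the incoming piece.
            while current and (length(current) > chunk_overlap
                               or length(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks

# Tiny example with small limits so the overlap is easy to see.
for chunk in split_with_overlap("aaaa\nbbbb\ncccc\ndddd", chunk_size=9, chunk_overlap=4):
    print(repr(chunk))
```

With these small limits, each chunk repeats the last line of the previous one, which is the effect `chunk_overlap=200` has at a larger scale: text near a chunk boundary appears in both neighbouring chunks, so a retrieved chunk is less likely to cut a passage off mid-thought.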


In [68]:
texts[:50]

['THETHOUSAND BRAINS PROJECT : A N EWPARADIGM FOR\nSENSORIMOTOR INTELLIGENCE\nViviane Clay∗\nNumenta, Inc.\nRedwood City\nCA, United States\nvclay@thousandbrains.org\nNiels Leadholm*\nNumenta, Inc.\nRedwood City\nCA, United States\nnleadholm@thousandbrains.orgJeff Hawkins\nNumenta, Inc.\nRedwood City\nCA, United States\njhawkins@thousandbrains.org\nDecember 25, 2024\nABSTRACT\nArtificial intelligence has advanced rapidly in the last decade, driven primarily by progress in the\nscale of deep-learning systems. Despite these advances, the creation of intelligent systems that\ncan operate effectively in diverse, real-world environments remains a significant challenge. In\nthis white paper, we outline the Thousand Brains Project, an ongoing research effort to develop\nan alternative, complementary form of AI, derived from the operating principles of the neocortex.\nWe present an early version of a thousand-brains system, a sensorimotor agent that is uniquely',
 'an alternative, complementar

Load the dataset into the vector store

In [69]:
astra_vector_store.add_texts(texts)
# Report the number of chunks actually inserted (the whole list, not only the first 50)
print("Inserted %i text chunks." % len(texts))
astra_vector_index = VectorStoreIndexWrapper(vectorstore=astra_vector_store)

Inserted 50 headlines.


Run the QA cycle

What is the Thousand Brains Project, and how does it differ from traditional deep learning approaches?
How is the concept of cortical columns in the neocortex integrated into the Thousand Brains Project's design?
What are the core principles guiding the Thousand Brains Project, and how do they reflect neuroscientific insights?
What is the Cortical Messaging Protocol (CMP), and why is it significant in the system's architecture?
### Technical Aspects
What role do learning modules play in the Thousand Brains architecture, and how do they function independently and collaboratively?
How does the Thousand Brains Project handle sensorimotor learning, and why is this approach fundamental?
What are reference frames, and how are they utilized for structuring models within the system?
Can you explain the hierarchy and voting mechanisms in multi-learning module systems?
### Implementation
What are the capabilities of the Monty system as the first instantiation of the Thousand Brains Project?
How does the system achieve rapid and continual learning, and what are the challenges involved?
How does the system ensure generalization across different sensory modalities and task domains?
### Evaluation and Applications
What types of experimental evaluations have been conducted for the Monty system, and what are the results?
How does the system apply its learned models to interact with and manipulate the environment?
What are some potential real-world applications of the Thousand Brains architecture?
### Philosophical and Long-Term Considerations
How does the Thousand Brains Project challenge preconceptions about AI and machine learning?
In what ways does the project aim to replicate human-like intelligence, and where does it deliberately diverge?
### Future Directions
What improvements or advancements are planned for the next iterations of the Thousand Brains architecture?
How does the project envision integrating more biologically accurate models, such as neural grid cells, into the system?

In [70]:
first_question = True
while True:
    if first_question:
        query_text = input("\nEnter your question (or type 'quit' to exit): ").strip()
    else:
        query_text = input("\nWhat's your next question (or type 'quit' to exit): ").strip()

    if query_text.lower() == "quit":
        break
    if query_text == "":
        continue

    first_question = False  # later prompts use the follow-up wording
    print("\nQUESTION: \"%s\"" % query_text)
    # Retrieve relevant chunks from the vector store and let the LLM answer from them
    answer = astra_vector_index.query(query_text, llm=llm).strip()
    print("\nANSWER: \"%s\"\n" % answer)

    # Show the four best-matching chunks with their similarity scores
    print("FIRST DOCUMENTS BY RELEVANCE:")
    for doc, score in astra_vector_store.similarity_search_with_score(query_text, k=4):
        print("   [%0.4f] \"%s ...\"" % (score, doc.page_content[:84]))




Enter your question (or type 'quit' to exit): What are the capabilities of the Monty system as the first instantiation of the Thousand Brains Project?

QUESTION: "What are the capabilities of the Monty system as the first instantiation of the Thousand Brains Project?"

ANSWER: "Monty, the first instantiation of the Thousand Brains Project, is capable of embodying key principles of the Thousand Brains Theory and using those principles to enable efficient learning of generalizable representations from sensorimotor data. It is also designed to support multimodal integration and the development of more abstract representations. In addition, Monty is capable of interacting with its environment using several different sensors and motor systems."

FIRST DOCUMENTS BY RELEVANCE:
   [0.9561] "cies. Building on these core concepts, we described Monty,
the first instantiation o ..."
   [0.9479] "environment using several different sensors, in this case,
touch and vision.
While p ..."
   [0.9459] "intelligent, more flexible, and more capable in the many
applications that deep lear ..."
   [0.9448] "biases around the spatial structure of the world to enable rapid and continual learn ..."

Enter your question (or type 'quit' to exit): What is the Cortical Messaging Protocol (CMP), and why is it significant in the system's architecture?

QUESTION: "What is the Cortical Messaging Protocol (CMP), and why is it significant in the system's architecture?"

ANSWER: "The Cortical Messaging Protocol (CMP) is a common communication protocol used by all components in the architecture - sensor modules, learning modules, and the motor system - to share information. It is significant because it allows for the seamless integration and communication between different components, as long as they have the appropriate interfaces. This allows for flexibility and versatility in the system, as different components can have varied inner workings as long as they can communicate through the CMP. The CMP is also inspired by long-range connections in the cortex, making it an effective and efficient communication protocol. Overall, the CMP plays a crucial role in the architecture by enabling the integration and coordination of different components to achieve the system's goals."

FIRST DOCUMENTS BY RELEVANCE:
   [0.9293] "3 Overview Of The Architecture
There are three major components that play a role in  ..."
   [0.9278] "We use a common communication protocol that all compo-
nents - learning modules, sen ..."
   [0.9112] "vature, etc.
• ‘Confidence’ (defined in the range [0, 1]).
•A boolean for whether th ..."
   [0.9051] "blocks. The architecture we are creating is built on this
premise. Thousand-brains s ..."

Enter your question (or type 'quit' to exit): What is the primary computational unit in the Thousand Brains Project?

QUESTION: "What is the primary computational unit in the Thousand Brains Project?"

ANSWER: "The primary computational unit in the Thousand Brains Project is the cortical column, as proposed by Vernon Mountcastle's theory of intelligence. This idea is also reflected in the first practical implementation of a thousand-brains system called "Monty"."

FIRST DOCUMENTS BY RELEVANCE:
   [0.9402] "∗Joint first authors.arXiv:2412.18354v1  [cs.AI]  24 Dec 2024The Thousand Brains Pro ..."
   [0.9379] "blocks. The architecture we are creating is built on this
premise. Thousand-brains s ..."
   [0.9363] "intelligent, more flexible, and more capable in the many
applications that deep lear ..."
   [0.9336] "THETHOUSAND BRAINS PROJECT : A N EWPARADIGM FOR
SENSORIMOTOR INTELLIGENCE
Viviane Cl ..."

Enter your question (or type 'quit' to exit): What is the term used for the shared coordinate system within the Thousand Brains Project?

QUESTION: "What is the term used for the shared coordinate system within the Thousand Brains Project?"

ANSWER: "The shared coordinate system within the Thousand Brains Project is called the "Cortical Messaging Protocol" or CMP."

FIRST DOCUMENTS BY RELEVANCE:
   [0.9273] "∗Joint first authors.arXiv:2412.18354v1  [cs.AI]  24 Dec 2024The Thousand Brains Pro ..."
   [0.9240] "8The Thousand Brains Project
Figure 4: By using a common messaging protocol between  ..."
   [0.9235] "biases around the spatial structure of the world to enable rapid and continual learn ..."
   [0.9233] "blocks. The architecture we are creating is built on this
premise. Thousand-brains s ..."

Enter your question (or type 'quit' to exit): quit
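
The bracketed numbers printed under FIRST DOCUMENTS BY RELEVANCE are similarity scores between the question's embedding and each stored chunk, with higher values meaning a closer match. As a rough illustration of how such a ranking can work, the sketch below uses plain cosine similarity on toy two-dimensional vectors. This is an assumption made for illustration: the exact metric and scaling used by the Cassandra store are not shown in this notebook, and the names `cosine_similarity`, `top_k`, and the `chunk_*` vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=4):
    """Return the k (score, name) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    return sorted(scored, reverse=True)[:k]

# Toy embeddings standing in for real ones, which have many more dimensions.
docs = {"chunk_a": [1.0, 0.0], "chunk_b": [0.6, 0.8], "chunk_c": [0.0, 1.0]}
for score, name in top_k([1.0, 0.0], docs, k=2):
    print("   [%0.4f] %s" % (score, name))
```

Note how close together the scores in the recorded runs are (roughly 0.91 to 0.96): a high score for the top chunk does not by itself guarantee that the chunk contains the answer, so it is worth reading the retrieved text alongside the generated answer.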
