In [6]:
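import os

import google.generativeai as genai
from dotenv import load_dotenv

# Load GEMINI_API_KEY from a local .env file into the environment.
load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file.")
genai.configure(api_key=api_key)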


In [7]:
# Create the model
model = genai.GenerativeModel('gemini-1.5-flash')

In [8]:
textbook_text = """
 Understanding the Brain
 According to Marr (1982), understanding an information processing system
 has three levels, called the levels of analysis:
 1. Computational theory corresponds to the goal of computation and an
 abstract definition of the task.
 2. Representation and algorithm is about how the input and the output
 are represented and about the specification of the algorithm for the
 transformation from the input to the output.
 3. Hardware implementation is the actual physical realization of the system.
 One example is sorting: The computational theory is to order a given
 set of elements. The representation may use integers, and the algorithm
 may be Quicksort. After compilation, the executable code for a particular
 processor sorting integers represented in binary is one hardware
 implementation.
 The idea is that for the same computational theory, there may be
 multiple representations and algorithms manipulating symbols in that
 representation. Similarly, for any given representation and algorithm, there
 may be multiple hardware implementations. We can use one of various
 sorting algorithms, and even the same algorithm can be compiled
 on computers with different processors and lead to different hardware
 implementations.
 To take another example, '6', 'VI', and '110' are three different
 representations of the number six. There is a different algorithm for addition
 depending on the representation used. Digital computers use binary
 representation and have circuitry to add in this representation, which is one
 particular hardware implementation. Numbers are represented differently,
 and addition corresponds to a different set of instructions on an
 abacus, which is another hardware implementation. When we add two
 numbers in our head, we use another representation and an algorithm
 suitable to that representation, which is implemented by the neurons.
 But all these different hardware implementations -- for example, we, abacus,
 digital computer -- implement the same computational theory which
 is addition.
 The classic example is the difference between natural and artificial flying
 machines: A sparrow flaps its wings; a commercial airplane does not
 flap its wings but uses jet engines. The sparrow and the airplane are
 two hardware implementations built for different purposes, satisfying
 different constraints. But they both implement the same theory, which is
 aerodynamics.
 The brain is one hardware implementation for learning or pattern
 recognition. If from this particular implementation, we can do reverse
 engineering and extract the representation and the algorithm used, and if
 from that in turn, we can get the computational theory, we can then use
 another representation and algorithm, and in turn a hardware implementation
 more suited to the means and constraints we have. One hopes our
 implementation will be cheaper, faster, and more accurate.
 Just as the initial attempts to build flying machines looked very much
 like birds until we discovered aerodynamics, it is also expected that the
 first attempts to build structures possessing brain's abilities will look
 like the brain with networks of large numbers of processing units, until
 we discover the computational theory of intelligence. So it can be said
 that in understanding the brain, when we are working on artificial neural
 networks, we are at the representation and algorithm level.
 Just as the feathers are irrelevant to flying, in time we may discover
 that neurons and synapses are irrelevant to intelligence. But until that
 time there is one other reason why we are interested in understanding
 the functioning of the brain, and that is related to parallel processing.
 11.1.2 Neural Networks as a Paradigm for Parallel Processing
 Since the 1980s, computer systems with thousands of processors have
 been commercially available. The software for such parallel architectures,
 however, has not advanced as quickly as hardware. The reason for this
 is that almost all our theory of computation up to that point was based
 on serial, one processor machines. We are not able to use the parallel
 machines we have efficiently because we cannot program them efficiently.
 There are mainly two paradigms for parallel processing: In Single
 Instruction Multiple Data (SIMD) machines, all processors execute the same
 instruction but on different pieces of data. In Multiple Instruction
 Multiple Data (MIMD) machines, different processors may execute different
 instructions on different data. SIMD machines are easier to program
 because there is only one program to write. However, problems rarely have
 such a regular structure that they can be parallelized over a SIMD
 machine. MIMD machines are more general, but it is not an easy task to write
 separate programs for all the individual processors; additional problems
 are related to synchronization, data transfer between processors, and so
 forth. SIMD machines are also easier to build, and machines with more
 processors can be constructed if they are SIMD. In MIMD machines,
 processors are more complex, and a more complex communication network
 should be constructed for the processors to exchange data arbitrarily.
 Assume now that we can have machines where processors are a little
 bit more complex than SIMD processors but not as complex as MIMD
 processors. Assume we have simple processors with a small amount of
 local memory where some parameters can be stored. Each processor
 implements a fixed function and executes the same instructions as SIMD
 processors; but by loading different values into the local memory, they
 can be doing different things and the whole operation can be distributed
 over such processors. We will then have what we can call Neural
 Instruction Multiple Data (NIMD) machines, where each processor corresponds
 to a neuron, local parameters correspond to its synaptic weights, and the
 whole structure is a neural network. If the function implemented in each
 processor is simple and if the local memory is small, then many such
 processors can be fitted on a single chip.
 The problem now is to distribute a task over a network of such processors
 and to determine the local parameter values. This is where learning
 comes into play: We do not need to program such machines and determine
 the parameter values ourselves if such machines can learn from
 examples.
 Thus, artificial neural networks are a way to make use of the parallel
 hardware we can build with current technology and -- thanks to learning --
 they need not be programmed. Therefore, we also save ourselves the
 effort of programming them.
 In this chapter, we discuss such structures and how they are trained.
 Figure 11.1 Simple perceptron. x_j, j = 1, ..., d are the input units. x_0 is the
 bias unit that always has the value 1. y is the output unit. w_j is the weight of
 the directed connection from input x_j to the output.
 Keep in mind that the operation of an artificial neural network is a
 mathematical function that can be implemented on a serial computer -- as it
 generally is -- and training the network is not much different from
 statistical techniques that we have discussed in the previous chapters. Thinking
 of this operation as being carried out on a network of simple processing
 units is meaningful only if we have the parallel hardware, and only if the
 network is so large that it cannot be simulated fast enough on a serial
"""

In [9]:
prompt = f"""
You are an expert at creating flashcards for active recall learning.  
I will give you text from a textbook or study notes.  
Create flashcards in the exact format:

Q: [clear, concise question]  
A: [short answer, max 8 words]  

Rules:
- Answers must be short enough to fit on one side of a flashcard.
- Avoid full sentences unless necessary — prefer key terms or short phrases.
- Each card should test only one fact or concept.
- Skip unclear or irrelevant parts of the text.
- Do not include explanations, examples, or commentary.
- Output only the flashcards, with a blank line between cards and no extra formatting.

Here is the text:

{textbook_text}

Generate only the flashcards now.


"""

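# Request the flashcards. On failure, keep the error message in `flashcards`
# so the next cell displays it instead of raising.
try:
    response = model.generate_content(prompt)
    flashcards = response.text
except Exception as e:
    flashcards = f"An error occurred: {e}"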
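
At this point `flashcards` is a single raw string. If the cards are needed as data (for example, to export or filter them), a small parser can split the string into question/answer pairs. The sketch below assumes the model followed the `Q:` / `A:` format requested in the prompt. `parse_flashcards` and `sample` are hypothetical names introduced for illustration, not part of the Gemini SDK.

```python
from typing import List, Tuple


def parse_flashcards(raw: str) -> List[Tuple[str, str]]:
    """Split 'Q: ...' / 'A: ...' lines into (question, answer) pairs.

    Lines that match neither prefix (blank lines, error messages,
    stray commentary) are ignored.
    """
    cards = []
    question = None
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            cards.append((question, line[2:].strip()))
            question = None  # wait for the next question
    return cards


# Illustrative input in the format the prompt asks for.
sample = """Q: SIMD stands for what?
A: Single Instruction Multiple Data

Q: MIMD stands for what?
A: Multiple Instruction Multiple Data
"""

for q, a in parse_flashcards(sample):
    print(f"{q} -> {a}")
```

Because unmatched lines are skipped, passing the `"An error occurred: ..."` fallback string to this function yields an empty list rather than a malformed card.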

In [10]:
from IPython.display import Markdown

Markdown(flashcards)

Q: Marr's three levels of analysis?
A: Computational theory, Representation & algorithm, Hardware implementation

Q: Computational theory defines what?
A: Goal of computation, task definition

Q: Representation & algorithm details what?
A: Input/output representation, transformation algorithm

Q: Hardware implementation describes what?
A: Physical system realization

Q: Example of different number representations?
A: '6', 'VI', '110'

Q: What is the computational theory in the flying machine example?
A: Aerodynamics

Q: What level of analysis are artificial neural networks?
A: Representation & algorithm

Q: Two paradigms for parallel processing?
A: SIMD, MIMD

Q: SIMD stands for what?
A: Single Instruction Multiple Data

Q: MIMD stands for what?
A: Multiple Instruction Multiple Data

Q: NIMD stands for what?
A: Neural Instruction Multiple Data

Q: What does each NIMD processor represent?
A: Neuron

Q: What do local NIMD parameters represent?
A: Synaptic weights
