# OpenNotebookLM Demo

This notebook demonstrates a workflow in the spirit of Google NotebookLM: loading a dataset, exploring it, and building a machine learning model. Each section walks through one step, from importing libraries to visualizing results.

## 1. Import Essential Libraries

Import libraries for data analysis, visualization, and machine learning.

In [None]:
# Import libraries for data analysis and machine learning
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## 2. Load and Explore Dataset

Load a sample dataset, inspect its structure and summary statistics, and visualize the feature distributions.

In [None]:
# Load the Iris dataset from scikit-learn
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target
df.head()

In [None]:
# Show dataset shape and basic statistics
print('Shape:', df.shape)
print(df.describe())
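
The summary statistics above treat `target` as just another numeric column, so it can help to check the class balance separately. A minimal sketch, assuming the same Iris data; the `species` and `class_counts` names are illustrative and not part of the original notebook:

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Map integer labels to species names (illustrative helper, not in the original)
label_to_name = dict(enumerate(iris.target_names))
species = df['target'].map(label_to_name)

# Count samples per class to see whether the dataset is balanced
class_counts = species.value_counts()
print(class_counts)
```

A balanced dataset like this one makes plain accuracy a reasonable first metric; with skewed classes, per-class precision and recall would matter more.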

In [None]:
# Visualize per-feature distributions and pairwise relationships, colored by class
sns.pairplot(df, hue='target')
plt.show()

## 3. Data Preprocessing

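Prepare the data for modeling: check for missing values, then split into training and testing sets. The Iris features are all numeric and the target is already integer-coded, so no encoding step is needed here.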

In [None]:
# Check for missing values
print(df.isnull().sum())

In [None]:
# Split dataset into features and target
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
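
The split above is purely random, so the class proportions in the test set can drift from those in the full dataset. One possible variation, sketched here under the assumption that preserving class balance is desirable, passes `stratify=y` to `train_test_split`; the `train_counts` and `test_counts` names are illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load features and labels directly as arrays so this cell runs on its own
X, y = load_iris(return_X_y=True)

# stratify=y asks the splitter to keep class proportions the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count how many samples of each class landed in each subset
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print('Train class counts:', train_counts)
print('Test class counts:', test_counts)
```

Stratification matters most for small or imbalanced datasets, where a random split can leave a class underrepresented in the test set.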

## 4. Build and Train a Machine Learning Model

Train a Random Forest classifier on the training data and print a confirmation message once fitting finishes.

In [None]:
# Train Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print('Training complete.')
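
Once a Random Forest is fitted, its `feature_importances_` attribute gives a rough view of which inputs the trees relied on. The following is a sketch that retrains the same model so it can run independently; the `importances` name is illustrative, and impurity-based importances should be read as a heuristic rather than a definitive ranking:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Recreate the data and split used in the notebook
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the same model configuration as in the training cell
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# feature_importances_ holds one impurity-based score per input column
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```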

## 5. Evaluate Model Performance

Assess the trained model using accuracy, confusion matrix, and classification report.

In [None]:
# Predict on test set
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f'Accuracy: {acc:.2f}')
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))
print('Classification Report:')
print(classification_report(y_test, y_pred))
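
A single 30-sample test set gives a fairly noisy accuracy estimate. As a complementary check, here is a minimal sketch using `cross_val_score` with 5 folds; the fold count and the `scores` name are assumptions chosen for illustration, not part of the original notebook:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Use the full dataset; cross-validation handles the splitting internally
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# With an integer cv and a classifier, scikit-learn uses stratified k-fold splits
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print('Fold accuracies:', scores.round(2))
print(f'Mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})')
```

The spread across folds gives a sense of how much the single-split accuracy might vary with a different random split.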

## 6. Generate Predictions

Compare the model's test-set predictions with the actual labels and display a few sample rows.

In [None]:
# Show sample predictions vs actual
results = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(results.head())
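
Integer labels are hard to read at a glance, and a hard prediction hides how confident the model was. The sketch below, which retrains the same model so it runs on its own, uses `predict_proba` and `iris.target_names` to show per-class probabilities alongside species names; the `samples` and `predicted_names` names are illustrative:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Recreate the data, split, and model used in the notebook
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# predict_proba returns one row per sample and one column per class
proba = model.predict_proba(X_test)

# Translate integer predictions into species names
predicted_names = iris.target_names[model.predict(X_test)]

# Combine probabilities and predicted names into one table for inspection
samples = pd.DataFrame(proba, columns=iris.target_names, index=X_test.index).round(2)
samples['predicted'] = predicted_names
print(samples.head())
```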

## 7. Visualize Results

Create plots to visualize model performance: a confusion matrix heatmap and a scatter plot of predicted versus actual class labels.

In [None]:
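# Plot confusion matrix as a heatmap (rows: actual class, columns: predicted class)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()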
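
Raw counts in a confusion matrix depend on how many test samples each class has. A row-normalized version, sketched here with the `normalize='true'` option of `confusion_matrix`, expresses each row as the fraction of that actual class assigned to each predicted class; the `cm_norm` name is illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Recreate the data, split, and model used in the notebook
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# normalize='true' divides each row by the number of actual samples in that class
cm_norm = confusion_matrix(y_test, y_pred, normalize='true')
print(np.round(cm_norm, 2))
```

The diagonal of the normalized matrix is the per-class recall, which makes classes of different sizes directly comparable.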

In [None]:
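# Plot predicted vs actual class labels; labels are discrete, so points overlap,
# and a low alpha makes more frequent (actual, predicted) pairs appear darker
plt.scatter(y_test, y_pred, alpha=0.3)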
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('Prediction vs Actual')
plt.show()