# PandRS Tutorial

This notebook demonstrates the basic usage of PandRS, a Rust-powered DataFrame library for Python with a pandas-like API.

In [None]:
import pandrs as pr
import numpy as np
import pandas as pd

print(f"PandRS version: {pr.__version__}")

## Creating DataFrames

You can create a DataFrame from a dictionary of lists or NumPy arrays:

In [None]:
# Create a simple DataFrame
df = pr.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, 5.5]
})

# Display the DataFrame
df

## Converting between pandas and PandRS

You can easily convert between pandas and PandRS DataFrames:

In [None]:
# Convert to pandas DataFrame
pd_df = df.to_pandas()
print("pandas DataFrame:")
pd_df

In [None]:
# Convert back to PandRS DataFrame
pr_df = pr.DataFrame.from_pandas(pd_df)
print("PandRS DataFrame:")
pr_df

## Working with Series

You can extract columns as Series objects:

In [None]:
# Get a column as a Series
series_a = df['A']
series_a

In [None]:
# Convert to NumPy array
np_array = series_a.to_numpy()
print(f"NumPy array: {np_array}")
print(f"Type: {type(np_array)}")

## Handling Missing Values

PandRS has built-in support for missing values through the NASeries class. In the example below, missing entries are passed as `None` when the DataFrame is created:

In [None]:
# Create a DataFrame with missing values
df_na = pr.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': ['a', None, 'c', 'd', None],
    'C': [1.1, 2.2, 3.3, None, 5.5]
})

df_na
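
Before handing data to PandRS, it can help to know how many entries are missing in each column. The sketch below inspects the input dictionary with plain Python only and makes no assumptions about the PandRS API; `count_missing` is a hypothetical helper name, not part of the library.

```python
# Hypothetical helper: count None entries per column in a dict of lists.
# Plain Python only; it does not call into PandRS.
def count_missing(columns):
    return {
        name: sum(value is None for value in values)
        for name, values in columns.items()
    }

raw = {
    'A': [1, 2, None, 4, 5],
    'B': ['a', None, 'c', 'd', None],
    'C': [1.1, 2.2, 3.3, None, 5.5],
}

missing = count_missing(raw)
print(missing)  # {'A': 1, 'B': 2, 'C': 1}
```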

## Loading and Saving Data

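PandRS supports reading and writing CSV and JSON. The CSV example below goes through a file on disk, while the JSON example goes through an in-memory string: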

In [None]:
# Save DataFrame to CSV
df.to_csv('sample_data.csv')

# Read DataFrame from CSV
df_from_csv = pr.DataFrame.read_csv('sample_data.csv')
df_from_csv
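
CSV stores every value as text, so whatever reads the file has to infer column types again. How PandRS does this is not covered here. The sketch below uses only Python's standard `csv` module (not PandRS) to illustrate the point, and writes to a temporary directory so nothing is left behind in the working directory. The variable names are illustrative.

```python
import csv
import os
import tempfile

rows = [
    {'A': 1, 'B': 'a', 'C': 1.1},
    {'A': 2, 'B': 'b', 'C': 2.2},
]

# The temporary directory and its contents are removed when the block exits
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample_data.csv')

    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['A', 'B', 'C'])
        writer.writeheader()
        writer.writerows(rows)

    with open(path, newline='') as f:
        loaded = list(csv.DictReader(f))

# The standard reader returns every field as a string
print(loaded[0])  # {'A': '1', 'B': 'a', 'C': '1.1'}
```

When checking a round trip, it is therefore worth comparing column types as well as values.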

In [None]:
# Convert DataFrame to JSON
json_str = df.to_json()
print(json_str)

# Read DataFrame from JSON
df_from_json = pr.DataFrame.read_json(json_str)
df_from_json
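
JSON has no `None`, so missing values are written as `null` and mapped back on load. The sketch below shows this with Python's standard `json` module. The column-oriented layout used here is an assumption for illustration; the layout that `to_json()` actually produces may differ.

```python
import json

# Assumed column-oriented layout: column name -> list of values
columns = {
    'A': [1, 2, None],
    'B': ['a', None, 'c'],
}

encoded = json.dumps(columns)
print(encoded)  # {"A": [1, 2, null], "B": ["a", null, "c"]}

# null is mapped back to None, so the round trip is lossless for this data
restored = json.loads(encoded)
print(restored == columns)  # True
```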

## Performance Comparison

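The cell below compares how long pandas and PandRS take to build a DataFrame from the same 100,000-row dictionary. It measures construction time only, and only once, so treat the result as a rough indication rather than a full benchmark: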

In [None]:
import time

# Create a large DataFrame
n_rows = 100000
data = {
    'A': list(range(n_rows)),
    'B': [f"value_{i}" for i in range(n_rows)],
    'C': [i * 1.1 for i in range(n_rows)]
}

# Time pandas DataFrame creation
# (perf_counter is a monotonic, high-resolution clock suited to short timings)
start = time.perf_counter()
pd_df = pd.DataFrame(data)
pd_time = time.perf_counter() - start
print(f"pandas DataFrame creation time: {pd_time:.4f} seconds")

# Time PandRS DataFrame creation
start = time.perf_counter()
pr_df = pr.DataFrame(data)
pr_time = time.perf_counter() - start
print(f"PandRS DataFrame creation time: {pr_time:.4f} seconds")

# A ratio above 1 means PandRS was faster than pandas in this run
print(f"pandas time / PandRS time: {pd_time / pr_time:.2f}x")
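
A single timing can be skewed by caching, garbage collection, or other processes. One common way to reduce that noise is to repeat the measurement and keep the fastest run. The sketch below shows the idea with the standard library only; `best_of` and `build_columns` are hypothetical names, and the workload is a stand-in so the example runs without pandas or PandRS installed.

```python
import time

def best_of(func, repeats=5):
    """Return the shortest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload: builds column data similar to the cell above
def build_columns(n_rows=10_000):
    return {
        'A': list(range(n_rows)),
        'C': [i * 1.1 for i in range(n_rows)],
    }

elapsed = best_of(build_columns, repeats=5)
print(f"best of 5: {elapsed:.6f} seconds")
```

In this notebook, the same helper could wrap `lambda: pd.DataFrame(data)` and `lambda: pr.DataFrame(data)` to compare the two libraries on the fastest of several runs.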