# 🐼 Pandas Integration - Data Analysis with Bright Data SDK

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/02_pandas_integration.ipynb)

Learn how to load Bright Data SDK scrape results into pandas DataFrames for analysis, export, and plotting.

## What You'll Learn
1. Converting results to DataFrames
2. Batch scraping to DataFrame
3. Data cleaning and analysis
4. Exporting to CSV/Excel
5. Visualization with matplotlib

---


## 📦 Setup


In [None]:
# Install required packages (openpyxl is needed later for the Excel export)
%pip install brightdata-sdk pandas matplotlib seaborn openpyxl -q

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from brightdata import BrightDataClient

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ All packages loaded")


In [None]:
# Authentication
API_TOKEN = "your_api_token_here"  # Replace with your token
client = BrightDataClient(token=API_TOKEN)
print("✅ Client initialized")


## 📊 Method 1: Single Result to DataFrame

Convert a single scrape result to a DataFrame:


In [None]:
# Scrape one product
result = client.scrape.amazon.products(
    url="https://www.amazon.com/dp/B0CRMZHDG8"
)

# Convert to DataFrame
if result.success and result.data:
    df = pd.DataFrame([result.data])
    
    # Add metadata
    df['url'] = result.url
    df['cost'] = result.cost
    df['elapsed_ms'] = result.elapsed_ms()
    df['scraped_at'] = pd.Timestamp.now()
    
    print(f"✅ DataFrame: {len(df)} rows, {len(df.columns)} columns")
    display(df.head())
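
If `result.data` contains nested dictionaries, `pd.DataFrame([result.data])` keeps each nested dict in a single object column. A minimal sketch of flattening such a record with `pd.json_normalize` follows. The `sample_data` dict is a hypothetical stand-in for a real response, and its field names are illustrative rather than the SDK's documented schema.

```python
import pandas as pd

# Hypothetical record standing in for result.data; real field names may differ.
sample_data = {
    "title": "Example product",
    "final_price": 19.99,
    "seller": {"name": "Example Store", "rating": 4.7},
}

# json_normalize expands nested dicts into dotted column names
# such as "seller.name" and "seller.rating".
flat_df = pd.json_normalize(sample_data)
print(sorted(flat_df.columns))
```

With a real result, the same call would be `pd.json_normalize(result.data)`, assuming `result.data` is a plain dict.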


## 🔄 Method 2: Batch Scraping to DataFrame

Scrape multiple URLs and create a comprehensive DataFrame:


In [None]:
# List of Amazon product URLs
urls = [
    "https://www.amazon.com/dp/B0CRMZHDG8",
    "https://www.amazon.com/dp/B09B9C8K3T",
    "https://www.amazon.com/dp/B0CX23V2ZK",
]

print(f"Scraping {len(urls)} products...")
results = []

for i, url in enumerate(urls, 1):
    print(f"  [{i}/{len(urls)}] {url[:50]}...")
    try:
        result = client.scrape.amazon.products(url=url)
        if result.success and result.data:
            results.append({
                'url': result.url,
                'title': result.data.get('title', 'N/A'),
                'price': result.data.get('final_price', 'N/A'),
                'rating': result.data.get('rating', 'N/A'),
                'reviews_count': result.data.get('reviews_count', 0),
                'cost': result.cost,
                'elapsed_ms': result.elapsed_ms(),
                'status': 'success'
            })
        else:
            # Record unsuccessful scrapes too, so the failure count below is accurate
            results.append({'url': url, 'error': 'scrape unsuccessful', 'status': 'failed'})
    except Exception as e:
        results.append({'url': url, 'error': str(e), 'status': 'failed'})

# Create DataFrame
df = pd.DataFrame(results)
print(f"\n✅ Processed {len(df)} URLs")
print(f"   Success: {(df['status'] == 'success').sum()}")
print(f"   Failed: {(df['status'] != 'success').sum()}")
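
The batch cell stores the string `'N/A'` when a field is missing, which leaves `price` and `rating` as object columns that cannot be averaged. A minimal cleaning sketch follows, using hypothetical rows shaped like the batch output (the values are made up). It coerces the columns to numbers and drops rows with no usable price.

```python
import pandas as pd

# Hypothetical rows shaped like the batch output; values are made up.
raw = pd.DataFrame({
    "title": ["Item A", "Item B", "Item C"],
    "price": [19.99, "N/A", "24.50"],
    "rating": [4.5, 4.0, "N/A"],
    "status": ["success", "success", "success"],
})

clean = raw.copy()
# errors="coerce" turns anything non-numeric (such as "N/A") into NaN.
for col in ["price", "rating"]:
    clean[col] = pd.to_numeric(clean[col], errors="coerce")

# Keep only rows that have a usable price.
priced = clean.dropna(subset=["price"])
print(priced[["title", "price"]])
print(f"Mean price: {priced['price'].mean():.2f}")
```

Coercing to NaN instead of filtering on the literal string keeps the approach working if other placeholder strings appear in the data.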


In [None]:
display(df.head())

# Summary statistics
print("\n📊 Summary:")
print(f"Total cost: ${df['cost'].sum():.4f}")
print(f"Avg time: {df['elapsed_ms'].mean():.2f}ms")
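
Once price is numeric, the results can be charted with matplotlib. The sketch below draws a simple bar chart of price per product. `plot_df` is hypothetical sample data, not real scrape output. In the notebook it would be the batch DataFrame after its price column has been converted to numbers.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned data; values are made up for illustration.
plot_df = pd.DataFrame({
    "title": ["Item A", "Item B", "Item C"],
    "price": [19.99, 34.00, 24.50],
    "rating": [4.5, 4.0, 4.8],
})

# One bar per product, labelled by title.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(plot_df["title"], plot_df["price"], color="steelblue")
ax.set_xlabel("Product")
ax.set_ylabel("Price")
ax.set_title("Price by product")
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes. In a plain script, call `plt.show()` or `fig.savefig(...)` afterwards.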


## 💾 Export Data


In [None]:
# Export to CSV
df.to_csv('amazon_products.csv', index=False)
print("✅ Exported to amazon_products.csv")

# Export to Excel (pandas needs the openpyxl package to write .xlsx files)
df.to_excel('amazon_products.xlsx', index=False, sheet_name='Products')
print("✅ Exported to amazon_products.xlsx")
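
A quick way to check an export is to read it back. The sketch below round-trips a small hypothetical DataFrame through CSV, using an in-memory buffer so no file is written, and shows that missing values come back as NaN.

```python
import io

import pandas as pd

# Hypothetical export data; the failed row has no price.
export_df = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b"],
    "price": [19.99, None],
    "status": ["success", "failed"],
})

# Write to an in-memory buffer instead of disk so the sketch leaves no files behind.
buffer = io.StringIO()
export_df.to_csv(buffer, index=False)
buffer.seek(0)

# Missing values are written as empty fields and read back as NaN.
restored = pd.read_csv(buffer)
print(restored.dtypes)
```

For a file on disk, the equivalent check is `pd.read_csv('amazon_products.csv')`.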


## 💡 Pro Tips for Data Scientists

### Use Progress Bars
```python
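from tqdm import tqdm  # install with: %pip install tqdm

# Collect each result so it is not discarded on the next iteration
raw_results = []
for url in tqdm(urls, desc="Scraping"):
    raw_results.append(client.scrape.amazon.products(url=url))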
```

### Cache Results
```python
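import joblib  # install with: %pip install joblib

# Return values are pickled to .cache, so repeat calls with the same URL
# skip the network request (this assumes the result object is picklable)
memory = joblib.Memory('.cache', verbose=0)

@memory.cache
def scrape_cached(url):
    return client.scrape.amazon.products(url=url)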
```

### Track Costs
```python
total_cost = df['cost'].sum()
print(f"Total spent: ${total_cost:.4f}")
```
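
For a breakdown beyond the single total, costs and timings can be grouped by status. This is a sketch on hypothetical data (the cost and timing values are made up), in which failed rows carry no cost and therefore hold NaN.

```python
import pandas as pd

# Hypothetical batch results; failed rows have no cost or timing.
cost_df = pd.DataFrame({
    "status": ["success", "success", "failed"],
    "cost": [0.001, 0.001, None],
    "elapsed_ms": [1200.0, 1800.0, None],
})

# Aggregate per status: row count, total cost, mean elapsed time.
# "size" counts rows including NaN; "sum" and "mean" skip NaN.
summary = cost_df.groupby("status").agg(
    requests=("cost", "size"),
    total_cost=("cost", "sum"),
    avg_elapsed_ms=("elapsed_ms", "mean"),
)
print(summary)
```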

---

## ✅ Summary

You learned:
- ✅ Converting SDK results to DataFrames
- ✅ Batch scraping workflows
- ✅ Data visualization
- ✅ Exporting to CSV/Excel

## 🎓 Next: [Amazon Deep Dive](./03_amazon_scraping.ipynb)
