# FileUtils Tutorial

This notebook demonstrates how to use FileUtils for data management in Python data science projects. We'll cover:
1. Installation and Setup
2. Basic File Operations
3. Working with Different File Formats
4. Azure Storage Integration
5. Advanced Features

## 1. Installation and Setup

First, let's install FileUtils and set up our environment:

In [None]:
# Install FileUtils with all features
!pip install "FileUtils[azure,parquet,excel]"
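
Once the install finishes, it can help to confirm that the optional dependencies are importable before running the later cells. The sketch below uses only the standard library. The module names it checks (`pyarrow`, `openpyxl`, `azure.storage.blob`) are assumptions about what the `parquet`, `excel`, and `azure` extras pull in; this tutorial does not confirm them.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. `azure`) raises instead of returning None
        return False


# Assumed module names for the optional extras; adjust to your installation
for name in ("pyarrow", "openpyxl", "azure.storage.blob"):
    status = "available" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```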

In [None]:
# Import required packages
import pandas as pd
import numpy as np
from pathlib import Path
from FileUtils import FileUtils, OutputFileType

# Initialize FileUtils
file_utils = FileUtils()

# Create some sample data
np.random.seed(42)
df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=10),
    'value': np.random.randn(10),
    'category': np.random.choice(['A', 'B', 'C'], 10)
})

## 2. Basic File Operations

Let's start with basic file operations using different formats:

In [None]:
# Save as CSV
csv_files, _ = file_utils.save_data_to_disk(
    data=df,
    output_filetype=OutputFileType.CSV,
    output_type="processed",
    file_name="sample_data"
)

# Load CSV and verify
loaded_csv = file_utils.load_single_file(
    list(csv_files.values())[0],
    input_type="processed"
)
print("Loaded CSV data:")
print(loaded_csv.head())
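
The cell above unpacks a two-element return value and then takes the first value of `csv_files`, which the tutorial treats as a dict of saved paths. Here is a small sketch of that access pattern, using a made-up dict in place of a real return value (the key, the path, and the helper name `first_saved_path` are hypothetical):

```python
def first_saved_path(saved_files: dict) -> str:
    """Return the first value of a name-to-path dict, or raise if it is empty."""
    if not saved_files:
        raise ValueError("no files were saved")
    # Dicts preserve insertion order, so this is the first entry that was recorded
    return next(iter(saved_files.values()))


# Hypothetical stand-in for the dict returned by a save call
example = {"sample_data": "data/processed/sample_data.csv"}
print(first_saved_path(example))
```

Compared with `list(d.values())[0]`, this avoids building a throwaway list and gives a clearer error when nothing was saved.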

### Working with Multiple DataFrames

Passing a dict of DataFrames to `save_data_to_disk` stores them together in one file. With the Excel format, each dict entry becomes a sheet:

In [None]:
# Create multiple DataFrames
df_dict = {
    'original': df,
    'filtered': df[df['value'] > 0],
    'summary': df.groupby('category')['value'].agg(['mean', 'count']).reset_index()
}

# Save to Excel with multiple sheets
excel_files, _ = file_utils.save_data_to_disk(
    data=df_dict,
    output_filetype=OutputFileType.XLSX,
    output_type="processed",
    file_name="multi_sheet_data"
)

# Load all sheets
loaded_sheets = file_utils.load_excel_sheets(
    list(excel_files.values())[0],
    input_type="processed"
)

print("\nLoaded Excel sheets:")
for sheet_name, sheet_df in loaded_sheets.items():
    print(f"\n{sheet_name}:")
    print(sheet_df.head())

## 3. Working with Different File Formats

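FileUtils supports various file formats, including CSV, Excel, Parquet, JSON, and YAML. The cell below saves the same DataFrame as CSV, Excel, and Parquet, then loads each file back to verify it: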

In [None]:
# Save data in different formats
formats = {
    OutputFileType.CSV: "csv_data",
    OutputFileType.XLSX: "excel_data",
    OutputFileType.PARQUET: "parquet_data"
}

for format_type, filename in formats.items():
    saved_files, _ = file_utils.save_data_to_disk(
        data=df,
        output_filetype=format_type,
        output_type="processed",
        file_name=filename
    )
    print(f"Saved {format_type.value} file: {list(saved_files.values())[0]}")
    
    # Load and verify
    loaded_df = file_utils.load_single_file(
        list(saved_files.values())[0],
        input_type="processed"
    )
    print(f"Successfully loaded {format_type.value} data with shape {loaded_df.shape}\n")
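
In the loop, `load_single_file` receives only a path and an input type, so the reader is presumably chosen from the file itself. How FileUtils does this internally is not shown in this tutorial. The following is a standard-library sketch of suffix-based dispatch; every name in it (`READERS`, `load_by_suffix`, the reader functions) is hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_csv_rows(path: Path) -> list:
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def read_json_rows(path: Path) -> list:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Map each supported suffix to the function that reads it
READERS = {".csv": read_csv_rows, ".json": read_json_rows}


def load_by_suffix(path: Path) -> list:
    """Pick a reader from the file suffix; reject unknown suffixes."""
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    return reader(path)


rows = [{"category": "A", "value": "1"}, {"category": "B", "value": "2"}]
with tempfile.TemporaryDirectory() as tmp:
    # Write the same rows in two formats
    csv_path = Path(tmp) / "rows.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["category", "value"])
        writer.writeheader()
        writer.writerows(rows)
    json_path = Path(tmp) / "rows.json"
    json_path.write_text(json.dumps(rows), encoding="utf-8")

    # Load both through the same entry point
    csv_rows = load_by_suffix(csv_path)
    json_rows = load_by_suffix(json_path)

print(csv_rows == json_rows)
```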

## 4. Azure Storage Integration

Azure Storage requires credentials, which this tutorial expects in a `.env` file. The cell below validates the setup first and skips the Azure examples if validation fails:

In [None]:
from FileUtils.azure_setup import AzureSetupUtils

# Validate Azure setup
try:
    is_valid = AzureSetupUtils.validate_azure_setup()
    if is_valid:
        print("Azure setup is valid")
        
        # Initialize Azure-enabled FileUtils
        azure_utils = FileUtils.create_azure_utils()
        
        # Save to Azure
        saved_files, _ = azure_utils.save_data_to_disk(
            data=df,
            output_filetype=OutputFileType.CSV,
            output_type="processed",
            file_name="azure_data"
        )
        print(f"\nSaved to Azure: {saved_files}")
        
        # Load from Azure
        azure_path = list(saved_files.values())[0]
        loaded_df = azure_utils.load_single_file(azure_path)
        print("\nLoaded from Azure:")
        print(loaded_df.head())
    else:
        print("Azure setup is not valid - skipping Azure examples")
except Exception as e:
    # Catches missing credentials as well as failed uploads or downloads
    print(f"Azure example failed: {e}")
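
If validation fails, a quick first check is whether the credential is visible to the Python process at all. The sketch below assumes the credential is a connection string stored in `AZURE_STORAGE_CONNECTION_STRING`, a variable name commonly used in Azure documentation. Whether FileUtils reads this exact variable is an assumption, so adjust the name to match your `.env` file.

```python
import os

# Assumed variable name; FileUtils may expect a different one
DEFAULT_VAR = "AZURE_STORAGE_CONNECTION_STRING"


def has_connection_string(env=None, var_name: str = DEFAULT_VAR) -> bool:
    """Return True if `var_name` is set to a non-blank value in `env`."""
    env = os.environ if env is None else env
    return bool(env.get(var_name, "").strip())


# Demonstrate with explicit dicts so the result does not depend on your shell
print(has_connection_string({DEFAULT_VAR: "UseDevelopmentStorage=true"}))
print(has_connection_string({}))
```

Calling `has_connection_string()` with no arguments checks the real process environment.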

## 5. Advanced Features

This section covers two advanced topics: custom configuration and error handling.

### 5.1 Custom Configuration

In [None]:
import yaml

# Create custom config
config = {
    'csv_delimiter': '|',
    'encoding': 'utf-8',
    'include_timestamp': True,
    'logging_level': 'DEBUG',
    'directory_structure': {
        'data': ['raw', 'interim', 'processed', 'external'],
        'reports': ['figures', 'tables']
    }
}

config_path = Path('custom_config.yaml')
with open(config_path, 'w') as f:
    yaml.dump(config, f)

# Initialize with custom config
custom_utils = FileUtils(config_file=config_path)

# Test custom delimiter
saved_files, _ = custom_utils.save_data_to_disk(
    data=df,
    output_filetype=OutputFileType.CSV,
    output_type="processed",
    file_name="custom_delim_data"
)

# View the file contents
with open(list(saved_files.values())[0], 'r') as f:
    print("CSV with custom delimiter:")
    print(f.readline().strip())  # Print header row
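
Any reader of the resulting file has to be told about the non-default delimiter. As a minimal illustration that does not depend on FileUtils, the standard-library `csv` module can parse pipe-delimited text like this (the sample text below is hypothetical and only mirrors the tutorial's column names):

```python
import csv
import io

# Hypothetical file contents mirroring the sample columns, written with '|'
text = "date|value|category\n2024-01-01|0.5|A\n2024-01-02|-0.1|B\n"

# Tell the reader which delimiter separates the fields
reader = csv.DictReader(io.StringIO(text), delimiter="|")
rows = list(reader)

print(reader.fieldnames)
print(rows[0]["category"])
```

With pandas, the equivalent is passing `sep="|"` to `pd.read_csv`.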

### 5.2 Error Handling

In [None]:
# Try to load non-existent file
try:
    file_utils.load_single_file("nonexistent.csv")
except FileNotFoundError as e:
    print(f"Expected error: {e}")

# Try invalid file type
try:
    file_utils.save_data_to_disk(
        data=df,
        output_filetype="invalid"
    )
except ValueError as e:
    print(f"Expected error: {e}")
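
One common way such a `ValueError` arises is enum lookup by value: constructing a Python `Enum` from a value that no member defines raises `ValueError`. The sketch below shows the mechanism with a hypothetical `FileKind` enum. It is a stand-in for illustration, not FileUtils' own `OutputFileType`, and the tutorial does not confirm that FileUtils validates input this way.

```python
from enum import Enum


class FileKind(Enum):
    """Hypothetical stand-in for an output-type enum; not FileUtils' own class."""
    CSV = "csv"
    XLSX = "xlsx"
    PARQUET = "parquet"


def parse_file_kind(value) -> FileKind:
    """Accept an enum member or its string value; raise ValueError otherwise."""
    if isinstance(value, FileKind):
        return value
    # Enum lookup by value raises ValueError when no member matches
    return FileKind(str(value).lower())


print(parse_file_kind("CSV"))
try:
    parse_file_kind("invalid")
except ValueError as e:
    print(f"Rejected: {e}")
```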

## Cleanup

Remove the custom config file created in section 5.1. The data files saved earlier in the tutorial are left in place:

In [None]:
# Remove custom config file
if config_path.exists():
    config_path.unlink()
print("Cleanup completed")
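
Removing the saved data files as well would require collecting their paths as each cell runs. Here is a minimal sketch, assuming you gather those paths yourself. The helper `remove_files` is hypothetical, and the demo works in a temporary directory rather than on the tutorial's real outputs.

```python
import tempfile
from pathlib import Path


def remove_files(paths) -> int:
    """Delete each existing file in `paths` and return how many were removed."""
    removed = 0
    for path in map(Path, paths):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.csv"
    a.write_text("x\n1\n", encoding="utf-8")
    missing = Path(tmp) / "missing.csv"  # never created, so it is skipped

    count = remove_files([a, missing])
    a_exists_after = a.exists()

print(count)
```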

## Next Steps

You've now seen the main features of FileUtils. Here are some suggestions for further exploration:

1. Customize the directory structure for your project
2. Experiment with different file formats and options
3. Set up Azure Storage for cloud integration
4. Create your own configuration file
5. Explore the logging capabilities

For more information, check the documentation, or open an issue on GitHub if you have questions.