# Multi Purpose Notebook

This notebook contains Python code for the following tasks:

- Testing parquet transformation
- Generating sample PDFs (receipts, invoices, etc.); the current layout is for a receipt
- Counting the number of blobs within a container (Azure Blob Storage)

## Testing parquet transformation

### Install the required packages

First, make sure the required packages are installed by running the following commands:


In [None]:
!pip install pandas
!pip install pyarrow

### Validating the transformation

Use the following Python code to examine the transformed parquet file:

In [None]:
import pandas as pd
pd.read_parquet('sample.parquet', engine='pyarrow')
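
Eyeballing the rendered frame works for a quick look, but a transformation check is easier to repeat if it asserts something. The following is a minimal sketch, assuming the transformed file is expected to contain a known set of columns; `missing_columns` and `check_parquet` are hypothetical helper names, not part of pandas.

```python
def missing_columns(actual, expected):
    """Return the expected column names that are absent from `actual`, in order."""
    present = set(actual)
    return [name for name in expected if name not in present]


def check_parquet(path, expected_columns):
    """Load a parquet file and fail loudly if expected columns are missing."""
    import pandas as pd  # imported here so missing_columns() stays dependency-free

    df = pd.read_parquet(path, engine="pyarrow")
    missing = missing_columns(df.columns, expected_columns)
    if missing:
        raise ValueError(f"{path} is missing columns: {missing}")
    print(f"{path}: {len(df)} rows, {len(df.columns)} columns")
    return df
```

A call such as `check_parquet('sample.parquet', ['id', 'amount'])` would then return the frame or raise; the column names here are placeholders for whatever the transformation is supposed to produce.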

## Creating Sample PDFs

This section addresses the need to create multiple types of receipts with random customer names and a random number of items, spanning multiple pages. The generated files are used to check the ingestion workflow for receipts.

### Install the required packages

In [None]:
!pip install reportlab
!pip install faker

### Selecting the right font

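The following code lists the fonts that ReportLab reports as available and writes each font name to a PDF in its own typeface. These are typically ReportLab's built-in standard PDF fonts rather than every font installed on the OS.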

In [None]:
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

def show_font_styles(filename='Font_Styles.pdf'):
    c = canvas.Canvas(filename, pagesize=letter)
    available_fonts = c.getAvailableFonts()
    y_position = 750  # Start position on the page for the first font
    c.setFont("Helvetica", 12)  # Set a default font for the title
    
    c.drawString(40, y_position + 20, "Available Fonts and their Styles:")
    y_position -= 30  # Move down for the first entry

    for font in available_fonts:
        c.setFont(font, 12)  # Set the font to each available typeface
        c.drawString(40, y_position, f"{font}")
        y_position -= 20  # Move down after each font name
        
        if y_position < 40:  # Check if we are near the bottom of the page
            c.showPage()
            y_position = 750  # Reset position at the top of a new page

    c.save()
    print(f"Font styles displayed in {filename}")

show_font_styles()
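
The function above mixes drawing with page-break bookkeeping. As a small sketch of just the bookkeeping, the generator below yields the page number and y coordinate for each line, using the same margins and line step as the cell above; `line_positions` is a hypothetical helper name, and it ignores the title offset that shifts the first page in `show_font_styles`.

```python
def line_positions(num_lines, top=750, bottom=40, step=20):
    """Yield (page, y) for each line, starting a new page near the bottom margin."""
    page, y = 1, top
    for _ in range(num_lines):
        yield page, y
        y -= step
        if y < bottom:  # the next line would fall below the margin
            page += 1
            y = top


positions = list(line_positions(40))
print(positions[35], positions[36])  # (1, 50) (2, 750)
```

With these defaults a page holds 36 lines, so the 37th line is the first one on page 2.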


### Generating Receipts PDF

The following code generates a sample receipt PDF with a random customer name, a random number of items, and a total computed from the random line items. It also adds signatures and signature dates to some of the files.

Once this code is loaded, it can be tested in two ways:

#### Single file generation

```python
create_random_invoice("Random_Invoice1.pdf", num_items=random.randint(3, 15))
```

#### Multiple file generation

```python
generate_multiple_invoices()
```


In [None]:
import os
import random
from faker import Faker
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle, PageBreak
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib import colors
from datetime import datetime, timedelta

# Initialize Faker
fake = Faker()

# Function to generate a random date within the last 30 days
def random_date_last_30_days():
    today = datetime.now()
    days_back = random.randint(0, 30)
    random_date = today - timedelta(days=days_back)
    return random_date.strftime("%Y-%m-%d")

def create_random_invoice(filename, num_items=5):
    # Prepare filename path, creating the output folder if it does not exist yet
    os.makedirs('generated', exist_ok=True)
    filename = os.path.join('generated', filename)

    # Create document template
    doc = SimpleDocTemplate(filename, pagesize=letter)
    story = []
    styles = getSampleStyleSheet()

    # Company and customer headers
    company_header = Paragraph("<font size=12><b>CONTOSO</b></font><br/>Innovation drives progress", styles["Heading2"])
    customer_name = fake.company()
    customer_address = fake.address().replace('\n', ', ')
    customer_info = Paragraph(f"<b>CUSTOMER:</b><br/>{customer_name}<br/>{customer_address}", styles["Normal"])
    
    # Align headers in a table for proper layout
    header_table = Table([[company_header, customer_info]], colWidths=[270, 270])
    header_table.setStyle(TableStyle([
        ('VALIGN', (0,0), (-1,-1), 'TOP'),
        ('ALIGN', (1,0), (1,0), 'RIGHT')
    ]))
    story.append(header_table)
    story.append(Spacer(1, 12))

    # Invoice details
    story.append(Paragraph(f"ISSUED: {fake.date_this_year()}", styles["Normal"]))
    story.append(Spacer(1, 20))

    # Table data
    header = [["PRODUCT ID", "UNIT PRICE", "QUANTITY", "TOTAL PRICE"]]
    data = []
    total_price = 0

    # Generate table data
    for i in range(num_items):
        product_id = f"{random.randint(1,100)}-{''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=2))}-XX"
        unit_price = random.uniform(0.5, 100.0)
        quantity = random.randint(1, 20)
        line_total = unit_price * quantity
        total_price += line_total
        data.append([product_id, f"{unit_price:.2f}", str(quantity), f"{line_total:.2f}"])

    data.append(["TOTAL", "", "", f"{total_price:.2f}"])

    # Splitting data for the first page and subsequent pages
    first_page_data = header + data[:20]  # Including header
    subsequent_data = data[20:]           # No header for subsequent pages

    # Define table style
    table_style = TableStyle([
        ('BACKGROUND', (0,0), (-1,0), colors.grey),
        ('TEXTCOLOR', (0,0), (-1,0), colors.whitesmoke),
        ('ALIGN', (0,0), (-1,-1), 'CENTER'),
        ('FONTNAME', (0,0), (-1,0), 'Helvetica-Bold'),
        ('BOTTOMPADDING', (0,0), (-1,0), 12),
        ('BACKGROUND', (0,1), (-1,-1), colors.beige),
    ])

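    # First page table (header row plus up to 20 data rows)
    table1 = Table(first_page_data, colWidths=[100, 100, 100, 100], repeatRows=1, style=table_style)
    story.append(table1)

    # Subsequent pages table, added only when rows overflow the first page
    if subsequent_data:
        story.append(PageBreak())  # Start the remaining rows on a new page
        # These rows have no header, so skip the header styling from table_style
        body_style = TableStyle([
            ('ALIGN', (0,0), (-1,-1), 'CENTER'),
            ('BACKGROUND', (0,0), (-1,-1), colors.beige),
        ])
        table2 = Table(subsequent_data, colWidths=[100, 100, 100, 100], repeatRows=0, style=body_style)
        story.append(table2)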

    # Space before signatures
    story.append(Spacer(1, 12 * 5))  # 5 lines of space
    
    # Signature placeholders
    distributor_signed = random.choice([True, False])
    customer_signed = random.choice([True, False])
    
    distributor_date = random_date_last_30_days() if distributor_signed else ""
    customer_date = random_date_last_30_days() if customer_signed else ""


    signatures_data = [
        ["Distributor Signature:", "John Doe" if distributor_signed else "", "Date:", "", distributor_date],
        ["Customer Signature:", "~~/\\/\\~~" if customer_signed else "", "Date:", "", customer_date],
    ]


    # Adjust column widths to accommodate the new structure
    signature_table = Table(signatures_data, colWidths=[150, 150, 50, 5, 95])  # Adjust colWidths as needed

    # Style adjustments, ensuring 'Date:' label is always visible
    signature_table.setStyle(TableStyle([
        ('SPAN', (2,0), (3,0)),  # Span 'Date:' label over an empty column for alignment
        ('SPAN', (2,1), (3,1)),  # Repeat for the second row
        ('ALIGN', (1,0), (1,-1), 'CENTER'),  # Center align the signature placeholders
        ('ALIGN', (4,0), (4,-1), 'CENTER'),  # Center align the actual date
        ('FONTNAME', (1,0), (1,-1), 'Times-Italic'),  # Use a more 'handwritten' font if available
    ]))

    story.append(signature_table)

    # Build the document
    doc.build(story)



def generate_multiple_invoices():
    # Random number of invoices to generate
    num_invoices = random.randint(1, 10)
    print(f"Generating {num_invoices} invoices...")

    for _ in range(num_invoices):
        # Create a random filename for each invoice
        filename = f"{fake.unique.word()}_Invoice_{fake.random_int(min=100, max=999)}.pdf"
        # Random number of items in each invoice
        num_items = random.randint(20, 50)
        # Call the invoice creation function
        create_random_invoice(filename, num_items=num_items)
        print(f"Generated invoice '{filename}' with {num_items} items.")



In [None]:
# Testing single file generation
create_random_invoice("Random_Invoice1.pdf", num_items=random.randint(3, 15))
# Testing multiple file generation
generate_multiple_invoices()
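
Because every run produces different receipts, an ingestion failure can be hard to reproduce. One option is to draw the line items from a seeded generator so that a given seed always yields the same data. The sketch below is an assumption about how that could look, not part of the generator above: `make_line_items` is a hypothetical helper, and it rounds the unit price to cents before multiplying so that the printed price, quantity, and line total reconcile.

```python
import random


def make_line_items(num_items, seed=None):
    """Return (unit_price, quantity, line_total) tuples; repeatable for a given seed."""
    rng = random.Random(seed)  # private generator; leaves the global random state alone
    items = []
    for _ in range(num_items):
        unit_price = round(rng.uniform(0.5, 100.0), 2)
        quantity = rng.randint(1, 20)
        items.append((unit_price, quantity, round(unit_price * quantity, 2)))
    return items


first = make_line_items(5, seed=42)
second = make_line_items(5, seed=42)
print(first == second)  # True: the same seed gives the same items
```

Faker values can be pinned in a similar way by calling `Faker.seed(...)` before generating names and addresses.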

## Counting Blobs in a container

The following is sample Python code that uses the Azure Blob Storage SDK to count the number of blobs in a container.

### Install the required packages

In [None]:
!pip install azure-storage-blob
!pip install nest_asyncio
!pip install aiohttp


### Counting the blobs

The only change required here is to set the connection string for the blob storage account and the container name.
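
Typing the connection string directly into a notebook makes it easy to commit a secret by accident. One alternative, sketched here under assumptions, is to read both values from environment variables. `AZURE_STORAGE_CONNECTION_STRING` is the name commonly used in Azure samples; `BLOB_CONTAINER_NAME` and the `load_blob_settings` helper are hypothetical names chosen for this example.

```python
import os


def load_blob_settings(env=None):
    """Return (connection_string, container_name) or raise if either is unset."""
    env = os.environ if env is None else env
    connection_string = env.get("AZURE_STORAGE_CONNECTION_STRING", "")
    container_name = env.get("BLOB_CONTAINER_NAME", "")  # hypothetical variable name
    if not connection_string or not container_name:
        raise ValueError(
            "Set AZURE_STORAGE_CONNECTION_STRING and BLOB_CONTAINER_NAME first"
        )
    return connection_string, container_name
```

The two returned values could then be assigned in place of the empty strings in the counting function.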


In [None]:
import asyncio
from azure.storage.blob.aio import BlobServiceClient
import nest_asyncio

# Apply nest_asyncio to enable running in notebooks or other already running event loops
nest_asyncio.apply()

async def list_blobs():
    connection_string = ""  # Set the storage account connection string
    container_name = ""     # Set the container name

    count = 0
    # Use the client as an async context manager so its connections are closed afterwards
    async with BlobServiceClient.from_connection_string(connection_string) as blob_service_client:
        container_client = blob_service_client.get_container_client(container_name)

        # list_blobs() returns an async pager; iterate over it page by page
        async for page in container_client.list_blobs().by_page():
            blobs = [blob async for blob in page]
            count += len(blobs)

    print(f"Number of blobs in the container: {count}")

# Execute the asynchronous function using asyncio
asyncio.get_event_loop().run_until_complete(list_blobs())
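
A single total does not show whether the container holds the expected mix of files, for example PDFs versus parquet output. As a rough sketch, the same async iteration pattern can tally names by file extension. `count_by_extension` and `sample_names` are hypothetical helpers, and the sample names stand in for real blob names so the block runs without a storage account.

```python
import asyncio
import os
from collections import Counter


async def count_by_extension(names):
    """Tally names from any async iterable by lowercase file extension."""
    counts = Counter()
    async for name in names:
        counts[os.path.splitext(name)[1].lower()] += 1
    return counts


async def sample_names():
    """Stand-in for blob names; in the notebook these would come from the container."""
    for name in ["a.pdf", "b.PDF", "data/c.parquet", "README"]:
        yield name


counts = asyncio.run(count_by_extension(sample_names()))
print(counts[".pdf"], counts[".parquet"], counts[""])  # 2 1 1
```

Inside an async function with a real container client, the input could be an async generator expression such as `(blob.name async for blob in container_client.list_blobs())`. In a notebook, `asyncio.run` relies on `nest_asyncio.apply()` having been called, as in the cell above.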
