# JMESPath Proxy Types Tutorial

This notebook shows how to use JMESPath with the custom proxy types `JMESPathArrayProxy` and `JMESPathObjectProxy` provided by the Evo SDK.

## Overview

The Evo SDK provides enhanced JMESPath functionality through proxy types that:
- Wrap JSON-like data structures to enable JMESPath queries as attribute/index access
- Provide convenient serialization methods
- Allow seamless chaining of JMESPath expressions

## Key Features

- `JMESPathObjectProxy`: Wraps dictionaries/mappings to enable JMESPath queries via `obj["expression"]`
- `JMESPathArrayProxy`: Wraps lists/sequences to enable JMESPath queries and indexing
- Automatic proxy wrapping of nested structures
- JSON serialization with support for UUID and MappingProxyType

In [None]:
# Import the JMESPath module from the Evo SDK
from uuid import uuid4

from evo.jmespath import compile, proxy, search

## Sample Data

Let's create some sample data structures that we'll use throughout the examples:

In [None]:
# Sample data structure representing a company with employees and projects
company_data = {
    "name": "TechCorp",
    "founded": 2010,
    "id": uuid4(),
    "employees": [
        {
            "id": uuid4(),
            "name": "Alice Johnson",
            "department": "Engineering",
            "age": 32,
            "skills": ["Python", "JavaScript", "Docker"],
            "projects": [
                {"name": "Project Alpha", "status": "completed", "priority": "high"},
                {"name": "Project Beta", "status": "active", "priority": "medium"},
            ],
        },
        {
            "id": uuid4(),
            "name": "Bob Smith",
            "department": "Engineering",
            "age": 28,
            "skills": ["Java", "Kubernetes", "Python"],
            "projects": [
                {"name": "Project Gamma", "status": "active", "priority": "high"},
                {"name": "Project Delta", "status": "planning", "priority": "low"},
            ],
        },
        {
            "id": uuid4(),
            "name": "Carol Davis",
            "department": "Marketing",
            "age": 35,
            "skills": ["Social Media", "Analytics", "Content Creation"],
            "projects": [{"name": "Campaign X", "status": "completed", "priority": "high"}],
        },
    ],
    "departments": {
        "Engineering": {"budget": 500000, "head": "David Wilson"},
        "Marketing": {"budget": 200000, "head": "Emma Thompson"},
    },
}

print("Sample data created with", len(company_data["employees"]), "employees")

## 1. Basic JMESPath Operations

### Creating Proxy Objects

First, let's see how to create proxy objects using the `proxy()` function:

In [None]:
# Create proxy objects
company_proxy = proxy(company_data)
employees_proxy = proxy(company_data["employees"])

print("Company proxy type:", type(company_proxy))
print("Employees proxy type:", type(employees_proxy))

# The original, unwrapped data stays accessible through the .raw attribute
print("\nOriginal data is preserved:")
print("Company name:", company_proxy.raw["name"])
print("Number of employees:", len(employees_proxy.raw))
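
To make the idea of a wrapping proxy concrete, here is a minimal sketch using only the standard library. `MiniProxy` and `wrap` are hypothetical names invented for this illustration; they are not the Evo SDK classes and support only plain key and index lookups, not JMESPath expressions. The sketch shows how a wrapper might keep the original data on `.raw` and wrap nested containers on access so that lookups can be chained.

```python
from collections.abc import Mapping, Sequence


class MiniProxy:
    """Hypothetical stand-in for a proxy type; not the Evo SDK implementation."""

    def __init__(self, raw):
        self.raw = raw  # keep a reference to the original, unwrapped data

    def __getitem__(self, key):
        # Plain key or index lookup only; no JMESPath parsing happens here.
        return wrap(self.raw[key])

    def __len__(self):
        return len(self.raw)


def wrap(value):
    """Wrap mappings and non-string sequences; return scalars unchanged."""
    is_container = isinstance(value, Mapping) or (
        isinstance(value, Sequence) and not isinstance(value, (str, bytes))
    )
    return MiniProxy(value) if is_container else value


data = {"name": "TechCorp", "employees": [{"name": "Alice Johnson"}]}
company = wrap(data)

print(company["name"])                  # scalars come back unwrapped
print(type(company["employees"]))       # nested containers come back wrapped
print(company["employees"][0]["name"])  # so lookups can be chained
print(company.raw is data)              # .raw exposes the original object
```

Strings are excluded from wrapping on purpose: they are sequences in Python, but treating them as arrays would make every text value a proxy.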

### Basic Identifiers and Subexpressions

JMESPath identifiers select keys from JSON objects. Subexpressions allow access to nested values:

In [None]:
# Using search() function - traditional JMESPath approach
print("=== Using search() function ===")
print("Company name:", search("name", company_data))
print("Founded year:", search("founded", company_data))
print("Engineering budget:", search("departments.Engineering.budget", company_data))

print("\n=== Using JMESPathObjectProxy ===")
# Using proxy objects with [] notation for JMESPath expressions
print("Company name:", company_proxy["name"])
print("Founded year:", company_proxy["founded"])
print("Engineering budget:", company_proxy["departments.Engineering.budget"])

# Nested access with subexpressions
print("Marketing head:", company_proxy["departments.Marketing.head"])
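
For intuition, a subexpression such as `departments.Engineering.budget` behaves like a chain of key lookups in which a missing key yields null (`None` in Python) instead of raising an error. The sketch below mimics that behaviour with the standard library only. `get_path` is a hypothetical helper written for this example; it handles plain dotted identifiers and none of the other JMESPath syntax.

```python
from functools import reduce


def get_path(data, path):
    """Resolve a dotted path of plain keys; return None if any step is missing."""

    def step(current, key):
        # Only dictionaries have keys; anything else resolves to None.
        return current.get(key) if isinstance(current, dict) else None

    return reduce(step, path.split("."), data)


company = {
    "name": "TechCorp",
    "departments": {
        "Engineering": {"budget": 500000, "head": "David Wilson"},
        "Marketing": {"budget": 200000, "head": "Emma Thompson"},
    },
}

print(get_path(company, "departments.Engineering.budget"))  # 500000
print(get_path(company, "departments.Sales.budget"))        # None
```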

## 2. Array Operations and Indexing

The `JMESPathArrayProxy` provides both standard array indexing and JMESPath expressions:

In [None]:
# Array indexing with JMESPathArrayProxy
print("=== Array Indexing ===")
print("First employee:", employees_proxy[0]["name"])
print("Last employee:", employees_proxy[-1]["name"])
print("Second employee's department:", employees_proxy[1]["department"])

print("\n=== JMESPath expressions on arrays ===")
# Using JMESPath expressions as strings with arrays
print("All employee names:", employees_proxy["[*].name"])
print("All departments:", employees_proxy["[*].department"])

# Accessing nested array elements
print("\n=== Nested array access ===")
print("First employee's first skill:", employees_proxy[0]["skills[0]"])
print("All skills of first employee:", employees_proxy[0]["skills"])

# Slicing operations
print("\n=== Array slicing ===")
numbers = proxy([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print("First 5 numbers:", numbers["[:5]"])
print("Last 5 numbers:", numbers["[5:]"])
print("Every other number:", numbers["[::2]"])
print("Reverse order:", numbers["[::-1]"])
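
JMESPath slices use the same `[start:stop:step]` form as Python slices, so the four expressions above can be checked against plain list slicing. This comparison uses only built-in lists and does not involve the proxy types.

```python
numbers = list(range(10))

first_five = numbers[:5]      # JMESPath: [:5]
last_five = numbers[5:]       # JMESPath: [5:]
every_other = numbers[::2]    # JMESPath: [::2]
reversed_order = numbers[::-1]  # JMESPath: [::-1]

print(first_five)      # [0, 1, 2, 3, 4]
print(last_five)       # [5, 6, 7, 8, 9]
print(every_other)     # [0, 2, 4, 6, 8]
print(reversed_order)  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```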

## 3. Projections

Projections apply an expression to each element of a collection and collect the results into a list, skipping elements that evaluate to null:

In [None]:
# List projections
print("=== List Projections ===")
print("All employee names:", company_proxy["employees[*].name"])
print("All employee ages:", company_proxy["employees[*].age"])

# Object projections
print("\n=== Object Projections ===")
print("All department budgets:", company_proxy["departments.*.budget"])
print("All department heads:", company_proxy["departments.*.head"])

# Nested projections
print("\n=== Nested Projections ===")
print("All project names:", company_proxy["employees[*].projects[*].name"])
print("All skills:", company_proxy["employees[*].skills[*]"])

# Flatten projections
print("\n=== Flatten Projections ===")
# Compare [*] vs [] - the latter flattens the result
nested_projects = company_proxy["employees[*].projects[*].name"]
flattened_projects = company_proxy["employees[].projects[].name"]

print("Nested structure:", type(nested_projects), "- length:", len(nested_projects))
print("Flattened structure:", type(flattened_projects), "- length:", len(flattened_projects))
print("Flattened projects:", flattened_projects)
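
The difference between `[*]` and `[]` is easiest to see next to equivalent list comprehensions. The sketch below uses a trimmed copy of the sample data and plain Python only; it illustrates the shape of the results rather than how the SDK evaluates the expressions.

```python
employees = [
    {"name": "Alice Johnson", "projects": [{"name": "Project Alpha"}, {"name": "Project Beta"}]},
    {"name": "Bob Smith", "projects": [{"name": "Project Gamma"}, {"name": "Project Delta"}]},
    {"name": "Carol Davis", "projects": [{"name": "Campaign X"}]},
]

# employees[*].projects[*].name keeps one inner list per employee
nested = [[project["name"] for project in emp["projects"]] for emp in employees]

# employees[].projects[].name merges the inner lists into a single list
flattened = [project["name"] for emp in employees for project in emp["projects"]]

print(len(nested), nested)
print(len(flattened), flattened)
```

The nested form has three entries (one per employee), while the flattened form has five (one per project).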

In [None]:
# Filter projections - finding specific data
print("=== Filter Projections ===")

# Find employees in Engineering department
engineers = company_proxy["employees[?department=='Engineering'].name"]
print("Engineers:", engineers)

# Find active projects
active_projects = company_proxy["employees[].projects[?status=='active'].name"]
print("Active projects:", active_projects)

# Find high priority projects
high_priority = company_proxy["employees[].projects[?priority=='high']"]
print("High priority projects:", high_priority)

# Complex filters - employees older than 30
experienced = company_proxy["employees[?age>`30`].name"]
print("Employees over 30:", experienced)

# Find employees with Python skills
python_devs = company_proxy["employees[?contains(skills, 'Python')].name"]
print("Python developers:", python_devs)
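
Filter projections correspond closely to list comprehensions with a condition. As a cross-check for the queries above, this sketch reproduces three of them in plain Python on a trimmed copy of the sample data.

```python
employees = [
    {"name": "Alice Johnson", "department": "Engineering", "age": 32,
     "skills": ["Python", "JavaScript", "Docker"]},
    {"name": "Bob Smith", "department": "Engineering", "age": 28,
     "skills": ["Java", "Kubernetes", "Python"]},
    {"name": "Carol Davis", "department": "Marketing", "age": 35,
     "skills": ["Social Media", "Analytics", "Content Creation"]},
]

# employees[?department=='Engineering'].name
engineers = [e["name"] for e in employees if e["department"] == "Engineering"]

# employees[?age>`30`].name
over_30 = [e["name"] for e in employees if e["age"] > 30]

# employees[?contains(skills, 'Python')].name
python_devs = [e["name"] for e in employees if "Python" in e["skills"]]

print(engineers)    # ['Alice Johnson', 'Bob Smith']
print(over_30)      # ['Alice Johnson', 'Carol Davis']
print(python_devs)  # ['Alice Johnson', 'Bob Smith']
```

In the JMESPath versions, the backticks around `30` mark a JSON literal (a number), while single quotes mark a raw string.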

## 4. Pipe Expressions and MultiSelect

A pipe expression (`|`) passes the complete result of its left side to its right side, which ends any projection in progress. MultiSelect builds new lists or objects from selected values:

In [None]:
# Pipe expressions - stopping projections and chaining operations
print("=== Pipe Expressions ===")

# Get first employee name
first_employee = company_proxy["employees[*].name | [0]"]
print("First employee:", first_employee)

# Get sorted list of employee names
sorted_names = company_proxy["employees[*].name | sort(@)"]
print("Sorted names:", sorted_names)

# MultiSelect lists - creating arrays of selected values
print("\n=== MultiSelect Lists ===")
employee_summary = company_proxy["employees[*].[name, age, department]"]
print("Employee summaries:", employee_summary)

# MultiSelect hashes - creating objects with selected keys
print("\n=== MultiSelect Hashes ===")
employee_profiles = company_proxy["employees[*].{Name: name, Age: age, Dept: department, SkillCount: length(skills)}"]
print("Employee profiles:")
for profile in employee_profiles:
    print(f"  {profile}")

# Complex multiselect with nested data
project_info = company_proxy["employees[*].{Employee: name, Projects: projects[*].{Name: name, Status: status}}"]
print("\nProject information:")
for info in project_info:
    print(f"  {info['Employee']}: {len(info['Projects'])} projects")
    for proj in info["Projects"]:
        print(f"    - {proj['Name']} ({proj['Status']})")
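
The following plain-Python sketch mirrors the pipe and MultiSelect queries above on a trimmed copy of the sample data. It is meant only to show what shape each expression produces: the pipe operates on the whole list of names, a MultiSelect list yields one small list per employee, and a MultiSelect hash yields one dictionary per employee.

```python
employees = [
    {"name": "Alice Johnson", "age": 32, "department": "Engineering",
     "skills": ["Python", "JavaScript", "Docker"]},
    {"name": "Bob Smith", "age": 28, "department": "Engineering",
     "skills": ["Java", "Kubernetes", "Python"]},
    {"name": "Carol Davis", "age": 35, "department": "Marketing",
     "skills": ["Social Media", "Analytics", "Content Creation"]},
]

names = [e["name"] for e in employees]  # employees[*].name
first_employee = names[0]               # employees[*].name | [0]
sorted_names = sorted(names)            # employees[*].name | sort(@)

# employees[*].[name, age, department]
summaries = [[e["name"], e["age"], e["department"]] for e in employees]

# employees[*].{Name: name, Age: age, SkillCount: length(skills)}
profiles = [
    {"Name": e["name"], "Age": e["age"], "SkillCount": len(e["skills"])}
    for e in employees
]

print(first_employee)  # Alice Johnson
print(summaries[1])    # ['Bob Smith', 28, 'Engineering']
print(profiles[2])     # {'Name': 'Carol Davis', 'Age': 35, 'SkillCount': 3}
```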

## 5. Functions

JMESPath provides many built-in functions for data transformation and analysis:

In [None]:
# Length and aggregation functions
print("=== Length and Aggregation ===")
print("Number of employees:", company_proxy["length(employees)"])
print("Number of departments:", company_proxy["length(departments)"])

# Math functions
print("\n=== Math Functions ===")
print("Average age:", company_proxy["avg(employees[*].age)"])
print("Max age:", company_proxy["max(employees[*].age)"])
print("Min age:", company_proxy["min(employees[*].age)"])
print("Sum of all ages:", company_proxy["sum(employees[*].age)"])

# Sorting functions
print("\n=== Sorting Functions ===")
print("Employees by age (ascending):", company_proxy["sort_by(employees, &age)[*].{name: name, age: age}"])
print("Oldest employee:", company_proxy["max_by(employees, &age).name"])
print("Youngest employee:", company_proxy["min_by(employees, &age).name"])

# Collecting and sorting values
print("\n=== Collecting and Sorting Skills ===")
all_skills = company_proxy["employees[].skills[]"]
print("All skills, flattened:", all_skills)
print("All skills, sorted (duplicates kept):", company_proxy["employees[].skills[] | sort(@)"])

# Type functions
print("\n=== Type Functions ===")
print("Company ID as string:", company_proxy["to_string(id)"])
print("Founded year as string:", company_proxy["to_string(founded)"])

# Advanced: Finding employees with most projects
print("\n=== Advanced: Most Active Employee ===")
most_active = company_proxy["max_by(employees, &length(projects))"]
print(f"Most active employee: {most_active['name']} with {len(most_active['projects'])} projects")
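
To sanity-check the aggregation and sorting results, the same values can be computed with built-in Python functions. The sketch below uses a trimmed copy of the sample data. It also shows one way to remove duplicate skills in Python, since `sort(@)` orders a list but does not deduplicate it.

```python
from statistics import mean

employees = [
    {"name": "Alice Johnson", "age": 32, "skills": ["Python", "JavaScript", "Docker"]},
    {"name": "Bob Smith", "age": 28, "skills": ["Java", "Kubernetes", "Python"]},
    {"name": "Carol Davis", "age": 35, "skills": ["Social Media", "Analytics", "Content Creation"]},
]

ages = [e["age"] for e in employees]  # employees[*].age
average_age = mean(ages)              # avg(employees[*].age)
total_age = sum(ages)                 # sum(employees[*].age)

# sort_by(employees, &age)[*].name
by_age = [e["name"] for e in sorted(employees, key=lambda e: e["age"])]
oldest = max(employees, key=lambda e: e["age"])["name"]    # max_by(employees, &age).name
youngest = min(employees, key=lambda e: e["age"])["name"]  # min_by(employees, &age).name

all_skills = [skill for e in employees for skill in e["skills"]]  # employees[].skills[]
unique_skills = sorted(set(all_skills))  # deduplicated in Python, not in JMESPath

print(round(average_age, 2), max(ages), min(ages), total_age)
print(by_age, oldest, youngest)
print(len(all_skills), len(unique_skills))
```

"Python" appears in two skill lists, so the flattened list has nine entries and the deduplicated list has eight.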

## 6. Working with Compiled Expressions

For better performance when running the same query multiple times, you can compile expressions:

In [None]:
import time

# Compile expressions for reuse
print("=== Compiled Expressions ===")

# Compile common queries
get_names = compile("employees[*].name")
get_engineers = compile("employees[?department=='Engineering']")
get_active_projects = compile("employees[].projects[?status=='active'].name")

# Use compiled expressions
print("Names:", get_names.search(company_data))
print("Engineers:", [emp["name"] for emp in get_engineers.search(company_data)])
print("Active projects:", get_active_projects.search(company_data))

# The results are automatically wrapped in proxy types
names_result = get_names.search(company_data)
print("\nResult type:", type(names_result))
print("Can iterate:", list(names_result))

# Performance benefit: compile once, use many times.
# Time repeated search() calls against a single compiled expression.
test_data = [company_data] * 1000

start = time.perf_counter()
for data in test_data:
    search("employees[*].name", data)
search_time = time.perf_counter() - start

# The compile step is included in the timing so the comparison is fair
start = time.perf_counter()
compiled_expr = compile("employees[*].name")
for data in test_data:
    compiled_expr.search(data)
compile_time = time.perf_counter() - start

print(f"\nPerformance comparison ({len(test_data)} iterations):")
print(f"Using search(): {search_time:.4f} seconds")
print(f"Using compile(): {compile_time:.4f} seconds")
print(f"Speedup: {search_time / compile_time:.2f}x")
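
Single-run timings like the one above can be noisy. A more robust approach, sketched here with the standard-library `timeit` module, repeats the measurement and keeps the fastest run. The `parse` and `evaluate` functions are hypothetical stand-ins for compilation and evaluation so that the example runs without the SDK; to measure the real thing, the two workload functions could call `search(...)` and `compiled_expr.search(...)` instead.

```python
import timeit

data = {"departments": {"Engineering": {"budget": 500000}}}
EXPRESSION = "departments.Engineering.budget"


def parse(expression):
    """Stand-in for compilation: split a dotted path into a tuple of keys."""
    return tuple(expression.split("."))


def evaluate(keys, document):
    """Stand-in for evaluation: follow the keys through nested dictionaries."""
    for key in keys:
        document = document[key]
    return document


def search_each_time():
    return evaluate(parse(EXPRESSION), data)  # parses on every call


compiled = parse(EXPRESSION)  # parsed once, up front


def search_compiled():
    return evaluate(compiled, data)  # reuses the parsed form


def best_time(func, number=10_000, repeat=5):
    """Return the fastest total time in seconds across `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


print(f"Parse every call: {best_time(search_each_time):.4f} seconds")
print(f"Parse once:       {best_time(search_compiled):.4f} seconds")
```

Taking the minimum of several runs reduces the influence of other activity on the machine. The actual speedup from compiling depends on the expression and on whether the underlying library already caches parsed expressions.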

## 7. JSON Serialization Features

The proxy types provide enhanced JSON serialization with support for UUID and other special types:

In [None]:
# JSON serialization with proxy types
print("=== JSON Serialization ===")

# Get a subset of data
employee_subset = company_proxy["employees[0]"]
print("Employee data type:", type(employee_subset))

# The json_dumps method handles UUIDs and other special types automatically
print("\nSerialized employee:")
print(employee_subset.json_dumps(indent=2))

# Compare with the standard json.dumps, which raises TypeError for UUID values
import json

try:
    json.dumps(employee_subset.raw)
    print("Standard json.dumps worked")
except TypeError as e:
    print(f"Standard json.dumps failed: {e}")

# Demonstrate pretty printing with __repr__
print("\n=== Pretty Printing ===")
skills_proxy = company_proxy["employees[0].skills"]
print("Skills representation:")
print(repr(skills_proxy))

# Working with nested proxy results
departments_subset = company_proxy["departments"]
print(f"\nDepartments type: {type(departments_subset)}")
print("Departments JSON:")
print(departments_subset.json_dumps(indent=2))
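
For comparison, the standard library can serialize the same special types when `json.dumps` is given a `default` callback. The sketch below shows that approach. `encode_special` is a hypothetical helper written for this example, and nothing here is claimed to reflect how the SDK's `json_dumps` method is implemented.

```python
import json
from types import MappingProxyType
from uuid import UUID


def encode_special(value):
    """Fallback for json.dumps: convert types the stdlib encoder rejects."""
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, MappingProxyType):
        return dict(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


record = {
    "id": UUID("12345678-1234-5678-1234-567812345678"),
    "settings": MappingProxyType({"active": True}),
}

text = json.dumps(record, default=encode_special, indent=2)
print(text)
```

The callback is invoked only for values the encoder cannot handle itself, so ordinary strings, numbers, lists, and dictionaries are unaffected.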

## Summary

This tutorial demonstrated the key features of the Evo SDK's JMESPath proxy types:

### JMESPathObjectProxy
- Wraps dictionaries and allows JMESPath queries via `obj["expression"]`
- Supports nested object navigation with dot notation
- Automatically returns proxy-wrapped results for chaining

### JMESPathArrayProxy
- Wraps lists and supports both integer indexing and JMESPath expressions
- Enables array slicing with JMESPath syntax
- Supports iteration with automatic proxy wrapping

### Key Benefits
- **Seamless integration**: Use JMESPath expressions as dictionary keys or array indices
- **Automatic wrapping**: Results are automatically wrapped in appropriate proxy types
- **Enhanced serialization**: Built-in support for UUID and other special types
- **Performance**: Compile expressions for repeated use
- **Chaining**: Natural chaining of JMESPath operations

### Best Practices
1. Use `proxy()` to wrap your data structures for JMESPath operations
2. Compile expressions when using them repeatedly for better performance
3. Leverage the `json_dumps()` method for serialization with special type support
4. Chain operations naturally using the proxy types
5. Use filter expressions to find specific data efficiently

The proxy types make JMESPath feel more natural in Python while providing all the power of the JMESPath query language!