### What is Pickling?

    Pickling is the process of serializing Python objects: transforming them into a byte stream that can be stored on disk or transmitted over a network.

It is useful for saving and reloading large datasets or complex objects between runs.
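A minimal sketch of the round trip described above, using `pickle.dumps` and `pickle.loads` on an in-memory byte stream. The `record` dictionary is a hypothetical stand-in for real well data.

```python
import pickle

# Hypothetical record, used only for illustration
record = {"well_id": "WELL-001", "location": (35.234, 45.678), "depth": 10000}

# Serialize the object into an in-memory byte stream
payload = pickle.dumps(record)
print(type(payload).__name__)  # bytes

# Deserialize the byte stream into an equal, but distinct, object
restored = pickle.loads(payload)
print(restored == record)  # True
print(restored is record)  # False
```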

### Why Use Pickling?

    Efficiency: Often faster than re-reading CSV or JSON files, especially for large datasets, because no text parsing or type inference is needed on load.
    Preservation of Object Structure: Maintains the original Python structure and data types (for example, tuples stay tuples).
    Cross-Session Persistence: Allows data saved in one Python session to be reused in another.
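To illustrate the point about preserved data types, the sketch below sends the same coordinate tuple through JSON and through pickle. The `location` value is taken from the well data; the variable names are illustrative only.

```python
import json
import pickle

location = (35.234, 45.678)  # a tuple, as used later for well coordinates

# JSON has no tuple type, so the tuple comes back as a list
via_json = json.loads(json.dumps(location))

# pickle records the Python type and restores it unchanged
via_pickle = pickle.loads(pickle.dumps(location))

print(type(via_json).__name__)    # list
print(type(via_pickle).__name__)  # tuple
```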

### Data Preparation

    Reading CSV Data:
        Using pandas.read_csv to load the oil well data from the CSV file.

In [1]:
import pandas as pd
from collections import namedtuple
import pickle
import locale
# Read oil well data from csv file
df = pd.read_csv('oil_wells.csv')
df

    well_id  latitude  longitude  production_rate  depth  age well_type
0  WELL-001    35.234     45.678             1000  10000   10       oil
1  WELL-002    34.987     46.123              800   9500   15       gas
2  WELL-003    35.567     45.890             1200  11000    8       oil
3  WELL-004    34.789     46.345              950  10500   12      both
4  WELL-005    35.123     45.987             1100  10800    9       oil
5  WELL-006    34.876     46.234              750   9000   16       gas
6  WELL-007    35.456     45.789             1700  11500    7       oil
7  WELL-008    34.678     46.456             1050  10200   11      both
8  WELL-009    35.345     45.654              900   9800   13       gas
9  WELL-010    34.987     46.123             1250  11200    6       oil


Creating Namedtuples:
    Defining a namedtuple to represent oil well data, improving code readability and maintainability.

In [2]:
# Define a namedtuple for oil well data
OilWell = namedtuple(
    'OilWell',
    ['well_id', 'location', 'production_rate', 'depth', 'age', 'well_type'],
)

def _str(well):
    return f"Well ID: {well.well_id}, Production Rate: {well.production_rate}"

OilWell.__str__ = _str
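An alternative way to attach the custom string form is to subclass the namedtuple instead of assigning `__str__` afterwards. This is only a sketch of that variant, not the notebook's approach; the field names match the definition above, and the sample values come from the first row of the CSV.

```python
from collections import namedtuple


class OilWell(namedtuple(
    "OilWell",
    ["well_id", "location", "production_rate", "depth", "age", "well_type"],
)):
    """Oil well record with a readable string form."""

    __slots__ = ()  # no per-instance __dict__, like a plain namedtuple

    def __str__(self):
        return f"Well ID: {self.well_id}, Production Rate: {self.production_rate}"


well = OilWell("WELL-001", (35.234, 45.678), 1000, 10000, 10, "oil")
print(well)           # Well ID: WELL-001, Production Rate: 1000
print(well.location)  # (35.234, 45.678)
```

Keeping the formatting inside the class definition means the behaviour travels with the class wherever it is imported.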

Converting to Python Objects:
    Iterating over the DataFrame to create a list of OilWell objects.

In [3]:
oil_wells = []
for index, row in df.iterrows():
    well = OilWell(
        well_id=row['well_id'],
        location=(row['latitude'], row['longitude']),
        production_rate=row['production_rate'],
        depth=row['depth'],
        age=row['age'],
        well_type=row['well_type']
    )
    oil_wells.append(well)

### Pickling the Data

    Opening a Binary File:
        Using open with 'wb' mode to create a binary file for writing.
    Dumping the Data:
        Using pickle.dump to serialize the oil_wells list into the binary file.

In [4]:
with open('oil_wells.pkl', 'wb') as f:
    pickle.dump(oil_wells, f)
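`pickle.dump` also accepts a `protocol` argument. The sketch below passes `pickle.HIGHEST_PROTOCOL`, which selects the newest format the running interpreter supports; older Python versions may not be able to read such a file. The `data` list is a simplified stand-in for `oil_wells`, and a temporary directory is used so no file is left behind.

```python
import os
import pickle
import tempfile

# Simplified stand-in for the oil_wells list
data = [("WELL-001", 1000), ("WELL-002", 800)]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "oil_wells.pkl")
    # 'wb' is required because pickle writes bytes, not text
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    size_in_bytes = os.path.getsize(path)

print(size_in_bytes > 0)  # True
```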

### Loading the Pickled Data

    Opening a Binary File:
        Using open with 'rb' mode to open the binary file for reading.
    Loading the Data:
        Using pickle.load to deserialize the data from the binary file back into a Python object.

In [5]:
# Set the locale (not required by pickle itself)
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
# Load the pickled oil well data
with open('oil_wells.pkl', 'rb') as file_handle:
    oil_wells = pickle.load(file_handle)
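The pickle documentation warns that unpickling can execute arbitrary code, so `pickle.load` should only be used on files from a trusted source. The sketch below shows a load followed by a basic sanity check on the result. It uses an in-memory buffer and a simplified stand-in list rather than the real `oil_wells.pkl` file.

```python
import io
import pickle

# Simplified stand-in for the oil_wells list; a buffer replaces the .pkl file
original = [("WELL-007", 1700), ("WELL-001", 1000)]
buffer = io.BytesIO()
pickle.dump(original, buffer)

buffer.seek(0)  # rewind, as if reopening the file in 'rb' mode
loaded = pickle.load(buffer)

# Sanity check on the result (this does not make untrusted pickles safe)
if not isinstance(loaded, list):
    raise TypeError(f"Expected a list, got {type(loaded).__name__}")

print(loaded == original)  # True
```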

### Data Analysis and Visualization

    Sorting and Filtering:
        Sorting the wells by production rate to identify the top producers.

In [6]:
# Sort wells by production rate (highest to lowest)
oil_wells.sort(key=lambda well: well.production_rate, reverse=True)
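`list.sort` reorders `oil_wells` in place. If the original order should be kept, `sorted` returns a new list instead. A small sketch, using a hypothetical two-field `Well` namedtuple and `operator.attrgetter` in place of the lambda:

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical, simplified record for illustration
Well = namedtuple("Well", ["well_id", "production_rate"])
wells = [Well("WELL-001", 1000), Well("WELL-007", 1700), Well("WELL-002", 800)]

# sorted() leaves `wells` untouched and returns a new, ranked list
ranked = sorted(wells, key=attrgetter("production_rate"), reverse=True)

print([w.well_id for w in ranked])  # ['WELL-007', 'WELL-001', 'WELL-002']
print(wells[0].well_id)             # WELL-001
```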

Selecting the Top Producers:
        Printing the top 5 producing wells from the sorted list.

In [7]:

# Print the top 5 producing wells
print("Top 5 Producing Wells:")
for well in oil_wells[:5]:
    print(well)


Top 5 Producing Wells:
Well ID: WELL-017, Production Rate: 1900
Well ID: WELL-007, Production Rate: 1700
Well ID: WELL-027, Production Rate: 1500
Well ID: WELL-010, Production Rate: 1250
Well ID: WELL-020, Production Rate: 1250
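When only the top few wells are needed, `heapq.nlargest` can select them without sorting the whole list. A minimal sketch, with production rates copied from four of the wells above and a hypothetical `rates` dictionary:

```python
import heapq

# Production rates for four wells from the dataset
rates = {"WELL-001": 1000, "WELL-007": 1700, "WELL-002": 800, "WELL-010": 1250}

# Pick the two highest-producing wells by rate
top_2 = heapq.nlargest(2, rates.items(), key=lambda item: item[1])

print(top_2)  # [('WELL-007', 1700), ('WELL-010', 1250)]
```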


In [8]:
# Calculate total production rate
total_production = sum(well.production_rate for well in oil_wells)
print(f"Total Production Rate: {total_production:.2f} barrels per day")


Total Production Rate: 29400.00 barrels per day


In [9]:
# Analyze wells by type
oil_wells_by_type = {}
for well in oil_wells:
    well_type = well.well_type
    if well_type not in oil_wells_by_type:
        oil_wells_by_type[well_type] = []
    oil_wells_by_type[well_type].append(well)
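The same grouping can be written with `collections.defaultdict`, which creates the empty list automatically the first time a well type is seen. A sketch with a simplified list of `(well_id, well_type)` pairs taken from the first four rows of the data:

```python
from collections import defaultdict

# (well_id, well_type) pairs from the first four rows of the dataset
wells = [
    ("WELL-001", "oil"),
    ("WELL-002", "gas"),
    ("WELL-003", "oil"),
    ("WELL-004", "both"),
]

# Missing keys start as an empty list, so no membership check is needed
wells_by_type = defaultdict(list)
for well_id, well_type in wells:
    wells_by_type[well_type].append(well_id)

print(dict(wells_by_type))
# {'oil': ['WELL-001', 'WELL-003'], 'gas': ['WELL-002'], 'both': ['WELL-004']}
```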

In [10]:
for well_type, wells in oil_wells_by_type.items():
    print(f"\n{well_type} Wells:")
    for well in wells:
        print(well)


oil Wells:
Well ID: WELL-017, Production Rate: 1900
Well ID: WELL-007, Production Rate: 1700
Well ID: WELL-010, Production Rate: 1250
Well ID: WELL-020, Production Rate: 1250
Well ID: WELL-003, Production Rate: 1200
Well ID: WELL-013, Production Rate: 1200
Well ID: WELL-023, Production Rate: 1200
Well ID: WELL-011, Production Rate: 1150
Well ID: WELL-015, Production Rate: 1150
Well ID: WELL-021, Production Rate: 1150
Well ID: WELL-025, Production Rate: 1150
Well ID: WELL-005, Production Rate: 1100
Well ID: WELL-001, Production Rate: 1000

gas Wells:
Well ID: WELL-027, Production Rate: 1500
Well ID: WELL-019, Production Rate: 950
Well ID: WELL-009, Production Rate: 900
Well ID: WELL-012, Production Rate: 850
Well ID: WELL-022, Production Rate: 850
Well ID: WELL-002, Production Rate: 800
Well ID: WELL-016, Production Rate: 800
Well ID: WELL-026, Production Rate: 800
Well ID: WELL-006, Production Rate: 750

both Wells:
Well ID: WELL-008, Production Rate: 1050
Well ID: WELL-018, Productio

In [11]:

# Calculate average production rate of top 5 wells
from statistics import mean
top_5_production_rates = [well.production_rate for well in oil_wells[:5]]
average_top_5_production = mean(top_5_production_rates)
print(f"Average Production Rate of Top 5 Wells: {average_top_5_production:.2f} barrels per day")


Average Production Rate of Top 5 Wells: 1520.00 barrels per day


### Pickling JSON Data

    Loading JSON Data:
        Using json.loads to parse the JSON string into a Python object.

In [12]:
# Pickling "North Sea Oil Field" data from JSON format
import json
field_data = """{
    "field_name": "North Sea Oil Field",
    "production_history": [
        {"year": 2009, "production": 100000},
        {"year": 2010, "production": 120000},
        {"year": 2011, "production": 135000},
        {"year": 2012, "production": 145000},
        {"year": 2013, "production": 150000},
        {"year": 2014, "production": 148000},
        {"year": 2015, "production": 142000},
        {"year": 2016, "production": 135000},
        {"year": 2017, "production": 125000},
        {"year": 2018, "production": 110000},
        {"year": 2019, "production": 95000},
        {"year": 2020, "production": 80000},
        {"year": 2021, "production": 70000},
        {"year": 2022, "production": 60000},
        {"year": 2023, "production": 50000}
    ]
}"""

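# Parse the JSON string into a Python dict
North_Sea_Oil_Field = json.loads(field_data)
# Note: this writes the oil-well DataFrame `df` to the JSON file, not the
# parsed field data above, so the cells below operate on the well records.
df.to_json('North_Sea_Oil_Field.json')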

In [13]:
# Load the JSON data
with open('North_Sea_Oil_Field.json', 'r') as f:
    field_data = json.load(f)
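If the goal is to store the parsed field data itself as a JSON file, `json.dump` writes a dictionary directly. The sketch below is one possible way to do that, not the notebook's method. It uses a shortened copy of the field dictionary and a temporary directory so that no existing file is overwritten.

```python
import json
import os
import tempfile

# Shortened copy of the parsed field data (first two years only)
field = {
    "field_name": "North Sea Oil Field",
    "production_history": [
        {"year": 2009, "production": 100000},
        {"year": 2010, "production": 120000},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "North_Sea_Oil_Field.json")
    # Write the dictionary as JSON text
    with open(path, "w", encoding="utf-8") as f:
        json.dump(field, f, indent=2)
    # Read it back to confirm the round trip
    with open(path, "r", encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded == field)  # True
```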

#### Pickling the JSON Data:

    Using pickle.dump to serialize the data loaded from the JSON file (a Python dictionary) into a binary file.

In [14]:
# Pickle the data
with open('North_Sea_Oil_Field.pkl', 'wb') as f:
    pickle.dump(field_data, f)

#### Loading the Pickled JSON Data:

    Using pickle.load to deserialize the data from the binary file.

In [15]:
# Load the pickled data
with open('North_Sea_Oil_Field.pkl', 'rb') as f:
    field_data = pickle.load(f)

#### Converting to Pandas DataFrame:

    Using pd.DataFrame to convert the loaded dictionary into a DataFrame for easier analysis.

In [16]:

# Convert the JSON data to a Pandas DataFrame for easier analysis
df = pd.DataFrame(field_data)

# Basic exploration
# Print the first 5 rows of the DataFrame to get a quick overview
print(df.head())

    well_id  latitude  longitude  production_rate  depth  age well_type
0  WELL-001    35.234     45.678             1000  10000   10       oil
1  WELL-002    34.987     46.123              800   9500   15       gas
2  WELL-003    35.567     45.890             1200  11000    8       oil
3  WELL-004    34.789     46.345              950  10500   12      both
4  WELL-005    35.123     45.987             1100  10800    9       oil
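The output above shows well records. For the `production_history` list in the "North Sea Oil Field" JSON string, a quick summary can also be computed with the standard library alone. The sketch below assumes an excerpt of that string (the first six years) and finds the peak production year.

```python
import json

# Excerpt of the field_data JSON string (first six years)
field_json = """{
    "field_name": "North Sea Oil Field",
    "production_history": [
        {"year": 2009, "production": 100000},
        {"year": 2010, "production": 120000},
        {"year": 2011, "production": 135000},
        {"year": 2012, "production": 145000},
        {"year": 2013, "production": 150000},
        {"year": 2014, "production": 148000}
    ]
}"""

field = json.loads(field_json)
history = field["production_history"]

# Entry with the highest production value
peak = max(history, key=lambda entry: entry["production"])

print(peak["year"], peak["production"])  # 2013 150000
```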


#### Basic Data Exploration:

    Using df.head(), df.tail(), and df.describe() to get a quick overview of the data.

In [17]:
# Print the last 5 rows of the DataFrame
print(df.tail())

     well_id  latitude  longitude  production_rate  depth  age well_type
22  WELL-023    35.234     45.987             1200  11000    9       oil
23  WELL-024    34.876     46.345              900  10500   13      both
24  WELL-025    35.123     45.789             1150  10800    8       oil
25  WELL-026    34.789     46.234              800   9000   15       gas
26  WELL-027    35.456     45.678             1500   1150   15       gas


In [18]:
# Generate descriptive statistics, including count, mean, standard deviation, min, 25%, 50%, 75%, and max
print(df.describe())

        latitude  longitude  production_rate         depth        age
count  27.000000  27.000000        27.000000     27.000000  27.000000
mean   35.080185  46.015407      1088.888889   9990.740741  10.925926
std     0.374463   0.393351       273.627109   1931.919981   3.221633
min    34.345000  45.123000       750.000000   1150.000000   6.000000
25%    34.832500  45.789000       900.000000   9650.000000   8.000000
50%    35.123000  45.987000      1050.000000  10500.000000  11.000000
75%    35.289500  46.289500      1200.000000  10900.000000  14.000000
max    35.789000  46.678000      1900.000000  11500.000000  16.000000


In [19]:
# references:
# https://github.com/MPR-UKD/cvi42py/blob/main/src/cvi42py.py