## Pandas Series and DataFrame
### What is a Series?

A pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, etc.). Its key features include:

    Single dtype: Every Series has one dtype; mixed values are stored under the generic object dtype.
    Labeled Index: Each element is associated with a label.
    Flexible Indexing: Supports both position-based and label-based indexing.
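
As a rough illustration of the dtype behavior listed above, the sketch below builds three small Series and inspects their dtypes. The variable names are made up for this example.

```python
import pandas as pd

# A Series built only from integers gets an integer dtype
ints = pd.Series([1, 2, 3])

# Mixing integers and floats upcasts every element to float64
mixed_numeric = pd.Series([1, 2.5, 3])

# Mixing numbers and strings falls back to the generic object dtype
mixed_objects = pd.Series([1, "two", 3.0])

print(ints.dtype)
print(mixed_numeric.dtype)  # float64
print(mixed_objects.dtype)  # object
```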

Example: Creating and Exploring Series

In [1]:
import pandas as pd
import numpy as np

# Create a Series of oil well names
well_names = ['Well-A', 'Well-B', 'Well-C', 'Well-D', 'Well-E']
series_well_names = pd.Series(well_names)
print("Series of well names:\n", series_well_names)

# Create a Series of oil production rates (barrels per day)
production_rates = [1000, 1500, 2000, 1800, 1200]
series_production_rates = pd.Series(production_rates, index=well_names)
print("\nSeries of production rates:\n", series_production_rates)


Series of well names:
 0    Well-A
1    Well-B
2    Well-C
3    Well-D
4    Well-E
dtype: object

Series of production rates:
 Well-A    1000
Well-B    1500
Well-C    2000
Well-D    1800
Well-E    1200
dtype: int64


### Series Creation
#### From List

You can easily create a Series from a Python list. The list's elements become the values, and an integer index is automatically assigned.

In [2]:
series_from_list = pd.Series([1, 2, 3, 4, 5])
print("Series from list:\n", series_from_list)

Series from list:
 0    1
1    2
2    3
3    4
4    5
dtype: int64


#### From Dictionary

Use a dictionary where keys become the index, and values become the Series' values.



In [3]:
series_from_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})
print("Series from dictionary:\n", series_from_dict)

Series from dictionary:
 a    10
b    20
c    30
dtype: int64
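
A dictionary can also be combined with an explicit `index` argument. The sketch below assumes a hypothetical label `'z'` that is not a key in the dictionary, to show what happens to missing keys.

```python
import pandas as pd

values_by_key = {'a': 10, 'b': 20, 'c': 30}

# index= selects and orders the keys; labels missing from the dict become NaN,
# which also forces the dtype to float64
selected = pd.Series(values_by_key, index=['c', 'a', 'z'])
print("Series from dictionary with explicit index:\n", selected)
```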


#### From Scalar

A scalar value can be converted into a Series by specifying an index.



In [4]:
scalar_value = 42
series_from_scalar = pd.Series(scalar_value, index=['A', 'B', 'C'])
print("Series from scalar value:\n", series_from_scalar)


Series from scalar value:
 A    42
B    42
C    42
dtype: int64


#### From NumPy Array

Pass a one-dimensional NumPy array to `pd.Series`; the Series takes its dtype from the array.

In [5]:
numpy_array = np.array([1, 2, 3, 4, 5])
series_from_array = pd.Series(numpy_array)
print("Series from NumPy array:\n", series_from_array)


Series from NumPy array:
 0    1
1    2
2    3
3    4
4    5
dtype: int64


In [6]:

# Create a Series of wellhead temperatures (Fahrenheit)
temperatures = [150, 160, 170, 155, 165]
series_temperatures = pd.Series(temperatures, index=well_names)
print("\nSeries of wellhead temperatures:\n", series_temperatures)


Series of wellhead temperatures:
 Well-A    150
Well-B    160
Well-C    170
Well-D    155
Well-E    165
dtype: int64


### Basic Operations on Series
#### Accessing Data

Series allow you to access elements using labels or positions.

In [7]:
# Create a labeled Series (this reuses the name well_names, replacing the list defined earlier)
well_names = pd.Series(['Well-A', 'Well-B', 'Well-C'], index=['a', 'b', 'c'])

# Access by label
print("Access by label 'a':", well_names['a'])

# Access by position
print("Access by position 1:", well_names.iloc[1])


Access by label 'a': Well-A
Access by position 1: Well-B
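
The difference between labels and positions matters most when the index itself holds integers. The following sketch uses invented pressure values and an index whose labels deliberately differ from the positions.

```python
import pandas as pd

# Integer labels that do not match the element positions
pressures = pd.Series([3200, 2900, 3100], index=[2, 1, 0])

by_label = pressures.loc[0]      # label 0 is the last element
by_position = pressures.iloc[0]  # position 0 is the first element

print("By label 0:", by_label)
print("By position 0:", by_position)
```

Using `.loc` for labels and `.iloc` for positions keeps the intent explicit in cases like this one.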


#### Indexing and Slicing

In [8]:
# Label-based slicing
print("Label-based slicing:\n", well_names['a':'b'])

# Position-based slicing
print("Position-based slicing:\n", well_names.iloc[0:2])

Label-based slicing:
 a    Well-A
b    Well-B
dtype: object
Position-based slicing:
 a    Well-A
b    Well-B
dtype: object
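
Both slices above return the same rows because label-based slices include the end label, while position-based slices exclude the end position. The sketch below makes that visible and adds a boolean mask; the `wells` and `rates` Series are assumptions for this example.

```python
import pandas as pd

wells = pd.Series(['Well-A', 'Well-B', 'Well-C', 'Well-D'], index=['a', 'b', 'c', 'd'])
rates = pd.Series([1000, 1500, 2000, 1800], index=['a', 'b', 'c', 'd'])

label_slice = wells.loc['a':'c']   # end label 'c' is included
position_slice = wells.iloc[0:2]   # end position 2 is excluded

# Boolean mask: keep only wells whose rate exceeds 1500
high_producers = wells[rates > 1500]

print(label_slice)
print(position_slice)
print(high_producers)
```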


### Vectorized Operations

Arithmetic on a Series applies to every element at once, without an explicit Python loop.

In [9]:
# Increase production rates by 10%
production_rates = pd.Series([1000, 1500, 2000])
increased_rates = production_rates * 1.1
print("Increased production rates:\n", increased_rates)


Increased production rates:
 0    1100.0
1    1650.0
2    2200.0
dtype: float64
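
Arithmetic between two Series aligns elements by index label rather than by position. The sketch below uses hypothetical oil and condensate figures to show the alignment and one way to handle labels that appear in only one Series.

```python
import pandas as pd

oil = pd.Series([1000, 1500, 2000], index=['Well-A', 'Well-B', 'Well-C'])
condensate = pd.Series([50, 80], index=['Well-B', 'Well-C'])

# Labels present in only one Series produce NaN
total = oil + condensate

# add() with fill_value treats the missing side as 0 instead
total_filled = oil.add(condensate, fill_value=0)

print(total)
print(total_filled)
```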


### Handling Missing Data

Use `isna()` to detect missing values, `fillna()` to replace them, and `dropna()` to remove them.



In [10]:
# Create a Series with a missing value (None is stored as NaN, so the dtype becomes float64)
rates_with_missing = pd.Series([1000, None, 2000])
print("Missing values:\n", rates_with_missing.isna())

# Fill missing values
filled_rates = rates_with_missing.fillna(0)
print("Filled values:\n", filled_rates)

# Drop missing values
dropped_rates = rates_with_missing.dropna()
print("Dropped values:\n", dropped_rates)


Missing values:
 0    False
1     True
2    False
dtype: bool
Filled values:
 0    1000.0
1       0.0
2    2000.0
dtype: float64
Dropped values:
 0    1000.0
2    2000.0
dtype: float64
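
Filling with 0 is not always appropriate for production data, since it reads as a well producing nothing. As a sketch of two alternatives, assuming that an average or a linear estimate is acceptable for the data at hand:

```python
import pandas as pd

rates_with_gap = pd.Series([1000, None, 2000])

# Count the missing entries
missing_count = rates_with_gap.isna().sum()

# Replace the gap with the mean of the known values (mean() skips NaN)
filled_with_mean = rates_with_gap.fillna(rates_with_gap.mean())

# Estimate the gap from its neighbors (linear interpolation by default)
interpolated = rates_with_gap.interpolate()

print("Missing count:", missing_count)
print(filled_with_mean)
print(interpolated)
```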


### Statistical Methods

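Series provide built-in statistical methods such as `mean()`, `median()`, and `std()`, which returns the sample standard deviation by default.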

In [11]:
rates = pd.Series([1000, 1500, 2000, 1800, 1200])

print("Mean:", rates.mean())
print("Median:", rates.median())
print("Standard Deviation:", rates.std())


Mean: 1500.0
Median: 1500.0
Standard Deviation: 412.31056256176606
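
The value printed above is the sample standard deviation (`ddof=1`), which differs from NumPy's default population form. The sketch below compares the two and also shows `describe()`, which gathers several statistics in one call.

```python
import numpy as np
import pandas as pd

rates = pd.Series([1000, 1500, 2000, 1800, 1200])

sample_std = rates.std()            # pandas default: ddof=1
population_std = rates.std(ddof=0)  # same convention as np.std's default

# describe() returns count, mean, std, min, quartiles, and max as a Series
summary = rates.describe()

print("Sample std:", sample_std)
print("Population std:", population_std)
print(summary)
```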


### Series Manipulation
#### Reindexing

In [12]:
depths = pd.Series([10000, 12000, 15000], index=['Well-A', 'Well-B', 'Well-C'])
new_index = ['Well-B', 'Well-C', 'Well-A']
reindexed_depths = depths.reindex(new_index)
print("Reindexed depths:\n", reindexed_depths)


Reindexed depths:
 Well-B    12000
Well-C    15000
Well-A    10000
dtype: int64


#### Sorting

In [13]:
# Sort by values
sorted_depths = depths.sort_values()
print("Sorted by values:\n", sorted_depths)

# Sort by index
sorted_index = depths.sort_index()
print("Sorted by index:\n", sorted_index)

Sorted by values:
 Well-A    10000
Well-B    12000
Well-C    15000
dtype: int64
Sorted by index:
 Well-A    10000
Well-B    12000
Well-C    15000
dtype: int64
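
The two results above look identical because `depths` was already in ascending order by both value and label. A sketch with unsorted input, using the same depth values in a different order, shows descending sorts and `nlargest()`:

```python
import pandas as pd

unsorted_depths = pd.Series([12000, 15000, 10000], index=['Well-B', 'Well-C', 'Well-A'])

# Deepest wells first
deepest_first = unsorted_depths.sort_values(ascending=False)

# Labels in reverse alphabetical order
index_descending = unsorted_depths.sort_index(ascending=False)

# Shortcut for the top N values
two_deepest = unsorted_depths.nlargest(2)

print(deepest_first)
print(index_descending)
print(two_deepest)
```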


### Creating DataFrames

A pandas DataFrame is a two-dimensional labeled data structure. Here's how you can create one:

#### From List

In [14]:
data = [[1, 'A'], [2, 'B'], [3, 'C']]
df_from_list = pd.DataFrame(data, columns=['Number', 'Letter'])
print(df_from_list)

   Number Letter
0       1      A
1       2      B
2       3      C


#### From Dictionary

In [15]:
data = {'Numbers': [1, 2, 3], 'Letters': ['A', 'B', 'C']}
df_from_dict = pd.DataFrame(data)
print(df_from_dict)

   Numbers Letters
0        1       A
1        2       B
2        3       C


#### From NumPy Array

In [16]:
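# Note: a NumPy array has a single dtype, so mixing integers and strings stores every element as a string
array = np.array([[1, 'A'], [2, 'B'], [3, 'C']])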
df_from_array = pd.DataFrame(array, columns=['Number', 'Letter'])
print(df_from_array)

  Number Letter
0      1      A
1      2      B
2      3      C
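
One thing to watch in the array example: because the NumPy array mixes integers and strings, the `Number` column holds strings rather than integers. A minimal sketch of checking for this and converting the column back, assuming the values are valid integer strings:

```python
import numpy as np
import pandas as pd

array = np.array([[1, 'A'], [2, 'B'], [3, 'C']])
print(array.dtype)  # a Unicode string dtype, not an integer dtype

df_from_array = pd.DataFrame(array, columns=['Number', 'Letter'])
first_number = df_from_array['Number'].iloc[0]  # the string '1'

# Convert the column to integers explicitly
df_fixed = df_from_array.astype({'Number': 'int64'})
print(df_fixed.dtypes)
```

Building the DataFrame from a dictionary of columns, as in the previous example, avoids the coercion in the first place.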


### Transformation Methods

Transformation methods enable applying element-wise functions to Series.

In [17]:

# Convert production rates from barrels to thousands of barrels
production_in_thousand_barrels = production_rates.apply(lambda x: x / 1000)
print("\nProduction rates in thousand barrels:\n", production_in_thousand_barrels)



Production rates in thousand barrels:
 0    1.0
1    1.5
2    2.0
dtype: float64
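
For simple arithmetic like this, `apply` with a lambda and a plain vectorized expression give the same result; the vectorized form is usually shorter and faster. A small sketch comparing the two:

```python
import pandas as pd

production_rates = pd.Series([1000, 1500, 2000])

via_apply = production_rates.apply(lambda x: x / 1000)  # calls the lambda per element
via_vectorized = production_rates / 1000                # one operation on the whole Series

print(via_apply.equals(via_vectorized))
```

`apply` remains useful when the per-element logic cannot be written as a single array expression.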


### Series Manipulation

Series manipulation involves operations like reindexing, sorting, filtering, and mapping to reshape and transform data effectively.

#### Reindexing

Reindexing conforms a Series to a new set of labels: labels that already exist keep their values, and labels not present in the original are filled with NaN.

In [18]:
# Create a Series of oil well depths (meters)
well_depths = pd.Series([10000, 12000, 15000, 13000, 11000], index=['Well-A', 'Well-B', 'Well-C', 'Well-D', 'Well-E'])

# Reindexing
# Desired new order: Well-B, Well-C, Well-D, Well-E, Well-A
new_index = ['Well-B', 'Well-C', 'Well-D', 'Well-E', 'Well-A']
reindexed_depths = well_depths.reindex(new_index)
print("Reindexed Series (well depths):\n", reindexed_depths)


Reindexed Series (well depths):
 Well-B    12000
Well-C    15000
Well-D    13000
Well-E    11000
Well-A    10000
dtype: int64
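
The example above only reorders existing labels. The sketch below adds a hypothetical label, `'Well-Z'`, that is not in the original Series, to show how `reindex` treats missing labels and how `fill_value` changes that.

```python
import pandas as pd

depths = pd.Series([10000, 12000, 15000], index=['Well-A', 'Well-B', 'Well-C'])

# 'Well-Z' does not exist, so it becomes NaN and the dtype changes to float64
with_missing = depths.reindex(['Well-C', 'Well-A', 'Well-Z'])

# fill_value supplies a default instead, which keeps the integer dtype
with_default = depths.reindex(['Well-C', 'Well-A', 'Well-Z'], fill_value=0)

print(with_missing)
print(with_default)
```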


In [19]:
# Mapping (Transformation)
# Map well depths to depth categories (Shallow: < 12000, Medium: 12000-13000, Deep: > 13000)
def depth_category(depth):
    if depth < 12000:
        return "Shallow"
    elif depth <= 13000:
        return "Medium"
    else:
        return "Deep"

mapped_depths = well_depths.map(depth_category)
print("\nMapped Series with well depth categories:\n", mapped_depths)



Mapped Series with well depth categories:
 Well-A    Shallow
Well-B     Medium
Well-C       Deep
Well-D     Medium
Well-E    Shallow
dtype: object
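
The same categories can be produced without a Python-level function by evaluating the conditions on the whole Series. This is a sketch of one alternative using `np.select`, with the same thresholds as `depth_category`; it is not the only way to bin values.

```python
import numpy as np
import pandas as pd

well_depths = pd.Series([10000, 12000, 15000, 13000, 11000],
                        index=['Well-A', 'Well-B', 'Well-C', 'Well-D', 'Well-E'])

# Conditions are checked in order; the first one that is True wins
conditions = [well_depths < 12000, well_depths <= 13000]
choices = ['Shallow', 'Medium']

# np.select returns a plain array, so wrap it to restore the well labels
categories = pd.Series(np.select(conditions, choices, default='Deep'),
                       index=well_depths.index)
print(categories)
```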


In [20]:
# Create Series for combination
well_names_1 = pd.Series(['Well-A', 'Well-B', 'Well-C'], index=['a', 'b', 'c'])
well_names_2 = pd.Series(['Well-D', 'Well-E'], index=['d', 'e'])

# Concatenation
concatenated_wells = pd.concat([well_names_1, well_names_2])
print("Concatenated Well Names:\n", concatenated_wells)



Concatenated Well Names:
 a    Well-A
b    Well-B
c    Well-C
d    Well-D
e    Well-E
dtype: object


In [21]:
# Appending: Series.append was removed in pandas 2.0, so pd.concat is used instead
appended_wells = pd.concat([well_names_1, well_names_2])
print("\nAppended Well Names:\n", appended_wells)



Appended Well Names:
 a    Well-A
b    Well-B
c    Well-C
d    Well-D
e    Well-E
dtype: object
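
The Series above have distinct labels, so concatenation produces a clean index. When labels overlap, `pd.concat` keeps the duplicates unless told otherwise. The sketch below uses two made-up Series with the same integer index to show `ignore_index` and `verify_integrity`.

```python
import pandas as pd

first = pd.Series(['Well-A', 'Well-B'], index=[0, 1])
second = pd.Series(['Well-C', 'Well-D'], index=[0, 1])

stacked = pd.concat([first, second])                        # index is 0, 1, 0, 1
renumbered = pd.concat([first, second], ignore_index=True)  # index is 0, 1, 2, 3

# verify_integrity=True raises ValueError when the result would have duplicate labels
try:
    pd.concat([first, second], verify_integrity=True)
    duplicates_detected = False
except ValueError:
    duplicates_detected = True

print(stacked)
print(renumbered)
print("Duplicates detected:", duplicates_detected)
```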


In [22]:
# Merging
# Create Series with production rates
production_rates_1 = pd.Series([1000, 1500, 2000], index=['a', 'b', 'c'])
production_rates_2 = pd.Series([1800, 1200], index=['d', 'e'])

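# Concatenating Series with different data (well names and production rates)
# Note: pd.concat stacks the Series end to end rather than joining them by label,
# so the result repeats the labels a, b, c and falls back to the object dtype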
merged_data = pd.concat([well_names_1, production_rates_1])
print("\nMerged Series with Well Names and Production Rates:\n", merged_data)



Merged Series with Well Names and Production Rates:
 a    Well-A
b    Well-B
c    Well-C
a      1000
b      1500
c      2000
dtype: object
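
To pair each well name with its production rate instead of stacking them, the two Series can be placed side by side as columns. A minimal sketch, where the column names `well_name` and `production_rate` are chosen for this example:

```python
import pandas as pd

well_names_1 = pd.Series(['Well-A', 'Well-B', 'Well-C'], index=['a', 'b', 'c'])
production_rates_1 = pd.Series([1000, 1500, 2000], index=['a', 'b', 'c'])

# axis=1 aligns the Series on their shared index and makes each one a column
combined = pd.concat([well_names_1, production_rates_1], axis=1,
                     keys=['well_name', 'production_rate'])
print(combined)
```

Each column keeps its own dtype, so the production rates stay numeric.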



### Create Pandas DataFrame

A pandas DataFrame is a two-dimensional labeled data structure with columns that can hold different data types. It's essentially a tabular data structure, similar to a spreadsheet or SQL table.
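
As a small sketch of that description, the DataFrame below holds text, integer, and boolean columns. The column names and the `is_active` flag are invented for this example.

```python
import pandas as pd

wells = pd.DataFrame({
    'well_name': ['Well-A', 'Well-B', 'Well-C'],
    'production_rate': [1000, 1500, 2000],
    'is_active': [True, False, True],
})

# Each column has its own dtype
print(wells.dtypes)

# Selecting a single column returns a Series
rates_column = wells['production_rate']
print(type(rates_column))
```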