# Random Sampling in Pandas
This notebook shows how to draw random samples from a Pandas DataFrame with the `sample()` method. Random sampling is useful for creating subsets of data for exploratory analysis, train/test splits, and bootstrapping.

## 1. Basic Random Sampling
Pass `n` to `sample()` to draw that many rows at random. Because no seed is set here, the rows returned will differ from run to run.

In [1]:
import pandas as pd

# Sample DataFrame
data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Randomly sample 2 rows
sampled_df = df.sample(n=2)
print(sampled_df)

   A   B
4  5  50
0  1  10
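
As a quick check on what `sample(n=...)` returns, the sketch below rebuilds the same `df` so it runs on its own and confirms that the result has exactly `n` rows, keeps the original index labels, and contains no repeated row. The `random_state=0` seed is an arbitrary choice added here only so the run is repeatable.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Draw 2 rows; the seed is arbitrary and only fixes the result between runs
sampled_df = df.sample(n=2, random_state=0)

# The sample keeps the original index labels rather than renumbering from 0
print(len(sampled_df))                        # number of rows drawn
print(sampled_df.index.isin(df.index).all())  # every label comes from df
print(sampled_df.index.is_unique)             # no row was drawn twice
```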


## 2. Sampling with a Fraction of the Data
Instead of specifying the number of rows, you can sample a fraction of the data using the `frac` parameter. `n` and `frac` cannot be passed together. With 5 rows, `frac=0.4` returns 2 rows.

In [2]:
# Sample 40% of the rows
sampled_df = df.sample(frac=0.4)
print(sampled_df)

   A   B
1  2  20
2  3  30
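
One common use of `frac` is a simple train/test split. The following is a minimal sketch, assuming the index labels are unique (as they are in this `df`); the names `train_df` and `test_df` and the 80/20 ratio are illustrative choices, not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# 80% of 5 rows is 4 rows for the training set (seed chosen arbitrarily)
train_df = df.sample(frac=0.8, random_state=0)

# The test set is whatever is left after dropping the sampled index labels
test_df = df.drop(train_df.index)

print(len(train_df), len(test_df))
```

Dropping by index label only works as intended when the labels are unique; with duplicate labels, `drop` would remove every row sharing a sampled label.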


## 3. Random Sampling with Reproducibility
To ensure reproducibility (getting the same random sample every time), set a random seed using the `random_state` parameter.

In [3]:
# Sample with a fixed random seed
sampled_df = df.sample(n=3, random_state=42)
print(sampled_df)

   A   B
1  2  20
4  5  50
2  3  30
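
To see the effect of `random_state` directly, the sketch below draws the same sample twice with the same seed and compares the results. The variable names `first` and `second` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Two independent calls with the same seed select the same rows in the same order
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)

print(first.equals(second))  # True
```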


## 4. Random Sampling with Replacement
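By default, sampling is done without replacement, so each row appears at most once and `n` cannot exceed the number of rows. Set `replace=True` to allow the same row to be drawn more than once; this is what lets the example below draw 6 rows from a 5-row DataFrame.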

In [4]:
# Sample with replacement
sampled_df = df.sample(n=6, replace=True)
print(sampled_df)

   A   B
2  3  30
0  1  10
1  2  20
4  5  50
3  4  40
3  4  40
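
Sampling with replacement is the basis of bootstrapping, mentioned at the top of this notebook. The sketch below draws one bootstrap resample of the same size as `df` and also shows that requesting more rows than exist without `replace=True` raises a `ValueError`. The names `boot_df` and `raised` and the seed are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# One bootstrap resample: same number of rows as df, rows may repeat
boot_df = df.sample(n=len(df), replace=True, random_state=0)
print(boot_df['A'].mean())  # statistic computed on the resample

# Without replacement, asking for 6 rows from 5 is an error
try:
    df.sample(n=6)
    raised = False
except ValueError:
    raised = True
print(raised)
```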


## 5. Random Sampling of Columns
To sample columns instead of rows, set `axis=1` (or `axis='columns'`).

In [5]:
# Randomly sample 1 column
sampled_columns = df.sample(n=1, axis=1)
print(sampled_columns)

   A
0  1
1  2
2  3
3  4
4  5
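
As a small check, the sketch below samples one column with a fixed (arbitrary) seed and confirms that every row is kept while only the set of columns changes.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Sample 1 of the 2 columns; all 5 rows are retained
sampled_columns = df.sample(n=1, axis='columns', random_state=0)

print(sampled_columns.shape)          # (5, 1)
print(list(sampled_columns.columns))  # a single name, either 'A' or 'B'
```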


## 6. Weighted Sampling
Pass per-row weights with the `weights` parameter to make some rows more likely to be selected than others. A row with weight 0 is never selected, so row 4 cannot appear in the sample below.

In [6]:
# Assign weights to rows
sampled_df = df.sample(n=3, weights=[0.1, 0.2, 0.3, 0.4, 0.0], random_state=42)
print(sampled_df)

   A   B
2  3  30
3  4  40
1  2  20
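
Weights can also be given as the name of a column, and they do not have to sum to 1 because pandas normalizes them. The sketch below illustrates this with a hypothetical weight column `w` and a large sample drawn with replacement, so the selection frequencies can be compared with the weights. The column name, the sample size, and the seed are all assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Hypothetical weight column; the values sum to 10 and are normalized by pandas
df['w'] = [1.0, 2.0, 3.0, 4.0, 0.0]

# Draw many rows with replacement so frequencies approximate the probabilities
draws = df.sample(n=10_000, replace=True, weights='w', random_state=0)

# Count how often each value of A was selected
counts = draws['A'].value_counts().sort_index()
print(counts)
```

The row with `A == 5` has weight 0 and never appears in `counts`, while the row with `A == 4` is drawn roughly four times as often as the row with `A == 1`.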


## 7. Shuffling Rows
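Sampling every row with `frac=1` returns the whole DataFrame in random order, which is a simple way to shuffle it. The original index labels travel with their rows, as the output shows.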

In [7]:
# Shuffle the entire DataFrame
shuffled_df = df.sample(frac=1, random_state=42)
print(shuffled_df)

   A   B
1  2  20
4  5  50
2  3  30
0  1  10
3  4  40
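
If a fresh 0-based index is wanted after shuffling, one option is to chain `reset_index(drop=True)`. A minimal sketch on the same `df`:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Shuffle, then discard the old index labels and renumber from 0
shuffled_df = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled_df.index.tolist())  # [0, 1, 2, 3, 4]
```

Without `drop=True`, the old labels would be kept as a new column named `index` instead of being discarded.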


# Examples with Real Dataset