# Lesson 4: Mastering Multi-Part Uploads in Amazon S3 with Boto3 and Python

## Multipart Upload of Different Sized Chunks to Amazon S3

In this task, we'll perform a multipart upload using different chunk sizes. First we'll upload a 5 MB chunk, then a 6 MB chunk, and finally the remainder of the file. We won't use a loop, so each step of a manual multipart upload stays visible. The file to upload is "cosmo-hadoop-course-data-set.zip" and the destination bucket is "cosmo-archive-2023".

Your task has the following steps:

1. Initialize the multipart upload.
2. Upload the first chunk of size 5 MB.
3. Upload the second chunk of size 6 MB.
4. Upload the final chunk, which should contain the rest of the file.
5. Complete the multipart upload.

Important Note: Running scripts can alter the filesystem's state or modify the resources in our AWS simulator. To revert to the initial state, you can use the reset button located in the top right corner. However, keep in mind that resetting will erase any code changes. To preserve your code during a reset, consider copying it to the clipboard.

```python
import boto3

# Create the S3 client
s3_client = boto3.client('s3')

# Create a new bucket
bucket_name = 'cosmo-archive-2023'
s3_client.create_bucket(Bucket=bucket_name)

# Path to your dataset
file_path = '/usercode/FILESYSTEM/assets/cosmo-hadoop-course-data-set.zip'
key = 'cosmos-hadoop-course-data-set.zip'

# Initiate multipart upload
multipart_upload = s3_client.create_multipart_upload(Bucket=bucket_name, Key=key)
upload_id = multipart_upload['UploadId']

uploaded_parts = []

# Open the file for reading data
with open(file_path, 'rb') as f:
    # Upload the first chunk
    data = f.read(1024 * 1024 * 5)  # 5MB
    response = s3_client.upload_part(Bucket=bucket_name, Key=key, PartNumber=1, UploadId=upload_id, Body=data)
    uploaded_parts.append({'PartNumber': 1, 'ETag': response['ETag']})

    # Upload the second chunk
    data = f.read(1024 * 1024 * 6)  # 6MB
    response = s3_client.upload_part(Bucket=bucket_name, Key=key, PartNumber=2, UploadId=upload_id, Body=data)
    uploaded_parts.append({'PartNumber': 2, 'ETag': response['ETag']})

    # Upload the final chunk (which is the rest of the file)
    data = f.read()
    response = s3_client.upload_part(Bucket=bucket_name, Key=key, PartNumber=3, UploadId=upload_id, Body=data)
    uploaded_parts.append({'PartNumber': 3, 'ETag': response['ETag']})

# Complete the multipart upload
s3_client.complete_multipart_upload(Bucket=bucket_name, Key=key, UploadId=upload_id, MultipartUpload={'Parts': uploaded_parts})
print("Dataset uploaded successfully in chunks of varying sizes. Multipart upload completed.")

```

This script demonstrates a **manual multipart upload** to Amazon S3 using **different chunk sizes** without looping. Here's a breakdown:

---

### **📌 Steps in the Script**
1. **Initialize the multipart upload** → Creates an upload session.
2. **Upload the first chunk (5MB)** → Reads and uploads the first part.
3. **Upload the second chunk (6MB)** → Reads and uploads the second part.
4. **Upload the final chunk** → Uploads the remaining data.
5. **Complete the upload** → Informs S3 that all parts are uploaded.

---

### **💡 Key Takeaways**
- **Manual Multipart Upload:** Unlike automated loops, this script explicitly uploads each chunk separately.
- **Chunk Sizes Can Vary:** Different parts (5MB, 6MB, and remaining file) showcase flexibility in uploading.
- **AWS Constraints:** Every part must be at least **5 MB** (5 × 1024 × 1024 bytes), except the last part, which can be smaller.
- **No Loop Used:** A simple, step-by-step manual upload.
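
The minimum-size rule can be checked before any bytes are sent. The sketch below is a hypothetical helper, not part of the task solution: `split_sizes` and `MIN_PART_SIZE` are names introduced here for illustration, and the 43 MB file size is taken from the later tasks that use the same dataset.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # minimum size of every part except the last
MB = 1024 * 1024


def split_sizes(file_size, planned_sizes):
    """Return the size of each part: the planned sizes, then the remainder.

    Raises ValueError if a part other than the last is below the minimum,
    or if the planned sizes leave nothing for the final part.
    """
    sizes = list(planned_sizes)
    remainder = file_size - sum(sizes)
    if remainder <= 0:
        raise ValueError("planned parts leave nothing for the final part")
    sizes.append(remainder)
    for number, size in enumerate(sizes[:-1], start=1):
        if size < MIN_PART_SIZE:
            raise ValueError(f"part {number} is {size} bytes, below the minimum")
    return sizes


# Assumed 43 MB file, split as in the task: 5 MB, 6 MB, then the rest.
print([size // MB for size in split_sizes(43 * MB, [5 * MB, 6 * MB])])
# prints [5, 6, 32]
```

Running a check like this first means a bad plan fails locally instead of partway through an upload.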

---

### **🚀 Usage Scenario**
Ideal for cases where you need **precise control** over multipart uploads, such as:
- **Optimizing upload performance** for unstable networks.
- **Handling custom chunk sizes** based on system constraints.
- **Debugging and testing multipart upload workflows.**

Uploading parts by hand trades convenience for control over each step of a large transfer to **Amazon S3**. 🚀
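
When debugging or testing, a failed run can leave an unfinished multipart upload behind, and its stored parts remain in S3 until the upload is completed or aborted. Here is a minimal sketch of one way to clean up, assuming the caller passes in a boto3 S3 client; `upload_in_parts` is a hypothetical helper name, while `abort_multipart_upload` is the client call that discards the parts.

```python
def upload_in_parts(s3_client, bucket, key, chunks):
    """Upload an iterable of byte chunks as one multipart upload.

    If any step fails, abort the upload so S3 discards the stored parts,
    then re-raise the original error.
    """
    upload_id = s3_client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        for number, chunk in enumerate(chunks, start=1):
            response = s3_client.upload_part(
                Bucket=bucket,
                Key=key,
                PartNumber=number,
                UploadId=upload_id,
                Body=chunk,
            )
            parts.append({"PartNumber": number, "ETag": response["ETag"]})
        return s3_client.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Clean up the unfinished upload before reporting the failure.
        s3_client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

With the manual script above, the three `f.read(...)` results could be passed as `chunks`; the helper itself does not care how the chunk sizes were chosen.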



## Multipart Upload Mastery - Segmenting the Cosmos

Embark on a mission to enhance your cloud storage skills by completing a multipart upload to Amazon S3. Your challenge is to upload a 43 MB dataset in three parts to the cosmo-archive-2023 bucket. The dataset is segmented into chunks of 15 MB, 15 MB, and 13 MB. Although the code for uploading the first chunk is provided, it has not been executed yet. Your objective is to finalize the script by seamlessly uploading all three chunks, showcasing your adeptness in managing significant datasets in S3 with efficiency and accuracy. This task is your opportunity to demonstrate proficiency in ensuring data integrity and accessibility in the cloud.

```python
import boto3

# Configure the S3 client
s3_client = boto3.client('s3')

# Create a new bucket and enable versioning
bucket_name = 'cosmo-archive-2023'
s3_client.create_bucket(Bucket=bucket_name)
s3_client.put_bucket_versioning(Bucket=bucket_name, VersioningConfiguration={'Status': 'Enabled'})

# Path to your large dataset
file_path = '/usercode/FILESYSTEM/assets/cosmo-hadoop-course-data-set.zip'
key = 'cosmos-hadoop-course-data-set.zip'

# Initiate multipart upload
multipart_upload = s3_client.create_multipart_upload(Bucket=bucket_name, Key=key)
upload_id = multipart_upload['UploadId']

# Upload the first chunk (15 MB) as an example
with open(file_path, 'rb') as f:
    data = f.read(1024 * 1024 * 15)  # Read the first 15 MB for the first chunk
    s3_client.upload_part(Bucket=bucket_name, Key=key, PartNumber=1, UploadId=upload_id, Body=data)

    # TODO: Upload the second chunk of 15 MB
    # TODO: Upload the final chunk of 13 MB

# TODO: Complete the multipart upload by combining all the uploaded parts


```

Here’s the complete script that uploads the 43 MB dataset in three parts to Amazon S3:  

```python
import boto3

# Configure the S3 client
s3_client = boto3.client('s3')

# Create a new bucket and enable versioning
bucket_name = 'cosmo-archive-2023'
s3_client.create_bucket(Bucket=bucket_name)
s3_client.put_bucket_versioning(Bucket=bucket_name, VersioningConfiguration={'Status': 'Enabled'})

# Path to your large dataset
file_path = '/usercode/FILESYSTEM/assets/cosmo-hadoop-course-data-set.zip'
key = 'cosmos-hadoop-course-data-set.zip'

# Initiate multipart upload
multipart_upload = s3_client.create_multipart_upload(Bucket=bucket_name, Key=key)
upload_id = multipart_upload['UploadId']

uploaded_parts = []

# Open the file for reading data
with open(file_path, 'rb') as f:
    # Upload the first chunk (15 MB)
    data = f.read(1024 * 1024 * 15)
    response = s3_client.upload_part(Bucket=bucket_name, Key=key, PartNumber=1, UploadId=upload_id, Body=data)
    uploaded_parts.append({'PartNumber': 1, 'ETag': response['ETag']})

    # Upload the second chunk (15 MB)
    data = f.read(1024 * 1024 * 15)
    response = s3_client.upload_part(Bucket=bucket_name, Key=key, PartNumber=2, UploadId=upload_id, Body=data)
    uploaded_parts.append({'PartNumber': 2, 'ETag': response['ETag']})

    # Upload the final chunk (13 MB)
    data = f.read()  # Read the remaining file data
    response = s3_client.upload_part(Bucket=bucket_name, Key=key, PartNumber=3, UploadId=upload_id, Body=data)
    uploaded_parts.append({'PartNumber': 3, 'ETag': response['ETag']})

# Complete the multipart upload
s3_client.complete_multipart_upload(Bucket=bucket_name, Key=key, UploadId=upload_id, MultipartUpload={'Parts': uploaded_parts})

print("Dataset successfully uploaded in three parts (15MB, 15MB, 13MB). Multipart upload completed.")
```

---

### **📌 What’s Completed?**
✔ **Uploaded First Chunk** – 15MB  
✔ **Uploaded Second Chunk** – 15MB  
✔ **Uploaded Final Chunk** – 13MB  
✔ **Completed Multipart Upload**  

---

### **💡 Why Use Multipart Upload?**
- 🚀 **Faster Uploads** – Uploading in parallel is possible.  
- 🔄 **Resilience** – If a part fails, retry only that part.  
- 🔧 **Handles Large Files** – Efficiently manages multi-GB uploads.  

Now you have a complete multipart upload script. Note that it has no error handling: if a part fails, the upload stays open until the part is retried or the upload is aborted. 🚀
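
The resilience point can be made concrete with a small retry wrapper around a single part. This is a sketch under assumptions, not the course's solution: `upload_part_with_retry`, `attempts`, and `delay` are hypothetical names, and boto3 may already retry some errors by itself, so a wrapper like this only adds a second layer.

```python
import time


def upload_part_with_retry(s3_client, *, attempts=3, delay=1.0, **part_args):
    """Call s3_client.upload_part(**part_args), retrying only this part.

    Re-raises the last error once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return s3_client.upload_part(**part_args)
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)  # brief pause before trying the same part again
```

In the script above, each `s3_client.upload_part(...)` call could be replaced by `upload_part_with_retry(s3_client, Bucket=bucket_name, Key=key, PartNumber=1, UploadId=upload_id, Body=data)`; parts that already succeeded are never sent again.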

## Multipart Upload of a Large Dataset to Amazon S3

Prepare to further advance your cloud storage skills with a new challenge: uploading a dataset in smaller increments to Amazon S3. Your mission involves dividing a 43 MB dataset into 5 MB chunks and uploading each to the cosmo-archive-2023 bucket. Utilizing a loop for this multipart upload process, you will demonstrate your ability to efficiently manage large datasets in S3. This task is an excellent opportunity to showcase your adeptness in optimizing data transfers and ensuring the dataset's accessibility in the cloud.

```python
import boto3

# Initialize the S3 client
s3_client = boto3.client('s3')

# Create a new bucket for your uploads
bucket_name = 'cosmo-archive-2023'
s3_client.create_bucket(Bucket=bucket_name)

# Path to your dataset
file_path = '/usercode/FILESYSTEM/assets/cosmo-hadoop-course-data-set.zip'
key = 'cosmos-hadoop-course-data-set.zip'

# TODO: Initiate a multipart upload session

# TODO: Upload the dataset in 5 MB chunks using a loop

# TODO: Complete the multipart upload by combining all the uploaded parts
```

Here's the complete script that efficiently uploads the 43 MB dataset in **5 MB chunks** to Amazon S3 using a loop:  

```python
import boto3
import os

# Initialize the S3 client
s3_client = boto3.client('s3')

# Create a new bucket for your uploads
bucket_name = 'cosmo-archive-2023'
s3_client.create_bucket(Bucket=bucket_name)

# Path to your dataset
file_path = '/usercode/FILESYSTEM/assets/cosmo-hadoop-course-data-set.zip'
key = 'cosmos-hadoop-course-data-set.zip'

# Initiate a multipart upload session
multipart_upload = s3_client.create_multipart_upload(Bucket=bucket_name, Key=key)
upload_id = multipart_upload['UploadId']

uploaded_parts = []
part_size = 1024 * 1024 * 5  # 5MB per chunk
file_size = os.path.getsize(file_path)
part_count = (file_size + part_size - 1) // part_size  # Calculate number of parts

# Upload the dataset in 5 MB chunks using a loop
with open(file_path, 'rb') as f:
    for part_no in range(1, part_count + 1):
        data = f.read(part_size)  # Read 5MB chunk
        response = s3_client.upload_part(
            Bucket=bucket_name,
            Key=key,
            PartNumber=part_no,
            UploadId=upload_id,
            Body=data
        )
        uploaded_parts.append({'PartNumber': part_no, 'ETag': response['ETag']})

# Complete the multipart upload by combining all the uploaded parts
s3_client.complete_multipart_upload(
    Bucket=bucket_name,
    Key=key,
    UploadId=upload_id,
    MultipartUpload={'Parts': uploaded_parts}
)

print(f"Dataset successfully uploaded in {part_count} chunks of up to 5 MB each. Multipart upload completed.")
```

---

### **📌 What’s Implemented?**
✔ **Multipart Upload Initialized**  
✔ **File Split into 5MB Chunks**  
✔ **Upload Handled in a Loop**  
✔ **Upload Finalized & Combined**  
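
To see how the 43 MB file breaks down, the chunking can be tried without S3 at all. The sketch below reads until the stream is empty instead of computing the part count up front; `iter_parts` is a hypothetical helper, and an in-memory `BytesIO` of zero bytes stands in for the real dataset.

```python
import io

MB = 1024 * 1024


def iter_parts(stream, part_size):
    """Yield (part_number, data) pairs until the stream is exhausted."""
    part_number = 1
    while True:
        data = stream.read(part_size)
        if not data:
            break
        yield part_number, data
        part_number += 1


# Stand-in for the 43 MB dataset: 43 MB of zero bytes held in memory.
stream = io.BytesIO(bytes(43 * MB))
sizes = [len(data) // MB for _, data in iter_parts(stream, 5 * MB)]
print(sizes)
# prints [5, 5, 5, 5, 5, 5, 5, 5, 3]
```

Eight full 5 MB parts plus a 3 MB remainder give nine parts; only the last one falls below the 5 MB minimum, which S3 allows.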

---

### **💡 Why Use This Approach?**
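- **📊 Scalability:** The same loop works for larger files, within S3's limit of 10,000 parts per upload (about 48.8 GiB at 5 MB per part); bigger files need a larger part size.
- **🔄 Reliability:** If a part fails, only that part needs re-uploading.
- **🚀 Performance:** Parts can be uploaded in parallel, although this script sends them one at a time.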

Now you have a **robust, scalable**, and **automated** multipart upload process! 🚀
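
The loop above sends parts one after another. One way to upload them concurrently is a thread pool, sketched here under assumptions: `upload_parts_in_parallel` and `max_workers` are hypothetical names, the client is a boto3 S3 client passed in by the caller, and every chunk is held in memory at once, which suits small files like this one but not multi-GB uploads.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_parts_in_parallel(s3_client, bucket, key, upload_id, chunks, max_workers=4):
    """Upload byte chunks concurrently and return the Parts list for completion."""

    def upload_one(numbered_chunk):
        number, chunk = numbered_chunk
        response = s3_client.upload_part(
            Bucket=bucket,
            Key=key,
            PartNumber=number,
            UploadId=upload_id,
            Body=chunk,
        )
        return {"PartNumber": number, "ETag": response["ETag"]}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so the list stays sorted
        # by PartNumber, as complete_multipart_upload expects.
        return list(pool.map(upload_one, enumerate(chunks, start=1)))
```

The returned list can be passed directly as `MultipartUpload={'Parts': ...}` when completing the upload.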