# Upload Files to Cloud Object Storage

This notebook uploads files from your local machine to a bucket in IBM Cloud Object Storage (COS).

Before you start, prepare the following:
- API key
- Service instance CRN
- Public endpoint URL


To create a service credential, see [this reference](https://cloud.ibm.com/docs/cloud-object-storage/iam?topic=cloud-object-storage-service-credentials).

Helpful references:
- [IBM COS Python reference](https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-python)
- [IBM COS Python SDK documentation](https://ibm.github.io/ibm-cos-sdk-python/reference/services/index.html)
- [IBM COS endpoints](https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-endpoints#endpoints)


## Install COS python SDK

If you have not installed the SDK yet, run the cell below; otherwise, you can skip it.

In [None]:
%pip install -U ibm-cos-sdk

## Define Dependencies and credentials

In [None]:
import os 
import pandas as pd
import ibm_boto3
from ibm_botocore.client import Config, ClientError

To find the endpoint URL, open your bucket configuration; the URL is listed under the _Endpoint_ section.
The same configuration page also shows the bucket's location and storage class.
The current list of endpoints is available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints

In [None]:
endpoint_url_private = "s3.private.us-east.cloud-object-storage.appdomain.cloud"
endpoint_url_public = "s3.us-east.cloud-object-storage.appdomain.cloud"

In [None]:
COS_ENDPOINT = "https://" + endpoint_url_public
# Fill in your own values; the empty strings are placeholders so the cell parses
COS_API_KEY_ID = ""  # eg: "W00YixxxxxxxxxxMB-xxx-2ySxxxxxxxxxxxxc--Pxxxk"
COS_INSTANCE_CRN = ""  # eg: "crn:v1:bluemix:public:cloud-object-storage:global:a/3bfxxxxxxxxxxxxxxxxxxxxxxxxxxx1c:dxxxxxx3-6xxf-4xx2-axx5-6xxxxxxxxx3::"
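
Hardcoding credentials in a notebook makes them easy to leak when the notebook is shared. As an alternative, the sketch below reads the same three settings from environment variables. The function name `load_cos_settings` and the variable names are assumptions chosen to match the constants above; they are not defined by the SDK.

```python
import os


def load_cos_settings(env=os.environ):
    """Read COS settings from environment variables (hypothetical names)."""
    required = ("COS_API_KEY_ID", "COS_INSTANCE_CRN", "COS_ENDPOINT")
    # Collect every missing or empty variable so the error lists all of them at once
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

After calling `settings = load_cos_settings()`, the values can be assigned to `COS_API_KEY_ID`, `COS_INSTANCE_CRN`, and `COS_ENDPOINT` in place of the literals above.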

According to the SDK documentation, a connection can be established through either a _Client_ or a _Resource_.

## Client

This section calls functions that are accessed through the Client.
The example below gets a bucket name and lists its objects. Note that a single listing call returns at most 1000 objects.

In [None]:
cos = ibm_boto3.client("s3",
    ibm_api_key_id=COS_API_KEY_ID,
    ibm_service_instance_id=COS_INSTANCE_CRN,
    config=Config(signature_version="oauth"),
    endpoint_url=COS_ENDPOINT
)

In [None]:
bucket_name = cos.list_buckets()['Buckets'][0]['Name']

In [None]:
response = cos.list_objects(Bucket=bucket_name)
# 'Contents' is absent when the bucket is empty, so fall back to an empty list
len([obj['Key'] for obj in response.get('Contents', [])])
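
To get past the 1000-object cap of a single listing call, the listing can be paginated. The sketch below assumes the client exposes `list_objects_v2` with the `MaxKeys` and `ContinuationToken` parameters and the `IsTruncated` and `NextContinuationToken` response fields, as in the S3-compatible API. The helper name `list_all_keys` is hypothetical, and the client is passed in as an argument so the function does not depend on the notebook's globals.

```python
def list_all_keys(client, bucket_name, page_size=1000):
    """Collect every object key in a bucket by following continuation tokens."""
    keys = []
    token = None
    while True:
        kwargs = {"Bucket": bucket_name, "MaxKeys": page_size}
        # Only send the token once a previous page has provided one
        if token:
            kwargs["ContinuationToken"] = token
        page = client.list_objects_v2(**kwargs)
        # 'Contents' is missing for an empty bucket or an empty page
        keys.extend(item["Key"] for item in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        token = page["NextContinuationToken"]
```

With the client created above, `len(list_all_keys(cos, bucket_name))` would give the full object count.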

## Resources

This section calls functions that are accessed through the Resource.

In [None]:
res = ibm_boto3.resource("s3",
    ibm_api_key_id=COS_API_KEY_ID,
    ibm_service_instance_id=COS_INSTANCE_CRN,
    config=Config(signature_version="oauth"),
    endpoint_url=COS_ENDPOINT
)

### Get the available bucket

In [None]:
def get_buckets():
    print("Retrieving list of buckets")
    bucket_list = []
    try:
        buckets = res.buckets.all()
        for bucket in buckets:
            print("Bucket Name: {0}".format(bucket.name))
            bucket_list.append(bucket.name)
        return bucket_list
    except ClientError as be:
        print("CLIENT ERROR: {0}\n".format(be))
    except Exception as e:
        print("Unable to retrieve list buckets: {0}".format(e))

In [None]:
avail_bucket = get_buckets()
avail_bucket

In [None]:
bucket_name = avail_bucket[0]

### Get the available objects inside bucket

In [None]:
def get_bucket_contents(bucket_name):
    print("Retrieving bucket contents from: {0}".format(bucket_name))
    obj_list = []
    try:
        files = res.Bucket(bucket_name).objects.all()
        for file in files:
            file_info = {}
            #print("Item: {0} ({1} bytes).".format(file.key, file.size))
            file_info['filename'] = file.key
            file_info['filesize'] = file.size
            obj_list.append(file_info)
        return obj_list
    except ClientError as be:
        print("CLIENT ERROR: {0}\n".format(be))
    except Exception as e:
        print("Unable to retrieve bucket contents: {0}".format(e))

In [None]:
avail_obj = get_bucket_contents(bucket_name)
avail_obj_df = pd.DataFrame(avail_obj)
avail_obj_df

In [None]:
avail_obj_df.filesize.sum()

### Get all files inside the uncompressed folder

__CAUTION: Make sure the uncompressed folder you are about to upload is in the same folder as your cos_sample_code.ipynb file.__

In [None]:
local_folder_path = 'your_local_uncompressed_folder_path'

In [None]:
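def get_obj_list(folder_path, obj_list=None):
    # Avoid a mutable default argument: a shared list would keep results across calls
    if obj_list is None:
        obj_list = []
    with os.scandir(folder_path) as entries:
        for entry in entries:
            # Skip macOS Finder metadata files
            if entry.name != '.DS_Store':
                file_path = os.path.join(folder_path, entry.name)
                if entry.is_dir():
                    # Recurse into subfolders, collecting into the same list
                    get_obj_list(file_path, obj_list)
                elif entry.is_file():
                    obj_list.append(file_path)
    return obj_list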

In [None]:
local_file_list = get_obj_list(local_folder_path,[])
len(local_file_list)
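
The paths collected above are built with `os.path.join`, so on Windows they contain backslashes, and those would end up inside the object keys. The sketch below derives a key with forward slashes from a local path. The helper name `to_object_key` is hypothetical, and it makes one design choice that differs from the upload cells in this notebook: the key is relative to the root folder, so the folder name itself is not part of the key.

```python
from pathlib import PurePath


def to_object_key(local_path, root):
    """Return local_path relative to root, using '/' as the separator."""
    # relative_to raises ValueError if local_path is not located under root
    return PurePath(local_path).relative_to(root).as_posix()
```

For example, a key for each collected file could be computed with `to_object_key(file_path, local_folder_path)` and passed as the second argument of `upload_file` on the bucket.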

### Upload Function

In [None]:
## Uncomment to test with a sample file first; once the upload works, delete the test object
#res.Bucket(bucket_name).upload_file('sample.txt', 'check/sample.txt')

In [None]:
# Use this to upload a file from a local path; the path is also used as the object key
def upload_file(bucket_name, item_name):
    print("Starting upload item to bucket: {0}, key: {1}".format(bucket_name, item_name))
    try:
        res.Bucket(bucket_name).upload_file(item_name, item_name)
        print("uploaded file: ", item_name)
        return item_name
    except Exception as e:
        print("Unable to upload file: {0}".format(e))

In [None]:
# Use this to upload an opened file (file-like object) or binary data
def upload_file_obj(bucket_name, item_name):
    print("Starting upload item to bucket: {0}, key: {1}".format(bucket_name, item_name))
    # Open the local file named by item_name in binary mode
    with open(item_name, 'rb') as item_bin:
        try:
            # If you hold raw bytes in memory instead of an open file, wrap them in BytesIO first:
            #from io import BytesIO
            #res.Bucket(bucket_name).Object(item_name).upload_fileobj(BytesIO(raw_bytes))
            res.Bucket(bucket_name).Object(item_name).upload_fileobj(item_bin)
            print("uploaded file: ", item_name)
            return f"{COS_ENDPOINT}/{bucket_name}/{item_name}"
        except Exception as e:
            print("Unable to upload file: {0}".format(e))

In [None]:
uploaded_file_list = []

for file_path in local_file_list:
    # upload_file returns the item name on success and None on failure
    if upload_file(bucket_name, file_path) is not None:
        uploaded_file_list.append(file_path)

### Verify Upload

In [None]:
avail_obj = get_bucket_contents(bucket_name)
avail_obj_df = pd.DataFrame(avail_obj)
avail_obj_df

In [None]:
# Build the set of uploaded keys once instead of once per local file
uploaded_keys = set(avail_obj_df['filename'])
unuploaded = [file_path for file_path in local_file_list if file_path not in uploaded_keys]
unuploaded
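
Checking names alone does not catch an upload that was cut short. Since `get_bucket_contents` also returns each object's size, the sizes can be compared with the local files. The sketch below is one way to do that; the helper name `find_size_mismatches` is hypothetical, and it assumes the object key equals the local path, as in the upload loop above.

```python
import os


def find_size_mismatches(local_paths, remote_sizes):
    """Return {path: (local_size, remote_size)} for files whose sizes differ.

    remote_sizes maps object key -> size in bytes; a missing key yields None.
    """
    problems = {}
    for path in local_paths:
        local_size = os.path.getsize(path)
        remote_size = remote_sizes.get(path)
        if remote_size != local_size:
            problems[path] = (local_size, remote_size)
    return problems
```

The `remote_sizes` mapping can be built from the DataFrame above with `dict(zip(avail_obj_df['filename'], avail_obj_df['filesize']))`.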

### Download Function

In [None]:
def download_file(bucket_name, filename, filename_local):
    res.Object(bucket_name, filename).download_file(filename_local)

In [None]:
def download_file_obj(bucket_name, item_name, filename_local):
    obj = res.Bucket(bucket_name).Object(item_name)

    # Stream the object into a local file opened in binary write mode
    with open(filename_local, 'wb') as data:
        obj.download_fileobj(data)
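
Object keys uploaded from a folder contain `/` separators, and a download into such a path fails when the local parent folder does not exist yet. The sketch below creates the folders before each download. The helper name `download_keys` and the `dest_root` parameter are hypothetical; the resource is passed in as an argument, and only `Object(...).download_file(...)` is called on it, as in `download_file` above.

```python
import os


def download_keys(resource, bucket_name, keys, dest_root):
    """Download each key under dest_root, recreating the key's folder structure."""
    saved = []
    for key in keys:
        # Turn the '/'-separated key into a path for the local operating system
        local_path = os.path.join(dest_root, *key.split("/"))
        parent = os.path.dirname(local_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        resource.Object(bucket_name, key).download_file(local_path)
        saved.append(local_path)
    return saved
```

With the resource created earlier, a call such as `download_keys(res, bucket_name, list(avail_obj_df['filename']), 'downloads')` would mirror the bucket into a local `downloads` folder.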