<a href="https://colab.research.google.com/github/thecodemancer/study-with-me/blob/main/python/colab/colab_external_data_cloud_storage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Google Cloud Storage (GCS)

To use Colaboratory with GCS, you'll need to create a [Google Cloud project](https://cloud.google.com/storage/docs/projects) or use an existing one.

Specify your project ID below:

In [7]:
project_id = 'Your_project_ID_here'

Files in GCS are contained in [buckets](https://cloud.google.com/storage/docs/buckets).

Buckets must have a globally-unique name, so we generate one here.

In [2]:
import uuid
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())
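
A generated name can be checked locally before any request is sent. The sketch below encodes only a simplified subset of the documented bucket naming rules: 3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit. `is_plausible_bucket_name` is a hypothetical helper, not part of any Google library, and it deliberately ignores further rules such as reserved prefixes and the special handling of dotted names.

```python
import re
import uuid

# Simplified subset of the GCS naming rules (assumption: reserved prefixes
# and the extra rules for names containing dots are not checked here).
_BUCKET_NAME_RE = re.compile(r'[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]')

def is_plausible_bucket_name(name):
    """Return True if `name` passes the simplified local check."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None

candidate = 'colab-sample-bucket-' + str(uuid.uuid1())
print(len(candidate), is_plausible_bucket_name(candidate))
```

A name that passes this check can still be rejected by GCS, for example because another project already owns it.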

In order to access GCS, we must authenticate.

In [2]:
from google.colab import auth
auth.authenticate_user()

## gsutil

First, we configure `gsutil` to use the project we specified above by using `gcloud`.

In [4]:
!gcloud config set project {project_id}

Updated property [core/project].


Create a local file to upload.

In [5]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Make a bucket to which we'll upload the file ([documentation](https://cloud.google.com/storage/docs/gsutil/commands/mb)).

In [6]:
!gsutil mb gs://{bucket_name}

Creating gs://colab-sample-bucket-87a34dce-1455-11ee-848b-0242ac1c000c/...


Copy the file to our new bucket ([documentation](https://cloud.google.com/storage/docs/gsutil/commands/cp)).

In [7]:
!gsutil cp /tmp/to_upload.txt gs://{bucket_name}/

Copying file:///tmp/to_upload.txt [Content-Type=text/plain]...
Operation completed over 1 objects/14.0 B.


Dump the contents of our newly copied file to make sure everything worked ([documentation](https://cloud.google.com/storage/docs/gsutil/commands/cat)).


In [8]:
!gsutil cat gs://{bucket_name}/to_upload.txt

my sample file

In [None]:
#@markdown Once the upload has finished, the data will appear in the Cloud Console storage browser for your project:
print('https://console.cloud.google.com/storage/browser?project=' + project_id)

https://console.cloud.google.com/storage/browser?project=Your_project_ID_here


Finally, we'll download the file we just uploaded. This only requires swapping the source and destination arguments of the `gsutil cp` command.

In [9]:
!gsutil cp gs://{bucket_name}/to_upload.txt /tmp/gsutil_download.txt

# Print the result to make sure the transfer worked.
!cat /tmp/gsutil_download.txt

Copying gs://colab-sample-bucket-87a34dce-1455-11ee-848b-0242ac1c000c/to_upload.txt...
/ [1 files][   14.0 B/   14.0 B]
Operation completed over 1 objects/14.0 B.
my sample file
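
Printing the file is enough for a 14-byte sample; for larger files, comparing checksums is a more practical check. Here is a minimal sketch using only the standard library. `file_sha256` is a hypothetical helper, and the two temporary files are stand-ins for `/tmp/to_upload.txt` and `/tmp/gsutil_download.txt` so that the example runs on its own.

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Stand-ins for the uploaded and downloaded files (assumption: in the
# notebook you would pass the real /tmp paths instead).
tmp_dir = tempfile.mkdtemp()
original = os.path.join(tmp_dir, 'to_upload.txt')
downloaded = os.path.join(tmp_dir, 'gsutil_download.txt')
for path in (original, downloaded):
    with open(path, 'w') as f:
        f.write('my sample file')

files_match = file_sha256(original) == file_sha256(downloaded)
print('Files match:', files_match)
```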

## GCP Python SDK

In [6]:
from google.cloud import storage

def list_buckets(project_id):
    """Prints the name of every bucket in the project.

    Args:
        project_id: The project id of your Google Cloud project.
    """
    storage_client = storage.Client(project=project_id)
    buckets = storage_client.list_buckets()
    print("Buckets:")
    for bucket in buckets:
        print(bucket.name)
    print("Listed all storage buckets.")

list_buckets(project_id=project_id)

Buckets:
colab-sample-bucket-2b656c62-1456-11ee-848b-0242ac1c000c
colab-sample-bucket-87a34dce-1455-11ee-848b-0242ac1c000c
dataproc-staging-us-central1-999513749112-aylxeqzs
dataproc-temp-us-central1-999513749112-kzuic8h9
thecodemancer
thecodemancer_dataflow_workshops
thecodemancer_us-east1
Listed all storage buckets.


In [10]:
def download_blob(bucket_name, source_blob_name, destination_file_name="logo.jpg"):
    """Downloads a blob from the bucket.

    Args:
        bucket_name: The ID of your GCS bucket, e.g. "your-bucket-name".
        source_blob_name: The ID of your GCS object, e.g. "storage-object-name".
        destination_file_name: The path to which the file should be
            downloaded, e.g. "local/path/to/file".
    """
    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    print(blob)
    blob.download_to_filename(destination_file_name)

    print(
        "Downloaded storage object {} from bucket {} to local file {}.".format(
            source_blob_name, bucket_name, destination_file_name
        )
    )

download_blob("thecodemancer","Revelo/logo.jpg")

<Blob: thecodemancer, Revelo/logo.jpg, None>
Downloaded storage object Revelo/logo.jpg from bucket thecodemancer to local file logo.jpg.
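
`download_blob` takes the bucket and object names separately, while the `gsutil` commands earlier address objects with a single `gs://` URI. The sketch below shows one way to convert between the two forms. `split_gcs_uri` is a hypothetical helper introduced here for illustration; it is not part of the `google-cloud-storage` library.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = 'gs://'
    if not uri.startswith(prefix):
        raise ValueError('Expected a gs:// URI, got: ' + uri)
    # Everything up to the first slash is the bucket; the rest is the
    # object name, which may itself contain slashes.
    bucket, _, object_name = uri[len(prefix):].partition('/')
    if not bucket or not object_name:
        raise ValueError('URI must name both a bucket and an object: ' + uri)
    return bucket, object_name

bucket, blob_name = split_gcs_uri('gs://thecodemancer/Revelo/logo.jpg')
print(bucket, blob_name)
```

The two values could then be passed on as `download_blob(bucket, blob_name)`.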


In [9]:
!ls -l

total 8
-rw-r--r-- 1 root root 2455 Jun 26 21:57 logo.jpg
drwxr-xr-x 1 root root 4096 Jun 23 13:41 sample_data


## Python API

These snippets are based on [a larger example](https://github.com/GoogleCloudPlatform/storage-file-transfer-json-python/blob/master/chunked_transfer.py) that shows additional uses of the API.

First, we create the service client.

In [10]:
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')

Create a local file to upload.

In [11]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Create a bucket in the project specified above.

In [12]:
# Use a different globally-unique bucket name from the gsutil example above.
import uuid
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())

body = {
  'name': bucket_name,
  # For a full list of locations, see:
  # https://cloud.google.com/storage/docs/bucket-locations
  'location': 'us',
}
gcs_service.buckets().insert(project=project_id, body=body).execute()
print('Done')

Done


Upload the file to our newly created bucket.

In [13]:
from googleapiclient.http import MediaFileUpload

media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)

request = gcs_service.objects().insert(bucket=bucket_name,
                                       name='to_upload.txt',
                                       media_body=media)

response = None
while response is None:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, response = request.next_chunk()

print('Upload complete')

Upload complete
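
For larger files, the progress object that the loop above discards can be used for reporting. The sketch below wraps the same loop in a hypothetical helper, `run_resumable_upload`. `FakeRequest` and `FakeStatus` are stand-ins invented here so the loop can run without a network connection; they only imitate the `next_chunk()` and `progress()` calls of the real client.

```python
def run_resumable_upload(request, report=print):
    """Drive a resumable upload to completion, reporting progress."""
    response = None
    while response is None:
        status, response = request.next_chunk()
        # status can be None, e.g. on the call that completes the upload.
        if status is not None:
            report('Uploaded {:.0%}'.format(status.progress()))
    return response

class FakeStatus:
    """Stand-in for the progress object; progress() returns a 0..1 float."""
    def __init__(self, fraction):
        self._fraction = fraction

    def progress(self):
        return self._fraction

class FakeRequest:
    """Stand-in request: two progress updates, then a final response."""
    def __init__(self):
        self._steps = iter([
            (FakeStatus(0.25), None),
            (FakeStatus(0.75), None),
            (None, {'name': 'to_upload.txt'}),
        ])

    def next_chunk(self):
        return next(self._steps)

messages = []
result = run_resumable_upload(FakeRequest(), report=messages.append)
print(messages, result)
```

In the notebook, the `request` object built with `gcs_service.objects().insert(...)` would take the place of `FakeRequest()`.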


In [None]:
#@markdown Once the upload has finished, the data will appear in the Cloud Console storage browser for your project:
print('https://console.cloud.google.com/storage/browser?project=' + project_id)

https://console.cloud.google.com/storage/browser?project=Your_project_ID_here


Download the file we just uploaded.

In [14]:
from googleapiclient.http import MediaIoBaseDownload

with open('/tmp/downloaded_from_gcs.txt', 'wb') as f:
  request = gcs_service.objects().get_media(bucket=bucket_name,
                                            object='to_upload.txt')
  media = MediaIoBaseDownload(f, request)

  done = False
  while not done:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = media.next_chunk()

print('Download complete')

Download complete


Inspect the downloaded file.


In [15]:
!cat /tmp/downloaded_from_gcs.txt

my sample file

---
If you made it this far, follow [David Regalado](https://beacons.ai/davidregalado) for more code!