<h1 align=center><font size = 5>Reading Files in Python</font></h1>

<br>

This notebook shows how to download a text file from IBM Cloud Object Storage and how to read **.txt** files in Python.

## Table of Contents


<div class="alert alert-block alert-info" style="margin-top: 20px">

<li><a href="#ref0">Importing Data Using IBM Cloud Object Storage  </a></li>
<li><a href="#ref1">Reading Text Files</a></li>

<br>
<p></p>
Estimated Time Needed: <strong>15 min</strong>
</div>

<hr>

 <a id="ref0"></a>
<h2 align=center> Importing Data Using IBM Cloud Object Storage
 </h2>

Information stored in IBM® Cloud Object Storage is encrypted, dispersed across multiple geographic locations, and accessed over HTTPS using a REST API. You can share your data, collaborate with developers and data scientists to build applications, and store up to 25 GB/month for free.
 
IBM Cloud Object Storage keeps your data in buckets; buckets are simply a way of organising your data. Each bucket contains objects, and in this case each object is a separate file. The following image illustrates this structure:

<a align="center"><img src="https://ibm.box.com/shared/static/zn11ytdojne85cj5m37avihtns9lm1xo.png" width="800">
</a>

To interact with IBM Cloud Object Storage, we need the Python package **ibm_boto3**; this package allows Python developers to write software that interacts with IBM Cloud Object Storage.

In [3]:
# If you run this notebook in your local environment, you may have to install ibm_boto3,
# which is provided by the ibm-cos-sdk package:
#!pip install -U ibm-cos-sdk


In [4]:
import ibm_boto3
from ibm_botocore.client import Config
# With the standard AWS SDK (boto3), the equivalent imports would be:
#import boto3
#from botocore.client import Config

We will use the function **make_client** to create a low-level service client; its only input is the dictionary of service credentials. See the references at the bottom of this notebook for more details on how the function works.

In [5]:
def make_client(credentials):
    import requests
    import ibm_boto3
    from ibm_botocore.client import Config
    # Request the detailed endpoint list
    endpoints = requests.get(credentials["endpoints"]).json()
    # Obtain the IAM and COS hosts from the detailed endpoints
    iam_host = endpoints['identity-endpoints']['iam-token']
    cos_host = endpoints['service-endpoints']['cross-region']['us']['public']['us-geo']

    api_key = credentials['apikey']
    service_instance_id = credentials['resource_instance_id']

    # Construct the auth and COS endpoints
    auth_endpoint = "https://" + iam_host + "/oidc/token"
    service_endpoint = "https://" + cos_host
    # Create the low-level client for the S3-compatible COS API
    client = ibm_boto3.client('s3',
                              ibm_api_key_id=api_key,
                              ibm_service_instance_id=service_instance_id,
                              ibm_auth_endpoint=auth_endpoint,
                              config=Config(signature_version='oauth'),
                              endpoint_url=service_endpoint)
    return client

To create a low-level service client, we need service credentials. Service credentials uniquely identify your object storage instance and authenticate you to it.

In [6]:
# Replace the placeholder with the API key from your own service credentials
credentials={
  "apikey": "<YOUR_API_KEY>",
  "endpoints": "https://cos-service.bluemix.net/endpoints",
  "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/c68328aadbdc6f9109a7fa81c17a1d16:2b44157a-946d-4ea4-94c9-509c74746305::",
  "iam_apikey_name": "auto-generated-apikey-53a56235-a37f-4e79-b563-2eecb4533e95",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Reader",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/c68328aadbdc6f9109a7fa81c17a1d16::serviceid:ServiceId-6f5fb8cb-f8d7-4bff-b354-a0d8e5bf8adf",
  "resource_instance_id": "crn:v1:bluemix:public:cloud-object-storage:global:a/c68328aadbdc6f9109a7fa81c17a1d16:2b44157a-946d-4ea4-94c9-509c74746305::"
}


We now create a low-level service client; we will use this object to interact with object storage, as shown in the figure.

In [7]:
client=make_client(credentials)

<a align="center"><img src="https://ibm.box.com/shared/static/sj3v88u2mvhddjsvi4a5cdmymhyz05nc.png" width="800">
</a>

We could download the file with one line of code, as shown (commented out) in the next cell, but first let's do some simple operations, such as retrieving the names of the buckets and of the file objects.

In [8]:
#client.download_file(Bucket='pythonfordatascience',Key='example1.txt',Filename='/resources/data/Example1.txt')

We can use the method **list_buckets()** to get a list of all buckets owned by the authenticated sender of the request:

In [9]:
response = client.list_buckets()

The result is a Python dictionary with three keys:

In [10]:
response.keys()

dict_keys(['ResponseMetadata', 'Buckets', 'Owner'])

We are only concerned with the value for the key **'Buckets'**. This value is a list, and each element of the list is a dictionary with two keys: **'CreationDate'**, which contains the creation date of the bucket, and **'Name'**, which contains the name of the bucket.

In [11]:
response['Buckets']

[{'Name': '123-abc-2021-05-12-t-19-23-34-333-z',
  'CreationDate': datetime.datetime(2021, 5, 12, 19, 23, 36, 605000, tzinfo=tzutc())},
 {'Name': '213-2021-06-15-t-19-53-17-873-z',
  'CreationDate': datetime.datetime(2021, 6, 15, 19, 53, 20, 13000, tzinfo=tzutc())},
 {'Name': 'annotationstestjoe',
  'CreationDate': datetime.datetime(2021, 1, 14, 17, 48, 58, 527000, tzinfo=tzutc())},
 {'Name': 'annotationstestjoe2',
  'CreationDate': datetime.datetime(2021, 1, 14, 21, 2, 19, 920000, tzinfo=tzutc())},
 {'Name': 'asth-2021-04-29-t-21-42-01-410-z',
  'CreationDate': datetime.datetime(2021, 4, 29, 21, 42, 4, 287000, tzinfo=tzutc())},
 {'Name': 'capstone-donotdelete-pr-vb1gwzlqjbew87',
  'CreationDate': datetime.datetime(2019, 11, 8, 22, 4, 30, 955000, tzinfo=tzutc())},
 {'Name': 'capstonedeeplearning',
  'CreationDate': datetime.datetime(2019, 11, 8, 22, 0, 43, 686000, tzinfo=tzutc())},
 {'Name': 'cc-tutorialjoe',
  'CreationDate': datetime.datetime(2019, 2, 7, 21, 39, 33, 884000, tzinfo=tz

We can use a list comprehension to get the names of all the buckets:

In [12]:
buckets = [bucket['Name'] for bucket in response['Buckets']]
print(buckets)


['123-abc-2021-05-12-t-19-23-34-333-z', '213-2021-06-15-t-19-53-17-873-z', 'annotationstestjoe', 'annotationstestjoe2', 'asth-2021-04-29-t-21-42-01-410-z', 'capstone-donotdelete-pr-vb1gwzlqjbew87', 'capstonedeeplearning', 'cc-tutorialjoe', 'celebritydataset', 'cloud-object-storage-yo-cos-standard-bse', 'cnn-2021-05-13-t-14-21-12-358-z', 'cos-standard-wv7', 'dataanalyticswithspark-donotdelete-pr-3lesisu0ln8dsd', 'dataanalyticswithsparksc0104en-donotdelete-pr-4uk8smjxgfbhpb', 'datascienceandmachinelearningcaps-donotdelete-pr-hgw4fhiabjw2zj', 'detectstuff', 'final-project-with-traffic-sig-2021-05-14-t-20-16-42-784-z', 'final-project-with-traffic-sig-2021-05-14-t-20-17-48-219-z', 'final-project-with-traffic-sig-2021-05-14-t-20-17-57-916-z', 'fuckya-2021-05-14-t-17-38-02-633-z', 'gputest-donotdelete-pr-x0qetwicaxper2', 'hotdog-2021-04-29-t-00-57-15-806-z', 'hotdog-4-2021-04-30-t-14-07-22-108-z', 'hotdog-last-2021-04-29-t-02-12-14-440-z', 'hotdog-last-2021-04-29-t-12-45-15-049-z', 'hotdog-la
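Each bucket dictionary also carries its **'CreationDate'**, so the same pattern can answer other questions about the buckets. The sketch below is an illustration only: instead of calling the service, it uses a hand-built stand-in for **response['Buckets']** made from three entries in the output above (with the standard library's **timezone.utc** in place of the **tzutc()** shown there), and sorts the buckets by creation date.

```python
from datetime import datetime, timezone

# Hand-built stand-in for response['Buckets'], using three entries from the output above
buckets_info = [
    {'Name': '123-abc-2021-05-12-t-19-23-34-333-z',
     'CreationDate': datetime(2021, 5, 12, 19, 23, 36, 605000, tzinfo=timezone.utc)},
    {'Name': '213-2021-06-15-t-19-53-17-873-z',
     'CreationDate': datetime(2021, 6, 15, 19, 53, 20, 13000, tzinfo=timezone.utc)},
    {'Name': 'annotationstestjoe',
     'CreationDate': datetime(2021, 1, 14, 17, 48, 58, 527000, tzinfo=timezone.utc)},
]

# Sort the bucket dictionaries by creation date, oldest first
by_date = sorted(buckets_info, key=lambda bucket: bucket['CreationDate'])
oldest = by_date[0]['Name']
newest = by_date[-1]['Name']

print(newest)  # 213-2021-06-15-t-19-53-17-873-z
```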

We can use the method **list_objects** to return a dictionary that contains information about the objects in a bucket. The parameter **Bucket** is the name of the bucket, in this case **'pythonfordatascience'**.

In [13]:
response = client.list_objects(Bucket='pythonfordatascience')
response.keys()

dict_keys(['ResponseMetadata', 'IBMSSEKPEnabled', 'IsTruncated', 'Marker', 'Contents', 'Name', 'Prefix', 'Delimiter', 'MaxKeys'])

We are only concerned with the value for the key **'Contents'**. This value is a list, and each element of the list is a dictionary. Just as we extracted the names of the buckets, we can extract the names of the file objects (the values for the key **'Key'**) as a list:

In [14]:
file_names=[content['Key'] for content in response['Contents']]
print(file_names)

['example1.txt', 'top_selling_albums.csv', 'top_selling_albums.xlsx']
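The keys **'IsTruncated'** and **'MaxKeys'** in the response above hint that a single **list_objects** call may not return every object: in the S3 API that IBM Cloud Object Storage follows, a listing is capped at **MaxKeys** entries, and **'IsTruncated'** is **True** when more objects remain. Below is a minimal sketch that follows truncated listings with a paginator. It assumes the **ibm_boto3** client exposes **get_paginator** the same way the standard **boto3** client does; **list_all_keys** is a hypothetical helper name.

```python
def list_all_keys(client, bucket):
    """Hypothetical helper: collect every object key in a bucket, following truncated listings."""
    # Assumes the ibm_boto3 client exposes get_paginator as the standard boto3 client does
    paginator = client.get_paginator("list_objects")
    keys = []
    for page in paginator.paginate(Bucket=bucket):
        # A page has no 'Contents' key when there are no objects to list
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

For this small bucket, **list_all_keys(client, 'pythonfordatascience')** should return the same three file names as the list comprehension above.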


We now have the list of file objects, so we can use the method **download_file** to download a file from IBM Cloud Object Storage. The parameter **Bucket** is the name of the bucket, the parameter **Key** is the name of the file object, and the parameter **Filename** is the name we would like to give to the downloaded file. For example, from the bucket **'pythonfordatascience'** we can download the file object **'example1.txt'** and save it as **'Example1.txt'** as follows:

In [16]:
client.download_file(Bucket='pythonfordatascience',Key='example1.txt',Filename='Example1.txt')
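If you only need the text and not a local copy, the object can also be read straight into memory. The sketch below assumes the client supports **get_object** and, as in **boto3**, returns a dictionary whose **'Body'** is a stream with a **read()** method; it also assumes the file is UTF-8 encoded. **read_text_object** is a hypothetical helper name.

```python
def read_text_object(client, bucket, key, encoding="utf-8"):
    """Hypothetical helper: return a text object's content as a str without saving it to disk."""
    # Assumes get_object behaves as in boto3: the response's 'Body' is a stream with read()
    response = client.get_object(Bucket=bucket, Key=key)
    # read() returns bytes, which we decode into a Python string
    return response["Body"].read().decode(encoding)
```

With the client created above, **read_text_object(client, 'pythonfordatascience', 'example1.txt')** would then return the file's text as a string.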


 <a id="ref1"></a>
<h2 align=center>Reading Text Files</h2>

One way to read or write a file in Python is to use the built-in **open** function. The **open** function returns a **file object** that contains the methods and attributes you need to read, save, and manipulate the file. In this notebook, we only cover **.txt** files. The first argument you need is the file path, including the file name. An example is shown in __Figure 1__:



 <a ><img src = "https://ibm.box.com/shared/static/6wl3vw4ghflafrou0noj70t2n4hbalqr.png" width = 500, align = "center"></a>
  <h4 align=center>  
    Figure 1: Labeled Syntax of a file object.  

  </h4> 

The mode argument is optional, and its default value is **r**. In this notebook, we only cover two modes:

- **r**: read mode, for reading files
- **w**: write mode, for writing files
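The two modes can be tried out on a scratch file. This sketch writes to a hypothetical file named **modes_demo.txt** inside a temporary directory (so **Example1.txt** is not touched), then reads it back without passing a mode, which is the same as passing **"r"**. Note that opening an existing file with **"w"** erases its previous content.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes_demo.txt")  # hypothetical scratch file

    # "w" creates the file (or empties an existing one) and opens it for writing
    with open(path, "w") as out_file:
        out_file.write("This is line 1\nThis is line 2")
        write_mode = out_file.mode

    # Leaving out the mode argument is the same as passing "r"
    with open(path) as in_file:
        read_mode = in_file.mode
        text = in_file.read()

print(write_mode, read_mode)  # w r
print(text)
```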

For the next example, we will use the text file **Example1.txt**, shown in Figure 2:


 <a ><img src = "https://ibm.box.com/shared/static/ilzy3av6x1cd3gi61bq2nq0vxb0awhju.png" width = 200, align = "center"></a>
  <h4 align=center>  
    Figure 2: The text file "Example1.txt".

  </h4> 

We open the file in read mode:

In [17]:
example1="Example1.txt"
file1 = open(example1,"r")

 We can view the attributes of the file.

The name of the file:

In [18]:
file1.name

'Example1.txt'

 The mode the file object is in:

In [19]:
file1.mode

'r'

We can read the whole file and assign its content to a variable:

In [20]:
FileContent=file1.read()
FileContent

'This is line 1 \nThis is line 2\nThis is line 3'

The “\n” is the newline character; it marks the start of a new line.
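Counting the newline characters makes the structure of the string concrete. In this small example the string is copied from the output above: it holds three lines but only two newline characters, because the last line does not end with one.

```python
# The string returned by file1.read() above, copied as a literal
content = 'This is line 1 \nThis is line 2\nThis is line 3'

newline_count = content.count("\n")  # number of newline characters
lines = content.split("\n")           # the pieces between the newlines

print(newline_count)  # 2
print(len(lines))     # 3
```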

We can print the file: 

In [21]:
print(FileContent)

This is line 1 
This is line 2
This is line 3


The file's content is of type string:

In [22]:
type(FileContent)

str

 We must close the file object:

In [23]:
file1.close()
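Once a file object is closed, it can no longer be used for reading. The sketch below shows this on a throwaway file (the name **closed_demo.txt** is hypothetical): calling **read()** on the closed object raises a **ValueError**.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "closed_demo.txt")  # hypothetical scratch file
    with open(path, "w") as out_file:
        out_file.write("This is line 1")

    handle = open(path, "r")
    handle.close()

    message = None
    try:
        handle.read()  # any read on a closed file object fails
    except ValueError as error:
        message = str(error)

print(handle.closed)  # True
print(message)
```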

<h3> A Better Way to Open a File </h3>

Using the **with** statement is better practice because it automatically closes the file, even if the code encounters an exception. The code runs everything in the indented block and then closes the file object.


In [24]:
with open(example1,"r") as file1:
    FileContent=file1.read()
    print(FileContent)

This is line 1 
This is line 2
This is line 3


The file object is now closed; you can verify this by running the following cell:

In [25]:
file1.closed

True
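For a file object, the **with** block behaves roughly like the **try**/**finally** pattern below, where the **finally** clause closes the file whether or not the code inside **try** raises an exception. This is a simplified sketch (the **with** statement actually delegates to the file object's context-manager methods), and it works on a temporary copy of the example text, with the hypothetical name **Example1_copy.txt**, rather than on the downloaded file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Example1_copy.txt")  # hypothetical scratch file
    with open(path, "w") as out_file:
        out_file.write("This is line 1 \nThis is line 2\nThis is line 3")

    # Roughly what `with open(path, "r") as file2:` does for a file object
    file2 = open(path, "r")
    try:
        content = file2.read()
    finally:
        file2.close()  # runs even if read() raises an exception

print(file2.closed)  # True
print(content)
```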

The variable **FileContent** still holds the file's content after the file is closed:

In [26]:
print(FileContent)

This is line 1 
This is line 2
This is line 3


The syntax can be a little confusing because the file object comes after the **as** keyword, and we don't explicitly close the file. Therefore, we summarise the steps in a figure:

 <a ><img src = "https://ibm.box.com/shared/static/ywul1ji1ld82xwz60ljxvbg6fs2vrunm.png" width = 500, align = "center"></a>
  <h4 align=center>  
    The syntax for opening a file using a 'with' statement.

  </h4> 

The method **readlines()** reads the whole file and returns a list with one string per line:

In [27]:
with open(example1,"r") as file1:
    FileContent=file1.readlines()
    print(FileContent)

['This is line 1 \n', 'This is line 2\n', 'This is line 3']


We don't have to read the entire file. For example, we can read the first 4 characters by passing 4 as an argument to the method **.read()**:


In [28]:
with open(example1,"r") as file1:
    print(file1.read(4))

This


Once the method **.read(4)** is called, the first 4 characters are returned. If we call the method again, the next 4 characters are returned. The output of the following cell demonstrates the process for different inputs to the method **read()**:



In [29]:
with open(example1,"r") as file1:
    print(file1.read(4))
    print(file1.read(4))
    print(file1.read(7))
    print(file1.read(15))


This
 is 
line 1 

This is line 2
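The method **tell()** reports where the next read will start, which makes the moving position visible. In this sketch an **io.StringIO** object holding the same text stands in for **Example1.txt**. For a StringIO the position is a character count, while for a real text file **tell()** returns an opaque number that is only guaranteed to be meaningful to **seek()**.

```python
import io

# io.StringIO stands in for the opened Example1.txt, so no file on disk is needed
stream = io.StringIO('This is line 1 \nThis is line 2\nThis is line 3')

positions = []
for size in (4, 4, 7, 15):           # the same sizes as the cell above
    chunk = stream.read(size)
    positions.append(stream.tell())  # position after this read
    print(repr(chunk), stream.tell())
```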


The process is illustrated in the figure below; each colour represents the part of the file read by one call to the method **read()**:


 <a ><img src = "https://ibm.box.com/shared/static/s0xs6y4vcvabp2ll2pwspa6kd8qeoddj.png" width = 500, align = "center"></a>
  <h4 align=center>  
     Illustration of using the method **.read()** to read different characters 

  </h4> 

 Here is an example using the same file, but instead we read 16, 5, and then 9 characters at a time: 

In [30]:
with open(example1,"r") as file1:
    print(file1.read(16))
    print(file1.read(5))
    print(file1.read(9))


This is line 1 

This 
is line 2
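Because each call to **read(n)** continues where the previous one stopped, and **read()** returns an empty string once the end of the file is reached, a file can be processed in fixed-size pieces with a loop. This is a sketch that again uses **io.StringIO** as a stand-in for **Example1.txt**; the chunk size of 16 is arbitrary.

```python
import io

# io.StringIO stands in for the opened Example1.txt
stream = io.StringIO('This is line 1 \nThis is line 2\nThis is line 3')

chunks = []
while True:
    chunk = stream.read(16)  # at most 16 characters per call
    if chunk == "":          # an empty string means the end of the file was reached
        break
    chunks.append(chunk)

print(chunks)
```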


We can also read one line of the file at a time using the method **readline()**: 

In [31]:
with open(example1,"r") as file1:
    print("first line: " + file1.readline())


first line: This is line 1 
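Each call to **readline()** returns the next line, including its trailing newline. This makes it possible to tell a blank line apart from the end of the file: a blank line comes back as **'\n'**, while the end of the file comes back as an empty string. The sketch below uses a hypothetical three-line text, held in an **io.StringIO** object, whose second line is blank.

```python
import io

# Hypothetical text whose second line is blank
stream = io.StringIO("first\n\nthird")

results = []
for _ in range(4):
    results.append(stream.readline())  # one line per call

print(results)  # ['first\n', '\n', 'third', '']
```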



 We can use a loop to iterate through each line: 


In [32]:
with open(example1,"r") as file1:
    i = 0
    for line in file1:
        print("Iteration", str(i), ":", line)
        i = i + 1

Iteration 0 : This is line 1 

Iteration 1 : This is line 2

Iteration 2 : This is line 3
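The blank lines in the output appear because every line except the last already ends with a newline and **print()** adds another. A more compact variant, sketched below, lets **enumerate()** keep the counter and strips the trailing newline before formatting; **io.StringIO** stands in for the file again.

```python
import io

# io.StringIO stands in for the opened Example1.txt
stream = io.StringIO('This is line 1 \nThis is line 2\nThis is line 3')

rows = []
for i, line in enumerate(stream):  # enumerate supplies the counter i
    text = line.rstrip("\n")        # drop the newline so print adds only one
    rows.append(f"Iteration {i} : {text}")

print("\n".join(rows))
```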


We can use the method **readlines()** to save the text file to a list:

In [33]:
with open(example1,"r") as file1:
    FileasList=file1.readlines()

 Each element of the list corresponds to a line of text:

In [34]:
FileasList[0]

'This is line 1 \n'

In [35]:
FileasList[1]

'This is line 2\n'

In [36]:
FileasList[2]

'This is line 3'
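As the outputs show, **readlines()** keeps the newline at the end of every line except the last. When you want the lines without it, one option is to read the whole text and call the string method **splitlines()**, as in this sketch, which again uses **io.StringIO** in place of the file and **seek(0)** to return to the start before reading a second time.

```python
import io

# io.StringIO stands in for the opened Example1.txt
stream = io.StringIO('This is line 1 \nThis is line 2\nThis is line 3')

with_newlines = stream.readlines()             # keeps the trailing "\n"
stream.seek(0)                                 # go back to the start of the text
without_newlines = stream.read().splitlines()  # drops the line endings

print(with_newlines)
print(without_newlines)
```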

### References

1) <a href="https://ibm.github.io/ibm-cos-sdk-python/reference/core/boto3.html">ibm-cos-sdk Reference</a>

2) <a href="https://dataplatform.ibm.com/analytics/notebooks/v2/ee1d0b44-0fce-4cf6-8545-e1dc961d0668/view?access_token=c0489b861ab65f63be7e3c5ce962003a2a0197660e67ecb140c477c2e11b5fe3">IBM Cloud Object Storage in Python</a>

3) <a href="https://console.bluemix.net/docs/services/cloud-object-storage/libraries/python.html#using-python">IBM Cloud Docs: Cloud Object Storage in Python</a>