# <center><u>File Handling</u></center>

This notebook will give you an introduction to the concept of File Handling in Python.

The following topics will be covered.

[1. Printing output to screen and taking input from keyboard](#Printingoutput)

[2. File Reading](#Reading)

[3. File Writing](#Writing)

[4. Appending to an existing file](#Append)

[5. Introduction to os](#os)

[6. Introduction to pandas](#Introduction)



<a name='Printingoutput'></a>
### <u>Printing output to screen and taking input from keyboard</u>


##### <u>input('prompt')</u>

input is a built-in Python function that reads a line of text typed by the user on the keyboard.<br>
<pre><b>Arguments</b>
    prompt => The text shown to the user when input is requested<br>
<b>Return</b>
    Return => The text the user typed. This is always returned as a <b>String</b> (convert it with int() or float() if a number is needed)</pre>
[Read about input()](https://docs.python.org/3/library/functions.html#input)
<br><br><br>


##### <u>print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)</u>
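print is a built-in Python function used to print values (objects) to the screen
<pre><b>Arguments</b>
    *objects => The objects to be printed (e.g. strings, ints, floats, dictionaries, ...)<br>
    sep => The string placed between the objects (default = ' ')<br>
    end => The string written after the last object (default = '\n')<br>
    file => Where the output is written (default = sys.stdout, i.e. the screen)<br>
    flush => Whether to force the output to be written immediately (default = False)
<b>Return</b>
    Return => None. The objects are written to the screen as a side effect</pre>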
[Read about print()](https://docs.python.org/3/library/functions.html#print)
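
The effect of `sep` and `end` is easiest to see by capturing the exact characters that print() produces. The sketch below is only an illustration: it uses `io.StringIO` as the `file` argument (an assumption made here so the output can be inspected; it is not used elsewhere in this notebook).

```python
import io

# Capture what print() writes so the exact characters can be inspected.
buffer = io.StringIO()

# sep is placed between the objects; end is written after the last one.
print('The sum of the two numbers', 10, sep=' :- ', end=' finished\n', file=buffer)

# With the defaults, sep is a single space and end is a newline.
print('The sum of the two numbers', 10, file=buffer)

captured = buffer.getvalue()
print(captured, end='')  # show the captured text on the screen
```

Leaving out `file=buffer` sends the same text straight to the screen.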

<a id='mode_table'></a>

![title](Images/AccessModes.PNG)
![title](Images/FileObjectAttribute.PNG)

In [4]:
num_1 = input('Input your first number my friend :- ')
num_2 = input('Input your second number my friend :- ')
num_1 = int(num_1)
num_2 = int(num_2)

sum_of_numbers = num_1 + num_2
#print('The sum of the two numbers',sum_of_numbers,sep=' :- ',end=' finished')
#print('The sum of the two numbers',sum_of_numbers,sep=' ',end='\n')
print('The sum of the two numbers',sum_of_numbers)

Input your first number my friend :-  5
Input your second number my friend :-  5


The sum of the two numbers 10
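
The int() conversions in the cell above matter because input() always returns a string. A minimal sketch of what happens with and without the conversion, using hard-coded strings in place of keyboard input (the helper names `add_as_text` and `add_as_numbers` are made up for this illustration):

```python
def add_as_text(first, second):
    # "+" on two strings joins them end to end instead of adding numbers.
    return first + second


def add_as_numbers(first, second):
    # Convert each string to an int before adding.
    return int(first) + int(second)


# '5' and '5' stand in for what input() would return from the keyboard.
text_result = add_as_text('5', '5')
number_result = add_as_numbers('5', '5')

print(repr(text_result))  # the joined string
print(number_result)      # the numeric sum
```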


<a name='Reading'></a>
### <u>File Reading</u>

##### <u>open(file_name, mode, buffering)</u>

open is a built-in Python function that creates a file object (the representation of a file <b>within</b> Python) for a specific file <br>
<pre><b>Arguments</b>
    file_name => The file path (location of file, <b>relative</b> or <b>absolute</b>)<br>
    mode => Access mode of the file(discussed below) (default is 'r')<br>
    buffering => buffering is an optional integer used to set the buffering policy (check python documentation using the link below)
<b>Return</b>
    Return => The python file object</pre>
<br>

[Read about open()](https://docs.python.org/3/library/functions.html#open)
<br><br><br>
##### <u>f.close()</u>

f.close() (where f is a file object) is a <b>file object method</b> that closes the file object. It flushes any unwritten data, and after closing no more reading or writing can be done through f.<br>
<br>




In [5]:
fileObject = open("Files/example.txt", mode = "r")

#### There are three main ways to read the file

![title](Images/new_reads.PNG)

#### 1. Using read(). 
<pre>
    This method reads the whole file if a count argument is not given. If the count argument is given, read()
    reads at most count characters, starting from the current position of the file pointer.
</pre>

In [6]:
string = fileObject.read()
print(string)

Hello guys,
Welcome to the python workshop
conducted by YGSL
This is the example text
let's learn how to load a file
line by line 
in a python script.
Thanks for joining guys.


In [7]:
string = fileObject.read() #why does it result in an empty string
string

''

In [8]:
fileObject.seek(0,0); #move the file pointer back to the start (re-opening the file would also reset it)

[Read about seek() here!!!](https://pscustomobject.github.io/python/Python-Reset-Read-Write-Position/)
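
The empty string seen above comes from the file pointer: the first read() leaves it at the end of the file, so a second read() has nothing left to return. A minimal sketch, assuming a throwaway file created with `tempfile` (the file name `pointer_demo.txt` is made up so the example does not depend on Files/example.txt):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical demo file, written only for this sketch.
    path = os.path.join(tmp_dir, 'pointer_demo.txt')
    with open(path, 'w') as f:
        f.write('Hello guys,\nWelcome\n')

    with open(path, 'r') as f:
        first_read = f.read()   # reads everything; the pointer is now at the end
        second_read = f.read()  # nothing left to read, so this is ''
        f.seek(0)               # move the pointer back to the start
        third_read = f.read()   # the whole content again

print(repr(first_read))
print(repr(second_read))
print(repr(third_read))
```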

In [9]:
string = fileObject.read(10)
print(string)
fileObject.seek(0,0); #resetting pointer since I'm using the same fileObject for next examples

Hello guys


#### 2. Using readline().

<pre>
    The fileObject.readline() method reads a single line from a file object. A newline character ('\n') is left at the end
    of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. If readline()
    returns an empty string, the end of the file has been reached. An optional size argument limits how many characters are read.
</pre>

In [10]:
line = fileObject.readline()
line

'Hello guys,\n'

In [11]:
fileObject.seek(0,0);

In [12]:
line = fileObject.readline(15)
line

'Hello guys,\n'

In [13]:
fileObject.seek(0,0);

In [14]:
line = fileObject.readline(25)
line

'Hello guys,\n'
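
In the cells above the size argument (15 and 25) is larger than the first line, so the whole line comes back each time. The sketch below shows what happens when the size is smaller than the line, and how an empty string can be used to detect the end of the file. It uses `io.StringIO` as a stand-in for a real file (an assumption made to keep the example self-contained).

```python
import io

# io.StringIO behaves like a text file opened for reading.
fake_file = io.StringIO('Hello guys,\nWelcome to the python workshop\nlast line')

partial = fake_file.readline(5)  # size is smaller than the line: only 5 characters
rest = fake_file.readline()      # continues from where the previous call stopped

lines = []
while True:
    line = fake_file.readline()
    if line == '':               # an empty string means the end of the file
        break
    lines.append(line)

print(repr(partial))
print(repr(rest))
print(lines)
```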

In [15]:
fileObject.close()

In [16]:
fileObject.read()

ValueError: I/O operation on closed file.

#### 2.5. Using readlines().

In [19]:
fileObject = open("Files/example.txt", mode = "r")

In [20]:
lines = fileObject.readlines()
lines

['Hello guys,\n',
 'Welcome to the python workshop\n',
 'conducted by YGSL\n',
 'This is the example text\n',
 "let's learn how to load a file\n",
 'line by line \n',
 'in a python script.\n',
 'Thanks for joining guys.']
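
readlines() returns every line as one list element and keeps the trailing '\n' on each line except possibly the last. If the newline characters are not wanted, one common approach is to strip them with a list comprehension. A small sketch, again using `io.StringIO` as an assumed stand-in for the real file:

```python
import io

# Stand-in for a file whose last line does not end with a newline.
fake_file = io.StringIO('Hello guys,\nline by line \nThanks for joining guys.')

raw_lines = fake_file.readlines()

# rstrip('\n') removes only the trailing newline and keeps other spaces.
clean_lines = [line.rstrip('\n') for line in raw_lines]

print(raw_lines)
print(clean_lines)
```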

#### 3. Using with open.

The advantage of this method is that you don't need to close the file object manually: the file is closed automatically when the <b>with open</b> block ends, so it can't be read or written after the block. Since a new file object is created each time the code runs, there is no need to reset the file pointer.

In [None]:
with open("Files/example.txt", "r") as fo:
    for line in fo:
        print(line)
        print('*')# added to show that each line is separately printed

Hello guys,

*
Welcome to the python workshop

*
conducted by YGSL

*
This is the example text

*
let's learn how to load a file

*
line by line 

*
in a python script.

*
Thanks for joining guys.
*
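
The blank lines in the output above appear because each line already ends with '\n' and print() adds another one. The sketch below avoids that with `end=''` and also checks the `closed` attribute of the file object inside and outside the with block. The file is a throwaway created with `tempfile` (an assumption for this illustration; `with_demo.txt` is a made-up name).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical demo file, written only for this sketch.
    path = os.path.join(tmp_dir, 'with_demo.txt')
    with open(path, 'w') as fo:
        fo.write('first line\nsecond line\n')

    with open(path, 'r') as fo:
        for line in fo:
            print(line, end='')    # the line already carries its own newline
        closed_inside = fo.closed  # still open inside the block

    closed_outside = fo.closed     # closed automatically once the block ends

print(closed_inside, closed_outside)
```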


<a name='Writing'></a>
### <u>File Writing</u>

write() is used to write to a file object. If the file doesn't exist, opening it in write mode ('w') creates it first; if it already exists, its old contents are erased. Always check the mode you use when creating the file object.

[Read more about write() and other file handling functions](https://www.tutorialspoint.com/python/python_files_io.htm)

In [None]:
file_obj = open('Files/text_write.txt',mode='w')

In [None]:
file_obj.write( "Python is a great language.\nYeah I love it!\n");

In [None]:
file_obj.close()
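
Two details of write mode are worth seeing in action: write() returns the number of characters written, and opening an existing file with 'w' discards what was there before. A minimal sketch, assuming a throwaway file made with `tempfile` (`write_demo.txt` is a made-up name):

```python
import os
import tempfile

first_text = 'Python is a great language.\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'write_demo.txt')

    # The file does not exist yet, so 'w' creates it.
    with open(path, 'w') as f:
        written = f.write(first_text)  # number of characters written

    # Opening with 'w' again erases the previous content.
    with open(path, 'w') as f:
        f.write('Yeah I love it!\n')

    with open(path, 'r') as f:
        content = f.read()

print(written)
print(repr(content))
```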

<a name='Append'></a>
### <u>Appending to an existing file</u>

When opening the file we need to open it in append mode; see the [table](#mode_table) above for the modes.
<b>In append mode, remember that the file pointer starts at the end of the file</b>

In [None]:
file_obj = open('Files/text_write.txt',mode='a+')

In [None]:
file_obj.write(' \nJust appending some text\nlests see what happens guys.');

In [None]:
#since the file pointer is at the end of the file, I'm resetting it to the beginning before printing the lines
file_obj.seek(0,0);

In [None]:
for line in file_obj:
    print(line)

Python is a great language.

Yeah I love it!

 

Just appending some text

lests see what happens guys.
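
For comparison with write mode, the sketch below shows that 'a' and 'a+' keep the existing content and add new text at the end. It is an illustration under the assumption of a throwaway file created with `tempfile` (`append_demo.txt` is a made-up name).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'append_demo.txt')

    with open(path, 'w') as f:
        f.write('first\n')

    # 'a' keeps the old content and writes after it.
    with open(path, 'a') as f:
        f.write('second\n')

    # 'a+' also allows reading, but the pointer must be moved back first.
    with open(path, 'a+') as f:
        f.write('third\n')
        f.seek(0)
        content = f.read()

print(repr(content))
```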


<a name='os'></a>
### <u>Introduction to os Module</u>

os is a Python module used for operating-system related tasks. A few important functions are given below, but the os module has many more applications that we highly encourage you to explore on your own.

<b>Note :- Knowledge of this module is not tested in first semester examinations, and we encourage you NOT to use it in any university examinations during the first semester. However, when you're using Python as a practical tool and for further learning (e.g. deep learning, machine learning, etc.), the os module is a valuable tool.</b>

In [22]:
import os

##### 1. Checking whether a file exists

In [23]:
os.path.exists('./Files/example.txt') #path can be relative or absolute

True

In [24]:
os.path.exists('./Images/image_that_isnt_there.jpg')

False
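
os.path.exists() returns True for both files and directories. When the difference matters, os.path.isfile() and os.path.isdir() can be used instead. A small sketch, assuming a temporary directory so it does not touch the workshop folders (the file names are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'example.txt')
    with open(file_path, 'w') as f:
        f.write('hi')

    checks = {
        'dir_exists': os.path.exists(tmp_dir),     # exists() is True for a directory
        'dir_is_file': os.path.isfile(tmp_dir),    # but a directory is not a file
        'dir_is_dir': os.path.isdir(tmp_dir),
        'file_is_file': os.path.isfile(file_path),
        'missing_exists': os.path.exists(os.path.join(tmp_dir, 'missing.jpg')),
    }

print(checks)
```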

##### 2. List of files inside a directory

In [25]:
os.listdir('./Files')

['.ipynb_checkpoints', 'example.txt', 'text_write.txt']

##### 3. Make Directory

In [26]:
os.mkdir('./Test_os_mkdir')

In [27]:
if 'Test_os_mkdir' not in os.listdir('./'):
    os.mkdir('./Test_os_mkdir')
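
The check before os.mkdir() is needed because calling it on a directory that already exists raises an error. An alternative that avoids the manual check is os.makedirs() with `exist_ok=True`, which also creates missing parent directories. A sketch under the assumption of a temporary working area (the nested folder names 'a' and 'b' are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, 'Test_os_mkdir')
    os.mkdir(target)

    # A second os.mkdir() on the same path raises FileExistsError.
    try:
        os.mkdir(target)
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

    # makedirs with exist_ok=True does nothing if the directory is already there.
    os.makedirs(target, exist_ok=True)

    # It also creates any missing directories along the path.
    nested = os.path.join(target, 'a', 'b')
    os.makedirs(nested, exist_ok=True)
    nested_created = os.path.isdir(nested)

print(second_call_failed, nested_created)
```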

##### 4. Delete file or directory


In [28]:
with open('Test_os_mkdir/test_file.txt',mode = 'w') as file:
    file.write('Hello the file here is for a test')

In [29]:
os.remove('Test_os_mkdir/test_file.txt')

In [30]:
os.rmdir('Test_os_mkdir') #directory must be empty to use this

<pre>import shutil
shutil.rmtree(path)</pre>

Use this if the directory is not empty.
[Read this](https://docs.python.org/3/library/shutil.html) and [this](https://stackoverflow.com/questions/1557351/python-delete-non-empty-dir) for more information
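
A runnable sketch of the difference between os.rmdir() and shutil.rmtree(), assuming a scratch directory created with `tempfile.mkdtemp()` so nothing in the workshop folders is deleted (the names 'Test' and 'test_file.txt' are only placeholders here):

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()            # scratch area for this sketch
target = os.path.join(base, 'Test')
os.mkdir(target)
with open(os.path.join(target, 'test_file.txt'), 'w') as f:
    f.write('Hello the file here is for a test')

# os.rmdir() refuses to delete a directory that still has files in it.
try:
    os.rmdir(target)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# shutil.rmtree() deletes the directory together with everything inside it.
shutil.rmtree(target)
removed = not os.path.exists(target)

shutil.rmtree(base)                  # clean up the scratch area
print(rmdir_failed, removed)
```

shutil.rmtree() deletes without asking, so it is worth double-checking the path before running it.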

##### 5. Rename File

In [31]:
os.mkdir('Test')
with open('Test/test_file.txt','w') as file_obj:
    file_obj.write('This is pre work to show renaming')

In [32]:
os.rename('Test/test_file.txt','Test/renamed_file.txt')#renaming file

In [33]:
os.rmdir('Test') # see how this cell gives an error since it is not empty

OSError: [WinError 145] The directory is not empty: 'Test'

In [34]:
os.remove('Test/renamed_file.txt')
os.rmdir('Test')

##### 6. Getting Working Directory

![title](Images/cwd.PNG)

In [35]:
os.getcwd()

'F:\\YGSL\\CompleteFiles_PythonWorkshop\\Week-5'
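
The working directory is what relative paths are resolved against. The sketch below builds an absolute path from a relative one with os.path.join(), which inserts the correct separator for the operating system. The path 'Files/example.txt' is only used as a string here; the file does not need to exist for this illustration.

```python
import os

cwd = os.getcwd()

# Join the parts with the separator of the current operating system.
relative = os.path.join('Files', 'example.txt')
absolute = os.path.join(cwd, relative)

print(relative)
print(absolute)
print(os.path.isabs(relative), os.path.isabs(absolute))
```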

<a name='Introduction'></a>
### <u>Introduction to pandas</u>

Pandas is a Python library used to manipulate large quantities of data. <b>It is very important in Data Science.</b>

![title](Images/pandas.jpg)


<b>Note :- Knowledge of this module is not tested in first semester examinations, and we encourage you NOT to use it in any university examinations during the first semester. However, when you're using Python as a practical tool and for further learning (e.g. deep learning, machine learning, etc.), pandas is a valuable tool.</b><br>

#### [Read Pandas Documentation](https://pandas.pydata.org/docs/reference/index.html)

Pandas is mainly used for <b>DataFrames</b>. A DataFrame can be thought of as an object that represents an Excel-like sheet inside Python. Large datasets are usually saved in CSV files (Comma-Separated Values files; these are like Excel sheets but are plain text without formatting, so they require fewer resources).

<b>Pandas is a very widely used Python library and has many functions to handle other data formats such as JSON, HTML and Excel, but we will only discuss CSV handling, which is what pandas is mainly used for.</b>

![title](Images/pandas_read_functions.png)
![title](Images/pandas_read_functions_2.png)

![title](Images/PandasIntro.PNG)

## Pandas Series

In [37]:
import pandas as pd #"as pd" gives pandas a short alias, otherwise we would have to write pandas every time,
                    #this is a common convention

##### 1. Series Creation

In [36]:
object1 = [10,20,30]
object2 = ['a','b','c']
object3 = {
    'a':10,
    'b':20,
    "c":30
}

In [38]:
pd.Series(data=object1)

0    10
1    20
2    30
dtype: int64

In [39]:
pd.Series(data=object1,index=object2)

a    10
b    20
c    30
dtype: int64

In [41]:
series_1 = pd.Series(data=object3)
series_1

a    10
b    20
c    30
dtype: int64

In [42]:
series_1['b']

20

#### 1. Read a CSV File

[Read about pd.read_csv() in docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv)

In [44]:
dataframe = pd.read_csv('CSV/temperature.csv')

In [45]:
dataframe #this is a dataframe object (like an Excel sheet representation within python)

Unnamed: 0,datetime,Vancouver,Portland,San Francisco,Seattle,Los Angeles,San Diego,Las Vegas,Phoenix,Albuquerque,...,Philadelphia,New York,Montreal,Boston,Beersheba,Tel Aviv District,Eilat,Haifa,Nahariyya,Jerusalem
0,2012-10-01 12:00:00,,,,,,,,,,...,,,,,,,309.100000,,,
1,2012-10-01 13:00:00,284.630000,282.080000,289.480000,281.800000,291.870000,291.530000,293.410000,296.600000,285.120000,...,285.630000,288.220000,285.830000,287.170000,307.590000,305.470000,310.580000,304.4,304.4,303.5
2,2012-10-01 14:00:00,284.629041,282.083252,289.474993,281.797217,291.868186,291.533501,293.403141,296.608509,285.154558,...,285.663208,288.247676,285.834650,287.186092,307.590000,304.310000,310.495769,304.4,304.4,303.5
3,2012-10-01 15:00:00,284.626998,282.091866,289.460618,281.789833,291.862844,291.543355,293.392177,296.631487,285.233952,...,285.756824,288.326940,285.847790,287.231672,307.391513,304.281841,310.411538,304.4,304.4,303.5
4,2012-10-01 16:00:00,284.624955,282.100481,289.446243,281.782449,291.857503,291.553209,293.381213,296.654466,285.313345,...,285.850440,288.406203,285.860929,287.277251,307.145200,304.238015,310.327308,304.4,304.4,303.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45248,2017-11-29 20:00:00,,282.000000,,280.820000,293.550000,292.150000,289.540000,294.710000,285.720000,...,290.240000,,275.130000,288.080000,,,,,,
45249,2017-11-29 21:00:00,,282.890000,,281.650000,295.680000,292.740000,290.610000,295.590000,286.450000,...,289.240000,,274.130000,286.020000,,,,,,
45250,2017-11-29 22:00:00,,283.390000,,282.750000,295.960000,292.580000,291.340000,296.250000,286.440000,...,286.780000,,273.480000,283.940000,,,,,,
45251,2017-11-29 23:00:00,,283.020000,,282.960000,295.650000,292.610000,292.150000,297.150000,286.140000,...,284.570000,,272.480000,282.170000,,,,,,


In [46]:
dataframe['datetime']

0        2012-10-01 12:00:00
1        2012-10-01 13:00:00
2        2012-10-01 14:00:00
3        2012-10-01 15:00:00
4        2012-10-01 16:00:00
                ...         
45248    2017-11-29 20:00:00
45249    2017-11-29 21:00:00
45250    2017-11-29 22:00:00
45251    2017-11-29 23:00:00
45252    2017-11-30 00:00:00
Name: datetime, Length: 45253, dtype: object

In [47]:
dataframe.datetime

0        2012-10-01 12:00:00
1        2012-10-01 13:00:00
2        2012-10-01 14:00:00
3        2012-10-01 15:00:00
4        2012-10-01 16:00:00
                ...         
45248    2017-11-29 20:00:00
45249    2017-11-29 21:00:00
45250    2017-11-29 22:00:00
45251    2017-11-29 23:00:00
45252    2017-11-30 00:00:00
Name: datetime, Length: 45253, dtype: object

#### 2. Read first and last N rows

[Read about head() from docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html)
<br>
[Read about tail() from docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html)

In [None]:
N = int(input('How many rows :- '))

How many rows :-  5


In [None]:
dataframe.head(N)

Unnamed: 0,datetime,Vancouver,Portland,San Francisco,Seattle,Los Angeles,San Diego,Las Vegas,Phoenix,Albuquerque,...,Philadelphia,New York,Montreal,Boston,Beersheba,Tel Aviv District,Eilat,Haifa,Nahariyya,Jerusalem
0,2012-10-01 12:00:00,,,,,,,,,,...,,,,,,,309.1,,,
1,2012-10-01 13:00:00,284.63,282.08,289.48,281.8,291.87,291.53,293.41,296.6,285.12,...,285.63,288.22,285.83,287.17,307.59,305.47,310.58,304.4,304.4,303.5
2,2012-10-01 14:00:00,284.629041,282.083252,289.474993,281.797217,291.868186,291.533501,293.403141,296.608509,285.154558,...,285.663208,288.247676,285.83465,287.186092,307.59,304.31,310.495769,304.4,304.4,303.5
3,2012-10-01 15:00:00,284.626998,282.091866,289.460618,281.789833,291.862844,291.543355,293.392177,296.631487,285.233952,...,285.756824,288.32694,285.84779,287.231672,307.391513,304.281841,310.411538,304.4,304.4,303.5
4,2012-10-01 16:00:00,284.624955,282.100481,289.446243,281.782449,291.857503,291.553209,293.381213,296.654466,285.313345,...,285.85044,288.406203,285.860929,287.277251,307.1452,304.238015,310.327308,304.4,304.4,303.5


In [None]:
dataframe.tail(N)

Unnamed: 0,datetime,Vancouver,Portland,San Francisco,Seattle,Los Angeles,San Diego,Las Vegas,Phoenix,Albuquerque,...,Philadelphia,New York,Montreal,Boston,Beersheba,Tel Aviv District,Eilat,Haifa,Nahariyya,Jerusalem
45248,2017-11-29 20:00:00,,282.0,,280.82,293.55,292.15,289.54,294.71,285.72,...,290.24,,275.13,288.08,,,,,,
45249,2017-11-29 21:00:00,,282.89,,281.65,295.68,292.74,290.61,295.59,286.45,...,289.24,,274.13,286.02,,,,,,
45250,2017-11-29 22:00:00,,283.39,,282.75,295.96,292.58,291.34,296.25,286.44,...,286.78,,273.48,283.94,,,,,,
45251,2017-11-29 23:00:00,,283.02,,282.96,295.65,292.61,292.15,297.15,286.14,...,284.57,,272.48,282.17,,,,,,
45252,2017-11-30 00:00:00,,282.28,,283.04,294.93,291.4,291.64,297.15,284.7,...,283.42,,271.8,280.65,,,,,,


#### 3. Get shape of Dataframe
[Read about shape from docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html)

In [None]:
dataframe.shape #output in the form (no of rows,no of columns)

(45253, 37)

#### 4. Getting specific columns

[Read more](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/03_subset_data.html)

In [None]:
few_columns = dataframe[['datetime','Vancouver','Portland','San Francisco']] #returns a new dataframe 

In [None]:
few_columns #new dataframe so we can apply all the functions to this as well

Unnamed: 0,datetime,Vancouver,Portland,San Francisco
0,2012-10-01 12:00:00,,,
1,2012-10-01 13:00:00,284.630000,282.080000,289.480000
2,2012-10-01 14:00:00,284.629041,282.083252,289.474993
3,2012-10-01 15:00:00,284.626998,282.091866,289.460618
4,2012-10-01 16:00:00,284.624955,282.100481,289.446243
...,...,...,...,...
45248,2017-11-29 20:00:00,,282.000000,
45249,2017-11-29 21:00:00,,282.890000,
45250,2017-11-29 22:00:00,,283.390000,
45251,2017-11-29 23:00:00,,283.020000,


#### 5. Getting Specific value at row and column


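<pre>We can use 
    1. dataframe.loc[]   (select by row and column labels)
    2. dataframe.at[]    (single value, by labels)
    3. dataframe.iloc[]  (select by integer positions)
    4. dataframe.iat[]   (single value, by integer positions)
For the full differences between these indexers, please read the pandas documentation linked below.</pre>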


[Read about .loc[]](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) 
<br>
[Read about .iloc[]](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html)
<br>
[Read about .at[]](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html)
<br>
[Read about .iat[]](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iat.html)

In [None]:
dataframe.loc[4,'Vancouver']

284.624954535

In [None]:
dataframe.iloc[4,4]

281.78244858
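
In the temperature dataframe the row labels happen to equal the row positions, so .loc and .iloc look similar. The sketch below uses a tiny dataframe with made-up values and row labels 10, 20, 30 (all assumptions for illustration, not taken from temperature.csv) to show that .loc/.at work with labels while .iloc/.iat work with integer positions.

```python
import pandas as pd

# Made-up values, only loosely shaped like the temperature data.
small = pd.DataFrame(
    {
        'datetime': ['t0', 't1', 't2'],
        'Vancouver': [284.63, 284.62, 284.61],
        'Portland': [282.08, 282.09, 282.10],
    },
    index=[10, 20, 30],  # row labels that differ from the positions 0, 1, 2
)

by_label = small.loc[20, 'Portland']     # row label 20, column label 'Portland'
by_position = small.iloc[1, 2]           # second row, third column
scalar_label = small.at[20, 'Portland']  # like .loc, for a single value
scalar_position = small.iat[1, 2]        # like .iloc, for a single value

print(by_label, by_position, scalar_label, scalar_position)
```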

#### 6. Getting specific rows with a condition

In [None]:
dataframe.Vancouver #Gives the values of the column as a series (like a dataframe but only one column)

0               NaN
1        284.630000
2        284.629041
3        284.626998
4        284.624955
            ...    
45248           NaN
45249           NaN
45250           NaN
45251           NaN
45252           NaN
Name: Vancouver, Length: 45253, dtype: float64

In [None]:
df = dataframe.loc[dataframe.Vancouver >= 305]

In [None]:
df

Unnamed: 0,datetime,Vancouver,Portland,San Francisco,Seattle,Los Angeles,San Diego,Las Vegas,Phoenix,Albuquerque,...,Philadelphia,New York,Montreal,Boston,Beersheba,Tel Aviv District,Eilat,Haifa,Nahariyya,Jerusalem
34041,2016-08-19 21:00:00,305.9,310.22,298.93,305.19,305.65,303.11,308.15,312.4,300.42,...,303.08,302.85,294.59,300.96,293.173,299.35,301.323,301.173,301.173,299.25
34042,2016-08-19 22:00:00,306.69,310.98,299.39,305.81,306.27,303.15,310.37,312.62,297.84,...,302.29,302.73,293.39,299.78,293.173,298.65,301.323,301.173,301.173,298.55
34043,2016-08-19 23:00:00,307.0,310.44,299.34,306.07,305.67,301.98,310.37,312.81,296.79,...,300.86,301.7,292.78,298.34,290.96,298.15,298.61,300.96,300.96,297.95
34044,2016-08-20 00:00:00,306.69,309.8,298.63,305.42,304.53,300.88,310.37,312.21,295.67,...,298.43,299.94,291.14,296.6,290.96,297.65,298.61,300.96,300.96,297.65
34045,2016-08-20 01:00:00,306.06,308.88,298.1,304.37,302.53,299.16,309.82,310.63,290.42,...,296.89,298.94,289.77,295.22,290.96,297.85,298.61,300.96,300.96,297.85
34046,2016-08-20 02:00:00,305.3,307.82,294.64,302.68,299.53,296.71,307.59,309.14,288.19,...,295.67,298.11,288.81,294.31,290.42,297.45,296.52,300.82,300.82,297.45


In [None]:
df = dataframe.loc[(dataframe.Vancouver >= 305)&(dataframe.Portland>=310)]

In [None]:
df

Unnamed: 0,datetime,Vancouver,Portland,San Francisco,Seattle,Los Angeles,San Diego,Las Vegas,Phoenix,Albuquerque,...,Philadelphia,New York,Montreal,Boston,Beersheba,Tel Aviv District,Eilat,Haifa,Nahariyya,Jerusalem
34041,2016-08-19 21:00:00,305.9,310.22,298.93,305.19,305.65,303.11,308.15,312.4,300.42,...,303.08,302.85,294.59,300.96,293.173,299.35,301.323,301.173,301.173,299.25
34042,2016-08-19 22:00:00,306.69,310.98,299.39,305.81,306.27,303.15,310.37,312.62,297.84,...,302.29,302.73,293.39,299.78,293.173,298.65,301.323,301.173,301.173,298.55
34043,2016-08-19 23:00:00,307.0,310.44,299.34,306.07,305.67,301.98,310.37,312.81,296.79,...,300.86,301.7,292.78,298.34,290.96,298.15,298.61,300.96,300.96,297.95


In [None]:
df = dataframe.loc[(dataframe.Vancouver >= 305)&(dataframe.Portland>=310)][['Vancouver','Portland']]

In [None]:
df

Unnamed: 0,Vancouver,Portland
34041,305.9,310.22
34042,306.69,310.98
34043,307.0,310.44
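
The filters above work because a comparison on a column produces a Series of True/False values (a boolean mask), and .loc keeps only the rows where the mask is True. The sketch below makes the mask visible on a tiny dataframe with made-up values (an assumption for illustration) and selects the rows and columns in a single .loc call.

```python
import pandas as pd

# Made-up values, not taken from temperature.csv.
small = pd.DataFrame(
    {
        'Vancouver': [300.0, 305.9, 306.7, 305.3],
        'Portland': [309.0, 310.2, 311.0, 307.8],
    }
)

# Each comparison gives one True/False per row; & combines them row by row.
# The parentheses are required because & binds tighter than >=.
mask = (small['Vancouver'] >= 305) & (small['Portland'] >= 310)

# Rows where the mask is True, restricted to the listed columns.
selected = small.loc[mask, ['Vancouver', 'Portland']]

print(mask.tolist())
print(selected)
```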


#### 7. Getting unique values in a column and Getting column names

[Read about .unique](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.unique.html)

In [None]:
columns = dataframe.columns

In [None]:
columns

Index(['datetime', 'Vancouver', 'Portland', 'San Francisco', 'Seattle',
       'Los Angeles', 'San Diego', 'Las Vegas', 'Phoenix', 'Albuquerque',
       'Denver', 'San Antonio', 'Dallas', 'Houston', 'Kansas City',
       'Minneapolis', 'Saint Louis', 'Chicago', 'Nashville', 'Indianapolis',
       'Atlanta', 'Detroit', 'Jacksonville', 'Charlotte', 'Miami',
       'Pittsburgh', 'Toronto', 'Philadelphia', 'New York', 'Montreal',
       'Boston', 'Beersheba', 'Tel Aviv District', 'Eilat', 'Haifa',
       'Nahariyya', 'Jerusalem'],
      dtype='object')

In [None]:
unique = dataframe.Vancouver.unique() # Note that dataframe.Vancouver gives a series

In [None]:
unique

array([         nan, 284.63      , 284.62904131, ..., 286.93775694,
       288.64453889, 282.777     ])