
# Write and Save Files in Python

### This notebook is part of my studies for the IBM Data Science Professional Certificate

## What I learned:

-   Write to files using Python libraries


<h2>Table of Contents</h2>
<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li><a href="#write">Writing Files</a></li>
        <li><a href="#Append">Appending Files</a></li>
        <li><a href="#add">Additional File modes</a></li>
        <li><a href="#copy">Copy a File</a></li>
    </ul>

</div>

<hr>


<h2 id="write">Writing Files</h2>


 We can open a file object in write mode and use the method <code>write()</code> to save text to a file. To write, the mode argument must be set to <b>w</b>. Let’s write a file <b>Example2.txt</b> with the line: <b>“This is line A”</b>


In [1]:
# Write line to file
exmp2 = '/resources/data/Example2.txt'
with open(exmp2, 'w') as writefile:
    writefile.write("This is line A")

 We can read the file to see if it worked:


In [2]:
# Read file

with open(exmp2, 'r') as testwritefile:
    print(testwritefile.read())

This is line A


We can write multiple lines:


In [3]:
# Write lines to file

with open(exmp2, 'w') as writefile:
    writefile.write("This is line A\n")
    writefile.write("This is line B\n")

The method <code>.write()</code> works similarly to the method <code>.readline()</code>, except that instead of reading a line it writes one. Note that <code>.write()</code> does not add a newline on its own, which is why each string above ends with <code>\n</code>. The process is illustrated in the figure below; each colour in the grid represents a new line added to the file after a method call.


<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/WriteLine.png" width="500" />


You can check the file to see if your results are correct:


In [4]:
# Check whether the lines were written to the file

with open(exmp2, 'r') as testwritefile:
    print(testwritefile.read())

This is line A
This is line B



 We can write a list to a <b>.txt</b> file as follows:


In [5]:
# Sample list of text

Lines = ["This is line A\n", "This is line B\n", "This is line C\n"]
Lines

['This is line A\n', 'This is line B\n', 'This is line C\n']

In [6]:
# Write the strings in the list to text file

with open('Example2.txt', 'w') as writefile:
    for line in Lines:
        print(line)
        writefile.write(line)

This is line A

This is line B

This is line C



 We can verify the file is written by reading it and printing out the values:  


In [7]:
# Verify that writing to the file succeeded

with open('Example2.txt', 'r') as testwritefile:
    print(testwritefile.read())

This is line A
This is line B
This is line C
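
As an alternative to the loop above, file objects also offer <code>.writelines()</code>, which writes every string in a list in one call. The following is a minimal sketch; the scratch path `demo_path` is a hypothetical location chosen so that Example2.txt is left untouched.

```python
import os
import tempfile

# Hypothetical scratch file, so the notebook's Example2.txt is not modified
demo_path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")

lines = ["This is line A\n", "This is line B\n", "This is line C\n"]

with open(demo_path, "w") as writefile:
    # writelines() writes each string as-is; it does not add newlines itself
    writefile.writelines(lines)

with open(demo_path, "r") as readfile:
    written = readfile.read()

print(written)
```

Because <code>.writelines()</code> adds no separators, each string still needs its own trailing `\n`, exactly as in the loop version.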



However, note that setting the mode to **w** overwrites all the existing data in the file.


In [8]:
with open('Example2.txt', 'w') as writefile:
    writefile.write("Overwrite\n")
with open('Example2.txt', 'r') as testwritefile:
    print(testwritefile.read())

Overwrite



<hr>
<h2 id="Append">Appending Files</h2>


 We can write to a file without losing any of the existing data by setting the mode argument to append, **a**. We can append new lines as follows:


In [12]:
# Append new lines to the text file

with open('Example2.txt', 'a') as testwritefile:
    testwritefile.write("This is line C\n")
    testwritefile.write("This is line D\n")
    testwritefile.write("This is line E\n")

 We can verify the file has changed by running the following cell:


In [13]:
# Verify if the new line is in the text file

with open('Example2.txt', 'r') as testwritefile:
    print(testwritefile.read())

Overwrite
This is line C
This is line D
This is line E
This is line C
This is line D
This is line E
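
The repeated C, D, E lines in the output above are consistent with the append cell having been run twice. That is an assumption, since the notebook does not say so, but it shows that every run in **a** mode adds to the file. The sketch below reproduces the effect on a hypothetical scratch file (`demo_path` and `append_lines` are illustrative names, not part of the lab).

```python
import os
import tempfile

# Hypothetical scratch file standing in for Example2.txt
demo_path = os.path.join(tempfile.mkdtemp(), "append_demo.txt")

# Start from a file that holds a single line
with open(demo_path, "w") as f:
    f.write("Overwrite\n")


def append_lines(path):
    """Append three lines to the end of the file at `path`."""
    with open(path, "a") as f:
        f.write("This is line C\n")
        f.write("This is line D\n")
        f.write("This is line E\n")


# Running the same append twice adds the lines twice
append_lines(demo_path)
append_lines(demo_path)

with open(demo_path, "r") as f:
    appended = f.readlines()

print("".join(appended))
```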



<hr>
<h2 id="add">Additional modes</h2> 


It's fairly inefficient to open the file in **a** or **w** and then reopen it in **r** to read any lines. Luckily we can access the file in the following modes:

-   **r+** : Reading and writing. Does not truncate the file when opening it; the file must already exist.
-   **w+** : Writing and reading. Truncates the file.
-   **a+** : Appending and reading. Creates a new file if none exists.

We don't have to dwell on the specifics of each mode for this lab.
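
To make the three modes listed above concrete, here is a small sketch that opens the same scratch file in each mode and records what happens. The file names and variables (`modes_demo.txt`, `missing.txt`, `r_plus_content`, and so on) are hypothetical and exist only for this illustration.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, "modes_demo.txt")

with open(path, "w") as f:
    f.write("original\n")

# r+ : existing content is kept and reading starts at the beginning
with open(path, "r+") as f:
    r_plus_content = f.read()

# a+ : existing content is kept and writes go to the end of the file
with open(path, "a+") as f:
    f.write("appended\n")
    f.seek(0)  # move back to the start before reading
    a_plus_content = f.read()

# w+ : the file is emptied as soon as it is opened
with open(path, "w+") as f:
    w_plus_content = f.read()

# r+ needs an existing file, while a+ creates one if it is missing
missing = os.path.join(demo_dir, "missing.txt")
try:
    with open(missing, "r+") as f:
        r_plus_missing_ok = True
except FileNotFoundError:
    r_plus_missing_ok = False

with open(missing, "a+") as f:
    pass
a_plus_created = os.path.exists(missing)

print(repr(r_plus_content), repr(a_plus_content), repr(w_plus_content))
print(r_plus_missing_ok, a_plus_created)
```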


Let's try out the **a+** mode:


In [14]:
with open('Example2.txt', 'a+') as testwritefile:
    testwritefile.write("This is line E\n")
    print(testwritefile.read())




There were no errors, but <code>read()</code> did not output anything either. This is because of our location in the file: after the write, the position is at the end of the file, so there is nothing left to read.


Most of the file methods we've looked at work at a certain location in the file. <code>.write()</code> writes at a certain location in the file, <code>.read()</code> reads from a certain location, and so on. We can think of this as moving the cursor around in a text editor to make changes at a specific location.


Opening the file in **w** is akin to opening the .txt file, deleting all of its contents, and writing new text from the beginning.
Opening the file in **a**, by contrast, is similar to opening the .txt file, moving your cursor to the very end, and then adding the new pieces of text. <br>
It is often very useful to know where the 'cursor' is in a file and to be able to control it. The following methods allow us to do precisely this:

-   <code>.tell()</code> - returns the current position in bytes
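-   <code>.seek(offset, from)</code> - changes the position by 'offset' bytes with respect to 'from'. 'from' can take the value 0, 1, or 2, corresponding to the beginning of the file, the current position, and the end of the file. For files opened in text mode, as in this notebook, only seeks relative to the beginning are generally allowed, apart from <code>seek(0, 2)</code> to jump to the end.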
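
The two methods listed above can be tried out on a small file. The sketch below opens a hypothetical scratch file in binary mode (`'rb'`), which is an assumption made here so that all three 'from' values can be used freely; the variable names are illustrative only.

```python
import os
import tempfile

# Hypothetical scratch file holding ten known bytes
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    start = f.tell()          # position right after opening

    f.seek(3, 0)              # 3 bytes from the beginning
    from_start = f.read(2)    # reads two bytes and advances the position

    f.seek(2, 1)              # 2 bytes forward from the current position
    from_current = f.read(1)

    f.seek(-2, 2)             # 2 bytes before the end of the file
    from_end = f.read()       # reads up to the end

    end = f.tell()            # position after reading everything

print(start, from_start, from_current, from_end, end)
```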


Now let's revisit **a+**:


In [1]:
with open('Example2.txt', 'a+') as testwritefile:
    print("Initial Location: {}".format(testwritefile.tell()))

    data = testwritefile.read()
    if not data:  # empty strings evaluate to False in Python
        print('Read nothing')
    else:
        print(data)  # print what was already read; a second read() would return ''

    testwritefile.seek(0, 0)  # move 0 bytes from the beginning

    print("\nNew Location : {}".format(testwritefile.tell()))
    data = testwritefile.read()
    if not data:
        print('Read nothing')
    else:
        print(data)

    print("Location after read: {}".format(testwritefile.tell()))

Initial Location: 30
Read nothing

New Location : 0
Line 1
Line 2
Line 3
finished

Location after read: 30


Finally, a note on the difference between **w+** and **r+**. Both of these modes allow access to the read and write methods. However, opening a file in **w+** overwrites it and deletes all existing data. <br>
To work with the existing data in a file, use **r+** or **a+**. When using **r+**, it can be useful to call the <code>.truncate()</code> method after writing your data. This cuts the file off at the current position, so it keeps what you wrote and deletes everything that follows. <br>
In the following code block, run the code as it is first, and then run it again with the <code>.truncate()</code> line uncommented.


In [2]:
with open('Example2.txt', 'r+') as testwritefile:
    data = testwritefile.readlines()
    testwritefile.seek(0,0) #write at beginning of file
   
    testwritefile.write("Line 1" + "\n")
    testwritefile.write("Line 2" + "\n")
    testwritefile.write("Line 3" + "\n")
    testwritefile.write("finished\n")
    #Uncomment the line below
    #testwritefile.truncate()
    testwritefile.seek(0,0)
    print(testwritefile.read())
    

Line 1
Line 2
Line 3
finished
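
The effect of <code>.truncate()</code> is easiest to see when the new text is shorter than the old contents. The sketch below is an illustration under that assumption; `overwrite_start` and the scratch file are hypothetical names, not part of the lab.

```python
import os
import tempfile

# Hypothetical scratch file, so Example2.txt is not modified
path = os.path.join(tempfile.mkdtemp(), "truncate_demo.txt")


def overwrite_start(path, truncate):
    """Reset the file, overwrite its start in r+ mode, and return the result."""
    with open(path, "w") as f:
        f.write("AAAAAAAAAA\nBBBBBBBBBB\n")

    with open(path, "r+") as f:
        f.write("short\n")  # r+ starts at position 0, so this overwrites the start
        if truncate:
            f.truncate()    # cut the file off at the current position
        f.seek(0)
        return f.read()


without_truncate = overwrite_start(path, truncate=False)
with_truncate = overwrite_start(path, truncate=True)

print(repr(without_truncate))
print(repr(with_truncate))
```

Without the truncate call, the tail of the old contents is still in the file after the new text; with it, only the new text remains.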



<hr>


<h2 id="copy">Copy a File</h2> 


Let's copy the file <b>Example2.txt</b> to the file <b>Example3.txt</b>:


In [3]:
# Copy file to another

with open('Example2.txt','r') as readfile:
    with open('Example3.txt','w') as writefile:
        for line in readfile:
            writefile.write(line)

We can read the file to see if everything works:


In [4]:
# Verify if the copy is successfully executed

with open('Example3.txt','r') as testwritefile:
    print(testwritefile.read())

Line 1
Line 2
Line 3
finished
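
The line-by-line copy above shows how reading and writing fit together. For comparison, the standard library's `shutil.copyfile` copies a file's contents in a single call. The sketch below uses hypothetical scratch files (`source.txt`, `copy.txt`) rather than the Example files from this notebook.

```python
import os
import shutil
import tempfile

demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, "source.txt")  # hypothetical source file
dst = os.path.join(demo_dir, "copy.txt")    # hypothetical destination file

with open(src, "w") as f:
    f.write("Line 1\nLine 2\n")

# Copy the contents of src to dst, replacing dst if it already exists
shutil.copyfile(src, dst)

with open(dst, "r") as f:
    copied = f.read()

print(copied)
```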



 After reading files, we can also write data into files and save them in different file formats such as **.txt, .csv, and .xls (for Excel files)**. You will come across these in further examples.
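
As a small preview of one of those formats, the sketch below writes a few rows to a .csv file with Python's built-in `csv` module and reads them back. The file name and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical scratch file and sample rows, for illustration only
path = os.path.join(tempfile.mkdtemp(), "members_demo.csv")
rows = [
    ["Membership No", "Date Joined", "Active"],
    [12345, "2019-6-1", "yes"],
    [67890, "2016-5-9", "no"],
]

# newline="" lets the csv module control line endings itself
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

with open(path, "r", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back)
```

Note that values read back from a .csv file are strings, so the membership numbers come back as text rather than integers.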


Now go to the directory to ensure the <b>.txt</b> file exists and contains the summary data that we wrote.


<hr>


<h2> Exercise </h2>


Your local university's Raptors fan club maintains a register of its active members in a .txt document. Every month they update the file by removing the members who are not active. You have been tasked with automating this with your Python skills. <br>
Given the file currentMem, remove each member with a 'no' in their Active column. Keep track of each of the removed members and append them to the exMem file. Make sure the format of the original files is preserved. (_Hint: Do this by reading/writing whole lines and ensuring the header remains._)
<br>
Run the code block below prior to starting the exercise. The skeleton code has been provided for you. Edit only the cleanFiles function.


In [5]:
#Run this prior to starting the exercise
from random import randint as rnd

memReg = 'members.txt'
exReg = 'inactive.txt'
fee =('yes','no')

def genFiles(current,old):
    with open(current,'w+') as writefile: 
        writefile.write('Membership No  Date Joined  Active  \n')
        data = "{:^13}  {:<11}  {:<6}\n"

        for rowno in range(20):
            date = str(rnd(2015,2020))+ '-' + str(rnd(1,12))+'-'+str(rnd(1,25))
            writefile.write(data.format(rnd(10000,99999),date,fee[rnd(0,1)]))


    with open(old,'w+') as writefile: 
        writefile.write('Membership No  Date Joined  Active  \n')
        data = "{:^13}  {:<11}  {:<6}\n"
        for rowno in range(3):
            date = str(rnd(2015,2020))+ '-' + str(rnd(1,12))+'-'+str(rnd(1,25))
            writefile.write(data.format(rnd(10000,99999),date,fee[1]))


genFiles(memReg,exReg)



Start your solution below:


In [6]:

def cleanFiles(currentMem,exMem):
    '''
    currentMem: File containing list of current members
    exMem: File containing list of old members
    
    Removes all rows from currentMem containing 'no' and appends them to exMem
    '''
    
    with open(currentMem,'r+') as readfile:
        with open(exMem,'a+') as appendfile:
            # Read every row of the current register
            readfile.seek(0,0)
            members = readfile.readlines()
            # Separate the header from the member rows
            header = members[0]
            members.pop(0)
            # Rewrite the current register from the beginning, header first
            readfile.seek(0)
            readfile.write(header)
            for member in members:
                if 'no' in member:
                    # Inactive member: move the row to the ex-members file
                    appendfile.write(member)
                else:
                    # Active member: keep the row in the current register
                    readfile.write(member)
            # Drop whatever is left over from the longer original contents
            readfile.truncate()
            

# Code to help see the files
# Leave as is
memReg = 'members.txt'
exReg = 'inactive.txt'
cleanFiles(memReg,exReg)


headers = "Membership No  Date Joined  Active  \n"
with open(memReg,'r') as readFile:
    print("Active Members: \n\n")
    print(readFile.read())
    
with open(exReg,'r') as readFile:
    print("Inactive Members: \n\n")
    print(readFile.read())
                
    

Active Members: 


Membership No  Date Joined  Active  
    22743      2016-12-6    yes   
    97829      2020-12-17   yes   
    71547      2016-1-24    yes   
    80875      2019-8-7     yes   
    87495      2019-6-1     yes   
    64321      2016-10-16   yes   
    40796      2016-5-9     yes   
    43906      2018-10-12   yes   
    48942      2017-12-7    yes   
    31444      2016-9-11    yes   
    31424      2019-6-24    yes   
    50990      2017-7-2     yes   

Inactive Members: 


Membership No  Date Joined  Active  
    55263      2018-9-12    no    
    78491      2019-10-22   no    
    89209      2017-12-1    no    
    69817      2016-9-1     no    
    26085      2016-4-13    no    
    19088      2019-5-16    no    
    30584      2020-11-2    no    
    68030      2016-5-5     no    
    67139      2015-7-13    no    
    81473      2020-8-11    no    
    21247      2019-5-9     no    
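
The solution above decides whether a member is inactive with the substring test `'no' in member`. That works for the generated data, where the other two columns contain only digits and hyphens. If the rows could contain other text, a stricter check on the last column may be safer. The helper `is_inactive` below is a hypothetical sketch of that idea, not part of the provided skeleton.

```python
def is_inactive(row):
    """Return True if the row has three columns and the last one is 'no'."""
    fields = row.split()  # split on any run of whitespace
    return len(fields) == 3 and fields[-1] == "no"


# Sample rows in the same layout as the files generated by genFiles
header = "Membership No  Date Joined  Active  \n"
active_row = "    22743      2016-12-6    yes   \n"
inactive_row = "    55263      2018-9-12    no    \n"

for row in (header, active_row, inactive_row):
    print(repr(row), is_inactive(row))
```

The header splits into five words, so it is never mistaken for a member row.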



Run the following to verify your code:


In [7]:
def testMsg(passed):
    if passed:
        return 'Test Passed'
    else:
        return 'Test Failed'

testWrite = "testWrite.txt"
testAppend = "testAppend.txt" 
passed = True

genFiles(testWrite,testAppend)

with open(testWrite,'r') as file:
    ogWrite = file.readlines()

with open(testAppend,'r') as file:
    ogAppend = file.readlines()

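try:
    cleanFiles(testWrite,testAppend)
except Exception as err:
    # A crash in cleanFiles should count as a failed test
    print('Error: {}'.format(err))
    passed = False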

with open(testWrite,'r') as file:
    clWrite = file.readlines()

with open(testAppend,'r') as file:
    clAppend = file.readlines()
        
# checking if total no of rows is same, including headers

if (len(ogWrite) + len(ogAppend) != len(clWrite) + len(clAppend)):
    print("The number of rows does not add up. Make sure your final files have the same header and format.")
    passed = False
    
for line in clWrite:
    if  'no' in line:
        passed = False
        print("Inactive members in file")
        break
    else:
        if line not in ogWrite:
            print("Data in file does not match original file")
            passed = False
print ("{}".format(testMsg(passed)))
    



Test Passed


<h2>About the Author:</h2> 

<a href="https://www.linkedin.com/in/wanderson-torres-31049522/">Wanderson Torres</a>
