# Lecture 28 Notes

## Files and Folders

Reading and writing files is a common Python task. There are two main kinds of
files: **text files**, which consist of plain text and can be viewed and edited
in any text editor, and **binary files**, which are every other kind of file,
e.g. image files, video files, etc.

A **folder** (or **directory**) is a named collection of files and folders. In
most operating systems, folders form a hierarchy, with a single folder at the
top called the **root** folder.

## Determining What Folder You're In

Before we start opening files, here's a useful trick for being sure what folder
or directory (*folder* and *directory* mean the same thing) you're reading from:

In [2]:
import os

print(os.getcwd())             # cwd is "current working directory"
all_files = os.listdir(os.getcwd())
print(all_files)
print(len(all_files), "files in total")

/mnt/c/Users/tjdon/OneDrive - Simon Fraser University (1sfu)/courses/2024/120fall2024/private/notebooks
['austenPandP.txt', 'comparisons.xlsx', 'Lecture10.ipynb', 'Lecture11.ipynb', 'Lecture12.ipynb', 'Lecture13.ipynb', 'Lecture14.ipynb', 'Lecture15.ipynb', 'Lecture16.ipynb', 'Lecture17.ipynb', 'Lecture18.ipynb', 'Lecture19.ipynb', 'Lecture2.ipynb', 'Lecture20.ipynb', 'Lecture21.ipynb', 'Lecture22.ipynb', 'Lecture23.ipynb', 'Lecture24.ipynb', 'Lecture25.ipynb', 'Lecture26.ipynb', 'Lecture27.ipynb', 'Lecture28.ipynb', 'Lecture29.ipynb', 'Lecture3.ipynb', 'Lecture30.ipynb', 'Lecture4.ipynb', 'Lecture5.ipynb', 'Lecture6.ipynb', 'Lecture7.ipynb', 'Lecture8.ipynb', 'Lecture9.ipynb', 'LectureN.ipynb', 'no_long_lines.py', 'scratch.ipynb', 'searchRealTimeGraph_small.png', 'some_long_lines.py', 'sortingTimeComp_small.png', 'turtle_test.ipynb']
38 files in total


`os.getcwd()` returns the *current working directory*, i.e. the folder that
Python reads from and writes to by default when you open a file by name alone
(with no folder in front of the name).

`os.listdir(os.getcwd())` returns a list of all the files and folders in the
current working directory.
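
`os.listdir` returns names only, without saying which are files and which are
folders. As a minimal sketch, the function below separates the two using
`os.path.isdir`. The name `split_files_and_folders` is made up for this example
and is not part of the `os` module.

```python
import os

def split_files_and_folders(path):
    """Return (files, folders): the names in path, separated by kind."""
    files = []
    folders = []
    for name in os.listdir(path):
        # listdir returns bare names, so join each one with path
        # before asking what kind of thing it is
        full_name = os.path.join(path, name)
        if os.path.isdir(full_name):
            folders.append(name)
        else:
            files.append(name)
    return files, folders

files, folders = split_files_and_folders(os.getcwd())
print(len(files), "files and", len(folders), "folders")
```

The counts printed depend on whatever folder you run it in.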

## Reading a Text File Line by Line

[joke.txt](joke.txt) is a text file containing this:

```
Who's there?
A broken pencil.
A broken pencil who?
Never mind. It's pointless.
```

This code opens that file and prints it line-by-line:


In [3]:
textfile = open('joke.txt')  # assumes joke.txt is in os.getcwd()
for line in textfile:
    print(line)

Who's there?

A broken pencil.

A broken pencil who?

Never mind. It's pointless.


The printing is *double-spaced* because each line of `joke.txt` ends with a `\n`
(which causes a new line when printed), and Python's `print` adds its own `\n`
after what it prints by default. If we want single-spacing, we can tell `print`
*not* to add a final `\n`:


In [4]:
textfile = open('joke.txt')
for line in textfile:
    print(line, end='')  # don't put a \n after line

Who's there?
A broken pencil.
A broken pencil who?
Never mind. It's pointless.

Or we can check whether each line ends with a `\n` and, if it does, print the
line without that final character:


In [5]:
textfile = open('joke.txt')
for line in textfile:
    if line[-1] == '\n':
        print(line[:-1])  # don't print the last character of line
    else:
        print(line)

Who's there?
A broken pencil.
A broken pencil who?
Never mind. It's pointless.


Removing a single `\n` from the end of a string is a common enough operation
that we can write a function to do it:

In [6]:
def chop(s):
    """If s ends with a newline character, return s without it.
    Otherwise return s unchanged.
    """
    if s == '':
        return ''
    elif s[-1] == '\n':
        return s[:-1]
    else:
        return s

Now the previous program can be written more simply:

In [7]:
textfile = open('joke.txt')
for line in textfile:
    print(chop(line))

Who's there?
A broken pencil.
A broken pencil who?
Never mind. It's pointless.
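
Python strings also have built-in methods that do a similar job to `chop`. The
sketch below compares two of them. It assumes Python 3.9 or later, since
`removesuffix` was added in that version, and the name `chop2` is hypothetical.

```python
def chop2(s):
    """Remove a single newline from the end of s, if there is one."""
    return s.removesuffix('\n')   # requires Python 3.9 or later

print(repr(chop2('A broken pencil.\n')))  # 'A broken pencil.'
print(repr(chop2('no newline')))          # 'no newline'
print(repr(chop2('')))                    # ''

# rstrip('\n') is different: it removes *every* trailing newline,
# while chop2 removes at most one.
print(repr('two\n\n'.rstrip('\n')))       # 'two'
print(repr(chop2('two\n\n')))             # 'two\n'
```

For lines read from a file the two behave the same, since each line ends with at
most one `\n`.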


## Reading the Lines of a Text File into a List

It's often useful to read the lines of a file into a list:


In [8]:
textfile = open('joke.txt')
all_lines = []
for line in textfile:
    all_lines.append(chop(line))  # chop removes a \n at the end of the line,
                                  # if there is one
print(all_lines)
print()
print(f'number of lines: {len(all_lines)}')

["Who's there?", 'A broken pencil.', 'A broken pencil who?', "Never mind. It's pointless."]

number of lines: 4


The built-in file method `readlines` reads a text file into a list of strings.
Unlike the loop above, it keeps the `\n` at the end of each line:


In [15]:
f = open('joke.txt')
all_lines = f.readlines()
print(len(all_lines))

4
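
One detail is easy to miss: the strings returned by `readlines` still end with
`\n`. The sketch below shows this, along with `read().splitlines()`, which drops
the line endings. It writes its own small file (the hypothetical `demo.txt` in a
temporary folder) so that it runs without `joke.txt`.

```python
import os
import tempfile

# Make a small throwaway file so this example does not depend on joke.txt.
folder = tempfile.mkdtemp()
fname = os.path.join(folder, 'demo.txt')
f = open(fname, 'w')
f.write('first\nsecond\n')
f.close()

f = open(fname)
with_newlines = f.readlines()            # each string keeps its \n
f.close()

f = open(fname)
without_newlines = f.read().splitlines() # line endings are removed
f.close()

print(with_newlines)     # ['first\n', 'second\n']
print(without_newlines)  # ['first', 'second']
```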


Let's write our own version of `readlines` to see how it works. Unlike
`readlines`, ours uses `chop` to remove the `\n` from the end of each line:

In [9]:
def get_line_list(fname):
    """Returns a list of lines from the file named fname.
    """
    textfile = open(fname)
    all_lines = []
    for line in textfile:
        all_lines.append(chop(line))
    textfile.close()  # done reading, so release the file
    return all_lines

Now we can write code like this:

In [16]:
all_lines = get_line_list('joke.txt')
print(all_lines)
print(f'number of lines: {len(all_lines)}')

["Who's there?", 'A broken pencil.', 'A broken pencil who?', "Never mind. It's pointless."]
number of lines: 4
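
A common way to write a function like `get_line_list` is with Python's `with`
statement, which closes the file automatically when the block ends, even if an
error occurs partway through. This is a sketch of that style, not a replacement
for the version above. The names `get_line_list_with` and `demo.txt` are
invented for illustration.

```python
import os
import tempfile

def get_line_list_with(fname):
    """Returns a list of lines from the file named fname, without the
    newline at the end of each line.
    """
    all_lines = []
    with open(fname) as textfile:   # textfile is closed when the block ends
        for line in textfile:
            # a line read from a file ends with at most one newline,
            # so rstrip removes just that one
            all_lines.append(line.rstrip('\n'))
    return all_lines

# Try it on a small throwaway file.
folder = tempfile.mkdtemp()
fname = os.path.join(folder, 'demo.txt')
with open(fname, 'w') as outfile:
    outfile.write('line one\nline two\n')

print(get_line_list_with(fname))  # ['line one', 'line two']
```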


Having all the lines in a list makes some operations easy. For example, this
prints the original file:

In [11]:
all_lines = get_line_list('joke.txt')
for line in all_lines:
    print(line)

Who's there?
A broken pencil.
A broken pencil who?
Never mind. It's pointless.


This prints the file in reverse order by line:

In [12]:
all_lines = get_line_list('joke.txt')
all_lines.reverse()
for line in all_lines:
    print(line)

Never mind. It's pointless.
A broken pencil who?
A broken pencil.
Who's there?


This prints the file with line numbers:

In [13]:
all_lines = get_line_list('joke.txt')
line_num = 1
for line in all_lines:
    print(f'{line_num:3} {line}')
    line_num += 1

  1 Who's there?
  2 A broken pencil.
  3 A broken pencil who?
  4 Never mind. It's pointless.
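
Python's built-in `enumerate` function can do the counting for us. As a sketch,
the hypothetical function `numbered` below pairs each line with its number,
starting from 1, and formats it the same way as the loop above:

```python
def numbered(lines):
    """Return a new list where each line is prefixed by its line number."""
    result = []
    # enumerate yields (number, item) pairs; start=1 makes the count begin at 1
    for line_num, line in enumerate(lines, start=1):
        result.append(f'{line_num:3} {line}')
    return result

for s in numbered(["Who's there?", 'A broken pencil.']):
    print(s)
```

Returning a list instead of printing directly makes it possible to reuse the
numbered lines elsewhere, e.g. to write them to a file.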


You can also do simple (and slow!) text searching. For example, this prints all
the lines that contain the string `'pencil'`:

In [14]:
all_lines = get_line_list('joke.txt')
for line in all_lines:
    if 'pencil' in line:
        print(line)

A broken pencil.
A broken pencil who?
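
The `in` test is case-sensitive, so searching for `'pencil'` would miss a line
containing `'Pencil'`. A minimal sketch of a case-insensitive search, using a
hypothetical helper named `find_lines`, lowercases both strings before
comparing:

```python
def find_lines(lines, target):
    """Return the lines that contain target, ignoring upper/lower case."""
    matches = []
    for line in lines:
        if target.lower() in line.lower():
            matches.append(line)
    return matches

joke = ["Who's there?", 'A broken pencil.', 'A broken pencil who?',
        "Never mind. It's pointless."]
print(find_lines(joke, 'PENCIL'))  # ['A broken pencil.', 'A broken pencil who?']
```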


## Writing to a Text File

Basic writing to a text file is done using the `write` method. Here's an
example:


In [18]:
# open a file for writing
outfile = open('output.txt', 'w')    # 'w' means write

# write some lines
outfile.write('This is a line 1\n')  # \n is needed to end the line
outfile.write('This is a line 2\n')
outfile.write('\n')
outfile.write('The line above is blank\n')

outfile.close()  # close the file, to make sure the data is written

#
# Now let's read the file we just wrote to make sure it's correct.
#
print(get_line_list('output.txt'))

['This is a line 1', 'This is a line 2', '', 'The line above is blank']


**Careful!** If `output.txt` doesn't exist, then it will be created. But if
`output.txt` already exists, calling `open('output.txt', 'w')` will overwrite
it, i.e. delete any previous contents.
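
Two other modes of `open` avoid this kind of accidental overwriting: `'a'`
(append) adds to the end of an existing file, and `'x'` (exclusive creation)
refuses to open a file that already exists. The sketch below tries both on a
throwaway file. The file name `log.txt` and the temporary folder are just for
illustration.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
fname = os.path.join(folder, 'log.txt')

outfile = open(fname, 'w')      # creates the file (or erases an old one)
outfile.write('first run\n')
outfile.close()

outfile = open(fname, 'a')      # 'a' means append: old contents are kept
outfile.write('second run\n')
outfile.close()

infile = open(fname)
contents = infile.read()
infile.close()
print(contents)                 # both lines are there

try:
    open(fname, 'x')            # 'x' fails if the file already exists
    refused = False
except FileExistsError:
    refused = True
print('x mode refused to overwrite:', refused)
```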

## Reading a Web Page

Python's `urllib.request` module provides an easy way to read the raw contents
of a web page:


In [19]:
import urllib.request

def get_web_page_text(url):
    """Retrieve the raw contents of a web page as a bytes object.
    """
    response = urllib.request.urlopen(url)
    return response.read()

page = get_web_page_text('https://www.sfu.ca/')
print(page)

b'<!DOCTYPE html>\r\n<html lang="en" data-page-template="basic-home" data-custom-template="sfu-ca" data-page-type="home" data-custom-page-type="sfu-main-home-page"  >\r\n\r\n<head>\r\n\t<meta http-equiv="X-UA-Compatible" content="IE=Edge, chrome=1">\r\n\t<meta http-equiv="content-type" content="text/html; charset=UTF-8" />\r\n\t<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0">\r\n\t<link rel="stylesheet" type="text/css" href="/etc/designs/clf/clientlibs/clf4/default/css/default.css"/>\r\n\t<title> Simon Fraser University </title> \t<!-- Favicon -->\r\n\t<link rel="shortcut icon" href="https://www.sfu.ca/favicon.ico"/> \t<!-- Stylesheets -->\r\n\t<!-- CSS added by us --> <link rel="stylesheet" href="/etc/designs/clf/clientlibs/pack/head-clf4.styles.min.css" type="text/css">\n  \t<!-- theme css (v1) -->\r\n\t<link rel="stylesheet" type="text/css" href="/etc/designs/clf/clientlibs/clf4/default/css/themes/v1/default.css"/> \t<!-- Javascript -->\r\n\t

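This returns the whole web page as one big `bytes` object (notice the `b'` at
the start of the output), which may be quite unreadable! Calling
`.decode('utf-8')` on it produces an ordinary string. If you want to create your
own [web scraper](https://en.wikipedia.org/wiki/Web_scraping) or web browser,
then this is a good start.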
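
To work with the page as text, the bytes need to be decoded into a `str`. The
sketch below assumes the page is encoded as UTF-8 (the SFU page above declares
`charset=UTF-8`) and uses a short bytes literal in place of a real download, so
it runs without a network connection. The function name `page_lines` is
hypothetical.

```python
def page_lines(page_bytes, encoding='utf-8'):
    """Decode the raw bytes of a web page and return its lines as a list."""
    text = page_bytes.decode(encoding)  # bytes -> str
    return text.splitlines()            # handles both \n and \r\n endings

# A stand-in for the bytes returned by a real download.
sample = b'<!DOCTYPE html>\r\n<html lang="en">\r\n<title> Simon Fraser University </title>\n'
for line in page_lines(sample):
    print(line)
```

With the real page, this could be called as
`page_lines(get_web_page_text('https://www.sfu.ca/'))`.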