# More Strings

Strings are arguably one of the things Python does best. It can process large amounts of text quickly and has many built-in operations for working with strings.

Today we will analyze a list of *puzzle words* that was compiled as part of the Moby lexicon project. This will also be our first example of Python working with files from our computer.

To start, put a copy of these notes and the data file "CROSSWD.TXT" in the same directory. Alternatively, follow the instructions below and use the second cell to get the file from its URL on GitHub.

## Jupyter

On Jupyter this is simple: before you open a notebook, upload the .TXT file to the directory you are working in.


In [1]:
words_file = open('CROSSWD.TXT')
# open creates a file object in Python for us to manipulate


## Google Colab

You need a module either to read the file from a URL or to read it from your Google Drive account. I like reading it from a URL because then anyone with the .ipynb file can run the code and get the file. This method works in Jupyter as well. Choose the option you want, run it, and comment out the other one.

In [None]:
## Commenting this out - you would want to uncomment it if you are using Google Colab

#from urllib.request import urlopen
#words_file = urlopen('https://github.com/virgilpierce/CS_120/raw/main/CROSSWD.TXT')
# GitHub is public facing, so I just use the link I get by right clicking on the "Download" button for the file in GitHub and choosing copy url.

# If you get an error in Jupyter, use the cell above to access the file instead.
# urllib is part of the Python standard library, so there is nothing to install with pip.

# Note that urlopen behaves a little differently from open(). It does not hand you the whole file at once; the data is
# streamed from the server as you read it, so on a slow internet connection this method will be slower than the open() above.


In [2]:
type(words_file)
# The type indicates that it is an Input/Output stream

_io.TextIOWrapper

In [4]:
[x for x in dir(words_file) if '_' != x[0]]
# Let's check what methods we have

['buffer',
 'close',
 'closed',
 'detach',
 'encoding',
 'errors',
 'fileno',
 'flush',
 'isatty',
 'line_buffering',
 'mode',
 'name',
 'newlines',
 'read',
 'readable',
 'readline',
 'readlines',
 'reconfigure',
 'seek',
 'seekable',
 'tell',
 'truncate',
 'writable',
 'write',
 'write_through',
 'writelines']

In [5]:
help(words_file.readline)
# we can get information about a method

Help on built-in function readline:

readline(size=-1, /) method of _io.TextIOWrapper instance
    Read until newline or EOF.
    
    Returns an empty string if EOF is hit immediately.



In [6]:
words_file.readline()
# each time we execute .readline() it reads the next line in the file as a string. Try it.

'aa\n'

### Byte-Strings

If you are using the URL method above for Google Colab, the string you just got probably has a *b* in front of it. This is how Python designates
a type called a byte-string. A byte-string holds the raw bytes of the data rather than decoded characters. Data fetched over the internet arrives as raw bytes,
so urlopen hands us byte-strings and leaves it to us to say which text encoding to use.

We know this file contains only ordinary text, so we can convert each byte-string into a regular string, which also removes the *b*. We can do that by adding a .decode('utf-8') after the .readline().

'utf-8' specifies the encoding that the byte-string is using (in this case GitHub uses UTF-8, the *Unicode Transformation Format, 8-bit*).
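
As a small illustration, the sketch below decodes a byte-string by hand. The value `raw_line` is a made-up stand-in for what `.readline()` might return under the URL method; it is not read from the actual file.

```python
# raw_line is a hypothetical stand-in for the result of words_file.readline()
# when words_file came from urlopen(); note the b prefix.
raw_line = b'aa\n'

# .decode('utf-8') turns the bytes object into a regular str
text_line = raw_line.decode('utf-8')

print(type(raw_line))   # <class 'bytes'>
print(type(text_line))  # <class 'str'>
print(repr(text_line))  # 'aa\n'
```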

We don't really need the '\n' new line character and we can use the .strip() method to remove it:

In [7]:
words_file.readline().strip()
# Note that we can just string together methods - and you can start to see the reason they are written as .method()

'aah'

Even better, the file object is an iterable, meaning we can use it in a for loop. Note that if you execute the command that follows, you will probably have to use Interrupt to stop it unless you want to wait a long time.

In [None]:
for line in words_file:
    word = line.strip()
    print(word)

## Program 1

Write a program that reads CROSSWD.TXT and prints only the words with more than 20 characters.

Note that each of the Programs below needs to start by opening the file (or URL) again, because a file object that has already been read to the end has nothing left to give. It used to be very important to close the file when you are done. It is now less important **UNLESS** you are writing data to the file: in that case, closing the file is what ensures that the data you sent is actually stored on your system's disk. We will play with some file manipulation later in the semester.

In [9]:
words_file = open('CROSSWD.TXT')
for line in words_file:
    word = line.strip()
    if len(word) > 20:
        print(word)

counterdemonstrations
hyperaggressivenesses
microminiaturizations
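
One way to avoid worrying about closing a file is a `with` block, which closes the file when the block ends. The sketch below is only an illustration under assumptions: so that it runs anywhere, it writes a tiny made-up word list to a temporary file (`sample_path` and `sample_words` are hypothetical names) rather than reading CROSSWD.TXT.

```python
import os
import tempfile

# Hypothetical stand-in for CROSSWD.TXT: one word per line
sample_words = ['aa', 'aah', 'counterdemonstrations', 'zebra']
sample_path = os.path.join(tempfile.mkdtemp(), 'sample_words.txt')

# Writing: leaving the with block closes the file, which flushes the data
with open(sample_path, 'w') as out_file:
    for w in sample_words:
        out_file.write(w + '\n')

# Reading: the same loop as Program 1, but the file is closed for us
long_words = []
with open(sample_path) as words_file:
    for line in words_file:
        word = line.strip()
        if len(word) > 20:
            long_words.append(word)

print(long_words)         # ['counterdemonstrations']
print(words_file.closed)  # True
```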


## Program 2

Write a function called *has_no_e* that takes a word and returns True if it has no e and False if it has an e.  

Then modify your Program 1 to print all the words that have no e.

In [11]:
def has_no_e(word):
    if 'e' in word:
        return False
    else:
        return True

In [12]:
words_file = open('CROSSWD.TXT')
count_e = 0
count_no_e = 0
for line in words_file:
    word = line.strip()
    if has_no_e(word):
        count_no_e += 1
    else:
        count_e += 1
        
count_no_e, count_e

(37641, 76168)

## Program 3

Write a function named *uses_only* that takes a word and a string of letters and returns True only if the word uses only letters from that string.

Then modify Program 1 so that you can construct a sentence that uses only the letters 'asdfjkl', if possible.
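
Here is one possible sketch of *uses_only*, not the only way to write it. The list `sample_words` is made up so the example runs without the data file; with the real file you would loop over the lines of CROSSWD.TXT as in Program 1.

```python
def uses_only(word, letters):
    # True when every character of word appears somewhere in letters
    for char in word:
        if char not in letters:
            return False
    return True

# Made-up sample words (an assumption, not taken from the file)
sample_words = ['alfalfa', 'flask', 'salad', 'jade', 'all']
home_row = [w for w in sample_words if uses_only(w, 'asdfjkl')]
print(home_row)  # ['alfalfa', 'flask', 'salad', 'all']
```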

## Program 4 

Write a function named *uses_all* that takes a word and a string of letters and returns True if the word uses all of the letters from the string at least once. The word may also use other letters.

How many words are there that use all of the vowels 'aeiou'?  How about 'aeiouy'?
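
A possible sketch of *uses_all* follows. It runs on a short made-up list (`sample_words` is an assumption), so it does not answer the counting questions above; for that you would loop over the file and keep a counter as in Program 2.

```python
def uses_all(word, letters):
    # True when every required letter shows up in word at least once
    for char in letters:
        if char not in word:
            return False
    return True

# Made-up sample words (an assumption, not taken from the file)
sample_words = ['education', 'sequoia', 'facetiously', 'banana', 'rhythm']

all_vowels = [w for w in sample_words if uses_all(w, 'aeiou')]
all_vowels_and_y = [w for w in sample_words if uses_all(w, 'aeiouy')]

print(all_vowels)        # ['education', 'sequoia', 'facetiously']
print(all_vowels_and_y)  # ['facetiously']
```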

## Program 5

Write a function called *is_alphabetical* that returns True if the letters in a word appear in alphabetical order.
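
One way this might be sketched is to compare each letter with the one after it. The version below assumes that repeated letters (as in 'billowy') still count as alphabetical; the exercise does not say either way, so treat that as my assumption.

```python
def is_alphabetical(word):
    # Compare each letter with the letter that follows it
    for i in range(len(word) - 1):
        if word[i] > word[i + 1]:
            return False
    return True

print(is_alphabetical('almost'))   # True
print(is_alphabetical('billowy'))  # True (double l is allowed here)
print(is_alphabetical('zebra'))    # False
```

Strings compare letter by letter with `<` and `>`, which is what makes `word[i] > word[i + 1]` work for lowercase words like the ones in this file.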