# More Strings

Arguably one of the things Python does best is strings. It can process very large strings, or large collections of strings, and apply operations to all of them at once.

Today we will analyze a list of *puzzle words* that was compiled as part of the Moby lexicon project. This will also be our first example of Python reading data from a file.

For this week we will use the URL of the txt file on GitHub. You can do the same with files in Google Drive, OneDrive, Dropbox, or any resource that lets you host files online and share them via a URL. Even a website would work.



## Google Colab

You need a module to read the file from a URL, or a different module to read it from your Google Drive account. I like reading it from a URL because anyone with the .ipynb file can then run the code and get the file. This method works in Jupyter as well. Choose the option you want, run it, and comment out the other one.

In [7]:
# Run this cell if you are using Google Colab (it also works in Jupyter); comment it out if you read the file another way

from urllib.request import urlopen
words_file = urlopen('https://github.com/virgilpierce/CS_120/raw/main/CROSSWD.TXT')
# GitHub is public facing, so I just use the link I get by right-clicking the "Download" button
# for the file on GitHub and choosing copy url.

# urllib is part of Python's standard library, so there is nothing to install with pip,
# in Jupyter or in Colab.

# Note that urlopen does behave a little differently from open(). It returns a stream: the data
# arrives over the network as you read it, so with a slow internet connection this method
# is slower than reading a local file with open().


In [None]:
type(words_file)
# The type is an HTTP response object, which behaves like a file: it is an Input/Output stream

In [None]:
[x for x in dir(words_file) if '_' != x[0]]
# Let's check what methods we have

In [None]:
help(words_file.readline)
# we can get information about a method

In [None]:
words_file.readline()
# each time we execute .readline() it reads the next line in the file as a string. Try it.

### Byte-Strings

The string you just got probably has a *b* in front of it. This is how Python designates
a type called a byte-string (*bytes*). A byte-string holds raw bytes rather than characters. Data travels over the internet as bytes, so
sites like GitHub deliver their content as byte-strings, and it is up to us to decode them into regular strings.

We know this file contains only plain text, so we want to convert each byte-string into a regular string, which also gets rid of the *b*. We can do that by adding .decode('utf-8') after the .readline().

'utf-8' specifies the encoding that the byte-string is using (in this case the file is encoded in UTF-8, the *Unicode Transformation Format, 8-bit*).

We don't really need the '\n' new line character and we can use the .strip() method to remove it:

In [None]:
words_file.readline().decode('utf-8').strip()
# Note that we can just string together methods - and you can start to see the reason they are written as .method()
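
The decoding steps can be tried without touching the network. The sketch below uses a hand-written byte-string as a stand-in for what .readline() returns (the value b'aa\n' is an assumption for illustration, not read from the file), and also shows that a character outside the basic alphabet takes more than one byte in UTF-8.

```python
# Illustrative stand-in for the value returned by words_file.readline()
raw_line = b'aa\n'

text_line = raw_line.decode('utf-8')   # bytes -> str, so the b prefix disappears
word = text_line.strip()               # remove the trailing '\n'

print(type(raw_line), type(text_line))
print(repr(raw_line), repr(text_line), repr(word))

# Encoding goes the other way (str -> bytes).
# '\u00e9' is the letter e with an acute accent; it needs two bytes in UTF-8.
accented = '\u00e9'.encode('utf-8')
print(accented, len(accented))
```

The two-step version makes it easier to see which method does what; chaining them as in the cell above gives the same result.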

Even better, the file object is an iterable, meaning we can use it in a for loop. Note that if you execute the command that follows, you will probably have to use Interrupt to stop it unless you want to wait a long time.

In [None]:
for line in words_file:
    word = line.decode('utf-8').strip()
    print(word)
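
If you only want to peek at the first few words, one option is to limit the loop with itertools.islice so it stops on its own. This is a minimal sketch: fake_file is a hypothetical BytesIO stand-in for the object returned by urlopen, and its three words are made up for illustration.

```python
from io import BytesIO
from itertools import islice

# Hypothetical stand-in for words_file; iterating it yields byte-string lines
fake_file = BytesIO(b'aa\naah\naahed\n')

# islice(fake_file, 2) yields only the first 2 lines, then the loop ends
first_words = []
for line in islice(fake_file, 2):
    first_words.append(line.decode('utf-8').strip())

print(first_words)
```

With the real stream you would pass words_file in place of fake_file.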

## Program 1

Write a program that reads CROSSWD.TXT and prints only the words with more than 20 characters.

Note that each of the Programs below needs to start by opening the file (or URL). It used to be very important to close the file when you are done. It is now less important **UNLESS** you are writing data to the file: in that case you need to close it to be sure that the data sent to the file is actually stored on your system's disk. We will play with some file manipulation later in the semester.
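
One common way to make sure a file gets closed is a with block, which closes the file as soon as the block ends. The sketch below is only an illustration: the file name example_words.txt and the two words written to it are assumptions, and the file lives in a temporary folder that is deleted afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name, used only for this sketch
    path = os.path.join(folder, 'example_words.txt')

    # Leaving the with block closes the file, which flushes the written data
    with open(path, 'w', encoding='utf-8') as out_file:
        out_file.write('aa\naah\n')

    # Reading it back; open() in text mode gives regular strings, so no .decode() is needed
    with open(path, encoding='utf-8') as in_file:
        saved_words = [line.strip() for line in in_file]

print(out_file.closed, saved_words)
```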

In [None]:
words_file = urlopen('https://github.com/virgilpierce/CS_120/raw/main/CROSSWD.TXT')
for line in words_file:
    word = line.decode('utf-8').strip()
    if len(word) > 20:
        print(word)
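
Each Program below opens the URL again, which means downloading the file again. An alternative, sketched here under assumptions, is to decode the stream into a list once and reuse the list. The helper name load_words is hypothetical, and the BytesIO data is a stand-in so the sketch runs without a network connection.

```python
from io import BytesIO

def load_words(stream):
    """Decode every line of a byte stream into a list of stripped words."""
    return [line.decode('utf-8').strip() for line in stream]

# Stand-in data for illustration; in the notebook you would pass urlopen(...) instead
sample_stream = BytesIO(b'aa\ncounterrevolutionaries\nzebra\n')
words = load_words(sample_stream)

# The list can now be searched as many times as we like without re-reading
long_words = [w for w in words if len(w) > 20]
print(long_words)
```

The trade-off is that the whole word list is held in memory, which is fine for a file of this size.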

## Program 2

Write a function called *has_no_e* that takes a word and returns True if it has no e and False if it has an e.  

Then modify your Program 1 to print all the words that have no e.

In [None]:
def has_no_e(word):
    if 'e' in word:
        return False
    else:
        return True
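
Since the expression 'e' in word is already True or False, the same test can be written in one line. This is just an alternative sketch; the name has_no_e_short is made up here so it does not replace the function above.

```python
def has_no_e_short(word):
    # 'e' not in word evaluates directly to True or False
    return 'e' not in word

print(has_no_e_short('sky'), has_no_e_short('tree'))
```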

In [None]:
words_file = urlopen('https://github.com/virgilpierce/CS_120/raw/main/CROSSWD.TXT')
count_e = 0
count_no_e = 0
for line in words_file:
    word = line.decode('utf-8').strip()
    if has_no_e(word):
        count_no_e += 1
    else:
        count_e += 1
        
count_no_e, count_e

## Program 3

Write a function named *uses_only* that takes a word and a string of letters and returns True only if every letter in the word comes from that string of letters.

Then modify Program 1 so that you can construct a sentence that uses only the letters 'asdfjkl', if possible.

In [1]:
def uses_only(word, letters):

  for c in word:
    if not c in letters:
      return False

  return True

In [3]:
uses_only('checked', 'chked')

True

In [4]:
uses_only('checked', 'chkeda')

True

In [5]:
uses_only('checked', 'chke')

False
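
Another way to think about *uses_only* is with sets: the letters of the word must form a subset of the allowed letters. The sketch below is an alternative, not a replacement for the loop version; the name uses_only_sets is hypothetical.

```python
def uses_only_sets(word, letters):
    # set(word) <= set(letters) is True when every letter of word is in letters
    return set(word) <= set(letters)

print(uses_only_sets('checked', 'chked'))
print(uses_only_sets('checked', 'chkeda'))
print(uses_only_sets('checked', 'chke'))
```

These are the same three calls as above and should give the same answers.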

In [10]:
words_file = urlopen('https://github.com/virgilpierce/CS_120/raw/main/CROSSWD.TXT')
count_uses_only = 0  #initialization
count_not_uses_only = 0
for line in words_file:
    word = line.decode('utf-8').strip()
    if uses_only(word, 'asdfjklu'):
        print(word)
        count_uses_only += 1
    else:
        count_not_uses_only += 1
        
count_not_uses_only, count_uses_only

aa
aal
aals
aas
ad
add
adds
ads
aff
ala
alas
alaska
alaskas
alfa
alfalfa
alfalfas
alfas
all
alls
alula
as
ask
asks
ass
audad
audads
auk
auks
auld
da
dad
dada
dadas
dads
daff
daffs
dak
daks
duad
duads
dual
duals
dud
duds
duff
duffs
dull
dulls
dusk
dusks
fa
fad
fads
fall
fallal
fallals
falls
fas
fauld
faulds
flak
flask
flasks
flu
fluff
fluffs
flus
fud
fuds
full
fulls
fuss
jauk
jauks
judas
juju
jujus
jus
ka
kaas
kaka
kakas
kas
kudu
kudus
kulak
kulaks
la
lad
lads
lall
lalls
las
lass
laud
lauds
luau
luaus
luff
luffa
luffas
luffs
lull
lulls
lulu
lulus
lusus
sad
sal
salad
salads
sall
sals
sass
sau
saul
sauls
skald
skalds
skua
skuas
skulk
skulks
skull
skulls
sluff
sluffs
sudd
sudds
suds
sulfa
sulfas
sulk
sulks
us
usual
usuals


(113672, 137)

In [11]:
def check_uses_only(letters):

  words_file = urlopen('https://github.com/virgilpierce/CS_120/raw/main/CROSSWD.TXT')
  count_uses_only = 0  #initialization
  count_not_uses_only = 0
  for line in words_file:
    word = line.decode('utf-8').strip()
    if uses_only(word, letters):
        count_uses_only += 1
    else:
        count_not_uses_only += 1
        
  return count_not_uses_only, count_uses_only

In [12]:
check_uses_only('aeiouf')

(113787, 22)

In [13]:
check_uses_only('aeioug')

(113778, 31)

## Program 4 

Write a function named *uses_all* that takes a word and a string of letters and returns True if the word uses all of the letters from the string at least once. The word may also use any other letters.

How many words are there that use all of the vowels 'aeiou'?  How about 'aeiouy'?

In [14]:
def uses_all(word, letters):

  return uses_only(letters, word)

In [15]:
def uses_all(word, letters):

  for c in letters:
    if not c in word:
      return False
  
  return True
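
The loop above can also be written with the built-in all(), which returns True only when every test passes. A minimal sketch, with the hypothetical name uses_all_compact so it does not overwrite the versions above:

```python
def uses_all_compact(word, letters):
    # all(...) stops at the first required letter that is missing from word
    return all(c in word for c in letters)

print(uses_all_compact('abstemious', 'aeiou'))
print(uses_all_compact('abstemiously', 'aeiouy'))
print(uses_all_compact('automobile', 'aeiouy'))
```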

In [17]:
words_file = urlopen('https://github.com/virgilpierce/CS_120/raw/main/CROSSWD.TXT')
count_uses_all = 0  #initialization
count_not_uses_all = 0
for line in words_file:
    word = line.decode('utf-8').strip()
    if uses_all(word, 'aeiou'):
        print(word)
        count_uses_all += 1
    else:
        count_not_uses_all += 1
        
count_not_uses_all, count_uses_all

aboideau
aboideaus
aboideaux
aboiteau
aboiteaus
aboiteaux
abstemious
abstemiously
accentuation
accentuations
accountabilities
accountancies
accoutering
adulteration
adulterations
adventitious
adventitiously
adventitiousness
adventitiousnesses
aerobium
aeronautic
aeronautical
aeronautically
aeronautics
agouties
ambidextrous
ambidextrously
antibourgeois
anticonsumer
antievolution
antievolutionary
antihomosexual
antireligious
antirevolutionary
antisubversion
antituberculosis
antiunemployment
armouries
arsenious
assiduousness
assiduousnesses
atrociousness
atrociousnesses
attenuation
attenuations
auctioned
auctioneer
auctioneers
auditioned
auditories
augmentation
augmentations
aureoling
authentication
authentications
authoritative
authoritatively
authorities
authorize
authorized
authorizes
autobiographer
autobiographers
autobiographies
autocracies
autogamies
autogenies
automobile
automobiles
automotive
autonomies
autopsied
autopsies
autotomies
autotypies
avoidupoises
beautification
beautifi

(113211, 598)

## Program 5

Write a function called *is_alphabetical* that returns True if the letters in a word appear in alphabetical order.

In [18]:
5 > 6

False

In [24]:
'a' > 'z', 'a' < 'z'

(False, True)
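
Comparisons like these work because Python compares characters by their numeric code, which ord() reveals. The short sketch below shows the codes and one consequence worth knowing: every uppercase letter sorts before every lowercase letter. The word list here is all lowercase, so that does not affect us.

```python
# ord() gives the numeric code that Python compares
print(ord('a'), ord('z'), ord('Z'))

# Uppercase codes are smaller than lowercase codes
upper_first = 'Z' < 'a'
print(upper_first)

# Whole strings are compared letter by letter from the left
print('apple' < 'apply')
```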

In [39]:
def is_alphabetical(word):

  # compare each letter with the one that follows it
  for k in range(len(word)-1):
    if word[k] > word[k+1]:
      return False

  return True

In [40]:
example = 'fly'
is_alphabetical(example)

True
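
A different way to check the same property is to sort the letters and see whether anything moved. This is an alternative sketch with a hypothetical name, is_alphabetical_sorted; it does more work than the loop because it sorts the whole word, but it is short.

```python
def is_alphabetical_sorted(word):
    # sorted(word) returns a list of the letters in alphabetical order
    return list(word) == sorted(word)

print(is_alphabetical_sorted('fly'))
print(is_alphabetical_sorted('almost'))
print(is_alphabetical_sorted('zebra'))
```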

In [32]:
is_alphabetical('flies')

False

In [29]:
words_file = urlopen('https://github.com/virgilpierce/CS_120/raw/main/CROSSWD.TXT')
for line in words_file:
    word = line.decode('utf-8').strip()
    if is_alphabetical(word):
        print(word)


aa
aah
aahs
aal
aals
aas
abbe
abbes
abbess
abbey
abbot
abet
abhor
abhors
ably
abo
abort
abos
abuzz
aby
accent
accept
access
accost
ace
acers
aces
achoo
achy
act
ad
add
adder
adders
adds
adeem
adeems
adept
adios
adit
ado
adopt
ados
ads
adz
ae
aegis
aery
aff
affix
afflux
afoot
aft
agin
agio
agios
agist
aglow
agly
ago
ah
ahoy
ai
ail
ails
aim
aims
ain
ains
air
airs
airt
airy
ais
ait
all
allot
allow
alloy
alls
ally
almost
alms
alow
alp
alps
alt
am
ammo
ammos
amort
amp
amps
amu
an
annoy
ant
any
apt
ar
ars
art
arty
as
ass
at
aw
ay
be
bee
beef
beefily
beefs
beefy
been
beep
beeps
beer
beers
beery
bees
beet
befit
beg
begin
begins
begirt
begot
begs
beknot
bel
bell
bellow
bells
belly
below
bels
belt
ben
benny
bens
bent
berry
best
bet
bevy
bey
bhoot
bi
bijou
bijoux
bill
billow
billowy
bills
billy
bin
bins
bint
bio
biopsy
bios
birr
birrs
bis
bit
bitt
bitty
bloop
bloops
blot
blotty
blow
blowy
bo
boo
boor
boors
boos
boost
boot
booty
bop
bops
bort
borty
bortz
bos
boss
bossy
bot
bott
bow
box
boxy
boy
bu