## Essential Python Programming
- string functions
- data structures
- list comprehension
- counters
- file and web functions
- regular expressions
- globbing
- data pickling

### Strings
Humans use strings to interact with computers.

- lower(), upper(), capitalize()
- islower(), isupper(), isdigit(), isalpha()

In [1]:
s = "UZAY"
s.isupper(), s.islower()

(True, False)
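
The other functions in the list work the same way. A minimal sketch, with sample strings made up for illustration:

```python
# Sample strings chosen for illustration only
word = "uzay"
code = "2024"

upper_word = word.upper()          # every character upper-cased
title_word = word.capitalize()     # first character upper-cased, the rest lower-cased
lower_again = upper_word.lower()   # back to lower case

print(upper_word, title_word, lower_again)
print(code.isdigit(), code.isalpha(), word.isalpha())
```

All of these return new values; the original string is never modified.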

Getting rid of unwanted whitespace
- lstrip() (left strip) removes leading whitespace, rstrip() (right strip) removes trailing whitespace,
- and strip() removes both.

None of them removes inner spaces.

In [2]:
s = "    mis gibi    "
s.strip()

'mis gibi'
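
The one-sided variants can be sketched on the same sample string; repr() is used only to make the remaining spaces visible:

```python
s = "    mis gibi    "

left = s.lstrip()     # leading whitespace removed, trailing kept
right = s.rstrip()    # trailing whitespace removed, leading kept

print(repr(left))
print(repr(right))
```

In both results the single space between the two words is untouched.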

Tokenization

In [3]:
"Hello, world!".split()

['Hello,', 'world!']

In [4]:
university = "www.bilgi.edu.tr".split(".")
university

['www', 'bilgi', 'edu', 'tr']

In [5]:
".".join(university)

'www.bilgi.edu.tr'

### Data Structures
- lists, tuples, sets, and dictionaries

#### Lists
> Python implements lists as arrays. They have linear search time, which makes
them impractical for storing large amounts of searchable data.

#### Tuples 
Tuples are immutable lists.
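
A small sketch of what immutability means in practice; the tuple `point` and the dictionary `distances` are hypothetical sample data:

```python
point = (3, 4)        # hypothetical sample tuple
x, y = point          # unpacking works just as it does for lists

try:
    point[0] = 10     # item assignment is not allowed on a tuple
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# A tuple of hashable items is itself hashable, so it can be a dictionary key
distances = {point: 5.0}
print(distances[(3, 4)])
```

A list in the same position would raise a TypeError when used as a key, because lists are mutable and therefore unhashable.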

#### Sets
They are excellent for membership look-ups and eliminating duplicates.
> Set items don’t have indexes. Sets can store at most one copy of an item. They are implemented as hash tables, so a membership test takes constant time, O(1), on average.

You can transform list data to a set for faster membership look-ups.



In [6]:
import timeit
bigList = [str(i) for i in range(10000000)]
bigSet = set(bigList)

In [7]:
start_time = timeit.default_timer()
print("abc" in bigList) # Takes 0.2 sec
print("Time: ", timeit.default_timer() - start_time)

False
Time:  0.15491518799535697


In [8]:
start_time = timeit.default_timer()
print("abc" in bigSet)
print("Time: ", timeit.default_timer() - start_time)

False
Time:  0.00014974999794503674
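
Besides fast look-ups, sets are handy for eliminating duplicates. A minimal sketch with made-up sample data:

```python
words = ["delta", "alpha", "delta", "bravo", "alpha"]  # sample data

unique = set(words)              # duplicates collapse; order is not preserved
unique_sorted = sorted(unique)   # sort when a predictable order is needed

print(unique_sorted)
print("bravo" in unique, "zulu" in unique)
```

Since a set has no defined order, sort the result (or keep the original list) whenever order matters.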


#### Dictionaries
Dictionaries map keys to values. They are implemented as hash tables with O(1) average search time, which makes them
excellent for key-value look-ups.

In [9]:
seq = ["alpha", "bravo", "charlie", "delta"]
dict(enumerate(seq))

{0: 'alpha', 1: 'bravo', 2: 'charlie', 3: 'delta'}

In [10]:
kseq = range(4)
vseq = ["alpha", "bravo", "charlie", "delta"]
dict(zip(kseq, vseq))

{0: 'alpha', 1: 'bravo', 2: 'charlie', 3: 'delta'}

#### Generators 
In Python 3, enumerate(seq) and zip(kseq, vseq) return lazy iterators, and the good old range() returns
a lazy sequence; none of them builds a real list.

> Unlike a real list, a generator or lazy iterator produces the next element only as needed. Generators facilitate working with large lists and even permit “infinite” lists.
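
Lazy evaluation can be sketched with a generator function. The name `squares` is hypothetical; `count` and `islice` come from the standard itertools module:

```python
from itertools import count, islice

def squares():
    """Yield 0, 1, 4, 9, ... without ever building a full list."""
    for n in count():     # count() is an endless counter: 0, 1, 2, ...
        yield n * n

# islice() asks for the first five values only; the rest are never computed
first_five = list(islice(squares(), 5))
print(first_five)

# enumerate() is lazy as well: next() pulls one pair at a time
pairs = enumerate(["alpha", "bravo"])
first_pair = next(pairs)
print(first_pair)
```

Calling list(squares()) directly would never finish, which is why an "infinite" generator is always consumed through something like islice() or a loop with a break.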

#### List Comprehension

In [11]:
vseq = ["alpha", "bravo", "charlie", "delta"]
[v for v in vseq if v.endswith('a')]

['alpha', 'delta']

In [12]:
[x**2 for x in range(6)]

[0, 1, 4, 9, 16, 25]

#### Counters
A counter is a dictionary-style collection for tallying items in another collection.

In [13]:
from collections import Counter
phrase = "a man a plan a canal panama"
cntr = Counter(phrase.split())
cntr

Counter({'a': 3, 'canal': 1, 'man': 1, 'panama': 1, 'plan': 1})

In [14]:
cntr.most_common()

[('a', 3), ('man', 1), ('plan', 1), ('canal', 1), ('panama', 1)]

In [15]:
cntr.most_common(1)

[('a', 3)]

In [16]:
cntr.most_common(2)

[('a', 3), ('man', 1)]
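
Two more Counter behaviours worth knowing, sketched on the same phrase: a missing item counts as zero, and update() adds new observations to the existing tallies.

```python
from collections import Counter

cntr = Counter("a man a plan a canal panama".split())

missing = cntr["zebra"]          # unseen items count as 0, no KeyError is raised
cntr.update(["plan", "plan"])    # add two more observations of "plan"
top = cntr.most_common(2)

print(missing)
print(top)
```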

#### Files
> A file is a non-volatile container for long-term data storage.

Modes
- reading (default mode, denoted as "r"), 
- [over]writing ("w"), 
- or appending ("a").

Reading
- f.read() # Read all data as a string (text mode) or bytes (binary mode)
- f.read(n) # Read up to the next n characters (text mode) or bytes (binary mode)
- f.readline() # Read the next line as a string
- f.readlines() # Read all lines as a list of strings

Writing
- f.write(line) # Write a string (text mode) or bytes (binary mode)
- f.writelines(lines) # Write a list of strings

In [17]:
# with open(name, mode="r") as f:
#    «read the file»
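
The modes and functions above can be sketched end to end. The file name `notes.txt` and its contents are made up, and a temporary directory is used so the example does not touch real data:

```python
import os
import tempfile

# A throwaway path inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:                  # "w" creates or overwrites the file
    f.writelines(["first\n", "second\n"])   # writelines() adds no newlines itself

with open(path, "a") as f:                  # "a" appends to the existing content
    f.write("third\n")

with open(path) as f:                       # the default mode is "r"
    head = f.readline()                     # the next line, newline included
    rest = f.readlines()                    # the remaining lines as a list

print(repr(head), rest)
```

The with statement closes the file when the block ends, even if an exception is raised inside it.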

#### Reaching the Web
The module urllib.request contains functions for downloading data from the web. 

In [18]:
import sys
import urllib.request

url = "http://www.networksciencelab.com"
try:
    with urllib.request.urlopen(url) as doc:
        html = doc.read()
        # If reading was successful, the connection is closed automatically
except OSError:  # urllib.error.URLError is a subclass of OSError
    print("Could not open %s" % url, file=sys.stderr)
    # Do not pretend that the document has been read!
    # Execute an error handler here

#### Regular expressions

Regular expressions are a powerful mechanism for searching, splitting, and
replacing strings based on pattern matching.

__Basic operators__
 - . Any character except newline
 - a The character a itself
 - ab The string ab itself
 - x|y x or y
 - \y Escapes a special character y, such as ^+{}$()[]|\-?.*

__Character classes__
 - [a-d] One character of: a,b,c,d
 - [^a-d] One character except: a,b,c,d
 - \d One digit
 - \D One non-digit
 - \s One whitespace
 - \S One non-whitespace
 - \w One word character (alphanumeric or underscore)
 - \W One non-word character
 
__Quantifiers__
 - x* Zero or more xs
 - x+ One or more xs
 - x? Zero or one x
 - x{2} Exactly two xs
 - x{2,5} Between two and five xs
 
 
__Assertions__
 - ^ Start of string
 - \b Word boundary
 - \B Non-word boundary
 - $ End of string
 
__Groups__
 - (x) Capturing group
 - (?:x) Non-capturing group
 
To define a raw string, put the character r immediately in front of the opening quotation mark.

In [19]:
import re
r"\n"

'\\n'

In [20]:
r"\w[-\w\.]*@\w[-\w]*(\.\w[-\w]*)+"
#An email address.
r"<TAG\b[^>]*>(.*?)</TAG>"
#Specific HTML tag with a matching closing tag.
r"[-+]?((\d*\.?\d+)|(\d\.))([eE][-+]?\d+)?"
#A floating point number.

'[-+]?((\\d*\\.?\\d+)|(\\d\\.))([eE][-+]?\\d+)?'

In [21]:
re.split(r"\W", "Hello, world") # split on every non-word character (\W)

['Hello', '', 'world']

In [22]:
mo = re.match(r"\d+", "067 Starts with a number 068 - 1") # match one or more digits (\d+) at the start of the string
mo.group()

'067'

In [23]:
re.findall(r"\d+", "067 Starts with a number 068 - 1")

['067', '068', '1']

In [24]:
re.findall(r"[a-z]+","067 Starts with a number 068 - 1")

['tarts', 'with', 'a', 'number']
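
Replacement, the third use of regular expressions mentioned at the start of this section, is done with re.sub(). A minimal sketch on the same sample string; the placeholder "#" and the square brackets are arbitrary choices:

```python
import re

text = "067 Starts with a number 068 - 1"

# Replace every run of digits with a placeholder
masked = re.sub(r"\d+", "#", text)

# A capturing group can be reused in the replacement as \1
bracketed = re.sub(r"(\d+)", r"[\1]", text)

print(masked)
print(bracketed)
```

The replacement string is written as a raw string too, so that \1 reaches the re module as a group reference.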

Grouping changes what a quantifier applies to:
 - r"cab+" matches a substring that starts with "ca", followed by at least one "b",
 - but r"c(?:ab)+" matches a substring that starts with "c", followed by one or more "ab"s.
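
The difference between the two patterns can be checked directly. The sample string is made up for illustration:

```python
import re

sample = "cabbb cabab"

plain = re.findall(r"cab+", sample)         # only the "b" is repeated
grouped = re.findall(r"c(?:ab)+", sample)   # the whole "ab" is repeated

print(plain)
print(grouped)
```

The group is non-capturing on purpose: with a capturing group, as in r"c(ab)+", re.findall() would return the captured "ab" instead of the whole match.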

#### Pickle
 - The module pickle implements serialization: saving Python objects to a file and restoring them later.
 
__Dump an object into a file__
 - with open("myData.pickle", "__wb__") as oFile:
  - pickle.dump(object, oFile)

__Load the same object back__
 - with open("myData.pickle", "__rb__") as iFile:
  - object = pickle.load(iFile)
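
A runnable round trip, assuming a small sample dictionary and a temporary directory for the file (both chosen for illustration):

```python
import os
import pickle
import tempfile

data = {"name": "alpha", "values": [1, 2, 3]}   # sample object

path = os.path.join(tempfile.mkdtemp(), "myData.pickle")

with open(path, "wb") as oFile:     # pickles are bytes, so open in binary mode
    pickle.dump(data, oFile)

with open(path, "rb") as iFile:
    restored = pickle.load(iFile)

# The restored object is an equal copy, not the same object
print(restored == data, restored is data)
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.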

# Your Turn
__Word Frequency Counter__
> Write a program that downloads a web page requested by the user and reports up to ten of the most frequently used words.

Use the regular expression r"\w+" to extract the words.

In [25]:
import urllib.request, re
from collections import Counter

# Talk to the user and the Internet
url = input("Enter the URL: ")
try:
    with urllib.request.urlopen(url) as page:
        # Read and partially normalize the page
        doc = page.read().decode().lower()
except (OSError, ValueError) as err:
    # OSError: network failure; ValueError: malformed URL or undecodable bytes
    print("Cannot open %s: %s" % (url, err))
else:
    # Split the text into words
    words = re.findall(r"\w+", doc)

    # Build a counter and report the answer
    print(Counter(words).most_common(10))




Enter the URL: http://mathinsight.org/thread/list
[('a', 83), ('div', 48), ('class', 47), ('li', 40), ('ym', 39), ('of', 33), ('the', 31), ('script', 28), ('href', 23), ('math', 20)]
