## Importing from Modules

* Many Python tools need to be imported
* Use the **import statement**
* **Basic syntax:** `import math`

In [1]:
import math

In [2]:
math

<module 'math' from '/Users/tiverson/.pyenv/versions/anaconda3-5.0.0/lib/python3.6/lib-dynload/math.cpython-36m-darwin.so'>

In [3]:
type(math)

module

In [4]:
math.pi

3.141592653589793

In [5]:
type(math.pi)

float

## Function call syntax

In [6]:
math.sqrt(3)

1.7320508075688772

In [7]:
math.sqrt

<function math.sqrt>

In [8]:
type(math.sqrt)

builtin_function_or_method

## The type and value of a function

In [9]:
type(math.sqrt)

builtin_function_or_method

In [10]:
math.sqrt # Functions are DATA!

<function math.sqrt>
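
Because a function is just another value, it can be given a second name, stored in a list, or handed to another function. The sketch below illustrates this with `math` functions; the names `root`, `funcs`, `results`, and `roots` are made up for the example.

```python
import math

# Bind the function object to a new name; no parentheses, so nothing is called yet.
root = math.sqrt

# The new name refers to the very same function object.
print(root is math.sqrt)   # True
print(root(16))            # 4.0

# Functions can be stored in containers like any other data.
funcs = [math.floor, math.ceil]
results = [f(2.5) for f in funcs]
print(results)             # [2, 3]

# Functions can also be passed as arguments to other functions.
roots = list(map(math.sqrt, [1, 4, 9]))
print(roots)               # [1.0, 2.0, 3.0]
```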

## Think of modules as folders

<img src="https://github.com/wsu-stat489/USCOTS2017_workshop/blob/master/img/mathmod.png?raw=true">

<font color="red"><h2>Exercise</h2></font>

Compute $\cos(\pi/3)$

In [11]:
math.cos(math.pi/3)

0.5000000000000001

## Inspecting a module with `dir` and `help`

* Use `dir(module)` to list all elements
* Use `help(module.item)` to learn about elements

In [12]:
dir(math)

['__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'pi',
 'pow',
 'radians',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'tau',
 'trunc']

In [13]:
help(math.fabs)

Help on built-in function fabs in module math:

fabs(...)
    fabs(x)
    
    Return the absolute value of the float x.
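
Since `dir` returns an ordinary list of strings, it can be filtered like any other list. The following is a small sketch, assuming you only care about names without a leading underscore; `public_names` is a name chosen for the example, and the exact list depends on your Python version.

```python
import math

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names))
print(public_names[:5])

# A function's docstring is the text that help() displays.
print(math.fabs.__doc__)
```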



## Double Underscore is *dunder* in Python

* Names like `__ge__` are
    * referred to as *dunder* (double underscore) methods
    * implementation details and mostly ignored
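
Dunder methods are what Python calls behind the scenes when you use operators and built-in functions. A brief sketch with two made-up strings, `a` and `b`, shows the connection; in everyday code you would write the operator form, not the dunder call.

```python
a = "apple"
b = "banana"

# The >= operator is translated by Python into a call to __ge__.
print(a >= b)         # False
print(a.__ge__(b))    # False

# len() looks up __len__ on the object.
print(len(a))         # 5
print(a.__len__())    # 5
```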

## Importing directly into the main namespace

* Typing `math.` gets annoying
* `import math as m` assigns an alias
* Use `from math import pi` to get direct access
* Beware of shadowing!

In [14]:
import math as m # using an alias
round(m.sqrt(m.pi), 3)

1.772

In [15]:
m.cos(m.pi/3)

0.5000000000000001

In [16]:
from math import pi, sqrt

In [17]:
round(sqrt(pi), 3)

1.772
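
Shadowing happens when a later assignment reuses a name that was imported directly. Here is a minimal sketch of the problem, where the assignment `pi = 3` is a deliberate mistake made for illustration.

```python
from math import pi, sqrt

print(round(sqrt(pi), 3))   # 1.772

# Assigning to the same name in the main namespace replaces the imported value.
pi = 3                      # shadows the pi imported from math
print(round(sqrt(pi), 3))   # 1.732

# The module attribute is unaffected, so the qualified name still works.
import math
print(math.pi)              # 3.141592653589793
```

Keeping the `math.` or `m.` prefix avoids this kind of accidental overwrite.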

## Object Oriented Design

* All Python data are objects
    * Attached methods and attributes


## Example - String

Python strings include

* Text data
* Methods for working with that string

In [18]:
s = "Do the difficult things while they are easy and do the great things while they are small."
s

'Do the difficult things while they are easy and do the great things while they are small.'

In [19]:
s.upper()

'DO THE DIFFICULT THINGS WHILE THEY ARE EASY AND DO THE GREAT THINGS WHILE THEY ARE SMALL.'

In [20]:
s.lower()

'do the difficult things while they are easy and do the great things while they are small.'

In [21]:
t = s.lower()
t.replace("do", "put off")

'put off the difficult things while they are easy and put off the great things while they are small.'

## Chaining `str` methods

* You can chain string methods together with
    * `obj.method1().method2()`
* Make sure each method returns a `str`

In [22]:
s.lower().replace("do", "put off")

'put off the difficult things while they are easy and put off the great things while they are small.'
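
A chain keeps working only while every step returns a `str`. The sketch below uses a shortened version of the quote and shows one chain that works and one that breaks; `chained` is a name chosen for the example.

```python
s = "Do the difficult things while they are easy"

# Each method returns a str, so the next method in the chain has a str to work on.
chained = s.lower().replace("do", "put off").title()
print(chained)   # Put Off The Difficult Things While They Are Easy

# split() returns a list, and a list has no upper() method, so this chain fails.
try:
    s.split().upper()
except AttributeError as err:
    print("Chain broke:", err)

# find() returns an int, which also ends a chain of str methods.
print(type(s.find("the")))   # <class 'int'>
```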

## Chaining objects is a lot like piping

<img src="./img/r_pipe.png" width=400 />

In [23]:
s.lower().replace("do", "put off")

'put off the difficult things while they are easy and put off the great things while they are small.'

## Use `dir` and dot notation with objects

* `dir` lists all attributes/methods
* Ignore members starting with `_`
* Use dot notation to access members

In [24]:
dir(s)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']
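
The three bullets above can be sketched in a few lines. This example uses a shortened string and assumes you want to skip every name that starts with `_`; `public` and `method` are names chosen for the example, and `getattr` is the built-in that performs the same lookup as dot notation when the name is held in a string.

```python
s = "Do the difficult things"

# Drop the names that begin with an underscore.
public = [name for name in dir(s) if not name.startswith("_")]
print(public[:5])

# Dot notation with a name known in advance.
print(s.upper())               # DO THE DIFFICULT THINGS

# getattr performs the same lookup when the name is held in a string.
method = getattr(s, "swapcase")
print(method())                # dO THE DIFFICULT THINGS
```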

## Exploring methods with help

* `help` gives some info about a method.
* Syntax: `help(object.method)`
    * Pass the method itself; do NOT call it

In [25]:
help(s.casefold)

Help on built-in function casefold:

casefold(...) method of builtins.str instance
    S.casefold() -> str
    
    Return a version of S suitable for caseless comparisons.



In [26]:
s.casefold()

'do the difficult things while they are easy and do the great things while they are small.'

In [27]:
len(s)

89

<font color="red"><h2> Exercise 1 </h2></font>

What is the difference between the following two calls?
Which is correct?

In [28]:
help(s.upper)

Help on built-in function upper:

upper(...) method of builtins.str instance
    S.upper() -> str
    
    Return a copy of S converted to uppercase.



In [29]:
help(s.upper())

No Python documentation found for 'DO THE DIFFICULT THINGS WHILE THEY ARE EASY AND DO THE GREAT THINGS WHILE THEY ARE SMALL.'.
Use help() to get the interactive help utility.
Use help(str) for help on the str class.



<font color="red"><h2> Exercise 2 </h2></font>

Change the "do"s in the original quote to "definitely do"s

In [30]:
# Your answer here

## Downloading a website with `requests`

* `requests` is used to download the source of a website
* Install with the following line

```
pip install requests
```

In [31]:
import requests

## Making a session and getting content

* Start by making a session
* Then call the `get` method, passing the URL

In [32]:
s = requests.Session() # Start a session
r = s.get('https://en.wikipedia.org/wiki/Web_scraping') # Get a static page
r.ok # Check status

True

## Accessing the raw content

* The `content` attribute holds the raw content as `bytes`

In [33]:
r.content

b'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Web scraping - Wikipedia</title>\n<script>document.documentElement.className = document.documentElement.className.replace( /(^|\\s)client-nojs(\\s|$)/, "$1client-js$2" );</script>\n<script>(window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"Web_scraping","wgTitle":"Web scraping","wgCurRevisionId":857079391,"wgRevisionId":857079391,"wgArticleId":2696619,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["CS1 Danish-language sources (da)","Articles needing additional references from June 2017","All articles needing additional references","Articles with limited geographic scope from October 2015","USA-centric","Articles to be split from July 2018","All articles to be split","Web scraping"],"wgBreakFrames":false,"wgPageCo
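
The leading `b` in the output means the content is a `bytes` object, not a `str`. The sketch below uses a short hand-written bytes literal as a stand-in for `r.content` (it is not a real download) and assumes the page is encoded as UTF-8. With a real response, `requests` also offers the decoded text directly as `r.text`.

```python
# A short bytes literal standing in for r.content (hypothetical sample, not a real download).
raw = b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n</html>\n'

print(type(raw))            # <class 'bytes'>

# decode() turns the bytes into a str; UTF-8 is assumed here.
text = raw.decode("utf-8")
print(type(text))           # <class 'str'>

# Printing the str shows real line breaks instead of \n escapes.
print(text)

# Once decoded, the usual str methods apply.
print(text.count("title"))  # 2
```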

<font color="red"> <h2>Exercise 2</h2></font>

Use `requests` to get and print out the source of your website from Assignment 1.

## Another example page

* Check out [this page](http://www.pythonscraping.com/pages/page1.html)
* View the source in your browser

In [34]:
r2 = s.get('http://www.pythonscraping.com/pages/page1.html')
r2.content

b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n<body>\n<h1>An Interesting Title</h1>\n<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n</body>\n</html>\n'

## Using Beautiful Soup to navigate a site

* Beautiful Soup makes searching/navigating a site easy
* Imported as `bs4`
* Start with the `BeautifulSoup` class

In [35]:
from bs4 import BeautifulSoup
import requests

In [36]:
soup = BeautifulSoup(r2.content, "html.parser")

In [37]:
soup.title

<title>A Useful Page</title>

In [38]:
soup.body

<body>
<h1>An Interesting Title</h1>
<div>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</div>
</body>

In [39]:
soup.h1

<h1>An Interesting Title</h1>

In [40]:
soup.div

<div>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</div>

In [41]:
soup.a

In [42]:
soup.a is None

True
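
Dot access such as `soup.a` returns only the first matching tag, or `None` when there is no match. Here is a minimal sketch of a few other common calls, assuming `bs4` is installed. The HTML string is made up for the example (shaped like the page above, with two links added), so no download is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, shaped like the example page but with two links added.
html = """
<html>
<head><title>A Useful Page</title></head>
<body>
<h1>An Interesting Title</h1>
<div>Lorem ipsum <a href="https://example.com/one">first link</a></div>
<div>Dolor sit <a href="https://example.com/two">second link</a></div>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text() strips the tags and returns the enclosed text.
print(soup.title.get_text())          # A Useful Page

# soup.a returns only the first match; find_all returns every match.
links = soup.find_all("a")
print(len(links))                     # 2

# Tag attributes are read with dictionary-style indexing.
hrefs = [a["href"] for a in links]
print(hrefs)

# A missing tag gives None, so check before using it.
img = soup.img
print("no img tag" if img is None else img["src"])   # no img tag
```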

<font color="red"><h2> Exercise 3 </h2></font>

Make a `soup` object for your webpage and pull off the following:

* a bold tag
* an img tag
* a link

<font color="red"> <h2>Take-Home Exercise</h2></font>

See if you can find and explain the source of *Beautiful Soup*.