Please go through the following in the text:
- [Chapter 12 - Networked Programs](https://eng.libretexts.org/Bookshelves/Computer_Science/Programming_Languages/Book%3A_Python_for_Everybody_(Severance)/12%3A_Networked_Programs)
- [Chapter 13 - Web Services](https://eng.libretexts.org/Bookshelves/Computer_Science/Programming_Languages/Book%3A_Python_for_Everybody_(Severance)/13%3A_Python_and_Web_Services/)
    - up to 13.4
    
And this external resource is very detailed!
- [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

Additional Resources
- https://www.w3schools.com/python/python_sets.asp
- https://realpython.com/introduction-to-python-generators/
- https://www.geeksforgeeks.org/zip-in-python/

### Networked Programs and Web Services
#### HTTP in Python
Python can retrieve images and webpages via HTTP, for example with the `urllib` package. This is typically used in *Web Scraping*.

In [20]:
import urllib.request, urllib.parse, urllib.error
# Open a picture and save to your drive
img = urllib.request.urlopen('http://data.pr4e.org/cover3.jpg')
fhand = open('cover3.jpg', 'wb') #start a new file cover3.jpg, write binary
size = 0
while True:
    info = img.read(100000)
    if len(info) < 1: break
    size = size + len(info)
    fhand.write(info)

print(size, 'characters copied.')
fhand.close()


230210 characters copied.
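The loop above reads the response in 100,000-byte chunks, so the whole image never has to sit in memory at once. Because `read()` returns bytes, the number it counts is really a byte count. Below is a minimal sketch of the same pattern packaged as a function. `copy_in_chunks` is a hypothetical helper name, and `io.BytesIO` stands in for both the object returned by `urlopen()` and the output file so the example runs offline.

```python
import io

def copy_in_chunks(source, destination, chunk_size=100_000):
    """Copy a binary file-like object in fixed-size chunks and return the byte count."""
    size = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:            # an empty bytes object means the stream is finished
            break
        destination.write(chunk)
        size += len(chunk)
    return size

# io.BytesIO stands in for urlopen(...) and for open('cover3.jpg', 'wb')
fake_response = io.BytesIO(b'\x89PNG' + b'\x00' * 250_000)
out_file = io.BytesIO()
copied = copy_in_chunks(fake_response, out_file)
print(copied, 'bytes copied.')
```

With a real download, both objects can be opened in one `with` statement, e.g. `with urllib.request.urlopen(url) as img, open('cover3.jpg', 'wb') as fhand:`, so they are closed automatically. The standard library's `shutil.copyfileobj(img, fhand)` performs the same chunked copy.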


#### Web Scraping using Beautiful Soup
Beautiful Soup is a Python library that parses an HTML document into a tree of Python objects that you can search and navigate.

Example: A hyperlink \<a href="https://www.google.com">
- Tag objects are hierarchical objects that represent HTML tags such as the 'a' tag \<a>
- These tags have different attributes such as 'href' in \<a href> , which in HTML will create a hyperlink
- The attributes themselves can have different values, such as the actual link address https://www.google.com

- Retrieve the first match with an \<a> tag: `soup.a`
    - Note: `soup.find('a')` is equivalent to `soup.a`
- Retrieve ALL matches with an \<a> tag: `soup('a')`
    - Note: `soup.find_all('a')` is equivalent to `soup('a')`
- Get text from an \<a> tag: `soup.a.text`
    - Note: `soup.a.get_text()` is equivalent
- Get text from ALL matches with an \<a> tag: `[item.text for item in soup('a')]`
    - Important! `soup('a')` returns a `ResultSet`, which is a list of Tag objects, so single-tag attributes and methods such as `.text` or `.get_text()` don't work on it directly. When you work with ALL matches of a tag or attribute, either (1) build a new Python object such as a list or dictionary (e.g. with a list comprehension), or (2) use a loop to iterate through each item.
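The lookups listed above can be tried without a network connection on a tiny HTML string. This is a minimal sketch: the HTML snippet is made up for illustration, and it assumes the `bs4` package is installed and uses Python's built-in `"html.parser"`.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document so the example runs offline
html = """
<html><body>
  <h1 class="title">Recipes</h1>
  <a href="https://example.com/soup" class="link">Soup</a>
  <a href="https://example.com/salad">Salad</a>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a                  # first match, same as soup.find('a')
all_links = soup('a')                # all matches, same as soup.find_all('a')

print(first_link.get('href'))        # value of the 'href' attribute
print(first_link.text)               # text inside the first <a>

# soup('a') is a list of tags, so loop over it to get text/attributes of ALL matches
link_texts = [a.text for a in all_links]
hrefs = [a.get('href') for a in all_links]
print(link_texts, hrefs)

# get_text() can join strings with a separator and strip extra whitespace
print(soup.get_text(separator=" | ", strip=True))

# A set comprehension gives the unique tag names in the document
unique_tags = {tag.name for tag in soup.find_all()}
print(unique_tags)
```

Passing `separator` and `strip=True` to `get_text()` is one way to avoid the long runs of blank lines that plain `.text` produces on a real page.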


In [15]:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

site= "https://littlesunnykitchen.com/baked-potatoes-on-the-grill/"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page, "html.parser") #Creates a BeautifulSoup object called soup (naming the parser avoids a warning)
soup # a little messy
print(soup.prettify()) # a little easier to view

<!DOCTYPE html>
<html lang="en-US">
 <head>
  <meta charset="utf-8"/>
  <script>
   if(navigator.userAgent.match(/MSIE|Internet Explorer/i)||navigator.userAgent.match(/Trident\/7\..*?rv:11/i)){var href=document.location.href;if(!href.match(/[?&]nowprocket/)){if(href.indexOf("?")==-1){if(href.indexOf("#")==-1){document.location.href=href+"?nowprocket=1"}else{document.location.href=href.replace("#","?nowprocket=1#")}}else{if(href.indexOf("#")==-1){document.location.href=href+"&nowprocket=1"}else{document.location.href=href.replace("#","&nowprocket=1#")}}}}
  </script>
  <script>
   class RocketLazyLoadScripts{constructor(){this.v="1.2.4",this.triggerEvents=["keydown","mousedown","mousemove","touchmove","touchstart","touchend","wheel"],this.userEventHandler=this._triggerListener.bind(this),this.touchStartHandler=this._onTouchStart.bind(this),this.touchMoveHandler=this._onTouchMove.bind(this),this.touchEndHandler=this._onTouchEnd.bind(this),this.clickHandler=this._onClick.bind(this),this.i

In [16]:
print("First <a> tag result:\n", soup.a) # view result of the FIRST <a> HTML tag
print("First <h1> tag result:\n", soup.h1) # view result of the FIRST <h1> HTML tag

First <a> tag result:
 <a class="screen-reader-shortcut" href="#genesis-nav-primary"> Skip to primary navigation</a>
First <h1> tag result:
 <h1 class="entry-title">Baked Potatoes on The Grill</h1>


In [17]:
#What are the attributes of the first result of <a>?
print("The attributes for the first <a> are:\n",soup.a.attrs)
#Retrieve value for 'class' attribute of <a>
print("The value for attribute 'class' in first <a>:\n", soup.a.get('class'))

The attributes for the first <a> are:
 {'href': '#genesis-nav-primary', 'class': ['screen-reader-shortcut']}
The value for attribute 'class' in first <a>:
 ['screen-reader-shortcut']


In [18]:
# Get text of entire document
print(soup.text)

Baked Potato On The Grill Recipe - Little Sunny Kitchen

 Skip to primary navigation Skip to main content Skip to primary sidebar

Free Dinner Ebook! Get your copy!

Join the Little Sunny Kitchen email family!
Get our FREE dinners ebook!
Please enable JavaScript in your browser to complete this form.First NameEmail *GDPR Agreement *I consent to receive emails from Little Sunny Kitchen *Sign me up! 

About
Videos
Learn to Cook

Visit my other site: Fun Cookie Recipes

Main MenuLittle Sunny KitchenDelicious Recipes for Real LifeAll Recipes
Course

Appetizers
Breakfast
Sides
Dinner
Casseroles
Salads
Sandwiches
Pasta
Soups & Chilis
Desserts
Sauces & Dressings
Beverages

Holiday

Ramadan
Easter
Summer
Fall
Halloween
Thanksgiving
Christmas
Valentine’s
Game Day

Method

Instant Pot
Air Fryer
Slow Cooker
Oven
Casseroles
Stovetop
Gr

In [26]:
# Get text of first <a> tag
print(soup.a.get_text())

 Skip to primary navigation


In [19]:
print("ALL <a> tag results:\n", soup('a')) # view ALL results of the <a> HTML tag

ALL <a> tag results:
 [<a class="screen-reader-shortcut" href="#genesis-nav-primary"> Skip to primary navigation</a>, <a class="screen-reader-shortcut" href="#genesis-content"> Skip to main content</a>, <a class="screen-reader-shortcut" href="#genesis-sidebar-primary"> Skip to primary sidebar</a>, <a class="more-link" data-lity="" href="#popup-1">Free Dinner Ebook! <em>Get your copy!</em><span class="icon-font icon-arrow"></span></a>, <a href="https://littlesunnykitchen.com/about/"><span>About</span></a>, <a href="https://www.youtube.com/c/LittleSunnyKitchen" rel="noopener" target="_blank"><span>Videos</span></a>, <a href="https://littlesunnykitchen.com/category/basics/"><span>Learn to Cook</span></a>, <a aria-label="Pinterest" class="link-item sm-col-4" href="https://www.pinterest.com/lilsunnykitchen/" rel="noopener" role="img" target="_blank">
<span class="link-icon icon-font icon-font-social icon-pinterest"></span>
</a>, <a aria-label="Facebook" class="link-item sm-col-4" href="http

In [20]:
# What are ALL the attributes of <a> ?
[tag.attrs for tag in soup('a')]

[{'href': '#genesis-nav-primary', 'class': ['screen-reader-shortcut']},
 {'href': '#genesis-content', 'class': ['screen-reader-shortcut']},
 {'href': '#genesis-sidebar-primary', 'class': ['screen-reader-shortcut']},
 {'href': '#popup-1', 'data-lity': '', 'class': ['more-link']},
 {'href': 'https://littlesunnykitchen.com/about/'},
 {'target': '_blank',
  'rel': ['noopener'],
  'href': 'https://www.youtube.com/c/LittleSunnyKitchen'},
 {'href': 'https://littlesunnykitchen.com/category/basics/'},
 {'href': 'https://www.pinterest.com/lilsunnykitchen/',
  'class': ['link-item', 'sm-col-4'],
  'target': '_blank',
  'rel': ['noopener'],
  'role': 'img',
  'aria-label': 'Pinterest'},
 {'href': 'https://www.facebook.com/littlesunnykitchen/',
  'class': ['link-item', 'sm-col-4'],
  'target': '_blank',
  'rel': ['noopener'],
  'role': 'img',
  'aria-label': 'Facebook'},
 {'href': 'https://www.instagram.com/littlesunnykitchen/',
  'class': ['link-item', 'sm-col-4'],
  'target': '_blank',
  'rel': [

In [29]:
# What are ALL the values for attribute 'href' of all <a> tags?
[tag.get('href') for tag in soup('a')]

['#genesis-nav-primary',
 '#genesis-content',
 '#genesis-sidebar-primary',
 '#popup-1',
 'https://littlesunnykitchen.com/about/',
 'https://www.amazon.com/shop/littlesunnykitchen',
 'https://www.youtube.com/c/LittleSunnyKitchen',
 'https://littlesunnykitchen.com/web-stories/',
 'https://www.pinterest.com/lilsunnykitchen/',
 'https://www.facebook.com/littlesunnykitchen/',
 'https://www.instagram.com/littlesunnykitchen/',
 'https://www.youtube.com/c/LittleSunnyKitchen/',
 'https://littlesunnykitchen.com/',
 'https://littlesunnykitchen.com/recipes/',
 '#',
 'https://littlesunnykitchen.com/category/meal/appetisers/',
 'https://littlesunnykitchen.com/category/meal/sides/',
 'https://littlesunnykitchen.com/category/meal/main-dishes/',
 'https://littlesunnykitchen.com/category/meal/salads/',
 'https://littlesunnykitchen.com/category/meal/pasta/',
 'https://littlesunnykitchen.com/category/meal/soups/',
 'https://littlesunnykitchen.com/category/desserts/',
 'https://littlesunnykitchen.com/categ

In [21]:
# Get text of ALL <a> tags
#print(soup('a').get_text()) #whomp whomp... this raises an AttributeError because soup('a') is a ResultSet (a list of tags)! Loop over it or make it a list, dictionary, etc.
text_of_a_tags = [item.text for item in soup('a')]
print("Text of ALL <a> tags:", text_of_a_tags)

Text of ALL <a> tags: [' Skip to primary navigation', ' Skip to main content', ' Skip to primary sidebar', 'Free Dinner Ebook! Get your copy!', 'About', 'Videos', 'Learn to Cook', '\n\n', '\n\n', '\n\n', '\n\n', '\n\n', 'Visit my other site: Fun Cookie Recipes', 'Little Sunny Kitchen', 'All Recipes', 'Course', 'Appetizers', 'Breakfast', 'Sides', 'Dinner', 'Casseroles', 'Salads', 'Sandwiches', 'Pasta', 'Soups & Chilis', 'Desserts', 'Sauces & Dressings', 'Beverages', 'Holiday', 'Ramadan', 'Easter', 'Summer', 'Fall', 'Halloween', 'Thanksgiving', 'Christmas', 'Valentine’s', 'Game Day', 'Method', 'Instant Pot', 'Air Fryer', 'Slow Cooker', 'Oven', 'Casseroles', 'Stovetop', 'Grill', 'Bread Machine', 'Display Search Bar', 'Easy Meals', 'Slow Cooker', 'Air Fryer', 'Breakfast', 'Snacks', 'Chicken', 'Beef', 'Easter', 'Copycat Recipes', 'Home', 'Method', 'Grilling', 'Rate Recipe', '3 Comments', ' Jump to Recipe', ' \n\n\nShare', 'Diana', 'disclosure policy', 'grilled portobello mushrooms', 'grille

In [22]:
set([tag.name for tag in soup.find_all()]) #get a set of all unique tags in HTML doc
# What's a set? More info in the Sets section below 😀 

{'a',
 'article',
 'aside',
 'body',
 'br',
 'button',
 'defs',
 'div',
 'em',
 'fieldset',
 'figure',
 'footer',
 'form',
 'g',
 'h1',
 'h2',
 'h3',
 'head',
 'header',
 'html',
 'img',
 'input',
 'label',
 'legend',
 'li',
 'lineargradient',
 'link',
 'main',
 'meta',
 'nav',
 'noscript',
 'ol',
 'p',
 'path',
 'polygon',
 'script',
 'section',
 'small',
 'span',
 'stop',
 'strong',
 'style',
 'svg',
 'textarea',
 'time',
 'title',
 'ul',
 'use'}

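#### Embedded JSON objects
Many websites embed structured data (such as the recipe details below) as JSON-LD inside `<script type="application/ld+json">` tags. `json.loads()` turns that text into Python dictionaries and lists.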

In [23]:
import json
json_scripts = soup.find_all('script', type='application/ld+json')
json_data = [json.loads(script.text, strict=False) for script in json_scripts] 
json_data

[{'@context': 'https://schema.org',
  '@graph': [{'@type': 'Article',
    '@id': 'https://littlesunnykitchen.com/baked-potatoes-on-the-grill/#article',
    'isPartOf': {'@id': 'https://littlesunnykitchen.com/baked-potatoes-on-the-grill/'},
    'author': {'name': 'Diana',
     '@id': 'https://littlesunnykitchen.com/#/schema/person/b3edd42c9baea6a2e9d68df549045c74'},
    'headline': 'Baked Potatoes on The Grill',
    'datePublished': '2020-05-15T03:31:00+00:00',
    'dateModified': '2021-05-20T10:45:43+00:00',
    'wordCount': 851,
    'commentCount': 3,
    'publisher': {'@id': 'https://littlesunnykitchen.com/#organization'},
    'image': {'@id': 'https://littlesunnykitchen.com/baked-potatoes-on-the-grill/#primaryimage'},
    'thumbnailUrl': 'https://littlesunnykitchen.com/wp-content/uploads/Baked-Potatoes-on-The-Grill-11.jpg',
    'articleSection': ['All Recipes',
     'Dairy Free',
     'Grilling',
     'Sides',
     'Summer Recipes',
     'Vegan'],
    'inLanguage': 'en-US',
    'pot

In [24]:
#Extract the list of ingredients!!
json_data[0]['@graph'][7]['recipeIngredient']

['6  potatoes', '2 tablespoons neutral oil', '½ teaspoon sea salt']
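The index `[7]` works for this page as it was scraped, but the position of the Recipe node inside `@graph` can differ between sites and can change when a page is updated. A more robust sketch searches for the node whose `@type` is `"Recipe"`. The JSON text below is a made-up, trimmed stand-in for `json_data`, and `find_by_type` is a hypothetical helper. In JSON-LD, `@type` may be either a string or a list of strings, so the helper handles both.

```python
import json

# Made-up, trimmed stand-in for the text of a <script type="application/ld+json"> tag
ld_json_text = """
{"@context": "https://schema.org",
 "@graph": [
   {"@type": "Article", "headline": "Baked Potatoes on The Grill"},
   {"@type": ["WebPage", "ItemPage"], "name": "Baked Potato On The Grill Recipe"},
   {"@type": "Recipe", "name": "Baked Potatoes on The Grill",
    "recipeIngredient": ["6 potatoes", "2 tablespoons neutral oil"]}
 ]}
"""
json_data = [json.loads(ld_json_text)]

def find_by_type(graph, wanted_type):
    """Return the first node in a JSON-LD @graph whose @type matches, else None."""
    for node in graph:
        node_type = node.get("@type", [])
        types = node_type if isinstance(node_type, list) else [node_type]
        if wanted_type in types:
            return node
    return None

recipe = find_by_type(json_data[0]["@graph"], "Recipe")
if recipe is not None:
    print(recipe["recipeIngredient"])
```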

## Extra stuff in Python
Let's learn some extra Python features that can make your programming more efficient.

### zip()
The zip() function in Python is used to combine multiple iterables (lists, tuples, etc.) element-wise. It returns an iterator that produces tuples where the i-th tuple contains the i-th element from each of the argument iterables, and it stops as soon as the shortest iterable runs out.

It's easier to understand if you look at an example 😀 

In [80]:
# Example of using zip()
list1 = ['Ben', 'Cody', 'Any']
list2 = ['Becerra', 'Coyote', 'One']
zipped = zip(list1, list2)

print(zipped) #printing a zip object only shows its type and memory address, not its contents

for item in zipped: #but you can use it as an iterable
    print(item)

<zip object at 0x7fb1fab52040>
('Ben', 'Becerra')
('Cody', 'Coyote')
('Any', 'One')
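Two details are worth seeing in code: `zip()` silently drops extra items when the inputs have different lengths, and `dict(zip(keys, values))` is a common way to build a dictionary from two parallel lists. The shortened list here is made up to show the first point.

```python
first_names = ['Ben', 'Cody', 'Any']
last_names = ['Becerra', 'Coyote']       # one item shorter on purpose

# zip() stops when the shortest iterable runs out, so 'Any' is dropped
pairs = list(zip(first_names, last_names))
print(pairs)

# Build a dictionary from two parallel lists
name_lookup = dict(zip(first_names, last_names))
print(name_lookup)
```

On Python 3.10 or newer, `zip(first_names, last_names, strict=True)` raises a `ValueError` instead of silently dropping items.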


You can also use zip() to unzip a zipped iterable by using the * operator with zip().

In [85]:
# Example of unzipping a zipped iterable
zipped = [(1, 'a'), (2, 'b'), (3, 'c')]
unzipped = zip(*zipped)

list1, list2 = unzipped
print("List 1:", list(list1))  # Output: [1, 2, 3]
print("List 2:", list(list2))  # Output: ['a', 'b', 'c']

dict1 = {1: 'a', 2: 'b', 3: 'c'}
for key, value in dict1.items():
    print('key:', key, 'value:', value)

unzipdict1 = zip(*dict1.items())
key, value = unzipdict1
print("Keys:", key)
print("Values:", value)

List 1: [1, 2, 3]
List 2: ['a', 'b', 'c']
key: 1 value: a
key: 2 value: b
key: 3 value: c
Keys: (1, 2, 3)
Values: ('a', 'b', 'c')


### Sets
Sets in Python are unordered collections of unique elements: items are not stored or returned in any guaranteed order. Sets are mutable, but the elements of a set must be immutable (more precisely, hashable), such as numbers or strings.

Sets in Python use a hash table-based implementation, optimized for fast membership testing and avoiding duplicate elements. A hash code is used as an index to store the element in a hash table. Operations like adding, removing, and checking for membership in sets achieve constant-time average-case performance, regardless of the set's size. However, sets <u>do not preserve the order</u> of elements, as they are stored based on their hash codes. 

Because sets have no order or index, you can't access an element by position. To get a specific value, convert the set to another data structure (e.g. a sorted list) or use it within a loop.
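A short sketch of the most common everyday uses: removing duplicates, fast membership tests, and sorting when you need a predictable order. The page names are made up for illustration.

```python
visited_pages = ['home', 'about', 'home', 'recipes', 'about']

unique_pages = set(visited_pages)      # duplicates are dropped
print(len(unique_pages))

print('recipes' in unique_pages)       # membership test, constant time on average

# Sets have no index, so sort them into a list to pick a specific item
ordered = sorted(unique_pages)
print(ordered[0])
```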

In [13]:
# Example of using sets
my_set = {9999, 9, 3, 'b', 'a', 'hello', 4, 5}
print(my_set, "no specific order here") # There's no order
my_set.add(6)  # Adding elements to a set
print("Added a 6:", my_set)
my_set.remove('hello')  # Removing elements from a set
print("Removed 'hello':", my_set)
my_set[1] #NOT subscriptable... 
    # convert to list or iterate through loop to call specific values


{'hello', 3, 4, 5, 9, 'b', 9999, 'a'} no specific order here
Added a 6: {'hello', 3, 4, 5, 6, 9, 'b', 9999, 'a'}
Removed 'hello': {3, 4, 5, 6, 9, 'b', 9999, 'a'}


TypeError: 'set' object is not subscriptable

Sets support various set operations such as union, intersection, difference, and symmetric difference. For example, take

$A = \{1, 2, 3, 4\}$

$B = \{3, 4, 5, 6\}$

- Union (OR) = in either one: $A \cup B = \{1, 2, 3, 4, 5, 6\}$
- Intersection (AND) = common to both: $A \cap B = \{3, 4\}$
- Difference (A AND NOT B) = what A has that B doesn't: $A \setminus B = \{1, 2\}$
- Symmetric Difference (XOR) = in exactly one of the two, i.e. NOT common to both: $A \,\triangle\, B = (A \setminus B) \cup (B \setminus A) = \{1, 2, 5, 6\}$


In [28]:
# Example of set operations
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

# Union
print("Union", set1.union(set2))  

# Intersection
print("Intersection", set1.intersection(set2)) 

# Difference
print("Difference", set1.difference(set2))  

# Symmetric Difference
print("Symmetric Difference", set1.symmetric_difference(set2))


Union {1, 2, 3, 4, 5, 6}
Intersection {3, 4}
Difference {1, 2}
Symmetric Difference {1, 2, 5, 6}
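Each of these methods also has an operator shortcut. The sketch below repeats the same four operations with operators.

```python
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

print("Union", set1 | set2)                 # same as set1.union(set2)
print("Intersection", set1 & set2)          # same as set1.intersection(set2)
print("Difference", set1 - set2)            # same as set1.difference(set2)
print("Symmetric Difference", set1 ^ set2)  # same as set1.symmetric_difference(set2)
```

One difference: the methods accept any iterable (e.g. `set1.union([7, 8])`), while the operators need sets on both sides, so `set1 | [7, 8]` raises a `TypeError`.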


You can also check for subsets, supersets, and disjoint sets.

- Subset (everything in A is also in B): $A \subseteq B$
- Superset (A contains all of B): $A \supseteq B$
- Disjoint (there's nothing in common between A and B): $A \cap B = \emptyset$

In [3]:
# Example of subset and superset
set1 = {1, 2, 3}
set2 = {1, 2}
set3 = {4, 5}

print(set2.issubset(set1))  
print(set1.issuperset(set2))
print(set1.isdisjoint(set3))


True
True
True
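The comparison operators give the same checks, plus a "proper subset" test (a subset that is not equal to the other set). This sketch reuses the three sets from the cell above.

```python
set1 = {1, 2, 3}
set2 = {1, 2}
set3 = {4, 5}

print(set2 <= set1)          # subset, same as set2.issubset(set1)
print(set1 >= set2)          # superset, same as set1.issuperset(set2)
print(set2 < set1)           # proper subset: subset AND not equal
print(set1 < set1)           # a set is not a proper subset of itself
print(not (set1 & set3))     # disjoint: the intersection is empty
```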


A frozenset 🥶 is an immutable version of a set. Because it can't change after it is created, it has a fixed hash value, so it can be used as a key in a dictionary or as an element of another set.

In [4]:
fset1 = frozenset(set1)
fset1.add(10) #this won't work! can't add or delete

AttributeError: 'frozenset' object has no attribute 'add'
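A sketch of why hashability matters: frozensets work as dictionary keys (and the order of elements doesn't matter when looking them up), while regular sets raise a `TypeError`. The ingredient and dish names are made up.

```python
ingredients_to_dish = {
    frozenset({'potato', 'oil', 'salt'}): 'baked potato',
    frozenset({'bread', 'cheese'}): 'grilled cheese',
}

# Element order doesn't matter for a frozenset key
print(ingredients_to_dish[frozenset({'salt', 'potato', 'oil'})])

# Equal frozensets collapse to one element inside a set
groups = {frozenset({1, 2}), frozenset({2, 1})}
print(len(groups))

# A regular (mutable) set can't be a dictionary key
try:
    bad = {{'a', 'b'}: 1}
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```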

### Generators
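A generator is a function that uses `yield` instead of `return`. Each call to `next()` runs the function up to the next `yield`, hands back that value, and pauses there, so values are produced one at a time (lazily). This makes generators useful for generating an infinite sequence or for loading data from large datasets piece by piece.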

In [86]:
# Example of generating an infinite sequence
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

gen = infinite_sequence()
print(next(gen))  
print(next(gen))  


0
1
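For the "lazily load data" use case, a generator can filter a large file line by line without ever reading the whole file into memory. This is a minimal sketch: `read_matching_lines` is a hypothetical helper, and `io.StringIO` stands in for a file opened with `open(...)`.

```python
import io

def read_matching_lines(file_obj, keyword):
    """Yield lines that contain keyword, one at a time."""
    for line in file_obj:            # file objects are lazy line iterators too
        if keyword in line:
            yield line.rstrip('\n')

# io.StringIO stands in for a large file, e.g. open('big_log.txt')
fake_file = io.StringIO("GET /home\nPOST /login\nGET /recipes\n")
matches = read_matching_lines(fake_file, "GET")
print(next(matches))
print(next(matches))
```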


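Generators can also be created with generator expressions (also called generator comprehensions), which look like list comprehensions but use parentheses instead of square brackets.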


In [8]:
# Example of generator expression
squares = (x * x for x in range(5))
print(list(squares)) #convert to list to print it


[0, 1, 4, 9, 16]
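Converting to a list is handy for printing, but the real benefit shows when a generator expression is passed straight to a function like `sum()`: the values are produced one at a time, so a million squares are never stored in a list. A generator expression can also only be consumed once.

```python
# No list of a million squares is built here
total = sum(x * x for x in range(1_000_000))
print(total)

squares = (x * x for x in range(5))
print(sum(squares))   # 30
print(sum(squares))   # 0, because the generator is already exhausted
```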


### enumerate()
The `enumerate()` function in Python is used to iterate over a sequence (such as a list, tuple, or string) while keeping track of the index of each item. It returns an enumerate object, which yields `(index, value)` tuples.

Example: `'hi'` becomes `(0, 'h'), (1, 'i')`

The index starts at 0 by default, but you can set a different starting point with the `start` parameter.

In [16]:
# These are tuple pairs
for pair in enumerate('hello'):
    print(pair) 
# You can also access each element of the tuple like this
for index, letter in enumerate('hello'): 
    print("The index of", letter, "is", index)

(0, 'h')
(1, 'e')
(2, 'l')
(3, 'l')
(4, 'o')
The index of h is 0
The index of e is 1
The index of l is 2
The index of l is 3
The index of o is 4


In [9]:
# Example of specifying start value for index
my_list = ['apple', 'banana', 'cherry']
for index, value in enumerate(my_list, start=1):
    print("Index:", index, "Value:", value)

Index: 1 Value: apple
Index: 2 Value: banana
Index: 3 Value: cherry


You can also create dictionaries with enumerated elements.

In [17]:
# Example of creating a dictionary with enumerated elements
my_list = ['apple', 'banana', 'cherry']
my_dictionary = {index: value for index, value in enumerate(my_list)}
print(my_dictionary)


{0: 'apple', 1: 'banana', 2: 'cherry'}
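To connect `enumerate()` with the generators above, its behavior can be approximated by a few lines of generator code. `my_enumerate` is a hypothetical name used only to illustrate the idea; in real code, use the built-in `enumerate()`.

```python
def my_enumerate(iterable, start=0):
    """Rough sketch of enumerate(): lazily yield (index, item) pairs."""
    index = start
    for item in iterable:
        yield index, item
        index += 1

print(list(my_enumerate('hi')))                          # [(0, 'h'), (1, 'i')]
print(list(my_enumerate(['apple', 'banana'], start=1)))  # [(1, 'apple'), (2, 'banana')]
```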


### Iterator Exhaustion 😴 
Some objects you loop over can be reused again and again, while others get exhausted (fully consumed) after one pass and can't be reused anymore. But which ones?

In [19]:
# Let's make a bunch of iterables (things you can loop over)
list1 = ['a','b','c']

set1 = {1,2,3}

zip1 = zip(list1, set1)

enum1 = enumerate(list1)

def genexample():
    for letter in ['z','y','x']:
        yield letter
gen1 = genexample()

range1 = range(4,7)


In [78]:
# First use
for item in list1:
    print("list:", item)
for item in set1:
    print("set:", item)
for item in zip1:
    print("zip:", item)
for item in enum1:
    print("enumerate:", item)
for item in gen1:
    print("generator:", item)
for item in range1:
    print("range:", item)

list: a
list: b
list: c
set: 1
set: 2
set: 3
zip: ('a', 1)
zip: ('b', 2)
zip: ('c', 3)
enumerate: (0, 'a')
enumerate: (1, 'b')
enumerate: (2, 'c')
generator: z
generator: y
generator: x
range: 4
range: 5
range: 6


In [79]:
# Second use
for item in list1:
    print("list:", item)
    
for item in set1:
    print("set:", item)
    
for item in zip1:
    print("zip:", item)
if not list(zip1):
    print("zip exhausted 😴")

for item in enum1:
    print("enumerate:", item)
if not list(enum1):
    print("enumerate exhausted 😴")

for item in gen1:
    print("generator:", item)
if not list(gen1):
    print("generator exhausted 😴")

for item in range1:
    print("range:", item)

list: a
list: b
list: c
set: 1
set: 2
set: 3
zip exhausted 😴
enumerate exhausted 😴
generator exhausted 😴
range: 4
range: 5
range: 6


zip, enumerate, and generator objects are iterators, so they all get exhausted after one pass, while lists, sets, and ranges can be looped over repeatedly. When using other types of iterables, check whether they can be reused, since it could impact your code. If you need to reuse a consumed iterator, recreate the object (or create a new instance: more on this next lesson!)
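A sketch of three common ways to get more than one pass out of iterator-style objects. `make_enum` is a hypothetical helper name. Note that `itertools.tee()` keeps buffered values in memory, and the original iterator should not be used after it has been split.

```python
from itertools import tee

list1 = ['a', 'b', 'c']

# Option 1: recreate the iterator whenever you need a fresh pass
def make_enum():
    return enumerate(list1)

first_pass = list(make_enum())
second_pass = list(make_enum())
print(first_pass == second_pass)

# Option 2: store the results in a list once; lists are never exhausted
zipped_pairs = list(zip(list1, [1, 2, 3]))
print(zipped_pairs)
print(zipped_pairs)

# Option 3: split one iterator into independent copies
gen_a, gen_b = tee(letter for letter in ['z', 'y', 'x'])
print(list(gen_a))
print(list(gen_b))
```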

### Activity

In [None]:
#1 Download this image using HTTP in Python, show your code!
#https://www.csusb.edu/sites/default/files/upload/image/logos-samples-brandisty_0.png


In [1]:
#1
import urllib.request, urllib.parse, urllib.error

img = urllib.request.urlopen('https://www.csusb.edu/sites/default/files/upload/image/logos-samples-brandisty_0.png')
fhand = open('activityImage.png', 'wb')
size = 0
while True:
    info = img.read(100000)
    if len(info) < 1: break
    size = size + len(info)
    fhand.write(info)

print(size, 'characters copied.')
fhand.close()

61651 characters copied.


In [None]:
#2 Extract the text description of IST 4320😀
#Use this website https://bulletin.csusb.edu/coursesaz/ist/
import re
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

site= "https://bulletin.csusb.edu/coursesaz/ist/"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page, "html.parser") #Creates a BeautifulSoup object called soup


##### CHANGE CODE BELOW
course_descriptions = # add code to extract all course descriptions here using a list comprehension
##### 

# This is a helper program (assuming you are making a list)
item_number = 0
for item in course_descriptions:
    found_it = re.findall('Advanced applications development', item)
    if found_it:
        print("Found it!! Look in item number:", item_number)
        break
    item_number+=1

print(course_descriptions[item_number])

## This was just an example template...you can make your OWN code from scratch if you like to extract 
# the course description for IST 4320 😀

In [1]:
#2
import re
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

site = "https://bulletin.csusb.edu/coursesaz/ist/"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page, "html.parser")

course_descriptions = soup.find_all("p", class_="courseblockdesc" )

item_number = 0
for item in course_descriptions:
    found_it = re.findall('Advanced applications development', item.text)
    if found_it:
        print("Found it!! Look in item number:", item_number)
        break
    item_number+=1

if item_number < len(course_descriptions):
    print(course_descriptions[item_number].text)
else:
    print("Description not found.")


Found it!! Look in item number: 14

Semester Prerequisite:  IST 2310 or consent of instructor. Quarter Prerequisite: IST 282 or consent of instructorAdvanced applications development in an object-oriented environment. Advanced object-oriented concepts are applied to design and implement various applications for business information systems. Focuses on developing complex applications that address a business problem or opportunity. Formerly offered as IST 483.



In [None]:
#3 Extract all links (hint: links are in <a href> tags)
import re
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

site= "https://bulletin.csusb.edu/colleges-schools-departments/"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page, "html.parser") #Creates a BeautifulSoup object called soup

## Example template (you can make your OWN code from scratch if you like!)
###### ADD CODE HERE
#hint: try making a list or dictionary that contains all the hyperlinks 😀 
#Challenge!: extract both the hyperlink AND the text description of the hyperlink in one object
#More challenge! Make it a generator!
######

In [2]:
#3
import re
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

site = "https://bulletin.csusb.edu/colleges-schools-departments/"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page, "html.parser")

links = [a['href'] for a in soup.find_all('a', href=True)]

# Print the whole list of links once
print("Here are all the extracted links:", links)

Here are all the extracted links: ['#contentarea', '/azindex/', '/', 'https://csusb.edu/', 'https://my.csusb.edu', 'https://www.csusb.edu/campus-directory', 'https://csusb.edu/library', 'https://www.csusb.edu/maps-directions', 'https://www.csusb.edu/pdc', 'https://www.csusb.edu/advancement/development/how-give/make-a-gift', '/', '/bulletin-contents/', '/colleges-schools-departments/', '/programs-az/', '/coursesaz/', '/colleges-schools-departments/arts-letters/', '/colleges-schools-departments/arts-letters/art/', '/colleges-schools-departments/arts-letters/communication-studies/', '/colleges-schools-departments/arts-letters/english/', '/colleges-schools-departments/arts-letters/music/', '/colleges-schools-departments/arts-letters/philosophy/', '/colleges-schools-departments/arts-letters/theatre-arts/', '/colleges-schools-departments/arts-letters/world-languages-literatures/', '/colleges-schools-departments/arts-letters/liberal-studies-office/', '/colleges-schools-departments/natural-sc
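For the two challenges in the #3 prompt (link text plus URL in one object, and a generator), here is a minimal sketch. It runs on a small made-up HTML snippet instead of the live bulletin page, assumes `bs4` is installed, and `iter_links` is a hypothetical name. `find_all('a', href=True)` keeps only `<a>` tags that actually have an `href`, and `get_text(strip=True)` trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Made-up snippet standing in for the bulletin page
html = """
<nav>
  <a href="/azindex/">A-Z Index</a>
  <a name="top">no href here</a>
  <a href="/programs-az/"> Programs A-Z </a>
</nav>
"""
soup = BeautifulSoup(html, "html.parser")

def iter_links(soup):
    """Yield (link text, href) pairs one at a time."""
    for a in soup.find_all('a', href=True):
        yield a.get_text(strip=True), a['href']

for text, href in iter_links(soup):
    print(text, "->", href)

# Or collect everything into a dictionary {text: href}
links_by_text = dict(iter_links(soup))
print(links_by_text)
```

If two links share the same text, the dictionary keeps only the last one, so a list of `(text, href)` tuples is safer when duplicates matter.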

In [14]:
#4 Fix this generator! It should have unlimited uses
# Hint: two fixes needed :)
from random import choice

print("Welcome! Draw a card.")

def drawcard():
    for i in range(2):
        card = [str(i) for i in range(2, 11)]
        card.extend(list("JQKA"))
        suit = ["♠️","❤️","♣️","♦️"]
        selected = choice(card) + choice(suit)
        return selected

carddraw = drawcard()

print("You drew:", next(carddraw))
print("You drew:", next(carddraw))
print("You drew:", next(carddraw))

Welcome! Draw a card.


TypeError: 'str' object is not an iterator

In [13]:
#4
from random import choice

print("Welcome! Draw a card.")

def drawcard():
    card = [str(i) for i in range(2, 11)]
    card.extend(list("JQKA"))
    suit = ["♠️","❤️","♣️","♦️"]
    while True:
        yield choice(card) + choice(suit)

carddraw = drawcard()

print("You drew:", next(carddraw))
print("You drew:", next(carddraw))
print("You drew:", next(carddraw))

Welcome! Draw a card.
You drew: 5♦️
You drew: 8♣️
You drew: 10♠️


In [None]:
#5a Which unique tags in this website 
# https://www.atlasobscura.com/things-to-do/san-bernardino-california
# are NOT in the set 'tagset'?
# Hint: try set operations :)
tagset = {'a', 'img', 'svg', 'ftp', 'article'}

#5b Which tags are common to both?


#5c Create a dictionary of common tags you found in 5b with the index. 
# For example, if you found 'tag1', 'tag2' common then 
# the format should be {0: 'tag1', 1: 'tag2', ...} etc.



In [22]:
#5a
tagset = {'a', 'img', 'svg', 'ftp', 'article'}
website_tagset = {'head', 'script', 'iframe', 'body', 'html', 'div', 'nav', 'main', 'section', 'footer', 'button', 'noscript', 'img', 'span'}
print("Differences:", tagset.difference(website_tagset))

#5b
print("Intersection:", tagset.intersection(website_tagset))

#5c
common_list = tagset.intersection(website_tagset)
my_dictionary = {index: value for index, value in enumerate(common_list)}
print(my_dictionary)

Differences: {'article', 'a', 'ftp', 'svg'}
Intersection: {'img'}
{0: 'img'}
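Because sets are unordered (and string hashing can change between Python runs), `enumerate(common_list)` may number the tags in a different order each time once there is more than one common tag. Sorting first gives a stable numbering. This sketch uses a made-up `website_tagset` in place of the scraped one.

```python
tagset = {'a', 'img', 'svg', 'ftp', 'article'}
website_tagset = {'a', 'div', 'img', 'span', 'svg'}   # made-up example set

common_tags = tagset & website_tagset                  # same as tagset.intersection(website_tagset)
indexed_tags = {index: tag for index, tag in enumerate(sorted(common_tags))}
print(indexed_tags)
```

`dict(enumerate(sorted(common_tags)))` builds the same dictionary without a comprehension.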


Additional Sources: 
- https://towardsdatascience.com/data-science-skills-web-scraping-using-python-d1a85ef607ed


Copyright Benjamin J. Becerra v2024.03.09.0