*********************************************************************************************************
# A Tour of Python 3  
version 1.0.1  
Authors: Phil Pfeiffer, Zack Bunch, and Feyisayo Oyeniyi  
East Tennessee State University  
Last updated June 2021  
*********************************************************************************************************

# 5.  Built-in data structures  
 5.1 [Overview](#Builtin-Data-Structures-Overview)  
 5.2 [Strings](#Builtin-Data-Structures-Strings)  
 &ensp; 5.2.1 [String instantiation](#Builtin-Data-Structures-Strings-Instantiation)  
 &ensp; 5.2.2 [Formatting](#Builtin-Data-Structures-Strings-Formatting)  
 &ensp;&ensp; 5.2.2.1 [Printf-style string formatting](#Builtin-Data-Structures-Strings-Printf-Style-String-Formatting)   
 &ensp;&ensp; 5.2.2.2 [Formatting with the `format()` method](#Builtin-Data-Structures-Strings-Format-Style-String-Formatting)  
 &ensp;&ensp; 5.2.2.3 [Formatting with template strings](#Builtin-Data-Structures-Strings-Template-String-Style-String-Formatting)  
 &ensp;&ensp; 5.2.2.4 [Formatting with f-strings](#Builtin-Data-Structures-Strings-F-String-Style-String-Formatting)  
 &ensp; 5.2.3 [Strings as sequences](#Builtin-Data-Structures-Strings-As-Sequences)  
 &ensp;&ensp; 5.2.3.1 [Indexing](#Builtin-Data-Structures-Strings-Indexing)  
 &ensp;&ensp; 5.2.3.2 [Slicing](#Builtin-Data-Structures-Strings-Slicing)  
 &ensp;&ensp; 5.2.3.3 [Strings and iteration](#Builtin-Data-Structures-Strings-Strings-And-Iteration)  
 &ensp; 5.2.4 [Regular expressions](#Builtin-Data-Structures-Strings-Regular-Expressions)  
 5.3 [Lists](#Builtin-Data-Structures-Lists)  
 &ensp; 5.3.1 [List instantiation](#Builtin-Data-Structures-Lists-Instantiation)  
 &ensp; 5.3.2 [List comprehensions](#Builtin-Data-Structures-Lists-Comprehensions)  
 &ensp; 5.3.3 [Update in place operations](#Builtin-Data-Structures-Lists-Update-In-Place)  
 &ensp; 5.3.4 [Additional list operations](#Builtin-Data-Structures-Lists-Additional-Operations)  
 5.4 [Tuples](#Builtin-Data-Structures-Tuples)  
 &ensp; 5.4.1 [Tuple instantiation](#Builtin-Data-Structures-Tuple-Instantiation)  
 &ensp; 5.4.2 [Additional tuple operations](#Builtin-Data-Structures-Tuple-Operations)  
 5.5 [Dicts](#Builtin-Data-Structures-Dicts)  
 &ensp; 5.5.1 [Dict instantiation](#Builtin-Data-Structures-Dict-Instantiation)  
 &ensp; 5.5.2 [Dict operations](#Builtin-Data-Structures-Dict-Operations)  
 5.6 [Sets](#Builtin-Data-Structures-Sets)  
 &ensp; 5.6.1 [Set instantiation](#Builtin-Data-Structures-Set-Instantiation)  
 &ensp; 5.6.2 [Additional set operations](#Builtin-Data-Structures-Set-Operations)  
 &ensp; 5.6.3 [Additional in-place operations on sets](#Builtin-Data-Structures-Set-In-Place-Operations)  
 5.7 [Frozensets](#Builtin-Data-Structures-Frozensets)  
 &ensp; 5.7.1 [Frozenset instantiation](#Builtin-Data-Structures-Frozenset-Instantiation)  
 &ensp; 5.7.2 [Additional frozenset operations](#Builtin-Data-Structures-Frozenset-Operations)

## 5.1  Overview <a name='Builtin-Data-Structures-Overview'></a>

This section covers six of Python's built-in data structures:
[strings](#Builtin-Data-Structures-Strings), 
[lists](#Builtin-Data-Structures-Lists), 
[tuples](#Builtin-Data-Structures-Tuples), 
[dicts](#Builtin-Data-Structures-Dicts), 
[sets](#Builtin-Data-Structures-Sets), and 
[frozensets](#Builtin-Data-Structures-Frozensets). 
The following descriptions of these structures highlight selected operations from each. 
For more information, consult the Python library documentation on 
[strings](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str), 
[lists](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range), 
[tuples](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range), 
[dicts](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict), 
[sets](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset), and 
[frozensets](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset).

Python's structures are divisible into what Python refers to as *mutable* and *immutable* objects.
-  Mutable objects, which include lists, dicts, and sets, can be updated after being created.
   -  This makes these objects potentially space-efficient.
   -  Because a mutable object's contents, and therefore any hash value computed from them, could change after creation, Python will not hash these objects:
       i.e., they can't be used as keys for [dicts](#Builtin-Data-Structures-Dicts) or as members of sets.
-  Immutable objects, which include numbers, strings, tuples, and frozensets, are not updateable, once created.
   -  Operations that appear to "change" immutables produce new objects.
   -  This makes immutable objects potentially space-inefficient, but effective for use as keys in dicts:
       structures that map keys to associated values.
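Python's `is` operator tests whether two names refer to the same object, so it can show this difference directly. The following is a minimal sketch with illustrative variable names. In it, `+=` builds a new string but extends an existing list in place:

```python
# Keep a second name for each original object so we can tell whether += replaced it
original_text = 'abc'
text = original_text
text += 'def'                  # str is immutable: += builds a new object and rebinds text

original_items = ['a', 'b', 'c']
items = original_items
items += ['d']                 # list is mutable: += extends the existing object in place

print('text:', text, '| same object as before?', text is original_text)
print('items:', items, '| same object as before?', items is original_items)
print('original_items now:', original_items)   # the in-place change is visible through both names
```

Because `items` and `original_items` name the same list, a mutation made through one name is visible through the other. Immutable objects never expose this kind of aliasing.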

In [None]:
# 5.1 data structure hash functions

print('hash functions of immutable types:')
print('int: ',       int.__hash__)
print('float: ',     float.__hash__)
print('complex: ',   complex.__hash__)
print('str: ',       str.__hash__)
print('tuple: ',     tuple.__hash__)
print('frozenset: ', frozenset.__hash__)
print()
print('-----------------------------')
print()
print('hash attributes of mutable types (None marks a type as unhashable):')
print('list:', list.__hash__)
print('dict:', dict.__hash__)
print('set:',  set.__hash__)
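The practical consequence is visible when objects are used as dict keys. The following sketch uses arbitrary sample keys. Hashable values are stored, and unhashable ones raise `TypeError`. A tuple is hashable only if everything it contains is hashable, so a tuple holding a list is rejected too:

```python
# Hashable (immutable) values can serve as dict keys; unhashable ones raise TypeError
lookup = {}
lookup[(1, 2)] = 'tuple key'
lookup[frozenset({3, 4})] = 'frozenset key'
lookup['five'] = 'str key'

rejected_types = []
# (1, [2]) is a tuple, but it contains a list, so it can't be hashed either
for candidate in ([1, 2], {3, 4}, {'five': 5}, (1, [2])):
  try:
    lookup[candidate] = 'never stored'
  except TypeError as error:
    rejected_types.append(type(candidate).__name__)
    print(f'{candidate!r} rejected as a key: {error}')

print(len(lookup), 'keys stored; rejected types:', rejected_types)
```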

## 5.2 Strings <a name='Builtin-Data-Structures-Strings'></a>

Strings are unusual structures.  Some Python operators treat them as atomic values. Others treat them as sequences of characters.

### 5.2.1 String instantiation <a name='Builtin-Data-Structures-Strings-Instantiation'></a>

In [None]:
# 5.2.1.a  using single- and double-quotes to instantiate strings

print( 'abc', "easy as 1-2-3" )

In [None]:
# 5.2.1.b  using escapes to embed quotes in strings

print( "simple as \"do-re-mi\"", '\'abc\'' )

In [None]:
# 5.2.1.c  using string concatenation to build strings

'that\'s how easy' + ' ' + 'love can be'

In [None]:
# 5.2.1.d  using triple quoting to build strings that cross line boundaries.
# The newlines become part of the string.
# The string is closed with a matching triple quote.

"""
James James
Morrison Morrison
Weatherby George Dupree
Took great
Care of his Mother
Though he was only three.
James James
Said to his mother,
"Mother", he said, said he,
"You must never go down to the end of the town if you don't go down with me."
"""

In [None]:
# 5.2.1.e  using Python's raw string construct

r'prepending a single r to a string makes it a "raw" string: one where \ does not escape content'

In [None]:
# 5.2.1.f  using string replication to build strings

'abc' * 3

In [None]:
# 5.2.1.g  using the built-in str(), which accepts any object, to instantiate object-describing strings

print( 'str(123)        is ', str(123) )
print( 'str(\'1, 2, 3\')  is ', str('1, 2, 3') )
print( 'str(123+456j)   is ', str(123+456j)   )
print( 'str((1, 2, 3))  is ', str((1, 2, 3))  )
print( 'str([1, 2, 3])  is ', str([1, 2, 3])  )
print( 'str({1:2, 3:4}) is ', str({1:2, 3:4}) )

In [None]:
# 5.2.1.h1  using the built-in repr(), which accepts any object, to instantiate object-describing strings

print( 'repr(123)        is ', repr(123) )
print( 'repr(\'1, 2, 3\')  is ', repr('1, 2, 3') )
print( 'repr(123+456j)   is ', repr(123+456j)   )
print( 'repr((1, 2, 3))  is ', repr((1, 2, 3))  )
print( 'repr([1, 2, 3])  is ', repr([1, 2, 3])  )
print( 'repr({1:2, 3:4}) is ', repr({1:2, 3:4}) )

In [None]:
# 5.2.1.h2  Comparing output from str(), repr()

print( 'str(\'1, 2, 3\') == repr(\'1, 2, 3\') is', str('1, 2, 3') == repr('1, 2, 3') )
print( 'str(123) == repr(123) is', str(123) == repr(123) )
print( 'str(123+456j) == repr(123+456j) is', str(123+456j) == repr(123+456j) )
print( 'str((1, 2, 3)) == repr((1, 2, 3)) is', str((1, 2, 3)) == repr((1, 2, 3)) )
print( 'str({1:2, 3:4}) == repr({1:2, 3:4}) is', str({1:2, 3:4}) == repr({1:2, 3:4}) )

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.1.1:**

</span><span style='color:navy'>In the following markdown cell, 
explain the difference between `str()` and `repr()`.</span>
***


***

### 5.2.2 Formatting <a name='Builtin-Data-Structures-Strings-Formatting'></a>

Python's string formatting operators generate strings by merging content from a sequence of items into a base string. This base string normally includes meta-substrings (think of these as "holes"). Each hole marks a location that receives content, specifies the type of content to insert, and specifies how to format that content.

Python supports four mechanisms for string formatting:
-  The oldest, [*printf*-style formatting](#Builtin-Data-Structures-Strings-Printf-Style-String-Formatting), uses expressions of the form *template % item_list*
-  A somewhat newer mechanism uses expressions of the form [*template.format()*](#Builtin-Data-Structures-Strings-Format-Style-String-Formatting)
-  A mechanism intended mainly for internationalization, [*template strings*](#Builtin-Data-Structures-Strings-Template-String-Style-String-Formatting), uses the `Template` class from Python's `string` module
-  The newest mechanism, [*f-strings*](#Builtin-Data-Structures-Strings-F-String-Style-String-Formatting) (also called formatted string literals), prefixes a string literal with `f` and evaluates the expressions embedded in its braces

The presence of four string formatting mechanisms in Python illustrates Mark Lutz's point about Python's departure from  *one-- and preferably only one-- obvious way to do it*.  But that's the way it is.
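To make the comparison concrete, the following sketch renders the same line with each of the four mechanisms. The names `name` and `count` are illustrative. Template strings come from the `Template` class in the standard `string` module:

```python
from string import Template

name, count = 'Ada', 3

printf_style   = 'Hello, %s: %d new messages' % (name, count)
format_method  = 'Hello, {}: {} new messages'.format(name, count)
template_style = Template('Hello, $who: $n new messages').substitute(who=name, n=count)
f_string       = f'Hello, {name}: {count} new messages'

for label, result in [('printf', printf_style), ('format()', format_method),
                      ('Template', template_style), ('f-string', f_string)]:
  print(f'{label:9} {result}')
```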

#### 5.2.2.1 Printf-style string formatting <a name='Builtin-Data-Structures-Strings-Printf-Style-String-Formatting'></a>

The string-based % operator is similar to C's *printf* function.  % takes two arguments:
-  The left-hand argument is a template to complete.
   -  It consists of two types of content:
      -  Plain old text.
      -  Format specifiers, embedded in the text, which are prefixed with %.  They come in 7 basic types:
         -  `%d`, `%i` - signed decimal integer
         -  `%o` - signed octal
         -  `%x`, `%X` - hexadecimal; show letters, when displayed, as lower- and upper-case, respectively
         -  `%e`, `%E`, `%f`, `%F`, `%g`, `%G` - floating point formats; lower and upper case forms show exponent indicator as e and E, respectively.
            -  `%e`, `%E` - show in exponent format
            -  `%f`, `%F` - show in decimal point format
            -  `%g`, `%G` - show in exponent format if the exponent is less than -4 or not less than the precision (6 by default), else decimal format
         -  `%c` - single character; also accepts 1-character strings
         -  `%a`, `%s`, `%r` - string; convert argument using ascii(), str(), and repr(), respectively
         -  `%%` - single percent sign; consumes no arguments
   -  Optional qualifiers after the initial % specify the following:
      -  zero padding (0)
      -  field width (positive int)
      -  left justification (-)
      -  precision (., followed by positive int)
      -  alternate format (#)
      -  initial sign for all conversions (+)
      -  (*key*) - the name of a key in the right-hand side argument; only valid when the right-hand argument is a dict
-  The right-hand argument can be one of two types of constructs:
   -  A tuple, whose items fill the specifiers in order (a single non-tuple value, including a list, can also be used when the template has one specifier)
   -  A *dict* - i.e., a collection of key-value pairs

The following examples use these additional Python constructs:
-  One-element tuples - these, like &ensp; (30,) &ensp;, must be written with a final comma to distinguish them from parenthesized terms
-  *s &ast; n*  - when *s* is a tuple, list, or string, this returns *n* copies of the sequence

In [None]:
# 5.2.2.1.a1  printf-style-formatting with a numeric constant

print( '30. in different formats:  %d,  %e,  %E,  %f,  %F,  %g,  %G' % ((30.,) * 7) )
print( '30. in alternate formats: %#d, %#e, %#E, %#f, %#F, %#g, %#G' % ((30.,) * 7) )

In [None]:
# 5.2.2.1.a2  octal and hex formats, which require an int, with a variable

thirty_as_tuple = (30,) * 6
print( '30 in different  formats: %o, %#o, %x, %X, %#x, %#X' % thirty_as_tuple )

In [None]:
# 5.2.2.1.b  printf-style formatting with different justifications

print( '30. in a five-space field, with different justifications and formats' )
print( '>%-5d<   >%05d<    >%#5d<' % ( (30.,) * 3 ) )

In [None]:
# 5.2.2.1.c  printf-style string and character formatting

print( '"30" in different formats: %c, %a, %r, %s' % (('30'[0],) + (('30',) * 3)) )

In [None]:
# 5.2.2.1.d  printf-style formatting using a dict

print( '%(this)s %(is)s %(a)s %(message)s' % { 'a' : 'A', 'is' : 'iZ', 'message' : 'mesG', 'this' : 'thiz' } )

[The Python documentation](https://docs.python.org/3/library/stdtypes.html#old-string-formatting) 
cautions that printf-style formatting exhibits "a variety of quirks that lead to a number of common errors 
(such as failing to display tuples and dictionaries correctly)". 
For this reason, it recommends the alternative formatting mechanisms described in what follows.
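One such quirk concerns tuples. When the right-hand argument is a tuple, `%` treats its items as separate arguments, so a tuple meant as a single value must be wrapped in a one-element tuple. The following is a short sketch, in which the `point` variable is illustrative:

```python
point = (3, 4)

try:
  unwrapped = 'point is %s' % point          # two items, but only one %s specifier
except TypeError as error:
  unwrapped = f'TypeError: {error}'

wrapped     = 'point is %s' % (point,)       # a one-element tuple holding the point works
with_format = 'point is {}'.format(point)    # format() treats its argument as one value

print(unwrapped)
print(wrapped)
print(with_format)
```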

#### 5.2.2.2  Formatting with the `format()` method <a name='Builtin-Data-Structures-Strings-Format-Style-String-Formatting'></a>

The string class's `format()` method supports a {}-based syntax for denoting holes.
  The basic syntax is '*some string*'.format(*some values*), as follows:
-  *some values* can include two types of constructs:
   -  Positional arguments: e.g., `.format(1, 2)`, or `.format(*seq)` to unpack a tuple or list
   -  Keyword arguments: e.g., `.format(a=1)`, or `.format(**d)` to unpack a dict of key-value pairs
-  *some string* is a base string to format. It consists of two types of content:
   -  Plain old text.
   -  Format specifiers, embedded in the text.
      -  These specifiers are denoted by matching braces ({}).
      -  They can reference the *some values* collection in one of three basic ways:
         -  `{}` - the value collection is a sequence; take the next item from the sequence
         -  `{n}` - the value collection is a sequence; take the *n*th item from the sequence (0-index)
         -  `{key}` - *key* is a key for a key-value pair; take the value associated with *key*
      -  As described in [the Python library doc](https://docs.python.org/3/library/string.html#formatstrings),
          these references can be qualified in one of three ways:
         -  They can be followed by format specifiers.
             These specifiers are like those for [printf-style formatting](#Builtin-Data-Structures-Strings-Printf-Style-String-Formatting),
             with the following exceptions and additions:
            -  `:` is used instead of % to prefix format qualifiers: e.g., %3.2f is comparable to {:3.2f}
            -  `s` (string) is the default presentation type for strings, and can be omitted: i.e., for a string value, {} is the same as {:s}
            -  `%` denotes a percentage; the value is multiplied by 100 and displayed as a fixed format value
            -  `n` functions like `g`, but uses the current locale to render output
            -  `,` and `_` specify the use of , and _ as thousands separators, respectively
            -  `<`, `^`, and `>` specify left alignment, centering, and right alignment, respectively
            -  `=` inserts padding after any sign but before all other digits
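         -  Non-atomic objects can be qualified with a suffix that selects one of their components: e.g., {0[1]} selects item 1 of argument 0, and {0.real} selects the `real` attribute of argument 0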
         -  Objects can be qualified with a suffix that coerces their value, using one of three Python built-in functions:
            -  `!a` - specifies *ascii()*
            -  `!r` - specifies *repr()*
            -  `!s` - specifies *str()*

In [None]:
# 5.2.2.2.a  ordered retrieval of items by format() for template string insertion

print('{} is a {} with {} {}'.format( 'This', 'string', 'inserted', 'content' ) )

In [None]:
# 5.2.2.2.b  keyword-based retrieval of items, including repeated items, for template string insertion

print( '30 in different formats: {int:d}, {int:o}, {int:x}, {fp:e}, {fp:03.3f}, {fp:g}'.format( fp=30., int=30 ) )
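The following sketch exercises several features from the list above that the preceding cells don't use: numbered positions, item and attribute selection, conversions, alignment, thousands separators, percentages, and sign-aware padding. The sample values are arbitrary:

```python
pair = ('x', 'y')

examples = [
  '{1} before {0}, then {0} again'.format('first', 'second'),  # numbered positions, reused
  '{0[0]}-{0[1]}'.format(pair),                                 # item selection
  '{z.real} + {z.imag}j'.format(z=3+4j),                        # attribute selection
  '{0!s} vs {0!r}'.format('text'),                              # str() vs repr() conversion
  '[{:<5}][{:^5}][{:>5}]'.format('L', 'C', 'R'),                # alignment in 5-wide fields
  '{:,} {:_} {:.1%}'.format(1234567, 1234567, 0.256),           # separators and percentage
  '[{:=+7d}]'.format(42),                                       # padding between sign and digits
]

for example in examples:
  print(example)
```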

#### 5.2.2.3  Formatting with template strings <a name='Builtin-Data-Structures-Strings-Template-String-Style-String-Formatting'></a>

According to the [Python library documentation](https://docs.python.org/3/library/string.html#template-strings), template strings were introduced to provide improved support for string internationalization. Template string formatting, like format string formatting, is method-based. However, the methods belong to the `Template` class in Python's `string` module rather than to `str`. The basic syntax is &ensp; `Template`('*some string*').`substitute`(*some values*). Here,
-  *some string* is a base string to format. It consists of two types of content:
   -  Plain old text.
   -  Format specifiers, embedded in the text.  They take one of three forms:
      - \$ &#123; *identifier* &#125; 
      - \$*identifier* - a shorthand for the former when *identifier* is followed by a non-word character.
      - \$\$ - an escape; it is replaced with a single \$
-  *some values* is a mapping of identifiers to values (e.g., a dict), keyword arguments, or both

`substitute` raises a `KeyError` if any placeholder's identifier is missing from *some values*.  A second method, `safe_substitute`, leaves such a placeholder unchanged in its output.

Note: `str` itself has no `substitute` or `safe_substitute` method. Template strings must therefore be created with `string.Template`, which is part of the standard library.

In [None]:
# 5.2.2.3.a  example of template string formatting, using substitute()

from string import Template

print( Template('$this $_is $a $message').substitute( a='A', _is='iZ', message='mesG', this='thiz' ) )

In [None]:
# 5.2.2.3.b  example of template string formatting, using substitute(), with a missing key

from string import Template

try:
  print( Template('$this $_is $a $message').substitute( a='A', _is='iZ', this='thiz' ) )
except KeyError as missing_key:
  print( 'substitute() raised KeyError for the missing key', missing_key )

In [None]:
# 5.2.2.3.c  example of template string formatting, using safe_substitute(), with a missing key

from string import Template

print( Template('$this $_is $a $message').safe_substitute( a='A', _is='iZ', this='thiz' ) )
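The following sketch shows two further details from the `string.Template` documentation. First, the braced placeholder form is required when identifier characters immediately follow a placeholder. Second, a doubled dollar sign is an escape for a literal one. The sketch also passes a dict as the mapping: when a keyword argument duplicates a mapping key, the keyword wins. The fruit-price strings are arbitrary:

```python
from string import Template

# ${fruit} is needed because 's' follows the placeholder directly;
# $$ is an escape that produces a single literal dollar sign
price_list = Template('${fruit}s cost $$${amount} each')

values = {'fruit': 'apple', 'amount': '2'}
from_mapping = price_list.substitute(values)
overridden   = price_list.substitute(values, amount='3')   # keyword beats the mapping's 'amount'

print(from_mapping)
print(overridden)
```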

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.2.3.1:**

</span><span style='color:navy'>In the following markdown cell, 
explain why the above examples use `_is` as a key, rather than `is`.</span>
***


***

#### 5.2.2.4  Formatting with f-strings <a name='Builtin-Data-Structures-Strings-F-String-Style-String-Formatting'></a>

As of this writing, `f-strings` are not described in the Python library documentation. Rather, they're documented in the Python language reference's section on formatted string literals and in a [Python Enhancement Proposal (PEP) - PEP 498](https://www.python.org/dev/peps/pep-0498/), which characterizes them as a simpler alternative to `str.format()`. Loosely speaking, `f-strings` are like `str.format`, with the following changes:
-  an initial `f` in front of a string triggers formatting
-  each expression to print is placed directly in paired curly braces
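-  numbered references such as `{0}` don't select arguments: each pair of braces holds its own expression, so `{0}` simply evaluates to 0
-  `{{` and `}}` can be used to include braces in f-strings
-  backslash (`\`) characters are disallowed inside the braced expressions, though not in the surrounding literal text (Python 3.12 lifted this restriction)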

By default, an `f-string` converts each expression's value by calling its `__format__` method. For most types, with an empty format specifier, this returns the same result as `__str__`. `!s`, `!a`, and `!r` can be appended to an expression to request `str()`, `ascii()`, and `repr()` conversions, respectively.

In [None]:
# 5.2.2.4  example of f-string formatting

a = 3
b = 4
message_part_1 = 'a is'
message_part_2 = '; b is'
message_part_3 = '; a+b is'
message_part_4 = '; a-b is'

print( f'{message_part_1} {a}{message_part_2} {b}{message_part_3} {a+b}{message_part_4} {a-b}' )
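f-strings also accept the `format()` features described earlier, and a few of their own. The following sketch uses illustrative variable names. Note that the self-documenting `=` form shown on the last line assumes Python 3.8 or later:

```python
name = 'Ada'
width = 8
ratio = 0.256

with_conversion = f'{name!r} has {{literal braces}}'   # !r conversion; doubled braces print as braces
nested_width    = f'[{name:>{width}}]'                 # a nested expression supplies the field width
with_spec       = f'{ratio:.1%} of {width * 2} slots'  # a format spec, plus an arbitrary expression
self_describing = f'{width = }'                        # the '=' form requires Python 3.8 or later

for line in (with_conversion, nested_width, with_spec, self_describing):
  print(line)
```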

### 5.2.3  Strings as sequences <a name='Builtin-Data-Structures-Strings-As-Sequences'></a>

Strings can behave as atomic values or sequences, depending on context.  Python supports two operators for accessing the parts of a sequence:
-  an indexing operator, denoted by []
-  a slicing operator, a generalization of the indexing operator, denoted by [:] or [::]

#### 5.2.3.1 Indexing <a name='Builtin-Data-Structures-Strings-Indexing'></a>


In [None]:
# 5.2.3.1.a  using positive indexing to enumerate a string's characters

from math import log10, floor
test_string = 'abcdef'
test_string_len_space_allocation = floor(log10(len(test_string))) + 1
test_string_message_format = 'test string character %s%dd is %sc' % ( '%', test_string_len_space_allocation, '%' )

for this_index in range(len(test_string)):
  print( test_string_message_format % ( this_index , test_string[this_index] ) )

In [None]:
# 5.2.3.1.b  using negative indexing to enumerate a string's characters

from math import log10, floor
test_string = 'abcdef'
test_string_len_space_allocation = floor(log10(len(test_string))) + 1
test_string_message_format = 'test string character %s%dd is %sc' % ( '%', test_string_len_space_allocation, '%' )

for this_index in range(-1,-len(test_string)-1,-1):
  print( test_string_message_format % (this_index, test_string[this_index]) )

In [None]:
# 5.2.3.1.c  using Python's enumerate() built-in to enumerate a string's characters and their positions

from math import log10, floor
test_string = 'abcdef'
test_string_len_space_allocation = floor(log10(len(test_string))) + 1
test_string_message_format = 'test string character %s%dd is %sc' % ( '%', test_string_len_space_allocation, '%' )

for (this_index, this_character) in enumerate(test_string):
  print( test_string_message_format % (this_index, this_character) )
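Because strings are immutable, indexing can read a character but cannot replace one. An index outside the string also raises an error rather than returning a value. The following is a short sketch:

```python
test_string = 'abcdef'
assignment_error = index_error = None

try:
  test_string[0] = 'A'                        # strings don't support item assignment
except TypeError as error:
  assignment_error = f'TypeError: {error}'
print(assignment_error)

try:
  test_string[len(test_string)]               # the last valid index is len(test_string) - 1
except IndexError as error:
  index_error = f'IndexError: {error}'
print(index_error)

new_string = test_string.replace('a', 'A', 1)   # to "change" a character, build a new string
print(new_string, test_string)                  # the original string is unchanged
```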

#### 5.2.3.2  Slicing <a name='Builtin-Data-Structures-Strings-Slicing'></a>

*Slicing* is primarily useful for extracting content from sequence-based datasets. These examples use strings to illustrate the operator's function.  Slicing can also be used to extract subsequences from tuples and lists.

Python's slicing operator has two forms.  The operator's first form, [`begin`:`onebeyond`], extracts a contiguous substring from a base string `s`.
-  The operator's first `begin` argument is the slice's initial index. 
   -   Nonnegative values relate to a string's leftmost element, starting with 0. 
   -   Negative values relate to a string's rightmost element, which is -1: e.g., `-len(s)` denotes a string's leftmost character.  
-  The operator's second `onebeyond` argument is one position beyond the slice's final index.  
   -   Nonnegative values relate to a string's leftmost element, starting with 0: e.g., `len(s)` denotes all elements up to and including `s`'s rightmost character. 
   -   Negative values relate to a string's rightmost element: e.g., -1 denotes all elements up to but not including `s`'s rightmost character.

In [None]:
# 5.2.3.2.a  illustrating the default values for the [:] operator:
#            an omitted begin defaults to 0; an omitted onebeyond defaults to len(test_string)

test_string = 'abcdef'
print( 'test string is                      ', test_string )
print( 'test_string[1:len(test_string)]   is', test_string[1:len(test_string)] )
print( 'test_string[1:]                   is', test_string[1:] )
print( 'test_string[0:len(test_string)-1] is', test_string[0:len(test_string)-1] )
print( 'test_string[:len(test_string)-1]  is', test_string[:len(test_string)-1] )
print( 'test_string[:]                    is', test_string[:] )

In [None]:
# 5.2.3.2.b  using positive indexing to enumerate a string's substrings

test_string = 'abcdef'
for start_index in range(len(test_string)):
  leading_padding = ' ' * start_index
  for end_index in range(start_index+1, len(test_string)+1):
    this_slice = test_string[start_index:end_index]
    print( f"{test_string}[{start_index}:{end_index}] is {leading_padding}'{this_slice}'" )

In [None]:
# 5.2.3.2.c  using negative indexing, with the default stride, to enumerate a string's substrings

test_string = 'abcdef'
for start_index in range(-len(test_string), 0):
  leading_padding = ' ' * (len(test_string)+start_index)
  for end_index in range(start_index+1, 0):
    this_slice = test_string[start_index:end_index]
    print( f"{test_string}[{start_index}:{end_index}] is {leading_padding}'{this_slice}'" )

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.3.2.1:**

</span><span style='color:navy'>In the following code cell, for the previous two (positive and negative range) examples, show what happens when the range of a slice expands beyond test_string's length.</span>

The slicing operator's second, [`begin`:`onebeyond`:`k`] form adds an explicit stride parameter.
-  A positive stride, `k`, directs the operator to select every `k`th value from the sequence, moving from left to right. 
-  A negative stride, `k`, directs the operator to select every `-k`th value from the sequence, moving from right to left.

In [None]:
# 5.2.3.2.d  illustrating the default stride value for the [::] operator

test_string = 'abcdef'
print( 'test string is ', test_string )
print( 'test string sliced with a stride of 1 is      ', test_string[0:len(test_string):1] )
print( 'test string sliced with a stride of 2 is      ', test_string[0:len(test_string):2] )
print( 'test string sliced with the default stride is ', test_string[0:len(test_string):] )

In [None]:
# 5.2.3.2.e  using positive indexing with a negative stride to enumerate a string's substrings

test_string = 'abcdef'
for start_index in range(0,len(test_string)):
  leading_padding = ' ' * (len(test_string)-start_index)
  for end_index in range(start_index):
    this_slice = test_string[start_index:end_index:-1]
    print( f"{test_string}[{start_index}:{end_index}:-1] is {leading_padding}'{this_slice}'" )

In [None]:
# 5.2.3.2.f  using negative indexing with a negative stride to enumerate a string's substrings

test_string = 'abcdef'
for start_index in range(-1,-len(test_string)-1,-1):
  leading_padding = ' ' * (-start_index-1)
  for end_index in range(start_index-1, -len(test_string)-2, -1):
    this_slice = test_string[start_index:end_index:-1]
    print( f"{test_string}[{start_index}:{end_index}:-1] is {leading_padding}'{this_slice}'" )

In [None]:
# 5.2.3.2.g  illustrating the effect of varying positive strides on positive indexing

test_string = 'abcdefghijklmnop'
for stride in range(1, len(test_string)+1):
  print( f"{test_string}[::{stride}] is '{test_string[::stride]}'" )

In [None]:
# 5.2.3.2.h illustrating the effect of varying negative strides on negative indexing

test_string = 'abcdefghijklmnop'
for stride in range(-1,-len(test_string)-1,-1):
  this_slice = test_string[-1:-len(test_string)-1:stride]
  print( f"{test_string}[-1:{-len(test_string)-1}:{stride}] is '{this_slice}'" )

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.3.2.2:**

</span><span style='color:navy'>In the following code cell, illustrate the effect of specifying a stride of 0.</span>

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.3.2.3:**

</span><span style='color:navy'>In the following code cell, illustrate a single slice with numeric first and second arguments and negative stride that returns the entirety of test_string. If you can't do this, show that it can't be done, including examples to make your point.</span>

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.3.2.4:**

</span><span style='color:navy'>In the following code cell, repeat the previous exercise, experimenting with `None` as a value for the slicing operator's first and second arguments.</span>

#### 5.2.3.3  Strings and iteration  <a name='Builtin-Data-Structures-Strings-Strings-And-Iteration'></a>

The following examples illustrate a potential hazard that a string's dual nature as an atomic value and a sequence of characters creates for loop-based string manipulation. 

**Including commas in singleton tuples is especially critical for strings.** Omitting the comma can produce unexpected results WITHOUT generating an obvious program failure, which makes the mistake particularly difficult to catch.

In [None]:
# 5.2.3.3.a looping through a singleton sequence of strings

string_seq = ('abcdef',)    # final ',' a must, to ensure interpretation as a tuple rather than an atomic value
for string in string_seq:
  print( string )

In [None]:
# 5.2.3.3.b looping through a string

string_seq = ('abcdef')    # without the final ',' the parentheses merely group, so string_seq is the string itself
for string in string_seq:  # iterating over a string yields its characters
  print( string )
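One defensive pattern is to wrap a lone string explicitly before looping, so that a missing comma elsewhere can't silently turn a string into a sequence of characters. The following sketch illustrates this with a hypothetical helper named `as_string_tuple`:

```python
def as_string_tuple(strings):
  """Hypothetical helper: return a tuple of strings, wrapping a lone str as one item."""
  if isinstance(strings, str):
    return (strings,)
  return tuple(strings)

for string in as_string_tuple('abcdef'):          # a bare string: one iteration
  print( string )
for string in as_string_tuple(['abc', 'def']):    # a list of strings: one iteration per item
  print( string )
```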

### 5.2.4 Regular expressions <a name='Builtin-Data-Structures-Strings-Regular-Expressions'></a>

Regular expressions (regexps) were devised and explored in the early 1950's by U. Wisconsin-Madison mathematician Stephen Kleene. Regexps are essentially an expressive means for characterizing groups of related strings.  At base, all regexps are built from three basic string operators:
-  xy – concatenation
-  &ast; – repetition (any number of) 
-  | - alternation (either/or) 

Over time, the set of operators that various implementations of regexps offer has expanded greatly. Many of these operators can be characterized as shorthands for combinations of Kleene's original three operators. A few, such as backreferences, describe patterns that those three operators cannot express.
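To illustrate the shorthand idea, the following sketch spot-checks (but does not prove) that three common extended operators, `+`, `?`, and `{m,n}`, match the same strings as rewrites that use only concatenation, `*`, and `|`. It compares each pair with `re.fullmatch` on every string over the alphabet {a, b} of length 4 or less:

```python
import re
from itertools import product

# (extended form, rewrite using only concatenation, * and |)
equivalent_pairs = [
  (r'ab+',     r'abb*'),      # one or more b's = one b, then any number of b's
  (r'ab?',     r'a(b|)'),     # an optional b = a b or the empty string
  (r'ab{1,2}', r'a(b|bb)'),   # one or two b's = an alternation of the two cases
]

# every string over the alphabet {a, b} of length 0 through 4
candidates = [''.join(letters) for n in range(5) for letters in product('ab', repeat=n)]

results = {}
for extended, kleene_only in equivalent_pairs:
  same = all(bool(re.fullmatch(extended, s)) == bool(re.fullmatch(kleene_only, s))
             for s in candidates)
  results[extended] = same
  print(f'{extended!r:12} and {kleene_only!r:12} agree on all candidates: {same}')
```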

As of the mid-2010's, the Free Software Foundation supported 12 different regexp standards in its GNU suite of Unix utilities.   Even so, the de facto regexp standard is the Perl programming language's implementation of regexps, which dates to the late 1980's.   Python implements Perl-like regexps, providing support for the common regexp operators (e.g., &ast;, +, ?, {*m*,*n*}, [..])  as well as more exotic operators, like these:
-  positive and negative lookbehind ((?&lt;=...), (?&lt;!...))
-  positive and negative lookahead  ((?=...), (?!...))
-  noncapturing expressions (?:...)
-  named groups and named backreferences ((?P&lt;name&gt;...), (?P=name))
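The following sketch tries each of these operators on small sample strings; the strings and variable names are illustrative. Note how `re.findall` handles groups differently. When a pattern has one capturing group, it returns only that group's contents. When the group is non-capturing, it returns the whole match:

```python
import re

prices = 'apples 30USD, pears 45USD, plums 12EUR'

usd_amounts    = re.findall(r'\d+(?=USD)', prices)              # positive lookahead
other_amounts  = re.findall(r'\b\d+(?!\d|USD)', prices)         # negative lookahead
currency_codes = re.findall(r'(?<=\d)[A-Z]{3}', prices)         # positive lookbehind
ears_not_pears = re.findall(r'(?<!p)ears', 'pears and bears')   # negative lookbehind: only 'bears' qualifies

captured_names = re.findall(r'(apples|pears) \d+', prices)      # capturing group: findall returns the group
whole_matches  = re.findall(r'(?:apples|pears) \d+', prices)    # non-capturing: findall returns the match

# named group (?P<word>...) plus named backreference (?P=word): find doubled words
doubled_words = [match.group('word')
                 for match in re.finditer(r'\b(?P<word>\w+)\s+(?P=word)\b', 'the the cat sat on on the mat')]

for label, value in [('usd_amounts', usd_amounts), ('other_amounts', other_amounts),
                     ('currency_codes', currency_codes), ('ears_not_pears', ears_not_pears),
                     ('captured_names', captured_names), ('whole_matches', whole_matches),
                     ('doubled_words', doubled_words)]:
  print(f'{label:15} {value}')
```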

The following examples use this additional Python construct:
-  `split` - split a string into multiple chunks, returning a list of chunks, using the specified string as a point of splitting (default: whitespace)
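The default and explicit-separator forms of `split` behave differently:
- With no argument, runs of whitespace count as one separator, and leading and trailing whitespace is ignored.
- With an explicit separator, every occurrence splits the string, so empty strings can appear.

The following is a short sketch with an arbitrary sample line:

```python
line = '  Took   great\tcare  '

default_split = line.split()        # split on runs of any whitespace, including the tab
space_split   = line.split(' ')     # split on each single space; the tab is not a separator here
comma_split   = 'a,,b'.split(',')   # adjacent separators produce an empty string

print(default_split)
print(space_split)
print(comma_split)
```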

In [None]:
# 5.2.4.a using regular expressions to retrieve non-backreferenced patterns.
# re.findall - return all matching patterns in a given string

import re

# regular expression subpatterns for this example
ASSERT_WORD_START=r'?<!\w'
FOUR_WORD_CHARS=r'\w{4}'
ASSERT_WORD_END=r'?!\w'

# the text to process
text = """James James
Morrison Morrison
Weatherby George Dupree
Took great
Care of his Mother
Though he was only three.
James James
Said to his mother,
"Mother", he said, said he,
"You must never go down to the end of the town if you don't go down with me.""" + '"'

# the pattern to attempt
four_letter_words = []
four_letter_word_pattern = '({}){}({})'.format(ASSERT_WORD_START, FOUR_WORD_CHARS, ASSERT_WORD_END)

# go process
for line in text.split('\n'):
  four_letter_words += re.findall( four_letter_word_pattern, line)
print( 'four letter words in passage: ', ', '.join( four_letter_words ) )

In [None]:
# 5.2.4.b using regular expressions to retrieve backreferenced patterns.
# This problem is more difficult, because the patterns contain capture groups whose contents aren't wanted in the output.
# The problem can't be solved directly with re.findall, which returns the contents of every capture group.
# Instead, the code uses compiled regular expression objects, whose search() method accepts a starting position,
# and then slides through each line, checking for matching patterns.

import re

# regular expression subpatterns for this example
ASSERT_WORD_START=r'?<!\w'
WORD_CHARS=r'\w+'
TRAILING_COMMA='?:,'
NONWORD_CHAR=r'\W'
NONWORD_CHARS=r'\W+'
STUFF='.*'
_2ND_ITEM_IN_PARENS=r'\2'
_3RD_ITEM_IN_PARENS=r'\3'
ASSERT_WORD_END=r'?!\w'

# the text to process
text = """James James
Morrison Morrison
Weatherby George Dupree
Took great
Care of his Mother
Though he was only three.
James James
Said to his mother,
"Mother", he said, said he,
"You must never go down to the end of the town if you don't go down with me.""" + '"'

# what to process, and where to store results
regexps_to_process_plus_matching_phrases = []

# the list of all matches to attempt
words_before_commas = []
pattern_elements = [ ASSERT_WORD_START, WORD_CHARS, TRAILING_COMMA ]
this_pattern = '({})(?P<pattern_to_seek>({}))({})'.format( *pattern_elements )
word_before_commas_regexp = re.compile( this_pattern )
regexps_to_process_plus_matching_phrases += [ ( word_before_commas_regexp, words_before_commas ) ]

doubled_adjacent_words = []
pattern_elements = [ ASSERT_WORD_START, WORD_CHARS, NONWORD_CHARS, _2ND_ITEM_IN_PARENS, ASSERT_WORD_END ]
this_pattern = '({})(?P<pattern_to_seek>({})({})({}))({})'.format( *pattern_elements )
doubled_adjacent_words_regexp = re.compile( this_pattern )
regexps_to_process_plus_matching_phrases += [ (doubled_adjacent_words_regexp, doubled_adjacent_words ) ]

phrases_with_doubled_words = []
pattern_elements = [ ASSERT_WORD_START, WORD_CHARS, NONWORD_CHAR, STUFF, NONWORD_CHAR, _2ND_ITEM_IN_PARENS, ASSERT_WORD_END ]
this_pattern = '({})(?P<pattern_to_seek>({}){}({}{})?{})({})'.format( *pattern_elements )
phrase_with_doubled_words_regexp = re.compile( this_pattern ) 
regexps_to_process_plus_matching_phrases += [ (phrase_with_doubled_words_regexp, phrases_with_doubled_words ) ]

reversed_two_word_phrases = []
pattern_elements = [ ASSERT_WORD_START, WORD_CHARS, NONWORD_CHARS, WORD_CHARS, ASSERT_WORD_END, STUFF ]
pattern_elements += [ ASSERT_WORD_START, _3RD_ITEM_IN_PARENS, NONWORD_CHARS, _2ND_ITEM_IN_PARENS, ASSERT_WORD_END ]
this_pattern = '({})(?P<pattern_to_seek>({}){}({})({}){}(({}){}{}{}))({})'.format( *pattern_elements )
reversed_two_word_phrase_regexp = re.compile( this_pattern )
regexps_to_process_plus_matching_phrases += [ (reversed_two_word_phrase_regexp, reversed_two_word_phrases ) ]

for line in text.split('\n'):
  for (regexp, matching_phrases) in regexps_to_process_plus_matching_phrases:
    start_position = 0
    while True:
      this_match = regexp.search(line, start_position)
      if not this_match:
        break  # no more matches to be had for this line
      matching_phrases += [ this_match.groupdict()[ 'pattern_to_seek' ] ]   # retrieve this match 
      start_position = this_match.span()[1]+1                               # continue after this pattern 

print( 'words before commas in passage: ',         words_before_commas )
print( 'doubled adjacent words in passage: ',      doubled_adjacent_words )
print( 'phrases with doubled words in passage: ',  phrases_with_doubled_words )
print( 'four word phrases with reversed halves: ', reversed_two_word_phrases )

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.4.1:**

</span><span style='color:navy'>In the following markdown cell, explain why the previous example's third test, which checks for phrases with doubled words, doesn't return all such phrases.</span>
***


***


<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.4.2:**

</span><span style='color:navy'>In the following code cell, update this example to fix this problem, doing so in a uniform way for all four checks.</span>

### 5.2.5 Concluding exercises for strings

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.5.1:**

</span><span style='color:navy'>In the following code cell, use the string type's [split](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) method to create examples that illustrate the following:</span>
-  <span style='color:navy'>The effect of splitting a string s with a separator string ss of length 2 or more.  Make sure s contains two or more instances of ss.</span>
-  <span style='color:navy'>The effect of splitting a string s with a zero-length separator string.  Make sure s is nonempty.</span>

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.5.2:**

</span><span style='color:navy'>In the following code cell, show how to use the code in the section on [Docstrings](./4.%20%20Interactive%20help%20features.ipynb#Interactive-Help-Features-Docstrings) to identify, and obtain documentation on, the methods supported by class *str*.</span>

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.5.3:**

</span><span style='color:navy'>In the following code cell, repeat the previous exercise, modifying the "for" loop to skip names that begin and end with '\_\_'</span>

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.2.5.4:**

</span><span style='color:navy'>In the following code cell, repeat the previous exercise, this time modifying the "for" loop to skip names that begin with exactly two underscores ( '\_\_' ) -- no more -- and end with exactly two underscores -- no more.</span>

## 5.3 Lists <a name='Builtin-Data-Structures-Lists'></a>

Lists are core Python data structures.  They're commonly used to represent sequences of items and vectors (1-D arrays). Nested lists are commonly used to represent multi-dimensional arrays.
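As a brief illustration of the nested-list representation, the sketch below builds a 3 x 3 grid as a list of rows and indexes it as `grid[row][col]`. It also shows a common pitfall: replicating a list that contains a list (e.g., `[[0] * 3] * 3`) repeats references to one inner list, so changing one "row" changes all of them. The names `grid` and `aliased` are illustrative.

```python
# A 3 x 3 grid represented as a list of rows
grid = [[0] * 3 for _ in range(3)]  # each row is a distinct list
grid[1][2] = 7                      # set row 1, column 2
print(grid)                         # [[0, 0, 0], [0, 0, 7], [0, 0, 0]]

# Pitfall: replication copies references, not the inner list itself
aliased = [[0] * 3] * 3             # three references to ONE inner list
aliased[1][2] = 7
print(aliased)                      # [[0, 0, 7], [0, 0, 7], [0, 0, 7]]
print(aliased[0] is aliased[1])     # True
```

Building each row separately, as in the first assignment, gives every row its own list.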

### 5.3.1 List instantiation <a name='Builtin-Data-Structures-Lists-Instantiation'></a>

In [None]:
# 5.3.1.a  using square brackets to instantiate lists

print( [], [1], [1,2,3,4,5,6] )

In [None]:
# 5.3.1.b  using list concatenation to build lists

[1, 2, 3] + [4, 5, 6]

In [None]:
# 5.3.1.c using list replication to build lists

[1, 2, 3] * 3

In [None]:
# 5.3.1.d  using the cross-object method list() to instantiate lists from other collections

print( 'list([1, 2, 3]) is       ', list([1, 2, 3]) )
print( 'list((1, 2, 3)) is       ', list((1, 2, 3)) )
print( 'list({1, 2, 3}) is       ', list({1, 2, 3}) )
print( 'list({1:7, 2:8, 3:9}) is ', list({1:7, 2:8, 3:9}) )

### 5.3.2 List comprehensions <a name='Builtin-Data-Structures-Lists-Comprehensions'></a>

Comprehensions are an __essential__ construct to master for writing clear code.  They build collections of values without the classic loop-based control blocks, which makes them a highly concise way to denote large, potentially complex structures.

List comprehensions build lists.  Python also provides comprehension syntax for dicts and sets, which works similarly to what is described below.  Tuples and frozensets have no comprehension syntax of their own, but they can be built by passing a comparable generator expression to the `tuple()` or `frozenset()` constructor.

The basic form of a list comprehension is as follows: 

&ensp;&ensp;&ensp;&ensp;[ *value-returning expression* **for** *index variable* **in** *sequence of values* **if** *condition is True* ]

The final `if` clause is optional.  Intuitively, for each successive value returned by its *sequence of values* clause, a comprehension
-  assigns that value to *index variable*
-  if the `if` clause is missing, or if evaluating *condition* relative to the value of *index variable* returns True,
   -  evaluates *value-returning expression*, relative to the value of *index variable*
   -  inserts that value into the next position of the list to return
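To make the steps above concrete, the sketch below computes the same list twice: once with a comprehension and once with the explicit loop that the comprehension abbreviates. The comprehension used here is an illustrative example, not one of the numbered examples that follow.

```python
# Comprehension form: [ expression for variable in sequence if condition ]
squares_of_multiples_of_3 = [i * i for i in range(10) if i % 3 == 0]

# Equivalent explicit loop
result = []
for i in range(10):          # assign each value to the index variable
  if i % 3 == 0:             # keep it only if the condition is True
    result.append(i * i)     # evaluate the expression and append its value

print(squares_of_multiples_of_3)            # [0, 9, 36, 81]
print(result == squares_of_multiples_of_3)  # True
```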


In [None]:
# 5.3.2.a  generate [0, 2, 4, 6], using different combinations of *range* and "if" clauses

print([i for i in range(0, 8, 2)])
print([-i for i in range(0, -8, -2)])
print([2*i for i in range(4)])
print([i for i in range(8) if i % 2 == 0])

In [None]:
# 5.3.2.b  generate a list of upper case letters from ascii character codes for those letters,
# then display it as a list and a string

x = [chr(i) for i in range(ord('A'), ord('Z')+1)]
print( ''.join(x), '\n', x )

In [None]:
# 5.3.2.c  generate a list of lower-case consonants from ascii character codes for those letters,
# then display it as a list and a string

x = [chr(i) for i in range(ord('a'), ord('z')+1) if chr(i) not in 'aeiou']
print( ''.join(x), '\n', x )

In [None]:
# 5.3.2.d  generate a list of lists of integers

print([[i, 2*i] for i in range(10)])

Comprehensions can be doubly nested, like doubly nested loops.  Double-nested comprehensions can be used to generate either flat lists or lists of lists.
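The `for` clauses of a double-nested comprehension are read left to right, in the same order as nested loops: the first clause is the outer loop and the second is the inner loop. The sketch below checks this with a small, illustrative pair of ranges, and contrasts it with one comprehension nested inside another, which yields a list of lists.

```python
# The first 'for' clause is the outer loop; the second is the inner loop
pairs = [(i, j) for i in range(2) for j in range(3)]

# Equivalent nested loops
pairs_from_loops = []
for i in range(2):
  for j in range(3):
    pairs_from_loops.append((i, j))

print(pairs)                      # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(pairs == pairs_from_loops)  # True

# Nesting one comprehension inside another yields a list of lists instead
rows = [[(i, j) for j in range(3)] for i in range(2)]
print(rows)                       # [[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)]]
```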

In [None]:
# 5.3.2.e  generate a flat version of the single-digit multiplication table

print( [ i*j for i in range(10) for j in range(10)] )

In [None]:
# 5.3.2.f  generate and pretty-print a two-dimensional version of the single-digit multiplication table

mult_table = [ [row*col for col in range(10)] for row in range(10)]
for multi_table_row in mult_table:
  for item in multi_table_row:
    print( '%4d' % item, end = '' )
  print()

In [None]:
# 5.3.2.g  like the previous, except insert the formatted entries in the list 

mult_table = [ [ '%4d' % (row*col) for col in range(10)] for row in range(10)]
for multi_table_row in mult_table:
  for item in multi_table_row:
    print( item, end = '' )
  print()

In [None]:
# 5.3.2.h  generate and pretty-print the upper diagonal version of the single-digit multiplication table

mult_table = [ [ '%4d' % (row*col) if row <= col else '    ' for col in range(10)] for row in range(10)]
for multi_table_row in mult_table:
  for item in multi_table_row:
    print( item, end = '' )
  print()

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.3.2.1:**

</span><span style='color:navy'>In the following code cell, craft an example that generates and pretty-prints the lower-diagonal version of the single-digit multiplication table.</span>

In [None]:
# 5.3.2.i  filtering a list's items based on their lengths

milne_phrase = "Took great care of his Mother Though he was only three"
max_word_len = max([len(substr) for substr in milne_phrase.split()])
for i in range(1, max_word_len+1):
  print( f"words of length {i}: {', '.join([substr for substr in milne_phrase.split() if len(substr) == i])}" )

In [None]:
# 5.3.2.j  constructing lists of booleans

milne_phrase = "Took great care of his Mother Though he was only three"
capital_letters = [chr(i) for i in range(ord('A'), ord('Z')+1)]
print( 'word in phrase is capitalized', [ ( substr, substr[0] in capital_letters ) for substr in milne_phrase.split() ] )

In [None]:
# 5.3.2.k  filtering by type

test_list = ['a', 1, 'b', 2, 'c', '3']
for (valtype, typename) in ((int, 'integer'), (str, 'string')):
  items_of_valtype = [ str(item) for item in test_list if isinstance(item, valtype) ]
  typename = typename if len(items_of_valtype) == 1 else typename+'s'
  print( typename + ' in test list: ', ', '.join(items_of_valtype) )

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.3.2.2:**

</span><span style='color:navy'>In the following code cell, craft an example that shows whether the index identifiers that a comprehension employs-- e.g., i in [i for i in range(10)]-- change the values of any identifiers with the same names that were in use before the comprehension executes.</span>

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.3.2.3:**

</span><span style='color:navy'>In the following code cell, craft an example that shows whether the index variables that a comprehension employs-- e.g., i in [i for i in range(10)]-- persist after the comprehension executes.</span>

### 5.3.3 Update in place operations <a name='Builtin-Data-Structures-Lists-Update-In-Place'></a>

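Methods that update a list in place, such as `append()`, `insert()`, and `reverse()`, return `None` rather than the updated list.  The exception is `pop()`, which removes an item and returns it.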
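A common consequence is that assigning the result of an in-place method back to a variable replaces the list with `None`. The sketch below shows the mistake and one alternative: the built-in `sorted()` returns a new list, whereas the method `list.sort()` sorts in place and returns `None`. The variable names are illustrative.

```python
x = [3, 1, 2]
result = x.sort()      # sorts x in place...
print(result)          # None  ...and returns None
print(x)               # [1, 2, 3]

y = [3, 1, 2]
y = y.append(4)        # mistake: y is now None, and the list is lost
print(y)               # None

z = [3, 1, 2]
z_sorted = sorted(z)   # sorted() returns a new list and leaves z unchanged
print(z, z_sorted)     # [3, 1, 2] [1, 2, 3]
```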

In [None]:
# 5.3.3.a Append a value to a list

x = [1, 2, 3]
x.append(4)
x

In [None]:
# 5.3.3.b Insert values into a list

x = [2, 3, 4]
print( 'initial list:                             ', x )
x.insert(0, 1) 
print( 'after inserting 1 at list head:           ', x )
x.insert(-len(x)-1, 0)     # out-of-range indices are clamped, so this inserts at the head
print( 'after inserting 0 at list head:           ', x )
x.insert(-1, 3.5)
print( 'after inserting 3.5 before the last item: ', x )
x.insert(len(x), 5)
print( 'after inserting 5 after the last item:    ', x )

In [None]:
# 5.3.3.c  Reverse a list in place

x = [1, 2, 3]
print( 'initial list:         ', x )
x.reverse()
print( 'list after reversal:  ', x )

In [None]:
# 5.3.3.d  Treat a list as a stack, exhausting and printing its values

x = [1, 2, 3, 4]
print( 'initial list:   ', x )
while x:
  print(  'popped', x.pop(), 'from list ') 
  print(  'list is now    '       , x)

### 5.3.4 Additional list operations <a name='Builtin-Data-Structures-Lists-Additional-Operations'></a>

In [None]:
# 5.3.4.a  return a list's length

print( 'length of [] is ', len([]) )
print( 'length of [\'a\', \'b\', \'c\'] is ', len(['a', 'b', 'c']) )

In [None]:
# 5.3.4.b  return a list of pairs of parallel items from two lists in two ways

list_abc = ['a', 'b', 'c']
list_123 = [1, 2, 3]

print( [(list_value1, list_value2) for (list_value1, list_value2) in zip(list_abc, list_123)] )
print( [(list_value2, list_value1) for (list_value1, list_value2) in zip(list_abc, list_123)] )
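
As a shorthand for the first comprehension above, `list(zip(list_abc, list_123))` builds the same list of pairs. One behavior worth knowing: `zip()` stops at the end of its shortest argument, silently dropping any leftover items. The shorter list `list_12` below is illustrative.

```python
list_abc = ['a', 'b', 'c']
list_123 = [1, 2, 3]
list_12 = [1, 2]

print(list(zip(list_abc, list_123)))  # [('a', 1), ('b', 2), ('c', 3)]
print(list(zip(list_abc, list_12)))   # [('a', 1), ('b', 2)]: 'c' is dropped
```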

## 5.4 Tuples  <a name='Builtin-Data-Structures-Tuples'></a>


Tuples, like lists, are commonly used to represent sequences of items and vectors (1-D arrays). Unlike lists, tuples are immutable. This makes them suitable for use as dict keys, provided that every item they contain is also hashable.
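A minimal sketch of tuples as dict keys: `(row, col)` tuples work as keys, while a list, or a tuple that contains a list, is rejected with a `TypeError`. The dict name `positions` and its contents are illustrative.

```python
positions = {(0, 0): 'origin', (2, 3): 'target'}  # tuple keys
print(positions[(2, 3)])                           # target

for candidate_key in ([2, 3], (2, [3])):  # a list, and a tuple containing a list
  try:
    positions[candidate_key] = 'rejected'
  except TypeError as err:
    print(type(candidate_key).__name__, 'key rejected:', err)
```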

### 5.4.1 Tuple instantiation <a name='Builtin-Data-Structures-Tuple-Instantiation'></a>

In [None]:
# 5.4.1.a  using parenthesized expressions to instantiate tuples
# IMPORTANT: a singleton tuple must be instantiated with a comma

(), (1,), (1,2,3,4)

In [None]:
# 5.4.1.b  Composing the contents of two tuples

(1, 2, 3) + (4, 5, 6)

In [None]:
# 5.4.1.c  using the cross-object method tuple() to instantiate tuples

print( 'tuple({1:2, 3:4}) is     ', tuple({1:2, 3:4}) )
print( 'tuple([(1,2), (3,4)]) is ', tuple([(1,2), (3,4)]) )
print( 'tuple(((1,2), (3,4))) is ', tuple(((1,2), (3,4))) )

In [None]:
# 5.4.1.d  building a tuple from a generator expression (Python has no tuple comprehension syntax)

tuple( chr(code) for code in range( ord('a'), ord('z')+1 ) )

### 5.4.2 Additional tuple operations <a name='Builtin-Data-Structures-Tuple-Operations'></a>

In [None]:
# 5.4.2.a  length of a tuple

print( 'length of () is ', len(()) )
print( 'length of (1, 2, 3, 4) is ', len( (1, 2, 3, 4) ) )

## 5.5 Dicts  <a name='Builtin-Data-Structures-Dicts'></a>

A Python dict (short for "dictionary") is an associative array that pairs keys, which must be hashable objects (in practice, usually immutable ones), with other objects, referred to as values.
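The sketch below shows the core dict operations that this section's examples build on: looking up a value by key, adding or replacing a pair by assigning to a key, testing for a key with `in`, and removing a pair with `del`. Looking up a missing key raises `KeyError`. The dict `ages` and its contents are hypothetical.

```python
ages = {'alice': 31, 'bob': 27}   # hypothetical data

print(ages['alice'])              # 31: look up a value by its key
ages['carol'] = 45                # add a new (key, value) pair
ages['bob'] = 28                  # replace the value for an existing key
del ages['alice']                 # remove a pair
print('alice' in ages)            # False: 'in' tests keys, not values
print(ages)                       # {'bob': 28, 'carol': 45}

try:
  ages['dave']
except KeyError as err:
  print('missing key:', err)      # missing key: 'dave'
```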

### 5.5.1 Dict instantiation <a name='Builtin-Data-Structures-Dict-Instantiation'></a>

In [None]:
# 5.5.1.a  using curly braces to instantiate dicts

{}, {1:2, 3:4}

In [None]:
# 5.5.1.b  using the cross-object method dict() to instantiate dicts

print( 'dict({1:2, 3:4}) is     ', dict({1:2, 3:4}) )
print( 'dict([(1,2), (3,4)]) is ', dict([(1,2), (3,4)]) )
print( 'dict(((1,2), (3,4))) is ', dict(((1,2), (3,4))) )

In [None]:
# 5.5.1.c  using different, immutable keys for a dict

x = {1:1, 2.0:2, 3+3j:3, 'four':4, (5,):5, frozenset((6,)):6}
print( x[1], x[2.0], x[3+3j], x['four'], x[(5,)], x[frozenset((6,))] )

In [None]:
# 5.5.1.d  dict comprehension, mapping character codes for lower-case characters to their respective characters

{ code: chr(code) for code in range( ord('a'), ord('z')+1 ) }

### 5.5.2 Dict operations <a name='Builtin-Data-Structures-Dict-Operations'></a>

The following examples use these additional Python constructs:
-  `lambda` - [a shorthand for a nameless function that maps its arguments to a single value](./7.%20Functions.ipynb#Functions-Lambda-Expressions)
-  `set` -    returns a set built from the items of its argument
-  `&` -      set intersection operator
-  `assert` - raises an `AssertionError`, with the message given after the comma, if the expression before the comma is False

In [None]:
# 5.5.2.a  number of elements in a dict

print( 'length of {} is ', len({}) )
print( 'length of {1:2, 3:4, 5:6, 7:8} is ', len( {1:2, 3:4, 5:6, 7:8} ) )

In [None]:
# 5.5.2.b  get() returns the value for a key if the key is present, else its second argument

x = { 1:2, 3:4, 5:6 }
[ '%d = %s,' % (key, str(x.get(key,'not present'))) for key in range(7) ]

In [None]:
# 5.5.2.c  retrieving all keys as a list

[k for k in { 1:2, 3:4, 5:6 }.keys()]

In [None]:
# 5.5.2.d  retrieving all values as a list

[v for v in { 1:2, 3:4, 5:6 }.values()]

In [None]:
# 5.5.2.e  Composing the contents of two dictionaries

x = {'a':'b', 'c':'d'}
y = {1:2, 3:4, 5:6, 7:8}

common_keys = set(x.keys()) & set(y.keys())
assert not common_keys, 'can\'t compose x and y, due to duplicate keys (%r)' % common_keys
dict_to_list = lambda d: [(key,value) for (key,value) in d.items()]

print( f'merge of {x} and {y} is {dict( dict_to_list( x ) + dict_to_list( y ) )}' )
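
The example above spells out the composition step by step. As a shorter sketch, dict unpacking (`{**x, **y}`, available since Python 3.5) and, from Python 3.9 on, the `|` operator also merge two dicts. Neither checks for shared keys: when a key appears in both, the value from the right-hand dict silently wins, which is why the example above tests for common keys first.

```python
import sys

x = {'a': 'b', 'c': 'd'}
y = {1: 2, 3: 4, 5: 6, 7: 8}

merged = {**x, **y}               # dict unpacking merges x, then y
print(merged)

# For a shared key, the right-hand value silently wins
print({**{'k': 1}, **{'k': 2}})   # {'k': 2}

if sys.version_info >= (3, 9):
  print((x | y) == merged)        # True: the | operator merges dicts in 3.9+
```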

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.5.2.1:**

</span><span style='color:navy'>In the following code cell, write code that *inverts* a dict: i.e., maps v to k wherever the original maps k to v. Your code should explicitly test for, and print a descriptive error message if, the original contains either of the following:</span>
-  <span style='color:navy'>two or more items with the same value </span>
-  <span style='color:navy'>any mutable values:  i.e., any values v for which `type(v).__hash__` is `None`.</span>

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.5.2.2:**

</span><span style='color:navy'>In the following code cell, write code that merges two dicts only when every key the two dicts share is associated with the same value in both: i.e., reject merges when the two dicts associate any shared key with different values.  Illustrate your logic with two examples: one where the two dicts share a single, common (key, value) pair, and one where the dicts associate a common key with different values.</span>

In [None]:
#  5.5.2.f   using a dict to represent a (sparse) matrix

x = dict([ ((row,col), row*col) for col in range(10) for row in range(10) if row*col % 4 == 0 ])
print( 'sparse array contains ', x, '\n' )

print( 'sparse array, displayed as array: \n' )
for row in range(10):             # again, be careful of the indentation
  for col in range(10):
    try:
      print('{0:3d}'.format(x[(row,col)]), end="")
    except KeyError:               # no entry at (row, col), so print blanks
      print('   ', end="")
  print()

More specialized types of dicts are provided by the [Python library's collections module](https://docs.python.org/3/library/collections.html#module-collections). One, a [ChainMap](./17.%20ChainMaps.ipynb), creates a single, sequential view of a series of dicts.
Other types in that module, not covered in this version of the Tour, include the following:
-   Counter - dict subclass for counting hashable objects
-   OrderedDict - dict subclass that remembers the order entries were added
-   defaultdict - dict subclass that calls a factory function to supply missing values
-   UserDict - wrapper around dictionary objects for easier dict subclassing
-   UserList - wrapper around list objects for easier list subclassing
-   UserString - wrapper around string objects for easier string subclassing

## 5.6  Sets <a name='Builtin-Data-Structures-Sets'></a>

Sets, including frozensets, were added relatively late to the core Python language. Their late arrival is reflected in part by the use of curly braces, which were already used for dict literals, to instantiate sets; this is why `{}` denotes an empty dict rather than an empty set.

Like the keys of a dict, the elements of sets and frozensets must be hashable (in practice, usually immutable), because Python uses hashing to locate items in sets.
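A minimal sketch of this restriction: ints, strings, and tuples can be set elements, adding a duplicate leaves the set unchanged, and adding a list raises `TypeError`. The set name `seen` is illustrative.

```python
seen = {1, 'two', (3, 4)}     # ints, strings, and tuples are hashable
seen.add((3, 4))              # duplicate: the set is unchanged
print(len(seen))              # 3

try:
  seen.add([5, 6])            # lists are mutable and unhashable
except TypeError as err:
  print('rejected:', err)
```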

### 5.6.1 Set instantiation <a name='Builtin-Data-Structures-Set-Instantiation'></a>

In [None]:
# 5.6.1.a  using curly braces to instantiate sets
# IMPORTANT:  an empty set must be instantiated using the set constructor

print( 'Value of {1} is                     ', {1}                                   )
print( 'Value of {1,2,3,4} is               ', {1,2,3,4}                             )
print( 'Value of {1, 1, 1, 1, 1, 2, 3, 4} is', {1, 1, 1, 1, 1, 2, 3, 4} )
print( )
print( 'type of {}  is ', type({}) )
print( 'type of {1} is ', type({1}) )

In [None]:
# 5.6.1.b  Composing the contents of two sets

print( 'Value of {1, 2, 3} | {4, 5, 6} is', {1, 2, 3} | {4, 5, 6} )

In [None]:
# 5.6.1.c  set comprehension

{ chr(code) for code in range( ord('a'), ord('z')+1 ) }

In [None]:
# 5.6.1.d  using the cross-object method set() to instantiate sets

print( 'set( )                is', set( )                )
print( 'set( {1, 2, 3, 4} )   is', set( {1, 2, 3, 4} )   )
print( 'set( {1:2, 3:4} )     is', set( {1:2, 3:4} )     )
print( 'set( [(1,2), (3,4)] ) is', set( [(1,2), (3,4)] ) )
print( 'set( ((1,2), (3,4)) ) is', set( ((1,2), (3,4)) ) )

### 5.6.2 Additional set operations <a name='Builtin-Data-Structures-Set-Operations'></a>

In [None]:
# 5.6.2.a  number of elements in a set

print( 'length of set() is ', len( set() ) )
print( 'length of { 1, 2, 3, 4 } is ', len( { 1, 2, 3, 4 } ) )

In [None]:
# 5.6.2.b  set-combining operations

a = {1, 2, 3}
b = {2, 3, 4}

print( f'{a} union {b} is {a | b}' )
print( f'{a} intersection {b} is {a & b}' )
print( f'the set difference of {a} and {b} is {a - b}' )
print( f'the set difference of {b} and {a} is {b - a}' )
print( f'the symmetric difference of {a} and {b} is {a ^ b}' )

In [None]:
# 5.6.2.c  inclusion testing

a = {1, 2, 3}
b = {1, 2}

print( f"{a} is {'' if a > b else 'not '}a proper superset of {b}" )
print( f"{a} is {'' if a > a else 'not '}a proper superset of {a}" )
print( f"{b} is {'' if b > a else 'not '}a proper superset of {a}", end='\n\n' )

print( f"{a} is {'' if a >= b else 'not '}a superset of {b}" )
print( f"{a} is {'' if a >= a else 'not '}a superset of {a}" )
print( f"{b} is {'' if b >= a else 'not '}a superset of {a}", end='\n\n' )

print( f"{a} is {'' if a <= b else 'not '}a subset of {b}" )
print( f"{a} is {'' if a <= a else 'not '}a subset of {a}" )
print( f"{b} is {'' if b <= a else 'not '}a subset of {a}", end='\n\n' )

print( f"{a} is {'' if a < b else 'not '}a proper subset of {b}" )
print( f"{a} is {'' if a < a else 'not '}a proper subset of {a}" )
print( f"{b} is {'' if b < a else 'not '}a proper subset of {a}", end='\n\n' )

### 5.6.3 Additional in-place operations on sets <a name='Builtin-Data-Structures-Set-In-Place-Operations'></a>

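Methods that update a set in place, such as `add()`, `discard()`, and `clear()`, return `None`; the exception is `pop()`, which removes and returns an arbitrary element.  Augmented assignments such as `x |= y` also update a set in place.

The following example uses this additional Python construct:
-  `copy.copy()` - returns a (shallow) copy of an object, instead of a reference to the original object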
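The reason the 5.6.3 example uses `copy.copy()` is that plain assignment does not copy a set; it binds a second name to the same object, so an in-place update through either name is visible through both. The sketch below contrasts the two; the names `alias` and `snapshot` are illustrative, and `x.copy()` would be an equivalent way to take the shallow copy.

```python
import copy

x = {1, 2, 3}
alias = x                # a second name for the SAME set
snapshot = copy.copy(x)  # a new set with the same elements

x |= {4}                 # in-place update
print(alias)             # {1, 2, 3, 4}: alias sees the change
print(snapshot)          # {1, 2, 3}: the copy does not
print(alias is x, snapshot is x)  # True False
```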

In [None]:
# 5.6.3 in-place set operations

import copy

x = {1, 2, 3, 4}
x_before = copy.copy(x)
y = {5, 6, 7}
x |= y
print( f'{x_before} | {y} is {x}' )

x_before = copy.copy(x)
y = {1, 2, 3, 4, 5}
x &= y
print( f'{x_before} & {y} is {x}' )

x_before = copy.copy(x)
y = {4, 5}
x -= {4, 5}
print( f'{x_before} - {y} is {x}' )

x_before = copy.copy(x)
y = {0, 2, 3, 4, 5}
x ^= {0, 2, 3, 4, 5}
print( f'{x_before} ^ {y} is {x}', end='\n\n' )

print( 'clearing x\'s contents' )
while len(x) > 0: print(x.pop())

print( )
print( 'x\'s final value is ', x)

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.6.3.1:**

</span><span style='color:navy'>In the following code cell, revise the re.findall example in [the regular expressions examples](#Builtin-Data-Structures-Strings-Regular-Expressions) so that it eliminates duplicates. As part of this exercise, when checking for duplicates, treat words that differ only in how they're capitalized as the same. Do, however, retain the original casing for at least one of the variants: e.g., be sure that running your code on text that includes just 'FOOL' and 'Fool' returns one or the other, not 'fool'.</span>

## 5.7  Frozensets <a name='Builtin-Data-Structures-Frozensets'></a>

Frozensets are immutable analogues of sets.  They support all of the same operations as sets except the update operations. Python has no literal syntax for frozensets; every frozenset must be created with the `frozenset()` constructor.
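Because frozensets are immutable, they are hashable, so unlike sets they can serve as dict keys or as elements of other sets (as in example 5.5.1.c). The sketch below illustrates this; the names `frozen`, `groups`, and `nested` are illustrative.

```python
frozen = frozenset({1, 2})
groups = {frozen: 'pair', frozenset(): 'empty'}  # frozensets as dict keys
print(groups[frozenset({2, 1})])                 # pair: element order doesn't matter

nested = {frozen, frozenset({3})}                # frozensets inside a set
print(len(nested))                               # 2

try:
  {{1, 2}: 'pair'}                               # a plain set can't be a key
except TypeError as err:
  print('rejected:', err)
```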

### 5.7.1 Frozenset instantiation <a name='Builtin-Data-Structures-Frozenset-Instantiation'></a>

In [None]:
# 5.7.1.a  using the cross-object method frozenset() to instantiate frozensets

print( 'frozenset( )                is', frozenset( )                )
print( 'frozenset( {1, 2, 3, 4} )   is', frozenset( {1, 2, 3, 4} )   )
print( 'frozenset( {1:2, 3:4} )     is', frozenset( {1:2, 3:4} )     )
print( 'frozenset( [(1,2), (3,4)] ) is', frozenset( [(1,2), (3,4)] ) )
print( 'frozenset( ((1,2), (3,4)) ) is', frozenset( ((1,2), (3,4)) ) )

In [None]:
# 5.7.1.b  Composing the contents of two frozensets

print( 'Value of frozenset({1, 2, 3}) | frozenset({4, 5, 6}) is', frozenset({1, 2, 3}) | frozenset({4, 5, 6}) )

In [None]:
# 5.7.1.c  building a frozenset from a set comprehension (Python has no frozenset comprehension syntax)

frozenset( { chr(code) for code in range( ord('a'), ord('z')+1 ) } )

### 5.7.2 Additional frozenset operations <a name='Builtin-Data-Structures-Frozenset-Operations'></a>

In [None]:
# 5.7.2.a  number of elements in a frozenset

print( 'length of frozenset() is ', len( frozenset() ) )
print( 'length of frozenset({ 1, 2, 3, 4 }) is ', len( frozenset({ 1, 2, 3, 4 }) ) )

In [None]:
# 5.7.2.b  frozenset-combining operations

a = frozenset( {1, 2, 3} )
b = frozenset( {2, 3, 4} )

print( f'{a} union {b} is {a | b}' )
print( f'{a} intersection {b} is {a & b}' )
print( f'the difference of {a} and {b} is {a - b}' )
print( f'the difference of {b} and {a} is {b - a}' )
print( f'the symmetric difference of {a} and {b} is {a ^ b}' )

In [None]:
# 5.7.2.c  inclusion testing

a = frozenset( {1, 2, 3} )
b = frozenset( {1, 2} )

print( f"{a} is {'' if a > b else 'not '}a proper superset of {b}" )
print( f"{a} is {'' if a > a else 'not '}a proper superset of {a}" )
print( f"{b} is {'' if b > a else 'not '}a proper superset of {a}", end='\n\n' )

print( f"{a} is {'' if a >= b else 'not '}a superset of {b}" )
print( f"{a} is {'' if a >= a else 'not '}a superset of {a}" )
print( f"{b} is {'' if b >= a else 'not '}a superset of {a}", end='\n\n' )

print( f"{a} is {'' if a <= b else 'not '}a subset of {b}" )
print( f"{a} is {'' if a <= a else 'not '}a subset of {a}" )
print( f"{b} is {'' if b <= a else 'not '}a subset of {a}", end='\n\n' )

print( f"{a} is {'' if a < b else 'not '}a proper subset of {b}" )
print( f"{a} is {'' if a < a else 'not '}a proper subset of {a}" )
print( f"{b} is {'' if b < a else 'not '}a proper subset of {a}" )

<span style='color:blue'>&#128073;&ensp;&ensp;**Exercise 5.7.2.1:**

</span><span style='color:navy'>In the following markdown cell, </span>
- <span style='color:navy'>Explain the difference between *set* and *frozenset*</span>
- <span style='color:navy'>List functions that are supported by *set* but not *frozenset*</span>
- <span style='color:navy'>List functions that are supported by *frozenset* but not *set*</span>
***


***
