# Chapter 1. Begin

### Activating virtual environment
```
$ cd mydir; python3 -m venv .venv 
$ source .venv/bin/activate 
(.venv) $ which python
/root_dir/.venv/bin/python
(.venv) $ deactivate
$ which python3
/usr/local/bin/python3
```
__[venv documentation](https://docs.python.org/3/library/venv.html)__

To install packages, list them in a `requirements.txt` file:
```
delorean==1.0.0
requests==2.22.0
```
then install them:
```
$ pip install -r requirements.txt
$ pip freeze   # list the installed packages to check
```
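VS Code: Ctrl+Shift+P, then "Jupyter: Select Interpreter to Run Python code" and pick the `.venv` interpreter. <br>
Windows: the activation script is `.venv\Scripts\Activate.ps1`. If PowerShell refuses to run it, allow scripts from an administrator PowerShell: `Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy Unrestricted`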

Check which interpreter the notebook kernel is using:
```python
import sys
print(sys.executable)
```
```
c:\usr\yuri\python\.venv\Scripts\python.exe
```
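
To check from code whether the interpreter runs inside a virtual environment, `sys.prefix` can be compared with `sys.base_prefix` (Python 3.3+): inside a venv the first points at the environment directory, the second at the base installation. A minimal sketch; `in_virtualenv` is a hypothetical helper name:

```python
import sys

def in_virtualenv() -> bool:
    # In a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix

print(sys.executable)
print('inside a virtual environment:', in_virtualenv())
```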


### Creating strings with formatted values

```python
"example {} here".format("STRING")                    # 'example STRING here'
"example {} here {} again".format(1.23, "STRING")     # 'example 1.23 here STRING again'
"example {1} here {0} again".format(1.23, "STRING")   # 'example STRING here 1.23 again'
"example {first} here {second} again".format(first=1, second=2)
```
```
'example 1 here 2 again'
```
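
Two related tools accept the same format syntax: `str.format_map()` takes a mapping directly instead of `**kwargs`, and the built-in `format()` applies one format spec to one value. A short sketch with made-up values:

```python
values = {'first': 1, 'second': 2}
# format_map() reads the fields from the dict
by_name = 'example {first} here {second} again'.format_map(values)
print(by_name)                  # example 1 here 2 again

# A positional index can be reused in the same template
print('{0}-{0}'.format('ab'))   # ab-ab

# format() applies a single format spec to a single value
print(format(1234.5, ',.2f'))   # 1,234.50
print(format(0.0667, '.1%'))    # 6.7%
```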

```python
# TEMPLATES
data = [
    (1000, 10),
    (200, -10),
    (300, 20),
]
print("REVENUE | PROFIT | PERCENT")  # header
TEMPLATE = '{revenue:>7,} | {profit:>+6} | {percent:>7.2%}'
# Inside each field, ':' separates the name from the format spec:
#   >7   right-align in a field 7 characters wide
#   ,    comma as thousands separator (1,000)
#   +    always show the sign, also for positive numbers
#   .2%  multiply by 100 and show as a percentage with 2 decimals
for revenue, profit in data:
    row = TEMPLATE.format(revenue=revenue, profit=profit, percent=profit / revenue)
    print(row)
```
```
REVENUE | PROFIT | PERCENT
  1,000 |    +10 |   1.00%
    200 |    -10 |  -5.00%
    300 |    +20 |   6.67%
```
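
The same report can be written with an f-string, putting the format specs inline next to the expressions. A sketch; the `report` helper is hypothetical and returns the table as one string instead of printing row by row:

```python
data = [(1000, 10), (200, -10), (300, 20)]

def report(rows):
    lines = ['REVENUE | PROFIT | PERCENT']
    for revenue, profit in rows:
        # Same specs as TEMPLATE; the percentage is computed inline
        lines.append(f'{revenue:>7,} | {profit:>+6} | {profit / revenue:>7.2%}')
    return '\n'.join(lines)

print(report(data))
```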


```python
# f-strings
param1 = 'first'
param2 = 'second'
print(f'Parameters {param1}:{param2}')
value = 'VALUE'
# Doubled braces produce literal braces; useful for meta-templates,
# i.e. templates that generate other templates
f'Curly brackets twice: {{{value}}}'
```
```
Parameters first:second
'Curly brackets twice: {VALUE}'
```
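
f-strings also allow replacement fields nested inside the format spec, and doubled braces let one formatting step generate a template that a second step fills in (the meta-template idea above). A short sketch with made-up values:

```python
value = 3.14159
width, precision = 10, 3

# Nested fields are evaluated first, so the spec becomes '>10.3f'
padded = f'[{value:>{width}.{precision}f}]'
print(padded)                     # [     3.142]

# '{{' and '}}' become literal braces: the result is itself a template
field = 'revenue'
template = f'{{{field}:>{width},}}'
print(template)                   # {revenue:>10,}
filled = template.format(revenue=1000)
print(filled)                     # '1,000' right-aligned in 10 characters
```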

__[pyformat.info](https://pyformat.info/)__  <br>
__[docs.python.org formatspec](https://docs.python.org/3/library/string.html#formatspec)__

### Manipulating strings
Goal: replace digits with X, start a new paragraph after each sentence (a word ending in a period), wrap lines at 80 characters, and keep ASCII characters only.

```python
INPUT_TEXT = ''' AFTER THE CLOSE OF THE SECOND QUARTER, OUR COMPANY, CASTAÑACORP
HAS ACHIEVED A GROWTH IN THE REVENUE OF 7.47%. THIS IS IN LINE
WITH THE OBJECTIVES FOR THE YEAR. THE MAIN DRIVER OF THE SALES HAS BEEN
THE NEW PACKAGE DESIGNED UNDER THE SUPERVISION OF OUR MARKETING DEPARTMENT.
OUR EXPENSES HAS BEEN CONTAINED, INCREASING ONLY BY 0.7%, THOUGH THE BOARD
CONSIDERS IT NEEDS TO BE FURTHER REDUCED. THE EVALUATION IS SATISFACTORY
AND THE FORECAST FOR THE NEXT QUARTER IS OPTIMISTIC. THE BOARD EXPECTS
AN INCREASE IN PROFIT OF AT LEAST 2 MILLION DOLLARS.'''
# split() with no argument splits on any run of whitespace, newlines included
words = INPUT_TEXT.split()
# Replace every digit with X
no_num = [''.join('X' if c.isdigit() else c for c in word)
          for word in words]
# str -> bytes -> str: errors='replace' turns every non-ASCII
# character (such as Ñ) into '?'
ascii_only = [word.encode('ascii', errors='replace').decode('ascii')
              for word in no_num]
# Mark the end of each sentence with a newline
newlines = [word + '\n' if word.endswith('.') else word for word in ascii_only]
LINE_SIZE = 80
lines = []
line = ''
for word in newlines:
    # Start a new line after a sentence end or when the next word would not fit
    if line.endswith('\n') or len(line) + len(word) + 1 > LINE_SIZE:
        lines.append(line)
        line = ''
    line = line + ' ' + word
lines.append(line)  # keep the last line; the loop never appends it
# Title-case each line and join them
lines = [line.title() for line in lines]
result = '\n'.join(lines)
print(result)
```
```
 After The Close Of The Second Quarter, Our Company, Casta?Acorp Has Achieved A
 Growth In The Revenue Of X.Xx%.

 This Is In Line With The Objectives For The Year.

 The Main Driver Of The Sales Has Been The New Package Designed Under The
 Supervision Of Our Marketing Department.

 Our Expenses Has Been Contained, Increasing Only By X.X%, Though The Board
 Considers It Needs To Be Further Reduced.

 The Evaluation Is Satisfactory And The Forecast For The Next Quarter Is
 Optimistic.

 The Board Expects An Increase In Profit Of At Least X Million Dollars.
```
__[encode/decode](https://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3)__<br>
__[strings](https://diveintopython3.net/strings.html)__ <br>
 __[textwrap_module](https://docs.python.org/3/library/textwrap.html)__  
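
The manual 80-character loop can be replaced by `textwrap.fill()` from the linked `textwrap` module. A sketch of the same cleanup applied to a shortened excerpt of the input; the `clean` helper is hypothetical. Unlike the loop above, it splits sentences with a regex (a period followed by whitespace) and does not add a leading space to each line:

```python
import re
import textwrap

# A shortened excerpt of INPUT_TEXT, for illustration only
TEXT = '''OUR EXPENSES HAS BEEN CONTAINED, INCREASING ONLY BY 0.7%, THOUGH THE BOARD
CONSIDERS IT NEEDS TO BE FURTHER REDUCED. THE EVALUATION IS SATISFACTORY.'''

def clean(text, width=80):
    # Replace every digit with X
    text = re.sub(r'\d', 'X', text)
    # Non-ASCII characters become '?'
    text = text.encode('ascii', errors='replace').decode('ascii')
    # Split into sentences at a period followed by whitespace (newlines included)
    sentences = re.split(r'(?<=\.)\s+', text.strip())
    # fill() turns inner newlines into spaces and wraps at `width`;
    # sentences are separated by a blank line
    return '\n\n'.join(textwrap.fill(s.title(), width=width) for s in sentences)

print(clean(TEXT))
```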

### Extracting data from structured strings
Sales log lines follow the format `[<Timestamp in iso format>] - SALE - PRODUCT: <product id> - PRICE: $<price of the sale>`

```python
import delorean
from decimal import Decimal

log = '[2018-05-05T11:07:12.267897] - SALE - PRODUCT: 1345 - PRICE: $09.99'
# Split into the four fields separated by ' - '; '_' discards the 'SALE' marker
timestamp_str, _, product_str, price_str = log.split(' - ')
# delorean.parse defaults to dayfirst=True, which can swap month and day
# in ISO dates such as 2018-05-06; dayfirst=False keeps year-month-day
timestamp = delorean.parse(timestamp_str.strip('[]'), dayfirst=False)
product_id = int(product_str.split(':')[-1])
price = Decimal(price_str.split('$')[-1])
timestamp, product_id, price
```
```
(Delorean(datetime=datetime.datetime(2018, 5, 5, 11, 7, 12, 267897), timezone='UTC'),
 1345,
 Decimal('9.99'))
```
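
For ISO 8601 timestamps like these, the standard library can stand in for delorean: `datetime.fromisoformat()` (Python 3.7+) reads the format strictly, so there is no day/month guessing. A sketch; `parse_sale` is a hypothetical helper, and unlike `delorean.parse` it returns a naive `datetime` with no UTC timezone attached:

```python
from datetime import datetime
from decimal import Decimal

def parse_sale(log):
    # Four fields separated by ' - '; the 'SALE' marker is ignored
    timestamp_str, _, product_str, price_str = log.split(' - ')
    timestamp = datetime.fromisoformat(timestamp_str.strip('[]'))
    product_id = int(product_str.split(':')[-1])   # int() ignores surrounding spaces
    price = Decimal(price_str.split('$')[-1])      # exact decimal, no float rounding
    return timestamp, product_id, price

print(parse_sale('[2018-05-06T12:58:00.714611] - SALE - PRODUCT: 1345 - PRICE: $09.99'))
```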

The same parsing wrapped in a class, with `parse` as a classmethod that acts as an alternative constructor:

```python
import delorean
from decimal import Decimal

class PriceLog:
    def __init__(self, timestamp, product_id, price):
        self.timestamp = timestamp
        self.product_id = product_id
        self.price = price

    def __repr__(self):
        return '<PriceLog({}, {}, {})>'.format(self.timestamp, self.product_id, self.price)

    @classmethod
    def parse(cls, log):  # a classmethod receives the class itself as its first argument, cls
        '''
        Parse from a text log with the format
            [<Timestamp>] - SALE - PRODUCT: <product id> - PRICE: $<price>
        to a PriceLog object
        '''
        timestamp_str, _, product_str, price_str = log.split(' - ')
        # dayfirst=False stops delorean from swapping month and day
        timestamp = delorean.parse(timestamp_str.strip('[]'), dayfirst=False)
        product_id = int(product_str.split(':')[-1])
        price = Decimal(price_str.split('$')[-1])
        return cls(timestamp=timestamp, product_id=product_id, price=price)

log = '[2018-05-05T12:58:59.998903] - SALE - PRODUCT: 897 - PRICE: $17.99'
PriceLog.parse(log)
```
```
<PriceLog(Delorean(datetime=datetime.datetime(2018, 5, 5, 12, 58, 59, 998903), timezone='UTC'), 897, 17.99)>
```
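
Since Python 3.7, `dataclasses` can generate `__init__`, `__repr__` and `__eq__` from annotated fields. A sketch of the same class written that way; it swaps delorean for the standard library's `datetime.fromisoformat()` so it runs without extra packages (an assumption, not the version above), and its repr looks like `PriceLog(timestamp=..., product_id=897, price=Decimal('17.99'))`:

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

@dataclass
class PriceLog:
    timestamp: datetime
    product_id: int
    price: Decimal

    @classmethod
    def parse(cls, log):
        # Same field splitting as before; the 'SALE' marker is discarded
        timestamp_str, _, product_str, price_str = log.split(' - ')
        return cls(
            timestamp=datetime.fromisoformat(timestamp_str.strip('[]')),
            product_id=int(product_str.split(':')[-1]),
            price=Decimal(price_str.split('$')[-1]),
        )

print(PriceLog.parse('[2018-05-05T12:58:59.998903] - SALE - PRODUCT: 897 - PRICE: $17.99'))
```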

To handle money, avoid floats: either store amounts as integer cents (multiply by 100) or use __[Decimal](https://docs.python.org/3.8/library/decimal.html)__.
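
A small illustration of why floats are avoided for money, and of the integer-cents alternative (the amounts are arbitrary examples):

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly
float_ok = 0.1 + 0.2 == 0.3                                   # False
# A Decimal built from strings keeps the exact decimal digits
decimal_ok = Decimal('0.1') + Decimal('0.2') == Decimal('0.3')  # True
# A Decimal built from a float inherits the float's error
from_float_ok = Decimal(0.1) == Decimal('0.1')                # False

# Integer cents: do the arithmetic on ints, format only for display
price_cents = int(Decimal('9.99') * 100)   # 999
total_cents = 3 * price_cents              # 2997
print(f'${total_cents // 100}.{total_cents % 100:02d}')   # $29.97
```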

### Using a third-party tool—parse
__[module parse](https://github.com/r1chardj0n3s/parse)__

Add `parse==1.14.0` to `requirements.txt` and reinstall with `$ pip install -r requirements.txt`.

```python
from decimal import Decimal
from parse import parse

LOG = '[2018-05-06T12:58:00.714611] - SALE - PRODUCT: 1345 - PRICE: $09.99'
# Without type specifiers, every field is captured as a string
FORMAT = '[{date}] - SALE - PRODUCT: {product} - PRICE: ${price}'
result = parse(FORMAT, LOG)
result['date']   # '2018-05-06T12:58:00.714611'

# Type specifiers: 'ti' = ISO 8601 date/time, 'd' = integer, '05.2f' = float
FORMAT = '[{date:ti}] - SALE - PRODUCT: {product:d} - PRICE: ${price:05.2f}'
result = parse(FORMAT, LOG)

# A custom type converts the price to Decimal, avoiding float rounding
def price(string):
    return Decimal(string)

FORMAT = '[{date:ti}] - SALE - PRODUCT: {product:d} - PRICE: ${price:price}'
result = parse(FORMAT, LOG, {'price': price})
result
```
```
<Result () {'date': datetime.datetime(2018, 5, 6, 12, 58, 0, 714611), 'product': 1345, 'price': Decimal('9.99')}>
```

```python
import delorean
import parse
from decimal import Decimal

class PriceLog:
    def __init__(self, timestamp, product_id, price):
        self.timestamp = timestamp
        self.product_id = product_id
        self.price = price

    def __repr__(self):
        return '<PriceLog({}, {}, {})>'.format(self.timestamp, self.product_id, self.price)

    @classmethod
    def parse(cls, log):
        '''
        Parse from a text log with the format
            [<Timestamp>] - SALE - PRODUCT: <product id> - PRICE: $<price>
        to a PriceLog object
        '''
        def price(string):
            return Decimal(string)

        def isodate(string):
            # delorean.parse defaults to dayfirst=True, which turned
            # 2018-05-06 into June 5; dayfirst=False keeps May 6
            return delorean.parse(string, dayfirst=False)

        FORMAT = '[{timestamp:isodate}] - SALE - PRODUCT: {product:d} - PRICE: ${price:price}'
        formats_extra = {'price': price, 'isodate': isodate}
        result = parse.parse(FORMAT, log, formats_extra)
        return cls(timestamp=result['timestamp'], product_id=result['product'], price=result['price'])

log = '[2018-05-06T14:58:59.051545] - SALE - PRODUCT: 827 - PRICE: $22.25'
PriceLog.parse(log)
```
```
<PriceLog(Delorean(datetime=datetime.datetime(2018, 5, 6, 14, 58, 59, 51545), timezone='UTC'), 827, 22.25)>
```

### Introducing regular expressions
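`re.search()` scans the string for the first place where the pattern matches and returns a `Match` object, or `None` if there is no match.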

```python
import re
re.search(r'PAT', 'STRPAT')
```
```
<re.Match object; span=(3, 6), match='PAT'>
```

```python
re.search(r'PAT', 'no match')   # no match: returns None, so nothing is displayed
```

```python
STRING = 'something in the things she shows me'
match = re.search(r'thing', STRING)
STRING[:match.start()], STRING[match.start():match.end()], STRING[match.end():]
```
```
('some', 'thing', ' in the things she shows me')
```

```python
# \b matches a word boundary, so the 'thing' inside 'something' is skipped
match = re.search(r'\bthing', STRING)
STRING[:match.start()], STRING[match.start():match.end()], STRING[match.end():]
```
```
('something in the ', 'thing', 's she shows me')
```

```python
STRING = 'my email is qw1@r2ty.com'
PAT = r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)"  # from http://emailregex.com/
re.search(PAT, STRING).group()
```
```
'qw1@r2ty.com'
```
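
`re.search()` finds the address anywhere inside a string. To check that a whole string *is* an address, `re.fullmatch()` (Python 3.4+) fits better. A sketch reusing the same pattern; `is_email` is a hypothetical helper, and the pattern is a pragmatic approximation rather than a full RFC 5322 validator:

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

def is_email(text):
    # fullmatch() succeeds only if the entire string matches the pattern
    return EMAIL.fullmatch(text) is not None

print(is_email('qw1@r2ty.com'))               # True
print(is_email('my email is qw1@r2ty.com'))   # False
# search() still finds the address inside the longer string
print(EMAIL.search('my email is qw1@r2ty.com').group())   # qw1@r2ty.com
```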

### Going deeper into regular expressions
__[Python 're' documentation](https://docs.python.org/3/library/re.html)__

```python
import re
match = re.search(r'phone number is ([\d-]+)', 'My phone number is 1-408-123-5467 blah.')
match.group()    # whole match: 'phone number is 1-408-123-5467'
match.group(1)   # first parenthesized group
```
```
'1-408-123-5467'
```

```python
match = re.search(r'phone number is ([\d-]+)', 'My phone number is 1234-567-890.')
match.group()    # same as match.group(0): the whole match
match.group(1)
```
```
'1234-567-890'
```

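```python
# compile() builds a reusable pattern; IGNORECASE lets (yes|no) match 'YES'
PAT = re.compile(r'answer to question (\w+) is (yes|no)', re.IGNORECASE)
PAT.search('and answer to question 3b is YES')            # a Match object
PAT.search('and answer to question 3b is YES').groups()   # all groups as a tuple
```
```
('3b', 'YES')
```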
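
Captured groups can also be reused in a replacement. A sketch with `Pattern.sub()`, where `\1` and `\2` in the replacement string refer to the first and second groups (the sample text is made up):

```python
import re

PAT = re.compile(r'answer to question (\w+) is (yes|no)', re.IGNORECASE)
TEXT = 'the answer to question 3b is YES and the answer to question 4 is no'
# Each match is replaced by 'Q<group 1>=<group 2>'
summary = PAT.sub(r'Q\1=\2', TEXT)
print(summary)   # the Q3b=YES and the Q4=no
```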

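```python
# +? is non-greedy: it matches as few characters as possible. With a greedy +,
# the match starting at 'Corvallis' would run on until 'Toledo.OH'.
# The unescaped . between the groups matches any separator (',', ' ' or '.')
PAT = re.compile(r'([A-Z][\w\s]+?).(TX|OR|OH|MI)')
TEXT = '''
the jackalopes are the team of Odessa,TX while the knights are native of Corvallis OR 
and the mud hens come from Toledo.OH; the whitecaps have their base in Grand Rapids,MI
'''
matches = list(PAT.finditer(TEXT))
matches[0].groups()
```
```
('Odessa', 'TX')
```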

```python
# search() returns only the first match; findall() returns every match
# as a list of tuples of the groups
PAT.findall(TEXT)
```
```
[('Odessa', 'TX'),
 ('Corvallis', 'OR'),
 ('Toledo', 'OH'),
 ('Grand Rapids', 'MI')]
```

```python
# Groups can be named with (?P<groupname>PATTERN) and read
# with .group(groupname), or all at once with .groupdict()
PAT = re.compile(r'(?P<city>[A-Z][\w\s]+?).(?P<state>TX|OR|OH|MI)')
match = PAT.search(TEXT)
match.group(1), match.group(2)   # ('Odessa', 'TX'): named groups are still numbered
match.groupdict()                # {'city': 'Odessa', 'state': 'TX'}
match.group('city')
```
```
'Odessa'
```
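
Named groups pair well with `finditer()`. As a sketch, the sales log format from earlier can be written as a regex with named groups and each match turned into a dict with typed values (the pattern `LOG_PAT` and the sample `LOGS` are illustrative assumptions, not from the original):

```python
import re
from decimal import Decimal

# A regex counterpart of the parse FORMAT used earlier
LOG_PAT = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] - SALE - PRODUCT: (?P<product>\d+) - PRICE: \$(?P<price>[\d.]+)'
)
LOGS = '''[2018-05-06T12:58:00.714611] - SALE - PRODUCT: 1345 - PRICE: $09.99
[2018-05-06T14:58:59.051545] - SALE - PRODUCT: 827 - PRICE: $22.25'''

# groupdict() gives strings; later keys in the dict literal override them
sales = [
    {**m.groupdict(), 'product': int(m['product']), 'price': Decimal(m['price'])}
    for m in LOG_PAT.finditer(LOGS)
]
print(sales)
```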

### Adding command-line arguments
__[documentation of argparse module](https://docs.python.org/3/library/argparse.html)__

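```python
import argparse

# Keeping the logic in main() makes the entry point of the script obvious
def main(c, number):
    print(c * number)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('number', type=int, help='M-factor')                # required positional argument
    parser.add_argument('-c', type=str, help='char to print', default='#')  # optional, defaults to '#'
    args = parser.parse_args()
    main(args.c, args.number)
```
Inside Jupyter this fails: `parse_args()` reads the kernel's own command line, which has no `number`, so argparse prints the usage and exits:
```
usage: ipykernel_launcher.py [-h] [-c C] number
ipykernel_launcher.py: error: the following arguments are required: number

SystemExit: 2
```
Saved as a script instead (e.g. a hypothetical `script.py`), `python script.py 5 -c @` prints `@@@@@`.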
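
`parse_args()` also accepts an explicit list of strings instead of reading `sys.argv`, which makes it possible to try the parser inside Jupyter. A minimal sketch; `build_parser` is a hypothetical helper:

```python
import argparse

def main(c, number):
    print(c * number)

def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('number', type=int, help='M-factor')
    parser.add_argument('-c', type=str, help='char to print', default='#')
    return parser

# The list plays the role of the command line: number=5, -c @
args = build_parser().parse_args(['5', '-c', '@'])
main(args.c, args.number)   # prints @@@@@
```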