# Python: peeking under the hood

First, let's delve into some behavior that has been demonstrated but not yet explained in depth. We want to understand the difference between:

a) the output displayed when a Python expression is evaluated without being assigned to a variable
b) the output displayed when the `print` function is called on a given value

By way of example:

#### a)

In [1]:
2 + 2

4

#### b)

In [2]:
print(2 + 2)

4


### Are they different?

The two examples above have completely different outcomes although they look the same superficially. This becomes clearer if we assign each expression to a variable.

#### a)

In [3]:
a = 2 + 2

In [4]:
a

4

#### b)

In [5]:
b = print(2 + 2)

4


In [6]:
b

In case a, the result is assigned to the label "a", and when "a" is evaluated as an expression we observe the value we stored.

In contrast, case b displays the result of the expression "2 + 2" as a side effect of calling `print`. Because the print function always returns "None", that is the value assigned to "b". Evaluating "b" therefore displays nothing, and we can't do much with the variable.
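A quick check makes this concrete. The following is a minimal sketch that uses only built-ins and reuses the names `a` and `b` from the cells above:

```python
# print() writes to standard output and returns None,
# so assigning its result captures nothing useful.
a = 2 + 2
b = print(2 + 2)   # displays 4 as a side effect

print(a)           # 4
print(b is None)   # True
```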

### Why might I care about this difference?

A simple reason to care is if we like the look of the output and want to capture it as a string to use it elsewhere. We can use the `str` function for that...

In [1]:
my_list = [float(x) for x in range(2,10)]
str(my_list)

'[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]'

That's nice. It makes some things easier, but it doesn't quite explain what is going on. For example, different ways of displaying a `Path` object give different results:

In [7]:
from pathlib import Path

In [8]:
type

type

In [9]:
test_path = Path('a_test_file.txt')

In [10]:
test_path

PosixPath('a_test_file.txt')

In [11]:
print(test_path)

a_test_file.txt


In [12]:
str(test_path)

'a_test_file.txt'

### Ok. Enough said. What's under the hood?

It turns out that when we print a variable, `print` calls the object's "hidden" `__str__` method and displays the resulting string. To capture the string we could instead call this method ourselves, which is what `str` does. The result still does not look exactly like the output of `print`: the notebook echoes the returned string with quotes and with escape sequences such as `\n` left visible, whereas `print` writes the characters themselves, without quotes. It's close enough though.

In [13]:
test_path.__str__()

'a_test_file.txt'

And finally, the `__repr__` method gives us a more formal representation, one that is often sufficient to re-create the object that we currently have:

In [14]:
test_path.__repr__()

"PosixPath('a_test_file.txt')"

There are lots of hidden methods like this that enable all Python objects to behave the way they do:

In [16]:
test_path

PosixPath('a_test_file.txt')

In [17]:
import sys

In [18]:
sys.stdout.write(repr(test_path))

PosixPath('a_test_file.txt')
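To see how `print`, `str`, and the bare-expression echo choose between these two methods, it can help to define both on a small class of our own. The `Sample` class below is hypothetical and exists only for illustration; it is not part of `pathlib`.

```python
class Sample:
    """Hypothetical class used only to illustrate __str__ vs __repr__."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Used by print() and str(): a readable form
        return self.name

    def __repr__(self):
        # Used by repr() and by the interactive echo: an unambiguous form
        return f"Sample({self.name!r})"


s = Sample('a_test_file.txt')
print(str(s))    # a_test_file.txt
print(repr(s))   # Sample('a_test_file.txt')
```

Because this `__repr__` returns valid Python source, passing it to `eval` would build an equivalent object, which is the convention that `PosixPath('a_test_file.txt')` also follows.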

In [15]:
[x for x in dir(test_path) if x.startswith('__')]

['__bytes__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__fspath__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rtruediv__',
 '__setattr__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__truediv__']
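Several names in this list correspond to operators we have already used. As a rough sketch, the `/` that joins path components is handled by `__truediv__` and `__rtruediv__`, and `==` is handled by `__eq__`. The example uses `PurePosixPath` rather than `Path` (an assumption made here so that the output is the same on any operating system):

```python
from pathlib import PurePosixPath

base = PurePosixPath('testdir')

# The / operator is implemented by __truediv__
joined_with_operator = base / 'cat'
joined_with_method = base.__truediv__('cat')
print(joined_with_operator == joined_with_method)   # True

# With a plain string on the left, Python falls back to the path's __rtruediv__
reversed_join = 'parent' / base
print(reversed_join)   # parent/testdir

# Comparisons go through __eq__, __lt__, and friends
print(base.__eq__(PurePosixPath('testdir')))   # True
```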

### I'm still curious about some of the details of the print function
[Here](https://snarky.ca/why-print-became-a-function-in-python-3/) is a description that you might like.
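One practical consequence of `print` being a function is that its behavior can be adjusted with keyword arguments. The sketch below uses the built-in `sep`, `end`, and `file` parameters; the `buffer` and `captured` names are only illustrative.

```python
import io

# sep and end control what goes between and after the arguments
print('a', 'b', 'c', sep='-', end='!\n')   # a-b-c!

# file redirects the output; here to an in-memory buffer instead of sys.stdout
buffer = io.StringIO()
print(2 + 2, file=buffer)
captured = buffer.getvalue()
print(repr(captured))   # '4\n'
```

Capturing the output this way shows that `print` converted the value with `str` and appended a newline before writing it.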

# Finishing the exercise to generate a tree of data

In [26]:
from pathlib import Path
import itertools
import random
import shutil

seasons = 'spring summer autumn winter'.split()
animals = 'cat dog bat monkey elephant'.split()
test_dir = Path('testdir')
if test_dir.exists():
    shutil.rmtree(test_dir)


def generate_text():
    return ',\n'.join([str(random.random()) for x in range(random.randint(1,100))])

for animal,season in itertools.product(animals,seasons):
    this_loop_dir = test_dir / animal / season
    text_path = this_loop_dir / 'data.txt'
    text_path.parent.mkdir(parents=True)
    text_string = generate_text()
    text_path.write_text(text_string)

print(text_path.read_text())

0.5577414683537048,
0.43982426834841004,
0.897589796788902,
0.876609769270911,
0.9922582677278722,
0.6205887153925937,
0.6928911401825408,
0.48506966924095585,
0.7794101214164957,
0.43766596991908735,
0.8799010662057603,
0.7741598413730841,
0.29393259488248513,
0.40590722584568617,
0.7921578606217805,
0.5451154237853839,
0.7260056309674134,
0.829564929942111,
0.2298744666762338,
0.32276524778012183,
0.14595640287163358,
0.2852285099625921,
0.3168579796009109,
0.1961417999019941,
0.6249853326640685,
0.17306885345079803,
0.551420125428466,
0.028752360068183713,
0.5201292736843518,
0.21605074873246444


In [22]:
def generate_text():
    return ',\n'.join([str(random.random()) for x in range(random.randint(1,100))])

print(generate_text())

0.07675528334388637,
0.9253231137600759,
0.9536016491873796,
0.49635786045188157,
0.5669686113293642,
0.851614126879594,
0.24153117598988483,
0.8022641201012453,
0.051569214075244174,
0.36168866432966906,
0.8242871683621597,
0.5529553491582786,
0.5590114538301728,
0.2580153556405833,
0.5187139486975969,
0.9125330097376304,
0.8503908750424144,
0.10336459986904911,
0.3546156598554361,
0.6933389455872093,
0.4379221128527685,
0.28117478765990767,
0.43415804245461864,
0.9395392622345882,
0.3906350386795886,
0.20545847016346397,
0.04050850965202546,
0.8363224680177082,
0.5213106387874163,
0.5801105759549575,
0.2736849444628209,
0.9227714173618083,
0.46923529060749647,
0.9253101268207485,
0.7644995292671246,
0.7978489193333103,
0.6090422316590997,
0.14778301035088737


In [23]:
def generate_text():
    return ',\n'.join([str(random.random()) for x in range(random.randint(1,100))])

generate_text()

'0.7916059118026918,\n0.46496884960951124,\n0.234141643288848,\n0.09630001428655255,\n0.3228969422454372,\n0.9784501204171521,\n0.31397023519223277,\n0.417792047582602,\n0.4508095778412754,\n0.6089571534055745,\n0.33365950366463115,\n0.5982996032699263'
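The last two cells show the same kind of string in two forms: `print` renders each `\n` as a line break, while the bare-expression echo shows the `repr` of the string, with quotes and visible escape sequences. A small sketch with fixed values (chosen here so the output is predictable) shows both:

```python
text = ',\n'.join(['0.1', '0.2', '0.3'])

# print() uses str(): the newline characters are rendered as line breaks
print(text)

# The notebook echo uses repr(): quotes are added and newlines appear as \n
print(repr(text))   # '0.1,\n0.2,\n0.3'

# Either way the underlying string is the same three lines
print(len(text.splitlines()))   # 3
```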

In [71]:
def generate_test_tree(test_dir, overwrite=False):
    seasons = 'spring summer autumn winter'.split()
    animals = 'cat dog bat monkey elephant'.split()
    
    if test_dir.exists():
        shutil.rmtree(test_dir)   # remove any existing tree so mkdir does not raise FileExistsError
    for animal,season in itertools.product(animals,seasons):
        this_loop_dir = test_dir / animal / season
        text_path = this_loop_dir / 'data.txt'
        text_path.parent.mkdir(parents=True)
        text_string = generate_text()
        text_path.write_text(text_string)

test_dir = Path("test_directory_path")
generate_test_tree(test_dir)

In [86]:
# an existing tree is removed only when overwrite=True; the default is overwrite=False
def generate_test_tree(test_dir, overwrite=False):
    seasons = 'spring summer autumn winter'.split()
    animals = 'cat dog bat monkey elephant'.split()
    
    if test_dir.exists() and overwrite:
        shutil.rmtree(test_dir)   
    for animal,season in itertools.product(animals,seasons):
        this_loop_dir = test_dir / animal / season
        text_path = this_loop_dir / 'data.txt'
        text_path.parent.mkdir(parents=True)
        text_string = generate_text()
        text_path.write_text(text_string)

test_dir = Path("test_directory_path")
generate_test_tree(test_dir, overwrite=True)

In [63]:
tdir=Path("test_dir2")
generate_test_tree(tdir)

In [51]:
!pwd


/Users/miaol/fall2019/2019-10-31


In [60]:
pwd

'/Users/miaol/fall2019/2019-10-31'

In [61]:
%pwd

'/Users/miaol/fall2019/2019-10-31'

In [87]:
generate_test_tree(test_dir1,overwrite=True)

NameError: name 'test_dir1' is not defined

In [82]:
from my_funcs import generate_test_tree

In [None]:
generate_test_tree(test_dir3,overwrite=True)

### Making this better...

There are many things we can do to make this code better. Let's discuss some of them.

### Breaking down the problem. 
This breakdown should ease rather than hinder the debugging process.

### Carefully thinking about idempotence.
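An operation is idempotent if running it a second time leaves things in the same state as running it once. One possible approach, sketched below, is to pass `exist_ok=True` to `mkdir` so that a repeat call does not raise `FileExistsError`. The `write_data` helper is hypothetical, and the example writes into a temporary directory so that it does not touch the trees created above.

```python
import tempfile
from pathlib import Path


def write_data(root, animal, season, text):
    """Hypothetical helper: safe to call any number of times."""
    target_dir = Path(root) / animal / season
    # exist_ok=True stops mkdir from raising FileExistsError on a second run
    target_dir.mkdir(parents=True, exist_ok=True)
    data_path = target_dir / 'data.txt'
    data_path.write_text(text)
    return data_path


with tempfile.TemporaryDirectory() as tmp:
    first = write_data(tmp, 'cat', 'spring', '0.5')
    second = write_data(tmp, 'cat', 'spring', '0.5')   # no error on the repeat call
    same_path = first == second
    contents = second.read_text()

print(same_path, contents)   # True 0.5
```

Note that this is only idempotent because the text is passed in; with randomly generated text, a second call would still change the file contents.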

### Carefully thinking about breaking backwards compatibility as we expand the functionality of our code.
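One common way to extend a function without breaking existing callers is to add new parameters as keyword arguments whose defaults reproduce the old behavior, much as `overwrite=False` was added above. The functions below are hypothetical and only sketch the idea:

```python
def describe_tree(animals, seasons):
    """Original, hypothetical signature."""
    return [f"{animal}/{season}" for animal in animals for season in seasons]


def describe_tree_v2(animals, seasons, filename=None):
    """Extended version: the new argument is optional and defaults to the old behavior."""
    paths = [f"{animal}/{season}" for animal in animals for season in seasons]
    if filename is not None:
        paths = [f"{path}/{filename}" for path in paths]
    return paths


old_call = describe_tree(['cat'], ['spring', 'summer'])
new_call_same_args = describe_tree_v2(['cat'], ['spring', 'summer'])
print(old_call == new_call_same_args)   # True

with_filename = describe_tree_v2(['cat'], ['spring'], filename='data.txt')
print(with_filename)   # ['cat/spring/data.txt']
```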

### Saving our code in a way that is more reusable, shareable.

### Reassessing whether we solved the problem in the correct way. Would we attack the problem differently?

### The end of course project

A rough rubric, which is subject to change, is available in the [course repository](https://github.com/biof309/fall2019.git).

In [95]:
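def generate_text():
    # choose how many random numbers to generate (between 1 and 100)
    num = range(random.randint(1,100))
    numlist = [random.random() for a in num]
    # convert each number to a string and join them, one per line
    output = ",\n".join([str(x) for x in numlist])
    return output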