<img src="img/insight.svg" style="width: 300px"><br>
<font color='#544640'>
<center><i>Engineering Summit 2019</i></center>
<center><i>Denver, Colorado</i></center></font><br>
<center><i><font color='#544640' size='1'>Author: Victor Aranda</font></i></center>
<center><i><font color='#B81590' size='1'>victor.aranda@insight.com</font></i></center>
<hr>

# <font color="#D21087">Working with Files</font>

<font color='#544640'>In this notebook we'll work through some simple file manipulation: opening files, reading data, and writing it back to disk.</font>

<br><br><font color="#B81590">$$\large-\infty-$$</font><br><br>

In [1]:
# environment setup
import os
import pandas as pd

<br><br><font color="#B81590">$$\large-\infty-$$</font><br><br>

### <font color="#D21087">Basic Directory Stuff</font>

<font color='#544640'>Python can tell you about the current working directory, or any other directory, and the files therein. This is pretty handy. For example, you can iterate through the files and perform actions depending on the file type, the file name, etc. There are a lot of options here - use your imagination.</font>

In [2]:
os.listdir('./samplefiles/')

['.DS_Store', 'example.xlsx', 'text.txt']
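
<font color='#544640'>As one illustration of iterating through a directory and acting on the file type, here is a minimal sketch that groups file names by extension. The helper name `group_by_extension` and the throwaway directory are made up for this example; they are not part of the notebook's sample files.</font>

```python
import os
import tempfile
from collections import defaultdict

def group_by_extension(directory):
    """Map each file extension to the list of file names that use it."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        # skip hidden files such as .DS_Store, and anything that is not a regular file
        if name.startswith('.') or not os.path.isfile(full_path):
            continue
        extension = os.path.splitext(name)[1].lower()
        groups[extension].append(name)
    return dict(groups)

# build a throwaway directory so the example does not depend on ./samplefiles/
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ['.DS_Store', 'example.xlsx', 'text.txt', 'notes.txt']:
        open(os.path.join(demo_dir, name), 'w').close()
    grouped = group_by_extension(demo_dir)

print(grouped)
```

<font color='#544640'>From here you could, for instance, send every `.xlsx` name to one function and every `.txt` name to another.</font>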

### <font color="#D21087">Text Files</font>

<font color='#544640'>We'll open two files: an Excel (.xlsx) file and a text file.

For the text file we'll use the built-in `open()` function. It takes a filename and a mode flag:

* `'r'` read only
* `'r+'` read and write
* `'rb'` read only (binary)
* `'rb+'` read and write (binary)
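* `'w'` write only (creates the file, or empties it if it already exists)
* `'a'` append (creates the file if it doesn't exist)

Note: the `+` opens the file for both reading and writing. It does not by itself create a missing file: `'r+'` fails if the file doesn't exist, while `'w'`, `'a'`, `'w+'`, and `'a+'` create it.

Regarding the next cell: The `%%` symbol below is not Python - it's a [Jupyter magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html). This cell sets up our test file.</font>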

In [None]:
%%writefile './samplefiles/text.txt'
hello!
line two!
line three!
line four!
when is lunch??
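
<font color='#544640'>The mode flags listed above can be checked on a throwaway file. This is only a sketch: the file name `modes.txt` and the temporary directory are made up for the demonstration and are separate from our test file.</font>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    path = os.path.join(demo_dir, 'modes.txt')

    # 'r+' does not create a missing file
    try:
        open(path, 'r+')
        r_plus_created_file = True
    except FileNotFoundError:
        r_plus_created_file = False

    # 'w' creates the file
    with open(path, 'w') as f:
        f.write('first\n')

    # 'a' keeps the existing contents and writes at the end
    with open(path, 'a') as f:
        f.write('second\n')

    # 'r+' reads and writes an existing file
    with open(path, 'r+') as f:
        before = f.read()        # the cursor is now at the end
        f.write('third\n')
    with open(path, 'r') as f:
        after_r_plus = f.read()

    # 'w' on an existing file discards the old contents
    with open(path, 'w') as f:
        f.write('fresh\n')
    with open(path, 'r') as f:
        after_truncate = f.read()

print(r_plus_created_file, repr(after_r_plus), repr(after_truncate))
```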

In [None]:
# opening a text file with read-only privs

text_file1 = open('./samplefiles/text.txt', 'r') # r = read only

print('contents:\n')
for line in text_file1.readlines():
    print('   ',line)
    
if text_file1.closed:
    print('\ntext_file1 is closed')
else:
    print('\ntext_file1 is still open')

text_file1.close()

<font color='#544640'>There's a better way to open files. If you open files using the above method and forget to `close()` them, the file stays open until Python gets around to cleaning it up, which may take some time. It is easy to forget to `close()` a file when you are done. A `with` block closes the file for you as soon as the block ends, even if an error occurs inside it.</font>

In [None]:
# opening a text file with read-only privs
# the fancy way

with open('./samplefiles/text.txt', 'r') as text_file2:
    print('contents:\n')
    for line in text_file2.readlines():
        print('   ',line)

# file is automatically closed when the block ends

if text_file2.closed:
    print('\ntext_file2 is closed')
else:
    print('\ntext_file2 is still open')
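
<font color='#544640'>To see why the `with` form is safer, here is a small sketch in which the body of the block raises an error. The file `demo.txt` and the deliberately raised `ValueError` are invented for the example. The second half shows roughly the `try`/`finally` code that `with` saves us from writing.</font>

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w') as f:
    f.write('hello!\n')

# the with block closes the file even when the body raises
try:
    with open(demo_path, 'r') as handle:
        first_line = handle.readline()
        raise ValueError('something went wrong while processing')
except ValueError:
    pass

print(handle.closed)

# roughly what the with statement does for us
manual = open(demo_path, 'r')
try:
    manual_line = manual.readline()
finally:
    manual.close()
```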

<font color='#544640'>Let's append a few lines to this file.</font>

In [None]:
text_file3 = open('./samplefiles/text.txt', 'a+') # a+ = append and read

for i in range(6,10):
    text_file3.write('line number ' + str(i) + '\n')

# the file object uses a 'cursor' to keep track of its position
# go back to the beginning of the file before reading
text_file3.seek(0)

for line in text_file3.readlines():
    print(line)

# don't forget to close the file when done
text_file3.close()
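
<font color='#544640'>The cursor behaviour is easier to see in isolation. In this sketch (the file `cursor.txt` is made up for the example) we read once right after writing, when the cursor is at the end of the file, and once more after `seek(0)`.</font>

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'cursor.txt')

with open(demo_path, 'a+') as f:
    f.write('line one\n')
    f.write('line two\n')

    # after writing, the cursor sits at the end, so there is nothing left to read
    read_at_end = f.read()

    # move the cursor back to the start and read again
    f.seek(0)
    read_from_start = f.read()

print(repr(read_at_end))
print(repr(read_from_start))
```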

<font color='#544640'>Let's modify the fifth line.</font>

In [None]:
with open('./samplefiles/text.txt', 'r') as file:
    file_data = file.readlines()

# access the 5th line of file_data
file_data[4] = '(this is line number 5)\n'

# and write everything back ('w' empties the file before writing)
with open('./samplefiles/text.txt', 'w') as text_file4:
    text_file4.writelines(file_data)

print('updated contents:\n')
with open('./samplefiles/text.txt', 'r') as text_file4:
    for line in text_file4.readlines():
        print('   ',line)   
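
<font color='#544640'>The read, modify, write-back pattern can be wrapped in a small function. This is a sketch only: `replace_line` is a hypothetical helper, not a standard library function, and it reads the whole file into memory, so it suits small files.</font>

```python
import os
import tempfile

def replace_line(path, line_number, new_text):
    """Replace one line (counted from 1) of a text file, leaving the others untouched."""
    with open(path, 'r') as f:
        lines = f.readlines()

    if not 1 <= line_number <= len(lines):
        raise IndexError(f'{path} has {len(lines)} lines, cannot replace line {line_number}')

    # make sure the replacement ends with exactly one newline
    lines[line_number - 1] = new_text.rstrip('\n') + '\n'

    with open(path, 'w') as f:
        f.writelines(lines)

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w') as f:
    f.write('hello!\nline two!\nline three!\n')

replace_line(demo_path, 2, '(this is line number 2)')

with open(demo_path, 'r') as f:
    result = f.read()
print(result)
```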

<br><br><font color="#B81590">$$\large-\infty-$$</font><br><br>

### <font color="#D21087">Excel Files</font>

<font color='#544640'>We'll now open an Excel (.xlsx) file containing some information about an environment in two different sheets (tabs).

We're going to use `pandas`, a very powerful data science and data manipulation library that can handle large amounts of multidimensional "panel data" (hence the name) efficiently. It is generally used for data science and computing applications.

It's also convenient for accessing and handling tabular data. There are *many* libraries that can handle Excel files; `pandas` is only one. `pandas` comes with some caveats (and limitations) that we won't go into here, related to its original intended use; i.e. it is definitely not just an 'Excel reader' library!

Side note: the main `pandas` object is the `DataFrame`, which is very similar to R's native data frame.

For our purposes, here is the relevant doc:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html</font>

In [None]:
data = pd.read_excel('./samplefiles/example.xlsx', sheet_name = None)

<font color='#544640'>That was easy. Because we passed `sheet_name=None`, `pd.read_excel` reads every sheet and returns a `dict` whose keys are the names of the sheets in the `xlsx` document.</font>

In [None]:
list(data.keys())

In [None]:
data['server_inventory']

In [None]:
data['network_inventory']

<font color='#544640'>We can access a particular column or row of a `pandas` dataframe using indices, as we learned with lists.</font>

In [None]:
type(data['server_inventory'])

In [None]:
# this is easier to type
df1 = data['server_inventory']

# .loc[i] returns the row whose index label is i (here the default 0, 1, 2, ...)
df1.loc[0]

In [None]:
# accessing a data frame like a dict ['colname'] returns a column
df1['IP']

<font color='#544640'>Great, how about a specific element?</font>

In [None]:
# we can refer to the row with index 3 (the fourth row) of a specified column

print(df1['IP'][3])

In [None]:
# or we can refer to the IP 'field' of the row with index 3, which is the same element

print(df1.loc[3].IP)
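
<font color='#544640'>There are a few more ways to reach a single element. The sketch below uses a small made-up dataframe as a stand-in for the `server_inventory` sheet (the values, and every column except `IP` and `Owner`, are assumptions) and compares `.loc`, `.iloc`, and `.at` with the chained form used above.</font>

```python
import pandas as pd

# hypothetical stand-in for the server_inventory sheet
servers = pd.DataFrame({
    'Hostname': ['Server001', 'Server002', 'Server003', 'Server004'],
    'IP': ['10.20.0.11', '10.20.0.12', '10.20.0.13', '10.20.0.14'],
    'Owner': ['jbozic', 'jorlandini', 'jbozic', 'ckapusta'],
})

by_label = servers.loc[3, 'IP']     # row label 3, column 'IP'
by_position = servers.iloc[3, 1]    # fourth row, second column, both counted from 0
by_scalar = servers.at[3, 'IP']     # .at is a scalar-only variant of .loc
chained = servers['IP'][3]          # select the column first, then the row label

print(by_label, by_position, by_scalar, chained)
```

<font color='#544640'>`.loc` works with labels and `.iloc` with positions. With the default integer index the two happen to agree, which is why the distinction is easy to miss.</font>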

<font color='#544640'>What if it were a gigantic Excel file that we couldn't view and read directly, and we only wanted to find `jbozic`'s servers? With `pandas` this takes a bit of explanation.

Below we create a `Series` of boolean (true/false) values, one per row, by comparing the column `Owner` to `'jbozic'`.</font>

In [None]:
df1.Owner == 'jbozic'

<font color='#544640'>We can use those true/false values as flags to select particular rows of the dataframe:</font>

In [None]:
df1[df1.Owner == 'jbozic']

In [None]:
df1[df1.Owner == 'jorlandini']
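
<font color='#544640'>Boolean flags can also be combined. Here is a minimal sketch on a made-up dataframe (the `Environment` column name and all values are assumptions, not the contents of the real sheet) showing `&`, `isin()`, and `~`.</font>

```python
import pandas as pd

# hypothetical stand-in for the server_inventory sheet
servers = pd.DataFrame({
    'Hostname': ['Server001', 'Server002', 'Server003', 'Server004'],
    'Owner': ['jbozic', 'jorlandini', 'jbozic', 'ckapusta'],
    'Environment': ['PROD', 'PROD', 'STAGING', 'STAGING'],
})

# each comparison yields a boolean Series; wrap each one in parentheses when combining
jbozic_prod = servers[(servers.Owner == 'jbozic') & (servers.Environment == 'PROD')]

# isin() tests membership in a list of values
two_owners = servers[servers.Owner.isin(['jbozic', 'ckapusta'])]

# ~ negates a mask
not_jbozic = servers[~(servers.Owner == 'jbozic')]

print(jbozic_prod)
print(two_owners)
print(not_jbozic)
```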

<font color='#544640'>Note that the filtered rows keep their original indices. Those integer indices exist because we didn't name an index column, so `pandas` created a default one for us. We could have used the Hostname as our index instead, which would make our `loc` usage a bit more clear:</font>

In [None]:
data2 = pd.read_excel('./samplefiles/example.xlsx', sheet_name = None, index_col=0)

In [None]:
data2['server_inventory']

<font color='#544640'>Now we can access rows by the server name:</font>

In [None]:
df2 = data2['server_inventory']
df2.loc['Server006']

<font color='#544640'>Or by subnet!</font>

In [None]:
data2['network_inventory'].loc['10.20.0.0']

<font color='#544640'>If we want to add a new row to the dataframe, we insert it using a new (unique) index label. Since we're using hostnames now, it looks like this:</font>

In [None]:
df2.loc['Server009'] = ['10.20.0.17','HPE ProLiant', 'ckapusta','STAGING']
df2
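
<font color='#544640'>The label really does need to be new. In this sketch on a small made-up dataframe (values are assumptions), assigning to a label that is not in the index appends a row, while assigning to an existing label overwrites that row instead.</font>

```python
import pandas as pd

# hypothetical stand-in for a hostname-indexed inventory
servers = pd.DataFrame(
    {'IP': ['10.20.0.11', '10.20.0.12'], 'Owner': ['jbozic', 'jorlandini']},
    index=pd.Index(['Server001', 'Server002'], name='Hostname'),
)

# a label that is not in the index yet: a new row is appended
servers.loc['Server003'] = ['10.20.0.13', 'ckapusta']
rows_after_insert = len(servers)

# a label that already exists: the row is overwritten, not duplicated
servers.loc['Server002'] = ['10.20.0.99', 'ckapusta']
rows_after_overwrite = len(servers)

print(rows_after_insert, rows_after_overwrite)
print(servers)
```

<font color='#544640'>A quick `'Server009' in df2.index` check before assigning is one way to avoid overwriting a row by accident.</font>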

<font color='#544640'>`pandas` can do a lot more - sort values, insert, remove, merge, append, split, etc. More than we have time for!

For now, let's figure out how to put all this back into an Excel file.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html</font>

In [None]:
# the updates we made were to the first sheet, currently stored in 'df2'
# but let's not forget about our second sheet!

# for more than one sheet we'll need an ExcelWriter object.

with pd.ExcelWriter('./samplefiles/output.xlsx') as writer:
    df2.to_excel(writer, sheet_name='server_inventory')
    data2['network_inventory'].to_excel(writer, sheet_name='network_inventory')

In [None]:
os.listdir('./samplefiles/')

In [None]:
# ignore hidden files

[f for f in os.listdir('./samplefiles/') if not f.startswith('.')]

<br><br><font color="#B81590">$$\large-\infty-$$</font><br><br>