# File navigation and data handling

One of the most common reasons people learn Python is how powerful and convenient it is for parsing data. It has a collection of very powerful third-party packages that make data processing a breeze. Third-party packages are simply collections of useful code that other people have written, so that we don't have to. Today, we will handle the following topics:
- file navigation and IO
- pandas (see [the next notebook](./Lecture_2.ipynb))
- seaborn (visualization, see [the next notebook](./Lecture_2.ipynb))

The learning curve can be a bit steep, especially compared to how intuitive e.g. Excel is. However, once you get over it, the possibilities are endless.

To process your data, you will first need to know how to navigate your filesystem and read files in Python.

## File navigation

Reading in your data requires that you're able to locate it. Each file and folder on your computer has a unique path, that looks something like: `"C:\Users\username\Documents\my_project_final_REALLYfinal_v3.docx"`

Here, the `\` is the "separator", indicating that you're moving a level deeper into the filesystem. Note that Linux and macOS use `/` as a separator instead.


Navigating your files can be done with the `os` package, which is built into Python. You don't have to install this one yourself; you can just import it:

In [1]:
import os

What can `os` do?

### List files in a directory

In [2]:
# List all files in a directory
os.listdir('.')  # '.' means "the current directory, wherever that may be"

['Lecture_1.ipynb']

In [3]:
os.listdir('..')  # '..' means "the directory above here, wherever that may be"

['Day_3', 'README.md', 'Day_1', '.git']

Here, `.` means "the current directory", and `..` means "one directory above the current directory". This is relative to the directory from which this code is being executed. These relative markers are very useful to avoid having to type out the full path of some file every time. This way, we can e.g. print out the files in a sibling directory, without having to know the full path to it:

In [18]:
import platform

operating_system = platform.system()  # unlike os.uname(), this also works on Windows
separator = os.sep
print("I am using a", operating_system, "operating system, which uses the separator: ", separator)

I am using a Linux operating system, which uses the separator:  /


In [8]:
sibling_directory = '..' + separator + 'Day_1'
print(sibling_directory)
os.listdir(sibling_directory)

../Day_1


['Lecture_2.ipynb', 'Lecture_1.ipynb']

Creating new directories can be done with:

In [11]:
os.makedirs('data', exist_ok=True)  # exist_ok=True: no error if 'data' already exists

In [12]:
# To verify the directory we created is actually there:
os.listdir('.')

['Lecture_1.ipynb', 'data']
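
`os.makedirs` can also create several nested directories in one call. By default it raises a `FileExistsError` when the target is already there, which is easy to run into when re-running a notebook cell. The sketch below illustrates both behaviours. It works inside a temporary directory, and the folder names are made up for illustration:

```python
import os
import tempfile

# Work inside a throwaway directory so nothing in your project is touched.
# The folder names below are hypothetical.
with tempfile.TemporaryDirectory() as scratch:
    nested = os.path.join(scratch, 'data', 'raw', 'session_1')

    # makedirs creates every missing directory along the path in one call
    os.makedirs(nested)
    created = os.path.isdir(nested)

    # Calling it again on an existing directory raises FileExistsError...
    try:
        os.makedirs(nested)
        raised = False
    except FileExistsError:
        raised = True

    # ...unless exist_ok=True is passed, which makes the call safe to re-run
    os.makedirs(nested, exist_ok=True)

print("Created:", created)
print("Second call raised FileExistsError:", raised)
```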

### The `path` subpackage

Handling paths can be done with the `path` subpackage, contained in `os`.

In [16]:
# What is the full path of the current directory?
current_directory = os.path.abspath('.')
print(current_directory)

/gpfs/soma_fs/scratch/meulemeester/project_src/python_for_neuroscience/Day_3


In [13]:
os.path.abspath('..')  # Get the absolute path of the parent directory

'/gpfs/soma_fs/scratch/meulemeester/project_src/python_for_neuroscience'

You can easily construct pathnames using `os.path.join`. This will automatically use the correct separator (`\` for Windows, `/` for everything else).

In [19]:
data_directory = os.path.join(current_directory, 'data')
print(data_directory)

/gpfs/soma_fs/scratch/meulemeester/project_src/python_for_neuroscience/Day_3/data
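
The `path` subpackage can also take a path apart again. Here is a small sketch, assuming a made-up relative path. The folder and file names are hypothetical and do not need to exist, since these functions only manipulate the path as text:

```python
import os

# Build a path from its parts (hypothetical names, nothing is created on disk)
example_path = os.path.join('data', 'mouse_01', 'recording.csv')

# Split it back up into useful pieces
folder = os.path.dirname(example_path)          # everything before the last separator
file_name = os.path.basename(example_path)      # only the last part of the path
stem, extension = os.path.splitext(file_name)   # name without extension, and the extension

print("Folder:   ", folder)
print("File name:", file_name)
print("Stem:     ", stem)
print("Extension:", extension)
```

Because the path is built with `os.path.join`, the folder part is printed with whichever separator your operating system uses.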


If you are constructing paths yourself, it is always a good idea to check that they really exist, just in case you made a typo or want to access a folder that has not been created yet.

In [21]:
os.path.exists(data_directory)

True
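
`os.path.exists` does not tell you whether a path is a file or a directory. If that distinction matters, `os.path.isdir` and `os.path.isfile` can help. A minimal sketch, using a temporary directory and a deliberately misspelled (hypothetical) folder name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    misspelled = os.path.join(scratch, 'dtaa')  # deliberate typo, never created

    checks = {
        'directory exists': os.path.exists(scratch),
        'directory is a directory': os.path.isdir(scratch),
        'directory is a file': os.path.isfile(scratch),
        'misspelled path exists': os.path.exists(misspelled),
    }

for description, result in checks.items():
    print(description, "->", result)
```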

## Writing and reading data using Python: file IO

Writing data to a file consists of four steps:
1. Choosing a file location
2. Opening/creating the file
3. Writing out the data to the open file
4. Closing the file
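
The four steps above map onto code roughly as follows. This is a sketch that spells out every step explicitly. The temporary directory and the file name `four_steps.txt` are assumptions for illustration:

```python
import os
import shutil
import tempfile

scratch = tempfile.mkdtemp()  # throwaway directory for this example

# 1. Choose a file location
file_location = os.path.join(scratch, 'four_steps.txt')

# 2. Open (and thereby create) the file in write mode
f = open(file_location, 'w')

# 3. Write out the data to the open file
f.write('Some data\n')

# 4. Close the file
f.close()

# Read the file back to verify that the data arrived
f = open(file_location, 'r')
written = f.read()
f.close()
print(written)

shutil.rmtree(scratch)  # clean up the throwaway directory
```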

Writing this out in a 4-step plan seems a bit silly, doesn't it? But there is actually a lot that can go wrong in the code if you're not careful. Here are some common things to look out for:



> ⚠️ Common mistakes when doing file IO:
> 
> 1. Forgetting to close the file after you have written out data. Python buffers what you write, so some or all of your data may not actually end up in the file until it is closed.
>
> 2. Opening the file in a wrong mode.
>
> 3. Choosing a file location that doesn't exist (due to a typo, or because the parent directory doesn't exist yet)
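
To make mistakes 2 and 3 concrete, the sketch below triggers both on purpose and catches the resulting errors. It runs inside a temporary directory, and the names `notes.txt` and `no_such_folder` are hypothetical:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    existing_file = os.path.join(scratch, 'notes.txt')
    with open(existing_file, 'w') as f:
        f.write('first version')

    # Mistake 2: the file is opened for reading ('r'), but we try to write to it
    try:
        with open(existing_file, 'r') as f:
            f.write('this will fail')
        wrong_mode_error = None
    except io.UnsupportedOperation as error:
        wrong_mode_error = error

    # Mistake 3: the parent directory of the file does not exist
    bad_location = os.path.join(scratch, 'no_such_folder', 'notes.txt')
    try:
        with open(bad_location, 'w') as f:
            f.write('never written')
        missing_directory_error = None
    except FileNotFoundError as error:
        missing_directory_error = error

print("Wrong mode:       ", type(wrong_mode_error).__name__)
print("Missing directory:", type(missing_directory_error).__name__)
```

Note that opening a file in `w` mode only creates the file itself, never the folders leading up to it.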

### Writing out data

To write out data, you open a file in `w` mode. `w` simply stands for "write". If the file does not exist yet, it is created; if it already exists, its previous content is overwritten.

You give this opened file a name in your Python code, so you can perform operations on it. Note that this name is the name of a Python variable, and has nothing to do with the name of the actual file.

In [29]:
my_file_name = os.path.join(data_directory, 'my_file.txt')
print("My file is: ", my_file_name)

My file is:  /gpfs/soma_fs/scratch/meulemeester/project_src/python_for_neuroscience/Day_3/data/my_file.txt


At this point in time, this file does not exist yet:

In [28]:
os.path.exists(my_file_name)

False

We can create it and write data to it in one go:

In [30]:
with open(my_file_name, 'w') as my_file_as_python_variable:
    my_file_as_python_variable.write('Hi mom, I\'m learning Python!')

Now, the file does exist:

In [31]:
os.path.exists(my_file_name)

True

And it contains data too!

In [25]:
with open(my_file_name, 'r') as my_file:
    content = my_file.read()
print(content)

Hi mom, I'm learning Python!


It is recommended to read and write files as we did above, i.e. using the following syntax:
```python
with open(some_file) as f:
    # do something with f...
```
Using this `with ... as` syntax, Python will automatically close the file as soon as the code block has finished executing, so you don't have to remember to run something like `f.close()`, which many people (like me) often forget.
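
You can check this behaviour yourself through the `closed` attribute of a file object. The sketch below also shows a roughly equivalent version without `with`, to illustrate what you would otherwise have to write by hand. The temporary directory and the file name `demo.txt` are assumptions for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    demo_path = os.path.join(scratch, 'demo.txt')

    with open(demo_path, 'w') as f:
        f.write('written inside the with block')
        open_inside = not f.closed   # the file is still open here
    closed_after = f.closed          # the file was closed automatically

    # Roughly the same thing without `with`: close the file ourselves,
    # even if something goes wrong while reading
    g = open(demo_path, 'r')
    try:
        content = g.read()
    finally:
        g.close()

print("Open inside the block:", open_inside)
print("Closed after the block:", closed_after)
print("Content:", content)
```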

## Exercises

1. Create a new directory in the current folder, called "data2".

2. Create a file in this directory, containing the title of the song you're currently obsessed with.

3. Create a second file in this directory, containing all numbers between 1 and 100. I would not recommend doing this manually - try a `for`-loop! See [the exercises of the previous day](../Day_1/Lecture_1.ipynb) for more information on `for` loops.


4. Open the same file as before, and now add the name of your favorite TV show, without losing the numbers 1-100. You will have to open the file in a new mode for this: `a` for "append".