# Organization

Data analysis projects can quickly get out of hand, and managing them well comes with experience.

A few suggestions:

## Project Directory - Git Repository

When starting a new project, create a directory that will contain everything pertaining to that project. Initialize it as a git repository so that all changes are tracked and can be backed up to GitHub, Bitbucket, or another online hosting service.

Make use of a `.gitignore` file to prevent git from tracking large data files or any files in which credentials (passwords, cryptographic keys, etc.) or other sensitive information are stored. Example `.gitignore` files are available preconfigured to ignore accessory files such as the `.ipynb_checkpoints` directory, which holds autosave snapshots of Jupyter notebooks rather than the notebooks themselves.

For example, I set up a `.gitignore` file for this project that contains the following:

```
data/
venv/
.ipynb_checkpoints/
```

This keeps the data, the virtual environment, and the notebook checkpoint files out of the git repository.
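As a rough illustration, setting up a new project directory with this `.gitignore` could be scripted along the following lines. This is only a sketch: the function name `scaffold_project` and the choice to create a `data/` folder up front are assumptions made for the example, not part of any standard tool.

```python
from pathlib import Path

# Entries taken from the .gitignore shown above.
GITIGNORE_ENTRIES = ["data/", "venv/", ".ipynb_checkpoints/"]


def scaffold_project(root):
    """Create a project directory with a data/ folder and a .gitignore.

    The layout and the function name are illustrative only.
    """
    root = Path(root)
    # data/ holds large files that stay out of version control.
    (root / "data").mkdir(parents=True, exist_ok=True)

    gitignore = root / ".gitignore"
    # Leave an existing .gitignore untouched rather than overwriting it.
    if not gitignore.exists():
        gitignore.write_text("\n".join(GITIGNORE_ENTRIES) + "\n")
    return root
```

Calling `scaffold_project("my_project")` and then running `git init` inside the new directory would give a repository that ignores those paths from the first commit.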

## Environment Management - Python Virtual Environment

If you're working in Python, make it a habit to use a separate virtual environment for each project. A newly created virtual environment is like an isolated, clean Python install, to which you add just the packages needed for your work.

There are several ways to do this depending on whether you're using Python alone or Anaconda. The general steps if using Python on Linux:

1) Create the virtual environment (using Python 3.8, in a directory called `venv`)

`virtualenv -p python3.8 venv`

2) Activate the virtual environment

`. venv/bin/activate`

3) Install the packages you need (for example, pandas)

`pip install pandas`
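The creation step can also be done from Python itself. The sketch below swaps in the standard library's `venv` module in place of the `virtualenv` command used above. One difference to keep in mind is that `venv` builds the environment from whichever interpreter runs the script, so there is no equivalent of `-p python3.8` here. The function name `create_env` is an assumption for the example.

```python
import venv
from pathlib import Path


def create_env(env_dir="venv", with_pip=True):
    """Create a virtual environment in env_dir using the running interpreter."""
    env_dir = Path(env_dir)
    # with_pip=True bootstraps pip into the new environment so that
    # packages can be installed into it afterwards.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Activation still happens in the shell with `. venv/bin/activate`, since it changes the environment of the current shell session.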

Environment management is a recurring concern in computational work, and you'll encounter many tools that achieve similar things in different ways and for different purposes. The main idea is to let you specify which versions of tools you need for a particular task while letting them coexist on the same system with different versions of the same tools needed for a separate task. Additionally, since an environment can be boiled down to a list of tool names and versions, you can use that list to recreate the same environment elsewhere.

Using `pip`, the convention is to create a file called `requirements.txt` like so:

`pip freeze > requirements.txt`

This captures a list of packages installed in the current environment. It can be used to reconstitute the environment like so:

`pip install -r requirements.txt`

It's a good idea to add `requirements.txt` (or its equivalent) to the git repository so that it travels along with your code.
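To check whether the current environment still matches a `requirements.txt`, something like the following sketch could be used. It assumes the simple `name==version` lines that `pip freeze` typically writes and skips everything else (comments, editable installs, URL-based entries). The function names `parse_pins` and `find_mismatches` are made up for this example.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for lines of the form name==version."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, pip options such as "-e ...", and any
        # line that is not a simple name==version pin.
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (wanted, installed)} for every pin that is not met.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Reading the file with `open("requirements.txt").read()`, passing the text to `parse_pins`, and passing the result to `find_mismatches` returns an empty dictionary when every pinned package is installed at the pinned version.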

## Notebooks

Give notebooks descriptive names and use subdirectories to organize them as best you can.

## Reusable Code

Code you develop in one notebook that you want to use in other notebooks is best moved to a Python file or package. It's easier to find there, and any bug you fix is fixed in one central location instead of having to be fixed separately in each notebook.
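As a small sketch of what this might look like, suppose a helper that several notebooks need is moved into a file called `analysis_utils.py`. Both the file name and the `summarize` function are hypothetical, chosen only to illustrate the idea.

```python
# analysis_utils.py (hypothetical module name)
from statistics import mean, median


def summarize(values):
    """Return basic summary statistics for a sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("summarize() needs at least one value")
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }
```

A notebook in the same directory can then use `from analysis_utils import summarize`, and a fix made in `analysis_utils.py` reaches every notebook that imports it.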