# Getting Set Up

## Editor
There is a range of text editors and development environments that you can use for this work. I tend to recommend [Atom](http://atom.io/) for a few reasons. For one, it's highly customizable, which can also be a drawback: some of its more advanced features can significantly slow down the program and drain your battery, but you can turn many of these extras off in the preferences menu. Most importantly for newcomers, Atom integrates easily with the terminal, which can be more complicated to set up with other editors. To enable this, launch Atom, open the "Atom" menu, and select "Install Shell Commands." This will allow you to, say, open a text file or even an entire directory from the terminal:

```
$ atom iliad.txt
$ atom .
```

The first command opens a text file of the Iliad, while the second opens the current folder (`.`) in Atom for editing. Moving back and forth between the terminal and a text editor in this way is often a challenging concept for beginners, but Atom makes the connection easy to see.

## Filenames

If you're working with Python, odds are pretty high that you'll be working in the terminal at some point, and a few habits will make your life easier. The main one is to avoid spaces in file and folder names, because the command line uses spaces to separate the arguments you pass to a command. For example, given a folder of family photos:

```
$ ls family photos
```

vs.

```
$ ls family_photos
```

The first example will fail, because the shell treats "family" and "photos" as two separate arguments and looks for two folders that don't exist. Instead of spaces, use underscores or dashes to connect multi-word file and folder names. Capital letters are fine as far as the computer goes, and depending on your operating system, the terminal may not care if you accidentally type "Family_Photos" instead of "family_photos" (macOS is case-insensitive by default, while Linux is not). But I usually encourage students to stick with all-lowercase filenames for consistency and laziness.
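To see the splitting problem for yourself, Python's standard `shlex` module splits a command line roughly the way a POSIX shell does. The sketch below also shows that quoting keeps a spaced name together (though underscores save you from having to remember the quotes), and includes a small helper for tidying names. `tidy_filename` is a hypothetical name of my own, not a standard function.

```python
import re
import shlex

# shlex.split() breaks a command line into arguments much like a POSIX shell,
# which shows why an unquoted space turns one folder name into two arguments.
print(shlex.split("ls family photos"))    # ['ls', 'family', 'photos']
print(shlex.split("ls family_photos"))    # ['ls', 'family_photos']
print(shlex.split('ls "family photos"'))  # ['ls', 'family photos']


def tidy_filename(name: str) -> str:
    """Hypothetical helper: lower-case a name and replace each run of
    whitespace with a single underscore."""
    return re.sub(r"\s+", "_", name.strip()).lower()


print(tidy_filename("Family Photos"))  # family_photos
```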

## Linters

If you're new to this kind of work, you might be coming to it after taking a programming course, working through a book or two on your own, or doing exercises in an online course. That's great! Odds are pretty good that you've also been making the work unnecessarily difficult for yourself.

When you were first learning to program, you were probably writing code with little help. In other words, you only found out that there was a problem with your code after you ran it and something broke. There is good reason for this workflow: it forces you to learn basic syntax and memorize fundamental concepts before you start relying on other aids. But in practice, many programmers use a linter, a piece of software that flags style guide violations and outright errors as you write. These powerful little tools go a long way toward enforcing good programming practices, and getting one set up on your system can make your life a lot easier. Part of the reason I suggest Atom is that linters for a variety of languages are easy to install as Atom packages.
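To make the idea concrete, here is a toy sketch of what a linter does, using only Python's standard library. The function name `lint_source` is hypothetical, and it checks just three things: code that doesn't parse, lines longer than the 79 characters PEP 8 recommends, and names that are imported but never used. Real linters such as flake8 or pylint check far more and handle many edge cases this sketch ignores.

```python
import ast

MAX_LINE_LENGTH = 79  # the maximum line length PEP 8 recommends


def lint_source(source: str) -> list:
    """Toy linter: return a list of warning strings for Python source code."""
    # 1. Outright errors: code that cannot be parsed gets no further checks.
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"line {err.lineno}: syntax error: {err.msg}"]

    warnings = []

    # 2. Style: flag lines longer than the PEP 8 limit.
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            warnings.append(
                f"line {lineno}: line too long "
                f"({len(line)} > {MAX_LINE_LENGTH})"
            )

    # 3. Likely mistakes: names that are imported but never referenced.
    imported = {}  # name -> line number of the import statement
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports can't be checked this simply
                # "import os.path" binds the top-level name "os"
                name = (alias.asname or alias.name).split(".")[0]
                imported[name] = node.lineno
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    for name, lineno in sorted(imported.items(), key=lambda item: item[1]):
        if name not in used:
            warnings.append(f"line {lineno}: '{name}' imported but unused")

    return warnings


sample = "import os\nimport sys\nprint(sys.argv)\n"
for warning in lint_source(sample):
    print(warning)  # line 1: 'os' imported but unused
```

An editor plugin runs checks like these every time you save, so problems show up as you type rather than after something breaks.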

## Ignoring files in Version Control

Public GitHub repositories can be viewed by anyone: your files, your project history, and the conversations about them that are stored on GitHub. To keep unwanted files out of your project history, you'll want to use a .gitignore file. This file can list filenames, folder names, and wildcard patterns (such as `*.log`) that tell git which untracked files to ignore when you add and commit changes. Here we make a new directory, 'my-files', and change into it. Once inside, we initialize a git repository and use the touch command to create an empty .gitignore file. (These examples are cells from a Jupyter notebook: each `%%bash` cell runs in a fresh shell that starts in the notebook's folder, which is why later cells begin with `cd my-files`.)

```
%%bash
mkdir my-files
cd my-files
git init
touch .gitignore
```

```
Initialized empty Git repository in /Users/Brandon/projects/python-cookbook/my-files/.git/
```


Note that .gitignore is a dotfile (its name begins with a period), so by default it is hidden from the ls command and from most graphical file browsers.

```
%%bash
cd my-files
ls
```

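Running ls prints nothing, since everything in the folder (.git and .gitignore) is hidden. We can check that .gitignore is there with a modified ls command; the -a flag lists all files, including hidden ones: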

```
%%bash
cd my-files
ls -a
```

```
.
..
.git
.gitignore
```
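Under the hood, "hidden" is only a naming convention on Unix-like systems such as macOS and Linux: any file whose name starts with a period is left out of a plain listing. The sketch below mimics that behavior in Python with pathlib. `list_dir` is a hypothetical helper, not a standard function, and unlike `ls -a` it never shows the `.` and `..` entries, because `Path.iterdir()` does not return them.

```python
from pathlib import Path


def list_dir(folder: str, show_hidden: bool = False) -> list:
    """Hypothetical helper: roughly mimic `ls` (show_hidden=False)
    and `ls -a` (show_hidden=True), minus the `.` and `..` entries."""
    names = sorted(p.name for p in Path(folder).iterdir())
    if show_hidden:
        return names
    # A "hidden" file is simply one whose name starts with a period.
    return [name for name in names if not name.startswith(".")]
```

Run from the folder that contains my-files, `list_dir("my-files")` should come back empty, while `list_dir("my-files", show_hidden=True)` should return `['.git', '.gitignore']`.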


The .gitignore file can be edited with any text editor: open it like you would any other file and add the names of the files or folders that you want git to ignore, each on its own line. You can also add to the file from the command line. Let's add several entries to our .gitignore file this way.

```
%%bash
cd my-files
echo 'test-file-1.txt' > .gitignore
echo 'test-file-2.txt' >> .gitignore
echo 'my-files/' >> .gitignore
```

```
%%bash
cd my-files
cat .gitignore
```

```
test-file-1.txt
test-file-2.txt
my-files/
```


The cat command prints out the contents of a file, so here we're using it to confirm that we successfully added a list of files and folders to our .gitignore. (Note that the first echo used `>`, which overwrites the file, while the others used `>>`, which appends to it.) We could also open the file in our text editor to confirm!
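If you'd rather manage .gitignore from a Python script, a small helper can do the same job as the `>>` lines while skipping entries that are already listed. `add_to_gitignore` is a hypothetical name, and this sketch only compares exact lines: it does not understand git's pattern syntax, so `*.txt` and `test-file-1.txt` are treated as unrelated entries.

```python
from pathlib import Path


def add_to_gitignore(repo_dir: str, patterns: list) -> list:
    """Hypothetical helper: append patterns to repo_dir/.gitignore, one per
    line, skipping exact duplicates. Returns every line now in the file."""
    path = Path(repo_dir) / ".gitignore"
    text = path.read_text() if path.exists() else ""
    existing = text.splitlines()

    # Keep only patterns not already in the file (or repeated in the input).
    to_add = []
    for pattern in patterns:
        if pattern not in existing and pattern not in to_add:
            to_add.append(pattern)

    if to_add:
        # Start on a fresh line if the file doesn't already end with one.
        prefix = "\n" if text and not text.endswith("\n") else ""
        # Mode "a" appends, just like `>>` in the shell.
        with path.open("a") as f:
            f.write(prefix + "\n".join(to_add) + "\n")
    return existing + to_add
```

For example, after the cells above, `add_to_gitignore("my-files", ["test-file-1.txt", "output/"])` would add only `output/`, since `test-file-1.txt` is already listed.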

## Version Control and Sensitive Information

"Oops! I just realized that I have been committing copyrighted material to GitHub!"

The whole point of a version control system like git, and of hosting services like GitHub, is that it should be very, very difficult to lose anything permanently. We want redundancy, and we want every piece of our work to be tracked. This can cause real problems when, say, you suddenly realize that you have been uploading content to your publicly readable GitHub repository that should not be there. Some examples:

* sensitive information - passwords, credentials
* temporary files that will clutter the repository
* copyrighted material

Because git is designed to make it very difficult to throw things away, correcting this issue is not as easy as just deleting the files. Deleting them keeps them from showing up in the current state of the project, but someone canny enough could go through your project's history to find records of the files you want to remove. You'll need to remove them from the entire history of your project, which you can do with a single command. The following assumes you have accidentally added a folder 'mistakes_to_be_corrected' to your GitHub repository:

```
$ git filter-branch --tree-filter 'rm -rf mistakes_to_be_corrected' HEAD
```

Now it is as though the 'mistakes_to_be_corrected' folder was never a part of the current branch's history, but only in your local copy of the repository. Because git is a distributed version control system, the offending data will still be on GitHub until you overwrite the history stored there with a force push (substitute your branch name if it isn't master; newer repositories often use main):

```
$ git push origin master --force
```

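For more on this sort of thing, and for a more detailed explanation that accounts for edge cases, check out Dalibor Nasevic's [discussion on his blog](https://dalibornasevic.com/posts/2-permanently-remove-files-and-folders-from-git-repo). Note, too, that git's own documentation now warns about the pitfalls of filter-branch and recommends the separate git filter-repo tool instead.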

## Ways to Organize Your Work

Each person has their own preferences for organizing their workspace. I tend to think in buckets:

* a .gitignore file to facilitate version control
* inputs: I often call this folder "corpus"
* output: if your scripts produce something, you have to direct that data somewhere. To keep things clean, I send everything to an output folder.
* Python files related to processing, named so that you will remember what they do

So a typical directory might look like the following:
```
|-- .gitignore
|-- text-processing.py
|-- visualization.py
|-- corpus
    |-- text-file-one.txt
    |-- text-file-two.txt
    |-- text-file-three.txt
|-- output
    |-- visualization.png
```

Your mileage may vary, of course, depending on the project. You may, for example, be working with more than one corpus. Or you may be producing so many visualizations that you need more than one output folder, perhaps generated dynamically by your code as you go.
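A layout like this is easy to create from Python, which helps if you start new projects often. The sketch below uses pathlib. `make_project` and `new_output_dir` are hypothetical helper names, and timestamped run folders are just one way to handle output folders generated as you go, assuming you want each run of a script to keep its results separate.

```python
from datetime import datetime
from pathlib import Path


def make_project(root: str) -> Path:
    """Hypothetical helper: create the .gitignore / corpus / output layout.

    Safe to re-run: folders and files that already exist are left alone.
    """
    root_path = Path(root)
    for folder in ("corpus", "output"):
        (root_path / folder).mkdir(parents=True, exist_ok=True)
    # touch() creates an empty file, or does nothing if it already exists.
    (root_path / ".gitignore").touch()
    return root_path


def new_output_dir(root: str) -> Path:
    """Hypothetical helper: create a timestamped folder inside output/.

    Two runs within the same second share a folder, which is usually
    acceptable for a sketch like this.
    """
    # Dashes and underscores instead of spaces keep the name shell-friendly.
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = Path(root) / "output" / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

For example, calling `make_project("my-project")` once and then `new_output_dir("my-project")` at the top of each script would give every run its own folder under my-project/output.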