In [71]:
# Origin: Code for week 5 of programming class.
# Author: Tyler Abbot
# Title: week_5_code.ipynb
# Last modified: 2016.08.06

# Introduction

Today's class will introduce you to **packages in R**.

## Today's Objective

Our goal for the day is to become familiar with packages, not only some of the many packages available, but how to create our own. We would like to achieve the following objectives:
1. Discuss code modularity and packages.
2. Discuss neat packages that are available.
3. Discuss the benefits of creating your own packages.
4. Learn how to create a package from start to finish.
5. Learn how to publish your code.
6. Learn about collaboration using GitHub.

If by the end of the class, you feel we haven't sufficiently covered one of these, speak up!

## A Working Example : Hello World

Today you should have no problems with the example problem. Our goal is simply to print to screen the phrase "Hello World.", a [prototypical computer programming example](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program).

This example is important for your nerd culture, but we haven't seen it yet because in R it is so trivial:

In [2]:
print("Hello World.")

[1] "Hello World."


Ta-da! We did it! This is incredibly simple, but our goal today will be to write and publish a package that includes a function to do exactly this.

Before we get into the nitty-gritty, though, we should talk about what packages are and why we use them.

## What are packages?

A **package** is just code, like we've been writing all along, which contains functions related to one or several similar topics. The goal is to allow the user to specify exactly what functions they would like to load into the namespace at any one time.

If you've ever used MATLAB, you know that it is very slow to start up. This is because every time you start MATLAB it loads all of its functionality.

In R, users can create their own packages and publish them to the Comprehensive R Archive Network (CRAN). Packages available there include everything from "Immunoglobulin Clonal Lineage and Diversity Analysis" to "Bayesian Structural Time Series". In fact, (according to [this website](http://blog.revolutionanalytics.com/2015/06/how-many-packages-are-there-really-on-cran.html)) there are over 6700 packages available, all open source and free.

You can imagine how long it would take to load all of those when starting up R! The package system allows you to pick and choose what you would like to use.
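
To make the pick-and-choose idea concrete, here is a small sketch using `tools`, a package that ships with R. It assumes a default session, in which `tools` is installed but not attached at start-up. The sketch contrasts calling a single function with `::` against attaching the whole package with `library()`.

```r
# Assumption: a default R session, where the 'tools' package ships with R
# but is not attached to the search path at start-up.
attached_before <- "package:tools" %in% search()

# Call a single function without attaching the package.
ext <- tools::file_ext("week_5_code.ipynb")
print(ext)

# Attach the package: all of its exported functions become available by name.
library(tools)
attached_after <- "package:tools" %in% search()
print(file_ext("functions.R"))
```

The `::` form is handy when you need one function only once; `library()` is more convenient when you will use a package repeatedly.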

## Neat Packages

You will find yourself becoming interested in some incredibly specific area of research in the future and you'll probably be the only nerd working on your topic, but until then you might find yourself using packages others have made. [For a list of useful packages, see here.](https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages)

We've already interacted with one package (MASS) and we will see more as we move along to more complicated topics. Today we are going to need the following packages, so let's install them:

**NOTE:** If you get a `non-zero exit status` error, check the error message; it may be caused by curl problems with the `git2r` package. On a Debian or Ubuntu machine, try running `sudo apt-get -y install libcurl4-gnutls-dev`.

In [3]:
#install.packages("devtools",
#                 repos='http://cran.us.r-project.org')
#install.packages("testthat",
#                 repos='http://cran.us.r-project.org')
#install.packages("roxygen2",
#                 repos='http://cran.us.r-project.org')

also installing the dependency ‘git2r’



At this point you are probably experiencing errors! Welcome, my friend, to open source computing. Some tips on solving your problem:
1. Look at the error message. If it says you need some other package, try installing that.
2. Copy your errors into Google and search for them; you are not the first to come this way.
3. Add `verbose=TRUE` to the `install.packages` command to get more verbose output and see if there is some error you're missing.
4. Keep struggling! It will work.
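
When working through install errors, it can help to know exactly which packages are still missing. The following is a minimal sketch, not part of the original notebook: `missing_packages` is a hypothetical helper name, and the mirror URL in the commented-out line is only one possible choice.

```r
# Return the subset of 'pkgs' that cannot be loaded on this machine.
# (missing_packages is a hypothetical helper, written for illustration.)
missing_packages <- function(pkgs) {
    available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!available]
}

needed <- c("devtools", "testthat", "roxygen2")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
    message("Still missing: ", paste(to_install, collapse = ", "))
    # Uncomment to install only what is missing:
    # install.packages(to_install, repos = "https://cloud.r-project.org")
} else {
    message("All packages are installed.")
}
```

Re-running this after each attempted fix tells you whether the problem package finally installed.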

## Creating Your Own Packages

Before we get to the nitty gritty, let's take a minute to consider why this is a good idea, particularly from your perspective as a graduate student in economics.

Open source computing is a wonderful thing, but it can sometimes seem like a useless amount of extra work to publish your own code. Here are some reasons you should really consider doing this:

1. **"You are an intellectual entrepreneur."** That's a quote from Stephen Magee (possibly the happiest guy I ever met), one of my teachers in undergrad. He was referring to an academic's job to not only do research, but to sell their ideas to the world. You will only be truly successful if you can convince other people that you are important. One easy way to do so is to create a portfolio of open-source computing projects which you have either started or contributed to.
2. **Building competence.** The only way to learn how to code is to code, but the only way to learn how to write good, clean code is to let experienced programmers critique your work. You can get this kind of criticism by contributing to an open source project.
3. **Getting a job.** When you go on the job market, you are going to look exactly like everyone else. You are a graduate student from a decently ranked, European university with, most likely, no publication record. One way to stand out is by showing that you are contributing to the community through computing tools.
4. **Giving back.** All the code we are using in this class is open source: R, Jupyter, etc. People just like you set aside their free time to make it possible to use these tools for free, so why not do the same?

Ok, now that I'm down off my soap box, let's consider what we need in order to create a package in R. We can create a rough outline of a package using the following command:

In [10]:
library("devtools")
# Change these to suit your needs
dir <- '~/MEGA/Teaching/Programming/Courses/Week5/'
name <- 'PracticePackage'

# Create your package
setwd(dir)
create(name)

Creating package 'PracticePackage' in '/home/tmabbot/MEGA/Teaching/Programming/Courses/Week5'
No DESCRIPTION found. Creating with values:




Package: PracticePackage
Title: What the Package Does (one line, title case)
Version: 0.0.0.9000
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
Depends: R (>= 3.3.1)
License: What license is it under?
Encoding: UTF-8
LazyData: true


* Creating `PracticePackage.Rproj` from template.
* Adding `.Rproj.user`, `.Rhistory`, `.RData` to ./.gitignore


This command creates the following file tree:

~~~
PracticePackage/
├── DESCRIPTION
├── NAMESPACE
├── PracticePackage.Rproj
└── R
~~~

The `create` command gives you only the most bare-bones package structure. You could also use `package.skeleton` in the following way:

In [11]:
# Change these to suit your needs
dir <- '~/MEGA/Teaching/Programming/Courses/Week5/'
name <- 'PracticePackage2'

# Create your package
package.skeleton(name=name, path=dir)

Creating directories ...
Creating DESCRIPTION ...
Creating NAMESPACE ...
Creating Read-and-delete-me ...
Saving functions and data ...
In dump(internalObjs, file = file.path(code_dir, sprintf("%s-internal.R", : deparse may be incomplete
Making help files ...
Done.
Further steps are described in '~/MEGA/Teaching/Programming/Courses/Week5//PracticePackage2/Read-and-delete-me'.


This command gives you the following file structure:
~~~
PracticePackage2/
├── data
│   ├── dir.rda
│   └── name.rda
├── DESCRIPTION
├── man
│   ├── dir.Rd
│   ├── name.Rd
│   ├── PracticePackage2-package.Rd
│   ├── print_hello_world.Rd
│   ├── q.Rd
│   └── quit.Rd
├── NAMESPACE
├── R
│   ├── PracticePackage2-internal.R
│   ├── print_hello_world.R
│   ├── q.R
│   └── quit.R
└── Read-and-delete-me

~~~

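This seems quite a bit more complex, but is actually relatively simple. The extra files appear because `package.skeleton` packages up every object it finds in your current workspace (which is why `dir`, `name`, and `print_hello_world` show up). Here's a brief description of the structure: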
* **`data`** contains data-sets which come with the package.
* **`DESCRIPTION`** contains information about the package itself.
* **`man`** contains help information.
* **`NAMESPACE`** contains information about the imports and exports for a package.
* **`R`** contains your package code.
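
As a quick way to see which of these components a given package folder contains, here is a minimal sketch. `check_skeleton` is a hypothetical helper, and the demonstration runs on a throwaway directory that mimics the output of `create()` rather than on your real package.

```r
# Report which of the standard package components exist under 'pkg_path'.
# (check_skeleton is a hypothetical helper, written for illustration.)
check_skeleton <- function(pkg_path) {
    components <- c("DESCRIPTION", "NAMESPACE", "R", "man", "data")
    present <- file.exists(file.path(pkg_path, components))
    names(present) <- components
    present
}

# Demonstration on a throwaway directory that mimics the create() output.
demo_pkg <- file.path(tempdir(), "DemoPackage")
dir.create(file.path(demo_pkg, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_pkg, c("DESCRIPTION", "NAMESPACE")))

print(check_skeleton(demo_pkg))
```

Pointing `check_skeleton()` at your own package folder should show `man` and `data` as missing for the bare-bones structure.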

Ok, we've got our package! Now we need to make it our own. **I'm going to continue working from the bare-bones structure output by `create()`, but you can use the other structure if you'd like.**

## Customizing the Package Skeleton

The first thing to note is that our package is missing something. We need a `man` folder to hold our help documentation. You can add this using either the **Graphical User Interface (GUI)** of your operating system, or directly in R using the following commands:

In [21]:
dir <- '~/MEGA/Teaching/Programming/Courses/Week5/'
name <- 'PracticePackage'
dir.create(paste(paste(dir, name, sep=""), "man", sep="/"))
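
The nested `paste()` calls work, but `file.path()` may express the same idea more directly. Here is a minimal sketch of that alternative; `pkg_root` is a hypothetical stand-in that points at a temporary directory, so running it will not touch your real package folder.

```r
# Hypothetical stand-in for the package folder, so the sketch is safe to run.
pkg_root <- file.path(tempdir(), "PracticePackage")

# file.path() joins path components with "/" on every platform.
man_dir <- file.path(pkg_root, "man")

# recursive = TRUE also creates missing parent folders;
# showWarnings = FALSE keeps it quiet if the folder already exists.
dir.create(man_dir, recursive = TRUE, showWarnings = FALSE)

print(dir.exists(man_dir))
```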

### The R Functions
Next, we want to add our Hello World function. In the `R` folder, create a `.R` file titled whatever you like (I called mine `function.R`) and add the following code:

In [8]:
print_hello_world <- function(){
    print("Hello World.")
}

When you have a more complex package, you can save each individual function in its own file or group similar functionality into individual files. Try to keep things organized!

### Documenting Your Functions
Since all through this course you've been documenting your functions, this should come naturally. Add the `roxygen2` formatted documentation above your function in the `.R` file you just created:

In [None]:
#' A Hello World Function
#'
#' This function outputs 'Hello World.' That is all.
#' @export
#' @examples
#' print_hello_world()
print_hello_world <- function(){
    print("Hello World.")
}

Now you can process your documentation easily:

In [9]:
setwd(paste(dir, name, sep=""))
document()

Updating PracticePackage documentation
Loading PracticePackage
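
For functions that take arguments, `roxygen2` also provides the `@param` and `@return` tags. The sketch below documents `print_greeting`, a hypothetical variant of our function that is not part of the class package; it is only meant to show where those tags go.

```r
#' Print a Greeting
#'
#' A hypothetical variant of print_hello_world() that greets a given name.
#'
#' @param name A character string giving who to greet.
#' @return The greeting, invisibly (this is what print() returns).
#' @export
#' @examples
#' print_greeting("World")
print_greeting <- function(name = "World") {
    print(paste0("Hello ", name, "."))
}

print_greeting("Class")
```

The roxygen lines are ordinary R comments, so the file still runs as plain R code; `document()` is what turns them into a help page.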


### Describing Your Package
The last thing to do is fill in the `DESCRIPTION` file. It already has all of the fields, you simply need to fill them in. Here's what mine looks like:
~~~
Package: PracticePackage
Title: A Hello World Package
Version: 0.0.0.9000
Authors@R: person("Tyler", "Abbot", email = "tyler.abbot@sciencespo.fr", role = c("aut", "cre"))
Description: This package teaches students how to create a package from start to finish... well, not really. It was used to do that! All the package does is print Hello World...
Depends:
    R (>= 3.3.1)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 5.0.1
~~~

This is pretty straightforward, but there are a couple of things you should note:
1. The **version** number follows a standard form. `x.y.z.aaaa` is of the form `major.minor.patch.dev`. This is a **development package number** indicating that the package is not released. A **released package number** would be `x.y.z` only. One increments the rightmost number first, incrementing the major number only when large changes have occurred (note that R is only at version 3.3.1).
2. The **license** determines how others are allowed to use your code. You can see a [list of licenses and what they imply here](http://wiki.civiccommons.org/Choosing_a_License/). I usually use the MIT License because it is one of the least restrictive: future developers only have to keep the copyright and license notice with the code (it's just easier that way). To use the MIT License you also need to include a `LICENSE` file which states
~~~
YEAR: 2016
COPYRIGHT HOLDER: Your Name
~~~
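
To see how R itself reads these fields, here is a minimal sketch using base R's `read.dcf()` and `package_version()`. It writes a shortened `DESCRIPTION` to a temporary file (an assumption made so the example is self-contained) instead of reading the one in your package.

```r
# Write a small DESCRIPTION to a temporary file so the sketch is self-contained.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
    "Package: PracticePackage",
    "Title: A Hello World Package",
    "Version: 0.0.0.9000",
    "License: MIT + file LICENSE"
), desc_file)

# read.dcf() parses the "Field: value" format into a character matrix.
desc <- read.dcf(desc_file)
version <- package_version(desc[1, "Version"])
print(version)

# A development version sorts before the next released version.
print(version < package_version("0.0.1"))

# Four components mark this as a development version.
print(length(unclass(version)[[1]]))
```

Replacing `desc_file` with the path to your own `DESCRIPTION` should let you inspect the real file in the same way.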

You're done! You now have a package. Now it's just a question of installing it, sharing it, updating it, testing it, etc.!

## Installing Your Package
Installation is simple: back out of the current working directory (to the parent directory of your package) and run the install command:

In [10]:
setwd(dir)
install(name)

Installing PracticePackage
'/home/tmabbot/anaconda3/lib64/R/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD INSTALL  \
  '/home/tmabbot/MEGA/Teaching/Programming/Courses/Week5/PracticePackage'  \
  --library='/home/tmabbot/anaconda3/lib/R/library' --install-tests 

Reloading installed PracticePackage


In [5]:
print_hello_world()

[1] "Hello World."


You can even look at the help documentation:

In [6]:
?print_hello_world

print_hello_world {PracticePackage}    R Documentation


## Testing Your Package
One important thing to include in your work is **testing**. This is a way to guarantee the proper functioning of your code. If in the future someone makes changes to the code and it breaks your functions, your testing should warn them that this is happening.

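To get started, we'll use `devtools` and `testthat` to create a testing directory (in recent versions of `devtools`, this helper lives in the `usethis` package as `usethis::use_testthat()`):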

In [30]:
setwd(paste(dir, name, sep=""))
devtools::use_testthat()

* Adding testthat to Suggests
* Creating `tests/testthat`.
* Creating `tests/testthat.R` from template.


This created a directory for your tests and a tests file called `testthat.R`. If you've followed the steps up until now, your package should have the following file structure (just as a check):
~~~
PracticePackage/
├── DESCRIPTION
├── man
│   └── print_hello_world.Rd
├── NAMESPACE
├── PracticePackage.Rproj
├── R
│   └── function.R
└── tests
    ├── testthat
    └── testthat.R
~~~


In terms of writing your test files, this can get really complicated if you go looking around on the internet ([see the documentation here](http://r-pkgs.had.co.nz/tests.html)), but doesn't have to be.

A test file does things in a specific order, first reading in the file to test, then running a series of tests. You can keep things organized by writing a single file for each test. For our example, create a file titled `test_print_hello_world.R` in the `tests/testthat` folder.

Here's a test file for our function:

In [None]:
# Tell R what you are testing
context("print_hello_world.R")

# A test function
test_that("Outputs the correct thing.",{
    # fixed = TRUE matches the text literally, so "." is a real period
    expect_match(print_hello_world(), "Hello World.", fixed = TRUE)
})
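
It may not be obvious why a matching expectation works on a function that only prints. The base R sketch below, which redefines the function so it can run on its own, shows roughly what is being checked: `print()` returns its argument invisibly, and it is that returned string that gets matched. This is an illustration of the idea, not how `testthat` is implemented.

```r
print_hello_world <- function(){
    print("Hello World.")
}

# print() returns its argument invisibly; capture.output() hides the
# printed line so we can inspect the returned value on its own.
printed <- capture.output(returned <- print_hello_world())

# Roughly what expect_match() checks: does the returned string match?
# fixed = TRUE treats the pattern as literal text, so "." is a real period.
stopifnot(grepl("Hello World.", returned, fixed = TRUE))

# What appeared on screen is a different string, including the "[1]" prefix.
print(printed)
```

If you would rather test what is printed than what is returned, `testthat` also offers `expect_output()`.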

Now we can run our tests from the R command line:

In [7]:
setwd(paste(dir, name, sep=""))
devtools::test()

Loading PracticePackage
Loading required package: testthat
Testing PracticePackage


print_hello_world.R: [1] "Hello World."
.



We seem to be doing great! Just to see what happens if we don't do things correctly, let's create a test that our package can never pass. Create a new file titled `test_impossible.R` and enter the following:

In [None]:
# Tell R what you are testing
context("print_hello_world.R")

#A test function
test_that("Outputs the correct thing.",{
    expect_match(print_hello_world(), "Thou shalt not pass.")
})

Now run the test again:

In [35]:
setwd(paste(dir, name, sep=""))
devtools::test()

Loading PracticePackage
Testing PracticePackage


print_hello_world.R: [1] "Hello World."
1
print_hello_world.R: [1] "Hello World."
.

Failed -------------------------------------------------------------------------
1. Failure: Outputs the correct thing. (@test_impossible.R#6) ------------------
print_hello_world() does not match "Thou shalt not pass.".
Actual value: "Hello World."




Our test works!!!! We are all that is programmer! Ok, delete that file and let's move on.

## Building Your Package

Essentially the last step before submitting your package to CRAN is to build it and check it.

In [11]:
setwd(paste(dir, name, sep=""))
build()

'/home/tmabbot/anaconda3/lib64/R/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD build  \
  '/home/tmabbot/MEGA/Teaching/Programming/Courses/Week5/PracticePackage'  \
  --no-resave-data --no-manual 



The `build` command creates a compressed archive (a "tarball", to be exact) containing your package. By default it is saved in the parent directory of the package. To check the package, go to the command line, `cd` into the directory containing the tarball, and run the following:
~~~
R CMD check PracticePackage_0.0.0.9000.tar.gz --as-cran
~~~
You will almost surely have problems with this the first time you run it. Have patience and work through them. This is what programming is all about!

![up late](no_sleep.jpg)

Now that you are done, you should end up with a single warning:
~~~
Version contains large components...
~~~
This is driven by our development version number. Change this to whatever version you want and you are good to go!
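
You can edit the version by hand, but it can also be done from R. The sketch below uses base R's `read.dcf()` and `write.dcf()` on a temporary stand-in for the `DESCRIPTION` file; the release number `0.1.0` is a hypothetical choice. Note that `write.dcf()` may re-wrap long fields, so look over the file afterwards.

```r
# Self-contained demo: a temporary copy standing in for the real DESCRIPTION.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
    "Package: PracticePackage",
    "Version: 0.0.0.9000"
), desc_file)

# Read the fields, replace the development version, and write the file back.
desc <- read.dcf(desc_file)
desc[1, "Version"] <- "0.1.0"   # hypothetical first release number
write.dcf(desc, desc_file)

print(readLines(desc_file))
```

After changing the version you would need to run `build()` again, since the version number is part of the tarball's file name.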

## Publishing Your Code
The reason we ran our check with the option `--as-cran` was because our ultimate goal should be to publish the package on CRAN. This would allow users to install it using the `install.packages()` command and make us incredibly popular and famous.

Sadly, our package is far too simple to publish and the archive administrators would laugh us out of work (although we are unpaid, so that's not a huge loss).

The instructions for submitting are at the bottom of the [CRAN home page](https://cran.r-project.org/). You should submit at least one package during your life, so get on it! Maybe your semester project will suffice.

## Collaboration
Now that you know how packages are made and how to build them, you should consider collaborating. This is most often done using GitHub, which you should be getting familiar with. The homework will help you to understand [the workflow](https://guides.github.com/introduction/flow/), but here's a brief outline of how things work:

1. You find a project you would like to contribute to. Either you are amazingly talented and they can't exist without you or vice versa (but we all know it's the former).
2. You **fork** the repository, which gives you your own copy on GitHub, and clone that fork to your machine. Then **create a branch** off the master branch, on which you can make changes.
3. Make your changes, **committing** them often. The more commits and the better your commit messages, the easier it will be to follow your changes.
4. Open a **pull request**. This asks the owner of the original repository to merge your changes into their version.
5. Make any changes the owner wants.
6. Your branch gets **merged** into the master branch.

In order to finish the homework, you'll need to carry out all of these steps (or WE will).

## Conclusion
That's it! You now know how to create packages and deploy them to CRAN and GitHub. We should have covered the following:
1. Discuss code modularity and packages.
2. Discuss neat packages that are available.
3. Discuss the benefits of creating your own packages.
4. Learn how to create a package from start to finish.
5. Learn how to publish your code.
6. Learn about collaboration using GitHub.

# Homework:

1. I've created a folder called "class_packs" in the Week5 folder of the course repo into which you will put your responses, so be sure to sync your local repo!
2. Create a blank package using your preferred method within your own, personal local repository.
3. Set up this package to run whatever function you want. I would prefer it be a different function than the one from class. Be sure to document your function.
4. Choose a license and fill in the `DESCRIPTION` file.
5. Generate the documentation using the `document()` command.
6. Install your package and make sure it works.
7. Push the changes to your own repo.
8. Copy and paste your package into the "class_packs" folder.
9. Add and commit your changes to the course repository.
10. Now open a [pull request](https://help.github.com/articles/creating-a-pull-request/) and I will merge your changes into the library! **NOTE:** This may be troublesome at first, as we haven't done it yet, but it is pretty straightforward!