<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="IPython Cookbook">
<!-- FAVICON -->
<link rel="apple-touch-icon" sizes="57x57" href="/apple-touch-icon-57x57.png">
<link rel="apple-touch-icon" sizes="114x114" href="/apple-touch-icon-114x114.png">
<link rel="apple-touch-icon" sizes="72x72" href="/apple-touch-icon-72x72.png">
<link rel="apple-touch-icon" sizes="144x144" href="/apple-touch-icon-144x144.png">
<link rel="apple-touch-icon" sizes="60x60" href="/apple-touch-icon-60x60.png">
<link rel="apple-touch-icon" sizes="120x120" href="/apple-touch-icon-120x120.png">
<link rel="apple-touch-icon" sizes="76x76" href="/apple-touch-icon-76x76.png">
<link rel="apple-touch-icon" sizes="152x152" href="/apple-touch-icon-152x152.png">
<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon-180x180.png">
<link rel="icon" type="image/png" href="/favicon-192x192.png" sizes="192x192">
<link rel="icon" type="image/png" href="/favicon-160x160.png" sizes="160x160">
<link rel="icon" type="image/png" href="/favicon-96x96.png" sizes="96x96">
<link rel="icon" type="image/png" href="/favicon-16x16.png" sizes="16x16">
<link rel="icon" type="image/png" href="/favicon-32x32.png" sizes="32x32">
<meta name="msapplication-TileColor" content="#da532c">
<meta name="msapplication-TileImage" content="/mstile-144x144.png">
<link rel="alternate" href="https://ipython-books.github.io/feeds/all.atom.xml" type="application/atom+xml" title="IPython Cookbook Full Atom Feed"/>
<title>IPython Cookbook - 7.8. Analyzing data with the R programming language in the Jupyter Notebook</title>
<link href="//cdnjs.cloudflare.com/ajax/libs/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet">
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/pure/0.3.0/pure-min.css">
<!--[if lte IE 8]>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/pure/0.5.0/pure-min.css">
<![endif]-->
<!--[if gt IE 8]><!-->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/pure/0.5.0/pure-min.css">
<!--<![endif]-->
<link rel="stylesheet" href="https://ipython-books.github.io/theme/css/styles.css">
<link rel="stylesheet" href="https://ipython-books.github.io/theme/css/pygments.css">
<!-- <link href='https://fonts.googleapis.com/css?family=Lato:300,400,700' rel='stylesheet' type='text/css'> -->
<link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,500" rel="stylesheet" type="text/css">
<link href='https://fonts.googleapis.com/css?family=Ubuntu+Mono' rel='stylesheet' type='text/css'>
<script src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.0.3/jquery.min.js"></script>
</head>
<body>
<header id="header" class="pure-g">
<div class="pure-u-1 pure-u-md-3-4">
<div id="menu">
<div class="pure-menu pure-menu-open pure-menu-horizontal">
<ul>
<li><a href="/">home</a></li>
<li><a href="https://github.com/ipython-books/cookbook-2nd-code">Jupyter notebooks</a></li>
<li><a href="https://github.com/ipython-books/minibook-2nd-code">minibook</a></li>
<li><a href="https://cyrille.rossant.net">author</a></li>
</ul> </div>
</div>
</div>
<div class="pure-u-1 pure-u-md-1-4">
<div id="social">
<div class="pure-menu pure-menu-open pure-menu-horizontal">
<ul>
<li><a href="https://twitter.com/cyrillerossant"><i class="fa fa-twitter"></i></a></li>
<li><a href="https://github.com/ipython-books/cookbook-2nd"><i class="fa fa-github"></i></a></li>
</ul> </div>
</div>
</div>
</header>
<div id="layout" class="pure-g">
<section id="content" class="pure-u-1 pure-u-md-4-4">
<div class="l-box">
<header id="page-header">
<h1>7.8. Analyzing data with the R programming language in the Jupyter Notebook</h1>
</header>
<section id="page">
<p><a href="/"><img src="https://raw.githubusercontent.com/ipython-books/cookbook-2nd/master/cover-cookbook-2nd.png" align="left" alt="IPython Cookbook, Second Edition" height="130" style="margin-right: 20px; margin-bottom: 10px;" /></a> <em>This is one of the 100+ free recipes of the <a href="/">IPython Cookbook, Second Edition</a>, by <a href="http://cyrille.rossant.net">Cyrille Rossant</a>, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at <a href="https://www.packtpub.com/big-data-and-business-intelligence/ipython-interactive-computing-and-visualization-cookbook-second-e">Packt Publishing</a>.</em></p>
<p>▶ <em><a href="https://github.com/ipython-books/cookbook-2nd">Text on GitHub</a> with a <a href="https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode">CC-BY-NC-ND license</a></em><br />
▶ <em><a href="https://github.com/ipython-books/cookbook-2nd-code">Code on GitHub</a> with a <a href="https://opensource.org/licenses/MIT">MIT license</a></em></p>
<p>▶ <a href="https://ipython-books.github.io/chapter-7-statistical-data-analysis/"><strong><em>Go to</em></strong> <em>Chapter 7 : Statistical Data Analysis</em></a><br />
▶ <a href="https://github.com/ipython-books/cookbook-2nd-code/blob/master/chapter07_stats/08_r.ipynb"><em><strong>Get</strong> the Jupyter notebook</em></a> </p>
<p><a href="https://www.packtpub.com/big-data-and-business-intelligence/ipython-interactive-computing-and-visualization-cookbook-second-e">The recipe is available in the book, to be purchased on Packt.</a></p>
<!-- REMOVE AS PER PACKT AGREEMENT
**R** (http://www.r-project.org) is an open-source domain-specific programming language for statistics. Its syntax is well-adapted to statistical modeling and data analysis. By contrast, Python's syntax is typically more convenient for general-purpose programming. Luckily, Jupyter allows us to have the best of both worlds. For example, we can insert R code snippets anywhere in a normal Jupyter notebook. We can continue using Python and pandas for data loading and wrangling, and switch to R to design and fit statistical models. Using R instead of Python for these tasks is more than a matter of programming syntax; R comes with an impressive statistical toolbox.
In this recipe, we will show how to interface R with Python in the Jupyter Notebook, and we will illustrate the most basic capabilities of R with a simple data analysis example.
> There is another way of using R in the Jupyter Notebook, which is to install **IR**, the R kernel for Jupyter. With this method, all of the code in an IR notebook is written in R, not in Python. You will find more information at [https://irkernel.github.io/installation/](https://irkernel.github.io/installation/).
## Getting ready
You need the statsmodels package for this recipe. It should be installed by default with Anaconda, but you can always install it with `conda install statsmodels`.
You also need R and rpy2 (https://rpy2.readthedocs.io/). There are three steps to use R with Python:
**1.** Download R from [https://cran.r-project.org/](https://cran.r-project.org/) and install it.
**2.** Install rpy2 with `conda install rpy2`.
**3.** Run the `%load_ext rpy2.ipython` command in a Jupyter notebook.
> rpy2 does not appear to work well on Windows. We recommend using Linux or macOS.
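
Before running step 3, it can help to confirm that both pieces are visible from the notebook's Python environment. The following is a minimal sketch and not part of the original recipe: the helper name `check_r_bridge` is hypothetical, and it assumes that the R executable is named `R` and has been added to the `PATH`.

```python
import importlib.util
import shutil


def check_r_bridge():
    """Report whether R and rpy2 appear to be available (hypothetical helper)."""
    return {
        # True if an executable named "R" is found on the PATH
        # (assumption: the R installer placed it there).
        "r_on_path": shutil.which("R") is not None,
        # True if the rpy2 package can be located, without importing it.
        "rpy2_installed": importlib.util.find_spec("rpy2") is not None,
    }


status = check_r_bridge()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If either entry is reported as missing, `%load_ext rpy2.ipython` is likely to fail, and the corresponding installation step is worth revisiting.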
## How to do it...
Here, we will use the following workflow: first, we load data from Python. Then, we use R to design and fit a model, and to make some plots in the Jupyter Notebook. We could also use R only for the entire recipe, or Python only. The goal of this recipe is precisely to show how to use both languages in the same Jupyter notebook.
**1.** Let's load the *longley* dataset with the statsmodels package. This dataset contains a few economic indicators in the US from 1947 to 1962. We also load the IPython R extension:
wzxhzdk:0
wzxhzdk:1
wzxhzdk:2
**2.** We define `x` and `y` as the **exogenous** (independent) and **endogenous** (dependent) variables, respectively. The endogenous variable quantifies the total employment in the country.
wzxhzdk:3
wzxhzdk:4
wzxhzdk:5
**3.** For convenience, we add the endogenous variable to the `x` DataFrame:
wzxhzdk:6
wzxhzdk:7

**4.** We will make a simple plot in R. First, we need to pass Python variables to R. We use the `%R -i var1,var2` magic. Then, we call R's `plot()` command:
wzxhzdk:8
wzxhzdk:9

**5.** Now that the data has been passed to R, we can fit a linear model to the data. In R, the `lm()` function lets us perform a linear regression. Here, we want to express `totemp` (total employment) as a function of the country's GNP. We use the `%%R` cell magic to write several lines of R code in a cell:
wzxhzdk:10

## How it works...
The `-i` and `-o` options of the `%R` magic allow us to pass variables back and forth between IPython and R: `-i` sends Python variables to R, and `-o` brings R variables back to Python. The variable names need to be separated by commas. You can find more information about the `%R` magic in the documentation available at [https://rpy2.readthedocs.io/](https://rpy2.readthedocs.io/).
In R, the tilde (~) expresses the dependence of a dependent variable upon one or several independent variables. The `lm()` function allows us to fit a simple linear regression model to the data. Here, `totemp` is expressed as a function of `gnp`:
$$\mathrm{totemp} = a \times \mathrm{gnp} + b$$
Here, `b` (intercept) and `a` are the coefficients of the linear regression model. These two values are returned by `fit$coefficients` in R, where `fit` is the fitted model.
Our data points do not satisfy this relation exactly, but the coefficients are chosen so as to minimize the error between this linear prediction and the actual values. This is typically done by minimizing the following least squares error:
$$r(a,b) = \sum_{i=1}^n (\mathrm{totemp}_i - (a \times \mathrm{gnp}_i + b))^2$$
The data points are $(\mathrm{gnp}_i, \mathrm{totemp}_i)$ here. The coefficients $a$ and $b$ that are returned by `lm()` make this sum minimal: they fit the data best.
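
For a single predictor, this minimization has a well-known closed-form solution: $a$ is the covariance of the two variables divided by the variance of the predictor, and $b$ then follows from the means. The sketch below illustrates this in plain Python rather than R. The function names and the sample numbers are made up for illustration (they are not the longley data), and `lm()` itself relies on a more general numerical procedure.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum((y - (a * x + b)) ** 2) over the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


def squared_error(a, b, xs, ys):
    """Least squares error r(a, b) for the line y = a * x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy values standing in for (gnp_i, totemp_i).
gnp = [1.0, 2.0, 3.0, 4.0]
totemp = [3.1, 4.9, 7.2, 8.8]

a, b = fit_line(gnp, totemp)
print("slope:", a, "intercept:", b)
print("error at the fit:", squared_error(a, b, gnp, totemp))
print("error after nudging a:", squared_error(a + 0.1, b, gnp, totemp))
```

Nudging either coefficient away from the fitted values can only increase the error, which is what it means for the fit to minimize $r(a,b)$.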
## There's more...
Regression is an important statistical concept that we will see in greater detail in the next chapter. Here are a few references:
* Regression analysis on Wikipedia, available at [https://en.wikipedia.org/wiki/Regression_analysis](https://en.wikipedia.org/wiki/Regression_analysis)
* Least squares method on Wikipedia, available at [https://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29](https://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29)
Here are a few references about R:
* Introduction to R available at [http://cran.r-project.org/doc/manuals/R-intro.html](http://cran.r-project.org/doc/manuals/R-intro.html)
* R tutorial available at [http://www.cyclismo.org/tutorial/R/](http://www.cyclismo.org/tutorial/R/)
* CRAN, or Comprehensive R Archive Network, containing many packages for R, available at [http://cran.r-project.org](http://cran.r-project.org)
## See also
* Exploring a dataset with pandas and matplotlib
-->
</section>
</div>
</section>
<footer id="footer" class="pure-u-1 pure-u-md-4-4">
<div class="l-box">
<div>
<p>© <a href="https://cyrille.rossant.net">Cyrille Rossant</a> –
Built with <a href="https://github.com/PurePelicanTheme/pure-single">Pure Theme</a>
for <a href="https://blog.getpelican.com/">Pelican</a>
</p>
</div>
</div>
</footer>
</div>
<!-- Start of StatCounter Code for Default Guide -->
<script type="text/javascript">
var sc_project=9752080;
var sc_invisible=1;
var sc_security="c177b501";
var scJsHost = (("https:" == document.location.protocol) ?
"https://secure." : "http://www.");
</script>
<script type="text/javascript"
src="https://www.statcounter.com/counter/counter.js"
async></script>
<noscript><div class="statcounter"><a title="Web Analytics"
href="https://statcounter.com/" target="_blank"><img
class="statcounter"
src="//c.statcounter.com/9752080/0/c177b501/1/" alt="Web
Analytics"></a></div></noscript>
<!-- End of StatCounter Code for Default Guide -->
</body>
</html>