<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="IPython Cookbook">
<!-- FAVICON -->
<link rel="apple-touch-icon" sizes="57x57" href="/apple-touch-icon-57x57.png">
<link rel="apple-touch-icon" sizes="114x114" href="/apple-touch-icon-114x114.png">
<link rel="apple-touch-icon" sizes="72x72" href="/apple-touch-icon-72x72.png">
<link rel="apple-touch-icon" sizes="144x144" href="/apple-touch-icon-144x144.png">
<link rel="apple-touch-icon" sizes="60x60" href="/apple-touch-icon-60x60.png">
<link rel="apple-touch-icon" sizes="120x120" href="/apple-touch-icon-120x120.png">
<link rel="apple-touch-icon" sizes="76x76" href="/apple-touch-icon-76x76.png">
<link rel="apple-touch-icon" sizes="152x152" href="/apple-touch-icon-152x152.png">
<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon-180x180.png">
<link rel="icon" type="image/png" href="/favicon-192x192.png" sizes="192x192">
<link rel="icon" type="image/png" href="/favicon-160x160.png" sizes="160x160">
<link rel="icon" type="image/png" href="/favicon-96x96.png" sizes="96x96">
<link rel="icon" type="image/png" href="/favicon-16x16.png" sizes="16x16">
<link rel="icon" type="image/png" href="/favicon-32x32.png" sizes="32x32">
<meta name="msapplication-TileColor" content="#da532c">
<meta name="msapplication-TileImage" content="/mstile-144x144.png">
<link rel="alternate" href="https://ipython-books.github.io/feeds/all.atom.xml" type="application/atom+xml" title="IPython Cookbook Full Atom Feed"/>
<title>IPython Cookbook - 3.2. Converting a Jupyter notebook to other formats with nbconvert</title>
<link href="//cdnjs.cloudflare.com/ajax/libs/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet">
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/pure/0.3.0/pure-min.css">
<!--[if lte IE 8]>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/pure/0.5.0/pure-min.css">
<![endif]-->
<!--[if gt IE 8]><!-->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/pure/0.5.0/pure-min.css">
<!--<![endif]-->
<link rel="stylesheet" href="https://ipython-books.github.io/theme/css/styles.css">
<link rel="stylesheet" href="https://ipython-books.github.io/theme/css/pygments.css">
<!-- <link href='https://fonts.googleapis.com/css?family=Lato:300,400,700' rel='stylesheet' type='text/css'> -->
<link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,500" rel="stylesheet" type="text/css">
<link href='https://fonts.googleapis.com/css?family=Ubuntu+Mono' rel='stylesheet' type='text/css'>
<script src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.0.3/jquery.min.js"></script>
</head>
<body>
<header id="header" class="pure-g">
<div class="pure-u-1 pure-u-md-3-4">
<div id="menu">
<div class="pure-menu pure-menu-open pure-menu-horizontal">
<ul>
<li><a href="/">home</a></li>
<li><a href="https://github.com/ipython-books/cookbook-2nd-code">Jupyter notebooks</a></li>
<li><a href="https://github.com/ipython-books/minibook-2nd-code">minibook</a></li>
<li><a href="https://cyrille.rossant.net">author</a></li>
</ul> </div>
</div>
</div>
<div class="pure-u-1 pure-u-md-1-4">
<div id="social">
<div class="pure-menu pure-menu-open pure-menu-horizontal">
<ul>
<li><a href="https://twitter.com/cyrillerossant"><i class="fa fa-twitter"></i></a></li>
<li><a href="https://github.com/ipython-books/cookbook-2nd"><i class="fa fa-github"></i></a></li>
</ul> </div>
</div>
</div>
</header>
<div id="layout" class="pure-g">
<section id="content" class="pure-u-1 pure-u-md-4-4">
<div class="l-box">
<header id="page-header">
<h1>3.2. Converting a Jupyter notebook to other formats with nbconvert</h1>
</header>
<section id="page">
<p><a href="/"><img src="https://raw.githubusercontent.com/ipython-books/cookbook-2nd/master/cover-cookbook-2nd.png" align="left" alt="IPython Cookbook, Second Edition" height="130" style="margin-right: 20px; margin-bottom: 10px;" /></a> <em>This is one of the 100+ free recipes of the <a href="/">IPython Cookbook, Second Edition</a>, by <a href="http://cyrille.rossant.net">Cyrille Rossant</a>, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at <a href="https://www.packtpub.com/big-data-and-business-intelligence/ipython-interactive-computing-and-visualization-cookbook-second-e">Packt Publishing</a>.</em></p>
<p>▶ <em><a href="https://github.com/ipython-books/cookbook-2nd">Text on GitHub</a> with a <a href="https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode">CC-BY-NC-ND license</a></em><br />
▶ <em><a href="https://github.com/ipython-books/cookbook-2nd-code">Code on GitHub</a> with a <a href="https://opensource.org/licenses/MIT">MIT license</a></em></p>
<p>▶ <a href="https://ipython-books.github.io/chapter-3-mastering-the-jupyter-notebook/"><strong><em>Go to</em></strong> <em>Chapter 3 : Mastering the Jupyter Notebook</em></a><br />
▶ <a href="https://github.com/ipython-books/cookbook-2nd-code/blob/master/chapter03_notebook/02_nbformat.ipynb"><em><strong>Get</strong> the Jupyter notebook</em></a> </p>
<p>A Jupyter notebook is saved in a JSON text file. This file contains the entire contents of the notebook: text, code, and outputs. The matplotlib figures are encoded as base64 strings within the notebooks, resulting in standalone, but sometimes big, notebook files.</p>
<blockquote>
<p>JSON is a human-readable, text-based, open standard format that can represent structured data. Although derived from JavaScript, it is language independent. Its syntax bears some resemblance to Python dictionaries. JSON can be parsed in many languages including JavaScript and Python (using the <code>json</code> module in Python's standard library).</p>
</blockquote>
<p><strong>nbconvert</strong> (https://nbconvert.readthedocs.io/en/stable/) is a tool that can convert notebooks to other formats: raw text, Markdown, HTML, LaTeX/PDF, and even slides with the reveal.js library. You will find more information about the different supported formats in the nbconvert documentation.</p>
<p>One typically uses the <strong>nbformat</strong> (https://nbformat.readthedocs.io/en/latest/) library to manipulate a notebook. However, in this recipe, we will see how to manipulate the contents of a notebook (which is just a plain text JSON file) directly with Python, and how to convert it to other formats with nbconvert.</p>
<h2>Getting ready</h2>
<p>You need to install pandoc, available at <a href="http://pandoc.org">http://pandoc.org</a>. This tool is used to convert markup files to various formats. On Ubuntu, type <code>sudo apt-get install pandoc</code> in a terminal.</p>
<p>To convert a notebook to PDF, you need a LaTeX distribution, which you can download and install at <a href="http://latex-project.org/ftp.html">http://latex-project.org/ftp.html</a>.</p>
<h2>How to do it...</h2>
<p><strong>1. </strong> Let's download and open the test notebook. A notebook is just a plain text file (JSON):</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">io</span>
<span class="kn">import</span> <span class="nn">requests</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="n">url</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'https://github.com/ipython-books/'</span>
<span class="s1">'cookbook-2nd-data/blob/master/'</span>
<span class="s1">'test.ipynb?raw=true'</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="n">contents</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span><span class="o">.</span><span class="n">text</span>
<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">contents</span><span class="p">))</span>
</pre></div>
<div class="highlight"><pre><span></span>3857
</pre></div>
<p><strong>2. </strong> Here is an excerpt of the <code>test.ipynb</code> file:</p>
<div class="highlight"><pre><span></span><span class="k">print</span><span class="p">(</span><span class="n">contents</span><span class="p">[:</span><span class="mi">345</span><span class="p">]</span> <span class="o">+</span> <span class="s1">'...'</span> <span class="o">+</span> <span class="n">contents</span><span class="p">[</span><span class="o">-</span><span class="mi">33</span><span class="p">:])</span>
</pre></div>
<div class="highlight"><pre><span></span>{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# First chapter"
]
},
{
"cell_type": "markdown",
"metadata": {
"my_field": [
"value1",
"2405"
]
},
"source": [
"Let's write some *rich* **text** with
[links](http://www.ipython.org) and lists:\n",
"\n",
"* item1...rmat": 4,
"nbformat_minor": 4
}
</pre></div>
<p><strong>3. </strong> Now that we have loaded the notebook in a string, let's parse it with the <code>json</code> module as follows:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
<span class="n">nb</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">contents</span><span class="p">)</span>
</pre></div>
<p><strong>4. </strong> Let's have a look at the keys in the notebook dictionary:</p>
<div class="highlight"><pre><span></span><span class="k">print</span><span class="p">(</span><span class="n">nb</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
<span class="k">print</span><span class="p">(</span><span class="s1">'nbformat </span><span class="si">%d</span><span class="s1">.</span><span class="si">%d</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">nb</span><span class="p">[</span><span class="s1">'nbformat'</span><span class="p">],</span>
<span class="n">nb</span><span class="p">[</span><span class="s1">'nbformat_minor'</span><span class="p">]))</span>
</pre></div>
<div class="highlight"><pre><span></span>dict_keys(['cells', 'metadata',
'nbformat', 'nbformat_minor'])
nbformat 4.4
</pre></div>
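<p>Steps 3 and 4 can also be tried offline. The following is a minimal sketch, assuming a tiny hand-written notebook string (<code>raw</code> and <code>nb_demo</code> are hypothetical names, not part of the recipe) in place of the downloaded <code>contents</code>:</p>

```python
import json

# Hypothetical stand-in for the downloaded `contents` string:
# a two-cell notebook serialized to JSON text.
raw = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# First chapter"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print('hello')"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
})

# Parse the JSON text into a plain Python dictionary.
nb_demo = json.loads(raw)

# Inspect the top-level keys and the format version.
top_level_keys = sorted(nb_demo)
version = "{}.{}".format(nb_demo["nbformat"], nb_demo["nbformat_minor"])
print(top_level_keys)
print("nbformat", version)
```

<p>The four top-level keys are the same ones found in the test notebook above, which suggests that any notebook in this format can be inspected the same way.</p>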
<p><strong>5. </strong> Each cell has a type, optional metadata, some contents (text or code), possibly one or several outputs, and other information. Let's look at a Markdown cell and a code cell:</p>
<div class="highlight"><pre><span></span><span class="n">nb</span><span class="p">[</span><span class="s1">'cells'</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span>
</pre></div>
<div class="highlight"><pre><span></span>{'cell_type': 'markdown',
'metadata': {'my_field': ['value1', '2405']},
'source': ["Let's write some *rich* **text** with
[links](http://www.ipython.org) and lists:\n",
'\n',
'* item1\n',
'* item2\n',
' 1. subitem\n',
' 2. subitem\n',
'* item3']}
</pre></div>
<div class="highlight"><pre><span></span><span class="n">nb</span><span class="p">[</span><span class="s1">'cells'</span><span class="p">][</span><span class="mi">2</span><span class="p">]</span>
</pre></div>
<div class="highlight"><pre><span></span>{'cell_type': 'code',
'execution_count': 1,
'metadata': {},
'outputs': [{'data': {'image/png': 'iVBOR...QmCC\n',
'text/plain': ['&lt;matplotlib Figure at ...&gt;']},
'metadata': {},
'output_type': 'display_data'}],
'source': ['import numpy as np\n',
'import matplotlib.pyplot as plt\n',
'%matplotlib inline\n',
'plt.figure(figsize=(2,2));\n',
"plt.imshow(np.random.rand(10,10),
interpolation='none');\n",
"plt.axis('off');\n",
'plt.tight_layout();']}
</pre></div>
<p><strong>6. </strong> Once parsed, the notebook is represented as a Python dictionary. Manipulating it is therefore quite convenient in Python. Here, we count the number of Markdown and code cells as follows:</p>
<div class="highlight"><pre><span></span><span class="n">cells</span> <span class="o">=</span> <span class="n">nb</span><span class="p">[</span><span class="s1">'cells'</span><span class="p">]</span>
<span class="n">nm</span> <span class="o">=</span> <span class="nb">len</span><span class="p">([</span><span class="n">cell</span> <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">cells</span>
<span class="k">if</span> <span class="n">cell</span><span class="p">[</span><span class="s1">'cell_type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'markdown'</span><span class="p">])</span>
<span class="n">nc</span> <span class="o">=</span> <span class="nb">len</span><span class="p">([</span><span class="n">cell</span> <span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">cells</span>
<span class="k">if</span> <span class="n">cell</span><span class="p">[</span><span class="s1">'cell_type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'code'</span><span class="p">])</span>
<span class="k">print</span><span class="p">((</span><span class="n">f</span><span class="s2">"There are {nm} Markdown cells and "</span>
<span class="n">f</span><span class="s2">"{nc} code cells."</span><span class="p">))</span>
</pre></div>
<div class="highlight"><pre><span></span>There are 2 Markdown cells and 1 code cells.
</pre></div>
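<p>An alternative way to count cells, sketched here with <code>collections.Counter</code> on a hypothetical miniature notebook dictionary (<code>nb_demo</code> is an assumption made for illustration), tallies every cell type in a single pass:</p>

```python
from collections import Counter

# Hypothetical miniature notebook with the same layout as the parsed `nb`.
nb_demo = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "markdown", "metadata": {}, "source": ["Some text"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "outputs": [], "source": ["1 + 1"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Count how many cells there are of each type.
counts = Counter(cell["cell_type"] for cell in nb_demo["cells"])
n_markdown = counts["markdown"]
n_code = counts["code"]
print(f"There are {n_markdown} Markdown cells and {n_code} code cells.")
```

<p>A <code>Counter</code> returns 0 for cell types that are absent, so the same code should also cope with notebooks that contain, say, no code cells at all.</p>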
<p><strong>7. </strong> Let's have a closer look at the image output of the cell with the matplotlib figure:</p>
<div class="highlight"><pre><span></span><span class="n">cells</span><span class="p">[</span><span class="mi">2</span><span class="p">][</span><span class="s1">'outputs'</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s1">'data'</span><span class="p">]</span>
</pre></div>
<div class="highlight"><pre><span></span>{'image/png': 'iVBOR...QmCC\n',
'text/plain': ['&lt;matplotlib.figure.Figure at ...&gt;']}
</pre></div>
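<p>The <code>image/png</code> entry is the image file encoded as base64 text. The sketch below shows how such an entry could be decoded back to bytes with the standard <code>base64</code> module. It assumes a hypothetical output dictionary (<code>output_data</code>) whose payload is only the PNG signature followed by placeholder bytes, not a real figure:</p>

```python
import base64

# Every PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Hypothetical output data: signature plus placeholder bytes,
# encoded the way a notebook stores an image (base64 text).
fake_png = PNG_SIGNATURE + b"placeholder"
output_data = {
    "image/png": base64.b64encode(fake_png).decode("ascii") + "\n",
    "text/plain": ["(text representation of the figure)"],
}

# Decode the base64 string back to raw bytes.
# The trailing newline is ignored by the decoder.
png_bytes = base64.b64decode(output_data["image/png"])
is_png = png_bytes.startswith(PNG_SIGNATURE)
print(output_data["image/png"][:5], is_png)
```

<p>The encoded PNG signature is the reason the string in the notebook starts with <code>iVBOR</code>. With a real notebook, the decoded bytes could be written to a <code>.png</code> file opened in binary mode.</p>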
<p>In general, there can be zero, one, or multiple outputs. Additionally, each output can have multiple representations. Here, the matplotlib figure has a PNG representation (the base64-encoded image) and a text representation (the internal representation of the figure).</p>
<p><strong>8. </strong> Now, we convert our text notebook to HTML using nbconvert:</p>
<div class="highlight"><pre><span></span><span class="c1"># We write the notebook to a file on disk.</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'test.ipynb'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">contents</span><span class="p">)</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="err">!</span><span class="n">jupyter</span> <span class="n">nbconvert</span> <span class="o">--</span><span class="n">to</span> <span class="n">html</span> <span class="n">test</span><span class="o">.</span><span class="n">ipynb</span>
</pre></div>
<div class="highlight"><pre><span></span>[NbConvertApp] Converting notebook test.ipynb to html
[NbConvertApp] Writing 253784 bytes to test.html
</pre></div>
<p><strong>9. </strong> Let's display this document in an <code>&lt;iframe&gt;</code> (a small window showing an external HTML document within the notebook):</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">IFrame</span>
<span class="n">IFrame</span><span class="p">(</span><span class="s1">'test.html'</span><span class="p">,</span> <span class="mi">600</span><span class="p">,</span> <span class="mi">200</span><span class="p">)</span>
</pre></div>
<p><img alt="HTML export" src="https://ipython-books.github.io/pages/chapter03_notebook/02_nbformat_files/02_nbformat_30_0.png" /></p>
<p><strong>10. </strong> We can also convert the notebook to LaTeX and PDF. In order to specify the title and author of the document, we need to extend the default LaTeX template. First, we create a file called <code>temp.tplx</code> that extends the default <code>article.tplx</code> template provided by nbconvert. We specify the contents of the author and title blocks as follows:</p>
<div class="highlight"><pre><span></span><span class="o">%%</span><span class="n">writefile</span> <span class="n">temp</span><span class="o">.</span><span class="n">tplx</span>
<span class="p">((</span><span class="o">*-</span> <span class="n">extends</span> <span class="s1">'article.tplx'</span> <span class="o">-*</span><span class="p">))</span>
<span class="p">((</span><span class="o">*</span> <span class="n">block</span> <span class="n">author</span> <span class="o">*</span><span class="p">))</span>
\<span class="n">author</span><span class="p">{</span><span class="n">Cyrille</span> <span class="n">Rossant</span><span class="p">}</span>
<span class="p">((</span><span class="o">*</span> <span class="n">endblock</span> <span class="n">author</span> <span class="o">*</span><span class="p">))</span>
<span class="p">((</span><span class="o">*</span> <span class="n">block</span> <span class="n">title</span> <span class="o">*</span><span class="p">))</span>
\<span class="n">title</span><span class="p">{</span><span class="n">My</span> <span class="n">document</span><span class="p">}</span>
<span class="p">((</span><span class="o">*</span> <span class="n">endblock</span> <span class="n">title</span> <span class="o">*</span><span class="p">))</span>
</pre></div>
<div class="highlight"><pre><span></span>Writing temp.tplx
</pre></div>
<p><strong>11. </strong> Then, we can run nbconvert by specifying our custom template as follows:</p>
<div class="highlight"><pre><span></span><span class="o">%%</span><span class="n">bash</span>
<span class="n">jupyter</span> <span class="n">nbconvert</span> <span class="o">--</span><span class="n">to</span> <span class="n">pdf</span> <span class="o">--</span><span class="n">template</span> <span class="n">temp</span> <span class="n">test</span><span class="o">.</span><span class="n">ipynb</span>
</pre></div>
<div class="highlight"><pre><span></span>[NbConvertApp] Converting notebook test.ipynb to pdf
[NbConvertApp] Support files will be in test_files/
[NbConvertApp] Making directory test_files
[NbConvertApp] Writing 16695 bytes to notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times:
['xelatex', 'notebook.tex']
[NbConvertApp] Running bibtex 1 time:
['bibtex', 'notebook']
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 16147 bytes to test.pdf
</pre></div>
<p>We used nbconvert to convert the notebook to LaTeX, and xelatex (which comes with our LaTeX distribution) to compile the LaTeX document to PDF, as the log above shows. The following screenshot shows the PDF version of the notebook:</p>
<p><img alt="PDF output" src="https://ipython-books.github.io/pages/chapter03_notebook/02_nbformat_files/doc.png" /></p>
<h2>How it works...</h2>
<p>As we have seen in this recipe, an <code>.ipynb</code> file contains a structured representation of the notebook. This JSON file can be easily parsed and manipulated in Python and other languages. However, it is better practice to use the <strong>nbformat</strong> package to manipulate a notebook. The internal JSON format may change, whereas the nbformat API is not expected to change.</p>
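<p>As an illustration of manipulating the JSON structure directly, here is a minimal sketch that removes the outputs of all code cells and serializes the result back to JSON. The helper <code>strip_outputs</code> and the miniature notebook <code>nb_demo</code> are hypothetical names introduced for this example. For real projects, the nbformat package remains the better choice:</p>

```python
import copy
import json


def strip_outputs(notebook):
    """Return a deep copy of a notebook dict with code-cell outputs removed.

    Hypothetical helper for illustration; it only touches the
    'outputs' and 'execution_count' fields of code cells.
    """
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical miniature notebook with one executed code cell.
nb_demo = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["2\n"]}],
         "source": ["print(1 + 1)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

cleaned = strip_outputs(nb_demo)

# Serialize back to JSON text, ready to be written to an .ipynb file.
serialized = json.dumps(cleaned, indent=1)
print(len(cleaned["cells"][1]["outputs"]))
```

<p>Working on a deep copy leaves the original dictionary untouched, which makes it easy to compare the notebook before and after the change.</p>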
<p>nbconvert is a tool for converting a notebook to another format. The conversion can be customized in several ways. Here, we extended an existing template using jinja2, a templating package (see <a href="http://jinja.pocoo.org/docs/">http://jinja.pocoo.org/docs/</a>).</p>
<h2>There's more...</h2>
<p>There is a free online service, <strong>nbviewer</strong>, that renders Jupyter notebooks as HTML dynamically in the cloud. We give nbviewer the URL of a raw notebook (in JSON), and we get a rendered HTML page in return. The main page of nbviewer (http://nbviewer.jupyter.org/) contains a few examples. This service is maintained by the Jupyter developers and is hosted on Rackspace (https://www.rackspace.com).</p>
<p>GitHub automatically renders Jupyter notebooks stored in repositories.</p>
<p><strong>binder</strong>, available at <a href="https://mybinder.org">https://mybinder.org</a>, allows one to turn a GitHub repository into a collection of interactive notebooks in the cloud. The service is free and the code is open source, so that anyone can provide their own binder service.</p>
<p>Here are some more references:</p>
<ul>
<li>Documentation of nbconvert, at <a href="https://nbconvert.readthedocs.io/en/stable/">https://nbconvert.readthedocs.io/en/stable/</a></li>
<li>RISE, create interactive slideshows out of Jupyter notebooks, at <a href="https://damianavila.github.io/RISE/">https://damianavila.github.io/RISE/</a></li>
</ul>
</section>
</div>
</section>
<footer id="footer" class="pure-u-1 pure-u-md-4-4">
<div class="l-box">
<div>
<p>© <a href="https://cyrille.rossant.net">Cyrille Rossant</a> –
Built with <a href="https://github.com/PurePelicanTheme/pure-single">Pure Theme</a>
for <a href="https://blog.getpelican.com/">Pelican</a>
</p>
</div>
</div>
</footer>
</div>
<!-- Start of StatCounter Code for Default Guide -->
<script type="text/javascript">
var sc_project=9752080;
var sc_invisible=1;
var sc_security="c177b501";
var scJsHost = (("https:" == document.location.protocol) ?
"https://secure." : "http://www.");
</script>
<script type="text/javascript"
src="https://www.statcounter.com/counter/counter.js"
async></script>
<noscript><div class="statcounter"><a title="Web Analytics"
href="https://statcounter.com/" target="_blank"><img
class="statcounter"
src="//c.statcounter.com/9752080/0/c177b501/1/" alt="Web
Analytics"></a></div></noscript>
<!-- End of StatCounter Code for Default Guide -->
</body>
</html>