# Parting words

Now that you've made it to the end, perhaps you're wondering where to go
from here. Programming is perhaps the easiest skill to learn using only
the internet, so there are many options. There's also always room for
improvement and learning new things: it's a lifelong journey. In lieu of
parting words, here are some tips on where to get started, at least in
the world of Python.

Thank you for reading so far, and if you have any suggestions for
improvement, additions, or just spotted a few typos, please [report them
on GitHub](https://github.com/v4py/v4py.github.io/issues)!

# Books

<a href="https://www.nltk.org/book/" target="_blank">
<img src="images/outro/nltk_book.jpg" width="200" alt="Natural Language Processing with Python book cover">
</a>

As a general introduction to Python programming which focuses on
linguistic applications, I've already recommended [*Natural Language
Processing with Python: Analyzing Text with the Natural Language
Toolkit*](https://www.nltk.org/book/) by Steven Bird, Ewan Klein and
Edward Loper, and I'm going to recommend it again. It's a great
resource, all the more useful since it's freely available online. It
doesn't provide just recipes on how to use the latest and greatest fancy
stuff in NLP, treating the tools as black boxes, but rather focuses on
understanding algorithms and concepts and improving your programming
skills. This means that it often spends time on less cutting-edge
methods, which are however conceptually simpler and thus have better
teaching value. Depending on what your immediate needs are, this may be
a strength or a weakness, but in the long run, I would argue that every
programming linguist should spend some time honing their programming
skills instead of always blindly following how-to style recipes, because
even copy-pasting black box code can go seriously wrong if you don't
have a larger picture of what's going on.

<a href="https://jakevdp.github.io/PythonDataScienceHandbook/" target="_blank">
<img src="images/outro/python_data_science.png" width="200" alt="Python Data Science Handbook cover">
</a>

On perhaps a more practical note, I can definitely recommend Jake
VanderPlas's [*Python Data Science
Handbook*](https://jakevdp.github.io/PythonDataScienceHandbook/),
another great resource which is also freely available online. This book is
not concerned with NLP per se but rather with data analysis, i.e. with
what happens after you've processed your text data and want to do some
statistical modeling or machine learning with it. This has traditionally
been the domain of [R](https://www.r-project.org/), especially among
linguists, but R is a very idiosyncratic language which encourages the
copy-paste, black box approach: while it sometimes provides pleasant and
easy-to-use abstractions (especially in the
[tidyverse](https://www.tidyverse.org/) third-party packages), building
them yourself or wiring them together can be challenging because the
underlying language is not really well-designed, and edge cases and
surprising behaviors abound. Python is much easier to wrap your head
around, perhaps because it has always been intended as a more
general-purpose programming language, but by the same token, it can
sometimes be hard to know which libraries and techniques to use when
getting started with data analysis in Python. The *Python Data Science
Handbook* is there to help you with that.

# Videos

Unlike conferences in linguistics, programming conferences are often
recorded, and professionally produced videos (including presentation
slides) are subsequently made available, most often for free. For
tutorials and workshops, the materials often remain available long after
the conference via sites like GitHub, so you can follow along at your
leisure.

Since conferences are popular, there's potentially *a lot* of watching
material, not all of it great. Some people are better programmers than
speakers or teachers, some are good at neither, but there are so many
conferences that they can accommodate all of them. Below is a collection
of videos that I either consider rare gems of the Python conference
circuit, or that are particularly relevant to the subject of analyzing
language data, or both.

If you end up searching yourself, I can recommend almost anything by
either Raymond Hettinger, Ned Batchelder or David Beazley. Their
contributions are consistently extremely informative, well-prepared and
entertaining at the same time.

## Improving your Python chops

<!-- Beazley: builtins _____________________________________________ -->


A great tour of Python's built-in functionality, i.e. stuff that's
always available, without having to load any libraries, and tips and
tricks on how to use it. A great way to top off your Python initiation
and graduate to a proficient beginner.


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/lyDLAutA88s" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>
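
To give a flavor of what "always available" means in practice, here's a
minimal sketch that uses nothing but built-ins. The sentence and the variable
names are made up for illustration and aren't taken from the talk.

```python
# A made-up sentence, split into words.
words = "the quick brown fox jumps over the lazy dog".split()

# sorted() with a key function: unique words by length, then alphabetically.
by_length = sorted(set(words), key=lambda w: (len(w), w))

# enumerate() pairs each item with its position.
numbered = list(enumerate(words, start=1))

# zip() walks two sequences in parallel, here to build bigrams.
bigrams = list(zip(words, words[1:]))

# sum(), len() and max() cover a lot of everyday arithmetic.
average_length = sum(len(w) for w in words) / len(words)
longest = max(words, key=len)
```

None of this requires an `import`, which is exactly the point.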

<!-- Batchelder: variables _________________________________________ -->


This is perhaps the most useful intermediate Python talk ever. It'll
clear up any misconceptions about how variables work in Python that you
might have accumulated on your programming journey so far, and enable
you to work on more complicated and larger pieces of code with more
confidence.


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/_AEJHKGk9ns" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>
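
As a taste of the kind of misconception in question, here's a small sketch
of my own (the example data is made up): assigning a list to a second name
doesn't copy it, so a change made through one name shows up under the other.

```python
# Assignment makes a name refer to a value; it never copies the value.
tokens = ["colorless", "green", "ideas"]
alias = tokens             # a second name for the very same list
alias.append("sleep")      # mutating the list through one name...

# ...is visible through the other, because there's only one list.
same_object = alias is tokens

# To get an independent list, ask for a copy explicitly.
copy = list(tokens)
copy.append("furiously")
```

After running this, `tokens` has four items and `copy` has five, while
`same_object` is `True`.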

<!-- =============================================================== -->

<hr style="margin-top: 4rem;">

## Data analysis and NLP

<!-- VanderPlas: PyData 101 ________________________________________ -->


If you're interested in using Python for data analysis, I can recommend
anything by Jake VanderPlas (who wrote the *Python Data Science
Handbook* mentioned earlier). This is an introductory talk which
provides basic orientation in the Python data analysis landscape -- what
tools exist and when to use which. As a keynote, it's somewhat longer
and also provides a bit of historical background on Python, with a bias
for data science applications of course.


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/DifMYH3iuFw" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>

<!-- VanderPlas: statistics ________________________________________ -->


A bit unsure how statistics works, or what it's even good for? This
particular talk may be titled *Statistics for Hackers*, but in reality
it's geared towards anyone with a keen mind who's interested in
statistics but hasn't had extensive formal training in math, which means
they sometimes struggle with a formula-heavy approach. Which often
applies to linguists, including myself. This may also be a good place to
point out that ['hacker'](https://en.wikipedia.org/wiki/Hacker) doesn't
always (and certainly not here) refer to someone who breaks into other
people's computers.


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/Iq9DzN6mvYA" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>
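
One of the ideas in this spirit is replacing a formula with a simulation.
Below is a rough sketch of a shuffling (permutation) test, under the
assumption that we want to know whether group A tends to have larger values
than group B. The numbers are made up, and the code is my own illustration,
not something lifted from the talk.

```python
import random

# Made-up measurements for two groups (say, sentence lengths in two texts).
group_a = [12, 15, 9, 20, 17, 14]
group_b = [8, 11, 10, 7, 13, 9]

def mean(values):
    return sum(values) / len(values)

observed = mean(group_a) - mean(group_b)

# If the group labels don't matter, reshuffling them should often produce
# a difference at least as large as the one we actually observed.
rng = random.Random(42)  # fixed seed so that reruns give the same answer
pooled = group_a + group_b
n_shuffles = 10_000
at_least_as_large = 0
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    shuffled_a = pooled[:len(group_a)]
    shuffled_b = pooled[len(group_a):]
    if mean(shuffled_a) - mean(shuffled_b) >= observed:
        at_least_as_large += 1

# One-sided estimate: the proportion of shuffles that match or beat reality.
p_value = at_least_as_large / n_shuffles
```

A small `p_value` suggests that the observed difference would be unusual if
the labels were arbitrary. No formulas required, just a `for` loop.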

<!-- VanderPlas: cutting-edge viz __________________________________ -->


Visualization is currently a rapidly evolving landscape in Python, and
this talk is about new developments based on the *grammar of graphics*
and *declarative visualization* approach, which was popularized by the R
package [ggplot2](https://ggplot2.tidyverse.org/). The first part
teaches you how to think about visualization in general, while the
second introduces the [Altair](https://altair-viz.github.io/) library.


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/vTingdk_pVM" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>

<!-- VanderPlas: traditional viz ___________________________________ -->


By contrast, this older talk gives an overview of the Python
visualization landscape including the more traditional and established
Python visualization tools, which many people continue using and which
aren't going away anytime soon. Their advantage is that they're mature,
stable and widely known, so it can be much easier to get help on how to
use them from random people on the internet. A good accompanying
resource for this talk is [Part
4](https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html)
of the *Python Data Science Handbook*.


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/FytuB8nFHPQ" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>

<!-- Zhao: NLP tutorial ____________________________________________ -->


An NLP tutorial whose most valuable aspect is that it offers an extended
worked example of data analysis, from collecting raw data to
communicating insights. It gives a very good idea of what this entire
process typically looks like and what pitfalls to look out for.


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/xvqsFTUsOmc" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>

<!-- =============================================================== -->

<hr style="margin-top: 4rem;">

## War stories

<!-- Beazley: mission impossible ___________________________________ -->


A real-life story on how Python helped David Beazley to make sense of
large amounts of unknown data in order to prepare an expert testimony in
a legal case. More on the entertaining side than the educational, but it
*will* teach you that Python is a Swiss-army knife for slicing and
dicing data. The *Mission: Impossible* of programming conference talks,
with Python starring as agent Ethan Hunt!


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/RZ4Sn-Y7AP8" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>

<!-- =============================================================== -->

<hr style="margin-top: 4rem;">

## Nuts and bolts (advanced)

And finally, here are a few more advanced talks which I heartily
recommend watching after you've spent a little more time with Python.

<!-- Hettinger: dictionaries _______________________________________ -->


Dictionaries are the bread and butter of the Python programmer, and
they're also at the core of how many of the features in the language
work. As such, their implementation has evolved over the years to
incorporate increasingly clever tricks. Learn what they are from the
proverbial horse's mouth, Python core developer Raymond Hettinger, who's
also one of the most consistently entertaining conference speakers I've
seen!


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/npw4s1QTmPg" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>
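
If you need a reminder of why dictionaries deserve the bread-and-butter
label, here's a minimal word-counting sketch (the text is made up). It also
relies on one user-visible result of the implementation work: since Python
3.7, dictionaries are guaranteed to remember the order in which keys were
added.

```python
text = "a rose is a rose is a rose"

# Count how many times each word occurs.
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# Iterating over a dictionary yields keys in insertion order.
first_seen_order = list(counts)
```

Here, `counts` ends up as `{'a': 3, 'rose': 3, 'is': 2}`, with the keys in
the order in which the words first appeared.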

<!-- Hettinger: concurrency ________________________________________ -->


Another talk by Raymond Hettinger, this time about making the computer
do multiple things at the same time. Spoiler: it's hard to get this
right, and you should probably think twice about whether you really need
it before you start tinkering with it.


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/9zinZmE3Ogk" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>

<!-- NJ Smith: trio ________________________________________________ -->


On a similar topic to the previous one, a talk on how computers can
*pretend* they're doing several things at the same time by quickly
switching between tasks, and on designing a Python library which makes
such programming fairly intuitive and less error-prone. If you've heard
the buzzwords `async` and `await`, they feature prominently.


<iframe width="560" height="315"
  src="https://www.youtube-nocookie.com/embed/oLkfnc_UMcE" frameborder="0"
  allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
  allowfullscreen>
</iframe>
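
To make the task-switching idea a bit more concrete, here's a minimal
sketch. Note that it uses `asyncio` from the standard library so that it
runs without installing anything, not the library presented in the talk;
the `worker` function and its arguments are made up for illustration.

```python
import asyncio

log = []

async def worker(name, delay):
    log.append(f"{name} started")
    # await hands control back, so other tasks can run in the meantime.
    await asyncio.sleep(delay)
    log.append(f"{name} finished")

async def main():
    # Run both workers concurrently and wait until both are done.
    await asyncio.gather(worker("slow", 0.2), worker("fast", 0.1))

asyncio.run(main())
```

Afterwards, `log` shows that both workers started before either finished,
and that `fast` finished first even though `slow` started first. At no
point were two things literally happening at once; the program just
switched tasks whenever one of them was waiting.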

<!-- =============================================================== -->

<hr style="margin-top: 4rem;">

# Libraries

- nltk
- corpy
- lxml
- trio
- regex (use re docs)
- matplotlib
- seaborn
- pandas

<!-- vim: set spell spelllang=en: -->