# Day 1, Part 1: Introduction

## Introduction to Python

### [Tom Paskhalis](https://tom.paskhal.is/)

##### [RECSM Summer School 2023](https://www.upf.edu/web/survey/summer-school-2023)

<table>
    <tr>
        <td><img width="500" src='../imgs/python_snake.jpg'></td>
        <td><img width="500" src='../imgs/python_monty.png'></td>
    </tr>
</table>

<div style="text-align: center;">
    <img width="500" height="300" src="../imgs/xkcd_353.png">
</div>

Source: [https://xkcd.com/353/](https://xkcd.com/353/)

<div style="text-align: center;">
    <img width="700" height="400" src="../imgs/stats_languages.jpg">
</div>

## About me

- Assistant Professor in Political Science and Data Science, [Trinity College Dublin](https://www.tcd.ie/)
    - Before: Postdoctoral Fellow, [New York University](https://www.nyu.edu/)
    - PhD in Social Research Methods, [London School of Economics and Political Science](http://www.lse.ac.uk/)
- My research:
    - Political communication, social media, interest groups
    - Text analysis, machine learning, record linkage, data visualization
- Contact
    - [tom@paskhal.is](mailto:tom@paskhal.is)
    - [tom.paskhal.is](https://tom.paskhal.is/)
    - [@tpaskhalis](https://twitter.com/tpaskhalis/)

## About you

<table>
    <tr>
        <td><img width="500" src="../imgs/michael_scott.jpg"></td>
        <td style="width: 500px; text-align: left; font-size:120%;">
            <ul>
                <li>Name?</li>
                <li>Affiliation?</li>
                <li>Research interests?</li>
                <li>Previous experience with Python?</li>
                <li>Why are you interested in this course?</li>
            </ul>
        </td>
    </tr>
</table>

## R/Stata/SPSS is great, why learn Python?

- [Python is free and open source](https://github.com/python/cpython)
- [Python is a truly versatile programming language](https://github.com/readme/nasa-ingenuity-helicopter)
- [Python offers a great library ecosystem (>300K)](https://pypi.org/)
- [Python is widely used in the industry](https://www.tiobe.com/tiobe-index/)
- [Python is well-known outside academia/data science](https://www.economist.com/science-and-technology/2018/07/19/python-has-brought-computer-programming-to-a-vast-new-audience)

## Popularity of programming languages

<div style="text-align: center;">
    <img width="700" height="500" src='../imgs/tiobe_index.png'>
</div>
    
Source: [https://www.tiobe.com/tiobe-index/](https://www.tiobe.com/tiobe-index/)

## Popularity of data analysis software

<div style="text-align: center;">
    <img width="600" height="400" src="../imgs/kaggle_ide.png">
</div>

Source: [https://www.kaggle.com/kaggle-survey-2021](https://www.kaggle.com/kaggle-survey-2021)

## Python and Development Environments

- There are a number of integrated development environments (*IDE*s) available for Python (IDLE, Spyder, PyCharm)
- As well as code editors with Python-specific extensions (Visual Studio Code, Atom, Sublime Text, Vim)
- Try different ones and choose what works best for you!

## Python and Jupyter Notebook

- [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/latest/) is a language-agnostic, web-based interactive computational environment
- Is available with backends (*kernels*) for different programming languages (**Ju**lia, **Py**thon, **R** = **Jupy**te**r**)
- Can be used both locally and remotely
- Good for ad-hoc data analysis and visualization 

## Jupyter Notebook

- Notebooks allow writing, executing and viewing the output of Python code within the same environment
- All notebook files have the `.ipynb` extension, which stands for **i**nteractive **py**thon **n**ote**b**ook
- The main unit of a notebook is a *cell*, a text input field (Python, Markdown, HTML)
- The output of a cell can include text, tables or figures
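
As a rough illustration of the points above, here is what a single Python code cell might contain. The variable names and numbers are made up for this example; running the cell in a notebook would show the printed text directly below it.

```python
# Hypothetical survey responses on a 1-5 scale (made-up data for illustration)
responses = [4, 5, 3, 5, 2]

# Compute the mean using built-in functions only
mean_response = sum(responses) / len(responses)

# Text printed here appears as the cell's output
print(f"Mean response: {mean_response:.2f}")
```
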

## Jupyter Notebook online

- For this workshop I recommend using one of the online platforms for working with Jupyter Notebooks:
  - [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb), a cloud platform for hosting Jupyter Notebooks. You need to have a Google account, but it does not require any local installations.
  - [Kaggle Code](https://www.kaggle.com/code), a platform for sharing and exploring data-science-focussed Jupyter Notebooks. Although Kaggle is owned by Google, you can register for the Kaggle website on its own.

## Jupyter Notebook installation

- If you would prefer to install Jupyter Notebook on your local machine, there are two main ways to do this: 
    - [pip](https://jupyter.readthedocs.io/en/latest/install/notebook-classic.html#alternative-for-experienced-python-users-installing-jupyter-with-pip)
    - [conda](https://jupyter.readthedocs.io/en/latest/install/notebook-classic.html#installing-jupyter-using-anaconda-and-conda)
- Unless you have prior experience with Python, I recommend installing the [Anaconda](https://www.anaconda.com/products/individual) distribution, which contains all the packages required for this course.
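
After a local installation, one possible way to check that the key packages can be found is the short sketch below. The package list is an assumption made for illustration (`notebook` is the Jupyter Notebook package, and `pandas` appears in the course outline), and `check_packages` is a hypothetical helper, not part of any library.

```python
import importlib.util

# Assumed package list for illustration; adjust it to what you need
PACKAGES = ["notebook", "pandas", "matplotlib"]


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report the status of each package without importing it
for name, found in check_packages(PACKAGES).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

A package reported as missing can then be installed with pip or conda, as described in the links above.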

## Jupyter Notebook demonstration

![Jupyter Notebook 1](../imgs/jupyter_notebook_1.png)

## Jupyter Notebook demonstration

![Jupyter Notebook 2](../imgs/jupyter_notebook_2.png)

## Course outline

| Date    | Time (CEST)   | Topic                                         |
|:--------|:--------------|:----------------------------------------------|
| 26 June | 09:00-10:45   | Introduction to Python objects and data types |
|         | 10:45-11:15   | Break                                         |
|         | 11:15-13:00   | Pandas, data input/output                     |
| 27 June | 09:00-10:45   | Exploratory data analysis, data visualization |
|         | 10:45-11:15   | Break                                         |
|         | 11:15-13:00   | Regression analysis, communicating results    |


## Materials

- All materials for this workshop can be found: 
    - In this GitHub repository: [github.com/tpaskhalis/RECSM_Introduction_Python](https://github.com/tpaskhalis/RECSM_Introduction_Python)
    - Alternative shortlink: [bit.ly/RECSM_Python](https://bit.ly/RECSM_Python)
- For your convenience, you may want to clone this repository to your local machine.
- All slides and exercises were created using Python and Jupyter Notebooks.

## Additional materials

- There are many great online resources and published books on programming in Python.
- Some of them also provide good coverage of using Python for data analysis.
- Here are some pointers to start from.

## Books

- Guttag, John. 2021. *Introduction to Computation and Programming Using Python: With Application to Computational Modeling and Understanding Data*. 3rd ed. Cambridge, MA: The MIT Press

- McKinney, Wes. 2022. *Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter*. 3rd ed. Sebastopol, CA: O'Reilly Media

- Sweigart, Al. 2019. *Automate the Boring Stuff with Python*. 2nd ed. San Francisco, CA: No Starch Press

## Online

- [The Hitchhiker’s Guide to Python](https://docs.python-guide.org/)

- [Python For You and Me](https://pymbook.readthedocs.io/en/latest/)

- [Python Wikibook](https://en.wikibooks.org/wiki/Python_Programming)

- [Python 3 Documentation](https://docs.python.org/3/) (intermediate and advanced)

## Next

- Basic Python types
- Operations
- Object manipulations