<a href="https://colab.research.google.com/github/tomasplsek/AstroML/blob/main/01_uvod.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction to Machine learning for astronomers in Python**




> - 6-7 lectures
> - 6 hands-on sessions
> - final project

<div style="">
<img src="https://www.physics.muni.cz/~plsek/img.jpeg" height="160px" align="right">

<img src="https://cdn.muni.cz/media/3289673/matejkosiba.jpg?mode=crop&center=0.53,0.57&rnd=132563180850000000&width=179" height="160px" align="right">

<img src="https://cdn.muni.cz/media/3291288/toastm.jpg?mode=crop&center=0.5,0.5&rnd=132569108280000000&width=179" height="160px" align="right">

\\
### **Timetable**
>16.9. - 1st lecture\
23.9. - Tomáš: *Intro to Colab / Jupyter*\
30.9. - Dean's leave\
7.10. - 2nd lecture\
14.10. - \
21.10. - 3rd lecture\
28.10. - \
4.11. - 4th lecture\
11.11. - \
18.11. - 5th lecture\
25.11. - Tomáš: *Intro to neural networks*\
2.12. - 6th lecture\
9.12. - \
16.12. - 7th lecture\
23.12. - ???

</div>

> **Tomáš Plšek**
> <br><br>
> email: <a href="mailto:plsek@physics.muni.cz">plsek@physics.muni.cz</a>\
> github: [tomasplsek](https://github.com/tomasplsek)\
> web: [physics.muni.cz/~plsek](https://www.physics.muni.cz/~plsek/)\
> office: [building 8 (math), 4th floor, last door to the left](https://is.muni.cz/auth/kontakty/mistnost?id=10454)

# 1. hands-on session: **Intro to Colab / Jupyter**

## **Contents**



1.   What is machine learning?
2.   Hardware / software
3.   Google Colab / Conda + Jupyter
4.   Basics of Python
5.   "Hello world" of ML
6.   What can I do with ML?
7.   CADET
8.   OpenAI Codex

## **What is Machine learning?**

Machine learning is a set of algorithms with tunable parameters that learn the values of these parameters from previously seen data and generalise to make predictions on new, unseen data.
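The definition above can be illustrated with a deliberately tiny sketch: a straight line has two tunable parameters (slope and intercept), they are estimated from "seen" data by least squares, and the fitted line is then used to predict a value at a new point. The data and the helper name `fit_line` are made up for this illustration.

```python
# Previously seen (training) data: made-up samples of y = 2x + 1
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(x, y):
    """Least-squares estimate of the two tunable parameters of a line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Learning": adjust the parameters using the training data
slope, intercept = fit_line(x_train, y_train)

# "Generalising": predict the value at a point that was not in the training data
x_new = 10.0
y_pred = slope * x_new + intercept
print(slope, intercept, y_pred)
```

Real ML models have many more parameters and are fitted iteratively, but the pattern of fitting on seen data and predicting on unseen data is the same.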

<img src="https://drive.google.com/uc?export=view&id=1iCTWs8F4-fxZgqOi3opWPg_PRB3Xgd92" height="340px" align="left">

<img src="https://drive.google.com/uc?export=view&id=11faQQUfu0XGwcpItsXronTO9Bs1B4SRg" height="340px" align="right">

### **Why now?**
1943 - Neural networks\
1957 - **Perceptron** (Frank Rosenblatt)\
1967 - **Nearest neighbor algorithm**\
...\
2006 - Deep Learning\
2016 - AlphaGo\
2021 - self-driving cars

<img src="https://drive.google.com/uc?export=view&id=19rLShDospF7mDvaWn9aJtVPePeNV0SQx" width="460px" align="right">

## **Hardware**

*   **CPU** = *central processing unit*\
    \- in every computer / mobile phone\
    \- scalar operations\
    \- integrated GPUs\
    \- multiple cores / threads (parallelizable)\
    \- Intel, AMD

<br>

<img src="https://drive.google.com/uc?export=view&id=1m5eUCkodoLlHpW6JsABUOVOVm_nn2Gtr" width="460px" align="right">


*   **GPU** = *graphics processing unit*\
    \- vector operations\
    \- graphics and games\
    \- dedicated: NVIDIA, AMD

<br>

*   **TPU** = *tensor processing unit*\
    \- tensor operations\
    \- developed by Google for TensorFlow\
    \- narrow range of applications

<br>




## **Software**

Libraries for communication with the GPU:

<img src="https://www.nvidia.com/etc/designs/nvidiaGDC/clientlibs_base/images/NVIDIA-Logo.svg" height="50px" align="right">

*   CUDA = NVIDIA libraries ([toolkit](https://developer.nvidia.com/cuda-toolkit-archive), [cuDNN](https://developer.nvidia.com/rdp/cudnn-download))\
    \- GPU acceleration (Win & Linux)\
    \- lower level

<br>

Higher-level libraries are developed mostly for Python:

<img src="https://scikit-learn.org/stable/_static/scikit-learn-logo-small.png" height="50px" align="right">

*   [scikit-learn](https://scikit-learn.org/stable/)\
    \- set of general ML tools

<img src="https://www.gstatic.com/devrel-devsite/prod/v509a5f4800978e3ce5a1a5f2c1483bd166c25f20fdb759fe97f6131b7e9f1f00/tensorflow/images/lockup.svg" height="50px" align="right">

*   [tensorflow](https://www.tensorflow.org/)\
    \- matrix and tensor computations\
    \- using CUDA and GPU ([install](https://www.tensorflow.org/install/gpu), [setup](https://www.tensorflow.org/guide/gpu))

<img src="https://keras.io/img/logo.png" height="50px" align="right">

*   [keras](https://keras.io/)\
    \- on top of tensorflow\
    \- neural networks (dense & convolutional layers)

*   others: [PyTorch](https://pytorch.org/), Theano

<br>

Programming environments:

<div style="display:table">

<img src="https://colab.research.google.com/img/colab_favicon_256px.png" height="90px" align="right">

<img src="https://jupyter.org/assets/main-logo.svg" height="90px" align="right" style="margin-right:100px">

</div>


*   [Conda](https://www.anaconda.com/products/individual) + [Jupyter](https://jupyter.org/) notebook/lab

*   Google Colab

## **Conda + JupyterLab**

Win10: [WSL](https://docs.microsoft.com/en-us/windows/wsl/install) + Ubuntu\
Linux: terminal

[Anaconda](https://www.anaconda.com/products/individual)

## **Google Colab**

- access to notebooks and files from **Google drive** and **GitHub**
- share notebooks
- GPU / TPU acceleration
- plenty of preinstalled packages (```numpy```, ```scipy```, ```matplotlib```, ```keras```, ```tensorflow```...)
- bash shell commands with the ```!``` prefix
- new packages are easy to install: ```!pip install corner```
- [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html) with the ```%``` prefix
- include markdown or HTML blocks
- get help on any object with ```?``` or ```help()```

## **Basics of Python**

- object-oriented programming language
- high-level language (the reference implementation, CPython, is written in C)
- interpreted
- open source / free to use
- plenty of modules (`scipy`, `astropy`, `keras`, `Qt`...)
- Python 2, **Python 3**
- not only a scripting language
- computations, programs, graphical interfaces, web applications...

In [None]:
import sys
print(sys.version)

3.7.12 (default, Sep 10 2021, 00:21:48) 
[GCC 7.5.0]


### **Variables**

In [74]:
a = 1
b = 4.2
c = "word"
d = [1, 2, 3]
e = (1, 2, 3)
f = {"Tomas" : 1, "Toast" : 2, "Matěj" : 3}
def fun(): pass
import numpy
from numpy import array
g = array([1,2,3])
from pandas import DataFrame

for i in [a, b, c, d, e, f, fun, numpy, g, DataFrame()]:
    print(type(i))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'list'>
<class 'tuple'>
<class 'dict'>
<class 'function'>
<class 'module'>
<class 'numpy.ndarray'>
<class 'pandas.core.frame.DataFrame'>


### **Methods**
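As a minimal sketch of what this section refers to: methods are functions attached to an object and called with dot notation. The example below reuses the variables `c`, `d` and `f` from the previous cell; the choice of methods shown is only an illustration, not a complete list.

```python
c = "word"
d = [1, 2, 3]
f = {"Tomas": 1, "Toast": 2, "Matěj": 3}

# str method: returns a new string, the original is unchanged
upper = c.upper()

# list method: modifies the list in place and returns None
d.append(4)

# dict method: returns a view of the keys, converted here to a list
names = list(f.keys())

print(upper, d, names)

# dir() lists the attributes and methods available on any object
print([name for name in dir(d) if not name.startswith("_")])
```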

### **Task 1:** time the sum of a list using `%timeit`

a = [1, 2, 3, ..., 1000]
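One possible approach to this task, sketched with the standard-library `timeit` module so that it also runs outside a notebook. In Colab or Jupyter the same measurement is simply `%timeit sum(a)`. The helper `loop_sum` is a hypothetical name introduced here for comparison with the built-in `sum`.

```python
import timeit

a = list(range(1, 1001))  # the integers 1 to 1000


def loop_sum(values):
    """Sum the values with an explicit Python loop."""
    total = 0
    for v in values:
        total += v
    return total


total_builtin = sum(a)
total_loop = loop_sum(a)

# timeit.timeit calls the function `number` times and returns the total time in seconds
t_builtin = timeit.timeit(lambda: sum(a), number=1000)
t_loop = timeit.timeit(lambda: loop_sum(a), number=1000)

print(total_builtin, total_loop)
print(f"built-in sum: {t_builtin:.4f} s, explicit loop: {t_loop:.4f} s (1000 runs each)")
```

The absolute timings depend on the machine, so only the comparison between the two is meaningful.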

### **Task 2:** multiply two lists
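A minimal sketch of one solution, assuming that "multiply" means element-wise multiplication of two lists of equal length. The example values are made up.

```python
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# The * operator is not defined between two lists (a * b raises TypeError),
# so pair the elements with zip and multiply them one by one
product = [x * y for x, y in zip(a, b)]

print(product)  # [10, 40, 90, 160]
```

With NumPy arrays the same element-wise result is obtained directly from `numpy.array(a) * numpy.array(b)`, which is worth comparing with `%timeit` for long lists.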

### Change to GPU

In [73]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  0


## **"Hello world" of ML**

### **Clustering**

Generate 2 randomly distributed groups of points and cluster them using a clustering algorithm.
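A possible sketch of this exercise, assuming k-means from scikit-learn as the clustering algorithm (the text does not name a specific one). The group centres, spread, sample sizes and random seeds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Two groups of 100 points drawn from 2D Gaussians centred at (0, 0) and (5, 5)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))
points = np.vstack([group_a, group_b])

# k-means gets only the coordinates, not the information which group a point came from
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print(kmeans.cluster_centers_)  # estimated cluster centres
print(np.bincount(labels))      # number of points assigned to each cluster
```

Because the groups are well separated, each one should end up in its own cluster; which of them gets label 0 or 1 is arbitrary.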

### **Classification**

Generate 2 randomly distributed groups of points, train an SVC classifier to distinguish between the groups and test it.
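A minimal sketch of this exercise using `SVC` from scikit-learn. The group centres, the linear kernel, the 75/25 train/test split and the random seeds are assumptions made for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two labelled groups of 200 points drawn from 2D Gaussians
group_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
group_b = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(200, 2))
X = np.vstack([group_a, group_b])
y = np.array([0] * 200 + [1] * 200)

# Hold out 25 % of the points so the classifier is tested on data it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Fraction of correctly classified held-out points
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Unlike clustering, the classifier is given the group labels during training, and its quality is judged only on the held-out points.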

## **What can I do with ML?**

Cases when I shouldn't use ML:
- simple problems / algorithms
- problems that are easy to code explicitly
- problems that need to be deterministic
- basic mathematics, precise computations


Cases when I can use ML:
- problems that require human abilities (vision, speech recognition, pattern recognition...)
- complex image / video operations
- high dimensional problems
- big data problems

## **CADET**

## **OpenAI Codex**

- [beta version](https://beta.openai.com/playground?model=davinci-codex)