<a href="https://colab.research.google.com/github/yanncoadou/MLtutorials/blob/IPhU2022/ML_installation_checks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Machine learning software installation checks</h1>

Welcome! You have successfully downloaded this notebook from GitHub.

Most of the comments below assume that the notebook runs in Google Colaboratory, but it should also be possible to run it locally, for instance with an [Anaconda](https://docs.anaconda.com/anaconda/install/) installation.


To edit and save the notebook, we advise you to use your personal Google Drive account. Open the "File" menu in the menu bar of this page and select "Save a copy in Drive":

<img style="display: block; margin-left: auto; margin-right: auto; width: 50%;" alt="Save" width="30%" src="https://raw.githubusercontent.com/yanncoadou/MLtutorials/main/Save_a_copy_in_GDrive.png" >

Once you have done this, any changes you make to the notebook can be saved for later access. You will have to follow this same procedure during the practice session.

If you are not familiar with Google Colaboratory, we recommend reading [this introduction](https://colab.research.google.com/notebooks/welcome.ipynb) to get to know the environment and the interface.

Below are the technical checks to perform before joining the practice sessions. The software versions quoted are those found on Colab. If you are not running on Colab, make sure you have the same version or a more recent one.
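
The requirement to have the same or a more recent version can also be checked programmatically. The sketch below is one possible way to do it with the standard library only; it is not part of the original checks. The helper names `parse_version` and `is_recent_enough` are hypothetical, the parser assumes plain dotted numeric versions such as `1.0.2`, and the dictionary keys are assumed to be the PyPI distribution names (a variant such as a CPU-only TensorFlow build would be reported as not installed).

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '1.0.2' into (1, 0, 2); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_recent_enough(installed, required):
    """True if the installed version is the same as or newer than the required one."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.8' and '2.8.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Versions quoted in this notebook for Colab (20220330).
required = {
    "scikit-learn": "1.0.2",
    "tensorflow": "2.8.0",
    "xgboost": "1.5.2",
    "lightgbm": "3.3.2",
    "catboost": "1.0.4",
}

for name, minimum in required.items():
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed (need >= {minimum})")
        continue
    status = "OK" if is_recent_enough(installed, minimum) else "too old"
    print(f"{name}: {installed} (need >= {minimum}) {status}")
```

Comparing tuples of integers rather than strings matters here: as strings, `"1.10.0"` would sort before `"1.5.2"`.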

# **Installation checks**

## scikit-learn

In [None]:
try:
  import sklearn
except ImportError as e:
  !pip install scikit-learn  # the PyPI package is named scikit-learn, not sklearn
  import sklearn
print (sklearn.__version__)  # preinstalled version 1.0.2 20220330

## TensorFlow (install if not on Colab)

In [None]:
try:
  import tensorflow as tf
except ImportError as e:
  !pip install tensorflow
  import tensorflow as tf
print (tf.__version__)  # preinstalled version 2.8.0 20220330

## XGBoost

In [None]:
# preinstalled version 0.9.0 20220330
!pip install xgboost --upgrade # install 1.5.2 20220330
import xgboost as xgb
print(xgb.__version__)

## LightGBM

In [None]:
# preinstalled version 2.2.3 20220330
!pip install lightgbm --upgrade # install 3.3.2 20220330
import lightgbm as lgb
print (lgb.__version__)

## CatBoost

In [None]:
# not preinstalled 20220330
!pip install catboost # install 1.0.4 20220330
import catboost
print (catboost.__version__)

## Enabling and testing the GPU
(largely copied from https://colab.research.google.com/notebooks/gpu.ipynb)

On Colab, you can enable GPUs for the notebook:
- Navigate to Edit→Notebook Settings
- Select GPU from the Hardware Accelerator drop-down

Note the warning when activating GPU: *To get the most out of Colab, avoid using a GPU unless you need one.*
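
Once a GPU runtime is selected, one quick way to confirm that the accelerator is actually visible is to ask the NVIDIA driver tool. The following is a minimal sketch and not part of the original notebook: the helper `list_nvidia_gpus` is a hypothetical name, and it assumes an NVIDIA GPU whose driver provides the `nvidia-smi` command. It returns an empty list when the tool is absent or fails.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # tool not on PATH: most likely no NVIDIA driver
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # tool present but no usable GPU
    return [line for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print("GPUs found:", gpus if gpus else "none")
```

This check does not depend on TensorFlow, so it can help separate a missing GPU from a TensorFlow installation that cannot see one.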

In [None]:
try:
  import google.colab
  COLAB = True # if running in COLAB
except ImportError:
  COLAB = False # if not running on COLAB
GPU=False # if using GPU (can be True on COLAB and possibly Linux)
if (tf.test.gpu_device_name()):
  GPU=True
print('You are running on Colab:',COLAB)
print('You are running with GPU:',GPU)

In [None]:
if COLAB: 
    # Display available and used memory.
    !free -h
    print("-"*70)
    # Display the CPU specification.
    !lscpu
    print("-"*70)
    # Display the GPU specification (if available).
    if GPU:
      !(nvidia-smi | grep -q "has failed") && echo "No GPU found!" || nvidia-smi

In [None]:
import tensorflow as tf
import timeit

device_name = tf.test.gpu_device_name()
# Fail early if a GPU was detected earlier but is no longer visible.
if GPU and device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')

def cpu():
  with tf.device('/cpu:0'):
    random_image_cpu = tf.random.normal((100, 100, 100, 3))
    net_cpu = tf.keras.layers.Conv2D(32, 7)(random_image_cpu)
    return tf.math.reduce_sum(net_cpu)

def gpu():
  with tf.device('/device:GPU:0'):
    random_image_gpu = tf.random.normal((100, 100, 100, 3))
    net_gpu = tf.keras.layers.Conv2D(32, 7)(random_image_gpu)
    return tf.math.reduce_sum(net_gpu)
  
# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
if GPU:
    gpu()

# Run the op several times.
from datetime import datetime
print (datetime.now())
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
if GPU:
  print('GPU (s):')
  gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
  print(gpu_time)
  print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))
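
The warm-up call and the sum-of-ten-runs timing used in this cell follow a general benchmarking pattern. As an illustration only, here is the same pattern on a pure-Python stand-in workload; the function `workload` is hypothetical and unrelated to TensorFlow. `timeit.repeat` returns several measurements, so the best batch can be reported alongside a single sum.

```python
import timeit


def workload():
    """Stand-in for cpu()/gpu(): any callable taking no arguments."""
    return sum(i * i for i in range(10_000))


workload()  # warm-up run, excluded from the measurement

# timeit accepts a callable directly, so no setup string is needed.
total = timeit.timeit(workload, number=10)
runs = timeit.repeat(workload, number=10, repeat=3)

print(f"Sum of ten runs: {total:.4f} s")
print(f"Best of three batches: {min(runs):.4f} s")
```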

## Input dataset

Setting up access to data files on Google Drive

In [None]:
if COLAB:
    #### Reading files from Google Drive
    # Need a Google account to be identified
    #!pip install PyDrive
    import os
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive
    from google.colab import auth
    from oauth2client.client import GoogleCredentials
    auth.authenticate_user()
    gauth = GoogleAuth()
    gauth.credentials = GoogleCredentials.get_application_default()
    drive = GoogleDrive(gauth)

In [None]:
import os
datapath=""
if not os.path.isfile("dataWW_d1_600k.csv.gz"):
  if COLAB:
    #attach dataset from google drive 
    download = drive.CreateFile({'id': '1nlXp7P-xq_jip4aPE0j0mnPhYnIOcBv4'})
    download.GetContentFile("dataWW_d1_600k.csv.gz")
    !ls -lrt
  else:
    # Make sure the file is available locally.
    # It can be downloaded from https://drive.google.com/uc?id=1nlXp7P-xq_jip4aPE0j0mnPhYnIOcBv4
    !ls -lrt # what is in the local directory
    # datapath must be the directory holding the file, not the file itself:
    # the file name is appended by os.path.join below.
    datapath="/directory/where/you/stored/dataWW_d1_600k.csv.gz"
    !ls -lrt {datapath} # what is in the data directory
    # Normalise the path; "\\ " is a literal backslash followed by a space (shell-escaped space).
    datapath=os.path.abspath(datapath).replace("\\ ", " ")
    print ("Will take data from : ",datapath)
filename=os.path.join(datapath,"dataWW_d1_600k.csv.gz")
!ls -l {filename}
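
Before the session, it may be worth confirming that the downloaded file is a readable gzip archive and not, for example, an HTML error page saved under the same name. This is a minimal sketch using only the standard library; the helper name `peek_gzip_header` is hypothetical, and it assumes that the first line of the CSV holds the column names. In the notebook, the `filename` variable defined in the data cell could be passed instead of the literal file name.

```python
import gzip
import os
import zlib


def peek_gzip_header(path):
    """Return the first line of a gzip-compressed text file, or None if unreadable."""
    if not os.path.isfile(path):
        return None
    try:
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            return handle.readline().rstrip("\n")
    except (OSError, EOFError, zlib.error, UnicodeDecodeError):
        return None  # not a valid gzip file, truncated, or not text


header = peek_gzip_header("dataWW_d1_600k.csv.gz")
if header is None:
    print("File missing or not a valid gzip archive")
else:
    print("First line:", header[:200])
```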

***If everything worked fine up to here, you are all set for the ML hands-on session.***