
**Using Python to implement the machine learning process
by dr. Uros Godnov**

# Google Colab

- Free Cloud-Based Environment: Google Colab (Colaboratory) lets users write and execute Python code in a web browser without installing any software locally.

- Integration with Google Drive: You can save and load files from your Google Drive directly, making it easy to work with data stored online.

- GPU and TPU Support: Google Colab provides free access, subject to availability, to GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units), which accelerate tasks such as machine learning, deep learning, and large-scale computation.

- Jupyter Notebook Interface: Colab uses the familiar Jupyter Notebook interface, supporting interactive code execution, text, images, and visualizations within cells.

- Pre-installed Libraries: It comes with many commonly used Python libraries pre-installed (e.g., NumPy, TensorFlow, Keras, Pandas), making it easy to get started with data science and machine learning projects.

- Collaborative Features: Similar to Google Docs, multiple users can collaborate in real-time on the same notebook, making it ideal for team projects.

- Limitations: While Google Colab is powerful, it has limitations on session duration (up to 12 hours) and RAM usage, especially in the free tier.

- Runtime types: Python and R

# Loading modules and functions

In [None]:
import pandas as pd
from google.colab import drive

**from … import and aliasing modules**

In [None]:
from random import random as rd
from random import randint as rdint
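
As a minimal sketch of what the two import forms do: `import ... as` binds a whole module to a short name, while `from ... import ... as` binds a single function. The `math` module and the alias names `m` and `root` are chosen here only for illustration.

```python
import math as m                # alias for the whole module
from math import sqrt as root   # alias for one function from the module

# Both names give access to the same underlying function object
print(m.sqrt(16.0))
print(root(16.0))
print(root is m.sqrt)
```

An alias changes only the name used in the notebook, not the function itself.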

**using help() on a function**

In [None]:
help(rd)
help(rdint)
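
`help()` prints its text to the screen. When the same information is needed as a value, the standard `inspect` module can return it. The following is a small sketch under that assumption; the variable names `doc` and `sig` are illustrative.

```python
import inspect
from random import randint as rdint

doc = inspect.getdoc(rdint)      # the docstring, cleaned of indentation
sig = inspect.signature(rdint)   # the parameters the function accepts

print(doc)
print(list(sig.parameters))
```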

In [None]:
rd()

In [None]:
rdint(1,100)

**seed**

The seed function sets the starting state of a random number generator. Libraries such as random, numpy, and torch each provide their own version. With a fixed seed, the generator produces the same sequence of random numbers on every run, which is particularly useful for testing, debugging, and reproducible experiments with machine learning models or any system that relies on randomness.

In [None]:
import random

In [None]:
random.seed(42)
rdint(1,100)
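
The effect of a seed can be checked directly. This sketch draws two sequences with the same seed and compares them; the helper `draw` and the seed values are illustrative choices. It also shows `random.Random`, which creates a separate generator so that seeding does not touch the module-level state.

```python
import random

def draw(seed, n=5):
    """Return n integers from 1 to 100 drawn after seeding the generator."""
    random.seed(seed)
    return [random.randint(1, 100) for _ in range(n)]

first = draw(42)
second = draw(42)
print(first == second)   # compares two runs that used the same seed

# An independent generator with its own state
rng = random.Random(42)
local = [rng.randint(1, 100) for _ in range(5)]
print(local == first)
```

A separate `random.Random` instance is a reasonable choice when several parts of a notebook need randomness and should not influence each other.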

## Reading data

### From GDrive

In [None]:
# drive.mount('/content/drive')

In [None]:
# df=pd.read_csv('/content/drive/MyDrive/Pandas_data/laptop_price_with_missing_values.csv')
# df.head()

### From online resource

In [None]:
df=pd.read_csv("https://raw.githubusercontent.com/urosgodnov/datasets/refs/heads/master/laptop_price_with_missing_values.csv")


In [None]:
df.head()

### From disk


In [None]:
from google.colab import files

In [None]:
uploaded = files.upload()

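files.upload() stores the uploaded files in the uploaded variable, a dictionary that maps each file name to its contents as bytes. The files are also saved to the session's working directory (/content), so we can read them with the appropriate pd.read_* function. They are visible under the **Files** option in Google Colab.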
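
The dictionary returned by `files.upload()` can also be read directly from memory. The sketch below simulates it with a made-up file name and contents so that it runs outside Colab, and it uses the standard `csv` module as a stand-in parser; the helper `rows_from_upload` is hypothetical.

```python
import csv
import io

# Stand-in for what files.upload() returns in Colab: {file name: bytes}.
# The file name and contents are made up for this example.
uploaded = {"sample.csv": b"brand,price\nAcer,499.0\nDell,899.5\n"}

def rows_from_upload(uploaded, name, encoding="utf-8"):
    """Decode one uploaded file and parse it as a list of CSV rows (dicts)."""
    text = uploaded[name].decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))

for name in uploaded:
    print(name, len(uploaded[name]), "bytes")

rows = rows_from_upload(uploaded, "sample.csv")
print(rows[0]["brand"], rows[0]["price"])
```

In Colab, the same bytes can be handed to pandas with `pd.read_csv(io.BytesIO(uploaded[name]))`.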


In [None]:
df_xlsx=pd.read_excel("/content/Meritve 0014.xlsx")
df_xlsx.iloc[0:5,1:]

**changing pandas behavior**

In [None]:
pd.set_option('display.max_rows', 20)         # Show up to 20 rows
pd.set_option('display.max_columns', 10)      # Show up to 10 columns
pd.set_option('display.precision', 3)         # Set float precision to 3 decimals
pd.set_option('display.max_colwidth', 50)     # Limit column width to 50 characters


In [None]:
df.head()
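
`pd.set_option` changes a setting for the rest of the session. When a setting is needed only for one output, `pd.option_context` applies it temporarily. A minimal sketch follows; the option values are arbitrary.

```python
import pandas as pd

before = pd.get_option("display.precision")

# Settings inside the with-block apply only until the block ends
with pd.option_context("display.precision", 1, "display.max_rows", 5):
    inside = pd.get_option("display.precision")

after = pd.get_option("display.precision")
print(before, inside, after)

# reset_option restores a single option to its pandas default
pd.reset_option("display.max_colwidth")
```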

## Installing new modules into Colab

Shell Command Execution:
- In Colab, the ! prefix runs a shell command (such as pip install) from within a notebook cell. It tells the notebook to execute the command in the underlying Linux shell instead of as Python code.

- Installing External Libraries: Colab comes with many pre-installed libraries, but if you need an additional package, use !pip install to install it just as you would in a terminal.

- Alternative to %pip: The %pip magic command also installs Python packages, and it targets the environment of the running kernel. It is available in Jupyter and in Colab, although !pip is more commonly seen in Colab notebooks.

- Suppressing installation messages: append > /dev/null 2>&1 to the command to discard both standard output and error output. Note that this also hides error messages if the installation fails.

In [None]:
!pip install sweetviz pyjanitor > /dev/null 2>&1
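
Because the redirect hides any failure, it can help to check from Python whether a package is actually available. The following is a sketch, not part of Colab itself: the helpers `is_installed` and `ensure_installed` are hypothetical names, and pip is invoked through `sys.executable -m pip` so that it targets the running interpreter.

```python
import importlib.util
import subprocess
import sys

def is_installed(module_name):
    """Return True if a top-level module can be found on the search path."""
    return importlib.util.find_spec(module_name) is not None

def ensure_installed(module_name, pip_name=None):
    """Install a package with pip only when its module is missing."""
    if is_installed(module_name):
        return False   # already available, nothing was installed
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--quiet", pip_name or module_name]
    )
    return True

print(is_installed("json"))
```

The import name and the pip name can differ: pyjanitor, for example, is imported as `janitor`, so the call would be `ensure_installed("janitor", "pyjanitor")`.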

**Problem**:

- In Google Colab, the environment resets when the runtime is disconnected or deleted, for example after you close the notebook or the session times out.
- Any packages installed with !pip install are lost after the environment resets.
- This happens because Colab uses ephemeral virtual machines, which means:
  1. Installed packages and data do not persist after the session ends.
  2. Each session starts with a clean environment.

**Solution**:
- Install the needed libraries into a folder on Google Drive.
- Every time you need the libraries, mount the drive and add that folder to Python's module search path (sys.path).
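
This approach relies on how Python finds modules: it searches the folders listed in `sys.path`, in order. The sketch below shows the mechanism outside Colab. A temporary folder stands in for the Google Drive folder, and `demo_module` is a made-up module name.

```python
import importlib
import os
import sys
import tempfile

# A temporary folder stands in for the Google Drive folder
modules_dir = tempfile.mkdtemp()

# Write a tiny module into it (hypothetical name: demo_module)
with open(os.path.join(modules_dir, "demo_module.py"), "w") as f:
    f.write("ANSWER = 42\n")

# Put the folder on the module search path, avoiding duplicates
if modules_dir not in sys.path:
    sys.path.insert(0, modules_dir)

# Make sure the import system notices the newly created file
importlib.invalidate_caches()
demo_module = importlib.import_module("demo_module")
print(demo_module.ANSWER)
```

Inserting at position 0 gives the folder priority over other locations, while appending puts it last. Which one is preferable depends on whether the Drive copies should override the pre-installed versions.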

In [None]:
import os, sys
from google.colab import drive
drive.mount('/content/gdrive')

# Create a path variable

nb_path = '/content/notebooks'



In [None]:
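# Create a symlink to the Google Drive folder that holds the installed packages
try:
  os.symlink('/content/gdrive/MyDrive/Google_Colab_modules', nb_path)
except FileExistsError:
  print("Symlink already exists")

# Add the path to sys.path whether or not the symlink was just created
if nb_path not in sys.path:
  sys.path.insert(0, nb_path)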


In [None]:
# Install the libraries into nb_path so that they are stored on Google Drive
!pip install --target=$nb_path ydata-profiling sweetviz > /dev/null 2>&1

**Importing the modules from gdrive**

In [None]:
# Importing libraries
from google.colab import drive
import sys
# Mount google drive
drive.mount('/content/gdrive')
# Add the Drive folder with the installed modules to the module search path
sys.path.append('/content/gdrive/MyDrive/Google_Colab_modules')

import sweetviz as sv
import ydata_profiling as ydp
