### Aim: Implementation of Python Libraries - SciKit

Scikit-learn, imported in code as `sklearn`, is a popular open-source machine learning library for Python. It provides a wide range of tools and algorithms for data mining and data analysis tasks. Scikit-learn is built on top of other scientific Python libraries such as NumPy, SciPy, and matplotlib, which makes it easy to integrate with existing Python workflows.

Here are some of the key modules and functions provided by scikit-learn:

1. Data preprocessing:
   - `preprocessing`: This module includes functions for scaling, standardizing, and encoding data. Imputation of missing values is handled by the separate `impute` module.

2. Supervised learning algorithms:
   - `linear_model`: This module provides implementations of linear regression, logistic regression, and other linear models.
   - `svm`: It includes support vector machine (SVM) algorithms for classification and regression.
   - `neighbors`: This module offers nearest neighbors algorithms, such as K-Nearest Neighbors (KNN).
   - `ensemble`: It contains ensemble methods like random forests and gradient boosting.

3. Unsupervised learning algorithms:
   - `cluster`: This module provides various clustering algorithms, including K-means clustering and hierarchical clustering.
   - `decomposition`: It includes functions for matrix factorization, such as Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF).

4. Model selection and evaluation:
   - `model_selection`: This module provides functions for model selection, cross-validation, and hyperparameter tuning.
   - `metrics`: It includes a wide range of evaluation metrics for classification, regression, and clustering tasks.

5. Dimensionality reduction:
   - `decomposition`: The same module listed under unsupervised learning. Its techniques for reducing the dimensionality of datasets include PCA and truncated Singular Value Decomposition (SVD).

6. Model persistence and utility functions:
   - `model_selection`: This module generates train-test splits with `train_test_split`. It does not save or load models; fitted models are typically persisted with `joblib` or Python's `pickle`.
   - `utils`: It includes various utility functions for data manipulation and model evaluation.
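
As a rough illustration of how several of these modules fit together, the sketch below chains a scaler from `preprocessing` with a classifier from `linear_model` and scores it with cross-validation from `model_selection`. The choice of the bundled iris dataset, `LogisticRegression`, and 5 folds is an assumption made for this example and is not part of the practical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled example dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Chain scaling and a linear classifier so the scaler is refit on each training fold.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation; each score is the accuracy on one held-out fold.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.shape, scores.mean())
```

Wrapping the scaler and the model in one pipeline keeps the scaling statistics from leaking out of the held-out fold.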

These are just a few examples of the modules and functions available in scikit-learn. The library's documentation gives detailed explanations and examples for each function, which makes its capabilities easier to understand and use.
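
Model persistence, mentioned in the list above, is commonly done with `joblib`, which is installed alongside scikit-learn. The following is a minimal sketch under that assumption; the toy data and the file name `linear_model.joblib` are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x + 1.
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() + 1

model = LinearRegression().fit(X, y)

# Write the fitted model to disk and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "linear_model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```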

In [1]:
from sklearn.linear_model import LinearRegression as sunny

In [3]:
model = sunny()
model

In [4]:
import numpy as np
from sklearn.model_selection import train_test_split
x, y = np.arange(20).reshape((10, 2)), range(10)
x

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])

In [5]:
list(y)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The next cells use the `train_test_split` function from the `model_selection` module in scikit-learn. This function is commonly used to split a dataset into training and testing subsets for machine learning tasks.

Here's a breakdown of what each component of the call does:

`x` and `y`: These are the input features and the target variable, respectively, that are split into training and testing subsets. Each row of `x` is one sample, and the matching entry of `y` is its label or target.

`test_size=0.33`: This parameter specifies the proportion of the dataset allocated for testing, here 33%. The number of test samples is rounded up, so with 10 samples, 4 go to the test set and the remaining 6 are used for training. Adjust this value to change the size of the test set.

`random_state=30`: This parameter sets the random seed for reproducibility. With a fixed value such as 30, the random split is the same every time the code runs, which makes results consistent and comparable across models or experiments.

`X_train, X_test, y_train, y_test`: These are the output variables that hold the resulting subsets. `X_train` and `X_test` contain the training and testing input features, and `y_train` and `y_test` contain the corresponding training and testing targets.
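
The effect of `test_size` and `random_state` can be checked directly. The sketch below rebuilds the same 10-sample dataset (using `np.arange(10)` for `y` instead of `range(10)`, purely for convenience) and confirms the subset sizes and the reproducibility of the split. The names `expected_test`, `X_train2`, and `X_test2` are introduced only for this illustration.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(20).reshape((10, 2))
y = np.arange(10)

# The test set gets ceil(test_size * n_samples) rows; the rest go to training.
expected_test = math.ceil(0.33 * len(x))

X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=30
)
print(X_train.shape, X_test.shape)

# Repeating the call with the same random_state reproduces the same split.
X_train2, X_test2, _, _ = train_test_split(x, y, test_size=0.33, random_state=30)
print(np.array_equal(X_test, X_test2))
```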

In [7]:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=30)

In [8]:
X_train

array([[12, 13],
       [ 8,  9],
       [14, 15],
       [16, 17],
       [18, 19],
       [10, 11]])

In [9]:
y_train

[6, 4, 7, 8, 9, 5]

In [10]:
X_test

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

In [11]:
y_test

[0, 1, 2, 3]

In [12]:
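# Without shuffling, the original order is kept; the default test_size of 0.25 puts the last 3 of the 10 values in the test set.
train_test_split(y, shuffle=False)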

[[0, 1, 2, 3, 4, 5, 6], [7, 8, 9]]

The `model.fit(X_train, y_train)` method trains a machine learning model on the training data.

Here's a brief explanation of its parts:

`X_train`: This parameter holds the input features of the training set, the data from which the model learns patterns.

`y_train`: This parameter holds the target variable or labels of the training set, the correct outcome for each row of `X_train`.

`model.fit()`: This method fits the model to the training data by adjusting its internal parameters. For `LinearRegression`, that means estimating the coefficients and intercept by ordinary least squares so that the predictions for `X_train` are as close as possible to `y_train`.
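
To make the "internal parameters" concrete, the sketch below fits `LinearRegression` to synthetic data generated from a known linear rule and then reads back the learned values. The data, the seed, and the rule `y = 3*x0 - 2*x1 + 5` are assumptions made for this illustration and are not part of the practical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data from a known rule: y = 3*x0 - 2*x1 + 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

model = LinearRegression()
model.fit(X, y)

# After fitting, the learned parameters are exposed as attributes.
print(model.coef_, model.intercept_)
```

Because the data contain no noise, the fitted coefficients and intercept should match the generating rule up to floating-point error.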

In [13]:
model.fit(X_train,y_train)

In [14]:
predictions = model.predict(X_test)

In [15]:
predictions

array([3.44169138e-15, 1.00000000e+00, 2.00000000e+00, 3.00000000e+00])
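
The first predicted value, about 3.4e-15, is zero up to floating-point error, so the predictions match the test targets almost exactly. One way to quantify this is with the `metrics` module listed earlier. The sketch below repeats the split and fit in a self-contained form and computes two common regression metrics; choosing mean squared error and R² here is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same toy data as in the practical: the target is a linear function of the features.
x = np.arange(20).reshape((10, 2))
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=30
)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Mean squared error near 0 and R^2 near 1 indicate an almost perfect fit.
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(mse, r2)
```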

Conclusion: In this practical, we successfully used the scikit-learn library to split a dataset, fit a linear regression model, and generate predictions.