**Importing Libraries**

In [1]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model

**Loading dataset**

In [4]:
diabetes = datasets.load_diabetes()

**About datasets from sklearn**

*The datasets provided by sklearn (Scikit-learn) are useful for practicing machine learning techniques. Scikit-learn offers several datasets, including both small toy datasets (for learning and testing purposes) and larger real-world datasets. These datasets are available through the sklearn.datasets module and are typically in the form of Bunch objects, which are dictionary-like objects.*

**Structure of the Datasets**

**The datasets are typically returned as Bunch objects, which have the following attributes:**

*data: The feature matrix (2D numpy array where each row represents a sample and each column represents a feature).*

*target: The target values (1D numpy array where each element represents the target value for a sample), i.e. the dependent variable.*

*DESCR: A description of the dataset.*

*feature_names: The names of the features (for some datasets), i.e. the independent variables.*

*target_names: The names of the target classes (for classification datasets).*

*frame: A pandas DataFrame containing the dataset, available if as_frame=True is passed when loading the dataset.*

**Note: the independent variables (features) and the dependent variable (target) are already separated in this dataset.**
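*Because a Bunch is dictionary-like, attribute access and key access return the same object, and the features and target already arrive as separate arrays. A minimal sketch (the names X_attr, X_key, X and y are illustrative; return_X_y is a documented parameter of load_diabetes that returns the (data, target) pair directly):*

```python
from sklearn import datasets

# Load the diabetes data as a Bunch (a dict subclass that also allows attribute access)
diabetes = datasets.load_diabetes()

# Attribute access and key access give back the very same array object
X_attr = diabetes.data
X_key = diabetes["data"]
print(X_attr is X_key)          # True

# Features (independent variables) and target (dependent variable) are separate arrays
print(diabetes.data.shape)      # (442, 10)
print(diabetes.target.shape)    # (442,)
print(diabetes.feature_names)

# return_X_y=True skips the Bunch and returns the (X, y) pair directly
X, y = datasets.load_diabetes(return_X_y=True)
print(X.shape, y.shape)         # (442, 10) (442,)
```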

In [5]:
print(diabetes.keys())     # a Bunch is dict-like, so .keys() lists its fields; data and target are NumPy arrays, which sklearn works on

dict_keys(['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_filename', 'target_filename', 'data_module'])


In [7]:
print(diabetes.data.shape)      #or print(diabetes['data'].shape) 

(442, 10)


In [None]:
print(diabetes.data)          #or print(diabetes['data'])

In [None]:
print(diabetes.feature_names)     #or print(diabetes['feature_names'])

In [None]:
print(diabetes.target)      #or print(diabetes['target'])

In [6]:
print(diabetes.DESCR)     #or print(diabetes['DESCR'])

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

:Number of Instances: 442

:Number of Attributes: First 10 columns are numeric predictive values

:Target: Column 11 is a quantitative measure of disease progression one year after baseline

:Attribute Information:
    - age     age in years
    - sex
    - bmi     body mass index
    - bp      average blood pressure
    - s1      tc, total serum cholesterol
    - s2      ldl, low-density lipoproteins
    - s3      hdl, high-density lipoproteins
    - s4      tch, total cholesterol / HDL
    - s5      ltg, possibly log of serum triglycerides level
    - s6      glu, blood sugar level

Note: Each of these 10 feature variables have bee

**Access the 'sex' column**

In [None]:
feature_names = diabetes.feature_names
sex_index = feature_names.index('sex')              # look up the column position by name
print(sex_index)
diabetes_sex_column = diabetes.data[:, sex_index]   # use the looked-up index instead of hard-coding 1
print(diabetes_sex_column)
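*The same lookup can be wrapped so any column is fetched by name. A small sketch; column_by_name is a hypothetical helper, not part of scikit-learn. Since 'sex' is a binary variable, its scaled column should contain exactly two distinct values, which np.unique makes easy to check:*

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()

def column_by_name(bunch, name):
    """Hypothetical helper: return one feature column, looked up by its name."""
    idx = list(bunch.feature_names).index(name)
    return bunch.data[:, idx]

sex = column_by_name(diabetes, "sex")
print(sex.shape)          # (442,)

# 'sex' is binary, so even after scaling it takes only two distinct values
print(np.unique(sex))
```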

**Indexing, adding a new axis and reshaping**

*The expression diabetes.data[:, np.newaxis, 2] is used to extract the third feature (column) of the diabetes dataset and add a new axis,
effectively turning it into a 2D array with one column.
Here's a detailed breakdown:*

*diabetes.data:*

*This is the dataset containing the features (predictors) for the diabetes dataset. It is a 2D numpy array where rows represent samples and 
columns represent features.*

*[:, np.newaxis, 2]:*

*The colon (:) selects all rows.*

*np.newaxis: Inserts a new axis of length 1 at this position, so the selected column comes back with shape (442, 1), a column vector, instead of the flat shape (442,).*

*2: This selects the third column (index 2, since Python uses 0-based indexing).*

*By using np.newaxis, you add a dimension, converting the selected column from a 1D array into a 2D array with a single column. This matters because scikit-learn estimators such as LinearRegression expect the feature matrix X to be 2D, with shape (n_samples, n_features).*
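*np.newaxis is only one way to get a single-column 2D array. The sketch below compares it with three common equivalents (reshape(-1, 1), a list index, and a length-1 slice); all four should produce the same (442, 1) array, while plain integer indexing drops the column axis:*

```python
import numpy as np
from sklearn import datasets

X_full = datasets.load_diabetes().data      # shape (442, 10)

a = X_full[:, np.newaxis, 2]     # insert a length-1 axis, then pick column 2
b = X_full[:, 2].reshape(-1, 1)  # take the 1D column, then reshape it to one column
c = X_full[:, [2]]               # a list index keeps the column axis
d = X_full[:, 2:3]               # a slice also keeps the column axis

print(X_full[:, 2].shape)        # (442,)  1D: sklearn estimators reject this as X
print(a.shape, b.shape, c.shape, d.shape)
```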

In [None]:
import numpy as np
from sklearn import datasets

# Load the diabetes dataset
diabetes = datasets.load_diabetes()

# Extract the third feature (index 2) and add a new axis
diabetes_feature_3 = diabetes.data[:,np.newaxis, 2]
print(diabetes_feature_3)

# Check the shape of the original data and the reshaped data
print("Original shape:", diabetes.data.shape)
print("New shape with one feature and new axis:", diabetes_feature_3.shape)


**Train-Test Split**

*diabetes_X_train = diabetes_X[:-30]:*

*1. This selects all rows of diabetes_X except the last 30.*

*2. :-30 is slicing syntax that means "up to but not including the last 30 elements."*

*3. This is commonly used to create a training set, leaving the last 30 samples for testing.*


*diabetes_X_test = diabetes_X[-30:]:*

*1. This selects the last 30 rows of diabetes_X.*

*2. -30: is slicing syntax that means "starting from the 30th element from the end to the last element."*

*3. This is used to create the testing set.*
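*For comparison, scikit-learn's train_test_split can reproduce exactly this split: an integer test_size means an absolute number of samples, and shuffle=False keeps the original row order so the last 30 rows land in the test set. A sketch under those settings (note that slicing takes the final rows in file order, so it is not a random split; the library default is shuffle=True):*

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target

# Manual split by slicing: all rows except the last 30, then the last 30 rows
X_train, X_test = X[:-30], X[-30:]
y_train, y_test = y[:-30], y[-30:]
print(X_train.shape, X_test.shape)   # (412, 1) (30, 1)

# Same split with train_test_split: 30 test samples, original order preserved
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=30, shuffle=False)
print(np.array_equal(X_train, X_tr2), np.array_equal(X_test, X_te2))  # True True
```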

In [None]:
diabetes_X = diabetes.data[:,np.newaxis,2]
diabetes_X_train = diabetes_X[:-30]
diabetes_X_test = diabetes_X[-30:]

In [None]:
diabetes_y_train = diabetes.target[:-30]
diabetes_y_test = diabetes.target[-30:]

**Building the Model - Linear Regressor**

In [None]:
model = linear_model.LinearRegression()

**Training the Model**

In [None]:
model.fit(diabetes_X_train, diabetes_y_train)

**Testing the model - predicting**

In [None]:
diabetes_y_predicted = model.predict(diabetes_X_test)


**Using a single column (index 2, bmi): with two variables (x and y), the equation is a straight line with a single weight**
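*With one feature, ordinary least squares has a closed form: slope w = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept b = ȳ − w·x̄. The sketch below computes both by hand on the same training rows and checks them against model.coef_ and model.intercept_; the names w and b are illustrative:*

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
X_train = diabetes.data[:-30, np.newaxis, 2]   # bmi, all but the last 30 rows
y_train = diabetes.target[:-30]

model = linear_model.LinearRegression().fit(X_train, y_train)

# Ordinary least squares for a single feature, written out by hand
x = X_train[:, 0]
x_mean, y_mean = x.mean(), y_train.mean()
w = np.sum((x - x_mean) * (y_train - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - w * x_mean

print("sklearn:", model.coef_[0], model.intercept_)
print("by hand:", w, b)

# The prediction is just the line y = w * x + b
X_test = diabetes.data[-30:, np.newaxis, 2]
manual_pred = w * X_test[:, 0] + b
print(np.allclose(manual_pred, model.predict(X_test)))   # True
```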

**Linear Regression**

In [None]:
plt.scatter(diabetes_X_test, diabetes_y_test)
plt.plot(diabetes_X_test, diabetes_y_predicted)

plt.show()

In [None]:
# Mean squared error: the average squared gap between true and predicted targets (lower is better)
from sklearn.metrics import mean_squared_error
print("Mean squared error is: ", mean_squared_error(diabetes_y_test, diabetes_y_predicted))

# Weight (slope) and intercept of the fitted line
print("Weights: ", model.coef_)     # weights = coef_
print("Intercept: ", model.intercept_)
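*Mean squared error is simply the mean of (y_true − y_pred)² over the test rows. A short check that computing it by hand matches sklearn.metrics.mean_squared_error; the RMSE line is an optional extra, since its square root is in the same units as the target:*

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error

diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target
X_train, X_test, y_train, y_test = X[:-30], X[-30:], y[:-30], y[-30:]

model = linear_model.LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE = average of the squared differences between true and predicted values
manual_mse = np.mean((y_test - y_pred) ** 2)
print(manual_mse, mean_squared_error(y_test, y_pred))

# Its square root (RMSE) is in the same units as the target
print("RMSE:", np.sqrt(manual_mse))
```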

**Using all the columns (features): the equation is no longer a line but a hyperplane, with one weight per feature**
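*With all 10 features the model computes ŷ = b + w₁x₁ + … + w₁₀x₁₀, i.e. one dot product per row plus the intercept. A sketch that prints each named weight and confirms that X_test @ coef_ + intercept_ reproduces model.predict:*

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
X_train, y_train = diabetes.data[:-30], diabetes.target[:-30]
X_test = diabetes.data[-30:]

model = linear_model.LinearRegression().fit(X_train, y_train)

# One weight per feature
for name, w in zip(diabetes.feature_names, model.coef_):
    print(f"{name:>4}: {w:9.2f}")

# prediction = intercept + w1*x1 + ... + w10*x10, a dot product for each row
manual_pred = X_test @ model.coef_ + model.intercept_
print(np.allclose(manual_pred, model.predict(X_test)))   # True
```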

**MULTIPLE REGRESSION**

In [None]:
# Use all 10 features this time
diabetes_X1 = diabetes.data
diabetes_X_train1 = diabetes_X1[:-30]
diabetes_X_test1 = diabetes_X1[-30:]
diabetes_y_train = diabetes.target[:-30]
diabetes_y_test = diabetes.target[-30:]

# Refit the same LinearRegression object; this replaces the single-feature fit
model.fit(diabetes_X_train1, diabetes_y_train)
diabetes_y_predicted = model.predict(diabetes_X_test1)

from sklearn.metrics import mean_squared_error
print("Mean squared error is: ", mean_squared_error(diabetes_y_test, diabetes_y_predicted))

print("Weights: ", model.coef_)         # one weight per feature
print("Intercept: ", model.intercept_)

*Note that after using more features, the mean squared error dropped substantially compared with the single-feature model, indicating better predictions on the test set.*
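*One caveat when reading that comparison: for ordinary least squares, training MSE can never go up when features are added (the single-feature model is a special case with the other weights fixed at zero), so the meaningful comparison is on the held-out rows, and 30 rows give only one noisy estimate. A sketch that reports both train and test MSE for the two models side by side; fit_and_score is a hypothetical helper:*

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error

diabetes = datasets.load_diabetes()
y_train, y_test = diabetes.target[:-30], diabetes.target[-30:]

def fit_and_score(X):
    """Hypothetical helper: fit OLS on all but the last 30 rows, return (train MSE, test MSE)."""
    X_train, X_test = X[:-30], X[-30:]
    model = linear_model.LinearRegression().fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    return train_mse, test_mse

single = fit_and_score(diabetes.data[:, np.newaxis, 2])   # bmi only
multi = fit_and_score(diabetes.data)                      # all 10 features

print("single feature  (train, test):", single)
print("all 10 features (train, test):", multi)
```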