# Starter Code for Modules 1, 2, and 3 Research Discussions



## Module 1: Evaluating the Effectiveness of Pre-trained Models from Different Hubs

In this module, you will explore and evaluate pre-trained models from Kaggle, PyTorch Hub, TensorFlow Hub, and Hugging Face. For each model, consider its performance, the computational resources it requires, and its suitability for specific tasks.
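
To make the comparison of computational cost concrete, one option is to time each model's inference call under identical conditions. The helper below is a minimal sketch, not part of the original starter code: `benchmark` and `dummy_model` are hypothetical names, and `dummy_model` is only a stand-in for a real call such as `nlp_classifier(text)`. For GPU models, timings may additionally need explicit synchronization (for example `torch.cuda.synchronize()`) before reading the clock.

```python
import statistics
import time


def benchmark(fn, *args, n_runs=20, warmup=3):
    """Time fn(*args) over n_runs calls and summarize the latency in seconds."""
    # Warm-up calls absorb one-off costs such as lazy initialization or caching.
    for _ in range(warmup):
        fn(*args)

    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)

    return {
        "runs": n_runs,
        "median_s": statistics.median(timings),
        "mean_s": statistics.fmean(timings),
    }


def dummy_model(text):
    """Hypothetical stand-in for a model call; replace with a real inference function."""
    return sum(ord(ch) for ch in text) % 2


result = benchmark(dummy_model, "This is a great movie!")
print(result)
```

The median is usually a steadier summary than the mean here, because a few slow calls can skew the average.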

### Starter Code


In [6]:

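# Hugging Face Example
from transformers import pipeline
# No model is named here, so the pipeline falls back to its default sentiment checkpoint;
# pass `model=` (and `revision=`) explicitly when results need to be reproducible.
nlp_classifier = pipeline("sentiment-analysis")
print(nlp_classifier("This is a great movie!"))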

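# PyTorch Hub Example
import torch
# `weights=` replaces the deprecated `pretrained=True` flag (torchvision >= 0.13)
model = torch.hub.load('pytorch/vision:v0.16.0', 'resnet50', weights='IMAGENET1K_V1')
model.eval()  # inference mode: disables dropout and uses running batch-norm statistics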

# TensorFlow Hub Example
import tensorflow_hub as hub
tf_model = hub.KerasLayer("https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/5")
print(tf_model)

# Kaggle Example (assumes the Kaggle API is configured)
# Note: this command downloads a dataset (Fashion-MNIST), not a pre-trained model.
# !kaggle datasets download -d zalando-research/fashionmnist


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Using cache found in C:\Users\zhatz/.cache\torch\hub\pytorch_vision_v0.16.0


[{'label': 'POSITIVE', 'score': 0.9998748302459717}]


AttributeError: module 'tensorflow' has no attribute '__version__'

In [5]:
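# Sanity check: a healthy TensorFlow install exposes `tf.__version__`.
# If this raises AttributeError, the `tensorflow` import is likely resolving to a
# broken or partial install, or to a local file or folder named `tensorflow`.
import tensorflow as tf
print(tf.__version__)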

AttributeError: module 'tensorflow' has no attribute '__version__'
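
One common cause of the `AttributeError` above is that Python resolves `tensorflow` to something other than a complete installation, such as an empty leftover directory or a local file with the same name. The sketch below is an assumption about how one might check this, not a confirmed diagnosis of this notebook's environment; `describe_module` is a hypothetical helper. It uses only the standard library and does not import the module it inspects.

```python
import importlib.util


def describe_module(name):
    """Report where a top-level module name resolves from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "namespace_package": False}
    # A namespace package has search locations but no concrete origin file,
    # which is what a leftover, partially removed install often looks like.
    is_namespace = spec.origin is None and spec.submodule_search_locations is not None
    return {"found": True, "origin": spec.origin, "namespace_package": is_namespace}


for module_name in ("json", "tensorflow"):
    print(module_name, describe_module(module_name))
```

If `tensorflow` is reported as a namespace package, or its origin points into the project directory rather than `site-packages`, reinstalling the package or renaming the shadowing file is a reasonable next step.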


## Module 2: Impact of Data Preprocessing Techniques on Model Accuracy

In this module, you will evaluate how preprocessing steps such as data cleaning, normalization, missing-value handling, and outlier detection affect model accuracy, using an example dataset.
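
Outlier detection can be sketched with the interquartile range (IQR) rule, which flags values far outside the middle half of the data. This is a minimal illustration under stated assumptions: `iqr_outliers` is a hypothetical helper, the multiplier `k=1.5` is a conventional default rather than a requirement, and the sample values are made up.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up measurements with one obviously extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_outliers(sample))  # [100]
```

Whether a flagged value should be removed, capped, or kept depends on the dataset; the rule only identifies candidates for review.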

### Starter Code


In [7]:
# Import libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

# Introduce an artificial missing value (for demonstration purposes)
X.iloc[0, 0] = None

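# Handle missing values using mean imputation
# Note: the imputer and scaler are fit on the full dataset before the split, so
# test-set statistics leak into preprocessing. This is acceptable for a quick demo;
# for a real evaluation, split first and fit these steps on the training data only.
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)

# Standardize features to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_imputed)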

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# Train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate the model
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")


Accuracy: 1.00
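
The starter code above fits the imputer and scaler on the full dataset before splitting. A variant that avoids this is to split first and wrap the preprocessing steps and the classifier in a scikit-learn `Pipeline`, so that imputation and scaling statistics come from the training split only. The following is a sketch of that idea, not a replacement for the original cell; the step names (`"imputer"`, `"scaler"`, `"clf"`) are arbitrary labels chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = X.copy()
X[0, 0] = np.nan  # artificial missing value, as in the starter code

# Split BEFORE any preprocessing so the test set stays unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=42)),
])

# fit() learns imputation means and scaling parameters from X_train only;
# predict() reuses those learned parameters on X_test.
pipe.fit(X_train, y_train)
accuracy = accuracy_score(y_test, pipe.predict(X_test))
print(f"Accuracy: {accuracy:.2f}")
```

Swapping or removing individual steps in the pipeline is a straightforward way to compare how each preprocessing choice affects accuracy.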



## Module 3: Feature Engineering and Selection for Enhanced Predictive Performance

This module demonstrates feature engineering, selection, and dimensionality reduction using an example dataset.
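
Feature engineering means deriving new columns from existing ones before selection or modeling. The sketch below adds two derived features to the Iris data; both features (`petal_area` and `sepal_ratio`) are hypothetical examples chosen for illustration, not features the module prescribes.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # columns: sepal length, sepal width, petal length, petal width (cm)

# Hypothetical engineered features, chosen only for illustration.
petal_area = X[:, 2] * X[:, 3]
sepal_ratio = X[:, 0] / X[:, 1]

# Append the derived columns to the original four features.
X_engineered = np.column_stack([X, petal_area, sepal_ratio])
feature_names = list(iris.feature_names) + [
    "petal area (cm^2)",
    "sepal length/width ratio",
]

print(X_engineered.shape)
print(feature_names)
```

The engineered matrix can then be passed to a feature selector to check whether the derived columns score higher than the raw measurements.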

### Starter Code


In [8]:

# Import libraries
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load example dataset
iris = load_iris()
X, y = iris.data, iris.target

# Feature selection
selector = SelectKBest(mutual_info_classif, k=2)
X_selected = selector.fit_transform(X, y)

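# Dimensionality reduction
# Note: only 2 features remain after selection, so n_components=2 re-centers and
# rotates the data onto its principal axes without reducing its dimensionality.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_selected)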

# Split data
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)

# Train model (fixed random_state for reproducible results)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate model
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2f}")


Accuracy: 1.00
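
An accuracy figure alone does not show what the selection and reduction steps actually did. The sketch below, which is an addition and not part of the original cell, inspects which features `SelectKBest` keeps and how much variance two principal components retain when PCA is applied to all four original features. Fixing `random_state=0` through `functools.partial` is an assumption made here so that the mutual-information scores are repeatable.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

iris = load_iris()
X, y = iris.data, iris.target

# Fix the random_state of the mutual-information estimator so scores are repeatable.
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func, k=2).fit(X, y)

# get_support() returns a boolean mask over the input features.
selected = [
    name for name, keep in zip(iris.feature_names, selector.get_support()) if keep
]
print("Selected features:", selected)

# Apply PCA to all four original features to see how much variance two components retain.
pca = PCA(n_components=2).fit(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

Comparing a model trained on the selected features with one trained on the PCA components is one way to evaluate the two approaches separately instead of chaining them.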
