# Download & Save Kaggle Dataset to Google Drive

## Introduction
**Goals**

* Authenticate with Kaggle.
* Download the dataset.
* Save it to a structured folder in Google Drive.

**Description**

This notebook is the first step in the breast cancer classification project.

It downloads the Breast Cancer Wisconsin (Diagnostic) dataset (originally from the UCI Machine Learning Repository) from a copy hosted on Kaggle, saves the raw files to Google Drive for later analysis, and previews the first few rows.

- Source: [UCI ML Repository](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic))
- Format: CSV
- Samples: 569
- Features: 30 numeric features + ID + diagnosis
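The figures above (569 samples, an ID column, a diagnosis column and 30 numeric features) can double as a quick sanity check once the CSV is on disk. The sketch below is illustrative and not part of the original notebook: it uses only the standard `csv` module, and the helper names `csv_summary` and `check_expected_shape` are hypothetical. Because the Kaggle copy used here may carry extra columns beyond the UCI layout, it checks for required columns rather than an exact column count.

```python
import csv


def csv_summary(path):
    """Return (number_of_data_rows, header) for a CSV file with a header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # csv.reader yields [] for blank lines; don't count those as samples
        n_rows = sum(1 for row in reader if row)
    return n_rows, header


def check_expected_shape(path, expected_rows=569, required=("id", "diagnosis")):
    """Raise ValueError if the row count or required columns don't match.

    Hypothetical helper: the defaults mirror the dataset description above.
    """
    n_rows, header = csv_summary(path)
    if n_rows != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {n_rows}")
    missing = [c for c in required if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return n_rows, header
```

For example, `check_expected_shape(csv_path)` could be run right after the dataset is saved, so a truncated or wrong file fails loudly before any modelling starts.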

## Mount Google Drive

```python
from google.colab import drive
drive.mount('/content/drive')
```

```text
Mounted at /content/drive
```
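If the mount step is skipped or fails, the later `os.makedirs` call would quietly create the project folder on the Colab VM's local disk instead of in Drive, and anything saved there disappears when the runtime is recycled. A small guard can make that failure explicit. This is a sketch: `require_existing_dir` is a hypothetical helper, and `DRIVE_ROOT` is an assumption taken from the path prefix this notebook uses for `TARGET_DIR`.

```python
import os


def require_existing_dir(path):
    """Return `path` unchanged, or raise FileNotFoundError if it is not a directory."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"{path!r} not found; is Google Drive mounted?")
    return path


# Assumption: after drive.mount(), the Drive root appears at this path,
# matching the prefix used for TARGET_DIR in this notebook.
DRIVE_ROOT = "/content/drive/My Drive"
```

Calling `require_existing_dir(DRIVE_ROOT)` just before creating the project directory turns a silent local write into an immediate, readable error.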


## Setup & Imports

```python
# Install KaggleHub
!pip install -q kagglehub
```

```python
import pandas as pd
import numpy as np

import os
import kagglehub
import shutil
from google.colab import files
```

## Create Project Directory

```python
# Define the project's raw-data folder in Drive
TARGET_DIR = "/content/drive/My Drive/Portfolio/DataSciencePortfolio/Projects/Breast-Cancer/data/raw"

# Create the directory (and any missing parents) if it doesn't exist
os.makedirs(TARGET_DIR, exist_ok=True)
print("Saving dataset to:", TARGET_DIR)
```

```text
Saving dataset to: /content/drive/My Drive/Portfolio/DataSciencePortfolio/Projects/Breast-Cancer/data/raw
```
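The same step can be expressed with `pathlib`, building the folder from a project root so that other data folders can share one base path. This is an alternative sketch, not the notebook's own code; `ensure_dir` and `PROJECT_ROOT` are hypothetical names, with `PROJECT_ROOT` assumed to be the parent of the `data/raw` folder used above. `Path.mkdir(parents=True, exist_ok=True)` behaves like `os.makedirs(..., exist_ok=True)`: it creates missing parents and does nothing if the folder already exists.

```python
from pathlib import Path


def ensure_dir(base, *parts):
    """Create base/parts... (including missing parents) and return it as a Path.

    Safe to call repeatedly: exist_ok=True makes it a no-op if the folder exists.
    """
    path = Path(base).joinpath(*parts)
    path.mkdir(parents=True, exist_ok=True)
    return path


# Assumed project root: the parent of the data/raw folder used in this notebook.
PROJECT_ROOT = Path(
    "/content/drive/My Drive/Portfolio/DataSciencePortfolio/Projects/Breast-Cancer"
)
```

Usage: `TARGET_DIR = ensure_dir(PROJECT_ROOT, "data", "raw")`. The result is a `Path`, which `os.path.join`, `shutil` and `pd.read_csv` all accept.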


## Authenticate with Kaggle

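```python
# Upload the Kaggle API key (kaggle.json) when prompted;
# it is saved to the current working directory
files.upload()
```

### Move kaggle.json to the correct location

```python
# Move the key into ~/.kaggle and restrict it to owner read/write.
# Moving (rather than copying) avoids leaving a stray copy of the key in /content.
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
```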
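As a pure-Python alternative to the shell commands above, the key can be validated before it is installed. A Kaggle API token file is JSON with `username` and `key` fields; the sketch below checks for both, moves the file into `~/.kaggle`, and sets mode `0o600` (the equivalent of `chmod 600`). The helper name `install_kaggle_json` and its parameters are hypothetical, and the permission check in the usage assumes a POSIX system such as the Colab VM.

```python
import json
import os
import shutil
import stat


def install_kaggle_json(src="kaggle.json", kaggle_dir=os.path.expanduser("~/.kaggle")):
    """Validate a Kaggle API token file and move it into `kaggle_dir`.

    Returns the destination path; raises ValueError if a field is missing.
    """
    with open(src) as f:
        token = json.load(f)
    missing = [field for field in ("username", "key") if field not in token]
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {missing}")

    os.makedirs(kaggle_dir, exist_ok=True)
    dest = os.path.join(kaggle_dir, "kaggle.json")
    shutil.move(src, dest)  # move, so no copy of the key stays behind
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)  # same as chmod 600
    return dest
```

Validating first means a wrong upload (for example, a different JSON file) fails with a clear message instead of surfacing later as an authentication error during the download.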

## Download Dataset from Kaggle

```python
# Kaggle dataset handle in "owner/dataset-name" form
DATASET = "wasiqaliyasir/breast-cancer-dataset"

# Download the dataset; KaggleHub returns the local folder holding the files
dataset_path = kagglehub.dataset_download(DATASET)
print("Dataset downloaded to:", dataset_path)
```

```text
Using Colab cache for faster access to the 'breast-cancer-dataset' dataset.
Dataset downloaded to: /kaggle/input/breast-cancer-dataset
```
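`kagglehub.dataset_download` returns a local directory (here a Colab cache under `/kaggle/input`), and it can help to see what that directory actually contains before copying it. The hypothetical helper below lists the top-level files and their sizes with `os.scandir`, skipping subdirectories the same way the copy loop in the next section does.

```python
import os


def list_files(directory):
    """Return sorted (name, size_in_bytes) pairs for regular files in `directory`.

    Subdirectories are skipped, matching the os.path.isfile filter
    used when copying the dataset to Drive.
    """
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file():  # follows symlinks, like os.path.isfile
                entries.append((entry.name, entry.stat().st_size))
    return sorted(entries)
```

Usage: `for name, size in list_files(dataset_path): print(f"{name}: {size:,} bytes")`. If the listing shows nested folders, the top-level copy loop would miss their contents.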


## Save Dataset to Drive

```python
# Copy every top-level file (subdirectories are skipped) from the
# KaggleHub download folder into the Drive project folder
for filename in os.listdir(dataset_path):
    src = os.path.join(dataset_path, filename)
    dst = os.path.join(TARGET_DIR, filename)
    if os.path.isfile(src):
        shutil.copy(src, dst)

print("Dataset saved to Drive at:", TARGET_DIR)
```

```text
Dataset saved to Drive at: /content/drive/My Drive/Portfolio/DataSciencePortfolio/Projects/Breast-Cancer/data/raw
```
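Writes to a mounted Drive folder can occasionally be incomplete, so a variant of the copy loop can verify each file after copying. This is a sketch with a hypothetical helper name, `copy_files_verified`; it swaps `shutil.copy` for `shutil.copy2` (which also preserves modification times) and checks each copy with `filecmp.cmp(..., shallow=False)`.

```python
import filecmp
import os
import shutil


def copy_files_verified(src_dir, dst_dir):
    """Copy top-level files from src_dir to dst_dir and verify each copy byte-for-byte.

    Returns the sorted list of copied file names; raises IOError on a mismatch.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories, as the loop above does
        dst = os.path.join(dst_dir, name)
        shutil.copy2(src, dst)  # copy2 also preserves modification times
        if not filecmp.cmp(src, dst, shallow=False):
            raise IOError(f"copy of {name} does not match the source")
        copied.append(name)
    return copied
```

`shallow=False` matters here: because `copy2` preserves size and modification time, a shallow comparison could match on metadata alone without ever reading the bytes.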


## Preview the Data

```python
# Load the copy saved in Drive and show the first five rows
csv_path = os.path.join(TARGET_DIR, "Breast_cancer_dataset.csv")
df = pd.read_csv(csv_path)

df.head()
```

| Unnamed: 0 | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |  |
| 1 | 842517 | M | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |  |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |  |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |  |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |  |
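The preview shows an `Unnamed: 32` column with no values in these rows and an `Unnamed: 0` column that looks like a leftover row index. Five rows are not enough to conclude a column is empty, so a quick pass over the whole file can confirm it. This sketch uses the standard `csv` module and a hypothetical helper, `blank_columns`; note that `csv` reports header names exactly as written in the file, which may differ from the `Unnamed: N` labels pandas generates for empty header cells.

```python
import csv


def blank_columns(path):
    """Return header names of columns whose values are empty in every data row.

    Values missing from short rows are treated as blank.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        has_value = [False] * len(header)
        for row in reader:
            for i, value in enumerate(row[: len(header)]):
                if value.strip():
                    has_value[i] = True
    return [name for name, seen in zip(header, has_value) if not seen]
```

Usage: `blank_columns(csv_path)`. Columns it reports are candidates to drop in a later cleaning step; this check only inspects the raw file and does not modify it.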
