# UC-Berkeley Milling Dataset
> This notebook demonstrates how we can use PyPHM with the UC-Berkeley Milling Dataset. We also conduct some basic data exploration to help the reader understand the data.

The UC-Berkeley Milling Dataset is a collection of milling machine signals recorded during metal machining. The dataset contains 167 cuts on four different types of metal, along with periodic measurements of tool wear. Further details are in the Data Exploration section below.

The dataset is available from the [NASA Prognostic Data Repository](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#milling). Consider citing the creators of the dataset if you use it in academic research.

# Quickstart
Use PyPHM to quickly download and prepare the dataset.

Start by importing PyPHM, NumPy, pandas, and pathlib.

In [4]:
import numpy as np
import pandas as pd
from pyphm.datasets.milling import MillingPrepMethodA
from pathlib import Path

We will use `MillingPrepMethodA` to prepare the dataset. This method is similar to that described in Cheng et al. [cheng2019multisensory] and von Hahn et al. [].

Instantiate the `MillingPrepMethodA` class and download the dataset by setting `download` to `True`. Set the sliding-window length (`window_size`) to 64 and the amount each window moves (`stride`) to 64. These window size and stride values are also the defaults if those arguments are omitted.
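To make the `window_size` and `stride` arithmetic concrete, here is a minimal sketch of non-overlapping windowing with NumPy. The helper `sliding_windows` is hypothetical and only illustrates the idea; it is not PyPHM's implementation, and the 9000-step toy signal is synthetic. With `window_size == stride`, a signal of length $N$ yields $\lfloor (N - 64)/64 \rfloor + 1$ windows, and any leftover samples at the end are dropped.

```python
import numpy as np


def sliding_windows(signal: np.ndarray, window_size: int = 64, stride: int = 64) -> np.ndarray:
    """Split a (time, signals) array into windows of shape (n_windows, window_size, signals).

    Hypothetical helper for illustration only; not part of PyPHM.
    Trailing samples that do not fill a whole window are dropped.
    """
    # sliding_window_view appends the window axis last: (time - window_size + 1, signals, window_size)
    windows = np.lib.stride_tricks.sliding_window_view(signal, window_size, axis=0)
    # keep every `stride`-th window, then move the window axis in front of the signal axis
    return windows[::stride].transpose(0, 2, 1)


# a toy cut: 9000 time steps of 6 signals (synthetic data, not from the dataset)
cut = np.arange(9000 * 6, dtype=float).reshape(9000, 6)
windows = sliding_windows(cut, window_size=64, stride=64)
print(windows.shape)  # (140, 64, 6)
```

Setting `stride` smaller than `window_size` would produce overlapping windows and therefore more samples from the same cut.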

In [10]:
# define where the raw data folders will be kept,
# e.g. the milling data will be in path_data_raw_folder/milling/
path_data_raw_folder = Path.cwd().parent / 'data' / 'raw'
print(path_data_raw_folder)

# create the path_data_raw_folder if it does not exist
path_data_raw_folder.mkdir(parents=True, exist_ok=True)

# instantiate the MillingPrepMethodA class and download data if it does not exist
mill = MillingPrepMethodA(root=path_data_raw_folder, download=True, window_size=64, stride=64)

c:\_Python\PyPHM\data\raw


With the data-prep class instantiated, we can now prepare the dataset by calling its methods.

First, call the `create_xy_arrays` method. It creates two NumPy arrays: `x` contains the windowed signals from the milling machine, and `y` contains the labels and metadata for each window.

In [9]:
x, y = mill.create_xy_arrays()
print("x.shape", x.shape)
print("y.shape", y.shape)

x.shape (11570, 64, 6)
y.shape (11570, 64, 3)


The `x` array has a shape of `(11570, 64, 6)`: the number of samples (windows) in the dataset, the window size, and the number of signals in each window, respectively. The `y` array shares the first two dimensions and holds three values at each position.
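Different model libraries expect different layouts, so it helps to know how to rearrange these axes. A minimal sketch using a zero-filled stand-in array with the same shape as `x` (not the real data): the channels-first layout `(windows, signals, window_size)` matches what a 1-D convolution such as PyTorch's `Conv1d` expects, and flattening gives the 2-D `(n_samples, n_features)` input that classical models such as scikit-learn estimators expect.

```python
import numpy as np

# stand-in array with the same shape as x above: (windows, window_size, signals)
x_demo = np.zeros((11570, 64, 6), dtype=np.float32)

n_windows, window_size, n_signals = x_demo.shape

# channels-first layout (windows, signals, window_size), e.g. for a 1-D CNN
x_channels_first = np.transpose(x_demo, (0, 2, 1))

# flatten each window into one feature vector (windows, window_size * signals)
x_flat = x_demo.reshape(n_windows, -1)

print(x_channels_first.shape)  # (11570, 6, 64)
print(x_flat.shape)            # (11570, 384)
```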

In [1]:
import sys
# sys.path.append(r'/home/tim/Documents/PyPHM')
from pyphm.datasets.utils import _urlretrieve, extract_archive
from pyphm.datasets.ims import ImsDataLoad
from pathlib import Path
import hashlib
import py7zr
from rarfile import RarFile
import os


%load_ext autoreload
%autoreload 2

In [2]:
root_dir = Path.cwd().parent
print(root_dir)
path_data_raw_folder = Path(root_dir / 'data' )
print(path_data_raw_folder)
print(type(path_data_raw_folder))

/home/tim/Documents/PyPHM
/home/tim/Documents/PyPHM/data
<class 'pathlib.PosixPath'>


In [3]:
ims = ImsDataLoad(path_data_raw_folder, 'ims', download=False)

type(root) =  <class 'pathlib.PosixPath'>
/home/tim/Documents/PyPHM/data/ims


In [16]:
ims.download()

IMS.7z already exists.


In [4]:
ims.extract()

Extracting IMS.7z


In [8]:
l = list(Path(path_data_raw_folder / 'ims').glob('*.rar*'))

In [9]:
len(l)

0
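The glob above only searches the top level of the `ims` folder. If the `.rar` files end up in a subfolder (or elsewhere) after extraction, a recursive search with `Path.rglob` will still find them. A minimal sketch; `find_archives` is a hypothetical helper, not part of PyPHM.

```python
from pathlib import Path


def find_archives(root, pattern: str = "*.rar"):
    """Return a sorted list of files under `root`, searched recursively, that match `pattern`.

    Hypothetical helper for illustration; not part of PyPHM.
    """
    # rglob("*.rar") behaves like glob("**/*.rar"), so the root folder itself is included
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())
```

For example, `find_archives(path_data_raw_folder)` would list every `.rar` file anywhere under the data folder.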

In [3]:
extract_archive(Path.cwd() / '1st_test.rar')

'/home/tim/Documents/PyPHM/notebooks'

In [4]:
extract_archive(path_data_raw_folder / 'ims/IMS.7z')

'/home/tim/Documents/PyPHM/data/ims'

In [3]:
# extract .7z file
with py7zr.SevenZipFile(path_data_raw_folder / 'ims/IMS.7z', mode='r') as z:
    z.extractall()

In [6]:
# list all files in the cwd
print(os.listdir())


['test.jpg', '2nd_test.rar', 'Readme Document for IMS Bearing Data.pdf', 'test_ims.ipynb', 'test.ipynb', '3rd_test.rar', '1st_test.rar']


In [7]:
with RarFile(Path.cwd() / '1st_test.rar') as rf:
    rf.extractall()

In [None]:
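# tarfile cannot read .7z archives, so use py7zr and extract into the ims folder
with py7zr.SevenZipFile(path_data_raw_folder / 'ims/IMS.7z', mode='r') as z:
    z.extractall(path=path_data_raw_folder / 'ims')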

In [8]:
extract_archive(path_data_raw_folder / 'ims/IMS.7z')

RuntimeError: Unknown compression or archive type: '.7z'.
Known suffixes are: '['.bz2', '.gz', '.tar', '.tbz', '.tbz2', '.tgz', '.xz', '.zip']'.
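The error occurs because `extract_archive` does not recognize the `.7z` suffix (its known suffixes are listed in the message). One workaround is to choose the extractor by file suffix. This is a sketch, assuming the `py7zr` and `rarfile` packages used in the cells above are installed (`rarfile` also needs an unrar backend). The helper `extract_any` is hypothetical and not part of PyPHM; other formats are passed to the standard library's `shutil.unpack_archive`, which handles zip and tar variants.

```python
import shutil
from pathlib import Path


def extract_any(archive, dest) -> Path:
    """Extract an archive into `dest`, choosing the library by file suffix.

    Hypothetical helper for illustration; not part of PyPHM.
    """
    archive, dest = Path(archive), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    suffix = archive.suffix.lower()

    if suffix == ".7z":
        import py7zr  # third-party, imported only when a .7z file is extracted
        with py7zr.SevenZipFile(archive, mode="r") as z:
            z.extractall(path=dest)
    elif suffix == ".rar":
        from rarfile import RarFile  # third-party, needs an unrar backend
        with RarFile(archive) as rf:
            rf.extractall(path=dest)
    else:
        # zip, tar, tar.gz, tar.bz2 and tar.xz are handled by the standard library
        shutil.unpack_archive(archive, dest)
    return dest
```

Importing the third-party packages inside each branch means a plain zip or tar archive can be extracted even if `py7zr` or `rarfile` is not installed.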

In [None]:
ims = ImsDataLoad(path_data_raw_folder, download=True)

In [None]:
mill = MillingPrepMethodA(path_data_raw_folder, download=False)

In [None]:
x, y = mill.create_xy_arrays()
print("x.shape", x.shape)
print("y.shape", y.shape)

In [None]:
y[0,0,:]

In [None]:
df = mill.create_xy_dataframe()
df.head()

In [None]:
df.shape

In [None]:
y.shape

In [None]:
x.shape

In [None]:
# sys.path.append(root_dir / 'pyphm')
from pyphm.datasets.utils import _urlretrieve

In [None]:
# MD5 checksum of the downloaded archive (stored in the ims folder)
archive_path = path_data_raw_folder / 'ims' / 'IMS.7z'
print(hashlib.md5(archive_path.read_bytes()).hexdigest())
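Reading the whole archive into memory before hashing can be costly for large downloads. A minimal sketch of chunked hashing with the standard library's `hashlib`; the helper `md5_of_file` and its 1 MiB `chunk_size` are illustrative choices, not part of PyPHM. On Python 3.11 and later, `hashlib.file_digest(f, "md5")` does the same job.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Hypothetical helper for illustration; chunk_size (1 MiB) is an arbitrary choice.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The digest is identical to hashing the whole file at once; only peak memory use changes.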

In [None]:
# _urlretrieve('https://files.realpython.com/media/Python-Imports_Watermarked.ae72c8a00197.jpg', 'test.jpg')

In [None]:
import sys
sys.path

In [None]:
import pyphm