
ML system testing concepts #121

yeomko22 opened this issue Sep 21, 2021 · 4 comments

why test?

  • We should be able to measure our confidence in the system.
  • Analyse historic system behaviour through monitoring.

predicting reliability

  • Most software systems change over time.
  • When they do, we should be able to predict how a change will affect the system.

functionality

  • confidence that functionality remains unchanged
  • In other words, a test is a way of showing that our system behaves as we expect, even when the system changes.
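
This idea can be illustrated with a minimal, hypothetical example: a unit test pins down a function's expected behaviour, so a later change cannot silently alter its functionality without the test failing (the `normalise` function below is invented for illustration).

```python
import unittest

def normalise(values):
    # Scale a list of numbers so they sum to 1 (hypothetical example).
    total = sum(values)
    return [v / total for v in values]

class TestNormalise(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalise([2, 3, 5])), 1.0)

    def test_preserves_ratios(self):
        # Pins exact expected outputs: a refactor that changes behaviour
        # will fail this test even though the code still "runs".
        self.assertEqual(normalise([1, 1, 2]), [0.25, 0.25, 0.5])
```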

Testing ML Systems

  • ML systems need to be tested more carefully. Why? Because their rules are defined less explicitly than in conventional software.

Key testing principles for ML: pre-deployment

  • use a schema for features
  • model specification test: changes to the model config need unit tests.
  • validate model quality: test for both sudden degradation and slow degradation.
  • test input feature code
  • training is reproducible: fix random seeds, etc.
  • integration test the pipeline
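
The reproducibility point can be sketched as follows, assuming all training randomness flows through one seeded generator (the `train` function below is a stand-in, not a real trainer):

```python
import random

def train(seed):
    # Stand-in for model training: the "learned weights" are just draws
    # from a seeded RNG, so the same seed yields the same model.
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

# Two runs with the same seed must produce identical results; in a real
# ML pipeline you would also seed numpy, the ML framework, data shuffling, etc.
assert train(seed=42) == train(seed=42)
```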


testing theory


  • unit test: tests a single piece of logic or a single class
  • integration test: tests assembled components
  • system test: end-to-end test
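
The first two levels can be sketched with a toy, hypothetical text-preprocessing component:

```python
import unittest

def clean(text):
    return text.strip().lower()

def tokenize(text):
    return text.split()

def preprocess(text):
    # The "assembled component": clean, then tokenize.
    return tokenize(clean(text))

class TestPreprocessing(unittest.TestCase):
    def test_clean_unit(self):
        # unit test: a single function in isolation
        self.assertEqual(clean('  Hello '), 'hello')

    def test_preprocess_integration(self):
        # integration test: the functions working together
        self.assertEqual(preprocess('  Hello World '), ['hello', 'world'])
```

A system test would go one level further and exercise the whole pipeline end to end, e.g. from raw input through to a model prediction.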

How much testing?

  • How should we prioritise across the code base?
  • What is mission critical?
  • Does each test reduce uncertainty about the system?


test data schema

```python
import unittest

iris_schema = {
    'sepal length': {
        'range': {
            'min': 4.0,  # determined by looking at the dataframe .describe() method
            'max': 8.0
        },
        'dtype': float,
    },
    'sepal width': {
        'range': {
            'min': 1.0,
            'max': 5.0
        },
        'dtype': float,
    },
    'petal length': {
        'range': {
            'min': 1.0,
            'max': 7.0
        },
        'dtype': float,
    },
    'petal width': {
        'range': {
            'min': 0.1,
            'max': 3.0
        },
        'dtype': float,
    }
}


class TestIrisInputData(unittest.TestCase):
    def setUp(self):
        # `setUp` will be run before each test, ensuring that you
        # have a new pipeline to access in your tests. See the
        # unittest docs if you are unfamiliar with unittest.
        # https://docs.python.org/3/library/unittest.html#unittest.TestCase.setUp
        self.pipeline = SimplePipeline()
        self.pipeline.run_pipeline()

    def test_input_data_ranges(self):
        # get df max and min values for each column
        max_values = self.pipeline.frame.max()
        min_values = self.pipeline.frame.min()

        # loop over each feature (i.e. all 4 column names)
        for feature in self.pipeline.feature_names:
            # use unittest assertions to ensure the max/min values found in
            # the dataset fall within the schema's max/min bounds.
            self.assertTrue(max_values[feature] <= iris_schema[feature]['range']['max'])
            self.assertTrue(min_values[feature] >= iris_schema[feature]['range']['min'])

    def test_input_data_types(self):
        data_types = self.pipeline.frame.dtypes  # pandas dtypes attribute
        for feature in self.pipeline.feature_names:
            self.assertEqual(data_types[feature], iris_schema[feature]['dtype'])
```

  • We define the schema in advance, then test whether the input data satisfies the min and max given in the schema and whether the data types match.
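
The tests above depend on a `SimplePipeline` class that is not shown in this issue. A minimal hypothetical stand-in (the real class presumably loads the full iris dataset) could look like:

```python
import pandas as pd

class SimplePipeline:
    # Hypothetical stand-in: loads a tiny in-memory sample instead of the
    # full iris dataset, exposing .frame and .feature_names as the tests expect.
    def __init__(self):
        self.frame = None
        self.feature_names = ['sepal length', 'sepal width',
                              'petal length', 'petal width']

    def run_pipeline(self):
        self.frame = pd.DataFrame(
            [[5.1, 3.5, 1.4, 0.2],
             [6.3, 2.9, 5.6, 1.8]],
            columns=self.feature_names,
        )
```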


testing data engineering

```python
import unittest


class TestIrisDataEngineering(unittest.TestCase):
    def setUp(self):
        self.pipeline = PipelineWithDataEngineering()
        self.pipeline.load_dataset()

    def test_scaler_preprocessing_brings_x_train_mean_near_zero(self):
        # Given
        # convert the dataframe to be a single column with pandas stack
        original_mean = self.pipeline.X_train.stack().mean()

        # When
        self.pipeline.apply_scaler()

        # Then
        # The idea behind StandardScaler is that it will transform your data
        # to center the distribution at 0 and scale the variance at 1.
        # Therefore we test that the mean has shifted to be less than the original
        # and close to 0 using assertAlmostEqual to check to 3 decimal places:
        # https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmostEqual
        self.assertTrue(original_mean > self.pipeline.X_train.mean())  # X_train is a numpy array at this point.
        self.assertAlmostEqual(self.pipeline.X_train.mean(), 0.0, places=3)
        print(f'Original X train mean: {original_mean}')
        print(f'Transformed X train mean: {self.pipeline.X_train.mean()}')

    def test_scaler_preprocessing_brings_x_train_std_near_one(self):
        # When
        self.pipeline.apply_scaler()

        # Then
        # We also check that the standard deviation is close to 1
        self.assertAlmostEqual(self.pipeline.X_train.std(), 1.0, places=3)
        print(f'Transformed X train standard deviation : {self.pipeline.X_train.std()}')
```

  • We apply a StandardScaler preprocessing step to the data.
  • We then test that this preprocessing was applied the way we intended.
  • It is important to split the data-preprocessing logic into pieces that are easy to test.
  • A good idea is to make the pipeline itself a class, run it only up to load_data, and then execute and test each subsequent step in turn.
  • That is, the preprocessing functions return nothing and operate only on the frame inside the pipeline object. The bigdata-platform project would have been much easier if its tests had been written this way.
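
The `PipelineWithDataEngineering` class referenced above is likewise not shown in this issue. A hypothetical sketch of the design described in these notes, where each step mutates the pipeline's internal state and returns nothing, might look like:

```python
import numpy as np
import pandas as pd

class PipelineWithDataEngineering:
    # Hypothetical sketch: each step mutates self.X_train in place.
    def __init__(self):
        self.X_train = None

    def load_dataset(self):
        # Stand-in data; the real pipeline would load and split iris.
        rng = np.random.default_rng(0)
        self.X_train = pd.DataFrame(rng.normal(loc=5.0, scale=2.0, size=(100, 4)))

    def apply_scaler(self):
        # Standardise like sklearn's StandardScaler: zero mean and unit
        # variance per column; the result is a plain numpy array.
        values = self.X_train.to_numpy()
        self.X_train = (values - values.mean(axis=0)) / values.std(axis=0)
```

Because every step works on `self.X_train`, a test can run the pipeline up to any point and assert on the intermediate state, exactly as the tests above do.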
