
Profile Viewer Notebook: Drift calculations corrections #461

Closed · FelipeAdachi opened this issue Mar 2, 2022 · 1 comment
Labels: bug (Something isn't working)

FelipeAdachi (Contributor):

Summary

Experimenting with summary_drift_report, I found cases where two very similar distributions are tagged as "severe drift".
In the image below, we have two profiles of the same features, with 5k samples each, drawn from the same distribution. From the report, I understand it's tagged as "severe drift" because the p-values fall within the range for "severe drift". However, the two distributions are nearly identical.

[Image: summary_drift_report output tagging the feature as "severe drift"]
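
For reference on how a p-value maps to these tags, here is a hypothetical sketch of the kind of bucketing the report applies; the 0.05 / 0.15 cut-offs below are illustrative assumptions, not whylogs' actual thresholds:

def drift_category(p_value: float) -> str:
    # NOTE: these cut-offs are assumptions for illustration only,
    # not the constants whylogs actually uses.
    if p_value < 0.05:
        return "severe drift"
    if p_value < 0.15:
        return "possible drift"
    return "no drift"

print(drift_category(0.36))  # -> "no drift" for the p-value reported below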

We can see the similarity in the double_histogram plot:

[Image: double_histogram plot of the two profiles, showing nearly identical distributions]

I also stored the features from both profiles and ran a KS test, which yielded a "no-drift" result with a p-value of 0.36.
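
For clarity, this is the kind of check I mean: a minimal sketch using scipy's two-sample KS test on raw feature values, with placeholder data standing in for the stored features:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, 5000)  # stand-in for the stored reference values
target = rng.normal(0, 1, 5000)     # stand-in for the stored target values

# Two samples drawn from the same distribution should yield a large p-value,
# i.e. the null hypothesis of "no drift" is not rejected.
statistic, p_value = ks_2samp(reference, target)
print(f"KS statistic={statistic:.4f}, p-value={p_value:.4f}")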

Steps to Reproduce

To generate the distributions with a higher number of samples, and to be able to access the values as lists after logging them, I changed the code in the Profile_Viewer_In_Notebook a bit:

import datetime

import numpy as np
from whylogs import get_or_create_session

session = get_or_create_session()

def profile_generator(size=500):
    # `distribution` is the mixture distribution array defined earlier in the notebook.
    mixture = []
    mixture_1 = []
    mixture_2 = []
    mixture_3 = []
    mixture_4 = []

    with session.logger("mytestytest", dataset_timestamp=datetime.datetime(2021, 6, 2)) as logger:
        for _ in range(size):
            # Draw one sample per feature, all from the same mixture distribution.
            mix_sample = np.random.choice(distribution, 1)[0]
            mix1_sample = np.random.choice(distribution, 1)[0]
            mix2_sample = np.random.choice(distribution, 1)[0]
            mix3_sample = np.random.choice(distribution, 1)[0]
            mix4_sample = np.random.choice(distribution, 1)[0]

            logger.log({"uniform_integers": np.random.randint(0, 50)})
            logger.log({"mixture_distribution": mix_sample})
            logger.log({"1mixture_distribution": mix1_sample})
            logger.log({"2mixture_distribution": mix2_sample})
            logger.log({"3mixture_distribution": mix3_sample})
            logger.log({"4mixture_distribution": mix4_sample})
            logger.log({"nulls": None})

            # Keep the raw values so they can be compared outside whylogs later.
            mixture.append(mix_sample)
            mixture_1.append(mix1_sample)
            mixture_2.append(mix2_sample)
            mixture_3.append(mix3_sample)
            mixture_4.append(mix4_sample)

        logger.log({"moah_data": 1})
        logger.log({"moah_data": 1})
        logger.log({"moah_data": 5})

        dists = {
            "mixture": mixture,
            "mixture_1": mixture_1,
            "mixture_2": mixture_2,
            "mixture_3": mixture_3,
            "mixture_4": mixture_4,
        }

        return logger.profile, dists

target_profile, target_dists = profile_generator(size=5000)
reference_profile, reference_dists = profile_generator(size=5000)
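
Note that the snippet assumes the `distribution` array defined earlier in the notebook. As a self-contained stand-in, something like the following mixture works; the components here are an assumption, not the notebook's exact values:

import numpy as np

# Illustrative stand-in for the notebook's `distribution`: samples from a
# two-component Gaussian mixture. Means and scales are assumptions.
distribution = np.concatenate([
    np.random.normal(loc=-2.0, scale=1.0, size=5000),
    np.random.normal(loc=3.0, scale=0.5, size=5000),
])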

This yielded the profiles shown in the images above.

With the lists of values for 1mixture_distribution, I ran the KS test as follows:

from alibi_detect.cd import KSDrift
import numpy as np
import matplotlib.pyplot as plt

# Initialize the drift detector with the reference sample.
drift_detector = KSDrift(np.array(reference_dists['mixture_1']), p_val=0.01)
# Run the KS test of the target sample against the reference.
print(drift_detector.predict(np.array(target_dists['mixture_1']), return_p_val=True, return_distance=True))

# Plot both samples to inspect the overlap visually.
plt.hist(reference_dists['mixture_1'], alpha=0.75, label="reference")
plt.hist(target_dists['mixture_1'], alpha=0.5, label="production")
plt.legend()
plt.show()

This yielded the following result:
{'data': {'is_drift': 0, 'distance': array([0.0184], dtype=float32), 'p_val': array([0.36134896], dtype=float32), 'threshold': 0.01}, 'meta': {'name': 'KSDrift', 'detector_type': 'offline', 'data_type': None, 'version': '0.8.1'}}
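
For contrast, feeding the same detector a genuinely shifted sample should flip is_drift to 1; a minimal sketch, reusing drift_detector and target_dists from above:

# Shift the target values to simulate real drift; the detector should now
# report is_drift=1 with a p-value below the 0.01 threshold.
shifted = np.array(target_dists['mixture_1']) + 2.0
print(drift_detector.predict(shifted, return_p_val=True, return_distance=True))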

What is the expected correct behavior?

Maybe I'm misunderstanding the results, but the drift report differs from what I'd expect: for this case, no drift alert should be raised between the two distributions.

@FelipeAdachi added the bug label on Mar 2, 2022

FelipeAdachi (Contributor, Author):

This was solved in PR #483.
