
Profile Viewer Notebook: Drift calculations corrections #461

Closed · FelipeAdachi opened this issue Mar 2, 2022 · 1 comment
Labels: bug (Something isn't working)

FelipeAdachi (Contributor):

Summary

Experimenting with summary_drift_report, I found cases where two very similar distributions are tagged as "severe drift".
In the image below, we have two profiles of the same features, with 5k samples each, drawn from the same distribution. From the report, I understand it's tagged as "severe drift" because the p-values fall within the range for "severe drift". However, the two distributions are nearly identical.

[Image: summary_drift_report output tagging the feature as "severe drift"]
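
For reference on how a p-value maps to these tags, here is a hypothetical sketch of the kind of bucketing the report applies; the 0.05 / 0.15 cut-offs below are illustrative assumptions, not whylogs' actual thresholds:

def drift_category(p_value: float) -> str:
    # NOTE: these cut-offs are assumptions for illustration only,
    # not the constants whylogs actually uses.
    if p_value < 0.05:
        return "severe drift"
    if p_value < 0.15:
        return "possible drift"
    return "no drift"

print(drift_category(0.36))  # -> "no drift" for the p-value reported below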

We can see the similarity in the double_histogram plot:

[Image: double_histogram plot of the two profiles, showing nearly identical distributions]

I also stored the features from both profiles and ran a KS test, which yielded a "no-drift" result with a p-value of 0.36.
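
For clarity, this is the kind of check I mean: a minimal sketch using scipy's two-sample KS test on raw feature values, with placeholder data standing in for the stored features:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, 5000)  # stand-in for the stored reference values
target = rng.normal(0, 1, 5000)     # stand-in for the stored target values

# Two samples drawn from the same distribution should yield a large p-value,
# i.e. the null hypothesis of "no drift" is not rejected.
statistic, p_value = ks_2samp(reference, target)
print(f"KS statistic={statistic:.4f}, p-value={p_value:.4f}")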

Steps to Reproduce

To generate the distributions with a higher number of samples, and to be able to access the values as lists after logging them, I changed the code in the Profile_Viewer_In_Notebook a bit:

import datetime

import numpy as np
from whylogs import get_or_create_session

session = get_or_create_session()

def profile_generator(size=500):
    # `distribution` is the mixture distribution array defined earlier in the notebook.
    mixture = []
    mixture_1 = []
    mixture_2 = []
    mixture_3 = []
    mixture_4 = []

    with session.logger("mytestytest", dataset_timestamp=datetime.datetime(2021, 6, 2)) as logger:
        for _ in range(size):
            # Draw one sample per feature, all from the same mixture distribution.
            mix_sample = np.random.choice(distribution, 1)[0]
            mix1_sample = np.random.choice(distribution, 1)[0]
            mix2_sample = np.random.choice(distribution, 1)[0]
            mix3_sample = np.random.choice(distribution, 1)[0]
            mix4_sample = np.random.choice(distribution, 1)[0]

            logger.log({"uniform_integers": np.random.randint(0, 50)})
            logger.log({"mixture_distribution": mix_sample})
            logger.log({"1mixture_distribution": mix1_sample})
            logger.log({"2mixture_distribution": mix2_sample})
            logger.log({"3mixture_distribution": mix3_sample})
            logger.log({"4mixture_distribution": mix4_sample})
            logger.log({"nulls": None})

            # Keep the raw values so they can be compared outside whylogs later.
            mixture.append(mix_sample)
            mixture_1.append(mix1_sample)
            mixture_2.append(mix2_sample)
            mixture_3.append(mix3_sample)
            mixture_4.append(mix4_sample)

        logger.log({"moah_data": 1})
        logger.log({"moah_data": 1})
        logger.log({"moah_data": 5})

        dists = {
            "mixture": mixture,
            "mixture_1": mixture_1,
            "mixture_2": mixture_2,
            "mixture_3": mixture_3,
            "mixture_4": mixture_4,
        }

        return logger.profile, dists

target_profile, target_dists = profile_generator(size=5000)
reference_profile, reference_dists = profile_generator(size=5000)
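
Note that the snippet assumes the `distribution` array defined earlier in the notebook. As a self-contained stand-in, something like the following mixture works; the components here are an assumption, not the notebook's exact values:

import numpy as np

# Illustrative stand-in for the notebook's `distribution`: samples from a
# two-component Gaussian mixture. Means and scales are assumptions.
distribution = np.concatenate([
    np.random.normal(loc=-2.0, scale=1.0, size=5000),
    np.random.normal(loc=3.0, scale=0.5, size=5000),
])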

This yielded the profiles shown in the images above.

With the lists of values for 1mixture_distribution, I ran the KS test as follows:

from alibi_detect.cd import KSDrift
import numpy as np
import matplotlib.pyplot as plt

# Initialize the drift detector with the reference sample.
drift_detector = KSDrift(np.array(reference_dists['mixture_1']), p_val=0.01)
# Run the KS test of the target sample against the reference.
print(drift_detector.predict(np.array(target_dists['mixture_1']), return_p_val=True, return_distance=True))

# Plot both samples to inspect the overlap visually.
plt.hist(reference_dists['mixture_1'], alpha=0.75, label="reference")
plt.hist(target_dists['mixture_1'], alpha=0.5, label="production")
plt.legend()
plt.show()

This yielded the following result:
{'data': {'is_drift': 0, 'distance': array([0.0184], dtype=float32), 'p_val': array([0.36134896], dtype=float32), 'threshold': 0.01}, 'meta': {'name': 'KSDrift', 'detector_type': 'offline', 'data_type': None, 'version': '0.8.1'}}
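
For contrast, feeding the same detector a genuinely shifted sample should flip is_drift to 1; a minimal sketch, reusing drift_detector and target_dists from above:

# Shift the target values to simulate real drift; the detector should now
# report is_drift=1 with a p-value below the 0.01 threshold.
shifted = np.array(target_dists['mixture_1']) + 2.0
print(drift_detector.predict(shifted, return_p_val=True, return_distance=True))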

What is the expected correct behavior?

Maybe I'm misunderstanding the results, but the drift report differs from what I'd expect: for this case, no drift alert should be raised between the two distributions.

@FelipeAdachi added the bug label on Mar 2, 2022

FelipeAdachi (Contributor, Author):

This was solved in PR #483.
