New phase assessModel #296

Open
sonalgoyal opened this issue May 26, 2022 · 6 comments

@sonalgoyal
Member

Write a Python script which will expose the model stats: the confusion matrix and the number of records marked, unmarked, matches, non-matches, and not sure.

We will use the Labeller class. The python script takes the conf and passes it to the Client. Client will invoke the Labeller. Refer to the python api example at https://github.com/zinggAI/zingg/blob/main/api/scala/FebrlExample.py.

The script calls getMarkedRecords, getMarkedRecordsStat, and getUnmarkedRecords on the Client and provides the stats. You can convert the df returned by the Client to a pandas df. To build the confusion matrix, the following can be used.

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

confusion_matrix = pd.crosstab(markedRecords['z_isMatch'], markedRecords['z_prediction'], rownames=['Actual'], colnames=['Predicted'])

sn.heatmap(confusion_matrix, annot=True)
plt.show()
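
One thing to watch with the crosstab above: it only contains the labels that actually occur in the data, so a class with zero records (for example "not sure") drops out of the matrix. A minimal sketch of a fixed-shape version follows. It assumes the labels are encoded as 0 = non-match, 1 = match, 2 = not sure, which is not stated in this issue, and the helper name `full_confusion_matrix` and the sample data are hypothetical.

```python
import pandas as pd

# Assumed label encoding (not confirmed in this issue):
# 0 = non-match, 1 = match, 2 = not sure.
LABELS = [0, 1, 2]


def full_confusion_matrix(marked: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate actual vs. predicted labels.

    Labels that never occur are kept as all-zero rows/columns so the
    matrix always has the same shape.
    """
    matrix = pd.crosstab(
        marked["z_isMatch"],
        marked["z_prediction"],
        rownames=["Actual"],
        colnames=["Predicted"],
    )
    return matrix.reindex(index=LABELS, columns=LABELS, fill_value=0)


# Hypothetical sample standing in for the marked records.
sample = pd.DataFrame({
    "z_isMatch":    [1, 1, 0, 0, 0, 1],
    "z_prediction": [1, 0, 0, 0, 1, 1],
})
matrix = full_confusion_matrix(sample)
print(matrix)
```

The resulting frame can be passed to sn.heatmap in the same way as the plain crosstab.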

@sonalgoyal
Member Author

I have added new methods on the client: getMatchedMarkedRecordsStat(Dataset markedRecords), getUnmatchedMarkedRecordsStat(Dataset markedRecords), getUnsureMarkedRecordsStat and getMarkedRecords().

You can use them to build the logic.
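
For checking the numbers on the Python side once the marked records are in a pandas df, a rough equivalent could look like the sketch below. This is not the client implementation: the function name `label_stats` is hypothetical, the 0/1/2 encoding of z_isMatch is an assumption, and it counts rows, which may differ from what the client methods count.

```python
import pandas as pd


def label_stats(marked: pd.DataFrame) -> dict:
    """Count marked rows per label.

    Assumed encoding (not confirmed in this issue):
    0 = non-match, 1 = match, 2 = not sure.
    """
    counts = marked["z_isMatch"].value_counts()
    return {
        "marked": len(marked),
        "matches": int(counts.get(1, 0)),
        "non_matches": int(counts.get(0, 0)),
        "not_sure": int(counts.get(2, 0)),
    }


# Hypothetical sample standing in for the marked records.
sample = pd.DataFrame({"z_isMatch": [1, 1, 0, 0, 0, 2]})
stats = label_stats(sample)
print(stats)
```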

@sonalgoyal
Member Author

@RavirajBaraiya

@sonalgoyal sonalgoyal self-assigned this May 28, 2022
@navinrathore
Contributor

Confusion Matrix looks like below
image

@navinrathore
Contributor

Generated Config File from Arguments object
ArgumentsToFile.txt

@navinrathore
Contributor

Statistics for model 100

No. of Records Marked   :  76
No. of Records UnMarked :  72
No. of Matches          :  14
No. of Non-Matches      :  24
No. of Not Sure         :  0

@sonalgoyal
Member Author

Need to look at the right model internally for this. Should we expose the label model or should we expose the actual model?

@sonalgoyal sonalgoyal modified the milestones: 0.3.4, 0.3.5 Jul 26, 2022
@sonalgoyal sonalgoyal removed this from the 0.3.5 milestone Nov 28, 2023