Giving same cluster id to all records linked from multiple sources #108 #109

Merged
merged 1 commit into from
Jan 5, 2022

Conversation

navinrathore
Contributor

addUniqueCol() is also referenced in TrainingDataFinder.
dropDuplicate() could be expensive, but is perhaps necessary. It is applied only to data from the master source.

Note: addUniqueCol() is also duplicated in FileUtil.java, where it is not used anywhere as of now.
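
For readers without the diff at hand, the idea of de-duplicating only the master source can be sketched in plain Java (16 or later, for records). This is a minimal illustration under stated assumptions, not the code in this PR: the `Row` shape, the `dropMasterDuplicates` name, and the `"master"` source label are all hypothetical, and the real change operates on Spark data rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Zingg's implementation.
final class MasterDedupSketch {
    // Illustrative row shape (an assumption): source name plus two fields.
    record Row(String source, String name, String city) {}

    // Drops exact duplicate rows from the master source only;
    // rows from every other source pass through unchanged.
    static List<Row> dropMasterDuplicates(List<Row> rows, String masterSource) {
        Set<Row> seenMaster = new HashSet<>();
        List<Row> out = new ArrayList<>();
        for (Row row : rows) {
            boolean isMaster = row.source().equals(masterSource);
            // Set.add returns false when an equal master row was already kept.
            if (!isMaster || seenMaster.add(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("master", "Acme", "Pune"),
            new Row("master", "Acme", "Pune"),   // duplicate master row, dropped
            new Row("feed", "Acme", "Pune"),
            new Row("feed", "Acme", "Pune"));    // non-master duplicate, kept
        System.out.println(dropMasterDuplicates(rows, "master").size()); // prints 3
    }
}
```

The cost concern above shows up here as the set of already-seen master rows: every master row has to be compared against the ones kept so far, which on a distributed dataset generally implies a shuffle.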

Member

@sonalgoyal sonalgoyal left a comment

Instead of changing TrainingDataFinder and addUniqueCol, can we invoke addUniqueCol with id_col and change the alignLinked method to make z_cluster the same as z_id?
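
A rough sketch of what this suggestion amounts to, assuming each link pairs a master record's z_id with a matched record from another source. The `Link` and `Clustered` records and the `alignToMasterId` method are hypothetical names for illustration only; they are not the actual alignLinked signature, which is not shown in this conversation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the actual alignLinked method.
final class AlignLinkedSketch {
    // Assumed input: one matched pair, the master record's z_id plus a record from another source.
    record Link(long masterZId, String source, String recordKey) {}

    // Assumed output row: the record tagged with its cluster id.
    record Clustered(long zCluster, String source, String recordKey) {}

    // Sets z_cluster to the master record's z_id, so every record linked to the
    // same master row shares one cluster id regardless of which source it came from.
    static List<Clustered> alignToMasterId(List<Link> links) {
        List<Clustered> out = new ArrayList<>();
        Set<Long> emittedMasters = new HashSet<>();
        for (Link link : links) {
            // Emit the master row once per z_id, with z_cluster equal to its own z_id.
            if (emittedMasters.add(link.masterZId())) {
                out.add(new Clustered(link.masterZId(), "master", "z_id=" + link.masterZId()));
            }
            out.add(new Clustered(link.masterZId(), link.source(), link.recordKey()));
        }
        return out;
    }
}
```

With this shape, no separate unique column is needed for clustering: the id already attached to the master record doubles as the cluster id.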

@navinrathore
Contributor Author

Yes, that looks good. I have made the changes.

@sonalgoyal sonalgoyal merged commit 7f14bd7 into zinggAI:main Jan 5, 2022
@navinrathore navinrathore deleted the zLink branch February 25, 2022 17:35

2 participants