Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documenter refactoring and handling error 'Path does not exist' #223

Merged
merged 3 commits into from May 6, 2022

Conversation

navinrathore
Copy link
Contributor

No description provided.

@navinrathore navinrathore changed the title Documenter refactoring and minor issues resolution Documenter refactoring and handled error 'Path does not exist' Apr 27, 2022
@navinrathore navinrathore changed the title Documenter refactoring and handled error 'Path does not exist' Documenter refactoring and handling error 'Path does not exist' Apr 27, 2022
@navinrathore
Copy link
Contributor Author

#199

Copy link
Member

@sonalgoyal sonalgoyal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make required changes and then let us discuss again

writeColumnDocument(HTML_TEMPLATE, root, filenameHTML);
}

public void writeColumnDocument(String template, Map<String, Object> root, String fileName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be moved to base class - buildAndWriteHtml in modeldoc also is the same

checkAndCreateDir(stopWordsDir);
checkAndCreateDir(columnsDir);

for (FieldDefinition field: args.getFieldDefinition()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not build a list of fields for which you want to send in empty df - invoke that with empty and invoke rest with the actual df

writeColumnDocument(HTML_TEMPLATE, root, filenameHTML);
}

public void writeColumnDocument(String template, Map<String, Object> root, String fileName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of string template - accept list templates, loop and build?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather - accept template...

}
}

private void checkAndCreateDir(String dirName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to base

file.close();
LOG.info("Documenter starts");
// Marked records details
ModelDocumenter colDoc = new ModelDocumenter(spark, args);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to modelDoc

}
private void prepareAndWriteColumnDocument(Dataset<Row> data, String fieldName, String stopWordsDir, String columnsDir) throws ZinggClientException {
Map<String, Object> root = new HashMap<String, Object>();
root.put("title", fieldName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz standardize common field names across templates(modeId, title etc) and move to an interface with static final String constants - use that in all the classes.

LOG.info("Model document generation starts");

File directory = new File(args.getZinggTrainingDataMarkedDir());
if (!directory.exists()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model documentation has to happen even when there are no marked records - i think we discussed this. if someone starts with plain config and runs generateDocs, we need to have information about columns, types, field types, match types and generate the stop words which they can configure

File directory = new File(args.getZinggTrainingDataMarkedDir());
if (!directory.exists()) {
LOG.warn("Marked data folder(models/<model_id>/trainingData/marked) does not exist. Please run findTrainingData and/or label phases to mark records");
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return breaks the code flow, please avoid

}
}

public void buildAndWriteHTML(Map<String, Object> root) throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to base

markedRecords = markedRecords.cache();
List<Column> displayCols = DSUtil.getFieldDefColumns(markedRecords, args, false, args.getShowConcise());
displayCols.add(0, markedRecords.col(ColName.MATCH_FLAG_COL));
displayCols.add(1, markedRecords.col(ColName.CLUSTER_COLUMN));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened to source, prediction, probability?

@navinrathore navinrathore marked this pull request as ready for review May 6, 2022 12:16
@sonalgoyal sonalgoyal merged commit a7a751e into zinggAI:main May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants