New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documenter refactoring and handling error 'Path does not exist' #223
Conversation
84d8959
to
108d710
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make required changes and then let us discuss again
writeColumnDocument(HTML_TEMPLATE, root, filenameHTML); | ||
} | ||
|
||
public void writeColumnDocument(String template, Map<String, Object> root, String fileName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be moved to base class - buildAndWriteHtml in modeldoc also is the same
checkAndCreateDir(stopWordsDir); | ||
checkAndCreateDir(columnsDir); | ||
|
||
for (FieldDefinition field: args.getFieldDefinition()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not build a list of fields for which you want to send in empty df - invoke that with empty and invoke rest with the actual df
writeColumnDocument(HTML_TEMPLATE, root, filenameHTML); | ||
} | ||
|
||
public void writeColumnDocument(String template, Map<String, Object> root, String fileName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of string template - accept list templates, loop and build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather - accept template...
} | ||
} | ||
|
||
private void checkAndCreateDir(String dirName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to base
file.close(); | ||
LOG.info("Documenter starts"); | ||
// Marked records details | ||
ModelDocumenter colDoc = new ModelDocumenter(spark, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to modelDoc
} | ||
private void prepareAndWriteColumnDocument(Dataset<Row> data, String fieldName, String stopWordsDir, String columnsDir) throws ZinggClientException { | ||
Map<String, Object> root = new HashMap<String, Object>(); | ||
root.put("title", fieldName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz standardize common field names across templates(modeId, title etc) and move to an interface with static final String constants - use that in all the classes.
LOG.info("Model document generation starts"); | ||
|
||
File directory = new File(args.getZinggTrainingDataMarkedDir()); | ||
if (!directory.exists()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model documentation has to happen even when there are no marked records - i think we discussed this. if someone starts with plain config and runs generateDocs, we need to have information about columns, types, field types, match types and generate the stop words which they can configure
File directory = new File(args.getZinggTrainingDataMarkedDir()); | ||
if (!directory.exists()) { | ||
LOG.warn("Marked data folder(models/<model_id>/trainingData/marked) does not exist. Please run findTrainingData and/or label phases to mark records"); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return breaks the code flow, please avoid
} | ||
} | ||
|
||
public void buildAndWriteHTML(Map<String, Object> root) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to base
markedRecords = markedRecords.cache(); | ||
List<Column> displayCols = DSUtil.getFieldDefColumns(markedRecords, args, false, args.getShowConcise()); | ||
displayCols.add(0, markedRecords.col(ColName.MATCH_FLAG_COL)); | ||
displayCols.add(1, markedRecords.col(ColName.CLUSTER_COLUMN)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened to source, prediction, probability?
Documentation should be genetated if minimal required info is available
No description provided.