Document structure parsing as a separate component #2

Closed
marcverhagen opened this issue Feb 13, 2016 · 4 comments

Comments

@marcverhagen
Member

Simple document structure parsing is now done in docmodel.parsers, and simple document structure is built into the TarsqiDocument class through the elements variable, which holds a list of TarsqiDocElements. In addition, the output ttk format groups everything by document element.

It is probably better to use separate components for document structure where these components would be sensitive to the --source option and maybe the --genre option as well. This would probably involve doing away with TarsqiDocElement and putting the TagRepositories directly into the TarsqiDocument. Document structure would then be added as its own set of tags.

This would make the docmodel code cleaner and delegate structure to its own component.
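A minimal sketch of the proposed shape, assuming a document that owns a single tag repository in which document structure is stored as ordinary tags. The `Tag`, `TagRepository` and `Document` classes below are hypothetical stand-ins for illustration, not the actual ttk classes, and the `docelement` tag name and `type` attribute are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    name: str    # hypothetical tag name, e.g. "docelement" or "EVENT"
    begin: int   # character offset where the tag starts
    end: int     # character offset where the tag ends (exclusive)
    attrs: dict = field(default_factory=dict)


@dataclass
class TagRepository:
    tags: list = field(default_factory=list)

    def add(self, name, begin, end, **attrs):
        """Store a new tag and return it."""
        tag = Tag(name, begin, end, attrs)
        self.tags.append(tag)
        return tag

    def find(self, name):
        """Return all tags with the given name, in insertion order."""
        return [t for t in self.tags if t.name == name]


@dataclass
class Document:
    text: str
    # One repository directly on the document, with no per-element repositories.
    tags: TagRepository = field(default_factory=TagRepository)


doc = Document("Fido barked.\n\nHe slept on Monday.")
# Structure is just another set of tags next to the linguistic ones.
doc.tags.add("docelement", 0, 12, type="paragraph")
doc.tags.add("docelement", 14, 33, type="paragraph")
doc.tags.add("EVENT", 5, 11)
```

With this layout a component that does not care about structure can ignore the structure tags entirely, while one that does can look them up by name.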

@marcverhagen marcverhagen added this to the DocModel redesign milestone Feb 15, 2016
marcverhagen added a commit that referenced this issue Feb 25, 2016
- Pulled apart the code for document structure parsing and metadata parsing.
- Created new source parser for plain text.
- Source parsers now create a TarsqiDocument.
- Metadata and document structure parsers run over the TarsqiDocument.

Related to issues:
   #1
   #2
@marcverhagen
Member Author

Did part of this in b065a91 and ee10e2b by pulling apart the document structure parser and the metadata parser.

marcverhagen added a commit that referenced this issue Feb 26, 2016
- The new name is a better match after the redesign.
- Some minor refactoring of tarsqi module.

Related to issues:
   #2
   https://github.com/tarsqi/ttk/milestones/DocModel%20redesign
@marcverhagen
Member Author

The elements list and its TarsqiDocParagraph members are still in place. All other changes were made. The question is whether it is a good idea to have the elements list and the doc_element tag as a tag in the ttk output that wraps around other ttk tags.

This may not be resolved till I play with importing data in the ttk format.

@marcverhagen
Member Author

Having played with the ttk format a bit, I am now convinced that having the elements list is a bad idea. There are some messy dependencies between TagRepositories on TarsqiDocElements that should not be there and that prove hard to debug.

So TarsqiDocElement will be retired. This is a bit involved and includes:

  • Update SourceDoc to have just one TagRepository called tags.
  • Remove elements from TarsqiDocument.
  • Add a TagRepository called tags or tarsqi_tags on TarsqiDocument. This pulls tags from source as needed, for example when the source is a TTK file with Tarsqi tags.
  • Update the SourceParserXML.
  • Update the SourceParserText.
  • Update the SourceParserTTK.
  • Update MetaData parser if needed.
  • Update the default document structure parser so that it just adds tags instead of creating doc elements.
  • Adjust PreprocessorWrapper.
  • Adjust GUTimeWrapper.
  • Adjust EvitaWrapper.
  • Adjust SlinketWrapper.
  • Adjust S2TWrapper.
  • Adjust BlinkerWrapper.
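
The "just adding tags" step for the default document structure parser could look roughly like the sketch below. It assumes paragraphs are separated by blank lines and that tags are plain dictionaries with character offsets; `paragraph_spans` and `add_docelement_tags` are hypothetical helper names, not functions from the ttk code base.

```python
import re


def paragraph_spans(text):
    """Return (begin, end) character offsets of blank-line-separated paragraphs."""
    spans = []
    pos = 0
    # The capturing group makes re.split return the separators as well,
    # so the running position stays aligned with the original text.
    for chunk in re.split(r"(\n\s*\n)", text):
        if chunk.strip():
            lead = len(chunk) - len(chunk.lstrip())
            spans.append((pos + lead, pos + len(chunk.rstrip())))
        pos += len(chunk)
    return spans


def add_docelement_tags(text, tags):
    """Append one hypothetical docelement tag per paragraph to the tag list."""
    for begin, end in paragraph_spans(text):
        tags.append({"name": "docelement", "type": "paragraph",
                     "begin": begin, "end": end})
    return tags


sample = "Fido barked.\n\nHe slept on Monday.\n"
structure_tags = add_docelement_tags(sample, [])
```

Because the parser only appends tags, the wrappers in the list above would no longer need to loop over element objects; they could read the offsets from the tags when they need paragraph boundaries.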

marcverhagen added a commit that referenced this issue Apr 10, 2016
Document structure was treated differently from other information by
hard-wiring the elements array with TarsqiDocElements.

- Eliminated elements arrays, TarsqiDocElement and TarsqiDocParagraph
- Added convenience methods to get the docelement tags
- Updated all components to work with the new design

Related to issue #2
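
For illustration only, convenience methods of the kind mentioned in the commit message above might resemble the following sketch. It assumes tags are dictionaries with `name`, `begin` and `end` keys; the function names `docelement_tags` and `tags_within` are hypothetical and are not taken from the ttk source.

```python
def docelement_tags(tags):
    """Return the docelement tags sorted by their start offset."""
    elements = [t for t in tags if t["name"] == "docelement"]
    return sorted(elements, key=lambda t: t["begin"])


def tags_within(tags, element):
    """Return the tags whose span lies inside the given element's span."""
    return [t for t in tags
            if t is not element
            and element["begin"] <= t["begin"]
            and t["end"] <= element["end"]]


all_tags = [
    {"name": "docelement", "begin": 14, "end": 33},
    {"name": "EVENT", "begin": 5, "end": 11},
    {"name": "docelement", "begin": 0, "end": 12},
    {"name": "EVENT", "begin": 17, "end": 22},
]
elements = docelement_tags(all_tags)
first_paragraph_tags = tags_within(all_tags, elements[0])
```

Grouping by offsets at lookup time recovers the per-element view without storing a separate repository on each element.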
@marcverhagen
Member Author

Done in 1a9ffc3
