Introduce views? #3

Open
marcverhagen opened this issue Feb 13, 2016 · 4 comments

@marcverhagen
Member

Is there a case for adding views to Tarsqi? A view would contain some set of tags and would be totally separate from other views. We could have one view for Evita events and another for events taken from another component.

Not sure if this is worth the trouble. An alternative is to enforce that each tag has a source attribute that stores what component created the tag.
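
To make the source-attribute alternative concrete, here is a minimal sketch. It assumes a hypothetical `Tag` class and `select_tags` helper, neither of which is taken from the TTK code base, and the component names are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a tag; not the actual TTK tag class."""
    name: str
    begin: int
    end: int
    source: str  # name of the component that created the tag


def select_tags(tags, name, source):
    """Return the tags with the given name that were created by `source`."""
    return [t for t in tags if t.name == name and t.source == source]


# Two components tag the same span as an event; the source keeps them apart.
tags = [
    Tag("EVENT", 10, 16, source="EVITA"),
    Tag("EVENT", 10, 16, source="OTHER"),
    Tag("EVENT", 30, 37, source="OTHER"),
]

evita_events = select_tags(tags, "EVENT", "EVITA")
other_events = select_tags(tags, "EVENT", "OTHER")
```

All tags still live in one flat collection here, so every consumer has to remember to filter on `source`. A view would do that separation structurally.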

@marcverhagen
Member Author

Views are nice, but using them inside of TTK is probably a case of over-engineering. Tarsqi creates documents according to a certain pipeline and that's it; no views are needed for that. We may want to add several components that add EVENTS, just as there are several components that add TLINKS, but using a source attribute to keep track of what component added a tag would be enough.

Let's focus on flexibility in taking several kinds of input and adjusting the pipeline accordingly. For example, taking YTEX output (in TTK format) could be useful even if we just use the tokenization, tagging and lemmatization of YTEX. We may want to spend some time on creating a ytex --source option that loads some tags into the tarsqi_tags.

@marcverhagen
Member Author

Here is a potential advantage of having views. Currently, you can run a pipeline with a preprocessor and save the results as a ttk file. You can then run a pipeline with Evita. But say you ran the second pipeline with the preprocessor as well. In that case, if you had views, you would have Evita select one of the views and nothing bad would happen, except that in the end you would have two views with preprocessor data. But currently you get a document with duplicate sentences and chunks (somehow tokens do not get duplicated), and this results in weird TarsqiTree instances that break Evita.
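
A rough sketch of that scenario, assuming hypothetical `View` and `Document` classes (nothing here is existing TTK code, and the tag names and offsets are made up). It is simplified in that the flat merge duplicates every tag, whereas in the real case tokens apparently escape duplication.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical view: the tags written by one run of one component."""
    producer: str
    tags: list = field(default_factory=list)  # (name, begin, end) tuples


@dataclass
class Document:
    """Hypothetical document that keeps each component run in its own view."""
    views: list = field(default_factory=list)

    def add_view(self, producer, tags):
        view = View(producer, list(tags))
        self.views.append(view)
        return view

    def select_view(self, tag_name):
        """Return the most recent view that contains a tag named `tag_name`."""
        for view in reversed(self.views):
            if any(name == tag_name for name, _, _ in view.tags):
                return view
        return None


# Illustrative preprocessor output: one sentence containing two chunks.
preprocessor_output = [("s", 0, 20), ("ng", 0, 4), ("ng", 10, 20)]

# The preprocessor runs twice, once in each pipeline.
doc = Document()
doc.add_view("PREPROCESSOR", preprocessor_output)
doc.add_view("PREPROCESSOR", preprocessor_output)

# Without views, everything ends up in one flat list with duplicates.
flat = [tag for view in doc.views for tag in view.tags]
flat_sentences = [t for t in flat if t[0] == "s"]

# With views, a downstream component picks one view and sees each tag once.
chosen = doc.select_view("s")
view_sentences = [t for t in chosen.tags if t[0] == "s"]
```

The cost is what the comment already names: the document ends up carrying two views with the same preprocessor data, and each component needs some rule for which view to read.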

@reevesr

reevesr commented Mar 3, 2016

I can see the sense of having views, given this duplication problem. I guess the question is whether having views is the easiest way to solve that.

@marcverhagen
Member Author

Using views is definitely a more scalable solution. I will look a bit more into how much coding and added complexity it would actually take.


2 participants