Message generation by the basic LaTeX filter #180

Open
matze-dd opened this issue Feb 1, 2021 · 5 comments
Labels
enhancement New feature or request


@matze-dd
Collaborator

matze-dd commented Feb 1, 2021

Currently, the basic filter yalafi provides the plain text and a map of character positions. To flag certain problems as LaTeX-related errors, a special mark has to be inserted into the plain text, which is then detected by the proofreading software.

We should add a third "channel" between yalafi and yalafi.shell that carries complete messages, which can be inserted just as is currently done for option --single-letters (implemented in yalafi.shell). When the base filter yalafi is used in isolation, these messages could be saved in a JSON file.
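A minimal sketch of what such a third channel could look like. The names FilterMessage, FilterResult, latex_position and messages_to_json are hypothetical and not part of yalafi; the record just bundles the plain text, the character map and a list of messages, and serialises the messages to JSON.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FilterMessage:
    # Hypothetical message record; the field names are illustrative only.
    offset: int    # start position in the plain text
    length: int    # number of plain-text characters covered
    rule_id: str   # identifier of the problem, e.g. a LaTeX-related error
    message: str   # complete, human-readable message


@dataclass
class FilterResult:
    plain: str            # channel 1: the plain text
    charmap: List[int]    # channel 2: plain-text index -> LaTeX position
    messages: List[FilterMessage] = field(default_factory=list)  # channel 3

    def latex_position(self, msg: FilterMessage) -> int:
        # Map a message offset back to the LaTeX source via the charmap.
        return self.charmap[msg.offset]


def messages_to_json(result: FilterResult) -> str:
    # Serialise channel 3, e.g. when the base filter is used on its own.
    return json.dumps([asdict(m) for m in result.messages], indent=2)


result = FilterResult('Hello', [10, 11, 12, 13, 14],
                      [FilterMessage(1, 1, 'LATEX_ERROR', 'example message')])
print(result.latex_position(result.messages[0]))  # 11
```

Because each message stores a plain-text offset, the existing character map is enough to translate it back to a position in the LaTeX source.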

We first need to resolve issue #169.

A first application could be the direct injection of LaTeX-related problems (cf. issue #177). Then we could move the processing of options --single-letters and --equation-punctuation to core yalafi, and finally we could perhaps generate better messages for problems in displayed equations (for instance, issue #158).

In all cases, we would no longer need to insert special marks (or repeated words) for the proofreading program to find.
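To illustrate moving such a check into the core filter, here is a hypothetical sketch that reports isolated single letters as message records instead of inserting marks. It is only a rough analogue of --single-letters: the function name, the accept parameter and the message wording are assumptions, and any further handling done by the real option is not reproduced.

```python
import re
from typing import Dict, List, Set

# One letter with word boundaries on both sides; [^\W\d_] matches letters only.
SINGLE_LETTER = re.compile(r'\b[^\W\d_]\b')


def single_letter_messages(plain: str, accept: Set[str]) -> List[Dict]:
    # Hypothetical analogue of --single-letters: return message records
    # for isolated letters instead of writing marks into the plain text.
    messages = []
    for m in SINGLE_LETTER.finditer(plain):
        if m.group() in accept:
            continue  # letters such as 'a' or 'I' may be legitimate words
        messages.append({
            'offset': m.start(),
            'length': 1,
            'message': f"Isolated single letter '{m.group()}'",
        })
    return messages


print(single_letter_messages('a b c', {'a'}))
```

The plain text stays untouched; whoever consumes the messages decides how to present them.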

@torik42
Owner

torik42 commented Feb 1, 2021

This is exactly what I intended in #177 (see my comment there). But I will stick to your separation of the issues. Now to this one.

I do not yet understand how all the code works together. Anyway, here are some thoughts:

  • Could the parser collect all errors and report them together with the tokens (I guess line 56 in tex2txt.py)?
  • Could one insert a special error token which contains the error properties (i.e. message, ID, short description, …) but is replaced by an empty string? Later the tokens could be searched for this error token and its messages appended to the messages from the proofreader.

Although I wrote ‘empty string’ above, I think it would be nice if some kind of mark were still written to the output, in case it produces extra errors. This could be made optional. Such marks are also helpful in plain-text output. One could also think about removing all LT errors that report a misspelled LATEXXXERROR, to avoid duplicate messages.
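If marks are still written, the proofreader will likely flag the mark itself, for instance as a misspelling. A minimal sketch of the filtering idea, assuming LanguageTool-style match dicts with offset and length keys and error records that carry the same two keys for their mark; the function names are hypothetical.

```python
from typing import Dict, List


def overlaps(a: Dict, b: Dict) -> bool:
    # True if the half-open ranges [offset, offset + length) intersect.
    return (a['offset'] < b['offset'] + b['length']
            and b['offset'] < a['offset'] + a['length'])


def drop_matches_on_marks(matches: List[Dict], errors: List[Dict]) -> List[Dict]:
    # Keep only proofreader matches that do not touch any error mark,
    # so that a mark is not reported twice (as LaTeX error and as typo).
    return [m for m in matches if not any(overlaps(m, e) for e in errors)]
```

Matching by position rather than by the mark's spelling avoids depending on how the proofreader words its spelling messages.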

@matze-dd
Collaborator Author

matze-dd commented Feb 2, 2021

Thank you for the thoughts!

> Could the parser collect all errors and report them together with the tokens (I guess line 56 in tex2txt.py)?

Yes, this is roughly the plan. (But tex2txt returns the plain text as string(s).)

> Could one insert a special error token which contains the error properties ...

See first point.

> Also, it is helpful in plain text output to have these.

This would then be done "automatically" by yalafi.shell, once it is ready to read from the "third channel".
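A possible sketch of how a consumer such as yalafi.shell might turn third-channel messages back into visible marks in the plain-text output. The function name and the default mark string are placeholders, not yalafi's actual behaviour.

```python
from typing import Dict, List


def insert_marks(plain: str, messages: List[Dict], mark: str = '[LATEX-ERROR]') -> str:
    # Write a visible placeholder mark at the offset of each message.
    # Going from the highest offset down keeps the remaining offsets valid.
    for msg in sorted(messages, key=lambda m: m['offset'], reverse=True):
        pos = msg['offset']
        plain = plain[:pos] + mark + plain[pos:]
    return plain


print(insert_marks('ab cd', [{'offset': 2}, {'offset': 0}], '#'))  # #ab# cd
```

Since the marks are only added for display, the text sent to the proofreader can stay free of them.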

> ... This could be made optional. Also, it is helpful in plain text output to have these. One could also think about removing all LT errors which report a misspelled LATEXXXERROR to not have duplicate messages.

Yes, I was already thinking about these points, too. There are some subtle interdependencies to be taken into account.

@torik42
Owner

torik42 commented Feb 2, 2021

Thank you for all the answers. I already played a little with this idea yesterday:

> Could one insert a special error token which contains the error properties (i.e. message, ID, short description, …) but is replaced by an empty string? Later the tokens could be searched for this error token and its messages appended to the messages from the proofreader.

It should also work pretty well without too many modifications to the code. Here is a quick sketch. Don’t take it too seriously, but it already yields reasonable results. I can put more effort into this next week; just let me know. After all, you know the code a lot better.

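A mockup of ErrorToken (the offset is set later):

```python
# Assumed to live in yalafi/defs.py next to TextToken (cf. defs.ErrorToken).
class ErrorToken(TextToken):
    def __init__(self, pos, txt, rule_id, short_msg, msg, pos_fix=True):
        # Forward pos_fix: get_txt_pos_err() reads t.pos_fix for every token.
        super().__init__(pos, txt, pos_fix=pos_fix)
        # LanguageTool-style match; 'offset' is filled in later.
        self.error = {
            'offset': 0,
            'length': 1,
            'context': {
                'text': 'sometext',  # placeholder
                'offset': 0,
                'length': 1,
            },
            'rule': {'id': rule_id, 'category': {'name': 'YY_LATEX_ERROR'}},
            'message': msg,
            'shortMessage': short_msg,
            'replacements': [],
        }
```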

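In tex2txt I replace txt, pos = utils.get_txt_pos(toks) with txt, pos, err = utils.get_txt_pos_err(toks) (line 60), defined by:

```python
def get_txt_pos_err(toks):
    # Like get_txt_pos(), but also collects the errors of all ErrorTokens.
    # Assumes the defs module is importable here (from yalafi import defs).
    txt = ''
    pos = []
    errors = []
    for t in toks:
        if isinstance(t, defs.ErrorToken):
            error = dict(t.error)       # copy: do not mutate the token
            error['offset'] = len(txt)  # plain-text position of the mark
            errors.append(error)
        txt += t.txt
        if t.pos_fix:
            pos += [t.pos] * len(t.txt)
        else:
            pos += list(range(t.pos, t.pos + len(t.txt)))
    return txt, pos, errors
```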

tex2txt then also returns the errors: return txt, pos, err (line 68).
In proofreader.py I receive them with plain, charmap, err = tex2txt.tex2txt(tex, t2t_options) (line 82) and later add them to the matches with matches += err.
In the definition of latex_error(err, pos, latex, parms), I replace the token defs.TextToken(pos, mark[:mx], pos_fix=True) with defs.ErrorToken(pos, mark[:mx], 'YY_ERROR', '', err), so that an error token is created for every error.
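Note that matches += err places all LaTeX errors after the proofreader's matches. If the output should follow text order (an assumption about what downstream consumers expect), the combined list can be sorted by offset. A hypothetical helper:

```python
from typing import Dict, List


def merge_matches(matches: List[Dict], latex_errors: List[Dict]) -> List[Dict]:
    # Combine proofreader matches with LaTeX error records in text order.
    combined = matches + latex_errors          # new list; inputs unchanged
    combined.sort(key=lambda m: m['offset'])   # stable: ties keep input order
    return combined
```

Because Python's sort is stable, a proofreader match and a LaTeX error at the same offset keep their original relative order.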

To also support errors for nested calls (i.e. \LTinput), I need to filter for type ErrorToken in h_load_defs as well.

So far, this obviously only works for certain settings (e.g. no multi-language support), but the changes should be straightforward.

EDIT: By the way, this would report errors from nested \LTinput calls at the position of the topmost call. One could add the actual file path to the error message.
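A sketch of that idea: errors collected from a nested file are moved to the plain-text offset of the topmost call, and the file path is prepended to each message. The function name and both parameters are hypothetical; how the call offset and path would be obtained inside h_load_defs is not shown.

```python
from typing import Dict, List


def lift_nested_errors(errors: List[Dict], call_offset: int, path: str) -> List[Dict]:
    # Report errors from a nested \\LTinput file at the topmost call,
    # keeping the file path in the message so the source stays traceable.
    lifted = []
    for err in errors:
        e = dict(err)                  # shallow copy; input stays untouched
        e['offset'] = call_offset
        e['message'] = f"{path}: {err['message']}"
        lifted.append(e)
    return lifted
```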

@matze-dd
Collaborator Author

matze-dd commented Feb 2, 2021

This is in principle the scheme I'm also thinking about. As you also point out, there are quite some things to consider, if one wants to achieve a solution of good quality. In part, this is also due to the "not so tidy" internal structure of the software.

So, if you need a quick solution for your application, then you are of course free to modify the tool for your needs. On the other hand, this is a hobby project, and I would like to "hack" the core parts on my own. (I see an effort of more than a few days. In other words, I'd like to slow down.)

By the way, 'token' is a technical term in compiler building. It is perhaps misplaced for the messages between the core LaTeX filter and an application like yalafi.shell.

@torik42
Owner

torik42 commented Feb 2, 2021

> This is in principle the scheme I'm also thinking about. As you also point out, there are quite some things to consider, if one wants to achieve a solution of good quality. In part, this is also due to the "not so tidy" internal structure of the software.

> So, if you need a quick solution for your application, then you are of course free to modify the tool for your needs. On the other hand, this is a hobby project, and I would like to "hack" the core parts on my own. (I see an effort of more than a few days. In other words, I'd like to slow down.)

Sure. I just don’t want to open issues and expect that you solve them. There is also really no need to hurry. Have fun hacking!

> By the way, 'token' is a technical term in compiler building. It is perhaps misplaced for the messages between the core LaTeX filter and an application like yalafi.shell.

I don’t know compiler building at all. But I called it ErrorToken because it’s a subclass of TextToken and you named all of these classes …Token.
