Skip to content

Allow janky empty fields within double tabs in keywords.txt files #7245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

bhagman
Copy link
Contributor

@bhagman bhagman commented Feb 19, 2018

Fix for #6693 (comment)

Bottom line: need to fix the outliers, not the majority -- the base keywords.txt file is the worst offender.

https://github.com/arduino/Arduino/blob/master/build/shared/lib/keywords.txt

(N.B. code is untested)

*UNTESTED* - we need to fix the outliers, not the majority.  I think this solution will fix the problem with the base keywords.txt file.
@cmaglie
Copy link
Member

cmaglie commented Feb 19, 2018

(N.B. code is untested)

There is a trivial compile error, BTW using the "human" notation there is no way to write the equivalent of:

goto	KEYWORD3		RESERVED_WORD

to make the word goto appear as a RESERVED_WORD (in light green see #6693 (comment) for reference) because there is no way to express the empty field using the "human" notation.

So, almost there, but not yet...

@bhagman
Copy link
Contributor Author

bhagman commented Feb 19, 2018

Are you saying that this code doesn't fix the problem?

Also, there is a way to write it so that a "human" can read/work with it:

goto    KEYWORD3    NONE    RESERVED_WORD

It just means that the base keywords.txt file needs to be fixed and a small fix to the parser.

@bperrybap
Copy link

Where can I find any documentation on the intended file format of the keywords.txt file?
THAT is extremely important since without any sort of documentation/specification there are no clear rules on what the file should look like or how it should be parsed.

@bperrybap
Copy link

bperrybap commented Feb 19, 2018

I really dislike things that use magic names and especially cutesy names. So using "NONE" seems like such a hack and once magic keywords start to creep in their use will start to expand.
Although if the magic field were something like a single dash character, I'd have a lot less objection.
So it would be like this:
goto KEYWORD3 - RESERVED_WORD
One side benefit of using a dash is that it is not tied to English and seems pretty obvious what it means.
It is also a single character which might make the parsing a bit simpler.

Going another route, perhaps it is time to consider a version flexible format that fixes that file format?
i.e moving to a v2 format.
It would require some sort of version info/key/string in the file to specify file version format.
But once a file version format designator is in the file, it frees the file format from having support some of this odd and not very well documented (if at all) behaviors in newer versioned files.
It might make the parser simple since once it detected the format version, it could call separate parsers.

@cmaglie
Copy link
Member

cmaglie commented Feb 19, 2018

Where can I find any documentation on the intended file format of the keywords.txt file?

There isn't an official document, the only real document is the code of the parser.

The history of the keywords.txt format goes back to the beginning of Arduino, probably it's something that we inherited from Processing and slowly evolved over time without a real planning. Maybe git could help to restore some of the history, but we are talking about changes made in the last 10+ years without a real planning.

AFAIR the keywords.txt file started as a simple:

KEYWORD \t KEYWORD_TOKENTYPE

then expanded to

KEYWORD \t KEYWORD_TOKENTYPE \t REFERENCE_LINK

then expanded again to

KEYWORD \t KEYWORD_TOKENTYPE \t REFERENCE_LINK \t RSYNTAXTEXTAREA_TOKENTYPE

I guess to support new kind of tokens when we introduced RSyntaxTextArea but, again, I don't know why a second type of token has been added. This is how the format is now, that's all I know.

@cmaglie
Copy link
Member

cmaglie commented Feb 19, 2018

One side benefit of using a dash is that it is not tied to English and seems pretty obvious what it means.

totally agree with that, BTW the special case for - must be implemented so the "Find in reference" context menu is disabled when - is encontered:

image

otherwise we get an error like:

image

@bhagman
That's why I said we are "almost there", may you implement this check to complete this PR? (please note that your first commit has a trivial error too)

@bperrybap
Copy link

Wow, I had no idea about the other fields in the keywords file.
They aren't mentioned in the library 1.5 spec and they don't seem to be used in the IDE bundled libraries.
Seems like the 1.5 library spec needs some updating.
But then it will be messy to document as these extra things have been thrown in along the way so what is supported in the keywords files is based on the IDE version rather than a library spec version.
IMO, this is a great example of what can happen when coding is done without specifications/documentation.


I would recommend slowing down a bit and not going off and coding anything yet.
I don't think it is good to do these types of changes without some amount of discussion and thought as to implications.

A drawback with having a magic word/character for blank/null fields is that no matter what it is,
it breaks backward compatibility with older IDE versions.
My concern is that the existing file keyword.txt format is so fragile and broken because there are so many different ways it is parsed in different IDE versions that there is no way to ever create a file that is fully compatible across all versions of the IDE from 1.0 to the present.
I have run into many issues related to this over the years already, and the addition of magic words for null/blanks fields adds another complexity that might be worse than some of the current ugliness.
In other words, it seems that currently, one can make a keywords file that works across all the versions of the IDE from 1.0 to the present.
It is ugly to read, since the alignment is all messed up but it can work.
With a change like this, my concern is that when using a new file format, one must then limit the library to a newer IDE, since if you don't it severely breaks something on the older IDEs.
That seems like a big limitation and issue.

Using whitespace as a separator solves the readability issues which I think is the biggest complaint right now, but using whitespace as a seperator precludes having null/blank fields.
I think the real issue is that supporting blank/null fields was a very idea and should never have been supported.
Given the lack of documentation on the format of the file, do we have any sense of how many keyword files actually use and take advantage of this blank field capability?
I assume that it would only be files that are using the undocumented extra fields.
REFERENCE_LINK and RSYNTAXTEXTAREA_TOKENTYPE

Given the many different ways the file has been parsed by different versions of the IDE, I can't figure out a way that offers the ability to provide a file that looks good and works on IDE versions 1.0 to the present.
I don't think it can be done.

@bhagman
Copy link
Contributor Author

bhagman commented Feb 19, 2018

may you implement this check

@cmaglie Which check? I've fixed the type conversion error. If you mean implementing the dash - check... That's a different story, and a completely different patch.

@bperrybap This fix doesn't implement anything new. It only fixes the currently problem with the base keywords. Currently, most libraries will work with the "human" parsing of the keywords.txt file. I agree that changing the keyword definitions is a worthy topic, however, it should be a separate issue from this fix.

(FYI - another update to this fix forthcoming... I think we should check for double tabs instead of singles.)

@bperrybap
Copy link

Currently, most libraries will work with the "human" parsing of the keywords.txt file.

As of right now I have not seen a way to create a keywords file that looks good AND works on all IDEs 1.0 to 1.8.5
I can create one that works, but in order to work on all the IDEs, it has to use a single tab as field separator which makes it look bad - (to a human as fields are not aligned on the lines)

(FYI - another update to this fix forthcoming... I think we should check for double tabs instead of singles.)

I would say that may not be a good idea.
Keep in mind that current IDEs support using multiple tabs as a separator and have for quite some time so any check for double tabs has implications on those files that used multiple tabs for alignment purposes.
So I'm not sure this change solves much but it can create other issues.
whitespace is whitespace. It does not matter how many spaces are tabs are adjacent to each other.
checking for double tabs doesn't really solve anything since you don't know now many tabs there may be since tabs are also being used for alignment purposes in the file and the number of tabs a person uses is based on the number of characters in the preceding strings.
Compounding matters is many people go in and change their text editor to set the tab position to something other than the standard 8 so they may use even more tab characters for their alignment.

I don't believe that there is solution to the problem.
I believe that it will come down to what is issues can be reduced vs what issues must stay vs what new issues will be created by changing things.

@bperrybap
Copy link

@bhagman
Here is some information from pert on the use of multiple tabs (whitespace) for a single field separator. Which is really just using the tabs for alignment to make the file look better which is what I was talking about earlier.
#6693 (comment)

This is why checking for double tabs could be an issue.
And is why I think it will be very difficult to make any updates to the format or how the file is parsed.

@per1234
Copy link
Collaborator

per1234 commented Feb 20, 2018

Regarding keywords.txt backwards IDE compatibility:

Almost every 3rd party library follows this format:
KEYWORD (random whitespace) KEYWORD_TOKENTYPE
It is very rare to see a 3rd party library that uses the REFERENCE_LINK or RSYNTAXTEXTAREA_TOKENTYPE fields. The former because usually that wouldn't make sense, the latter because nobody even knows what the purpose of that field is.

The Arduino IDE's base keywords.txt file will only ever be used by the version of the Arduino IDE it ships with.

The REFERENCE_LINK field is used in some official Arduino libraries but almost every one is in 1.5 format and thus can't be used with Arduino IDE 1.0.x. None of them bundled with the IDE or in the arduino-libraries organization used RSYNTAXTEXTAREA_TOKENTYPE.

I'm all for backwards IDE compatibility but the scope of the problem may not be very significant. The consequences of supporting a format to be used used only by a small number of libraries which will is not supported by the keywords highlighting scheme of 1.0.x IDE versions is mild.

Of more concern to me is to continue supporting the commonly used format that previously worked. Further muddying the water is that with the current system some combinations of whitespace used as separator appears to work but isn't truly supported. For example, using multiple tabs as separator does result in keywords highlighting but keywords are all colored according to editor.function.style regardless of the identifier used.

keywords.txt:

KEYWORD1_tab	KEYWORD1
KEYWORD2_tab	KEYWORD2
KEYWORD3_tab	KEYWORD3
LITERAL1_tab	LITERAL1
LITERAL2_tab	LITERAL2

KEYWORD1_2tab		KEYWORD1
KEYWORD2_2tab		KEYWORD2
KEYWORD3_2tab		KEYWORD3
LITERAL1_2tab		LITERAL1
LITERAL2_2tab		LITERAL2

Arduino IDE 1.8.5 coloration:
clipboard01

@PaulStoffregen
Copy link
Contributor

Is there any info about the new tokentype field? Even just a link to the relevant source code?

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@per1234 per1234 closed this Mar 13, 2023
@per1234 per1234 self-assigned this Mar 13, 2023
@per1234 per1234 added the Type: Invalid Off topic for this repository, or a bug report determined to not actually represent a bug label Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request A request to make an enhancement (not a bug fix) Type: Invalid Off topic for this repository, or a bug report determined to not actually represent a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants