
Simplify how the binary and graphics digit options (b and g) work #1689

Closed
@Rangi42

These options affect how RGBASM works at the lexer level, which can lead to surprising syntax errors. They're also too permissive.

  1. Standard digits can be changed. b10 swaps 0 and 1, so %101010 is 21 instead of 42. Assigning a standard digit to a nonstandard placement should be an error.
  2. Digits can be ambiguous. bXX is considered valid (and just treats X as the digit 0, not 1). Repeating any digit should be an error.
  3. Custom digits override the standard digits. b.X makes %X.X.X. lex as the value 42, but makes %101010 lex as a separate % operator and 101010 number. Custom digits should be alternatives, while still lexing the standard ones (since by point 1, the standard ones cannot have their values changed).
  4. Some characters can be specified via the CLI that cannot via opt, and they significantly change how the source code is lexed. opt b;X lexes as opt b and then a comment, but rgbasm "-b;X" successfully makes ; be the 0 digit, so %X;X;X; lexes as the number 42. There should be an allowed subset of digit characters, and special ones like ; and \ should not be in it.
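
Rules 1 to 3 can be sketched roughly as follows. This is a hypothetical Python illustration, not RGBASM's lexer (which is written in C++); validate_digit_spec, digit_table, and lex_binary are made-up names, and the sketch assumes a spec must supply exactly as many digits as the standard set (2 for b, 4 for g). Rule 4's character subset is a separate check and is left out here.

```python
BINARY_STANDARD = "01"   # default digits for the b option
GFX_STANDARD = "0123"    # default digits for the g option


def validate_digit_spec(spec: str, standard: str) -> None:
    """Raise ValueError if spec breaks rule 1 or rule 2."""
    if len(spec) != len(standard):
        raise ValueError(f"expected {len(standard)} digits, got {len(spec)}")
    for value, ch in enumerate(spec):
        # Rule 2: no digit character may appear twice.
        if spec.count(ch) > 1:
            raise ValueError(f"digit {ch!r} is repeated")
        # Rule 1: a standard digit may only stand for its own value.
        if ch in standard and standard.index(ch) != value:
            raise ValueError(f"standard digit {ch!r} cannot mean {value}")


def digit_table(spec: str, standard: str) -> dict:
    """Rule 3: custom digits are alternatives; standard digits still lex."""
    validate_digit_spec(spec, standard)
    table = {ch: value for value, ch in enumerate(standard)}
    table.update({ch: value for value, ch in enumerate(spec)})
    return table


def lex_binary(digits: str, table: dict) -> int:
    """Value of the characters after a '%' prefix.

    Assumes every character is a digit from the table or a '_' separator.
    """
    value = 0
    for ch in digits:
        if ch != "_":
            value = value * 2 + table[ch]
    return value


table = digit_table(".X", BINARY_STANDARD)
print(lex_binary("X.X.X.", table), lex_binary("101010", table))  # 42 42
```

With checks like these, b10 and bXX are rejected up front instead of silently swapping or collapsing digits, and b.X no longer stops %101010 from lexing as a number.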

added this to the 0.9.3 milestone on May 12, 2025
added the bug, enhancement, and rgbasm labels on May 12, 2025
aaaaaa123456789 (Member) commented on May 12, 2025

I'm not sure about changing standard digits; maybe there is utility in that — but I do believe that, if the standard digits aren't changed (i.e., if none of them is specified in the mask), they should be allowed as alternatives. Duplicates should obviously be an error.

As for a character whitelist, it's probably better to start by listing the things that shouldn't be allowed, just so nobody forgets:

  • Anything not ASCII. This is a nightmare waiting to happen.
  • Control characters. I hope I don't have to explain this one.
  • Spaces and operators. I'm sure that someone out there is using + and - as digits, but the truth of the matter is that %++2 becomes ambiguous if you do that.
  • Colons, semicolons and backslashes. :: would become ambiguous if it could be part of a number, ; would become confusing (particularly considering that some tools do parse code, which might interpret it as a comment), and \ is essentially always a problem.
  • Quote marks. While those aren't ambiguous, it's a matter of time until they are mishandled.
  • Parentheses, brackets and braces. The opening ones are unambiguous, but certainly nightmarish to process; the closing ones could lead to ambiguous syntax.

Rangi42 (Contributor, Author) commented on May 12, 2025

Agreed. Note: "operators" means + - * / % & | ^ < > = ! ~.

Omitting those, that leaves these ASCII characters:

  • Letters A-Z a-z. Those are fine.
  • Characters that are numeric literal prefixes: % & $ `. Disallow those.
  • Underscore (_). Probably disallow that, since it's the goes-anywhere "digit separator" in numeric literals. (On the other hand, it might be popular as a choice for 0.)
  • Period (.). Should be fine, since it's valid in local labels and fixed-point literals, and is a popular choice for 0.
  • Characters allowed in identifiers: @ #. Should be fine.
  • Comma (,). Disallow this, since it separates lists of numeric literals.
  • Characters not used in the grammar: question mark ? and single quote '. Probably disallow these, since ' will plausibly gain use as a quote character (e.g. for character literals), and ? could become a ternary operator.

That leaves the whitelist as: A-Z a-z . @ #, maybe _, maybe 0-9, and maaaybe ? or '.
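
A rough sketch of that whitelist check, taking the conservative reading of every "maybe": _, ? and ' are rejected, and decimal digits are accepted only as a standard digit in its own position (point 1 of the description). is_allowed_digit is a hypothetical name, not an existing RGBASM function.

```python
import string

# Conservative whitelist: ASCII letters plus . @ # (all the "maybe" characters excluded).
ALLOWED_DIGIT_CHARS = frozenset(string.ascii_letters + ".@#")


def is_allowed_digit(ch: str, value: int, standard: str) -> bool:
    """Can ch stand for digit `value` of an option whose defaults are `standard`?"""
    if ch in standard:
        # A standard digit may only represent itself.
        return standard.index(ch) == value
    # Anything else must come from the whitelist; this also rejects non-ASCII.
    return ch in ALLOWED_DIGIT_CHARS


for ch in [".", "X", "@", "_", ";", "\\", "%", "😻"]:
    print(repr(ch), is_allowed_digit(ch, 0, "01"))
```

Because the check is "inside the whitelist" rather than "outside a blacklist", characters like ; and \ are refused no matter whether they arrive through opt or the command line.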

aaaaaa123456789 (Member) commented on May 12, 2025

You should definitely at least also allow the digits themselves — so you can specify an alternative for just some of them. (Think opt b.1.) I'd honestly allow them all, but at the very least they should be able to represent themselves.

EDIT: note that this allows opt b01 as a natural way of disabling all aliases.
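
With standard digits pinned to their own positions, the aliases a spec defines are just the positions that don't hold their own standard digit. A minimal sketch with a hypothetical aliases helper, assuming the spec has already been validated:

```python
def aliases(spec: str, standard: str) -> dict:
    """Map each custom digit in an already-validated spec to the value it aliases."""
    return {ch: value for value, ch in enumerate(spec) if ch != standard[value]}


print(aliases(".1", "01"))  # {'.': 0}
print(aliases("01", "01"))  # {}
print(aliases(".X", "01"))  # {'.': 0, 'X': 1}
```

So opt b.1 adds '.' as an alternative 0 while leaving 1 alone, and opt b01 yields an empty alias map, which is the "disable all aliases" case.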

Rangi42 (Contributor, Author) commented on May 12, 2025

Yeah, that's why I said "Assigning a standard digit to a nonstandard placement should be an error."

Rangi42 (Contributor, Author) commented on May 12, 2025

Our current test cases:

test/asm/invalid-opt.asm:opt b123
test/asm/invalid-opt.asm:opt g12345
test/asm/opt-b.asm:OPT b.X
test/asm/opt-g.asm:OPT g.x0X
test/asm/trailing-commas.asm:   opt boO, g.xX#,
test/asm/underscore-in-numeric-literal.asm:     opt g_ABC, b_X

Most of those will become errors if they weren't already, and that's okay.
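
To check that claim, here is a hypothetical problem_with function that runs each test-case spec through the rules proposed so far: exact length, no repeated digits, standard digits only in their own position, and the conservative whitelist (letters plus . @ #). Whether _ ends up allowed is still open in this thread, so the verdicts for g_ABC and b_X are assumptions.

```python
import string

STANDARD = {"b": "01", "g": "0123"}
WHITELIST = frozenset(string.ascii_letters + ".@#")  # assumes "_" is NOT allowed


def problem_with(option: str):
    """Return why an opt argument like "b.X" is invalid, or None if it is fine."""
    kind, spec = option[0], option[1:]
    standard = STANDARD[kind]
    if len(spec) != len(standard):
        return f"needs {len(standard)} digits"
    for value, ch in enumerate(spec):
        if spec.count(ch) > 1:
            return f"{ch!r} is repeated"
        if ch in standard:
            if standard.index(ch) != value:
                return f"{ch!r} is not digit {value}"
        elif ch not in WHITELIST:
            return f"{ch!r} is not allowed"
    return None


for opt in ["b123", "g12345", "b.X", "g.x0X", "boO", "g.xX#", "g_ABC", "b_X"]:
    print(opt, problem_with(opt) or "ok")
```

Under these assumptions, only b.X, boO and g.xX# survive; g.x0X fails because 0 sits in the position for 2.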

Note that invalid opt directives -- like opt b123 -- are currently non-fatal errors, so invalid chars like opt b$^ should be too.

aaaaaa123456789 (Member) commented on May 12, 2025

Errors should always be the lowest category that they can be. That means that things that could be warnings should be warnings; things that can be non-fatal errors should be non-fatal. I couldn't tell you if it can be non-fatal, as I'm not implementing the feature; but if it can be, it should be.

EDIT: CLI options should fail right away, though. There's no reason to parse a file from a mistyped command line.

Rangi42 (Contributor, Author) commented on May 12, 2025

> CLI options should fail right away, though. There's no reason to parse a file from a mistyped command line.

Yeah, things like rgbasm -b123 are instant-failure errors, so -b;% would be too.
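
The split described here (non-fatal for opt, fatal for the command line) might look roughly like this. report_bad_digits is a hypothetical name, and calling sys.exit stands in for however RGBASM actually aborts on a bad CLI option.

```python
import sys


def report_bad_digits(message: str, from_cli: bool) -> bool:
    """Report an invalid digit spec; returns False if assembly may continue."""
    if from_cli:
        # A mistyped command line: stop before parsing any source file.
        sys.exit(f"FATAL: {message}")
    # An opt directive: print a non-fatal error and return so assembly can go on.
    print(f"error: {message}", file=sys.stderr)
    return False
```

The same validation can feed both paths; only the reaction to a failure differs.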

Rangi42 (Contributor, Author) commented on May 12, 2025

> Anything not ASCII. This is a nightmare waiting to happen.

But I need to do db %😻ඞ😻ඞ😻ඞ! (It's not ambiguous with any language syntax! :3 )

aaaaaa123456789 (Member) commented on May 12, 2025

[22:55:20] ax6@n3 ~ $ echo -n '😻ඞ😻ඞ😻ඞ' | hexdump -C
00000000  f0 9f 98 bb e0 b6 9e f0  9f 98 bb e0 b6 9e f0 9f  |................|
00000010  98 bb e0 b6 9e                                    |.....|
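
The hexdump makes the point: each of those characters is three or four bytes of UTF-8, so anything that works byte by byte sees seven bytes where the user typed two "digits". Python's str.encode reproduces the same bytes:

```python
digits = "😻ඞ"  # the two "digits" from the example above
encoded = digits.encode("utf-8")

print(len(digits), len(encoded))  # 2 7
print(encoded.hex(" "))           # f0 9f 98 bb e0 b6 9e
```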