Skip to content

Improve -MG #903

Open
Open
@ISSOtm

Description

@ISSOtm
Member

-MG only reports new dependencies one by one, it would be much better to do so in batches: generating more dependencies per step reduces the amount of RGBASM invocations, which is better.

Some testing with GCC shows that it reports all missing deps, and treats all missing files as empty. (#if, #ifdef etc. are evaluated normally.)

This could in theory cause some problems:

Data:
INCBIN "file.bin"
IF @ - Data == 0
    FAIL "Wut, the file is empty?"
ENDC

...to which I suggest aborting normally on any fatal error. This would be similar to the current behavior, but probably less common. Note however that GCC honors #error, still producing the deps, but returning 1 (so the first make errors out, but further ones work correctly). Maybe this could be better handled by skipping all conditional blocks entirely, still producing fatal errors encountered along the way?

... Overall, this sounds incredibly fragile, and I'm really not confident in this change, but it's largely a necessity if we want

Activity

added
enhancementTypically new features; lesser priority than bugs
rgbasmThis affects RGBASM
on Jul 4, 2021
Rangi42

Rangi42 commented on Jul 4, 2021

@Rangi42
Contributor

"skipping all conditional blocks entirely" wouldn't allow cases like this:

IF DEF(_GOLD)
	INCBIN "gfx/pokemon/eevee/front_gold.dimensions"
ELIF DEF(_SILVER)
	INCBIN "gfx/pokemon/eevee/front_silver.dimensions"
ENDC
added this to the v0.5.2 milestone on Jul 4, 2021
ISSOtm

ISSOtm commented on Jul 4, 2021

@ISSOtm
MemberAuthor

Then how should they be handled? Conditionally evaluating all blocks won't work either for example due to macro arg expansion...

(As a side note, I feel like we're fighting an uphill battle regarding this due to not being able to run -M as a preprocessing step...)

Rangi42

Rangi42 commented on Jul 4, 2021

@Rangi42
Contributor

I'm not sure; pret uses a custom scan_includes tool that ignores all rgbasm semantics and just scans for lines starting with INCLUDE/INCBIN. (Which doesn't even let us do things like INCLUDE "{VERSION}/foo.inc" or INCBIN STRCAT("build/", "bar.bin").)

The initial report sparking this gave this example:

section "x", ROM0
incbin "a"
incbin "b"

Without -M -MG both incbins complain about missing files; with -M -MG the first is printed as a dependency but the second isn't. (sect_BinaryFile calls fstk_FindFile calls printDep; then sect_BinaryFile can't open the file so complains and sets failedOnMissingInclude = true; I'm actually not sure why the second incbin isn't printed anyway by the if (generatedMissingIncludes) printDep(path); clause in fstk_FindFile.)

ISSOtm

ISSOtm commented on Jul 5, 2021

@ISSOtm
MemberAuthor

That's because failedOnMissingInclude causes YYACCEPT in parser.y

aaaaaa123456789

aaaaaa123456789 commented on Jul 5, 2021

@aaaaaa123456789
Member

Dependency listing is one of those things where accurate output is more important than fast output. If a project relies on dependency output in its build system, inaccurate output might cause a file not to be rebuilt, leaving stale code/data in the final binary.

Scanning assembly files for inclusion statements is a trivial task if anyone wants to build a faster but simpler tool. (I once wrote a scan_includes equivalent in all of half an hour, testing included. In pure ISO C17.) RGBASM should be as correct as possible in this aspect — projects that want the faster behavior can easily implement it.

daid

daid commented on Jul 5, 2021

@daid
Contributor

Scanning assembly files for inclusion statements is a trivial task if anyone wants to build a faster but simpler tool.

Unless you have to deal with rgbds macros and equs.

I currently first run rgbasm and then append the resulting dependency file with a grep + sed to add all the extra incbin statements as well, it's ugly, but it works in the common case.

RGBASM should be as correct as possible in this aspect — projects that want the faster behavior can easily implement it.

Depends on your definition of "correct". From my perspective, the current behavior is incorrect. I ask a file for it's dependencies, and it does not list all dependencies.

I had earlier questions about -MP behavior, and the reply I got back then was "current behavior matches gcc". And for this specific issue, the behavior does not match gcc. With -MG gcc lists all includes, and process all defines it does have, and if any #error is encountered it aborts with an error.

Maybe this could be better handled by skipping all conditional blocks entirely

That sounds like a really bad plan.

ISSOtm

ISSOtm commented on Jul 5, 2021

@ISSOtm
MemberAuthor

Unless you have to deal with rgbds macros and equs.

The counter-argument is "we don't use those, so our tool doesn't account for them". This is valid, as "this specific codebase doesn't use these features" lets you optimize for their lack. However, they introduce additional dependencies (typically Python or a C compiler), so removing their need would be useful.

I ask a file for it's dependencies, and it does not list all dependencies. [...] With -MG gcc lists all includes, and process all defines it does have, and if any #error is encountered it aborts with an error.

You are correct, but this is by necessity. C dependency generation is performed by the preprocessor, which only cares about its own syntax:

% cat c.c
#include "a.h"

rofl 1 { "ah yes" }?$;:
% gcc -MM -MG -MP c.c
c.o: c.c a.h

a.h:

And in addition, the preproc's syntax is not context-sensitive; that is, a C file can be correctly parsed (by the preproc) even when some of the files to be included are missing. (I believe #if etc. blocks cannot "straddle" files, but correct me on that if I'm wrong.)

RGBASM is not so lucky due to EQUS.

% cat a.asm
INCLUDE "{INC}"
PRINTLN 16 tiles
% cat b.inc
tiles EQUS "* 16"
% cat c.inc
% rgbasm -DINC=b.inc a.asm
$100
% rgbasm -DINC=c.inc a.asm
ERROR: a.asm(2)
    syntax error, unexpected identifier
error: Assembly aborted (1 error)!

The rest of the syntax can largely be handled: treat "required" yet non-existent variables as 0, ignore macros, etc. But the lack of an EQUS can (and likely will) introduce parsing errors.

Even then, since macros can generate INCLUDEs, dependencies can be hidden even when not transitive (a.asm defines a macro ninja that INCLUDE "c.inc", includes b.inc which defines samurai EQUS "ninja", then a.asm calls samurai as if it were a macro. Contrived? Certainly.)
The presence or absence of an INCBIN can be easily tested for as well, since the file's size is not known, therefore PC must advance by some number. This can further alter the chain in non-obvious ways, more than just FAIL or tripping asserts (assert @ - DataStart == $1234, "File does not have expected size"). How should those be handled?

That sounds like a really bad plan.

Since it appears that ignoring all IFs is wrong, then that should be restricted to all IFs whose expression generates a syntax error, I guess. Though introducing more behavioral differences between normal parsing and after a dependency turned out to be missing is a can of worms that I'm definitely not opening.

All in all, the lack of robustness in RGBASM programs is why the current behavior is what it is: after a failed INCBIN, and even moreso a failed INCLUDE, here be dragons. Only one dependency is currently listed per invocation out of necessity, not laziness. Even reporting syntax errors (gcc -MM -MG still errors out while printing nothing if the file contains #if defined(kaboom!) is not reliable, so reporting FAILs as well does not sound like a good idea.

This could be improved while staying relatively simple and safe: only abort on the first conditional block or fatal error after a missing dep, which would at least account for a chain of INCBINs not followed by strict assertions, and some INCLUDE chains.

aaaaaa123456789

aaaaaa123456789 commented on Jul 5, 2021

@aaaaaa123456789
Member

A fully accurate mode could be introduced by adding a flag that simply assembles the input into /dev/null and outputs the list of files it read. This will obviously be very slow, but also 100% correct (bar any randomness in the input) for any method that needs that, such as @daid's tool.

I'm not sure of whether this would be worth it, though.

ISSOtm

ISSOtm commented on Jul 5, 2021

@ISSOtm
MemberAuthor

I've just explained that it is not possible to assemble the input without the dependencies in the general case.

To do so in one go, RGBASM would need to be able to invoke the command responsible for generating the file, and proceed once the file has been generated. (This would flip the paradigm from RGBASM telling make what it needs to RGBASM invoking make itself, though I've proposed this idea long ago and it was shot down by ax6.)

daid

daid commented on Jul 5, 2021

@daid
Contributor

Only one dependency is currently listed per invocation out of necessity, not laziness.

I disagree, as without -MG the parsing continues after a missing incbin. So it can continue, but decides not to. For some extreme edge case reasons.

ISSOtm

ISSOtm commented on Jul 5, 2021

@ISSOtm
MemberAuthor

Parsing continues after a failed INCBIN as "current state known to be incorrect, let's try reporting more errors for this batch"; -MG instead tries to avoid being incorrect, so it aborts immediately. Again, I agree this is worth changing, what do you think about the changes I suggested above?

28 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementTypically new features; lesser priority than bugsrgbasmThis affects RGBASM

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      Participants

      @daid@aaaaaa123456789@ISSOtm@Rangi42

      Issue actions

        Improve -MG · Issue #903 · gbdev/rgbds