Runtime format descriptions #24

Closed
jmerdich opened this issue Dec 23, 2021 · 5 comments

Comments

@jmerdich

Very interested in this project, but I often work on non-standard or proprietary formats. All of the format support so far seems to be done at compile time, with parsers written as Go modules. It'd be nice to have something that can be used for real oddball formats that either already have templates in other binary-templating tools (Kaitai, 010 Editor, Hex Fiend) or are not appropriate for upstreaming. Additionally, the status quo requires anyone wanting to contribute a format to know or learn Go.

As such, it'd be very nice to have some way of using a new format without extending and recompiling fq directly ("Runtime Loading").

There's a bunch of existing standards here:

  • Kaitai Struct format (.ksy)
  • 010 Binary Template format (.bt)
  • HexFiend/WinHex TCL format (.tcl)
  • Probably many more

In addition, Go's plugin system could be used for this. This is probably the easiest to implement (since all existing formats are written as Go modules), but it can't reuse existing templates from other ecosystems.

@wader
Owner

wader commented Dec 24, 2021

Hi, yes, I'm very interested in supporting this somehow and I've done some thinking about it.

Currently the only way to provide your own decoders written in Go is to use fq as a submodule, register your own formats, and call the cli.Main function from your own main. I have a private version of fq that does this and it works quite nicely, but it's not very flexible. Also, the Go decode API will probably not be stable for a while.

Kaitai is probably the one I will look at first. My plan is to see how complicated it would be to implement a ksy parser and "executor" on top of the decoder API; maybe a first step would be to support the most basic things.

Supporting TCL scripts would probably require linking with the C version of TCL, which I really don't want to do; I think it would complicate building and introduce lots of other issues. While prototyping fq I actually went this route for a while (https://github.com/wader/textfiend is part of it :) ).

010 Binary Template I don't know about; it looks C-code-ish?

Go plugins I haven't used myself, so I'll have to read up on those. I guess you'd have to provide a versioned API somehow for that to make sense?
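For reference, here is a minimal sketch of how a versioned plugin API could look using Go's standard plugin package. The symbol names APIVersion and Decode, the hostAPIVersion constant and the decoder signature are all assumptions made up for this example; nothing like this exists in fq. The signature sticks to builtin types because named types would have to come from a package shared by host and plugin.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"plugin"
)

// hostAPIVersion is a hypothetical version number of the host's decoder API.
const hostAPIVersion = 1

// DecodeFunc is the (assumed) signature a plugin must export as "Decode".
type DecodeFunc = func(format string, data []byte) (map[string]any, error)

// checkAPIVersion rejects plugins built against a different API version.
func checkAPIVersion(pluginVersion int) error {
	if pluginVersion != hostAPIVersion {
		return fmt.Errorf("plugin API version %d, host expects %d", pluginVersion, hostAPIVersion)
	}
	return nil
}

// loadDecoder opens a plugin file, checks its version and returns its decoder.
func loadDecoder(path string) (DecodeFunc, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open plugin: %w", err)
	}
	sym, err := p.Lookup("APIVersion")
	if err != nil {
		return nil, err
	}
	version, ok := sym.(*int) // exported variables are returned as pointers
	if !ok {
		return nil, errors.New("APIVersion must be an int variable")
	}
	if err := checkAPIVersion(*version); err != nil {
		return nil, err
	}
	sym, err = p.Lookup("Decode")
	if err != nil {
		return nil, err
	}
	decode, ok := sym.(DecodeFunc) // exported functions are returned as values
	if !ok {
		return nil, errors.New("Decode has an unexpected signature")
	}
	return decode, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: loader plugin.so")
		return
	}
	if _, err := loadDecoder(os.Args[1]); err != nil {
		fmt.Println("could not load plugin:", err)
		return
	}
	fmt.Println("plugin loaded")
}
```

A plugin for this would be built with go build -buildmode=plugin. Note that the plugin package only works on some platforms and needs the plugin and host to be built with matching toolchains and dependencies, which is part of why a version check alone may not be enough.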

One approach I've looked into is writing decoders in jq. It's a bit exotic, but it could be useful for very simple decoders and I think the syntax can be quite nice, something like .. | decode_by({a: u8, b: utf8(10)}).

In summary that would provide three levels of decoders:

  • Builtin or via submodule in Go, fast and typed
  • Builtin decoder in Go that gets a description (Kaitai, protobuf, ...)
  • Decoder in jq
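
To make the "decoder that gets a description" level a bit more concrete, here is a rough Go sketch of what something like decode_by({a: u8, b: utf8(10)}) might boil down to. The Field type and decodeBy function are hypothetical names for this example only, and it assumes u8 means one unsigned 8-bit byte and utf8(n) means n bytes of UTF-8 text.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Field is one entry of a hypothetical runtime description,
// for example {a: u8, b: utf8(10)} becomes two Fields.
type Field struct {
	Name string
	Type string // "u8" or "utf8" in this sketch
	Len  int    // byte length, only used by "utf8"
}

// decodeBy walks the description in order and reads each field from data.
func decodeBy(desc []Field, data []byte) (map[string]any, error) {
	out := map[string]any{}
	pos := 0
	for _, f := range desc {
		switch f.Type {
		case "u8": // assumed here to be a single unsigned byte
			if pos+1 > len(data) {
				return nil, fmt.Errorf("%s: unexpected end of input", f.Name)
			}
			out[f.Name] = data[pos]
			pos++
		case "utf8":
			if f.Len < 0 || pos+f.Len > len(data) {
				return nil, fmt.Errorf("%s: unexpected end of input", f.Name)
			}
			b := data[pos : pos+f.Len]
			if !utf8.Valid(b) {
				return nil, fmt.Errorf("%s: invalid UTF-8", f.Name)
			}
			out[f.Name] = string(b)
			pos += f.Len
		default:
			return nil, fmt.Errorf("%s: unknown type %q", f.Name, f.Type)
		}
	}
	return out, nil
}

func main() {
	desc := []Field{
		{Name: "a", Type: "u8"},
		{Name: "b", Type: "utf8", Len: 10},
	}
	v, err := decodeBy(desc, []byte("\x2ahello fq!!rest"))
	fmt.Println(v, err)
}
```

The description is plain data, so it could come from jq, a ksy file or anything else; the decoder itself never needs to be recompiled.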

Another issue is how the CLI should work when providing description(s). I think it would be good if the decoders only get a string, and it's then up to the CLI code to read from a file etc. and glue it together. That way you could also manually call a Kaitai decoder, or different decoders, with two different external descriptions.

Maybe something like this:

# provide a default description that will be used by all description based decoders (--desc sets some global var) 
fq --desc file.ksy -d kaitai file
# decode two different files with two different descriptions
# TODO: not very nice
fq --raw-file d1 d1.ksy --raw-file d2 d2.ksy -n -d raw '(input | kaitai($d1)), (input | kaitai($d2))' f1 f2

Sorry this got a bit long and turned into some kind of implementation plan. Let me know what you think.

@arl

arl commented Feb 18, 2022

Hey, awesome project.

The first thing I looked for in the features, since it's the use case I'd have, was the ability to define my own format at runtime, like with 010 Editor. I used this commercial hex editor years ago, and it was a life saver when working on proprietary binary formats or protocols. Formats were defined in what they called binary templates, a struct description language similar to C but, of course, trimmed down. I'm linking it for reference.

Also, FWIW there's a pure go port of TCL, https://pkg.go.dev/modernc.org/tcl

@wader
Owner

wader commented Feb 18, 2022

> Hey, awesome project.

Thanks!

> The first thing I looked for in the features, since it's the use case I'd have, was the ability to define my own format at runtime, like with 010 Editor. I used this commercial hex editor years ago, and it was a life saver when working on proprietary binary formats or protocols. Formats were defined in what they called binary templates, a struct description language similar to C but, of course, trimmed down. I'm linking it for reference.

Yeah, I would also like to have some kind of runtime decoders and hope I can get some time to focus on it soon. Lately I've been busy documenting and cleaning up some early sins/mess to make things more consistent.

First I think I would like to support Kaitai Struct somehow, but it's a bit tricky, as Kaitai seems focused on transpiling decoders to other languages at build/dev time and I want to do it at runtime. So I think I will have to implement my own interpreter thingy somehow. I did some prototyping that was quite promising. Supporting decoding in jq would also be interesting but a bit exotic; I have done some prototyping on that as well, which was also promising.

Is it something you would like to work on?

> Also, FWIW there's a pure go port of TCL, https://pkg.go.dev/modernc.org/tcl

Interesting, I've seen the same C-to-Go transpiler used for sqlite. But it still seems experimental for some archs? I wonder why.

@wader
Owner

wader commented Jul 28, 2022

I have an early test prototype of how this could work, mostly just to see how it would look/feel to a user:

$ fq -o source=@path/to/test.ksy -d kaitai d test
     │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15│0123456789abcdef012345│.{}: testfiles/test.mp3 (kaitai)
0x000│49 44                                                            │ID                    │  numbers1: 18756
     │                                                                 │                      │  numbers3[0:10]:
0x000│      33 04 00 00 00 00 00 23                                    │  3......#            │    [0]: 3676063195841167395
0x000│                              54 53 53 45 00 00 00 0f            │          TSSE....    │    [1]: 6076291878070779919
0x000│                                                      00 00 03 4c│                  ...L│    [2]: 3626587547189
0x016│61 76 66 35                                                      │avf5                  │
0x016│            38 2e 32 39 2e 31 30 30                              │    8.29.100          │    [3]: 4048228336222154800
0x016│                                    00 00 00 00 00 00 00 00      │            ........  │    [4]: 0
0x016│                                                            00 00│                    ..│    [5]: 1099432984576
0x02c│00 ff fb 50 00 00                                                │...P..                │
0x02c│                  00 00 00 00 00 00 00 00                        │      ........        │    [6]: 0
0x02c│                                          00 00 00 00 00 00 00 00│              ........│    [7]: 0
0x042│00 00 00 00 00 00 00 00                                          │........              │    [8]: 0
0x042│                        00 00 00 00 00 00 00 49                  │        .......I      │    [9]: 73
     │                                                                 │                      │  test{}:
0x042│                                                6e               │                n     │    a: 110
0x042│                                                   66 6f 00 00 00│                 fo...│  unknown0: raw bits
0x058│0f 00 00 00 02 00 00 04 13 00 99 99 99 99 99 99 99 99 99 99 99 99│......................│
*    │until 0x43f.7 (end) (1005)                                       │                      │

The idea is to pass the ksy as an option to a kaitai decoder, using the @ syntax to read the option value from a file. This way the rest of the fq CLI code does not need to know about Kaitai. But maybe there are nicer ways?
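
As an illustration of the @ idea, here is a small Go sketch of how a CLI layer could turn an option like source=@path/to/test.ksy into a plain string before handing it to a decoder. parseOption and its injectable readFile parameter are hypothetical names for this example and not how fq actually implements option parsing.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseOption splits "key=value". If the value starts with "@", the rest is
// treated as a file path and the file's contents become the value, so the
// decoder only ever sees a string. readFile is injectable to ease testing.
func parseOption(kv string, readFile func(string) ([]byte, error)) (key, value string, err error) {
	key, value, ok := strings.Cut(kv, "=")
	if !ok {
		return "", "", fmt.Errorf("option %q: expected key=value", kv)
	}
	if !strings.HasPrefix(value, "@") {
		return key, value, nil
	}
	path := strings.TrimPrefix(value, "@")
	b, err := readFile(path)
	if err != nil {
		return "", "", fmt.Errorf("option %q: %w", key, err)
	}
	return key, string(b), nil
}

func main() {
	// With a real file system the description is read from disk here.
	key, value, err := parseOption("source=@path/to/test.ksy", os.ReadFile)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s is %d bytes of description\n", key, len(value))
}
```

Keeping the file reading in the CLI layer means the same decoder can be called with a description that came from a file, a literal string or another expression.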

To do this at runtime there is much to be done:

  • A runtime kaitai interpreter
  • Parse/eval of the Kaitai expression language
  • Should it handle ksy including other formats/ksy?
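
To give a feel for the first bullet, here is a very small Go sketch of a runtime interpreter for a tiny subset of ksy: a seq of attributes with fixed-size unsigned integer types, a fixed repeat count and nested user types. The Spec and Attr types stand in for an already-parsed ksy file and are assumptions for this example; there is no YAML parsing, no expression language and no imports, and the byte order is spelled out in the type name rather than taken from meta/endian.

```go
package main

import "fmt"

// Attr mirrors a small subset of a ksy "seq" entry.
type Attr struct {
	ID     string
	Type   string // builtin such as "u1", "u2be", "u8be", or a user type name
	Repeat int    // 0 means no repeat, otherwise a fixed element count
}

// Spec is a hypothetical in-memory form of an already-parsed ksy file.
type Spec struct {
	Seq   []Attr
	Types map[string][]Attr
}

// builtins maps Kaitai-style integer type names to size in bytes and byte order.
var builtins = map[string]struct {
	size   int
	little bool
}{
	"u1": {1, false}, "u2be": {2, false}, "u2le": {2, true},
	"u4be": {4, false}, "u4le": {4, true}, "u8be": {8, false}, "u8le": {8, true},
}

type interp struct {
	spec Spec
	data []byte
	pos  int
}

// readOne decodes a single value of the given builtin or user type.
func (in *interp) readOne(typ string) (any, error) {
	if bt, ok := builtins[typ]; ok {
		if in.pos+bt.size > len(in.data) {
			return nil, fmt.Errorf("%s: need %d bytes at offset %d", typ, bt.size, in.pos)
		}
		b := in.data[in.pos : in.pos+bt.size]
		in.pos += bt.size
		var v uint64
		for i := range b {
			if bt.little {
				v |= uint64(b[i]) << (8 * uint(i))
			} else {
				v = v<<8 | uint64(b[i])
			}
		}
		return v, nil
	}
	seq, ok := in.spec.Types[typ]
	if !ok {
		return nil, fmt.Errorf("unknown type %q", typ)
	}
	return in.readSeq(seq)
}

// readSeq decodes attributes in order, handling fixed-count repeats.
func (in *interp) readSeq(seq []Attr) (map[string]any, error) {
	out := map[string]any{}
	for _, a := range seq {
		if a.Repeat == 0 {
			v, err := in.readOne(a.Type)
			if err != nil {
				return nil, fmt.Errorf("%s: %w", a.ID, err)
			}
			out[a.ID] = v
			continue
		}
		vs := make([]any, 0, a.Repeat)
		for i := 0; i < a.Repeat; i++ {
			v, err := in.readOne(a.Type)
			if err != nil {
				return nil, fmt.Errorf("%s[%d]: %w", a.ID, i, err)
			}
			vs = append(vs, v)
		}
		out[a.ID] = vs
	}
	return out, nil
}

// Interpret decodes data according to spec, starting at the top-level seq.
func Interpret(spec Spec, data []byte) (map[string]any, error) {
	in := &interp{spec: spec, data: data}
	return in.readSeq(spec.Seq)
}

func main() {
	spec := Spec{Seq: []Attr{{ID: "numbers1", Type: "u2be"}}}
	fmt.Println(Interpret(spec, []byte("ID3"))) // first two bytes of an ID3 header
}
```

Even this toy version shows where the remaining work is: the repeat count and sizes are normally expressions, which is where the expression language parser and evaluator come in.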

I'm not actively working on this at the moment so let me know if someone is interested.

@wader
Owner

wader commented Aug 7, 2023

Let's continue in #627

@wader wader closed this as completed Aug 7, 2023