Apple bookmarkData #493
Nice!
Aha yes Had a quick look, maybe something like this would work?
I think the problem might be that by default fq currently turns "raw" fields into truncated "" strings when represented as a jq value, jq has no binary type unfortunately. Maybe have to rethink that and also i think maybe gojq had some changes that might have made strings "binary safe". But there is an option to change how raw fields should be represented, try Nice work, i will review more tomorrow! |
Had a look at binary safe strings in jq and gojq. jq is not safe (replaces invalid code points with "REPLACEMENT CHARACTER") but gojq is. Seems i remembered it wrong, i think maybe i confused it with gojq removing support for \x## escapes.
But maybe as expected in gojq the bytes will not survive a JSON round-trip, could maybe be worked around in fq but then it would not be standard compliant i guess.
Also gojq uses go So i guess the only reason to still truncate would be that some formats would in common cases produce very big strings for some raw fields (ex mdat in mp4) when converted to JSON. Note that even with no truncation and using string instead of base64, torepr as JSON output would still produce JSON that would not keep some bytes (the JSON round-trip issue), it's only binary safe "internally" in fq, ex when using a query to find bookmark blob strings and decode them.
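To make the round-trip point concrete, here is a minimal sketch using Go's standard `encoding/json` as a stand-in. It is an assumption that gojq's own encoder behaves the same way for this purpose; the helper names `roundTripString` and `roundTripBase64` are made up for the example. Bytes that are not valid UTF-8 are replaced on encode, while the same bytes carried as base64 text survive.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// roundTripString sends a Go string through JSON encoding and decoding.
// encoding/json coerces strings to valid UTF-8, so invalid bytes are lost.
func roundTripString(s string) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	var out string
	if err := json.Unmarshal(b, &out); err != nil {
		return "", err
	}
	return out, nil
}

// roundTripBase64 does the same trip but carries the bytes as base64 text,
// which is plain ASCII and therefore unaffected by the UTF-8 coercion.
func roundTripBase64(raw []byte) ([]byte, error) {
	s, err := roundTripString(base64.StdEncoding.EncodeToString(raw))
	if err != nil {
		return nil, err
	}
	return base64.StdEncoding.DecodeString(s)
}

func main() {
	// 0xff and 0xfe can never appear in valid UTF-8.
	raw := []byte{'b', 'o', 'o', 'k', 0xff, 0xfe, 0x00}

	asString, _ := roundTripString(string(raw))
	fmt.Printf("string survived: %v\n", asString == string(raw))

	asBase64, _ := roundTripBase64(raw)
	fmt.Printf("base64 survived: %v\n", string(asBase64) == string(raw))
}
```

This is why a string can be binary safe inside a single process and still lose bytes once it is written out as JSON.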
Had a look at the NSKeyedArchiver thingy, seems to represent an object graph? would be cool if we could reconstruct it somehow? i gave it a try but think i might have found out where the unknown fields come from. When i look at some of the sfl2 files they have a "root" uid that is 0 but i think they should be 1. Could it be that uid types are decoded too short somehow? have a look at this:
Should the 0x01 be part of the uid value? |
You're right, UID was reading short. I fixed that so that it reads correctly now. I'm still seeing bit ranges unaccounted for, but a side-by-side comparison of the current |
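For readers following the UID discussion, here is a small sketch of how a binary plist UID is laid out as I understand it from Apple's open-source CFBinaryPList implementation: a marker byte whose high nibble is `0x8` and whose low nibble holds the payload length minus one. The function `decodeUID` is hypothetical and is not the code from this PR. Forgetting the "plus one" is one plausible way to end up reading a UID too short, though the thread does not show what the actual bug was.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeUID reads a bplist UID object that starts at b[0].
// Hypothetical helper for illustration only.
func decodeUID(b []byte) (uint64, error) {
	if len(b) == 0 || b[0]>>4 != 0x8 {
		return 0, errors.New("not a UID marker")
	}
	// The low nibble stores the byte count minus one, so 0x80 means one
	// payload byte follows the marker.
	n := int(b[0]&0x0f) + 1
	if n > 8 {
		return 0, errors.New("UID wider than 8 bytes is not handled in this sketch")
	}
	if len(b) < 1+n {
		return 0, errors.New("truncated UID")
	}
	// Accumulate the payload as a big-endian unsigned integer.
	var v uint64
	for _, x := range b[1 : 1+n] {
		v = v<<8 | uint64(x)
	}
	return v, nil
}

func main() {
	// Marker 0x80 followed by 0x01: the 0x01 is the UID value itself.
	v, err := decodeUID([]byte{0x80, 0x01})
	fmt.Println(v, err)
}
```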
I'm confused about why this won't work:
But this will:
Does |
`fq -h apple_bookmarkdata` seems to be producing some strange effects with line breaks. Is this something that is being caused by my editor? I noticed that the format markdown docs for other formats are much more succinct, should I remove all of the explanation of the format and maybe just leave them as references? |
Hmm that is strange, i think your markdown should be ok. There is some line break code that is a bit shaky, i suspect that might be the reason (in markdown.jq), will investigate. |
👍 will have a look again also and see if i can figure something out, but i have run into formats that do have gaps that are not "reachable" or not even padding/alignment it seems |
So it only return strings for decode value "roots", this way it can be used to find all nested formats etc. As a side note, there happens to exist a Back to the query: So let's break down
So
Hope that helped. I'm thinking about writing some kind of jq guide, so good to practice explaining how jq works :) So for sfl2 files i think we have to figure out some good way to find bookmark blobs. I was thinking maybe it's possible to reconstruct the NSKeyedArchiver object tree (assume it's not a cyclic graph?) and then look for bookmarks and decode things?
Fixed the help word breaking a bit #497. Now looks ok i think, but maybe can move large links inside the text to references? they are treated like a very wide word :)
Your explanation was very helpful! Unfortunately my question was malformed (I meant to have a second But after thinking some more about things, I think I understand why
Correct! and i guess it's tricky to make the bplist decoder somehow be able to detect "subformats" like bookmarks, even trickier when it's yet another encoding like NSKeyedArchiver. So doing it with some jq helper function or some snippet if short enough makes sense i think.
Looks good overall i think, some things you want to add? the jq snippet code things? |
Found definitions for the individual bit fields so I added decoding for that. Those are tricky but I think they are being decoded correctly. It's difficult to test and verify. Not quite ready to move forward until I'm sure those are correct, I'm going to take another look tomorrow. |
A bit unrelated to this PR but what are some similar tools to fq for doing forensics? i think it would be interesting to see how they work
https://github.com/ydkhatri/mac_apt These are a few. Some have a different mission than The fact that |
Any suggestions for a good workflow for when I need to regenerate |
type stack []int64

var offsetStack stack
I noticed a `slice bounds out of range [:-1]` panic in `pop` when i ran the tests and could reproduce with `go test -v -count=1 -run TestFormats/applebookmark ./format`. I think the issue is that tests run in parallel and `offsetStack` is modified and shared. Move the var into `bookmarkDecode` and pass it around?
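A rough sketch of the suggested fix, with made-up names: `decodeOne` stands in for `bookmarkDecode`, and `push`/`pop` are written for this example rather than copied from the PR. Because the stack is a local variable, each decode owns its own copy and concurrent test runs cannot pop entries pushed by another goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// stack mirrors the `type stack []int64` shown in the diff.
type stack []int64

func (s *stack) push(v int64) { *s = append(*s, v) }

// pop panics with "slice bounds out of range [:-1]" on an empty stack,
// which is the failure mode a shared stack can trigger under concurrency.
func (s *stack) pop() { *s = (*s)[:len(*s)-1] }

// decodeOne keeps the stack local to the call instead of package level.
// It returns the deepest nesting it reached, just to have a visible result.
func decodeOne(offsets []int64) int {
	var offsetStack stack
	deepest := 0
	for _, o := range offsets {
		offsetStack.push(o)
		if len(offsetStack) > deepest {
			deepest = len(offsetStack)
		}
	}
	for range offsets {
		offsetStack.pop()
	}
	return deepest
}

func main() {
	var wg sync.WaitGroup
	results := make([]int, 8)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = decodeOne([]int64{0x30, 0x6c, 0x90})
		}(i)
	}
	wg.Wait()
	fmt.Println(results)
}
```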
Yeah, I don't know what I was thinking making that global. Also a problem in `pushAndPop`, the method references the global variable instead of the struct pointer.
After some thought the cleanest way to do things was to make the stack internal to the `decode.D`. Maybe this is good going forward? If so, can fix up the `bplist` decoder to use it as well since it does something similar.
> Yeah, I don't know what I was thinking making that global. Also a problem in pushAndPop, the method references the global variable instead of the struct pointer.

No worries. I'm surprised it was not detected by the tests running with -race, hmm.

> After some thought the cleanest way to do things was to make the stack internal to the decode.D. Maybe this is good going forward? If so, can fix up the bplist decoder to use it as well since it does something similar.

I'm a bit reluctant to add things to `*decode.D`, need to think a bit more about it if so. I guess that field would have to get inherited down into sub decoders? could we go with a separate type for this for now? maybe we could put it in the decode package if it should be shared? another option is to put bplist and apple_bookmark into format/apple etc so they can share things?
What do you think about something like this:
type posLoopDetector []int64

// PushAndPop attempts to push a value to the stack of saved offsets, invoking
// d.Fatalf if the offset is already present on the stack. Returns the Pop
// function, useful for invocation with defer. Intended to be used for
// detecting and short circuiting infinite recursion.
func (s *posLoopDetector) PushAndPop(p int64, detect func()) func() {
	s.Push(p, detect)
	return s.Pop
}

// Push attempts to add an offset to the offset stack, invoking d.Fatalf if the
// offset is already present on the stack. Intended as a means of detecting
// infinite recursion.
func (s *posLoopDetector) Push(p int64, detect func()) {
	for _, o := range *s {
		if p == o {
			detect()
		}
	}
	*s = append(*s, p)
}

// ...

func makeDecodeRecord() func(d *decode.D) {
	var pld posLoopDetector
	var decodeRecord func(d *decode.D)
	decodeRecord = func(d *decode.D) {
		defer pld.PushAndPop(
			d.Pos(),
			func() { d.Fatalf("infinite record loop detected") },
		)()
		d.FieldStruct("record", func(d *decode.D) {
			n := int(d.FieldU32("length"))
			typ := d.FieldU32("type", dataTypeMap)
			switch typ {
			case dataTypeString:
				d.FieldUTF8("data", n)
				d.FieldRawLen("alignment_bytes", 32-((d.Pos()+32)%32))
			case dataTypeData:
				d.FieldRawLen("data", int64(n*8))
			case dataTypeNumber8:
				d.FieldS8("data")
			case dataTypeNumber16:
				d.FieldS16("data")
			case dataTypeNumber32:
				d.FieldS32("data")
			case dataTypeNumber64:
				d.FieldS64("data")
			case dataTypeNumber32F:
				d.FieldF32("data")
			case dataTypeNumber64F:
				d.FieldF64("data")
			case dataTypeDate:
				d.FieldF64BE("data", scalar.DescriptionTimeFn(scalar.S.TryActualF, cocoaTimeEpochDate, time.RFC3339))
			case dataTypeBooleanFalse:
			case dataTypeBooleanTrue:
			case dataTypeArray:
				d.FieldStructNArray("data", "element", int64(n/arrayEntrySize), func(d *decode.D) {
					offset := calcOffset(d.FieldU32("offset"))
					d.SeekAbs(offset, decodeRecord)
				})
			case dataTypeDictionary:
				d.FieldStructNArray("data", "element", int64(n/dictEntrySize), func(d *decode.D) {
					keyOffset := calcOffset(d.FieldU32("key_offset"))
					d.FieldStruct("key", func(d *decode.D) {
						d.SeekAbs(keyOffset, decodeRecord)
					})
					valueOffset := calcOffset(d.FieldU32("value_offset"))
					d.FieldStruct("value", func(d *decode.D) {
						d.SeekAbs(valueOffset, decodeRecord)
					})
				})
			case dataTypeUUID:
				d.FieldRawLen("data", int64(n*8))
			case dataTypeURL:
				d.FieldUTF8("data", n)
			case dataTypeRelativeURL:
				baseOffset := d.FieldU32("base_url_offset")
				d.FieldStruct("base_url", func(d *decode.D) {
					d.SeekAbs(int64(baseOffset), decodeRecord)
				})
				suffixOffset := d.FieldU32("suffix_offset")
				d.FieldStruct("suffix", func(d *decode.D) {
					d.SeekAbs(int64(suffixOffset), decodeRecord)
				})
			}
		})
	}
	return decodeRecord
}

// and then in decodeEntries do:
// d.SeekAbs(calcOffset(key&0x7fffffff), makeDecodeRecord())
Spent some time figuring out a test case for this, maybe this in loop.book.fqtest
# loop.book created with:
# fq '[.bookmark_entries[0] | .offset_to_record, .record.data[0].offset | tobytesrange] as [$o0,$o1] | tobytes | [.[0:$o1.start],$o0,.[$o1.start+$o1.size:]] | tobytes' > loop.book
$ fq -d apple_bookmark ._error.error loop.book
"error at position 0x6c: infinite record loop detected"
(i should really figure out a nicer way to deal with errors hmm)
This is really cool, I was trying to figure out the cleanest way to add the stack to the scope of the `decodeRecord` function that didn't make a huge mess when passing parameters around, but I didn't know closures could be used like this!
For some reason it is not catching the infinite recursion in the example that I produced with your `fq` one-liner there. See the `loop.book` that I added to the testdata.
> For some reason it is not catching the infinite recursion in the example that I produced with your fq one-liner there. See the loop.book that I added to the testdata.

Was a typo, see code comment
> This is really cool, I was trying to figure out the cleanest way to add the stack to the scope of the decodeRecord function that didn't make a huge mess when passing parameters around, but I didn't know closures could be used like this!

Yeap very useful pattern! go has quite good support for closures and lambdas, being GC:d and blurring what is stack and heap helps also. The syntax is a bit verbose, but not sure i wish for ml/haskell syntax either, a tiny bit more syntax help would be nice :)
…ebookmark.fqtestdecode: converts applebookmark to use new d.PushAndPop method
@@ -30,6 +36,7 @@ References
==========
- https://developer.apple.com/documentation/foundation/url/2143023-bookmarkdata
- https://mac-alias.readthedocs.io/en/latest/bookmark_fmt.html
- https://www.mac4n6.com/blog/2016/1/1/manual-analysis-of-nskeyedarchiver-formatted-plist-files-a-review-of-the-new-os-x-1011-recent-items
-
Guess this is the line breaking code gone wrong when in lists? hmm maybe keep it like this and i will have a look at it separately
}

var resourcePropDecoder = &dataObjectDecoder{decodeTgtPropertyFlagBits}
var volumePropDecoder = &dataObjectDecoder{decodeVolPropertyFlagBits}
What do you think about something like:
diff --git a/format/applebookmark/apple_bookmark.go b/format/applebookmark/apple_bookmark.go
index 4716d4d0..d22bc83c 100644
--- a/format/applebookmark/apple_bookmark.go
+++ b/format/applebookmark/apple_bookmark.go
@@ -150,22 +150,12 @@ func decodeFlagDataObject(d *decode.D, flagFn func(d *decode.D)) {
d.FieldU32("length", d.AssertU(dataObjectLen))
d.FieldU32("raw_type", dataTypeMap, d.AssertU(dataTypeData))
d.FieldValueStr("type", "flag_data")
- decodePropertyFlags(d, flagFn)
+ d.FieldStruct("property_flags", flagFn)
+ d.FieldStruct("enabled_property_flags", flagFn)
d.FieldRawLen("reserved", 64)
})
}
-type dataObjectDecoder struct {
- flagFn func(d *decode.D)
-}
-
-func (dod *dataObjectDecoder) decode(d *decode.D) {
- decodeFlagDataObject(d, dod.flagFn)
-}
-
-var resourcePropDecoder = &dataObjectDecoder{decodeTgtPropertyFlagBits}
-var volumePropDecoder = &dataObjectDecoder{decodeVolPropertyFlagBits}
-
func decodeTgtPropertyFlagBits(d *decode.D) {
start := d.Pos()
d.FieldBool("is_hidden")
@@ -251,12 +241,6 @@ func decodeVolPropertyFlagBits(d *decode.D) {
d.FieldBool("supports_volume_sizes")
}
-func decodePropertyFlags(d *decode.D, bitFn func(d *decode.D)) {
- d.FieldStruct("property_flags", bitFn)
-
- d.FieldStruct("enabled_property_flags", bitFn)
-}
-
var cocoaTimeEpochDate = time.Date(2001, time.January, 1, 0, 0, 0, 0, time.UTC)
type tocHeader struct {
@@ -286,9 +270,9 @@ func (hdr *tocHeader) decodeEntries(d *decode.D) {
switch entry.key {
case elementTypeTargetFlags:
- d.SeekAbs(entry.recordOffset, resourcePropDecoder.decode)
+ d.SeekAbs(entry.recordOffset, func(d *decode.D) { decodeFlagDataObject(d, decodeTgtPropertyFlagBits) })
case elementTypeVolumeFlags:
- d.SeekAbs(entry.recordOffset, volumePropDecoder.decode)
+ d.SeekAbs(entry.recordOffset, func(d *decode.D) { decodeFlagDataObject(d, decodeVolPropertyFlagBits) })
default:
d.SeekAbs(entry.recordOffset, decodeRecord)
}
sometimes i wish go had a few more functional language things like currying :)
That's probably better. I was trying to avoid the function declaration inside of the `d.SeekAbs` parameter list but it ended up creating a whole bunch of other code that isn't being reused. I haven't done much functional programming outside of general familiarity with first-class functions, but currying does seem like it would help here.
👍 I guess we could try to use generics to make this pattern nicer. It seems to happen quite a lot, maybe something like `d.SeekAbsFn1<T>(offset, some-T-value, fn(d *decode.D, v T) { ... })` could save on nested functions in the decoder code (and have `SeekAbsFn2` if it takes two generic args)
})
}

entry.recordOffset = calcOffset(d.FieldU32("offset_to_record"))
Could both key and recordOffset be local variables instead? skip the tocEntry struct?
I think we have to leave this as-is so that `torepr` will continue to work properly (it relies on the key translations in the `elementTypeMap`).
Aha i meant still have key and offset_to_record fields but do `recordOffset := calcOffset(d.FieldU32("offset_to_record"))` and above `key := d.FieldU32("key", elementTypeMap)` as they are not used outside each iteration?
Sorry for all the late comments, hard to review by just reading so once you start poking around in an editor you notice things :) But think we're close to merge now |
defer pld.pushAndPop(
	d.Pos(),
	func() { d.Fatalf("infinite recursion detected in record decode function") },
)
typo here, the returned pop function is not what gets deferred, add a `()` at the end. After that the loop fqtest works for me 👍
Did you get the WRITE_ACTUAL workflow to work? Some other workflows i use: Use
Use
Now It's usually also nice in general to spend some time to get a query that shows just the part you're working on and then start making changes.
Ready to merge? only nit pick left was to possibly replace |
Yeah I think it's ready, just made the final edit you suggested |
One last question though - I keep getting test failures locally for three formats, all for the same reason (differences in the way raw byte ranges are represented as strings). Something misconfigured locally maybe?
|
🥳 |
Hmm i recognize this issue. Have a vague memory there was a string escape difference between go 1.18 and 1.19? could that be it? if so try to update, but we can probably add some workaround also
Upgraded to 1.19 and that fixed it, thx |
Great, but maybe a workaround is worth it, official go 1.18 support will be dropped in 4 months i think but i suspect that new versions of fq will be able to build with it for a long time.
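A quick way to check for this kind of toolchain difference locally. One escaping change I am aware of from the Go 1.19 release notes is that `strconv.Quote` and related functions quote U+007F as `\x7f` instead of `\u007f`; whether that is the exact difference behind the failing tests here is an assumption, not something the thread confirms. The helper name `quotedDEL` is made up.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
)

// quotedDEL returns how the running toolchain quotes the DEL byte (0x7f).
func quotedDEL() string {
	return strconv.Quote("\x7f")
}

func main() {
	// Print the Go version next to the quoted form so outputs from two
	// machines can be compared directly.
	fmt.Println(runtime.Version(), quotedDEL())
}
```

Running this under both toolchains and diffing the output shows whether string quoting is what changed between them.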
This PR implements a decoder for macOS/iOS `bookmarkData` blobs, which are often found within binary plist files. These are used to resolve URL objects for a file, even if the user moves or renames it.

There are some issues that I could use help ironing out, namely getting it to work properly as a nested format using `grep_by` or `select` in recursive `jq` expressions. I think there may be something wrong with the way `torepr` and `tobytes` are working in the `bplist` decoder. For instance this does not work in the way that I would think it would (does not yield any results):

In this case, when a `bplist` `data` type is passed to `tobytes`, the bytes output are the truncated base64 representation: