bplist: NSKeyedArchiver jq function #502

Closed
dgmcdona opened this issue Dec 3, 2022 · 20 comments

@dgmcdona
Contributor

dgmcdona commented Dec 3, 2022

NSKeyedArchiver stores objects in bplist format by flattening the object graph into an array of keys and values that reference each other by index. A common example is the sfl2 files located in ~/Library/Application Support/com.apple.sharedfilelist. @wader proposed the following function for reconstructing these objects into a more meaningful JSON representation:

def from_ns_keyed_archiver:
  (  . as {"$objects": $objs, "$top": {root: $root_uid}}
  | def _f($id):
      ( . #debug({$id})
      | $objs[$id]
      # | debug
      | if type == "string" then .
        elif type == "number" then .
        else
          (. as {"$class": $class}
          | if $class == 13 then # NSDictionary?
              ( . as {"NS.keys": $ns_keys, "NS.objects": $ns_objects}
              | [$ns_keys, $ns_objects]
              | transpose
              | map(
                  ( . as [$k, $o]
                  | {key: _f($k), value: _f($o)}
                  )
                )
              # | debug
              | from_entries
              )
            elif $class == 58 then #?
              ( . as {"NS.objects": $ns_objects}
              | $ns_objects
              | map(_f(.))
              )
            else "class-\($class)"
            end
          )
        end
      );
    _f($root_uid)
  );

However, it was found that the class numbers are not consistent across multiple files, so relying on them for interpreting underlying types is not a general solution. The following seems to work:

def from_ns_keyed_archiver:
  (  . as {"$objects": $objs, "$top": {root: $root_uid}}
  | def _f($id):
      ( . #| debug({$id})
      | $objs[$id]
      #| debug
      | if type == "string" then .
        elif type == "number" then .
        else
          (. as {"$class": $class}
          | . #debug
          | if ."NS.keys" != null and ."NS.objects" != null then
              ( . as {"NS.keys": $ns_keys, "NS.objects": $ns_objects}
              | [$ns_keys, $ns_objects]
              | transpose
              | map
                (
                  ( . as [$k, $o]
                  | {key: _f($k), value: _f($o)}
                  )
                )
              | from_entries
              )
            elif ."NS.objects" != null then
              ( . as {"NS.objects": $ns_objects}
              | $ns_objects
              | map(_f(.))
              )
            else "class-\($class)"
            end
          )
        end
      );
    _f($root_uid)
  );

However, we are not yet sure that this is best practice, since it was derived heuristically rather than from any known reference documentation. More work is needed to find the best way of identifying arrays and objects within NSKeyedArchiver representations.
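
For readers less familiar with jq, here is a minimal Python sketch of the same key-presence heuristic. The sample archive is made up for illustration, and the sketch has no protection against reference cycles.

```python
def decode(archive):
    """Rebuild nested values from a flattened archive using key presence only."""
    objects = archive["$objects"]

    def resolve(index):
        obj = objects[index]
        if not isinstance(obj, dict):
            return obj  # strings and numbers are stored inline
        if "NS.keys" in obj and "NS.objects" in obj:
            # two parallel index arrays become one mapping
            pairs = zip(obj["NS.keys"], obj["NS.objects"])
            return {resolve(k): resolve(v) for k, v in pairs}
        if "NS.objects" in obj:
            return [resolve(i) for i in obj["NS.objects"]]
        return f"class-{obj.get('$class')}"

    return resolve(archive["$top"]["root"])


# Hypothetical archive: a dictionary whose second value is an array.
sample = {
    "$objects": [
        "$null",
        {"$class": 6, "NS.keys": [2, 3], "NS.objects": [4, 5]},
        "name",
        "items",
        "example",
        {"$class": 7, "NS.objects": [2, 4]},
        {"$classname": "NSDictionary", "$classes": ["NSDictionary", "NSObject"]},
        {"$classname": "NSArray", "$classes": ["NSArray", "NSObject"]},
    ],
    "$top": {"root": 1},
}

print(decode(sample))
```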

@wader
Owner

wader commented Dec 4, 2022

Good summary. Let's collect info here and figure out what to do

@dgmcdona
Contributor Author

dgmcdona commented Dec 10, 2022

The class number seems to just be an index back into the object array, where the classname can be found, which is why those numbers were varying. This code seems to work:

def from_ns_keyed_archiver:
  (  . as {"$objects": $objs, "$top": {root: $root_uid}}
  | def _f($id):
      ( .
      | $objs[$id]
      | if type == "string" then .
        elif type == "number" then .
        elif type == "boolean" then .
        elif type == "null" then .
        else
          (. as {"$class": $class}
          | if $objs[$class]."$classname" == "NSDictionary" then
              ( . as {"NS.keys": $ns_keys, "NS.objects": $ns_objects}
              | [$ns_keys, $ns_objects]
              | transpose
              | map
                (
                  ( . as [$k, $o]
                  | {key: _f($k), value: _f($o)}
                  )
                )
              | from_entries
              )
            elif $objs[$class]."$classname" == "NSArray" then
              ( . as {"NS.objects": $ns_objects}
              | $ns_objects
              | map(_f(.))
              )
            else "class-\($class)"
            end
          )
        end
      );
    _f($root_uid)
  );
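
To illustrate why the raw class numbers differ between files, here is a small Python sketch of the lookup. The two archives are made up; both store the same class descriptor, just at different positions in the object array.

```python
def class_name(objects, obj):
    """Return the class name of an archived object, or None if it has no $class."""
    class_index = obj.get("$class")
    if class_index is None:
        return None
    # $class is an index into the same object array
    return objects[class_index]["$classname"]


descriptor = {"$classname": "NSDictionary", "$classes": ["NSDictionary", "NSObject"]}

# Hypothetical archives: the descriptor sits at index 2 in one and 3 in the other.
objects_a = ["$null", {"$class": 2, "NS.keys": [], "NS.objects": []}, descriptor]
objects_b = ["$null", "padding", {"$class": 3, "NS.keys": [], "NS.objects": []}, descriptor]

print(class_name(objects_a, objects_a[1]))
print(class_name(objects_b, objects_b[2]))
```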

@wader
Owner

wader commented Dec 11, 2022

👍 Nice! That makes sense and makes things much easier.

Do you think there are more NS* or other classes to support? Is it possible to do more for the fallback case? I also wonder how robust you think this needs to be. We could check that the keys and objects exist, etc. Should it throw an error or do something else?

@wader
Owner

wader commented Dec 11, 2022

btw github markdown supports jq :)

@dgmcdona
Contributor Author

dgmcdona commented Dec 11, 2022

Did some more digging through as many plist files as I could, and found a few more NS class types that are covered here:

def from_ns_keyed_archiver:
  (  . as {"$objects": $objs, "$top": {root: $root_uid}}
  | def _f($id):
      ( .
      | $objs[$id]
      | if type == "string" then .
        elif type == "number" then .
        elif type == "boolean" then .
        elif type == "null" then .
        elif type == "array" then .
        else
          (. as {"$class": $class}
          | if $class == null then . else
            $objs[$class]."$classname" as $cname
            | if $cname == "NSDictionary" or $cname == "NSMutableDictionary" then
                ( . as {"NS.keys": $ns_keys, "NS.objects": $ns_objects}
                | [$ns_keys, $ns_objects]
                | transpose
                | map
                    (
                    ( . as [$k, $o]
                    | {key: _f($k), value: _f($o)}
                    )
                    )
                | from_entries
                )
              elif $cname == "NSArray" 
                or $cname == "NSMutableArray" 
                or $cname == "NSSet" 
                or $cname == "NSMutableSet" then
                ( . as {"NS.objects": $ns_objects}
                | $ns_objects
                | map(_f(.))
                )
              elif $cname == "NSData" or $cname == "NSMutableData" then ."NS.Data"
              elif $cname == "NSUUID" then ."NS.uuidbytes"
              else ."$class"=$cname # replace class ID with classname, while returning the rest of the data as-is
              end
            end
          )
        end
      );
    _f($root_uid)
  );

However, I ran into a problem with an NSKeyedArchiver file, /Library/Preferences/com.apple.networkextension.plist (it contains VPN configurations, from Tailscale in my case). This particular file does not have a root value in the $top object, and none of the items in the $objects array seem to be the root. Very confusing.

@dgmcdona
Contributor Author

I'm thinking it might be a good idea to name it something like from_ns_keyed_archiver_root, although that's getting to be a bit long. But we're not going to be able to reliably decode anything that doesn't have a root value.

@wader
Owner

wader commented Dec 11, 2022

Nice progress. Are you able to share com.apple.networkextension.plist, or is it maybe sensitive?

Will have a deeper look later today.

@wader
Owner

wader commented Dec 11, 2022

Cleaned up the style a bit to match the one used in fq. There were some destructuring bindings that were only used once anyway, so I removed those, and I also added some TODOs for cases to maybe clarify.

def from_ns_keyed_archiver:
  (  . as {
      "$objects": $objects,
      "$top": {root: $root}
    }
  | def _f($id):
      ( $objects[$id]
      | type as $type
      | if $type |
          . == "string"
          or . == "number"
          or . == "boolean"
          or . == "null" then .
        elif $type == "array" then . # TODO: does this happen?
        else
          ( ."$class" as $class
          | if $class == null then . # TODO: what case is this?
            else
              ( $objects[$class]."$classname" as $cname
              | if $cname == "NSDictionary"
                  or $cname == "NSMutableDictionary" then
                  # transform arrays [key_id1, key_id2,...] and [obj_id1, obj_id2,..] into {key: obj, ...}
                  ( [."NS.keys", ."NS.objects"]
                  | transpose
                  | map({key: _f(.[0]), value: _f(.[1])})
                  | from_entries
                  )
                elif $cname == "NSArray"
                  or $cname == "NSMutableArray"
                  or $cname == "NSSet"
                  or $cname == "NSMutableSet" then
                  ( ."NS.objects"
                  | map(_f(.))
                  )
                elif $cname == "NSData" or $cname == "NSMutableData" then ."NS.Data" # TODO: will be a json string?
                elif $cname == "NSUUID" then ."NS.uuidbytes" # TODO: will be a json string?
                else
                  # replace class ID with classname, while returning the rest of the data as-is
                  ."$class" = $cname
                end
              )
            end
          )
        end
      );
    _f($root)
  );

If transformation code like this is hard to follow, I sometimes add a snippet above it showing what the input looks like. Maybe a good idea?

# {
#   "$archiver": "NSKeyedArchiver",
#   "$objects": [
#     "$null",
#     {
#       "$class": 12,
#       "NS.keys": [
#         2,
#         3
#       ],
#       "NS.objects": [
#         4,
#         32
#       ]
#     },
# ...
#     {
#       "$classes": [
#         "NSDictionary",
#         "NSObject"
#       ],
#       "$classname": "NSDictionary"
#     },
# ...
#   ],
#   "$top": {
#     "root": 1
#   },
#   "$version": 100000
# }

Also this might be a good snippet to expand bookmarks:

$ fq -L . 'include "ns_keyed_archiver"; torepr | from_ns_keyed_archiver | (.. | .Bookmark? // empty) |= apple_bookmark' ...

(.. | .Bookmark? // empty) recurses and outputs every value it succeeds in indexing into. That produces nulls where the key is missing, and the // takes care of those: it evaluates its right side if the left side is empty or false-ish (null or false).
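
As a rough Python analogue of that recursive update (a sketch only, not how fq implements it), the helper below applies a function to every truthy-in-the-jq-sense "Bookmark" value at any depth. The name update_key is hypothetical, and str.upper merely stands in for apple_bookmark.

```python
def update_key(value, key, fn):
    """Return a copy of value with fn applied to every `key` entry that is not null/false."""
    if isinstance(value, dict):
        return {
            k: fn(v) if k == key and v is not None and v is not False
            else update_key(v, key, fn)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [update_key(v, key, fn) for v in value]
    return value


# Made-up input: one entry has a bookmark, the other has a null one.
doc = {
    "items": [
        {"Name": "a", "Bookmark": "raw"},
        {"Name": "b", "Bookmark": None},
    ]
}

print(update_key(doc, "Bookmark", str.upper))
```

Null and false values are left untouched, mirroring what the // empty part does in the jq expression.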

Some things to figure out:

  • Name? _root or not, from_<name> or from<name>?
  • Add it to fq, or keep it in a separate repo and link to it for now? That is less convenient but maybe easier to develop if you think there will be more changes. It could also make sense if there will be more functions like this.

@dgmcdona
Contributor Author

plist.zip
Here's the file in question, I sanitized the data

@dgmcdona
Contributor Author

One more thing to deal with in this one: It looks like every dictionary value that is a number is a reference to an object from the original array, if I'm reading things correctly.

@dgmcdona
Contributor Author

I think com.apple.networkextension.plist is an encoding of 3 objects. Possible strategy:

  • detect that there is no root uid in $top, so we are decoding an array of objects.
  • start at object 0, working our way forward until we reach a dictionary with a $class property. This is going to be the first object in the decoded array.
  • decode the object and all of its nested references, keeping track of which object indices have been referenced so that we know they are not an object root.
  • proceed forward until the next unreferenced object with a $class attribute, and follow same steps as before. repeat until end of object array.
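
The strategy above could be sketched in Python roughly as follows. This is an illustration under a loud assumption: every in-range integer value is treated as a reference, which is exactly the guess that may not hold for this file. The object array is made up.

```python
def looks_like_reference(value, count):
    # ASSUMPTION: any in-range integer is a reference (booleans excluded)
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value < count


def find_roots(objects):
    """Return indices of objects with a $class that nothing earlier referenced."""
    referenced = set()
    roots = []

    def walk(index):
        if index in referenced:
            return
        referenced.add(index)
        obj = objects[index]
        if isinstance(obj, dict):
            for value in obj.values():
                children = value if isinstance(value, list) else [value]
                for child in children:
                    if looks_like_reference(child, len(objects)):
                        walk(child)

    for index, obj in enumerate(objects):
        if index in referenced:
            continue
        if isinstance(obj, dict) and "$class" in obj:
            roots.append(index)
            walk(index)  # mark everything reachable as "not a root"
    return roots


# Hypothetical archive holding two top-level objects of the same class.
objects = [
    "$null",
    {"$class": 3, "Name": 2},
    "first",
    {"$classname": "Config", "$classes": ["Config", "NSObject"]},
    {"$class": 3, "Name": 5},
    "second",
]

print(find_roots(objects))
```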

@wader
Owner

wader commented Dec 12, 2022

Thanks, that is a bit strange. I wonder if it could be that the network extension has classes that use their own custom serializers somehow? I found this https://github.com/Chr0nicT/macOS-Headers-10.14.6-Mojave/blob/master/Frameworks/NetworkExtension/1/NEConfiguration.h which seems to indicate, as you say, that the numbers are classes, but sometimes they are also just numbers? It seems hard to have a generic heuristic for that.

Here is a version that treats the UUID key in $top as the root, and that also recurses and stops at cycles:

def from_ns_keyed_archiver:
  (  . as {
      "$objects": $objects,
      # "$top": {root: $root}
      "$top": {"796BFF22-6712-4486-A32C-A1C5DB3273BA": $root}
    }
  | def _f($id; $seen_ids):
      def _r($id):
        if $seen_ids | has("\($id)") then "cycle-\($id)"
        else _f($id; $seen_ids | ."\($id)" = true)
        end;
      ( $objects[$id]
      | . #debug({$id, obj: .})
      | type as $type
      | if $type |
          . == "string"
          or . == "number"
          or . == "boolean"
          or . == "null" then .
        elif $type == "array" then . # TODO: does this happen?
        else
          ( ."$class" as $class
          | if $class == null then . # TODO: what case is this?
            else
              ( $objects[$class]."$classname" as $cname
              | if $cname == "NSDictionary"
                  or $cname == "NSMutableDictionary" then
                  # transform arrays [key_id1, key_id2,...] and [obj_id1, obj_id2,..] into {key: obj, ...}
                  ( [."NS.keys", ."NS.objects"]
                  | transpose
                  | map({key: _r(.[0]), value: _r(.[1])})
                  | from_entries
                  )
                elif $cname == "NSArray"
                  or $cname == "NSMutableArray"
                  or $cname == "NSSet"
                  or $cname == "NSMutableSet" then
                  ( ."NS.objects"
                  | map(_r(.))
                  )
                elif $cname == "NSData" or $cname == "NSMutableData" then ."NS.Data" # TODO: will be a json string?
                elif $cname == "NSUUID" then ."NS.uuidbytes" # TODO: will be a json string?
                elif $cname == "NEConfiguration" then
                  with_entries(
                    .value |= _r(.)
                  )
                else
                  # replace class ID with classname, while returning the rest of the data as-is
                  ."$class" = $cname
                end
              )
            end
          )
        end
      );
    def _f($id): _f($id; {"\($id)": true});
    _f($root)
  );

Then i get this:

{
  "$class": {
    "$classes": [
      "NEConfiguration",
      "NSObject"
    ],
    "$classname": "NEConfiguration"
  },
  "AlwaysOnVPN": "$null",
  "AppPush": "$null",
  "AppVPN": "$null",
  "Application": "io.tailscale.ipn.macsys",
  "ApplicationName": "Tailscale",
  "ContentFilter": "$null",
  "DNSProxy": "$null",
  "DNSSettings": "$null",
  "ExternalIdentifierString": "$null",
  "Grade": "cycle-1",
  "Identifier": "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd,\ufffd\ufffd\ufffd2s\ufffd",
  "Name": "Tailscale Tunnel",
  "PathController": "$null",
  "ProfileInfo": "$null",
  "VPN": {
    "$class": "NEVPN",
    "DisconnectOnDemandEnabled": false,
    "Enabled": true,
    "ExceptionApps": 0,
    "OnDemandEnabled": false,
    "OnDemandRules": 0,
    "OnDemandUserOverrideDisabled": false,
    "Protocol": 6,
    "TunnelType": 1
  }
}

"Grade" is a long long, so that cycle is bogus, I guess.

(The reason $seen_ids uses strings as keys is just that JSON only allows string keys.)
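
A small Python sketch of the same cycle guard shows how the bogus "cycle-1" arises. The object array is made up (the field names are borrowed from the output above), and every integer field is naively treated as a reference.

```python
def decode(objects, root):
    """Resolve references depth-first, replacing revisits on the current path with a marker."""

    def resolve(index, seen):
        if index in seen:
            return f"cycle-{index}"
        seen = seen | {index}  # a new set per path, like passing $seen_ids down
        obj = objects[index]
        if not isinstance(obj, dict):
            return obj
        out = {}
        for key, value in obj.items():
            if key == "$class":
                out[key] = objects[value]["$classname"]
            elif isinstance(value, int) and not isinstance(value, bool):
                # ASSUMPTION: every integer is a reference
                out[key] = resolve(value, seen)
            else:
                out[key] = value
        return out

    return resolve(root, frozenset())


# Hypothetical archive: "Grade" is a plain integer that happens to equal the root index.
objects = [
    "$null",
    {"$class": 3, "Name": 2, "Grade": 1},
    "Tailscale Tunnel",
    {"$classname": "NEConfiguration", "$classes": ["NEConfiguration", "NSObject"]},
]

print(decode(objects, 1))
```

Python sets can hold integers directly, so the string-key workaround needed for a JSON object does not apply here.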

@dgmcdona
Contributor Author

I don't think we're going to be able to create a general enough function for NSKeyedArchiver objects that aren't of the standard $top.root type because of the reference vs. integer problem. In your output above, "Protocol": 6, is pretty clearly a reference, but "TunnelType": 1, if treated as a reference, would point to the top level object which would create infinite recursion, and we don't really have a way to make that decision accurately right now.

@wader
Owner

wader commented Dec 14, 2022

Yep, I think you're right, and you know more about how it will be used in practice. The only other idea I have is an optional lambda argument that would be called in the fallback case, but maybe that's not worth it?
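
To make the fallback idea concrete, here is a hedged Python sketch. The function name, the fallback signature, and the sample data are all hypothetical; only the array-like class names come from the code earlier in this thread.

```python
ARRAY_LIKE = ("NSArray", "NSMutableArray", "NSSet", "NSMutableSet")


def decode_object(objects, index, fallback=None):
    """Decode one object; unknown classes go to the caller-supplied fallback."""
    obj = objects[index]
    if not isinstance(obj, dict) or "$class" not in obj:
        return obj
    name = objects[obj["$class"]]["$classname"]
    if name in ARRAY_LIKE:
        return [decode_object(objects, i, fallback) for i in obj["NS.objects"]]
    if fallback is not None:
        return fallback(name, obj)  # the caller decides how to treat this class
    # default: swap the class index for the class name, keep the rest as-is
    return {**obj, "$class": name}


# Made-up archive with a class the decoder does not know about.
objects = [
    "$null",
    {"$class": 2, "Grade": 1},
    {"$classname": "NEConfiguration", "$classes": ["NEConfiguration", "NSObject"]},
]

print(decode_object(objects, 1))
print(decode_object(objects, 1, fallback=lambda name, obj: f"<{name}>"))
```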

So I guess what's left is to clean it up a bit, decide on a name, and decide whether to include it in fq or not? Have you made any progress on the forensic fq idea?

BTW, are XML plists of interest also? Are they used with NSKeyedArchiver as well? There is the start of an XML plist to JSON function in the fq wiki.

@dgmcdona
Contributor Author

dgmcdona commented Dec 14, 2022

I'm not sure if there are XML NSKeyedArchiver files, but I'll keep an eye out next time I get to digging around.

I think I found a solution to the problem we were facing: we had lost useful type information in the bplist torepr function: uid types are getting converted to integers, and they can help us identify references since that type seems to be used explicitly for that purpose. I made some changes to the bplist implementation:

diff --git a/format/bplist/bplist.jq b/format/bplist/bplist.jq
index 22551d77..0656dddf 100644
--- a/format/bplist/bplist.jq
+++ b/format/bplist/bplist.jq
@@ -7,7 +7,7 @@ def _bplist_torepr:
       elif .type == "data" then .value | tovalue
       elif .type == "ascii_string" then .value | tovalue
       elif .type == "unicode_string" then .value | tovalue
-      elif .type == "uid" then .value | tovalue
+      elif .type == "uid" then .value | tovalue | tostring | ["cfuid-", .] | join("")
       elif .type == "array" then
         ( .entries
         | map(_f)
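
For comparison, Python's standard plistlib (3.8 and later) keeps the same distinction by giving UIDs their own type. The sketch below round-trips a made-up binary plist to show that a UID and a plain integer with the same value stay distinguishable.

```python
import plistlib

# Made-up archive fragment: one UID reference and one ordinary integer, both 1.
archive = {"$top": {"root": plistlib.UID(1)}, "count": 1}

data = plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)
loaded = plistlib.loads(data)

root = loaded["$top"]["root"]
print(isinstance(root, plistlib.UID), root.data)     # the reference keeps its type
print(isinstance(loaded["count"], plistlib.UID))     # the plain integer does not gain one
```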

And changed your function above to account for this (I'm sure it needs some cleanup but it seems to be working):

def from_ns_keyed_archiver:
  (  . as {
      "$objects": $objects,
      # "$top": {root: $root}
      "$top": {"796BFF22-6712-4486-A32C-A1C5DB3273BA": $root}
    }
  | def _try_parse_uid($uidstr):
      if $uidstr | startswith("cfuid-") then
        $uidstr | match("[0-9]+", "l") | .string | tonumber else null end;
    def _f($id; $seen_ids):
      def _r($id):
        if $seen_ids | has("\($id)") then "cycle-\($id)"
        else _f($id; $seen_ids | ."\($id)" = true)
        end;
      ( $objects[_try_parse_uid($id)]
      | . #| debug({$id, obj: .})
      | type as $type |
        if $type == "string" and . == "$null" then null
        elif $type == "string" and _try_parse_uid(.) then _r(.) # pass the uid string on; _f parses it itself
        elif $type |
          . == "number"
          or . == "boolean"
          or . == "null" then .
        elif $type == "array" then . # TODO: does this happen?
        elif $type == "object" then
          ( ."$class" as $class
          | if $class == null then # TODO: what case is this?
              with_entries(
              .value |= _r(.)
              )
            else
              _try_parse_uid($class) as $uid |
              ( $objects[$uid]."$classname" as $cname
              | if $cname == "NSDictionary"
                  or $cname == "NSMutableDictionary" then
                  # transform arrays [key_id1, key_id2,...] and [obj_id1, obj_id2,..] into {key: obj, ...}
                  ( [."NS.keys", ."NS.objects"]
                  | transpose
                  | map({key: _r(.[0]), value: _r(.[1])})
                  | from_entries
                  )
                elif $cname == "NSArray"
                  or $cname == "NSMutableArray"
                  or $cname == "NSSet"
                  or $cname == "NSMutableSet" then
                  ( ."NS.objects"
                  | map(_r(.))
                  )
                elif $cname == "NSData" or $cname == "NSMutableData" then ."NS.Data" # TODO: will be a json string?
                elif $cname == "NSUUID" then ."NS.uuidbytes" # TODO: will be a json string?
                else
                  # replace class ID with classname, while returning the rest of the data as-is
                  ."$class" = $cname |
                  with_entries(
                    if (.value | type) == "string" and _try_parse_uid(.value) then .value |= _r(.) end
                  )
                end
              )
            end
          )
        end
      );
    def _f($id): _f($id; {"\($id)": true});
    _f($root)
  );

Which produces the following output for com.apple.networkextension.plist:

{
  "$class": "NEConfiguration",
  "AlwaysOnVPN": null,
  "AppPush": null,
  "AppVPN": null,
  "Application": "io.tailscale.ipn.macsys",
  "ApplicationName": "Tailscale",
  "ContentFilter": null,
  "DNSProxy": null,
  "DNSSettings": null,
  "ExternalIdentifierString": null,
  "Grade": 1,
  "Identifier": "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd,\ufffd\ufffd\ufffd2s\ufffd",
  "Name": "Tailscale Tunnel",
  "PathController": null,
  "ProfileInfo": null,
  "VPN": {
    "$class": "NEVPN",
    "DisconnectOnDemandEnabled": false,
    "Enabled": true,
    "ExceptionApps": null,
    "OnDemandEnabled": false,
    "OnDemandRules": null,
    "OnDemandUserOverrideDisabled": false,
    "Protocol": {
      "$class": "NETunnelProviderProtocol",
      "AuthenticationMethod": 0,
      "AuthenticationPluginType": null,
      "DNSSettings": null,
      "DesignatedRequirement": "anchor apple generic and identifier \"io.tailscale.ipn.macsys.network-extension\" and (certificate leaf[field.1.2.2222222222.100.6.1.9] /* exists */ or certificate 1[field.1.2.2222222222.100.6.2.6] /* exists */ and certificate leaf[field.1.2.2222222222.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = 2222222222)",
      "DisconnectOnIdle": false,
      "DisconnectOnIdleTimeout": 0,
      "DisconnectOnLogoutKey": false,
      "DisconnectOnSleep": false,
      "DisconnectOnUserSwitch": false,
      "DisconnectOnWake": false,
      "DisconnectOnWakeTimeout": 0,
      "EnforceRoutes": false,
      "ExcludeLocalNetworks": false,
      "Identifier": "\ufffd\ufffd\ufffdL\ufffd\ufffd\u000f\ufffd\u0005\ufffd\ufffd\u001aq",
      "Identity": null,
      "IdentityData": null,
      "IdentityDataHash": null,
      "IdentityDataImported": false,
      "IdentityDataPassword": null,
      "IdentityDataPasswordKeychainItem": null,
      "IncludeAllNetworks": false,
      "NEProviderBundleIdentifier": "io.tailscale.ipn.macsys.network-extension",
      "Password": null,
      "PasswordEncryption": null,
      "PasswordReference": null,
      "PluginType": "io.tailscale.ipn.macsys",
      "ProxySettings": null,
      "ReassertTimeout": 0,
      "ServerAddress": "Tailscale Mesh",
      "Type": 4,
      "Username": null,
      "VendorConfiguration": null,
      "VendorInfo": null
    },
    "TunnelType": 1
  }
}
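
The Identifier values above look mangled, presumably because the raw NSUUID bytes were rendered as text. A minimal Python sketch, assuming the 16 raw bytes are available (the sample bytes below are made up), of formatting them as a readable UUID string instead:

```python
import uuid


def format_uuid_bytes(raw):
    """Format 16 raw bytes as an upper-case UUID string."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes=raw)).upper()


# Made-up bytes, not taken from the plist discussed here.
sample_bytes = bytes(range(16))
print(format_uuid_bytes(sample_bytes))
```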

@dgmcdona
Contributor Author

It would be better to create an object than doing the funky string concatenation and parsing, I’ll fix that up later.

@wader
Owner

wader commented Dec 14, 2022

I'm not sure if there are XML NSKeyedArchiver files, but I'll keep an eye out next time I get to digging around.

👍

I think I found a solution to the problem we were facing: we had lost useful type information in the bplist torepr function: uid types are getting converted to integers, and they can help us identify references since that type seems to be used explicitly for that purpose. I made some changes to the bplist implementation:

Oh good catch! nice. String interpolation can be nice for this ... | tovalue | "cfuid-\(.)" but i agree an object is probably better.
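
To show the trade-off between the two representations, here is a small Python sketch. The "cfuid-" prefix comes from the diff earlier in this thread; the object form with a "cfuid" key is a hypothetical shape, not something fq is known to emit.

```python
import re


def parse_uid_string(value):
    """Return the index from a 'cfuid-N' string, or None for anything else."""
    if isinstance(value, str):
        match = re.fullmatch(r"cfuid-([0-9]+)", value)
        if match:
            return int(match.group(1))
    return None


def is_uid_object(value):
    """Return True for the hypothetical object form {'cfuid': N}."""
    return isinstance(value, dict) and set(value) == {"cfuid"}


# A genuine string that happens to look like a tagged uid is misread as a reference...
print(parse_uid_string("cfuid-7"))
# ...whereas the object form cannot be confused with a plain string.
print(is_uid_object("cfuid-7"), is_uid_object({"cfuid": 7}))
```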

@dgmcdona
Contributor Author

I'd be down to keep this in the fq repo if that's okay, don't really have a lot of other functions in mind off the top of my head. Where can we put it?

@wader
Owner

wader commented Dec 14, 2022

OK, let's put it in fq. Maybe a "macos" package could make sense? We could move bplist and apple_bookmark there, and maybe even the macho decoder. Otherwise a "plist" package, but would apple_bookmark fit? The structure under format/ is not very strict, and it should be no problem to move things around later. Any ideas?
