A guess at the actual structure of BytesExtra #101

Closed
TsXor opened this issue Jun 6, 2024 · 2 comments

TsXor commented Jun 6, 2024

Running protodeep on the data yields a preliminary schema:

syntax = "proto3";


message Schema {

  message field1_type {
    int64 field1 = 1;
    int64 field2 = 2;
  }

  repeated field1_type field1 = 1;

  message field3_type {
    int64 field1 = 1;
    string field2 = 2;
  }

  repeated field3_type field3 = 3;
}
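
For readers wondering how a schema can be recovered without a .proto file, here is a minimal sketch of walking the protobuf wire format by hand. It is not protodeep's implementation; `read_varint`, `walk_fields`, and the sample bytes are hypothetical names and data made up for illustration, and only wire types 0 (varint) and 2 (length-delimited) are handled.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def walk_fields(buf: bytes) -> list[tuple[int, int, object]]:
    """List (field_number, wire_type, raw_value) for each top-level field."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


# Hand-built sample (not real data): field 1 = {1: 5, 2: 1}, field 3 = {1: 1, 2: "abc"}
sample = bytes.fromhex("0a04" "08051001" "1a07" "0801" "1203616263")
for number, wire_type, raw in walk_fields(sample):
    print(number, wire_type, walk_fields(raw))
```

Fields 1 and 3 both arrive as length-delimited blobs that themselves parse as two-field messages, which is the shape the schema above describes.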

Making some guesses based on the actual data gives the following schema:

syntax = "proto3";

message BytesExtra {
  message FlagAttr { int64 enum_code = 1; int64 value = 2; }
  message StringAttr { int64 enum_code = 1; string value = 2; }
  repeated FlagAttr flags = 1;
  repeated StringAttr strings = 3;
}

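BytesExtra is probably made up of “attributes”, each consisting of an enum code and its actual value. For example, enum_code 1 means the attribute is the WeChat ID.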
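
A minimal sketch of how such enum codes could be given readable names once they are identified. Only code 1 (WeChat ID) is stated in this issue; `KNOWN_CODES`, `label_attrs`, and the label "wxid" are hypothetical names chosen for illustration, and every other code is left unnamed rather than guessed.

```python
from typing import Any

# Only code 1 (WeChat ID) is stated in this issue; extend the table as
# further codes are identified. The label "wxid" is a hypothetical name.
KNOWN_CODES: dict[int, str] = {1: "wxid"}


def label_attrs(attrs: dict[int, Any]) -> dict[str, Any]:
    """Replace known enum codes with names; keep unknown ones as 'code_<n>'."""
    return {KNOWN_CODES.get(code, f"code_{code}"): value for code, value in attrs.items()}


# Made-up input: one known code and one unknown code.
print(label_attrs({1: "wxid_example", 5: 1}))
```

Keeping unknown codes under a `code_<n>` key means no attribute is silently dropped while the mapping is still incomplete.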

TsXor commented Jun 6, 2024

POC:

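from typing import Any

# Module generated by protoc from the guessed BytesExtra schema.
from bytes_extra_pb2 import BytesExtra


def parse_extra(extra: bytes) -> dict[int, Any]:
    """Flatten a BytesExtra blob into {enum_code: value}."""
    deser = BytesExtra()
    deser.ParseFromString(extra)
    attrs: dict[int, Any] = {}  # keys are integer enum codes
    for flag in deser.flags:  # field 1: integer-valued attributes
        attrs[flag.enum_code] = flag.value
    for string in deser.strings:  # field 3: string-valued attributes
        attrs[string.enum_code] = string.value
    return attrs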

Parsed result:

{
    1: 'wxid_wf3i4qvi1fs422',
    2: 'ed888b05f5e43d187e82f7b3e3d43955',
    3: 'wxid_uapoh6oefocv22\\FileStorage\\MsgAttach\\bc44fe438e271901567ead7e4b94eb23\\Thumb\\2024-06\\1a08016ec909b67ed62ece884a511cb6_t.dat',
    4: 'wxid_uapoh6oefocv22\\FileStorage\\MsgAttach\\bc44fe438e271901567ead7e4b94eb23\\Image\\2024-06\\ae982b5a092b000771e86f9170eef736.dat',
    5: 1,
    6: 0,
    7: '<msgsource>\n    <alnode>\n        <fr>2</fr>\n    </alnode>\n    <sec_msg_node>\n        <uuid>cfbbfac92959371dcc1a01dcc894a638_</uuid>\n        <risk-file-flag />\n        <risk-file-md5-list />\n        <alnode>\n            <fr>1</fr>\n        </alnode>\n    </sec_msg_node>\n    <imgmsg_pd cdnmidimgurl_size="372253" cdnmidimgurl_pd_pri="30" cdnmidimgurl_pd="0" />\n    <silence>1</silence>\n    <membercount>437</membercount>\n    <signature>V1_SCc9hsGK|v1_Vaz8G5n5</signature>\n    <tmp_node>\n        <publisher-id />\n    </tmp_node>\n</msgsource>\n',
    16: 1,
}
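
The value under code 7 is an XML document, so it can be parsed further. A minimal sketch using the standard library; `msgsource_text` is a hypothetical helper, and the sample string is a shortened version of the value shown above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def msgsource_text(xml_text: str, tag: str) -> Optional[str]:
    """Return the text of the first direct child <tag> of <msgsource>, if any."""
    root = ET.fromstring(xml_text)
    node = root.find(tag)  # find() only matches direct children of the root
    return node.text if node is not None else None


# Shortened version of the attribute-7 value from the parsed result.
sample = "<msgsource>\n    <silence>1</silence>\n    <membercount>437</membercount>\n</msgsource>\n"
print(msgsource_text(sample, "membercount"))
```

Returning `None` for a missing tag keeps the helper usable on messages whose `<msgsource>` lacks that element.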

TsXor changed the title from "A guess at the actual structure of ExtraBytes" to "A guess at the actual structure of BytesExtra" on Jun 7, 2024

xaoyaoo commented Jun 7, 2024

Thank you very much for the more detailed data. If it is convenient, could you submit a PR?
pywxdump/dbpreprocess/parsingMSG.py

def get_BytesExtra(self, BytesExtra):
    if BytesExtra is None or not isinstance(BytesExtra, bytes):
        return None
    try:
        deserialize_data, message_type = blackboxprotobuf.decode_message(BytesExtra)
        return deserialize_data
    except Exception as e:
        return None

so that its return value is more detailed.
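
One possible direction, sketched under assumptions: post-process the dictionary that the schemaless decoder returns into the `{enum_code: value}` form. This assumes the decoded dictionary is keyed by field numbers as strings ("1", "3"), that a field seen once may come back as a single dict instead of a list, and that string values may arrive as bytes; none of this is confirmed in the thread. `as_list` and `flatten_decoded` are hypothetical helpers, not part of pywxdump or blackboxprotobuf.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Wrap a single decoded entry in a list so repeated fields iterate uniformly."""
    return value if isinstance(value, list) else [value]


def flatten_decoded(decoded: dict) -> dict[int, Any]:
    """Turn {'1': [{'1': code, '2': value}, ...], '3': [...]} into {code: value}."""
    attrs: dict[int, Any] = {}
    for field in ("1", "3"):  # field 1: integer attributes, field 3: string attributes
        for entry in as_list(decoded.get(field, [])):
            if not isinstance(entry, dict) or "1" not in entry:
                continue  # shape differs from the guess; skip rather than fail
            value = entry.get("2")
            if isinstance(value, bytes):
                value = value.decode("utf-8", errors="replace")
            attrs[int(entry["1"])] = value
    return attrs


# Made-up decoder output in the assumed shape.
decoded = {"1": [{"1": 5, "2": 1}, {"1": 6, "2": 0}], "3": {"1": 1, "2": b"wxid_example"}}
print(flatten_decoded(decoded))
```

A schemaless decoder may also interpret a string value as a nested message when its bytes happen to parse as one, so decoding with the compiled schema is likely the more predictable route.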

xaoyaoo closed this as completed on Aug 4, 2024