-
Notifications
You must be signed in to change notification settings - Fork 773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Communication protocol redesign #16
Comments
I'd recommend adding a protocol version identifier early on in whatever protocol you decide to go for, so you can make changes in the future without too much breakage. Currently since Mercury is one-way, you only need to worry about a new client attempting to connect to an old server, which I'd guess will just error out with an exception. Since all existing Mercury calls should start with "<", we ought to be able to build a reasonable detection mechanism for a new server responding to an old client. In the future both sides should indicate what version of the Mercury protocol they support. I'd recommend checking out http://code.google.com/p/protobuf/ as a serialization library, it seems to have a fair number of useful bindings (java, python, c++), so could be good for this. |
I also like the idea of using protobuf. I'ts very simple to define data models and it has support for Java and Python. Fits pretty well in mercury. |
Agreed, this looks like the way forward. How about the following definitions on messages in both directions.
The pro about having the message type field is that a completely new protobuf definition can be made for other incorporated features. At the moment it would probably be enum values for messages like:
The ProtoBuf Length is an easy way on the receiving side so that we don't need to look for silly delimiters in the TCP stream. A protobuf definition can be made for each message type and parsed as such by checking the message type field. Thoughts and suggestions? |
This would be a break from the existing API, as it wouldn't interoperate with the old XML formatted code. We'd also need to be careful not to accept a mercury version of '<'. I haven't looked into protocol buffers closely enough to know whether they could encapsulate the buffer length themselves (or even the message type and mercury version). Are there any existing examples of how other people construct protocols using them? We should probably do a bit more reading before we dive right in... |
I don't see the need for it to inter-operate with existing XML code. The project is still in early phases and we need proper structures to take the project forward. A complete redesign of the comms protocol that breaks backward compatibility with the existing protocol is fine in my opinion. Protobuf has no way of knowing its own size and so a size header is recommended. I found these 2 threads helpful: |
For reference, my proposed layout follows the Type-length-value structure found in many protocols. |
I appreciate using protocol buffers too. I wrote a .proto file as an example (https://github.com/gjunquera/mercury/blob/pb2_communication/Message.proto). I also modify "connect" and "provider->info" implementations to use protocol buffers as a PoC (https://github.com/gjunquera/mercury/blob/pb2_communication) Please give your thoughts. |
Hi, Thank you very much for our first code contribution on this matter. I don't want to be harsh on it but I do not believe that it is the best way to do this. By implementing specific responses in the protocol itself, we are losing flexibility all the way. Instead why not have something more generalised like this as the Response:
This way, it is up to the method that received the Request to fill in the structured_data fields with appropriate data, like various columns and fields etc. If the method that received the Request does not need to send back specific ordered information it can merely fill in the data field and leave structured_data blank. Let me know if anyone sees any obvious faults in that idea? EDIT: It also might be worth using the bytes proto type instead of string because this protocol should also be able to transfer binary files. EDIT 2: Rethought the structure a bit |
I'll need a bit of time to process this, please give me until after the weekend to have a think on it... |
No problem to change my solution, it was just an example to restart the discussion. I was not sure if it was a good. On KVPair, why did you use key as bytes? Key is always a string, isn't it? What about the Request? Follows my suggestion: message Request { optional string section = 1; optional string function = 2; repeated KVPair args = 3; } message KVPair { required string key = 1; required bytes value = 2; } What do you think? Another point, Protocol Buffers would add some dependencies to mercury, is it a problem? |
Agreed that the key will be a string. There might be a case for having a data field in the Request as well for unstructured data requests. We would like to keep it as open for extension as possible. I think that these are good first steps but there are still many things to consider. The dependencies will have to be looked at as well, if it is cross-platform and reliably simple to install then it shouldn't be a problem. |
About the response: message Response { optional bytes data = 1; optional bytes error = 2; repeated KVPair structured_data = 3; } I think the structure above is not a good solution for all kind of Responses. For example: Package Name: ****** Authority: ******* Permissions: ******* (...) Package Name: ****** Authority: ******* Permissions: ******* (...) Package Name: ****** Authority: ******* Permissions: ******* (...) How would we map this kind of response in a KVPair list? For some cases we could create a specific .proto. ProviderResponse{ Info { optional string package_name = 1; optional string authority = 2; optional string permissions = 3; (...) } repeated Info info = 1; } We could wrapper this structure in the data field of an usual Response, then all commands always would return the same structure. I like the idea of creating specific responses because Protocol Buffers automatically generates the parser which makes data manipulation really easy. What do you think? |
As an example (probably not nearly the best way), that response could be structured as follows: Key | Value row1:packagename | ******* row1:authority | ******* row1:permissions | ******* row2:packagename | ******* row2:authority | ******* row2:permissions | ******* |
Well.... I don't like too much the idea of using separators in the Key or in the Value. Today I can't say one case where a Response would require adding a lot of separators, but maybe in the future Mercury could have more complex Responses and making specific .proto could make a lot of sense. I agree with the Response structure suggested, but I think we could create new .protos for complex Responses and wrapper them in the data field =). |
I completely agree with you. New .protos could even be wrapped inside a value in the KVPair with a nice key name. The point is that we need a very generalised Response structure in order to be able to do these sorts of things and not limit ourselves at all by the structures on the outer layers :) |
OK! What about encryption? Did you think something about it? I am not a cryptography expert, but follows my first thoughts about it: Well.... just an idea to start the discussion about encryption. As I said I don't know too much about encryption, maybe there are much better ways to do it. |
I would choose user configurable client and server certificates as the way to go. This is probably not functionality that is required right now though, so I think we will let the idea rest for a while :) |
Right, so having built up a tech-preview of a system built on protocol buffers, it turns out that whilst they produce extremely small data transmissions for the network, they're pretty slow to construct (and add a significant overhead). I'm currently investigating thrift to see if that gets the balance of bandwidth to computation time a little better. I'll report back with how I get on... |
I really did not write a proof of concept to measure the performance of Protocol Buffers over XML parsers. But Google says that "[Protocol Buffers] are 3 to 10 times smaller, and 20 to 100 times faster". (source: https://developers.google.com/protocol-buffers/docs/overview#whynotxml ) |
I implemented Mercury working with protobuffers on this branch: https://github.com/gjunquera/mercury/tree/temp If you have some time, run this implementation and give your thoughts about the performance. |
So I was trying out protocol buffers with the reflection code (which has a very high communications requirement, hence the performance needs). @gjunquera, it doesn't look like you've converted the reflection code over, and the performance of the remaining elements isn't going to be anywhere near as sensitive as that is, so testing it out won't help a great deal. It turns out, after some tweaking, that the performance with protocol buffers is marginally better than with the XML parsing code we had in there, but not by much (maybe 6% faster). Still, it saves me having to re-implement it in thrift, so I'm going to keep pushing ahead on that front. |
I forgot to say that I didn't change reflection code, sorry about it. If it is defined that protobuf is the way to go, when are you planning to start the implementation? Our team (samsung sidi) can help with the implementation. |
We are looking at redesigning the communications protocol to take this project to the next level.
The communication protocol is rather simple at the moment and is using a defined XML communication in 1 direction only, from client to server. In order to reach a point where Mercury can be used for many purposes, the following would need to be supported by the communications protocol:
By implementing these, it would be the most flexible. Examples of when a bind connection is needed: current "assessment suite". Examples of when a reverse connection is needed: full exploitation suite
Any ideas are welcome
Tyrone
The text was updated successfully, but these errors were encountered: