Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protocol tests changes #76

Closed

Conversation

mickvandijke
Copy link
Member

No description provided.

Copy link
Member

@josecelano josecelano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful improvements @WarmBeer.

Could you rebase it? And take a look at my comments about ConnectionIdData. It's the only change I'm not sure.

pub client_id: ClientId,
pub expiration_timestamp: Timestamp32
}
pub struct ConnectionIdData(pub [u8; 8]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @WarmBeer, The whole point of this struct was to deliberately create a higher abstraction for what we store inside the connection id. I did not want to deal with low-level byte operations, only for the constructor and the converter.

I wanted to make it more explicit what a ConnectionId contains.

With your current changes is more like a parser.

Maybe your approach is more clear because right now, it's a parser/converter but then we should rename it to ConnectionIdDataPacker or something like that.

Anyway, I would make sure we will not have any extra behaviour for this struct. For example, we could need to check if two ConnectionIdData have the same ClientId or we could move some logic to check if the ConnectionId has expired. If we have such changes in the future, I would revert the change because I prefer to work with structs instead of byte arrays. I prefer to keep the byte arrays in the lower layers. I'm afraid the app could end up having a lot of cryptic byte array operations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @WarmBeer, The whole point of this struct was to deliberately create a higher abstraction for what we store inside the connection id. I did not want to deal with low-level byte operations, only for the constructor and the converter.

I wanted to make it more explicit what a ConnectionId contains.

Hey @josecelano , the reason why I wanted to store the whole ConnectionIdData as one bytes array instead of two is because the following function returns a new owned [u8; 8] of two already existing [u8; 4]:

impl ConnectionIdData {
    ...
    pub fn to_bytes(&self) -> [u8; 8] {
        let connection_id: Vec<u8> = [
            self.client_id.to_bytes().as_slice(),
            self.expiration_timestamp.to_le_bytes().as_slice(),
        ].concat();

        let connection_as_array: [u8; 8] = connection_id.try_into().unwrap();

        connection_as_array
    }
    ...
}

My thought process was that ideally we'd want to return &[u8; 8] from two &[u8; 4], but unfortunately Rust does not support this yet. We can do the reverse however and that is why I chose to store the whole ConnectionIdData as one bytes array instead and then pass ClientId and Timestamp32 by reference: &[u8] as a subset of [u8; 8].

Passing by reference is usually the preferred way in Rust because it saves memory in most cases. However, in this case I'm not even sure if it will save memory since the data types we are passing are so small.

I'm fine with whatever you decide to keep or change. If you want to keep the existing ConnectionIdData format, we could change the above function so that it returns a moved value instead of a copied value:

Not tested.

impl ConnectionIdData {
    ...
    pub fn into_bytes(self) -> [u8; 8] {
        let connection_id: Vec<u8> = [
            self.client_id.to_bytes().as_slice(),
            self.expiration_timestamp.to_le_bytes().as_slice(),
        ].concat();

        let connection_as_array: [u8; 8] = connection_id.try_into().unwrap();

        connection_as_array
    }
    ...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarmBeer interesting ...

My thought process is: sometimes you need to convert a Model object into another type. And sometimes, just to keep it simple, you decide to put the converter inside the Model like this:

class Model {
  attribute1;
  attribute2;
 
  function constructor(attribtue1, attribute2) {
    ...
  }

  function toJson() -> string;
}

The toJson() method creates a new object, but you do not want to change the original Model object.

You could separate responsibilities like this:

class Model {
  attribute1;
  attribute2;
 
  function constructor(attribtue1, attribute2) {
    ...
  }

  function toJson() -> string;
}

class ModelToJsonConverter() {
  function toJson(model: Model) -> String {
    return "{
      attribute1: model.attribute1,
      attribute2: model.attribute2,
    }";
  }

In our case, we only need to use the fromJson and toJson.

If we use your approach, into_bytes for me would be a little bit weird because it seems you are destroying the Model object and converting it into a different thing.

In that case, I would call it ConnectionIdDataBuilder or something like that.

I have not found a place in the code where the ConnectionIdData is used after calling to_bytes. I suppose I found it weird because, in other languages, you would not be sure if that is going to happen elsewhere, so you do not do it (in fact, you can't). But in Rust, if that were the case, the compiler would tell you.

For example, in this function:

fn new_connection_id(&self, remote_address: &SocketAddr, current_timestamp: Timestamp64) -> ConnectionId {

    let connection_id_data = self.generate_connection_id_data(&remote_address, current_timestamp);

    let encrypted_connection_id_data = self.encrypt_connection_id_data(&connection_id_data);

    self.pack_connection_id(encrypted_connection_id_data)
}

I would not be able to call connection_id_data.client_id after encrypt_connection_id_data because internally it converts the connection_id_data into the byte array.

Besides, I suppose Rust would drop the variables as soon as possible. If I create a new array and the ConenctionIdData is not used anymore, the ConenctionIdData would be dropped.

In that example, I would not expect connection_id_data to be a byte array. It would be surprising if, after calling encrypt_connection_id_data the connection_id_data has been mutated. People would think: why do you need to change the connection_id_data in order to build a new encrypted version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, in this function:

fn new_connection_id(&self, remote_address: &SocketAddr, current_timestamp: Timestamp64) -> ConnectionId {

    let connection_id_data = self.generate_connection_id_data(&remote_address, current_timestamp);

    let encrypted_connection_id_data = self.encrypt_connection_id_data(&connection_id_data);

    self.pack_connection_id(encrypted_connection_id_data)
}

I would not be able to call connection_id_data.client_id after encrypt_connection_id_data because internally it converts the connection_id_data into the byte array.

I know it might seem a little strange if you are used to OOP languages, but in Rust this is preferred. We don't want to waste memory by keeping two identical pieces of data in memory if they will always be identical for as long as they both live. We should either pass the original data by reference or move ownership of the data.

Exceptions for this are usually very small data types such as u8, u16, u32, i8, i6, i32 etc.. Passing these values as copied takes about as much memory as passing them as reference. This is why you never see &u32 etc..

In that example, I would not expect connection_id_data to be a byte array. It would be surprising if, after calling encrypt_connection_id_data the connection_id_data has been mutated. People would think: why do you need to change the connection_id_data in order to build a new encrypted version?

If you still need connection_id_data after encrypting, then it does not make sense to move its data to another place. Instead we should likely pass a read only/shared reference of the data to the encrypting function. Eg: &ConnectionIdData. But if we don't need to retain the data in connection_id_data anymore after encrypting it. It makes sense to just move ownership of connection_id_data to the encryption function instead of passing it a temporary copy of connection_id.

Bare in mind that a function that takes self instead of &self can only be called from an owned reference. You can not call these functions from a shared reference, so you always know that the original data can not be suddenly manipulated or moved from another function.

struct TestStruct(pub [u8; 12]);

    impl TestStruct {
        pub fn into_bytes(self) -> [u8; 12] {
            self.0
        }
    }

    // pass by shared reference
    fn move_test_struct_impossible(test_struct: &TestStruct) {
        let x = test_struct.into_bytes(); // this IS NOT possible
    }

    // pass by owned reference
    fn move_test_struct_possible(test_struct: TestStruct) {
        let x = test_struct.into_bytes(); // this IS possible
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarmBeer thank you for helping me to understand Rust better.

I see how powerful the concept is in terms of saving memory. You can reuse the same allocated memory and pass the ownership safely because you always know the owner.

But the thing I do not like it's that it seems Rust "forces" you to change the internal representation of your objects just to save memory. Since you know you want to convert the ConnectionIdData into a byte array, store it as a byte array to avoid having both objects in memory for a while.

My reasoning is:

  • If you know that the only purpose of the first object it's to create the second one (the byte array). That's for me, a king builder or factory.
  • But if your initial object makes sense by itself because it has other rules and it just happens that you want to convert it to a byte array, then you should pay the cost of more memory consumption and let Rust drop the value when it's not used anymore.

For example, in this new function:

pub fn client_id(&self) -> &[u8] {
    &self.0[..4]
}

You return a byte array instead of a ClientId, and that's used in the verification:

let current_client_id = Make::<DefaultHasher>::new(remote_address);
if connection_id_data.client_id != current_client_id {
    return false;
}

I suppose that only works because you know that internally the ClientId is a [u8; 4].
But the ClientId implementation could change. In fact, you changed the ClientId:

pub struct ClientId {
    pub value: [u8; 4],
}

Making the value public.

You can fix it by constructing the initial value again:

pub fn client_id(&self) -> &[u8] {
    &self.0[..4]
}

But it seems weird to me because you are storing values in a low-level format and converting back and forth when you need them inside your struct.

Sorry for being picky. This helps me to understand Rust and also rusty style better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @josecelano , I only decided to store the client_id and timestamp in low-level format because they both only contain a small byte array. If the client_id had another field such as:

pub struct ClientId {
    inner: [u8; 4],
    name: String
}

It would be easier to just store the entire client_id as struct in connection_id_data. When the client_id is needed, we just return a reference from the connection_id_data:

pub struct ConnectionIdData {
    pub client_id: ClientId,
    pub timestamp: u32
}

fn main() {
    ...
    let client_id: &ClientId = &connection_id_data.client_id;
    let client_id_as_bytes: &[u8] = client_id.as_bytes();
}

This way we won't have to copy any data where we don't need ownership of.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the thing I do not like it's that it seems Rust "forces" you to change the internal representation of your objects just to save memory. Since you know you want to convert the ConnectionIdData into a byte array, store it as a byte array to avoid having both objects in memory for a while.

We're not changing the internal representation of the object. We are forcibly dropping the object (instead of waiting for Rust to drop it on the end of scope) and recycling some if its data for a new object :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @WarmBeer, the discussion is becoming even more interesting.

  1. For me, just because an object contains a small amount of data does not mean you cannot use an abstraction. For example and email is only and String, but if you have some rules, you can create an Email abstraction. In this case, you are using the abstraction in the constructor to make sure we validate the ClientId and Timestamp32. But then you do not need them anymore. The getters for them return the lower-level values. The problem with this approach is consumers of this struct have to deal with those low-level representations of ClientId and Timesampt32:
pub fn client_id(&self) -> &[u8] {
    &self.0[..4]
}

pub fn timestamp(&self) -> u32 {
    u32::from_le_bytes([self.0[4], self.0[5], self.0[6], self.0[7]])
}

They do not even know those values are a ClientId and Timestamp32.

How do you know that two ClientIds are equal if their low-level representation are equal.
That's something known only by those structs.

2.- You are prioritizing the use of the bytes array. You are prioritizing memory consumption over CPU. If you have many calls to client_id and timestamp methods, you would constantly be somehow re-constructing those objects. It seems the object has two responsibilities: storing information you want to retrieve later and transforming that information into a low-level dto format. It's OK if we want to merge those responsibilities, but I would change the getters to generate the original values in the constructor.

3.- Owernship

This way we won't have to copy any data where we don't need ownership of.

This made me think a lot about ownership. It seems you do not want to repeat twice different representations of the same data in memory to consume less memory. It's like a DRY principle but for memory.

I think we build classes (structs) that have data and behaviour. Just because two classes use the same data, that does not mean we should build only one class. For example, a domain Model and its Dto representation. They could have the same data:

class User {
  name: UserName
  email: Email
}

class UserName {
  // Some rules here
}

class Email {
  // Some rules here, validate email ...
}

class UserDto {
   name: String,
   email: String
}

Both User and UserDto use two strings, but in the model, you want to check things.
I suppose you are doing something like:

class UserDto {
  name: string
  email: string
  function from(name: UserName, email: Email);
  function name() -> String;
  function email() -> String;
}

For me, ownership should be used to ensure only one owner can mutate the state of the struct at a given time. The process would be as follows: first, you design your entities, then control the state with the ownership.

In this case, I have the feeling we are doing the opposite. Since we know User and UserDto will not mutate state independently, we join them together to save memory.

I feel that this strategy, in the long term, can lead to a poor design or at least a design close to the machine with a lot of low level data structures.

My question is would you do the same if you had infinite memory? I mean, just because we are using Rust or code priority is to costume less memory rather than maybe make it more readable. I'm totally fine with that approach. I suppose for this kind of service; performance is key. But I think, in general, this kind of software is also very cryptic.

In our example:

fn new_connection_id(&self, remote_address: &SocketAddr, current_timestamp: Timestamp64) -> ConnectionId {

    let connection_id_data = self.generate_connection_id_data(&remote_address, current_timestamp);

    let encrypted_connection_id_data = self.encrypt_connection_id_data(&connection_id_data);

    self.pack_connection_id(encrypted_connection_id_data)
}

We know that after calling encrypt_connection_id_data we do not need the connection_id_data. It's going to be dropped anyway soon, but if you want to save memory we could:

fn new_connection_id(&self, remote_address: &SocketAddr, current_timestamp: Timestamp64) -> ConnectionId {

    let connection_id_data = self.generate_connection_id_data(&remote_address, current_timestamp);

    let decrypted_raw_data = connection_id_data.to_bytes();

    drop connection_id_data; // or extract method

    let encrypted_connection_id_data = self.encrypt_connection_id_data(&decrypted_raw_data);

    self.pack_connection_id(encrypted_connection_id_data)
}

I think with this solution; we explicitly save memory by dropping variables we are sure we will not use anymore. But I guess that should not improve memory consumption too much because it will be dropped anyway.

This leads me to another thought: small methods reduce memory consumption because you do not have to keep variables around too much. But I'm sure the Rust compiler does that drop in the middle of the method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, just because an object contains a small amount of data does not mean you cannot use an abstraction. For example and email is only and String, but if you have some rules, you can create an Email abstraction. In this case, you are using the abstraction in the constructor to make sure we validate the ClientId and Timestamp32. But then you do not need them anymore. The getters for them return the lower-level values. The problem with this approach is consumers of this struct have to deal with those low-level representations of ClientId and Timesampt32:

They do not even know those values are a ClientId and Timestamp32.

2.- You are prioritizing the use of the bytes array. You are prioritizing memory consumption over CPU. If you have many calls to client_id and timestamp methods, you would constantly be somehow re-constructing those objects. It seems the object has two responsibilities: storing information you want to retrieve later and transforming that information into a low-level dto format. It's OK if we want to merge those responsibilities, but I would change the getters to generate the original values in the constructor.

For me storing the client_id and timestamp in one byte array was just a neat trick to avoid the function that returns the whole_connection_id_data as a new byte array ([u8; 8]). I only considered doing this because both ClientId and Timestamp32 are only a [u8; 4] and will probably only ever be just that. If ClientId and/or Timestamp32 are mostly needed to be borrowed in struct form (so that you can use specialized impl for these structs) then it would make more sense to store them in struct form instead of low-level.

3.- Owernship

This way we won't have to copy any data where we don't need ownership of.

This made me think a lot about ownership. It seems you do not want to repeat twice different representations of the same data in memory to consume less memory. It's like a DRY principle but for memory.

I think we build classes (structs) that have data and behaviour. Just because two classes use the same data, that does not mean we should build only one class. For example, a domain Model and its Dto representation. They could have the same data:

Both User and UserDto use two strings, but in the model, you want to check things.
I suppose you are doing something like:

For me, ownership should be used to ensure only one owner can mutate the state of the struct at a given time. The process would be as follows: first, you design your entities, then control the state with the ownership.

In this case, I have the feeling we are doing the opposite. Since we know User and UserDto will not mutate state independently, we join them together to save memory.

I feel that this strategy, in the long term, can lead to a poor design or at least a design close to the machine with a lot of low level data structures.

From your example I would do the following (Rust playground: link):

pub struct User {
    pub name: UserName,
    pub email: Email
}

impl User {
    pub fn new(name: String, email: String) -> Self {
        Self {
            name: UserName(name),
            email: Email(email)
        }
    }

    pub fn as_dto(&self) -> UserDto {
        UserDto {
            name: &self.name.0,
            email: &self.email.0
        }
    }
}

pub struct UserName(pub String);

impl UserName {
    pub fn validate() -> bool { true }
}

pub struct Email(pub String);

impl Email {
    pub fn validate() -> bool { true }
}

#[derive(Debug)]
pub struct UserDto<'a> {
    pub name: &'a str,
    pub email: &'a str
}

fn main() {
    let user = User::new("lucky".to_string(), "lucky@labrador.ca".to_string());
    
    println!("{:?}", user.as_dto());
}

This way, the UserDto struct only hold references to the original Strings instead of having owned copies of the strings. The implication of this is that User::as_dto() can only be called as long as the User is still in scope. You can test this in the Rust playground by adding std::mem::drop(user); after let user_dto = user.as_dto();.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is would you do the same if you had infinite memory? I mean, just because we are using Rust or code priority is to costume less memory rather than maybe make it more readable. I'm totally fine with that approach. I suppose for this kind of service; performance is key. But I think, in general, this kind of software is also very cryptic.

I'd think that passing a shared reference instead of allocating new memory for a copy would not only save memory, but would also be quicker. I think this is highly dependent on how big the data type is that you are trying to pass/copy though.

pub struct Timestamp32 {
pub value: u32
}
pub struct Timestamp32(pub u32);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @WarmBeer, I like this change.

src/udp/connection/timestamp_64.rs Show resolved Hide resolved
josecelano added a commit that referenced this pull request Sep 8, 2022
Proposed by @WarmBeer in this pull request:

#76

Co-authored-by: Warm Beer <mickvd99@gmail.com>
@da2ce7
Copy link
Contributor

da2ce7 commented Nov 30, 2022

Out of Date.

@da2ce7 da2ce7 closed this Nov 30, 2022
@mickvandijke mickvandijke deleted the protocol-tests branch October 25, 2023 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

3 participants