Conversation
yehonatanz left a comment
Are we really sure we want to generate random packets in Python for the test?
Wouldn't it be preferable to have a benchmark pcap file in our repo and replay it through marine one or more times?
```python
from random import randint, getrandbits
from typing import List

from tests.marine.benchmark.conversation_generators import (
```
Relative import, please:

```python
from .conversation_generators import ...
```

```python
macs = generate_macs(ip_count * 2)
# The list is already randomly generated, so taking consecutive values is random enough
return [
    IpPair(macs[i], macs[i + 1], ips[i], ips[i + 1])
```
Not sure whether I'd rather have that or:

```python
list(map(IpPair, macs[::2], macs[1::2], ips[::2], ips[1::2]))
```

What do you think?
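For what it's worth, the two approaches produce identical pairs when the comprehension steps through the lists two at a time. A quick sketch to convince ourselves (the `IpPair` field names here are assumed from the diff context, and the step-by-two range is an assumption about the elided loop):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IpPair:
    # Field names assumed from the diff context
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def pair_consecutive(macs: List[str], ips: List[str]) -> List[IpPair]:
    # Current approach: consecutive values from the shuffled lists, two at a time
    return [
        IpPair(macs[i], macs[i + 1], ips[i], ips[i + 1])
        for i in range(0, len(macs) - 1, 2)
    ]


def pair_strided(macs: List[str], ips: List[str]) -> List[IpPair]:
    # Suggested approach: map over the even/odd strides
    return list(map(IpPair, macs[::2], macs[1::2], ips[::2], ips[1::2]))
```

Both pick elements 0+1, 2+3, and so on, so the choice is purely stylistic.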
tests/marine/benchmark/main.py
Outdated
```python
benchmark_start_used_memory = get_used_memory_in_mb()

if not args.benchmark or args.benchmark == "all":
    # TODO: I can take these from benchmark_functions, but I want them executed in this order
```
dict has preserved insertion order since Python 3.7 (guaranteed by the language spec), so you can take them from benchmark_functions
Ordered by insertion order? If so, I'll take a look
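Yes, insertion order: since Python 3.7 iterating a dict visits keys in exactly the order they were added. A minimal illustration (the registry below is a placeholder mirroring `benchmark_functions` from the diff; the lambdas stand in for the real benchmark implementations):

```python
# Placeholder registry; key order matches the desired execution order.
benchmark_functions = {
    "benchmark_3_fields": lambda packets: None,
    "benchmark_8_fields": lambda packets: None,
    "benchmark_bpf_and_display_filter_and_3_fields": lambda packets: None,
}

# Iteration follows insertion order, so "run all" can just loop
# instead of hard-coding the call sequence.
execution_order = list(benchmark_functions)
```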
```python
    benchmark_8_fields(generated_packets)
    benchmark_bpf_and_display_filter_and_3_fields(generated_packets)
    benchmark_bpf_and_display_filter_and_8_fields(generated_packets)
elif args.benchmark in benchmark_functions:
```
It would be cool if this option accepted a glob pattern and not just an exact name (but that's obviously overkill)
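If it ever stops being overkill, the standard library already covers it. A sketch with `fnmatch` (`select_benchmarks` is a hypothetical helper, not in the diff):

```python
import fnmatch
from typing import Iterable, List


def select_benchmarks(pattern: str, benchmark_names: Iterable[str]) -> List[str]:
    # fnmatchcase gives shell-style globbing: * matches any run of
    # characters, ? matches a single character, case-sensitively.
    return [name for name in benchmark_names if fnmatch.fnmatchcase(name, pattern)]
```

e.g. `select_benchmarks("*bpf*", benchmark_functions)` would pick every bpf variant in one go.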
tests/marine/benchmark/main.py
Outdated
```python
print(
    f"""
Executed {f.__name__} on {len(packets)} packets in {delta_time} seconds,
which is {len(packets) / delta_time} packets per second.
```
Format all floats with {bla:.2f} to include only two decimal digits
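i.e. something along these lines (a sketch; the helper name is made up, the values mirror the print call above):

```python
def format_summary(name: str, packet_count: int, delta_time: float) -> str:
    # :.2f limits the displayed floats to two decimal digits
    return (
        f"Executed {name} on {packet_count} packets in {delta_time:.2f} seconds, "
        f"which is {packet_count / delta_time:.2f} packets per second."
    )
```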
tests/marine/benchmark/utils.py
Outdated
```python
@dataclass
class IpPair:
```
tests/marine/benchmark/utils.py
Outdated
```python
    dst_ip: str
    _ports: Set[int] = field(default_factory=set)

    def _generate_port(self) -> int:
```
This ugly stateful logic really shouldn't reside here
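A stateless alternative, if the per-pair bookkeeping goes away (a sketch; the range constant is just the IANA ephemeral port range, not something from the diff):

```python
from random import randint

# IANA dynamic/private ("ephemeral") port range
EPHEMERAL_PORTS = (49152, 65535)


def generate_port() -> int:
    # No per-IpPair state: draw a random ephemeral port and tolerate
    # the rare collision between conversations.
    return randint(*EPHEMERAL_PORTS)
```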
tests/marine/benchmark/main.py
Outdated
```python
marine_instance = Marine(
    os.path.join(os.path.dirname(os.path.abspath(__file__)), "libmarine.so")
)
process = psutil.Process(os.getpid())
```
No need to pass the pid; the default is the current process
```python
def udp_base_layer(
    fn: Callable[[Packet, Packet, int], List[BenchmarkPacket]]
) -> Callable[[IpPair, int], List[BenchmarkPacket]]:
    @wraps(fn)
```
This entire file is way more complicated than it has any right to be.
Why is everything a decorator and not a simple function? Your decorators don't really treat your functions as black boxes.
What isn't treated as a black box? The decorated function simply returns an instance of BenchmarkPacket.
The "complication" stems from the generation of conversations - I wanted it to be easy to generate layer-5 conversations (notice how easy it will be to generate an HTTP conversation now - it will simply call existing decorators and return a BenchmarkPacket with HTTP in it)
You can do all that with plain functions, without decorators that completely change the function signatures.
I think this structure will be way clearer:

```python
def _amplify_to_conversation(side_a_packet: Packet, side_b_packet: Packet, conversation_length: int) -> Iterable[Packet]: ...

def _generate_sides_tcp_packets(ip_pair: IpPair) -> Tuple[Packet, Packet]: ...

def _generate_sides_udp_packets(ip_pair: IpPair) -> Tuple[Packet, Packet]: ...

def generate_raw_tcp_conversation(ip_pair: IpPair, conversation_length: int) -> Iterable[BenchmarkPacket]:
    side_a_packet, side_b_packet = _generate_sides_tcp_packets(ip_pair)
    packets = _amplify_to_conversation(side_a_packet, side_b_packet, conversation_length)
    for packet in packets:
        yield BenchmarkPacket(...)
```
.gitignore
Outdated
```
.idea/
tests/fixtures/marine/libmarine.so
tests/marine/benchmark/libmarine.so
```
Doesn't it get annoying to have so many copies of this .so in our tests?
yehonatanz left a comment
Some general thoughts in addition to the notes:
I still think the conversation generation code is too complicated, but I'm not sure how to simplify it.
I just wouldn't expect the benchmark code to be so much more about generating than about benchmarking.
```python
from .utils import BenchmarkPacket

AUTO_RESET_COUNT = (
    20000  # TODO use marine_instance.get_epan_reset_count() when it's implemented
```
```python
class PortGenerator:
    _conversation_to_ports: Dict[Layer3Conversation, set] = defaultdict(set)

    @staticmethod
```
You're still keeping this as sad global state.
Convert it to instance methods and make generate_packets initialize an instance and pass it to the tcp/udp generators.
But is it really so crucial to generate distinct ports? Can't we just rely on randomness to minimize collisions and tolerate the rare few that may occur?
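The "tolerate collisions" argument checks out numerically: by the birthday-problem approximation, drawing n ports uniformly from a space of d yields about n(n-1)/2d colliding pairs in expectation. A quick sketch (the helper and its default are mine, not from the diff; 16384 is the size of the IANA ephemeral range 49152-65535):

```python
def expected_port_collisions(ports_drawn: int, port_space: int = 16384) -> float:
    # Birthday-problem approximation: expected number of colliding
    # pairs when drawing ports_drawn values uniformly at random
    # from port_space equally likely ports.
    return ports_drawn * (ports_drawn - 1) / (2 * port_space)
```

Even 200 random ports (100 two-sided conversations) only yield about 1.2 expected collisions, so pure randomness seems good enough here.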
I'll use this comment to also reply to your general comment.
In terms of randomness in the benchmark - I think creating a "static case" would take just as much code (we'd still need to generate the conversations, which is the bulk of it). We'd obviously have to keep this code if we ever want to extend the "static case", so in terms of total code we wouldn't gain much.
Additionally, and in my opinion even more importantly, randomness lets us treat Marine as a black box. We don't know how conversation storage and creation affect the benchmark, so using random values at least lets us check that we get similar results over many runs, indicating the code performs consistently in terms of speed.
As for the ports, I think keeping them random is important. I agree that we can trust randomness to minimize collisions, so I'll remove the state.
Added basic benchmark functionality.
Marked many features as TODO since they're nice to have but not required for the basic benchmark.