
Added basic benchmark #12

Merged
tomlegkov merged 4 commits into master from benchmark
Apr 6, 2020

Conversation

@tomlegkov
Owner

Added basic benchmark functionality.
Marked many features as TODO since they're nice to have but not required for the basic benchmark.

@tomlegkov tomlegkov requested a review from yehonatanz March 29, 2020 11:39
@tomlegkov tomlegkov linked an issue Mar 29, 2020 that may be closed by this pull request
Collaborator

@yehonatanz yehonatanz left a comment


Are we really sure we want to generate random packets in Python for the test?
Wouldn't it be preferable to have a benchmark pcap file in our repo and replay it through marine one or more times?

from random import randint, getrandbits
from typing import List

from tests.marine.benchmark.conversation_generators import (
Collaborator


Relative import please

from .conversation_generators import ...

macs = generate_macs(ip_count * 2)
# The list is already randomly generated, so taking consecutive values is random enough
return [
IpPair(macs[i], macs[i + 1], ips[i], ips[i + 1])
Collaborator


Not sure whether I'd rather that or:

list(map(IpPair, macs[::2], macs[1::2], ips[::2], ips[1::2]))

What do you think?
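For comparison, here is a runnable sketch of both pairings on toy data. The field names on IpPair are assumed from the call order in the diff (`IpPair(mac_a, mac_b, ip_a, ip_b)`), and I assume the original loop steps by 2, since only part of it is visible in the hunk:

```python
from dataclasses import dataclass

@dataclass
class IpPair:
    # Field names assumed from the call order in the diff.
    mac_a: str
    mac_b: str
    ip_a: str
    ip_b: str

macs = ["aa:00", "bb:00", "cc:00", "dd:00"]
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

# Version from the PR: consecutive elements form a pair (assuming step 2).
loop_pairs = [
    IpPair(macs[i], macs[i + 1], ips[i], ips[i + 1])
    for i in range(0, len(macs), 2)
]

# Suggested version: map the constructor over the even/odd slices.
map_pairs = list(map(IpPair, macs[::2], macs[1::2], ips[::2], ips[1::2]))

assert loop_pairs == map_pairs  # both produce the same pairing
```

Both forms pair element 0 with 1, element 2 with 3, and so on; the slice version just avoids explicit index bookkeeping.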

Owner Author


Jesus Christ

benchmark_start_used_memory = get_used_memory_in_mb()

if not args.benchmark or args.benchmark == "all":
# TODO: I can take these from benchmark_functions, but I want them executed in this order
Collaborator


dict preserves insertion order since Python >= 3.7 (guaranteed by the language spec), so you can take them from benchmark_functions

Owner Author


ordered by order of insertion? If so I'll take a look

Collaborator


Yep
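For the record, dict insertion order became a language guarantee in Python 3.7 (in CPython 3.6 it was only an implementation detail). A quick illustration with a hypothetical benchmark_functions registry; the names here are invented, not taken from the PR:

```python
# Hypothetical registry; the real benchmark_functions in the PR may differ.
benchmark_functions = {
    "3_fields": lambda packets: None,
    "8_fields": lambda packets: None,
    "bpf_and_display_filter_and_3_fields": lambda packets: None,
}

# Since Python 3.7, iterating a dict follows insertion order, so the
# benchmarks can simply be registered in the order they should run.
assert list(benchmark_functions) == [
    "3_fields",
    "8_fields",
    "bpf_and_display_filter_and_3_fields",
]
```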

benchmark_8_fields(generated_packets)
benchmark_bpf_and_display_filter_and_3_fields(generated_packets)
benchmark_bpf_and_display_filter_and_8_fields(generated_packets)
elif args.benchmark in benchmark_functions:
Collaborator


It would be cool if this option accepted a glob pattern and not just an exact name (but that's obviously overkill)

Owner Author


I'll add a todo :)

print(
f"""
Executed {f.__name__} on {len(packets)} packets in {delta_time} seconds,
which is {len(packets) / delta_time} packets per second.
Collaborator


Format all floats with {bla:.2f} to include only two decimal digits
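The suggested formatting, sketched on dummy numbers (the variable names mirror the snippet above; the values are made up):

```python
packet_count = 20000
delta_time = 3.14159  # seconds, made up for the sketch

# {value:.2f} rounds to exactly two decimal digits.
message = (
    f"Executed benchmark on {packet_count} packets in {delta_time:.2f} seconds, "
    f"which is {packet_count / delta_time:.2f} packets per second."
)
print(message)
```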



@dataclass
class IpPair:
Collaborator


Find a better name

dst_ip: str
_ports: Set[int] = field(default_factory=set)

def _generate_port(self) -> int:
Collaborator


This ugly stateful logic really shouldn't reside here

marine_instance = Marine(
os.path.join(os.path.dirname(os.path.abspath(__file__)), "libmarine.so")
)
process = psutil.Process(os.getpid())
Collaborator


No need to pass the pid; psutil.Process() defaults to the current process

def udp_base_layer(
fn: Callable[[Packet, Packet, int], List[BenchmarkPacket]]
) -> Callable[[IpPair, int], List[BenchmarkPacket]]:
@wraps(fn)
Collaborator


This entire file is way more complicated than it has any right to be.
Why is everything a decorator and not a simple function? Your decorators don't really treat your functions as black boxes.

Owner Author

@tomlegkov tomlegkov Mar 29, 2020


What isn't treated as a black box? The decorated function simply returns an instance of BenchmarkPacket.
The "complication" stems from the generation of conversations - I wanted it to be easy to generate layer-5 conversations (notice how easy it will be to generate an HTTP conversation now - it will simply call the existing decorators and return a BenchmarkPacket with HTTP in it)

Collaborator


You can do all that with plain functions, without decorators that completely change the function signatures.
I think this structure will be way clearer:

def _amplify_to_conversation(side_a_packet: Packet, side_b_packet: Packet, conversation_length: int) -> Iterable[Packet]: ...

def _generate_sides_tcp_packets(ip_pair: IpPair) -> Tuple[Packet, Packet]: ...
def _generate_sides_udp_packets(ip_pair: IpPair) -> Tuple[Packet, Packet]: ...

def generate_raw_tcp_conversation(ip_pair: IpPair, conversation_length: int) -> Iterable[BenchmarkPacket]:
    side_a_packet, side_b_packet = _generate_sides_tcp_packets(ip_pair)
    packets = _amplify_to_conversation(side_a_packet, side_b_packet, conversation_length)
    for packet in packets:
        yield BenchmarkPacket(...) 

.gitignore Outdated
.idea/
tests/fixtures/marine/libmarine.so
tests/marine/benchmark/libmarine.so
Collaborator


Doesn't it get annoying to have so many copies of this .so in our tests?

Owner Author


Yeah


@tomlegkov tomlegkov self-assigned this Apr 2, 2020
@tomlegkov tomlegkov requested a review from yehonatanz April 3, 2020 08:29
Collaborator

@yehonatanz yehonatanz left a comment


Some general thoughts in addition to the notes:
I still think the conversation generation code is too complicated, but I'm not sure how to simplify it.
I just wouldn't expect the benchmark code to be so much more about generating than about benchmarking.

from .utils import BenchmarkPacket

AUTO_RESET_COUNT = (
20000 # TODO use marine_instance.get_epan_reset_count() when it's implemented
Collaborator


It's implemented

Owner Author

@tomlegkov tomlegkov Apr 5, 2020


Not yet (pull request #13)

class PortGenerator:
_conversation_to_ports: Dict[Layer3Conversation, set] = defaultdict(set)

@staticmethod
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're still keeping this as sad global state.
Convert it to instance methods and make generate_packets initialize an instance and pass it to the tcp/udp generators.
But is it really so crucial to generate distinct ports? Can't we just rely on randomness to minimize collisions and tolerate the rare few that may occur?
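A sketch of the stateless alternative, with a rough birthday-bound estimate of how rare collisions actually are. The port range and conversation count here are my assumptions, not taken from the PR:

```python
from random import randint

EPHEMERAL_MIN, EPHEMERAL_MAX = 1024, 65535  # assumed range, not from the PR

def generate_port() -> int:
    """Stateless draw: no per-conversation set of used ports."""
    return randint(EPHEMERAL_MIN, EPHEMERAL_MAX)

# Birthday bound: among k draws from n ports, the expected number of
# colliding pairs is about k * (k - 1) / (2 * n).
k = 1000  # conversations, made up for the estimate
n = EPHEMERAL_MAX - EPHEMERAL_MIN + 1  # 64512 ports
expected_colliding_pairs = k * (k - 1) / (2 * n)
print(f"{expected_colliding_pairs:.2f}")  # ~7.74 for 1000 draws
```

So even a thousand random ports only produce a handful of expected collisions, which a benchmark can comfortably tolerate.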

Owner Author


I'll also use this comment to reply to your general review comment.

In terms of randomness in the benchmark - I think creating a "static case" would take just as much code, since we'd still need to generate the conversations, and we'd have to keep that generation code anyway if we ever want to extend the "static case". So in terms of total code we wouldn't gain much.

Additionally, and in my opinion even more importantly, randomness treats Marine as a black box. We don't know how conversation storage and creation affect the benchmark, so random values at least let us check that we get similar results over many runs, indicating the code performs consistently in terms of speed.

As for the ports, I think keeping the randomness is important. I agree that we can trust randomness to minimize collisions, so I'll remove the state.

@tomlegkov tomlegkov merged commit 4b341cc into master Apr 6, 2020
@tomlegkov tomlegkov deleted the benchmark branch April 6, 2020 13:20


Development

Successfully merging this pull request may close these issues.

Add benchmarks

2 participants