
Added basic benchmark #12

Merged
tomlegkov merged 4 commits into master from benchmark
Apr 6, 2020

Conversation

@tomlegkov
Owner

Added basic benchmark functionality.
Marked many features as TODO since they're nice to have but not required for the basic benchmark.

@tomlegkov tomlegkov requested a review from yehonatanz March 29, 2020 11:39
@tomlegkov tomlegkov linked an issue Mar 29, 2020 that may be closed by this pull request
Collaborator

@yehonatanz yehonatanz left a comment


Are we really sure we want to generate random packets in Python for the test?
Wouldn't it be preferable to have a benchmark pcap file in our repo and replay it through marine one or more times?

from random import randint, getrandbits
from typing import List

from tests.marine.benchmark.conversation_generators import (
Collaborator


Relative import please

from .conversation_generators import ...

macs = generate_macs(ip_count * 2)
# The list is already randomly generated, so taking consecutive values is random enough
return [
IpPair(macs[i], macs[i + 1], ips[i], ips[i + 1])
Collaborator


Not sure whether I'd rather that or:

list(map(IpPair, macs[::2], macs[1::2], ips[::2], ips[1::2]))

What do you think?
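For comparison, here is a runnable sketch of both pairings on toy data. The field names on IpPair are assumed from the call order in the diff (`IpPair(mac_a, mac_b, ip_a, ip_b)`), and I assume the original loop steps by 2, since only part of it is visible in the hunk:

```python
from dataclasses import dataclass

@dataclass
class IpPair:
    # Field names assumed from the call order in the diff.
    mac_a: str
    mac_b: str
    ip_a: str
    ip_b: str

macs = ["aa:00", "bb:00", "cc:00", "dd:00"]
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

# Version from the PR: consecutive elements form a pair (assuming step 2).
loop_pairs = [
    IpPair(macs[i], macs[i + 1], ips[i], ips[i + 1])
    for i in range(0, len(macs), 2)
]

# Suggested version: map the constructor over the even/odd slices.
map_pairs = list(map(IpPair, macs[::2], macs[1::2], ips[::2], ips[1::2]))

assert loop_pairs == map_pairs  # both produce the same pairing
```

Both forms pair element 0 with 1, element 2 with 3, and so on; the slice version just avoids explicit index bookkeeping.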

Owner Author


Jesus Christ

benchmark_start_used_memory = get_used_memory_in_mb()

if not args.benchmark or args.benchmark == "all":
# TODO: I can take these from benchmark_functions, but I want them executed in this order
Collaborator


dict preserves insertion order since Python >= 3.7 (guaranteed by the language spec), so you can take them from benchmark_functions

Owner Author


ordered by order of insertion? If so I'll take a look

Collaborator


Yep
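For the record, dict insertion order became a language guarantee in Python 3.7 (in CPython 3.6 it was only an implementation detail). A quick illustration with a hypothetical benchmark_functions registry; the names here are invented, not taken from the PR:

```python
# Hypothetical registry; the real benchmark_functions in the PR may differ.
benchmark_functions = {
    "3_fields": lambda packets: None,
    "8_fields": lambda packets: None,
    "bpf_and_display_filter_and_3_fields": lambda packets: None,
}

# Since Python 3.7, iterating a dict follows insertion order, so the
# benchmarks can simply be registered in the order they should run.
assert list(benchmark_functions) == [
    "3_fields",
    "8_fields",
    "bpf_and_display_filter_and_3_fields",
]
```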

benchmark_8_fields(generated_packets)
benchmark_bpf_and_display_filter_and_3_fields(generated_packets)
benchmark_bpf_and_display_filter_and_8_fields(generated_packets)
elif args.benchmark in benchmark_functions:
Collaborator


It would be cool if this option accepted a glob pattern and not just an exact name (but that's obviously overkill)

Owner Author


I'll add a todo :)

print(
f"""
Executed {f.__name__} on {len(packets)} packets in {delta_time} seconds,
which is {len(packets) / delta_time} packets per second.
Collaborator


Format all floats with {bla:.2f} to include only two decimal digits
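The suggested formatting, sketched on dummy numbers (the variable names mirror the snippet above; the values are made up):

```python
packet_count = 20000
delta_time = 3.14159  # seconds, made up for the sketch

# {value:.2f} rounds to exactly two decimal digits.
message = (
    f"Executed benchmark on {packet_count} packets in {delta_time:.2f} seconds, "
    f"which is {packet_count / delta_time:.2f} packets per second."
)
print(message)
```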



@dataclass
class IpPair:
Collaborator


Find a better name

dst_ip: str
_ports: Set[int] = field(default_factory=set)

def _generate_port(self) -> int:
Collaborator


This ugly stateful logic really shouldn't reside here

marine_instance = Marine(
os.path.join(os.path.dirname(os.path.abspath(__file__)), "libmarine.so")
)
process = psutil.Process(os.getpid())
Collaborator


No need to pass the pid; psutil.Process() defaults to the current process

def udp_base_layer(
fn: Callable[[Packet, Packet, int], List[BenchmarkPacket]]
) -> Callable[[IpPair, int], List[BenchmarkPacket]]:
@wraps(fn)
Collaborator


This entire file is way more complicated than it has any right to be.
Why is everything a decorator and not a simple function? Your decorators don't really treat your functions as black boxes.

Owner Author

@tomlegkov tomlegkov Mar 29, 2020


What isn't treated as a black box? The decorated function simply returns an instance of BenchmarkPacket.
The "complication" stems from the generation of conversations - I wanted it to be easy to generate layer-5 conversations (notice how easy it will be to generate an HTTP conversation now - it will simply call the existing decorators and return a BenchmarkPacket with HTTP in it)

Collaborator


You can do all that with plain functions, without decorators that completely change the function signatures.
I think this structure will be way clearer:

def _amplify_to_conversation(side_a_packet: Packet, side_b_packet: Packet, conversation_length: int) -> Iterable[Packet]: ...

def _generate_sides_tcp_packets(ip_pair: IpPair) -> Tuple[Packet, Packet]: ...
def _generate_sides_udp_packets(ip_pair: IpPair) -> Tuple[Packet, Packet]: ...

def generate_raw_tcp_conversation(ip_pair: IpPair, conversation_length: int) -> Iterable[BenchmarkPacket]:
    side_a_packet, side_b_packet = _generate_sides_tcp_packets(ip_pair)
    packets = _amplify_to_conversation(side_a_packet, side_b_packet, conversation_length)
    for packet in packets:
        yield BenchmarkPacket(...) 

.gitignore Outdated
.idea/
tests/fixtures/marine/libmarine.so
tests/marine/benchmark/libmarine.so
Collaborator


Doesn't it get annoying to have so many copies of this .so in our tests?

Owner Author


Yeah


@tomlegkov tomlegkov self-assigned this Apr 2, 2020
@tomlegkov tomlegkov requested a review from yehonatanz April 3, 2020 08:29
Collaborator

@yehonatanz yehonatanz left a comment


Some general thoughts in addition to the notes:
I still think the conversation generation code is too complicated, but I'm not sure how to simplify it.
I just wouldn't expect the benchmark code to be so much more about generating than about benchmarking.

from .utils import BenchmarkPacket

AUTO_RESET_COUNT = (
20000 # TODO use marine_instance.get_epan_reset_count() when it's implemented
Collaborator


It's implemented

Owner Author

@tomlegkov tomlegkov Apr 5, 2020


Not yet (pull request #13)

class PortGenerator:
_conversation_to_ports: Dict[Layer3Conversation, set] = defaultdict(set)

@staticmethod
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're still keeping this as sad global state.
Convert it to instance methods and make generate_packets initialize an instance and pass it to the tcp/udp generators.
But is it really so crucial to generate distinct ports? Can't we just rely on randomness to minimize collisions and tolerate the rare few that may occur?
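A sketch of the stateless alternative, with a rough birthday-bound estimate of how rare collisions actually are. The port range and conversation count here are my assumptions, not taken from the PR:

```python
from random import randint

EPHEMERAL_MIN, EPHEMERAL_MAX = 1024, 65535  # assumed range, not from the PR

def generate_port() -> int:
    """Stateless draw: no per-conversation set of used ports."""
    return randint(EPHEMERAL_MIN, EPHEMERAL_MAX)

# Birthday bound: among k draws from n ports, the expected number of
# colliding pairs is about k * (k - 1) / (2 * n).
k = 1000  # conversations, made up for the estimate
n = EPHEMERAL_MAX - EPHEMERAL_MIN + 1  # 64512 ports
expected_colliding_pairs = k * (k - 1) / (2 * n)
print(f"{expected_colliding_pairs:.2f}")  # ~7.74 for 1000 draws
```

So even a thousand random ports only produce a handful of expected collisions, which a benchmark can comfortably tolerate.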

Owner Author


I'll also use this comment to reply to your general review comment.

In terms of randomness in the benchmark - I think creating a "static case" would take just as much code, since we'd still need to generate the conversations, and we'd have to keep that generation code anyway if we ever want to extend the "static case". So in terms of total code we wouldn't gain much.

Additionally, and in my opinion even more importantly, randomness treats Marine as a black box. We don't know how conversation storage and creation affect the benchmark, so random values at least let us check that we get similar results over many runs, indicating the code performs consistently in terms of speed.

As for the ports, I think keeping the randomness is important. I agree that we can trust randomness to minimize collisions, so I'll remove the state.

@tomlegkov tomlegkov merged commit 4b341cc into master Apr 6, 2020
@tomlegkov tomlegkov deleted the benchmark branch April 6, 2020 13:20


Development

Successfully merging this pull request may close these issues.

Add benchmarks

2 participants