
runtime performance #11

Open
3 of 10 tasks
mratsim opened this issue Mar 24, 2018 · 5 comments

Comments

@mratsim
Contributor

mratsim commented Mar 24, 2018

  • I don't think there's much of a speed difference between D and Nim for general code when GC'ed types are not involved.

  • One perf gotcha is that, by default, Nim seqs and strings have value semantics, so plain assignment will deep-copy.

I think performance all comes down to data structures + programmer familiarity with the language + time spent.

Now there are domain-specific considerations that can make a huge difference, and most Nim library authors publish extensive benchmarks of their solutions compared to their mainstream alternatives.

A generic need in the wild: parsing files

  • There is the Faster Command Line Tool in <insert language> benchmark that was started by the D community; Nim also replicated it. TL;DR: D and Nim had the same speed and the same compilation time. To be honest, the fastest CSV parser I have used (to parse GBs of machine learning datasets) is xsv in Rust.

Domain specific

HTTP server:

Functional programming

  • Zero_functional is currently number 1 or 2 against 9 other languages, the other top language being Rust. Zero_functional fuses loops at compile time when chaining zip.map.filter.reduce functional constructs.

Numerical/scientific computing

This is my domain so I know much more about it.

  • D has the advantage of having access to the register size and the L1/L2 cache sizes at compile time when using LDC; this is important for truly generic code.

  • D does not have access to restrict and __builtin_assume_aligned, which are necessary to reach Fortran speed when operating on arrays and tensors.

  • D cannot disable (?) the GC at specific points.

Open questions

  • Does D have an alternative to closures that can inline a proc passed to higher-order functions like map?

  • Can D arrays be parametrized with compile-time procs? For example, for efficient parallel reduction you need to create an intermediate array of N elements (N being your number of cores), padded so that the elements do not sit in the same cache line (64 B on most CPUs), to avoid false sharing/cache invalidation. For a type T I need something like this: var results{.align64, noInit.}: array[min(T.sizeof, OPENMP_NB_THREADS * maxItemsPerCacheLine), T]

@timotheecour
Owner

thanks! PRs welcome to incorporate your points so they don't get lost!

@timotheecour
Owner

added 208b717 to address some of the points above (marking them as checked)

@timotheecour timotheecour changed the title Speed runtime performance Mar 27, 2018
@timotheecour
Owner

/cc @mratsim

> D cannot disable (?) the GC at specific point.

what do you mean? see https://dlang.org/library/core/memory/gc.disable.html

@ghost

ghost commented Mar 27, 2018

@timotheecour it has some limitations though:
> Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition.

@mratsim
Copy link
Contributor Author

mratsim commented Mar 28, 2018

@timotheecour hence the question mark ;) I don't know D.

By the way, regarding "D has the advantage of having access to register size and L1 cache or L2 cache size at compile-time when using LDC": this was used in Mir, full article here. I have also seen L1-cache-size-dependent code in the Mir library.

@timotheecour timotheecour mentioned this issue Apr 4, 2018