Add tsort by akiradeveloper · Pull Request #359 · uutils/coreutils

akiradeveloper · 2014-07-19T13:14:02Z

Signed-off-by: Akira Hayakawa ruby.wktk@gmail.com

Another test project is here

https://github.com/akiradeveloper/rust-tsort

Testing

It generates random graph and compares both outputs of GNU-version and Rust-version. I think Rust-version works correctly.

Performance

The current version doesn't perform well for especially large scale dataset.

Here is a experimental result from the above test project.

The test is first create DAG of N nodes and N(N-1)/2 edges and then feed the dataset into both versions. Note, t_new is of Rust-version and t_ref is of GNU-version.

N increases from top to bottom as logarithmic scale (10, 20, 50, 100, 200, 500, 1000, 2000, 5000). When the N is small. Rust-version is not bad however, it loses more the higher the N is. At the last case, it's 10 times worse.

The algorithm is O(|V|+|E|) which is O(|V|^2) in this case. I will improve the algorithm if I find better one.

TEST30000 result: OK, acyclic:true, time: t_new/t_ref=0.008438441/0.006900673
TEST30001 result: OK, acyclic:true, time: t_new/t_ref=0.009235107/0.006930367
TEST30002 result: OK, acyclic:true, time: t_new/t_ref=0.009223152/0.007113545
TEST30003 result: OK, acyclic:true, time: t_new/t_ref=0.013038735/0.007669914
TEST30004 result: OK, acyclic:true, time: t_new/t_ref=0.034065732/0.011519491
TEST30005 result: OK, acyclic:true, time: t_new/t_ref=0.186134332/0.03995111
TEST30006 result: OK, acyclic:true, time: t_new/t_ref=0.862109813/0.149578491
TEST30007 result: OK, acyclic:true, time: t_new/t_ref=4.584371179/0.62385227
TEST30008 result: OK, acyclic:true, time: t_new/t_ref=42.988573293/4.288322895

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

Arcterus · 2014-07-19T18:02:51Z

This could be let n = start_nodes.shift().unwrap();. You wouldn't need the next line in this case.

The regression I said in below (refactoring looping n_out_edges) was found caused by this change.

diff --git a/tsort/tsort.rs b/tsort/tsort.rs index 1a514a9..b0c6283 100644 --- a/tsort/tsort.rs +++ b/tsort/tsort.rs @@ -154,7 +154,7 @@ impl Graph { } while !start_nodes.is_empty() { - let n = start_nodes[0].clone(); + let n = start_nodes.shift().unwrap(); start_nodes.remove(0); self.result.push(n.clone());

Oh. You need to remove start_nodes.remove(0) below. Otherwise, it will try to remove the node twice.

The double removing removed and it came back to life. Thanks.

No problem. :)

Arcterus · 2014-07-19T18:30:06Z

Everything else seems fine. The performance issue can be worked out later. 👍

Arcterus · 2014-07-19T22:34:45Z

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

akiradeveloper · 2014-07-20T03:18:54Z

All proposals applied.

No regression found and performance really improved. I think refactoring that inner loop did it.

I will submit another pull request

akira@Hercules:~/rust-tsort$ rake run_tests
TEST0 result: OK, acyclic:false, time: t_new/t_ref=0.008577448/0.006712845
TEST1 result: OK, acyclic:false, time: t_new/t_ref=0.007850375/0.006539402
TEST10000 result: OK, acyclic:false, time: t_new/t_ref=0.009184312/0.00813009
TEST10001 result: OK, acyclic:false, time: t_new/t_ref=0.009967213/0.007744685
TEST10002 result: OK, acyclic:false, time: t_new/t_ref=0.009206604/0.006665816
TEST10003 result: OK, acyclic:true, time: t_new/t_ref=0.011407594/0.00696269
TEST10004 result: OK, acyclic:false, time: t_new/t_ref=0.009013078/0.006981327
TEST10005 result: OK, acyclic:false, time: t_new/t_ref=0.010660868/0.007739061
TEST10006 result: OK, acyclic:true, time: t_new/t_ref=0.008568894/0.00687528
TEST10007 result: OK, acyclic:false, time: t_new/t_ref=0.008746539/0.006904187
TEST10008 result: OK, acyclic:false, time: t_new/t_ref=0.008943133/0.006871977
TEST10009 result: OK, acyclic:true, time: t_new/t_ref=0.011155742/0.007051623
TEST2 result: OK, acyclic:true, time: t_new/t_ref=0.009606108/0.006861813
TEST20000 result: OK, acyclic:true, time: t_new/t_ref=0.016767014/0.008741542
TEST20001 result: OK, acyclic:true, time: t_new/t_ref=0.109895699/0.031641821
TEST20002 result: OK, acyclic:true, time: t_new/t_ref=0.38382173/0.102129412
TEST20003 result: OK, acyclic:true, time: t_new/t_ref=0.030457973/0.011940991
TEST20004 result: OK, acyclic:true, time: t_new/t_ref=0.427178843/0.112283163
TEST20005 result: OK, acyclic:true, time: t_new/t_ref=0.222073998/0.061800087
TEST20006 result: OK, acyclic:true, time: t_new/t_ref=0.506039759/0.134829509
TEST20007 result: OK, acyclic:true, time: t_new/t_ref=0.057163792/0.018484568
TEST20008 result: OK, acyclic:true, time: t_new/t_ref=0.079317631/0.024882092
TEST20009 result: OK, acyclic:true, time: t_new/t_ref=0.022170054/0.009995337
TEST3 result: OK, acyclic:false, time: t_new/t_ref=0.008100273/0.006701319
TEST30000 result: OK, acyclic:true, time: t_new/t_ref=0.008127883/0.006822402
TEST30001 result: OK, acyclic:true, time: t_new/t_ref=0.008253873/0.006932948
TEST30002 result: OK, acyclic:true, time: t_new/t_ref=0.009098254/0.006977331
TEST30003 result: OK, acyclic:true, time: t_new/t_ref=0.012857694/0.009357036
TEST30004 result: OK, acyclic:true, time: t_new/t_ref=0.030292198/0.011266431
TEST30005 result: OK, acyclic:true, time: t_new/t_ref=0.136233344/0.040125623
TEST30006 result: OK, acyclic:true, time: t_new/t_ref=0.562640418/0.149092327
TEST30007 result: OK, acyclic:true, time: t_new/t_ref=2.460193113/0.613513872
TEST30008 result: OK, acyclic:true, time: t_new/t_ref=16.838372892/4.308812779
TEST4 result: OK, acyclic:true, time: t_new/t_ref=0.007966765/0.006680662
TEST5 result: OK, acyclic:false, time: t_new/t_ref=0.007779832/0.007144855
failures=[]
max ratio=4.010004052524504
min ratio=1.1905286178404915
ave ratio=2.5372806169406505
std dev=1.194178433942098

huonw · 2014-07-20T03:24:59Z

This could be written as

print!("{name} v{version} Usage: {name} [OPTIONS] FILE {usage}", name = NAME, vers = VERSION, usage = getopts::usage(...));

I'll probably refactor all of the utilities to use this at some point.

Arcterus · 2014-07-20T03:27:55Z

Yay! 😄

Yes, it's faster because of the inner loop. In case you're interested, the reason for this is the loop no longer has bounds checks, and the elements in the Vec are no longer being shifted on each loop iteration.

Arcterus · 2014-07-20T03:31:00Z

I assume you want me to merge the other pull request, so I can close this one, correct?

akiradeveloper · 2014-07-20T03:31:58Z

I assume you want me to merge the other pull request, so I can close this one, correct?

Yes. Please close and merge #361

Add tsort

40638bd

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

Arcterus reviewed Jul 19, 2014
View reviewed changes

Comment thread tsort/tsort.rs Outdated

Copy link
Copy Markdown

Collaborator

Arcterus Jul 19, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo.

akiradeveloper added 9 commits July 20, 2014 11:44

fix (pop start_nodes)

0e57ea5

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

fix (use println! instead of explicit bufferedwriter)

2982a58

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

fix spacing

79f57be

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

fix spacing

e93aa89

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

fix spacing

00e5519

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

fix spacing

15994ed

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

fix typo

619b745

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

fix buffered reader

8c0a63f

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

refactor looping

a7304e9

Signed-off-by: Akira Hayakawa <ruby.wktk@gmail.com>

huonw reviewed Jul 20, 2014
View reviewed changes

akiradeveloper mentioned this pull request Jul 20, 2014

Add tsort #361

Merged

Arcterus closed this Jul 20, 2014

Uh oh!

Conversation

akiradeveloper commented Jul 19, 2014

Testing

Performance

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Arcterus commented Jul 19, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

akiradeveloper commented Jul 20, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Arcterus commented Jul 20, 2014

Uh oh!

Arcterus commented Jul 20, 2014

Uh oh!

akiradeveloper commented Jul 20, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants