This repository has been archived by the owner on Apr 25, 2022. It is now read-only.

Consider technology shift #5

Closed
rr- opened this issue Dec 30, 2014 · 9 comments

@rr-
Member

rr- commented Dec 30, 2014

  1. Some areas suffer from very poor performance.
  2. This has prompted me to start creating native extensions in C instead of writing them in plain Ruby.
  3. The obvious advantage lies in the gained performance.
  4. There are, however, numerous disadvantages of such an approach:
    1. The codebase becomes impure: it mixes Ruby with C.
    2. It imposes great requirements on the end user - he must have gcc and make around. I assume it is a great PITA to get this right on Windows (used by the target audience) without using Cygwin.
    3. There is RubyInline, but it causes multiple problems on every machine I try to install it on (surprisingly, the biggest problems kept happening on Debian). It doesn't resolve the issues mentioned in the previous points either.
    4. Even if I port the most sensitive parts to C, the remaining code is still quite slow.

The biggest issue with this is that when I started this project, I chose Ruby because I wanted it to be as clean and nice as possible... to developers. Now I feel that, as a byproduct, this tool can be used only by developers.

My suggestions are as follows:

  • Ignore this issue. Which is what I'm going to do for now.
  • Switch to another language.
    1. Go. I'm a bit reluctant to rely on anything from Google, but from what I heard about Go, this seems like a good choice. The syntax is weird.
    2. Rust. For now, it's too immature - core modules might change at any moment. The syntax is crazy. I'm afraid of compile times - no incremental building.
    3. D. I know nothing about D, except that it falls into a similar niche.
    4. C/C++. Makefiles are a nightmare, unit testing that needs to use macros is a nightmare, includes that expand to 70k SLOC are a nightmare. Everything reeks of the 80s and C++14 doesn't fix that.
@erengy

erengy commented Jan 10, 2015

I've previously written a similar multipurpose tool in C++. It did work fast, and it could have worked even faster, but I couldn't help asking myself whether I actually needed that kind of performance. If I were to write it again now, I'd probably go with Python (I'm not familiar with Ruby) or try out Go. Python would attract more contributors, if any. Go would run considerably faster.

How slow is quite slow? Unpacking of an archive is rarely done twice, so that shouldn't be an issue. If the packing process is not unreasonably slow, then the shift may not be worth it.

PS: Rust has just hit v1.0 alpha, which is reassuring in terms of stability.

@rr-
Member Author

rr- commented Jan 10, 2015

Hmm... when I implement stuff such as an LZSS compressor in Ruby (which uses bit-level arithmetic), it can take up to 40-50 minutes to convert all the graphic files, while the C-powered version crunches everything down in about 2 minutes (yay for unsafe type casting). That's why I keep implementing compressors in C, while implementing everything else in Ruby.

That is to be expected, though: C operations such as >> and pointer arithmetic translate almost directly into machine instructions such as shr and lea, while Ruby has to emulate everything in its VM.
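To make the point about shifts and pointer arithmetic concrete, here is a minimal sketch of the kind of bit reader a decompressor typically sits on. It is not taken from arc_unpacker: the BitReader type, the function names and the MSB-first bit order are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; names and bit order are
   illustrative assumptions, not arc_unpacker code. */
typedef struct {
    const uint8_t *cur;  /* next byte to load */
    const uint8_t *end;  /* one past the last input byte */
    uint32_t bit_buf;    /* loaded bits; the unread ones are the lowest */
    unsigned bit_count;  /* number of unread bits in bit_buf */
} BitReader;

static void bit_reader_init(BitReader *br, const uint8_t *data, size_t size)
{
    br->cur = data;
    br->end = data + size;  /* plain pointer arithmetic */
    br->bit_buf = 0;
    br->bit_count = 0;
}

/* Reads `count` bits (1..16), most significant bit first.
   Returns 0 on success, -1 if the input runs out. */
static int bit_reader_get(BitReader *br, unsigned count, uint32_t *out)
{
    assert(count >= 1 && count <= 16);
    while (br->bit_count < count) {
        if (br->cur == br->end)
            return -1;
        /* Shift the buffer left and append the next input byte. */
        br->bit_buf = (br->bit_buf << 8) | *br->cur++;
        br->bit_count += 8;
    }
    br->bit_count -= count;
    /* Shift the wanted bits down and mask off everything above them. */
    *out = (br->bit_buf >> br->bit_count) & ((1u << count) - 1u);
    return 0;
}
```

Every line in the loop is a shift, a mask, or a pointer increment, which a C compiler can lower to a handful of instructions, whereas an interpreter has to dispatch each of those operations through its VM.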

I'd go with either Go or Rust. Go seems promising with regard to short compilation times. Although I was aware Rust was going to hit alpha soon, I wasn't aware that they have bold plans to release 1.0 final in, like, just a few months.

@rr-
Member Author

rr- commented Jan 10, 2015

By the way, I'm considering withdrawing the support for compressing/packing.

The reason I keep implementing packers is that they make unit testing really easy: assert stuff == unpack(pack(stuff)). But the truth is that:

  • I should provide some hand-crafted minimal binary files in the unit tests, otherwise the packer and unpacker could both just return stuff unchanged and the unit test would not notice.
  • It actually makes little sense to add packing support:
    • What's the point of being able to repack files for novels that are already translated?
    • Translating untranslated novels requires much more than a simple repacker such as this one. So instead of spending time on trying to satisfy everyone in advance, I should probably focus on adding unpack support for more games.
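The weakness of the round-trip test can be sketched as follows. The toy run-length format, the codec_fn type and every function name here are made up for illustration; the point is only that a "codec" that copies its input passes the round-trip check, while a hand-written fixture exposes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy format: a sequence of (run length, byte value) pairs.
   All codecs return the number of bytes written; `out` is assumed to be
   large enough. */
typedef size_t (*codec_fn)(const uint8_t *in, size_t n, uint8_t *out);

static size_t rle_pack(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 255)
            run++;
        out[o++] = (uint8_t)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

static size_t rle_unpack(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        memset(out + o, in[i + 1], in[i]);
        o += in[i];
    }
    return o;
}

/* Broken "codec": packing and unpacking both just copy the input. */
static size_t fake_codec(const uint8_t *in, size_t n, uint8_t *out)
{
    memcpy(out, in, n);
    return n;
}

/* Round-trip check: stuff == unpack(pack(stuff)). Returns 1 on success. */
static int round_trip_ok(codec_fn pack, codec_fn unpack)
{
    const uint8_t stuff[] = { 'a', 'a', 'a', 'b' };
    uint8_t packed[16], unpacked[16];
    size_t packed_size = pack(stuff, sizeof stuff, packed);
    size_t unpacked_size = unpack(packed, packed_size, unpacked);
    return unpacked_size == sizeof stuff
        && memcmp(unpacked, stuff, sizeof stuff) == 0;
}

/* Fixture check: unpack hand-written bytes, compare with known output. */
static int fixture_ok(codec_fn unpack)
{
    const uint8_t fixture[] = { 3, 'a', 1, 'b' };
    const uint8_t expected[] = { 'a', 'a', 'a', 'b' };
    uint8_t unpacked[16];
    size_t unpacked_size = unpack(fixture, sizeof fixture, unpacked);
    return unpacked_size == sizeof expected
        && memcmp(unpacked, expected, sizeof expected) == 0;
}
```

With these definitions, round_trip_ok(fake_codec, fake_codec) succeeds just like round_trip_ok(rle_pack, rle_unpack) does, but fixture_ok(fake_codec) fails while fixture_ok(rle_unpack) passes. A fixture-based test also keeps working if packing support is dropped entirely.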

@erengy

erengy commented Jan 10, 2015

40-50 minutes? It definitely extends beyond being unreasonable, then. As a wise woman once said, ain't nobody got time for that.

I think it boils down to what your intentions are, and how the tool is supposed to be used. Having a nice and clean codebase is quite helpful when another developer wants to extend the functionality or fix a bug. As long as you don't use a relatively unknown language such as OCaml, it should be fine.

That said, even though most people can figure out how to set up a development environment and to use the command line, non-developers will always prefer having a simple executable file in hand, preferably with a GUI (e.g. AnimED, Crass, ExtractData). This is also true for developers, actually. I don't mind this when I'm working on a translation project, but when I just want to quickly extract the contents of an eroge, I'd rather drag-and-drop the archive on a window and be done with it.

@rr-
Member Author

rr- commented Jan 11, 2015

40-50 minutes if I use Ruby, though. I do the critical stuff in C, so it's sort of acceptable. Regarding the purpose of the tool: frankly, most of the archives I support so far can be extracted using other tools, so I guess it boils down to this:

[Image: "Standards"]

This, and personally, I consider a GUI to be total bloat most of the time. Converting some files is definitely one of these cases. The majority of the tools out there do #include <windows.h>... why? arc_unpacker supports drag'n'drop even though it's CLI. Its only dependency is rmagick, which is probably going to go away after I switch languages.

Like I said in the ticket, I'll give it some more time, and when I feel up to the task, I'll try out Rust and Go. They should allow me to:

  • avoid the need of having a "dual-language" repository to keep things fast enough
  • let users with no development environment run the project (Windows package manager when?)
  • control the code cleanliness myself
  • keep it CLI and portable

Dropping packing support seems reasonable, since (I think) every translation project needs its own hacker anyway. Giving the source code to him allows him to build his own tools and set up any environment he wants, and reversing the unpacking by looking at the source code shouldn't be too difficult.

@rr-
Member Author

rr- commented Jan 26, 2015

I finally completed the research, and here are my thoughts from the standpoint of this project:

  1. Scripting languages are slow and cannot be compiled to a standalone .exe, thus making the target audience even smaller than it already is.
  2. Go's main advantage lies in compile speeds and parallel processing. Go's toolchain, workspace management and requirement to set up $GOPATH are a deal-breaker to me.
  3. Rust has super weird syntax. I could get over it, but there's another huge disadvantage - compiling hello world results in a 3.5 MB exe, which I find totally unacceptable. It might improve in the future, but we're talking about here and now.
  4. D, Ada and others: too exotic.
  5. C++. Bloatware.
  6. C. Yeah... C.

I checked out what developing in vanilla C feels like. After winning an epic fight with the necessary evil that is the makefile, I found the development in C to be... kind of calming.

  • It's as close to the machine as it gets before digging into assembler.
  • Standard library is minimal, which means there is no temptation to link against bloatware.
  • Memory footprint is minimal.
  • So is the executable size.
  • Performance is as good as the implementation thanks to nonexistent overhead.
  • make and gcc are available on virtually every platform out there.
  • make and gcc are more reasonable dependencies than forcing users to download Ruby/Rust/whatever new shiny tool.
  • As opposed to C++, the differences in standard library implementations are minimal (example reasonable problem with C and example unreasonable problem with C++).

I'll go with C. Results will be committed into c branch.

@erengy

erengy commented Jan 27, 2015

I agree with most of the points, but I'd argue that C++ brings more to the table with no practical cost. You can always pick and choose which features of C++ to use, and continue coding in C where you see fit. As a result, you can write less code and spend less time on dealing with pointers and stuff. I don't have a strong opinion on the matter though, and you should totally use C if you enjoy it more.

@rr-
Member Author

rr- commented Feb 5, 2015

I learned you were right the hard way.

At first, everything went smoothly: I had total control over the program, no stuff was happening under the hood, etc. Memory footprint and executable sizes were minimal.

Then I wanted my program not to SIGSEGV when things went bad (e.g. the archive was corrupted). Since C doesn't have an exception model, I need to check every function's return value ALWAYS. Not only is this tiresome, it's also totally counterproductive because it makes refactoring more difficult. The only alternative is to use longjmp, or a nice wrapper for longjmp such as e4c that introduces try-catch-finally keywords to C. This, however, means I need to be extra careful about my mallocs and put the matching frees inside finally blocks, otherwise I'll leak memory on exceptions (and won't even know about this). So my code still needs to be very verbose, just in another way.
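For readers who have not written this style of C, here is a minimal sketch of what "check every return value" looks like in practice. The ArcStatus codes, the "TOY1" layout and both function names are invented for the example and have nothing to do with the real archive formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes, invented for this example. */
typedef enum { ARC_OK = 0, ARC_TRUNCATED, ARC_BAD_MAGIC, ARC_NO_MEMORY } ArcStatus;

/* Copies `size` bytes starting at *pos; refuses to read past the end. */
static ArcStatus read_bytes(const uint8_t *data, size_t data_size,
                            size_t *pos, void *out, size_t size)
{
    if (size > data_size - *pos)
        return ARC_TRUNCATED;
    memcpy(out, data + *pos, size);
    *pos += size;
    return ARC_OK;
}

/* Hypothetical layout: "TOY1", one size byte, then that many payload bytes.
   On success the caller owns *payload and must free() it. */
static ArcStatus toy_unpack(const uint8_t *data, size_t data_size,
                            uint8_t **payload, size_t *payload_size)
{
    ArcStatus status;
    size_t pos = 0;
    uint8_t magic[4];
    uint8_t size_byte;
    uint8_t *buf = NULL;

    assert(data != NULL && payload != NULL && payload_size != NULL);

    status = read_bytes(data, data_size, &pos, magic, sizeof magic);
    if (status != ARC_OK)
        goto fail;
    if (memcmp(magic, "TOY1", 4) != 0) {
        status = ARC_BAD_MAGIC;
        goto fail;
    }

    status = read_bytes(data, data_size, &pos, &size_byte, 1);
    if (status != ARC_OK)
        goto fail;

    buf = malloc(size_byte ? size_byte : 1);
    if (buf == NULL) {
        status = ARC_NO_MEMORY;
        goto fail;
    }

    status = read_bytes(data, data_size, &pos, buf, size_byte);
    if (status != ARC_OK)
        goto fail;

    *payload = buf;
    *payload_size = size_byte;
    return ARC_OK;

fail:
    free(buf);  /* free(NULL) is a no-op, so one exit path covers all cases */
    return status;
}
```

Three reads need three checks plus a shared cleanup label, and every caller of toy_unpack has to repeat the pattern. A longjmp out of the last read_bytes would skip the free(buf) at the fail label, which is the leak described above.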

Now C++ suffered from the same problems... until C++11 introduced smart pointers. This should allow me to write almost assert-free code, which sounds great.

@rr-
Member Author

rr- commented Feb 11, 2015

Finally done.
