
Erlang string splitting experiments

The goal is to find the fastest possible way to split (tokenize) a string on a string delimiter. The inspiration is the $P[iece] function of ANSI/ISO MUMPS.
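For reference, MUMPS `$PIECE(string, delimiter, n)` returns the n-th delimiter-separated piece, counting from 1, and an empty string when there is no such piece. The module below is a minimal, hypothetical sketch of that behaviour on Erlang strings (lists); the module and function names are illustrative, not part of this project. It assumes a non-empty delimiter and simply returns `""` for an empty delimiter or `n < 1`.

```erlang
-module(piece_sketch).
-export([piece/3]).

%% Hypothetical MUMPS-style $PIECE(String, Delim, N) on plain strings.
%% Returns the N-th piece (1-based), or "" when N is out of range or the
%% delimiter is empty.
piece(String, Delim, N) when is_list(String), is_list(Delim), Delim =/= [],
                             is_integer(N), N >= 1 ->
    find(String, Delim, length(Delim), N, 1, []);
piece(_String, _Delim, _N) ->
    "".

%% Idx is the number of the piece currently being scanned; characters are
%% collected (in reverse) into Acc only while Idx =:= N.
find([], _Delim, _DLen, N, N, Acc) ->
    lists:reverse(Acc);                      % target piece is the last one
find([], _Delim, _DLen, _N, _Idx, _Acc) ->
    "";                                      % fewer than N pieces
find(S = [C | Rest], Delim, DLen, N, Idx, Acc) ->
    case lists:prefix(Delim, S) of
        true when Idx =:= N ->
            lists:reverse(Acc);              % target piece is complete
        true ->
            find(lists:nthtail(DLen, S), Delim, DLen, N, Idx + 1, Acc);
        false when Idx =:= N ->
            find(Rest, Delim, DLen, N, Idx, [C | Acc]);
        false ->
            find(Rest, Delim, DLen, N, Idx, Acc)
    end.
```

For example, `piece_sketch:piece("hello:world::zoo::joe", ":", 4)` returns `"zoo"`, and asking for piece 7 of the same string returns `""`.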

We assume that string:split does not exist (even though it has been proposed several times) and that we do not want to implement it as a native BIF or NIF call.

This project contains several string-splitting methods, which we test and optimize until we find the fastest one.

The resulting function may be a generalization of several methods that decides which one to use based on the input (text length, delimiter length, etc.).
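A dispatcher of that kind could look roughly like the hypothetical sketch below. It is an assumption, not the project's code, and it only looks at the delimiter length: a one-character delimiter is handled by comparing a single list element per step, while longer delimiters are detected with `lists:prefix/2`. Both paths keep empty pieces, as `$PIECE` does.

```erlang
-module(split_dispatch_sketch).
-export([split/2]).

%% Hypothetical dispatcher: chooses a splitting method from the delimiter
%% length. Both methods return all pieces, including empty ones.
split(String, [Ch]) ->
    split_char(String, Ch, [], []);
split(String, Delim) when Delim =/= [] ->
    split_str(String, Delim, length(Delim), [], []).

%% Single-character delimiter: one element comparison per step.
split_char([], _Ch, Cur, Acc) ->
    lists:reverse([lists:reverse(Cur) | Acc]);
split_char([Ch | Rest], Ch, Cur, Acc) ->
    split_char(Rest, Ch, [], [lists:reverse(Cur) | Acc]);
split_char([C | Rest], Ch, Cur, Acc) ->
    split_char(Rest, Ch, [C | Cur], Acc).

%% Multi-character delimiter: check whether the rest of the input starts
%% with the delimiter, and skip over it when it does.
split_str([], _Delim, _DLen, Cur, Acc) ->
    lists:reverse([lists:reverse(Cur) | Acc]);
split_str(S = [C | Rest], Delim, DLen, Cur, Acc) ->
    case lists:prefix(Delim, S) of
        true ->
            split_str(lists:nthtail(DLen, S), Delim, DLen, [],
                      [lists:reverse(Cur) | Acc]);
        false ->
            split_str(Rest, Delim, DLen, [C | Cur], Acc)
    end.
```

A real dispatcher would presumably also weigh the input length; which thresholds pay off is exactly what the benchmarks in this project are meant to find out.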

An ideal candidate would behave like this:

  epiece("hello:world::zoo::joe", ":") ->
    [{1, "hello"}, {2, "world"}, {3, []}, {4, "zoo"}, {5, []}, {6, "joe"}].
  epiece("hello:world::zoo::joe", ":", [foo, bar, x, x, x, ex]) ->
    [{foo, "hello"}, {bar, "world"}, {x, []}, {ex, "joe"}].
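A straightforward pure-Erlang version of this interface might look like the sketch below. It is hypothetical (the module name is illustrative) and assumes a non-empty string delimiter. `epiece/3` here pairs names with pieces purely by position and drops any surplus pieces or names.

```erlang
-module(epiece_sketch).
-export([epiece/2, epiece/3]).

%% Number every piece, starting at 1; empty pieces are kept.
epiece(String, Delim) ->
    number(split(String, Delim), 1).

%% Assumption: names are paired with pieces by position only; surplus
%% pieces or names are dropped and repeated names are not merged.
epiece(String, Delim, Names) ->
    pair(Names, split(String, Delim)).

split(String, Delim) when Delim =/= [] ->
    split(String, Delim, length(Delim), [], []).

%% Cur holds the current piece in reverse; Acc holds finished pieces.
split([], _Delim, _DLen, Cur, Acc) ->
    lists:reverse([lists:reverse(Cur) | Acc]);
split(S = [C | Rest], Delim, DLen, Cur, Acc) ->
    case lists:prefix(Delim, S) of
        true ->
            split(lists:nthtail(DLen, S), Delim, DLen, [],
                  [lists:reverse(Cur) | Acc]);
        false ->
            split(Rest, Delim, DLen, [C | Cur], Acc)
    end.

number([], _N) -> [];
number([P | Ps], N) -> [{N, P} | number(Ps, N + 1)].

pair([Name | Names], [P | Ps]) -> [{Name, P} | pair(Names, Ps)];
pair(_, _) -> [].
```

With the name list from the example above, this sketch returns three `x` entries (`{x, []}`, `{x, "zoo"}`, `{x, []}`), whereas the expected output shows a single `{x, []}`. The rule for repeated names is not spelled out here, so the sketch does not try to guess it.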

The NIF way

We're not lazy, so a NIF library was added as well. It is already quite fast and could probably be made faster still.

The next step would be to change the NIF module to work with binaries instead of lists/strings.
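To illustrate what working on binaries means for the result shape, here is a hypothetical pure-Erlang sketch (not the NIF) that returns numbered sub-binaries. It uses only binary pattern matching and assumes a non-empty delimiter; the module name is illustrative.

```erlang
-module(epiece_bin_sketch).
-export([epiece/2]).

%% Hypothetical binary variant of epiece/2: input and delimiter are
%% binaries, and every piece (including empty ones) is a sub-binary.
epiece(Bin, Delim) when is_binary(Bin), is_binary(Delim),
                        byte_size(Delim) > 0 ->
    scan(Bin, Delim, byte_size(Delim), 0, 1, []).

%% Pos is the byte offset of the next delimiter candidate in Bin; the
%% bytes before Pos belong to the current piece N.
scan(Bin, Delim, DLen, Pos, N, Acc) when Pos + DLen =< byte_size(Bin) ->
    case Bin of
        <<Piece:Pos/binary, Cand:DLen/binary, Rest/binary>>
          when Cand =:= Delim ->
            scan(Rest, Delim, DLen, 0, N + 1, [{N, Piece} | Acc]);
        _ ->
            scan(Bin, Delim, DLen, Pos + 1, N, Acc)
    end;
scan(Bin, _Delim, _DLen, _Pos, N, Acc) ->
    %% No room left for another delimiter: the rest is the last piece.
    lists:reverse([{N, Bin} | Acc]).
```

The existing `binary:split(Bin, Delim, [global])` produces the same pieces (without numbers) and could serve as a baseline when benchmarking a binary-based NIF.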

Another step is to make the epiece:epn function completely native: doing the work of xzip and epiece_nif:piece within a single loop inside the NIF itself is sure to be faster.
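The single-loop idea can be sketched in pure Erlang before moving it into C. The sketch below is hypothetical: it assumes that epn splits a string and zips the pieces with a list of names, as `epiece/3` does, and it does not know how xzip treats repeated names. Each piece is paired with the next name as soon as its closing delimiter is found, so no intermediate list of pieces is built, and scanning stops once the names run out.

```erlang
-module(epn_fused_sketch).
-export([epn/3]).

%% Hypothetical fused "split + zip" in one pass over the input.
epn(String, Delim, Names) when Delim =/= [] ->
    loop(String, Delim, length(Delim), Names, [], []).

%% Cur holds the current piece in reverse; Acc holds {Name, Piece} pairs.
loop(_S, _Delim, _DLen, [], _Cur, Acc) ->
    lists:reverse(Acc);                      % no names left: stop early
loop([], _Delim, _DLen, [Name | _], Cur, Acc) ->
    lists:reverse([{Name, lists:reverse(Cur)} | Acc]);
loop(S = [C | Rest], Delim, DLen, Names = [Name | More], Cur, Acc) ->
    case lists:prefix(Delim, S) of
        true ->
            loop(lists:nthtail(DLen, S), Delim, DLen, More, [],
                 [{Name, lists:reverse(Cur)} | Acc]);
        false ->
            loop(Rest, Delim, DLen, Names, [C | Cur], Acc)
    end.
```

Stopping early is the main gain over "split everything, then zip": when only the first few named pieces are needed, the tail of a long string is never scanned.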
