Skip to content

OperatorList

zzxx-husky edited this page Feb 5, 2021 · 1 revision
Operator Input → Output Description
Source Operator
iterate iterable ⇒ O Iterate elements in a temporal or persistent iterable or a range.
(Iterable i): iterate elements in iterable i.
(Iterator begin, Iterator end): iterate elements in a range specified by [start, end).
range [L, R) ⇒ O Generate elements within a numeric range.
(Num l, Num r): range (l ≤ i < r).
(Num r): range (0 ≤ i < r).
generate next( ) ⇒ O Generate elements by user defined functions.
(( )→O next): each invocation of next generates one element.
.until(( )→bool condition): generation stops when condition outputs true; invoked before next.
.times(int64): the num of elements to generate.
elements (O...) ⇒ O Iterate each of the given elements to output
(O... e...): elements to iterate.
Pipeline Operator
map I → O Transform one input to one output, e.g., a new value, a member of the input, or a reference to an external variable.
(I→O mapper): map each input to an output.
zip_with_index I → (int64, I) Output a pair with second to be the input and first to be the index of the input.
The index starts from 0 and increases by 1 for each input.
window I → [I] Output inputs in windows. Each input appears in N ≥ 0 window(s). The last window may be incomplete but not empty.
(int64 s): creates tumbling window of size s.
(int64 s, int64 t): create sliding windows of size s for every t inputs.
.cache_by_ref( ): use Reference to cache inputs to avoid copy.
foreach I → I Process each input with a UDF and forward the input to output. User can modify the input if it is a reference.
(I→( ) proc): actions to take for each input.
filter I → I Forward to output the inputs that satisfy a given condition.
(I→bool condition): check if an input satisfies the condition.
flatmap I → [O] Transform one input to N ≥ 0 output(s).
(I→[O] mapper): map each input into an iterable or a coll operator.
flatten [I] → I Output the elements of the iterable input.
concat I + I → I or
I + J → common<I, J>
Forward the output of the first parent and then forward the output of the second parent.
init
tail
I → I Forward each input to output except the last / first input.
.cache_by_ref(): use Reference to cache inputs to avoid copy.
sort I → I Forward the inputs in a sorted order to output. A buffer is used to store all the inputs for sorting.
.by((I, I)→bool cmp): specify a comparator to tell which input is smaller.
.by(I→M mapper): map each input to a new value for comparison.
.cache_by_ref(): use Reference to cache inputs to avoid copy.
.reverse(): sort the inputs in a reverse order.
unique I → I Forwad an input to output if it does not duplicate with the previous input.
.by(I → V mapper): map an input to a value or a reference for checking uniqueness.
reverse I → I Forward the inputs in a reverse order to output. Default strategy is to push reversion upwards until certain operators (e.g., source operator, sort) that can iterate elements reversely.
.with_buffer( ): use when reversion pushup fails due to non-reversable operators (e.g., window, generate).
.cache_by_ref( ): use Reference to cache inputs to avoid copy; available only if .with_buffer() is used.
distinct I → I Forward each input to output if it does not duplicate with any previous input. A set (default: unordered_set) is used to cache distinct inputs.
.cache_by_ref( ): use Reference to cache inputs to avoid copy.
split I→ vector<I> or
I → Container
Split the input stream by those inputs that satisfy a given condition and group adjacent inputs.
(I→bool condition): the condition fot split.
(Container s, I→bool condition): the condition for split with adjacent inputs grouped into s.
branch I → I Make a new branch to process inputs in a different pipeline.
Each input is first forwarded to the branch and then to the output.
If no sink operator is used in the branch, the entire pipeline will not process any input.
Sink Operator
groupby I → (K, A) or
I ⇒ {K→A}
(I→K key): extract a key from the input (default: identity function).
.adjacent( ): if used, only adjacent inputs with the same key are grouped and the processing remains pipelining; if not, a map storing the aggregation results for each key is produced.
.valueby(I→V val): extract a value from the input for aggregation (default: identity function).
.aggregate(A aggregator, (A&, V)→( ) aggregate), .aggregate(Type<I>→A builder, (A&, V)→( ) aggregate): aggregate the values into aggregator. (default: insertion to a vector).
.count( ): count the number of inputs for each group.
act I → ( ) To trigger the processing of the pipeline. Used when the pipeline does not produce a result or write to an output stream.
aggregate I → A Aggregate inputs into an aggregator.
(A aggregator, (A&, V)→( ) aggregate), <A>((A&, V)→( ) aggregate), (Type<I>→A builder, (A&, V)→( ) aggregate): aggregate the values into aggregator.
max
min
I ⇒ optional<I> or
I ⇒ Reference<I>
Find the max / min value from the inputs. Return nullopt if no input.
.ref( ): requires return value to be a reference to the input.
sum
avg
I ⇒ O Calculate the sum / average of the inputs.
Return nullopt if no input or default value.
.init(O v): set the initial value for summation
head
last
I ⇒ optional<I> Return the first / last input.
Return nullopt if no input.
count I ⇒ int64 Return the number of inputs
to I ⇒ Container or
I ⇒ ( )
Insert each input to a new or existing container by insert, push_back, push or emplace.
<Container>( ), (Container c), (Type<I>→Container): specify the container to store the inputs.
(Container& c): forward the inputs to an existing container.
.reserve(int64): reverse capacity for the container, if possible.
.by_move( ): insert the inputs into the container by move (default: by copy).
print
println
I ⇒ ( ) Forward each input to an output stream.
(Str start, Str delimiter, Str end): start / end is inserted before / after the first / last input while delimiter is inserted between any two inputs (default: "[", ", ", "]" for print; "", "\n", "\n" for println).
.to(ostream& os): output to os (default: cout).
.format((ostream&, I)→( ) formatter): define how to output an input to the output stream.
Optimized Pipeline Description
sort | to ⇒ sort.buffer The buffer of sort operator will be used as the output of to if type matches.
reverse Reverson is pushed upward to a reversable operator to carry out, if possible.
Not Optimized Pipeline Why
sort | last ⇏ max
sort | head ⇏ min
This can be easily replaced by the user.
iterate(c) | count ⇏ c.size( ) This can be easily replaced by the user.
sort | distinct ⇏ sort | unique User has better knowledge to tell if unique is better than distinct.
Clone this wiki locally