Returns the number of times that divides this, and the remainder. The law is: that * result._1 + result._2 == this
The Args class does simple command line parsing. The rules are: keys start with one or more dashes ("-"), and each key has zero or more values following.
Parses keys as starting with a dash, except single-dashed digits. Also parses key-value pairs that are separated with an equal sign. All following non-dashed args are a list of values. If the list starts with non-dashed args, these are associated with the empty string: ""
Split on whitespace and then parse.
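A minimal sketch of these rules, using the accessors documented below (the arg string and key names are hypothetical):
val args = Args("--input data.tsv --n 5 --verbose")
args("input")            // "data.tsv": exactly one value, else an exception
args.list("n")           // List("5"): possibly-empty list of values
args.boolean("verbose")  // true: the key is present, even with no value
args.optional("missing") // None: zero or one value as an Option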
This is a synonym for required.
Does this Args contain a given key?
Equivalent to .optional(key).getOrElse(default).
Get the list of values associated with a given key. If the key is absent, return the empty list. NOTE: an empty list does not mean the key is absent; it could be a key without a value. Use boolean() to check existence.
If there is zero or one element, return it as an Option. If there is a list of more than one item, you get an error.
Gets the list of positional arguments.
Return exactly one value for a given key. If there is more than one value, you get an exception.
Cascading can't handle multiple head pipes with the same name. This handles them by caching the source and only having a single head pipe to represent each head.
Builder classes used internally to implement coGroups (joins). Can also be used for more generalized joins, e.g., star joins.
Pretty much a synonym for mapReduceMap with the methods collected into a trait.
Approximate number of unique values. We use about m = (104/errPercent)^2 bytes of memory per key. Uses .toString.getBytes to serialize the data, so you MUST ensure that .toString is an equivalence on your counted fields (i.e. x.toString == y.toString if and only if x == y).
For each key:
10% error ~ 256 bytes
5% error ~ 1kB
1% error ~ 8kB
0.5% error ~ 64kB
0.25% error ~ 256kB
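For example, a hedged sketch of a call (pipe and field names are hypothetical; the default error rate corresponds to the per-key memory in the table above):
pipe.groupBy('user) { _.approxUniques('item -> 'uniqueItems) }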
Uses a more stable online algorithm which should be suitable for large numbers of records. See: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
This may significantly reduce the performance of your job. It kills the ability to do map-side aggregation.
This is count with a predicate: only counts the tuples for which fn(tuple) is true.
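A short sketch (pipe and field names hypothetical):
pipe.groupBy('user) { _.count('score -> 'positives) { score : Double => score > 0.0 } }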
First do "times" on each pair, then "plus" them all together.
First do "times" on each pair, then "plus" them all together.
-groupBy('x) { _.dot('y,'z, 'ydotz) }
-
Remove the first cnt elements.
Drop while the predicate is true, starting at the first false, output all.
Prefer aggregateBy operations!
This is the description of this Grouping in terms of a sequence of Every operations.
Prefer reduce or mapReduceMap. foldLeft will force all work to be done on the reducers. If your function is not associative and commutative, foldLeft may be required.
Make sure init is an immutable object. Init needs to be serializable with Kryo (because we copy it for each grouping to avoid possible errors using a mutable init object).
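A sketch of foldLeft with an immutable init (pipe and field names hypothetical):
pipe.groupBy('user) {
  _.foldLeft('event -> 'events)(Set.empty[String]) { (acc : Set[String], ev : String) => acc + ev }
}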
This cancels map-side aggregation and forces everything to the reducers.
Return the first, useful probably only for the sorted case.
Collect all the values into a List[T] and then operate on that list. This fundamentally uses as much memory as it takes to store the list. This gives you the list in the reverse order it was encountered (it is built as a stack for efficiency reasons). If you care about order, call .reverse in your fn.
STRONGLY PREFER TO AVOID THIS. Try reduce or plus and an O(1) memory algorithm.
Type T is the type of the input field (input to map, T => X).
Type X is the intermediate type, which your reduce function operates on (reduce is (X,X) => X).
Type U is the final result type (final map is: X => U).
The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
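For instance, a per-key average as a hedged sketch (T = Double, X = (Double, Long), U = Double; pipe and field names hypothetical):
pipe.groupBy('user) {
  _.mapReduceMap('score -> 'avgScore) { score : Double =>
    (score, 1L)                          // map: T => X
  } { (a : (Double, Long), b : (Double, Long)) =>
    (a._1 + b._1, a._2 + b._2)           // reduce: (X,X) => X, associative and commutative
  } { xu : (Double, Long) =>
    xu._1 / xu._2                        // final map: X => U
  }
}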
Corresponds to a Cascading Buffer, which allows you to stream through the data, keeping some, dropping, scanning, etc. The iterator you are passed is lazy, and mapping will not trigger the entire evaluation. If you convert to a list (i.e. to reverse), you need to be aware that memory constraints may become an issue.
Any fields not referenced by the input fields will be aligned to the first output, and the final hadoop stream will have a length of the maximum of the output of this and the input stream. So, if you change the length of your inputs, the other fields won't be aligned. YOU NEED TO INCLUDE ALL THE FIELDS YOU WANT TO KEEP ALIGNED IN THIS MAPPING! POB: This appears to be a Cascading design decision.
mapfn needs to be stateless. Multiple calls need to be safe (no mutable state captured).
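A sketch of mapStream over the lazy iterator (pipe and field names hypothetical):
pipe.groupBy('user) {
  _.mapStream('line -> 'sampledLine) { lines : Iterator[String] => lines.take(10) }
}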
These will only be called if a tuple is not passed, meaning just one column.
Similar to scala.collection.Iterable.mkString. Takes the source and destination fieldname, which should be a single field. The result will be start, each item.toString separated by sep, followed by end. For convenience there are several common variants below.
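For example, the sep-only variant (pipe and field names hypothetical):
pipe.groupBy('doc) { _.mkString('word -> 'words, ",") }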
An identity function that keeps all the tuples. A hack to implement groupAll and groupRandomly.
Opposite of RichPipe.unpivot. See SQL/Excel for more on this function; it converts a row-wise representation into a column-wise one. Example:
pivot(('feature, 'value) -> ('clicks, 'impressions, 'requests))
It will find the feature named "clicks" and put the value in the column with the field named 'clicks. Absent fields result in null unless a default value is provided. Unnamed output fields are ignored. Duplicated fields will result in an error. If you want more precision, first do a
map('value -> 'value) { x : AnyRef => Option(x) }
and you will have non-nulls for all present values, and Nones for values that were present but previously null. All nulls in the final output will be those truly missing. Similarly, if you want to check if there are any items present that shouldn't be:
map('feature -> 'feature) { fname : String =>
  if (!goodFeatures(fname)) { throw new Exception("ohnoes") }
  else fname
}
The same as plus(fs -> fs).
Use Monoid.plus to compute a sum. Not called sum to avoid conflicting with standard sum. Your Monoid[T] should be associative and commutative, else this doesn't make sense.
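A sketch, assuming an implicit Monoid[Map[String, Int]] (e.g. Algebird's map monoid) is in scope; pipe and field names hypothetical:
pipe.groupBy('user) { _.plus[Map[String, Int]]('counts -> 'totalCounts) }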
Apply an associative/commutative operation on the left field. Example:
reduce(('mass, 'allids) -> ('totalMass, 'idset)) { (left : (Double, Set[Long]), right : (Double, Set[Long])) =>
  (left._1 + right._1, left._2 ++ right._2)
}
Equivalent to a mapReduceMap with trivial (identity) map functions. The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
Override the number of reducers used in the groupBy.
Analog of standard scanLeft (@see scala.collection.Iterable.scanLeft). This invalidates map-side aggregation and forces all data to be transferred to reducers. Use only if you REALLY have to.
Make sure init is an immutable object. init needs to be serializable with Kryo (because we copy it for each grouping to avoid possible errors using a mutable init object). We override the default implementation here to use Kryo to serialize the initial value; for immutable serializable inits, this is not needed.
How many values are there for this key?
Compute the count, average, and standard deviation in one pass. Example: g.sizeAveStdev('x -> ('cntx, 'avex, 'stdevx))
This invalidates aggregateBy!
Equivalent to sorting by a comparison function then take-ing k items. This is MUCH more efficient than doing a total sort followed by a take, since these bounded sorts are done on the mapper, so only a sort of size k is needed. Example:
sortWithTake(('clicks, 'tweet) -> 'topClicks, 5) { (t0 : (Long, Long), t1 : (Long, Long)) => t0._1 < t1._1 }
topClicks will be a List[(Long, Long)].
Reverse of above when the implicit ordering makes sense.
Same as above but useful when the implicit ordering makes sense.
Override the spill threshold on AggregateBy.
Only keep the first cnt elements.
Take while the predicate is true, stopping at the first false. Output all taken elements.
This is a convenience method to allow plugging in blocks of group operations, similar to RichPipe.then.
The same as times(fs -> fs).
Returns the product of all the items in this grouping.
Convert a subset of fields into a list of Tuples. Need to provide the types of the tuple fields.
Beginning of a block with access to expensive nonserializable state. The state object should contain a function release() for resource management purposes.
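A sketch of the pattern (DbClient is a hypothetical class exposing the release() method required above):
class DbClient {
  def lookup(id : Long) : String = "user-" + id  // stand-in for an expensive call
  def release() : Unit = ()                      // called for resource cleanup
}
pipe.using(new DbClient)
  .map('userId -> 'userName) { (db : DbClient, id : Long) => db.lookup(id) }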
By default adds a column with name "count" counting the number in this group. Deprecated: use size.
(Since version 0.2.0) Use size instead to match the scala.collections.Iterable API
Csv value source: separated by commas, with quotes wrapping all fields.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Holds some conversion functions for dealing with strings as RichDate objects.
Return the guessed format for this datestring.
Parse the string with one of the DATE_FORMAT_VALIDATORS in the order listed above. We allow either a date, a date with time in minutes, or a date with time down to seconds. The separator between date and time can be a space or "T".
Represents a closed interval of time. TODO: This should be Range[RichDate, Duration] for an appropriate notion of Range.
Shift this by the given unit.
Is the given date range a (non-strict) subset of this range?
Produce a contiguous, non-overlapping set of DateRanges whose union is equivalent to this. If it is passed an integral unit of time (not a DurationList), it stops at boundaries which are set by the start timezone; else it breaks at start + k * span.
Make the range wider by delta on each side. Good to catch events which might spill over.
Extend the length by moving the end. We can keep the party going, but we can't start it earlier.
Sets up an implicit dateRange to use in your sources and an implicit timezone. Example args: --date 2011-10-02 2011-10-04 --tz UTC. If no timezone is given, Pacific is assumed.
By default we only set two keys: io.serializations and cascading.tuple.element.comparator.default. Override this class, call base, and ++ your additional map to set more options.
Rather than give the full power of cascading's selectors, we have a simpler set of rules encoded below:
1) if the input is non-definite (ALL, GROUP, ARGS, etc.), ALL is the output. Perhaps only fromFields=ALL will make sense
2) If one of from or to is a strict superset of the other, SWAP is used.
3) If they are equal, REPLACE is used.
4) Otherwise, ALL is used.
Multi-entry fields. These are higher priority than Product conversions so that List will not conflict with Product.
Implement this method if you want some other jobs to run after the current job. These will not execute until the current job has run successfully.
You should never call these directly; they are here to make the DSL work. Just know, you can treat a Pipe as a RichPipe and vice-versa within a Job.
Useful to convert f : Any* to Fields. This handles mixed cases ("hey", 'you). Not sure we should be this flexible, but given that Cascading will throw an exception before scheduling the job, I guess this is okay.
Handles treating any TupleN as a Fields object. This is low priority because List is also a Product, but this method will not work for List (because List is Product2(head, tail), so productIterator won't work as expected). Lists are handled by an implicit in FieldConversions, which has higher priority.
'* means Fields.ALL; otherwise we take the .name.
Mix this in for delimited schemes such as TSV or one-separated values. By default, TSV is given.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
This object has all the implicit functions and values that are used to make the scalding DSL. It's useful to import Dsl._ when you are writing scalding code outside of a Job.
Rather than give the full power of cascading's selectors, we have a simpler set of rules encoded below:
1) if the input is non-definite (ALL, GROUP, ARGS, etc.), ALL is the output. Perhaps only fromFields=ALL will make sense
2) If one of from or to is a strict superset of the other, SWAP is used.
3) If they are equal, REPLACE is used.
4) Otherwise, ALL is used.
Multi-entry fields. These are higher priority than Product conversions so that List will not conflict with Product.
Useful to convert f : Any* to Fields. This handles mixed cases ("hey", 'you). Not sure we should be this flexible, but given that Cascading will throw an exception before scheduling the job, I guess this is okay.
Handles treating any TupleN as a Fields object. This is low priority because List is also a Product, but this method will not work for List (because List is Product2(head, tail), so productIterator won't work as expected). Lists are handled by an implicit in FieldConversions, which has higher priority.
'* means Fields.ALL; otherwise we take the .name.
Represents millisecond-based duration (non-calendar based): seconds, minutes, hours. calField should be a java.util.Calendar field.
(Since version 0.8.2) Use AbsoluteDuration.fromMillisecs
Rather than give the full power of cascading's selectors, we have a simpler set of rules encoded below:
1) if the input is non-definite (ALL, GROUP, ARGS, etc.), ALL is the output. Perhaps only fromFields=ALL will make sense
2) If one of from or to is a strict superset of the other, SWAP is used.
3) If they are equal, REPLACE is used.
4) Otherwise, ALL is used.
Multi-entry fields. These are higher priority than Product conversions so that List will not conflict with Product.
Useful to convert f : Any* to Fields. This handles mixed cases ("hey", 'you). Not sure we should be this flexible, but given that Cascading will throw an exception before scheduling the job, I guess this is okay.
Handles treating any TupleN as a Fields object. This is low priority because List is also a Product, but this method will not work for List (because List is Product2(head, tail), so productIterator won't work as expected). Lists are handled by an implicit in FieldConversions, which has higher priority.
'* means Fields.ALL; otherwise we take the .name.
This is a base class for File-based sources
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
This handles the mapReduceMap work on the map-side of the operation. The code below attempts to be optimal with respect to memory allocations and performance, not functional style purity.
Implements reductions on top of a simple abstraction for the Fields API. We use the f-bounded polymorphism trick to return the type called Self in each operation.
Type T is the type of the input field (input to map, T => X).
Type X is the intermediate type, which your reduce function operates on (reduce is (X,X) => X).
Type U is the final result type (final map is: X => U).
The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
Pretty much a synonym for mapReduceMap with the methods collected into a trait.
Approximate number of unique values. We use about m = (104/errPercent)^2 bytes of memory per key. Uses .toString.getBytes to serialize the data, so you MUST ensure that .toString is an equivalence on your counted fields (i.e. x.toString == y.toString if and only if x == y).
For each key:
10% error ~ 256 bytes
5% error ~ 1kB
1% error ~ 8kB
0.5% error ~ 64kB
0.25% error ~ 256kB
Uses a more stable online algorithm which should be suitable for large numbers of records. See: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
This is count with a predicate: only counts the tuples for which fn(tuple) is true.
First do "times" on each pair, then "plus" them all together. Example:
groupBy('x) { _.dot('y, 'z, 'ydotz) }
Return the first, useful probably only for the sorted case.
Collect all the values into a List[T] and then operate on that list. This fundamentally uses as much memory as it takes to store the list. This gives you the list in the reverse order it was encountered (it is built as a stack for efficiency reasons). If you care about order, call .reverse in your fn.
STRONGLY PREFER TO AVOID THIS. Try reduce or plus and an O(1) memory algorithm.
These will only be called if a tuple is not passed, meaning just one column.
Similar to scala.collection.Iterable.mkString. Takes the source and destination fieldname, which should be a single field. The result will be start, each item.toString separated by sep, followed by end. For convenience there are several common variants below.
Opposite of RichPipe.unpivot. See SQL/Excel for more on this function; it converts a row-wise representation into a column-wise one. Example:
pivot(('feature, 'value) -> ('clicks, 'impressions, 'requests))
It will find the feature named "clicks" and put the value in the column with the field named 'clicks. Absent fields result in null unless a default value is provided. Unnamed output fields are ignored. Duplicated fields will result in an error. If you want more precision, first do a
map('value -> 'value) { x : AnyRef => Option(x) }
and you will have non-nulls for all present values, and Nones for values that were present but previously null. All nulls in the final output will be those truly missing. Similarly, if you want to check if there are any items present that shouldn't be:
map('feature -> 'feature) { fname : String =>
  if (!goodFeatures(fname)) { throw new Exception("ohnoes") }
  else fname
}
The same as plus(fs -> fs).
Use Monoid.plus to compute a sum. Not called sum to avoid conflicting with standard sum. Your Monoid[T] should be associative and commutative, else this doesn't make sense.
Apply an associative/commutative operation on the left field. Example:
reduce(('mass, 'allids) -> ('totalMass, 'idset)) { (left : (Double, Set[Long]), right : (Double, Set[Long])) =>
  (left._1 + right._1, left._2 ++ right._2)
}
Equivalent to a mapReduceMap with trivial (identity) map functions. The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
How many values are there for this key?
Compute the count, average, and standard deviation in one pass. Example: g.sizeAveStdev('x -> ('cntx, 'avex, 'stdevx))
Equivalent to sorting by a comparison function then take-ing k items. This is MUCH more efficient than doing a total sort followed by a take, since these bounded sorts are done on the mapper, so only a sort of size k is needed. Example:
sortWithTake(('clicks, 'tweet) -> 'topClicks, 5) { (t0 : (Long, Long), t1 : (Long, Long)) => t0._1 < t1._1 }
topClicks will be a List[(Long, Long)].
Reverse of above when the implicit ordering makes sense.
Same as above but useful when the implicit ordering makes sense.
The same as times(fs -> fs).
Returns the product of all the items in this grouping.
Convert a subset of fields into a list of Tuples. Need to provide the types of the tuple fields.
com.twitter.scalding.GeneratedConversions
Return the arity of product types; should probably only be used implicitly. The use case here is to see how many fake field names we need in Cascading to hold an intermediate value for mapReduceMap.
Assert that the arity of this setter matches the fields given. If arity == -1, we can't check, and if Fields is not a definite size (such as Fields.ALL), we also cannot check, so this should only be considered a weak check.
com.twitter.scalding.GeneratedTupleAdders
This controls the sequence of reductions that happen inside a particular grouping operation. Not all elements can be combined; for instance, a scanLeft/foldLeft generally requires a sorting, but such sorts are (at least for now) incompatible with doing a combine which includes some map-side reductions.
Pretty much a synonym for mapReduceMap with the methods collected into a trait.
Approximate number of unique values. We use about m = (104/errPercent)^2 bytes of memory per key. Uses .toString.getBytes to serialize the data, so you MUST ensure that .toString is an equivalence on your counted fields (i.e. x.toString == y.toString if and only if x == y).
For each key:
10% error ~ 256 bytes
5% error ~ 1kB
1% error ~ 8kB
0.5% error ~ 64kB
0.25% error ~ 256kB
Uses a more stable online algorithm which should be suitable for large numbers of records. See: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
This may significantly reduce the performance of your job. It kills the ability to do map-side aggregation.
This is count with a predicate: only counts the tuples for which fn(tuple) is true.
First do "times" on each pair, then "plus" them all together. Example:
groupBy('x) { _.dot('y, 'z, 'ydotz) }
Remove the first cnt elements.
Drop while the predicate is true, starting at the first false, output all.
Prefer aggregateBy operations!
This is the description of this Grouping in terms of a sequence of Every operations.
Prefer reduce or mapReduceMap. foldLeft will force all work to be done on the reducers. If your function is not associative and commutative, foldLeft may be required.
Make sure init is an immutable object. Init needs to be serializable with Kryo (because we copy it for each grouping to avoid possible errors using a mutable init object).
This cancels map-side aggregation and forces everything to the reducers.
Return the first, useful probably only for the sorted case.
Collect all the values into a List[T] and then operate on that list. This fundamentally uses as much memory as it takes to store the list. This gives you the list in the reverse order it was encountered (it is built as a stack for efficiency reasons). If you care about order, call .reverse in your fn.
STRONGLY PREFER TO AVOID THIS. Try reduce or plus and an O(1) memory algorithm.
Type T is the type of the input field (input to map, T => X).
Type X is the intermediate type, which your reduce function operates on (reduce is (X,X) => X).
Type U is the final result type (final map is: X => U).
The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
Corresponds to a Cascading Buffer, which allows you to stream through the data, keeping some, dropping, scanning, etc. The iterator you are passed is lazy, and mapping will not trigger the entire evaluation. If you convert to a list (i.e. to reverse), you need to be aware that memory constraints may become an issue.
Any fields not referenced by the input fields will be aligned to the first output, and the final hadoop stream will have a length of the maximum of the output of this and the input stream. So, if you change the length of your inputs, the other fields won't be aligned. YOU NEED TO INCLUDE ALL THE FIELDS YOU WANT TO KEEP ALIGNED IN THIS MAPPING! POB: This appears to be a Cascading design decision.
mapfn needs to be stateless. Multiple calls need to be safe (no mutable state captured).
These will only be called if a tuple is not passed, meaning just one column.
Similar to scala.collection.Iterable.mkString. Takes the source and destination fieldname, which should be a single field. The result will be start, each item.toString separated by sep, followed by end. For convenience there are several common variants below.
An identity function that keeps all the tuples. A hack to implement groupAll and groupRandomly.
Opposite of RichPipe.unpivot. See SQL/Excel for more on this function; it converts a row-wise representation into a column-wise one. Example:
pivot(('feature, 'value) -> ('clicks, 'impressions, 'requests))
It will find the feature named "clicks" and put the value in the column with the field named 'clicks. Absent fields result in null unless a default value is provided. Unnamed output fields are ignored. Duplicated fields will result in an error. If you want more precision, first do a
map('value -> 'value) { x : AnyRef => Option(x) }
and you will have non-nulls for all present values, and Nones for values that were present but previously null. All nulls in the final output will be those truly missing. Similarly, if you want to check if there are any items present that shouldn't be:
map('feature -> 'feature) { fname : String =>
  if (!goodFeatures(fname)) { throw new Exception("ohnoes") }
  else fname
}
The same as plus(fs -> fs).
Use Monoid.plus to compute a sum. Not called sum to avoid conflicting with standard sum. Your Monoid[T] should be associative and commutative, else this doesn't make sense.
Apply an associative/commutative operation on the left field. Example:
reduce(('mass, 'allids) -> ('totalMass, 'idset)) { (left : (Double, Set[Long]), right : (Double, Set[Long])) =>
  (left._1 + right._1, left._2 ++ right._2)
}
Equivalent to a mapReduceMap with trivial (identity) map functions. The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
Override the number of reducers used in the groupBy.
Analog of standard scanLeft (@see scala.collection.Iterable.scanLeft). This invalidates map-side aggregation and forces all data to be transferred to reducers. Use only if you REALLY have to.
Make sure init is an immutable object. init needs to be serializable with Kryo (because we copy it for each grouping to avoid possible errors using a mutable init object). We override the default implementation here to use Kryo to serialize the initial value; for immutable serializable inits, this is not needed.
How many values are there for this key?
Compute the count, average, and standard deviation in one pass. Example: g.sizeAveStdev('x -> ('cntx, 'avex, 'stdevx))
This invalidates aggregateBy!
Equivalent to sorting by a comparison function then take-ing k items. This is MUCH more efficient than doing a total sort followed by a take, since these bounded sorts are done on the mapper, so only a sort of size k is needed. Example:
sortWithTake(('clicks, 'tweet) -> 'topClicks, 5) { (t0 : (Long, Long), t1 : (Long, Long)) => t0._1 < t1._1 }
topClicks will be a List[(Long, Long)].
Reverse of above when the implicit ordering makes sense.
Same as above but useful when the implicit ordering makes sense.
Override the spill threshold on AggregateBy.
Only keep the first cnt elements.
Take while the predicate is true, stopping at the first false. Output all taken elements.
This is a convenience method to allow plugging in blocks of group operations, similar to RichPipe.then.
The same as times(fs -> fs).
Returns the product of all the items in this grouping.
Convert a subset of fields into a list of Tuples. Need to provide the types of the tuple fields.
Beginning of a block with access to expensive nonserializable state. The state object should contain a function release() for resource management purposes.
By default adds a column with name "count" counting the number in this group. Deprecated: use size.
(Since version 0.2.0) Use size instead to match the scala.collections.Iterable API
Represents a grouping, which is the transition from map to reduce phase in Hadoop. Grouping is on a key of type K by ordering Ordering[K].
Use Algebird Aggregator to do the reduction.
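A hedged sketch on the typed API (typedPipe : TypedPipe[(String, Double)] is hypothetical):
import com.twitter.algebird.Aggregator
val maxPerKey = typedPipe.group.aggregate(Aggregator.max[Double])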
WARNING: This behaves semantically very differently than cogroup. This is because we handle (K,T) tuples on the left as we see them. The iterator on the right is over all elements with a matching key K, and it may be empty if there are no values for this key K (because you haven't actually cogrouped, but only read the right-hand side into a hashtable).
Operate on a Stream[T] of all the values for each key at one time.
This is a special case of mapValueStream, but can be optimized because it doesn't need all the values for a given key at once.
Reduce with fn, which must be associative and commutative.
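Sketches over a hypothetical grouped : Grouped[String, Long]:
grouped.mapValueStream { vs : Iterator[Long] => Iterator(vs.max) } // sees all values per key
grouped.mapValues { v : Long => v * 2 }   // per-value; keeps map-side optimization possible
grouped.reduce { (a, b) => a + b }        // fn must be associative and commutative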
Cascading can't handle multiple head pipes with the same name. This handles them by caching the source and only having a single head pipe to represent each head.
Returns the number of times that divides this, and the remainder. The law is: that * result._1 + result._2 == this
Thrown when validateTaps fails.
Allows working with an iterable object defined in the job (on the submitter) to be used within a Job as you would a Pipe/RichPipe. These lists should probably be very tiny by Hadoop standards. If they are getting large, you should probably dump them to HDFS and use the normal mechanisms to address the data (a FileSource).
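For example (field names hypothetical; read inside a Job with Dsl._ imported, where an implicit flow is available):
val smallLookup = IterableSource(List((1, "a"), (2, "b")), ('id, 'code))
val pipe = smallLookup.read // a Pipe with fields ('id, 'code)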
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
If you want to filter, you should use this and output a 0 or 1 length Iterable. Filter does not change column names, and we generally expect to change columns here.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
By default we only set two keys: io.serializations and cascading.tuple.element.comparator.default. Override this class, call base and ++ your additional map to set more options.
Rather than give the full power of cascading's selectors, we have a simpler set of rules encoded below:
1) If the input is non-definite (ALL, GROUP, ARGS, etc...), ALL is the output. Perhaps only fromFields=ALL will make sense.
2) If one of from or to is a strict superset of the other, SWAP is used.
3) If they are equal, REPLACE is used.
4) Otherwise, ALL is used.
Multi-entry fields. These are higher priority than Product conversions so that List will not conflict with Product.
Implement this method if you want some other jobs to run after the current job. These will not execute until the current job has run successfully.
You should never call these directly; they are here to make the DSL work. Just know, you can treat a Pipe as a RichPipe and vice-versa within a Job.
Useful to convert f : Any* to Fields. This handles mixed cases ("hey", 'you). Not sure we should be this flexible, but given that Cascading will throw an exception before scheduling the job, I guess this is okay.
Handles treating any TupleN as a Fields object. This is low priority because List is also a Product, but this method will not work for List (because List is Product2(head, tail) and so productIterator won't work as expected). Lists are handled by an implicit in FieldConversions, which has higher priority.
'* means Fields.ALL; otherwise we take the .name.
This class is used to construct unit tests for scalding jobs. You should not use it unless you are writing tests. For examples of how to do that, see the tests included in the main scalding repository: https://github.com/twitter/scalding/tree/master/src/test/scala/com/twitter/scalding
Performs a block join, otherwise known as a replicate fragment join (RF join). The input params leftReplication and rightReplication control the replication of the left and right pipes respectively.
This is useful in cases where the data has extreme skew. A symptom of this is that we may see a job stuck for a very long time on a small number of reducers.
A block join is a way to get around this: we add a random integer field and a replica field to every tuple in the left and right pipes. We then join on the original keys and on these new dummy fields. These dummy fields make it less likely that the skewed keys will be hashed to the same reducer.
The final data size is right * rightReplication + left * leftReplication, but because of the fragmentation, we are guaranteed the same number of hits as the original join.
If the right pipe is really small then you are probably better off with a joinWithTiny. If however the right pipe is medium sized, then you are better off with a blockJoinWithSmaller, and a good rule of thumb is to set rightReplication = left.size / right.size and leftReplication = 1.
Finally, if both pipes are of similar size, e.g. in case of a self join with a high data skew, then it makes sense to set leftReplication and rightReplication to be approximately equal.
You can only use an InnerJoin or a LeftJoin with a leftReplication of 1 (or a RightJoin with a rightReplication of 1) when doing a blockJoin.
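A hedged sketch of the rule of thumb above for a medium-sized right pipe; the pipes and field names are illustrative, and parameter names may differ across scalding versions:
  val joined = left.blockJoinWithSmaller(
    'userId -> 'userId2,
    right,
    rightReplication = 10, // roughly left.size / right.size
    leftReplication = 1)   // required for InnerJoin/LeftJoin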
This method is used internally to implement all joins. You can use this directly if you want to implement something like a star join, e.g., when joining a single pipe to multiple other pipes. Make sure that you call this method on the larger pipe to make the grouping as efficient as possible. If you are only joining two pipes, then you are better off using joinWithSmaller/joinWithLarger/joinWithTiny/leftJoinWithTiny.
Joins the first set of keys in the first pipe to the second set of keys in the second pipe. All keys must be unique UNLESS it is an inner join; then duplicated join keys are allowed, but the second copy is deleted (as cascading does not allow duplicated field names).
Avoid going crazy adding more explicit join modes. Instead, for some other join mode with a larger pipe, do:
  .then { pipe => other.
    joinWithSmaller(('other1, 'other2) -> ('this1, 'this2), pipe, new FancyJoin)
  }
This does an asymmetric join, using cascading's "Join". This only runs through this pipe once, and keeps the right hand side pipe in memory (but is spillable). It joins the first set of keys in the first pipe to the second set of keys in the second pipe. All keys must be unique UNLESS it is an inner join; then duplicated join keys are allowed, but the second copy is deleted (as cascading does not allow duplicated field names). This does not work with outer joins or right joins; only inner and left join versions are given.
Performs a skewed join, which is useful when the data has extreme skew.
For example, imagine joining a pipe of Twitter's follow graph against a pipe of user genders, in order to find the gender distribution of the accounts every Twitter user follows. Since celebrities (e.g., Justin Bieber and Lady Gaga) have a much larger follower base than other users, and (under a standard join algorithm) all their followers get sent to the same reducer, the job will likely be stuck on a few reducers for a long time. A skewed join attempts to alleviate this problem. This works as follows (see the sketch after this list):
1. First, we sample from the left and right pipes with some small probability, in order to determine approximately how often each join key appears in each pipe.
2. We use these estimated counts to replicate the join keys, according to the given replication strategy.
3. Finally, we join the replicated pipes together.
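A hedged sketch of the three steps as a single call; followGraph and genders are hypothetical pipes, and sampleRate is the sampling probability of step 1:
  val followerGenders = followGraph.skewJoinWithSmaller(
    'followedId -> 'userId,
    genders,
    sampleRate = 0.001)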
This controls how often we sample from the left and right pipes when estimating key counts.
Algorithm for determining how much to replicate a join key in the left and right pipes.
Note: since we do not set the replication counts, only inner joins are allowed. (Otherwise, replicated rows would stay replicated when there is no counterpart in the other pipe.)
This Source writes out the TupleEntry as a simple JSON object, using the field -names as keys and the string representation of the values.
TODO: it would be nice to have a way to add read/write transformations to pipes -that doesn't require extending the sources and overriding methods.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
If you want to filter, you should use this and output a 0 or 1 length Iterable. Filter does not change column names, and we generally expect to change columns here.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Represents sharded lists of items of type T.
Operate on a Stream[T] of all the values for each key at one time. Avoid accumulating the whole list in memory if you can. Prefer reduce.
Use Algebird Aggregator to do the reduction.
This is a special case of mapValueStream, but can be optimized because it doesn't need all the values for a given key at once. An unoptimized implementation is: mapValueStream { _.map { fn } }, but for Grouped we can avoid resorting to mapValueStream.
Reduce with fn, which must be associative and commutative. Like the above, this can be optimized in some Grouped cases.
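A minimal sketch contrasting the two on a Grouped, assuming a TypedPipe[Int] named nums:
  val grouped = nums.groupBy { n => n % 10 }      // Grouped[Int, Int]
  val doubled = grouped.mapValues { v => v * 2 }  // value-at-a-time, no buffering
  val sums    = grouped.reduce { _ + _ }          // + is associative and commutative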
Cascading can't handle multiple head pipes with the same name. This handles them by caching the source and only having a single head pipe to represent each head.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
com.twitter.scalding.LowPriorityConversions
Return the arity of product types; should probably only be used implicitly. The use case here is to see how many fake field names we need in Cascading to hold an intermediate value for mapReduceMap.
Assert that the arity of this setter matches the fields given. If arity == -1 we can't check, and if Fields is not a definite size (such as Fields.ALL) we also cannot check, so this should only be considered a weak check.
Handles treating any TupleN as a Fields object. This is low priority because List is also a Product, but this method will not work for List (because List is Product2(head, tail) and so productIterator won't work as expected). Lists are handled by an implicit in FieldConversions, which has higher priority.
MapReduceMapBy Class
This handles the mapReduceMap work on the map-side of the operation. The code below attempts to be optimal with respect to memory allocations and performance, not functional style purity.
Usually as soon as we open a source, we read and do some mapping operation on a single column or set of columns. T is the type of the single column. If doing multiple columns, T will be a TupleN representing the types, e.g. (Int,Long,String).
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
If you want to filter, you should use this and output a 0 or 1 length Iterable. Filter does not change column names, and we generally expect to change columns here.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Returns the number of times that divides this, and the remainder. The law is: that * result._1 + result._2 == this
This mode is used by default by sources in read and write. There are three ways to run jobs; sourceStrictness is set to true.
Cascading can't handle multiple head pipes with the same name. This handles them by caching the source and only having a single head pipe to represent each head.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
If you want to filter, you should use this and output a 0 or 1 length Iterable. Filter does not change column names, and we generally expect to change columns here.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
A source that outputs nothing. It is used to drive execution of a task for side effect only.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
A tap that outputs nothing. It is used to drive execution of a task for side effect only. This can be used to drive a pipe without actually writing to HDFS.
Return the arity of product types; should probably only be used implicitly. The use case here is to see how many fake field names we need in Cascading to hold an intermediate value for mapReduceMap.
Assert that the arity of this setter matches the fields given. If arity == -1 we can't check, and if Fields is not a definite size (such as Fields.ALL) we also cannot check, so this should only be considered a weak check.
This just blindly uses the first public constructor with the same arity as the fields size.
One separated value (commonly used by Pig)
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Implements reductions on top of a simple abstraction for the Fields-API. This is for associative and commutative operations (Monoids play a big role here). We use the f-bounded polymorphism trick to return the type called Self in each operation.
Type T is the type of the input field (input to map, T => X). Type X is the intermediate type, which your reduce function operates on (reduce is (X,X) => X). Type U is the final result type (final map is: X => U).
The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
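A hedged sketch of the T => X, (X,X) => X, X => U pipeline: an average computed via (sum, count) pairs. The field names are illustrative:
  groupBy('key) {
    _.mapReduceMap('value -> 'avg) { v: Double =>
      (v, 1L)                                    // map: T => X
    } { (a: (Double, Long), b: (Double, Long)) =>
      (a._1 + b._1, a._2 + b._2)                 // reduce: (X, X) => X
    } { x: (Double, Long) =>
      x._1 / x._2                                // final map: X => U
    }
  }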
Pretty much a synonym for mapReduceMap with the methods collected into a trait.
Approximate number of unique values. We use about m = (104/errPercent)^2 bytes of memory per key. Uses .toString.getBytes to serialize the data, so you MUST ensure that .toString is an equivalence on your counted fields (i.e. x.toString == y.toString if and only if x == y).
For each key:
10% error ~ 256 bytes
5% error ~ 1kb
1% error ~ 8kb
0.5% error ~ 64kb
0.25% error ~ 256kb
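A hedged sketch, assuming an approxUniques(field -> result, errPercent) method on GroupBuilder; at 1% error this costs roughly 8kb of memory per key, per the table above:
  groupBy('page) { _.approxUniques('userId -> 'uniqueUsers, 1.0) }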
Uses a more stable online algorithm which should be suitable for large numbers of records: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
This is count with a predicate: only counts the tuples for which fn(tuple) is true.
First do "times" on each pair, then "plus" them all together.
First do "times" on each pair, then "plus" them all together.
-groupBy('x) { _.dot('y,'z, 'ydotz) }
-
Return the first, useful probably only for the sorted case.
Collect all the values into a List[T] and then operate on that list. This fundamentally uses as much memory as it takes to store the list. This gives you the list in the reverse order it was encountered (it is built as a stack for efficiency reasons). If you care about order, call .reverse in your fn. STRONGLY PREFER TO AVOID THIS. Try reduce or plus and an O(1) memory algorithm.
These will only be called if a tuple is not passed, meaning just one column.
Similar to scala.collection.Iterable.mkString. Takes the source and destination fieldname, which should be a single field. The result will be start, each item.toString separated by sep, followed by end. For convenience there are several common variants below.
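A short sketch of the (start, sep, end) variant described above; field names are illustrative:
  groupBy('userId) { _.mkString('page -> 'pages, "[", ", ", "]") }
  // one output row per user, e.g.: [home, search, profile]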
Opposite of RichPipe.unpivot. See SQL/Excel for more on this function. Converts a row-wise representation into a column-wise one.
  pivot(('feature, 'value) -> ('clicks, 'impressions, 'requests))
It will find the feature named "clicks", and put the value in the column with the field named clicks. Absent fields result in null unless a default value is provided. Unnamed output fields are ignored. Duplicated fields will result in an error.
If you want more precision, first do a
  map('value -> 'value) { x : AnyRef => Option(x) }
and you will have non-nulls for all present values, and Nones for values that were present but previously null. All nulls in the final output will be those truly missing. Similarly, if you want to check if there are any items present that shouldn't be:
  map('feature -> 'feature) { fname : String =>
    if (!goodFeatures(fname)) { throw new Exception("ohnoes") }
    else fname
  }
The same as plus(fs -> fs)
Use Monoid.plus to compute a sum. Not called sum to avoid conflicting with standard sum. Your Monoid[T] should be associative and commutative, else this doesn't make sense.
Apply an associative/commutative operation on the left field.
  reduce(('mass, 'allids) -> ('totalMass, 'idset)) { (left: (Double, Set[Long]), right: (Double, Set[Long])) =>
    (left._1 + right._1, left._2 ++ right._2)
  }
Equivalent to a mapReduceMap with trivial (identity) map functions.
The previous output goes into the reduce function on the left, like foldLeft, so if your operation is faster for the accumulator to be on one side, be aware.
How many values are there for this key?
Compute the count, ave and standard deviation in one pass. Example: g.sizeAveStdev('x -> ('cntx, 'avex, 'stdevx))
Equivalent to sorting by a comparison function then take-ing k items. This is MUCH more efficient than doing a total sort followed by a take, since these bounded sorts are done on the mapper, so only a sort of size k is needed.
  sortWithTake(('clicks, 'tweet) -> 'topClicks, 5) { (t0: (Long, Long), t1: (Long, Long)) => t0._1 < t1._1 }
topClicks will be a List[(Long, Long)]
Reverse of above when the implicit ordering makes sense.
Same as above but useful when the implicit ordering makes sense.
The same as times(fs -> fs)
Returns the product of all the items in this grouping.
Convert a subset of fields into a list of Tuples. Need to provide the types of the tuple fields.
Return the arity of product types; should probably only be used implicitly. The use case here is to see how many fake field names we need in Cascading to hold an intermediate value for mapReduceMap.
Assert that the arity of this setter matches the fields given. If arity == -1 we can't check, and if Fields is not a definite size (such as Fields.ALL) we also cannot check, so this should only be considered a weak check.
Packs a tuple into any object with set methods, e.g. thrift or proto objects. TODO: verify that protobuf setters for field camel_name are of the form setCamelName. In that case this code works for proto.
A helper to check the passed-in fields to see if Fields.ALL is set. If it is, return lazy allFields.
A helper for working with class reflection. Allows us to avoid code repetition.
Returns the set of fields in the given class. We use a List to ensure fields are in the same order they were declared.
RichDate adds some nice convenience functions to the Java date/calendar classes. We commonly do Date/Time work in analysis jobs, so having these operations convenient is very helpful.
Use String.format to format the date, as opposed to toString, which uses SimpleDateFormat.
Use SimpleDateFormat to print the string.
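A hedged sketch of typical conveniences, assuming the implicit TimeZone and date parsing that a Job normally has in scope:
  val start: RichDate = RichDate("2012-01-01")   // parsing needs an implicit TimeZone
  val end = start + Hours(2)                     // duration arithmetic
  val dir = end.format("yyyy/MM/dd")             // SimpleDateFormat pattern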
Gets the underlying config for this pipe and sets the number of reducers; useful for cascading GroupBy/CoGroup pipes.
Merge or Concatenate several pipes together with this one.
Adds a trap to the current pipe, which will capture all exceptions that occur in this pipe and save them to the trapsource given. Traps do not include the original fields in a tuple, only the fields seen in an operation. Traps also do not include any exception information. There can only be at most one trap for each pipe.
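A short sketch: tuples that throw inside the map are diverted to the trap source instead of failing the job (subject to the caveats above). parseRecord is a hypothetical function that may throw on bad input:
  val parsed = rawPipe
    .addTrap(Tsv(args("errors")))
    .map('line -> 'record) { line: String => parseRecord(line) }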
Performs a block join, otherwise known as a replicate fragment join (RF join). The input params leftReplication and rightReplication control the replication of the left and right pipes respectively.
This is useful in cases where the data has extreme skew. A symptom of this is that we may see a job stuck for a very long time on a small number of reducers.
A block join is a way to get around this: we add a random integer field and a replica field to every tuple in the left and right pipes. We then join on the original keys and on these new dummy fields. These dummy fields make it less likely that the skewed keys will be hashed to the same reducer.
The final data size is right * rightReplication + left * leftReplication, but because of the fragmentation, we are guaranteed the same number of hits as the original join.
If the right pipe is really small then you are probably better off with a joinWithTiny. If however the right pipe is medium sized, then you are better off with a blockJoinWithSmaller, and a good rule of thumb is to set rightReplication = left.size / right.size and leftReplication = 1.
Finally, if both pipes are of similar size, e.g. in case of a self join with a high data skew, then it makes sense to set leftReplication and rightReplication to be approximately equal.
You can only use an InnerJoin or a LeftJoin with a leftReplication of 1 (or a RightJoin with a rightReplication of 1) when doing a blockJoin.
This method is used internally to implement all joins. You can use this directly if you want to implement something like a star join, e.g., when joining a single pipe to multiple other pipes. Make sure that you call this method on the larger pipe to make the grouping as efficient as possible. If you are only joining two pipes, then you are better off using joinWithSmaller/joinWithLarger/joinWithTiny/leftJoinWithTiny.
Discard the given fields, and keep the rest. Kind of the opposite of the previous.
Convenience method for integrating with existing cascading Functions.
Same as above, but only keep the results field.
The same as flatMap(fs) { it : Iterable[T] => it }
Common enough to be useful.
Force a materialization to disk in the flow. This is useful before crossWithTiny if you filter just before. Ideally scalding/cascading would see this (and may in future versions), but for now it is here to aid in hand-tuning jobs.
This kills parallelism. All the work is sent to one reducer. Only use this in the case that you truly need all the data on one reducer. Just about the only reasonable case of this is to reduce all values of a column or count all the rows.
Group all tuples down to one reducer (due to cascading limitation). This is probably only useful just before setting a tail such as a Database tail, so that only one reducer talks to the DB. Kind of a hack.
group: builder is typically a block that modifies the given GroupBuilder. The final OUTPUT of the block is used to schedule the new pipe. Each method in GroupBuilder returns this, so it is recommended to chain them and use the default input:
  _.size.max('f1) etc...
Like groupAll, but randomly groups data into n reducers. You can provide a seed for the random number generator to get reproducible results.
Adds a field with a constant value.
  insert('a, 1)
Joins the first set of keys in the first pipe to the second set of keys in the second pipe. All keys must be unique UNLESS it is an inner join; then duplicated join keys are allowed, but the second copy is deleted (as cascading does not allow duplicated field names).
Avoid going crazy adding more explicit join modes. Instead, for some other join mode with a larger pipe, do:
  .then { pipe => other.
    joinWithSmaller(('other1, 'other2) -> ('this1, 'this2), pipe, new FancyJoin)
  }
This does an asymmetric join, using cascading's "Join". This only runs through this pipe once, and keeps the right hand side pipe in memory (but is spillable). It joins the first set of keys in the first pipe to the second set of keys in the second pipe. All keys must be unique UNLESS it is an inner join; then duplicated join keys are allowed, but the second copy is deleted (as cascading does not allow duplicated field names). This does not work with outer joins or right joins; only inner and left join versions are given.
Keep at most n elements. This is implemented by keeping approximately n/k elements on each of the k mappers or reducers (whichever we wind up being scheduled on).
If you use a map function that does not accept TupleEntry args, which is the common case, an implicit conversion in GeneratedConversions will convert your function into a (TupleEntry => T). The result type T is converted to a cascading Tuple by an implicit TupleSetter[T]. Acceptable T types are primitive types, cascading Tuples of those types, or scala.Tuple(1-22) of those types.
After the map, the input arguments will be set to the output of the map, so following with filter or map is fine without a new using statement if you mean to operate on the output.
  map('data -> 'stuff)
* If output equals input, REPLACE is used.
* If output or input is a subset of the other, SWAP is used.
* Otherwise we append the new fields (cascading Fields.ALL is used).
  mapTo('data -> 'stuff)
Only the results (stuff) are kept (cascading Fields.RESULTS). Using mapTo is the same as using map followed by a project for selecting just the output fields.
Rename the current pipe.
Divides sum of values for this variable by their sum; assumes without checking that division is supported on this type and that the sum is not zero. If those assumptions do not hold, it will throw an exception -- consider checking the sum separately and/or using addTrap.
In some cases, crossWithTiny has been broken; the implementation supports a work-around.
Maps the input fields into an output field of type T. For example:
  pipe.pack[(Int, Int)] (('field1, 'field2) -> 'field3)
will pack fields 'field1 and 'field2 to field 'field3, as long as 'field1 and 'field2 can be cast into integers. The output field 'field3 will be of tuple (Int, Int).
Same as pack but only the to fields are preserved.
Given a function, partitions the pipe into several groups based on the output of the function. Then applies a GroupBuilder function on each of the groups.
Example:
  pipe
    .mapTo(() -> ('age, 'weight)) { ... }
    .partition('age -> 'isAdult) { _ > 18 } { _.average('weight) }
pipe now contains the average weights of adults and minors.
Keep only the given fields, and discard the rest. Takes any number of parameters as long as we can convert them to a fields object.
Rename some set of N fields as another set of N fields.
  rename('x -> 'z)
  rename(('x,'y) -> ('X,'Y))
rename('x,'y) is interpreted by scala as rename(Tuple2('x,'y)), which then does rename('x -> 'y). This is probably not what is intended, but the compiler doesn't resolve the ambiguity. YOU MUST CALL THIS WITH A TUPLE2! If you don't, expect the unexpected.
Put all rows in random order. You can provide a seed for the random number generator to get reproducible results.
Performs a skewed join, which is useful when the data has extreme skew.
For example, imagine joining a pipe of Twitter's follow graph against a pipe of user genders, in order to find the gender distribution of the accounts every Twitter user follows. Since celebrities (e.g., Justin Bieber and Lady Gaga) have a much larger follower base than other users, and (under a standard join algorithm) all their followers get sent to the same reducer, the job will likely be stuck on a few reducers for a long time. A skewed join attempts to alleviate this problem.
This works as follows:
1. First, we sample from the left and right pipes with some small probability, in order to determine approximately how often each join key appears in each pipe.
2. We use these estimated counts to replicate the join keys, according to the given replication strategy.
3. Finally, we join the replicated pipes together.
This controls how often we sample from the left and right pipes when estimating key counts.
Algorithm for determining how much to replicate a join key in the left and right pipes.
Note: since we do not set the replication counts, only inner joins are allowed. (Otherwise, replicated rows would stay replicated when there is no counterpart in the other pipe.)
Insert a function into the pipeline.
Returns the set of unique tuples containing the specified fields.
The opposite of pack. Unpacks the input field of type T into the output fields. For example:
  pipe.unpack[(Int, Int)] ('field1 -> ('field2, 'field3))
will unpack 'field1 into 'field2 and 'field3.
Same as unpack but only the to fields are preserved.
This is an analog of the SQL/Excel unpivot function which converts columns of data into rows of data. Only the columns given as input fields are expanded in this way. For this operation to be reversible, you need to keep some unique key on each row. See GroupBuilder.pivot to reverse this operation, assuming you leave behind a grouping key.
  pipe.unpivot(('w,'x,'y,'z) -> ('feature, 'value))
takes rows like:
  key, w, x, y, z
  1, 2, 3, 4, 5
  2, 8, 7, 6, 5
to:
  key, feature, value
  1, w, 2
  1, x, 3
  1, y, 4
etc...
Beginning of a block with access to expensive nonserializable state. The state object should contain a function release() for resource management purposes.
Scala 2.8 Iterators don't support scanLeft so we have to reimplement it.
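A minimal sketch of such a reimplementation (not the actual scalding code): it emits the running state, starting with the initial value:
  def scanLeftIt[T, U](it: Iterator[T], init: U)(fn: (U, T) => U): Iterator[U] =
    Iterator.single(init) ++ new Iterator[U] {
      private var acc = init
      def hasNext: Boolean = it.hasNext
      def next(): U = { acc = fn(acc, it.next()); acc }
    }

  // scanLeftIt(Iterator(1, 2, 3), 0)(_ + _).toList == List(0, 1, 3, 6)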
By default we only set two keys: io.serializations and cascading.tuple.element.comparator.default. Override this class, call base and ++ your additional map to set more options.
Rather than give the full power of cascading's selectors, we have a simpler set of rules encoded below:
1) If the input is non-definite (ALL, GROUP, ARGS, etc...), ALL is the output. Perhaps only fromFields=ALL will make sense.
2) If one of from or to is a strict superset of the other, SWAP is used.
3) If they are equal, REPLACE is used.
4) Otherwise, ALL is used.
Multi-entry fields. These are higher priority than Product conversions so that List will not conflict with Product.
Implement this method if you want some other jobs to run after the current job. These will not execute until the current job has run successfully.
You should never call these directly; they are here to make the DSL work. Just know, you can treat a Pipe as a RichPipe and vice-versa within a Job.
Useful to convert f : Any* to Fields. This handles mixed cases ("hey", 'you). Not sure we should be this flexible, but given that Cascading will throw an exception before scheduling the job, I guess this is okay.
Handles treating any TupleN as a Fields object. This is low priority because List is also a Product, but this method will not work for List (because List is Product2(head, tail) and so productIterator won't work as expected). Lists are handled by an implicit in FieldConversions, which has higher priority.
'* means Fields.ALL; otherwise we take the .name.
Returns the number of times that divides this, and the remainder. The law is: that * result._1 + result._2 == this
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Represents a strategy for replicating rows when performing skewed joins.
Given the estimated frequencies of a join key in two pipes that we want to skew-join together, this returns the key's replication amount in each pipe.
Note: if we switch to a Count-Min sketch, we'll need to change the meaning of these counts from "sampled counts" to "estimates of full counts", and also change how we deal with counts of zero.
See https://github.com/twitter/scalding/pull/229#issuecomment-10773810 and https://github.com/twitter/scalding/pull/229#issuecomment-10792296
Every source must have a correct toString method. If you use case classes for instances of sources, you will get this for free. This is one of the several reasons we recommend using case classes.
java.io.Serializable is needed if the Source is going to have any methods attached that run on mappers or reducers, which will happen if you implement transformForRead or transformForWrite.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
A simple trait for a releasable resource. Provides a no-op implementation.
Implements reductions on top of a simple abstraction for the Fields-API. We use the f-bounded polymorphism trick to return the type called Self in each operation.
Corresponds to a Cascading Buffer, which allows you to stream through the data, keeping some, dropping, scanning, etc... The iterator you are passed is lazy, and mapping will not trigger the entire evaluation. If you convert to a list (i.e. to reverse), you need to be aware that memory constraints may become an issue.
WARNING: Any fields not referenced by the input fields will be aligned to the first output, and the final hadoop stream will have a length of the maximum of the output of this, and the input stream. So, if you change the length of your inputs, the other fields won't be aligned. YOU NEED TO INCLUDE ALL THE FIELDS YOU WANT TO KEEP ALIGNED IN THIS MAPPING! POB: This appears to be a Cascading design decision.
WARNING: mapfn needs to be stateless. Multiple calls need to be safe (no mutable state captured).
Remove the first cnt elements.
Drop while the predicate is true, starting at the first false. Output all.
Only keep the first cnt elements.
Take while the predicate is true, stopping at the first false. Output all taken elements.
Ensures that a _SUCCESS file is present in the Source path.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Implicits for the type-safe DSL. import TDsl._ to get the implicit conversions from Grouping/CoGrouping to Pipe, the .toTypedPipe method on standard cascading Pipes, and automatic conversion of Mappable[T] to TypedPipe[T].
Memory only testing for unit tests
Cascading can't handle multiple head pipes with the same name. This handles them by caching the source and only having a single head pipe to represent each head.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
If you want to filter, you should use this and output a 0 or 1 length Iterable. Filter does not change column names, and we generally expect to change columns here.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
The fields here are ('offset, 'line)
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
If you want to filter, you should use this and output a 0 or 1 length Iterable. Filter does not change column names, and we generally expect to change columns here.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
This will automatically produce a globbed version of the given path. THIS MEANS YOU MUST END WITH A / followed by * to match a file. For writing, we write to the directory specified by the END time.
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Allows you to set the job for the Tool to run. TODO: currently, Mode.mode must be set BEFORE your job is instantiated, so this function MUST call "new" somewhere inside; it cannot return an already constructed job (else the mode is not set properly).
Tab separated value source
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Mixed in to both TupleConverter and TupleSetter to improve arity safety -of cascading jobs before we run anything on Hadoop.
Assert that the arity of this setter matches the fields given. If arity == -1 we can't check, and if Fields is not a definite size (such as Fields.ALL) we also cannot check, so this should only be considered a weak check.
com.twitter.scalding.TupleConversions
Return the arity of product types; should probably only be used implicitly. The use case here is to see how many fake field names we need in Cascading to hold an intermediate value for mapReduceMap.
Assert that the arity of this setter matches the fields given. If arity == -1 we can't check, and if Fields is not a definite size (such as Fields.ALL) we also cannot check, so this should only be considered a weak check.
Base class for classes which pack a Tuple into a serializable object. -
Return the arity of product types, should probably only be used implicitly -The use case here is to see how many fake field names we need in Cascading -to hold an intermediate value for mapReduceMap -
Return the arity of product types, should probably only be used implicitly -The use case here is to see how many fake field names we need in Cascading -to hold an intermediate value for mapReduceMap -
assert that the arity of this setter matches the fields given.
assert that the arity of this setter matches the fields given. -if arity == -1, we can't check, and if Fields is not a definite -size, (such as Fields.ALL), we also cannot check, so this should -only be considered a weak check. -
Base class for objects which unpack an object into a tuple. The unpacker can verify the arity, types, and also the existence of the getter methods at plan time, without having the job blow up in the middle of a run.
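A plain-Scala sketch of that plan-time verification (the set-prefix naming is an assumed convention for illustration, not the library's actual rule):

  // Check, before running anything, that a one-argument setter exists for
  // each field name, so a bad mapping fails at plan time rather than mid-run.
  def hasSetters(clazz: Class[_], fieldNames: Seq[String]): Boolean =
    fieldNames.forall { f =>
      clazz.getMethods.exists { m =>
        m.getName == "set" + f.capitalize && m.getParameterTypes.length == 1
      }
    }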
Subclasses of Source MUST override this method. The base only handles test modes, so you should invoke this method for test modes unless your Source has some special handling of testing.
If you want to filter, you should use this and output a 0 or 1 length Iterable. Filter does not change column names, and we generally expect to change columns here.
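A hedged fields-API sketch of that pattern (source and field names are illustrative):

  import com.twitter.scalding._

  class KeepNonEmpty(args: Args) extends Job(args) {
    TextLine(args("input")).read
      .flatMapTo('line -> 'kept) { line: String =>
        if (line.nonEmpty) List(line) else Nil // 0-or-1 length Iterable acts as a filter
      }
      .write(Tsv(args("output")))
  }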
Allows you to read a Tap on the submit node. NOT FOR USE IN THE MAPPERS OR REDUCERS. Typical use might be to read in Job.next to determine if another job is needed.
Write the pipe and return the input so it can be chained into the next operation.
Factory methods for TypedPipe.
Represents a phase in a distributed computation on an input data source. Wraps a cascading Pipe object, and holds the transformation done up until that point.
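A hedged sketch of what a typed phase looks like; the toTypedPipe enrichment and the write(fields, dest) convenience are assumptions based on the surrounding descriptions, not verified signatures:

  import com.twitter.scalding._
  import TDsl._ // type-safe DSL implicits

  class TokenLengths(args: Args) extends Job(args) {
    // Each transformation below is held by the TypedPipe until write-time.
    TextLine(args("input")).read
      .toTypedPipe[String]('line)        // assumed enrichment: fields pipe -> TypedPipe
      .flatMap { _.split("""\s+""") }
      .map { w => (w, w.length) }
      .write(('word, 'length), Tsv(args("output"))) // i.e. toPipe(...).write(dest)
  }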
Same as groupAll.aggregate.values
Force a materialization of this pipe prior to the next operation. This is useful if you filter almost everything before a hashJoin, for instance.
Limit the output to at most count items; useful for debugging, but probably that's about it. The number returned may be less than count, and the items are not chosen by any particular sampling method.
This actually runs all the pure map functions in one Cascading Each. This approach is more efficient than untyped scalding because we don't use TupleConverters/Setters after each map. The output pipe has a single-item CTuple with an object of type T in position 0.
Reasonably common shortcut for cases of associative/commutative reduction; returns a typed pipe with only one element.
A convenience method equivalent to toPipe(fieldNames).write(dest)
A pipe equivalent to the current pipe.
Allows you to set the types; prefer this. If T is a subclass of Product, we assume it is a tuple. If it is not, wrap T in a Tuple1, e.g. TypedTsv[Tuple1[List[Int]]]
By default we only set two keys:
io.serializations
cascading.tuple.element.comparator.default
Override this, call base and ++ your additional map to set more options.
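A hedged sketch of that override pattern (the exact config signature has varied across Scalding versions; treat this as approximate):

  import com.twitter.scalding._

  class TunedJob(args: Args) extends Job(args) {
    // Call the base config and ++ an additional map, per the advice above.
    // The implicit-mode signature is an assumption for this era of the API.
    override def config(implicit mode: Mode): Map[AnyRef, AnyRef] =
      super.config ++ Map("mapred.reduce.tasks" -> "16")
  }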
Rather than give the full power of cascading's selectors, we have a simpler set of rules encoded below:
1) if the input is non-definite (ALL, GROUP, ARGS, etc...), ALL is the output. Perhaps only fromFields=ALL will make sense
2) if one of from or to is a strict superset of the other, SWAP is used.
3) if they are equal, REPLACE is used.
4) otherwise, ALL is used.
Multi-entry fields. These are higher priority than Product conversions so that List will not conflict with Product.
Implement this method if you want some other jobs to run after the current job. These will not execute until the current job has run successfully.
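For instance, a minimal sketch of chaining via the Option[Job]-returning next hook described above (job bodies are illustrative):

  import com.twitter.scalding._

  class FirstPass(args: Args) extends Job(args) {
    Tsv(args("input")).read.write(Tsv(args("tmp")))
    // Runs only after this job completes successfully.
    override def next: Option[Job] = Some(new SecondPass(args))
  }

  class SecondPass(args: Args) extends Job(args) {
    Tsv(args("tmp")).read.write(Tsv(args("output")))
  }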
You should never call these directly; they are here to make the DSL work. Just know: you can treat a Pipe as a RichPipe and vice-versa within a Job.
Useful to convert f: Any* to Fields. This handles mixed cases ("hey", 'you). Not sure we should be this flexible, but given that Cascading will throw an exception before scheduling the job, I guess this is okay.
Handles treating any TupleN as a Fields object. This is low priority because List is also a Product, but this method will not work for List (because List is Product2(head, tail), and so productIterator won't work as expected). Lists are handled by an implicit in FieldConversions, which has higher priority.
'* means Fields.ALL, otherwise we take the .name
This example job does not yet work. It is a test for Kryo serialization.
Options:
--input: the three-column TSV with node, comma-separated out-neighbors, initial pagerank (set to 1.0 first)
--output: the name for the TSV you want to write to, same format as above.
Optional arguments:
--errorOut: name of where to write the L1 error between the input page-rank and the output; if this is omitted, we don't compute the error
--iterations: how many iterations to run inside this job. Default is 1; 10 is about as much as cascading can handle.
--jumpprob: probability of a random jump, default is 0.15
--convergence: if this is set, after every "--iterations" steps, we check the error and see if we should continue. Since the error check is expensive (involving a join), you should avoid doing this too frequently. 10 iterations is probably a good number to set.
--temp: this is the name where we will store a temporary output so we can compare to the previous for convergence checking. If convergence is set, this MUST be set.
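Inside the job these options would typically be consumed like this (a sketch, assuming the Args getOrElse/optional accessors; value names are illustrative):

  // Pulling the options above out of Args.
  val iterations = args.getOrElse("iterations", "1").toInt
  val jumpProb   = args.getOrElse("jumpprob", "0.15").toDouble
  // convergence is optional: absent means we never check the error
  val convergence: Option[Double] = args.optional("convergence").map { _.toDouble }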
The basic idea is to groupBy the dst key with BOTH the nodeset and the edge rows. The nodeset rows have the old page-rank; the edge rows are reversed, so we can get the incoming page-rank from the nodes that point to each destination.
Override this function to change how you generate a pipe of (Long, String, Double), where the first entry is the nodeid, the second is the list of neighbors as a comma-separated (no spaces) string representation of the numeric nodeids, and the third is the initial page rank (if not starting from a previous run, this should be 1.0).
NOTE: if you want to run until convergence, the initialize method must read the EXACT same format as the output method writes. This is your job!
Here is where we check for convergence and then run the next job if we're not converged.
Weighted page rank for the given graph: start from the given pagerank, perform one iteration, test for convergence; if not yet converged, clone itself and start the next page rank job with the updated pagerank as input.
This class is very similar to the PageRank class; the main differences are:
1. supports weighted pagerank
2. the reset pagerank is pregenerated, possibly through a previous job
3. dead pagerank is evenly distributed
Options:
--pwd: working directory; will read/generate the following files there:
  numnodes: total number of nodes
  nodes: nodes file <'src_id, 'dst_ids, 'weights, 'mass_prior>
  pagerank: the page rank file, e.g. pagerank_0, pagerank_1, etc.
  totaldiff: the current max pagerank delta
Optional arguments:
--weighted: do weighted pagerank, default false
--curiteration: what is the current iteration, default 0
--maxiterations: how many iterations to run. Default is 20
--jumpprob: probability of a random jump, default is 0.1
--threshold: total difference before finishing early, default 0.001
One iteration of pagerank.
inputPagerank: <'src_id_input, 'mass_input>
returns: <'src_id, 'mass_n, 'mass_input>
Here is a high-level view of the unweighted algorithm:
let
  N: number of nodes
  inputPagerank(N_i): probability of walking to node i
  d(N_j): N_j's out-degree
then
  pagerankNext(N_i) = \sum_{j points to i} inputPagerank(N_j) / d(N_j)
  deadPagerank = (1 - \sum_i pagerankNext(N_i)) / N
  randomPagerank(N_i) = userMass(N_i) * ALPHA + deadPagerank * (1 - ALPHA)
  pagerankOutput(N_i) = randomPagerank(N_i) + pagerankNext(N_i) * (1 - ALPHA)

For the weighted algorithm:
let
  w(N_j, N_i): weight from N_j to N_i
  tw(N_j): N_j's total out-weight
then
  pagerankNext(N_i) = \sum_{j points to i} inputPagerank(N_j) * w(N_j, N_i) / tw(N_j)
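To make the formulas concrete, here is a plain-Scala sketch of one unweighted iteration over an in-memory graph (illustrative only, not the pipe implementation):

  // One unweighted PageRank step, following the formulas above.
  def pageRankStep(
    links: Map[Long, Seq[Long]],    // node -> out-neighbors
    input: Map[Long, Double],       // inputPagerank(N_i)
    userMass: Map[Long, Double],
    alpha: Double                   // ALPHA, the random-jump probability
  ): Map[Long, Double] = {
    val n = input.size
    // pagerankNext(N_i) = sum_{j points to i} inputPagerank(N_j) / d(N_j)
    val next: Map[Long, Double] = links.toSeq
      .flatMap { case (j, outs) => outs.map { i => (i, input(j) / outs.size) } }
      .groupBy { _._1 }
      .mapValues { _.map { _._2 }.sum }
    val dead = (1.0 - next.values.sum) / n
    input.keys.map { i =>
      val random = userMass.getOrElse(i, 0.0) * alpha + dead * (1.0 - alpha)
      i -> (random + next.getOrElse(i, 0.0) * (1.0 - alpha))
    }.toMap
  }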
Read the pregenerated nodes file <'src_id, 'dst_ids, 'weights, 'mass_prior>.
The total number of nodes, single-line file.
Test convergence; if not yet converged, kick off the next iteration.
A weighted PageRank implementation using the Scalding Matrix API. This assumes that all rows and columns are of type Int and values or edge weights are Double. If you want an unweighted PageRank, simply set the weights on the edges to 1.
Input arguments:
d -- damping factor
n -- number of nodes in the graph
currentIteration -- start with 0, probably
maxIterations -- stop after n iterations
convergenceThreshold -- using the sum of the absolute difference between iteration solutions, iterating stops once we reach this threshold
rootDir -- the root directory holding all starting, intermediate and final data/output
The expected structure of the rootDir is:
rootDir
|- iterations
|  |- 0 <-- a TSV of (row, value) of size n; value can be 1/n (generate this)
|  |- n <-- holds future iterations/solutions
|- edges <-- a TSV of (row, column, value) for edges in the graph
|- onesVector <-- a TSV of (row, 1) of size n (generate this)
|- diff <-- a single line representing the difference between the last iterations
|- constants <-- built at iteration 0; these are constant for any given matrix/graph
   |- M_hat
   |- priorVector
Don't forget to set the number of reducers for this job: -D mapred.reduce.tasks=n
Load or generate on the first iteration the matrix M^ given A.
Measure convergence by calculating the total of the absolute difference between the previous and next vectors. This stores the result after calculation.
Recurse and iterate again iff we are under the max number of iterations and the vector has not converged.
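The control flow is essentially the following plain-Scala loop (a sketch of the recursion, not the job mechanics):

  // Iterate step until converged or out of iterations; diff is the sum of
  // absolute differences between successive solutions, as described above.
  @annotation.tailrec
  def iterate(v: Vector[Double], step: Vector[Double] => Vector[Double],
              maxIterations: Int, threshold: Double, i: Int = 0): Vector[Double] = {
    val next = step(v)
    val diff = v.zip(next).map { case (a, b) => math.abs(a - b) }.sum
    if (i + 1 >= maxIterations || diff < threshold) next
    else iterate(next, step, maxIterations, threshold, i + 1)
  }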
Load or generate on the first iteration the prior vector given d and n.
com.twitter.scalding.mathematics
Like zipWithIndex.map but ONLY CHANGES THE VALUE, not the index. Note you will only see non-zero elements on the vector. This does not enumerate the zeros.
Write, optionally renaming the val fields to the given fields, then return this.
Serves as a repo for self-contained combinatorial functions with no dependencies, such as:
combinations, aka n choose k, nCk
permutations, aka nPk
subset sum: numbers that add up to a finite sum
weightedSum: for weights (a,b,c,...), we want integers (x,y,z,...) to satisfy the constraint |ax + by + cz + ... - result| < error
...
Given an int k, and an input of size n, return a pipe with all nCk combinations, with k columns per row.
Computes nCk = n choose k, for large values of nCk.
Use-case: say you have 100 hashtags sitting in an array. You want a table with 5 hashtags per row, all possible combinations. If the hashtags are sitting in a string array, then combinations[String](hashtags, 5) will create the 100 choose 5 combinations.
Algorithm: use k pipes, cross pipes two at a time, filter out non-monotonic entries.
e.g. 10C2 = 10 choose 2:
Use 2 pipes.
Pipe1 = (1,2,3,...,10)
Pipe2 = (2,3,4,...,10)
Cross Pipe1 with Pipe2 for 10*9 = 90 tuples.
Filter out tuples that are non-monotonic: for (t1,t2) we want t1 < t2, otherwise reject.
This brings the 90 tuples down to the desired 45 tuples = 10C2.
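The same monotonic-filter idea in plain Scala over in-memory ranges (a sketch, not the pipe code):

  // Cross k copies of 1..n, keeping only strictly increasing tuples: nCk results.
  def combos(n: Int, k: Int): Seq[Seq[Int]] =
    (1 to k).foldLeft(Seq(Seq.empty[Int])) { (acc, _) =>
      for {
        combo <- acc
        x <- 1 to n
        if combo.isEmpty || combo.last < x // reject non-monotonic, as above
      } yield combo :+ x
    }

  combos(10, 2).size // 45 == 10C2, matching the example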
Return a pipe with all nPk permutations, with k columns per row. For details, see combinations(...) above.
Does the exact same thing as weightedSum, but filters out tuples with a weight of 0. The returned pipe contains only positive, non-zero weights.
Goal: given weights (a,b,c,...), we seek integers (x,y,z,...) to satisfy the constraint |ax + by + cz + ... - result| < error.
Parameters: the weights (a,b,c,...) must be non-negative doubles. Our search space is 0 to result/min(weights). The returned pipe will contain integer tuples (x,y,z,...) that satisfy ax + by + cz + ... = result.
Note: this is NOT Simplex. We use a slightly-improved brute-force algorithm that performs well on account of parallelization.
Algorithm:
Create as many pipes as the number of weights.
Each pipe contains integral multiples of the weight w, i.e. (0, 1w, 2w, 3w, 4w, ...).
Iterate as below:
 - Cross two pipes
 - Create a temp column that stores intermediate results
 - Apply progressive filtering on the temp column
 - Discard the temp column
Once all pipes are crossed, test whether the temp column is within error bounds of the result.
Discard duplicates at the end of the process.
Use-case: we'd like to generate all integer tuples for typical use-cases like:
0. How many ways can you invest $1000 in facebook, microsoft, hp?
  val cash = 1000.0
  val error = 5.0 // max error $5, so it's ok if we cannot invest the last $5 or less
  val (FB, MSFT, HP) = (23.3, 27.4, 51.2) // share prices
  val stocks = IndexedSeq(FB, MSFT, HP)
  weightedSum(stocks, cash, error).write(Tsv("invest.txt"))
2. Find all (a,b,c,d) such that 2a+12b+12.5c+34.7d = 3490 with max error 3:
  weightedSum(IndexedSeq(2.0, 12.0, 12.5, 34.7), 3490.0, 3.0)
This is at the heart of portfolio management (Markowitz optimization), subset-sum, and operations-research LP problems.
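A plain-Scala sketch of the brute-force search with progressive filtering (illustrative only, not the pipe implementation; the function name is invented):

  // Integer tuples (x, y, ...) with |w1*x + w2*y + ... - result| < error.
  def weightedSumBrute(weights: IndexedSeq[Double], result: Double,
                       error: Double): Seq[IndexedSeq[Int]] = {
    val maxMult = (result / weights.min).toInt // search space: 0 to result/min(weights)
    weights.foldLeft(Seq((IndexedSeq.empty[Int], 0.0))) { case (acc, w) =>
      for {
        (combo, partial) <- acc
        x <- 0 to maxMult
        s = partial + w * x
        if s <= result + error // progressive filtering on the running sum
      } yield (combo :+ x, s)
    }.collect { case (combo, s) if math.abs(s - result) < error => combo }
  }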
Like zipWithIndex.map but ONLY CHANGES THE VALUE, not the index. Note you will only see non-zero elements on the matrix. This does not enumerate the zeros.
Considering the matrix as a graph, propagate the column. Does the calculation: \sum_{j where M(i,j) == true} c_j
Write the matrix, optionally renaming the row, col, val fields to the given fields, then return this.
Abstracts the approach taken to join the two matrices.
This is the enrichment pattern on Mappable[T] for converting to Matrix types.
Matrix class - represents an infinite (hopefully sparse) matrix. Any elements without a row are interpreted to be zero. The pipe holds ('rowIdx, 'colIdx, 'val) where in principle each row/col/value type is generic, with the constraint that there is a Ring[ValT]. In practice, RowT and ColT are going to be Strings, Integers or Longs in the usual case.
WARNING: it is NOT OKAY to use the same instance of Matrix/Row/Col with DIFFERENT Monoids/Rings/Fields. If you want to change, midstream, the Monoid on your ValT, you have to construct a new Matrix. This is due to caching of internal computation graphs.
RowVector - handles matrices of row dimension one. It is the result of some of the matrix methods and has methods that return ColVector and diagonal matrix.
ColVector - handles matrices of col dimension one. It is the result of some of the matrix methods and has methods that return RowVector and diagonal matrix.
TODO: Multiplication is the expensive stuff. We need to optimize the methods below. This object holds the implicits to handle matrix products between various types.
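A hedged usage sketch; toMatrix, transpose and write are assumed from the enrichment and product descriptions above, with exact signatures unverified:

  import com.twitter.scalding._
  import com.twitter.scalding.mathematics.Matrix._

  class CooccurrenceJob(args: Args) extends Job(args) {
    // Build a sparse matrix from a ('row, 'col, 'val) pipe via the enrichment pattern.
    val a = Tsv(args("input"), ('row, 'col, 'val)).read
      .toMatrix[Long, Long, Double]('row, 'col, 'val)

    // A * A^T -- products are the expensive part, per the TODO above.
    (a * a.transpose).write(Tsv(args("output")))
  }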
Do a right-propagation of a row; the transpose of Matrix.propagate.
Write the Scalar, optionally renaming the val fields to the given fields, then return this.
Allows us to sort matrices by approximate type.
Builder classes used internally to implement coGroups (joins).
Csv value source, separated by commas and with quotes wrapping all fields.
Represents a closed interval of time.
Sets up an implicit dateRange to use in your sources and an implicit timezone.
Mix this in for delimited schemes such as TSV or one-separated values. By default, TSV is given.
This is a base class for file-based sources.
This handles the mapReduceMap work on the map-side of the operation.
Implements reductions on top of a simple abstraction for the Fields-API. We use the f-bounded polymorphism trick to return the type called Self in each operation.
This controls the sequence of reductions that happen inside a particular grouping operation.
Represents a grouping, which is the transition from map to reduce phase in hadoop.
Thrown when validateTaps fails.
Allows working with an iterable object defined in the job (on the submitter) to be used within a Job as you would a Pipe/RichPipe.
This class is used to construct unit tests for scalding jobs.
This Source writes out the TupleEntry as a simple JSON object, using the field names as keys and the string representation of the values.
Represents sharded lists of items of type T.
MapReduceMapBy Class.
This handles the mapReduceMap work on the map-side of the operation.
Usually as soon as we open a source, we read and do some mapping operation on a single column or set of columns.
There are three ways to run jobs; sourceStrictness is set to true.
A tap that outputs nothing.
This just blindly uses the first public constructor with the same arity as the fields size.
One separated value (commonly used by Pig).
Implements reductions on top of a simple abstraction for the Fields-API. This is for associative and commutative operations (particularly Monoids play a big role here).
Packs a tuple into any object with set methods, e.
Scala 2.
Represents a strategy for replicating rows when performing skewed joins.
See https://github.
See https://github.
Every source must have a correct toString method.
A simple trait for a releasable resource.
Implements reductions on top of a simple abstraction for the Fields-API. We use the f-bounded polymorphism trick to return the type called Self in each operation.
Ensures that a _SUCCESS file is present in the Source path.
Memory-only testing for unit tests.
The fields here are ('offset, 'line).
This will automatically produce a globbed version of the given path.
Tab separated value source.
Mixed in to both TupleConverter and TupleSetter to improve arity safety of cascading jobs before we run anything on Hadoop.
Represents a phase in a distributed computation on an input data source. Wraps a cascading Pipe object, and holds the transformation done up until that point.
The args class does a simple command line parsing.
Holds some conversion functions for dealing with strings as RichDate objects.
This object has all the implicit functions and values that are used to make the scalding DSL.
Represents millisecond-based duration (non-calendar based): seconds, minutes, hours. calField should be a java.
A source that outputs nothing.
A helper for working with class reflection.
RichDate adds some nice convenience functions to the Java date/calendar classes. We commonly do Date/Time work in analysis jobs, so having these operations convenient is very helpful.
Implicits for the type-safe DSL. import TDsl.
Base class for classes which pack a Tuple into a serializable object.
Base class for objects which unpack an object into a tuple.
Factory methods for TypedPipe.
Allows you to set the types; prefer this: if T is a subclass of Product, we assume it is a tuple.
com.twitter.scalding.serialization
TODO!!! Deal with this issue. The problem is that grouping by Kryo-serialized objects silently breaks the results. If Kryo gets in front of TupleSerialization (and possibly Writable, unclear at this time), grouping is broken. There are two issues here: 1) Kryo objects not being compared properly. 2) Kryo being used instead of cascading. We must identify each and fix these bugs.
Below are some serializers for objects in the scalding project.
Use an Algebird Aggregator to do the reduction.
Operate on a Stream[T] of all the values for each key at one time. Avoid accumulating the whole list in memory if you can. Prefer reduce.
This is a special case of mapValueStream, but can be optimized because it doesn't need all the values for a given key at once. An unoptimized implementation is: mapValueStream { _.map { fn } }, but for Grouped we can avoid resorting to mapValueStream.
Reduce with fn, which must be associative and commutative. Like the above, this can be optimized in some Grouped cases.
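A plain-Scala analogue of the semantics (not the Grouped implementation): mapValues touches one value at a time, while reduce relies on an associative and commutative fn so partial reductions can happen map-side.

  // In-memory stand-in for a grouped pipe: key -> all values for that key.
  val grouped: Map[String, Seq[Double]] = Map("a" -> Seq(1.0, 2.0), "b" -> Seq(3.0))

  val mapped  = grouped.mapValues { vs => vs.map { _ * 2.0 } }  // per-element, no buffering
  val reduced = mapped.mapValues { vs => vs.reduce { _ + _ } }  // associative + commutative fn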
This fully replicates the entire right hand side to the left. This means that we never see the case where the key is absent on the left, which makes implementing a right-join impossible. Note: there is no reduce-phase in this operation. The next issue is that, unlike a cogroup, for a fixed key each joiner will NOT see all the tuples with those keys, because the keys on the left are distributed across many machines. See hashjoin: http://docs.cascading.org/cascading/2.0/javadoc/cascading/pipe/HashJoin.html
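A plain-Scala analogue of the replicated-join semantics (a sketch, not the Cascading implementation):

  // The whole right side is replicated to every task, so a left key that has
  // no match simply yields nothing -- and a right-join cannot be expressed.
  def hashJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    val rightMap = right.groupBy { _._1 } // replicated in full, no reduce phase
    for {
      (k, v) <- left
      (_, w) <- rightMap.getOrElse(k, Nil)
    } yield (k, (v, w))
  }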