- Back out 4 changes to be binary compatible: #1187
- Use java.util.Random instead of scala.util.Random: #1186
- Add Execution.failed: #1185
- Use a ConcurrentHashMap instead of a WeakHashMap so that Stats behave correctly: #1184
- Add applicative for Execution: #1181
- Convert LzoTextDelimited to Cascading scheme.: #1179
- Make TraceUtil support versions of cascading older than 2.6: #1180
- Add support for more LzoTextDelimited parameters in LzoTraits.: #1178
- Use latest algebird, bijection, chill, elephantbird, and scala 2.11.5: #1174
- Cascading 2.6 tracing: #1156
- use Cascading 2.6.1 and cascading-jdbc 2.6.0: #1110
- add reducer option to LookupJoin: #1160
- Add dump to ValuePipe in the REPL: #1157
- Ianoc/type descriptor: #1147
- Refactor around the macro definitions into 3 files. Both converter and setter support Options: #1145
- Fix a few random typos: #1144
- Fix two issues found by static analysis: #1143
- Add implicit helpers for numeric arguments: #1138
- Add a fields macro: #1132
- Ianoc/case class tuple converters: #1131
- Some minor changes, cleanup pulled from jco's macro branch: #1130
- Adds a typedjson source: #1129
- Pulls all external 3rdparty versions up to the top of the build file: #1128
- remove transitive pig and elephantbird dependencies for parquet-cascading: #1127
- Some minor clean up in the build file: #1123
- Ianoc/scalding 210: #1116
- Decrease test count: #1117
- Removes scala 2.9.3: #1106
- Fix some typos in TypedPipe docs, expand flatMap docs: #1115
- Implicit execution context / easier switching between modes: #1113
- Add more documentation to TypedPipe: #1111
- Update the README: #1114
- Fixed comment in LookupJoin.scala: #1108
- Fix long compile time for MultiJoin helpers: #1109
- Allows reducer estimation to operate on all hfs taps: #1080
- Fix bufferedTake: #1107
- Generate methods for flattening the results of many joins: #1097
- Make TimePathedSource more configurable: #1105
- Adding DailyPrefixSuffixLzoTsv: #1082
- Option to select the fields for output in templatesource: #1061
- Add a DailySuffixMostRecentLzoProtobuf source: #1104
- Updates default scala version to 2.10.4: #1081
- MultiSourceTap hashcode: #1101
- scalding-core: merge flow step strategies to allow reducer estimation combined with other strategies: #1094
- Improve command line handling of the execution app: #1083
- More testing around the globifier with new properties: #1092
- Refactor JDBCSource to add compile-time info about type of DB: #1087
- Add a cumulative sum to KeyedList: #1085
- Add in failing test case: #1090
- Adds ability to also get the mode inside the Execution monad.: #1088
- Enforce invariant: mapGroup iterators all nonempty: #1072
- Allow PartitionSource to limit the number of open files: #1078
- append to Cascading frameworks system property instead of setting it directly: #1076
- Adds some output while assembly is building to keep travis happy: #1084
- Only request necessary hadoop configs in hraven reducer estimator: #1067
- Add parquet-scrooge sources: #1064
- Outer join handles case when both are empty: #1065
- Fix race in merging: #1063
- Add support for column projection to parquet sources: #1056
- Add typed version of RichPipe 'using': #1049
- Add getExecution/getOrElseExecution: #1062
- Change toIteratorExecution to toIterableExecution: #1058
- Cache Execution evaluations: #1057
- Add support for push down filters in parquet sources: #1050
- Add support for Fold: #1053
- move to use JobConf(true) for hadoop craziness that causes host not foun...: #1051
- Disable Cascading update check.: #1048
- Respects -Dmapred.job.name when passed in on the command line: #1045
- Add some instances from Algebird: #1039
- Fix join.mapGroup issue: #1038
- Add a defensive .forceToDisk in Sketched: #1035
- Override toIterator for all Mappable with transformForRead: #1034
- Make sinkFields in TypedDelimited final.: #1032
- Fixed type of exception thrown by validateTaps: #1033
- Add default local maven repo to the resolver list: #1024
- Add an ExecutionApp trait for objects to skip the Job class: #1027
- Make each head pipe have a unique name: #1025
- Run REPL from SBT: #1021
- Add Config to openForRead: #1023
- Fix replConfig merging and evaluate values in Config.fromHadoop: #1015
- REPL Autoload file: #1009
- Fix hRaven Reducer Estimator: #1018
- Update Cascading JDBC Version.: #1016
- Some Execution fixes: #1007
- Refactor InputSizeReducerEstimator to correctly unroll MultiSourceTaps: #1017
- Fix issue #1011: Building develop branch fails: #1012
- hRaven Reducer Estimator: #996
- JsonLine should handle empty lines: #966
- Add comments for memory-related reduce operations.: #1006
- Add the remaining odds and ends to Execution[T]: #985
- Fix up the tests to run forked, and split across lots of travis builds: #993
- Typedpipe partition: #987
- Fix toIterator bug (#988): #990
- Basic reducer estimator support: #973
- Improve TypedSimilarity algorithm and update test.: #983
- Adds support for Counters inside the Execution Monad.: #982
- Make map/flatMap lazy on IterablePipe to address OOM: #981
- JsonLine: enable read transformation in test to get correct fields in sourceTap: #971
- Read and writable partitioned sources: #969
- Make an Execution[T] type, which is a monad, which makes composing Jobs easy.: #974
- Generalize handling of merged TypedPipes: #975
- Do not inherit from FileSource in LzoTraits: #976
- Make TypedPipe immutable: #968
- Adds an optional source: #963
- Add pipe1.join(pipe2) syntax in TypedAPI: #958
- Extending BddDsl for Typed API: #956
- VerticaJdbcDriver: #957
- fix the example usage in JDBCSource: #955
- Push back off ec2 requiring sudo, build failures are a nightmare: #953
- Add ExecutionContextJob to interop execution style with Job style: #952
- hadoop.tmp.dir for snapshot in config
- Fixes bad release portion where code wasn't updated for new scalding version number.
- use cascading-jdbc 2.5.3 for table exists fix and cascading 2.5.5: #951
- Bump build properties and sbt launcher: #950
- Fixes the travis build: #944
- Making the README.md consistent with 0.11.0 changes for the REPL.: #941
- Backport Meatlocker: #571
- REPL: Add toIterator (and related methods): #929
- Fix the build to use the shared module method: #938
- Clean up the UniqueID stuff, to avoid plumbing it everywhere: #937
- TypedPipe.from(List).distinct fails: #935
- Clean up ExecutionContext a bit: #933
- Fix Issue 932: no-op Jobs should not throw: #934
- Use Execution to run flows in REPL: #928
- Snapshot a pipe in the REPL: #918
- Add support for AppJar in Config: #924
- Fix LzoTextLine as a TypedSource: #921
- Use externalizer in BijectedSourceSink: #926
- Add an Executor to run flows without a Job: #915
- Handle the case where scalding saves out a tsv and re-uses it downstream, leading to issues where the types are not strings: #913
- Fix DailySuffixTsv for testability, remove leaked DailySuffixTsv: #919
- Add a Config class to make configuration understandable: #914
- Integrate the repl completely into scald.rb. Fixup scald-rb for better hdfs-local mode now with our provides: #902
- Add some auto-reformats: #911
- Update JDBCSource: #898
- Allow tests for typed delimited by fixing swallowed bug: #910
- Add Hadoop platform test to enable unit testing for Hadoop semantics: #858
- Some minor improvements to typed joining code: #909
- Fix #906: #908
- Run the test target, so the tests are reformatted: #907
- Enable scalariform: #905
- Simplify "scald-repl.sh": #901
- Typed Tutorial: #897
- Adding a test for the scalding repl: #890
- Properly close tuple iterator in test framework.: #896
- Add constructors to ValuePipe: #893
- contraMap and andThen on TypedSink/TypedSource: #892
- Tiny fix to use an ImplicitBijection rather than Bijection: #887
- Feature/bijected source sink: #886
- Fix intersection equality error: #878
- Add DailySuffixTypedTsv and HourlySuffixTypedTsv.: #873
- add stepListener register support in Scalding: #875
- Backport Meatlocker: #571
- Upgrade cascading to 2.5.4, cascading jdbc to 2.5.2
- Adding an hdfs mode for the Scalding REPL
- Added implementation of PartitionSource with tests
- Add helper methods to KeyedList and TypedPipe
- Add addTrap to TypedPipe
- Add join operations to TypedPipe that do not require grouping beforehand
- Fixed bug in size estimation of diagonal matrices
- Optimized the reduceRow/ColVectors function for the number of reducers
- Add a BlockMatrix object (an abstraction of a Vector of Matrices)
- Publish 0.8.7 for scala 2.9.3.
- Hotfix to bypass a bug in Hadoop, which cannot sync up all deprecated keys.
- Hotfix a bug in Tool. Now Tool will re-throw all exceptions again.
- Fixed bug in RichPipe.insert
- Add source[T] to JobTest
- Allow DelimitedScheme to override strictness and safety.
- Add distinct method to RichPipe and TypedPipe, add mapValues to TypedPipe
- ISSUE 389: Catch exceptions in Tool
- Sbt assembly 0.8.7
- Add CascadeJob to allow multiple flows in one job
- Adding cross-build scala versions
- Use mima to check binary compatibility
- ISSUE 340: Upgrade to Cascading 2.1.5
- ISSUE 327,329,337: adds sample method with seed in RichPipe
- ISSUE 323: Remove untyped write from TypedPipe (must write to Mappable[U])
- ISSUE 321: pulls out scalding-date and scalding-args as separate projects
27 commits
- P. Oscar Boykin: 13 commits
- willf: 4 commits
- Argyris Zymnis: 3 commits
- Sam Ritchie: 3 commits
- Tim Chklovski: 1 commit
- Chris Severs: 1 commit
- Rickey Visinski: 1 commit
- David Shimon: 1 commit
- ISSUE 312: dramatic speedup for sortWithTake/sortedTake if you take many items
- ISSUE 307: Read support for JsonLine support (previously, just write)
- ISSUE 305: Adds a shuffle-method to RichPipe (for sampling/sharding)
- ISSUE 299: limit method for Typed-safe API.
- ISSUE 296: Fixes self-joins in the Type-safe API.
- ISSUE 295: unpack-all syntax (use Fields.ALL) for TupleUnpacker
- ISSUE 280: Improvements to AbsoluteDuration
- ISSUE 277: Upgrade to Cascading 2.0.7
75 commits total.
- P. Oscar Boykin: 25 commits
- Alex Dean: 15 commits
- Argyris Zymnis: 11 commits
- Sam Ritchie: 4 commits
- Timothy Chklovski: 4 commits
- Dan McKinley: 4 commits
- Aaron Siegel: 3 commits
- Tim Chklovski: 3 commits
- Ashutosh Singhal: 2 commits
- João Oliveirinha: 2 commits
- Arkajit Dey: 1 commit
- Avi Bryant: 1 commit
- ISSUE 269: Improvements to AbsoluteDuration.fromMillisecs and some new APIs
- ISSUE 256: Weighted page-rank with the Matrix API
- ISSUE 249: Fix for Matrix.scala missing some obvious operations
- ISSUE 246: Partition in RichPipe (create a new field, and then groupBy on it)
- ISSUE 241: Fix joinWithLarger with a custom joiner
- ISSUE 234, 238: Etsy sync: periodic date jobs, ability to add traps, more flexible Args
- ISSUE 230: shard and groupRandomly on RichPipe
- ISSUE 229: Initial skew-Join implementation (please test!)
- ISSUE 228 - 233, 265: Improve Typed-API
- ISSUE 221: Combinatorics in scalding.mathematics
106 commits total.
- P. Oscar Boykin: 29 commits
- Krishnan Raman: 16 commits
- Arkajit Dey: 15 commits
- Avi Bryant: 7 commits
- Edwin Chen: 6 commits
- Aaron Siegel: 5 commits
- Koert Kuipers: 5 commits
- Argyris Zymnis: 4 commits
- Sam Ritchie: 4 commits
- Chris Severs: 4 commits
- Brad Greenlee: 3 commits
- Wil Stuckey: 2 commits
- Dan McKinley: 2 commits
- Matteus Klich: 2 commits
- Josh Devins: 1 commit
- Steve Mardenfeld: 1 commit
- ISSUE 220: Etsy date improvements and local-mode tap improvements
- ISSUE 219: scald.rb fix
- ISSUE 218: Add aggregate method to ReduceOperations
- ISSUE 216: Improve variance notations
- ISSUE 213,215: Make Field[T] serializable
- ISSUE 210,211: Refactor date code into individual files + tests
- ISSUE 209: Add hourly/daily time pathed source classes
- ISSUE 207,208: sbt build improvements
- ISSUE 205,206: Remove scala serialization code to com.twitter.chill
- ISSUE 203: Improved date-parsing and docs (from Etsy)
- ISSUE 202,204: Add propagate/mapWithIndex in Matrix (use Monoids with graphs)
- ISSUE 201: add stdDev to groupBuilder.
- ISSUE 200: typed write and key-value swap
- ISSUE 196: Clean up deprecations
- ISSUE 194,195: Fix negative numbers as args
- ISSUE 190-192,197,199: Improved Joining/Co-grouping in the typed-API
- Many small bugfixes
- ISSUE 189: Adds spillThreshold to GroupBuilder to tune memory usage
- ISSUE 187: Adds TypedTsv for type-safe TSV files.
- ISSUE 179: Add forceToDisk to help hand optimization of flows
- ISSUE 175: API to set the type/comparators in the Fields API.
- ISSUE 162, 163: Adds keys, values methods to Typed-API
- ISSUE 160: adds approxUniques to GroupBuilder
- ISSUE 153: mapPlusMap in GroupBuilder
- ISSUE 149: Support for Hadoop sequence files
- ISSUE 148: Matrix API
- ISSUE 147: Move Monoid/Algebra code to Algebird
- Multiple issues: many new Kryo serializers added
- ISSUE 140: Adds ability to do side-effects (foreach, using in RichPipe)
- ISSUE 134: Cleans up scalding.Tool
- ISSUE 133: Adds SortedListTake monoid, Either monoid, and tests
- ISSUE 130: Upgrades cascading.kryo to 0.4.4
- ISSUE 129: Disable Kryo references by default, add config
- ISSUE 128: Minor fix to normalize job
- ISSUE 127: Upgrade maple and remove jdks.
- ISSUE 126: Switch normalize to use crossWithTiny
- ISSUE 125: Serialization fixes
- ISSUE 124: Add crossWithSmaller and use it in normalize.
- ISSUE 123: Adds generated classes that allow tuple concatenation.
- ISSUE 122: Upgrade cascading to version 2.0.2 and maple to 0.2.1
- ISSUE 119: Iterator fix
- ISSUE 118: Specialized tuplegetter
- ISSUE 117: Job conf keys
- ISSUE 116: Feature/flatten
- ISSUE 115: Fold scan init fix
- ISSUE 114: Feature/scald update
- ISSUE 113: Make sure SequenceFile uses the fields list if one is passed in
- ISSUE 110: Upgrade Kryo to 2.16.
- ISSUE 107: Add a method to GroupBuilder to force all aggregation on the reducers.
- ISSUE 105: Feature/head fix
- ISSUE 104: Feature/default date improvements
- ISSUE 103: Feature/case case pack
- ISSUE 100: Adds trivial UnitGroup
- ISSUE 99: Fix build breakage and add tutorial run script.
- ISSUE 98: Feature/tpipe stream
- ISSUE 97: Add the user input to the unknown mode error message
- ISSUE 95: Feature/buffer cleanup
- ISSUE 94: Feature/tpipe
- ISSUE 93: Upgrade to cascading 2.0.0 and maple 0.2.0
- ISSUE 92: Feature/dsl refactor
- ISSUE 91: Upgrade to cascading wip-310 and maple 0.1.10
- ISSUE 90: Test against multiple JDKs on travis-ci.org
- ISSUE 88: Bump sbt-assembly version from 0.7.3 to 0.8.1
- ISSUE 87: Header Lines support in DelimitedScheme
- ISSUE 84: Add packTo and unpackTo
- ISSUE 83: Adds sum and product to Monoid and Ring
- ISSUE 79: Updates to sbt 0.11.3 to match Travis CI
- ISSUE 77: Add a scalding multi source tap that has a proper unique identifier
- ISSUE 74: Upgrade cascading to wip-291 and maple to 0.1.7
- ISSUE 73: Feature/comparator prop
- ISSUE 72: Upgrade to cascading wip-288 and maple 0.1.5
- ISSUE 71: Fixes an issue due to type erasure in KryoHadoopSerialization for scala 2.9.1
- ISSUE 70: Upgrade scalding to use cascading wip-286.
- ISSUE 69: Feature/more kryo tests
- ISSUE 67: Upgrade cascading to wip-281.
- ISSUE 66: Allow default time zone in DefaultDateRangeJob.
- ISSUE 65: Fixed the error message thrown by FileSource.validateTaps.
- ISSUE 62: Kryo Upgrade to 2.04
- ISSUE 60: Feature/abstract algebra
- ISSUE 52: Feature/cogroup builder
- ISSUE 51: Feature/headfix
- ISSUE 42: Feature/iterable source
- ISSUE 41: Adds blockJoinWithSmaller to JoinAlgorithms.
- ISSUE 39: Adding default value to pivot
- ISSUE 38: Fix bug with hash code collisions of Source objects
- ISSUE 36: Some cleanups of reduce and Operations
- ISSUE 35: Split RichPipe join methods into their own trait
- ISSUE 34: Adds Pivot/Unpivot
- ISSUE 33: Add pack and unpack methods to RichPipe
- ISSUE 32: Refactors reducer setting into RichPipe
- ISSUE 31: Implemented Mode.fileExists
- ISSUE 28: Simplifies TupleConverter
- ISSUE 21: move JobTest into main
- ISSUE 20: Adding a source for the most recent good date path.