Commit

Added an option for the default parser type and fixed a mapseq iteration error.
cnuernber committed Aug 3, 2021
1 parent 1aaf40e commit 280bad3
Showing 5 changed files with 19 additions and 13 deletions.
4 changes: 2 additions & 2 deletions project.clj
@@ -4,8 +4,8 @@
   :license {:name "Eclipse Public License"
             :url "http://www.eclipse.org/legal/epl-v10.html"}
   :dependencies [[org.clojure/clojure "1.10.3" :scope "provided"]
-                 [cnuernber/dtype-next "8.021"]
-                 [techascent/tech.io "4.07"
+                 [cnuernber/dtype-next "8.022"]
+                 [techascent/tech.io "4.08"
                   :exclusions [org.apache.commons/commons-compress]]
                  [com.univocity/univocity-parsers "2.9.0"]
                  [org.apache.poi/poi-ooxml "5.0.0"
8 changes: 4 additions & 4 deletions src/tech/v3/dataset/column.clj
@@ -229,11 +229,11 @@ Implementations should check their metadata before doing calculations."

 (defn union-missing-sets
   "Union the missing sets of the columns returning a roaring bitmap"
-  ^RoaringBitmap [col-seq]
-  (reduce dtype-proto/set-or (map col-proto/missing col-seq)))
+  [col-seq]
+  (reduce dtype/set-or (map col-proto/missing col-seq)))


 (defn intersect-missing-sets
   "Intersect the missing sets of the columns returning a roaring bitmap"
-  ^RoaringBitmap [col-seq]
-  (reduce dtype-proto/set-and (map col-proto/missing col-seq)))
+  [col-seq]
+  (reduce dtype/set-and (map col-proto/missing col-seq)))
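
Both functions fold a binary set operation over the per-column missing sets. The following is a minimal sketch of that pattern only, assuming plain Clojure sets in place of roaring bitmaps; `missing`, `union-missing`, `intersect-missing`, and the map-based columns are hypothetical stand-ins, not the library's API.

```clojure
(require '[clojure.set :as set])

;; Hypothetical stand-in for a column: a map carrying the set of
;; missing row indexes under :missing.
(defn missing [col] (:missing col))

(defn union-missing
  "Rows that are missing in at least one column."
  [col-seq]
  (reduce set/union (map missing col-seq)))

(defn intersect-missing
  "Rows that are missing in every column."
  [col-seq]
  (reduce set/intersection (map missing col-seq)))

(def cols [{:name :a :missing #{0 2}}
           {:name :b :missing #{2 5}}])

(union-missing cols)
(intersect-missing cols)
```

Because `reduce` is called without an initial value, the result for an empty `col-seq` depends on whether the reducing function has a zero-argument arity, so callers of a fold like this may want to guard against empty input.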
5 changes: 5 additions & 0 deletions src/tech/v3/dataset/io.clj
@@ -152,6 +152,11 @@
     be created in `(System/getProperty \"java.io.tmpdir\")` by default.
   - `:n-initial-skip-rows` - Skip N rows initially. This currently may include the
     header row. Works across both csv and spreadsheet datasets.
+  - `:parser-type` - Default parser to use if no parser-fn is specified for that column.
+    For csv files, the default parser type is `:string` which indicates a promotional
+    string parser. For sequences of maps, the default parser type is :object. It can
+    be useful in some contexts to use the `:string` parser with sequences of maps or
+    maps of columns.
   - `:parser-fn` -
     - `keyword?` - all columns parsed to this datatype. For example:
       `{:parser-fn :string}`
2 changes: 1 addition & 1 deletion src/tech/v3/dataset/io/context.clj
@@ -16,7 +16,7 @@
   that produces a column parser for a given column name or index.
   parse-type is either :string or :object."
   [options parse-type]
-  (let [default-parse-fn (case parse-type
+  (let [default-parse-fn (case (get options :parser-type parse-type)
                            :object column-parsers/promotional-object-parser
                            :string column-parsers/promotional-string-parser
                            nil (constantly nil))
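
The change above lets a caller-supplied `:parser-type` option take precedence over the `parse-type` the reader passes in. Here is a small sketch of just that lookup, assuming keywords as stand-ins for the real parser constructors; `default-parser-kind` is a hypothetical name used for illustration.

```clojure
(defn default-parser-kind
  "Returns a keyword naming the default parser. The :parser-type entry in
  options, when present, overrides the reader-supplied parse-type."
  [options parse-type]
  (case (get options :parser-type parse-type)
    :object :promotional-object-parser
    :string :promotional-string-parser
    nil nil))

;; No option given: the reader's own default is used.
(default-parser-kind {} :object)

;; Option given: a sequence-of-maps reader would use the string parser.
(default-parser-kind {:parser-type :string} :object)
```

As in the diff, the `case` has no fallback clause, so any value other than `:object`, `:string`, or `nil` throws an `IllegalArgumentException`.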
13 changes: 7 additions & 6 deletions src/tech/v3/dataset/io/mapseq_colmap.clj
@@ -32,12 +32,13 @@
         n-rows (loop [continue? (.hasNext iter)
                       row-idx 0]
                  (if continue?
-                   (let [row (.next iter)]
-                     (pfor/doiter
-                      cell row
-                      (let [[k v] cell
-                            parser (colname->parser k)]
-                        (column-parsers/add-value! parser row-idx v)))
+                   (do
+                     (when-let [row (.next iter)]
+                       (pfor/doiter
+                        cell row
+                        (let [[k v] cell
+                              parser (colname->parser k)]
+                          (column-parsers/add-value! parser row-idx v))))
                      (recur (.hasNext iter)
                             (unchecked-inc row-idx)))
                    row-idx))]
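
The new loop always advances the iterator and the row index but only visits the cells of a row when that row is non-nil. The sketch below illustrates that control flow in isolation, assuming a nested map in an atom instead of column parsers; `collect-rows` is a hypothetical helper, and `doseq` stands in for `pfor/doiter`.

```clojure
(defn collect-rows
  "Walks a sequence of maps with an explicit iterator. A nil row adds no
  values but still consumes a row index."
  [rows]
  (let [^java.util.Iterator iter (.iterator ^Iterable rows)
        cols (atom {})]
    (loop [continue? (.hasNext iter)
           row-idx 0]
      (if continue?
        (do
          ;; when-let skips the cell loop entirely for a nil row
          (when-let [row (.next iter)]
            (doseq [[k v] row]
              (swap! cols assoc-in [k row-idx] v)))
          (recur (.hasNext iter)
                 (unchecked-inc row-idx)))
        {:n-rows row-idx :columns @cols}))))

(collect-rows [{:a 1} nil {:a 3 :b 4}])
```

Keeping the `recur` outside the `when-let` is the important part of the design: the row count stays aligned with the input even when some rows contribute no values.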
