allow column filter function as column-selector in replace-missing (#183)

* replace-missing with fn as columns-selector

* update docs of replace-missing
kimim committed Dec 10, 2020
1 parent 309dc98 commit fc20dfb
Showing 2 changed files with 27 additions and 6 deletions.
21 changes: 15 additions & 6 deletions src/tech/v3/dataset/missing.clj
@@ -176,8 +176,13 @@

(defn replace-missing
"Replace missing values in some columns with a given strategy.
- The columns selector may be any legal argument to select-columns.
+ The columns selector may be:
+ - seq of any legal column names
+ - or a column filter function, such as `numeric` and `categorical`
Strategies may be:
- `:down` - take value from previous non-missing row if possible else use next
non-missing row.
- `:up` - take value from next non-missing row if possible else use previous
@@ -186,17 +191,21 @@
rows.
- `:lerp` - Linearly interpolate values between previous and next nonmissing rows.
- `:value` - Value will be provided - see below.
- value may be provided which will then be used. Value may be a function in which
- case it will be called on the column with missing values elided and the return will
- be used to as the filler."
+ value may be provided which will then be used. Value may be a function in which
+ case it will be called on the column with missing values elided and the return will
+ be used to as the filler."
([ds] (replace-missing ds :mid))
([ds strategy] (replace-missing ds :all strategy))
([ds columns-selector strategy]
(replace-missing ds columns-selector strategy nil))
([ds columns-selector strategy value]
(let [strategy (or strategy :mid)
- row-cnt (ds-base/row-count ds)]
- (->> (ds-base/select-columns ds columns-selector)
+ row-cnt (ds-base/row-count ds)
+ selected (if (fn? columns-selector)
+ (columns-selector ds)
+ (ds-base/select-columns ds columns-selector))]
+ (->> selected
(ds-base/columns)
(reduce (fn [ds col]
(let [^RoaringBitmap missing (col/missing col)]
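
The selector dispatch introduced in this hunk can be sketched in isolation. The following is a minimal illustration, not the library's code: it assumes a "dataset" is just a plain map of column name to vector, and `select-columns`, `numeric`, and `resolve-selection` are hypothetical stand-ins written for this example. Only the `(if (fn? columns-selector) ...)` branch mirrors the change above.

```clojure
;; Hypothetical stand-ins: a "dataset" here is a plain map of
;; column name -> vector, not a tech.v3.dataset object.

(defn select-columns
  "Stand-in for ds-base/select-columns: keep only the named columns,
  or every column when the selector is :all."
  [ds colnames]
  (if (= :all colnames)
    ds
    (select-keys ds colnames)))

(defn numeric
  "Stand-in column filter: keep columns whose non-nil values are all numbers."
  [ds]
  (into {} (filter (fn [[_ col]] (every? number? (remove nil? col))) ds)))

(defn resolve-selection
  "Same shape as the branch added in this commit: a function selector is
  called on the dataset, anything else is passed to select-columns."
  [ds columns-selector]
  (if (fn? columns-selector)
    (columns-selector ds)
    (select-columns ds columns-selector)))

(def example-ds {:a [nil 2 4]
                 :b [nil 4 6]
                 :c [nil "A" "B"]})

;; Function selector: picks the numeric columns :a and :b.
(set (keys (resolve-selection example-ds numeric)))

;; Sequence-of-names selector: picks only :c.
(set (keys (resolve-selection example-ds [:c])))

;; Keywords are callable but are not `fn?`, so :all still takes the
;; select-columns path.
(fn? :all)
```

One detail worth noting about the design: `fn?` is true only for function objects, while `ifn?` would also be true for keywords, vectors, and maps. Testing with `fn?` therefore keeps existing selectors such as `:all` or a vector of column names on the original `select-columns` path.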
12 changes: 12 additions & 0 deletions test/tech/v3/dataset_test.clj
@@ -8,6 +8,7 @@
[tech.v3.dataset.string-table :as str-table]
[tech.v3.dataset.join :as ds-join]
[tech.v3.dataset.test-utils :as test-utils]
+ [tech.v3.dataset.column-filters :as cf]
;;Loading multimethods required to load the files
[tech.v3.libs.poi]
[tech.v3.libs.fastexcel]
@@ -739,6 +740,17 @@
(ds/missing)
(dtype/ecount))))))

+ (deftest replace-missing-selector-fn
+ (let [ds (ds/->dataset {:a [nil nil 2 4]
+ :b [nil nil 4 6]
+ :c [nil nil "A" "B"]})
+ ds-replaced (-> ds
+ (ds/replace-missing cf/numeric :value dfn/mean)
+ (ds/replace-missing cf/categorical :value "C"))]
+ (is (= [3 3 2 4] (vec (ds-replaced :a))))
+ (is (= [5 5 4 6] (vec (ds-replaced :b))))
+ (is (= ["C" "C" "A" "B"] (vec (ds-replaced :c))))))
+

(deftest replace-missing-ldt
(let [dtds (ds/->dataset {:dt [(java.time.LocalDateTime/of 2020 1 1 1 1 1)
