rows selection enhanced #44

genmeblog · 2020-04-23T10:29:00Z

I can imagine helper functions such as:

head - select the first n rows (equivalent to (select-rows (range n)))
tail - select the last n rows
sample - return n random rows (with or without repetition)
shuffle - permute the rows of the dataset
unique - remove duplicate rows (across the whole dataset)
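
The intended semantics of the first four helpers can be sketched in plain Python over a list of row maps. This only illustrates the requested behaviour and is not the library's API: the function names, the `n`, `replace`, and `seed` parameters, and the sample `rows` data are all assumptions made for the example.

```python
import random

def head(rows, n=5):
    """Return the first n rows (all rows if there are fewer than n)."""
    return rows[:n]

def tail(rows, n=5):
    """Return the last n rows; n == 0 is handled because rows[-0:] is the whole list."""
    return rows[-n:] if n > 0 else []

def sample(rows, n, replace=False, seed=None):
    """Return n random rows, with repetition (replace=True) or without."""
    rng = random.Random(seed)
    if replace:
        return [rng.choice(rows) for _ in range(n)]
    # Without repetition, n must not exceed len(rows) or ValueError is raised.
    return rng.sample(rows, n)

def shuffle(rows, seed=None):
    """Return a new list holding every row exactly once, in random order."""
    rng = random.Random(seed)
    return rng.sample(rows, len(rows))

# Hypothetical data: ten rows with an id and a value column.
rows = [{"id": i, "value": i * i} for i in range(10)]
```

In this sketch shuffle is just sampling every row without repetition, which suggests the two helpers could share one implementation.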

cnuernber · 2020-04-25T17:40:40Z

Unique is hardcore - do you mean something like keeping every row in a set and not allowing repeated rows, or do you mean unique-by-column(s)?
head, tail, sample, shuffle, (rand-nth): all of those make sense.

cnuernber · 2020-04-25T18:01:15Z

Leaving unique/distinct for later as I think that one requires a bit more discussion.

genmeblog · 2020-04-25T18:56:00Z

Unique - removing duplicate rows across the whole dataset. But it's not urgent :)

harold · 2020-04-26T01:09:33Z

There is also Pandas' drop_duplicates, which I've needed in the past: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html
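
The two readings of "unique" raised in this thread (whole-row duplicates versus unique-by-column(s)) can be contrasted with a small plain-Python sketch. It is not pandas or library code: the `drop_duplicates` function below is a hypothetical stand-in that keeps the first occurrence of each key, and its `subset` parameter name is borrowed from the pandas method of the same name.

```python
def drop_duplicates(rows, subset=None):
    """Keep the first row seen for each distinct key.

    subset=None compares every column (whole-row uniqueness);
    otherwise only the listed columns are compared.
    Assumes all compared values are hashable.
    """
    seen = set()
    kept = []
    for row in rows:
        cols = subset if subset is not None else sorted(row)
        key = tuple((c, row[c]) for c in cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Hypothetical data for illustration only.
rows = [
    {"name": "a", "x": 1},
    {"name": "a", "x": 1},  # exact duplicate of the first row
    {"name": "a", "x": 2},  # same name, different x
    {"name": "b", "x": 2},
]

whole = drop_duplicates(rows)                     # drops only the exact duplicate
by_name = drop_duplicates(rows, subset=["name"])  # one row per distinct name
```

Whole-row uniqueness removes only the exact repeat, while the by-column variant also has to decide which of several differing rows to keep, which is part of why the design needs more discussion.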

cnuernber self-assigned this Apr 25, 2020

cnuernber closed this as completed in da238b8 Apr 25, 2020

cnuernber mentioned this issue Apr 26, 2020

Unique, drop duplicates #54

Closed
