It would be valuable to me to know if all the values are unique. The output includes the number of unique values, and at the end of the csvstat output there is:
Row count: 1856
csvstat data/subtitles_day.tsv
1. "IDSubtitle"
Type of data: Number
Contains null values: False
Unique values: 1856
Smallest value: 9,747,231
Largest value: 9,749,339
Sum: 18,092,851,467
Mean: 9,748,303.592
Median: 9,748,352.5
StDev: 628.279
Most common values: 9,747,231 (1x)
9,747,232 (1x)
9,747,233 (1x)
9,747,234 (1x)
9,747,235 (1x)
That's lost with --csv, along with the frequency, so there's no easy way to know whether the values are unique. For my purposes, I'm trying to find the primary key in a set of files, so knowing that the values are unique would be enormously helpful.
If the frequency count were included in the "freq" key, I could parse that and check whether the top count was just 1, but adding a "Values are unique" statistic would be better. Of course, to determine a primary key I'd also check "Contains null values".
In HEAD, I instead added a "Non-null values" statistic (also appears in the --csv output). This information is useful for this use case as well as others.
You can thus compare non-null values to unique values. Note that if the column contains nulls, then NULL counts as one additional unique value.
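The comparison described above can be sketched in Python. This is a minimal illustration, not csvkit's own code: the sample stats and the `nonnull`/`unique` column names are assumptions standing in for the "Non-null values" and "Unique values" statistics in the `--csv` output, whose exact header names may differ between csvkit versions.

```python
import csv
import io

# Hypothetical sample of `csvstat --csv` output. The header names here
# ("nonnull", "unique") are assumptions; check your csvkit version's
# actual --csv column names before relying on them.
STATS = """\
column_id,column_name,nonnull,unique
1,IDSubtitle,1856,1856
2,MovieName,1856,412
"""

def candidate_primary_keys(stats_csv):
    """Return columns whose non-null count equals their unique count,
    i.e. every value is present and distinct: primary-key candidates.

    Note: if a column contains nulls, NULL counts as one extra unique
    value, so a column with nulls will not satisfy this equality.
    """
    keys = []
    for row in csv.DictReader(io.StringIO(stats_csv)):
        if int(row["nonnull"]) == int(row["unique"]):
            keys.append(row["column_name"])
    return keys

print(candidate_primary_keys(STATS))  # -> ['IDSubtitle']
```

In practice you would pipe `csvstat --csv data/subtitles_day.tsv` into such a script rather than embedding the stats inline.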