Add additional selection helpers. #34

Closed
ericemc3 opened this issue Dec 2, 2020 · 6 comments · Fixed by #36
Labels
enhancement New feature or request

Comments


ericemc3 commented Dec 2, 2020

These suggestions come from my R practice; they are handy features I frequently use:

  1. select
    The current options (selecting by name, index, not, range) are excellent. I'd also like starts_with, contains, and the ability to test on column types (for instance, select all numeric columns).

  2. rename
    Of course, select in Arquero already allows renaming; I have noticed that.
    In R, janitor::clean_names is very useful for tidying the complex column names that you sometimes find in CSV files (with blanks, accented characters, caps), turning them into DB-compliant column names (a single word, lowercase, without accented or special characters...).
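
As a rough illustration of the selection helpers requested above, name-based and type-based selection could be sketched in plain JavaScript. This is not Arquero's API: the `columns` object and the helper names `startsWith`, `contains`, and `ofType` are hypothetical and exist only for this example.

```javascript
// Hypothetical data: a plain object mapping column names to value arrays.
const columns = {
  sales_2019: [10, 20],
  sales_2020: [15, 25],
  region_name: ['North', 'South'],
  is_active: [true, false]
};

const names = Object.keys(columns);

// Keep names that begin with the given prefix.
const startsWith = (prefix) => names.filter(n => n.startsWith(prefix));

// Keep names that contain the given substring anywhere.
const contains = (part) => names.filter(n => n.includes(part));

// Keep names whose values all have the given `typeof` type.
const ofType = (type) =>
  names.filter(n => columns[n].every(v => typeof v === type));

console.log(startsWith('sales_')); // sales_2019, sales_2020
console.log(contains('name'));     // region_name
console.log(ofType('number'));     // sales_2019, sales_2020
```

Each helper returns a list of column names, which could then be passed to whatever selection mechanism the table library provides.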

Member

jheer commented Dec 2, 2020

Thanks! I've created a PR that adds more selection helpers (matches, startswith, endswith) for parity with dplyr.

As for more advanced renaming support, you can always add (and share!) your own utilities outside of the core Arquero library. For example:

function normalize_name(str) {
  // perform custom column name normalization
}

// create a mapping of original column names to normalized names
const nameMap = {};
table.columnNames().forEach(s => nameMap[s] = normalize_name(s));

// rename the columns
const newTable = table.select(nameMap);

jheer changed the title from "select and/or rename columns" to "Add additional selection helpers." Dec 2, 2020
Author

ericemc3 commented Dec 3, 2020

Thank you for the suggestion. Here is my contribution for normalizing table column names:

function normalize_column_name(str, options = {lower: true}) {
  str = str.replace(/[%#$£()'"]/g, '')          // drop symbols, parentheses and quotes
           .replace(/[ /,+.*:\-\r\n@]/g, '_')   // turn separators into underscores
           .replace(/_+/g, '_')                 // collapse repeated underscores
           .normalize('NFD').replace(/[\u0300-\u036f]/g, ''); // get rid of accents

  if (options.lower)
    str = str.toLowerCase();

  return str;
}

function normalize_table_names(tb, options = {lower:true}) {
  return tb.select( tb.columnNames().map(d => ({[d]: normalize_column_name(d, options)})) ) ;
}
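
To make the effect of the normalization concrete, here is a small sketch that applies it to a few header names without needing an Arquero table. The `headers` array is made up for illustration, and the function is repeated only so the snippet runs on its own.

```javascript
// Copied from the comment above so the example is self-contained.
function normalize_column_name(str, options = {lower: true}) {
  str = str.replace(/[%#$£()'"]/g, '')
           .replace(/[ /,+.*:\-\r\n@]/g, '_')
           .replace(/_+/g, '_')
           .normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  return options.lower ? str.toLowerCase() : str;
}

// Hypothetical header names, as they might appear in a CSV file.
const headers = ['Région Name', 'Sales (2020)', 'Prix - Été'];

// Map of original name -> normalized name, the object shape passed to select() earlier in this thread.
const nameMap = Object.fromEntries(
  headers.map(h => [h, normalize_column_name(h)])
);

console.log(Object.values(nameMap));
// region_name, sales_2020, prix_ete

// With lower: false the original casing is kept.
console.log(normalize_column_name('Région Name', {lower: false}));
// Region_Name
```

Note that two different headers can normalize to the same name; this sketch does not detect such collisions.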

Member

jheer commented Dec 3, 2020

Thanks for sharing! @ericemc3, do you mind if I later add this (or a minor revision thereof) to the Arquero Cookbook? If that's OK, let me know how you'd like me to acknowledge your contribution (by name, GitHub id, Observable id, or...?).

Author

ericemc3 commented Dec 3, 2020

My pleasure, you can of course reuse this as you want. I published them on https://observablehq.com/@ericmauviere/arquero-table-utils

Author

ericemc3 commented Dec 3, 2020

I added a new utility function for renaming columns:

function rename_columns(tb, upd) {
  // identity mapping of every column name to itself (also works for a table with no columns)
  const colsdef = Object.fromEntries(tb.columnNames().map(d => [d, d]));
  
  for (const k in upd) {
    if (colsdef.hasOwnProperty(k)) 
        colsdef[k] = upd[k] ; 
  }
  
  return tb.select(colsdef) ;
}

rename_columns(tb, {currentname1:'newname1', currentname2:'newname2'})
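
The mapping that this kind of rename builds can be sketched without a table. In the snippet below, `build_rename_map` and the `columnNames` array are hypothetical stand-ins for the logic inside `rename_columns` and for `tb.columnNames()`; it shows that unlisted columns keep their names and that keys which are not existing columns are ignored.

```javascript
// Hypothetical stand-in for tb.columnNames().
const columnNames = ['id', 'currentname1', 'currentname2'];

function build_rename_map(names, upd) {
  // Start from the identity mapping so unlisted columns keep their names.
  const map = Object.fromEntries(names.map(n => [n, n]));
  for (const [from, to] of Object.entries(upd)) {
    // Only rename keys that are existing columns; skip everything else.
    if (Object.prototype.hasOwnProperty.call(map, from)) map[from] = to;
  }
  return map;
}

const renameMap = build_rename_map(columnNames, {
  currentname1: 'newname1',
  missing: 'ignored' // not an existing column, so it is skipped
});

console.log(renameMap);
// id stays 'id', currentname1 becomes 'newname1', currentname2 stays 'currentname2'
```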

Member

jheer commented Dec 4, 2020

New selection helpers now staged for v1.2.0.

@jheer jheer closed this as completed Dec 4, 2020