Hi Mark. Thanks for the issue. "Commas" here means the commas that separate groups of digits (e.g., thousands, millions, billions). This is the U.S. convention. When I wrote qdapRegex I included a default U.S. dictionary, with room to grow by adding locale-specific dictionaries through community support. In the README I have:
The functions in qdapRegex work on a dictionary system. The current implementation defaults to a United States flavor of canned regular expressions. Users may submit proposed region specific regular expression dictionaries that contain the same fields as the regex_usa data set or improvements to regular expressions in current dictionaries. Please submit proposed regional regular expression dictionaries via: https://github.com/trinker/qdapRegex/issues
I would love it if you were willing to make a Netherlands-specific dictionary. We could blog and tweet about it and about the community support, and hopefully get the ball rolling on other locale-specific dictionaries from the community. I'm guessing much of a Netherlands dictionary would be the same as the U.S. one I made (e.g., an IP address is a universal thing), while other entries would require only minor tweaks.
So for example with your problem we could use the current regex for U.S. and just swap out the comma and period using the textclean package's swap function:
library(qdapRegex)
library(textclean)
## make Netherlands pattern by swapping ',' and '.' in the U.S. pattern
textclean::swap(qdapRegex::grab('rm_number'), ',', '.')
## "(?<=^| )[-,]*\\d+(?:\\,\\d+)?(?= |\\,?$)|\\d+(?:.\\d{3})+(\\,\\d+)*"
## make an rm_number function for the Netherlands
rm_number2 <- rm_(pattern = textclean::swap(qdapRegex::grab('rm_number'), ',', '.'))
rm_number2("hello 12,5 world and another 1.234.567,89")
## [1] "hello world and another"
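One thing to watch in the swapped pattern shown above: the thousands group becomes `(?:.\\d{3})+`, where the `.` is unescaped and so matches any character, not only a literal period. As a rough sketch of what a stricter Netherlands entry might look like, here is a hand-written pattern tested with base R only. The name `nl_number` and the pattern itself are assumptions made for illustration; this is not the regex shipped in qdapRegex.

```r
## Hypothetical Dutch-style number pattern (illustration only):
## '.' separates thousands, ',' marks the decimal part.
## The lookarounds keep a match from starting or ending inside a longer number.
nl_number <- "(?<![\\d.,])-?(?:\\d{1,3}(?:\\.\\d{3})+|\\d+)(?:,\\d+)?(?![\\d.,])"

x <- "hello 12,5 world and another 1.234.567,89"

## Extract the matches (perl = TRUE is needed for the lookarounds)
found <- regmatches(x, gregexpr(nl_number, x, perl = TRUE))[[1]]
found
## [1] "12,5"         "1.234.567,89"

## Remove the matches, then collapse leftover whitespace
result <- trimws(gsub("\\s+", " ", gsub(nl_number, "", x, perl = TRUE)))
result
## [1] "hello world and another"
```

If a pattern along these lines holds up, it could presumably be passed to `rm_(pattern = ...)` in the same way as the swapped pattern above.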
According to the help file it should recognize this:
Here's the sessionInfo