Rack::UTF8Sanitizer is a Rack middleware which cleans up invalid UTF8 characters in request URI and headers. Additionally, it cleans up invalid UTF8 characters in the request body (depending on the configurable content type filters) by reading the input into a string, sanitizing the string, then replacing the Rack input stream with a rewindable input stream backed by the sanitized string.
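The body handling described above can be sketched roughly as follows. This is only an illustration, not the gem's code: sanitize_body! is a hypothetical helper, Ruby's String#scrub stands in for the real sanitization step, and the content type filters are ignored.

```ruby
require 'stringio'

# Hypothetical helper (not part of Rack::UTF8Sanitizer's API).
# Reads the whole request body, cleans it, and swaps in a rewindable stream.
def sanitize_body!(env)
  input = env['rack.input']
  return env unless input

  # Read the entire body into a string and treat it as UTF-8.
  body = input.read.to_s.dup.force_encoding(Encoding::UTF_8)

  # Assumption: String#scrub is used here as a simple stand-in for the
  # sanitization step; it replaces invalid byte sequences with U+FFFD.
  body = body.scrub("\uFFFD") unless body.valid_encoding?

  # StringIO is rewindable, so downstream code can read the body again.
  env['rack.input'] = StringIO.new(body)
  env
end
```

Reading the full body into memory is what makes the replacement stream rewindable, at the cost of buffering large uploads.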
Add this line to your application's Gemfile:
gem 'rack-utf8_sanitizer'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install rack-utf8_sanitizer
For Rails, add this to your
config.middleware.insert 0, Rack::UTF8Sanitizer
For Rack apps, add this to
Rack::UTF8Sanitizer divides all keys in the Rack environment into two distinct groups: keys which contain raw data and keys which contain percent-encoded data. The fields which are treated as percent-encoded are:
The generic sanitization algorithm is as follows:
- Force the encoding to UTF-8.
- If the result contains invalid characters:
  - Force the encoding to ASCII-8BIT.
  - Re-encode it as UTF-8, replacing invalid and undefined characters with U+FFFD.
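The steps above can be sketched in plain Ruby as follows. The method name sanitize_string is an assumption made for illustration, and the sketch is not guaranteed to match the gem's source line for line.

```ruby
# Illustrative sketch of the generic algorithm; the method name is hypothetical.
def sanitize_string(input)
  # Step 1: force the encoding to UTF-8 (on a copy, so the caller's string is untouched).
  str = input.dup.force_encoding(Encoding::UTF_8)
  return str if str.valid_encoding?

  # Step 2: treat the bytes as binary, then re-encode as UTF-8.
  # Every byte with no ASCII-8BIT -> UTF-8 mapping (0x80..0xFF) becomes U+FFFD.
  str.force_encoding(Encoding::ASCII_8BIT)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
end
```

Note that once a string is found to be invalid, this approach replaces every non-ASCII byte in it, not just the offending sequence.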
For fields with "raw data", the algorithm is applied once and the (UTF-8 encoded) result is left in the environment.
For fields with "percent-encoded data", the algorithm is applied twice to catch both invalid characters appearing as-is and invalid characters appearing in the percent encoding. The percent-encoded, ASCII-8BIT encoded result is left in the environment.
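A rough sketch of the two-pass idea for percent-encoded fields is shown below. Both method names are hypothetical, and the sketch simplifies deliberately: it only decodes and re-encodes bytes in the range 0x80..0xFF, since only those can make a string invalid UTF-8. The gem's actual escaping rules may differ.

```ruby
# Generic algorithm from the list above (hypothetical name).
def sanitize_string(input)
  str = input.dup.force_encoding(Encoding::UTF_8)
  return str if str.valid_encoding?

  str.force_encoding(Encoding::ASCII_8BIT)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
end

# Hypothetical two-pass sanitizer for percent-encoded values.
def sanitize_percent_encoded(input)
  # Pass 1: catch invalid bytes that appear as-is in the value.
  cleaned = sanitize_string(input).b

  # Decode only escapes for high bytes (%80..%FF); ASCII escapes are left alone
  # in this simplified sketch so reserved characters keep their meaning.
  decoded = cleaned.gsub(/%([89A-Fa-f]\h)/) { Regexp.last_match(1).hex.chr }

  # Pass 2: catch invalid bytes that were hidden inside the percent encoding.
  sanitized = sanitize_string(decoded)

  # Re-encode high bytes and leave an ASCII-8BIT string behind.
  sanitized.b
           .gsub(/[\x80-\xFF]/n) { |byte| format('%%%02X', byte.ord) }
           .force_encoding(Encoding::ASCII_8BIT)
end
```

With this sketch, a value such as "/p?q=%FF" comes back with the invalid byte replaced by the percent-encoded form of U+FFFD, while a valid escape sequence such as "%C3%A9" survives unchanged.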
Sanitizable content types
To add sanitizable content types to the list of defaults, pass the additional_content_types option when using Rack::UTF8Sanitizer, e.g.
config.middleware.insert 0, Rack::UTF8Sanitizer, additional_content_types: ['application/vnd.api+json']
To explicitly set sanitizable content types and override the defaults, use the sanitizable_content_types option:
config.middleware.insert 0, Rack::UTF8Sanitizer, sanitizable_content_types: ['application/vnd.api+json']
Whitelist/Blacklist Rack Env Keys
By using the :only and :except keys you can skip sanitization of values in the Rack Env. :only and :except are arrays that can contain strings or regular expressions.
Only sanitize the body, query string, and URL of a request.
config.middleware.insert 0, Rack::UTF8Sanitizer, only: ['rack.input', 'PATH_INFO', 'QUERY_STRING']
Sanitize everything except HTTP headers.
config.middleware.insert 0, Rack::UTF8Sanitizer, except: [/HTTP_.+/]
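One plausible way to match Rack env keys against such arrays is Ruby's case equality operator, which compares strings by equality and tests regular expressions by matching. The helper below is hypothetical and only illustrates that idea; it is not taken from the gem.

```ruby
# Hypothetical helper: decides whether a Rack env key should be sanitized.
# `only` and `except` are arrays of strings and/or regular expressions.
def sanitize_key?(key, only: nil, except: nil)
  # String === key checks equality; Regexp === key checks for a match.
  matches = ->(patterns) { patterns.any? { |pattern| pattern === key } }

  return false if only && !matches.call(only)
  return false if except && matches.call(except)

  true
end

# Example calls mirroring the configurations above:
sanitize_key?('PATH_INFO', only: ['rack.input', 'PATH_INFO', 'QUERY_STRING'])
sanitize_key?('HTTP_COOKIE', except: [/HTTP_.+/])
```

Because an unanchored regular expression such as /HTTP_.+/ matches anywhere in the key, anchoring it (for example /\AHTTP_/) may be preferable when only a prefix match is intended.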
There are two built-in strategies for handling invalid characters. The default strategy is :replace, which causes any invalid characters to be replaced with the Unicode replacement character (�). The second built-in strategy is :exception, which causes an EncodingError exception to be raised if invalid characters are found (the exception can then be handled by another Rack middleware).
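As an illustration of handling that exception in another middleware, the sketch below rescues EncodingError and answers with a 400 response. The class name InvalidEncodingHandler and the response body are assumptions; the middleware would also need to sit ahead of Rack::UTF8Sanitizer in the stack for the rescue to take effect.

```ruby
# Hypothetical Rack middleware that turns EncodingError into a 400 response.
class InvalidEncodingHandler
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue EncodingError
    # Standard Rack response triplet: status, headers, body.
    [400, { 'content-type' => 'text/plain' }, ["Bad Request: invalid UTF-8\n"]]
  end
end
```

Returning a client error here keeps malformed requests from surfacing as unhandled server errors.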
An object that responds to #call and accepts the offending string with invalid characters as an argument can also be passed as a :strategy. This is how you can define custom strategies.
config.middleware.insert 0, Rack::UTF8Sanitizer, strategy: :exception
replace_string = lambda do |_invalid|
  Rails.logger.warn('Replacing invalid string')
  '<Bad Encoding>'.freeze
end
config.middleware.insert 0, Rack::UTF8Sanitizer, strategy: replace_string
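Since any object that responds to #call can serve as a strategy, a small class works as well as a lambda. The sketch below is a hypothetical example outside Rails: the class name LoggingReplacer and its keyword arguments are invented here, and it assumes, as in the lambda example, that the return value is used in place of the offending string.

```ruby
require 'logger'

# Hypothetical custom strategy object; responds to #call like a lambda would.
class LoggingReplacer
  def initialize(logger: Logger.new($stderr), replacement: '<Bad Encoding>')
    @logger = logger
    @replacement = replacement
  end

  # Receives the offending string and returns the replacement value.
  def call(invalid_string)
    @logger.warn("Replacing invalid string (#{invalid_string.bytesize} bytes)")
    @replacement
  end
end

# Possible wiring, shown as a comment because it needs a Rails app:
# config.middleware.insert 0, Rack::UTF8Sanitizer, strategy: LoggingReplacer.new
```

A class makes it easy to inject a logger or a different replacement string in tests.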
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
To run the tests, run rake spec in the project directory.