# Boost Attribute Speed and Address Security Issue
## SECURITY ISSUE

The code was taking data, which in many cases could be externally submitted, and generating atoms from it to filter attribute keys down. This is a problem because atoms are not garbage collected and only a finite number of them can exist. An attacker could craft enough query strings to eventually cripple a running server. Processing the following request, as suggested by the README, would generate 7 potential new atoms.

**EXAMPLE:** `/widgets?fields[widget]=a,aa,aaa,aaaa,ab,aba,abb`

This has been corrected by using `String.to_existing_atom/1`. It should be noted that there is a chance this breaks existing clients that have some kind of dynamic attribute system in which keys are not represented as atoms before this filter applies.

## BOOST BREAKDOWN

The Attribute module has the responsibility of delegating data to a serializer module, filtering out any optional fields given as opts, and then wrapping the results in an `Attribute` structure that can later be used for custom formatting. The most expensive part of this process is wrapping the returned values in an `Attribute` structure, and this part is not avoidable. However, there was a huge opportunity for a speed increase in how the filtering works: it splits strings, turns them into atoms, and then filters data based on that list. Doing this for potentially every item in a collection becomes an N+1 problem, and can be avoided by doing the work only once per request.

This work does not provide a solution for ahead-of-time filtering; however, it will allow for it in the future when it becomes available. When the filter is given a list of pre-calculated atoms, each calculation is about 240% faster for a small, trivial list of fields to filter. This speed increase only grows as the list of fields to filter grows.
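A minimal sketch of the vulnerability class and the fix, assuming a hypothetical `FieldFilter` module (the module and function names here are illustrative, not the serializer's actual API):

```elixir
defmodule FieldFilter do
  # UNSAFE: String.to_atom/1 mints a new atom for every unseen string.
  # Atoms are never garbage collected and the VM's atom table is finite,
  # so attacker-controlled field names can eventually crash the node.
  def parse_unsafe(fields_param) do
    fields_param
    |> String.split(",")
    |> Enum.map(&String.to_atom/1)
  end

  # SAFE: String.to_existing_atom/1 only converts strings whose atom
  # already exists; unknown field names are dropped instead of
  # allocating new atoms.
  def parse_safe(fields_param) do
    fields_param
    |> String.split(",")
    |> Enum.flat_map(fn field ->
      try do
        [String.to_existing_atom(field)]
      rescue
        ArgumentError -> []
      end
    end)
  end
end
```

With this in place, `FieldFilter.parse_safe("title,body,bogus")` yields only the atoms that already exist in the system, so a request full of made-up field names no longer grows the atom table.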
The most notable changes to the code are:

* fast out filtering when no opts are present
* opting for `Map.take` over custom map filtering code
* only generating the list of filter keys if it is needed

## ORIGINAL BENCH

```
## Builder.AttributeBench
optimized opts for correct serializer small data   -- not supported --
optimized opts for correct serializer big data     -- not supported --
field opts for wrong serializer small data        1000000   1.09 µs/op
no field opts to process small data               1000000   1.21 µs/op
no opts to process small data                     1000000   1.22 µs/op
no opts to process big data                       1000000   2.60 µs/op
no field opts to process big data                 1000000   2.85 µs/op
field opts for wrong serializer big data          1000000   2.86 µs/op
fields opts for correct serializer big data        500000   3.43 µs/op
fields opts for correct serializer small data      500000   3.44 µs/op
```

## AFTER BOOST

```
## Builder.AttributeBench
no opts to process small data                    10000000   0.93 µs/op
optimized opts for correct serializer small data 10000000   0.94 µs/op
no field opts to process small data               1000000   1.01 µs/op
optimized opts for correct serializer big data    1000000   1.07 µs/op
field opts for wrong serializer small data        1000000   1.10 µs/op
no opts to process big data                       1000000   2.34 µs/op
no field opts to process big data                 1000000   2.42 µs/op
field opts for wrong serializer big data          1000000   2.84 µs/op
fields opts for correct serializer big data        500000   3.23 µs/op
fields opts for correct serializer small data      500000   3.23 µs/op
```

## SPEED INCREASE BREAKDOWN

* filtering with a pre-calculated filter list: **240%+ faster**
* worst case scenario (non-calculated filter list): **6% faster**
* no field filters present: **17% faster**
* filters present, but not for the correct serializer: **no speed increase**
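The filtering changes can be sketched as follows. This is an illustrative `AttributeBoost` module under assumed names, not the serializer's real implementation; it shows the fast-out clauses, the `Map.take/2` filter, and why pre-calculating the key list once per request avoids the N+1 cost:

```elixir
defmodule AttributeBoost do
  # Fast out: no opts to apply, return the data untouched.
  def filter(data, nil), do: data
  def filter(data, []), do: data

  # Filter with a pre-calculated list of atom keys. Map.take/2 is a
  # single built-in pass over the requested keys, replacing custom
  # map-filtering code. Because the key list was built once per
  # request, each item in a collection costs only this one call,
  # rather than re-splitting and re-converting the query string.
  def filter(data, keys) when is_list(keys), do: Map.take(data, keys)
end
```

Usage follows the once-per-request pattern: compute `keys` a single time from the parsed `fields` param, then map `AttributeBoost.filter(&1, keys)` over the whole collection.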