# Boost Attribute Speed and Address Security Issue
## SECURITY ISSUE

The code was taking data, which in many cases could be externally submitted, and generating atoms from it to filter attribute keys down. This is a problem because atoms are not garbage collected and only a finite number of them can exist. An attacker could craft enough query strings to eventually cripple a running server. Processing the following request, as suggested by the README, would generate 7 potential new atoms.

**EXAMPLE:** `/widgets?fields[widget]=a,aa,aaa,aaaa,ab,aba,abb`

This has been corrected by using `String.to_existing_atom/1`. It should be noted that there is a chance this breaks existing clients that have some kind of dynamic attribute system in which keys are not represented as atoms before this filter applies.

## BOOST BREAKDOWN

The Attribute module has the responsibility of delegating data to a serializer module, filtering out any optional fields given as opts, and then wrapping the results in an `Attribute` structure that can later be used for custom formatting. The most expensive part of this process is wrapping the returned values in an `Attribute` structure, and this part is not avoidable. However, there was a huge opportunity for a speed increase in how the filtering works: it splits strings, turns them into atoms, and then filters data based on that list. Doing this for potentially every item in a collection becomes an N+1 problem, and can be avoided by doing the work only once per request.

This work does not provide a solution for ahead-of-time filtering; however, it will allow for it in the future when it becomes available. When the filter is given a list of pre-calculated atoms, each calculation is about 240% faster for a small, trivial list of fields to filter. This speed increase only grows as the list of fields to filter grows.
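A minimal sketch of the vulnerability class and the fix, assuming a hypothetical `FieldFilter` module (the module and function names here are illustrative, not the serializer's actual API):

```elixir
defmodule FieldFilter do
  # UNSAFE: String.to_atom/1 mints a new atom for every unseen string.
  # Atoms are never garbage collected and the VM's atom table is finite,
  # so attacker-controlled field names can eventually crash the node.
  def parse_unsafe(fields_param) do
    fields_param
    |> String.split(",")
    |> Enum.map(&String.to_atom/1)
  end

  # SAFE: String.to_existing_atom/1 only converts strings whose atom
  # already exists; unknown field names are dropped instead of
  # allocating new atoms.
  def parse_safe(fields_param) do
    fields_param
    |> String.split(",")
    |> Enum.flat_map(fn field ->
      try do
        [String.to_existing_atom(field)]
      rescue
        ArgumentError -> []
      end
    end)
  end
end
```

With this in place, `FieldFilter.parse_safe("title,body,bogus")` yields only the atoms that already exist in the system, so a request full of made-up field names no longer grows the atom table.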
The most notable changes to the code are:

* fast out filtering when no opts are present
* opting for `Map.take` over custom map filtering code
* only generating the list of filter keys if it is needed

## ORIGINAL BENCH

```
## Builder.AttributeBench
optimized opts for correct serializer small data   -- not supported --
optimized opts for correct serializer big data     -- not supported --
field opts for wrong serializer small data        1000000   1.09 µs/op
no field opts to process small data               1000000   1.21 µs/op
no opts to process small data                     1000000   1.22 µs/op
no opts to process big data                       1000000   2.60 µs/op
no field opts to process big data                 1000000   2.85 µs/op
field opts for wrong serializer big data          1000000   2.86 µs/op
fields opts for correct serializer big data        500000   3.43 µs/op
fields opts for correct serializer small data      500000   3.44 µs/op
```

## AFTER BOOST

```
## Builder.AttributeBench
no opts to process small data                    10000000   0.93 µs/op
optimized opts for correct serializer small data 10000000   0.94 µs/op
no field opts to process small data               1000000   1.01 µs/op
optimized opts for correct serializer big data    1000000   1.07 µs/op
field opts for wrong serializer small data        1000000   1.10 µs/op
no opts to process big data                       1000000   2.34 µs/op
no field opts to process big data                 1000000   2.42 µs/op
field opts for wrong serializer big data          1000000   2.84 µs/op
fields opts for correct serializer big data        500000   3.23 µs/op
fields opts for correct serializer small data      500000   3.23 µs/op
```

## SPEED INCREASE BREAKDOWN

* filtering with a pre-calculated filter list: **240%+ faster**
* worst case scenario (non-calculated filter list): **6% faster**
* no field filters present: **17% faster**
* filters present, but not for the correct serializer: **no speed increase**
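The filtering changes can be sketched as follows. This is an illustrative `AttributeBoost` module under assumed names, not the serializer's real implementation; it shows the fast-out clauses, the `Map.take/2` filter, and why pre-calculating the key list once per request avoids the N+1 cost:

```elixir
defmodule AttributeBoost do
  # Fast out: no opts to apply, return the data untouched.
  def filter(data, nil), do: data
  def filter(data, []), do: data

  # Filter with a pre-calculated list of atom keys. Map.take/2 is a
  # single built-in pass over the requested keys, replacing custom
  # map-filtering code. Because the key list was built once per
  # request, each item in a collection costs only this one call,
  # rather than re-splitting and re-converting the query string.
  def filter(data, keys) when is_list(keys), do: Map.take(data, keys)
end
```

Usage follows the once-per-request pattern: compute `keys` a single time from the parsed `fields` param, then map `AttributeBoost.filter(&1, keys)` over the whole collection.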