New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres bytea value integrity broken on import #443

Closed
mwalsher opened this Issue Aug 31, 2017 · 6 comments

Comments

Projects
None yet
2 participants
@mwalsher
Contributor

mwalsher commented Aug 31, 2017

Hi there,

I seem to have uncovered a bug with how values are quoted when it comes to PG's binary "bytea" column. Basically upon import, the integrity of our encrypted data is broken as a result.

I've traced the problem line to here: ActiveRecord::Import:L756: connection_memo.quote(column.type_cast(val), column)

The issue is that column.type_cast(val) is calling PGconn.unescape_bytea on PG bytea column values (http://api.rubyonrails.org/v4.1.16/classes/ActiveRecord/ConnectionAdapters/PostgreSQLAdapter/OID/Bytea.html).

If we have an encrypted string, e.g.: enc_str = "\xE0c\xB2\xB0\xB3Bh\\\xC2M\xB1m\\I\xC4r"

and call PGconn.unescape_bytea(enc_str), we get:

"\xE0c\xB2\xB0\xB3Bh\xC2M\xB1mI\xC4r"

The issue here is that multiple adjacent slashes (e.g. \\\xC2M) are removed (becoming e.g. \XC2M), which breaks integrity of the encryption .

Then connection_memo.quote(...) calls PGconn.escape_bytea(...) with the the mangled encryption string, which results in a successful save, but the encrypted value is no longer valid.

To work around this, we basically need to preemptively escape the encryption string before ActiveRecord::Import unescapes/escapes it. This means each encrypted value is escaped/unescaped/escaped though, which is obviously inefficient.

I guess I'm wondering if it's necessary to call column.type_cast(val) in the aforementioned line?

If that can't be removed, can you think of a better workaround?

Thanks for the help :)

@mwalsher

This comment has been minimized.

Show comment
Hide comment
@mwalsher

mwalsher Sep 1, 2017

Contributor

The issue can further be demonstrated with:

key = "asdf'\x16".force_encoding('ASCII-8BIT')
key2 = PGconn.unescape_bytea(PGconn.escape_bytea(key.dup))
=> "asdf''\x16"
key == key2
=> false

The problem is that PGconn.unescape_bytea is not the inverse of PGconn.escape_bytea.

I've fixed the issue with a monkey patch modifying values_sql_for_columns_and_attributes as follows:

# Returns SQL the VALUES for an INSERT statement given the passed in +columns+
    # and +array_of_attributes+.
    def values_sql_for_columns_and_attributes(columns, array_of_attributes) # :nodoc:
      # connection gets called a *lot* in this high intensity loop.
      # Reuse the same one w/in the loop, otherwise it would keep being re-retreived (= lots of time for large imports)
      connection_memo = connection
      array_of_attributes.map do |arr|
        my_values = arr.each_with_index.map do |val, j|
          column = columns[j]

          # be sure to query sequence_name *last*, only if cheaper tests fail, because it's costly
          if val.nil? && column.name == primary_key && !sequence_name.blank?
            connection_memo.next_value_for_sequence(sequence_name)
          elsif column
            if respond_to?(:type_caster)                                         # Rails 5.0 and higher
              type = type_for_attribute(column.name)
              val = type.type == :boolean ? type.cast(val) : type.serialize(val)
              connection_memo.quote(val)
            elsif column.respond_to?(:type_cast_from_user)                       # Rails 4.2
              connection_memo.quote(column.type_cast_from_user(val), column)
            else                                                                 # Rails 3.2, 4.0 and 4.1
              if serialized_attributes.include?(column.name)
                val = serialized_attributes[column.name].dump(val)
              end
              unless column.type.to_sym == :binary
                val = column.type_cast(val)
              end
              connection_memo.quote(val, column)
            end
          end
        end
        "(#{my_values.join(',')})"
      end
    end

The key part that changed being:

unless column.type.to_sym == :binary
  val = column.type_cast(val)
end
 connection_memo.quote(val, column)
Contributor

mwalsher commented Sep 1, 2017

The issue can further be demonstrated with:

key = "asdf'\x16".force_encoding('ASCII-8BIT')
key2 = PGconn.unescape_bytea(PGconn.escape_bytea(key.dup))
=> "asdf''\x16"
key == key2
=> false

The problem is that PGconn.unescape_bytea is not the inverse of PGconn.escape_bytea.

I've fixed the issue with a monkey patch modifying values_sql_for_columns_and_attributes as follows:

# Returns SQL the VALUES for an INSERT statement given the passed in +columns+
    # and +array_of_attributes+.
    def values_sql_for_columns_and_attributes(columns, array_of_attributes) # :nodoc:
      # connection gets called a *lot* in this high intensity loop.
      # Reuse the same one w/in the loop, otherwise it would keep being re-retreived (= lots of time for large imports)
      connection_memo = connection
      array_of_attributes.map do |arr|
        my_values = arr.each_with_index.map do |val, j|
          column = columns[j]

          # be sure to query sequence_name *last*, only if cheaper tests fail, because it's costly
          if val.nil? && column.name == primary_key && !sequence_name.blank?
            connection_memo.next_value_for_sequence(sequence_name)
          elsif column
            if respond_to?(:type_caster)                                         # Rails 5.0 and higher
              type = type_for_attribute(column.name)
              val = type.type == :boolean ? type.cast(val) : type.serialize(val)
              connection_memo.quote(val)
            elsif column.respond_to?(:type_cast_from_user)                       # Rails 4.2
              connection_memo.quote(column.type_cast_from_user(val), column)
            else                                                                 # Rails 3.2, 4.0 and 4.1
              if serialized_attributes.include?(column.name)
                val = serialized_attributes[column.name].dump(val)
              end
              unless column.type.to_sym == :binary
                val = column.type_cast(val)
              end
              connection_memo.quote(val, column)
            end
          end
        end
        "(#{my_values.join(',')})"
      end
    end

The key part that changed being:

unless column.type.to_sym == :binary
  val = column.type_cast(val)
end
 connection_memo.quote(val, column)
@jkowens

This comment has been minimized.

Show comment
Hide comment
@jkowens

jkowens Sep 1, 2017

Collaborator

@mwalsher what version of Rails are you using? It looks like this might have been addressed in Rails 4.1.9. Do you happen to be on an earlier release?

See: rails/rails@cca85ba

Collaborator

jkowens commented Sep 1, 2017

@mwalsher what version of Rails are you using? It looks like this might have been addressed in Rails 4.1.9. Do you happen to be on an earlier release?

See: rails/rails@cca85ba

@jkowens

This comment has been minimized.

Show comment
Hide comment
@jkowens

jkowens Sep 1, 2017

Collaborator

If I understand that update correctly, it prevents PGconn.unescape_bytea from being called for binary data when type casting.

Collaborator

jkowens commented Sep 1, 2017

If I understand that update correctly, it prevents PGconn.unescape_bytea from being called for binary data when type casting.

@mwalsher

This comment has been minimized.

Show comment
Hide comment
@mwalsher

mwalsher Sep 1, 2017

Contributor

We're using 4.1.16. I saw that as well, but the encrypted bytes in many cases do not have the /\x00/ character.

Contributor

mwalsher commented Sep 1, 2017

We're using 4.1.16. I saw that as well, but the encrypted bytes in many cases do not have the /\x00/ character.

@jkowens

This comment has been minimized.

Show comment
Hide comment
@jkowens

jkowens Sep 1, 2017

Collaborator

Ah I see, so that is definitely a shortcoming there 😄

Could you possibly put together a PR with a failing test?

Collaborator

jkowens commented Sep 1, 2017

Ah I see, so that is definitely a shortcoming there 😄

Could you possibly put together a PR with a failing test?

@mwalsher

This comment has been minimized.

Show comment
Hide comment
@mwalsher

mwalsher Sep 2, 2017

Contributor

Yup np, I'll take a stab at one next week. Thanks for looking into it!

Contributor

mwalsher commented Sep 2, 2017

Yup np, I'll take a stab at one next week. Thanks for looking into it!

mwalsher added a commit to mwalsher/activerecord-import that referenced this issue Sep 6, 2017

mwalsher added a commit to mwalsher/activerecord-import that referenced this issue Sep 6, 2017

@jkowens jkowens closed this in c38ac0d Sep 7, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment