Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC issue with string view #530

Closed
andrewray opened this issue Jul 3, 2017 · 6 comments
Closed

GC issue with string view #530

andrewray opened this issue Jul 3, 2017 · 6 comments

Comments

@andrewray
Copy link
Contributor

I have written the following wrapper for a hashing function from the cmph library.

It uses a string coercion to pass a pointer to a char buffer. Running 10K times leads to occasional incorrect values being calculated.

Different runs produce values, though always on the same iteration. Adding a Gc.full_major after every nth loop will change which iteration it occurs on.

(* cmph_uint32 jenkins_hash_packed(void *jenkins_packed, const char *k, cmph_uint32 keylen) *)
  let hash = 
    let f = 
      Foreign.foreign "jenkins_hash_packed"
        (ptr uint @-> string @-> uint @-> returning uint)
    in
    let iseed = Unsigned.UInt.of_int64 0xdeadbeefL in
    let pseed = allocate uint Unsigned.UInt.zero in
    (fun ?(seed=iseed) data ->
       pseed <-@ seed;
       f pseed data (Unsigned.UInt.of_int (String.length data)))

The following, which allocates a CArray explicitly, seems to be fine.

  let hash = 
    let f = 
      Foreign.foreign "jenkins_hash_packed"
        (ptr uint @-> ptr char @-> uint @-> returning uint)
    in
    let iseed = Unsigned.UInt.of_int64 0xdeadbeefL in
    let pseed = allocate uint Unsigned.UInt.zero in
    (fun ?(seed=iseed) data ->
       let len = String.length data in
       pseed <-@ seed;
       let arr = CArray.make char len in
       for i=0 to len-1 do
         CArray.set arr i data.[i]
       done;
       f pseed (CArray.start arr) (Unsigned.UInt.of_int len))

Any thoughts on what might be happening?

@hhugo
Copy link
Contributor

hhugo commented Jul 3, 2017

#159

@andrewray
Copy link
Contributor Author

Thanks. If this is the case, can it ever be safe to use a string view to pass data to a function? I can see no way to enforce the lifetime of the internal array.

@yallop
Copy link
Owner

yallop commented Jul 4, 2017

It's possible the same problem as #159, but it's not immediately clear. The problem in #159 involves pointers to strings, not just plain strings.

The code you've posted looks fine. What does the call to hash look like?

@andrewray
Copy link
Contributor Author

This is the test code calling the above. The HardCamlHash.Jenkins.hash function is a pure ocaml implementation.

module U = Unsigned.UInt

let () = 
  for i=0 to 100000 do
    let len = Random.int 100 + 1 in (* 1..100 length strings *)
    let str = String.init len (fun _ -> Char.chr (Random.int 256)) in
    for j=0 to 3 do (* test a few different seeds *)
      let seed = U.of_int64 (Random.int64 0x100000000L) in
      let hash1 = Cmph.Capi.Jenkins.hash ~seed str in
      let hash2 = HardCamlHash.Jenkins.hash ~seed str in
      if hash1 <> hash2 then begin
        Printf.printf "[%.8Lx] {%i} %.8Lx %.8Lx\n%!"
          (U.to_int64 seed)
          len
          (U.to_int64 hash1)
          (U.to_int64 hash2)
      end;
    done;
  done

Running the string view version above twice gives:

andyman@brumble:~/dev/github/ujamjar/hardcaml/hardcaml-hash (master)*$ ../_build/default/hardcaml-hash/tools/testhash.exe 
[d24ffb9e] {98} c07ff411 71564d56
[49078f84] {58} c4377e01 b1f472e2
[1014fb1c] {61} fa1b7beb 9f1194d5
[cf9c8dec] {32} baec3be4 991cf4aa
[fff20b54] {28} d257df4c 386e9001
[451efbf1] {41} 264c7b58 f3f005ae
[44a945f2] {12} 7caabeb4 48214533
[0827e551] {83} eabe3e6e bfee3b5a
[ddd00122] {68} d1efe9f4 4a5eec52
[b5f2f784] {68} a0271ace d8fd834d
[f68d77a1] {11} 29f3e929 f5cafab1
[1bef37d8] {18} a87d7a22 37589be2
[cae7f596] {63} dbb75536 acc1729e
[3d5a9126] {92} 3599e739 7c28d539
[2a356889] {90} 0e56aea5 18bf07a6
[cbd2e828] {62} 5a917b3e 02faaf93
[b89c84cf] {4} 926e4034 d863565d
[bb7c2048] {74} 7692ad09 891bbe7b
[fbec7d47] {78} b7a3eabd ae0e3cd5
[6b5046ad] {23} ae4195c0 47a94961
[05056869] {99} bb1e9091 af63c2a4
[90c87adc] {69} bee9b9db db8ac41d
[cafe0cbd] {49} fda46c4e 9aa0c786
[40b21098] {95} 08217ec2 2302d68d
andyman@brumble:~/dev/github/ujamjar/hardcaml/hardcaml-hash (master)*$ ../_build/default/hardcaml-hash/tools/testhash.exe 
[d24ffb9e] {98} bd3a74b6 71564d56
[49078f84] {58} 1764f778 b1f472e2
[1014fb1c] {61} c7d3668d 9f1194d5
[cf9c8dec] {32} 805f2049 991cf4aa
[fff20b54] {28} 9b117434 386e9001
[451efbf1] {41} dd4b9be0 f3f005ae
[44a945f2] {12} 794f59d7 48214533
[0827e551] {83} ec078818 bfee3b5a
[ddd00122] {68} f8177be8 4a5eec52
[b5f2f784] {68} b4905f34 d8fd834d
[f68d77a1] {11} 4ca01a75 f5cafab1
[1bef37d8] {18} b8397962 37589be2
[cae7f596] {63} b8e235aa acc1729e
[3d5a9126] {92} 920be660 7c28d539
[2a356889] {90} 37234f79 18bf07a6
[cbd2e828] {62} c6245da9 02faaf93
[b89c84cf] {4} 673d26cf d863565d
[bb7c2048] {74} 19b2fd51 891bbe7b
[fbec7d47] {78} b0c63281 ae0e3cd5
[6b5046ad] {23} 088fcdbe 47a94961
[05056869] {99} 69b75377 af63c2a4
[90c87adc] {69} bdc62413 db8ac41d
[cafe0cbd] {49} a2d25078 9aa0c786
[40b21098] {95} 4e805fe6 2302d68d

Note that the OCaml version is producing the same values each time, and the C bound version differs in the 2 runs. Using the CArray version no errors are printed.

@andrewray
Copy link
Contributor Author

@yallop
Copy link
Owner

yallop commented Jul 5, 2017

Thanks for the detailed report. I can reproduce the bug here, and I think I've found the root cause: intermediate values generated by views, including string, are sometimes prematurely collected when passed as arguments to Foreign calls (i.e. dynamically-bound functions, but not generated stubs). There'll be a ctypes point release shortly with a fix for the problem (#531).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants