Memory allocation error - incorrect checksum for freed object #871

Closed
wehriam opened this issue Mar 25, 2019 · 17 comments

Comments

@wehriam

wehriam commented Mar 25, 2019

Description

Memory allocation errors occur when launching 100 servers. They do not occur when launching 10 servers in the same configuration.

  • OS: macOS 10.14
  • Node: Homebrew v10.15.2

Code to reproduce

https://github.com/bunchtogether/braid-server/blob/master/tests/peers-ring.test.js

(Apologies, I have been unable to reproduce in a simpler example.)

Output

node(65675,0x1091595c0) malloc: Incorrect checksum for freed object 0x104819200: probably modified after being freed.
Corrupt value: 0x1baffed00baffedf
node(65675,0x1091595c0) malloc: *** set a breakpoint in malloc_error_break to debug

Extended Description

The Braid application syncs key-value pairs between a small network of peers and a larger network of read-only subscribers. All communication between the peer and subscriber processes occurs over WebSockets.

In this test, 100 uWebSockets.js servers are created and linked in a ring using ws clients, i.e. peer A opens a connection to peer B, peer B opens a connection to peer C, and peer C opens a connection to peer A.

When the test is reduced to 10 uWebSockets.js servers, the error does not occur.
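
For reference, the ring wiring described above can be sketched as a small helper. This is only an illustration under assumed names: ringTargets, basePort, and the returned field names are hypothetical and do not come from braid-server.

```javascript
// Hypothetical helper (not part of braid-server): for a ring of `count`
// peers listening on consecutive ports, compute which port each peer dials.
const ringTargets = (basePort, count) => {
  const targets = [];
  for (let i = 0; i < count; i += 1) {
    targets.push({
      port: basePort + i,
      // Each peer connects to its successor; the last peer wraps to the first.
      connectPort: basePort + ((i + 1) % count),
    });
  }
  return targets;
};

console.log(ringTargets(20000, 3));
```

Every peer ends up with exactly one outgoing and one incoming connection, so 100 servers means 100 client sockets and 100 server-side sockets alive at once.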

Thanks

Thanks for the great work and let me know if there is any additional information I can provide.

@wehriam
Author

wehriam commented Mar 25, 2019

I was not able to reproduce the error using the code below, but have included it for reference in the event you would like a similar setup.

const WebSocket = require('ws');
const uWS = require('uWebSockets.js');

const danglingWs = new Map();

const launchServerAndClient = async (port, connectPort) => {
  const app = uWS.App({});
  app.ws('/*', {
    compression: 0,
    maxPayloadLength: 16 * 1024 * 1024,
    idleTimeout: 10,
    open: (ws) => {
      const id = Math.random().toString();
      danglingWs.set(id, ws);
      const b = Buffer.from(id);
      ws.send(b.buffer.slice(b.byteOffset, b.byteOffset + b.byteLength));
    },
  });
  const listenSocket = await new Promise((resolve, reject) => {
    app.listen(port, (token) => {
      if (token) {
        resolve(token);
      } else {
        reject(new Error(`Unable to listen on port ${port}`));
      }
    });
  });
  let ws;
  return {
    connect: async () => {
      ws = new WebSocket(`ws://localhost:${connectPort}`);
      const messagePromise = new Promise((resolve, reject) => {
        ws.once('message', (data) => {
          console.log({ data });
          resolve();
        });
        ws.once('error', reject);
      });
      await messagePromise;
    },
    stop: async () => {
      await new Promise((resolve, reject) => {
        ws.once('close', resolve);
        ws.once('error', reject);
        ws.close();
      });
      uWS.us_listen_socket_close(listenSocket);
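      // Touch every server-side socket after shutdown. A distinct loop
      // variable avoids shadowing the client `ws` declared above.
      for (const serverWs of danglingWs.values()) {
        // Modifying the socket here does not seem to trigger the error
        serverWs.foo = Math.random().toString();
      }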
    },
  };
};

const count = 100;

const run = async () => {
  const serverClientPairs = [];
  for (let i = 0; i < count; i += 1) {
    const port = 20000 + i;
    const connectPort = i === 0 ? 19999 + count : 19999 + i;
    const serverClientPair = await launchServerAndClient(port, connectPort);
    serverClientPairs.push(serverClientPair);
  }
  await Promise.all(serverClientPairs.map((x) => x.connect()));
  await Promise.all(serverClientPairs.map((x) => x.stop()));
};

run();
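
A note on the ws.send call in the open handler above: it slices the Buffer's backing ArrayBuffer rather than passing b.buffer directly. The sketch below shows why that slice matters, assuming standard Node.js Buffer behavior. The helper name toArrayBuffer is hypothetical and used only for illustration.

```javascript
// Hypothetical helper: copy exactly the bytes a Buffer views into a
// standalone ArrayBuffer. Small Buffers may be views into a larger shared
// allocation, so `buf.buffer` alone can contain unrelated bytes.
const toArrayBuffer = (buf) =>
  buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);

const b = Buffer.from('0.12345');
const ab = toArrayBuffer(b);

// The backing store is at least as large as the view, and often larger.
console.log(b.buffer.byteLength >= b.byteLength);
// The sliced copy holds only the message bytes.
console.log(ab.byteLength === b.byteLength);
```

Because slice copies the bytes, the resulting ArrayBuffer is independent of the original Buffer's memory.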

@ghost ghost transferred this issue from uNetworking/uWebSockets.js Apr 3, 2019
@ghost

ghost commented Apr 3, 2019

Could you try out 15.9.0?

@wehriam
Author

wehriam commented Apr 4, 2019

Still present in 15.9.0:

node(92178,0x10af8e5c0) malloc: Incorrect checksum for freed object 0x10517ac00: probably modified after being freed.
Corrupt value: 0x1baffed00baffedf
node(92178,0x10af8e5c0) malloc: *** set a breakpoint in malloc_error_break to debug

@ghost

ghost commented Apr 4, 2019

You have to strip down your code and provide instructions. I don't know what to do with it.

@wehriam
Author

wehriam commented Apr 4, 2019

To run the tests and trigger the error you can “yarn install” then “yarn test” from the linked repo - but I wasn’t able to create an example that reproduces the error building from the bottom up. (See the previous code.)

I’ll try taking things out to see if I can isolate the issue.

@ghost

ghost commented Apr 4, 2019

I figured npm install followed by npm test would work, but I got an error during npm install. So yarn install followed by yarn test should work better?

@wehriam
Author

wehriam commented Apr 5, 2019

I've updated the repo to work with npm. Running npm install followed by npm run test:js should work.

git clone https://github.com/bunchtogether/braid-server.git
cd braid-server/
npm install
npm run test:js

@wehriam
Author

wehriam commented Apr 5, 2019

Also, while I can still reproduce the error with the test suite, it does seem to be less frequent with 15.9.0, happening ~30% of the time instead of 90% with 15.8.0.

@ghost

ghost commented Apr 5, 2019

How can I run that test without any other process or mid-step? Only vanilla nodejs, no test runner or anything. These files don't contain JavaScript; it's some kind of typed thing.

I need to run only nodejs and only that one script that triggers the issue.

@wehriam
Author

wehriam commented Apr 6, 2019

These files use Flow, which is similar to TypeScript. I’ll work on a version that triggers the error without the test framework or type checking.

@ghost

ghost commented Apr 9, 2019

Did you make any progress? I would like to run AddressSanitizer on the nodejs process that runs this test. But I don't know how to do that with all of these test runner parent processes spawning things to the left and right.

wehriam added a commit to bunchtogether/braid-server that referenced this issue Apr 9, 2019
@wehriam
Author

wehriam commented Apr 9, 2019

Frustratingly, I am unable to reproduce the error outside of the tests, which leads me to believe the issue is caused by Jest's process spawning.

Jest spawns child process workers, and I've tried to mirror that environment as closely as possible.

You can run an example - including child process spawning - with the following command:

node tests/malloc-example-spawner.js

And the virtually identical test:

node ./node_modules/jest/bin/jest.js --config "{}" tests/malloc-example.test.js

Roughly 1/3 of the test executions report the error:

 PASS  tests/malloc-example.test.js
  ✓ Should peer then close gracefully (2884ms)

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total
Snapshots:   0 total
Time:        4.242s
Ran all test suites matching /tests\/malloc-example.test.js/i.
node(56023,0x10b7a15c0) malloc: Incorrect checksum for freed object 0x10489f200: probably modified after being freed.
Corrupt value: 0x1baffed00baffedf
node(56023,0x10b7a15c0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

@ghost

ghost commented Apr 9, 2019

Even if you don't see any error, how can I run without spawning any new process?

@wehriam
Author

wehriam commented Apr 9, 2019

To run the example without spawning:

node tests/malloc-example.js

@ghost

ghost commented Apr 9, 2019

Perfect, will try this out

@jeremy-j-ackso

Chiming in with something sort of relevant: I started watching this issue because it's similar to an error I'm coming across. I have a C addon that I wrote for Node, but I'm only using Mocha for my test framework. I also only get the error when running my tests. Practice samples seem to work fine.

I'm starting to think that it may be a problem related to mocha with native addons. Mocha is a dependency of Jest, so could be relevant?

@ghost

ghost commented Apr 15, 2019

I cannot reproduce this on Linux and I own no Apple products. It may be something about forking the process on non-Linux systems, as it really only has been tested on Linux.

Really, spawning multiple processes on non-Linux is completely untested.

You say it only happens when spawning processes? Can you try it on Linux instead?

@ghost ghost closed this as completed Apr 16, 2019