Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uca_phantom_get_variable freezes after too many immediate consecutive calls #7

Closed
gabs1234 opened this issue Feb 13, 2023 · 4 comments
Assignees
Labels
bug Something isn't working prio:low Low priority issue

Comments

@gabs1234
Copy link
Collaborator

gabs1234 commented Feb 13, 2023

Problem description

Behaviour is undefined when doing the following:

    GError *error = NULL;
    GValue val = G_VALUE_INIT;
    gboolean result = FALSE;

    UcaPhantomCommunicate *communicator = uca_phantom_communicate_new();
    gboolean connected = uca_phantom_communicate_attempt_connect(communicator, &error);

    if (!connected) {
        g_print ("Houston theres a problem: %s\n", error->message);
        g_error_free (error);
        g_object_unref (communicator);
        return FALSE;
    }

    struct timespec req, rem;
    req.tv_sec = 0;
    req.tv_nsec = 5e+6;

    for (int i=0; i < N_UNIT_PROPERTIES; i++) {
        result = uca_phantom_get_variable (communicator, i, &val, &error);
        if (!result && error != NULL) {
            g_print ("Houston we've got a problem: %s\n", error->message);
            g_error_free (error);
            g_object_unref (communicator);
            return FALSE;
        }
        print_gvalue (i, &val);
        g_value_unset (&val);
        // nanosleep(&req, &rem);
    }

    g_object_unref (communicator);

Indeed, after a certain and seemingly random amount of requests, the program feezes. However, by adding a blocking sleep command at the end of the loop, all requests are made and all answers are received.

Valgrind output

[...]
==45048== Process terminating with default action of signal 2 (SIGINT)
==45048==    at 0x4D01D7F: poll (poll.c:29)
==45048==    by 0x4AB12AD: g_socket_condition_timed_wait (in /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.7200.4)
==45048==    by 0x4AB161E: ??? (in /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.7200.4)
==45048==    by 0x4A8F1F6: g_input_stream_read (in /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.7200.4)
==45048==    by 0x10CDF9: uca_phantom_communicate (uca-phantom-communicate.c:517)
==45048==    by 0x10CDF9: uca_phantom_get_variable (uca-phantom-communicate.c:670)
==45048==    by 0x10B9B4: main (test_network.c:51)
==45048== 
==45048== HEAP SUMMARY:
==45048==     in use at exit: 278,549 bytes in 2,329 blocks
==45048==   total heap usage: 11,650 allocs, 9,321 frees, 9,845,914 bytes allocated
==45048== 
==45048== LEAK SUMMARY:
==45048==    definitely lost: 14 bytes in 1 blocks
==45048==    indirectly lost: 0 bytes in 0 blocks
==45048==      possibly lost: 3,216 bytes in 4 blocks
==45048==    still reachable: 250,999 bytes in 2,089 blocks
==45048==         suppressed: 0 bytes in 0 blocks
==45048== Rerun with --leak-check=full to see details of leaked memory
==45048== 
==45048== For lists of detected and suppressed errors, rerun with: -s
==45048== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The memory leak is likely due to the communicator object not being freed after the ctl+c signal.

@gabs1234 gabs1234 self-assigned this Feb 13, 2023
@gabs1234 gabs1234 added the bug Something isn't working label Feb 13, 2023
@gabs1234
Copy link
Collaborator Author

gabs1234 commented Feb 13, 2023

sizeof(communicator) gives 8. So there's actually some memory leaking going on. With --leak-check=full, I find:

==45885== HEAP SUMMARY:
==45885==     in use at exit: 278,165 bytes in 2,329 blocks
==45885==   total heap usage: 6,583 allocs, 4,254 frees, 2,483,386 bytes allocated
==45885== 
==45885== 14 bytes in 1 blocks are definitely lost in loss record 139 of 1,927
==45885==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==45885==    by 0x4935738: g_malloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7200.4)
==45885==    by 0x494A583: g_strdup (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7200.4)
==45885==    by 0x4A8581C: g_inet_address_to_string (in /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.7200.4)
==45885==    by 0x10C70E: uca_phantom_communicate_discover (uca-phantom-communicate.c:456)
==45885==    by 0x10C70E: uca_phantom_communicate_attempt_connect (uca-phantom-communicate.c:565)
==45885==    by 0x10B982: main (test_network.c:37)

So a malloc isn't being free'd before I ctrl+c.

@gabs1234
Copy link
Collaborator Author

gabs1234 commented Feb 13, 2023

So actually I introduced the memroy leak by letting github copilot malloc something while forgetting to free it... Darn AI.

That being said, the main issue persists.

@gabs1234
Copy link
Collaborator Author

gabs1234 commented Feb 16, 2023

Probable cause

I might simply be overwhelming the IO capacity of the phantom camera (or the PC ? seems unlikely).

This would explain why the only solution to the problem is adding some waiting time between each call.

Solutions:

  1. throttling of data stream (i.e. reduce the frequency)
  2. use async methods
    • write_all_async
    • or even buffering (new_thread + queue)

@gabs1234 gabs1234 added the prio:low Low priority issue label Feb 16, 2023
@gabs1234
Copy link
Collaborator Author

Issue was fixed in cad4a3c: was caused by some trailing carriage return

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working prio:low Low priority issue
Projects
None yet
Development

No branches or pull requests

1 participant