Troubleshooting VRPN

Russell Taylor edited this page Mar 30, 2015 · 5 revisions

This page contains a list of problems that users have encountered with VRPN and the solutions to these problems.

My client code crashes when I use VRPN

If the vrpn_print_devices program is able to connect to your server and run properly, then VRPN itself is most likely working correctly. The most common cause of crashes in client code is calling VRPN from more than one thread. VRPN is not thread-safe: all access to VRPN devices should either happen within a single thread or be protected by a semaphore. Remember that calling mainloop() on one object can cause callbacks to be triggered on any object.
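One way to follow the advice above is to route every VRPN call through a single shared lock. The sketch below is only an illustration: FakeDevice, vrpn_lock() and guarded_mainloop() are hypothetical names that are not part of VRPN, and FakeDevice stands in for any object with a mainloop() method.

```cpp
#include <mutex>

// Hypothetical stand-in for any VRPN object that has a mainloop() method
// (for example a remote tracker); here it only counts calls.
struct FakeDevice {
    int mainloop_calls = 0;
    void mainloop() { ++mainloop_calls; }
};

// One process-wide lock shared by ALL VRPN objects. A per-object lock would
// not be enough, because mainloop() on one object can trigger callbacks on
// any other object.
inline std::mutex &vrpn_lock() {
    static std::mutex m;
    return m;
}

// Call mainloop() on a device while holding the shared lock.
template <typename Device>
void guarded_mainloop(Device &dev) {
    std::lock_guard<std::mutex> hold(vrpn_lock());
    dev.mainloop();
}
```

Any other access to the devices (registering handlers, sending messages, deleting objects) would need to take the same lock. Keeping all VRPN work in one thread avoids the need for the lock entirely.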

Razer Hydra Cannot Open Device on Windows

If you are using the Windows drivers for the Hydra, this prevents the VRPN server from opening the device directly. Uninstall the drivers if you want to use VRPN to manage the device.

Cannot compile Python with CMake on Linux

Version 2.6 of CMake does not correctly compile the Python libraries on VRPN versions 07.29 and above. Switching to CMake version 2.8.4 should let you compile VRPN out of the box without changing any settings.

Cannot bind address

Problem: While launching VRPN Server, I get this error message: "open_socket : cannot bind address -- 0 -- No Error".

Cause: There is probably another VRPN server running on the machine. You can run multiple devices through one server, but there can only be one server running on port 3883 (the default VRPN port). If you need to run two servers, use the command-line option on vrpn_server to change its port number.

No other program should be using the IANA-registered port number 3883 for VRPN, but it is possible that another program is running on this port.

USB HID device not working under Linux

We had this problem with the Xkeys device. The max packet size of the HID endpoints was causing us problems: the reports were longer than the max packet size, so they were fragmented across two packets. The libusb implementation of HIDAPI doesn't really handle this correctly. If you find another device that gives trouble, please email the list server to let us know. We'll either fix it, or start down the path of one of the more general fixes below.

Alan Ott says: If you want to, try adjusting the length in hid-libusb.c line 658 (const size_t length). Right now, it's set to the endpoint max packet length. The way USB works, it treats every packet which is the same length as the endpoint max packet length to mean that there is another packet (continuation) which is coming. It treats them all this way until there is a packet shorter than the max packet length. A short packet indicates the end of transfer. A zero-length packet needs to be sent if the transfer length is equal to the max packet length.

On top of that, libusb will buffer them up until there is either a short packet or the buffer is filled up (which can be adjusted by the size_t length variable I mentioned above).

If you want to try another work-around, the hidraw implementation (change the include to use linux/hid.c instead of linux/hid-libusb.c) should handle this correctly, but read the README about using the hidraw implementation. We had trouble using hidraw when we switched VRPN from native implementations to HIDAPI.

Don't get any data from server... "No data from ..."

The first thing to check is whether the server is getting a connection request from the client. If so, when the client is run the server should print "vrpn: Connection request received: 127.0.0.1 4260", where the first number is the IP address of the machine the client is running on and the second is the port number it is using. (There will also be a "Can't read header" message or something when the client exits.) If you are not getting these messages on the server when you run the client, then you should check the machine name or IP address used by the client.

The second is to check whether the name of the device matches on the client and the server. The server configuration file (vrpn.cfg by default) has names associated with each entry. The test entry is a NULL tracker named "Tracker0". If you uncomment the line for this in the config file and then run the server, you should be able to connect to it as "Tracker0@MACHINE" where you replace MACHINE with the DNS name or IP address of the host where the server is running. If you are running the client and server on the same machine, "Tracker0@localhost" or "Tracker0@127.0.0.1" should work. Use the vrpn_print_devices application to test whether the client name is correct. If the name is wrong, you'll keep getting the timeout messages. If it is correct, they should go away.

Why can't I get my Callback handler to be recognized?

All VRPN callback functions need the VRPN_CALLBACK declaration type, which maps to __stdcall on Windows. This was a deliberate choice made to keep everyone on the same default callback convention, so that people could use others' VRPN libraries (static and dynamic) and not get random stack crashes. See the client_and_server application for an example.
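The effect of the calling-convention macro can be sketched without VRPN itself. In the example below, DEMO_CALLBACK, DemoReport, DemoHandler and DemoDevice are hypothetical stand-ins, not VRPN names; in real client code a tracker handler would be declared along the lines of void VRPN_CALLBACK handle_tracker(void *userdata, const vrpn_TRACKERCB t).

```cpp
#include <cassert>

// Stand-in for VRPN_CALLBACK (hypothetical name): __stdcall on Windows,
// nothing on other platforms.
#if defined(_WIN32) && !defined(__CYGWIN__)
#define DEMO_CALLBACK __stdcall
#else
#define DEMO_CALLBACK
#endif

// Hypothetical report type standing in for a VRPN callback structure.
struct DemoReport {
    int sensor;
    double pos[3];
};

// The handler type carries the calling convention. On 32-bit Windows builds a
// handler declared without DEMO_CALLBACK has a different type, so passing it
// to register_change_handler() below fails to compile.
typedef void (DEMO_CALLBACK *DemoHandler)(void *userdata, const DemoReport info);

// Minimal device that stores one handler and can deliver a report to it.
struct DemoDevice {
    void *userdata = nullptr;
    DemoHandler handler = nullptr;
    void register_change_handler(void *ud, DemoHandler h) {
        userdata = ud;
        handler = h;
    }
    void deliver(const DemoReport &r) {
        if (handler) handler(userdata, r);
    }
};

// A correctly declared handler: the macro goes between the return type and
// the function name.
void DEMO_CALLBACK handle_report(void *userdata, const DemoReport info) {
    assert(userdata != nullptr);
    *static_cast<int *>(userdata) = info.sensor;  // record which sensor reported
}
```

If a handler "is not recognized" at compile time on Windows, a missing calling-convention macro on the handler's declaration is the first thing to check.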

I can't get my SpaceNavigator to report values

The 3DConnexion driver (at least on the Mac) causes input data to be ignored by VRPN depending on which window has active focus. Because the VRPN driver talks directly to the SpaceNavigator as a user-level driver, the system driver should not be running at the same time. To talk with the various SpaceX devices from 3DConnexion, you may need to uninstall the drivers that come with the device.

I can't get my Flock to work

The VRPN drivers used to only work with flocks that are in the "Flock of Birds" configuration, where the extended-range transmitter unit is not also a receiver unit. David Nahon has added an option in VRPN 06.05 that works with flocks that are in "Standalone" mode with the transmitter and receiver plugged into the same unit.

All released versions of VRPN through 06.03 have been tested with a parallel flock connected to a Linux server (that's how we run at UNC). Versions 06.00 and 06.03 have been tested with a Flock in the daisy-chained configuration running on a Windows 2000 machine (it worked with 1, 2, and 3 sensors connected).

Version 06.05 may work with transmitters that are plugged in as other than the first unit, but this has not been tested. Usually (for the non-parallel case), you plug the ERC into the computer via RS-232 and then daisy-chain the receivers from the ERC. This configuration also works with CAVElib, so both can be used with systems connected that way.

Make sure that the baud rate on the serial port is set to match the one being used by the Flock (the dip switches determine this, as per the manual). If they don't match, the tracker continually resets and never gets any characters from the Flock (although the lights sometimes flash).

Make sure that the baud rate is fast enough. We find that a rate of 38400 is not fast enough to keep up with three daisy-chained sensors (the tracker runs for a very short while, then times out and resets, and repeats). 115200 is fast enough for three sensors (and why would you run slower?).
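A rough way to judge whether a baud rate can keep up is to compare the bytes the link can carry per second with the bytes the sensors produce. The sketch below assumes 8N1 serial framing (10 bits on the wire per byte); the function names are hypothetical, and the record size passed in is an illustrative input, not a value taken from the Flock manual.

```cpp
#include <cassert>

// Bytes per second a serial link can carry, assuming 8N1 framing
// (1 start bit + 8 data bits + 1 stop bit = 10 bits per byte).
long serial_bytes_per_second(long baud) {
    assert(baud > 0);
    return baud / 10;
}

// Upper bound on complete update cycles per second when every sensor sends
// one record of record_bytes bytes per cycle.
double max_cycles_per_second(long baud, int sensors, int record_bytes) {
    assert(sensors > 0 && record_bytes > 0);
    return static_cast<double>(serial_bytes_per_second(baud)) /
           (static_cast<double>(sensors) * record_bytes);
}
```

For example, with three sensors and a hypothetical 15-byte record, 38400 baud allows at most about 85 cycles per second, while 115200 baud allows 256. If the device reports faster than the bound for your settings, the link cannot keep up.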

Make sure that you have only the transmit, receive, and ground lines on the serial cable connected to the Flock. Sebastien Maraux reported the following behavior when using his flock in daisy-chained mode:

"I have to initialize the flock with cbird.exe (flock utility) and use 'change value' (parameter 15 in main menu), and 'autoconfig FBB' (choice 16 in submenu), then set number of devices to 3 in order to make vrpn server work after each ERC boot. Rebooting flock and trying to use vrpn server directly leads to never ending 'POLL-RESET-SETTING PARAMETERS-WAITING FOR RESPONSE'. To be more precise, I can make VRPN work if I use cbird to initialize flock, then quit with alt F4 (not closing properly) and then launch VRPN server. I also noticed that I always need to call submenu 15 of cbird twice to get it working (first call always returns: rx line errors have occurred, could not read bird status). I don't know if these two things are related? (as vrpn makes several attempts automatically)."

David Nahon and Kyle Smith reported the following fix:

"Have you tried to use a cable with only Tx, Rx and GND wires? I haven't read your mail very carefully, but it looks like this is the problem. Indeed, cbird and winbird work fine with a more wired cable, but vrpn doesn't. Just for clarification, Tx, Rx, and GND are pins 2, 3, and 5. You can remove others from the male end of the cable easily with needle-nose pliers."

I don't hear from the buttons on my IS-900 Wand/Stylus

It seems likely that the "remote" button object is defined with the name of the tracking device, rather than with the name of the button device. The button device name is defined on the stylus line in the config file, and might not be the same as the tracker. In the case described by the example configuration below, if the name of the machine running the server is "myis900server" you would connect to the tracker as "Isense900@myis900server" and to the buttons on the stylus as "Stylus0@myis900server":

vrpn_Tracker_Fastrak Isense900 COM1 115200 /
Wand Wand0 0 -1.0 0.0 0.0 1.0 -1.0 0.0 0.0 1.0 /
Stylus Stylus0 2

If you connect the button to "Isense900@myis900server", it will not complain, because there is a device with that name at the other end, but it will never receive button reports from it either.

No "unreliable" messages seem to get through, can't send UDP

Step 1: upgrade to version 7.15 or higher; there was a bug in earlier versions that caused this to happen on some computers with more than one network connection.

Step 2: We had trouble with VRPN when going through firewalls at high schools. To fix it, we implemented a TCP-only connection: you can connect to device@tcp://machine:port rather than device@machine:port. The client then makes a single outgoing TCP connection to the server, and no unreliable (UDP) channel is established between them. This should also work to avoid firewalls on the client computer.

We've not experimented with Linux firewalls, but here is what is needed to enable the full reliable+unreliable VRPN connection mechanism. The client sends a UDP message to the server telling it the number of a TCP port on the client, chosen at random at runtime. The server then makes a TCP connection back to the client computer on that port. Once this is established, each computer picks a UDP port, again chosen at random at runtime, at which it will receive packets, and tells the other side. Unfortunately for security, this means accepting incoming TCP connections on all ports on the client, and accepting UDP packets on all ports on both the client and the server. In practice, you probably only need to open up the non-system ports (greater than port 1024) on these systems.

This complicated connection-establishment procedure exists to enable wait-free testing for connection to hosts that are up, hosts that are down, hosts that have hung processes, and clients/servers that have multiple VRPN connections going on at once. Simpler schemes tried earlier each suffered from multiple-second blocking at various portions of the connection scheme. The TCP-only scheme can cause blocking if the server computer is in certain states. However, it only requires opening up the server to accept connections on the specific port that VRPN is set to listen on.
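The two name forms described in this section (device@machine:port and device@tcp://machine:port) can be assembled with a small helper. This is a minimal sketch: make_device_name() is a hypothetical function, not part of VRPN, and leaving out the ":port" suffix when no port is given is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the name string handed to a VRPN remote object.
// port <= 0 omits the ":port" suffix; tcp_only selects the "tcp://" form.
std::string make_device_name(const std::string &device, const std::string &host,
                             int port = 0, bool tcp_only = false) {
    assert(!device.empty() && !host.empty());
    std::string name = device + "@";
    if (tcp_only) name += "tcp://";
    name += host;
    if (port > 0) name += ":" + std::to_string(port);
    return name;
}
```

For example, make_device_name("Tracker0", "myserver", 3883, true) yields "Tracker0@tcp://myserver:3883", where myserver is a placeholder host name.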

I can't make sphere_client work with my Microsoft FF Joystick

Getting this going requires at least three things be set up just right.

First, you need to make sure that you have set up the vrpn.cfg file so that the joystick looks like a Phantom; this requires that it be exporting a Button, ForceDevice, and Tracker interface all with the same device name. If the device names do not match, or it is not exporting the tracker interface, then the device will always pull the same direction (maybe jerking around some) and you won't see any messages in the sphere_client window about the sensor being at different locations. A configuration file that will drive the device properly and can be connected to with the name Joystick0 is listed here (the first entry creates the ForceDevice, Button, and Analog interfaces; the second reads the Analogs and turns them into a Tracker):

vrpn_DirectXFFJoystick Joystick0 60 200

vrpn_Tracker_AnalogFly Joystick0 60.0 absolute
X *Joystick0 0 0.0 0.0 1.0 1.0
Y *Joystick0 1 0.0 0.0 -1.0 1.0
Z *Joystick0 6 0.0 0.0 1.0 1.0
RX NULL 3 0.0 0.0 1.0 1.0
RY NULL 4 0.0 0.0 1.0 1.0
RZ *Joystick0 5 0.0 0.0 0.5 1.0
RESET NULL 0

The above file works with the following command-line arguments to the sphere_client program: 0.1 0.1 Joystick0@localhost.

Second, if you are going to run the client and server both on the same single-processor machine, then you need to increase the priority of the vrpn_server.exe program using the Task Manager so that it is able to service the device often enough. If you don't do this, you will get lots of jerking around on the joystick.

Third, you need to have the throttle (grey rotating doober to the left of the stick) just slightly towards you from straight up: this sets Z in the config file above. If you don't have it in the correct location, then you'll be outside of the sphere and won't feel any force.

I can't get my SGI to talk to my serial tracker (Flock, Fastrak, ...)

A couple of sites have had trouble talking with their tracking devices from SGI machines (Onyx, O2). We really don't run these servers on our SGI machines, since we use Linux (usually) or NT (rarely) to drive the trackers and other serial devices. This prompted us to verify that the serial-port code works on an SGI. As of a beta version 4.11, it does work for talking to a 3Space tracker from an O2. Since the driver code is not OS-specific (except at the serial port layer), this leads one to suspect that the other drivers should work as well. However, there are a host of problems that one can encounter when trying to get RS-232 working between a device and a host computer. So, here are the things we try for SGIs:

  • Ensure that the serial cable pinout is correct for the SGI ports. Older Onyx computers have a different pinout than the standard used by PCs. If you can't get anything back from the tracker, try adding a NULL-modem cable.
  • We use /dev/ttyd[1234] for the devices.
  • Make sure that the machine isn't going to launch a getty() on the port by editing /etc/inittab and commenting out the lines for the serial port you are using (or maybe changing the line to say "off"). When the getty() runs, you can get all sorts of strange behavior, as you and another process fight for the characters from the port.
  • Set the permissions on the port. Note that on some machines, /dev/ttyd1 is a link to another location; you need to follow the link and chmod the actual device. You want read/write access for all users, probably.
  • If the server still won't run, verify the physical connection by doing 'cat < /dev/ttyd1' in a window to capture the output. Then, set the baudrate by (for example) 'stty 19200 < /dev/ttyd1' to the correct rate in another window. In that window, do 'cat > /dev/ttyd1' so that you can send commands to the device. For the 3Space and Fastrak, 'S' for status should get you a response. Power-cycle the tracker, then try this command after waiting 30 seconds or so. Keep retrying the above things (or others) until you can get a command to send you data back from the tracker; if you can't get this to happen, no software can talk to it.

Magellan Space Mouse Beeping a lot on reset

Problem: It often takes several tries to get the Spacemouse properly initialized the first time I use it (after rebooting the computer and the peripheral). It typically takes two tries where it beeps continuously and one where it beeps twice; the fourth is generally the good one. In comparison, the Spacemouse utilities always initialize the peripheral correctly.

Solution: I encountered a similar problem recently, where the Magellan space mouse would not initialize through VRPN and beeped continuously. I solved the problem by uninstalling the Spacemouse driver/task bar utility; you might also be able to just disable it. VRPN connects to the Magellan using direct serial-port calls and a built-in driver, so I believe the two programs are competing for serial port access.

My code calls tracker->mainloop(), which stays busy and never returns.

You should not normally place your rendering loop (or any other heavy computation) inside the tracker callback. If you do, there will likely be another tracker message by the time you have finished the computation. This will cause the program to enter an infinite loop and never return from the mainloop() call to the tracker. This is because there is a new report from the tracker after each call to the handler for the last message; the tracker never finishes handling all the messages.

In some cases (such as a vrpn_Imager), it is not practical to ensure that all processing is fast enough. In these cases, call Jane_stop_this_crazy_thing() on the vrpn_Connection being used; see the comments in vrpn_Connection.h for how it is used and why it has such an odd-sounding name.
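A common way to keep the callback light is to have it only copy the incoming data and leave the heavy work to the application's own loop. The sketch below shows that pattern with hypothetical stand-in types (Report, LatestReport); it does not use the VRPN headers, and the real callback would take the VRPN report structure instead.

```cpp
#include <cassert>

// Hypothetical stand-in for a tracker report.
struct Report {
    int sequence;
    double pos[3];
};

// State shared between the callback and the application's main loop.
struct LatestReport {
    Report value;
    bool fresh;
};

// The callback only copies the data and returns immediately, so mainloop()
// can finish handling all queued messages.
void on_report(void *userdata, const Report r) {
    assert(userdata != nullptr);
    LatestReport *latest = static_cast<LatestReport *>(userdata);
    latest->value = r;
    latest->fresh = true;
}

// Called from the application's loop after mainloop() has returned.
// Returns true if there was a new report to process.
bool process_latest(LatestReport &latest, int &rendered_sequence) {
    if (!latest.fresh) return false;
    latest.fresh = false;
    rendered_sequence = latest.value.sequence;  // rendering or other heavy work goes here
    return true;
}
```

With this arrangement several reports may arrive between two frames, and only the most recent one is rendered.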

When I try to connect to a server with a certain client, the whole server locks up and never responds to any other clients until this client disconnects.

If a client program does not call mainloop() on any connections it makes (either directly or through created Remote objects), then the TCP buffers between that client and the server will eventually become completely full and the code will block. Since VRPN is not threaded, this blocks the whole server until the offending client disconnects. An example of a problem client is shown here:

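#include <stdio.h>
#include <vrpn_Tracker.h>

int main(void)
{
  char  tkr[] = "Tracker0@localhost";
  // Creating the remote object opens a connection to the server.
  vrpn_Tracker_Remote *t = new vrpn_Tracker_Remote(tkr);
  printf("Blocking tracker %s\n", tkr);
  // The bug: t->mainloop() is never called, so incoming data is never read.
  while(true); // endless spinning without mainloop() blocks the server
}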

The tracker reports always lag way behind (about one frame time).

Trackers use unreliable (UDP as of 2/24/98) transmission for their updates. They report these updates at some frequency, perhaps 60-100 times per second. If the application does not call mainloop() on its vrpn_Tracker_Remote object frequently enough, the incoming buffer fills up and some of these messages are not delivered. Unfortunately, once the buffer is full the later packets are discarded and the oldest ones are kept. This introduces latency equal to the application's main loop time when that time is much longer than the interval between tracker reports. To get around this, the application should make sure that it gets a new report each time through its main loop. It can do this by purging all of the old reports (through a call to mainloop()) and then reading until a new report comes in. A code fragment showing how this might be done is given on the tracker page.
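The purge-then-wait idea can be sketched against a simulated tracker. Everything in the block below (FakeTracker, Latest, read_fresh_report()) is a hypothetical stand-in written for this illustration and is not the fragment from the tracker page; FakeTracker simply queues sequence numbers and pretends one new report arrives between calls to mainloop().

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for a remote tracker: mainloop() delivers every
// queued report to the handler, oldest first.
struct FakeTracker {
    std::deque<int> pending;                   // sequence numbers waiting to be read
    int next_sequence = 0;                     // next report the "device" will send
    void (*handler)(void *, int) = nullptr;
    void *userdata = nullptr;

    void device_produces(int count) {
        for (int i = 0; i < count; ++i) pending.push_back(next_sequence++);
    }
    void mainloop() {
        while (!pending.empty()) {
            int seq = pending.front();
            pending.pop_front();
            if (handler) handler(userdata, seq);
        }
        device_produces(1);  // simulate one new report arriving before the next call
    }
};

struct Latest {
    int sequence = -1;
    bool got_report = false;
};

void on_report(void *userdata, int seq) {
    assert(userdata != nullptr);
    Latest *latest = static_cast<Latest *>(userdata);
    latest->sequence = seq;
    latest->got_report = true;
}

// Purge the stale reports, then keep reading until a newer report arrives.
template <typename Tracker>
int read_fresh_report(Tracker &tracker, Latest &latest) {
    tracker.mainloop();           // drain everything that piled up
    latest.got_report = false;    // ignore what the purge delivered
    while (!latest.got_report) tracker.mainloop();
    return latest.sequence;
}
```

In a real application the waiting loop would usually sleep briefly between calls rather than spin, and the handler would be registered on the vrpn_Tracker_Remote object.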

Why doesn't the VRPN server run when compiled with shared libraries?

The DLL file needs to be copied from the build directory (pc_win32/DLL/Debug or Release) into the executable directory (pc_win32/server_src/vrpn_server/Debug or Release) before it can be run.

I unregistered a handler that did not exist, and then the code blew up when I registered another.

This may have been fixed in recent releases.

When I stop the Phantom server under Irix, my machine crashes!

Email from the author: A while back I reported that stopping the VRPN server with the Ghost3.1/Irix port crashes the machine from time to time. I think I have further pinned down the problem. It seems only to occur when one runs the server from within an X-Session where xphant is also running. Then it is quite reproducible. But when I rlogin into the machine and run it from a terminal, I have had no crashes so far by stopping the server.