Question: AX.25 socket data corruption #9

Closed
isavitsky opened this issue Feb 18, 2020 · 9 comments

Comments

@isavitsky

Hello,

I'm writing a proxy programme to pass raw binary data between an AX.25 socket and a TCP socket.

So as not to reinvent the wheel, I took the source of a simple TCP proxy and converted the server-side TCP socket to an AX.25 socket on one side of the AX.25 link, and the client-side TCP socket to an AX.25 socket on the other side.

Here is a snippet of the programme:

/* Forward data between sockets */
void forward_data(int source_sock, int destination_sock, const char *name) {
    ssize_t n;
    fd_set read_fd;
    int s = source_sock;
#define BUF_SIZE 16384
    char buffer[BUF_SIZE];

    if (fcntl(s, F_SETFL, O_NONBLOCK) < 0) {
        perror("fcntl");
        close(s);
        return;
    }

    while (1) {
        FD_ZERO(&read_fd);
        FD_SET(s, &read_fd);
        if (select(s + 1, &read_fd, NULL, NULL, NULL) < 0) break;
        if (!FD_ISSET(s, &read_fd)) continue; // nothing to read yet
        n = recv(source_sock, buffer, BUF_SIZE, 0);
        if (n <= 0) break; // error, or the peer closed the connection
        write(destination_sock, buffer, n); // send data to output socket
    }
}
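
The loop above does not check the return value of write(), so a short write would silently drop data. As a minimal sketch (the helper name write_all is hypothetical and not part of the original proxy, and it assumes a blocking destination socket), the write could be wrapped like this. This only rules out one userspace cause; it is not claimed to fix the corruption discussed in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not from the original proxy source.
 * Writes exactly len bytes to fd, retrying after short writes and EINTR.
 * Assumes fd is in blocking mode.
 * Returns 0 on success, -1 on error (errno describes the failure). */
int write_all(int fd, const char *buf, size_t len) {
    size_t off = 0;

    while (off < len) {
        ssize_t w = write(fd, buf + off, len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error */
        }
        if (w == 0) {
            errno = EIO;        /* no progress: give up rather than spin */
            return -1;
        }
        off += (size_t)w;       /* advance past the bytes actually written */
    }
    return 0;
}
```

In the forwarding loop, write(destination_sock, buffer, n) would become write_all(destination_sock, buffer, (size_t)n), with the loop ending when it returns -1.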

On one side of the AX.25 link, the proxy accepts TCP connections on port 6789 and initiates an AX.25 connection to UR5VIB-8. On the other side of the link, the proxy accepts AX.25 connections on UR5VIB-8 and forwards data to local TCP port 22.

It generally works, for a while. You can log in by SSH (or telnet) over the proxied link to the remote site. The overall interactivity (reaction to packet loss) is far better than in IP encapsulation mode. But after a while SSH resets the connection with this message:

Bad packet length 573483957.
ssh_dispatch_run_fatal: Connection to 127.0.0.1 port 6789: Connection corrupted

(For a telnet connection you just get a connection reset.)

I tried to work out what the problem is with axlisten and tcpdump. I'm not 100% sure, but I think there is data duplication in the middle of the transfer. The inserted data confuses SSH.

Maybe someone can shed some light on what I am missing here?
I thought it should be straightforward to pass data between two reliable sequential sockets.
Thanks!

Ivan,
UR5VIB

@dranch

dranch commented Feb 18, 2020

I wonder if this has to do with SSH rekeying. Try running debugging on the SSH client and server to see what might be going on there. Beyond that, I believe this would be better asked on the Linux Hams list (linux-hams@vger.kernel.org), as this probably has nothing to do with AX.25.

--David
KI6ZHD

@isavitsky
Author

Hello, David,

Thanks for your suggestion.
With ssh -vvv the debug output is virtually the same as what I presented above.
I tried to reproduce in a telnet session what I did over SSH, as telnet seems less susceptible to connection termination due to data corruption. Here is what I did: I set BUF_SIZE to 128 and logged in via a proxied telnet session:

$ dd if=/dev/zero bs=1 count=5000 2>/dev/null| hexdump -vC
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 000 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |...00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
.............|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 000000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000100  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 000000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000180  00 00 00 00 00 00 00 00  00 00 00 00 000000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Now it is obvious that there is not only data duplication, but data truncation as well.
I will ask about this on the linux-hams mailing list.
Thanks!

Ivan,
UR5VIB
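
The zero-filled test above shows duplication and truncation only through the hexdump offsets printed as text. A possible alternative, sketched below under stated assumptions, is to send a self-describing byte pattern through the raw proxy and verify it on the far side. The names pattern_byte, fill_pattern and check_pattern are hypothetical and not from the thread; the pattern is simply a 32-bit big-endian counter, one value per four bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test helpers, not from the original thread.
 * The byte at stream offset off is byte (off % 4) of the 32-bit
 * big-endian value (off / 4), so every 4-byte word encodes its position. */
static unsigned char pattern_byte(uint64_t off) {
    uint32_t word = (uint32_t)(off / 4);
    unsigned shift = 8u * (3u - (unsigned)(off % 4));
    return (unsigned char)(word >> shift);
}

/* Fill buf with the pattern as it would appear at stream offset start_off. */
void fill_pattern(unsigned char *buf, size_t len, uint64_t start_off) {
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_byte(start_off + i);
}

/* Compare received bytes against the pattern expected at start_off.
 * Returns the stream offset of the first mismatching byte, or -1 if
 * all len bytes match. */
long long check_pattern(const unsigned char *buf, size_t len,
                        uint64_t start_off) {
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pattern_byte(start_off + i))
            return (long long)(start_off + i);
    return -1;
}
```

The sender would call fill_pattern() with a running offset before each write, and the receiver would call check_pattern() with its own running offset after each read. Because neighbouring words share their high-order zero bytes, the reported offset can be a few bytes past the point where data was actually inserted or dropped.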

@ve7fet
Owner

ve7fet commented Feb 18, 2020

I'm going to recommend that this issue be taken up with the official source (which this is not). If there are patches to correct this behaviour, that is where they need to be applied, and they will get pulled in here in due time.

@isavitsky
Author

Hello,

A small update. I used the 'call' utility to check whether the same problem occurs there as well. It looks (at least to me) like there is indeed a problem in Linux's system buffer during write().

While debugging my proxy.c I added usleep() delays of 10 to 100 ms between each write() to the SOCK_SEQPACKET socket, and that helped on the fast AXUDP Ethernet link. I doubt it would help to maintain consistency on slower links.
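
For reference, the delay workaround might look roughly like the sketch below. This is not the actual proxy.c code: the function name paced_write, the chunk parameter and the use of nanosleep() in place of usleep() are all assumptions made for illustration. It is a workaround for experimentation, not a fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sketch, not the original proxy.c code.
 * Sends len bytes in pieces of at most chunk bytes and sleeps delay_ms
 * milliseconds after every write().
 * Returns the number of write() calls made, or -1 on error. */
int paced_write(int fd, const char *buf, size_t len,
                size_t chunk, long delay_ms) {
    struct timespec delay = { delay_ms / 1000,
                              (delay_ms % 1000) * 1000000L };
    size_t off = 0;
    int writes = 0;

    if (chunk == 0) {
        errno = EINVAL;
        return -1;
    }
    while (off < len) {
        size_t want = (len - off < chunk) ? len - off : chunk;
        ssize_t w = write(fd, buf + off, want);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (w == 0) {
            errno = EIO;            /* no progress: give up */
            return -1;
        }
        off += (size_t)w;
        writes++;
        nanosleep(&delay, NULL);    /* pause before the next write */
    }
    return writes;
}
```

A call such as paced_write(destination_sock, buffer, (size_t)n, 128, 50) would send the data in 128-byte pieces with a 50 ms pause after each one; both values are placeholders chosen from the ranges mentioned above.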

I've recorded a short video: https://youtu.be/K4vhCXLK1b0 and updated the linux-hams mailing list as well.

It also looks like no one is using connected mode anymore; at least I've got no replies to my message on the linux-hams mailing list.

Ivan

@dranch

dranch commented Mar 15, 2020

Hello Ivan,
Thanks for posting that video; it shows the issue quite clearly. It's not that people aren't using CONNECTed sessions anymore... the issue is more that we have been unable to find anyone who can fix the kernel. Most people have either been using legacy kernels (3.19.x) or external AX.25 stacks (JNOS, Direwolf, etc). Many people would love to see current kernels fixed but we've been unable to find anyone who can propose changes.

--David
KI6ZHD

@isavitsky
Author

Hello, David,

Thanks for mentioning the legacy kernels. I'll check out what's going on in that department in a VirtualBox environment.

As I understand it, keeping the AX.25 code in sync with the current kernel state is complicated. Maybe I should look into user-level stacks.

Thanks,
Ivan

@kq6up

kq6up commented Jan 19, 2023

> I'm going to recommend that this issue be taken up with the official source (which this is not). If there are patches to correct this behaviour, that is where they need to be applied, and they will get pulled in here in due time.

Are you sure this is not a kernel bug?

@dranch

dranch commented Jan 19, 2023

If the belief is that this is a real kernel issue, it would be best to start a new email thread at:

linux-hams@vger.kernel.org

Thomas Osterried, who maintains the official AX.25 repo for user space code, is also on that list just in case this might be a kernel+userspace interaction issue.

@ve7fet
Owner

ve7fet commented Aug 2, 2023

Kernel or upstream issue. Not fixing it in our source.

ve7fet closed this as completed on Aug 2, 2023