Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nodejs app that previously ran fine under pm2 will no longer start, even though same app with same args and user works #5767

Closed
coyoteconscious opened this issue Feb 16, 2024 · 5 comments

Comments

@coyoteconscious
Copy link

coyoteconscious commented Feb 16, 2024

What's going wrong?

First: Apologies for this being so long. It's weird and difficult to succinctly describe, and I wanted to make sure I provided as complete a description and supporting files as possible.

This one is weird.

I run a virtual tabletop app named Foundry vtt. Previously, I'd always written a small systemd service file to launch the nodejs apps as a service, but I wanted to give PM2 a go, both because it's a little friendlier, and because it's so easy to set up clusters.

Anyway, I have multiple copies of this thing running. Each is completely identical except for the name, and they run in a different folder tree. Each is essentially an identical copy, just running under different subdomains.

Well, at some point, after I stopped and then restarted the services in PM2, one just refuses to function or make any connections. If I run the exact same nginx startup script either manually from the command line, or as a systemd service, it works flawlessly, and it had been running for days, through multiple restarts of the machine before it stopped working via pm2.

It's running the app as the same user, with the exact same parameters. I literally can copy the command line used to start the service, remove the pm2 bits, and it runs fine. I have even tried removing first the application's app directory, then it's data directory and replacing them with default, and it will not run. If I target the other application's code and data, with the same name, it won't run. But it also won't run the troublesome one if I change the name, so it doesn't appear to be a namespace issue.

When I load the site, it returns a 504 error, and the nginx logs say the connection was refused, but it loads the favicon. And only the favicon.

I'm not sure if I need tech support or an exorcist.

How could we reproduce this issue?

I am not sure. Presumably by using my haunted server. I would not be surprised it it is just happening to me, and that the error originated between my monitor and chair (i.e., I'm doing something screwy, and can't figure out what).

Supporting information

The app I'm working with has an application directory (of course), plus a data directory that can be placed arbitrarily wherever you like. My setup looks like:

App Directories:

/var/foundry/bacon/foundryvtt
/var/foundry/sadboys/foundryvtt
/var/foundry/twb/foundryvtt
/var/foundry/vtt/foundryvtt

(note: These are all identical copies, copied and with identical permissions and owner (0775/root, but I also tried running not as root with things set to group html-data) set with sudo chown/chmod -fR, and compared via diff and visually to ensure they are absolutely identical with identical permissions).

Data Directories:

/media/coyote/Vault/foundry/bacon.mydomain.org/foundryvtt-data
/media/coyote/Vault/foundry/sadboys.mydomain.org/foundryvtt-data
/media/coyote/Vault/foundry/twb.mydomain.org/foundryvtt-data
/media/coyote/Vault/foundry/vtt.mydomain.org/foundryvtt-data

Hosts file:

127.0.0.1 localhost
127.0.1.1 mylocalhostname

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

192.168.4.87	www.mydomain.org
192.168.4.87	twb.mydomain.org
192.168.4.87	sadboys.mydomain.org
192.168.4.87	vtt.mydomain.org
192.168.4.87	bacon.mydomain.org

The pm2 commands used to launch are:

sudo pm2 start -i -1 /var/foundry/bacon/foundryvtt/resources/app/main.js  --name bacon -- --dataPath=/media/coyote/Vault/foundry/bacon.mydomain.org/foundryvtt-data
sudo pm2 start -i -1 /var/foundry/sadboys/foundryvtt/resources/app/main.js  --name sadboys -- --dataPath=/media/coyote/Vault/foundry/sadboys.mydomain.org/foundryvtt-data
sudo pm2 start -i -1 /var/foundry/twb/foundryvtt/resources/app/main.js  --name twb -- --dataPath=/media/coyote/Vault/foundry/twb.mydomain.org/foundryvtt-data
sudo pm2 start -i -1 /var/foundry/vtt/foundryvtt/resources/app/main.js  --name vtt -- --dataPath=/media/coyote/Vault/foundry/vtt.mydomain.org/foundryvtt-data

Run the following commands

$ pm2 report

--- PM2 report ----------------------------------------------------------------
Date                 : Fri Feb 16 2024 07:28:49 GMT-0700 (Mountain Standard Time)
===============================================================================
--- Daemon -------------------------------------------------
pm2d version         : 5.3.1
node version         : 18.13.0
node path            : not found
argv                 : /usr/bin/node,/usr/local/lib/node_modules/pm2/lib/Daemon.js
argv0                : node
user                 : root
uid                  : 0
gid                  : 0
uptime               : 403min
===============================================================================
--- CLI ----------------------------------------------------
local pm2            : 5.3.1
node version         : 18.13.0
node path            : not found
argv                 : /usr/bin/node,/usr/local/bin/pm2,report
argv0                : node
user                 : root
uid                  : 0
gid                  : 0
===============================================================================
--- System info --------------------------------------------
arch                 : x64
platform             : linux
type                 : Linux
cpus                 : Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
cpus nb              : 4
freemem              : 8105222144
totalmem             : 16640122880
home                 : /root
===============================================================================
--- PM2 list -----------------------------------------------
┌────┬────────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name       │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├────┼────────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 24 │ bacon      │ default     │ 11.315… │ cluster │ 3912916  │ 1s     │ 2006 │ online    │ 0%       │ 54.3mb   │ root     │ disabled │
│ 25 │ bacon      │ default     │ 11.315… │ cluster │ 3912837  │ 6s     │ 2000 │ online    │ 0%       │ 111.6mb  │ root     │ disabled │
│ 26 │ bacon      │ default     │ 11.315… │ cluster │ 3861292  │ 57m    │ 1338 │ online    │ 0.3%     │ 132.5mb  │ root     │ disabled │
│ 30 │ sadboys    │ default     │ 11.315… │ cluster │ 3912806  │ 8s     │ 4    │ online    │ 0%       │ 123.7mb  │ root     │ disabled │
│ 31 │ sadboys    │ default     │ 11.315… │ cluster │ 3912818  │ 7s     │ 4    │ online    │ 0%       │ 118.2mb  │ root     │ disabled │
│ 32 │ sadboys    │ default     │ 11.315… │ cluster │ 3912853  │ 5s     │ 4    │ online    │ 0%       │ 98.8mb   │ root     │ disabled │
│ 0  │ twb        │ default     │ 10.291… │ cluster │ 3554801  │ 6h     │ 0    │ online    │ 0.3%     │ 147.2mb  │ root     │ disabled │
│ 1  │ twb        │ default     │ 10.291… │ cluster │ 3912888  │ 3s     │ 6743 │ online    │ 0%       │ 84.9mb   │ root     │ disabled │
│ 2  │ twb        │ default     │ 10.291… │ cluster │ 3912877  │ 3s     │ 6751 │ online    │ 0%       │ 86.9mb   │ root     │ disabled │
│ 3  │ vtt        │ default     │ 11.315… │ cluster │ 3912860  │ 5s     │ 4743 │ online    │ 0%       │ 96.2mb   │ root     │ disabled │
│ 4  │ vtt        │ default     │ 11.315… │ cluster │ 3912829  │ 7s     │ 4743 │ online    │ 0%       │ 116.8mb  │ root     │ disabled │
│ 5  │ vtt        │ default     │ 11.315… │ cluster │ 3582199  │ 6h     │ 0    │ online    │ 0.3%     │ 307.3mb  │ root     │ disabled │
└────┴────────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
===============================================================================
--- Daemon logs --------------------------------------------
/root/.pm2/pm2.log last 20 lines:
PM2        | 2024-02-16T07:28:44: PM2 log: App name:vtt id:3 disconnected
PM2        | 2024-02-16T07:28:44: PM2 log: App [vtt:3] exited with code [1] via signal [SIGINT]
PM2        | 2024-02-16T07:28:44: PM2 log: App [vtt:3] starting in -cluster mode-
PM2        | 2024-02-16T07:28:45: PM2 log: App [sadboys:32] online
PM2        | 2024-02-16T07:28:45: PM2 log: App [vtt:3] online
PM2        | 2024-02-16T07:28:46: PM2 log: App name:twb id:2 disconnected
PM2        | 2024-02-16T07:28:46: PM2 log: App [twb:2] exited with code [1] via signal [SIGINT]
PM2        | 2024-02-16T07:28:46: PM2 log: App [twb:2] starting in -cluster mode-
PM2        | 2024-02-16T07:28:46: PM2 log: App [twb:2] online
PM2        | 2024-02-16T07:28:46: PM2 log: App name:twb id:1 disconnected
PM2        | 2024-02-16T07:28:46: PM2 log: App [twb:1] exited with code [1] via signal [SIGINT]
PM2        | 2024-02-16T07:28:46: PM2 log: App [twb:1] starting in -cluster mode-
PM2        | 2024-02-16T07:28:47: PM2 log: App [twb:1] online
PM2        | 2024-02-16T07:28:48: PM2 log: App name:bacon id:24 disconnected
PM2        | 2024-02-16T07:28:48: PM2 log: App [bacon:24] exited with code [1] via signal [SIGINT]
PM2        | 2024-02-16T07:28:48: PM2 log: App [bacon:24] starting in -cluster mode-
PM2        | 2024-02-16T07:28:49: PM2 log: App [bacon:24] online
PM2        | 2024-02-16T07:28:49: PM2 log: App name:sadboys id:30 disconnected
PM2        | 2024-02-16T07:28:49: PM2 log: App [sadboys:30] exited with code [1] via signal [SIGINT]
PM2        | 2024-02-16T07:28:49: PM2 log: App [sadboys:30] starting in -cluster mode-

When I go to any of the above sites, except the sadboys one, they work fine. If I go to the sadboys one, I get 502 Bad Gateway nginx/1.24.0 (Ubuntu)

If I launch the exact same app, using:

/var/foundry/sadboys/foundryvtt/resources/app/main.js --dataPath=/media/coyote/Vault/foundry/sadboys.mydomain.org/foundryvtt-data

That works fine, and as expected.

If I instead activate the app via a systemd service like this (again, all are identical, except for the one word, vtt/sadboys/twb/bacon), it runs without issue:

[Unit]
Description=foundry-sadboys
After=network.target

[Service]
Type=simple
ExecStart=node /var/foundry/sadboys/foundryvtt/resources/app/main.js --dataPath=/media/coyote/Vault/foundry/sadboys.mydomain.org/foundryvtt-data
Restart=on-failure
User=root
Group=root
Environment=PATH=/usr/bin:/usr/local/bin
Environment=NODE_ENV=production
WorkingDirectory=/var/foundry/sadboys/

[Install]
WantedBy=multi-user.target

That also works fine.
Just that copy of the app will not run, named as sadboys, or as any other name. The other identical ones work fine.
Also, all of them were working fine previously.

nginx config files linked from from sites-enabled:
default.txt
bacon.mydomain.org.conf.txt
sadboys.mydomain.org.conf.txt
twb.mydomain.org.conf.txt
vtt.mydomain.org.conf.txt

I can run this app as literally everything but a pm2 service, even though I run several functionally-identical ones, and the same service ran for days with no problem previously.

I even tried uninstalling and reinstalling pm2, and even reinstalling everything with node version manager, then a fresh install of pm2. No dice.

It's like it hates this one install. It's siblings are fine, but it hates this one.
And yes, I have also tried changing the name (incl references within the file), and it still fails. I have also tried launching with an empty/default data directory, to make sure it's not size or content of the data, and even running it against the application/main.js from another install. It will not run, it if even thinks I'm trying to run the sadboys instance.

I kid you not. I tried replacing first the app directory. Then the data directory. Then even the name, so that by the end, it had different (otherwise functioning) app directories and data directories and name and it still would not work. It felt like that old joke about if you replace the ax, the axe handle, and the axe grip, is it still the same axe?

I would absolutely be delighted if someone can help me sort this out, even it it's that I've done something embarrassingly silly and it's all my fault. I don't mind looking foolish if you can at least stop making me think my box has bad faeries living in it or something. At this point, I just want it to make sense! : )

Thoughts?

@coyoteconscious
Copy link
Author

Nearly forgot: Running Ubuntu 23.10 and Node v21.6.2

@coyoteconscious
Copy link
Author

Unless someone comes up with a surprise solution to this one in the next couple of days, I think I'm going to reinstall the entire OS and everything from scratch, possibly while performing the lesser banishing ritual of the pentagram of Earth, while waving chicken bones at it, then hoping the problem doesn't re-appear - then I'll try to forget it happened.

I figure this is either something ridiculously simple and silly that I did, or it's some darn thing stuck in an arcane location that is messed up, and not fixed by standard purging and reinstalling.

@coyoteconscious
Copy link
Author

I mean, other than that, my thoughts on pm2 are basically "where have you been all my life? Where were you when I was still working?"

: )

I feel like the problem is likely not with PM2 per se, I'm figuring either I'm missing something, or something got corrupted somewhere, and the problem shows up with PM2, but other launch methods don't trigger whatever it is for some reason. I don't know. It's weird.

@coyoteconscious
Copy link
Author

Okay, this one was dumb. Apparently, while doing some debugging, I got the error.log file for that server stun-locked. When it would try to write a warning message, it would apparently hit the log and die, instead of writing to it or failing politely.

A victory for science. Not ghosts.

@coyoteconscious
Copy link
Author

And it wasn't PM2's fault, there was a file path generating a warning, that was only happening with that server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant