Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskomaticDaemon - Failure occured during job recovery #5556

Closed
sysy69 opened this issue Jun 9, 2022 · 22 comments
Closed

TaskomaticDaemon - Failure occured during job recovery #5556

sysy69 opened this issue Jun 9, 2022 · 22 comments
Assignees
Labels
bug Something isn't working P2

Comments

@sysy69
Copy link

sysy69 commented Jun 9, 2022

Problem description

Taskomatic is crashing all 10Sec.

Version of Uyuni Server and Proxy (if used)

Information for package Uyuni-Server-release:

Repository : uyuni-server-stable
Name : Uyuni-Server-release
Version : 2022.05-180.1.uyuni1
Arch : x86_64
Vendor : obs://build.opensuse.org/systemsmanagement:Uyuni
Support Level : Level 3
Installed Size : 1.4 KiB
Installed : Yes
Status : up-to-date
Source package : Uyuni-Server-release-2022.05-180.1.uyuni1.src
Summary : Uyuni Server
Description :
Uyuni lets you efficiently manage physical, virtual,
and cloud-based Linux systems. It provides automated and cost-effective
configuration and software management, asset management, and system
provisioning.

Details about the issue

Taskomatic stopt working, no updates or anything else was made.

error: [Thread-43] FATAL com.redhat.rhn.taskomatic.core.TaskomaticDaemon - Failure occured during job recovery.

/var/log/rhn/rhn_taskomatic_daemon.log

2022-06-09 12:53:35,015 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000014: Found use of deprecated [org.hibernate.id.SequenceGenerator] sequence-based id generator; use org.hibernate.id.enhanced.SequenceStyleGenerator instead.  See Hibernate Domain Model Mapping Guide for details.
2022-06-09 12:53:35,015 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000014: Found use of deprecated [org.hibernate.id.SequenceGenerator] sequence-based id generator; use org.hibernate.id.enhanced.SequenceStyleGenerator instead.  See Hibernate Domain Model Mapping Guide for details.
2022-06-09 12:53:38,618 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['s.class']; use TYPE operator instead : type(s)
2022-06-09 12:53:38,742 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['s.class']; use TYPE operator instead : type(s)
2022-06-09 12:53:38,850 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['sa.server.class']; use TYPE operator instead : type(sa.server)
2022-06-09 12:53:38,861 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['sa.server.class']; use TYPE operator instead : type(sa.server)
2022-06-09 12:53:38,870 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['s.class']; use TYPE operator instead : type(s)
2022-06-09 12:53:39,291 [Thread-43] FATAL com.redhat.rhn.taskomatic.core.TaskomaticDaemon - Failure occured during job recovery.
2022-06-09 12:53:42,284 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH020100: The Ehcache second-level cache provider for Hibernate is deprecated.  See https://hibernate.atlassian.net/browse/HHH-12441 for details.
2022-06-09 12:53:43,027 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000009: The outer-join attribute on <many-to-many> has been deprecated. Instead of outer-join="false", use lazy="extra" with <map>, <set>, <bag>, <idbag>, or <list>, which will only initialize entities (not as a proxy) as needed.
2022-06-09 12:53:43,039 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000005: Defining an entity [com.redhat.rhn.domain.channel.PublicChannelFamily] with no physical id attribute is no longer supported; please map the identifier to a physical entity attribute
2022-06-09 12:53:43,045 [Thread-43] WARN  org.hibernate.orm.deprecation - HHH90000009: The outer-join attribute on <many-to-many> has been deprecated. Instead of outer-join="false", use lazy="extra" with <map>, <set>, <bag>, <idbag>, or <list>, which will only initialize entities (not as a proxy) as needed.

_

@sysy69 sysy69 added bug Something isn't working P5 labels Jun 9, 2022
@aaannz
Copy link
Contributor

aaannz commented Jun 21, 2022

Can you please provide complete logs?

@mbussolotto mbussolotto added P3 and removed P5 labels Jun 27, 2022
@mbussolotto
Copy link
Member

@sysy69 can you please enable log level debug and provide the complete /var/log/rhn/rhn_taskomatic_daemon.log ?
for enabling debug log uncomment line

log4j.logger.com.redhat.rhn=DEBUG

from /usr/share/rhn/classes/log4j.properties

@hbrown-uiowa
Copy link

We're seeing the same issue on 2022.05. However, I don't see a log4j.properties in that directory.

Should this be the edit?

     <Logger name="com.redhat.rhn.taskomatic.SchedulerKernel" level="debug" />
        <Logger name="com.redhat.rhn.taskomatic.task" level="debug" />

from /usr/share/rhn/classes/log4j2.xml

@hbrown-uiowa
Copy link

I see this leading up to the fatal entries:

2022-07-27 18:12:21,817 [Thread-41] DEBUG com.redhat.rhn.common.localization.XmlMessages - Reloading BUNDLE : com.redhat.rhn.branding.strings.StringResource.en_US
2022-07-27 18:12:21,818 [Thread-41] DEBUG com.redhat.rhn.common.conf.Config - getString() -     getString() called with: java.supported_locales
2022-07-27 18:12:21,818 [Thread-41] DEBUG com.redhat.rhn.common.conf.Config - getString() -     getString() -> Getting property: supported_locales
2022-07-27 18:12:21,818 [Thread-41] DEBUG com.redhat.rhn.common.conf.Config - getString() -     getString() -> result: null
2022-07-27 18:12:21,818 [Thread-41] DEBUG com.redhat.rhn.common.conf.Config - getString() -     getString() -> returning: bn_IN,ca,de,en_US,es,fr,gu,hi,it,ja,ko,pa,pt,pt_BR,ru,ta,zh_CN,zh_TW
2022-07-27 18:12:23,901 [Thread-41] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['s.class']; use TYPE operator instead : type(s)
2022-07-27 18:12:24,032 [Thread-41] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['s.class']; use TYPE operator instead : type(s)
2022-07-27 18:12:24,147 [Thread-41] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['sa.server.class']; use TYPE operator instead : type(sa.server)
2022-07-27 18:12:24,158 [Thread-41] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['sa.server.class']; use TYPE operator instead : type(sa.server)
2022-07-27 18:12:24,166 [Thread-41] WARN  org.hibernate.orm.deprecation - HHH90000017: Found use of deprecated entity-type selector syntax in HQL/JPQL query ['s.class']; use TYPE operator instead : type(s)
2022-07-27 18:12:24,220 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - startMessaging() - start
2022-07-27 18:12:24,221 [Thread-41] DEBUG com.redhat.rhn.common.conf.Config - getString() -     getString() called with: java.message_queue_thread_pool_size
2022-07-27 18:12:24,221 [Thread-41] DEBUG com.redhat.rhn.common.conf.Config - getString() -     getString() -> Getting property: message_queue_thread_pool_size
2022-07-27 18:12:24,221 [Thread-41] DEBUG com.redhat.rhn.common.conf.Config - getString() -     getString() -> result: null
2022-07-27 18:12:24,221 [Thread-41] DEBUG com.redhat.rhn.common.conf.Config - getString() -     getString() -> returning: 5
2022-07-27 18:12:24,222 [Thread-41] INFO  com.redhat.rhn.common.messaging.MessageQueueThreadPool - Started message queue thread pool (size: 5)
2022-07-27 18:12:24,223 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - startMessaging() - end
2022-07-27 18:12:24,224 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.TraceBackAction@17415034 class: com.redhat.rhn.frontend.events.TraceBackEvent
2022-07-27 18:12:24,225 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.NewUserAction@49e84202 class: com.redhat.rhn.frontend.events.NewUserEvent
2022-07-27 18:12:24,226 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.UpdateErrataCacheAction@158e3e42 class: com.redhat.rhn.frontend.events.UpdateErrataCacheEvent
2022-07-27 18:12:24,226 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.RestartSatelliteAction@167a09dc class: com.redhat.rhn.frontend.events.RestartSatelliteEvent
2022-07-27 18:12:24,227 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmChangeBaseChannelSubscriptionsAction@14be1535 class: com.redhat.rhn.frontend.events.SsmChangeBaseChannelSubscriptionsEvent
2022-07-27 18:12:24,227 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmChangeChannelSubscriptionsAction@2ac9aa5c class: com.redhat.rhn.frontend.events.SsmChangeChannelSubscriptionsEvent
2022-07-27 18:12:24,227 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmDeleteServersAction@63288813 class: com.redhat.rhn.frontend.events.SsmDeleteServersEvent
2022-07-27 18:12:24,228 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmInstallPackagesAction@7b5f827c class: com.redhat.rhn.frontend.events.SsmInstallPackagesEvent
2022-07-27 18:12:24,229 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmRemovePackagesAction@514e049f class: com.redhat.rhn.frontend.events.SsmRemovePackagesEvent
2022-07-27 18:12:24,229 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmVerifyPackagesAction@2d011722 class: com.redhat.rhn.frontend.events.SsmVerifyPackagesEvent
2022-07-27 18:12:24,230 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmUpgradePackagesAction@10cad139 class: com.redhat.rhn.frontend.events.SsmUpgradePackagesEvent
2022-07-27 18:12:24,230 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmPowerManagementAction@b0fa3cb class: com.redhat.rhn.frontend.events.SsmPowerManagementEvent
2022-07-27 18:12:24,231 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.CloneErrataAction@41ec90bb class: com.redhat.rhn.frontend.events.CloneErrataEvent
2022-07-27 18:12:24,231 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.NewCloneErrataAction@2e1f45e5 class: com.redhat.rhn.frontend.events.NewCloneErrataEvent
2022-07-27 18:12:24,232 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmErrataAction@49e6d254 class: com.redhat.rhn.frontend.events.SsmErrataEvent
2022-07-27 18:12:24,233 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.AlignSoftwareTargetAction@5722666 class: com.redhat.rhn.frontend.events.AlignSoftwareTargetMsg
2022-07-27 18:12:24,233 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.ScheduleRepoSyncAction@46c7f070 class: com.redhat.rhn.frontend.events.ScheduleRepoSyncEvent
2022-07-27 18:12:24,233 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmSystemRebootAction@3841ff3f class: com.redhat.rhn.frontend.events.SsmSystemRebootEvent
2022-07-27 18:12:24,234 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.redhat.rhn.frontend.events.SsmConfigFilesAction@93ef8a9 class: com.redhat.rhn.frontend.events.SsmConfigFilesEvent
2022-07-27 18:12:24,235 [Thread-41] DEBUG com.redhat.rhn.common.messaging.MessageQueue - registerAction(MessageAction, Class) - : com.suse.manager.reactor.messaging.ChannelsChangedEventMessageAction@25a143c3 class: com.suse.manager.reactor.messaging.ChannelsChangedEventMessage
2022-07-27 18:12:24,373 [Thread-41] FATAL com.redhat.rhn.taskomatic.core.TaskomaticDaemon - Failure occured during job recovery.

@hbrown-uiowa
Copy link

We updated to 2022.06 and the issue persists. We can get taskomatic to generate this error:

Traceback (most recent call last):
	3: from ./uyuni_channel.rb:16:in `<main>'
	2: from ./uyuni_channel.rb:16:in `each'
	1: from ./uyuni_channel.rb:25:in `block in <main>'
/usr/lib64/ruby/gems/2.5.0/gems/xmlrpc-0.3.0/lib/xmlrpc/client.rb:272:in `call': redstone.xmlrpc.XmlRpcFault: Taskomatic is not available. (XMLRPC::FaultException)

by calling this api: sync_schedule = @client.call('channel.software.getRepoSyncCronExpression', @key, channel['label'])

@hbrown-uiowa
Copy link

The last successful action before this started failing for us was:

Auto Patch Updates: 	2022-07-24 15:31:45 CDT 	FINISHED

as seen in https://uyuniserver/rhn/manager/admin/runtime-status

@mbussolotto
Copy link
Member

@sysy69 can you please enable log level debug and provide the complete /var/log/rhn/rhn_taskomatic_daemon.log ? for enabling debug log uncomment line

log4j.logger.com.redhat.rhn=DEBUG

from /usr/share/rhn/classes/log4j.properties

@hbrown-uiowa if /usr/share/rhn/classes/log4j.properties does not exist, please create it, add log4j.logger.com.redhat.rhn=DEBUG and then rctaskomatic restart. You should also add the same line to /srv/tomcat/webapps/rhn/WEB-INF/classes/log4j.properties (create the file if it doesn't exist) then rctomcat restart . Then reproduce the issue and provide these two logfiles:

  • /var/log/rhn/rhn_web_ui.log
  • /var/log/rhn/rhn_taskomatic_daemon.log

Logs will be really verbose so after this test you may want to disable them, to save disk space (so remove /var/log/rhn/rhn_taskomatic_daemon.log from the two files and restart the processes)

@megamaced
Copy link

I enabled the debug settings from above and restarted but didn't seem to add additional logging for me

https://gist.github.com/megamaced/7eb5a7d4e6e825da5bebdee166d00243

@megamaced
Copy link

Is there any work around to this issue? We can't sync repodata

@megamaced
Copy link

still broken in 2022.08

@ddholstad
Copy link

I suspect the issue is with quartz scheduler. Is there a way to clean out the jobs?

uyuni=# select * from qrtz_simple_triggers;
trigger_name | trigger_group | repeat_count | repeat_interval | times_triggered | sched_name
-------------------------------------+---------------+--------------+-----------------+-----------------+------------------------
repo-sync-1-149-retry20220724154716 | 1 | 0 | 0 | 1 | DefaultQuartzScheduler
repo-sync-1-148-retry20220724143715 | 1 | 0 | 0 | 0 | DefaultQuartzScheduler
repo-sync-1-131-retry20220724153236 | 1 | 0 | 0 | 0 | DefaultQuartzScheduler
repo-sync-1-128-retry20220724132525 | 1 | 0 | 0 | 0 | DefaultQuartzScheduler
repo-sync-1-147-retry20220724133756 | 1 | 0 | 0 | 0 | DefaultQuartzScheduler
repo-sync-1-151-retry20220724155226 | 1 | 0 | 0 | 0 | DefaultQuartzScheduler
repo-sync-5-161-retry20220724155650 | 5 | 0 | 0 | 0 | DefaultQuartzScheduler
(7 rows)

uyuni=# select trigger_name,trigger_group,job_name,trigger_state from qrtz_triggers WHERE trigger_name LIKE '%retry%';
trigger_name | trigger_group | job_name | trigger_state
-------------------------------------+---------------+-----------------+---------------
repo-sync-1-149-retry20220724154716 | 1 | repo-sync-1-149 | COMPLETE
repo-sync-1-149-retry20220724155311 | 1 | repo-sync-1-149 | WAITING
repo-sync-1-131-retry20220724153236 | 1 | repo-sync-1-131 | WAITING
repo-sync-1-128-retry20220724132525 | 1 | repo-sync-1-128 | WAITING
repo-sync-1-147-retry20220724133756 | 1 | repo-sync-1-147 | WAITING
repo-sync-1-151-retry20220724155226 | 1 | repo-sync-1-151 | WAITING
repo-sync-5-161-retry20220724155650 | 5 | repo-sync-5-161 | WAITING
repo-sync-1-148-retry20220724143715 | 1 | repo-sync-1-148 | ACQUIRED
(8 rows)

@ddholstad99
Copy link

takomatic started after running:
uyuni=# delete from qrtz_simple_triggers where trigger_name like '%retry%';
DELETE 7
uyuni=# delete from qrtz_triggers WHERE trigger_name LIKE '%retry%';
DELETE 8

We are waiting to see if it stops in the same part of the schedule when it gets back to it.

@MaxHerrmannSVA
Copy link

We had the same issue...
Thanks @ddholstad99 for those two SQL statements. Taskomatic wont fail now after 10 seconds.

@sayingwhat
Copy link

same problem...
thanks @ddholstad99 for the statements

@mbussolotto mbussolotto added P2 and removed P3 labels Sep 8, 2022
@mbussolotto
Copy link
Member

cc @mackdk

@hbrown-uiowa
Copy link

If the problem recurs, can someone add the log info that was requested. Cleaning out the entries in the DB is just dealing with symptoms (and we're not even sure that it's the right thing to do). Since clearing out those rows in the DB, we haven't had it recur and our test system is syncing well on 2022.08.

@darrynl
Copy link

darrynl commented Oct 4, 2022

Unfortunately the statements @ddholstad99 did not work for me. Entries were removed, but no tasks running. Version 2022.08

@darrynl
Copy link

darrynl commented Oct 7, 2022

Running Uyuni 2022.08

I have tried the statements shared by @ddholstad99 , but I am still stuck with the same issue.

uyuni=# select * from qrtz_simple_triggers;
trigger_name | trigger_group | repeat_count | repeat_interval | times_triggered | sched_name
--------------+---------------+--------------+-----------------+-----------------+------------
(0 rows)

uyuni=# select trigger_name,trigger_group,job_name,trigger_state from qrtz_triggers WHERE trigger_name LIKE '%retry%';
trigger_name | trigger_group | job_name | trigger_state
--------------+---------------+----------+---------------
(0 rows)

In Uyuni when clicking on Admin, Task Schedules, I get this:
image

I see this in /var/log/rhn/rhn_taskomatic.daemon.log:
image

Status of taskomatic service:
image

Please let me know if any other logs/information would help.

@mbussolotto mbussolotto self-assigned this Nov 3, 2022
@mbussolotto
Copy link
Member

mbussolotto commented Nov 3, 2022

@darrynl please follow the instruction here: #5556 (comment) .
Moreover, which uyuni version are you using? Is it an upgrade?

@mbussolotto
Copy link
Member

@darrynl @sysy69 please try to upgrade uyuni and let me know if the issue is still present. If still present, please follow #5556 (comment) . Thanks!

@mbussolotto
Copy link
Member

@darrynl @sysy69 any update ^^? Thanks :)

@mbussolotto
Copy link
Member

unfortunately I wasn't able to reproduce the bug and the current logs are not enough to explain the reason of the issue. Since there's no answer for a long time, I suppose the issue has been fix, so I'll close the bug. Feel free to re-open it if you still see the error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P2
Projects
None yet
Development

No branches or pull requests

10 participants