xapi or state.db broken after XCP-NG 7.6 Upgrade from 7.5  #94

@cocoon

As discussed here:
https://xcp-ng.org/forum/topic/575/vmware-xcp-ng-7-6-iso-upgrade-from-7-5-network-gone

After upgrading 2 VMware VMs, xapi no longer works because it can't read the database state.db.
The upgrade was tested with both the ISO and YUM, with the same problem.

Problems that are caused by this:

No network adapter is shown in the console menu, and it cannot be configured with "Configure Management Interface" because there is no interface to select, only "".
On the command line, however, the interfaces and bridges are still there, and ping/ssh from/to the IP still works.

xe commands are not working:

xe network-list
Error: Connection refused (calling connect )

On the console the following message appears:

xapi-nbd[6121]: main: Failed to log in via xapi's Unix domain socket in 300.000000 seconds
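
To tell "xapi is not listening at all" apart from a networking problem, one option is to probe the Unix domain socket directly. This is only a minimal sketch: the function name `can_connect_unix` is made up for illustration, and the socket path is not stated anywhere in this report, so it has to be passed in explicitly.

```python
import socket


def can_connect_unix(path, timeout=2.0):
    """Return True if a process accepts connections on the Unix socket at `path`.

    `path` is an assumption supplied by the caller; this report does not say
    where xapi's socket lives on the affected hosts.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers "connection refused", a missing socket file, and timeouts.
        return False
    finally:
        sock.close()
```

Calling it with the socket path used by your installation (for example `/var/lib/xcp/xapi`, which is an assumption and should be checked on the host) would return `False` while xapi keeps crashing at startup, matching the "Connection refused" that xe reports.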

The real problem:

xapi error in /var/log/xensource.log:

Nov  1 09:25:52 xen-01 xapi: [error|xen-01|0 ||backtrace] server_init D:7b76fe698182 failed with exception Db_exn.DBCache_NotFound("missing column", "Cluster", "network")
Nov  1 09:25:52 xen-01 xapi: [error|xen-01|0 ||backtrace] Raised Db_exn.DBCache_NotFound("missing column", "Cluster", "network")
Nov  1 09:25:52 xen-01 xapi: [error|xen-01|0 ||backtrace] 1/1 xapi @ xen-01 Raised at file (Thread 0 has no backtrace table. Was with_backtraces called?, line 0
Nov  1 09:25:52 xen-01 xapi: [error|xen-01|0 ||backtrace]
Nov  1 09:25:52 xen-01 xapi: [debug|xen-01|0 ||xapi] xapi top-level caught exception: INTERNAL_ERROR: [ missing column; Cluster; network ]
Nov  1 09:25:52 xen-01 xapi: [error|xen-01|0 ||backtrace] Raised Db_exn.DBCache_NotFound("missing column", "Cluster", "network")
Nov  1 09:25:52 xen-01 xapi: [error|xen-01|0 ||backtrace] 1/1 xapi @ xen-01 Raised at file (Thread 0 has no backtrace table. Was with_backtraces called?, line 0
Nov  1 09:25:52 xen-01 xapi: [error|xen-01|0 ||backtrace]
Nov  1 09:25:53 xen-01 xapi: [ warn|xen-01|0 ||xapi] Duplicate configuration keys in Xcp_service.configure: disable-logging-for in [ use-switch; switch-path; search-path; pidfile; log; daemon; disable-logging-for; loglevel; inventory; config; config-dir; master_connection_reset_timeout; master_connection_retry_timeout; master_connection_default_timeout; qemu_dm_ready_timeout; hotplug_timeout; pif_reconfigure_ip_timeout; pool_db_sync_interval; pool_data_sync_interval; domain_shutdown_total_timeout; emergency_reboot_delay_base; emergency_reboot_delay_extra; ha_xapi_healthcheck_interval; ha_xapi_healthcheck_timeout; ha_xapi_restart_attempts; ha_xapi_restart_timeout; logrotate_check_interval; rrd_backup_interval; session_revalidation_interval; update_all_subjects_interval; wait_memory_target_timeout; snapshot_with_quiesce_timeout; host_heartbeat_interval; host_assumed_dead_interval; fuse_time; db_restore_fuse_time; inactive_session_timeout; pending_task_timeout; completed_task_timeout; minimum_time_between_bounces; minimum_time_between_reboot_with_no_added_delay; ha_monitor_interval; ha_monitor_plan_interval; ha_monitor_startup_timeout; ha_default_timeout_base; guest_liveness_timeout; permanent_master_failure_retry_interval; redo_log_max_block_time_empty; redo_log_max_block_time_read; redo_log_max_block_time_writedelta; redo_log_max_block_time_writedb; redo_log_max_startup_time; redo_log_connect_delay; default-vbd3-polling-duration; default-vbd3-polling-idle-threshold; vm_call_plugin_interval; xapi_clusterd_port; sm-plugins; hotfix-fingerprint; logconfig; writereadyfile; writeinitcomplete; nowatchdog; log-getter; onsystemboot; relax-xsm-sr-check; disable-logging-for; disable-dbsync-for; xenopsd-queues; xenopsd-default; nvidia-whitelist; igd-passthru-vendor-whitelist; gvt-g-whitelist; mxgpu-whitelist; pass-through-pif-carrier; cluster-stack-default; ciphersuites-good-outbound; ciphersuites-legacy-outbound; gpumon_stop_timeout; reboot_required_hfxs; xen_livepatch_list; 
kpatch_list; modprobe_path; db_idempotent_map; post-install-scripts-dir; gpg-homedir; xen-cmdline; cluster-stack-root; web-dir; tools-sr-dir; sm-dir; udhcpd-skel; db-config-file; pool_config_file; fcoe-driver; xen-cmdline-script; static-vdis; xsh; xe-toolstack-restart; xe; host-restore; host-backup; upload-wrapper; update-mh-info; logs-download; xe-syslog-reconfigure; set-hostname; host-bugreport-upload; fence; vhd-tool; sparse_dd; redo-log-block-device-io; pbis-force-domain-leave-script; busybox; xapissl; startup-script-hook; rolling-upgrade-script-hook; xapi-message-script; non-managed-pifs; update-issue; killall; nbd-firewall-config; firewall-port-config; nbd_client_manager; pool_secret_path; udhcpd-conf; remote-db-conf-file; logconfig; cpu-info-file; server-cert-path; iscsi_initiatorname; master-scripts-dir; packs-dir; xapi-hooks-root; xapi-plugins-root; xapi-extensions-root; static-vdis-root ]
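
To pull the relevant information out of a noisy xensource.log, the "missing column" exceptions can be extracted with a small script. This is a minimal sketch based only on the message format visible in the log lines above; the function name `missing_columns` is made up for illustration, and other xapi versions may format the exception differently.

```python
import re

# Matches e.g.: Db_exn.DBCache_NotFound("missing column", "Cluster", "network")
# The pattern is derived from the log excerpt in this report, not from xapi docs.
MISSING_COLUMN = re.compile(
    r'Db_exn\.DBCache_NotFound\("missing column",\s*"([^"]+)",\s*"([^"]+)"\)'
)


def missing_columns(lines):
    """Return a sorted list of unique (table, column) pairs reported as missing."""
    found = set()
    for line in lines:
        match = MISSING_COLUMN.search(line)
        if match:
            found.add((match.group(1), match.group(2)))
    return sorted(found)
```

Passing it an open file handle for /var/log/xensource.log (for example `missing_columns(open("/var/log/xensource.log", errors="replace"))`) would, for the excerpt above, yield the single pair `("Cluster", "network")`.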

Details about the environment:

  • VMware ESXi v6.0.0 6921384
  • 3 network adapters, 2 adapters on one network and the one in the middle on another network
  • VM NIC tested: Intel E1000E, and also after changing to E1000
  • I think the VM was initially installed with Citrix XenServer 7.4 and then upgraded to XCP-NG 7.4, or fresh installed directly with XCP-NG 7.4 (can't remember), then upgraded to 7.5 and now to 7.6
  • It was a cluster with a pool, was also used with CloudStack, and had multiple VLANs
  • OVS network backend enabled

The same problem was reported on the forum, occurring on a cluster with Dell M600.
