Crash: 17270152460592452540 #1532

Closed
tigerbeetle-vopr opened this issue Feb 9, 2024 · 3 comments · Fixed by #1598

Comments

@tigerbeetle-vopr

Commit: 1d1182974a36ec480a79f2300a01d0eb01264283

Branches: main

Duration to run seed in ReleaseSafe mode: 3s

Stack Trace:

thread panic: reached unreachable code
src/vsr/replica.zig:0:76: in valid_hash_chain_between (simulator)
src/vsr/replica.zig:5427:49: in repair (simulator)
assert(self.valid_hash_chain_between(self.op_repair_min(), self.op));

src/vsr/replica.zig:2779:24: in tick (simulator)
self.repair();

src/testing/cluster.zig:356:37: in tick (simulator)
replica.tick();

src/simulator.zig:253:23: in main (simulator)
simulator.tick();

zig/lib/std/start.zig:574:37: in posixCallMainAndExit (simulator)
const result = root.main() catch |err| {

zig/lib/std/start.zig:243:5: in _start (simulator)
asm volatile (switch (native_arch) {

???:?:?: in ??? (???)
Unwind information for `???:0x1` was not available, trace may be incomplete

debug: exit with signal: 6. Indicates a crash bug.



Debug Logs:

Tail



SEED=17270152460592452540

replicas=1
standbys=1
clients=6
request_probability=76%
idle_on_probability=14%
idle_off_probability=18%
one_way_delay_mean=8 ticks
one_way_delay_min=0 ticks
packet_loss_probability=6%
path_maximum_capacity=13 messages
path_clog_duration_mean=368 ticks
path_clog_probability=1%
packet_replay_probability=9%
partition_mode=testing.packet_simulator.PartitionMode.isolate_single
partition_symmetry=testing.packet_simulator.PartitionSymmetry.asymmetric
partition_probability=2%
unpartition_probability=9%
partition_stability=143 ticks
unpartition_stability=13 ticks
read_latency_min=0
read_latency_mean=4
write_latency_min=0
write_latency_mean=77
read_fault_probability=8%
write_fault_probability=9%
crash_probability=0.00002%
crash_stability=337 ticks
restart_probability=0.0002%
restart_stability=431 ticks
 0/ .1V0/__1/__1C0:__4Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp2/3Rq
 0/ .1V0/__2/__2C0:__5Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp2/3Rq
 0/ .1V0/__3/__3C0:__6Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp2/3Rq
 1|.1V0/__1/__3C0:__7Jo0/_7J!0:_31Wo <__0:__0>0Ga0G!0G?
 0/ .1V0/__4/__4C0:__7Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp2/3Rq
 0/ .1V0/__5/__5C0:__8Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp2/3Rq
 0/ .1V0/__6/__6C0:__9Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp2/3Rq
 0/ .1V0/__7/__7C0:_10Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp1/3Rq
 1|.1V0/__2/__6C0:__9Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?
 1|.1V0/__3/__6C0:__9Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?
 1|.1V0/__4/__6C0:__9Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?
 1|.1V0/__5/__6C0:__9Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?
 1|.1V0/__6/__6C0:__9Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?
 1|.1V0/__7/__7C0:__9Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?
 0/ .1V0/__8/__8C0:_11Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp1/3Rq
 1|.1V0/__8/__8C0:_12Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?
 0/ .1V0/__9/__9C0:_12Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp1/3Rq
 0/ .1V0/_10/_10C0:_13Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp0/3Rq
 0/ .1V0/_11/_11C0:_14Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp0/3Rq
 0/ .1V0/_12/_12C0:_15Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp0/3Rq
 1|.1V0/__9/_11C0:_14Jo0/_1J!0:_31Wo <__0:__0>0Ga0G!0G?
 1|.1V0/_10/_11C0:_14Jo0/_1J!0:_31Wo <__0:__0>0Ga0G!0G?
 1|.1V0/_11/_11C0:_14Jo0/_1J!0:_31Wo <__0:__0>0Ga0G!0G?
 0/ .1V0/_13/_13C0:_16Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp0/3Rq
 1|.1V0/_12/_13C0:_14Jo0/_1J!0:_31Wo <__0:__0>0Ga0G!0G?
 0/ .1V0/_14/_14C0:_16Jo0/_2J!0:_31Wo <__0:__0>0Ga0G!0G?3/4Pp0/3Rq
 0/ .1V0/_15/_15C0:_17Jo0/_2J!0:_31Wo <__0:__0>0Ga0G!0G?3/4Pp0/3Rq
 0/ .1V0/_16/_16C0:_17Jo0/_1J!0:_31Wo <__0:__0>0Ga0G!0G?2/4Pp0/3Rq
 0/ .1V0/_17/_17C0:_19Jo0/_2J!0:_31Wo <__0:__0>0Ga0G!0G?3/4Pp0/3Rq
 0/ .1V0/_18/_18C0:_21Jo0/_3J!0:_31Wo <__0:__0>0Ga0G!0G?4/4Pp0/3Rq
 0/ .1V0/_19/_19C0:_21Jo0/_2J!0:_31Wo <__0:__0>8Ga0G!0G?3/4Pp0/3Rq
 0/ .1V0/_20/_20C0:_22Jo0/_2J!0:_31Wo <__0:__0>9Ga0G!0G?3/4Pp0/3Rq
 0/ .1V0/_21/_21C0:_24Jo0/_3J!0:_31Wo <__0:__0>9Ga0G!0G?4/4Pp1/3Rq
 0/ .1V0/_22/_22C0:_25Jo0/_3J!0:_31Wo <__0:__0>9Ga0G!0G?4/4Pp1/3Rq
 0/ .1V0/_23/_23C0:_26Jo0/_3J!0:_31Wo <__0:__0>9Ga0G!0G?4/4Pp0/3Rq
 0/ .1V0/_24/_24C0:_26Jo0/_2J!0:_31Wo <__0:__0>9Ga0G!0G?3/4Pp0/3Rq
 0/ .1V0/_25/_25C0:_26Jo0/_1J!0:_31Wo <__0:__0>9Ga0G!0G?2/4Pp0/3Rq
 0/ .1V0/_26/_26C0:_26Jo0/_0J!0:_31Wo <__0:__0>9Ga0G!0G?1/4Pp0/3Rq
 0/ .1V0/_27/_27C0:_27Jo0/_0J!0:_31Wo <__0:__0>17Ga0G!0G?1/4Pp0/3Rq
 0 [ / .1V0/_27/_27C0:_27Jo0/_0J!0:_31Wo <__0:__0>17Ga0G!0G?0/4Pp0/3Rq
 0 ] / .1V23/_27/_27C0:_29Jo0/_1J!0:_31Wo <__0:__0>17Ga0G!0G?2/4Pp0/3Rq
 0/ .1V23/_28/_28C0:_29Jo0/_1J!0:_31Wo <__0:__0>17Ga0G!0G?2/4Pp0/3Rq
 0/ .1V23/_29/_29C0:_31Jo0/_2J!0:_31Wo <__0:__0>18Ga0G!0G?3/4Pp0/3Rq
 0/ .1V23/_30/_30C0:_31Jo0/_1J!0:_31Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .1V23/_31/_31C2:_33Jo0/_2J!0:_31Wo <__0:__0>18Ga0G!0G?3/4Pp0/3Rq
 0/ .1V23/_32/_32C2:_33Jo0/_1J!1:_32Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .1V23/_33/_33C5:_36Jo0/_3J!2:_33Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 0/ .1V23/_34/_34C6:_37Jo0/_3J!3:_34Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 0/ .1V23/_35/_35C7:_38Jo0/_3J!4:_35Wo <__0:__0>18Ga0G!0G?4/4Pp1/3Rq
 0/ .1V23/_36/_36C8:_39Jo0/_3J!5:_36Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 0/ .1V23/_37/_37C8:_39Jo0/_2J!6:_37Wo <__0:__0>18Ga0G!0G?3/4Pp0/3Rq
 0/ .1V23/_38/_38C9:_40Jo0/_2J!7:_38Wo <__0:__0>18Ga0G!0G?3/4Pp0/3Rq
 0/ .1V23/_39/_39C10:_41Jo0/_2J!8:_39Wo <__0:__0>18Ga0G!0G?3/4Pp0/3Rq
 0/ .1V23/_40/_40C10:_41Jo0/_1J!9:_40Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .1V23/_41/_41C11:_42Jo0/_1J!10:_41Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 1 < ~.1V0/_12/_41C0:_31Jo0/_6J!0:_31Wo <__0:__0>0Ga0G!0G?
 0/ .1V23/_42/_42C14:_45Jo0/_3J!11:_42Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 0/ .1V23/_43/_43C15:_46Jo0/_3J!12:_43Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 1 > |.1V23/_23/_42C0:_31Jo0/_7J!0:_31Wo <__0:_27>nullGa0G!0G?
 0/ .1V23/_44/_44C16:_47Jo0/_3J!13:_44Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 0/ .1V23/_45/_45C17:_48Jo0/_3J!14:_45Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 0 $ # #
 0  / r2V23/_23/_42C15:_46Jo0/_0J!15:_46Wo <__0:__0>nullGa0G!0G?
 0/ r2V23/_24/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>17Ga0G!0G?
 0/ r2V23/_25/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>17Ga0G!0G?
 0/ r2V23/_26/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>17Ga0G!0G?
 0/ r2V23/_27/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>17Ga0G!0G?
 0/ r2V23/_28/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>17Ga0G!0G?
 0/ r2V23/_29/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_30/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_31/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_32/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_33/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_34/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_35/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_36/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_37/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_38/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_39/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_40/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_41/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_42/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_43/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_44/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_45/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ r2V23/_46/_46C15:_46Jo0/_0J!15:_46Wo <__0:__0>18Ga0G!0G?
 0/ .2V23/_47/_47C16:_47Jo0/_0J!16:_47Wo <__0:__0>18Ga0G!0G?1/4Pp0/3Rq
 0/ .2V23/_48/_48C18:_49Jo0/_1J!17:_48Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .2V23/_49/_49C19:_50Jo0/_1J!18:_49Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .2V23/_50/_50C22:_53Jo0/_3J!19:_50Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 0/ .2V23/_51/_51C23:_54Jo0/_3J!20:_51Wo <__0:__0>18Ga0G!0G?4/4Pp1/3Rq
 0 [ / .2V23/_51/_51C24:_55Jo0/_4J!20:_51Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 0 ] / .2V47/_51/_51C24:_55Jo0/_3J!21:_52Wo <__0:__0>17Ga0G!0G?4/4Pp0/3Rq
 0/ .2V47/_52/_52C24:_55Jo0/_3J!21:_52Wo <__0:__0>17Ga0G!0G?4/4Pp0/3Rq
 0/ .2V47/_53/_53C25:_56Jo0/_3J!22:_53Wo <__0:__0>18Ga0G!0G?4/4Pp0/3Rq
 1 < ~.1V23/_23/_45C0:_48Jo0/11J!0:_48Wo <__0:_27>17Ga0G!0G?
 0/ .2V47/_54/_54C25:_56Jo0/_2J!23:_54Wo <__0:__0>18Ga0G!0G?3/4Pp0/3Rq
 0/ .2V47/_55/_55C25:_56Jo0/_1J!24:_55Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .2V47/_56/_56C27:_58Jo0/_2J!25:_56Wo <__0:__0>18Ga0G!0G?3/4Pp0/3Rq
 0/ .2V47/_57/_57C28:_59Jo0/_2J!26:_57Wo <__0:__0>18Ga0G!0G?3/4Pp0/3Rq
 0/ .2V47/_58/_58C28:_59Jo0/_1J!27:_58Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .2V47/_59/_59C28:_59Jo0/_0J!28:_59Wo <__0:__0>18Ga0G!0G?1/4Pp0/3Rq
 0/ .2V47/_60/_60C30:_61Jo0/_1J!29:_60Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .2V47/_61/_61C31:_62Jo0/_1J!30:_61Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .2V47/_62/_62C31:_62Jo0/_0J!31:_62Wo <__0:__0>18Ga0G!0G?1/4Pp0/3Rq
 0/ .2V47/_63/_63C34:_65Jo0/_2J!32:_63Wo <__0:__0>18Ga0G!0G?3/4Pp0/3Rq
 0/ .2V47/_64/_64C34:_65Jo0/_1J!33:_64Wo <__0:__0>18Ga0G!0G?2/4Pp0/3Rq
 0/ .2V47/_65/_65C34:_65Jo0/_0J!34:_65Wo <__0:__0>18Ga0G!0G?1/4Pp0/3Rq
 0/ .2V47/_66/_66C35:_66Jo0/_0J!35:_66Wo <__0:__0>18Ga0G!0G?1/4Pp0/3Rq
 1 > |.1V47/_47/_47C0:_48Jo0/20J!0:_48Wo <__0:_51>nullGa0G!0G?
@matklad
Member

matklad commented Feb 12, 2024

          replicas=1
          standbys=1

After state sync, the standby finds a break between the checkpoint's commit_min checksum and the journal's (op_checkpoint() + 1).parent.

@matklad
Member

matklad commented Feb 12, 2024

Debugged this.

The standby in view=1 receives a commit message for view=2. This commit message causes the standby to state sync. However, the standby is still in view=1, as it waits for start view.

So, when the standby finishes the sync, it has:

  • checkpoint from view=2
  • log from view=1 (which was later truncated by the primary).
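The mismatch described above can be illustrated with a toy model. This is only a sketch, not TigerBeetle's code: the `Header` type, the `prepare` and `extend` helpers, the op numbers, and the SHA-256-based checksum are all assumptions made for illustration. It builds one log as prepared in view=1 and another as re-prepared in view=2 after truncation, then pairs a view=2 checkpoint with the view=1 journal.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    op: int
    view: int
    parent: str    # checksum of the header at op - 1
    checksum: str


def prepare(op: int, view: int, parent: str) -> Header:
    """Build a header whose (toy) checksum covers op, view, and parent."""
    digest = hashlib.sha256(f"{op}:{view}:{parent}".encode()).hexdigest()[:16]
    return Header(op, view, parent, digest)


def extend(log: list, count: int, view: int) -> list:
    """Return a copy of `log` with `count` more ops prepared in `view`."""
    log = list(log)
    for _ in range(count):
        # The list index equals the op number because the root sits at op 0.
        log.append(prepare(len(log), view, log[-1].checksum))
    return log


ROOT = prepare(0, 0, "")
shared = extend([ROOT], 4, view=1)        # ops 1..4, common to both histories
standby_log = extend(shared, 4, view=1)   # ops 5..8 as prepared in view=1
primary_log = extend(shared, 4, view=2)   # ops 5..8 re-prepared in view=2

# Hypothetical checkpoint op, chosen inside the diverged range.
op_checkpoint = 6
checkpoint_checksum = primary_log[op_checkpoint].checksum  # synced from view=2

# Standby after state sync: checkpoint from view=2, journal still from view=1.
next_header = standby_log[op_checkpoint + 1]
chain_is_connected = next_header.parent == checkpoint_checksum
print(chain_is_connected)  # False
```

In this model the two histories agree up to op 4, so a checkpoint at or below that op would still connect to the view=1 journal. The break appears only once the checkpoint falls inside the range that the primary truncated and re-prepared.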

@matklad
Member

matklad commented Feb 26, 2024

replica_test: af60c36

matklad added a commit that referenced this issue Feb 27, 2024
The added test is a minimization of #1532.

If replica state-syncs into a newer view, its log may be disconnected
from a checkpoint. To fix this, require that replica jumps view before
jumping sync target.

The effect here is that, after state sync, there's _still_ a break
between the checkpoint and the log, but, as replica is in a view_change
state, it doesn't assume valid hash chain and waits until an SV to
correct the log.

SEED: 17270152460592452540
Closes: #1532
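
The ordering rule in the commit message can be sketched as a small state machine. This is a hedged toy model, not the actual change in `src/vsr/replica.zig`: the `Replica` class, the `Status` enum, and the method names below are hypothetical and exist only to show the idea that a replica enters the newer view (in view_change status) before it accepts a sync target from that view, and only trusts its hash chain again after a start view (SV).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    NORMAL = "normal"
    VIEW_CHANGE = "view_change"


@dataclass
class Replica:
    view: int
    status: Status = Status.NORMAL
    sync_target_view: Optional[int] = None

    def on_commit(self, message_view: int) -> None:
        """Handle a commit message that may come from a newer view."""
        if message_view < self.view:
            return  # stale message; ignored in this toy model
        if message_view > self.view:
            # Jump view first, so the replica is never in normal status
            # with a checkpoint from a view newer than its own.
            self.view = message_view
            self.status = Status.VIEW_CHANGE
        # Only now may the sync target advance.
        self.sync_target_view = message_view

    def on_start_view(self, view: int) -> None:
        """An SV for the current (or a newer) view makes the log trustworthy."""
        if view >= self.view:
            self.view = view
            self.status = Status.NORMAL

    def may_assume_valid_hash_chain(self) -> bool:
        """Repair should only assert the hash chain in normal status."""
        return self.status is Status.NORMAL


standby = Replica(view=1)
standby.on_commit(message_view=2)
print(standby.view, standby.status.value, standby.may_assume_valid_hash_chain())
standby.on_start_view(2)
print(standby.view, standby.status.value, standby.may_assume_valid_hash_chain())
```

With this ordering, the standby from the failing seed would sit in view=2 with view_change status after syncing, so the break between checkpoint and log would be tolerated until the SV arrives rather than tripping an assertion.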