Unable to rebuild edge index #4166

Closed
kikimo opened this issue Apr 15, 2022 · 3 comments
Labels
type/bug Type: something is unexpected

@kikimo
Contributor

kikimo commented Apr 15, 2022


Describe the bug

Unable to rebuild edge index.

Steps to reproduce:

  1. Start a cluster of 5 storage hosts and create a space with 5 replicas, 1 partition, and one edge type (no edge index at first).
  2. Keep inserting edges while triggering a network partition.
  3. Create an edge index and rebuild it; the rebuild fails.
  4. Stop inserting edges, heal the network partition, and rebuild the edge index again; it fails every time.

What we found in the log:

E20220418 11:28:30.456131    40 AddEdgesProcessor.cpp:361] Error! ret = E_LEADER_LEASE_FAILED, spaceId 1
I20220418 11:28:30.456140    40 RaftPart.cpp:218] ===> OOPs, atomOp failed!!!, code = E_RAFT_ATOMIC_OP_FAILED
E20220418 11:28:30.456156    40 AddEdgesProcessor.cpp:361] Error! ret = E_LEADER_LEASE_FAILED, spaceId 1
I20220418 11:28:30.456161    40 RaftPart.cpp:218] ===> OOPs, atomOp failed!!!, code = E_RAFT_ATOMIC_OP_FAILED
E20220418 11:28:30.456176    40 AddEdgesProcessor.cpp:361] Error! ret = E_LEADER_LEASE_FAILED, spaceId 1
I20220418 11:28:30.456183    40 RaftPart.cpp:218] ===> OOPs, atomOp failed!!!, code = E_RAFT_ATOMIC_OP_FAILED
E20220418 11:28:30.456197    40 AddEdgesProcessor.cpp:361] Error! ret = E_LEADER_LEASE_FAILED, spaceId 1
I20220418 11:28:30.456204    40 RaftPart.cpp:218] ===> OOPs, atomOp failed!!!, code = E_RAFT_ATOMIC_OP_FAILED
I20220418 11:28:34.883504    36 AdminTask.cpp:21] createAdminTask (79, 0)
I20220418 11:28:34.883563    36 RebuildIndexTask.cpp:28] Rebuild index task is rate limited to 4194304 for each subtask by default
I20220418 11:28:34.883694    36 AdminTaskManager.cpp:158] enqueue task(79, 0)
I20220418 11:28:34.883728   131 AdminTaskManager.cpp:239] dequeue task(79, 0)
I20220418 11:28:34.883819   131 AdminTaskManager.cpp:282] run task(79, 0), 1 subtasks in 1 thread
I20220418 11:28:34.884032   131 AdminTaskManager.cpp:227] waiting for incoming task
I20220418 11:28:34.884073   799 RebuildIndexTask.cpp:213] Modify the index failed
I20220418 11:28:34.884099   799 RebuildIndexTask.cpp:97] Start building index
I20220418 11:28:34.884121   799 RebuildEdgeIndexTask.cpp:58] Processing Part 1 Failed
I20220418 11:28:34.884126   799 RebuildIndexTask.cpp:100] Building index failed
I20220418 11:28:34.884130   799 AdminTaskManager.cpp:318] subtask of task(79, 0) finished, unfinished task 0
I20220418 11:28:34.884135   799 AdminTask.h:129] task(79, 0) finished, rc=[E_REBUILD_INDEX_FAILED]
I20220418 11:28:34.884284   132 AdminTaskManager.cpp:92] reportTaskFinish(), job=79, task=0, rc=E_REBUILD_INDEX_FAILED
I20220418 11:28:34.888643   132 AdminTaskManager.cpp:134] reportTaskFinish(), job=79, task=0, rc=SUCCEEDED
I20220418 11:28:50.798808    38 AdminTask.cpp:21] createAdminTask (80, 0)
I20220418 11:28:50.798851    38 RebuildIndexTask.cpp:28] Rebuild index task is rate limited to 4194304 for each subtask by default
I20220418 11:28:50.798928    38 AdminTaskManager.cpp:158] enqueue task(80, 0)
I20220418 11:28:50.798934   131 AdminTaskManager.cpp:239] dequeue task(80, 0)
I20220418 11:28:50.798987   131 RebuildIndexTask.cpp:66] This space is building index
I20220418 11:28:50.798995   131 AdminTaskManager.cpp:258] job 80, genSubTask failed, err=E_REBUILD_INDEX_FAILED
I20220418 11:28:50.799010   131 AdminTask.h:129] task(80, 0) finished, rc=[E_REBUILD_INDEX_FAILED]
I20220418 11:28:50.799058   131 AdminTaskManager.cpp:227] waiting for incoming task
I20220418 11:28:50.799137   132 AdminTaskManager.cpp:92] reportTaskFinish(), job=80, task=0, rc=E_REBUILD_INDEX_FAILED
I20220418 11:28:50.800017   132 AdminTaskManager.cpp:134] reportTaskFinish(), job=80, task=0, rc=SUCCEEDED
I20220418 11:28:55.531267    79 MetaClient.cpp:3062] Load leader of "store1":9779 in 0 space

Your Environment

  • Commit id 5626e64


kikimo added the type/bug (Type: something is unexpected) label Apr 15, 2022
kikimo added this to the v3.1.0 milestone Apr 15, 2022
Sophie-Xie assigned critical27 and liuyu85cn and unassigned critical27 Apr 15, 2022
@liuyu85cn
Contributor

Rebuild index runs on the storage raft leader, so it fails if that leader is network-isolated (step 3).

Also, when the raft leader is partitioned from the network, a new leader should be elected, but it looks like the request keeps being sent to the old leader (step).

However, after waiting for a while, kikimo and I found that the request does reach the new leader.
We are now waiting to see whether the rebuild completes successfully.

@critical27
Contributor

I could modify some logic here. Previously, when meta called addTask, it did not handle leader changes, but REBUILD and some other kinds of tasks do need to handle them.
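What "handling leader change" on the meta side could look like is sketched below. This is a hypothetical illustration, not NebulaGraph's meta client: `addTaskWithLeaderRetry`, `lookupLeader`, `sendAddTask` and the `Code` values are made-up names, and the sketch assumes storage can reply with a leader-changed code at addTask time.

```cpp
#include <functional>
#include <string>

// Hypothetical sketch only; not NebulaGraph's meta client code.
enum class Code { SUCCEEDED, E_LEADER_CHANGED, E_RETRY_EXHAUSTED };

// lookupLeader: returns the host meta currently believes leads the part.
// sendAddTask: sends addTask to a host and returns that host's reply.
// Assumption: storage can answer E_LEADER_CHANGED when the task is added.
Code addTaskWithLeaderRetry(
    const std::function<std::string()>& lookupLeader,
    const std::function<Code(const std::string&)>& sendAddTask,
    int maxRetries) {
  for (int attempt = 0; attempt <= maxRetries; ++attempt) {
    // Re-resolve the leader on every attempt so a retry can reach a
    // newly elected leader instead of the stale one.
    const std::string leader = lookupLeader();
    const Code rc = sendAddTask(leader);
    if (rc != Code::E_LEADER_CHANGED) {
      return rc;  // success, or an error unrelated to leadership
    }
  }
  return Code::E_RETRY_EXHAUSTED;
}
```

The key design choice is refreshing the leader before every attempt; retrying against a cached host would reproduce the "request keeps going to the old leader" symptom.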

@critical27
Contributor

After a little digging: the TaskManager in storage can't tell whether it is the leader of a part. It only adds the task to a queue and returns the response. In other words, it can't determine leadership until the job is actually executed.

That is why we can only recover the task on the previous (old) leader. For now, as a workaround, if a network partition causes a leader change, users can start a new job instead of recovering the old one.
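The timing problem can be modeled in a few lines. This is a minimal, hypothetical sketch (`ToyTaskManager`, `addTask`, `runNext`, `Task` and `Code` are illustrative names, not NebulaGraph's actual classes): accepting a task always succeeds, and leadership is only checked when the queued task runs.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model only; these are NOT NebulaGraph's real classes.
enum class Code { SUCCEEDED, E_LEADER_CHANGED, E_NO_TASK };

struct Task {
  int jobId;
  int partId;
};

class ToyTaskManager {
 public:
  // isLeader answers "does this host currently lead partId?"
  explicit ToyTaskManager(std::function<bool(int)> isLeader)
      : isLeader_(std::move(isLeader)) {}

  // Accepting a task never looks at leadership, so it succeeds even on a
  // host that has already lost leadership after a network partition.
  Code addTask(const Task& task) {
    queue_.push_back(task);
    return Code::SUCCEEDED;
  }

  // Leadership is checked only when the task is executed, which is the
  // first point at which a stale leader can notice the problem.
  Code runNext() {
    if (queue_.empty()) {
      return Code::E_NO_TASK;
    }
    Task task = queue_.front();
    queue_.pop_front();
    if (!isLeader_(task.partId)) {
      return Code::E_LEADER_CHANGED;
    }
    // A real implementation would rebuild the index for task.partId here.
    return Code::SUCCEEDED;
  }

 private:
  std::function<bool(int)> isLeader_;
  std::deque<Task> queue_;
};
```

In this model, a task recovered onto a host that has lost leadership is accepted without complaint and only fails later in `runNext`. Starting a new job instead lets meta pick whichever host it currently believes is the leader.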
