Skip to content

server: use same initialcluster config to restart joined member#1279

Merged
disksing merged 10 commits into
tikv:masterfrom
nolouch:fix-join
Oct 25, 2018
Merged

server: use same initialcluster config to restart joined member#1279
disksing merged 10 commits into
tikv:masterfrom
nolouch:fix-join

Conversation

@nolouch

@nolouch nolouch commented Oct 18, 2018

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

detail in: pingcap/tidb-operator#126
because some problem in etcd when joining a member. it may prepare to join failed. and we dynamically generate the join config for join member. and that's may cause the problem:

2018/10/18 03:43:15.239 main.go:92: [fatal] run server failed: couldn't find local name "demo-cluster-pd-3" in the initial cluster configuration

What is changed and how it works?

use the same join config after join success.

Check List

Tests

  • Unit test
  • Integration test(tidb-operator)

@nolouch nolouch changed the title server: use the same initcluster to restart joined member server: use same initalcluster config to restart joined member Oct 18, 2018
Comment thread server/join.go Outdated
}

// Cases with data directory.
filePath := cfg.DataDir + "/join"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
filePath := cfg.DataDir + "/join"
filePath := filepath.Join(cfg.DataDir, "join")

Comment thread server/join.go
Comment thread server/join.go Outdated
cfg.InitialCluster = initialCluster
cfg.InitialClusterState = embed.ClusterStateFlagExisting
return nil
err = os.Mkdir(cfg.DataDir, privateDirMode)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use MkdirAll here?

@nolouch nolouch added DNM and removed DNM labels Oct 23, 2018
@nolouch

nolouch commented Oct 23, 2018

Copy link
Copy Markdown
Contributor Author

PTAL

Comment thread server/join.go Outdated
existed := false
for _, m := range listResp.Members {
if len(m.Name) == 0 {
return errors.New("exsist a member that the join is not completed")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about there is a member that has not been joined successfully?
PTAL @CaitinChen

@CaitinChen CaitinChen Oct 23, 2018

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@disksing there is a member that has not joined successfully

Comment thread server/join.go Outdated
return nil
err = os.MkdirAll(cfg.DataDir, privateDirMode)
if err != nil && !os.IsExist(err) {
return err

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithStack?

Comment thread server/join.go Outdated
return err
}

for i := 0; i < retryTimes; i++ {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need to retry. It's ok to exit directly.

Comment thread server/join.go Outdated
}
break
}
return err

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithStack?

@nolouch nolouch changed the title server: use same initalcluster config to restart joined member server: use same initialcluster config to restart joined member Oct 24, 2018
Comment thread server/join.go
@gregwebs

gregwebs commented Nov 9, 2018

Copy link
Copy Markdown
Contributor

@disksing are there any plans to put this into the next RC (assuming there is one)? This issue shows up frequently in our DinD setup, so we could also use this to produce a docker image specifically for that setup.

@disksing

disksing commented Nov 9, 2018

Copy link
Copy Markdown
Contributor

I think we can include it in RC5. /cc @nolouch

@nolouch nolouch added the needs-cherry-pick-release-2.1 The PR needs to cherry pick to release-2.1 branch. label Nov 12, 2018
@nolouch nolouch deleted the fix-join branch November 12, 2018 03:37
nolouch added a commit that referenced this pull request Nov 12, 2018
* server: use same initialcluster config to restart joined member (#1279)

* server/leader: use the last modify revision to watch leader (#1317)
@nolouch nolouch mentioned this pull request Nov 12, 2018
@nolouch nolouch added the needs-cherry-pick-release-2.0 The PR needs to cherry pick to release-2.0 branch. label Nov 13, 2018
disksing pushed a commit to oh-my-tidb/pd that referenced this pull request Nov 14, 2018
disksing added a commit that referenced this pull request Nov 14, 2018
* server, client: fix hanging problem when etcd failed to start (#1267)

* server: use same initialcluster config to restart joined member (#1279)

* fix server build

* pdctl: cherry pick bugfixes (#1298, #1299, #1308)

* server/api: fix the issue about `regions/check` API (#1311)

* fix join build

* fix pdctl build

* fix region test

* fix warnings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-cherry-pick-release-2.0 The PR needs to cherry pick to release-2.0 branch. needs-cherry-pick-release-2.1 The PR needs to cherry pick to release-2.1 branch.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants