v2.1.0-rc1 boostd stops accepting deals and performing retrievals, stops displaying piece-doctor UI page #1771

Open
@Shekelme

Checklist

  • This is not a question or a support request. If you have any boost related questions, please ask in the discussion forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file an improvement suggestion instead.
  • I have searched on the issue tracker and the discussion forum, and there is no existing related issue or discussion.
  • I am running the latest release, or the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or have an issue updating to any of these.
  • I did not make any code changes to boost.

Boost component

  • boost daemon - storage providers
  • boost client
  • boost UI
  • boost data-transfer
  • boost index-provider
  • Other

Boost Version

boostd version 2.1.0-rc1+mainnet+git.ba2b53c

Describe the Bug

On two of our team's large miners, the following problem occurs at varying intervals: at some point boostd stops accepting deals and serving retrievals. While it is in this state, every UI page still opens except the piece-doctor page (http://192.168.x.y:8080/piece-doctor).
Today I got lucky: the problem appeared on both miners within the same hour. However, the symptoms on lenovo91 differ from those on lenovo92; please see the "Logging Information" section.
I have tried several builds of this RC, including one compiled from the master branch. I am currently running release v2.1.0-rc1 with commit 52f8bc7 cherry-picked on top.
I don't know how to diagnose this problem, but I will be happy to provide any information needed.
In case it is relevant, the YugabyteDB version is 2.18.2.1.
When I restart the processes to fix the problem, boostd-data does not have to be restarted; everything works again even if it is left running.

Logging Information

LENOVO91
admfc@lenovo91:~$ journalctl -f -u boostd-data
Oct 19 19:43:42 lenovo91 systemd[1]: Started boostd-data.
Oct 19 19:43:42 lenovo91 boostd-data[2077503]: 2023/10/19 19:43:42 goose: no migrations to run. current version: 20230828111523
Oct 19 19:43:42 lenovo91 boostd-data[2077503]: 2023-10-19T19:43:42.907+0300        INFO        boostd-data        cmd/run.go:152        Started boostd-data yugabyte service on address 0.0.0.0:8044
Oct 24 03:52:58 lenovo91 boostd-data[2077503]: 2023-10-24T03:52:58.107+0300        ERROR        rpc        go-jsonrpc@v0.1.8/websocket.go:122        handle me:write tcp 192.168.11.91:8044->192.168.11.91:48918: write: connection reset by peer
Oct 24 03:52:58 lenovo91 boostd-data[2077503]: 2023-10-24T03:52:58.109+0300        ERROR        rpc        go-jsonrpc@v0.1.8/handler.go:369        error and res returned        {"request": {"jsonrpc":"2.0","id":749442,"method":"boostddata.GetIndex","params":[{"/":"baga6ea4seaqaomgacplk5546unk74iv2yalb3wtyawm532cmya5beeh6vwveeaq"}],"meta":{"SpanContext":"AAAhmfZeEpQ5/jnu9swfjcYKAco6iaute5XlAgA="}}, "r.err": "connection closing", "resError": "json: unsupported type: <-chan types.IndexRecord"}
Oct 24 03:52:58 lenovo91 boostd-data[2077503]: 2023-10-24T03:52:58.109+0300        ERROR        rpc        go-jsonrpc@v0.1.8/handler.go:369        error and res returned        {"request": {"jsonrpc":"2.0","id":749441,"method":"boostddata.GetIndex","params":[{"/":"baga6ea4seaqaomgacplk5546unk74iv2yalb3wtyawm532cmya5beeh6vwveeaq"}],"meta":{"SpanContext":"AACc2fiG8+DOJxxD+so385T4AWHeu+lCC7wOAgA="}}, "r.err": "connection closing", "resError": "json: unsupported type: <-chan types.IndexRecord"}
Oct 24 03:52:58 lenovo91 boostd-data[2077503]: 2023-10-24T03:52:58.109+0300        ERROR        rpc        go-jsonrpc@v0.1.8/websocket.go:122        handle me:write tcp 192.168.11.91:8044->192.168.11.91:48918: write: connection reset by peer
Oct 24 03:52:58 lenovo91 boostd-data[2077503]: 2023-10-24T03:52:58.109+0300        ERROR        rpc        go-jsonrpc@v0.1.8/handler.go:369        error and res returned        {"request": {"jsonrpc":"2.0","id":749440,"method":"boostddata.GetIndex","params":[{"/":"baga6ea4seaqaomgacplk5546unk74iv2yalb3wtyawm532cmya5beeh6vwveeaq"}],"meta":{"SpanContext":"AACHvDbU2qS1Uo0wHy+cR7VRAVGTps2S8xU/AgA="}}, "r.err": "connection closing", "resError": "json: unsupported type: <-chan types.IndexRecord"}
Oct 24 03:52:58 lenovo91 boostd-data[2077503]: 2023-10-24T03:52:58.109+0300        ERROR        rpc        go-jsonrpc@v0.1.8/websocket.go:122        handle me:write tcp 192.168.11.91:8044->192.168.11.91:48918: write: connection reset by peer
Oct 24 03:52:58 lenovo91 boostd-data[2077503]: 2023-10-24T03:52:58.109+0300        ERROR        rpc        go-jsonrpc@v0.1.8/websocket.go:122        handle me:write tcp 192.168.11.91:8044->192.168.11.91:48918: write: connection reset by peer
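
A note on the "resError" field in the lines above, in case it helps with triage. Go's encoding/json cannot serialize channel values, so any attempt to marshal a receive-only channel yields an error of exactly this shape. The sketch below reproduces only the message. IndexRecord here is a hypothetical stand-in for the types.IndexRecord named in the log, and marshalErr is a helper invented for illustration. It says nothing about why the connection was reset in the first place.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndexRecord is a hypothetical stand-in for the types.IndexRecord
// mentioned in the boostd-data log; the real type lives in Boost.
type IndexRecord struct {
	Cid    string
	Offset uint64
}

// marshalErr returns the error text encoding/json produces for v,
// or an empty string if v can be marshaled.
func marshalErr(v interface{}) string {
	if _, err := json.Marshal(v); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// A receive-only channel, like the one named in "resError".
	var records <-chan IndexRecord = make(chan IndexRecord)

	// encoding/json rejects channels with an UnsupportedTypeError.
	fmt.Println(marshalErr(records))

	// A plain struct value marshals without error.
	fmt.Println(marshalErr(IndexRecord{Cid: "example", Offset: 1}) == "")
}
```

If this reading is right, the "unsupported type" text may be a side effect of how the failed response was logged rather than the root cause, but that is an assumption on my part.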


LENOVO92
2023-10-24T03:36:59.634+0300^IDEBUG^Ipiecedirectory^Ipiecedirectory/doctor.go:111^IcheckPiece processing^I{"took": 0.075624085}$
2023-10-24T03:36:59.699+0300^IDEBUG^Ipiecedoc^Ipiecedirectory/doctor.go:177^Ichecking state for market deal^I{"piece": "baga6ea4seaqfmypuryk5f2gmptxnc5qgi7vzal4w6dlebvugh25gbjvo7h4a2ai", "deal": 45107371}$
2023-10-24T03:36:59.701+0300^IDEBUG^Ipiecedoc^Ipiecedirectory/doctor.go:238^Iunflagging piece^I{"piece": "baga6ea4seaqfmypuryk5f2gmptxnc5qgi7vzal4w6dlebvugh25gbjvo7h4a2ai"}$
2023-10-24T03:36:59.702+0300^IDEBUG^Ipiecedirectory^Ipiecedirectory/doctor.go:111^IcheckPiece processing^I{"took": 0.068215793}$
2023-10-24T03:36:59.766+0300^IDEBUG^Ipiecedoc^Ipiecedirectory/doctor.go:177^Ichecking state for market deal^I{"piece": "baga6ea4seaqfn36i6abvwjxumsa5io7zxblxfbq4jz4ej5es5er6inci5jhxqia", "deal": 30902550}$
2023-10-24T03:36:59.769+0300^IDEBUG^Ipiecedoc^Ipiecedirectory/doctor.go:238^Iunflagging piece^I{"piece": "baga6ea4seaqfn36i6abvwjxumsa5io7zxblxfbq4jz4ej5es5er6inci5jhxqia"}$
2023-10-24T03:36:59.770+0300^IDEBUG^Ipiecedirectory^Ipiecedirectory/doctor.go:111^IcheckPiece processing^I{"took": 0.067854893}$
2023-10-24T03:36:59.799+0300^IINFO^Iboost-provider^Istoragemarket/contract_deal_monitor.go:86^Icontract deal monitor context canceled, exiting...$
2023-10-24T03:36:59.799+0300^IINFO^Imodules^Imodules/storageminer.go:458^Icontract deals monitor started$
2023-10-24T03:36:59.864+0300^IDEBUG^Ipiecedoc^Ipiecedirectory/doctor.go:177^Ichecking state for market deal^I{"piece": "baga6ea4seaqfn7dccfxobshs6c5dhyx7g7aow7hloeu7qvcguirborldiulhcla", "deal": 7133975}$
---SILENCE---
2023-10-24T04:32:16.316+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:139^Ichecking for sector state updates$
2023-10-24T04:32:16.316+0300^IINFO^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:166^Iredeclaring storage$
2023-10-24T04:32:20.155+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:277^Isector present in all sector states, but not active^I{"number": {"Miner":1222595,"Number":24199}}$
2023-10-24T04:32:20.155+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:277^Isector present in all sector states, but not active^I{"number": {"Miner":1222595,"Number":24189}}$
2023-10-24T04:32:20.155+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:277^Isector present in all sector states, but not active^I{"number": {"Miner":1222595,"Number":24419}}$
2023-10-24T04:32:20.155+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:277^Isector present in all sector states, but not active^I{"number": {"Miner":1222595,"Number":24413}}$
2023-10-24T04:32:20.155+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:277^Isector present in all sector states, but not active^I{"number": {"Miner":1222595,"Number":24190}}$
---LOG LINES---
2023-10-24T04:32:21.200+0300^IDEBUG^Iindex-provider-wrapper^Iindexprovider/wrapper.go:161^Isector {1222595 20947} has 0 deals, seal status Removed$
2023-10-24T04:32:21.202+0300^IDEBUG^Iindex-provider-wrapper^Iindexprovider/wrapper.go:161^Isector {1222595 20815} has 0 deals, seal status Removed$
2023-10-24T04:32:21.204+0300^IDEBUG^Iindex-provider-wrapper^Iindexprovider/wrapper.go:161^Isector {1222595 17329} has 0 deals, seal status Removed$
2023-10-24T04:32:27.536+0300^IINFO^Iboost-storage-deal^Ilogs/log.go:95^ICleaning logs older than 30 days from logsDB $
---SILENCE---
2023-10-24T05:32:16.316+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:139^Ichecking for sector state updates$
2023-10-24T05:32:16.316+0300^IINFO^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:166^Iredeclaring storage$
2023-10-24T05:32:20.022+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:277^Isector present in all sector states, but not active^I{"number": {"Miner":1222595,"Number":24294}}$
2023-10-24T05:32:20.022+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:277^Isector present in all sector states, but not active^I{"number": {"Miner":1222595,"Number":24199}}$
2023-10-24T05:32:20.022+0300^IDEBUG^Isectorstatemgr^Isectorstatemgr/sectorstatemgr.go:277^Isector present in all sector states, but not active^I{"number": {"Miner":1222595,"Number":24424}}$
---LOG LINES---

Repo Steps

  1. Run '...'
  2. Do '...'
  3. See error '...'
    ...
