
rgw: bug fix #8

Merged
merged 7 commits into from Jan 2, 2018

Leeshine

@Leeshine Leeshine commented Nov 28, 2017

We can now sync data from RGW to COS (which supports the S3 protocol) as expected. The related modifications are:

  • use camelcase format in request headers (e.g. use Date instead of DATE)
  • use the time format defined by RFC 1123 in request headers
  • avoid using chunked transfer encoding
  • add virtual hosted-style support
  • add a bucket-suffix tier-config option

Hope you can take a look; we'd be glad if these modifications are useful. @yehudasa

@Leeshine Leeshine changed the title Wip rgw cloud sync rgw: bug fix Nov 28, 2017
@Leeshine Leeshine force-pushed the wip-rgw-cloud-sync branch 2 times, most recently from 0ec0495 to 5a6c030 Compare November 28, 2017 09:33
Owner

@yehudasa yehudasa left a comment

Not finished reviewing, but it seems the code was a bit in flux and the commits I commented on have changed, so I'm submitting this early.

@@ -182,7 +182,7 @@ bool rgw_create_s3_canonical_header(const req_info& info,
const char *str = info.env->get("HTTP_X_AMZ_DATE");
const char *req_date = str;
if (str == NULL) {
req_date = info.env->get("HTTP_DATE");
req_date = info.env->get("HTTP_Date");
Owner

@Leeshine I understand where this is coming from, but I'm not sure this is the correct fix. We use all-uppercase header names (with underscores instead of dashes) for historical reasons. That is how fastcgi used to deal with request headers, so that it would be compatible with the cgi interface (which used env variables for headers). While all this history is not really relevant to what we're trying to do, I'm afraid that without a more comprehensive change, we'll end up breaking unintended stuff.
Now, note that the map used in RGWEnv is already case insensitive, so you don't really need to make the change here. What we can do, however, is have RGWEnv itself transform keys into the required format when new entries are inserted. We'll need to audit the cases where the RGWEnv map is used directly (see RGWEnv::get_map()) and make sure nothing gets broken.
Another option that is less likely to break things is to do a transformation at the http client level, when generating the curl request.
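A minimal sketch of the first option, assuming a simplified stand-in for RGWEnv: keys are normalized to the CGI-style uppercase/underscore form once, on insertion, so lookups with any spelling of a header reach the same entry. `NormalizingEnv` is a hypothetical class and this `uppercase_underscore_http_attr()` is written for the example (the RGW helper of that name may behave differently); the CGI `HTTP_` prefix is left out for brevity.

```cpp
#include <cctype>
#include <map>
#include <string>

// Convert an HTTP header name such as "Content-Type" into the CGI-style
// form "CONTENT_TYPE": letters uppercased, dashes replaced by underscores.
std::string uppercase_underscore_http_attr(const std::string& orig) {
  std::string out;
  out.reserve(orig.size());
  for (char c : orig) {
    if (c == '-') {
      out += '_';
    } else {
      out += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
  }
  return out;
}

// Hypothetical simplified environment: every key is normalized on insert
// and on lookup, so "Date", "DATE" and "date" all refer to one entry.
class NormalizingEnv {
 public:
  void set(const std::string& name, const std::string& value) {
    env_[uppercase_underscore_http_attr(name)] = value;
  }

  // Returns nullptr when the header is absent, like a C-style getter.
  const char* get(const std::string& name) const {
    auto it = env_.find(uppercase_underscore_http_attr(name));
    return it == env_.end() ? nullptr : it->second.c_str();
  }

  // Direct map access (cf. RGWEnv::get_map()) only ever sees normalized keys.
  const std::map<std::string, std::string>& get_map() const { return env_; }

 private:
  std::map<std::string, std::string> env_;
};
```

Normalizing at a single entry point is what makes the audit of direct `get_map()` users tractable: every key in the map is guaranteed to be in one format.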

time_t rawtime;
char buffer[80];

time(&rawtime);
Owner

use ceph::real_clock::now() and ceph::real_clock::to_time_t()

char buffer[80];

time(&rawtime);
struct tm* timeInfo = gmtime(&rawtime);
Owner

gmtime() is not thread safe, use gmtime_r().
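Taken together, the two suggestions above amount to something like the following sketch for producing an RFC 1123 `Date` header value. It assumes a POSIX platform for `gmtime_r()`, and `std::chrono::system_clock` stands in for `ceph::real_clock` so the example compiles outside the Ceph tree; the function names are illustrative only.

```cpp
#include <chrono>
#include <cstddef>
#include <ctime>
#include <string>

// Format a time_t as an RFC 1123 date, e.g. "Wed, 29 Nov 2017 12:31:51 GMT".
// gmtime_r() (POSIX) fills a caller-owned struct tm, unlike gmtime(), which
// returns a pointer to shared static storage and is not thread safe.
std::string rfc1123_date(std::time_t t) {
  struct tm tm_buf;
  if (gmtime_r(&t, &tm_buf) == nullptr) {
    return std::string();
  }
  char buffer[80];
  // %a and %b give English day/month names in the default "C" locale,
  // which is what RFC 1123 expects.
  std::size_t len = std::strftime(buffer, sizeof(buffer),
                                  "%a, %d %b %Y %H:%M:%S GMT", &tm_buf);
  return std::string(buffer, len);
}

// Current time as an RFC 1123 string; system_clock plays the role of
// ceph::real_clock::now() / ceph::real_clock::to_time_t() here.
std::string rfc1123_now() {
  return rfc1123_date(
      std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()));
}
```

For example, `rfc1123_date(1511958711)` yields the same string as the `Date:` header COS returns in the logs further down this thread.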

@yehudasa
Owner

yehudasa commented Nov 28, 2017

@Leeshine I also had a comment on a commit that you dropped (calling url_encode() in send_prepare()). While the commit wasn't completely correct, please note that there is a bug there that could be fixed:
Some callers of send_prepare() already call url_encode() before calling it. Since the only relevant caller that doesn't is internal, you can make this send_prepare() a private method, rename it to something like do_send_prepare(), and add a public wrapper that calls url_encode() on the resource. Make sure the other caller of this send_prepare() now calls do_send_prepare().

Edit: it seems the commit is still there; I added the comment there.

* make attrs Look-Like-This
* converts underscores to dashes
*/
static string camelcase_dash_http_attr(const string& orig)
Owner

@Leeshine this is just a copy-paste of the same function in rgw_rest.cc. You can just put the prototype here if including rgw_rest.h is a problem.

Author

Maybe we should move the related functions camelcase_dash_http_attr, lowercase_dash_http_attr, lowercase_underscore_http_attr and uppercase_underscore_http_attr from rgw_rest to rgw_common?

Owner

sure
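For reference, a standalone sketch of what such a shared "Look-Like-This" helper could look like once moved into a common header. It is written from the doc comment in the diff above ("make attrs Look-Like-This", "converts underscores to dashes") and is not a copy of the rgw_rest.cc implementation, whose exact handling may differ.

```cpp
#include <cctype>
#include <string>

// Underscores (and existing dashes) become dashes; the first letter of each
// dash-separated word is uppercased and the rest are lowercased,
// e.g. "X_AMZ_DATE" -> "X-Amz-Date".
std::string camelcase_dash_http_attr(const std::string& orig) {
  std::string out;
  out.reserve(orig.size());
  bool start_of_word = true;
  for (char c : orig) {
    if (c == '_' || c == '-') {
      out += '-';
      start_of_word = true;
      continue;
    }
    unsigned char uc = static_cast<unsigned char>(c);
    out += static_cast<char>(start_of_word ? std::toupper(uc) : std::tolower(uc));
    start_of_word = false;
  }
  return out;
}
```

Applied when building the curl header list (as in `headers_to_slist()` below), this turns the internal `DATE` / `CONTENT_TYPE` keys into the `Date` / `Content-Type` spelling that COS expects.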

Owner

@yehudasa yehudasa left a comment

see my comments

@@ -251,6 +282,8 @@ static curl_slist *headers_to_slist(param_vec_t& headers)
val = val.substr(5);
}

val = camelcase_dash_http_attr(val);
Owner

ack

@@ -616,9 +616,9 @@ int RGWRESTStreamRWRequest::send_prepare(RGWAccessKey *key, map<string, string>&

string new_resource;
if (resource[0] == '/') {
new_resource = resource.substr(1);
url_encode(resource.substr(1), new_resource);
Owner

Some callers of send_prepare() already call url_encode() before calling it. Since the only relevant caller that doesn't is internal, you can make this send_prepare() a private method, rename it to something like do_send_prepare(), and add a public wrapper that calls url_encode() on the resource. Make sure the other caller of this send_prepare() now calls do_send_prepare().
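The suggested split can be sketched with a stripped-down, hypothetical request class: the public `send_prepare()` encodes exactly once, and the private `do_send_prepare()` assumes its input is already encoded. `StreamRequest`, `send_prepare_encoded()` (standing in for the caller that already encodes) and the minimal encoder are all illustrative, not RGW code.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

class StreamRequest {
 public:
  // Public entry point: URL-encodes the raw resource, then delegates.
  int send_prepare(const std::string& resource) {
    return do_send_prepare(url_encode_simple(resource));
  }

  // Stand-in for the internal caller that has already encoded the resource;
  // it goes straight to do_send_prepare() so nothing is encoded twice.
  int send_prepare_encoded(const std::string& encoded_resource) {
    return do_send_prepare(encoded_resource);
  }

  const std::string& last_resource() const { return last_resource_; }

 private:
  int do_send_prepare(const std::string& encoded_resource) {
    last_resource_ = encoded_resource;  // a real request would build the URL here
    return 0;
  }

  // Minimal percent-encoder: RFC 3986 unreserved characters and '/' are kept.
  static std::string url_encode_simple(const std::string& src) {
    std::string dst;
    for (unsigned char c : src) {
      if (std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~' || c == '/') {
        dst += static_cast<char>(c);
      } else {
        char buf[4];
        std::snprintf(buf, sizeof(buf), "%%%02X", c);
        dst += buf;
      }
    }
    return dst;
  }

  std::string last_resource_;
};
```

Routing every already-encoded caller through `do_send_prepare()` is what prevents double encoding such as `%20` turning into `%2520`.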

}

url_encode(resourceStr, resource);
replaceAll(resource, "%2F", "/");
Owner

I'd rather have another version of url_encode that gets a list of characters that shouldn't be encoded.
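A sketch of that suggestion, which would replace the encode-then-`replaceAll("%2F", "/")` pattern above. `url_encode_keep()` takes the characters that must be left alone, and the `encode_slash` overload mirrors the flag the PR eventually adds. Both are illustrative and use RFC 3986 unreserved characters, which may not match RGW's `char_needs_url_encoding()` exactly. The keep-list variant gets a distinct name on purpose: an overload taking `const std::string&` next to one taking `bool` would make a call with a string literal pick the `bool` overload.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Percent-encode src into dst, leaving RFC 3986 unreserved characters
// (A-Z a-z 0-9 - . _ ~) and every character listed in `keep` untouched.
void url_encode_keep(const std::string& src, std::string& dst,
                     const std::string& keep) {
  dst.clear();
  for (unsigned char c : src) {
    bool unreserved = std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
    if (unreserved || keep.find(static_cast<char>(c)) != std::string::npos) {
      dst += static_cast<char>(c);
    } else {
      char buf[4];
      std::snprintf(buf, sizeof(buf), "%%%02X", c);
      dst += buf;
    }
  }
}

// encode_slash == false keeps '/' so S3 object keys keep their path
// separators instead of becoming "%2F".
void url_encode(const std::string& src, std::string& dst, bool encode_slash) {
  url_encode_keep(src, dst, encode_slash ? std::string() : std::string("/"));
}
```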

if (i != config.end())
host_style_str = i->second;

if (host_style_str == "" || host_style_str != "virtual") {
Owner

No need for two checks here; you can just use if (host_style_str != "virtual"), unless you change the second check to if (host_style_str == "path").
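A minimal sketch of the single-comparison parse. It also gives the config field a default member initializer, which the review later asks for on `AWSSyncConfig::host_style`. The `PathStyle`/`VirtualStyle` names come from the diff. `SyncConfigSketch`, `parse_sync_config()` and the `"host_style"` config key are assumptions made for the example.

```cpp
#include <map>
#include <string>

enum HostStyle { PathStyle, VirtualStyle };

// Default member initializer: a config that never mentions the host style
// still has a well-defined value (path style).
struct SyncConfigSketch {
  HostStyle host_style = PathStyle;
};

// Only an explicit "virtual" selects virtual hosted-style; a missing key, an
// empty value or anything else falls back to path style, so one comparison
// is enough.
SyncConfigSketch parse_sync_config(const std::map<std::string, std::string>& config) {
  SyncConfigSketch conf;
  auto i = config.find("host_style");
  if (i != config.end() && i->second == "virtual") {
    conf.host_style = VirtualStyle;
  }
  return conf;
}
```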

@@ -438,7 +438,7 @@ void RGWRESTSendResource::init_common(param_vec_t *extra_headers)
int RGWRESTSendResource::send(bufferlist& outbl)
{
req.set_outbl(outbl);
int ret = req.send_request(&conn->get_key(), headers, resource, mgr);
int ret = req.send_request(&conn->get_key(), headers, resource, mgr, req.get_outbl());
Owner

This call is weird. req already holds the outbl, so maybe find a different way to tell it to set the send size?

Author

@Leeshine Leeshine Nov 28, 2017

Instead of calling set_outbl() here, maybe we can pass send_data to req.send_request() and call set_outbl(*send_data) in RGWRESTStreamRWRequest::send_prepare(). Is this the right approach?
@yehudasa

Owner

@Leeshine I'm not 100% sure; I need to take a closer look at everything. One thing to keep in mind is that RGWRESTStreamRWRequest can be used without outbl when streaming data. There are also a few subclasses that inherit from it, so I'd be careful not to break those either.

Author

@yehudasa if send_data is not specified here, we would also need to use get_outbl() in RGWRESTStreamRWRequest to decide whether to set the send size. And in RGWRESTStreamRWRequest::send_prepare(), if send_data is specified, set_outbl() is called as well, so the call here seems superfluous. Maybe we only need to remove req.set_outbl(outbl) and pass send_data here? That should not have side effects on the subclasses of RGWRESTStreamRWRequest or on its use when streaming data.

Owner

@Leeshine can you just call req.set_send_length(outbl.length()) here? Another option is to have req.set_outbl() call set_send_length(). I'm not extremely happy with either solution; there isn't a clear distinction in the interfaces between an actual streaming write and a single write where we have all the data in advance.

@@ -452,7 +452,8 @@ int RGWRESTSendResource::send(bufferlist& outbl)
int RGWRESTSendResource::aio_send(bufferlist& outbl)
{
req.set_outbl(outbl);
int ret = req.send_request(&conn->get_key(), headers, resource, mgr);

int ret = req.send_request(&conn->get_key(), headers, resource, mgr, req.get_outbl());
Owner

same here

@Leeshine
Author

Leeshine commented Nov 29, 2017

@yehudasa thanks for your kind review. We made the following changes:

  • since using camelcase format in request headers is a broader problem, we have submitted a separate PR against the master branch (PR: use camelcase format). For convenience of review, we reverted that commit here and will rebase at the end
  • added a private function do_send_prepare() that does not call url_encode(); all resources are now URL-encoded in send_prepare()
  • in RGWAWSHandleRemoteObjCBCR, we get a 409 response from S3 when we resend the create-bucket request, after which data sync no longer proceeds correctly, so we need to handle this case. The related logs are below:
2017-11-29 20:31:40.396743 7f56d246e700 10 received header:HTTP/1.1 409 Conflict
2017-11-29 20:31:40.396754 7f56d246e700 10 receive_http_header
2017-11-29 20:31:40.396767 7f56d246e700 10 received header:Content-Type: application/xml
2017-11-29 20:31:40.396777 7f56d246e700 10 receive_http_header
2017-11-29 20:31:40.396778 7f56d246e700 10 received header:Content-Length: 558
2017-11-29 20:31:40.396805 7f56d246e700 10 receive_http_header
2017-11-29 20:31:40.396808 7f56d246e700 10 received header:Connection: keep-alive
2017-11-29 20:31:40.396815 7f56d246e700 10 receive_http_header
2017-11-29 20:31:40.396816 7f56d246e700 10 received header:Date: Wed, 29 Nov 2017 12:31:51 GMT
2017-11-29 20:31:40.396820 7f56d246e700 10 receive_http_header
2017-11-29 20:31:40.396821 7f56d246e700 10 received header:Server: tencent-cos
2017-11-29 20:31:40.396824 7f56d246e700 10 receive_http_header
2017-11-29 20:31:40.396825 7f56d246e700 10 received header:x-amz-request-id: NWExZWE4YjZfNWJiMjU4NjRfYTA4Yl9kOTljZg==
2017-11-29 20:31:40.396894 7f56d246e700 10 receive_http_header
2017-11-29 20:31:40.396897 7f56d246e700 10 received header:x-amz-trace-id: OGVmYzZiMmQzYjA2OWNhODk0NTRkMTBiOWVmMDAxODc0OWRkZjk0ZDM1NmI1M2E2MTRlY2MzZDhmNmI5MWI1OTBjYzE2MjAxN2M1MzJiOTdkZjMxMDVlYTZjN2FiMmI0MzhmNzA3MDhiNThiMTkzNmQ4ZDY4OTM2OGU4YTM5Y2M=
2017-11-29 20:31:40.396906 7f56d246e700 10 receive_http_header
2017-11-29 20:31:40.396907 7f56d246e700 10 received header:
2017-11-29 20:31:40.397448 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5702789800:23RGWPutRawRESTResourceCRIiE: operate()
2017-11-29 20:31:40.397514 7f56d2c6f700  5 failed to wait for op, ret=-39: PUT rgwx43f7d3c24b444dd385b9eb984104193c93c-1253596042.cos.ap-chengdu.myqcloud.com/
2017-11-29 20:31:40.397536 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5702789800:23RGWPutRawRESTResourceCRIiE: operate() returned r=-39
2017-11-29 20:31:40.397550 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701dd0000:25RGWAWSHandleRemoteObjCBCR: operate()
2017-11-29 20:31:40.397553 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701dd0000:25RGWAWSHandleRemoteObjCBCR: operate() returned r=-39
2017-11-29 20:31:40.397578 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701bf2000:23RGWAWSHandleRemoteObjCR: operate()
2017-11-29 20:31:40.397580 7f56d2c6f700  0 RGWStatRemoteObjCR() callback returned -39
2017-11-29 20:31:40.397581 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701bf2000:23RGWAWSHandleRemoteObjCR: operate() returned r=-39
2017-11-29 20:31:40.397586 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701ce7000:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE: operate()
2017-11-29 20:31:40.397618 7f56d2c6f700 10 RGW-SYNC:data:sync:shard[104]:entry[samonlv:170a17b2-38c4-42e7-9ecc-fafcd3a81e39.4115.1]:bucket[samonlv:170a17b2-38c4-42e7-9ecc-fafcd3a81e39.4115.1]:inc_sync[samonlv:170a17b2-38c4-42e7-9ecc-fafcd3a81e39.4115.1]:entry[tt2]: failed, retcode=-39 ((39) Directory not empty)
  • in addition, the unordered_map<string, bool> bucket_created in RGWAWSHandleRemoteObjCBCR is useless; maybe we should move it to RGWAWSDataSyncModule to avoid creating the bucket every time an object is synced
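A minimal sketch of the last point, assuming the record lives in an object that outlives the per-object callback (such as the sync module), so the create-bucket request is issued once per target bucket instead of once per synced object. `CreatedBucketsSketch` is a hypothetical name; it uses a set where the original uses `unordered_map<string, bool>`, and it does no locking, on the assumption that access is serialized.

```cpp
#include <string>
#include <unordered_set>

class CreatedBucketsSketch {
 public:
  // True if no successful create has been recorded for this bucket yet.
  bool needs_create(const std::string& bucket) const {
    return created_.find(bucket) == created_.end();
  }

  // Call after the create request succeeds (or the bucket is already ours).
  void mark_created(const std::string& bucket) { created_.insert(bucket); }

 private:
  std::unordered_set<std::string> created_;
};
```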

@liuchang0812

needs a rebase

@Leeshine
Author

Leeshine commented Dec 1, 2017

rebased

@liuchang0812

@yehudasa ping

Owner

@yehudasa yehudasa left a comment

see my comments

@@ -1448,23 +1448,22 @@ static bool char_needs_url_encoding(char c)
return false;
}

void url_encode(const string& src, string& dst)
void url_encode(const string& src, string& dst, bool encodeSlash)
Owner

can you change encodeSlash to encode_slash?

if (host_style == VirtualStyle){
resourceStr = obj.get_oid();
new_url = obj.bucket.name + "." + new_url;
}else {
Owner

space after }

string new_url = url;

if (host_style == VirtualStyle){
Owner

space before {
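The two branches in the snippet above boil down to the following mapping, sketched here as a hypothetical `make_target()` helper. It assumes the endpoint is a bare host name without a scheme. The `VirtualStyle`/`PathStyle` names come from the diff; everything else is illustrative.

```cpp
#include <string>

enum HostStyle { PathStyle, VirtualStyle };

struct RequestTarget {
  std::string host;      // URL authority / Host header value
  std::string resource;  // path part of the request
};

// Path style:    s3.example.com        + /bucket/key
// Virtual style: bucket.s3.example.com + /key
RequestTarget make_target(HostStyle style, const std::string& endpoint_host,
                          const std::string& bucket, const std::string& key) {
  RequestTarget t;
  if (style == VirtualStyle) {
    t.host = bucket + "." + endpoint_host;
    t.resource = "/" + key;
  } else {
    t.host = endpoint_host;
    t.resource = "/" + bucket + "/" + key;
  }
  return t;
}
```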

@@ -438,9 +438,24 @@ static void add_grants_headers(map<int, string>& grants, RGWEnv& env, map<string

void RGWRESTStreamS3PutObj::send_init(rgw_obj& obj)
{
string resourceStr;
Owner

resourceStr -> resource_str

uri.append("/");
}

if(host_style == VirtualStyle) {
Owner

spacing


if (host_style_str != "virtual") {
conf.host_style = PathStyle;
}else {
Owner

space after }

string bucket_name="rgwx" + bucket_info.zonegroup;
if (user_buckets){
bucket_name+=bucket_info.owner.tenant + bucket_info.owner.id;
}
bucket_name.erase(std::remove(bucket_name.begin(),bucket_name.end(),'-'));
if (bucket_suffix != "")
Owner

open scope

Owner

also, if (!bucket_suffix.empty())
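A sketch of the bucket-name logic above with both review comments applied (braces and `empty()`). One further detail is worth checking: the snippet calls `erase()` with a single iterator, which removes only one character. The usual erase-remove idiom passes `end()` as a second argument. `make_target_bucket_name()` is a hypothetical helper, and the `'-'` separator before the suffix is an assumption suggested by the bucket name in the logs (`rgwx...-1253596042`); the actual appending code is not shown.

```cpp
#include <algorithm>
#include <string>

std::string make_target_bucket_name(const std::string& zonegroup,
                                    bool user_buckets,
                                    const std::string& tenant,
                                    const std::string& user_id,
                                    const std::string& bucket_suffix) {
  std::string bucket_name = "rgwx" + zonegroup;
  if (user_buckets) {
    bucket_name += tenant + user_id;
  }
  // Erase-remove idiom: std::remove compacts the non-'-' characters to the
  // front and returns the new logical end; the two-argument erase() then
  // drops the leftover tail in one go.
  bucket_name.erase(std::remove(bucket_name.begin(), bucket_name.end(), '-'),
                    bucket_name.end());
  if (!bucket_suffix.empty()) {
    bucket_name += "-" + bucket_suffix;  // separator is an assumption
  }
  return bucket_name;
}
```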

@@ -971,6 +974,10 @@ int RGWAWSSyncModule::create_instance(CephContext *cct, map<string, string, ltst
if (i != config.end())
conf.s3_endpoint = i->second;

i = config.find("bucket_suffix");
if ( i != config.end())
Owner

scope

h = curl_slist_append(h, "Transfer-Encoding:");
stringstream ss;
ss << "Content-Length: " << send_len;
h = curl_slist_append(h, ss.str().c_str());
Owner

is all of this needed? shouldn't setting CURLOPT_INFILESIZE do it for us already?

Author

To be honest, it's an unnecessary change: if send_length is specified correctly, chunked encoding will not be used.

@@ -811,7 +811,7 @@ class RGWAWSHandleRemoteObjCBCR: public RGWStatRemoteObjCBCR {
sync_env->http_manager,
target_bucket_name, nullptr, bl, nullptr));
}
if (retcode < 0) {
if (retcode < 0 && retcode != -ENOLCK) {
Owner

This doesn't sound right to me. Where do you get the -ENOLCK, and why does ignoring it here fix it for you?

Author

sorry, that's my mistake, it should be -ENOTEMPTY in this case.

@Leeshine Leeshine force-pushed the wip-rgw-cloud-sync branch 3 times, most recently from eb98bf5 to df1e114 Compare December 4, 2017 14:05
@Leeshine
Author

Leeshine commented Dec 4, 2017

thanks for your kind review. @yehudasa ping

Owner

@yehudasa yehudasa left a comment

see comments

@@ -812,7 +812,7 @@ class RGWAWSHandleRemoteObjCBCR: public RGWStatRemoteObjCBCR {
sync_env->http_manager,
target_bucket_name, nullptr, bl, nullptr));
}
if (retcode < 0) {
if (retcode < 0 && retcode != -ENOTEMPTY) {
Owner

are you sure it's not -EEXIST?

Author

@Leeshine Leeshine Dec 4, 2017

Currently the retcode is -39 (-ENOTEMPTY); the related logs are below:

2017-11-29 20:31:40.397448 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5702789800:23RGWPutRawRESTResourceCRIiE: operate()
2017-11-29 20:31:40.397514 7f56d2c6f700  5 failed to wait for op, ret=-39: PUT rgwx43f7d3c24b444dd385b9eb984104193c93c-1253596042.cos.ap-chengdu.myqcloud.com/
2017-11-29 20:31:40.397536 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5702789800:23RGWPutRawRESTResourceCRIiE: operate() returned r=-39
2017-11-29 20:31:40.397550 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701dd0000:25RGWAWSHandleRemoteObjCBCR: operate()
2017-11-29 20:31:40.397553 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701dd0000:25RGWAWSHandleRemoteObjCBCR: operate() returned r=-39
2017-11-29 20:31:40.397578 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701bf2000:23RGWAWSHandleRemoteObjCR: operate()
2017-11-29 20:31:40.397580 7f56d2c6f700  0 RGWStatRemoteObjCR() callback returned -39
2017-11-29 20:31:40.397581 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701bf2000:23RGWAWSHandleRemoteObjCR: operate() returned r=-39
2017-11-29 20:31:40.397586 7f56d2c6f700 20 cr:s=0x7f570209d560:op=0x7f5701ce7000:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE: operate()
2017-11-29 20:31:40.397618 7f56d2c6f700 10 RGW-SYNC:data:sync:shard[104]:entry[samonlv:170a17b2-38c4-42e7-9ecc-fafcd3a81e39.4115.1]:bucket[samonlv:170a17b2-38c4-42e7-9ecc-fafcd3a81e39.4115.1]:inc_sync[samonlv:170a17b2-38c4-42e7-9ecc-fafcd3a81e39.4115.1]:entry[tt2]: failed, retcode=-39 ((39) Directory not empty)

But to be honest, I think the more reasonable retcode would be -EEXIST or -ERR_BUCKET_EXISTS.

Owner

oh, I see. Yeah, the 409 is ambiguous. Maybe we can somehow pass the http status code and use that here instead?

@@ -971,6 +975,10 @@ int RGWAWSSyncModule::create_instance(CephContext *cct, map<string, string, ltst
if (i != config.end())
conf.s3_endpoint = i->second;

i = config.find("bucket_suffix");
if (i != config.end())
Owner

missing { and }

@@ -44,6 +43,8 @@ static string obj_to_aws_path(const rgw_obj& obj)
struct AWSSyncConfig {
string s3_endpoint;
RGWAccessKey key;
HostStyle host_style;
Owner

@Leeshine still needs to have default initialization

@@ -702,7 +702,9 @@ int RGWRESTStreamRWRequest::do_send_prepare(RGWAccessKey *key, map<string, strin
if (send_data) {
set_outbl(*send_data);
send_data_hint = true;
set_send_length(send_data->length());

We should call set_send_length() before set_outbl().

If send_data is not nullptr, outbl is still empty before set_outbl() is called, and set_outbl() swaps send_data and outbl, so send_data->length() returns 0 afterwards and the content length will be 0:

  void set_outbl(bufferlist& _outbl) {
    outbl.swap(_outbl);
  }
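The ordering problem can be reproduced with a toy model. Here `std::string` stands in for `ceph::bufferlist`, `set_outbl()` swaps like the snippet above, and `set_outbl_with_length()` sketches the reviewer's other option of recording the length inside `set_outbl()` itself. `ToyRequest` and its members are hypothetical.

```cpp
#include <cstddef>
#include <string>

struct ToyRequest {
  std::string outbl;
  std::size_t send_len = 0;

  // Same shape as the real set_outbl(): the caller's buffer is swapped out.
  void set_outbl(std::string& _outbl) { outbl.swap(_outbl); }
  void set_send_length(std::size_t len) { send_len = len; }

  // Alternative: capture the length before the swap empties the caller's buffer.
  void set_outbl_with_length(std::string& _outbl) {
    set_send_length(_outbl.size());
    outbl.swap(_outbl);
  }
};
```

Calling `set_outbl(data)` followed by `set_send_length(data.size())` records 0, because `data` has already been swapped with the empty `outbl`. Reversing the two calls, or using the combined helper, records the real length.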

@Leeshine Leeshine force-pushed the wip-rgw-cloud-sync branch 2 times, most recently from a2affda to 3743d1c Compare December 14, 2017 05:09
Signed-off-by: lvshanchun <lvshanchun@gmail.com>
When sending requests to S3, we should not encode the forward
slash character ('/') in the object key name, so add an
encode_slash param to url_encode() to decide whether to encode
the slash or not.

Signed-off-by: lvshanchun <lvshanchun@gmail.com>
Encode the resource in send_prepare(); as a result, all callers
of do_send_prepare() have already URL-encoded it before calling it.

Signed-off-by: lvshanchun <lvshanchun@gmail.com>
Add a host-style field in tier-config to specify the hosted style
used in requests from RGW to the related zone. If this config is
not specified, path-style is used by default.

Signed-off-by: lvshanchun <lvshanchun@gmail.com>
Signed-off-by: lvshanchun <lvshanchun@gmail.com>
Signed-off-by: lvshanchun <lvshanchun@gmail.com>
pass the http body and use it when creating a bucket

Signed-off-by: lvshanchun <lvshanchun@gmail.com>
@Leeshine
Author

Leeshine commented Dec 14, 2017

@yehudasa ping
Modifications:

  • pass the HTTP body to RGWAWSHandleRemoteObjCBCR when creating a bucket; if the error code in the body is BucketAlreadyOwnedByYou, the data sync CR continues, otherwise it exits.
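A dependency-free sketch of that decision. S3 can answer a create-bucket request with 409 Conflict both when the caller already owns the bucket (`BucketAlreadyOwnedByYou`) and when another account owns the name (`BucketAlreadyExists`), which is why the status code alone is ambiguous. `s3_error_code()` and `create_bucket_ok()` are hypothetical helpers, and the plain string search stands in for a proper XML parser.

```cpp
#include <string>

// Return the text of the first <Code>...</Code> element in an S3 error body,
// or an empty string if none is found.
std::string s3_error_code(const std::string& body) {
  const std::string open = "<Code>";
  const std::string close = "</Code>";
  auto start = body.find(open);
  if (start == std::string::npos) {
    return std::string();
  }
  start += open.size();
  auto end = body.find(close, start);
  if (end == std::string::npos) {
    return std::string();
  }
  return body.substr(start, end - start);
}

// Treat a 2xx reply, or a 409 whose code says we already own the bucket, as
// success; any other 409 (e.g. BucketAlreadyExists) is a real failure.
bool create_bucket_ok(int http_status, const std::string& body) {
  if (http_status >= 200 && http_status < 300) {
    return true;
  }
  return http_status == 409 && s3_error_code(body) == "BucketAlreadyOwnedByYou";
}
```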

@Leeshine
Author

@yehudasa would you mind taking a look?

@yehudasa
Owner

Sorry for the late response; I'm out this week and will continue when I'm back.

@yehudasa yehudasa merged commit 9bba8ee into yehudasa:wip-rgw-cloud-sync Jan 2, 2018
yehudasa pushed a commit that referenced this pull request Feb 21, 2020
According to cppreference.com [1]:

  "If multiple threads of execution access the same std::shared_ptr
  object without synchronization and any of those accesses uses
  a non-const member function of shared_ptr then a data race will
  occur (...)"

[1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic
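One common way to follow that rule is sketched below. It is not necessarily what this commit does. A holder routes every read and write of the shared `shared_ptr` object through the C++11 free functions `std::atomic_load` / `std::atomic_store`, so a reader always gets its own intact copy even while a writer replaces the pointer. `SharedPtrSlot` is a hypothetical name. C++20 deprecates these free functions in favour of `std::atomic<std::shared_ptr<T>>`.

```cpp
#include <memory>
#include <utility>

template <typename T>
class SharedPtrSlot {
 public:
  explicit SharedPtrSlot(std::shared_ptr<const T> initial)
      : ptr_(std::move(initial)) {}

  // Reader side (cf. get_osdmap()): an atomic copy that keeps the object
  // alive even if the slot is reassigned concurrently.
  std::shared_ptr<const T> get() const { return std::atomic_load(&ptr_); }

  // Writer side (cf. the assignment in _committed_osd_maps()).
  void set(std::shared_ptr<const T> next) {
    std::atomic_store(&ptr_, std::move(next));
  }

 private:
  std::shared_ptr<const T> ptr_;
};
```

Readers that hold on to the copy returned by `get()` keep the old object alive, so the writer can swap in a new one without the reader ever touching a released control block.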

One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap`
with healthy looking content but damaged control block:

  ```
  [Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  #1  0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #3  0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #4  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #5  std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
  #6  OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  #8  0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
  #9  0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
  #10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
  #11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
  #12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
  #13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
  #14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
  #15 0x0000000000000000 in ?? ()
  (gdb) frame 7
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  9665      in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
  (gdb) print osdmap
  $24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
  (gdb) print *osdmap
     # pretty sane OSDMap
  (gdb) print sizeof(osdmap)
  $26 = 16
  (gdb) x/2a &osdmap
  0x559ca22acef0:   0x559cba028000  0x559cba0ec900

  (gdb) frame 2
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  148       /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
  (gdb) disassemble
  Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
  ...
     0x0000559c97675b1e <+62>:      mov    (%rdi),%rax
     0x0000559c97675b21 <+65>:      mov    %rdi,%rbx
     0x0000559c97675b24 <+68>:      callq  *0x10(%rax)
  => 0x0000559c97675b27 <+71>:      test   %rbp,%rbp
  ...
  End of assembler dump.
  (gdb) info registers rdi rbx rax
  rdi            0x559cba0ec900      94131624790272
  rbx            0x559cba0ec900      94131624790272
  rax            0x559cba0ec8a0      94131624790176
  (gdb) x/a 0x559cba0ec8a0 + 0x10
  0x559cba0ec8b0:   0x559cb81c3ea0
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  ...
  (gdb) p $_siginfo._sifields._sigfault.si_addr
  $27 = (void *) 0x559cb81c3ea0
  ```

Helgrind seems to agree:
  ```
  ==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread ceph#90
  ==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:02:54.519 510301==    at 0x7218DD: operator= (shared_ptr_base.h:1078)
  ==00:00:02:54.519 510301==    by 0x7218DD: operator= (shared_ptr.h:103)
  ==00:00:02:54.519 510301==    by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:02:54.519 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:02:54.519 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:02:54.519 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:02:54.519 510301==
  ==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread ceph#117
  ==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
  ==00:00:02:54.519 510301==    at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
  ==00:00:02:54.519 510301==    by 0x6B5842: shared_ptr (shared_ptr.h:129)
  ==00:00:02:54.519 510301==    by 0x6B5842: get_osdmap (OSD.h:1700)
  ==00:00:02:54.519 510301==    by 0x6B5842: OSD::create_context() (OSD.cc:9053)
  ==00:00:02:54.519 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:02:54.519 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:02:54.519 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:02:54.519 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:02:54.519 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:02:54.519 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==  Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
  ==00:00:02:54.519 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:02:54.519 510301==    by 0x66F766: main (ceph_osd.cc:688)
  ==00:00:02:54.519 510301==  Block was alloc'd by thread #1
  ```

Actually, there are plenty of similar issues reported, like:
  ```
  ==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread ceph#119
  ==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
  ==00:00:05:04.903 510301==    at 0x753165: clear (hashtable.h:2051)
  ==00:00:05:04.903 510301==    by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
  ==00:00:05:04.903 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:05:04.903 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:05:04.903 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:05:04.903 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:05:04.903 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:05:04.903 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:05:04.903 510301==
  ==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread #90
  ==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:05:04.903 510301==    at 0x7531E1: clear (hashtable.h:2054)
  ==00:00:05:04.903 510301==    by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:747)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:1078)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==  Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
  ==00:00:05:04.903 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:05:04.903 510301==    by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:699)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:1732)
  ==00:00:05:04.903 510301==    by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
yehudasa pushed a commit that referenced this pull request Apr 13, 2020
* No need to call discard_result(), as `output_stream::close()` already
  returns an empty future<>.
* Free the connected socket only after the background task finishes,
  because we should not free the connected socket before the promise
  referencing it is fulfilled.

Otherwise, ASan reports errors like:

==287182==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000019aa0 at pc 0x55e2ae2de882 bp 0x7fff7e2bf080 sp 0x7fff7e2bf078
READ of size 8 at 0x611000019aa0 thread T0
    #0 0x55e2ae2de881 in seastar::reactor_backend_aio::await_events(int, __sigset_t const*) ../src/seastar/src/core/reactor_backend.cc:396
    #1 0x55e2ae2dfb59 in seastar::reactor_backend_aio::reap_kernel_completions() ../src/seastar/src/core/reactor_backend.cc:428
    #2 0x55e2adbea397 in seastar::reactor::reap_kernel_completions_pollfn::poll() (/var/ssd/ceph/build/bin/crimson-osd+0x155e9397)
    #3 0x55e2adaec6d0 in seastar::reactor::poll_once() ../src/seastar/src/core/reactor.cc:2789
    #4 0x55e2adae7cf7 in operator() ../src/seastar/src/core/reactor.cc:2687
    #5 0x55e2adb7c595 in __invoke_impl<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:60
    #6 0x55e2adb699b0 in __invoke_r<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:113
    #7 0x55e2adb50222 in _M_invoke /usr/include/c++/10/bits/std_function.h:291
    #8 0x55e2adc2ba00 in std::function<bool ()>::operator()() const /usr/include/c++/10/bits/std_function.h:622
    #9 0x55e2adaea491 in seastar::reactor::run() ../src/seastar/src/core/reactor.cc:2713
    #10 0x55e2ad98f1c7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) ../src/seastar/src/core/app-template.cc:199
    #11 0x55e2a9e57538 in main ../src/crimson/osd/main.cc:148
    #12 0x7fae7f20de0a in __libc_start_main ../csu/libc-start.c:308
    #13 0x55e2a9d431e9 in _start (/var/ssd/ceph/build/bin/crimson-osd+0x117421e9)

0x611000019aa0 is located 96 bytes inside of 240-byte region [0x611000019a40,0x611000019b30)
freed by thread T0 here:
    #0 0x7fae80a4e487 in operator delete(void*, unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.6+0xac487)
    #1 0x55e2ae302a0a in seastar::aio_pollable_fd_state::~aio_pollable_fd_state() ../src/seastar/src/core/reactor_backend.cc:458
    #2 0x55e2ae2e1059 in seastar::reactor_backend_aio::forget(seastar::pollable_fd_state&) ../src/seastar/src/core/reactor_backend.cc:524
    #3 0x55e2adab9b9a in seastar::pollable_fd_state::forget() ../src/seastar/src/core/reactor.cc:1396
    #4 0x55e2adab9d05 in seastar::intrusive_ptr_release(seastar::pollable_fd_state*) ../src/seastar/src/core/reactor.cc:1401
    #5 0x55e2ace1b72b in boost::intrusive_ptr<seastar::pollable_fd_state>::~intrusive_ptr() /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:98
    #6 0x55e2ace115a5 in seastar::pollable_fd::~pollable_fd() ../src/seastar/include/seastar/core/internal/pollable_fd.hh:109
    #7 0x55e2ae0ed35c in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
    #8 0x55e2ae0ed3cf in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
    #9 0x55e2ae0ed943 in std::default_delete<seastar::net::api_v2::server_socket_impl>::operator()(seastar::net::api_v2::server_socket_impl*) const /usr/include/c++/10/bits/unique_ptr.h:81
    #10 0x55e2ae0db357 in std::unique_ptr<seastar::net::api_v2::server_socket_impl, std::default_delete<seastar::net::api_v2::server_socket_impl> >::~unique_ptr() /usr/include/c++/10/bits/unique_ptr.h:357
    #11 0x55e2ae1438b7 in seastar::api_v2::server_socket::~server_socket() ../src/seastar/src/net/stack.cc:195
    #12 0x55e2aa1c7656 in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_destroy() /usr/include/c++/10/optional:260
    #13 0x55e2aa16c84b in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_reset() /usr/include/c++/10/optional:280
    #14 0x55e2ac24b2b7 in std::_Optional_base_impl<seastar::api_v2::server_socket, std::_Optional_base<seastar::api_v2::server_socket, false, false> >::_M_reset() /usr/include/c++/10/optional:432
    #15 0x55e2ac23f37b in std::optional<seastar::api_v2::server_socket>::reset() /usr/include/c++/10/optional:975
    #16 0x55e2ac21a2e7 in crimson::admin::AdminSocket::stop() ../src/crimson/admin/admin_socket.cc:265
    #17 0x55e2aa099825 in operator() ../src/crimson/osd/osd.cc:450
    #18 0x55e2aa0d4e3e in apply ../src/seastar/include/seastar/core/apply.hh:36
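
The fix is an ordering rule: wait for the background work that still references the socket, and only then release the socket. The following is a minimal sketch of that ordering. It assumes plain `std::async`/`std::future` in place of seastar's futures, and `Socket`, `Server`, and `accept_once()` are hypothetical stand-ins, not crimson's `AdminSocket` or seastar's `server_socket` API.

```cpp
#include <cassert>
#include <future>
#include <optional>
#include <string>

// Hypothetical stand-in for a listening socket; not seastar's server_socket.
struct Socket {
    std::string path;
    // Pretend work that reads the socket's state from another thread.
    bool accept_once() const { return !path.empty(); }
};

// Hypothetical owner of the socket, loosely modelled on a start()/stop() pair
// where stop() resets a std::optional holding the socket.
class Server {
public:
    void start() {
        socket_.emplace(Socket{"admin.sock"});
        const Socket& s = *socket_;  // the background task keeps this reference
        task_ = std::async(std::launch::async, [&s] { return s.accept_once(); });
    }

    // Wait for the background task first, and only then release the socket.
    bool stop() {
        bool result = false;
        if (task_.valid()) {
            result = task_.get();  // after this, nothing refers to *socket_
            assert(!task_.valid());
        }
        socket_.reset();  // safe: no pending work references the socket
        return result;
    }

    bool has_socket() const { return socket_.has_value(); }

private:
    std::optional<Socket> socket_;  // declared first, destroyed last
    std::future<bool> task_;        // destroyed first; a std::async future blocks until its task ends
};
```

Because `task_` is declared after `socket_`, the implicit destructor destroys (and therefore waits on) the `std::async` future before the socket, so in this sketch skipping `stop()` does not reintroduce the use-after-free. Resetting `socket_` before `task_.get()` would reproduce the bug pattern shown in the trace above.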

Signed-off-by: Kefu Chai <kchai@redhat.com>
yehudasa pushed a commit that referenced this pull request Apr 21, 2022
The problem is:

```
DEBUG 2022-03-07 13:50:40,027 [shard 0] osd - calling method rbd.create, num_read=0, num_write=0
DEBUG 2022-03-07 13:50:40,027 [shard 0] objclass - <cls> ../src/cls/rbd/cls_rbd.cc:787: create object_prefix=parent_id size=2097152 order=0 features=1
DEBUG 2022-03-07 13:50:40,027 [shard 0] osd - handling op omap-get-vals-by-keys on object 1:144d5af5:::parent_id:head
=================================================================
==2109764==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f6de5176e70 at pc 0x7f6dfd2a7157 bp 0x7f6de5176e30 sp 0x7f6de51765d8
WRITE of size 24 at 0x7f6de5176e70 thread T0
    #0 0x7f6dfd2a7156 in __interceptor_sigaltstack.part.0 (/lib64/libasan.so.6+0x54156)
    #1 0x7f6dfd30d5b3 in __asan::PlatformUnpoisonStacks() (/lib64/libasan.so.6+0xba5b3)
    #2 0x7f6dfd31314c in __asan_handle_no_return (/lib64/libasan.so.6+0xc014c)
Reactor stalled for 275 ms on shard 0. Backtrace: 0x45d9d 0xda72bd3 0xd801f73 0xd81f6f9 0xd81fb9c 0xd81fe2c 0xd8200f7 0x12b2f 0x7f6dfd3383c1 0x7f6dfd339b18 0x7f6dfd339bd4 0x7f6dfd339bd4 0x7f6dfd339bd4 0x7f6dfd339bd4 0x7f6dfd33b089 0x7f6dfd33bb36 0x7f6dfd32e0b5 0x7f6dfd32ff3a 0xd61d0 0x32412 0xbd8a7 0xbd134 0x54178 0xba5b3 0xc014c 0x1881f22 0x188344a 0xe8b439d 0xe8b58f2 0x2521d5a 0x2a2ee12 0x2c76349 0x2e04ce9 0x3c70c55 0x3cb8aa8 0x7f6de558de39
    #3 0x1881f22 in fmt::v6::internal::arg_map<fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char> >::~arg_map() /usr/include/fmt/core.h:1170
    #4 0x1881f22 in fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char>::~basic_format_context() /usr/include/fmt/core.h:1265
    #5 0x1881f22 in fmt::v6::format_handler<fmt::v6::arg_formatter<fmt::v6::internal::output_range<seastar::internal::log_buf::inserter_iterator, char> >, char, fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char> >::~format_handler() /usr/include/fmt/format.h:3143
    #6 0x1881f22 in fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char>::iterator fmt::v6::vformat_to<fmt::v6::arg_formatter<fmt::v6::internal::output_range<seastar::internal::log_buf::inserter_iterator, char> >, char, fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char> >(fmt::v6::arg_formatter<fmt::v6::internal::output_range<seastar::internal::log_buf::inserter_iterator, char> >::range, fmt::v6::basic_string_view<char>, fmt::v6::basic_format_args<fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char> >, fmt::v6::internal::locale_ref) /usr/include/fmt/format.h:3206
    #7 0x188344a in seastar::internal::log_buf::inserter_iterator fmt::v6::vformat_to<fmt::v6::basic_string_view<char>, seastar::internal::log_buf::inserter_iterator, , 0>(seastar::internal::log_buf::inserter_iterator, fmt::v6::basic_string_view<char> const&, fmt::v6::basic_format_args<fmt::v6::basic_format_context<fmt::v6::type_identity<seastar::internal::log_buf::inserter_iterator>::type, fmt::v6::internal::char_t_impl<fmt::v6::basic_string_view<char>, void>::type> >) /usr/include/fmt/format.h:3395
    #8 0x188344a in seastar::internal::log_buf::inserter_iterator fmt::v6::format_to<seastar::internal::log_buf::inserter_iterator, std::basic_string_view<char, std::char_traits<char> >, hobject_t const&, 0>(seastar::internal::log_buf::inserter_iterator, std::basic_string_view<char, std::char_traits<char> > const&, hobject_t const&) /usr/include/fmt/format.h:3418
    #9 0x188344a in seastar::logger::log<hobject_t const&>(seastar::log_level, seastar::logger::format_info, hobject_t const&)::{lambda(seastar::internal::log_buf::inserter_iterator)#1}::operator()(seastar::internal::log_buf::inserter_iterator) const ../src/seastar/include/seastar/util/log.hh:227
    #10 0x188344a in seastar::logger::lambda_log_writer<seastar::logger::log<hobject_t const&>(seastar::log_level, seastar::logger::format_info, hobject_t const&)::{lambda(seastar::internal::log_buf::inserter_iterator)#1}>::operator()(seastar::internal::log_buf::inserter_iterator) ../src/seastar/include/seastar/util/log.hh:106
    #11 0xe8b439d in operator() ../src/seastar/src/util/log.cc:268
    #12 0xe8b58f2 in seastar::logger::do_log(seastar::log_level, seastar::logger::log_writer&) ../src/seastar/src/util/log.cc:280
    #13 0x2521d5a in void seastar::logger::log<hobject_t const&>(seastar::log_level, seastar::logger::format_info, hobject_t const&) ../src/seastar/include/seastar/util/log.hh:230
    #14 0x2a2ee12 in void seastar::logger::debug<hobject_t const&>(seastar::logger::format_info, hobject_t const&) ../src/seastar/include/seastar/util/log.hh:373
    #15 0x2a2ee12 in PGBackend::omap_get_vals_by_keys(ObjectState const&, OSDOp&, object_stat_sum_t&) const ../src/crimson/osd/pg_backend.cc:1220
    #16 0x2c76349 in operator()<PGBackend, ObjectState> ../src/crimson/osd/ops_executer.cc:577
    #17 0x2c76349 in do_const_op<crimson::osd::OpsExecuter::execute_op(OSDOp&)::<lambda(auto:167&, const auto:168&)> > ../src/crimson/osd/ops_executer.cc:449
    #18 0x2e04ce9 in do_read_op<crimson::osd::OpsExecuter::execute_op(OSDOp&)::<lambda(auto:167&, const auto:168&)> > ../src/crimson/osd/ops_executer.h:216
    #19 0x2e04ce9 in crimson::osd::OpsExecuter::execute_op(OSDOp&) ../src/crimson/osd/ops_executer.cc:576
Reactor stalled for 762 ms on shard 0. Backtrace: 0x45d9d 0xda72bd3 0xd801f73 0xd81f6f9 0xd81fb9c 0xd81fe2c 0xd8200f7 0x12b2f 0x7f6dfd33ae85 0x7f6dfd33bb36 0x7f6dfd32e0b5 0x7f6dfd32ff3a 0xd61d0 0x32412 0xbd8a7 0xbd134 0x54178 0xba5b3 0xc014c 0x1881f22 0x188344a 0xe8b439d 0xe8b58f2 0x2521d5a 0x2a2ee12 0x2c76349 0x2e04ce9 0x3c70c55 0x3cb8aa8 0x7f6de558de39
    #20 0x3c70c55 in execute_osd_op ../src/crimson/osd/objclass.cc:35
    #21 0x3cb8aa8 in cls_cxx_map_get_val(void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list*) ../src/crimson/osd/objclass.cc:372
    #22 0x7f6de558de39  (/home/rzarzynski/ceph1/build/lib/libcls_rbd.so.1.0.0+0x28e39)

0x7f6de5176e70 is located 249456 bytes inside of 262144-byte region [0x7f6de513a000,0x7f6de517a000)
allocated by thread T0 here:
    #0 0x7f6dfd3084a7 in aligned_alloc (/lib64/libasan.so.6+0xb54a7)
    #1 0xdd414fc in seastar::thread_context::make_stack(unsigned long) ../src/seastar/src/core/thread.cc:196
    #2 0x7fff3214bc4f  ([stack]+0xa5c4f)
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>