Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

batch_create does not report on 'store is read-only' errors #1932

Closed
chris-aeviator opened this issue Apr 25, 2022 · 18 comments
Closed

batch_create does not report on 'store is read-only' errors #1932

chris-aeviator opened this issue Apr 25, 2022 · 18 comments
Labels
autoclosed Closed by the bot. We still want this, but it didn't quite make the latest prioritization round

Comments

@chris-aeviator
Copy link

When using the python client with client.batch and having crossed the DISK_USE_READONLY_PERCENTAGE my ingest fails when using data_object.create but will seemingly succeed when using batches. However the ingest that succeeded using batch did not contain any items when running a Get query though they have been processed by my t2v-transformers container.

There might be other errors that are not covered by batch.create and I only found out through a helpful comment in Weaviate Slack and after running into my issue. This seems to also have been the issue with #1929 .

@ju-bezdek
Copy link

Oh I was just about to file mine version of this issue... :)

So I'll just add that I encountered similar problem, where batch command doesn't report this error either:

Creating object! Unexpected status code: 422, with response body: {'error': [{'message': "invalid object: 'id' is a reserved property name"}]}

I was able to retrieve this error by switching to one-by-one loop via self.client.data_object.create()

but calling this lines of code didn't report any errors, but no data were imported

            results = batch.create_objects()
            if results is not None:
                for result in results:
                    if 'result' in result and 'errors' in result['result'] and  'error' in result['result']['errors']:
                        for message in result['result']['errors']['error']:
                            raise Exception(message['message'])

for reference, it was discussed on slack with @StefanBogdan in this thread: https://weaviate.slack.com/archives/C017EG2SL3H/p1648577835143169

@etiennedi
Copy link
Member

etiennedi commented Apr 27, 2022

Hi @chris-aeviator (cc @StefanBogdan)

I've tried this using curl to make sure this is not broken in Weaviate itself and I can't reproduce the problem. I get the correct error message. Steps to reproduce:

I was running with the contextionary module, but that doesn't matter, should be the same with another or no module:

# Create a class
curl localhost:8080/v1/schema -H 'content-type: application/json' -d '{"class": "Demo", "properties": [{"name": "foo", "dataType":["string"]}]}'
# inspect the shards of the newly created class
curl localhost:8080/v1/schema/Demo/shards
[{"name":"VLDrddZu3xm0","status":"READY"}]
# update the shard as read-only
curl localhost:8080/v1/schema/Demo/shards/VLDrddZu3xm0 -X PUT -d '{"status": "READONLY"}' -H 'content-type: application/json'
{"status":"READONLY"}
# try to import using batch
curl localhost:8080/v1/batch/objects -H 'content-type:application/json' -d '{"objects":[{"class": "Demo", "properties":{"foo": "Hello"}}]}'
[{"additional":{"interpretation":{"source":[{"concept":"demo","occurrence":2498970,"weight":0.2275829017162323},{"concept":"hello","occurrence":6818552,"weight":0.10000000149011612}]}},"class":"Demo","creationTimeUnix":1651062680441,"id":"548dfaa9-be1b-4ad9-8532-d6ec0e9656cb","lastUpdateTimeUnix":1651062680441,"properties":{"foo":"Hello"},"vector":[-0.5832661, ..., -0.47240162],"deprecations":null,"result":{"errors":{"error":[{"message":"store is read-only"}]}}}]

It reports the error correctly

"result":{"errors":{"error":[{"message":"store is read-only"}]}}}

So, I can't reproduce this.

@ju-bezdek
Copy link

Hi,
I tried the similar curl command with my Id problem and got the result back...

so I assumed the bug will be in python client... however since I've updated to v3.4.0 two days ago, so I thought I can give it a try and found out that I couldn't reproduce it anymore either.

So I assume is has been fixed lately?

@etiennedi
Copy link
Member

Oh, thanks for following up @ju-bezdek, I was about to suggest creating a separate issue for your case. But maybe they are the same. Was there a recent fix around this somewhere in the python client, @StefanBogdan?

@ju-bezdek
Copy link

Maybe @chris-aeviator could confirm what client and weaviate version he was on, to confirm this hypothesis too

@chris-aeviator
Copy link
Author

weaviate-client~=3.4.1
Weaviate V1.12.1

@chris-aeviator
Copy link
Author

I just used some old code that was still using python client batch and realized it was not even throwing on any error. My ingest succeeded (empty data) and I was only able to see this error with data_object.create()

Creating object! Unexpected status code: 422, with response body: {'error': [{'message': "invalid object: invalid string property 'onvId' on class 'OnvCompanyIndustry_CCCCCCKKKKKTEST0004': not a string, but json.Number"}]}

@etiennedi
Copy link
Member

Thanks for the update, just to confirm: Is the error message valid?

not a string, but json.Number

Did you indeed try to send a number to a string field? Or do you expect this to be an (unrelated) bug?

@chris-aeviator
Copy link
Author

@etiennedi yes the error was valid, I was indeed trying to set a number as a string.

@StefanBogdan
Copy link
Member

Hi, sorry I missed my tag in here.

The python client does not do any manipulation of the results. I would recommend using a callback function when using auto-batching.

def check_batch_result(results: dict) -> None:
    """
    Check batch results for errors.

    Parameters
    ----------
    results : dict
        The Weaviate batch creation return value.
    """

    if results is not None:
        for result in results:
            if 'result' in result and 'errors' in result['result']:
                if 'error' in result['result']['errors']:
                    for message in result['result']['errors']['error']:
                        print(message['message'], flush=True)

client.batch.configure(
    batch_size=BATCH_SIZE,
    callback=check_batch_result,
    ...
)

@chris-aeviator
Copy link
Author

@StefanBogdan thanks for pointing this out, I will try this approach when I find the time to see if it solves my issue. Wouldnt it make sense to provide an on_error hook alongside the callback on the client level rather than expecting the user to do deeply nested checking for errors?

@ju-bezdek
Copy link

@StefanBogdan You explained to me on slack, that there is no need to use callback if I wan't synch result... that I'll get the result dict as a response/output for create_object if callback is not provided.

so this method of error detection

            results = batch.create_objects()
            if results is not None:
                for result in results:
                    if 'result' in result and 'errors' in result['result'] and  'error' in result['result']['errors']:
                        for message in result['result']['errors']['error']:
                            raise Exception(message['message'])

should get the same result... the only difference is that with callback the calls are asynchronous (which could lead to some performance gains I suppose

Am I right, or I misunderstood you there?

@StefanBogdan
Copy link
Member

@chris-aeviator

Wouldnt it make sense to provide an on_error hook alongside the callback on the client level rather than expecting the user to do deeply nested checking for errors?

This is under development.

@StefanBogdan
Copy link
Member

@ju-bezdek I think you misunderstood. The current version of the python client does not support async functionality. The callback is used in case you have auto-batching. If you use the methods client.batch.create_objects()/client.batch.create_references() and check the returned value, it is going to be the same, i.e.:

# auto-batching (when we set `batch_size` to a non-None value)
client.batch.configure(
  batch_size=100,
  callback=check_batch_result,
)

# is the same as 
client.batch.configure(
  batch_size=None,
)

check_batch_result( 
  client.batch.create_objects() # every time I call this method
)

@ju-bezdek
Copy link

ju-bezdek commented May 5, 2022

Ooooh... I seee,
so when I have batch_size != None, weaviate does auto-batching and batch.create_objects() will/could be empty?

I wonder how I could get errors with my previous implementation when I was checking this again with v3.4.0
....

To double-check and also for other folks that can come across this... this is the right way, right?

       def check_batch_result(results: dict) -> None:
            if results is not None:
                for result in results:
                    if 'result' in result and 'errors' in result['result']:
                        if 'error' in result['result']['errors']:
                            for message in result['result']['errors']['error']:
                                raise Exception(message['message'])


        with self.client.batch(
                batch_size=100, #if this is not None 
                callback=check_batch_result #callback function is needed if I need check for errors
            ) as batch:
            
            for rec in data:
                            
                #add some data
                doc_id = str(uuid4())  if "id" not in rec else rec.pop("id")
                batch.add_data_object(
                    data_object=rec,
                    class_name= proj_cls_name,
                    uuid=doc_id,
                    vector=vector
                )

           i_can_ignore_this_result = batch.create_objects()           
           and_this_as_well  = batch.create_references()

@StefanBogdan
Copy link
Member

Hi @ju-bezdek , you do not need this lines at all:

           i_can_ignore_this_result = batch.create_objects()           
           and_this_as_well  = batch.create_references()

the auto-batching does this for you, you should call the methods batch.create_objects() and batch.create_references() ONLY when batch_size is set to None. With non None value for batch_size you only add objects/references, and a callback to check the batch creation results.

@ju-bezdek
Copy link

Thanks for correction...

May I also suggest maybe to update documentation here?
https://weaviate.io/developers/weaviate/current/restful-api-references/batch.html#error-handling

I suppose it's working example, but obviously it's not the recommended way and it may lead to this kind of issue reports:)

@stale
Copy link

stale bot commented Jul 10, 2022

Thank you for your contribution to Weaviate. This issue has not received any activity in a while and has therefore been marked as stale. Stale issues will eventually be autoclosed. This does not mean that we are ruling out to work on this issue, but it most likely has not been prioritized high enough in the last months.
If you believe that this issue should remain open, please leave a short reply. This lets us know that the issue is not abandoned and acts as a reminder for our team to consider prioritizing this again.
Please also consider if you can make a contribution to help with the solution of this issue. If you are willing to contribute, but don't know where to start, please leave a quick message and we'll try to help you.
Thank you, The Weaviate Team

@stale stale bot added the autoclosed Closed by the bot. We still want this, but it didn't quite make the latest prioritization round label Jul 10, 2022
@stale stale bot closed this as completed Jul 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autoclosed Closed by the bot. We still want this, but it didn't quite make the latest prioritization round
Projects
None yet
Development

No branches or pull requests

4 participants