Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntegrityError with unique field and get_or_create #140

Closed
LuckCky opened this issue Jun 13, 2019 · 7 comments · Fixed by #302
Closed

IntegrityError with unique field and get_or_create #140

LuckCky opened this issue Jun 13, 2019 · 7 comments · Fixed by #302
Labels
bug Something isn't working

Comments

@LuckCky
Copy link

LuckCky commented Jun 13, 2019

Describe the bug
Hi!
First of all thank you for developing such useful ORM.
I am not sure that this is really bug, maybe I am doing something wrong.
I get IntegrityError with fields having unique=True and using get_or_create.

To Reproduce
Models.py is:

class CheckDates(Model):
    id = fields.IntField(pk=True)
    check_date = fields.DateField(unique=True)

    def __str__(self):
        return self.check_date.strftime('%Y-%m-%d')

class Projects(Model):
    id = fields.IntField(pk=True)
    name = fields.CharField(max_length=255, unique=True)

    def __str__(self):
        return self.name

class Results(Model):
    id = fields.IntField(pk=True)
    date = fields.ForeignKeyField('models.CheckDates', related_name='results')
    project = fields.ForeignKeyField('models.Projects', related_name='results')
    suspicious_url = fields.CharField(max_length=255)

    class Meta:
        unique_together = ('date', 'project', 'suspicious_url')

    def __str__(self):
        return self.suspicious_url

I have file for generating schemas:

from tortoise import Tortoise, run_async


async def init(generate=False):
    await Tortoise.init(
        db_url='sqlite://db.sqlite3',
        modules={'models': ['AntiDoorway.models']}
    )
    if generate:
        await generate_schemas()

async def generate_schemas():
    await Tortoise.generate_schemas()

async def close_conns():
    await Tortoise.close_connections()

if __name__ == '__main__':
    run_async(init(generate=True))

And file with main pipeline (some code is excluded as irrelevant):

def check_by_visiting(urls_dict: dict, project_name: str, check_date: date, counter_id: int) -> None:
    tasks = []
    loop = asyncio.get_event_loop()
    loop.run_until_complete(init())
    for url in urls_dict.keys():
        for platform in platforms:
            tasks.append(loop.create_task(main(url, project_name, check_date, counter_id, platform)))
    loop.run_until_complete(asyncio.wait(tasks))
    loop.run_until_complete(close_conns())
    loop.close()

async def main(url: str, project_name: str, check_date: date, counter_id: int, platform: str) -> None:
    result = []
    try:
        date_id = await CheckDates.get_or_create(check_date=check_date)
    except exceptions.IntegrityError:
        date_id = await CheckDates.filter(check_date=check_date)
    date_id = date_id[0]
    project_id = await Projects.get_or_create(name=project_name)
    project_id = project_id[0]

    params = await form_request_params(platform)
    formed_url = await form_url(url)
    async with aiohttp.ClientSession() as session:
        try:
            response = await fetch_data(formed_url, params, session)
        except Exception as e:
            response = ''
    result.append(await check_counter_presence(response, counter_id))
    result.append(await return_reload_in_header(response))
    if any(result):
        try:
            await Results.get_or_create(date=date_id, project=project_id, suspicious_url=url)
        except (sqlite3.IntegrityError, exceptions.IntegrityError):
            pass

I get following errors when I run script with empty DB:
sqlite3.IntegrityError: UNIQUE constraint failed: checkdates.check_date
and
sqlite3.IntegrityError: UNIQUE constraint failed: results.date_id, results.project_id, results.suspicious_url
but all data is saved (well, I hope it is).
And no errors if date is already in DB.

Expected behavior
No error messages.

Additional context
So this might be a bug if I'm doing it right.

@abondar
Copy link
Member

abondar commented Jun 13, 2019

Hi

I tried running your example (though I had to modify it, because it was out of context) and could not reproduce it.

Could you give me some more details on problem (like stacktrace of error) and some info about how main() function is being called.

Would be best if you could concatenate all your code in one file (like files in examples), so I could reproduce error exactly as you have it.

@LuckCky
Copy link
Author

LuckCky commented Jun 13, 2019

Please find files attached.
I run code with Python 3.7.2.
When you run it first time it throws an error, but saves all records to DB.
If you comment out try-except leaving only one get_or_create, you'll get an error (if run for the first time), but not all records get saved in DB.
When you run this code for the second time, no error are thrown.

app.zip

@abondar
Copy link
Member

abondar commented Jun 14, 2019

Okay, so I see the problem now.

Issue there that get_or_create method is not atomic and is actually two different calls, one of which is select, and second one is create.
Starting two get_or_create concurrently results in following error.

Even though I wouldn't say it is bug, but it is certainly undesired behaviour.

Will fix it now

@LuckCky
Copy link
Author

LuckCky commented Jun 14, 2019

Thank you!

@abondar
Copy link
Member

abondar commented Jun 14, 2019

Seems like it will take more time than I thought because I encountered concurrency issues on other db backends apart from sqlite
I hope I will overcome them relatively soon.

Apart from the fix from our side - your current solution with handling IntegrityError should work fine, only problem with it is that you should have such handler on each get_or_create, which is inconviniet

@grigi grigi added the bug Something isn't working label Jun 17, 2019
@grigi grigi mentioned this issue Jul 10, 2019
72 tasks
@grigi
Copy link
Member

grigi commented Nov 11, 2019

@abondar The base framework is in place, so we could check and/or re-implement the fix?
Another solution would be to move more logic into the executor layer?

@mmkhitaryan
Copy link

Did you use naive or timezone aware dates? This might be the MRE #1556

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
4 participants