
Grakn Crash: Inserting two things and matching one #6304

Closed
thomaschristopherking opened this issue Apr 27, 2021 · 4 comments
thomaschristopherking commented Apr 27, 2021

Description

On an empty database, executing separate insert, match, and insert queries causes Grakn to use substantial CPU resources. A few such transactions in succession easily lead to a situation where all resources are exhausted and the AWS EC2 instance goes down, and/or Grakn never seems to release the resources.

Environment

  1. OS: Ubuntu 20.04 LTS
  2. Grakn 2.0.2 Core
  3. Grakn client: Python 2.0.1
  4. Hosted on an AWS EC2 instance (t3.medium)

Reproducible Steps

Steps to create the smallest reproducible scenario:

  1. Load the following schema (a sketch of one way to load it with the Python client follows the schema below):

     identifier sub attribute, value string;
     entity_with_id sub entity, abstract, owns identifier @key,
      plays equal:object, plays provenance:owner;
     relation_with_id sub relation, abstract, owns identifier @key;
     equal sub relation, relates object;
     
     version sub attribute, value long;
     schema_metadata sub entity, owns version;
     
     phone_number sub attribute, value string;
     branch_name sub attribute, value string;
     property_type sub attribute, value string, regex "^(Apartment|Condominium|House|Room in Shared Living)$";
     
     portfolio sub relation_with_id,
      relates landlord,
      relates landlord_signing_authority,
      relates property,
      owns phone_number,
      owns branch_name,
      owns property_type,
      owns longitude,
      owns latitude;
     
     building plays portfolio:property;
     unit plays portfolio:property;
     
     is_located_at sub relation_with_id,
      relates location,
      relates thing_located;
     
     building_name sub attribute, value string;
     address_line_1 sub attribute, value string;
     address_line_2 sub attribute, value string;
     city sub attribute, value string;
     state sub attribute, value string;
     country sub attribute, value string;
     postal_code sub attribute, value string;
     latitude sub attribute, value double;
     longitude sub attribute, value double;
     original_string sub attribute, value string;
     
     address sub entity_with_id,
      plays is_located_at:location,
      owns building_name,
      owns address_line_1,
      owns address_line_2,
      owns city,
      owns state,
      owns country,
      owns postal_code,
      owns latitude,
      owns longitude,
      owns original_string;
     
     
     building sub entity_with_id,
      plays is_located_at:location,
      plays is_located_at:thing_located;
     
     unit_number sub attribute, value string;
     number_of_bedrooms sub attribute, value long;
     number_of_bathrooms sub attribute, value long;
     has_flex_wall sub attribute, value boolean;
     
     unit sub entity_with_id, abstract,
      owns unit_number,
      owns number_of_bathrooms,
      plays is_located_at:location,
      plays is_located_at:thing_located;
     
     apartment sub unit,
      owns number_of_bedrooms;
     
     provenance sub relation, relates object, relates owner, relates source;
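     As a sketch of one way to load this schema: it assumes grakn-client 2.0.x, that the knowledge-graph database already exists, and that the schema above is saved to a file; the file name schema.tql is illustrative, and the define keyword is prepended because it is omitted above.

         # Hypothetical schema loader; grakn-client 2.0.x import path assumed.
         from grakn.client import Grakn, SessionType, TransactionType

         with Grakn.core_client('<enter your db address>') as client:
             with client.session("knowledge-graph", SessionType.SCHEMA) as session:
                 with session.transaction(TransactionType.WRITE) as tx:
                     # "schema.tql" is an assumed file holding the schema text above;
                     # prepend "define" since the pasted schema omits the keyword.
                     tx.query().define("define " + open("schema.tql").read())
                     tx.commit()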
    
  2. Execute the following Python code:

     # NOTE: import path assumed for the Grakn Python client 2.0.x used in this report
     from grakn.client import Grakn, SessionType, TransactionType

     def crash_db():
         with Grakn.core_client('<enter your db address>') as client:
             with client.session("knowledge-graph", SessionType.DATA) as session:
                 with session.transaction(transaction_type=TransactionType.WRITE) as transaction:

                     # Insert the apartment on its own.
                     transaction.query().insert(
                         "insert $apartment isa apartment, has unit_number \"1307\", has identifier \"a74a9eac-3012-46aa-8ac3-1026e46e0402\", has number_of_bathrooms 1, has number_of_bedrooms 1;")

                     # The long match query that triggers the heavy CPU usage.
                     transaction.query().match(
                         "match $unit_loc(location: $building, thing_located: $apartment) isa is_located_at; $building_loc(location: $address, thing_located: $building) isa is_located_at;$portfolio(property: $unit) isa portfolio; $apartment isa apartment, has unit_number \"1307\", has number_of_bathrooms 1, has number_of_bedrooms 1, has identifier $unit_identifier; $building isa building, has identifier \"62a7ff50-d529-40d9-bae1-f5fda15d5dd4\";$address isa address, has address_line_1 \"1307 Calibre Creek Pkwy\", has city \"Roswell\", has country \"United States\", has postal_code \"30076\", has latitude 34.0157616, has longitude -84.30288139999999, has building_name \"Park 83\", has address_line_2 \"\", has state \"Georgia\", has identifier \"1dd59eff-ac79-4439-a8bc-af9eee7d66f9\"; get $apartment, $unit_identifier, $building, $address, $portfolio;sort $unit_identifier; offset 0; limit 10;")

                     # Insert an is_located_at relation between two existing things.
                     transaction.query().insert(
                         "match $thing_located0 isa thing, has identifier \"62a7ff50-d529-40d9-bae1-f5fda15d5dd4\"; $location0 isa thing, has identifier \"1dd59eff-ac79-4439-a8bc-af9eee7d66f9\"; insert $is_located_at (thing_located: $thing_located0, location: $location0) isa is_located_at, has identifier \"f21e0eab-393a-47e2-85da-452d5a035b16\";")

                     transaction.commit()

Expected Output

The database does not consume excessive CPU or memory.

Actual Output

CPU usage: 126%, memory: 65.4% (2.6 GB), on a 2-core t3.medium EC2 instance.

Additional Information

This is on a database with no data to start with.

flyingsilverfin (Member) commented:

Ok, this issue is due to the long match query in the middle. What's occurring is that the static query type checker currently uses a suboptimal algorithm for type inference. We have an issue to address this: #6194

thomaschristopherking (Author) commented:

Thanks. Is that issue a large problem to solve, or is there a workaround that can be put in place? For some context, although the match might seem long, it's actually a very common query that we execute on our other graphs (Neo4j and NetworkX).

flyingsilverfin (Member) commented Apr 29, 2021

Had a quick look. I don't think the algorithm itself would take more than a day, but we have to do some shifting of the architecture to accommodate an elegant implementation, which could take some time. After a quick chat with Haikal, we think it'll probably get done in the next month or so.

The short-term workaround is ugly as heck: split your query into two parts (for this one, you only have to knock off a couple of the attributes at the end), then pipe the IIDs you get from the first query into the second and filter out the ones that don't satisfy the second query, as sketched below.
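
For reference, here is a minimal sketch of that split-query workaround, assuming the grakn-client 2.0.x Python API used in the report (Grakn.core_client, SessionType, TransactionType), plus ConceptMap.get() and Thing.get_iid() for reading IIDs out of answers; the shortened queries, attribute choices, and function name are illustrative only:

    from grakn.client import Grakn, SessionType, TransactionType

    # First part of the split: the original match minus the last few address attributes.
    FIRST_PART = (
        'match $building_loc(location: $address, thing_located: $building) isa is_located_at; '
        '$building isa building, has identifier "62a7ff50-d529-40d9-bae1-f5fda15d5dd4"; '
        '$address isa address, has postal_code "30076", has city "Roswell"; '
        'get $address;'
    )

    def split_query_workaround(db_address):
        with Grakn.core_client(db_address) as client:
            with client.session("knowledge-graph", SessionType.DATA) as session:
                with session.transaction(TransactionType.READ) as transaction:
                    # 1. Run the shorter first query and collect the IIDs it returns.
                    address_iids = {
                        answer.get("address").get_iid()
                        for answer in transaction.query().match(FIRST_PART)
                    }
                    # 2. Pipe each IID into a second query that carries the constraints
                    #    knocked off the first one, keeping only the IIDs that still match.
                    surviving = set()
                    for iid in address_iids:
                        second_part = (
                            f"match $address iid {iid}; "
                            '$address has latitude 34.0157616, has longitude -84.30288139999999, '
                            'has building_name "Park 83"; '
                            'get $address;'
                        )
                        if any(True for _ in transaction.query().match(second_part)):
                            surviving.add(iid)
                    return surviving

Issuing one small match per IID keeps each individual query short for the type checker; the trade-off is more round trips to the server.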

Agree that this is a priority (the label on the other issue is blocker), as people do write long queries, and type inference is also used to validate rules, which can get large, so the validation takes ages!

flyingsilverfin (Member) commented:

This was closed by #6431, which also closed #6194.
