
Grakn Crash: Inserting two things and matching one #6304

Closed
thomaschristopherking opened this issue Apr 27, 2021 · 4 comments
thomaschristopherking commented Apr 27, 2021

Description

On an empty database, executing separate insert, match, and insert queries causes Grakn to use substantial CPU resources. A few such transactions in succession easily lead to a situation where all resources are exhausted and the AWS EC2 instance goes down, and/or Grakn never seems to release the resources.

Environment

  1. OS: Ubuntu 20.04 LTS
  2. Grakn 2.0.2 Core
  3. Grakn client: Python 2.0.1
  4. Hosted on an AWS EC2 instance (t3.medium)

Reproducible Steps

Steps to create the smallest reproducible scenario:

  1. Load the following schema (a sketch of one way to load it with the Python client follows the schema below):

     identifier sub attribute, value string;
     entity_with_id sub entity, abstract, owns identifier @key,
      plays equal:object, plays provenance:owner;
     relation_with_id sub relation, abstract, owns identifier @key;
     equal sub relation, relates object;
     
     version sub attribute, value long;
     schema_metadata sub entity, owns version;
     
     phone_number sub attribute, value string;
     branch_name sub attribute, value string;
     property_type sub attribute, value string, regex "^(Apartment|Condominium|House|Room in Shared Living)$";
     
     portfolio sub relation_with_id,
      relates landlord,
      relates landlord_signing_authority,
      relates property,
      owns phone_number,
      owns branch_name,
      owns property_type,
      owns longitude,
      owns latitude;
     
     building plays portfolio:property;
     unit plays portfolio:property;
     
     is_located_at sub relation_with_id,
      relates location,
      relates thing_located;
     
     building_name sub attribute, value string;
     address_line_1 sub attribute, value string;
     address_line_2 sub attribute, value string;
     city sub attribute, value string;
     state sub attribute, value string;
     country sub attribute, value string;
     postal_code sub attribute, value string;
     latitude sub attribute, value double;
     longitude sub attribute, value double;
     original_string sub attribute, value string;
     
     address sub entity_with_id,
      plays is_located_at:location,
      owns building_name,
      owns address_line_1,
      owns address_line_2,
      owns city,
      owns state,
      owns country,
      owns postal_code,
      owns latitude,
      owns longitude,
      owns original_string;
     
     
     building sub entity_with_id,
      plays is_located_at:location,
      plays is_located_at:thing_located;
     
     unit_number sub attribute, value string;
     number_of_bedrooms sub attribute, value long;
     number_of_bathrooms sub attribute, value long;
     has_flex_wall sub attribute, value boolean;
     
     unit sub entity_with_id, abstract,
      owns unit_number,
      owns number_of_bathrooms,
      plays is_located_at:location,
      plays is_located_at:thing_located;
     
     apartment sub unit,
      owns number_of_bedrooms;
     
     provenance sub relation, relates object, relates owner, relates source;
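     As a sketch of one way to load this schema: it assumes grakn-client 2.0.x, that the knowledge-graph database already exists, and that the schema above is saved to a file; the file name schema.tql is illustrative, and the define keyword is prepended because it is omitted above.

         # Hypothetical schema loader; grakn-client 2.0.x import path assumed.
         from grakn.client import Grakn, SessionType, TransactionType

         with Grakn.core_client('<enter your db address>') as client:
             with client.session("knowledge-graph", SessionType.SCHEMA) as session:
                 with session.transaction(TransactionType.WRITE) as tx:
                     # "schema.tql" is an assumed file holding the schema text above;
                     # prepend "define" since the pasted schema omits the keyword.
                     tx.query().define("define " + open("schema.tql").read())
                     tx.commit()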
    
  2. Execute the following Python code:

     # NOTE: import path assumed for the Grakn Python client 2.0.x used in this report
     from grakn.client import Grakn, SessionType, TransactionType

     def crash_db():
         with Grakn.core_client('<enter your db address>') as client:
             with client.session("knowledge-graph", SessionType.DATA) as session:
                 with session.transaction(transaction_type=TransactionType.WRITE) as transaction:

                     # Insert the apartment on its own.
                     transaction.query().insert(
                         "insert $apartment isa apartment, has unit_number \"1307\", has identifier \"a74a9eac-3012-46aa-8ac3-1026e46e0402\", has number_of_bathrooms 1, has number_of_bedrooms 1;")

                     # The long match query that triggers the heavy CPU usage.
                     transaction.query().match(
                         "match $unit_loc(location: $building, thing_located: $apartment) isa is_located_at; $building_loc(location: $address, thing_located: $building) isa is_located_at;$portfolio(property: $unit) isa portfolio; $apartment isa apartment, has unit_number \"1307\", has number_of_bathrooms 1, has number_of_bedrooms 1, has identifier $unit_identifier; $building isa building, has identifier \"62a7ff50-d529-40d9-bae1-f5fda15d5dd4\";$address isa address, has address_line_1 \"1307 Calibre Creek Pkwy\", has city \"Roswell\", has country \"United States\", has postal_code \"30076\", has latitude 34.0157616, has longitude -84.30288139999999, has building_name \"Park 83\", has address_line_2 \"\", has state \"Georgia\", has identifier \"1dd59eff-ac79-4439-a8bc-af9eee7d66f9\"; get $apartment, $unit_identifier, $building, $address, $portfolio;sort $unit_identifier; offset 0; limit 10;")

                     # Insert an is_located_at relation between two existing things.
                     transaction.query().insert(
                         "match $thing_located0 isa thing, has identifier \"62a7ff50-d529-40d9-bae1-f5fda15d5dd4\"; $location0 isa thing, has identifier \"1dd59eff-ac79-4439-a8bc-af9eee7d66f9\"; insert $is_located_at (thing_located: $thing_located0, location: $location0) isa is_located_at, has identifier \"f21e0eab-393a-47e2-85da-452d5a035b16\";")

                     transaction.commit()

Expected Output

The database does not consume excessive CPU or memory.

Actual Output

CPU usage: 126%, memory: 65.4% (2.6 GB), on a 2-core t3.medium EC2 instance.

Additional Information

This is on a database with no data to start with.

flyingsilverfin (Member) commented:

Ok, this issue is due to the long match query in the middle. What's occurring is that the static query type checker currently uses a suboptimal algorithm for type inference. We have an issue to address this: #6194

thomaschristopherking (Author) commented:

Thanks. Is that issue a large problem to solve, or is there a workaround that can be put in place? For some context, although the match might seem long, it's actually a very common query that we execute on our other graphs (Neo4j and NetworkX).

flyingsilverfin (Member) commented Apr 29, 2021

Had a quick look. I don't think the algorithm itself would take more than a day, but we have to do some shifting of the architecture to accommodate an elegant implementation, which could take some time. After a quick chat with Haikal, we think it'll probably get done in the next month or so.

The short-term workaround is ugly as heck: split your query into two parts (for this one, you only have to knock off a couple of the attributes at the end), then pipe the IIDs you get from the first query into the second and filter out the ones that don't satisfy the second query, as sketched below.
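
For reference, here is a minimal sketch of that split-query workaround, assuming the grakn-client 2.0.x Python API used in the report (Grakn.core_client, SessionType, TransactionType), plus ConceptMap.get() and Thing.get_iid() for reading IIDs out of answers; the shortened queries, attribute choices, and function name are illustrative only:

    from grakn.client import Grakn, SessionType, TransactionType

    # First part of the split: the original match minus the last few address attributes.
    FIRST_PART = (
        'match $building_loc(location: $address, thing_located: $building) isa is_located_at; '
        '$building isa building, has identifier "62a7ff50-d529-40d9-bae1-f5fda15d5dd4"; '
        '$address isa address, has postal_code "30076", has city "Roswell"; '
        'get $address;'
    )

    def split_query_workaround(db_address):
        with Grakn.core_client(db_address) as client:
            with client.session("knowledge-graph", SessionType.DATA) as session:
                with session.transaction(TransactionType.READ) as transaction:
                    # 1. Run the shorter first query and collect the IIDs it returns.
                    address_iids = {
                        answer.get("address").get_iid()
                        for answer in transaction.query().match(FIRST_PART)
                    }
                    # 2. Pipe each IID into a second query that carries the constraints
                    #    knocked off the first one, keeping only the IIDs that still match.
                    surviving = set()
                    for iid in address_iids:
                        second_part = (
                            f"match $address iid {iid}; "
                            '$address has latitude 34.0157616, has longitude -84.30288139999999, '
                            'has building_name "Park 83"; '
                            'get $address;'
                        )
                        if any(True for _ in transaction.query().match(second_part)):
                            surviving.add(iid)
                    return surviving

Issuing one small match per IID keeps each individual query short for the type checker; the trade-off is more round trips to the server.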

Agree that this is a priority (the label on the other issue is blocker), as people do write long queries, and type inference is also used to validate rules, which can get large, so the validation takes ages!

flyingsilverfin (Member) commented:

This was closed by #6431, which also closed #6194.
