Skip to content
This repository

Delete Node from Index is not possible. #36

SafwanHak opened this Issue · 19 comments

4 participants

SafwanHak Aseem Kishore Mat Tyndall pvencill


When deleting a node, the index is not cleared automatically. We found during our testing that Neo4J will not delete the node from the index and if we attempt to create (during testing) a node with the same ID, it'll give an error. While in testing it's not important, in production the index will continue to grow even if we delete nodes. so we might have 1 million node in the database but 2 million in the index (assuming we deleted a million of course).

Would it be possible to add 2 functions;

1- One to remove a node/relationship from Index.
2- Another to delete an index completely.

Any and All help would be much appreciated.


Aseem Kishore
aseemk commented

Sorry for the delay in responding!

Wow, that's surprising. You should ask the Neo4j guys about it -- seems like a (logic/usability) bug on Neo4j's side if a node doesn't get removed from the index as well.

Until then, yes it'd be good to add methods to remove from an index and delete indexes.

For removing from an index, it should be straightforward to add an unindex() method to the Node class alongside index(); want to give it a shot?

For deleting indexes, it's the same as the GraphDatabase class's queryNodeIndex() except making a DELETE request instead of a GET request:

If you're able to give these a shot (and some quick/basic unit tests would be appreciated!), a pull request would be welcome. =) Thanks!

Mat Tyndall

I spent some time working on this last night, unfortunately I don't know coffeescript that well. How do I get the node id inside the function?

Also looked into creating and deleting indexes, that way I wouldn't have to create nodes to initialize indexes.

Aseem Kishore

Hey @flipside, awesome to hear. Do you mean inside a Node method? If so, @id will do it — that's CoffeeScript for Looking forward to seeing your changes! =)

Mat Tyndall

Thanks @aseemk. The other thing I need to figure out is that the key and value are optional parameters for un-indexing a node.
unindex: (index, _) ->
unindex: (index, key, _) ->
unindex: (index, key, value, _) ->

Aseem Kishore

Ah yes, that's unfortunately one thing that's tricky to do with Streamline, since you can't re-assign the _ variable. The solution is to simply wrap the method in a non-Streamline function that normalizes the arguments.

Take a look at GraphDatabase::execute() for an example!

The actual Streamline method that expects all arguments:

The non-Streamline wrapper that accepts whatever arguments and normalizes them:

Does that make sense?

Mat Tyndall

That looks like fun, I'll give it a shot once I get the basics working.

Mat Tyndall

got unindex working, overloaded and everything =)
gonna work on the other ones while i still have coffeescript on the brain.

Aseem Kishore

Sweet! Feel free to share a link if you want a quick review.

Aseem Kishore

Looks good!

Mat Tyndall

Just added get/create/delete for node indexes and it's working just fine.

Mat Tyndall

Was just taking a look at the api and I found this:

Particularly: 18.10.3. Add a node to an index unless a node already exists for the given mapping

Seems all I'd need to do would be to add a "?unique" to the end, so the questions are:

  1. Should unique be a paramater for index? node.index(index, key, value, unique, callback)
  2. Should unique be optional and default to true?

What do you think, @aseemk ?

Aseem Kishore

Hey @flipside, your code looks pretty good. I have a few comments, so why don't you go ahead open a pull request and we can move the conversation over there?

Btw, before you do, it'd be good for you to get the latest changes from this repo and also move your changes to a feature branch so you can do other work too without mixing things up. Try this:

# make sure you have no unstaged changes:
git status

# if you do:
git stash

# move these changes to a feature branch, e.g.:
git checkout -b feature/indexing-enhancements

# publish this branch to your repo (assuming the remote is origin):
git push -u origin feature/indexing-enhancements

# go back to develop so we can pull the latest changes:
git checkout develop

# if you don't already have this upstream repo tracked:
git remote add upstream git://

# pull in the latest changes to your development branch:
git fetch upstream
git reset --hard upstream/develop

# and now rebase your changes to be on top of these changes:
git checkout feature/indexing-enhancements
git rebase develop
git push origin feature/indexing-enhancements

# optionally update your own github repo's develop branch to this too:
git checkout
git push origin -f

I should capture some of these pointers into a file like GitHub uses now. =)

Re: unique indexing, check out issue #14 -- it's definitely something people want, so it would indeed be good to have. But I don't think these methods are the right place to do it, because they're Node instance methods -- when you instead probably want to do this as a "get or create" step.

So I was thinking that it might be better to make a GraphDatabase.getOrCreateIndexedNode() method to parallel GraphDatabase.getIndexedNode(). What do you think?

Mat Tyndall

Cool, will try to do a pull request later tonight or tomorrow.

As for the unique indexing, currently if I run the following repeatedly it will create a new entry every time.

post /db/data/index/node/templates {"key":"id","value":33507,"uri":"/db/data/node/33"}
response: created

With unique indexing, if the key/value pair exists for the node, it won't overwrite it.
post /db/data/index/node/templates?unique {"key":"id","value":33507,"uri":"/db/data/node/33"}
response: ok

Anyways, just going to add it in as a new function uniqueIndex for now.

Aseem Kishore

Indeed, the regular behavior feels silly, though it's documented FWIW:

I believe the problem with the second one is that if any other node is indexed under that key and value, this node won't be. Which I think is undesirable and unexpected in some cases, e.g. indexing nodes by type or category.

Aseem Kishore

But I can see your point now that adding/updating an existing node to an index uniquely is different than creating a node uniquely.

But even then, you're stuck in the case of many nodes under a given key/value like I mentioned; you still have to make sure you don't duplicate index entries in that case.

It doesn't feel great to add another method for unique indexing -- it'd be nice to avoid API clutter -- but it doesn't feel great to add another parameter to the already-long index() method either.

If you still think it'd be worth it to have such functionality, okay, let's do it as a method called indexUniquely() for now (instead of uniqueIndex()).

Thanks for the great feedback @flipside!

Aseem Kishore

(I say a separate method for readability, so that we can reflect the difference in the name; boolean parameters hide that. Options hashes are better than both approaches I think; I'd love to move to an options-based API in the next version.)


+1 for the option hash approach; that's more javascripty/node-like anyway.
+10 for adding the ability to do unique nodes and indices.

Aseem Kishore

This is now possible via @flipside's unindex() methods from pull req #55; merged in 45cc950. It'll make the next release very soon...

Aseem Kishore aseemk closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.