Add hasKeys() to StoreClient #95

Open
wants to merge 8 commits into
from

Conversation

Projects
None yet
2 participants
@rsumbaly
Contributor

rsumbaly commented Sep 15, 2012

No description provided.

@ghost ghost assigned leigao Sep 21, 2012

@leigao

This comment has been minimized.

Show comment Hide comment
@leigao

leigao Sep 21, 2012

Collaborator

Roshan, could you please refactor the hasKeys related code so getAll and hasKeys can share a same code path for configuring nodes, and processing request in both parallel and serial phases.

We've just fixed a bug in getAll where it always hit two nodes when even when required is 1. I see that your code picked up the change. But if we do additional modification to getAll and hasKeys, it will be much easier and less error prone when they share the refactored common code path.

Collaborator

leigao commented Sep 21, 2012

Roshan, could you please refactor the hasKeys related code so getAll and hasKeys can share a same code path for configuring nodes, and processing request in both parallel and serial phases.

We've just fixed a bug in getAll where it always hit two nodes when even when required is 1. I see that your code picked up the change. But if we do additional modification to getAll and hasKeys, it will be much easier and less error prone when they share the refactored common code path.

@@ -103,6 +103,10 @@ public void truncate() {
return slopEngine.getAll(keys, transforms);
}
+ public Map<ByteArray, Boolean> hasKeys(Iterable<ByteArray> keys, boolean exact) {

This comment has been minimized.

Show comment Hide comment
@leigao

leigao Sep 21, 2012

Collaborator

Why do we need implement hasKeys in SlopStorageEngine?

@leigao

leigao Sep 21, 2012

Collaborator

Why do we need implement hasKeys in SlopStorageEngine?

This comment has been minimized.

Show comment Hide comment
@rsumbaly

rsumbaly Sep 22, 2012

Contributor

Because it implements the StorageEngine interface...

@rsumbaly

rsumbaly Sep 22, 2012

Contributor

Because it implements the StorageEngine interface...

+ }
+ }
+
+ if(exact) {

This comment has been minimized.

Show comment Hide comment
@leigao

leigao Sep 21, 2012

Collaborator

So if 'exact' is specified, it will no longer be an index-file-only read, correct? In this case, don't we still need to control the rate at which hasKeys shall be used?

@leigao

leigao Sep 21, 2012

Collaborator

So if 'exact' is specified, it will no longer be an index-file-only read, correct? In this case, don't we still need to control the rate at which hasKeys shall be used?

This comment has been minimized.

Show comment Hide comment
@rsumbaly

rsumbaly Sep 22, 2012

Contributor

So if 'exact' is specified, it will no longer be an index-file-only read, correct? - Yes
In this case, don't we still need to control the rate at which hasKeys shall be used? - define "control". Do you mean throttle? If yes, then "exact" being false is precisely what the user would want to use for quick lookups. Unfortunately the current format of read-only files does not permit us to do exact matches without looking at the data file.

@rsumbaly

rsumbaly Sep 22, 2012

Contributor

So if 'exact' is specified, it will no longer be an index-file-only read, correct? - Yes
In this case, don't we still need to control the rate at which hasKeys shall be used? - define "control". Do you mean throttle? If yes, then "exact" being false is precisely what the user would want to use for quick lookups. Unfortunately the current format of read-only files does not permit us to do exact matches without looking at the data file.

This comment has been minimized.

Show comment Hide comment
@leigao

leigao Sep 23, 2012

Collaborator

On Fri, Sep 21, 2012 at 7:39 PM, Roshan Sumbaly notifications@github.comwrote:

In src/java/voldemort/store/readonly/ReadOnlyStorageEngine.java:

  •        fileModificationLock.readLock().lock();
    
  •        List<KeyValueLocation> keysAndValueLocations = Lists.newArrayList();
    
  •        for(ByteArray key: keys) {
    
  •            int chunk = fileSet.getChunkForKey(key.get());
    
  •            int valueLocation = searchStrategy.indexOf(fileSet.indexFileFor(chunk),
    
  •                                                       fileSet.keyToStorageFormat(key.get()),
    
  •                                                       fileSet.getIndexFileSize(chunk));
    
  •            if(valueLocation >= 0) {
    
  •                if(exact)
    
  •                    keysAndValueLocations.add(new KeyValueLocation(chunk, key, valueLocation));
    
  •                else
    
  •                    results.put(key, true);
    
  •            }
    
  •        }
    
  •        if(exact) {
    

So if 'exact' is specified, it will no longer be an index-file-only read,
correct? - Yes
In this case, don't we still need to control the rate at which hasKeys
shall be used? - define "control". Do you mean throttle? If yes, then
"exact" being false is precisely what the user would want to use for quick
lookups. Unfortunately the current format of read-only files does not
permit us to do exact matches without looking at the data file.

If hasKeys access data files, it will introduce the same amount of load to
servers as getAll. Won't it defeat the purpose of this function? There will
be saving on network and serialization, but those costs are insignificant
compare to the server cost.

All I am saying is that hasKeys for RO is not as cheap as we thought. Use
it with care!

Thanks,

Lei


Reply to this email directly or view it on GitHubhttps://github.com/voldemort/voldemort/pull/95/files#r1667218.

Lei Gao

@leigao

leigao Sep 23, 2012

Collaborator

On Fri, Sep 21, 2012 at 7:39 PM, Roshan Sumbaly notifications@github.comwrote:

In src/java/voldemort/store/readonly/ReadOnlyStorageEngine.java:

  •        fileModificationLock.readLock().lock();
    
  •        List<KeyValueLocation> keysAndValueLocations = Lists.newArrayList();
    
  •        for(ByteArray key: keys) {
    
  •            int chunk = fileSet.getChunkForKey(key.get());
    
  •            int valueLocation = searchStrategy.indexOf(fileSet.indexFileFor(chunk),
    
  •                                                       fileSet.keyToStorageFormat(key.get()),
    
  •                                                       fileSet.getIndexFileSize(chunk));
    
  •            if(valueLocation >= 0) {
    
  •                if(exact)
    
  •                    keysAndValueLocations.add(new KeyValueLocation(chunk, key, valueLocation));
    
  •                else
    
  •                    results.put(key, true);
    
  •            }
    
  •        }
    
  •        if(exact) {
    

So if 'exact' is specified, it will no longer be an index-file-only read,
correct? - Yes
In this case, don't we still need to control the rate at which hasKeys
shall be used? - define "control". Do you mean throttle? If yes, then
"exact" being false is precisely what the user would want to use for quick
lookups. Unfortunately the current format of read-only files does not
permit us to do exact matches without looking at the data file.

If hasKeys access data files, it will introduce the same amount of load to
servers as getAll. Won't it defeat the purpose of this function? There will
be saving on network and serialization, but those costs are insignificant
compare to the server cost.

All I am saying is that hasKeys for RO is not as cheap as we thought. Use
it with care!

Thanks,

Lei


Reply to this email directly or view it on GitHubhttps://github.com/voldemort/voldemort/pull/95/files#r1667218.

Lei Gao

This comment has been minimized.

Show comment Hide comment
@rsumbaly

rsumbaly Sep 23, 2012

Contributor

If hasKeys access data files, it will introduce the same amount of load to
servers as getAll. Won't it defeat the purpose of this function? - Which is why there is a "exact" flag. I anyways introduced a new function that reads only the key now in the data file. Though performance-wise I don't think that'll be a big deal unless the value size is really huge.

@rsumbaly

rsumbaly Sep 23, 2012

Contributor

If hasKeys access data files, it will introduce the same amount of load to
servers as getAll. Won't it defeat the purpose of this function? - Which is why there is a "exact" flag. I anyways introduced a new function that reads only the key now in the data file. Though performance-wise I don't think that'll be a big deal unless the value size is really huge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment