table.scan() with filter set will always fail in happybase >= 0.7 #54

Closed
ruhe opened this Issue Jan 22, 2014 · 12 comments


3 participants

@ruhe
ruhe commented Jan 22, 2014

Description:
batchSize should not be set on scans that use a filter.

happybase 0.7 introduced a new argument, batchSize, for TScan in Table.scan(). When it is combined with a filter, this parameter causes every scan operation to fail.

happybase always passes batch_size to TScan, even when a filter_string is present.
There is no way to set batch_size to None, since scan() validates the batch_size value:
https://github.com/wbolster/happybase/blob/0.7/happybase/table.py#L259

See corresponding HBase code:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.94.9/org/apache/hadoop/hbase/client/Scan.java?av=f#311
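A simplified model of the failure may make the interaction clearer. This is a hypothetical sketch: build_scan_07, server_accepts and IncompatibleFilterError are illustrative names, not happybase or HBase APIs. The client side always includes a batch size (and rejects None), and the server side applies a rule modelled loosely on the linked Scan.setBatch code, as I read it: batching is refused when the filter needs to see whole rows.

```python
# Hypothetical, simplified model of the failure described above.
# None of these names exist in happybase or HBase.


class IncompatibleFilterError(Exception):
    """Stand-in for HBase's IncompatibleFilterException."""


def build_scan_07(filter_string=None, batch_size=1000):
    """Model the 0.7 client: batch size is validated and always sent."""
    # A positive integer is required, so None is rejected here.
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("'batch_size' must be >= 1")
    scan = {"batchSize": batch_size}
    if filter_string is not None:
        scan["filterString"] = filter_string
    return scan


def server_accepts(scan, filter_needs_whole_rows=True):
    """Model the server rule: no batching with a whole-row filter.

    Batching can split one row into several partial results, which a
    filter that must see the complete row cannot work with.
    """
    if "filterString" in scan and filter_needs_whole_rows and "batchSize" in scan:
        raise IncompatibleFilterError(
            "Cannot set batch on a scan using a filter "
            "that returns true for filter.hasFilterRow")
    return True


if __name__ == "__main__":
    scan = build_scan_07("SingleColumnValueFilter ('f', 'qual1', =, 'binary:val1')")
    try:
        server_accepts(scan)
    except IncompatibleFilterError as exc:
        print("scan rejected: %s" % exc)
```

In this model there is no combination of arguments that lets a filtered scan through, which matches the report: the only escape would be to omit batchSize, and the client-side validation forbids that.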

Steps to reproduce:

import happybase

# Assumes an HBase Thrift server listening on localhost:9090.
conn = happybase.Connection(host='localhost', port=9090)
conn.create_table('project', {'f': dict()})
table = conn.table('project')

table.put('row1', {'f:qual1': 'val1'})
table.put('row2', {'f:qual1': 'val2'})
table.put('row3', {'f:qual1': 'val1'})

# With happybase 0.7 this scan always fails, because a batch size is sent
# together with the filter.
for k, v in table.scan(filter="SingleColumnValueFilter ('f', 'qual1', =, 'binary:val1')"):
    print v
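The filter argument above is a string in HBase's Thrift filter language. A small hypothetical helper (quote and single_column_value_filter are not part of happybase) can build such strings; it assumes the language's rule that a single quote inside a quoted value is escaped by doubling it.

```python
# Hypothetical helpers (not part of happybase) for building the filter
# string used in the reproduction above.


def quote(value):
    """Quote a value for the HBase filter language (' is escaped as '')."""
    return "'" + value.replace("'", "''") + "'"


def single_column_value_filter(family, qualifier, op, value):
    """Build a SingleColumnValueFilter expression with a binary comparator."""
    return "SingleColumnValueFilter (%s, %s, %s, %s)" % (
        quote(family), quote(qualifier), op, quote("binary:" + value))


if __name__ == "__main__":
    # Prints: SingleColumnValueFilter ('f', 'qual1', =, 'binary:val1')
    print(single_column_value_filter('f', 'qual1', '=', 'val1'))
```

The generated string is identical to the one in the reproduction, so the helper only changes how the filter is written, not the failure.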
@wbolster
Owner

Would a simple removal of the batch size argument in case a filter argument was supplied introduce other issues?

@wbolster wbolster added a commit that referenced this issue Jan 25, 2014
@wbolster Allow batch_size=None in Table.scan() to avoid filter incompatibilities
Allow None as a valid value for the batch_size argument to Table.scan(),
since HBase does not support specifying a batch size when some scanner
filters are used.

Fixes issue #54.
8481d31
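The idea behind 8481d31, as described in its commit message, can be sketched as follows. This is a hypothetical reconstruction (build_scan_args is an illustrative name, not the actual happybase code): None becomes a valid batch_size and simply means that no batch size is sent.

```python
# Hypothetical sketch of "allow batch_size=None"; not the actual commit.


def build_scan_args(filter_string=None, batch_size=1000):
    """Return the fields a scan request would carry.

    batch_size=None means "send no batch size at all", so a filter can be
    used without triggering the server-side incompatibility.
    """
    if batch_size is not None:
        if not isinstance(batch_size, int) or batch_size < 1:
            raise ValueError("'batch_size' must be None or >= 1")
    scan = {}
    if batch_size is not None:
        scan["batchSize"] = batch_size
    if filter_string is not None:
        scan["filterString"] = filter_string
    return scan


if __name__ == "__main__":
    print(build_scan_args(
        filter_string="SingleColumnValueFilter ('f', 'qual1', =, 'binary:val1')",
        batch_size=None))
```

The caller has to remember to pass batch_size=None whenever a filter is used; the default still sends a batch size.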
@wbolster
Owner

@ruhe Please test these changes and let me know whether this is a good enough fix. Thanks.

@wbolster
Owner

@ruhe I'm not completely sure how the batchSize argument is actually used when using Thrift. If you have suggestions that improve the current implementation, please share.

@ruhe
ruhe commented Jan 28, 2014

@wbolster thanks a lot for fixing it so fast!
Your fix is what I had in mind. My only concern is the default value of 'how_many', but I need to dig deeper into the HBase code before I can give more detailed comments.

I'll definitely test this fix, stay tuned :)

@wbolster
Owner
wbolster commented Feb 1, 2014

It's a bit more complicated; see issue #56.

@openstack-gerrit openstack-gerrit added a commit to openstack/openstack that referenced this issue Feb 12, 2014
@openstack-gerrit Jenkins + openstack-gerrit Updated openstack/openstack
Project: openstack/requirements  051bd0cca12c57f2fd016f56db2be724c30499f9
Fix happybase version

Since version 0.7, happybase contains a bug (wbolster/happybase#54)
that makes scanning HBase tables with filters impossible. Version 0.6 works fine in this scenario.

Change-Id: I33bad6447f6bc1241f3168a3df14e6f5bf028f5b
3bcd9c7
@openstack-gerrit openstack-gerrit pushed a commit to openstack/requirements that referenced this issue Feb 12, 2014
@nshakhat nshakhat Fix happybase version
Since version 0.7, happybase contains a bug (wbolster/happybase#54)
that makes scanning HBase tables with filters impossible. Version 0.6 works fine in this scenario.

Change-Id: I33bad6447f6bc1241f3168a3df14e6f5bf028f5b
db728cb
@wbolster wbolster added a commit that referenced this issue Feb 25, 2014
@wbolster No longer confuse batching/caching; add Table.scan(scan_batching=...)
For details, see the comments added in this commit, and issues #54 and
#56.
106dcf0
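As I understand the commit title, the 0.8 fix separates two settings that 0.7 conflated: caching (how many rows the Thrift server fetches per round trip, TScan's caching field) and batching (the maximum number of cells per partial row result, TScan's batchSize field, i.e. HBase's Scan.setBatch). The sketch below is hypothetical; build_scan_args_08 is an illustrative name, and only the TScan field names are taken from the Thrift struct.

```python
# Hypothetical sketch of keeping caching and batching apart; field names
# follow the Thrift TScan struct, everything else is illustrative.


def build_scan_args_08(filter_string=None, batch_size=1000, scan_batching=None):
    """Return the fields a scan request would carry.

    batch_size    -> "caching": rows fetched from the server per round
                     trip; does not conflict with filters.
    scan_batching -> "batchSize": maximum cells per (partial) row result;
                     left out unless explicitly requested.
    """
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("'batch_size' must be >= 1")
    if scan_batching is not None:
        if not isinstance(scan_batching, int) or scan_batching < 1:
            raise ValueError("'scan_batching' must be None or >= 1")
    scan = {"caching": batch_size}
    if scan_batching is not None:
        scan["batchSize"] = scan_batching
    if filter_string is not None:
        scan["filterString"] = filter_string
    return scan
```

With this split, the default scan never sends a batch size, so filtered scans work without the caller having to remember anything; only someone who explicitly asks for scan batching can run into the filter incompatibility.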
@wbolster
Owner

I've reverted my previous fix (8481d31) in commit da109ab, and implemented (hopefully) the right fix in 106dcf0.

@wbolster
Owner

Should be fixed in 0.8 (just released).
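Based only on what this thread reports (the bug appeared in 0.7 and the fix shipped in 0.8), a check for affected versions might look like this. parse_version and has_scan_filter_bug are hypothetical helpers; they only handle plain numeric versions such as '0.7' or '0.8.1', not pre-release suffixes.

```python
# Hypothetical helpers; affected range taken from this thread only
# (bug introduced in 0.7, fixed in 0.8).


def parse_version(version):
    """Turn a plain dotted version like '0.7' or '0.8.1' into an int tuple."""
    return tuple(int(part) for part in version.split("."))


def has_scan_filter_bug(version):
    """Return True for releases >= 0.7 and < 0.8."""
    return (0, 7) <= parse_version(version) < (0, 8)


if __name__ == "__main__":
    for v in ("0.6", "0.7", "0.8"):
        print("%s affected: %s" % (v, has_scan_filter_bug(v)))
```

With happybase installed, one could pass the installed version string, assuming the package exposes it (for example as happybase.__version__).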

@wbolster wbolster closed this Feb 25, 2014
@bachvtuan

Hi. I appreciate your work, but it isn't fixed. I upgraded to 0.8; below is my code, running against HBase 0.96:

connection = happybase.Connection(host='localhost', port=9090, autoconnect=False, compat='0.96', transport='buffered')
connection.open()

tables = connection.tables()
print "All available tables"
print tables

user_table = connection.table('users')

for key, data in user_table.scan():
    print key, data

And this is the output in the terminal:

All available tables
['inboxes', 'invitations', 'links', 'logs', 'module_categories', 'modules', 'projects', 'todo_comments', 'todo_groups', 'todo_tasks', 'users', 'workspaces']
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    for key, data in user_table.scan():
  File "/usr/local/lib/python2.7/dist-packages/happybase/table.py", line 368, in scan
    self.name, scan, {})
  File "/usr/local/lib/python2.7/dist-packages/happybase/hbase/Hbase.py", line 1889, in scannerOpenWithScan
    return self.recv_scannerOpenWithScan()
  File "/usr/local/lib/python2.7/dist-packages/happybase/hbase/Hbase.py", line 1914, in recv_scannerOpenWithScan
    raise result.io
happybase.hbase.ttypes.IOError: IOError(_message='users')

And the Thrift server log:

2014-02-26 08:43:56,244 WARN  [thrift-worker-1] client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch hbase:meta table: 
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in hbase:meta for table: users, row=users,,99999999999999
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:146)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1102)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1162)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1054)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1011)
  at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:326)
  at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
  at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:165)
  at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.getTable(ThriftServerRunner.java:462)
  at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.getTable(ThriftServerRunner.java:468)
  at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.scannerOpenWithScan(ThriftServerRunner.java:1200)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)
  at com.sun.proxy.$Proxy7.scannerOpenWithScan(Unknown Source)
  at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4433)
  at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4417)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
2014-02-26 08:43:56,247 WARN  [thrift-worker-1] thrift.ThriftServerRunner$HBaseHandler: users
org.apache.hadoop.hbase.TableNotFoundException: users
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1181)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1054)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1011)
  at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:326)
  at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
  at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:165)
  at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.getTable(ThriftServerRunner.java:462)
  at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.getTable(ThriftServerRunner.java:468)
  at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.scannerOpenWithScan(ThriftServerRunner.java:1200)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)
  at com.sun.proxy.$Proxy7.scannerOpenWithScan(Unknown Source)
  at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4433)
  at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4417)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)

Thrift reports that the 'users' table cannot be found, yet it appears in the list of tables.

@bachvtuan

I downgraded to HBase 0.94, and happybase 0.8 worked well.

@wbolster
Owner

I doubt that this is the same issue as the incorrect scan arguments, since I have a test for that which fails before and passes after my most recent fixes. Are you sure your database is valid? Can you access it using the HBase shell, for instance?

@bachvtuan

Sorry, that's my fault. I had created all the tables with compat='0.94' in the Connection call (I forgot to update it to 0.96). Now it works.

@wbolster
Owner
wbolster commented Mar 2, 2014

@bachvtuan ok, glad to hear that it works.
