stopped downloads lead to zombie queries #509

Closed
dhalperi opened this issue May 4, 2014 · 1 comment · Fixed by #602

dhalperi commented May 4, 2014

There seems to be a bug in the handling of stopped downloads that can leave queries in the activeQueries list forever.

To reproduce:

  • Start downloading a big dataset.
  • Close the tab, cancel the download, etc.
  • The query should go into ERROR status because the pipe was closed.
  • The query may then never be removed from the activeQueries list. I was not able to reproduce this on my Mac, though (SSD vs. disk, or different commit modes on Mac vs. Linux?).

One way to check the active queries list is to exploit a bug(?) in the system code by requesting the query list with a small max value, e.g., https://demo.myria.cs.washington.edu/queries?max=1. Any query in the result with an ID greater than max is an active query.

dhalperi added the Bug label May 4, 2014

dhalperi commented May 4, 2014

This seems to be an InterruptedException in the MasterCatalog while the Server is trying to update the query status to killed. Hmm. Maybe the server is killing itself after it has already killed the query?

ERROR 2014-05-02 12:17:29,702 [Master query executor#18] QuerySubTreeTask - Unexpected exception occur at operator excution. Operator: edu.washington.escience.myria.operator.DataOutput@65e80afe
edu.washington.escience.myria.DbException: java.io.IOException: Pipe closed
        at edu.washington.escience.myria.operator.DataOutput.consumeTuples(DataOutput.java:55)
        at edu.washington.escience.myria.operator.RootOperator.fetchNextReady(RootOperator.java:59)
        at edu.washington.escience.myria.operator.Operator.nextReady(Operator.java:320)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask.executeActually(QuerySubTreeTask.java:411)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask.access$200(QuerySubTreeTask.java:33)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask$1.call(QuerySubTreeTask.java:162)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask$1.call(QuerySubTreeTask.java:153)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at edu.washington.escience.myria.util.concurrent.RenamingThreadFactory$1.run(RenamingThreadFactory.java:33)
Caused by: java.io.IOException: Pipe closed
        at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:261)
        at java.io.PipedInputStream.receive(PipedInputStream.java:227)
        at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
        at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
        at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
        at java.io.BufferedWriter.write(BufferedWriter.java:188)
        at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
        at java.io.BufferedWriter.write(BufferedWriter.java:230)
        at java.io.Writer.write(Writer.java:157)
        at org.supercsv.io.AbstractCsvWriter.writeRow(AbstractCsvWriter.java:196)
        at org.supercsv.io.CsvListWriter.write(CsvListWriter.java:87)
        at edu.washington.escience.myria.CsvTupleWriter.writeTuples(CsvTupleWriter.java:74)
        at edu.washington.escience.myria.operator.DataOutput.consumeTuples(DataOutput.java:53)
        ... 10 more
WARN  2014-05-02 12:17:29,705 [Master query executor#18] OperationFutureBase - An exception was thrown by OperationFutureListener.
edu.washington.escience.myria.coordinator.catalog.CatalogException: java.lang.InterruptedException
        at edu.washington.escience.myria.coordinator.catalog.MasterCatalog.queryFinished(MasterCatalog.java:1393)
        at edu.washington.escience.myria.parallel.Server$2.operationComplete(Server.java:1121)
        at edu.washington.escience.myria.parallel.QueryFutureListener.operationComplete(QueryFutureListener.java:43)
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListener(OperationFutureBase.java:606)                                                                        
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListeners(OperationFutureBase.java:565)                                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.wakeupWaitersAndNotifyListeners(OperationFutureBase.java:158)                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.setFailure0(OperationFutureBase.java:529)                                                                           
        at edu.washington.escience.myria.parallel.DefaultQueryFuture.setFailure(DefaultQueryFuture.java:67)                                                                                      
        at edu.washington.escience.myria.parallel.MasterQueryPartition$WorkerExecutionInfo$2.operationComplete(MasterQueryPartition.java:125)                                                    
        at edu.washington.escience.myria.parallel.QueryFutureListener.operationComplete(QueryFutureListener.java:43)                                                                             
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListener(OperationFutureBase.java:606)                                                                        
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListeners(OperationFutureBase.java:565)                                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.wakeupWaitersAndNotifyListeners(OperationFutureBase.java:158)                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.setFailure0(OperationFutureBase.java:529)                                                                           
        at edu.washington.escience.myria.parallel.DefaultQueryFuture.setFailure(DefaultQueryFuture.java:67)                                                                                      
        at edu.washington.escience.myria.parallel.MasterQueryPartition$WorkerExecutionInfo$2.operationComplete(MasterQueryPartition.java:125)                                                    
        at edu.washington.escience.myria.parallel.QueryFutureListener.operationComplete(QueryFutureListener.java:43)                                                                             
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListener(OperationFutureBase.java:606)                                                                        
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListeners(OperationFutureBase.java:565)                                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.wakeupWaitersAndNotifyListeners(OperationFutureBase.java:158)                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.setFailure0(OperationFutureBase.java:529)                                                                           
        at edu.washington.escience.myria.parallel.DefaultQueryFuture.setFailure(DefaultQueryFuture.java:67)                                                                                      
        at edu.washington.escience.myria.parallel.MasterQueryPartition$1.operationComplete(MasterQueryPartition.java:256)                                                                        
        at edu.washington.escience.myria.parallel.TaskFutureListener.operationComplete(TaskFutureListener.java:43)                                                                               
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListener(OperationFutureBase.java:606)                                                                        
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.notifyListeners(OperationFutureBase.java:565)                                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.wakeupWaitersAndNotifyListeners(OperationFutureBase.java:158)                                                       
        at edu.washington.escience.myria.util.concurrent.OperationFutureBase.setFailure0(OperationFutureBase.java:529)                                                                           
        at edu.washington.escience.myria.parallel.DefaultTaskFuture.setFailure(DefaultTaskFuture.java:67)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask.executeActually(QuerySubTreeTask.java:462)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask.access$200(QuerySubTreeTask.java:33)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask$1.call(QuerySubTreeTask.java:162)
        at edu.washington.escience.myria.parallel.QuerySubTreeTask$1.call(QuerySubTreeTask.java:153)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at edu.washington.escience.myria.util.concurrent.RenamingThreadFactory$1.run(RenamingThreadFactory.java:33)
Caused by: java.lang.InterruptedException
        at com.almworks.sqlite4java.SQLiteJob.get(SQLiteJob.java:322)
        at com.almworks.sqlite4java.SQLiteJob.get(SQLiteJob.java:283)
        at edu.washington.escience.myria.coordinator.catalog.MasterCatalog.queryFinished(MasterCatalog.java:1367)
        ... 29 more
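
The InterruptedException in the trace above is thrown from a blocking wait (SQLiteJob.get). One possible explanation, and it is only a guess, is that the executor thread already had its interrupt flag set when it reached the catalog update. In Java, many blocking waits throw immediately in that situation. The sketch below demonstrates that behavior with a CountDownLatch standing in for the blocking wait; it does not use sqlite4java or any Myria code, and the class and method names are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class InterruptedWaitDemo {
  /**
   * Returns true if a blocking wait throws InterruptedException because the
   * calling thread's interrupt flag was already set before the wait began.
   */
  static boolean waitFailsWhenAlreadyInterrupted() {
    // Stand-in for a job that has not completed yet (assumption, not sqlite4java).
    CountDownLatch neverReleased = new CountDownLatch(1);

    // Simulate a pending interrupt, e.g. from a query being killed.
    Thread.currentThread().interrupt();
    try {
      // Throws on entry when the interrupt flag is set; the timeout is only a safety net.
      neverReleased.await(1, TimeUnit.SECONDS);
      return false;
    } catch (InterruptedException e) {
      // The flag is cleared when the exception is thrown.
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(waitFailsWhenAlreadyInterrupted());
  }
}
```

If this is what happens here, the catalog update can never succeed on that thread unless the interrupt status is cleared or the update runs elsewhere.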

dhalperi added a commit that referenced this issue May 4, 2014
Even if there's an error in the Catalog, remove the query from the
active queryset after updating (or failing to update) the Catalog.

Otherwise we get zombies. See #509.
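
The idea in this commit message can be sketched with a try/finally block: attempt the catalog update, and remove the query from the active set whether or not the update succeeds. This is a minimal illustration under assumptions, not the actual change; the Catalog interface, ActiveQueryTracker, and their methods are hypothetical names and not the Myria API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the catalog; not the real MasterCatalog API. */
interface Catalog {
  void queryFinished(long queryId) throws Exception;
}

/** Hypothetical tracker for the set of active queries. */
class ActiveQueryTracker {
  private final Set<Long> activeQueries = ConcurrentHashMap.newKeySet();
  private final Catalog catalog;

  ActiveQueryTracker(Catalog catalog) {
    this.catalog = catalog;
  }

  void start(long queryId) {
    activeQueries.add(queryId);
  }

  boolean isActive(long queryId) {
    return activeQueries.contains(queryId);
  }

  /** Called when a query finishes, fails, or is killed. */
  void finish(long queryId) {
    try {
      catalog.queryFinished(queryId);
    } catch (Exception e) {
      if (e instanceof InterruptedException) {
        // Restore the interrupt status so callers can still observe it.
        Thread.currentThread().interrupt();
      }
      System.err.println("Catalog update failed for query " + queryId + ": " + e);
    } finally {
      // Runs on success and on failure, so the query cannot become a zombie.
      activeQueries.remove(queryId);
    }
  }
}

class ZombieQueryDemo {
  public static void main(String[] args) {
    // A catalog that always fails, simulating the InterruptedException in the log.
    ActiveQueryTracker tracker =
        new ActiveQueryTracker(id -> { throw new InterruptedException("simulated"); });
    tracker.start(42L);
    tracker.finish(42L);
    Thread.interrupted(); // clear the restored flag before exiting the demo
    System.out.println("still active: " + tracker.isActive(42L));
  }
}
```

Putting the removal in the finally block trades catalog accuracy for liveness: the catalog may still hold a stale status for the query, but the in-memory active list no longer depends on the catalog write succeeding.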
dhalperi added a commit that referenced this issue May 6, 2014