Error running ICA on a local machine #172
Comments
This section sticks out to me
This indicates it encountered an
Hey yeah,
dirpath is the path to the directory containing the data. The data is a set of 8-bit .tif files corresponding to individual frames.
Also, I tried the above with 16-bit .tif files, which is what the original images were. That didn't work. Since all the example files seem to be 8-bit .tif, I converted them to 8-bit.
Does the call to
Yeah, the images load fine. data.first() looks fine too.
Then I'm lost, because it looks like there's a problem arising when Thunder tries to serialize the data, but I can't pinpoint where the
Weirdly, running with no start and stop indices didn't give that error. But I am getting a Java out-of-memory error again. I think I set the heap size to 6 GB: I did JAVA_OPTS="-Xms6g -Xmx 6g". The data I am loading is 100 MB, so I don't know what else to do to fix that. I am now attempting to run it on EC2. I haven't ever done that before, so it might take a bit.
Interesting. That could mean the indices you chose were handled weirdly by the serializer, or perhaps the system just ran out of memory before it could get there.

By the way, for your Java options, it doesn't make much sense to make your initial and max heap sizes the same. For 100 MB of data, I'd say 512m is a fine initial size; just make 6g (or 4096m) your max size. This way your system won't eat up way more than it needs and should have the ability to extend if it needs to. If even that doesn't fix your heap problems, you might want to dig into your program and understand more about where the problem is coming from. Try using some numpy or scipy tools to do your ICA manually, and use IPython's memory profiler, or something similar, to track down what's eating up all of your memory.
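A minimal sketch of the "do your ICA manually with in-memory tools" suggestion, here using scikit-learn's FastICA as a stand-in (an assumption on my part; any in-memory PCA/ICA would serve). The array sizes are illustrative, not from the actual data:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.RandomState(0)
data = rng.rand(500, 200)            # stand-in for a (pixels x timepoints) matrix

# reduce dimensionality first (the original post used k=150 PCs; fewer here)
pca = PCA(n_components=20)
reduced = pca.fit_transform(data)

# then unmix into independent components (the original post used c=75 ICs)
ica = FastICA(n_components=10, random_state=0)
sources = ica.fit_transform(reduced)
print(sources.shape)                 # (500, 10)
```

Running something like this under memory_profiler (e.g. IPython's %memit) would show where the memory goes without Spark in the loop.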
@vjlbym thanks for the detailed info on this issue! And thanks to @GrantRVD for helping out. Would it be possible to share these particular files? (Even just a subset would be helpful.) You could post to a public bucket on S3, or if you'd rather not share publicly, we could find another way to send them to me. That would let us do some tests and try to get to the bottom of it.

As I mentioned on gitter, local usage is currently sub-optimal, especially when it comes to memory, but that's something we're looking to improve in future releases. Comparisons to memory use with other tools may, unfortunately, not be particularly informative, though profiling could be. Trying to run on EC2 would definitely be worthwhile (and ultimately probably the way to go!). That said, we should be able to figure out the problem here.

And btw, both 16-bit and 8-bit tifs should in general work fine; I think that's probably not the particular problem here.
@GrantRVD I am pretty sure it had to do with the index, since it barely ran previously, whereas with no indices mentioned, a first set of tasks completed for the 3000 images. Changed the heap sizes. I am pretty much new to Java, Thunder, Spark, and Python, so live and learn :) That didn't solve the memory issue though.
@vjlbym thanks for sharing the data! I've been playing with it and -- for better or for worse! -- can't seem to reproduce these problems. I'm running Thunder 0.6.0.dev (the current master branch), and Spark 1.3.0, on Mac OS X 10.9.5, with no special Java memory settings. With the data you sent, the following all works fine:
Some notes / caveats:
so with all that, i'd say there's a chance switching to 1.3.0 will help; otherwise it's likely a Windows-specific memory issue, which may be hard to debug
@freeman-lab Thanks a lot for the detailed reply. One thing is that I have been running all the above in Ubuntu. I was trying to get Spark to run on Windows since people in my lab are more comfortable with it. It's good to know about
Ahh, thanks for the clarification, in that case ignore everything I said about Windows =) Fingers crossed 1.3.0 will help then. And this was on a MacBook Air with 8 GB RAM. |
Oh. My desktop should be perfectly able to handle it then. Hope it's solved by 1.3.0. Thanks again! Will post if it worked first thing in the morning :) |
So it worked finally! 1.3.0 still gave me OOM errors, but the issue apparently was that Java runtime options in Ubuntu had to be set using _JAVA_OPTIONS and not JAVA_OPTS, even though this website says JAVA_OPTS is the way to go for Ubuntu (http://askubuntu.com/questions/107665/how-do-i-change-java-runtime-parameters). Changing it to _JAVA_OPTIONS, as @GrantRVD had suggested for Windows, worked for my Ubuntu machine. I hadn't used that one since it seemed to be Windows-specific, but apparently not! Thanks a lot, @GrantRVD and @freeman-lab!! I'll close this issue now.
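For anyone hitting the same wall: the JVM reads _JAVA_OPTIONS from the environment at launch, so it has to be set before the SparkContext starts -- either exported in the shell beforehand, or from Python early in the session. A minimal sketch (the heap sizes are the ones discussed above; adjust for your machine):

```python
import os

# must happen before any JVM/SparkContext is launched in this process
os.environ['_JAVA_OPTIONS'] = '-Xms512m -Xmx6g'
print(os.environ['_JAVA_OPTIONS'])   # -Xms512m -Xmx6g
```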
@vjlbym that's great! I didn't know that about the naming conventions for JAVA_OPTS, very curious, great job figuring it out. It would be great if you could submit a pull request adding a note about this to the FAQ, something like "I'm getting out of memory errors during local usage", and then a description of how to set these java opts in different environments. |
Here's the source file you'd be adding to https://github.com/thunder-project/thunder/blob/master/python/doc/faq.rst |
@freeman-lab Just did. Apparently, this isn't environment-specific; it's just a Java thing: https://community.oracle.com/message/6440415
Hi all,
I am posting an error log that I am getting when trying to run ICA on a recording of Ca2+ traces. There are about 50 cells in the field of view. So I set the number of ICs to 75, with 150 PCs.
The images at each time point are stored as .tif files. I loaded them in as a series and then normalized them using:
normdata = data.toTimeSeries().normalize(baseline='mean') #Normalize data by the global mean. (data-mean)/mean
normdata = data.toTimeSeries()
normdata.cache()
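For clarity, what normalize(baseline='mean') computes per series, per the comment above, is (data - mean) / mean. As a plain-numpy sketch (not Thunder's actual implementation):

```python
import numpy as np

trace = np.array([2.0, 4.0, 6.0])           # one pixel's time series
baseline = trace.mean()                     # 4.0
normalized = (trace - baseline) / baseline  # dF/F with the temporal mean as baseline
print(normalized)                           # [-0.5  0.   0.5]
```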
Thanks a lot for your help! And also, thanks a lot for Thunder :)
Py4JJavaError Traceback (most recent call last)
&lt;ipython-input&gt; in &lt;module&gt;()
3 start_time = time.time()
4 from thunder import ICA
----> 5 modelICA = ICA(k=150,c=75).fit(normdata) # Run ICA on normalized data. k=#of principal components, c=#of ICs
6 sns.set_style('darkgrid')
7 plt.plot(modelICA.a);
/home/stuberlab/anaconda/lib/python2.7/site-packages/thunder/factorization/ica.pyc in fit(self, data)
95
96 # reduce dimensionality
---> 97 svd = SVD(k=self.k, method=self.svdMethod).calc(data)
98
99 # whiten data
/home/stuberlab/anaconda/lib/python2.7/site-packages/thunder/factorization/svd.pyc in calc(self, mat)
137
138 # compute (xx')^-1 through a map reduce
--> 139 xx = mat.times(cInv).gramian()
140 xxInv = inv(xx)
141
/home/stuberlab/anaconda/lib/python2.7/site-packages/thunder/rdds/matrices.pyc in times(self, other)
191 newindex = arange(0, new_d)
192 return self._constructor(self.rdd.mapValues(lambda x: dot(x, other_b.value)),
--> 193 nrows=self._nrows, ncols=new_d, index=newindex).finalize(self)
194
195 def elementwise(self, other, op):
/home/stuberlab/anaconda/lib/python2.7/site-packages/thunder/rdds/matrices.pyc in __init__(self, rdd, index, dims, dtype, nrows, ncols, nrecords)
52 elif ncols is not None:
53 index = arange(ncols)
---> 54 super(RowMatrix, self).__init__(rdd, nrecords=nrecs, dtype=dtype, dims=dims, index=index)
55
56 @property
/home/stuberlab/anaconda/lib/python2.7/site-packages/thunder/rdds/series.pyc in __init__(self, rdd, nrecords, dtype, index, dims)
48 self._index = None
49 if index is not None:
---> 50 self.index = index
51 if dims and not isinstance(dims, Dimensions):
52 try:
/home/stuberlab/anaconda/lib/python2.7/site-packages/thunder/rdds/series.pyc in index(self, value)
65 def index(self, value):
66 # touches self.index to trigger automatic calculation from first record if self.index is not set
---> 67 lenSelf = len(self.index)
68 if type(value) is str:
69 value = [value]
/home/stuberlab/anaconda/lib/python2.7/site-packages/thunder/rdds/series.pyc in index(self)
59 def index(self):
60 if self._index is None:
---> 61 self.populateParamsFromFirstRecord()
62 return self._index
63
/home/stuberlab/anaconda/lib/python2.7/site-packages/thunder/rdds/series.pyc in populateParamsFromFirstRecord(self)
103 Returns the result of calling self.rdd.first().
104 """
--> 105 record = super(Series, self).populateParamsFromFirstRecord()
106 if self._index is None:
107 val = record[1]
/home/stuberlab/anaconda/lib/python2.7/site-packages/thunder/rdds/data.pyc in populateParamsFromFirstRecord(self)
76 from numpy import asarray
77
---> 78 record = self.rdd.first()
79 self._dtype = str(asarray(record[1]).dtype)
80 return record
/home/stuberlab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/rdd.pyc in first(self)
1165 2
1166 """
-> 1167 return self.take(1)[0]
1168
1169 def saveAsNewAPIHadoopDataset(self, conf, keyConverter=None, valueConverter=None):
/home/stuberlab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/rdd.pyc in take(self, num)
1151 p = range(
1152 partsScanned, min(partsScanned + numPartsToTry, totalParts))
-> 1153 res = self.context.runJob(self, takeUpToNumLeft, p, True)
1154
1155 items += res
/home/stuberlab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/context.pyc in runJob(self, rdd, partitionFunc, partitions, allowLocal)
768 # SparkContext#runJob.
769 mappedRDD = rdd.mapPartitions(partitionFunc)
--> 770 it = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, javaPartitions, allowLocal)
771 return list(mappedRDD._collect_iterator_through_file(it))
772
/home/stuberlab/Downloads/spark-1.1.0-bin-hadoop1/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
536 answer = self.gateway_client.send_command(command)
537 return_value = get_return_value(answer, self.gateway_client,
--> 538 self.target_id, self.name)
539
540 for temp_arg in temp_args:
/home/stuberlab/Downloads/spark-1.1.0-bin-hadoop1/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
298 raise Py4JJavaError(
299 'An error occurred while calling {0}{1}{2}.\n'.
--> 300 format(target_id, '.', name), value)
301 else:
302 raise Py4JError(
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 17.0 failed 1 times, most recent failure: Lost task 0.0 in stage 17.0 (TID 12005, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/stuberlab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/worker.py", line 75, in main
command = pickleSer._read_with_length(infile)
File "/home/stuberlab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/serializers.py", line 146, in _read_with_length
length = read_int(stream)
File "/home/stuberlab/Downloads/spark-1.1.0-bin-hadoop1/python/pyspark/serializers.py", line 464, in read_int
raise EOFError
EOFError
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)