Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SnappyOutputStream.close() is not idempotent #107

Closed
JoshRosen opened this issue May 14, 2015 · 2 comments
Closed

SnappyOutputStream.close() is not idempotent #107

JoshRosen opened this issue May 14, 2015 · 2 comments

Comments

@JoshRosen
Copy link
Contributor

SnappyOutputStream's close() method is not idempotent, which can lead to stream corruption issues when CachedBufferAllocator is used (the default behavior).

Here's the current source for close():

    public void close() throws IOException {
        try {
            flush();
            out.close();
        } finally {
            inputBufferAllocator.release(inputBuffer);
            outputBufferAllocator.release(outputBuffer);
        }
    }

If we call close() twice, then we'll free the buffers twice. After the first close() call, the stream's buffers will be released back to the BachedBufferAllocator's buffer pool for re-use. If we create a new SnappyOutputStream, this stream will re-use the buffers from the BachedBufferAllocator. If we now call close() on the original stream a second time, we'll end up placing the same buffers into the BachedBufferAllocator's buffer pool even though another non-closed SnappyOutputStream is now using them. This can lead to scenarios where two open SnappyOutputStreams share the same buffers, leading to stream corruption issues.

This should be pretty easy to fix; I'm working on a unit test + patch now.

@xerial
Copy link
Owner

xerial commented May 14, 2015

Good catch!

JoshRosen added a commit to JoshRosen/snappy-java that referenced this issue May 14, 2015
JoshRosen added a commit to JoshRosen/snappy-java that referenced this issue May 14, 2015
asfgit pushed a commit to apache/spark that referenced this issue May 17, 2015
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:

8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660

(cherry picked from commit f2cc6b5)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
asfgit pushed a commit to apache/spark that referenced this issue May 17, 2015
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:

8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660
asfgit pushed a commit to apache/spark that referenced this issue May 17, 2015
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:

8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660

(cherry picked from commit f2cc6b5)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>

Conflicts:
	core/src/main/scala/org/apache/spark/io/CompressionCodec.scala
	core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java
asfgit pushed a commit to apache/spark that referenced this issue May 17, 2015
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:

8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660

(cherry picked from commit f2cc6b5)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>

Conflicts:
	core/src/main/scala/org/apache/spark/io/CompressionCodec.scala
	core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java
@xerial
Copy link
Owner

xerial commented May 18, 2015

Uploaded 1.1.2-RC2-SNAPSHOT

xerial added a commit that referenced this issue May 18, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this issue May 28, 2015
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:

8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660
jeanlyn pushed a commit to jeanlyn/spark that referenced this issue Jun 12, 2015
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:

8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660
nemccarthy pushed a commit to nemccarthy/spark that referenced this issue Jun 19, 2015
This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for xerial/snappy-java#107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:

8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants