Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Port python zippath #26

Closed
wants to merge 20 commits into from

5 participants

@cpdean

For fixing #6917

https://twistedmatrix.com/trac/ticket/6917

Most of the changes involve keeping file paths as bytestrings except when hitting python3 stdlib zipfile, which only allows you to use strs.

cpdean added some commits
@cpdean cpdean add python3 future imports and add zippath to py3 test suite 242cc5f
@cpdean cpdean WIP: getting around bytes + str error, still broken with ZipArchive(b…
…'..'). see twisted.test.test_paths for workaround?
0d451dc
@cpdean cpdean fix bugs using zipfile. it only works if you pass it strings, not byt…
…estrings
e094de2
@cpdean cpdean fix tests in python2, strings are now unicode b9db7e9
@cpdean cpdean converting strings to bytes, since that's how the test_paths suite looks eac43c8
@cpdean cpdean convert stray strings to bytes aa8620e
@cpdean cpdean convert more stray strings to bytes fa8909a
@cpdean cpdean wip: confused with these repr tests breaking in py2 82dfacf
@cpdean cpdean fix escaping test between py2 and 3 f728abf
@cpdean cpdean fix error adding bytes to strings, fix ZipArchive to allow taking byt…
…es, since ZipFile breaks if you pass it bytes
e9e0dd7
@cpdean cpdean keep archive path internally as bytes, fix parentdir test 43f072b
@cpdean cpdean fix bytes mixing issue 3aba529
@cpdean cpdean fix double encoding error. broken equality check now :( 8db91d6
@cpdean cpdean fix error with getting file info from archive. was hashed against 'st…
…r', not 'bytes'
44fe461
@cpdean cpdean fix 'validFiles' test, python3 zipfile doesn't look at bytestrings. f…
…ixed encoding consistency
34b213f
@cpdean cpdean fix last tests -- keep internal modeling with bytes, otherwise switch…
… to str with stdlib zipfile
b0837cc
@cpdean cpdean fix twistedchecker errors af5627d
@cpdean cpdean removing commits about __eq__, using original __cmp__ with class deco…
…rator
38bce6d
twisted/python/zippath.py
@@ -53,10 +58,16 @@ def __init__(self, archive, pathInArchive):
@param pathInArchive: a ZIP_PATH_SEP-separated string.
"""
self.archive = archive
- self.pathInArchive = pathInArchive
+
+ # Keep pathInArchive as bytes
@adiroiban Collaborator

This comment is not very helpful . It just state the obvious think of what the code does, but now why you need to keep it as bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
twisted/python/test/test_zippath.py
@@ -9,19 +9,24 @@
from twisted.test.test_paths import AbstractFilePathTestCase
from twisted.python.zippath import ZipArchive
+import twisted.python.compat as compat
+
+import sys
+
+encoding = sys.getfilesystemencoding()
@adiroiban Collaborator

I find this module global confusing. Maybe it should be fsEncoding or something to hint that it is not an arbitrary encoding, but tied to filesystem.

@adiroiban Collaborator

I see below that this is also defined as ENCODING ... I think this patch should be more consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@adiroiban adiroiban commented on the diff
twisted/python/test/test_zippath.py
@@ -9,19 +9,24 @@
from twisted.test.test_paths import AbstractFilePathTestCase
from twisted.python.zippath import ZipArchive
+import twisted.python.compat as compat
@adiroiban Collaborator

why not do: from twisted.python import compat ?

@adiroiban Collaborator

compat is only imported for _PY3 ... so maybe better to have

fromm twisted.python.compat import _PY3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
twisted/python/test/test_zippath.py
@@ -46,13 +51,15 @@ def test_zipPathRepr(self):
"""
child = self.path.child("foo")
pathRepr = "ZipPath(%r)" % (
- os.path.abspath(self.cmn + ".zip" + os.sep + 'foo'),)
+ os.path.abspath(self.cmn + b".zip" + os.sep.encode() + b'foo'),)
@adiroiban Collaborator

This patch uses os.sep.encode() and os.sep.encode('utf-8') ... is there a reason for having these 2 variants?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
twisted/python/test/test_zippath.py
((9 lines not shown))
def zipit(dirname, zfname):
"""
Create a zipfile on zfname, containing the contents of dirname'
"""
- zf = zipfile.ZipFile(zfname, "w")
+ zf = zipfile.ZipFile(zfname.decode(encoding), "w")
@adiroiban Collaborator

I see that VAR.zfname.decode(encoding) is called in multiple places... maybe it can be extracted into a helper function _filesystemEncode(VAR) which does the appropriate conversion in a single place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
twisted/python/test/test_zippath.py
@@ -68,7 +75,8 @@ def test_zipPathReprParentDirSegment(self):
"""
child = self.path.child("foo").child("..").child("bar")
pathRepr = "ZipPath(%r)" % (
- self.cmn + ".zip" + os.sep.join(["", "foo", "..", "bar"]))
+ self.cmn +
+ (".zip" + os.sep.join(["", "foo", "..", "bar"])).encode("utf-8"))
@adiroiban Collaborator

I it hard to read this kind of code. I suggest to work with Unicode as much as possible and then only do the conversion once for the whole pathRepr.

also... here the encoding should be hardcoded to utf-8 or filesystem encoding ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
twisted/python/test/test_zippath.py
@@ -78,8 +86,11 @@ def test_zipPathReprEscaping(self):
string literals are escaped in the ZipPath repr.
"""
child = self.path.child("'")
- path = self.cmn + ".zip" + os.sep.join(["", "'"])
- pathRepr = "ZipPath('%s')" % (path.encode('string-escape'),)
+ path = self.cmn + (".zip" + os.sep.join(["", "'"])).encode("utf-8")
+ if compat._PY3:
@adiroiban Collaborator

Maybe encoding the text only at the end could get rid of this if ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@glyph
Owner

Please note that we don't currently accept pull requests, so you'll need to submit this for review via trac.

@cpdean

Hey @glyph! Thanks for reaching out on github. When I was working on this I decided to kind of track it in parallel on both systems for some reason. Here's the trac ticket: https://twistedmatrix.com/trac/ticket/6917

My fixes were messy and gross because I wanted to get something that seemed to work out there to get feedback sooner. Along the way I learned a lot about how zip files are kind of broken in python3, and then @itamarst pointed out that a better solution might be to just disable zip file support in python3 since it's only used for loading packages stored as zips. I wanted to try to implement that before getting more feedback but hit a total roadblock and ran out of free time for it.

Looking back at this, you're right that I shouldn't have opened up multiple places for my submission. This PR should probably be closed.

@cpdean cpdean closed this
@cpdean

I wanted to update trac on where I got stuck but I can't seem to comment on the ticket anymore...

Have there been changes to trac in the past couple months?

@itamarst

Comments are disabled due to a massive spam problem; we're hoping to upgrade the software to fix that in the near future.

@exarkun
Owner

Comments aren't disabled. They're just privileged. See https://twistedmatrix.com/pipermail/twisted-python/2014-June/028439.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Apr 27, 2014
  1. @cpdean
  2. @cpdean

    WIP: getting around bytes + str error, still broken with ZipArchive(b…

    cpdean authored
    …'..'). see twisted.test.test_paths for workaround?
  3. @cpdean
  4. @cpdean
  5. @cpdean
  6. @cpdean

    convert stray strings to bytes

    cpdean authored
  7. @cpdean
  8. @cpdean
  9. @cpdean
  10. @cpdean

    fix error adding bytes to strings, fix ZipArchive to allow taking byt…

    cpdean authored
    …es, since ZipFile breaks if you pass it bytes
  11. @cpdean
  12. @cpdean

    fix bytes mixing issue

    cpdean authored
  13. @cpdean
Commits on Apr 28, 2014
  1. @cpdean
  2. @cpdean

    fix 'validFiles' test, python3 zipfile doesn't look at bytestrings. f…

    cpdean authored
    …ixed encoding consistency
  3. @cpdean

    fix last tests -- keep internal modeling with bytes, otherwise switch…

    cpdean authored
    … to str with stdlib zipfile
  4. @cpdean

    fix twistedchecker errors

    cpdean authored
  5. @cpdean
Commits on May 2, 2014
  1. @cpdean
  2. @cpdean
This page is out of date. Refresh to see the latest.
View
1  twisted/python/dist3.py
@@ -100,6 +100,7 @@
"twisted.python.test",
"twisted.python.test.deprecatedattributes",
"twisted.python.test.modules_helpers",
+ "twisted.python.test.test_zippath",
"twisted.python.threadable",
"twisted.python.threadpool",
"twisted.python.usage",
View
48 twisted/python/test/test_zippath.py
@@ -9,19 +9,24 @@
from twisted.test.test_paths import AbstractFilePathTestCase
from twisted.python.zippath import ZipArchive
+import twisted.python.compat as compat
@adiroiban Collaborator

why not do: from twisted.python import compat ?

@adiroiban Collaborator

compat is only imported for _PY3 ... so maybe better to have

fromm twisted.python.compat import _PY3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
+
+import sys
+
+ENCODING = sys.getfilesystemencoding()
def zipit(dirname, zfname):
"""
Create a zipfile on zfname, containing the contents of dirname'
"""
- zf = zipfile.ZipFile(zfname, "w")
+ zf = zipfile.ZipFile(zfname.decode(ENCODING), "w")
for root, ignored, files, in os.walk(dirname):
for fname in files:
fspath = os.path.join(root, fname)
arcpath = os.path.join(root, fname)[len(dirname)+1:]
# print fspath, '=>', arcpath
- zf.write(fspath, arcpath)
+ zf.write(fspath.decode(ENCODING), arcpath.decode(ENCODING))
zf.close()
@@ -33,10 +38,10 @@ class ZipFilePathTestCase(AbstractFilePathTestCase):
"""
def setUp(self):
AbstractFilePathTestCase.setUp(self)
- zipit(self.cmn, self.cmn + '.zip')
- self.path = ZipArchive(self.cmn + '.zip')
+ zipit(self.cmn, self.cmn + b'.zip')
+ self.path = ZipArchive((self.cmn + b'.zip').decode(ENCODING))
self.root = self.path
- self.all = [x.replace(self.cmn, self.cmn + '.zip') for x in self.all]
+ self.all = [x.replace(self.cmn, self.cmn + b'.zip') for x in self.all]
def test_zipPathRepr(self):
@@ -44,17 +49,19 @@ def test_zipPathRepr(self):
Make sure that invoking ZipPath's repr prints the correct class name
and an absolute path to the zip file.
"""
- child = self.path.child("foo")
+ child = self.path.child(b"foo")
pathRepr = "ZipPath(%r)" % (
- os.path.abspath(self.cmn + ".zip" + os.sep + 'foo'),)
+ os.path.abspath(self.cmn + b".zip" + os.sep.encode() + b'foo'),)
# Check for an absolute path
self.assertEqual(repr(child), pathRepr)
# Create a path to the file rooted in the current working directory
- relativeCommon = self.cmn.replace(os.getcwd() + os.sep, "", 1) + ".zip"
+ relativeCommon = self.cmn.replace(
+ os.getcwd().encode(ENCODING) + os.sep.encode(ENCODING), b"", 1)
+ relativeCommon += b".zip"
relpath = ZipArchive(relativeCommon)
- child = relpath.child("foo")
+ child = relpath.child(b"foo")
# Check using a path without the cwd prepended
self.assertEqual(repr(child), pathRepr)
@@ -66,9 +73,10 @@ def test_zipPathReprParentDirSegment(self):
includes the C{".."} rather than applying the usual parent directory
meaning.
"""
- child = self.path.child("foo").child("..").child("bar")
+ child = self.path.child(b"foo").child(b"..").child(b"bar")
pathRepr = "ZipPath(%r)" % (
- self.cmn + ".zip" + os.sep.join(["", "foo", "..", "bar"]))
+ self.cmn +
+ (".zip" + os.sep.join(["", "foo", "..", "bar"])).encode("utf-8"))
self.assertEqual(repr(child), pathRepr)
@@ -77,9 +85,12 @@ def test_zipPathReprEscaping(self):
Bytes in the ZipPath path which have special meaning in Python
string literals are escaped in the ZipPath repr.
"""
- child = self.path.child("'")
- path = self.cmn + ".zip" + os.sep.join(["", "'"])
- pathRepr = "ZipPath('%s')" % (path.encode('string-escape'),)
+ child = self.path.child(b"'")
+ path = self.cmn + (".zip" + os.sep.join(["", "'"])).encode("utf-8")
+ if compat._PY3:
+ pathRepr = "ZipPath(%s)" % (path,)
+ else:
+ pathRepr = "ZipPath(%r)" % (path,)
self.assertEqual(repr(child), pathRepr)
@@ -88,13 +99,18 @@ def test_zipArchiveRepr(self):
Make sure that invoking ZipArchive's repr prints the correct class
name and an absolute path to the zip file.
"""
- pathRepr = 'ZipArchive(%r)' % (os.path.abspath(self.cmn + '.zip'),)
+ pathRepr = 'ZipArchive(%r)' % (os.path.abspath(self.cmn + b'.zip'),)
# Check for an absolute path
self.assertEqual(repr(self.path), pathRepr)
# Create a path to the file rooted in the current working directory
- relativeCommon = self.cmn.replace(os.getcwd() + os.sep, "", 1) + ".zip"
+ relativeCommon = self.cmn.replace(
+ os.getcwd().encode(ENCODING) + os.sep.encode(ENCODING),
+ b"",
+ 1
+ )
+ relativeCommon = relativeCommon + b".zip"
relpath = ZipArchive(relativeCommon)
# Check using a path without the cwd prepended
View
68 twisted/python/zippath.py
@@ -7,6 +7,7 @@
See the constructor for ZipArchive for use.
"""
+from __future__ import print_function, division, absolute_import
__metaclass__ = type
@@ -28,14 +29,18 @@
from twisted.python.filepath import IFilePath, FilePath, AbstractFilePath
from zope.interface import implementer
+from twisted.python.compat import comparable, cmp
-# using FilePath here exclusively rather than os to make sure that we don't do
+# Using FilePath here exclusively rather than os to make sure that we don't do
# anything OS-path-specific here.
-ZIP_PATH_SEP = '/' # In zipfiles, "/" is universally used as the
+ZIP_PATH_SEP = b'/' # In zipfiles, "/" is universally used as the
# path separator, regardless of platform.
+ENCODING = sys.getfilesystemencoding()
+
+@comparable
@implementer(IFilePath)
class ZipPath(AbstractFilePath):
"""
@@ -54,9 +59,10 @@ def __init__(self, archive, pathInArchive):
"""
self.archive = archive
self.pathInArchive = pathInArchive
+
# self.path pretends to be os-specific because that's the way the
# 'zipimport' module does it.
- self.path = os.path.join(archive.zipfile.filename,
+ self.path = os.path.join(archive.zipfile.filename.encode(),
*(self.pathInArchive.split(ZIP_PATH_SEP)))
def __cmp__(self, other):
@@ -69,8 +75,8 @@ def __cmp__(self, other):
def __repr__(self):
parts = [os.path.abspath(self.archive.path)]
parts.extend(self.pathInArchive.split(ZIP_PATH_SEP))
- path = os.sep.join(parts)
- return "ZipPath('%s')" % (path.encode('string-escape'),)
+ path = os.sep.encode().join(parts)
+ return "ZipPath(%r)" % (path,)
def parent(self):
@@ -92,7 +98,13 @@ def child(self, path):
it) as this means it may include special names with special
meaning outside of the context of a zip archive.
"""
- return ZipPath(self.archive, ZIP_PATH_SEP.join([self.pathInArchive, path]))
+ try:
+ encodedPath = path.encode(ENCODING)
+ except AttributeError:
+ encodedPath = path
+
+ return ZipPath(self.archive,
+ ZIP_PATH_SEP.join([self.pathInArchive, encodedPath]))
def sibling(self, path):
@@ -115,7 +127,8 @@ def islink(self):
def listdir(self):
if self.exists():
if self.isdir():
- return self.archive.childmap[self.pathInArchive].keys()
+ # py3's dict().keys() is no longer a list
+ return list(self.archive.childmap[self.pathInArchive])
else:
raise OSError(errno.ENOTDIR, "Leaf zip entry listed")
else:
@@ -135,18 +148,22 @@ def splitext(self):
def basename(self):
return self.pathInArchive.split(ZIP_PATH_SEP)[-1]
+
def dirname(self):
# XXX NOTE: This API isn't a very good idea on filepath, but it's even
# less meaningful here.
return self.parent().path
+
def open(self, mode="r"):
if _USE_ZIPFILE:
- return self.archive.zipfile.open(self.pathInArchive, mode=mode)
+ return self.archive.zipfile.open(
+ self.pathInArchive.decode(ENCODING), mode=mode)
else:
# XXX oh man, is this too much hax?
self.archive.zipfile.mode = mode
- return self.archive.zipfile.readfile(self.pathInArchive)
+ return self.archive.zipfile.readfile(
+ self.pathInArchive.decode(ENCODING))
def changed(self):
pass
@@ -158,7 +175,8 @@ def getsize(self):
@return: file size, in bytes
"""
- return self.archive.zipfile.NameToInfo[self.pathInArchive].file_size
+ pathInArchive = self.pathInArchive.decode(ENCODING)
+ return self.archive.zipfile.NameToInfo[pathInArchive].file_size
def getAccessTime(self):
"""
@@ -177,8 +195,9 @@ def getModificationTime(self):
@return: a number of seconds since the epoch.
"""
+ pathInArchive = self.pathInArchive.decode(ENCODING)
return time.mktime(
- self.archive.zipfile.NameToInfo[self.pathInArchive].date_time
+ self.archive.zipfile.NameToInfo[pathInArchive].date_time
+ (0, 0, 0))
@@ -194,30 +213,43 @@ def getStatusChangeTime(self):
class ZipArchive(ZipPath):
- """ I am a FilePath-like object which can wrap a zip archive as if it were a
- directory.
+ """ I am a FilePath-like object which can wrap a zip archive as if it were
+ a directory.
"""
archive = property(lambda self: self)
def __init__(self, archivePathname):
- """Create a ZipArchive, treating the archive at archivePathname as a zip file.
+ """Create a ZipArchive, treating the archive at archivePathname as a
+ zip file.
@param archivePathname: a str, naming a path in the filesystem.
"""
+
+ # convert to string because python3 ZipFile doesn't take bytes
+ if isinstance(archivePathname, bytes):
+ archivePathname = archivePathname.decode(ENCODING)
+
if _USE_ZIPFILE:
self.zipfile = ZipFile(archivePathname)
else:
self.zipfile = ChunkingZipFile(archivePathname)
- self.path = archivePathname
- self.pathInArchive = ''
+ try:
+ self.path = archivePathname.encode(ENCODING)
+ except AttributeError:
+ self.path = archivePathname
+
+ self.pathInArchive = b''
# zipfile is already wasting O(N) memory on cached ZipInfo instances,
# so there's no sense in trying to do this lazily or intelligently
self.childmap = {} # map parent: list of children
for name in self.zipfile.namelist():
- name = name.split(ZIP_PATH_SEP)
+ name = name.split(ZIP_PATH_SEP.decode())
for x in range(len(name)):
child = name[-x]
- parent = ZIP_PATH_SEP.join(name[:-x])
+ parent = ZIP_PATH_SEP.decode().join(name[:-x])
+ # convert back to bytes to reflect correct file path api
+ parent = parent.encode(ENCODING)
+ child = child.encode(ENCODING)
if parent not in self.childmap:
self.childmap[parent] = {}
self.childmap[parent][child] = 1
Something went wrong with that request. Please try again.