unable to quickly decompress from an archive created with -9 -f 3.0 #59

Closed
goblin opened this issue Mar 1, 2016 · 2 comments

goblin commented Mar 1, 2016

With pixz 1.0.2-2 from Debian, running pixz -x files/file00309491 -i files.tpxz9 fails with "Index and archive differ as to next file: files/file00309491 vs files/file00246065".

files.tpxz9 was created with pixz -9 -f 3.0 files.tar files.tpxz9 and is available at the following magnet link:

magnet:?xl=136933688&dn=files.tpxz9&xt=urn:btih:utilil77ywy4hwmd2r5slxpwkfuy3bo2&tr=udp://tracker.openbittorrent.com:80&tr=udp://open.demonii.com:1337&tr=udp://tracker.coppersurfer.tk:6969&tr=udp://tracker.leechers-paradise.org:6969

(131 MB, sha256 = 2a1cd58008bd527e5d6b8256de796e8368019c2877ed4548afc163d4ede9e614; details on how it was created are below)

Freshly compiled pixz from git master (936d806) with debug enabled outputs this instead:

$ ~/git/pixz/I/bin/pixz -x files/file00309491 -i ../files.tpxz9
want: files/file00309491
read: skip 1
read: skip 2
read: skip 3
read: skip 4
read: want 5
tar want: files/file00309491
tar off = 185420288, size = 18446744073525179904
(null)
Error reading archive entry
$ 

Decompressing the whole tar archive (which is slow) and then having tar extract the file seems to work OK.
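
The size printed in the debug output is just below 2^64, which is what a small negative number looks like when printed as an unsigned 64-bit integer. The sketch below only redoes that arithmetic on the two printed values. It assumes the field is a 64-bit quantity and does not show where in pixz the value goes wrong. The helper name as_signed64 is made up for this illustration.

```python
# Values copied from the "tar off = ..., size = ..." debug line above.
off = 185420288
size_printed = 18446744073525179904

def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value

# The size as it would read if the field were signed.
size_signed = as_signed64(size_printed)

# Where "off + size" lands once 64-bit wraparound is applied.
end_wrapped = (off + size_printed) % (1 << 64)

print(size_signed)   # -184371712
print(end_wrapped)   # 1048576
```

Read this way, the entry's size is negative, and off plus size lands on exactly 1 MiB (1048576).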

files.tar was created from a Wikimedia dump at https://dumps.wikimedia.org/simplewiki/20160203/simplewiki-20160203-pages-meta-current.xml.bz2 with this command line:

$ mkdir files
$ bzcat ../simplewiki-20160203-pages-meta-current.xml.bz2 | ../splitter.pl <(bzgrep -b '<page>' ../simplewiki-20160203-pages-meta-current.xml.bz2  | sed -e 's/: .*$//')
$ tar cf files.tar files

The splitter.pl script is as below:

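#! /usr/bin/perl
# Split STDIN into numbered files at the byte offsets listed, one per line,
# in the file named by the first argument.

use strict;
use warnings;

open my $splitf, "<", $ARGV[0] or die "cannot open $ARGV[0]: $!";

my $cnt = 0;      # bytes consumed from STDIN so far
my $filecnt = 0;  # index of the next output file
my $buf;

# Each offset ends one chunk; input after the last offset is not written out.
while (my $mark = <$splitf>) {
        my $len = read(STDIN, $buf, $mark - $cnt);
        defined $len or die "read failed: $!";
        open my $outf, ">", sprintf("file%08d", $filecnt++)
                or die "cannot create output file: $!";
        print $outf $buf;
        close $outf;
        $cnt += $len;
}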
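
For readers who do not use Perl, here is a minimal Python sketch of the same splitting idea, run on a tiny in-memory input instead of the Wikimedia dump. The function name split_at_offsets and the sample data are made up for illustration. It yields chunks instead of writing file%08d files, and it assumes the offsets are sorted in ascending order.

```python
import io

def split_at_offsets(stream, offsets):
    """Yield the chunk of `stream` that ends at each cumulative byte offset."""
    consumed = 0
    for mark in offsets:
        chunk = stream.read(mark - consumed)
        consumed += len(chunk)
        yield chunk

# Byte offsets of the lines containing "<page>", as grep -b would report them.
data = b"<header>\n<page>a</page>\n<page>bb</page>\n"
offsets = [9, 24]

chunks = list(split_at_offsets(io.BytesIO(data), offsets))
for index, chunk in enumerate(chunks):
    print("file%08d" % index, chunk)
```

With this input the first chunk is the header before the first page and the second chunk is the first page. As in the Perl script, nothing after the final offset is read, so the last page does not get a file.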

goblin commented Mar 1, 2016

The original file was created with Debian's version of pixz, but compressing with the latest git master produces an identical files.tpxz9.

goblin commented Mar 1, 2016

Sorry, this looks like a duplicate of #13; I somehow missed it in my first search.

goblin closed this as completed Mar 1, 2016