Issue with "Nested Compressed File"

description

I am using the SharpCompress library in a Windows 8 (WinRT) app.
The issue I'm reporting is with ReaderFactory.

When reading a Zip file that contains another Zip (or any other archive) file, the Reader moves into the inner Zip, and once it has finished the inner Zip it stops reading. The rest of the outer (root) Zip that follows the inner Zip is never read.

For example:
I have a file named "Root.zip"
contents of "Root.zip" => abc.txt, "InnerArchive.zip", pqr.jpg
contents of "InnerArchive.zip" => xyz.png, hello.txt

In this case the Reader reads the contents of "Root.zip" something like this:
abc.txt -> InnerArchive.zip -> xyz.png -> hello.txt

So the file "pqr.jpg" is never extracted, and "InnerArchive.zip", which we don't want unpacked, is unpacked instead.

Sample code that I'm using:
var Reader = ReaderFactory.Open(ZipFileStream);
List<string> lstFileName = new List<string>();
while (Reader.MoveToNextEntry())
{
    lstFileName.Add(Reader.Entry.FilePath);
}

file attachments

comments

adamhathcock wrote Feb 12, 2014 at 12:40 PM

I think this must be that the code is looking for headers rather than reading a certain amount first and ignoring headers.

I will test and fix hopefully soon. Thanks!

thinkCloud wrote Feb 12, 2014 at 12:51 PM

I would like to add one thing: in this case a ".docx" file is also treated as a compressed file, which it actually is.
If there is a ".docx" file in the Zip, the Reader extracts the whole docx file along with its contents.

Please post an update if you find a solution for it.
And thanks for this awesome library, by the way.

idodiamant wrote Feb 26, 2014 at 2:18 PM

Hi There,
Unfortunately I was affected by this same issue myself today. I noticed it only happens when the outer file is a Zip file, and not when it's a RAR file, for example.

Did anyone find a solution for this issue? It is effectively stopping me from supporting ZIP files on my site for now.

Thanks!

adamhathcock wrote Feb 26, 2014 at 4:16 PM

I just made a test file myself and tried this out. I'm not actually seeing the issue. How are the zip files created? Could a sample zip be attached to this issue?

thinkCloud wrote Feb 27, 2014 at 5:33 AM

I'm really sorry, I forgot to update this issue even though I found the cause and a solution.

I found that the problem was not with ReaderFactory or any other reader API; the problem arises when the archive is created with compression type None.
Once I started using compression type Deflate instead, the code worked perfectly.

Conclusion:

The Reader API can't read archives that were created without a CompressionType such as "Deflate", "Rar", "GZip" or "LZMA".

So when compressing files I now pass the parameter as below:
using (var zipWriter = WriterFactory.Open(zipStream, archiveType, CompressionType.Deflate))  // previously: (zip, archiveType, CompressionType.None)
{
    foreach (var singleFile in OC_selectedFiles)
    {
        zipWriter.Write(singleFile.Name, await singleFile.OpenStreamForReadAsync());

        //zipWriter.Write(singleFile.Name, singleFile.OpenStreamForReadAsync);
        //zipWriter.Write(Path.GetFileName(file), filePath);
    }
}

elgonzo wrote Sep 3, 2015 at 8:48 PM

I was hit by the same problem today - a ZIP archive containing another (nested) ZIP archive.

As stated in the comments, the problem is that the local file header is followed by the nested ZIP file data. Of course, the first 4 bytes of the nested ZIP file (which will often be stored uncompressed, since it usually cannot be compressed any further) are identical to ZIP's local file header signature (0x04034b50). SharpCompress mistakenly takes this signature to be the start of the next archive entry.

The consequence: ZipReader.OpenEntryStream() will provide an EntryStream whose internal stream is System.IO.Stream.Null for the archive entry representing the nested ZIP archive.

thinkCloud's comment about the cause being the compression type is not correct; (s)he drew the wrong conclusion from his/her observations. Forcing deflate compression for the nested ZIP file will of course result in a compressed data blob which in all likelihood does not start with the local file header signature bytes, and thus thinkCloud could no longer observe the problem. However, this is only a workaround if you create the ZIP files yourself, and does not really help when dealing with ZIP files provided by other sources...

elgonzo wrote Sep 3, 2015 at 8:56 PM

In reply to adamhathcock: "I just made a test file myself and tried this out. I'm not actually seeing the issue. How are the zip files created? Could a sample zip be attached to this issue?"

Make sure you add the nested ZIP file to your archive WITHOUT compression. Since the (uncompressed) data blob of this archive entry obviously begins with a local file header signature, SharpCompress will mistake it for the next file header in the archive instead of treating it as the plain data of a nested ZIP archive that is stored uncompressed.

The clean solution would be not to check for a header signature immediately after the local file header structure, but rather to evaluate the local file header's "Compressed size" field, skip the file data first, and only then check for the presence of a local file header signature.
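
A minimal sketch of that approach. This is not SharpCompress code: the class and method names (LocalHeaderWalker, ListEntryNames) are made up for illustration. It assumes a seekable stream, names that decode as UTF-8, and entries whose local header carries the real compressed size (general purpose flag bit 3, the data descriptor flag, not set). It reads each 30-byte local file header, records the entry name, and then seeks past the extra field and the entry data before looking for the next signature, so the bytes of a stored nested ZIP are never inspected.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of SharpCompress.
static class LocalHeaderWalker
{
    const uint LocalFileHeaderSignature = 0x04034b50;
    const int FixedHeaderSize = 30; // size of the fixed part of a local file header

    // Returns the entry names of the outer archive only.
    public static List<string> ListEntryNames(Stream zip)
    {
        var names = new List<string>();
        // BinaryReader reads little-endian values, which is what ZIP uses.
        // It is deliberately not disposed so the caller keeps ownership of the stream.
        var reader = new BinaryReader(zip);

        while (zip.Length - zip.Position >= FixedHeaderSize)
        {
            if (reader.ReadUInt32() != LocalFileHeaderSignature)
                break; // central directory or some other record: no more entries

            reader.ReadUInt16();                 // version needed to extract
            ushort flags = reader.ReadUInt16();  // general purpose bit flags
            reader.ReadUInt16();                 // compression method
            reader.ReadUInt32();                 // last mod time + date
            reader.ReadUInt32();                 // CRC-32
            uint compressedSize = reader.ReadUInt32();
            reader.ReadUInt32();                 // uncompressed size
            ushort nameLength = reader.ReadUInt16();
            ushort extraLength = reader.ReadUInt16();

            // Assumption of this sketch: sizes are present in the local header.
            if ((flags & 0x0008) != 0)
                throw new NotSupportedException("Data descriptor entries are not handled in this sketch.");

            names.Add(Encoding.UTF8.GetString(reader.ReadBytes(nameLength)));

            // Skip the extra field and the entry data without looking at their content.
            zip.Seek(extraLength + (long)compressedSize, SeekOrigin.Current);
        }
        return names;
    }
}
```

For an archive laid out like the "Root.zip" example in the description, with the nested archive stored uncompressed, this should list abc.txt, InnerArchive.zip and pqr.jpg only. A forward-only reader would have to read and discard the data rather than seek, and archives written with data descriptors need different handling, because the compressed size is then not available in the local header.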

elgonzo wrote Sep 3, 2015 at 9:24 PM

In reply to adamhathcock: "I just made a test file myself and tried this out. I'm not actually seeing the issue. How are the zip files created? Could a sample zip be attached to this issue?"

Test archive "test.zip" attached. It contains only one entry "1.zip". This nested ZIP archive "1.zip" is stored uncompressed.

Also noteworthy: I can reproduce the problem only when using ReaderFactory.Open(...). When using ArchiveFactory.Open(...) the problem does not seem to occur.