How to recursively explore for zip file contents without extraction - c#

I want to write a function that explores a ZIP file and finds out whether it contains a .png file. The problem is that it should also explore any zip files contained within the parent zip (including zips nested inside other zips and inside folders).
As if that were not painful enough, the task must be done without extracting any of the zip files, parent or children.
I would like to write something like this (semi-pseudocode):
public bool findPng(string zipPath)
{
    bool flag = false;
    using (ZipArchive archive = ZipFile.OpenRead(zipPath))
    {
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            string s = entry.FullName;
            if (s.EndsWith(".zip"))
            {
                /* recursively calling findPng */
                flag = findPng(s);
                if (flag == true)
                    return true;
            }
            /* same as above with folders within the zip */
            if (s.EndsWith(".png"))
                return true;
        }
        return false;
    }
}
The problem is that I can't find a way to explore inner zip files without extracting them, and not extracting anything is a hard requirement.
Thanks in advance!

As I pointed out in the question I marked yours as a duplicate of, you need to open the inner zip file.
I'd change your "open from file" method to be like this:
// Open ZipArchive from a file
public bool findPng(string zipPath) {
using (ZipArchive archive = ZipFile.OpenRead(zipPath))
{
return findPng(archive);
}
}
And then have a separate method that takes a ZipArchive, so that you can call it recursively by opening the entry as a Stream:
// Search ZipArchive for PNG
public bool findPng(ZipArchive archive)
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // FullName includes any folder path, so entries inside folders are covered too
        string s = entry.FullName;
        if (s.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
        {
            // Open inner zip and pass to same method
            using (ZipArchive innerArchive = new ZipArchive(entry.Open()))
            {
                if (findPng(innerArchive))
                    return true;
            }
        }
        if (s.EndsWith(".png", StringComparison.OrdinalIgnoreCase))
            return true;
    }
    return false;
}
As an optimisation, I would recommend checking all of the filenames before handling nested zip files.
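That optimisation could look roughly like the sketch below. The class name PngFinder and the helper HasExtension are made up for illustration. The first pass only looks at entry names, which should not require decompressing any entry data, and the nested zips are opened only if that pass finds nothing.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper class, not part of the original answer.
public static class PngFinder
{
    // Returns true if the archive, or any zip nested inside it, has a .png entry.
    public static bool ContainsPng(ZipArchive archive)
    {
        // Pass 1: check every entry name first. Entries is a flat list, so
        // files inside folders are included here as well.
        if (archive.Entries.Any(e => HasExtension(e, ".png")))
            return true;

        // Pass 2: only now open the nested zips, which does decompress them
        // (in memory, nothing is written to disk).
        foreach (ZipArchiveEntry entry in archive.Entries.Where(e => HasExtension(e, ".zip")))
        {
            using (Stream inner = entry.Open())
            using (var innerArchive = new ZipArchive(inner, ZipArchiveMode.Read))
            {
                if (ContainsPng(innerArchive))
                    return true;
            }
        }
        return false;
    }

    private static bool HasExtension(ZipArchiveEntry entry, string extension)
    {
        return entry.FullName.EndsWith(extension, StringComparison.OrdinalIgnoreCase);
    }
}
```

Note that each nested zip is still decompressed into memory when it is opened, so very large inner archives can be expensive even though nothing touches the disk.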

Related

Find a string in a zipped file without unzipping the file

Is there a way to search for a string within a file(s) within a zipped folder WITHOUT unzipping the files?
My situation is I have over 1 million files zipped by months of the year.
For example 2008_01, 2008_02, etc.
I need to extract/unzip only the files with specific serial numbers within the files.
The only thing I can find is unzipping the data to a temporary location to perform that search, but it takes me 45-60 minutes just to unzip the data manually. So I assume the code would take just as long to perform that task, plus I don't have that much available space.
Please Help.
Unfortunately, there isn't a way to do this. The zip format maintains an uncompressed manifest that shows file names and directory structure, but the contents of the files themselves are compressed, and therefore any string inside a file won't match your search until the file is decompressed.
This same limitation exists with just about any general-purpose file compression format (7zip, gzip, rar, etc.). You're essentially reclaiming disk space at the expense of CPU cycles.
Using some extension methods, you can scan through the Zip files. I don't think you can gain anything by trying to scan a single zip in parallel, but you could probably scan multiple zip files in parallel.
public static class ZipArchiveEntryExt {
public static IEnumerable<string> GetLines(this ZipArchiveEntry e) {
using (var stream = e.Open()) {
using (var sr = new StreamReader(stream)) {
string line;
while ((line = sr.ReadLine()) != null)
yield return line;
}
}
}
}
public static class ZipArchiveExt {
public static IEnumerable<string> FilesContain(this ZipArchive arch, string target) {
foreach (var entry in arch.Entries.Where(e => !e.FullName.EndsWith("/")))
if (entry.GetLines().Any(line => line.Contains(target)))
yield return entry.FullName;
}
public static void ExtractFilesContaining(this ZipArchive arch, string target, string extractPath) {
if (!extractPath.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
extractPath += Path.DirectorySeparatorChar;
foreach (var entry in arch.Entries.Where(e => !e.FullName.EndsWith("/")))
if (entry.GetLines().Any(line => line.Contains(target)))
entry.ExtractToFile(Path.Combine(extractPath, entry.Name));
}
}
With these, you can search a zip file with:
var arch = ZipFile.OpenRead(zipPath);
var targetString = "Copyright";
var filesToExtract = arch.FilesContain(targetString);
You could also extract them to a particular path (assuming no filename conflicts) with:
var arch = ZipFile.OpenRead(zipPath);
var targetString = "Copyright";
arch.ExtractFilesContaining(targetString, @"C:\Temp");
You could modify ExtractFilesContaining to e.g. add the year-month to the file names to help avoid conflicts.
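A minimal sketch of that modification, assuming the zip files are named after the month they cover (for example 2008_01.zip) so that the zip's own file name can serve as the prefix. ZipSearch, ExtractWithPrefix and EntryContains are hypothetical names, and the method takes the zip path directly so the example stands on its own.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical variant of ExtractFilesContaining that prefixes output names.
public static class ZipSearch
{
    // Extracts every entry containing `target` from the zip at `zipPath` into
    // `extractPath`, prefixing each output name with the zip's file name
    // (for example "2008_01_"). Returns the number of files extracted.
    public static int ExtractWithPrefix(string zipPath, string target, string extractPath)
    {
        string prefix = Path.GetFileNameWithoutExtension(zipPath) + "_";
        int count = 0;
        Directory.CreateDirectory(extractPath);
        using (ZipArchive arch = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in arch.Entries)
            {
                if (entry.FullName.EndsWith("/")) continue; // skip directory entries
                if (!EntryContains(entry, target)) continue;
                string destination = Path.Combine(extractPath, prefix + entry.Name);
                entry.ExtractToFile(destination, overwrite: true);
                count++;
            }
        }
        return count;
    }

    // Streams the entry line by line, so only one line is held in memory at a time.
    private static bool EntryContains(ZipArchiveEntry entry, string target)
    {
        using (var reader = new StreamReader(entry.Open()))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                if (line.Contains(target)) return true;
        }
        return false;
    }
}
```

The prefix only separates files coming from different zips. Two matching entries with the same name in different folders of one zip would still overwrite each other here.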

HttpPostedFileBase gets content length to 0 when C# iterates the zipfile

I have a web interface where users can choose one or more files from their local computer and upload them to a central location, in this case Azure Blob Storage. I have a check in my C# code to validate that the filename ending is .bin. The receiving method in C# takes an array of HttpPostedFileBase.
I want to allow users to choose a zipfile instead. In my C# code, I iterate through the content of the zipfile and check each filename to verify that the ending is .bin.
However, when I iterate through the zipfile, the ContentLength of the HttpPostedFileBase object becomes 0 (zero) and when I later on upload the zipfile to Azure, it is empty.
How can I make a check for filename endings without manipulating the zipfile?
I have tried to DeepCopy a single object of HttpPostedFileBase but it is not serializable.
I've tried to make a copy of the array but nothing works. It seems that everything is reference and not value. Some example of my code as follows. Yes, I tried the lines individually.
private static bool CanUploadBatchOfFiles(HttpPostedFileBase[] files)
{
var filesCopy = new HttpPostedFileBase[files.Length];
// Neither of these lines works
Array.Copy(files, 0, filesCopy, 0, files.Length);
Array.Copy(files, filesCopy, files.Length);
files.CopyTo(filesCopy, 0);
}
This is how I iterate through the zipfile
foreach (var file in filesCopy)
{
if (file.FileName.EndsWith(".zip"))
{
using (ZipArchive zipFile = new ZipArchive(file.InputStream))
{
foreach (ZipArchiveEntry entry in zipFile.Entries)
{
if (entry.Name.EndsWith(".bin"))
{
// Some code left out
}
}
}
}
}
I solved my problem. I had to do two separate things:
First, I do not do a copy of the array. Instead, for each zip file, I just copy the stream. This made the ContentLength stay at whatever length it was.
The second thing I did was to reset the position after I looked inside the zipfile. I need to do this or else the zip file that I upload to Azure Blob Storage will be empty.
private static bool CanUploadBatchOfFiles(HttpPostedFileBase[] files)
{
foreach (var file in files)
{
if (file.FileName.EndsWith(".zip"))
{
// Part one of the solution
Stream fileCopy = new MemoryStream();
file.InputStream.CopyTo(fileCopy);
using (ZipArchive zipFile = new ZipArchive(fileCopy))
{
foreach (ZipArchiveEntry entry in zipFile.Entries)
{
// Code left out
}
}
// Part two of the solution
file.InputStream.Position = 0;
}
}
return true;
}
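An alternative that avoids the in-memory copy, offered only as a sketch: ZipArchive has a constructor overload with a leaveOpen flag, which keeps the underlying stream open when the archive is disposed. This assumes the input stream is seekable (the code above already sets Position on it). UploadValidation and AllEntriesHaveExtension are hypothetical names, and the helper takes a plain Stream so it can be tried outside ASP.NET.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not taken from the original post.
public static class UploadValidation
{
    // Returns true if every file entry in the zip ends with `extension`.
    // The stream is left open and rewound so it can still be uploaded afterwards.
    public static bool AllEntriesHaveExtension(Stream zipStream, string extension)
    {
        try
        {
            // leaveOpen: true stops ZipArchive.Dispose from closing zipStream.
            using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
            {
                return zip.Entries
                    .Where(e => !e.FullName.EndsWith("/")) // ignore directory entries
                    .All(e => e.Name.EndsWith(extension, StringComparison.OrdinalIgnoreCase));
            }
        }
        finally
        {
            // Reading the archive moves the position, so rewind before upload.
            zipStream.Position = 0;
        }
    }
}
```

With HttpPostedFileBase this would be called as UploadValidation.AllEntriesHaveExtension(file.InputStream, ".bin").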

How to read zip entries of a zip file inside another zip file, ad nauseam for c# or vb.net

While there is a response to this question using the java libraries (Read a zip file inside zip file), I cannot find an example of this anywhere in c# or vb.net.
What I have to do for a client is use the .NET 4.5 ZipArchive library to traverse zip files for specific entries. Before anyone asks, the client refuses to allow me to use dotnetzip, because his chief architect has experience with that library and says it is too buggy to be used in a real application. He's pointed out a couple to me, and it doesn't matter what I think anyway!
If I have a zip file, that itself contains other zip files, I need a way of opening the inner zip files, and read the entries for that zip file. Eventually I will also have to actually open the zip entry for the zip in a zip, but for now I just have to be able to get at the zipentries of an inner zip file.
Here's what I have so far:
public string PassThruZipFilter(string[] sfilters, string sfile, bool buseregexp, bool bignorecase, List<ZipArchiveZipFile> alzips)
{
bool bpassed = true;
bool bfound = false;
bool berror = false;
string spassed = "";
int ifile = 0;
try
{
ZipArchive oarchive = null;
int izipfiles = 0;
if (alzips.Count == 0)
{
oarchive = ZipFile.OpenRead(sfile);
izipfiles = oarchive.Entries.Count;
}
else
{
//need to dig into zipfile n times in alzips[i] where n = alzips.Count
oarchive = GetNthZipFileEntries(alzips, sfile); // <------ NEED TO CREATE THIS FUNCTION!
izipfiles = oarchive.Entries.Count;
}
while (((ifile < izipfiles) & (bfound == false)))
{
string sfilename = "";
sfilename = oarchive.Entries[ifile].Name;
//need to take into account zip files that contain zip files...
bfound = PassThruFilter(sfilters, sfilename, buseregexp, bignorecase);
if ((bfound == false) && (IsZipFile(sfilename)))
{
//add this to the zip stack
ZipArchiveZipFile ozazp = new ZipArchiveZipFile(alzips.Count, sfile, sfilename);
alzips.Add(ozazp);
spassed = PassThruZipFilter(sfilters, sfilename, buseregexp, bignorecase, alzips);
if (spassed.Equals(sISTRUE))
{
bfound = true;
}
else
{
if (spassed.Equals(sISFALSE))
{
bfound = false;
}
else
{
bfound = false;
berror = true;
}
}
}
ifile += 1;
}
}
catch (Exception oziperror)
{
berror = true;
spassed = oziperror.Message;
}
if ((bfound == false))
{
bpassed = false;
}
else
{
bpassed = true;
}
if (berror == false)
{
spassed = bpassed.ToString();
}
return (spassed);
}
So the function I have to create is 'GetNthZipFileEntries(List<ZipArchiveZipFile>, sfile)', where ZipArchiveZipFile is just a structure that contains an int index, string szipfile, string szipentry.
I cannot figure out how to read a zip file inside a zip file (or G-d forbid, a zip file inside a zip file inside a zip file... 'PassThruZipFilter' is a function inside a recursive function) using .NET 4.5. Obviously Microsoft does it, because you can open up a zip file inside a zip file in Explorer. Many thanks to anyone who can help.
So, I truly need your help on how to open zip files inside of zip files in .NET 4.5 without writing to the disk. There are NO examples on the web I can find for this specific purpose. I can find tons of examples for reading zip file entries, but that doesn't help. To be clear, I cannot use a hard disk to write anything. I can use a memory stream, but that is the extent of what I can do. I cannot use the dotnetzip library, so any comments using that won't help, but of course I'm thankful for any help at all. I could use another library like the Sharp zip libs, but I'd have to convince the client that it is impossible with .NET 4.5.
Once you identify a ZipArchiveEntry as a Zipfile, you can call the Open method on the entry to obtain a Stream. That stream can then be used to create a new ZipArchive.
The following code demonstrates listing all entries and sub-entries of a nested Zip archive.
Private Sub Test()
Using strm As Stream = File.Open("Textfile.zip", FileMode.Open)
ListZipEntries(strm)
End Using
End Sub
Private Sub ListZipEntries(strm As Stream)
Using archive As New ZipArchive(strm, ZipArchiveMode.Read, False) ' closes stream when done
For Each entry As ZipArchiveEntry In archive.Entries
Debug.Print(entry.FullName)
Dim fi As New FileInfo(entry.FullName)
If String.Equals(fi.Extension, ".zip", StringComparison.InvariantCultureIgnoreCase) Then
Debug.IndentLevel += 1
Using entryStream As Stream = entry.Open()
ListZipEntries(entryStream)
End Using
Debug.IndentLevel -= 1
End If
Next
End Using
End Sub
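Since the question asks for C# or VB.NET, here is a rough C# counterpart of the same idea. NestedZip and ListEntries are hypothetical names. Instead of printing with indentation, this sketch returns each entry with the chain of containing zips as a path prefix, and it works purely on streams, so nothing is written to disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper illustrating nested zip traversal on streams only.
public static class NestedZip
{
    // Yields every entry name, prefixing entries of inner zips with the
    // inner zip's own path, e.g. "inner.zip/b.txt".
    public static IEnumerable<string> ListEntries(Stream zipStream, string prefix = "")
    {
        // The caller owns zipStream, so leave it open when the archive is disposed.
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string path = prefix + entry.FullName;
                yield return path;

                if (entry.FullName.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
                {
                    // Open the entry as a stream and treat it as a zip of its own.
                    using (Stream inner = entry.Open())
                    {
                        foreach (string nested in ListEntries(inner, path + "/"))
                            yield return nested;
                    }
                }
            }
        }
    }
}
```

Because it is an iterator, the archives are opened lazily as the result is enumerated, so the outer stream has to stay open until enumeration finishes.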

c# zip file - Extract file last

Quick question: I need to extract zip file and have a certain file extract last.
More info: I know how to extract a zip file with c# (fw 4.5).
The problem I'm having now is that I have a zip file and inside it there is always a file name (for example) "myFlag.xml" and a few more files.
Since I need to support some old applications that listen to the folder I'm extracting to, I want to make sure that the XML file is always extracted last.
Is there something like an "exclude" option for the zip function that extracts everything except a certain file, so that I can do that first and then extract that one file on its own?
Thanks.
You could probably try a foreach loop on the ZipArchive, and exclude everything that doesn't match your parameters, then, after the loop is done, extract the last file.
Something like this:
private void TestUnzip_Foreach()
{
using (ZipArchive z = ZipFile.Open("zipfile.zip", ZipArchiveMode.Read))
{
string LastFile = "lastFileName.ext";
int curPos = 0;
int lastFilePosition = 0;
foreach (ZipArchiveEntry entry in z.Entries)
{
if (entry.Name != LastFile)
{
entry.ExtractToFile(@"C:\somewhere\" + entry.FullName);
}
else
{
lastFilePosition = curPos;
}
curPos++;
}
z.Entries[lastFilePosition].ExtractToFile(@"C:\somewhere_else\" + LastFile);
}
}
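The same idea can be written without tracking an index, by ordering the entries so that the flag file comes last. This is only a sketch: OrderedExtract and ExtractWithFileLast are hypothetical names, everything goes to a single destination folder, and the path check is an extra precaution that the code above does not have.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class OrderedExtract
{
    // Extracts every file in the zip to `destination`, writing the entry named
    // `lastFileName` only after all others. Returns paths in extraction order.
    public static List<string> ExtractWithFileLast(string zipPath, string destination, string lastFileName)
    {
        string root = Path.GetFullPath(destination).TrimEnd(Path.DirectorySeparatorChar)
                      + Path.DirectorySeparatorChar;
        var written = new List<string>();
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            // Entries with an empty Name are directory markers; skip them.
            List<ZipArchiveEntry> files = archive.Entries.Where(e => e.Name.Length > 0).ToList();
            Func<ZipArchiveEntry, bool> isLast =
                e => string.Equals(e.Name, lastFileName, StringComparison.OrdinalIgnoreCase);

            // Everything except the flag file first, then the flag file.
            foreach (ZipArchiveEntry entry in files.Where(e => !isLast(e)).Concat(files.Where(isLast)))
            {
                string target = Path.GetFullPath(Path.Combine(root, entry.FullName));
                // Refuse entries such as "../x" that would land outside the destination.
                if (!target.StartsWith(root, StringComparison.Ordinal))
                    throw new IOException("Entry escapes destination: " + entry.FullName);
                Directory.CreateDirectory(Path.GetDirectoryName(target));
                entry.ExtractToFile(target, overwrite: true);
                written.Add(target);
            }
        }
        return written;
    }
}
```

For the question's case the call would be something like OrderedExtract.ExtractWithFileLast(zipPath, watchedFolder, "myFlag.xml"), where zipPath and watchedFolder are placeholders.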

Restrict users to upload zip file with folder inside

I have a file upload control.
I restrict users to upload only zip files.
The namespace I use is Ionic.Zip.
I also want to check whether that zip file has a folder inside.
I have to prevent users from uploading a zip file with a folder inside.
I can check how many files are inside the zip file like this:
using (ZipFile zip = ZipFile.Read(file_path))
{
    if (zip.Count < 5)
    {
    }
}
I do not know how to check for a folder inside.
Can anyone help me, please?
Thanks in advance.
void Main()
{
    // Assume the archive is fine until a directory entry is found
    var isGood = true;
    using (ZipFile zip = new ZipFile(@"c:\1.zip"))
    {
        for (var i = 0; i < zip.Count; i++)
            if (zip[i].Attributes == FileAttributes.Directory)
            {
                isGood = false;
                break;
            }
    }
    if (isGood) Console.WriteLine("ok");
    else
        Console.WriteLine("error");
}
Edit:
There seems to be a problem with the way you created this zip file.
I extracted the files from the file you sent me and created a new zip (named 3.zip),
and the code works on it,
so I guess the DLL is not robust enough to recognize that edge-case format.
You can iterate on your zip object's ZipEntries - ZipEntry object contains IsDirectory property.
foreach(var entry in zip)
{
if(entry.IsDirectory)
{
//your stuff
}
}
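Both answers above use Ionic.Zip. For comparison, here is a sketch of the same check with the built-in System.IO.Compression instead, which is a different library from the one in the question. ZipFolderCheck and ContainsFolder are hypothetical names. It looks at path separators in entry names rather than at directory entries, because a zip can store a file as "folder/file.bin" without a separate entry for the folder itself. That may or may not be what happened with the problematic file mentioned in the edit above.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper using System.IO.Compression rather than Ionic.Zip.
public static class ZipFolderCheck
{
    // Returns true if any entry lives inside a folder, whether or not the
    // zip also contains an explicit directory entry for that folder.
    public static bool ContainsFolder(Stream zipStream)
    {
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Entry names normally use '/', but some tools write '\' instead.
            return zip.Entries.Any(e => e.FullName.Contains("/") || e.FullName.Contains("\\"));
        }
    }
}
```

An upload handler could reject the file whenever ContainsFolder returns true.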
