C# zip file: extract a certain file last

Quick question: I need to extract a zip file and have a certain file extracted last.
More info: I know how to extract a zip file with C# (.NET Framework 4.5).
The problem I'm having now is that the zip file always contains a file named (for example) "myFlag.xml" plus a few more files.
Since I need to support some old applications that listen to the folder I'm extracting to, I want to make sure that the XML file is always extracted last.
Is there something like an "exclude" option for the zip function that extracts everything except a certain file, so that I can do that first and then extract that one file on its own?
Thanks.

You could try a foreach loop over the ZipArchive entries that extracts every entry except the one that must come last, and then, after the loop is done, extract that last file.
Something like this:
private void TestUnzip_Foreach()
{
    using (ZipArchive z = ZipFile.Open("zipfile.zip", ZipArchiveMode.Read))
    {
        string lastFile = "lastFileName.ext";
        ZipArchiveEntry lastEntry = null;
        foreach (ZipArchiveEntry entry in z.Entries)
        {
            if (entry.Name != lastFile)
            {
                entry.ExtractToFile(@"C:\somewhere\" + entry.FullName);
            }
            else
            {
                // Remember the entry and extract it after the loop
                lastEntry = entry;
            }
        }
        // Only extract if the file was actually found in the archive
        if (lastEntry != null)
        {
            lastEntry.ExtractToFile(@"C:\somewhere_else\" + lastFile);
        }
    }
}
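A variation on the same idea, as a minimal sketch rather than a definitive implementation. The helper name ExtractWithFileLast and its parameters are hypothetical. It assumes the flag file should land in the same watched folder as the other files, that entry names are compared case-insensitively, and that existing files may be overwritten.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipExtractOrder
{
    // Hypothetical helper: extracts every entry except lastFileName,
    // then extracts lastFileName once everything else is on disk.
    public static void ExtractWithFileLast(string zipPath, string targetDir, string lastFileName)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            ZipArchiveEntry lastEntry = null;

            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Directory entries have an empty Name; skip them.
                if (string.IsNullOrEmpty(entry.Name))
                    continue;

                if (string.Equals(entry.Name, lastFileName, StringComparison.OrdinalIgnoreCase))
                {
                    lastEntry = entry; // hold it back until the loop is done
                    continue;
                }

                ExtractEntry(entry, targetDir);
            }

            // Extract the held-back file only if it was present in the archive.
            if (lastEntry != null)
                ExtractEntry(lastEntry, targetDir);
        }
    }

    private static void ExtractEntry(ZipArchiveEntry entry, string targetDir)
    {
        string destination = Path.Combine(targetDir, entry.FullName);
        // ExtractToFile does not create missing folders, so create them first.
        Directory.CreateDirectory(Path.GetDirectoryName(destination));
        entry.ExtractToFile(destination, overwrite: true);
    }
}
```

Usage would look like ZipExtractOrder.ExtractWithFileLast("zipfile.zip", @"C:\somewhere", "myFlag.xml"). The sketch does not validate entry paths, so it should only be used with archives from a trusted source.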

Related

Find a string in a zipped file without unzipping the file

Is there a way to search for a string within a file(s) within a zipped folder WITHOUT unzipping the files?
My situation is I have over 1 million files zipped by months of the year.
For example 2008_01, 2008_02, etc.
I need to extract/unzip only the files with specific serial numbers within the files.
The only thing I can find is unzipping the data to a temporary location to perform that search, but it takes me 45-60 minutes just to unzip the data manually. So I assume the code would take just as long to perform that task, plus I don't have that much available space.
Please Help.
Unfortunately, there isn't a way to do this. The zip format maintains an uncompressed manifest that shows file names and directory structure, but the contents of the files themselves are compressed, and therefore any string inside a file won't match your search until the file is decompressed.
This same limitation exists with just about any general-purpose file compression format (7zip, gzip, rar, etc.). You're essentially reclaiming disk space at the expense of CPU cycles.
Using some extension methods, you can scan through the Zip files. I don't think you can gain anything by trying to scan a single zip in parallel, but you could probably scan multiple zip files in parallel.
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class ZipArchiveEntryExt {
    // Streams the entry line by line without extracting it to disk
    public static IEnumerable<string> GetLines(this ZipArchiveEntry e) {
        using (var stream = e.Open()) {
            using (var sr = new StreamReader(stream)) {
                string line;
                while ((line = sr.ReadLine()) != null)
                    yield return line;
            }
        }
    }
}

public static class ZipArchiveExt {
    // Returns the full names of all file entries that contain the target string
    public static IEnumerable<string> FilesContain(this ZipArchive arch, string target) {
        foreach (var entry in arch.Entries.Where(e => !e.FullName.EndsWith("/")))
            if (entry.GetLines().Any(line => line.Contains(target)))
                yield return entry.FullName;
    }

    // Extracts only the file entries that contain the target string
    public static void ExtractFilesContaining(this ZipArchive arch, string target, string extractPath) {
        if (!extractPath.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
            extractPath += Path.DirectorySeparatorChar;
        foreach (var entry in arch.Entries.Where(e => !e.FullName.EndsWith("/")))
            if (entry.GetLines().Any(line => line.Contains(target)))
                entry.ExtractToFile(Path.Combine(extractPath, entry.Name));
    }
}
With these, you can search a zip file with:
var arch = ZipFile.OpenRead(zipPath);
var targetString = "Copyright";
var filesToExtract = arch.FilesContain(targetString);
You could also extract them to a particular path (assuming no filename conflicts) with:
var arch = ZipFile.OpenRead(zipPath);
var targetString = "Copyright";
arch.ExtractFilesContaining(targetString, @"C:\Temp");
You could modify ExtractFilesContaining to e.g. add the year-month to the file names to help avoid conflicts.
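The suggestion to scan multiple zip files in parallel could be sketched roughly as follows. This is an illustration under assumptions, not a tuned implementation: the class ParallelZipSearch, the method FindInZips, and the "zipPath|entryName" result format are all hypothetical, and whether it is actually faster depends on disk and CPU.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class ParallelZipSearch
{
    // Hypothetical helper: returns "zipPath|entryName" for every file entry
    // that contains target on at least one line.
    public static List<string> FindInZips(IEnumerable<string> zipPaths, string target)
    {
        var hits = new ConcurrentBag<string>(); // safe to add to from several threads

        // One archive per worker; each archive is still read sequentially.
        Parallel.ForEach(zipPaths, zipPath =>
        {
            using (ZipArchive arch = ZipFile.OpenRead(zipPath))
            {
                foreach (ZipArchiveEntry entry in arch.Entries)
                {
                    if (entry.FullName.EndsWith("/"))
                        continue; // skip folder entries

                    using (var reader = new StreamReader(entry.Open()))
                    {
                        string line;
                        while ((line = reader.ReadLine()) != null)
                        {
                            if (line.Contains(target))
                            {
                                hits.Add(zipPath + "|" + entry.FullName);
                                break; // one match is enough for this entry
                            }
                        }
                    }
                }
            }
        });

        return new List<string>(hits);
    }
}
```

Entries are decompressed in memory as they are read, so no temporary disk space is needed for the search itself. The order of the returned list is not deterministic.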

HttpPostedFileBase gets content length to 0 when C# iterates the zipfile

I have a web interface where users can choose one or many files from their local computer and upload them to a central location, in this case Azure Blob Storage. I have a check in my C# code to validate that the filename ends with .bin. The receiving method in C# takes an array of HttpPostedFileBase.
I want to allow users to choose a zipfile instead. In my C# code, I iterate through the content of the zipfile and check each filename to verify that the ending is .bin.
However, when I iterate through the zipfile, the ContentLength of the HttpPostedFileBase object becomes 0 (zero) and when I later on upload the zipfile to Azure, it is empty.
How can I make a check for filename endings without manipulating the zipfile?
I have tried to DeepCopy a single object of HttpPostedFileBase but it is not serializable.
I've tried to make a copy of the array but nothing works. It seems that everything is reference and not value. Some example of my code as follows. Yes, I tried the lines individually.
private static bool CanUploadBatchOfFiles(HttpPostedFileBase[] files)
{
    var filesCopy = new HttpPostedFileBase[files.Length];
    // None of these lines works
    Array.Copy(files, 0, filesCopy, 0, files.Length);
    Array.Copy(files, filesCopy, files.Length);
    files.CopyTo(filesCopy, 0);
}
This is how I iterate through the zipfile:
foreach (var file in filesCopy)
{
    if (file.FileName.EndsWith(".zip"))
    {
        using (ZipArchive zipFile = new ZipArchive(file.InputStream))
        {
            foreach (ZipArchiveEntry entry in zipFile.Entries)
            {
                if (entry.Name.EndsWith(".bin"))
                {
                    // Some code left out
                }
            }
        }
    }
}
I solved my problem. I had to do two separate things:
First, I do not make a copy of the array. Instead, for each zip file, I just copy the stream. This made the ContentLength stay at whatever length it was.
The second thing I did was to reset the stream position after I looked inside the zipfile. I need to do this, or else the zip file that I upload to Azure Blob Storage will be empty.
private static bool CanUploadBatchOfFiles(HttpPostedFileBase[] files)
{
    foreach (var file in files)
    {
        if (file.FileName.EndsWith(".zip"))
        {
            // Part one of the solution
            Stream fileCopy = new MemoryStream();
            file.InputStream.CopyTo(fileCopy);
            using (ZipArchive zipFile = new ZipArchive(fileCopy))
            {
                foreach (ZipArchiveEntry entry in zipFile.Entries)
                {
                    // Code left out
                }
            }
            // Part two of the solution
            file.InputStream.Position = 0;
        }
    }
    return true;
}
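A possible alternative to copying the stream, offered as a sketch: a ZipArchive closes the stream it wraps when it is disposed unless it is constructed with leaveOpen set to true, which may be why the original stream ended up unusable. The helper name AllEntriesAreBin is hypothetical, and the sketch assumes the stream is seekable.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class ZipInspection
{
    // Hypothetical helper: checks entry names without closing the stream.
    public static bool AllEntriesAreBin(Stream zipStream)
    {
        bool ok;

        // leaveOpen: true keeps zipStream usable after the archive is disposed.
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ok = archive.Entries
                .Where(e => !string.IsNullOrEmpty(e.Name)) // skip folder entries
                .All(e => e.Name.EndsWith(".bin", StringComparison.OrdinalIgnoreCase));
        }

        // Rewind so a later upload reads from the beginning (assumes CanSeek).
        zipStream.Position = 0;
        return ok;
    }
}
```

In the upload method this would be called as ZipInspection.AllEntriesAreBin(file.InputStream) before handing the same stream to the upload code.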

Create KMZ file from KML file Programmatically in C#

I have a number of KML files that I am generating in a directory.
I am wondering if there is a way to group them all into a KMZ file programmatically in C#, with a name and description displayed in Google Earth.
Thanks and best regards,
private static void combineAllKMLFilesULHR(String dirPath)
{
    string kmzPath = "outputULHR.kmz";
    string appPath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
    string rootKml = appPath + @"\" + dirPath + @"\doc.kml";
    Console.WriteLine(appPath + @"\" + dirPath);
    String[] filepaths = Directory.GetFiles(appPath + @"\" + dirPath);
    using (ZipArchive archive = ZipFile.Open(kmzPath, ZipArchiveMode.Create))
    {
        archive.CreateEntryFromFile(rootKml, "doc.kml");
        foreach (String file in filepaths)
        {
            Console.WriteLine(file);
            if (!file.Equals(rootKml))
                archive.CreateEntryFromFile(file, Path.GetFileName(file));
        }
    }
}
Since a KMZ file is a zip archive, you can use the ZipArchive class to generate it.
string kmzPath = "output.kmz";
string rootKml = "doc.kml";
string referencedKml = "someother.kml";
using (ZipArchive archive = ZipFile.Open(kmzPath, ZipArchiveMode.Create))
{
    archive.CreateEntryFromFile(rootKml, "doc.kml");
    archive.CreateEntryFromFile(referencedKml, "someother.kml");
}
Just remember to name the default KML file doc.kml. From the documentation:
Put the default KML file (doc.kml, or whatever name you want to give
it) at the top level within this folder. Include only one .kml file.
(When Google Earth opens a KMZ file, it scans the file, looking for
the first .kml file in this list. It ignores all subsequent .kml
files, if any, in the archive. If the archive contains multiple .kml
files, you cannot be sure which one will be found first, so you need
to include only one.)

Creating files using C#, like Evernote

I am currently making a UI for a note keeper and was just going to preview documents, but I was wondering what file type I would need to create if I instead wanted to do things like tag the file, preferably in C#. Basically I want to make my own Evernote. How do these programs store their notes?
I don't know how to tag the file directly, but you could create your own system to do it. Here are two ways to do it:
The first way is to format the note's / file's contents so that there are two parts, the tags and the actual text. When the program loads the note / file, it separates the tags from the text. This has the downside that the program has to load the whole file just to find the tags.
The second way is to have a database that holds each filename and its associated tags. This way the program doesn't have to load the whole file just to find the tags.
The first way
In this solution you need to format your files in a specific way:
<Tags>
tag1,tag2,tag3
</Tags>
<Text>
The text you
want in here
</Text>
By setting up the file like this, the program can separate the tags from the text. To load its tags you'd need this code:
public List<string> GetTags(string filePath)
{
    string fileContents;
    // Read the file if it exists
    if (File.Exists(filePath))
        fileContents = File.ReadAllText(filePath);
    else
        return null;
    // Find the place where "</Tags>" is located
    int tagEnd = fileContents.IndexOf("</Tags>");
    // Get the tags; 6 comes from the length of "<Tags>"
    string tagString = fileContents.Substring(6, tagEnd - 6).Replace(Environment.NewLine, "");
    return tagString.Split(',').ToList();
}
Then to get the text you'd need this:
public string GetText(string filePath)
{
    string fileContents;
    // Read the file if it exists
    if (File.Exists(filePath))
        fileContents = File.ReadAllText(filePath);
    else
        return null;
    // Find the place where the text content begins.
    // The newline length is necessary because the line break after "<Text>" must NOT be included in the text content.
    int textStart = fileContents.IndexOf("<Text>") + 6 + Environment.NewLine.Length;
    // Find the place where the text content ends
    int textEnd = fileContents.LastIndexOf("</Text>");
    // Subtract the newline length again so the line break before "</Text>" is NOT included
    return fileContents.Substring(textStart, textEnd - textStart - Environment.NewLine.Length);
}
Then I'll let you find out how you do the rest.
The second way
In this solution you have a database file that lists all your files and their associated tags. This database file would look like this:
[filename]:[tags]
file.txt:tag1, tag2, tag3
file2.txt:tag4, tag5, tag6
The program will then read the file name and the tags in this way:
public static void LoadDatabase(string databasePath)
{
    string[] fileContents;
    // End process if database doesn't exist
    if (!File.Exists(databasePath))
        return;
    fileContents = File.ReadAllLines(databasePath); // Read all lines separately and put them into an array
    foreach (string str in fileContents)
    {
        string fileName = str.Split(':')[0]; // Get the filename
        string tags = str.Split(':')[1]; // Get the tags
        // Do what you must with the information
    }
}
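To make the "do what you must with the information" step concrete, here is one way the database could be loaded into a lookup table. It is only a sketch: the class TagDatabase and the method Load are hypothetical names, and it assumes each line holds a plain file name (no drive letter or other colon) followed by ':' and a comma-separated tag list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TagDatabase
{
    // Hypothetical helper: maps each file name to its list of tags.
    public static Dictionary<string, List<string>> Load(string databasePath)
    {
        var result = new Dictionary<string, List<string>>();

        // An absent database simply yields an empty table.
        if (!File.Exists(databasePath))
            return result;

        foreach (string line in File.ReadAllLines(databasePath))
        {
            // Split on the first ':' only, so tags may contain colons.
            string[] parts = line.Split(new[] { ':' }, 2);
            if (parts.Length < 2)
                continue; // skip blank or malformed lines

            List<string> tags = parts[1]
                .Split(',')
                .Select(t => t.Trim())     // "tag1, tag2" -> "tag1", "tag2"
                .Where(t => t.Length > 0)  // ignore empty items
                .ToList();

            result[parts[0].Trim()] = tags;
        }

        return result;
    }
}
```

With the example database above, TagDatabase.Load(path)["file.txt"] would contain tag1, tag2 and tag3.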
I hope this helps.

DotNetZip extract prevents process from accessing file

I'm using the DotNetZip library to extract files from a zip file.
using (ZipFile zip = ZipFile.Read(zipLocation))
{
    foreach (ZipEntry entry in zip)
    {
        entry.Extract(_updateDir);
        Log.Write("Unpacked: " + entry.FileName, Log.LogType.Info);
    }
    zip.Dispose();
}
Later on, I attempt to edit one of the files that I extracted.
var updateList = allFiles.Where(x => x.Contains(".UPD"));
foreach (string upd in updateList)
{
    string[] result = File.ReadAllLines(upd);
    int index = Array.IndexOf(result, "[Info]");
    //then I do stuff with index
}
At the line
string[] result = File.ReadAllLines(upd);
I get the exception: The process cannot access the file <file name> because it is being used by another process.
I know that this exception is being thrown because the file is in use elsewhere. The only place it is in use before File.ReadAllLines(upd) is in the DotNetZip code above.
Is there a way in the DotNetZip code to prevent this from happening?
The problem is not caused by DotNetZip. I tried the code in my project and it works fine:
[Test]
public void Test2()
{
    using (ZipFile zip = ZipFile.Read("D:/ArchiveTest.zip"))
    {
        foreach (ZipEntry entry in zip)
        {
            entry.Extract("D:/ArchiveTest");
        }
        zip.Dispose();
    }
    var updateList = Directory.GetFiles("D:/ArchiveTest").Where(x => x.Contains(".UPD"));
    foreach (string upd in updateList)
    {
        string[] result = File.ReadAllLines(upd);
        int index = Array.IndexOf(result, "[Info]");
        //then I do stuff with index
    }
}
Another process is probably using the file you are trying to read. If you have Windows 7 or Windows 8, you can use the built-in Resource Monitor to find it. Read this post: How to know what process is using a given file?
