I have a program where I need to search an arbitrary number of nested zip files. I was able to solve this in Python 3 by taking the namelist of the archive at a given path, finding the zip files in it, reading each one into memory, wrapping the bytes in a BytesIO object, and then calling the function again recursively on that object. Like so:
    import zipfile
    from io import BytesIO

    def zip_dig(source_path, posts):
        try:
            # Open the zip (a path or a file-like object) and list its contents
            with zipfile.ZipFile(source_path, 'r') as zip_ref:
                for name in zip_ref.namelist():
                    if name.endswith('.zip'):
                        # Wrap the nested archive's bytes so ZipFile can read them
                        zfiledata = BytesIO(zip_ref.read(name))
                        zip_dig(zfiledata, posts)
        except zipfile.BadZipFile:
            pass
        return posts
I now need to solve this in C#, but I can't seem to find an equivalent solution.
I have googled extensively and looked through the documentation of the ZipFile and ZipArchive classes, but I can't find a similar workaround for the fact that the nested file reference comes in the form of a Stream rather than a string:
    internal static List<BsonDocument> ZipDig(string path, List<BsonDocument> posts)
    {
        path = Path.GetFullPath(path);
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (entry.FullName.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
                {
                    posts = ZipDig(entry, posts);
                }
            }
        }
        return posts;
    }
Any help is appreciated!
EDIT: I should clarify: the zip files are often several gigabytes in size, so extraction is not really an option from a time-consumption perspective. I'm just finding a particular type of txt file, reading those files, and entering their contents into a database.
ZipArchive has a constructor which takes a stream.
Use that below the initial level of recursion.
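A minimal sketch of that suggestion follows. The class and method names (NestedZip, Dig) and the idea of collecting entry paths into a list are assumptions made for illustration; they are not part of the original code. Note that the stream returned by entry.Open() is not seekable, and as far as I know ZipArchive in Read mode copes with that by buffering the nested archive in memory, which may matter for multi-gigabyte inner archives.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class NestedZip
{
    // Hypothetical helper: records the path of every non-zip entry,
    // descending into nested .zip entries through their streams.
    public static void Dig(Stream source, string prefix, List<string> found)
    {
        try
        {
            // leaveOpen: true, because the caller owns (and disposes) the stream.
            using (var archive = new ZipArchive(source, ZipArchiveMode.Read, leaveOpen: true))
            {
                foreach (ZipArchiveEntry entry in archive.Entries)
                {
                    string name = prefix + entry.FullName;
                    if (entry.FullName.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
                    {
                        // Recurse on the entry's stream instead of a file path.
                        using (Stream inner = entry.Open())
                        {
                            Dig(inner, name + "/", found);
                        }
                    }
                    else
                    {
                        found.Add(name);
                    }
                }
            }
        }
        catch (InvalidDataException)
        {
            // Not a readable zip archive; skip it, like BadZipFile in the Python version.
        }
    }
}
```

The first level can then be started from a file, for example with `using (FileStream fs = File.OpenRead(path)) NestedZip.Dig(fs, "", found);`.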
Related
I inherited some code that makes use of ZipArchive to save some information from the database. It uses BinaryFormatter to do this. When you look at the zip file with 7-zip (for example), you see a couple of folders and a .txt file. All is working well. I simply want to modify the code so that the ZipArchive also has a folder called "temp" that consists of the files and folders under C:\temp. Is there an easy way to add an entry (ZipArchiveEntry?) that consists of an entire folder on the disk? I saw "CreateEntryFromFile" among the methods of ZipArchive, but no CreateEntryFromDirectory. Or perhaps there's some other simple way to do it? Does anyone have example code? I should say that C:\temp could have a variable number of files and directories (which have child directories and files, etc.). Must I enumerate them somehow, create my own directories, and use CreateEntryFromFile? Any help is appreciated.
Similarly, when I read the ZipArchive, I want to take the stuff related to C:\temp and just dump it in a directory (like C:\temp_old).
Thanks,
Dave
The answer by user1469065 in Zip folder in C# worked for me. user1469065 shows how to get all the files/directories in the directory (using some cool "yield" statements) and then do the serialization. For completeness, I did add the code to deserialize as user1469065 suggested (at least I think I did it the way he suggested).
    private static void ReadTempFileStuff(ZipArchive archive) // adw
    {
        var sessionArchives = archive.Entries.Where(x => x.FullName.StartsWith(@"temp_directory_contents")).ToArray();
        if (sessionArchives != null && sessionArchives.Length > 0)
        {
            foreach (ZipArchiveEntry entry in sessionArchives)
            {
                FileInfo info = new FileInfo(@"C:\" + entry.FullName);
                if (!info.Directory.Exists)
                {
                    Directory.CreateDirectory(info.DirectoryName);
                }
                entry.ExtractToFile(@"C:\" + entry.FullName, true);
            }
        }
    }
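For the writing side, here is a rough sketch of the "enumerate the files and call CreateEntryFromFile" idea from the question. It is not user1469065's code; the names ZipFolderSketch and AddDirectory are made up for illustration. It assumes the archive was opened in Create or Update mode, and it does not preserve empty directories.

```csharp
using System.IO;
using System.IO.Compression;

static class ZipFolderSketch
{
    // Hypothetical helper: adds every file under sourceDir to the archive,
    // below the folder name given in entryRoot (for example "temp").
    public static void AddDirectory(ZipArchive archive, string sourceDir, string entryRoot)
    {
        string root = Path.GetFullPath(sourceDir)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);

        foreach (string file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            // Entry names use forward slashes, relative to the source directory.
            string relative = file.Substring(root.Length + 1).Replace('\\', '/');
            archive.CreateEntryFromFile(file, entryRoot + "/" + relative);
        }
    }
}
```

A call such as `ZipFolderSketch.AddDirectory(archive, @"C:\temp", "temp")` would then produce entries like temp/sub/file.txt.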
I have been applying what I have learned so far in Bob Tabor's absolute beginners series, and I wrote a small console word game for my daughter that requires me to generate a random 5-letter word.
I was previously using File.ReadAllLines(path) to generate a string array from a text file (wordlist.txt) on my system, and Random.Next to generate the index I would pull from the array.
I learned from some posts here how to embed the file as a resource but now I am unable to find the syntax to point to it (path). Or do I have to access it differently now that it is embedded?
Thanks in advance
Without a good, minimal, complete code example it is impossible to offer specific advice.
However, the basic issue is this: when you embed a file as a resource, it is no longer a file. That is, the original file still exists, but the resource itself is not a file in any way. It is stored as some specific kind of data in your assembly; resources embedded from file sources generally wind up as binary data objects.
How to use this data depends on what you mean by "embed". There are actually two common ways to store resources in a C# program: you can use the "Resources" object in the project, which exposes the resource via the project's ...Properties.Resources class (which in turn uses the ResourceManager class in .NET). Or you can simply add the file to the project itself, and select the "Embedded Resource" build option.
If you are using the "Resources" designer, then there are a couple of different ways you might have added the file. One is to use the "New Text File..." option, which allows you to essentially copy/paste or type new text into a resource. This is exposed in code as a string property on the Properties.Resources object. The same thing will happen if you add the resource using the "Existing File..." option and select a file that Visual Studio recognizes as a text file.
Otherwise, the file will be included as a byte[] object exposed by a property in the Properties.Resources class.
If you have used the "Embedded Resource" build option instead of the "Resources" designer, then your data will be available by calling the Assembly.GetManifestResourceStream(string) method, which returns a Stream object. This can be wrapped in a StreamReader to allow it to be read line by line.
Direct replacements for the File.ReadAllLines(string) approach would look something like the following…
Using "Embedded Resource":
    string[] ReadAllResourceLines(string resourceName)
    {
        using (Stream stream = Assembly.GetEntryAssembly()
            .GetManifestResourceStream(resourceName))
        using (StreamReader reader = new StreamReader(stream))
        {
            return EnumerateLines(reader).ToArray();
        }
    }

    IEnumerable<string> EnumerateLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
Using Properties.Resources:
You can do something similar when using the Properties.Resources class. It looks almost identical:
    string[] ReadAllResourceLines(string resourceText)
    {
        using (StringReader reader = new StringReader(resourceText))
        {
            return EnumerateLines(reader).ToArray();
        }
    }
called like string[] allLines = ReadAllResourceLines(Properties.Resources.MyTextFile);, where MyTextFile is the property name for the resource you added in the designer (i.e. the string you pass in that second example is the text of the file itself, not the name of the resource).
If you added an existing file that Visual Studio didn't recognize as a text file, then the property type will be byte[] instead of string and you'll need yet another slightly different approach:
    string[] ReadAllResourceLines(byte[] resourceData)
    {
        using (Stream stream = new MemoryStream(resourceData))
        using (StreamReader reader = new StreamReader(stream))
        {
            return EnumerateLines(reader).ToArray();
        }
    }
Note that in all three examples, the key is that the data winds up wrapped in a TextReader implementation, which is then used to read each line individually, to populate an array. These all use the same EnumerateLines() helper method I show above.
Of course, now that you see how the data can be retrieved, you can adapt that to use the data in a variety of other ways, in case for example you don't really want or need the text represented as an array of string objects.
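As one such adaptation, here is a small sketch for the word-game case: it reads from any TextReader and picks a random word of a given length without building a full array first. The names WordPicker and PickWord are invented for this example, and it assumes the word list has one word per line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class WordPicker
{
    // Hypothetical helper: returns a random word of the requested length
    // from a reader that yields one word per line.
    public static string PickWord(TextReader reader, int length, Random random)
    {
        var candidates = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string word = line.Trim();
            if (word.Length == length)
            {
                candidates.Add(word);
            }
        }

        if (candidates.Count == 0)
        {
            throw new InvalidOperationException("No word of the requested length was found.");
        }

        return candidates[random.Next(candidates.Count)];
    }
}
```

With the Properties.Resources approach this could be called as `WordPicker.PickWord(new StringReader(Properties.Resources.MyTextFile), 5, new Random())`; with an embedded resource, pass the StreamReader that wraps the manifest resource stream instead.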
If you are using the Resources file and added a text file to it, you can use
    string text = Properties.Resources.<ResourceName>;
Here Resources is the default resource file for your project. If you have added a custom resource file, you can use its name instead of Properties.Resources.
If your content is any other kind of file, it is represented as a byte[]. In your case, for simple text, it will be a string if you have included a text file.
For any other file, you can convert the content to text (if it is text) with
    string text = Encoding.ASCII.GetString(Properties.Resources.<ResourceName>);
If your file has some other encoding (such as UTF-8 or UTF-16), use the matching class under Encoding instead, for example Encoding.UTF8 or Encoding.Unicode.
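When the encoding of a byte[] resource is not known up front, one option is to let StreamReader look for a byte order mark. The sketch below assumes UTF-8 as the fallback; the names ResourceText and Decode are illustrative only. Decoding such data with Encoding.ASCII would instead turn any non-ASCII bytes into '?' characters.

```csharp
using System.IO;
using System.Text;

static class ResourceText
{
    // Hypothetical helper: decodes resource bytes as text, honoring a
    // byte order mark if present and falling back to UTF-8 otherwise.
    public static string Decode(byte[] resourceData)
    {
        using (var stream = new MemoryStream(resourceData))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```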
After referring to many blogs and articles, I arrived at the following code for searching for a string in all files inside a folder. It is working fine in my tests.
QUESTIONS
Is there a faster approach for this (using C#)?
Is there any scenario that will fail with this code?
Note: I tested with very small files, and with very few files.
CODE
    static void Main()
    {
        string sourceFolder = @"C:\Test";
        string searchWord = ".class1";
        List<string> allFiles = new List<string>();
        AddFileNamesToList(sourceFolder, allFiles);
        foreach (string fileName in allFiles)
        {
            string contents = File.ReadAllText(fileName);
            if (contents.Contains(searchWord))
            {
                Console.WriteLine(fileName);
            }
        }
        Console.WriteLine(" ");
        System.Console.ReadKey();
    }

    public static void AddFileNamesToList(string sourceDir, List<string> allFiles)
    {
        string[] fileEntries = Directory.GetFiles(sourceDir);
        foreach (string fileName in fileEntries)
        {
            allFiles.Add(fileName);
        }

        // Recursion
        string[] subdirectoryEntries = Directory.GetDirectories(sourceDir);
        foreach (string item in subdirectoryEntries)
        {
            // Avoid "reparse points"
            if ((File.GetAttributes(item) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
            {
                AddFileNamesToList(item, allFiles);
            }
        }
    }
REFERENCE
Using StreamReader to check if a file contains a string
Splitting a String with two criteria
C# detect folder junctions in a path
Detect Symbolic Links, Junction Points, Mount Points and Hard Links
FolderBrowserDialog SelectedPath with reparse points
C# - High Quality Byte Array Conversion of Images
Instead of File.ReadAllText(), it is better to use
    File.ReadLines(@"C:\file.txt");
It returns a lazily evaluated IEnumerable<string>, so you will not have to read the whole file if your string is found before the last line of the text file is reached.
I wrote something very similar. Here are a couple of changes I would recommend.
Use Directory.EnumerateDirectories instead of GetDirectories. It returns immediately with an IEnumerable<string>, so you don't need to wait for it to finish reading all of the directories before you start processing.
Use ReadLines instead of ReadAllText. This loads only one line into memory at a time, which will be a big deal if you hit a large file.
If you are using a new enough version of .NET, use Parallel.ForEach. This will allow you to search multiple files at once.
You may not be able to open a file. You need to check for read permissions, or add to the manifest that your program requires administrative privileges (you should still check, though).
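The text-search recommendations above could be combined roughly as follows. This is only a sketch: the names TextSearch and FindFilesContaining are assumptions, it skips unreadable files rather than checking permissions up front, and it will miss a match that spans a line break. The directory enumeration itself can still throw if a subdirectory is inaccessible.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class TextSearch
{
    // Hypothetical helper: returns the files under folder that contain searchWord.
    public static List<string> FindFilesContaining(string folder, string searchWord)
    {
        var matches = new List<string>();
        foreach (string file in Directory.EnumerateFiles(folder, "*", SearchOption.AllDirectories))
        {
            try
            {
                // ReadLines is lazy, so Any() stops reading at the first matching line.
                if (File.ReadLines(file).Any(line => line.Contains(searchWord)))
                {
                    matches.Add(file);
                }
            }
            catch (UnauthorizedAccessException)
            {
                // No permission to read this file; skip it.
            }
            catch (IOException)
            {
                // File is locked or disappeared; skip it.
            }
        }
        return matches;
    }
}
```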
I was creating a binary search tool; here are some snippets of what I wrote to give you a hand:
    private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
    {
        Parallel.ForEach(Directory.EnumerateFiles(_folder, _filter, SearchOption.AllDirectories), Search);
    }

    // _array contains the binary pattern I am searching for.
    private void Search(string filePath)
    {
        if (Contains(filePath, _array))
        {
            // filePath points at a match.
        }
    }

    private static bool Contains(string path, byte[] search)
    {
        // I am doing ReadAllBytes because this is a binary search, not a text search.
        // There are no "lines" to separate on.
        var file = File.ReadAllBytes(path);
        // The upper bound of Parallel.For is exclusive, so add 1 to test the last possible position.
        var result = Parallel.For(0, file.Length - search.Length + 1, (i, loopState) =>
        {
            if (file[i] == search[0])
            {
                byte[] localCache = new byte[search.Length];
                Array.Copy(file, i, localCache, 0, search.Length);
                if (Enumerable.SequenceEqual(localCache, search))
                    loopState.Stop();
            }
        });
        return result.IsCompleted == false;
    }
This uses two nested parallel loops. The design is terribly inefficient and could be greatly improved by using the Boyer-Moore search algorithm, but I could not find a binary implementation, and I did not have the time to implement it myself when I originally wrote this.
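For reference, a byte-array search in that family is fairly short. The sketch below uses the Boyer-Moore-Horspool simplification (bad-character shifts only), not full Boyer-Moore, and is not taken from the original tool; the names ByteSearch and IndexOf are assumptions for illustration.

```csharp
using System;

static class ByteSearch
{
    // Returns the index of the first occurrence of pattern in data, or -1 if absent.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;
        if (pattern.Length > data.Length) return -1;

        int last = pattern.Length - 1;

        // Shift table: how far the window may move, keyed by the byte
        // currently aligned with the last pattern position.
        var shift = new int[256];
        for (int i = 0; i < shift.Length; i++) shift[i] = pattern.Length;
        for (int i = 0; i < last; i++) shift[pattern[i]] = last - i;

        int pos = 0;
        while (pos <= data.Length - pattern.Length)
        {
            // Compare from the end of the window backwards.
            int j = last;
            while (j >= 0 && data[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            pos += shift[data[pos + last]];
        }
        return -1;
    }
}
```

A Contains check then becomes `ByteSearch.IndexOf(file, search) >= 0`, with no inner parallel loop.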
The main problem here is that you are searching all the files in real time for every search. There is also the possibility of file access conflicts if two or more users are searching at the same time.
To dramatically improve performance, I would index the files ahead of time, and again as they are edited/saved. Store the index using something like Lucene.NET, then query the index (again using Lucene.NET) and return the file names to the user, so the user never queries the files directly.
If you follow the links in this SO post you may have a head start on implementing the indexing. I didn't follow the links, but it's worth a look.
Just a heads up: this will be an intense shift from your current approach and will require
a service to monitor/index the files
the UI project
I think your code will fail with an exception if you lack permission to open a file.
Compare it with the code here: http://bgrep.codeplex.com/releases/view/36186
That latter code supports
regular expression search and
filters for file extensions
-- things you should probably consider.
Instead of Contains, it is better to use the Boyer-Moore search algorithm.
Failure scenario: a file that you do not have read permission for.
How can I read the content of a text file inside a zip archive?
For example, I have an archive qwe.zip, and inside it there's a file asd.txt. How can I read the contents of that file?
Is it possible to do this without extracting the whole archive? It needs to be quick: when the user clicks an item in a list, a description of the archive should be shown (it is needed for a plugin system for another program). So extracting the whole archive isn't the best solution, because it might be a few MB, which will take at least a few seconds to extract, while only that single file needs to be read.
You could use a library such as SharpZipLib or DotNetZip to unzip the file and fetch the contents of individual files contained inside. This operation could be performed in-memory and you don't need to store the files into a temporary folder.
Unzip to a temp folder, take the file you need, and delete the temp data:
    public static void Decompress(string outputDirectory, string zipFile)
    {
        if (!File.Exists(zipFile))
            throw new FileNotFoundException("Zip file not found.", zipFile);

        // Dispose the package so the handle on the zip file is released.
        using (Package zipPackage = ZipPackage.Open(zipFile, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in zipPackage.GetParts())
            {
                string targetFile = outputDirectory + "\\" + part.Uri.ToString().TrimStart('/');
                using (Stream streamSource = part.GetStream(FileMode.Open, FileAccess.Read))
                using (Stream streamDestination = File.OpenWrite(targetFile))
                {
                    // Copy the part to disk in 10,000-byte chunks.
                    byte[] arrBuffer = new byte[10000];
                    int iRead = streamSource.Read(arrBuffer, 0, arrBuffer.Length);
                    while (iRead > 0)
                    {
                        streamDestination.Write(arrBuffer, 0, iRead);
                        iRead = streamSource.Read(arrBuffer, 0, arrBuffer.Length);
                    }
                }
            }
        }
    }
Although late in the game and the question is already answered, in hope that this still might be useful for others who find this thread, I would like to add another solution.
Just today I encountered a similar problem when I wanted to check the contents of a ZIP file with C#. Unlike NewProger, I cannot use a third-party library and need to stay within the out-of-the-box .NET classes.
You can use the System.IO.Packaging namespace and use the ZipPackage class. If it is not already included in the assembly, you need to add a reference to WindowsBase.dll.
It seems, however, that this class does not always work with every Zip file. Calling GetParts() may return an empty list although in the QuickWatch window you can find a property called _zipArchive that contains the correct contents.
If this is the case for you, you can use Reflection to get the contents of it.
On geissingert.com you can find a blog article ("Getting a list of files from a ZipPackage") that gives a coding example for this.
SharpZipLib or DotNetZip may still need to read the whole .zip file in order to unzip one file. There is, however, a way to extract a specific file from a .zip without reading the entire archive, by reading only a small segment of it.
I needed to have insights into Excel files, I did it like so:
    using (var zip = ZipFile.Open("ExcelWorkbookWithMacros.xlsm", ZipArchiveMode.Update))
    {
        var entry = zip.GetEntry("xl/_rels/workbook.xml.rels");
        if (entry != null)
        {
            var tempFile = Path.GetTempFileName();
            entry.ExtractToFile(tempFile, true);
            var content = File.ReadAllText(tempFile);
            [...]
        }
    }
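If the temporary file is not needed, the entry can also be read straight from its stream. The following is a sketch under that assumption; the names ZipText and ReadEntryText are invented here, and the archive is opened read-only rather than in Update mode.

```csharp
using System.IO;
using System.IO.Compression;

static class ZipText
{
    // Hypothetical helper: returns the text of one entry, or null if the
    // entry does not exist. Nothing is extracted to disk.
    public static string ReadEntryText(string zipPath, string entryName)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                return null;
            }

            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

For the archive from the question this would be called as `ZipText.ReadEntryText("qwe.zip", "asd.txt")`.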
I keep getting the error "Stream was not writable" whenever I try to execute the following code. I understand that there's still a reference to the stream in memory, but I don't know how to solve the problem. The two blocks of code are called in sequential order. I think the second one might be a function call or two deeper in the call stack, but I don't think this should matter, since I have "using" statements in the first block that should clean up the streams automatically. I'm sure this is a common task in C#, I just have no idea how to do it...
    string s = "";
    using (Stream manifestResourceStream =
        Assembly.GetExecutingAssembly().GetManifestResourceStream("Datafile.txt"))
    {
        using (StreamReader sr = new StreamReader(manifestResourceStream))
        {
            s = sr.ReadToEnd();
        }
    }
    ...
    string s2 = "some text";
    using (Stream manifestResourceStream =
        Assembly.GetExecutingAssembly().GetManifestResourceStream("Datafile.txt"))
    {
        using (StreamWriter sw = new StreamWriter(manifestResourceStream))
        {
            sw.Write(s2);
        }
    }
Any help will be very much appreciated. Thanks!
Andrew
Embedded resources are compiled into your assembly, you can't edit them.
As stated above, embedded resources are read only. My recommendation, should this be applicable, (say for example your embedded resource was a database file, XML, CSV etc.) would be to extract a blank resource to the same location as the program, and read/write to the extracted resource.
Example Pseudo Code:
    if (!Exists(new PhysicalResource())) // Check to see if a physical resource exists.
    {
        PhysicalResource.Create(); // Extract embedded resource to disk.
    }
    PhysicalResource pr = new PhysicalResource(); // Create physical resource instance.
    pr.Read();  // Read from physical resource.
    pr.Write(); // Write to physical resource.
Hope this helps.
Additional:
Your embedded resource may be entirely blank, contain data structure and / or default values.
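In real C#, the extraction step of that pseudocode might look something like the sketch below. The names ResourceExtractor and EnsureExtracted are hypothetical, and the resource name and target path are whatever your project uses. After the first run, all reads and writes go to the file on disk, never to the embedded resource.

```csharp
using System;
using System.IO;
using System.Reflection;

static class ResourceExtractor
{
    // Hypothetical helper: copies the embedded resource to targetPath the
    // first time it is called, and returns the path of the writable copy.
    public static string EnsureExtracted(Assembly assembly, string resourceName, string targetPath)
    {
        if (!File.Exists(targetPath))
        {
            using (Stream source = assembly.GetManifestResourceStream(resourceName))
            {
                if (source == null)
                {
                    throw new FileNotFoundException("Embedded resource not found.", resourceName);
                }

                using (FileStream target = File.Create(targetPath))
                {
                    source.CopyTo(target);
                }
            }
        }
        return targetPath;
    }
}
```

For example, `File.AppendAllText(ResourceExtractor.EnsureExtracted(Assembly.GetExecutingAssembly(), "Datafile.txt", "Datafile.txt"), "some text")` writes to the extracted copy.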
A bit late, but for posterity =)
About the embedded .txt:
Yes, you can't edit an embedded resource at runtime, precisely because it is embedded. You could play around with a disassembler, but only on external assemblies that you are going to load into the current context.
There is a hack if you want to write some up-to-date information into a resource before the program starts, without keeping the data in a separate file.
I used to work a bit with WinCE and the .NET Compact Framework, where you couldn't store strings at runtime with ResourceManager. I needed some dynamic information in order to catch a DllNotFoundException before it was actually thrown at startup.
So I made an embedded txt file, which I filled in during the pre-build event, like this:
    cd $(ProjectDir)
    dir ..\bin\Debug /a-d /b> assemblylist.txt
Here I list the files in the Debug folder. And the reading:
    using (var f = new StreamReader(Assembly.GetExecutingAssembly().GetManifestResourceStream("Market_invent.assemblylist.txt")))
    {
        str = f.ReadToEnd();
    }
So you can do all of this in the pre-build event by running some executables.
Enjoy! It's very useful for storing important information, and it helps avoid redundant actions.