XML file from ZIP Archive is incomplete in C#

I work with large XML files (~1,000,000 lines, 34 MB) that are stored in a ZIP archive. The XML file is used at runtime to store and load app settings and measurements. The file gets loaded with this function:
public static void LoadFile(string path, string name)
{
    using (var file = File.OpenRead(path))
    {
        using (var zip = new ZipArchive(file, ZipArchiveMode.Read))
        {
            var foundConfigurationFile = zip.Entries.First(x => x.FullName == ConfigurationFileName);
            using (var stream = new StreamReader(foundConfigurationFile.Open()))
            {
                var xmlSerializer = new XmlSerializer(typeof(ProjectConfiguration));
                var newObject = xmlSerializer.Deserialize(stream);
                CurrentConfiguration = null;
                CurrentConfiguration = newObject as ProjectConfiguration;
                AddRecentFiles(name, path);
            }
        }
    }
}
This works most of the time.
However, some files don't get read to the end, and I get an error that the file contains invalid XML. I used
foundConfigurationFile.ExtractToFile();
and found that the extracted file stops at around line 800,000. But this only happens inside this code. When I open the file in an editor, everything is there.
It looks like the ZIP doesn't get loaded correctly, or rather, not completely.
Am I running into some limitation? Or is there an error in my code that I can't find?
The file is saved via:
using (var file = File.OpenWrite(Path.Combine(dirInfo.ToString(), fileName.ToString()) + ".pwe"))
{
    var zip = new ZipArchive(file, ZipArchiveMode.Create);
    var configurationEntry = zip.CreateEntry(ConfigurationFileName, CompressionLevel.Optimal);
    var stream = configurationEntry.Open();
    var xmlSerializer = new XmlSerializer(typeof(ProjectConfiguration));
    xmlSerializer.Serialize(stream, CurrentConfiguration);
    stream.Close();
    zip.Dispose();
}

Update:
The problem was the File.OpenWrite() method.
If you overwrite an existing file with this method and the new content is shorter than the old content, the result is a mix of the two: the new bytes followed by the leftover tail of the old file.
As the docs state, File.OpenWrite() does not truncate the existing file first.
To do it correctly, it was necessary to use File.Create() instead, because that method truncates the existing file first.
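The fix described in the update can be sketched roughly as follows. This is a minimal, self-contained illustration and not the original application's code: DemoConfiguration, ConfigurationArchive, and the entry name "configuration.xml" are hypothetical stand-ins for the question's ProjectConfiguration and ConfigurationFileName.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's ProjectConfiguration type.
public class DemoConfiguration
{
    public string[] Measurements { get; set; } = Array.Empty<string>();
}

public static class ConfigurationArchive
{
    // Assumed entry name; the question uses a ConfigurationFileName constant.
    private const string EntryName = "configuration.xml";

    public static void Save(string path, DemoConfiguration configuration)
    {
        // File.Create truncates an existing file, so no bytes of an older,
        // longer archive can remain behind the newly written one.
        using (var file = File.Create(path))
        using (var zip = new ZipArchive(file, ZipArchiveMode.Create))
        {
            var entry = zip.CreateEntry(EntryName, CompressionLevel.Optimal);
            using (var stream = entry.Open())
            {
                var serializer = new XmlSerializer(typeof(DemoConfiguration));
                serializer.Serialize(stream, configuration);
            }
        }
    }

    public static DemoConfiguration Load(string path)
    {
        using (var file = File.OpenRead(path))
        using (var zip = new ZipArchive(file, ZipArchiveMode.Read))
        {
            var entry = zip.Entries.First(x => x.FullName == EntryName);
            using (var stream = entry.Open())
            {
                var serializer = new XmlSerializer(typeof(DemoConfiguration));
                return (DemoConfiguration)serializer.Deserialize(stream);
            }
        }
    }
}
```

Saving a large configuration and then a small one to the same path should leave a file that contains only the small archive, so a later Load reads it back completely. The using blocks also make sure the archive and file are closed even if serialization throws.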

Related

How to create and send ZIP archive from C# to Python and save to filesystem?

In C# I create a ZIP archive containing some files:
using (MemoryStream ms = new MemoryStream())
{
    using (ZipStorer zip = ZipStorer.Create(ms, "comment"))
    {
        zip.EncodeUTF8 = true;
        foreach (FileInfo logFile in logFiles)
        {
            zip.AddFile(ZipStorer.Compression.Deflate, logFile.FullName, logFile.Name, "");
        }
    }
    this.logger.log("memory stream to array");
    zipBytesArr = ms.ToArray();
}
This works: when I persist the ZIP archive on the server with C#, I can open it.
File.WriteAllBytes("C:\\test.zip", zipBytesArr);
Now I send the byte array zipBytesArr in a POST request: I create an object with a field of type byte[], then serialise it to JSON and send it.
MyObject myObject = new MyObject()
{
    // .. some fields
    ZipFileBytesArr = zipBytesArr
};
var json = JsonConvert.SerializeObject(myObject);
Now I want to receive it in Python 3 (Flask). I get a long string, and I want to create a ZIP archive from it. My question is: how can I achieve this?
I tried something like this:
bytes_as_str = "UEsDBBQAAAgIACyrWk4qVSmD0gYAAA4WAAAYAAAAMjAxOTAyMjZfb3BlbnZ...."
bytes_as_bytes = bytes_as_str.encode(encoding="utf-8")
file = zipfile.ZipFile(io.BytesIO(bytes_as_bytes))
It does not work; I get an error:
zipfile.BadZipFile: File is not a zip file
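A note on the likely cause: Json.NET writes byte[] values as Base64 strings, and the string in the question starts with UEsDB, which is what the ZIP signature bytes PK\x03\x04 look like in Base64. So the receiver would presumably need to Base64-decode the string (in Python, base64.b64decode from the standard library) instead of UTF-8-encoding it. The sketch below shows the round trip on the C# side only. It uses System.IO.Compression rather than ZipStorer, and Base64ZipDemo and its methods are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class Base64ZipDemo
{
    // Builds a small ZIP in memory; a stand-in for zipBytesArr in the question.
    public static byte[] CreateZip(string entryName, string content)
    {
        using (var ms = new MemoryStream())
        {
            using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            {
                var entry = zip.CreateEntry(entryName);
                using (var writer = new StreamWriter(entry.Open()))
                {
                    writer.Write(content);
                }
            }
            return ms.ToArray();
        }
    }

    // Decodes the Base64 text back to raw bytes and reads the first entry.
    public static string ReadFirstEntryFromBase64(string base64)
    {
        byte[] zipBytes = Convert.FromBase64String(base64);
        using (var ms = new MemoryStream(zipBytes))
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Read))
        using (var reader = new StreamReader(zip.Entries[0].Open()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Encoding the bytes with Convert.ToBase64String gives the same kind of string that ends up in the JSON field; only after decoding it back to bytes can the data be opened as an archive.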

How to decompress .zip files in c# without extracting to new location

How can I decompress .zip files without extracting them to a new location in the .NET Framework? Specifically, I'm trying to read a filename.csv.zip into a DataTable.
I'm aware of ExtractToDirectory (available on ZipArchive), but I just want to extract the content into an object in C# without creating a new file.
I'm hoping to do this without third-party libraries, but I'll take what I can get.

There may be some bugs because I never tested this, but here you go:
List<string> entryContents = new List<string>();
using (ZipArchive archive = ZipFile.OpenRead(zipPath))
    foreach (ZipArchiveEntry entry in archive.Entries)
        using (StreamReader r = new StreamReader(entry.Open()))
            entryContents.Add(r.ReadToEnd()); // ReadToEnd takes no arguments and returns a string
Basically, you use ZipFile.OpenRead to get a ZipArchive and iterate through its entries. You can then use a StreamReader to read each entry. From there you could create a file from the stream, or read the entry's file name if you want to. My code doesn't do this, a bit of laziness on my part.

Keep in mind that a ZIP archive might contain multiple files. To handle this, you need to iterate through all entries of the archive, retrieve each one, and treat them separately.
The sample below converts a sequence of bytes into a list of strings, where each string is the content of one file in the zipped folder:
public static IEnumerable<string> DecompressToEntriesTextContext(byte[] input)
{
    var zipEntriesContext = new List<string>();
    using (var compressedStream = new MemoryStream(input))
    using (var zip = new ZipArchive(compressedStream, ZipArchiveMode.Read))
    {
        foreach (var entry in zip.Entries)
        {
            using (var entryStream = entry.Open())
            using (var memoryEntryStream = new MemoryStream())
            using (var reader = new StreamReader(memoryEntryStream))
            {
                entryStream.CopyTo(memoryEntryStream);
                memoryEntryStream.Position = 0;
                zipEntriesContext.Add(reader.ReadToEnd());
            }
        }
    }
    return zipEntriesContext;
}
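Since the question asks for a DataTable, here is a rough sketch of reading a CSV entry straight from the archive stream without writing anything to disk. ZippedCsv and ReadFirstCsv are hypothetical names, and the parsing is deliberately naive: it assumes a header row, comma separators, and no quoted fields.

```csharp
using System;
using System.Data;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class ZippedCsv
{
    // Reads the first *.csv entry of the archive into a DataTable.
    // Throws InvalidOperationException if the archive has no such entry.
    public static DataTable ReadFirstCsv(Stream zipStream)
    {
        var table = new DataTable();
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            var entry = zip.Entries.First(
                e => e.FullName.EndsWith(".csv", StringComparison.OrdinalIgnoreCase));
            using (var reader = new StreamReader(entry.Open()))
            {
                // The first line is assumed to hold the column names.
                var header = reader.ReadLine();
                if (header == null)
                {
                    return table;
                }
                foreach (var columnName in header.Split(','))
                {
                    table.Columns.Add(columnName);
                }

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (line.Length == 0)
                    {
                        continue; // skip blank lines
                    }
                    // Naive split: a row with more fields than columns makes Rows.Add throw.
                    object[] fields = line.Split(',');
                    table.Rows.Add(fields);
                }
            }
        }
        return table;
    }
}
```

It could be called with File.OpenRead("filename.csv.zip") or with a MemoryStream. For real-world CSV with quoting or embedded commas, a proper CSV parser would be the safer choice.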

How to open PowerPoint file in resource file C#

I have a number of small PowerPoint files in my resources folder and I want to open them. I'm having issues doing this because my Resources.sendToPPTTemp is of type byte[], and to open the file I need it as a string. Is there a way I can open a file from resources as a string?
var file = Resources.sendToPPTTemp;
ppnt.Application ppntApplication = new ppnt.Application();
var _assembly = Assembly.GetExecutingAssembly();
var myppnt = ppntApplication.Presentations.Open(file.ToString());
ppntApplication.Visible = MsoTriState.msoTrue;

You need to pass the path of your file to the Open method, not its binary content. Either you already have the path and pass it to the method, or you have to create a file from your byte[].
I'd rather create a folder containing all your PPT files and store the path to that folder in your resource file. Then you can use the first approach:
var di = new DirectoryInfo(Resources.PPTFolderPath);
foreach (var fi in di.GetFiles())
{
    var myppnt = ppntApplication.Presentations.Open(fi.FullName);
    ppntApplication.Visible = MsoTriState.msoTrue;
    [..]
}
But if you really want to store your PPT in the resource file, you can do it like this, for example with a temporary file:
var tmpPath = Path.GetTempFileName();
try
{
    File.WriteAllBytes(tmpPath, Resources.sendToPPTTemp);
    var myppnt = ppntApplication.Presentations.Open(tmpPath);
    ppntApplication.Visible = MsoTriState.msoTrue;
    [..]
}
finally
{
    // You have to delete your temporary file at the end!
    // This is probably not the best place to do it, because I guess the program does not block on Open.
    // Better to store the file path in a list and delete it later.
    var fi = new FileInfo(tmpPath);
    fi.Delete();
}
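The idea from the comments above (keep the temporary paths in a list and delete them later) might look roughly like this. TempFileTracker is a hypothetical helper and not part of any PowerPoint or .NET API; the sketch only covers the file handling and leaves out the Presentations.Open call.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: writes byte[] content to temp files and removes them on Dispose.
public sealed class TempFileTracker : IDisposable
{
    private readonly List<string> _paths = new List<string>();

    // Writes the bytes to a uniquely named temp file with the given extension
    // (for example ".pptx") and remembers the path for later cleanup.
    public string Write(byte[] content, string extension)
    {
        var path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + extension);
        File.WriteAllBytes(path, content);
        _paths.Add(path);
        return path;
    }

    public void Dispose()
    {
        foreach (var path in _paths)
        {
            try
            {
                File.Delete(path);
            }
            catch (IOException)
            {
                // The file is still open in another program; leave it in the temp folder.
            }
        }
        _paths.Clear();
    }
}
```

The tracker would be created when the application starts showing presentations and disposed once they are closed, with the path returned by Write passed to Presentations.Open.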

Xml document gets written twice rather than overwriting through an in-memory stream

I'm trying to open an archived XML file (inside a zip file, without extracting it to a physical directory) in an in-memory stream, make changes to it, and save it. But the archived XML file doesn't get overwritten; instead it ends up with two copies of the XML data. One is the original and the other is the modified copy, both in the same archive file.
Here is my code. Please help me overwrite the existing XML data with the changes rather than ending up with two copies in the same archived XML file.
static void Main(string[] args)
{
    string rootFolder = @"C:\Temp\MvcApplication5\MvcApplication5\Package1";
    string archiveName = "MvcApplication5.zip";
    string folderFullPath = Path.GetFullPath(rootFolder);
    string archivePath = Path.Combine(folderFullPath, archiveName);
    string fileName = "archive.xml";
    using (ZipArchive zip = ZipFile.Open(archivePath, ZipArchiveMode.Update))
    {
        var archiveFile = zip.GetEntry(fileName);
        if (archiveFile == null)
        {
            throw new ArgumentException(fileName, "not found in Zip");
        }
        if (archiveFile != null)
        {
            using (Stream stream = archiveFile.Open())
            {
                XDocument doc = XDocument.Load(stream);
                IEnumerable<XElement> xElemAgent = doc.Descendants("application");
                foreach (var node in xElemAgent)
                {
                    if (node.Attribute("applicationPool").Value != null)
                    {
                        node.Attribute("applicationPool").Value = "MyPool";
                    }
                }
                doc.Save(stream);
            }
            Console.WriteLine("Document saved");
        }
    }
}

You first read the XML data from the stream and then write to the same stream, which by then is positioned at the end of the file. To illustrate, let's say the old file contains ABCD and we want to replace it with 123.
The current approach results in ABCD123, since the stream is positioned just past the last character of ABCD.
If you reset the stream to position 0 (stream.Seek(0, SeekOrigin.Begin)) before writing the changed document, the file would contain 123D, because writing does not reduce the file length.
The solution is to delete the old ZipArchiveEntry and create a new one.
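The ABCD example can be reproduced with a plain MemoryStream. This is only an illustration of the stream behaviour, not of ZipArchive itself; StreamOverwriteDemo and its Mode enum are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class StreamOverwriteDemo
{
    public enum Mode
    {
        Append,   // write at the current position (the end)
        Rewind,   // seek to 0, then write
        Truncate  // set the length to 0, then write
    }

    // Fills a stream with oldText, then writes newText according to the mode
    // and returns what the stream finally contains.
    public static string Overwrite(string oldText, string newText, Mode mode)
    {
        using (var stream = new MemoryStream())
        {
            var oldBytes = Encoding.ASCII.GetBytes(oldText);
            stream.Write(oldBytes, 0, oldBytes.Length);
            // The position is now at the end, just as after reading the whole document.

            if (mode == Mode.Rewind)
            {
                stream.Seek(0, SeekOrigin.Begin);
            }
            else if (mode == Mode.Truncate)
            {
                // MemoryStream also moves the position back to 0 here.
                stream.SetLength(0);
            }

            var newBytes = Encoding.ASCII.GetBytes(newText);
            stream.Write(newBytes, 0, newBytes.Length);
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }
}
```

With "ABCD" and "123", Append yields ABCD123, Rewind yields 123D, and Truncate yields 123.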

I came across this same issue just now, and I fixed it by adding this first line:
stream.SetLength(0);
xmlDoc.Save(stream);
edit: I see you came across the same solution as you mentioned in the comments of the previous answer. You can add an answer to your own question. It would have helped someone like me :]
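Put into the context of the question, the SetLength(0) approach could look roughly like the sketch below. It assumes the archive is opened in ZipArchiveMode.Update, where the entry stream is seekable and resizable. ZipXmlUpdater and SetApplicationPool are hypothetical names, and the method takes a Stream instead of a file path so it can be tried out in memory.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class ZipXmlUpdater
{
    // Loads the XML entry, changes the applicationPool attributes and
    // writes the document back over the old content.
    public static void SetApplicationPool(Stream zipStream, string entryName, string poolName)
    {
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Update, leaveOpen: true))
        {
            var entry = zip.GetEntry(entryName);
            if (entry == null)
            {
                throw new FileNotFoundException("Entry not found in zip", entryName);
            }

            using (var stream = entry.Open())
            {
                var doc = XDocument.Load(stream);
                foreach (var node in doc.Descendants("application"))
                {
                    var attribute = node.Attribute("applicationPool");
                    if (attribute != null)
                    {
                        attribute.Value = poolName;
                    }
                }

                // Discard the old content so the saved document replaces it
                // instead of being appended after it.
                stream.SetLength(0);
                doc.Save(stream);
            }
        }
    }
}
```

For an archive on disk, a FileStream opened with FileMode.Open and FileAccess.ReadWrite could be passed in, or the method could be adapted to use ZipFile.Open as in the question.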

Using updateEntry() method with dotnetzip won't overwrite files correctly

I've been having a bit of a problem lately. I've been trying to extract one zip file into a memory stream and then, from that stream, use the UpdateEntry() method to add it to the destination zip file.
The problem is that when the file in the stream is put into the destination zip, it only works if the file is not already in the zip. If there is a file with the same name, it is not overwritten correctly. The DotNetZip docs say that this method overwrites files with the same name that are present in the zip, but that does not seem to work. The write completes without error, but when I check the zip, the files that are supposed to be overwritten have a compressed size of 0 bytes, meaning something went wrong.
I'm attaching my code below to show what I'm doing:
ZipFile zipnew = new ZipFile(forgeFile);
ZipFile zipold = new ZipFile(zFile);
using (zipnew) {
    foreach (ZipEntry zenew in zipnew) {
        percent = (current / zipnew.Count) * 100;
        string flna = zenew.FileName;
        var fstream = new MemoryStream();
        zenew.Extract(fstream);
        fstream.Seek(0, SeekOrigin.Begin);
        using (zipold) {
            var zn = zipold.UpdateEntry(flna, fstream);
            zipold.Save();
            fstream.Dispose();
        }
        current++;
    }
    zipnew.Dispose();
}

Although it might be a bit slow, I found a solution by manually deleting and adding in the file. I'll leave the code here in case anyone else comes across this problem.
ZipFile zipnew = new ZipFile(forgeFile);
ZipFile zipold = new ZipFile(zFile);
using (zipnew) {
    // Loop through each entry in the zip file
    foreach (ZipEntry zenew in zipnew) {
        string flna = zenew.FileName;
        // Create a new memory stream for extracted files
        var ms = new MemoryStream();
        // Extract entry into the memory stream
        zenew.Extract(ms);
        ms.Seek(0, SeekOrigin.Begin); // Rewind the memory stream
        using (zipold) {
            // Remove existing entry first
            try {
                zipold.RemoveEntry(flna);
                zipold.Save();
            }
            catch (System.Exception ex) {} // Ignore if there is nothing found
            // Add in the new entry
            var zn = zipold.AddEntry(flna, ms);
            zipold.Save(); // Save the zip file with the newly added file
            ms.Dispose(); // Dispose of the stream so resources are released
        }
    }
    zipnew.Dispose(); // Close the zip file
}
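For comparison, the same remove-then-add idea can be sketched with the built-in System.IO.Compression classes instead of DotNetZip. This swaps in a different library and is not a drop-in replacement for the code above; ZipMerge and CopyEntriesOverwriting are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;

public static class ZipMerge
{
    // Copies every entry of sourceZip into targetZip, replacing entries
    // that already exist there under the same name.
    public static void CopyEntriesOverwriting(Stream sourceZip, Stream targetZip)
    {
        using (var source = new ZipArchive(sourceZip, ZipArchiveMode.Read, leaveOpen: true))
        using (var target = new ZipArchive(targetZip, ZipArchiveMode.Update, leaveOpen: true))
        {
            foreach (var sourceEntry in source.Entries)
            {
                // Remove the old entry first, if there is one.
                var existing = target.GetEntry(sourceEntry.FullName);
                if (existing != null)
                {
                    existing.Delete();
                }

                // Add the entry again with the new content.
                var newEntry = target.CreateEntry(sourceEntry.FullName);
                using (var input = sourceEntry.Open())
                using (var output = newEntry.Open())
                {
                    input.CopyTo(output);
                }
            }
        }
    }
}
```

The target archive is written back once, when the Update-mode ZipArchive is disposed, so there is no need to save after every entry.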
