ZipArchive, update entry: read - truncate - write - C#

I'm using System.IO.Compression's ZipArchive to modify a file within a ZIP. I first need to read the whole content (JSON), transform the JSON, then truncate the file and write the new JSON to it. At the moment I have the following code:
using (var zip = new ZipArchive(new FileStream(zipFilePath, FileMode.Open, FileAccess.ReadWrite), ZipArchiveMode.Update))
{
    using var stream = zip.GetEntry(entryName).Open();
    using var reader = new StreamReader(stream);
    using var jsonTextReader = new JsonTextReader(reader);
    var json = JObject.Load(jsonTextReader);
    PerformModifications(json);
    stream.Seek(0, SeekOrigin.Begin);
    using var writer = new StreamWriter(stream);
    using var jsonTextWriter = new JsonTextWriter(writer);
    json.WriteTo(jsonTextWriter);
}
However, the problem is: if the resulting JSON is shorter than the original version, the remainder of the original is not truncated. Therefore I need to properly truncate the file before writing to it.
How can I truncate the entry before writing to it?

You can either delete the entry before writing it back or, which I prefer, use stream.SetLength(0) to truncate the stream before writing. (See also https://stackoverflow.com/a/46810781/62838.)
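For illustration, a minimal sketch of the SetLength(0) approach applied to the code above (assuming a .NET Core-era StreamReader overload with a leaveOpen parameter, so the entry stream survives the reader's disposal):

using (var zip = new ZipArchive(new FileStream(zipFilePath, FileMode.Open, FileAccess.ReadWrite), ZipArchiveMode.Update))
{
    using var stream = zip.GetEntry(entryName).Open();

    JObject json;
    using (var reader = new StreamReader(stream, leaveOpen: true))
    using (var jsonTextReader = new JsonTextReader(reader))
    {
        json = JObject.Load(jsonTextReader);
    }
    PerformModifications(json);

    stream.SetLength(0); // truncate the entry so no stale bytes remain
    using var writer = new StreamWriter(stream);
    using var jsonTextWriter = new JsonTextWriter(writer);
    json.WriteTo(jsonTextWriter);
}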

Related

Is there a way to not generate a file via CSV Helper?

Is there any opportunity to not generate a CSV file in the system?
I don't want to store it in my application; can we generate it somehow on the fly?
As the return value, I want the CSV converted to Base64.
var path = Path.Combine(Directory.GetCurrentDirectory(), "test.csv");
await using var writer = new StreamWriter(path);
await using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
await csv.WriteRecordsAsync(list);
}
var bytes = await File.ReadAllBytesAsync(path);
return Convert.ToBase64String(bytes);
A StreamWriter can write to any stream, including a MemoryStream:
using var ms = new MemoryStream();
using var writer = new StreamWriter(ms);
...
writer.Flush(); // push buffered text into the MemoryStream before reading it
return Convert.ToBase64String(ms.ToArray()); // ToArray copies only the written bytes; GetBuffer may include unused capacity
CSV files are text files though, so converting them to Base64 isn't very useful. StreamWriter uses UTF8 encoding by default, so it already handles any language.
It would be better to keep the text as text, especially if it's going to be stored in a text field in a database. This can be done by reading the text back with a StreamReader:
ms.Position = 0; // rewind before reading back
using var reader = new StreamReader(ms);
var csvText = reader.ReadToEnd();
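Putting the pieces together, a minimal end-to-end sketch (assuming a recent CsvHelper with WriteRecordsAsync and .NET Core overloads that accept leaveOpen):

using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Threading.Tasks;
using CsvHelper;

public static async Task<string> ToCsvTextAsync<T>(IEnumerable<T> records)
{
    using var ms = new MemoryStream();
    await using (var writer = new StreamWriter(ms, leaveOpen: true))
    await using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
    {
        await csv.WriteRecordsAsync(records);
    } // disposing the writers flushes everything into the MemoryStream

    ms.Position = 0;
    using var reader = new StreamReader(ms);
    return await reader.ReadToEndAsync(); // keep it as text; Base64-encode only if you really must
}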

How to write a CSV file after reading it with CsvHelper?

I need to write a HttpPostedFileBase csv (and save it) after mapping it into a list using the CsvHelper nuget package. However, after mapping it with CsvHelper, the ContentLength is 0, and I end up saving an empty .csv file.
CsvHelper itself states that the Read method should not be used when using the GetRecords<T>() method.
// Summary:
// Gets all the records in the CSV file and converts each to System.Type T. The
// Read method should not be used when using this.
//
// Type parameters:
// T:
// The System.Type of the record.
//
// Returns:
// An System.Collections.Generic.IEnumerable`1 of records.
public virtual IEnumerable<T> GetRecords<T>();
I tried placing it into a copy variable:
HttpPostedFileBase csvCopy = csvFile;
But this didn't work. I tried some other solutions I found on Stack Overflow, which didn't work either. I "solved" this problem by sending the same file twice to the controller as a parameter. Then I use the first one with CsvHelper, and I read and save the other one.
public async Task<ActionResult> ImportCSV(HttpPostedFileBase csvFile, HttpPostedFileBase csvFileCopy)
However, I think this is a bad solution. I would like to use a single file, map it, reread it and save it.
Mapping it into a list:
using (var reader = new StreamReader(csvFile.InputStream))
{
    using (var csvReader = new CsvReader(reader))
    {
        csvReader.Configuration.RegisterClassMap(new CSVModelMap(mapDictionary));
        csvReader.Configuration.BadDataFound = null;
        csvReader.Configuration.HeaderValidated = null;
        csvReader.Configuration.MissingFieldFound = null;
        importData = csvReader.GetRecords<CSVModel>().ToList();
    }
}
Saving it:
var fileName = serverPath + "\\" + hashedFileName;
CheckIfDirectoryExists(serverPath);
var reader = new StreamReader(csvFile.InputStream);
var csvContent = await reader.ReadToEndAsync();
File.WriteAllText(fileName, csvContent);
I'm not too sure how GetRecords works, but it may leave the stream's cursor pointing at the end of the stream.
That would mean your saving sequence starts reading the InputStream at the end, which results in no data to read.
So you could try:
csvFile.InputStream.Seek(0, SeekOrigin.Begin)
EDIT:
Your attempt to copy the stream only copies the reference to the object, not the object itself.
To copy the stream's data you would need the CopyTo(stream) method, which itself leaves the cursor at the end of the stream, so a seek is definitely needed afterwards.
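For example, a small sketch of copying the upload into a seekable buffer first (the MemoryStream and the rewind are the point here):

var buffer = new MemoryStream();
csvFile.InputStream.CopyTo(buffer); // cursor now sits at the end of both streams
buffer.Position = 0;                // rewind the copy before reading it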
The issue was CsvHelper:
using (var csvReader = new CsvReader(reader))
At the end of the using statement, CsvReader closes the reader. Then when I try
csvFile.InputStream.Seek(0, SeekOrigin.Begin);
or
reader.BaseStream.Position = 0;
it throws a NullReferenceException. I solved this by simply using a different CsvReader constructor overload:
using (var csvReader = new CsvReader(reader, true))
true being leaveOpen:
// Summary:
// Creates a new CSV reader using the given System.IO.TextReader.
//
// Parameters:
// reader:
// The reader.
//
// leaveOpen:
// true to leave the reader open after the CsvReader object is disposed, otherwise
// false.
public CsvReader(TextReader reader, bool leaveOpen);
Then I set the position back to 0 using reader.BaseStream.Position = 0;, and after saving the file, I dispose the reader.
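For reference, a minimal sketch of the fixed flow (using the question's CSVModel, mapDictionary and csvFile; DiscardBufferedData is an extra safety step so the StreamReader drops anything it buffered during the first pass):

List<CSVModel> importData;
using (var reader = new StreamReader(csvFile.InputStream))
{
    using (var csvReader = new CsvReader(reader, true)) // true = leaveOpen
    {
        csvReader.Configuration.RegisterClassMap(new CSVModelMap(mapDictionary));
        importData = csvReader.GetRecords<CSVModel>().ToList();
    }

    reader.BaseStream.Position = 0;
    reader.DiscardBufferedData(); // drop the reader's stale internal buffer
    var csvContent = reader.ReadToEnd();
    File.WriteAllText(fileName, csvContent);
}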

C# Transform Data in ETL Process

I am learning the ETL process in C# and have already extracted and read the sample CSV data, but I am unsure what to do to transform it properly.
I have been using this website as a reference for how to transform data, but I am unsure how to apply it to my sample data (below).
name,gender,age,numKids,hasPet,petType
Carl,M,43,2,true,gecko
Jake,M,22,1,true,snake
Cindy,F,53,3,false,null
Matt,M,23,0,true,dog
Ally,F,28,1,false,null
Megan,F,42,2,false,null
Carly,F,34,4,true,cat
Neal,M,27,2,false,null
Tina,F,21,2,true,pig
Paul,M,1,3,true,chicken
Below is how I extracted the data from the CSV file using CsvHelper:
using (FileStream fs = File.Open(@"C:\Users\Grant\Documents\SampleData4.csv", FileMode.Open, FileAccess.Read))
using (StreamReader sr = new StreamReader(fs))
{
    CsvConfiguration csvConfig = new CsvConfiguration()
        { BufferSize = bufferSize, AllowComments = true };
    using (var csv = new CsvReader(sr, csvConfig))
    {
        while (csv.Read())
        {
            var name = csv.GetField<string>(0);
            var gender = csv.GetField<string>(1);
            var age = csv.GetField<int>(2);
            var numKids = csv.GetField<int>(3);
            var hasPet = csv.GetField<bool>(4);
            var petType = csv.GetField<string>(5);
        }
    }
}
If you need me to provide additional details, just ask below.
Although a little late, I would still like to add an answer:
To create your own ETL process and data flow with C#, I would recommend the NuGet package ETLBox (https://etlbox.net). It enables you to write an ETL data flow where the CSV reader implementation is already wrapped in a CSVSource object. E.g., you would do the following to load data from a CSV into a database:
Define a CSV source:
CSVSource sourceOrderData = new CSVSource("demodata.csv");
Optionally define a row transformation:
RowTransformation<string[], Order> rowTrans = new RowTransformation<string[], Order>(
row => new Order(row)
);
Define the destination
DBDestination<Order> dest = new DBDestination<Order>("dbo.OrderTable");
Link your ETL data pipeline together
sourceOrderData.LinkTo(rowTrans);
rowTrans.LinkTo(dest);
Finally, start the data flow (async) and wait for all data to be loaded:
sourceOrderData.Execute();
dest.Wait();
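If you would rather stay with plain CsvHelper, the transform step can be as simple as mapping each row to a typed record and adjusting its fields in C#. A minimal sketch based on the sample data above (the Person record and the concrete transformations are invented for illustration; csv is the CsvReader from the question):

// Hypothetical target type for one CSV row (C# 9 record syntax).
public record Person(string Name, string Gender, int Age, int NumKids, bool HasPet, string PetType);

var people = new List<Person>();
while (csv.Read())
{
    var person = new Person(
        csv.GetField<string>(0),
        csv.GetField<string>(1),
        csv.GetField<int>(2),
        csv.GetField<int>(3),
        csv.GetField<bool>(4),
        csv.GetField<string>(5));

    // Example transformations: treat the literal "null" as no pet,
    // and expand the single-letter gender codes.
    people.Add(person with
    {
        PetType = person.PetType == "null" ? null : person.PetType,
        Gender = person.Gender == "M" ? "Male" : "Female"
    });
}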

Isolated Storage adding characters at the end of a Stream

I'm having problems converting a long into a string and back.
I'm trying to save the DateTime.Now.Ticks property in IsolatedStorage and then retrieve it afterwards. This is what I did to save it:
IsolatedStorageFile appStorage = IsolatedStorageFile.GetUserStoreForApplication();
using (var file = appStorage.CreateFile("appState"))
{
using (var sw = new StreamWriter(file))
{
sw.Write(DateTime.Now.Ticks);
}
}
When I retrieve the file, I do it like this:
if (appStorage.FileExists("appState"))
{
using (var file = appStorage.OpenFile("appState", FileMode.Open))
{
using (StreamReader sr = new StreamReader(file))
{
string s = sr.ReadToEnd();
}
}
appStorage.DeleteFile("appState");
}
Up to this point I have no problem, but when I try to convert the string I retrieved, a FormatException is thrown. These are the two ways I tried:
long time = long.Parse(s);
long time = (long)Convert.ToDouble(s);
So is there any other way to do this?
EDIT:
The problem is not in the conversion but rather in the StreamWriter adding extra characters.
I suspect you are seeing some other data at the end. Something else may have written other data to the stream.
I think you should use StreamWriter.WriteLine() instead of StreamWriter.Write() to write the data and then call StreamReader.ReadLine() instead of StreamReader.ReadToEnd() to read it back in.
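A minimal sketch of that suggestion, reusing the question's code:

// Save: WriteLine terminates the value, so the reader knows where it ends.
using (var file = appStorage.CreateFile("appState"))
using (var sw = new StreamWriter(file))
{
    sw.WriteLine(DateTime.Now.Ticks);
}

// Load: ReadLine returns only the digits, without any trailing data.
if (appStorage.FileExists("appState"))
{
    using (var file = appStorage.OpenFile("appState", FileMode.Open))
    using (var sr = new StreamReader(file))
    {
        long time = long.Parse(sr.ReadLine());
    }
}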

Is it possible to write a packaging.package to a stream without having to save it to a file first?

I have a System.IO.Packaging.Package in memory (it is a WordprocessingDocument) and want to stream it down to browser to save it. The word document has been modified by the MVC-based application and the resulting file has been modified for the current request.
I understand the package represents a 'zip' file containing a number of parts. These parts include headers, footers and main body document. I've modified each individually and now want to stream the package back to the user.
I can get the individual part streams... package.GetPart(new Uri("/word/document.xml", UriKind.Relative)).GetStream()
However, I'm missing how to get an output stream on the entire document (package) without writing to the file system.
Thanks in advance
No, what I think I need is something like this... I've already read in the template document and made modifications in memory. Now I want to stream the modified document (leaving the template untouched) back to the user.
MemoryStream stream = new MemoryStream();
WordprocessingDocument docOut =
    WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document);
foreach (var part in package.GetParts())
{
    using (StreamReader streamReader = new StreamReader(part.GetStream()))
    {
        PackagePart newPart = docOut.Package.CreatePart(part.Uri, part.ContentType);
        using (StreamWriter streamWriter = new StreamWriter(newPart.GetStream(FileMode.Create)))
        {
            streamWriter.Write(streamReader.ReadToEnd());
        }
    }
}
Unfortunately, this produces a 'corrupt' Word document...
The OpenXmlPackage.Close method saves all changes in all parts to the underlying store. If you opened the package from a stream, just use that stream:
public Stream packageStream()
{
    var ms = new MemoryStream();
    var wrdPk = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document);

    // Build the package ...
    var docPart = wrdPk.AddMainDocumentPart();
    docPart.Document = new Document(
        new Body(new Paragraph(new Run(new Text("Hello world.")))));

    // Flush all changes
    wrdPk.Close();
    return ms;
}
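To stream the result back from an MVC action, a minimal sketch (assuming classic ASP.NET MVC's File helper and the packageStream() method above; the action name is made up):

public ActionResult DownloadDocument()
{
    var ms = packageStream();
    ms.Position = 0; // rewind before handing the stream to the response
    return File(ms,
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "document.docx");
}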
