Is there a way to not generate a file via CSV Helper? - c#

Is there any way to avoid generating a CSV file on disk?
I don't want to store it in my application; can I generate it on the fly instead?
As the result I want the CSV converted to Base64.
var path = Path.Combine(Directory.GetCurrentDirectory(), "test.csv");
await using var writer = new StreamWriter(path);
await using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    await csv.WriteRecordsAsync(list);
}
var bytes = await File.ReadAllBytesAsync(path);
return Convert.ToBase64String(bytes);

A StreamWriter can write to any stream, including a MemoryStream:
using var ms = new MemoryStream();
using var writer = new StreamWriter(ms);
...
writer.Flush(); // make sure buffered text reaches the MemoryStream
return Convert.ToBase64String(ms.ToArray()); // ToArray() copies only the bytes written, unlike GetBuffer()
CSV files are text files though, so converting them to Base64 isn't very useful. StreamWriter uses UTF-8 encoding by default, so it already handles any language.
It would be better to keep the text as text, especially if it's going to be stored in a text field in a database. This can be done by reading the stream back with a StreamReader:
using var reader = new StreamReader(ms);
ms.Position = 0;
var csvText = reader.ReadToEnd();
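Putting the pieces together, a minimal end-to-end sketch (using CsvHelper and the list from the question, never touching the disk) could look like this:
using var ms = new MemoryStream();
await using (var writer = new StreamWriter(ms))
await using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    await csv.WriteRecordsAsync(list);
}
// Disposing the CsvWriter/StreamWriter flushes all buffered text into ms;
// ToArray() still works on a closed MemoryStream and copies only the bytes written.
return Convert.ToBase64String(ms.ToArray());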

Related

InputFile and TextFieldParser to parse csv

I can parse a CSV file by using the file path and TextFieldParser.
Now I'm trying to parse a CSV file received from InputFile component.
Here is what I tried:
var stream = e.File.OpenReadStream();
var memoryStream = new MemoryStream();
await stream.CopyToAsync(memoryStream);
stream.Close();
using (var parser = new TextFieldParser(memoryStream))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(";", ",");
    while (!parser.EndOfData)
    {
        //Do something here
    }
}
But when I run that, it does not enter the using block and never goes further.
What should I do?
Thanks
Use a try-catch block to see the exceptions and error details.
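For example, a minimal sketch around the posted code (with one extra note that is not part of the answer above: after CopyToAsync the MemoryStream is positioned at its end, so it usually needs to be rewound before parsing, otherwise EndOfData is true straight away):
var stream = e.File.OpenReadStream();
var memoryStream = new MemoryStream();
await stream.CopyToAsync(memoryStream);
stream.Close();
memoryStream.Position = 0; // rewind, otherwise the parser starts at the end of the data

try
{
    using (var parser = new TextFieldParser(memoryStream))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(";", ",");
        while (!parser.EndOfData)
        {
            var fields = parser.ReadFields();
            //Do something with fields here
        }
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex); // inspect the exception details
}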

Encoding issue with spanish file in C#

I have a Spanish-language file stored online in Azure Blob Storage. Some words have special characters (for example: Almacén).
When I open the file in Notepad++, the encoding is ANSI.
So now I try to read the file with this code:
using StreamReader reader = new StreamReader(Stream, Encoding.UTF8);
blobStream.Seek(0, SeekOrigin.Begin);
var allLines = await reader.ReadToEndAsync();
The issue is that "allLines" is not decoded properly; I get text like: Almac�n
I have tried solutions like this one:
C# Convert string from UTF-8 to ISO-8859-1 (Latin1)
but it is still not working.
(The final goal is to "merge" two CSVs, so I read both streams, remove the header, and concatenate the strings to push the result back. If there is a better way to merge CSVs in C# that avoids this encoding issue, I am open to it as well.)
You are trying to read a non-UTF8 encoded file as if it was UTF8 encoded. I can replicate this issue with
var s = "Almacén";
using var memStream = new MemoryStream(Encoding.GetEncoding(28591).GetBytes(s));
using var reader = new StreamReader(memStream, Encoding.UTF8);
var allLines = await reader.ReadToEndAsync();
Console.WriteLine(allLines); // writes "Almac�n" to console
You should read the file with encoding ISO-8859-1 ("Western European (ISO)"), which is code page 28591.
using var reader = new StreamReader(Stream, Encoding.GetEncoding(28591));
var allLines = await reader.ReadToEndAsync();
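For the merging goal mentioned at the end of the question, a rough sketch along the same lines (firstBlobStream and secondBlobStream are placeholder names for the two already-opened blob streams, and both files are assumed to be ISO-8859-1):
using var reader1 = new StreamReader(firstBlobStream, Encoding.GetEncoding(28591));
using var reader2 = new StreamReader(secondBlobStream, Encoding.GetEncoding(28591));

var first = await reader1.ReadToEndAsync();
var second = await reader2.ReadToEndAsync();

// drop the second file's header row (everything up to and including its first line break)
var secondBody = second.Substring(second.IndexOf('\n') + 1);

var merged = first.TrimEnd('\r', '\n') + Environment.NewLine + secondBody;

// re-encode with the same code page before pushing the result back to blob storage
var mergedBytes = Encoding.GetEncoding(28591).GetBytes(merged);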

ZipArchive, update entry: read - truncate - write

I'm using System.IO.Compression's ZipArchive to modify a file within a ZIP. I need first to read the whole content (JSON), transform the JSON then truncate the file and write the new JSON to the file. At the moment I have the following code:
using (var zip = new ZipArchive(new FileStream(zipFilePath, FileMode.Open, FileAccess.ReadWrite), ZipArchiveMode.Update))
{
    using var stream = zip.GetEntry(entryName).Open();
    using var reader = new StreamReader(stream);
    using var jsonTextReader = new JsonTextReader(reader);
    var json = JObject.Load(jsonTextReader);
    PerformModifications(json);
    stream.Seek(0, SeekOrigin.Begin);
    using var writer = new StreamWriter(stream);
    using var jsonTextWriter = new JsonTextWriter(writer);
    json.WriteTo(jsonTextWriter);
}
However, the problem is: if the resulting JSON is shorter than the original version, the remainder of the original is not truncated. Therefore I need to properly truncate the file before writing to it.
How to truncate the entry before writing to it?
You can either delete the entry before writing it back or, which I prefer, use stream.SetLength(0) to truncate the stream before writing. (See also https://stackoverflow.com/a/46810781/62838.)
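Applied to the code above, the SetLength approach could look roughly like this:
using (var zip = new ZipArchive(new FileStream(zipFilePath, FileMode.Open, FileAccess.ReadWrite), ZipArchiveMode.Update))
{
    using var stream = zip.GetEntry(entryName).Open();
    using var reader = new StreamReader(stream);
    using var jsonTextReader = new JsonTextReader(reader);
    var json = JObject.Load(jsonTextReader);

    PerformModifications(json);

    // Truncate the entry before writing so a shorter JSON doesn't leave stale bytes behind
    stream.SetLength(0);

    using var writer = new StreamWriter(stream);
    using var jsonTextWriter = new JsonTextWriter(writer);
    json.WriteTo(jsonTextWriter);
}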

convert compress xmldocument as zip and get as byte array

I am building an XmlDocument in memory (I am not writing it to disk). I need to be able to create a zip archive that contains the XML file and then get the zip archive as a byte array (all of this without actually writing/creating anything on the hard disk). Is this possible?
I should mention that I am trying to do this in C#.
var buffer = new MemoryStream();
using (buffer)
using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create))
{
    var entry = zip.CreateEntry("content.xml", CompressionLevel.Optimal);
    using (var stream = entry.Open())
    {
        xmlDoc.Save(stream);
    }
}
var bytes = buffer.ToArray();
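Note that buffer.ToArray() is deliberately called after the ZipArchive has been disposed: the archive only finishes writing its central directory on dispose, and MemoryStream.ToArray() still works on a closed stream, so this yields a complete ZIP as a byte array without touching the disk.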

How to read uploaded CSV UTF-8 for processing with CsvHelper?

My WebAPI allows a user to upload a CSV file and then parses the file. I use CsvHelper to do the heavy lifting of reading the CSV and mapping it to domain objects.
However, I have one customer whose files are in CSV UTF-8 format. The code that works for "vanilla" (ASCII) CSV files hurls when it tries to deal with CSV UTF-8.
Is there a way to import the CSV UTF-8 data and convert it to ASCII CSV so that my code will continue to work?
My current code looks like this:
//In my WebAPI Controller
//fileToProcess is IFormFile
byte[] fileBytes = new byte[fileToProcess.Length];
using(var stream = fileToProcess.OpenReadStream())
{
    await stream.ReadAsync(fileBytes);
    stream.Close();
}
var result = await ProcessFileAsync(fileBytes);
return OK(result);
...
//In a Parsing Class
public async Task<List<Client>> ProcessFileAsync(byte[] fileBytes)
{
    List<Client> result = null;
    var fileText = Encoding.Default.GetString(fileBytes);
    using(var reader = new StringReader(fileText))
    {
        using(var csv = new CsvReader(reader))
        {
            csv.RegisterClassMap<ClientMap>();
            result = csv.GetRecords<Client>().ToList();
            await PostProcess(result);
        }
    }
    return result;
}
The problem is that CSV UTF-8 has the BOM so when CsvHelper tries to process a mapping that references the first column header
Map(c => c.ClientId).Name("CLIENT ID");
it fails because the column name includes the BOM.
So, my questions are:
How can I tell if the file coming in is UTF-8 or ASCII?
How do I convert the UTF-8 to ASCII so it can be processed normally?
NOTE
I did try the following:
fileBytes = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, fileBytes);
However, this replaced the BOM with a ? which still causes CsvHelper to fail.
By doing this:
var fileText = Encoding.Default.GetString(fileBytes);
using(var reader = new StringReader(fileText))
... you're locking yourself into a specific encoding at the point of converting it to a string. Encoding.Default can vary by platform and CLR implementation.
The StreamReader class is designed to read text from a stream (which you can wrap around the raw bytes with a MemoryStream) and is capable of detecting the encoding for you if you let it. Try this instead:
using (var stream = new MemoryStream(fileBytes))
using (var reader = new StreamReader(stream))
In your case, you could use the incoming stream directly by changing ProcessFileAsync to accept the stream.
using (var stream = fileToProcess.OpenReadStream())
{
    var result = await ProcessFileAsync(stream);
    return OK(result);
}
public async Task<List<Client>> ProcessFileAsync(Stream stream)
{
    using (var reader = new StreamReader(stream))
    {
        using (var csv = new CsvReader(reader))
        {
            csv.RegisterClassMap<ClientMap>();
            List<Client> result = csv.GetRecords<Client>().ToList();
            await PostProcess(result);
            return result;
        }
    }
}
As long as the BOM is present, this will also support UTF16-encoded and UTF32-encoded files (and pretty much anything else that can be detected) because it'll see the U+FEFF code point in whichever encoding it uses.
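One aside that is not part of the original answer: newer versions of CsvHelper no longer offer a CsvReader(TextReader) constructor and require a culture or configuration argument, for example new CsvReader(reader, CultureInfo.InvariantCulture), matching the CsvWriter usage in the first question above.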
