How to read uploaded CSV UTF-8 for processing with CsvHelper? - c#

My WebAPI allows a user to upload a CSV file and then parses the file. I use CsvHelper to do the heavy lifting of reading the CSV and mapping it to domain objects.
However, I have one customer who's files are in CSV UTF-8 format. The code that works for "vanilla" (ASCII) CSV files hurls when it tries to deal with CSV UTF-8.
Is there a way to import the CSV UTF-8 data and convert it to ASCII CSV so that my code will continue to work?
My current code looks like this:
//In my WebAPI Controller
//fileToProcess is IFormFile
byte[] fileBytes = new byte[fileToProcess.Length];
using(var stream = fileToProcess.OpenReadStream())
{
await stream.ReadAsync(fileBytes);
stream.Close();
}
var result = await ProcessFileAsync(fileBytes);
return OK(result);
...
//In a Parsing Class
public async Task<List<Client>> ProcessFileAsync(byte[] fileBytes)
{
List<Client> result = null;
var fileText = Encoding.Default.GetString(fileBytes);
using(var reader = new StringReader(fileText))
{
using(var csv = new CsvReader(reader))
{
csv.RegisterClassMap<ClientMap>();
result = csv.GetRecords<T>().ToList();
await PostProcess(result);
}
}
return result;
}
The problem is that CSV UTF-8 has the BOM so when CsvHelper tries to process a mapping that references the first column header
Map(c => c.ClientId).Name("CLIENT ID");
it fails because the column name includes the BOM.
So, my questions are:
How can I tell if the file coming in is UTF-8 or ASCII.
How do I convert the UTF-8 to ASCII so it can be processed normally?
NOTE
I did try the following:
fileBytes = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, fileBytes);
However, this replaced the BOM with a ? which still causes CsvHelper to fail.

By doing this:
var fileText = Encoding.Default.GetString(fileBytes);
using(var reader = new StringReader(fileText))
... you're locking yourself into a specific encoding at the point of converting it to a string. Encoding.Default is can vary by platform and CLR implementation.
The StreamReader class is designed to read text from a stream (which you can wrap around the raw bytes with a MemoryStream) and is capable of detecting the encoding for you if you let it. Try this instead:
using (var stream = new MemoryStream(fileBytes))
using (var reader = new StreamReader(stream))
In your case, you could use the incoming stream directly by changing ProcessFileAsync to accept the stream.
using (var stream = fileToProcess.OpenReadStream())
{
var result = await ProcessFileAsync(stream);
return OK(result);
}
public async Task<List<Client>> ProcessFileAsync(Stream stream)
{
using (var reader = new StreamReader(stream))
{
using (var csv = new CsvReader(reader))
{
csv.RegisterClassMap<ClientMap>();
List<Client> result = csv.GetRecords<Client>().ToList();
await PostProcess(result);
return result;
}
}
}
As long as the BOM is present, this will also support UTF16-encoded and UTF32-encoded files (and pretty much anything else that can be detected) because it'll see the U+FEFF code point in whichever encoding it uses.

Related

Is there a way to not generate a file via CSV Helper?

If the any opportunity to not generate csv file in system?
I don't wanna to store it in my application and can we generate it smth on fly?
As return I want to converted csv to base64.
var path = Path.Combine(Directory.GetCurrentDirectory(), "test.csv");
await using var writer = new StreamWriter(path);
await using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
await csv.WriteRecordsAsync(list);
}
var bytes = await File.ReadAllBytesAsync(path);
return Convert.ToBase64String(bytes);
A StreamWriter can write to any stream, including a MemoryStream:
using var ms=new MemoryStream();
using var writer = new StreamWriter(ms);
...
return Convert.ToBase64String(ms.GetBuffer());
CSV files are text files though, so converting them to BASE64 isn't very useful. StreamWriter uses UTF8 encoding by default so it already handles any language.
It would be better to keep the text as text, especially if it's going to be stored in a text field in a database. This can be done by reading the bytes using a StreamReader
using var reader=new StreamReader(ms);
ms.Position=0;
var csvText=reader.ReadToEnd();
var csvText

Can save stream as local file, but when returning it as HttpResponseMessage - always empty file

I want to write export/download functionality for files from external API.
I've created separate Action for it. Using external API I can get stream for that file.
When I am saving that stream to local file, everything is fine, file isn't empty.
var exportedFile = await this.GetExportedFile(client, this.ReportId, this.WorkspaceId, export);
// Now you have the exported file stream ready to be used according to your specific needs
// For example, saving the file can be done as follows:
string pathOnDisk = #"D:\Temp\" + export.ReportName + exportedFile.FileSuffix;
using (var fileStream = File.Create(pathOnDisk))
{
await exportedFile.FileStream.CopyToAsync(fileStream);
}
But when I return exportedFile object that contains in it stream and do next:
var result = await this._service.ExportReport(reportName, format, CancellationToken.None);
var fileResult = new HttpResponseMessage(HttpStatusCode.OK);
using (var ms = new MemoryStream())
{
await result.FileStream.CopyToAsync(ms);
ms.Position = 0;
fileResult.Content = new ByteArrayContent(ms.GetBuffer());
}
fileResult.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
{
FileName = $"{reportName}{result.FileSuffix}"
};
fileResult.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
return fileResult;
Exported file is always empty.
Is it problem with stream or with code that try to return that stream as file?
Tried as #Nobody suggest to use ToArray
fileResult.Content = new ByteArrayContent(ms.ToArray());
the same result.
Also tried to use StreamContent
fileResult.Content = new StreamContent(result.FileStream);
still empty file.
But when I'm using StreamContent and MemmoryStream
using (var ms = new MemoryStream())
{
await result.FileStream.CopyToAsync(ms);
ms.Position = 0;
fileResult.Content = new StreamContent(ms);
}
in result I got
{
"error": "no response from server"
}
Note: from 3rd party API I get stream that is readonly.
you used GetBuffer() to retrieve the data of the memory stream.
The function you should use is ToArray()
Please read the Remarks of the documentation of these functions.
https://learn.microsoft.com/en-us/dotnet/api/system.io.memorystream.getbuffer?view=net-6.0
using (var ms = new MemoryStream())
{
ms.Position = 0;
await result.FileStream.CopyToAsync(ms);
fileResult.Content = new ByteArrayContent(ms.ToArray()); //ToArray() and not GetBuffer()
}
Your "mistake" although it's an obvious one is that you return a status message, but not the actual file itself (which is in it's own also a 200).
You return this:
var fileResult = new HttpResponseMessage(HttpStatusCode.OK);
So you're not sending a file, but a response message. What I'm missing in your code samples is the procedure call itself, but since you use a HttpResonseMessage I will assume it's rather like a normal Controller action. If that is the case you could respond in a different manner:
return new FileContentResult(byteArray, mimeType){ FileDownloadName = filename };
where byteArray is ofcourse just a byte[], the mimetype could be application/octet-stream (but I suggest you'd actually find the correct mimetype for the browser to act accordingly) and the filename is the filename you want the file to be named.
So, if you were to stitch above and my comment together you'd get this:
var exportedFile = await this.GetExportedFile(client, this.ReportId, this.WorkspaceId, export);
// Now you have the exported file stream ready to be used according to your specific needs
// For example, saving the file can be done as follows:
string pathOnDisk = #"D:\Temp\" + export.ReportName + exportedFile.FileSuffix;
using (var fileStream = File.Create(pathOnDisk))
{
await exportedFile.FileStream.CopyToAsync(fileStream);
}
return new FileContentResult(System.IO.File.ReadAllBytes(pathOnDisk), "application/octet-stream") { FileDownloadName = export.ReportName + exportedFile.FileSuffix };
I suggest to try it, since you still report a 200 (and not a fileresult)

Encoding issue with spanish file in C#

I have a file store online in an azure blob storage in spanish. Some word have special charactere (for example : Almacén)
When I open the file in notepad++, the encoding is ANSI.
So now I try to read the file with the code :
using StreamReader reader = new StreamReader(Stream, Encoding.UTF8);
blobStream.Seek(0, SeekOrigin.Begin);
var allLines = await reader.ReadToEndAsync();
the issue is that "allLines" are not proper encoding, I have some issue like : Almac�n
I have try some solution like this one :
C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H
but still not working
(the final goal is to "merge" two csv so I read the stream of both, remove the header and concatenate the string to push it again. If there is a better solution to merge csv in c# that can skip this encoding issue I am open to it also)
You are trying to read a non-UTF8 encoded file as if it was UTF8 encoded. I can replicate this issue with
var s = "Almacén";
using var memStream = new MemoryStream(Encoding.GetEncoding(28591).GetBytes(s));
using var reader = new StreamReader(memStream, Encoding.UTF8);
var allLines = await reader.ReadToEndAsync();
Console.WriteLine(allLines); // writes "Almac�n" to console
You should be attempting to read the file with encoding iso-8859-1 "Western European (ISO)" which is codepage 28591.
using var reader = new StreamReader(Stream, Encoding.GetEncoding(28591));
var allLines = await reader.ReadToEndAsync();

How to save the docx binary data to file in web api C#

I'm fetching a byte[] of a docx file from one API to another and processing it like Application.Documents.Open(array); or File.WriteAllBytes(path, array);
I think the data received is in some UTF-8 format but i have no idea how to convert and process them
Fetching the file from the API using below code.
var httpResponse = (HttpWebResponse)httpWebRequest.GetResponse();
using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
{
var result = streamReader.ReadToEnd();
return result;
}
The output(byte[]) is received as string like this.
string result = "PK[][]?......";
Please check the postman screenshot
And trying with below code to save to folder but not working
byte[] res = result .ToArray();
File.WriteAllBytes(#"C:\temp\myfile.docx", res);
also tried this but didn't work,
byte[] mybytearray = Convert.FromBase64String(t);
byte[] barr = Encoding.ASCII.GetBytes(hardcode);
Maybe you should use BinaryReader instead of StreamReader.
See also: https://stackoverflow.com/a/8613300/1438829

How to get the stream for a Multipart file in webapi upload?

I need to upload a file using Stream (Azure Blobstorage), and just cannot find out how to get the stream from the object itself. See code below.
I'm new to the WebAPI and have used some examples. I'm getting the files and filedata, but it's not correct type for my methods to upload it. Therefore, I need to get or convert it into a normal Stream, which seems a bit hard at the moment :)
I know I need to use ReadAsStreamAsync().Result in some way, but it crashes in the foreach loop since I'm getting two provider.Contents (first one seems right, second one does not).
[System.Web.Http.HttpPost]
public async Task<HttpResponseMessage> Upload()
{
if (!Request.Content.IsMimeMultipartContent())
{
this.Request.CreateResponse(HttpStatusCode.UnsupportedMediaType);
}
var provider = GetMultipartProvider();
var result = await Request.Content.ReadAsMultipartAsync(provider);
// On upload, files are given a generic name like "BodyPart_26d6abe1-3ae1-416a-9429-b35f15e6e5d5"
// so this is how you can get the original file name
var originalFileName = GetDeserializedFileName(result.FileData.First());
// uploadedFileInfo object will give you some additional stuff like file length,
// creation time, directory name, a few filesystem methods etc..
var uploadedFileInfo = new FileInfo(result.FileData.First().LocalFileName);
// Remove this line as well as GetFormData method if you're not
// sending any form data with your upload request
var fileUploadObj = GetFormData<UploadDataModel>(result);
Stream filestream = null;
using (Stream stream = new MemoryStream())
{
foreach (HttpContent content in provider.Contents)
{
BinaryFormatter bFormatter = new BinaryFormatter();
bFormatter.Serialize(stream, content.ReadAsStreamAsync().Result);
stream.Position = 0;
filestream = stream;
}
}
var storage = new StorageServices();
storage.UploadBlob(filestream, originalFileName);**strong text**
private MultipartFormDataStreamProvider GetMultipartProvider()
{
var uploadFolder = "~/App_Data/Tmp/FileUploads"; // you could put this to web.config
var root = HttpContext.Current.Server.MapPath(uploadFolder);
Directory.CreateDirectory(root);
return new MultipartFormDataStreamProvider(root);
}
This is identical to a dilemma I had a few months ago (capturing the upload stream before the MultipartStreamProvider took over and auto-magically saved the stream to a file). The recommendation was to inherit that class and override the methods ... but that didn't work in my case. :( (I wanted the functionality of both the MultipartFileStreamProvider and MultipartFormDataStreamProvider rolled into one MultipartStreamProvider, without the autosave part).
This might help; here's one written by one of the Web API developers, and this from the same developer.
Hi just wanted to post my answer so if anybody encounters the same issue they can find a solution here itself.
here
MultipartMemoryStreamProvider stream = await this.Request.Content.ReadAsMultipartAsync();
foreach (var st in stream.Contents)
{
var fileBytes = await st.ReadAsByteArrayAsync();
string base64 = Convert.ToBase64String(fileBytes);
var contentHeader = st.Headers;
string filename = contentHeader.ContentDisposition.FileName.Replace("\"", "");
string filetype = contentHeader.ContentType.MediaType;
}
I used MultipartMemoryStreamProvider and got all the details like filename and filetype from the header of content.
Hope this helps someone.

Categories