StringWriter produces strange characters in the generated CSV - c#

I'm having trouble using StringWriter in our application.
I make a REST call against a NoSQL database and it returns a list of dynamic objects.
I use StringWriter to build a CSV file that contains a header and the records from that list.
I also tried extending StringWriter with a sealed class whose constructor lets you pass the encoding as a parameter, but no matter which of the available encodings I try, it still generates wrong characters.
This is our extension of StringWriter:
public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding() : this(Encoding.UTF8) { }

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}
and this is the code that generates the CSV file:
StringWriterWithEncoding sw = new StringWriterWithEncoding();
// Header
sw.WriteLine(string.Format("{0};{1};{2};{3};{4};{5};{6};{7};{8};{9};", "Soddisfazione", "Data Ricerca", "Categorie Cercate", "Id Utente", "Utente", "Categoria", "Id Documento", "Documento", "Id Sessione", "Testo Ricerca"));
foreach (var item in result.modelListDyn)
{
    sw.WriteLine(string.Format("{0};{1};{2};{3};{4};{5};{6};{7};{8};{9};",
        item.Satisfaction, item.Date, item.Cluster, item.UserId, item.Username, item.Category,
        item.DocumentId, HttpUtility.HtmlDecode(item.DocumentTitle.ToString()), item.SessionId, item.TextSearch));
}
var response = Request.CreateResponse(HttpStatusCode.OK, sw.ToString());
response.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("text/plain");
return response;
When the file is generated, a column containing some text displays strange characters:
L’indennità di licenziamento del Jobs Act è incostituzionale
This is Italian, and the characters that come out wrong seem to be à è ò ' ù etc.
Can anyone suggest a solution?
Thank you!
UPDATE
As a user suggested, I started using CsvHelper.
I created a class and a ClassMap, but it still returns corrupted characters.
StringWriter sw = new StringWriter();
using (CsvWriter csv = new CsvWriter(sw))
{
    csv.Configuration.RegisterClassMap<HistorySearchModelCsvHelperMap>();
    csv.Configuration.CultureInfo = CultureInfo.InvariantCulture;
    csv.WriteRecords(csvModelHelperList);
}
Result: the CSV still contains the corrupted characters.
UPDATE 2
The problem is client-side: my action returns the correct text, without broken characters.
The action is triggered when I call it with an axios GET request.
axios.get(url, {
    headers: {
        'Accept': 'application/vnd.ms-excel',
        'Content-Type': 'application/vnd.ms-excel'
    }
})
.then(({ data }) => {
    const blob = new Blob([data], {
        type: 'application/vnd.ms-excel',
    });
    // "fileDownload" is the 'js-file-download' module.
    fileDownload(blob, 'HistorySearches.csv', 'application/vnd.ms-excel');
    this.setState({ exportLoaded: true, exportLoading: false });
}).catch(() => {
    this.setState({ exportLoaded: false, exportLoading: false });
});
I read that I should set responseType to blob, but even when passing type: 'application/vnd.ms-excel' the characters in my CSV file are still corrupted.
In my action, when I return the response:
// ... some code
StringWriterWithEncoding sw = new StringWriterWithEncoding();
using (CsvWriter csv = new CsvWriter(sw))
{
    csv.Configuration.RegisterClassMap<HistorySearchModelCsvHelperMap>();
    csv.Configuration.CultureInfo = CultureInfo.InvariantCulture;
    csv.WriteRecords(csvModelHelperList);
}
var response = Request.CreateResponse(HttpStatusCode.OK, sw.ToString());
// response.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/vnd.ms-excel");
return response;
I tried to set content type server-side too, but the format is incorrect anyway.

If you want to be able to open your CSV in Excel, you need to write it with the Windows-1252 (Western European) encoding.
If you open the CSV in a generic text editor and it still displays incorrectly, I'm not sure what's wrong, as your code looks sane.
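For example, a minimal sketch of how the action in the question could return the CSV encoded as Windows-1252 (assuming the same classic ASP.NET Web API pipeline used above; the "HistorySearches.csv" file name is only illustrative):

string csvText = sw.ToString();
// Encode the CSV text as Windows-1252 so older Excel versions read the accented characters correctly.
byte[] csvBytes = Encoding.GetEncoding(1252).GetBytes(csvText);

var response = Request.CreateResponse(HttpStatusCode.OK);
response.Content = new ByteArrayContent(csvBytes);
response.Content.Headers.ContentType =
    new System.Net.Http.Headers.MediaTypeHeaderValue("text/csv") { CharSet = "windows-1252" };
response.Content.Headers.ContentDisposition =
    new System.Net.Http.Headers.ContentDispositionHeaderValue("attachment") { FileName = "HistorySearches.csv" };
return response;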

Solved directly on the client side.
I wrote my own download routine and prepended the UTF-8 BOM to the response string:
downloadFile2(data, fileName, type = "text/string") {
    // Create an invisible A element
    const a = document.createElement("a");
    a.style.display = "none";
    // Using "universal BOM" https://technet.microsoft.com/en-us/2yfce773(v=vs.118)
    const universalBOM = "\uFEFF";
    a.setAttribute('href', 'data:text/csv; charset=utf-8,' + encodeURIComponent(universalBOM + data));
    // Use download attribute to set the desired file name
    a.setAttribute('download', fileName);
    document.body.appendChild(a);
    // Trigger the download by simulating a click
    a.click();
    // Cleanup
    window.URL.revokeObjectURL(a.href);
    document.body.removeChild(a);
},
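If you prefer to keep the fix server-side instead, a rough equivalent (again only a sketch against the same Web API action shown in the question) is to prepend the UTF-8 preamble to the CSV bytes before returning them:

string csvText = sw.ToString();
byte[] bom = Encoding.UTF8.GetPreamble();          // EF BB BF
byte[] body = Encoding.UTF8.GetBytes(csvText);
byte[] payload = bom.Concat(body).ToArray();       // Concat needs System.Linq

var response = Request.CreateResponse(HttpStatusCode.OK);
response.Content = new ByteArrayContent(payload);
response.Content.Headers.ContentType =
    new System.Net.Http.Headers.MediaTypeHeaderValue("text/csv") { CharSet = "utf-8" };
return response;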

Related

Get processed word file from c# endpoint to nodejs server

I have a Node.js server that sends a GET request with axios to a C# endpoint, with JSON as a parameter. My C# API uses Newtonsoft.Json to deserialize the JSON, then it reads a Word file into memory and inserts data. The final step I need is for this API to respond by sending the modified document back to the Node.js server. Currently, the C# endpoint is called and a response is sent back. Upon writing the Word document using the archiver library and opening it, a dialog box appears saying "Word found unreadable content in export0.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes".
async exportToDotnet() {
    return await axios.get(`https://localhost:8082/test/${JSON.stringify(this)}`, { responseType: 'arrayBuffer' }).catch(err => {
        console.log(`ERR `, err);
    }).then((axiosResponse) => {
        const data = axiosResponse.data;
        console.log(`DATA `, data);
        console.log(`DATA LENGTH '${data.length}'`);
        return data;
    });
}
async function writeZipFile(resultFromExportToDotnet) {
    const output = createWriteStream('exported.zip');
    output.on("close", () => {
        console.log("success????");
    });
    const archive = archiver('zip');
    archive.on('error', (err) => {
        console.log('error in archive ', err);
    });
    archive.append(form, { name: `export0.docx` });
    archive.pipe(output);
    await archive.finalize();
}
[HttpGet("test/{json}")]
public byte[] ExportDocumentBuffer(string json)
{
Console.WriteLine("Called");
//Converts the json passed in to a FormInstance Object
FormInstance form = JsonConvert.DeserializeObject<FormInstance>(json);
//Read the dotx into memory so we can use it. Would it be better to just use System.IO.File.ReadAllBytes()?
MemoryStream dotxBytes = ReadAllBytesToMemoryStream("test.dotx");
//Read the file bytes into a WordProcessingDocument that we can edit
WordprocessingDocument template = WordprocessingDocument.Open(dotxBytes, true);
template.ChangeDocumentType(WordprocessingDocumentType.Document);
template = ParseFormAndInsertValues(form, template);
byte[] output = dotxBytes.ToArray();
Console.WriteLine($"BYTES '{output.Length}'");
return output;
}
/// <summary>Reads all bytes of the provided file into memory</summary>
/// <param name="path">The path to the file</param>
/// <returns>A MemoryStream of the file data</returns>
public static MemoryStream ReadAllBytesToMemoryStream(string path)
{
    byte[] buffer = System.IO.File.ReadAllBytes(path);
    MemoryStream destStream = new MemoryStream(buffer.Length);
    destStream.Write(buffer, 0, buffer.Length);
    destStream.Seek(0, SeekOrigin.Begin);
    return destStream;
}
Things I've tried
Changing the axios responseType to 'stream', converting the response to a buffer with a function, and writing it to a file:
function stream2buffer(stream) {
    return new Promise((resolve, reject) => {
        const _buf = [];
        stream.on("data", (chunk) => _buf.push(chunk));
        stream.on("end", () => resolve(Buffer.concat(_buf)));
        stream.on("error", (err) => reject(err));
    });
}
Changing my C# method to return an HttpResponseMessage:
HttpResponseMessage result = new HttpResponseMessage(System.Net.HttpStatusCode.OK)
{
    Content = new ByteArrayContent(dotxBytes.ToArray())
};
result.Content.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("attachment")
{
    FileName = "exampleName.docx"
};
result.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
Logging the length of byte[] and logging data.length produce 2 different numbers (52107 and 69476, respectively). Is this just a serialization issue? Obviously I'm missing something. Any help would be much appreciated!
Turns out it was a few things: I used template = WordprocessingDocument.Open(), but never called template.Save() or template.Close(), so my changes were never written and the file was still open. Once I got my byte array output, I used Convert.ToBase64String(output) and returned the string. On the Node.js side, I changed the responseType to 'text', returned Buffer.from(axiosResponse.data, 'base64'), and wrote the file that way.
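For reference, a rough sketch of what the corrected endpoint might look like after those changes (my reconstruction, not the poster's exact code; FormInstance, ParseFormAndInsertValues and ReadAllBytesToMemoryStream are the members shown in the question):

[HttpGet("test/{json}")]
public string ExportDocumentBuffer(string json)
{
    FormInstance form = JsonConvert.DeserializeObject<FormInstance>(json);
    MemoryStream dotxBytes = ReadAllBytesToMemoryStream("test.dotx");

    // Dispose the document so the edits are flushed back into dotxBytes.
    using (WordprocessingDocument template = WordprocessingDocument.Open(dotxBytes, true))
    {
        template.ChangeDocumentType(WordprocessingDocumentType.Document);
        ParseFormAndInsertValues(form, template);
    }

    // MemoryStream.ToArray() still works after the package has been closed.
    byte[] output = dotxBytes.ToArray();
    return Convert.ToBase64String(output);
}

On the Node.js side the matching change is responseType: 'text' plus Buffer.from(axiosResponse.data, 'base64'), as described above.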

Can save stream as local file, but when returning it as HttpResponseMessage - always empty file

I want to write export/download functionality for files from an external API.
I've created a separate action for it. Using the external API I can get a stream for that file.
When I save that stream to a local file, everything is fine and the file isn't empty.
var exportedFile = await this.GetExportedFile(client, this.ReportId, this.WorkspaceId, export);

// Now you have the exported file stream ready to be used according to your specific needs
// For example, saving the file can be done as follows:
string pathOnDisk = @"D:\Temp\" + export.ReportName + exportedFile.FileSuffix;

using (var fileStream = File.Create(pathOnDisk))
{
    await exportedFile.FileStream.CopyToAsync(fileStream);
}
But when I return the exportedFile object that contains the stream and do the following:
var result = await this._service.ExportReport(reportName, format, CancellationToken.None);
var fileResult = new HttpResponseMessage(HttpStatusCode.OK);

using (var ms = new MemoryStream())
{
    await result.FileStream.CopyToAsync(ms);
    ms.Position = 0;
    fileResult.Content = new ByteArrayContent(ms.GetBuffer());
}

fileResult.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
{
    FileName = $"{reportName}{result.FileSuffix}"
};
fileResult.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
return fileResult;
The exported file is always empty.
Is the problem with the stream, or with the code that tries to return that stream as a file?
I tried, as @Nobody suggested, to use ToArray:
fileResult.Content = new ByteArrayContent(ms.ToArray());
the same result.
I also tried to use StreamContent:
fileResult.Content = new StreamContent(result.FileStream);
Still an empty file.
But when I use StreamContent with a MemoryStream:
using (var ms = new MemoryStream())
{
    await result.FileStream.CopyToAsync(ms);
    ms.Position = 0;
    fileResult.Content = new StreamContent(ms);
}
the result I get is:
{
"error": "no response from server"
}
Note: the stream I get from the 3rd-party API is read-only.
You used GetBuffer() to retrieve the data of the memory stream.
The method you should use is ToArray(): GetBuffer() returns the whole underlying buffer, which can be larger than the data actually written, while ToArray() returns only the bytes that were written.
Please read the Remarks section in the documentation of these methods:
https://learn.microsoft.com/en-us/dotnet/api/system.io.memorystream.getbuffer?view=net-6.0
using (var ms = new MemoryStream())
{
    ms.Position = 0;
    await result.FileStream.CopyToAsync(ms);
    fileResult.Content = new ByteArrayContent(ms.ToArray()); // ToArray() and not GetBuffer()
}
Your "mistake" although it's an obvious one is that you return a status message, but not the actual file itself (which is in it's own also a 200).
You return this:
var fileResult = new HttpResponseMessage(HttpStatusCode.OK);
So you're not sending a file, but a response message. What I'm missing in your code samples is the procedure call itself, but since you use an HttpResponseMessage I will assume it's rather like a normal controller action. If that is the case, you could respond in a different manner:
return new FileContentResult(byteArray, mimeType){ FileDownloadName = filename };
where byteArray is of course just a byte[], the MIME type could be application/octet-stream (but I suggest you find the correct MIME type so the browser can act accordingly), and the filename is the name you want the downloaded file to have.
So, if you were to stitch the above and my comment together, you'd get this:
var exportedFile = await this.GetExportedFile(client, this.ReportId, this.WorkspaceId, export);

// Now you have the exported file stream ready to be used according to your specific needs
// For example, saving the file can be done as follows:
string pathOnDisk = @"D:\Temp\" + export.ReportName + exportedFile.FileSuffix;

using (var fileStream = File.Create(pathOnDisk))
{
    await exportedFile.FileStream.CopyToAsync(fileStream);
}

return new FileContentResult(System.IO.File.ReadAllBytes(pathOnDisk), "application/octet-stream") { FileDownloadName = export.ReportName + exportedFile.FileSuffix };
I suggest trying it, since you currently return a 200 status message (and not a file result).
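If the copy on disk is only a side effect, a variant of the same idea (again just a sketch, assuming a normal controller action) buffers the exported stream in memory and returns it directly:

var exportedFile = await this.GetExportedFile(client, this.ReportId, this.WorkspaceId, export);

byte[] bytes;
using (var ms = new MemoryStream())
{
    await exportedFile.FileStream.CopyToAsync(ms);
    bytes = ms.ToArray();   // ToArray(), not GetBuffer()
}

return new FileContentResult(bytes, "application/octet-stream")
{
    FileDownloadName = export.ReportName + exportedFile.FileSuffix
};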

ServiceStack's HttpResult class does not properly format CSV files

I am trying to output a CSV file using an endpoint on a service in ServiceStack, using the HttpResult class.
The CSV string itself is being constructed via StringWriter and CsvHelper.
If the content type is set to "text/plain", the text appears on the browser screen fine when the endpoint URL is hit. However, if it is set to "text/csv", a CSV file is generated, but the information inside it is not correct.
For example:
Expected output:
Header 1, Header 2, Header 3
Actual output:
H,e,a,d,e,r, ,1,,, ,H,e,a,d,e,r, ,2,,, ,H,e,a,d,e,r, 3,"
Is there something I'm possibly missing?
Also, on a side note, how do I set the file name for the file itself? It appears I have to use HttpHeaders.ContentDisposition, but when I tried to set it I got an error along the lines of having multiple header elements.
EDIT: Sorry, I forgot to include the code snippet.
string response = string.Empty;
using (var writer = new StringWriter())
{
    using (var csv = new CsvWriter(writer))
    {
        csv.WriteHeader<TestClass>();
        foreach (var element in elements)
        {
            csv.WriteField(element.Header1);
            csv.WriteField(element.Header2);
            csv.WriteField(element.Header3);
            csv.NextRecord();
        }
    }
    // apparently double quotes can cause the rendered CSV to go wrong in some parts, so I added this as part of trying
    response = writer.ToString().Replace("\"", "");
}
return new HttpResult(response)
{
    StatusCode = HttpStatusCode.OK,
    ContentType = "text/csv"
};
And the info on TestClass:
public class TestClass
{
    public string Header1 { get; set; }
    public string Header2 { get; set; }
    public string Header3 { get; set; }
}
From your description, your HttpResult file response may be getting serialized by the built-in CSV Format.
If you're not using it, you can remove it with:
Plugins.RemoveAll(x => x is CsvFormat);
Otherwise, if you are using it, you can circumvent its serialization by writing the CSV file in your Service implementation, e.g.:
public class MyCsv : IReturn<string> {}

public async Task Any(MyCsv request)
{
    var file = base.VirtualFileSources.GetFile("path/to/my.csv");
    if (file == null)
        throw HttpError.NotFound("no csv here");

    Response.ContentType = MimeTypes.Csv;
    Response.AddHeader(HttpHeaders.ContentDisposition,
        $"attachment; filename=\"{file.Name}\";");

    using (var stream = file.OpenRead())
    {
        await stream.CopyToAsync(Response.OutputStream);
        await Response.OutputStream.FlushAsync();
        Response.EndRequest(skipHeaders: true);
    }
}
Edit: since you're returning a raw CSV string, you can write it to the response with:
Response.ContentType = MimeTypes.Csv;
await Response.WriteAsync(response);
Response.EndRequest(skipHeaders:true);

How to read uploaded CSV UTF-8 for processing with CsvHelper?

My WebAPI allows a user to upload a CSV file and then parses the file. I use CsvHelper to do the heavy lifting of reading the CSV and mapping it to domain objects.
However, I have one customer whose files are in CSV UTF-8 format. The code that works for "vanilla" (ASCII) CSV files chokes when it tries to deal with CSV UTF-8.
Is there a way to import the CSV UTF-8 data and convert it to ASCII CSV so that my code will continue to work?
My current code looks like this:
// In my WebAPI controller
// fileToProcess is IFormFile
byte[] fileBytes = new byte[fileToProcess.Length];
using (var stream = fileToProcess.OpenReadStream())
{
    await stream.ReadAsync(fileBytes);
    stream.Close();
}
var result = await ProcessFileAsync(fileBytes);
return Ok(result);
...
// In a parsing class
public async Task<List<Client>> ProcessFileAsync(byte[] fileBytes)
{
    List<Client> result = null;
    var fileText = Encoding.Default.GetString(fileBytes);
    using (var reader = new StringReader(fileText))
    {
        using (var csv = new CsvReader(reader))
        {
            csv.RegisterClassMap<ClientMap>();
            result = csv.GetRecords<Client>().ToList();
            await PostProcess(result);
        }
    }
    return result;
}
The problem is that CSV UTF-8 includes a BOM, so when CsvHelper tries to process a mapping that references the first column header
Map(c => c.ClientId).Name("CLIENT ID");
it fails because the column name includes the BOM.
So, my questions are:
How can I tell if the file coming in is UTF-8 or ASCII?
How do I convert the UTF-8 to ASCII so it can be processed normally?
NOTE
I did try the following:
fileBytes = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, fileBytes);
However, this replaced the BOM with a ? which still causes CsvHelper to fail.
By doing this:
var fileText = Encoding.Default.GetString(fileBytes);
using(var reader = new StringReader(fileText))
... you're locking yourself into a specific encoding at the point of converting it to a string. Encoding.Default can vary by platform and CLR implementation.
The StreamReader class is designed to read text from a stream (which you can wrap around the raw bytes with a MemoryStream) and is capable of detecting the encoding for you if you let it. Try this instead:
using (var stream = new MemoryStream(fileBytes))
using (var reader = new StreamReader(stream))
In your case, you could use the incoming stream directly by changing ProcessFileAsync to accept the stream.
using (var stream = fileToProcess.OpenReadStream())
{
    var result = await ProcessFileAsync(stream);
    return Ok(result);
}

public async Task<List<Client>> ProcessFileAsync(Stream stream)
{
    using (var reader = new StreamReader(stream))
    {
        using (var csv = new CsvReader(reader))
        {
            csv.RegisterClassMap<ClientMap>();
            List<Client> result = csv.GetRecords<Client>().ToList();
            await PostProcess(result);
            return result;
        }
    }
}
As long as the BOM is present, this will also support UTF16-encoded and UTF32-encoded files (and pretty much anything else that can be detected) because it'll see the U+FEFF code point in whichever encoding it uses.
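For reference, a minimal sketch with the StreamReader defaults spelled out (assuming the same Client and ClientMap types as above):

using (var stream = new MemoryStream(fileBytes))
using (var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
using (var csv = new CsvReader(reader))
{
    // The StreamReader consumes the BOM, so the first header is read as "CLIENT ID"
    // rather than "\uFEFFCLIENT ID" and the Name("CLIENT ID") mapping matches again.
    csv.RegisterClassMap<ClientMap>();
    var records = csv.GetRecords<Client>().ToList();
}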

MonoTouch - UIDevice.CurrentDevice.Name - UTF8

We've noticed that UTF8 characters don't come out correctly when using UIDevice.CurrentDevice.Name in MonoTouch.
It comes out as "iPad 2 ??" if you use some of the special characters, like those you get by holding down the apostrophe key on the iPad keyboard. (Sorry, I don't know the equivalent way to show these characters on Windows.)
Is there a recommended workaround to get the correct text? We don't mind converting to UTF8 ourselves. I also tried simulating this from a UITextField and it worked fine, with no UTF8 problems.
The reason this is causing problems is we are sending this text off to a web service, and it's causing XML parsing issues.
Here is a snippet of the XmlWriter code (_parser.WriteRequest):
using (XmlWriter xmlWriter = XmlWriter.Create(textWriter, new XmlWriterSettings
{
#if DEBUG
    Indent = true,
#else
    Indent = false, NewLineHandling = NewLineHandling.None,
#endif
    OmitXmlDeclaration = true
}))
{
    xmlWriter.WriteStartDocument();
    xmlWriter.WriteStartElement("REQUEST");
    xmlWriter.WriteAttributeString("TYPE", "EXAMPLE");
    xmlWriter.WriteEndElement();
    xmlWriter.WriteEndDocument();
}
The TextWriter is passed in from:
public Response MakeRequest(Request request)
{
    var httpRequest = CreateRequest(request);
    WriteRequest(httpRequest.GetRequestStream(), request);
    using (var httpResponse = httpRequest.GetResponse() as HttpWebResponse)
    {
        using (var responseStream = httpResponse.GetResponseStream())
        {
            var response = new Response();
            ReadResponse(response, responseStream);
            return response;
        }
    }
}

private void WriteRequest(Stream requestStream, Request request)
{
    if (request.Type == null)
    {
        throw new InvalidOperationException("Request Type was null!");
    }
    if (_logger.Enabled)
    {
        var builder = new StringBuilder();
        using (var writer = new StringWriter(builder, CultureInfo.InvariantCulture))
        {
            _parser.WriteRequest(writer, request);
        }
        _logger.Log("REQUEST: " + builder.ToString());
        using (requestStream)
        {
            using (StreamWriter writer = new StreamWriter(requestStream))
            {
                writer.Write(builder.ToString());
            }
        }
    }
    else
    {
        using (requestStream)
        {
            using (StreamWriter writer = new StreamWriter(requestStream))
            {
                _parser.WriteRequest(writer, request);
            }
        }
    }
}
_logger writes to Console.WriteLine and is enabled in #if DEBUG mode. Request is just a storage class with properties; sorry, it's easy to confuse with HttpWebRequest.
I'm seeing ?? in both Xcode's console and MonoDevelop's console. I'm assuming the server is receiving them strangely as well, since I get an error. Using UITextField.Text with the same strange characters instead of the device description works fine with no issues. It makes me think the device description is the culprit.
EDIT: this fixed it -
Encoding.UTF8.GetString (Encoding.ASCII.GetBytes(UIDevice.CurrentDevice.Name));
Okay, I think I know the problem. You're creating a StringWriter, which always reports its encoding as UTF-16 (unless you override the Encoding property). You're then taking the string from that StringWriter (which will start with <?xml version="1.0" encoding="UTF-16" ?>) and writing it to a StreamWriter which will default to UTF-8. That mixture of encodings is causing the problem.
The simplest approach would be to change your code to pass a Stream directly to the XmlWriter - a MemoryStream if you really want, or just requestStream. That way the XmlWriter can declare that it's using the exact encoding that it's actually writing the binary data in - you haven't got an intermediate step to mess things up.
Alternatively, you could create a subclass of StringWriter which allows you to specify the encoding. See this answer for some sample code.
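A minimal sketch of the first suggestion, assuming _parser.WriteRequest can be given an overload that accepts a Stream instead of a TextWriter:

public void WriteRequest(Stream output, Request request)
{
    var settings = new XmlWriterSettings
    {
        Encoding = new UTF8Encoding(false),   // UTF-8 without a BOM
        OmitXmlDeclaration = true
    };
    using (XmlWriter xmlWriter = XmlWriter.Create(output, settings))
    {
        // Same document as before, but the UTF-8 bytes go straight to the request stream;
        // there is no intermediate UTF-16 string to re-encode.
        xmlWriter.WriteStartDocument();
        xmlWriter.WriteStartElement("REQUEST");
        xmlWriter.WriteAttributeString("TYPE", "EXAMPLE");
        xmlWriter.WriteEndElement();
        xmlWriter.WriteEndDocument();
    }
}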
MonoTouch simply calls NSString.FromHandle on the value it receives from the call to UIDevice.CurrentDevice.Name, just like most strings are created from NSString inside all the bindings.
That should get you a string that you can see in MonoDevelop (no ?), so I can't rule out a bug.
Can you tell us exactly how the device is named? If so, please open a bug report and we'll check this possibility.
