Read JSON file containing umlauts in C#


I am trying to read a JSON file containing umlauts in C#. The file has the following format:
{
"BankCodeOertlich": "59000000",
"BicOertlich": "",
"ErgaenzungName": "Außenst. Sulzbach",
"HauptstelleAussenstellen": "Außenstelle v. Finanzamt Saarbrücken Am"
}
I am using the following code to read the JSON in C#:
public static List<T> Load<T>(string filePath)
{
    using (var stream = File.OpenRead(filePath))
    {
        var reader = new StreamReader(stream, Encoding.UTF8);
        List<T> data = JsonConvert.DeserializeObject<List<T>>(reader.ReadToEnd());
        return data;
    }
}
I am getting the following output:
{
"BankCodeOertlich": "59000000",
"BicOertlich": "",
"ErgaenzungName": "Au?enst. Sulzbach",
"HauptstelleAussenstellen": "Au?enstelle v. Finanzamt Saarbr?cken Am"
}
This is just an example, not the actual output. I tried changing the encoding of the StreamReader, but it is not working. Is there a better way to do it?

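The file is not encoded in UTF-8; try Encoding.GetEncoding("iso-8859-1") instead. In ISO-8859-1 (Latin-1), characters such as ü and ß are single bytes (0xFC and 0xDF) that are not valid UTF-8 on their own, so the UTF-8 decoder replaces them with U+FFFD, which is displayed as ?.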
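If the encoding of incoming files is not known in advance, one option is to try strict UTF-8 first and fall back to ISO-8859-1 only when the bytes are not valid UTF-8. The sketch below assumes those are the only two encodings you receive; the class and method names (TextFileDecoding, DecodeUtf8OrLatin1, ReadAllTextUtf8OrLatin1) are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileDecoding
{
    // UTF-8 decoder that throws DecoderFallbackException on invalid bytes
    // instead of silently substituting U+FFFD (shown as '?').
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // ISO-8859-1 maps every byte 0x00-0xFF to the code point of the same value,
    // so decoding with it never fails.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Decodes bytes as UTF-8 if they are valid UTF-8, otherwise as ISO-8859-1.
    public static string DecodeUtf8OrLatin1(byte[] bytes)
    {
        // Skip a UTF-8 BOM (EF BB BF) if present; GetString would otherwise keep it as U+FEFF.
        int offset = bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF ? 3 : 0;
        try
        {
            return StrictUtf8.GetString(bytes, offset, bytes.Length - offset);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume the file was saved as ISO-8859-1.
            return Latin1.GetString(bytes);
        }
    }

    // Reads a whole file using the fallback above.
    public static string ReadAllTextUtf8OrLatin1(string filePath) =>
        DecodeUtf8OrLatin1(File.ReadAllBytes(filePath));
}
```

In Load<T>, you could then pass TextFileDecoding.ReadAllTextUtf8OrLatin1(filePath) to JsonConvert.DeserializeObject<List<T>>(...) instead of reader.ReadToEnd(). This is a heuristic: a Latin-1 file whose non-ASCII bytes happen to form valid UTF-8 sequences would be misread, although that is unlikely for ordinary German text. If the files are always ISO-8859-1, passing Encoding.GetEncoding("iso-8859-1") to the StreamReader is simpler.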

Related

How to read uploaded CSV UTF-8 for processing with CsvHelper?

My WebAPI allows a user to upload a CSV file and then parses the file. I use CsvHelper to do the heavy lifting of reading the CSV and mapping it to domain objects.
However, I have one customer whose files are in CSV UTF-8 format. The code that works for "vanilla" (ASCII) CSV files fails when it tries to deal with CSV UTF-8.
Is there a way to import the CSV UTF-8 data and convert it to ASCII CSV so that my code will continue to work?
My current code looks like this:
//In my WebAPI Controller
//fileToProcess is IFormFile
byte[] fileBytes;
using (var stream = fileToProcess.OpenReadStream())
using (var buffer = new MemoryStream())
{
    // ReadAsync may return before filling the array, so copy the whole stream instead
    await stream.CopyToAsync(buffer);
    fileBytes = buffer.ToArray();
}
var result = await ProcessFileAsync(fileBytes);
return Ok(result);
...
//In a Parsing Class
public async Task<List<Client>> ProcessFileAsync(byte[] fileBytes)
{
    List<Client> result = null;
    var fileText = Encoding.Default.GetString(fileBytes);
    using (var reader = new StringReader(fileText))
    {
        using (var csv = new CsvReader(reader))
        {
            csv.RegisterClassMap<ClientMap>();
            result = csv.GetRecords<Client>().ToList();
            await PostProcess(result);
        }
    }
    return result;
}
The problem is that CSV UTF-8 files start with a byte order mark (BOM), so when CsvHelper processes a mapping that references the first column header
Map(c => c.ClientId).Name("CLIENT ID");
it fails because the column name includes the BOM.
So, my questions are:
How can I tell whether the incoming file is UTF-8 or ASCII?
How do I convert the UTF-8 to ASCII so it can be processed normally?
NOTE
I did try the following:
fileBytes = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, fileBytes);
However, this replaced the BOM with a ? which still causes CsvHelper to fail.

By doing this:
var fileText = Encoding.Default.GetString(fileBytes);
using(var reader = new StringReader(fileText))
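... you're locking yourself into a specific encoding at the point of converting the bytes to a string. Encoding.Default can vary by platform and CLR implementation: on .NET Framework it is the system's ANSI code page (for example Windows-1252), while on .NET Core it is always UTF-8. Encoding.GetString also never strips a byte order mark, so even when the encoding is right, the BOM ends up as the first character of the string (U+FEFF with UTF-8, or "ï»¿" with Windows-1252).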
The StreamReader class is designed to read text from a stream (which you can wrap around the raw bytes with a MemoryStream) and can detect the encoding for you from the byte order mark, which the StreamReader(Stream) constructor does by default. Try this instead:
using (var stream = new MemoryStream(fileBytes))
using (var reader = new StreamReader(stream))
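To see why the header stops matching, the sketch below decodes the same UTF-8-with-BOM bytes both ways: once with Encoding.UTF8.GetString (like the question's code) and once through a StreamReader over a MemoryStream (like the suggestion above). The class and method names (BomDemo, Utf8WithBom, DecodeWithGetString, DecodeWithStreamReader) are hypothetical and only for illustration.

```csharp
using System.IO;
using System.Text;

public static class BomDemo
{
    // Builds UTF-8 bytes for the text, preceded by the UTF-8 BOM (EF BB BF).
    public static byte[] Utf8WithBom(string text)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes(text);
        byte[] result = new byte[preamble.Length + body.Length];
        preamble.CopyTo(result, 0);
        body.CopyTo(result, preamble.Length);
        return result;
    }

    // Encoding.GetString decodes the BOM too, leaving U+FEFF as the first character.
    public static string DecodeWithGetString(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    // StreamReader detects the BOM (detectEncodingFromByteOrderMarks defaults to true) and strips it.
    public static string DecodeWithStreamReader(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

DecodeWithGetString(Utf8WithBom("CLIENT ID,NAME")) starts with '\uFEFF', so a mapping on "CLIENT ID" would not find that column, while DecodeWithStreamReader returns "CLIENT ID,NAME" unchanged.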
In your case, you could use the incoming stream directly by changing ProcessFileAsync to accept the stream.
using (var stream = fileToProcess.OpenReadStream())
{
    var result = await ProcessFileAsync(stream);
    return Ok(result);
}
public async Task<List<Client>> ProcessFileAsync(Stream stream)
{
    using (var reader = new StreamReader(stream))
    {
        using (var csv = new CsvReader(reader))
        {
            csv.RegisterClassMap<ClientMap>();
            List<Client> result = csv.GetRecords<Client>().ToList();
            await PostProcess(result);
            return result;
        }
    }
}
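As long as the BOM is present, this also supports UTF-16 and UTF-32 files (little- or big-endian), because StreamReader recognizes the U+FEFF byte order mark in each of those encodings and strips it. If there is no BOM, StreamReader falls back to UTF-8, which also covers plain ASCII files, since ASCII is a subset of UTF-8. So there is no need to convert anything to ASCII.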
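The BOM detection can be checked with a small round trip: StreamWriter writes the encoding's BOM at the start of an empty stream, and a default StreamReader reports the encoding it detected through CurrentEncoding. The class and method names below (BomDetectionDemo, WriteWithBom, ReadDetected) are hypothetical; the sketch only uses standard System.IO and System.Text types.

```csharp
using System.IO;
using System.Text;

public static class BomDetectionDemo
{
    // Writes text with the given encoding; StreamWriter emits encoding.GetPreamble()
    // (the BOM) before the first write because the stream is at position 0.
    public static byte[] WriteWithBom(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding, 1024, leaveOpen: true))
            {
                writer.Write(text);
            }
            return stream.ToArray();
        }
    }

    // Reads the bytes with a default StreamReader and returns the decoded text
    // plus the code page of the encoding it detected from the BOM.
    public static (string Text, int CodePage) ReadDetected(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding only reflects the detected encoding after the first read.
            return (text, reader.CurrentEncoding.CodePage);
        }
    }
}
```

For Encoding.Unicode (UTF-16 LE), Encoding.BigEndianUnicode, Encoding.UTF32 and Encoding.UTF8, ReadDetected returns the original text and the code page of the encoding that was used to write it.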

Convert byte[] to excel file (xlsx)

I need to convert a byte array into an Excel file using C# to upload it to SharePoint.
The following code reads an input file from the client as a byte array:
public object UploadFile(HttpPostedFile file)
{
    byte[] fileData = null;
    using (var binaryReader = new BinaryReader(file.InputStream))
    {
        fileData = binaryReader.ReadBytes(file.ContentLength);
        // convert fileData to excel
    }
}
How can I do it?

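An uploaded .xlsx file is already an Excel file, so no conversion is needed: it sounds like you're just after File.WriteAllBytes(path, contents). However, if the input file could be large, you may be better off using the Stream API and copying it straight to disk: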
using (var destination = File.Create(path))
{
    file.InputStream.CopyTo(destination);
}
Edit: it looks like HttpPostedFile has a SaveAs method, so just:
file.SaveAs(path);
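HttpPostedFile only exists in System.Web (classic ASP.NET on .NET Framework), so the sketch below is written against a plain Stream and byte[] instead; in the question's code you would pass file.InputStream or fileData. The class and method names (UploadSaving, SaveStream, SaveBytes) are hypothetical.

```csharp
using System.IO;

public static class UploadSaving
{
    // Copies an upload stream straight to disk without buffering the whole file in memory.
    // File.Create truncates the target if it already exists.
    public static void SaveStream(Stream input, string path)
    {
        using (var destination = File.Create(path))
        {
            input.CopyTo(destination);
        }
    }

    // Equivalent for data that is already in a byte array (e.g. fileData in the question).
    public static void SaveBytes(byte[] data, string path) => File.WriteAllBytes(path, data);
}
```

Because .xlsx is a ZIP-based Office Open XML package, the saved bytes open in Excel as-is, provided the client really uploaded an .xlsx file.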

Convert an IFormFile (JSON file) to MyObject

I upload a JSON file with an HTML form as explained here in the first paragraph. I accept only one file at a time, so this is my controller:
public IActionResult Upload(IFormFile file)
{
}
Now I want to convert the file containing JSON to an object, just like this accepted answer of Cuong Le. How do I convert the file to, let's say, MyObject? How do I deserialize the file?
(Newtonsoft is the library to import, right?)

You can read the text from the file and then deserialize it from JSON. You can try something like:
string fileContent = null;
using (var reader = new StreamReader(file.OpenReadStream()))
{
    fileContent = reader.ReadToEnd();
}
var result = JsonConvert.DeserializeObject<MyObject>(fileContent);
Yes, you can use the Newtonsoft.Json NuGet package for deserializing.
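As an alternative, on ASP.NET Core 3.0 or later the built-in System.Text.Json serializer can deserialize straight from the upload stream without reading it into a string first. This swaps in a different library from the Newtonsoft.Json answer; the MyObject properties and the JsonUpload name are hypothetical placeholders.

```csharp
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical target type; replace with the real MyObject.
public class MyObject
{
    public string Name { get; set; } = "";
    public int Count { get; set; }
}

public static class JsonUpload
{
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        // Match JSON "name" to the Name property; System.Text.Json is case-sensitive by default.
        PropertyNameCaseInsensitive = true
    };

    // Deserializes directly from a stream such as IFormFile.OpenReadStream().
    // The stream is expected to contain UTF-8 JSON.
    public static async Task<T> DeserializeAsync<T>(Stream stream)
    {
        return await JsonSerializer.DeserializeAsync<T>(stream, Options);
    }
}
```

In the controller this would be called as await JsonUpload.DeserializeAsync<MyObject>(file.OpenReadStream()), with the action made async. Unlike the StreamReader approach, System.Text.Json reads the stream as UTF-8, so files in other encodings would need to be converted first.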

JSON.NET writing invalid JSON?

It appears that JSON.NET is writing invalid JSON, although I wouldn't be surprised if it was due to my misuse.
It appears to be repeating the last few characters of the JSON:
/* ... */ "Teaser":"\nfoo.\n","Title":"bar","ImageSrc":null,"Nid":44462,"Vid":17}]}4462,"Vid":17}]}
The repeating string is:
4462,"Vid":17}]}
I printed it out to the console, so I don't think this is a bug in Visual Studio's text visualizer.
The serialization code:
static IDictionary<int, ObservableCollection<Story>> _sectionStories;
private static void writeToFile()
{
    IsolatedStorageFile storage = IsolatedStorageFile.GetUserStoreForApplication();
    using (IsolatedStorageFileStream stream = storage.OpenFile(STORIES_FILE, FileMode.OpenOrCreate))
    {
        using (StreamWriter writer = new StreamWriter(stream))
        {
            writer.Write(JsonConvert.SerializeObject(_sectionStories));
        }
    }
#if DEBUG
    StreamReader reader = new StreamReader(storage.OpenFile(STORIES_FILE, FileMode.Open));
    string contents = reader.ReadToEnd();
    JObject data = JObject.Parse(contents);
    string result = "";
    foreach (char c in contents.Skip(contents.Length - 20))
    {
        result += c;
    }
    Debug.WriteLine(result);
    // crashes here with ArgumentException
    // perhaps because JSON is invalid?
    var foo = JsonConvert.DeserializeObject<Dictionary<int, List<Story>>>(contents);
#endif
}
Am I doing something wrong here? Or is this a bug? Are there any known workarounds?
Curiously, JObject.Parse() doesn't throw any errors.
I'm building a Silverlight app for Windows Phone 7.

When writing the file you specify
FileMode.OpenOrCreate
If the file already exists and is longer than the data you intend to write (here by 16 bytes, the length of the repeated 4462,"Vid":17}]}), OpenOrCreate opens it without truncating it, so the tail of the older data is still present after your new data has been written. Your older version just happened to end with the same characters, which is why the output looks like a repeat.
Solution:
FileMode.Create
From:
http://msdn.microsoft.com/en-us/library/system.io.filemode.aspx
FileMode.Create: Specifies that the operating system should create a new file. If the file already exists, it will be overwritten
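The effect is easy to reproduce outside isolated storage. The sketch below, using a hypothetical FileModeDemo helper and a temporary file, writes a long JSON string, then a shorter one with FileMode.OpenOrCreate, then the shorter one again with FileMode.Create.

```csharp
using System.IO;

public static class FileModeDemo
{
    // Writes text to path using the given FileMode and returns the file's full contents afterwards.
    public static string WriteThenRead(string path, string text, FileMode mode)
    {
        using (var stream = new FileStream(path, mode, FileAccess.Write))
        using (var writer = new StreamWriter(stream))
        {
            writer.Write(text);
        }
        return File.ReadAllText(path);
    }
}
```

After writing {"Nid":44462,"Vid":17} and then {"Nid":1} with OpenOrCreate, the file contains {"Nid":1}462,"Vid":17}, the same kind of trailing garbage as in the question. Writing {"Nid":1} with Create leaves only {"Nid":1}, because Create truncates an existing file.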

.NET: Reading/writing binary string

EDIT: Don't answer; I've found the solution on my own.
I have some code that does this:
using (var stream = new FileStream(args[1], FileMode.Create))
{
    using (var writer = new BinaryWriter(stream))
    {
        writer.Write(ip.Iso3166CountryCode);
        ...
    }
}
Iso3166CountryCode is a string with two characters ("US").
When I try to read "US" from the file:
// line is a byte[] from the file with the first 1024 bytes
UnicodeEncoding.Default.GetString(line.Take(2).ToArray());
I don't get "US" back; I get some odd ASCII characters instead. How do I read the two country-code characters from this binary file?
EDIT: NEVER MIND. I changed writer.Write(ip.Iso3166CountryCode) to writer.Write(UnicodeEncoding.Default.GetBytes(ip.Iso3166CountryCode)) and it works.

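Try changing writer.Write(ip.Iso3166CountryCode) to writer.Write(UnicodeEncoding.Default.GetBytes(ip.Iso3166CountryCode)). BinaryWriter.Write(string) writes a length-prefixed string: the byte length encoded seven bits at a time (a single byte 0x02 for "US"), followed by the encoded characters (UTF-8 by default). Reading the first two bytes therefore gives 0x02 and 'U' rather than "US". Writing the byte array instead stores only the raw bytes. Alternatively, keep writer.Write(string) and read the value back with BinaryReader.ReadString(), which consumes the prefix for you.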
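The sketch below, with hypothetical names (BinaryStringDemo, WriteLengthPrefixed, ReadLengthPrefixed), shows the bytes BinaryWriter.Write(string) produces for "US" and reads them back with BinaryReader.ReadString(). It uses an in-memory stream instead of the question's file.

```csharp
using System.IO;
using System.Text;

public static class BinaryStringDemo
{
    // Returns the raw bytes BinaryWriter produces for Write(string).
    public static byte[] WriteLengthPrefixed(string value)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
            {
                // Writes a 7-bit-encoded byte-length prefix, then the string's UTF-8 bytes.
                writer.Write(value);
            }
            return stream.ToArray();
        }
    }

    // Reads the value back the way it was written, letting BinaryReader consume the prefix.
    public static string ReadLengthPrefixed(byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            return reader.ReadString();
        }
    }
}
```

WriteLengthPrefixed("US") returns the three bytes 0x02, 0x55, 0x53: the length prefix followed by "US" in UTF-8, which is why taking the first two bytes of the file does not give "US".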
