C#: Change String Encoding?

I'm struggling with the encoding of one of my strings.
In a mail-sending web service, I'm receiving a bad string containing "�" instead of "é" (at least, that's what I see in Visual Studio's debug mode).
The character comes from some JSON that is deserialized into my DTO when the request enters the web service.
Changing the Content-Type of the JSON does not solve the problem.
So I thought I would change the encoding of the string myself, because the JSON encoding problem looks like a Visual Studio deserialization issue (I started a thread here if any of you want to take a look at it).
I tried:
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding defaultEncoding = Encoding.Default;
byte[] bytes = defaultEncoding.GetBytes(messedUpString);
byte[] isoBytes = Encoding.Convert(defaultEncoding, iso, bytes);
cleanString = iso.GetString(isoBytes);
Or:
byte[] bytes = Encoding.Default.GetBytes(messedUpString);
cleanString = Encoding.UTF8.GetString(bytes);
Neither is really effective. I do get rid of the "�" character, which is the nice part, but cleanString now contains "?" instead of the expected "é", which is not the behavior I want.
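The "?" result is what one would expect once "�" (U+FFFD, the Unicode replacement character) is already in the string: the original byte has been lost, and re-encoding cannot bring it back. The sketch below reproduces the effect in Java rather than C#, on the assumption that the bytes were ISO-8859-1 and were decoded as UTF-8 somewhere upstream. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the question.
final class ReplacementCharDemo {

    // Decode ISO-8859-1 bytes as if they were UTF-8 (the assumed upstream mistake).
    static String decodeWrongly(byte[] latin1Bytes) {
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    // Try to "repair" the damaged string by re-encoding it.
    static String attemptRepair(String damaged) {
        byte[] bytes = damaged.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1 is the bytes 63 61 66 E9.
        byte[] original = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // A lone E9 is not valid UTF-8, so it is decoded to U+FFFD.
        String damaged = decodeWrongly(original);
        System.out.println(damaged.equals("caf\uFFFD"));

        // U+FFFD has no ISO-8859-1 mapping, so it is encoded as '?'.
        System.out.println(attemptRepair(damaged));
    }
}
```

The practical consequence is that the fix has to happen where the bytes are first decoded, not on the string afterwards.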

In fact, everything was fine in my application.
I was using SoapUI to test, and that was where the error came from.
I downloaded a REST plugin for my browser, tried from there, and everything worked.
Thanks for the help though, @MattiVirkkunen.

Related

SendGrid inbound parse nordic chars

Completely stuck on a problem related to the inbound parse webhook functionality offered by SendGrid: https://sendgrid.com/docs/for-developers/parsing-email/setting-up-the-inbound-parse-webhook/
First off everything is working just fine with retrieving the mail sent to my application endpoint. Using Request.Form I'm able to retrieve the data and work with it.
The problem is that we started noticing question marks instead of letters when receiving some mails (written in Swedish, using Å, Ä and Ö). This occurred both with plain-text mails and with mails that have an HTML body.
However, this only happens every now and then. After a lot of searching I found out that when the mail is sent from e.g. Postbox or Outlook (or the like), and the application has the charset set to iso-8859-1, Å, Ä and Ö are replaced by question marks.
To replicate the error and be able to debug it, I set up an HTML page with a form using the iso-8859-1 encoding, sending a payload similar to the one seen in the link above (the default one). Since then I have tested a multitude of things trying to get it to work.
As of now I'm trying to re-encode the input, without success. The code I'm testing:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(Request.Form["html"]);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
This only results in utf8String containing the same "???" where Å, Ä and Ö should be. My guess is that Request.Form["html"] returns a (UTF-16) string that was already decoded with the wrong encoding, so the original iso-8859-1 bytes are gone by the time I try to convert it.
The method for fetching the POST is as follows
public async Task<InboundParseModel> FetchMail(IFormCollection form)
{
    // Copy the posted form fields into the model.
    InboundParseModel _em = new InboundParseModel
    {
        To = form["to"].SingleOrDefault(),
        From = form["from"].SingleOrDefault(),
        Subject = form["subject"].SingleOrDefault(),
        Html = form["html"].SingleOrDefault(),
        Text = System.Net.WebUtility.HtmlEncode(form["text"].SingleOrDefault()),
        Envelope = form["envelope"].SingleOrDefault()
    };
    return _em;
}
It is called from the method that receives the POST, as FetchMail(Request.Form);
Project info: ASP.NET Core 2.2, C#
So as stated earlier, I am completely stuck and don't really have any ideas on how to solve this. Any help would be much appreciated!

Send UTF-8 string from Android to C#

I've been trying to accomplish a simple text transmission from my Android app to my C# server (an asmx server), sending the simplest string, and for some reason it never works. My Android code is as follows (assume that the variable 'message' holds the string as received from an EditText, which is UTF-16 as far as I know):
httpClient = new DefaultHttpClient();
HttpPost post = new HttpPost(POST_MESSAGE_ADDRESS);
byte[] messageBytes = message.getBytes("utf-8");
builder.addPart("message", new StringBody(messageBytes.toString()));
HttpEntity entity = builder.build();
post.setEntity(entity);
HttpResponse response = httpClient.execute(post);
So I get something simple for my message, say a 10-byte array. On my server, I have a function mapped to that specific address; its code is:
string message = HttpContext.Current.Request.Form["message"];
byte[] test = System.Text.Encoding.UTF8.GetBytes(message);
Now after that line the byte array ('test') has the exact same value as the result of the ToString() function I called in the app. Question is, how do I convert it to normal UTF-8 text to display?
Note: I have tried sending the string normally as string content, but as far as I understand the default encoding is ASCII, so I got a lot of question marks.
Edit: I'm now looking for conversion solutions and trying them, but my question is also whether there's a simpler way to do this (perhaps a BinaryBody on the Android side, or a different encoding?).
The problem is in the following lines:
byte[] messageBytes = message.getBytes("utf-8");
builder.addPart("message", new StringBody(messageBytes.toString()));
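First, you encode your UTF-16 string message into the UTF-8 byte array messageBytes, but on the next line messageBytes.toString() does not decode those bytes back into text: in Java, calling toString() on an array returns its type and hash code (something like "[B@1a2b3c"), so that is what gets sent. On top of that, the StringBody constructor you are using defaults to ASCII encoding.
You should replace those lines with: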
builder.addPart("message", new StringBody(message, Charset.forName("UTF-8")));
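To see why the original code could never deliver the message text, here is a small plain-Java sketch (no Android or HttpClient classes) that compares toString() on a byte array with an explicit UTF-8 decode. The class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the asker's app.
final class ByteArrayToStringDemo {

    // What the original code effectively sent: the array's identity string.
    static String viaToString(String message) {
        byte[] utf8 = message.getBytes(StandardCharsets.UTF_8);
        return utf8.toString(); // "[B@" followed by a hash code that varies per run
    }

    // Decoding the bytes explicitly restores the original text.
    static String viaDecode(String message) {
        byte[] utf8 = message.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String message = "h\u00e9llo";
        System.out.println(viaToString(message)); // not the message text
        System.out.println(viaDecode(message).equals(message));
    }
}
```

Passing the string and the charset straight to the body, as in the line above, avoids the manual byte conversion altogether.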

Sending quotation marks in a GCM Payload (and other special characters that break syntax)

I'm struggling to find a feasible solution to this. I've tried looking around but can't find any documentation on the issue. If a customer sends out a message containing quotes, it breaks the payload syntax and I get back a 400 Bad Request error.
The only solution I can think of is doing my own translation and validation: allow only the basics, and do my own "parsing" for the restricted characters, i.e. replace a quote with "/q" before sending and replace "/q" back in the app on receipt. I don't like this solution because it puts logic in the app, and if I forget something I want to be able to change it immediately rather than update everyone's phone, app, etc.
I'm looking for an existing encoding I could apply that is processed correctly by the GCM servers, so that messages are accepted, broadcast, and received by the phone with the characters intact.
Base64 encoding should get rid of the special characters. Just encode the message before sending and decode it again on receiving.
Edit: sorry, I only have a Java/Android sample here; I don't know exactly how Xamarin works or what functions it provides:
// before sending
byte[] data = message.getBytes("UTF-8");
String base64Message = Base64.encodeToString(data, Base64.DEFAULT);
// on receiving
byte[] data = Base64.decode(base64Message, Base64.DEFAULT);
String message = new String(data, "UTF-8");
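For trying the round trip outside Android, the same idea can be sketched with the JDK's java.util.Base64 (Java 8+) instead of android.util.Base64. This is only an illustration under that assumption; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: uses java.util.Base64, not android.util.Base64.
final class Base64PayloadDemo {

    // Encode the message text so the payload only carries Base64 characters.
    static String encode(String message) {
        byte[] utf8 = message.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Reverse the encoding on the receiving side.
    static String decode(String base64Message) {
        byte[] utf8 = Base64.getDecoder().decode(base64Message);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "She said \"hello\" and left";
        String encoded = encode(original);
        System.out.println(encoded.contains("\""));           // no quotes left in the payload
        System.out.println(decode(encoded).equals(original)); // text survives the round trip
    }
}
```

The basic Base64 alphabet consists of letters, digits, "+", "/" and "=", so quotes and backslashes cannot appear in the encoded value.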
.NET translation of @tknell's solution:
Decode:
Byte[] data = System.Convert.FromBase64String(encodedString);
String decoded = System.Text.Encoding.UTF8.GetString(data);
Encode:
Byte[] data = System.Text.Encoding.UTF8.GetBytes(decodedString);
String encoded = System.Convert.ToBase64String(data);

Detect Stream or Byte Array Encoding on Windows Phone

I'm trying to read XML files that I download with WebClient.OpenReadAsync() in a Windows Phone application. The problem is that sometimes the XML doesn't come in UTF-8; it may come in other encodings such as "ISO-8859-1", so the accented characters get messed up.
I was able to load one of the ISO-8859-1 XML files perfectly using this code:
var buff = e.Result.ReadFully(); //Gets byte array from the stream
var resultBuff = Encoding.Convert(Encoding.GetEncoding("ISO-8859-1"), Encoding.UTF8, buff);
var result = Encoding.UTF8.GetString(resultBuff, 0, resultBuff.Length);
It works beautifully with ISO-8859-1 and the text comes out perfect, but it messes up the UTF-8 XML files.
So the idea is to detect the encoding of the byte array or the stream first; if it's not UTF-8, convert the data using the method above with the detected encoding.
I have been searching the internet for a method that can detect the encoding, but I cannot find any!
Does anybody know how I could do this kind of thing on Windows Phone?
Thanks!
You can look for the "Content-Type" value in the WebClient.ResponseHeaders property. If you are lucky, the server sets it to indicate the media type plus its encoding (e.g. "text/html; charset=ISO-8859-4").
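The header-based approach amounts to pulling the charset parameter out of the Content-Type value and falling back to a default when it is missing. Below is a rough sketch of that parsing step, written in Java rather than C# and deliberately simplified (it does not handle every legal header form). The class name, method names and the UTF-8 fallback are assumptions made for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class ContentTypeCharset {

    // Return the charset named in a Content-Type value, or the fallback if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decode a response body with the declared charset, assuming UTF-8 otherwise.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] body = {(byte) 0xE9}; // "é" in ISO-8859-1
        System.out.println(decode(body, "text/xml; charset=ISO-8859-1"));
    }
}
```

If the server sends no charset at all, the XML declaration at the top of the document (encoding="...") is the next place to look.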

How do I get this encoding right with ANTLR?

I'm working on a project for school. We are making a static code analyzer.
One requirement is to analyse C# code in Java, which has been going well so far with ANTLR.
I wrote some example C# code in Visual Studio to scan with ANTLR, and I analyse every C# file in the solution. But it does not work. I am getting a memory leak and this error message:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.antlr.runtime.Lexer.emit(Lexer.java:151)
at org.antlr.runtime.Lexer.nextToken(Lexer.java:86)
at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
After a while I thought it was an issue with encoding, because all the files are in UTF-8. I think it can't read the encoded stream. So I opened Notepad++ and changed the encoding of every file to ANSI, and then it worked. I don't really understand what ANSI means: is it a character set or some kind of organisation?
I want to convert from any encoding (probably UTF-8) to this ANSI encoding so I won't get memory leaks anymore.
This is the code that makes the Lexer and Parser:
InputStream inputStream = new FileInputStream(new File(filePath));
CharStream charStream = new ANTLRInputStream(inputStream);
CSharpLexer cSharpLexer = new CSharpLexer(charStream);
CommonTokenStream commonTokenStream = new CommonTokenStream(cSharpLexer);
CSharpParser cSharpParser = new CSharpParser(commonTokenStream);
Does anyone know how to change the encoding of the InputStream to the right encoding?
And what does Notepad++ do when I change the encoding to ANSI?
When reading text files you should set the encoding explicitly. Try your example with the following change:
CharStream charStream = new ANTLRInputStream(inputStream, "UTF-8");
I solved this issue by wrapping the InputStream in a BufferedInputStream and then removing the byte order mark.
I guess my parser didn't like that encoding; I had also tried setting the encoding explicitly.
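The accepted fix is described only in words, so here is one way the byte order mark removal might look. It is a sketch, not the asker's actual code: it uses a PushbackInputStream instead of the buffered stream mentioned above, handles only the UTF-8 BOM (bytes EF BB BF), and all class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only.
final class BomSkipper {

    // Return a stream positioned after a leading UTF-8 BOM, if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) { // read up to three bytes, tolerating short reads
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pushback;
    }

    // Convenience wrapper for in-memory data, mainly for trying the helper out.
    static byte[] stripUtf8Bom(byte[] data) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'c', 's'};
        System.out.println(stripUtf8Bom(withBom).length); // BOM bytes are dropped
    }
}
```

With a helper like this, the stream returned by skipUtf8Bom(inputStream) would be passed to new ANTLRInputStream(..., "UTF-8") in place of the raw FileInputStream.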
