Can't convert HttpResponseMessage with UTF8 encoding - c#

I'm struggling with the usual encoding conversion issue, but I haven't been able to find anything that covers my specific problem.
My app receives a System.Net.Http.HttpResponseMessage from a PHP server. It is UTF-8 encoded and contains characters like \u00c3\u00a0 (à), which I'm not able to convert.
string message = await result.Content.ReadAsStringAsync();
byte[] messageBytes = Encoding.UTF8.GetBytes(message);
string newmessage = Encoding.UTF8.GetString(messageBytes, 0, messageBytes.Length);
This is just one of my attempts, but nothing happens: the resulting string still contains the \u00c3\u00a0 characters.
I have also read some answers, such as "How to convert a UTF-8 string into Unicode?", but that solution doesn't work for me. This is the solution code:
public static string DecodeFromUtf8(this string utf8String)
{
    // copy the string as UTF-8 bytes.
    byte[] utf8Bytes = new byte[utf8String.Length];
    for (int i = 0; i < utf8String.Length; ++i) {
        //Debug.Assert( 0 <= utf8String[i] && utf8String[i] <= 255, "the char must be in byte's range");
        utf8Bytes[i] = (byte)utf8String[i];
    }
    return Encoding.UTF8.GetString(utf8Bytes, 0, utf8Bytes.Length);
}
DecodeFromUtf8("d\u00C3\u00A9j\u00C3\u00A0"); // déjà
I have noticed that when I try the above solution with a simple string like
string str = "Comunit\u00c3\u00a0";
the DecodeFromUtf8 method works perfectly; the problem only appears when I use my response message.
Any advice would be much appreciated.

I've solved this problem myself. I discovered that the server response was an ISO-8859-1 string containing UTF-8 JSON, so I had to unescape the JSON escape sequences and then convert the ISO-8859-1 text into UTF-8.
So I had to do the following:
private async Task<string> ResponseMessageAsync(HttpResponseMessage result)
{
    string message = await result.Content.ReadAsStringAsync();
    // Turn literal \uXXXX escape sequences into the characters they denote.
    string parsedString = Regex.Unescape(message);
    // Each character is now one byte of the original UTF-8 data; recover those bytes.
    byte[] isoBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(parsedString);
    return Encoding.UTF8.GetString(isoBytes, 0, isoBytes.Length);
}
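To see why this two-step fix works, here is a minimal, self-contained sketch. It assumes the response body arrives with literal \uXXXX escape sequences, as described above; the names MojibakeSketch and Repair are made up for illustration. Regex.Unescape turns the escapes into the characters Ã and a non-breaking space, and re-encoding those as ISO-8859-1 recovers the original UTF-8 bytes 0xC3 0xA0, which decode to à.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class MojibakeSketch
{
    // Hypothetical helper: undoes "\uXXXX" escapes, then reinterprets the
    // resulting Latin-1 characters as the UTF-8 bytes they originally were.
    public static string Repair(string escaped)
    {
        string unescaped = Regex.Unescape(escaped);
        byte[] rawBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(unescaped);
        return Encoding.UTF8.GetString(rawBytes);
    }

    public static void Main()
    {
        // Verbatim string: the backslashes are literal, as in the raw response.
        string body = @"Comunit\u00c3\u00a0";
        Console.WriteLine(Repair(body) == "Comunit\u00e0"); // True
    }
}
```

Note that Regex.Unescape is a regular-expression helper rather than a JSON parser, so for real responses a JSON library may be the safer way to undo the escaping.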

What worked for me was changing from:
string message = await result.Content.ReadAsStringAsync();
byte[] messageBytes = Encoding.UTF8.GetBytes(message);
string newmessage = Encoding.UTF8.GetString(messageBytes, 0, messageBytes.Length);
to:
byte[] bytes = await result.Content.ReadAsByteArrayAsync();
Encoding utf8 = Encoding.UTF8;
string newmessage = utf8.GetString(bytes);

Related

Websocket having Error: Frame must be terminated with a null octet while using Cyrillic instead of english

I am having a problem sending Cyrillic (Russian) letters instead of English ones to the server (Java Spring Boot, UTF-8). My example frames are below. The one with English text works fine, but for the Cyrillic one the position of the null octet is calculated incorrectly. I am using websocket-csharp-net-stomp-client.
I have also tried changing the encoding of the message string to UTF-8.
The one that works:
The one that does not work:
public static string SendMessage(string messageText, string chatID)
{
    Encoding utf16 = Encoding.GetEncoding("utf-16"); // also tried encoding with 1251 instead of utf-16
    Encoding utf8 = Encoding.UTF8;
    byte[] utf8Bytes = utf8.GetBytes(messageText);
    byte[] utf16Bytes = Encoding.Convert(utf8, utf16, utf8Bytes);
    string msg = utf16.GetString(utf16Bytes); // was isoBytes, which is not defined anywhere
    StompMessageSerializer serializer = new StompMessageSerializer();
    var content = new MessageContent() { text = msg };
    var broad = new StompMessage("SEND", JsonConvert.SerializeObject(content));
    broad["token"] = $"{Global.AuthCompTokenFinal}";
    broad["contentType"] = "application/json";
    broad["destination"] = $"/app/send/{chatID}";
    var str = serializer.Serialize(broad);
    Console.WriteLine(str);
    Global.ws.Send(str);
    return str;
}
The content length is set here (library code):
internal StompMessage(string command, string body, Dictionary<string, string> headers)
{
    stompCommand = command;
    Body = body;
    nativeHeaders = headers;
    this["content-length"] = body.Length.ToString();
}
What am I missing here?
Here is an error example:
I just deleted this["content-length"] = body.Length.ToString(); and the encoding to UTF-8,
and it works fine. Wow.
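A likely explanation for why removing that line helps, offered as a hedged sketch: the STOMP content-length header counts octets in the body, whereas body.Length counts UTF-16 code units. For ASCII text the two agree, but each Cyrillic letter takes two bytes in UTF-8, so the header was too small and the server looked for the terminating null octet too early. The snippet below only illustrates the mismatch; the class name and the sample JSON strings are made up.

```csharp
using System;
using System.Text;

public static class ContentLengthSketch
{
    // Number of octets the body occupies once encoded as UTF-8.
    public static int ByteLength(string body)
    {
        return Encoding.UTF8.GetByteCount(body);
    }

    public static void Main()
    {
        string english = "{\"text\":\"hello\"}";
        // "привет" written with escapes so the source file encoding does not matter.
        string cyrillic = "{\"text\":\"\u043f\u0440\u0438\u0432\u0435\u0442\"}";

        Console.WriteLine(english.Length == ByteLength(english)); // True
        Console.WriteLine(cyrillic.Length);      // 17 UTF-16 code units
        Console.WriteLine(ByteLength(cyrillic)); // 23 bytes in UTF-8
    }
}
```

If the header is kept rather than deleted, Encoding.UTF8.GetByteCount(body) would be the value to put in it, assuming the frame is sent as UTF-8.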

ASP.NET SOAP Webservice, Encode Problem in Exception

Here is my problem: I'm trying to encode the response of my web service with the following code.
public static string ConvertToUTF8(string Cadena)
{
    string mensajeex = Cadena;
    Encoding utf8 = Encoding.UTF8;
    Encoding unicode = Encoding.Unicode;
    // Convert the string into a byte array.
    byte[] unicodeBytes = unicode.GetBytes(mensajeex);
    // Perform the conversion from one encoding to the other.
    byte[] asciiBytes = Encoding.Convert(unicode, utf8, unicodeBytes);
    // Convert the new byte[] into a char[] and then into a string.
    char[] asciiChars = new char[utf8.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
    utf8.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
    string Utf8string = new string(asciiChars);
    // Display the strings created before and after the conversion.
    Console.WriteLine("Original string: {0}", mensajeex);
    Console.WriteLine("Ascii converted string: {0}", Utf8string);
    return Utf8string;
}
And it actually works! But when I encode a string and then pass it to an exception as the Message property, like this
throw new Exception(XMLHelper.ConvertToUTF8(Message));
it gives me a wrong response, like:
El valor 'R' no es válido seg&#250
Any ideas? Thanks

Encode and Decode in c# asp.net?

I am using the following encoding and decoding:
For Encoding:
private string EncodeServerName(string ServerName)
{
    byte[] NameEncodein = new byte[ServerName.Length];
    NameEncodein = System.Text.Encoding.UTF8.GetBytes(ServerName);
    string EcodedName = Convert.ToBase64String(NameEncodein);
    return EcodedName;
}
and Decoding:
public string DecoAndGetServerName(string Servername)
{
    System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
    System.Text.Decoder strDecoder = encoder.GetDecoder();
    byte[] to_DecodeByte = Convert.FromBase64String(Servername);
    int charCount = strDecoder.GetCharCount(to_DecodeByte, 0, to_DecodeByte.Length);
    char[] decoded_char = new char[charCount];
    strDecoder.GetChars(to_DecodeByte, 0, to_DecodeByte.Length, decoded_char, 0);
    string Name = new string(decoded_char);
    return Name;
}
I am sending ServerName: DEV-SQL1\SQL2008
It is encoded as: REVWLVNRTDFcU1FMMjAwOA==
When I try to decode it again, I get an exception on this line:
byte[] to_DecodeByte = Convert.FromBase64String(Servername);
The exception is:
`The input is not a valid Base-64 string as it contains a non-base 64 character,
more than two padding characters, or a non-white space character among the padding characters.`
How can I solve this issue?
Please help me.
Your code seems way too complex :-). Here is a version that works:
public static string EncodeServerName(string serverName)
{
    return Convert.ToBase64String(Encoding.UTF8.GetBytes(serverName));
}
public static string DecodeServerName(string encodedServername)
{
    return Encoding.UTF8.GetString(Convert.FromBase64String(encodedServername));
}
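As a quick sanity check, the sketch below runs the server name from the question through the same two one-liners, wrapped in a hypothetical Base64Sketch class. The encoded value matches the one in the question, so a decoding failure would suggest that the string was altered somewhere between encoding and decoding rather than a problem in the helpers themselves.

```csharp
using System;
using System.Text;

public static class Base64Sketch
{
    public static string EncodeServerName(string serverName)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(serverName));
    }

    public static string DecodeServerName(string encodedServerName)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(encodedServerName));
    }

    public static void Main()
    {
        string original = @"DEV-SQL1\SQL2008";
        string encoded = EncodeServerName(original);

        Console.WriteLine(encoded);                               // REVWLVNRTDFcU1FMMjAwOA==
        Console.WriteLine(DecodeServerName(encoded) == original); // True
    }
}
```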
The same code that you wrote in DecoAndGetServerName() works for me.
The thing is, you need to pass an encoded string to your DecoAndGetServerName() function,
for example one produced like this:
string Servername=Convert.ToBase64String(Encoding.UTF8.GetBytes("serverName"));
That's why you got the error "The input is not a valid Base-64 string as it contains a non-base 64 character, ...".

C#: Why does getting HTML return random junk characters?

For example, this code:
const String nick = "Alex";
const String log = "http://demonscity.combats.com/zayavka.pl?logs=";
foreach (DateTime cd in dateRange)
{
    string str = log + String.Format("{0:MM_dd_yy}", cd.Date) + "&filter=" + nick;
    String htmlCode = wc.DownloadString(str);
}
returns something...."‹\b\0\0\0\0\0\0я•XYsЫЦ~зЇёѕ™d)bг.тBҐ$ЪRЖ’<2УN&сh#р ’„\f\0J–—_Фџђ§¤нt¦г6ќѕУЄђ0’IQtТґcµо№X(jі-Щ/Ђі|g?`yҐ¶ц"
Other links work fine.
I think the problem is with the code page. How can I fix it? Or is it a server problem?
The issue is that the response is GZip-compressed (response has a Content-Encoding: gzip header). You need to first decompress it, then you'll be able to read it:
public class StackOverflow_6660689
{
    public static void Test()
    {
        WebClient wc = new WebClient();
        Encoding encoding = Encoding.GetEncoding("windows-1251");
        byte[] data = wc.DownloadData("http://demonscity.combats.com/zayavka.pl?logs=08_07_11&filter=Alex");
        GZipStream gzip = new GZipStream(new MemoryStream(data), CompressionMode.Decompress);
        MemoryStream decompressed = new MemoryStream();
        gzip.CopyTo(decompressed);
        string str = encoding.GetString(decompressed.GetBuffer(), 0, (int)decompressed.Length);
        Console.WriteLine(str);
    }
}
I think it is returning the result in gzip format, which it should not do unless the client explicitly accepts that format.
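The junk in the question starts with "‹\b", which is what the gzip signature bytes 0x1F 0x8B 0x08 look like when read as windows-1251 text. Building on that, here is a hedged, self-contained sketch that checks for the signature before decompressing, so it also works for uncompressed responses. The names GzipSketch, LooksGzipped, DecodeBody and Compress are made up for illustration, and the demo uses UTF-8 on in-memory data rather than the real URL.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipSketch
{
    // gzip streams start with the two magic bytes 0x1F 0x8B.
    public static bool LooksGzipped(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Decompresses the body if it is gzipped, then decodes it as text.
    public static string DecodeBody(byte[] data, Encoding encoding)
    {
        if (!LooksGzipped(data))
            return encoding.GetString(data);

        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return encoding.GetString(output.ToArray());
        }
    }

    // Demo-only helper that produces gzipped bytes to feed into DecodeBody.
    public static byte[] Compress(string text, Encoding encoding)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = encoding.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        byte[] compressed = Compress("hello", Encoding.UTF8);
        Console.WriteLine(LooksGzipped(compressed));              // True
        Console.WriteLine(DecodeBody(compressed, Encoding.UTF8)); // hello
    }
}
```

For the page in the question, the encoding argument would be windows-1251, as in the answer above; on newer .NET versions that code page may first need to be registered through CodePagesEncodingProvider. Another option is to set HttpWebRequest.AutomaticDecompression so the framework decompresses the response itself.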

Text Decoding Problem

So given this input string:
=?ISO-8859-1?Q?TEST=2C_This_Is_A_Test_of_Some_Encoding=AE?=
And this function:
private string DecodeSubject(string input)
{
    StringBuilder sb = new StringBuilder();
    MatchCollection matches = Regex.Matches(inputText.Text, @"=\?(?<encoding>[\S]+)\?.\?(?<data>[\S]+[=]*)\?=");
    foreach (Match m in matches)
    {
        string encoding = m.Groups["encoding"].Value;
        string data = m.Groups["data"].Value;
        Encoding enc = Encoding.GetEncoding(encoding.ToLower());
        if (enc == Encoding.UTF8)
        {
            byte[] d = Convert.FromBase64String(data);
            sb.Append(Encoding.ASCII.GetString(d));
        }
        else
        {
            byte[] bytes = Encoding.Default.GetBytes(data);
            string decoded = enc.GetString(bytes);
            sb.Append(decoded);
        }
    }
    return sb.ToString();
}
The result is the same as the data extracted from the input string. What am I doing wrong that this text is not getting decoded properly?
UPDATE
So I have this code for decoding quoted-printable:
public string DecodeQuotedPrintable(string encoded)
{
    byte[] buffer = new byte[1];
    return Regex.Replace(encoded, "=(\r\n?|\n)|=([A-F0-9]{2})", delegate(Match m)
    {
        if (byte.TryParse(m.Groups[2].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out buffer[0]))
        {
            return Encoding.ASCII.GetString(buffer);
        }
        else
        {
            return string.Empty;
        }
    });
}
And that just leaves the underscores. Do I manually convert those to spaces (Replace("_"," ")), or is there something else I need to do to handle that?
It looks like you don't fully understand the format of the input line. Check it here: http://www.ietf.org/rfc/rfc2047.txt
The format is: encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
So you have to:
Extract the charset (the encoding, in .NET terms), which is not always just UTF8 or Default.
Extract the encoding: either B for base64 or Q for quoted-printable (your case!).
Then decode to bytes, and then to a string.
The function isn't even trying to decode the quoted-printable encoded parts (the hex codes and underscores). You need to add that.
It's also handling the encoding wrong (UTF-8 gets decoded with Encoding.ASCII for some bizarre reason).
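The steps listed above can be sketched roughly as follows. This is an illustrative outline rather than a complete RFC 2047 implementation: the names EncodedWordSketch, Decode and DecodeQ are made up, the regular expression is a simplification, and details such as dropping whitespace between adjacent encoded-words are not handled. In the Q encoding an underscore always stands for a space, so it is mapped to byte 0x20 before the charset is applied.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodedWordSketch
{
    // =?charset?encoding?encoded-text?=  (simplified pattern)
    static readonly Regex Pattern = new Regex(@"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=");

    public static string Decode(string input)
    {
        return Pattern.Replace(input, m =>
        {
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            string text = m.Groups[3].Value;
            byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
                ? Convert.FromBase64String(text)
                : DecodeQ(text);
            // Bytes first, then the charset named in the header turns them into text.
            return charset.GetString(bytes);
        });
    }

    static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            byte value;
            if (c == '_')
            {
                bytes.Add(0x20); // "_" always represents a space in the Q encoding
            }
            else if (c == '=' && i + 2 < text.Length &&
                     byte.TryParse(text.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                                   CultureInfo.InvariantCulture, out value))
            {
                bytes.Add(value); // "=XX" is one byte written as two hex digits
                i += 2;
            }
            else
            {
                bytes.Add((byte)c); // assumes the remaining characters are plain ASCII
            }
        }
        return bytes.ToArray();
    }

    public static void Main()
    {
        string subject = "=?ISO-8859-1?Q?TEST=2C_This_Is_A_Test_of_Some_Encoding=AE?=";
        Console.WriteLine(Decode(subject) == "TEST, This Is A Test of Some Encoding\u00AE"); // True
    }
}
```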
