So given this input string:
=?ISO-8859-1?Q?TEST=2C_This_Is_A_Test_of_Some_Encoding=AE?=
And this function:
private string DecodeSubject(string input)
{
StringBuilder sb = new StringBuilder();
MatchCollection matches = Regex.Matches(inputText.Text, @"=\?(?<encoding>[\S]+)\?.\?(?<data>[\S]+[=]*)\?=");
foreach (Match m in matches)
{
string encoding = m.Groups["encoding"].Value;
string data = m.Groups["data"].Value;
Encoding enc = Encoding.GetEncoding(encoding.ToLower());
if (enc == Encoding.UTF8)
{
byte[] d = Convert.FromBase64String(data);
sb.Append(Encoding.ASCII.GetString(d));
}
else
{
byte[] bytes = Encoding.Default.GetBytes(data);
string decoded = enc.GetString(bytes);
sb.Append(decoded);
}
}
return sb.ToString();
}
The result is the same as the data extracted from the input string. What am I doing wrong that keeps this text from being decoded properly?
UPDATE
So I have this code for decoding quoted-printable:
public string DecodeQuotedPrintable(string encoded)
{
byte[] buffer = new byte[1];
return Regex.Replace(encoded, "=(\r\n?|\n)|=([A-F0-9]{2})", delegate(Match m)
{
if (byte.TryParse(m.Groups[2].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out buffer[0]))
{
return Encoding.ASCII.GetString(buffer);
}
else
{
return string.Empty;
}
});
}
And that just leaves the underscores. Do I manually convert those to spaces (Replace("_"," ")), or is there something else I need to do to handle that?
It looks like you have not fully understood the format of the input line. See RFC 2047: http://www.ietf.org/rfc/rfc2047.txt
The format is: encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
So you have to:
1. Extract the charset (the "encoding" in .NET terms). It can be any charset, not just UTF-8 or Encoding.Default.
2. Extract the encoding: either B for Base64 or Q for quoted-printable (your case).
3. Decode the encoded text to bytes according to that encoding, then decode the bytes to a string using the charset.
The function's not even trying to decode the quoted-printable encoded stuff (the hex codes and underscores). You need to add that.
It's handling the encoding wrong (UTF-8 gets decoded with Encoding.ASCII for some bizarre reason)
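The steps from the answers above can be sketched roughly as follows. This is a minimal illustration, not a complete RFC 2047 implementation: the class name Rfc2047Sketch and its method names are made up for this example, it does not drop the whitespace between adjacent encoded words, and it lets a FormatException escape if a B-encoded word is not valid Base64. In the Q encoding an underscore stands for a space, so it is handled in the same loop as the =XX escapes.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class Rfc2047Sketch
{
    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    private static readonly Regex EncodedWord = new Regex(
        @"=\?(?<charset>[^?\s]+)\?(?<encoding>[BbQq])\?(?<text>[^?\s]*)\?=");

    public static string Decode(string input)
    {
        return EncodedWord.Replace(input, m =>
        {
            Encoding charset;
            try { charset = Encoding.GetEncoding(m.Groups["charset"].Value); }
            catch (ArgumentException) { return m.Value; } // unknown charset: leave the word as-is

            string text = m.Groups["text"].Value;
            bool isBase64 = m.Groups["encoding"].Value.ToUpperInvariant() == "B";

            // Step 1: encoded text -> bytes. Step 2: bytes -> string via the charset.
            byte[] bytes = isBase64 ? Convert.FromBase64String(text) : DecodeQ(text);
            return charset.GetString(bytes);
        });
    }

    // Q encoding: "_" means space, "=XX" is one byte in hex, anything else is literal ASCII.
    private static byte[] DecodeQ(string text)
    {
        using (var buffer = new MemoryStream())
        {
            for (int i = 0; i < text.Length; i++)
            {
                char c = text[i];
                if (c == '_')
                {
                    buffer.WriteByte(0x20);
                }
                else if (c == '=' && i + 2 < text.Length
                         && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
                {
                    buffer.WriteByte(Convert.ToByte(text.Substring(i + 1, 2), 16));
                    i += 2; // skip the two hex digits just consumed
                }
                else
                {
                    buffer.WriteByte((byte)c);
                }
            }
            return buffer.ToArray();
        }
    }
}
```

With the input string from the question, Rfc2047Sketch.Decode should return "TEST, This Is A Test of Some Encoding®". Collecting the bytes first and decoding them in one call matters for multi-byte charsets such as UTF-8, where a single character can span several =XX escapes.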
I am having a problem sending Cyrillic (Russian) letters instead of English ones to a server (Java Spring Boot, UTF-8). My example frames are below. The one with English text works fine, but in the Cyrillic one the position of the null octet is calculated incorrectly. I am using websocket-csharp-net-stomp-client for it.
I have also tried changing the encoding of the message string to UTF-8.
The one that works:
The one that does not work:
public static string SendMessage(string messageText, string chatID)
{
Encoding utf16 = Encoding.GetEncoding("utf-16"); //also tried encode by 1251 instead of utf-16
Encoding utf8 = Encoding.UTF8;
byte[] utf8Bytes = utf8.GetBytes(messageText);
byte[] utf16Bytes = Encoding.Convert(utf8, utf16, utf8Bytes);
string msg = utf16.GetString(utf16Bytes);
StompMessageSerializer serializer = new StompMessageSerializer();
var content = new MessageContent() { text = msg };
var broad = new StompMessage("SEND", JsonConvert.SerializeObject(content));
broad["token"] = $"{Global.AuthCompTokenFinal}";
broad["contentType"] = "application/json";
broad["destination"] = $"/app/send/{chatID}";
var str = serializer.Serialize(broad);
Console.WriteLine(str);
Global.ws.Send(str);
return str;
}
The content length is set here (library code):
internal StompMessage(string command, string body, Dictionary<string, string> headers)
{
stompCommand = command;
Body = body;
nativeHeaders = headers;
this["content-length"] = body.Length.ToString();
}
What am I missing here?
Here is an error example:
I just deleted the line this["content-length"] = body.Length.ToString(); and encoded the message as UTF-8, and it works fine. Wow.
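A likely explanation, assuming the frame goes over the wire as UTF-8: in STOMP the content-length header is a count of octets in the body, while body.Length counts UTF-16 code units. For plain English text the two numbers agree, but each Cyrillic letter takes two bytes in UTF-8, so the header ends up too small. The helper below is a hypothetical sketch (it is not part of the library) of how the value could be computed instead of deleting the header.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration; not part of websocket-csharp-net-stomp-client.
public static class StompContentLengthSketch
{
    // Returns the number of bytes the body occupies once encoded as UTF-8,
    // formatted as a header value.
    public static string ContentLengthFor(string body)
    {
        return Encoding.UTF8.GetByteCount(body).ToString(CultureInfo.InvariantCulture);
    }
}

// "Hello".Length  == 5, UTF-8 byte count == 5
// "Привет".Length == 6, UTF-8 byte count == 12
```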
Here is my problem: I'm trying to encode the response of my web service with the following code.
public static string ConvertToUTF8(string Cadena)
{
string mensajeex = Cadena;
Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte array.
byte[] unicodeBytes = unicode.GetBytes(mensajeex);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, utf8, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[utf8.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
utf8.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string Utf8string = new string(asciiChars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", mensajeex);
Console.WriteLine("Ascii converted string: {0}", Utf8string);
return Utf8string;
}
And it actually works! But when I encode a string and then pass it to an exception as the Message property, like this:
throw new Exception(XMLHelper.ConvertToUTF8(Message));
the response comes out wrong, like:
El valor 'R' no es válido segú
Any ideas? Thanks
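One thing worth checking, offered as a sketch rather than a diagnosis: a .NET string is always held as UTF-16 in memory, so converting it to UTF-8 bytes and then back to a string gives the same string again. The hypothetical helper below does the same round trip as ConvertToUTF8 in fewer steps. If that holds, the truncated or garbled text probably comes from the place where the response is written out as bytes, not from the string itself.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class RoundTripSketch
{
    // string -> UTF-8 bytes -> string, the same path ConvertToUTF8 takes.
    public static string ThroughUtf8(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}

// RoundTripSketch.ThroughUtf8("no es válido") == "no es válido"
// The encoding only matters where bytes are produced, for example when the
// response stream or a file is written.
```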
I have the following string:
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt=...
which is an encoding of
[proconact-Verbesserung #279] (Neu) Stellvertretungen Benutzerrecht - andere können für andere Stellvertretungen erstellen ändern usw. dadurch ist der Schutz der Aktiviäten Mails nicht gewährt.
I am searching for a way to decode the quoted string.
I have tried:
private static string DecodeQuotedPrintables(string input, string charSet) {
Encoding enc = new ASCIIEncoding();
try {
enc = Encoding.GetEncoding(charSet);
} catch {
enc = new UTF8Encoding();
}
var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
var matches = occurences.Matches(input);
foreach (Match match in matches) {
try {
byte[] b = new byte[match.Groups[0].Value.Length / 3];
for (int i = 0; i < match.Groups[0].Value.Length / 3; i++) {
b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
}
char[] hexChar = enc.GetChars(b);
input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
} catch { ;}
}
input = input.Replace("?=", "").Replace("=\r\n", "");
return input;
}
when I call (where s is my string)
var x = DecodeQuotedPrintables(s, "utf-8");
this will return
=?utf-8?Q?[proconact_-_Verbesserung_#_(Neu)_Stellvertretungen_Benutzerrecht_-_andere_können_für_andere_Stellvertretungen_erstellen_ändern_usw._dadurch_ist_der_Schutz_der_Aktiviäten_Mails_nicht_gewährt=...
What can I do so that the _ characters, the leading =?utf-8?Q?, and the trailing =... are also removed?
The text you’re trying to decode is typically found in MIME headers, and is encoded according to the specification defined in the following Internet standard: RFC 2047: MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text.
There is a sample implementation for such a decoder on GitHub; maybe you can draw some ideas from it: RFC2047 decoder in C#.
You can also use this online tool for comparing your results: Online MIME Headers Decoder.
Note that your sample text is incorrect. The specification declares:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
Per the specification, any encoded word must end in ?=. Thus, your sample must be corrected from:
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt=
…to (scroll to the far right):
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt?=
Strictly speaking, your sample is also invalid because it exceeds the 75-character limit imposed on any encoded word; however, most decoders tend to be tolerant of this non-conformity.
I've tested more than five code snippets, and this is the one that works. I modified the regex part:
Test line:
im sistemlerimizde bak=FDm =E7al=FD=FEmas=FD yap=FDlaca=F0=FDndan; www.gib.=
Sample call:
string encoding = "windows-1254";
string input = "im sistemlerimizde bak=FDm =E7al=FD=FEmas=FD yap=FDlaca=F0=FDndan; www.gib.=";
DecodeQuotedPrintables(input, encoding);
Code snippet:
private static string DecodeQuotedPrintables(string input, string charSet)
{
System.Text.Encoding enc = System.Text.Encoding.UTF7;
try
{
enc = Encoding.GetEncoding(charSet);
}
catch
{
enc = new UTF8Encoding();
}
////parse looking for =XX where XX is hexadecimal
//var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
var occurences = new Regex("(\\=([0-9A-F][0-9A-F]))", RegexOptions.Multiline);
var matches = occurences.Matches(input);
foreach (Match match in matches)
{
try
{
byte[] b = new byte[match.Groups[0].Value.Length / 3];
for (int i = 0; i < match.Groups[0].Value.Length / 3; i++)
{
b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
}
char[] hexChar = enc.GetChars(b);
input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
}
catch
{ ;}
}
input = input.Replace("?=", "").Replace("=\r\n", "");
return input;
}
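Decoding one =XX escape at a time suits single-byte charsets such as windows-1254, but a UTF-8 character like ö arrives as two escapes (=C3=B6) that have to be decoded together. A possible variation, sketched under that assumption, matches each run of adjacent escapes and decodes the whole run in one call. The class and method names are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical variation for illustration only.
public static class QuotedPrintableRunsSketch
{
    // One match per run of adjacent =XX escapes, e.g. "=C3=B6" or "=23=32=37=39=5D".
    private static readonly Regex EscapeRun = new Regex("(?:=[0-9A-Fa-f]{2})+");

    public static string Decode(string input, string charSet)
    {
        Encoding enc = Encoding.GetEncoding(charSet);

        // Remove soft line breaks ("=" at the end of a line) before decoding.
        string joined = input.Replace("=\r\n", "").Replace("=\n", "");

        return EscapeRun.Replace(joined, m =>
        {
            // Each escape is three characters long: '=' plus two hex digits.
            byte[] bytes = new byte[m.Length / 3];
            for (int i = 0; i < bytes.Length; i++)
            {
                bytes[i] = Convert.ToByte(m.Value.Substring(i * 3 + 1, 2), 16);
            }
            return enc.GetString(bytes); // decode the whole run at once
        });
    }
}

// QuotedPrintableRunsSketch.Decode("k=C3=B6nnen f=C3=BCr", "utf-8") gives "können für"
```

Using Regex.Replace with a callback also avoids calling input.Replace inside the loop, which rewrites every occurrence of the matched text rather than the one being processed. On .NET Core and later, code pages such as windows-1254 are only available after registering CodePagesEncodingProvider from the System.Text.Encoding.CodePages package.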
As mentioned, a standard .NET class exists for this purpose:
string unicodeString =
"=?UTF-8?Q?YourText?=";
System.Net.Mail.Attachment attachment = System.Net.Mail.Attachment.CreateAttachmentFromString("", unicodeString);
Console.WriteLine(attachment.Name);
Following my comment I'd suggest
private static string MessedUpUrlDecode(string input, string encoding)
{
Encoding enc = new ASCIIEncoding();
try
{
enc = Encoding.GetEncoding(encoding);
}
catch
{
enc = new UTF8Encoding();
}
string messedup = input.Split('?')[3];
string cleaned = messedup.Replace("_", " ").Replace("=...", ".").Replace("=", "%");
return System.Web.HttpUtility.UrlDecode(cleaned, enc);
}
assuming that the mutilating of the source strings is consistent.
I am not sure how to remove the leading
=?utf-8?Q?
unless it always appears. If it does, you can do this:
input = input.Split('?')[3];
To get rid of the trailing '=' you can remove it by:
input = input.Remove(input.Length - 1);
You can get rid of the '_' by replacing it with a space:
input = input.Replace("_", " ");
You can use those pieces of code in your DecodeQuotedPrintables function.
Hope this Helps!
I am using the following code for encoding and decoding.
For encoding:
private string EncodeServerName(string ServerName)
{
byte[] NameEncodein = new byte[ServerName.Length];
NameEncodein = System.Text.Encoding.UTF8.GetBytes(ServerName);
string EcodedName = Convert.ToBase64String(NameEncodein);
return EcodedName;
}
And for decoding:
public string DecoAndGetServerName(string Servername)
{
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
System.Text.Decoder strDecoder = encoder.GetDecoder();
byte[] to_DecodeByte = Convert.FromBase64String(Servername);
int charCount = strDecoder.GetCharCount(to_DecodeByte, 0, to_DecodeByte.Length);
char[] decoded_char = new char[charCount];
strDecoder.GetChars(to_DecodeByte, 0, to_DecodeByte.Length, decoded_char,0);
string Name = new string(decoded_char);
return Name;
}
I am sending the server name: DEV-SQL1\SQL2008
It is encoded as: REVWLVNRTDFcU1FMMjAwOA==
When I try to decode it again, I get an exception on this line:
byte[] to_DecodeByte = Convert.FromBase64String(Servername);
The exception is:
`The input is not a valid Base-64 string as it contains a non-base 64 character,
more than two padding characters, or a non-white space character among the padding characters.`
How can I solve this issue?
Please help me.
Your code seems way too complex :-), here is one that works:
public static string EncodeServerName(string serverName)
{
return Convert.ToBase64String(Encoding.UTF8.GetBytes(serverName));
}
public static string DecodeServerName(string encodedServername)
{
return Encoding.UTF8.GetString(Convert.FromBase64String(encodedServername));
}
The same code that you wrote in DecoAndGetServerName() works for me.
The thing is, you need to pass the encoded string to your DecoAndGetServerName() function,
which might be encoded like this:
string Servername=Convert.ToBase64String(Encoding.UTF8.GetBytes("serverName"));
Passing a string that is not Base64-encoded is what produces the error "The input is not a valid Base-64 string as it contains a non-base 64 character, ...".
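To check which value actually reaches the decoder, a small round trip like the one below can help. It is only a sketch: the class name and TryDecode are hypothetical, and it catches FormatException instead of letting it propagate so that a non-Base64 input is reported as a failed decode.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class ServerNameCodecSketch
{
    public static string Encode(string serverName)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(serverName));
    }

    // Returns false when the input is not valid Base64, e.g. when the raw
    // server name was passed in instead of the encoded value.
    public static bool TryDecode(string candidate, out string serverName)
    {
        try
        {
            serverName = Encoding.UTF8.GetString(Convert.FromBase64String(candidate));
            return true;
        }
        catch (FormatException)
        {
            serverName = null;
            return false;
        }
    }
}

// Encode(@"DEV-SQL1\SQL2008")            gives "REVWLVNRTDFcU1FMMjAwOA=="
// TryDecode("REVWLVNRTDFcU1FMMjAwOA==")  succeeds with DEV-SQL1\SQL2008
// TryDecode(@"DEV-SQL1\SQL2008")         fails: '-' and '\' are not Base64 characters
```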
Our website has files in a few different languages: French, Spanish, Portuguese, and English. When a user uploads a file whose name contains special characters such as ó, ç, or ã, and I return File(data, "application/octet-stream", name); in MVC, I get this exception:
System.FormatException: An invalid character was found in the mail header.
I found an MSDN article showing how to set the MailMessage to UTF-8 encoding to avoid this, but I do not know how to UTF-8 encode the file name when using the MVC File action result. I found an article on the net about UTF-8 encoding a string, but when I try to use it I get a garbage name, so I guess I do not understand what UTF-8 encoding is supposed to do to the string. Here is the sample code from that blog post (An invalid character was found in the mail header):
public static string GetCleanedFileName(string s)
{
char[] chars = s.ToCharArray();
var sb = new StringBuilder();
for (int index = 0; index < chars.Length; index++)
{
string encodedString = EncodeChar(chars[index]);
sb.Append(encodedString);
}
return sb.ToString();
}
private static string EncodeChar(char chr)
{
var encoding = new UTF8Encoding();
var sb = new StringBuilder();
byte[] bytes = encoding.GetBytes(chr.ToString());
for (int index = 0; index < bytes.Length; index++)
{
sb.AppendFormat("%{0}", Convert.ToString(bytes[index], 16));
}
return sb.ToString();
}
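Two things stand out in that sample: it percent-encodes every character, including plain ASCII letters, and Convert.ToString(b, 16) yields a single hex digit for byte values below 0x10. A shorter route, shown here only as a sketch, is Uri.EscapeDataString, which percent-encodes the UTF-8 bytes of the non-ASCII characters and leaves letters, digits, and a few safe symbols alone. The ContentDisposition method assumes you set the Content-Disposition header yourself and use the filename* form from RFC 6266; whether that fits your MVC version is an assumption to verify. Both names are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class FileNameEncodingSketch
{
    // Percent-encodes the UTF-8 bytes of every character that is not unreserved.
    public static string PercentEncode(string fileName)
    {
        return Uri.EscapeDataString(fileName);
    }

    // Builds a header value in the RFC 6266 "filename*" form.
    // Assumption: the caller writes this into the Content-Disposition header manually.
    public static string ContentDisposition(string fileName)
    {
        return "attachment; filename*=UTF-8''" + Uri.EscapeDataString(fileName);
    }
}

// PercentEncode("relatório.pdf") gives "relat%C3%B3rio.pdf"
```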
Maybe try another function for encoding to and from UTF-8:
//UTF8
public static string ConvertToUTF8(string inputString)
{
string toReturn = "";
byte[] arr = Encoding.UTF8.GetBytes(inputString);
for (int i = 0; i < arr.Length; i++)
{
toReturn += arr[i].ToString() + " ";
}
return toReturn;
}
public static string ConvertFromUTF8(string inputString)
{
inputString = inputString.Trim();
string result = "";
string[] parts = inputString.Split(' ');
byte[] bytes = new byte[parts.Length];
for (int i = 0; i < parts.Length; i++)
{
if (parts[i] == "")
{
continue;
}
try
{
bytes[i] = Convert.ToByte(parts[i]);
}
catch (Exception)
{
MessageBox.Show("Input string was not in a correct format.");
}
}
try
{
result = Encoding.UTF8.GetString(bytes);
}
catch (Exception)
{
throw;
}
return result;
}
I think I have an idea: you may need to convert your string to UTF-16 rather than UTF-8, because, as far as I understand it, UTF-8 is built on top of ASCII.
UTF-16 represents most characters using two bytes (four for characters outside the Basic Multilingual Plane). UTF-8 uses the one-byte ASCII encodings for ASCII characters and represents non-ASCII characters using variable-length sequences of two to four bytes. Keep in mind that while UTF-8 saves space for Western languages, which is an argument often used by its proponents, it needs three bytes per character for many other scripts, such as Chinese or Japanese.
And the symbols you wrote are not ASCII.
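The byte counts are easy to check directly. The sketch below uses Encoding.UTF8 and Encoding.Unicode (the .NET name for little-endian UTF-16); the class and method names are hypothetical.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class ByteCountSketch
{
    public static int Utf8Bytes(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    public static int Utf16Bytes(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }
}

// Character        UTF-8 bytes   UTF-16 bytes
// "A"              1             2
// "ó"              2             2
// "€"              3             2
// "\U0001F600"     4             4   (outside the Basic Multilingual Plane)
```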