I have the following string:
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt=...
which is an encoding of
[proconact-Verbesserung #279] (Neu) Stellvertretungen Benutzerrecht - andere können für andere Stellvertretungen erstellen ändern usw. dadurch ist der Schutz der Aktiviäten Mails nicht gewährt.
I am searching for a way to decode the quoted string.
I have tried:
private static string DecodeQuotedPrintables(string input, string charSet) {
Encoding enc = new ASCIIEncoding();
try {
enc = Encoding.GetEncoding(charSet);
} catch {
enc = new UTF8Encoding();
}
var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
var matches = occurences.Matches(input);
foreach (Match match in matches) {
try {
byte[] b = new byte[match.Groups[0].Value.Length / 3];
for (int i = 0; i < match.Groups[0].Value.Length / 3; i++) {
b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
}
char[] hexChar = enc.GetChars(b);
input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
} catch { ;}
}
input = input.Replace("?=", "").Replace("=\r\n", "");
return input;
}
when I call (where s is my string)
var x = DecodeQuotedPrintables(s, "utf-8");
this will return
=?utf-8?Q?[proconact_-_Verbesserung_#_(Neu)_Stellvertretungen_Benutzerrecht_-_andere_können_für_andere_Stellvertretungen_erstellen_ändern_usw._dadurch_ist_der_Schutz_der_Aktiviäten_Mails_nicht_gewährt=...
How can I also remove the underscores, the leading =?utf-8?Q? and the trailing =...?
The text you’re trying to decode is typically found in MIME headers, and is encoded according to the specification defined in the following Internet standard: RFC 2047: MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text.
There is a sample implementation for such a decoder on GitHub; maybe you can draw some ideas from it: RFC2047 decoder in C#.
You can also use this online tool for comparing your results: Online MIME Headers Decoder.
Note that your sample text is incorrect. The specification declares:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
Per the specification, any encoded word must end in ?=. Thus, your sample must be corrected from:
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt=
…to (scroll to the far right):
=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt?=
Strictly speaking, your sample is also invalid because it exceeds the 75-character limit imposed on any encoded word; however, most decoders tend to be tolerant of this non-conformity.
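As a rough illustration of the grammar quoted above, here is a minimal sketch that decodes a single Q-encoded word. The class and method names (EncodedWordSketch, DecodeQ) are hypothetical, and it assumes the input is exactly one encoded word with a charset that Encoding.GetEncoding recognizes. The key point is that all bytes are collected first and decoded afterwards, so multi-byte UTF-8 sequences such as =C3=B6 stay together.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class EncodedWordSketch
{
    // Decodes ONE encoded-word of the form =?charset?Q?encoded-text?=
    // Anything that does not match that shape is returned unchanged.
    public static string DecodeQ(string word)
    {
        if (word.Length < 4
            || !word.StartsWith("=?", StringComparison.Ordinal)
            || !word.EndsWith("?=", StringComparison.Ordinal))
            return word;

        // Split the inner part into: charset, encoding, encoded-text
        string[] parts = word.Substring(2, word.Length - 4).Split(new[] { '?' }, 3);
        if (parts.Length != 3 || !parts[1].Equals("Q", StringComparison.OrdinalIgnoreCase))
            return word;

        Encoding enc = Encoding.GetEncoding(parts[0]); // throws for unknown charset names
        string text = parts[2];
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            byte b;
            if (text[i] == '_')
            {
                bytes.Add(0x20); // in the Q encoding, underscore stands for a space
            }
            else if (text[i] == '=' && i + 2 < text.Length
                     && byte.TryParse(text.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                                      CultureInfo.InvariantCulture, out b))
            {
                bytes.Add(b);    // =XX is one raw byte
                i += 2;
            }
            else
            {
                bytes.Add((byte)text[i]); // literal ASCII character
            }
        }
        // Decode all bytes at once so multi-byte sequences are not split
        return enc.GetString(bytes.ToArray());
    }
}
```

For example, EncodedWordSketch.DecodeQ("=?utf-8?Q?k=C3=B6nnen_f=C3=BCr?=") yields "können für". Headers made up of several encoded words, and the B encoding, are not handled by this sketch.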
I've tested five or more code snippets, and this is the one that works. I only modified the regex part:
Test line:
im sistemlerimizde bak=FDm =E7al=FD=FEmas=FD yap=FDlaca=F0=FDndan; www.gib.=
Sample call:
string encoding = "windows-1254";
string input = "im sistemlerimizde bak=FDm =E7al=FD=FEmas=FD yap=FDlaca=F0=FDndan; www.gib.=";
DecodeQuotedPrintables(input, encoding);
Code snippet:
private static string DecodeQuotedPrintables(string input, string charSet)
{
System.Text.Encoding enc = System.Text.Encoding.UTF7;
try
{
enc = Encoding.GetEncoding(charSet);
}
catch
{
enc = new UTF8Encoding();
}
////parse looking for =XX where XX is hexadecimal
//var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
var occurences = new Regex("(\\=([0-9A-F][0-9A-F]))", RegexOptions.Multiline);
var matches = occurences.Matches(input);
foreach (Match match in matches)
{
try
{
byte[] b = new byte[match.Groups[0].Value.Length / 3];
for (int i = 0; i < match.Groups[0].Value.Length / 3; i++)
{
b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
}
char[] hexChar = enc.GetChars(b);
input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
}
catch
{ ;}
}
input = input.Replace("?=", "").Replace("=\r\n", "");
return input;
}
As mentioned, a standard .NET class exists for this purpose.
string unicodeString =
"=?UTF-8?Q?YourText?=";
System.Net.Mail.Attachment attachment = System.Net.Mail.Attachment.CreateAttachmentFromString("", unicodeString);
Console.WriteLine(attachment.Name);
Following my comment I'd suggest
private static string MessedUpUrlDecode(string input, string charSet)
{
    Encoding enc;
    try
    {
        enc = Encoding.GetEncoding(charSet);
    }
    catch
    {
        // Fall back to UTF-8 when the charset name is not recognized
        enc = new UTF8Encoding();
    }
    // Keep only the encoded text that follows =?charset?Q?
    string messedup = input.Split('?')[3];
    // Turn the Q encoding into URL encoding: "_" is a space, "=XX" becomes "%XX"
    string cleaned = messedup.Replace("_", " ").Replace("=...", ".").Replace("=", "%");
    return System.Web.HttpUtility.UrlDecode(cleaned, enc);
}
assuming that the mutilating of the source strings is consistent.
I am not too sure how to remove the
=?utf-8?Q?
unless it always appears. If it does, you can do this:
input = input.Split('?')[3];
To get rid of the trailing '=' you can remove it by:
input = input.Remove(input.Length - 1);
You can get rid of the '_' by replacing it with a space:
input = input.Replace("_", " ");
You can use those pieces of code in your DecodeQuotedPrintables function.
Hope this Helps!
Related
I am having a problem sending Cyrillic (Russian) letters instead of English ones to a server (Java Spring Boot, UTF-8). My example frames are below. The one with English text works fine, but the Cyrillic one ends up with a wrongly calculated position for the null octet. I am using websocket-csharp-net-stomp-client for this.
I have also tried changing the encoding of the message string to UTF-8.
The one that works:
The one that does not work:
public static string SendMessage(string messageText, string chatID)
{
Encoding utf16 = Encoding.GetEncoding("utf-16"); //also tried encode by 1251 instead of utf-16
Encoding utf8 = Encoding.UTF8;
byte[] utf8Bytes = utf8.GetBytes(messageText);
byte[] utf16Bytes = Encoding.Convert(utf8, utf16, utf8Bytes);
string msg = utf16.GetString(utf16Bytes);
StompMessageSerializer serializer = new StompMessageSerializer();
var content = new MessageContent() { text = msg };
var broad = new StompMessage("SEND", JsonConvert.SerializeObject(content));
broad["token"] = $"{Global.AuthCompTokenFinal}";
broad["contentType"] = "application/json";
broad["destination"] = $"/app/send/{chatID}";
var str = serializer.Serialize(broad);
Console.WriteLine(str);
Global.ws.Send(str);
return str;
}
The content-length header is set here (library code):
internal StompMessage(string command, string body, Dictionary<string, string> headers)
{
stompCommand = command;
Body = body;
nativeHeaders = headers;
this["content-length"] = body.Length.ToString();
}
What am I missing here?
Here is an error example:
I just deleted this["content-length"] = body.Length.ToString(); and encoded the message as UTF-8,
and it works fine. Wow.
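A likely explanation for why removing the header helped: the STOMP content-length header is an octet count, while body.Length counts UTF-16 code units. The two agree for plain English text but not for Cyrillic. If the header is wanted, a minimal sketch (with a hypothetical helper name, and assuming the body is sent as UTF-8) could compute it like this:

```csharp
using System;
using System.Text;

public static class StompLengthSketch
{
    // Number of octets the body occupies once encoded as UTF-8.
    // This is the value a content-length header is expected to carry,
    // assuming the frame body is transmitted as UTF-8.
    public static int Utf8ContentLength(string body)
    {
        return Encoding.UTF8.GetByteCount(body);
    }
}
```

For "hello" both "hello".Length and Utf8ContentLength("hello") are 5, but for "привет" the string length is 6 while the UTF-8 byte count is 12.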
I'm familiar with python and very new to C#.
I'm trying to convert a python code to C# code, which is not going well
Here is part of my python code:
def make_signature(uri, access_key):
    secret_key = "****"  # secret key (from portal or sub account)
    secret_key = bytes(secret_key, 'UTF-8')
    method = "POST"
    message = method + " " + uri + "\n" + timestamp + "\n" + access_key
    message = bytes(message, 'UTF-8')
    signingKey = base64.b64encode(hmac.new(secret_key, message, digestmod=hashlib.sha256).digest())
and I tried to convert it by this C# code:
using (HMACSHA256 sha = new HMACSHA256(hmac_key))
{
var bytes = Encoding.UTF8.GetBytes(messageRaw);
string base64 = Convert.ToBase64String(bytes);
var message = Encoding.UTF8.GetBytes(base64);
// encode
var hash = sha.ComputeHash(message);
// base64 convert
return Convert.ToBase64String(hash);
}
I found that these two pieces of code produce different outputs despite having the same inputs.
Could anyone let me know how to convert it correctly?
To convert to Base64, you can use this:
public static string Base64Encode(string plainText)
{
var plainTextBytes = System.Text.Encoding.ASCII.GetBytes(plainText);
return System.Convert.ToBase64String(plainTextBytes);
}
And to convert back!
public static string Base64Decode(string base64EncodedData)
{
var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
return System.Text.Encoding.ASCII.GetString(base64EncodedBytes);
}
I am using RichTextBox for this example:
richTextBox1.Text = Base64Encode(richTextBox1.Text); // convert TO Base64
richTextBox1.Text = Base64Decode(richTextBox1.Text); // convert FROM Base64
You can change the ASCII Encoding to UTF-8, or any Encoding that Visual Studio offers :)
You don't necessarily HAVE to use a public static string, you can just take the code out and put it into buttons or MenuStrip Items
Now, regarding your question, here:
public static string Base64Encode(string plainText)
{
    var plainTextBytes = System.Text.Encoding.ASCII.GetBytes(plainText);
    var base64Once = System.Convert.ToBase64String(plainTextBytes);
    var base64OnceBytes = System.Text.Encoding.ASCII.GetBytes(base64Once);
    return System.Convert.ToBase64String(base64OnceBytes); // This is the double encoding (Base64 applied twice) :)
}
I hope this helps you :)
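For comparison with the Python code in the question, here is a minimal sketch of a more direct port. In the Python version the message bytes go straight into HMAC-SHA256 and only the resulting digest is Base64-encoded, so the message is not Base64-encoded before hashing. The class and method names are hypothetical, and timestamp is passed in as a parameter on the assumption that the Python code defines it elsewhere.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SignatureSketch
{
    // Mirrors the Python snippet: HMAC-SHA256 over the UTF-8 message bytes,
    // keyed with the UTF-8 bytes of the secret, then Base64 of the raw digest.
    public static string MakeSignature(string uri, string timestamp, string accessKey, string secretKey)
    {
        string message = "POST" + " " + uri + "\n" + timestamp + "\n" + accessKey;
        using (var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secretKey)))
        {
            byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(message));
            return Convert.ToBase64String(hash);
        }
    }
}
```

The result is always 44 characters long, because a 32-byte SHA-256 digest encodes to 44 Base64 characters.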
I am using a SQLite database for my program. Everything works fine when the database path contains only English characters, but when I try to open a SQLite database with Persian characters in its path, it fails to open. I searched the internet and found two answers for other languages, but they did not work for Persian.
The two options:
First option:
var dbPath2 =
Path.Combine(Windows.Storage.ApplicationData.Current.RoamingFolder.Path,
"test.db");
string utf8String = String.Empty;
// Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
byte[] utf16Bytes = Encoding.Unicode.GetBytes(dbPath2);
byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode,
Encoding.UTF8, utf16Bytes);
// Fill UTF8 bytes inside UTF8 string
for (int i = 0; i < utf8Bytes.Length; i++)
{
// Because char always saves 2 bytes, fill char with 0
byte[] utf8Container = new byte[2] { utf8Bytes[i], 0 };
utf8String += BitConverter.ToChar(utf8Container, 0);
}
string dbPath = utf8String;
var db = new SQLite.SQLiteConnection(dbPath);
Second option (in Sqlite.cs, which comes with the reference when you add it):
public SQLiteConnection(string databasePath, bool storeDateTimeAsTicks = false)
{
DatabasePath = databasePath;
Sqlite3DatabaseHandle handle;
var r = SQLite3.Open16(DatabasePath, out handle);
Handle = handle;
if (r != SQLite3.Result.OK)
{
throw SQLiteException.New(r, String.Format("Could not open database file: {0} ({1})", DatabasePath, r));
}
_open = true;
StoreDateTimeAsTicks = storeDateTimeAsTicks;
BusyTimeout = TimeSpan.FromSeconds(0.1);
}
thanks
So given this input string:
=?ISO-8859-1?Q?TEST=2C_This_Is_A_Test_of_Some_Encoding=AE?=
And this function:
private string DecodeSubject(string input)
{
StringBuilder sb = new StringBuilder();
MatchCollection matches = Regex.Matches(inputText.Text, @"=\?(?<encoding>[\S]+)\?.\?(?<data>[\S]+[=]*)\?=");
foreach (Match m in matches)
{
string encoding = m.Groups["encoding"].Value;
string data = m.Groups["data"].Value;
Encoding enc = Encoding.GetEncoding(encoding.ToLower());
if (enc == Encoding.UTF8)
{
byte[] d = Convert.FromBase64String(data);
sb.Append(Encoding.ASCII.GetString(d));
}
else
{
byte[] bytes = Encoding.Default.GetBytes(data);
string decoded = enc.GetString(bytes);
sb.Append(decoded);
}
}
return sb.ToString();
}
The result is the same as the data extracted from the input string. What am I doing wrong that this text is not getting decoded properly?
UPDATE
So I have this code for decoding quoted-printable:
public string DecodeQuotedPrintable(string encoded)
{
byte[] buffer = new byte[1];
return Regex.Replace(encoded, "=(\r\n?|\n)|=([A-F0-9]{2})", delegate(Match m)
{
if (byte.TryParse(m.Groups[2].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out buffer[0]))
{
return Encoding.ASCII.GetString(buffer);
}
else
{
return string.Empty;
}
});
}
And that just leaves the underscores. Do I manually convert those to spaces (Replace("_"," ")), or is there something else I need to do to handle that?
It looks like you don't fully understand the format of the input line. Check it here: http://www.ietf.org/rfc/rfc2047.txt
The format is: encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
So you have to:
1. Extract the charset (an encoding, in .NET terms), rather than assuming just UTF8 or Default.
2. Extract the encoding: either B for Base64 or Q for quoted-printable (your case!).
3. Then decode to bytes, and then to a string.
The function's not even trying to decode the quoted-printable encoded stuff (the hex codes and underscores). You need to add that.
It's handling the encoding wrong (UTF-8 gets decoded with Encoding.ASCII for some bizarre reason)
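The steps listed above (extract the charset, extract the encoding, decode to bytes, then to a string) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names Rfc2047Sketch, Decode and QToBytes are hypothetical, charsets are resolved with Encoding.GetEncoding (which throws for names it does not know), and the RFC 2047 rule about dropping whitespace between adjacent encoded words is not implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class Rfc2047Sketch
{
    // =?charset?encoding?encoded-text?=  with encoding B (Base64) or Q
    private static readonly Regex EncodedWord =
        new Regex(@"=\?(?<charset>[^?\s]+)\?(?<enc>[bBqQ])\?(?<text>[^?\s]*)\?=");

    public static string Decode(string input)
    {
        return EncodedWord.Replace(input, m =>
        {
            // Step 1: the charset decides how bytes become characters
            Encoding charset = Encoding.GetEncoding(m.Groups["charset"].Value);
            string text = m.Groups["text"].Value;
            // Step 2: the encoding decides how the text becomes bytes
            byte[] bytes = m.Groups["enc"].Value.ToUpperInvariant() == "B"
                ? Convert.FromBase64String(text)
                : QToBytes(text);
            // Step 3: bytes to string, using the declared charset
            return charset.GetString(bytes);
        });
    }

    private static byte[] QToBytes(string text)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            byte b;
            if (text[i] == '_')
            {
                bytes.Add(0x20); // underscore means space in the Q encoding
            }
            else if (text[i] == '=' && i + 2 < text.Length
                     && byte.TryParse(text.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                                      CultureInfo.InvariantCulture, out b))
            {
                bytes.Add(b);
                i += 2;          // skip the two hex digits just consumed
            }
            else
            {
                bytes.Add((byte)text[i]);
            }
        }
        return bytes.ToArray();
    }
}
```

With the input from the question, Rfc2047Sketch.Decode("=?ISO-8859-1?Q?TEST=2C_This_Is_A_Test_of_Some_Encoding=AE?=") gives "TEST, This Is A Test of Some Encoding®". So yes, underscores are converted to spaces, but on the byte level and before the charset is applied.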
Our website has files in a few different languages: French, Spanish, Portuguese, and English. When a user uploads a file whose name contains special characters like ó, ç or ã, and I return File(data, "application/octet-stream", name); in MVC, I get the exception:
System.FormatException: An invalid character was found in the mail header.
I found an article on MSDN showing how to set the MailMessage to UTF-8 encoding to avoid this, but I do not know how to UTF-8 encode the file name when using the MVC File action result. I found an article on the net on how to UTF-8 encode a string, but when I try to use it I get a garbage name, so I guess I do not understand what UTF-8 encoding is supposed to do to the string. Here is the sample code from this blog post: An invalid character was found in the mail header
public static string GetCleanedFileName(string s)
{
char[] chars = s.ToCharArray();
var sb = new StringBuilder();
for (int index = 0; index < chars.Length; index++)
{
string encodedString = EncodeChar(chars[index]);
sb.Append(encodedString);
}
return sb.ToString();
}
private static string EncodeChar(char chr)
{
var encoding = new UTF8Encoding();
var sb = new StringBuilder();
byte[] bytes = encoding.GetBytes(chr.ToString());
for (int index = 0; index < bytes.Length; index++)
{
sb.AppendFormat("%{0}", Convert.ToString(bytes[index], 16));
}
return sb.ToString();
}
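One reason the quoted helper produces a "garbage" name is that it percent-encodes every character, including plain ASCII letters, and does not zero-pad the hex digits. As a point of comparison, here is a minimal sketch using the built-in Uri.EscapeDataString, which percent-encodes only characters outside the unreserved set, using their UTF-8 bytes. The class and method names are hypothetical, and whether the receiving client turns the escaped name back into the original is an assumption that depends on the client.

```csharp
using System;

public static class FileNameSketch
{
    // Percent-encodes only the characters that need it (as UTF-8 bytes);
    // ASCII letters, digits, '-', '.', '_' and '~' are left untouched.
    public static string PercentEncode(string fileName)
    {
        return Uri.EscapeDataString(fileName);
    }
}
```

For example, FileNameSketch.PercentEncode("aó.txt") returns "a%C3%B3.txt".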
Maybe try another pair of functions for encoding to and from UTF-8:
//UTF8
public static string ConvertToUTF8(string inputString)
{
string toReturn = "";
byte[] arr = Encoding.UTF8.GetBytes(inputString);
for (int i = 0; i < arr.Length; i++)
{
toReturn += arr[i].ToString() + " ";
}
return toReturn;
}
public static string ConvertFromUTF8(string inputString)
{
inputString = inputString.Trim();
string result = "";
string[] parts = inputString.Split(' ');
byte[] bytes = new byte[parts.Length];
for (int i = 0; i < parts.Length; i++)
{
if (parts[i] == "")
{
continue;
}
try
{
bytes[i] = Convert.ToByte(parts[i]);
}
catch (Exception)
{
MessageBox.Show("Input string was not in a correct format.");
}
}
try
{
result = Encoding.UTF8.GetString(bytes);
}
catch (Exception)
{
throw;
}
return result;
}
I think I have an idea: you have to convert your string not to UTF-8 but to UTF-16,
because, as I understand it, UTF-8 is an ASCII-based encoding.
UTF-16 represents each character using two bytes (or four, for characters outside the Basic Multilingual Plane). UTF-8 uses the one-byte ASCII encodings for ASCII characters and represents non-ASCII characters using variable-length sequences of two to four bytes. Keep in mind that while UTF-8 can save space for Western languages, which is an argument often used by proponents, it can actually use three bytes per character for many other languages, such as Chinese or Japanese.
And the symbols you wrote are not ASCII.
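To make the size difference concrete, here is a small sketch that reports how many bytes a string occupies in each encoding. The class and method names are hypothetical; Encoding.Unicode is .NET's name for little-endian UTF-16.

```csharp
using System;
using System.Text;

public static class ByteCountSketch
{
    // Reports the encoded size of a string in UTF-8 and in UTF-16.
    public static string Describe(string s)
    {
        return string.Format("UTF-8: {0} bytes, UTF-16: {1} bytes",
            Encoding.UTF8.GetByteCount(s), Encoding.Unicode.GetByteCount(s));
    }
}
```

ByteCountSketch.Describe("a") gives "UTF-8: 1 bytes, UTF-16: 2 bytes", while ByteCountSketch.Describe("ó") gives "UTF-8: 2 bytes, UTF-16: 2 bytes" and ByteCountSketch.Describe("€") gives "UTF-8: 3 bytes, UTF-16: 2 bytes".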