Decoding in asp.net 4.0 - Special Characters - c#

decoding of a special characters in asp.net as per the W3C standards. ASCII - URL encoding chart. Some of the special characters are not being converted instead converting to "?", check the below issue, actual result for %92 ASCII is "`", I'm trying to achieve this to decode to a urlencoding equal character.
strurl="Workers%92+Accommodation";
string strdecode=Server.UrlDecode(strurl);
Ex: ASCII code %92 (as per W3C standarards url encoding is for ' - which is not there in key board refer http://www.w3schools.com/TAGS/ref_urlencode.asp).

try to encode/decode with this:
public static string EncodeString(string decodedString)
{
return Convert.ToBase64String(Encoding.UTF8.GetBytes(decodedString));
}
public static string DecodeString(string encodedString)
{
return Encoding.UTF8.GetString(Convert.FromBase64String(encodedString));
}
edit add:
When you are going to transmit a value via a URL the method to use is the: HttpServerUtility.UrlTokenEncode() method
reference:
http://msdn.microsoft.com/en-us/library/system.web.httpserverutility.urltokenencode.aspx

You have to decode with appropriate encoding codepage. Quoting from http://www.ascii-code.com:
There are several different variations of the 8-bit ASCII table. The table below is according to ISO 8859-1, also called ISO Latin-1....
According to http://en.wikipedia.org/wiki/Windows_code_page, codepage for ISO Latin-1 is 1252. To decode your string, simply do the following:
strurl = Encoding.GetEncoding(1252).
GetString(HttpUtility.UrlDecodeToBytes("%92"));

Related

C# - Decoding a string does not return the original encoded one

I have a random generated string that I need to put it in a URL, so I encode it like this:
var encodedToken = System.Web.HttpUtility.UrlEncode(token, System.Text.Encoding.UTF8);
In an ASP.NET action method, I receive this token and decode it:
var token = System.Web.HttpUtility.UrlDecode(encodedToken, System.Text.Encoding.UTF8);
but these tokens are not the same. For example the ab+cd string would encode to ab%2bcd and decoding the result would give me the ab cd string (the plus character changed to whitespace).
So far I have only noticed the + character problem, there may be others.
How can I solve this issue?
In your context, it appears that you don't need to call UrlDecode (since %2b decodes to + and + decodes to a blank space - i.e. you have double decoded).
Given, the framework appears to have already decoded it for you, you may remove your use of UrlDecode.
According to the Microsoft documentation:
You can encode a URL using with the UrlEncode method or the UrlPathEncode method. However, the methods return different results. The UrlEncode method converts each space character to a plus character (+). The UrlPathEncode method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.urlencode?view=netframework-4.7.2

QueryString converting %E1 to %ufffd

I have a URL like the following
http://mysite.com/default.aspx?q=%E1
Where %E1 is supposed to be á. When I call Request.QueryString from my C# page I receive
http://mysite.com/default.aspx?q=%ufffd
It does this for any accented character. %E1, %E3, %E9, %ED etc. all get passed as %ufffd. Normal encoded values (%2D, %2E, %27) all get passed correctly.
The config file already has the responseEncoding/requestEncoding in the globalization section set to UTF-8.
How could I read the correct values?
Please note that I'm not the one generating the query string and I have no control over it.
While it's true that á is encoded as U+00E1, the UTF-8 encoding (which is relevant for URL parameters) is 0xC3 0xA1.
You can verify by called a Wikipedia entry on an accented letter, such as http://en.wikipedia.org/wiki/%C3%81
U+FFFD is the Unicode Replacement Character which indicates the a given character value cannot be correctly encoded in Unicode.
Update:
Your question has two points.
First: How do I encode a Unicode string as parameter. Use
"?q=" + HttpUtility.UrlEncode(value)
Second: How do I retrieve a Unicode value? Use:
Request["q"]
If you receive the %E1 from some other source you do not control, maybe the RawUrl can help you. (I have not tried)

ASCII Extended in C# string

How do I make a string in C# to accept non printable ASCII extended characters like • , cause when I try to put • in a string it just give a blank space or null.
Extended ASCII is just ASCII with the 8 high bits set to different values.
The problem lies in the fact that no commission has ratified a standard for extended ASCII. There are a lot of variants out there and there's no way to tell what you are using.
Now C# uses UTF-16 encoding which will be different from whichever extended ASCII you are using.
You will have to find the matching Unicode character and display it as follows
string a ="\u2649" ; //where 2649 is a the Unicode number
Console.write(a) ;
Alternatively you could find out which encoding your files use and use it like so
eg. encoding Windows-1252:
Encoding encoding = Encoding.GetEncoding(1252);
and for UTF-16
Encoding enc = new UnicodeEncoding(false, true, true);
and convert it using
Encoding.Convert (Encoding, Encoding, Byte[], Int32, Int32)
Details are here
Try this..
Convert those charcaters as string as folows.
string equivalentLetter = Encoding.Default.GetString(new byte[] { (byte)letter });
Now, the equivalent letter contains the correct string.
I tried this for EURO symbol, it worked.
.NET strings are UTF-16 encoded, not extended-ascii (whatever that is). By simply adding a number to a character will give you another defined character within the UTF-16 plain set. If you want to see the underlying character as it would be in your extended ASCII encoding you need to convert the newly calculated letter from whatever encoding you are talking about to UTF-16. See: http://msdn.microsoft.com/en-us/library/66sschk1.aspx

Decoding Base64 / Quoted Printable encoded UTF8 string

In my ASP.Net application working process, I need to do some work with string, which equals something like
=?utf-8?B?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=
How can I decode it to normal human language?
Thanks in advance!
Update:
Convert.FromBase64String() does not work for string, which equals
=?UTF-8?Q?Bestellbest=C3=A4tigung?=
I get The format of s is invalid. s contains a non-base-64 character, more than two padding characters, or a non-white space-character among the padding characters. exception.
Update:
Solution Here
Alternative solution
Update:
What kind of string encoding is that: Nweiß ???
It's actually a base-64 string:
string zz = "SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=";
byte[] dd = Convert.FromBase64String(zz);
// Returns Ihre Bestellung - Versandbestätigung - 1105891246
string yy = System.Text.Encoding.UTF8.GetString(dd);
I've written a library that will decode these sorts of strings. You can find it at http://github.com/jstedfast/MimeKit
Specifically, take a look at MimeKit.Utils.Rfc2047.DecodeText()
This seems to be MIME Header Encoding. The Q in your second example indicates that it is Quoted Printable.
This question seems to cover the variants fairly well. In a quick search I didn't find any .NET libraries to decode this automatically, but it shouldn't be hard to do manually if you need to.
That's not UTF8. Thats a Base64 encoded string.
the UTF-8 only indicates that the target string is in UTF8 format.
After decoding the Base64 string:
SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=
You'll get the following result:
Ihre Bestellung - Versandbestätigung - 1105891246
See Base64 online decode/encode
Looks like a base64 string.
Try Convert.FromBase64String
http://msdn.microsoft.com/en-us/library/system.convert.frombase64string.aspx
This is an encoded word, which is used in email headers when there is non-ASCII content. Encoded words are defined in RFC 2047:
https://www.rfc-editor.org/rfc/rfc2047#section-2
The BNF for an encoded word is:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
So the correct way to interpret this is:
The data is the stuff between the 3rd and 4th question marks
It has been Base64 encoded (the 'B' stands for Base64; if it were a
'Q' then it would be quoted-printable).
Once you decode the
data, it will be in the UTF-8 character set.
The result, as #Shai correctly pointed out, is:
Ihre Bestellung - Versandbestätigung - 1105891246
This is German. The umlaut is obviously the reason for the UTF-8 and thus the need for an encoded word. The translation is:
Your order - Delivery confirmation - 1105891246
Apparently it's a tracking number for an order.
All modern email clients (and Outlook) transparently support encoded words.
This is a bit of guesswork, but let's try
remove =? from start and ?= from end
keep the start up to the next ? as the character set
Remove the B? - don't know, what it is
Convert the rest to a byte[] via System.Convert.FromBase64String()
Convert this to the final String via Encoding.GetSTring() using the character set remembered in the second step

Can a Base64 String contain tabs?

Simple yes or no question, and I'm 90% sure that it is no... but I'm not sure.
Can a Base64 string contain tabs?
It depends on what you're asking. If you are asking whether or not tabs can be base-64 encoded, then the answer is "yes" since they can be treated the same as any other ASCII character.
However, if you are asking whether or not base-64 output can contain tabs, then the answer is no. The following link is for an article detailing base-64, including which characters are considered valid:
http://en.wikipedia.org/wiki/Base64
The short answer is no - but Base64 cannot contain carriage returns either.
That is why, if you have multiple lines of Base64, you strip out any carriage returns, line feeds, and anything else that is not in the Base64 alphabet
That includes tabs.
From wikipedia.com:
The current version of PEM (specified
in RFC 1421) uses a 64-character
alphabet consisting of upper- and
lower-case Roman alphabet characters
(A–Z, a–z), the numerals (0–9), and
the "+" and "/" symbols. The "="
symbol is also used as a special
suffix code. The original
specification, RFC 989, additionally
used the "*" symbol to delimit encoded
but unencrypted data within the output
stream.
As you can see, tab characters are not included. However, you can of course encode a tab character into a base64 string.
Sure. Tab is just ASCII character 9, and that has a base64 representation just like any other integer.
Base64 specification (RFC 4648) states in Section 3.3 that any encountered non-alphabet characters should be rejected unless explicitly allowed by another specification:
Implementations MUST reject the
encoded data if it contains
characters outside the base alphabet
when interpreting base-encoded
data, unless the specification
referring to this document explicitly
states otherwise. Such specifications
may instead state, as MIME does,
that characters outside the base
encoding alphabet should simply be
ignored when interpreting data ("be
liberal in what you accept").
Note that this means that any
adjacent carriage return/ line feed
(CRLF) characters constitute
"non-alphabet characters" and are
ignored.
Specs such as PEM (RFC 1421) and MIME (RFC 2045) specify that Base64 strings can be broken up by whitespaces. Per referenced RFC 822, a tab (HTAB) is considered a whitespace character.
So, when Base64 is used in context of either MIME or PEM (and probably other similar specifications), it can contain whitespace, including tabs, which should be handled (stripped out) while decoding the encoded content.
Haha, as you see from the responses, this is actually not such a simple yes no answer.
A resulting Base64 string after conversion cannot contain a tab character, but It seems to me that you are not asking that, seems to me that you are asking can you represent a string (before conversion) containing a tab in Base64, and the answer to that is yes.
I would add though that really what you should do is make sure that you take care to preserve the encoding of your string, i.e. convert it to an array of bytes with your correct encoding (Unicode, UTF-8 whatever) then convert that array of bytes to base64.
EDIT: A simple test.
private void button2_Click(object sender, EventArgs e)
{
StringBuilder sb = new StringBuilder();
string test = "The rain in spain falls \t mainly on the plain";
sb.AppendLine(test);
UTF8Encoding enc = new UTF8Encoding();
byte[] b = enc.GetBytes(test);
string cvtd = Convert.ToBase64String(b);
sb.AppendLine(cvtd);
byte[] c = Convert.FromBase64String(cvtd);
string backAgain = enc.GetString(c);
sb.AppendLine(backAgain);
MessageBox.Show(sb.ToString());
}
It seems that there is lots of confusion here; and surprisingly most answers are of "No" variety. I don't think that is a good canonical answer.
The reason for confusion is probably the fact that Base64 is not strictly specified; multiple practical implementations and interpretations exist.
You can check out link text for more discussion on this.
In general, however, conforming base64 codecs SHOULD understand linefeeds, as they are mandated by some base64 definitions (76 character segments, then linefeed etc).
Because of this, most decoders also allow for indentation whitespace, and quite commonly any whitespace between 4-character "triplets" (so named since they encode 3 bytes).
So there's a good chance that in practice you can use tabs and other white space.
But I would not add tabs myself if generating base64 content sent to a service -- be conservative at what you send, (more) liberal at what you receive.
Convert.FromBase64String() in the .NET framework does not seem to mind them. I believe all whitespace in the string is ignored.
string xxx = "ABCD\tDEFG"; //simulated Base64 encoded string w/added tab
Console.WriteLine(xxx);
byte[] xx = Convert.FromBase64String(xxx); // convert string back to binary
Console.WriteLine(BitConverter.ToString(xx));
output:
ABCD DEFG
00-10-83-0C-41-46
The relevant clause of RFC-2045 (6:8)
The encoded output stream must be
represented in lines of no more
than 76 characters each. All line
breaks or other characters not
found in Table 1 must be ignored by
decoding software. In base64 data,
characters other than those in Table
1, line breaks, and other white
space probably indicate a transmission
error, about which a warning
message or even a message rejection
might be appropriate under some
circumstances.
YES!
Base64 is used to encode ANY 8bit value (Decimal 0 to 255) into a string using a set of safe characters. TAB is decimal 9.
Base 64 uses one of the following character sets:
Data: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
URLs: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
Binary Attachments (eg: email) in text are also encoded using this system.

Categories