Convert UTF-8 in string into characters - C#

How can I convert a string that contains UTF-8 encoded text into a string that contains only characters?
At the beginning I have a UTF-8 string, for example "Thành", but after some processing all non-ASCII characters are converted into an encoded form (in this case "Thành" becomes "Th&#224;nh"). How can I convert it back to the original string (convert "Th&#224;nh" into "Thành")? I'm using C#. Thank you all.

The text you see is encoded with XML/HTML numeric character references, nothing to do with UTF-8. You can use HttpUtility.HtmlDecode to decode it.
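A minimal sketch of that decoding step, assuming a .NET 6+ console project with top-level statements. It uses System.Net.WebUtility.HtmlDecode, which handles the same character references as HttpUtility.HtmlDecode without needing a reference to System.Web. The sample input is an assumed decimal character reference for "à" (U+00E0, decimal 224).

```csharp
using System;
using System.Net;

// Assumed sample input: "à" written as a decimal numeric character reference.
string encoded = "Th&#224;nh";

// HtmlDecode turns numeric and named character references back into characters.
string decodedName = WebUtility.HtmlDecode(encoded);

Console.WriteLine(decodedName);
```

On a console whose output encoding is not Unicode-capable the accented letter may not display correctly, even though the string itself is right.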

Related

Decoding Base64 string from URL c#

I am trying to decode the following Base64 string that I'm getting from a URL, like so:
string msgFromUrl = "j3oaCbiwIfZF1QFa%2FHkMaW5lVpnOMBsA5wYI";
byte[] data = Convert.FromBase64String(msgFromUrl);
string decoded = Encoding.UTF8.GetString(data);
Console.WriteLine(decoded);
I am getting an error on the string that I need to decode. The error I'm getting is:
The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
I have realised that there is a percent sign in the string, and when I try to remove it the string falls short of a valid Base64 length. Since I'm retrieving this from a URL, could there be anything hidden, or do I have to do more than I have done in order to decode the encoded string?
Thank you.
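One likely cause, sketched here under the assumption that the value was copied out of a URL without being URL-decoded: "%2F" is the percent-encoded form of "/", which is a valid Base64 character. Unescaping first with Uri.UnescapeDataString yields a 36-character string that Convert.FromBase64String accepts. Whether the resulting bytes are UTF-8 text cannot be told from the question, so the sketch prints only the byte count.

```csharp
using System;

string msgFromUrl = "j3oaCbiwIfZF1QFa%2FHkMaW5lVpnOMBsA5wYI";

// Undo the percent-encoding first: "%2F" becomes "/".
string base64 = Uri.UnescapeDataString(msgFromUrl);

// The unescaped text is plain Base64 and decodes without a FormatException.
byte[] data = Convert.FromBase64String(base64);

Console.WriteLine(base64);
Console.WriteLine(data.Length);
```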

HTML ASCII Code for £

In my XML document, I have to pass a query string containing the "£" character and retrieve the job counts result.
When the query string contains "£", the correct number of job postings is not calculated and the result is always 0.
Please let me know what I need to put in the XML document in place of "£".
Part of the query string with £:
( [salary]: \\\"less than £10,000 \\\" )
I had a look here: http://htmlarrows.com/currency/
Try one of these:
&pound;
U+000A3
&#163;
&#xA3;
In a query string in a URI, non-ASCII characters should be escaped using the %NN convention. The Unicode codepoint for a pound sign (£) -- not to be confused with # which some Americans refer to as a pound sign -- is 163. The UTF-8 encoding of that is the two byte sequence C2-A3, so in a URI you should write %C2%A3.
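The escaping described above can be checked with a short sketch, assuming a .NET 6+ console project with top-level statements. Uri.EscapeDataString percent-encodes the UTF-8 bytes of non-ASCII characters; the variable names are illustrative only.

```csharp
using System;
using System.Text;

string pound = "\u00A3"; // the pound sign, code point 163

// UTF-8 encodes U+00A3 as the two bytes C2 A3.
byte[] utf8 = Encoding.UTF8.GetBytes(pound);
Console.WriteLine(BitConverter.ToString(utf8)); // C2-A3

// EscapeDataString percent-encodes each UTF-8 byte of a non-ASCII character.
string escaped = Uri.EscapeDataString(pound);
Console.WriteLine(escaped); // %C2%A3
```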

Change Encoding in C#?

Theoretical question:
Let's say there is a source that can only transmit ASCII chars (0..127),
and let's say there is an endpoint that receives these chars.
Can the endpoint decode those chars as UTF-8?
ascii chars
...
...
|
|
V
read as utf ?
Something like this pseudocode:
var txt = "אבג";
var _bytes = Encoding.ASCII.GetBytes(txt); // <= it won't recognize [א] here
// ...transmit...
var myUtfString = Encoding.UTF8.GetString(getBytesFromWire()); // <= some magic has to be done here
That is possible, but not using UTF8.
UTF-8 encodes each non-ASCII character as a sequence of two to four bytes, each with a value between 128 and 255.
Your ASCII protocol will not be able to transmit those bytes.
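A small sketch of why this fails, assuming a .NET 6+ console project with top-level statements. It inspects the bytes produced for the three Hebrew letters from the question, written as escapes so the result does not depend on the source file's encoding.

```csharp
using System;
using System.Linq;
using System.Text;

string txt = "\u05D0\u05D1\u05D2"; // the Hebrew letters alef, bet, gimel

// Every byte of a UTF-8 encoded non-ASCII character has its high bit set.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(txt);
bool allHigh = utf8Bytes.All(b => b >= 128);
Console.WriteLine(BitConverter.ToString(utf8Bytes)); // D7-90-D7-91-D7-92

// Encoding.ASCII cannot represent these letters and substitutes '?' (63).
byte[] asciiBytes = Encoding.ASCII.GetBytes(txt);
bool allQuestionMarks = asciiBytes.All(b => b == (byte)'?');
Console.WriteLine(BitConverter.ToString(asciiBytes)); // 3F-3F-3F
```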
Instead, you need some mechanism to store arbitrary Unicode codepoints or bytes in pure ASCII text:
You can encode the Unicode text using any encoding to get a stream of (non-ASCII) bytes, then transmit those bytes using Base64 encoding.
You can use the UTF7 encoding to encode Unicode codepoints using pure ASCII characters.
UTF7 will be substantially more space-efficient than Base64 if your text is mostly ASCII (note that Encoding.UTF7 is marked obsolete in .NET 5 and later).
var txt = "אבג";
var str = Convert.ToBase64String(Encoding.UTF8.GetBytes(txt)); //<--ASCII
//Transmit
var txt2 = Encoding.UTF8.GetString(Convert.FromBase64String(str));

How to encode & decode non Ascii characters?

I am developing an application in which I want to encode Spanish text.
The problem is that it doesn't encode special characters such as á, é, í, ó, ú, ü, Á, É, Í, Ó, Ú, Ü, Ñ, ñ.
How can I do this? I want to encode and decode Spanish text.
For international support using simple UTF-8 encoding to encode/decode your data should be enough.
UTF-8 has the nice property that it stores ASCII characters in one byte each, exactly as ordinary ASCII does, and all other Unicode characters in two to four bytes. So it is able "to shrink" when necessary.
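The byte counts can be checked directly. This is a minimal sketch, assuming a .NET 6+ console project with top-level statements; the sample characters are arbitrary picks, and it shows that a single character takes between one and four bytes in UTF-8.

```csharp
using System;
using System.Text;

// Escapes are used so the result does not depend on the source file's encoding.
int asciiLen = Encoding.UTF8.GetByteCount("a");          // U+0061
int accentLen = Encoding.UTF8.GetByteCount("\u00E1");    // á, U+00E1
int euroLen = Encoding.UTF8.GetByteCount("\u20AC");      // €, U+20AC
int emojiLen = Encoding.UTF8.GetByteCount("\U0001F600"); // U+1F600, outside the BMP

Console.WriteLine($"{asciiLen} {accentLen} {euroLen} {emojiLen}"); // 1 2 3 4
```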
For complete C# documentation, look up UTF-8 (the UTF8Encoding class).
EDIT
Encoding enc = new UTF8Encoding(true, true);
string value = " á, é, í, ó, ú, ü,Á, É, Í, Ó, Ú, Ü,Ñ,ñ ";
byte[] bytes = enc.GetBytes(value); // convert to a byte array
// save the bytes to a file
// later, read the bytes back from the file (here into byteArrayReadFromFile) and decode them
string decodedString = enc.GetString(byteArrayReadFromFile);
OK, I am answering my own question; I hope it helps someone. To print Spanish or any other non-ASCII characters in a given string, replace all non-ASCII characters with their Unicode escape sequences.
E.g. replace á with \u00e1
And then simply print the string.
i.e.
string str="árgrgrgrááhhttá";
str=str.Replace("á", "\u00e1"); // in C# source, "\u00e1" denotes the same character as "á"; the escape only keeps the source file ASCII-only
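Because the C# compiler turns "\u00e1" into the character á, the Replace call above leaves the string unchanged at runtime. If the goal is text that literally contains \uXXXX sequences (for example, to write an ASCII-only file), something like the following sketch could be used. EscapeNonAscii is a hypothetical helper, not a framework method, and the sketch assumes a .NET 6+ console project with top-level statements.

```csharp
using System;
using System.Text;

// Hypothetical helper: writes every non-ASCII UTF-16 code unit as a literal \uXXXX sequence.
static string EscapeNonAscii(string s)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if (c > 127)
            sb.Append("\\u").Append(((int)c).ToString("x4"));
        else
            sb.Append(c);
    }
    return sb.ToString();
}

string escapedText = EscapeNonAscii("\u00E1rbol"); // input is "árbol"
Console.WriteLine(escapedText); // \u00e1rbol
```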

What's the difference between UTF8/UTF16 and Base64 in terms of encoding

In C#,
we can use the encodings below:
System.Text.Encoding.UTF8
System.Text.Encoding.Unicode (UTF-16)
System.Text.Encoding.ASCII
Why is there no System.Text.Encoding.Base64?
We can only use the Convert.FromBase64String and Convert.ToBase64String methods. What is special about Base64?
Can I say Base64 is the same kind of encoding method as UTF-8? Or is UTF-8 a kind of Base64?
UTF-8 and UTF-16 are methods to encode Unicode strings to byte sequences.
See: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Base64 is a method to encode a byte sequence to a string.
So, these are widely different concepts and should not be confused.
Things to keep in mind:
Not every byte sequence represents a Unicode string encoded in UTF-8 or UTF-16.
Not every Unicode string represents a byte sequence encoded in Base64.
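Both points can be seen in a short sketch, assuming a .NET 6+ console project with top-level statements. A strict UTF8Encoding rejects a byte that cannot occur in UTF-8, and Convert.FromBase64String rejects text containing characters outside the Base64 alphabet. The sample inputs are arbitrary.

```csharp
using System;
using System.Text;

// A strict decoder: throwOnInvalidBytes = true makes invalid input raise an exception.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

bool utf8Rejected = false;
try
{
    strictUtf8.GetString(new byte[] { 0xFF }); // 0xFF never appears in valid UTF-8
}
catch (DecoderFallbackException)
{
    utf8Rejected = true;
}

bool base64Rejected = false;
try
{
    Convert.FromBase64String("\u00E1rbol!"); // 'á' and '!' are not Base64 characters
}
catch (FormatException)
{
    base64Rejected = true;
}

Console.WriteLine($"{utf8Rejected} {base64Rejected}"); // True True
```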
Base64 is a way to encode binary data, while UTF8 and UTF16 are ways to encode Unicode text. Note that in a language like Python 2.x, where binary data and strings are mixed, you can encode strings into base64 or utf8 the same way:
u'abc'.encode('utf16')
u'abc'.encode('base64')
But in languages where there's a more well-defined separation between the two types of data, the two ways of representing data generally have quite different utilities, to keep the concerns separate.
UTF-8, like the other UTF encodings, is a character encoding for the characters of the Unicode character set (UCS).
Base64 is an encoding to represent any byte sequence by a sequence of printable characters (i.e. A–Z, a–z, 0–9, +, and /).
There is no System.Text.Encoding.Base64 because Base64 is not a text encoding but rather a base conversion like the hexadecimal that uses 0–9 and A–F (or a–f) to represent numbers.
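To illustrate the comparison with hexadecimal, here is a minimal sketch, assuming a .NET 6+ console project with top-level statements. It renders the same three arbitrary bytes in base 16 and in Base64; neither step involves a text encoding such as UTF-8.

```csharp
using System;

byte[] raw = { 0x00, 0xFF, 0x10 }; // arbitrary bytes, not text in any encoding

string hex = BitConverter.ToString(raw);  // base 16, two digits per byte
string b64 = Convert.ToBase64String(raw); // base 64, four characters per three bytes

Console.WriteLine(hex); // 00-FF-10
Console.WriteLine(b64); // AP8Q

// Converting back recovers the original bytes exactly.
byte[] roundTrip = Convert.FromBase64String(b64);
```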
Simply put, a character encoding such as UTF-8 or UTF-16 maps numbers (i.e. bytes) to characters and vice versa; for example, in ASCII, 65 maps to "A". A base encoding, by contrast, mainly translates bytes to bytes so that the resulting bytes are printable and form a subset of the ASCII character set; for that reason you can also see Base64 as a bytes-to-text encoding mechanism. The main reason to use Base64 is to transmit data over a channel that doesn't allow binary data transfer.
That said, it should now be clear that you can have a Base64-encoded stream that represents a UTF-8 encoded stream.
