How to convert Arabic string to UTF-8 using C#? - c#

I need to convert some Arabic text to UTF-8 and then to hexadecimal. I wrote some code, but the output looks like what is shown in the image below.
Code I tried:
string myName = _name.Text;
string myNameLength = _name.TextLength.ToString("X2");
byte[] nameByte = Encoding.Default.GetBytes(myName);
var hexStringName = BitConverter.ToString(nameByte);
hexStringCo = hexStringCo.Replace("-", "");
Picture

Getting the UTF-8 bytes looks like this:
string name = "عبود";
byte[] utf8 = Encoding.UTF8.GetBytes(name);
var hex = BitConverter.ToString(utf8);
hex = hex.Replace("-", "");
Console.WriteLine(hex); // D8B9D8A8D988D8AF
What you do with those is up to you; there's zero chance that a hex string was rendered with the replacement character (aka �), so: you're doing something else that we can't see. Maybe show us what you're doing with the value once you have it.
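To check that a hex string really is the UTF-8 form of the text, it can help to decode it back. The following is a minimal sketch, not the asker's code: the class name Utf8Hex and its three methods are hypothetical helpers. ByteLengthHex assumes the length prefix is supposed to count UTF-8 bytes, which the question does not state; note that TextLength counts characters, so for "عبود" it gives 4 while the UTF-8 byte count is 8.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Utf8Hex
{
    // Encode text as UTF-8 and render each byte as two uppercase hex digits.
    public static string ToHex(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return BitConverter.ToString(utf8).Replace("-", "");
    }

    // Reverse of ToHex: parse pairs of hex digits into bytes, then decode as UTF-8.
    public static string FromHex(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex string must have an even length.", nameof(hex));

        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);

        return Encoding.UTF8.GetString(bytes);
    }

    // Assumption: the length prefix should count UTF-8 bytes, not characters.
    public static string ByteLengthHex(string text)
    {
        return Encoding.UTF8.GetByteCount(text).ToString("X2");
    }
}
```

If Utf8Hex.FromHex(Utf8Hex.ToHex(name)) returns the original name, the hex string is fine and any � must come from a later step.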

Related

Converting UTF 16 string to ANSI string

So I have a string:
string test = "Checking";
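It is stored as a UTF-16 string, but I want to convert it to ANSI. Is there a way to do that?
If "ANSI" means the system default code page (note that Encoding.Default returns the system ANSI code page only on .NET Framework; on .NET Core and later it is always UTF-8):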
byte[] result = Encoding.Default.GetBytes(test);
Otherwise you can specify a particular ANSI code page:
int cp = 1250; // e.g. Windows-1250
byte[] result = Encoding.GetEncoding(cp).GetBytes(test);
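On .NET Core and .NET 5 or later, Encoding.GetEncoding(1250) throws unless the legacy code pages have been registered first. A minimal sketch under that assumption follows; the class name AnsiBytes and its Encode method are hypothetical, and on .NET Framework the registration step is not needed (CodePagesEncodingProvider there comes from a separate package).

```csharp
using System.Text;

// Hypothetical helper class, for illustration only.
public static class AnsiBytes
{
    // Assumption: running on .NET Core / .NET 5+, where legacy Windows
    // code pages are only available after registering this provider.
    static AnsiBytes()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Convert a (UTF-16) string to bytes in the given code page.
    // Characters the code page cannot represent are replaced by the
    // encoding's default fallback, typically '?'.
    public static byte[] Encode(string text, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetBytes(text);
    }
}
```

For example, AnsiBytes.Encode("Checking", 1250) yields 8 bytes, one per character, because every character in that string is ASCII.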

Need help converting escaped Unicode back to original string

I have this string that was UTF8 encoded from PHP.
Encoded value : tr\u008fs
How do I get the original value back using C#?
The original value should be très.
Try this:
Encoding encoding = new UTF8Encoding();
string s = "tr\u008fs";
string value = encoding.GetString(encoding.GetBytes(s));
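One caveat about the snippet above: encoding a well-formed string to UTF-8 and decoding it again returns the same string, so it cannot turn "tr\u008fs" into "très". The sketch below (class and method names are hypothetical) shows the round trip and also recovers the raw byte behind the escaped character. Which code page that byte came from is not stated in the question; as an assumption only, 0x8F is è in Mac OS Roman, so decoding the raw bytes with that code page is one candidate worth trying.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class RoundTripCheck
{
    // UTF-8 encode followed by UTF-8 decode: returns an equal string
    // for well-formed input, so it changes nothing.
    public static string Utf8RoundTrip(string s)
    {
        return Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(s));
    }

    // Recover raw bytes by treating each char (U+0000..U+00FF) as one byte.
    public static byte[] CharsToBytes(string s)
    {
        byte[] bytes = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] > 0xFF)
                throw new ArgumentException("Character does not fit in one byte.", nameof(s));
            bytes[i] = (byte)s[i];
        }
        return bytes;
    }
}
```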

UTF-8 gives extra text with German characters

I have a file name, testtäöüßÄÖÜ, that I want to convert to UTF-8 using C#.
string test ="testtäöüß";
var bytes = new List<byte>(test.Length);
foreach (var c in test)
bytes.Add((byte)c);
var retValue = Encoding.UTF8.GetString(bytes.ToArray());
After running this code, my output is 'testt mit Umlaute äöü?x', where "mit Umlaute" is extra text.
Can anybody help me?
Thanks in advance.
You can't do that. You can't cast a character to a byte: a C# char is a 16-bit UTF-16 code unit, and in UTF-8 anything other than ASCII requires at least two bytes, which a single byte can't store.
Instead of creating a list, use:
byte[] bytes = System.Text.Encoding.UTF8.GetBytes(test);
I think Tseng means the following:
Taken from: http://www.chilkatsoft.com/p/p_320.asp
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// This is our Unicode string:
string s_unicode = "abcéabc";
// Convert a string to utf-8 bytes.
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);
// Convert utf-8 bytes to a string.
string s_unicode2 = System.Text.Encoding.UTF8.GetString(utf8Bytes);
MessageBox.Show(s_unicode2);
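To make the difference concrete, here is a small sketch comparing the cast from the question with Encoding.UTF8.GetBytes. The class and method names are hypothetical. Casting ä (U+00E4) yields the single byte 0xE4, which is not valid UTF-8 on its own, so the decoder substitutes the replacement character U+FFFD.

```csharp
using System.Linq;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class CastVersusEncode
{
    // The approach from the question: truncate each UTF-16 code unit
    // to one byte, then decode those bytes as if they were UTF-8.
    public static string CastThenDecode(string text)
    {
        byte[] truncated = text.Select(c => unchecked((byte)c)).ToArray();
        return Encoding.UTF8.GetString(truncated);
    }

    // The suggested approach: produce real UTF-8 bytes, then decode them.
    public static string EncodeThenDecode(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(utf8);
    }
}
```

With "testtäöüß" as input, EncodeThenDecode returns the original string, while CastThenDecode returns "testt" followed by replacement characters.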

Decoding an UTF-8 string to Windows-1256

I used this code to encode a UTF-8 string to Windows-1256 string:
string q = textBox1.Text;
UTF7Encoding utf = new UTF7Encoding();
byte[] winByte = Encoding.GetEncoding(1256).GetBytes(q);
string result = utf.GetString(winByte);
This code works, but I can't decode the result back to the original string.
How can I decode the encoded string (the result variable) back to what it was before conversion (the q variable)?
You are converting the strings incorrectly.
Have a look at the commented code below. The comments explain what is wrong, and how to do it correctly, but basically what is happening is:
Firstly you use Encoding.GetEncoding(1256).GetBytes(q) to convert a string (which is UTF16) to an ANSI codepage 1256 string.
Then you use a UTF7 encoding to convert it back. But that's wrong because you need to use an ANSI codepage 1256 encoding to convert it back:
string q = "ABئبئ"; // UTF16.
UTF7Encoding utf = new UTF7Encoding(); // Used to convert UTF16 to/from UTF7
// Convert UTF16 to ANSI codepage 1256. winByte[] will be ANSI codepage 1256.
byte[] winByte = Encoding.GetEncoding(1256).GetBytes(q);
// Convert UTF7 to UTF16.
// But this is WRONG because winByte is ANSI codepage 1256, NOT UTF7!
string result = utf.GetString(winByte);
Debug.Assert(result != q); // So result doesn't equal q
// The CORRECT way to convert the ANSI string back:
// Convert ANSI codepage 1256 string to UTF16
result = Encoding.GetEncoding(1256).GetString(winByte);
Debug.Assert(result == q); // Now result DOES equal q

How to encode Japanese characters

I have to develop a program that performs an encoding.
I have these Japanese characters:
つれづれなるまゝに、日暮らし、硯にむかひて、心にうつりゆくよしなし事を、そこはかとなく書きつくれば、あやしうこそものぐるほしけれ
I want to convert this string to an encoding like this:
%26%2312388%3B%26%2312428%3B%26%2312389%3B%26%2312428%3B%26%2312394%3B%26%2312427%3B%26%2312414%3B%26%2312445%3B%26%2312395%3B%26%2312289%3B%26%2326085%3B%26%2326286%3B%26%2312425%3B%26%2312375%3B%26%2312289%3B%26%2330831%3B%26%2312395%3B%26%2312416%3B%26%2312363%3B%26%2312402%3B%26%2312390%3B%26%2312289%3B%26%2324515%3B%26%2312395%3B%26%2312358%3B%26%2312388%3B%26%2312426%3B%26%2312422%3B%26%2312367%3B%26%2312424%3B%26%2312375%3B%26%2312394%3B%26%2312375%3B%26%2320107%3B%26%2312434%3B%26%2312289%3B%26%2312381%3B%26%2312371%3B%26%2312399%3B%26%2312363%3B%26%2312392%3B%26%2312394%3B%26%2312367%3B%26%2326360%3B%26%2312365%3B%26%2312388%3B%26%2312367%3B%26%2312428%3B%26%2312400%3B%26%2312289%3B%26%2312354%3B%26%2312420%3B%26%2312375%3B%26%2312358%3B%26%2312371%3B%26%2312381%3B%26%2312418%3B%26%2312398%3B%26%2312368%3B%26%2312427%3B%26%2312411%3B%26%2312375%3B%26%2312369%3B%26%2312428%3B%26%2312290%3B.
How can I do that?
I believe you are looking for HttpUtility.UrlEncode, although I can't figure out which encoding yields exactly the output you show.
var testString = "つれづれなるまゝに、日暮らし、硯にむかひて、心にうつりゆくよしなし事を、そこはかとなく書きつくれば、あやしうこそものぐるほしけれ。";
var encodedUrl = HttpUtility.UrlEncode(testString, Encoding.UTF8);
You might want to change your question, as you don't really need to convert Unicode to ASCII, which is impossible. What you need is percent-encoding (also known as URL encoding).
[EDIT]
I figured it out:
var testString = "つれづれなるまゝに、日暮らし、硯にむかひて、心にうつりゆくよしなし事を、そこはかとなく書きつくれば、あやしうこそものぐるほしけれ。";
var htmlEncoded = string.Concat(testString.Select(arg => string.Format("&#{0};", (int)arg)));
var result = HttpUtility.UrlEncode(htmlEncoded);
The result matches the encoding that you provided, apart from the case of the hex digits (HttpUtility.UrlEncode emits lowercase, e.g. %3b).
Step by step:
var inputChar = 'つ';
var charValue = (int)inputChar; // 12388
var htmlEncoded = "&#" + charValue + ";"; // &#12388;
var urlEncoded = HttpUtility.UrlEncode(htmlEncoded); // %26%2312388%3b
This is impossible. Unicode is so much larger than ASCII that you can't look up every Unicode character in ASCII. ASCII has only 128 characters (including control characters), while Unicode defines well over a hundred thousand.
Here is a function that seems to work:
public static string UrlDoubleEncode(string text)
{
    if (text == null)
        return null;
    StringBuilder sb = new StringBuilder();
    foreach (int i in text)
    {
        sb.Append('&');
        sb.Append('#');
        sb.Append(i);
        sb.Append(';');
    }
    return HttpUtility.UrlEncode(sb.ToString());
}
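The target string in the question uses uppercase hex digits (%3B), whereas HttpUtility.UrlEncode emits lowercase. If the case matters, one option is Uri.EscapeDataString, which emits uppercase and does not need System.Web. This is a sketch of the same two-step idea, not the answerer's code; the class name NumericReferenceUrlEncoder is hypothetical. It works on UTF-16 code units, so a character outside the Basic Multilingual Plane would come out as two surrogate references.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class NumericReferenceUrlEncoder
{
    public static string Encode(string text)
    {
        if (text == null)
            return null;

        // Step 1: turn every UTF-16 code unit into a decimal
        // numeric character reference such as &#12388;
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            sb.Append("&#").Append((int)c).Append(';');
        }

        // Step 2: percent-encode the references. '&', '#' and ';' become
        // %26, %23 and %3B (uppercase hex); the digits are left unchanged.
        return Uri.EscapeDataString(sb.ToString());
    }
}
```

For example, NumericReferenceUrlEncoder.Encode("つ") returns %26%2312388%3B, which is the first group in the expected output from the question.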
