Decoding a string to get the value "Indsætning"

Decoding a string to get the value "Indsætning" - c#

I have a string (C# code) that looks as follows:
string s = "IndsÃ¦tning";
It is encoded in some way that I am not sure of.
I'd like to decode it so that I get the following string:
Indsætning
I have tried with
string s1 = HttpUtility.UrlDecode(s);
string s2 = HttpUtility.HtmlDecode(s);
However, I don't get the string that I am looking for.
Any help is appreciated.

Convert the string to bytes and then using the Encoding class to get the UTF-8 string representation.
string s = "IndsÃ¦tning";
byte[] sBytes = s.Select(x => (byte)x).ToArray();
string decoded = Encoding.UTF8.GetString(sBytes);
Edit - As mentioned in the comments, this assumes that the string being converted is of a particular encoding (Latin-1 in this case). Therefore, it won't necessarily work for all strings, unless you know that they've all been encoded into the same format.

Related

How to convert unicode to utf-8 encoding in c#

I want to convert unicode string to UTF8 string. I want to use this UTF8 string in SMS API to send unicode SMS.
I want conversion like this tool
https://cafewebmaster.com/online_tools/utf8_encode
eg. I have unicode string "हैलो फ़्रेंड्स" and it should be converted into "à¤¹à¥à¤²à¥ à¥à¥à¤°à¥à¤à¤¡à¥à¤¸"
I have tried this but not getting expected output
private string UnicodeToUTF8(string strFrom)
{
byte[] bytes = Encoding.Default.GetBytes(strFrom);
return Encoding.UTF8.GetString(bytes);
}
and calling function like this
string myUTF8String = UnicodeToUTF8("हैलो फ़्रेंड्स");

I don't think this is possible to answer concretely without knowing more about the SMS API you want to use. The string type in C# is UTF-16. If you want a different encoding, it's given to you as a byte[] (because a string is UTF-16, always).
You could 'cast' that into a string by doing something like this:
static string UnicodeToUTF8(string from) {
var bytes = Encoding.UTF8.GetBytes(from);
return new string(bytes.Select(b => (char)b).ToArray());
}
As far as I can tell this yields the same output as the website you linked. However, without knowing what API you're handing this string off to, I can't guarantee that this will ultimately work.
The point of string is that we don't need to worry about its underlying encoding, but this casting operation is kind of a giant hack and makes no guarantees that string represents a well-formed string anymore.
If something expects a UTF-8 encoding, it should accept a byte[], not a string.

Try this:
string output = "hello world";
byte[] bytes1 = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, Encoding.Unicode.GetBytes(output));
byte[] bytes2 = Encoding.Convert(Encoding.Unicode, Encoding.Unicode, Encoding.Unicode.GetBytes(output));
var output1 = Encoding.UTF8.GetString(bytes1);
var output2 = Encoding.Unicode.GetString(bytes2);
You will see that bytes1 is 11 bytes (1 byte per char UTF-8) and bytes2 is 22 bytes (2 bytes per char for unicode)

Need help converting escaped Unicode back to original string

I have this string that was UTF8 encoded from PHP.
Encoded value : tr\u008fs
How do I get the original value back using C#?
The original value should be très.

Try this:
Encoding encoding = new UTF8Encoding();
string s = "tr\u008fs";
string value = encoding.GetString(encoding.GetBytes(s));

Passing a url encoded byte array

I need to pass a byte array via a URL. So I am encoding it with the UrlEncode Method like this:
string ergebnis = HttpUtility.UrlEncode(array);
The result is the following string: %00%00%00%00%00%25%b8j
Now when I pass this string in a URL like this http://localhost:51980/api/Insects?Version=%00%00%00%00%00%25%b8j
This is my Get function:
public List<TaxonDiagnosis> Get([FromUri] string version)
{
List<TaxonDiagnosis> result = new List<TaxonDiagnosis>();
result = db.TaxonDiagnosis.ToList();
byte[] array = HttpUtility.UrlDecodeToBytes(version);
if (version != null)
result = db.GetTaxonDiagnosis(array).ToList();
return result;
}
The problem is, version's value isn't %00%00%00%00%00%25%b8j. Instead it is this \0\0\0\0\0%�j. This of course causes problems when I try to decode it into a byte array again.
How can I pass the correct string in the Url?

As suggested by Jon Skeet, I encoded the arraywith a URL-safe base64 decodabet like in this post: How to achieve Base64 URL safe encoding in C#?

Encoding strings to and from base-64

I'm trying to encode some strings back and forth from base-64 string and I'm having truble to get the right result.
string text = base64string.... //Here I have a base-64 string.
byte[] encodedByte = System.Text.ASCIIEncoding.ASCII.GetBytes(text);
string base64Encoded = Convert.ToBase64String(encodedByte);
if (text == base64Encoded) //If the new encoded string is equal to its original value
return base64Encoded;
I have tried my ways to do this and I don't seem to get the right result. I have tried both with System.Text.Encoding.Unicode and System.Text.Encoding.UTF8
What could be the problem? Does anyone have a proper solution?

string text = base64string.... //Here I have a base-64 string.
byte[] encodedByte = System.Text.ASCIIEncoding.ASCII.GetBytes(text);
string base64Encoded = Convert.ToBase64String(encodedByte);
You are double encoding the string. You begin with a base64 string, get the bytes, and then encode it again. If you want to compare you will need to begin with the original string.

If text is a base-64 string, then you are doing it backwards:
byte[] raw = Convert.FromBase64String(text); // unpack the base-64 to a blob
string s = Encoding.UTF8.GetString(raw); // assume the blob is UTF-8, and
// decode to a string
which will get you it as a string. Note, though, that this scenario is only useful for representing unicode text in an ascii format. Normally you wouldn't base-64 encode it if the original contents are string.

Convert whatever it is that you need in Base64 into a Byte array then use the FromBase64String and ToBase64String to convert to and from Base64:
Byte[] buffer = Convert.FromBase64String(myBase64String1);
myBase64String2 = Convert.ToBase64String(buffer);
myBase64String1 will be equal to myBase64String2. You will need to use other methods to get your data type into a Byte array and the reverse to get your data type back. I have used this to convert the content of a class into a byte array and then to Base64 string and write the string to the filesystem. Later I read it back into a class instance by reversing the process.

You have the encoding code correctly laid out. To confirm whether the base64-encoded string is correct, you can try decoding it and comparing the decoded contents to the original:
var decodedBytes = Convert.FromBase64String(base64encoded);
var compareText = System.Text.Encoding.ASCII.GetString(decodedText);
if (text == compareText)
{
// carry on...
return base64encoded;
}

Encoding Conversion problem

I've got a little problem changing the ecoding of a string. Actually I read from a DB strings that are encoded using the codepage 850 and I have to prepare them in order to be suitable for an interoperable WCF service.
From the DB I read characters \x10 and \x11 (triangular shapes) and i want to convert them to the Unicode format in order to prevent serialization/deserialization problem during WCF call. (Chars
and are not valid according of the XML specs even if WCF serialize them).
Now, I use following code in order to covert string encoding, but nothing happens. Result string is in fact identical to the original one.
I'm probably missing something...
Please help me!!!
Emanuele
static class UnicodeEncodingExtension
{
public static string Convert(this Encoding sourceEncoding, Encoding targetEncoding, string value)
{
string reEncodedString = null;
byte[] sourceBytes = sourceEncoding.GetBytes(value);
byte[] targetBytes = Encoding.Convert(sourceEncoding, targetEncoding, sourceBytes);
reEncodedString = sourceEncoding.GetString(targetBytes);
return reEncodedString;
}
}
class Program
{
private static Encoding Cp850Encoding = Encoding.GetEncoding(850);
private static Encoding UnicodeEncoding = Encoding.UTF8;
static void Main(string[] args)
{
string value;
string resultValue;
value = "\x10";
resultValue = Cp850Encoding.Convert(UnicodeEncoding, value);
value = "\x11";
resultValue = Cp850Encoding.Convert(UnicodeEncoding, value);
value = "\u25b6";
resultValue = UnicodeEncoding.Convert(Cp850Encoding, value);
value = "\u25c0";
resultValue = UnicodeEncoding.Convert(Cp850Encoding, value);
}
}

It seems you think there is a problem based on an incorrect understanding. But jmservera is correct - all strings in .NET are encoded internally as unicode.
You didn't say exactly what you want to accomplish. Are you experiencing a problem at the other end of the wire?
Just FYI, you can set the text encoding on a WCF binding with the textMessageEncoding element in the config file.

I suspect this line may be your culprit
reEncodedString = sourceEncoding.GetString(targetBytes);
which seems to take your target encoded string of bytes and asks your sourceEncoding to make a string out of them. I've not had a chance to verify it but I suspect the following might be better
reEncodedString = targetEncoding.GetString(targetBytes);

All the strings stored in string are in fact Unicode.Unicode. Read: Strings in .Net and C# and The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Edit: I suppose that you want the Convert function to automatically change \x11 to \u25c0, but the problem here is that \x11 is valid in almost any encoding, the differences usually start in character \x80, so the Convert function will maintain it even if you do that:
string reEncodedString = null;
byte[] unicodeBytes = UnicodeEncoding.Unicode.GetBytes(value);
byte[] sourceBytes = Encoding.Convert(Encoding.Unicode,
sourceEncoding, unicodeBytes);
You can see in unicode.org the mappings from CP850 to Unicode. So, for this conversion to happen you will have to change these characters manually.

byte[] sourceBytes =Encoding.Default.GetBytes(value)
Encoding.UTF8.GetString(sourceBytes)
this sequence usefull for download unicode file from service(for example xml file that contain persian character)

You should try this:
byte[] sourceBytes = sourceEncoding.GetBytes(value);
var convertedString = Encoding.UTF8.GetString(sourceBytes);

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Decoding a string to get the value "Indsætning" - c#

Related

How to convert unicode to utf-8 encoding in c#

Need help converting escaped Unicode back to original string

Passing a url encoded byte array

Encoding strings to and from base-64

Encoding Conversion problem

Categories

Resources