I have a requirement to encode and decode Japanese characters. I tried this in Java and it worked fine with the "Cp939" encoding, but I am unable to find that encoding in .NET. The 932 encoding doesn't cover all the characters, so I need to find a way of implementing the 939 encoding in .NET.
Java code:
convStr = new String(str8859_1.getBytes("Cp037"), "Cp939");
.NET:
bytesConverted = Encoding.Convert(Encoding.GetEncoding(37),
                                  Encoding.GetEncoding(932), bytesConverted);
// The result is junk characters, totally different
// from the expected output 'ニツポンバ'
convStr = Encoding.GetEncoding(1252).GetString(bytesConverted);
The encoded bytes are in the encoding 932, so why are you using the encoding 1252 when you convert the encoded bytes to a string?
The following should work:
bytesConverted = Encoding.Convert(Encoding.GetEncoding(37),
                                  Encoding.GetEncoding(932), bytesConverted);
convStr = Encoding.GetEncoding(932).GetString(bytesConverted);
Is this an error, or just how you typed it?
bytesConverted = Encoding.Convert(Encoding.GetEncoding(37),
Encoding.GetEncoding(932), bytesConverted);
should be:
bytesConverted = Encoding.Convert(Encoding.GetEncoding(37),
Encoding.GetEncoding(939), bytesConverted);
Surely?
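For what it's worth, you can check at runtime whether a given code page is registered at all before relying on it. Here is a minimal probe, assuming .NET (on .NET Core / .NET 5+ the legacy code pages come from the System.Text.Encoding.CodePages package and must be registered first); if 939 never shows up, the runtime simply does not ship that code page and you would need a custom Encoding or an external mapping table:
using System;
using System.Text;

class EncodingProbe
{
    static void Main()
    {
        // On .NET Core / .NET 5+ this makes the legacy Windows/EBCDIC code
        // pages visible (requires the System.Text.Encoding.CodePages package).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // List every code page this runtime reports.
        foreach (EncodingInfo info in Encoding.GetEncodings())
            Console.WriteLine("{0}\t{1}", info.CodePage, info.Name);

        // Probe specifically for IBM code page 939.
        try
        {
            Encoding ebcdicJp = Encoding.GetEncoding(939);
            Console.WriteLine("939 is available: " + ebcdicJp.EncodingName);
        }
        catch (Exception ex) when (ex is NotSupportedException || ex is ArgumentException)
        {
            Console.WriteLine("939 is not registered on this runtime.");
        }
    }
}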
I'm trying to create a PHP client wrapper to talk to a .NET API. What I have works, but I am new to PHP development and it looks like it may not work 100% of the time.
C# code I am trying to replicate:
private static void HMAC_Debug()
{
    Console.WriteLine("Secret Key (Base64): 'qCJ6KNCd/ASFOt1cL5uq2TUYcRjplpYUy7QdUmvaCTs='");
    var secret = Convert.FromBase64String("qCJ6KNCd/ASFOt1cL5uq2TUYcRjplpYUy7QdUmvaCTs=");

    Console.WriteLine("Value To Hash (UTF8): 'MyHashingValue©'");
    var value = Encoding.UTF8.GetBytes("MyHashingValue©");

    using (HMACSHA256 hmac = new HMACSHA256(secret))
    {
        byte[] signatureBytes = hmac.ComputeHash(value);
        string requestSignatureBase64String = Convert.ToBase64String(signatureBytes);
        Console.WriteLine("Resulting Hash (Base64): '{0}'", requestSignatureBase64String);
    }

    Console.ReadLine();
}
My PHP equivalent:
$rawKey = base64_decode("qCJ6KNCd/ASFOt1cL5uq2TUYcRjplpYUy7QdUmvaCTs=");
// $hashValArr = unpack("C*", utf8_encode("MyHashingValue©"));
//
// $hashVal = call_user_func_array("pack", array_merge(array("C*"), $hashValArr));
$hashVal = "MyHashingValue©";
$raw = hash_hmac("sha256", $hashVal, $rawKey, TRUE);
$rawEnc = base64_encode($raw);
echo $rawEnc;
These two snippets produce the same Base64 output, but I am relying on the string variables in PHP defaulting to UTF-8 - is this a correct assumption, or is there something more robust I can do?
You can see from the commented-out PHP lines that I attempted to manually encode the value to UTF-8 and then extract the bytes for the PHP HMAC function, but it didn't produce the same output as the C# code.
Thanks
Marlon
Which version of PHP are you using?
In general you cannot rely on the encoding being UTF-8. In fact, it may simply be that you stored the file as UTF-8 (I guess without a BOM), but older PHP versions (as far as I know, before PHP 7) cannot work natively with Unicode; they just read it as ASCII / extended ASCII.
That said, if you do not manipulate the string, it is possible that your example works because you are just processing the bytes stored in the variable. If that byte sequence happened to be a UTF-8 encoded string at the time you inserted it into your source code, it stays that way.
If you get the string from an arbitrary source, you should make sure you know which encoding is used, and consider PHP's multibyte string functions, which can work with different encodings [1].
[1] http://us2.php.net/manual/en/ref.mbstring.php
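A practical way to take the encoding guesswork out of a comparison like this is to dump the exact bytes each side feeds into the HMAC. Below is a minimal sketch of the C# side (the helper name DumpUtf8Bytes is just for illustration); the PHP counterpart would be echo bin2hex($hashVal);. If the two hex dumps match, the Base64 signatures must match as well:
using System;
using System.Text;

class HmacByteCheck
{
    // Illustrative helper: prints the exact UTF-8 bytes that will be hashed,
    // for comparison against bin2hex() output on the PHP side.
    static void DumpUtf8Bytes(string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        Console.WriteLine(BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant());
    }

    static void Main()
    {
        // The '©' (U+00A9) should show up as the two bytes c2 a9.
        DumpUtf8Bytes("MyHashingValue©");
    }
}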
I am parsing some web content from an HttpWebRequest response.
This web content uses charset ISO-8859-1, and when parsing it and finally getting the word I need from the response, I receive a string with a replacement character like this: �. I want to know the right way to transform it back into a readable string.
So what I've tried is to convert the word from its current encoding to UTF-8, like this
(I am wondering if UTF-8 could solve my problem):
string word = "ESPA�OL";
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf = Encoding.GetEncoding("UTF-8");
byte[] isoBytes = iso.GetBytes(word);
byte[] utfBytes = Encoding.Convert(iso, utf, isoBytes);
string utfWord = utf.GetString(utfBytes);
Console.WriteLine(utfWord);
However, the utfWord variable outputs ESPA?OL, which is still wrong. The correct output is supposed to be ESPAÑOL.
Can someone please give me the right directions to solve this, if possible?
The word in question is "ESPAÑOL". This can be encoded correctly in ISO-8859-1 since all characters in the word are represented in ISO-8859-1.
You can see this for yourself using the following simple program:
using System;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Encoding enc = Encoding.GetEncoding("ISO-8859-1");
            string original = "ESPAÑOL";
            byte[] iso_8859_1 = enc.GetBytes(original);
            string roundTripped = enc.GetString(iso_8859_1);

            Debug.Assert(original == roundTripped);
            Console.WriteLine(roundTripped);
        }
    }
}
What this tells you is that you need to properly diagnose where the erroneous character comes from. By the time that you have a � character, it is too late. The information has been lost. The presence of the � character indicates that, at some point, a conversion was performed into a character set that did not contain the character Ñ.
A conversion from ISO-8859-1 to a Unicode encoding will correctly handle "ESPAÑOL" because that word can be encoded in ISO-8859-1.
The most likely explanation is that somewhere along the way, the text "ESPAÑOL" is being converted to a character set that does not contain the letter Ñ.
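In practice, the most common place this goes wrong is decoding the HTTP response with the wrong encoding in the first place (StreamReader defaults to UTF-8). A minimal sketch, assuming the page really is served as ISO-8859-1; the URL is a placeholder:
using System;
using System.IO;
using System.Net;
using System.Text;

class Fetch
{
    static void Main()
    {
        // Placeholder URL - substitute the real address being parsed.
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/page");
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        // Decode the raw bytes as ISO-8859-1 instead of the default UTF-8,
        // so that Ñ (byte 0xD1) never degrades into �.
        using (var reader = new StreamReader(stream, Encoding.GetEncoding("ISO-8859-1")))
        {
            string content = reader.ReadToEnd();
            Console.WriteLine(content);
        }
    }
}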
In my Silverlight application I am getting an XML file encoded as windows-1252.
Now my problem is that it won't display correctly until the windows-1252 string is converted to a UTF-8 string.
In a normal C# environment that wouldn't be that big of a problem; there I could do something like this:
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
(Convert a string's character encoding from windows-1252 to utf-8)
But Silverlight doesn't support windows-1252 - it is Unicode only.
PS:
I stumbled upon "Encoding for Silverlight" (http://encoding4silverlight.codeplex.com/), but it seems there is no support for windows-1252 there either.
EDIT:
I solved my problem on the "server side" - the actual problem is still open.
Encoding for Silverlight is a third-party encoding system, but it currently supports only DBCS (double-byte character sets), whereas windows-1252 is an SBCS (single-byte character set).
But you could write an encoder/decoder for Encoding for Silverlight yourself; I think it would be very easy.
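Since windows-1252 agrees with the first 256 Unicode code points everywhere except the 0x80-0x9F range, such a decoder really is short. Here is a minimal sketch (decoder only; an encoder would be the reverse lookup, and the class and method names are just for illustration), with the five undefined positions mapped to U+FFFD:
using System.Text;

static class Windows1252
{
    // windows-1252 matches Unicode code points byte-for-byte except 0x80-0x9F.
    // Table taken from the published windows-1252 code chart; '\uFFFD' marks
    // the positions that are undefined in the code page.
    private static readonly char[] High = new char[]
    {
        '\u20AC', '\uFFFD', '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\uFFFD', '\u017D', '\uFFFD',
        '\uFFFD', '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '\uFFFD', '\u017E', '\u0178'
    };

    public static string GetString(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
            sb.Append(b >= 0x80 && b <= 0x9F ? High[b - 0x80] : (char)b);
        return sb.ToString();
    }
}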
I am reading a file (line by line) that is full of Swedish characters like äåö, but how can I read and save the strings so the Swedish characters survive? Here is my code; I am using UTF-8 encoding:
TextReader tr = new StreamReader(@"c:\testfile.txt", System.Text.Encoding.UTF8, true);
tr.ReadLine(); // returns a string, but the Swedish characters do not appear correctly
You need to change System.Text.Encoding.UTF8 to System.Text.Encoding.GetEncoding(1252). See below:
System.IO.TextReader tr = new System.IO.StreamReader(@"c:\testfile.txt", System.Text.Encoding.GetEncoding(1252), true);
tr.ReadLine(); // returns a string, and now the Swedish characters appear correctly
I figured it out myself: System.Text.Encoding.Default supports the Swedish characters.
TextReader tr = new StreamReader(@"c:\testfile.txt", System.Text.Encoding.Default, true);
System.Text.Encoding.UTF8 should be enough, and it is supported on both .NET Framework and .NET Core: https://learn.microsoft.com/en-us/dotnet/api/system.text.encoding?redirectedfrom=MSDN&view=netframework-4.8
If you still get ��� characters (instead of ÅÖÄ), then check the source file - what encoding does it have? Maybe it's ANSI; then you have to convert it to UTF-8.
You can do this in Notepad++: open the text file and go to Encoding - Convert to UTF-8.
Alternatively in the source code (C#):
var myString = Encoding.UTF8.GetString(File.ReadAllBytes(pathToTheTextFile));
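If the file does turn out to be ANSI (code page 1252 on Western-European Windows), the conversion can also be done once in code rather than in Notepad++. A minimal sketch; the path is a placeholder, and on .NET Core / .NET 5+ code page 1252 requires registering the CodePages encoding provider first:
using System.IO;
using System.Text;

class ConvertToUtf8
{
    static void Main()
    {
        // Placeholder path; adjust to the real file.
        string path = @"c:\testfile.txt";

        // Read the bytes as windows-1252 (ANSI on Western-European Windows) ...
        string text = File.ReadAllText(path, Encoding.GetEncoding(1252));

        // ... and write them back out as UTF-8.
        File.WriteAllText(path, text, Encoding.UTF8);
    }
}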
I'm working with the ICQ protocol and I have run into a problem with special letters (e.g. diacritics). I read that ICQ uses a different encoding (CP-1251, if I remember correctly).
How can I decode the message text to the correct encoding?
I've tried using the UTF8Encoding class, but without success.
I am using the ICQ-sharp library.
private void ParseMessage(string uin, byte[] data)
{
    ushort capabilities_length = LittleEndianBitConverter.Big.ToUInt16(data, 2);
    ushort msg_tlv_length = LittleEndianBitConverter.Big.ToUInt16(data, 6 + capabilities_length);

    string message = Encoding.UTF8.GetString(data, 12 + capabilities_length, msg_tlv_length - 4);
    Debug.WriteLine(message);
}
If the contact uses the same client it's OK, but if not, incoming and outgoing messages with diacritics are just unreadable.
I've determined (using this -> https://stackoverflow.com/a/12853721/846232) that the data is in big-endian Unicode (UTF-16BE) encoding. But if the string contains no diacritics, it comes out unreadable (Chinese letters), while UTF-8 decoding works fine on text without diacritics. I don't know how to decode it correctly in all cases.
If UTF-8 kinda works (i.e. it works for "English", or any US-ASCII characters), then you don't have UTF-16. Latin-1 (or Windows-1252, Microsoft's variant), or e.g. Windows-1251 or Windows-1250, are perfectly possible though, since in all of these the first part, containing the Latin letters without diacritics, is the same.
Decode like this:
var encoding = Encoding.GetEncoding("Windows-1250");
string message = encoding.GetString(data, 12 + capabilities_length, msg_tlv_length - 4);