I want to convert a String to ByteArray in C# for Decrypt some data.
When I get de String from the ByteArray created, it shows question marks (?).
Example code:
byte[] strTemp = Encoding.ASCII.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.ASCII.GetString(strTemp));
The string is "Ê<,,l"x¡" (With the double quotation mark) and the result to convert again to string is: ???l?x?
I hope this helps you:
To Get byte array from a string
private byte[] StringToByteArray(string str)
{
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
return enc.GetBytes(str);
}
To get a string back from byte array:
private string ByteArrayToString(byte[] arr)
{
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
return enc.GetString(arr);
}
For specific of that input, this BigEndianUnicode encoding seems works fine
byte[] strTemp = Encoding.BigEndianUnicode.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.BigEndianUnicode.GetString(strTemp));
`
You are getting a byte array for the ASCII representation of your string, but your string is Unicode.
C# uses Unicode to encode strings, Unicode being able to represent far more symbols as ASCII.
In your example, every symbol which has no ASCII representation is replaced by '?', this is why only 'l' and 'x' appear in the output.
The proper way to do it is to use a Unicode encoding instead:
byte[] strTemp = Encoding.UTF8.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.UTF8.GetString(strTemp));
Basically, any Unicode encoding can be used: UTF8, UTF32, Unicode, BigEndianUnicode (https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings).
Related
I have file name testtäöüßÄÖÜ . I want to convert in UTF-8 using c#.
string test ="testtäöüß";
var bytes = new List<byte>(test.Length);
foreach (var c in test)
bytes.Add((byte)c);
var retValue = Encoding.UTF8.GetString(bytes.ToArray());
after running this code my output is : 'testt mit Umlaute äöü?x. where mit Umlaute is extra
text.
Can anybody help me ?
Thanks in advance.
You can't do that. You can't cast an UTF-8 character to byte. UTF-8 for anything other than ASCII requires at least two bytes, byte can can't store this
Instead of creating a list, use
byte[] bytes = System.Text.Encoding.UTF8.GetBytes (test);
I think, Tseng means the following
Taken from: http://www.chilkatsoft.com/p/p_320.asp
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// This is our Unicode string:
string s_unicode = "abcéabc";
// Convert a string to utf-8 bytes.
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);
// Convert utf-8 bytes to a string.
string s_unicode2 = System.Text.Encoding.UTF8.GetString(utf8Bytes);
MessageBox.Show(s_unicode2);
I have a encryption method GetDecryptedSSN(). I tested it’s correctness by the following test. It works fine
//////////TEST 2//////////
byte[] encryptedByteWithIBMEncoding2 = DecryptionServiceHelper.GetEncryptedSSN("123456789");
string clearTextSSN2 = DecryptionServiceHelper.GetDecryptedSSN(encryptedByteWithIBMEncoding2);
But when I do a conversion to ASCII String and then back, it is not working correctly. What is the problem in the conversion logic?
//////////TEST 1//////////
//String -- > ASCII Byte --> IBM Byte -- > encryptedByteWithIBMEncoding
byte[] encryptedByteWithIBMEncoding = DecryptionServiceHelper.GetEncryptedSSN("123456789");
//encryptedByteWithIBMEncoding --> Encrypted Byte ASCII
string EncodingFormat = "IBM037";
byte[] encryptedByteWithASCIIEncoding = Encoding.Convert(Encoding.GetEncoding(EncodingFormat), Encoding.ASCII,
encryptedByteWithIBMEncoding);
//Encrypted Byte ASCII - ASCII Encoded string
string encodedEncryptedStringInASCII = System.Text.ASCIIEncoding.ASCII.GetString(encryptedByteWithASCIIEncoding);
//UpdateSSN(encodedEncryptedStringInASCII);
byte[] dataInBytesASCII = System.Text.ASCIIEncoding.ASCII.GetBytes(encodedEncryptedStringInASCII);
byte[] bytesInIBM = Encoding.Convert(Encoding.ASCII, Encoding.GetEncoding(EncodingFormat),
dataInBytesASCII);
string clearTextSSN = DecryptionServiceHelper.GetDecryptedSSN(bytesInIBM);
Helper Class
public static class DecryptionServiceHelper
{
public const string EncodingFormat = "IBM037";
public const string SSNPrefix = "0000000";
public const string Encryption = "E";
public const string Decryption = "D";
public static byte[] GetEncryptedSSN(string clearTextSSN)
{
return GetEncryptedID(SSNPrefix + clearTextSSN);
}
public static string GetDecryptedSSN(byte[] encryptedSSN)
{
return GetDecryptedID(encryptedSSN);
}
private static byte[] GetEncryptedID(string id)
{
ServiceProgram input = new ServiceProgram();
input.RequestText = Encodeto64(id);
input.RequestType = Encryption;
ProgramInterface inputRequest = new ProgramInterface();
inputRequest.Test__Request = input;
using (MY_Service operation = new MY_Service())
{
return ((operation.MY_Operation(inputRequest)).Test__Response.ResponseText);
}
}
private static string GetDecryptedID(byte[] id)
{
ServiceProgram input = new ServiceProgram();
input.RequestText = id;
input.RequestType = Decryption;
ProgramInterface request = new ProgramInterface();
request.Test__Request = input;
using (MY_Service operationD = new MY_Service())
{
ProgramInterface1 response = operationD.MY_Operation(request);
byte[] encodedBytes = Encoding.Convert(Encoding.GetEncoding(EncodingFormat), Encoding.ASCII,
response.Test__Response.ResponseText);
return System.Text.ASCIIEncoding.ASCII.GetString(encodedBytes);
}
}
private static byte[] Encodeto64(string toEncode)
{
byte[] dataInBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(toEncode);
Encoding encoding = Encoding.GetEncoding(EncodingFormat);
return Encoding.Convert(Encoding.ASCII, encoding, dataInBytes);
}
}
REFERENCE:
Getting incorrect decryption value using AesCryptoServiceProvider
This is the problem, I suspect:
string encodedEncryptedStringInASCII =
System.Text.ASCIIEncoding.ASCII.GetString(encryptedByteWithASCIIEncoding);
(It's not entirely clear because of all the messing around with encodings beforehand, which seems pointless to me, but...)
The result of encryption is not "text encoded in ASCII" - so you shouldn't try to treat it that way. (You haven't said what kind of encryption you're using, but it would be very odd for it to produce ASCII text.)
It's just an arbitrary byte array. In order to represent that in text only using the ASCII character set, the most common approach is to use base64. So the above code would become:
string encryptedText = Convert.ToBase64(encryptedByteWithIBMEncoding);
Then later, you'd convert it back to a byte array ready for decryption as:
encryptedByteWithIBMEncoding = Convert.FromBase64String(encryptedText);
I would strongly advise you to avoid messing around with the encodings like this if you can help it though. It's not clear why ASCII needs to get involved at all. If you really want to encode your original text as IBM037 before encryption, you should just use:
Encoding encoding = Encoding.GetEncoding("IBM037");
string unencryptedBinary = encoding.GetBytes(textInput);
Personally I'd usually use UTF-8, as an encoding which can handle any character data rather than just a limited subset, but that's up to you. I think you're making the whole thing much more complicated than it needs to be though.
A typical "encrypt a string, getting a string result" workflow is:
Convert input text to bytes using UTF-8. The result is a byte array.
Encrypt result of step 1. The result is a byte array.
Convert result of step 2 into base64. The result is a string.
To decrypt:
Convert the string from base64. The result is a byte array.
Decrypt the result of step 1. The result is a byte array.
Convert the result of step 2 back to a string using the same encoding as step 1 of the encryption process.
In DecryptionServiceHelper.GetEncryptedSSN you are encoding the text in IBM037 format BEFORE encrypting.
So the following piece of code is not correct as you are converting the encrypted bytes to ASCII assuming that its in the IBM037 format. That's wrong as the encrypted bytes is not in IBM037 format (the text was encoded before encryption)
//encryptedByteWithIBMEncoding --> Encrypted Byte ASCII
string EncodingFormat = "IBM037";
byte[] encryptedByteWithASCIIEncoding = Encoding.Convert(Encoding.GetEncoding(EncodingFormat), Encoding.ASCII,
encryptedByteWithIBMEncoding);
One possible solution is to encode the encrypted text using IBM037 format, that should fix the issue I guess.
I have a C++ code snippet that uses MultiByteToWideChar to convert UTF-8 string to UTF-16
For C++, if input is "Hôtel", the output is "Hôtel" which is correct
For C#, if input is "Hôtel", the output is "Hôtel" which is not correct.
The C# code to convert from UTF8 to UTF16 looks like
Encoding.Unicode.GetString(
Encoding.Convert(
Encoding.UTF8,
Encoding.Unicode,
Encoding.UTF8.GetBytes(utf8)));
In C++ the conversion code looks like
MultiByteToWideChar(
CP_UTF8, // convert from UTF-8
0, // default flags
utf8.data(), // source UTF-8 string
utf8.length(), // length (in chars) of source UTF-8 string
&utf16[0], // destination buffer
utf16.length() // size of destination buffer, in wchar_t's
)
I want to have the same results in C# that I am getting in C++. Is there anything wrong with the C# code ?
It appears you want to treat string characters as Windows-1252 (Often mislabeled as ANSI) code points, and have those code points decoded as UTF-8 bytes, where Windows-1252 code point == UTF-8 byte value.
The reason the accepted answer doesn't work is that it treats the string characters as unicode code points, rather than
Windows-1252. It can get away with most characters because Windows-1252 maps them exactly the same as unicode, but input with characters
like –, €, ™, ‘, ’, ”, • etc.. will fail because Windows-1252 maps those differently than unicode in this sense.
So what you want is simply this:
public static string doWeirdMapping(string arg)
{
Encoding w1252 = Encoding.GetEncoding(1252);
return Encoding.UTF8.GetString(w1252.GetBytes(arg));
}
Then:
Console.WriteLine(doWeirdMapping("Hôtel")); //prints Hôtel
Console.WriteLine(doWeirdMapping("HVOLSVÖLLUR")); //prints HVOLSVÖLLUR
Maybe this one:
private static string Utf8ToUnicode(string input)
{
return Encoding.UTF8.GetString(input.Select(item => (byte)item).ToArray());
}
Try This
string str = "abc!";
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(str);
byte[] utf8Bytes = Encoding.Convert( unicode,
utf8,
unicodeBytes );
Console.WriteLine( "UTF Bytes:" );
StringBuilder sb = new StringBuilder();
foreach( byte b in utf8Bytes ) {
sb.Append( b ).Append(" : ");
}
Console.WriteLine( sb.ToString() );
This Link would be helpful for you to understand about encodings and their conversions
Use System.Text.Encoding.UTF8.GetString().
Pass in your UTF-8 encoded text, as a byte array. The function returns a standard .net string which is encoded in UTF-16.
Sample function will be as below:
private string ReadData(Stream binary_file) {
System.Text.Encoding encoding = System.Text.Encoding.UTF8;
// Read string from binary file with UTF8 encoding
byte[] buffer = new byte[30];
binary_file.Read(buffer, 0, 30);
return encoding.GetString(buffer);
}
I have a string which contains binary data (non-text data).
How do I convert this to a raw byte array?
A string in C# - by definition - does not contain binary data. It consists of a sequence of Unicode characters.
If your string contains only Unicode characters in the ASCII (7-bit) character set, you can use Encoding.ASCII to convert the string to bytes:
byte[] result = Encoding.ASCII.GetBytes(input);
If you have a string that contains Unicode characters in the range u0000-u00ff and want to interpret these as bytes, you can cast the characters to bytes:
byte[] result = new byte[input.Length];
for (int i = 0; i < input.Length; i++)
{
result[i] = (byte)input[i];
}
It is a very bad idea to store binary data in a string. However, if you absolutely must do so, you can convert a binary string to a byte array using the 1252 code page. Do not use code page 0 or you will lose some values when in foreign languages. It just so happens that code page 1252 properly converts all byte values from 0 to 255 to Unicode and back again.
There have been some poorly written VB6 programs that use binary strings. Unfortunately some are so many lines of code that it is almost impossible to convert it all to byte() arrays in one go.
You've been warned. Use at your own peril:
Dim bData() As Byte
Dim sData As String
'convert string binary to byte array
bData = System.Text.Encoding.GetEncoding(1252).GetBytes(sData)
'convert byte array to string binary
sData = System.Text.Encoding.GetEncoding(1252).GetString(bData)
Here is one way:
public static byte[] StrToByteArray(string str)
{
System.Text.ASCIIEncoding encoding=new System.Text.ASCIIEncoding();
return encoding.GetBytes(str);
}
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] theBytes = encoding.GetBytes("Some String");
Note, there are other encoding formats you may want to use.
How can i convert an UTF-8 string into an ISO-8859-1 string?
Try:
System.Text.Encoding iso_8859_1 = System.Text.Encoding.GetEncoding("iso-8859-1");
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
// Unicode string.
string s_unicode = "abcéabc";
// Convert to ISO-8859-1 bytes.
byte[] isoBytes = iso_8859_1.GetBytes(s_unicode);
// Convert to UTF-8.
byte[] utf8Bytes = System.Text.Encoding.Convert(iso_8859_1, utf_8, isoBytes);
.NET Strings are all UTF-16 internally. There is no UTF-8 or ISO-8859-1 encoded System.String in .NET. To get the binary representation of the string in a particular encoding use System.Text.Encoding class:
byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes("hello world");