I have a string which contains binary data (non-text data).
How do I convert this to a raw byte array?
A string in C# - by definition - does not contain binary data. It consists of a sequence of Unicode characters.
If your string contains only Unicode characters in the ASCII (7-bit) character set, you can use Encoding.ASCII to convert the string to bytes:
byte[] result = Encoding.ASCII.GetBytes(input);
If you have a string that contains only Unicode characters in the range U+0000 to U+00FF and want to interpret them as bytes, you can cast each character to a byte:
byte[] result = new byte[input.Length];
for (int i = 0; i < input.Length; i++)
{
result[i] = (byte)input[i];
}
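The same mapping can also be written with the ISO-8859-1 (Latin-1) encoding, which maps U+0000 to U+00FF onto the byte values 0 to 255 one-to-one. This is a minimal sketch, assuming the encoding is available as code page 28591; the class and method names are made up for illustration. One difference from the cast: for characters above U+00FF the cast keeps only the low 8 bits, whereas the encoding substitutes a fallback character.

```csharp
using System.Text;

public static class Latin1Bytes
{
    // ISO-8859-1: code points U+0000..U+00FF map to bytes 0x00..0xFF unchanged.
    private static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    public static byte[] ToBytes(string input)
    {
        // Characters above U+00FF cannot be represented; the encoder
        // substitutes a fallback character for them (usually '?').
        return Latin1.GetBytes(input);
    }

    public static string FromBytes(byte[] data)
    {
        return Latin1.GetString(data);
    }
}
```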
Storing binary data in a string is a very bad idea. However, if you absolutely must, you can convert a binary string to a byte array using code page 1252. Do not use code page 0 (the system default), or you will lose some values on machines configured for other languages. It just so happens that code page 1252 properly converts all byte values from 0 to 255 to Unicode and back again.
Some poorly written VB6 programs use binary strings. Unfortunately, some of them are so large that it is almost impossible to convert everything to Byte() arrays in one go.
You've been warned. Use at your own peril:
Dim bData() As Byte
Dim sData As String
'convert string binary to byte array
bData = System.Text.Encoding.GetEncoding(1252).GetBytes(sData)
'convert byte array to string binary
sData = System.Text.Encoding.GetEncoding(1252).GetString(bData)
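Whether a given encoding really preserves every byte value can be checked rather than assumed. The following C# sketch (the names are hypothetical) pushes all 256 byte values through an encoding and back. Note that on .NET Core and .NET 5+, code page 1252 is, as far as I know, only available after registering CodePagesEncodingProvider, so the usage note below uses ISO-8859-1 (code page 28591) instead.

```csharp
using System.Linq;
using System.Text;

public static class RoundTripCheck
{
    // Returns true if every byte value 0..255 survives bytes -> string -> bytes.
    public static bool PreservesAllBytes(Encoding encoding)
    {
        byte[] all = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
        string asText = encoding.GetString(all);
        byte[] back = encoding.GetBytes(asText);
        return all.SequenceEqual(back);
    }
}
```

RoundTripCheck.PreservesAllBytes(Encoding.GetEncoding(28591)) returns true, while Encoding.ASCII and Encoding.UTF8 return false because bytes that are not valid in those encodings are replaced during decoding.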
Here is one way:
public static byte[] StrToByteArray(string str)
{
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
return encoding.GetBytes(str);
}
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] theBytes = encoding.GetBytes("Some String");
Note, there are other encoding formats you may want to use.
I want to convert a string to a byte array in C# in order to decrypt some data.
When I convert the byte array back to a string, it shows question marks (?).
Example code:
byte[] strTemp = Encoding.ASCII.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.ASCII.GetString(strTemp));
The string is "Ê<,,l"x¡" (including the double quotation mark), and the result of converting it back to a string is: ???l?x?
I hope this helps you:
To get a byte array from a string:
private byte[] StringToByteArray(string str)
{
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
return enc.GetBytes(str);
}
To get a string back from a byte array:
private string ByteArrayToString(byte[] arr)
{
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
return enc.GetString(arr);
}
For that specific input, the BigEndianUnicode encoding seems to work fine:
byte[] strTemp = Encoding.BigEndianUnicode.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.BigEndianUnicode.GetString(strTemp));
You are getting a byte array for the ASCII representation of your string, but your string is Unicode.
C# strings are Unicode, which can represent far more symbols than ASCII.
In your example, every symbol that has no ASCII representation is replaced by '?', which is why only 'l' and 'x' appear in the output.
The proper way to do it is to use a Unicode encoding instead:
byte[] strTemp = Encoding.UTF8.GetBytes(strData);
MessageBox.Show(strData);
MessageBox.Show(Encoding.UTF8.GetString(strTemp));
Basically, any Unicode encoding can be used: UTF8, UTF32, Unicode, BigEndianUnicode (https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings).
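The difference is easy to see by round-tripping the same text through both encodings. A small sketch; the helper name is hypothetical:

```csharp
using System.Text;

public static class EncodingRoundTrip
{
    // Encodes text with the given encoding and decodes it again.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

With the input "\u00CAl\u00A1x" (Ê, l, ¡, x), RoundTrip(..., Encoding.ASCII) returns "?l?x", while RoundTrip(..., Encoding.UTF8) returns the original string unchanged.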
So, I have a string that is actually UTF encoded characters with the ASCII representation codes stripped out:
"537465616d6c696e6564"
This would be represented in ASCII encoded UTF as \x53\x74\x65 [...]
I've tried to Regexp replace in \x at the correct positions, byte encoding it and reading it back as UTF, but to no avail.
What's the most effective way of turning the ASCII string into readable UTF in C#?
As I understand it, you have a string "537465616d6c696e6564" that actually means char[] chars = { '\x53', '\x74', ... }.
First, convert this string to an array of bytes; see "How can I convert a hex string to a byte array?".
For your convenience:
public static byte[] StringToByteArray(string hex) {
return Enumerable.Range(0, hex.Length)
.Where(x => x % 2 == 0)
.Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
.ToArray();
}
There are several UTF encodings (UTF-8, UTF-16, and so on). C# strings are UTF-16 internally, so one option is to decode the bytes as UTF-16:
string str = System.Text.Encoding.Unicode.GetString(array);
If you get incorrect characters after decoding, try Encoding.UTF8 instead. For the sample input that is the right choice: each byte (0x53, 0x74, 0x65, ...) is a single ASCII character, and the bytes decode to "Streamlined".
I don't know much about string encodings, but assuming that your original string is the hex representation of a series of bytes, you could do something like this:
using System;
using System.Text;

class Program
{
    private const string encoded = "537465616d6c696e6564";

    static void Main(string[] args)
    {
        // Hex digits -> raw bytes -> text (every byte here is plain ASCII)
        byte[] bytes = StringToByteArray(encoded);
        string text = Encoding.ASCII.GetString(bytes);
        Console.WriteLine(text);
        Console.ReadKey();
    }

    // From https://stackoverflow.com/questions/311165/how-do-you-convert-byte-array-to-hexadecimal-string-and-vice-versa
    public static byte[] StringToByteArray(String hex)
    {
        // Each pair of hex digits becomes one byte
        int NumberChars = hex.Length;
        byte[] bytes = new byte[NumberChars / 2];
        for (int i = 0; i < NumberChars; i += 2)
            bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
        return bytes;
    }
}
If you later wanted to encode the result as UTF8, you could then use:
Encoding.UTF8.GetBytes(text);
I've taken one implementation of the StringToByteArray conversion but there are many. If performance is important, you may want to choose a more efficient one. See the links below for more info.
On byte to string conversion (some interesting discussions on performance):
How do you convert Byte Array to Hexadecimal String, and vice versa?
How can I convert a hex string to a byte array?
On strings in .NET
Determine a string's encoding in C#
http://csharpindepth.com/Articles/General/Strings.aspx
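On newer runtimes the hex-to-bytes step does not need a hand-written helper. A minimal sketch, assuming .NET 5 or later (where Convert.FromHexString is available) and assuming the bytes are UTF-8 text; the class and method names are made up for illustration:

```csharp
using System;
using System.Text;

public static class HexText
{
    // Requires .NET 5 or later for Convert.FromHexString.
    public static string DecodeHexAsUtf8(string hex)
    {
        // Accepts upper- and lower-case hex digits; throws FormatException
        // if the input has an odd length or contains non-hex characters.
        byte[] bytes = Convert.FromHexString(hex);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

HexText.DecodeHexAsUtf8("537465616d6c696e6564") returns "Streamlined".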
I am attempting to create a hash for an API.
My input is something like this:
FBN|Web|3QTC0001|RS1|260214133217|000000131127897656
And my expected output is like :
17361DU87HT56F0O9967E34FDFFDFG7UO334665324308667FDGJKD66F9888766DFKKJJR466634HH6566734JHJH34766734NMBBN463499876554234343432456
I tried the code below, but I keep getting
"Specified value has invalid Control characters. Parameter name: value"
I am actually doing this in a REST service.
public static string GetHash(string text)
{
string hash = "";
SHA512 alg = SHA512.Create();
byte[] result = alg.ComputeHash(Encoding.UTF8.GetBytes(text));
hash = Encoding.UTF8.GetString(result);
return hash;
}
What am I missing?
The problem is Encoding.UTF8.GetString(result). The data in result is not valid UTF-8 (it's just binary goo!), so converting it to text is invalid, both in general and for this input specifically, and the resulting string is what causes the exception.
Instead, convert the byte[] to the hex representation of the byte sequence; don't treat it as UTF-8 encoded text.
See the questions How do you convert Byte Array to Hexadecimal String, and vice versa? and How can I convert a hex string to a byte array?, which discuss several different methods of achieving this task.
To make this work, you need to convert the individual bytes into their hex representation:
var builder = new StringBuilder();
foreach(var b in result) {
builder.AppendFormat("{0:X2}", b);
}
return builder.ToString();
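Put together with the hashing step from the question, a complete version might look like the sketch below. The class and method names are hypothetical, and upper-case hex is an assumption; check which case the API expects.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class HashHex
{
    // Returns the SHA-512 digest of the UTF-8 bytes of text as upper-case hex.
    public static string GetHashHex(string text)
    {
        using (SHA512 alg = SHA512.Create())
        {
            byte[] digest = alg.ComputeHash(Encoding.UTF8.GetBytes(text));
            var builder = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
            {
                // "X2" formats one byte as exactly two hex digits.
                builder.Append(b.ToString("X2"));
            }
            return builder.ToString();
        }
    }
}
```

The result is always 128 characters long, because SHA-512 produces 64 bytes and each byte becomes two hex digits.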
You might want to consider using Base64 encoding:
public static string GetHash(string text)
{
SHA512 alg = SHA512.Create();
byte[] result = alg.ComputeHash(Encoding.UTF8.GetBytes(text));
return Convert.ToBase64String(result);
}
For your example string, the result is
OJgzW5JdC1IMdVfC0dH98J8tIIlbUgkNtZLmOZsjg9H0wRmwd02tT0Bh/uTOw/Zs+sgaImQD3hh0MlzVbqWXZg==
It has the advantage of being more compact than encoding each byte as two hexadecimal characters: three bytes take four characters with Base64 but six characters in hex.
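The size difference can be worked out directly. A small sketch with hypothetical helper names:

```csharp
using System;

public static class EncodedLength
{
    // Hex uses two characters per byte.
    public static int HexLength(int byteCount) => byteCount * 2;

    // Base64 uses four characters per group of three bytes;
    // a final partial group is padded with '=' to four characters.
    public static int Base64Length(int byteCount) => ((byteCount + 2) / 3) * 4;
}
```

For a 64-byte SHA-512 digest this gives 128 characters in hex and 88 characters in Base64.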
I have a C++ code snippet that uses MultiByteToWideChar to convert a UTF-8 string to UTF-16.
For C++, if the input is "HÃ´tel", the output is "Hôtel", which is correct.
For C#, if the input is "HÃ´tel", the output is "HÃ´tel", which is not correct.
The C# code to convert from UTF8 to UTF16 looks like
Encoding.Unicode.GetString(
Encoding.Convert(
Encoding.UTF8,
Encoding.Unicode,
Encoding.UTF8.GetBytes(utf8)));
In C++ the conversion code looks like
MultiByteToWideChar(
CP_UTF8, // convert from UTF-8
0, // default flags
utf8.data(), // source UTF-8 string
utf8.length(), // length (in chars) of source UTF-8 string
&utf16[0], // destination buffer
utf16.length() // size of destination buffer, in wchar_t's
)
I want to get the same results in C# that I am getting in C++. Is there anything wrong with the C# code?
It appears you want to treat the string's characters as Windows-1252 (often mislabeled as ANSI) code points and have those code points decoded as UTF-8 bytes, where Windows-1252 code point == UTF-8 byte value.
The reason the accepted answer doesn't work is that it treats the string's characters as Unicode code points rather than Windows-1252. It gets away with this for most characters because Windows-1252 maps them the same way Unicode does, but input containing characters such as –, €, ™, ‘, ’, ”, or • will fail because Windows-1252 maps those differently.
So what you want is simply this:
public static string doWeirdMapping(string arg)
{
Encoding w1252 = Encoding.GetEncoding(1252);
return Encoding.UTF8.GetString(w1252.GetBytes(arg));
}
Then:
Console.WriteLine(doWeirdMapping("HÃ´tel")); //prints Hôtel
Console.WriteLine(doWeirdMapping("HVOLSVÃ–LLUR")); //prints HVOLSVÖLLUR
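To see where such garbled input comes from, it helps to produce it deliberately: encode the text as UTF-8, then misread those bytes as Windows-1252. The sketch below does that and then reverses it in the same way as doWeirdMapping. The names are hypothetical, and it assumes .NET Core 3.0+ or .NET 5+, where code page 1252 has to be registered first; on .NET Framework the RegisterProvider line should not be needed.

```csharp
using System.Text;

public static class MojibakeDemo
{
    private static Encoding GetWindows1252()
    {
        // Makes code page encodings such as 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Simulates the damage: UTF-8 bytes misread as Windows-1252 text.
    public static string Break(string text) =>
        GetWindows1252().GetString(Encoding.UTF8.GetBytes(text));

    // Reverses it: Windows-1252 text back to bytes, then decoded as UTF-8.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(GetWindows1252().GetBytes(garbled));
}
```

The letter Ö is the interesting case: its UTF-8 bytes are 0xC3 0x96, and 0x96 is the en dash (U+2013) in Windows-1252, which is why a plain cast of each char to a byte cannot repair it.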
Maybe this one:
private static string Utf8ToUnicode(string input)
{
return Encoding.UTF8.GetString(input.Select(item => (byte)item).ToArray());
}
Try this:
string str = "abc!";
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
byte[] unicodeBytes = unicode.GetBytes(str);
byte[] utf8Bytes = Encoding.Convert( unicode,
utf8,
unicodeBytes );
Console.WriteLine( "UTF Bytes:" );
StringBuilder sb = new StringBuilder();
foreach( byte b in utf8Bytes ) {
sb.Append( b ).Append(" : ");
}
Console.WriteLine( sb.ToString() );
This Link would be helpful for you to understand about encodings and their conversions
Use System.Text.Encoding.UTF8.GetString().
Pass in your UTF-8 encoded text as a byte array. The function returns a standard .NET string, which is UTF-16 internally.
A sample function:
private string ReadData(Stream binary_file) {
    System.Text.Encoding encoding = System.Text.Encoding.UTF8;
    // Read up to 30 bytes from the binary file
    byte[] buffer = new byte[30];
    int bytesRead = binary_file.Read(buffer, 0, buffer.Length);
    // Decode only the bytes actually read, not the unused tail of the buffer
    return encoding.GetString(buffer, 0, bytesRead);
}
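A fixed-size byte buffer can end in the middle of a multi-byte UTF-8 character. If that matters, one alternative is to let a StreamReader do the decoding and count characters instead of bytes. A minimal sketch; the class name, method name, and buffer size are assumptions:

```csharp
using System.IO;
using System.Text;

public static class Utf8Reader
{
    // Reads up to maxChars characters of UTF-8 text from the stream.
    public static string ReadText(Stream stream, int maxChars)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            char[] buffer = new char[maxChars];
            // ReadBlock keeps reading until maxChars are read or the stream ends.
            int charsRead = reader.ReadBlock(buffer, 0, maxChars);
            return new string(buffer, 0, charsRead);
        }
    }
}
```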
I want to compare the first few bytes in a byte[] with a string. How can I do this?
You must know the encoding of the byte array to properly compare them.
For example, if you know your byte array is made of UTF-8 bytes, then you can create a string from the byte array:
System.Text.UTF8Encoding enc = new System.Text.UTF8Encoding();
string s = enc.GetString(originalBytes);
Now you can compare string s to your other string.
Conversely, if you want to compare just the first few bytes, you can convert the string into a UTF8 byte array:
System.Text.UTF8Encoding enc = new System.Text.UTF8Encoding();
byte[] b = enc.GetBytes(originalString);
Now you can compare byte array b to your other byte array.
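The byte-level comparison can be sketched as follows, assuming the data is UTF-8; the class and method names are hypothetical:

```csharp
using System.Linq;
using System.Text;

public static class BytePrefix
{
    // True if data begins with the UTF-8 encoding of prefix.
    public static bool StartsWith(byte[] data, string prefix)
    {
        byte[] expected = Encoding.UTF8.GetBytes(prefix);
        return data.Length >= expected.Length
            && data.Take(expected.Length).SequenceEqual(expected);
    }
}
```

For example, BytePrefix.StartsWith(Encoding.UTF8.GetBytes("helloworld"), "hello") returns true, and the same call with "world" returns false.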
There are several other encoding objects for ASCII, Unicode, etc. See the MSDN page here.
Use:
byte[] fromString = Encoding.Default.GetBytes("helloworld");