I have some Unicode text, say, "Hello, world! this paragraph has some unicode characters."
I want to convert this paragraph to a binary string, i.e. a string of binary digits, and afterwards convert that binary string back to the original Unicode string.
If you're simply looking for a way to encode a string into a byte[] and decode it again, rather than produce actual binary digits, then I would use System.Text.
The example from MSDN:
string unicodeString = "This string contains the unicode character Pi (\u03a0)";
// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte array.
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
Don't forget:
using System;
using System.Text;
Since there are several encodings for the Unicode character set, you have to pick: UTF-8, UTF-16, UTF-32, etc. Say you picked UTF-8. You have to use the same encoding going both ways.
To convert to a binary string:
String.Join(
String.Empty, // running them all together makes it tricky.
Encoding.UTF8
.GetBytes("Hello, world! this paragraph has some unicode characters.")
.Select(byt => Convert.ToString(byt, 2).PadLeft(8, '0'))) // must ensure 8 digits.
And back again:
Encoding.UTF8.GetString(
Regex.Split(
"010010000110010101101100011011000110111100101100001000000111011101101111011100100110110001100100001000010010000001110100011010000110100101110011001000000111000001100001011100100110000101100111011100100110000101110000011010000010000001101000011000010111001100100000011100110110111101101101011001010010000001110101011011100110100101100011011011110110010001100101001000000110001101101000011000010111001001100001011000110111010001100101011100100111001100101110"
,"(.{8})") // this is the consequence of running them all together.
.Where(binary => !String.IsNullOrEmpty(binary)) // keeps the matches; drops empty parts
.Select(binary => Convert.ToByte(binary, 2))
.ToArray())
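The same round trip can be sketched without a regular expression by reading the digits back eight at a time. This is only a minimal sketch, assuming UTF-8 in both directions; the class and method names (BinaryText, ToBinary, FromBinary) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;
using System.Text;

static class BinaryText
{
    // Encode the text as UTF-8 and render every byte as exactly 8 binary digits.
    public static string ToBinary(string text) =>
        string.Concat(Encoding.UTF8.GetBytes(text)
            .Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));

    // Read the digits back 8 at a time and decode the resulting bytes as UTF-8.
    public static string FromBinary(string bits)
    {
        if (bits.Length % 8 != 0)
            throw new ArgumentException("Length must be a multiple of 8.", nameof(bits));

        var bytes = new byte[bits.Length / 8];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(bits.Substring(i * 8, 8), 2);
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        string bits = ToBinary("Hello, world!");
        Console.WriteLine(bits);
        Console.WriteLine(FromBinary(bits));
    }
}
```

Because every byte is padded to a fixed width, no separator is needed between the groups of digits.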
Does anyone know how to encode the ISO-8859-2 charset in C#? The following example does not work:
String name = "Filipović";
String encoded = WebUtility.HtmlEncode(name);
The resulting string should be
"Filipovi&#263;"
Thanks
After reading your comments (you also need to support Chinese names using ASCII characters only), I think you shouldn't stick to the ISO-8859-2 encoding.
Solution 1
Use UTF-7 encoding for such names. UTF-7 is designed to use only ASCII characters for any Unicode string.
string value = "Filipović with Unicode symbol: 🏯";
var encoded = Encoding.ASCII.GetString(Encoding.UTF7.GetBytes(value));
Console.WriteLine(encoded); // Filipovi+AQc- with Unicode symbol: +2Dzf7w-
var decoded = Encoding.UTF7.GetString(Encoding.ASCII.GetBytes(encoded));
Solution 2
Alternatively, you can use Base64 encoding, but then even pure ASCII strings will no longer be human-readable.
string value = "Filipović with Unicode symbol: 🏯";
var encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
Console.WriteLine(encoded); // RmlsaXBvdmnEhyB3aXRoIFVuaWNvZGUgc3ltYm9sOiDwn4+v
var decoded = Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
Solution 3
If you really must stick to HTML entity encoding, you can achieve it like this:
string value = "Filipović with Unicode symbol: 🏯";
var result = new StringBuilder();
for (int i = 0; i < value.Length; i++)
{
if (Char.IsHighSurrogate(value[i]))
{
result.Append($"&#{Char.ConvertToUtf32(value[i], value[i + 1])};");
i++;
}
else if (value[i] > 127)
result.Append($"&#{(int)value[i]};");
else
result.Append(value[i]);
}
Console.WriteLine(result); // Filipovi&#263; with Unicode symbol: &#127983;
If you don't have a strict requirement for HTML encoding, I'd recommend URL (%) encoding, which encodes all non-ASCII characters:
String name = "Filipović";
String encoded = WebUtility.UrlEncode(name); // Filipovi%C4%87
If you must have a string in which all non-ASCII characters are HTML-encoded consistently, your best bet is to use the &#xNNNN; or &#NNNN; format to encode all characters above 127. Unfortunately there is no way to convince HtmlEncode to encode all characters, so you need to do it yourself, e.g. similarly to how it is done in "Convert a Unicode string to an escaped ASCII string". You can continue using HtmlDecode to read the values back, as it handles &#xNNNN; just fine.
A non-optimal sample:
var name = "Filipović";
var result = String.Join("",
name.Select(x => x <= 127 ? x.ToString() : String.Format("&#x{0:X4};", (int)x)) // entities need the trailing ';'
);
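To illustrate the round trip described above, here is a minimal sketch that writes every character above 127 as a hexadecimal numeric entity and reads it back with WebUtility.HtmlDecode. The class and method names (EntityEncoder, EncodeNonAscii) are hypothetical. The sketch assumes the input contains no '&', '<' or '>' that would need escaping as well; it only handles the non-ASCII part.

```csharp
using System;
using System.Net;
using System.Text;

static class EntityEncoder
{
    // Replace every character above 127 with a hexadecimal numeric entity.
    // Surrogate pairs are combined into a single code point first.
    public static string EncodeNonAscii(string text)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                sb.Append("&#x").Append(char.ConvertToUtf32(text, i).ToString("X")).Append(';');
                i++; // skip the low surrogate that was just consumed
            }
            else if (text[i] > 127)
            {
                sb.Append("&#x").Append(((int)text[i]).ToString("X4")).Append(';');
            }
            else
            {
                sb.Append(text[i]);
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        string encoded = EncodeNonAscii("Filipović");
        Console.WriteLine(encoded);                       // Filipovi&#x0107;
        Console.WriteLine(WebUtility.HtmlDecode(encoded)); // Filipović
    }
}
```

If the text can also contain HTML-significant ASCII characters, those would have to be escaped in the same pass for the decode step to be lossless.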
If I save this string to a text file:
Hello this \n is a test message
The \n character is saved as HEX [5C 6E] I would like to have it saved as [0A].
I believe this is an encoding issue?
I am using:
// 1252 is a variable in the application
Encoding codePage = Encoding.GetEncoding("1252");
Byte[] bytes = new UTF8Encoding(true).GetBytes("Hello this \\n is a test message");
Byte[] encodedBytes = Encoding.Convert(Encoding.UTF8, codePage , bytes);
All this is inside a FileStream scope and uses fs.Write to write the encodedBytes into the file.
I have tried to use \r\n but had the same result.
Any suggestions?
Thanks!
EDIT
The string is being read from a TSV file and placed into a string array. The string being read has the "\n" in it.
To read the string I use a StreamReader and split at \t.
At execution time, your string contains a backslash character followed by an n. They're encoded exactly as they should be. If you actually want a linefeed character, you shouldn't be escaping the backslash in your code:
Byte[] bytes = new UTF8Encoding(true).GetBytes("Hello this \n is a test message");
That string literal uses \n to represent U+000A, the linefeed character. At execution time, the string won't contain a backslash or an n - it will only contain the linefeed.
However, your code is already odd in that if you want to get the encoded form of a string, there's no reason to go via UTF-8:
byte[] encodedBytes = codePage.GetBytes("Hello this \n is a test message");
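When the text comes from a file, as described in the question's edit, it really does contain a backslash followed by an n, so no string literal is involved and the two characters have to be replaced explicitly before encoding. A minimal sketch, assuming only the \n sequence needs translating; the helper name UnescapeNewlines is made up, and Encoding.ASCII stands in for the code page encoding just to keep the example self-contained:

```csharp
using System;
using System.Text;

static class EscapeDemo
{
    // Turn the two-character sequence backslash + 'n' into a real linefeed (U+000A).
    public static string UnescapeNewlines(string raw) => raw.Replace("\\n", "\n");

    static void Main()
    {
        // Simulates text read from the file: a backslash and an 'n', not a linefeed.
        string fromFile = "a\\nb";

        byte[] bytes = Encoding.ASCII.GetBytes(UnescapeNewlines(fromFile));
        Console.WriteLine(BitConverter.ToString(bytes)); // 61-0A-62
    }
}
```

In the original code the result of the replacement would be passed to codePage.GetBytes instead.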
I want to convert an integer to a 3-character ASCII string. For example, if the integer is 123, then my ASCII string will also be "123". If the integer is 1, then my ASCII string will be "001". If the integer is 45, then my ASCII string will be "045". So far I've tried Convert.ToString but could not get the result. How?
int myInt = 52;
string myString = myInt.ToString("000");
myString is "052" now. Hope this helps.
Answer for the new question:
You're looking for String.PadLeft. Use it like myInteger.ToString().PadLeft(3, '0'). Or, simply use the "0" custom format specifier. Like myInteger.ToString("000").
Answer for the original question, returning strings like "0x31 0x32 0x33":
String.Join(" ", myInteger.ToString().PadLeft(3, '0').Select(x => String.Format("0x{0:X}", (int)x)))
Explanation:
The first ToString() converts your integer 123 into its string representation "123".
PadLeft(3,'0') pads the returned string out to three characters using a 0 as the padding character.
Strings are enumerable as a sequence of char, so .Select projects each character of the string.
For each character, format it as 0x followed by the value of the character.
Casting the char to int gives you the ASCII value. The cast is needed: the "X" format is not applied to a char, so without it the character itself would be printed.
The "X" format string converts a numeric value to hexadecimal.
String.Join(" ", ...) puts it all back together again with spaces in between.
It depends on if you actually want ASCII characters or if you want text. The below code will do both.
int value = 123;
// Convert value to text, adding leading zeroes.
string text = value.ToString().PadLeft(3, '0');
// Convert text to ASCII.
byte[] ascii = Encoding.ASCII.GetBytes(text);
Realise that .NET doesn't use ASCII for text manipulation. You can save ASCII to a file, but if you're using string objects, they're encoded in UTF-16.
While reading bytes from a file containing UTF-7 encoded characters, the first bracket '{' is supposed to be encoded to 123 or 007B, but that is not happening. All other characters are encoded correctly, but not '{'. The code I am using is given below.
StreamReader _HistoryLocation = new StreamReader("abc.txt");
String _ftpInformation = _HistoryLocation.ReadLine();
UTF7Encoding utf7 = new UTF7Encoding();
Byte[] encodedBytes = utf7.GetBytes(_ftpInformation);
What might be the problem?
As per RFC 2152, which you reference, '{' and similar characters may only optionally be encoded directly; they may instead be encoded in the modified Base64 form.
Notice that UTF7Encoding has an overloaded constructor with an allowOptionals flag that will directly encode the RFC2152 optional characters.
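The difference between the two constructors can be seen with a short sketch. The helper name EncodeToAscii is made up for illustration. Note that newer .NET versions flag UTF-7 as obsolete, so this may compile with a warning there.

```csharp
using System;
using System.Text;

static class Utf7Demo
{
    // Encode text as UTF-7 and show the resulting bytes as ASCII text.
    public static string EncodeToAscii(string text, bool allowOptionals)
    {
        var utf7 = new UTF7Encoding(allowOptionals);
        return Encoding.ASCII.GetString(utf7.GetBytes(text));
    }

    static void Main()
    {
        Console.WriteLine(EncodeToAscii("{", false)); // +AHs-
        Console.WriteLine(EncodeToAscii("{", true));  // {
    }
}
```

With allowOptionals set to true, the single byte produced for '{' is 123 (0x7B), which is what the question expected.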
I have a device to which I'm trying to connect via a socket, and according to the manual, I need the "STX character of hex 02".
How can I do this using C#?
Just a comment to GeoffM's answer (I don't have enough points to comment the proper way).
You should never embed STX (or other characters) that way using only two digits.
If the next character (after "\x02") was a valid hex digit, that would also be parsed and it would be a mess.
string s1 = "\x02End";
string s2 = "\x02" + "End";
string s3 = "\x0002End";
Here, s1 equals ".nd", since 2E is the dot character, while s2 and s3 equal STX + "End".
You can use a Unicode character escape: \u0002
Cast the Integer value of 2 to a char:
char cChar = (char)2;
\x02 is the STX code; you can check this in the ASCII table.
checkFinal = checkFinal.Replace("\x02", "End").ToString().Trim();
Within a string, clearly the Unicode format is best, but for use as a byte, this approach works:
byte chrSTX = 0x02; // Start of Text
byte chrETX = 0x03; // End of Text
// etc...
You can embed the STX within a string like so:
byte[] myBytes = System.Text.Encoding.ASCII.GetBytes("\x02Hello, world!");
socket.Send(myBytes);