While reading bytes from a file containing UTF-7 encoded characters, the opening brace '{' is supposed to be encoded to 123 (0x7B), but that is not happening. All other characters are encoded correctly, but not '{'. The code I am using is given below.
StreamReader _HistoryLocation = new StreamReader("abc.txt");
String _ftpInformation = _HistoryLocation.ReadLine();
UTF7Encoding utf7 = new UTF7Encoding();
Byte[] encodedBytes = utf7.GetBytes(_ftpInformation);
What might be the problem ?
As per RFC 2152, which you reference, '{' and similar characters are only optionally encoded directly; they may instead be encoded in UTF-7's modified base64 form.
Notice that UTF7Encoding has an overloaded constructor with an allowOptionals flag that makes it encode the RFC 2152 optional characters directly.
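A minimal sketch of the fix (allowOptionals: true tells the encoder to emit the RFC 2152 optional characters, '{' included, as their raw ASCII bytes):
UTF7Encoding utf7 = new UTF7Encoding(allowOptionals: true);
byte[] encodedBytes = utf7.GetBytes("{");
Console.WriteLine(encodedBytes[0]); // 123 (0x7B)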
I am converting hex to a UTF-8 string using the line below.
var obInstruction = Encoding.UTF8.GetString(ob.Bits);
In the result I got � between every character.
What is �?
So I added Replace to the line and changed it to
var obInstruction = Encoding.UTF8.GetString(ob.Bits).Replace("�", ""); but � won't go away.
When I try to replace other characters, Replace works fine, but not for �.
What is � and how can I remove it?
In Power Query, Text.Clean removes such strange characters, but I am not sure how to do this in C#.
Edit: with UTF-32 the result shows empty boxes between the characters instead of �.
This should do the job. (� is U+FFFD, the Unicode replacement character, which the decoder emits for byte sequences that are not valid in the source encoding.)
var input = new byte[5]; // stand-in for the actual input bytes
var encoding = Encoding.GetEncoding(Encoding.ASCII.EncodingName,
    new EncoderReplacementFallback(""),  // drop characters that cannot be encoded
    new DecoderReplacementFallback("")); // drop bytes that cannot be decoded
var converted = Encoding.Convert(Encoding.UTF8, encoding, input);
var output = encoding.GetString(converted);
This replaces all non-ASCII chars with an empty string, effectively removing them.
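Alternatively, if the characters really are U+FFFD, you can strip them by their escape sequence; a minimal sketch against the asker's ob.Bits buffer:
var raw = Encoding.UTF8.GetString(ob.Bits); // may contain U+FFFD (�)
var cleaned = raw.Replace("\uFFFD", string.Empty); // "\uFFFD" is the literal � character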
Anyone knows how to encode ISO-8859-2 charset in C#? The following example does not work:
String name = "Filipović";
String encoded = WebUtility.HtmlEncode(name);
The resulting string should be
"Filipović"
Thanks
After reading your comments (you need to support Chinese names too, using ASCII characters only), I think you shouldn't stick to the ISO-8859-2 encoding.
Solution 1
Use UTF-7 encoding for such names. UTF-7 is designed to represent any Unicode string using only ASCII characters.
string value = "Filipović with Unicode symbol: 🏯";
var encoded = Encoding.ASCII.GetString(Encoding.UTF7.GetBytes(value));
Console.WriteLine(encoded); // Filipovi+AQc- with Unicode symbol: +2Dzf7w-
var decoded = Encoding.UTF7.GetString(Encoding.ASCII.GetBytes(encoded));
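Console.WriteLine(decoded); // restores the original: Filipović with Unicode symbol: 🏯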
Solution 2
Alternatively, you can use Base64 encoding. In this case, however, even pure ASCII strings will no longer be human-readable.
string value = "Filipović with Unicode symbol: 🏯";
var encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
Console.WriteLine(encoded); // RmlsaXBvdmnEhyB3aXRoIFVuaWNvZGUgc3ltYm9sOiDwn4+v
var decoded = Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
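Console.WriteLine(decoded); // Filipović with Unicode symbol: 🏯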
Solution 3
If you really want to stick to HTML entity encoding, you can achieve it like this:
string value = "Filipović with Unicode symbol: 🏯";
var result = new StringBuilder();
for (int i = 0; i < value.Length; i++)
{
if (Char.IsHighSurrogate(value[i]))
{
result.Append($"&#{Char.ConvertToUtf32(value[i], value[i + 1])};");
i++;
}
else if (value[i] > 127)
result.Append($"&#{(int)value[i]};");
else
result.Append(value[i]);
}
Console.WriteLine(result); // Filipovi&#263; with Unicode symbol: &#127983;
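These entities can be read back with WebUtility.HtmlDecode from System.Net:
Console.WriteLine(WebUtility.HtmlDecode(result.ToString())); // Filipović with Unicode symbol: 🏯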
If you don't have a strict requirement on HTML encoding, I'd recommend using URL (%) encoding, which encodes all non-ASCII characters:
String name = "Filipović";
String encoded = WebUtility.UrlEncode(name); // Filipovi%C4%87
If you must have all non-ASCII characters in the string HTML-encoded consistently, your best bet is to use the &#xNNNN; or &#NNNN; format to encode all characters above 127. Unfortunately there is no way to convince HtmlEncode to encode all characters, so you need to do it yourself, i.e. similarly to how it is done in Convert a Unicode string to an escaped ASCII string. You can continue using HtmlDecode to read the values back, as it handles &#xNNNN; just fine.
A non-optimal sample (uses System.Linq):
var name = "Filipović";
var result = String.Join("",
    name.Select(x => x < 127 ? x.ToString() : String.Format("&#x{0:X4};", (int)x))
);
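And HtmlDecode restores the original, as mentioned above:
var decoded = WebUtility.HtmlDecode(result); // "Filipović"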
If I save this string to a text file:
Hello this \n is a test message
The \n character is saved as HEX [5C 6E] I would like to have it saved as [0A].
I believe this is an encoding issue?
I am using:
// 1252 is a variable in the application
Encoding codePage = Encoding.GetEncoding(1252);
Byte[] bytes = new UTF8Encoding(true).GetBytes("Hello this \\n is a test message");
Byte[] encodedBytes = Encoding.Convert(Encoding.UTF8, codePage , bytes);
All this is inside a FileStream scope and uses fs.Write to write the encodedBytes into the file.
I have tried to use \r\n but had the same result.
Any suggestions?
Thanks!
EDIT
The string is being read from a TSV file and placed into a string array. The string being read has the "\n" in it.
To read the string I use a StreamReader and split at \t.
At execution time, your string contains a backslash character followed by an n. They're encoded exactly as they should be. If you actually want a linefeed character, you shouldn't be escaping the backslash in your code:
Byte[] bytes = new UTF8Encoding(true).GetBytes("Hello this \n is a test message");
That string literal uses \n to represent U+000A, the linefeed character. At execution time, the string won't contain a backslash or an n - it will only contain the linefeed.
However, your code is already odd in that if you want to get the encoded form of a string, there's no reason to go via UTF-8:
byte[] encodedBytes = codePage.GetBytes("Hello this \n is a test message");
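Putting it together, a minimal sketch of the whole write (assuming Windows-1252 and a hypothetical output.txt path):
Encoding codePage = Encoding.GetEncoding(1252); // on .NET Core/.NET 5+, register CodePagesEncodingProvider first
byte[] encodedBytes = codePage.GetBytes("Hello this \n is a test message");
using (FileStream fs = new FileStream("output.txt", FileMode.Create))
{
    fs.Write(encodedBytes, 0, encodedBytes.Length); // the linefeed is written as the single byte 0A
}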
I have a Unicode text with some Unicode characters, say, "Hello, world! this paragraph has some unicode characters."
I want to convert this paragraph to a binary string, i.e. binary digits with datatype string, and after converting, I also want to convert that binary string back to a Unicode string.
If you're simply looking for a way to encode a string into byte[] and decode it back, rather than actual binary digits, then I would use System.Text.
The actual example from MSDN:
string unicodeString = "This string contains the unicode character Pi (\u03a0)";
// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte array.
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
Don't forget
using System;
using System.Text;
Since there are several encodings for the Unicode character set, you have to pick one: UTF-8, UTF-16, UTF-32, etc. Say you picked UTF-8; you have to use the same encoding going both ways.
To convert to a binary string (this uses LINQ, so add using System.Linq;):
String.Join(
    String.Empty, // running them all together makes it tricky.
    Encoding.UTF8
        .GetBytes("Hello, world! this paragraph has some unicode characters.")
        .Select(byt => Convert.ToString(byt, 2).PadLeft(8, '0'))) // must ensure 8 digits.
And back again (Regex comes from System.Text.RegularExpressions):
Encoding.UTF8.GetString(
    Regex.Split(
"010010000110010101101100011011000110111100101100001000000111011101101111011100100110110001100100001000010010000001110100011010000110100101110011001000000111000001100001011100100110000101100111011100100110000101110000011010000010000001101000011000010111001100100000011100110110111101101101011001010010000001110101011011100110100101100011011011110110010001100101001000000110001101101000011000010111001001100001011000110111010001100101011100100111001100101110"
,"(.{8})") // this is the consequence of running them all together.
.Where(binary => !String.IsNullOrEmpty(binary)) // keeps the matches; drops empty parts
.Select(binary => Convert.ToByte(binary, 2))
.ToArray())
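If you need this more than once, the same logic wraps naturally into two small helpers (a sketch; the class and method names are illustrative):
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class BinaryStringConverter
{
    // UTF-8 encode the text, then render each byte as 8 binary digits.
    public static string ToBinaryString(string text) =>
        String.Join(String.Empty,
            Encoding.UTF8.GetBytes(text)
                .Select(byt => Convert.ToString(byt, 2).PadLeft(8, '0')));

    // Split back into 8-digit groups and decode the bytes as UTF-8.
    public static string FromBinaryString(string bits) =>
        Encoding.UTF8.GetString(
            Regex.Split(bits, "(.{8})")
                .Where(chunk => !String.IsNullOrEmpty(chunk))
                .Select(chunk => Convert.ToByte(chunk, 2))
                .ToArray());
}
// Round trip: BinaryStringConverter.FromBinaryString(BinaryStringConverter.ToBinaryString("Hello")) == "Hello"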
I have a device to which I'm trying to connect via a socket, and according to the manual, I need the "STX character of hex 02".
How can I do this using C#?
Just a comment to GeoffM's answer (I don't have enough points to comment the proper way).
You should never embed STX (or other characters) that way using only two digits.
If the character following "\x02" were a valid hex digit, it would also be parsed as part of the escape, and it would be a mess.
string s1 = "\x02End";
string s2 = "\x02" + "End";
string s3 = "\x0002End";
Here, s1 equals ".nd", since the escape greedily consumes the E ("\x02E" is 0x2E, the dot character), while s2 and s3 equal STX + "End".
You can use a Unicode character escape: \u0002
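For example:
string msg = "\u0002End"; // exactly U+0002 followed by "End"; \u always takes four hex digits, so there is no ambiguity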
Cast the integer value 2 to a char:
char cChar = (char)2;
\x02 is the STX code; you can check the ASCII table.
checkFinal = checkFinal.Replace("\x02", "End").Trim();
Within a string, clearly the Unicode format is best, but for use as a byte, this approach works:
byte chrSTX = 0x02; // Start of Text
byte chrETX = 0x03; // End of Text
// etc...
You can embed the STX within a string like so:
byte[] myBytes = System.Text.Encoding.ASCII.GetBytes("\x02Hello, world!");
socket.Send(myBytes);