Convert decimal NCR string to bytes using C#

Convert decimal NCR string to bytes using C# - c#

I need to allow emojis the be sent in the subject of emails from my application. The user can save the emoji from a web form. When copying and pasting the emoji on the web form, the emoji is saved in a decimal NCR format by the browser.
With the emoji saved in a decimal NCR format, it does not display correctly in the subject line. It is just the decimal NCR string. If I can convert this string to the UTF-8 byte representation, it works just fine.
How can I convert from the decimal NCR format to a UTF-8 byte array? Here is some code that I have been using to test this. I used the Unicode Code Converter to get the values for the croissant emoji.
[Fact]
public void ConvertsToUnicode() {
// arrange
var decimalNcr = "🥐";
var bytes = new byte[] {0xF0, 0x9F, 0xA5, 0x90};
var emoji = "🥐";
// act
var emojiTest = Encoding.UTF8.GetString(bytes);
var encoded = Encoding.UTF8.GetBytes(emoji);
// assert
emojiTest.Should()
.Be(emoji);
encoded.Should()
.BeEquivalentTo(bytes);
}

I guess you could use
WebUtility.HtmlDecode
Converts a string that has been HTML-encoded for HTTP transmission
into a decoded string.
or
HttpUtility.HtmlDecode
Converts a string that has been HTML-encoded for HTTP transmission
into a decoded string.
Example
Console.WriteLine(WebUtility.HtmlDecode("🥐"));
Online Demo

Related

How to convert unicode to utf-8 encoding in c#

I want to convert unicode string to UTF8 string. I want to use this UTF8 string in SMS API to send unicode SMS.
I want conversion like this tool
https://cafewebmaster.com/online_tools/utf8_encode
eg. I have unicode string "हैलो फ़्रेंड्स" and it should be converted into "à¤¹à¥à¤²à¥ à¥à¥à¤°à¥à¤à¤¡à¥à¤¸"
I have tried this but not getting expected output
private string UnicodeToUTF8(string strFrom)
{
byte[] bytes = Encoding.Default.GetBytes(strFrom);
return Encoding.UTF8.GetString(bytes);
}
and calling function like this
string myUTF8String = UnicodeToUTF8("हैलो फ़्रेंड्स");

I don't think this is possible to answer concretely without knowing more about the SMS API you want to use. The string type in C# is UTF-16. If you want a different encoding, it's given to you as a byte[] (because a string is UTF-16, always).
You could 'cast' that into a string by doing something like this:
static string UnicodeToUTF8(string from) {
var bytes = Encoding.UTF8.GetBytes(from);
return new string(bytes.Select(b => (char)b).ToArray());
}
As far as I can tell this yields the same output as the website you linked. However, without knowing what API you're handing this string off to, I can't guarantee that this will ultimately work.
The point of string is that we don't need to worry about its underlying encoding, but this casting operation is kind of a giant hack and makes no guarantees that string represents a well-formed string anymore.
If something expects a UTF-8 encoding, it should accept a byte[], not a string.

Try this:
string output = "hello world";
byte[] bytes1 = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, Encoding.Unicode.GetBytes(output));
byte[] bytes2 = Encoding.Convert(Encoding.Unicode, Encoding.Unicode, Encoding.Unicode.GetBytes(output));
var output1 = Encoding.UTF8.GetString(bytes1);
var output2 = Encoding.Unicode.GetString(bytes2);
You will see that bytes1 is 11 bytes (1 byte per char UTF-8) and bytes2 is 22 bytes (2 bytes per char for unicode)

Converting unknown characters to Greek characters

I have a file which contains the following characters:
ÇËÅÊÔÑÏÖÏÑÇÓÇ ÁÉÌÏÓÖÁÉÑÉÍÇÓ
I am trying to convert that to Greek words and the result should be:
ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΑΙΜΟΣΦΑΙΡΙΝΗΣ
The file that the above value is stored in in Unicode format.
I am applying all possible encodings but no luck in the conversion.
private void Convert()
{
string textFilePhysicalPath = (#"C:\Users\Nec\Desktop\a.txt");
string contents = File.ReadAllText(textFilePhysicalPath);
List<string> sLines = new List<string>();
// For every encoding, get the property values.
EncodingInfo ei;
foreach (var ei in Encoding.GetEncodings())
{
Encoding e = ei.GetEncoding();
Encoding iso = Encoding.GetEncoding(ei.Name);
Encoding utfx = Encoding.Unicode;
byte[] utfBytes = utfx.GetBytes(contents);
byte[] isoBytes = Encoding.Convert(utfx, iso, utfBytes);
string msg = iso.GetString(isoBytes);
string xx = (ei.Name + " " + msg);
sLines.Add(xx);
}
using (StreamWriter file = new StreamWriter(#"C:\Users\Nec\Desktop\result.txt"))
{
foreach (var line in sLines)
file.WriteLine(line);
}
}
A website that converts it correctly is http://www.online-decoder.com/el
but even when I use the ISO-8859-1 to ISO-8859-7 it still doesn't work in .NET.

This code converts the string from the C# which is UTF-16 to an 8-bit representation using the common ISO-8859-1 codepage. Then it converts it back to UTF-16 using the greek codepage windows-1253. The result is ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΑΙΜΟΣΦΑΙΡΙΝΗΣ as you want.
string errorneousString = "ÇËÅÊÔÑÏÖÏÑÇÓÇ ÁÉÌÏÓÖÁÉÑÉÍÇÓ";
byte[] asIso88591Bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(errorneousString);
string asGreekString = Encoding.GetEncoding("windows-1253").GetString(asIso88591Bytes);
Console.OutputEncoding = System.Text.Encoding.UTF8;
Console.WriteLine(asGreekString);
Edit: Since your file is encoded in an 8-bit format, you need to specify the codepage when reading it. Use this:
string fileContents = File.ReadAllText("189.dat", Encoding.GetEncoding("windows-1253"));
Console.OutputEncoding = System.Text.Encoding.UTF8;
Console.WriteLine(fileContents);
That reads the content as
'CS','C.S.F. EXAMINATION','ΕΞΕΤΑΣΗ Ε.Ν.Υ.' 'EH','Hb
ELECTROPHORESIS','ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΑΙΜΟΣΦΑΙΡΙΝΗΣ' 'EP','PROTEIN
ELECTROPHORESIS','ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΠΡΩΤΕΙΝΩΝ' 'FB','HAEMATOLOGY -
FBC','ΓΕΝΙΚΗ ΕΞΕΤΑΣΗ ΑΙΜΑΤΟΣ - FBC' 'FR','FREE TEXT', 'GT','GLUCOSE
TOLERANCE TEST','ΔΟΚΙΜΑΣΙΑ ΑΝΟΧΗΣ ΓΛΥΚΟΖΗΣ'
'MI','MICROBIOLOGY','ΜΙΚΡΟΒΙΟΛΟΓΙΑ' 'NO','NORMAL FORM','ΚΑΝΟΝΙΚΟ
ΔΕΛΤΙΟ' 'RE','RENAL CALCULUS','ΧΗΜΙΚΗ ΑΝΑΛΥΣΗ ΟΥΡΟΛΙΘΟΥ' 'SE','SEMEN
ANALYSIS','ΣΠΕΡΜΟΔΙΑΓΡΑΜΜΑ' 'SP','SPECIAL PATHOLOGY','SPECIAL
PATHOLOGY' 'ST','STOOL EXAMINATION
','ΕΞΕΤΑΣΗ ΚΟΠΡΑΝΩΝ' 'SW','SEMEN WASH','SEMEN WASH'
'TH','THROMBOPHILIA PANEL','THROMBOPHILIA PANEL' 'UR','URINE
ANALYSIS','ΓΕΝΙΚΗ ΕΞΕΤΑΣΗ ΟΥΡΩΝ' 'WA','WATER CULTURE REPORT','ΑΝΑΛΥΣΗ
ΝΕΡΟΥ' 'WI','WIDAL ','ΑΝΟΣΟΒΙΟΛΟΓΙΑ'

This is an ASCII file stored using the Greek (1253) codepage which was read using a different codepage.
File.ReadAllText tries to detect whether the file is UTF16 or UTF8 by checking the BOM bytes and falls back to UTF8 by default. UTF8 is essentially the 7-bit ANSI codepage for single-byte text, which means that trying to read a nonUnicode, nonANSI file like this will result in garbled text.
To load a file using a specific encoding/codepage, just pass the encoding as the Encoding parametter, eg :
var enc = Encoding.GetEncoding(1253);
var text=File.ReadAllText(#"189.dat",enc);
Strings in .NET are Unicode, specifically UTF16. This means that text doesn't need any conversions. Its contents will be :
'CS','C.S.F. EXAMINATION','ΕΞΕΤΑΣΗ Ε.Ν.Υ.'
'EH','Hb ELECTROPHORESIS','ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΑΙΜΟΣΦΑΙΡΙΝΗΣ'
'EP','PROTEIN ELECTROPHORESIS','ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΠΡΩΤΕΙΝΩΝ'
'FB','HAEMATOLOGY - FBC','ΓΕΝΙΚΗ ΕΞΕΤΑΣΗ ΑΙΜΑΤΟΣ - FBC'
'FR','FREE TEXT',
'GT','GLUCOSE TOLERANCE TEST','ΔΟΚΙΜΑΣΙΑ ΑΝΟΧΗΣ ΓΛΥΚΟΖΗΣ'
'MI','MICROBIOLOGY','ΜΙΚΡΟΒΙΟΛΟΓΙΑ'
'NO','NORMAL FORM','ΚΑΝΟΝΙΚΟ ΔΕΛΤΙΟ'
'RE','RENAL CALCULUS','ΧΗΜΙΚΗ ΑΝΑΛΥΣΗ ΟΥΡΟΛΙΘΟΥ'
'SE','SEMEN ANALYSIS','ΣΠΕΡΜΟΔΙΑΓΡΑΜΜΑ'
'SP','SPECIAL PATHOLOGY','SPECIAL PATHOLOGY'
'ST','STOOL EXAMINATION ','ΕΞΕΤΑΣΗ ΚΟΠΡΑΝΩΝ'
'SW','SEMEN WASH','SEMEN WASH'
'TH','THROMBOPHILIA PANEL','THROMBOPHILIA PANEL'
'UR','URINE ANALYSIS','ΓΕΝΙΚΗ ΕΞΕΤΑΣΗ ΟΥΡΩΝ'
'WA','WATER CULTURE REPORT','ΑΝΑΛΥΣΗ ΝΕΡΟΥ'
'WI','WIDAL ','ΑΝΟΣΟΒΙΟΛΟΓΙΑ'
UTF16 uses two bytes for every character. If a UTF16 file was opened in a hex browser, every other character would be a NUL (0x00). It's not UTF8 either - outside the 7-bit ANSI range each character uses two or more bytes that always have the high bit set. Instead of one garbled character there would be two at least.
File and stream methods that could be affected by encoding or culture in .NET always have an overload that accepts an Encoding or CultureInfo parameter.
Console
Writing the output to the Console may display in garbled text. The text isn't really converted, just displayed the wrong way.
While the console can display Unicode text it assumes that the system's codepage is used by default. In the past it couldn't even support UTF8 as a codepage - there was no such option in the settings. After all, the label for the system locale settings is Language used for non-Unicode programs.
The latest Windows 10 Insider releases offer UTF8 as the system codepage as a beta option.
To ensure Unicode text appears properly in the console one would have to set its encoding to UTF8, eg :
var text=File.ReadAllText(#"189.dat",enc);
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine(text);

I don't know what codepage this is, but it seems to be simply offset by some values. You can convert the source string to the target string by adding 11 to the first byte and 16 to the second byte:
var input = Encoding.Default.GetBytes("ÇËÅÊÔÑÏÖÏÑÇÓÇ ÁÉÌÏÓÖÁÉÑÉÍÇÓ");
for (var i = 0; i < input.Length; i++)
{
if (input[i] == 32) continue;
input[i++] += 11;
input[i] += 16;
}
var output = Encoding.UTF8.GetString(input);
Result: ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΑΙΜΟΣΦΑΙΡΙΝΗΣ
Not sure if this is a solution, but it may give you a hint.

Just it c#
HtmlWeb web = new HtmlWeb();
web.OverrideEncoding = Encoding.GetEncoding(65001);

Encode numeric string as alphanumeric in ZXing.Net

I'm building an app that requires encoding of a 16-digit all-numerical string, using ZXing.Net, if I do the following:
ZXing.Common.EncodingOptions qr_options = new ZXing.Common.EncodingOptions();
qr_options.Width = 144;
qr_options.Height = 144;
qr_options.Hints.Add(ZXing.EncodeHintType.DISABLE_ECI, true);
IBarcodeWriter qr_wr = new BarcodeWriter() { Format = BarcodeFormat.QR_CODE, Options = qr_options};
OutputImg = (WriteableBitmap)qr_wr.Write(MyAllNumberString).ToBitmap();
I'm getting a QR code that's encoded as numerics, which generates a different raw byte array. Is there a setting somewhere that I need to set, so that the numeric string is encoded as alphanumeric string?

I ended up building ZXing.net from source, and modified the source code. There is a function inside ZXing.net that checks your input string to be encoded, if that string contains only numeric characters, it will set the encoding mode of QR code to NUMERICAL. I added an option to force it to be encoded as ALPHANUMERICAL. I'll post a link to changes I made later. I have raised this issue to ZXing.Net's owner but they seem lukewarm to the solution I proposed.

Encoding not converting

An ASP.NET page (ashx) receives a GET request with a UTF8 string. It reads a SqlServer database with Windows-1255 data.
I can't seem to get them to work together. I've used information gathered on SO (mainly Convert a string's character encoding from windows-1252 to utf-8) as well as msdn on the subject.
When I run anything through the functions below - it always ends up the same as it started - not converted at all.
Is something done wrong?
EDIT
What I'm specifically trying to do (getData returns a Dictionary<int, string>):
getData().Where(a => a.Value.Contains(context.Request.QueryString["q"]))
Result is empty, unless I send a "neutral" character such as "'" or ",".
CODE
string windows1255FromUTF8(string p)
{
Encoding win = Encoding.GetEncoding(1255);
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(p);
byte[] winBytes = Encoding.Convert(utf8, win, utfBytes);
return win.GetString(winBytes);
}
string UTF8FromWindows1255(string p)
{
Encoding win = Encoding.GetEncoding(1255);
Encoding utf8 = Encoding.UTF8;
byte[] winBytes = win.GetBytes(p);
byte[] utfBytes = Encoding.Convert(win, utf8, winBytes);
return utf8.GetString(utfBytes);
}

There is nothing wrong with the functions, they are simply useless.
What the functions do is to encode the strings into bytes, convert the data from one encoding to another, then decode the bytes back to a string. Unless the string contains a character that is not possible to encode using the windows-1255 encoding, the returned value should be identical to the input.
Strings in .NET doesn't have an encoding. If you get a string from a source where the text was encoded using for example UTF-8, once it's decoded into a string it doesn't have that encoding any more. You don't have to do anyting to a string to use it when the destination has a specific encoding, whatever library you are using that takes the string will take care of the encoding.

For some reason this worked:
byte[] fromBytes = (fromEncoding.UTF8).GetBytes(myString);
string finalString = (Encoding.GetEncoding(1255)).GetString(fromBytes);
Switching encoding without the conversion...

Encoding strings to and from base-64

I'm trying to encode some strings back and forth from base-64 string and I'm having truble to get the right result.
string text = base64string.... //Here I have a base-64 string.
byte[] encodedByte = System.Text.ASCIIEncoding.ASCII.GetBytes(text);
string base64Encoded = Convert.ToBase64String(encodedByte);
if (text == base64Encoded) //If the new encoded string is equal to its original value
return base64Encoded;
I have tried my ways to do this and I don't seem to get the right result. I have tried both with System.Text.Encoding.Unicode and System.Text.Encoding.UTF8
What could be the problem? Does anyone have a proper solution?

string text = base64string.... //Here I have a base-64 string.
byte[] encodedByte = System.Text.ASCIIEncoding.ASCII.GetBytes(text);
string base64Encoded = Convert.ToBase64String(encodedByte);
You are double encoding the string. You begin with a base64 string, get the bytes, and then encode it again. If you want to compare you will need to begin with the original string.

If text is a base-64 string, then you are doing it backwards:
byte[] raw = Convert.FromBase64String(text); // unpack the base-64 to a blob
string s = Encoding.UTF8.GetString(raw); // assume the blob is UTF-8, and
// decode to a string
which will get you it as a string. Note, though, that this scenario is only useful for representing unicode text in an ascii format. Normally you wouldn't base-64 encode it if the original contents are string.

Convert whatever it is that you need in Base64 into a Byte array then use the FromBase64String and ToBase64String to convert to and from Base64:
Byte[] buffer = Convert.FromBase64String(myBase64String1);
myBase64String2 = Convert.ToBase64String(buffer);
myBase64String1 will be equal to myBase64String2. You will need to use other methods to get your data type into a Byte array and the reverse to get your data type back. I have used this to convert the content of a class into a byte array and then to Base64 string and write the string to the filesystem. Later I read it back into a class instance by reversing the process.

You have the encoding code correctly laid out. To confirm whether the base64-encoded string is correct, you can try decoding it and comparing the decoded contents to the original:
var decodedBytes = Convert.FromBase64String(base64encoded);
var compareText = System.Text.Encoding.ASCII.GetString(decodedText);
if (text == compareText)
{
// carry on...
return base64encoded;
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Convert decimal NCR string to bytes using C# - c#

Related

How to convert unicode to utf-8 encoding in c#

Converting unknown characters to Greek characters

Encode numeric string as alphanumeric in ZXing.Net

Encoding not converting

Encoding strings to and from base-64

Categories

Resources