Encode to single byte extended ascii values - c#

In C# is there a way to encode the extended ascii values (128-255) into their single byte values as shown here: http://asciitable.com/
I've tried using Encoding.UTF8.GetBytes() but that returns multi byte values for the extended codes. I don't need anything beyond 255, but it would be nice to at least support those. I'm trying to send the text data to an Arduino running an LED matrix and want to handle accented letters, without having to deal with multibyte characters.
EDIT: To clarify, the LED matrix has no specific codepage. It's basically whatever I say it is. There's no built in text support in it or the arduino. It's just a dumb 128x8 pixel display and the controller is manually drawing the text pixel by pixel. Therefore, I'm actually providing a font (as a byte array in a header file) to it and can make any character code correspond to any output that I want... so, which codepage to use is not really an issue other than which one will give me full 8-bit characters.

Just pass the code page number to Encoding.GetEncoding(). If what you linked is the correct "extended ASCII" table, that would be 437.
But IBM437 encoding is uncommon outside of DOS programs and Windows console apps. Otherwise, the standard encoding for Western European languages is ISO-8859-1 (Windows code page 28591) or windows-1252.
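As a minimal sketch of the single-byte approach, the helper below encodes a string with ISO-8859-1 (code page 28591), which maps every character from U+0000 to U+00FF to exactly one byte. The class and method names are made up for illustration. ISO-8859-1 is used here because it is available on every .NET runtime without extra setup; on .NET Core and .NET 5+, code pages such as 437 and 1252 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System;
using System.Text;

static class SingleByteDemo
{
    // Hypothetical helper: encode text to one byte per character using
    // ISO-8859-1 (code page 28591). Characters outside that range are
    // replaced with '?' by the default encoder fallback.
    public static byte[] ToLatin1(string text)
    {
        return Encoding.GetEncoding(28591).GetBytes(text);
    }
}
```

For example, SingleByteDemo.ToLatin1("\u00C5\u00E3rdv\u00E1rk") returns 8 bytes, one per character, and BitConverter.ToString on the result gives "C5-E3-72-64-76-E1-72-6B".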

You need to know the code page that the LED matrix uses. It is bound to be a standard one like 1252, the Windows code page for Western Europe and the Americas.
var bytes = Encoding.GetEncoding(1252).GetBytes("Åãrdvárk");

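Encoding.Default should handle that on .NET Framework, where it is the system's ANSI code page. On .NET Core and later it is UTF-8, so request a specific ANSI code page there instead.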

Related

Decode UTF-8 bytes as Latin-1 characters

I have a string that I receive from a third party app and I would like to display it correctly in any language using C# on my Windows Surface.
Due to incorrect encoding, a piece of my string looks like this in Farsi (Persian-Arabic):
Ù…Ø¯Ù„-Ø±Ù†Ú¯-Ù…ÙˆÛŒ-Ø¬Ø¯ÛŒØ¯-5-436x500
whereas it should look like this:
مدل-رنگ-موی-جدید-5-436x500
This link convert this correctly:
http://www.ltg.ed.ac.uk/~richard/utf-8.html
How I can do it in c#?
It is very hard to tell exactly what is going on from the description of your question. We would all be much better off if you provided us with an example of what is happening using a single character instead of a whole string, and if you chose an example character which does not belong to some exotic character set, for example the bullet character (u2022) or something like that.
Anyhow, what is probably happening is this:
The letter "ر" is represented in UTF-8 as a byte sequence of D8 B1, but what you see is "Ø±", and that's because in UTF-16 Ø is u00D8 and ± is u00B1. So, the incoming text was originally in UTF-8, but in the process of importing it to a dotNet Unicode String in your application it was incorrectly interpreted as being in some 8-bit character set such as ANSI or Latin-1. That's why you now have a Unicode String which appears to contain garbage.
However, the process of converting 8-bit characters to Unicode is for the most part not destructive, so all of the information is still there, that's why the UTF-8 tool that you linked to can still kind of make sense out of it.
What you need to do is convert the string back to an array of ANSI (or Latin-1, whatever) bytes, and then re-construct the string the right way, which is a conversion of UTF-8 to Unicode.
I cannot easily reproduce your situation, so here are some things to try:
byte[] bytes = System.Text.Encoding.Default.GetBytes( garbledUnicodeString ); // ANSI code page on .NET Framework; use Encoding.GetEncoding("ISO-8859-1") for Latin-1
followed by
string properUnicodeString = System.Text.Encoding.UTF8.GetString( bytes );
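The two steps can be combined into one helper. This is only a sketch: it assumes the text was mis-decoded as ISO-8859-1 (Latin-1), and the class and method names are invented for illustration. If the text was actually mis-decoded as Windows-1252, code page 1252 would have to be used for the first step instead (on .NET Core and .NET 5+ that requires registering CodePagesEncodingProvider first).

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Hypothetical helper: undo "UTF-8 bytes read as Latin-1".
    // Step 1 turns each garbled char back into the original byte;
    // step 2 decodes those bytes as the UTF-8 they always were.
    public static string Repair(string garbled)
    {
        byte[] raw = Encoding.GetEncoding(28591).GetBytes(garbled);
        return Encoding.UTF8.GetString(raw);
    }
}
```

With the single-letter example from above, MojibakeRepair.Repair("\u00D8\u00B1") yields "\u0631", the letter "ر".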

Best way to convert arabic to hex in custom code table in c#

I'm currently working on a new program to print receipts which contain arabic text. The printer can handle these characters, but uses a special code table to print them. Therefore all information sent to the printer must be in hex. Information sent to the printer can have a mix of both arabic and non arabic characters.
The code table is here (page 133): http://support.epostraders.co.uk/support-files/documents/3/dwY-TM-T88V_TechRefGuide.pdf
For example, ق = E7, ك = E8
With a standard hex conversion (below) the first 128 latin alphanumeric characters work just fine, but arabic are displayed as question marks.
byte[] ba = Encoding.Default.GetBytes(textBox1.Text);
var hexstring = BitConverter.ToString(ba);
Does anyone have any suggestions for the best way to convert to the correct hex?
Answering this myself in case anyone else has a similar issue.
So the scenario was trying to send Arabic characters to a TM-T88IV Epson printer. Firstly, you need the TM-T88V or later to support Arabic.
Secondly, characters must be sent to the printer as bytes using a Write command rather than as a string in a WriteLine. The Write command needs to terminate with a CRLF in order for the printer to print. The printer has no right-to-left function, so the bytes sent to the printer need to be reversed first: Array.Reverse(byte[])
In order to get Visual Studio to convert Arabic characters to the correct hex values, you need to change the Windows code table. This is usually done in Control Panel > Region & Language > Administrative > Language for non-Unicode programs.
Windows default for UK was 850. Arabic (U.A.E) is 720. You can double check by running cmd and typing chcp. As it turns out, Arabic U.A.E 720 is NOT correct, you need 1256. I couldn't find the matching language in control panel, so I changed it manually in cmd using the command 'chcp 1256'
Finally, you need to change the internal code table of the printer. To do this, I used the TM-T88V utility (https://download.epson-biz.com/modules/pos/index.php?page=single_soft&cid=4100&pcat=3&scat=42). I believe you can do this via direct commands sent via serial too, but it proved too fiddly.
Fingers crossed this should all work now. Happy printing.
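An alternative that avoids changing the Windows code page is an explicit lookup table in code. The sketch below is an assumption-laden illustration, not the method used above: the class name is invented, and the table contains only the two mappings quoted in the question (ق = E7, ك = E8); every other entry would have to be filled in from the printer's code table. It passes 7-bit ASCII through unchanged, substitutes '?' for anything unmapped, and reverses the bytes as described above.

```csharp
using System;
using System.Collections.Generic;

static class PrinterEncoder
{
    // Only the two mappings given in the question; the remaining entries
    // must be taken from the printer's code table.
    static readonly Dictionary<char, byte> ArabicTable = new Dictionary<char, byte>
    {
        { '\u0642', 0xE7 }, // ق
        { '\u0643', 0xE8 }, // ك
    };

    public static byte[] Encode(string text)
    {
        var bytes = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            byte mapped;
            if (c < 0x80)
                bytes[i] = (byte)c;              // plain ASCII passes through
            else if (ArabicTable.TryGetValue(c, out mapped))
                bytes[i] = mapped;               // printer-specific code
            else
                bytes[i] = (byte)'?';            // unmapped character
        }
        Array.Reverse(bytes);                    // printer has no right-to-left mode
        return bytes;
    }
}
```

BitConverter.ToString(PrinterEncoder.Encode("\u0642\u0643")) gives "E8-E7". Note that reversing the whole array also reverses any Latin text in the same line, so mixed content may need to be split into runs first.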

How do I use C#'s IndexOf when strange characters are in the string

Below is what the text looks like when viewed in NotePad++.
I need to get the IndexOf for that piece of the string, for use in the code below. And I can't figure out how to use the odd characters in my code.
int start = text.IndexOf("AppxxxxxxDB INFO");
Where the "xxxxx"'s represent the strange characters.
All these characters have corresponding ASCII codes; you can insert them in a string by escaping them.
For instance:
"App\x0000\x0001\x0000\x0003\x0000\x0000\x0000DB INFO"
or shorter:
"App\x00\x01\x00\x03\x00\x00\x00"+"DB INFO"
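\xXXXX specifies one character, where XXXX is the hexadecimal number corresponding to the character. \x accepts one to four hex digits, which is why the shorter form ends the literal before "DB INFO": otherwise the D and B would be read as part of the last escape.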
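Put together, the search might look like the sketch below. The class and method names are invented for illustration, and the control-character sequence is the one from the escaped strings above. It uses \u escapes, which always take exactly four hex digits, and passes StringComparison.Ordinal so that the comparison is done on raw char values; culture-sensitive comparison can treat characters such as \0 as ignorable and return surprising results.

```csharp
using System;

static class ControlCharSearch
{
    // "App", seven control characters, then "DB INFO".
    const string Needle = "App\u0000\u0001\u0000\u0003\u0000\u0000\u0000DB INFO";

    // Hypothetical helper: returns the index of the marker, or -1 if absent.
    public static int FindMarker(string text)
    {
        return text.IndexOf(Needle, StringComparison.Ordinal);
    }
}
```

Calling ControlCharSearch.FindMarker on a string that contains the marker two characters in returns 2.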
Notepad++ simply wants to make it a bit more convenient by rendering these characters by printing the abbreviation in a "bubble". But that's just rendering.
These characters originated as directives for printers and other devices: for instance, you needed a way to instruct a printer to move to the next line or to stop the print job. They are still used today; some terminals use them to communicate color changes, etc. The best known is \n or \x000A, which means start a new line. In text they are thus characters that specify how to handle the text, roughly comparable to modern HTML (although the equivalence is limited). \n is only a new line because there is a consensus about that. Anyone who defines their own encoding can invent a new system.
Echoing @JonSkeet's warning, when you read a file into a string, the file's bytes are decoded according to a character set encoding. The decoder has to do something with byte values or sequences that are invalid per the encoding rules. Typical decoders substitute a replacement character and attempt to go on.
I call that data corruption. In most cases, I'd rather have the decoder throw an exception.
You can use a standard decoder, customize one or create a new one with the Encoding class to get the behavior you want. Or, you can preserve the original bytes by reading the file as bytes instead of as text.
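One way to get the throwing behavior, sketched here for UTF-8 only, is the UTF8Encoding constructor that takes a throwOnInvalidBytes flag. The class and method names are invented for illustration; the choice of UTF-8 is an assumption, since the file's real encoding is not stated.

```csharp
using System;
using System.Text;

static class StrictDecoding
{
    // UTF-8 decoder that throws DecoderFallbackException on invalid byte
    // sequences instead of silently substituting U+FFFD.
    static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Hypothetical helper: returns false when the bytes are not valid UTF-8.
    public static bool TryDecode(byte[] data, out string text)
    {
        try
        {
            text = StrictUtf8.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = null;
            return false;
        }
    }
}
```

The original bytes can be obtained with File.ReadAllBytes, so a failed decode loses nothing and a different encoding can still be tried.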
If you insist on reading the file as text, I suggest using the 437 encoding because it has 256 characters, one for every byte value, no restrictions on byte sequences, and each 437 character is also in Unicode. The bytes that represent text will possibly decode to the same characters that you want to search for as strings, but you have to check, comparing 437 and Unicode in this table.
Really, you should have and follow the specification for the file type you are reading. After all, there is no text but encoded text, and you have to know which encoding it is.

Default C# String encoding

I am having some issues with the default string encoding in C#. I need to read strings from certain files/packets. However, these strings include characters from the 128-255 range (extended ascii), and all of these characters show up as question marks instead of the proper character. For example, when reading a string, it could come up as "S?meStr?n?" if the string contained the extended ascii characters.
Now, is there any way to change the default encoding for my application? I know in java you could define the default character set from command line.
There's no one single "extended ASCII" encoding. There are lots of different 8-bit encodings which are compatible with ASCII for the bottom 128 values.
You need to find out what encoding your files actually use, and specify that when reading the data with StreamReader (or whatever else you're using). For example, you may want encoding Windows-1252:
Encoding encoding = Encoding.GetEncoding(1252);
.NET strings are always sequences of UTF-16 code units. You can't change that, and you shouldn't try. (That's true in Java as well, and you really shouldn't use the platform default encoding when calling getBytes() etc unless that's what you really, really mean.)
An Encoding can be specified in at least one overload of functions for reading text - for example, ReadAllText(string, Encoding).
So if you know a file's encoded using Windows-1252, then you can specify it like so:
string contents = File.ReadAllText(someFilePath, Encoding.GetEncoding(1252));
Of course, doing this requires knowing ahead of time which code page is being used.
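To see where the question marks come from, the sketch below decodes the same bytes twice: once as ASCII, which has no characters above 127 and substitutes '?', and once as ISO-8859-1. The class and method names are invented for illustration, and ISO-8859-1 (code page 28591) is only a stand-in for whichever 8-bit encoding the data really uses.

```csharp
using System;
using System.Text;

static class HighByteDemo
{
    // ASCII has no mapping for bytes 128-255, so each one becomes '?'.
    public static string AsAscii(byte[] data)
    {
        return Encoding.ASCII.GetString(data);
    }

    // ISO-8859-1 maps every byte value to the Unicode character with the same number.
    public static string AsLatin1(byte[] data)
    {
        return Encoding.GetEncoding(28591).GetString(data);
    }
}
```

For the bytes 53 F6 6D, AsAscii returns "S?m" while AsLatin1 returns "Söm". The same applies to packet data: decode the received byte[] with the known encoding rather than relying on a default.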

How to show symbols like "Lambda" or "Mu" on labels in desktop application in c#.net

Please tell me how can i show symbols like "lambda" or Mu using c#.net in desktop application. what i think is we may do it using ASCII values and convert.toChar();.. if i am right then please give me link of page where i can get ASCII values of all such a scientific symbols.
Please give me link of any URL which contains list of such a ASCII numbers.
Open the Windows character map (charmap.exe), select a Unicode font (Arial should suffice) and copy the symbols into your source code or resources. It's just characters. Of course, you can also switch to Greek keyboard layout, so you can write the characters directly rather than going the charmap route.
Note that you need to use a Unicode font for the labels. You can use charmap to look up which font has Greek characters.
Please tell me how can i show symbols like "lambda" or Mu using c#.net in desktop application.
You don't have to do anything special. Just use whatever letters you want in either the IDE or in strings in the program. C# treats Greek letters the same as any other letters; they are not special.
what i think is we may do it using ASCII values and convert.toChar();
Hold on, I have a phone call. Oh, it's for you. It's 1968 calling, and they want their character set back. :-)
ASCII proper only has 95 printable characters, and Greek letters are not among them. ASCII was invented for teletypes back in the 1960's; we don't use it anymore. Characters in modern programming environments are represented using Unicode, which provides uniform support for tens of thousands of characters in dozens of alphabets.
if i am right then please give me link of page where i can get ASCII values of all such a scientific symbols.
You can get a list of all the Unicode characters at unicode.org. But like I said, you don't need to. You can just embed the character you want directly in the text. There's no need to resort to clumsy tricks like unicode escapes. (Unless, of course, you are planning on sending your source code to your coworkers using a 1970's era teletype machine.)
C# applications are all Unicode - so there should be no problem assigning Unicode strings to the controls' text, for example:
textBox1.Text = "this is a lambda symbol - λ";
Try this
char c = '\u03BB'; // λ (lambda); use '\u03BC' for μ (mu)
System.Console.WriteLine(c.ToString());
does it work for you?
