I need to encode some text so that a user can easily pass it on over the phone.
The text contains random characters and is normally no longer than 100 characters. Example:
"37-b,kA.sZ:Bb9--10.y<§"
I'd like to encode this text into a more human-readable form so that it can easily be read out over the phone.
Base36 produces text that can easily be passed over the phone, but I don't see how to encode/decode it correctly.
Any ideas or alternatives?
(Platform is .net 3.5 SP1)
Base 36 sounds like a good choice: with the symbols a-z and 0-9, it is the largest alphabet that can still be read out easily over the phone. I would suggest splitting the output into blocks of 6 or 8 characters to make it easier to read. Also, consider adding a checksum at the end, so you can verify that no errors have crept into the data.
Even so, 100 characters of input (roughly 155 Base36 characters if each input character is one byte) will not be easy to read over the phone and get right the first time. Have you considered another delivery mechanism, such as a text message (SMS)?
On Wikipedia, there is an example of encoding Base36 in Python - shouldn't be too hard to convert to C#.
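As a rough sketch of what such a conversion could look like in C#: the hypothetical `Base36Sketch` class below (not from Wikipedia or any library) treats the UTF-8 bytes of the text as one large number and converts it to base 36 by long division, since .NET 3.5 has no built-in big integer type. Leading zero bytes are written as leading `0` digits so that they survive the round trip, and `Group` inserts a dash after every block. No checksum is included here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: Base36 over the UTF-8 bytes of a string.
public static class Base36Sketch
{
    private const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static string Encode(string text)
    {
        byte[] num = Encoding.UTF8.GetBytes(text);

        // Leading zero bytes carry no numeric value, so emit one '0' for each.
        int zeros = 0;
        while (zeros < num.Length && num[zeros] == 0) zeros++;

        List<char> digits = new List<char>(); // least significant digit first
        int start = zeros;
        while (start < num.Length)
        {
            // Divide the big-endian number by 36; the remainder is the next digit.
            int rem = 0;
            for (int i = start; i < num.Length; i++)
            {
                int acc = rem * 256 + num[i];
                num[i] = (byte)(acc / 36);
                rem = acc % 36;
            }
            digits.Add(Alphabet[rem]);
            while (start < num.Length && num[start] == 0) start++;
        }
        digits.Reverse();
        return new string('0', zeros) + new string(digits.ToArray());
    }

    public static string Decode(string encoded)
    {
        // Accept grouped, lower-case input as a caller might type it.
        string clean = encoded.Replace("-", "").Replace(" ", "").ToUpperInvariant();
        int zeros = 0;
        while (zeros < clean.Length && clean[zeros] == '0') zeros++;

        List<byte> num = new List<byte>(); // little-endian accumulator
        for (int k = zeros; k < clean.Length; k++)
        {
            int carry = Alphabet.IndexOf(clean[k]);
            if (carry < 0) throw new FormatException("Not a Base36 digit: " + clean[k]);
            // Multiply the accumulator by 36 and add the new digit.
            for (int i = 0; i < num.Count; i++)
            {
                int acc = num[i] * 36 + carry;
                num[i] = (byte)(acc & 0xFF);
                carry = acc >> 8;
            }
            if (carry > 0) num.Add((byte)carry);
        }

        byte[] bytes = new byte[zeros + num.Count];
        for (int i = 0; i < num.Count; i++) bytes[zeros + i] = num[num.Count - 1 - i];
        return Encoding.UTF8.GetString(bytes);
    }

    // Inserts a dash after every blockSize characters for readability.
    public static string Group(string encoded, int blockSize)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < encoded.Length; i++)
        {
            if (i > 0 && i % blockSize == 0) sb.Append('-');
            sb.Append(encoded[i]);
        }
        return sb.ToString();
    }
}
```

A call such as `Base36Sketch.Group(Base36Sketch.Encode(text), 6)` would produce the string to read out, and `Base36Sketch.Decode` accepts it back with or without the dashes.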
Below is what the text looks like when viewed in Notepad++.
I need to get the IndexOf for that piece of the string, for use in the code below, and I can't figure out how to put the odd characters in my code.
int start = text.IndexOf("AppxxxxxxDB INFO");
Where the "x" characters stand for the strange characters.
All these characters have corresponding ASCII codes; you can insert them into a string literal with escape sequences.
For instance:
"App\x0000\x0001\x0000\x0003\x0000\x0000\x0000DB INFO"
or shorter:
"App\x00\x01\x00\x03\x00\x00\x00"+"DB INFO"
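\xXXXX specifies a single character, where XXXX is the hexadecimal number corresponding to that character. In C#, \x takes between one and four hex digits, which is why the shorter version ends the literal before "DB": D and B are themselves hex digits, so "\x00DB" would be read as the single character U+00DB.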
Notepad++ simply wants to make it a bit more convenient by rendering these characters by printing the abbreviation in a "bubble". But that's just rendering.
These characters originated as directives for printers (and other devices): you needed a way to tell a printer to move to the next line or to stop the print job. They are still used today; some terminals use them to communicate color changes, for example. The best known is \n, or \x000A, which starts a new line. Within text they are thus characters that specify how to handle the text, loosely comparable to modern HTML markup (although the equivalence is limited). \n means a new line only because there is a consensus about that; someone who defines their own encoding can invent a different system.
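A minimal sketch of the search itself, using the control character values given in this answer. The class and method names are made up for illustration. It uses `\u` escapes, which always take exactly four hex digits and so cannot swallow the letters that follow, and it passes `StringComparison.Ordinal` because the default `IndexOf(string)` is culture-sensitive and may not treat control characters as ordinary characters.

```csharp
using System;

// Hypothetical helper: find the marker containing control characters.
public static class ControlCharSearch
{
    // "App", then the characters 00 01 00 03 00 00 00, then "DB INFO".
    public const string Marker =
        "App\u0000\u0001\u0000\u0003\u0000\u0000\u0000DB INFO";

    public static int FindMarker(string text)
    {
        // Ordinal comparison matches the UTF-16 code units exactly.
        return text.IndexOf(Marker, StringComparison.Ordinal);
    }
}
```

Whether the file really contains these exact values depends on how it was decoded into a string, so it is worth checking the result against a known sample.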
Echoing @JonSkeet's warning, when you read a file into a string, the file's bytes are decoded according to a character set encoding. The decoder has to do something with byte values or sequences that are invalid per the encoding rules. Typical decoders substitute a replacement character and attempt to go on.
I call that data corruption. In most cases, I'd rather have the decoder throw an exception.
You can use a standard decoder, customize one or create a new one with the Encoding class to get the behavior you want. Or, you can preserve the original bytes by reading the file as bytes instead of as text.
If you insist on reading the file as text, I suggest using the code page 437 encoding because it has 256 characters, one for every byte value, no restrictions on byte sequences, and each 437 character is also in Unicode. The bytes that represent text will probably decode to the same characters that you want to search for as strings, but you have to check, comparing 437 and Unicode in this table.
Really, you should have and follow the specification for the file type you are reading. After all, there is no text but encoded text, and you have to know which encoding it is.
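The two alternatives mentioned above (a decoder that throws, and working on the raw bytes) might look roughly like this. The sketch assumes the file is meant to be UTF-8, which the question does not confirm, and the class and method names are illustrative only.

```csharp
using System;
using System.Text;

// Hypothetical helpers: strict decoding and a byte-level search.
public static class StrictDecodeSketch
{
    // Throws DecoderFallbackException instead of substituting
    // replacement characters for invalid byte sequences.
    public static string DecodeStrict(byte[] bytes)
    {
        // Arguments: no BOM on output, throw on invalid bytes.
        Encoding strictUtf8 = new UTF8Encoding(false, true);
        return strictUtf8.GetString(bytes);
    }

    // Naive search for a byte pattern; returns -1 when not found.
    public static int IndexOfBytes(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i + needle.Length <= haystack.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

With `File.ReadAllBytes` supplying the input, `IndexOfBytes` sidesteps the decoding question entirely, at the cost of having to express the search pattern as bytes.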
I am trying to pass a block of text to a system I do not own, which will pass the data to a system I do own.
Unfortunately, when the first system talks to the second system, it uses a TSV format. Thus, I wonder if there's a convenient way to take my block of text and encode it in an ASCII format without any kind of whitespace (mostly newlines and tabs, of course), and then later decode it.
When I'm doing the encoding, I'm working in C#. When I'm doing the decoding, I'm working in Javascript.
I realize that I can write my own code to essentially "manually" perform the encoding and decoding by creating my own scheme, but I wonder if there already exists one for this purpose.
One option which would blow up the size of your data but be really simple to implement: UTF-8 encode all the text, base64-encode that:
byte[] utf8 = Encoding.UTF8.GetBytes(text);
string base64 = Convert.ToBase64String(utf8);
That won't contain any whitespace, and can be converted back. It'll be roughly a third larger than the UTF-8 bytes, and unreadable... but it'll work.
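For reference, a small round-trip sketch of this approach. The class name is made up, and the decoding half is shown in C# only to demonstrate that the transformation is reversible; the JavaScript side would have to reverse the same two steps (Base64 first, then UTF-8).

```csharp
using System;
using System.Text;

// Hypothetical helper: text -> UTF-8 bytes -> Base64, and back.
public static class Base64TextSketch
{
    public static string Encode(string text)
    {
        // The output uses only A-Z, a-z, 0-9, '+', '/' and '=' padding.
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
    }

    public static string Decode(string base64)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}
```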
You could try Uri.EscapeDataString(string), which percent-encodes any whitespace in the text passed in (as well as other special characters, which means the encoded text may be much larger than the original). HttpUtility.UrlEncode(string) is similar, but it encodes spaces as "+", which decodeURIComponent does not turn back into spaces.
On the JavaScript side you could then use decodeURIComponent(string) to decode it back to the original text.
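A short sketch of the percent-encoding route on the C# side, with a hypothetical wrapper class. `Uri.UnescapeDataString` is included only to show the round trip in one language.

```csharp
using System;

// Hypothetical helper: percent-encode text so it contains no raw whitespace.
public static class PercentEncodeSketch
{
    public static string Encode(string text)
    {
        // Tabs, newlines and spaces become %09, %0A and %20.
        return Uri.EscapeDataString(text);
    }

    public static string Decode(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }
}
```

For example, `PercentEncodeSketch.Encode("a\tb\nc d")` yields `a%09b%0Ac%20d`, which is safe to place in a single TSV field.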
I'm developing an SMS application in C#. The service that I use to send messages only allows characters from the 7-bit alphabet. I'm looking for a way to check whether a message contains only characters from this alphabet.
My first idea was to split the message into a character array, loop over the characters, and compare each one to the alphabet. But I bet there is a much better way.
7-bit alphabet:
http://www.dreamfabric.com/sms/default_alphabet.html
You can find a utility GSM Encoding class (it simply derives from the abstract System.Text.Encoding) defined here: The GSM character set in .NET. I think this is the most elegant and reusable way.
I think you need to determine the encoding of the input text; look at the Encoding class.
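If a full Encoding class is more than you need, a plain membership test is a simple alternative. The sketch below is not the linked class; it is a hypothetical helper that hard-codes the GSM 03.38 default alphabet. Whether characters from the extension table (such as € or square brackets) are acceptable depends on the service, so that is left as a parameter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: checks a message against the GSM 03.38 alphabet.
public static class Gsm7Sketch
{
    // Basic character set in table order; the ESC code (0x1B) is left out.
    private const string Basic =
        "@£$¥èéùìòÇ\nØø\rÅå" +
        "\u0394_\u03A6\u0393\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039E" +
        "ÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
        "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§" +
        "¿abcdefghijklmnopqrstuvwxyzäöñüà";

    // Extension table characters; each one takes two septets when sent.
    private const string Extension = "\f^{}\\[~]|\u20AC";

    private static readonly HashSet<char> BasicSet = new HashSet<char>(Basic);
    private static readonly HashSet<char> ExtensionSet = new HashSet<char>(Extension);

    public static bool IsGsm7(string message, bool allowExtension)
    {
        foreach (char c in message)
        {
            if (BasicSet.Contains(c)) continue;
            if (allowExtension && ExtensionSet.Contains(c)) continue;
            return false; // first character outside the alphabet
        }
        return true;
    }
}
```

This still loops over the characters, as in the question's first idea, but the set lookup keeps each check constant-time and the alphabet is defined in one place.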
In C# is there a way to encode the extended ascii values (128-255) into their single byte values as shown here: http://asciitable.com/
I've tried using Encoding.UTF8.GetBytes() but that returns multi-byte values for the extended codes. I don't need anything beyond 255, but it would be nice to at least support those. I'm trying to send the text data to an Arduino running an LED matrix and want to handle accented letters, without having to deal with multi-byte characters.
EDIT: To clarify, the LED matrix has no specific codepage. It's basically whatever I say it is. There's no built in text support in it or the arduino. It's just a dumb 128x8 pixel display and the controller is manually drawing the text pixel by pixel. Therefore, I'm actually providing a font (as a byte array in a header file) to it and can make any character code correspond to any output that I want... so, which codepage to use is not really an issue other than which one will give me full 8-bit characters.
Just pass the code page number to Encoding.GetEncoding. If what you linked is the correct "extended ASCII" table, that would be 437.
But IBM437 encoding is uncommon outside of DOS programs and Windows console apps. Otherwise, the standard encoding for Western European languages is ISO-8859-1 (Windows code page 28591) or windows-1252.
You need to know the code page that the LED matrix uses. It is bound to be a standard one like 1252, the Windows code page for Western Europe and the Americas.
var bytes = Encoding.GetEncoding(1252).GetBytes("Åãrdvárk");
Encoding.Default should handle that, since on the .NET Framework it is the system's ANSI code page. Or request the ANSI code page/encoding explicitly.
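Since the display has no code page of its own, one option is to pick an encoding where every character up to U+00FF maps to the byte of the same value. The sketch below assumes ISO-8859-1 (code page 28591) for that reason; the class name is made up. The exception fallbacks make characters outside the range fail loudly instead of silently becoming "?".

```csharp
using System;
using System.Text;

// Hypothetical helper: one byte per character, ISO-8859-1 assumed.
public static class SingleByteSketch
{
    public static byte[] ToLatin1(string text)
    {
        Encoding latin1 = Encoding.GetEncoding(
            28591,
            new EncoderExceptionFallback(),   // throw on unencodable chars
            new DecoderExceptionFallback());
        return latin1.GetBytes(text);
    }
}
```

With this choice, the font table on the Arduino side would be indexed by the Unicode code point of each character from 0 to 255.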
Please tell me how I can show symbols like "lambda" or "mu" using C#/.NET in a desktop application. What I think is that we may do it using ASCII values and Convert.ToChar(). If I am right, then please give me a link to a page where I can get the ASCII values of all such scientific symbols.
Please give me a link to any URL which contains a list of such ASCII numbers.
Open the Windows character map (charmap.exe), select a Unicode font (Arial should suffice) and copy the symbols into your source code or resources. It's just characters. Of course, you can also switch to Greek keyboard layout, so you can write the characters directly rather than going the charmap route.
Note that you need to use a Unicode font for the labels. You can use charmap to look up which font has Greek characters.
Please tell me how I can show symbols like "lambda" or "mu" using C#/.NET in a desktop application.
You don't have to do anything special. Just use whatever letters you want in either the IDE or in strings in the program. C# treats Greek letters the same as any other letters; they are not special.
What I think is that we may do it using ASCII values and Convert.ToChar().
Hold on, I have a phone call. Oh, it's for you. It's 1968 calling, and they want their character set back. :-)
ASCII proper only has 95 printable characters, and Greek letters are not among them. ASCII was invented for teletypes back in the 1960's; we don't use it anymore. Characters in modern programming environments are represented using Unicode, which provides uniform support for tens of thousands of characters in dozens of alphabets.
If I am right, then please give me a link to a page where I can get the ASCII values of all such scientific symbols.
You can get a list of all the Unicode characters at unicode.org. But like I said, you don't need to. You can just embed the character you want directly in the text. There's no need to resort to clumsy tricks like unicode escapes. (Unless, of course, you are planning on sending your source code to your coworkers using a 1970's era teletype machine.)
C# applications are all Unicode - so there should be no problem assigning Unicode strings to the controls' text, for example:
textBox1.Text = "this is a lambda symbol - λ";
Try this:
char c = '\u03BB'; // lambda; use '\u03BC' for mu
System.Console.WriteLine(c.ToString());
Does it work for you?
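To illustrate that the approaches in these answers are interchangeable, the sketch below builds the same string three ways: from a literal character, from a `\u` escape, and from a numeric code point. The class name is hypothetical, and the literal form assumes the source file is saved in a Unicode encoding.

```csharp
using System;

// Hypothetical helper: three equivalent ways to get a Greek letter.
public static class GreekSymbolsSketch
{
    public static string Literal()
    {
        return "λ"; // typed or pasted directly into the source
    }

    public static string Escaped()
    {
        return "\u03BB"; // Unicode escape for lambda
    }

    public static string FromCodePoint(int codePoint)
    {
        // For example 0x03BB for lambda or 0x03BC for mu.
        return char.ConvertFromUtf32(codePoint);
    }
}
```

All three produce ordinary strings that can be assigned to a control's Text property, provided the control's font contains the glyph.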