Decode cyrillic quoted-printable content - c#

I'm using this sample for getting mail from server. Problem is that response contains cyrillic symbols I cannot decode.
Here is a header:
Content-type: text/html; charset="koi8-r"
Content-Transfer-Encoding: quoted-printable
And receive response function:
static void receiveResponse(string command)
{
try
{
if (command != "")
{
if (tcpc.Connected)
{
dummy = Encoding.ASCII.GetBytes(command);
ssl.Write(dummy, 0, dummy.Length);
}
else
{
throw new ApplicationException("TCP CONNECTION DISCONNECTED");
}
}
ssl.Flush();
byte[] bigBuffer = new byte[1024*16];
int bites = ssl.Read(bigBuffer, 0, bigBuffer.Length);
byte[] buffer = new byte[bites];
Array.Copy(bigBuffer, 0, buffer, 0, bites);
sb.Append(Encoding.ASCII.GetString(buffer));
string result = sb.ToString();
// here is an unsuccessful attempt at decoding
result = Regex.Replace(result, #"=([0-9a-fA-F]{2})",
m => m.Groups[1].Success
? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
: "");
byte[] bytes = Encoding.Default.GetBytes(result);
result = Encoding.GetEncoding("koi8r").GetString(bytes);
}
catch (Exception ex)
{
throw new ApplicationException(ex.ToString());
}
}
How to decode stream correctly? In result string I got <p>=F0=D2=C9=D7=C5=D4 =D1 =F7=C1=CE=D1</p> instead of <p>Привет я Ваня</p>.

As #Max pointed out, you will need to decode the content using the encoding algorithm declared in the Content-Transfer-Encoding header.
In your case, it is the quoted-printable encoding.
You will need to decode the text of the message into an array of bytes and then you’ll need to convert that array of bytes into a string using the appropriate System.Text.Encoding. The name of the encoding to use will typically be specified in the Content-Type header as the charset parameter (in your case, koi8-r).
Since you already have the text as bytes in the buffer variable, simply perform the deciding on that:
byte[] buffer = new byte[bites];
int decodedLength = 0;
for (int i = 0; i < bites; i++) {
if (bigBuffer[i] == (byte) '=') {
if (bites > i + 1) {
// possible hex sequence
byte b1 = bigBuffer[i + 1];
byte b2 = bigBuffer[i + 2];
if (IsXDigit (b1) && IsXDigit (b2)) {
// decode
buffer[decodedLength++] = (ToXDigit (b1) << 4) | ToXDigit (b2);
i += 2;
} else if (b1 == (byte) '\r' && b2 == (byte) '\n') {
// folded line, drop the '=\r\n' sequence
i += 2;
} else {
// error condition, just pass it through
buffer[decodedLength++] = bigBuffer[i];
}
} else {
// truncated? just pass it through
buffer[decodedLength++] = bigBuffer[i];
}
} else {
buffer[decodedLength++] = bigBuffer[i];
}
}
string result = Encoding.GetEncoding ("koi8-r").GetString (buffer, 0, decodedLength);
Custom functions:
static byte ToXDigit (byte c)
{
if (c >= 0x41) {
if (c >= 0x61)
return (byte) (c - (0x61 - 0x0a));
return (byte) (c - (0x41 - 0x0A));
}
return (byte) (c - 0x30);
}
static bool IsXDigit (byte c)
{
return (c >= (byte) 'A' && c <= (byte) 'F') || (c >= (byte) 'a' && c <= (byte) 'f') || (c >= (byte) '0' && c <= (byte) '9');
}
Of course, instead of writing your own hodge podge IMAP library, you could just use MimeKit and MailKit ;-)

Related

C# String to Byte[]

I want to convert a string to byte and output it as byte array.
example: string: 3074 output: 0C02
private static byte[] ConvertHexToBytes(string input)
{
var result = new byte[(input.Length + 1) / 2];
var offset = 0;
if (input.Length % 2 == 1)
{
result[0] = (byte)Convert.ToUInt32(input[0] + "", 16);
offset = 1;
}
for (int i = 0; i < input.Length / 2; i++)
{
result[i + offset] = (byte)Convert.ToUInt32(input.Substring(i * 2 + offset, 2), 16);
}
return result;
}
private static void SetValue(string input)
{
byte[] port = ConvertHexToBytes(input);
byte[] port = new byte[] { byte.Parse("" + port[0]), byte.Parse("" + port[1]) };
}
I am getting 3074 instead of the 0C02.
Just figured it out, here is my code
private static void SetValue(string input)
{
byte[] port = BitConverter.GetBytes(int.Parse(input));
byte[] res= new byte[] { port[1], port[0] };
}
It seems you're working with int, not string ("I want to convert a string to byte...")
int input = 3074; // 00 00 0C 02
And you want to trim leading zeroes and get "0C02" string; you can do it with a help of Linq:
byte[] result =
(BitConverter.IsLittleEndian
? BitConverter.GetBytes(input).Reverse()
: BitConverter.GetBytes(input))
.SkipWhile(b => b == 0)
.DefaultIfEmpty() // we don't want trim all in case of input == 0
.ToArray();
Test:
// "0C02"
Console.WriteLine(string.Concat(result.Select(b => b.ToString("X2"))));
Combining it all into the proposed method:
private static void SetValue(string input) {
byte[] res =
(BitConverter.IsLittleEndian
? BitConverter.GetBytes(int.Parse(input)).Reverse()
: BitConverter.GetBytes(int.Parse(input)))
.SkipWhile(b => b == 0)
.DefaultIfEmpty() // we don't want trim all in case of input == 0
.ToArray();
// Console.WriteLine(string.Concat(res.Select(b => b.ToString("X2"))));
//TODO: relevant code here
}
A solution which will work independent of the endianess of your system architecture which will make things more portable:
byte[] bytes = BitConverter.GetBytes(input);
if (BitConverter.IsLittleEndian)
Array.Reverse(bytes);
Console.WriteLine("byte array: " +
BitConverter.ToString(bytes).Replace("-", string.Empty));

Reading multi language text file in c#

I have to read a text file which can contains char from following languages: English, Japanese, Chinese, French, Spanish, German, Italian
My task is to simply read the data and write it to new text file (placing new line char \n after 100 chars).
I cannot use File.ReadAllText and File.ReadAllLines as file size can be more than 500 MB. So I have written following code:
using (var streamReader = new StreamReader(inputFilePath, Encoding.ASCII))
{
using (var streamWriter = new StreamWriter(outputFilePath,false))
{
char[] bytes = new char[100];
while (streamReader.Read(bytes, 0, 100) > 0)
{
var data = new string(bytes);
streamWriter.WriteLine(data);
}
MessageBox.Show("Compleated");
}
}
Other than ASCII encoding I have tried UTF-7, UTF-8, UTF-32 and IBM500. But no luck in reading and writing multi language characters.
Please help me to achieve this.
You will have to take a look at the first 4 bytes of the file you are parsing.
these bytes will give you a hint on what encoding you have to use.
Here is a helper method I have written to do the task:
public static string GetStringFromEncodedBytes(this byte[] bytes) {
var encoding = Encoding.Default;
var skipBytes = 0;
if (bytes[0] == 0x2b && bytes[1] == 0x2f && bytes[2] == 0x76) {
encoding = Encoding.UTF7;
skipBytes = 3;
}
if (bytes[0] == 0xef && bytes[1] == 0xbb && bytes[2] == 0xbf) {
encoding = Encoding.UTF8;
skipBytes = 3;
}
if (bytes[0] == 0xff && bytes[1] == 0xfe) {
encoding = Encoding.Unicode;
skipBytes = 2;
}
if (bytes[0] == 0xfe && bytes[1] == 0xff) {
encoding = Encoding.BigEndianUnicode;
skipBytes = 2;
}
if (bytes[0] == 0 && bytes[1] == 0 && bytes[2] == 0xfe && bytes[3] == 0xff) {
encoding = Encoding.UTF32;
skipBytes = 4;
}
return encoding.GetString(bytes.Skip(skipBytes).ToArray());
}
This is a good enough start to get to the answer. If i is not equal to 100 you need to read more chars. No trouble with french chars like é - they are all handled in C# char class.
char[] soFlow = new char[100];
int posn = 0;
using (StreamReader sr = new StreamReader("a.txt"))
using (StreamWriter sw = new StreamWriter("b.txt", false))
while(sr.EndOfStream == false)
{
try {
int i = sr.Read(soFlow, posn%100, 100);
//if i < 100 need to read again with second char array
posn += 100;
sw.WriteLine(new string(soFlow));
}
catch(Exception e){Console.WriteLine(e.Message);}
}
Spec: Read(Char[], Int32, Int32) Reads a specified maximum of characters from the current stream into a buffer, beginning at the specified index.
Certainly worked for me anyway :)

Migrating BCD conversion code from Java to .Net

I am converting a Java class that converts BCD data to ASCII
I am converting it to .Net BCD Convertor
Following is the converted from java but it is giving wrong converted value e.g. for 123456789 it is giving 123456153153
public static string GetStringFromBcd(int[] b)
{
StringBuilder buffer = new StringBuilder();
foreach (var t in b)
{
if ((t & 0x0000000F) == 0x0000000F && ((t >> 4) & 0x0000000F) == 0x0000000F)
{
break;
}
buffer.Append((t & 0x0000000F) + "");
if ((t & 0x000000F0) != 0x000000F0)
{
buffer.Append(((t >> 4) & 0x0000000F) + "");
}
}
}
What could be the problem?
EDIT: ANSWER:
I got the source program where the data has been BCD encoded.
I found that nothing was wrong in that logic, then I discovered the source of the function where the data was converting from network stream to string and later converted to byte/int array.
following is the code
int bytesRead = tcpClient.Receive(message);//, 0, bytetoReadSize);
if (bytesRead == 0)
{
break;
//the client has disconnected from the server
}
//message has successfully been received
data += new ASCIIEncoding().GetString(message, 0, bytesRead);
here is the problem ASCIIEncoding does not convert many encoded character and gives '?'63 instead of those character , when putting 63 in BCD conversion logic it gives 153.
To resolve this error, I Modified the last line and instead of decoding , I am simply casting the received byte to char.
foreach (byte b in message)
{
data += ((char) b);
}
Here's a Similar Question that comes about it a few different ways.
Here's a Site that has an excellent detailed description of what your facing.
It shouldn't be that hard, but processing them as int's is going to be much harder.
Zoned Decimal (BCD) is pretty straight forward to convert, but you have to be careful if your taking files from a mainframe that have been converted via an ASCII transfer. It can still be converted, but the byte values change due to the ebcdic to ascii conversion during the FTP.
If your processing binary files, it's much easier to deal with.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace TestZoned
{
class Program
{
public static String zoneToString(byte[] zoneBytes)
{
Encoding ascii = Encoding.ASCII;
Encoding ebcdic = Encoding.GetEncoding("IBM037");
byte[] asciiBytes = null;
String str = null;
int zoneLen = zoneBytes.Length;
int i = zoneLen - 1;
int b1 = zoneBytes[i];
b1 = (b1 & 0xf0) >> 4;
switch (b1)
{
case 13:
case 11:
zoneBytes[i] = (byte)(zoneBytes[i] | 0xf0);
asciiBytes = Encoding.Convert(ebcdic, ascii, zoneBytes);
str = "-" + ASCIIEncoding.ASCII.GetString(asciiBytes);
break;
default:
zoneBytes[i] = (byte)(zoneBytes[i] | 0xf0);
asciiBytes = Encoding.Convert(ebcdic, ascii, zoneBytes);
str = ASCIIEncoding.ASCII.GetString(asciiBytes);
break;
}
return (str);
}
static void Main(string[] args)
{
byte[] array = { 0xf0, 0xf0, 0xf1 }; // 001
byte[] pos = { 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9 }; // 123456789
byte[] neg = { 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xd9 }; // -123456789
Console.WriteLine("Converted: {0}", zoneToString(array));
Console.WriteLine("Converted: {0}", zoneToString(pos));
Console.WriteLine("Converted: {0}", zoneToString(neg));
}
}
}
If you wanted to stick to something similar
public static String GetStringFromBcd(byte[] zoneBytes)
{
StringBuilder buffer = new StringBuilder();
int b1 = (zoneBytes[zoneBytes.Length - 1] & 0xf0) >> 4;
if ( (b1 == 13) || (b1 == 11) ) buffer.Append("-");
for (int i = 0; i < zoneBytes.Length; i++)
{
buffer.Append((zoneBytes[i] & 0x0f));
}
return buffer.ToString();
}

How to convert a byte array (MD5 hash) into a string (36 chars)?

I've got a byte array that was created using a hash function. I would like to convert this array into a string. So far so good, it will give me hexadecimal string.
Now I would like to use something different than hexadecimal characters, I would like to encode the byte array with these 36 characters: [a-z][0-9].
How would I go about?
Edit: the reason I would to do this, is because I would like to have a smaller string, than a hexadecimal string.
I adapted my arbitrary-length base conversion function from this answer to C#:
static string BaseConvert(string number, int fromBase, int toBase)
{
var digits = "0123456789abcdefghijklmnopqrstuvwxyz";
var length = number.Length;
var result = string.Empty;
var nibbles = number.Select(c => digits.IndexOf(c)).ToList();
int newlen;
do {
var value = 0;
newlen = 0;
for (var i = 0; i < length; ++i) {
value = value * fromBase + nibbles[i];
if (value >= toBase) {
if (newlen == nibbles.Count) {
nibbles.Add(0);
}
nibbles[newlen++] = value / toBase;
value %= toBase;
}
else if (newlen > 0) {
if (newlen == nibbles.Count) {
nibbles.Add(0);
}
nibbles[newlen++] = 0;
}
}
length = newlen;
result = digits[value] + result; //
}
while (newlen != 0);
return result;
}
As it's coming from PHP it might not be too idiomatic C#, there are also no parameter validity checks. However, you can feed it a hex-encoded string and it will work just fine with
var result = BaseConvert(hexEncoded, 16, 36);
It's not exactly what you asked for, but encoding the byte[] into hex is trivial.
See it in action.
Earlier tonight I came across a codereview question revolving around the same algorithm being discussed here. See: https://codereview.stackexchange.com/questions/14084/base-36-encoding-of-a-byte-array/
I provided a improved implementation of one of its earlier answers (both use BigInteger). See: https://codereview.stackexchange.com/a/20014/20654. The solution takes a byte[] and returns a Base36 string. Both the original and mine include simple benchmark information.
For completeness, the following is the method to decode a byte[] from an string. I'll include the encode function from the link above as well. See the text after this code block for some simple benchmark info for decoding.
const int kByteBitCount= 8; // number of bits in a byte
// constants that we use in FromBase36String and ToBase36String
const string kBase36Digits= "0123456789abcdefghijklmnopqrstuvwxyz";
static readonly double kBase36CharsLengthDivisor= Math.Log(kBase36Digits.Length, 2);
static readonly BigInteger kBigInt36= new BigInteger(36);
// assumes the input 'chars' is in big-endian ordering, MSB->LSB
static byte[] FromBase36String(string chars)
{
var bi= new BigInteger();
for (int x= 0; x < chars.Length; x++)
{
int i= kBase36Digits.IndexOf(chars[x]);
if (i < 0) return null; // invalid character
bi *= kBigInt36;
bi += i;
}
return bi.ToByteArray();
}
// characters returned are in big-endian ordering, MSB->LSB
static string ToBase36String(byte[] bytes)
{
// Estimate the result's length so we don't waste time realloc'ing
int result_length= (int)
Math.Ceiling(bytes.Length * kByteBitCount / kBase36CharsLengthDivisor);
// We use a List so we don't have to CopyTo a StringBuilder's characters
// to a char[], only to then Array.Reverse it later
var result= new System.Collections.Generic.List<char>(result_length);
var dividend= new BigInteger(bytes);
// IsZero's computation is less complex than evaluating "dividend > 0"
// which invokes BigInteger.CompareTo(BigInteger)
while (!dividend.IsZero)
{
BigInteger remainder;
dividend= BigInteger.DivRem(dividend, kBigInt36, out remainder);
int digit_index= Math.Abs((int)remainder);
result.Add(kBase36Digits[digit_index]);
}
// orientate the characters in big-endian ordering
result.Reverse();
// ToArray will also trim the excess chars used in length prediction
return new string(result.ToArray());
}
"A test 1234. Made slightly larger!" encodes to Base64 as "165kkoorqxin775ct82ist5ysteekll7kaqlcnnu6mfe7ag7e63b5"
To decode that Base36 string 1,000,000 times takes 12.6558909 seconds on my machine (I used the same build and machine conditions as provided in my answer on codereview)
You mentioned that you were dealing with a byte[] for the MD5 hash, rather than a hexadecimal string representation of it, so I think this solution provide the least overhead for you.
If you want a shorter string and can accept [a-zA-Z0-9] and + and / then look at Convert.ToBase64String
Using BigInteger (needs the System.Numerics reference)
Using BigInteger (needs the System.Numerics reference)
const string chars = "0123456789abcdefghijklmnopqrstuvwxyz";
// The result is padded with chars[0] to make the string length
// (int)Math.Ceiling(bytes.Length * 8 / Math.Log(chars.Length, 2))
// (so that for any value [0...0]-[255...255] of bytes the resulting
// string will have same length)
public static string ToBaseN(byte[] bytes, string chars, bool littleEndian = true, int len = -1)
{
if (bytes.Length == 0 || len == 0)
{
return String.Empty;
}
// BigInteger saves in the last byte the sign. > 7F negative,
// <= 7F positive.
// If we have a "negative" number, we will prepend a 0 byte.
byte[] bytes2;
if (littleEndian)
{
if (bytes[bytes.Length - 1] <= 0x7F)
{
bytes2 = bytes;
}
else
{
// Note that Array.Resize doesn't modify the original array,
// but creates a copy and sets the passed reference to the
// new array
bytes2 = bytes;
Array.Resize(ref bytes2, bytes.Length + 1);
}
}
else
{
bytes2 = new byte[bytes[0] > 0x7F ? bytes.Length + 1 : bytes.Length];
// We copy and reverse the array
for (int i = bytes.Length - 1, j = 0; i >= 0; i--, j++)
{
bytes2[j] = bytes[i];
}
}
BigInteger bi = new BigInteger(bytes2);
// A little optimization. We will do many divisions based on
// chars.Length .
BigInteger length = chars.Length;
// We pre-calc the length of the string. We know the bits of
// "information" of a byte are 8. Using Log2 we calc the bits of
// information of our new base.
if (len == -1)
{
len = (int)Math.Ceiling(bytes.Length * 8 / Math.Log(chars.Length, 2));
}
// We will build our string on a char[]
var chs = new char[len];
int chsIndex = 0;
while (bi > 0)
{
BigInteger remainder;
bi = BigInteger.DivRem(bi, length, out remainder);
chs[littleEndian ? chsIndex : len - chsIndex - 1] = chars[(int)remainder];
chsIndex++;
if (chsIndex < 0)
{
if (bi > 0)
{
throw new OverflowException();
}
}
}
// We append the zeros that we skipped at the beginning
if (littleEndian)
{
while (chsIndex < len)
{
chs[chsIndex] = chars[0];
chsIndex++;
}
}
else
{
while (chsIndex < len)
{
chs[len - chsIndex - 1] = chars[0];
chsIndex++;
}
}
return new string(chs);
}
public static byte[] FromBaseN(string str, string chars, bool littleEndian = true, int len = -1)
{
if (str.Length == 0 || len == 0)
{
return new byte[0];
}
// This should be the maximum length of the byte[] array. It's
// the opposite of the one used in ToBaseN.
// Note that it can be passed as a parameter
if (len == -1)
{
len = (int)Math.Ceiling(str.Length * Math.Log(chars.Length, 2) / 8);
}
BigInteger bi = BigInteger.Zero;
BigInteger length2 = chars.Length;
BigInteger mult = BigInteger.One;
for (int j = 0; j < str.Length; j++)
{
int ix = chars.IndexOf(littleEndian ? str[j] : str[str.Length - j - 1]);
// We didn't find the character
if (ix == -1)
{
throw new ArgumentOutOfRangeException();
}
bi += ix * mult;
mult *= length2;
}
var bytes = bi.ToByteArray();
int len2 = bytes.Length;
// BigInteger adds a 0 byte for positive numbers that have the
// last byte > 0x7F
if (len2 >= 2 && bytes[len2 - 1] == 0)
{
len2--;
}
int len3 = Math.Min(len, len2);
byte[] bytes2;
if (littleEndian)
{
if (len == bytes.Length)
{
bytes2 = bytes;
}
else
{
bytes2 = new byte[len];
Array.Copy(bytes, bytes2, len3);
}
}
else
{
bytes2 = new byte[len];
for (int i = 0; i < len3; i++)
{
bytes2[len - i - 1] = bytes[i];
}
}
for (int i = len3; i < len2; i++)
{
if (bytes[i] != 0)
{
throw new OverflowException();
}
}
return bytes2;
}
Be aware that they are REALLY slow! REALLY REALLY slow! (2 minutes for 100k). To speed them up you would probably need to rewrite the division/mod operation so that they work directly on a buffer, instead of each time recreating the scratch pads as it's done by BigInteger. And it would still be SLOW. The problem is that the time needed to encode the first byte is O(n) where n is the length of the byte array (this because all the array needs to be divided by 36). Unless you want to work with blocks of 5 bytes and lose some bits. Each symbol of Base36 carries around 5.169925001 bits. So 8 of these symbols would carry 41.35940001 bits. Very near 40 bytes.
Note that these methods can work both in little-endian mode and in big-endian mode. The endianness of the input and of the output is the same. Both methods accept a len parameter. You can use it to trim excess 0 (zeroes). Note that if you try to make an output too much small to contain the input, an OverflowException will be thrown.
System.Text.Encoding enc = System.Text.Encoding.ASCII;
string myString = enc.GetString(myByteArray);
You can play with what encoding you need:
System.Text.ASCIIEncoding,
System.Text.UnicodeEncoding,
System.Text.UTF7Encoding,
System.Text.UTF8Encoding
To match the requrements [a-z][0-9] you can use it:
Byte[] bytes = new Byte[] { 200, 180, 34 };
string result = String.Join("a", bytes.Select(x => x.ToString()).ToArray());
You will have string representation of bytes with char separator. To convert back you will need to split, and convert the string[] to byte[] using the same approach with .Select().
Usually a power of 2 is used - that way one character maps to a fixed number of bits. An alphabet of 32 bits for instance would map to 5 bits. The only challenge in that case is how to deserialize variable-length strings.
For 36 bits you could treat the data as a large number, and then:
divide by 36
add the remainder as character to your result
repeat until the division results in 0
Easier said than done perhaps.
you can use modulu.
this example encode your byte array to string of [0-9][a-z].
change it if you want.
public string byteToString(byte[] byteArr)
{
int i;
char[] charArr = new char[byteArr.Length];
for (i = 0; i < byteArr.Length; i++)
{
int byt = byteArr[i] % 36; // 36=num of availible charachters
if (byt < 10)
{
charArr[i] = (char)(byt + 48); //if % result is a digit
}
else
{
charArr[i] = (char)(byt + 87); //if % result is a letter
}
}
return new String(charArr);
}
If you don't want to lose data for de-encoding you can use this example:
public string byteToString(byte[] byteArr)
{
int i;
char[] charArr = new char[byteArr.Length*2];
for (i = 0; i < byteArr.Length; i++)
{
charArr[2 * i] = (char)((int)byteArr[i] / 36+48);
int byt = byteArr[i] % 36; // 36=num of availible charachters
if (byt < 10)
{
charArr[2*i+1] = (char)(byt + 48); //if % result is a digit
}
else
{
charArr[2*i+1] = (char)(byt + 87); //if % result is a letter
}
}
return new String(charArr);
}
and now you have a string double-lengthed when odd char is the multiply of 36 and even char is the residu. for example: 200=36*5+20 => "5k".

Converting a paragraph to hex notatation, then back to string

How would you convert a parapraph to hex notation, and then back again into its original string form?
(C#)
A side note: would putting the string into hex format shrink it the most w/o getting into hardcore shrinking algo's?
What exactly do you mean by "hex notation"? That usually refers to encoding binary data, not text. You'd need to encode the text somehow (e.g. using UTF-8) and then encode the binary data as text by converting each byte to a pair of characters.
using System;
using System.Text;
public class Hex
{
static void Main()
{
string original = "The quick brown fox jumps over the lazy dog.";
byte[] binary = Encoding.UTF8.GetBytes(original);
string hex = BytesToHex(binary);
Console.WriteLine("Hex: {0}", hex);
byte[] backToBinary = HexToBytes(hex);
string restored = Encoding.UTF8.GetString(backToBinary);
Console.WriteLine("Restored: {0}", restored);
}
private static readonly char[] HexChars = "0123456789ABCDEF".ToCharArray();
public static string BytesToHex(byte[] data)
{
StringBuilder builder = new StringBuilder(data.Length*2);
foreach(byte b in data)
{
builder.Append(HexChars[b >> 4]);
builder.Append(HexChars[b & 0xf]);
}
return builder.ToString();
}
public static byte[] HexToBytes(string text)
{
if ((text.Length & 1) != 0)
{
throw new ArgumentException("Invalid hex: odd length");
}
byte[] ret = new byte[text.Length/2];
for (int i=0; i < text.Length; i += 2)
{
ret[i/2] = (byte)(ParseNybble(text[i]) << 4 | ParseNybble(text[i+1]));
}
return ret;
}
private static int ParseNybble(char c)
{
if (c >= '0' && c <= '9')
{
return c-'0';
}
if (c >= 'A' && c <= 'F')
{
return c-'A'+10;
}
if (c >= 'a' && c <= 'f')
{
return c-'A'+10;
}
throw new ArgumentOutOfRangeException("Invalid hex digit: " + c);
}
}
No, doing this would not shrink it at all. Quite the reverse - you'd end up with a lot more text! However, you could compress the binary form. In terms of representing arbitrary binary data as text, Base64 is more efficient than plain hex. Use Convert.ToBase64String and Convert.FromBase64String for the conversions.
public string ConvertToHex(string asciiString)
{
string hex = "";
foreach (char c in asciiString)
{
int tmp = c;
hex += String.Format("{0:x2}", (uint)System.Convert.ToUInt32(tmp.ToString()));
}
return hex;
}
While I can't help much on the C# implementation, I would highly recommend LZW as a simple-to-implement data compression algorithm for you to use.
Perhaps the answer can be more quickly reached if we ask: what are you really trying to do? Converting an ordinary string to a string of a hex representation seems like the wrong approach to anything, unless you are making a hexidecimal/encoding tutorial for the web.
static byte[] HexToBinary(string s) {
byte[] b = new byte[s.Length / 2];
for (int i = 0; i < b.Length; i++)
b[i] = Convert.ToByte(s.Substring(i * 2, 2), 16);
return b;
}
static string BinaryToHex(byte[] b) {
StringBuilder sb = new StringBuilder(b.Length * 2);
for (int i = 0; i < b.Length; i++)
sb.Append(Convert.ToString(256 + b[i], 16).Substring(1, 2));
return sb.ToString();
}

Categories