Converting a paragraph to hex notation, then back to string - c#

How would you convert a paragraph to hex notation, and then back again into its original string form?
(C#)
A side note: would putting the string into hex format shrink it the most without getting into hardcore compression algorithms?

What exactly do you mean by "hex notation"? That usually refers to encoding binary data, not text. You'd need to encode the text somehow (e.g. using UTF-8) and then encode the binary data as text by converting each byte to a pair of characters.
using System;
using System.Text;
public class Hex
{
static void Main()
{
string original = "The quick brown fox jumps over the lazy dog.";
byte[] binary = Encoding.UTF8.GetBytes(original);
string hex = BytesToHex(binary);
Console.WriteLine("Hex: {0}", hex);
byte[] backToBinary = HexToBytes(hex);
string restored = Encoding.UTF8.GetString(backToBinary);
Console.WriteLine("Restored: {0}", restored);
}
private static readonly char[] HexChars = "0123456789ABCDEF".ToCharArray();
public static string BytesToHex(byte[] data)
{
StringBuilder builder = new StringBuilder(data.Length*2);
foreach(byte b in data)
{
builder.Append(HexChars[b >> 4]);
builder.Append(HexChars[b & 0xf]);
}
return builder.ToString();
}
public static byte[] HexToBytes(string text)
{
if ((text.Length & 1) != 0)
{
throw new ArgumentException("Invalid hex: odd length");
}
byte[] ret = new byte[text.Length/2];
for (int i=0; i < text.Length; i += 2)
{
ret[i/2] = (byte)(ParseNybble(text[i]) << 4 | ParseNybble(text[i+1]));
}
return ret;
}
private static int ParseNybble(char c)
{
if (c >= '0' && c <= '9')
{
return c-'0';
}
if (c >= 'A' && c <= 'F')
{
return c-'A'+10;
}
if (c >= 'a' && c <= 'f')
{
return c-'a'+10;
}
throw new ArgumentException("Invalid hex digit: " + c);
}
}
No, doing this would not shrink it at all. Quite the reverse - you'd end up with a lot more text! However, you could compress the binary form. In terms of representing arbitrary binary data as text, Base64 is more efficient than plain hex. Use Convert.ToBase64String and Convert.FromBase64String for the conversions.
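To make the size difference concrete, here is a minimal sketch that encodes the same UTF-8 bytes both ways. The class and method names are made up for illustration; only `BitConverter.ToString`, `Convert.ToBase64String` and `Convert.FromBase64String` are framework calls.

```csharp
using System;
using System.Text;

public static class EncodingSizeDemo
{
    // Hex form of the UTF-8 bytes: always 2 characters per byte.
    public static string ToHex(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        // BitConverter.ToString yields "54-68-65..."; drop the dashes.
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    // Base64 form of the UTF-8 bytes: 4 characters per 3 bytes, padded with '='.
    public static string ToBase64(string text)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
    }

    // Reverse of ToBase64: Base64 text -> bytes -> string decoded as UTF-8.
    public static string FromBase64(string base64)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}
```

For the 44-character sample sentence used above, the hex form is 88 characters long and the Base64 form is 60, so both are longer than the original text.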

public string ConvertToHex(string asciiString)
{
string hex = "";
foreach (char c in asciiString)
{
int tmp = c;
hex += String.Format("{0:x2}", (uint)c);
}
return hex;
}

While I can't help much on the C# implementation, I would highly recommend LZW as a simple-to-implement data compression algorithm for you to use.
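If writing LZW by hand is more than you need, .NET ships a compressor out of the box. The sketch below uses `GZipStream`, which is DEFLATE-based and therefore a different algorithm from LZW; it is shown only as a readily available alternative, and the class and method names are invented for the example.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class TextCompression
{
    // Encode the text as UTF-8, then gzip the bytes.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // The GZipStream must be disposed before reading the result,
            // otherwise the final block is not flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Gunzip the bytes and decode them as UTF-8.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

Very short strings can come out larger than the input because of the gzip header, so this only pays off for longer or repetitive text.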

Perhaps the answer can be reached more quickly if we ask: what are you really trying to do? Converting an ordinary string to a string of its hex representation seems like the wrong approach to anything, unless you are making a hexadecimal/encoding tutorial for the web.

static byte[] HexToBinary(string s) {
byte[] b = new byte[s.Length / 2];
for (int i = 0; i < b.Length; i++)
b[i] = Convert.ToByte(s.Substring(i * 2, 2), 16);
return b;
}
static string BinaryToHex(byte[] b) {
StringBuilder sb = new StringBuilder(b.Length * 2);
for (int i = 0; i < b.Length; i++)
sb.Append(Convert.ToString(256 + b[i], 16).Substring(1, 2));
return sb.ToString();
}
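On newer runtimes the hand-written helpers may not be needed at all. The following sketch assumes .NET 5 or later, where `Convert.ToHexString` and `Convert.FromHexString` are available; the wrapper class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class BuiltInHex
{
    // Text -> UTF-8 bytes -> uppercase hex, using the framework converter.
    public static string TextToHex(string text)
    {
        return Convert.ToHexString(Encoding.UTF8.GetBytes(text));
    }

    // Hex -> bytes -> text decoded as UTF-8.
    public static string HexToText(string hex)
    {
        return Encoding.UTF8.GetString(Convert.FromHexString(hex));
    }
}
```

On older frameworks these two calls do not exist, so one of the loops above is still the way to go there.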

Related

HTML hex to polish characters

I'm downloading HTML file with polish characters, and parsing it to string by:
public static string HexToString(string hex)
{
var sb = new StringBuilder();
for (int i = 0; i < hex.Length; i += 2)
{
string hexdec = hex.Substring(i, 2);
int number = int.Parse(hexdec, NumberStyles.HexNumber);
char charToAdd = (char)number;
sb.Append(charToAdd);
}
return sb.ToString();
}
So when I find %21, I send 21 to HexToString() and get ! back, which is fine. But the character ą is represented as %C4%85, which comes out as Ä, and I want to get ą.
The problem here is that you are treating each hex code as if it were a UTF-16 code unit (which is the native format for char), but the codes are in fact UTF-8 bytes.
This is easy to resolve using a UTF-8 encoding.
First, let's write a handy StringToByteArray() method:
public static byte[] StringToByteArray(string hex)
{
return Enumerable.Range(0, hex.Length)
.Where(x => x%2 == 0)
.Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
.ToArray();
}
Now you can convert the hex string to text like so:
string hexStr = "C485"; // Or whatever your input hex string is.
var bytes = StringToByteArray(hexStr);
string text = Encoding.UTF8.GetString(bytes);
// ...use text
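If the input still contains the percent signs, it may be simpler to let the framework decode the whole string. This is a minimal sketch built on `Uri.UnescapeDataString`, which interprets the escaped bytes as UTF-8; the wrapper name is made up for the example.

```csharp
using System;

public static class PercentDecoding
{
    // Decodes percent-encoded text such as "%C4%85", treating the
    // escaped bytes as UTF-8 rather than as individual characters.
    public static string Decode(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }
}
```

With this, `PercentDecoding.Decode("%C4%85")` gives ą and `PercentDecoding.Decode("%21")` gives !, without stripping the % characters first.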
Matthew is right, but you can also use this:
public static string ConvertHexToString(string HexValue)
{
var res = "";
var replacedHex = HexValue.Replace("%", String.Empty);
while (replacedHex.Length > 0)
{
res += System.Convert.ToChar(System.Convert.ToUInt32(replacedHex.Substring(0, 2), 16)).ToString();
replacedHex = replacedHex.Substring(2, replacedHex.Length - 2);
}
return res;
}

Problems with converting char array to string

I have a function in a small application that I'm writing to break a recycled one-time pad cypher. Having used VB.NET for most of my career I thought it would be interesting to implement the app in C#. However, I have encountered a problem due to my present unfamiliarity with C#.
The function takes in two strings (of binary digits), converts these strings to char arrays, and then performs an XOR on them and places the result in a third char array.
This is fine until I try to convert the third char array to a string. Instead of the string looking like "11001101" etc, I get the following result: " \0\0 \0 " i.e. the "1"s are being represented by spaces and the "0"s by "\0".
My code is as follows:
public string calcXor(string a, string b)
{
char[] charAArray = a.ToCharArray();
char[] charBArray = b.ToCharArray();
int len = 0;
// Set length to be the length of the shorter string
if (a.Length > b.Length)
len = b.Length - 1;
else
len = a.Length - 1;
char[] result = new char[len];
for (int i = 0; i < len; i++)
{
result[i] = (char)(charAArray[i] ^ charBArray[i]);
}
return new string(result);
}
Your problem is in the line
result[i] = (char)(charAArray[i] ^ charBArray[i]);
that should be
// (Char) 1 is not '1'!
result[i] = (char)((charAArray[i] ^ charBArray[i]) + '0');
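Put together, a corrected version of the whole routine could look like the sketch below. It assumes the inputs contain only '0' and '1' characters and that the result should cover the full length of the shorter input; both are assumptions about the intended behaviour, and the class name is invented.

```csharp
using System;

public static class BitStringXor
{
    public static string Xor(string a, string b)
    {
        // Only compare as many digits as the shorter string has.
        int len = Math.Min(a.Length, b.Length);
        char[] result = new char[len];
        for (int i = 0; i < len; i++)
        {
            // '0' ^ '1' is 1 and equal digits give 0;
            // adding '0' turns that number back into a digit character.
            result[i] = (char)((a[i] ^ b[i]) + '0');
        }
        return new string(result);
    }
}
```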
More compact solution is to use StringBuilder, not arrays:
public string calcXor(String a, String b) {
int len = (a.Length < b.Length) ? a.Length : b.Length;
StringBuilder Sb = new StringBuilder();
for (int i = 0; i < len; ++i)
// Sb.Append(CharToBinary(a[i] ^ b[i])); // <- If you want 0's and 1's
Sb.Append(a[i] ^ b[i]); // <- Just int, not in binary format as in your solution
return Sb.ToString();
}
public static String CharToBinary(int value, Boolean useUnicode = false) {
int size = useUnicode ? 16 : 8;
StringBuilder Sb = new StringBuilder(size);
Sb.Length = size;
for (int i = size - 1; i >= 0; --i) {
Sb[i] = value % 2 == 0 ? '0' : '1';
value /= 2;
}
return Sb.ToString();
}
Your solution just computes the XORs (e.g. "65") and puts them in a line (e.g. 65728...); if you want a representation in 0's and 1's, you should use formatting.
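One way to do that formatting without a hand-written loop is `Convert.ToString` with base 2, padded to a fixed width. This is only a sketch; the class name, method name and default width of 8 are choices made for the example.

```csharp
using System;

public static class BinaryFormatting
{
    // Formats a non-negative value as a string of 0's and 1's,
    // left-padded with zeros to at least the requested width.
    public static string ToBinary(int value, int width = 8)
    {
        return Convert.ToString(value, 2).PadLeft(width, '0');
    }
}
```

For example, `BinaryFormatting.ToBinary(65)` yields "01000001".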
Have a look at the ASCII Table. 0 is the Null character \0. You could try ToString()
Have you tried using binary / byte[]? It seems like the fastest way to me.
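public string calcXor(string a, string b)
{
// String to binary
byte[] ab = ConvertToBinary(a);
byte[] bb = ConvertToBinary(b);
// XOR byte by byte, up to the length of the shorter input
int len = Math.Min(ab.Length, bb.Length);
char[] cb = new char[len];
for (int i = 0; i < len; i++)
{
// Adding '0' turns the numeric result (0 or 1) back into a digit character
cb[i] = (char)((ab[i] ^ bb[i]) + '0');
}
return new string(cb);
}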
public static byte[] ConvertToBinary(string str)
{
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
return encoding.GetBytes(str);
}
I just wanted to add that the solution I eventually chose is as follows:
//Parameter binary is a bit string
public void someroutine(String binary)
{
var data = GetBytesFromBinaryString(binary);
var text = Encoding.ASCII.GetString(data);
}
public Byte[] GetBytesFromBinaryString(String binary)
{
var list = new List<Byte>();
for (int i = 0; i < binary.Length; i += 8)
{
String t = binary.Substring(i, 8);
list.Add(Convert.ToByte(t, 2));
}
return list.ToArray();
}

Trying to reproduce PHP's pack("H*") function in C#

this is my code in C# :
public static String MD5Encrypt(String str, Boolean raw_output=false)
{
// Use input string to calculate MD5 hash
String output;
MD5 md5 = System.Security.Cryptography.MD5.Create();
byte[] inputBytes = System.Text.Encoding.ASCII.GetBytes(str);
byte[] hashBytes = md5.ComputeHash(inputBytes);
// Convert the byte array to hexadecimal string
StringBuilder sb = new StringBuilder();
for (int i = 0; i < hashBytes.Length; i++)
{
sb.Append(hashBytes[i].ToString("x2"));
}
output = sb.ToString();
if (raw_output)
{
output = pack(output);
}
return output;
}
public static String pack(String S)
{
string MultiByte = "";
for (int i = 0; i <= S.Length - 1; i += 2)
{
MultiByte += Convert.ToChar(HexToDec(S.Substring(i, 2)));
}
return MultiByte;
}
private static int HexToDec(String hex)
{
//Int32.Parse(hexString, System.Globalization.NumberStyles.HexNumber);
return Convert.ToInt32(hex, 16);
}
I want to reproduce what PHP does with:
md5($str, true);
or
pack('H*', md5( $str ));
I have tried many things, but for some input strings I can't get the same result on both sides.
For example, Trying this test on the string "8tv7er5j"
PHP Side :
9c36ad446f83ca38619e12d9e1b3c39e <= md5("8tv7er5j");
œ6­DoƒÊ8ažÙá³Ãž <= md5("8tv7er5j", true) or pack("H*", md5("8tv7er5j"))
C# Side :
9c36ad446f83ca38619e12d9e1b3c39e <= MD5Encrypt("8tv7er5j")
6­DoÊ8aÙá³Ã <= MD5Encrypt("8tv7er5j", true) or pack( MD5Encrypt("8tv7er5j") )
Why ? Encoding problem ?
EDIT 1 :
I get the right result, but badly encoded, with this function for pack():
if ((hex.Length % 2) == 1) hex += '0';
byte[] bytes = new byte[hex.Length / 2];
for (int i = 0; i < hex.Length; i += 2)
{
bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
}
return bytes;
So, System.Text.Encoding.UTF8.GetString(bytes) give me :
�6�Do��8a���Þ
And System.Text.Encoding.ASCII.GetString(bytes)
?6?Do??8a??????
...
I ran into the same situation, where I needed PHP's pack, unpack and md5 functions in C#. The important part was that the output of all three functions had to match PHP's.
I wrote my own functions and then verified my output against the functions at onlinephpfunctions.com. The output matched when I used Encoding.Default. FYI, I checked my application's encoding (Encoding.Default.ToString()) and it was System.Text.SBCSCodePageEncoding.
Pack
private static string pack(string input)
{
//only for H32 & H*
return Encoding.Default.GetString(FromHex(input));
}
public static byte[] FromHex(string hex)
{
hex = hex.Replace("-", "");
byte[] raw = new byte[hex.Length / 2];
for (int i = 0; i < raw.Length; i++)
{
raw[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}
return raw;
}
MD5
private static string md5(string input)
{
byte[] asciiBytes = Encoding.Default.GetBytes(input);
byte[] hashedBytes = MD5CryptoServiceProvider.Create().ComputeHash(asciiBytes);
string hashedString = BitConverter.ToString(hashedBytes).Replace("-", "").ToLower();
return hashedString;
}
Unpack
private static string unpack(string p1, string input)
{
StringBuilder output = new StringBuilder();
for (int i = 0; i < input.Length; i++)
{
string a = Convert.ToInt32(input[i]).ToString("X");
output.Append(a);
}
return output.ToString();
}
PS: User can enhance these functions with other formats
I guess that PHP defaults to Latin1 so the code should look like :
public static String PhpMd5Raw(string str)
{
var md5 = System.Security.Cryptography.MD5.Create();
var inputBytes = System.Text.Encoding.ASCII.GetBytes(str);
var hashBytes = md5.ComputeHash(inputBytes);
var latin1Encoding = System.Text.Encoding.GetEncoding("ISO-8859-1");
return latin1Encoding.GetString(hashBytes);
}
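The reason Latin-1 behaves differently from UTF-8 or ASCII here is that it maps every byte value 0 to 255 to exactly one character, so arbitrary hash bytes survive a trip through a string. The sketch below illustrates that property; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class Latin1RoundTrip
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Each byte becomes the character with the same code point.
    public static string BytesToString(byte[] bytes)
    {
        return Latin1.GetString(bytes);
    }

    // Each character (code point 0 to 255) becomes the byte with the same value.
    public static byte[] StringToBytes(string text)
    {
        return Latin1.GetBytes(text);
    }
}
```

Decoding the same bytes as UTF-8 or ASCII replaces anything that is not valid in that encoding, which is where the � and ? characters in the question come from.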
If you are going to feed the result as a key for HMAC-SHA1 hashing keep it as bytes[] and initialize the HMACSHA1 with the return value of this function: DO NOT convert it to a string and back to bytes, I have spent hours because of this mistake.
public static byte[] PackH(string hex)
{
if ((hex.Length % 2) == 1) hex += '0';
byte[] bytes = new byte[hex.Length / 2];
for (int i = 0; i < hex.Length; i += 2)
{
bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
}
return bytes;
}
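To illustrate the advice about keeping the key as bytes, here is a rough sketch in which the raw MD5 output goes straight into `HMACSHA1` without ever becoming a string. The class and method names are invented, and the use of UTF-8 for the input text is an assumption.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class RawMd5Key
{
    // Equivalent in spirit to PHP's md5($str, true): 16 raw bytes, not hex text.
    public static byte[] Md5Raw(string input)
    {
        using (var md5 = MD5.Create())
        {
            return md5.ComputeHash(Encoding.UTF8.GetBytes(input));
        }
    }

    // The raw hash is passed to HMACSHA1 as a byte[] key, with no string step in between.
    public static byte[] Sign(string keySource, string message)
    {
        using (var hmac = new HMACSHA1(Md5Raw(keySource)))
        {
            return hmac.ComputeHash(Encoding.UTF8.GetBytes(message));
        }
    }

    // Lowercase hex, only for display or comparison.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```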
I know this is an old question. I am posting my answer for anyone who might reach this page searching for it.
The following code is the full conversion of the Perl function pack("H*") to C#.
public static String Pack(String input)
{
input = input.Replace("-", ""); // strip separators so only hex digits remain
byte[] hashBytes = new byte[input.Length / 2];
for (int i = 0; i < hashBytes.Length; i++)
{
hashBytes[i] = Convert.ToByte(input.Substring(i * 2, 2), 16);
}
return Encoding.UTF7.GetString(hashBytes); // for perl/php
}
I'm sorry, I didn't follow the question completely. But if the PHP code is as below,
$testpack = pack("H*" , "you value");
and you can't read the $testpack value (because it is not in a printable format), then first apply base64_encode as below and echo it.
echo base64_encode($testpack);
Then use Risky Pathak's answer. To complete this answer, I'll post his code with some small modifications, such as Base64 encoding.
var hex = "you value";
hex = hex.Replace("-", "");
byte[] raw = new byte[hex.Length / 2];
for (int i = 0; i < raw.Length; i++)
{
raw[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}
var res = Convert.ToBase64String(raw);
Console.WriteLine(res);
Now if you compare the two values, they should be identical.
And all credit should go to the Risky Pathak answer.
The same in c# can be reached with Hex.Decode() method.
And bin2hex() in php is Hex.Encode().

Dotnet Hex string to Java

Have a problem, much like this post: How to read a .NET Guid into a Java UUID.
Except, from a remote svc I get a hex str formatted like this: ABCDEFGH-IJKL-MNOP-QRST-123456.
I need to match the GUID.ToByteArray() generated .net byte array GH-EF-CD-AB-KL-IJ-OP-MN- QR- ST-12-34-56 in Java for hashing purposes.
I'm kinda at a loss as to how to parse this. Do I cut off the QRST-123456 part and perhaps use something like the Commons IO EndianUtils on the other part, then stitch the 2 arrays back together as well? Seems way too complicated.
I can rearrange the string, but I shouldn't have to do any of these. Mr. Google doesn't wanna help me neither..
BTW, what is the logic in Little Endian land that keeps those last 6 char unchanged?
Yes, for reference, here's what I've done {sorry for 'answer', but had trouble formatting it properly in comment}:
String s = "3C0EA2F3-B3A0-8FB0-23F0-9F36DEAA3F7E";
String[] splitz = s.split("-");
String rebuilt = "";
for (int i = 0; i < 3; i++) {
// Split into 2 char chunks. '..' = nbr of chars in chunks
String[] parts = splitz[i].split("(?<=\\G..)");
for (int k = parts.length -1; k >=0; k--) {
rebuilt += parts[k];
}
}
rebuilt += splitz[3]+splitz[4];
I know, it's hacky, but it'll do for testing.
Make it into a byte[] and skip the first 3 bytes:
package guid;
import java.util.Arrays;
public class GuidConvert {
static byte[] convertUuidToBytes(String guid) {
String hexdigits = guid.replaceAll("-", "");
byte[] bytes = new byte[hexdigits.length()/2];
for (int i = 0; i < bytes.length; i++) {
int x = Integer.parseInt(hexdigits.substring(i*2, (i+1)*2), 16);
bytes[i] = (byte) x;
}
return bytes;
}
static String bytesToHexString(byte[] bytes) {
StringBuilder buf = new StringBuilder();
for (byte b : bytes) {
int i = b >= 0 ? b : (int) b + 256;
buf.append(Integer.toHexString(i / 16));
buf.append(Integer.toHexString(i % 16));
}
return buf.toString();
}
public static void main(String[] args) {
String guid = "3C0EA2F3-B3A0-8FB0-23F0-9F36DEAA3F7E";
byte[] bytes = convertUuidToBytes(guid);
System.err.println("GUID = "+ guid);
System.err.println("bytes = "+ bytesToHexString(bytes));
byte[] tail = Arrays.copyOfRange(bytes, 3, bytes.length);
System.err.println("tail = "+ bytesToHexString(tail));
}
}
The last two groups (8 bytes in total) are not reversed because they are stored as an array of bytes. The first three groups are reversed because they are a four-byte integer followed by two two-byte integers.
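The layout can be checked from the .NET side. The sketch below (in C#, with an invented class name) parses the hex digits in textual order and then reverses only the first three fields, which should give the same bytes as `Guid.ToByteArray()`; the same three reversals can be ported to Java.

```csharp
using System;

public static class GuidLayout
{
    // Rebuilds the byte order produced by Guid.ToByteArray() from the text form.
    public static byte[] ToDotNetBytes(string guidText)
    {
        string hex = guidText.Replace("-", "");
        byte[] bytes = new byte[16];
        for (int i = 0; i < 16; i++)
        {
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        Array.Reverse(bytes, 0, 4); // four-byte integer, stored little-endian
        Array.Reverse(bytes, 4, 2); // first two-byte integer
        Array.Reverse(bytes, 6, 2); // second two-byte integer
        // The remaining 8 bytes are a plain byte array and keep their order.
        return bytes;
    }
}
```

For "3C0EA2F3-B3A0-8FB0-23F0-9F36DEAA3F7E" this produces F3-A2-0E-3C-A0-B3-B0-8F-23-F0-9F-36-DE-AA-3F-7E.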

How to convert a string of bits to byte array

I have a string representing bits, such as:
"0000101000010000"
I want to convert it to get an array of bytes such as:
{0x0A, 0x10}
The number of bytes is variable but there will always be padding to form 8 bits per byte (so 1010 becomes 00001010).
Use the builtin Convert.ToByte() and read in chunks of 8 chars without reinventing the thing..
Unless this is something that should teach you about bitwise operations.
Update:
Stealing from Adam (and overusing LINQ, probably. This might be too concise and a normal loop might be better, depending on your own (and your coworker's!) preferences):
public static byte[] GetBytes(string bitString) {
return Enumerable.Range(0, bitString.Length/8).
Select(pos => Convert.ToByte(
bitString.Substring(pos*8, 8),
2)
).ToArray();
}
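A plain-loop variant of the same idea is sketched below. Unlike the LINQ version, it left-pads inputs whose length is not a multiple of 8 and rejects characters other than 0 and 1; both behaviours are assumptions about what is wanted, and the class name is made up.

```csharp
using System;

public static class BitStringParser
{
    public static byte[] Parse(string bits)
    {
        foreach (char c in bits)
        {
            if (c != '0' && c != '1')
            {
                throw new FormatException("Bit strings may only contain '0' and '1'.");
            }
        }
        // Round the length up to a whole number of bytes and pad on the left.
        int paddedLength = (bits.Length + 7) / 8 * 8;
        string padded = bits.PadLeft(paddedLength, '0');
        byte[] result = new byte[paddedLength / 8];
        for (int i = 0; i < result.Length; i++)
        {
            // Base 2 tells Convert.ToByte to read the chunk as binary digits.
            result[i] = Convert.ToByte(padded.Substring(i * 8, 8), 2);
        }
        return result;
    }
}
```

With this, "0000101000010000" becomes {0x0A, 0x10} and the short input "1010" becomes {0x0A}.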
public static byte[] GetBytes(string bitString)
{
byte[] output = new byte[bitString.Length / 8];
for (int i = 0; i < output.Length; i++)
{
for (int b = 0; b <= 7; b++)
{
output[i] |= (byte)((bitString[i * 8 + b] == '1' ? 1 : 0) << (7 - b));
}
}
return output;
}
Here's a quick and straightforward solution (and I think it will meet all your requirements): http://vbktech.wordpress.com/2011/07/08/c-net-converting-a-string-of-bits-to-a-byte-array/
This should get you to your answer: How can I convert bits to bytes?
You could just convert your string into an array like that article has, and from there use the same logic to perform the conversion.
Get the characters in groups of eight, and parse each group to a byte:
string bits = "0000101000010000";
byte[] data =
Regex.Matches(bits, ".{8}").Cast<Match>()
.Select(m => Convert.ToByte(m.Groups[0].Value, 2))
.ToArray();
private static byte[] GetBytes(string bitString)
{
byte[] result = Enumerable.Range(0, bitString.Length / 8).
Select(pos => Convert.ToByte(
bitString.Substring(pos * 8, 8),
2)
).ToArray();
List<byte> mahByteArray = new List<byte>();
for (int i = result.Length - 1; i >= 0; i--)
{
mahByteArray.Add(result[i]);
}
return mahByteArray.ToArray();
}
private static String ToBitString(BitArray bits)
{
var sb = new StringBuilder();
for (int i = bits.Count - 1; i >= 0; i--)
{
char c = bits[i] ? '1' : '0';
sb.Append(c);
}
return sb.ToString();
}
You can use either of the following:
byte[] bytes = System.Text.Encoding.UTF8.GetBytes("Hi");
string str = System.Text.Encoding.UTF8.GetString(bytes);
// FromBase64String requires valid Base64 input; "SGVsbG8h" is the Base64 form of "Hello!"
byte[] bytesNew = System.Convert.FromBase64String("SGVsbG8h");
string strNew = System.Convert.ToBase64String(bytesNew);
