I'm looking to make a Caesar cipher that covers the common printable ASCII characters (character codes 32-126).
My current code:
private static char Cipher(char ch, int key)
{
if (!char.IsLetter(ch))
return ch;
char offset = char.IsUpper(ch) ? 'A' : 'a';
return (char)((((ch + key) - offset) % 26) + offset);
}
public static string Encipher(string input, int key)
{
string output = string.Empty;
foreach (char ch in input)
output += Cipher(ch, key);
return output;
}
public static string Decipher(string input, int key) {return Encipher(input, 26 - key);}
(source: https://www.programmingalgorithms.com/algorithm/caesar-cipher/)
I assume I need to at least change
if (!char.IsLetter(ch)) *and* return Encipher(input, 26 - key);
to
if (char.IsControl(ch)) *and* return Encipher(input, 94 - key);
and change the modulo from 26 to 94(?), but what else needs to be done? I assume the random number generator (this is for a one-time pad implementation) needs to change as well, to 0-93 (or maybe 95?). However, testing this gave me errors, and deciphering did not reproduce the input. Maybe I need an IsLetter check as well, so the IsUpper check doesn't fail for non-letters. What else am I missing?
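private static char Cipher(char ch, int key)
{
// Only shift printable ASCII (32..126). char.IsControl alone would let
// characters above '~' through and map them into the wrong range.
if (ch < ' ' || ch > '~')
return ch;
char offset = ' ';
// 126 - 32 + 1 = 95 printable characters, hence modulo 95 (assumes 0 <= key <= 95).
return (char)((((ch + key) - offset) % 95) + offset);
}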
public static string Encipher(string input, int key)
{
string output = string.Empty;
foreach (char ch in input)
output += Cipher(ch, key);
return output;
}
public static string Decipher(string input, int key)
{
return Encipher(input, 95 - key);
}
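As a hedged variation on the code above, here is a minimal sketch that also accepts negative or large keys by normalising the result of `%`, so deciphering is simply enciphering with `-key`. The names `PrintableCaesar` and `Shift` are illustrative and do not appear in the original code.

```csharp
using System;
using System.Text;

public static class PrintableCaesar
{
    private const int First = 32;               // ' '
    private const int Last = 126;               // '~'
    private const int Size = Last - First + 1;  // 95 symbols

    private static char Shift(char ch, int key)
    {
        // Leave anything outside the printable ASCII range untouched.
        if (ch < First || ch > Last)
            return ch;
        // C# % can return a negative value, so normalise into 0..Size-1.
        int index = ((ch - First + key) % Size + Size) % Size;
        return (char)(First + index);
    }

    public static string Encipher(string input, int key)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char ch in input)
            sb.Append(Shift(ch, key));
        return sb.ToString();
    }

    // Shifting back by the same amount undoes the cipher.
    public static string Decipher(string input, int key) => Encipher(input, -key);
}
```

For a one-time pad, each per-character key would be drawn from 0..94; with `System.Random` that is `Next(0, 95)`, because the upper bound is exclusive.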
I have the following function that hashes a string:
public static uint hashString(string myString)
{
uint hash = 0;
foreach (char c in myString)
{
hash *= 0x1F;
hash += c;
}
return hash;
}
So if I want to hash hello it will produce 99162322.
Is it possible to write a reverse function that takes in a number and spits out the string (given that the string result is unknown)?
Since you aren't using a cryptographic hash, your implementation is easy to reverse (i.e. to return some string that has the given hash value).
Code:
public static uint hashString(string myString) {
//DONE: validate public methods' parameters
if (null == myString)
return 0;
uint hash = 0;
//DONE: hash function must never throw exceptions
unchecked {
foreach (char c in myString) {
hash *= 0x1F;
hash += c;
}
}
return hash;
}
private static string HashReverse(uint value) {
StringBuilder sb = new StringBuilder();
for (; value > 0; value /= 31)
sb.Append((char)(value % 31));
return string.Concat(sb.ToString().Reverse());
}
Demo: (given a hash we produce a string and compute hash from it to check)
uint[] tests = new uint[] {
99162322,
123,
456
};
// Since the string can contain control characters, let's provide its Dump
string Dump(string value) => string.Join(" ", value.Select(c =>((int) c).ToString("x4")));
string report = string.Join(Environment.NewLine, tests
.Select(test => new {
test,
reversed = HashReverse(test)
})
.Select(item => $"{item.test,9} :: {Dump(item.reversed),-30} :: {hashString(item.reversed),9}"));
Console.WriteLine(report);
Outcome:
99162322 :: 0003 000e 000b 0012 0012 0012 :: 99162322
123 :: 0003 001e :: 123
456 :: 000e 0016 :: 456
Please note that many strings produce the same hash value (say, "hello" and my "\u0003\u000e\u000b\u0012\u0012\u0012").
No.
One of the fundamental points of hashing is that it's irreversible.
There are many strings that will produce the hash 99162322, so while it might be possible to find all of them (given a maximum string length), there would be no way to determine which one was 'correct'.
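To make the collision point concrete, here is a small sketch that reuses the question's algorithm (multiply by 31, add the character) under the illustrative name `HashCollisionDemo.HashString`. Two different two-letter strings end up with the same value because 65*31 + 97 and 66*31 + 66 are both 2112.

```csharp
using System;

public static class HashCollisionDemo
{
    // Same algorithm as hashString in the question: multiply by 31, add the char.
    public static uint HashString(string s)
    {
        uint hash = 0;
        unchecked
        {
            foreach (char c in s)
            {
                hash *= 31;
                hash += c;
            }
        }
        return hash;
    }
}
```

`HashCollisionDemo.HashString("Aa")` and `HashCollisionDemo.HashString("BB")` both return 2112, so nothing in the number itself says which input was used.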
How can I convert this string:
This string contains the Unicode character Pi(π)
into an escaped ASCII string:
This string contains the Unicode character Pi(\u03c0)
and vice versa?
The current Encoding available in C# converts the π character to "?". I need to preserve that character.
This goes back and forth to and from the \uXXXX format.
class Program {
static void Main( string[] args ) {
string unicodeString = "This function contains a unicode character pi (\u03c0)"; // U+03C0 is lowercase pi; U+03A0 is the capital letter
Console.WriteLine( unicodeString );
string encoded = EncodeNonAsciiCharacters(unicodeString);
Console.WriteLine( encoded );
string decoded = DecodeEncodedNonAsciiCharacters( encoded );
Console.WriteLine( decoded );
}
static string EncodeNonAsciiCharacters( string value ) {
StringBuilder sb = new StringBuilder();
foreach( char c in value ) {
if( c > 127 ) {
// This character is too big for ASCII
string encodedValue = "\\u" + ((int) c).ToString( "x4" );
sb.Append( encodedValue );
}
else {
sb.Append( c );
}
}
return sb.ToString();
}
static string DecodeEncodedNonAsciiCharacters( string value ) {
return Regex.Replace(
value,
@"\\u(?<Value>[a-zA-Z0-9]{4})",
m => {
return ((char) int.Parse( m.Groups["Value"].Value, NumberStyles.HexNumber )).ToString();
} );
}
}
Outputs:
This function contains a unicode character pi (π)
This function contains a unicode character pi (\u03c0)
This function contains a unicode character pi (π)
To unescape, you can simply use one of these functions:
System.Text.RegularExpressions.Regex.Unescape(string)
System.Uri.UnescapeDataString(string)
Regex.Unescape handles the \uXXXX format. For percent-encoded (%XX) input, I suggest this method (it works well with UTF-8):
UnescapeDataString(string)
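A small sketch of what the two calls do to sample input; the wrapper names `FromBackslashU` and `FromPercent` are made up for illustration. Be aware that `Regex.Unescape` also interprets other regex escapes such as `\t` and `\n`, and it may throw an `ArgumentException` on backslash sequences it does not recognise, so it is only a fit when the input is known to contain nothing else.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Turns \uXXXX sequences (and other regex escapes) back into characters.
    public static string FromBackslashU(string escaped) => Regex.Unescape(escaped);

    // Turns percent-encoded UTF-8 bytes (%XX) back into characters.
    public static string FromPercent(string escaped) => Uri.UnescapeDataString(escaped);
}
```

For example, `UnescapeDemo.FromBackslashU(@"Pi(\u03c0)")` yields `Pi(π)`, and `UnescapeDemo.FromPercent("%CF%80")` yields `π`.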
string StringFold(string input, Func<char, string> proc)
{
return string.Concat(input.Select(proc).ToArray());
}
string FoldProc(char input)
{
if (input >= 128)
{
return string.Format(@"\u{0:x4}", (int)input);
}
return input.ToString();
}
string EscapeToAscii(string input)
{
return StringFold(input, FoldProc);
}
As a one-liner:
var result = Regex.Replace(input, @"[^\x00-\x7F]", c =>
string.Format(@"\u{0:x4}", (int)c.Value[0]));
class Program
{
static void Main(string[] args)
{
char[] originalString = "This string contains the unicode character Pi(π)".ToCharArray();
StringBuilder asAscii = new StringBuilder(); // store final ascii string and Unicode points
foreach (char c in originalString)
{
// test if char is ascii, otherwise convert to Unicode Code Point
int cint = Convert.ToInt32(c);
if (cint <= 127 && cint >= 0)
asAscii.Append(c);
else
asAscii.Append(String.Format("\\u{0:x4} ", cint).Trim());
}
Console.WriteLine("Final string: {0}", asAscii);
Console.ReadKey();
}
}
All non-ASCII chars are converted to their Unicode Code Point representation and appended to the final string.
Here is my current implementation:
public static class UnicodeStringExtensions
{
public static string EncodeNonAsciiCharacters(this string value) {
var bytes = Encoding.Unicode.GetBytes(value);
var sb = StringBuilderCache.Acquire(value.Length);
bool encodedsomething = false;
for (int i = 0; i < bytes.Length; i += 2) {
var c = BitConverter.ToUInt16(bytes, i);
if ((c >= 0x20 && c <= 0x7f) || c == 0x0A || c == 0x0D) {
sb.Append((char) c);
} else {
sb.Append($"\\u{c:x4}");
encodedsomething = true;
}
}
if (!encodedsomething) {
StringBuilderCache.Release(sb);
return value;
}
return StringBuilderCache.GetStringAndRelease(sb);
}
public static string DecodeEncodedNonAsciiCharacters(this string value)
=> Regex.Replace(value,/*language=regexp*/@"(?:\\u[a-fA-F0-9]{4})+", Decode);
static readonly string[] Splitsequence = new [] { "\\u" };
private static string Decode(Match m) {
var bytes = m.Value.Split(Splitsequence, StringSplitOptions.RemoveEmptyEntries)
.Select(s => ushort.Parse(s, NumberStyles.HexNumber)).SelectMany(BitConverter.GetBytes).ToArray();
return Encoding.Unicode.GetString(bytes);
}
}
This passes a test:
public void TestBigUnicode() {
var s = "\U00020000";
var encoded = s.EncodeNonAsciiCharacters();
var decoded = encoded.DecodeEncodedNonAsciiCharacters();
Assert.AreEqual(s, decoded); // Assert.Equals is object.Equals and does not perform a test assertion
}
with the encoded value: "\ud840\udc00"
This implementation makes use of a StringBuilderCache (reference source link)
A small patch to @Adam Sills's answer, which avoids a FormatException when the input string looks like "c:\u00ab\otherdirectory\"; in addition, RegexOptions.Compiled makes repeated matching faster (at the cost of a slower one-time construction):
private static Regex DECODING_REGEX = new Regex(@"\\u(?<Value>[a-fA-F0-9]{4})", RegexOptions.Compiled);
private const string PLACEHOLDER = @"#!#";
public static string DecodeEncodedNonAsciiCharacters(this string value)
{
return DECODING_REGEX.Replace(
value.Replace(@"\\", PLACEHOLDER),
m => {
return ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString(); })
.Replace(PLACEHOLDER, @"\\");
}
To store actual Unicode code points, you have to first decode the String's UTF-16 code units to UTF-32 code units (which are currently the same as the Unicode code points). Use System.Text.Encoding.UTF32.GetBytes() for that, and then write the resulting bytes to the StringBuilder as needed, i.e.
static void Main(string[] args)
{
String originalString = "This string contains the unicode character Pi(π)";
Byte[] bytes = Encoding.UTF32.GetBytes(originalString);
StringBuilder asAscii = new StringBuilder();
for (int idx = 0; idx < bytes.Length; idx += 4)
{
uint codepoint = BitConverter.ToUInt32(bytes, idx);
if (codepoint <= 127)
asAscii.Append(Convert.ToChar(codepoint));
else
asAscii.AppendFormat("\\u{0:x4}", codepoint);
}
Console.WriteLine("Final string: {0}", asAscii);
Console.ReadKey();
}
You need to use the Convert() method in the Encoding class:
Create an Encoding object that represents ASCII encoding
Create an Encoding object that represents Unicode encoding
Call Encoding.Convert() with the source encoding, the destination encoding, and the string to be encoded
There is an example here:
using System;
using System.Text;
namespace ConvertExample
{
class ConvertExampleClass
{
static void Main()
{
string unicodeString = "This string contains the unicode character Pi(\u03a0)";
// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte[].
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
// This is a slightly different approach to converting to illustrate
// the use of GetCharCount/GetChars.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
// Display the strings created before and after the conversion.
Console.WriteLine("Original string: {0}", unicodeString);
Console.WriteLine("Ascii converted string: {0}", asciiString);
}
}
}
How do I encrypt passwords in such a way that it not only changes the characters of the password but also adds extra characters? For example, the password "ABC" would become "12345" instead of "123". Another option is for the key shift to be different for each character. Below is my code.
class CipherMachine
{
static private List<char> Charset =
new List<char>("PQOWIEURYTLAKSJDHFGMZNXBCV"
+ "olpikmujnyhbtgvrfcedxwszaq"
+ "1597362480"
+ "~!@#$%^&*()_+"
+ "PQOWIEURYTLAKSJDHFGMZNXBCV"
+ "olpikmujnyhbtgvrfcedxwszaq"
+ "1597362480"
+ "~!@#$%^&*()_+");
static private int Key = 39;
//static private int Length = 0;
static public string Encrypt(string plain)
{
string cipher = "";
foreach (char i in plain)
{
cipher += Charset.ElementAt(Charset.IndexOf(i) + Key);
//cipher += Charset.ElementAt(Charset.IndexOf(i) + Length);
}
return cipher;
}
static public string Decrypt(string cipher)
{
string plain = "";
foreach (char i in cipher)
{
plain += Charset.ElementAt(Charset.LastIndexOf(i) - Key);
//plain += Charset.ElementAt(Charset.LastIndexOf(i) - Length);
}
return plain;
}
}
}
Lines that are commented out are what I thought I could do but it turned out wrong.
You have made the string double length so that the + Key and - Key works, but you ought to have one string of all characters and then WRAP the index (so that if the index goes beyond the length of the string, it wraps back to the beginning). You can achieve this with the % modulus operator:
static private List<char> Charset =
new List<char>("PQOWIEURYTLAKSJDHFGMZNXBCV"
+ "olpikmujnyhbtgvrfcedxwszaq"
+ "1597362480"
+ "~!@#$%^&*()_+");
int length = Charset.Count();
// to encrypt
int key = 24;
char unencryptedChar = 'P';
int unencryptedIndex = Charset.IndexOf(unencryptedChar);
int encryptedIndex = (unencryptedIndex + key) % length;
char encryptedChar = Charset.ElementAt(encryptedIndex);
// to unencrypt (new names, because the variables above are already declared in this scope)
int cipherIndex = Charset.IndexOf(encryptedChar);
int plainIndex = (cipherIndex - key + length) % length;
char decryptedChar = Charset.ElementAt(plainIndex);
When you subtract the key in the second part, the index can go negative, and C#'s % operator returns a negative result for a negative left operand, so we add the length first (this only works as long as the key does not exceed the length).
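The wrapping idea can be sketched as a complete class. This is only an illustration under stated assumptions: the names `WrappedCipher`, `Mod` and `Shift` are invented here, and the per-position shift (`key + i`) is one hypothetical way to make each character use a different shift, as the question asks. The `Mod` helper normalises negative values, so the key may be larger than the character set.

```csharp
using System;
using System.Text;

public static class WrappedCipher
{
    // Single copy of the character set from the question (75 symbols).
    private const string Charset =
        "PQOWIEURYTLAKSJDHFGMZNXBCV" +
        "olpikmujnyhbtgvrfcedxwszaq" +
        "1597362480" +
        "~!@#$%^&*()_+";

    // Modulo that always lands in 0..modulus-1, even for negative values.
    private static int Mod(int value, int modulus) =>
        ((value % modulus) + modulus) % modulus;

    // direction is +1 to encrypt and -1 to decrypt.
    private static string Shift(string text, int key, int direction)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            int index = Charset.IndexOf(text[i]);
            if (index < 0)
            {
                sb.Append(text[i]); // not in the set: copy unchanged
                continue;
            }
            // Hypothetical per-position shift: key plus the character's position.
            int shift = direction * (key + i);
            sb.Append(Charset[Mod(index + shift, Charset.Length)]);
        }
        return sb.ToString();
    }

    public static string Encrypt(string plain, int key) => Shift(plain, key, +1);
    public static string Decrypt(string cipher, int key) => Shift(cipher, key, -1);
}
```

With this sketch, `WrappedCipher.Decrypt(WrappedCipher.Encrypt("ABC123!", 39), 39)` gives back `ABC123!`, and repeated letters in the input no longer map to the same output letter.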
I am trying to create a PHP function that will allow me access to a .NET single sign-on system, and I am hung up on finding a PHP equivalent to GetBytesFromUTF8. I have tried ord and mb_string to no avail. Any ideas for a PHP equivalent to the C# GetBytesFromUTF8?
//Function to Create the SSO
function SSO($key,$uid){
$lenth=32;
$aZ09 = array_merge(range('A', 'Z'), range('a', 'z'),range(0, 9));
$randphrase ='';
for($c=0;$c < $lenth;$c++) {
$randphrase .= $aZ09[mt_rand(0,count($aZ09)-1)];
}
//Append key onto phrase end
$randkey=$randphrase.$key;
//Number of Bytes is string (THIS IS THE PROBLEM, ITS JUST ADDING THE STRING LENGTH)
$bytevalue=mb_strlen($randkey, 'latin1');
// SHA512 Hash
//$toencode= utf8_encode($bytevalue);
$output = hash("sha512", $bytevalue);
//base 64 encode the hash
$sso = base64_encode($output);
$length = mb_strlen($sso);
$characters = 2;
$start = $length - $characters;
$last2 = substr($sso , $start ,$characters);
//$startitup = APIClient::Create('http://my.staging.dosespot.com/LoginSingleSignOn.aspx','SingleSignOnCode=$ssocode');
// Yes, Strip the extra ==
if($last2 == "=="){$ssocode = substr($sso,0,-2);}
// No, just pass the value to the next step
else{$ssocode=$sso;}
//Use first 22 characters of random.
$shortphrase=substr($randphrase,0,22);
//Append uid & key onto shortened phrase end
$uidv=$uid.$shortphrase.$key;
//Number of Bytes is string
$idbytevalue=mb_strlen($uidv, 'latin1');
//$idbytevalue= strBytes(utf8_encode($uidv));
// SHA512 Hash
$idencode= utf8_encode($idbytevalue);
$idoutput = hash("sha512", $idencode);
// Base64 Encode of hash
$idssoe = base64_encode($idoutput);
//Determine if we need to strip the trailing ==
$idlength = mb_strlen($idssoe);
$idcharacters = 2;
$idstart = $idlength - $idcharacters;
$idlast2 = substr($idssoe , $idstart ,$idcharacters);
if($idlast2 == "=="){$ssouidv = substr($idssoe,0,-2);}
// No, just pass the value to the next step
else{$ssouidv=$idssoe;}
return array($ssocode, $ssouidv);
}
I am trying to replicate this c#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace DoseSpot.EncryptionLibrary
{
public class EncodingUtility
{
public enum encodingOptions : int
{
ASCII = 0,
UTF7,
UTF8,
UTF32,
Unicode,
Base64String
}
public static string GetString(byte[] data, encodingOptions eo)
{
switch (eo)
{
case encodingOptions.ASCII:
return ToASCII(data);
case encodingOptions.Unicode:
return ToUnicode(data);
case encodingOptions.Base64String:
return ToBase64String(data);
case encodingOptions.UTF7:
return ToUTF7(data);
case encodingOptions.UTF32:
return ToUTF32(data);
case encodingOptions.UTF8:
default:
return ToUTF8(data);
}
}
public static byte[] GetBytes(string message, encodingOptions eo)
{
switch (eo)
{
case encodingOptions.ASCII:
return FromASCII(message);
case encodingOptions.Unicode:
return FromUnicode(message);
case encodingOptions.Base64String:
return FromBase64String(message);
case encodingOptions.UTF7:
return FromUTF7(message);
case encodingOptions.UTF32:
return FromUTF32(message);
case encodingOptions.UTF8:
default:
return FromUTF8(message);
}
}
protected static string ToBase64String(byte[] data)
{
return Convert.ToBase64String(data);
}
protected static string ToUnicode(byte[] data)
{
return unicode.GetString(data);
}
protected static string ToASCII(byte[] data)
{
return ascii.GetString(data);
}
protected static string ToUTF7(byte[] data)
{
return utf7.GetString(data);
}
protected static string ToUTF8(byte[] data)
{
return utf8.GetString(data);
}
protected static string ToUTF32(byte[] data)
{
return utf32.GetString(data);
}
protected static byte[] FromBase64String(string originalString)
{
return Convert.FromBase64String(originalString);
}
protected static byte[] FromUnicode(string originalString)
{
return unicode.GetBytes(originalString);
}
protected static byte[] FromASCII(string originalString)
{
return ascii.GetBytes(originalString);
}
protected static byte[] FromUTF7(string originalString)
{
return utf7.GetBytes(originalString);
}
protected static byte[] FromUTF8(string originalString)
{
return utf8.GetBytes(originalString);
}
protected static byte[] FromUTF32(string originalString)
{
return utf32.GetBytes(originalString);
}
public static Encoding getEncoding(encodingOptions eo)
{
switch (eo)
{
case encodingOptions.ASCII:
return ascii;
case encodingOptions.UTF7:
return utf7;
case encodingOptions.UTF8:
return utf8;
case encodingOptions.UTF32:
return utf32;
case encodingOptions.Unicode:
default:
return unicode;
}
}
private static ASCIIEncoding ascii = new ASCIIEncoding();
private static UTF8Encoding utf8 = new UTF8Encoding();
private static UTF7Encoding utf7 = new UTF7Encoding();
private static UTF32Encoding utf32 = new UTF32Encoding();
private static UnicodeEncoding unicode = new UnicodeEncoding();
}
}
public static class EncryptionCommon
{
public static int KeyLength = 32;
public static int PhraseLength = 32;
public static string CreatePhrase()
{
return Randomizer.RandomNumberOfLettersAll(PhraseLength);
}
public static string CreateKey()
{
return Randomizer.RandomNumberOfLetters(KeyLength);
}
public static string Encrypt(string Phrase, string MyKey)
{
byte[] data = EncodingUtility.GetBytes(Phrase + MyKey, EncodingUtility.encodingOptions.UTF8);
byte[] result = new SHA512Managed().ComputeHash(data);
string tempString = EncodingUtility.GetString(result, EncodingUtility.encodingOptions.Base64String);
if (tempString.Substring(tempString.Length - 2).ToString().Equals("=="))
tempString = tempString.Substring(0, tempString.Length - 2);
return tempString;
}
public static string EncryptUserId(string Phrase, int UserId, string MyKey)
{
string UserPhrase = UserId.ToString();
if (Phrase.Length > 22)
UserPhrase += Phrase.Substring(0, 22);
else
UserPhrase += Phrase;
return Encrypt(UserPhrase, MyKey);
}
public static bool VerifyKey(string key, string combinedPhraseAndEncryptedString)
{
Dictionary<string, string> myDict = SplitStringIntoPhraseAndHash(combinedPhraseAndEncryptedString);
string phrase = myDict["phrase"];
string providedEncryptedPhrase = myDict["encryptedString"];
string testEncryptedPhrase = Encrypt(phrase, key);
if (providedEncryptedPhrase.Equals(testEncryptedPhrase))
return true;
else
return false;
}
public static Dictionary<string, string> SplitStringIntoPhraseAndHash(string stringToSplit)
{
Dictionary<string, string> myResult = new Dictionary<string, string>();
if (stringToSplit != null && stringToSplit.Trim().Length >= PhraseLength)
{
string phraseFound = stringToSplit.Substring(0, PhraseLength);
string encryptedString = stringToSplit.Substring(PhraseLength);
myResult.Add("phrase", phraseFound);
myResult.Add("encryptedString", encryptedString);
}
return myResult;
}
public static string CreatePhraseEncryptedCombinedString(string phrase, string key)
{
string toReturn = phrase;
toReturn += Encrypt(phrase, key);
return toReturn;
}
}
I am trying to replicate this C# process in PHP to no avail.
HOW TO CREATE THE CORRECT SINGLESIGNONCODE:
1. You have been provided a key (in UTF-8)
2. Create a random phrase 32 characters long in UTF-8
a. Create32CharPhrase
3. Append the key to the phrase
a. Create32CharPhrase + Key
4. Get the value in Bytes from UTF-8 String
a. GetBytesFromUTF8(Create32CharPhrase + Key)
5. Use SHA512 to hash the byte value you just received
SHA512Hash(GetBytesFromUTF8(Create32CharPhrase + Key))
6. Get a Base64String out of the hash that you created
GetBase64String(SHA512Hash(GetBytesFromUTF8(Create32CharPhrase + Key)))
7. If there are two = signs at the end, then remove them.
RemoveExtraEqualsSigns(GetBase64String(SHA512Hash(GetBytesFromUTF8(Create32CharPhrase + Key))))
The Second part of the function...
HOW TO CREATE THE CORRECT SINGLESIGNONUSERIDVERIFY:
1. Grab the first 22 characters of the phrase from step 1
2. Append to the UserId string the 22 characters grabbed from step one
a. (UserId) + (first 22 characters of phrase)
3. Append the key to the string created in step 2
a. (UserId) + (first 22 characters of phrase) + key
4. Get the Byte value of the string
a. GetBytesFromUTF8((UserId) + (first 22 characters of phrase) + key)
5. Use SHA512 to hash the byte value you just received
a. SHA512Hash(GetBytesFromUTF8((UserId) + (first 22 characters of phrase) + key))
6. Get a Base64String out of the hash that you created
a. GetBase64String(SHA512Hash(GetBytesFromUTF8((UserId) + (first 22 characters of phrase) + key)))
7. If there are two = signs at the end, then remove them.
a. RemoveExtraEqualsSigns(GetBase64String(SHA512Hash(GetBytesFromUTF8((UserId) + (first 22 characters of phrase) + key))))
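The hashing steps in both lists boil down to the same pipeline, sketched here in C# under the illustrative names `SsoSketch` and `HashPhrase` (neither is part of the vendor library). The detail that matters when porting is that Base64 is applied to the 64 raw hash bytes, not to a hex string of the hash. Since 64 bytes always encode to 88 Base64 characters ending in `==`, the result after stripping is 86 characters long.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SsoSketch
{
    public static string HashPhrase(string phrase, string key)
    {
        // UTF-8 bytes of the concatenated phrase and key.
        byte[] data = Encoding.UTF8.GetBytes(phrase + key);
        using (SHA512 sha = SHA512.Create())
        {
            byte[] hash = sha.ComputeHash(data);          // 64 raw bytes
            string b64 = Convert.ToBase64String(hash);    // 88 characters
            // Remove the two padding characters if they are present.
            return b64.EndsWith("==", StringComparison.Ordinal)
                ? b64.Substring(0, b64.Length - 2)
                : b64;
        }
    }
}
```

For the user ID variant, the same method would be called with `userId + first22CharactersOfPhrase` as the phrase argument.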
Taken out of an edit to the original post
PHP SSO for ASP service
function SSO($key,$uid){
$lenth=32;
$aZ09 = array_merge(range('A', 'Z'), range('a', 'z'),range(0, 9));
$randphrase ='';
for($c=0;$c < $lenth;$c++) {
$randphrase .= $aZ09[mt_rand(0,count($aZ09)-1)];
}
//echo "Key: ".$key."<br/>";
//echo "Phrase: ".$randphrase."<br/>";
//Append key onto phrase end
$randkey=$randphrase.$key;
// SHA512 Hash
$toencode= utf8_encode($randkey);
// Pass 3rd, optional parameter as TRUE to output raw binary data
$output = hash("sha512", $toencode, true);
//base 64 encode the hash binary data
$sso = base64_encode($output);
$length = mb_strlen($sso);
$characters = 2;
$start = $length - $characters;
$last2 = substr($sso , $start ,$characters);
// Yes, Strip the extra ==
if($last2 == "==")
{$ssocode = substr($sso,0,-2);}
// No, just pass the value to the next step
else{$ssocode=$sso;}
// Prepend the random phrase to the encrypted code.
$ssocode = $randphrase.$ssocode;
//echo "SSO: ".$ssocode."<br/>";
//Use first 22 characters of random.
$shortphrase=substr($randphrase,0,22);
//Append uid & key onto shortened phrase end
$uidv=$uid.$shortphrase.$key;
// SHA512 Hash
$idencode= utf8_encode($uidv);
// Pass 3rd, optional parameter as TRUE to output raw binary data
$idoutput = hash("sha512", $idencode, true);
// Base64 Encode of hash binary data
$idssoe = base64_encode($idoutput);
//Determine if we need to strip the trailing ==
$idlength = mb_strlen($idssoe);
$idcharacters = 2;
$idstart = $idlength - $idcharacters;
$idlast2 = substr($idssoe , $idstart ,$idcharacters);
if($idlast2 == "==")
{$ssouidv = substr($idssoe,0,-2);}
// No, just pass the value to the next step
else{$ssouidv=$idssoe;}
//echo "SSOID: ".$ssouidv;
return array($ssocode, $ssouidv);
}
How would you convert a paragraph to hex notation, and then back again into its original string form?
(C#)
A side note: would putting the string into hex format shrink it the most without getting into hardcore compression algorithms?
What exactly do you mean by "hex notation"? That usually refers to encoding binary data, not text. You'd need to encode the text somehow (e.g. using UTF-8) and then encode the binary data as text by converting each byte to a pair of characters.
using System;
using System.Text;
public class Hex
{
static void Main()
{
string original = "The quick brown fox jumps over the lazy dog.";
byte[] binary = Encoding.UTF8.GetBytes(original);
string hex = BytesToHex(binary);
Console.WriteLine("Hex: {0}", hex);
byte[] backToBinary = HexToBytes(hex);
string restored = Encoding.UTF8.GetString(backToBinary);
Console.WriteLine("Restored: {0}", restored);
}
private static readonly char[] HexChars = "0123456789ABCDEF".ToCharArray();
public static string BytesToHex(byte[] data)
{
StringBuilder builder = new StringBuilder(data.Length*2);
foreach(byte b in data)
{
builder.Append(HexChars[b >> 4]);
builder.Append(HexChars[b & 0xf]);
}
return builder.ToString();
}
public static byte[] HexToBytes(string text)
{
if ((text.Length & 1) != 0)
{
throw new ArgumentException("Invalid hex: odd length");
}
byte[] ret = new byte[text.Length/2];
for (int i=0; i < text.Length; i += 2)
{
ret[i/2] = (byte)(ParseNybble(text[i]) << 4 | ParseNybble(text[i+1]));
}
return ret;
}
private static int ParseNybble(char c)
{
if (c >= '0' && c <= '9')
{
return c-'0';
}
if (c >= 'A' && c <= 'F')
{
return c-'A'+10;
}
if (c >= 'a' && c <= 'f')
{
// Lowercase digits must be offset from 'a', not 'A'.
return c-'a'+10;
}
throw new ArgumentException("Invalid hex digit: " + c);
}
}
No, doing this would not shrink it at all. Quite the reverse - you'd end up with a lot more text! However, you could compress the binary form. In terms of representing arbitrary binary data as text, Base64 is more efficient than plain hex. Use Convert.ToBase64String and Convert.FromBase64String for the conversions.
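A quick size comparison, as a sketch with the illustrative names `EncodingSizeDemo`, `HexLength` and `Base64Length`: hex uses two characters per byte, while Base64 uses four characters per three bytes (rounded up).

```csharp
using System;
using System.Text;

public static class EncodingSizeDemo
{
    // Number of characters needed to write the UTF-8 bytes of text as hex.
    public static int HexLength(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        // BitConverter.ToString yields "54-68-65-..."; drop the dashes.
        return BitConverter.ToString(bytes).Replace("-", "").Length;
    }

    // Number of characters needed to write the same bytes as Base64.
    public static int Base64Length(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Convert.ToBase64String(bytes).Length;
    }
}
```

For the 44-character sentence "The quick brown fox jumps over the lazy dog." this gives 88 hex characters against 60 Base64 characters; both are longer than the original text.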
public string ConvertToHex(string asciiString)
{
string hex = "";
foreach (char c in asciiString)
{
int tmp = c;
hex += String.Format("{0:x2}", (uint)System.Convert.ToUInt32(tmp.ToString()));
}
return hex;
}
While I can't help much on the C# implementation, I would highly recommend LZW as a simple-to-implement data compression algorithm for you to use.
Perhaps the answer can be reached more quickly if we ask: what are you really trying to do? Converting an ordinary string to a string of its hex representation seems like the wrong approach to anything, unless you are making a hexadecimal/encoding tutorial for the web.
static byte[] HexToBinary(string s) {
byte[] b = new byte[s.Length / 2];
for (int i = 0; i < b.Length; i++)
b[i] = Convert.ToByte(s.Substring(i * 2, 2), 16);
return b;
}
static string BinaryToHex(byte[] b) {
StringBuilder sb = new StringBuilder(b.Length * 2);
for (int i = 0; i < b.Length; i++)
sb.Append(Convert.ToString(256 + b[i], 16).Substring(1, 2));
return sb.ToString();
}