I'm toying with the idea of using a Guid as the primary key in a NoSQL database, where the Guid is derived from the combination of three different properties (it's probably a bad idea). The three properties are two integers and a DateTime; combined, they are unique. The reason I'm using a Guid is that preexisting data of the same structure uses a Guid instead of these properties to look up data.
If I convert them to strings and concatenate them, then convert the result to a byte[] and create a Guid, what are the chances of a collision? I assume the hashing will be the problem here? If I use a weak 16-byte hashing algorithm such as MD5, what are the chances of two Guids matching (a collision) when the properties (the integers and the DateTime) are different? What happens if I use a hashing algorithm like SHA256 and just use the first 16 bytes instead of MD5? Are the odds of a collision still the same?
Otherwise I have other options, such as a secondary lookup if required, but this doubles the writes, reads and cost.
Example:
public static Guid GenerateId(int locationId, int orderNumber, DateTime orderDate)
{
var combined = $"{locationId}{orderNumber}{orderDate.ToString("d", CultureInfo.InvariantCulture)}";
using (MD5 md5 = MD5.Create())
{
byte[] hash = md5.ComputeHash(Encoding.Default.GetBytes(combined));
return new Guid(hash);
}
}
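If the hashing route is kept, two details of the example above probably matter more than the choice between MD5 and a truncated SHA-256: concatenating without a delimiter makes (1, 23) and (12, 3) produce the same input string, and the "d" format drops the time of day. The following is only a sketch under assumptions: the class name `OrderIdHasher`, the `|` delimiter, the "O" round-trip date format and UTF-8 encoding are illustrative choices, not something the question specifies.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class OrderIdHasher // hypothetical name, for illustration only
{
    public static Guid GenerateId(int locationId, int orderNumber, DateTime orderDate)
    {
        // The delimiter keeps (1, 23) and (12, 3) distinct;
        // the "O" format keeps the full timestamp rather than only the date.
        string combined = string.Join("|",
            locationId.ToString(CultureInfo.InvariantCulture),
            orderNumber.ToString(CultureInfo.InvariantCulture),
            orderDate.ToString("O", CultureInfo.InvariantCulture));

        using (SHA256 sha = SHA256.Create())
        {
            // UTF-8 is fixed, unlike Encoding.Default, which can differ between machines.
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(combined));

            // Keep only the first 16 bytes, the size a Guid requires.
            byte[] first16 = new byte[16];
            Array.Copy(hash, first16, 16);
            return new Guid(first16);
        }
    }
}
```

For accidental collisions, any 128 bits of a well-behaved hash should give similar odds, so truncating SHA-256 is not expected to be worse than MD5 in that respect.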
Why hashing at all? If you are totally sure those three parameters combined are always unique then you have all the data you need to create a unique GUID. DateTime is 8 bytes long, int is 4 bytes long, so your data is 16 bytes long, and that's the exact size of a GUID. You can use BitConverter to get the bytes of those values and use the GUID's constructor that takes a 16 byte array:
DateTime firstValue = DateTime.Now; //Or whatever it is
int secondValue = 33; //whatever
int thirdValue = 44; //whatever
List<byte> tempBuffer = new List<byte>();
tempBuffer.AddRange(BitConverter.GetBytes(firstValue.ToBinary())); //Needs to convert to long first with ToBinary
tempBuffer.AddRange(BitConverter.GetBytes(secondValue));
tempBuffer.AddRange(BitConverter.GetBytes(thirdValue));
Guid id = new Guid(tempBuffer.ToArray());
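One practical benefit of packing instead of hashing is that the three values can be read back out of the Guid. Below is a minimal sketch of that round trip. The class and method names are hypothetical, and it stores UTC ticks rather than `ToBinary()` (an assumption made here because `ToBinary()` also encodes the `DateTimeKind`, so the same instant can yield different bytes). It also assumes every machine producing keys has the same endianness, since `BitConverter` follows the CPU's byte order.

```csharp
using System;

public static class CompositeKeyGuid // hypothetical name, for illustration only
{
    public static Guid Pack(DateTime orderDate, int locationId, int orderNumber)
    {
        byte[] buffer = new byte[16];

        // 8 bytes of UTC ticks, then 4 bytes for each int: 16 bytes in total.
        BitConverter.GetBytes(orderDate.ToUniversalTime().Ticks).CopyTo(buffer, 0);
        BitConverter.GetBytes(locationId).CopyTo(buffer, 8);
        BitConverter.GetBytes(orderNumber).CopyTo(buffer, 12);
        return new Guid(buffer);
    }

    public static void Unpack(Guid id, out DateTime orderDate, out int locationId, out int orderNumber)
    {
        // ToByteArray returns the bytes in the same order the constructor received them.
        byte[] buffer = id.ToByteArray();
        orderDate = new DateTime(BitConverter.ToInt64(buffer, 0), DateTimeKind.Utc);
        locationId = BitConverter.ToInt32(buffer, 8);
        orderNumber = BitConverter.ToInt32(buffer, 12);
    }
}
```

Note that a Guid built this way is not a valid RFC 4122 UUID (the version and variant bits are arbitrary), which only matters if something downstream inspects those bits.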
I need my app to handle a list of mods from a database and a list of locally downloaded mods that aren't in it.
Each mod in the database has a unique uint ID that I use to identify it, but local mods don't have any ID.
At first I tried to generate an ID with string.GetHashCode() by using the mod's name, but GetHashCode is randomized at each run of the app.
Is there any other way to generate a persistent uint ID from the mod's name?
Current code:
foreach(string mod in localMods)
{
//This way I get a number between 0 and 2147483647 (Math.Abs throws for int.MinValue)
uint newId = Convert.ToUInt32(Math.Abs(mod.GetHashCode()));
ProfileMod newMod = new ProfileMod(newId);
}
The GetHashCode() method isn't guaranteed to return the same value for the same string, especially across runs of the application. It has a different purpose (such as supporting equality checks at runtime).
So, it shouldn't be used as a unique identifier.
If you'd like to calculate the hash and get consistent results, you might consider using the standard hashing algorithms like MD5, SHA256, etc.
Here is a sample that calculates SHA256:
using System;
using System.Security.Cryptography;
using System.Text;
public class Program
{
public static void Main()
{
string input = "Hello World!";
// Using the SHA256 algorithm for the hash.
// NOTE: You can replace it with any other algorithm (e.g. MD5) if you need.
using (var hashAlgorithm = SHA256.Create())
{
// Convert the input string to a byte array and compute the hash.
byte[] data = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(input));
// Create a new Stringbuilder to collect the bytes
// and create a string.
var sBuilder = new StringBuilder();
// Loop through each byte of the hashed data
// and format each one as a hexadecimal string.
for (int i = 0; i < data.Length; i++)
{
sBuilder.Append(data[i].ToString("x2"));
}
// Return the hexadecimal string.
var hash = sBuilder.ToString();
Console.WriteLine($"The SHA256 hash of {input} is: {hash}.");
}
}
}
Although SHA256 produces a longer result than MD5, the risk of collisions is much lower. But if you still want smaller hashes (with a higher risk of collisions), you can use MD5, or even CRC32.
P.S. The sample code is based on the one from Microsoft's documentation.
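Since the question asks for a uint rather than a hex string, one way to get there is to keep only the first four bytes of the digest. This is a sketch, not a recommendation over the full hash: `StableId.FromName` is a hypothetical helper, and truncating to 32 bits raises the collision risk considerably compared with the full SHA256 value.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableId // hypothetical name, for illustration only
{
    public static uint FromName(string name)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(name));

            // Assemble the first four bytes explicitly (big-endian) so the
            // result does not depend on the CPU's byte order.
            return ((uint)hash[0] << 24) | ((uint)hash[1] << 16) | ((uint)hash[2] << 8) | hash[3];
        }
    }
}
```

The same name yields the same ID on every run and every machine, because nothing here depends on per-process state.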
So I ended up listening to your advice and found a good answer in another post that uses SHA-1:
private System.Security.Cryptography.SHA1 hash = new System.Security.Cryptography.SHA1CryptoServiceProvider();
private uint GetUInt32HashCode(string strText)
{
if (string.IsNullOrEmpty(strText)) return 0;
//Unicode Encode Covering all characterset
byte[] byteContents = Encoding.Unicode.GetBytes(strText);
byte[] hashText = hash.ComputeHash(byteContents);
uint hashCodeStart = BitConverter.ToUInt32(hashText, 0);
uint hashCodeMedium = BitConverter.ToUInt32(hashText, 8);
uint hashCodeEnd = BitConverter.ToUInt32(hashText, 16);
var hashCode = hashCodeStart ^ hashCodeMedium ^ hashCodeEnd;
return uint.MaxValue - hashCode;
}
Could probably be optimized but it's good enough for now.
I wouldn't trust any solution involving hashing or the like. Eventually you will end up with conflicting IDs, especially if you have a huge number of records in your DB.
What I would prefer to do is convert the int ID from the DB to a string when reading it, and then use something like Guid.NewGuid().ToString() to generate a string UID for the local ones.
This way you will not have any conflicts at all.
I guess you will have to employ some strategy of this kind.
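To put a rough number on "eventually you will have conflicts", the usual birthday approximation can be used: for n IDs drawn uniformly from 2^bits values, the chance of at least one collision is about 1 - exp(-n(n-1) / 2^(bits+1)). The helper below is a hypothetical sketch of that formula and assumes the hash output behaves like uniformly random data.

```csharp
using System;

public static class Birthday // hypothetical name, for illustration only
{
    // Approximate probability that at least two of n uniformly random
    // IDs collide when each ID has the given number of bits.
    public static double CollisionProbability(double n, int bits)
    {
        double space = Math.Pow(2.0, bits);
        double x = n * (n - 1.0) / (2.0 * space);

        // For tiny x, 1 - exp(-x) loses all precision in double arithmetic;
        // x itself is then an accurate approximation.
        return x < 1e-8 ? x : 1.0 - Math.Exp(-x);
    }
}
```

With 32-bit IDs, `Birthday.CollisionProbability(77163, 32)` comes out at roughly 0.5, while a thousand local mods give roughly 0.0001. With 128 bits the probability stays negligible even for billions of records.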
I created two structures of type TheKey: k1={17,1375984} and k2={17,1593144}.
Obviously the pointers in the second fields are different, but both get the same hash code, 346948941.
I expected to see different hash codes. See the code below.
struct TheKey
{
public int id;
public string Name;
public TheKey(int id, string name)
{
this.id = id;
Name = name;
}
}
static void Main() {
// assign two different strings to avoid interning
var k1 = new TheKey(17, "abc");
var k2 = new TheKey(17, new string(new[] { 'a', 'b', 'c' }));
Dump(k1); // prints the layout of a structure
Dump(k2);
Console.WriteLine("hash1={0}", k1.GetHashCode());
Console.WriteLine("hash2={0}", k2.GetHashCode());
}
unsafe static void Dump<T>(T s) where T : struct
{
byte[] b = new byte[8];
fixed (byte* pb = &b[0])
{
IntPtr ptr = new IntPtr(pb);
Marshal.StructureToPtr(s, ptr, true);
int* p1 = (int*)(&pb[0]); // first 32 bits
int* p2 = (int*)(&pb[4]);
Console.WriteLine("{0}", *p1);
Console.WriteLine("{0}", *p2);
}
}
Output:
17
1375984
17
1593144
hash1=346948941
hash2=346948941
It is a lot more complicated than meets the eye. For starters, give k2 a completely different string. Notice how the hash code is still the same:
var k1 = new TheKey(17, "abc");
var k2 = new TheKey(17, "def");
System.Diagnostics.Debug.Assert(k1.GetHashCode() == k2.GetHashCode());
Which is quite valid, the only requirement for a hash code is that the same value produces the same hash code. Different values don't have to produce different hash codes. That's not physically possible since a .NET hash code can only represent 4 billion distinct values.
Calculating the hash code for a struct is tricky business. The first thing the CLR does is check if the structure contains any reference type references or has gaps between the fields. A reference requires special treatment because the reference value is random. It is a pointer whose value changes when the garbage collector compacts the heap. Gaps in the structure layout are created because of alignment. A struct with a byte and an int has a 3 byte gap between the two fields.
If neither is the case then all the bits in the structure value are significant. The CLR quickly calculates the hash by xor-ing the bits, 32 at a time. This is a 'good' hash, all the fields in the struct participate in the hash code.
If the struct has fields of a reference type or has gaps then another approach is needed. The CLR iterates the fields of the struct and goes looking for one that is usable to generate a hash. A usable one is a field of a value type or an object reference that isn't null. As soon as it finds one, it takes the hash of that field, xors it with the method table pointer and quits.
In other words, only one field in the structure participates in the hash code calculation. Which is your case, only the id field is used. Which is why the string member value doesn't matter.
This is an obscure factoid that's obviously important to be aware of if you ever leave it up to the CLR to generate hash codes for a struct. By far the best thing to do is to just never do this. If you have to, then be sure to order the fields in the struct so that the first field gives you the best hash code. In your case, just swap the id and Name fields.
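For completeness, here is a sketch of what "not leaving it up to the CLR" could look like for the struct from the question: an explicit Equals/GetHashCode pair in which both fields participate. The multiply-and-add combination with 17 and 31 is just one common convention chosen for illustration, not something this answer prescribes.

```csharp
using System;

public struct TheKey : IEquatable<TheKey>
{
    public int id;
    public string Name;

    public TheKey(int id, string name)
    {
        this.id = id;
        Name = name;
    }

    // Value equality: compares the string contents, not the references.
    public bool Equals(TheKey other)
    {
        return id == other.id && string.Equals(Name, other.Name);
    }

    public override bool Equals(object obj)
    {
        return obj is TheKey && Equals((TheKey)obj);
    }

    // Both fields contribute, so keys that differ only in Name
    // will normally get different hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + id;
            hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
            return hash;
        }
    }
}
```

Because string.GetHashCode() may vary between runs, these hash codes are only suitable for in-memory collections, not for persistence.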
Another interesting tidbit, the 'good' hash calculation code has a bug. It will use the fast algorithm when the structure contains a System.Decimal. Problem is, the bits of a Decimal are not representative of its numeric value. Try this:
struct Test { public decimal value; }
static void Main() {
var t1 = new Test() { value = 1.0m };
var t2 = new Test() { value = 1.00m };
if (t1.GetHashCode() != t2.GetHashCode())
Console.WriteLine("gack!");
}
k1 and k2 contain the same values. Why are you surprised that they have the same hash code? It is contracted to return the same value for two objects that compare as equal.
Hash codes are created from the state (the values inside) of the structure/object, not from where it is stored. And according to this: Why is ValueType.GetHashCode() implemented like it is?, the default behaviour of GetHashCode for value types, which structs are, is to return a hash based on the values. And I believe that is the correct behaviour, especially for structures, which are supposed to be immutable.
I have a very large prime number (for RSA purposes) that needs to be converted to a byte array. The number however is currently stored as a string. I'm OK with storing it as a byte[] but either way the number is a string and I have to get it into a byte array.
Now to be clear I have used the RSA encryption and decryption sample data provided on MSDN and everything works so I have a high degree of confidence that the encryption portion is fine. Further the samples provided by MSDN provide prime numbers that have already been turned into byte[]. Thus I have a high degree of confidence that the breakdown is in MY conversion of the string representation of the number to a byte[].
I currently do this:
private static string _publicKeyExponent = "12345...310 digits......9876";
private static string _publicKeyModulus = "654782....620 digits.....4576";
_rsaPublicKey.Exponent = CoreHelpers.GetBytes(_publicKeyExponent);
And here is my GetBytes method that I suspect is causing the issue as it is getting the bytes of STRING characters NOT digits.
public static byte[] GetBytes(string str)
{
byte[] bytes = new byte[str.Length * sizeof(char)];
System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
return bytes;
}
Now if I have already identified the problem, fixing it should be straightforward, no? Well, for me, yes and no. I don't know of any type in C# that I can parse a number of this size into. The best idea I can come up with is to break the string into smaller chunks of, say, 10 chars, which would then easily parse to Int32, and then get the bytes of that. Add it to some master byte array and do it again.
You could use the BigInteger struct.
It contains numerous Parse static methods and the ToByteArray method.
Sample code:
public static byte[] GetBytes(string str)
{
BigInteger number;
return BigInteger.TryParse(str, out number) ? number.ToByteArray() : null;
}
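One caveat when feeding the result into RSA parameters: BigInteger.ToByteArray() returns little-endian two's-complement bytes, and may append a 0x00 byte to keep the value positive, whereas the RSAParameters fields are, as far as I know, expected to be big-endian and unsigned. The sketch below shows one way to bridge that. The class and method names are hypothetical, and it assumes the input is a non-negative decimal string.

```csharp
using System;
using System.Globalization;
using System.Numerics;

public static class RsaNumberBytes // hypothetical name, for illustration only
{
    public static byte[] ToBigEndianUnsigned(string decimalDigits)
    {
        BigInteger number = BigInteger.Parse(decimalDigits, CultureInfo.InvariantCulture);
        if (number.Sign < 0)
            throw new ArgumentException("Expected a non-negative number.", "decimalDigits");

        // Little-endian, two's complement; the last byte may be a 0x00 sign pad.
        byte[] littleEndian = number.ToByteArray();
        int length = littleEndian.Length;
        if (length > 1 && littleEndian[length - 1] == 0)
            length--;

        // Reverse into big-endian order, skipping the sign pad if there was one.
        byte[] bigEndian = new byte[length];
        for (int i = 0; i < length; i++)
            bigEndian[i] = littleEndian[length - 1 - i];
        return bigEndian;
    }
}
```

For example, the common public exponent "65537" becomes the three bytes 01 00 01.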
I have the following String:
String characters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
I need to create two strings from it:
A string obtained simply by reordering the characters;
A string obtained by selecting 10 characters and reordering them.
So for (1) I would get, for example:
String characters = "jkDEF56789hisGHIbdefpqraXYZ1234txyzABCcglmnoRSTUVWuvwJKLMNOPQ0";
And for (2) I would get, for example:
String shortList = "8GisIbH9hd";
THE PROBLEM
I could just convert it to a char array and order it randomly by a Guid.
However, I want to specify some kind of key (maybe a Guid?), and for that key the result of reordering and of selecting the shortList must always be the same.
Does this make sense?
You could convert your GUID string to an int array of its ASCII/UTF/whatever character codes, as described in
Getting The ASCII Value of a character in a C# string.
Then iterate over this array with something along these lines:
string res = "";
foreach (int elem in intConvertedGuidString) res += characters[elem % characters.Length];
For task (2), you could reverse your characters string, for example as in Best way to reverse a string,
and use Substring(0, 10) to truncate it before running it through the same procedure (C# strings have no Left() method).
You can use a hash function with good distribution, combined with a seed, as the sort key for the elements. Here's a sample:
static ulong GetHash(char value, ulong seed)
{
ulong hash = seed * 3074457345618258791ul;
hash += value;
hash *= 3074457345618258799ul;
return hash;
}
And use this function for comparison:
static void Main()
{
var seed = 53ul;
var str = "ABCDEFHYUXASPOIMNJH";
var shuffledStr = new string(str.OrderBy(x => GetHash(x, seed)).ToArray());
Console.WriteLine(shuffledStr);
}
Now every time you order by seed 53 you'll get the same result, and if you seed by 54 you'll get a different result.
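One thing to keep in mind with ordering by a per-character hash is that equal characters always get equal hashes, so duplicates end up next to each other. An alternative that treats positions rather than characters is a Fisher-Yates shuffle driven by a small generator seeded from the key. The sketch below uses a SplitMix64-style generator so the output does not depend on how System.Random is implemented. All names are hypothetical, and the modulo step has a slight bias that is ignored here.

```csharp
using System;

public static class KeyedShuffle // hypothetical name, for illustration only
{
    // SplitMix64-style step: advances the state and returns a mixed 64-bit value.
    private static ulong Next(ref ulong state)
    {
        unchecked
        {
            state += 0x9E3779B97F4A7C15UL;
            ulong z = state;
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }

    // Task (1): the same key always yields the same permutation of the input.
    public static string Shuffle(string input, ulong key)
    {
        char[] chars = input.ToCharArray();
        ulong state = key;
        for (int i = chars.Length - 1; i > 0; i--)
        {
            int j = (int)(Next(ref state) % (ulong)(i + 1));
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new string(chars);
    }

    // Task (2): shuffle, then keep the first 'count' characters.
    public static string Pick(string input, ulong key, int count)
    {
        return Shuffle(input, key).Substring(0, count);
    }
}
```

If the key should be a Guid, one option is to derive the ulong from it, for example with `BitConverter.ToUInt64(guid.ToByteArray(), 0)`.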
I was provided the following code sample in Java and I'm having trouble converting it to C#. How would I go about converting this so it'll work in .NET 4.5?
public static String constructOTP(final Long counter, final String key)
throws NoSuchAlgorithmException, DecoderException, InvalidKeyException
{
// setup the HMAC algorithm, setting the key to use
final Mac mac = Mac.getInstance("HmacSHA512");
// convert the key from a hex string to a byte array
final byte[] binaryKey = Hex.decodeHex(key.toCharArray());
// initialize the HMAC with a key spec created from the key
mac.init(new SecretKeySpec(binaryKey, "HmacSHA512"));
// compute the OTP using the bytes of the counter
byte[] computedOtp = mac.doFinal(
ByteBuffer.allocate(8).putLong(counter).array());
//
// increment the counter and store the new value
//
// return the value as a hex encoded string
return new String(Hex.encodeHex(computedOtp));
}
Here is the C# code that I've come up with thanks to Duncan pointing out the HMACSHA512 class, but I'm unable to verify that the results match without installing Java, which I can't do on this machine. Does this code match the above Java?
public string ConstructOTP(long counter, string key)
{
var mac = new HMACSHA512(ConvertHexStringToByteArray(key));
var buffer = BitConverter.GetBytes(counter);
Array.Resize(ref buffer, 8);
var computedOtp = mac.ComputeHash(buffer);
var hex = new StringBuilder(computedOtp.Length * 2);
foreach (var b in computedOtp)
hex.AppendFormat("{0:x2}", b);
return hex.ToString();
}
A SecretKeySpec is used to convert binary input into something that is recognised by Java security providers as a key. It does little more than decorate the bytes with a little post-it note saying "Pssst, it's an HmacSHA512 key...".
You can basically ignore it as a Java-ism. For your .NET code, you just need to find a way of declaring what the HMAC key is. Looking at the HMACSHA512 class, this seems quite straightforward. There is a constructor that takes a byte array containing your key value.
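One further difference worth checking when comparing against the Java version is byte order. `ByteBuffer.putLong` writes big-endian by default, while `BitConverter.GetBytes` follows the machine's byte order, which is little-endian on most PCs. The sketch below writes the counter big-endian explicitly. The `Otp` class and its `HexToBytes` helper are hypothetical names, and the hex key is assumed to be well formed with an even number of digits.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Otp // hypothetical name, for illustration only
{
    public static string ConstructOtp(long counter, string hexKey)
    {
        // Write the counter most significant byte first,
        // matching ByteBuffer.allocate(8).putLong(counter) in the Java code.
        byte[] counterBytes = new byte[8];
        for (int i = 7; i >= 0; i--)
        {
            counterBytes[i] = (byte)(counter & 0xFF);
            counter >>= 8;
        }

        using (var mac = new HMACSHA512(HexToBytes(hexKey)))
        {
            byte[] computedOtp = mac.ComputeHash(counterBytes);

            // Lower-case hex encoding of the MAC.
            var hex = new StringBuilder(computedOtp.Length * 2);
            foreach (byte b in computedOtp)
                hex.Append(b.ToString("x2"));
            return hex.ToString();
        }
    }

    // Stand-in for the question's ConvertHexStringToByteArray helper.
    private static byte[] HexToBytes(string hex)
    {
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }
}
```

A quick way to confirm the match is to run both versions with the same key and counter on any machine that does have Java and compare the hex strings.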