Generate unique ID from string in C# - c#

I need my app to handle a list of mods from a database and a list of locally downloaded mods that aren't in the database.
Each mod in the database has a unique uint ID that I use to identify it, but local mods don't have any ID.
At first I tried to generate an ID with string.GetHashCode() on the mod's name, but GetHashCode is randomized at each run of the app.
Is there any other way to generate a persistent uint ID from the mod's name?
Current code:
foreach(string mod in localMods)
{
//This way I get a number between 0 and 2147483647
uint newId = Convert.ToUInt32(Math.Abs(mod.GetHashCode()));
ProfileMod newMod = new ProfileMod(newId);
}

The method GetHashCode() is not guaranteed to return the same value for the same string across runs of the application. It has a different purpose (hash tables, equality checks within a single run, etc.).
So, it shouldn't be used as a unique identifier.
If you'd like to calculate the hash and get consistent results, you might consider using the standard hashing algorithms like MD5, SHA256, etc.
Here is a sample that calculates SHA256:
using System;
using System.Security.Cryptography;
using System.Text;
public class Program
{
public static void Main()
{
string input = "Hello World!";
// Using the SHA256 algorithm for the hash.
// NOTE: You can replace it with any other algorithm (e.g. MD5) if you need.
using (var hashAlgorithm = SHA256.Create())
{
// Convert the input string to a byte array and compute the hash.
byte[] data = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(input));
// Create a new Stringbuilder to collect the bytes
// and create a string.
var sBuilder = new StringBuilder();
// Loop through each byte of the hashed data
// and format each one as a hexadecimal string.
for (int i = 0; i < data.Length; i++)
{
sBuilder.Append(data[i].ToString("x2"));
}
// Return the hexadecimal string.
var hash = sBuilder.ToString();
Console.WriteLine($"The SHA256 hash of {input} is: {hash}.");
}
}
}
Although SHA256 produces a longer result than MD5, the risk of collisions is much lower. If you still want smaller hashes (with a higher risk of collisions), you can use MD5, or even CRC32.
P.S. The sample code is based on the one from Microsoft's documentation.
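For the original question (a persistent uint from a mod name), one option is to keep only the first four bytes of such a digest. The following is a minimal sketch and not part of the answer above. The names StableId and FromName are made up for illustration, and truncating to 32 bits brings back the higher collision risk that comes with smaller hashes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (names are illustrative, not from the original post).
public static class StableId
{
    // Derives a uint from the first four bytes of the SHA-256 digest
    // of the UTF-8 encoded name. The same name gives the same value on every run.
    public static uint FromName(string name)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(name));
            // Combine the bytes explicitly (big-endian) so the result
            // does not depend on the machine's byte order.
            return ((uint)digest[0] << 24)
                 | ((uint)digest[1] << 16)
                 | ((uint)digest[2] << 8)
                 | digest[3];
        }
    }
}
```

Usage would look like `uint newId = StableId.FromName(mod);` inside the loop from the question.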

So I ended up taking your advice and found a good answer in another post, using SHA-1:
private System.Security.Cryptography.SHA1 hash = new System.Security.Cryptography.SHA1CryptoServiceProvider();
private uint GetUInt32HashCode(string strText)
{
if (string.IsNullOrEmpty(strText)) return 0;
//Unicode Encode Covering all characterset
byte[] byteContents = Encoding.Unicode.GetBytes(strText);
byte[] hashText = hash.ComputeHash(byteContents);
uint hashCodeStart = BitConverter.ToUInt32(hashText, 0);
uint hashCodeMedium = BitConverter.ToUInt32(hashText, 8);
uint hashCodeEnd = BitConverter.ToUInt32(hashText, 16);
var hashCode = hashCodeStart ^ hashCodeMedium ^ hashCodeEnd;
return uint.MaxValue - hashCode;
}
Could probably be optimized but it's good enough for now.

I wouldn't trust any solution involving hashing or the like. Eventually you will end up with conflicting IDs, especially if you have a huge number of records in your DB.
What I would prefer to do is convert the int ID from the DB to a string when reading it, and then use something like Guid.NewGuid().ToString() to generate a string UID for the local ones.
This way you will not have any conflicts at all.
I guess you will have to employ some strategy of this kind.
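To put a rough number on the collision concern, the usual birthday approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)) can be evaluated directly. This is a sketch under the assumption that the hash output is uniformly distributed; the class and method names are made up for illustration.

```csharp
using System;

// Hypothetical helper: estimates the chance that at least two of n items
// share the same hash value, assuming a uniformly distributed hash.
public static class Birthday
{
    public static double CollisionProbability(long n, int bits)
    {
        double space = Math.Pow(2, bits);          // number of possible hash values
        double exponent = -(double)n * (n - 1) / (2.0 * space);
        return 1.0 - Math.Exp(exponent);           // birthday approximation
    }
}
```

With a 32-bit ID the estimate reaches about one half at roughly 77,000 items, whereas for a few hundred local mods it stays very small.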

Related

Creating Guid using a hash of a string

I'm toying with the idea of using a Guid as a PrimaryKey in a NoSQL database that's the combination of three different properties (it's probably a bad idea). These three properties are two integers and a DateTime; they are unique when combined. The reason I'm using a Guid is that preexisting data of the same structure uses the Guid instead of these properties to look up data.
Suppose I convert them to strings and concatenate them, then convert the result to byte[] and create a Guid. What are the chances of a collision? I assume the hashing will be the problem here? If I use a weak 16-byte hashing algorithm such as MD5, what is the chance of two Guids matching (a collision) when the properties (the integers and the DateTime) differ? What happens if I use a hashing algorithm like SHA256 and just use the first 16 bytes instead of MD5? Are the odds of collision still the same?
Otherwise I have other options such as a secondary lookup if required but this doubles the writes, reads and cost.
Example:
public static Guid GenerateId(int locationId, int orderNumber, DateTime orderDate)
{
var combined = $"{locationId}{orderNumber}{orderDate.ToString("d", CultureInfo.InvariantCulture)}";
using (MD5 md5 = MD5.Create())
{
byte[] hash = md5.ComputeHash(Encoding.Default.GetBytes(combined));
return new Guid(hash);
}
}
Why hashing at all? If you are totally sure those three parameters combined are always unique then you have all the data you need to create a unique GUID. DateTime is 8 bytes long, int is 4 bytes long, so your data is 16 bytes long, and that's the exact size of a GUID. You can use BitConverter to get the bytes of those values and use the GUID's constructor that takes a 16 byte array:
DateTime firstValue = DateTime.Now; //Or whatever it is
int secondValue = 33; //whatever
int thirdValue = 44; //whatever
List<byte> tempBuffer = new List<byte>();
tempBuffer.AddRange(BitConverter.GetBytes(firstValue.ToBinary())); //Needs to convert to long first with ToBinary
tempBuffer.AddRange(BitConverter.GetBytes(secondValue));
tempBuffer.AddRange(BitConverter.GetBytes(thirdValue));
Guid id = new Guid(tempBuffer.ToArray());
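Because nothing is hashed in this approach, the three values can also be read back out of the GUID. The following is a sketch of that round trip; the names GuidPacking, Pack and Unpack are made up for illustration, and the byte layout (DateTime first, then the two ints) simply mirrors the snippet above.

```csharp
using System;

// Hypothetical helper illustrating the 8 + 4 + 4 byte layout.
public static class GuidPacking
{
    public static Guid Pack(DateTime date, int first, int second)
    {
        var buffer = new byte[16];
        BitConverter.GetBytes(date.ToBinary()).CopyTo(buffer, 0); // bytes 0..7
        BitConverter.GetBytes(first).CopyTo(buffer, 8);           // bytes 8..11
        BitConverter.GetBytes(second).CopyTo(buffer, 12);         // bytes 12..15
        return new Guid(buffer);
    }

    public static void Unpack(Guid id, out DateTime date, out int first, out int second)
    {
        byte[] buffer = id.ToByteArray(); // same byte order the constructor accepted
        date = DateTime.FromBinary(BitConverter.ToInt64(buffer, 0));
        first = BitConverter.ToInt32(buffer, 8);
        second = BitConverter.ToInt32(buffer, 12);
    }
}
```

Note that such a value is not a random (version 4) GUID, and the layout depends on the byte order of the machine that calls BitConverter.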

c# Generate Random number passing long as seed instead of int32

I want to generate a random number in C# passing a long as the seed instead of an int32, because I need to pass phone numbers or account numbers.
https://learn.microsoft.com/en-us/dotnet/api/system.random.-ctor?view=netframework-4.8#System_Random__ctor_System_Int32_
Please suggest a reliable NuGet package that does this, or an existing implementation of something similar.
I need to pass the complete PhoneNumber as the seed, which I'm able to do in Python but not in C#, and my code stack is all in C#.
using System;
public class Program
{
public static void Main()
{
int seed = 0123456789;
Random random = new Random(seed);
double result = random.NextDouble();
Console.WriteLine(result);
}
}
Some insights on my requirements and what I'm trying to achieve:
1) We're doing this for A/B testing and to do data analysis on the experience of two services.
2) When a request comes in with a phoneNumber, we compare random.NextDouble() against a preset percentage to determine whether to send the request to service A or service B.
3) For example, say a request comes in and the value falls above 0.5, so we direct the request to service A. The next time a request with the same phone number comes in, the value will again be above 0.5 and the request goes to service A, since the seed is a unique hash of phoneNumber.
The method GetHashCode() belongs to the Object class; it has nothing to do with random number generation. Please read here (https://learn.microsoft.com/en-us/dotnet/api/system.object.gethashcode?view=netframework-4.8). The documentation clearly states that collisions are possible: different inputs can produce the same hash code.
The method HashAlgorithm.ComputeHash (documented here - https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.hashalgorithm.computehash?view=netframework-4.8) calculates the hash for a given value, and it is deterministic, i.e. if the input is the same, the generated output is also the same. Obviously this is not the desired output (I assume). Below is the sample code I used to try this.
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
while (true)
{
Console.WriteLine("Enter a 9 digit+ number to calculate hash");
var val = Console.ReadLine();
long target = 0;
bool result = long.TryParse(val,out target);
if (result)
{
var calculatedHash = OutputHash(target);
Console.WriteLine("Calculated hash is : " + calculatedHash);
}
else
{
Console.WriteLine("Incorrect input. Please try again.");
}
}
}
public static string OutputHash(long number)
{
string source = Convert.ToString(number);
string hash;
using (SHA256 sha256Hash = SHA256.Create())
{
hash = GetHash(sha256Hash, source);
Console.WriteLine($"The SHA256 hash of {source} is: {hash}.");
Console.WriteLine("Verifying the hash...");
if (VerifyHash(sha256Hash, source, hash))
{
Console.WriteLine("The hashes are the same.");
}
else
{
Console.WriteLine("The hashes are not same.");
}
}
return hash;
}
private static string GetHash(HashAlgorithm hashAlgorithm, string input)
{
// Convert the input string to a byte array and compute the hash.
byte[] data = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(input));
// Create a new Stringbuilder to collect the bytes
// and create a string.
var sBuilder = new StringBuilder();
// Loop through each byte of the hashed data
// and format each one as a hexadecimal string.
for (int i = 0; i < data.Length; i++)
{
sBuilder.Append(data[i].ToString("x2"));
}
// Return the hexadecimal string.
return sBuilder.ToString();
}
// Verify a hash against a string.
private static bool VerifyHash(HashAlgorithm hashAlgorithm, string input, string hash)
{
// Hash the input.
var hashOfInput = GetHash(hashAlgorithm, input);
// Create a StringComparer and compare the hashes.
StringComparer comparer = StringComparer.OrdinalIgnoreCase;
return comparer.Compare(hashOfInput, hash) == 0;
}
I agree with @Knoop's comment above that you might end up with the same integer mapping to multiple long input values.
If you are looking for a 'pure' random number generator with a long value as seed, you don't have a choice but to go for third-party libraries (or implement your own custom algorithm). However, rather than getting into such complexities, a simple
Guid g = Guid.NewGuid();
should do the trick (https://learn.microsoft.com/en-us/dotnet/api/system.guid.newguid?view=netframework-4.8).
The documentation (https://learn.microsoft.com/en-gb/windows/win32/api/combaseapi/nf-combaseapi-cocreateguid?redirectedfrom=MSDN) says that even this can end up having collisions, but the chances are minimal.
Finally, this sounds like potential duplicate of .NET unique object identifier
Take the hash of the phone number, e.g.:
var phoneNumber = 123456789L;
var seed = phoneNumber.GetHashCode();
This means that for the same phoneNumber you will get the same sequence. It also means that some phone numbers will produce identical sequences, but the chance of that is slim. And it might differ between .NET runtimes, as commented, but you might not care.
Not sure why you want to, but I guess there are reasons, e.g. test code.
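For the A/B routing described in the question, the Random class may not be needed at all: a stable value in [0, 1) can be derived straight from a hash of the phone number. The following is a sketch under that assumption, not something proposed in the answers above; the names AbBucket and Fraction are made up for illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: maps a phone number to a stable value in [0, 1).
public static class AbBucket
{
    public static double Fraction(string phoneNumber)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(phoneNumber));
            // Read the first eight bytes as a big-endian 64-bit value.
            ulong value = 0;
            for (int i = 0; i < 8; i++)
            {
                value = (value << 8) | digest[i];
            }
            // Keep the top 53 bits so the result is exactly representable
            // as a double and always strictly below 1.0.
            return (value >> 11) / (double)(1UL << 53);
        }
    }
}
```

A request could then be routed with something like `AbBucket.Fraction(phoneNumber) > 0.5 ? serviceA : serviceB`, and the same phone number would land on the same side on every run and on every machine.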

C# Hash with Collision Domain

I am working on a project that encrypts a string, which is the easy part. The hard part is finding a method to hash the string before encryption that returns a value with a collision domain. That hash will be stored along with the encrypted string in a database table.
The reason for doing this is to create a subset to decrypt when needing to search for a single record. How can this be accomplished using C#?
I assume you need help creating the collision domain. The easiest way to do it is to write a function that transforms the string into a new string that has a high collision chance, then hash that new string for your lookup value.
private static int COLLISION_LENGTH = 5;
public static string CreateCollision(string oldValue)
{
var chars = new char[COLLISION_LENGTH];
for(int i = 0; i < oldValue.Length; i++)
{
chars[i % chars.Length] ^= oldValue[i];
}
return new String(chars);
}
You then just need to hash the output of CreateCollision with the hash algorithm of your choice. I recommend a strong hash system like you would use for a password, such as Rfc2898DeriveBytes, and treating the hash like you would a password (you will need to use a fixed salt, however), because this hash does leak information about the data you encrypted.
Adjust COLLISION_LENGTH as needed.
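To see the intended collisions in action, the function can be tried on two inputs that differ only in characters folded into the same slot. This sketch repeats CreateCollision from above inside a wrapper class whose name (CollisionDemo) is made up for illustration.

```csharp
using System;

// Hypothetical wrapper around the CreateCollision function shown above.
public static class CollisionDemo
{
    private static int COLLISION_LENGTH = 5;

    public static string CreateCollision(string oldValue)
    {
        var chars = new char[COLLISION_LENGTH];
        for (int i = 0; i < oldValue.Length; i++)
        {
            // XOR each character into one of COLLISION_LENGTH slots.
            chars[i % chars.Length] ^= oldValue[i];
        }
        return new String(chars);
    }
}
```

With COLLISION_LENGTH set to 5, "abcdef" and "fbcdea" produce the same output, because the first and sixth characters are XORed into the same slot and XOR does not care about order. "abcdef" and "abcdeg" produce different outputs.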

What is the .NET Equivalent of Java's SecretKeySpec class?

I was provided the following code sample in Java and I'm having trouble converting it to C#. How would I go about converting this so it'll work in .NET 4.5?
public static String constructOTP(final Long counter, final String key)
throws NoSuchAlgorithmException, DecoderException, InvalidKeyException
{
// setup the HMAC algorithm, setting the key to use
final Mac mac = Mac.getInstance("HmacSHA512");
// convert the key from a hex string to a byte array
final byte[] binaryKey = Hex.decodeHex(key.toCharArray());
// initialize the HMAC with a key spec created from the key
mac.init(new SecretKeySpec(binaryKey, "HmacSHA512"));
// compute the OTP using the bytes of the counter
byte[] computedOtp = mac.doFinal(
ByteBuffer.allocate(8).putLong(counter).array());
//
// increment the counter and store the new value
//
// return the value as a hex encoded string
return new String(Hex.encodeHex(computedOtp));
}
Here is the C# code that I've come up with thanks to Duncan pointing out the HMACSHA512 class, but I'm unable to verify the results match without installing java, which I can't do on this machine. Does this code match the above Java?
public string ConstructOTP(long counter, string key)
{
var mac = new HMACSHA512(ConvertHexStringToByteArray(key));
var buffer = BitConverter.GetBytes(counter);
Array.Resize(ref buffer, 8);
var computedOtp = mac.ComputeHash(buffer);
var hex = new StringBuilder(computedOtp.Length * 2);
foreach (var b in computedOtp)
hex.AppendFormat("{0:x2}", b);
return hex.ToString();
}
A SecretKeySpec is used to convert binary input into something that is recognised by Java security providers as a key. It does little more than decorate the bytes with a little post-it note saying "Pssst, it's an HmacSHA512 key...".
You can basically ignore it as a Java-ism. For your .NET code, you just need to find a way of declaring what the HMAC key is. Looking at the HMACSHA512 class, this seems quite straight-forward. There is a constructor that takes a byte array containing your key value.
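One detail that may affect whether the two versions agree: Java's ByteBuffer writes the long in big-endian order by default, while BitConverter.GetBytes uses the machine's byte order, which is little-endian on typical hardware. The following sketch reverses the bytes where needed. The class name OtpSketch and the helper CounterBytes are made up for illustration, and the result has not been compared against the Java code.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch; names are illustrative.
public static class OtpSketch
{
    // Returns the counter as 8 bytes in big-endian order,
    // matching ByteBuffer.allocate(8).putLong(counter).array() in Java.
    public static byte[] CounterBytes(long counter)
    {
        byte[] buffer = BitConverter.GetBytes(counter);
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(buffer);
        }
        return buffer;
    }

    // key is the already-decoded binary key (the hex decoding step is omitted here).
    public static string ConstructOtp(long counter, byte[] key)
    {
        using (var mac = new HMACSHA512(key))
        {
            byte[] computedOtp = mac.ComputeHash(CounterBytes(counter));
            var hex = new StringBuilder(computedOtp.Length * 2);
            foreach (byte b in computedOtp)
            {
                hex.AppendFormat("{0:x2}", b);
            }
            return hex.ToString();
        }
    }
}
```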

32 bit fast uniform hash function. Use MD5 / SHA1 and cut off 4 bytes? [duplicate]

Possible Duplicate:
What is the best 32bit hash function for short strings (tag names)?
I need to hash many strings to 32bit (uint).
Can I just use MD5 or SHA1 and take 4 bytes from it? Or are there better alternatives?
There is no need for security or to care if one is cracked and so on.
I just need to hash fast and uniform to 32 bit. MD5 and SHA1 should be uniform.
But are there better (faster) built-in alternatives I could use? If not, which of the two would you use?
Here someone asked which one is better, but not for alternatives and there was a security matter (I don't care for security):
How to Use SHA1 or MD5 in C#?(Which One is Better in Performance and Security for Authentication)
Do you need a cryptographic-strength hash? If all you need is 32 bits I bet not.
Try the Fowler-Noll-Vo hash. It's fast, has good distribution and avalanche effect, and is generally acceptable for hashtables, checksums etc:
public static uint To32BitFnv1aHash(this string toHash,
bool separateUpperByte = false)
{
IEnumerable<byte> bytesToHash;
if (separateUpperByte)
bytesToHash = toHash.ToCharArray()
.Select(c => new[] { (byte)((c - (byte)c) >> 8), (byte)c })
.SelectMany(c => c);
else
bytesToHash = toHash.ToCharArray()
.Select(Convert.ToByte);
//this is the actual hash function; very simple
uint hash = FnvConstants.FnvOffset32;
foreach (var chunk in bytesToHash)
{
hash ^= chunk;
hash *= FnvConstants.FnvPrime32;
}
return hash;
}
public static class FnvConstants
{
public static readonly uint FnvPrime32 = 16777619;
public static readonly ulong FnvPrime64 = 1099511628211;
public static readonly uint FnvOffset32 = 2166136261;
public static readonly ulong FnvOffset64 = 14695981039346656037;
}
This is really useful for creating semantically equatable hashes for GetHashCode, based on a string digest of each object (a custom ToString() or otherwise). You can overload this to take any IEnumerable<byte>, making it suitable for checksumming stream data etc. If you ever need a 64-bit hash (ulong), just copy the function and replace the constants used with the 64-bit constants. Oh, one more thing: the hash (as most do) relies on unchecked integer overflow; never run this hash in a "checked" block, or it will be virtually guaranteed to throw exceptions.
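The byte-oriented overload and the 64-bit variant mentioned above could look roughly like this. It is a sketch, not the answerer's code: the class name Fnv1a is made up, the string overload assumes UTF-8 encoding (the original works on the chars directly), and the arithmetic is wrapped in `unchecked` so it keeps working even in a project compiled with overflow checking.

```csharp
using System;
using System.Text;

// Hypothetical sketch of FNV-1a over raw bytes; names are illustrative.
public static class Fnv1a
{
    public static uint Hash32(byte[] data)
    {
        unchecked
        {
            uint hash = 2166136261;        // 32-bit FNV offset basis
            foreach (byte b in data)
            {
                hash ^= b;                 // FNV-1a: XOR first...
                hash *= 16777619;          // ...then multiply by the FNV prime
            }
            return hash;
        }
    }

    public static ulong Hash64(byte[] data)
    {
        unchecked
        {
            ulong hash = 14695981039346656037UL; // 64-bit FNV offset basis
            foreach (byte b in data)
            {
                hash ^= b;
                hash *= 1099511628211UL;         // 64-bit FNV prime
            }
            return hash;
        }
    }

    // Assumption: strings are hashed via their UTF-8 bytes.
    public static uint Hash32(string text)
    {
        return Hash32(Encoding.UTF8.GetBytes(text));
    }
}
```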
If security does not play a role, generating a hash with a cryptographic hash function (such as MD5 or SHA1) and taking 4 bytes from it works. But they are slower than various non-cryptographic hash functions, as these functions are primarily designed for security, not speed.
Have a look at non-cryptographic hash functions such as FNV or Murmur.
Non-Cryptographic Hash Function Zoo
Performance Graphs
MurMurHash3, an ultra fast hash algorithm for C# / .NET
Edit: The floodyberry.com domain is now registered by a domain parking service - removed dead links
The easiest and yet good algorithm for strings is as follows:
int Hash(string s)
{
int res = 0;
for(int i = 0; i < s.Length; i++)
{
res += (i * s[i]) % int.MaxValue;
}
return res;
}
Obviously, this is absolutely not a secure hash algorithm, but it is fast (really fast), returns 32 bits and, as far as I know, is uniform (I've tried it for many algorithmic challenges with good results).
Not for use to hash passwords or any sensitive data.
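One property worth checking before relying on this function: the term for index 0 is multiplied by zero, so the first character never influences the result, and for short strings the sum stays small rather than spreading over the full 32-bit range. The sketch below wraps the same loop in a class (the name SimpleHashDemo is made up) so this can be tried out.

```csharp
using System;

// Hypothetical wrapper around the simple hash shown above.
public static class SimpleHashDemo
{
    public static int Hash(string s)
    {
        int res = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // For i == 0 this term is always 0, whatever s[0] is.
            res += (i * s[i]) % int.MaxValue;
        }
        return res;
    }
}
```

For example, "ab" and "zb" hash to the same value, since only the second character ('b', code 98, at index 1) contributes.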
