I am working on a project that encrypts a string, which is the easy part. The hard part is finding a method to hash the string before encryption that returns a value with a collision domain. That hash will be stored along with the encrypted string in a database table.
The reason for doing this is to create a subset to decrypt when needing to search for a single record. How can this be accomplished using C#?
I assume you need help creating the collision domain. The easiest way to do it is to write a function that transforms the string into a new string that has a high collision chance, then hash that new string for your lookup value.
private static int COLLISION_LENGTH = 5;
public static string CreateCollision(string oldValue)
{
var chars = new char[COLLISION_LENGTH];
for(int i = 0; i < oldValue.Length; i++)
{
chars[i % chars.Length] ^= oldValue[i];
}
return new String(chars);
}
You then just need to hash the output of CreateCollision with the hash algorithm of your choice. I recommend a strong, password-style scheme such as Rfc2898DeriveBytes, and treating the hash like you would a password, because this hash does leak information about the data you encrypted. You will need to use a fixed salt, however, so that the same input always yields the same lookup value.
Adjust COLLISION_LENGTH as needed.
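A minimal sketch of the whole lookup step, assuming a target framework where Rfc2898DeriveBytes accepts a HashAlgorithmName. The class name LookupHash, the salt bytes, the iteration count and the 16-byte output size are all hypothetical placeholders, not values from the answer above. The folded characters are written out as raw bytes so that the result does not depend on how a text encoding treats unusual code units.

```csharp
using System;
using System.Security.Cryptography;

public static class LookupHash
{
    // Hypothetical settings: pick your own and never change them once data is stored.
    private const int CollisionLength = 5;
    private const int Iterations = 10000;
    private static readonly byte[] FixedSalt =
    {
        0x4c, 0x6f, 0x6f, 0x6b, 0x75, 0x70, 0x53, 0x61,
        0x6c, 0x74, 0x5f, 0x76, 0x31, 0x00, 0x00, 0x01
    };

    // Same XOR folding as CreateCollision, returned as bytes (two per char, high byte first).
    public static byte[] Fold(string value)
    {
        var chars = new char[CollisionLength];
        for (int i = 0; i < value.Length; i++)
        {
            chars[i % chars.Length] ^= value[i];
        }

        var bytes = new byte[chars.Length * 2];
        for (int i = 0; i < chars.Length; i++)
        {
            bytes[2 * i] = (byte)(chars[i] >> 8);
            bytes[2 * i + 1] = (byte)chars[i];
        }
        return bytes;
    }

    // Derives the value to store next to the encrypted string.
    public static string Compute(string value)
    {
        using (var kdf = new Rfc2898DeriveBytes(Fold(value), FixedSalt, Iterations, HashAlgorithmName.SHA256))
        {
            return Convert.ToBase64String(kdf.GetBytes(16));
        }
    }
}
```

With a length of 5, swapping the characters at positions 0 and 5 leaves the folded value unchanged, so "abcdefghij" and "fbcdeaghij" land in the same bucket. To search, compute the lookup value of the search term, select the rows with that value, and decrypt only those.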
I need my app to handle a list of mods from a database and a list of locally downloaded mods that aren't.
Each mod in the database has a unique uint ID that I use to identify it, but local mods don't have any ID.
At first I tried to generate an ID with string.GetHashCode() using the mod's name, but GetHashCode is randomized on each run of the app.
Is there any other way to generate a persistent uint ID from the mod's name?
Current code :
foreach(string mod in localMods)
{
//This way I get a number between 0 and 2147483647
uint newId = Convert.ToUInt32(Math.Abs(mod.GetHashCode()));
ProfileMod newMod = new ProfileMod(newId);
}
The method GetHashCode() is not guaranteed to return the same value for the same string across runs of the application. It has a different purpose (equality checks and hash-table lookups at runtime, etc.).
So, it shouldn't be used as a unique identifier.
If you'd like to calculate the hash and get consistent results, you might consider using the standard hashing algorithms like MD5, SHA256, etc.
Here is a sample that calculates SHA256:
using System;
using System.Security.Cryptography;
using System.Text;
public class Program
{
public static void Main()
{
string input = "Hello World!";
// Using the SHA256 algorithm for the hash.
// NOTE: You can replace it with any other algorithm (e.g. MD5) if you need.
using (var hashAlgorithm = SHA256.Create())
{
// Convert the input string to a byte array and compute the hash.
byte[] data = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(input));
// Create a new Stringbuilder to collect the bytes
// and create a string.
var sBuilder = new StringBuilder();
// Loop through each byte of the hashed data
// and format each one as a hexadecimal string.
for (int i = 0; i < data.Length; i++)
{
sBuilder.Append(data[i].ToString("x2"));
}
// Return the hexadecimal string.
var hash = sBuilder.ToString();
Console.WriteLine($"The SHA256 hash of {input} is: {hash}.");
}
}
}
Although SHA256 produces a longer result than MD5, the risk of collisions is much lower. But if you still want smaller hashes (with a higher risk of collisions), you can use MD5, or even CRC32.
P.S. The sample code is based on the one from the Microsoft's documentation.
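Since the question asks for a uint rather than a hex string, here is a minimal sketch that truncates the SHA256 digest to 32 bits. The names StableId and FromName are hypothetical. The four bytes are combined explicitly so the result does not depend on the machine's byte order.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableId
{
    // Returns the first 32 bits of SHA256(name); the same name gives the same ID on every run.
    public static uint FromName(string name)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(name));
            // Big-endian combination of the first four digest bytes.
            return ((uint)digest[0] << 24)
                 | ((uint)digest[1] << 16)
                 | ((uint)digest[2] << 8)
                 | digest[3];
        }
    }
}
```

Keep in mind that truncating to 32 bits brings collisions back: by the birthday bound, around 77,000 distinct names already give roughly a 50% chance that two of them share an ID.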
So I ended up following your advice and found a good answer in another post, using SHA-1:
private System.Security.Cryptography.SHA1 hash = new System.Security.Cryptography.SHA1CryptoServiceProvider();
private uint GetUInt32HashCode(string strText)
{
if (string.IsNullOrEmpty(strText)) return 0;
//Unicode Encode Covering all characterset
byte[] byteContents = Encoding.Unicode.GetBytes(strText);
byte[] hashText = hash.ComputeHash(byteContents);
uint hashCodeStart = BitConverter.ToUInt32(hashText, 0);
uint hashCodeMedium = BitConverter.ToUInt32(hashText, 8);
uint hashCodeEnd = BitConverter.ToUInt32(hashText, 16);
var hashCode = hashCodeStart ^ hashCodeMedium ^ hashCodeEnd;
return uint.MaxValue - hashCode;
}
Could probably be optimized but it's good enough for now.
I wouldn't trust any solution involving hashing or the like. Eventually you will end up with conflicting IDs, especially if you have a huge number of records in your DB.
What I would prefer to do is cast the int ID from the DB to a string when reading it, and then use something like Guid.NewGuid().ToString() to generate a string UID for the local ones.
This way a conflict is practically impossible.
I guess you will have to employ some strategy of this kind.
C#: generate a random number passing a long as the seed instead of an Int32, because I need to pass phone numbers or account numbers
https://learn.microsoft.com/en-us/dotnet/api/system.random.-ctor?view=netframework-4.8#System_Random__ctor_System_Int32_
Please suggest any reliable NuGet package which does this or any implementation who has already done something like this.
I need to pass the complete PhoneNumber as the seed which I'm able to do in python but not with C# and my code stack is all in C#
using System;
public class Program
{
public static void Main()
{
int seed = 0123456789;
Random random = new Random(seed);
double result = random.NextDouble();
Console.WriteLine(result);
}
}
Some insights on my requirements and what I'm trying to achieve:
1) We're doing this for A/B testing and to do data analysis on the experience of two services.
2) When a request comes in with a phoneNumber, a preset percentage applied to random.NextDouble() determines whether we send the request to service A or service B.
3) For example, say a request comes in and its value falls above 0.5, so we direct it to service A. The next time a request with the same phone number comes in, the value will again be above 0.5 and the request goes to service A, since the seed is a unique hash of the phoneNumber.
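The requirement in points 2 and 3 can be met without Random at all. Here is a minimal sketch that hashes the phone number straight to a value in [0, 1). It swaps in SHA256 for the seeded generator, and the names AbRouter, ToUnitInterval and Route are hypothetical, as is the convention that values below the threshold go to service A.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class AbRouter
{
    // Maps a phone number to a stable value in [0, 1).
    public static double ToUnitInterval(string phoneNumber)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(phoneNumber));

            // Read the first 8 digest bytes as a big-endian 64-bit integer.
            ulong value = 0;
            for (int i = 0; i < 8; i++)
            {
                value = (value << 8) | digest[i];
            }

            // Keep the top 53 bits (what a double can hold exactly) and divide by 2^53.
            return (value >> 11) / 9007199254740992.0;
        }
    }

    // shareForA is the fraction of phone numbers that should reach service A.
    public static string Route(string phoneNumber, double shareForA)
    {
        return ToUnitInterval(phoneNumber) < shareForA ? "A" : "B";
    }
}
```

Because the value depends only on the digits, the same phone number is routed the same way on every request, on every server and across restarts, and the full number is used rather than a 32-bit seed.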
The method GetHashCode() belongs to the Object class and has nothing to do with random number generation. Please read the documentation (https://learn.microsoft.com/en-us/dotnet/api/system.object.gethashcode?view=netframework-4.8). It clearly states that collisions are possible: two different inputs can return the same hash code.
The method HashAlgorithm.ComputeHash (documented here - https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.hashalgorithm.computehash?view=netframework-4.8) calculates the hash for a given value, but it is deterministic, i.e. if the input is the same, the generated output is also the same. I assume this is not the desired output. I have attached the sample code I used to try this.
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
while (true)
{
Console.WriteLine("Enter a 9 digit+ number to calculate hash");
var val = Console.ReadLine();
long target = 0;
bool result = long.TryParse(val,out target);
if (result)
{
var calculatedHash = OutputHash(target);
Console.WriteLine("Calculated hash is : " + calculatedHash);
}
else
{
Console.WriteLine("Incorrect input. Please try again.");
}
}
}
public static string OutputHash(long number)
{
string source = Convert.ToString(number);
string hash;
using (SHA256 sha256Hash = SHA256.Create())
{
hash = GetHash(sha256Hash, source);
Console.WriteLine($"The SHA256 hash of {source} is: {hash}.");
Console.WriteLine("Verifying the hash...");
if (VerifyHash(sha256Hash, source, hash))
{
Console.WriteLine("The hashes are the same.");
}
else
{
Console.WriteLine("The hashes are not same.");
}
}
return hash;
}
private static string GetHash(HashAlgorithm hashAlgorithm, string input)
{
// Convert the input string to a byte array and compute the hash.
byte[] data = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(input));
// Create a new Stringbuilder to collect the bytes
// and create a string.
var sBuilder = new StringBuilder();
// Loop through each byte of the hashed data
// and format each one as a hexadecimal string.
for (int i = 0; i < data.Length; i++)
{
sBuilder.Append(data[i].ToString("x2"));
}
// Return the hexadecimal string.
return sBuilder.ToString();
}
// Verify a hash against a string.
private static bool VerifyHash(HashAlgorithm hashAlgorithm, string input, string hash)
{
// Hash the input.
var hashOfInput = GetHash(hashAlgorithm, input);
// Create a StringComparer and compare the hashes.
StringComparer comparer = StringComparer.OrdinalIgnoreCase;
return comparer.Compare(hashOfInput, hash) == 0;
}
I agree with @Knoop's comment above that you might end up with the same integer mapping to multiple long input values.
If you are looking for a 'pure' random number generator with long value as seed, you don't have a choice but to go for third party libraries (or implementing your own custom algorithm). However, rather than getting into such complexities, simple
Guid g = Guid.NewGuid();
should do the trick (https://learn.microsoft.com/en-us/dotnet/api/system.guid.newguid?view=netframework-4.8).
The documentation (https://learn.microsoft.com/en-gb/windows/win32/api/combaseapi/nf-combaseapi-cocreateguid?redirectedfrom=MSDN) says that even this can end up having collisions, but the chances are minimal.
Finally, this sounds like potential duplicate of .NET unique object identifier
take the hash of the phone number, eg:
var phoneNumber = 123456789L;
var seed = phoneNumber.GetHashCode();
This means that for the same phoneNumber you will get the same sequence. It also means that some different phone numbers will get identical sequences, but the chance of that is slim. And the result might differ between .NET runtimes, as commented, but you might not care.
Not sure why you want to, but I guess there are reasons, e.g. test code.
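To make the "identical sequences" caveat concrete, here is a small sketch that folds the long down to an int seed itself instead of relying on what GetHashCode does internally. The names SeedDemo, Fold and FirstDouble are hypothetical, and XOR-ing the two 32-bit halves is just one simple way to fold.

```csharp
using System;

public static class SeedDemo
{
    // Folds a 64-bit value into 32 bits by XOR-ing its upper and lower halves.
    public static int Fold(long value)
    {
        return unchecked((int)value ^ (int)(value >> 32));
    }

    // First value of the sequence seeded from the folded phone number.
    public static double FirstDouble(long phoneNumber)
    {
        return new Random(Fold(phoneNumber)).NextDouble();
    }
}
```

Any two numbers whose halves XOR to the same result share a seed: 4294967297 (2^32 + 1) folds to 0, exactly like 0 itself, so both produce the same sequence. Only 2^32 seeds exist, so some sharing is unavoidable whenever the input is wider than an int.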
After lots of Google searches I really can't seem to find out how to hash passwords in C# UWP. I've tried BCrypt but that is not available for RT. Any ideas how I can hash my passwords in UWP? This is my first app in UWP, so I thought everything would work like in WPF; it seems I was wrong. I have tried the BCrypt package from NuGet but nothing will run on UWP.
I just need a simple way to hash and/or salt a string and a simple way to validate the hash.
How about this approach(using System.Security.Cryptography):
To store user passwords in the database in a way that they cannot be extracted, the passwords need to be hashed using a one-way hashing algorithm such as SHA1.
To do so, use the RNGCryptoServiceProvider to create a random salt, append the salt to the password, hash it using the SHA1CryptoServiceProvider class, and store the resulting string in the database along with the salt.
The benefit of using a salted password is that it makes a lookup-table-assisted dictionary attack against the stored values impractical, provided the salt is large enough.
Sample Code:
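// Salt size in bytes. Assumed value: saltLength is not defined in the snippet as posted.
private const int saltLength = 16;
// Create salted password to save in database.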
private byte [] CreateDbPassword(byte[] unsaltedPassword)
{
//Create a salt value.
byte[] saltValue = new byte[saltLength];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(saltValue);
return CreateSaltedPassword(saltValue, unsaltedPassword);
}
// Create a salted password given the salt value.
private byte[] CreateSaltedPassword(byte[] saltValue, byte[] unsaltedPassword)
{
// Add the salt to the hash.
byte[] rawSalted = new byte[unsaltedPassword.Length + saltValue.Length];
unsaltedPassword.CopyTo(rawSalted,0);
saltValue.CopyTo(rawSalted,unsaltedPassword.Length);
//Create the salted hash.
SHA1 sha1 = SHA1.Create();
byte[] saltedPassword = sha1.ComputeHash(rawSalted);
// Add the salt value to the salted hash.
byte[] dbPassword = new byte[saltedPassword.Length + saltValue.Length];
saltedPassword.CopyTo(dbPassword,0);
saltValue.CopyTo(dbPassword,saltedPassword.Length);
return dbPassword;
}
// Compare the hashed password against the stored password.
private bool ComparePasswords(byte[] storedPassword, byte[] hashedPassword)
{
if (storedPassword == null || hashedPassword == null || hashedPassword.Length != storedPassword.Length - saltLength)
return false;
// Get the saved saltValue.
byte[] saltValue = new byte[saltLength];
int saltOffset = storedPassword.Length - saltLength;
for (int i = 0; i < saltLength; i++)
saltValue[i] = storedPassword[saltOffset + i];
byte[] saltedPassword = CreateSaltedPassword(saltValue, hashedPassword);
// Compare the values.
return CompareByteArray(storedPassword, saltedPassword);
}
// Compare the contents of two byte arrays.
private bool CompareByteArray(byte[] array1, byte[] array2)
{
if (array1.Length != array2.Length)
return false;
int mismatch = 0;
for (int i = 0; i < array1.Length; i++)
{
mismatch |= array1[i] ^ array2[i];
}
return mismatch == 0;
}
MSDN: https://msdn.microsoft.com/en-us/library/aa288534(v=vs.71).aspx
UPDATE
For UWP apps you need to use the namespace Windows.Security.Cryptography.Core:
public String SampleDeriveFromPbkdf(
String strAlgName,
UInt32 targetSize)
{
// Open the specified algorithm.
KeyDerivationAlgorithmProvider objKdfProv = KeyDerivationAlgorithmProvider.OpenAlgorithm(strAlgName);
// Create a buffer that contains the secret used during derivation.
String strSecret = "MyPassword";
IBuffer buffSecret = CryptographicBuffer.ConvertStringToBinary(strSecret, BinaryStringEncoding.Utf8);
// Create a random salt value.
IBuffer buffSalt = CryptographicBuffer.GenerateRandom(32);
// Specify the number of iterations to be used during derivation.
UInt32 iterationCount = 10000;
// Create the derivation parameters.
KeyDerivationParameters pbkdf2Params = KeyDerivationParameters.BuildForPbkdf2(buffSalt, iterationCount);
// Create a key from the secret value.
CryptographicKey keyOriginal = objKdfProv.CreateKey(buffSecret);
// Derive a key based on the original key and the derivation parameters.
IBuffer keyDerived = CryptographicEngine.DeriveKeyMaterial(
keyOriginal,
pbkdf2Params,
targetSize);
// Encode the key to a hexadecimal value (for display)
String strKeyHex = CryptographicBuffer.EncodeToHexString(keyDerived);
// Return the encoded string
return strKeyHex;
}
After trying all of the above answers (thanks a lot for helping, though) I decided to go with MD5 hashing. Even though I know it's a really, really weak hash and has no salt, it's fine for my needs. I hash my passwords in UWP with the following code:
private static string ComputeMD5(string str)
{
var alg = HashAlgorithmProvider.OpenAlgorithm(HashAlgorithmNames.Md5);
IBuffer buff = CryptographicBuffer.ConvertStringToBinary(str, BinaryStringEncoding.Utf8);
var hashed = alg.HashData(buff);
var res = CryptographicBuffer.EncodeToHexString(hashed);
return res;
}
I have a very simple library on GitHub to do just that which is much better than using SHA directly. It uses MS PBKDF2 inside via Rfc2898DeriveBytes (no fancy invented home-brew algorithms in there) and is as easy to use as BCrypt.
I have a nuget package as well, but I suppose you'll need to compile it yourself to use with UWP.
Example usage:
ISimpleHash simpleHash = new SimpleHash();
// Creating a user hash, hashedPassword can be stored in a database
// hashedPassword contains the number of iterations and salt inside it similar to bcrypt format
string hashedPassword = simpleHash.Compute("Password123");
// Validating user's password by first loading it from database by username
string storedHash = _repository.GetUserPasswordHash(username);
bool isPasswordValid = false;
if (storedHash != null)
{
isPasswordValid = simpleHash.Verify("Password123", storedHash);
}
P.S. I guess I can even compile it to use with UWP target for nuget if it works for you and I figure out how to do that :)
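For readers who want to see roughly what such a wrapper does, here is a minimal sketch built directly on Rfc2898DeriveBytes. This is not the library's code: the class name PasswordHasher, the "iterations.salt.key" storage format and the parameter values are all assumptions for illustration, and it presumes a target where Rfc2898DeriveBytes with a HashAlgorithmName is available.

```csharp
using System;
using System.Security.Cryptography;

public static class PasswordHasher
{
    // Hypothetical parameters; tune the iteration count for your hardware.
    private const int SaltSize = 16;
    private const int KeySize = 32;
    private const int Iterations = 100000;

    // Produces "iterations.salt.key" with salt and key Base64-encoded.
    public static string Hash(string password)
    {
        byte[] salt = new byte[SaltSize];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt);
        }
        byte[] key = Derive(password, salt, Iterations, KeySize);
        return Iterations + "." + Convert.ToBase64String(salt) + "." + Convert.ToBase64String(key);
    }

    // Re-derives the key with the stored salt and iteration count, then compares.
    public static bool Verify(string password, string stored)
    {
        string[] parts = stored.Split('.');
        int iterations;
        if (parts.Length != 3 || !int.TryParse(parts[0], out iterations) || iterations <= 0)
        {
            return false;
        }
        byte[] salt = Convert.FromBase64String(parts[1]);
        byte[] expected = Convert.FromBase64String(parts[2]);
        byte[] actual = Derive(password, salt, iterations, expected.Length);

        // Compare without exiting early so timing does not reveal the mismatch position.
        int mismatch = 0;
        for (int i = 0; i < expected.Length; i++)
        {
            mismatch |= expected[i] ^ actual[i];
        }
        return mismatch == 0;
    }

    private static byte[] Derive(string password, byte[] salt, int iterations, int size)
    {
        using (var kdf = new Rfc2898DeriveBytes(password, salt, iterations, HashAlgorithmName.SHA256))
        {
            return kdf.GetBytes(size);
        }
    }
}
```

Storing the iteration count and salt inside the string means the count can be raised later without invalidating existing hashes, which is the same idea as the bcrypt-style format mentioned above.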
I need to hash many strings to 32bit (uint).
Can I just use MD5 or SHA1 and take 4 bytes from it? Or are there better alternatives?
There is no need for security or to care if one is cracked and so on.
I just need a hash that is fast and uniform over 32 bits. MD5 and SHA1 should be uniform.
But are there better (faster) built-in alternatives I could use? If not, which of the two would you use?
Here someone asked which one is better, but not for alternatives and there was a security matter (I don't care for security):
How to Use SHA1 or MD5 in C#?(Which One is Better in Performance and Security for Authentication)
Do you need a cryptographic-strength hash? If all you need is 32 bits I bet not.
Try the Fowler-Noll-Vo hash. It's fast, has good distribution and avalanche effect, and is generally acceptable for hashtables, checksums etc:
public static uint To32BitFnv1aHash(this string toHash,
bool separateUpperByte = false)
{
IEnumerable<byte> bytesToHash;
if (separateUpperByte)
bytesToHash = toHash.ToCharArray()
.Select(c => new[] { (byte)((c - (byte)c) >> 8), (byte)c })
.SelectMany(c => c);
else
bytesToHash = toHash.ToCharArray()
.Select(Convert.ToByte);
//this is the actual hash function; very simple
uint hash = FnvConstants.FnvOffset32;
foreach (var chunk in bytesToHash)
{
hash ^= chunk;
hash *= FnvConstants.FnvPrime32;
}
return hash;
}
public static class FnvConstants
{
public static readonly uint FnvPrime32 = 16777619;
public static readonly ulong FnvPrime64 = 1099511628211;
public static readonly uint FnvOffset32 = 2166136261;
public static readonly ulong FnvOffset64 = 14695981039346656037;
}
This is really useful for creating semantically equatable hashes for GetHashCode, based on a string digest of each object (a custom ToString() or otherwise). You can overload this to take any IEnumerable<byte>, making it suitable for checksumming stream data etc. If you ever need a 64-bit hash (ulong), just copy the function and replace the constants used with the 64-bit constants. Oh, one more thing: the hash (as most do) relies on unchecked integer overflow; never run this hash in a "checked" block, or it is virtually guaranteed to throw exceptions.
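A minimal sketch of the IEnumerable<byte> overload mentioned above, with the arithmetic wrapped in an explicit unchecked block so it behaves the same even when the project is compiled with overflow checking. The class name Fnv1a and the choice of UTF-8 for the string overload are assumptions; the constants are the 32-bit ones from FnvConstants.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Fnv1a
{
    private const uint OffsetBasis32 = 2166136261;
    private const uint Prime32 = 16777619;

    // FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
    public static uint Hash32(IEnumerable<byte> data)
    {
        uint hash = OffsetBasis32;
        foreach (byte b in data)
        {
            unchecked
            {
                hash ^= b;
                hash *= Prime32;
            }
        }
        return hash;
    }

    // Convenience overload: hashes the UTF-8 bytes of the string.
    public static uint Hash32(string text)
    {
        return Hash32(Encoding.UTF8.GetBytes(text));
    }
}
```

Hashing the UTF-8 bytes sidesteps the question of how to split a char into bytes, at the cost of producing different values from the extension method above for non-ASCII input.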
If security does not play a role, generating a hash with a cryptographic hash function (such as MD5 or SHA1) and taking 4 bytes from it works. But they are slower than various non-cryptographic hash functions, as these functions are primarily designed for security, not speed.
Have a look at non-cryptographic hash functions such as FNV or Murmur.
Non-Cryptographic Hash Function Zoo
Performance Graphs
MurMurHash3, an ultra fast hash algorithm for C# / .NET
Edit: The floodyberry.com domain is now registered by a domain parking service - removed dead links
The easiest and yet good algorithm for strings is as follows:
int Hash(string str)
{
int res = 0;
for(int i = 0; i < str.Length; i++)
{
res += (i * str[i]) % int.MaxValue;
}
return res;
}
Obviously, this is absolutely not a secure hash algorithm, but it is fast (really fast), returns 32 bits and, as far as I know, is uniform (I've tried it in many algorithmic challenges with good results).
Not for hashing passwords or any sensitive data.
I am having a problem with hash collisions using short strings in .NET4.
EDIT: I am using the built-in string hashing function in .NET.
I'm implementing a cache using objects that store the direction of a conversion like this
public class MyClass
{
private string _from;
private string _to;
// More code here....
public MyClass(string from, string to)
{
this._from = from;
this._to = to;
}
public override int GetHashCode()
{
return string.Concat(this._from, this._to).GetHashCode();
}
public bool Equals(MyClass other)
{
return this.To == other.To && this.From == other.From;
}
public override bool Equals(object obj)
{
if (obj == null) return false;
if (this.GetType() != obj.GetType()) return false;
return Equals(obj as MyClass);
}
}
This is direction dependent and the from and to are represented by short strings like "AAB" and "ABA".
I am getting sparse hash collisions with these small strings. I have tried something simple like adding a salt (did not work), and I have gone heavy duty like implementing MD5 (which works, but is MUCH slower).
The problem is that too many of my small strings like "AABABA" collide with their reverse, such as "ABAAAB". (Note that these are not real examples; I have no idea if AAB and ABA actually cause collisions!)
I have also implemented the suggestion from Jon Skeet here:
Should I use a concatenation of my string fields as a hash code?
This works but I don't know how dependable it is with my various 3-character strings.
How can I improve and stabilize the hashing of small strings without adding too much overhead like MD5?
EDIT: In response to a few of the answers posted... the cache is implemented using concurrent dictionaries keyed on MyClass as stubbed out above. If I replace the GetHashCode in the code above with something simple like @JonSkeet's code from the link I posted:
int hash = 17;
hash = hash * 23 + this._from.GetHashCode();
hash = hash * 23 + this._to.GetHashCode();
return hash;
Everything functions as expected.
It's also worth noting that in this particular use-case the cache is not used in a multi-threaded environment so there is no race condition.
EDIT: I should also note that this misbehavior is platform dependent. It works as intended on my fully updated Win7x64 machine but does not behave properly on a non-updated Win7x64 machine. I don't know the extent of the missing updates, but I know it doesn't have Win7 SP1... so I would assume there may also be a framework SP or update it's missing as well.
EDIT: As suggested, my issue was not caused by a problem with the hashing function. I had an elusive race condition, which is why it worked on some computers but not others, and also why a "slower" hashing method made things work properly. The answer I selected was the most useful in understanding why my problem was not hash collisions in the dictionary.
Are you sure that collisions are what causes the problems? When you say
I finally discovered what was causing this bug
do you mean some slowness of your code, or something else? If not, I'm curious what kind of problem it is, because any hash function (except "perfect" hash functions on limited domains) will cause collisions.
I put together a quick piece of code to check for collisions among 3-letter words, and it doesn't report a single collision. You see what I mean? It looks like the built-in hash algorithm is not so bad.
Dictionary<int, bool> set = new Dictionary<int, bool>();
char[] buffer = new char[3];
int count = 0;
for (int c1 = (int)'A'; c1 <= (int)'z'; c1++)
{
buffer[0] = (char)c1;
for (int c2 = (int)'A'; c2 <= (int)'z'; c2++)
{
buffer[1] = (char)c2;
for (int c3 = (int)'A'; c3 <= (int)'z'; c3++)
{
buffer[2] = (char)c3;
string str = new string(buffer);
count++;
int hash = str.GetHashCode();
if (set.ContainsKey(hash))
{
Console.WriteLine("Collision for {0}", str);
}
set[hash] = false;
}
}
}
Console.WriteLine("Generated {0} of {1} hashes", set.Count, count);
While you could pick almost any of the well-known hash functions (as David mentioned), or even choose a "perfect" hash since it looks like your domain is limited (like a minimal perfect hash), it would be great to understand whether collisions are really the source of the problems.
Update
What I want to say is that the .NET built-in hash function for strings is not so bad. It doesn't give so many collisions that you would need to write your own algorithm in regular scenarios. And this doesn't depend on the length of the strings. Having a lot of 6-symbol strings doesn't imply that your chances of seeing a collision are higher than with 1000-symbol strings. This is one of the basic properties of hash functions.
And again, another question is what kind of problems you experience because of collisions. All built-in hashtables and dictionaries support collision resolution. So I would say all you can see is just... probably some slowness. Is this your problem?
As for your code
return string.Concat(this._from, this._to).GetHashCode();
This can cause problems. Because on every hash code calculation you create a new string. Maybe this is what causes your issues?
int hash = 17;
hash = hash * 23 + this._from.GetHashCode();
hash = hash * 23 + this._to.GetHashCode();
return hash;
This would be a much better approach, simply because you don't create new objects on the heap. Actually it's one of the main points of this approach: get a good hash code for an object with a complex "key" without creating new objects. So if you don't have a single-value key, then this should work for you. BTW, this is not a new hash function; it is just a way to combine existing hash values without compromising the main properties of hash functions.
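Put together, a key class using this combining pattern might look like the sketch below. The name ConversionKey is hypothetical (it stands in for MyClass), the fields are compared directly rather than through the To/From properties that the question elides, and the unchecked block is added so the multiplication can wrap safely under any compiler setting.

```csharp
using System;

public sealed class ConversionKey : IEquatable<ConversionKey>
{
    private readonly string _from;
    private readonly string _to;

    public ConversionKey(string from, string to)
    {
        if (from == null) throw new ArgumentNullException("from");
        if (to == null) throw new ArgumentNullException("to");
        _from = from;
        _to = to;
    }

    public bool Equals(ConversionKey other)
    {
        return other != null && _from == other._from && _to == other._to;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ConversionKey);
    }

    // Combines the two field hashes without allocating a concatenated string.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + _from.GetHashCode();
            hash = hash * 23 + _to.GetHashCode();
            return hash;
        }
    }
}
```

Equal keys always produce equal hash codes, which is the only hard requirement a dictionary places on GetHashCode; unequal keys may still collide occasionally, and the dictionary resolves that through Equals.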
Any common hash function should be suitable for this purpose. If you're getting collisions on short strings like that, I'd say you're using an unusually bad hash function. You can use Jenkins or Knuth's with no issues.
Here's a very simple hash function that should be adequate. (Implemented in C, but should easily port to any similar language.)
unsigned int hash(const char *it)
{
unsigned hval=0;
while(*it!=0)
{
hval+=*it++;
hval+=(hval<<10);
hval^=(hval>>6);
hval+=(hval<<3);
hval^=(hval>>11);
hval+=(hval<<15);
}
return hval;
}
Note that if you want to trim the bits of the output of this function, you must use the least significant bits. You can also use mod to reduce the output range. The last character of the string tends to only affect the low-order bits. If you need a more even distribution, change return hval; to return hval * 2654435761U;.
Update:
public override int GetHashCode()
{
return string.Concat(this._from, this._to).GetHashCode();
}
This is broken. It treats from="foot",to="ar" as the same as from="foo",to="tar". Since your Equals function doesn't consider those equal, your hash function should not. Possible fixes include:
1) Form the string from,"XXX",to and hash that. (This assumes the string "XXX" almost never appears in your input strings.)
2) Combine the hash of 'from' with the hash of 'to'. You'll have to use a clever combining function. For example, XOR or sum will cause from="foo",to="bar" to hash the same as from="bar",to="foo". Unfortunately, choosing the right combining function is not easy without knowing the internals of the hashing function. You can try:
int hc1=from.GetHashCode();
int hc2=to.GetHashCode();
return (hc1<<7)^(hc2>>25)^(hc1>>21)^(hc2<<11);
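The ambiguity of plain concatenation is easy to demonstrate. The sketch below also shows a variation on fix 1 that prefixes the length of 'from' instead of inserting a separator, which avoids having to assume that "XXX" never appears in the input. The names ConcatDemo, Naive and Delimited are hypothetical, and the length prefix is a swapped-in technique, not something proposed above.

```csharp
using System;

public static class ConcatDemo
{
    // What the original GetHashCode hashes: the boundary between the two parts is lost.
    public static string Naive(string from, string to)
    {
        return string.Concat(from, to);
    }

    // Prefixing the length of 'from' makes the boundary recoverable, so distinct pairs give distinct strings.
    public static string Delimited(string from, string to)
    {
        return from.Length + ":" + from + to;
    }
}
```

Naive("foot", "ar") and Naive("foo", "tar") are both "footar" and therefore always share a hash code, while Delimited gives "4:footar" and "3:footar". Note that this still allocates a string on every call, so combining the two hash codes as in fix 2 remains the cheaper option.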