EDIT: Sorry I forgot to mention, I'm not using the implemented sha512 crypt because as far as I can tell it doesn't involve a salt value or a specified number of rounds to compute the hash with.
Okay so I'm coding the sha-512 crypt in c# and I'm following the steps found here...
http://people.redhat.com/drepper/SHA-crypt.txt
This is my first time doing anything encryption related so I want to make sure I'm understanding the steps correctly... I don't understand C code well enough to do a direct translation from C to C# :/
I have assumed finishing a digest is the same as computing the hash. In this case, I've also assumed that when the steps refer to a finished digest, they are referring to the computed hash, rather than the pre-hash computed digest bytes. Correct me if I'm wrong please!
Assuming everything has been done correctly for steps 1-8, my doubts start at step 9
9. For each block of 32 or 64 bytes in the password string (excluding
the terminating NUL in the C representation), add digest B to digest A
Since I'm using SHA-512, I have block sizes of 64 bytes.
Would the following code produce the desired result?
//FYI, temp = digestA from steps 1-3 (before expanding digestA for step 9)
//alt_result = computed digestB hash (64 byte hash)
for (cnt = key.Length; cnt > 64; cnt -= 64) //9
{
    int i = 0;
    ctx.TransformBlock(alt_result, 0, 64, digestA, temp.Length + 64 * i);
    i++;
}
If anyone can clarify that what I've stated is correct, I would appreciate it. Thanks!
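For anyone comparing against the linked spec, here is a minimal sketch of how steps 9 and 10 could look in C#, assuming .NET's IncrementalHash stands in for the running digest A context. The class and method names are made up for illustration, and step 10 (feeding the first remaining bytes of digest B) is taken from my reading of the linked text, not from the question. Note that in the loop above `i` is reset to 0 on every pass; with a streaming context no offset bookkeeping is needed at all, because each call simply appends to digest A.

```csharp
using System;
using System.Security.Cryptography;

static class ShaCryptStep9
{
    // Hypothetical helper: ctxA is the still-open digest A context,
    // digestB is the finished 64-byte digest B, keyLength is the
    // password length in bytes.
    public static void AddDigestB(IncrementalHash ctxA, byte[] digestB, int keyLength)
    {
        int cnt = keyLength;

        // Step 9: one full copy of digest B per full 64-byte block of the password.
        for (; cnt > 64; cnt -= 64)
        {
            ctxA.AppendData(digestB, 0, 64);
        }

        // Step 10 (as I read the spec): the first cnt bytes of digest B
        // cover the remaining bytes of the password.
        ctxA.AppendData(digestB, 0, cnt);
    }
}
```

The context would be created with IncrementalHash.CreateHash(HashAlgorithmName.SHA512) and finished later with GetHashAndReset(), once the remaining steps have added their data.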
Salting is as simple as appending a fixed byte string on the end of your input string. Essentially providing a known "homegrown" transform to your input.
About the algorithm itself: you seem to be starting at a disadvantage. As a neophyte, you're making a lot of "assumptions" about basic cryptographic terminology that themselves need clarification. If the CLR implementation won't work for you, I think your time would be better spent finding a good C implementation and figuring out how to integrate with that. Figuring out the interop (extern) calls to that will be far easier than diving into the intricacies of cryptography, the results will be more efficient, and the knowledge you gain about native interop will be far more useful/reusable.
I'll add some important clarification for others who might come across this later.
First:
SHA512 and SHA512Crypt are two distinct algorithms for two different purposes. SHA512 is a general purpose hashing algorithm (see this). SHA512Crypt is a password storage or password based key derivation algorithm that uses SHA512 (hash) internally (see this). SHA512Crypt is based on the earlier Crypt function that used MD5 instead of SHA512.
The password storage/key generation algorithms have been specifically created to make brute forcing orders of magnitude more expensive. The typical way this is done is by iterating over the underlying hash algorithm in some fashion. However, you don't want to do this yourself... which brings us to...
Second:
Do NOT write your own cryptography methods. (see this) There are tons of ways to screw it up, even if you know exactly what you are doing.
If you don't want to use the built-in Rfc2898DeriveBytes because it is based on SHA1, then you could look at bcrypt or some other public, reviewed implementation of a known cryptographic algorithm.
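As a side note for later readers: newer frameworks let you pick the hash that PBKDF2 runs on, so the SHA1 objection no longer has to apply. Below is a minimal sketch, assuming .NET 6 or later (for the static Rfc2898DeriveBytes.Pbkdf2 and RandomNumberGenerator.GetBytes helpers). This is PBKDF2 with HMAC-SHA512, not SHA512Crypt, so it will not produce crypt(3)-compatible strings; the class name, salt size and iteration count are illustrative assumptions.

```csharp
using System;
using System.Security.Cryptography;

static class PasswordHasher
{
    const int SaltSize = 16;        // 128-bit random salt (assumed size)
    const int HashSize = 64;        // 512-bit derived key
    const int Iterations = 100_000; // illustrative only; tune for your hardware

    // Derives a salted hash for storage; store both salt and hash.
    public static (byte[] Salt, byte[] Hash) Hash(string password)
    {
        byte[] salt = RandomNumberGenerator.GetBytes(SaltSize);
        byte[] hash = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, Iterations, HashAlgorithmName.SHA512, HashSize);
        return (salt, hash);
    }

    // Recomputes the hash with the stored salt and compares in constant time.
    public static bool Verify(string password, byte[] salt, byte[] expected)
    {
        byte[] actual = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, Iterations, HashAlgorithmName.SHA512, expected.Length);
        return CryptographicOperations.FixedTimeEquals(actual, expected);
    }
}
```

Usage would be something like `var (salt, hash) = PasswordHasher.Hash("hunter2");` at registration and `PasswordHasher.Verify(input, salt, hash)` at login.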
I have a unique situation where I need to produce hashes on the fly. Here is my situation. This question is related to here. I need to store many URLs in the database, and they need to be indexed. A URL can be over 2000 characters long. The database complains that a string over 900 bytes cannot be indexed. My solution is to hash the URL using MD5 or SHA256. I am not sure which hashing algorithm to use. Here are my requirements
Shortest character length with minimal collision
Needs to be very fast. I will be hashing the referurl on every page request
Collisions need to be minimized since I may have millions of urls in the database
I am not worried about security. I am worried about character length, speed, and collisions. Anyone know of a good algorithm for this?
In your case, I wouldn't use any of the cryptographic hash functions (i.e. MD5, SHA), since they were designed with security in mind: they mainly want to make it as hard as possible to find two different strings with the same hash. I think this wouldn't be a problem in your case. (The possibility of random collisions is inherent to hashing, of course.)
I'd strongly advise against using String.GetHashCode(), since the implementation is not documented and MSDN says that it might vary between different versions of the framework. Even the results between x86 and x64 versions may be different. So you'll run into trouble when trying to access the same database using a newer (or different) version of the .NET framework.
I found the algorithm for the Java implementation of hashCode on Wikipedia (here), it seems quite easy to implement. Even a straightforward implementation would be faster than an implementation of MD5 or SHA imo. You could also use long values which reduces the probability of collisions.
There is also a short analysis of the .NET GetHashCode implementation here (not the algorithm itself but some implementation details), you could also use this one I guess. (or try to implement the Java version in a similar way ...)
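To make the Java-style suggestion concrete, here is a minimal sketch of that polynomial hash in C#, assuming the usual Java definition (h = 31 * h + c over the UTF-16 code units). The class and method names are made up. Unlike String.GetHashCode(), the result depends only on the string contents, so it is stable across framework versions; the 64-bit variant is the "use long values" idea from above.

```csharp
static class JavaStyleHash
{
    // 32-bit variant: matches the formula Java documents for String.hashCode().
    public static int Hash32(string s)
    {
        int h = 0;
        unchecked // overflow is expected and simply wraps around
        {
            foreach (char c in s)
            {
                h = 31 * h + c;
            }
        }
        return h;
    }

    // 64-bit variant: same recurrence, wider accumulator, fewer collisions.
    public static long Hash64(string s)
    {
        long h = 0;
        unchecked
        {
            foreach (char c in s)
            {
                h = 31 * h + c;
            }
        }
        return h;
    }
}
```

Keep in mind this is not collision resistant against someone deliberately crafting URLs, which the question says is not a concern.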
a quick one :
URLString.GetHashCode().ToString("x")
While both MD5 and SHA1 have been proved ineffective where collision prevention is essential I suspect for your application either would be sufficient. I don't know for sure but I suspect that MD5 would be the simpler and quicker of the two algorithms.
Use the System.Security.Cryptography.SHA1Cng class, I would suggest. It's 160 bits or 20 bytes long, so that should definitely be small enough. If you need it to be a string, it will only require 40 characters, so that should suit your needs well. It should also be fast enough, and as far as I know, no collisions have yet been found.
I'd personally use String.GetHashCode(). This is the basic hash function. I honestly have no idea how it performs compared to other implementations but it should be fine.
Either of the two hashing functions that you name should be quick enough that you won't notice much difference between them. Unless this site requires ultra-high performance I would not worry too much about them. I'd personally probably go for MD5. Its 16-byte output can be formatted as a hexadecimal string in 32 characters or as a base 64 string in 24 characters.
The reason I'd go for MD5 is because you are very unlikely to run into collisions and even if you do you can structure your queries with "where urlhash = #hash and url = #url". The database engine should work out that one is indexed and the other isn't and use that information to do a sensible search.
If there are collisions, the indexed scan on urlhash will return a handful of results which will be easy to do text comparisons on to get the right one. This is unlikely to be relevant very often though. You've got pretty low chances of getting collisions this way.
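A minimal sketch of producing the value for the urlhash column, assuming the URL is encoded as UTF-8 before hashing (the encoding and the class name are my assumptions; whatever you pick must stay the same for inserts and lookups):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class UrlHash
{
    // Returns the MD5 digest of the URL as 32 lowercase hex characters.
    public static string Md5Hex(string url)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(url));
            // BitConverter yields "D4-1D-8C-..."; strip the dashes.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

The result always fits a fixed 32-character column, which is comfortably under the 900-byte index limit mentioned in the question.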
Reflected source code of GetHashCode function in .net 4.0
public override unsafe int GetHashCode()
{
    fixed (char* str = ((char*) this))
    {
        char* chPtr = str;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*) chPtr;
        for (int i = this.Length; i > 0; i -= 4)
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            if (i <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            numPtr += 2;
        }
        return (num + (num2 * 0x5d588b65));
    }
}
There are O(n) simple operations (+, <<, ^) and one multiplication, so this is very fast.
I've tested this function on a database of 3 million strings with lengths up to 256 characters, and about 97% of the strings had no collision. (At most 5 strings shared the same hash.)
You may want to look at the following project:
CMPH - C Minimal Perfect Hashing Library
And check out the following hot topics listing for perfect hashes:
Hottest 'perfect-hash' Answers - Stack Overflow
You could also consider using a full text index in SQL rather than hashing:
CREATE FULLTEXT INDEX (Transact-SQL)
What is a good password hashing algorithm to use from C#, if later on you may know that there are going to be Java and Ruby programs that may also need to 'login'/authenticate a user. Anything out of the box in .NET that translates well to other languages and is easy to use.
The strongest cryptographic hash algorithm which NSA/NIST has standardized on is SHA-512.
Be sure to use a per-password random salt (a 128-bit salt generated by a cryptographically strong random number generator is good). Or, even better, be sure to use a per-password random key (again generated by a cryptorandom), and use HMAC-SHA-512. Be sure to use multiple iterations - 4096 and 65,536 are good round numbers (2^12 and 2^16).
let h = get_hash_function("SHA-512")
let k = get_key_for_user("Justice")
let hmac = get_hmac(h, k)
let test = get_bytes("utf-8", http_request_params["password"])
for(i in 0 .. (2^16 - 1))
    let test = run_hmac(hmac, test)
return test == get_hashed_password_for_user("Justice")
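The pseudocode above translates fairly directly to .NET. Here is a rough sketch, assuming the per-user key and the stored hash are loaded elsewhere (the class and method names are hypothetical, and the final comparison uses CryptographicOperations.FixedTimeEquals from .NET Core 2.1 or later instead of a plain ==). It mirrors the hand-rolled loop for illustration only; a reviewed construction such as PBKDF2 is usually the safer choice.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class IteratedHmac
{
    // Repeatedly applies HMAC-SHA-512 with the per-user key,
    // starting from the UTF-8 bytes of the password.
    public static byte[] Derive(byte[] userKey, string password, int rounds)
    {
        using (var hmac = new HMACSHA512(userKey))
        {
            byte[] test = Encoding.UTF8.GetBytes(password);
            for (int i = 0; i < rounds; i++)
            {
                test = hmac.ComputeHash(test);
            }
            return test;
        }
    }

    // Constant-time comparison against the stored value.
    public static bool Matches(byte[] candidate, byte[] stored)
    {
        return CryptographicOperations.FixedTimeEquals(candidate, stored);
    }
}
```

With 65,536 rounds, as suggested above, a login check would call `IteratedHmac.Derive(key, password, 1 << 16)` and pass the result to `Matches`.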
Correct use of MD5 with a salt added makes rainbow tables and brute force quite expensive. So the comment about using MD5 is pretty valid.
I think MD5 is the most common one.
I have a little problem where need to do a hash of a number of about 10 digits into a number of 6 digits. The hash needs to be deterministic.
It's more important that the hash is not resource intensive.
For example, say that I have some number, x, like 123456789
I want to write a hash function that gives me a number, y, back like 987654.
I'd then like to have a function that takes the x and y as parameters, re-applies the hash on x, and checks that the result is y.
It should be difficult to compute possible input values given the hash.
My first idea of multiplying pairs of digits led to a lot of duplicate hashed values.
I have the feeling that this sort of problem has some kind of elegant solution, but I just can't think of it myself.
Can anyone help me out here? Thanks in advance :)
What you need is called "hashing".
Try CRC16.
Your problem as stated is not solvable.
You say that you want the system to be "somewhat hard to break", by which I assume you mean that it is "somewhat hard" for an attacker to take a known digest and produce from it a possible input which hashes to the given digest. Since there are only 4 billion possible inputs and only 65536 possible hashes in the system you propose, it is utterly trivial to find a message that corresponds to a given hash, no matter what the hash algorithm is. On average, the attacker will have about 65000 possible messages to choose from, and can therefore cherry-pick the message that best serves his nefarious scheme.
I would expect a "somewhat hard" problem in the hash-breaking space to require dedicating, say, a few million dollars' worth of supercomputer time to break. Your proposal can be broken by inexperienced high school students writing Javascript programs that take a couple of minutes to write and maybe a minute to run, tops; this is not even vaguely close to "somewhat hard".
Why are you choosing such tiny limits on your algorithm, limits which will by their very nature make it trivial to break the hashing? And for that matter, what's the value in hashing such a tiny amount of data as a 32 bit integer?
(( X>>16) ^ (X)) & 0xFFFF
.......
What you want to do is to try to distribute the hash values as evenly as possible over the range. Some of the built in hashing methods are fairly good at this, so you could perhaps try something like getting the hash code of the string representation, and simply throw away half of the bits:
ushort code = (ushort)value.ToString().GetHashCode();
However, it also depends on what you are going to use the hash code for. The built in hash codes are not intended to be stored permanently. The algorithms for calculating the hash codes can change with any new version of the framework, so if you store the hash codes in the database they may become useless in the future. In that case you would instead have to create the hashing algorithm yourself from scratch, or use some hashing algorithm that was designed for permanent storage.
One simple algorithm that is used for hash codes for some values in the framework is to use exclusive or to make all bits in the value matter when the hash code is smaller than the data:
byte[] b = BitConverter.GetBytes(value);
ushort code = (ushort)(BitConverter.ToUInt16(b, 0) ^ BitConverter.ToUInt16(b, 2));
or the more efficient but less obvious way to do the same:
ushort code = (ushort)((value >> 16) ^ value);
This of course has no obfuscating properties for small values, so you might want to throw in some "random" bits to make the hash code significantly different from the value:
ushort code = (ushort)(0x56D4 ^ (value >> 16) ^ value);
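Putting the last variant together with the "re-apply and check" function the question asks for, a minimal sketch could look like this. It assumes the input fits in a 32-bit unsigned integer, as the snippets above do (a full 10-digit number can exceed that), and the names are made up. As other answers point out, with only 65536 outputs this is obfuscation at best, not something that is hard to invert.

```csharp
static class FoldHash
{
    // Constant taken from the snippet above; any fixed 16-bit value works.
    const ushort Mask = 0x56D4;

    // Folds the high and low 16-bit halves together and mixes in the constant.
    public static ushort Hash(uint value)
    {
        return unchecked((ushort)(Mask ^ (value >> 16) ^ value));
    }

    // Re-applies the hash on x and checks that the result is y.
    public static bool Verify(uint x, ushort y)
    {
        return Hash(x) == y;
    }
}
```

Because the function is deterministic and uses no framework hash codes, stored values stay valid across .NET versions.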
How about just discarding the lower 16 bits or last 4 digits?
1234567890 --> 123456
Easily done by just doing an integer division by 10000.
I have an unencrypted/unencoded string - "565040574". I also have the encrypted/encoded string for this string - "BSubW2AUWrSCL7dk9ucoiA==".
It looks like this string has been Base64ed after encryption, but I don't know which encryption algorithm has been used. If I convert "BSubW2AUWrSCL7dk9ucoiA==" string to bytes using Convert.FromBase64String("BSubW2AUWrSCL7dk9ucoiA=="), I get 16 bytes.
Is there anything using which I can know what type of encryption has been used to encrypt the "565040574" to "BSubW2AUWrSCL7dk9ucoiA=="?
No, there is nothing to tell you how it was encrypted. If you don't have the key to decrypt it then you will be out of luck anyway.
If the plan was to save this to a file or send it in email then it would be base-64 encoded, so that was a good guess.
You may be able to narrow down what it is not by looking at the fact that you have 7 bytes of padding perhaps, but whether it was IDEA or Blowfish or AES, there is no way to know.
Looking at it, from the top of my head I would say AES and more specifically Rijndael.
EDIT:
Just to add, as I said in my comment, without the key you will never know what this is. I am taking it on a best guess scenario, also based on implementations that could be termed "more common", which could also be a complete oversight from me.
Remember that if you can ever outright say what algorithm a ciphertext is in, never, ever use that algorithm.
What can you tell from the data you have? Well, the most concrete bit of information you have is that 9 bytes of cleartext encrypts to 16 bytes of ciphertext. Since it is unlikely that a data compression algorithm is being used on such a small chunk of data, this means we can make an educated guess that:
It is encrypted with a block cipher, with a block size <= 128 bits.
The encryption mode is ECB, since there is no room for an IV.
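The length argument is easy to check for yourself. Here is a small sketch that encrypts the same 9-character string with AES in ECB mode and PKCS7 padding under a freshly generated random key. AES is only one candidate consistent with the sizes, and the key is not the original one, so the bytes will not match the ciphertext in the question; only the lengths are the point. The class name is made up.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class BlockLengthDemo
{
    // Encrypts text with AES-ECB/PKCS7 under a throwaway random key.
    public static byte[] EncryptEcb(string text)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;       // no IV, as guessed above
            aes.Padding = PaddingMode.PKCS7; // pads 9 bytes up to one 16-byte block
            aes.GenerateKey();               // random key, NOT the asker's key

            using (ICryptoTransform encryptor = aes.CreateEncryptor())
            {
                byte[] plain = Encoding.ASCII.GetBytes(text);
                return encryptor.TransformFinalBlock(plain, 0, plain.Length);
            }
        }
    }
}
```

Calling `BlockLengthDemo.EncryptEcb("565040574")` gives 16 bytes, and Convert.ToBase64String of those gives a 24-character string ending in "==", the same shape as the value in the question. A 64-bit block cipher such as Blowfish would also yield 16 bytes for this input, so the length alone cannot pin down the algorithm.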
The .NET framework ships with 6 different hashing algorithms:
MD5: 16 bytes (Time to hash 500MB: 1462 ms)
SHA-1: 20 bytes (1644 ms)
SHA256: 32 bytes (5618 ms)
SHA384: 48 bytes (3839 ms)
SHA512: 64 bytes (3820 ms)
RIPEMD: 20 bytes (7066 ms)
Each of these functions performs differently; MD5 being the fastest and RIPEMD being the slowest.
MD5 has the advantage that it fits in the built-in Guid type; and it is the basis of the type 3 UUID. SHA-1 hash is the basis of type 5 UUID. Which makes them really easy to use for identification.
MD5 however is vulnerable to collision attacks, SHA-1 is also vulnerable but to a lesser degree.
Under what conditions should I use which hashing algorithm?
Particular questions I'm really curious to see answered are:
Is MD5 not to be trusted? Under normal situations when you use the MD5 algorithm with no malicious intent and no third party has any malicious intent would you expect ANY collisions (meaning two arbitrary byte[] producing the same hash)
How much better is RIPEMD than SHA1 (if it's any better)? It's 5 times slower to compute but the hash size is the same as SHA1.
What are the odds of getting non-malicious collisions when hashing file-names (or other short strings)? (Eg. 2 random file-names with same MD5 hash) (with MD5 / SHA1 / SHA2xx) In general what are the odds for non-malicious collisions?
This is the benchmark I used:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Security.Cryptography;

// Runs func the given number of times and prints the total elapsed time.
static void TimeAction(string description, int iterations, Action func) {
    var watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}

// Fills a buffer with (non-cryptographic) random test data.
static byte[] GetRandomBytes(int count) {
    var bytes = new byte[count];
    (new Random()).NextBytes(bytes);
    return bytes;
}

static void Main(string[] args) {
    var md5 = new MD5CryptoServiceProvider();
    var sha1 = new SHA1CryptoServiceProvider();
    var sha256 = new SHA256CryptoServiceProvider();
    var sha384 = new SHA384CryptoServiceProvider();
    var sha512 = new SHA512CryptoServiceProvider();
    var ripemd160 = new RIPEMD160Managed();

    // 1000 KB of input, hashed 500 times below: roughly 500 MB per algorithm.
    var source = GetRandomBytes(1000 * 1024);

    var algorithms = new Dictionary<string,HashAlgorithm>();
    algorithms["md5"] = md5;
    algorithms["sha1"] = sha1;
    algorithms["sha256"] = sha256;
    algorithms["sha384"] = sha384;
    algorithms["sha512"] = sha512;
    algorithms["ripemd160"] = ripemd160;

    foreach (var pair in algorithms) {
        Console.WriteLine("Hash Length for {0} is {1}",
            pair.Key,
            pair.Value.ComputeHash(source).Length);
    }

    foreach (var pair in algorithms) {
        TimeAction(pair.Key + " calculation", 500, () =>
        {
            pair.Value.ComputeHash(source);
        });
    }

    Console.ReadKey();
}
In cryptography, hash functions provide three separate properties.
Collision resistance: How hard is it for someone to find two messages (any two messages) that hash the same?
Preimage resistance: Given a hash, how hard is it to find a message that hashes to it? Also known as the one-way property.
Second preimage resistance: Given a message, how hard is it to find another message that hashes the same?
These properties are related but independent. For example, collision resistance implies second preimage resistance, but not the other way around. For any given application, you will have different requirements, needing one or more of these properties. A hash function for securing passwords on a server will usually only require preimage resistance, while message digests require all three.
It has been shown that MD5 is not collision resistant; however, that does not preclude its use in applications that do not require collision resistance. Indeed, MD5 is often still used in applications where the smaller digest size and speed are beneficial. That said, due to its flaws, researchers recommend the use of other hash functions in new scenarios.
SHA1 has a flaw that allows collisions to be found in theoretically far less than the 2^80 steps a secure hash function of its length would require. The attack is continually being revised and currently can be done in ~2^63 steps - just barely within the current realm of computability (as of April, 2009). For this reason NIST is phasing out the use of SHA1, stating that the SHA2 family should be used after 2010.
SHA2 is a new family of hash functions created following SHA1. Currently there are no known attacks against SHA2 functions. SHA256, 384 and 512 are all part of the SHA2 family, just using different digest lengths.
RIPEMD I can't comment too much on, except to note that it isn't as commonly used as the SHA families, and so has not been scrutinized as closely by cryptographic researchers. For that reason alone I would recommend the use of SHA functions over it. In the implementation you are using it seems quite slow as well, which makes it less useful.
In conclusion, there is no one best function - it all depends on what you need it for. Be mindful of the flaws with each and you will be best able to choose the right hash function for your scenario.
⚠️ WARNING
August, 2022
DO NOT USE SHA-1 OR MD5 FOR CRYPTOGRAPHIC APPLICATIONS. Both of these algorithms are broken (MD5 can be cracked in 30 seconds by a cell phone).
All hash functions are "broken"
The pigeonhole principle says that try as hard as you will you can not fit more than 2 pigeons in 2 holes (unless you cut the pigeons up). Similarly you can not fit 2^128 + 1 numbers in 2^128 slots. All hash functions result in a hash of finite size, this means that you can always find a collision if you search through "finite size" + 1 sequences. It's just not feasible to do so. Not for MD5 and not for Skein.
MD5/SHA1/Sha2xx have no chance collisions
All the hash functions have collisions; it's a fact of life. Coming across these collisions by accident is the equivalent of winning the intergalactic lottery. That is to say, no one wins the intergalactic lottery; it's just not the way the lottery works. You will not come across an accidental MD5/SHA1/SHA2XXX hash collision, EVER. Every word in every dictionary, in every language, hashes to a different value. Every path name, on every machine on the entire planet, has a different MD5/SHA1/SHA2XXX hash. How do I know that, you may ask. Well, as I said before, no one wins the intergalactic lottery, ever.
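To put a rough number on the lottery, here is a back-of-the-envelope sketch using the standard birthday approximation: for n random inputs and a b-bit hash, the expected number of colliding pairs is about n(n-1)/2 divided by 2^b, which also bounds the chance of seeing any collision at all. It assumes the hash behaves like a uniformly random function, and the names are made up for illustration.

```csharp
using System;

static class Birthday
{
    // Expected number of colliding pairs among `items` random inputs
    // for an ideal hash with `bits` bits of output: n(n-1)/2 / 2^bits.
    public static double ExpectedCollidingPairs(double items, int bits)
    {
        return items * (items - 1) / 2.0 / Math.Pow(2.0, bits);
    }
}
```

For a billion inputs and a 128-bit hash such as MD5, `Birthday.ExpectedCollidingPairs(1e9, 128)` comes out on the order of 10^-21. With a 32-bit hash and a million inputs the same formula gives more than a hundred expected colliding pairs, which is why short hash codes are a different story.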
But ... MD5 is broken
Sometimes the fact that it's broken does not matter.
As it stands there are no known pre-image or second pre-image attacks on MD5.
So what is so broken about MD5, you may ask? It is possible for a third party to generate 2 messages, one of which is EVIL and another of which is GOOD that both hash to the same value. (Collision attack)
Nonetheless, the current RSA recommendation is not to use MD5 if you need pre-image resistance. People tend to err on the side of caution when it comes to security algorithms.
So what hash function should I use in .NET?
Use MD5 if you need the speed/size and don't care about birthday attacks or pre-image attacks.
Repeat this after me, there are no chance MD5 collisions, malicious collisions can be carefully engineered. Even though there are no known pre-image attacks to date on MD5 the line from the security experts is that MD5 should not be used where you need to defend against pre-image attacks. SAME goes for SHA1.
Keep in mind, not all algorithms need to defend against pre-image or collision attacks. Take the trivial case of a first pass search for duplicate files on your HD.
Use SHA2XX based function if you want a cryptographically secure hash function.
No one has ever found a SHA512 collision. EVER. They have tried really hard. For that matter, no one has ever found a SHA256 or SHA384 collision either.
Don't use SHA1 or RIPEMD unless it's for an interoperability scenario.
RIPEMD has not received the same amount of scrutiny that SHAX and MD5 have received. Both SHA1 and RIPEMD are vulnerable to birthday attacks. They are both slower than MD5 on .NET and come in the awkward 20 byte size. It's pointless to use these functions; forget about them.
SHA1 collision attacks are down to 2^52; it's not going to be too long until SHA1 collisions are out in the wild.
For up to date information about the various hash functions have a look at the hash function zoo.
But wait there is more
Having a fast hash function can be a curse. For example: a very common usage for hash functions is password storage. Essentially, you calculate hash of a password combined with a known random string (to impede rainbow attacks) and store that hash in the database.
The problem is that if an attacker gets a dump of the database, he can quite effectively guess passwords using brute force. Every combination he tries only takes a fraction of a millisecond, and he can try out hundreds of thousands of passwords a second.
To work around this issue, the bcrypt algorithm can be used. It is designed to be slow, so the attacker will be heavily slowed down if attacking a system using bcrypt. Recently scrypt has made some headlines and is considered by some to be more effective than bcrypt, but I do not know of a .NET implementation.
Update:
Times have changed, we have a SHA3 winner. I would recommend using keccak (aka SHA3) winner of the SHA3 contest.
Original Answer:
In order of weakest to strongest I would say:
RIPEMD BROKEN, Should never be used as can be seen in this pdf
MD-5 BROKEN, Should never be used, can be broken in 2 minutes with a laptop
SHA-1 BROKEN, Should never be used, is broken in principle, attacks are getting better by the week
SHA-2 WEAK, Will probably be broken in the next few years. A few weaknesses have been found. Note that generally the higher key size, the harder the hash function is to break. While key size = strength is not always true, it is mostly true. So SHA-256 is probably weaker than SHA-512.
Skein NO KNOWN WEAKNESSES, is a candidate for SHA-3. It is fairly new and thus untested. It has been implemented in a bunch of languages.
MD6 NO KNOWN WEAKNESSES, is another candidate for SHA-3. Probably stronger than Skein, but slower on single core machines. Like Skein it is untested. Some security minded developers are using it in mission critical roles.
Personally I'd use MD6, because one can never be too paranoid. If speed is a real concern I'd look at Skein, or SHA-256.
In MD5's defense, there is no known way to produce a file with an arbitrary MD5 hash. The original author must plan in advance to have a working collision. Thus if the receiver trusts the sender, MD5 is fine. MD5 is broken if the signer is malicious, but it is not known to be vulnerable to man-in-the-middle attacks.
It would be a good idea to take a look at the BLAKE2 algorithm.
As it is described, it is faster than MD5 and at least as secure as SHA-3. It is also implemented by several software applications, including WinRar.
Which one you use really depends on what you are using it for. If you just want to make sure that files don't get corrupted in transit and aren't that concerned about security, go for fast and small. If you need digital signatures for multi-billion dollar federal bailout agreements and need to make sure they aren't forged, go for hard to spoof and slow.
I would like to chime in (before md5 gets torn apart) that I do still use md5 extensively despite its overwhelming brokenness for a lot of crypto.
As long as you don't care to protect against collisions (you are still safe to use md5 in an hmac as well) and you do want the speed (sometimes you want a slower hash) then you can still use md5 confidently.
I am not an expert at this sort of thing, but I keep up with the security community and a lot of people there consider the md5 hash broken. I would say that which one to use depends on how sensitive the data is and the specific application. You might be able to get away with a slightly less secure hash as long as the key is good and strong.
Here are my suggestions for you:
You should probably forget MD5 if you anticipate attacks. There are many rainbow tables for them online, and corporations like the RIAA have been known to be able to produce sequences with equivalent hashes.
Use a salt if you can. Including the message length in the message can make it very difficult to make a useful hash collision.
As a general rule of thumb, more bits means fewer collisions (by the pigeonhole principle) and a slower hash, and maybe more security (unless you are a math genius who can find vulnerabilities).
See here for a paper detailing an algorithm to create md5 collisions in 31 seconds with a desktop Intel P4 computer.
http://eprint.iacr.org/2006/105