Which cryptographic hash function should I choose? - c#

The .NET framework ships with 6 different hashing algorithms:
MD5: 16 bytes (Time to hash 500MB: 1462 ms)
SHA-1: 20 bytes (1644 ms)
SHA256: 32 bytes (5618 ms)
SHA384: 48 bytes (3839 ms)
SHA512: 64 bytes (3820 ms)
RIPEMD: 20 bytes (7066 ms)
Each of these functions performs differently; MD5 being the fastest and RIPEMD being the slowest.
MD5 has the advantage that it fits in the built-in Guid type, and it is the basis of the type 3 UUID. SHA-1 is the basis of the type 5 UUID. This makes both really easy to use for identification.
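To illustrate the point about the Guid type, here is a minimal sketch that packs a 16-byte MD5 digest into a Guid. The helper name Md5ToGuid is made up for this example, and the result is not a standards-compliant type 3 UUID (no namespace is mixed in and no version bits are set); it only shows that the sizes match.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: packs the 16-byte MD5 digest of a string into a Guid.
// This is NOT an RFC 4122 version-3 UUID (no namespace, no version bits).
static Guid Md5ToGuid(string text)
{
    using (MD5 md5 = MD5.Create())
    {
        byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
        return new Guid(digest); // Guid(byte[]) requires exactly 16 bytes
    }
}

Console.WriteLine(Md5ToGuid("example.txt"));
```

A SHA-1 digest is 20 bytes, so it would have to be truncated before it could be stored the same way.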
MD5, however, is vulnerable to collision attacks; SHA-1 is also vulnerable, but to a lesser degree.
Under what conditions should I use which hashing algorithm?
Particular questions I'm really curious to see answered are:
Is MD5 not to be trusted? Under normal situations, when you use the MD5 algorithm with no malicious intent and no third party has any malicious intent, would you expect ANY collisions (meaning two arbitrary byte[] producing the same hash)?
How much better is RIPEMD than SHA1 (if it's any better)? It's about 4 times slower to compute (7066 ms vs. 1644 ms above), but the hash size is the same as SHA1.
What are the odds of getting non-malicious collisions when hashing file names (or other short strings), e.g. two random file names with the same MD5 hash? In general, what are the odds of non-malicious collisions with MD5 / SHA1 / SHA2xx?
This is the benchmark I used:
static void TimeAction(string description, int iterations, Action func) {
    var watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}

static byte[] GetRandomBytes(int count) {
    var bytes = new byte[count];
    (new Random()).NextBytes(bytes);
    return bytes;
}

static void Main(string[] args) {
    var md5 = new MD5CryptoServiceProvider();
    var sha1 = new SHA1CryptoServiceProvider();
    var sha256 = new SHA256CryptoServiceProvider();
    var sha384 = new SHA384CryptoServiceProvider();
    var sha512 = new SHA512CryptoServiceProvider();
    var ripemd160 = new RIPEMD160Managed();
    var source = GetRandomBytes(1000 * 1024);
    var algorithms = new Dictionary<string,HashAlgorithm>();
    algorithms["md5"] = md5;
    algorithms["sha1"] = sha1;
    algorithms["sha256"] = sha256;
    algorithms["sha384"] = sha384;
    algorithms["sha512"] = sha512;
    algorithms["ripemd160"] = ripemd160;
    foreach (var pair in algorithms) {
        Console.WriteLine("Hash Length for {0} is {1}",
            pair.Key,
            pair.Value.ComputeHash(source).Length);
    }
    foreach (var pair in algorithms) {
        TimeAction(pair.Key + " calculation", 500, () =>
        {
            pair.Value.ComputeHash(source);
        });
    }
    Console.ReadKey();
}

In cryptography, hash functions provide three separate properties.
Collision resistance: How hard is it for someone to find two messages (any two messages) that hash the same?
Preimage resistance: Given a hash, how hard is it to find a message that hashes to it? Also known as a one-way hash function.
Second preimage resistance: Given a message, how hard is it to find another message that hashes the same?
These properties are related but independent. For example, collision resistance implies second preimage resistance, but not the other way around. For any given application, you will have different requirements, needing one or more of these properties. A hash function for securing passwords on a server will usually only require preimage resistance, while message digests require all three.
It has been shown that MD5 is not collision resistant; however, that does not preclude its use in applications that do not require collision resistance. Indeed, MD5 is often still used in applications where the smaller digest size and speed are beneficial. That said, due to its flaws, researchers recommend the use of other hash functions in new scenarios.
SHA1 has a flaw that allows collisions to be found in theoretically far less than the 2^80 steps a secure hash function of its length would require. The attack is continually being revised and currently can be done in ~2^63 steps - just barely within the current realm of computability (as of April, 2009). For this reason NIST is phasing out the use of SHA1, stating that the SHA2 family should be used after 2010.
SHA2 is a newer family of hash functions created following SHA1. Currently there are no known attacks against SHA2 functions. SHA256, 384 and 512 are all part of the SHA2 family, just using different digest lengths.
RIPEMD I can't comment too much on, except to note that it isn't as commonly used as the SHA families, and so has not been scrutinized as closely by cryptographic researchers. For that reason alone I would recommend the use of SHA functions over it. In the implementation you are using it seems quite slow as well, which makes it less useful.
In conclusion, there is no one best function - it all depends on what you need it for. Be mindful of the flaws with each and you will be best able to choose the right hash function for your scenario.
⚠️ WARNING
August, 2022
DO NOT USE SHA-1 OR MD5 FOR CRYPTOGRAPHIC APPLICATIONS. Both of these algorithms are broken (MD5 can be cracked in 30 seconds by a cell phone).

All hash functions are "broken"
The pigeonhole principle says that, try as hard as you will, you cannot fit more than 2 pigeons into 2 holes without putting at least two of them in the same hole (unless you cut the pigeons up). Similarly, you cannot fit 2^128 + 1 numbers into 2^128 slots. Every hash function produces a hash of finite size, which means that you can always find a collision if you search through "finite size" + 1 sequences. It's just not feasible to do so. Not for MD5 and not for Skein.
MD5/SHA1/Sha2xx have no chance collisions
All hash functions have collisions; it's a fact of life. Coming across these collisions by accident is the equivalent of winning the intergalactic lottery. That is to say, no one wins the intergalactic lottery; it's just not the way the lottery works. You will not come across an accidental MD5/SHA1/SHA2XXX collision, EVER. Every word in every dictionary, in every language, hashes to a different value. Every path name, on every machine on the entire planet, has a different MD5/SHA1/SHA2XXX hash. How do I know that, you may ask. Well, as I said before, no one wins the intergalactic lottery, ever.
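To put a rough number on the lottery, the usual birthday-bound approximation says that n random inputs hashed to b bits collide with probability of about 1 - exp(-n(n-1) / 2^(b+1)). The sketch below just evaluates that formula; the function name is made up, and it assumes the hash behaves like a uniformly random function (which says nothing about deliberately engineered collisions).

```csharp
using System;

// Birthday-bound approximation of the chance that at least two of `items`
// uniformly random hash values of `hashBits` bits collide.
static double CollisionProbability(double items, int hashBits)
{
    double x = items * (items - 1) / Math.Pow(2, hashBits + 1);
    // For tiny x, 1 - exp(-x) loses all precision in a double; x itself is
    // the first-order approximation and is accurate enough there.
    return x < 1e-9 ? x : 1 - Math.Exp(-x);
}

// One billion file names hashed with MD5 (128 bits).
Console.WriteLine(CollisionProbability(1e9, 128));
```

For a billion inputs and a 128-bit hash this comes out on the order of 10^-21, and the wider SHA variants are smaller still.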
But ... MD5 is broken
Sometimes the fact that it's broken does not matter.
As it stands there are no known pre-image or second pre-image attacks on MD5.
So what is so broken about MD5, you may ask? It is possible for a third party to generate 2 messages, one of which is EVIL and another of which is GOOD that both hash to the same value. (Collision attack)
Nonetheless, the current RSA recommendation is not to use MD5 if you need pre-image resistance. People tend to err on the side of caution when it comes to security algorithms.
So what hash function should I use in .NET?
Use MD5 if you need the speed/size and don't care about birthday attacks or pre-image attacks.
Repeat this after me: there are no chance MD5 collisions; malicious collisions can be carefully engineered. Even though there are no known pre-image attacks to date on MD5, the line from the security experts is that MD5 should not be used where you need to defend against pre-image attacks. The SAME goes for SHA1.
Keep in mind, not all algorithms need to defend against pre-image or collision attacks. Take the trivial case of a first pass search for duplicate files on your HD.
Use SHA2XX based function if you want a cryptographically secure hash function.
No one has ever found a SHA512 collision. EVER. They have tried really hard. For that matter, no one has ever found a SHA256 or SHA384 collision either.
Don't use SHA1 or RIPEMD unless it's for an interoperability scenario.
RIPEMD has not received the same amount of scrutiny that SHAX and MD5 have received. Both SHA1 and RIPEMD are vulnerable to birthday attacks. They are both slower than MD5 on .NET and come in the awkward 20 byte size. It's pointless to use these functions; forget about them.
SHA1 collision attacks are down to 2^52; it's not going to be too long until SHA1 collisions are out in the wild.
For up to date information about the various hash functions have a look at the hash function zoo.
But wait there is more
Having a fast hash function can be a curse. For example: a very common usage for hash functions is password storage. Essentially, you calculate hash of a password combined with a known random string (to impede rainbow attacks) and store that hash in the database.
The problem is that if an attacker gets a dump of the database, he can quite effectively guess passwords using brute force. Every combination he tries only takes a fraction of a millisecond, and he can try out hundreds of thousands of passwords a second.
To work around this issue, the bcrypt algorithm can be used. It is designed to be slow, so the attacker will be heavily slowed down when attacking a system that uses bcrypt. Recently scrypt has made some headlines and is considered by some to be more effective than bcrypt, but I do not know of a .NET implementation.

Update:
Times have changed, we have a SHA3 winner. I would recommend using Keccak (aka SHA3), the winner of the SHA3 contest.
Original Answer:
In order of weakest to strongest I would say:
RIPEMD BROKEN, Should never be used as can be seen in this pdf
MD-5 BROKEN, Should never be used, can be broken in 2 minutes with a laptop
SHA-1 BROKEN, Should never be used, is broken in principle, attacks are getting better by the week
SHA-2 WEAK, Will probably be broken in the next few years. A few weaknesses have been found. Note that generally the larger the digest size, the harder the hash function is to break. While digest size = strength is not always true, it is mostly true. So SHA-256 is probably weaker than SHA-512.
Skein NO KNOWN WEAKNESSES, is a candidate for SHA-3. It is fairly new and thus untested. It has been implemented in a bunch of languages.
MD6 NO KNOWN WEAKNESSES, is another candidate for SHA-3. Probably stronger than Skein, but slower on single core machines. Like Skein it is untested. Some security minded developers are using it in mission critical roles.
Personally I'd use MD6, because one can never be too paranoid. If speed is a real concern I'd look at Skein, or SHA-256.

In MD5's defense, there is no known way to produce a file with an arbitrary MD5 hash. The original author must plan in advance to have a working collision. Thus if the receiver trusts the sender, MD5 is fine. MD5 is broken if the signer is malicious, but it is not known to be vulnerable to man-in-the-middle attacks.

It would be a good idea to take a look at the BLAKE2 algorithm.
As it is described, it is faster than MD5 and at least as secure as SHA-3. It is also implemented by several software applications, including WinRAR.

Which one you use really depends on what you are using it for. If you just want to make sure that files don't get corrupted in transit and aren't that concerned about security, go for fast and small. If you need digital signatures for multi-billion dollar federal bailout agreements and need to make sure they aren't forged, go for hard to spoof and slow.

I would like to chime in (before md5 gets torn apart) that I do still use md5 extensively despite its overwhelming brokenness for a lot of crypto.
As long as you don't care to protect against collisions (you are still safe to use md5 in an hmac as well) and you do want the speed (sometimes you want a slower hash) then you can still use md5 confidently.

I am not an expert at this sort of thing, but I keep up with the security community and a lot of people there consider the md5 hash broken. I would say that which one to use depends on how sensitive the data is and the specific application. You might be able to get away with a slightly less secure hash as long as the key is good and strong.

Here are my suggestions for you:
You should probably forget MD5 if you anticipate attacks. There are many rainbow tables for them online, and corporations like the RIAA have been known to be able to produce sequences with equivalent hashes.
Use a salt if you can. Including the message length in the message can make it very difficult to make a useful hash collision.
As a general rule of thumb, more bits means fewer collisions (by the pigeonhole principle) and a slower hash, and maybe more security (unless you are a math genius who can find vulnerabilities).
See here for a paper detailing an algorithm to create md5 collisions in 31 seconds with a desktop Intel P4 computer.
http://eprint.iacr.org/2006/105

Related

Get a file SHA256 Hash code and Checksum

Previously I asked a question about combining SHA1+MD5, but after that I understood that calculating SHA1 and then MD5 of a large file is not much faster than SHA256.
In my case a 4.6 GB file takes about 10 minutes with the default SHA256 implementation (C# on Mono) on a Linux system.
public static string GetChecksum(string file)
{
    using (FileStream stream = File.OpenRead(file))
    {
        var sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(stream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}
Then I read this topic and changed my code according to what they said:
public static string GetChecksumBuffered(Stream stream)
{
    using (var bufferedStream = new BufferedStream(stream, 1024 * 32))
    {
        var sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(bufferedStream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}
But it doesn't have much effect and still takes about 9 minutes.
Then I tested the same file with the sha256sum command in Linux. It takes about 28 seconds, and both the above code and the Linux command give the same result!
Someone advised me to read about the differences between a hash code and a checksum, and I found this topic that explains the differences.
My questions are:
What causes such a difference in time between the above code and Linux sha256sum?
What does the above code do? (I mean, is it a hash code calculation or a checksum calculation? If you search for how to get a hash code of a file or a checksum of a file in C#, both lead to the above code.)
Is there any motivated attack against sha256sum even when SHA256 is collision resistant?
How can I make my implementation as fast as sha256sum in C#?
public string SHA256CheckSum(string filePath)
{
    using (SHA256 SHA256 = SHA256Managed.Create())
    {
        using (FileStream fileStream = File.OpenRead(filePath))
            return Convert.ToBase64String(SHA256.ComputeHash(fileStream));
    }
}
1. My best guess is that there's some additional buffering in the Mono implementation of the File.Read operation. Having recently looked into checksums on a large file, on a decent-spec Windows machine you should expect roughly 6 seconds per GB if all is running smoothly.
Oddly, it has been reported in more than one benchmark test that SHA-512 is noticeably quicker than SHA-256 (see 3 below). One other possibility is that the problem is not in allocating the data, but in disposing of the bytes once read. You may be able to use TransformBlock (and TransformFinalBlock) on a single array rather than reading the stream in one big gulp. I have no idea if this will work, but it bears investigating.
2. The difference between hashcode and checksum is (nearly) semantics. They both calculate a shorter 'magic' number that is fairly unique to the data in the input, though if you have 4.6GB of input and 64B of output, 'fairly' is somewhat limited.
A checksum is not secure, and with a bit of work you can figure out the input from enough outputs, work backwards from output to input and do all sorts of insecure things.
A cryptographic hash takes longer to calculate, but changing just one bit in the input will radically change the output, and for a good hash (e.g. SHA-512) there's no known way of getting from output back to input.
3. MD5 is breakable: you can fabricate two different inputs that produce the same output, if needed, on a PC. SHA-256 is (probably) still secure, but may not be in a few years' time. If your project has a lifespan measured in decades, then assume you'll need to change it. SHA-512 has no known attacks and probably won't for quite a while, and since it's quicker than SHA-256 I'd recommend it anyway. Benchmarks show it takes about 3 times longer to calculate SHA-512 than MD5, so if your speed issue can be dealt with, it's the way to go.
4. No idea, beyond those mentioned above. You're doing it right.
For a bit of light reading, see Crypto.SE: SHA512 is faster than SHA256?
Edit in response to question in comment
The purpose of a checksum is to allow you to check if a file has changed between the time you originally wrote it, and the time you come to use it. It does this by producing a small value (512 bits in the case of SHA512) where every bit of the original file contributes at least something to the output value. The purpose of a hashcode is the same, with the addition that it is really, really difficult for anyone else to get the same output value by making carefully managed changes to the file.
The premise is that if the checksums are the same at the start and when you check it, then the files are the same, and if they're different the file has certainly changed. What you are doing above is feeding the file, in its entirety, through an algorithm that rolls, folds and spindles the bits it reads to produce the small value.
As an example: in the application I'm currently writing, I need to know if parts of a file of any size have changed. I split the file into 16K blocks, take the SHA-512 hash of each block, and store it in a separate database on another drive. When I come to see if the file has changed, I reproduce the hash for each block and compare it to the original. Since I'm using SHA-512, the chances of a changed file having the same hash are unimaginably small, so I can be confident of detecting changes in 100s of GB of data whilst only storing a few MB of hashes in my database. I'm copying the file at the same time as taking the hash, and the process is entirely disk-bound; it takes about 5 minutes to transfer a file to a USB drive, of which 10 seconds is probably related to hashing.
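The block-by-block scheme described above could be sketched roughly as follows. This is not the author's code: the names HashBlocks and ChangedBlocks are invented for the example, the digests are kept in memory rather than in a separate database, and the 16K block size is passed in as a parameter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Returns one SHA-512 digest per fixed-size block of the stream
// (the last block may be shorter than blockSize).
static List<byte[]> HashBlocks(Stream input, int blockSize)
{
    var digests = new List<byte[]>();
    var buffer = new byte[blockSize];
    using (SHA512 sha = SHA512.Create())
    {
        while (true)
        {
            // Stream.Read may return fewer bytes than requested, so keep
            // reading until the block is full or the stream ends.
            int filled = 0;
            while (filled < blockSize)
            {
                int read = input.Read(buffer, filled, blockSize - filled);
                if (read == 0) break;
                filled += read;
            }
            if (filled == 0) break;
            digests.Add(sha.ComputeHash(buffer, 0, filled));
            if (filled < blockSize) break; // short block means end of stream
        }
    }
    return digests;
}

// Indices of blocks whose digests differ, or that exist on one side only.
static List<int> ChangedBlocks(List<byte[]> before, List<byte[]> after)
{
    var changed = new List<int>();
    int max = Math.Max(before.Count, after.Count);
    for (int i = 0; i < max; i++)
    {
        bool same = i < before.Count && i < after.Count
            && before[i].AsSpan().SequenceEqual(after[i]);
        if (!same) changed.Add(i);
    }
    return changed;
}
```

With 64-byte digests and 16K blocks the stored hashes come to 1/256 of the file size, so how much database space is needed depends mostly on the block size you pick.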
Lack of disk space to store hashes is a problem I can't solve in a post—buy a USB stick?
Way late to the party but seeing as none of the answers mentioned it, I wanted to point out:
SHA256Managed is an implementation of the System.Security.Cryptography.HashAlgorithm class, and all of the functionality related to the read operations is handled in the inherited code.
HashAlgorithm.ComputeHash(Stream) uses a fixed 4096 byte buffer to read data from a stream. As a result, you're not really going to see much difference using a BufferedStream for this call.
HashAlgorithm.ComputeHash(byte[]) operates on the entire byte array, but it resets the internal state after every call, so it can't be used to incrementally hash a buffered stream.
Your best bet would be to use a third party implementation that's optimized for your use case.
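If you would rather stay with the built-in classes, one thing worth trying is to do the reads yourself with a larger buffer and feed the hash incrementally through TransformBlock and TransformFinalBlock. The sketch below is only that, a sketch: the method name and the buffer size are arbitrary choices, and whether it actually closes the gap to sha256sum on Mono is something you would have to measure.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hashes a file by reading it in caller-sized chunks and updating the
// SHA-256 state incrementally, instead of calling ComputeHash(Stream).
static string HashFileIncrementally(string path, int bufferSize)
{
    var buffer = new byte[bufferSize];
    using (SHA256 sha = SHA256.Create())
    using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize,
                                       FileOptions.SequentialScan))
    {
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // A null output buffer means "update the hash state only".
            sha.TransformBlock(buffer, 0, read, null, 0);
        }
        // Finalize with an empty block; the digest is then in sha.Hash.
        sha.TransformFinalBlock(buffer, 0, 0);
        return BitConverter.ToString(sha.Hash).Replace("-", string.Empty);
    }
}
```

It could be called as HashFileIncrementally(file, 1024 * 1024), trying a few buffer sizes to see which one, if any, makes a difference on your system.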
using (SHA256 SHA256 = SHA256Managed.Create())
{
    using (FileStream fileStream = System.IO.File.OpenRead(filePath))
    {
        string result = "";
        foreach (var hash in SHA256.ComputeHash(fileStream))
        {
            result += hash.ToString("x2");
        }
        return result;
    }
}
For Reference: https://www.c-sharpcorner.com/article/how-to-convert-a-byte-array-to-a-string/

Rehashing a hash (SHA512)

We are following a pretty standard user id / password check. We store the hashed password in the db. When the user enters credentials we hash the entered password then compare to what the db has. If they match then user is authenticated.
Now this login process under load test is slowing down considerably so I was asked to look at it. VS 2013 Profiler pointed out the hashing method as a hot path. Looking at the method in question we are looping over the hashing process??
private const int totalHashCount = 1723;
public string CreateHash(string salt, string password, int securityIndex)
{
    string hashedPass = this.GenerateHashString(salt + password, securityIndex);
    for (int i = 1; i <= totalHashCount; i++)
    {
        hashedPass = this.GenerateHashString(hashedPass, securityIndex);
    }
    return hashedPass;
}
I went to the developer and he stated that the client's security team wanted us to rehash the hash, and to do it a prime number of times greater than 1000. He provided the email as documentation.
Now, I am not a cryptology expert and we have a good relationship with the client, so before I go to them and connect this rehash loop to their performance woes, I want to find out: does rehashing like this actually increase security?
To my understanding a single hash is practically impossible to invert so why waste cycles repeating the process?
Thoughts?
Edit
Added GenerateHash:
protected internal string GenerateHashString(string textToHash, int securityIndex = 0)
{
    UnicodeEncoding uEncode = new UnicodeEncoding();
    SHA512Managed sha = new SHA512Managed();
    byte[] bytVal = uEncode.GetBytes(textToHash + hashIntPool[securityIndex].ToString());
    byte[] hashVal = sha.ComputeHash(bytVal);
    return Convert.ToBase64String(hashVal);
}
Repeating the hash operation is essential to secure password authentication, but you are doing it wrong and therefore indeed wasting CPU to achieve nothing.
You should use an algorithm like PBKDF2 that includes the password in each round of hashing in order to preserve all the unpredictability of the password. bcrypt and especially scrypt are good alternatives too.
Also, one thousand rounds is not nearly enough; to be secure against offline dictionary attacks, you need the hashing operation to be relatively slow, even when performed on the attacker's dedicated password testing hardware. Picking a prime number of rounds is meaningless mumbo jumbo. The number of rounds will depend on the algorithm you select, but for PBKDF2 with SHA-256, somewhere between 10,000 and 100,000 rounds should provide a reasonable level of security.
A slow algorithm is necessary to prevent an attacker who obtains a hash from quickly trying many different passwords to see which produces the same hash. It's true that a secure hash is not feasible to invert, but it won't stop guessing, and attackers are good at prioritizing their guesses to try the most likely passwords first. Repetition of the hash is what provides this necessary slowness.
This has been discussed many times on StackOverflow. I refer you to a previous answer for more background.
In C#, you could use Rfc2898DeriveBytes to perform password hashing securely. You can encode the derived key in Base-64 to be stored as a string, or actually use it as an encryption key to encrypt a known plain text like the bcrypt algorithm does. You'll notice that Rfc2898DeriveBytes uses a "salt", which I discuss elsewhere; you'll need to store this value along with the hash value to perform authentication later.
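As a rough illustration of the above, here is a minimal PBKDF2 sketch. It assumes .NET 6 or later (it uses the static Rfc2898DeriveBytes.Pbkdf2 and RandomNumberGenerator.GetBytes helpers); the "iterations:salt:key" storage format, the 16-byte salt, the 32-byte key and the function names are choices made for this example, not anything prescribed.

```csharp
using System;
using System.Security.Cryptography;

// Derives a key with PBKDF2-HMAC-SHA256 and returns "iterations:salt:key",
// with salt and key Base64-encoded so the record can be stored as one string.
static string HashPassword(string password, int iterations)
{
    byte[] salt = RandomNumberGenerator.GetBytes(16); // fresh salt per password
    byte[] key = Rfc2898DeriveBytes.Pbkdf2(
        password, salt, iterations, HashAlgorithmName.SHA256, 32);
    return iterations + ":" + Convert.ToBase64String(salt)
         + ":" + Convert.ToBase64String(key);
}

// Re-derives the key from the stored salt and iteration count and
// compares it to the stored key in constant time.
static bool VerifyPassword(string password, string stored)
{
    string[] parts = stored.Split(':');
    if (parts.Length != 3) return false;
    int iterations = int.Parse(parts[0]);
    byte[] salt = Convert.FromBase64String(parts[1]);
    byte[] expected = Convert.FromBase64String(parts[2]);
    byte[] actual = Rfc2898DeriveBytes.Pbkdf2(
        password, salt, iterations, HashAlgorithmName.SHA256, expected.Length);
    return CryptographicOperations.FixedTimeEquals(actual, expected);
}

string record = HashPassword("correct horse battery staple", 100_000);
Console.WriteLine(VerifyPassword("correct horse battery staple", record));
```

Storing the iteration count next to the salt means the work factor can be raised later without invalidating existing records.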
The technique, called "stretching", of repeated hashing is used to make brute force attacks more difficult. If it takes 0.1 second to hash a password (due to the repetitions) then an attacker can at best try 10 passwords a second to find a match. If you speed up the hashing process so it takes a microsecond, then the attacker can test a million passwords a second.
You need to balance speed against security. A user login only needs to be fast enough to satisfy the user, so 0.1 to 0.5 seconds is probably acceptable.
If your server is overloaded then get a faster processor, or buy a dedicated hashing server. That will be a lot cheaper than the legal consequences of a data breach.

How to hash a URL quickly

I have a unique situation where I need to produce hashes on the fly. This question is related to here. I need to store many URLs in the database, and they need to be indexed. A URL can be over 2000 characters long. The database complains that a string over 900 bytes cannot be indexed. My solution is to hash the URL using MD5 or SHA256. I am not sure which hashing algorithm to use. Here are my requirements:
Shortest character length with minimal collision
Needs to be very fast. I will be hashing the referurl on every page request
Collisions need to be minimized since I may have millions of urls in the database
I am not worried about security. I am worried about character length, speed, and collisions. Anyone know of a good algorithm for this?
In your case, I wouldn't use any of the cryptographic hash functions (i.e. MD5, SHA), since they were designed with security in mind: they mainly aim to make it as hard as possible to find two different strings with the same hash. I think this wouldn't be a problem in your case. (The possibility of random collisions is inherent to hashing, of course.)
I'd strongly advise against using String.GetHashCode(), since the implementation is not documented and MSDN says that it might vary between different versions of the framework. Even the results between x86 and x64 versions may be different. So you'll get into trouble when trying to access the same database using a newer (or different) version of the .NET framework.
I found the algorithm for the Java implementation of hashCode on Wikipedia (here), it seems quite easy to implement. Even a straightforward implementation would be faster than an implementation of MD5 or SHA imo. You could also use long values which reduces the probability of collisions.
There is also a short analysis of the .NET GetHashCode implementation here (not the algorithm itself but some implementation details), you could also use this one I guess. (or try to implement the Java version in a similar way ...)
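For reference, the Java-style string hash mentioned above (h = 31 * h + c over the characters) is only a few lines when widened to a long. This is a sketch under that description; the name StableHash64 is made up, and the wider accumulator only helps for strings long enough to fill it.

```csharp
using System;

// Java-style polynomial string hash (h = 31 * h + c), accumulated in a
// 64-bit long. Unlike String.GetHashCode(), the result depends only on the
// characters, so it is stable across runtimes and processes.
static long StableHash64(string text)
{
    unchecked // overflow wraps around, which is intended for hashing
    {
        long hash = 0;
        foreach (char c in text)
        {
            hash = 31 * hash + c;
        }
        return hash;
    }
}

Console.WriteLine(StableHash64("http://example.com/some/long/path?query=1"));
```

For short inputs the value matches Java's String.hashCode(), since no overflow has happened yet; for example "abc" gives 96354 in both.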
a quick one :
URLString.GetHashCode().ToString("x")
While both MD5 and SHA1 have been proved ineffective where collision prevention is essential I suspect for your application either would be sufficient. I don't know for sure but I suspect that MD5 would be the simpler and quicker of the two algorithms.
Use the System.Security.Cryptography.SHA1Cng class, I would suggest. It's 160 bits or 20 bytes long, so that should definitely be small enough. If you need it to be a string, it will only require 40 characters, so that should suit your needs well. It should also be fast enough, and as far as I know, no collisions have yet been found.
I'd personally use String.GetHashCode(). This is the basic hash function. I honestly have no idea how it performs compared to other implementations but it should be fine.
Either of the two hashing functions that you name should be quick enough that you won't notice much difference between them. Unless this site requires ultra-high performance, I would not worry too much about them. I'd personally probably go for MD5. Its 16-byte result can be formatted as a hexadecimal string in 32 characters or as a base 64 string in 24 characters (SHA256 would need 64 and 44 characters respectively).
The reason I'd go for MD5 is because you are very unlikely to run into collisions and even if you do you can structure your queries with "where urlhash = #hash and url = #url". The database engine should work out that one is indexed and the other isn't and use that information to do a sensible search.
If there are collisions, the indexed scan on urlhash will return a handful of results, which will be easy to compare as text to get the right one. This is unlikely to be relevant very often though. You've pretty low chances of getting collisions this way.
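Computing the value for such a urlhash column could look something like this. It is only a sketch: the function name is invented, and UTF-8 encoding and lowercase hex are assumptions that simply need to be applied consistently on both the write and the lookup side.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Returns the MD5 digest of the URL as a 32-character lowercase hex string,
// suitable for a fixed-width indexed column.
static string UrlHashKey(string url)
{
    using (MD5 md5 = MD5.Create())
    {
        byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(url));
        return BitConverter.ToString(digest)
            .Replace("-", string.Empty)
            .ToLowerInvariant();
    }
}

Console.WriteLine(UrlHashKey("http://example.com/page?ref=abc"));
```

The key is always 32 characters no matter how long the URL is, so it fits comfortably under the 900-byte index limit.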
Reflected source code of GetHashCode function in .net 4.0
public override unsafe int GetHashCode()
{
    fixed (char* str = ((char*) this))
    {
        char* chPtr = str;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*) chPtr;
        for (int i = this.Length; i > 0; i -= 4)
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            if (i <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            numPtr += 2;
        }
        return (num + (num2 * 0x5d588b65));
    }
}
There are O(n) simple operations (+, <<, ^) and one multiplication, so this is very fast.
I've tested this function on a database of 3 million strings with lengths up to 256 characters, and about 97% of the strings have no collision. (At most 5 strings share the same hash.)
You may want to look at the following project:
CMPH - C Minimal Perfect Hashing Library
And check out the following hot topics listing for perfect hashes:
Hottest 'perfect-hash' Answers - Stack Overflow
You could also consider using a full text index in SQL rather than hashing:
CREATE FULLTEXT INDEX (Transact-SQL)

clarification for crypt SHA-512 algorithm (c#)

EDIT: Sorry I forgot to mention, I'm not using the implemented sha512 crypt because as far as I can tell it doesn't involve a salt value or a specified number of rounds to compute the hash with.
Okay so I'm coding the sha-512 crypt in c# and I'm following the steps found here...
http://people.redhat.com/drepper/SHA-crypt.txt
This is my first time doing anything encryption related so I want to make sure I'm understanding the steps correctly... I don't understand C code well enough to do a direct translation from C to C# :/
I have assumed finishing a digest is the same as computing the hash. In this case, I've also assumed that when the steps refer to a finished digest, they are referring to the computed hash, rather than the pre-hash computed digest bytes. Correct me if I'm wrong please!
Assuming everything has been done correctly for steps 1-8, my doubts start at step 9
9. For each block of 32 or 64 bytes in the password string (excluding
the terminating NUL in the C representation), add digest B to digest A
Since I'm using SHA-512, I have block sizes of 64 bytes.
Would the following code produce the desired result?
//FYI, temp = digestA from steps 1-3 (before expanding digestA for step 9)
//alt_result = computed digestB hash (64 byte hash)
for (cnt = key.Length; cnt > 64; cnt -= 64) //9
{
    int i = 0;
    ctx.TransformBlock(alt_result, 0, 64, digestA, temp.Length + 64 * i);
    i++;
}
If anyone can clarify that what I've stated is correct, I would appreciate it. Thanks!
Salting is as simple as appending a fixed byte string on the end of your input string. Essentially providing a known "homegrown" transform to your input.
About the algorithm itself: you seem to be starting at a disadvantage. As a neophyte, you're making a lot of "assumptions" about basic cryptographic terminology that themselves need clarification. If the CLR implementation won't work for you, I think your time would be better spent finding a good C implementation and figuring out how to integrate with it. Figuring out the interop (extern) calls will be far easier than diving into the intricacies of cryptography, the results will be more efficient, and the knowledge you gain about native interop will be far more useful and reusable.
I'll add some important clarification for others who might come across this later.
First:
SHA512 and SHA512Crypt are two distinct algorithms for two different purposes. SHA512 is a general purpose hashing algorithm (see this). SHA512Crypt is a password storage or password based key derivation algorithm that uses SHA512 (hash) internally (see this). SHA512Crypt is based on the earlier Crypt function that used MD5 instead of SHA512.
The password storage/key generation algorithms have been specifically created to make brute forcing orders of magnitude more expensive. The typical way this is done is by iterating over the underlying hash algorithm in some fashion. However, you don't want to do this yourself... which brings us to...
Second:
Do NOT write your own cryptography methods. (see this) There are tons of ways to screw it up, even if you know exactly what you are doing.
If you don't want to use the built-in Rfc2898DeriveBytes due to it being based on SHA1, then you could look at bcrypt or some other public, reviewed implementation of a known cryptographic algorithm.

C#/Java/Ruby - Hash Algorithm for Passwords - Cross-Lang/Platform

What is a good password hashing algorithm to use from C#, if later on you may know that there are going to be Java and Ruby programs that may also need to 'login'/authenticate a user. Anything out of the box in .NET that translates well to other languages and is easy to use.
The strongest cryptographic hash algorithm which NSA/NIST has standardized on is SHA-512.
Be sure to use a per-password random salt (a 128-bit salt generated by a cryptographically strong random number generator is good). Or, even better, be sure to use a per-password random key (again generated by a cryptorandom), and use HMAC-SHA-512. Be sure to use multiple iterations - 4096 and 65,536 are good round numbers (2^12 and 2^16).
let h = get_hash_function("SHA-512")
let k = get_key_for_user("Justice")
let hmac = get_hmac(h, k)
let test = get_bytes("utf-8", http_request_params["password"])
for(i in 0 .. (2^16 - 1))
    let test = run_hmac(hmac, test)
return test == get_hashed_password_for_user("Justice")
Using MD5 correctly, with a salt added, makes rainbow tables and brute force quite expensive, so the comment about using MD5 is pretty valid.
I think MD5 is the most common one.
