Previously I asked a question about combining SHA1+MD5, but since then I have understood that calculating SHA1 and then MD5 of a large file is not any faster than SHA256.
In my case a 4.6 GB file takes about 10 minutes with the default SHA256 implementation (C# Mono) on a Linux system.
public static string GetChecksum(string file)
{
    using (FileStream stream = File.OpenRead(file))
    {
        var sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(stream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}
Then I read this topic and changed my code according to what they said, to:
public static string GetChecksumBuffered(Stream stream)
{
    using (var bufferedStream = new BufferedStream(stream, 1024 * 32))
    {
        var sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(bufferedStream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}
But it doesn't have much of an effect and takes about 9 minutes.
Then I tested the same file with the sha256sum command in Linux; it takes about 28 seconds, and both the above code and the Linux command give the same result!
Someone advised me to read about the differences between a hash code and a checksum, and I reached this topic that explains the differences.
My questions are:
What causes such a difference in time between the above code and Linux's sha256sum?
What does the above code do? (I mean, is it a hash code calculation or a checksum calculation? Searching for how to compute the hash code of a file and the checksum of a file in C# leads to the same code as above.)
Is there any feasible attack against sha256sum, given that SHA256 is collision resistant?
How can I make my implementation as fast as sha256sum in C#?
public string SHA256CheckSum(string filePath)
{
    using (SHA256 SHA256 = SHA256Managed.Create())
    {
        using (FileStream fileStream = File.OpenRead(filePath))
            return Convert.ToBase64String(SHA256.ComputeHash(fileStream));
    }
}
My best guess is that there's some additional buffering in the Mono implementation of the File.Read operation. Having recently looked into checksums on a large file, on a decent spec Windows machine you should expect roughly 6 seconds per GB if all is running smoothly.
Oddly it has been reported in more than one benchmark test that SHA-512 is noticeably quicker than SHA-256 (see 3 below). One other possibility is that the problem is not in allocating the data, but in disposing of the bytes once read. You may be able to use TransformBlock (and TransformFinalBlock) on a single array rather than reading the stream in one big gulp—I have no idea if this will work, but it bears investigating.
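For what it's worth, a minimal sketch of that TransformBlock idea on a single reused buffer might look like this; the method name and the 1 MB buffer size are my own assumptions, to be tuned:
using System;
using System.IO;
using System.Security.Cryptography;

public static string GetChecksumChunked(string file)
{
    // Read the file in large chunks and feed them to the hash incrementally,
    // reusing one buffer instead of pulling the whole stream in one gulp.
    const int bufferSize = 1024 * 1024; // 1 MB -- an assumption, adjust as needed
    using (var sha = SHA256.Create())
    using (var stream = File.OpenRead(file))
    {
        var buffer = new byte[bufferSize];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // null output buffer: we only care about updating the hash state
            sha.TransformBlock(buffer, 0, read, null, 0);
        }
        sha.TransformFinalBlock(buffer, 0, 0); // finalize with an empty block
        return BitConverter.ToString(sha.Hash).Replace("-", String.Empty);
    }
}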
The difference between hashcode and checksum is (nearly) semantics. They both calculate a shorter 'magic' number that is fairly unique to the data in the input, though if you have 4.6GB of input and 64B of output, 'fairly' is somewhat limited.
A checksum is not secure, and with a bit of work you can figure out the input from enough outputs, work backwards from output to input and do all sorts of insecure things.
A Cryptographic hash takes longer to calculate, but changing just one bit in the input will radically change the output and for a good hash (e.g. SHA-512) there's no known way of getting from output back to input.
MD5 is breakable: colliding inputs can be fabricated on a PC if needed. SHA-256 is (probably) still secure, but won't be in a few years' time—if your project has a lifespan measured in decades, assume you'll need to change it. SHA-512 has no known attacks and probably won't for quite a while, and since it's quicker than SHA-256 on 64-bit hardware I'd recommend it anyway. Benchmarks show it takes about 3 times longer to calculate SHA-512 than MD5, so if your speed issue can be dealt with, it's the way to go.
No idea, beyond those mentioned above. You're doing it right.
For a bit of light reading, see Crypto.SE: SHA512 is faster than SHA256?
Edit in response to question in comment
The purpose of a checksum is to allow you to check if a file has changed between the time you originally wrote it, and the time you come to use it. It does this by producing a small value (512 bits in the case of SHA512) where every bit of the original file contributes at least something to the output value. The purpose of a hashcode is the same, with the addition that it is really, really difficult for anyone else to get the same output value by making carefully managed changes to the file.
The premise is that if the checksums are the same at the start and when you check it, then the files are the same, and if they're different the file has certainly changed. What you are doing above is feeding the file, in its entirety, through an algorithm that rolls, folds and spindles the bits it reads to produce the small value.
As an example: in the application I'm currently writing, I need to know if parts of a file of any size have changed. I split the file into 16K blocks, take the SHA-512 hash of each block, and store it in a separate database on another drive. When I come to see if the file has changed, I reproduce the hash for each block and compare it to the original. Since I'm using SHA-512, the chances of a changed file having the same hash are unimaginably small, so I can be confident of detecting changes in 100s of GB of data whilst only storing a few MB of hashes in my database. I'm copying the file at the same time as taking the hash, and the process is entirely disk-bound; it takes about 5 minutes to transfer a file to a USB drive, of which 10 seconds is probably related to hashing.
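A rough sketch of that block-by-block scheme might look like the following; the 16 KB block size matches the description above, while the helper name and the in-memory list of hashes are just illustrative assumptions (the real thing would store them in a database):
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hash a file in fixed-size blocks so that individual changed regions
// can be detected later by comparing against the stored block hashes.
public static List<byte[]> HashBlocks(string path, int blockSize = 16 * 1024)
{
    var blockHashes = new List<byte[]>();
    using (var sha = SHA512.Create())
    using (var stream = File.OpenRead(path))
    {
        var buffer = new byte[blockSize];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // ComputeHash(byte[], offset, count) hashes just this block.
            blockHashes.Add(sha.ComputeHash(buffer, 0, read));
        }
    }
    return blockHashes;
}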
Lack of disk space to store hashes is a problem I can't solve in a post—buy a USB stick?
Way late to the party but seeing as none of the answers mentioned it, I wanted to point out:
SHA256Managed is an implementation of the System.Security.Cryptography.HashAlgorithm class, and all of the functionality related to the read operations is handled in the inherited code.
HashAlgorithm.ComputeHash(Stream) uses a fixed 4096 byte buffer to read data from a stream. As a result, you're not really going to see much difference using a BufferedStream for this call.
HashAlgorithm.ComputeHash(byte[]) operates on the entire byte array, but it resets the internal state after every call, so it can't be used to incrementally hash a buffered stream.
Your best bet would be to use a third party implementation that's optimized for your use case.
using (SHA256 SHA256 = SHA256Managed.Create())
{
    using (FileStream fileStream = System.IO.File.OpenRead(filePath))
    {
        string result = "";
        foreach (var hash in SHA256.ComputeHash(fileStream))
        {
            result += hash.ToString("x2");
        }
        return result;
    }
}
For Reference: https://www.c-sharpcorner.com/article/how-to-convert-a-byte-array-to-a-string/
Related
I am writing a console application which iterates through a binary tree and searches for new or changed files based on their MD5 checksums.
The whole process is acceptably fast (14 s for ~70,000 files), but generating the checksums takes about 5 minutes, which is far too slow...
Any suggestions for improving this process? My hash function is the following:
private string getMD5(string filename)
{
    using (var md5 = new MD5CryptoServiceProvider())
    {
        if (File.Exists(filename))
        {
            try
            {
                var buffer = md5.ComputeHash(File.ReadAllBytes(filename));
                var sb = new StringBuilder();
                for (var i = 0; i < buffer.Length; i++)
                {
                    sb.Append(buffer[i].ToString("x2"));
                }
                return sb.ToString();
            }
            catch (Exception)
            {
                Program.logger.log("Error while creating checksum!", Program.logger.LOG_ERROR);
                return "";
            }
        }
        else
        {
            return "";
        }
    }
}
Well, the accepted answer is not valid, because, of course, there are ways to improve your code's performance. (It is valid for some of the other points, however.)
The main bottleneck here, apart from disk I/O, is memory allocation. Here are some thoughts that should improve speed:
Do not read the entire file into memory for the calculation; it is slow, and it produces a lot of memory pressure via LOH objects. Instead, open the file as a stream and calculate the hash in chunks (see the sketch after these points).
The reason you see a slowdown when using the ComputeHash stream override is that internally it uses a very small buffer (4 KB), so choose an appropriate buffer size (256 KB or more; find the optimal value by experimenting).
Use the TransformBlock and TransformFinalBlock methods to calculate the hash value. You can pass null for the outputBuffer parameter.
Reuse that buffer for subsequent files' hash calculations, so there is no need for additional allocations.
Additionally, you can reuse the MD5CryptoServiceProvider, but the benefits are questionable.
Finally, you can apply an async pattern for reading chunks from the stream, so the OS reads the next chunk from disk at the same time as you calculate the partial hash of the previous chunk. Such code is of course more difficult to write, and you'll need at least two buffers (reuse them as well), but it can have a big impact on speed.
As a minor improvement, do not check for file existence. I believe your function is called from some enumeration, and there is very little chance that the file is deleted in the meantime.
All of the above is valid for medium-to-large files. If instead you have a lot of very small files, you can speed up the calculation by processing files in parallel. Parallelization can also help with large files, but that is something to measure.
And lastly, if collisions don't bother you too much, you can choose a less expensive hash algorithm, CRC for example.
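Putting the chunked-read, TransformBlock, and buffer-reuse points together, a rough sketch might look like this (the 256 KB buffer size and the helper name are my own assumptions):
using System.IO;
using System.Security.Cryptography;
using System.Text;

private static readonly byte[] sharedBuffer = new byte[256 * 1024]; // reused across files

private static string GetMD5Chunked(string filename, MD5 md5)
{
    using (var stream = File.OpenRead(filename))
    {
        int read;
        while ((read = stream.Read(sharedBuffer, 0, sharedBuffer.Length)) > 0)
        {
            // null output buffer: only the internal hash state matters here
            md5.TransformBlock(sharedBuffer, 0, read, null, 0);
        }
        md5.TransformFinalBlock(sharedBuffer, 0, 0);

        var sb = new StringBuilder();
        foreach (var b in md5.Hash)
        {
            sb.Append(b.ToString("x2"));
        }
        md5.Initialize(); // reset state so the same MD5 instance can be reused for the next file
        return sb.ToString();
    }
}
The caller would create one MD5CryptoServiceProvider up front and pass it in for every file, so neither the buffer nor the provider is reallocated per file.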
In order to create the hash, you have to read every last byte of the file. So this operation is disk-limited, not CPU-limited, and scales proportionally to the size of the files. Multithreading will not help.
Unless the FS can somehow calculate and store the hash for you, there is just no way to speed this up. You are dependent on what the FS does for you to track changes.
Generally, programs that check for "changed files" (like backup routines) do not calculate the hash value for exactly that reason. They may still calculate and store it for validation purposes, but that is it.
Unless the user does some serious (NTFS-driver-loading-level) sabotage, the "last changed" date together with the file size is enough to detect changes. Maybe also check the archive bit, but that one is rarely used nowadays.
A minor improvement for this kind of scenario (list files and process them) is to use "EnumerateFiles" rather than listing files. But at 14 seconds of listing versus 5 minutes of processing, that will just not have any relevant effect.
I'm just about to build a simple chat application using One Time Pad.
I've already made the algorithm, and to encrypt the messages, I need some sort of key material that is the same on both sides. The distribution of the key material is supposed to happen with physical contact (e.g. USB dongle). So I would like to make some very large random key files, that the two clients can use to communicate. So my questions are:
I need a very secure random number/string generator; do you know any good ones that I can use in C#?
And when I use such big files, how do I avoid loading the whole file into memory? I plan to read a chunk of the key material (e.g. 1 MB) and remove it from the file after it has been read, so the same key is never used twice.
I should probably start with this: I assume this is a for-fun or exercise project - not an attempt to create something truly secure.
As owlstead says: Use RNGCryptoServiceProvider.
Removing the used key material from the file is much easier if you use it in reverse. If you need to encrypt 1024 bytes, read the last 1024 bytes from the file and truncate it. Simplified:
byte[] Encrypt(byte[] plain)
{
    using (FileStream keyFile = new FileStream(FileName, FileMode.Open))
    {
        // Read the last plain.Length bytes of the pad file as the key.
        keyFile.Seek(-plain.Length, SeekOrigin.End);
        byte[] key = new byte[plain.Length];
        keyFile.Read(key, 0, plain.Length);

        // XOR the plaintext with the key, consuming the pad from the end.
        byte[] encrypted = new byte[plain.Length];
        for (int i = 0; i < plain.Length; i++)
        {
            encrypted[i] = (byte)(plain[i] ^ key[plain.Length - 1 - i]);
        }

        // Truncate the file so the used key material can never be reused.
        keyFile.SetLength(keyFile.Length - plain.Length);
        return encrypted;
    }
}
You are trying to solve an issue that has already been solved: using AES is almost certainly as safe as using the One Time Pad. In that case your key becomes 16 bytes (although you will need a NONCE or IV depending on the mode).
You need to use a secure random number generator RNGCryptoServiceProvider.
Read the file into memory using MemoryMappedFile.
Just (atomically) store the offset in the file instead. E.g. use the first 8 bytes of the same file to store a ulong, so you can sync when needed. You may additionally overwrite the used bytes with zeros if you really want. Note that with SSDs, for example, you may not actually overwrite the physical data.
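A rough sketch of that offset-header idea (the file layout and helper name are my own assumptions, and real code would need to handle crash-safety and partial reads):
using System;
using System.IO;

// Assumed pad layout: first 8 bytes = offset of the next unused key byte,
// followed by the key material itself.
static byte[] TakeKeyMaterial(string padPath, int count)
{
    using (var file = new FileStream(padPath, FileMode.Open, FileAccess.ReadWrite))
    {
        var header = new byte[8];
        file.Read(header, 0, 8);               // partial-read handling omitted in this sketch
        long offset = BitConverter.ToInt64(header, 0);

        var key = new byte[count];
        file.Seek(8 + offset, SeekOrigin.Begin);
        file.Read(key, 0, count);

        // Persist the new offset so this key material is never handed out again.
        file.Seek(0, SeekOrigin.Begin);
        file.Write(BitConverter.GetBytes(offset + count), 0, 8);
        return key;
    }
}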
What you have designed is almost certainly not a One Time Pad. The generation of large quantities of truly random bytes is a far from trivial task. You should really be spending a few thousand dollars on a hardware card if you really must go down that route. Even the crypto quality C# RNG is not up to generating that sort of volume of true random data. It can generate a short key securely enough, but does not have enough entropy input to generate large quantities of true random data. As soon as the entropy runs out, its output reverts to pseudo-random and you no longer have a One Time Pad.
As #owlstead said, use AES, in either CBC or CTR mode. That is secure, easily available and is not designed by an amateur.
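For completeness, a hedged sketch of that AES route using the built-in Aes class (CBC with the default PKCS7 padding); key management, authentication, and the helper name are all left as assumptions here:
using System.IO;
using System.Security.Cryptography;

static byte[] EncryptMessage(byte[] plaintext, byte[] key, out byte[] iv)
{
    using (var aes = Aes.Create()) // default mode is CBC
    {
        aes.Key = key;        // 16, 24 or 32 bytes
        aes.GenerateIV();     // fresh IV per message; send it alongside the ciphertext
        iv = aes.IV;
        using (var encryptor = aes.CreateEncryptor())
        using (var ms = new MemoryStream())
        {
            using (var cs = new CryptoStream(ms, encryptor, CryptoStreamMode.Write))
            {
                cs.Write(plaintext, 0, plaintext.Length); // disposing cs flushes the final block
            }
            return ms.ToArray();
        }
    }
}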
I need to generate a unique ID for files of up to 200-300 MB in size. The condition is that the algorithm should be quick; it should not take much time. I am selecting the files from a desktop and calculating a hash value like this:
HMACSHA256 myhmacsha256 = new HMACSHA256(key);
byte[] hashValue = myhmacsha256.ComputeHash(fileStream);
fileStream is a handle to the file, used to read content from it. This method is going to take a lot of time, for obvious reasons.
Does Windows generate a key for a file for its own bookkeeping that I could use directly?
Is there any other way to identify whether a file is the same, instead of matching the file name, which is not very foolproof?
MD5.Create().ComputeHash(fileStream);
Alternatively, I'd suggest looking at this rather similar question.
How about generating a hash from the info that's readily available from the file itself? I.e., concatenate:
File Name
File Size
Created Date
Last Modified Date
and create your own?
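A quick sketch of that metadata-based ID (the helper name is mine; note it only detects metadata changes, not silent content changes):
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static string MetadataId(string path)
{
    var info = new FileInfo(path);
    // Concatenate the readily available metadata fields...
    string combined = string.Join("|",
        info.Name,
        info.Length.ToString(),
        info.CreationTimeUtc.Ticks.ToString(),
        info.LastWriteTimeUtc.Ticks.ToString());

    // ...and hash the result so the ID has a fixed size.
    using (var md5 = MD5.Create())
    {
        byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(combined));
        return BitConverter.ToString(hash).Replace("-", "");
    }
}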
Computing and comparing hashes requires both files to be read completely. My suggestion is to first check the file sizes; only if they are identical, go through the files byte by byte.
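A simple sketch of that size-then-bytes comparison (the helper name and the 64 KB buffer are my own choices):
using System.IO;

static bool FilesAreEqual(string pathA, string pathB)
{
    var a = new FileInfo(pathA);
    var b = new FileInfo(pathB);
    if (a.Length != b.Length)
        return false; // different sizes can never be the same content

    const int bufferSize = 64 * 1024;
    using (var sa = a.OpenRead())
    using (var sb = b.OpenRead())
    {
        var bufA = new byte[bufferSize];
        var bufB = new byte[bufferSize];
        int readA;
        while ((readA = sa.Read(bufA, 0, bufferSize)) > 0)
        {
            // Fill exactly readA bytes from the second file before comparing,
            // since Stream.Read may return fewer bytes than requested.
            int readB = 0;
            while (readB < readA)
            {
                int n = sb.Read(bufB, readB, readA - readB);
                if (n == 0) return false;
                readB += n;
            }
            for (int i = 0; i < readA; i++)
                if (bufA[i] != bufB[i])
                    return false;
        }
    }
    return true;
}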
If you want a "quick and dirty" check, I would suggest looking at CRC-32. It is extremely fast (the algorithm simply involves doing XOR with table lookups), and if you aren't too concerned about collision resistance, a combination of the file size and the CRC-32 checksum over the file data should be adequate. 28.5 bits are required to represent the file size (that gets you to 379M bytes), which means you get a checksum value of effectively just over 60 bits. I would use a 64-bit quantity to store the file size, for future proofing, but 32 bits would work too in your scenario.
If collision resistance is a consideration, then you pretty much have to use one of the tried-and-true-yet-unbroken cryptographic hash algorithms. I would still concur with what Devils child wrote and also include the file size as a separate (readily accessible) part of the hash, however; if the sizes don't match, there is no chance that the file content can be the same, so in that case the computationally intensive hash calculation can be skipped.
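For illustration, a minimal size-plus-CRC-32 sketch might look like the following. The CRC-32 here is a plain table-driven implementation of my own (in a newer project you could pull in an existing package instead), and the helper names are assumptions:
using System.IO;

static class Crc32
{
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        // Standard reflected CRC-32 (IEEE) table, polynomial 0xEDB88320.
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320 ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    public static uint Compute(Stream stream)
    {
        uint crc = 0xFFFFFFFF;
        var buffer = new byte[64 * 1024];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            for (int i = 0; i < read; i++)
                crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFF;
    }
}

// Combine the file size and the CRC-32 into a single 64-bit identifier.
// The size shift is fine for files well under 4 GB, as in the question.
static ulong QuickFileId(string path)
{
    using (var stream = File.OpenRead(path))
    {
        ulong size = (ulong)stream.Length;
        return (size << 32) | Crc32.Compute(stream);
    }
}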
I was looking at the good example in this MSDN page: http://msdn.microsoft.com/en-us/library/system.security.cryptography.x509certificates.x509certificate2.aspx
Scroll down half way to the example and look into the method:
// Decrypt a file using a private key.
private static void DecryptFile(string inFile, RSACryptoServiceProvider rsaPrivateKey)
You will notice the reader is reading only 3 bytes at a time while it is trying to read an int off the stream:
inFs.Seek(0, SeekOrigin.Begin);
inFs.Read(LenK, 0, 3);// <---- this should be 4
inFs.Seek(4, SeekOrigin.Begin);// <--- this line masks the bug for smaller ints
inFs.Read(LenIV, 0, 3); // <---- this should be 4
Since the next line seeks to position 4, the bug is masked. Am I getting it right, or is it intentional, i.e. some sort of weird optimization: since we know that (for this example) the length of the AES key and IV is small enough to be accommodated in 3 bytes, read only 3 and then skip to 4, thus saving the read of 1 byte off the disk?
If an optimization.... Really??
I very much doubt it's an optimisation. Disk reads tend to be in chunks substantially larger than four bytes, and caching would pretty much invalidate this type of optimisation, except in the very rare case where the four bytes may cross two different disk "sectors" (or whatever the disk read resolution is).
That sort of paradigm tends to be seen where (for example) only three bytes are used and the implementation stores other information there.
Not saying that's the case here but you may want to look into the history of certain large companies in using fields for their own purposes, despite what standards say, a la "embrace, extend, extinguish" :-)
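For reference, assuming the length prefix really is a 4-byte little-endian int as the question suggests, the corrected reads from that MSDN sample would look something like this:
// Corrected version of the reads above: pull all four bytes of each length
// prefix before converting with BitConverter (little-endian assumed).
byte[] LenK = new byte[4];
byte[] LenIV = new byte[4];

inFs.Seek(0, SeekOrigin.Begin);
inFs.Read(LenK, 0, 4);
inFs.Seek(4, SeekOrigin.Begin);
inFs.Read(LenIV, 0, 4);

int lenK = BitConverter.ToInt32(LenK, 0);
int lenIV = BitConverter.ToInt32(LenIV, 0);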
The .NET framework ships with 6 different hashing algorithms:
MD5: 16 bytes (Time to hash 500MB: 1462 ms)
SHA-1: 20 bytes (1644 ms)
SHA256: 32 bytes (5618 ms)
SHA384: 48 bytes (3839 ms)
SHA512: 64 bytes (3820 ms)
RIPEMD: 20 bytes (7066 ms)
Each of these functions performs differently; MD5 being the fastest and RIPEMD being the slowest.
MD5 has the advantage that it fits in the built-in Guid type; and it is the basis of the type 3 UUID. SHA-1 hash is the basis of type 5 UUID. Which makes them really easy to use for identification.
MD5, however, is vulnerable to collision attacks; SHA-1 is also vulnerable, but to a lesser degree.
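As an illustration of the Guid fit mentioned above: an MD5 digest is exactly 16 bytes, so it drops straight into a Guid. Note this is not a fully RFC 4122-compliant type 3 UUID, which also sets version and variant bits; the helper name below is my own:
using System;
using System.Security.Cryptography;
using System.Text;

static Guid Md5Guid(string input)
{
    using (var md5 = MD5.Create())
    {
        byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input)); // 16 bytes
        return new Guid(hash); // Guid accepts exactly 16 bytes
    }
}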
Under what conditions should I use which hashing algorithm?
Particular questions I'm really curious to see answered are:
Is MD5 not to be trusted? Under normal situations when you use the MD5 algorithm with no malicious intent and no third party has any malicious intent would you expect ANY collisions (meaning two arbitrary byte[] producing the same hash)
How much better is RIPEMD than SHA1 (if it's any better at all)? It's 5 times slower to compute, but the hash size is the same as SHA1's.
What are the odds of getting non-malicious collisions when hashing file names (or other short strings), e.g. two random file names with the same MD5 / SHA1 / SHA2xx hash? In general, what are the odds of non-malicious collisions?
This is the benchmark I used:
static void TimeAction(string description, int iterations, Action func) {
    var watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}

static byte[] GetRandomBytes(int count) {
    var bytes = new byte[count];
    (new Random()).NextBytes(bytes);
    return bytes;
}

static void Main(string[] args) {
    var md5 = new MD5CryptoServiceProvider();
    var sha1 = new SHA1CryptoServiceProvider();
    var sha256 = new SHA256CryptoServiceProvider();
    var sha384 = new SHA384CryptoServiceProvider();
    var sha512 = new SHA512CryptoServiceProvider();
    var ripemd160 = new RIPEMD160Managed();

    var source = GetRandomBytes(1000 * 1024);

    var algorithms = new Dictionary<string, HashAlgorithm>();
    algorithms["md5"] = md5;
    algorithms["sha1"] = sha1;
    algorithms["sha256"] = sha256;
    algorithms["sha384"] = sha384;
    algorithms["sha512"] = sha512;
    algorithms["ripemd160"] = ripemd160;

    foreach (var pair in algorithms) {
        Console.WriteLine("Hash Length for {0} is {1}",
            pair.Key,
            pair.Value.ComputeHash(source).Length);
    }

    foreach (var pair in algorithms) {
        TimeAction(pair.Key + " calculation", 500, () =>
        {
            pair.Value.ComputeHash(source);
        });
    }

    Console.ReadKey();
}
In cryptography, hash functions provide three separate functions.
Collision resistance: How hard is it for someone to find two messages (any two messages) that hash the same.
Preimage resistance: Given a hash, how hard is it to find a message that hashes to that value? This is what makes it a one-way hash function.
Second preimage resistance: Given a message, find another message that hashes the same.
These properties are related but independent. For example, collision resistance implies second preimage resistance, but not the other way around. For any given application, you will have different requirements, needing one or more of these properties. A hash function for securing passwords on a server will usually only require preimage resistance, while message digests require all three.
It has been shown that MD5 is not collision resistant, however, that does not preclude its use in applications that do not require collision resistance. Indeed, MD5 is often still used in applications where the smaller key size and speed are beneficial. That said, due to its flaws, researchers recommend the use of other hash functions in new scenarios.
SHA1 has a flaw that allows collisions to be found in theoretically far less than the 2^80 steps a secure hash function of its length would require. The attack is continually being revised and currently can be done in ~2^63 steps - just barely within the current realm of computability (as of April, 2009). For this reason NIST is phasing out the use of SHA1, stating that the SHA2 family should be used after 2010.
SHA2 is a new family of hash functions created following SHA1. Currently there are no known attacks against SHA2 functions. SHA256, 384 and 512 are all part of the SHA2 family, just using different digest lengths.
RIPEMD I can't comment too much on, except to note that it isn't as commonly used as the SHA families, and so has not been scrutinized as closely by cryptographic researchers. For that reason alone I would recommend the use of SHA functions over it. In the implementation you are using it seems quite slow as well, which makes it less useful.
In conclusion, there is no one best function - it all depends on what you need it for. Be mindful of the flaws with each and you will be best able to choose the right hash function for your scenario.
⚠️ WARNING
August, 2022
DO NOT USE SHA-1 OR MD5 FOR CRYPTOGRAPHIC APPLICATIONS. Both of these algorithms are broken (MD5 can be cracked in 30 seconds by a cell phone).
All hash functions are "broken"
The pigeonhole principle says that, try as hard as you will, you cannot fit 3 pigeons into 2 holes without putting two of them in the same hole (unless you cut the pigeons up). Similarly, you cannot fit 2^128 + 1 values into 2^128 slots without two of them sharing a slot. All hash functions result in a hash of finite size, which means you can always find a collision if you search through ("finite size" + 1) inputs. It's just not feasible to do so. Not for MD5 and not for Skein.
MD5/SHA1/Sha2xx have no chance collisions
All hash functions have collisions; it's a fact of life. Coming across these collisions by accident is the equivalent of winning the intergalactic lottery. That is to say, no one wins the intergalactic lottery; it's just not the way the lottery works. You will not come across an accidental MD5/SHA1/SHA2XXX hash collision, EVER. Every word in every dictionary, in every language, hashes to a different value. Every path name, on every machine on the entire planet, has a different MD5/SHA1/SHA2XXX hash. How do I know that, you may ask. Well, as I said before, no one wins the intergalactic lottery, ever.
But ... MD5 is broken
Sometimes the fact that its broken does not matter.
As it stands there are no known pre-image or second pre-image attacks on MD5.
So what is so broken about MD5, you may ask? It is possible for a third party to generate 2 messages, one of which is EVIL and another of which is GOOD that both hash to the same value. (Collision attack)
Nonetheless, the current RSA recommendation is not to use MD5 if you need pre-image resistance. People tend to err on the side of caution when it comes to security algorithms.
So what hash function should I use in .NET?
Use MD5 if you need the speed/size and don't care about birthday attacks or pre-image attacks.
Repeat this after me: there are no chance MD5 collisions; malicious collisions can be carefully engineered. Even though there are no known pre-image attacks to date on MD5, the line from the security experts is that MD5 should not be used where you need to defend against pre-image attacks. The same goes for SHA1.
Keep in mind, not all algorithms need to defend against pre-image or collision attacks. Take the trivial case of a first pass search for duplicate files on your HD.
Use a SHA2XX-based function if you want a cryptographically secure hash function.
No one has ever found a SHA512 collision. EVER. They have tried really hard. For that matter, no one has ever found a SHA256 or SHA384 collision either.
Don't use SHA1 or RIPEMD unless it's for an interoperability scenario.
RIPEMD has not received the same amount of scrutiny that SHAx and MD5 have received. Both SHA1 and RIPEMD are vulnerable to birthday attacks. They are both slower than MD5 on .NET and come in an awkward 20-byte size. It's pointless to use these functions; forget about them.
SHA1 collision attacks are down to 2^52; it's not going to be too long until SHA1 collisions are out in the wild.
For up-to-date information about the various hash functions, have a look at the hash function zoo.
But wait there is more
Having a fast hash function can be a curse. For example: a very common use for hash functions is password storage. Essentially, you calculate the hash of a password combined with a known random string (a salt, to impede rainbow-table attacks) and store that hash in the database.
The problem is that if an attacker gets a dump of the database, he can quite effectively guess passwords using brute force. Every combination he tries takes only a fraction of a millisecond, and he can try hundreds of thousands of passwords a second.
To work around this issue, the bcrypt algorithm can be used; it is designed to be slow, so an attacker will be heavily slowed down when attacking a system that uses bcrypt. Recently scrypt has made some headlines and is considered by some to be more effective than bcrypt, but I do not know of a .NET implementation.
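bcrypt and scrypt require third-party packages, but as a built-in illustration of the same deliberately-slow idea, PBKDF2 via Rfc2898DeriveBytes looks roughly like this (the iteration count, output size, and helper name are assumptions to be tuned):
using System.Security.Cryptography;

static byte[] HashPassword(string password, byte[] salt, int iterations = 100000)
{
    // PBKDF2 (HMAC-SHA1 in this classic overload): the iteration count is what
    // makes each guess expensive for an attacker holding a database dump.
    // The salt should be at least 8 random bytes, e.g. from RNGCryptoServiceProvider.
    using (var kdf = new Rfc2898DeriveBytes(password, salt, iterations))
    {
        return kdf.GetBytes(32); // 32-byte derived key to store alongside the salt
    }
}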
Update:
Times have changed: we have a SHA3 winner. I would recommend using Keccak (aka SHA-3), the winner of the SHA-3 contest.
Original Answer:
In order of weakest to strongest I would say:
RIPEMD BROKEN, Should never be used as can be seen in this pdf
MD-5 BROKEN, Should never be used, can be broken in 2 minutes with a laptop
SHA-1 BROKEN, Should never be used, is broken in principle; attacks are getting better by the week
SHA-2 WEAK, Will probably be broken in the next few years. A few weaknesses have been found. Note that generally, the larger the digest size, the harder the hash function is to break. While digest size = strength is not always true, it is mostly true. So SHA-256 is probably weaker than SHA-512.
Skein NO KNOWN WEAKNESSES, is a candidate for SHA-3. It is fairly new and thus untested. It has been implemented in a bunch of languages.
MD6 NO KNOWN WEAKNESSES, is another candidate for SHA-3. Probably stronger than Skein, but slower on single-core machines. Like Skein, it is untested. Some security-minded developers are using it in mission-critical roles.
Personally I'd use MD6, because one can never be too paranoid. If speed is a real concern I'd look at Skein, or SHA-256.
In MD5's defense, there is no known way to produce a file with an arbitrary MD5 hash. The original author must plan in advance to have a working collision. Thus if the receiver trusts the sender, MD5 is fine. MD5 is broken if the signer is malicious, but it is not known to be vulnerable to man-in-the-middle attacks.
It would be a good idea to take a look at the BLAKE2 algorithm.
As it is described, it is faster than MD5 and at least as secure as SHA-3. It is also implemented by several software applications, including WinRAR.
Which one you use really depends on what you are using it for. If you just want to make sure that files don't get corrupted in transit and aren't that concerned about security, go for fast and small. If you need digital signatures for multi-billion dollar federal bailout agreements and need to make sure they aren't forged, go for hard to spoof and slow.
I would like to chime in (before md5 gets torn apart) that I do still use md5 extensively despite its overwhelming brokenness for a lot of crypto.
As long as you don't care to protect against collisions (you are still safe to use md5 in an hmac as well) and you do want the speed (sometimes you want a slower hash) then you can still use md5 confidently.
I am not an expert at this sort of thing, but I keep up with the security community and a lot of people there consider the md5 hash broken. I would say that which one to use depends on how sensitive the data is and the specific application. You might be able to get away with a slightly less secure hash as long as the key is good and strong.
Here are my suggestions for you:
You should probably forget MD5 if you anticipate attacks. There are many rainbow tables for them online, and corporations like the RIAA have been known to be able to produce sequences with equivalent hashes.
Use a salt if you can. Including the message length in the message can make it very difficult to make a useful hash collision.
As a general rule of thumb, more bits means fewer collisions (by the pigeonhole principle), slower computation, and maybe more security (unless you are a math genius who can find vulnerabilities).
See here for a paper detailing an algorithm to create md5 collisions in 31 seconds with a desktop Intel P4 computer.
http://eprint.iacr.org/2006/105