I have about 16,000,000 urls, and I want to get hashcode about the urls.
I use System.Security.Cryptography.MD5 to create hashcode, refer this link
my code like this:
private string createMD5(string input)
{
using(MD5 md5 = MD5.Create())//the problem is here
{
byte[] inputBytes = System.Text.Encoding.Unicode.GetBytes(input);
byte[] hashBytes = md5.ComputeHash(inputBytes);
StringBuilder sb = new StringBuilder();
for(int i = 0;i < hashBytes.Length;i ++)
{
sb.Append(hashBytes[i].ToString("X2"));
}
return sb.ToString();
}
}
public void checkMD5Hash()
{
HashSet<string> urls = new HashSet<string>();
using (StreamReader reader = new StreamReader(#"D:/ScopePractice/SatoriExtractionReport/TransformData/WikipediaUrlList.tsv"))
{
//calcuate the MD5 hashcode for all urls
HashSet<string> set = new HashSet<string>();
string line = null;
while ((line = reader.ReadLine()) != null)
{
set.Add(createMD5(line));
}
double totalNum = 15844272;
Console.WriteLine("MD5Hash: {0}",set.Count() * 1.0 / totalNum * 1.0);
}
}
I use using(MD5 md5 = MD5.Create()),so I create many MD5 instances, it causes difference urls get same MD5 hashcode.
after that, I change my code, use the same MD5 instance to generate hashcode and the problem is solved.
static MD5 md5 = MD5.Create();
private string createMD5(string input)
{
byte[] inputBytes = System.Text.Encoding.Unicode.GetBytes(input);
byte[] hashBytes = md5.ComputeHash(inputBytes);
StringBuilder sb = new StringBuilder();
for(int i = 0;i < hashBytes.Length;i ++)
{
sb.Append(hashBytes[i].ToString("X2"));
}
return sb.ToString();
}
But I don't why different instance will cause this problem, please help me, thanks.
Related
I have some string and I want to hash it with the SHA-256 hash function using C#. I want something like this:
string hashString = sha256_hash("samplestring");
Is there something built into the framework to do this?
The implementation could be like that
public static String sha256_hash(String value) {
StringBuilder Sb = new StringBuilder();
using (SHA256 hash = SHA256Managed.Create()) {
Encoding enc = Encoding.UTF8;
Byte[] result = hash.ComputeHash(enc.GetBytes(value));
foreach (Byte b in result)
Sb.Append(b.ToString("x2"));
}
return Sb.ToString();
}
Edit: Linq implementation is more concise, but, probably, less readable:
public static String sha256_hash(String value) {
using (SHA256 hash = SHA256Managed.Create()) {
return String.Concat(hash
.ComputeHash(Encoding.UTF8.GetBytes(value))
.Select(item => item.ToString("x2")));
}
}
Edit 2: .NET Core , .NET5, .NET6 ...
public static String sha256_hash(string value)
{
StringBuilder Sb = new StringBuilder();
using (var hash = SHA256.Create())
{
Encoding enc = Encoding.UTF8;
byte[] result = hash.ComputeHash(enc.GetBytes(value));
foreach (byte b in result)
Sb.Append(b.ToString("x2"));
}
return Sb.ToString();
}
From .NET 5 onwards, you can use the new Convert.ToHexString method to convert the hash byte array into a (hex) string without having to use a StringBuilder or .ToString("X0"), etc.:
public static string HashWithSHA256(string value)
{
using var hash = SHA256.Create();
var byteArray = hash.ComputeHash(Encoding.UTF8.GetBytes(value));
return Convert.ToHexString(byteArray);
}
I was looking for an in-line solution, and was able to compile the below from Dmitry's answer:
public static String sha256_hash(string value)
{
return (System.Security.Cryptography.SHA256.Create()
.ComputeHash(Encoding.UTF8.GetBytes(value))
.Select(item => item.ToString("x2")));
}
Im trying to translate this to c#
f1 = Digest::SHA1.hexdigest(#password)
f2 = nonce + ":" + f1
Digest::MD5.hexdigest(f2)
My Code
private static string GetSHA1HashData(string data)
{
//create new instance of md5
SHA1 sha1 = SHA1.Create();
//convert the input text to array of bytes
byte[] hashData = sha1.ComputeHash(Encoding.Default.GetBytes(data));
//create new instance of StringBuilder to save hashed data
StringBuilder returnValue = new StringBuilder();
//loop for each byte and add it to StringBuilder
for (int i = 0; i < hashData.Length; i++)
{
returnValue.Append(hashData[i].ToString());
}
// return hexadecimal string
return returnValue.ToString();
}
public static string CreateMD5Hash(string input)
{
// Use input string to calculate MD5 hash
MD5 md5 = System.Security.Cryptography.MD5.Create();
byte[] inputBytes = System.Text.Encoding.ASCII.GetBytes(input);
byte[] hashBytes = md5.ComputeHash(inputBytes);
// Convert the byte array to hexadecimal string
StringBuilder sb = new StringBuilder();
for (int i = 0; i < hashBytes.Length; i++)
{
sb.Append(hashBytes[i].ToString("X2"));
// To force the hex string to lower-case letters instead of
// upper-case, use he following line instead:
// sb.Append(hashBytes[i].ToString("x2"));
}
return sb.ToString();
}
Call
var nonce = "1386755695841";
var password = "edc123";
var sha = GetSHA1HashData(password);
var md5 = CreateMD5Hash(nonce + ":" + sha);
But i cant get it right, any ideas
The problem is that .NET uses UTF-16 by default whilst Ruby will use something else (normally UTF-8, but it may also respect encoding that has been set by database or web source).
I think you just need to alter one line:
//convert the input text to array of bytes
byte[] hashData = sha1.ComputeHash(Encoding.ASCII.GetBytes(data));
You may need UTF-8 or even some other encoding instead, depending on the range of passwords accepted by the Ruby version. However, the ASCII coding should at least prove correctness of this answer in general from your sample data.
How to generate a random Md5 hash value in C#?
A random MD5 hash value is effectively just a 128-bit crypto-strength random number.
var bytes = new byte[16];
using (var rng = new RNGCryptoServiceProvider())
{
rng.GetBytes(bytes);
}
// and if you need it as a string...
string hash1 = BitConverter.ToString(bytes);
// or maybe...
string hash2 = BitConverter.ToString(bytes).Replace("-", "").ToLower();
You could create a random string using Guid.NewGuid() and generate its MD5 checksum.
using System.Text;
using System.Security.Cryptography;
public static string ConvertStringtoMD5(string strword)
{
MD5 md5 = MD5.Create();
byte[] inputBytes = System.Text.Encoding.ASCII.GetBytes(strword);
byte[] hash = md5.ComputeHash(inputBytes);
StringBuilder sb = new StringBuilder();
for (int i = 0; i < hash.Length; i++)
{
sb.Append(hash[i].ToString("x2"));
}
return sb.ToString();
}
Looking for a way to do the following in C# from a string.
public static String sha512Hex(byte[] data)
Calculates the SHA-512 digest and returns the value as a hex string.
Parameters:
data - Data to digest
Returns:
SHA-512 digest as a hex string
private static string GetSHA512(string text)
{
UnicodeEncoding UE = new UnicodeEncoding();
byte[] hashValue;
byte[] message = UE.GetBytes(text);
SHA512Managed hashString = new SHA512Managed();
string encodedData = Convert.ToBase64String(message);
string hex = "";
hashValue = hashString.ComputeHash(UE.GetBytes(encodedData));
foreach (byte x in hashValue)
{
hex += String.Format("{0:x2}", x);
}
return hex;
}
Would System.Security.Cryptography.SHA512 be what you need?
var alg = SHA512.Create();
alg.ComputeHash(Encoding.UTF8.GetBytes("test"));
BitConverter.ToString(alg.Hash).Dump();
Executed in LINQPad produces:
EE-26-B0-DD-4A-F7-E7-49-AA-1A-8E-E3-C1-0A-E9-92-3F-61-89-80-77-2E-47-3F-88-19-A5-D4-94-0E-0D-B2-7A-C1-85-F8-A0-E1-D5-F8-4F-88-BC-88-7F-D6-7B-14-37-32-C3-04-CC-5F-A9-AD-8E-6F-57-F5-00-28-A8-FF
To create the method from your question:
public static string sha512Hex(byte[] data)
{
using (var alg = SHA512.Create())
{
alg.ComputeHash(data);
return BitConverter.ToString(alg.Hash);
}
}
Got this to work. Taken from here and modified a bit.
public static string CreateSHAHash(string Phrase)
{
SHA512Managed HashTool = new SHA512Managed();
Byte[] PhraseAsByte = System.Text.Encoding.UTF8.GetBytes(string.Concat(Phrase));
Byte[] EncryptedBytes = HashTool.ComputeHash(PhraseAsByte);
HashTool.Clear();
return Convert.ToBase64String(EncryptedBytes);
}
Better memory management:
public static string SHA512Hash(string value)
{
byte[] encryptedBytes;
using (var hashTool = new SHA512Managed())
{
encryptedBytes = hashTool.ComputeHash(System.Text.Encoding.UTF8.GetBytes(string.Concat(value)));
hashTool.Clear();
}
return Convert.ToBase64String(encryptedBytes);
}
I have a function that generates a MD5 hash in C# like this:
MD5 md5 = new MD5CryptoServiceProvider();
byte[] result = md5.ComputeHash(data);
StringBuilder sb = new StringBuilder();
for (int i = 0; i < result.Length; i++)
{
sb.Append(result[i].ToString("X2"));
}
return sb.ToString();
In java my function looks like this:
MessageDigest m = MessageDigest.getInstance("MD5");
m.update(bytes,0,bytes.length);
String hashcode = new BigInteger(1,m.digest()).toString(16);
return hashcode;
While the C# code generates: "02945C9171FBFEF0296D22B0607D522D" the java codes generates: "5a700e63fa29a8eae77ebe0443d59239".
Is there a way to generate the same md5 hash for the same bytearray?
On demand:
This is the testcode in java:
File file = new File(System.getProperty("user.dir") + "/HashCodeTest.flv");
byte[] bytes = null;
try {
bytes = FileUtils.getBytesFromFile(file);
} catch (IOException e) {
fail();
}
try {
generatedHashCode = HashCode.generate(bytes);
} catch (NoSuchAlgorithmException e) {
fail();
}
and this is my code in C#
var blob = GetBlobByHttpPostedFile(httpPostedFile);
var hashCode = Md5Factory.ConvertByteArray(blob);
private static byte[] GetBlobByHttpPostedFile(HttpPostedFile httpPostedFile)
{
var contentLength = httpPostedFile.ContentLength;
var result = new byte[contentLength];
var inputStream = httpPostedFile.InputStream;
inputStream.Read(result, 0, contentLength);
return result;
}
Cheers
That should be fine - although you could make the Java code simpler by just calling
byte[] digest = m.digest(bytes);
instead of calling update then digest.
Are you absolutely sure you've got the same data in both cases? Could you post sample programs showing this failing with the same hard-coded data?
EDIT: Here's the sort of test I was thinking of. These two programs give the same result:
C#:
using System;
using System.Security.Cryptography;
using System.Text;
class Test
{
static void Main()
{
byte[] bytes = { 0x35, 0x24, 0x76, 0x12 };
MD5 md5 = new MD5CryptoServiceProvider();
byte[] result = md5.ComputeHash(bytes);
StringBuilder sb = new StringBuilder();
for (int i = 0; i < result.Length; i++)
{
sb.Append(result[i].ToString("x2"));
}
Console.WriteLine(sb);
}
}
Java:
import java.math.BigInteger;
import java.security.MessageDigest;
public class Test
{
public static void main(String[] args) throws Exception
{
byte[] bytes = { 0x35, 0x24, 0x76, 0x12 };
MessageDigest m = MessageDigest.getInstance("MD5");
byte[] digest = m.digest(bytes);
String hash = new BigInteger(1, digest).toString(16);
System.out.println(hash);
}
}
Hi I m using this code and it works
C# code :
public static string ConvertStringToMD5(string ClearText)
{
byte[] ByteData = Encoding.ASCII.GetBytes(ClearText);
//MD5 creating MD5 object.
MD5 oMd5 = MD5.Create();
//Hash değerini hesaplayalım.
byte[] HashData = oMd5.ComputeHash(ByteData);
//convert byte array to hex format
StringBuilder oSb = new StringBuilder();
for (int x = 0; x < HashData.Length; x++)
{
//hexadecimal string value
oSb.Append(HashData[x].ToString("x2"));
}
and Java code :
private String getMD5Digest(byte[] buffer) {
String resultHash = null;
try {
MessageDigest md5 = MessageDigest.getInstance("MD5");
byte[] result = new byte[md5.getDigestLength()];
md5.reset();
md5.update(buffer);
result = md5.digest();
StringBuffer buf = new StringBuffer(result.length * 2);
for (int i = 0; i < result.length; i++) {
int intVal = result[i] & 0xff;
if (intVal < 0x10) {
buf.append("0");
}
buf.append(Integer.toHexString(intVal));
}
resultHash = buf.toString();
} catch (NoSuchAlgorithmException e) {
}
return resultHash;
}
I came cross the similar issue that we were using Java MD5 Hash to determine whether a file has been processed. We found we cannot create same hash using .NET library. I tried all above suggestion, unfortunately it is not working for me.
The solution I found out later is: instead of create similar function in .NET, we call Java function directly in .NET. There is one great open source project called Ja.NET. Basically what i did is: create a Java class that create hash using the same code. compile it using Ja.NET javac. Then using bam compile the generated Java class file into DLL and use it in my .NET project.
I know this topic is old but I ran into the same issue just now and couldn't find an answer that worked for me. I was writing a patcher for a game and needed the md5 hashcode of files as a way to ensure that the files are up to date, but C# and Java gave me different strings although the files were identical.
Here's how I solved it:
C# Code:
public static string getMD5(string fullPath)
{
MD5 md5 = MD5.Create();
using (FileStream stream = new FileStream(fullPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
byte[] hash = md5.ComputeHash(stream);
StringBuilder sb = new StringBuilder();
for (int j = 0; j < hash.Length; j++)
{
sb.Append(hash[j].ToString("X2"));
}
return sb.ToString();
}
}
This creates a 32 character hex string. Apache Commons DigestUtils.md5Hex(InputStream) does the same, now the only different is that the C# example returns an uppercase string, so the solution is simply to convert the hash from the Java program to an uppercase string.
Java code:
public static String checkSumApacheCommons(String filePath)
{
String checksum = null;
try
{
checksum = DigestUtils.md5Hex(new FileInputStream(filePath));
}
catch (IOException ex)
{
ex.printStackTrace(System.out);
}
return checksum.toUpperCase();
}
The produced hashes look like F674865D8A44695A2443017CFA2B0C67.
Hope this helps someone.