I am trying to decode the value I find in the tag's "hash" attribute, like "b4002e70b6cb73b1093d83e2b8e6c734", to a byte array so I can call the noteStore.getResourceByHash method correctly. Right now I am constantly getting EDAMNotFoundException errors, so I am guessing I am not computing the hash correctly.
Did anyone already figure this out?
Here is the code. I tried many different methods. This is the current state of affairs:
System.Security.Cryptography.MD5CryptoServiceProvider test123 = new System.Security.Cryptography.MD5CryptoServiceProvider();
byte[] data = System.Text.Encoding.ASCII.GetBytes("b4002e70b6cb73b1093d83e2b8e6c733");
data = test123.ComputeHash(data);
var note = noteStore.getResourceByHash(evernoteToken, noteGuid, data, true, false, false);
It looks like your hexadecimal number is 16 bytes. Is it a GUID? If so, you can just use this:
var id = Guid.Parse("b4002e70b6cb73b1093d83e2b8e6c733").ToByteArray();
Using Encoding.ASCII.GetBytes is definitely not right, because that gives you one byte per character, corresponding to the ASCII value of that character. You want one byte per two characters, i.e. hexadecimal decoding.
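For example, a minimal hex decoder might look like this (the helper name is just illustrative):
// Decodes a hex string such as "b4002e70b6cb73b1093d83e2b8e6c734"
// into the raw 16 bytes that getResourceByHash expects.
public static byte[] HexStringToBytes(string hex)
{
    var bytes = new byte[hex.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
    {
        // Every two hex characters become one byte.
        bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
    }
    return bytes;
}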
Evernote references resources via the resource's GUID or the hash of the binary file stream.
If you are looking to get the hash of a resource, you must take the hash of the file you have uploaded to Evernote. The code might look something like this:
public string CalculateFileHashTotal(string fileLocation)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(fileLocation))
        {
            byte[] b = md5.ComputeHash(stream);
            stream.Close();
            return BitConverter.ToString(b).Replace("-", "").ToLower();
        }
    }
}
If you are looking to get a resource that has already been uploaded or you don't have access to the file, referencing the resource via the GUID would likely be your best option.
You can call the getNote method to get the note object, which has a resources attribute containing a list of Resources, each of which has a guid attribute. This GUID can be used to call any of the following methods (each is linked to the Evernote API reference):
getResource
getResourceAlternateData
getResourceApplicationData
getResourceApplicationDataEntry
getResourceAttributes
getResourceData
getResourceRecognition
getResourceSearchText
I need my app to handle a list of mods from a database and a list of locally downloaded mods that aren't in the database.
Each mod from the database has a unique uint ID that I use to identify it, but local mods don't have any ID.
At first I tried to generate an ID with string.GetHashCode() using the mod's name, but GetHashCode is randomized at each run of the app.
Is there any other way to generate a persistent uint ID from the mod's name?
Current code:
foreach (string mod in localMods)
{
    // This way I get a number between 0 and 2147483647
    uint newId = Convert.ToUInt32(Math.Abs(mod.GetHashCode()));
    ProfileMod newMod = new ProfileMod(newId);
}
The method GetHashCode() isn't guaranteed to return the same value for the same string across runs of the application. It has a different purpose (checking equality at runtime, use in hash tables, etc.).
So, it shouldn't be used as a unique identifier.
If you'd like to calculate the hash and get consistent results, you might consider using the standard hashing algorithms like MD5, SHA256, etc.
Here is a sample that calculates SHA256:
using System;
using System.Security.Cryptography;
using System.Text;

public class Program
{
    public static void Main()
    {
        string input = "Hello World!";

        // Using the SHA256 algorithm for the hash.
        // NOTE: You can replace it with any other algorithm (e.g. MD5) if you need.
        using (var hashAlgorithm = SHA256.Create())
        {
            // Convert the input string to a byte array and compute the hash.
            byte[] data = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(input));

            // Create a new StringBuilder to collect the bytes
            // and create a string.
            var sBuilder = new StringBuilder();

            // Loop through each byte of the hashed data
            // and format each one as a hexadecimal string.
            for (int i = 0; i < data.Length; i++)
            {
                sBuilder.Append(data[i].ToString("x2"));
            }

            // Return the hexadecimal string.
            var hash = sBuilder.ToString();
            Console.WriteLine($"The SHA256 hash of {input} is: {hash}.");
        }
    }
}
Though SHA256 produces a longer result than MD5, the risk of collisions is much lower. But if you still want smaller hashes (with a higher risk of collisions), you can use MD5, or even CRC32.
P.S. The sample code is based on the one from Microsoft's documentation.
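If you ultimately need a uint ID rather than a hex string, one option is to take the first four bytes of the digest; this is a sketch, and the helper name is just illustrative:
// Derives a persistent uint ID from a mod name by hashing the name and
// taking the first 4 bytes of the digest. Collisions are still possible,
// just far less likely than relying on string.GetHashCode().
public static uint GetStableId(string name)
{
    using (var sha = SHA256.Create())
    {
        byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(name));
        return BitConverter.ToUInt32(digest, 0);
    }
}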
So I ended up listening to your advice and found a good answer in another post, using SHA-1:
private System.Security.Cryptography.SHA1 hash = new System.Security.Cryptography.SHA1CryptoServiceProvider();

private uint GetUInt32HashCode(string strText)
{
    if (string.IsNullOrEmpty(strText)) return 0;

    // Unicode encoding covers the whole character set.
    byte[] byteContents = Encoding.Unicode.GetBytes(strText);
    byte[] hashText = hash.ComputeHash(byteContents);
    uint hashCodeStart = BitConverter.ToUInt32(hashText, 0);
    uint hashCodeMedium = BitConverter.ToUInt32(hashText, 8);
    uint hashCodeEnd = BitConverter.ToUInt32(hashText, 16);
    var hashCode = hashCodeStart ^ hashCodeMedium ^ hashCodeEnd;
    return uint.MaxValue - hashCode;
}
Could probably be optimized but it's good enough for now.
I wouldn't trust any solution involving hashing for this. Eventually you will end up with conflicting IDs, especially if you have a huge number of records in your DB.
What I would prefer to do is convert the int ID from the DB to a string when reading it, and then use something like Guid.NewGuid().ToString() to generate a string UID for the local mods.
This way you will not have any conflicts at all.
I guess you will have to employ some strategy along these lines.
I have strings of sensitive information that I need to collect from my users. I am using a WPF PasswordBox to request this information. For the uninitiated, the PasswordBox control provides a SecurePassword property which is a SecureString object rather than an insecure string object. Within my application, the data from the PasswordBox gets passed as a SecureString to an encryption method.
What I need to be able to do is marshal the data to a byte array that essentially represents a .Net string value without first converting the data to a .Net string. More specifically, given a SecureString with a value such as...
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!##$%^&*()_-+={[}]|:;"'<,>.?/ ≈篭母
...how can I convert it to a byte array that is the equivalent of a .NET string that's been serialized and written to a stream with a StreamWriter?
By using Marshal.SecureStringToCoTaskMemUnicode(...) I am able to do this with more traditional, western text. However, when I created the above text string using additional, non-typical characters and a string of Japanese text (see the last few bolded characters), my method of getting a Unicode byte array assigned to the IntPtr position doesn't seem to work properly anymore.
How can I emit the data of a SecureString in a secure way such that the returned byte data is structured the same as the byte data of a standard .Net string, serialized to binary output?
NOTE
Please ignore all security concerns at the moment. I am working on making various security upgrades to my application. For now, I need to use a SecureString for getting the sensitive data to the encryptor. The decryptor (for now) will still need to decrypt this data to string values, which is why I need to somehow serialize the data in the SecureString to a binary format similar to the binary format of the string object.
I agree that this approach is a bit unfortunate, however, I'm having to make incremental improvements on an existing application, and the first phase is locking down the data in SecureString objects from the user to the encryptor.
If you need to write a secure string to a stream, I'd suggest creating a method like this:
public static class Extensions {
    public static void WriteSecure(this StreamWriter writer, SecureString sec) {
        int length = sec.Length;
        if (length == 0)
            return;
        IntPtr ptr = Marshal.SecureStringToBSTR(sec);
        try {
            // each char in that string is 2 bytes, not one (it's UTF-16 string)
            for (int i = 0; i < length * 2; i += 2) {
                // so use ReadInt16 and convert resulting "short" to char
                var ch = Convert.ToChar(Marshal.ReadInt16(ptr + i));
                // write
                writer.Write(ch);
            }
        }
        finally {
            // don't forget to zero memory
            Marshal.ZeroFreeBSTR(ptr);
        }
    }
}
If you really need a byte array, you can reuse this method too:
byte[] result;
using (var ms = new MemoryStream()) {
    using (var writer = new StreamWriter(ms)) {
        writer.WriteSecure(secureString);
    }
    result = ms.ToArray();
}
Though the method from the first comment might be a bit more performant (not sure if that's important for you).
Background
I am converting media files to a new format and need a way of knowing whether I've already converted a file during the current run.
My solution
To hash each file and store the hash in an array. Each time I go to convert a file I hash it and check the hash against the hashes stored in the array.
Problem
My logic doesn't seem able to detect when I've already seen a file and I end up converting the same file multiple times.
Code
//Byte array of already processed files
private static readonly List<byte[]> Bytelist = new List<byte[]>();

public static bool DoCheck(string file)
{
    FileInfo info = new FileInfo(file);
    while (FrmMain.IsFileLocked(info)) //Make sure file is finished being copied/moved
    {
        Thread.Sleep(500);
    }

    //Get byte sig of file and if seen before dont process
    byte[] myFileData = File.ReadAllBytes(file);
    byte[] myHash = MD5.Create().ComputeHash(myFileData);

    if (Bytelist.Count != 0)
    {
        foreach (var item in Bytelist)
        {
            //If seen before ignore
            if (myHash == item)
            {
                return true;
            }
        }
    }

    Bytelist.Add(myHash);
    return false;
}
Question
Is there a more efficient way of achieving my end goal? What am I doing wrong?
There are multiple questions here; I'm going to answer the first one:
Is there a more efficient way of achieving my end goal?
TL;DR yes.
You're storing and comparing only full-file hashes, which is a really expensive operation. You can do cheaper checks before calculating the hash:
Is the file size the same? If not, go to the next check.
Are the first bunch of bytes the same? If not, go to the next check.
At this point you have to check the hashes (MD5).
Of course you will have to store size/first X bytes/hash for each processed file.
In addition, the same MD5 doesn't guarantee the files are the same, so you might want an extra step to check whether they really are. That may be overkill, though; it depends on how heavy the cost of reprocessing a file is, and it might be more important to avoid calculating expensive hashes.
EDIT: As for the second question: the check is likely to fail because you are comparing the references of two byte arrays, which will never be equal since you create a new array every time. You need a sequence-equality comparison between the byte[] values (or convert the hash to a string and compare strings instead):
var exists = Bytelist.Any(hash => hash.SequenceEqual(myHash));
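A rough sketch of the staged check described above, using an in-memory cache (the FileSignature type, PrefixSize, and Seen list are hypothetical names, not from the original code):
// Cheap-to-expensive duplicate detection: compare sizes first, then a
// small prefix, and only compute the full MD5 when the cheap checks pass.
private sealed class FileSignature
{
    public long Length;
    public byte[] Prefix; // first PrefixSize bytes of the file
    public byte[] Md5;    // hash of the whole file
}

private const int PrefixSize = 4096;
private static readonly List<FileSignature> Seen = new List<FileSignature>();

public static bool SeenBefore(string file)
{
    long length = new FileInfo(file).Length;
    var prefix = new byte[(int)Math.Min(PrefixSize, length)];
    using (var fs = File.OpenRead(file))
    {
        fs.Read(prefix, 0, prefix.Length);
    }

    byte[] md5 = null; // computed lazily, only if a candidate survives the cheap checks
    foreach (var sig in Seen)
    {
        if (sig.Length != length) continue;
        if (!sig.Prefix.SequenceEqual(prefix)) continue; // needs System.Linq
        if (md5 == null)
        {
            using (var alg = MD5.Create())
            {
                md5 = alg.ComputeHash(File.ReadAllBytes(file));
            }
        }
        if (sig.Md5.SequenceEqual(md5)) return true;
    }

    if (md5 == null)
    {
        using (var alg = MD5.Create())
        {
            md5 = alg.ComputeHash(File.ReadAllBytes(file));
        }
    }
    Seen.Add(new FileSignature { Length = length, Prefix = prefix, Md5 = md5 });
    return false;
}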
Are you sure this new file format doesn't add extra metadata into the content, like a last-modified time or attributes that change?
Also, if you are converting to a known format, then there should be a way to use a file signature to know whether a file is already in this format or not; if it's your own format, add some extra signature bytes to identify it.
Don't forget that with your approach, if your app gets closed and opened again, it will reprocess all the files.
One last point regarding the code: I prefer not to store byte arrays, but if you must, it's better to use a HashSet instead of a List, since it has O(1) access time.
There's a lot of room for improvement with regard to efficiency, effectiveness and style, but this isn't CodeReview.SE, so I'll try to stick to the problem at hand:
You're checking whether two byte arrays are equivalent by using the == operator. But that only performs reference-equality testing - i.e. it tests whether the two variables point to the same instance, the very same array. That, of course, won't work here.
There are many ways to do it, starting with a simple foreach loop over the arrays (probably with an optimization that checks the length first) or using Enumerable.SequenceEqual as you can find in this answer here.
Better yet, convert your hash's byte[] to a string (any string - Convert.ToBase64String would be a good choice) and store that in your Bytelist cache (which should be a HashSet, not a List). Strings are optimized for this sort of comparison, and you won't run into the "reference equality" problem here.
So a sample solution would be this:
private static readonly HashSet<string> _computedHashes = new HashSet<string>();

public static bool DoCheck(string file)
{
    /// stuff

    //Get byte sig of file and if seen before dont process
    byte[] myFileData = File.ReadAllBytes(file);
    byte[] myHash = MD5.Create().ComputeHash(myFileData);
    string hashString = Convert.ToBase64String(myHash);
    return _computedHashes.Contains(hashString);
}
Presumably, you'll add the hash to the _computedHashes set after you've done the conversion.
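Alternatively, since HashSet<T>.Add reports whether the value was actually added, the check and the bookkeeping can be combined in one call (a small variation on the snippet above):
// Add returns false when the hash is already in the set,
// i.e. the file has been processed before.
string hashString = Convert.ToBase64String(myHash);
return !_computedHashes.Add(hashString);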
You have to compare the byte arrays item by item:
foreach (var item in Bytelist)
{
    //If seen before ignore
    if (myHash.Length == item.Length)
    {
        bool isequal = true;
        for (int i = 0; i < myHash.Length; i++)
        {
            if (myHash[i] != item[i])
            {
                isequal = false;
            }
        }
        if (isequal)
        {
            return true;
        }
    }
}
This question already has answers here:
How to compare 2 files fast using .NET?
(20 answers)
Closed 7 years ago.
I want to compare two files in C# and see if they are different. They have the same file names, and they are the exact same size even when they differ. I was just wondering if there is a fast way to do this without having to manually go in and read the files.
Thanks
Depending on how far you're looking to take it, you can take a look at Diff.NET
Here's a simple file comparison function:
// This method accepts two strings that represent two files to
// compare. A return value of true indicates that the contents of the
// files are the same. A return value of false indicates that the
// files are not the same.
private bool FileCompare(string file1, string file2)
{
    int file1byte;
    int file2byte;
    FileStream fs1;
    FileStream fs2;

    // Determine if the same file was referenced two times.
    if (file1 == file2)
    {
        // Return true to indicate that the files are the same.
        return true;
    }

    // Open the two files.
    fs1 = new FileStream(file1, FileMode.Open, FileAccess.Read);
    fs2 = new FileStream(file2, FileMode.Open, FileAccess.Read);

    // Check the file sizes. If they are not the same, the files
    // are not the same.
    if (fs1.Length != fs2.Length)
    {
        // Close the files.
        fs1.Close();
        fs2.Close();

        // Return false to indicate the files are different.
        return false;
    }

    // Read and compare a byte from each file until either a
    // non-matching set of bytes is found or until the end of
    // file1 is reached.
    do
    {
        // Read one byte from each file.
        file1byte = fs1.ReadByte();
        file2byte = fs2.ReadByte();
    }
    while ((file1byte == file2byte) && (file1byte != -1));

    // Close the files.
    fs1.Close();
    fs2.Close();

    // Return the success of the comparison. "file1byte" is
    // equal to "file2byte" at this point only if the files are
    // the same.
    return ((file1byte - file2byte) == 0);
}
I was just wondering if there is a fast way to do this without having to manually go in and read the file.
Not really.
If the files came with hashes, you could compare the hashes; if they are different, you can conclude the files are different (the same hashes, however, do not mean the files are the same, so you would still have to do a byte-by-byte comparison).
However, hashes use all the bytes in the file, so no matter what, you have to read the files byte for byte at some point. And in fact, a straight byte-by-byte comparison will be faster than computing a hash: a hash reads all the bytes just like a byte-by-byte comparison does, but it also does extra computation that adds time, and a byte-by-byte comparison can terminate early on the first pair of non-equal bytes.
Finally, you cannot avoid the byte-by-byte read: if the hashes are equal, that still doesn't mean the files are equal, so in that case you have to compare byte-by-byte anyway.
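For illustration, a buffered comparison along those lines might look like this (a sketch; the buffer size is arbitrary, and a fully robust version would loop until each buffer is filled, since Stream.Read may return fewer bytes than requested):
// Compares two files chunk by chunk and stops at the first difference.
public static bool FilesAreEqual(string path1, string path2)
{
    const int bufferSize = 64 * 1024;
    using (var fs1 = File.OpenRead(path1))
    using (var fs2 = File.OpenRead(path2))
    {
        if (fs1.Length != fs2.Length)
            return false;

        var buffer1 = new byte[bufferSize];
        var buffer2 = new byte[bufferSize];
        while (true)
        {
            int read1 = fs1.Read(buffer1, 0, bufferSize);
            int read2 = fs2.Read(buffer2, 0, bufferSize);
            if (read1 == 0 && read2 == 0)
                return true;  // both streams exhausted with no mismatch
            if (read1 != read2)
                return false; // short reads differ; treated as a mismatch in this sketch
            for (int i = 0; i < read1; i++)
            {
                if (buffer1[i] != buffer2[i])
                    return false; // early exit on the first differing byte
            }
        }
    }
}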
Well, I'm not sure whether you can rely on the files' write timestamps. If not, your only alternative is comparing the content of the files.
A simple approach is comparing the files byte by byte, but if you're going to compare a file with others several times, you can calculate a hash of each file and compare the hashes.
The following code snippet shows how you can do it:
public static string CalcHashCode(string filename)
{
    FileStream stream = new FileStream(
        filename,
        System.IO.FileMode.Open,
        System.IO.FileAccess.Read,
        System.IO.FileShare.ReadWrite);
    try
    {
        return CalcHashCode(stream);
    }
    finally
    {
        stream.Close();
    }
}

public static string CalcHashCode(FileStream file)
{
    MD5CryptoServiceProvider md5Provider = new MD5CryptoServiceProvider();
    Byte[] hash = md5Provider.ComputeHash(file);
    return Convert.ToBase64String(hash);
}
If you're going to compare a file with others more than once, you can save the file hash and compare that. For a single comparison, the byte-by-byte comparison is better. You also need to recompute the hash when a file changes, but if you're going to do many comparisons, I recommend the hash approach.
If the filenames are the same, and the file sizes are the same, then, no, there is no way to know if they have different content without examining the content.
Read the file into a stream, then hash the stream. That should give you a reliable result for comparing.
byte[] fileHash1, fileHash2;

using (SHA256Managed sha = new SHA256Managed())
{
    fileHash1 = sha.ComputeHash(streamforfile1);
    fileHash2 = sha.ComputeHash(streamforfile2);
}

for (int i = 0; (i < fileHash1.Length) && (i < fileHash2.Length); i++)
{
    if (fileHash1[i] != fileHash2[i])
    {
        //files are not the same
        break;
    }
}
If they are not compiled files, then use a diff tool like KDiff or WinMerge. It will highlight where they are different.
http://kdiff3.sourceforge.net/
http://winmerge.org/
pass each file stream through an MD5 hasher and compare the hashes.
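For example (a sketch; path1 and path2 are placeholder variables):
// Hash both streams with MD5 and compare the digests.
// Equal digests almost certainly mean equal content.
bool same;
using (var md5 = MD5.Create())
using (var stream1 = File.OpenRead(path1))
using (var stream2 = File.OpenRead(path2))
{
    byte[] hash1 = md5.ComputeHash(stream1);
    byte[] hash2 = md5.ComputeHash(stream2);
    same = hash1.SequenceEqual(hash2); // SequenceEqual requires System.Linq
}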
I have a method that currently returns a string converted from a byte array:
public static readonly UnicodeEncoding ByteConverter = new UnicodeEncoding();

public static string Decrypt(string textToDecrypt, string privateKeyXml)
{
    if (string.IsNullOrEmpty(textToDecrypt))
    {
        throw new ArgumentException(
            "Cannot decrypt null or blank string"
        );
    }
    if (string.IsNullOrEmpty(privateKeyXml))
    {
        throw new ArgumentException("Invalid private key XML given");
    }
    byte[] bytesToDecrypt = Convert.FromBase64String(textToDecrypt);
    byte[] decryptedBytes;
    using (var rsa = new RSACryptoServiceProvider())
    {
        rsa.FromXmlString(privateKeyXml);
        decryptedBytes = rsa.Decrypt(bytesToDecrypt, FOAEP);
    }
    return ByteConverter.GetString(decryptedBytes);
}
I'm trying to update this method to instead return a SecureString, but I'm having trouble converting the return value of RSACryptoServiceProvider.Decrypt from byte[] to SecureString. I tried the following:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
    char[] chars = ByteConverter.GetChars(new[] { b });
    if (chars.Length != 1)
    {
        throw new Exception(
            "Could not convert a single byte into a single char"
        );
    }
    secStr.AppendChar(chars[0]);
}
return secStr;
However, using this SecureString equality tester, the resulting SecureString was not equal to the SecureString constructed from the original, unencrypted text. My Encrypt and Decrypt methods worked before, when I was just using string everywhere, and I've also tested the SecureString equality code, so I'm pretty sure the problem here is how I'm trying to convert byte[] into SecureString. Is there another route I should take for using RSA encryption that would allow me to get back a SecureString when I decrypt?
Edit: I didn't want to convert the byte array to a regular string and then stuff that string into a SecureString, because that seems to defeat the point of using a SecureString in the first place. However, is it also bad that Decrypt returns byte[] and I'm then trying to stuff that byte array into a SecureString? It's my guess that if Decrypt returns a byte[], then that's a safe way to pass around sensitive information, so converting one secure representation of the data to another secure representation seems okay.
A char and a byte can be used interchangeably with casting, so modify your second chunk of code as such:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
    secStr.AppendChar((char)b);
}
return secStr;
This should work properly, but keep in mind that you're still bringing the unencrypted information into the "clear" in memory, so there's a point at which it could be compromised (which sort of defeats the purpose of a SecureString).
** Update **
A byte[] of your sensitive information is not secure. You can look at it in memory and see the information (especially if it's just a string). The individual bytes will be in the exact order of the string, so 'read'ing it is pretty straight-forward.
I was (actually about an hour ago) just struggling with this same issue myself, and as far as I know there is no good way to go straight from the decrypter to the SecureString unless the decrypter is specifically programmed to support this strategy.
I think the problem might be your ByteConvert.GetChars method. I can't find that class or method in the MSDN docs. I'm not sure if that is a typo, or a homegrown function. Regardless, it is most likely not interpreting the encoding of the bytes correctly. Instead, use the UTF8Encoding's GetChars method. It will properly convert the bytes back into a .NET string, assuming they were encrypted from a .NET string object originally. (If not, you'll want to use the GetChars method on the encoding that matches the original string.)
You're right that using arrays is the most secure approach. Because the decrypted representations of your secret are stored in byte or char arrays, you can easily clear them out when done, so your plaintext secret isn't left in memory. This isn't perfectly secure, but more secure than converting to a string. Strings can't be changed and they stay in memory until they are garbage collected at some indeterminate future time.
var secStr = new SecureString();
var chars = System.Text.Encoding.UTF8.GetChars(decryptedBytes);
for (int idx = 0; idx < chars.Length; ++idx)
{
    secStr.AppendChar(chars[idx]);
    // Clear out the chars as you go.
    chars[idx] = '\0';
}
// Clear the decrypted bytes from memory, too.
Array.Clear(decryptedBytes, 0, decryptedBytes.Length);
return secStr;
Based on Coding Gorilla's answer, I tried the following in my Decrypt method:
string decryptedString1 = string.Empty;
foreach (byte b in decryptedBytes)
{
    decryptedString1 += (char)b;
}
string decryptedString2 = ByteConverter.GetString(decryptedBytes);
When debugging, decryptedString1 and decryptedString2 were not equal:
decryptedString1 "m\0y\0V\0e\0r\0y\0L\0o\0n\0g\0V\03\0r\0y\05\03\0c\0r\03\07\0p\04\0s\0s\0w\00\0r\0d\0!\0!\0!\0"
decryptedString2 "myVeryLongV3ry53cr37p4ssw0rd!!!"
So it looks like I can just go through the byte[] array, do a direct cast to char, and skip \0 characters. Like Coding Gorilla said, though, this does seem to again in part defeat the point of SecureString, because the sensitive data is floating about in memory in little byte-size chunks. Any suggestions for getting RSACryptoServiceProvider.Decrypt to return a SecureString directly?
Edit: yep, this works:
var secStr = new SecureString();
foreach (byte b in decryptedBytes)
{
    var c = (char)b;
    if ('\0' == c)
    {
        continue;
    }
    secStr.AppendChar(c);
}
return secStr;
Edit: correction: this works with plain old English strings. Encrypting and then attempting to decrypt the string "標準語 明治維新 english やった" doesn't work as expected because the resulting decrypted string, using this foreach (byte b in decryptedBytes) technique, does not match the original unencrypted string.
Edit: using the following works for both:
var secStr = new SecureString();
foreach (char c in ByteConverter.GetChars(decryptedBytes))
{
    secStr.AppendChar(c);
}
return secStr;
This still leaves a byte array and a char array of the password in memory, which sucks. Maybe I should find another RSA class that returns a SecureString. :/
What if you stuck to UTF-16?
Internally, .NET (and therefore, SecureString) uses UTF-16 (double byte) to store string contents. You could take advantage of this and translate your protected data two bytes (i.e. 1 char) at a time...
When you encrypt, peel off a Char, and use Encoding.Unicode.GetBytes() to get your two bytes, and push those two bytes into your encryption stream. In reverse, when you are reading from your encrypted stream, read two bytes at a time, and use Encoding.Unicode.GetString() to get your char.
It probably sounds awful, but it keeps all the characters of your secret string from being all in one place, AND it gives you the reliability of character "size" (you won't have to guess if the next single byte is a char, or a UTF marker for a double-wide char). There's no way for an observer to know which characters go with which, nor in which order, so guessing the secret should be near impossible.
Honestly, this is just a suggested idea... I'm about to try it myself, and see how viable it is. My goal is to produce extension methods (SecureString.Encrypt and ICrypto.ToSecureString, or something like that).
Use System.Text.Encoding.Default.GetString (see GetString on MSDN).