Asynchronous SHA256 Hashing - c#

I have the following method:
public static string Sha256Hash(string input) {
    if (String.IsNullOrEmpty(input)) return String.Empty;
    using (HashAlgorithm algorithm = new SHA256CryptoServiceProvider()) {
        byte[] inputBytes = Encoding.UTF8.GetBytes(input);
        byte[] hashBytes = algorithm.ComputeHash(inputBytes);
        return BitConverter.ToString(hashBytes).Replace("-", String.Empty);
    }
}
Is there a way to make it asynchronous? I was hoping to use the async and await keywords, but the HashAlgorithm class does not provide any asynchronous support for this.
Another approach was to wrap all the logic in a Task.Run call:
public static async Task<string> Sha256Hash(string input) {
    return await Task.Run(() => {
        //Hashing here...
    });
}
But this does not seem clean and I'm not sure if it's a correct (or efficient) way to perform an operation asynchronously.
What can I do to accomplish this?

As stated by the other answerers, hashing is a CPU-bound activity so it doesn't have Async methods you can call. You can, however, make your hashing method async by asynchronously reading the file block by block and then hashing the bytes you read from the file. The hashing will be done synchronously but the read will be asynchronous and consequently your entire method will be async.
Here is sample code for achieving the purpose I just described.
public static async System.Threading.Tasks.Task<string> GetHashAsync<T>(this Stream stream)
    where T : HashAlgorithm, new()
{
    StringBuilder sb;
    using (var algo = new T())
    {
        var buffer = new byte[8192];
        int bytesRead;
        // compute the hash on 8KiB blocks
        while ((bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length)) != 0)
            algo.TransformBlock(buffer, 0, bytesRead, buffer, 0);
        // bytesRead is 0 here, so this finalizes the hash with an empty last block
        algo.TransformFinalBlock(buffer, 0, bytesRead);
        // build the hash string
        sb = new StringBuilder(algo.HashSize / 4);
        foreach (var b in algo.Hash)
            sb.AppendFormat("{0:x2}", b);
    }
    return sb?.ToString();
}
The function can be invoked like this:
using (var stream = System.IO.File.OpenRead(@"C:\path\to\file.txt"))
{
    string sha256 = await stream.GetHashAsync<SHA256CryptoServiceProvider>();
}
Of course, you could equally call the method with other hash algorithms, such as SHA1CryptoServiceProvider or SHA512CryptoServiceProvider, as the generic type parameter.
Likewise, with a few modifications, you can get it to hash a string, which is what your case calls for.
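For the string case, one possible adaptation is to wrap the string's UTF-8 bytes in a MemoryStream and feed them through the same block-by-block loop. This is only a sketch: the class name StringHashSketch and the method Sha256HexAsync are made up for illustration, and it uses SHA256.Create() instead of the generic type parameter. Keep in mind that reading from a MemoryStream completes synchronously, so for an in-memory string this gains nothing over the plain synchronous method; it only shows the shape of the change.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public static class StringHashSketch
{
    // Hypothetical helper (not from the answer above): hashes the UTF-8 bytes
    // of a string by reading them through a stream in 8 KiB blocks.
    public static async Task<string> Sha256HexAsync(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(input)))
        using (var algo = SHA256.Create())
        {
            var buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length)) != 0)
                algo.TransformBlock(buffer, 0, bytesRead, buffer, 0);

            // Finalize with an empty last block.
            algo.TransformFinalBlock(buffer, 0, 0);

            // Lowercase hex, matching the "{0:x2}" formatting used above.
            return BitConverter.ToString(algo.Hash).Replace("-", string.Empty).ToLowerInvariant();
        }
    }
}
```

From an async method this would be called as: string hash = await StringHashSketch.Sha256HexAsync("some text");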

The work that you're doing is inherently synchronous CPU bound work. It's not inherently asynchronous as something like network IO is going to be. If you would like to run some synchronous CPU bound work in another thread and asynchronously wait for it to be completed, then Task.Run is indeed the proper tool to accomplish that, assuming the operation is sufficiently long running to need to perform it asynchronously.
That said, there really isn't any reason to expose an asynchronous wrapper over your synchronous method. It generally makes more sense to just expose the method synchronously, and if a particular caller needs it to run asynchronously in another thread, they can use Task.Run to explicitly indicate that need for that particular invocation.
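A minimal sketch of that split might look like the following. The class names Hashing and Caller are placeholders, and SHA256.Create() stands in for SHA256CryptoServiceProvider from the question; everything else follows the question's method. The library exposes only the synchronous method, and the one caller that needs it off its own thread says so explicitly.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public static class Hashing
{
    // The method stays synchronous: it is pure CPU-bound work.
    public static string Sha256Hash(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;
        using (HashAlgorithm algorithm = SHA256.Create())
        {
            byte[] hashBytes = algorithm.ComputeHash(Encoding.UTF8.GetBytes(input));
            return BitConverter.ToString(hashBytes).Replace("-", string.Empty);
        }
    }
}

public static class Caller
{
    // Hypothetical caller that wants the hash computed on a thread-pool
    // thread, for example to keep a UI thread responsive.
    public static Task<string> HashInBackgroundAsync(string input)
    {
        return Task.Run(() => Hashing.Sha256Hash(input));
    }
}
```

A caller that does not care about threads simply calls Hashing.Sha256Hash(input) directly and pays no scheduling overhead.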

The overhead of running this asynchronously (using Task.Run) will probably be higher than just running it synchronously.
An asynchronous interface is not available because it is a CPU bound operation. You can make it asynchronous (using Task.Run) as you pointed out, but I would recommend against it.

Related

What is the fastest possible way to read a serial port in .net?

I need a serial port program to read data coming in at 4800 baud. Right now I have a simulator sending 15 lines of data every second. The output of it seems to get "behind" and can't keep up with the speed/amount of data coming in.
I have tried using ReadLine() with a DataReceived event, which did not seem to be reliable, and now I am using an async method with serialPort.BaseStream.ReadAsync:
okToReadPort = true;
Task readTask = new Task(startAsyncRead);
readTask.Start();
//this method starts the async read process and the "nmeaList" is what
// is used by the other thread to display data
public async void startAsyncRead()
{
    while (okToReadPort)
    {
        Task<string> task = ReadLineAsync(serialPort);
        string line = await task;
        NMEAMsg tempMsg = new NMEAMsg(line);
        if (tempMsg.sentenceType != null)
        {
            nmeaList[tempMsg.sentenceType] = tempMsg;
        }
    }
}
public static async Task<string> ReadLineAsync(
    this SerialPort serialPort)
{
    // Console.WriteLine("Entering ReadLineAsync()...");
    byte[] buffer = new byte[1];
    string ret = string.Empty;
    while (true)
    {
        await serialPort.BaseStream.ReadAsync(buffer, 0, 1);
        ret += serialPort.Encoding.GetString(buffer);
        if (ret.EndsWith(serialPort.NewLine))
            return ret.Substring(0, ret.Length - serialPort.NewLine.Length);
    }
}
This still seems inefficient. Does anyone know of a better way to ensure that every piece of data is read from the port and accounted for?
Generally speaking, your issue is that you are performing IO synchronously with data processing. It doesn't help that your data processing is relatively expensive (string concatenation).
To fix the general problem, when you read a byte put it into a processing buffer (BlockingCollection works great here as it solves Producer/Consumer) and have another thread read from the buffer. That way the serial port can immediately begin reading again instead of waiting for your processing to finish.
As a side note, you would likely see a benefit by using StringBuilder in your code instead of string concatenation. You should still process via queue though.
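A rough sketch of that producer/consumer split is shown below. The class name LinePump and both method names are invented for illustration. The producer takes any Stream, so serialPort.BaseStream could be passed in, and the consumer assembles lines with a StringBuilder. It assumes a single-byte encoding such as ASCII (which NMEA sentences use), so a character is never split across two chunks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class LinePump
{
    // Producer: reads whatever bytes are available and queues them right away,
    // so the next read can start without waiting for any parsing.
    public static async Task ProduceAsync(Stream source, BlockingCollection<byte[]> queue)
    {
        var buffer = new byte[256];
        try
        {
            int n;
            while ((n = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                var chunk = new byte[n];
                Array.Copy(buffer, chunk, n);
                queue.Add(chunk);
            }
        }
        finally
        {
            // Lets the consumer's loop end once the queue is drained.
            queue.CompleteAdding();
        }
    }

    // Consumer: runs on another thread, blocks until chunks arrive,
    // and returns the complete lines once the producer has finished.
    public static List<string> Consume(BlockingCollection<byte[]> queue, Encoding encoding, string newLine)
    {
        var lines = new List<string>();
        var pending = new StringBuilder();
        foreach (byte[] chunk in queue.GetConsumingEnumerable())
        {
            pending.Append(encoding.GetString(chunk));
            string text = pending.ToString();
            int idx;
            while ((idx = text.IndexOf(newLine, StringComparison.Ordinal)) >= 0)
            {
                lines.Add(text.Substring(0, idx));
                text = text.Substring(idx + newLine.Length);
            }
            // Keep any unterminated remainder for the next chunk.
            pending.Clear();
            pending.Append(text);
        }
        return lines;
    }
}
```

In a real reader the consumer would probably hand each line to the NMEA parser as soon as it is complete instead of collecting a list; the list just keeps the sketch small.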

c# 4.5 - Should a TCP Server, mainly doing database inserts, start each client on a Task

My understanding is that async await is for IO (network, db, etc) and parallel task is for cpu.
Note: This code is a little rough, to keep it concise for this post.
I have a windows service created in c# that has the following code
while (true)
{
    var socket = await tcpListener.AcceptSocketAsync();
    if (socket == null) { break; }
    var client = new RemoteClient(socket);
    Task.Run(() => client.ProcessMessage());
}
In the RemoteClient class the ProcessMessage method does this
byte[] buffer = new byte[4096];
rawMessage = string.Empty;
while (true)
{
    Array.Clear(buffer, 0, buffer.Length);
    int bytesRead = await networkStream.ReadAsync(buffer, 0, buffer.Length);
    rawMessage += (System.Text.Encoding.ASCII.GetString(buffer).Replace("\0", string.Empty));
    if (bytesRead == 0 || buffer[buffer.Length - 1] == 0)
    {
        StoreMessage();
        return;
    }
}
So I have the I/O work happening asynchronously. But my concern, and my question, is this: by using Task.Run to kick off the work, am I still creating a block?
I'm trying to take a TCP connection and release it as quickly as possible in order to scale to a large number of connections.
I feel like I'm mixing paradigms here.
Thanks
> My understanding is that async await is for IO (network, db, etc) and parallel task is for cpu.
I would say that understanding is incorrect. async/await is for any asynchronous operation, whether I/O or CPU bound.
> …my concern and my question is in using Task.Run to kick off the work am I still creating a block?
"A block"? What kind of block do you think you would be creating otherwise?
Personally, I would not write the code that way. The accept operation will already complete in a thread pool thread (or synchronously in the same thread), i.e. one from the IOCP thread pool. It would be perfectly fine to set up some initial conditions for the connection on that thread, and then initiate the I/O from there. There's no reason to queue up the work on yet another thread.
So the way I'd write the code is like this:
async Task ProcessMessage()
{
    byte[] buffer = new byte[4096];
    rawMessage = string.Empty;
    while (true)
    {
        Array.Clear(buffer, 0, buffer.Length);
        int bytesRead = await networkStream.ReadAsync(buffer, 0, buffer.Length);
        rawMessage += (System.Text.Encoding.ASCII.GetString(buffer).Replace("\0", string.Empty));
        if (bytesRead == 0 || buffer[buffer.Length - 1] == 0)
        {
            StoreMessage();
            return;
        }
    }
}
Then in your service:
while (true)
{
    var socket = await tcpListener.AcceptSocketAsync();
    if (socket == null) { break; }
    var client = new RemoteClient(socket);
    var _ = client.ProcessMessage();
}
Notes:
The dummy _ variable is just there to keep the compiler from warning you about the ignored, non-awaited async return.
Since you are ignoring the returned Task object, you won't receive thrown exceptions. So in lieu of that, you should add appropriate exception handling to the ProcessMessage() method itself.
I agree with commenter shr regarding cleanup. You didn't provide a complete code example, so we don't know what e.g. the StoreMessage() method does. But presumably/hopefully you have logic in there somewhere that correctly and gracefully shuts down the connection and closes the socket.
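One way to handle the exception note is to route every fire-and-forget call through a small wrapper that observes failures. The sketch below is an assumption about how that could look, not something from the question's code: FireAndForget, RunSafelyAsync, and the onError callback are all invented names, and what the callback does (logging, closing the socket) is left to the caller.

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForget
{
    // Hypothetical wrapper: awaits the work and hands any exception to onError,
    // so a task that nobody awaits cannot fail silently.
    public static async Task RunSafelyAsync(Func<Task> work, Action<Exception> onError)
    {
        try
        {
            await work();
        }
        catch (Exception ex)
        {
            onError(ex);
        }
    }
}
```

In the accept loop this would replace the bare call, for example: var _ = FireAndForget.RunSafelyAsync(() => client.ProcessMessage(), ex => Console.Error.WriteLine(ex)); The same effect can be had by putting the try/catch directly inside ProcessMessage().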

Unbuffered version of Process.BeginOutputReadLine method

Some console applications, such as plink, may not print a newline character after printing important information (e.g. "Store key in cache? (y/n)"). Is there a built-in way to asynchronously read from a program's stdout that does not wait for newlines? If not, is creating a separate thread to read characters synchronously a good idea?
Just use Read() instead of ReadLine(). A simple asynchronous implementation would look something like this:
void SomeMethod()
{
    Process process = ...; // init as appropriate, including redirection of stdout
    StringBuilder sb = new StringBuilder();
    var _ = ConsumeReader(process.StandardOutput, sb);
}
async Task ConsumeReader(TextReader reader, StringBuilder sb)
{
    char[] buffer = new char[1024];
    int cch;
    while ((cch = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        sb.Append(buffer, 0, cch);
    }
}
The above simply copies the text to a StringBuilder. Presumably in your own scenario you would do something else, like parse it and respond to prompts, that sort of thing. Given the lack of a code example, I can't be more specific than that.
This example also ignores the Task returned from the async method. That may be fine in your case, or you might want to give the variable a better name than _ and eventually wait on the Task at some later point. Use it as you see fit.
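As one possible shape for the "respond to prompts" part, the sketch below reads characters as they arrive and returns as soon as a given prompt text has been seen, without waiting for a newline. PromptWatcher and WaitForPromptAsync are made-up names, and the substring check is deliberately naive (it rescans the accumulated text on every read).

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class PromptWatcher
{
    // Hypothetical helper: returns true once the output contains the prompt,
    // or false if the stream ends first.
    public static async Task<bool> WaitForPromptAsync(TextReader reader, string prompt)
    {
        var seen = new StringBuilder();
        var buffer = new char[1024];
        int cch;
        while ((cch = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            seen.Append(buffer, 0, cch);
            if (seen.ToString().IndexOf(prompt, StringComparison.Ordinal) >= 0)
                return true;
        }
        return false;
    }
}
```

With a redirected process this might be called as await PromptWatcher.WaitForPromptAsync(process.StandardOutput, "(y/n)"), after which the reply could be written to process.StandardInput.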

async / await vs BeginRead, EndRead

I don't quite 'get' async and await yet, and I'm looking for some clarification around a particular problem I'm about to solve. Basically, I need to write some code that'll handle a TCP connection. It'll essentially just receive data and process it until the connection is closed.
I'd normally write this code using the NetworkStream BeginRead and EndRead pattern, but since the async / await pattern is much cleaner, I'm tempted to use that instead. However, since I admittedly don't fully understand exactly what is involved in these, I'm a little wary of the consequences. Will one use more resources than the other; will one use a thread where another would use IOCP, etc.
Convoluted example time. These two do the same thing - count the bytes in a stream:
class StreamCount
{
    private Stream str;
    private int total = 0;
    private byte[] buffer = new byte[1000];
    public Task<int> CountBytes(Stream str)
    {
        this.str = str;
        var tcs = new TaskCompletionSource<int>();
        Action onComplete = () => tcs.SetResult(total);
        str.BeginRead(this.buffer, 0, 1000, this.BeginReadCallback, onComplete);
        return tcs.Task;
    }
    private void BeginReadCallback(IAsyncResult ar)
    {
        var bytesRead = str.EndRead(ar);
        if (bytesRead == 0)
        {
            ((Action)ar.AsyncState)();
        }
        else
        {
            total += bytesRead;
            str.BeginRead(this.buffer, 0, 1000, this.BeginReadCallback, ar.AsyncState);
        }
    }
}
... And...
public static async Task<int> CountBytes(Stream str)
{
    var buffer = new byte[1000];
    var total = 0;
    while (true)
    {
        int bytesRead = await str.ReadAsync(buffer, 0, 1000);
        if (bytesRead == 0)
        {
            break;
        }
        total += bytesRead;
    }
    return total;
}
To my eyes, the async way looks cleaner, but there is that 'while (true)' loop that my uneducated brain tells me is going to use an extra thread, more resources, and therefore won't scale as well as the other one. But I'm fairly sure that is wrong. Are these doing the same thing in the same way?
> To my eyes, the async way looks cleaner, but there is that 'while (true)' loop that my uneducated brain tells me is going to use an extra thread, more resources, and therefore won't scale as well as the other one.
Nope, it won't. The loop will only use a thread when it's actually running code... just as it would in your BeginRead callback. The await expression will return control to whatever the calling code is, having registered a continuation which jumps back to the right place in the method (in an appropriate thread, based on the synchronization context) and then continues running until it either gets to the end of the method or hits another await expression. It's exactly what you want :)
It's worth learning more about how async/await works behind the scenes - you might want to start with the MSDN page on it, as a jumping off point.
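To make the "registered a continuation" idea more concrete, here is a hand-written approximation of the async loop using ContinueWith. It is only an illustration: the compiler actually generates a state machine, not this code, and the names ContinuationSketch and CountBytesWithContinuations are invented. The point is that no thread sits in the loop while a read is pending; each step is a callback that runs only once the previous read has completed.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ContinuationSketch
{
    // Rough manual equivalent of the async CountBytes loop.
    public static Task<int> CountBytesWithContinuations(Stream str)
    {
        var tcs = new TaskCompletionSource<int>();
        var buffer = new byte[1000];
        int total = 0;

        void ReadNext()
        {
            // Start a read and return immediately; the lambda runs later,
            // when the read has finished.
            str.ReadAsync(buffer, 0, buffer.Length).ContinueWith(t =>
            {
                if (t.IsFaulted) { tcs.SetException(t.Exception.InnerExceptions); return; }
                if (t.IsCanceled) { tcs.SetCanceled(); return; }
                if (t.Result == 0) { tcs.SetResult(total); return; }

                total += t.Result;
                ReadNext(); // "loop" by scheduling the next read
            });
        }

        ReadNext();
        return tcs.Task;
    }
}
```

This is essentially the BeginRead/EndRead version again with tasks in place of IAsyncResult, which is why the two approaches in the question scale the same way.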

Multithread Write byte[] into file

I'm hoping someone can help me. I have a question about writing into a file using multiple threads/tasks. See my code sample below...
AddFile returns an array of longs holding the blob number, the offset inside the blob, and the size of the data written into the blob.
public long[] AddFile(byte[] data) {
    long[] values = new long[3];
    values[0] = WorkingIndex = getBlobIndex(data); //blobNumber
    values[1] = blobFS[WorkingIndex].Position; //Offset
    values[2] = length = data.Length; //size
    //blobFS[WorkingIndex] is a FileStream
    blobFS[WorkingIndex].Write(data, 0, data.Length);
    return values;
}
So let's say I use the AddFile function inside a foreach loop like the one below.
List<Task> tasks = new List<Task>(System.Environment.ProcessorCount);
foreach (var file in Directory.GetFiles(@"C:\Documents")) {
    var task = Task.Factory.StartNew(() => {
        byte[] data = File.ReadAllBytes(file);
        long[] info = blob.AddFile(data);
        return info;
    });
    task.ContinueWith(t => { /* do some stuff */ });
    tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
return result;
I can imagine that this will totally fail, in that files will overwrite each other inside the blob because the Write function hasn't finished writing file1 while another task is writing file2 at the same time.
So what is the best way to solve this problem? Maybe using asynchronous write functions...
Your help would be appreciated!
Kind regards,
Martijn
My suggestion here would be to not run these tasks in parallel. It's likely that disk IO will be the bottleneck for any file-based operation, and so running them in parallel will just cause each thread to be blocked accessing the disk. Ultimately, you'll quite possibly find that your code runs significantly slower as you've written it than it would run in serial.
Is there a particular reason that you want these in parallel? Can you handle the disk writes serially and just call ContinueWith() on separate threads instead? This would have the benefit of removing the problem you're posting about, too.
EDIT: an example naive reimplementation of your foreach loop:
foreach (var file in Directory.GetFiles(@"C:\Documents")) {
    byte[] data = File.ReadAllBytes(file); // this happens on the main thread
    long[] info = blob.AddFile(data); // so does the write, so writes to the blob never overlap
    // follow-up processing of each file is handled in multiple threads in parallel to disk IO
    var task = Task.Factory.StartNew(() => {
        // do some stuff with info
    });
    tasks.Add(task);
}
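If parallel callers of AddFile cannot be avoided, another option (not part of the suggestion above) is to make "read the current position, then write" a single atomic step with a lock. The sketch below is a simplified stand-in for the question's blob class: BlobWriter is an invented name, it wraps a single Stream instead of the blobFS array, and it returns only the offset and size.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public sealed class BlobWriter
{
    private readonly Stream _blob;
    private readonly object _gate = new object();

    public BlobWriter(Stream blob)
    {
        _blob = blob;
    }

    // Returns { offset, size }. The lock ensures no other thread can move the
    // stream position between reading it and finishing the write.
    public long[] AddFile(byte[] data)
    {
        lock (_gate)
        {
            long offset = _blob.Position;
            _blob.Write(data, 0, data.Length);
            return new long[] { offset, data.Length };
        }
    }
}
```

The writes still happen one at a time, so this does not make the disk any faster; it only keeps the recorded offsets and the written bytes consistent when several tasks call AddFile concurrently.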
