.NET Native incredibly slower than Debug build with ReadAsync calls - c#

So I just found a really weird issue in my app, and it turns out it was caused by the .NET Native compiler for some reason.
I have a method that compares the content of two files, and it works fine. With two 400KB files, it takes around 0.4 seconds to run on my Lumia 930 in Debug mode. But in Release mode, it takes up to 17 seconds for no apparent reason. Here's the code:
// Compares the content of the two streams
private static async Task<bool> ContentEquals(ulong size, [NotNull] Stream fileStream, [NotNull] Stream testStream)
{
    // Initialization
    const int bytes = 8;
    int iterations = (int)Math.Ceiling((double)size / bytes);
    byte[] one = new byte[bytes];
    byte[] two = new byte[bytes];

    // Read all the bytes and compare them 8 at a time
    for (int i = 0; i < iterations; i++)
    {
        await fileStream.ReadAsync(one, 0, bytes);
        await testStream.ReadAsync(two, 0, bytes);
        if (BitConverter.ToUInt64(one, 0) != BitConverter.ToUInt64(two, 0)) return false;
    }
    return true;
}
/// <summary>
/// Checks if the content of two files is the same
/// </summary>
/// <param name="file">The source file</param>
/// <param name="test">The file to test</param>
public static async Task<bool> ContentEquals([NotNull] this StorageFile file, [NotNull] StorageFile test)
{
    // If the two files have a different size, just stop here
    ulong size = await file.GetFileSizeAsync();
    if (size != await test.GetFileSizeAsync()) return false;

    // Open the two files to read them
    try
    {
        // Direct streams
        using (Stream fileStream = await file.OpenStreamForReadAsync())
        using (Stream testStream = await test.OpenStreamForReadAsync())
        {
            return await ContentEquals(size, fileStream, testStream);
        }
    }
    catch (UnauthorizedAccessException)
    {
        // Copy streams
        StorageFile fileCopy = await file.CreateCopyAsync(ApplicationData.Current.TemporaryFolder);
        StorageFile testCopy = await test.CreateCopyAsync(ApplicationData.Current.TemporaryFolder);
        using (Stream fileStream = await fileCopy.OpenStreamForReadAsync())
        using (Stream testStream = await testCopy.OpenStreamForReadAsync())
        {
            // Compare the files
            bool result = await ContentEquals(size, fileStream, testStream);

            // Delete the temp files at the end of the operation
            Task.Run(() =>
            {
                fileCopy.DeleteAsync(StorageDeleteOption.PermanentDelete).Forget();
                testCopy.DeleteAsync(StorageDeleteOption.PermanentDelete).Forget();
            }).Forget();
            return result;
        }
    }
}
Now, I have absolutely no idea why this exact same method goes from 0.4 seconds all the way up to more than 15 seconds when compiled with the .NET Native toolchain.
I fixed this issue by using a single ReadAsync call to read each entire file, then generating two MD5 hashes from the results and comparing them. This approach ran in around 0.4 seconds on my Lumia 930 even in Release mode.
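For reference, a rough sketch of that workaround (here using MD5.ComputeHash over the streams rather than a manual ReadAsync, and reusing the file/test variables from the method above; this assumes the files are small enough to hash comfortably):

using (MD5 md5 = MD5.Create())
using (Stream fileStream = await file.OpenStreamForReadAsync())
using (Stream testStream = await test.OpenStreamForReadAsync())
{
    // Hash both files and compare the digests instead of the raw bytes
    byte[] first = md5.ComputeHash(fileStream);
    byte[] second = md5.ComputeHash(testStream);
    return first.SequenceEqual(second); // needs System.Linq
}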
Still, I'm curious about this issue and I'd like to know why it was happening.
Thank you in advance for your help!
EDIT: I've tweaked my method to reduce the number of actual I/O operations; this is the result, and it looks like it's working fine so far:
private static async Task<bool> ContentEquals(ulong size, [NotNull] Stream fileStream, [NotNull] Stream testStream)
{
    // Initialization
    const int bytes = 102400;
    int iterations = (int)Math.Ceiling((double)size / bytes);
    byte[] first = new byte[bytes], second = new byte[bytes];

    // Read the files in large chunks and compare them
    for (int i = 0; i < iterations; i++)
    {
        // Read the next data chunk
        int[] counts = await Task.WhenAll(fileStream.ReadAsync(first, 0, bytes), testStream.ReadAsync(second, 0, bytes));
        if (counts[0] != counts[1]) return false;
        int target = counts[0];

        // Compare the chunk 8 bytes at a time
        int j;
        for (j = 0; j <= target - 8; j += 8)
        {
            if (BitConverter.ToUInt64(first, j) != BitConverter.ToUInt64(second, j)) return false;
        }

        // Compare any remaining bytes one at a time
        while (j < target)
        {
            if (first[j] != second[j]) return false;
            j++;
        }
    }
    return true;
}

Reading eight bytes at a time from an I/O device is a performance disaster. That's why we use buffered reading (and writing) in the first place: it takes time for an I/O request to be submitted, processed, executed and finally returned.
OpenStreamForReadAsync appears not to use a buffered stream, so your 8-byte requests really are requesting 8 bytes at a time from the device. Even with a solid-state drive, this is very slow.
You don't need to read the whole file at once, though. The usual approach is to pick a reasonable buffer size to pre-read; something like reading 1 kiB at a time should fix your whole issue without requiring you to load the whole file into memory at once. You can put a BufferedStream between the file and your reading code to handle this for you. And if you're feeling adventurous, you could issue the next read request before the CPU processing is done - though it's very likely that this won't help your performance much, given how much of the work is just I/O.
It also seems that .NET Native has much higher overhead than managed .NET for asynchronous I/O in the first place, which makes those tiny asynchronous calls all the more of a problem. Fewer requests for larger chunks of data will help.
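For illustration, a minimal sketch of that buffering approach applied to the original method (the 16 KiB buffer size here is an arbitrary choice, not a tuned value):

// Sketch only: BufferedStream serves the tiny 8-byte reads from memory,
// so the device only sees one larger request per 16 KiB
using (Stream fileStream = new BufferedStream(await file.OpenStreamForReadAsync(), 16384))
using (Stream testStream = new BufferedStream(await test.OpenStreamForReadAsync(), 16384))
{
    return await ContentEquals(size, fileStream, testStream);
}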

Related

Blazor WebAssembly File Chunking directly to Azure Blob Storage using C# (no JavaScript)

I am using Blazor WebAssembly and .NET 5.0. I need to be able to upload very large files (2-5GB) to Azure Blob Storage using chunking, by uploading file data in stages and then firing a final commit message on the blob once all blocks have been staged.
I was able to achieve this using SharedAccessSignatures and the Azure JavaScript libraries (there are many examples available online).
However, I would like to handle this using pure C#. Where I am running into an issue is that the IBrowserFile reference seems to try to load the entire file into memory rather than read in just the chunks it needs for each stage of the loop.
For simplicity's sake, my example code below does not include any Azure Blob Storage code. I am simply writing the chunking and commit messages to the console:
@page "/"

<InputFile OnChange="OnInputFileChange" />

@code {
    async Task OnInputFileChange(InputFileChangeEventArgs e)
    {
        try
        {
            var file = e.File;
            int blockSize = 1 * 1024 * 1024; // 1 MB block
            int offset = 0;
            int counter = 0;
            List<string> blockIds = new List<string>();
            using (var fs = file.OpenReadStream(5000000000)) // <-- Need to go up to 5GB
            {
                var bytesRemaining = fs.Length;
                do
                {
                    var dataToRead = Math.Min(bytesRemaining, blockSize);
                    byte[] data = new byte[dataToRead];
                    var dataRead = fs.Read(data, offset, (int)dataToRead);
                    bytesRemaining -= dataRead;
                    if (dataRead > 0)
                    {
                        var blockId = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(counter.ToString("d6")));
                        Console.WriteLine($"blockId:{blockId}");
                        Console.WriteLine(string.Format("Block {0} uploaded successfully.", counter.ToString("d6")));
                        blockIds.Add(blockId);
                        counter++;
                    }
                } while (bytesRemaining > 0);
                Console.WriteLine("All blocks uploaded. Now committing block list.");
                Console.WriteLine("Blob uploaded successfully!");
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
The first issue is that:
Synchronous reads are not supported.
So I tried:
var fs = new System.IO.MemoryStream();
await file.OpenReadStream(5000000000).CopyToAsync(fs);
using (fs)
{
    ...
}
But obviously I am now going to run into memory issues! And I do. The error on even a 200KB file is:
Out of memory
And anything over 1MB:
Garbage collector could not allocate 16384u bytes of memory for major heap section.
Is there a way to read in smaller chunks of data at a time from the IBrowserFile so this can be achieved natively in client side Blazor without having to resort to JavaScript?
.NET has a nice Stream.CopyToAsync() implementation; the reference source for it can be found here:
https://github.com/microsoft/referencesource/blob/master/mscorlib/system/io/stream.cs
It copies the data from one stream to another asynchronously.
The gist of it is this:
private async Task CopyToAsyncInternal(Stream source, Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
{
    byte[] buffer = new byte[bufferSize];
    int bytesRead;
    while ((bytesRead = await source.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) != 0)
    {
        await destination.WriteAsync(buffer, 0, bytesRead, cancellationToken).ConfigureAwait(false);
    }
}
(copied from link above)
Set the bufferSize to something like 4096 or a multiple of it and it should work. Other values are also possible, but blocks are usually taken as a multiple of 4 KiB.
The assumption here is that you have a writable stream to which you can write the bytes asynchronously. You can modify this loop to count blocks and do other per-block work. In any case, don't use a memory stream client side or server side with large files.
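Applied to the question's scenario, a minimal sketch of the loop might look like this (the actual Azure staging call is left as a comment, since the question deliberately omits the Blob Storage code; note that ReadAsync may return fewer bytes than requested, which this loop tolerates):

int blockSize = 1 * 1024 * 1024; // 1 MB blocks
byte[] buffer = new byte[blockSize];
var blockIds = new List<string>();
int counter = 0;
using (var fs = file.OpenReadStream(5000000000)) // up to 5 GB
{
    int bytesRead;
    while ((bytesRead = await fs.ReadAsync(buffer, 0, blockSize)) > 0)
    {
        var blockId = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(counter.ToString("d6")));
        // Stage buffer[0..bytesRead] under blockId here, e.g. via BlockBlobClient.StageBlockAsync
        blockIds.Add(blockId);
        counter++;
    }
    // Commit blockIds to the blob here
}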

C# System.IO.FileStream performance hangs badly (10+ seconds)

I'm running into an issue wherein I am getting significant (10+ second) delays when performing file write operations. It seems to only happen once, and always during the 2nd (or sometimes 3rd?) call to the WriteToFile() function.
I've written out 3 different 'WriteToFile' functions to show some of the variations I've tried thus far, and have shown additional lines in 'OpenFileIfNecessary' that I've also tried.
The code never throws an error, and the offsets/counts are all valid. Once the delays occur a single time, there seem to be no further delays.
This has been a pain in my side for 2+ days and I'm definitely at that point where I'm in need of a 2nd opinion.
private void WriteToFile(byte[] data, long offset, int count)
{
    lock (this.monitor)
    {
        this.OpenFileIfNecessary();
        this.fileStream.Seek(offset, SeekOrigin.Begin); // <- Takes 10+ seconds for THIS line to execute
        this.fileStream.Write(data, 0, count);
    }
}

private void WriteToFile2(byte[] data, long offset, int count)
{
    lock (this.monitor)
    {
        this.OpenFileIfNecessary();
        this.fileStream.Position = offset; // <- Takes 10+ seconds for THIS line to execute
        this.fileStream.Write(data, 0, count);
    }
}

private void WriteToFile3(byte[] data, long offset, int count)
{
    lock (this.monitor)
    {
        var fileName = this.file.FullName;
        using (Stream fileStream = new FileStream(fileName, FileMode.OpenOrCreate))
        {
            fileStream.Position = offset; // (instant execution of this line)
            fileStream.Write(data, 0, count);
            // Getting from HERE ->
        }
        // To HERE <- takes 10+ seconds
    }
}

private System.IO.FileStream fileStream = null;
private System.IO.FileInfo file; // value set during construction

private void OpenFileIfNecessary()
{
    lock (this.monitor)
    {
        if (this.fileStream == null)
        {
            // The following 3 lines all result in the same behavior described in this post
            //this.fileStream = this.file.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);
            //this.fileStream = this.file.Open(FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write);
            //this.fileStream = this.file.OpenWrite();
            this.fileStream = this.file.Open(FileMode.OpenOrCreate);
        }
    }
}
Found the issue. It's worth mentioning that we had previously been testing with smaller (<1 GB) files until late last week. With that in mind:
We write to the file at different positions; that is, we don't simply start at position 0 and go to the end. What that means (especially for larger files) is that whenever we first seek to a position deep into the file, there is apparently a wait while the newly extended size is allocated.
The way FileStream obfuscates a lot of the under-the-hood stuff made it a little difficult to find the pattern, and once we did some deeper profiling and discovered smaller delays with smaller files (we had never noticed those delays before), it became clear what was happening.
The plan forward is to do some multithreading to allow the space for the file to be allocated fully before writing to disk; we can buffer in memory during that wait period.
Example code for preallocating the entire file:
fileStream.Seek(size - 1, SeekOrigin.Begin);
fileStream.WriteByte(0);
fileStream.Flush();
That is happening because when you set the file position to some large value, the underlying storage system has to zero out the contents of the newly allocated blocks. I do not believe the BCL will let you bypass that, but there is actually a way in Win32 to skip the zeroing; loosely speaking, it requires the running program to have administrator privileges.
Search for the SetFileValidData() documentation.
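For completeness, a rough P/Invoke sketch of that route (untested; beyond administrator rights, the process token must also hold and enable the SE_MANAGE_VOLUME_NAME privilege, which is not shown here):

using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class FastPreallocate
{
    // Win32: marks the first validDataLength bytes as valid without zeroing them
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetFileValidData(SafeFileHandle hFile, long validDataLength);

    public static void Preallocate(FileStream fs, long size)
    {
        fs.SetLength(size); // extend the file first
        if (!SetFileValidData(fs.SafeFileHandle, size))
            throw new IOException("SetFileValidData failed", Marshal.GetLastWin32Error());
    }
}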

SerialPort.ReadLine() slow compared to manual method?

I've recently implemented a small program which reads data coming from a sensor and plots it as a diagram.
The data comes in as chunks of 5 bytes, roughly every 500 µs (baudrate: 500000). Around 3000 chunks make up a complete line. So the total transmission time is around 1.5 s.
As I was looking at the live diagram I noticed a severe lag between what is shown and what is currently measured. Investigating, it all boiled down to:
SerialPort.ReadLine();
It takes around 0.5 s longer than the line takes to be transmitted, so each line read takes around 2 s. Interestingly, no data is lost; it just lags behind even more with each new line read. This is very irritating for the user, so I couldn't leave it like that.
I've implemented my own variant and it shows a consistent time of around 1.5 s, and no lag occurs. I'm not really proud of my implementation (more or less polling the BaseStream) and I'm wondering if there is a way to speed up the ReadLine function of the SerialPort class. With my implementation I'm also getting some corrupted lines, and haven't found the exact issue yet.
I've tried changing the ReadTimeout to 1600, but that just produced a TimeoutException, even though the data arrived.
Any explanation as of why it is slow or a way to fix it is appreciated.
As a side-note: I've tried this on a Console application with only SerialPort.ReadLine() as well and the result is the same, so I'm ruling out my own application affecting the SerialPort.
I'm not sure this is relevant, but my implementation looks like this:
LineSplitter lineSplitter = new LineSplitter();

async Task<string> SerialReadLineAsync(SerialPort serialPort)
{
    byte[] buffer = new byte[5];
    string ret = string.Empty;
    while (true)
    {
        try
        {
            int bytesRead = await serialPort.BaseStream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
            byte[] line = lineSplitter.OnIncomingBinaryBlock(this, buffer, bytesRead);
            if (null != line)
            {
                return Encoding.ASCII.GetString(line).TrimEnd('\r', '\n');
            }
        }
        catch
        {
            return string.Empty;
        }
    }
}
With LineSplitter being the following:
class LineSplitter
{
    // based on: http://www.sparxeng.com/blog/software/reading-lines-serial-port
    public byte Delimiter = (byte)'\n';
    byte[] leftover;

    public byte[] OnIncomingBinaryBlock(object sender, byte[] buffer, int bytesInBuffer)
    {
        leftover = ConcatArray(leftover, buffer, 0, bytesInBuffer);
        int newLineIndex = Array.IndexOf(leftover, Delimiter);
        if (newLineIndex >= 0)
        {
            byte[] result = new byte[newLineIndex + 1];
            Array.Copy(leftover, result, result.Length);
            byte[] newLeftover = new byte[leftover.Length - result.Length];
            Array.Copy(leftover, newLineIndex + 1, newLeftover, 0, newLeftover.Length);
            leftover = newLeftover;
            return result;
        }
        return null;
    }

    static byte[] ConcatArray(byte[] head, byte[] tail, int tailOffset, int tailCount)
    {
        byte[] result;
        if (head == null)
        {
            result = new byte[tailCount];
            Array.Copy(tail, tailOffset, result, 0, tailCount);
        }
        else
        {
            result = new byte[head.Length + tailCount];
            head.CopyTo(result, 0);
            Array.Copy(tail, tailOffset, result, head.Length, tailCount);
        }
        return result;
    }
}
I ran into this issue in 2008, talking to GPS modules. Essentially, the blocking functions are flaky and the solution is to use APM (the Begin/End asynchronous programming model).
Here are the gory details in another Stack Overflow answer: How to do robust SerialPort programming with .NET / C#?
You may also find this of interest: How to kill off a pending APM operation
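In that spirit, a minimal APM-style sketch against SerialPort.BaseStream (it mirrors the pattern from the sparxeng article the question's code already references; error handling is pared down to the essentials):

byte[] readBuffer = new byte[4096];

void KickoffRead(SerialPort port)
{
    port.BaseStream.BeginRead(readBuffer, 0, readBuffer.Length, ar =>
    {
        try
        {
            int count = port.BaseStream.EndRead(ar);
            // Hand readBuffer[0..count] to a line splitter here
        }
        catch (IOException)
        {
            return; // port was closed; stop issuing reads
        }
        KickoffRead(port); // immediately queue the next read
    }, null);
}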

Reading String from Stream

I am encrypting data to a stream. If, for example, my data is of type Int32, I will use BitConverter.GetBytes(myInt) to get the bytes and then write those bytes to the stream.
To read the data back, I read sizeof(Int32) to determine the number of bytes to read, read those bytes, and then use BitConverter.ToInt32(byteArray, 0) to convert the bytes back to an Int32.
So how would I do this with a string? Writing the string is no problem. But the trick when reading the string is knowing how many bytes to read before I can then convert it back to a string.
I have found similar questions, but they seem to assume the string occupies the entire stream and just read to the end of the stream. But here, I can have any number of other items before and after the string.
Note that StringReader is not an option here since I want the option of handling file data that may be larger than I want to load into memory.
You would normally send a content length header, and then read the length determined by that information.
Here is some sample code:
public async Task ContinuouslyReadFromStream(NetworkStream sourceStream, CancellationToken ct)
{
    while (!ct.IsCancellationRequested && sourceStream.CanRead)
    {
        while (sourceStream.CanRead && !sourceStream.DataAvailable)
        {
            // Avoid potential high CPU usage when doing stream.ReadAsync
            // while waiting for data
            await Task.Delay(10, ct);
        }
        var lengthOfMessage = BitConverter.ToInt32(await ReadExactBytesAsync(sourceStream, 4, ct), 0);
        var content = await ReadExactBytesAsync(sourceStream, lengthOfMessage, ct);
        // Assuming you use UTF8 encoding
        var stringContent = Encoding.UTF8.GetString(content);
    }
}
protected static async Task<byte[]> ReadExactBytesAsync(Stream stream, int count, CancellationToken ct)
{
    var buffer = new byte[count];
    var totalBytesRemaining = count;
    var totalBytesRead = 0;
    while (totalBytesRemaining != 0)
    {
        var bytesRead = await stream.ReadAsync(buffer, totalBytesRead, totalBytesRemaining, ct);
        if (bytesRead == 0) throw new EndOfStreamException(); // stream ended before count bytes arrived
        ct.ThrowIfCancellationRequested();
        totalBytesRead += bytesRead;
        totalBytesRemaining -= bytesRead;
    }
    return buffer;
}
The solutions that come to mind are to either use a predetermined sentinel value to signal the end of the string (assembly-language strings often use a 0 byte for this, for example), or to put a fixed-length block of metadata ahead of each new datatype. In that block of metadata would be the type and the length, plus whatever other information you find useful to include.
For compactness I would use the sentinel value if it will work in your system.
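To make the length-prefix approach concrete, here is a minimal synchronous sketch (UTF-8 and a 4-byte little-endian length prefix are assumptions; use whatever convention matches the rest of your format):

static void WriteString(Stream stream, string value)
{
    byte[] payload = Encoding.UTF8.GetBytes(value);
    stream.Write(BitConverter.GetBytes(payload.Length), 0, sizeof(int)); // length prefix
    stream.Write(payload, 0, payload.Length);
}

static string ReadString(Stream stream)
{
    int length = BitConverter.ToInt32(ReadExact(stream, sizeof(int)), 0);
    return Encoding.UTF8.GetString(ReadExact(stream, length));
}

// Reads exactly count bytes, or throws if the stream ends early
static byte[] ReadExact(Stream stream, int count)
{
    byte[] buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = stream.Read(buffer, offset, count - offset);
        if (read == 0) throw new EndOfStreamException();
        offset += read;
    }
    return buffer;
}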

Infinite Do...while loop using Async Await

I have the following code:
public static async Task<string> ReadLineAsync(this Stream stream, Encoding encoding)
{
    byte[] byteArray = null;
    using (MemoryStream ms = new MemoryStream())
    {
        int bytesRead = 0;
        do
        {
            byte[] buf = new byte[1024];
            try
            {
                bytesRead = await stream.ReadAsync(buf, 0, 1024);
                await ms.WriteAsync(buf, 0, bytesRead);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message + e.StackTrace);
            }
        } while (stream.CanRead && bytesRead > 0);
        byteArray = ms.ToArray();
        return encoding.GetString(byteArray);
    }
}
I am trying to read from a Stream and write into a MemoryStream asynchronously, but the do...while loop fails to break; it's an infinite loop. How can I solve this?
First, in an exceptional situation, your loop would continue indefinitely. You shouldn't catch and ignore exceptions.
Secondly, if the stream doesn't actually end, then bytesRead will never be zero. I suspect this is the case, because the name of the method (ReadLineAsync) doesn't imply to me that it will read until the end of the stream.
P.S. CanRead never changes for a given stream. It indicates whether it makes semantic sense for the stream to do a read operation at all, not whether it can read right now.
You have your loop condition set to run as long as CanRead is true and bytesRead is greater than 0. CanRead will always be true if your stream is readable, so as long as data keeps arriving, bytesRead will always be greater than zero. You need to set a maximum number of bytes to read, or some other condition, to break out of the loop.
So, you are taking a stream from IMAP and this method is for converting that stream into text?
Why not construct a StreamReader around the stream and call either its ReadToEndAsync or just ReadToEnd? I doubt the need for making this an async operation; if the stream is something like an e-mail, then it is unlikely to be so big that a user will notice the UI blocking while it reads.
If, as one of your comments suggests, this isn't a UI app at all, then it is probably even less of an issue.
If my assumptions are wrong, could I ask you to update your question with some more information about how this function is being used? The more you can tell us, the better our answers can be.
EDIT:
I just noticed that your method is called ReadLineAsync, although I can't see anywhere in the code that you are looking for a line ending. If your intention is to read a line of text, then StreamReader also provides ReadLine and ReadLineAsync.
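For completeness, a sketch of what that would look like (the leaveOpen flag is used here so disposing the reader doesn't close your underlying stream; drop it if closing is fine):

using (var reader = new StreamReader(stream, encoding, detectEncodingFromByteOrderMarks: false, bufferSize: 1024, leaveOpen: true))
{
    string line = await reader.ReadLineAsync(); // or await reader.ReadToEndAsync()
}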
I took your method and modified it just a tad by shortening the read buffer size and adding some debug statements
public static async Task<string> ReadLineAsync(this Stream stream, Encoding encoding)
{
    const int count = 2;
    byte[] byteArray = Enumerable.Empty<byte>().ToArray();
    using (MemoryStream ms = new MemoryStream())
    {
        int bytesRead = 0;
        do
        {
            byte[] buf = new byte[count];
            try
            {
                bytesRead = await stream.ReadAsync(buf, 0, count);
                await ms.WriteAsync(buf, 0, bytesRead);
                Console.WriteLine("{0:ffffff}:{1}:{2}", DateTime.Now, stream.CanRead, bytesRead);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message + e.StackTrace);
            }
        } while (stream.CanRead && bytesRead > 0);
        byteArray = ms.ToArray();
        return encoding.GetString(byteArray);
    }
}
but basically it worked as expected with the following call:
private static void Main(string[] args)
{
    FileStream stream = File.OpenRead(@"C:\in.txt");
    Encoding encoding = Encoding.GetEncoding(1252);
    Task<string> result = stream.ReadLineAsync(encoding);
    result.ContinueWith(o =>
    {
        Console.Write(o.Result);
        stream.Dispose();
    });
    Console.WriteLine("Press ENTER to continue...");
    Console.ReadLine();
}
So I'm wondering: could it be something with your input file? Mine was this (encoded in Windows-1252 in Notepad++):
one
two
three
and my output was
Press ENTER to continue...
869993:True:2
875993:True:2
875993:True:2
875993:True:2
875993:True:2
875993:True:2
875993:True:2
875993:True:1
875993:True:0
one
two
three
note how the "Press ENTER to continue..." was printed first as expected because the main method was invoked asynchronously, and CanRead is always true because it means the file is readable. Its the state of how the file was opened, not the state meaning that the cursor is at the EOF.
From my POV, it looks like your code is trying to do the following:
read an entire stream as a sequence of 1024-octet chunks,
concatenate all those chunks into a MemoryStream (which uses a byte array as its backing store),
convert the MemoryStream to a string using the specified encoding
return that string to the caller.
This seems...complicated to me. Maybe I'm missing something, but to use async and await you've got to be using either VS2012 and .NET 4.5, or VS2010, .NET 4.0 and the Async CTP, right? If so, why wouldn't you simply use a StreamReader and its StreamReader.ReadToEndAsync() method?
public static async Task<string> MyReadLineAsync(this Stream stream, Encoding encoding)
{
    using (StreamReader reader = new StreamReader(stream, encoding))
    {
        return await reader.ReadToEndAsync();
    }
}
The overlapping I/O idea is nice, but the time required to write to a memory stream is, to say the least, not enough to make one whit of difference compared to the time required to perform the actual I/O (presumably your input stream is doing disk or network I/O).
