Windows API CreateFile: "Invalid parameter" exception with FILE_FLAG_NO_BUFFERING [duplicate] - c#

A little background: I've been experimenting with using the FILE_FLAG_NO_BUFFERING flag when doing IO with large files. We're trying to reduce the load on the cache manager in the hope that with background IO, we'll reduce the impact of our app on user machines. Performance is not an issue. Being behind the scenes as much as possible is a big issue. I have a close-to-working wrapper for doing unbuffered IO but I ran into a strange issue. I get this error when I call Read with an offset that is not a multiple of 4.
Handle does not support synchronous operations. The parameters to the FileStream constructor may need to be changed to indicate that the handle was opened asynchronously (that is, it was opened explicitly for overlapped I/O).
Why does this happen? And doesn't this message contradict itself? If I add the Asynchronous file option I get an IOException: "The parameter is incorrect."
I guess the real question is: what do these requirements, http://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx, have to do with multiples of 4?
Here is the code that demonstrates the issue:
FileOptions FileFlagNoBuffering = (FileOptions)0x20000000; // FILE_FLAG_NO_BUFFERING is not exposed by FileOptions
int MinSectorSize = 512;
byte[] buffer = new byte[MinSectorSize * 2];
int i = 0;
while (i < MinSectorSize)
{
    try
    {
        using (FileStream fs = new FileStream(@"<some file>", FileMode.Open, FileAccess.Read, FileShare.None, 8, FileFlagNoBuffering | FileOptions.Asynchronous))
        {
            fs.Read(buffer, i, MinSectorSize);
            Console.WriteLine(i);
        }
    }
    catch { }
    i++;
}
Console.ReadLine();

When using FILE_FLAG_NO_BUFFERING, the documented requirement is that the memory address for a read or write must be a multiple of the physical sector size. In your code, you've allowed the address of the byte array to be randomly chosen (hence unlikely to be a multiple of the physical sector size) and then you're adding an offset.
The behaviour you're observing is that the call works if the offset is a multiple of 4. It is likely that the byte array is aligned to a 4-byte boundary, so the call is working if the memory address is a multiple of 4.
Therefore, your question can be rewritten like this: why is the read working when the memory address is a multiple of 4, when the documentation says it has to be a multiple of 512?
The answer is that the documentation doesn't make any specific guarantees about what happens if you break the rules. It may happen that the call works anyway. It may happen that the call works anyway, but only in September on even-numbered years. It may happen that the call works anyway, but only if the memory address is a multiple of 4. (It is likely that this depends on the specific hardware and device drivers involved in the read operation. Just because it works on your machine doesn't mean it will work on anybody else's.)
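If you want to see that alignment for yourself, here is a small check (not part of the original answer, purely illustrative) that pins a byte array and prints its address modulo 4 and modulo 512:

using System;
using System.Runtime.InteropServices;

class AlignmentCheck
{
    static void Main()
    {
        byte[] buffer = new byte[1024];

        // Pin the array so the GC cannot move it while we look at its address.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            long address = handle.AddrOfPinnedObject().ToInt64();
            Console.WriteLine($"address % 4   = {address % 4}");   // typically 0: arrays are pointer-aligned
            Console.WriteLine($"address % 512 = {address % 512}"); // usually not 0: no sector alignment guarantee
        }
        finally
        {
            handle.Free();
        }
    }
}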
It probably isn't a good idea to use FILE_FLAG_NO_BUFFERING with FileStream in the first place, because I doubt that FileStream actually guarantees that it will pass the address you give it unmodified to the underlying ReadFile call. Instead, use P/Invoke to call the underlying API functions directly. You may also need to allocate your memory this way, because I don't know whether .NET provides any way to allocate memory with a particular alignment or not.
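For reference, a minimal sketch of what the P/Invoke route could look like. The class and method names here are made up for illustration, and VirtualAlloc is used only because it returns page-aligned memory, which also satisfies sector alignment on typical disks:

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class UnbufferedRead
{
    const uint GENERIC_READ = 0x80000000;
    const uint OPEN_EXISTING = 3;
    const uint FILE_FLAG_NO_BUFFERING = 0x20000000;
    const uint MEM_COMMIT = 0x1000;
    const uint MEM_RESERVE = 0x2000;
    const uint MEM_RELEASE = 0x8000;
    const uint PAGE_READWRITE = 0x04;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle CreateFile(string fileName, uint access, uint share,
        IntPtr security, uint creation, uint flags, IntPtr template);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool ReadFile(SafeFileHandle handle, IntPtr buffer, uint bytesToRead,
        out uint bytesRead, IntPtr overlapped);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr address, UIntPtr size, uint allocType, uint protect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualFree(IntPtr address, UIntPtr size, uint freeType);

    // Reads the first "sectorSize" bytes of a file without buffering.
    // Offset, length and buffer address are all sector-aligned, as the API requires.
    public static byte[] ReadFirstSector(string path, int sectorSize)
    {
        // VirtualAlloc hands back page-aligned memory (4096-byte boundary).
        IntPtr buffer = VirtualAlloc(IntPtr.Zero, (UIntPtr)(uint)sectorSize,
                                     MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (buffer == IntPtr.Zero) throw new Win32Exception();
        try
        {
            using (SafeFileHandle handle = CreateFile(path, GENERIC_READ, 0, IntPtr.Zero,
                       OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, IntPtr.Zero))
            {
                if (handle.IsInvalid) throw new Win32Exception();
                if (!ReadFile(handle, buffer, (uint)sectorSize, out uint bytesRead, IntPtr.Zero))
                    throw new Win32Exception();
                byte[] result = new byte[bytesRead];
                Marshal.Copy(buffer, result, 0, (int)bytesRead);
                return result;
            }
        }
        finally
        {
            VirtualFree(buffer, UIntPtr.Zero, MEM_RELEASE);
        }
    }
}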

Just call CreateFile directly with FILE_FLAG_NO_BUFFERING and then close it before opening with FileStream to achieve the same effect.


Memory limited to about 2.5 GB for a single .NET process

I am writing a .NET application running on Windows Server 2016 that does an HTTP GET on a bunch of pieces of a large file. This dramatically speeds up the download process, since you can download them in parallel. Unfortunately, once they are downloaded, it takes a fairly long time to piece them all back together.
There are between 2,000 and 4,000 files that need to be combined. The server this will run on has PLENTY of memory, close to 800GB. I thought it would make sense to use MemoryStreams to store the downloaded pieces until they can be sequentially written to disk, BUT I am only able to consume about 2.5GB of memory before I get a System.OutOfMemoryException. The server has hundreds of GB available, and I can't figure out how to use them.
MemoryStreams are built around byte arrays. Arrays cannot be larger than 2GB currently.
The current implementation of System.Array uses Int32 for all its internal counters etc, so the theoretical maximum number of elements is Int32.MaxValue.
There's also a 2GB max-size-per-object limit imposed by the Microsoft CLR.
As you try to put the content in a single MemoryStream the underlying array gets too large, hence the exception.
Try to store the pieces separately, and write them directly to the FileStream (or whatever you use) when ready, without first trying to concatenate them all into 1 object.
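A minimal sketch of that approach, assuming the pieces arrive as individual streams in the correct order (PieceWriter and Combine are hypothetical names):

using System.Collections.Generic;
using System.IO;

static class PieceWriter
{
    // "pieces" are the downloaded chunks, already in the right order.
    public static void Combine(IEnumerable<Stream> pieces, string destinationPath)
    {
        using (var output = new FileStream(destinationPath, FileMode.Create,
                                           FileAccess.Write, FileShare.None, bufferSize: 1 << 20))
        {
            foreach (Stream piece in pieces)   // e.g. one MemoryStream per chunk, each well under 2 GB
            {
                piece.Position = 0;
                piece.CopyTo(output);          // streams the chunk to disk, no giant intermediate array
                piece.Dispose();               // free the chunk's memory as soon as it has been written
            }
        }
    }
}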
According to the source code of the MemoryStream class, you will not be able to store more than 2 GB of data in one instance of this class.
The reason for this is that the maximum length of the stream is set to Int32.MaxValue, while the maximum length of a byte array is set to 0x7FFFFFC7, which is 2,147,483,591 in decimal (just under 2 GB).
Snippet MemoryStream
private const int MemStreamMaxLength = Int32.MaxValue;
Snippet array
// We impose limits on maximum array lenght in each dimension to allow efficient
// implementation of advanced range check elimination in future.
// Keep in sync with vm\gcscan.cpp and HashHelpers.MaxPrimeArrayLength.
// The constants are defined in this method: inline SIZE_T MaxArrayLength(SIZE_T componentSize) from gcscan
// We have different max sizes for arrays with elements of size 1 for backwards compatibility
internal const int MaxArrayLength = 0X7FEFFFFF;
internal const int MaxByteArrayLength = 0x7FFFFFC7;
The question More than 2GB of managed memory was discussed a long time ago on the Microsoft forum, and it references a blog article about BigArray, a workaround for the 2GB array size limit.
Update
I suggest using the following code, which should be able to allocate more than 4 GB on an x64 build but will fail below 4 GB on an x86 build:
private static void Main(string[] args)
{
    List<byte[]> data = new List<byte[]>();
    Random random = new Random();
    while (true)
    {
        try
        {
            var tmpArray = new byte[1024 * 1024];
            random.NextBytes(tmpArray);
            data.Add(tmpArray);
            Console.WriteLine($"{data.Count} MB allocated");
        }
        catch
        {
            Console.WriteLine("Further allocation failed.");
            break; // stop once the runtime refuses to hand out more memory
        }
    }
}
As has already been pointed out, the main problem here is the nature of MemoryStream being backed by a byte[], which has a fixed upper size.
The option of using an alternative Stream implementation has been noted. Another alternative is to look into "pipelines", the newer System.IO.Pipelines API. A pipeline is based around discontiguous memory, which means it isn't required to use a single contiguous buffer; the pipelines library will allocate multiple slabs as needed, which your code can process. I have written extensively on this topic; part 1 is here. Part 3 probably has the most code focus.
Just to confirm that I understand your question: you're downloading a single very large file in multiple parallel chunks and you know how big the final file is? If you don't then this does get a bit more complicated but it can still be done.
The best option is probably to use a MemoryMappedFile (MMF). What you'll do is create the destination file via the MMF API. Each thread will create a view accessor to that file and write to it in parallel. At the end, close the MMF. This essentially gives you the behavior you wanted from MemoryStreams, but Windows backs the file with disk. One of the benefits of this approach is that Windows manages flushing the data to disk in the background, so you don't have to, and it should result in excellent performance.
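A rough sketch of that idea, assuming the final file size and each piece's offset are known up front (MmfCombiner is a made-up name and the tuple layout is just for illustration):

using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

static class MmfCombiner
{
    // totalLength is the known size of the final file; each (Offset, Data) pair is one downloaded piece.
    public static void Combine(string destinationPath, long totalLength,
                               (long Offset, byte[] Data)[] pieces)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(destinationPath,
                   FileMode.Create, mapName: null, capacity: totalLength))
        {
            Parallel.ForEach(pieces, piece =>
            {
                // Each thread gets its own view accessor over its slice of the file.
                using (var view = mmf.CreateViewAccessor(piece.Offset, piece.Data.Length))
                {
                    view.WriteArray(0, piece.Data, 0, piece.Data.Length);
                }
            });
        } // Disposing the MMF lets Windows flush the remaining dirty pages to disk.
    }
}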

SqlFileStream - Which FileOption and allocation size?

I am using SqlFileStream and when constructing the object I am not sure which FileOptions and allocation size to use. I got this from another article, but it did not explain why. Can someone help explain or give me a recommendation?
thanks!
using (var destination = new SqlFileStream(serverPathName, serverTxnContext, FileAccess.Write, FileOptions.Asynchronous, 4096))
{
    await file.CopyToAsync(destination);
}
Since it appears you are trying to copy this file asynchronously, you probably want FileOptions.Asynchronous. This is the most responsive way to access your file, because you aren't bound to one thread. FileOptions.RandomAccess and FileOptions.SequentialScan both use caching to access the file, but FileOptions.SequentialScan isn't guaranteed to cache optimally. As the names imply, the difference between them is whether the file is accessed randomly or sequentially. FileOptions.WriteThrough skips the intermediate cache and writes straight through to the file, which is safer for durability but usually slower.
Allocation size is the allocation unit size for the file, essentially the block size on the drive. If you pass 0, the default is used, which for an NTFS-formatted drive is 4 KB. 4096 bytes is 4 KB, so the person here is just making the block size explicit.

How to make these IO reads parallel and performant

I have a list of files: List<string> Files in my C#-based WPF application.
Files contains ~1,000,000 unique file paths.
I ran a profiler on my application. When I try to do parallel operations, it's REALLY laggy because it's IO bound. It even lags my UI threads, despite not having dispatchers going to them (note the two lines I've marked below):
Files.AsParallel().ForAll(x =>
{
    char[] buffer = new char[0x100000];
    using (FileStream stream = new FileStream(x, FileMode.Open, FileAccess.Read)) // EXTREMELY SLOW
    using (StreamReader reader = new StreamReader(stream, true))
    {
        while (true)
        {
            int charsRead = reader.Read(buffer, 0, buffer.Length); // EXTREMELY SLOW
            if (charsRead <= 0)
            {
                break;
            }
        }
    }
});
These two lines of code take up ~70% of my entire profile test runs. I want to achieve maximum parallelization for IO while keeping performance such that it doesn't cripple my app's UI entirely. There is nothing else affecting my performance. Proof: using Files.ForEach doesn't cripple my UI, and WithDegreeOfParallelism helps too (but I am writing an application that is supposed to be used on any PC, so I cannot assume a specific degree of parallelism for this computation); also, the PC I am on has a solid-state drive. I have searched on StackOverflow and found links that talk about using asynchronous IO read methods. I'm not sure how they apply in this case, though. Perhaps someone can shed some light? Also, how can you reduce the constructor time of a new FileStream; is that even possible?
Edit: Well, here's something strange that I've noticed... the UI doesn't get crushed so badly when I swap Read for ReadAsync while still using AsParallel. Simply awaiting the task created by ReadAsync causes my UI thread to maintain some degree of usability. I think there is some sort of asynchronous scheduling done in this method to maintain optimal disk usage while not crushing existing threads. And on that note, is there ever a chance that the operating system contends for existing threads to do IO, such as my application's UI thread? I seriously don't understand why it's slowing my UI thread. Is the OS scheduling IO work on my thread or something? Did they do something to the CLR to eat threads that haven't been explicitly given affinity using Thread.BeginThreadAffinity or something? Memory is not an issue; I am looking at Task Manager and there is plenty.
I don't agree with your assertion that you can't use WithDegreeOfParallelism because it will be used on any PC. You can base it on number of CPU. By not using WithDegreeOfParallelism you are going to get crushed on some PCs.
You optimized for a solid-state disc where heads don't have to move. I don't think this unrestricted parallel design will hold up on a regular disc (i.e. on any PC).
I would try a BlockingCollection with 3 queues: FileStream, StreamReader, and ObservableCollection. Limit the FileStream queue to something like 4 - it just has to stay ahead of the StreamReader. And no parallelism.
A single head is a single head. It cannot read from 5 or 5,000 files faster than it can read from 1. On solid state there is no penalty for switching from file to file; on a regular disc there is a significant penalty. If your files are fragmented there is a significant penalty (on a regular disc).
You don't show what you do with the data, but the next step would be to put the writes in yet another queue backed by a BlockingCollection.
E.g. do the sb.Append(text); in a separate queue.
But that may be more overhead than it is worth.
Keeping that head as close as possible to 100% busy on a single contiguous file is the best you are going to do.
private async Task<string> ReadTextAsync(string filePath)
{
    using (FileStream sourceStream = new FileStream(filePath,
        FileMode.Open, FileAccess.Read, FileShare.Read,
        bufferSize: 4096, useAsync: true))
    {
        StringBuilder sb = new StringBuilder();
        byte[] buffer = new byte[0x1000];
        int numRead;
        while ((numRead = await sourceStream.ReadAsync(buffer, 0, buffer.Length)) != 0)
        {
            string text = Encoding.Unicode.GetString(buffer, 0, numRead);
            sb.Append(text);
        }
        return sb.ToString();
    }
}
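Going back to the BlockingCollection suggestion above, a rough sketch of a bounded single-reader pipeline might look like this (the names and queue sizes are illustrative, not a drop-in solution):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class BoundedReader
{
    // One sequential reader keeps the disk busy; a small pool of workers processes
    // the text; the bounded queue caps memory use.
    public static void Process(IEnumerable<string> files, Action<string, string> handleContents)
    {
        // Bounded to 4 items so the reader stays only slightly ahead of the consumers.
        using (var queue = new BlockingCollection<(string Path, string Text)>(boundedCapacity: 4))
        {
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string path in files)
                    {
                        queue.Add((path, File.ReadAllText(path))); // single sequential reader
                    }
                }
                finally
                {
                    queue.CompleteAdding(); // let consumers drain and finish even if reading fails
                }
            });

            // A couple of consumers; tune to taste, the disk is the real bottleneck.
            Task[] consumers = new Task[2];
            for (int i = 0; i < consumers.Length; i++)
            {
                consumers[i] = Task.Run(() =>
                {
                    foreach (var item in queue.GetConsumingEnumerable())
                    {
                        handleContents(item.Path, item.Text);
                    }
                });
            }

            Task.WaitAll(consumers);
            reader.Wait();
        }
    }
}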
File access is inherently not parallel. You can only benefit from parallelism if you process some files while reading others; it makes no sense to wait for the disk in parallel.
Whether you wait 100,000 times for 1 ms each or issue all the requests at once, the disk still has to spend those 100,000 ms = 100 s in total.
Unfortunately, it's a vague question without a reproducible code example. So it's impossible to offer specific advice. But my two recommendations are:
Pass a ParallelOptions instance where you've set the MaxDegreeOfParallelism property to something reasonably low. Something like the number of cores in your system, or even that number minus one.
Make sure you aren't expecting too much from the disk. You should start with the known speed of the disk and controller, and compare that with the data throughput you're getting. Adjust the degree of parallelism even lower if it looks like you're already at or near the maximum theoretical throughput.
Performance optimization is all about setting realistic goals based on known limitations of the hardware, measuring your actual performance, and then looking at how you can improve the costliest elements of your algorithm. If you haven't done the first two steps yet, you really should start there. :)
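As a sketch of the first recommendation, capping the parallelism based on the machine could look like this (Files and ProcessFile are stand-ins for the question's own list and per-file work):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

class ParallelismDemo
{
    static void Main()
    {
        // Stand-ins for the question's list and per-file work.
        List<string> Files = Directory.EnumerateFiles(@"C:\temp", "*", SearchOption.AllDirectories).ToList();
        void ProcessFile(string path) { var _ = new FileInfo(path).Length; }

        // Size the cap from the machine rather than hard-coding it.
        int maxParallel = Math.Max(1, Environment.ProcessorCount - 1);

        // PLINQ, as in the question:
        Files.AsParallel()
             .WithDegreeOfParallelism(maxParallel)
             .ForAll(ProcessFile);

        // Or Parallel.ForEach with a ParallelOptions instance, as suggested above:
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(Files, options, ProcessFile);
    }
}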
I got it working; the problem was me trying to use an ExtendedObservableCollection with AddRange instead of calling Add multiple times in every UI dispatch... for some reason, the performance of the methods people list in "ObservableCollection Doesn't support AddRange method, so I get notified for each item added, besides what about INotifyCollectionChanging?" is actually slower in my situation.
I think it's because it forces you to raise change notifications with .Reset (reload) instead of .Add (a diff); there seems to be some logic in place that causes bottlenecks.
I apologize for not posting the rest of the code; I was really thrown off by this, and I'll explain why in a moment. Also, a note for others who come across the same issue: profiling tools don't help much in this scenario, because most of your app's time will be spent reading files regardless, so you have to unit test all the dispatchers separately.

stream reliability

I came across a few posts claiming that streams are not a reliable data structure, meaning that read/write operations might not complete in all cases.
So:
a) Is there any truth to this consensus?
b) If so, what are the cases in which read/write operations might fail?
The consensus on streams that I came across claims that you should loop through read/write operations until complete:
var bytesRead = 0;
var _packet = new byte[8192];
while ((bytesRead += file_reader.Read(_packet, bytesRead, _packet.Length - bytesRead)) < _packet.Length) ;
Well, it depends on what operation you're talking about, and on what layer you consider it a failure.
For instance, if you attempt to read past the end of a stream (i.e. read 1000 bytes from a file that only contains 100 bytes, or read 1000 bytes from a position that is closer to the end of the file than 1000 bytes), you will get back fewer bytes than you asked for. The stream read methods return the number of bytes they actually managed to read, so you should check that value.
As for write operations, writing to a file might fail if the disk is full, or other similar problems, but in case of write operations you'll get back an exception.
If you're writing to sockets or other network streams, there is no guarantee that even if the Write method returns without exceptions, that the other end is able to receive it, there's a ton of problems that can go wrong along the way.
However, to alleviate your concerns, streams by themselves are not unreliable.
The medium they talk to, however, can be.
If you follow the documentation and catch and act accordingly when errors occur, I think you'll find streams are pretty much bullet-proof. A significant amount of work has been invested in making them reliable.
Can we see links to those who claim otherwise? Either you've misunderstood, or they are incorrect.
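For the partial-read point above, a cleaner way to express the "loop until complete" idea than the quoted one-liner is a small helper like this (illustrative, not a standard API):

using System.IO;

static class StreamExtensions
{
    // Keep calling Read until the requested count has arrived or the stream ends,
    // since Read may legitimately return fewer bytes than asked for.
    public static int ReadFully(Stream stream, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, offset + total, count - total);
            if (read == 0)
                break;          // end of stream reached before "count" bytes arrived
            total += read;
        }
        return total;           // less than count only at end of stream
    }
}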

FileStream.Seek vs. Buffered Reading

Motivated by this answer, I was wondering what's going on under the hood when one uses lots of FileStream.Seek(-1) calls.
For clarity I'll repost the answer:
using (var fs = File.OpenRead(filePath))
{
    fs.Seek(0, SeekOrigin.End);
    int newLines = 0;
    while (newLines < 3)
    {
        fs.Seek(-1, SeekOrigin.Current);
        newLines += fs.ReadByte() == 13 ? 1 : 0; // look for \r
        fs.Seek(-1, SeekOrigin.Current);
    }
    byte[] data = new byte[fs.Length - fs.Position];
    fs.Read(data, 0, data.Length);
}
Personally, I would have read something like 2048 bytes into a buffer and searched that buffer for the character.
Using Reflector I found out that internally the method is using SetFilePointer.
Is there any documentation about Windows caching when reading a file backwards? Does Windows buffer "backwards" and consult that buffer on consecutive Seek(-1) calls, or will it read ahead starting from the current position?
It's interesting that on the one hand most people agree that Windows does good caching, but on the other hand every answer to "reading a file backwards" involves reading chunks of bytes and operating on that chunk.
Going forward vs backward doesn't usually make much difference. The file data is read into the file system cache after the first read, you get a memory-to-memory copy on ReadByte(). That copy isn't sensitive to the file pointer value as long as the data is in the cache. The caching algorithm does however work from the assumption that you'd normally read sequentially. It tries to read ahead, as long as the file sectors are still on the same track. They usually are, unless the disk is heavily fragmented.
But yes, it is inefficient. You'll get hit with two pinvoke and API calls for each individual byte. There's a fair amount of overhead in that, those same two calls could also read, say, 65 kilobytes with the same amount of overhead. As usual, fix this only when you find it to be a perf bottleneck.
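The "read a chunk and scan it" alternative mentioned earlier might look roughly like this (a sketch that assumes ASCII-compatible text and a chunk large enough to contain the last few lines; the names are illustrative):

using System;
using System.IO;
using System.Text;

static class TailReader
{
    // Read one chunk from the end of the file and scan it in memory
    // instead of seeking backwards one byte at a time.
    public static string ReadLastLines(string filePath, int lineCount, int chunkSize = 2048)
    {
        using (var fs = File.OpenRead(filePath))
        {
            long start = Math.Max(0, fs.Length - chunkSize);
            fs.Position = start;
            byte[] chunk = new byte[fs.Length - start];
            int read = fs.Read(chunk, 0, chunk.Length); // one Read is usually enough for a small local chunk

            // Scan backwards through the in-memory chunk for '\r'.
            int newLines = 0;
            int i = read - 1;
            while (i >= 0 && newLines < lineCount)
            {
                if (chunk[i] == (byte)'\r')
                {
                    newLines++;
                    if (newLines == lineCount) break; // stop on the Nth '\r' from the end
                }
                i--;
            }

            int startIndex = i + 1; // just after that '\r', or 0 if fewer lines were found
            return Encoding.UTF8.GetString(chunk, startIndex, read - startIndex);
        }
    }
}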
Here is a pointer on File Caching in Windows
The behavior may also depend on where the file physically resides (hard disk, network, etc.) as well as on local configuration/optimization.
An also important source of information is the CreateFile API documentation: CreateFile Function
There is a good section named "Caching Behavior" that explains how you can influence file caching, at least in the unmanaged world.
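In managed code, the same cache hints are exposed through FileOptions, which FileStream passes on to CreateFile (SequentialScan, RandomAccess and WriteThrough map to the corresponding FILE_FLAG_* values). A minimal example, with an illustrative path:

using System.IO;

class CacheHintExample
{
    static void Main()
    {
        // SequentialScan hints the cache manager to read ahead aggressively;
        // RandomAccess hints the opposite, and WriteThrough bypasses write caching.
        using (var fs = new FileStream(@"C:\temp\log.txt", FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, FileOptions.SequentialScan))
        {
            var firstByte = fs.ReadByte(); // read as usual; the flag only changes caching behaviour
        }
    }
}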
