How can I measure hard disk speed? I can't just time System.IO.File.Copy with a timer, because once the file has been cached, the measured speed is much higher than the real disk speed.
What can I do instead?
The reason subsequent reads are much faster than expected after writing a file is that the OS caches the file in memory (the system disk cache) when it is written. The subsequent read is, in effect, served from memory rather than from the disk.
See this CodeProject article, which shows how to bypass the OS disk cache by using the FILE_FLAG_NO_BUFFERING flag:
http://www.codeproject.com/KB/files/unbuffered.aspx
This solution can be used in your context to avoid OS disk caching, and so obtain "real" disk speeds.
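A minimal C# sketch of a related but simpler technique (not the CodeProject article's code): time a sequential write opened with FileOptions.WriteThrough and finish with Flush(true), so the timer does not stop while data is still sitting in the OS cache. DiskSpeedSketch and MeasureWriteMBps are hypothetical names. The drive's own hardware cache can still flatter the result, and measuring uncached reads would additionally need FILE_FLAG_NO_BUFFERING with its sector-alignment rules, which this sketch does not attempt.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: a rough sketch, not a calibrated benchmark.
public static class DiskSpeedSketch
{
    // Writes totalBytes to path and returns the observed write throughput in MB/s.
    public static double MeasureWriteMBps(string path, long totalBytes, int chunkSize = 1 << 20)
    {
        var chunk = new byte[chunkSize];
        new Random(42).NextBytes(chunk); // random data, so compressing storage cannot shrink it

        var timer = Stopwatch.StartNew();
        // WriteThrough corresponds to FILE_FLAG_WRITE_THROUGH on Windows: writes go through
        // the OS cache to the device instead of being flushed lazily later.
        // bufferSize = 1 effectively disables FileStream's own managed buffer.
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                           FileShare.None, 1, FileOptions.WriteThrough))
        {
            long written = 0;
            while (written < totalBytes)
            {
                int count = (int)Math.Min(chunkSize, totalBytes - written);
                stream.Write(chunk, 0, count);
                written += count;
            }
            stream.Flush(true); // also ask the OS to flush its buffers to the disk
        }
        timer.Stop();

        return totalBytes / (1024.0 * 1024.0) / timer.Elapsed.TotalSeconds;
    }
}
```

For example, DiskSpeedSketch.MeasureWriteMBps(@"D:\speedtest.tmp", 256L * 1024 * 1024) would write 256 MB to drive D: and report the rate; delete the file afterwards.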
You are asking about "real time". Since the cache is also in use in real-world operation, I don't see a problem. You might prefer to use a script or similar to run your actual program, instead of File.Copy or other simpler tests.
But the real question is: what then? That is, are you trying to find out whether some disk is fast enough, or whether your program is fast enough?
Related: Windows File Caching, as described here:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364218(v=vs.85).aspx
"By default, Windows caches file data that is read from disks and written to disks. This implies that read operations read file data from an area in system memory known as the system file cache, rather than from the physical disk."
I have a C# app that reads files from disk in a very frequent, rapid manner. Should I worry about writing my own "file caching" mechanism, or will Windows handle this for me? When observing my app with Process Explorer, I still notice a lot of disk I/O during operation, even though I'm reading the same static file over and over again. Could it be that the Windows Cache Manager is simply telling the operating system that disk IO is taking place, when in fact the file is being read from the cache in memory?
Caching is enabled by default for filesystem operations in all OSes I'm aware of, Windows included. I'd be astonished if the implementation of File.ReadAllText disabled it, because it gives a pretty huge performance benefit.
The filesystem cache is fast, but a custom cache can be purpose-built and therefore much faster.
For instance, ReadAllText needs to decode the file into a string; the filesystem cache won't help you there. You can also keep that same string instance around, so that all parts of your app that access it reference the same copy. This gives your CPU's cache a better chance of avoiding trips to main memory. Fewer allocations also mean reduced GC pressure.
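A minimal sketch of such a purpose-built cache, assuming the files change rarely: it keeps one decoded string per file and re-reads only when the file's last-write time or length changes. DecodedFileCache is a hypothetical name and the staleness check is an assumption; it still stats the file on every call, which is cheap but not free.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical app-level cache on top of the OS file cache: it stores the already-decoded
// string, so repeat callers skip both the read and the decoding and share one instance.
public sealed class DecodedFileCache
{
    private sealed class Entry
    {
        public readonly DateTime LastWriteUtc;
        public readonly long Length;
        public readonly string Text;

        public Entry(DateTime lastWriteUtc, long length, string text)
        {
            LastWriteUtc = lastWriteUtc;
            Length = length;
            Text = text;
        }
    }

    private readonly ConcurrentDictionary<string, Entry> _entries =
        new ConcurrentDictionary<string, Entry>(StringComparer.Ordinal);

    public string ReadAllText(string path)
    {
        string key = Path.GetFullPath(path);
        var info = new FileInfo(key); // metadata only; the contents are not read here
        DateTime stamp = info.LastWriteTimeUtc;
        long length = info.Length;

        if (_entries.TryGetValue(key, out Entry cached) &&
            cached.LastWriteUtc == stamp && cached.Length == length)
        {
            return cached.Text; // the same string instance for every caller
        }

        // Assumption: a change racing with this read is tolerable; the stored stamp is then
        // older than the file's, so the next call re-reads it.
        string text = File.ReadAllText(key);
        _entries[key] = new Entry(stamp, length, text);
        return text;
    }
}
```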
Do you need your own second layer of caching? Maybe, maybe not -- you should write the simplest code you can first, and then work to optimize it if it's a bottleneck after measuring.
Yesterday I asked this question: how to disable the disk cache in C# by invoking the Win32 CreateFile API with FILE_FLAG_NO_BUFFERING.
My performance test (write and read, 1000 files, 220 MB in total) shows that FILE_FLAG_NO_BUFFERING doesn't improve performance; it is slower than .NET's default disk caching. When I changed FILE_FLAG_NO_BUFFERING to FILE_FLAG_SEQUENTIAL_SCAN, performance matched the .NET default caching and was slightly faster.
Before that, I tried using MongoDB's GridFS to replace the Windows file system, but the results were not good (and I don't need the distributed features; I was just trying it out).
In my product, the server receives many small files (60-100 KB) per second over TCP/IP and needs to save them to disk; a third service then reads each file once and processes it. Would asynchronous I/O help me get the best speed with the lowest CPU usage? Can anyone give me a suggestion, or can I just keep using the FileStream class?
Update 1
Could a memory-mapped file meet my needs, i.e. writing all the files into one (or a few) big files and reading them back from there?
If your PC is taking 5-10 seconds to write a 100kB file to disk, then you either have the world's oldest, slowest PC, or your code is doing something very inefficient.
Turning off disk caching will probably make things worse rather than better. With a disk cache in place, your writes will be fast, and Windows will do the slow part of flushing the data to disk later. Indeed, increasing I/O buffering usually results in significantly improved I/O in general.
You definitely want to use asynchronous writes - that means your server starts the data writing, and then goes back to responding to its clients while the OS deals with writing the data to disk in the background.
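A minimal sketch of an asynchronous save for one received file, assuming the whole 60-100 KB payload is already in a byte array. FileOptions.Asynchronous requests overlapped I/O on Windows; AsyncFileSaver is a hypothetical name.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncFileSaver
{
    // Saves one payload; while the write is in flight, the calling thread is free to
    // go back to serving clients.
    public static async Task SaveAsync(string path, byte[] data)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                           FileShare.None, 4096, FileOptions.Asynchronous))
        {
            // One WriteAsync for the whole payload rather than many small writes.
            await stream.WriteAsync(data, 0, data.Length).ConfigureAwait(false);
        }
    }
}
```

In a server, the receive handler would await SaveAsync (or keep the returned Task) and go back to reading the next message.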
There shouldn't be any need to queue the writes (the OS will already be doing that if disk caching is enabled), but that is something you could try if all else fails: it could potentially help by writing only one file at a time, minimising the need for disk seeks.
Generally for I/O, using larger buffers helps to increase your throughput. For example instead of writing each individual byte to the file in a loop, write a buffer-ful of data (ideally the entire file, for the sizes you mentioned) in one Write operation. This will minimise the overhead (instead of calling a write function for every byte, you call a function once for the entire file). I suspect you may be doing something like this, as it's the only way I know to reduce performance to the levels you've suggested you are getting.
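To illustrate the per-byte pattern described above, a sketch contrasting it with a single whole-buffer write; both produce the same file. WritePatterns and its method names are hypothetical.

```csharp
using System.IO;

public static class WritePatterns
{
    // Slow pattern: one call per byte. With bufferSize = 1, FileStream's own buffer is
    // effectively disabled, so each byte can become a separate OS write call.
    public static void WriteBytewise(string path, byte[] data)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, 1))
        {
            foreach (byte b in data)
                stream.WriteByte(b);
        }
    }

    // Fast pattern: hand the whole file to the OS in one call.
    public static void WriteWholeBuffer(string path, byte[] data)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            stream.Write(data, 0, data.Length);
        }
    }
}
```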
Memory-mapped files will not help you. They're really best for accessing the contents of huge files.
In my opinion, one of the biggest improvements in your case would be to process the files without saving them to disk first. Then, if you really need to store them, push them onto a queue and process it on another thread that saves them to disk. This way you get the processed data you need immediately, without waiting for the disk write, and you still end up with the files on disk, without taking computing power away from your file processor.
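A minimal sketch of that queue, using BlockingCollection with a single long-running consumer thread. BackgroundFileWriter is a hypothetical name, and the bounded capacity of 1000 is an arbitrary assumption that makes producers wait, rather than exhaust memory, if the disk falls behind.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical sketch: the processing path enqueues and moves on; one background
// consumer persists the payloads in arrival order.
public sealed class BackgroundFileWriter : IDisposable
{
    private readonly BlockingCollection<(string Path, byte[] Data)> _queue =
        new BlockingCollection<(string Path, byte[] Data)>(boundedCapacity: 1000);
    private readonly Task _writer;

    public BackgroundFileWriter()
    {
        _writer = Task.Factory.StartNew(() =>
        {
            // Blocks while the queue is empty; ends after CompleteAdding and a drained queue.
            foreach (var item in _queue.GetConsumingEnumerable())
                File.WriteAllBytes(item.Path, item.Data);
        }, TaskCreationOptions.LongRunning);
    }

    // Called by the processing code after it has used the in-memory data.
    public void Enqueue(string path, byte[] data) => _queue.Add((path, data));

    // Stops accepting work and waits until every queued file has been written.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _writer.Wait();
        _queue.Dispose();
    }
}
```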
I have an application (an EXE file). While it runs, it generates some files (JPEG files) on the hard disk. As we know, reading from and writing to the hard disk is slow.
Is there any way to force this application to save its output JPEG files to memory instead?
Ideally, the solution would run on Windows and use C#.
Thanks.
The simplest option is probably not a programmatic one - it's just to use a RAM disk such as RAMDisk (there are others available, of course).
That way other processes get to use the results easily, without any messing around.
Since you don't have the source for the EXE and you can't/won't use a RAM disk, the next option is to improve the IO performance of your machine:
Use an SSD or a RAID 0 array, or add loads of memory that can be used as a cache.
But without access to the source code for the application, this isn't really a programming question, because the only way you can 'program' a solution is to write your own RAM disk application - and you can't use a RAM disk, so you've said.
If you really need to make this solution programmatic, then you need to dig deep: depending on the application, you will have to hook a lot of the functions used by the EXE.
That is a really tough thing to do and is prone to problems with several things: permissions/rights, antivirus protection, and so on.
Starting points:
http://www.codeproject.com/KB/winsdk/MonitorWindowsFileSystem.aspx
http://msdn.microsoft.com/en-us/windows/hardware/gg462968.aspx
A very similar question has been asked here on SO, in case you are interested, but as we will see, the accepted answer to that question does not always hold (and it never holds for my application's usage pattern).
The performance-critical code consists of the FileStream constructor (to open a file) and a SHA1 hash (the .NET Framework implementation). The code is essentially a C# version of what was asked in the question I linked to above.
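A sketch of the code being described, assuming a plain FileStream plus SHA1.Create(). FileHasher and the two timing out-parameters are hypothetical additions that split the elapsed time the same way as the profiler breakdown that follows.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

public static class FileHasher
{
    // Returns the file's SHA1 as lowercase hex and reports how long opening the file
    // (the FileStream constructor) and hashing it each took.
    public static string Sha1Hex(string path, out TimeSpan openTime, out TimeSpan hashTime)
    {
        var timer = Stopwatch.StartNew();
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            openTime = timer.Elapsed;

            timer.Restart();
            byte[] hash;
            using (SHA1 sha1 = SHA1.Create())
            {
                hash = sha1.ComputeHash(stream); // reads the whole file through the stream
            }
            hashTime = timer.Elapsed;

            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```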
Case 1: The application is started either for the first time or for the Nth time, but with different target file set. The application is now told to compute the hash values of files that were never accessed before.
~50ms
80% FileStream constructor
18% hash computation
Case 2: The application is now fully terminated, started again, and asked to compute hashes on the same files:
~8ms
90% hash computation
8% FileStream constructor
Problem
My application is always in Case 1. It will never be asked to re-compute a hash on a file that it has already visited.
So my rate-determining step is the FileStream constructor! Is there anything I can do to speed up this use case?
Thank you.
P.S. Stats were gathered using JetBrains profiler.
... but with different target file set.
That's the key phrase: your app will not be able to take advantage of the file system cache the way it did in the second measurement. The directory info can't come from RAM because it hasn't been read yet, so the OS always has to fall back to the disk drive, and that is slow.
Only better hardware can speed it up. 50 msec is about the standard amount of time needed for a spindle drive; 20 msec is about as low as such drives can go. Read head seek time is the hard mechanical limit. That's easy to beat today: SSDs are widely available and reasonably affordable. The only problem with them is that once you get used to one, you never go back :)
The file system and/or disk controller will cache recently accessed files/sectors.
The rate-determining step is reading the file, not constructing a FileStream object, and it's completely normal that it will be significantly faster on the second run when data is in the cache.
An off-track suggestion, but this is something I have done a lot, and it made our analyses 30% to 70% faster:
Caching
Write another piece of code that will:
iterate over all the files;
compute the hash; and,
store it in another index file.
Now, don't call a FileStream constructor to compute the hash when your application starts. Instead, open the (expectedly much) smaller index file and read the precomputed hash off it.
Further, if these files are log files (or similar) that are freshly created every time before your application starts, add code to the file creator that also updates the index file with the hash of each newly created file.
This way your application can always read the hash from the index file only.
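A minimal sketch of the index idea, assuming a plain-text index with one line per file (full path, a tab, then the SHA1 hex) and paths that contain no tab characters. The HashIndex class and the file format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical index format: one "fullPath<TAB>SHA1 hex" line per file.
public static class HashIndex
{
    private static string Sha1Hex(string file)
    {
        using (var stream = File.OpenRead(file))
        using (SHA1 sha1 = SHA1.Create())
        {
            return BitConverter.ToString(sha1.ComputeHash(stream)).Replace("-", "");
        }
    }

    // Offline pass: hash every file once and write the index.
    public static void Build(IEnumerable<string> files, string indexPath)
    {
        using (var writer = new StreamWriter(indexPath, append: false))
        {
            foreach (string file in files)
                writer.WriteLine(Path.GetFullPath(file) + "\t" + Sha1Hex(file));
        }
    }

    // Called by the file creator right after it writes a new file.
    public static void Append(string file, string indexPath)
    {
        File.AppendAllText(indexPath, Path.GetFullPath(file) + "\t" + Sha1Hex(file) + Environment.NewLine);
    }

    // Application start-up: read precomputed hashes instead of opening every file.
    public static Dictionary<string, string> Load(string indexPath)
    {
        var hashes = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string line in File.ReadLines(indexPath))
        {
            int tab = line.LastIndexOf('\t');
            if (tab > 0)
                hashes[line.Substring(0, tab)] = line.Substring(tab + 1);
        }
        return hashes;
    }
}
```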
I concur with @HansPassant's suggestion of using SSDs to make your disk reads faster. This answer and his are complementary; you can implement both to maximize performance.
As stated earlier, the file system has its own caching mechanism, which perturbs your measurement.
However, the FileStream constructor also performs several tasks that are expensive the first time and require accessing the file system (and therefore data that might not be in the cache). To see this, you can look at the source code: the CompatibilitySwitches class is used to detect sub-feature usage. Together with this class, Reflection is used heavily, both directly (to access the current assembly) and indirectly (for CAS-protected sections and security link demands). The Reflection engine has its own cache and needs to access the file system when that cache is empty.
It feels a little odd that the two measurements are so different. We currently see something similar on our machines that run antivirus software configured with real-time protection. In this case, the antivirus software sits in the middle, and whether the cache is hit or missed the first time depends on how that software is implemented.
The antivirus software might decide to aggressively check certain image files, like PNGs, due to known decoding vulnerabilities. Such checks add further slowdown, and that time is attributed to the outermost .NET class, i.e. the FileStream class.
Profiling with native symbols and/or kernel debugging should give you more insight.
Based on my experience, what you describe cannot be mitigated, as there are multiple hidden layers outside our control. Depending on your usage, which is not perfectly clear to me right now, you might turn the application into a service so that all subsequent requests can be served faster. Alternatively, you could batch multiple requests into a single call to amortize the cost.
You could try the native FILE_FLAG_SEQUENTIAL_SCAN flag. You can P/Invoke CreateFile to get a handle and pass it to FileStream, although .NET also exposes the same flag directly as FileOptions.SequentialScan, so P/Invoke is not strictly required.
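A sketch of both routes in C#. The managed one uses FileOptions.SequentialScan, which has the same value as FILE_FLAG_SEQUENTIAL_SCAN (0x08000000); the second follows the P/Invoke suggestion and works only on Windows. SequentialScanOpen and its method names are hypothetical, and whether the hint speeds up the first, uncached open in your scenario is something to measure, not something this sketch demonstrates.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class SequentialScanOpen
{
    // Managed route: on Windows, FileOptions.SequentialScan is passed to CreateFile
    // as FILE_FLAG_SEQUENTIAL_SCAN.
    public static FileStream OpenManaged(string path) =>
        new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.SequentialScan);

    // P/Invoke route (Windows only).
    private const uint GENERIC_READ = 0x80000000;
    private const uint FILE_SHARE_READ = 0x00000001;
    private const uint OPEN_EXISTING = 3;
    private const uint FILE_FLAG_SEQUENTIAL_SCAN = 0x08000000;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern SafeFileHandle CreateFile(
        string lpFileName, uint dwDesiredAccess, uint dwShareMode, IntPtr lpSecurityAttributes,
        uint dwCreationDisposition, uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    public static FileStream OpenViaCreateFile(string path)
    {
        SafeFileHandle handle = CreateFile(path, GENERIC_READ, FILE_SHARE_READ, IntPtr.Zero,
                                           OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, IntPtr.Zero);
        if (handle.IsInvalid)
            throw new IOException("CreateFile failed for " + path, Marshal.GetHRForLastWin32Error());
        return new FileStream(handle, FileAccess.Read); // the FileStream now owns the handle
    }
}
```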
What is the logic behind disk defragmentation and Disk Check in Windows? Can I do them in C#?
For completeness sake, here's a C# API wrapper for defragmentation:
http://blogs.msdn.com/jeffrey_wall/archive/2004/09/13/229137.aspx
Defragmentation with these APIs is (supposed to be) very safe nowadays. You shouldn't be able to corrupt the file system even if you wanted to.
Commercial defragmentation programs use the same APIs.
Look at "Defragmenting Files" on MSDN for the relevant APIs.
You should carefully think about using C# for this task, as it may introduce some undesired overhead for marshaling into native Win32.
If you don't know the logic for defragmentation, and if you didn't write the file system yourself so you can't authoritatively check it for errors, why not just start new processes running 'defrag' and 'chkdsk'?
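A minimal sketch of that approach: start the built-in tools and capture their output. It assumes Windows, an elevated (administrator) process, and the read-only forms "defrag <volume> /A" (analysis only) and "chkdsk <volume>" without /F (scan only, no repairs). DiskTools is a hypothetical name.

```csharp
using System.Diagnostics;

public static class DiskTools
{
    public static ProcessStartInfo DefragAnalysis(string volume) => Tool("defrag.exe", volume + " /A");

    public static ProcessStartInfo CheckDiskReadOnly(string volume) => Tool("chkdsk.exe", volume);

    private static ProcessStartInfo Tool(string exe, string arguments) =>
        new ProcessStartInfo(exe, arguments)
        {
            UseShellExecute = false,      // required for output redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

    // Runs the tool to completion and returns its exit code and captured standard output.
    public static (int ExitCode, string Output) Run(ProcessStartInfo startInfo)
    {
        using (Process process = Process.Start(startInfo))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return (process.ExitCode, output);
        }
    }
}
```

On Windows, DiskTools.Run(DiskTools.DefragAnalysis("C:")) would return defrag's analysis report for drive C:.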
Mark Russinovich wrote an article, "Inside Windows NT Disk Defragmentation", a while ago that gives in-depth details. If you really want to do this, I would strongly advise you to use the built-in defragmentation facilities. Moreover, on recent OSes I have never seen a need, as a user, to even care about defragmenting; it is done automatically on a schedule, and the NTFS folks at MS are definitely smarter at this than you (sorry, but they have been doing it for a long time, and you haven't).
Despite its importance, the file system is no more than a data structure that maps file names to lists of disk blocks, and keeps track of meta-information such as the actual length of each file and special files that hold lists of files (e.g., directories). A disk checker verifies that this data structure is consistent: every disk block must either be free for allocation or belong to exactly one file. It can also check for cases where a set of disk blocks appears to be a file that should be in a directory but, for some reason, is not.
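The consistency rule above can be sketched on a deliberately toy model (hypothetical, not NTFS or FAT): each file is a list of block numbers, plus a set of free blocks. ToyDiskCheck is a hypothetical name; a real checker reads these structures from the on-disk format, which is where the hard work lies.

```csharp
using System.Collections.Generic;

// Toy model: files map names to block lists; freeBlocks is the allocator's free set.
public static class ToyDiskCheck
{
    // Returns human-readable problems; an empty list means the structure is consistent.
    public static List<string> Check(int totalBlocks, IReadOnlyDictionary<string, int[]> files, ISet<int> freeBlocks)
    {
        var problems = new List<string>();
        var owner = new Dictionary<int, string>();

        foreach (KeyValuePair<string, int[]> file in files)
        {
            foreach (int block in file.Value)
            {
                if (block < 0 || block >= totalBlocks)
                    problems.Add($"{file.Key}: block {block} is out of range");
                else if (owner.TryGetValue(block, out string other))
                    problems.Add($"block {block} is cross-linked between {other} and {file.Key}");
                else
                    owner[block] = file.Key;

                if (freeBlocks.Contains(block))
                    problems.Add($"block {block} is used by {file.Key} but marked free");
            }
        }

        // Every block must be either free or owned by exactly one file.
        for (int block = 0; block < totalBlocks; block++)
            if (!owner.ContainsKey(block) && !freeBlocks.Contains(block))
                problems.Add($"block {block} is lost (neither free nor owned by a file)");

        return problems;
    }
}
```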
Defragmentation is about looking at the lists of disk blocks assigned to each file. Files will generally load faster if they use a contiguous set of blocks rather than blocks scattered all over the disk. And generally the entire file system will perform best if all the disk blocks in use are confined to a single contiguous range of the disk. Thus the trick is to move disk blocks around safely to achieve this, without destroying the file system.
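On the same kind of toy model (hypothetical, not a real file system), a sketch of the two ideas: counting a file's fragments as runs of consecutive blocks, and computing a target layout where every file is contiguous and all files are packed into one range. ToyDefrag is a hypothetical name, and this only plans where blocks should go; it does not perform the safe, step-by-step moves that are the hard part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model: a file is an ordered list of block numbers.
public static class ToyDefrag
{
    // Number of fragments = number of runs of consecutive blocks in the file's block list.
    public static int CountFragments(IReadOnlyList<int> blocks)
    {
        if (blocks.Count == 0) return 0;
        int fragments = 1;
        for (int i = 1; i < blocks.Count; i++)
            if (blocks[i] != blocks[i - 1] + 1)
                fragments++;
        return fragments;
    }

    // Target layout: each file contiguous, all files packed from block 0 in name order.
    // A real defragmenter must then move blocks there one at a time through free space,
    // keeping the file system valid after every single move.
    public static Dictionary<string, int[]> PlanCompaction(IReadOnlyDictionary<string, int[]> files)
    {
        var plan = new Dictionary<string, int[]>();
        int next = 0;
        foreach (string name in files.Keys.OrderBy(n => n, StringComparer.Ordinal))
        {
            int length = files[name].Length;
            plan[name] = Enumerable.Range(next, length).ToArray();
            next += length;
        }
        return plan;
    }
}
```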
The major difficulty here is running these applications while the disk is in use. It is possible, but one has to be very, very, very careful not to make some obvious or extremely subtle error that destroys most or all of the files. It is easier to work on a file system offline.
The other difficulty is dealing with the complexities of the file system. For example, you'd be much better off building something that supports FAT32 rather than NTFS because the former is a much, much simpler file system.
As long as you have low-level block access and some sensible way for dealing with concurrency problems (best handled by working on the file system when it is not in use) you can do this in C#, perl or any language you like.
BUT BE VERY CAREFUL. Early versions of the program will destroy entire file systems. Later versions will do so but only under obscure circumstances. And users get extremely angry and litigious if you destroy their data.