I want to continuously stream a large data file for a game from the disk of an iOS device.
The question is whether anyone has streamed such files (blocks of 20 MB) before using a System.IO.FileStream. I have no iOS device to test it on myself and I don't expect to get one anytime soon.
There are two questions:
Is the file streamed without being fully loaded (the behaviour I expect from a stream, but I'm unsure how MonoTouch handles it), and what is the memory usage while streaming it?
How is the performance of the loading process, especially when loading different files at once?
Thank you for any information.
MonoTouch's base class libraries (BCL) come from Mono, so a lot of the code is available as open source. In the case of FileStream you can see the code on GitHub.
Is the file streamed without being fully loaded (the behaviour I expect from a stream, but I'm unsure how MonoTouch handles it)
You're right, it won't be fully loaded. You'll control what's being read.
and what is the memory usage while streaming it?
The above link shows that the default buffer size is 8192 bytes (8 KB), but several constructors allow you to use a different size if you wish.
But that buffer is an internal buffer. You'll provide your own buffer when you call methods like Read so you will be, again, in control of how much memory is being used.
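For example, a read loop like the following (the file name and chunk size here are just placeholders) never holds more than one buffer's worth of data in managed memory, no matter how large the file on disk is:

```csharp
using System;
using System.IO;

class StreamReadDemo
{
    static void Main()
    {
        // Only buffer.Length bytes are held in managed memory at any time,
        // regardless of the size of the file on disk.
        byte[] buffer = new byte[8192]; // matches FileStream's default internal buffer

        using (var stream = new FileStream("level-data.bin", FileMode.Open, FileAccess.Read))
        {
            int bytesRead;
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Process the chunk here, e.g. feed it to a decoder.
            }
        }
    }
}
```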
How is the performance of the loading process, especially when loading different files at once?
That's difficult to predict and will largely depend on your application (e.g. number of files, total memory required...). You can use FileStream asynchronous methods, like BeginRead, to get better performance if required.
Related
I'm developing a multiple segment file downloader. To accomplish this task I'm currently creating as many temporary files on disk as segments I have (they are fixed in number during the file downloading). In the end I just create a new file f and copy all the segments' contents onto f.
I was wondering if there isn't a better way to accomplish this. My ideal would be to create f at its full size initially and then have the different threads write directly to their own portions. There need not be any interaction between them. We can assume each thread starts at its own offset in the file and then fills in data sequentially until its task is done.
I've heard about Memory-Mapped files (http://msdn.microsoft.com/en-us/library/dd997372(v=vs.110).aspx) and I'm wondering if they are the solution to my problem or not.
Thanks
Using the memory-mapped API is absolutely doable and it will probably perform quite well - of course some testing would be recommended.
If you want to look for a possible alternative implementation, I have the following suggestion.
Create a static stack data structure, where the download threads can push each file segment as soon as it's downloaded.
Have a separate thread listen for push notifications on the stack. Pop the stack file segments and save each segment into the target file in a single threaded way.
By following the above pattern, you have separated the download of file segments and the saving into a regular file, by putting a stack container in between.
Depending on the implementation of the stack handling, you will be able to implement this with very little thread locking, which will maximise performance.
The pro of this is that you have 100% control over what is going on, and a solution that might be more portable (if that should ever be a concern).
The stack decoupling pattern can also be implemented fairly generically and might even be reused in the future.
The implementation is not that complex and probably on par with what would be needed around the memory-mapped API.
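A minimal sketch of that pattern, using BlockingCollection over a ConcurrentStack as the push/pop container (all names and sizes here are illustrative, not from your code):

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class Segment
{
    public long Offset;
    public byte[] Data;
}

class StackDecouplingDemo
{
    // BlockingCollection over a ConcurrentStack gives the push/pop container
    // described above, with the locking handled for you.
    static readonly BlockingCollection<Segment> Segments =
        new BlockingCollection<Segment>(new ConcurrentStack<Segment>());

    static void Main()
    {
        // Single writer thread: the target file is only ever touched here.
        Task writer = Task.Run(() =>
        {
            using (var file = new FileStream("target.bin", FileMode.Create, FileAccess.Write))
            {
                foreach (Segment s in Segments.GetConsumingEnumerable())
                {
                    file.Position = s.Offset;
                    file.Write(s.Data, 0, s.Data.Length);
                }
            }
        });

        // Download threads just push finished segments (simulated here).
        Parallel.For(0, 4, i =>
            Segments.Add(new Segment { Offset = i * 1024, Data = new byte[1024] }));

        Segments.CompleteAdding(); // signals the writer that no more segments are coming
        writer.Wait();
    }
}
```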
Have fun...
/Anders
The answers posted so far are, of course, addressing your question, but you should also consider the fact that multi-threaded I/O writes will most likely NOT give you any gains in performance.
The reason for multi-threading downloads is obvious and has dramatic results. When you try to combine the files, though, remember that you are having multiple threads manipulate a mechanical head on conventional hard drives. In the case of SSDs you may get better performance.
If you use a single thread, you are writing SEQUENTIALLY, which is by definition the fastest way to write to conventional disks.
If you believe otherwise, I would be interested to know why. I would rather concentrate on tweaking the write performance of a single thread by playing around with buffer sizes, etc.
Yes, it is possible, but the only precaution you need to take is to ensure that no two threads write to the same location in the file; otherwise the file content will be incorrect.
FileStream writeStream = new FileStream(destinationPath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write);
writeStream.Position = startPositionOfSegments; // REMEMBER: this calculation is important
// A simple write: just read from your source, then write the bytes
writeStream.Write(readBytes, 0, bytesReadFromInputStream);
After each Write we call writeStream.Flush() so that buffered data gets written to the file, but you can change this according to your requirements.
Since you already have working code that downloads the file segments in parallel, the only change you need to make is to open the file stream as posted above, and instead of creating many segment files locally, open a stream on a single file.
The startPositionOfSegments value is very important; calculate it carefully so that no two segments write their downloaded bytes to the same location in the file, otherwise the result will be incorrect.
The above procedure works perfectly fine at our end, but it can be a problem if your segment sizes are too small (we faced this too, but increasing the segment size fixed it). If you face any exceptions, you can also synchronize just the Write part.
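Putting the pieces together, each download thread can open its own stream over the same destination file and write only at its own offset (the file name, offsets, and data below are placeholders for illustration):

```csharp
using System.IO;
using System.Threading.Tasks;

class SegmentWriteDemo
{
    static void WriteSegment(string destinationPath, long startPositionOfSegment, byte[] readBytes)
    {
        // FileShare.Write lets several threads hold the file open at once;
        // the distinct start positions keep them from overwriting each other.
        using (var writeStream = new FileStream(destinationPath, FileMode.OpenOrCreate,
                                                FileAccess.Write, FileShare.Write))
        {
            writeStream.Position = startPositionOfSegment;
            writeStream.Write(readBytes, 0, readBytes.Length);
            writeStream.Flush();
        }
    }

    static void Main()
    {
        // Simulated: four segments written in parallel, each to its own offset.
        Parallel.For(0, 4, i =>
            WriteSegment("downloaded.bin", i * 4096L, new byte[4096]));
    }
}
```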
I'm receiving a file as a stream of byte[] data packets (total size isn't known in advance) that I need to store somewhere before processing it immediately after it's been received (I can't do the processing on the fly). Total received file size can vary from as small as 10 KB to over 4 GB.
One option for storing the received data is to use a MemoryStream, i.e. a sequence of MemoryStream.Write(bufferReceived, 0, count) calls to store the received packets. This is very simple, but will obviously result in an OutOfMemoryException for large files.
An alternative option is to use a FileStream, i.e. FileStream.Write(bufferReceived, 0, count). This way, no out of memory exceptions will occur, but what I'm unsure about is bad performance due to disk writes (which I don't want to occur as long as plenty of memory is still available) - I'd like to avoid disk access as much as possible, but I don't know of a way to control this.
I did some testing and most of the time there seems to be little performance difference between, say, 10,000 consecutive calls of MemoryStream.Write() vs FileStream.Write(), but a lot seems to depend on the buffer size and the total amount of data in question (i.e. the number of writes). Obviously, MemoryStream size reallocation is also a factor.
Does it make sense to use a combination of MemoryStream and FileStream, i.e. write to memory stream by default, but once the total amount of data received is over e.g. 500 MB, write it to FileStream; then, read in chunks from both streams for processing the received data (first process 500 MB from the MemoryStream, dispose it, then read from FileStream)?
Another solution is to use a custom memory stream implementation that doesn't require continuous address space for internal array allocation (i.e. a linked list of memory streams); this way, at least on 64-bit environments, out of memory exceptions should no longer be an issue. Con: extra work, more room for mistakes.
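A minimal sketch of the hybrid idea above (the threshold, names, and temp-file choice are illustrative only):

```csharp
using System.IO;

// Write to memory by default; once the threshold is crossed, all further
// writes go to a temp file. Processing would read the MemoryStream first,
// then the FileStream.
class HybridBuffer
{
    const long Threshold = 500L * 1024 * 1024; // 500 MB, as in the question

    MemoryStream memory = new MemoryStream();
    FileStream file; // created lazily once the threshold is reached

    public void Write(byte[] buffer, int offset, int count)
    {
        if (file == null && memory.Length + count > Threshold)
        {
            // From here on, everything new goes to disk.
            file = new FileStream(Path.GetTempFileName(), FileMode.Create);
        }

        if (file == null)
            memory.Write(buffer, offset, count);
        else
            file.Write(buffer, offset, count);
    }
}
```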
So how do FileStream and MemoryStream reads/writes behave in terms of disk access and memory caching, i.e. the data size/performance balance? I would expect that as long as enough RAM is available, FileStream would internally read/write from memory (cache) anyway, and virtual memory would take care of the rest. But I don't know how often FileStream will explicitly access the disk when being written to.
Any help would be appreciated.
No, trying to optimize this doesn't make any sense. Windows itself already caches file writes; they are buffered by the file system cache. So your test results are about right: both MemoryStream.Write() and FileStream.Write() actually write to RAM and have no significant perf differences. The file system driver lazily writes the data to disk in the background.
The RAM used for the file system cache is what's left over after processes have claimed their RAM needs. By using a MemoryStream, you reduce the effectiveness of the file system cache. In other words, you trade one for the other without benefit. You're in fact worse off: you use double the amount of RAM.
Don't try to help; this is already heavily optimized inside the operating system.
Since recent versions of Windows enable write caching by default, I'd say you could simply use FileStream and let Windows manage when or if anything actually is written to the physical hard drive.
If these files don't stick around after you've received them, you should probably write the files to a temp directory and delete them when you're done with them.
Use a FileStream constructor that allows you to define the buffer size. For example:
using (FileStream outputFile = new FileStream("filename",
    FileMode.Create, FileAccess.Write, FileShare.None, 65536))
{
    // write to outputFile here
}
The default buffer size is 4K. Using a 64K buffer reduces the number of calls to the file system. A larger buffer will reduce the number of writes, but each write starts to take longer. Empirical data (many years of working with this stuff) indicates that 64K is a very good choice.
As somebody else pointed out, the file system will likely do further caching, and do the actual disk write in the background. It's highly unlikely that you'll receive data faster than you can write it to a FileStream.
Yesterday I asked a question here: how to disable the disk cache in C# by invoking the Win32 CreateFile API with FILE_FLAG_NO_BUFFERING.
My performance tests (write and read, 1000 files, 220 MB total) show that FILE_FLAG_NO_BUFFERING doesn't improve performance; it is slower than the .NET default disk cache. Changing FILE_FLAG_NO_BUFFERING to FILE_FLAG_SEQUENTIAL_SCAN matches the .NET default disk cache and is slightly faster.
Before that, I tried using MongoDB's GridFS feature to replace the Windows file system; it wasn't good (and I don't need the distributed feature, it was just a taste).
In my product, the server receives a lot of small files (60-100 KB) per second over TCP/IP and needs to save them to disk; a third service then reads each of these files once (just one read, then processing). Would asynchronous I/O help me get the best speed with the lowest CPU usage? Can someone give me a suggestion? Or can I still use the FileStream class?
update 1
Could a memory-mapped file meet my needs, i.e. writing all the files into one big file (or several) and reading from it?
If your PC is taking 5-10 seconds to write a 100kB file to disk, then you either have the world's oldest, slowest PC, or your code is doing something very inefficient.
Turning off disk caching will probably make things worse rather than better. With a disk cache in place, your writes will be fast, and Windows will do the slow part of flushing the data to disk later. Indeed, increasing I/O buffering usually results in significantly improved I/O in general.
You definitely want to use asynchronous writes - that means your server starts the data writing, and then goes back to responding to its clients while the OS deals with writing the data to disk in the background.
There shouldn't be any need to queue the writes (as the OS will already be doing that if disk caching is enabled), but that is something you could try if all else fails - it could potentially help by writing only one file at a time to minimise the need for disk seeks.
Generally for I/O, using larger buffers helps to increase your throughput. For example instead of writing each individual byte to the file in a loop, write a buffer-ful of data (ideally the entire file, for the sizes you mentioned) in one Write operation. This will minimise the overhead (instead of calling a write function for every byte, you call a function once for the entire file). I suspect you may be doing something like this, as it's the only way I know to reduce performance to the levels you've suggested you are getting.
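For the file sizes mentioned (60-100 KB), that boils down to a single Write call per file (the path and array name here are placeholders):

```csharp
using System.IO;

class WholeFileWriter
{
    static void SaveReceivedFile(string path, byte[] receivedBytes)
    {
        // One Write call for the whole payload instead of a byte-at-a-time loop.
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            stream.Write(receivedBytes, 0, receivedBytes.Length);
        }
        // Equivalent shorthand: File.WriteAllBytes(path, receivedBytes);
    }
}
```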
Memory-mapped files will not help you. They're really best for accessing the contents of huge files.
One of the biggest and most significant improvements in your case, IMO, would be to process the files without saving them to disk first; afterwards, if you really need to store them, push them onto a queue and process that queue in another thread, saving them to disk there. This way you immediately get the processed data you need, without losing time saving data to disk first, yet you still end up with the files on disk, without losing computational power in your file processor.
I am writing a log backup program in C#. The main objective is to take logs from multiple servers, copy and compress the files and then move them to a central data storage server. I will have to move about 270Gb of data every 24 hours. I have a dedicated server to run this job and a LAN of 1Gbps. Currently I am reading lines from a (text)file, copying them into a buffer stream and writing them to the destination.
My last test copied about 2.5Gb of data in 28 minutes. This will not do. I will probably thread the program for efficiency, but I am looking for a better method to copy the files.
I was also playing with the idea of compressing everything first and then using a stream buffer a bit to copy. Really, I am just looking for a little advice from someone with more experience than me.
Any help is appreciated, thanks.
You first need to profile as Umair said so that you can figure out how much of the 28 minutes is spent compressing vs. transmitting. Also measure the compression rate (bytes/sec) with different compression libraries, and compare your transfer rate against other programs such as Filezilla to see if you're close to your system's maximum bandwidth.
One good library to consider is DotNetZip, which allows you to zip to a stream, which can be handy for large files.
Once you get it fine-tuned for one thread, experiment with several threads and watch your processor utilization to see where the sweet spot is.
One possible solution is what you mentioned: compress the files into one zip file and then transfer it over the network. This will be much faster, as you are transferring one file, and one of the principal bottlenecks during file transfers is often the destination's security checks.
So if you use one zip file, there should be only one check.
In short:
Compress
Transfer
Decompress (if you need)
This alone should bring you big benefits in terms of performance.
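The compress step can be done with the framework's ZipFile class (.NET 4.5+); the directory and destination paths below are placeholders:

```csharp
using System.IO.Compression;

class LogShipper
{
    // Step 1: compress a whole log directory into a single zip file.
    static void CompressLogs(string logDirectory, string zipPath)
    {
        ZipFile.CreateFromDirectory(logDirectory, zipPath,
                                    CompressionLevel.Optimal,
                                    includeBaseDirectory: false);
        // Step 2: transfer zipPath over the network as a single file.
        // Step 3 (if needed): ZipFile.ExtractToDirectory on the receiving side.
    }
}
```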
Compress the logs at source and use TransmitFile (that's a native API - not sure if there's a framework equivalent, or how easy it is to P/Invoke this) to send them to the destination. (Possibly HttpResponse.TransmitFile does the same in .Net?)
In any event, do not read your files line by line - read the files in blocks (loop doing FileStream.Read for, say, 4K bytes until the read count == 0) and send that directly to the network pipe.
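A block-read loop of that kind might look like this (the 4K block size and stream names are just for illustration):

```csharp
using System.IO;

class BlockCopier
{
    // Read in fixed-size blocks and push each block straight to the
    // network stream, instead of reading line by line.
    static void CopyFileToStream(string path, Stream networkStream)
    {
        byte[] buffer = new byte[4096];
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                networkStream.Write(buffer, 0, read);
            }
        }
    }
}
```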
Try profiling your program... the bottleneck is often where you least expect it to be. As some clever guy said, "Premature optimisation is the root of all evil".
Once, in a similar scenario at work, I was given the task of optimising the process. After profiling, the bottleneck was found to be a call to the sleep function (which was used for synchronisation between threads!!!).
I have an application that receives chunks of data over the network, and writes these to disk.
Once all chunks have been received, they can be decoded/recombined into the single file they actually represent.
I'm wondering if it's useful to use memory-mapped files or not - first for writing the single chunks to disk, second for the single file into which all of them are decoded.
My own feeling is that it might be useful for the second case only, anyone got some ideas on this?
Edit:
It's a C# app, and I'm only planning an x64 version.
(So running into the 'largest contiguous free space' problem shouldn't be relevant)
Memory-mapped files are beneficial for scenarios where a relatively small portion (view) of a considerably larger file needs to be accessed repeatedly.
In this scenario, the operating system can help optimize the overall memory usage and paging behavior of the application by paging in and out only the most recently used portions of the mapped file.
In addition, memory-mapped files can expose interesting features such as copy-on-write or serve as the basis of shared-memory.
For your scenario, memory-mapped files can help you assemble the file if the chunks arrive out of order. However, you would still need to know the final file size in advance.
Also, you should be accessing the files only once, for writing a chunk. Thus, a performance advantage over explicitly implemented asynchronous I/O is unlikely, but it may be easier and quicker to implement your file writer correctly.
In .NET 4, Microsoft added support for memory-mapped files and there are some comprehensive articles with sample code, e.g. http://blogs.msdn.com/salvapatuel/archive/2009/06/08/working-with-memory-mapped-files-in-net-4.aspx.
Memory-mapped files are primarily used for Inter-Process Communication or I/O performance improvement.
In your case, are you trying to get better I/O performance?
Hate to point out the obvious, but Wikipedia gives a good rundown of the situation...
http://en.wikipedia.org/wiki/Memory-mapped_file
Specifically...
The memory mapped approach has its cost in minor page faults - when a block of data is loaded in page cache, but not yet mapped in to the process's virtual memory space. Depending on the circumstances, memory mapped file I/O can actually be substantially slower than standard file I/O.
It sounds like you're about to prematurely optimize for speed. Why not a regular file approach, and then refactor for MM files later if needed?
I'd say both cases are relevant. Simply write the single chunks to their proper place in the memory mapped file, out of order, as they come in. This of course is only useful if you know where each chunk should go, like in a bittorrent downloader. If you have to perform some extra analysis to know where the chunk should go, the benefit of a memory mapped file might not be as large.
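As a sketch of that out-of-order assembly with .NET 4's memory-mapped file API (names and sizes are placeholders; as another answer notes, the final file size must be known up front):

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

class ChunkAssembler
{
    // Write each chunk at its known offset as it arrives, in any order.
    static void WriteChunk(string path, long finalSize, long offset, byte[] chunk)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.OpenOrCreate,
                                                         null, finalSize))
        using (var accessor = mmf.CreateViewAccessor(offset, chunk.Length))
        {
            accessor.WriteArray(0, chunk, 0, chunk.Length);
        }
    }
}
```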