My program receives data very frequently, up to 2-4 times per second. My goal is to take this data and write it to a file.
My question: is it smart to keep a file pointer constantly open? Or would it be better to cache the data first and then write it to the file in one go?
How does this affect performance?
Are there design patterns that are a good fit for this? Any tips are welcome.
Buffering is already implemented in the standard System.IO.FileStream: http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx
Instead of writing constantly, changes are accumulated in a buffer and flushed to disk as the buffer fills up. Just remember to specify the buffer size in the constructor and to call Flush when you finish.
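A minimal sketch of this, assuming an example path, buffer size, and record format:

```csharp
using System;
using System.IO;
using System.Text;

class BufferedFileWriter
{
    static void Main()
    {
        // 64 KB internal buffer; FileStream only hits the disk when the buffer fills
        // or when Flush/Dispose is called. Path and size are example values.
        using (var stream = new FileStream("data.log", FileMode.Append,
                                           FileAccess.Write, FileShare.Read,
                                           bufferSize: 64 * 1024))
        {
            for (int i = 0; i < 10; i++)
            {
                byte[] record = Encoding.UTF8.GetBytes($"record {i}{Environment.NewLine}");
                stream.Write(record, 0, record.Length);   // goes into the buffer, not straight to disk
            }
            stream.Flush(); // force anything still buffered out before closing
        } // Dispose also flushes
    }
}
```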
Related
I am working in C#, and in my program I currently open a file stream at launch and write data to it in CSV format about every second. I am wondering: would it be more efficient to store this data in an ArrayList and write it all at once at the end, or to keep the file stream open and just write the data every second?
If the amount of data is "reasonably manageable" in memory, then write the data at the end.
If this is continuous, I wonder if an option could be to use something like NLog to write your CSV (create a specific log format), as that manages writes pretty efficiently. You would also need to set it to raise exceptions if there was an error.
You should consider using a BufferedStream instead. Write to the stream and allow the framework to flush to file as necessary. Just make sure to flush the stream before closing it.
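A rough sketch of that suggestion; the logger class, path, and buffer size here are made up for illustration:

```csharp
using System;
using System.IO;
using System.Text;

class CsvLogger : IDisposable
{
    private readonly BufferedStream _buffered;

    public CsvLogger(string path)
    {
        // Keep the stream open for the lifetime of the logger; the BufferedStream turns
        // the once-a-second small writes into a few large writes to the file.
        var file = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read);
        _buffered = new BufferedStream(file, 32 * 1024);
    }

    public void WriteLine(string csvLine)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(csvLine + Environment.NewLine);
        _buffered.Write(bytes, 0, bytes.Length);
    }

    public void Dispose()
    {
        _buffered.Flush();      // flush before closing, as mentioned above
        _buffered.Dispose();    // disposing the BufferedStream also closes the FileStream
    }
}
```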
From what I learned in Operating Systems, writing to a file is a lot more expensive than writing to memory. However, your stream is most likely going to be cached, which means that under the hood all the file writing you are doing is actually happening in memory. The operating system handles the actual writing to disk asynchronously when it's the right time. Depending on your application, there is no need to worry about such micro-optimizations.
You can read more about why most languages take this approach under the hood here https://unix.stackexchange.com/questions/224415/whats-the-philosophy-behind-delaying-writing-data-to-disk
This kind of depends on your specific case. If you're writing data about once per second it seems likely that you're not going to see much of an impact from writing directly.
In general writing to a FileStream in small pieces is quite performant because the .NET Framework and the OS handle buffering for you. You won't see the file itself being updated until the buffer fills up or you explicitly flush the stream.
Buffering in memory isn't a terrible idea for smallish data and shortish periods. Of course if your program throws an exception or someone kills it before it writes to the disk then you lose all that information, which is probably not your favourite thing.
If you're worried about performance then use a logging thread. Post objects to it through a ConcurrentQueue<> or similar and have it do all the writes on a separate thread. Obviously threaded logging is more complex. It's not something I'd advise unless you really, really need the extra performance.
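A rough sketch of such a logging thread, using BlockingCollection (which wraps a ConcurrentQueue) for the hand-off; the class name and file path are made up for illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class AsyncLogger : IDisposable
{
    // BlockingCollection uses a ConcurrentQueue underneath; producers never block on disk I/O.
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _writerTask;

    public AsyncLogger(string path)
    {
        _writerTask = Task.Run(() =>
        {
            using (var writer = new StreamWriter(path, append: true))
            {
                // GetConsumingEnumerable blocks until items arrive or CompleteAdding is called.
                foreach (var line in _queue.GetConsumingEnumerable())
                    writer.WriteLine(line);
            }
        });
    }

    public void Log(string line) => _queue.Add(line);   // cheap for the caller

    public void Dispose()
    {
        _queue.CompleteAdding();  // let the writer drain the queue and exit
        _writerTask.Wait();
    }
}
```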
For quick-and-dirty logging I generally just use File.AppendAllText() or File.AppendAllLines() to push the data out. It takes a bit longer, but it's pretty reliable. And I can read the output while the program is still running, which is often useful.
I am developing an application in C# which handles a large stream of incoming and outgoing data through a queue-like buffer. The buffer needs to be some sort of file on disk. Data will be written to the buffer very often (I'm talking about once every 10 ms!). Each write produces one record/line in the buffer.
The same program will also read the buffer (line by line) and process the buffered data. After one line/record has been processed, it must immediately be deleted from the buffer file to prevent it from being reprocessed in the event of a system reboot. This read-and-delete will also happen at a very fast rate (every 10 ms or so).
So it's a system which writes, reads, and purges what has been read. It gets even harder because this buffer may grow up to 5 GB in size if the program decides not to process the buffered data.
**So my question is: what kind of method should I use to handle this buffering mechanism? I had a look at using SQLite and a simple text file, but they may be inefficient at handling large sizes, or not so good at handling concurrent inserts, reads, and deletes.
Anyway, I really appreciate any advice. Thank you in advance for any answers!**
You sound like you're describing Message Queues
There's MSMQ, ZeroMQ, RabbitMQ, and a couple others.
Here's a bunch of links on MSMQ:
http://www.techrepublic.com/article/use-microsoft-message-queuing-in-c-for-inter-process-communication/6170794
http://support.microsoft.com/KB/815811
http://www.csharphelp.com/2007/06/msmq-your-reliable-asynchronous-message-processing/
http://msdn.microsoft.com/en-us/library/ms973816.aspx
http://www.c-sharpcorner.com/UploadFile/rajkpt/101262007012217AM/1.aspx
Here's ZeroMQ (or 0MQ)
And here's RabbitMQ
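For a taste of what MSMQ usage looks like, a minimal sketch using the .NET Framework's System.Messaging API; the queue path is an example, and the Recoverable flag is what makes messages survive a reboot:

```csharp
using System;
using System.Messaging;   // .NET Framework assembly System.Messaging.dll

class MsmqSketch
{
    const string QueuePath = @".\private$\bufferQueue";   // example local private queue

    static void Main()
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);

        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

            // Recoverable messages are written to disk by MSMQ, so they survive a reboot.
            queue.Send(new Message("one buffered record") { Recoverable = true });

            Message received = queue.Receive();        // blocks until a message is available
            Console.WriteLine((string)received.Body);  // receiving removes it from the queue
        }
    }
}
```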
Yesterday I asked a question here: how do I disable the disk cache in C# by invoking the Win32 CreateFile API with FILE_FLAG_NO_BUFFERING?
My performance tests (write and read, 1000 files, about 220 MB total) show that FILE_FLAG_NO_BUFFERING does not improve performance; it is actually slower than the .NET default disk cache. When I change FILE_FLAG_NO_BUFFERING to FILE_FLAG_SEQUENTIAL_SCAN, I can match the .NET default disk cache and it is slightly faster.
Before this, I tried using MongoDB's GridFS feature as a replacement for the Windows file system, but it wasn't good (and I don't need the distributed features, I was just trying it out).
In my product, the server receives a lot of small files (60-100 KB) per second over TCP/IP and needs to save them to disk, and a third service reads these files once (reads once and processes them). Would asynchronous I/O help me here, and can it give me the best speed with the lowest CPU usage? Can someone give me a suggestion, or should I just keep using the FileStream class?
Update 1
Could memory-mapped files meet my requirements, i.e. writing all the files into one big file (or a few) and reading them back from it?
If your PC is taking 5-10 seconds to write a 100kB file to disk, then you either have the world's oldest, slowest PC, or your code is doing something very inefficient.
Turning off disk caching will probably make things worse rather than better. With a disk cache in place, your writes will be fast, and Windows will do the slow part of flushing the data to disk later. Indeed, increasing I/O buffering usually results in significantly improved I/O in general.
You definitely want to use asynchronous writes - that means your server starts the data writing, and then goes back to responding to its clients while the OS deals with writing the data to disk in the background.
There shouldn't be any need to queue the writes (as the OS will already be doing that if disk caching is enabled), but that is something you could try if all else fails - it could potentially help by writing only one file at a time to minimise the need for disk seeks.
Generally for I/O, using larger buffers helps to increase your throughput. For example instead of writing each individual byte to the file in a loop, write a buffer-ful of data (ideally the entire file, for the sizes you mentioned) in one Write operation. This will minimise the overhead (instead of calling a write function for every byte, you call a function once for the entire file). I suspect you may be doing something like this, as it's the only way I know to reduce performance to the levels you've suggested you are getting.
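Combining the two points above (asynchronous I/O plus a single large write), a small sketch; the path, buffer size, and method name are placeholders:

```csharp
using System.IO;
using System.Threading.Tasks;

class OneShotWriter
{
    // Write the whole payload with a single asynchronous call instead of a byte-by-byte loop.
    static async Task SaveFileAsync(string path, byte[] payload)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                           FileShare.None, bufferSize: 64 * 1024,
                                           useAsync: true))
        {
            // One WriteAsync call for the entire file; the server thread is free
            // to go back to handling clients while the OS completes the write.
            await stream.WriteAsync(payload, 0, payload.Length);
        }
    }
}
```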
Memory-mapped files will not help you. They're really best for accessing the contents of huge files.
One of the biggest and most significant improvements in your case, IMO, would be to process the files without saving them to disk first, and then, if you really need to store them, push them onto a queue and have another thread save them to disk. That way you immediately get the processed data you need without waiting for the disk write, but you still end up with the file on disk afterwards, without tying up your file processor.
I use _FileStream.Write(_ByteArray, 0, _ByteArray.Length); to write a byte array to a file. I noticed that it's very slow.
I read a line from a text file, convert it to a byte array, and then need to write it to a new (large, > 500 MB) file. Please give me some advice on speeding up the write process.
FileStream.Write is basically what there is. It's possible that using a BufferedStream would help, but unlikely.
If you're really reading a single line of text which, when encoded, is 500MB then I wouldn't be surprised to find that most of the time is being spent performing encoding. You should be able to test that by doing the encoding and then throwing away the result.
Assuming the "encoding" you're performing is just Encoding.GetBytes(string), you might want to try using a StreamWriter to wrap the FileStream - it may work better by tricks like repeatedly encoding into the same array before writing to the file.
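A rough sketch of that approach, assuming the input really is line-oriented text; the paths and names are placeholders:

```csharp
using System.IO;

class LineCopier
{
    static void Copy(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath))
        // StreamWriter handles the string-to-byte encoding internally and reuses its own
        // buffers, so there is no need to call Encoding.GetBytes yourself for every line.
        using (var writer = new StreamWriter(new FileStream(outputPath, FileMode.Create)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                writer.WriteLine(line);
        }
    }
}
```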
If you're actually reading a line at a time and appending that to the file, then:
Obviously it's best if you keep both the input stream and output stream open throughout the operation. Don't repeatedly read and then write.
You may get better performance using multiple threads or possibly asynchronous IO. This will partly depend on whether you're reading from and writing to the same drive.
Using a StreamWriter is probably still a good idea.
Additionally when creating the file, you may want to look at using a constructor which accepts a FileOptions. Experiment with the available options, but I suspect you'll want SequentialScan and possibly WriteThrough.
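For example, a FileStream constructed with those FileOptions flags (path and buffer size are placeholders):

```csharp
using System.IO;

class FileOptionsExample
{
    static FileStream CreateOutput(string path)
    {
        // FileOptions is passed through to the underlying CreateFile call and hints the OS
        // about the access pattern. These are the flags suggested above for experimentation.
        return new FileStream(path,
                              FileMode.Create,
                              FileAccess.Write,
                              FileShare.None,
                              bufferSize: 64 * 1024,
                              options: FileOptions.SequentialScan | FileOptions.WriteThrough);
    }
}
```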
If you're writing nothing but byte arrays, have you tried using BinaryWriter's Write method? Writing in bulk would probably also help with the speed. Perhaps you can read each line, convert the string to its bytes, store those bytes for a future write operation (i.e. in a List or something), and every so often (after reading x lines) write a chunk to the disk.
BinaryWriter: http://msdn.microsoft.com/en-us/library/ms143302.aspx
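A rough sketch of that batching idea; the chunk size, paths, and helper names are arbitrary:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

class BatchedBinaryWriter
{
    static void Convert(string inputPath, string outputPath, int linesPerChunk = 1000)
    {
        var pending = new List<byte[]>();

        using (var reader = new StreamReader(inputPath))
        using (var writer = new BinaryWriter(File.Open(outputPath, FileMode.Create)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                pending.Add(Encoding.UTF8.GetBytes(line));

                if (pending.Count >= linesPerChunk)      // flush a chunk every N lines
                    WriteChunk(writer, pending);
            }
            WriteChunk(writer, pending);                 // write whatever is left over
        }
    }

    static void WriteChunk(BinaryWriter writer, List<byte[]> chunk)
    {
        foreach (var bytes in chunk)
            writer.Write(bytes);                         // BinaryWriter.Write(byte[]) writes the raw bytes
        chunk.Clear();
    }
}
```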
There are some text files (records) which I need to access using C#.NET, but the problem is that those files are larger than 1 GB (the minimum size is 1 GB).
What do I need to do?
What are the factors I need to concentrate on?
Can someone give me an idea of how to handle this situation?
EDIT:
Thanks for the fast responses. Yes, they are fixed-length records. These text files come from a local company (their last month's transaction records).
Is it possible to access these files like normal text files (using a normal file stream)?
and
How about memory management?
Expanding on CasperOne's answer
Simply put, there is no way to reliably put a 100 GB file into memory at one time. On a 32-bit machine there is simply not enough addressing space. On a 64-bit machine there is enough addressing space, but in the time it would take to actually get the file into memory, your user will have killed your process out of frustration.
The trick is to process the file incrementally. The base System.IO.Stream class is designed to process a variable (and possibly infinite) stream in distinct quantities. It has several Read methods that will only progress down the stream a specific number of bytes. You will need to use these methods in order to divide up the stream.
I can't give more information because your scenario is not specific enough. Can you give us more details on your record delimiters or some sample lines from the file?
Update
If they are fixed length records then System.IO.Stream will work just fine. You can even use File.Open() to get access to the underlying Stream object. Stream.Read has an overload that requests the number of bytes to be read from the file. Since they are fixed length records this should work well for your scenario.
As long as you don't call ReadAllText() and instead use the Stream.Read() methods which take explicit byte arrays, memory won't be an issue. The underlying Stream class will take care not to put the entire file into memory (that is of course, unless you ask it to :) ).
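For illustration, a minimal sketch of reading fixed-length records with Stream.Read; the record length and file name are assumptions:

```csharp
using System.IO;
using System.Text;

class FixedLengthReader
{
    const int RecordLength = 128;   // assumed record size, including any line terminator

    static void Main()
    {
        byte[] record = new byte[RecordLength];

        using (Stream stream = File.Open("transactions.txt", FileMode.Open, FileAccess.Read))
        {
            int read;
            // Only one record is ever held in memory, regardless of the file size.
            // Note: as discussed further down, Read may return fewer bytes than requested;
            // a local FileStream normally fills the buffer, but production code should
            // loop until the record is complete.
            while ((read = stream.Read(record, 0, RecordLength)) == RecordLength)
            {
                string text = Encoding.ASCII.GetString(record);
                // ... process one record here ...
            }
        }
    }
}
```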
You aren't specifically listing the problems you need to overcome. A file can be 100GB and you can have no problems processing it.
If you have to process the file as a whole then that is going to require some creative coding, but if you can simply process sections of the file at a time, then it is relatively easy to move to the location in the file you need to start from, process the data you need to process in chunks, and then close the file.
More information here would certainly be helpful.
What are the main problems you are having at the moment? The big thing to remember is to think in terms of streams - i.e. keep the minimum amount of data in memory that you can. LINQ is excellent at working with sequences (although there are some buffering operations you need to avoid, such as OrderBy).
For example, here's a way of handling simple records from a large file efficiently (note the iterator block).
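A minimal sketch of such an iterator block, assuming one record per line (the names here are placeholders):

```csharp
using System.Collections.Generic;
using System.IO;

static class RecordReader
{
    // Yields one record at a time; only the current line is ever held in memory.
    public static IEnumerable<string> ReadRecords(string path)
    {
        using (var reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}

// Usage: records stream lazily through a LINQ pipeline, e.g.
// var longRecords = RecordReader.ReadRecords("huge.txt").Where(r => r.Length > 80);
```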
For performing multiple aggregates/analysis over large data from files, consider Push LINQ in MiscUtil.
Can you add more context to the problems you are thinking of?
Expanding on JaredPar's answer.
If the file is a binary file (i.e. ints stored as 4 bytes, fixed-length strings, etc.) you can use the BinaryReader class. That's easier than pulling out n bytes and then trying to interrogate them.
Also note that the Read method on System.IO.Stream is not guaranteed to fill your buffer: if you ask for 100 bytes it may return fewer than that without having reached the end of the file.
The BinaryReader.ReadBytes method will block until it reads the requested number of bytes or reaches the end of the file, whichever comes first.
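To illustrate the difference, a small sketch of a fill loop around Stream.Read, next to the BinaryReader equivalent:

```csharp
using System.IO;

static class StreamHelpers
{
    // Stream.Read may return fewer bytes than requested, so loop until the buffer
    // is full or the stream ends. Returns the number of bytes actually read.
    public static int ReadFully(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0) break;   // end of stream
            total += read;
        }
        return total;
    }
}

// By contrast, BinaryReader does this looping for you:
// byte[] block = new BinaryReader(stream).ReadBytes(100);   // 100 bytes, or fewer only at EOF
```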
Nice collaboration lads :)
Hey Guys, I realize that this post hasn't been touched in a while, but I just wanted to post a site that has the solution to your problem.
http://thedeveloperpage.wordpress.com/c-articles/using-file-streams-to-write-any-size-file-introduction/
Hope it helps!
-CJ