I need a C# implementation of Java's PushbackInputStream. I have made my own very basic one, but I wondered if there was a well tested and decently performing version already available somewhere. As it happens I always push back the same bytes I read so really it just needs to be able to reposition backwards, buffering up to a number of bytes I specify. (like Java's BufferedInputStream with the mark and reset methods).
Update: I should add that I can't simply reposition the stream, as CanSeek may be false (e.g. when the input stream is a NetworkStream).
The problem with pushing data back into a stream is that any readers that sit on top of the stream may already have a local buffer of data. This makes this approach very brittle. Personally, I would try to avoid this scenario, and use data constructs where I either don't need to push back, or can use single-byte Peek etc.
You need to build a wrapper class that either functions as a stream but keeps a buffer of the last X bytes so you can seek back at least a limited distance, or isn't a stream at all, so you really can "push data back into the input stream".
Either way you're going to have to write something yourself.
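For what it's worth, here's a rough, minimal sketch of such a wrapper — a starting point rather than the well-tested version you asked for. The class name and the Unread method are my own invention, not anything in the BCL:

    using System;
    using System.Collections.Generic;
    using System.IO;

    // Serves pushed-back bytes before reading from the wrapped stream.
    public class PushbackStream : Stream
    {
        private readonly Stream _inner;
        private readonly Stack<byte> _pushed = new Stack<byte>();

        public PushbackStream(Stream inner) { _inner = inner; }

        // Push bytes back; subsequent reads return them in the original order.
        public void Unread(byte[] buffer, int offset, int count)
        {
            for (int i = offset + count - 1; i >= offset; i--)
                _pushed.Push(buffer[i]);
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int n = 0;
            while (n < count && _pushed.Count > 0)   // drain pushback first
                buffer[offset + n++] = _pushed.Pop();
            if (n < count)                           // then the real stream
                n += _inner.Read(buffer, offset + n, count - n);
            return n;
        }

        public override bool CanRead { get { return true; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return false; } }
        public override long Length { get { throw new NotSupportedException(); } }
        public override long Position
        {
            get { throw new NotSupportedException(); }
            set { throw new NotSupportedException(); }
        }
        public override void Flush() { }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
        public override void SetLength(long value) { throw new NotSupportedException(); }
        public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    }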
Can't you just use a System.IO.Stream and seek backwards after reading from the current position?
stream.Seek(-1, System.IO.SeekOrigin.Current)
where -1 could be a variable for how far you want to go back?
So long as the stream indicates it supports seeking (CanSeek), then
stream.Seek(-offset, System.IO.SeekOrigin.Current)
will be fine.
I'm porting some C++ and Java code over to C#, and all the data IO is done as Streams, even though all the streams are less than 1K in size.
Given that the buffer size of the stream equals the whole stream in almost all circumstances, is there any reason I shouldn't simply use a Byte[]?
The 1K data units arrive from a stream source (network or disk); however, once read into memory, access is a little random. I think direct byte[x] access might be more efficient (or more logical).
So my question, is it generally acceptable from a security and architecture perspective to use Byte[] array directly instead of wrapping it in a stream? Assume that no further "stream" access is needed for other operations (e.g. an encoded media stream).
Since you stated that the usage is "a little random," I think a byte array makes the most sense; these are inherently good for lookup at a given position, while a stream would require you to do a linear read and reset the position. I'm not sure what security concerns you might have, but if you're passing the array to any unmanaged resources you might want to consider pinning it in memory.
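If you do end up handing the array to unmanaged code, a minimal sketch of pinning looks like this (the buffer size is arbitrary):

    using System;
    using System.Runtime.InteropServices;

    byte[] data = new byte[1024];

    // Pin the array so the GC cannot move it while unmanaged code holds a pointer.
    GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
    try
    {
        IntPtr ptr = handle.AddrOfPinnedObject();
        // ... pass ptr to the unmanaged API here ...
    }
    finally
    {
        handle.Free(); // always unpin, or the heap fragments around the pinned block
    }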
I need a collection type for received bytes in my socket application (which deals with ~5k concurrent connections).
I tried using a List<byte> but since it has one internal array and I receive lots of data, it can cause OutOfMemoryExceptions.
So I need a collection that:
Keeps the data in smaller blocks, like an unrolled linked list.
Provides fast lookup (preferably an IList<T>), because I look for a delimiter that marks the end of the message after each receive operation.
What I use right now is Stream. I supply a MemoryStream for the operations that don't involve too much data and supply a FileStream of a temporary file for the operations that involve serious amounts of data.
MemoryStream is no different from a List<T> in this respect, though, and I prefer not to use files as buffers.
So...
What collection or approach do you recommend?
It appears that you are using an inappropriate architecture for a network application. You should buffer only the data that is actually required, but here you are using a list to buffer data until the required amount has been received.
I would recommend that you check for the delimiter in each chunk of data as it arrives, and push in only the data up to the delimiter. Once a message is complete, pull it out of the list, use it, and dispose of the list (see the sketch below). Adding everything to the list is not a good approach and will surely consume a lot of memory.
Ideally, you should have a protocol that tells you the length of the data before you actually receive it. That way you can be sure the required data has arrived, without relying on a delimiter.
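To make the delimiter-scanning idea concrete, here is a rough sketch (the delimiter, class, and method names are made up for illustration):

    using System.Collections.Generic;

    // Scans each received chunk for the delimiter and yields complete
    // messages, so the pending buffer never grows past one message.
    class MessageAssembler
    {
        private const byte Delimiter = (byte)'\n'; // assumed message terminator
        private readonly List<byte> _pending = new List<byte>();

        // Call from the receive callback with the bytes just read;
        // enumerate the result fully after each receive.
        public IEnumerable<byte[]> Feed(byte[] chunk, int count)
        {
            for (int i = 0; i < count; i++)
            {
                if (chunk[i] == Delimiter)
                {
                    byte[] message = _pending.ToArray();
                    _pending.Clear(); // release the accumulated data
                    yield return message;
                }
                else
                {
                    _pending.Add(chunk[i]);
                }
            }
        }
    }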
A possible quick and dirty solution:
At the start of the program, allocate a buffer large enough for the largest amount of data you will receive. Use a separate 'count' field to keep track of how much data is currently in use.
(I don't really like this solution; I'd use files or find some way of working with the data in blocks, but it might work for you).
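In code, the quick-and-dirty idea above is something like this (the size, names, and socket usage are placeholders):

    using System.Net.Sockets;

    const int MaxMessageSize = 64 * 1024;     // largest message you expect
    byte[] buffer = new byte[MaxMessageSize]; // allocated once, up front
    int count = 0;                            // bytes currently in use

    void OnReceive(Socket socket)
    {
        // Append new data into the unused tail of the buffer.
        int received = socket.Receive(buffer, count, buffer.Length - count, SocketFlags.None);
        count += received;
        // ... scan buffer[0..count) for a complete message, then reset count ...
    }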
I receive the following exception:
System.NotSupportedException : This stream does not support seek operations.
at System.Net.Sockets.NetworkStream.Seek(Int64 offset, SeekOrigin origin)
at System.IO.BufferedStream.FlushRead()
at System.IO.BufferedStream.WriteByte(Byte value)
The following link shows that this is a known problem at Microsoft:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=273186
This stack trace shows two things:
System.IO.BufferedStream performs an absurd seek operation internally. A BufferedStream should buffer the underlying stream and nothing more; the quality of the buffering is poor if it triggers seeks like this.
It will never work reliably with a stream that does not support Seek.
Are there any alternatives?
Do I need a buffer together with a NetworkStream in C#, or is it already buffered?
Edit: I simply want to reduce the number of read/write calls to the underlying socket stream.
The NetworkStream is already buffered. All data that is received is kept in a buffer waiting for you to read it. Calls to read will either be very fast, or will block waiting for data to be received from the other peer on the network, a BufferedStream will not help in either case.
If you are concerned about the blocking then you can look at switching the underlying socket to non-blocking mode.
The solution is to use two independent BufferedStreams, one for receiving and one for sending. And don't forget to flush the sending BufferedStream appropriately.
Since even in 2018 it seems hard to get a satisfying answer to this question, for the sake of humanity, here are my two cents:
The NetworkStream is buffered on the OS side. However, that does not mean there are no reasons to buffer on the .NET side. TCP behaves well on Write-Read (repeat), but stalls on Write-Write-Read due to delayed ACKs interacting with Nagle's algorithm, among other things.
If you, like me, have a bunch of sub-par protocol code to drag into the twenty-first century, you want to buffer.
Alternatively, if you stick to the above, you could also buffer only reads/rcvs or only writes/sends, and use the NetworkStream directly for the other side, depending on how broken what code is. You just have to be consistent!
What the BufferedStream docs fail to make abundantly clear is that you should only switch between reading and writing if your stream is seekable. This is because it buffers reads and writes in the same buffer. BufferedStream simply does not work well for NetworkStream.
As Marc pointed out, the cause of this lameness is the conflation of two streams into one NetworkStream which is not one of .net's greatest design decisions.
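A minimal sketch of the two-BufferedStream arrangement from the top of this answer (host, port, payload, and buffer size here are all made up):

    using System.IO;
    using System.Net.Sockets;
    using System.Text;

    using var client = new TcpClient("example.com", 12345);
    NetworkStream network = client.GetStream();

    // One BufferedStream per direction, so reads and writes never share a buffer.
    var reader = new BufferedStream(network, 8192); // use only for receiving
    var writer = new BufferedStream(network, 8192); // use only for sending

    byte[] request = Encoding.ASCII.GetBytes("HELLO\n");
    writer.Write(request, 0, request.Length);
    writer.Flush(); // flush before expecting a reply, or the peer may never see it

    int firstByte = reader.ReadByte();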
A BufferedStream simply acts to reduce the number of read/write calls to the underlying stream (which may be IO/hardware bound). It cannot provide seek capability (and indeed, buffering and seeking are in many ways contrary to each other).
Why do you need to seek? Perhaps copy the stream to something seekable first - a MemoryStream or a FileStream - then do your actual work from that second, seekable stream.
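For example, a rough sketch of the copy-first idea; note that CopyTo only returns once the source reports end of stream, so for a NetworkStream the peer must have finished sending:

    using System.IO;

    // Drain a non-seekable stream into a MemoryStream, then do the real
    // work against the seekable copy.
    static MemoryStream BufferWholeStream(Stream source)
    {
        var seekable = new MemoryStream();
        source.CopyTo(seekable); // blocks until the end of the source stream
        seekable.Position = 0;   // rewind; the copy supports Seek
        return seekable;
    }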
Do you have a specific purpose in mind? I may be able to suggest more appropriate options with more details...
In particular: note that NetworkStream is a curiosity - with most streams, read/write relate to the same physical stream; however, a NetworkStream actually represents two completely independent pipes; read and write are completely unrelated. Likewise, you can't seek in bytes that have already zipped past you... you can skip data, but that is better done by doing a few Read operations and discarding the data.
There are some text files (records) which I need to access using C#.NET, but the files are larger than 1 GB (1 GB is the minimum size).
What do I need to do?
What are the factors I need to concentrate on?
Can someone give me an idea of how to overcome this situation?
EDIT:
Thanks for the fast responses. Yes, they are fixed-length records. These text files come from a local company (their last month's transaction records).
Is it possible to access these files like normal text files (using a normal file stream)?
And what about memory management?
Expanding on CasperOne's answer
Simply put, there is no way to reliably put a 100GB file into memory at one time. On a 32-bit machine there is simply not enough addressing space. On a 64-bit machine there is enough addressing space, but in the time it would take to actually get the file into memory, your user will have killed your process out of frustration.
The trick is to process the file incrementally. The base System.IO.Stream class is designed to process a variable (and possibly infinite) stream in distinct quantities. It has several Read methods that only progress down the stream by a specific number of bytes. You will need to use these methods to divide up the stream.
I can't give more information because your scenario is not specific enough. Can you give us more details on your record delimiters, or some sample lines from the file?
Update
If they are fixed length records then System.IO.Stream will work just fine. You can even use File.Open() to get access to the underlying Stream object. Stream.Read has an overload that requests the number of bytes to be read from the file. Since they are fixed length records this should work well for your scenario.
As long as you don't call ReadAllText() and instead use the Stream.Read() methods which take explicit byte arrays, memory won't be an issue. The underlying Stream class will take care not to put the entire file into memory (that is of course, unless you ask it to :) ).
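A sketch of that incremental pattern (the file name and record size are assumptions):

    using System;
    using System.IO;

    const int RecordSize = 128; // assumed fixed record length
    byte[] record = new byte[RecordSize];

    using FileStream stream = File.Open("transactions.dat", FileMode.Open, FileAccess.Read);
    while (true)
    {
        // Stream.Read may return fewer bytes than requested,
        // so loop until one whole record is in the buffer.
        int filled = 0;
        while (filled < RecordSize)
        {
            int n = stream.Read(record, filled, RecordSize - filled);
            if (n == 0) break; // end of file
            filled += n;
        }
        if (filled < RecordSize) break; // no complete record left

        ProcessRecord(record);
    }

    // Hypothetical per-record handler.
    static void ProcessRecord(byte[] r) { /* parse one 128-byte record here */ }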
You aren't specifically listing the problems you need to overcome. A file can be 100GB and you can have no problems processing it.
If you have to process the file as a whole then that is going to require some creative coding, but if you can simply process sections of the file at a time, then it is relatively easy to move to the location in the file you need to start from, process the data you need to process in chunks, and then close the file.
More information here would certainly be helpful.
What are the main problems you are having at the moment? The big thing to remember is to think in terms of streams - i.e. keep the minimum amount of data in memory that you can. LINQ is excellent at working with sequences (although there are some buffering operations you need to avoid, such as OrderBy).
For example, here's a way of handling simple records from a large file efficiently (note the iterator block).
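A rough sketch of that idea (not the original example; the file name is a placeholder):

    using System;
    using System.Collections.Generic;
    using System.IO;

    // The iterator block ("yield return") streams records to the caller
    // lazily, so only one line is in memory at a time.
    static IEnumerable<string> ReadRecords(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    foreach (string record in ReadRecords("huge-file.txt"))
        Console.WriteLine(record.Length); // process each record as it streams past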
For performing multiple aggregates/analysis over large data from files, consider Push LINQ in MiscUtil.
Can you add more context to the problems you are thinking of?
Expanding on JaredPar's answer.
If the file is a binary file (i.e. ints stored as 4 bytes, fixed length strings etc) you can use the BinaryReader class. Easier than pulling out n bytes and then trying to interrogate that.
Also note that the Read method on System.IO.Stream is not guaranteed to fill your buffer: if you ask for 100 bytes it may return fewer than that, without having reached the end of the file.
The BinaryReader.ReadBytes method will block until it reads the requested number of bytes or reaches end of file, whichever comes first.
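A quick illustration of the difference (the file name is assumed):

    using System.IO;

    using FileStream stream = File.OpenRead("records.dat");

    // Stream.Read: may hand back fewer than 100 bytes even before end of file.
    byte[] buffer = new byte[100];
    int read = stream.Read(buffer, 0, buffer.Length); // 'read' may be < 100

    // BinaryReader.ReadBytes: returns exactly 100 bytes unless EOF intervenes.
    stream.Position = 0;
    using var binary = new BinaryReader(stream);
    byte[] exact = binary.ReadBytes(100); // shorter than 100 only at end of file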
Nice collaboration lads :)
Hey Guys, I realize that this post hasn't been touched in a while, but I just wanted to post a site that has the solution to your problem.
http://thedeveloperpage.wordpress.com/c-articles/using-file-streams-to-write-any-size-file-introduction/
Hope it helps!
-CJ
I'm using C#.Net and the Socket class from the System.Net.Sockets namespace. I'm using the asynchronous receive methods. I understand this can be more easily done with something like a web service; this question is borne out of my curiosity rather than a practical need.
My question is: assume the client is sending some binary-serialized object of an unknown length. On my server with the socket, how do I know the entire object has been received and that it is ready for deserialization? I've considered prepending the object with the length of the object in bytes, but this seems unnecessary in the .Net world. What happens if the object is larger than the buffer? How would I know, 'hey, gotta resize the buffer because the object is too big'?
You either need the protocol to be self-terminating (like XML is, effectively - you know you've finished receiving an XML document when the root element closes), or you need to length-prefix the data, or you need the other end to close the stream when it's done.
In the case of a self-terminated protocol, you need to have enough hooks in so that the reading code can tell when it's finished. With binary serialization you may well not have enough hooks. Length-prefix is by far the easiest solution here.
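A minimal sketch of length-prefix framing (all names here are illustrative):

    using System;
    using System.IO;

    // Sender: write a 4-byte length (little-endian on typical platforms),
    // then the payload.
    static void SendFrame(Stream stream, byte[] payload)
    {
        byte[] prefix = BitConverter.GetBytes(payload.Length);
        stream.Write(prefix, 0, prefix.Length);
        stream.Write(payload, 0, payload.Length);
    }

    // Receiver: read the prefix, then loop until the whole payload is in.
    static byte[] ReceiveFrame(Stream stream)
    {
        byte[] prefix = ReadExactly(stream, 4);
        int length = BitConverter.ToInt32(prefix, 0);
        return ReadExactly(stream, length); // now it's safe to deserialize
    }

    static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int n = stream.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
        return buffer;
    }

This also answers the buffer-resizing worry: the prefix tells you exactly how big a buffer to allocate before the payload arrives.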
If you use pure sockets, you need to know the length. Otherwise, the size of the buffer is not relevant: even if your buffer is as big as the whole payload, a single read still may not fill it. Check the Stream.Read method - it returns the number of bytes actually read, so you need to loop until all the data has been received.
Yeah, you won't deserialize until you've rxed all the bytes.