I am having some difficulties with streams. I am using FileStream and BinaryReader and I get some weird behaviour. First of all (and this was in another question): when I used StreamReader, I got the weird behaviour that doing a Peek changed the position, so I switched to BinaryReader, which was fine. Now I have a problem where Seek (using, of course, the underlying base stream, the FileStream) sometimes works fine (gets to the right position) but sometimes jumps to a position way beyond the file's length. It doesn't happen all the time; for instance, one day I had a problem getting to the position at 1233*267, but a day later that spot was fine and the problem was somewhere else.
FileStream m_fsReader = new FileStream(m_strDataFileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
BinaryReader m_brReader = new BinaryReader(m_fsReader);
and the seek part:
m_fsReader.Seek(offset, SeekOrigin.Begin);
Thanks,
I've noticed that every Stream keeps its own position. When a Stream is constructed from another stream, the position is initially the same; but if the second stream seeks, it doesn't synchronize its base stream's position.
Try watching the Position property of both streams after read and seek operations. You will see discrepancies between the operation and the base stream's Position value.
I solved this problem by calling Seek on the base stream myself after the work done by a substream.
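For illustration, here is a minimal sketch of that idea (the file name and offset are placeholder assumptions): seek the base FileStream explicitly, then read through the BinaryReader, watching Position on the base stream before and after.

using System;
using System.IO;

class SeekCheck
{
    static void Main()
    {
        using (var fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new BinaryReader(fs))
        {
            long offset = 1024; // hypothetical record offset

            fs.Seek(offset, SeekOrigin.Begin);
            Console.WriteLine($"Position after Seek: {fs.Position}");

            int value = reader.ReadInt32(); // BinaryReader reads at the base stream's current position
            Console.WriteLine($"Position after read: {fs.Position}, value: {value}");
        }
    }
}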
It is difficult to say, but if it works one day and doesn't the next, the probability is that the file has been changed.
Regarding the Seek method: it allows you to seek to any location beyond the length of the stream.
From MSDN:
You can seek to any location beyond the length of the stream. When you seek beyond the length of the file, the file size grows.
http://msdn.microsoft.com/en-us/library/system.io.filestream.seek.aspx
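That behaviour is easy to observe. A quick hedged demo (the file name is a placeholder): seeking past the end does not throw, and Position can exceed Length until you actually write something there.

using System;
using System.IO;

using (var fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read))
{
    Console.WriteLine(fs.Length);                 // the real file size
    fs.Seek(fs.Length + 1000, SeekOrigin.Begin);  // no exception
    Console.WriteLine(fs.Position);               // Length + 1000
}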
Suppose you want to write a .WAV file format writer like so:
using var stream = File.Create("test.wav"); // a writer needs a writable stream
using var writer = new WavWriter(stream, ... /* .WAV format parameters */);
// write the file
// writer.Dispose() does a few things:
// - writes user-added chunks
// - updates the file header (chunk size) so the file is valid
There is a conceptual problem in doing so:
the user can change the stream position and thereby screw up the writing process
You may suggest the following:
the writer should own the stream; this would work when writing to a file, but not to an arbitrary stream
the writer could own its own memory stream so it can write to streams too; okay, but then there are memory concerns
I guess you get the point...
To me, the only viable thing would be to document that aspect but I may have missed something, hence the question.
Question:
How can a file format writer write to a stream, yet defend itself against possible changes to the stream's position?
My suggestion would be to keep an internal position field in the WavWriter. Each time you do some operation you can check that this matches the position in the backing stream and throw an exception if it does not. Update this value at the end of each write operation.
Ideally you should also handle streams that do not support seeking, but it does not sound like your design would permit that anyway. It might be a good idea to check CanSeek in the constructor and throw if seeking is not supported. It is in general a good idea to validate any arguments before usage.
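A minimal sketch of that position-guard idea (the WavWriter shape and method names here are assumptions; only the position bookkeeping is the point):

using System;
using System.IO;

public sealed class WavWriter : IDisposable
{
    private readonly Stream _stream;
    private long _expectedPosition;

    public WavWriter(Stream stream)
    {
        if (!stream.CanSeek)
            throw new ArgumentException("WavWriter requires a seekable stream.", nameof(stream));
        _stream = stream;
        _expectedPosition = stream.Position;
    }

    public void WriteSamples(byte[] samples)
    {
        // Detect a caller that moved the stream behind our back.
        if (_stream.Position != _expectedPosition)
            throw new InvalidOperationException("Stream position was changed externally.");

        _stream.Write(samples, 0, samples.Length);
        _expectedPosition = _stream.Position; // update after each write
    }

    public void Dispose()
    {
        // A real writer would seek back here and patch the header chunk size.
    }
}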
FileStream.Read() returns the number of bytes read, but... is there any situation, other than having reached the end of the file, in which it will read fewer bytes than requested and not throw an exception?
the documentation says:
The Read method returns zero only after reaching the end of the stream. Otherwise, Read always reads at least one byte from the stream before returning. If no data is available from the stream upon a call to Read, the method will block until at least one byte of data can be returned. An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.
But this doesn't quite explain in what situations data would be unavailable and cause the method to block until it can read again. I mean, shouldn't most situations where data is unavailable force an exception?
What are real situations where comparing the number of bytes read against the number of expected bytes could differ (assuming that we're already checking for end of file when we mention number of bytes expected)?
EDIT: A bit more information. The reason I'm asking is that I've come across a bit of code where the developer pretty much did something like this:
bytesExpected = (remainingBytesInFile > 94208) ? 94208 : remainingBytesInFile;
bytesRead = 0;
while (bytesRead < bytesExpected)
{
    bytesRead += fileStream.Read(buffer, bytesRead, bytesExpected - bytesRead);
}
Now, I can't see any advantage to having this while loop at all. I'd expect it to throw an exception if it can't read the number of bytes expected (bearing in mind it's already taking into account that there are that many bytes left to read).
What reason could one possibly have for something like this? I'm sure I'm missing something.
The documentation is for Stream.Read, from which FileStream is derived. Since FileStream is a stream, it should obey the stream contract. Not all streams do, but unless you have a very good reason, you should stick to that.
In a typical file stream, you'll only get a return value smaller than count when you reach the end of file (and it's a pretty simple way of checking for the end of file).
However, in a NetworkStream, for example, you keep reading in a loop until the method returns zero - signalling the end of stream. The same works for file streams - you know you're at the end of the file when Read returns zero.
Most importantly, FileStream isn't just for what you'd consider files - it's also for pseudo-files like standard input/output pipes and COM ports, for example (try opening a file stream on PRN, for example). In that case, you're not reading a file with a fixed length, and the behaviour is the same as with NetworkStream.
Finally, don't forget that FileStream isn't sealed. It's perfectly fine for you to implement a virtualized file system, for example - and it's perfectly fine if your virtualized file system doesn't support seeking, or checking the length of file.
EDIT:
To address your edit: this is exactly how you're supposed to read any stream. Nothing wrong with it. If there's nothing else to read in a stream, the Read method will simply return 0, and you know the stream is over. The only thing is, it seems the developer is trying to fill the buffer completely, one buffer at a time; this only makes sense if you explicitly need to partition the file into 94208-byte chunks and pass each byte[] somewhere for further processing.
If that's not the case, you don't really need to fill the whole buffer; you just keep reading (and probably writing on some other side) until Read returns 0. Indeed, by default, FileStream will always fill the whole buffer unless it's built around a pipe handle; but since that's a possibility, you shouldn't rely on the "real file" behaviour. So as long as you need those byte[] chunks for something non-stream (e.g. parsing messages), this is entirely fine. If you're only using the stream as an actual stream, streaming the data somewhere else, the inner loop has no point; you only need one while loop to read the file.
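For the plain streaming case, a minimal sketch (the file names are placeholders): no inner loop to fill the buffer, just read until Read returns 0.

using System.IO;

class CopyDemo
{
    static void Main()
    {
        using (var input = File.OpenRead("input.bin"))
        using (var output = File.Create("output.bin"))
        {
            var buffer = new byte[94208];
            int bytesRead;
            // Read returns 0 only at end of stream; partial reads are fine here.
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, bytesRead);
            }
        }
    }
}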
Your expectations would only apply when the stream is reading data from a zero-latency source. Other I/O sources can be slow, which is why the Read method might not always be able to return immediately. That doesn't mean there is an error (so no exception), just that it has to wait for data to arrive.
Examples: network stream, file stream on slow disk, etc.
(UPDATE, HDD example) To give an example specific to files (since your case is FileStream, although Read is defined on Stream and so all implementations should fulfill the requirements): mechanical hard drives go to "sleep" when not active (especially on battery-powered devices, i.e. laptops). Spinning up can take a second or so. That is not an IOException, but your read would have to wait a second before any data is read.
The simple answer is that on a FileStream it probably never happens.
However, keep in mind that the Read method is inherited from Stream, which serves as the base for many other streams, like NetworkStream, and in that case you may not be able to read as many bytes as you requested simply because they haven't been received from the network yet.
So, like the documentation says, it all depends on the implementation of the specific type of stream: FileStream, NetworkStream, etc.
I am reading and writing to the same MemoryStream.
Something like this (there may be compilation mistakes):
MemoryStream stream = new MemoryStream();
byte[] data = Encoding.ASCII.GetBytes("1234");
stream.Write(data, 0, 4);
stream.Position -= 4;
byte[] buffer = new byte[4];
stream.Read(buffer, 0, 4);
Why do I HAVE to move Position? Why isn't the position separate for reading and writing?
Is there any other Stream that can be used?
Because that's how streams are supposed to work. You have one position, see it as a cursor, set at a point in the stream at which you can read or write. Reading and writing both advance this position.
If you're merely using a MemoryStream to exchange data between callers, as a pseudo IPC mechanism, then perhaps some better way exists to do so.
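A tiny demo of that single cursor (hedged; it just prints Position to show that both operations advance it):

using System;
using System.IO;
using System.Text;

class CursorDemo
{
    static void Main()
    {
        var stream = new MemoryStream();
        byte[] data = Encoding.ASCII.GetBytes("1234");

        stream.Write(data, 0, data.Length);
        Console.WriteLine(stream.Position); // 4: writing advanced the cursor

        stream.Position = 0;                // rewind explicitly
        var buffer = new byte[4];
        stream.Read(buffer, 0, buffer.Length);
        Console.WriteLine(stream.Position); // 4 again: reading advanced it too
    }
}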
I am making an application in C#. In that application I have one byte array and I want to write that byte array's data at a particular position in a file.
Here is the logic I used:
using(StreamWriter writer=new StreamWriter(#"D:\"+ FileName + ".txt",true))
{
writer.WriteLine(Encoding.ASCII.GetString(Data),IndexInFile,Data.Length);
}
But whenever I write data to the file, it starts writing from the beginning.
My requirement is this: suppose initially I have an empty file and I want to start writing to the file from position 10000. Please help me. Thanks in advance.
Never try to write binary data as strings, like you do. It won't work correctly. Write binary data as binary data. You can use Stream for that instead of StreamWriter.
using (Stream stream = new FileStream(fileName, FileMode.OpenOrCreate))
{
stream.Seek(1000, SeekOrigin.Begin);
stream.Write(Data, 0, Data.Length);
}
You can set the position inside the stream like this:
writer.BaseStream.Seek( 1000, SeekOrigin.Begin);
Add this before your WriteLine code.
Note that I have not included any code to check that there is at least 1000 chars inside the file to start with.
The WriteLine overload you are using is this one:
TextWriter.WriteLine Method (Char[], Int32, Int32)
In particular, the IndexInFile argument you supplied is actually the index in the buffer from which to begin reading, not the index in the file at which to begin writing; this explains why you are writing at the start of the file and not at IndexInFile as you expect.
You should obtain access to the underlying Stream and Seek to the desired position in the file first and then write to the file.
Maybe a bit hacky, but you can access the BaseStream property of the StreamWriter and use Seek(long offset, SeekOrigin origin) on it. (But be warned, as not every stream can use Seek.)
I am working with iTextSharp, and need to generate hundreds of thousands of RTF documents - the resulting files are between 5KB and 500KB.
I am listing two approaches below. The original approach wasn't necessarily slow, but I figured: why write to and read back from a file just to get the output string I need? I saw the other approach using MemoryStream, but it actually slowed things down. I essentially just need the outputted RTF content so that I can run some filters on the RTF to clean up unnecessary formatting. The queries bringing back the data are very quick, seemingly instant. Generating 1000 files (actually 2000 files are created in the process) takes about 15 minutes with the original approach; the same takes about 25-30 minutes with the second approach. The resulting files I've run average around 80KB.
Is there something wrong with the second approach? Seems like it should be faster than the first one, not slower.
Original approach:
RtfWriter2.GetInstance(doc, new FileStream(RTFFilePathName, FileMode.Create));
doc.Open();
//Add Tables and stuff here
doc.Close(); //It saves a file here to (RTFPathFileName)
StreamReader srRTF = new StreamReader(RTFFilePathName);
string rtfText = srRTF.ReadToEnd();
srRTF.Close();
//Do additional things with rtfText before writing to my final file
New approach, trying to speed it up but this is actually half as fast:
MemoryStream stream = new MemoryStream();
RtfWriter2.GetInstance(doc, stream);
doc.Open();
//Add Tables and stuff here
doc.Close();
string rtfText =
ASCIIEncoding.ASCII.GetString(stream.GetBuffer());
stream.Close();
//Do additional things with rtfText before writing to my final file
The second approach I am trying I found here:
iTextSharp - How to generate a RTF document in the ClipBoard instead of a file
How big is your resulting stream? MemoryStream performs a lot of memory-copy operations while growing, so for large results it may take significantly longer to write data in small chunks compared with FileStream.
To verify whether this is the problem, set the initial size of the MemoryStream to some large value around the resulting size and re-run the code.
To fix it, you can pre-grow the memory stream initially (if you know the approximate output size) or write your own stream that uses a different growth scheme. Also, using a temporary file might be good enough for your purposes as-is.
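A hedged sketch of the pre-grow idea applied to your second approach (512 KB is an assumed upper bound on the output; Encoding comes from System.Text):

using (var stream = new MemoryStream(512 * 1024)) // pre-sized, so no repeated re-allocation
{
    RtfWriter2.GetInstance(doc, stream);
    doc.Open();
    // Add tables and stuff here
    doc.Close();

    // ToArray() copies only the bytes actually written (unlike GetBuffer()).
    string rtfText = Encoding.ASCII.GetString(stream.ToArray());
}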
Like Alexei said, it's probably caused by the fact that you are creating a MemoryStream every time, and every time it continuously re-allocates memory as it grows. Try creating only one stream and resetting it to the beginning before every write.
Also, stream.GetBuffer() returns the whole allocated internal buffer, not just the bytes written, so try reading your MemoryStream with a StreamReader instead.
And it seems your code can easily be parallelised, so you could try running it using the Parallel Extensions or the ThreadPool (see the sketch after this answer).
It also seems a little weird that you are writing your text as bytes into a stream and then reading those bytes back and converting them to text. Wouldn't it be possible to save your document directly as text?
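A hedged sketch of that parallelisation idea (documents and GenerateRtf are hypothetical placeholders for your per-file data and the generation code above):

using System.Threading.Tasks;

Parallel.ForEach(documents, docData =>
{
    // Each iteration builds its own Document and MemoryStream;
    // iTextSharp objects are not thread-safe, so nothing is shared.
    string rtfText = GenerateRtf(docData);
    // run the RTF filters and write the final file here
});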
A MemoryStream is not associated with a file, and has no concept of a filename. Basically, you can't do that.
You certainly can't cast between them; you can only cast upwards and downwards, not sideways; to visualise:
           Stream
             |
      ---------------
      |             |
  FileStream   MemoryStream
You can cast a MemoryStream to a Stream trivially, and a Stream to a MemoryStream via a type-check; but never a FileStream to a MemoryStream. That is like saying a dog is an animal, and an elephant is an animal, so we can cast a dog to an elephant.
You could subclass MemoryStream and add a Name property (that you supply a value for), but there would still be no commonality between a FileStream and a YourCustomMemoryStream, and FileStream doesn't implement a pre-existing interface to get a Name; so the caller would have to explicitly handle both separately, or use duck-typing (maybe via dynamic or reflection).
Another option (perhaps easier) might be: write your data to a temporary file; use a FileStream from there; then (later) delete the file.
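A minimal sketch of that temp-file fallback (hedged; the actual writing is left as a comment):

using System.IO;

string tempPath = Path.GetTempFileName();
try
{
    using (var fs = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite))
    {
        // write and read the data here; unlike a MemoryStream,
        // fs.Name now gives you a real file name to hand around.
    }
}
finally
{
    File.Delete(tempPath); // clean up when done
}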
I know this is old but there is a lot of misinformation in this thread.
It's all about buffer size. The internal buffers are significantly smaller with a memory stream than with a file stream. Smaller buffers cause more reads/writes.
Just initialize your memory stream with either a file stream or a byte array with a size of around 80K. Close the doc, set the stream position to 0, and read the contents to the end.
On a side note, GetBuffer() will return the whole allocated buffer. So if you only wrote 1 byte and the buffer is 4K, you will have a lot of garbage in your string.
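A hedged one-liner to avoid that trailing garbage, honouring the stream's actual Length:

string rtfText = Encoding.ASCII.GetString(stream.GetBuffer(), 0, (int)stream.Length);
// or, at the cost of an extra copy: Encoding.ASCII.GetString(stream.ToArray())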