Multiple FileStreams on the same file cause writes to be ignored - C#

I have a process which has two FileStream objects that operate on the same file.
Both objects open the file using the same call:
file = new FileStream(fullPath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
Then I write some bytes to both using their Write methods, but only the last Write call is committed while the other is ignored. Write is called using the following code:
fh.file.Write(buffer, 0, count);
buffer contains "fd" in both calls and count is equal to 2.
I call Close() on both objects after that. After the program terminates, the output file only has one of the two "fd"s that should have been written. Why is that happening? I tried calling Flush() on both objects, but it doesn't make a difference.
Note: calls to Write() are done by the same thread.
The final execution order is like this:
open_obj1()
open_obj2()
write_obj1("fd")
write_obj2("fd")
close_obj1()
close_obj2()
It seems like a simple problem, but I can't see where the issue is. Do both FileStreams read the file pointer at the same place and then try to write to the same place, because they both seek to the end of the file when opened? If so, what's the solution if I want to keep this exact execution order?

See Stream.Position; this property is not shared between your two streams. Each stream seeks to the end of the file once, when it is opened, so both writes start at the same offset and the second write overwrites the first, not unlike switching your text input to overwrite mode, moving the caret, and typing in new text. Similarly, if you were to write a longer string followed by a shorter one, you would observe the leftover text of the longer string.
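If you must keep two separate streams and that exact call order, one workaround is to re-seek to the current end of the file and flush around each write. Below is a minimal sketch, assuming hypothetical fh1 and fh2 wrappers like the fh in the question; FileMode.Append only positions a stream at the end of the file once, at open time:
fh1.file.Seek(0, SeekOrigin.End); // jump to the real current end of the file
fh1.file.Write(buffer, 0, count);
fh1.file.Flush();                 // push the bytes to the OS so the other handle sees them
fh2.file.Seek(0, SeekOrigin.End); // now lands after the first "fd"
fh2.file.Write(buffer, 0, count);
fh2.file.Flush();
Alternatively, sharing a single FileStream between both writers avoids the duplicated file pointer entirely.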

Related

How to optimize "real-time" C# write-to-file & MATLAB read-from-file operation

I am trying to find a good method to write data from a NetworkStream (via C#) to a text file while "quasi-simultaneously" reading the newly written data from the text file into Matlab.
Basically, is there a good method or technique for coordinating write/read operations (from separate programs) such that a read operation does not block a write operation (and vice-versa) and the lag between successive write/reads is minimized?
Currently I am just writing (appending) data from the network stream to a text file via a WriteLine loop, and reading the data by looping Matlab's fscanf function which also marks the last element read and repositions the file-pointer to that spot.
Relevant portions of C# code:
(Note: The loop conditions I'm using are arbitrary, I'm just trying to see what works right now.)
NetworkStream network_stream = tcp_client.GetStream();
string path = @"C:\Matlab\serial_data.txt";
FileInfo file_info = new FileInfo(path);
using (StreamWriter writer = file_info.CreateText())
{
    string foo = "";
    writer.WriteLine(foo);
}
using (StreamWriter writer = File.AppendText(path))
{
    byte[] buffer = new byte[1];
    int maxlines = 100000;
    int lines = 0;
    while (lines <= maxlines)
    {
        network_stream.Read(buffer, 0, buffer.Length);
        byte byte2string = buffer[0];
        writer.WriteLine(byte2string);
        lines++;
    }
}
Relevant Matlab Code:
i = 0;
while i < 100
    a = fopen('serial_data.txt');
    b = fscanf(a, '%g', [1000 1]);
    fclose(a);
    i = i + 1;
end
When I look at the data read into Matlab there are large stretches of zeros in between the actual data, and the most disconcerting part is that number of consecutive data-points read between these "false zero" stretches varies drastically.
I was thinking about trying to insert some delays (Thread.Sleep in C# and wait(timerObject) in Matlab), but even then, I don't feel confident that will guarantee I always obtain the data received over the network stream, which is imperative.
Any advice/suggestions would be greatly appreciated.
Looks like there's an issue with how fscanf is being used in the reader on the Matlab side.
The reader code looks like it's going to reread the entire file each time through the loop, because it's re-opening it on each pass through the loop. Is this intentional? If you want to track the end of a file, you probably want to keep the file handle open, and just keep checking to see if you can read further data from it with repeated fscanf calls on the same open filehandle.
Also, that fscanf call looks like it might always return a zero-padded 1000-element array, regardless of how large the file it read was. Maybe that's where your "false zeros" are coming from. How many there are would vary with how much data is actually in the file and how often the Matlab code read it between writes. Grab the second argout of fscanf to see how many elements it actually read.
[b,nRead] = fscanf(a, '%g', [1000 1]);
fprintf('Read %d numbers\n', nRead);
b = b(1:nRead);
Check the doc page for fscanf. In the "Output Arguments" section: "If the input contains fewer than sizeA elements, MATLAB® pads A with zeros."
And then you may want to look at this question: How can I do an atomic write/append in C#, or how do I get files opened with the FILE_APPEND_DATA flag?. Keeping the writes shorter than the output stream's buffer (like they are now) will make them atomic, and flushing after each write will make them visible to the reader in a timely manner.
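For the writing side, here is a minimal sketch of that advice, reusing network_stream, path, and the loop variables from the question. File.AppendText opens the file with FileShare.Read, so the Matlab reader can open it concurrently; AutoFlush flushes after every WriteLine, and checking Read's return value avoids logging stale buffer contents:
using (StreamWriter writer = File.AppendText(path))
{
    writer.AutoFlush = true; // flush after every WriteLine so the reader sees new data promptly
    byte[] buffer = new byte[1];
    int maxlines = 100000;
    int lines = 0;
    while (lines <= maxlines)
    {
        int n = network_stream.Read(buffer, 0, buffer.Length);
        if (n == 0) break; // remote end closed the connection
        writer.WriteLine(buffer[0]);
        lines++;
    }
}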

MemoryStream.WriteTo(Stream destinationStream) versus Stream.CopyTo(Stream destinationStream)

Which one is better: MemoryStream.WriteTo(Stream destinationStream) or Stream.CopyTo(Stream destinationStream)?
I am talking about the comparison of these two methods without an extra buffer, as I am doing it like this:
Stream str = File.Open("SomeFile.file", FileMode.Open);
MemoryStream mstr = new MemoryStream(File.ReadAllBytes("SomeFile.file"));
using (var Ms = File.Create("NewFile.file", 8 * 1024))
{
    str.CopyTo(Ms); // or mstr.WriteTo(Ms); which one will be better?
}
Update
Here is what I want to do:
Open a file [say an "X"-type file]
Parse the contents
From here I get a bunch of new streams [3~4 files]
Parse one stream
Extract thousands of files [the stream is an image file]
Save the other streams to files
Edit all the files
Generate a new "X"-type file
I have written all of the code and it works correctly, but now I am optimizing it to be as efficient as possible.
It is an historical accident that there are two ways to do the same thing. MemoryStream always had the WriteTo() method, Stream didn't acquire the CopyTo() method until .NET 4.
The MemoryStream.WriteTo() version looks like this:
public virtual void WriteTo(Stream stream)
{
    // Exception throwing code elided...
    stream.Write(this._buffer, this._origin, this._length - this._origin);
}
The Stream.CopyTo() implementation looks like this:
private void InternalCopyTo(Stream destination, int bufferSize)
{
    int num;
    byte[] buffer = new byte[bufferSize];
    while ((num = this.Read(buffer, 0, buffer.Length)) != 0)
    {
        destination.Write(buffer, 0, num);
    }
}
Stream.CopyTo() is more universal; it works for any stream, and it helps programmers who fumble copying data from, say, a NetworkStream: forgetting to pay attention to the return value from Read() was a very common bug. But it of course copies the bytes twice and allocates that temporary buffer; MemoryStream doesn't need one since it can write directly from its own buffer. So you'd still prefer WriteTo(), although noticing the difference isn't very likely.
MemoryStream.WriteTo: Writes the entire contents of this memory stream to another stream.
Stream.CopyTo: Reads the bytes from the current stream and writes them to the destination stream. Copying begins at the current position in the current stream.
You'll need to seek back to 0, to get the whole source stream copied.
So I think MemoryStream.WriteTo is the better option for this situation.
If you use Stream.CopyTo, you don't need to read all the bytes into memory to start with. However:
This code would be simpler if you just used File.Copy
If you are going to load all the data into memory, you can just use:
byte[] data = File.ReadAllBytes("input");
File.WriteAllBytes("output", data);
You should have a using statement for the input as well as the output stream
If you really need processing so can't use File.Copy, using Stream.CopyTo will cope with larger files than loading everything into memory. You may not need that, of course, or you may need to load the whole file into memory for other reasons.
If you have got a MemoryStream, I'd probably use MemoryStream.WriteTo rather than Stream.CopyTo, but it probably won't make much difference which you use, except that you need to make sure you're at the start of the stream when using CopyTo.
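To illustrate with the question's variables, a sketch of the rewind before CopyTo:
str.Position = 0; // CopyTo copies from the current position onward,
str.CopyTo(Ms);   // so rewind first to copy the whole stream
// mstr.WriteTo(Ms) writes the full MemoryStream contents regardless of mstr.Position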
I think Hans Passant's claim of a bug in MemoryStream.WriteTo() is wrong; it does not "ignore the return value of Write()". Stream.Write() returns void, which implies to me that the entire count bytes are written, which implies that Stream.Write() will block as necessary to complete the operation to, e.g., a NetworkStream, or throw if it ultimately fails.
That is indeed different from the write() system call in *nix, and its many emulations in libc and so forth, which can return a "short write". I suspect Hans leaped to the conclusion that Stream.Write() followed that, which I would have expected, too, but apparently it does not.
It is conceivable that Stream.Write() could perform a "short write", without returning any indication of that, requiring the caller to check that the Position property of the Stream has actually been advanced by count. That would be a very error-prone API, and I doubt that it does that, but I have not thoroughly tested it. (Testing it would be a bit tricky: I think you would need to hook up a TCP NetworkStream with a reader on the other end that blocked forever, and write enough to fill up the wire buffers. Or something like that...)
The comments for Stream.Write() are not quite unambiguous:
Summary:
When overridden in a derived class, writes a sequence of bytes to the current stream and advances the current position within this stream by the number of bytes written.
Parameters:
buffer: An array of bytes. This method copies count bytes from buffer to the current stream.
Compare that to the Linux man page for write(2):
write() writes up to count bytes from the buffer pointed buf to the file referred to by the file descriptor fd.
Note the crucial "up to". That sentence is followed by explanation of some of the conditions under which a "short write" might occur, making it very explicit that it can occur.
This is really a critical issue: we need to know how Stream.Write() behaves, beyond all doubt.
The CopyTo method creates a buffer, populates it with data from the original stream, and then calls the Write method passing the created buffer as a parameter. The WriteTo method uses the MemoryStream's internal buffer to write. That is the difference. Which is better is up to you: decide which method you prefer.
Creating a MemoryStream from a HttpInputStream in Vb.Net:
Dim filename As String = MyFile.PostedFile.FileName
Dim fileData As Byte() = Nothing
Using binaryReader = New BinaryReader(MyFile.PostedFile.InputStream)
    binaryReader.BaseStream.Position = 0
    fileData = binaryReader.ReadBytes(MyFile.PostedFile.ContentLength)
End Using
Dim memoryStream As MemoryStream = New MemoryStream(fileData)
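On .NET 4 and later the intermediate byte array can be skipped entirely; a C# sketch of the same idea using CopyTo, assuming an uploaded file exposed as MyFile.PostedFile as in the VB code above:
var memoryStream = new MemoryStream();
MyFile.PostedFile.InputStream.Position = 0;         // rewind the upload stream if it is seekable
MyFile.PostedFile.InputStream.CopyTo(memoryStream); // no intermediate byte[] needed
memoryStream.Position = 0;                          // rewind before handing it to a reader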

One time gives an exception, another time doesn't

I use multithreading to copy a file to another place. From each stream I read the byte array I need and then dispose of the stream. In this example I use 7 threads to copy a 3 GB file. The first thread can get its byte array, but on the second thread a System.OutOfMemoryException occurs.
public void Begin()
{
    FileStream stream = new FileStream(pathToFile, FileMode.Open);
    stream.Position = (threNmb - 1) * 536870912;
    BinaryReader reader = new BinaryReader(stream);
    for (long i = 0; i < (length); i++)
    {
        source.Add(reader.ReadByte()); // gives exception at i=134217728
    }
    reader.Dispose();
    reader.Close();
    stream.Dispose();
    stream.Close();
}
It looks like you're using a List<byte>. That's going to be a very inefficient way of copying data, and you're probably making it less efficient by using multiple threads. Additionally, if you're using a single list from multiple threads, your code is already broken - List<T> isn't thread-safe. Even if it were, you'd be mixing the data from the different threads, so you wouldn't be able to reassemble the original data. Oh, and by not using using statements, if an exception is thrown you're leaving file handles open. In other words, I'm advising you to completely abandon your current approach.
Instead, copying a file from one place to another (assuming you can't use File.Copy for some reason), should basically be a case of:
Open stream to read
Open stream to write
Allocate buffer (e.g. 32K)
Repeatedly:
Read from the "input" stream into the buffer, noting how much data the call actually read
Write the data you've just read into the output stream
Loop until your "Read" indicates the end of the input stream
Close both streams (with using statements)
There's no need to have everything in memory. Note that in .NET 4 this is made even easier with the Stream.CopyTo method, which does the third and fourth steps for you.
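A minimal sketch of those steps, with hypothetical sourcePath and destinationPath:
string sourcePath = "input.dat";       // hypothetical paths
string destinationPath = "output.dat";
using (FileStream input = File.OpenRead(sourcePath))
using (FileStream output = File.Create(destinationPath))
{
    byte[] buffer = new byte[32 * 1024]; // 32K buffer
    int bytesRead;
    // Read returns 0 at end of stream; write only as many bytes as were read
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
    // On .NET 4 the loop above collapses to: input.CopyTo(output);
}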

Writing at the end of file

I'm working on a system that requires high file I/O performance (with C#).
Basically, I'm filling up large files (~100MB) from the start of the file until the end of the file.
Every ~5 seconds I'm adding ~5MB to the file (sequentially from the start of the file), and after every bulk write I flush the stream.
Every few minutes I need to update a structure which I write at the end of the file (some kind of metadata).
When flushing each one of the bulks I have no performance issue.
However, when updating the metadata at the end of the file I get really low performance.
My guess is that when the file is created (which also needs to happen extra fast), it doesn't really allocate the entire 100MB on disk, and when I flush the metadata it must allocate all the space up to the end of the file.
Guys/girls, any idea how I can overcome this problem?
Thanks a lot!
From comment:
In general speaking the code is as follows, first the file is opened:
m_Stream = new FileStream(filename,
                          FileMode.CreateNew,
                          FileAccess.Write,
                          FileShare.Write, 8192, false);
m_Stream.SetLength(100*1024*1024);
Every few seconds I'm writing ~5MB.
m_Stream.Seek(m_LastPosition, SeekOrigin.Begin);
m_Stream.Write(buffer, 0, buffer.Length);
m_Stream.Flush();
m_LastPosition += buffer.Length; // HH: guessed the +=
m_Stream.Seek(m_MetaDataSize, SeekOrigin.End);
m_Stream.Write(metadata, 0, metadata.Length);
m_Stream.Flush(); // Takes too long the first time (~1 sec).
As suggested above, would it not make sense (assuming you must have the metadata at the end of the file) to write that first?
That would do two things (assuming a non-sparse file)...
1. allocate the total space for the entire file
2. make any following write operations a little faster, as the space is ready and waiting
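A sketch of that idea with the question's stream; the assumption here is that the metadata occupies the last metadata.Length bytes of the file, and that writing a placeholder there right after SetLength moves the one-time zero-fill cost to startup:
m_Stream.SetLength(100 * 1024 * 1024);
m_Stream.Seek(-metadata.Length, SeekOrigin.End); // position at the metadata slot at the end
m_Stream.Write(metadata, 0, metadata.Length);    // placeholder; forces NTFS to zero-fill up to here
m_Stream.Flush();                                // the ~1 sec cost is paid once, at startup
m_Stream.Seek(0, SeekOrigin.Begin);              // back to the start for the bulk writes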
Can you not do this asynchronously?
At least the application can then move on to other things.
Have you tried the AppendAllText method?
Your question isn't totally clear, but my guess is you create a file, write 5MB, then seek to 100MB and write the metadata, then seek back to 5MB and write another 5MB and so on.
If that is the case, this is a filesystem problem. When you extend the file, NTFS has to fill the gap in with something. As you say, the file is not allocated until you write to it. The first time you write the metadata the file is only 5MB long, so when you write the metadata NTFS has to allocate and write out 95MB of zeros before it writes the metadata. Upsettingly I think it also does this synchronously, so you don't even win using overlapped IO.
How about using the BufferedStream?
http://msdn.microsoft.com/en-us/library/system.io.bufferedstream(v=VS.100).aspx

How to read a file

I want to write a function such that if I call it with an argument of 100 and the path of a particular file, it gets the first 100 KB of data from the file, and when I call it the second time with 200 it should return the next 200 KB of data, skipping the first 100. If there is nothing left in the file it should return 0.
Thanks
Most of what you want is handled by System.IO.File and FileStream, if you want that exact function signature.
Step 1) you need to open a file with a particular path. System.IO.File has a few methods for doing this, including Open and OpenRead, as well as the ReadAllXXXX family, allowing you to access the file contents in multiple ways. The one you'd probably want is OpenRead, which returns a FileStream object.
Step 2) you need to read a certain number of bytes. Once you have the FileStream from step 1, look at the Stream.Read method: given an array of bytes, it will read up to a specified number of bytes from the stream into the array.
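A sketch of such a function wrapped in a class (a hypothetical ChunkReader; the FileStream stays open between calls, so each read continues where the previous one stopped):
using System;
using System.IO;

public class ChunkReader : IDisposable
{
    private readonly FileStream stream;

    public ChunkReader(string path)
    {
        stream = File.OpenRead(path); // step 1: open the file for reading
    }

    // Step 2: return the next 'kilobytes' KB of the file, or an empty
    // array once the end of the file has been reached.
    public byte[] ReadNext(int kilobytes)
    {
        byte[] buffer = new byte[kilobytes * 1024];
        int read = stream.Read(buffer, 0, buffer.Length);
        if (read == 0) return new byte[0]; // nothing left in the file
        if (read < buffer.Length)
            Array.Resize(ref buffer, read); // trim a short final chunk
        return buffer;
    }

    public void Dispose()
    {
        stream.Dispose();
    }
}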
You may look at the StreamReader class. It may get you to where you want to go, though I'm not sure how specifically to break it down into the KB chunks you want.
