I am using multiple threads to copy a file to another location. Each thread reads its portion of the file stream into a byte array and then disposes of the stream. In this example I use 7 threads to copy a 3 GB file. The first thread gets its byte array fine, but the second thread throws a System.OutOfMemoryException:
public void Begin()
{
    FileStream stream = new FileStream(pathToFile, FileMode.Open);
    stream.Position = (threNmb - 1) * 536870912; // 512 MB slice per thread
    BinaryReader reader = new BinaryReader(stream);
    for (long i = 0; i < length; i++)
    {
        source.Add(reader.ReadByte()); // gives exception at i = 134217728
    }
    reader.Dispose();
    reader.Close();
    stream.Dispose();
    stream.Close();
}
It looks like you're using a List<byte>. That's going to be a very inefficient way of copying data, and you're probably making it less efficient by using multiple threads. Additionally, if you're using a single list from multiple threads, your code is already broken - List<T> isn't thread-safe. Even if it were, you'd be mixing the data from the different threads, so you wouldn't be able to reassemble the original data. Oh, and by not using using statements, if an exception is thrown you're leaving file handles open. In other words, I'm advising you to completely abandon your current approach.
Instead, copying a file from one place to another (assuming you can't use File.Copy for some reason) should basically be a case of:
Open stream to read
Open stream to write
Allocate buffer (e.g. 32K)
Repeatedly:
Read from the "input" stream into the buffer, noting how much data the call actually read
Write the data you've just read into the output stream
Loop until your "Read" indicates the end of the input stream
Close both streams (with using statements)
There's no need to have everything in memory. Note that in .NET 4 this is made even easier with the Stream.CopyTo method, which does the third and fourth steps for you.
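As a minimal sketch of that loop (the method name, paths, and the 32K buffer size are illustrative):

public static void CopyFile(string sourcePath, string destinationPath)
{
    byte[] buffer = new byte[32 * 1024]; // 32K buffer, as suggested above
    using (Stream input = File.OpenRead(sourcePath))
    using (Stream output = File.Create(destinationPath))
    {
        int bytesRead;
        // Read returns 0 only at the end of the stream.
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, bytesRead);
        }
    }
}

On .NET 4 or later, the buffer-and-loop part collapses to input.CopyTo(output).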
I am wondering whether it's good practice to use EndOfStreamException to detect the end of a BinaryReader stream. I don't want to use the BaseStream.Length property or PeekChar as proposed here:
C# checking for binary reader end of file
because that would require loading the stream into memory (maybe because it's from a Zip file) and enabling some flags. Instead, this is what I'm doing:
using (ZipArchive zipArchive = ZipFile.OpenRead(Filename))
using (BinaryReader fStream = new BinaryReader(zipArchive.Entries[0].Open()))
{
    while (true)
    {
        try
        {
            fStream.ReadInt32();
        }
        catch (EndOfStreamException)
        {
            Log.Debug("End of Binary Stream");
            break;
        }
    }
}
That approach is fine. If you know you have a seekable stream, you can compare its length to the number of bytes read; note that FileStream.Length does not load the whole stream into memory. The exception-based approach, though, works for arbitrary streams, seekable or not.
And don't worry about the cost of using exceptions in this case: streams imply IO, and IO is orders of magnitude slower than exception handling.
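For the seekable case, a minimal sketch (assuming a plain file rather than the zip entry from the question):

using (FileStream stream = File.OpenRead(Filename))
using (BinaryReader reader = new BinaryReader(stream))
{
    // FileStream.Length is cheap; it does not load the stream into memory.
    while (stream.Position < stream.Length)
    {
        int value = reader.ReadInt32();
        // process value...
    }
}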
I would argue that 'best practice' is to have the number of values known, for example by prefixing the stream with the number of values. This should allow you to write it like
var length = fStream.ReadInt32();
for (var i = 0; i < length - 1; i++)
{
    fStream.ReadInt32(); // Skip all values except the last
}
return fStream.ReadInt32(); // Last value
First of all, this reduces the need for exception handling: if you reach the end of the stream before the last item, you know the stream was saved incorrectly and have a chance of handling it, instead of just returning the last available value. I also find it helpful to have as few exceptions as possible, so you can run your debugger with "break when thrown" and have some confidence that thrown exceptions indicate actual problems. It can also allow you to save your values as part of some other data.
If you cannot change the input format you can still get the uncompressed length of the entry from ZipArchiveEntry.Length. Just divide by sizeof(int) to get the number of values.
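If the entry really is just a packed sequence of Int32 values, a sketch of that idea (reusing the question's names):

using (ZipArchive zipArchive = ZipFile.OpenRead(Filename))
{
    ZipArchiveEntry entry = zipArchive.Entries[0];
    long count = entry.Length / sizeof(int); // Length is the uncompressed size
    using (BinaryReader fStream = new BinaryReader(entry.Open()))
    {
        for (long i = 0; i < count; i++)
        {
            int value = fStream.ReadInt32();
            // process value...
        }
    }
}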
In most cases I would also argue for using a serialization library to save data. This tends to make it much easier to change the format of the data in the future.
This is a follow up question to this question:
Difference between file path and file stream?
I didn't fully understand everything answered in the linked question.
I am using the Microsoft.SqlServer.Dac.BacPackage which contains a Load method with 2 overloads - one that receives a string path and one that receives a Stream.
This is the documentation of the Load method:
https://learn.microsoft.com/en-us/dotnet/api/microsoft.sqlserver.dac.bacpackage.load?view=sql-dacfx-150
What exactly is the difference between the two? Am I correct in assuming that the string-path overload loads the whole file into memory first, while the stream overload doesn't? Are there other differences?
No, the file will not usually be fully loaded all at once.
A string path parameter normally means it will just open the file as a FileStream and pass it to the other version of the function. There is no reason why the stream should fully load the file into memory unless requested.
A Stream parameter means you open the file and pass the resulting Stream. You could also pass any other type of Stream, such as a network stream, a zip or decryption stream, a memory-backed stream, anything really.
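A sketch of the two overloads side by side (the .bacpac path is illustrative; the overloads themselves are the ones on the linked documentation page):

// Path overload: the library opens the file for you.
BacPackage fromPath = BacPackage.Load(@"C:\backups\db.bacpac");

// Stream overload: you decide where the bytes come from.
using (FileStream stream = File.OpenRead(@"C:\backups\db.bacpac"))
{
    BacPackage fromStream = BacPackage.Load(stream);
}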
Short answer:
The fact that you have two methods, one that accepts a filename and one that accepts a stream is just for convenience. Internally, the one with the filename will open the file as a stream and call the other method.
Longer answer
You can consider a stream as a sequence of bytes. The reason to use a stream instead of a byte[] or List<byte> is that if the sequence is really, really large, and you don't need access to all the bytes at once, it would be a waste to put all the bytes in memory before processing them.
For instance, if you want to calculate the checksum for all bytes in a file, you don't need to put all the data in memory before you can start calculating the sum. In fact, anything that can efficiently deliver the bytes one by one would suffice.
That is the reason why people would want to read a file as a stream.
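For example, a sketch of hashing a file without holding it all in memory (SHA-256 is just one choice of checksum here):

using System.IO;
using System.Security.Cryptography;

static byte[] ComputeChecksum(string fileName)
{
    using (FileStream stream = File.OpenRead(fileName))
    using (SHA256 sha = SHA256.Create())
    {
        // ComputeHash reads the stream in chunks internally.
        return sha.ComputeHash(stream);
    }
}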
The reason why people want a stream as input for their data is that they want to give the caller the opportunity to specify the source of the data: callers can provide a stream that reads from a file, but also a stream with data from the internet, or from a database, or from a text box. The procedure does not care, as long as it can read the bytes one by one, or sometimes per chunk of bytes:
using (Stream fileStream = File.OpenRead(fileName))
{
    ProcessInputData(fileStream);
}
Or:
byte[] bytesToProcess = ...
using (Stream memoryStream = new MemoryStream(bytesToProcess))
{
    ProcessInputData(memoryStream);
}
Or:
string operatorInput = this.textBox1.Text;
// A MemoryStream wraps bytes, so convert the string to bytes first.
using (Stream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(operatorInput)))
{
    ProcessInputData(memoryStream);
}
Conclusion
Methods use streams in their interface to indicate that they don't need all the data in memory at once; one by one, or chunk by chunk, is enough. The caller is free to decide where the data comes from.
I am trying to find a good method to write data from a NetworkStream (via C#) to a text file while "quasi-simultaneously" reading the newly written data from the text file into Matlab.
Basically, is there a good method or technique for coordinating write/read operations (from separate programs) such that a read operation does not block a write operation (and vice-versa) and the lag between successive write/reads is minimized?
Currently I am just writing (appending) data from the network stream to a text file via a WriteLine loop, and reading the data by looping Matlab's fscanf function which also marks the last element read and repositions the file-pointer to that spot.
Relevant portions of C# code:
(Note: The loop conditions I'm using are arbitrary, I'm just trying to see what works right now.)
NetworkStream network_stream = tcp_client.GetStream();
string path = @"C:\Matlab\serial_data.txt";
FileInfo file_info = new FileInfo(path);
using (StreamWriter writer = file_info.CreateText())
{
    string foo = "";
    writer.WriteLine(foo);
}
using (StreamWriter writer = File.AppendText(path))
{
    byte[] buffer = new byte[1];
    int maxlines = 100000;
    int lines = 0;
    while (lines <= maxlines)
    {
        network_stream.Read(buffer, 0, buffer.Length);
        byte byte2string = buffer[0];
        writer.WriteLine(byte2string);
        lines++;
    }
}
Relevant Matlab Code:
i = 0;
while i < 100
    a = fopen('serial_data.txt');
    b = fscanf(a, '%g', [1000 1]);
    fclose(a);
    i = i + 1;
end
When I look at the data read into Matlab there are large stretches of zeros in between the actual data, and the most disconcerting part is that the number of consecutive data points read between these "false zero" stretches varies drastically.
I was thinking about trying to insert some delays (Thread.Sleep and wait(timerObject)) into the C# and Matlab code, respectively, but even then I don't feel confident that will guarantee I always obtain the data received over the network stream, which is imperative.
Any advice/suggestions would be greatly appreciated.
Looks like there's an issue with how fscanf is being used in the reader on the Matlab side.
The reader code looks like it's going to reread the entire file on each pass, because it re-opens the file every time through the loop. Is this intentional? If you want to track the end of a file, you probably want to keep the file handle open, and just keep checking to see if you can read further data from it with repeated fscanf calls on the same open filehandle.
Also, that fscanf call looks like it might always return a zero-padded 1000-element array, regardless of how large the file it read was. Maybe that's where your "false zeros" are coming from. How many there are would vary with how much data is actually in the file and how often the Matlab code read it between writes. Grab the second argout of fscanf to see how many elements it actually read.
[b,nRead] = fscanf(a, '%g', [1000 1]);
fprintf('Read %d numbers\n', nRead);
b = b(1:nRead);
Check the doc page for fscanf. In the "Output Arguments" section: "If the input contains fewer than sizeA elements, MATLAB® pads A with zeros."
And then you may want to look at this question: How can I do an atomic write/append in C#, or how do I get files opened with the FILE_APPEND_DATA flag?. Keeping the writes shorter than the output stream's buffer (like they are now) will make them atomic, and flushing after each write will make them visible to the reader in a timely manner.
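On the C# side, the flushing suggestion is a one-line change to the question's writer (sketch; AutoFlush is a standard StreamWriter property):

using (StreamWriter writer = File.AppendText(path))
{
    writer.AutoFlush = true; // flush each line to the file promptly
    // ... WriteLine loop as in the question ...
}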
Which one is better: MemoryStream.WriteTo(Stream destinationStream) or Stream.CopyTo(Stream destinationStream)?
I am comparing these two methods without passing an explicit buffer size, as I am doing here:
Stream str = File.OpenRead("SomeFile.file");
MemoryStream mstr = new MemoryStream(File.ReadAllBytes("SomeFile.file"));
using (var Ms = File.Create("NewFile.file", 8 * 1024))
{
    str.CopyTo(Ms) or mstr.WriteTo(Ms); // Which one will be better??
}
Update
Here is what I want to do:
Open a file [say, an "X"-type file]
Parse the contents
From here I get a bunch of new streams [3 ~ 4 files]
Parse one stream
Extract thousands of files [the stream is an image file]
Save the other streams to files
Edit all the files
Generate a new "X"-type file.
I have written every bit of code and it is all working correctly.
But now I am optimizing the code to make it as efficient as possible.
It is a historical accident that there are two ways to do the same thing. MemoryStream always had the WriteTo() method; Stream didn't acquire the CopyTo() method until .NET 4.
The MemoryStream.WriteTo() version looks like this:
public virtual void WriteTo(Stream stream)
{
    // Exception throwing code elided...
    stream.Write(this._buffer, this._origin, this._length - this._origin);
}
The Stream.CopyTo() implementation like this:
private void InternalCopyTo(Stream destination, int bufferSize)
{
    int num;
    byte[] buffer = new byte[bufferSize];
    while ((num = this.Read(buffer, 0, buffer.Length)) != 0)
    {
        destination.Write(buffer, 0, num);
    }
}
Stream.CopyTo() is more universal, it works for any stream. And helps programmers that fumble copying data from, say, a NetworkStream. Forgetting to pay attention to the return value from Read() was a very common bug. But it of course copies the bytes twice and allocates that temporary buffer, MemoryStream doesn't need it since it can write directly from its own buffer. So you'd still prefer WriteTo(). Noticing the difference isn't very likely.
MemoryStream.WriteTo: Writes the entire contents of this memory stream to another stream.
Stream.CopyTo: Reads the bytes from the current stream and writes them to the destination stream. Copying begins at the current position in the current stream.
You'll need to seek back to 0, to get the whole source stream copied.
So I think MemoryStream.WriteTo is the better option for this situation.
If you use Stream.CopyTo, you don't need to read all the bytes into memory to start with. However:
This code would be simpler if you just used File.Copy
If you are going to load all the data into memory, you can just use:
byte[] data = File.ReadAllBytes("input");
File.WriteAllBytes("output", data);
You should have a using statement for the input as well as the output stream
If you really need processing so can't use File.Copy, using Stream.CopyTo will cope with larger files than loading everything into memory. You may not need that, of course, or you may need to load the whole file into memory for other reasons.
If you have got a MemoryStream, I'd probably use MemoryStream.WriteTo rather than Stream.CopyTo, but it probably won't make much difference which you use, except that you need to make sure you're at the start of the stream when using CopyTo.
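Concretely, the rewind that last point refers to is just this (sketch, reusing the question's names):

mstr.Position = 0; // CopyTo copies from the current position onwards
mstr.CopyTo(Ms);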
I think Hans Passant's claim of a bug in MemoryStream.WriteTo() is wrong; it does not "ignore the return value of Write()". Stream.Write() returns void, which implies to me that the entire count bytes are written, which implies that Stream.Write() will block as necessary to complete the operation to, e.g., a NetworkStream, or throw if it ultimately fails.
That is indeed different from the write() system call in *nix, and its many emulations in libc and so forth, which can return a "short write". I suspect Hans leaped to the conclusion that Stream.Write() followed that, which I would have expected too, but apparently it does not.
It is conceivable that Stream.Write() could perform a "short write", without returning any indication of that, requiring the caller to check that the Position property of the Stream has actually been advanced by count. That would be a very error-prone API, and I doubt that it does that, but I have not thoroughly tested it. (Testing it would be a bit tricky: I think you would need to hook up a TCP NetworkStream with a reader on the other end that blocked forever, and write enough to fill up the wire buffers. Or something like that...)
The comments for Stream.Write() are not quite unambiguous:
Summary:
When overridden in a derived class, writes a sequence of bytes to the current stream and advances the current position within this stream by the number of bytes written.
Parameters: buffer:
An array of bytes. This method copies count bytes from buffer to the current stream.
Compare that to the Linux man page for write(2):
write() writes up to count bytes from the buffer pointed to by buf to the file referred to by the file descriptor fd.
Note the crucial "up to". That sentence is followed by explanation of some of the conditions under which a "short write" might occur, making it very explicit that it can occur.
This is really a critical issue: we need to know how Stream.Write() behaves, beyond all doubt.
The CopyTo method creates a buffer, populates it with data from the original stream, and then calls the Write method, passing the created buffer as a parameter. WriteTo uses the MemoryStream's internal buffer to write. That is the difference; which is better is up to you to decide.
Creating a MemoryStream from an HttpInputStream in VB.NET:
Dim filename As String = MyFile.PostedFile.FileName
Dim fileData As Byte() = Nothing
Using binaryReader = New BinaryReader(MyFile.PostedFile.InputStream)
    binaryReader.BaseStream.Position = 0
    fileData = binaryReader.ReadBytes(MyFile.PostedFile.ContentLength)
End Using
Dim memoryStream As MemoryStream = New MemoryStream(fileData)
I am working with iTextSharp, and need to generate hundreds of thousands of RTF documents - the resulting files are between 5KB and 500KB.
I am listing 2 approaches below - the original approach wasn't necessarily slow, but I figured: why write to and read back from a file just to get the output string I need? I saw this other approach using MemoryStream, but it actually slowed things down. I essentially just need the outputted RTF content, so that I can run some filters on it to clean up unnecessary formatting. The queries bringing back the data are very quick (near-instant). To generate 1000 files (actually 2000 files are created in the process), the original approach takes about 15 minutes; the second approach takes about 25-30 minutes. The resulting files average around 80KB.
Is there something wrong with the second approach? Seems like it should be faster than the first one, not slower.
Original approach:
RtfWriter2.GetInstance(doc, new FileStream(RTFFilePathName, FileMode.Create));
doc.Open();
//Add Tables and stuff here
doc.Close(); //It saves a file here to (RTFPathFileName)
StreamReader srRTF = new StreamReader(RTFFilePathName);
string rtfText = srRTF.ReadToEnd();
srRTF.Close();
//Do additional things with rtfText before writing to my final file
New approach, trying to speed it up but this is actually half as fast:
MemoryStream stream = new MemoryStream();
RtfWriter2.GetInstance(doc, stream);
doc.Open();
//Add Tables and stuff here
doc.Close();
string rtfText = ASCIIEncoding.ASCII.GetString(stream.GetBuffer());
stream.Close();
//Do additional things with rtfText before writing to my final file
The second approach I am trying I found here:
iTextSharp - How to generate a RTF document in the ClipBoard instead of a file
How big is your resulting stream? MemoryStream performs a lot of memory copy operations while growing, so for large results it may take significantly longer to write data in small chunks compared with FileStream.
To verify whether this is the problem, set the initial size of the MemoryStream to some large value around the resulting size and re-run the code.
To fix it you can pre-grow the memory stream initially (if you know the approximate output size) or write your own stream that uses a different growth scheme. Also, using a temporary file might be good enough for your purposes as-is.
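A sketch of that pre-sizing suggestion (100 KB is an assumed upper bound for the ~80KB average files mentioned in the question):

// Pre-size the MemoryStream so it doesn't repeatedly reallocate while growing.
MemoryStream stream = new MemoryStream(100 * 1024);
RtfWriter2.GetInstance(doc, stream);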
Like Alexei said, it's probably caused by the fact that you are creating a MemoryStream every time, and every time it continuously re-allocates memory as it grows. Try creating only one stream and resetting it to the beginning before every write.
Also, note that stream.GetBuffer() returns the whole internal buffer, not just the bytes written, so consider reading through the same StreamReader attached to your MemoryStream instead.
It also seems your code could easily be parallelised, so you could try running it using Parallel Extensions or the ThreadPool.
And it seems a little weird that you are writing your text as bytes into a stream, then reading the stream back as bytes and converting them to text. Wouldn't it be possible to save your document directly as text?
A MemoryStream is not associated with a file, and has no concept of a filename. Basically, you can't do that.
You certainly can't cast between them; you can only cast upwards and downwards - not sideways. To visualise:
            Stream
              |
      -----------------
      |               |
  FileStream     MemoryStream
You can cast a MemoryStream to a Stream trivially, and a Stream to a MemoryStream via a type-check; but never a FileStream to a MemoryStream. That is like saying a dog is an animal, and an elephant is an animal, so we can cast a dog to an elephant.
You could subclass MemoryStream and add a Name property (that you supply a value for), but there would still be no commonality between a FileStream and a YourCustomMemoryStream, and FileStream doesn't implement a pre-existing interface to get a Name; so the caller would have to explicitly handle both separately, or use duck-typing (maybe via dynamic or reflection).
Another option (perhaps easier) might be: write your data to a temporary file; use a FileStream from there; then (later) delete the file.
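A sketch of that temporary-file route (Path.GetTempFileName and File.Delete are standard BCL calls):

string tempPath = Path.GetTempFileName();
try
{
    using (FileStream fs = new FileStream(tempPath, FileMode.Create))
    {
        // write your data; unlike a MemoryStream, fs.Name is a real filename
    }
    // hand tempPath (or a fresh FileStream over it) to whatever needs a named file
}
finally
{
    File.Delete(tempPath); // clean up afterwards
}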
I know this is old, but there is a lot of misinformation in this thread.
It's all about buffer size. The internal buffers are significantly smaller with a memory stream vs a file stream. Smaller buffers cause more reads/writes.
Just initialize your memory stream with either a file stream or a byte array with a size of around 80K. Close the doc, set the stream position to 0, and read the contents to the end.
On a side note, GetBuffer() returns the whole allocated buffer. So if you only wrote 1 byte and the buffer is 4K, you will have a lot of garbage in your string.
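A sketch of avoiding that garbage, using the question's own stream (stream.Length is the number of bytes actually written):

// Convert only the bytes actually written, not the whole allocated buffer:
string rtfText = Encoding.ASCII.GetString(stream.GetBuffer(), 0, (int)stream.Length);
// Or copy just the written bytes into a right-sized array:
string rtfText2 = Encoding.ASCII.GetString(stream.ToArray());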