Looking at the different ways to upload a file in .NET, e.g. HttpPostedFile and using an HttpHandler, I'm trying to understand how the process works in a bit more detail.
Specifically, how it writes the information to a file.
Say I have the following:
HttpPostedFile file = context.Request.Files[0];
file.SaveAs(@"c:\temp\file.zip");
The actual file does not seem to get created until the full stream has been processed.
Similarly:
using (Stream output = File.OpenWrite(@"c:\temp\file.zip"))
using (Stream input = file.InputStream)
{
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, bytesRead);
}
}
I would have thought that this would "progressively" write the file as it reads the stream. Looking at the filesystem, it does not seem to do this at all. If I breakpoint inside the while loop, though, it does.
What I'm trying to do is have it so you upload a file (using a JavaScript uploader) and poll alongside, whereby the polling ajax request tries to get the FileInfo (file size) of the uploaded file every second. However, it always returns 0 until the upload is complete.
Vimeo seems to be able to do this type of functionality (for IE). Is this a .NET limitation, or is there a way to progressively write the file from the stream?
Two points:
First, in Windows, the displayed size of a file is not updated constantly. The file might indeed be growing continually, but the reported size only updates once in a while.
Second (more likely in this case), the stream might not be flushing to the disk. You could force it to by adding output.Flush() after the call to output.Write(). You might not want to do that, though, since it will probably have a negative impact on performance.
Perhaps you could poll the Length property of the output stream directly, instead of going through the file system.
EDIT:
To make the Length property of the stream accessible to other threads, you could have a field in your class and update it with each read/write:
private long _uploadedByteCount;
void SomeMethod()
{
using (Stream output = File.OpenWrite(@"c:\temp\file.zip"))
using (Stream input = file.InputStream)
{
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, bytesRead);
Interlocked.Add(ref _uploadedByteCount, bytesRead);
}
}
}
public long GetUploadedByteCount()
{
    // Interlocked.Read avoids a torn read of the long on 32-bit platforms
    return Interlocked.Read(ref _uploadedByteCount);
}
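On the polling side, a hypothetical sketch of the handler the ajax request could call every second; UploadProgressHandler and UploadProgressStore are assumptions of mine (you'd need some shared, per-upload storage for the counter, which is not shown):
public class UploadProgressHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // Look up the counter for this upload; UploadProgressStore is hypothetical
        long uploadedBytes = UploadProgressStore.GetUploadedByteCount(context.Request.QueryString["id"]);
        context.Response.ContentType = "text/plain";
        context.Response.Write(uploadedBytes.ToString());
    }

    public bool IsReusable
    {
        get { return true; }
    }
}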
The following code does the following:
Read all bytes from an input file
Keep only part of the file in outbytes
Write the extracted bytes to outfile
byte[] outbytes = File.ReadAllBytes(sourcefile).Skip(offset).Take(size).ToArray();
File.WriteAllBytes(outfile, outbytes);
But there is a limitation of ~2 GB of data for each step.
Edit: The extracted byte count can also be greater than 2 GB.
How can I handle big files? What is the best way to proceed with good performance, regardless of size?
Thanks!
An example using FileStream to take the middle 3 GB out of a 5 GB file:
byte[] buffer = new byte[1024 * 1024];
using (var readFS = File.OpenRead(pathToBigFile))
using (var writeFS = File.OpenWrite(pathToNewFile))
{
    readFS.Seek(1024L * 1024 * 1024, SeekOrigin.Begin); // seek to 1 GB in
    for (int i = 0; i < 3000; i++) // 3000 reads of one megabyte ≈ 3 GB
    {
        int bytesRead = readFS.Read(buffer, 0, buffer.Length);
        writeFS.Write(buffer, 0, bytesRead);
    }
}
It's not production-grade code: Read might not return a full megabyte, so you could end up with less than 3 GB copied. It's more to demonstrate the concept of using two FileStreams, reading repeatedly from one and writing repeatedly to the other. You can modify it to copy an exact number of bytes by keeping track of the total of all the bytesRead values in the loop and stopping reading when you have read enough, as in the sketch below.
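A minimal sketch of that exact-count fix, reusing readFS, writeFS, and buffer from the snippet above:
long toCopy = 3L * 1024 * 1024 * 1024; // the 3 GB we want
long totalRead = 0;
while (totalRead < toCopy)
{
    // Never ask for more than what's still missing
    int wanted = (int)Math.Min(buffer.Length, toCopy - totalRead);
    int bytesRead = readFS.Read(buffer, 0, wanted);
    if (bytesRead == 0) break; // reached end of file early
    writeFS.Write(buffer, 0, bytesRead);
    totalRead += bytesRead;
}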
It is better to stream the data from one file to the other, only loading small parts of it into memory:
public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
{
// Open the files as streams
using (var inStream = File.OpenRead(inFile))
using (var outStream = File.OpenWrite(outFile))
{
// seek to the start position
inStream.Seek(startPosition, SeekOrigin.Begin);
// Create a variable to track how much more to copy
// and a buffer to temporarily store a section of the file
long remaining = size;
byte[] buffer = new byte[81920];
do
{
// Read the smaller of 81920 or remaining and break out of the loop if we've already reached the end of the file
int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
if (bytesRead == 0) { break; }
// Write the buffered bytes to the output file
outStream.Write(buffer, 0, bytesRead);
remaining -= bytesRead;
}
while (remaining > 0);
}
}
Usage:
CopyFileSection(sourcefile, outfile, offset, size);
This should have equivalent functionality to your current method without the overhead of reading the entire file, regardless of its size, into memory.
Note: If you're doing this in code that uses async/await, you should change CopyFileSection to be public static async Task CopyFileSection and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
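For reference, a sketch of that async variant (same logic with awaitable I/O; the Async name suffix is just convention, and the usual using directives for System.IO and System.Threading.Tasks are assumed):
public static async Task CopyFileSectionAsync(string inFile, string outFile, long startPosition, long size)
{
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        inStream.Seek(startPosition, SeekOrigin.Begin);
        long remaining = size;
        byte[] buffer = new byte[81920];
        do
        {
            // Await the reads and writes so no thread blocks on I/O
            int bytesRead = await inStream.ReadAsync(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }
            await outStream.WriteAsync(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}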
My problem is the following:
I am currently testing my application extensively, and I have found that it isn't able to handle uploads of large files. Of course I read up on this problem, and the AllowWriteStreamBuffering property is already set to false, but when I try to upload a file of ~700 MB, my PC freezes (I have 4 GB RAM and I don't get an OutOfMemoryException). I can neither use the HttpClient class, because I have to support .NET Framework 4, nor chunk the upload, because the target servers do not support that kind of upload.
I think the memory problem is caused by the data I have already sent (while uploading) still being held in RAM.
These lines of code are responsible for that:
while ((bytesRead = fileStream.Read(fileBuffer, 0, fileBuffer.Length)) != 0)
{
requestStream.Write(fileBuffer, 0, (int)bytesRead);
}
How can I delete the data which is already sent but still using my memory?
If this isn't the cause of the problem, how can I solve it then?
I tried several methods, and a kind of internal chunked upload works (even though the SendChunked property of the HttpWebRequest is false):
long splitBytes = 1000000; // ≈ 1 MB per write
byte[] fileBuffer = new byte[splitBytes];
long totalSent = 0;
using (FileStream fileStream = new FileStream(file.Path, FileMode.Open, FileAccess.Read))
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(postBuffer, 0, postBuffer.Length);
    int bytesRead;
    while ((bytesRead = fileStream.Read(fileBuffer, 0, fileBuffer.Length)) > 0)
    {
        requestStream.Write(fileBuffer, 0, bytesRead); // write only what was actually read
        requestStream.Flush(); // flush the request stream (not the file), pushing the data to the network
        totalSent += bytesRead;
    }
}
It works even on servers which don't accept a real-chunked upload.
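For completeness, a sketch of the request setup this relies on; uploadUrl is a placeholder of mine, and postBuffer/file come from the snippet above. When AllowWriteStreamBuffering is false and the server rejects chunked transfer, ContentLength must be set to the exact body size up front:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uploadUrl); // uploadUrl is hypothetical
request.Method = "POST";
request.AllowWriteStreamBuffering = false; // don't keep already-sent bytes in RAM
request.SendChunked = false;               // the target servers reject chunked uploads
request.ContentLength = postBuffer.Length + file.FileSize; // exact size, known up front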
I'm writing a little tape writer application in C#, using the class contained in this article:
http://www.codeproject.com/Articles/15487/Magnetic-Tape-Data-Storage-Part-1-Tape-Drive-IO-Co
This works very well, but writes a lot more data to tape than the original file data.
Practical example:
My test file is 160 MB.
Writing it to tape results in about 300 MB of space occupied.
Enabling hardware compression, it takes about 250 MB.
If I read the just-written raw data back from the tape I get a file of about 170 MB (which is acceptable), and the backed-up file always works correctly.
I tried with other programs: Microsoft NTBackup uses just 170 MB (!!) with compression enabled; other commercial and free programs use from 200 to 300 MB.
But ALL the programs can read the backup correctly (same MD5 and SHA1 on the recovered file!).
What's going on? How can I improve my application? I really can't understand this.
Here is my "write" function, which uses a modified Write in the class (this works only if you write a single file):
private void Write(string path)
{
int BlockCounter = 0;
int BytesRead = 0;
Byte[] Temp = new Byte[BUFFER_SIZE];
using (System.IO.FileStream InputStream = System.IO.File.OpenRead(path))
{
TapeOperator TapeOp = new TapeOperator();
TapeOp.Load("\\\\.\\Tape0", 0);
TapeOp.SetTapePosition(0);
BytesRead = InputStream.Read(Temp, 0, BUFFER_SIZE);
while (BytesRead > 0)
{
TapeOp.Write(BlockCounter, Temp); // note: writes the full buffer even when BytesRead < BUFFER_SIZE
BlockCounter++;
BytesRead = InputStream.Read(Temp, 0, BUFFER_SIZE);
}
TapeOp.TapeMark(1, 1, 1); //TapeMark is a custom function to write a FileMark
BlockCounter++;
TapeOp.Close();
}
}
Modified Write from the class:
public void Write(long startPos, byte[] stream)
{
m_stream.Write(stream, 0, stream.Length);
m_stream.Flush();
}
My take on it would be that the block size of the tape is greater than your BUFFER_SIZE, so you are not filling the tape blocks all the way.
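To illustrate (not production code): a sketch that only ever hands full, block-sized buffers to the modified Write from the question. TAPE_BLOCK_SIZE is an assumption here; in real code, query the drive's configured block size instead of hard-coding it:
const int TAPE_BLOCK_SIZE = 64 * 1024; // assumed; ask the drive for its real block size
byte[] block = new byte[TAPE_BLOCK_SIZE];
int filled = 0;
int read;
// Accumulate reads until a tape block is completely full before writing it
while ((read = InputStream.Read(block, filled, TAPE_BLOCK_SIZE - filled)) > 0)
{
    filled += read;
    if (filled == TAPE_BLOCK_SIZE)
    {
        TapeOp.Write(BlockCounter++, block); // one exactly filled tape block
        filled = 0;
    }
}
if (filled > 0)
{
    Array.Clear(block, filled, TAPE_BLOCK_SIZE - filled); // zero-pad the final block
    TapeOp.Write(BlockCounter++, block);
}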
I am using a third party tool to get the scanned content from the scanner. On button click it executes the code and gives the content as a FileStream. Now I need to save this FileStream content as a pdf file in to a specified folder.
After saving, I need to open the file in the browser. How can I save the FileStream as a PDF file?
You can write the stream directly to the output buffer of the response.
So, at the point in your code where you have the FileStream from the scanner, simply read bytes from it and write them to the Response.OutputStream.
Set the ContentType to application/pdf.
Make sure you return nothing else. The user's browser will do whatever it is configured to do now: either save to disk or show in the browser. You can also save to disk on the server at this point, in case you want a backup.
I'm assuming your file stream is already a PDF; otherwise you'll need to use something like iTextSharp to create the PDF.
Edit
Here's some rough and ready code to do it. You'll want to tidy this up, like adding exception trapping to make sure the file stream gets cleaned up properly.
public void SaveToOutput(Stream dataStream)
{
    // Set the content type before writing the response body
    HttpContext.Current.Response.ContentType = "application/pdf";
    dataStream.Seek(0, SeekOrigin.Begin);
    using (FileStream fileout = File.Create("somepath/file.pdf"))
    {
        byte[] buffer = new byte[512];
        int bytesread;
        while ((bytesread = dataStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Send the chunk to the browser and save a copy on the server
            HttpContext.Current.Response.OutputStream.Write(buffer, 0, bytesread);
            fileout.Write(buffer, 0, bytesread);
        }
    }
}
You might want to take a look at the C# PDF Library on SourceForge: http://sourceforge.net/projects/pdflibrary/
If I'm understanding you correctly, the third-party library is handing you a stream containing the data for the scanned document, and you need to write it to a file? If that's the case, you need to look up file I/O in C#. Here's an example:
Stream sourceStream = scanner.GetOutput(); // wherever the source stream comes from
FileStream targetStream = File.Open(filename, FileMode.Create);
byte[] buffer = new byte[2048];
int bytesRead;
while ((bytesRead = sourceStream.Read(buffer, 0, buffer.Length)) > 0)
{
    targetStream.Write(buffer, 0, bytesRead);
}
sourceStream.Close();
targetStream.Close();
Not sure, but maybe check this:
http://sourceforge.net/projects/itextsharp/
Another prominent PDF library (which I have used in the past as well) is iTextSharp. You can take a look at this tutorial on how to convert your Stream to PDF then have the user download it.
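For the creation side, a minimal iTextSharp-style sketch (this follows the older 4.x/5.x API; treat the exact calls as a starting point rather than gospel):
using (FileStream fs = new FileStream("output.pdf", FileMode.Create))
{
    Document doc = new Document();   // from iTextSharp.text
    PdfWriter.GetInstance(doc, fs);  // from iTextSharp.text.pdf
    doc.Open();
    doc.Add(new Paragraph("Hello from iTextSharp"));
    doc.Close();                     // flushes and finalizes the PDF
}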
What is the best way to copy the contents of one stream to another? Is there a standard utility method for this?
From .NET 4.5 on, there is the Stream.CopyToAsync method
input.CopyToAsync(output);
This will return a Task that can be continued on when completed, like so:
await input.CopyToAsync(output);
// Code from here on will be run in a continuation.
Note that depending on where the call to CopyToAsync is made, the code that follows may or may not continue on the same thread that called it.
The SynchronizationContext that was captured when calling await will determine what thread the continuation will be executed on.
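Relatedly, if the code that follows doesn't need the captured context (in library code, for instance), ConfigureAwait(false) skips marshalling the continuation back to it:
await input.CopyToAsync(output).ConfigureAwait(false);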
Additionally, this call (and this is an implementation detail subject to change) still sequences reads and writes (it just doesn't waste a thread blocking on I/O completion).
From .NET 4.0 on, there is the Stream.CopyTo method
input.CopyTo(output);
For .NET 3.5 and before
There isn't anything baked into the framework to assist with this; you have to copy the content manually, like so:
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[32768];
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write (buffer, 0, read);
}
}
Note 1: This method will allow you to report on progress (x bytes read so far ...)
Note 2: Why use a fixed buffer size and not input.Length? Because that Length may not be available! From the docs:
If a class derived from Stream does not support seeking, calls to Length, SetLength, Position, and Seek throw a NotSupportedException.
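For instance, a minimal sketch of the progress idea from Note 1; the Action&lt;long&gt; callback parameter is my own addition, not part of the original method:
public static void CopyStream(Stream input, Stream output, Action<long> onProgress)
{
    byte[] buffer = new byte[32768];
    long total = 0;
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, read);
        total += read;
        onProgress(total); // reports "x bytes read so far ..."
    }
}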
MemoryStream has .WriteTo(outstream);
and .NET 4.0 has .CopyTo on normal stream objects.
.NET 4.0:
instream.CopyTo(outstream);
I use the following extension methods. They have optimized overloads for when one stream is a MemoryStream.
public static void CopyTo(this Stream src, Stream dest)
{
int size = (src.CanSeek) ? Math.Min((int)(src.Length - src.Position), 0x2000) : 0x2000;
byte[] buffer = new byte[size];
int n;
do
{
n = src.Read(buffer, 0, buffer.Length);
dest.Write(buffer, 0, n);
} while (n != 0);
}
public static void CopyTo(this MemoryStream src, Stream dest)
{
dest.Write(src.GetBuffer(), (int)src.Position, (int)(src.Length - src.Position));
}
public static void CopyTo(this Stream src, MemoryStream dest)
{
if (src.CanSeek)
{
int pos = (int)dest.Position;
int length = (int)(src.Length - src.Position) + pos;
dest.SetLength(length);
while(pos < length)
pos += src.Read(dest.GetBuffer(), pos, length - pos);
}
else
src.CopyTo((Stream)dest);
}
.NET Framework 4 introduces the new CopyTo method on the Stream class in the System.IO namespace. Using this method we can copy one stream to another, even across different stream classes.
Here is example for this.
FileStream objFileStream = File.Open(Server.MapPath("TextFile.txt"), FileMode.Open);
Response.Write(string.Format("FileStream Content length: {0}", objFileStream.Length.ToString()));
MemoryStream objMemoryStream = new MemoryStream();
// Copy File Stream to Memory Stream using CopyTo method
objFileStream.CopyTo(objMemoryStream);
Response.Write("<br/><br/>");
Response.Write(string.Format("MemoryStream Content length: {0}", objMemoryStream.Length.ToString()));
Response.Write("<br/><br/>");
There is, actually, a less heavy-handed way of doing a stream copy. Note, however, that this implies you can store the entire file in memory. Don't try to use this if you are working with files that run into hundreds of megabytes or more without caution.
public static void CopySmallTextStream(Stream input, Stream output)
{
using (StreamReader reader = new StreamReader(input))
using (StreamWriter writer = new StreamWriter(output))
{
writer.Write(reader.ReadToEnd());
}
}
NOTE: There may also be some issues concerning binary data and character encodings.
The basic questions that differentiate implementations of "CopyStream" are:
Size of the read buffer
Size of the writes
Whether we can use more than one thread (writing while we are reading)
The answers to these questions result in vastly different implementations of CopyStream and are dependent on what kind of streams you have and what you are trying to optimize. The "best" implementation would even need to know what specific hardware the streams were reading and writing to.
Unfortunately, there is no really simple solution. You can try something like this:
Stream s1, s2;
byte[] buffer = new byte[4096];
int bytesRead = 0;
while ((bytesRead = s1.Read(buffer, 0, buffer.Length)) > 0) s2.Write(buffer, 0, bytesRead);
s1.Close(); s2.Close();
But the problem with that is that different implementations of the Stream class might behave differently if there is nothing to read. A stream reading a file from a local hard drive will probably block until the read operation has read enough data from the disk to fill the buffer, only returning less data when it reaches the end of the file. On the other hand, a stream reading from the network might return less data even though there is more data left to be received.
Always check the documentation of the specific stream class you are using before using a generic solution.
There may be a way to do this more efficiently, depending on what kind of stream you're working with. If you can convert one or both of your streams to a MemoryStream, you can use the GetBuffer method to work directly with a byte array representing your data. This lets you use methods like Array.CopyTo, which abstract away all the issues raised by fryguybob. You can just trust .NET to know the optimal way to copy the data.
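For example, a minimal sketch of that idea for two MemoryStreams; GetSourceStream/GetDestinationStream are hypothetical placeholders. Note that GetBuffer returns the internal array, which can be longer than the data (and throws if the stream was created over a non-exposable buffer), so the copy is bounded by Length:
MemoryStream source = GetSourceStream();           // hypothetical placeholder
MemoryStream destination = GetDestinationStream(); // hypothetical placeholder
// Work directly with the underlying byte array instead of looping over Read/Write
destination.Write(source.GetBuffer(), 0, (int)source.Length);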
If you want a procedure to copy a stream to another, the one that Nick posted is fine, but it is missing the position reset; it should be:
public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[32768];
    long TempPos = input.Position;
    while (true)
    {
        int read = input.Read(buffer, 0, buffer.Length);
        if (read <= 0)
            break; // break (not return) so the position reset below is reached
        output.Write(buffer, 0, read);
    }
    input.Position = TempPos; // or set Position = 0 to rewind to the start
}
But if it is inline at runtime rather than in a procedure, you should use a MemoryStream:
Stream output = new MemoryStream();
byte[] buffer = new byte[32768]; // or whatever buffer size you want
long TempPos = input.Position;
while (true)
{
    int read = input.Read(buffer, 0, buffer.Length);
    if (read <= 0)
        break; // break here too, so the reset below runs
    output.Write(buffer, 0, read);
}
input.Position = TempPos; // or set Position = 0 to rewind to the start
Since none of the answers have covered an asynchronous way of copying from one stream to another, here is a pattern that I've successfully used in a port forwarding application to copy data from one network stream to another. It lacks exception handling to emphasize the pattern.
const int BUFFER_SIZE = 4096;
static byte[] bufferForRead = new byte[BUFFER_SIZE];
static byte[] bufferForWrite = new byte[BUFFER_SIZE];
static Stream sourceStream = new MemoryStream();
static Stream destinationStream = new MemoryStream();
static void Main(string[] args)
{
// Initial read from source stream
sourceStream.BeginRead(bufferForRead, 0, BUFFER_SIZE, BeginReadCallback, null);
}
private static void BeginReadCallback(IAsyncResult asyncRes)
{
// Finish reading from source stream
int bytesRead = sourceStream.EndRead(asyncRes);
if (bytesRead == 0)
{
    return; // source exhausted or closed; stop the read/write cycle
}
// Make a copy of the buffer as we'll start another read immediately
Array.Copy(bufferForRead, 0, bufferForWrite, 0, bytesRead);
// Write copied buffer to destination stream
destinationStream.BeginWrite(bufferForWrite, 0, bytesRead, BeginWriteCallback, null);
// Start the next read (looks like async recursion I guess)
sourceStream.BeginRead(bufferForRead, 0, BUFFER_SIZE, BeginReadCallback, null);
}
private static void BeginWriteCallback(IAsyncResult asyncRes)
{
// Finish writing to destination stream
destinationStream.EndWrite(asyncRes);
}
For .NET 3.5 and before, try:
MemoryStream1.WriteTo(MemoryStream2);
Easy and safe: make new streams from the original source array:
MemoryStream source = new MemoryStream(byteArray);
MemoryStream copy = new MemoryStream(byteArray);
The following code solves the issue by saving into a MemoryStream and reading the bytes back out:
Stream stream = new MemoryStream();
// any function that takes a stream as input; in my case, saving the PDF file to the stream
document.Save(stream);
MemoryStream newMs = (MemoryStream)stream;
byte[] getByte = newMs.ToArray();
// Note: dispose the stream in a finally block rather than a using block,
// otherwise accessing it afterwards throws because the stream is closed.