I'm using the Ionic Zip Library (DotNetZip) to do some zipping.
When it's time to zip up, I want to call their ZipFile.Save(Stream outputStream) method.
On the Telligent side, to save the content of a stream to a file you use their ICentralizedFileStorageProvider.AddUpdateFile(string path, string fileName, Stream contentStream) method.
As you can see, you give the Ionic Zip Library a stream for it to write to; but to save a file, Telligent does not give you a stream to write to, you have to give them a stream for them to read from.
Sure, I could use a MemoryStream, load it by passing it to the Ionic Zip Library and then unload it by passing it to the Telligent APIs, but that would load the entire zip file into memory all at once. I know that the final zip is going to be huge, so loading it entirely into memory is not an option; I need to do some sort of buffering.
How do I reconcile these two APIs? How do I build a bridge between them where data can flow without hogging up memory? Any ideas?
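One way to bridge the two (not something from the original post, just a sketch) is an anonymous pipe: DotNetZip writes into one end on a background thread while Telligent reads from the other, so only a small pipe buffer is in memory at any moment. Here zip and fileStore stand for already-created Ionic.Zip.ZipFile and ICentralizedFileStorageProvider instances, and the path and file name are invented:

using System.IO.Pipes;
using System.Threading;

// zip (Ionic.Zip.ZipFile) and fileStore (ICentralizedFileStorageProvider) are
// assumed to exist already; "archives"/"export.zip" are illustrative only.
using (var server = new AnonymousPipeServerStream(PipeDirection.Out))
using (var client = new AnonymousPipeClientStream(PipeDirection.In, server.ClientSafePipeHandle))
{
    var writer = new Thread(() =>
    {
        using (server)        // closing the write end signals end-of-stream to the reader
        {
            zip.Save(server); // DotNetZip streams the archive into the pipe
        }
    });
    writer.Start();

    // Telligent pulls from the read end while DotNetZip pushes into the write end.
    fileStore.AddUpdateFile("archives", "export.zip", client);
    writer.Join();
}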
I want to transfer data over sockets, and currently I am creating a memory stream.
I could also use a network stream.
Can anyone please help me understand the difference between a C# NetworkStream and a MemoryStream?
A NetworkStream is directly related to a socket; it does not know its own length, you cannot seek, and the read/write functions are directly bound to the receive/send APIs (and therefore, read and write are entirely unrelated to each other). It can time out, and a read can take a considerable time if waiting for more data.
A MemoryStream is basically a wrapper over a local byte[]. It has a known length (which can change), you can seek, and read/write are directly related: both increment the same position cursor, and you can write something, rewind, and then read it. All operations complete quickly.
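A minimal sketch of that write/rewind/read round trip:

using System.IO;
using System.Text;

var ms = new MemoryStream();
byte[] payload = Encoding.UTF8.GetBytes("hello");
ms.Write(payload, 0, payload.Length); // the shared position cursor advances
ms.Position = 0;                      // rewind: read and write use the same cursor
byte[] buffer = new byte[payload.Length];
int read = ms.Read(buffer, 0, buffer.Length); // reads back the bytes just written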
It might be easier to ask "what are the similarities", which would be simply: both have a read/write API, by virtue of being subclasses of Stream.
Both streams derive from Stream; these classes are wrappers for different purposes.
According to my understanding, a NetworkStream reads from the network interface, whereas a MemoryStream (in the same scenario) would require all the data to be loaded into memory first (I assume by reading the underlying stream to its end); read operations then read from memory.
Before the first read on the MemoryStream can happen, all the data has to be in memory.
With a NetworkStream, you can read the data as it arrives.
During serialization we can use either a memory stream or a file stream.
What is the basic difference between these two? What does a memory stream mean?
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

namespace Serialization
{
    [Serializable]
    class Person // minimal type so the sample compiles
    {
        public string Name;
    }

    class Program
    {
        static void Main(string[] args)
        {
            var person = new Person { Name = "Alice" };

            // Serialize the object into an in-memory stream.
            using (var aStream = new MemoryStream())
            {
                var aBinaryFormat = new BinaryFormatter();
                aBinaryFormat.Serialize(aStream, person);
            }
        }
    }
}
Stream is a representation of bytes. Both these classes derive from the Stream class which is abstract by definition.
As the name suggests, a FileStream reads and writes to a file whereas a MemoryStream reads and writes to the memory. So it relates to where the stream is stored.
Now it depends how you plan to use each of them. For example, if you want to read binary data from a database, you would go in for a MemoryStream. However, if you want to read a file on your system, you would go in for a FileStream.
One quick advantage of a MemoryStream is that there is no need to create temporary buffers and files in an application.
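A short sketch of both cases (the file path and blob bytes below are placeholders):

using System.IO;

// Reading a file on disk: FileStream streams straight from the file.
using (var fs = new FileStream(@"C:\data\report.bin", FileMode.Open, FileAccess.Read))
{
    byte[] header = new byte[4];
    fs.Read(header, 0, header.Length);
}

// Binary data already in memory (e.g. a blob column read from a database):
byte[] blob = { 0x50, 0x4B, 0x03, 0x04 }; // stand-in for the database bytes
using (var ms = new MemoryStream(blob))
{
    byte[] header = new byte[4];
    ms.Read(header, 0, header.Length);
}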
The other answers here are great, but I thought one that takes a really high-level look at what purpose streams serve might be useful. There's a bit of simplification going on in the explanation below, but hopefully it gets the idea across:
What is a stream?
A stream is effectively the flow of data between two places, it's the pipe rather than the contents of that pipe.
A bad analogy to start
Imagine a water desalination plant (something that takes seawater, removes the salt and outputs clean drinking water to the water network):
The desalination plant can't remove the salt from all of the sea at one time (and nor would we want it to… where would the saltwater fish live?), so instead we have:
A SeaStream that sucks a set amount of water at a time into the plant.
That SeaStream is connected to the DesalinationStream to remove the salt
And the output of the DesalinationStream is connected to the DrinkingWaterNetworkStream to output the now saltless water to the drinking water supply.
OK, so what's that got to do with computers?
Moving big files all at once can be problematic
Frequently in computing we want to move data between two locations, e.g. from an external hard drive to a binary field in a database (to use the example given in another answer). We can do that by copying all of the data from the file from location A into the computer's memory and from there to location B, but if the file is large, or the source or destination is potentially unreliable, then moving the whole file at once may be either unfeasible or unwise.
For example, say we want to move a large file on a USB stick to a field in a database. We could use the 'System.IO.File' class to read that whole file into the computer's memory and then use a database connection to pass that file on to the database.
But, that's potentially problematic, what if the file is larger than the computer's available RAM? Now the file will potentially be cached to the hard drive, which is slow, and it might even slow the computer down too.
Likewise, what if the data source is unreliable, e.g. copying a file from a network drive with a slow and flaky WiFi connection? Trying to copy a large file in one go can be infuriating because you get half the file and then the connection drops out and you have to start all over again, only for it to potentially fail again.
It can be better to split the file and move it a piece at a time
So, rather than getting the whole file at once, it would be better to retrieve the file a piece at a time and pass each piece on to the destination one at a time. This is what a Stream does, and that's where the two different types of stream you mentioned come in:
We can use a FileStream to retrieve data from a file a piece at a time
and the database API may make available a MemoryStream endpoint we can write to a piece at a time.
We connect those two 'pipes' together to flow the file pieces from file to database.
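In code, 'a piece at a time' is just a small buffer in a loop. A minimal sketch, with an invented source path and a MemoryStream standing in for the database endpoint:

using System.IO;

using (var source = new FileStream(@"E:\bigfile.dat", FileMode.Open, FileAccess.Read))
using (var destination = new MemoryStream()) // stand-in for the database's write endpoint
{
    byte[] buffer = new byte[81920]; // move ~80 KB per pass instead of the whole file
    int bytesRead;
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        destination.Write(buffer, 0, bytesRead);
    }
}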
Even if the file wasn't too big to be held in RAM, without streams we were still doing a number of read/write operations that we didn't need to. The stages we were carrying out were:
Retrieving the data from the disk (slow)
Writing to a File object in the computer's memory (a bit faster)
Reading from that File object in the computer's memory (faster again)
Writing to the database (probably slow as there's probably a spinning disk hard-drive at the end of that pipe)
Streams allow us to conceptually do away with the middle two stages: instead of dragging the whole file into the computer's memory at once, we take the output of the operation that retrieves the data and pipe it straight to the operation that passes the data on to the database.
Other benefits of streams
Separating the retrieval of the data from the writing of the data like this also allows us to perform actions between retrieving the data and passing it on. For example, we could add an encryption stage, or we could write the incoming data to more than one type of output stream (e.g. to a FileStream and a NetworkStream).
Streams also allow us to write code where we can resume the operation should the transfer fail part way through. By keeping track of the number of pieces we've moved, if the transfer fails (e.g. if the network connection drops out) we can restart the Stream from the point at which we received the last piece, e.g. by seeking to that offset before reading.
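A sketch of that resume idea, with the offset and file name invented:

using System.IO;

long resumeOffset = 81920; // e.g. persisted after the last piece that succeeded
using (var source = new FileStream("bigfile.dat", FileMode.Open, FileAccess.Read))
{
    source.Seek(resumeOffset, SeekOrigin.Begin); // skip what was already transferred
    byte[] buffer = new byte[4096];
    int read = source.Read(buffer, 0, buffer.Length);
    // ...send this piece, persist the new offset, repeat...
}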
In simplest form, a MemoryStream writes data to memory, while a FileStream writes data to a file.
Typically, I use a MemoryStream if I need a stream, but I don't want anything to hit the disk, and I use a FileStream when writing a file to disk.
While a file stream reads from a file, a memory stream can be used to read data mapped in the computer's internal memory (RAM). You are basically reading/writing streams of bytes from memory.
Having had a bitter experience on the subject, here's what I've found out: if performance is required, you should copy the contents of a FileStream to a MemoryStream. I had to process the contents of 144 files of 528 KB each and present the outcome to the user. It took approximately 250 seconds (!). When I just copied the contents of each FileStream to a MemoryStream (using the CopyTo method) without changing anything else, the time dropped to approximately 32 seconds. Note that each time you copy one stream to another, the copy is appended at the destination stream's current position, so you may need to 'rewind' it before reading. Hope it helps.
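For example, a sketch with a placeholder file name:

using System.IO;

using (var fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read))
using (var ms = new MemoryStream())
{
    fs.CopyTo(ms);   // appends at the destination's current position
    ms.Position = 0; // rewind before reading, or you'll start at the end
    // ...process ms repeatedly at memory speed...
}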
Regarding streams in general: putting content into a stream (memory) does not pull all the content of whatever data source you are working with (a file, a database...) into memory. Unlike, for example, arrays or buffers, where you feed everything into memory at once, a stream brings in one chunk of the file at a time. When you reach the end of a chunk, the stream fetches the next chunk from the file into memory. It all happens in the low-level background while you are simply iterating the stream. That's why it's called a stream.
A memory stream handles data via an in memory buffer. A filestream deals with files on disk.
Serializing objects in memory is hardly useful, in my opinion. You need to serialize an object when you want to save it to disk. Typically, serialization goes from the object (in memory) to the disk, while deserialization goes from the saved serialized object (on disk) back to the object (in memory).
So, most of the time, you want to serialize to disk, and thus you use a FileStream for serialization.
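A minimal sketch, reusing the Person type from the serialization snippet above (the file name is invented):

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

var person = new Person { Name = "Alice" };
var formatter = new BinaryFormatter();
using (var fs = new FileStream("person.bin", FileMode.Create, FileAccess.Write))
{
    formatter.Serialize(fs, person); // bytes go straight to disk
}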
Background:
I'm trying to write a simple SoapExtension class to log inbound/outbound Soap messages from an asmx web service. Following this article on msdn, I have been able to get things working. However I'd really like to understand why/how it's working rather than just copy & pasting code.
The question:
What I'm struggling to grasp specifically is the handling of the IO streams in the example. All other articles I've read on the web handle the streams in an identical way: first getting a reference to the original stream, creating an in-memory "working" stream, and then swapping the contents as necessary.
First question is, what is meant by "stream chaining" in this context? My understanding of streams is that writing to any stream will automatically write to the 'inner' streams in a pipeline. If that's the case, why is it necessary to manually copy contents from one stream to another?
Second question is, in the example's Copy method they're creating a StreamReader and StreamWriter each time without disposing them; is this not putting extra pressure on the GC? It doesn't seem like something you'd want on a high-traffic web service... I tried wrapping both in using statements, but disposing the reader/writer also closed the stream, which led to more serious errors. .NET 4 has the new Stream.CopyTo(Stream) methods, but what would be a better approach for .NET 3.5?
Well, by chaining streams you can basically have different streams that do different things, in a chained sequence. For instance, you can have one stream that compresses the data, and then another stream that encrypts the data (or the opposite if we are moving in the other direction).
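A sketch of such a chain, compressing and then encrypting on the way to disk (key handling is elided; Aes.Create() just generates a throwaway key here):

using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

using (var aes = Aes.Create()) // throwaway key/IV, for illustration only
using (var file = new FileStream("payload.bin", FileMode.Create))
using (var crypto = new CryptoStream(file, aes.CreateEncryptor(), CryptoStreamMode.Write))
using (var gzip = new GZipStream(crypto, CompressionMode.Compress))
{
    byte[] data = { 1, 2, 3 };
    gzip.Write(data, 0, data.Length); // compressed, then encrypted, then written to disk
}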
As for ChainStream itself, well... There are lots of things to say about this one. I really recommend this article called Inside of Chainstream, which is extremely in-depth and also covers most of the questions you have.
The chaining is done in the framework. You get the original stream and return the stream where you put your modified result. The framework will chain this new stream into any other extensions.
It is implemented this way because the chaining works "backwards". Normally you add new functionality on top of streams but in this case you want to deal with the information fed into the original stream.
Calling Close on a stream is the same as calling Dispose.
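As for the .NET 3.5 part of the question: a plain buffered loop does what Stream.CopyTo later standardized, without wrapping the streams in readers or writers at all. A sketch:

using System.IO;

// Equivalent of .NET 4's Stream.CopyTo, for .NET 3.5.
public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[4096];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, read);
    }
}

Note that neither stream is closed here, which sidesteps the disposal problem mentioned in the question.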
I have an application on the Compact Framework that has some large embedded resources (some of them are several megabytes). I am calling assembly.GetManifestResourceStream(...) which returns a Stream object. However, I noticed that on some devices this call not only takes quite a while but also causes the device to run out of available memory. Eventually I used Reflector to look at the code for this method on the Compact Framework, and it uses an internal method to get a byte[] of the resource data. It then returns this data wrapped in a MemoryStream.
Is there any way to retrieve a resource without using this call since it will always read everything into memory? Ideally I'd like to work with a Stream that I can get random access to without having to read the whole thing into memory (similar to how a FileStream works). It would be pretty neat if I could simply open a FileStream on the assembly and start reading at the appropriate offset, but I doubt this is how resources are embedded.
Don't use an embedded resource. Add it as a content file and open it off disk with a file stream.
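A sketch of that approach; on the Compact Framework, Assembly.Location is unavailable, so one common trick is to take the executable's directory from its first module (the file name here is invented):

using System.IO;
using System.Reflection;

string dir = Path.GetDirectoryName(
    Assembly.GetExecutingAssembly().GetModules()[0].FullyQualifiedName);
using (var fs = new FileStream(Path.Combine(dir, "bigresource.dat"),
                               FileMode.Open, FileAccess.Read))
{
    fs.Seek(1024, SeekOrigin.Begin); // random access, nothing loaded up front
    byte[] chunk = new byte[256];
    fs.Read(chunk, 0, chunk.Length);
}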
I found an open source tool that exposes a lot of the assembly's metadata and allowed me to peek into the resource manually:
http://www.jbrowse.com/products/asmex/