I have the following piece of code. I am not fully understanding its implementation.
img stores the path of the image, e.g. c:\\desktop\my.jpg
FileStream fs = new FileStream(img, FileMode.Open, FileAccess.Read);
byte[] bimage = new byte[fs.Length];
fs.Read(bimage, 0, Convert.ToInt32(fs.Length));
In the first line, the FileStream is opening the image located at path img for reading.
The second line is (I guess) converting the opened file to bytes.
What does fs.Length represent?
Does an image have a length, or is it the length of the file's name (I guess not)?
What is the third line doing?
Please help me clarify!
fs is one of many C# I/O objects; it wraps a file handle and exposes methods such as Read, as in your example. Read does not return a byte array: it fills the array you pass in and returns the number of bytes it actually read, so you declare the array first and size it to the file length (the second line; fs.Length is the file's length in bytes). Then all you need to do is read the file content into that array (the third line). This can be done in one call, as in the example, or by reading blocks in a loop; note that a single Read call is not guaranteed to fill the whole array. When you are done reading, it is good practice to dispose of the fs object (for example with a using block) so that the file handle is released.
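For reference, here is a minimal sketch of the same idea written a bit more defensively (assuming using System.IO; the loop guards against Read returning fewer bytes than requested):

byte[] ReadWholeFile(string path)
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        byte[] data = new byte[fs.Length];
        int offset = 0;
        // Read may return fewer bytes than requested, so loop until the array is full.
        while (offset < data.Length)
        {
            int read = fs.Read(data, offset, data.Length - offset);
            if (read == 0) break;   // unexpected end of file
            offset += read;
        }
        return data;
    }
}

In practice, File.ReadAllBytes(path) does all of this (including disposal) for you.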
A "stream", in computing, is commonly a control buffer you open (or connect), read a chunk and close (or disconnect).
In case of files, when opening OS finds file, handle pointers and locks on the resource.
You do reads. When you read, you are picking a range of bytes ("chunk") and putting it in memory. In this case, that second line byte array.
You could, in thesis, pick any number. But life is hard: you have physical memory limitation in any computer.
If your file fits in your RAM + virtual memory... you may use a large byte array (FSB and motherboard throughput applies).
So, in a low memory system, like a Raspberry Pi B (512MB), this can cause errors or failures.
That is where fs.Length comes in. It reports the total number of bytes in the file; the value comes from the file system's metadata, so the stream does not have to walk every byte to the end of file (EOF) to compute it.
Knowing the file's length lets you do some math for your byte array: its maximum size versus its optimum size (hardware capacity versus file chunk size).
Your buffers should take into account the machine's total memory and the other processes running (and using memory) in parallel.
You SHOULD NEVER, on any platform, rely only on the file size to define your memory buffer size.
And remember to always close/disconnect/release/dispose locked I/O resources: TCP connections, files, consoles, database connections, and thread-safety locks.
Imagine reading a 10 GB file, such as a payment transaction log, on a Raspberry Pi with 512 MB of RAM and a 2 GB SD card.
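As an illustration of that advice, here is a minimal sketch (not from the answer above) that caps the read buffer at 64 KB regardless of the file size; path is a placeholder and using System and System.IO are assumed:

using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    // Cap the buffer at 64 KB even for multi-gigabyte files.
    int bufferSize = (int)Math.Min(fs.Length, 64 * 1024);
    byte[] buffer = new byte[bufferSize];
    int read;
    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process 'read' bytes from 'buffer' here
    }
}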
I read that Buffer is a sequence of bytes. But I also read that Stream is also a sequence of bytes. So what is the difference between Stream & Buffer?
As I said in my comment, the nutshell difference between a buffer and a stream is that a stream is a sequence that transfers information from or to a specified source, whereas a buffer is a sequence of bytes that is stored in memory. For example:
FileStream stream = new FileStream("filepath.txt", FileMode.OpenOrCreate);
Opens a stream to a file. That stream can be read from, written to, or both. As it doesn't require loading the whole file into memory, it's lightweight and fast, but arbitrarily referencing a particular set of data in the source can be cumbersome. Streams also benefit from being a connection rather than a discrete set of data, so you don't need to know the size of the data beforehand.
Conversely:
byte[] fileContents = File.ReadAllBytes("filepath.txt");
Reads all the bytes of a file into memory. This is handy for when you need to manipulate the entire file at once, or keep a "local copy" for your program to hold onto so the file can be free for other uses. Depending on the size of the source and the amount of available memory, though, a buffer containing the entire file might not be an option.
This is just a barebones explanation, though. There are more thorough ones out there. For example, as Marc Gravell puts it:
Many data-structures (lists, collections, etc) act as containers - they hold a set of objects. But not a stream; if a list is a bucket, then a stream is a hose. You can pull data from a stream, or push data into a stream - but normally only once and only in one direction (there are exceptions of course). For example, TCP data over a network is a stream; you can send (or receive) chunks of data, but only in connection with the other computer, and usually only once - you can't rewind the Internet.
Streams can also manipulate data passing through them; compression streams, encryption streams, etc. But again - the underlying metaphor here is a hose of data. A file is also generally accessed (at some level) as a stream; you can access blocks of sequential data. Of course, most file systems also provide random access, so streams do offer things like Seek, Position, Length etc - but not all implementations support such. It has no meaning to seek some streams, or get the length of an open socket.
A buffer has a specified size/length and is used to store data. A stream, on the other hand, is used to read and write information from one place to another; for example, FileStream is used to read from and write to files.
The stream itself may have an internal buffer; when that buffer is filled to its maximum size, it is flushed and the data is actually read from or written to the underlying source.
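A small sketch of that relationship, assuming using System.IO (the file name and the 4 KB buffer size are arbitrary examples):

// Wrap a raw file stream in a 4 KB buffer. Small writes accumulate in the
// in-memory buffer and only reach the file when the buffer fills or is flushed.
using (FileStream file = new FileStream("log.bin", FileMode.Create, FileAccess.Write))
using (BufferedStream buffered = new BufferedStream(file, 4096))
{
    for (int i = 0; i < 1000; i++)
    {
        buffered.WriteByte((byte)(i % 256));   // goes into the buffer, not straight to the file
    }
}   // disposing flushes whatever is left in the buffer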
I'm receiving a file as a stream of byte[] data packets (total size isn't known in advance) that I need to store somewhere before processing it immediately after it's been received (I can't do the processing on the fly). Total received file size can vary from as small as 10 KB to over 4 GB.
One option for storing the received data is to use a MemoryStream, i.e. a sequence of MemoryStream.Write(bufferReceived, 0, count) calls to store the received packets. This is very simple, but obviously will result in an out of memory exception for large files.
An alternative option is to use a FileStream, i.e. FileStream.Write(bufferReceived, 0, count). This way, no out of memory exceptions will occur, but what I'm unsure about is bad performance due to disk writes (which I don't want to occur as long as plenty of memory is still available) - I'd like to avoid disk access as much as possible, but I don't know of a way to control this.
I did some testing and most of the time, there seems to be little performance difference between, say, 10,000 consecutive calls of MemoryStream.Write() vs FileStream.Write(), but a lot seems to depend on buffer size and the total amount of data in question (i.e. the number of writes). Obviously, MemoryStream size reallocation is also a factor.
Does it make sense to use a combination of MemoryStream and FileStream, i.e. write to memory stream by default, but once the total amount of data received is over e.g. 500 MB, write it to FileStream; then, read in chunks from both streams for processing the received data (first process 500 MB from the MemoryStream, dispose it, then read from FileStream)?
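For what it's worth, a rough sketch of that combination (the SpilloverBuffer name, the temp-file handling, and the exact threshold check are illustrative choices rather than anything from the question; assumes using System and System.IO):

class SpilloverBuffer : IDisposable
{
    const long Threshold = 500L * 1024 * 1024;    // the 500 MB cutoff mentioned above
    readonly MemoryStream memory = new MemoryStream();
    FileStream overflow;                          // created only if the threshold is crossed

    public void Write(byte[] bufferReceived, int count)
    {
        if (overflow == null && memory.Length + count > Threshold)
            overflow = new FileStream(Path.GetTempFileName(), FileMode.Create,
                                      FileAccess.ReadWrite, FileShare.None);

        if (overflow != null)
            overflow.Write(bufferReceived, 0, count);
        else
            memory.Write(bufferReceived, 0, count);
    }

    public void Dispose()
    {
        memory.Dispose();
        if (overflow != null) overflow.Dispose();
    }
}

Processing would then read the first 500 MB from memory and the remainder from overflow, as described above.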
Another solution is to use a custom memory stream implementation that doesn't require continuous address space for internal array allocation (i.e. a linked list of memory streams); this way, at least on 64-bit environments, out of memory exceptions should no longer be an issue. Con: extra work, more room for mistakes.
So how do FileStream vs MemoryStream reads/writes behave in terms of disk access and memory caching, i.e. the data size/performance balance? I would expect that, as long as enough RAM is available, FileStream would internally read/write from memory (cache) anyway, and virtual memory would take care of the rest. But I don't know how often FileStream will explicitly access the disk when being written to.
Any help would be appreciated.
No, trying to optimize this doesn't make any sense. Windows itself already caches file writes; they are buffered by the file system cache. So your test is roughly accurate: both MemoryStream.Write() and FileStream.Write() actually write to RAM and have no significant perf differences. The file system driver lazily writes the data to disk in the background.
The RAM used for the file system cache is what's left over after processes claimed their RAM needs. By using a MemoryStream, you reduce the effectiveness of the file system cache. In other words, you trade one for the other without benefit. You're in fact worse off: you use double the amount of RAM.
Don't try to help; this is already heavily optimized inside the operating system.
Since recent versions of Windows enable write caching by default, I'd say you could simply use FileStream and let Windows manage when or if anything actually is written to the physical hard drive.
If these files don't stick around after you've received them, you should probably write the files to a temp directory and delete them when you're done with them.
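One way to do that, assuming using System.IO (FileOptions.DeleteOnClose is my suggestion here, not something stated in the answer above):

// Write to a temp file that the OS removes automatically when the stream is closed.
string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
using (FileStream temp = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite,
                                        FileShare.None, 65536, FileOptions.DeleteOnClose))
{
    // receive packets and write them:
    // temp.Write(bufferReceived, 0, count);

    temp.Position = 0;   // rewind and process the received data before the stream is closed
}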
Use a FileStream constructor that allows you to define the buffer size. For example:
using (var outputFile = new FileStream("filename",
    FileMode.Create, FileAccess.Write, FileShare.None, 65536))
{
    // write to outputFile here
}
The default buffer size is 4K. Using a 64K buffer reduces the number of calls to the file system. A larger buffer will reduce the number of writes, but each write starts to take longer. Empirical data (many years of working with this stuff) indicates that 64K is a very good choice.
As somebody else pointed out, the file system will likely do further caching, and do the actual disk write in the background. It's highly unlikely that you'll receive data faster than you can write it to a FileStream.
I'm reading a binary file using BinaryReader. I want to count the number of disk accesses when buffering input with BufferedStream. Unfortunately, this class is sealed, so I can't override a method to count it manually.
Is there any way of doing it using the standard library? Or must I write my own buffering BinaryReader to achieve this?
You could just calculate it from the buffer size you specified in the BufferedStream(Stream, int) constructor. The default is 4096 bytes. Assuming you don't Seek(), the number of file accesses is (filesize + bufsize - 1) / bufsize. For example, a 10,000-byte file read through the default 4096-byte buffer takes (10000 + 4095) / 4096 = 3 reads.
A total overkill approach is to keep in mind that you can chain streams. Create your own Stream derived class and just count the number of calls to the Read() method that need to supply data from the underlying stream. Pass an instance of that class to the BufferedStream constructor.
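A minimal sketch of such a wrapper, assuming using System and System.IO (CountingStream and ReadCalls are illustrative names, not standard library types):

class CountingStream : Stream
{
    private readonly Stream inner;
    public int ReadCalls { get; private set; }

    public CountingStream(Stream inner) { this.inner = inner; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        ReadCalls++;                 // each call here is one read against the underlying stream
        return inner.Read(buffer, offset, count);
    }

    // Delegate the remaining abstract members to the wrapped stream.
    public override bool CanRead { get { return inner.CanRead; } }
    public override bool CanSeek { get { return inner.CanSeek; } }
    public override bool CanWrite { get { return inner.CanWrite; } }
    public override long Length { get { return inner.Length; } }
    public override long Position { get { return inner.Position; } set { inner.Position = value; } }
    public override void Flush() { inner.Flush(); }
    public override long Seek(long offset, SeekOrigin origin) { return inner.Seek(offset, origin); }
    public override void SetLength(long value) { inner.SetLength(value); }
    public override void Write(byte[] buffer, int offset, int count) { inner.Write(buffer, offset, count); }
}

You would then wrap it like new BinaryReader(new BufferedStream(new CountingStream(File.OpenRead("data.bin")), 4096)) and inspect ReadCalls afterwards.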
Neither approach lets you find out how often the operating system hits the disk driver and physically transfers data from the disk. The file system cache sits in between, and the actual number greatly depends on how the file data is mapped across the disk cylinders and sectors. You'd get info about that from a performance counter. There's little point in actually using it; the numbers you get will reproduce very poorly on another machine.
I'm relatively new to C# and programming, so please bear with me. I'm working on an application where I need to read some files and process them in chunks (for example, data is processed in chunks of 48 bytes).
I would like to know what is better, performance-wise: to read the whole file into memory at once and then process it, to read the file in chunks and process them directly, or to read the data in larger chunks (multiple chunks of data which are then processed).
How I understand things so far:
Read whole file in memory
pros:
-It's fast, because the most time-expensive operation is seeking; once the head is in place it can read quite quickly
cons:
-It consumes a lot of memory
-It consumes a lot of memory in a very short time (this is what I am mainly afraid of, because I do not want it to noticeably impact overall system performance)
Read file in chunks
pros:
-It's easier (more intuitive) to implement (see the sketch after this list):
 while (numberOfBytes2Read > 0)
     read n bytes
     process read data
-It consumes very little memory
cons:
-It could take much more time if the disk has to seek the file again and move the head to the appropriate position, which on average costs around 12 ms.
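A sketch of the chunked approach referenced above, assuming using System and System.IO (ProcessRecord, the 48-byte record size from the question, and the 4 KB read buffer are illustrative; any trailing partial record is ignored here):

using (FileStream fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read))
{
    byte[] buffer = new byte[4096];          // read from disk in 4 KB blocks
    byte[] record = new byte[48];            // process the data 48 bytes at a time
    int filled = 0;
    int read;
    while ((read = fs.Read(buffer, filled, buffer.Length - filled)) > 0)
    {
        filled += read;
        int offset = 0;
        while (filled - offset >= record.Length)
        {
            Array.Copy(buffer, offset, record, 0, record.Length);
            ProcessRecord(record);           // hypothetical processing step
            offset += record.Length;
        }
        // shift any leftover partial record to the front for the next read
        Array.Copy(buffer, offset, buffer, 0, filled - offset);
        filled -= offset;
    }
}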
I know that the answer depends on file size (and hardware). I assume it is better to read the whole file at once, but for how large a file is this true? What is the maximum recommended size to read into memory at once (in bytes, or relative to the hardware, for example a % of RAM)?
Thank you for your answers and time.
It is recommended to read files in buffers of 4K or 8K.
You should really never read files all at once if you want to write them back to another stream. Just read into a buffer and write the buffer back. This is especially true for web programming.
If you have to load the whole file since your operation (text-processing, etc) needs the whole content of the file, buffering does not really help, so I believe it is preferable to use File.ReadAllText or File.ReadAllBytes.
Why 4KB or 8KB?
This is closer to the underlying Windows operating system buffers. Files in NTFS are normally stored in 4 KB or 8 KB clusters on disk, although you can choose larger cluster sizes such as 32 KB.
Your chunk just needs to be large enough; 48 bytes is of course too small, and 4K is reasonable.
I am writing a program to read and write a specific binary file format.
I believe I have it 95% working, but I am running into a strange problem.
In the screenshot I am showing a program I wrote that compares two files byte by byte. The very last byte should be 0 but is FFFFFFF.
Using a binary viewer I can see no difference in the files. They appear to be identical.
Also, Windows tells me the size of the files is different, but the size on disk is the same.
Can someone help me understand what is going on?
The original is on the left and my copy is on the right.
Possible answers:
You forgot to call Stream.Close() or Stream.Dispose().
Your code is mixing up text and other kinds of data (e.g. casting a -1 returned from a Read() method into a char, then writing it).
We need to see your code though...
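For reference, a sketch of a byte-for-byte copy where disposal (and therefore flushing) is guaranteed; the file names are placeholders, since the actual code was not posted (assumes using System.IO):

using (FileStream input = File.OpenRead("original.bin"))
using (FileStream output = File.Create("copy.bin"))
{
    byte[] buffer = new byte[4096];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, read);   // write only the bytes actually read
    }
}   // disposing both streams flushes any buffered data to disk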
Size on disk vs Size
First of all, you should note that the Size on disk is almost always different from the Size value, because Size on disk reflects the allocated drive storage while Size reflects the actual length of the file.
A disk drive splits its space into blocks of the same size. For example, if your drive works with 4 KB blocks, then even the smallest file containing a single byte will still take up 4 KB on the disk, as that is the minimum space it can allocate. Once you write out 4 KB + 1 byte, it will allocate another 4 KB block of storage, making it 8 KB on disk. Hence the Size on disk is always a multiple of 4 KB. So the fact that the source and destination files have the same Size on disk does not mean the files are the same length. (Different drives have different block sizes; it is not always 4 KB.)
The Size value is the actual defined length of the file data within the disk blocks.
Your Size Issue
As your Size values are different, it means that the operating system has saved different lengths of data. Hence you have a fundamental problem with your copying routine, and not just an issue with the last byte as you think at the moment. One of your files is 3,434 bytes and the other 2,008, which is a big difference. Your first step must be to work out why you have such a big difference.
If your hex comparing routine is simply looking at the block data, then it will think the files are the same length, as it is comparing disk blocks rather than the actual file length.
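To rule that out, a small check based on the actual file lengths rather than on-disk blocks might look like this (the paths are placeholders; assumes using System and System.IO):

FileInfo original = new FileInfo("original.bin");
FileInfo copy = new FileInfo("copy.bin");

// Compare the real lengths first; Size on disk is irrelevant here.
Console.WriteLine("original: " + original.Length + " bytes, copy: " + copy.Length + " bytes");

if (original.Length == copy.Length)
{
    byte[] a = File.ReadAllBytes(original.FullName);
    byte[] b = File.ReadAllBytes(copy.FullName);
    for (int i = 0; i < a.Length; i++)
    {
        if (a[i] != b[i])
        {
            Console.WriteLine("First difference at offset " + i);
            break;
        }
    }
}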