I am using SqlFileStream, and when constructing the object I am not sure which FileOptions and allocation size to use. I got this from another article, but it did not explain why. Can someone help explain or give me a recommendation?
Thanks!
using (var destination = new SqlFileStream(serverPathName, serverTxnContext, FileAccess.Write, FileOptions.Asynchronous, 4096))
{
    await file.CopyToAsync(destination);
}
Since it appears you are trying to copy this file asynchronously, you probably want FileOptions.Asynchronous. This is the most responsive way to access your file, because you aren't bound to one thread. FileOptions.RandomAccess and FileOptions.SequentialScan both use caching to access the file, but FileOptions.SequentialScan isn't guaranteed to cache optimally. As the names imply, the main difference is whether you expect to access the file randomly or sequentially. FileOptions.WriteThrough still goes through the cache but does not report a write as complete until it has reached the disk, which is safer for durability but slower.
Allocation size is just the block (cluster) size on the drive. If you pass 0, the default size is used, which for an NTFS-formatted drive is 4 KB. 4096 bytes is 4 KB, so the person here is just explicitly setting the default.
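For comparison, here is a minimal synchronous sketch (it reuses the question's serverPathName, serverTxnContext and file variables, so it is illustrative rather than drop-in code); passing 0 for the allocation size simply means "use the default":

using (var destination = new SqlFileStream(serverPathName, serverTxnContext,
           FileAccess.Write, FileOptions.SequentialScan, 0))
{
    // Synchronous copy; SequentialScan is just a caching hint for front-to-back access,
    // and allocationSize 0 lets the provider pick the default.
    file.CopyTo(destination);
}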
I'm trying to figure out if there is something seriously wrong with the following code. It reads binary data from the database, stores it as a picture, and associates it with an Animal record object.
For each row (record of an animal):
byte[] ba = (byte[])x.ItemArray[1]; // reading binary from a DB row
using (MemoryStream m = new MemoryStream(ba))
{
    Image i = Image.FromStream(m); // exception thrown occasionally
    c.Photo = i;
    listOfAnimals.Add(c);
}
First of all, with 18 pictures loaded (the JPG files total 105 MB), the running app uses 2 GB of memory. With no pictures loaded, it uses only 500 MB.
The exception is often raised at the marked point, and its source is System.Drawing.
Could anyone help me optimize the code or tell me what the problem is? I must have used some wrong functions...
According to the Image.FromStream Method documentation:
OutOfMemoryException
The stream does not have a valid image format.
Remarks
You must keep the stream open for the lifetime of the Image.
The stream is reset to zero if this method is called successively with the same stream.
For more information see: Loading an image from a stream without keeping the stream open and Returning Image using Image.FromStream
Try the following:
Create a method to convert a byte[] to an Image:
ConvertByteArrayToImage
public static Image ConvertByteArrayToImage(byte[] buffer)
{
    using (MemoryStream ms = new MemoryStream(buffer))
    using (Image original = Image.FromStream(ms))
    {
        // Copy into a new Bitmap so the returned Image no longer depends on the stream,
        // which lets the MemoryStream be disposed safely (see the remarks quoted above).
        return new Bitmap(original);
    }
}
Then:
byte[] ba = (byte[])x.ItemArray[1]; //reading binary from a DB row
c.Photo = ConvertByteArrayToImage(ba);
listOfAnimals.Add(c);
Checking the documentation, a possible reason for out-of-memory exceptions is that the stream is not a valid image. If this is the case it should fail reliably for a given image, so check whether any particular source image is causing the issue.
Another possibility is that you simply run out of memory. JPEG typically achieves around a 10:1 compression ratio, so 105 MiB of compressed data could need well over 1 GiB of memory once decoded. I would recommend switching to x64 if at all possible; I see little reason not to do so today.
There could also be a memory leak; the best way to investigate this is with a memory profiler. The leak might be in just about any part of your code, so it is difficult to know without profiling.
You might also need to care about memory fragmentation. Large data blocks are stored on the large object heap, which is not automatically compacted, so after running for a while you might still have memory available, just not in any contiguous block. Again, switching to x64 would mostly solve this problem.
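If you can target .NET 4.5.1 or later, you can at least ask the GC to compact the large object heap once on the next full collection; a small sketch (this assumes you can tolerate the pause of a blocking collection):

// Opt in to a one-time LOH compaction on the next blocking GC (needs .NET 4.5.1+).
System.Runtime.GCSettings.LargeObjectHeapCompactionMode =
    System.Runtime.GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();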
Also, as mjwills comments, please do not store large files in the database. I just spent several hours recovering a huge database, something that would have been much faster if the images had been stored as files instead.
I want a fast way in C# to remove blocks of bytes at different places from a binary file between 500 MB and 1 GB in size. The start offsets and lengths of the bytes to be removed are stored in arrays:
int[] rdiDataOffset= {511,15423,21047};
int[] rdiDataSize={102400,7168,512};
EDIT:
This is a piece of my code, and it will not work correctly unless I set the buffer size to 1:
// fsr is the source FileStream and size an int, both declared earlier (not shown in the snippet).
while (true)
{
    if (rdiDataOffset.Contains((int)fsr.Position))
    {
        int idxval = Array.IndexOf(rdiDataOffset, (int)fsr.Position, 0, rdiDataOffset.Length);
        int oldRFSRPosition = (int)fsr.Position;
        size = rdiDataSize[idxval];
        fsr.Seek(size, SeekOrigin.Current);
    }
    int bufferSize = size == 0 ? 2048 : size;
    if ((size > 0) && (bufferSize > size)) bufferSize = size;
    if (bufferSize > (fsr.Length - fsr.Position)) bufferSize = (int)(fsr.Length - fsr.Position);
    byte[] buffer = new byte[bufferSize];
    int nofbytes = fsr.Read(buffer, 0, buffer.Length);
    fsr.Flush();
    if (nofbytes < 1)
    {
        break;
    }
}
No common file system provides an efficient way to remove chunks from the middle of an existing file (only truncate from the end). You'll have to copy all the data after the removal back to the appropriate new location.
A simple algorithm for doing this uses a temp file (it could be done in place as well, but that is riskier if something goes wrong); a sketch follows the list.
1. Create a new file and call SetLength to set the stream size (if this is too slow you can Interop to SetFileValidData). This ensures that you have room for your temp file while you are doing the copy.
2. Sort your removal list in ascending order.
3. Read from the current location (starting at 0) to the first removal point. The source file should be opened without granting Write share permissions (you don't want someone mucking with it while you are editing it).
4. Write that content to the new file (you will likely need to do this in chunks).
5. Skip over the data not being copied.
6. Repeat from step 3 until done.
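Here is a rough sketch of steps 2-6 (RemoveBlocks and CopyBytes are invented names, step 1's pre-sizing with SetLength is omitted for brevity, and the offset/size arrays are assumed to look like the ones in the question):

using System;
using System.IO;

static class FileBlockRemover
{
    static void RemoveBlocks(string sourcePath, string destPath, int[] offsets, int[] sizes)
    {
        // Step 2: sort copies of the removal lists by offset so the file is walked once.
        int[] sortedOffsets = (int[])offsets.Clone();
        int[] sortedSizes = (int[])sizes.Clone();
        Array.Sort(sortedOffsets, sortedSizes);

        byte[] buffer = new byte[64 * 1024];

        using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var dest = new FileStream(destPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            long position = 0;
            for (int i = 0; i < sortedOffsets.Length; i++)
            {
                // Steps 3-4: copy everything up to the start of the next removed block.
                CopyBytes(source, dest, sortedOffsets[i] - position, buffer);
                // Step 5: skip the block that is being removed.
                source.Seek(sortedSizes[i], SeekOrigin.Current);
                position = sortedOffsets[i] + (long)sortedSizes[i];
            }
            // Copy whatever remains after the last removed block.
            CopyBytes(source, dest, source.Length - position, buffer);
        }
    }

    static void CopyBytes(Stream source, Stream dest, long count, byte[] buffer)
    {
        while (count > 0)
        {
            int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read <= 0) break;
            dest.Write(buffer, 0, read);
            count -= read;
        }
    }
}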
You now have two files - the old one and the new one ... replace as necessary. If this is really critical data you might want to look at a transactional approach (either one you implement or something like NTFS transactions).
Consider a new design. If this is something you need to do frequently then it might make more sense to have an index in the file (or near the file) which contains a list of inactive blocks - then when necessary you can compress the file by actually removing blocks ... or maybe this IS that process.
If you're on the NTFS file system (most Windows deployments are) and you don't mind doing p/invoke methods, then there is a way, way faster way of deleting chunks from a file. You can make the file sparse. With sparse files, you can eliminate a large chunk of the file with a single call.
When you do this, the file is not rewritten. Instead, NTFS updates metadata about the extents of zeroed-out data. The beauty of sparse files is that consumers of your file don't have to be aware of the file's sparseness. That is, when you read from a FileStream over a sparse file, zeroed-out extents are transparently skipped.
NTFS uses such files for its own bookkeeping. The USN journal, for example, is a very large sparse memory-mapped file.
The way you make a file sparse and zero out sections of that file is to use the DeviceIoControl Windows API. It is arcane and requires P/Invoke, but if you go this route you'll surely hide the ugly bits behind nice, pretty function calls.
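Roughly what that DeviceIoControl plumbing could look like (the class and method names here are invented, it assumes a FileStream opened with write access on an NTFS volume, and it is a starting point rather than production code):

using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class SparseFileHelper
{
    const uint FSCTL_SET_SPARSE = 0x000900C4;
    const uint FSCTL_SET_ZERO_DATA = 0x000980C8;

    [StructLayout(LayoutKind.Sequential)]
    struct FILE_ZERO_DATA_INFORMATION
    {
        public long FileOffset;       // start of the range to zero
        public long BeyondFinalZero;  // first byte after the range
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(SafeFileHandle hDevice, uint dwIoControlCode,
        IntPtr lpInBuffer, int nInBufferSize, IntPtr lpOutBuffer, int nOutBufferSize,
        out int lpBytesReturned, IntPtr lpOverlapped);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(SafeFileHandle hDevice, uint dwIoControlCode,
        ref FILE_ZERO_DATA_INFORMATION lpInBuffer, int nInBufferSize,
        IntPtr lpOutBuffer, int nOutBufferSize,
        out int lpBytesReturned, IntPtr lpOverlapped);

    // Mark the file as sparse; zeroed ranges are only deallocated on sparse files.
    public static void MarkSparse(FileStream fs)
    {
        int returned;
        if (!DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_SPARSE,
                IntPtr.Zero, 0, IntPtr.Zero, 0, out returned, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }

    // Zero out (and, on a sparse file, deallocate) the range [offset, offset + length).
    public static void ZeroRange(FileStream fs, long offset, long length)
    {
        var info = new FILE_ZERO_DATA_INFORMATION
        {
            FileOffset = offset,
            BeyondFinalZero = offset + length
        };
        int returned;
        if (!DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_ZERO_DATA,
                ref info, Marshal.SizeOf(typeof(FILE_ZERO_DATA_INFORMATION)),
                IntPtr.Zero, 0, out returned, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}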
There are some issues to be aware of. For example, if the file is moved to a non-NTFS volume and then back, the sparseness of the file can disappear - so you should program defensively.
Also, a sparse file can appear to be larger than it really is, which complicates tasks involving disk provisioning. A 5 GB sparse file that has been completely zeroed out still counts as 5 GB towards a user's disk quota.
If a sparse file accumulates a lot of holes, you might want to occasionally rewrite the file in a maintenance window. I haven't seen any real performance troubles occur, but I can at least imagine that the metadata for a swiss-cheesy sparse file might accrue some performance degradation.
Here's a link to some doc if you're into the idea.
A little background: I've been experimenting with using the FILE_FLAG_NO_BUFFERING flag when doing IO with large files. We're trying to reduce the load on the cache manager in the hope that with background IO, we'll reduce the impact of our app on user machines. Performance is not an issue. Being behind the scenes as much as possible is a big issue. I have a close-to-working wrapper for doing unbuffered IO but I ran into a strange issue. I get this error when I call Read with an offset that is not a multiple of 4.
Handle does not support synchronous operations. The parameters to the FileStream constructor may need to be changed to indicate that the handle was opened asynchronously (that is, it was opened explicitly for overlapped I/O).
Why does this happen? And doesn't this message contradict itself? If I add the Asynchronous file option I get an IOException ("The parameter is incorrect").
I guess the real question is: what do these requirements, http://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx, have to do with multiples of 4?
Here is the code that demonstrates the issue:
FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;
int MinSectorSize = 512;

byte[] buffer = new byte[MinSectorSize * 2];
int i = 0;
while (i < MinSectorSize)
{
    try
    {
        using (FileStream fs = new FileStream(@"<some file>", FileMode.Open, FileAccess.Read,
            FileShare.None, 8, FileFlagNoBuffering | FileOptions.Asynchronous))
        {
            fs.Read(buffer, i, MinSectorSize);
            Console.WriteLine(i);
        }
    }
    catch { }
    i++;
}
Console.ReadLine();
When using FILE_FLAG_NO_BUFFERING, the documented requirement is that the memory address for a read or write must be a multiple of the physical sector size. In your code, you've allowed the address of the byte array to be randomly chosen (hence unlikely to be a multiple of the physical sector size) and then you're adding an offset.
The behaviour you're observing is that the call works if the offset is a multiple of 4. It is likely that the byte array is aligned to a 4-byte boundary, so the call is working if the memory address is a multiple of 4.
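If you want to see this for yourself, one way is to pin the buffer from the snippet above for a moment and print its address (GCHandle lives in System.Runtime.InteropServices; the array can move on later garbage collections, so this is only a snapshot):

GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
try
{
    // Print the current address of the managed array and its alignment.
    long address = handle.AddrOfPinnedObject().ToInt64();
    Console.WriteLine("address = 0x{0:X}, % 4 = {1}, % 512 = {2}",
                      address, address % 4, address % 512);
}
finally
{
    handle.Free();
}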
Therefore, your question can be rewritten like this: why is the read working when the memory address is a multiple of 4, when the documentation says it has to be a multiple of 512?
The answer is that the documentation doesn't make any specific guarantees about what happens if you break the rules. It may happen that the call works anyway. It may happen that the call works anyway, but only in September on even-numbered years. It may happen that the call works anyway, but only if the memory address is a multiple of 4. (It is likely that this depends on the specific hardware and device drivers involved in the read operation. Just because it works on your machine doesn't mean it will work on anybody else's.)
It probably isn't a good idea to use FILE_FLAG_NO_BUFFERING with FileStream in the first place, because I doubt that FileStream actually guarantees that it will pass the address you give it unmodified to the underlying ReadFile call. Instead, use P/Invoke to call the underlying API functions directly. You may also need to allocate your memory this way, because I don't know whether .NET provides any way to allocate memory with a particular alignment or not.
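To make that concrete, here is a rough sketch of the direct route: open the file yourself with FILE_FLAG_NO_BUFFERING and read into memory from VirtualAlloc, which is page-aligned and therefore sector-aligned on typical disks. The class and method names and the 4096-byte sector size are assumptions; query the volume for its real sector size before relying on this:

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class UnbufferedRead
{
    const uint GENERIC_READ = 0x80000000;
    const uint FILE_SHARE_READ = 0x00000001;
    const uint OPEN_EXISTING = 3;
    const uint FILE_FLAG_NO_BUFFERING = 0x20000000;
    const uint MEM_COMMIT = 0x1000, MEM_RESERVE = 0x2000, MEM_RELEASE = 0x8000;
    const uint PAGE_READWRITE = 0x04;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle CreateFile(string lpFileName, uint dwDesiredAccess,
        uint dwShareMode, IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool ReadFile(SafeFileHandle hFile, IntPtr lpBuffer,
        uint nNumberOfBytesToRead, out uint lpNumberOfBytesRead, IntPtr lpOverlapped);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
        uint flAllocationType, uint flProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualFree(IntPtr lpAddress, UIntPtr dwSize, uint dwFreeType);

    // Read the first sector of a file with no intermediate buffering.
    public static byte[] ReadFirstSector(string path)
    {
        const int sectorSize = 4096; // assumed; ask the volume for its real sector size
        IntPtr buffer = VirtualAlloc(IntPtr.Zero, (UIntPtr)(uint)sectorSize,
            MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE); // page-aligned allocation
        if (buffer == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());
        try
        {
            using (SafeFileHandle handle = CreateFile(path, GENERIC_READ, FILE_SHARE_READ,
                IntPtr.Zero, OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, IntPtr.Zero))
            {
                if (handle.IsInvalid)
                    throw new Win32Exception(Marshal.GetLastWin32Error());
                uint read;
                if (!ReadFile(handle, buffer, sectorSize, out read, IntPtr.Zero))
                    throw new Win32Exception(Marshal.GetLastWin32Error());
                // Copy into managed memory once the aligned read has completed.
                byte[] managed = new byte[read];
                Marshal.Copy(buffer, managed, 0, (int)read);
                return managed;
            }
        }
        finally
        {
            VirtualFree(buffer, UIntPtr.Zero, MEM_RELEASE);
        }
    }
}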
Just call CreateFile directly with FILE_FLAG_NO_BUFFERING and then close it before opening with FileStream to achieve the same effect.
I plan to have a data structure to store temporary binary data in memory for analysis.
The max size of the data will be about 10 MB.
Data will be added at the end, 408 bytes at a time.
There are no search or retrieve operations on this temporary binary data.
The data will be wiped out and the storage reused for the next analysis.
Questions:
1. Which structure is good for this purpose: byte[10MB], List<byte>(10MB), List<MyStruct>(24000), or something else?
2. How do I quickly wipe out the data (not List.Clear(), just set the values to 0) for a List or an array?
3. If I call List.Clear(), will the memory for the List shrink, or is the capacity (memory) still there, so no allocation happens when I call List.AddRange() after the Clear()?
4. Will List.Insert() make the List larger, or does it just replace the existing item?
You will have to describe what you are doing in more detail to get better answers, but it sounds like you are worried about efficiency/perf, so:
1. byte[]
2. No need to clear the array; just keep track of where the 'end' of your current cycle is.
3. n/a
4. n/a
If your data is usually the same size, and always under a certain size, use a byte array.
Create a byte[], and an int that tells you where the end of the "full" part of that buffer stops and the "free" part starts. You never need to clear it; just overwrite what was there. The only problem with this is if your data is sometimes 100 KB, sometimes 10 MB, and sometimes a bit larger than you originally planned for.
A List will be slower to use and larger in memory, although it handles various sizes of data out of the box.
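As a rough sketch of that idea (the class name is made up, and it assumes each append always fits in the 10 MB buffer):

// Reusable fixed-size buffer with a fill index; nothing is cleared between cycles.
class ReusableBuffer
{
    private readonly byte[] _buffer = new byte[10 * 1024 * 1024]; // 10 MB, allocated once
    private int _length;                                          // end of the "full" part

    public void Append(byte[] data, int offset, int count)
    {
        Buffer.BlockCopy(data, offset, _buffer, _length, count);  // assumes it fits
        _length += count;
    }

    // "Wiping" is just resetting the fill index; old bytes get overwritten next cycle.
    public void Reset()
    {
        _length = 0;
    }

    // Only if the data really must be zeroed:
    public void Zero()
    {
        Array.Clear(_buffer, 0, _length);
        _length = 0;
    }
}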
using (System.IO.MemoryStream memStream = new System.IO.MemoryStream())
{
    // Do stuff
} // the using ensures proper disposal occurs here, so you don't have to worry about cleaning up.
I'm working on a system that requires high file I/O performance (with C#).
Basically, I'm filling up large files (~100MB) from the start of the file until the end of the file.
Every ~5 seconds I'm adding ~5 MB to the file (sequentially from the start of the file); after every bulk write I flush the stream.
Every few minutes I need to update a structure which I write at the end of the file (some kind of metadata).
When flushing each one of the bulks I have no performance issue.
However, when updating the metadata at the end of the file I get really low performance.
My guess is that when creating the file (which also needs to be done extra fast), the file doesn't really allocate the entire 100 MB on disk, and when I flush the metadata it must allocate all the space up to the end of the file.
Guys/girls, any idea how I can overcome this problem?
Thanks a lot!
From comment:
In general speaking the code is as follows, first the file is opened:
m_Stream = new FileStream(filename,
                          FileMode.CreateNew,
                          FileAccess.Write,
                          FileShare.Write, 8192, false);
m_Stream.SetLength(100 * 1024 * 1024);
Every few seconds I'm writing ~5MB.
m_Stream.Seek(m_LastPosition, SeekOrigin.Begin);
m_Stream.Write(buffer, 0, buffer.Length);
m_Stream.Flush();
m_LastPosition += buffer.Length; // HH: guessed the +=
m_Stream.Seek(m_MetaDataSize, SeekOrigin.End);
m_Stream.Write(metadata, 0, metadata.Length);
m_Stream.Flush(); // Takes too long the first time (~1 sec).
As suggested above, would it not make sense (assuming you must have the metadata at the end of the file) to write that first? A rough sketch follows the list below.
That would do two things (assuming a non-sparse file):
1. Allocate the total space for the entire file.
2. Make any following write operations a little faster, as the space is ready and waiting.
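A rough sketch of that suggestion, reusing the question's variable names (filename, m_Stream, m_MetaDataSize) and assuming the metadata block can initially be written as zeros; the point is to pay the zero-fill cost once, right after the file is created:

m_Stream = new FileStream(filename, FileMode.CreateNew, FileAccess.Write,
                          FileShare.Write, 8192, false);
m_Stream.SetLength(100 * 1024 * 1024);

// Touch the end of the file once, up front, so NTFS extends the valid data length
// (and does its zero-fill) now instead of during a later metadata flush.
m_Stream.Seek(-m_MetaDataSize, SeekOrigin.End);
m_Stream.Write(new byte[m_MetaDataSize], 0, m_MetaDataSize);
m_Stream.Flush();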
Can you not do this asynchronously?
At least the application can then move on to other things.
Have you tried the AppendAllText method?
Your question isn't totally clear, but my guess is you create a file, write 5MB, then seek to 100MB and write the metadata, then seek back to 5MB and write another 5MB and so on.
If that is the case, this is a filesystem problem. When you extend the file, NTFS has to fill the gap in with something. As you say, the file is not allocated until you write to it. The first time you write the metadata the file is only 5MB long, so when you write the metadata NTFS has to allocate and write out 95MB of zeros before it writes the metadata. Upsettingly I think it also does this synchronously, so you don't even win using overlapped IO.
How about using the BufferedStream?
http://msdn.microsoft.com/en-us/library/system.io.bufferedstream(v=VS.100).aspx