I have the following code fragment that reads a binary file and validates it:
FileStream f = File.OpenRead("File.bin");
MemoryStream memStream = new MemoryStream();
memStream.SetLength(f.Length);
f.Read(memStream.GetBuffer(), 0, (int)f.Length);
f.Seek(0, SeekOrigin.Begin);
var r = new BinaryReader(f);
Single prevVal = 0;
do
{
    r.ReadUInt32();
    var val = r.ReadSingle();
    if (prevVal != 0)
    {
        var diff = Math.Abs(val - prevVal) / prevVal;
        if (diff > 0.25)
            Console.WriteLine("Bad!");
    }
    prevVal = val;
}
while (f.Position < f.Length);
It unfortunately works very slowly, and I am looking to improve this. In C++, I would simply read the file into a byte array and then recast that array as an array of structures:
struct S {
    int a;
    float b;
};
How would I do this in C#?
Define a struct (possibly a readonly struct) with explicit layout ([StructLayout(LayoutKind.Explicit)]) that matches your C++ layout exactly, then use one of the following:
1. Open the file as a memory-mapped file and get the pointer to the data; use either unsafe code on the raw pointer, or Unsafe.AsRef<YourStruct> on the data and Unsafe.Add<> to iterate.
2. Open the file as a memory-mapped file and get the pointer to the data; create a custom Memory<T> (via a MemoryManager<T>) over the pointer, and iterate over the span.
3. Open the file as a byte[]; create a Span<byte> over the byte[], then use MemoryMarshal.Cast<,> to create a Span<YourType>, and iterate over that.
4. Open the file as a byte[]; use fixed to pin the byte[] and get a byte* pointer; use unsafe code to walk the pointer.
5. Something involving "pipelines": a Pipe that serves as the buffer, perhaps using StreamConnection on a FileStream to fill the pipe, and a worker loop that dequeues from it. Complication: the buffers can be discontiguous and may split at inconvenient places; solvable, but subtle code is required whenever the first span isn't at least 8 bytes long.
(or some combination of those concepts)
Any of those should work much like your C++ version. The 4th is simple, but for very large data you probably want to prefer memory-mapped files.
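As an illustration of the memory-mapped route, here is a minimal sketch that avoids unsafe code entirely by using the MemoryMappedViewAccessor API instead of a raw pointer (the Record layout and the validation logic are assumed to match the (uint, float) records from the question; the accessor calls trade a little speed for safety):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Record
{
    public uint Dummy;
    public float Value;
}

static class Scanner
{
    // Counts consecutive values that differ by more than 25%,
    // reading records straight out of a memory-mapped view.
    public static int CountBadJumps(string path)
    {
        int recordSize = Marshal.SizeOf(typeof(Record));
        long length = new FileInfo(path).Length;
        long count = length / recordSize;
        int bad = 0;
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
        {
            float prev = 0;
            for (long i = 0; i < count; i++)
            {
                view.Read(i * recordSize, out Record r);
                if (prev != 0 && Math.Abs(r.Value - prev) / prev > 0.25f)
                    bad++;
                prev = r.Value;
            }
        }
        return bad;
    }
}
```

For truly huge files this keeps the working set down to whatever pages the OS keeps resident, rather than the whole file.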
This is what we use (compatible with older versions of C#):
public static T[] FastRead<T>(FileStream fs, int count) where T : struct
{
    int sizeOfT = Marshal.SizeOf(typeof(T));
    long bytesRemaining = fs.Length - fs.Position;
    long wantedBytes = (long)count * sizeOfT; // cast avoids int overflow for large counts
    long bytesAvailable = Math.Min(bytesRemaining, wantedBytes);
    long availableValues = bytesAvailable / sizeOfT;
    long bytesToRead = (availableValues * sizeOfT);
    if ((bytesRemaining < wantedBytes) && ((bytesRemaining - bytesToRead) > 0))
    {
        Debug.WriteLine("Requested data exceeds available data and partial data remains in the file.");
    }
    T[] result = new T[availableValues];
    GCHandle gcHandle = GCHandle.Alloc(result, GCHandleType.Pinned);
    try
    {
        uint bytesRead;
        if (!ReadFile(fs.SafeFileHandle, gcHandle.AddrOfPinnedObject(), (uint)bytesToRead, out bytesRead, IntPtr.Zero))
        {
            throw new IOException("Unable to read file.", new Win32Exception(Marshal.GetLastWin32Error()));
        }
        Debug.Assert(bytesRead == bytesToRead);
    }
    finally
    {
        gcHandle.Free();
    }
    GC.KeepAlive(fs);
    return result;
}
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Interoperability", "CA1415:DeclarePInvokesCorrectly")]
[DllImport("kernel32.dll", SetLastError = true)]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool ReadFile
(
    SafeFileHandle hFile,
    IntPtr lpBuffer,
    uint nNumberOfBytesToRead,
    out uint lpNumberOfBytesRead,
    IntPtr lpOverlapped
);
NOTE: This only works for structs that contain only blittable types, of course. And you must use [StructLayout(LayoutKind.Explicit)] and declare the packing to ensure that the struct layout is identical to the binary format of the data in the file.
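For instance, a blittable struct matching the question's C++ (int, float) record might be declared like this (the field names are illustrative):

```csharp
using System.Runtime.InteropServices;

// Explicit layout pins each field to its byte offset in the file format,
// so the managed struct is byte-for-byte identical to the C++ one.
[StructLayout(LayoutKind.Explicit, Size = 8)]
struct FileRecord
{
    [FieldOffset(0)] public int A;    // the C++ 'int a'
    [FieldOffset(4)] public float B;  // the C++ 'float b'
}
```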
For recent versions of C#, you can use Span as mentioned by Marc in the other answer!
Thank you everyone for very helpful comments and answers. Given this input, this is my preferred solution:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Data
{
    public UInt32 dummy;
    public Single val;
};

static void Main(string[] args)
{
    byte[] byteArray = File.ReadAllBytes("File.bin");
    ReadOnlySpan<Data> dataArray = MemoryMarshal.Cast<byte, Data>(new ReadOnlySpan<byte>(byteArray));
    Single prevVal = 0;
    foreach (var v in dataArray)
    {
        if (prevVal != 0)
        {
            var diff = Math.Abs(v.val - prevVal) / prevVal;
            if (diff > 0.25)
                Console.WriteLine("Bad!");
        }
        prevVal = v.val;
    }
}
It indeed works much faster than the original implementation.
You are actually not using the MemoryStream at all currently. Your BinaryReader accesses the file directly. To have the BinaryReader use the MemoryStream instead:
Replace
f.Seek(0, SeekOrigin.Begin);
var r = new BinaryReader(f);
...
while (f.Position < f.Length);
with
memStream.Seek(0, SeekOrigin.Begin);
var r = new BinaryReader(memStream);
...
while(r.BaseStream.Position < r.BaseStream.Length)
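Putting the pieces together, the parsing loop over an in-memory copy might look like the following sketch (File.ReadAllBytes replaces the manual MemoryStream fill, which also sidesteps the pitfall that Stream.Read is not guaranteed to fill the buffer in a single call; the record layout is assumed to be the question's (uint, float) pairs):

```csharp
using System;
using System.IO;

static class Validator
{
    // Parses (uint, float) pairs from an in-memory copy of the file and
    // counts jumps of more than 25% between consecutive values.
    public static int CountBad(byte[] bytes)
    {
        var r = new BinaryReader(new MemoryStream(bytes));
        float prevVal = 0;
        int bad = 0;
        while (r.BaseStream.Position < r.BaseStream.Length)
        {
            r.ReadUInt32();              // skip the dummy field
            float val = r.ReadSingle();
            if (prevVal != 0 && Math.Abs(val - prevVal) / prevVal > 0.25)
                bad++;
            prevVal = val;
        }
        return bad;
    }
}
```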
Related
I have to create an instance of an arbitrary value type from the bytes stored at some given offset in an array of bytes (for example, if the type is int, 4 bytes shall be taken). I know I can easily do it using pointers to fixed objects, but I don't want to have unsafe code. So I tried the following code (sanity checks stripped):
public object GetValueByType(System.Type type, byte[] byteArray, int offset)
{
    int size = System.Runtime.InteropServices.Marshal.SizeOf(type);
    MemoryStream memoryStream = new MemoryStream();
    memoryStream.Write(byteArray, offset, size);
    memoryStream.Seek(0, System.IO.SeekOrigin.Begin);
    BinaryFormatter binaryFormatter = new BinaryFormatter();
    object obj = (object)binaryFormatter.Deserialize(memoryStream);
    return obj;
}
But this code breaks at binaryFormatter.Deserialize, since BinaryFormatter expects data produced by its own Serialize method, not the raw bytes of the value.
How may I fix the above code (or achieve the same purpose in any other way)?
Eventually found a solution here (see answer 50), and that's the final code for your convenience:
public object GetValueByType(System.Type type, byte[] byteArray, int offset)
{
    GCHandle handle = GCHandle.Alloc(byteArray, GCHandleType.Pinned);
    try
    {
        IntPtr addressInPinnedObject = handle.AddrOfPinnedObject() + offset;
        return Marshal.PtrToStructure(addressInPinnedObject, type);
    }
    finally
    {
        handle.Free();
    }
}
This way my code remains without any unsafe code. Isn't it great?!
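On newer runtimes, System.Runtime.InteropServices.MemoryMarshal offers the same thing with no pinning at all, at the cost of needing the type at compile time rather than as a runtime System.Type. A sketch:

```csharp
using System;
using System.Runtime.InteropServices;

static class ByteView
{
    // Compile-time-typed variant of GetValueByType: no pinning, no unsafe blocks.
    // Throws if T contains references or the span is too short.
    public static T Read<T>(byte[] bytes, int offset) where T : struct
        => MemoryMarshal.Read<T>(bytes.AsSpan(offset));
}
```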
I've been using the FFmpeg.AutoGen https://github.com/Ruslan-B/FFmpeg.AutoGen wrapper to decode my H264 video for some time with great success, and now have to add AAC audio decoding (previously I was using G711 and NAudio for this).
I have the AAC stream decoding using avcodec_decode_audio4; however, the output buffer or frame is in floating-point format FLT and I need it to be in S16. For this I have found unmanaged examples using swr_convert, and FFmpeg.AutoGen does have this function P/Invoked as:
[DllImport(SWRESAMPLE_LIBRARY, EntryPoint = "swr_convert", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
public static extern int swr_convert(SwrContext* s, byte** @out, int out_count, byte** @in, int in_count);
My trouble is that I can't find a successful way of converting/fixing/casting my managed byte[] into a byte** to provide as the destination buffer.
Has anyone done this before?
My non-working code...
packet.ResetBuffer(m_avFrame->linesize[0] * 2);
fixed (byte* pData = packet.Payload)
{
    byte** src = &m_avFrame->data_0;
    //byte** dst = *pData;
    IntPtr d = new IntPtr(pData);
    FFmpegInvoke.swr_convert(m_pConvertContext, (byte**)d.ToPointer(), packet.Length, src, (int)m_avFrame->linesize[0]);
}
Thanks for any help.
Cheers
Dave
The function you are trying to call is documented here: http://www.ffmpeg.org/doxygen/2.0/swresample_8c.html#a81af226d8969df314222218c56396f6a
The out_arg parameter is declared like this:
uint8_t* out_arg[SWR_CH_MAX]
That is an array of SWR_CH_MAX byte pointers, one buffer per channel. Your translation renders it as byte** and so forces you to use unsafe code. Personally I think I would avoid that. I would declare the parameter like this:
[MarshalAs(UnmanagedType.LPArray)]
IntPtr[] out_arg
Declare the array like this:
IntPtr[] out_arg = new IntPtr[channelCount];
I am guessing that the CH in SWR_CH_MAX is short-hand for channel.
Then you need to allocate memory for the output buffer. I'm not sure how you want to do that. You could allocate one byte array per channel and pin those arrays to get hold of a pointer to pass down to the native code. That would be my preferred approach because upon return you'd have your channels in nice managed arrays. Another way would be a call to Marshal.AllocHGlobal.
The input buffer would need to be handled in the same way.
I would not use the automated P/Invoke translation that you are currently using. It seems hell-bent on forcing you to use pointers and unsafe code, which is not massively helpful. I'd translate it by hand.
I'm sorry not to give more specific details but it's a little hard because your question did not contain any information about the types used in your code samples. I hope the general advice is useful.
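A hand-written declaration along those lines might look like the following sketch. Note the assumptions: the native library name ("swresample") and the use of a plain IntPtr for the SwrContext are placeholders, and this cannot actually be exercised without the native swresample binary.

```csharp
using System;
using System.Runtime.InteropServices;

static class SwResample
{
    // Hand-translated signature: IntPtr[] instead of byte**, so callers
    // can pass arrays of pinned-buffer addresses without any unsafe code.
    [DllImport("swresample", EntryPoint = "swr_convert",
               CallingConvention = CallingConvention.Cdecl)]
    public static extern int swr_convert(
        IntPtr context,
        [MarshalAs(UnmanagedType.LPArray)] IntPtr[] out_arg, int out_count,
        [MarshalAs(UnmanagedType.LPArray)] IntPtr[] in_arg, int in_count);
}
```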
Thanks to @david-heffernan's answer I've managed to get the following working, and I'm posting it as an answer since examples of managed use of FFmpeg are very rare.
fixed (byte* pData = packet.Payload)
{
    IntPtr[] in_buffs = new IntPtr[2];
    in_buffs[0] = new IntPtr(m_avFrame->data_0);
    in_buffs[1] = new IntPtr(m_avFrame->data_1);
    IntPtr[] out_buffs = new IntPtr[1];
    out_buffs[0] = new IntPtr(pData);
    FFmpegInvoke.swr_convert(m_pConvertContext, out_buffs, m_avFrame->nb_samples, in_buffs, m_avFrame->nb_samples);
}
And in the complete context of decoding a buffer of AAC audio...
protected override void DecodePacket(MediaPacket packet)
{
    int frameFinished = 0;
    AVPacket avPacket = new AVPacket();
    FFmpegInvoke.av_init_packet(ref avPacket);
    byte[] payload = packet.Payload;
    fixed (byte* pData = payload)
    {
        avPacket.data = pData;
        avPacket.size = packet.Length;
        if (packet.KeyFrame)
        {
            avPacket.flags |= FFmpegInvoke.AV_PKT_FLAG_KEY;
        }
        int in_len = packet.Length;
        int count = FFmpegInvoke.avcodec_decode_audio4(CodecContext, m_avFrame, out frameFinished, &avPacket);
        if (count != packet.Length)
        {
        }
        if (count < 0)
        {
            throw new Exception("Can't decode frame!");
        }
    }
    FFmpegInvoke.av_free_packet(ref avPacket);
    if (frameFinished > 0)
    {
        if (!mConverstionContextInitialised)
        {
            InitialiseConverstionContext();
        }
        packet.ResetBuffer(m_avFrame->nb_samples * 4); // need to find a better way of getting the out buff size
        fixed (byte* pData = packet.Payload)
        {
            IntPtr[] in_buffs = new IntPtr[2];
            in_buffs[0] = new IntPtr(m_avFrame->data_0);
            in_buffs[1] = new IntPtr(m_avFrame->data_1);
            IntPtr[] out_buffs = new IntPtr[1];
            out_buffs[0] = new IntPtr(pData);
            FFmpegInvoke.swr_convert(m_pConvertContext, out_buffs, m_avFrame->nb_samples, in_buffs, m_avFrame->nb_samples);
        }
        packet.Type = PacketType.Decoded;
        if (mFlushRequest)
        {
            //mRenderQueue.Clear();
            packet.Flush = true;
            mFlushRequest = false;
        }
        mFirstFrame = true;
    }
}
As part of my thesis, I need to load, modify and save .dds texture files. Therefore I'm using the DevIL.NET-Wrapper library (but the problem isn't specific to this library I guess, it's more of a general problem).
I managed (by using the Visual Studio memory analysis tools) to figure out the leaking function inside the DevIL.NET-Wrapper:
public static byte[] ReadStreamFully(Stream stream, int initialLength) {
    if (initialLength < 1) {
        initialLength = 32768; // Init to 32K if not a valid initial length
    }
    byte[] buffer = new byte[initialLength];
    int position = 0;
    int chunk;
    while ((chunk = stream.Read(buffer, position, buffer.Length - position)) > 0) {
        position += chunk;
        // If we reached the end of the buffer, check to see if there's more info
        if (position == buffer.Length) {
            int nextByte = stream.ReadByte();
            // If -1 we reached the end of the stream
            if (nextByte == -1) {
                return buffer;
            }
            // Not at the end, need to resize the buffer
            byte[] newBuffer = new byte[buffer.Length * 2];
            Array.Copy(buffer, newBuffer, buffer.Length);
            newBuffer[position] = (byte)nextByte;
            buffer = newBuffer;
            position++;
        }
    }
    // Trim the buffer before returning
    byte[] toReturn = new byte[position];
    Array.Copy(buffer, toReturn, position);
    return toReturn;
}
I did a test program to figure out where the memory leak actually comes from:
private static void testMemoryOverflow(string[] args)
{
    DevIL.ImageImporter im;
    DevIL.ImageExporter ie;
    ...
    foreach (String file in ddsPaths)
    {
        using (FileStream fs = File.Open(file, FileMode.Open))
        {
            /* v memory leak v */
            DevIL.Image img = im.LoadImageFromStream(fs);
            /* ^ memory leak ^ */
            ie.SaveImage(img, fileSavePath);
            img = null;
        }
    }
}
The LoadImageFromStream() function is also part of the DevIL.NET-Wrapper, and in fact calls the function above. This is where the leak occurs.
What I already tried:
- Using GC.Collect()
- Disposing the FileStream object manually instead of using the using{} directive
- Disposing the stream inside the DevIL.NET ReadStreamFully() function from above
Does anyone have a solution for this?
I'm new to C#, so maybe it's kind of a basic mistake.
Your issue is the buffer size.
byte[] newBuffer = new byte[buffer.Length * 2];
After 2 iterations you're already very close to the 85K threshold at which objects land on the Large Object Heap. At 3 iterations you've crossed it. Once there, they won't be collected until a full garbage collection occurs across all generations, and even then the LOH isn't compacted, so you'll still see some high memory usage.
I'm not sure why the library you're using does this. I'm not sure why you're using it either, given that you can use:
Image img = Image.FromStream(fs); // built into .NET
The way that library is written looks like it dates from an earlier version of .NET. It doesn't appear to treat memory usage as any sort of concern.
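If you do want to keep the helper, one way to avoid the repeated doubling is to size the buffer up front whenever the stream reports its length. A sketch (not the library's actual code):

```csharp
using System;
using System.IO;

static class StreamUtil
{
    // Sizes the buffer once for seekable streams, so no intermediate
    // arrays ever land on the Large Object Heap. Falls back to copying
    // through a MemoryStream for non-seekable streams.
    public static byte[] ReadAll(Stream stream)
    {
        if (stream.CanSeek)
        {
            byte[] buffer = new byte[stream.Length - stream.Position];
            int read = 0;
            while (read < buffer.Length)
            {
                int chunk = stream.Read(buffer, read, buffer.Length - read);
                if (chunk == 0)
                    throw new EndOfStreamException("Stream shorter than reported length.");
                read += chunk;
            }
            return buffer;
        }
        using (var ms = new MemoryStream())
        {
            stream.CopyTo(ms);
            return ms.ToArray();
        }
    }
}
```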
I am creating my own video file format and would like to write out a file header and frame headers.
At the moment I just have placeholders defined as such:
byte[] fileHeader = new byte[FILE_HEADER_SIZE * sizeof(int)];
byte[] frameHeader = new byte[FRAME_HEADER_SIZE * sizeof(int)];
I write them out using the following for the file header:
fsVideoWriter.Write(fileHeader, 0, FILE_HEADER_SIZE);
and this for the frame headers:
fsVideoWriter.Write(frameHeader, 0, FRAME_HEADER_SIZE);
Now that I actually need to make proper use of these headers, I'm not sure this is the most convenient way to write them, since it may not be easy to read the individual fields I need from the headers into separate variables.
I thought about doing something like the following:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct FileHeader
{
    public int x;
    public int y;
    public int z;
    // etc. etc.
}
I would like to define it in such a way that it can evolve easily with the file format (for example, by including a version number). Is this the recommended way to define a file/frame header? If so, how should I read and write it using the .NET FileStream class? If not, please suggest the proper way to do this, as maybe someone has already created a generic video-file class that handles this sort of thing.
I settled upon the following solution:
Writing out file header
public static bool WriteFileHeader(FileStream fileStream, FileHeader fileHeader)
{
    byte[] buffer = new byte[FILE_HEADER_SIZE];
    GCHandle gch = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    try
    {
        Marshal.StructureToPtr(fileHeader, gch.AddrOfPinnedObject(), false);
        fileStream.Seek(0, SeekOrigin.Begin);
        fileStream.Write(buffer, 0, FILE_HEADER_SIZE);
        return true;
    }
    finally
    {
        gch.Free(); // release the pin even if the write throws
    }
}
Reading in file header
public static bool ReadFileHeader(FileStream fileStream, out FileHeader fileHeader)
{
    byte[] buffer = new byte[FILE_HEADER_SIZE];
    fileStream.Seek(0, SeekOrigin.Begin);
    fileStream.Read(buffer, 0, FILE_HEADER_SIZE);
    GCHandle gch = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    try
    {
        // PtrToStructure must be given the type; the overload taking a
        // boxed instance does not work for value types
        fileHeader = (FileHeader)Marshal.PtrToStructure(gch.AddrOfPinnedObject(), typeof(FileHeader));
    }
    finally
    {
        gch.Free();
    }
    // test for valid data
    return IsValidHeader(fileHeader);
}
I used a similar approach for the frame headers as well. The idea is basically to make use of byte buffers and Marshal.
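For comparison, the same round trip can be done without Marshal at all by writing each field with BinaryWriter; an explicit version field then makes format evolution straightforward. A sketch with illustrative field names, not the ones from my actual format:

```csharp
using System.IO;

struct DemoHeader
{
    public int Version;
    public int Width;
    public int Height;
}

static class HeaderIo
{
    public static void Write(Stream s, DemoHeader h)
    {
        var w = new BinaryWriter(s);
        w.Write(h.Version);
        w.Write(h.Width);
        w.Write(h.Height);
    }

    public static DemoHeader Read(Stream s)
    {
        var r = new BinaryReader(s);
        var h = new DemoHeader();
        h.Version = r.ReadInt32();
        // a later format revision could branch on h.Version here
        h.Width = r.ReadInt32();
        h.Height = r.ReadInt32();
        return h;
    }
}
```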
You may want to try the BinaryFormatter Class. But it is more or less a black box. If you need precise control of your file format, you can write your own Formatter and use it to serialize your header object.
For my project I need to read UInt16, UInt32, bytes and strings from a file. I started with a simple class I wrote like this:
public FileReader(string path) // constructor
{
    if (!System.IO.File.Exists(path))
        throw new Exception("FileReader::File not found.");
    m_byteFile = System.IO.File.ReadAllBytes(path);
    m_readPos = 0;
}

public UInt16 getU16() // basic function for reading
{
    if (m_readPos + 1 >= m_byteFile.Length)
        return 0;
    UInt16 ret = (UInt16)((m_byteFile[m_readPos + 0])
                        + (m_byteFile[m_readPos + 1] << 8));
    m_readPos += 2;
    return ret;
}
I thought it might be better to use the already existing BinaryReader, so I tried it, but I noticed that it is slower than my approach.
Can somebody explain why this is, and whether there is another existing class I could use to load a file and read from it?
~Adura
You have all the data up front in an array in memory, whereas BinaryReader streams the bytes in one at a time from its source, which I guess is a file on disk. You could speed it up by passing it a stream that reads from an in-memory array:
Stream stream = new MemoryStream(byteArray);
//Pass the stream to BinaryReader
Note that with this approach you need to load the entire file into memory at once, but I guess that's OK for you.
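A minimal sketch of that combination (the byte values are just sample data):

```csharp
using System.IO;

static class ReaderDemo
{
    public static (ushort, uint) ReadTwo(byte[] bytes)
    {
        // BinaryReader over a MemoryStream: same convenience, no disk I/O per call
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            ushort u16 = reader.ReadUInt16(); // little-endian, like getU16 above
            uint u32 = reader.ReadUInt32();
            return (u16, u32);
        }
    }
}
```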