How to prevent a number from getting too large from a wave - C#

This might become overly complex to explain, but I'll try to keep it simple yet informative. My program, which is written in C# (.NET), monitors a microphone for 2 seconds and returns the maximum value from a sample. I'm not well versed in how sound is generated through winmm.dll, but my program is based loosely on NAudio and another project from CodeProject that visualizes a wave. The wave format that I am using is this:
//WaveIn.cs
private WaveFormat Format= new WaveFormat(8000, 16,1);
//waveFormat.cs
[StructLayout(LayoutKind.Sequential)]
public class WaveFormat
{
public short wFormatTag;
public short nChannels;
public int nSamplesPerSec;
public int nAvgBytesPerSec;
public short nBlockAlign;
public short wBitsPerSample;
public short cbSize;
public WaveFormat(int rate, int bits, short channels)
{
wFormatTag = (short)WaveFormats.Pcm;
nChannels = channels;
nSamplesPerSec = rate;
wBitsPerSample = (short)bits;
cbSize = 0;
nBlockAlign = (short)(nChannels * (wBitsPerSample / 8));
nAvgBytesPerSec = nSamplesPerSec * nBlockAlign;
}
(I think I may have just found my problem by posting this, but I'm still going to ask.)
I then set up an event for the maximum sound level in my WaveIn file. If I understand the source code correctly, it fires when the buffer is full. Here is that code:
private void CallBack(IntPtr waveInHandle, WaveMessage message, int userData, ref WaveHeader waveHeader, IntPtr reserved)
{
if (message == WaveMessage.WIM_DATA)
{
GCHandle hBuffer = (GCHandle)waveHeader.dwUser;
WaveInBuffer buffer = (WaveInBuffer)hBuffer.Target;
Exception exception = null;
if (DataAvailable != null)
{
DataAvailable(buffer.Data, buffer.BytesRecorded);
}
if (MaxSoundLevel != null) //FOLLOW THIS ONE
{
byte[] waveStream = new byte[buffer.BytesRecorded];
Marshal.Copy(buffer.Data, waveStream, 0, buffer.BytesRecorded);
MaxSoundLevel(GetMaxSound(GetWaveChannels(waveStream)));
}
if (recording)
{
try
{
buffer.Reuse();
}
catch (Exception e)
{
recording = false;
exception = e;
}
}
}
}
private short[] GetWaveChannels(byte[] waveStream)
{
short[] monoWave = new short[waveStream.Length/2];
int h=0;
for (int i = 0 ; i < waveStream.Length; i += 2)
{
monoWave[h] = BitConverter.ToInt16(waveStream, i);
h++;
}
return monoWave;
}
private int GetMaxSound(short[] wave)
{
int maxSound = 0;
for (int i = 0; i < wave.Length; i++)
{
maxSound = Math.Max(maxSound, Math.Abs(wave[i]));
}
return maxSound;
}
When I monitor it from this test, it won't crash as long as I keep sound levels "normal":
[Test]
public void TestSound()
{
var waveIn = new WaveIn();
waveIn.MaxSoundLevel += new WaveIn.MaxSoundHandler(waveIn_MaxSoundLevel);
waveIn.StartRecording();
Console.WriteLine("Starting to record");
Thread.Sleep(4800); //record for 4.8 seconds.
waveIn.StopRecording();
Console.WriteLine("Done Recording");
}
void waveIn_MaxSoundLevel(int MaxSound)
{
Console.WriteLine("MaxSound:{0}", MaxSound);
}
Here is my output:
MaxSound:28
MaxSound:24
MaxSound:31
MaxSound:17
MaxSound:18760
Unhandled Exception: System.OverflowException: Negating the minimum value of a twos complement number is invalid.
I once got it to give me MaxSound:32767 (0x7FFF).
So I figured that my problem lay in trying to convert a 32-bit number to a 16-bit number, which is why I switched the return type of GetMaxSound from short to int. I am stumped. Why am I having this problem? Doesn't my wave format imply that the maximum is 32,767, and that winmm.dll would know that and not go past it? And since it is just converting 2 bytes of data to a short, shouldn't it never encounter this problem? Please help :)

My solution, for those who may be looking into this, was fairly simple. A 16-bit signed number's maximum positive value is 32767; its most negative value is -32768. If you take the absolute value of -32768 and try to put the result into a 16-bit number, an OverflowException is thrown. So the solution is to cast the short value to a 32-bit number before taking its absolute value. Here is the corrected function:
private int GetMaxSound(short[] wave)
{
int maxSound = 0;
for (int i = 0; i < wave.Length; i++)
{
maxSound = Math.Max(maxSound, Math.Abs((int)wave[i]));
}
return maxSound;
}
I could probably just have stuck with an unsigned number as well by using ushort but the Math.Abs does
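To make the failure mode concrete, here is a minimal sketch of the two behaviours side by side. The class and method names (PeakLevel, AbsOverflows) are made up for illustration and are not part of the original project; the only assumption is that the samples are signed 16-bit PCM, as in the question.

```csharp
using System;

public static class PeakLevel
{
    // Widen each 16-bit sample to int before taking the absolute value,
    // so that -32768 maps to 32768 instead of overflowing.
    public static int GetMaxSound(short[] wave)
    {
        int maxSound = 0;
        foreach (short sample in wave)
        {
            maxSound = Math.Max(maxSound, Math.Abs((int)sample));
        }
        return maxSound;
    }

    // Returns true if the Math.Abs(short) overload throws for the given sample.
    public static bool AbsOverflows(short sample)
    {
        try
        {
            Math.Abs(sample);   // resolves to Math.Abs(short), whose result must fit in a short
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

The only sample that trips the short overload is short.MinValue, which is why the crash only showed up on loud input that clipped at the negative end of the range.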

Related

Read binary objects from a file in C# written out by a C++ program

I am trying to read objects from very large files containing padded structs that were written into it by a C++ process. I was using an example to memory map the large file and try to deserialize the data into an object but I now can see that it won't work this way.
How can I extract all the objects from the files to use in C#? I'm probably way off, but I've provided the code. Each object has an 8-byte milliseconds member followed by 21 16-bit integers, which needs 6 bytes of padding to align to an 8-byte boundary.
[Serializable]
unsafe public struct DataStruct
{
public UInt64 milliseconds;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
public fixed Int16 data[21];
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
public fixed Int16 padding[3];
};
[Serializable]
public class DataArray
{
public DataStruct[] samples;
}
public static class Helper
{
public static Int16[] GetData(this DataStruct data)
{
unsafe
{
Int16[] output = new Int16[21];
for (int index = 0; index < 21; ++index)
output[index] = data.data[index];
return output;
}
}
}
class FileThreadSupport
{
struct DataFileInfo
{
public string path;
public UInt64 start;
public UInt64 stop;
public UInt64 elements;
};
// Create our epoch timestamp
private static readonly DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
// Output TCP client
private Support.AsyncTcpClient output;
// Directory which contains our data
private string replay_directory;
// Files to be read from
private DataFileInfo[] file_infos;
// Current timestamp of when the process was started
UInt64 process_start = 0;
// Object from current file
DataArray current_file_data;
// Offset into current files
UInt64 current_file_index = 0;
// Offset into current files
UInt64 current_file_offset = 0;
// Run flag
bool run = true;
public FileThreadSupport(ref Support.AsyncTcpClient output, ref Engine.A.Information info, ref Support.Configuration configuration)
{
// Set our output directory
replay_directory = configuration.getString("replay_directory");
if (replay_directory.Length == 0)
{
Console.WriteLine("Configuration does not provide a replay directory");
return;
}
// Check the directory for playable files
if(!loadDataDirectory(replay_directory))
{
Console.WriteLine("Replay directory {0} did not have any valid files", replay_directory);
}
// Set the output TCP client
this.output = output;
}
private bool loadDataDirectory(string directory)
{
string[] files = Directory.GetFiles(directory, "*.*", SearchOption.TopDirectoryOnly);
file_infos = new DataFileInfo[files.Length];
int index = 0;
foreach (string file in files)
{
string[] parts = file.Split('\\');
string name = parts.Last();
parts = name.Split('.');
if (parts.Length != 2)
continue;
UInt64 start, stop = 0;
if (!UInt64.TryParse(parts[0], out start) || !UInt64.TryParse(parts[1], out stop))
continue;
long size = new System.IO.FileInfo(file).Length;
// Add to our file info array
file_infos[index] = new DataFileInfo
{
path = file,
start = start,
stop = stop,
elements = (ulong)(new System.IO.FileInfo(file).Length / 56
/*System.Runtime.InteropServices.Marshal.SizeOf(typeof(DataStruct))*/)
};
++index;
}
// Sort the array
Array.Sort(file_infos, delegate (DataFileInfo x, DataFileInfo y) { return x.start.CompareTo(y.start); });
// Return whether or not there were files found
return (files.Length > 0);
}
public void start()
{
process_start = (ulong)DateTime.Now.ToUniversalTime().Subtract(epoch).TotalMilliseconds;
UInt64 num_samples = 0;
while(run)
{
// Get our samples and add it to the sample
DataStruct[] result = getData(100);
Engine.A.A message = new Engine.A.A();
for (int i = 0; i < result.Length; ++i)
{
Engine.A.Data sample = new Engine.A.Data();
sample.Time = process_start + num_samples * 4;
Int16[] signal_data = Helper.GetData(result[i]);
for(int e = 0; e < signal_data.Length; ++e)
sample.Value[e] = signal_data[e];
message.Signal.Add(sample);
++num_samples;
}
// Send out the websocket
this.output.SendAsync(message.ToByteArray());
// Sleep 100 milliseconds
Thread.Sleep(100);
}
}
public void stop()
{
run = false;
}
private DataStruct[] getData(UInt64 milliseconds)
{
if (file_infos.Length == 0)
return new DataStruct[0];
if (current_file_data == null)
{
current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
if(current_file_data.samples.Length == 0)
return new DataStruct[0];
}
UInt64 elements_to_read = (UInt64) milliseconds / 4;
DataStruct[] result = new DataStruct[elements_to_read];
Array.Copy(current_file_data.samples, (int)current_file_offset, result, 0, (int) Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
while((UInt64) result.Length != elements_to_read)
{
current_file_index = (current_file_index + 1) % (ulong) file_infos.Length;
current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
if (current_file_data.samples.Length == 0)
return new DataStruct[0];
current_file_offset = 0;
Array.Copy(current_file_data.samples, (int)current_file_offset, result, result.Length, (int)Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
}
return result;
}
private object ByteArrayToObject(byte[] buffer)
{
BinaryFormatter binaryFormatter = new BinaryFormatter(); // Create new BinaryFormatter
MemoryStream memoryStream = new MemoryStream(buffer); // Convert buffer to memorystream
return binaryFormatter.Deserialize(memoryStream); // Deserialize stream to an object
}
private object ReadObjectFromMMF(string file)
{
// Get a handle to an existing memory mapped file
using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(file, FileMode.Open))
{
// Create a view accessor from which to read the data
using (MemoryMappedViewAccessor mmfReader = mmf.CreateViewAccessor())
{
// Create a data buffer and read entire MMF view into buffer
byte[] buffer = new byte[mmfReader.Capacity];
mmfReader.ReadArray<byte>(0, buffer, 0, buffer.Length);
// Convert the buffer to a .NET object
return ByteArrayToObject(buffer);
}
}
}
For one thing, you're not using that memory-mapped file well at all: you're just sequentially reading it all into a buffer, which is both needlessly inefficient and much slower than simply opening the file and reading it normally. The selling point of memory-mapped files is repeated random access and random updates backed by the OS's virtual memory paging.
And you definitely don't need to read the entire file into memory, since your data is so strongly structured. You know exactly how many bytes to read for a record: Marshal.SizeOf<DataStruct>().
Then you need to get rid of all that serialization noise. BinaryFormatter only understands streams in its own format, not raw structs written by a C++ process. Again, your data is strongly typed, so just read it. Get rid of those fixed arrays and use regular arrays; you're already instructing the marshaller how to read them with the MarshalAs attributes (good). That also gets rid of the helper function that just copies an array.
Your reading loop is very simple: read the correct number of bytes for one entry, use Marshal.PtrToStructure to convert it to a readable structure, and add it to a list to return at the end. Bonus points if you can use .NET Core and Unsafe.As or MemoryMarshal.Cast.
Edit: and don't return object; you know exactly what you're returning, so declare it.
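The reading loop described above could look roughly like the following sketch. The names DataRecord and RecordReader are hypothetical, and the 56-byte layout (8-byte timestamp, 21 shorts, 6 bytes of padding) is taken from the question; it also assumes the file was written with the same byte order as the machine reading it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

// Layout assumed from the question: 8 + 21 * 2 + 3 * 2 = 56 bytes per record.
[StructLayout(LayoutKind.Sequential)]
public struct DataRecord
{
    public ulong Milliseconds;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
    public short[] Data;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public short[] Padding;
}

public static class RecordReader
{
    // Reads every complete record from the file; a trailing partial record is ignored.
    public static List<DataRecord> ReadAll(string path)
    {
        int recordSize = Marshal.SizeOf<DataRecord>();
        var records = new List<DataRecord>();
        byte[] buffer = new byte[recordSize];
        // Pin the buffer once so the marshaller can read from a stable address.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            using (FileStream stream = File.OpenRead(path))
            {
                while (ReadFully(stream, buffer) == recordSize)
                {
                    records.Add(Marshal.PtrToStructure<DataRecord>(handle.AddrOfPinnedObject()));
                }
            }
        }
        finally
        {
            handle.Free();
        }
        return records;
    }

    // Fills the buffer completely if possible; returns the number of bytes actually read.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0) break;   // end of file
            total += read;
        }
        return total;
    }
}
```

For very large files it would probably be better to yield records lazily or read in batches rather than build one list, but the per-record logic stays the same.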

VST host playback timing issues (VST.NET + NAudio)

I'm just getting started in trying to work out VST plugin hosting for a small music program that I've been working on for a while. I've now reached the point where I'm able to take melodies stored within my program and send the midi data to a hosted plugin (using VST.NET) and outputting the audio to WaveOut (NAudio). The problem is that the audio output is playing far too quickly and also not in time.
Here's the code that I'm using for playback, parts of which are based on the example host within the VST.NET sample projects:
public class PhraseEditorWaveProvider : VstWaveProvider
{
public PhraseEditor PhraseEditor { get; private set; }
public Rational PlaybackBeat { get; private set; }
public PhraseEditorWaveProvider(PhraseEditor phraseEditor, string pluginPath, WaveFormat waveFormat = null)
: base(pluginPath, waveFormat)
{
PhraseEditor = phraseEditor;
}
public override int Read(byte[] buffer, int offset, int count)
{
decimal framesPerBeat = (60 / (decimal)PhraseEditor.Phrase.Tempo) * WaveFormat.SampleRate;
Rational startBeat = PlaybackBeat;
Rational endBeat = startBeat + Rational.FromDecimal(count / framesPerBeat);
//Get list of note starts and note ends that occur within the beat range
List<VstEvent> vstEvents = new List<VstEvent>();
foreach(Note note in PhraseEditor.Phrase.Notes)
{
if(note.StartBeat >= startBeat && note.StartBeat < endBeat)
vstEvents.Add(NoteOnEvent(1, (byte)note.Pitch.Value, 100, (int)(note.Duration * framesPerBeat), (int)((note.StartBeat - startBeat) * framesPerBeat)));
if(note.EndBeat >= startBeat && note.EndBeat < endBeat)
vstEvents.Add(NoteOffEvent(1, (byte)note.Pitch.Value, (int)((note.EndBeat - startBeat) * framesPerBeat)));
}
foreach(Chord chord in PhraseEditor.Phrase.Chords)
{
if(chord.StartBeat >= startBeat && chord.StartBeat < endBeat)
{
//Play each note within a chord in the 4th octave, with velocity 70
foreach (Pitch pitch in chord.Pitches)
vstEvents.Add(NoteOnEvent(1, (byte)((pitch.Value % 12) + 48), 70, (int)(chord.Duration * framesPerBeat), (int)((chord.StartBeat - startBeat) * framesPerBeat)));
}
if(chord.EndBeat >= startBeat && chord.EndBeat < endBeat)
{
foreach(Pitch pitch in chord.Pitches)
vstEvents.Add(NoteOffEvent(1, (byte)((pitch.Value % 12) + 48), (int)((chord.EndBeat - startBeat) * framesPerBeat)));
}
}
PlaybackBeat = endBeat;
return base.Read(vstEvents.OrderBy(x => x.DeltaFrames).ToArray(), buffer, offset, count);
}
}
public abstract class VstWaveProvider : IWaveProvider
{
private WaveFormat _waveFormat;
public WaveFormat WaveFormat
{
get
{
return _waveFormat;
}
set
{
_waveFormat = value;
BytesPerWaveSample = _waveFormat.BitsPerSample / 8;
}
}
public VstPluginContext VstContext { get; private set; }
public int BytesPerWaveSample { get; private set; }
public VstWaveProvider(VstPluginContext vstContext, WaveFormat waveFormat = null)
{
WaveFormat = (waveFormat == null) ? new WaveFormat(44100, 2) : waveFormat;
VstContext = vstContext;
}
public VstWaveProvider(string pluginPath, WaveFormat waveFormat = null)
{
WaveFormat = (waveFormat == null) ? new WaveFormat(44100, 2) : waveFormat;
VstContext = OpenPlugin(pluginPath);
}
public abstract int Read(byte[] buffer, int offset, int count);
protected int Read(VstEvent[] vstEvents, byte[] outputBuffer, int offset, int count)
{
VstAudioBufferManager inputBuffers = new VstAudioBufferManager(
VstContext.PluginInfo.AudioInputCount,
count / (Math.Max(1, VstContext.PluginInfo.AudioInputCount) * BytesPerWaveSample)
);
return Read(inputBuffers, vstEvents, outputBuffer, offset, count);
}
protected int Read(VstAudioBufferManager inputBuffers, VstEvent[] vstEvents, byte[] outputBuffer, int offset, int count)
{
VstAudioBufferManager outputBuffers = new VstAudioBufferManager(
VstContext.PluginInfo.AudioOutputCount,
count / (VstContext.PluginInfo.AudioOutputCount * BytesPerWaveSample)
);
VstContext.PluginCommandStub.StartProcess();
if(vstEvents.Length > 0)
VstContext.PluginCommandStub.ProcessEvents(vstEvents);
VstContext.PluginCommandStub.ProcessReplacing(inputBuffers.ToArray(), outputBuffers.ToArray());
VstContext.PluginCommandStub.StopProcess();
//Convert from multi-track to interleaved data
int bufferIndex = offset;
for (int i = 0; i < outputBuffers.BufferSize; i++)
{
foreach (VstAudioBuffer vstBuffer in outputBuffers)
{
Int16 waveValue = (Int16)((vstBuffer[i] + 1) * 128);
byte[] bytes = BitConverter.GetBytes(waveValue);
outputBuffer[bufferIndex] = bytes[0];
outputBuffer[bufferIndex + 1] = bytes[1];
bufferIndex += 2;
}
}
return count;
}
private VstPluginContext OpenPlugin(string pluginPath)
{
HostCommandStub hostCmdStub = new HostCommandStub();
hostCmdStub.PluginCalled += new EventHandler<PluginCalledEventArgs>(HostCmdStub_PluginCalled);
VstPluginContext ctx = VstPluginContext.Create(pluginPath, hostCmdStub);
ctx.Set("PluginPath", pluginPath);
ctx.Set("HostCmdStub", hostCmdStub);
ctx.PluginCommandStub.Open();
ctx.PluginCommandStub.MainsChanged(true);
return ctx;
}
private void HostCmdStub_PluginCalled(object sender, PluginCalledEventArgs e)
{
Debug.WriteLine(e.Message);
}
protected VstMidiEvent NoteOnEvent(byte channel, byte pitch, byte velocity, int noteLength, int deltaFrames = 0)
{
return new VstMidiEvent(deltaFrames, noteLength, 0, new byte[] { (byte)(144 + channel), pitch, velocity, 0 }, 0, 0);
}
protected VstMidiEvent NoteOffEvent(byte channel, byte pitch, int deltaFrames = 0)
{
return new VstMidiEvent(deltaFrames, 0, 0, new byte[] { (byte)(144 + channel), pitch, 0, 0 }, 0, 0);
}
}
Which would get called by the following:
WaveOut waveOut = new WaveOut(WaveCallbackInfo.FunctionCallback());
waveOut.Init(new PhraseEditorWaveProvider(this, @"C:\Users\james\Downloads\Cobalt\Cobalt\Cobalt 64bit\Cobalt.dll"));
waveOut.Play();
Where Cobalt is the current plugin that I'm using for testing.
For context, Rational is my own data type since other parts of my program are doing lots of manipulation of melodies and I found that doubles and decimals weren't giving me the precision that I required.
Also, both the VST plugin context and WaveOut are set to have sample rates of 44.1kHz, so there shouldn't need to be any up/down-sampling when passing the plugin output data into the WaveOut buffer.
I'm at a complete loss as to why the audio is playing back faster than expected. It seems to be roughly 4x faster than expected. If anyone can give any pointers what may be causing this I'd be hugely grateful.
As for it playing out of time, I suspect this is down to me not correctly understanding how the deltaFrames property works in VstMidiEvent. I've tried playing around with both deltaFrames and noteOffset without much luck. I'm currently working under the assumption that they measure the number of audio frames from the start of the current block of data to the time of the event within that block. Unfortunately, I've struggled to find much useful documentation on this, so I could be totally wrong.
Look forward to any responses
Kind regards
James
Ok, I think I found what was causing the problem, it was in this section of code:
public override int Read(byte[] buffer, int offset, int count)
{
decimal framesPerBeat = (60 / (decimal)PhraseEditor.Phrase.Tempo) * WaveFormat.SampleRate;
Rational startBeat = PlaybackBeat;
Rational endBeat = startBeat + Rational.FromDecimal(count / framesPerBeat);
...
}
Which I just changed to this:
public override int Read(byte[] buffer, int offset, int count)
{
decimal framesPerBeat = (60 / (decimal)PhraseEditor.Phrase.Tempo) * WaveFormat.SampleRate;
int samplesRequired = count / (WaveFormat.Channels * (WaveFormat.BitsPerSample / 8));
Rational startBeat = PlaybackBeat;
Rational endBeat = startBeat + Rational.FromDecimal(samplesRequired / framesPerBeat);
...
}
Dumb mistake on my part: I'd been converting from byte counts to sample frames everywhere except in my method for getting upcoming MIDI events. My audio is now playing at a rate far closer to what I'd expect, and seems to be more reliable on the timing, though I haven't had a chance to fully test this yet.
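For anyone wondering where the factor of roughly 4 came from, here is a small sketch of the arithmetic. The helper names (BlockTiming, FramesInBuffer, FramesPerBeat, BeatsInBuffer) are invented for illustration, and it assumes the default format used above is 16-bit stereo at 44.1 kHz, so each sample frame occupies 4 bytes.

```csharp
using System;

public static class BlockTiming
{
    // Number of sample frames contained in byteCount bytes of interleaved PCM.
    public static int FramesInBuffer(int byteCount, int channels, int bitsPerSample)
    {
        int bytesPerFrame = channels * (bitsPerSample / 8);
        return byteCount / bytesPerFrame;
    }

    // Number of sample frames that make up one beat at the given tempo.
    public static double FramesPerBeat(double tempoBpm, int sampleRate)
    {
        return 60.0 / tempoBpm * sampleRate;
    }

    // How far the playback position should advance, in beats, for one Read call.
    public static double BeatsInBuffer(int byteCount, int channels, int bitsPerSample,
                                       double tempoBpm, int sampleRate)
    {
        return FramesInBuffer(byteCount, channels, bitsPerSample) / FramesPerBeat(tempoBpm, sampleRate);
    }
}
```

At 120 BPM a beat is 22050 frames, so an 88200-byte request covers exactly one beat. Dividing the raw byte count by frames per beat instead gives 4 beats, which would be consistent with the roughly 4x speed-up described in the question.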

C# performance - Using unsafe pointers instead of IntPtr and Marshal

Question
I'm porting a C application into C#. The C app calls lots of functions from a 3rd-party DLL, so I wrote P/Invoke wrappers for these functions in C#. Some of these C functions allocate data which I have to use in the C# app, so I used IntPtr's, Marshal.PtrToStructure and Marshal.Copy to copy the native data (arrays and structures) into managed variables.
Unfortunately, the C# app proved to be much slower than the C version. A quick performance analysis showed that the above-mentioned marshaling-based data copying is the bottleneck. I'm considering speeding up the C# code by rewriting it to use pointers instead. Since I don't have experience with unsafe code and pointers in C#, I need expert opinion regarding the following questions:
What are the drawbacks of using unsafe code and pointers instead of IntPtr and Marshaling? For example, is it more unsafe (pun intended) in any way? People seem to prefer marshaling, but I don't know why.
Is using pointers for P/Invoking really faster than using marshaling? How much speedup can be expected approximately? I couldn't find any benchmark tests for this.
Example code
To make the situation more clear, I hacked together a small example code (the real code is much more complex). I hope this example shows what I mean when I'm talking about "unsafe code and pointers" vs. "IntPtr and Marshal".
C library (DLL)
MyLib.h
#ifndef _MY_LIB_H_
#define _MY_LIB_H_
struct MyData
{
int length;
unsigned char* bytes;
};
__declspec(dllexport) void CreateMyData(struct MyData** myData, int length);
__declspec(dllexport) void DestroyMyData(struct MyData* myData);
#endif // _MY_LIB_H_
MyLib.c
#include <stdlib.h>
#include "MyLib.h"
void CreateMyData(struct MyData** myData, int length)
{
int i;
*myData = (struct MyData*)malloc(sizeof(struct MyData));
if (*myData != NULL)
{
(*myData)->length = length;
(*myData)->bytes = (unsigned char*)malloc(length * sizeof(char));
if ((*myData)->bytes != NULL)
for (i = 0; i < length; ++i)
(*myData)->bytes[i] = (unsigned char)(i % 256);
}
}
void DestroyMyData(struct MyData* myData)
{
if (myData != NULL)
{
if (myData->bytes != NULL)
free(myData->bytes);
free(myData);
}
}
C application
Main.c
#include <stdio.h>
#include "MyLib.h"
void main()
{
struct MyData* myData = NULL;
int length = 100 * 1024 * 1024;
printf("=== C++ test ===\n");
CreateMyData(&myData, length);
if (myData != NULL)
{
printf("Length: %d\n", myData->length);
if (myData->bytes != NULL)
printf("First: %d, last: %d\n", myData->bytes[0], myData->bytes[myData->length - 1]);
else
printf("myData->bytes is NULL");
}
else
printf("myData is NULL\n");
DestroyMyData(myData);
getchar();
}
C# application, which uses IntPtr and Marshal
Program.cs
using System;
using System.Runtime.InteropServices;
public static class Program
{
[StructLayout(LayoutKind.Sequential)]
private struct MyData
{
public int Length;
public IntPtr Bytes;
}
[DllImport("MyLib.dll")]
private static extern void CreateMyData(out IntPtr myData, int length);
[DllImport("MyLib.dll")]
private static extern void DestroyMyData(IntPtr myData);
public static void Main()
{
Console.WriteLine("=== C# test, using IntPtr and Marshal ===");
int length = 100 * 1024 * 1024;
IntPtr myData1;
CreateMyData(out myData1, length);
if (myData1 != IntPtr.Zero)
{
MyData myData2 = (MyData)Marshal.PtrToStructure(myData1, typeof(MyData));
Console.WriteLine("Length: {0}", myData2.Length);
if (myData2.Bytes != IntPtr.Zero)
{
byte[] bytes = new byte[myData2.Length];
Marshal.Copy(myData2.Bytes, bytes, 0, myData2.Length);
Console.WriteLine("First: {0}, last: {1}", bytes[0], bytes[myData2.Length - 1]);
}
else
Console.WriteLine("myData.Bytes is IntPtr.Zero");
}
else
Console.WriteLine("myData is IntPtr.Zero");
DestroyMyData(myData1);
Console.ReadKey(true);
}
}
C# application, which uses unsafe code and pointers
Program.cs
using System;
using System.Runtime.InteropServices;
public static class Program
{
[StructLayout(LayoutKind.Sequential)]
private unsafe struct MyData
{
public int Length;
public byte* Bytes;
}
[DllImport("MyLib.dll")]
private unsafe static extern void CreateMyData(out MyData* myData, int length);
[DllImport("MyLib.dll")]
private unsafe static extern void DestroyMyData(MyData* myData);
public unsafe static void Main()
{
Console.WriteLine("=== C# test, using unsafe code ===");
int length = 100 * 1024 * 1024;
MyData* myData;
CreateMyData(out myData, length);
if (myData != null)
{
Console.WriteLine("Length: {0}", myData->Length);
if (myData->Bytes != null)
Console.WriteLine("First: {0}, last: {1}", myData->Bytes[0], myData->Bytes[myData->Length - 1]);
else
Console.WriteLine("myData.Bytes is null");
}
else
Console.WriteLine("myData is null");
DestroyMyData(myData);
Console.ReadKey(true);
}
}
It's a slightly old thread, but I recently ran extensive performance tests with marshaling in C#. I need to unmarshal lots of data from a serial port over many days. It was important to me to have no memory leaks (because the smallest leak becomes significant after a couple of million calls), and I also ran a lot of statistical timing tests with very big structs (>10 KB) just for the sake of it (and no, you should never have a 10 KB struct :-) ).
I tested the following three unmarshalling strategies (I also tested the marshalling). In nearly all cases the first one (MarshalMatters) outperformed the other two.
The Marshal.Copy approach was always the slowest by far; the other two were mostly very close together.
Using unsafe code can pose a significant security risk.
First:
public class MarshalMatters
{
public static T ReadUsingMarshalUnsafe<T>(byte[] data) where T : struct
{
unsafe
{
fixed (byte* p = &data[0])
{
return (T)Marshal.PtrToStructure(new IntPtr(p), typeof(T));
}
}
}
public unsafe static byte[] WriteUsingMarshalUnsafe<selectedT>(selectedT structure) where selectedT : struct
{
byte[] byteArray = new byte[Marshal.SizeOf(structure)];
fixed (byte* byteArrayPtr = byteArray)
{
Marshal.StructureToPtr(structure, (IntPtr)byteArrayPtr, true);
}
return byteArray;
}
}
Second:
public class Adam_Robinson
{
private static T BytesToStruct<T>(byte[] rawData) where T : struct
{
T result = default(T);
GCHandle handle = GCHandle.Alloc(rawData, GCHandleType.Pinned);
try
{
IntPtr rawDataPtr = handle.AddrOfPinnedObject();
result = (T)Marshal.PtrToStructure(rawDataPtr, typeof(T));
}
finally
{
handle.Free();
}
return result;
}
/// <summary>
/// no Copy. no unsafe. Gets a GCHandle to the memory via Alloc
/// </summary>
/// <typeparam name="selectedT"></typeparam>
/// <param name="structure"></param>
/// <returns></returns>
public static byte[] StructToBytes<T>(T structure) where T : struct
{
int size = Marshal.SizeOf(structure);
byte[] rawData = new byte[size];
GCHandle handle = GCHandle.Alloc(rawData, GCHandleType.Pinned);
try
{
IntPtr rawDataPtr = handle.AddrOfPinnedObject();
Marshal.StructureToPtr(structure, rawDataPtr, false);
}
finally
{
handle.Free();
}
return rawData;
}
}
Third:
/// <summary>
/// http://stackoverflow.com/questions/2623761/marshal-ptrtostructure-and-back-again-and-generic-solution-for-endianness-swap
/// </summary>
public class DanB
{
/// <summary>
/// uses Marshal.Copy! Not run in unsafe. Uses AllocHGlobal to get new memory and copies.
/// </summary>
public static byte[] GetBytes<T>(T structure) where T : struct
{
var size = Marshal.SizeOf(structure); //or Marshal.SizeOf<T>(); in .net 4.5.1
byte[] rawData = new byte[size];
IntPtr ptr = Marshal.AllocHGlobal(size);
try
{
// fDeleteOld must be false: the freshly allocated block holds no valid structure to destroy
Marshal.StructureToPtr(structure, ptr, false);
Marshal.Copy(ptr, rawData, 0, size);
}
finally
{
Marshal.FreeHGlobal(ptr); // release the unmanaged block even if marshaling throws
}
return rawData;
}
public static T FromBytes<T>(byte[] bytes) where T : struct
{
var structure = new T();
int size = Marshal.SizeOf(structure); //or Marshal.SizeOf<T>(); in .net 4.5.1
IntPtr ptr = Marshal.AllocHGlobal(size);
try
{
Marshal.Copy(bytes, 0, ptr, size);
structure = (T)Marshal.PtrToStructure(ptr, structure.GetType());
}
finally
{
Marshal.FreeHGlobal(ptr); // release the unmanaged block even if marshaling throws
}
return structure;
}
}
Considerations in Interoperability explains why and when Marshaling is required and at what cost. Quote:
Marshaling occurs when a caller and a callee cannot operate on the same instance of data.
repeated marshaling can negatively affect the performance of your application.
Therefore, to answer your question of whether
... using pointers for P/Invoking really faster than using marshaling ...
first ask yourself whether the managed code can operate on the instance returned by the unmanaged method. If the answer is yes, then marshaling and its associated performance cost are not required.
The approximate time saving would be an O(n) function, where n is the size of the marshalled instance.
In addition, not keeping both the managed and unmanaged blocks of data in memory at the same time for the duration of the method (as in the "IntPtr and Marshal" example) eliminates additional overhead and memory pressure.
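As a rough illustration of the difference, the sketch below reads the first and last byte of an unmanaged block in two ways: by copying everything into a managed array first, and by reading in place. It does not call MyLib.dll; CreateBuffer is a stand-in that allocates with Marshal.AllocHGlobal and fills the block the same way the question's C code does, and all names here are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeBufferDemo
{
    // Stands in for the native library: allocates unmanaged memory and fills it with i % 256.
    // The caller is responsible for releasing it with Marshal.FreeHGlobal.
    public static IntPtr CreateBuffer(int length)
    {
        IntPtr buffer = Marshal.AllocHGlobal(length);
        for (int i = 0; i < length; i++)
        {
            Marshal.WriteByte(buffer, i, (byte)(i % 256));
        }
        return buffer;
    }

    // Copies the whole block into a managed array first: O(n) extra time and memory.
    public static (byte First, byte Last) ReadWithCopy(IntPtr buffer, int length)
    {
        byte[] managed = new byte[length];
        Marshal.Copy(buffer, managed, 0, length);
        return (managed[0], managed[length - 1]);
    }

    // Reads only the bytes that are needed, directly from unmanaged memory.
    public static (byte First, byte Last) ReadInPlace(IntPtr buffer, int length)
    {
        return (Marshal.ReadByte(buffer, 0), Marshal.ReadByte(buffer, length - 1));
    }
}
```

Both methods return the same values; the in-place version simply skips the second copy of the data. Unsafe pointers achieve the same effect with less per-access overhead than Marshal.ReadByte, at the cost of requiring an unsafe context.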
What are the drawbacks of using unsafe code and pointers ...
The drawback is the risk associated with accessing memory directly through pointers. It is no less safe than using pointers in C or C++. Use it if it is needed and makes sense. More details are here.
There is one "safety" concern with the presented examples: releasing the allocated unmanaged memory is not guaranteed if the managed code throws. The best practice is to release it in a finally block:
CreateMyData(out myData1, length);
if(myData1!=IntPtr.Zero) {
try {
// -> use myData1
...
// <-
}
finally {
DestroyMyData(myData1);
}
}
For anyone still reading,
Something I don't think I saw in any of the answers: unsafe code does present something of a security risk. It's not a huge risk, and it would be quite challenging to exploit. However, if, like me, you work in a PCI-compliant organization, unsafe code is disallowed by policy for this reason.
Managed code is normally very secure because the CLR takes care of memory location and allocation, preventing you from accessing or writing any memory you're not supposed to.
When you use the unsafe keyword and compile with '/unsafe' and use pointers, you bypass these checks and create the potential for someone to use your application to gain some level of unauthorized access to the machine it is running on. Using something like a buffer-overrun attack, your code could be tricked into writing instructions into an area of memory that might then be accessed by the program counter (i.e. code injection), or just crash the machine.
Many years ago, SQL server actually fell prey to malicious code delivered in a TDS packet that was far longer than it was supposed to be. The method reading the packet didn't check the length and continued to write the contents past the reserved address space. The extra length and content were carefully crafted such that it wrote an entire program into memory - at the address of the next method.
The attacker then had their own code being executed by the SQL server within a context that had the highest level of access. It didn't even need to break the encryption as the vulnerability was below this point in the transport layer stack.
Just wanted to add my experience to this old thread:
We used marshaling in sound recording software: we received real-time sound data from a mixer into native buffers and marshaled it to byte[]. That was a real performance killer. We were forced to move to unsafe structs as the only way to complete the task.
If you don't have large native structs and don't mind that all data is filled twice, marshaling is a more elegant and much, much safer approach.
Two answers:
Unsafe code means it is not managed by the CLR. You need to take care of the resources it uses.
You cannot quantify the performance gain because there are so many factors affecting it. But using pointers will definitely be much faster.
Because you stated that your code calls into a 3rd-party DLL, I think unsafe code is more suited to your scenario. You ran into the particular situation of wrapping a variable-length array in a struct; I know this kind of usage occurs all the time, but it's not always the case after all. You might want to have a look at some questions about this, for example:
How do I marshal a struct that contains a variable-sized array to C#?
If .. I say if .. you can modify the third party libraries a bit for this particular case, then you might consider the following usage:
using System;
using System.Runtime.InteropServices;
public static class Program { /*
[StructLayout(LayoutKind.Sequential)]
private struct MyData {
public int Length;
public byte[] Bytes;
} */
[DllImport("MyLib.dll")]
// __declspec(dllexport) void WINAPI CreateMyDataAlt(BYTE bytes[], int length);
private static extern void CreateMyDataAlt(byte[] myData, ref int length);
/*
[DllImport("MyLib.dll")]
private static extern void DestroyMyData(byte[] myData); */
public static void Main() {
Console.WriteLine("=== C# test, using IntPtr and Marshal ===");
int length = 100*1024*1024;
var myData1 = new byte[length];
CreateMyDataAlt(myData1, ref length);
if(0!=length) {
// MyData myData2 = (MyData)Marshal.PtrToStructure(myData1, typeof(MyData));
Console.WriteLine("Length: {0}", length);
/*
if(myData2.Bytes!=IntPtr.Zero) {
byte[] bytes = new byte[myData2.Length];
Marshal.Copy(myData2.Bytes, bytes, 0, myData2.Length); */
Console.WriteLine("First: {0}, last: {1}", myData1[0], myData1[length-1]); /*
}
else {
Console.WriteLine("myData.Bytes is IntPtr.Zero");
} */
}
else {
Console.WriteLine("myData is empty");
}
// DestroyMyData(myData1);
Console.ReadKey(true);
}
}
As you can see, much of your original marshalling code is commented out, and a CreateMyDataAlt(byte[], ref int) is declared for a corresponding modified external unmanaged function CreateMyDataAlt(BYTE [], int). Some of the data copying and pointer checks turn out to be unnecessary; that is, the code can be even simpler and probably runs faster.
So, what's different after the modification? The byte array is now marshalled directly, without wrapping it in a struct, and passed to the unmanaged side. You don't allocate the memory within the unmanaged code; rather, you just fill data into it (implementation details omitted), and after the call the data is available to the managed side. If you want to signal that the data was not filled and should not be used, you can simply set length to zero to tell the managed side. Because the byte array is allocated on the managed side, it will be garbage-collected eventually, so you don't have to take care of that.
I had the same question today and I was looking for some concrete measurement values, but I couldn't find any. So I wrote my own tests.
The test copies the pixel data of a 10k x 10k RGB image. The image data is 300 MB (3*10^8 bytes). Some methods copy this data 10 times; others are faster and therefore copy it 100 times. The copying methods tested are:
array access via byte pointer
Marshal.Copy(): a) 1 * 300 MB, b) 1e8 * 3 bytes
Buffer.BlockCopy(): a) 1 * 300 MB, b) 1e8 * 3 bytes
Test environment:
CPU: Intel Core i7-3630QM @ 2.40 GHz
OS: Win 7 Pro x64 SP1
Visual Studio 2015.3, code is C++/CLI, targeted .net version is 4.5.2, compiled for Debug.
Test results:
The CPU load is 100% for 1 core at all methods (equals 12.5% total CPU load).
Comparison of speed and execution time:
Method                        Speed     Exec. time
Marshal.Copy (1*300MB)        100 %     100 %
Buffer.BlockCopy (1*300MB)    98 %      102 %
Pointer                       4.4 %     2280 %
Buffer.BlockCopy (1e8*3B)     1.4 %     7120 %
Marshal.Copy (1e8*3B)         0.95 %    10600 %
Execution times and the calculated average throughput are written as comments in the code below.
//------------------------------------------------------------------------------
static void CopyIntoBitmap_Pointer (array<unsigned char>^ i_aui8ImageData,
BitmapData^ i_ptrBitmap,
int i_iBytesPerPixel)
{
char* scan0 = (char*)(i_ptrBitmap->Scan0.ToPointer ());
int ixCnt = 0;
for (int ixRow = 0; ixRow < i_ptrBitmap->Height; ixRow++)
{
for (int ixCol = 0; ixCol < i_ptrBitmap->Width; ixCol++)
{
char* pPixel = scan0 + ixRow * i_ptrBitmap->Stride + ixCol * i_iBytesPerPixel;
pPixel[0] = i_aui8ImageData[ixCnt++];
pPixel[1] = i_aui8ImageData[ixCnt++];
pPixel[2] = i_aui8ImageData[ixCnt++];
}
}
}
//------------------------------------------------------------------------------
static void CopyIntoBitmap_MarshallLarge (array<unsigned char>^ i_aui8ImageData,
BitmapData^ i_ptrBitmap)
{
IntPtr ptrScan0 = i_ptrBitmap->Scan0;
Marshal::Copy (i_aui8ImageData, 0, ptrScan0, i_aui8ImageData->Length);
}
//------------------------------------------------------------------------------
static void CopyIntoBitmap_MarshalSmall (array<unsigned char>^ i_aui8ImageData,
BitmapData^ i_ptrBitmap,
int i_iBytesPerPixel)
{
int ixCnt = 0;
for (int ixRow = 0; ixRow < i_ptrBitmap->Height; ixRow++)
{
for (int ixCol = 0; ixCol < i_ptrBitmap->Width; ixCol++)
{
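// Address of the current pixel: row offset (stride) plus column offset.
IntPtr ptrPixel = IntPtr::Add (i_ptrBitmap->Scan0, ixRow * i_ptrBitmap->Stride + ixCol * i_iBytesPerPixel);
Marshal::Copy (i_aui8ImageData, ixCnt, ptrPixel, i_iBytesPerPixel);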
ixCnt += i_iBytesPerPixel;
}
}
}
//------------------------------------------------------------------------------
void main ()
{
int iWidth = 10000;
int iHeight = 10000;
int iBytesPerPixel = 3;
Bitmap^ oBitmap = gcnew Bitmap (iWidth, iHeight, PixelFormat::Format24bppRgb);
BitmapData^ oBitmapData = oBitmap->LockBits (Rectangle (0, 0, iWidth, iHeight), ImageLockMode::WriteOnly, oBitmap->PixelFormat);
array<unsigned char>^ aui8ImageData = gcnew array<unsigned char> (iWidth * iHeight * iBytesPerPixel);
int ixCnt = 0;
for (int ixRow = 0; ixRow < iHeight; ixRow++)
{
for (int ixCol = 0; ixCol < iWidth; ixCol++)
{
aui8ImageData[ixCnt++] = ixRow * 250 / iHeight;
aui8ImageData[ixCnt++] = ixCol * 250 / iWidth;
aui8ImageData[ixCnt++] = ixCol;
}
}
//========== Pointer ==========
// ~ 8.97 sec for 10k * 10k * 3 * 10 exec, ~ 334 MB/s
int iExec = 10;
DateTime dtStart = DateTime::Now;
for (int ixExec = 0; ixExec < iExec; ixExec++)
{
CopyIntoBitmap_Pointer (aui8ImageData, oBitmapData, iBytesPerPixel);
}
TimeSpan tsDuration = DateTime::Now - dtStart;
Console::WriteLine (tsDuration + " " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));
//========== Marshal.Copy, 1 large block ==========
// 3.94 sec for 10k * 10k * 3 * 100 exec, ~ 7617 MB/s
iExec = 100;
dtStart = DateTime::Now;
for (int ixExec = 0; ixExec < iExec; ixExec++)
{
CopyIntoBitmap_MarshallLarge (aui8ImageData, oBitmapData);
}
tsDuration = DateTime::Now - dtStart;
Console::WriteLine (tsDuration + " " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));
//========== Marshal.Copy, many small 3-byte blocks ==========
// 41.7 sec for 10k * 10k * 3 * 10 exec, ~ 72 MB/s
iExec = 10;
dtStart = DateTime::Now;
for (int ixExec = 0; ixExec < iExec; ixExec++)
{
CopyIntoBitmap_MarshalSmall (aui8ImageData, oBitmapData, iBytesPerPixel);
}
tsDuration = DateTime::Now - dtStart;
Console::WriteLine (tsDuration + " " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));
//========== Buffer.BlockCopy, 1 large block ==========
// 4.02 sec for 10k * 10k * 3 * 100 exec, ~ 7467 MB/s
iExec = 100;
array<unsigned char>^ aui8Buffer = gcnew array<unsigned char> (aui8ImageData->Length);
dtStart = DateTime::Now;
for (int ixExec = 0; ixExec < iExec; ixExec++)
{
Buffer::BlockCopy (aui8ImageData, 0, aui8Buffer, 0, aui8ImageData->Length);
}
tsDuration = DateTime::Now - dtStart;
Console::WriteLine (tsDuration + " " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));
//========== Buffer.BlockCopy, many small 3-byte blocks ==========
// 28.0 sec for 10k * 10k * 3 * 10 exec, ~ 107 MB/s
iExec = 10;
dtStart = DateTime::Now;
for (int ixExec = 0; ixExec < iExec; ixExec++)
{
int ixCnt = 0;
for (int ixRow = 0; ixRow < iHeight; ixRow++)
{
for (int ixCol = 0; ixCol < iWidth; ixCol++)
{
Buffer::BlockCopy (aui8ImageData, ixCnt, aui8Buffer, ixCnt, iBytesPerPixel);
ixCnt += iBytesPerPixel;
}
}
}
tsDuration = DateTime::Now - dtStart;
Console::WriteLine (tsDuration + " " + ((double)aui8ImageData->Length * iExec / tsDuration.TotalSeconds / 1e6));
oBitmap->UnlockBits (oBitmapData);
oBitmap->Save ("d:\\temp\\bitmap.bmp", ImageFormat::Bmp);
}
related information:
Why is memcpy() and memmove() faster than pointer increments?
Array.Copy vs Buffer.BlockCopy, Answer https://stackoverflow.com/a/33865267
https://github.com/dotnet/coreclr/issues/2430 "Array.Copy & Buffer.BlockCopy x2 to x3 slower < 1kB"
https://github.com/dotnet/coreclr/blob/master/src/vm/comutilnative.cpp, Line 718 at the time of writing: Buffer.BlockCopy() uses memmove
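The measurements above were taken in C++/CLI with DateTime::Now. As a rough C# counterpart, the following sketch compares one large Buffer.BlockCopy against many small ones and times them with Stopwatch, which has a finer resolution. All names are hypothetical and no timings are claimed here; results depend on the machine and build configuration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for comparing copy granularity; not the code measured above.
public static class CopyGranularityDemo
{
    // One call that copies the whole array.
    public static void CopyLarge(byte[] src, byte[] dst)
    {
        Buffer.BlockCopy(src, 0, dst, 0, src.Length);
    }

    // Many calls that copy `chunk` bytes each (e.g. 3 bytes per RGB pixel),
    // plus one final call for any remainder.
    public static void CopySmall(byte[] src, byte[] dst, int chunk)
    {
        int offset = 0;
        for (; offset + chunk <= src.Length; offset += chunk)
        {
            Buffer.BlockCopy(src, offset, dst, offset, chunk);
        }
        if (offset < src.Length)
        {
            Buffer.BlockCopy(src, offset, dst, offset, src.Length - offset);
        }
    }

    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double MeasureMilliseconds(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

A call such as CopyGranularityDemo.MeasureMilliseconds(() => CopyGranularityDemo.CopySmall(src, dst, 3)) gives a per-run figure; averaging several runs in a Release build is advisable before drawing conclusions.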

How to speed up this calculation

Given two ARGB colors represented as integers, 8 bit/channel (alpha, red, green, blue), I need to compute a value that represents a sort of distance (also integer) between them.
So the formula for the distance is: Delta=|R1-R2|+|G1-G2|+|B1-B2|, where Rx, Gx and Bx are the values of the channels of colors 1 and 2. The alpha channel is always ignored.
I need to speed up this calculation because it is done many times on a slow machine. What is the 'geekiest' way to calculate this on a single thread, given the two integers?
My best so far is the following, but I guess it can be improved further:
//Used for color conversion from/to int
private const int ChannelMask = 0xFF;
private const int GreenShift = 8;
private const int RedShift = 16;
public int ComputeColorDelta(int color1, int color2)
{
int rDelta = Math.Abs(((color1 >> RedShift) & ChannelMask) - ((color2 >> RedShift) & ChannelMask));
int gDelta = Math.Abs(((color1 >> GreenShift) & ChannelMask) - ((color2 >> GreenShift) & ChannelMask));
int bDelta = Math.Abs((color1 & ChannelMask) - (color2 & ChannelMask));
return rDelta + gDelta + bDelta;
}
Long Answer:
How many is "a lot"?
I have a fast machine I guess, but I wrote this little script:
public static void Main() {
var s = Stopwatch.StartNew();
Random r = new Random();
for (int i = 0; i < 100000000; i++) {
int compute = ComputeColorDelta(r.Next(255), r.Next(255));
}
Console.WriteLine(s.ElapsedMilliseconds);
Console.ReadLine();
}
And the output is:
6878
So 7 seconds for 100 million times seems pretty good.
We can definitely speed this up though. I changed your function to look like this:
public static int ComputeColorDelta(int color1, int color2) {
return 1;
}
With that change, the output was: 5546. So, we managed to get a 1 second performance gain over 100 million iterations by returning a constant. ;)
Short answer: this function is not your bottleneck. :)
I'm trying to let the runtime do the calculation for me.
First of all, I define a struct with explicit field offsets (FieldOffset is measured in bytes):
[StructLayout(LayoutKind.Explicit)]
public struct Color
{
[FieldOffset(0)] public int Raw;
[FieldOffset(0)] public byte Blue;
[FieldOffset(1)] public byte Green;
[FieldOffset(2)] public byte Red;
[FieldOffset(3)] public byte Alpha;
}
the calculation function will be:
public int ComputeColorDeltaOptimized(Color color1, Color color2)
{
int rDelta = Math.Abs(color1.Red - color2.Red);
int gDelta = Math.Abs(color1.Green - color2.Green);
int bDelta = Math.Abs(color1.Blue - color2.Blue);
return rDelta + gDelta + bDelta;
}
And the usage
public void FactMethodName2()
{
var s = Stopwatch.StartNew();
var color1 = new Color(); // These are structs, so I can define them outside the loop and gain some performance
var color2 = new Color();
for (int i = 0; i < 100000000; i++)
{
color1.Raw = i;
color2.Raw = 100000000 - i;
int compute = ComputeColorDeltaOptimized(color1, color2);
}
Console.WriteLine(s.ElapsedMilliseconds); //5393 vs 7472 of original
Console.ReadLine();
}
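FieldOffset counts bytes, not bits, so the four channel bytes of a 32-bit color sit at offsets 0 to 3. The sketch below shows such a union and checks the channel mapping. The type and method names are hypothetical, and the Blue-at-offset-0 mapping assumes a little-endian machine.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical union over a 32-bit ARGB value. The byte offsets assume little-endian
// memory order, where the lowest byte of Raw (blue) is stored first.
[StructLayout(LayoutKind.Explicit)]
public struct ArgbUnion
{
    [FieldOffset(0)] public int Raw;
    [FieldOffset(0)] public byte Blue;
    [FieldOffset(1)] public byte Green;
    [FieldOffset(2)] public byte Red;
    [FieldOffset(3)] public byte Alpha;
}

public static class ArgbUnionDemo
{
    // Sum of absolute per-channel differences; alpha is ignored.
    public static int Delta(ArgbUnion a, ArgbUnion b)
    {
        return Math.Abs(a.Red - b.Red)
             + Math.Abs(a.Green - b.Green)
             + Math.Abs(a.Blue - b.Blue);
    }
}
```

Setting Raw to 0x11223344 on a little-endian machine yields Alpha 0x11, Red 0x22, Green 0x33 and Blue 0x44.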
One idea would be to use the same code you already have, but in a different order: apply the mask, take the difference, then shift.
Another modification that might help is to inline this function: that is, instead of calling it for each pair of colors, just compute the difference directly, inside whatever loop executes this code. I assume it is inside a tight loop, because otherwise its cost would be negligible.
Lastly, since you're probably getting image pixel data, you'd save a lot by going the unsafe route: make your bitmaps like this EditableBitmap, then grab the byte* and read the image data out of it.
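The first suggestion (mask, subtract, then shift) can be sketched as follows. The names are hypothetical. It relies on the fact that the difference of two masked values is an exact multiple of the channel's place value, so an arithmetic right shift recovers the signed channel difference. This saves one shift per channel compared with shifting both inputs first; whether that is measurable is something to verify on the target machine.

```csharp
using System;

// Hypothetical sketch of the "mask, difference, shift" ordering.
public static class ColorDelta
{
    public static int MaskDiffShift(int color1, int color2)
    {
        // Each masked difference is a multiple of 2^16 (red) or 2^8 (green),
        // so the signed shift gives the exact channel difference.
        int r = ((color1 & 0xFF0000) - (color2 & 0xFF0000)) >> 16;
        int g = ((color1 & 0x00FF00) - (color2 & 0x00FF00)) >> 8;
        int b = (color1 & 0x0000FF) - (color2 & 0x0000FF);
        return Math.Abs(r) + Math.Abs(g) + Math.Abs(b);
    }

    // Reference version: shift and mask each channel before subtracting.
    public static int Reference(int color1, int color2)
    {
        int r = Math.Abs(((color1 >> 16) & 0xFF) - ((color2 >> 16) & 0xFF));
        int g = Math.Abs(((color1 >> 8) & 0xFF) - ((color2 >> 8) & 0xFF));
        int b = Math.Abs((color1 & 0xFF) - (color2 & 0xFF));
        return r + g + b;
    }
}
```

Because the alpha byte is masked away before the subtraction, it cannot influence the result.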
You can do this in order to reduce the AND operations:
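public int ComputeColorDelta(int color1, int color2)
{
// Caution: masking the difference yields (R1 - R2) mod 256, so this is only correct when R1 >= R2.
int rDelta = Math.Abs(((color1 >> RedShift) - (color2 >> RedShift)) & ChannelMask);
// same for the other color channels (gDelta, bDelta)
return rDelta + gDelta + bDelta;
}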
not much but something...

What is the fastest way to convert a float[] to a byte[]?

I would like to get a byte[] from a float[] as quickly as possible, without looping through the whole array (via a cast, probably). Unsafe code is fine. Thanks!
I am looking for a byte array 4 times longer than the float array, since each float is composed of 4 bytes. I'll pass this to a BinaryWriter.
EDIT:
To those critics screaming "premature optimization":
I have benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is exactly sized to match the sector size on the disk. The binary writer wraps a file handle created with pinvoke'd win32 API. The optimization occurs since this lessens the number of function calls.
And, with regard to memory, this application creates massive caches which use plenty of memory. I can allocate the byte buffer once and re-use it many times--the double memory usage in this particular instance amounts to a roundoff error in the overall memory consumption of the app.
So I guess the lesson here is not to make premature assumptions ;)
There is a dirty fast (not unsafe code) way of doing this:
[StructLayout(LayoutKind.Explicit)]
struct BytetoDoubleConverter
{
[FieldOffset(0)]
public Byte[] Bytes;
[FieldOffset(0)]
public Double[] Doubles;
}
//...
static Double Sum(byte[] data)
{
BytetoDoubleConverter convert = new BytetoDoubleConverter { Bytes = data };
Double result = 0;
for (int i = 0; i < convert.Doubles.Length / sizeof(Double); i++)
{
result += convert.Doubles[i];
}
return result;
}
This will work, but I'm not sure of the support on Mono or newer versions of the CLR. The only strange thing is that the array.Length is the bytes length. This can be explained because it looks at the array length stored with the array, and because this array was a byte array that length will still be in byte length. The indexer does think about the Double being eight bytes large so no calculation is necessary there.
I've looked for it some more, and it's actually described on MSDN, How to: Create a C/C++ Union by Using Attributes (C# and Visual Basic), so chances are this will be supported in future versions. I am not sure about Mono though.
Premature optimization is the root of all evil! @Vlad's suggestion to iterate over each float is a much more reasonable answer than switching to a byte[]. Take the following table of runtimes for increasing numbers of elements (average of 50 runs):
Elements    BinaryWriter(float)    BinaryWriter(byte[])
-----------------------------------------------------------
10          8.72ms                 8.76ms
100         8.94ms                 8.82ms
1000        10.32ms                9.06ms
10000       32.56ms                10.34ms
100000      213.28ms               739.90ms
1000000     1955.92ms              10668.56ms
There is little difference between the two for small numbers of elements. Once you get into the huge number of elements range, the time spent copying from the float[] to the byte[] far outweighs the benefits.
So go with what is simple:
float[] data = new float[...];
foreach(float value in data)
{
writer.Write(value);
}
There is a way which avoids memory copying and iteration.
You can use a really ugly hack to temporarily change your array to another type using (unsafe) memory manipulation.
I tested this hack in both 32 & 64 bit OS, so it should be portable.
The source + sample usage is maintained at https://gist.github.com/1050703 , but for your convenience I'll paste it here as well:
public static unsafe class FastArraySerializer
{
[StructLayout(LayoutKind.Explicit)]
private struct Union
{
[FieldOffset(0)] public byte[] bytes;
[FieldOffset(0)] public float[] floats;
}
[StructLayout(LayoutKind.Sequential, Pack = 1)]
private struct ArrayHeader
{
public UIntPtr type;
public UIntPtr length;
}
private static readonly UIntPtr BYTE_ARRAY_TYPE;
private static readonly UIntPtr FLOAT_ARRAY_TYPE;
static FastArraySerializer()
{
fixed (void* pBytes = new byte[1])
fixed (void* pFloats = new float[1])
{
BYTE_ARRAY_TYPE = getHeader(pBytes)->type;
FLOAT_ARRAY_TYPE = getHeader(pFloats)->type;
}
}
public static void AsByteArray(this float[] floats, Action<byte[]> action)
{
if (floats.handleNullOrEmptyArray(action))
return;
var union = new Union {floats = floats};
union.floats.toByteArray();
try
{
action(union.bytes);
}
finally
{
union.bytes.toFloatArray();
}
}
public static void AsFloatArray(this byte[] bytes, Action<float[]> action)
{
if (bytes.handleNullOrEmptyArray(action))
return;
var union = new Union {bytes = bytes};
union.bytes.toFloatArray();
try
{
action(union.floats);
}
finally
{
union.floats.toByteArray();
}
}
public static bool handleNullOrEmptyArray<TSrc,TDst>(this TSrc[] array, Action<TDst[]> action)
{
if (array == null)
{
action(null);
return true;
}
if (array.Length == 0)
{
action(new TDst[0]);
return true;
}
return false;
}
private static ArrayHeader* getHeader(void* pBytes)
{
return (ArrayHeader*)pBytes - 1;
}
private static void toFloatArray(this byte[] bytes)
{
fixed (void* pArray = bytes)
{
var pHeader = getHeader(pArray);
pHeader->type = FLOAT_ARRAY_TYPE;
pHeader->length = (UIntPtr)(bytes.Length / sizeof(float));
}
}
private static void toByteArray(this float[] floats)
{
fixed(void* pArray = floats)
{
var pHeader = getHeader(pArray);
pHeader->type = BYTE_ARRAY_TYPE;
pHeader->length = (UIntPtr)(floats.Length * sizeof(float));
}
}
}
And the usage is:
var floats = new float[] {0, 1, 0, 1};
floats.AsByteArray(bytes =>
{
foreach (var b in bytes)
{
Console.WriteLine(b);
}
});
If you do not want any conversion to happen, I would suggest Buffer.BlockCopy().
public static void BlockCopy(
Array src,
int srcOffset,
Array dst,
int dstOffset,
int count
)
For example:
float[] floatArray = new float[1000];
byte[] byteArray = new byte[floatArray.Length * 4];
Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
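A small round-trip sketch of the Buffer.BlockCopy approach, with hypothetical helper names. Note that the count argument of BlockCopy is always in bytes, regardless of the element type of the arrays.

```csharp
using System;

// Hypothetical helpers wrapping Buffer.BlockCopy for float[] <-> byte[].
public static class FloatBytes
{
    // Copies the raw bytes of every float into a new byte[] (4 bytes per float).
    public static byte[] ToBytes(float[] floats)
    {
        var bytes = new byte[floats.Length * sizeof(float)];
        Buffer.BlockCopy(floats, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Reverse direction; the input length must be a multiple of 4.
    public static float[] ToFloats(byte[] bytes)
    {
        if (bytes.Length % sizeof(float) != 0)
        {
            throw new ArgumentException("Length must be a multiple of 4.", nameof(bytes));
        }
        var floats = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, floats, 0, bytes.Length);
        return floats;
    }
}
```

The bytes come out in the machine's native byte order, the same order BitConverter.GetBytes produces for a single float.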
You're better off letting the BinaryWriter do this for you. Every method has to iterate over your entire data set anyway, so there's no point in playing with bytes.
Although you can obtain a byte* pointer using unsafe and fixed, you cannot convert the byte* to a byte[] that the writer accepts as a parameter without copying the data. You do not want that copy, as it doubles your memory footprint and adds an extra pass on top of the unavoidable pass needed to output the data to disk.
Instead, you are still better off iterating over the array of floats and writing each float to the writer individually, using the Write(float) method. It will still be fast because of buffering inside the writer. See sixlettervariables's numbers.
Using the new Span<> in .Net Core 2.1 or later...
byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
Or, if Span can be used instead, then a direct reinterpret cast can be done: (very fast - zero copying)
Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
// with span we can get a byte, set a byte, iterate, and more.
byte someByte = byteSpan[2];
byteSpan[2] = 33;
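Since the original goal was to write the floats out, a span view can also be handed straight to a stream without building a byte[] first. This is a minimal sketch with hypothetical names, assuming .NET Core 2.1 or later, where Stream.Write accepts a ReadOnlySpan<byte>.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper; assumes a runtime with Span support (.NET Core 2.1+).
public static class SpanFloatWriter
{
    // Reinterprets the float[] as bytes (no intermediate array) and writes them to the stream.
    public static void WriteFloats(Stream stream, float[] floats)
    {
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(new ReadOnlySpan<float>(floats));
        stream.Write(bytes);
    }
}
```

The bytes are written in native byte order, so a reader on a machine with different endianness would need to swap them.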
I did some crude benchmarks. The time taken for each is in the comments. [release/no debugger/x64]
float[] floatArray = new float[100];
for (int i = 0; i < 100; i++) floatArray[i] = i * 7.7777f;
Stopwatch start = Stopwatch.StartNew();
for (int j = 0; j < 100; j++)
{
start.Restart();
for (int k = 0; k < 1000; k++)
{
Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
}
long timeTaken1 = start.ElapsedTicks; ////// 0 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
}
long timeTaken2 = start.ElapsedTicks; ////// 26 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
for (int i = 0; i < floatArray.Length; i++)
BitConverter.GetBytes(floatArray[i]).CopyTo(byteArray, i * sizeof(float));
}
long timeTaken3 = start.ElapsedTicks; ////// 1310 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
}
long timeTaken4 = start.ElapsedTicks; ////// 33 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
MemoryStream memStream = new MemoryStream();
BinaryWriter writer = new BinaryWriter(memStream);
foreach (float value in floatArray)
writer.Write(value);
writer.Close();
}
long timeTaken5 = start.ElapsedTicks; ////// 1080 ticks //////
Console.WriteLine($"{timeTaken1/10,6} {timeTaken2 / 10,6} {timeTaken3 / 10,6} {timeTaken4 / 10,6} {timeTaken5 / 10,6} ");
}
We have a class called LudicrousSpeedSerialization and it contains the following unsafe method:
static public byte[] ConvertFloatsToBytes(float[] data)
{
int n = data.Length;
byte[] ret = new byte[n * sizeof(float)];
if (n == 0) return ret;
unsafe
{
fixed (byte* pByteArray = &ret[0])
{
float* pFloatArray = (float*)pByteArray;
for (int i = 0; i < n; i++)
{
pFloatArray[i] = data[i];
}
}
}
return ret;
}
Although it basically does do a for loop behind the scenes, it does do the job in one line
byte[] byteArray = floatArray.Select(
f=>System.BitConverter.GetBytes(f)).Aggregate(
(bytes, f) => {List<byte> temp = bytes.ToList(); temp.AddRange(f); return temp.ToArray(); });
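The Aggregate version above rebuilds a list and an array for every element, so its cost grows quadratically with the array length. If a LINQ one-liner is still wanted, SelectMany flattens the per-float byte arrays in a single pass. This is a sketch with hypothetical names; it is still expected to be slower than Buffer.BlockCopy because it allocates a 4-byte array per float.

```csharp
using System;
using System.Linq;

// Hypothetical LINQ-based helper; convenient rather than fast.
public static class LinqFloatBytes
{
    // Converts each float to its 4 bytes and concatenates the results in order.
    public static byte[] ToBytes(float[] floats)
    {
        return floats.SelectMany(f => BitConverter.GetBytes(f)).ToArray();
    }
}
```

Unlike Aggregate without a seed, this also handles an empty input array and simply returns an empty byte[].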
