I'm using waveOutWrite with a callback function. In native code everything is fast, but under .NET it is much slower, sometimes 5 or 10 times slower, to the point where I think I'm doing something very wrong.
Posting both sets of code seems like too much, so I'll post only the C code that is fast and point out the minor differences in the .NET code.
HANDLE WaveEvent;
const int TestCount = 100;
HWAVEOUT hWaveOut[1]; // don't ask why this is an array, just test code
WAVEHDR woh[1][20];
void CALLBACK OnWaveOut(HWAVEOUT,UINT uMsg,DWORD,DWORD,DWORD)
{
if(uMsg != WOM_DONE)
return;
assert(SetEvent(WaveEvent)); // .NET code uses EventWaitHandle.Set()
}
void test(void)
{
WaveEvent = CreateEvent(NULL,FALSE,FALSE,NULL);
assert(WaveEvent);
WAVEFORMATEX wf;
memset(&wf,0,sizeof(wf));
wf.wFormatTag = WAVE_FORMAT_PCM;
wf.nChannels = 1;
wf.nSamplesPerSec = 8000;
wf.wBitsPerSample = 16;
wf.nBlockAlign = WORD(wf.nChannels*(wf.wBitsPerSample/8));
wf.nAvgBytesPerSec = (wf.wBitsPerSample/8)*wf.nSamplesPerSec;
assert(waveOutOpen(&hWaveOut[0],WAVE_MAPPER,&wf,(DWORD)OnWaveOut,0,CALLBACK_FUNCTION) == MMSYSERR_NOERROR);
for(int x=0;x<2;x++)
{
memset(&woh[0][x],0,sizeof(woh[0][x]));
woh[0][x].dwBufferLength = PCM_BUF_LEN;
woh[0][x].lpData = (char*) malloc(woh[0][x].dwBufferLength);
assert(waveOutPrepareHeader(hWaveOut[0],&woh[0][x],sizeof(woh[0][x])) == MMSYSERR_NOERROR);
assert(waveOutWrite(hWaveOut[0],&woh[0][x],sizeof(woh[0][x])) == MMSYSERR_NOERROR);
}
int bufferIndex = 0;
DWORD times[TestCount];
for(int x=0;x<TestCount;x++)
{
DWORD t = timeGetTime();
assert(WaitForSingleObject(WaveEvent,INFINITE) == WAIT_OBJECT_0); // .NET code uses EventWaitHandle.WaitOne()
assert(woh[0][bufferIndex].dwFlags & WHDR_DONE);
assert(waveOutWrite(hWaveOut[0],&woh[0][bufferIndex],sizeof(woh[0][bufferIndex])) == MMSYSERR_NOERROR);
bufferIndex = bufferIndex == 0 ? 1 : 0;
times[x] = timeGetTime() - t;
}
}
The times[] array for the C code always has values around 80 (milliseconds), which is the duration of the PCM buffer I am using. The .NET code sometimes shows similar values, but more often it shows values in the 300 to 500 range, and sometimes values as high as 1000.
Doing the work of the bottom loop inside the OnWaveOut callback, instead of using events, makes it fast all the time, with .NET or native code. So the issue appears to be with waiting on events in .NET only, and mostly when "other stuff" is happening on the test PC. It does not take much: moving a window around or opening a folder in My Computer is enough.
Maybe .NET events are just really bad about context switching, or .NET apps/threads in general? In the app I'm using to test my .NET code, the code just runs in the constructor of a form (an easy place to add test code), not on a thread-pool thread or anything.
I also tried using the version of waveOutOpen that takes an event instead of a function callback. This is also slow in .NET but not in C, so again, it points to an issue with events and/or context switching.
I'm trying to keep my code simple and setting an event to do the work outside the callback is the best way I can do this with my overall design. Actually just using the event driven waveOut is even better, but I tried this other method because straight callbacks are fast, and I didn't expect normal event wait handles to be so slow.
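For anyone who wants to reproduce the .NET side outside a form constructor, here is a minimal sketch of the signal-and-wait pattern described above. It is not the asker's code: a background thread stands in for the waveOut callback, and the names EventWaitSketch, MeasureWaits and the second "consumed" event are assumptions made for illustration. Running it from a console app and from a form constructor may help isolate whether the delays come from the event itself or from where the wait loop runs.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class EventWaitSketch
{
    // Runs `count` signal/wait round trips and returns how long each wait took, in ms.
    public static long[] MeasureWaits(int count, int periodMs)
    {
        var times = new long[count];
        using (var waveEvent = new AutoResetEvent(false))
        using (var consumed = new AutoResetEvent(false))
        {
            // Stand-in for the waveOut callback (an assumption, not the real API):
            // signal once per "buffer", then wait until the consumer has handled it.
            var producer = new Thread(() =>
            {
                for (int i = 0; i < count; i++)
                {
                    Thread.Sleep(periodMs);   // pretend a buffer is playing
                    waveEvent.Set();          // what the callback does on WOM_DONE
                    consumed.WaitOne();       // avoid setting twice before a wait
                }
            });
            producer.IsBackground = true;
            producer.Start();

            var sw = new Stopwatch();
            for (int i = 0; i < count; i++)
            {
                sw.Restart();
                waveEvent.WaitOne();          // same role as WaitForSingleObject
                times[i] = sw.ElapsedMilliseconds;
                consumed.Set();               // the real loop would call waveOutWrite here
            }
            producer.Join();
        }
        return times;
    }
}
```

Calling EventWaitSketch.MeasureWaits(100, 80) mirrors the TestCount and buffer duration from the question; the returned array plays the role of times[].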
This may not be 100% related, but I faced much the same issue: calling EventWaitHandle.Set a number of times is fine, but past some threshold that I can't pin down, each call to this method takes a full second!
It appears that some of the .NET ways to synchronize threads are much slower than the ones you use in C++.
Jon Skeet once wrote a post on his website (https://jonskeet.uk/csharp/threads/waithandles.html) in which he also refers to the rather complex concept of .NET synchronization domains, explained here: https://www.drdobbs.com/windows/synchronization-domains/184405771
He mentions that .NET and the OS must communicate with very precise timing, using objects that have to be converted from one environment to the other. All of this is time consuming.
I have summarized a lot here, and I don't want to take credit for the answer, but there is an explanation. There are also some recommendations here (https://learn.microsoft.com/en-us/dotnet/standard/threading/overview-of-synchronization-primitives) on choosing a synchronization primitive for a given context, and the performance aspect is touched on briefly.
This question already has answers here:
WPF CreateBitmapSourceFromHBitmap() memory leak
(6 answers)
Closed 3 years ago.
3 Overall Questions
By trial and error, commenting out certain lines, it is possible to determine which line or function is causing a memory leak. I have done this. However, in Visual Studio I cannot figure out how to see how much memory each variable is taking up, so I can't tell which one is causing the leak. That is my first question.
My second question is what exactly memory leaks are. I have read many posts about references to variables stopping the GC and so on, but I really need someone to explain it more simply, because that just doesn't make sense to me. I am sure many others feel the same way.
Lastly, how do you stop them? I have seen many articles on "unsubscribing" from events and such, but none of them seem to apply to me, so none of those techniques solve this memory leak. I want to know a more complete way of preventing leaks, which I am sure would be easy to understand if I knew what they actually are.
Personal Question
This is one of those "my code is not working" questions. But wait... don't rage yet. My code has a memory leak; however, from my very limited knowledge of what they are, I do not understand how this is possible. The function with the memory leak does not have any lines where I add values to a variable (to make it larger), and there are no local variables (which would be garbage collected at the end of the function, but may not be, and might cause memory leaks). My memory leak is very severe: memory use grows by approximately a gigabyte a minute. It might also be useful to know that I am using WPF; you may recognise the file name patterns: NormalUI.xaml, NormalUI.xaml.cs, Program.cs.
I did not attach my full code, this is because there is no need and I am not lazy. I have made sure that I simplified my code (removed stuff) as much as I could while keeping the error present. This is also so that anybody experiencing the same problem can easily look at this answer and fix their own code.
Here is an image of the quite scary memory leak:
Simplified Code - NormalUI.xaml
<Image x:Name="Map"/>
Simplified Code - NormalUI.xaml.cs
public NormalUI()
{
// Setup some variables
// Start the timer to continually update the UI
System.Timers.Timer EverythingUpdaterTimer = new System.Timers.Timer();
EverythingUpdaterTimer.Elapsed += new ElapsedEventHandler(UpdateEverything);
EverythingUpdaterTimer.Interval = 100; // this has a memory leak
EverythingUpdaterTimer.Enabled = true;
InitializeComponent();
}
public void UpdateEverything(object source, ElapsedEventArgs e) // THIS FUNCTION CALLS THE FUNCTION WITH THE MEMORY LEAK (I THINK)
{
try // Statistics and map (won't work at the start before they have been declared... since this is a timer)
{
Dispatcher.Invoke(() => // Because this is a separate "thread"... I think...
{
// Update the map (height and width in the case of a load)
Map.Height = MAP_HEIGHT;
Map.Width = MAP_WIDTH;
Map.Source = Program.CalculateMap();
});
}
catch
{
Program.Log("Couldn't update general UI information");
}
}
Simplified Code - Program.cs
public static ImageSource CalculateMap()
{
/* This function is responsible for creating the map. The map is actually one large
* picture, but this function gets each individual smaller image (icon) and essentially
* "squashes" the images together to make the larger map. This function is used every
* single turn to update what the map looks like.
*/
List<List<Image>> map = new List<List<Image>>(world.rows);
// Create the empty world
for (int r = 0; r < world.rows; r++)
{
map.Add(new List<Image>(world.cols)); // Add images to list
for (int c = 0; c < world.cols; c++)
{
map[r].Add(EMPTY_ICON); // Add images to list
}
}
// Create the bitmap and convert it into a form which can be edited
Bitmap bitmapMap = new Bitmap(world.rows * ICON_WIDTH, world.cols * ICON_HEIGHT); // Get it? bitmapMap... bitMap...
Graphics g = Graphics.FromImage(bitmapMap);
// Add images to the bitmap
for (int r = 0; r < world.rows; r++)
{
for (int c = 0; c < world.cols; c++)
{
g.DrawImage(map[r][c], (c) * ICON_WIDTH, (r) * ICON_HEIGHT, ICON_WIDTH, ICON_HEIGHT);
}
}
// Convert it to a form usable by WPF (ImageSource).
ImageSource imageSourceMap = System.Windows.Interop.Imaging.CreateBitmapSourceFromHBitmap(
bitmapMap.GetHbitmap(),
IntPtr.Zero,
Int32Rect.Empty,
BitmapSizeOptions.FromEmptyOptions());
// Return the new ImageSource.
return imageSourceMap;
}
I would very much appreciate an answer to my personal question... as well as the overall question.
The only objects that require disposing are those that wrap non-.NET (unmanaged) resources. If the underlying OS requires something like CloseHandle (or similar) to free the resource, then you need to dispose of that object in your .NET code.
If a .NET class implements IDisposable, you MUST call .Dispose() on its instances when you are done with them.
You can use tools such as the code analyser built into VS to point out objects that have a Dispose method you aren't calling.
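To make the Dispose advice concrete, here is a small sketch. NativeResource is a hypothetical stand-in for any wrapper around an OS handle, and its OpenCount counter exists only so the effect is visible. In the question's CalculateMap, both Bitmap and Graphics implement IDisposable and are never disposed, and the documentation for Bitmap.GetHbitmap says the caller is responsible for freeing the returned handle with the GDI DeleteObject function, which is also what the linked duplicate is about.

```csharp
using System;

// Hypothetical wrapper around an unmanaged resource (for example a GDI handle).
public sealed class NativeResource : IDisposable
{
    // Number of resources currently held; only here to make leaks observable.
    public static int OpenCount { get; private set; }
    private bool _disposed;

    public NativeResource() { OpenCount++; }

    public void Dispose()
    {
        if (_disposed) return;   // Dispose should be safe to call more than once
        _disposed = true;
        OpenCount--;             // a real wrapper would release the OS handle here
    }
}

public static class DisposeSketch
{
    // Allocates a resource on every call and never releases it.
    public static void LeakyUpdate()
    {
        var resource = new NativeResource();
        GC.KeepAlive(resource);  // "use" the resource; nothing ever calls Dispose
    }

    // Same work, but `using` guarantees Dispose runs, even if an exception is thrown.
    public static void CleanUpdate()
    {
        using (var resource = new NativeResource())
        {
            GC.KeepAlive(resource);
        }
    }
}
```

If a timer calls LeakyUpdate every 100 ms, OpenCount grows without bound, while with CleanUpdate it stays flat.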
I am writing a multi-threaded solution that will be used to transfer data from different sources to a central database. The solution has, in general, two parts:
A single-threaded Import engine
A multi-threaded client that invokes the Import engine on several threads
To minimize custom development I am using Roslyn scripting. This feature is added to the Import engine project through the NuGet package manager.
Every import is defined as a transformation of an input table, which has a collection of input fields, to a destination table, which has a collection of destination fields.
The scripting engine is used here to allow custom transformations between input and output. For every input/output pair there is a text field holding a custom script. Here is the simplified code used for script initialization:
//Instance of class passed to script engine
_ScriptHost = new ScriptHost_Import();
if (Script != "") //Here we have script fetched from DB as text
{
try
{
//We are creating script object …
ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
//… and we are compiling it upfront to save time since this might be invoked multiple times.
ScriptObject.Compile();
IsScriptCompiled = true;
}
catch
{
IsScriptCompiled = false;
}
}
Later we will invoke this script with:
async Task<string> RunScript()
{
return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue.ToString();
}
So, after an import definition is initialized, where we might have any number of input/output pair descriptions, each with its script object, the memory footprint increases by approximately 50 MB per pair for which a script is defined.
A similar usage pattern is applied to validating destination rows before storing them in the DB (every field might have several scripts that check the validity of the data).
All in all, the typical memory footprint with modest transformation/validation scripting is 200 MB per thread. If we need to run several threads, memory usage will be very high, and 99% of it will be used for scripting.
If the Import engine is enclosed in a WCF-based middle layer (which is what I did), we quickly run into an "Insufficient memory" problem.
The obvious solution would be to have one scripting instance that somehow dispatches execution to a specific function inside the script depending on the need (input/output transformation, validation, or something else). That is, instead of script text for every field we would have a SCRIPT_ID that is passed as a global parameter to the script engine. Somewhere in the script we would switch to the specific portion of code, which would execute and return the appropriate value.
The benefit of such a solution should be considerably better memory usage. The drawback is that script maintenance is moved away from the specific point where the script is used.
Before implementing this change, I would like to hear opinions about this solution and suggestions for different approaches.
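The dispatch idea can be sketched in plain C#, leaving Roslyn out of it. This is only an illustration under assumptions: the IDs, the two transformations and the ScriptDispatcher name are all made up, and in the real design the table would live inside the single compiled script, with SCRIPT_ID arriving through the globals object.

```csharp
using System;
using System.Collections.Generic;

public static class ScriptDispatcher
{
    // One entry per former per-field script; the IDs and bodies are placeholders.
    private static readonly Dictionary<int, Func<string, string>> Handlers =
        new Dictionary<int, Func<string, string>>
        {
            { 1, input => input.Trim() },             // e.g. an input/output transformation
            { 2, input => input.ToUpperInvariant() }, // e.g. a normalisation step
        };

    // Looks up the handler for scriptId and applies it to the input value.
    public static string Run(int scriptId, string input)
    {
        Func<string, string> handler;
        if (!Handlers.TryGetValue(scriptId, out handler))
            throw new ArgumentException("Unknown SCRIPT_ID: " + scriptId, nameof(scriptId));
        return handler(input);
    }
}
```

For example, ScriptDispatcher.Run(1, "  abc ") returns "abc". A dictionary is used here instead of a switch statement only because it keeps each handler next to its ID.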
It seems that using scripting for this job might be wasteful overkill: you go through many application layers and memory fills up.
Other solutions:
How do you interface with the DB? You can manipulate the query itself according to your needs instead of writing a whole script for that.
How about using generics, with enough T's to fit your needs:
public class ImportEngine<T1,T2,T3,T4,T5>
Using tuples (which is pretty much like using generics)
But if you still think scripts are the right tool for you, I found that the memory usage of scripts can be lowered by running the script's work inside your application rather than inside RunAsync. You do this by having RunAsync return the logic, which you then re-use, instead of doing the work inside the heavy, memory-hungry RunAsync. Here is an example:
Instead of simply (the script string):
DoSomeWork();
You can do this (IHaveWork is an interface defined in your app, with a single method, Work):
public class ScriptWork : IHaveWork
{
// implements IHaveWork.Work; the return type was missing in the original snippet
public void Work()
{
DoSomeWork();
}
}
return new ScriptWork();
This way you call the heavy RunAsync only for a short period, and it returns a worker that you can re-use inside your application (and you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on).
The pattern also breaks the isolation between your app and the script, so you can easily pass data to the script and get data back from it.
EDIT
Some quick benchmark:
This code:
static void Main(string[] args)
{
Console.WriteLine("Compiling");
string code = "System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\");";
List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair-play
for (int i = 0; i < 10; i++)
Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
}
Consumes about 600 MB in my environment (I referenced System.Windows.Forms in the ScriptOptions only to increase the size of the scripts).
It reuses the Script<object>: a second call to RunAsync does not consume more memory.
But we can do better:
static void Main(string[] args)
{
Console.WriteLine("Compiling");
string code = "return () => { System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\"); };";
List<Action> scripts = Enumerable.Range(0, 50).Select(async num =>
await CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).Select(t => t.Result).ToList();
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
for (int i = 0; i < 10; i++)
Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
}
In this script I'm simplifying the solution I proposed a bit by returning an Action object, but I think the performance impact is small (in a real implementation I really think you should use your own interface to keep it flexible).
When the script is running, you can see a steep rise in memory to about 240 MB, but after I call the garbage collector (for demonstration purposes, and I did the same in the previous code) memory usage drops back to about 30 MB. It is also faster.
I am not sure whether this existed when the question was asked, but there is a very similar and, let's say, official way to run a script multiple times without increasing the program's memory usage. You need to use the CreateDelegate method, which does exactly what is expected.
I will post it here just for convenience:
var script = CSharpScript.Create<int>("X*Y", globalsType: typeof(Globals));
ScriptRunner<int> runner = script.CreateDelegate();
for (int i = 0; i < 10; i++)
{
Console.WriteLine(await runner(new Globals { X = i, Y = i }));
}
It takes some memory initially, but you can keep the runner in some global list and invoke it quickly later.
Profiling my application reveals that 50% of the runtime is spent in a packArrays() function, which performs array transformations, a kind of work where C++ strongly outperforms C#.
To improve performance, I used unsafe code in packArrays, but gained only a low single-digit percentage improvement in runtime. To rule out the cache as the bottleneck and to estimate the ceiling for improvement, I wrote packArrays in C++ and timed both languages. The C++ version runs approximately 5x faster than C#, so I decided to give C++/CLI a try.
As a result, I have three implementations:
C++ - a simple packArrays() function
C# - packArrays() is wrapped into a class, however the code inside the function is identical to the C++ version
C++/CLI - shown below, but again the implementation of packArrays() is identical (literally) to the previous two
The C++/CLI implementation is as follows
QCppCliPackArrays.cpp
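public ref class QCppCliPackArrays
{
public: // members of a ref class are private by default
void pack(array<bool> ^ xBoolArray, int xLen, array<bool> ^% yBoolArray, int % yLen)
{
// prepare variables: pin the managed arrays so the GC cannot move them during the native call
// (yBoolArray is declared as array<bool> here to match the bool * parameter of packArrays)
pin_ptr<bool> xBoolArrayPinned = &xBoolArray[0];
bool * xBoolArray_ = xBoolArrayPinned;
pin_ptr<bool> yBoolArrayPinned = &yBoolArray[0];
bool * yBoolArray_ = yBoolArrayPinned;
// go: a tracking reference (int %) cannot bind to a native int &, so copy through a local
int yLenNative = yLen;
packArrays(xBoolArray_, xLen, yBoolArray_, yLenNative);
yLen = yLenNative;
}
};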
packArraysWorker.cpp
#pragma managed(push, off)
void packArrays(bool * xArray, int xLen, bool * yArray, int & yLen)
{
... actual code that is identical across languages code ...
}
#pragma managed(pop)
QCppCliPackArrays.cpp is compiled with the /clr option; packArraysWorker.cpp is compiled with the No Common Language RunTime Support option.
The problem: When using a C# application to run both C# and C++/CLI implementations, C++/CLI implementation is still only marginally faster than C#.
Questions:
Is there any other option/setting/keyword I can use to increase the performance of C++/CLI?
Can the performance loss of C++/CLI compared to C++ be wholly attributed to interop? Currently, for 10K repetitions C# runs some 4.5 seconds slower than C++, which would put interop at 0.45 milliseconds per repetition. As all types being passed are blittable, I would expect the interop to, well, just pass over some pointers.
Would I gain anything by using P/Invoke? From what I read not, but it's always better to ask.
Is there any other method I can use? Leaving a five-fold increase in performance on the table is just too much.
All timings are made in Release/x64 from the command line (not from VS) on a single thread.
EDIT:
To determine the performance loss due to interop, I placed a Stopwatch around the QCppCliPackArrays::packArrays() call, as well as a chrono::high_resolution_clock inside packArrays() itself. The results show that the C# <-> C++/CLI switch costs approximately 5 milliseconds per 10K calls. The switch from managed C++/CLI to unmanaged C++/CLI, according to the results, costs nothing.
Hence, interop can be ruled out as the cause of the performance degradation.
On the other hand, it's obvious that packArrays() is NOT run as unmanaged! But why?
EDIT 2:
I tried to link the packArrays() as a .lib file exported from a separate unmanaged C++ library. Results are still the same.
EDIT 3:
The actual packArrays is this
public void packArrays(bool[] xConditions, int[] xValues, int xLen, ref int[] yValuesPacked, ref int yPackedLen)
{
// alloc
yPackedLen = xConditions.trueCount();
yValuesPacked = new int [yPackedLen];
// fill
int xPackedIdx = 0;
for (int xIdx = 0; xIdx < xLen; xIdx++)
if (xConditions[xIdx] == true)
yValuesPacked[xPackedIdx++] = xValues[xIdx];
}
It puts into yValuesPacked all values from xValues for which the corresponding xConditions[i] is true.
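As a self-contained illustration of what the function computes, here is a sketch with a small worked input. It is not one of the benchmarked implementations: TrueCount is a hypothetical stand-in for the trueCount() extension used above (its real implementation is not shown), and the result is returned instead of being passed back through ref parameters.

```csharp
using System;

public static class Packer
{
    // Hypothetical stand-in for the question's trueCount() extension method.
    public static int TrueCount(bool[] flags, int len)
    {
        int count = 0;
        for (int i = 0; i < len; i++)
            if (flags[i]) count++;
        return count;
    }

    // Copies values[i] into the result for every i < len where conditions[i] is true,
    // preserving the original order.
    public static int[] PackArrays(bool[] conditions, int[] values, int len)
    {
        var packed = new int[TrueCount(conditions, len)]; // first pass: size the output
        int packedIdx = 0;
        for (int i = 0; i < len; i++)                     // second pass: fill it
            if (conditions[i])
                packed[packedIdx++] = values[i];
        return packed;
    }
}
```

With conditions { true, false, true } and values { 10, 20, 30 }, the result is { 10, 30 }. Note that the input is walked twice, once to count and once to copy.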
Now I am facing a new issue. I have several implementations aiming to solve this problem, and all of them work correctly (tested). When I run a benchmark that individually calls these different implementations 50K times on arrays 86K items long, I get the following timings in seconds:
The original implementation, originalArray, is the code listed above. Clearly, the QCsCpp* versions dominate the benchmark; these are the implementations using C++/CLI. However, when I replace originalArray in my original application, which calls packArrays a vast number of times, with either QCsCpp* implementation, the whole application runs SLOWER. I am really clueless about this result, and I must admit that it honestly crushed me. How can this be true? As always, any insight is much appreciated.
I need a reliable method to check the mouse pointer state and to count how many times it has changed, e.g. from the 'normal' pointer to the hourglass/circle or vice versa. It is part of a performance test that measures response times, and the only way to determine whether a certain business process has finished is to observe the mouse pointer and count how many times it has gone from "normal" to "busy" and back again. Once it has done this twice, the process is finished. I know it's horrible, and a rubbish workaround, but it's the only thing I can use.
I have implemented something that works, but it has one crucial weakness: if the mouse pointer changes while the thread has gone to sleep, then I "miss" this change and consequently the exit condition. I will reduce the wait time to 5 or 10 milliseconds, but it's still not a good solution.
Here's the code I have, to give you an idea of what's going on:
TimeSpan timePassed = TimeSpan.Zero;
int pointerChanges = 0; // transitions seen so far (declared here so the snippet stands alone)
bool lastMousePointerState = ConvertMousePointerStateToBoolean(Mouse.CursorName);
bool currentMousePointerState = true;
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
// TotalSeconds, not Seconds: Seconds is only the 0-59 component and wraps every minute
while(pointerChanges <= 1 && timePassed.TotalSeconds < synchDurationTimeout)
{
Thread.Sleep(100);
currentMousePointerState = ConvertMousePointerStateToBoolean(Mouse.CursorName);
// a change is any difference between the previous and the current sample
if (lastMousePointerState != currentMousePointerState)
{
pointerChanges++;
}
timePassed = stopWatch.Elapsed;
lastMousePointerState = currentMousePointerState;
}
I had a look at this article to see if perhaps I could make use of callback functions, and what the article describes does work but only for mouse actions, not its state. Since I have practically no experience with callbacks and making calls out to the OS from .NET, I was hoping someone could tell me if a) what I have in mind is generally possible, and if so b) perhaps provide a working code snippet that would achieve what I need.
Thanks in advance !
Edit: I think the GetCursorInfo function might be what I need, but the description is so terse as to be useless to me ;-)
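One way to make the polling loop easier to tune and test is to separate the counting logic from the way the pointer state is read. The sketch below is only that, a sketch: PointerWatcher and its parameters are made-up names, and the isBusy delegate is an assumption standing in for whatever actually reads the pointer (Mouse.CursorName as in the code above, or a wrapper around GetCursorInfo). It is still polling, so a change shorter than the poll interval can be missed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PointerWatcher
{
    // Polls isBusy every pollMs milliseconds until requiredChanges transitions
    // (normal -> busy or busy -> normal) have been seen or the timeout elapses.
    // Returns the number of transitions observed.
    public static int CountChanges(Func<bool> isBusy, int requiredChanges,
                                   TimeSpan timeout, int pollMs)
    {
        bool last = isBusy();
        int changes = 0;
        Stopwatch stopwatch = Stopwatch.StartNew();
        while (changes < requiredChanges && stopwatch.Elapsed < timeout)
        {
            Thread.Sleep(pollMs);
            bool current = isBusy();
            if (current != last)
                changes++;
            last = current;
        }
        return changes;
    }
}
```

A call such as PointerWatcher.CountChanges(() => ConvertMousePointerStateToBoolean(Mouse.CursorName), 2, TimeSpan.FromSeconds(30), 10) would correspond to the loop above with a 10 ms poll; the 30-second timeout is an arbitrary example value.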
Despite the overwhelming number of responses here, I'd like to answer my own question :-)
What I ended up implementing (and what is good enough for my purposes) is to use the code that was provided by Atomic Object.
I simply generate the dll from the C++ code, and use a loop to check the state. It is still not as good as a callback/notification mechanism, but I need to finish this and this is the best solution to date.
I've been programming console apps for a year and I think it's time to start something with forms. I don't really know how to make 2 loops work at the same time.
Could anyone help me and give me an example of 2 loops working together (one counting from 1 to 100 and the second counting from 100 to 200, both at the same time, let's say in 2 message boxes)? I've been looking for something like that on the net, but without success.
I'd also like to know if an infinite while has to be written like while (5>2), or if there's a better way to do that.
Thanks in advance !
I don't really know how to make 2 loops work at the same time.
This is a simple question with an enormous answer, but I'll try to break it down for you.
The problem you're describing at its basic level is "I have two different hunks of code that both interact with the user in some way. I would like to give the user the impression that both hunks of code are running at the same time, smoothly responding to user input."
Obviously the easiest way to do that is to write two programs. That is, make the operating system solve the problem. The operating system somehow manages to have dozens of different processes running "at the same time", all interacting smoothly (we hope) with the user.
But having two processes imposes a high cost. Processes are heavyweight, and it is expensive for the two hunks of code to talk to each other. Suppose you therefore want to have the two hunks of code in the same program. Now what do you do?
One way is to put the two hunks of code each on their own thread within the same process. This seems like a good idea, but it creates a lot of problems of its own. Now you have to worry about thread safety and deadlocks and all of that. And, unfortunately, only one thread is allowed to communicate with the user. Every forms application has a "UI" thread. If you have two "worker" threads running your hunks of code, they have to use cross-thread communication to communicate with the UI thread.
Another way is to break up each hunk of code into tiny little pieces, and then schedule all the pieces to run in order, on the UI thread. The scheduler can give priority to user interaction, and any particular tiny piece of work is not going to block and make the UI thread unresponsive.
It is this last technique that I would suggest you explore. We are doing a lot of work in C# 5 to make it easier to write programs in this style.
See http://msdn.microsoft.com/en-us/async for more information about this new feature.
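To give a feel for that style, here is a minimal sketch using async/await. It is an illustration rather than code from the linked material: the TwoLoops class, the report callback and the delay values are assumptions. In a Forms app the callback would update a label, and each continuation after await would resume on the UI thread; in a console program, as here, continuations run on thread-pool threads instead.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TwoLoops
{
    // Counts from `from` to `to`, handing control back after every step so that
    // other work (including the other loop) can run in between.
    public static async Task<List<int>> CountAsync(int from, int to, int delayMs,
                                                   Action<int> report)
    {
        var seen = new List<int>();
        for (int i = from; i <= to; i++)
        {
            seen.Add(i);
            report(i);                  // in a form: label.Text = i.ToString();
            await Task.Delay(delayMs);  // yields instead of blocking the thread
        }
        return seen;
    }

    // Starts both loops before awaiting either, so their steps interleave.
    public static async Task<int> RunBothAsync(int delayMs)
    {
        Task<List<int>> first = CountAsync(1, 100, delayMs, _ => { });
        Task<List<int>> second = CountAsync(100, 200, delayMs, _ => { });
        List<int>[] results = await Task.WhenAll(first, second);
        return results[0].Count + results[1].Count;
    }
}
```

Neither loop blocks while it waits, which is the "tiny little pieces" idea from the answer: each iteration is one piece, and the await is where the scheduler gets a chance to run something else.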
I'm not sure if this is what you mean by two loops.
An infinite loop is any loop of the form while (expression) where the expression, like your 5>2, always evaluates to true and nothing such as return; or break; terminates the loop. The usual way to write one is while (true).
Drop two labels on the form in Designer view, and then add this in Code view:
public Form1()
{
InitializeComponent();
Shown += new EventHandler(Form1_Shown);
}
void Form1_Shown(object sender, EventArgs e)
{
for (int i = 1; i <= 100; i++)
{
label1.Text = i.ToString();
// "Second loop"
label2.Text = (i + 100).ToString();
Update();
System.Threading.Thread.Sleep(10);
}
}
You'll get two numbers counting simultaneously. One from 1-100. The other from 101-200.
This?
for (int i = 1; i <= 100; i++)
{
//..
for (int i2 = 100; i2 <= 200; i2++)
{
//..
}
}