I am writing a multi-threaded solution that will be used for transferring data from different sources to a central database. The solution, in general, has two parts:
A single-threaded Import engine
A multi-threaded client that invokes the Import engine in threads.
In order to minimize custom development I am using Roslyn scripting. This feature is enabled via the NuGet Package Manager in the Import engine project.
Every import is defined as a transformation of an input table – which has a collection of input fields – to a destination table – again with a collection of destination fields.
The scripting engine is used here to allow custom transformations between input and output. For every input/output pair there is a text field with a custom script. Here is the simplified code used for script initialization:
//Instance of the class passed to the script engine as globals
_ScriptHost = new ScriptHost_Import();
if (Script != "") //Script text fetched from the DB
{
    try
    {
        //Create the script object ...
        ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
        //... and compile it up front to save time, since it may be invoked multiple times.
        ScriptObject.Compile();
        IsScriptCompiled = true;
    }
    catch
    {
        IsScriptCompiled = false;
    }
}
Later we will invoke this script with:
async Task<string> RunScript()
{
return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue.ToString();
}
So, after import definition initialization, where we might have any number of input/output pair descriptions along with their script objects, the memory footprint increases by approximately 50 MB per pair where scripting is defined.
A similar usage pattern is applied to the validation of destination rows before storing them in the DB (every field might have several scripts that are used to check the validity of the data).
All in all, the typical memory footprint with modest transformation/validation scripting is 200 MB per thread. If we need to invoke several threads, memory usage will be very high, and 99% of it will be used for scripting.
If the Import engine is enclosed in a WCF-based middle layer (which is what I did), we quickly stumble upon an "Insufficient memory" problem.
The obvious solution would be to have one scripting instance that somehow dispatches code execution to a specific function inside the script, depending on the need (input/output transformation, validation or something else). I.e. instead of a script text for every field, we would have a SCRIPT_ID that is passed as a global parameter to the script engine. Somewhere in the script we would switch to the specific portion of code that executes and returns the appropriate value.
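For illustration only, the dispatching script might look something like this (ScriptId, InputValue and the case bodies are hypothetical placeholders, not existing code):
// Hypothetical sketch: ScriptId and InputValue are assumed globals on ScriptHost_Import.
string dispatchingScript = @"
switch (ScriptId)
{
    case 1:  // transformation for one field
        return InputValue.Trim().ToUpper();
    case 2:  // validation for another field
        return (InputValue.Length <= 50).ToString();
    default:
        return InputValue;
}";

// Compiled once and shared by every input/output pair.
var sharedScript = CSharpScript.Create<string>(dispatchingScript, globalsType: typeof(ScriptHost_Import));
sharedScript.Compile();

// Per field, only the globals change:
// _ScriptHost.ScriptId = 2;
// _ScriptHost.InputValue = sourceValue;
// string result = (await sharedScript.RunAsync(_ScriptHost)).ReturnValue;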
The benefit of such a solution should be considerably better memory usage. The drawback is that script maintenance is removed from the specific point where it is used.
Before implementing this change, I would like to hear opinions about this solution and suggestions for a different approach.
As it seems, using scripting for this mission might be wasteful overkill - you use many application layers and the memory fills up.
Other solutions:
How do you interface with the DB? You can manipulate the query itself according to your needs instead of writing a whole script for that.
How about using generics, with enough T's to fit your needs:
public class ImportEngine<T1, T2, T3, T4, T5>
Using tuples (which is pretty much like using generics)
But if you still think scripting is the right tool for you, I found that the memory usage of scripts can be lowered by running the script's work inside your application (and not with RunAsync). You can do this by getting the logic back from RunAsync and reusing it, instead of doing the work inside the heavy and memory-hungry RunAsync. Here is an example:
Instead of simply (the script string):
DoSomeWork();
You can do this (IHaveWork is an interface defined in your app, with a single method, Work):
public class ScriptWork : IHaveWork
{
    public void Work()
    {
        DoSomeWork();
    }
}
return new ScriptWork();
This way you call the heavy RunAsync only once per script, and it returns a worker that you can reuse inside your application (and you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on...).
This pattern also breaks the isolation between your app and the script, so you can easily pass data to and get data from the script.
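Host-side, the wiring could look roughly like this. This is a sketch only: the interface takes a parameter, as suggested above, scriptTextFromDb stands in for the script text you already load per field, and the script itself would also need to import the namespace where IHaveWork lives (e.g. via ScriptOptions.WithImports):
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

// Sketch only: IHaveWork is defined in the host assembly.
public interface IHaveWork
{
    string Work(string input);   // parameterized variant, as suggested above
}

public static class ScriptWorkerFactory
{
    public static async Task<IHaveWork> CompileAsync(string scriptTextFromDb)
    {
        var options = ScriptOptions.Default
            .WithReferences(typeof(IHaveWork).Assembly);

        // The expensive evaluation runs once; the returned worker is plain compiled
        // code that can be reused on every row without touching the script engine.
        return await CSharpScript.EvaluateAsync<IHaveWork>(scriptTextFromDb, options);
    }
}

// Usage:
//   IHaveWork worker = await ScriptWorkerFactory.CompileAsync(script);
//   string transformed = worker.Work(inputValue);   // no RunAsync per call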
EDIT
A quick benchmark:
This code:
static void Main(string[] args)
{
Console.WriteLine("Compiling");
string code = "System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\");";
List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair-play
for (int i = 0; i < 10; i++)
Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
}
This consumes about 600 MB in my environment (I just referenced System.Windows.Forms in the ScriptOptions to give the scripts some size).
It reuses the Script<object> instances - no additional memory is consumed on the second call to RunAsync.
But we can do better:
static void Main(string[] args)
{
Console.WriteLine("Compiling");
string code = "return () => { System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\"); };";
List<Action> scripts = Enumerable.Range(0, 50).Select(async num =>
await CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).Select(t => t.Result).ToList();
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
for (int i = 0; i < 10; i++)
Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
}
In this snippet I'm simplifying the solution I proposed a bit by returning an Action object, but I think the performance impact is small (though in a real implementation I really think you should use your own interface to keep it flexible).
While the scripts are running you can see a steep rise in memory to ~240 MB, but after I call the garbage collector (for demonstration purposes; I did the same in the previous code) the memory usage drops back to ~30 MB. It is also faster.
I am not sure whether this existed at the time the question was asked, but there is something very similar and, let's say, an official way to run scripts multiple times without increasing program memory: the CreateDelegate method, which does exactly what is expected.
I will post it here just for convenience:
// Globals type assumed by the script ("X*Y") and the runner calls below
public class Globals { public int X; public int Y; }

var script = CSharpScript.Create<int>("X*Y", globalsType: typeof(Globals));
ScriptRunner<int> runner = script.CreateDelegate();
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(await runner(new Globals { X = i, Y = i }));
}
It takes some memory initially, but keep the runner in some global list and you can invoke it later quickly.
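Applied to the import scenario from the question, a minimal sketch (the cache, GetRunner and the way ScriptHost_Import is passed are my assumptions, not code from the question) would be to compile each distinct script text once and share the delegate across threads:
using System.Collections.Concurrent;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

static class ScriptCache
{
    // One compiled runner per distinct script text, shared by all threads.
    private static readonly ConcurrentDictionary<string, ScriptRunner<string>> Runners =
        new ConcurrentDictionary<string, ScriptRunner<string>>();

    public static ScriptRunner<string> GetRunner(string scriptText)
    {
        return Runners.GetOrAdd(scriptText, text =>
            CSharpScript.Create<string>(text, globalsType: typeof(ScriptHost_Import))
                        .CreateDelegate());
    }
}

// Per field/row, only the globals object changes:
//   var runner = ScriptCache.GetRunner(scriptText);
//   string value = await runner(_ScriptHost);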
Related
I have a command line application that has to be able to perform one of a number of discrete tasks, given a verb as a command line argument. Each task is handled by a class, each of which implements an interface containing a method Execute(). I'm trying to do this without using if or switch statements. So far, what I have is this:
var taskTypeName = $"MyApp.Tasks.{invokedVerb}Task";
var taskType = Type.GetType(taskTypeName, false);
var task = Activator.CreateInstance(taskType) as IMaintenanceTask;
task.Execute();
task is of type IMaintenanceTask, which is fundamentally what I'm trying to achieve. I'd prefer to avoid using dynamic - my understanding is that if it's only used once, like here, I won't see any of the benefits of caching, making it just reflection in fewer keystrokes.
Is this approach (or something along the same lines) likely to noticeably affect performance? I know it definitely increases the chance of runtime exceptions/bugs, but that's partly mitigated by the fact that this application is only going to be run via scripts; it will only deal with predictable input, and this will be the only place in the code that behaves dynamically. Is what I'm trying to achieve sensible? Or would it be better to just do this the boring, normal way: switching on the input, constructing each type of task via a normal compile-time constructor, and calling .Execute() on that?
As it is just a one-time call, you can go with your solution. Just add a few checks to reduce the chance of exceptions.
var taskTypeName = $"MyApp.Tasks.{invokedVerb}Task";
var taskType = Type.GetType(taskTypeName, false);
if (taskType != null && typeof(IMaintenanceTask).IsAssignableFrom(taskType))
{
var task = Activator.CreateInstance(taskType) as IMaintenanceTask;
task.Execute();
}
Don't worry about the performance of a dispatch mechanism unless it is in a tight loop. Switching a single direct method call to a single call through dynamic, a single call through reflection, a single call through the emit API, or a single call through a compiled LINQ expression will not make a detectable difference in the execution time of your application. The time it takes the operating system to start your application is several orders of magnitude higher than the time it takes your application to decide which method to call, so your solution is as good as a switch, except it is a lot shorter (which is a good thing).
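If you later want the set of verbs to be explicit and compile-time checked without a switch, a small factory map is a reasonable middle ground. A sketch only - CleanupTask and ReindexTask are hypothetical task classes standing in for yours:
// Illustrative alternative: an explicit verb-to-factory map, still a single dispatch.
static readonly Dictionary<string, Func<IMaintenanceTask>> Tasks =
    new Dictionary<string, Func<IMaintenanceTask>>(StringComparer.OrdinalIgnoreCase)
    {
        ["cleanup"] = () => new CleanupTask(),
        ["reindex"] = () => new ReindexTask()
    };

// if (Tasks.TryGetValue(invokedVerb, out var factory))
//     factory().Execute();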
And yet another question about yield return
So I need to execute different SQL scripts remotely. The scripts are in TFS, so I get them from TFS automatically, and the process iterates through all the files, reading their content into memory and sending the content to the remote SQL servers.
So far the process works flawlessly. But now some of the scripts will contain bulk inserts, increasing the size of the script to 500,000 MB or more.
So I built the code "thinking" that I was reading the content of the files into memory one at a time, but now I have second thoughts.
This is what I have (over simplified):
public IEnumerable<SqlScriptSummary> Find(string scriptsPath)
{
if (!Directory.Exists(scriptsPath))
{
throw new DirectoryNotFoundException(scriptsPath);
}
var path = new DirectoryInfo(scriptsPath);
return path.EnumerateFiles("*.sql", SearchOption.TopDirectoryOnly)
.Select(x =>
{
var script = new SqlScriptSummary
{
Name = x.Name,
FullName = x.FullName,
Content = File.ReadAllText(x.FullName, Encoding.Default)
};
return script;
});
}
....
public void ExecuteScripts(string scriptsPath)
{
foreach (var script in Find(scriptsPath))
{
_scriptRunner.Run(script.Content);
}
}
My understanding is that EnumerateFiles will yield return one file at a time, so that's what made me "think" that I was loading one file at a time into memory.
But...
Once I'm iterating them in the ExecuteScripts method, what happens to the script variable used in the foreach loop after it goes out of scope? Is it disposed, or does it remain in memory?
If it remains in memory, that means that even though I'm using iterators (with yield return internally), once I have iterated through all of them they are all still in memory, right? So in the end it would be like using ToList, just with lazy execution. Is that right?
If the script variable is disposed when it goes out of scope, then I think I would be fine.
How could I redesign the code to optimize memory consumption, e.g. forcing it to load the content of only one script into memory at a time?
Additional questions:
How can I test (unit/integration test) that I'm loading just one script at a time in memory?
How can I test (unit/integration test) that each script is released/not released from memory?
Once I'm iterating them, what happens to the script variable used in the foreach loop after it goes out of scope? Is it disposed, or does it remain in memory?
If you mean in the ExecuteScripts method - there's nothing to dispose, unless SqlScriptSummary implements IDisposable, which seems unlikely. However, there are two different things here:
The script variable goes out of scope after the foreach loop, and can't act as a GC root
Each object that the script variable has referred to will be eligible for garbage collection when nothing else refers to it... including script on the next iteration.
So yes, basically that should be absolutely fine. You'll be loading one file at a time, and I can't see any reason why there'd be more than one file's content in memory at a time, in terms of objects that the GC can't collect. (The GC itself is lazy, so it's unlikely that there'd be exactly one script in memory at a time, but you don't need to worry about that side of things, as your code makes sure that it doesn't keep live references to more than one script at a time.)
The way you can test that you're only loading a single script at a time is to try it with a large directory of large scripts (that don't actually do anything). If you can process more scripts than you have memory, you're fine :)
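If you want to be extra defensive - say, if some caller might hold on to the summaries - one option is to defer reading the file until the script is actually executed. This is only a sketch: LoadContent is a hypothetical Func<string> member, not part of the original SqlScriptSummary:
public IEnumerable<SqlScriptSummary> Find(string scriptsPath)
{
    if (!Directory.Exists(scriptsPath))
    {
        throw new DirectoryNotFoundException(scriptsPath);
    }
    var path = new DirectoryInfo(scriptsPath);
    return path.EnumerateFiles("*.sql", SearchOption.TopDirectoryOnly)
               .Select(x => new SqlScriptSummary
               {
                   Name = x.Name,
                   FullName = x.FullName,
                   // Content is read only when requested, at execution time.
                   LoadContent = () => File.ReadAllText(x.FullName, Encoding.Default)
               });
}

public void ExecuteScripts(string scriptsPath)
{
    foreach (var script in Find(scriptsPath))
    {
        // The large string exists only for the duration of this call.
        _scriptRunner.Run(script.LoadContent());
    }
}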
I want to be able to change the code of my game dynamically. For example, let's say my game is structured this way:
class GameState {
public int SomeData;
public Entity[] EntityPool;
}
interface IServices {
IRenderer Renderer { get; }
}
interface IGameCode {
void RenderAndUpdate(GameState currentState, IServices serviceProvider);
}
I now want to be able to write code like this:
void MainLoop() {
IGameCode gameCode = new DefaultGameCode();
while(true) {
// Handle platform things
if(shouldUseNewGameCode) {
UnloadCode(gameCode);
gameCode = LoadCode("new.dll");
// or
gameCode = LoadCode("new.cs");
}
// Call GameTick
gameCode.RenderAndUpdate(gameState, services);
}
}
I already used AppDomains and a proxy class, but serializing everything every frame is too slow. I tried to just pass a pointer, but since AppDomains use their own virtual address space I can't access the GameState object. My other idea was to use reflection to get the IL of the compiled method via GetMethodBody() and pass it to a DynamicMethod, but this would limit how I could write the RenderAndUpdate method, since I could not use submethods or variables in the IGameCode implementation.
So how can I achieve what I want to do?
As you've seen, you really don't want to be crossing AppDomain boundaries on every frame, especially if that code has to call back to the main code e.g. IServices a bunch of times. Even with MarshalByRefObject, which can improve things a little, it's going to be too slow. So you need a solution that doesn't involve the AppDomain.
How big is your assembly? How often do you expect to change it?
Noting that .NET assemblies are generally fairly space-efficient, and that in your scenario it seems unlikely a user would switch assemblies more than a few times in a session, I would just read your DLL into memory as a byte[] and then use Assembly.Load(byte[]); to load it.
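A minimal sketch of that approach (how the IGameCode implementation is located inside the DLL is an assumption on my part):
using System;
using System.IO;
using System.Linq;
using System.Reflection;

// Sketch: load the new game code into the current AppDomain from raw bytes.
// Note that the old assembly stays loaded for the lifetime of the process.
static IGameCode LoadCode(string dllPath)
{
    byte[] raw = File.ReadAllBytes(dllPath);
    Assembly assembly = Assembly.Load(raw);

    // Assumption: the DLL contains exactly one non-abstract IGameCode implementation.
    Type codeType = assembly.GetTypes()
        .First(t => typeof(IGameCode).IsAssignableFrom(t) && !t.IsAbstract);

    return (IGameCode)Activator.CreateInstance(codeType);
}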
Alternatively, if you really can't tolerate a dead assembly in your process memory space, I think it would be better to use a helper process, aka "launcher": when you want to switch implementations, start up the helper process (or just leave it running all the time if you want), which in turn will wait for the current game process to exit, and then will start a new one with the new settings.
This will be slower to switch, but of course is a one-time cost for each switch and then the program can run full-speed during the actual gameplay.
I have tried to implement the following algorithm using Parallel.ForEach. I thought it would be trivial to parallelize, since it has no synchronization issues. It is basically a Monte-Carlo tree search, where I explore every child in parallel. The Monte-Carlo details are not really important; all you have to know is that I have a method which works on some tree, and which I call with Parallel.ForEach on the root's children. Here is the snippet where the parallel call is made.
public void ExpandParallel(int time, Func<TGame, TGame> gameFactory)
{
int start = Environment.TickCount;
// Creating all of root's children
while (root.AvailablePlays.Count > 0)
Expand(root, gameInstance);
// Create the children games
var games = root.Children.Select(c =>
{
var g = gameFactory(gameInstance);
c.Play.Apply(g.Board);
return g;
}).ToArray();
// Create a task to expand each child
Parallel.ForEach(root.Children, (tree, state, i) =>
{
var game = games[i];
// Make sure we don't waste time
while (Environment.TickCount - start < time && !tree.Completed)
Expand(tree, game);
});
// Update (reset) the root data
root.Wins = root.Children.Sum(c => c.Wins);
root.Plays = root.Children.Sum(c => c.Plays);
root.TotalPayoff = root.Children.Sum(c => c.TotalPayoff);
}
The Func<TGame, TGame> delegate is a cloning factory, so that each child has its own clone of the game state. I can explain the internals of the Expand method if required, but I can assure you that it only accesses the state of the current sub-tree and game instance, and there are no static members in any of those types. I thought it might be Environment.TickCount causing the contention, but I ran an experiment just calling Environment.TickCount inside a Parallel.ForEach loop, and got nearly 100% processor usage.
I only get 45% to 50% CPU usage on a Core i5.
This is a common symptom of GC thrashing. Without knowing more about what you're doing inside the Expand method, my best guess is that this is your root cause. It's also possible that shared data access is the culprit, either by calling out to a remote system or by locking access to shared resources.
Before you do anything, you need to determine the exact cause with a profiler or another tool. Don't guess, as that will just waste your time, and don't wait for an answer here, as without your complete program it cannot be answered. As you already know from experimentation, there is nothing in Parallel.ForEach itself that would cause this.
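As a rough first check before reaching for a full profiler (timeBudgetMs and gameFactory below are just placeholders for whatever you pass to ExpandParallel), you can compare GC collection counts around the run:
// Rough check only: a large jump in collection counts during the run is
// consistent with allocation churn (GC thrashing) limiting CPU utilization.
int gen0 = GC.CollectionCount(0);
int gen1 = GC.CollectionCount(1);
int gen2 = GC.CollectionCount(2);

ExpandParallel(timeBudgetMs, gameFactory);   // the call under investigation

Console.WriteLine($"Gen0: {GC.CollectionCount(0) - gen0}, " +
                  $"Gen1: {GC.CollectionCount(1) - gen1}, " +
                  $"Gen2: {GC.CollectionCount(2) - gen2}");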
I'm using waveOutWrite with a callback function, and under native code everything is fast. Under .NET it is much slower - to the point that I think I'm doing something very wrong - sometimes 5 or 10 times slower.
I could post both sets of code, but that seems like too much, so I'll just post the C code that is fast and point out the minor variations in the .NET code.
HANDLE WaveEvent;
const int TestCount = 100;
HWAVEOUT hWaveOut[1]; // don't ask why this is an array, just test code
WAVEHDR woh[1][20];
void CALLBACK OnWaveOut(HWAVEOUT,UINT uMsg,DWORD,DWORD,DWORD)
{
if(uMsg != WOM_DONE)
return;
assert(SetEvent(WaveEvent)); // .NET code uses EventWaitHandle.Set()
}
void test(void)
{
WaveEvent = CreateEvent(NULL,FALSE,FALSE,NULL);
assert(WaveEvent);
WAVEFORMATEX wf;
memset(&wf,0,sizeof(wf));
wf.wFormatTag = WAVE_FORMAT_PCM;
wf.nChannels = 1;
wf.nSamplesPerSec = 8000;
wf.wBitsPerSample = 16;
wf.nBlockAlign = WORD(wf.nChannels*(wf.wBitsPerSample/8));
wf.nAvgBytesPerSec = (wf.wBitsPerSample/8)*wf.nSamplesPerSec;
assert(waveOutOpen(&hWaveOut[0],WAVE_MAPPER,&wf,(DWORD)OnWaveOut,0,CALLBACK_FUNCTION) == MMSYSERR_NOERROR);
for(int x=0;x<2;x++)
{
memset(&woh[0][x],0,sizeof(woh[0][x]));
woh[0][x].dwBufferLength = PCM_BUF_LEN;
woh[0][x].lpData = (char*) malloc(woh[0][x].dwBufferLength);
assert(waveOutPrepareHeader(hWaveOut[0],&woh[0][x],sizeof(woh[0][x])) == MMSYSERR_NOERROR);
assert(waveOutWrite(hWaveOut[0],&woh[0][x],sizeof(woh[0][x])) == MMSYSERR_NOERROR);
}
int bufferIndex = 0;
DWORD times[TestCount];
for(int x=0;x<TestCount;x++)
{
DWORD t = timeGetTime();
assert(WaitForSingleObject(WaveEvent,INFINITE) == WAIT_OBJECT_0); // .NET code uses EventWaitHandle.WaitOne()
assert(woh[0][bufferIndex].dwFlags & WHDR_DONE);
assert(waveOutWrite(hWaveOut[0],&woh[0][bufferIndex],sizeof(woh[0][bufferIndex])) == MMSYSERR_NOERROR);
bufferIndex = bufferIndex == 0 ? 1 : 0;
times[x] = timeGetTime() - t;
}
}
The times[] array for the C code always has values around 80, which is the PCM buffer length I am using. The .NET code also shows similar values sometimes, however, it sometimes shows values as high as 1000, and more often values in the 300 to 500 range.
Doing the work from the bottom loop inside the OnWaveOut callback, instead of using events, makes it fast all the time, with .NET or native code. So it appears the issue is with the wait events in .NET only, and mostly only when "other stuff" is happening on the test PC -- and not a lot of stuff, either; it can be as simple as moving a window around or opening a folder in My Computer.
Maybe .NET events are just really bad about context switching, or .NET apps/threads in general? In the app I'm using to test my .NET code, the code just runs in the constructor of a form (easy place to add test code), not on a thread-pool thread or anything.
I also tried using the version of waveOutOpen that takes an event instead of a function callback. This is also slow in .NET but not in C, so again, it points to an issue with events and/or context switching.
I'm trying to keep my code simple and setting an event to do the work outside the callback is the best way I can do this with my overall design. Actually just using the event driven waveOut is even better, but I tried this other method because straight callbacks are fast, and I didn't expect normal event wait handles to be so slow.
Maybe not 100% related, but I faced somewhat the same issue: calling EventWaitHandle.Set X times is fine, but then, after a threshold I can't pin down, each call to this method takes a full second!
It appears that some of the .NET ways to synchronize threads are much slower than the ones you use in C++.
The almighty @jonskeet once made a post on his website (https://jonskeet.uk/csharp/threads/waithandles.html) where he also refers to the very complex concept of .NET synchronization domains, explained here: https://www.drdobbs.com/windows/synchronization-domains/184405771
He mentions that .NET and the OS must communicate in a very, very time-precise way, with objects that must be converted from one environment to the other. All this is very time-consuming.
I have summarized a lot here, not to take credit for the answer, but because there is an explanation there. There are also some recommendations here (https://learn.microsoft.com/en-us/dotnet/standard/threading/overview-of-synchronization-primitives) about how to choose a synchronization mechanism depending on the context, and the performance aspect is mentioned a little bit.
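For what it's worth, those docs also mention the lighter-weight "slim" primitives, which spin briefly in user mode before falling back to a kernel wait. Whether that helps in the waveOut scenario above I can't say, but a sketch of the swap would look like this:
// Sketch only: ManualResetEventSlim spins briefly in user mode before taking a
// kernel wait, unlike EventWaitHandle, which always uses a kernel event object.
var bufferDone = new ManualResetEventSlim(false, spinCount: 100);

// In the waveOut callback, on WOM_DONE:
//     bufferDone.Set();

// In the playback loop, instead of EventWaitHandle.WaitOne():
//     bufferDone.Wait();
//     bufferDone.Reset();   // the auto-reset EventWaitHandle did this implicitly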