Process huge amounts of data with dynamic code - C#

I want to be able to change the code of my game dynamically. For example, let's say my game is structured this way:
class GameState {
    public int SomeData;
    public Entity[] EntityPool;
}

interface IServices {
    IRenderer Renderer { get; }
}

interface IGameCode {
    void RenderAndUpdate(GameState currentState, IServices serviceProvider);
}
I now want to be able to write code like this:
void MainLoop() {
    IGameCode gameCode = new DefaultGameCode();
    while (true) {
        // Handle platform things
        if (shouldUseNewGameCode) {
            UnloadCode(gameCode);
            gameCode = LoadCode("new.dll");
            // or
            gameCode = LoadCode("new.cs");
        }
        // Call GameTick
        gameCode.RenderAndUpdate(gameState, services);
    }
}
I already used AppDomains and a proxy class, but it is too slow to serialize everything every frame. I tried to just pass a pointer, but since AppDomains use their own virtual address space, I can't access the GameState object. My other idea was to use reflection to get the IL of the compiled method via GetMethodBody() and pass it to a DynamicMethod, but this would limit how I could write the RenderAndUpdate method, since I could not use helper methods or variables in the IGameCode implementation.
So how can I achieve what I want to do?

As you've seen, you really don't want to be crossing AppDomain boundaries on every frame, especially if that code has to call back to the main code e.g. IServices a bunch of times. Even with MarshalByRefObject, which can improve things a little, it's going to be too slow. So you need a solution that doesn't involve the AppDomain.
How big is your assembly? How often do you expect to change it?
Noting that .NET assemblies are generally fairly space-efficient, and that in your scenario it seems unlikely a user would switch assemblies more than a few times in a session, I would just read your DLL into memory as a byte[] and then use Assembly.Load(byte[]) to load it.
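A minimal sketch of that approach (the LoadCode signature and the IGameCode lookup are assumptions built from the question's own names, not an existing API):

using System;
using System.IO;
using System.Linq;
using System.Reflection;

static IGameCode LoadCode(string path)
{
    // Loading from a byte[] rather than Assembly.LoadFrom avoids locking
    // the file, so the DLL can be rebuilt while the game keeps running.
    byte[] raw = File.ReadAllBytes(path);
    Assembly assembly = Assembly.Load(raw);

    // Find the first concrete type implementing IGameCode and instantiate it.
    Type gameCodeType = assembly.GetTypes()
        .First(t => typeof(IGameCode).IsAssignableFrom(t) && !t.IsAbstract);
    return (IGameCode)Activator.CreateInstance(gameCodeType);
}

Note that assemblies loaded this way stay in the process until it exits (pre-.NET Core there is no per-assembly unload without an AppDomain), which is the "dead assembly" trade-off mentioned below.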
Alternatively, if you really can't tolerate a dead assembly in your process memory space, I think it would be better to use a helper process, aka "launcher": when you want to switch implementations, start up the helper process (or just leave it running all the time if you want), which in turn will wait for the current game process to exit, and then will start a new one with the new settings.
This will be slower to switch, but of course is a one-time cost for each switch and then the program can run full-speed during the actual gameplay.
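If you go the launcher route, the helper can be tiny; a hypothetical sketch (the process handshake and the command-line flag are made up for illustration):

using System.Diagnostics;

// Wait for the running game to exit, then start it again pointing
// at the new game-code DLL.
static void Relaunch(int gamePid, string gameExePath, string newCodeDll)
{
    Process.GetProcessById(gamePid).WaitForExit();
    Process.Start(gameExePath, $"--gamecode \"{newCodeDll}\"");
}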

How to use Roslyn C# scripting in batch processing with several scripts?

I am writing a multi-threaded solution that will be used for transferring data from different sources to a central database. The solution, in general, has two parts:
A single-threaded Import engine
A multi-threaded client that invokes the Import engine in threads
To minimize custom development I am using Roslyn scripting. This feature is enabled with the NuGet package manager in the Import engine project.
Every import is defined as a transformation of an input table (which has a collection of input fields) into a destination table (again, with a collection of destination fields).
The scripting engine is used here to allow custom transformations between input and output. For every input/output pair there is a text field with a custom script. Here is the simplified code used for script initialization:
// Instance of the class passed to the script engine
_ScriptHost = new ScriptHost_Import();

if (Script != "") // Here we have the script fetched from the DB as text
{
    try
    {
        // We are creating the script object...
        ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
        // ...and we are compiling it up front to save time, since it might be invoked multiple times.
        ScriptObject.Compile();
        IsScriptCompiled = true;
    }
    catch
    {
        IsScriptCompiled = false;
    }
}
Later we will invoke this script with:
async Task<string> RunScript()
{
    return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue.ToString();
}
So, after import definition initialization, where we might have any number of input/output pair descriptions along with script objects, the memory footprint increases by approximately 50 MB per pair where scripting is defined.
A similar usage pattern is applied to validation of destination rows before storing them in the DB (every field might have several scripts that are used to check the validity of the data).
All in all, the typical memory footprint with modest transformation/validation scripting is 200 MB per thread. If we need to invoke several threads, memory usage will be very high, and 99% of it will be used for scripting.
If the Import engine is enclosed in a WCF-based middle layer (which I did), we quickly stumble upon an "Insufficient memory" problem.
The obvious solution would be to have one scripting instance that would somehow dispatch code execution to a specific function inside the script depending on the need (input/output transformation, validation, or something else). I.e., instead of script text for every field, we would have a SCRIPT_ID that is passed as a global parameter to the script engine. Somewhere in the script we would switch to the specific portion of code that should execute and return the appropriate value.
The benefit of such a solution should be considerably better memory usage. The drawback is that script maintenance is removed from the specific point where it is used.
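A minimal sketch of that dispatch idea (the ScriptId member and the case bodies are hypothetical, just to show the shape):

// Globals now carry a script selector instead of one script text per field.
public class ScriptHost_Import
{
    public int ScriptId;
    public string InputValue;
}

// The single script's text, stored and compiled once; each case replaces
// what used to be a separate per-field script.
const string DispatcherScript = @"
switch (ScriptId)
{
    case 1: return InputValue.Trim();             // transformation for pair 1
    case 2: return InputValue.ToUpperInvariant(); // transformation for pair 2
    default: return InputValue;                   // pass-through
}";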
Before implementing this change, I would like to hear opinions about this solution and suggestions for a different approach.
As it seems, using scripting for this mission might be wasteful overkill: you use many application layers and the memory fills up.
Other solutions:
How do you interface with the DB? You can manipulate the query itself according to your needs instead of writing a whole script for that.
How about using generics, with enough T's to fit your needs:
public class ImportEngine<T1, T2, T3, T4, T5>
Using tuples (which is pretty much like using generics)
But if you still think scripts are the right tool for you, I found that the memory usage of scripts can be lowered by running the script's work inside your application (and not with RunAsync). You can do this by getting the logic back from RunAsync and reusing it, instead of doing the work inside the heavy and memory-wasteful RunAsync. Here is an example:
Instead of simply (the script string):
DoSomeWork();
You can do this (IHaveWork is an interface defined in your app, with only one method, Work):
public class ScriptWork : IHaveWork
{
    public void Work()
    {
        DoSomeWork();
    }
}
return new ScriptWork();
This way you call the heavy RunAsync only once, and it returns a worker that you can reuse inside your application (and you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on...).
This pattern also breaks the isolation between your app and the script, so you can easily pass data to and from the script.
EDIT
A quick benchmark:
This code:
static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\");";
    List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
        CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();
    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair-play
    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
}
This consumes about ~600 MB in my environment (I just referenced System.Windows.Forms in the ScriptOptions to give the scripts some size).
It reuses the Script<object> - it doesn't consume more memory on a second call to RunAsync.
But we can do better:
static void Main(string[] args)
{
    Console.WriteLine("Compiling");
    string code = "return () => { System.Threading.Thread.SpinWait(100000000); System.Console.WriteLine(\" Script end\"); };";
    List<Action> scripts = Enumerable.Range(0, 50).Select(async num =>
        await CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly)))
        .Select(t => t.Result).ToList();
    GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
    for (int i = 0; i < 10; i++)
        Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
}
In this version I'm simplifying the solution I proposed by returning an Action object, but I think the performance impact is small (in a real implementation I really think you should use your own interface to keep it flexible).
While the scripts are running, you can see a steep rise in memory to ~240 MB, but after I call the garbage collector (for demonstration purposes; I did the same in the previous code) the memory usage drops back to ~30 MB. It is also faster.
I am not sure whether this existed at the time the question was created, but there is something very similar and, let's say, official: a way to run scripts multiple times without increasing the program's memory. You need to use the CreateDelegate method, which does exactly what is expected.
I will post it here for convenience:
// The snippet assumes a Globals type with public X and Y members, e.g.:
public class Globals { public int X; public int Y; }

var script = CSharpScript.Create<int>("X*Y", globalsType: typeof(Globals));
ScriptRunner<int> runner = script.CreateDelegate();
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(await runner(new Globals { X = i, Y = i }));
}
It takes some memory initially, but keep the runner in some global list and you can invoke it again quickly later.

Are Private properties of a class called within a Parallel.Foreach body Thread Safe?

I am tasked with writing a system to process result files created by a different process (which I have no control over), and I am trying to modify my code to make use of Parallel.ForEach. The code works fine when just calling a foreach, but I have some concerns about thread safety when using the parallel version. The base question I need answered here is "Is the way I am doing this going to guarantee thread safety?", or is this going to cause everything to go sideways on me?
I have tried to make sure all calls are to instances and have removed every static anything except the initial static void Main. It is my current understanding that this will do a lot towards assuring thread safety.
I have basically the following, edited for brevity:
static void Main(string[] args)
{
    MyProcess process = new MyProcess();
    process.DoThings();
}
And then in the actual process to do stuff I have
public class MyProcess
{
    public void DoThings()
    {
        // Get some list of things
        List<Thing> things = getThings();
        Parallel.ForEach(things, item => {
            // based on some criteria, take actions from MyActionClass
            MyActionClass myAct = new MyActionClass(item);
            string tempstring = myAct.DoOneThing();
            if (somecondition)
            {
                myAct.DoOtherThing();
            }
            // ...other similar calls to myAct below here
        });
    }
}
And over in the MyActionClass I have something like the following:
public class MyActionClass
{
    private Thing _thing;

    public MyActionClass(Thing item)
    {
        _thing = item;
    }

    public string DoOneThing()
    {
        return _thing.GetSubThings().FirstOrDefault();
    }

    public void DoOtherThing()
    {
        _thing.property1 = "Somenewvalue";
    }
}
If I can explain this any better I'll try, but I think that's the basics of my needs.
EDIT:
Something else I just noticed: if I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations, or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item I'm working with in this case?
There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread can touch one object at a time.
But, as has been mentioned, that is only what can be seen here. It doesn't mean that in the actual code you use everything is as good as it seems in this snippet.
Nor does it mean that nothing will be changed by you or a coworker in a way that makes some state both shared and mutable (in Thing, for example), at which point you start getting hard-to-reproduce crashes at best, or plain wrong behaviour that can go undetected for a long time at worst.
So, perhaps you should try to go fully immutable near threading code?
Perhaps.
Immutability is good, but it is not a silver bullet: it is not always easy to use and implement, and not every task can be reasonably expressed through immutable objects. And even that accidental "make it shared and mutable" change may happen to immutable code as well, though it is much less likely.
It should at least be considered as a possible option/alternative.
About the EDIT
If I change the value of a property of the item I'm working with while inside the Parallel.ForEach (in this case, a string value that gets written to a database inside the loop), will that have any effect on the rest of the loop iterations or just the one I'm on?
If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, a sort of public static Int32 ChangesCount that is incremented with each state change), then you should be safe.
"A string value that gets written to a database inside the loop" - depending on the data access technology used and how you use it, you may be in trouble, because most of them are not designed for multithreaded environments (EF's DbContext, for example). And obviously do not forget that dealing with concurrent access in a database is not always easy, though that is a bit away from our original theme.
"Would it be better to create a new instance of Thing inside the loop" - if there is no risk of external concurrent changes, then it is just unnecessary work. And if there is a chance of other threads (not Parallel.For) making changes to those objects while they are being persisted, then you already have bigger problems than Parallel.For.
Objects should always have an observable consistent state (unlike when half of the properties are set by one thread and half by another while you try to persist that who-knows-what), and if they are used by many threads, then they should already be thread-safe - there should be no way to put them into an inconsistent state.
And if they are to be persisted by external code, such objects should probably provide:
Either a SyncRoot property to synchronize the property-reading code.
Or some current-state snapshot DTO that is created internally by a thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} } (sketched after this list).
Or something more exotic.
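A minimal sketch of the snapshot option, assuming hypothetical Thing/ThingSnapshot shapes:

public class ThingSnapshot
{
    public string Property1 { get; }

    public ThingSnapshot(string property1)
    {
        Property1 = property1;
    }
}

public class Thing
{
    private readonly object _sync = new object();
    private string _property1;

    public void SetProperty1(string value)
    {
        lock (_sync) { _property1 = value; }
    }

    // Returns a consistent point-in-time copy that persistence code can
    // read without racing concurrent writers.
    public ThingSnapshot GetCurrentData()
    {
        lock (_sync) { return new ThingSnapshot(_property1); }
    }
}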

Is it a good practice to perform initialization within a Property?

I have a class PluginProvider that is using a PluginLoader component to load plugins (managed/native) from the file system. Within the PluginProvider class there is currently a property defined called 'PluginTypes', which calls the 'InitializePlugins' instance method in its getter.
class PluginProvider
{
    IEnumerable<IPluginType> PluginTypes
    {
        get
        {
            // isInitialized is set inside InitializePlugins method
            if (!isInitialized)
            {
                InitializePlugins(); // contains thread safe code
            }
            // _pluginTypes is set within InitializePlugins method
            return _pluginTypes;
        }
    }
}
I am looking at refactoring this piece of code. I want to know whether this kind of initialization is fine to do within a property. I know that heavy operations must not be done in a property. But when I checked this link: http://msdn.microsoft.com/en-us/library/vstudio/ms229054.aspx , I found this: "In particular, operations that access the network or the file system (other than once for initialization) should most likely be methods, not properties." Now I am a bit confused. Please help.
If you want to delay the initialization as much as you can and you don't know when your property (or properties) will be called, what you're doing is fine.
If you want to delay and you have control over when your property will be called the first time, then you might want to make your method InitializePlugins() public and call it explicitly before accessing the property. This option also opens up the possibility of initializing asynchronously. For example, you could have an InitializePluginsAsync() that returns a Task.
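A rough sketch of that asynchronous option (InitializePluginsAsync and LoadPluginsFromDisk are hypothetical names, not an existing API):

public async Task InitializePluginsAsync()
{
    // Run the file-system scan off the calling thread.
    _pluginTypes = await Task.Run(() => LoadPluginsFromDisk());
    isInitialized = true;
}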
If delaying the initialization is not a big concern, then just perform the initialization within the constructor.
This is of course a matter of taste, but what I would do depends on the length of the operation you're trying to perform. If it takes time to load the plugins, I would create a public method which any user would need to call before working with the class. A different approach would be to put the call inside the constructor, but IMO constructors should return as quickly as possible and should contain only field/property initialization.
class PluginProvider
{
    private bool _isInitialized;

    IEnumerable<IPluginType> PluginTypes { get; set; }

    public void Initialize()
    {
        if (_isInitialized)
        {
            return;
        }
        InitializePlugins();
        _isInitialized = true;
    }
}
Note the downside of this is that you will have to make sure the Initialize method was called before consuming any operation.
Another thing that just came to mind backing this approach is exception handling. I'm sure you wouldn't want your constructor to be throwing any kind of IOException in case it couldn't load the types from the file system.
Any initialization type of code should be done in the constructor; that way you know it will be called once and only once.
public class PluginProvider
{
    IEnumerable<IPluginType> PluginTypes
    {
        get
        {
            return _pluginTypes;
        }
    }

    public PluginProvider()
    {
        InitializePlugins();
    }
}
What you are doing there is called lazy initialization. You are postponing a potentially costly operation until the very moment its output is needed.
Now, this is not an absolute rule. If your InitializePlugins method takes a long time to complete and might impact the user experience, then consider moving it into a public method, or even making it asynchronous and calling it outside the property: at app startup or whenever you find a good moment for a long-running operation.
Otherwise, if it's a short-lived, one-time thing, it can stay there. As I said, not an absolute rule; generally these are guidelines for whatever applies to a particular case.
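For what it's worth, the BCL's Lazy<T> packages up this exact pattern, thread safety included; a minimal sketch (IPluginType and the loading logic are assumed from the question):

using System;
using System.Collections.Generic;

class PluginProvider
{
    // Lazy<T> defers the load until first access; its default mode
    // (ExecutionAndPublication) makes the initialization thread-safe.
    private readonly Lazy<IEnumerable<IPluginType>> _pluginTypes =
        new Lazy<IEnumerable<IPluginType>>(LoadPlugins);

    public IEnumerable<IPluginType> PluginTypes => _pluginTypes.Value;

    private static IEnumerable<IPluginType> LoadPlugins()
    {
        // ...scan the file system and load plugin assemblies here...
        return new List<IPluginType>();
    }
}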

What's the explanation of this behavior in C#?

Consider this simple console application:
class Program
{
static void Main(string[] args)
{
var human = CreateHuman(args[0]);
Console.WriteLine("Created Human");
Console.ReadLine();
}
public static object CreateHuman(string association)
{
object human = null;
if (association == "is-a")
{
human = new IsAHuman();
}
else
{
human = new HasAHuman();
}
return human;
}
}
public class IsAHuman : Human
{
}
public class HasAHuman
{
public Human Human { get; set; }
}
The Human class is in another assembly, say HumanAssembly.dll. If HumanAssembly.dll exists in the bin directory of our console app, everything is fine. And as we might expect, by removing it we encounter a FileNotFoundException.
Here is the part I don't understand, though: comment out the human = new IsAHuman(); line, recompile, and remove HumanAssembly.dll. The console app won't throw any exception in this case.
My guess is that the CLR differentiates between is-a and has-a associations. In other words, the CLR tries to find, understand, and probably load all the types existing in a class definition statement, but it can instantiate a class without knowing what's inside it. But I'm not sure about my interpretation.
I fail to find a good explanation. What is the explanation for this behavior?
You are seeing the behavior of the JIT compiler. Just In Time. A method doesn't get compiled until the last possible moment, just before it is called. Since you removed the need to actually construct a Human object, there is no code path left that forces the jitter to load the assembly. So your program won't crash.
The last remaining reference to Human is the HasAHuman.Human property. You don't use it.
Predicting when the jitter is going to need to load an assembly is not that straightforward in practice. It gets pretty difficult to reason through when you run the Release build of your code. That normally enables the optimizer built into the jitter; one of its core optimization strategies is to inline a method. To do that, it needs access to the method before it is called. You'd need an extra level of indirection, an extra method with the [MethodImpl(MethodImplOptions.NoInlining)] attribute, to stop it from having a peek. That gets to be a bit off into the deep end; always consider a plug-in architecture first, something like MEF.
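A sketch of that indirection (illustrative, not the answer's exact code):

using System.Runtime.CompilerServices;

// The jitter cannot inline this method into its caller, so compiling the
// caller does not force HumanAssembly.dll to load; it is only resolved
// when this method itself is jitted, i.e. when it is actually called.
[MethodImpl(MethodImplOptions.NoInlining)]
public static object CreateIsAHuman()
{
    return new IsAHuman();
}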
Here is a great explanation of what you are looking for:
The CLR Loader
Especially these lines:
This policy of loading types (and assemblies and modules) on demand means that parts of a program that are not used are never brought into memory. It also means that a running application will often see new assemblies and modules loaded over time as the types contained in those files are needed during execution. If this is not the behavior you want, you have two options. One is to simply declare hidden static fields of the types you want to guarantee are loaded when your type is loaded. The other is to interact with the loader explicitly.
As the highlighted line says, if your code never executes a line that needs a specific type, that type won't be loaded, even if the line is not commented out.
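A tiny sketch of the first option from the quote, the hidden static field trick (EagerLoad is a made-up name; the behavior relies on the quoted loader policy):

// Declaring a static field of type Human ties Human's assembly to this
// type's load, per the quote above. The field itself is never read.
static class EagerLoad
{
    private static readonly Human _forceHumanAssemblyLoad = null;
}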
Here is also a similar answer that you might be interested in:
How are DLLs loaded by the CLR?

Disposable Context Object pattern

Introduction
I just thought of a new design pattern. I'm wondering if it exists, and if not, why not (or why I shouldn't use it).
I'm creating a game using OpenGL. In OpenGL, you often want to "bind" things - i.e., make them the current context for a little while, and then unbind them. For example, you might call glBegin(GL_TRIANGLES), then draw some triangles, then call glEnd(). I like to indent all the stuff in between so it's clear where it starts and ends, but then my IDE likes to unindent it because there are no braces. Then I thought we could do something clever! It basically works like this:
using (GL.Begin(GL_BeginMode.Triangles)) {
    // draw stuff
}
GL.Begin returns a special DrawBind object (with an internal constructor), which implements IDisposable so that it automatically calls GL.End() at the end of the block. This way everything stays nicely aligned, and you can't forget to call End().
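One possible shape for that object, assuming OpenTK-style GL bindings (GL.Begin/GL.End are the raw calls; the wrapper names are my guesses, not the asker's actual code):

using System;

public static class GLContext // hypothetical wrapper over the raw bindings
{
    public static DrawBind Begin(BeginMode mode)
    {
        GL.Begin(mode); // raw OpenGL call
        return new DrawBind();
    }
}

public struct DrawBind : IDisposable
{
    // Runs automatically at the closing brace of the using block.
    public void Dispose()
    {
        GL.End();
    }
}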
Is there a name for this pattern?
Usually when I see using used, it looks like this:
using (var x = new Whatever()) {
    // do stuff with `x`
}
But in this case, we don't need to call any methods on our 'used' object, so we don't need to assign it to anything; it serves no purpose other than to call the corresponding end function.
Example
For Anthony Pegram, who wanted a real example of code I'm currently working on:
Before refactoring:
public void Render()
{
    _vao.Bind();
    _ibo.Bind(BufferTarget.ElementArrayBuffer);
    GL.DrawElements(BeginMode.Triangles, _indices.Length, DrawElementsType.UnsignedInt, IntPtr.Zero);
    BufferObject.Unbind(BufferTarget.ElementArrayBuffer);
    VertexArrayObject.Unbind();
}
After refactoring:
public void Render()
{
    using (_vao.Bind())
    using (_ibo.Bind(BufferTarget.ElementArrayBuffer))
    {
        GL.DrawElements(BeginMode.Triangles, _indices.Length, DrawElementsType.UnsignedInt, IntPtr.Zero);
    }
}
Notice that there's a second benefit: the object returned by _ibo.Bind also remembers which BufferTarget I want to unbind. It also draws your attention to GL.DrawElements, which is really the only significant statement in that function (the one that does something noticeable), and it hides away those lengthy unbind statements.
I guess the one downside is that I can't interleave buffer targets with this method. I'm not sure when I would ever want to, but then I would have to keep a reference to the bind object and call Dispose manually, or call the end function manually.
Naming
If no one objects, I'm dubbing this the Disposable Context Object (DCO) idiom.
Problems
JasonTrue raised a good point: in this scenario (OpenGL buffers), nested using statements would not work as expected, since only one buffer can be bound at a time. We can remedy this, however, by expanding the "bind object" to use stacks:
public class BufferContext : IDisposable
{
    private readonly BufferTarget _target;
    private static readonly Dictionary<BufferTarget, Stack<int>> _handles;

    static BufferContext()
    {
        _handles = new Dictionary<BufferTarget, Stack<int>>();
    }

    internal BufferContext(BufferTarget target, int handle)
    {
        _target = target;
        if (!_handles.ContainsKey(target)) _handles[target] = new Stack<int>();
        _handles[target].Push(handle);
        GL.BindBuffer(target, handle);
    }

    public void Dispose()
    {
        // Pop this binding and rebind whichever handle is now on top (0 = unbound).
        _handles[_target].Pop();
        int handle = _handles[_target].Count > 0 ? _handles[_target].Peek() : 0;
        GL.BindBuffer(_target, handle);
    }
}
Edit: Just noticed a problem with this. Before, if you didn't Dispose() your context object, there wasn't really any consequence; the context just wouldn't switch back to whatever it was. Now, if you forget to Dispose of it inside some kind of loop, the stack grows without bound. Perhaps I should limit the stack size...
A similar tactic is used in ASP.NET MVC with the HtmlHelper. See http://msdn.microsoft.com/en-us/library/system.web.mvc.html.formextensions.beginform.aspx (using (Html.BeginForm()) {....}).
So there's at least one precedent for using this pattern for something other than the obvious "need" for IDisposable for unmanaged resources like file handles, database or network connections, fonts, and so on. I don't think there's a special name for it, but in practice it seems to be the C# counterpart of the C++ idiom Resource Acquisition Is Initialization.
When you're opening a file, you're acquiring, and guaranteeing the disposal of, a file context; in your example, the resource you're acquiring is a "binding context", in your words. While I've heard "Dispose pattern" or "using pattern" used to describe the broad category, essentially "deterministic cleanup" is what you're talking about: you're controlling the lifetime of the object.
I don't think it's really a "new" pattern, and the only reason it stands out in your use case is that apparently the OpenGL implementation you're depending on didn't make a special effort to match C# idioms, which requires you to build your own proxy object.
The only thing I'd worry about is whether there are any non-obvious side effects if, for example, you had a nested context with similar using constructs deeper in your block (or call stack).
ASP.NET MVC uses this (optional) pattern to render the beginning and end of a <form> element like this:
@using (Html.BeginForm()) {
    <div>...</div>
}
This is similar to your example in that you are not consuming the value of your IDisposable other than for its disposable semantics. I've never heard a name for this, but I've used this sort of thing before in other similar scenarios, and never considered it anything other than understanding how to leverage the using block with IDisposable, similar to how we can tap into foreach semantics by implementing IEnumerable.
I would say this is more an idiom than a pattern. Patterns are usually more complex, involving several moving parts; idioms are just clever ways to do things in code.
In C++ it is used quite a lot. Whenever you want to acquire something or enter a scope, you create an automatic variable (i.e., on the stack) of a class whose constructor begins or creates or does whatever needs to be done on entry. When you leave the scope where the automatic variable is declared, the destructor is called. The destructor should then end or delete or do whatever is required to clean up.
class Lock {
private:
    CriticalSection* criticalSection;
public:
    Lock() {
        criticalSection = new CriticalSection();
        criticalSection->Enter();
    }
    ~Lock() {
        criticalSection->Leave();
        delete criticalSection;
    }
};

void F() {
    Lock lock;  // note: `Lock lock();` would declare a function instead
    // Everything in here is executed in a critical section and it is exception safe.
}
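For completeness, a C# counterpart to that C++ Lock, sketched with using/IDisposable (Monitor is the real BCL type; the LockScope wrapper is hypothetical):

using System;
using System.Threading;

// Enter the critical section on construction, leave it on Dispose,
// so a using block plays the role of the C++ scope above.
sealed class LockScope : IDisposable
{
    private readonly object _sync;

    public LockScope(object sync)
    {
        _sync = sync;
        Monitor.Enter(_sync);
    }

    public void Dispose()
    {
        Monitor.Exit(_sync);
    }
}

// Usage:
// using (new LockScope(someSharedLockObject))
// {
//     // everything here runs inside the critical section
// }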
