Packing a .NET executable inside a C/C++ executable

I built a C# WPF (.NET Core 3.1) application that has gotten some interest, and I'm about to monetize it.
However, building any kind of license check in C# is of limited value: any user could open the assembly in a decompiler such as ILSpy and crack it, rendering my work pretty much worthless.
I took a long hard look at the .NET obfuscators, but ultimately concluded they did not fit my requirements because there are decompilers that can still retrieve the code from Dotfuscator, Babel, Obfuscar, etc. Simply obfuscating names and whatnot isn't really useful, as one could simply debug the code to the point where a license is required.
What I'm trying to do now is build a C/C++ launcher that will execute my .NET application from memory.
My plan is to stream the bytes from a server, load them in memory, and run the program. However, I don't know a whole lot about how I could achieve this.
I've tried using VirtualAlloc to allocate memory for the bytes and marking the pages as executable, but it didn't work.
I've tried adjustments based on a few pieces of code that run a PE from memory:
https://github.com/aaaddress1/RunPE-In-Memory
https://github.com/codecrack3/Run-PE---Run-Portable-Executable-From-Memory/blob/master/RunPE.cpp
https://www.codeproject.com/Articles/13897/Load-an-EXE-File-and-Run-It-from-Memory
The closest I got was a 0xc0000005 (access violation) error when trying to run the executable from memory (an array of bytes that make up my program).
How can this be done? I'd really like to avoid having to rewrite the whole thing in C/C++, especially because of the complex UI.

Related

Is there a way to dynamically generate native x86 code and execute it from .NET?

I'd like to dynamically generate native unmanaged x86 code and then execute it super fast from managed .NET. I know I could emit some IL code and execute that, but I'm thinking the jitting would take too much time to get the benefit of any speed gain.
I need super fast code: I want to generate a function from x86 opcodes in memory and pass it a fixed pointer to a memory block, so it can do some really fast calculations on that block.
I'm just not sure how to call the native code from .NET. Remember, this should be on the fly, in memory, not building a DLL. Speed is what really matters here. It's part of a genetic computation project.
The C language is the "portable assembler" and you can generate x86 opcodes directly (writing in assembler would be better). If you compiled the C code into an object, you could link it into .NET as a library.
If you are trying to generate executable code dynamically (on the fly), you would need to allocate a byte array, push the object code into the array, then take the starting address of the code (past any memory header) and assign it to a function pointer and call it.
However, antivirus software specifically looks for this behavior and would more than likely flag your code as a virus.
Also, your processor is designed to have "code" memory segments and "data" memory segments. Typically you cannot dynamically modify the code segment without causing a segfault, and you cannot execute out of a data segment without causing a segfault.
Also, your code would only run on a CISC processor (x86) and not on a RISC processor.
There is a lot to consider.
We used to do this in assembler in the old days on CISC systems.
Cheers,
David
The short answer is that you can't do that with C#.
You can do that with C++/CLI, by building a mixed-mode assembly that does your "super fast" calculations in native code. That way, you wouldn't need (presumably) to generate executable code "on-the-fly".
If for some reason you can't hard-code the calculation functions, then you will have to acquire executable memory from the operating system.
On Windows, you do that with the VirtualAlloc and VirtualProtect functions. Again, doing that from C# would require P/Invoke interop, which would most likely reverse your speed gains.
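Purely for illustration, here is a minimal sketch of what that P/Invoke route would look like. It assumes a 64-bit Windows process, and the opcode bytes are just a toy function that doubles its argument - not anything a real code generator would produce:

using System;
using System.Runtime.InteropServices;

static class NativeCodeDemo
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
                                      uint flAllocationType, uint flProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualProtect(IntPtr lpAddress, UIntPtr dwSize,
                                      uint flNewProtect, out uint lpflOldProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualFree(IntPtr lpAddress, UIntPtr dwSize, uint dwFreeType);

    const uint MEM_COMMIT = 0x1000, MEM_RESERVE = 0x2000, MEM_RELEASE = 0x8000;
    const uint PAGE_READWRITE = 0x04, PAGE_EXECUTE_READ = 0x20;

    // Managed signature of the generated function: int fn(int x)
    delegate int NativeFunc(int x);

    static void Main()
    {
        // x64 machine code (Windows calling convention, first argument in ECX):
        //   mov eax, ecx ; add eax, eax ; ret   -- returns 2 * x
        byte[] code = { 0x8B, 0xC1, 0x03, 0xC0, 0xC3 };

        IntPtr mem = VirtualAlloc(IntPtr.Zero, (UIntPtr)(uint)code.Length,
                                  MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (mem == IntPtr.Zero)
            throw new InvalidOperationException("VirtualAlloc failed");

        Marshal.Copy(code, 0, mem, code.Length);

        // Flip the page from writable to executable once the bytes are in place.
        uint oldProtect;
        VirtualProtect(mem, (UIntPtr)(uint)code.Length, PAGE_EXECUTE_READ, out oldProtect);

        var fn = (NativeFunc)Marshal.GetDelegateForFunctionPointer(mem, typeof(NativeFunc));
        Console.WriteLine(fn(21)); // prints 42

        VirtualFree(mem, UIntPtr.Zero, MEM_RELEASE);
    }
}

Every call through a delegate like this still pays the managed/unmanaged transition cost mentioned above, which is why the mixed-mode C++/CLI route is usually the better fit.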

How to speed up MonoTouch compilation time?

It is well known that
If compiling takes even 15 seconds, programmers will get bored while the compiler runs and switch over to reading The Onion, which will suck them in and kill hours of productivity.
Our MonoTouch app takes 40 seconds to compile on a MacBook Air in the Debug/Simulator configuration.
We have about 10 assemblies in the solution.
We're also linking against some native libraries with gcc_flags.
I'm sure there are ways to optimize compilation time that I'm not aware of, which might have to do with references, linker, whatever.
I'm asking this question in hope that someone with better knowledge than me will compile (no pun intended) a list of tips and things to check to reduce MonoTouch compilation time for debug builds.
Please don't suggest hardware optimizations or optimizations not directly related to MonoTouch.
Build Time Improvements in Xamarin.iOS 6.4
Xamarin.iOS 6.4 has significant build time improvements, and there is now an option to only send updated bits of code to the device. See for yourself:
[Build time comparison chart - source: xamarin.com]
Read more and learn how to enable incremental build in Rolf's post.
Evolve 2013 Video
An updated and expanded version of this content can be seen in the video of the Advanced iOS Build mechanics talk I gave at Evolve 2013.
Original Answer
There are several factors affecting build speed. However most of them have more impact on device builds, including the use of the managed linker that you mentioned.
Managed Linker
For devices, Link all is the fastest, followed by Link SDK and (at the very end) Don't link. The reason is that the linker can eliminate code faster than the AOT compiler can build it (a net gain). Also, the smaller .app will upload faster to your devices.
For simulator Don't link is always faster because there's no AOT (the JIT is used). You should not use other linking options unless you want to test them (it's still faster than doing a device build).
Device tricks
Building a single architecture (e.g. ARMv7) is faster than a fat binary (e.g. ARMv7 + ARMv7s). Smaller applications also mean less time to upload to the device;
The default AOT compiler (mono) is a lot faster than using the LLVM compiler. However, the latter will generate better code and also supports ARMv7s and Thumb2;
If you have large assets bundled in your .app then it will take time to deploy/upload them (every time since they must be signed) with your app. I wrote a blog post on how you can workaround this - it can save a lot of time if you have large assets;
Object file caching was implemented in MonoTouch 5.4. Some builds will be a lot faster, but others (when the cache must be purged) won't be - though never slower ;-). More information on why this often happens is available here;
Debug builds take longer because of symbols, running dsymutil and, since the build ends up being larger, the extra time to upload to the device.
Release builds will, by default (you can turn it off), do an IL strip of the assemblies. That takes only a bit of time - likely gained back when deploying the (smaller) .app to the device.
Simulator tricks
As said earlier, try to avoid linking since it takes more time and requires copying the assemblies (instead of symlinking them);
Using native libraries is slower, because we cannot reuse the shared simlauncher main executable in such cases and need to ask gcc to compile one for the application (and that's slow).
Finally, whenever in doubt, time it! By that I mean you can add --time --time to your project's extra mtouch arguments to see a timestamp after each operation :-)
This is not really meant as an answer, rather a temporary placeholder until there is a better one.
I found this quote by Seb:
Look at your project's build options and make sure the "Linker behavior" is at the default "Link SDK assemblies".
If it's showing "Don't link" then you'll experience very long build time (a large part of it in dsymutil).
I don't know if it is still relevant though, because MonoDevelop shows a warning sign when I choose this option, and it doesn't seem to affect performance much.
You cannot expect your compiler to be lightning quick without understanding everything that it is required to do. Larger applications will naturally take longer, and different languages or different compilers for the same language can make a huge difference in how long it takes to compile your code.
We have a project that takes almost 2 minutes to compile. Your best solution is to figure out a way to reduce the number of times you compile your code.
Instead of trying to fix one line of code and rebuilding over and over again, get a group of people together to discuss the problem, or create a list of 3 or 4 things you want to work on, complete them all, then test.
These are just some suggestions and they will not work in all cases.

Wrap and Protect executable

I'm working on copy protection software and I'm trying to create a wrapper around any kind of executable (managed and unmanaged). This wrapper will then try to execute the wrapped executable without first writing it to disk and launching it with Process.Start() as you normally would.
I used .NET 4.0's Assembly and AppDomain APIs to get this working, but as I've read and tested, that will only work with .NET executables. How would I go about executing any kind of executable without writing it "naked" to the drive?
Can I execute it from within an encrypted compressed file for example?
or MemoryMappedFile?
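For context, the managed-only version mentioned above boils down to something like this (a rough sketch, ignoring the AppDomain isolation; "payload.bin" is just a placeholder for wherever the wrapped bytes actually come from):

using System;
using System.IO;
using System.Reflection;

class ManagedWrapper
{
    static void Main()
    {
        // "payload.bin" is a placeholder for the wrapped/decrypted executable bytes.
        byte[] exeBytes = File.ReadAllBytes("payload.bin");

        // Works only for managed (.NET) images -- native PEs are rejected here.
        Assembly asm = Assembly.Load(exeBytes);

        MethodInfo entry = asm.EntryPoint;
        object[] args = entry.GetParameters().Length == 0
            ? null
            : new object[] { new string[0] };
        entry.Invoke(null, args);
    }
}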
Really you are wasting your time. You CANNOT stop someone from copying your executable, getting your code, or anything else. Even if you can perfectly protect the executable file on disk, the moment it starts running, someone can use a debugger to make a dump of the executable, even from a memory mapped file. This is how products like Xenocode, or .NET Reactor, or any other packer for that matter, are defeated.
The better option for you is to stop and think about what it is that you are really trying to achieve. Are you concerned about people violating a license agreement? Are you worried about your software appearing on The Pirate Bay? If you make useful software, both of these things are eventualities, not possibilities. Protect your software with copyright, and your algorithms with patents, if appropriate. Then you have legal recourse to go after violators.
Sorry to burst your bubble, but there is no technical solution that cannot be defeated. Dongles can be emulated, web services can be patched around, encryption keys can be sniffed, etc. Spend your time making your software great, not trying to protect what cannot be protected.

Measure startup performance c# application

I noticed that sometimes a .NET 4.0 C# application takes a long time to start, without any apparent reason. How can I determine what's actually happening and which modules are loaded? I'm using a number of external assemblies. Can putting them into the GAC improve performance?
Is .NET 4 slower than .NET 2?
.NET programs have two distinct start-up behaviors. They are called cold-start and warm-start. The cold-start is the slow one, you'll get it when no .NET program was started before. Or when the program you start is large and was never run before. The operating system has to find the assembly files on disk, they won't be available in the file system cache (RAM). That takes a while, hard disks are slow and there are a lot of files to find. A small do-nothing Winforms app has to load 51 DLLs to get started. A do-nothing WPF app weighs in at 77 DLLs.
You get a warm start when the assembly files were loaded before, not too long ago. The assembly file data now comes from RAM instead of the slow disk, that's zippedy-doodah. The only startup overhead is now the jitter.
There's little you can do about cold starts; the assemblies have to come off the disk one way or another. A fast disk makes a big difference, SSDs are especially effective. Using ngen.exe to pre-jit an assembly actually makes the problem worse, it creates another file that needs to be found and loaded, which is the reason that Microsoft recommends not pre-jitting small assemblies. Seeing this problem with .NET 4 programs is also to be expected: there aren't yet a lot of programs that bind to the version 4 CLR and framework assemblies, so they are less likely to be in the cache already. This solves itself over time.
There's another way this problem automatically disappears. The Windows SuperFetch feature will start to notice that you often load the CLR and the jitted Framework assemblies and will start to pre-load them into RAM automatically. The same kind of trick that the Microsoft Office and Adobe Reader 'optimizers' use. They are also programs that have a lot of DLL dependencies. Unmanaged ones, the problem isn't specific to .NET. These optimizers are crude: they preload the DLLs when you log in, which is the 'I'm really important, screw everything else' approach to working around the problem. Make sure you disable them so they don't crowd out the RAM space that SuperFetch could use.
The startup time is most likely due to the runtime JIT compiling assembly IL into machine code for execution. It can also be affected by the debugger - as another answerer has suggested.
Excluding that, I'll talk about an application run 'in the wild' on a user's machine, with no debugger etc.
The JIT compiler in .Net 4 is, I think it's fair to say, better than in .Net 2 - so no; it's not slower.
You can improve this startup time significantly by running ngen on your application's assemblies - this pre-compiles the EXEs and DLLs into native images. However you lose some flexibility by doing this and, in general, there is not much point.
You should see the startup time of some MFC apps written in C++ - all native code, and yet depending on how they are linked they can take just as long.
It does, of course, also depend on what an application is actually doing at startup!
I don't think putting your assemblies in the GAC will boost performance.
If possible, add logging around each statement in your Loading or Initialize events; that may help you identify which statement is actually taking time, and from there which library is slow to load.
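A minimal sketch of that kind of timing, assuming nothing about your actual startup code (the step and method names in the usage comments are made up):

using System.Diagnostics;

static class StartupTimer
{
    static readonly Stopwatch sw = Stopwatch.StartNew();

    // Call between startup steps to see where the time goes.
    public static void Mark(string step)
    {
        Trace.WriteLine(string.Format("{0,6} ms  {1}", sw.ElapsedMilliseconds, step));
    }
}

// Usage inside your Loading/Initialize code (method names are hypothetical):
//   StartupTimer.Mark("before LoadSettings");
//   LoadSettings();
//   StartupTimer.Mark("before ConnectToDatabase");
//   ConnectToDatabase();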

C++ backend with C# frontend?

I have a project in which I'll have to process hundreds if not thousands of messages a second and plot this data on graphs accordingly. (The user will search for a set of data for which the graph is plotted in real time; it won't literally have to plot thousands of values on a graph.)
I'm having trouble understanding how to use DLLs to do the bulk of the message processing in C++ and then hand the results over to a C# interface. Can someone dumb it down for me here?
Also, as speed will be a priority, I was wondering whether calls across two different layers of code will have more of a performance hit than programming the project entirely in C# or, of course, C++. However, I've read bad things about programming a GUI in C++, and this application must also look modern, clean, and professional. So I was thinking C# would be the way forward (perhaps WPF with XAML).
Thanks for your time.
The simplest way to interop between a C/C++ DLL and a .NET Assembly is through p/invoke. On the C/C++ side, create a DLL as you would any other. On the C# side you create a p/invoke declaration. For example, say your DLL is mydll.dll and it exports a method void Foo():
[DllImport("mydll.dll")]
extern static void Foo();
That's it. You simply call Foo like any other static class method. The hard part is getting data marshalled and that is a complicated subject. If you are writing the DLL you can probably go out of your way to make the export functions easily marshalled. For more on the topic of p/invoke marshalling see here: http://msdn.microsoft.com/en-us/magazine/cc164123.aspx.
You will take a performance hit when using p/invoke. Every time a managed application makes an unmanaged method call, it takes a hit crossing the managed/unmanaged boundary and then back again. When you marshal data, a lot of copying goes on. The copying can be reduced if necessary by using 'unsafe' C# code (using pointers to access unmanaged memory directly).
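To make the 'unsafe' option concrete, here is a sketch of handing a buffer to native code without an extra copy; mydll.dll and ProcessBuffer are placeholder names in the spirit of the Foo example above, and the project must have unsafe code enabled:

using System.Runtime.InteropServices;

static class FastInterop
{
    // Hypothetical native export: void ProcessBuffer(double* data, int length);
    [DllImport("mydll.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern unsafe void ProcessBuffer(double* data, int length);

    public static unsafe void Process(double[] samples)
    {
        // Pinning the managed array lets the native code work on it in place,
        // so nothing is copied across the boundary.
        fixed (double* p = samples)
        {
            ProcessBuffer(p, samples.Length);
        }
    }
}

Blittable arrays such as double[] are pinned rather than copied by the default marshaller anyway; the unsafe version just makes the no-copy behaviour explicit.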
What you should be aware of is that all .NET applications are chock full of p/invoke calls. No .NET application can avoid making Operating System calls and every OS call has to cross into the unmanaged world of the OS. WinForms and even WPF GUI applications make that journey many hundreds, even thousands of times a second.
If it were my task, I would first do it 100% in C#. I would then profile it and tweak performance as necessary.
If speed is your priority, C++ might be the better choice. Try to make some estimations about how hard the calculation really is (1000 messages can be trivial to handle in C# if the calculation per message is easy, and they can be too hard for even the best optimized program). C++ might have some more advantages (regarding performance) over C# if your algorithms are complex, involving different classes, etc.
You might want to take a look at this question for a performance comparison.
Separating back-end and front-end is a good idea. Whether you get a performance penalty from having one in C++ and the other in C# depends on how much data conversion is actually necessary.
I don't think programming the GUI is a pain in general. MFC might be painful, Qt is not (IMHO).
Maybe this gives you some points to start with!
Another possible way to go: sounds like this task is a prime target for parallelization. Build your app in such a way that it can split its workload on several CPU cores or even different machines. Then you can solve your performance problems (if there will be any) by throwing hardware at them.
If you have C/C++ source, consider linking it into a C++/CLI .NET assembly. This kind of project allows you to mix in unmanaged code and put managed interfaces on it. The result is a simple .NET assembly that is trivial to use in C# or VB.NET projects.
There is built-in marshaling of simple types, so that you can call functions from the managed C++ side into the unmanaged side.
The only thing you need to be aware of is that when you marshal a delegate into a function pointer, it doesn't hold a reference, so if you need the C++ to hold managed callbacks, you need to arrange for a reference to be held. Other than that, most of the built-in conversions work as expected. Visual Studio will even let you debug across the boundary (turn on unmanaged debugging).
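The same lifetime issue exists with plain p/invoke callbacks. A sketch of the usual workaround (RegisterCallback and mixedmode.dll are hypothetical names) is to keep the delegate in a field so the garbage collector cannot collect it while native code still holds the function pointer:

using System;
using System.Runtime.InteropServices;

static class CallbackHost
{
    // Hypothetical native export that stores the callback for later use:
    //   void RegisterCallback(void (*cb)(int percent));
    [DllImport("mixedmode.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void RegisterCallback(IntPtr callback);

    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    delegate void ProgressCallback(int percent);

    // Held in a static field so the delegate outlives the native registration.
    static ProgressCallback keepAlive;

    public static void Register()
    {
        keepAlive = percent => Console.WriteLine("progress: " + percent);
        RegisterCallback(Marshal.GetFunctionPointerForDelegate(keepAlive));
    }
}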
If you have a .lib, you can use it in a C++/CLI project as long as it's linked to the C-Runtime dynamically.
You should really prototype this in C# before you start screwing around with marshalling and unmarshalling data into unsafe structures so that you can invoke functions in a C++ DLL. C# is very often faster than you think it'll be. Prototyping is cheap.
