Integrating matlab functions into c# project - c#

I have a nice .net assembly of core matlab functions, created of course with the matlab compiler. For functions that accept numbers or arrays of numbers, this is fine; I can write code in c# without having to revert to matlab (well, the RCM has to be installed; that’s fine).
For functions that must reference other functions, however, the only way I can find so far to get a c# programme going is to compile both functions into the assembly. To explain better, let’s say I have a library in which I’ve stored the ode45 routine. If I want to solve a specific equation, let’s say something simple like dy/dx = -y, then I have to create a matlab script file which may be written as follows:
function dydx = diffeq(x, y)
dydx = -y
[obviously the analytical solution exists, but for the sake of this example let’s say I want to solve it this way]
Now in order to solve this equation, I would have to add this function as a method in my class to be compiled into the .net assembly. This of course ruins the generality of my library; I want application-specific equations in a different library to my core math function library. That is, the ODE45 method should reside in a “more core” library than the library in which the “diffeq” method would reside.
More than that, I would much prefer to create the “diffeq” method in a c# class that I can edit directly in e.g. VS2012. I would like to edit the equation directly rather than having to enter matlab each time and recompile an assembly.
To solve this problem, I have gone to the extent of decompiling the assembly which contains both the ode45 code and my differential equation method; it turns out the assembly is nothing but an interface to the MCR; the diffeq methods in the assembly return something like the following:
return mcr.EvaluateFunction(numArgsOut, “diffeq”, new object[0]);
We note that the function/method “diffeq” is not part of the MCR; MCR does not change. However, I can’t find the equation anywhere in the assembly.
Which begs the question “Dude, where’s my function?”
There is a ‘resources’ component of the assembly in which we find [classname].ctf, and in that we’ll find some machine code. This looks encrypted, but the equation might be hidden in there. If so, that would be a deliberate attempt to prevent when I am attempting, and kudos to MathWorks for making it impossible for me to avoid having to enter the matlab application!
However, there doesn’t seem to be anything in licensing to prevent what I want to do; I think it would be great if mathworks would allow as open an approach as that, but in the interrim, does anyone know how to do this?

The "MATLAB Compiler" has a somewhat misleading name. It is more of a deployment solution than a compiler in the actual sense (see note below). It is mainly intended to distribute MATLAB applications to end-users without requiring a full MATLAB installation on their part (only the royalty-free MCR runtime needs to be installed).
The MCR is in fact a stripped-down version of the MATLAB engine along with accompanying libraries.
When you use MATLAB Compiler to generate a binary package, the result is a target-specific wrapper (be it a standalone application, C/C++ shared library, Java package, or a .NET assembly) that calls the MCR runtime. The binary generated includes an embedded CTF archive containing all the original MATLAB content (your M-files and other dependencies) but in an encrypted form. When first executed, the CTF archive is extracted to a temp folder, and the M-files (still encrypted) are then interpreted by the MCR at runtime like typical MATLAB code.
There is an option in deploytool (mcc -C) to tell the compiler not to embed the CTF archive inside the binary as a resource, instead to place it as a seperate file next to the generated binary (this CTF archive can be inspected as a regular ZIP-file, but the source files inside are still encrypted of course).
See the following documentation page for more information:
Application Deployment Products and the Compiler Apps
PS: The truth is MATLAB Compiler started out as a product to convert MATLAB code into full C/C++ code which used the now discontinued "MATLAB C/C++ Math Library" (no runtime requirement, you just compile the generated C++ code and link to certain shared libraries; the result is a true compiled executable not a wrapper). This functionality completely changed around the time MATLAB 7 was released (the reason being that the old way only supported a subset of the MATLAB language, while using the current MCR mechanism enables deploying almost any code). Years later, MATLAB added a new product to replace the once-removed functionality of code translation, namely the MATLAB Coder.

Related

Obtain assembly from executable

I am on a hunt to write a program that will be able to read assembly code from a specified .exe . What i am trying to do is to read the assembly code of a executable, in order to find instructions that i can replace with equivalent instructions, in order to obtain different byte code.
This is generally called a source scrambler / signature scrambler, that modifies assembly code in order to obtain different byte code which results in different signatures.
I was reading about the Assembly class in C# , but did not find anything that could return something like a IEnumerable that contains assembly code from the .exe
Is there anyone that can educate me on this? Is this even possible? Different approaches?
.NET does not really deal with Byte Code. All .NET Programms are turned into somthing called MSIL that is executed by teh .NET Runtime and only then turned into bytecode. The whole process is very similar to how JavaBytecode works.
As a result you get full access to the names of anything. You can even use .NEt Executeables like a .DLL file. But replacing stuff is not easy, outside of inheritance or replacing of the files.
The kind of bitwise manipulation you propably need, requires working with naked pointers. And the .NET Developers went out of their way so you would no ever have to use naked Pointers. You can still use them using Unsface code, but as this sounds like the primary use of your programm you are propably better of starting with something like native C++ instead. Really anything taht uses naked pointers as default, rather then a fallback.

I have the start address of a loaded DLL, how can I discover and call its exports?

I'm writing an add-in that runs in-process. I'm reliably able to discover the memory address of a DLL that is already loaded in that process. The memory at the offset clearly shows an "MZ" DOS header and a "PE" header. Later, there appears to be the names of exported functions etc. This walks and talks like a loaded DLL.
So, now, I'd like to discover more about what the DLL is, and more interestingly, what I might be able to do with it.
I've used PE utilities in the past, but they've always worked with file-based DLLs. How can I list the exported functions of an in-memory DLL, other than by inspecting the process in a hex editor? Is there any way to discover the file-based DLL that is currently loaded? (I'm not overly familiar with the linking that I think takes place when the dll is loaded.)
If I have the names of the exported functions, is it just a matter of trying to call those functions, and guessing their arguments and return values? Or is there some more robust reverse engineering that could be performed?
Given the starting address of the DLL, and a function name, how would I go about making a call in C#?
There are actually many questions here (and some are pretty vast). I'll try providing answers to some.
To get the handle of a module (.dll) loaded in the current process use
[MSDN]: GetModuleHandleEx function, whether its name or address is known (GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS flag)
To get the file name containing the code of a loaded .dll use [MSDN]: GetModuleFileName function. At this point you'll be able to use PE utilities that you mentioned (Dependency Walker or [MSDN]: Sysinternals Suite can be very useful)
Apparently, getting the functions signatures from a .dll is not a trivial task (sometimes it's quite impossible). Here are 3 URLs, but Google would yield tons of results on this topic:
[MSDN]: how can I get metadata (signature with parameters and return type) of C++ dll
[SO]: Is there any native DLL export functions viewer?
[SO]: Get signatures of exported functions in a DLL
But a .dll comes with one or more header file(s) that contain(s) (among other things) the functions/classes declarations. Lacking the header(s) would mean that either:
The .dll symbols are for internal purposes only (and not to be called "manually")
They are private (e.g. protected by a license), and in that case reverse engineering them wouldn't be very ethical
Anyway, considering that one way or another you get the functions names and signatures, you can load the functions via [MSDN]: GetProcAddress function, and then call them.
Doing everything from .NET (C#) (again, function names and signatures are required), only adds an extra level of complexity because C# code runs in managed environment, while C/C++ runs in native environment, and when data is exchanged between the 2 environments it needs to be marshalled/unmarshalled ([MSDN]: Overview of Marshaling in C++).
Below are 3 URLs, but again, Internet is full of information:
[MSDN]: Calling Native Functions from Managed Code
[SO]: Is it possible to call a C function from C#.Net
[SO]: Calling C DLL from C#

Make an executable at runtime

Ok, so I was wondering how one would go about creating a program, that creates a second program(Like how most compression programs can create self extracting self excutables, but that's not what I need).
Say I have 2 programs. Each one containing a class. The one program I would use to modify and fill the class with data. The second file would be a program that also had the class, but empty, and it's only purpose is to access this data in a specific way. I don't know, I'm thinking if the specific class were serialized and then "injected" into the second file. But how would one be able to do that? I've found modifying files that were already compiled fascinating, though I've never been able to make changes that didn't cause errors.
That's just a thought. I don't know what the solution would be, that's just something that crossed my mind.
I'd prefer some information in say c or c++ that's cross-platform. The only other language I'd accept is c#.
also
I'm not looking for 3-rd party library's, or things such as Boost. If anything a shove in the right direction could be all I need.
++also
I don't want to be using a compiler.
Jalf actually read what I wrote
That's exactly what I would like to know how to do. I think that's fairly obvious by what I asked above. I said nothing about compiling the files, or scripting.
QUOTE "I've found modifying files that were already compiled fascinating"
Please read and understand the question first before posting.
thanks.
Building an executable from scratch is hard. First, you'd need to generate machine code for what the program would do, and then you need to encapsulate such code in an executable file. That's overkill unless you want to write a compiler for a language.
These utilities that generate a self-extracting executable don't really make the executable from scratch. They have the executable pre-generated, and the data file is just appended to the end of it. Since the Windows executable format allows you to put data at the end of the file, caring only for the "real executable" part (the exe header tells how big it is - the rest is ignored).
For instance, try to generate two self-extracting zip, and do a binary diff on them. You'll see their first X KBytes are exactly the same, what changes is the rest, which is not an executable at all, it's just data. When the file is executed, it looks what is found at the end of the file (the data) and unzips it.
Take a look at the wikipedia entry, go to the external links section to dig deeper:
http://en.wikipedia.org/wiki/Portable_Executable
I only mentioned Windows here but the same principles apply to Linux. But don't expect to have cross-platform results, you'll have to re-implement it to each platform. I couldn't imagine something that's more platform-dependent than the executable file. Even if you use C# you'll have to generate the native stub, which is different if you're running on Windows (under .net) or Linux (under Mono).
Invoke a compiler with data generated by your program (write temp files to disk if necessary) and or stored on disk?
Or is the question about the details of writing the local executable format?
Unfortunately with compiled languages such as C, C++, Java, or C#, you won't be able to just ``run'' new code at runtime, like you can do in interpreted languages like PHP, Perl, and ECMAscript. The code has to be compiled first, and for that you will need a compiler. There's no getting around this.
If you need to duplicate the save/restore functionality between two separate EXEs, then your best bet is to create a static library shared between the two programs, or a DLL shared between the two programs. That way, you write that code once and it's able to be used by as many programs as you want.
On the other hand, if you're really running into a scenario like this, my main question is, What are you trying to accomplish with this? Even in languages that support things like eval(), self modifying code is usually some of the nastiest and bug-riddled stuff you're going to find. It's worse even than a program written completely with GOTOs. There are uses for self modifying code like this, but 99% of the time it's the wrong approach to take.
Hope that helps :)
I had the same problem and I think that this solves all problems.
You can put there whatever code and if correct it will produce at runtime second executable.
--ADD--
So in short you have some code which you can hard-code and store in the code of your 1st exe file or let outside it. Then you run it and you compile the aforementioned code. If eveything is ok you will get a second executable runtime- compiled. All this without any external lib!!
Ok, so I was wondering how one would
go about creating a program, that
creates a second program
You can look at CodeDom. Here is a tutorial
Have you considered embedding a scripting language such as Lua or Python into your app? This will give you the ability to dynamically generate and execute code at runtime.
From wikipedia:
Dynamic programming language is a term used broadly in computer science to describe a class of high-level programming languages that execute at runtime many common behaviors that other languages might perform during compilation, if at all. These behaviors could include extension of the program, by adding new code, by extending objects and definitions, or by modifying the type system, all during program execution. These behaviors can be emulated in nearly any language of sufficient complexity, but dynamic languages provide direct tools to make use of them.
Depending on what you call a program, Self-modifying code may do the trick.
Basically, you write code somewhere in memory as if it were plain data, and you call it.
Usually it's a bad idea, but it's quite fun.

How do languages like C# and Java avoid C/C++-like independent compilation?

For my programming languages class, I'm writing a research paper on some papers by some important people in the history of language design. One by CAR Hoare struck me as odd because it speaks against independent compilation techniques used in C and later C++ before C even became popular.
Since this is primarily an optimization to speed up compilation times, what is it about Java and C# that make them able to avoid reliance on independent compilation? Is it a compiler technique or are there elements of the language that facilitate this? And are there any other compiled languages that used these techniques before them?
Short answer: Java and C# don't avoid separate compilation; they make full use of it.
Where they differ is that they don't require the programmer to write a pair of separate header/implementation files when writing a reusable library. The user writes the definition of a class once, and the compiler extracts the information equivalent to the "header" from that single definition and includes it in the output file as "type metadata". So the output file (a .jar full of .class files in Java, or an .dll assembly in .NET-based languages) is a combination of binaries AND headers in a single package.
Then when another class is compiled and it depends on the first class, it can look at the metadata instead of having to find a separate include file.
It happens that they target a virtual machine rather than a specific chip architecture, but that's a separate issue; they could put x86 machine code in as the binary and still have the header-like metadata in the same file as well (this is in fact an option in .NET, albeit rarely used).
In C++ compilers it is common to try to speed up compilation by using "pre-compiled headers". The metadata in .NET .dll and .class files is much like a pre-compiled header - already parsed and indexed, ready for rapid look-ups.
The upshot is that in these modern languages, there is one way of doing modularization, and it has the characteristics of a perfectly organised and hand-optimised C++ modular build system - pretty nifty, speaking ASFAC++B.
IMO, one of the biggest factors here is that both java and .NET use intermediate languages; that means that the compiled unit (jar/assembly) contains, as a pre-requisite, a lot of expressive metadata about the types, methods, etc; meaning that it is already laid out conveniently for reference checking. The runtime still checks anyway, in case you are pulling a fast one ;-p
This isn't very far removed from the MIDL that underpins COM, although there the TLB is often a separate entity.
If I've misunderstood your meaning, please let me know...
You could consider a java .class file to be similar to a precompiled header file in C/C++. Essentially the .class file is the intermediate form that a C/C++ linker would need as well as all of the information contained in the header (Java just doesn't have a separate header).
Form your comment in another post:
"I'm basically meaning the idea in
C/C++ that each source file is its own
individual compilation unit. This
doesn't as much seem to be the case in
C# or Java."
In Java (I cannot speak for C#, but I assume it is the same) each source file is its own individual compilation unit. I am not sure why you would think it is not... perhaps we have different definitions of compilation unit?
It requires some language support (otherwise, C/C++ compilers would do it too)
In particular, it requires that the compiler generates self-contained modules, which expose metadata that other modules can reference to call into them.
.NET assemblies are a straightforward example. All the files in a project are compiled together, generating one dll. This dll can be queried by .NET to determine which types it contains, so that other assemblies can call functions defined in it.
And to make use of this, it must be legal in the language to reference other modules.
In C++, what defines the boundary of a module? The language specifies that the compiler only considers data in its current compilation unit (.cpp file + included headers). There is no mechanism for specifying "I'd like to call function Foo in module Bar, even though I don't have the prototype or anything for it at compile-time". The only mechanism you have for sharing type information between files is with #includes.
There is a proposal to add a module system to C++, but it won't be in C++0x. Last I saw, the plan was to consider it for a TR1 after 0x is out.
(It's worth mentioning that the #include system in C/C++ was originally used because it'd speed up compilation. Back in the 70's, it allowed the compiler to process the code in a simple linear scan. It didn't have to build syntax trees or other such "advanced" features. Today, the tables have turned and it's become a huge bottleneck, both in terms of usability and compilation speed.)
The object files generated by a C/C++ are ment to be read only by the linker, not by the compiler.
As to other languages: IIRC Turbo Pascal had "units" which you could use without having any source code. I think the point is to create metadata along with compiled code which can then be used by the compiler to figure out the interface to the module (i.e. signatures of functions, class layout etc.)
One problem with C/C++ which prevents just replacing #include with some kind of #import is also the preprocessor, which can completely change the meaning/syntax etc of included/imported modules. This would be very difficult (if not impossible) with a Java-like module system.

Assembler library for .NET, assembling runtime-variable strings into machine code for injection

Is there such a thing as an x86 assembler that I can call through C#? I want to be able to pass x86 instructions as a string and get a byte array back. If one doesn't exist, how can I make my own?
To be clear - I don't want to call assembly code from C# - I just want to be able to assemble code from instructions and get the machine code in a byte array.
I'll be injecting this code (which will be generated on the fly) to inject into another process altogether.
As part of some early prototyping I did on a personal project, I wrote quite a bit of code to do something like this. It doesn't take strings -- x86 opcodes are methods on an X86Writer class. Its not documented at all, and has nowhere near complete coverage, but if it would be of interest, I would be willing to open-source it under the New BSD license.
UPDATE:
Ok, I've created that project -- Managed.X86
See this project:
https://github.com/ZenLulz/MemorySharp
This project wraps the FASM assembler, which is written in assembly and as a compiled as Microsoft coff object, wrapped by a C++ project, and then again wrapped in C#. This can do exactly what you want: given a string of x86/x64 assembly, this will produce the bytes needed.
If you require the opposite, there is a port of the Udis86 disassembler, fully ported to C#, here:
https://github.com/spazzarama/SharpDisasm
This will convert an array of bytes into the instruction strings for x86/x64
Take a look at Phoenix from Microsoft Research.
Cosmos also has some interesting support for generating x86 code:
http://www.gocosmos.org/blog/20080428.en.aspx
Not directly from C# you can't. However, you could potentially write your own wrapper class that uses an external assembler to compile code. So, you would potentially write the assembly out to a file, use the .NET Framework to spin up a new process that executes the assembler program, and then use System.IO to open up the generated file by the assembler to pull out the byte stream.
However, even if you do all that, I would be highly surprised if you don't then run into security issues. Injecting executable code into a completely different process is becoming less and less possible with each new OS. With Vista, I believe you would definitely get denied. And even in XP, I think you would get an access denied exception when trying to write into memory of another process.
Of course, that raises the question of why you are needing to do this. Surely there's got to be a better way :).
Take a look at this: CodeProject: Using unmanaged code and assembly in C#.
I think you would be best off writing a native Win32 dll. You can then write a function in assembler that is exported from the dll. You can then use C# to dynamically link to the dll.
This is not quite the same as passing in a string and returning a byte array. To do this you would need an x86 assembler component, or a wrapper around masm.exe.
i don't know if this is how it works but you could just shellexecute an external compiler then loading the object generated in your byte array.

Categories