I am a security dude, and I have done extensive research on this one, and at this point I am looking for guidance on where to go next.
Also, sorry for the long post, I bolded the important parts.
What I am trying to do at a high level is simple:
I am trying to input some data into a program, "follow" that data, and track how it's processed and where it ends up.
For example, if I input my login credentials into FileZilla, I want to track every memory reference that accesses them, initiate traces to follow where that data went and which libraries it was sent to, and bonus points if I can even correlate it down to the network packet.
Right now I am focusing on the Windows platform, and I think my main question comes down to this:
Are there any good APIs to remote control a debugger that understand Windows forms and system libraries?
Here are the key attributes I have found so far:
The name of this analysis technique is "Dynamic Taint Analysis"
It's going to require a debugger or a profiler
Inspect.exe is a useful tool to find Windows UI elements that take input
The Windows automation framework in general may be useful
Automating debuggers seems to be a pain. The IDebugClient interface allows for richer data, but debuggers like IDA Pro or even Cheat Engine have better memory analysis utilities
I am going to need to place memory breakpoints, and track the references and registers that are associated with the input.
Here are a collection of tools I have tried:
I have played with all the following tools: WinDBG (awesome tool), IDA Pro, CheatEngine, x64dbg, vdb (python debugger), Intel's PIN, Valgrind, etc...
Next, a few Dynamic Taint Analysis tools, but they don't support detecting .NET components or the other conveniences that the Windows debugging framework provides natively through utilities like Inspect.exe:
https://github.com/wmkhoo/taintgrind
http://bitblaze.cs.berkeley.edu/temu.html
I then tried writing my own C# program using the IDebugClient interface, but it's poorly documented, and the best project I could find was from this fellow, and is 3 years old:
C# app to act like WINDBG's "step into" feature
I am willing to contribute code to an existing project that fits this use case, but at this point I don't even know where to start.
I feel like dynamic program analysis and debugging tools as a whole could use some love... I feel kind of stuck, and don't know where to go from here. There are so many different tools and approaches to solving this problem, and all of them are lacking in some manner or another.
Anyway, I appreciate any direction or guidance. If you made it this far thanks!!
-Dave
If you insist on doing this at runtime, Valgrind or Pin might be your best bet. As I understand it (having never used it), you can configure these tools to interpret each machine instruction in an arbitrary way. You want to trace dataflows through machine instructions to track tainted data (reads of such data, followed by writes to registers or condition code bits). A complication will likely be tracing the origin of an offending instruction back to a program element (DLL? Link module? Named subroutine) so that you can complain appropriately.
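The core propagation rule behind such tools can be sketched in a few lines. This is only the shadow-state bookkeeping, not a real Pin or Valgrind tool (those hook actual instructions through their instrumentation APIs); the addresses and the single "copy" operation are simplifications for illustration:

```cpp
#include <cstdint>
#include <unordered_set>

// Shadow state: the set of memory addresses currently holding tainted data.
// A real tool would also track registers and condition-code bits.
struct TaintMap {
    std::unordered_set<uint64_t> tainted;

    // Mark a taint source, e.g. the buffer the credentials were read into.
    void mark(uint64_t addr) { tainted.insert(addr); }

    bool is_tainted(uint64_t addr) const { return tainted.count(addr) != 0; }

    // Model of a mov/copy instruction: the destination becomes tainted
    // exactly when the source was tainted; clean writes clear old taint.
    void on_copy(uint64_t dst, uint64_t src) {
        if (is_tainted(src)) tainted.insert(dst);
        else tainted.erase(dst);
    }
};
```

Chained copies then carry the taint along, which is how the tool can eventually report that a byte of the original input reached, say, a network send buffer.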
This is a task you might succeed at as an individual, in terms of effort.
This should work for applications.
I suspect one of your problems will be tracing where the data goes inside the OS. That's a lot harder, although the same principle applies; your difficulty will be getting the OS supplier to let you track instructions executed in the OS.
Doing this as runtime analysis has the downside that if a malicious application doesn't do anything bad on your particular execution, you won't find any problems. That's the classic shortcoming of dynamic analysis.
You could consider tracking the data at the source code level using classic compiler techniques. This requires that you have access to all the source code that might be involved (that's actually really hard if your application depends on a wide variety of libraries), that you have tools that can parse and track dataflows through source modules, and that these tools talk to each other for different languages (assembler, C, Java, SQL, HTML, even CSS...).
As static analysis, this has a chance of detecting an undesired dataflow no matter which execution occurs. Turing limitations mean that you likely cannot detect all such issues. That's the shortcoming of static analysis.
Building your own tools, or even integrating individual ones, to do this is likely outside what you can reasonably do as an individual. You'll need to find a uniform framework for building such tools. [Check my bio for one].
Related
Are there any alternatives for obfuscation to protect your code from being stolen?
An ultimate protection is the SaaS model. Anything else will expose your precious secrets one way or another.
See: http://en.wikipedia.org/wiki/Software_as_a_service
A short answer is:
Obfuscation has nothing to do with theft protection.
Obfuscation's only purpose is to make it harder to read and understand your code, so that in the best case reverse engineering is economically unattractive.
It is still possible that someone steals your source code, even if you use the best available obfuscation technology or if you think about SaaS scenarios.
You normally have your source code in at least two places, together with all the meta files necessary to build the project:
Your development computer
Your code repository
If you want to protect your code against theft, these are the first places where you must be active. Even the biggest players on the market like Adobe, Microsoft, and Symantec have lost source code as a result of theft, but not as a result of reverse engineering. And in bigger companies it does not need an external attacker - a departing employee is sometimes enough.
So you might be interested in:
Strong machine encryption
Anti virus, Anti rootkit, Anti malware
Firewall and Intrusion Detection
Digital Property Protection
Limited internet access on development computers
Managed remote development environments so that source never leaves secured servers and infrastructure
Clear processes and consistent rights management
Etc. pp.
Today, in many cases, it is a bigger risk that some bad guy manages to get access to your repository or development system, or that a departing employee has a "backup copy" of your code, than that some company invests time in reverse engineering existing applications to create a 1:1 copy or to make modifications (both are illegal in most countries, may lead to big damage of reputation and expensive sentences, and such companies also have no possibility of getting professional support on hacked and modified software).
Obfuscation also does not mean that your intellectual property is safe against being stolen or copied. Depending on the obfuscator you use, it is still possible to analyze the logic.
If you want to make analyzing the logic harder, you need some kind of control flow obfuscation. But control flow obfuscation can produce a lot of funny and hard-to-debug problems. I'm sure that in most cases it's more an additional problem than a solution.
The bad reality is that obfuscation does not solve the problem of reverse engineering. It solves the problem of 1:1 (or close to 1:1) code copies. That's because most software has a recognizable user interface or behavior, and in nearly all cases it is possible to reproduce user interfaces and behaviors (or to be more exact: the results), and there exists no tool to protect software against this.
If you want to keep casual coders from understanding your code, open source tools like Obfuscar may be good enough. But I bet that you will run into problems if you are using technologies like reflection, remoting, plugins, dynamic assembly loading and building, etc. pp.
From my point of view - and that's also my experience - obfuscation is expendable in most cases.
If you really want to make it hard for others to access your code (while "really hard" is relative) you have in general two choices:
Some kind of cryptographic container with a virtual execution environment and a virtual file system, which protects not only your code but the complete application and its structure. The attack vector is e.g. the memory during runtime or the container itself.
Think about SaaS, which means that you deliver access to your software but not the software itself. But keep in mind that SaaS solutions can be hard to develop and expensive, depending on the service level, security and confidence you want or must provide. The attack vector is e.g. the server infrastructure.
The ultimate 100% bulletproof solution does - in fact - not exist on this planet.
Last but not least, it might be necessary to provide the complete source code to customers in some situations. E.g. if you develop individual software and delivering code is part of your contract, or if you want to do business in critical segments like aerospace, the military industry, governmental systems, etc. pp.
You could also code the sensitive functions/components in native C++, wrap them in C++/CLI, and use them with .NET.
Obviously, it can still be reverse engineered, but is an alternative nevertheless.
There is no obfuscator that will ever be secure enough to protect an application written in .NET. Forget it! Obfuscating is not a real protection.
If you have a .NET Exe file there is a FAR better solution.
I use Themida and can tell that it works very well.
Themida is by far cheaper than most obfuscators and is the best anti-piracy protection on the market. It creates a virtual machine where critical parts of your code are run, and runs several threads that detect manipulation or breakpoints set by a cracker. It converts the .NET Exe into something that Reflector does not even recognize as a .NET assembly anymore.
Please read the detailed description on their website: http://www.oreans.com/themida_features.php
The only drawback of Themida is that it cannot protect .NET DLLs. (Its strength is protecting C++ code in Exes and DLLs.)
I saw this thread here. I was wondering if this was legit (sounds like it) and what are the drawbacks of doing this. What does it entail to run it stand alone in some architecture?
Thanks
Trying to create an operating system in a managed language is currently an "interesting research problem". This means that it seems possible, but there are still quite a few important issues that need to be resolved (for example, I wouldn't expect "managed windows" anytime soon).
For example, take a look at the Singularity project (also available at CodePlex). It still has some native parts, but very few of them. As far as I know, even the garbage collector is written in managed code (with some language extension that allows safe manipulation with pointers).
The trick is that even managed code will eventually be compiled to native code. In .NET, the compilation is done usually by JITter when you start the application. In Singularity, this is done in advance, so you run native code (but generated from managed). Singularity has some other interesting aspects - for example, processes communicate via messages (and cannot dynamically load code), which makes it possible to do some aggressive optimizations when generating native code.
There's an open source project that's trying to achieve exactly that.
It's called the "Managed Operating System Alliance". Mainly targeted as a framework (supplying users with a compiler, libraries, interfaces, tools and an example kernel), it will also feature a complete operating system kernel and small apps.
For further information:
Website: http://mosa-project.org/projects/mosa
IRC: #mosa on freenode
It is legit. Drawbacks are clear, this is a micro kernel. It is going to be a while before your video adapter driver will be fully managed as well. That takes acquiring critical mass with many devs and manufacturers jumping on the bandwagon. Difficult, but it has happened with Linux as the obvious example.
This is being pursued by Microsoft as well. Singularity has been well published about. It has evolved into a secret research project named Midori. There have been enough leaks about it to know its goal, Wikipedia has an article about it. I think lots of the devs that worked on the original CLR joined this project. Whether it will come to a good end is an open question. If it does, clearly the project backer is probably enough to get that critical mass rolling.
Microsoft's Singularity project is an operating system architecture framework which will allow people to write customizable operating systems, and probably Microsoft's new operating system will be based on Singularity.
.NET is a very powerful framework; it has evolved to contain everything from metadata attributes to LINQ, and it certainly frees us from bad pointer errors.
Just like Windows Phone and iPhone, people will be able to write customizable operating system for devices.
Today most firewalls and routers (the hardware ones) contain a customized Linux; that could be replaced with a Singularity kernel and your own business process.
The Singularity kernel is small; it looks like a perfect alternative to embedded Windows/Linux.
I don't think there is any drawback, except that it is a totally new system and it will take time for hardware vendors to supply devices compatible with it, but that will happen in the future.
I am working on a project which talks to SQL Server and most of the back end code is in C++.
This is an application which controls the flow of a few fluids while loading them into carriers. Some of the back-end modules, which talk to controllers that in turn control the flow of fluids, are in C++. Since they have memory leaks and some other bugs, there has been an attempt to migrate them to .NET.
What I understand is that performance comes down when we use .NET for back-end modules. So my opinion was NOT to convert these back-end modules to .NET but to fix the issues in C++ itself.
The code in discussion is an application which interacts with the firmware of controllers. It basically takes some commands and gets responses from the controllers. This code does not have a UI, and the same code interacts with SQL as well to update the data. Both are part of one exe.
.NET is believed to be good when performance is not expected to be rigorous. It would have been suitable if new code had to be written, especially when it involves the design of a UI. Another school of thought is that .NET is good for higher layers but not for lower layers in a multi-tier architecture.
I would like to know opinions of others from different perspective. Some of the aspects to consider are:
speed
maintainability of code
migration related risks in future
etc.
Please comment from the angle of rewriting existing code. It will be a one-to-one C++-to-C# line conversion if we decide to go for it.
Quick Answer:
If you have capable C++ programmers who can use debuggers and understand their application domain and how to use the controllers, it would probably be easier to do a careful review and fix the memory bugs and issues. After all, time and effort has already been spent on the code, and unless it is trivial code, rewriting in C# could introduce new logic errors.
Questions for the OP:
The code in discussion is driver code which interacts with firmware of controllers. It
basically takes some commands and gets response from controllers. This code does not have
UI and the same driver code interacts with SQL as well to update the data
Are you talking about user-mode software that you have named the "driver", or are you talking about a kernel-mode device driver?
It would help if you could provide more information about these controllers running firmware that control fluid flow. Does the C++ back-end connect to the controllers through RS232 (Serial)? Ethernet? USB? TCP/IP? PCI?
If you're connecting to the controller hardware via TCP/IP or RS232 (Serial), C#/.NET is well equipped to handle the task. For anything else like USB, PCI, Ethernet, etc., you're going to need a device driver which has to be programmed in C or C++ depending on the requirements of the driver. Of course you can encapsulate the user-mode part that is in C++ or encapsulate direct calls to Win32, but it will add more development tasks to your project.
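To make the command/response style of such a back end concrete, here is a tiny frame builder and checker. The STX byte, the length field, and the two's-complement checksum are invented for the example - a real controller's framing comes from its firmware documentation:

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical frame layout: [0x02 (STX), command, payload length, payload..., checksum],
// where the checksum makes the byte-sum of the whole frame equal zero mod 256.
std::vector<uint8_t> build_frame(uint8_t cmd, const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> frame{0x02, cmd, static_cast<uint8_t>(payload.size())};
    frame.insert(frame.end(), payload.begin(), payload.end());
    uint8_t sum = std::accumulate(frame.begin(), frame.end(), uint8_t{0});
    frame.push_back(static_cast<uint8_t>(~sum + 1));  // two's-complement checksum
    return frame;
}

// A received frame is valid when all of its bytes sum to zero mod 256.
bool frame_ok(const std::vector<uint8_t>& frame) {
    return !frame.empty() &&
           std::accumulate(frame.begin(), frame.end(), uint8_t{0}) == 0;
}
```

Whether the bytes then travel over a serial port or a TCP socket is a transport detail; the framing logic stays the same on either side of a rewrite.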
Apparently the only problem with existing C++ code is memory leaks.
That seems to me an insufficient reason to rewrite it all in C#.
Instead I'd suggest running memory leak detection software to find the leaks.
Every language is special in its own way; you should find out which language suits the scenario best.
Don't rewrite a whole program in a different language because of a few bugs -- there will just be different ones in the new product, and the QA cycle will have to be restarted. I'd fix the bugs in the C++ program. If the issue is memory management, I'd strongly look into std::auto_ptr or std::tr1::shared_ptr, which will automatically delete memory for you when you're finished with it. If that's not an option, I'm sure that running something through valgrind or even paying for a commercial memory checker would be cheaper than rewriting the whole thing (in both time and money).
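To make the smart-pointer suggestion concrete, here is a minimal sketch. The answer names std::auto_ptr and std::tr1::shared_ptr; in current C++ the same idea is spelled std::unique_ptr and std::shared_ptr, and the Response type below is just a stand-in for whatever the controller code allocates:

```cpp
#include <memory>

struct Response { int code; };

// RAII version of a leak-prone "new ... use ... forget to delete" pattern:
// the unique_ptr's destructor frees the object on every exit path,
// including when an exception is thrown mid-function.
int read_controller_response() {
    auto resp = std::make_unique<Response>(Response{200});
    // ... talk to the controller, fill in resp ...
    return resp->code;  // no explicit delete needed anywhere
}
```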
"language is special in its own way" man I need a hug, for real. Don't change languages because code is not written well...write better code and use resources available to you.
As discussed in similar questions here and here I want to protect my code from reverse engineering.
My situation is as Simucal describes in his (excellent) answer here:
Basically, what it comes down to is
the only chance you have of being
targeted for source theft is if you
have some very specific, hard to
engineer, algorithm related to your
domain that gives you a leg up on your
competition. This is just about the
only time it would be cost-effective
to attempt to reverse engineer a small
portion of your application.
I have exactly this situation. A hard to engineer algorithm which is elegant and valuable for our specific domain.
After spending months fine-tuning and developing this, the end result is very compact (approx. 100 lines of code) and elegant. I want to protect this specific part of the code from reverse engineering, or at least make it reasonably difficult.
The scenario is a rich-client application written in C# and I have to deploy this part of the code - I cannot execute it from a webservice.
I think extracting the code and rewriting it in an unmanaged native binary is not an option due to performance reasons (and cross-boundary issues).
Initially I wanted to do simple obfuscation but given the small size of the code I don't think this will offer much protection.
Ideally I would like to protect my whole application, but there are two main issues that seem to make ordinary obfuscators and 3rd party packers difficult to use:
The application offers a plugin interface and therefore some assemblies (and interfaces/classes) should not be obfuscated and packed
We still want to be able to get a real stack trace when receiving error reports - potentially this could be done by mapping obfuscation to the real code.
Setting these issues aside (although I would appreciate any input on this as well), what is a good way to protect a tiny part of my code from reverse engineering? I am not concerned about anyone altering or hacking the code but want to make it difficult to understand and reverse engineer it.
It cannot be done. If your code can be run, then it can be read and reverse-engineered. All you can do is make it a little harder and, believe me, it will only be a little harder. You may not like the fact but most crackers are far better at cracking than anyone else is at making things hard to crack. The amount of effort to protect your code is usually not worth it, especially if it disadvantages your paying customers. Witness the stunning non-successes of DRM.
My advice is to not worry about it. If your algorithm is truly novel, seek a patent (although that got a little harder with the Bilski decision unless you tie it to a specific hardware implementation). Relying on trade secrets is also useless unless you only distribute your software to those that sign contracts that ensure they will not allow unfettered access. And then, you have to have a way to police this. The minute you put the binaries up on the internet or distributed them without a contract, I believe you'll be deemed to have lost trade secret status.
Relying on licensing is also fraught with danger - you may think that you can insert clauses in your license that prohibit reverse-engineering, but many jurisdictions around the world specifically disallow those provisions. And the Russian mobsters or whoever are responsible for most of the cracking are unlikely to honor said provisions anyway.
Why don't you just concentrate on making your product the best it can be? The goal is to stay ahead of the crowd rather than lock them out altogether. Being the first to deliver and always having the best product in a competitive group will ensure your prosperity far more than wasting a lot of effort on useless protection (IMNSHO).
This is just my opinion. I may be wrong. I've been wrong before, you only need ask my wife :-)
You should obfuscate the complete code, since that makes it harder to reach the small valuable part. The smaller the code gets, the easier it becomes to understand. Most obfuscators should not mess with public interfaces, since there are many obfuscated libraries out there.
However, I think you should rather convince users that there are no special tricks there instead of trying to hide them. To quote Keyser Söze, "the greatest trick the Devil ever pulled was convincing the world he didn't exist".
And of course you can always file a patent for your invention and protect yourself legally.
Obfuscation is almost worthless as a defense; even Microsoft (ScottGu etc.) basically say that people with the right amount of intent and ability will reverse engineer an application, and that in .NET a basic defense is licensing and IP law rather than trying to guard your code through obscurity or some other means of preventing reverse engineering.
That is part of the reasoning of why they released the BCL source instead of keeping it private.
One option is to use the license key and/or hardware fingerprint to decrypt the sensitive code at runtime and emit it as IL; this will make it invisible to static reverse-engineering tools (e.g. Reflector).
Also detect the presence of a debugger and refuse to run in debug mode, except possibly in very limited circumstances (i.e. on your machine).
Note that this will make debugging very difficult for you, and nearly impossible for others (if this is an end-user app that's not a problem, but if it is a library or framework for other developers to build upon, that's a problem).
Note also that making a copy of physical memory to disk and using offline tools on the memory dump will reveal your decrypted algorithm, so it is fairly easy to defeat - but far more trouble than most people will bother with.
The whole thing is a trade-off between difficulty for you, deterrence for the few bad apples, and potential loss due to theft/plagiarism.
good luck, and let us know what you decide!
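The decrypt-with-the-license-key half of this suggestion can be sketched as follows. XOR is a placeholder here, not a real cipher, and the .NET-specific step of emitting the decrypted bytes as IL is omitted; the point is only that the plaintext of the sensitive routine never sits in the binary on disk:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Symmetric transform: applying it twice with the same key round-trips.
// In this scheme, "key" would be derived from the license key and/or
// hardware fingerprint, and "data" would be the encrypted bytes of the
// sensitive method shipped inside the application.
std::vector<uint8_t> xor_with_key(std::vector<uint8_t> data, const std::string& key) {
    for (size_t i = 0; i < data.size(); ++i)
        data[i] ^= static_cast<uint8_t>(key[i % key.size()]);
    return data;
}
```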
If your code is that sensitive, put it where nobody can get to it.
E.G. provide a client or web page for people to access some service that exposes your functionality.
That service can sit behind an external firewall and communicate with a backend server behind an internal firewall, where your sensitive code runs.
For extra measure, obfuscate that code.
This would require compromising several layers of security before getting to your code.
You can obfuscate it at the C# or CIL level, but what really makes protection impossible is that the JIT compiler is designed to create the most efficient machine code it can to actually execute.
So, to reverse engineer your algorithm, get the machine code and run standard disassembly tools on it. Trace the data through the system by following it forward from the standard input API calls to the standard output API calls.
Face it, if someone wants it, they can have it.
You can make it hard to casually figure it out. For example, I wanted to see what was in some database managed by a Java application. It turned out that the Java decompile was really messy, full of odd functions and classes and namespaces all with the same names, intentionally trying to hide what was really going on.
I could have fixed up the decompiler I was using so that it renamed everything as A_namespace instead of just A and then the function flow would have popped right out to the Eclipse call tracing graphs.
Instead I just threw up my hands and got on with real work rather than rewriting decompilers.
So, you can hide it from casually interested folks, sure.
Most obfuscators allow you to specify which methods/classes you want to keep from being obfuscated. SmartAssembly, for instance, lets you mark methods or classes with attributes, while others let you select the methods to exclude from the process in a UI. You ought to be able to get pretty fine-grained control of the process, so you can have your cake and eat it too.
You will however run into problems if you are using reflection.
I've heard good comments about the Spices.Net Obfuscator. It should be able to greatly increase the time necessary to get at the algorithm.
Windows, Firefox, and Google Chrome all monitor usage statistics and analyze the crash reports that are sent to them. I am thinking of implementing the same feature in my application.
Of course it's easy to litter an application with a lot of logging statements, but this is the approach I want to avoid because I don't want my code to have too many cross-cutting concerns in a function. I am thinking about using AOP to do it, but before that I want to know how other people implement this feature first.
Does anyone have any suggestions?
Clarification: I am working on a desktop application, and it doesn't involve any RDBMS.
Joel had a blog article about something like this - his app(s) trap crashes and then contact his server with some set of details. I think he checks for duplicates and throws them out. It is a great system and I was impressed when I read it.
http://www.fogcreek.com/FogBugz/docs/30/UsingFogBUGZtoGetCrashRep.html
We did this at a place I was at that had a public server set up to receive data. I am not a db guy and have no servers I control on the public internets. My personal projects unfortunately do not have this great feature yet.
In "Debugging .NET 2.0 Applications" John Robbins (of Wintellect) writes extensively about how to generate and debug crash reports (actually windbg/SOS minidumps). His Superassert class contains code to generate these. Be warned though - there is a lot of effort required to set this up properly: symbol servers, source servers, as well as a good knowledge of VS2005 and windbg. His book, however, guides you through the process.
Regarding usage statistics, I have often tied this into authorisation, i.e. has a user the right to carry out a particular task. Overly simply put this could be a method like this (ApplicationActions is an enum):
public static bool HasPermission( ApplicationActions action )
{
    // Validate that the user has permission to carry out the action
    // (the actual lookup is omitted in this sketch).
    bool allowed = true;
    // Log the request and the result - this doubles as a usage statistic.
    return allowed;
}
This method could be added to a singleton SecurityService class. As I said, this is overly simple, but it should indicate the sort of service I have in mind.
I would take a quick look at the Logging Application Block that is part of the Enterprise Library. It provides a large number of the things you require and is well maintained. Check out some of the scenarios and samples available; I think you will find them to your liking.
http://msdn.microsoft.com/en-us/library/cc309506.aspx