Do class, method and variable names get included in the MSIL after compiling a Windows App project into an EXE?
For obfuscation - fewer names, harder to reverse engineer.
And for performance - shorter names, faster access.
e.g. So if methods ARE called via name:
Keep names short, better performance for named-lookup.
Keep names cryptic, harder to decompile.
Yes, they're in the IL - fire up Reflector and you'll see them. If they didn't end up in the IL, you couldn't build against them as libraries. (And yes, you can reference .exe files as if they were class libraries.)
However, this is all resolved once, at JIT time.
Keep names readable so that you'll be able to maintain the code in the future. The performance issue is unlikely to make any measurable difference, and if you want to obfuscate your code, don't do it at the source code level (where you're the one to read the code) - do it with a purpose-built obfuscator.
EDIT: As for what's included - why not just launch Reflector or ildasm and find out? From memory, you lose local variable names (which are in the pdb file if you build it) but that's about it. Private method names and private variable names are still there.
Yes, they do. I do not think that there will be notable performance gain by using shorter names. There is no way that gain overcomes the loss of readability.
Local variable names are not included in MSIL. The names of fields, methods, classes etc. are.
Local variables are index-based.
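If you want to check this without opening Reflector, reflection shows roughly the same picture. The following is only a minimal sketch: the Account class and all of its members are made up for illustration, and the number of locals you see can differ between debug and release builds because the compiler may add or remove slots.

```csharp
using System;
using System.Reflection;

// Hypothetical class, used only to have something to inspect.
class Account
{
    private decimal accountBalance;            // private field: its name stays in the metadata

    private decimal AddInterest(decimal rate)  // private method and parameter: names stay too
    {
        decimal interestAmount = accountBalance * rate; // local: only a slot index and a type survive
        accountBalance += interestAmount;
        return accountBalance;
    }
}

static class Program
{
    static void Main()
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        Type type = typeof(Account);

        foreach (FieldInfo field in type.GetFields(flags))
            Console.WriteLine("field:  " + field.Name);

        MethodInfo method = type.GetMethod("AddInterest", flags);
        Console.WriteLine("method: " + method.Name);

        foreach (ParameterInfo parameter in method.GetParameters())
            Console.WriteLine("param:  " + parameter.Name);

        // LocalVariableInfo exposes LocalIndex and LocalType, but has no Name property.
        foreach (LocalVariableInfo local in method.GetMethodBody().LocalVariables)
            Console.WriteLine("local:  index " + local.LocalIndex + ", type " + local.LocalType);
    }
}
```

The field, method and parameter names are printed as written in the source, while each local is reported only by index and type.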
Member names do get included in the IL whether they are private or public. In fact all of your code gets included too, and if you'd use Reflector, you can practically read all the source code of the application. What's left is debugging the app, and I think there might be tools for that.
You must ABSOLUTELY (and I can't emphasize it more) obfuscate your code if you're making packaged applications that have a number of clients and competition. Luckily there are a number of obfuscators available.
This is a major gripe that I have with .NET. Since MS is doing so much hard work on this, why not develop (or acquire) a professional obfuscator and make it a part of VS? Dotfuscator just doesn't cut it, at least not the Community Edition.
Keep names short, better performance for named-lookup.
How could this make any difference? I'm not sure how identifiers are looked up by the VM, but I'm pretty sure it's not doing a straight string comparison lookup. This would be the worst possible way to do it.
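As a rough illustration of why name length is unlikely to matter at call time: in IL, a call instruction refers to its target through a fixed-size metadata token, not through the name string. The sketch below uses a made-up Names class with one short and one very long method name and prints the token reflection reports for each.

```csharp
using System;
using System.Reflection;

// Hypothetical class with one short and one long method name.
static class Names
{
    public static int A() { return 1; }
    public static int AMethodWithAVeryLongAndDescriptiveNameIndeed() { return 1; }
}

static class Program
{
    static void Main()
    {
        const BindingFlags flags =
            BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly;

        // MetadataToken is a 32-bit integer regardless of how long the name is.
        foreach (MethodInfo method in typeof(Names).GetMethods(flags))
            Console.WriteLine("0x" + method.MetadataToken.ToString("X8") + "  " + method.Name);
    }
}
```

Both methods get a token of the same size. The name itself is stored once in the metadata tables, so a longer name mainly costs a few extra bytes in the assembly rather than time on every call.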
Keep names cryptic, harder to decompile.
To be honest, I don't think code obfuscation helps that much. Most competent developers out there have already developed a "sixth sense" to figure out things quickly even if identifiers like method names are totally unhelpful since very often the source code they need to maintain or improve already has these problems (I am talking about method names like "DoAllStuff()").
Anyway, security through obscurity is usually a bad idea.
If you are concerned about obfuscation check out .NET Reactor. I tested 8 different obfuscators and Reactor was not only the cheapest commercial one, it was the second best of the bunch (the best was the most expensive one, Dotfuscator Gold).
[EDIT]
Actually now that I think of it, if all you care about is obfuscating method names then the one that comes with VS.NET, Dotfuscator Community Edition, should work fine.
I think they're added, but the length of the name isn't going to affect anything, because of the way the function names are looked up. As for obfuscation, I think there are tools (Dotfuscator or something like that) that basically do exactly what you're saying.
I need to provide a copy of the source code to a third party, but given it's a nifty extensible framework that could be easily repurposed, I'd rather provide a less OO version (a 'procedural' version for want of a better term) that would allow minor tweaks to values etc but not reimplementation using the full flexibility of how it is currently structured.
The code makes use of the usual stuff: classes, constructors, etc. Is there a tool or method for 'simplifying' this into what is still the 'source' but using only plain variables etc.
For example, if I had a class instance 'myclass' which initialised this.blah in the constructor, the same could be done with a variable called myclass_blah which would then be manipulated in a more 'flat' way. I realise some things like polymorphism would probably not be possible in such a situation. Perhaps an obfuscator, set to a 'super mild' setting would achieve it?
Thanks
My experience with nifty extensible frameworks has been that most shops have their own nifty extensible frameworks (usually more than one) and are not likely to steal them from vendor-provided source code. If you are under obligation to provide source code (due to some business relationship), then, at least in my mind, there's an ethical obligation to provide the actual source code, in a maintainable form. How you protect the source code is a legal matter and I can't offer legal advice, but really you should be including some license with your release and dealing with clients who are not going to outright steal your IP (assuming it's actually yours under the terms you're developing it.)
As had already been said, if this is a requirement based on restrictions of contracts then don't do it. In short, providing a version of the source that differs from what they're actually running becomes a liability and I doubt that it is one that your company should be willing to take. Proving that the code provided matches the code they are running is simple. This is also true if you're trying to avoid license restrictions of libraries your application uses (e.g. GPL).
If that isn't the case then why not provide a limited version of your extensibility framework that only works with internal types and statically compile any required extensions in your application? This will allow the application to continue to function as what they currently run while remaining maintainable without giving up your sacred framework. I've never done it myself but this sounds like something ILMerge could help with.
If you don't want to give out the framework - just don't. Provide only the source you think is required. Otherwise you'll most likely need to either support both versions in the future OR never work/interact with these people (and the people they know) again.
Don't forget that non-obfuscated .NET assemblies contain IL in an easily decompilable form. It is often easier to use ILSpy/Reflector to read someone else's code than to look at the sources.
If the reason to provide code is some sort of inspection (even simply looking at the code), you'd better have semi-decent code. I would seriously consider throwing away a tool if its code looked like FORTRAN written in C# ( http://www.nikhef.nl/~templon/fortran/fortran_style ).
Side note: I believe "nifty extensible frameworks" are one of the roots of "not invented here" syndrome - I'd be more worried about comments on the framework (like "this code is ##### because it does not use YYY pattern and spacing is wrong") than reuse.
I've been using Reflector to decompile a couple of simple C# apps, but I notice that although the code is being decompiled, I still can't see things as they were written in VS. I think this is just the way it is, as the compiler replaces human instructions with machine code. However, I thought I would give it a try and ask here. Maybe there is a decompiler that can decompile and show the code almost identically to the original.
That is impossible, since there are lots of ways to get the same IL from different code. For example, there is no way to know if an extension method was called fluent-style vs explicit on the declaring type. There is no way to know if LINQ vs regular code was used. All manner of implicit operations may or may not be there. Removed code may or may not have been there. Many primitives (including enums) up-to-and-including 4 bytes are indistinguishable once they are IL.
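To make the extension-method and LINQ points concrete, here is a small sketch (the array and variable names are made up). All three statements end up as a call to the static method Enumerable.Where, so a decompiler has to pick one way to display them and cannot know which form the author typed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Program
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };

        // Fluent style: extension method called on the instance.
        IEnumerable<int> fluent = numbers.Where(n => n % 2 == 0);

        // Explicit style: the same static method called on its declaring type.
        IEnumerable<int> explicitCall = Enumerable.Where(numbers, n => n % 2 == 0);

        // Query syntax: the compiler translates this into a Where call as well.
        IEnumerable<int> query = from n in numbers where n % 2 == 0 select n;

        Console.WriteLine(string.Join(",", fluent));       // 2,4,6
        Console.WriteLine(string.Join(",", explicitCall)); // 2,4,6
        Console.WriteLine(string.Join(",", query));        // 2,4,6
    }
}
```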
If you want the actual code, legally obtain the original code.
Existing .Net decompilers generally decompile to the best of their ability.
You appear to be asking for variable names and line formatting, which for obvious reasons are not compiled to IL.
There are several. I currently use JustDecompile found here http://www.telerik.com/products/decompiler.aspx?utm_source=twitter&utm_medium=sm&utm_campaign=ad
[Edit]
An alternative is .NET Reflector found here: http://www.reflector.net/
I believe there is a free version of it, but didn't take time to look.
Basically, no. There are often many ways to arrive at the same IL code, and there's no way at all for a decompiler to know which was used.
No, nor should there ever be. Things like comments and unreachable code would just add bloat with absolutely zero benefit. The very best you can ever do is approximate the compiled code.
I'm evaluating several obfuscators for protecting code in a WPF application.
For checking results of job done by each obfuscator on a given assembly I use Red Gate's .Net Reflector. Just after each obfuscation I open the assembly with .NET Reflector and see what it looks like.
Is it enough? Can .NET Reflector's results be treated as an indicator of quality of obfuscation, or should I try some additional tools? (not any possible instrument of such a kind, but from a point of view of practical common sense).
The results from Reflector should be enough as an indication of how any casual attempt at decompiling would fare. Some obfuscators will obfuscate code to the extent that the assembly will not even open in Reflector.
Anyone who digs deeper than that will not be easily deterred, however advanced the obfuscation.
It would be best if Reflector and ILSpy outright refused to decompile the resulting assembly. I know there are obfuscators capable of that.
My opinion is that whether it "is enough" depends on your target app. Obfuscation is never about 100% secure code; it is about making disassembly difficult enough for a potential attacker, and it all depends on how much effort that attacker is willing to put into disassembling your app. Also, .NET Reflector is a viewer, as you mentioned, so you can judge whether the result is secure by looking at, for example:
if strings are encrypted
if parameters are encrypted
if class names and fields like (PWD_USER) are encrypted
...
Regards.
Is obfuscation only about garbling the names of non-public variables/members? If so, would it not be possible to write an application that would at least change these names to more readable ones like "variable1", etc., and then extract the whole code so that it can still be compiled?
No, it is about a lot more, especially with more sophisticated obfuscators. They can produce IL that cannot be expressed in most languages, and where the logic flow is horribly tangled to befuddle the best of tools. With lots of time you can do it (probably lots by hand), and there is certainly an arms race between the obfuscators and deobfuscators - but you vastly underestimate the technology here.
Also, note that many obfuscators look at an entire application (not just one assembly), so they can change the public API too.
That is certainly the start of an obfuscator. Though some obfuscators will also encrypt strings and other such tricks to make it very difficult to reverse engineer the assembly.
Of course, since the runtime needs to run the assembly after all of this, it is possible for a determined hacker to reverse engineer it :)
There are 'deobfuscator' tools that undo several obfuscation techniques: they decrypt strings, remove proxy methods, devirtualize virtualized code, remove anti-debug code, remove junk classes, restore the types of method parameters and fields, and more.
One very powerful tool is de4dot.
But there are more.
Obfuscation is about changing meaningful names like accountBalance to meaningless ones like a1.
The application will obviously still work, but it will be more difficult to understand the algorithms inside it.
It depends upon the obfuscation technology used. Obfuscating variable names is only one part of the issue. A lot of obfuscation tools perform some kind of program flow obfuscation at the same time, which further complicates code comprehension. In the end, the obfuscated IL won't be easily expressible (if at all) in most programming languages.
Renaming the variables and fields won't help you much either, as having a lot of variable1, variable2.. won't help you to understand what you read.
I'm just starting out with C# and to me it seems like Microsoft called their new system .NET because you have to use the Internet to look everything up to find useful functions and which class they stashed them in.
To me it seems nonsensical to require procedure/functions written and designed to stand alone ( non instantiated static objects) to have their class not also function as their namespace.
That is Why can't I use Write or WriteLine instead of Console.WriteLine ?
Then when I start to get used to the idea that the objects I am using ( like string) know how to perform operations I am used to using external functions to achieve ( like to upper, tolower, substring, etc) they change the rules with numbers, numbers don't know how to convert themselves from one numeric type to another for some reason, instead you have to invoke Convert class static functions to change a double to an int and Math class static functions to achieve rounding and truncating.. which quickly turns your simple( in other languages) statement to a gazillion character line in C#.
It also seems obsessed with strong typing which interferes somewhat with the thought process when I code. I understand that type safety reduces errors , but I think it also increases complexity, sometimes unnecessarily. It would be nice if you could choose context driven types when you wish without the explicit Casting or Converting or ToStringing that seems to be basic necessity in C# to get anything done.
So... Is it even possible to write meaningful code in notepad and use cl without Internet access? What reference book would you use without recourse to autocomplete and network access?
Any suggestions on smoothing the process towards grokking this language and using it more naturally?
I think you're suffering a bit from the fact that you've been used to working one way for some years, and now must take time to get comfortable developing on a new platform.
I do not agree with you that MS has been inconsistent here, with strings knowing how to convert themselves to another type while other data types (like ints) do not.
That is not the case, since strings do not know how to convert themselves to another type either. (You can use the Convert class to convert between types.)
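For what it's worth, here is a small sketch of where the common conversions live (the variable names are made up). Numeric conversions are a cast, a Convert call or a Math call, and text is turned into a number by the target type or by Convert, not by the string.

```csharp
using System;
using System.Globalization;

static class Program
{
    static void Main()
    {
        double value = 3.7;

        int truncated = (int)value;             // explicit cast drops the fraction: 3
        int rounded = Convert.ToInt32(value);   // Convert rounds to the nearest integer: 4
        double whole = Math.Truncate(value);    // result is still a double: 3.0

        // Parsing is done by the target type (or by Convert), not by the string itself.
        int parsed = int.Parse("42", CultureInfo.InvariantCulture);
        int viaConvert = Convert.ToInt32("42", CultureInfo.InvariantCulture);

        // Going the other way, every type has ToString().
        string text = value.ToString(CultureInfo.InvariantCulture); // "3.7"

        Console.WriteLine(truncated + " " + rounded + " " + whole + " "
            + parsed + " " + viaConvert + " " + text);
    }
}
```

Note that the cast and Convert.ToInt32 do not agree on 3.7: the cast truncates and Convert rounds. That difference is one reason the language makes you say which one you mean.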
It is however true that every type in .NET has a ToString() method, but, you should not rely on that method to convert whatever you have to a string.
I think you have never worked in an OO language before, and therefore you're having some difficulties with the paradigm shift.
Think of it this way: it's all about responsibilities and behaviour. A class is (if it is well designed) responsible for doing one thing, and does this one thing well.
There is no excuse to use notepad to code a modern language. SharpDevelop or Visual C# Express provide the functionality to work with C# in a productive way.
And no, due to the complexity, not using the internet as a source of information is also not a good option.
You could buy a book that introduces you to the concepts of the language in a structured way, but to get up-to-date information, the internet is necessary.
Yes, there are drawbacks in C#, like in any other language. I can only give you the advice to get used to the language. Many of the drawbacks become understandable after that, even if some of them don't become less annoying. I recommend that you ask clear, direct questions with example code if you want to know how some language constructs work or how you can solve specific problems more efficiently. That makes it easier to answer those questions.
For notepad, I have no useful advice; however, I would advise you to use one of the free IDEs, Microsoft's Express Editions or SharpDevelop.
The IDE will speed up the grokking of the language, at which point you can switch back to notepad.
Reading your post I was thinking that you worked mostly with C or dynamic languages previously. Maybe C# is just a wrong choice for you, there are IronPython, F# and a bunch of other languages that have necessary functionality (like functions outside of classes etc.)
I disagree with you about consistency. There are in fact small inconsistencies between some components of .NET, but most of the framework is very consistent and predictable.
Strong typing is a huge factor in a low defect count. Dynamic typing works nicely in small to intermediate projects (like scripts, etc.). In a more complex program, dynamism can introduce a lot of complexity.
Regarding internet/autocomplete - I can hardly imagine any technology the size of .NET that doesn't require a lot of knowledge sources.
Programming in C# using notepad is like buying a Ferrari to drive on dirt roads.
At least use Visual Studio Express Edition. From what you wrote, I understand that you come from a non-OO background; try to learn the OO concepts and try to use them. You will eventually understand most of the design decisions made for .NET.
http://en.wikipedia.org/wiki/Object-oriented_programming
Oh boy, where do I start with you (this will be a long post hahaha). Well, let's go little by little:
"Microsoft called their system .NET because you have to use the Internet...": the reason it is called .NET is that the SUITE OF MICROSOFT LANGUAGES (and now some other ones too, like Python and Ruby) CAN CALL ANY LIBRARY or DLL. For example, you can "NET" (network, OR CALL) a DLL that was built in Visual Basic, F# or C++ from WITHIN C#, and from any of those languages you can also call (or ".NET") C# libraries. OK, ONE DOWN!!!
NEXT ONE: "it seems nonsensical to require....to have their class not also function as their namespace": this is because a namespace can have AS MANY CLASSES AS YOU WISH. As for your question:
"That is Why can't I use Write or WriteLine instead of Console.WriteLine ?".
The reason is that "Console" (System.Console, hence the "using System;" directive at the beginning of your program) is the class where "Write" and "WriteLine" LIVE!! (You can also FULLY qualify the call as System.Console.WriteLine.) All this suggests to me that you need to study C# syntax. OK, NEXT:
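As a side note on the Write/WriteLine question: Console is a static class in the System namespace, and from C# 6 onwards a "using static" directive lets you call the static members of a class without the class name. A minimal sketch, assuming a C# 6 or later compiler:

```csharp
using static System.Console;
using static System.Math;

static class Program
{
    static void Main()
    {
        WriteLine("Hello");      // resolves to System.Console.WriteLine
        WriteLine(Round(2.6));   // resolves to System.Math.Round, prints 3
    }
}
```

Older compilers do not accept the directive, so there the class name (or the fully qualified name) is still required.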
"when I start to get used to the idea that the objects...", ok in simple words:
C# is a "strongly type-safe language", so that SHOULD-MUST tell you what "you are getting into"; otherwise STAY WITH "WEAK or NON-TYPE-SAFE LANGUAGES" LIKE PHP or C, etc. This does NOT mean those are bad, it just MEANS IT IS YOUR JOB TO MAKE SURE. As I tell my students: "IF YOU NEED AN INT THEN DEFINE AN INT INSTEAD OF LETTING THE COMPILER DO IT FOR YOU, OTHERWISE YOU WILL HAVE A LOT OF BAD BUGS", or in other words, do YOUR homework BEFORE DESIGNING A PIECE OF SOFTWARE.
Note: C# is an IMPLICITLY TYPE-SAFE language, SO IF YOU WANT YOU CAN RUN IT AS UNSAFE. From then on it will be your job to make sure, so don't complain later (for being lazy) when bugs arrive AT RUNTIME (and a lot of times when the customer is already using your crappy software).
...and last but not least: why do you want to shoot yourself in the foot by using notepad? Studio Express is FREE, even the database, SQL SERVER, is FREE TOO!! Unless you work for a company, in which case I WOULD ASK FOR PRO, all the "extra" stuff is for large companies, teams, etc. YOU CAN DO 99% OF THE STUFF WITH THE FREE VERSIONS, and you can still buy/upgrade to the full version once you want to scale up to distributed software or a large project, or if your software becomes a big hit. Example: if you need millions of queries or hits PER SECOND from your database, or 100 people are working on the same project (code). But most of the time, for 2 or 3 "normal" developers working at home or in a small office, the FREE ONES ARE ENOUGH!!
Cheers!!! (PS: Software Developer since the 80's)