Related
Let's say I have a WinForm App...written in C#.
Is it possible?
After all, put my eye on Iron Python.
C# is not interpreted, so unlike javascript or other interpreted languages you can't do that natively. You can go four basic routes, listed here in order of least to most complex...
1) Provide a fixed set of operations that the user can apply. Parse the user's input, or provide checkboxes or other UI elements to indicate that a given operation should be applied.
2) Provide a plugin-based or otherwise dynamically defined set of operations. Like #1, this has the advantage of not needing special permissions like full trust. MEF might come in handy for this approach: http://mef.codeplex.com/
3) Use a dynamic c# compilation framework like paxScript: http://eco148-88394.innterhost.net/paxscriptnet/. This would, in theory, allow you to compile small c# snippets on demand.
4) Use IL Emit statements to parse code and generate your operations on the fly. This is by far the most complex solution, likely requires full trust, and is extremely error prone. I don't recommend it unless you have some very obscure requirements and sophisticated users.
The CSharpCodeProvider class will do what you want. For a (VERY outdated, but still working with a few tweaks) example of its use, check out CSI.
If you are willing to consider targeting the Mono runtime, the type Mono.CSharp.Evaluator provides an API for evaluating C# expressions and statements at runtime.
Is obfuscation only about garbling the names of non-public variables/members? If so, would it not be possible to write an application that would at least change these names more readible ones like "variable1", etc, and then extract the whole code that can still be compiled?
No, it is about a lot more, especially with more sophisticated obfuscators. They can produce IL that cannot be expressed in most languages, and where the logic flow is horribly tangled to befuddle the best of tools. With lots of time you can do it (probably lots by hand), and there is certainly an arms race between the obfuscators and deobfuscators - but you vastly underestimate the technology here.
Also, note that many obfuscators look at an entire application (not just one assembly), so they can change the public API too.
That is certainly the start of an obfuscator. Though some obfuscators will also encrypt strings and other such tricks to make it very difficult to reverse engineer the assembly.
Of course, since the runtime needs to run the assembly after all of this, it is possible for a determined hacker to reverse engineer it :)
There are 'deobfuscator' tools to undo several obfuscation techniques like Decrypt strings, Remove proxy methods, Devirtualize virtualized code, Remove anti-debug code, Remove junk classes, Restore the types of method parameters and fields and more...
One very powerful tool is de4dot.
But there are more.
Obfuscation is about changing meaningful names like accountBalance to meaningless ones like a1.
The application will obviously still work, but it will be more difficult to understand the algorithms inside it.
It's depend upon the obfuscation technology used. Obsfucating variable name is only one part of the issue. A lot of obfuscation tools perform some kind of program flow obfuscation at the same time, which will complicate further code comprehension. At the end, the obfuscated IL won't be expressible easily (if at all) in most programming languages.
Renaming the variables and fields won't help you much either, as having a lot of variable1, variable2.. won't help you to understand what you read.
I have heard people state that Code Generators and T4 templates should not be used. The logic behind that is that if you are generating code with a generator then there is a better more efficient way to build the code through generics and templating.
While I slightly agree with this statement above, I have not really found effective ways to build templates that can say for instance instantiate themselves. In otherwords I can never do :
return new T();
Additionally, if I want to generate code based on database values I have found that using Microsoft.SqlServer.Management.SMO in conjunction with T4 templates have been wonderful at generating mass amounts of code without having to copy / paste or use resharper.
Many of the problems I have found with Generics too is that to my shock there are a lot of developers who do not understand them. When I do examine generics for a solution, there are times where it gets complicated because C# states that you cannot do something that may seem logical in my mind.
What are your thoughts? Do you prefer to build a generator, or do you prefer to use generics? Also, how far can generics go? I know a decent amount about generics, but there are traps and pitfalls that I always run into that cause me to resort to a T4 template.
What is the more proper way to handle scenarios where you need a large amount of flexibility? Oh and as a bonus to this question, what are good resources on C# and Generics?
You can do new T(); if you do this
public class Meh<T>
where T : new()
{
public static T CreateOne()
{
return new T();
}
}
As for code-generators. I use one every day without any problems. I'm using one right now in fact :-)
Generics solve one problem, code-generators solve another. For example, creating a business model using a UML editor and then generating your classes with persistence code as I do all of the time using this tool couldn't be achieved with generics, because each persistent class is completely different.
As for a good source on generics. The best has got to be Jon Skeet's book of course! :-)
As the originator of T4, I've had to defend this question quite a few times as you can imagine :-)
My belief is that at its best code generation is a step on the way to producing equivalent value using reusable libraries.
As many others have said, the key concept to maintain DRY is never, ever changing generated code manually, but rather preserving your ability to regenerate when the source metadata changes or you find a bug in the code generator. At that point the generated code has many of the characteristics of object code and you don't run into copy/paste type problems.
In general, it's much less effort to produce a parameterized code generator (especially with template-based systems) than it is to correctly engineer a high quality base library that gets the usage cost down to the same level, so it's a quick way to get value from consistency and remove repetition errors.
However, I still believe that the finished system would most often be improved by having less total code. If nothing else, its memory footprint would almost always be significantly smaller (although folks tend to think of generics as cost free in this regard, which they most certainly are not).
If you've realised some value using a code generator, then this often buys you some time or money or goodwill to invest in harvesting a library from the generated codebase. You can then incrementally reengineer the code generator to target the new library and hopefully generate much less code. Rinse and repeat.
One interesting counterpoint that has been made to me and that comes up in this thread is that rich, complex, parametric libraries are not the easiest thing in terms of learning curve, especially for those not deeply immersed in the platform. Sticking with code generation onto simpler basic frameworks can produce verbose code, but it can often be quite simple and easy to read.
Of course, where you have a lot of variance and extremely rich parameterization in your generator, you might just be trading off complexity an your product for complexity in your templates. This is an easy path to slide into and can make maintenance just as much of a headache - watch out for that.
Generating code isn't evil and it doesn't smell! The key is to generate the right code at the right time. I think T4 is great--I only use it occasionally, but when I do it is very helpful. To say, unconditionally, that generating code is bad is unconditionally crazy!
It seems to me code generators are fine as long as the code generation is part of your normal build process, rather than something you run once and then keep its output. I add this caveat because if just use the code generator once and discard the data that created it, you're just automatically creating a massive DRY violation and maintenance headache; whereas generating the code every time effectively means that whatever you are using to do the generating is the real source code, and the generated files are just intermediate compile stages that you should mostly ignore.
Lex and yacc are classic examples of tools of allow you to specify functionality in an efficient manner and generate efficient code from it. Trying to do their jobs by hand will lengthen your development time and probably produce less efficient and less readable code. And while you could certainly incorporate something like lex and yacc directly into your code and do their jobs at run time instead of at compile time, that would certainly add considerable complexity to your code and slow it down. If you actually need to change your specification at run time it might be worth it, but in most normal cases using lex/yacc to generate code for you at compile time is a big win.
A good percentage of what is in Visual Studio 2010 would not be possible without code generation. Entity Framework would not be possible. The simple act of dragging and dropping a control onto a form would not be possible, nor would Linq. To say that code generation should not be used is strange as so many use it without even thinking about it.
Maybe it is a bit harsh, but for me code generation smells.
That code generation is used means that there are numerous underlying common principles which may be expressed in a "Don't repeat yourself" fashion. It may take a bit longer, but it is satisfying when you end up with classes that only contain the bits that really change, based on an infrastructure that contains the mechanics.
As to Generics...no I don't have too many issues with it. The only thing that currently doesn't work is saying that
List<Animal> a = new List<Animal>();
List<object> o = a;
But even that will be possible in the next version of C#.
Code generation is for me a workaround for many problems found in language, frameworks, etc. They are not evil by themselves, I would say it is very very bad (i.e. evil) to release a language (C#) and framework which forces you to copy&paste (swap on properties, events triggering, lack of macros) or use magical numbers (wpf binding).
So, I cry, but I use them, because I have to.
I've used T4 for code generation and also Generics. Both are good, have their pros and cons, and are suited for different purposes.
In my case, I use T4 to generate Entities, DAL and BLL based on a database schema. However, DAL and BLL reference a mini-ORM I built, based on Generics and Reflection. So I think you can use them side by side, as long as you keep in control and keep it small and simple.
T4 generates static code, while Generics is dynamic. If you use Generics, you use Reflection which is said to be less performant than "hard-coded" solution. Of course you can cache reflection results.
Regarding "return new T();", I use Dynamic Methods like this:
public class ObjectCreateMethod
{
delegate object MethodInvoker();
MethodInvoker methodHandler = null;
public ObjectCreateMethod(Type type)
{
CreateMethod(type.GetConstructor(Type.EmptyTypes));
}
public ObjectCreateMethod(ConstructorInfo target)
{
CreateMethod(target);
}
void CreateMethod(ConstructorInfo target)
{
DynamicMethod dynamic = new DynamicMethod(string.Empty,
typeof(object),
new Type[0],
target.DeclaringType);
ILGenerator il = dynamic.GetILGenerator();
il.DeclareLocal(target.DeclaringType);
il.Emit(OpCodes.Newobj, target);
il.Emit(OpCodes.Stloc_0);
il.Emit(OpCodes.Ldloc_0);
il.Emit(OpCodes.Ret);
methodHandler = (MethodInvoker)dynamic.CreateDelegate(typeof(MethodInvoker));
}
public object CreateInstance()
{
return methodHandler();
}
}
Then, I call it like this:
ObjectCreateMethod _MetodoDinamico = new ObjectCreateMethod(info.PropertyType);
object _nuevaEntidad = _MetodoDinamico.CreateInstance();
More code means more complexity. More complexity means more places for bugs to hide, which means longer fix cycles, which in turn means higher costs throughout the project.
Whenever possible, I prefer to minimize the amount of code to provide equivalent functionality; ideally using dynamic (programmatic) approaches rather than code generation. Reflection, attributes, aspects and generics provide lots of options for a DRY strategy, leaving generation as a last resort.
Generics and code generation are two different things. In some cases you could use generics instead of code generation and for those I believe you should. For the other cases code generation is a powerful tool.
For all the cases where you simply need to generate code based on some data input, code generation is the way to go. The most obvious, but by no means the only example is the forms editor in Visual Studio. Here the input is the designer data and the output is the code. In this case generics is really no help at all, but it is very nice that VS simply generates the code based on the GUI layout.
Code generators could be considered a code smell that indicate a flaw or lack of functionality in the target langauge.
For example, while it has been said here that "Objects that persist can not be generalized", it would be better to think of it as "Objects in C# that automatically persist their data can not be generalized in C#", because I surely can in Python through the use of various methods.
The Python approach could, however, be emulated in static languages through the use of operator[ ](method_name as string), which either returns a functor or a string, depending on requirements. Unfortunately that solution is not always applicable, and returning a functor can be inconvenient.
The point I am making is that code generators indicate a flaw in a chosen language that are addressed by providing a more convenient specialised syntax for the specific problem at hand.
The copy/paste type of generated code (like ORMs make) can also be very useful...
You can create your database, and then having the ORM generate a copy of that database definition expressed in your favorite language.
The advantage comes when you change your original definition (the database), press compile and the ORM (if you have a good one) can re-generates your copy of the definition. Now all references to your database can be checked by the compilers type checker and your code will fail to compile when you're using tables or columns that do not exist anymore.
Think about this: If I call a method a few times in my code, am I not referring to the name I gave to this method originally? I keep repeating that name over and over... Language designers recognized this problem and came up with "Type-safety" as the solution. Not removing the copies (as DRY suggests we should do), but checking them for correctness instead.
The ORM generated code brings the same solution when referring to table and column names. Not removing the copies/references, but bringing the database definition into your (type-safe) language where you can refer to classes and properties instead. Together with the compilers type checking, this solves a similar problem in a similar way: Guarantee compile-time errors instead of runtime ones when you refer to outdated or misspelled tables (classes) or columns (properties).
quote:
I have not really found effective ways to build templates that can say for instance instantiate themselves. In otherwords I can never do :
return new T();
public abstract class MehBase<TSelf, TParam1, TParam2>
where TSelf : MehBase<TSelf, TParam1, TParam2>, new()
{
public static TSelf CreateOne()
{
return new TSelf();
}
}
public class Meh<TParam1, TParam2> : MehBase<Meh<TParam1, TParam2>, TParam1, TParam2>
{
public void Proof()
{
Meh<TParam1, TParam2> instanceOfSelf1 = Meh<TParam1, TParam2>.CreateOne();
Meh<int, string> instanceOfSelf2 = Meh<int, string>.CreateOne();
}
}
Why does being able to copy/paste really, really fast, make it any more acceptable?
That's the only justification for code generation that I can see.
Even if the generator provides all the flexibility you need, you still have to learn how to use that flexibility - which is yet another layer of learning and testing required.
And even if it runs in zero time, it still bloats the code.
I rolled my own data access class. It knows everything about connections, transactions, stored procedure parms, etc, etc, and I only had to write all the ADO.NET stuff once.
It's now been so long since I had to write (or even look at) anything with a connection object in it, that I'd be hard pressed to remember the syntax offhand.
Code generation, like generics, templates, and other such shortcuts, is a powerful tool. And as with most powerful tools, it amplifies the capaility of its user for good and for evil - they can't be separated.
So if you understand your code generator thoroughly, anticipate everything it will produce, and why, and intend it to do so for valid reasons, then have at it. But don't use it (or any of the other technique) to get you past a place where you're not to sure where you're headed, or how to get there.
Some people think that, if you get your current problem solved and some behavior implemented, you're golden. It's not always obvious how much cruft and opaqueness you leave in your trail for the next developer (which might be yourself.)
Do class, method and variable names get included in the MSIL after compiling a Windows App project into an EXE?
For obfuscation - less names, harder to reverse engineer.
And for performance - shorter names, faster access.
e.g. So if methods ARE called via name:
Keep names short, better performance for named-lookup.
Keep names cryptic, harder to decompile.
Yes, they're in the IL - fire up Reflector and you'll see them. If they didn't end up in the IL, you couldn't build against them as libraries. (And yes, you can reference .exe files as if they were class libraries.)
However, this is all resolved once in JIT.
Keep names readable so that you'll be able to maintain the code in the future. The performance issue is unlikely to make any measurable difference, and if you want to obfuscate your code, don't do it at the source code level (where you're the one to read the code) - do it with a purpose-built obfuscator.
EDIT: As for what's included - why not just launch Reflector or ildasm and find out? From memory, you lose local variable names (which are in the pdb file if you build it) but that's about it. Private method names and private variable names are still there.
Yes, they do. I do not think that there will be notable performance gain by using shorter names. There is no way that gain overcomes the loss of readability.
Local variables are not included in MSIL. Fields, methods, classes etc are.
Variables are index based.
Member names do get included in the IL whether they are private or public. In fact all of your code gets included too, and if you'd use Reflector, you can practically read all the source code of the application. What's left is debugging the app, and I think there might be tools for that.
You must ABSOLUTELY (and I can't emphasize it more) obfuscate your code if you're making packaged applications that have a number of clients and competition. Luckily there are a number of obfuscators available.
This is a major gripe that I have with .Net. Since MS is doing so much hard work on this, why not develop (or acquire) a professional obfuscator and make that a part of VS. Dotfuscator just doesn't cut it, not the version they've for community.
Keep names short, better
performance for named-lookup.
How could this make any difference? I'm not sure how identifiers are looked up by the VM, but I'm pretty sure it's not doing a straight string comparison lookup. This would be the worst possible way to do it.
Keep names cryptic, harder to decompile.
To be honest, I don't think code obfuscation helps that much. Most competent developers out there have already developed a "sixth sense" to figure out things quickly even if identifiers like method names are totally unhelpful since very often the source code they need to maintain or improve already has these problems (I am talking about method names like "DoAllStuff()").
Anyway, security through obscurity is usually a bad idea.
If you are concerned about obfuscation check out .NET Reactor. I tested 8 different obfuscators and Reactor was not only the cheapest commercial one, it was the second best of the bunch (the best was the most expensive one, Dotfuscator Gold).
[EDIT]
Actually now that I think of it, if all you care about is obfuscating method names then the one that comes with VS.NET, Dotfuscator Community Edition, should work fine.
I think they're added, but the length of the name isn't going to affect anything, because of the way the function names are looked up. As for obfuscation, I think there are tools (Dotfuscator or something like that) that basically do exactly what you're saying.
I'm just starting out with C# and to me it seems like Microsoft Called their new system .Net because you have to use the Internet to look everything up to find useful functions and which class they stashed it in.
To me it seems nonsensical to require procedure/functions written and designed to stand alone ( non instantiated static objects) to have their class not also function as their namespace.
That is Why can't I use Write or WriteLine instead of Console.WriteLine ?
Then when I start to get used to the idea that the objects I am using ( like string) know how to perform operations I am used to using external functions to achieve ( like to upper, tolower, substring, etc) they change the rules with numbers, numbers don't know how to convert themselves from one numeric type to another for some reason, instead you have to invoke Convert class static functions to change a double to an int and Math class static functions to achieve rounding and truncating.. which quickly turns your simple( in other languages) statement to a gazillion character line in C#.
It also seems obsessed with strong typing which interferes somewhat with the thought process when I code. I understand that type safety reduces errors , but I think it also increases complexity, sometimes unnecessarily. It would be nice if you could choose context driven types when you wish without the explicit Casting or Converting or ToStringing that seems to be basic necessity in C# to get anything done.
So... Is it even possible to write meaningful code in notepad and use cl with out Internet access? What ref book would you use without recourse to autocomplete and Network access?
Any suggestions on smoothing the process towards grokking this language and using it more naturally?
I think you're suffering a bit from the fact that you've used to working in one way during some years, and now must take time to get yourself comfortable using / developing in a new platform.
I do not agree with you , that MS hasn't been consistent on the fact that a string knows how it should convert itself to another type, and other datatypes (like ints) do not.
This is not true, since strings do not know for themselves how they should be converted to another type as well. (You can use the Convert class to Convert types to other types).
It is however true that every type in .NET has a ToString() method, but, you should not rely on that method to convert whatever you have to a string.
I think you have never worked in an OO language before, and therefore, you're having some difficulties with the paradigm shift.
Think of it this way: it's all about responsabilities and behaviour. A class is (if it is well designed) responsible for doing one thing, and does this one thing good.
There is no excuse to use notepad to code a modern language. SharpDevelop or Visual C# Express provide the functionality to work with C# in a productive way.
And no, due to the complexity, not using the internet as a source of information is also not a good option.
You could buy a book that introduces you to the concepts of the language in a structured way, but to get up-to-date information, the internet is neccessary.
Yes, there are drawbacks in C#, like in any other language. I can only give you the advice to get used to the language. Many of the drawbacks become understandable after that, even if some of them don't become less annoying. I recommend that you ask clear, direct questions with example code if you want to know how some language constructs work or how you can solve specific problems more efficiently. That makes it easier to answer those questions.
For notepad, I have no useful advice, however I would advise you to use one of the free IDE's, Microsofts Express Editions, or Sharp Develop.
The IDE will speed the groking of the language, at which point, you can switch back to notepad.
Reading your post I was thinking that you worked mostly with C or dynamic languages previously. Maybe C# is just a wrong choice for you, there are IronPython, F# and a bunch of other languages that have necessary functionality (like functions outside of classes etc.)
I disagree with you about consistency. In fact there are small inconsistency between some components of .NET, but most part of FW is very consistent and predictable.
Strong typing is a huge factor in low defect count. Dynamic typing plays nice in small/intermediate projects (like scripts, etc). In more or less complex program dynamism can introduce a lot of complexity.
Regarding internet/autocomplete - I can hardly imagine any technology with size of .NET that doesn't require a lot of knowledge sources.
Programming in c# using notepad is like buying a ferrari to drive in dirt roads.
At least use Visual Studio Express Edition. For what you wrote I understand that you come from a non OO background, try to learn the OO concept and try to use it. You will eventually understand most design decisions made for .Net.
http://en.wikipedia.org/wiki/Object-oriented_programming
Oh boy where do i start with you(this will be a long post hahaha), well, lets go little by little:
"Microsoft called their system .NET because you have to use Intenet...", the reason why is called .NET is because the SUITE OF MICROSOFT LANGUAGUES(and now some other ones too like Phyton and Ruby, etc) CAN CALL ANY LIBRARY or DLLs, example you can "NET"(Network OR CALL) a DLL that was built in Visual Basic, F#, C++ from WITHIN C# or from any of those languagues you can also call(or ".NET") C# libraries. OK ONE DOWN!!!
NEXT ONE: "it seems nonsensical to require....to have their class not also function as their namespace", this is because a Namespace can have AS MANY CLASSES AS YOU WISH, and your question:
"That is Why can't I use Write or WriteLine instead of Console.WriteLine ?".
The reason is because: "Console"(System.Console hense the "Using" statement at the beginning of your program) Namespace is where "Write" and "WriteLine" LIVES!!(you can also FULLY qualify it (or "call It"). (all this seems to me that you need to study C# Syntax), ok NEXT:
"when I start to get used to the idea that the objects...", ok in simple words:
C# is a "Strongly Type-Safe language" so that SHOULD-MUST tell you what "you are getting in to" otherwise STAY WITH "WEAK or NO TYPE SAFE LANGUAGES" LIKE PHP or C , etc. this does NOT means is bad it just MEANS IS YOUR JOB TO MAKE SURE, as i tell my students: "IF YOU NEED AN INT THEN DEFINE AN INT INSTEAD LETTING THE COMPILER DO IT FOR YOU OTHERWISE YOU WILL HAVE A LOT OF BAD BUGS", or in other words do YOUR homework BEFORE DESIGNING A PIECE OF SOFTWARE.
Note: C# is IMPLICITY TYPE SAFE language SO IF YOU WANT YOU CAN RUN IT AS UNSAFE so from then it wiLL be your job to make sure, so dont complain later(for being lazy) when bugs arrive AT RUNTIME(and a lot of times when the customer is already using your crappy software).
...and last but not least : Whey do you wan to shoot yourself by using notepad? Studio Express is FREE, even the database SQL SERVER is FREE TOO!!, unless you work for a company I WILL ASK FOR PRO, ETC. all the "extra" stuff is for large companies, teams, etc, YOU CAN DO 99% OF THE STUFF WITH THE FREE VERSIONS(and you can still buy-update to full version once you want to scalate to Distributed Software or a Large Project, or if your software becomes a big hit, Example: if you need millions of queryes or hits PER SECOND from your database or 100 people are working on same project(code) but for the majority of times for 2 or 3 "normal" developers working at home or small office the FREE ONES ARE ENOuGH!!)
cherrsss!!! (PS: Software Developer since the 80's)