I've been reading open-source C# code and there is a lot of syntax that looks strange to me.
They declare method arguments with the this keyword, like this:
this object @object
What does it mean?
If I remove the 'this' keyword that comes before the data type, will it work differently?
Sounds like an Extension Method.
The @ symbol allows the variable name to be the same as a C# keyword - I tend to avoid them like the plague personally.
If you remove the this keyword, it will no longer be an extension method, just a static method. Depending on the calling code syntax, it may no longer compile, for example:
public static class IntegerMethods
{
    public static int Add(this int i, int value)
    {
        return i + value;
    }
}
int i = 0;
// This is an "extension method" call, and will only compile against extension methods.
i = i.Add(2);
// This is a standard static method call.
i = IntegerMethods.Add(i, 2);
The compiler simply translates every extension method call into a standard static method call anyway, but the extension call syntax only works against valid extension methods, i.e. static methods whose first parameter is declared with the this Type name syntax.
Some guidelines
These are my own, but I find they are useful.
Discoverability of extension methods can be a problem, so be mindful of the namespace you choose to contain them in. We have very useful stuff under .NET namespaces such as System.Collections or whatever. Less useful but otherwise "common" stuff tends to go under Extensions.<namespace of extended type> such that discoverability is at least consistent via convention.
Try not to extend frequently used types in a broad scope; you don't want MyFabulousExtensionMethod appearing on object throughout your app. If you need to, either constrain the scope (namespace) to be very specific, or bypass extension methods and use a static class directly - these won't pollute the type metadata in IntelliSense.
In extension methods, "this" can be null (due to how they compile into static method calls) so be careful and don't assume that "this" is not null (from the calling side this looks like a successful method call on a null target).
These are optional and not exhaustive, but I find they usually fall under the banner of "good" advice. YMMV.
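To illustrate the point about null targets, here is a minimal sketch. The class and method names (NullTolerantExtensions, IsMissing, OrFallback) are hypothetical and chosen only for this example.

```csharp
using System;

public static class NullTolerantExtensions
{
    // Hypothetical helper. text.IsMissing() compiles to
    // NullTolerantExtensions.IsMissing(text), so a null target
    // simply arrives as a null argument instead of throwing.
    public static bool IsMissing(this string text)
    {
        return string.IsNullOrEmpty(text);
    }

    // Hypothetical helper that treats a null target as a valid input.
    public static string OrFallback(this string text, string fallback)
    {
        return text ?? fallback;
    }
}
```

With these in scope, calling IsMissing() on a null string variable returns true rather than throwing a NullReferenceException, which is exactly why an extension method that dereferences its first parameter should check it first.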
The 'this type name' syntax is used for extension methods.
For example, if I wanted to add an UnCamelCase method to string (so I could do "HelloWorld".UnCamelCase() to produce "Hello World"), I'd write this:
public static string UnCamelCase(this string text)
{
    /*match any instances of a lower case character followed by an upper case
     * one, and replace them with the same characters with a space between them*/
    return Regex.Replace(text, "([a-z])([A-Z])", "$1 $2");
}
this string text means the specific instance of the string that you're working with, and text is the identifier for it.
The @ syntax allows for variable names that are ordinarily reserved.
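As a small sketch of the two features combined, the following extension method uses both the this modifier and an @-prefixed parameter name. ObjectExtensions and Describe are hypothetical names made up for this example.

```csharp
using System;

public static class ObjectExtensions
{
    // 'this' makes Describe an extension method on object.
    // '@object' declares a parameter whose name is the keyword 'object'.
    public static string Describe(this object @object)
    {
        return @object == null ? "null" : @object.GetType().Name;
    }
}
```

Both "abc".Describe() and ObjectExtensions.Describe("abc") call the same method; the @ prefix is not part of the name, it only tells the compiler to treat the keyword as an identifier.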
Related
How can I turn strings like these:
"call System.Console.WriteLine"
"ldstr \"hello\""
into Instructions with Operands?
If you know how to use Mono.Cecil (or Reflection.Emit), the question is more generally about parsing text into code actions.
There are several ways to do that; I can only give you a hint, and you can choose your own way.
First of all, you need some prerequisites (if your IL text is valid IL code, these prerequisites already exist for you).
For example, you can't guess what Console.WriteLine is. Is Console an assembly, a type, or a method? The same question applies to WriteLine. Even if we know that WriteLine is a method, which overload should we choose? And what about generics? So you need to set a contract that defines, for example, that the dot is the delimiter, the first section is the assembly, the second is the namespace, and so on.
For example:
"mscorlib.System.Console.WriteLine(string)" will be translated to System.Console.WriteLine(string)
After you have a strict contract, you need a few steps (for the WriteLine example):
Resolve the Assembly and get its ModuleDefinition
Resolve and import the Type
Resolve and import the MethodReference that describes the requested method
Emit the call
One way to do that is to keep a structure of opcodes and their required actions.
For example, we know that the call instruction is (in most cases) used to emit a call to a static method, so you send the operand to ParseStaticCall, which parses the string and emits the call instruction.
Pseudo code:
// Maps each mnemonic to its OpCode and the handler that parses its operand.
new Dictionary<string, Tuple<OpCode, Action<OpCode, string, ILProcessor>>>
{
    {
        "call",
        Tuple.Create<OpCode, Action<OpCode, string, ILProcessor>>
            (OpCodes.Call, ParseStaticCall)
    }
};
static void ParseStaticCall(OpCode opcode,
                            string call,
                            ILProcessor processor)
{
    // Fill these in by parsing 'call' according to your contract.
    string assembly, namespaceName, type, methodName;
    int numOfParameters;
    var moduleDefinition = AssemblyResolver.Resolve(assembly).MainModule;
    var methodReference =
        new ReferenceFinder(moduleDefinition).
            GetMethodReference(typeof (Console),
                md => md.Name == methodName &&
                      md.Parameters.Count == numOfParameters);
    processor.Emit(opcode, methodReference);
}
AssemblyResolver is a helper class that finds an assembly by name and path (the path can be a constant). ReferenceFinder is a helper class that finds a type or method in a specific module.
So you need to create a method and an ILProcessor for the method body. Then, for every string you have, separate the instruction opcode from the operand, look up the required action in the dictionary, and pass it the opcode, the operand as a string, and the ILProcessor.
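The split-and-dispatch step described above can be sketched without any Mono.Cecil types. This is only an illustration under simplifying assumptions: IlTextDispatcher, Split and Dispatch are hypothetical names, the handlers take just the operand string, and the first space is assumed to separate the mnemonic from the operand.

```csharp
using System;
using System.Collections.Generic;

public static class IlTextDispatcher
{
    // Splits "ldstr \"hello\"" into { "ldstr", "\"hello\"" }.
    // A line with no operand (e.g. "ret") yields an empty operand.
    public static string[] Split(string line)
    {
        string trimmed = line.Trim();
        int space = trimmed.IndexOf(' ');
        if (space < 0)
        {
            return new[] { trimmed, string.Empty };
        }
        return new[]
        {
            trimmed.Substring(0, space),
            trimmed.Substring(space + 1).Trim()
        };
    }

    // Looks up the handler registered for the mnemonic and passes it the operand.
    public static void Dispatch(string line,
                                IDictionary<string, Action<string>> handlers)
    {
        string[] parts = Split(line);
        Action<string> handler;
        if (!handlers.TryGetValue(parts[0], out handler))
        {
            throw new NotSupportedException(
                "No handler registered for '" + parts[0] + "'");
        }
        handler(parts[1]);
    }
}
```

In a real implementation each handler would also receive the OpCode and the ILProcessor and would emit the instruction; here the handler only sees the operand text so the sketch stays independent of any library.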
I'm using BitFactory logging, which exposes a bunch of methods like this:
public void LogWarning(object aCategory, object anObject)
I've got an extension method that makes this a bit nicer for our logging needs:
public static void LogWarning(this CompositeLogger logger,
string message = "", params object[] parameters)
Which just wraps up some common logging operations, and means I can log like:
Logging.LogWarning("Something bad happened to the {0}. Id was {1}",foo,bar);
But when I only have one string in my params object[], then my extension method won't be called, instead the original method will be chosen.
Apart from naming my method something else, is there a way I can stop this from happening?
The rules about how overloaded methods are resolved to one (or an error) are complex (the C# specification is included with Visual Studio for all the gory details).
But there is one simple rule: extension methods are only considered if there is no applicable member on the type itself that can be called.
Because a signature taking two objects will accept any two arguments, any call with two arguments will match that member. Thus no extension methods will be considered as possibilities.
You could pass a third parameter (eg. String.Empty) and not use it in the format.
Or, and I suspect this is better, rename the method to LogWarningFormat (akin to the naming of StringBuilder.AppendFormat) to avoid possible interactions with additions to the library (variable length argument list methods are prone to this).
PS. There is no point having a default for the message parameter: it will never be used unless you pass no arguments, and that would log nothing.
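The precedence rule can be reproduced with a minimal sketch. SampleLogger is a hypothetical stand-in for the library logger, and the methods return a string only so that it is visible which overload was chosen.

```csharp
using System;

// Hypothetical stand-in for a library type with an (object, object) member.
public class SampleLogger
{
    public string LogWarning(object category, object item)
    {
        return "instance";
    }
}

public static class SampleLoggerExtensions
{
    // Same name as the instance member: only chosen when the member is not applicable.
    public static string LogWarning(this SampleLogger logger,
                                    string message, params object[] parameters)
    {
        return "extension: " + string.Format(message, parameters);
    }

    // Different name: never competes with the instance member.
    public static string LogWarningFormat(this SampleLogger logger,
                                          string message, params object[] parameters)
    {
        return "extension: " + string.Format(message, parameters);
    }
}
```

With a SampleLogger named logger, logger.LogWarning("Id was {0}", 7) binds to the instance member because two arguments fit (object, object), while logger.LogWarning("{0} and {1}", 1, 2) falls through to the extension. logger.LogWarningFormat("Id was {0}", 7) always reaches the extension.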
Declared methods always take precedence over extension methods.
If you want to call the extension regardless of the declared method, you have to call it as a regular static method, of the class that declared it.
eg:
LoggerExtensions.LogWarning(Logging, "Something bad happened to the {0}. Id was {1}",foo,bar);
I assume that the extension is declared in a class named LoggerExtensions
While I think a method with a different name is the way to go (easier to read and maintain), as a workaround you could pass parameters as a named argument:
logger.LogWarning("Something bad happened to the {0}.", parameters: "foo");
I'm a little confused as to why this doesn't give an error. I found this code deep inside of some outdated legacy software and was surprised to see it work.
public static string CleanFileName(this string fileName)
{
    return CleanFileName(fileName, 64);
}
public static string CleanFileName(this string fileName, int maxLength)
{
    //some logic
}
My experience with extension methods is to call it like this:
fileName.CleanFileName(64);
Does this only work because it's a static method as well? Is this common practice and just something I haven't seen yet, or a piece of outdated legacy code that I should kill with fire?
Extension methods can always optionally be called as if the "this" modifier was not even there (aka as a normal static method). It's less readable to do this, but syntactically valid.
The other answer is misleading because "It works because the method call is being made from within the same type as its overload." implies something about extension methods. You can invoke extension methods as normal static methods regardless of what class you happen to be in. But through the comments below, it sounds like the confusion is whether the class needs to be qualified or not. And in that vein, Nathan is correct that the reason the class name can be elided is because the call is happening from within the same class as the overload.
It works because the call to CleanFileName(string, int) is being made from within the same type as CleanFileName(string), which allows the call to be made in standard method syntax, rather than extension method syntax. As such, no string instance prefix is required in front of the extension method.
Semantically speaking, static string Foo(this string foo, int bar) { } can be called in the form of Foo(string, int) or string.Foo(int).
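A small sketch of the three equivalent call forms follows. TextExtensions and Truncate are hypothetical names, and the default length of 8 is arbitrary.

```csharp
using System;

public static class TextExtensions
{
    public static string Truncate(this string text)
    {
        // Unqualified static call: allowed because we are inside TextExtensions.
        return Truncate(text, 8);
    }

    public static string Truncate(this string text, int maxLength)
    {
        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }
}
```

From any other class, "abcdefghij".Truncate(3) and TextExtensions.Truncate("abcdefghij", 3) are interchangeable; only the bare Truncate(text, 8) form depends on being inside the declaring class.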
Could somebody help me parse following from the C# method declaration: scope, isStatic, name, return type and list of the parameters and their types. So given method declaration like this
public static SomeReturnType GetSomething(string param1, int param2)
etc. I need to be able to parse it and get the info above. So in this case
name = "GetSomething"
scope = "public"
isStatic = true
returnType = "SomeReturnType"
and then array of parameter type and name pairs.
Oh almost forgot the most important part. It has to account for all other scopes (protected, private, internal, protected internal), absence of "static", void return type etc.
Please note that REFLECTION is not solution here. I need REGEX.
So far I have these two:
(?:(?:public)|(?:private)|(?:protected)|(?:internal)|(?:protected internal)\s+)*
(?:(?:static)\s+)*
I guess for rest of the problem I can just get away with string manipulation without regex.
Some thoughts on your problem:
A set of strings that can all be matched by a particular regular expression is called a regular language. The set of strings which are legal method declarations is not a regular language in any version of C#. If you are attempting to find a regular expression which matches every legal C# method declaration and rejects every illegal C# method declaration then you are out of luck.
More generally, regular expressions are almost always a bad idea for anything but the simplest matching problems. (Sorry Jeff.) A far better approach is to first write a lexer, which breaks up the string into a sequence of tokens. Then analyze the token sequence. (Using regular expressions as part of a lexer is not a terrible idea, though you can get by without them.)
I note also that you are glossing over rather a lot of complications in parsing method declarations. You did not mention:
generic/array/pointer/nullable return and formal parameter types
generic type parameter declarations
generic type parameter constraints
unsafe/extern/new/override/virtual/abstract/sealed methods
explicit interface implementation methods
method/parameter/return attributes
partial methods -- slightly tricky to parse, partial is a contextual keyword
comments
I also note that you've not said whether you are guaranteed that the method signature is already good, or if you need to identify bad ones and produce diagnostics as to why they're bad. That's a much harder problem.
Why do you want to do this in the first place? Doing this correctly is rather a lot of work. Perhaps there is an easier way to get what you want?
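To make the lexer-first suggestion concrete, here is a deliberately tiny sketch. It is not a C# parser: it only handles the simple shape from the question (optional access modifiers, optional static, a plain return type, a name, and "type name" parameters) and ignores everything in the list above, such as generics, arrays, ref/out and attributes. All type and member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class MethodSignature
{
    public string Scope { get; set; }
    public bool IsStatic { get; set; }
    public string ReturnType { get; set; }
    public string Name { get; set; }
    public List<KeyValuePair<string, string>> Parameters { get; private set; }

    public MethodSignature()
    {
        Parameters = new List<KeyValuePair<string, string>>();
    }
}

public static class SimpleSignatureParser
{
    // Lexer: words (letters, digits, '_', '.') and single-character punctuation.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var word = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c) || c == '_' || c == '.')
            {
                word.Append(c);
                continue;
            }
            if (word.Length > 0) { tokens.Add(word.ToString()); word.Clear(); }
            if (!char.IsWhiteSpace(c)) tokens.Add(c.ToString());
        }
        if (word.Length > 0) tokens.Add(word.ToString());
        return tokens;
    }

    // Analyzes the token sequence instead of the raw string.
    public static MethodSignature Parse(string declaration)
    {
        List<string> tokens = Tokenize(declaration);
        var access = new HashSet<string> { "public", "private", "protected", "internal" };
        var scope = new List<string>();
        var result = new MethodSignature();
        int i = 0;

        // Leading modifiers, in any order.
        while (i < tokens.Count && (access.Contains(tokens[i]) || tokens[i] == "static"))
        {
            if (tokens[i] == "static") result.IsStatic = true;
            else scope.Add(tokens[i]);
            i++;
        }
        result.Scope = string.Join(" ", scope);

        if (i + 2 >= tokens.Count || tokens[i + 2] != "(")
            throw new FormatException("Expected '<return type> <name> ('.");
        result.ReturnType = tokens[i];
        result.Name = tokens[i + 1];
        i += 3;

        // Parameters: "type name" pairs separated by commas, up to ')'.
        while (i < tokens.Count && tokens[i] != ")")
        {
            if (tokens[i] == ",") { i++; continue; }
            if (i + 1 >= tokens.Count) throw new FormatException("Incomplete parameter.");
            result.Parameters.Add(new KeyValuePair<string, string>(tokens[i], tokens[i + 1]));
            i += 2;
        }
        return result;
    }
}
```

Working on tokens rather than characters is what keeps each rule short; extending the sketch to the harder cases means adding token kinds and grammar rules, not a larger regular expression.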
I wouldn't bother with using Regex. When you get to the part of interpreting method parameters, it gets really messy (ref and out keywords for example). I don't know if you need support for attribute notation as well, but that would make it a complete mess.
Maybe a C# parser library can be of help. I've found a few on the internet:
http://www.codeplex.com/csparser (C# 1.0)
http://www.csharpparser.com/
Alternatively, you could first feed the code to the compiler at runtime, and then use reflection on the newly created assembly. It will be slower, but pretty much guaranteed to be correct. Even though you seem to be opposed to the idea of using reflection, this can be a viable solution.
Something like this:
// Requires: using System.CodeDom.Compiler; using System.Collections.Generic;
//           using System.Linq; using Microsoft.CSharp;
List<string> referenceAssemblies = new List<string>()
{
    "System.dll"
    // ...
};
string source = "public abstract class TestClass {" + input + ";}";
CSharpCodeProvider codeProvider = new CSharpCodeProvider();
// No assembly name specified
CompilerParameters compilerParameters =
    new CompilerParameters(referenceAssemblies.ToArray());
compilerParameters.GenerateExecutable = false;
compilerParameters.GenerateInMemory = false;
CompilerResults compilerResults = codeProvider.CompileAssemblyFromSource(
    compilerParameters, source);
// Check for successful compilation here
Type testClass = compilerResults.CompiledAssembly.GetTypes().First();
Then use reflection on testClass.
Compiling should be safe without input validation, because you're not executing any of the code. You'd only need very basic checks, such as making sure only 1 method signature is entered.
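The reflection step can be sketched roughly as follows. MethodDescriber and SampleDeclarations are hypothetical names; SampleDeclarations merely stands in for the compiled test class so the sketch is self-contained.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for the type obtained from the compiled assembly.
public class SampleDeclarations
{
    public static int GetSomething(string param1, int param2)
    {
        return 0;
    }
}

public static class MethodDescriber
{
    // Reads scope, static-ness, return type, name and parameters from a MethodInfo.
    public static string Describe(MethodInfo method)
    {
        string scope =
            method.IsPublic ? "public" :
            method.IsPrivate ? "private" :
            method.IsFamily ? "protected" :
            method.IsAssembly ? "internal" :
            method.IsFamilyOrAssembly ? "protected internal" :
            "private protected";

        string parameters = string.Join(", ",
            method.GetParameters()
                  .Select(p => p.ParameterType.Name + " " + p.Name));

        return scope + (method.IsStatic ? " static " : " ")
            + method.ReturnType.Name + " " + method.Name
            + "(" + parameters + ")";
    }
}
```

Note that reflection reports CLR type names, so int shows up as Int32 and string as String; mapping them back to C# keywords would need an extra lookup table.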
Well given the rules you've provided, it would probably be best to use a series of regular expressions rather than trying to come up with a singular expression. That expression would be enormous.
If you're sold on a singular expression, you'll need to use a regular expression that uses grouping, look-ahead and look-behind.
http://www.regular-expressions.info/lookaround.html
Even with the limited scope of what you're trying to parse out of it, you'll still need some very specific guidelines on all possibilities.
string test = @"public static SomeReturnType GetSomething(string param1, int param2)";
var match = Regex.Match(test, @"(?<scope>\w+)\s+(?<static>static\s+)?(?<return>\w+)\s+(?<name>\w+)\((?<parms>[^)]+)\)");
Console.WriteLine(match.Groups["scope"].Value);
Console.WriteLine(!string.IsNullOrEmpty(match.Groups["static"].Value));
Console.WriteLine(match.Groups["return"].Value);
Console.WriteLine(match.Groups["name"].Value);
List<string> parms = match.Groups["parms"].ToString().Split(',').ToList();
parms.ForEach(x => Console.WriteLine(x));
Console.Read();
Broken for parms with commas, but it's quite possible to also handle that.
(?<StringRepresentation>\A\s*(?:(?:(?<Comment>(?://.*\n)|(?:/\*(?:[\w\d!@#$%^&*()\[\]<>,.;\\"':|{}`~+=-_?\s]*)?\*/))|(\[\s*(?<Attributes>\w*)[^\[\]]*?\]))\s*)*?(?:(?:(?<Access>protected\s+internal|internal\s+protected|private|public|protected|internal)\s+)?(?:(?<InheritanceModifier>new|abstract|override|virtual)\s+)?(?:(?<Static>static)\s+)?(?:(?<Extern>extern)\s+)?(?:partial\s+)?)+(?:(?<Type>\w+(?:[\w,.\?\[\]])*?(?:\<.*>)*?)\s+)?(?<Operator>operator\s+)?\s*(?<Name>~?(?:[\w\=+\-\!\~\d\.])+?)\s*(?:\<(?:\w\.*\d*\,*\s*)+\>)*\s*\((?<Parameters>(?:[^()])*?)\)\s*(?:where\s+.+)?\s*(?:\:\s*(?:this|base)\s*(?:\(?[^\(\)]*(?:(?:(?:(?<OpenC>\()[^\(\)]*)+(?:(?<CloseC-OpenC>\))[^\(\)]*?)+)*(?(OpenC)(?!))\)))\s*)?(?:;|(?<ah>\{[^\{\}]*(?:(?:(?:(?<Open>\{)[^\{\}]*)+(?:(?<Close-Open>\})[^\{\}]*?)+)*(?(Open)(?!))\}))))
I can't personally take credit for this one, but the guy who made Regionerate (open source) came up with this and it works pretty well for parsing methods in general.
The first parameter to a C# extension method is the instance that the extension method was called on. I have adopted an idiom, without seeing it elsewhere, of calling that variable "self". I would not be surprised at all if others are using that as well. Here's an example:
public static void Print(this string self)
{
    if(self != null) Console.WriteLine(self);
}
However, I'm starting to see others name that parameter "@this", as follows:
public static void Print(this string @this)
{
    if(@this != null) Console.WriteLine(@this);
}
And as a 3rd option, some prefer no idiom at all, saying that "self" and "@this" don't give any information. I think we all agree that sometimes there is a clear, meaningful name for the parameter, specific to its purpose, which is better than "self" or "@this". Some go further and say you can always come up with a more valuable name. So this is another valid point of view.
What other idioms have you seen? What idiom do you prefer, and why?
I name it fairly normally, based on the use. So "source" for the source sequence of a LINQ operator, or "argument"/"parameter" for an extension doing parameter/argument checking, etc.
I don't think it has to be particularly related to "this" or "self" - that doesn't give any extra information about the meaning of the parameter. Surely that's the most important thing.
EDIT: Even in the case where there's not a lot of obvious meaning, I'd prefer some meaning to none. What information is conferred by "self" or "@this"? Merely that it's the first parameter in an extension method - and that information is already obvious by the fact that the parameter is decorated with this. In the example case where theStringToPrint/self option is given, I'd use outputText instead - it conveys everything you need to know about the parameter, IMO.
I name the variable exactly how I would name it if it were a plain old static method. The reason being that it can still be called as a static method and you must consider that use case in your code.
The easiest way to look at this is argument validation. Consider the case where null is passed into your method. You should be doing argument checking and throwing an ArgumentNullException. If it's implemented properly you'll need to put "this" as the argument name like so.
public static void Print(this string @this) {
    if ( null == @this ) {
        throw new ArgumentNullException("this");
    }
    ...
}
Now someone is coding against your library and suddenly gets an exception dialog which says "this is null". They will be most confused :)
This is a bit of a contrived example, but in general I treat extension methods no differently than plain old static methods. I find it makes them easier to reason about.
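For comparison, here is a sketch of the same kind of validation with a descriptive parameter name. ShoutExtensions and Shout are hypothetical, and the nameof operator assumes C# 6 or later.

```csharp
using System;

public static class ShoutExtensions
{
    public static string Shout(this string text)
    {
        // nameof keeps the reported parameter name in sync with the declaration.
        if (text == null) throw new ArgumentNullException(nameof(text));
        return text.ToUpperInvariant() + "!";
    }
}
```

A caller who passes null now gets an exception whose ParamName is "text", which reads sensibly whether the method was invoked as an extension or as ShoutExtensions.Shout(value).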
I have seen obj and val used. I do not like @this. We should try to avoid using keywords. I have never seen self but I like it.
I call it 'target', since the extension method will operate on that parameter.
I believe @this should be avoided as it makes use of the most useless language-specific feature ever seen (@). In fact, anything that can cause confusion or decrease readability such as keywords appearing where they are not keywords should be avoided.
self reminds me of Python but could be good for a consistent naming convention as it's clear that it's referring to the instance in use while not requiring some nasty syntactic trickery.
You could do something like this...
public static void Print(this string extended)
{
    if(extended != null) Console.WriteLine(extended);
}