Is there a programmatic way to identify C# reserved words? - c#

I'm looking for a function like
public bool IsAReservedWord(string TestWord)
I know I could roll my own by grabbing a reserved word list from MSDN. However I was hoping there was something built into either the language or .NET reflection that could be relied upon so I wouldn't have to revisit the function when I move to newer versions of C#/.NET.
I want this as a safeguard in .tt file code generation.

CSharpCodeProvider cs = new CSharpCodeProvider();
var test = cs.IsValidIdentifier("new"); // returns false
var test2 = cs.IsValidIdentifier("new1"); // returns true

The Microsoft.CSharp.CSharpCodeGenerator has an IsKeyword(string) method that does exactly that. However, the class is internal, so you have to use reflection to access it and there's no guarantee it will be available in future versions of the .NET framework. Please note that IsKeyword doesn't take care of different versions of C#.
The public method System.CodeDom.Compiler.ICodeGenerator.IsValidIdentifier(string) rejects keywords as well. The drawback is this method does some other validations as well, so other non-keyword strings are also rejected.
Update: If you just need to produce a valid identifier rather than decide whether a particular string is a keyword, you can use ICodeGenerator.CreateValidIdentifier(string). This method also takes care of strings with two leading underscores by prefixing them with one more underscore. The same holds for keywords. Note that ICodeGenerator.CreateEscapedIdentifier(string) prefixes such strings with the @ sign.
Identifiers starting with two leading underscores are reserved for the implementation (i.e. the C# compiler and associated code generators etc.), so avoiding such identifiers in your code is generally a good idea.
Update 2: The reason to prefer ICodeGenerator.CreateValidIdentifier over ICodeGenerator.CreateEscapedIdentifier is that __x and @__x are essentially the same identifier. The following won't compile:
int __x = 10;
int @__x = 20;
If the compiler generated and used a __x identifier, and the user used @__x as the result of a call to CreateEscapedIdentifier, a compilation error would occur. CreateValidIdentifier prevents this situation, because the custom identifier is turned into ___x (three underscores).
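A minimal sketch of the two calls discussed above, assuming the System.CodeDom types are available (they ship with the .NET Framework; on newer .NET they come from the System.CodeDom NuGet package). The class name IdentifierSafeguard and its method names are made up for illustration.

```csharp
using Microsoft.CSharp;

public static class IdentifierSafeguard
{
    private static readonly CSharpCodeProvider Provider = new CSharpCodeProvider();

    // Keywords and names starting with "__" get an extra leading underscore.
    public static string MakeValid(string name)
    {
        return Provider.CreateValidIdentifier(name);
    }

    // Keywords and names starting with "__" get a leading '@' instead.
    public static string MakeEscaped(string name)
    {
        return Provider.CreateEscapedIdentifier(name);
    }
}
```

Going by the behaviour described above, MakeValid("__x") should give ___x and MakeEscaped("__x") should give @__x, while an ordinary name such as myField passes through both methods unchanged.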

However I was hoping there was something built into either the language or .NET reflection that could be relied upon so I wouldn't have to revisit the function when I move to newer versions of C#/.NET.
Note that C# has never added a new reserved keyword since v1.0. Every new keyword has been an unreserved contextual keyword.
Though it is of course possible that we might add a new reserved keyword in the future, we have tried hard to avoid doing so.
For a list of all the reserved and contextual keywords up to C# 5, see
http://ericlippert.com/2009/05/11/reserved-and-contextual-keywords/
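Since the reserved set has not grown since v1.0, a hand-maintained list is less fragile than it might sound. The following is only a sketch of the "roll my own" option from the question; the class name CSharpKeywords is hypothetical, and contextual keywords such as var or async are intentionally left out because they remain legal identifiers.

```csharp
using System;
using System.Collections.Generic;

public static class CSharpKeywords
{
    // Reserved keywords only; C# is case-sensitive, so the comparison is ordinal.
    private static readonly HashSet<string> Reserved = new HashSet<string>(StringComparer.Ordinal)
    {
        "abstract", "as", "base", "bool", "break", "byte", "case", "catch", "char",
        "checked", "class", "const", "continue", "decimal", "default", "delegate",
        "do", "double", "else", "enum", "event", "explicit", "extern", "false",
        "finally", "fixed", "float", "for", "foreach", "goto", "if", "implicit",
        "in", "int", "interface", "internal", "is", "lock", "long", "namespace",
        "new", "null", "object", "operator", "out", "override", "params", "private",
        "protected", "public", "readonly", "ref", "return", "sbyte", "sealed",
        "short", "sizeof", "stackalloc", "static", "string", "struct", "switch",
        "this", "throw", "true", "try", "typeof", "uint", "ulong", "unchecked",
        "unsafe", "ushort", "using", "virtual", "void", "volatile", "while"
    };

    public static bool IsReserved(string word)
    {
        return word != null && Reserved.Contains(word);
    }
}
```

If a future language version did add a reserved keyword, this list would need one more entry, which is exactly the maintenance cost the question hoped to avoid.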

static System.CodeDom.Compiler.CodeDomProvider CSprovider =
Microsoft.CSharp.CSharpCodeProvider.CreateProvider("C#");
public static string QuoteName(string name)
{
return CSprovider.CreateEscapedIdentifier(name);
}
public static bool IsAReservedWord(string TestWord)
{
return QuoteName(TestWord) != TestWord;
}
Since the definition of CreateEscapedIdentifier is:
public string CreateEscapedIdentifier(string name)
{
if (!IsKeyword(name) && !IsPrefixTwoUnderscore(name))
{
return name;
}
return ("@" + name);
}
it will properly identify __ identifiers as reserved.

Related

How to include keywords and aliases in Roslyn recommended symbols?

I am using Roslyn to create a C# scripting control with IntelliSense.
I am generally very happy with the results I am getting, however, the recommended symbols don't include keywords such as for and if et cetera and also don't contain type aliases such as int, when it includes Int32.
More specifically, I am using Microsoft.CodeAnalysis.Recommendations, that is:
Recommender.GetRecommendedSymbolsAtPositionAsync(mySemanticModel, scriptPosition, myAdhocWorkspace);
My SemanticModel object is obtained from a C# compilation which always has a reference to mscorlib.dll at the very least.
At all positions in my script, the recommended completions are always correct. However, I would argue that they are incomplete if they are missing keywords such as if, else and for etc.
I can see that it would be easy for me to include common type aliases in my IntelliSense manually. That is, if Int32 is a possible completion, then I could manually add int.
However, it is less obvious when an if statement or a for statement or even is/as would be appropriate in the given scope.
Is there a way to include these keywords when getting the recommended symbols this way?
Is there also a way to automatically include type aliases?
It seems that Recommender.GetRecommendedSymbolsAtPositionAsync provides only symbol completion, that is, methods, types and so on (ISymbol implementations).
If you want keyword or snippet completion, you can use Microsoft.CodeAnalysis.Completion.CompletionService:
void CompletionExample()
{
var code = @"using System;
namespace NewConsoleApp
{
class NewClass
{
void Method()
{
fo // I want to get 'for' completion for this
}
}
}";
var completionIndex = code.LastIndexOf("fo") + 2;
// Assume you have a method that creates a workspace for you
var workspace = CreateWorkspace("newSln", "newProj", code);
var doc = workspace.CurrentSolution.Projects.First().Documents.First();
var service = CompletionService.GetService(doc);
var completionItems = service.GetCompletionsAsync(doc, completionIndex).Result.Items;
foreach (var result in completionItems)
{
Console.WriteLine(result.DisplayText);
Console.WriteLine(string.Join(",", result.Tags));
Console.WriteLine();
}
}
You can play around with it to figure out how to customize it for your needs (rules, filters).
Notice that each result comes from a specific completion provider (item.Properties["Provider"]), and you can create a custom CompletionProvider (at least you should be able to).
You can also take a look at C# for VS Code (which is powered by OmniSharp) to see how they did it.
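For the type alias half of the question, the manual approach mentioned there (offer int whenever Int32 is offered) can be sketched as a plain lookup table. This is an illustration rather than a Roslyn feature; the TypeAliases class is hypothetical, and it assumes you pass the simple name of a symbol that you have already confirmed lives in the System namespace.

```csharp
using System.Collections.Generic;

public static class TypeAliases
{
    // Simple System type name -> C# keyword alias.
    private static readonly Dictionary<string, string> AliasByTypeName =
        new Dictionary<string, string>
        {
            { "Boolean", "bool" },   { "Byte", "byte" },     { "SByte", "sbyte" },
            { "Char", "char" },      { "Decimal", "decimal" }, { "Double", "double" },
            { "Single", "float" },   { "Int16", "short" },   { "UInt16", "ushort" },
            { "Int32", "int" },      { "UInt32", "uint" },   { "Int64", "long" },
            { "UInt64", "ulong" },   { "Object", "object" }, { "String", "string" }
        };

    // Returns the alias for a type name, or null when the type has no alias.
    public static string GetAlias(string typeName)
    {
        string alias;
        return AliasByTypeName.TryGetValue(typeName, out alias) ? alias : null;
    }
}
```

When building the completion list, each recommended type symbol could be run through GetAlias and the alias added as an extra entry whenever the result is not null.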

Declaring constants with the nameof() the constant as the value

Scenario
I have a class for declaring string constants used around a program:
public static class StringConstants
{
public const string ConstantA = "ConstantA";
public const string ConstantB = "ConstantB";
// ...
}
Essentially, it doesn't matter what the actual value of each constant is, as the same constant is used both when assigning and when consuming. It is just for checking against.
The constant names will be fairly self-explanatory, but I want to try and avoid using the same string values more than once.
What I would like to do
I know that nameof() is evaluated at compile-time, so it is entirely possible to assign the nameof() of a member as the value of a const string.
In order to save writing these magic strings out, I have thought about using the nameof() the constant itself.
Like so:
public static class StringConstants
{
public const string ConstantA = nameof(ConstantA);
public const string ConstantB = nameof(ConstantB);
// ...
}
Question...
I guess there is no real benefit of using the nameof(), other than for refactoring?
Are there any implications to using nameof() when assigning constants?
Should I stick to just using a hard-coded string?
Whilst I think the use of nameof is clever, I can think of a few scenarios where it might cause you a problem (not all of these might apply to you):
1/ There are some string values for which you can't have the name and value the same. Any string value starting with a number for example can't be used as a name of a constant. So you will have exceptions where you can't use nameof.
2/ Depending how these values are used (for example if they are names of values stored in a database, in an xml file, etc), then you aren't at liberty to change the values - which is fine until you come to refactor. If you want to rename a constant to make it more readable (or correct the previous developer's spelling mistake) then you can't change it if you are using nameof.
3/ For other developers who have to maintain your code, consider which is more readable:
public const string ConstantA = nameof(ConstantA);
or
public const string ConstantA = "ConstantA";
Personally I think it is the latter. In my opinion if you go the nameof route then that might give other developers cause to stop and wonder why you did it that way. It is also implying that it is the name of the constant that is important, whereas if your usage scenario is anything like mine then it is the value that is important and the name is for convenience.
If you accept that there are times when you couldn't use nameof, then is there any real benefit in using it at all? I don't see any disadvantages aside from the above. Personally I would advocate sticking to traditional hard coded string constants.
That all said, if your objective is simply to ensure that you are not using the same string value more than once, then (because this will give you a compiler error if two names are the same) this would be a very effective solution.
I think nameof() has 2 advantages over literal strings:
1.) When the name changes, you will get compiler errors unless you change all occurrences. So this is less error-prone.
2.) When quickly trying to understand code you didn't write yourself, you can clearly distinguish which context the name comes from. Example:
ViewModel1.PropertyChanged += OnPropertyChanged; // add the event handler in line 50
...
void OnPropertyChanged(object sender, PropertyChangedEventArgs e) // event handler in line 600
{
if (e.PropertyName == nameof(ViewModel1.Color))
{
// no need to scroll up to line 50 in order to see
// that we're dealing with ViewModel1's properties
...
}
}
Using the nameof() operator with public constant strings is risky. As its name suggests, the value of a public constant should really be constant/permanent. If you declare a public constant with nameof() and rename it later, you may break client code that uses the constant. In his book Essential C# 4.0, Mark Michaelis points out: (Emphasis is mine)
public constants should be permanent because changing their value will
not necessarily take effect in the assemblies that use it. If an
assembly references constants from a different assembly, the value of
the constant is compiled directly into the referencing assembly.
Therefore, if the value in the referenced assembly is changed but the
referencing assembly is not recompiled, then the referencing assembly
will still use the original value, not the new value. Values that
could potentially change in the future should be specified as readonly
instead.
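To make the const versus readonly distinction from the quote concrete, here is a small sketch. ConstantA and ConstantB are the names from the question; declaring ConstantB as static readonly is only for illustration.

```csharp
public static class StringConstants
{
    // Compile-time constant: the literal "ConstantA" is copied into every
    // assembly that references it, so renaming it later does not reach
    // callers until they are recompiled.
    public const string ConstantA = nameof(ConstantA);

    // Run-time field: callers read the value from this assembly each time,
    // so a changed value is picked up without recompiling them.
    public static readonly string ConstantB = nameof(ConstantB);
}
```

The trade-off is that a static readonly field cannot be used where the language requires a constant expression, such as a case label or an attribute argument.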

Is there a programmatic way to get SQL keywords (reserved words)

I need to validate the Name of a SQL column, which is created programmatically...
There should be 2 validation rules:
The Name shouldn't be a C# keyword
The Name shouldn't be a SQL keyword (SQL Server 2008 R2)
The solution for the 1st rule is nice:
The CSharpCodeProvider class has the IsValidIdentifier method which makes the implementation of validation easy.
(ex:
string myColumnName = "blabla";
var isValid = _cSharpCodeProvider.IsValidIdentifier(myColumnName);
)
The solution for the 2nd rule is a little verbose:
The only way I found through Google searches is to take the keywords from MSDN - Reserved Keywords (Transact-SQL) SQL Server 2008 R2
and build a string[] property which returns all these keywords...
(ex:
public static class SqlReservedKeywords {
public static string[] SqlServerReservedKeywords {
get { return SqlServerKeywords; }
}
private static readonly string[] SqlServerKeywords = new[] {
"ADD","EXISTS","PRECISION",
//. . .
"EXEC","PIVOT","WITH",
"EXECUTE","PLAN","WRITETEXT"
};
}
//External code
var isValid = SqlReservedKeywords.SqlServerReservedKeywords.Contains(myColumnName);
)
Can you advise me on the implementation of the 2nd validation rule?
Is it good practice?
Maybe there is another way to implement it that I didn't find by googling...
Reserved words are a moving target. If the dbms doesn't expose them through a public interface, there isn't usually a good programmatic way to get to them.
If you don't want to guard them with brackets, you risk incorporating symbols that are not reserved in your currently used version of SQL Server, but are reserved in some future version.
I think your best bet is to use the quoting mechanism your dbms provides, since it's designed to deal with exactly this problem. For SQL Server, that means square brackets.
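As a sketch of the bracket quoting suggested above: the helper below wraps a name in square brackets and doubles any closing bracket inside it, which is the escaping rule T-SQL uses for bracketed identifiers. The SqlIdentifier class is a hypothetical name for illustration.

```csharp
using System;

public static class SqlIdentifier
{
    // Wraps the name in [ ] and escapes an embedded ']' as ']]'.
    public static string Quote(string name)
    {
        if (string.IsNullOrEmpty(name))
            throw new ArgumentException("Identifier must not be empty.", nameof(name));

        return "[" + name.Replace("]", "]]") + "]";
    }
}
```

With this in place, a generated column called select becomes [select] in the emitted SQL, so the keyword check is no longer needed to keep the statement valid.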
Since there is a function you can call for C#, the real question is how to do the lookup for SQL Reserved words. The way you implemented look up here is NOT the most efficient C#. You should use a HashSet -- quick untested code example follows:
public static class SqlReservedKeywords {
// "in" is itself a C# keyword, so the parameter is called name instead
public static bool IsReserved(string name)
{
return SqlServerKeywords.Contains(name);
}
// Case-insensitive comparer, so no ToUpper() call is needed
private static readonly HashSet<string> SqlServerKeywords =
new HashSet<string>(StringComparer.OrdinalIgnoreCase);
static SqlReservedKeywords()
{
SqlServerKeywords.Add("ADD");
SqlServerKeywords.Add("EXISTS");
SqlServerKeywords.Add("PRECISION");
//. . .
SqlServerKeywords.Add("EXEC");
SqlServerKeywords.Add("PIVOT");
SqlServerKeywords.Add("WITH");
SqlServerKeywords.Add("EXECUTE");
SqlServerKeywords.Add("PLAN");
SqlServerKeywords.Add("WRITETEXT");
}
}
Here is a nice article (by @theburningmonk) showing how fast HashSet is when using Contains
(For those that don't want to click, HashSet is zero)
http://theburningmonk.com/2011/03/hashset-vs-list-vs-dictionary/
Generally, the approach looks correct. Getting keywords for any given language involves a (hopefully small) bit of trial and error due to anything undocumented but the main source is always the language specification itself. I don't know of any languages that come with their own validators but that's not to say they don't exist.
Visual Studio itself has a set of XML files that help it do the validation for any given language. If you were developing an IDE, you might have a table that looked something like:
Keyword | MatchWithRegEx | Color
------------+----------------+---------
for | \wfor | #FF0000
...you get the idea. In your case, you just want to filter out possible problem keywords so that they don't throw an exception. Allowing an exception to be thrown and catching and handling it specifically is a valid methodology albeit not a very clean one.
As for your case, the only real tweak I'd make is not having the list of keywords buried into the program at compile time but instead store the list in an external file which is loaded at your application's starting point. This allows some flexibility if you forget anything or need to support later versions of a language without requiring a rebuild of your application.
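A minimal sketch of the external-file idea, assuming a plain text file with one keyword per line. The KeywordFile class, the file layout, and the convention that lines starting with '#' are comments are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class KeywordFile
{
    // Reads one keyword per line; blank lines and '#' comment lines are skipped.
    public static HashSet<string> Load(string path)
    {
        IEnumerable<string> words = File.ReadLines(path)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0 && line[0] != '#');

        // SQL keywords are case-insensitive, so the set is too.
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }
}
```

Calling Load once at application start and keeping the resulting set around gives the same lookup speed as a compiled-in list, while letting the list be edited without a rebuild.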

Getting the CLR ID

Is there any way to get the CLR ID at runtime for the current application? I am monitoring my system using Performance Monitors and the name used for the instance is:
ApplicationName.exe_p4952_r15_ad1
I can get all other parameters programmatically but not the r15 which is the runtime ID of the common language runtime (instance) that executes your code. I noticed it is always 15, but it is best to get it dynamically to avoid complications.
You can get the whole "suffix", i.e. the part after Application.exe, using the same infrastructure that the .NET Framework (e.g. the performance counters) uses.
There is the method System.Runtime.Versioning.VersioningHelper.MakeVersionSafeName, which can do that. Note that the method is described as being "Infrastructure" and "This API supports the .NET Framework infrastructure and is not intended to be used directly from your code.", but it is nevertheless public. I don't think there is any "better supported" way to get the information you want. At least it is more robust and resilient to future changes than reverse engineering the information based on documentation.
string suffix = System.Runtime.Versioning.VersioningHelper.MakeVersionSafeName("",
System.Runtime.Versioning.ResourceScope.Machine,
System.Runtime.Versioning.ResourceScope.AppDomain);
This returns _p4472_r16_ad1, for example.
Of course, you could also directly pass the basename of the performance counter to directly get the full name. The usage of the empty string above is only a trick to just get the "suffix".
string str = VersioningHelper.MakeVersionSafeName("Application.exe",
ResourceScope.Machine, ResourceScope.AppDomain);
// str -> "Application.exe_p4472_r16_ad1".
The class VersioningHelper also has the private method GetRuntimeId(), but given the above, I don't think it is necessary to use reflection on it to achieve what you need.
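If only the number after "r" is needed, it can be pulled out of that suffix with a regular expression anchored at the end of the string. This is a sketch that assumes the instance name always ends in the _p<pid>_r<id>_ad<id> layout seen in the examples above, which the documentation does not promise; the InstanceName class is a made-up name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InstanceName
{
    // Anchored at the end, so underscores inside the application name
    // cannot be mistaken for the runtime part.
    private static readonly Regex Suffix =
        new Regex(@"_p([0-9]+)_r([0-9]+)(?:_ad([0-9]+))?$");

    // Returns the runtime id, or -1 when the name does not match the layout.
    public static int GetClrId(string instanceName)
    {
        Match match = Suffix.Match(instanceName ?? string.Empty);
        return match.Success ? int.Parse(match.Groups[2].Value) : -1;
    }
}
```

The input can be either a full instance name or the bare suffix returned by MakeVersionSafeName with an empty base name.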
As far as I can see there is no way to predict what that value will be - here is a quote from the MSDN page you linked (emphasis mine)
runtimeID is a common language runtime identifier.
The article is slightly confusing, as it gives an example in which an application myapp.exe hosts two CLR runtimes; however, in the example the two instances appear to have different process IDs but the same CLR runtime ID.
The article definitely doesn't make any promises about what the value of the CLR runtime ID will be or how to find it (it doesn't even state that it's a number), which implies to me that it's an internal thing and you shouldn't rely on being able to work out what it is.
My approach would probably be to enumerate through all Perfmon counters and monitor any of them that match your PID. If there is more than one (which will happen if you are using any .Net 2.0 components) then you will just have to monitor both.
Can you give any more information about what it is you are trying to do?
You can find it easily by splitting the instance name string:
This function splits the instance name and searches for the only part that begins with "r" and does not end with ".exe". Once that part has been found, it strips the leading "r" and converts the remaining number to an integer, which it returns.
If the CLR ID is not found, it returns -1 so that the calling function can notice this.
int getClrID(string instance_name)
{
string[] instance_name_parts = instance_name.Split('_');
string clr_id = "";
for (int i = 0; i < instance_name_parts.Length; i++)
{
if (instance_name_parts[i].StartsWith("r") && !instance_name_parts[i].EndsWith(".exe"))
{
clr_id = instance_name_parts[i];
break;
}
}
if (clr_id == "") // An error occurred ...
return -1;
else
return Convert.ToInt32(clr_id.Substring(1));
}
I hope I helped you.

Regex: C# method declaration parsing

Could somebody help me parse the following from a C# method declaration: scope, isStatic, name, return type and the list of parameters and their types. So given a method declaration like this
public static SomeReturnType GetSomething(string param1, int param2)
etc. I need to be able to parse it and get the info above. So in this case
name = "GetSomething"
scope = "public"
isStatic = true
returnType = "SomeReturnType"
and then array of parameter type and name pairs.
Oh almost forgot the most important part. It has to account for all other scopes (protected, private, internal, protected internal), absence of "static", void return type etc.
Please note that REFLECTION is not solution here. I need REGEX.
So far I have these two:
(?:(?:public)|(?:private)|(?:protected)|(?:internal)|(?:protected internal)\s+)*
(?:(?:static)\s+)*
I guess for the rest of the problem I can just get away with string manipulation without regex.
Some thoughts on your problem:
A set of strings that can all be matched by a particular regular expression is called a regular language. The set of strings which are legal method declarations is not a regular language in any version of C#. If you are attempting to find a regular expression which matches every legal C# method declaration and rejects every illegal C# method declaration then you are out of luck.
More generally, regular expressions are almost always a bad idea for anything but the simplest matching problems. (Sorry Jeff.) A far better approach is to first write a lexer, which breaks up the string into a sequence of tokens. Then analyze the token sequence. (Using regular expressions as part of a lexer is not a terrible idea, though you can get by without them.)
I note also that you are glossing over rather a lot of complications in parsing method declarations. You did not mention:
generic/array/pointer/nullable return and formal parameter types
generic type parameter declarations
generic type parameter constraints
unsafe/extern/new/override/virtual/abstract/sealed methods
explicit interface implementation methods
method/parameter/return attributes
partial methods -- slightly tricky to parse, partial is a contextual keyword
comments
I also note that you've not said whether you are guaranteed that the method signature is already good, or if you need to identify bad ones and produce diagnostics as to why they're bad. That's a much harder problem.
Why do you want to do this in the first place? Doing this correctly is rather a lot of work. Perhaps there is an easier way to get what you want?
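To illustrate the "lex first, then analyze the tokens" approach, here is a deliberately small sketch. It handles only the simple shape from the question (optional access modifiers, optional static, a one-word return type, a name, and type/name parameter pairs) and none of the complications listed above. ParsedMethod and MethodDeclarationSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class ParsedMethod
{
    public string Scope;
    public bool IsStatic;
    public string ReturnType;
    public string Name;
    public List<KeyValuePair<string, string>> Parameters =
        new List<KeyValuePair<string, string>>();
}

public static class MethodDeclarationSketch
{
    private static readonly HashSet<string> AccessModifiers =
        new HashSet<string> { "public", "private", "protected", "internal" };

    // Lexer: split the text into words and single punctuation characters.
    public static List<string> Tokenize(string declaration)
    {
        return Regex.Matches(declaration, @"\w+|[^\s\w]")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    // Consumes the token at position i, insisting that it looks like an identifier.
    private static string Expect(List<string> tokens, ref int i)
    {
        if (i >= tokens.Count || !(char.IsLetter(tokens[i][0]) || tokens[i][0] == '_'))
            throw new FormatException("Identifier expected at token " + i);
        return tokens[i++];
    }

    public static ParsedMethod Parse(string declaration)
    {
        List<string> tokens = Tokenize(declaration);
        var result = new ParsedMethod();
        int i = 0;

        // Collect one or two access modifiers, e.g. "protected internal".
        var scope = new List<string>();
        while (i < tokens.Count && AccessModifiers.Contains(tokens[i]))
            scope.Add(tokens[i++]);
        result.Scope = string.Join(" ", scope);

        if (i < tokens.Count && tokens[i] == "static")
        {
            result.IsStatic = true;
            i++;
        }

        result.ReturnType = Expect(tokens, ref i);
        result.Name = Expect(tokens, ref i);

        if (i >= tokens.Count || tokens[i] != "(")
            throw new FormatException("'(' expected after the method name");
        i++;

        // Parameters are read as "type name" pairs separated by commas.
        while (i < tokens.Count && tokens[i] != ")")
        {
            if (tokens[i] == ",") { i++; continue; }
            string type = Expect(tokens, ref i);
            string name = Expect(tokens, ref i);
            result.Parameters.Add(new KeyValuePair<string, string>(type, name));
        }
        if (i >= tokens.Count)
            throw new FormatException("')' expected");
        return result;
    }
}
```

Working on tokens rather than raw text is what makes it possible to extend this step by step, for example to generic types or ref/out parameters, without the pattern becoming unreadable.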
I wouldn't bother with using Regex. When you get to the part of interpreting method parameters, it gets really messy (ref and out keywords for example). I don't know if you need support for attribute notation as well, but that would make it a complete mess.
Maybe a C# parser library can be of help. I've found a few on the internet:
http://www.codeplex.com/csparser (C# 1.0)
http://www.csharpparser.com/
Alternatively, you could first feed the code to the compiler at runtime, and then use reflection on the newly created assembly. It will be slower, but pretty much guaranteed to be correct. Even though you seem to be opposed to the idea of using reflection, this can be a viable solution.
Something like this:
List<string> referenceAssemblies = new List<string>()
{
"System.dll"
// ...
};
string source = "public abstract class TestClass {" + input + ";}";
CSharpCodeProvider codeProvider = new CSharpCodeProvider();
// No assembly name specified
CompilerParameters compilerParameters =
new CompilerParameters(referenceAssemblies.ToArray());
compilerParameters.GenerateExecutable = false;
compilerParameters.GenerateInMemory = false;
CompilerResults compilerResults = codeProvider.CompileAssemblyFromSource(
compilerParameters, source);
// Check for successful compilation here
Type testClass = compilerResults.CompiledAssembly.GetTypes().First();
Then use reflection on testClass.
Compiling should be safe without input validation, because you're not executing any of the code. You'd only need very basic checks, such as making sure only 1 method signature is entered.
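The reflection step itself might look like the sketch below. To keep it self-contained it inspects a hand-written SampleClass instead of the compiled TestClass; SampleClass and MethodInspector are hypothetical names, and type names are printed in their CLR form (Int32 rather than int).

```csharp
using System;
using System.Linq;
using System.Reflection;

public abstract class SampleClass
{
    public static int GetSomething(string param1, int param2) { return 0; }
}

public static class MethodInspector
{
    // Maps the accessibility flags on a method to the C# modifier text.
    private static string Scope(MethodBase method)
    {
        if (method.IsPublic) return "public";
        if (method.IsPrivate) return "private";
        if (method.IsFamily) return "protected";
        if (method.IsAssembly) return "internal";
        if (method.IsFamilyOrAssembly) return "protected internal";
        return "private protected"; // the remaining case, IsFamilyAndAssembly
    }

    // Rebuilds a one-line description: scope, static, return type, name, parameters.
    public static string Describe(MethodInfo method)
    {
        var parameters = method.GetParameters()
                               .Select(p => p.ParameterType.Name + " " + p.Name);

        return Scope(method) + (method.IsStatic ? " static " : " ")
             + method.ReturnType.Name + " " + method.Name
             + "(" + string.Join(", ", parameters) + ")";
    }
}
```

For the compiled assembly, testClass.GetMethods would need BindingFlags.NonPublic (together with Instance and Static) to see private, protected and internal methods as well.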
Well given the rules you've provided, it would probably be best to use a series of regular expressions rather than trying to come up with a singular expression. That expression would be enormous.
If you're sold on a singular expression, you'll need to use a regular expression that uses grouping, look-ahead and look-behind.
http://www.regular-expressions.info/lookaround.html
Even with the limited scope of what you're trying to parse out of it, you'll still need some very specific guidelines on all possibilities.
string test = @"public static SomeReturnType GetSomething(string param1, int param2)";
var match = Regex.Match(test, @"(?<scope>\w+)\s+(?<static>static\s+)?(?<return>\w+)\s+(?<name>\w+)\((?<parms>[^)]+)\)");
Console.WriteLine(match.Groups["scope"].Value);
Console.WriteLine(!string.IsNullOrEmpty(match.Groups["static"].Value));
Console.WriteLine(match.Groups["return"].Value);
Console.WriteLine(match.Groups["name"].Value);
List<string> parms = match.Groups["parms"].ToString().Split(',').ToList();
parms.ForEach(x => Console.WriteLine(x));
Console.Read();
Broken for parms with commas, but it's quite possible to also handle that.
(?<StringRepresentation>\A\s*(?:(?:(?<Comment>(?://.*\n)|(?:/\*(?:[\w\d!@#$%^&*()\[\]<>,.;\\"':|{}`~+=-_?\s]*)?\*/))|(\[\s*(?<Attributes>\w*)[^\[\]]*?\]))\s*)*?(?:(?:(?<Access>protected\s+internal|internal\s+protected|private|public|protected|internal)\s+)?(?:(?<InheritanceModifier>new|abstract|override|virtual)\s+)?(?:(?<Static>static)\s+)?(?:(?<Extern>extern)\s+)?(?:partial\s+)?)+(?:(?<Type>\w+(?:[\w,.\?\[\]])*?(?:\<.*>)*?)\s+)?(?<Operator>operator\s+)?\s*(?<Name>~?(?:[\w\=+\-\!\~\d\.])+?)\s*(?:\<(?:\w\.*\d*\,*\s*)+\>)*\s*\((?<Parameters>(?:[^()])*?)\)\s*(?:where\s+.+)?\s*(?:\:\s*(?:this|base)\s*(?:\(?[^\(\)]*(?:(?:(?:(?<OpenC>\()[^\(\)]*)+(?:(?<CloseC-OpenC>\))[^\(\)]*?)+)*(?(OpenC)(?!))\)))\s*)?(?:;|(?<ah>\{[^\{\}]*(?:(?:(?:(?<Open>\{)[^\{\}]*)+(?:(?<Close-Open>\})[^\{\}]*?)+)*(?(Open)(?!))\}))))
I can't personally take credit for this one, but the guy who made Regionerate (open source) came up with this and it works pretty well for parsing methods in general.
