I'm trying to use Roslyn to replace some old, slow code in a utility that searches our source code for string literals which are not wrapped in an X() function call (things wrapped in X() will be translated).
I was able to use the Syntax Tree to get the string literals quite easily, and identified most of the places where they were wrapped in X(). What I was doing: given a LiteralExpressionSyntax object s, I found that the following expression gave me the function call, which I could then match with a regular expression:
s.Parent.Parent.Parent.ToFullString()
I quickly ran into a problem when the string literal was split between two lines. At that point I realized my means of checking if it was in an X() call was poor, because I'd have to keep adding .Parent to the chain. While I could write something to crawl backwards up the tree, that didn't seem like it was the right way (and probably wouldn't perform very well).
I've been trying to find a way, given a string literal syntax node, to determine if that is an argument in a method call. I haven't been able to find a decent way of going from the Syntax Tree to the Semantic Model to find what I'm looking for. I'm not even sure if that's the right approach, or if I'm missing something obvious.
I was able to use SymbolFinder.FindSymbolAtPositionAsync to get to the symbol, but that suffers from the same issue - I can't just pass the position of the string literal argument, I have to pass the position of the (correct) parent and I'm back where I started.
I'm hoping to avoid having to loop through the syntax tree multiple times, because that slows things way down. I can parse 553 files in about 1 second, but as soon as I try to loop to account for these multi-line situations, I'm up to about 12-13 seconds.
Just in case I've lost you in this novella (sorry), here's what I'm hoping to figure out: for a string literal being passed as an argument to a method, is there an easy way to determine what that method is?
Here is some example code. I've replaced calls to my X() function with Convert.ToString() to simulate the code I'm searching (the real code needs a reference to one of our DLLs, whereas this version only needs a reference to mscorlib).
static void TestAttempt()
{
string source = @"
Imports System
Namespace Exceptions
Public NotInheritable Class ExampleException
Inherits Validation
Public Sub New()
Convert.ToString(""Ignore me 1"")
Console.WriteLine(""Report me"")
Console.WriteLine(Convert.ToString(""Ignore me 2""))
MyBase.New(Convert.ToString(""Ignore me 3, "" & _
""Because I'm already translated.""))
End Sub
End Class
End Namespace";
var tree = VisualBasicSyntaxTree.ParseText(source);
var syntaxRoot = tree.GetRoot();
int i = 0;
foreach (var s in syntaxRoot.DescendantNodes().OfType<LiteralExpressionSyntax>())
{
// things to skip:
if (s == null) { continue; }
if (s.Kind() != SyntaxKind.StringLiteralExpression) { continue; }
var Mscorlib = PortableExecutableReference.CreateFromFile(@"C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscorlib.dll");
var compilation = VisualBasicCompilation.Create("MyCompilation", syntaxTrees: new[] { tree }, references: new[] { Mscorlib });
var model = compilation.GetSemanticModel(tree);
var symbol = SymbolFinder.FindSymbolAtPositionAsync(model, s.Parent.Parent.Parent.Span.Start, new AdhocWorkspace()).Result;
if (symbol.ToDisplayString().EndsWith("Convert")) { continue; }
Console.WriteLine(symbol);
Console.WriteLine($" Reported: {s.ToString()}");
i++;
}
Console.WriteLine();
Console.WriteLine($"Total: {i}");
}
With a warning that I have never used Roslyn before, and I normally code in VB.NET, I think I hacked together something that seems to do what you want (using LINQPad and lots of Dump() calls helped to find out what was going on).
void TestAttempt()
{
string source = @"
Imports System
Namespace Exceptions
Public NotInheritable Class ExampleException
Inherits Validation
Public Sub New()
X(""Ignore me 1"")
Console.WriteLine(""Report me"")
Console.WriteLine(X(""Ignore me 2""))
MyBase.New(X(""Ignore me 3, "" & _
""Because I'm already translated.""))
End Sub
End Class
End Namespace";
var tree = VisualBasicSyntaxTree.ParseText(source);
var syntaxRoot = tree.GetRoot();
int i = 0, notWrapped = 0;
foreach (var s in syntaxRoot.DescendantNodes().OfType<LiteralExpressionSyntax>())
{
// things to skip:
if (s == null) { continue; }
if (s.Kind() != SyntaxKind.StringLiteralExpression) { continue; }
if (!IsWrappedInCallToX(s))
{
Console.WriteLine($" Reported: {s.ToString()}");
notWrapped++;
}
i++;
}
Console.WriteLine();
Console.WriteLine($"Total: {i}, Not Wrapped In X: {notWrapped}");
}
bool IsWrappedInCallToX(SyntaxNode node)
{
var invocation = node as InvocationExpressionSyntax;
if (invocation != null)
{
var exp = invocation.Expression as IdentifierNameSyntax;
if (exp != null && exp.ToString() == "X")
{
return true;
}
}
if (node.Parent != null)
{
return IsWrappedInCallToX(node.Parent);
}
return false;
}
This results in:
Reported: "Report me"
Total: 5, Not Wrapped In X: 1
The IsWrappedInCallToX function just recurses up the tree looking for an InvocationExpressionSyntax for the X function. I know you said "While I could write something to crawl backwards up the tree, that didn't seem like it was the right way (and probably wouldn't perform very well)", but to me it seems like this is the right way - if the performance is horrible on your code base, maybe not!
Again, I know nothing about Roslyn (this just sounded interesting), so this is very likely a terrible solution! :-)
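For what it's worth, the same upward walk can also be written without recursion. This is only a sketch: it assumes the Microsoft.CodeAnalysis.VisualBasic package is referenced, and the class name XCallCheck is made up for the example. It uses SyntaxNode.AncestorsAndSelf(), so the cost per literal is bounded by how deeply the literal is nested, not by the size of the tree, and a literal split across lines with & _ is handled the same way as any other.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.VisualBasic;
using Microsoft.CodeAnalysis.VisualBasic.Syntax;

static class XCallCheck
{
    // True if the node, or any of its ancestors, is an invocation
    // whose target is the plain identifier X.
    public static bool IsWrappedInCallToX(SyntaxNode node)
    {
        return node.AncestorsAndSelf()
                   .OfType<InvocationExpressionSyntax>()
                   .Any(inv => inv.Expression is IdentifierNameSyntax id
                               && id.Identifier.ValueText == "X");
    }
}
```

This is purely syntactic: it matches anything named X, so if another method with the same name exists you would still need the semantic model to tell them apart.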
I'm currently attempting to validate a string for an assignment, so it's imperative that I'm not simply given the answer. If you provide an answer, please include a suitable explanation so that I can learn from it.
Suppose I have a string
(1234)-1234 ABCD
I'd like to create a loop that goes through that string and validates the positions of the "(", ")", "-" and " " characters. Besides those separators, every other position must hold the right kind of character (a digit or a letter). Finally, it must be inside a method.
CANNOT USE REGEX
TLDR;
Validate the position of characters and digits in a string, while using a loop inside of a method. I cannot use REGEX and need to do this manually.
Here's what I have so far. But I feel like the loop would be more efficient and look nicer.
public static string PhoneChecker(string phoneStr)
{
if (phoneStr[0] == '(' && phoneStr[4] == ')' && phoneStr[5] == ' ' && phoneStr[9] == '-' && phoneStr.Length == 14)
{
phoneStr = phoneStr.Remove(0, 1);
phoneStr = phoneStr.Remove(3, 1);
phoneStr = phoneStr.Remove(3, 1);
phoneStr = phoneStr.Remove(6, 1);
Console.WriteLine(phoneStr);
if (int.TryParse(phoneStr, out int phoneInt) == false)
{
Console.WriteLine("Invalid");
}
else
{
Console.WriteLine("Valid");
}
}
else
{
Console.WriteLine("Invalid");
}
return phoneStr;
}
This is still unmaintainable, but a little better... Note that your code didn't work with your example string (the indexes were off).
public static bool PhoneChecker(string phoneStr)
{
if (phoneStr.Length != 16 || phoneStr[0] != '(' || phoneStr[5] != ')' || phoneStr[6] != '-' || phoneStr[11] != ' ')
{
return false;
}
if (!uint.TryParse(phoneStr.Substring(1, 4), out uint phoneInt))
{
return false;
}
if (!uint.TryParse(phoneStr.Substring(7, 4), out phoneInt))
{
return false;
}
// No checks for phoneStr.Substring(12, 4)
return true;
}
Some differences:
The Length check comes first. Otherwise a short string would crash the program (if you try to read phoneStr[6] on a phoneStr that has a length of 3, you'll get an exception).
Instead of int.TryParse I used uint.TryParse, otherwise -500 would be acceptable.
I've split the uint.TryParse calls for the two groups of digits into two separate checks.
The method returns true or false. It is the caller's job to write the error message.
There are various schools of thought about early returns: I think that the earlier you can abort with a return false, the better. The other advantage is that all the remaining code stays at a low nesting level (your whole method was inside a big if () {, so nesting +1 compared to mine).
Technically you tagged the question as C#-4.0, but out int (an out variable declaration) is C# 7.0.
The main problem here is that stupid constraints produce stupid code. It is rare that a regex is really useful, and this is one of those rare cases. So now you have two possibilities: produce hard-coded, unmodifiable code that does exactly what was requested (like the code I wrote), or create a "library" that accepts variable patterns (like the ones used in masked edits, where you can tell the masked edit "accept only (0000)-0000 AAAA") and validates the string against the pattern... But this will be a poor man's regex, only worse, because you'll have to maintain and test it. The problem will become clear when, one month after the release, they ask you to also accept the (12345)-1234 ABCD pattern... and then the (1234)-12345 ABCD pattern... and a new pattern every two months (until, around a year and a half later, they tell you to remove the validator because the people who use the program hate it and it slows down their work).
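To make the second option concrete, here is a minimal sketch of a loop-based mask validator. The mask syntax ('0' for a digit, 'A' for a letter, anything else taken literally) and the names MaskValidator and Matches are assumptions invented for this example, not an existing API.

```csharp
using System;

static class MaskValidator
{
    // Hypothetical mask syntax: '0' = ASCII digit, 'A' = any letter,
    // every other mask character must appear literally in the input.
    public static bool Matches(string input, string mask)
    {
        // Length check first, so the indexing below can never go out of range.
        if (input == null || mask == null || input.Length != mask.Length)
        {
            return false;
        }

        for (int i = 0; i < mask.Length; i++)
        {
            char expected = mask[i];
            char actual = input[i];

            bool ok;
            if (expected == '0') ok = actual >= '0' && actual <= '9';
            else if (expected == 'A') ok = char.IsLetter(actual);
            else ok = actual == expected;

            if (!ok)
            {
                return false; // early return on the first mismatch
            }
        }
        return true;
    }
}
```

With this, MaskValidator.Matches("(1234)-1234 ABCD", "(0000)-0000 AAAA") returns true, and accepting a new format means adding a mask string instead of rewriting index checks. It is still the poor man's regex described above.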
I don't understand the use case of var patterns in C#7. MSDN:
A pattern match with the var pattern always succeeds. Its syntax is
expr is var varname
where the value of expr is always assigned to a local variable named
varname. varname is a static variable of the same type as expr.
The example on MSDN is pretty useless in my opinion, especially because the if is redundant:
object[] items = { new Book("The Tempest"), new Person("John") };
foreach (var item in items) {
if (item is var obj)
Console.WriteLine($"Type: {obj.GetType().Name}, Value: {obj}");
}
Here I don't see any benefit: you get the same result by accessing the loop variable item directly, which is also of type Object. The if is confusing as well, because it's never false.
I could use var otherItem = item, or use item directly.
Can someone explain the use case better?
The var pattern was very frequently discussed in the C# language repository given that it’s not perfectly clear what its use case is and given the fact that is var x does not perform a null check while is T x does, making it appear rather useless.
However, it is actually not meant to be used as obj is var x. It is meant to be used when the left hand side is not a variable on its own.
Here are some examples from the specification. They all use features that are not in C# yet, but this just shows that the var pattern was introduced primarily in preparation for those things, so they won't have to touch it again later.
The following example declares a function Deriv to construct the derivative of a function using structural pattern matching on an expression tree:
Expr Deriv(Expr e)
{
switch (e) {
// …
case Const(_): return Const(0);
case Add(var Left, var Right):
return Add(Deriv(Left), Deriv(Right));
// …
}
}
Here, the var pattern can be used inside the structures to “pull out” elements from the structure. Similarly, the following example simplifies an expression:
Expr Simplify(Expr e)
{
switch (e) {
case Mult(Const(0), _): return Const(0);
// …
case Add(Const(0), var x): return Simplify(x);
}
}
As gafter writes here, the idea is also to have property pattern matching, allowing the following:
if (o is Point {X is 3, Y is var y})
{ … }
Without checking the design notes on GitHub, I'd guess this was added more for consistency with switch and as a stepping stone for more advanced pattern matching cases.
From the original What’s New in C# 7.0 post :
Var patterns of the form var x (where x is an identifier), which always match, and simply put the value of the input into a fresh variable x with the same type as the input.
And the recent dissection post by Sergey Teplyakov :
if you know what exactly is going on you may find this pattern useful. It can be used for introducing a temporary variable inside the expression:
This pattern essentially creates a temporary variable using the actual type of the object.
public void VarPattern(IEnumerable<string> s)
{
if (s.FirstOrDefault(o => o != null) is var v
&& int.TryParse(v, out var n))
{
Console.WriteLine(n);
}
}
The warning right before that snippet is also significant:
It is not clear why the behavior is different in the Release mode only. But I think all the issues falls into the same bucket: the initial implementation of the feature is suboptimal. But based on this comment by Neal Gafter, this is going to change: "The pattern-matching lowering code is being rewritten from scratch (to support recursive patterns, too). I expect most of the improvements you seek here will come for "free" in the new code. But it will be some time before that rewrite is ready for prime time.".
According to Christian Nagel :
The advantage is that the variable declared with the var keyword is of the real type of the object,
Only thing I can think of offhand is if you find that you've written two identical blocks of code (in say a single switch), one for expr is object a and the other for expr is null.
You can combine the blocks by switching to expr is var a.
It may also be useful in code generation scenarios where, for whatever reason, you've already written yourself into a corner and always expect to generate a pattern match but now want to issue a "match all" pattern.
In most cases it is true that the benefit of the var pattern is not clear, and it can even be a bad idea. However, as a way of capturing anonymous types in a temp variable it works great.
Hopefully this example can illustrate this:
Note that the null case comes first: a var pattern also matches null, so without it choosen.Name would throw when nothing is found. With the null case first, choosen is never null in the var case and no extra null check is required.
var sample = new (int id, string name, int age)[] {
    (1, "jonas", 50),
    (2, "frank", 48) };
var f48 = from s in sample
          where s.age == 48
          select new { Name = s.name, Age = s.age };
switch (f48.FirstOrDefault())
{
    case null:
        WriteLine("not found");
        break;
    case var choosen when choosen.Name == "frank":
        WriteLine(choosen.Age);
        break;
}
I am trying to compare namespaces to see if my method only throws the correct exceptions. With the correct exceptions I mean the following:
Exceptions from the same namespace.
Exceptions from a higher namespace.
Exceptions from equivalent (and higher) System-namespace.
Examples:
Method is in namespace MyNamespace.Collections.Generic so it can throw exceptions from MyNamespace.Collections.Generic.
Method is in namespace MyNamespace.Collections.Generic so it can throw exceptions from MyNamespace.Collections and MyNamespace.
Method is in namespace MyNamespace.Collections.Generic so it can throw exceptions from System.Collections.Generic and System.Collections and System.
The first part is quite easy; checking for the same namespace. Also one part of number 3 worked; because System namespace is always correct.
For the other parts I tried the following:
string[] exceptNamespaceSegments = exceptionNamespaceSegments
.Except(classNamespaceSegments)
.ToArray();
if (exceptNamespaceSegments.Length == 1 && exceptNamespaceSegments.FirstOrDefault() == "System")
return;
Which checks if the namespace-segments (Collections, Generic) are also used in the class-namespace. If this is the case, the correct exception is thrown.
But this wouldn't work for a case where the exception is in namespace System.Collections.Generic.Something, because Something isn't in the class-namespace.
Come to think of it, this doesn't take into account the sequence. So System.Generic.Collections would also be correct; using what I currently have.
Is there any way I can get this done without having to write a boatload of if-statements comparing each individual segment?
Unless I'm misunderstanding the question: You may try something like this to find all allowed namespaces as per the criteria given.
private static IEnumerable<string> GetAllowedNamespaces(Type type)
{
const string System = "System";
string thisNamespace = type.Namespace;
HashSet<string> hashset = new HashSet<string>();
hashset.Add(thisNamespace);
hashset.Add(System);
var parts = thisNamespace.Split('.');
if (parts.Length > 1)
{
string previous = string.Empty;
foreach (var part in parts)
{
var current = string.IsNullOrEmpty(previous)
? part
: string.Format("{0}.{1}", previous, part);
previous = current;
hashset.Add(current);
}
previous = string.Empty;
foreach (var part in new[] { System }.Concat(parts.Skip(1)))
{
var current = string.IsNullOrEmpty(previous)
? part
: string.Format("{0}.{1}", previous, part);
previous = current;
hashset.Add(current);
}
}
return hashset;
}
Then you can easily check whether the exception's namespace is in this set; if not, there is a problem :)
There is a duplicated code block; you may want to refactor it into a method to follow the DRY principle.
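One possible shape for that refactoring, as a sketch only: the names NamespaceRules and Prefixes are made up here, and the method takes the namespace as a string instead of a Type to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NamespaceRules
{
    // Yields "A", "A.B", "A.B.C" for the segments A, B, C (order preserved).
    private static IEnumerable<string> Prefixes(IEnumerable<string> parts)
    {
        string current = null;
        foreach (var part in parts)
        {
            current = current == null ? part : current + "." + part;
            yield return current;
        }
    }

    public static HashSet<string> GetAllowedNamespaces(string classNamespace)
    {
        var parts = classNamespace.Split('.');

        // The class namespace and every namespace above it.
        var allowed = new HashSet<string>(Prefixes(parts));

        // The same chain with the root segment swapped for "System".
        allowed.UnionWith(Prefixes(new[] { "System" }.Concat(parts.Skip(1))));
        return allowed;
    }
}
```

For MyNamespace.Collections.Generic this gives six entries: the three MyNamespace prefixes plus System, System.Collections and System.Collections.Generic. Because the prefixes are built in order, System.Generic.Collections is not in the set, which covers the ordering concern raised in the question.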
I was wondering if there is a more aesthetic/easier to read way to write the following:
for (int i = 0; i < 100; i++)
{
// If m.GetString(i) throws an exception, continue.
// Otherwise, do stuff.
try
{
string s = m.GetString(i);
continue;
}
catch (InvalidCastException)
{
}
// do stuff with the message that you know is not a string.
}
Here's what m looks like:
msg[0] = 10
msg[1] = "string"
msg[2] = 2.224574743
// Etc.
// Assume it's different every time.
So therefore when I do m.GetString(0) in this example, it throws an exception, as msg[0] is a uint and not a string. This is what I use to get the type, since m does not contain a GetType and I cannot edit m.
m is an instance of class Message in a library that I cannot edit.
However, despite this working just fine, it feels inefficient (and certainly not reader-friendly) to intentionally create exceptions, even if it's in a try-catch, in order to get the type.
Is there a better way or am I stuck with this?
Edit: Alright, I researched the Message class some more (which I should have done to begin with, my apologies). It is an IEnumerable<object>
Now that I know that m is an IEnumerable<object>, I think this is probably your best bet:
foreach (string s in m.OfType<string>())
{
// Process s, which can't be null.
}
Nice and simple and it appears to handle all the logic that you want, i.e. it will process only the items in the sequence that are strings, and it will ignore all objects of other types.
However as Servy points out, this will not handle nulls in the list because null does not have any type at all.
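Here is a small sketch of that filtering behaviour, using a plain object array as a stand-in for the Message (the class name OfTypeDemo is just for illustration). It also shows the inverted filter, in case what you actually want is the items that are not strings, as the loop in the question suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OfTypeDemo
{
    // Keeps only elements whose runtime type is string;
    // nulls and values of any other type are skipped.
    public static List<string> StringsOnly(IEnumerable<object> items)
    {
        return items.OfType<string>().ToList();
    }

    // The opposite filter: everything that is not a string.
    // Note that null elements are kept here, since null is not a string.
    public static List<object> NonStrings(IEnumerable<object> items)
    {
        return items.Where(o => !(o is string)).ToList();
    }
}
```

For new object[] { 10u, "string", 2.224574743, null }, StringsOnly returns just the one string, and NonStrings returns the other three elements, including the null. No exceptions are thrown or caught in either case.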
[My previous answer before I knew the type of m]
I think you can take one of three approaches to this:
(1) Add a bool TryGetString(int index, out string s) method to whatever type m is in your example and then do
if (m.TryGetString(i, out s))
{
    // Process s (need to check for null!)
}
(2) Add a bool IsString(int index) method and call that before calling GetString().
if (m.IsString(i))
{
    s = m.GetString(i);
    // Process s (need to check for null!)
}
(3) Alternatively, you could expose the item via something like GetObject(int index) and then do something like Iiya suggested:
string s = m.GetObject(i) as string;
if (s != null)
// Process s
I think (1) or (3) would be best, although there might be a much better solution that we could suggest if we had more information about m.
If you want to process only the strings in a non-strongly-typed sequence of data, use the following code:
for (int i = 0; i < 100; i++)
{
    string s = m[i] as string;
    if (s != null)
    {
        // Process s
    }
}
Background.
My script encounters a StackOverflowException while recursively searching for specific text in a large string. The loop is not infinite; the problem occurs (for a specific search) between 9,000-10,000 legitimate searches -- I need it to keep going. I'm using tail-recursion (I think) and that may be part of my problem, since I gather that C# does not do this well. However, I'm not sure how to avoid using tail-recursion in my case.
Question(s). Why is the StackOverflowException occurring? Does my overall approach make sense? If the design sucks, I'd rather start there, rather than just avoiding an exception. But if the design is acceptable, what can I do about the StackOverflowException?
Code.
The class I've written searches for contacts (about 500+ from a specified list) in a large amount of text (about 6MB). The strategy I'm using is to search for the last name, then look for the first name somewhere shortly before or after the last name. I need to find each instance of each contact within the given text. The StringSearcher class has a recursive method that continues to search for contacts, returning the result whenever one is found, but keeping track of where it left off with the search.
I use this class in the following manner:
StringSearcher searcher = new StringSearcher(
File.ReadAllText(FilePath),
"lastname",
"firstname",
30
);
string searchResult = null;
while ((searchResult = searcher.NextInstance()) != null)
{
// do something with each searchResult
}
On the whole, the script seems to work. Most contacts return the results I expect. However, the problem seems to occur when the primary search string is extremely common (thousands of hits) and the secondary search string never or rarely occurs. I know it's not getting stuck because the CurrentIndex is advancing normally.
Here's the recursive method I'm talking about.
public string NextInstance()
{
// Advance this.CurrentIndex to the next location of the primary search string
this.SearchForNext();
// Look a little before and after the primary search string
this.CurrentContext = this.GetContextAtCurrentIndex();
// Primary search string found?
if (this.AnotherInstanceFound)
{
// If there is a valid secondary search string, is that found near the
// primary search string? If not, look for the next instance of the primary
// search string
if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
!this.IsSecondaryFoundInContext())
{
return this.NextInstance();
}
//
else
{
return this.CurrentContext;
}
}
// No more instances of the primary search string
else
{
return null;
}
}
The StackOverflowException occurs on this.CurrentIndex = ... in the following method:
private void SearchForNext()
{
// If we've already searched once,
// increment the current index before searching further.
if (0 != this.CurrentIndex)
{
this.CurrentIndex++;
this.NumberOfSearches++;
}
this.CurrentIndex = this.Source.IndexOf(
this.PrimarySearchString,
ValidIndex(this.CurrentIndex),
StringComparison.OrdinalIgnoreCase
);
this.AnotherInstanceFound = this.CurrentIndex >= 0;
}
I can include more code if needed. Let me know if any of those methods or variables is unclear.
*Performance is not really a concern because this will likely run at night as a scheduled task.
You have a 1MB stack. When that stack space runs out and you still need more, a StackOverflowException is thrown. This may or may not be the result of infinite recursion; the runtime has no idea. Infinite recursion is simply one effective way of using more stack space than is available (by using an infinite amount). You can be using a finite amount that just so happens to be more than is available, and you'll get the same exception.
While there are other ways to use up lots of stack space, recursion is one of the most effective. Each call adds more space based on the signature and locals of that method. Deep recursion can use a lot of stack space, so if you expect a depth of more than a few hundred levels (and even that is a lot) you should probably not use recursion. Note that any code using recursion can be rewritten iteratively, or to use an explicit Stack.
It's hard to say, as a complete implementation isn't shown, but based on what I can see you are more or less writing an iterator, but you're not using the C# constructs for one (namely IEnumerable).
My guess is that "iterator blocks" will make this algorithm easier to write, easier to write non-recursively, and more effective from the caller's side.
Here is a high level look at how you might structure this method as an iterator block:
public static IEnumerable<string> SearchString(string text
    , string firstString, string secondString, int unknown)
{
    int lastIndexFound = text.IndexOf(firstString);
    while (lastIndexFound >= 0)
    {
        if (secondStringNearFirst(text, firstString, secondString, lastIndexFound))
        {
            yield return lastIndexFound.ToString();
        }
        // Move on to the next occurrence; without this the loop never ends.
        lastIndexFound = text.IndexOf(firstString, lastIndexFound + 1);
    }
}
private static bool secondStringNearFirst(string text
, string firstString, string secondString, int lastIndexFound)
{
throw new NotImplementedException();
}
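To give a fuller picture, here is a runnable sketch of the same idea with the "nearby" check filled in. The names ContactSearch and FindInstances, the radius parameter, and the choice to yield the surrounding context are all assumptions on my part, since the real StringSearcher isn't shown.

```csharp
using System;
using System.Collections.Generic;

static class ContactSearch
{
    // Lazily yields the text around each hit of 'primary' that also has
    // 'secondary' within 'radius' characters before or after the hit.
    public static IEnumerable<string> FindInstances(
        string text, string primary, string secondary, int radius)
    {
        if (string.IsNullOrEmpty(primary)) yield break;

        int index = text.IndexOf(primary, StringComparison.OrdinalIgnoreCase);
        while (index >= 0)
        {
            int start = Math.Max(0, index - radius);
            int end = Math.Min(text.Length, index + primary.Length + radius);
            string context = text.Substring(start, end - start);

            // An empty secondary string means every primary hit counts.
            if (string.IsNullOrEmpty(secondary) ||
                context.IndexOf(secondary, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                yield return context;
            }

            // Resume just past the current hit. This is a plain loop,
            // so skipped hits do not add stack frames.
            index = text.IndexOf(primary, index + 1, StringComparison.OrdinalIgnoreCase);
        }
    }
}
```

The caller can then write foreach (string hit in ContactSearch.FindInstances(text, "lastname", "firstname", 30)) { ... } in place of the while loop around NextInstance(). Thousands of consecutive misses cost loop iterations only.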
It doesn't seem like recursion is the right solution here. Normally with recursive problems you have some state you pass to the recursive step. In this case, you really have a plain while loop. Below I put your method body in a loop and changed the recursive step to continue. See if that works...
public string NextInstance()
{
while (true)
{
// Advance this.CurrentIndex to the next location of the primary search string
this.SearchForNext();
// Look a little before and after the primary search string
this.CurrentContext = this.GetContextAtCurrentIndex();
// Primary search string found?
if (this.AnotherInstanceFound)
{
// If there is a valid secondary search string, is that found near the
// primary search string? If not, look for the next instance of the primary
// search string
if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
!this.IsSecondaryFoundInContext())
{
continue; // Start searching again...
}
//
else
{
return this.CurrentContext;
}
}
// No more instances of the primary search string
else
{
return null;
}
}
}