I'm trying to use Roslyn to replace some old, slow code in a utility that searches our source code for string literals which are not wrapped in an X() function call (things wrapped in X() will be translated).
I was able to use the Syntax Tree to get the string literals quite easily, and identified most of the places where they were wrapped in X(). What I was doing: given a LiteralExpressionSyntax object, I found that the following expression gave me the function call, which I could then match with a regular expression.
s.Parent.Parent.Parent.ToFullString()
I quickly ran into a problem when the string literal was split between two lines. At that point I realized my means of checking if it was in an X() call was poor, because I'd have to keep adding .Parent to the chain. While I could write something to crawl backwards up the tree, that didn't seem like it was the right way (and probably wouldn't perform very well).
I've been trying to find a way, given a string literal syntax node, to determine if that is an argument in a method call. I haven't been able to find a decent way of going from the Syntax Tree to the Semantic Model to find what I'm looking for. I'm not even sure if that's the right approach, or if I'm missing something obvious.
I was able to use SymbolFinder.FindSymbolAtPositionAsync to get to the symbol, but that suffers from the same issue - I can't just pass the position of the string literal argument, I have to pass the position of the (correct) parent and I'm back where I started.
I'm hoping to avoid having to loop through the syntax tree multiple times, because that slows things way down. I can parse 553 files in about 1 second, but as soon as I try to loop to account for these multi-line situations, I'm up to about 12-13 seconds.
Just in case I've lost you in this novella (sorry), here's what I'm hoping to figure out: for a string literal being passed as an argument to a method, is there an easy way to determine what that method is?
Here is some example code. I've replaced calls to my X() function with Convert.ToString to simulate the code I'm searching (the real X() requires a reference to one of our DLLs, so I switched to Convert.ToString() in order to reference only mscorlib for this example).
static void TestAttempt()
{
string source = @"
Imports System
Namespace Exceptions
Public NotInheritable Class ExampleException
Inherits Validation
Public Sub New()
Convert.ToString(""Ignore me 1"")
Console.WriteLine(""Report me"")
Console.WriteLine(Convert.ToString(""Ignore me 2""))
MyBase.New(Convert.ToString(""Ignore me 3, "" & _
""Because I'm already translated.""))
End Sub
End Class
End Namespace";
var tree = VisualBasicSyntaxTree.ParseText(source);
var syntaxRoot = tree.GetRoot();
int i = 0;
foreach (var s in syntaxRoot.DescendantNodes().OfType<LiteralExpressionSyntax>())
{
// things to skip:
if (s == null) { continue; }
if (s.Kind() != SyntaxKind.StringLiteralExpression) { continue; }
var Mscorlib = PortableExecutableReference.CreateFromFile(@"C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscorlib.dll");
var compilation = VisualBasicCompilation.Create("MyCompilation", syntaxTrees: new[] { tree }, references: new[] { Mscorlib });
var model = compilation.GetSemanticModel(tree);
var symbol = SymbolFinder.FindSymbolAtPositionAsync(model, s.Parent.Parent.Parent.Span.Start, new AdhocWorkspace()).Result;
if (symbol.ToDisplayString().EndsWith("Convert")) { continue; }
Console.WriteLine(symbol);
Console.WriteLine($" Reported: {s.ToString()}");
i++;
}
Console.WriteLine();
Console.WriteLine($"Total: {i}");
}
With a warning that I have never used Roslyn before, and I normally code in VB.NET, I think I hacked together something that seems to do what you want (using LINQPad and lots of Dump() calls helped to find out what was going on).
void TestAttempt()
{
string source = @"
Imports System
Namespace Exceptions
Public NotInheritable Class ExampleException
Inherits Validation
Public Sub New()
X(""Ignore me 1"")
Console.WriteLine(""Report me"")
Console.WriteLine(X(""Ignore me 2""))
MyBase.New(X(""Ignore me 3, "" & _
""Because I'm already translated.""))
End Sub
End Class
End Namespace";
var tree = VisualBasicSyntaxTree.ParseText(source);
var syntaxRoot = tree.GetRoot();
int i = 0, notWrapped = 0;
foreach (var s in syntaxRoot.DescendantNodes().OfType<LiteralExpressionSyntax>())
{
// things to skip:
if (s == null) { continue; }
if (s.Kind() != SyntaxKind.StringLiteralExpression) { continue; }
if (!IsWrappedInCallToX(s))
{
Console.WriteLine($" Reported: {s.ToString()}");
notWrapped++;
}
i++;
}
Console.WriteLine();
Console.WriteLine($"Total: {i}, Not Wrapped In X: {notWrapped}");
}
bool IsWrappedInCallToX(SyntaxNode node)
{
var invocation = node as InvocationExpressionSyntax;
if (invocation != null)
{
var exp = invocation.Expression as IdentifierNameSyntax;
if (exp != null && exp.ToString() == "X")
{
return true;
}
}
if (node.Parent != null)
{
return IsWrappedInCallToX(node.Parent);
}
return false;
}
This results in:
Reported: "Report me"
Total: 5, Not Wrapped In X: 1
The IsWrappedInCallToX function just recurses up the tree looking for an InvocationExpressionSyntax for the X function. I know you said "While I could write something to crawl backwards up the tree, that didn't seem like it was the right way (and probably wouldn't perform very well)", but to me it seems like this is the right way - if the performance is horrible on your code base, maybe not!
Again, I know nothing about Roslyn (this just sounded interesting), so this is very likely a terrible solution! :-)
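For what it's worth, the same upward walk can be written without explicit recursion, using the Ancestors() method that every SyntaxNode exposes. The following is only a sketch under a few assumptions: the Microsoft.CodeAnalysis.VisualBasic package is referenced, the wrapper is always called by its bare name (so X(...) matches but Foo.X(...) does not), and LiteralScanner, UnwrappedLiterals and wrapperName are made-up names for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.VisualBasic;
using Microsoft.CodeAnalysis.VisualBasic.Syntax;

public static class LiteralScanner
{
    // Returns the text of every string literal that has no enclosing call
    // to `wrapperName` anywhere above it in the syntax tree.
    public static string[] UnwrappedLiterals(string vbSource, string wrapperName)
    {
        SyntaxNode root = VisualBasicSyntaxTree.ParseText(vbSource).GetRoot();

        return root.DescendantNodes()
            .OfType<LiteralExpressionSyntax>()
            .Where(lit => lit.Kind() == SyntaxKind.StringLiteralExpression)
            // Ancestors() walks parent by parent up to the root, so it does
            // not matter how deeply the literal is nested inside the call.
            .Where(lit => !lit.Ancestors()
                .OfType<InvocationExpressionSyntax>()
                .Any(inv => inv.Expression is IdentifierNameSyntax id
                            // VB identifiers are case-insensitive.
                            && string.Equals(id.Identifier.ValueText, wrapperName,
                                             StringComparison.OrdinalIgnoreCase)))
            .Select(lit => lit.ToString())
            .ToArray();
    }
}
```

This is purely syntactic, so no compilation or semantic model is built. Each literal only visits its own chain of parents, which is usually short, so the walk is unlikely to be the bottleneck.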
I was wondering if there is a more aesthetic/easier to read way to write the following:
for (int i = 0; i < 100; i++)
{
// If m.GetString(i) throws an exception, continue.
// Otherwise, do stuff.
try
{
string s = m.GetString(i);
continue;
}
catch (InvalidCastException)
{
}
// do stuff with the message that you know is not a string.
}
Here's what m looks like:
msg[0] = 10
msg[1] = "string"
msg[2] = 2.224574743
// Etc.
// Assume it's different every time.
So therefore when I do m.GetString(0) in this example, it throws an exception, as msg[0] is a uint and not a string. This is what I use to get the type, since m does not contain a GetType and I cannot edit m.
m is an instance of class Message in a library that I cannot edit.
However, despite this working just fine, it feels inefficient (and certainly not reader-friendly) to intentionally create exceptions, even if it's in a try-catch, in order to get the type.
Is there a better way or am I stuck with this?
Edit: Alright, I researched the Message class some more (which I should have done to begin with, my apologies). It is an IEnumerable<object>
Now that I know that m is an IEnumerable<object>, I think this is probably your best bet:
foreach (string s in m.OfType<string>())
{
// Process s, which can't be null.
}
Nice and simple and it appears to handle all the logic that you want, i.e. it will process only the items in the sequence that are strings, and it will ignore all objects of other types.
However as Servy points out, this will not handle nulls in the list because null does not have any type at all.
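To make the null behaviour concrete, here is a small sketch using a plain object array as a stand-in for m (the Message class itself is not available here, so the array contents and the StringItems class are assumptions for illustration). It also shows the complementary filter, since the original loop wanted the items that are not strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringItems
{
    // Keeps only the items whose runtime type is string; nulls are dropped.
    public static List<string> StringsOnly(IEnumerable<object> items)
    {
        return items.OfType<string>().ToList();
    }

    // Keeps everything that is not a string. A null is not a string,
    // so nulls DO pass through this filter and must be handled by the caller.
    public static List<object> NonStrings(IEnumerable<object> items)
    {
        return items.Where(item => !(item is string)).ToList();
    }
}
```

Called with new object[] { 10u, "string", 2.224574743, null }, StringsOnly returns just the one string, while NonStrings returns the uint, the double and the null.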
[My previous answer before I knew the type of m]
I think you can take one of three approaches to this:
(1) Add a bool TryGetString(int index, out string) method to whatever type m is in your example and then do
if (m.TryGetString(i, out s))
// Process s (need to check for null!)
(2) Add a bool IsString(int index) method and call that before calling GetString().
if (m.IsString(i))
{
s = m.GetString(i);
// Process s (need to check for null!)
}
(3) Alternatively, you could expose the item via something like GetObject(int index) and then do something like Iiya suggested:
string s = m.GetObject(i) as string;
if (s != null)
// Process s
I think (1) or (3) would be best, although there might be a much better solution that we could suggest if we had more information about m.
If you want to process only the strings in a non-strongly-typed sequence of data, use the following code:
for (int i = 0; i < 100; i++)
{
string s = m[i] as string;
if(s != null)
{
}
}
Background.
My script encounters a StackOverflowException while recursively searching for specific text in a large string. The loop is not infinite; the problem occurs (for a specific search) between 9,000-10,000 legitimate searches -- I need it to keep going. I'm using tail-recursion (I think) and that may be part of my problem, since I gather that C# does not do this well. However, I'm not sure how to avoid using tail-recursion in my case.
Question(s). Why is the StackOverflowException occurring? Does my overall approach make sense? If the design sucks, I'd rather start there, rather than just avoiding an exception. But if the design is acceptable, what can I do about the StackOverflowException?
Code.
The class I've written searches for contacts (about 500+ from a specified list) in a large amount of text (about 6MB). The strategy I'm using is to search for the last name, then look for the first name somewhere shortly before or after the last name. I need to find each instance of each contact within the given text. The StringSearcher class has a recursive method that continues to search for contacts, returning the result whenever one is found, but keeping track of where it left off with the search.
I use this class in the following manner:
StringSearcher searcher = new StringSearcher(
File.ReadAllText(FilePath),
"lastname",
"firstname",
30
);
string searchResult = null;
while ((searchResult = searcher.NextInstance()) != null)
{
// do something with each searchResult
}
On the whole, the script seems to work. Most contacts return the results I expect. However, the problem seems to occur when the primary search string is extremely common (thousands of hits), and the secondary search string never or rarely occurs. I know it's not getting stuck because the CurrentIndex is advancing normally.
Here's the recursive method I'm talking about.
public string NextInstance()
{
// Advance this.CurrentIndex to the next location of the primary search string
this.SearchForNext();
// Look a little before and after the primary search string
this.CurrentContext = this.GetContextAtCurrentIndex();
// Primary search string found?
if (this.AnotherInstanceFound)
{
// If there is a valid secondary search string, is that found near the
// primary search string? If not, look for the next instance of the primary
// search string
if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
!this.IsSecondaryFoundInContext())
{
return this.NextInstance();
}
//
else
{
return this.CurrentContext;
}
}
// No more instances of the primary search string
else
{
return null;
}
}
The StackOverflowException occurs on this.CurrentIndex = ... in the following method:
private void SearchForNext()
{
// If we've already searched once,
// increment the current index before searching further.
if (0 != this.CurrentIndex)
{
this.CurrentIndex++;
this.NumberOfSearches++;
}
this.CurrentIndex = this.Source.IndexOf(
this.PrimarySearchString,
ValidIndex(this.CurrentIndex),
StringComparison.OrdinalIgnoreCase
);
this.AnotherInstanceFound = !(this.CurrentIndex >= 0) ? false : true;
}
I can include more code if needed. Let me know if one of those methods or variables are questionable.
*Performance is not really a concern because this will likely run at night as a scheduled task.
You have a 1MB stack. When that stack space runs out and you still need more, a StackOverflowException is thrown. This may or may not be the result of infinite recursion; the runtime has no idea. Infinite recursion is simply one effective way of using more stack space than is available (by using an infinite amount). You can use a finite amount that just so happens to be more than is available and you'll get the same exception.
While there are other ways to use up lots of stack space, recursion is one of the most effective. Each method call adds more space based on the signature and locals of that method. Deep recursion can use a lot of stack space, so if you expect a depth of more than a few hundred levels (and even that is a lot) you should probably not use recursion. Note that any code using recursion can be written iteratively, or to use an explicit Stack.
It's hard to say, as a complete implementation isn't shown, but based on what I can see you are more or less writing an iterator, but you're not using the C# constructs for one (namely IEnumerable).
My guess is that "iterator blocks" will make this algorithm easier to write, easier to write non-recursively, and more effective from the caller's side.
Here is a high level look at how you might structure this method as an iterator block:
public static IEnumerable<string> SearchString(string text
, string firstString, string secondString, int unknown)
{
int lastIndexFound = text.IndexOf(firstString);
while (lastIndexFound >= 0)
{
if (secondStringNearFirst(text, firstString, secondString, lastIndexFound))
{
yield return lastIndexFound.ToString();
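}
// Advance past the current hit; without this the loop would never terminate.
lastIndexFound = text.IndexOf(firstString, lastIndexFound + 1);
}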
}
private static bool secondStringNearFirst(string text
, string firstString, string secondString, int lastIndexFound)
{
throw new NotImplementedException();
}
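A more filled-in version of the same idea might look like the sketch below. It is not the asker's StringSearcher: the names ContactSearch, FindNear and radius are invented here, the "context" is assumed to be a fixed number of characters on either side of the hit, and primary is assumed to be non-empty.

```csharp
using System;
using System.Collections.Generic;

public static class ContactSearch
{
    // Lazily yields the text around each occurrence of `primary` that has
    // `secondary` within `radius` characters before or after it.
    // If `secondary` is null or empty, every occurrence is yielded.
    public static IEnumerable<string> FindNear(
        string text, string primary, string secondary, int radius)
    {
        int index = text.IndexOf(primary, StringComparison.OrdinalIgnoreCase);
        while (index >= 0)
        {
            // Clamp the context window to the bounds of the text.
            int start = Math.Max(0, index - radius);
            int end = Math.Min(text.Length, index + primary.Length + radius);
            string context = text.Substring(start, end - start);

            if (string.IsNullOrEmpty(secondary) ||
                context.IndexOf(secondary, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                yield return context;
            }

            // The loop takes the place of the recursive call, so the stack
            // depth stays constant no matter how many hits are rejected.
            index = text.IndexOf(primary, index + 1, StringComparison.OrdinalIgnoreCase);
        }
    }
}
```

The caller would then write foreach (string hit in ContactSearch.FindNear(text, "lastname", "firstname", 30)) instead of polling NextInstance() for null.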
It doesn't seem like recursion is the right solution here. Normally with recursive problems you have some state you pass to the recursive step. In this case, you really have a plain while loop. Below I put your method body in a loop and changed the recursive step to continue. See if that works...
public string NextInstance()
{
while (true)
{
// Advance this.CurrentIndex to the next location of the primary search string
this.SearchForNext();
// Look a little before and after the primary search string
this.CurrentContext = this.GetContextAtCurrentIndex();
// Primary search string found?
if (this.AnotherInstanceFound)
{
// If there is a valid secondary search string, is that found near the
// primary search string? If not, look for the next instance of the primary
// search string
if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
!this.IsSecondaryFoundInContext())
{
continue; // Start searching again...
}
//
else
{
return this.CurrentContext;
}
}
// No more instances of the primary search string
else
{
return null;
}
}
}
A common problem in any language is to assert that parameters sent in to a method meet your requirements, and if they don't, to send nice, informative error messages. This kind of code gets repeated over and over, and we often try to create helpers for it. However, in C#, it seems those helpers have to deal with some duplication forced upon us by the language and compiler. To show what I mean, let me present some raw code with no helpers, followed by one possible helper. Then, I'll point out the duplication in the helper and phrase my question precisely.
First, the code without any helpers:
public void SomeMethod(string firstName, string lastName, int age)
{
if(firstName == null)
{
throw new WhateverException("The value for firstName cannot be null.");
}
if(lastName == null)
{
throw new WhateverException("The value for lastName cannot be null.");
}
// Same kind of code for age, making sure it is a reasonable range (< 150, for example).
// You get the idea
}
Now, the code with a reasonable attempt at a helper:
public void SomeMethod(string firstName, string lastName, int age)
{
Helper.Validate( x=> x !=null, "firstName", firstName);
Helper.Validate( x=> x!= null, "lastName", lastName);
}
The main question is this: Notice how the code has to pass the value of the parameter and the name of the parameter ("firstName" and firstName). This is so the error message can say, "Blah blah blah the value for the firstName parameter." Have you found any way to get around this using reflection or anything else? Or a way to make it less painful?
And more generally, have you found any other ways to streamline this task of validating parameters while reducing code duplication?
EDIT: I've read people talking about making use of the Parameters property, but never quite found a way around the duplication. Anyone have luck with that?
Thanks!
You should check out Code Contracts; they do pretty much exactly what you're asking. Example:
[Pure]
public static double GetDistance(Point p1, Point p2)
{
CodeContract.RequiresAlways(p1 != null);
CodeContract.RequiresAlways(p2 != null);
// ...
}
Wow, I found something really interesting here. Chris above gave a link to another Stack Overflow question. One of the answers there pointed to a blog post which describes how to get code like this:
public static void Copy<T>(T[] dst, long dstOffset, T[] src, long srcOffset, long length)
{
Validate.Begin()
.IsNotNull(dst, "dst")
.IsNotNull(src, "src")
.Check()
.IsPositive(length)
.IsIndexInRange(dst, dstOffset, "dstOffset")
.IsIndexInRange(dst, dstOffset + length, "dstOffset + length")
.IsIndexInRange(src, srcOffset, "srcOffset")
.IsIndexInRange(src, srcOffset + length, "srcOffset + length")
.Check();
// The offsets are long, so the loop index must be long as well.
for (long di = dstOffset; di < dstOffset + length; ++di)
dst[di] = src[di - dstOffset + srcOffset];
}
I'm not convinced it is the best answer yet, but it certainly is interesting. Here's the blog post, from Rick Brewster.
This may be somewhat helpful:
Design by contract/C# 4.0/avoiding ArgumentNullException
I tackled this exact problem a few weeks ago, after thinking that it is strange how testing libraries seem to need a million different versions of Assert to make their messages descriptive.
Here's my solution.
Brief summary - given this bit of code:
int x = 3;
string t = "hi";
Assert(() => 5*x + (2 / t.Length) < 99);
My Assert function can print out the following summary of what is passed to it:
(((5 * x) + (2 / t.Length)) < 99) == True where
{
((5 * x) + (2 / t.Length)) == 16 where
{
(5 * x) == 15 where
{
x == 3
}
(2 / t.Length) == 1 where
{
t.Length == 2 where
{
t == "hi"
}
}
}
}
So all the identifier names and values, and the structure of the expression, could be included in the exception message, without you having to restate them in quoted strings.
Alright guys, it's me again, and I found something else that is astonishing and delightful. It is yet another blog post referred to from the other SO question that Chris, above, mentioned.
This guy's approach lets you write this:
public class WebServer
{
public void BootstrapServer( int port, string rootDirectory, string serverName )
{
Guard.IsNotNull( () => rootDirectory );
Guard.IsNotNull( () => serverName );
// Bootstrap the server
}
}
Note that there is no string containing "rootDirectory" and no string containing "serverName"!! And yet his error messages can say something like "The rootDirectory parameter must not be null."
This is exactly what I wanted and more than I hoped for. Here's the link to the guy's blog post.
And the implementation is pretty simple, as follows:
public static class Guard
{
public static void IsNotNull<T>(Expression<Func<T>> expr)
{
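// Return if the expression's value differs from default(T). The static
// object.Equals is used so that a null value does not throw here.
if (!object.Equals(expr.Compile()(), default(T)))
return;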
var param = (MemberExpression) expr.Body;
throw new ArgumentNullException(param.Member.Name);
}
}
Note that this makes use of "static reflection", so in a tight loop or something, you might want to use Rick Brewster's approach above.
As soon as I post this I'm gonna vote up Chris, and the response to the other SO question. This is some good stuff!!!
Using my library The Helper Trinity:
public void SomeMethod(string firstName, string lastName, int age)
{
firstName.AssertNotNull("firstName");
lastName.AssertNotNull("lastName");
...
}
Also supports asserting that enumeration parameters are correct, collections and their contents are non-null, string parameters are non-empty etcetera. See the user documentation here for detailed examples.
Here's my answer to the problem. I call it "Guard Claws". It uses the IL parser from the Lokad Shared Libs but has a more straightforward approach to stating the actual guard clauses:
string test = null;
Claws.NotNull(() => test);
You can see more examples of its usage in the specs.
Since it uses real lambdas as input and uses the IL Parser only to generate the exception in the case of a violation it should perform better on the "happy path" than the Expression based designs elsewhere in these answers.
The links are not working, here is the URL:
http://github.com/littlebits/guard_claws/
The Lokad Shared Libraries also have an IL parsing based implementation of this which avoids having to duplicate the parameter name in a string.
For example:
Enforce.Arguments(() => controller, () => viewManager,() => workspace);
Will throw an exception with the appropriate parameter name if any of the listed arguments is null. It also has a really neat policy based rules implementation.
e.g.
Enforce.Argument(() => username, StringIs.Limited(3, 64), StringIs.ValidEmail);
My preference would be to just evaluate the condition and pass the result rather than passing an expression to be evaluated and the parameter on which to evaluate it. Also, I prefer to have the ability to customize the entire message. Note that these are simply preferences -- I'm not saying that your sample is wrong -- but there are some cases where this is very useful.
Helper.Validate( firstName != null || !string.IsNullOrEmpty(directoryID),
"The value for firstName cannot be null if a directory ID is not supplied." );
Don't know if this technique transfers from C/C++ to C#, but I've done this with macros:
#define CHECK_NULL(x) { (x) != NULL || \
fprintf(stderr, "The value of %s in %s, line %d is null.\n", \
#x, __FILE__, __LINE__); }
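C# has no preprocessor stringizing, but newer compilers offer something close. The sketch below assumes C# 10 and .NET 6 or later, both of which postdate most of the answers here; the Guard class and its NotNull method are invented names. The [CallerArgumentExpression] attribute asks the compiler to pass the source text of the argument, much as #x does in the macro.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class Guard
{
    // `expression` is filled in by the compiler with the source text of the
    // `value` argument at the call site, so callers never pass it themselves.
    public static void NotNull(
        object value,
        [CallerArgumentExpression("value")] string expression = null)
    {
        if (value == null)
        {
            throw new ArgumentNullException(expression);
        }
    }
}
```

With this in place, Guard.NotNull(firstName) throws an ArgumentNullException whose ParamName is "firstName", with no quoted string and no expression tree compiled at run time. On C# 6 through 9, nameof(firstName) at least keeps the duplicated name refactoring-safe.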
In this case, rather than use your own exception type or really general types like ApplicationException, I think it is best to use the built-in exception types that are specifically intended for this use:
Among those: System.ArgumentException, System.ArgumentNullException, and so on.
Postsharp or some other AOP framework.
It does not apply everywhere, but it might help in many cases:
I suppose that "SomeMethod" is carrying out some behavioral operation on the data "last name", "first name" and "age". Evaluate your current code design. If the three pieces of data are crying for a class, put them into a class. In that class you can also put your checks. This would free "SomeMethod" from input checking.
The end result would be something like this:
public void SomeMethod(Person person)
{
person.CheckInvariants();
// code here ...
}
The call would be something like this (if you use .NET 3.5):
SomeMethod(new Person { FirstName = "Joe", LastName = "White", Age = 12 });
under the assumption that the class would look like this:
public class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
public int Age { get; set; }
public void CheckInvariants()
{
assertNotNull(FirstName, "first name");
assertNotNull(LastName, "last name");
}
// here are your checks ...
private void assertNotNull(string input, string hint)
{
if (input == null)
{
string message = string.Format("The given {0} is null.", hint);
throw new ApplicationException(message);
}
}
}
Instead of the syntactic sugar of .NET 3.5 you can also use constructor arguments to create a Person object.
Just as a contrast, this post by Miško Hevery on the Google Testing Blog argues that this kind of parameter checking might not always be a good thing. The resulting debate in the comments also raises some interesting points.
As part of the base class for some extensive unit testing, I am writing a helper function which recursively compares the nodes of one XmlDocument object to another in C# (.NET). Some requirements of this:
The first document is the source, e.g. what I want the XML document to look like. Thus the second is the one I want to find differences in and it must not contain extra nodes not in the first document.
Must throw an exception when too many significant differences are found, and it should be easily understood by a human glancing at the description.
Child element order is important, attributes can be in any order.
Some attributes are ignorable; specifically xsi:schemaLocation and xmlns:xsi, though I would like to be able to pass in which ones are.
Prefixes for namespaces must match in both attributes and elements.
Whitespace between elements is irrelevant.
Elements will either have child elements or InnerText, but not both.
While I'm scraping something together: has anyone written such code, and would it be possible to share it here?
As an aside, what would you call the first and second documents? I've been referring to them as "source" and "target", but it feels wrong since the source is what I want the target to look like, else I throw an exception.
Microsoft has an XML diff API that you can use.
Unofficial NuGet: https://www.nuget.org/packages/XMLDiffPatch.
I googled up a more complete list of solutions of this problem today, I am going to try one of them soon:
http://xmlunit.sourceforge.net/
http://msdn.microsoft.com/en-us/library/aa302294.aspx
http://jolt.codeplex.com/wikipage?title=Jolt.Testing.Assertions.XML.Adaptors
http://www.codethinked.com/checking-xml-for-semantic-equivalence-in-c
https://vkreynin.wordpress.com/tag/xml/
http://gandrusz.blogspot.com/2008/07/recently-i-have-run-into-usual-problem.html
http://xmlspecificationcompare.codeplex.com/
https://github.com/netbike/netbike.xmlunit
This code doesn't satisfy all your requirements, but it's simple and I'm using it for my unit tests. Attribute order doesn't matter, but element order does. Element inner text is not compared. I also ignored case when comparing attributes, but you can easily remove that.
public bool XMLCompare(XElement primary, XElement secondary)
{
if (primary.HasAttributes) {
if (primary.Attributes().Count() != secondary.Attributes().Count())
return false;
foreach (XAttribute attr in primary.Attributes()) {
if (secondary.Attribute(attr.Name.LocalName) == null)
return false;
if (attr.Value.ToLower() != secondary.Attribute(attr.Name.LocalName).Value.ToLower())
return false;
}
}
if (primary.HasElements) {
if (primary.Elements().Count() != secondary.Elements().Count())
return false;
for (var i = 0; i <= primary.Elements().Count() - 1; i++) {
if (XMLCompare(primary.Elements().Skip(i).Take(1).Single(), secondary.Elements().Skip(i).Take(1).Single()) == false)
return false;
}
}
return true;
}
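A variant closer to the question's requirements could throw a descriptive exception instead of returning false, compare leaf text, and accept a set of attribute names to ignore. This is only a sketch: XmlAssert, AreEquivalent and the ignored parameter are invented names, it stops at the first difference rather than collecting several, and because it skips all namespace declarations and compares expanded names, it does not check that prefixes match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlAssert
{
    // Throws InvalidOperationException describing the first difference found.
    public static void AreEquivalent(
        XElement expected, XElement actual, ISet<XName> ignored, string path = "")
    {
        path += "/" + expected.Name.LocalName;
        if (expected.Name != actual.Name)
            throw new InvalidOperationException(
                path + ": expected <" + expected.Name + ">, found <" + actual.Name + ">");

        // Attributes may appear in any order, so compare them by name.
        Dictionary<XName, string> exp = Attributes(expected, ignored);
        Dictionary<XName, string> act = Attributes(actual, ignored);
        foreach (XName name in exp.Keys.Union(act.Keys).ToList())
        {
            exp.TryGetValue(name, out string e);   // null when missing
            act.TryGetValue(name, out string a);
            if (e != a)
                throw new InvalidOperationException(
                    path + "/@" + name + ": expected '" + (e ?? "(missing)") +
                    "', found '" + (a ?? "(missing)") + "'");
        }

        // Child elements must match pairwise, in document order.
        List<XElement> expKids = expected.Elements().ToList();
        List<XElement> actKids = actual.Elements().ToList();
        if (expKids.Count != actKids.Count)
            throw new InvalidOperationException(
                path + ": expected " + expKids.Count + " child elements, found " + actKids.Count);

        // Leaf elements carry text instead of children.
        if (expKids.Count == 0 && expected.Value != actual.Value)
            throw new InvalidOperationException(
                path + ": expected text '" + expected.Value + "', found '" + actual.Value + "'");

        for (int i = 0; i < expKids.Count; i++)
            AreEquivalent(expKids[i], actKids[i], ignored, path);
    }

    private static Dictionary<XName, string> Attributes(XElement element, ISet<XName> ignored)
    {
        return element.Attributes()
            .Where(attr => !attr.IsNamespaceDeclaration && !ignored.Contains(attr.Name))
            .ToDictionary(attr => attr.Name, attr => attr.Value);
    }
}
```

To ignore xsi:schemaLocation, the set would contain XNamespace.Get("http://www.w3.org/2001/XMLSchema-instance") + "schemaLocation"; xmlns:xsi is already skipped as a namespace declaration.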
Try XMLUnit. This library is available for both Java and .NET.
For comparing two XML outputs in automated testing I found XNode.DeepEquals.
Compares the values of two nodes, including the values of all descendant nodes.
Usage:
var xDoc1 = XDocument.Parse(xmlString1);
var xDoc2 = XDocument.Parse(xmlString2);
bool isSame = XNode.DeepEquals(xDoc1.Document, xDoc2.Document);
//Assert.IsTrue(isSame);
Reference: https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xnode.deepequals?view=netcore-2.2
Comparing XML documents is complicated. Google for xmldiff (there's even a Microsoft solution) for some tools. I've solved this a couple of ways. I used XSLT to sort elements and attributes (because sometimes they would appear in a different order, and I didn't care about that) and to filter out attributes I didn't want to compare. Then I either used the XML::Diff or XML::SemanticDiff Perl module, or pretty-printed each document with every element and attribute on a separate line and ran the Unix command-line diff on the results.
https://github.com/CameronWills/FatAntelope
This is an alternative library to the Microsoft XML Diff API. It has an XML diffing algorithm that does an unordered comparison of two XML documents and produces an optimal matching.
It is a C# port of the X-Diff algorithm described here:
http://pages.cs.wisc.edu/~yuanwang/xdiff.html
Disclaimer: I wrote it :)
Another way to do this would be -
Get the contents of both files into two different strings.
Transform the strings using an XSLT (which will just copy everything over to two new strings). This will ensure that all spaces outside the elements are removed. This will result in two new strings.
Now, just compare the two strings with each other.
This won't give you the exact location of the difference, but if you just want to know if there is a difference, this is easy to do without any third party libraries.
I am using ExamXML for comparing XML files. You can try it.
The authors, A7Soft, also provide an API for comparing XML files.
Not relevant for the OP since it currently ignores child order, but if you want a code only solution you can try XmlSpecificationCompare which I somewhat misguidedly developed.
All the above answers are helpful, but I tried XMLUnit, which looks like an easy-to-use NuGet package for checking the differences between two XML files. Here is some C# sample code:
public static bool CheckXMLDifference(string xmlInput, string xmlOutput)
{
Diff myDiff = DiffBuilder.Compare(Input.FromString(xmlInput))
.WithTest(Input.FromString(xmlOutput))
.CheckForSimilar().CheckForIdentical()
.IgnoreComments()
.IgnoreWhitespace().NormalizeWhitespace().Build();
if(myDiff.Differences.Count() == 0)
{
// when there is no difference
// files are identical, return true;
return true;
}
else
{
//return false when there is 1 or more difference in file
return false;
}
}
If anyone wants to test it, I have also created an online tool using it; you can take a look here:
https://www.minify-beautify.com/online-xml-difference
Based on @Two Cents' answer and using this link, XMLSorting, I have created my own XmlComparer.
Compare XML program
private static bool compareXML(XmlNode node, XmlNode comparenode)
{
if (node.Value != comparenode.Value)
return false;
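// Attributes is null for non-element nodes such as text, so guard against it.
if (node.Attributes != null && node.Attributes.Count > 0)
{
foreach (XmlAttribute parentnodeattribute in node.Attributes)
{
string parentattributename = parentnodeattribute.Name;
string parentattributevalue = parentnodeattribute.Value;
// The indexer returns null when the other node lacks this attribute.
XmlAttribute compareattribute = comparenode.Attributes == null ? null : comparenode.Attributes[parentattributename];
if (compareattribute == null || parentattributevalue != compareattribute.Value)
{
return false;
}
}
}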
}
if(node.HasChildNodes)
{
sortXML(comparenode);
if (node.ChildNodes.Count != comparenode.ChildNodes.Count)
return false;
for(int i=0; i<node.ChildNodes.Count;i++)
{
string name = node.ChildNodes[i].LocalName;
if (compareXML(node.ChildNodes[i], comparenode.ChildNodes[i]) == false)
return false;
}
}
return true;
}
Sort XML program
private static void sortXML(XmlNode documentElement)
{
int i = 1;
SortAttributes(documentElement.Attributes);
SortElements(documentElement);
foreach (XmlNode childNode in documentElement.ChildNodes)
{
sortXML(childNode);
}
}
private static void SortElements(XmlNode rootNode)
{
for(int j = 0; j < rootNode.ChildNodes.Count; j++) {
for (int i = 1; i < rootNode.ChildNodes.Count; i++)
{
if (String.Compare(rootNode.ChildNodes[i].Name, rootNode.ChildNodes[i - 1].Name) < 0)
{
rootNode.InsertBefore(rootNode.ChildNodes[i], rootNode.ChildNodes[i - 1]);
}
}
}
// Console.WriteLine(j++);
}
private static void SortAttributes(XmlAttributeCollection attribCol)
{
if (attribCol == null)
return;
bool changed = true;
while (changed)
{
changed = false;
for (int i = 1; i < attribCol.Count; i++)
{
if (String.Compare(attribCol[i].Name, attribCol[i - 1].Name) < 0)
{
//Replace
attribCol.InsertBefore(attribCol[i], attribCol[i - 1]);
changed = true;
}
}
}
}
I solved this problem of xml comparison using XSLT 1.0 which can be used for comparing large xml files using an unordered tree comparison algorithm.
https://github.com/sflynn1812/xslt-diff-turbo