Converting Directed Acyclic Graph (DAG) to tree - c#

I'm trying to implement an algorithm to convert a Directed Acyclic Graph to a Tree (for fun, learning, kata, name it). So I came up with the data structure Node:
/// <summary>
/// Represents a node in a DAG or tree
/// </summary>
/// <typeparam name="T">Value of the node</typeparam>
public class Node<T>
{
/// <summary>
/// Creates a node with no child nodes
/// </summary>
/// <param name="value">Value of the node</param>
public Node(T value)
{
Value = value;
ChildNodes = new List<Node<T>>();
}
/// <summary>
/// Creates a node with the given value and copies the collection of child nodes
/// </summary>
/// <param name="value">value of the node</param>
/// <param name="childNodes">collection of child nodes</param>
public Node(T value, IEnumerable<Node<T>> childNodes)
{
if (childNodes == null)
{
throw new ArgumentNullException("childNodes");
}
ChildNodes = new List<Node<T>>(childNodes);
Value = value;
}
/// <summary>
/// Determines if the node has any child node
/// </summary>
/// <returns>true if has any</returns>
public bool HasChildNodes
{
get { return this.ChildNodes.Count != 0; }
}
/// <summary>
/// Traverses the graph recursively
/// </summary>
/// <param name="root">root node</param>
/// <param name="visitor">visitor for each node</param>
public void Traverse(Node<T> root, Action<Node<T>> visitor)
{
if (root == null)
{
throw new ArgumentNullException("root");
}
if (visitor == null)
{
throw new ArgumentNullException("visitor");
}
visitor(root);
foreach (var node in root.ChildNodes)
{
Traverse(node, visitor);
}
}
/// <summary>
/// Value of the node
/// </summary>
public T Value { get; private set; }
/// <summary>
/// List of all child nodes
/// </summary>
public List<Node<T>> ChildNodes { get; private set; }
}
It's pretty straightforward. Methods:
/// <summary>
/// Helper class for Node
/// </summary>
/// <typeparam name="T">Value of a node</typeparam>
public static class NodeHelper
{
/// <summary>
/// Converts Directed Acyclic Graph to Tree data structure using recursion.
/// </summary>
/// <param name="root">root of DAG</param>
/// <param name="seenNodes">keep track of child elements to find multiple connections (f.e. A connects with B and C and B also connects with C)</param>
/// <returns>root node of the tree</returns>
public static Node<T> DAG2TreeRec<T>(this Node<T> root, HashSet<Node<T>> seenNodes)
{
if (root == null)
{
throw new ArgumentNullException("root");
}
if (seenNodes == null)
{
throw new ArgumentNullException("seenNodes");
}
var length = root.ChildNodes.Count;
for (int i = 0; i < length; ++i)
{
var node = root.ChildNodes[i];
if (seenNodes.Contains(node))
{
var nodeClone = new Node<T>(node.Value, node.ChildNodes);
node = nodeClone;
}
else
{
seenNodes.Add(node);
}
DAG2TreeRec(node, seenNodes);
}
return root;
}
/// <summary>
/// Converts Directed Acyclic Graph to Tree data structure using an explicit stack.
/// </summary>
/// <param name="root">root of DAG</param>
/// <param name="seenNodes">keep track of child elements to find multiple connections (f.e. A connects with B and C and B also connects with C)</param>
/// <returns>root node of the tree</returns>
public static Node<T> DAG2Tree<T>(this Node<T> root, HashSet<Node<T>> seenNodes)
{
if (root == null)
{
throw new ArgumentNullException("root");
}
if (seenNodes == null)
{
throw new ArgumentNullException("seenNodes");
}
var stack = new Stack<Node<T>>();
stack.Push(root);
while (stack.Count > 0)
{
var tempNode = stack.Pop();
var length = tempNode.ChildNodes.Count;
for (int i = 0; i < length; ++i)
{
var node = tempNode.ChildNodes[i];
if (seenNodes.Contains(node))
{
var nodeClone = new Node<T>(node.Value, node.ChildNodes);
node = nodeClone;
}
else
{
seenNodes.Add(node);
}
stack.Push(node);
}
}
return root;
}
}
and test:
static void Main(string[] args)
{
// Jitter preheat
Dag2TreeTest();
Dag2TreeRecTest();
Console.WriteLine("Running time ");
Dag2TreeTest();
Dag2TreeRecTest();
Console.ReadKey();
}
public static void Dag2TreeTest()
{
HashSet<Node<int>> hashSet = new HashSet<Node<int>>();
Node<int> root = BulidDummyDAG();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
var treeNode = root.DAG2Tree<int>(hashSet);
stopwatch.Stop();
Console.WriteLine(string.Format("Dag 2 Tree = {0}ms",stopwatch.ElapsedMilliseconds));
}
private static Node<int> BulidDummyDAG()
{
Node<int> node2 = new Node<int>(2);
Node<int> node4 = new Node<int>(4);
Node<int> node3 = new Node<int>(3);
Node<int> node5 = new Node<int>(5);
Node<int> node6 = new Node<int>(6);
Node<int> node7 = new Node<int>(7);
Node<int> node8 = new Node<int>(8);
Node<int> node9 = new Node<int>(9);
Node<int> node10 = new Node<int>(10);
Node<int> root = new Node<int>(1);
//making DAG
root.ChildNodes.Add(node2);
root.ChildNodes.Add(node3);
node3.ChildNodes.Add(node2);
node3.ChildNodes.Add(node4);
root.ChildNodes.Add(node5);
node4.ChildNodes.Add(node6);
node4.ChildNodes.Add(node7);
node5.ChildNodes.Add(node8);
node2.ChildNodes.Add(node9);
node9.ChildNodes.Add(node8);
node9.ChildNodes.Add(node10);
var length = 10000;
Node<int> tempRoot = node10;
for (int i = 0; i < length; i++)
{
var nextChildNode = new Node<int>(11 + i);
tempRoot.ChildNodes.Add(nextChildNode);
tempRoot = nextChildNode;
}
return root;
}
public static void Dag2TreeRecTest()
{
HashSet<Node<int>> hashSet = new HashSet<Node<int>>();
Node<int> root = BulidDummyDAG();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
var treeNode = root.DAG2TreeRec<int>(hashSet);
stopwatch.Stop();
Console.WriteLine(string.Format("Dag 2 Tree Rec = {0}ms",stopwatch.ElapsedMilliseconds));
}
What is more, the data structure needs some improvement:
Overriding GetHashCode, ToString, Equals, == operator
Implementing IComparable
LinkedList is probably a better choice
Also, before the conversion there are certain things that need to be checked:
Multigraphs
Whether it's a DAG (cycles)
Diamonds in the DAG
Multiple roots in the DAG
All in all, it narrows down to a few questions:
How can I improve the conversion? Since this is recursion, it's possible to blow up the stack. I can add a stack to keep track of the nodes instead. If I use continuation-passing style, will it be more efficient?
I feel that an immutable data structure would be better in this case. Is that correct?
Is Childs the right name? :)

Algorithm:
As you observed, some nodes appear twice in the output. If the node 2 had children, the whole subtree would appear twice. If you want each node to appear just once, replace
if (hashSet.Contains(node))
{
var nodeClone = new Node<T>(node.Value, node.Childs);
node = nodeClone;
}
with
if (hashSet.Contains(node))
{
// node already seen -> do nothing
}
I wouldn't be too worried about the size of the stack or performance of recursion. However, you could replace your Depth-first-search with Breadth-first-search which would result in nodes closer to the root being visited earlier, thus yielding a more "natural" tree (in your picture you already numbered the nodes in BFS order).
var seenNodes = new HashSet<Node>();
var q = new Queue<Node>();
q.Enqueue(root);
seenNodes.Add(root);
while (q.Count > 0) {
var node = q.Dequeue();
foreach (var child in node.Childs) {
if (!seenNodes.Contains(child)) {
seenNodes.Add(child);
q.Enqueue(child);
}
}
}
The algorithm handles diamonds and cycles.
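The loop above only visits nodes. As a rough sketch of how the same breadth-first walk could build a new tree in which every node appears exactly once, the following assumes a minimal stand-in for the question's Node<T>; the names TreeBuilder and ToSpanningTree are made up for this illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the question's Node<T> (assumption for this sketch).
public class Node<T>
{
    public Node(T value) { Value = value; ChildNodes = new List<Node<T>>(); }
    public T Value { get; private set; }
    public List<Node<T>> ChildNodes { get; private set; }
}

public static class TreeBuilder
{
    // Returns a new tree; the input graph is left untouched.
    public static Node<T> ToSpanningTree<T>(Node<T> root)
    {
        if (root == null) throw new ArgumentNullException("root");

        // Maps each original node to its copy; doubles as the "seen" set.
        // Node<T> does not override Equals, so keys are compared by reference.
        var copies = new Dictionary<Node<T>, Node<T>>();
        var queue = new Queue<Node<T>>();

        copies[root] = new Node<T>(root.Value);
        queue.Enqueue(root);

        while (queue.Count > 0)
        {
            var original = queue.Dequeue();
            var copy = copies[original];
            foreach (var child in original.ChildNodes)
            {
                if (copies.ContainsKey(child)) continue; // already attached elsewhere
                var childCopy = new Node<T>(child.Value);
                copies[child] = childCopy;
                copy.ChildNodes.Add(childCopy);
                queue.Enqueue(child);
            }
        }
        return copies[root];
    }
}
```

Because each node is attached to the first parent that reaches it, a shared node ends up under the parent closest to the root, which matches the "more natural tree" argument for BFS.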
Multiple roots
Just declare a class Graph which will contain all the vertices
class Graph
{
public List<Node> Nodes { get; private set; }
public Graph()
{
Nodes = new List<Node>();
}
}
Code:
the hashSet could be named seenNodes.
Instead of
var length = root.Childs.Count;
for (int i = 0; i < length; ++i)
{
var node = root.Childs[i];
write
foreach (var child in root.Childs)
In Traverse, the visitor is quite unnecessary. You could rather have a method which yields all the nodes of the tree (in the same order traverse does) and it is up to user to do whatever with the nodes:
foreach(var node in root.TraverseRecursive())
{
Console.WriteLine(node.Value);
}
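One way such a method might look is an iterator-based extension method. This is only a sketch: TraverseRecursive is the name used in the snippet above, while NodeExtensions, TraverseIterator and the minimal Node<T> are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the question's Node<T> (assumption for this sketch).
public class Node<T>
{
    public Node(T value) { Value = value; ChildNodes = new List<Node<T>>(); }
    public T Value { get; private set; }
    public List<Node<T>> ChildNodes { get; private set; }
}

public static class NodeExtensions
{
    // Pre-order, depth-first: yields the node itself, then each subtree in turn.
    public static IEnumerable<Node<T>> TraverseRecursive<T>(this Node<T> root)
    {
        // Validate eagerly; the iterator body below runs lazily.
        if (root == null) throw new ArgumentNullException("root");
        return TraverseIterator(root);
    }

    private static IEnumerable<Node<T>> TraverseIterator<T>(Node<T> node)
    {
        yield return node;
        foreach (var child in node.ChildNodes)
        {
            foreach (var descendant in TraverseIterator(child))
            {
                yield return descendant;
            }
        }
    }
}
```

Note that on a graph that is still a DAG, a shared node is yielded once per path that reaches it, and nested iterators add overhead proportional to the depth of the tree.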
If you override GetHashCode and Equals, the algorithm will no longer be able to distinguish between two different Nodes with the same value, which is probably not what you want.
I don't see any reason why LinkedList would be better here than List, except for the reallocations (Capacity 2,4,8,16,...) which List does when adding nodes.
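On the point about overriding GetHashCode and Equals: if value equality is wanted elsewhere, the seen-set can still be made to compare by identity by passing it a comparer. The sketch below is an illustration only; ReferenceComparer and ValueNode are hypothetical names, and ValueNode stands in for a node type that overrides equality.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares objects by identity, ignoring any Equals/GetHashCode overrides.
public sealed class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) { return ReferenceEquals(x, y); }
    public int GetHashCode(T obj) { return RuntimeHelpers.GetHashCode(obj); }
}

// Hypothetical node type whose Equals compares values only.
public sealed class ValueNode
{
    public ValueNode(int value) { Value = value; }
    public int Value { get; private set; }

    public override bool Equals(object obj)
    {
        var other = obj as ValueNode;
        return other != null && other.Value == Value;
    }

    public override int GetHashCode() { return Value; }
}
```

With the default comparer, a HashSet<ValueNode> treats two distinct nodes holding the same value as one element; constructing it as new HashSet<ValueNode>(new ReferenceComparer<ValueNode>()) keeps them apart.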

You would have been better off posting this on Code Review.
Childs is wrong => Children
You don't have to use a HashSet; you could just as easily have used a List<Node<T>>, because checking references only is enough here (so no overriding of GetHashCode, Equals, or operators is needed).
An easier way is to serialize your class and then deserialize it into a second object with XmlSerializer.
When serialized and deserialized, one object referenced two times will become two objects with different references.

Related

C# - Odd Null Reference Exception during testing, why does this happen?

This references my last question which appears to have been abandoned. I am experiencing an odd "bug" if you will with C# and MS VS 2015. To reproduce the error, follow the steps:
Open console app project and copy paste code below.
Set a break point here:
First run code past break point, it works! :D
Then run code again but this time STOP at the break point and DRAG the executing statement cursor INTO the if statement from here:
to here:
Hit Continue and a NullReferenceException is thrown. Why does this happen? Is it just me? What is the technical explanation for this?
CODE:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace testapp
{
class Program
{
static void Main(string[] args)
{
FILECollection randomCollection = new FILECollection();
// Fill with junk test data:
for(int i = 0; i<10; i++)
{
FILE junkfile = new FILE() { fileName = i.ToString(), folderName = i.ToString(), fileHashDigest = new byte[1] };
randomCollection.Add(junkfile);
}
if (true)
{
Console.WriteLine("testing this weird exception issue...");
FILE test;
test = new FILE();
test.fileName = "3";
test.folderName = "3";
test.fileHashDigest = new byte[1];
FILE exists = randomCollection.Where(f => f.fileName == test.fileName &&
f.fileHashDigest.SequenceEqual(test.fileHashDigest)).First();
}
}
}
public class FILE
{
public FILE() { _fileName = "";}
private string _fileName;
public string fileName
{
get
{
if (false)
return this._fileName.ToUpper();
else
return this._fileName;
}
set
{
if (false)
this._fileName = value.ToUpper();
else
this._fileName = value;
}
}
public string folderName { get; set; }
public byte[] fileHashDigest { get; set; }
}
public class FILECollection : IEnumerable<FILE>, ICollection<FILE>
{
private HashSet<FILE> svgHash;
private static List<FILE> PreallocationList;
public string FileName = "N/A";
/// <summary>
/// Default Constructor, will not
/// preallocate memory.
/// </summary>
/// <param name="PreallocationSize"></param>
public FILECollection()
{
this.svgHash = new HashSet<FILE>();
this.svgHash.Clear();
}
/// <summary>
/// Overload Constructor Preallocates
/// memory to be used for the new
/// FILE Collection.
/// </summary>
public FILECollection(int PreallocationSize, string fileName = "N/A", int fileHashDigestSize = 32)
{
FileName = fileName;
PreallocationList = new List<FILE>(PreallocationSize);
for (int i = 0; i <= PreallocationSize; i++)
{
byte[] buffer = new byte[fileHashDigestSize];
FILE preallocationSVG = new FILE()
{
fileName = "",
folderName = "",
fileHashDigest = buffer
};
PreallocationList.Add(preallocationSVG);
}
this.svgHash = new HashSet<FILE>(PreallocationList);
this.svgHash.Clear(); // Capacity remains unchanged until a call to TrimExcess is made.
}
/// <summary>
/// Add an FILE file to
/// the FILE Collection.
/// </summary>
/// <param name="svg"></param>
public void Add(FILE svg)
{
this.svgHash.Add(svg);
}
/// <summary>
/// Removes all elements
/// from the FILE Collection
/// </summary>
public void Clear()
{
svgHash.Clear();
}
/// <summary>
/// Determine if the FILE collection
/// contains the EXACT FILE file, folder,
/// and byte[] sequence. This guarantees
/// that the collection contains the EXACT
/// file you are looking for.
/// </summary>
/// <param name="item"></param>
/// <returns></returns>
public bool Contains(FILE item)
{
return svgHash.Any(f => f.fileHashDigest.SequenceEqual(item.fileHashDigest) &&
f.fileName == item.fileName &&
f.folderName == item.folderName);
}
/// <summary>
/// Determine if the FILE collection
/// contains the same file and folder name,
/// byte[] sequence is not compared. The file and folder
/// name may be the same but this does not guarantee the
/// file contents are exactly the same. Use Contains() instead.
/// </summary>
/// <param name="item"></param>
/// <returns></returns>
public bool ContainsPartially(FILE item)
{
return svgHash.Any(f => f.fileName == item.fileName &&
f.folderName == item.folderName);
}
/// <summary>
/// Returns the total number
/// of FILE files in the Collection.
/// </summary>
public int Count
{ get { return svgHash.Count(); } }
public bool IsReadOnly
{ get { return true; } }
public void CopyTo(FILE[] array, int arrayIndex)
{
svgHash.CopyTo(array, arrayIndex);
}
public bool Remove(FILE item)
{
return svgHash.Remove(item);
}
public IEnumerator<FILE> GetEnumerator()
{
return svgHash.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return svgHash.GetEnumerator();
}
}
}
I think either I am debugging in a terribly wrong way, or Microsoft should take a look at this. It's like future code is breaking current code...which is impossible!
OK here's my best guess..
First, as I mentioned in the comments, the exception doesn't occur if you comment out the line FILE exists = randomCollection.Where(f => f.fileName == test.fileName && f.fileHashDigest.SequenceEqual(test.fileHashDigest)).First();
Second, I noticed the same behavior can be reproduced with the following code:
if (true)
{
object o;
o = new object();
Func<bool> m = () => o == null;
}
i.e. the cause seems to be related to the variable being used in a lambda expression. So, looking at the same code snippet above in ILSpy I get the following:
Program.<>c__DisplayClass0_0 <>c__DisplayClass0_ = new Program.<>c__DisplayClass0_0();
<>c__DisplayClass0_.o = new object();
Func<bool> func = new Func<bool>(<>c__DisplayClass0_.<Main>b__0);
So my best guess is that the NullReferenceException refers to the <>c__DisplayClass0_ instance being null, and I'm therefore inclined to believe that dragging the execution cursor into the if (true) block actually skipped the first line, where <>c__DisplayClass0_ is instantiated.
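To make the decompiled output more concrete, here is a hand-written approximation of that lowering. It is a sketch, not the compiler's actual output: Closure stands in for the generated <>c__DisplayClass0_0, and ClosureDemo and its method names are invented for the example. The second method imitates what would happen if the allocation at the start of the scope were skipped, as the guess above suggests.

```csharp
using System;

public static class ClosureDemo
{
    // Hand-written stand-in for the compiler-generated display class.
    private sealed class Closure
    {
        public object o;
        public bool IsNull() { return o == null; }
    }

    // Approximates the lowered if-block: the closure object is allocated
    // first, and the captured local becomes a field on it.
    public static Func<bool> Normal()
    {
        var closure = new Closure();
        closure.o = new object();
        return closure.IsNull;
    }

    // Imitates execution resuming after the allocation was skipped:
    // the first write to the captured variable dereferences null.
    public static bool SkippedAllocationThrows()
    {
        Closure closure = null; // allocation never ran
        try
        {
            closure.o = new object();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

Under this reading, the statement that appears to fail (an assignment to a local) is really a field write on an object that was never created.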

Why does an attempt to access the InnerText property cause a stack overflow exception when using HtmlAgilityPack?

I'm building an HTML preprocessor using HTMLAgilityPack that builds JSON representations of Html files. To do this, I'm using a simple Node class to contain the necessary properties from the HtmlNode objects provided by HtmlAgilityPack.
Unfortunately, I am consistently getting a Stack overflow exception that crashes my program. It occurs in the last if statement of the GenerateNode() method. As you can see from my code, I have tried unsuccessfully to extract the text from an HtmlTextNode and write it to the Node.InnerText property of my custom Node class. Each of the commented lines throws a stack overflow exception.
For the sake of completeness, I've included all the code in my WebHelpPreprocessor class, but it's only the first code block that concerns this question.
First Block
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;
//
using Newtonsoft.Json;
namespace WebHelpPreprocessor
{
/// <summary>
/// A static class to help process html
/// </summary>
public static class Preprocessor
{
private static Node GenerateNode(HtmlNode htmlNode)
{
Node node = new Node(htmlNode.Name, htmlNode.Attributes["guid"].Value);
if(htmlNode.Attributes["class"] != null)
{
node.Classes = htmlNode.Attributes["class"].Value;
}
if(htmlNode.Attributes["id"] != null)
{
node.Id = htmlNode.Attributes["id"].Value;
}
if(htmlNode.Name == "#text")
{
//node.InnerText = htmlNode.InnerHtml;
//Console.WriteLine(htmlNode.InnerText.Trim());
//Console.WriteLine(JsonConvert.SerializeObject(htmlNode));
Console.WriteLine(htmlNode.Name);
}
return node;
}
Here is the rest of the code
/// <summary>
/// Builds a DocumentTree from the root node using breadth-first search
/// </summary>
/// <param name="rootNode">The root node to use for the model</param>
/// <returns>a completed document tree</returns>
public static DocumentTree BuildModelBFS(HtmlNode root)
{
int nodeCount = 1;
root.Attributes.Add("guid", System.Guid.NewGuid().ToString());
DocumentTree documentModel = new DocumentTree(new Node(root.Name, root.Attributes["guid"].Value));
Queue<HtmlNode> q = new Queue<HtmlNode>();
q.Enqueue(root);
while (q.Count > 0)
{
HtmlNode current = q.Dequeue();
Node currentNode = documentModel.find(current.Attributes["guid"].Value);
if (current == null)
{
continue;
}
if(current.HasChildNodes)
{
foreach(HtmlNode child in current.ChildNodes)
{
nodeCount++;
child.Attributes.Add("guid", System.Guid.NewGuid().ToString());
documentModel.AddToTree(GenerateNode(child), currentNode);
q.Enqueue(child);
}
}
//------------------Debugging
string id;
if(current.Attributes["id"] != null)
{
id = current.Attributes["id"].Value;
}
else
{
id = "0";
}
if(current.Name != "#text")
{
Console.WriteLine("Current node: " + current.Name + " ID: " + id + " Hash Code: " + current.GetHashCode());
}
//--------------------End of debugging
}
return documentModel;
}
/// <summary>
/// Builds a DocumentTree from the root node using depth-first search
/// </summary>
/// <param name="rootNode">The root node to use for the model</param>
/// <returns>a completed document tree</returns>
public static DocumentTree BuildModelDFS(HtmlNode root)
{
return null;
}
}
}

Get all attributes of an XML document into a 2D or 3D string array in C#

For my project I'm trying to parse an XML API response into a usable array of all the information.
Here is an example of the XML I'm receiving:
<Users>
<User LoginName="test1" Owner="" Alias="" UserType="PAID" ClientType="OBM" Quota="10737418240" Timezone="GMT+08:00 (CST)" Language="en" DataFile="1" DataSize="1536" RetainFile="0" RetainSize="0" EnableMSSQL="Y" EnableMSExchange="Y" EnableOracle="Y" EnableLotusNotes="Y" EnableLotusDomino="Y" EnableMySQL="Y" EnableInFileDelta="Y" EnableShadowCopy="Y" EnableExchangeMailbox="N" ExchangeMailboxQuota="0" EnableNASClient="Y" EnableDeltaMerge="Y" EnableMsVm="N" MsVmQuota="0" EnableVMware="N" VMwareQuota="0" Bandwidth="0" Notes="" Status="ENABLE" RegistrationDate="1302687743242" SuspendPaidUser="N" SuspendPaidUserDate="20140503" LastBackupDate="1302699594652" EnableCDP="Y" EnableShadowProtectBareMetal="Y" EnableWinServer2008BareMetal="Y"
Hostname="123.abc.com" FileSizeLimit="52428800" ExcludeNetworkShare="Y"
><Contact Name="" Email="www#qqq.com"/>
</User>
</Users>
I've managed to get one attribute at a time by using the code below:
/// <summary>
/// Get attribute from XML file USE ONLY AS DEMO
/// </summary>
/// <param name="inputXML">XML string</param>
/// <param name="requestVar">The requested variable in the XML response</param>
/// <param name="parameter">Where the requested variable is located</param>
/// <returns>string of all requested variables</returns>
public static String GetAttribute(string inputXML, string requestVar, string parameter)
{
string vars = "";
XmlDocument xml = new XmlDocument();
xml.LoadXml(inputXML);
XmlNodeList xnList = xml.GetElementsByTagName(parameter);
foreach (XmlNode xn in xnList)
{
vars = vars + xn.Attributes[requestVar].Value;
}
return vars;
}
This function needs the name of the node in which the attribute can be found. This project requires many API calls, so I would like a function that puts all the attributes in a string array. So far I have tried to translate this part:
vars = vars + xn.Attributes[requestVar].Value;
to this:
foreach (XmlAttribute xa in xn)
{
vars[i, j, k] = xa.Value;
k++;
}
and I also tried:
for (k = 0; k < xn.Attributes.Count; k++ )
{
vars[i, j, k] = xn.Attributes[k].Value;
}
But neither version works. How do I write a simple for or foreach loop that gets all the attributes into the array? And can this also be done with:
XmlNodeList xnList = xml.GetElementsByTagName(parameter);
The i, j and k variables are used in the nested loops: i is used for the XmlNodeList, j for the XmlNode and k for the XmlAttribute.
In this array I would like to get all the information from the XML file in the same order; only the parts between the quotation marks are needed.
vars[0][0][0] would stand for: <Users><User LoginName= (vars[<Users>][<User][LoginName])
This is the function as far as I have got:
public static String[,,] GetVars(string inputXML)
{
string[,,] vars = new string[100,50,50];
int i, j, k;
XmlDocument xml = new XmlDocument();
xml.LoadXml(inputXML);
i = j = k = 0;
XmlNodeList xnList = xml.GetElementsByTagName("Users");
foreach (XmlNode xn in xnList)
{
foreach (XmlAttribute xa in xn)
{
vars[i, j, k] = xa.Value;
k++;
}
k = 0;
j++;
}
j = 0;
return vars;
}
I would use the System.Xml.Serialization to accomplish this.
First I would create an object (containing class) that would encompass all of the elements and attributes that the XML would include.
Then assign the appropriate Attributes That Control XML Serialization to each property of the containing class object.
Then use the XmlSerializer to deserialize the XML into a usable object.
I have provided an example below that would get you on the right track:
Containing Class Object
using System;
using System.Xml.Serialization;
namespace XMLSerializationDemo
{
/// <summary>
/// A container that contains properties relevant to a RUBI Object
/// </summary>
[Serializable]
public class RUBIObject
{
[XmlAttribute]
public Guid ID { get; set; }
[XmlAttribute]
public string Name { get; set; }
[XmlAttribute]
public string Description { get; set; }
[XmlAttribute]
public DateTime CreatedOn { get; set; }
}
}
Use a collection of some sort in order to encase the objects.
using System;
using System.Collections.Generic;
namespace XMLSerializationDemo
{
/// <summary>
/// Object that contains a collection of RUBIObjects which can be serialized into XML
/// </summary>
[Serializable]
public class RUBIObjectCollection
{
//Base Constructor which instantiates a collection of RUBIObjects
public RUBIObjectCollection()
{
this.Objects = new List<RUBIObject>();
}
public List<RUBIObject> Objects { get; set; }
}
}
Then create the methods for serializing the xml to and from a collection of objects:
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;
namespace XMLSerializationDemo
{
public static class RUBIObjectSerialization
{
public static string SerializeToXML(this RUBIObjectCollection source)
{
//Create a string writer in order to output to console as opposed to file
using (var sw = new StringWriter())
{
//Settings to configure the way the XML will be output to the console. Really, only Indent = true; is needed.
var settings = new XmlWriterSettings();
settings.NewLineChars = Environment.NewLine;
settings.IndentChars = " ";
settings.NewLineHandling = NewLineHandling.Replace;
settings.Indent = true;
//Create writer that writes the xml to the string writer object
using (var xw = XmlWriter.Create(sw, settings))
{
//Create serializer that can serialize a collection of RUBIObjects
XmlSerializer serializer =
new XmlSerializer(typeof(RUBIObjectCollection));
//Serialize this instance of a RUBICollection object, into XML and write to the string writer output
serializer.Serialize(xw, source);
//Flush the xmlwriter stream as it isn't needed any longer
xw.Flush();
}
//Return the XML as a formatted string
return sw.ToString();
}
}
public static RUBIObjectCollection DeserializeToCollection(this string source)
{
RUBIObjectCollection collection = null;
XmlSerializer serializer = null;
//Read the XML string into a stream.
using (var sr = new StringReader(source))
{
//Instantiate an XML Serializer to expect a collection of RUBI Objects
serializer = new XmlSerializer(typeof(RUBIObjectCollection));
//Deserialize the XML stream to a collection
collection = (RUBIObjectCollection)serializer.Deserialize(sr);
}
return collection;
}
}
}
And this would be how it's all used from start to finish:
public class Program
{
private static void Main(string[] args)
{
//Create test data and add it to a collection
var collection = DummyData();
//Serialize the collection to XML and write to console.
Console.WriteLine(collection.SerializeToXML());
//Prevents console window from closing
Console.ReadLine();
}
/// <summary>
/// Generates dummy data for testing purposes
/// </summary>
/// <returns>A collection of RUBIObjects</returns>
private static RUBIObjectCollection DummyData()
{
Random random = new Random();
var collection = new RUBIObjectCollection();
//Build a collection of RUBIObjects and instantiate them with semi-random data.
for (int i = 0; i < 10; i++)
{
int month = random.Next(1, 13); //Random month as an integer (the upper bound is exclusive)
int year = random.Next(2010, 2015); //Random year as an integer
//Create object and add to collection.
collection.Objects.Add(new RUBIObject()
{
ID = Guid.NewGuid(),
Name = string.Format("Object{0}", i),
Description = "Description",
CreatedOn = new DateTime(year, month, 1)
});
}
return collection;
}
}
Bonus : And you can even toss in some Unit Testing to add some shine!
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using XMLSerializationDemo;
namespace UnitTest
{
[TestClass]
public class UnitTest
{
[TestMethod]
public void DummyData_TestDataCreated()
{
//Arrange
PrivateType pt = new PrivateType(typeof(Program));
//Act
RUBIObjectCollection collection = (RUBIObjectCollection)pt.InvokeStatic("DummyData", null);
int actualResult = collection.Objects.Count;
int expectedResult = 10;
//Assert
Assert.AreEqual(expectedResult, actualResult);
}
[TestMethod]
public void SerializeToXML_GeneratesXMLString()
{
//Arrange
bool actualResult = false;
bool expectedResult = true;
PrivateType pt = new PrivateType(typeof(Program));
RUBIObjectCollection collection = (RUBIObjectCollection)pt.InvokeStatic("DummyData", null);
//Act
string serializedXml = collection.SerializeToXML();
try
{
System.Xml.Linq.XDocument doc = System.Xml.Linq.XDocument.Parse(serializedXml);
actualResult = true;
}
catch
{
actualResult = false;
}
//Assert
Assert.AreEqual(expectedResult, actualResult);
}
[TestMethod]
public void DeserializeToCollection_DeserializedToRUBICollection()
{
//Arrange
bool actualResult = false;
bool expectedResult = true;
XMLSerializationDemo.RUBIObjectCollection deserializedCollection = null;
PrivateType pt = new PrivateType(typeof(XMLSerializationDemo.Program));
XMLSerializationDemo.RUBIObjectCollection collection = (XMLSerializationDemo.RUBIObjectCollection)pt.InvokeStatic("DummyData", null);
string serializedXml = collection.SerializeToXML();
//Act
try
{
deserializedCollection = serializedXml.DeserializeToCollection();
if (deserializedCollection.Objects.Count > 0)
actualResult = true;
}
catch
{
actualResult = false;
}
//Assert
Assert.AreEqual(expectedResult, actualResult);
}
}
}
Example XML Generated by the SerializeToXML custom extension method:
<?xml version="1.0" encoding="utf-16"?>
<RUBIObjectCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Objects>
<RUBIObject ID="dac59571-e7eb-401b-b047-b72bd73628a9" Name="Object0" Description="Description" CreatedOn="2010-07-01T00:00:00" />
<RUBIObject ID="42d4741b-ba3c-4db6-ac96-24abf819045d" Name="Object1" Description="Description" CreatedOn="2011-04-01T00:00:00" />
<RUBIObject ID="bc3a2f2f-623a-4e18-be8a-2bf2f3cee841" Name="Object2" Description="Description" CreatedOn="2013-02-01T00:00:00" />
<RUBIObject ID="51965f3b-c216-42c3-9893-ebe829d0b1d1" Name="Object3" Description="Description" CreatedOn="2014-07-01T00:00:00" />
<RUBIObject ID="58492a02-291f-497d-87d8-152b7489a0b3" Name="Object4" Description="Description" CreatedOn="2014-06-01T00:00:00" />
<RUBIObject ID="8b929041-4e6d-42f4-af16-aaa4c3c1d588" Name="Object5" Description="Description" CreatedOn="2011-05-01T00:00:00" />
<RUBIObject ID="1f17d752-95ad-4d89-a2fe-fec6f5eeb713" Name="Object6" Description="Description" CreatedOn="2010-03-01T00:00:00" />
<RUBIObject ID="73716b37-7c10-4aa5-9542-8a28d02d1a0b" Name="Object7" Description="Description" CreatedOn="2011-07-01T00:00:00" />
<RUBIObject ID="a5a8ebe2-487f-462b-938d-49d4d07773bf" Name="Object8" Description="Description" CreatedOn="2014-08-01T00:00:00" />
<RUBIObject ID="2d84bf1b-c012-495d-a0da-8adf45658ea6" Name="Object9" Description="Description" CreatedOn="2014-03-01T00:00:00" />
<RUBIObject ID="492d4fe4-ae64-4e91-a38e-9c0353f73ffc" Name="Object10" Description="Description" CreatedOn="2012-06-01T00:00:00" />
</Objects>
</RUBIObjectCollection>
Once you have a collection of useable objects, you can do what you like with it, such as iterate through each object and populate the said 2d/3d arrays that you had originally requested a solution for.
@JonSkeet, it is indeed a lot easier to use XDocument. After a short while I succeeded and wrote this code:
public static String[,] XGetVars(string inputXML)
{
XDocument doc = XDocument.Parse(inputXML);
int i = 0, j = 0, elementscounter = doc.Root.Elements().Count(), attributescounter = doc.Root.Elements().Attributes().Count();
string[,] vars = new string[elementscounter,(attributescounter/elementscounter)];
foreach (XElement element in doc.Root.Elements())
{
foreach (XAttribute attribute in element.Attributes())
{
vars[i,j] = attribute.ToString();
j++;
}
j = 0;
i++;
}
return vars;
}
This method will create a 2D string array of the XML above. I will continue to get the node attributes as well.
EDIT:
As promised here's the new function which I'm using now:
/// <summary>
/// Create 3D array of the XML response
/// </summary>
/// <param name="inputXML">string XML file</param>
/// <returns>3D string array of the XML file</returns>
public static String[,,] GetVars(string inputXML)
{
try
{
XDocument doc = XDocument.Parse(inputXML);
if (doc.Root.Elements().Attributes().Count() < 1 || ErrorMessage(doc)) return new string[1, 1, 1];//temp fix for empty xml files
int i = 0, j = 0, k = 0, elementscounter = doc.Root.Elements().Count(), attributescounter = doc.Root.Elements().Attributes().Count();
string[,,] vars = new string[elementscounter, ((attributescounter / elementscounter) + getMaxNodes(doc)), getMaxNodesAttributes(doc)];
foreach (XElement element in doc.Root.Elements())
{
foreach (XAttribute attribute in element.Attributes())
{
vars[i, j, 0] = TrimVar(attribute.ToString());
j++;
}
foreach (XElement node in element.Nodes())
{
foreach (XAttribute eAttribute in node.Attributes())
{
vars[i, j, k] = TrimVar(eAttribute.ToString());
k++;
}
k = 0;
j++;
}
j = 0;
i++;
}
return vars;
}
catch (System.Xml.XmlException)
{
string[,,] vars = new string[1, 1, 1];
vars[0, 0, 0] = "No XML found!";
return vars;
}
}
/// <summary>
/// get the max nodes available by a Node
/// </summary>
/// <param name="XML">XML string</param>
/// <returns>Max nodes available</returns>
private static Int32 getMaxNodes(XDocument XML)
{
int max = 1;
foreach (XElement element in XML.Root.Elements())
{
if(element.Nodes().Count() > max) max = element.Nodes().Count();
}
return max;
}
/// <summary>
/// get the max attributes available by a Node
/// </summary>
/// <param name="XML">XML string</param>
/// <returns>Max attributes available</returns>
private static Int32 getMaxNodesAttributes(XDocument XML)
{
int max = 1;
foreach (XElement element in XML.Root.Elements())
{
foreach (XElement node in element.Nodes())
{
if (node.Attributes().Count() > max) max = node.Attributes().Count();
}
}
return max;
}
/// <summary>
/// Trim the input string to only the value
/// </summary>
/// <param name="input">XML readout var</param>
/// <returns>value of the XML readout var</returns>
private static String TrimVar(string input)
{
return input.Remove(0, (input.IndexOf('"'))+1).TrimEnd('"');
}
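A possible simplification of the code above: XAttribute.Value already returns the attribute's value without the name or the quotation marks, so the string trimming may not be needed. The following sketch uses a jagged array so that rows can have different lengths; XmlAttributeReader and GetAttributeValues are hypothetical names, and only the attributes of the root's direct child elements are read.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlAttributeReader
{
    // One row per child element of the root,
    // one entry per attribute value of that element.
    public static string[][] GetAttributeValues(string inputXml)
    {
        XDocument doc = XDocument.Parse(inputXml);
        return doc.Root
                  .Elements()
                  .Select(e => e.Attributes().Select(a => a.Value).ToArray())
                  .ToArray();
    }
}
```

Nested elements such as Contact would need one more level of Elements() in the same style; this sketch leaves that out to stay short.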

Redefine an array of fixed size

I have an array of size 10. It must contain the last ten values of the incoming parameters (the number of incoming parameters is nearly 3k). I have some logic in the following loop:
for (int i=0; i<incomingLength; i++)
{
//and here I also need to rewrite this array size of 10 with new incomingValue(i)
}
perhaps it is primitive but I am stuck((
You can use a "Circular Buffer" for this.
Here's a sample implementation (parameter validation omitted for brevity):
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
namespace Demo
{
public class CircularBuffer<T>: IEnumerable<T>
{
/// <summary>Constructor.</summary>
/// <param name="capacity">The maximum capacity of the buffer.</param>
public CircularBuffer(int capacity)
{
// The reason for this +1 is to simplify the logic - we can use "front == back" to indicate an empty buffer.
_buffer = new T[capacity+1];
}
/// <summary>The buffer capacity.</summary>
public int Capacity
{
get
{
return _buffer.Length - 1;
}
}
/// <summary>The number of elements currently stored in the buffer.</summary>
public int Count
{
get
{
int result = _back - _front;
if (result < 0)
result += _buffer.Length;
return result;
}
}
/// <summary>Is the buffer empty?</summary>
public bool IsEmpty
{
get
{
return this.Count == 0;
}
}
/// <summary>Is the buffer full? (i.e. has it reached its capacity?)</summary>
public bool IsFull
{
get
{
return nextSlot(_back) == _front;
}
}
/// <summary>Empties the buffer.</summary>
public void Empty()
{
_front = _back = 0;
Array.Clear(_buffer, 0, _buffer.Length); // Destroy any old references so they can be GCed.
}
/// <summary>Add an element to the buffer, overwriting the oldest element if the buffer is full.</summary>
/// <param name="newItem">The element to add.</param>
public void Add(T newItem)
{
_buffer[_back] = newItem;
_back = nextSlot(_back);
if (_back == _front) // Buffer is full?
{
_front = nextSlot(_front); // Bump the front, overwriting the current front.
_buffer[_back] = default(T); // Remove the old front value.
}
}
/// <summary>
/// The typesafe enumerator. Elements are returned in oldest to newest order.
/// This is not threadsafe, so if you are enumerating the buffer while another thread is changing it you will run
/// into threading problems. Therefore you must use your own locking scheme to avoid the problem.
/// </summary>
public IEnumerator<T> GetEnumerator()
{
for (int i = _front; i != _back; i = nextSlot(i))
yield return _buffer[i];
}
/// <summary>The non-typesafe enumerator.</summary>
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator(); // Implement in terms of the typesafe enumerator.
}
/// <summary>Calculates the index of the slot following the specified one, wrapping if necessary.</summary>
private int nextSlot(int slot)
{
return (slot + 1) % _buffer.Length;
}
/// <summary>
/// The index of the element at the front of the buffer.
/// If this equals _back, the buffer is empty.
/// </summary>
private int _front;
/// <summary>
/// The index of the first element BEYOND the last used element of the buffer.
/// Therefore this indicates where the next added element will go.
/// </summary>
private int _back;
/// <summary>The underlying buffer. This has a length one greater than the actual capacity.</summary>
private readonly T[] _buffer;
}
internal class Program
{
private void run()
{
CircularBuffer<int> buffer = new CircularBuffer<int>(10);
for (int i = 0; i < 20; ++i)
buffer.Add(i);
foreach (int n in buffer)
Console.WriteLine(n); // Prints 10..19
}
private static void Main()
{
new Program().run();
}
}
}
Use Array.Copy:
var arr1 = new int[] { 1, 2, 3};
var arr2 = new int[] { 4, 5};
var target = new int[arr1.Length + arr2.Length];
Array.Copy(arr1, target, arr1.Length);
Array.Copy(arr2, 0, target, arr1.Length, arr2.Length);
This will combine the two arrays. You can modify the indexes as you like.
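Applied to the original question, Array.Copy can also shift a fixed-size window so that it always holds the most recent values. The following is a minimal sketch under that assumption; the names LastValuesWindow, Push and LastValues are hypothetical. Array.Copy is documented to handle overlapping source and destination ranges within the same array, which is what makes the in-place shift safe.

```csharp
using System;

public static class LastValuesWindow
{
    // Shifts every element one slot to the left (dropping the oldest)
    // and stores newValue in the last slot.
    public static void Push(int[] window, int newValue)
    {
        Array.Copy(window, 1, window, 0, window.Length - 1);
        window[window.Length - 1] = newValue;
    }

    // Feeds all incoming values through a window of the given size.
    // If fewer than 'size' values arrive, the leading slots keep their default 0.
    public static int[] LastValues(int[] incoming, int size)
    {
        var window = new int[size];
        foreach (int value in incoming)
        {
            Push(window, value);
        }
        return window;
    }
}
```

Each Push copies size - 1 elements, so for a window of 10 the cost is negligible; the circular buffer above avoids the copy entirely if that ever matters.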

Regular expression to separate arguments in functions

I can't work out a regular expression to separate the arguments of a function.
The function takes arguments in the following way:
FunctionName(arg1;arg2;...;argn)
Now, to make the rest of my code work, I need to put every argument in ():
FunctionName((arg1);(arg2);(arg3))
The problem is that an argument can be anything: a number, an operator, another function.
The test case for the solution is as follows.
The function before the regexp:
Function1((a1^5-4)/2;1/sin(a2);a3;a4)+Function2(a1;a2;1/a3)
Afterwards I need to get something like this:
Function1(((a1^5-4)/2);(1/sin(a2));(a3);(a4))+Function2((a1);(a2);(1/a3))
Unless I'm missing something, isn't it as simple as replacing ; with );( and surrounding the whole thing in ( ) ?
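A plain replacement of ; cannot tell a function's own parentheses from grouping parentheses such as (a1^5-4). One way around that, shown here as a minimal sketch and not as any poster's method, is to track parenthesis depth and wrap arguments only in groups that contain a top-level ;. The class ArgumentWrapper and its helpers are hypothetical names, and the sketch assumes balanced parentheses and that a function with a single argument needs no extra wrapping.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ArgumentWrapper
{
    // Rewrites every parenthesized group that contains top-level ';'
    // so that each of its arguments is wrapped in its own parentheses.
    public static string Wrap(string expr)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < expr.Length)
        {
            if (expr[i] != '(') { sb.Append(expr[i]); i++; continue; }
            int close = FindMatchingParen(expr, i);
            List<string> args = SplitTopLevel(expr.Substring(i + 1, close - i - 1));
            sb.Append('(');
            for (int a = 0; a < args.Count; a++)
            {
                if (a > 0) sb.Append(';');
                string inner = Wrap(args[a]); // handle nested calls recursively
                if (args.Count > 1) sb.Append('(').Append(inner).Append(')');
                else sb.Append(inner); // single item: plain grouping or one-argument call
            }
            sb.Append(')');
            i = close + 1;
        }
        return sb.ToString();
    }

    // Returns the index of the ')' that closes the '(' at position 'open'.
    private static int FindMatchingParen(string s, int open)
    {
        int depth = 0;
        for (int i = open; i < s.Length; i++)
        {
            if (s[i] == '(') depth++;
            else if (s[i] == ')' && --depth == 0) return i;
        }
        throw new FormatException("Unbalanced parentheses");
    }

    // Splits on ';' characters that are not nested inside parentheses.
    private static List<string> SplitTopLevel(string s)
    {
        var parts = new List<string>();
        int depth = 0, start = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '(') depth++;
            else if (s[i] == ')') depth--;
            else if (s[i] == ';' && depth == 0)
            {
                parts.Add(s.Substring(start, i - start));
                start = i + 1;
            }
        }
        parts.Add(s.Substring(start));
        return parts;
    }
}
```

Calling ArgumentWrapper.Wrap on the test input from the question yields the requested Function1(((a1^5-4)/2);(1/sin(a2));(a3);(a4))+Function2((a1);(a2);(1/a3)).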
Using Regex:
(?:([^;()]+);?)+
and LINQ:
string result = "FunctionName(" +
String.Join(";",
from Capture capture in
Regex.Matches(inputString, @"FunctionName\((?:([^;()]+);?)+\)")[0].Groups[1].
Captures
select "(" + capture.Value + ")") + ")";
This is a far cry from a Regex, but the potential for nested functions, combined with the fact that a structured language is being modified, means that a lexer/parser scheme is more appropriate.
Here is an example of a system that processes things of this nature
First, we define something that can be located in the input (the expression to modify)
public interface ISourcePart
{
/// <summary>
/// Gets the string representation of the kind of thing we're working with
/// </summary>
string Kind { get; }
/// <summary>
/// Gets the position this information is found at in the original source
/// </summary>
int Position { get; }
/// <summary>
/// Gets a representation of this data as Token objects
/// </summary>
/// <returns>An array of Token objects representing the data</returns>
Token[] AsTokens();
}
Next, we'll define a construct for housing tokens (identifiable portions of the source text)
public class Token : ISourcePart
{
public int Position { get; set; }
public Token[] AsTokens()
{
return new[] {this};
}
public string Kind { get; set; }
/// <summary>
/// Gets or sets the value of the token
/// </summary>
public string Value { get; set; }
/// <summary>
/// Creates a new Token
/// </summary>
/// <param name="kind">The kind (name) of the token</param>
/// <param name="match">The Match the token is to be generated from</param>
/// <param name="index">The offset from the beginning of the file the index of the match is relative to</param>
/// <returns>The newly created token</returns>
public static Token Create(string kind, Match match, int index)
{
return new Token
{
Position = match.Index + index,
Kind = kind,
Value = match.Value
};
}
/// <summary>
/// Creates a new Token
/// </summary>
/// <param name="kind">The kind (name) of the token</param>
/// <param name="value">The value to assign to the token</param>
/// <param name="position">The absolute position in the source file the value is located at</param>
/// <returns>The newly created token</returns>
public static Token Create(string kind, string value, int position)
{
return new Token
{
Kind = kind,
Value = value,
Position = position
};
}
}
We'll use Regexes to find our tokens in this example (below - Excerpt from Program.cs in my demo project).
/// <summary>
/// Breaks an input string into recognizable tokens
/// </summary>
/// <param name="source">The input string to break up</param>
/// <returns>The set of tokens located within the string</returns>
static IEnumerable<Token> Tokenize(string source)
{
var tokens = new List<Token>();
var sourceParts = new[] { new KeyValuePair<string, int>(source, 0) };
tokens.AddRange(Tokenize(OpenParen, "\\(", ref sourceParts));
tokens.AddRange(Tokenize(CloseParen, "\\)", ref sourceParts));
tokens.AddRange(Tokenize(Semi, ";", ref sourceParts));
tokens.AddRange(Tokenize(Operator, "[\\^\\\\*\\+\\-/]", ref sourceParts));
tokens.AddRange(Tokenize(Literal, "\\w+", ref sourceParts));
return tokens.OrderBy(x => x.Position);
}
As you can see, I've defined patterns for open and close parentheses, semicolons, basic math operators, and letters and numbers.
The Tokenize method is defined as follows (again from Program.cs in my demo project)
/// <summary>
/// Performs tokenization of a collection of non-tokenized data parts with a specific pattern
/// </summary>
/// <param name="tokenKind">The name to give the located tokens</param>
/// <param name="pattern">The pattern to use to match the tokens</param>
/// <param name="untokenizedParts">The portions of the input that have yet to be tokenized (organized as text vs. position in source)</param>
/// <returns>The set of tokens matching the given pattern located in the untokenized portions of the input, <paramref name="untokenizedParts"/> is updated as a result of this call</returns>
static IEnumerable<Token> Tokenize(string tokenKind, string pattern, ref KeyValuePair<string, int>[] untokenizedParts)
{
//Do a bit of setup
var resultParts = new List<KeyValuePair<string, int>>();
var resultTokens = new List<Token>();
var regex = new Regex(pattern);
//Look through all of our currently untokenized data
foreach (var part in untokenizedParts)
{
//Find all of our available matches
var matches = regex.Matches(part.Key).OfType<Match>().ToList();
//If we don't have any, keep the data as untokenized and move to the next chunk
if (matches.Count == 0)
{
resultParts.Add(part);
continue;
}
//Store the untokenized data in a working copy and save the absolute index it reported itself at in the source file
var workingPart = part.Key;
var index = part.Value;
//Look through each of the matches that were found within this untokenized segment
foreach (var match in matches)
{
//Calculate the effective start of the match within the working copy of the data
var effectiveStart = match.Index - (part.Key.Length - workingPart.Length);
resultTokens.Add(Token.Create(tokenKind, match, part.Value));
//If we didn't match at the beginning, save off the first portion to the set of untokenized data we'll give back
if (effectiveStart > 0)
{
var value = workingPart.Substring(0, effectiveStart);
resultParts.Add(new KeyValuePair<string, int>(value, index));
}
//Get rid of the portion of the working copy we've already used
if (match.Index + match.Length < part.Key.Length)
{
workingPart = workingPart.Substring(effectiveStart + match.Length);
}
else
{
workingPart = string.Empty;
}
//Update the current absolute index in the source file we're reporting to be at
index += effectiveStart + match.Length;
}
//If we've got remaining data in the working copy, add it back to the untokenized data
if (!string.IsNullOrEmpty(workingPart))
{
resultParts.Add(new KeyValuePair<string, int>(workingPart, index));
}
}
//Update the untokenized data to contain what we couldn't process with this pattern
untokenizedParts = resultParts.ToArray();
//Return the tokens we were able to extract
return resultTokens;
}
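The method above tokenizes in several passes, one pattern at a time. For comparison, here is a minimal sketch of a different technique: a single pass over the input with one combined regex that uses a named group per token kind. It is not the approach used in the demo project; the class SinglePassTokenizer is a hypothetical name, and tokens are returned as plain (kind, value) pairs instead of Token objects to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SinglePassTokenizer
{
    // One alternative per token kind; the group name doubles as the kind.
    // The operator class mirrors the pattern used above: ^ \ * + - /
    private static readonly Regex Pattern = new Regex(
        @"(?<OpenParen>\()|(?<CloseParen>\))|(?<Semi>;)|(?<Operator>[\^\\*+\-/])|(?<Literal>\w+)");

    private static readonly string[] Kinds =
        { "OpenParen", "CloseParen", "Semi", "Operator", "Literal" };

    // Returns (kind, value) pairs in source order.
    // Characters that match no alternative (spaces, for example) are skipped.
    public static List<KeyValuePair<string, string>> Tokenize(string source)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match match in Pattern.Matches(source))
        {
            foreach (string kind in Kinds)
            {
                if (match.Groups[kind].Success)
                {
                    result.Add(new KeyValuePair<string, string>(kind, match.Value));
                    break;
                }
            }
        }
        return result;
    }
}
```

Because the matches come back in source order, no sorting by position is needed afterwards; the trade-off is that all patterns live in one expression.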
Now that we've got the methods and types in place to handle our tokenized data, we need to be able to recognize pieces of larger meaning, like calls to simple functions (like sin(x)), complex functions (like Function1(a1;a2;a3)), basic mathematical operations (like +, -, *, etc.), and so on. We'll make a simple parser for dealing with that; firstly we'll define a match condition for a parse node.
public class ParseNodeDefinition
{
/// <summary>
/// The set of parse node definitions that could be transitioned to from this one
/// </summary>
private readonly IList<ParseNodeDefinition> _nextNodeOptions;
/// <summary>
/// Creates a new ParseNodeDefinition
/// </summary>
private ParseNodeDefinition()
{
_nextNodeOptions = new List<ParseNodeDefinition>();
}
/// <summary>
/// Gets whether or not this definition is an acceptable ending point for the parse tree
/// </summary>
public bool IsValidEnd { get; private set; }
/// <summary>
/// Gets the name an item must have for it to be matched by this definition
/// </summary>
public string MatchItemsNamed { get; private set; }
/// <summary>
/// Gets the set of parse node definitions that could be transitioned to from this one
/// </summary>
public IEnumerable<ParseNodeDefinition> NextNodeOptions
{
get { return _nextNodeOptions; }
}
/// <summary>
/// Gets or sets the tag that will be associated with the data if matched
/// </summary>
public string Tag { get; set; }
/// <summary>
/// Creates a new ParseNodeDefinition matching items with the specified name/kind.
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>A ParseNodeDefinition capable of matching items of the given name</returns>
public static ParseNodeDefinition Create(string matchItemsNamed, string tag, bool isValidEnd)
{
return new ParseNodeDefinition { MatchItemsNamed = matchItemsNamed, Tag = tag, IsValidEnd = isValidEnd };
}
public ParseNodeDefinition AddOption(string matchItemsNamed)
{
return AddOption(matchItemsNamed, string.Empty, false);
}
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag)
{
return AddOption(matchItemsNamed, tag, false);
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag, bool isValidEnd)
{
var node = Create(matchItemsNamed, tag, isValidEnd);
_nextNodeOptions.Add(node);
return node;
}
public ParseNodeDefinition AddOption(string matchItemsNamed, bool isValidEnd)
{
return AddOption(matchItemsNamed, string.Empty, isValidEnd);
}
/// <summary>
/// Links the given node as an option for a state to follow this one in the parse tree this node is a part of
/// </summary>
/// <param name="next">The node to add as an option</param>
public void LinkTo(ParseNodeDefinition next)
{
_nextNodeOptions.Add(next);
}
}
This will let us match a single element by name, whether it's a ParseTree (defined later) or a Token, as they both implement the ISourcePart interface. Next we'll make a ParseTreeDefinition that allows us to specify sequences of ParseNodeDefinitions for matching.
public class ParseTreeDefinition
{
/// <summary>
/// The set of parse node definitions that constitute an initial match to the parse tree
/// </summary>
private readonly IList<ParseNodeDefinition> _initialNodeOptions;
/// <summary>
/// Creates a new ParseTreeDefinition
/// </summary>
/// <param name="name">The name to give to parse trees generated from full matches</param>
public ParseTreeDefinition(string name)
{
_initialNodeOptions = new List<ParseNodeDefinition>();
Name = name;
}
/// <summary>
/// Gets the set of parse node definitions that constitute an initial match to the parse tree
/// </summary>
public IEnumerable<ParseNodeDefinition> InitialNodeOptions { get { return _initialNodeOptions; } }
/// <summary>
/// Gets the name of the ParseTreeDefinition
/// </summary>
public string Name { get; private set; }
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed)
{
return AddOption(matchItemsNamed, string.Empty, false);
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag)
{
return AddOption(matchItemsNamed, tag, false);
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag, bool isValidEnd)
{
var node = ParseNodeDefinition.Create(matchItemsNamed, tag, isValidEnd);
_initialNodeOptions.Add(node);
return node;
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, bool isValidEnd)
{
return AddOption(matchItemsNamed, string.Empty, isValidEnd);
}
/// <summary>
/// Attempts to follow a particular branch in the parse tree from a given starting point in a set of source parts
/// </summary>
/// <param name="parts">The set of source parts to attempt to match in</param>
/// <param name="startIndex">The position to start the matching attempt at</param>
/// <param name="required">The definition that must be matched for the branch to be followed</param>
/// <param name="nodes">The set of nodes that have been matched so far</param>
/// <returns>true if the branch was followed to completion, false otherwise</returns>
private static bool FollowBranch(IList<ISourcePart> parts, int startIndex, ParseNodeDefinition required, ICollection<ParseNode> nodes)
{
if (parts[startIndex].Kind != required.MatchItemsNamed)
{
return false;
}
nodes.Add(new ParseNode(parts[startIndex], required.Tag));
return parts.Count > (startIndex + 1) && required.NextNodeOptions.Any(x => FollowBranch(parts, startIndex + 1, x, nodes)) || required.IsValidEnd;
}
/// <summary>
/// Attempt to match the parse tree definition against a set of source parts
/// </summary>
/// <param name="parts">The source parts to match against</param>
/// <returns>true if the parse tree was matched, false otherwise. parts is updated by this method to consolidate matched nodes into a ParseTree</returns>
public bool Parse(ref IList<ISourcePart> parts)
{
var partsCopy = parts.ToList();
for (var i = 0; i < parts.Count; ++i)
{
var tree = new List<ParseNode>();
if (InitialNodeOptions.Any(x => FollowBranch(partsCopy, i, x, tree)))
{
partsCopy.RemoveRange(i, tree.Count);
partsCopy.Insert(i, new ParseTree(Name, tree.ToArray(), tree[0].Position));
parts = partsCopy;
return true;
}
}
return false;
}
}
Of course, these don't do us much good without some place to store the results of the matchers we've defined so far, so let's define ParseTree and ParseNode. A ParseTree is simply a collection of ParseNode objects, and a ParseNode is a wrapper around a ParseTree or Token (or, more generically, any ISourcePart).
public class ParseTree : ISourcePart
{
/// <summary>
/// Creates a new ParseTree
/// </summary>
/// <param name="kind">The kind (name) of tree this is</param>
/// <param name="nodes">The nodes the tree matches</param>
/// <param name="position">The position in the source file this tree is located at</param>
public ParseTree(string kind, IEnumerable<ISourcePart> nodes, int position)
{
Kind = kind;
ParseNodes = nodes.ToList();
Position = position;
}
public string Kind { get; private set; }
public int Position { get; private set; }
/// <summary>
/// Gets the nodes that make up this parse tree
/// </summary>
public IList<ISourcePart> ParseNodes { get; internal set; }
public Token[] AsTokens()
{
return ParseNodes.SelectMany(x => x.AsTokens()).ToArray();
}
}
public class ParseNode : ISourcePart
{
/// <summary>
/// Creates a new ParseNode
/// </summary>
/// <param name="sourcePart">The data that was matched to create this node</param>
/// <param name="tag">The tag data (if any) associated with the node</param>
public ParseNode(ISourcePart sourcePart, string tag)
{
SourcePart = sourcePart;
Tag = tag;
}
public string Kind { get { return SourcePart.Kind; } }
/// <summary>
/// Gets the tag associated with the matched data
/// </summary>
public string Tag { get; private set; }
/// <summary>
/// Gets the data that was matched to create this node
/// </summary>
public ISourcePart SourcePart { get; private set; }
public int Position { get { return SourcePart.Position; } }
public Token[] AsTokens()
{
return SourcePart.AsTokens();
}
}
That's it for the constructs we need, so we'll move into configuring our parse tree definitions. The code from here on is from Program.cs in my demo.
As you might have noticed in the block above that declares the patterns for each token, some values were referenced but not defined. Here they are:
private const string CloseParen = "CloseParen";
private const string ComplexFunctionCall = "ComplexFunctionCall";
private const string FunctionCallStart = "FunctionCallStart";
private const string Literal = "Literal";
private const string OpenParen = "OpenParen";
private const string Operator = "Operator";
private const string ParenthesisRequiredElement = "ParenthesisRequiredElement";
private const string ParenthesizedItem = "ParenthesizedItem";
private const string Semi = "Semi";
private const string SimpleFunctionCall = "SimpleFunctionCall";
Let's begin by defining a pattern that matches literals (\w+ pattern) that are followed by open parenthesis; we'll use this to match things like sin( or Function1(.
static ParseTreeDefinition CreateFunctionCallStartTree()
{
var tree = new ParseTreeDefinition(FunctionCallStart);
var name = tree.AddOption(Literal);
name.AddOption(OpenParen, true);
return tree;
}
There's really not a whole lot to it: set up a tree, add an option for the first thing to match as a Literal, add an option for the next thing to match as an open parenthesis, and say that it can end the parse tree.
Now for one that's a little more complex, binary mathematical operations (couldn't think of any unary operations that would need to be included)
static ParseTreeDefinition CreateBinaryOperationResultTree()
{
var tree = new ParseTreeDefinition(Literal);
var parenthesizedItem = tree.AddOption(ParenthesizedItem);
var literal = tree.AddOption(Literal);
var simpleCall = tree.AddOption(SimpleFunctionCall);
var complexCall = tree.AddOption(ComplexFunctionCall);
var @operator = parenthesizedItem.AddOption(Operator);
literal.LinkTo(@operator);
simpleCall.LinkTo(@operator);
complexCall.LinkTo(@operator);
@operator.AddOption(ParenthesizedItem, true);
@operator.AddOption(Literal, true);
@operator.AddOption(SimpleFunctionCall, true);
@operator.AddOption(ComplexFunctionCall, true);
return tree;
}
Here we say that the parse tree can start with a parenthesized item (like (1/2)), a literal (like a5 or 3), a simple call (like sin(4)) or a complex one (like Function1(a1;a2;a3)). In essence we've just defined the options for the left hand operand. Next, we say that the parenthesized item must be followed by an Operator (one of the mathematical operators from the pattern declared way up at the beginning) and, for convenience, we'll say that all of the other options for the left hand operand can progress to that same state (having the operator). Next, the operator must have a right hand side as well, so we give it a duplicate set of options to progress to. Note that they are not the same definitions as the left hand operands, these have the flag set to be able to terminate the parse tree. Notice that the parse tree is named Literal to avoid having to specify yet another kind of element to match all over the place.
Next up, parenthesized items:
static ParseTreeDefinition CreateParenthesizedItemTree()
{
var tree = new ParseTreeDefinition(ParenthesizedItem);
var openParen = tree.AddOption(OpenParen);
var nestedSimpleCall = openParen.AddOption(SimpleFunctionCall);
var nestedComplexCall = openParen.AddOption(ComplexFunctionCall);
var arg = openParen.AddOption(Literal);
var parenthesizedItem = openParen.AddOption(ParenthesizedItem);
var closeParen = nestedSimpleCall.AddOption(CloseParen, true);
arg.LinkTo(closeParen);
parenthesizedItem.LinkTo(closeParen);
nestedComplexCall.LinkTo(closeParen);
return tree;
}
Nice and easy with this one, start with a parenthesis, follow it up with pretty much anything, follow that with another parenthesis to close it.
Simple calls (like sin(x))
static ParseTreeDefinition CreateSimpleFunctionCallTree()
{
var tree = new ParseTreeDefinition(SimpleFunctionCall);
var openParen = tree.AddOption(FunctionCallStart);
var nestedItem = openParen.AddOption(ParenthesizedItem);
var nestedSimpleCall = openParen.AddOption(SimpleFunctionCall);
var nestedComplexCall = openParen.AddOption(ComplexFunctionCall);
var arg = openParen.AddOption(Literal);
var parenthesizedItem = openParen.AddOption(ParenthesizedItem);
var closeParen = nestedSimpleCall.AddOption(CloseParen, true);
arg.LinkTo(closeParen);
nestedItem.LinkTo(closeParen);
parenthesizedItem.LinkTo(closeParen);
nestedComplexCall.LinkTo(closeParen);
return tree;
}
Complex calls (like Function1(a1;a2;a3))
static ParseTreeDefinition CreateComplexFunctionCallTree()
{
var tree = new ParseTreeDefinition(ComplexFunctionCall);
var openParen = tree.AddOption(FunctionCallStart);
var arg = openParen.AddOption(Literal, ParenthesisRequiredElement);
var simpleCall = openParen.AddOption(SimpleFunctionCall, ParenthesisRequiredElement);
var complexCall = openParen.AddOption(ComplexFunctionCall, ParenthesisRequiredElement);
var nested = openParen.AddOption(ParenthesizedItem);
var semi = arg.AddOption(Semi);
simpleCall.LinkTo(semi);
complexCall.LinkTo(semi);
nested.LinkTo(semi);
var arg2 = semi.AddOption(Literal, ParenthesisRequiredElement);
var simpleCall2 = semi.AddOption(SimpleFunctionCall, ParenthesisRequiredElement);
var complexCall2 = semi.AddOption(ComplexFunctionCall, ParenthesisRequiredElement);
var nested2 = semi.AddOption(ParenthesizedItem);
arg2.LinkTo(semi);
simpleCall2.LinkTo(semi);
complexCall2.LinkTo(semi);
nested2.LinkTo(semi);
var closeParen = arg2.AddOption(CloseParen, true);
arg2.LinkTo(closeParen);
simpleCall2.LinkTo(closeParen);
complexCall2.LinkTo(closeParen);
return tree;
}
That's all the trees we'll need, so let's take a look at the code that runs this all
static void Main()
{
//The input string
const string input = @"Function1((a1^5-4)/2;1/sin(a2);a3;a4)+Function2(a1;a2;1/a3)";
//Locate the recognizable tokens within the source
IList<ISourcePart> tokens = Tokenize(input).Cast<ISourcePart>().ToList();
//Create the parse trees we'll need to be able to recognize the different parts of the input
var functionCallStartTree = CreateFunctionCallStartTree();
var parenthesizedItemTree = CreateParenthesizedItemTree();
var simpleFunctionCallTree = CreateSimpleFunctionCallTree();
var complexFunctionCallTree = CreateComplexFunctionCallTree();
var binaryOpTree = CreateBinaryOperationResultTree();
//Parse until we can't parse anymore
while (functionCallStartTree.Parse(ref tokens) || binaryOpTree.Parse(ref tokens) || parenthesizedItemTree.Parse(ref tokens) || simpleFunctionCallTree.Parse(ref tokens) || complexFunctionCallTree.Parse(ref tokens))
{ }
//Run our post processing to fix the parenthesis in the input
FixParenthesis(ref tokens);
//Collapse our parse tree(s) back to a string
var values = tokens.OrderBy(x => x.Position).SelectMany(x => x.AsTokens()).Select(x => x.Value);
//Print out our results and wait
Console.WriteLine(string.Join(string.Empty, values));
Console.ReadLine();
}
The only thing we've got left to define is how to actually do the wrapping of the elements in the argument list of a "complex" call. That's handled by the FixParenthesis method.
private static void FixParenthesis(ref IList<ISourcePart> items)
{
//Iterate through the set we're examining
for (var i = 0; i < items.Count; ++i)
{
var parseNode = items[i] as ParseNode;
//If we've got a parse node...
if (parseNode != null)
{
var nodeTree = parseNode.SourcePart as ParseTree;
//If the parse node represents a parse tree...
if (nodeTree != null)
{
//Fix parenthesis within the tree
var nodes = nodeTree.ParseNodes;
FixParenthesis(ref nodes);
nodeTree.ParseNodes = nodes;
}
//If this parse node required parenthesis, replace the subtree and add them
if (parseNode.Tag == ParenthesisRequiredElement)
{
var nodeContents = parseNode.AsTokens();
var combined = string.Join(string.Empty, nodeContents.OrderBy(x => x.Position).Select(x => x.Value));
items[i] = Token.Create(parseNode.Kind, string.Format("({0})", combined), parseNode.Position);
}
continue;
}
var parseTree = items[i] as ParseTree;
//If we've got a parse tree...
if (parseTree != null)
{
//Fix parenthesis within the tree
var nodes = parseTree.ParseNodes;
FixParenthesis(ref nodes);
parseTree.ParseNodes = nodes;
}
}
}
At any rate, I hope this has helped or at least provided a fun diversion.
I have probably managed to deal with it (now testing). It turned out to be a 5-stage operation. Assuming that '{' and '}' cannot occur in a function, I've done something like this:
sBuffer = Regex.Replace(sBuffer, @"(?<sep>[;])", "};{");
sBuffer = Regex.Replace(sBuffer, @"([(])(?<arg>.+?)[}]", "({${arg}}");
sBuffer = Regex.Replace(sBuffer, @"([;])(?<arg>.+?)([)]){1}", ";${arg}})");
sBuffer = Regex.Replace(sBuffer, @"{", "(");
sBuffer = Regex.Replace(sBuffer, @"}", ")");
0. Input:
function1((a1^5-4)/2;1/sin(a2);a3;a4)+function2(a1;a2;1/a3)
1. The first line replaces ; with };{
function1((a1^5-4)/2};{1/sin(a2)};{a3};{a4)+function2(a1};{a2};{1/a3)
2. For the first argument after a ( (and, unintentionally, for arguments that contain ')'), replace (arg} with ({arg}:
function1({(a1^5-4)/2};{1/sin({a2)};{a3};{a4)+function2({a1};{a2};{1/a3)
3. The same at the end of the function: replace {arg) with {arg}):
function1({(a1^5-4)/2};{1/sin({a2})};{a3};{a4})+function2({a1};{a2};{1/a3})
4-5. Replace '{' and '}' with '(' and ')':
function1(((a1^5-4)/2);(1/sin((a2)));(a3);(a4))+function2((a1);(a2);(1/a3))
We get some extra parentheses, especially when an argument is itself surrounded by '(' and ')' (a nested function), but it doesn't matter, as the code is then processed using Reverse Polish Notation.
This is my first regexp code (I found out about regexps just a few days ago; I'm a beginner). I hope it covers all the cases (at least those that can occur in Excel formulas).
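To check the walkthrough above, the five replacements can be packaged into one method and run against the test input. This is a minimal sketch: the class name FiveStepWrapper is hypothetical, and the last two steps use string.Replace instead of Regex.Replace, which is equivalent for single literal characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FiveStepWrapper
{
    public static string Wrap(string sBuffer)
    {
        // Step 1: every ';' becomes '};{'
        sBuffer = Regex.Replace(sBuffer, @"(?<sep>[;])", "};{");
        // Step 2: open a brace after '(' for text that runs up to the next '}'
        sBuffer = Regex.Replace(sBuffer, @"([(])(?<arg>.+?)[}]", "({${arg}}");
        // Step 3: close a brace before ')' for text that starts at a ';'
        sBuffer = Regex.Replace(sBuffer, @"([;])(?<arg>.+?)([)]){1}", ";${arg}})");
        // Steps 4 and 5: turn the temporary braces into parentheses
        return sBuffer.Replace('{', '(').Replace('}', ')');
    }
}
```

For the input from step 0, FiveStepWrapper.Wrap returns the string shown after steps 4-5, including the extra parentheses around a2.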
