How to iterate XML by using XDocument in .Net - c#

I have a big XML file from which I take small snippets by using ReadFrom(). Each resulting xmlsnippet contains leaf, sas and kir tags at different positions (sometimes leaf comes before kir, sometimes the other way around).
Right now I am using three foreach loops to get these values. That is bad logic, and it will take time when the snippet is also big.
Is there any way I can use one foreach loop with three if statements inside it to get the values?
arr is a custom ArrayList.
var xdoc = new XDocument(xmlsnippet);
string xml = RemoveAllNamespaces(xdoc.ToString());
foreach (XElement element in XDocument.Parse(xml).Descendants("leaf"))
{
arr.Add(new Test("leaf", element.Value, 2));
break;
}
foreach (XElement element in XDocument.Parse(xml).Descendants("sas"))
{
arr.Add(new Test("sas", element.Value, 2));
break;
}
foreach (XElement element in XDocument.Parse(xml).Descendants("kir"))
{
if (element.Value == "0")
arr.Add(new Test("kir", "90", 2));
break;
}

You only need to parse that xmlsnippet once (assuming it fits in memory) and then use XNamespace to qualify the right XElement. There is no need to call RemoveAllNamespaces, which I guess does what its name implies, and probably does so in an awful way.
I used the following XML snippet as example input, notice the namespaces a, b and c:
var xmlsnippet = @"<root xmlns:a=""https://a.example.com""
xmlns:b=""https://b.example.com""
xmlns:c=""https://c.example.com"">
<child>
<a:leaf>42</a:leaf>
<a:leaf>43</a:leaf>
<a:leaf>44</a:leaf>
<somenode>
<b:sas>4242</b:sas>
<b:sas>4343</b:sas>
</somenode>
<other>
<c:kir>80292</c:kir>
<c:kir>0</c:kir>
</other>
</child>
</root>";
Then use LINQ to return either an instance of your Test class, or null if no element can be found. That Test instance is then added to the ArrayList.
var arr = new ArrayList();
var xdoc = XDocument.Parse(xmlsnippet);
// add namespaces
var nsa = (XNamespace) "https://a.example.com";
var nsb = (XNamespace) "https://b.example.com";
var nsc = (XNamespace) "https://c.example.com";
var leaf = xdoc.Descendants(nsa + "leaf").
Select(elem => new Test("leaf", elem.Value, 2)).FirstOrDefault();
if (leaf != null) {
arr.Add(leaf);
}
var sas = xdoc.Descendants(nsb + "sas").
Select(elem => new Test("sas", elem.Value, 2)).FirstOrDefault();
if (sas != null) {
arr.Add(sas);
}
var kir = xdoc.
Descendants(nsc + "kir").
Where(ele => ele.Value == "0").
Select(elem => new Test("kir", "90", 2)).
FirstOrDefault();
if (kir != null) {
arr.Add(kir);
}
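The question literally asks for one foreach with the checks inside it. Below is a minimal sketch of that over the parsed XDocument. It matches on LocalName, so the namespaces don't matter. Test is a hypothetical stand-in for the asker's class; the question does not say what its third argument means. A List<Test> replaces the ArrayList. Like the LINQ version above, it keeps the first kir whose value is "0".

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical stand-in for the asker's Test class; the meaning of the
// third constructor argument (always 2 in the question) is not given.
public class Test
{
    public string Name { get; }
    public string Value { get; }
    public int Extra { get; }

    public Test(string name, string value, int extra)
    {
        Name = name;
        Value = value;
        Extra = extra;
    }
}

public static class SinglePass
{
    // Walks all descendants once and keeps the first leaf, the first sas,
    // and the first kir whose value is "0" (mapped to "90", as in the question).
    public static List<Test> Collect(XDocument xdoc)
    {
        var result = new List<Test>();
        bool leafFound = false, sasFound = false, kirFound = false;

        foreach (XElement element in xdoc.Descendants())
        {
            switch (element.Name.LocalName)
            {
                case "leaf" when !leafFound:
                    result.Add(new Test("leaf", element.Value, 2));
                    leafFound = true;
                    break;
                case "sas" when !sasFound:
                    result.Add(new Test("sas", element.Value, 2));
                    sasFound = true;
                    break;
                case "kir" when !kirFound && element.Value == "0":
                    result.Add(new Test("kir", "90", 2));
                    kirFound = true;
                    break;
            }

            // Stop walking the tree once all three values have been found.
            if (leafFound && sasFound && kirFound)
                break;
        }
        return result;
    }
}
```

Unlike the single LINQ query, this loop stops as soon as all three values have been found. It still needs the whole snippet parsed into memory first.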
I expect this to be the most efficient way to find those nodes if you want to stick with XDocument. If the XML is really huge, you might consider using an XmlReader, but that probably only helps if memory is a problem.
If you want to do it in one LINQ query, you can do this:
var q = xdoc
.Descendants()
.Where(elem => elem.Name.LocalName == "leaf" ||
elem.Name.LocalName == "sas" ||
elem.Name.LocalName == "kir" && elem.Value == "0" )
.GroupBy(k=> k.Name.LocalName)
.Select(k=>
new Test(
k.Key,
k.Key != "kir"? k.FirstOrDefault().Value: "90",
2)
);
arr.AddRange(q.ToList());
That query looks for all elements named leaf, sas or kir, groups them by element name, and then takes the first element in each group. Note the extra handling when the element name is kir: both the Where clause and the projection in Select need to deal with it. You might want to performance-test this, as I'm not sure how efficient it will be.
For completeness here is an XmlReader version:
var state = FoundElement.NONE;
using (var xe = XmlReader.Create(new StringReader(xmlsnippet)))
{
    xe.MoveToContent();
    while (!xe.EOF)
    {
        // only start tags are of interest; step over everything else
        if (xe.NodeType != XmlNodeType.Element)
        {
            xe.Read();
            continue;
        }
        // read the name first: ReadElementContentAsString moves the reader
        var localName = xe.LocalName;
        // if we have not yet found this specific element
        if ((state & FoundElement.Leaf) != FoundElement.Leaf && localName == "leaf")
        {
            // ReadElementContentAsString already advances past the end tag,
            // so the loop must not call Read() again (that could skip a node)
            arr.Add(new Test(localName, xe.ReadElementContentAsString(), 2));
            // keep track of what we already handled
            state |= FoundElement.Leaf;
        }
        else if ((state & FoundElement.Sas) != FoundElement.Sas && localName == "sas")
        {
            arr.Add(new Test(localName, xe.ReadElementContentAsString(), 2));
            state |= FoundElement.Sas;
        }
        else if ((state & FoundElement.Kir) != FoundElement.Kir && localName == "kir")
        {
            var cnt = xe.ReadElementContentAsString();
            if (cnt == "0")
            {
                arr.Add(new Test(localName, "90", 2));
                state |= FoundElement.Kir;
            }
        }
        else
        {
            xe.Read();
        }
    }
}
And here is the enum with the different states.
[Flags]
enum FoundElement
{
NONE =0,
Leaf = 1,
Sas = 2,
Kir = 4
}

Related

Lucene.Net search fails if the term is too short

I am new to Lucene, so maybe this is a technical limit I don't understand.
I have indexed a few texts and then tried to fetch the content.
If I search the text "open-source reciprocal productivity" with the query "source", I get a match.
If I use the query "sour", I also get a match. But if I use the query "sou", I don't get any match.
I am using Lucene.Net version 4.8.
Here is the code I am using to create the index:
using (var dir = FSDirectory.Open(targetDirectory))
{
Analyzer analyzer = metadata.GetAnalyzer() ; //return new StandardAnalyzer(LuceneVersion.LUCENE_48);
var indexConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
using (IndexWriter writer = new IndexWriter(dir, indexConfig))
{
long entryNumber = csvRecords.Count();
long index = 0;
long lastPercentage = 0;
foreach (dynamic csvEntry in csvRecords)
{
Document doc = new Document();
IDictionary<string, object> dynamicCsvEntry = (IDictionary<string, object>)csvEntry;
var indexedMetadataFiled = metadata.IdexedFields;
foreach (string headField in header)
{
if (indexedMetadataFiled.ContainsKey(headField) == false || (indexedMetadataFiled[headField].NeedToBeIndexed == false && indexedMetadataFiled[headField].NeedToBeStored == false))
continue;
var field = new Field(headField,
((string)dynamicCsvEntry[headField] ?? string.Empty).ToLower(),
indexedMetadataFiled[headField].NeedToBeStored ? Field.Store.YES : Field.Store.NO, //YES
indexedMetadataFiled[headField].NeedToBeIndexed ? Field.Index.ANALYZED : Field.Index.NO //YES
);
doc.Add(field);
}
long percentage = (long)(((decimal)index / (decimal)entryNumber) * 100m);
if ( percentage > lastPercentage && percentage % 10 == 0)
{
_consoleLogger.Information($"..indexing {percentage}%..");
lastPercentage = percentage;
}
writer.AddDocument(doc);
index++;
}
writer.Commit();
}
}
And here is the code I use to query the index:
var tokens = Regex.Split(query.Trim(), @"\W+");
BooleanQuery composedQuery = new BooleanQuery();
foreach (var field in luceneHint.FieldsToSearch)
{
foreach (string word in tokens)
{
if (string.IsNullOrWhiteSpace(word))
continue;
var termQuery = new FuzzyQuery(new Term(field.FieldName, word.ToLower() ));
termQuery.Boost = (float)field.Weight;
composedQuery.Add(termQuery, Occur.SHOULD);
}
}
var indexManager = IndexManager.Instance;
ReferenceManager<IndexSearcher> index = indexManager.Read(boundle);
int resultLimit = luceneHint?.Top ?? RESULT_LIMIT;
var results = new List<JObject>();
var searcher = index.Acquire();
try
{
Dictionary<string, FieldDescriptor> filedToRead = (luceneHint?.FieldsToRead?.Any() ?? false) ?
luceneHint.FieldsToRead.ToDictionary(item => item.FieldName, item => item) :
new Dictionary<string, FieldDescriptor>();
bool fetchEveryField = filedToRead.Count == 0;
TopScoreDocCollector collector = TopScoreDocCollector.Create(resultLimit, true);
int startPageIndex = pageIndex * itemsPerPage;
searcher.Search(composedQuery, collector);
//TopDocs topDocs = searcher.Search(composedQuery, luceneHint?.Top ?? 100);
TopDocs topDocs = collector.GetTopDocs(startPageIndex, itemsPerPage);
foreach (var scoreDoc in topDocs.ScoreDocs)
{
Document doc = searcher.Doc(scoreDoc.Doc);
dynamic result = new JObject();
foreach (var field in doc.Fields)
if (fetchEveryField || filedToRead.ContainsKey(field.Name))
result[field.Name] = field.GetStringValue();
results.Add(result);
}
}
finally
{
if ( searcher != null )
index.Release(searcher);
}
return results;
I am confused: is the fact that I can't get results for the "sou" query related to the StandardAnalyzer used to build the index applying some stop words that prevent my query term from being found in the index? (Does the index stop at "source" and "sour" because those are both English words?)
P.S.: here is the Explain output, even though I don't know how to use it:
searcher.Explain(composedQuery,6) {0 = (NON-MATCH) sum of: }
Description: "sum of:"
IsMatch: false
Match: false
Value: 0
The documentation for FuzzyQuery (this link is for Lucene.Net 3.0.3) points out that it uses the default minimumSimilarity value of 0.5: https://lucenenet.apache.org/docs/3.0.3/d0/db9/class_lucene_1_1_net_1_1_search_1_1_fuzzy_query.html
minimumSimilarity - a value between 0 and 1 to set the required similarity between the query term and the matching terms. For example, for a minimumSimilarity of 0.5 a term of the same length as the query term is considered similar to the query term if the edit distance between both terms is less than length(term) * 0.5
So it matches "source" when the query is "sour": removing "ce" requires two edits, so the edit distance is 2, which is no more than length("sour") * 0.5 = 2. Matching "source" to "sou", however, would need 3 edits, so it is not a match. (In Lucene.Net 4.8, which the question uses, FuzzyQuery is configured with a maximum number of edits rather than minimumSimilarity; the default maximum is 2 edits, which gives the same result here.)
You should see the same document match even if you search for something like "bounce" or "sauce", since those are also within two edits of "source".
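The edit counts above can be checked with a plain Levenshtein distance. The sketch below only checks the arithmetic; it is not how Lucene evaluates a FuzzyQuery internally.

```csharp
using System;

public static class EditDistance
{
    // Classic Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
            prev[j] = j; // turning "" into the first j chars of b takes j insertions

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(prev[j] + 1,      // deletion
                             curr[j - 1] + 1), // insertion
                    prev[j - 1] + cost);       // substitution or match
            }
            (prev, curr) = (curr, prev); // the row just computed becomes prev
        }
        return prev[b.Length];
    }
}
```

With this, "sour" and "sauce" are each 2 edits from "source", "bounce" is also 2 (b→s, n→r), and "sou" is 3, which is over a limit of 2.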

Finding all identifiers containing part of the token

I know I can get a string from resources using
Resources.GetIdentifier(token, "string", ctx.ApplicationContext.PackageName)
(sorry, this is in C#, it's part of a Xamarin.Android project).
I know that if my elements are called foo_1, foo_2, foo_3, then I can iterate and grab the strings using something like
var myList = new List<string>();
for(var i = 0; i < 4; ++i)
{
var id = AppContent.GetIdentifier(token + i.ToString(), "string", "package_name");
if (id != 0)
myList.Add(AppContext.GetString(id));
}
My issue is that my token names all begin with "posn." (posn can denote the position of anything, so you can have "posn.left_arm" and "posn.brokenose"). I want to be able to add to the list of posn elements, so I can't really store a list of the parts after the period. I can't use a string-array for this either (for a specific reason, I can't do that).
Is there a way I can use something akin to "posn.*" in the GetIdentifier call to return the ids?
You can use some reflection to get what you want. It is not pretty at all, but it works. The reflection part is based on https://gist.github.com/atsushieno/4e66da6e492dfb6c1dd0
private List<string> _stringNames;
private IEnumerable<int> GetIdentifiers(string contains)
{
if (_stringNames == null)
{
var eass = Assembly.GetExecutingAssembly();
Func<Assembly, Type> f = ass =>
ass.GetCustomAttributes(typeof(ResourceDesignerAttribute), true)
.OfType<ResourceDesignerAttribute>()
.Where(ca => ca.IsApplication)
.Select(ca => ass.GetType(ca.FullName))
.FirstOrDefault(ty => ty != null);
var t = f(eass) ??
AppDomain.CurrentDomain.GetAssemblies().Select(ass => f(ass)).FirstOrDefault(ty => ty != null);
if (t != null)
{
var strings = t.GetNestedTypes().FirstOrDefault(n => n.Name == "String");
if (strings != null)
{
var fields = strings.GetFields();
_stringNames = new List<string>();
foreach (var field in fields)
{
_stringNames.Add(field.Name);
}
}
}
}
if (_stringNames != null)
{
var names = _stringNames.Where(s => s.Contains(contains));
foreach (var name in names)
{
yield return Resources.GetIdentifier(name, "string", ComponentName.PackageName);
}
}
}
Then somewhere in your Activity you could do:
var ids = GetIdentifiers("action").ToList();
That will give you the ids of all string resources whose names contain the string "action".
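The core of the reflection approach is enumerating the fields of the generated string-resource class and filtering them by name. Below is a self-contained sketch of just that step, using a prefix match as the question asks for "posn.*". FakeResource is a hypothetical stand-in for the generated resource class and its nested String type; the field names and ids are invented. The posn_ prefix assumes the generated field names use an underscore where the resource names use a period, so check your own generated class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for the generated resource class: one public
// static int field per string resource (names and ids are invented).
public static class FakeResource
{
    public static class String
    {
        public const int posn_left_arm = 0x7f010001;
        public const int posn_brokenose = 0x7f010002;
        public const int app_name = 0x7f010003;
    }
}

public static class ResourceLookup
{
    // Returns name -> id for every public static int field of resourceClass
    // whose name starts with the given prefix (ordinal, case-sensitive).
    public static Dictionary<string, int> FieldsStartingWith(Type resourceClass, string prefix)
    {
        return resourceClass
            .GetFields(BindingFlags.Public | BindingFlags.Static)
            .Where(f => f.FieldType == typeof(int)
                        && f.Name.StartsWith(prefix, StringComparison.Ordinal))
            .ToDictionary(f => f.Name, f => (int)f.GetValue(null));
    }
}
```

Because the fields in this sketch already hold the ids, GetValue returns them directly. The answer's version looks each name up again through GetIdentifier, which also works.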

How to read this part of the web XML file?

I am working on an XML reader that shows the results in labels.
I want to read the node called "Opmerking", which sits inside "Opmerkingen".
An example:
<VertrekkendeTrein>
<RitNummer>4085</RitNummer>
<VertrekTijd>2014-06-13T22:00:00+0200</VertrekTijd>
<EindBestemming>Rotterdam Centraal</EindBestemming>
<TreinSoort>Sprinter</TreinSoort>
<RouteTekst>A'dam Sloterdijk, Amsterdam C., Duivendrecht</RouteTekst>
<Vervoerder>NS</Vervoerder>
<VertrekSpoor wijziging="false">4</VertrekSpoor>
<Opmerkingen>
<Opmerking> Rijdt vandaag niet</Opmerking>
</Opmerkingen>
</VertrekkendeTrein>
"Opmerkingen" is not always there; it changes all the time. The code I use now:
XmlNodeList nodeList = xmlDoc.SelectNodes("ActueleVertrekTijden/VertrekkendeTrein/*");
and:
foreach (XmlNode nodelist2 in nodeList)
{
if (i < 17) //4
{
switch (nodelist2.Name)
{
case "VertrekTijd": string kuttijd4 = (nodelist2.InnerText);
var res4 = Regex.Match(kuttijd4, @"\d{1,2}:\d{1,2}").Value;
lblv4.Text = Convert.ToString(res4); break;
case "TreinSoort": lblts4.Text = (nodelist2.InnerText); break;
case "RouteTekst": lblvia4.Text = (nodelist2.InnerText); break;
case "VertrekSpoor": lbls4.Text = (nodelist2.InnerText); i++; break;
}
}
}
How can I read the "Opmerking" part and handle it in a case?
I tried it a few times, but it failed.
I also tried:
case "Opmerking": var texeliseeneiland1 = (nodelist2.InnerText); if (texeliseeneiland1 == null) { } else { lblop1.Text = texeliseeneiland1; lblop1.Font = new Font(lblop1.Font.FontFamily, 17); lblop1.Visible = true; picop1.Visible = true; }; break;
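Does anyone know the answer?
Just extend your logic with a check for whether the current node has child nodes and, if so, read and process them. (Your XPath ActueleVertrekTijden/VertrekkendeTrein/* only selects the direct children of VertrekkendeTrein, so Opmerking, which is a grandchild, never reaches your switch; it is a child of the Opmerkingen node.)
if (nodelist2.HasChildNodes)
{
    // j rather than i, so it cannot clash with the i your loop already uses
    for (int j = 0; j < nodelist2.ChildNodes.Count; j++)
    {
        var childNode = nodelist2.ChildNodes[j];
        // do whatever you need to display the contents of the child node,
        // e.g. when childNode.Name == "Opmerking", show childNode.InnerText
    }
}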
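I also recommend considering LINQ to XML, or at least refactoring the code you shared. With LINQ to XML it might be as easy as this (here nodelist2 would have to be an XElement rather than an XmlNode, because Descendants is a LINQ to XML method):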
var temp = from remarkNode in nodelist2.Descendants("Opmerking")
select remarkNode.Value;
Load the XML content into an XDocument object and loop through it.
Example: reading it from a file:
var doc = XDocument.Load("C:/test.xml");
foreach (var xe in doc.Descendants("Opmerking"))
{
var value = xe.Value;
//Do your job with value
}
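Below is a sketch of the LINQ to XML route for the data in the question. It assumes the document root is ActueleVertrekTijden, as the question's XPath suggests, and it leaves out the label handling. Descendants("Opmerking") returns an empty sequence when Opmerkingen is absent, so trains without remarks need no special case.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class Departures
{
    // For each VertrekkendeTrein, returns its RitNummer together with the
    // trimmed text of every Opmerking under it (empty list if there are none).
    public static List<(string RitNummer, List<string> Opmerkingen)> Read(XDocument doc)
    {
        return doc.Descendants("VertrekkendeTrein")
            .Select(train => (
                RitNummer: (string)train.Element("RitNummer") ?? "", // null-safe element read
                Opmerkingen: train.Descendants("Opmerking")
                                  .Select(o => o.Value.Trim())
                                  .ToList()))
            .ToList();
    }
}
```

Each entry can then be mapped onto the labels. For example, show the remark label only when Opmerkingen.Count is greater than 0.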

How to check if a node has a single child element which is empty?

I have the following code:
XDocument doc = XDocument.Parse(input);
var nodes = doc.Element(rootNode)
.Descendants()
.Where(n =>
(n.Value != "0"
&& n.Value != ".00"
&& n.Value != "false"
&& n.Value != "")
|| n.HasElements)
.Select(n => new
{
n.Name,
n.Value,
Level = n.Ancestors().Count() - 1,
n.HasElements
});
var output = new StringBuilder();
foreach (var node in nodes)
{
if (node.HasElements)
{
output.AppendLine(new string(' ', node.Level) + node.Name.ToString() + ":");
}
else
{
    // leaf-node handling (empty in the question)
}
}
My problem is that when a parent node has only one child node and that child is empty, I need to insert one extra blank line. I could not figure out how to check whether the only child is empty.
I can get the number of descendants using Descendants = n.Descendants().Count(), but I do not see how I can test whether that only child is empty.
My understanding is that you need all of the parent nodes that have exactly one child node, where that child node is empty.
Here's a simple test that does this. It doesn't use your example specifically, but it accomplishes the task. If you show what your XML looks like, I can try to adapt my example to your post, in case the code below is not easily transplanted into your project.
(This is taken from a console app, but the query that actually gets the nodes should work on its own.)
static void Main(string[] args)
{
var xml = @"<root><child><thenode>hello</thenode></child><child><thenode></thenode></child></root>";
XDocument doc = XDocument.Parse(xml);
var parentsWithEmptyChild = doc.Element("root")
.Descendants() // gets all descendants regardless of level
.Where(d => string.IsNullOrEmpty(d.Value)) // find only ones with an empty value
.Select(d => d.Parent) // Go one level up to parents of elements that have empty value
.Where(d => d.Elements().Count() == 1); // Of those that are parents take only the ones that just have one element
parentsWithEmptyChild.ToList().ForEach(Console.WriteLine); // ForEach exists on List<T>, not on IEnumerable<T>
Console.ReadKey();
}
This returns only the second child element, the one containing a single empty node, where "empty" is assumed to mean a value of string.Empty.
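If the goal is specifically "exactly one child element, and that child is empty", the check can also be written as a small predicate applied to each parent. In the sketch below, HasSingleEmptyChild is a hypothetical helper name. "Empty" means no child elements and only whitespace text, matching the IsNullOrWhiteSpace check used in the next answer. Attributes are ignored.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlChecks
{
    // True when parent has exactly one child element and that child has
    // no child elements and no non-whitespace text (e.g. <x/> or <x></x>).
    public static bool HasSingleEmptyChild(XElement parent)
    {
        // Take(2) is enough to tell "exactly one" apart from "more than one"
        var children = parent.Elements().Take(2).ToList();
        return children.Count == 1
            && !children[0].HasElements
            && string.IsNullOrWhiteSpace(children[0].Value);
    }
}
```

In the question's loop this could be evaluated per node, e.g. as an extra property in the Select projection, instead of counting all descendants.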
I was trying to solve this problem myself, and this is what I came up with:
XDocument doc = XDocument.Parse(input);
var nodes = doc.Element(rootNode).Descendants()
.Where(n => (n.Value != "0" && n.Value != ".00" && n.Value != "false" && n.Value != "") || n.HasElements)
.Select(n => new { n.Name, n.Value, Level = n.Ancestors().Count() - 1,
n.HasElements, Descendants = n.Descendants().Count(),
FirstChildValue = n.HasElements?n.Descendants().FirstOrDefault().Value:"" });
var output = new StringBuilder();
foreach (var node in nodes)
{
if (node.HasElements)
{
output.AppendLine(new string(' ', node.Level) + node.Name.ToString() + ":");
if (0 == node.Level && 1 == node.Descendants && String.IsNullOrWhiteSpace(node.FirstChildValue))
output.AppendLine("");
}
}

Is this a good way to iterate through a .NET LinkedList and remove elements?

I'm thinking of doing the following:
for(LinkedListNode<MyClass> it = myCollection.First; it != null; it = it.Next)
{
if(it.Value.removalCondition == true)
it.Value = null;
}
What I'm wondering is whether simply setting it.Value to null actually gets rid of the node.
Setting it.Value to null will not remove the node from the list.
Here is one way:
for(LinkedListNode<MyClass> it = myCollection.First; it != null; )
{
LinkedListNode<MyClass> next = it.Next;
if(it.Value.removalCondition == true)
myCollection.Remove(it); // as a side effect it.Next == null
it = next;
}
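The same save-next-then-remove pattern can be wrapped in a reusable extension method. RemoveWhere is a hypothetical name, not a member of LinkedList<T>.

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListExtensions
{
    // Removes every node whose value matches the predicate and returns how
    // many were removed. The next node is captured before Remove, because
    // Remove detaches the node and its Next then returns null.
    public static int RemoveWhere<T>(this LinkedList<T> list, Predicate<T> match)
    {
        int removed = 0;
        LinkedListNode<T> node = list.First;
        while (node != null)
        {
            LinkedListNode<T> next = node.Next;
            if (match(node.Value))
            {
                list.Remove(node); // O(1); the list relinks the neighbours itself
                removed++;
            }
            node = next;
        }
        return removed;
    }
}
```

Usage: myCollection.RemoveWhere(x => x.removalCondition);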
Surely (with a linked list) you need to change the link.
For example, if you want to remove B from the linked list A-B-C, you need to change A's link from B to C.
I'll admit I'm not familiar with the .NET implementation of linked lists but hopefully that's a start for you.
You are changing the value pointed to by a LinkedListNode; beware that your list will contain a hole (empty node) now.
Instead of A - B - C you are going to have A - null - C, if you "delete" B. Is that what you want to achieve?
If you can convert to using List<> rather than LinkedList<>, then you can use the RemoveAll() method. Pass a lambda like this:
List<string> list = new List<string>()
{
"Fred","Joe","John"
};
list.RemoveAll((string val) =>
{
return (0 == val.CompareTo("Fred"));
});
RemoveAll is a method of List<T> itself; the ToList() call below is the LINQ extension involved.
If you can't convert to using a list, then you can use the ToList() method to convert it. But you'll then have to do some clear and insertion operations, like this:
LinkedList<string> str = new LinkedList<string>();
str.AddLast("Fred");
str.AddLast("Joe");
str.AddLast("John");
List<string> ls = str.ToList();
ls.RemoveAll((string val) => val.CompareTo("Fred") == 0);
str.Clear();
ls.ForEach((string val) => str.AddLast(val));
If all this still isn't palatable, then try making a copy of the LinkedList, like this:
LinkedList<string> str = new LinkedList<string>();
str.AddLast("Fred");
str.AddLast("Joe");
str.AddLast("John");
LinkedList<string> strCopy = new LinkedList<string>(str);
str.Clear();
foreach (var val in strCopy)
{
if (0 != val.CompareTo("Fred"))
{
str.AddLast(val);
}
}
I hope that helps.
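I assume something like this is required (note, though, that in .NET LinkedListNode<T>.Next and Previous are read-only properties, so relinking the neighbours by hand like this does not compile; LinkedList<T>.Remove does the relinking for you):
for ( LinkedListNode<MyClass> it = myCollection.First; it != null; it = it.Next ) {
    if ( it.Value.removalCondition == true ) {
        if ( it.Previous != null && it.Next != null ) {
            it.Next.Previous = it.Previous;
            it.Previous.Next = it.Next;
        } else if ( it.Previous != null ) {
            it.Previous.Next = it.Next;
        } else if ( it.Next != null ) {
            it.Next.Previous = it.Previous;
        }
        it.Value = null;
    }
}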
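As far as I understand, you want to iterate over the linked list with a for loop even though it also contains nulls, so you can use the following:
for (LinkedListNode<string> node = a.First; node != null; node = node.Next)
{
    // node.Value may be null here; comparing node with null (rather than
    // a.Last.Next, which is always null) also works for an empty list
}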
