How to read really big KML File - c#
I have code that successfully reads a KML file using XDocument. However, the KML file is now significantly larger, and I can't run the program without getting a System.OutOfMemoryException. Can anyone help me convert this code to a different reader so that I no longer get this exception?
Here's what worked when the kml file was smaller:
Dictionary<string, List<string>> CountyCoordinates = new Dictionary<string, List<string>>();
List<string> locationList = new List<string>();
var doc = XDocument.Load("gadm36_USA.kml");
XNamespace ns = "http://www.opengis.net/kml/2.2";
var result = doc.Root.Descendants(ns + "Placemark");
List<XElement> extendedDatas = doc.Descendants(ns + "ExtendedData").ToList();
foreach (XElement extendedData in extendedDatas)
{
List<XElement> simpleFields = extendedData.Elements(ns + "SimpleField").ToList();
}
foreach (XElement xmlInfo in result)
{
var region = xmlInfo.Element(ns + "ExtendedData").Element(ns + "SchemaData").Value;
List<XElement> simpleFields = xmlInfo.Element(ns + "ExtendedData").Element(ns + "SchemaData").Elements(ns + "SimpleField").ToList();
//var country = region.Element(ns + "SimpleData").Value;
//var state = region.Element(ns + "SimpleData");
//var cityCounty = region.Element(ns + "SimpleData");
//<Polygon><outerBoundaryIs><LinearRing><coordinates>
locationList = xmlInfo.Element(ns + "MultiGeometry").Element(ns + "Polygon").Element(ns + "outerBoundaryIs").Element(ns + "LinearRing").Element(ns + "coordinates").Value.Split(' ').ToList();
region = region.ToLower();
region = Regex.Replace(region, @"[^\w]", string.Empty);
CountyCoordinates.Add(region, locationList);
}
return CountyCoordinates;
Here is an example of the KML file:
<Placemark>
<Style><LineStyle><color>ff0000ff</color></LineStyle><PolyStyle><fill>0</fill></PolyStyle></Style>
<ExtendedData><SchemaData schemaUrl="#gadm36_USA_2">
<SimpleData name="NAME_0">United States</SimpleData>
<SimpleData name="NAME_1">Arizona</SimpleData>
<SimpleData name="NAME_2">Pima</SimpleData>
</SchemaData></ExtendedData>
<MultiGeometry><Polygon><outerBoundaryIs><LinearRing><coordinates>-112.539321899414,31.7949981689454 -112.604850769043,31.8155326843262 -112.626457214355,31.8217906951905 -112.635643005371,31.8249397277833 -112.753486633301,31.8611507415771 -112.9423828125,31.9185504913331 -113.081657409668,31.9619903564454 -113.08381652832,31.9624309539794 -113.33349609375,32.0400009155274 -113.333587646484,32.0450096130371 -113.333930969238,32.0983505249024 -113.333992004394,32.1020011901855 -113.33381652832,32.3513298034668 -113.333892822266,32.4205894470215 -113.333686828613,32.5053520202638 -113.14338684082,32.5051498413086 -113.133605957031,32.5052490234376 -112.963966369629,32.504539489746 -112.929733276367,32.5048217773438 -112.872100830078,32.5048294067383 -112.572570800781,32.5058212280275 -112.202919006348,32.507080078125 -112.099632263184,32.5076789855958 -111.793571472168,32.5066299438478 -111.789756774902,32.5066184997559 -111.756599426269,32.506549835205 -111.73974609375,32.5069694519043 -111.721809387207,32.5064697265625 -111.672340393066,32.5063400268555 -111.65495300293,32.5067405700683 -111.587532043457,32.5069808959962 -111.567413330078,32.5069007873536 -111.567436218262,32.5018920898438 -111.471229553223,32.5019416809083 -111.464157104492,32.5018997192383 -111.446762084961,32.5022697448731 -111.262466430664,32.5030517578125 -111.222785949707,32.5027809143066 -111.20539855957,32.5026588439942 -111.154846191406,32.5027503967285 -111.154747009277,32.5114097595215 -111.098213195801,32.5118904113769 -111.063407897949,32.512062072754 -110.950866699219,32.5124397277833 -110.854629516602,32.5119590759278 -110.840507507324,32.5113601684571 -110.840469360352,32.513641357422 -110.756187438965,32.5145797729493 -110.704528808594,32.5144500732423 -110.694198608398,32.5143318176271 -110.684951782227,32.5146789550781 -110.548492431641,32.5139198303223 -110.448432922363,32.5144195556642 -110.448112487793,32.474781036377 -110.447952270508,32.4273910522462 
-110.447196960449,32.270149230957 -110.446952819824,32.2551002502443 -110.443702697754,32.2550506591798 -110.44425201416,32.1711921691896 -110.444358825684,32.1657218933105 -110.446006774902,32.0804901123048 -110.448188781738,32.0800704956055 -110.448440551758,32.0668487548828 -110.448196411133,32.05179977417 -110.447708129883,31.9929695129395 -110.446952819824,31.9209003448487 -110.446708679199,31.9053897857667 -110.448127746582,31.7758693695069 -110.448387145996,31.7621917724611 -110.448165893555,31.7462196350099 -110.448455810547,31.7307109832764 -110.534072875977,31.7309474945069 -110.616989135742,31.7306327819824 -110.66438293457,31.7303009033204 -110.669212341309,31.7308082580568 -110.68376159668,31.7305297851564 -110.690216064453,31.7306003570557 -110.704216003418,31.7307624816895 -110.794136047363,31.7308502197266 -110.852279663086,31.7310104370118 -110.851821899414,31.7255306243896 -110.871200561523,31.7257328033448 -110.890579223633,31.7254600524903 -110.955726623535,31.7247200012206 -111.003646850586,31.7247009277344 -111.161399841309,31.7241706848145 -111.161209106445,31.6388511657716 -111.161590576172,31.5507926940918 -111.159545898438,31.5402812957765 -111.16081237793,31.522029876709 -111.263381958008,31.5218391418457 -111.298286437988,31.5216102600098 -111.365417480469,31.5211029052736 -111.364036560059,31.4234199523926 -111.406181335449,31.4369182586671 -111.440193176269,31.4478206634521 -111.448059082031,31.4503402709962 -111.534820556641,31.4781227111816 -111.618415832519,31.5049114227295 -111.687202453613,31.5267524719238 -111.700218200684,31.5308895111085 -111.769149780273,31.5527782440186 -111.952102661133,31.6104793548585 -112.122657775879,31.664270401001 -112.223472595215,31.6960582733155 -112.539321899414,31.7949981689454</coordinates></LinearRing></outerBoundaryIs></Polygon></MultiGeometry>
</Placemark>
Edit: this is what I've done so far. It successfully does what I want, but it is very computationally expensive; it often uses 100% of my CPU and crashes the program.
public Dictionary<string, List<string>> LoadZipBoundaries()
{
Dictionary<string, List<string>> CountyCoordinates = new Dictionary<string, List<string>>();
List<string> locationList = new List<string>();
string zip = null;
using (XmlReader reader = XmlReader.Create("utah_zcta.kml"))
{
reader.MoveToContent();
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element)
{
if (reader.Name == "SchemaData")
{
XElement el = XNode.ReadFrom(reader) as XElement;
zip = el.Value.Replace("\n\t\t", ",").Split(',')[1];
}
if (reader.Name == "MultiGeometry")
{
XElement coord = XNode.ReadFrom(reader) as XElement;
locationList = coord.Value.Split(' ').ToList();
CountyCoordinates.Add(zip, locationList);
}
}
}
}
return CountyCoordinates;
}
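A minimal sketch of one way to keep memory flat: stream the file with XmlReader and materialize only one Placemark at a time, instead of loading whole subtrees by name. The class and method names (KmlStreaming, StreamPlacemarks, LoadBoundaries) are made up for illustration, and the key is built the same way as in the original XDocument code (SchemaData text, lower-cased, non-word characters stripped). It assumes the file declares the KML 2.2 namespace shown above and, like the original, only takes the first coordinates element of each Placemark.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class KmlStreaming
{
    static readonly XNamespace Kml = "http://www.opengis.net/kml/2.2";

    // Yields each <Placemark> as a small XElement; only one is in memory at a time.
    public static IEnumerable<XElement> StreamPlacemarks(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Placemark")
            {
                // ReadFrom consumes the element and leaves the reader on the node
                // after it, so Read() must not be called again in this branch.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    public static Dictionary<string, string[]> LoadBoundaries(TextReader input)
    {
        var result = new Dictionary<string, string[]>();
        char[] separators = { ' ', '\n', '\r', '\t' };
        using (XmlReader reader = XmlReader.Create(input))
        {
            foreach (XElement placemark in StreamPlacemarks(reader))
            {
                XElement schema = placemark.Descendants(Kml + "SchemaData").FirstOrDefault();
                XElement coords = placemark.Descendants(Kml + "coordinates").FirstOrDefault();
                if (schema == null || coords == null) continue;

                // Same key as the XDocument version: lower-case, word characters only.
                string key = Regex.Replace(schema.Value.ToLowerInvariant(), @"[^\w]", string.Empty);

                // Indexer instead of Add: a repeated key overwrites rather than throws.
                result[key] = coords.Value.Split(separators, StringSplitOptions.RemoveEmptyEntries);
            }
        }
        return result;
    }
}
```

Usage would be something like KmlStreaming.LoadBoundaries(File.OpenText("gadm36_USA.kml")). Storing string[] rather than List<string> is just a choice to avoid an extra copy; the dictionary of all coordinates can still be large, so if memory is still tight, processing each Placemark inside the loop instead of collecting them may help.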
Related
Adding a new node (taking value from another xml file) inside a certain parent node?
I'm trying to create a program with the following steps:
1) Get all XML files from a user-given path.
2) Open each of the files (if any) and search for <institution> nodes in the format <funding-source><institution-wrap><institution>...</institution></institution-wrap></funding-source>.
3) Get the value of each <institution> node and search for that exact value in the database XML inside the <skosxl:literalForm xml:lang="..."> nodes.
4) If found, get the attribute value of its parent node <skos:Concept rdf:about="..."> minus the string http://dx.doi.org/.
5) Add a node <institution-id institution-id-type="fundref"> to the XML file after the <institution> node, with a value like <funding-source><institution-wrap><institution>...</institution><institution-id institution-id-type="fundref">VALUE of the rdf:about attribute</institution-id></institution-wrap></funding-source>.
Here is a sample input file and the desired output for that file.
What I have tried:
string pathToUpdatedFile = @"D:\test\Jobs";
var files = Directory.GetFiles(pathToUpdatedFile, "*.xml");
foreach (var file in files)
{
    var fundingDoc = XDocument.Load(@"D:\test\database.xml");
    XNamespace rdf = XNamespace.Get("http://www.w3.org/1999/02/22-rdf-syntax-ns#");
    XNamespace skosxl = XNamespace.Get("http://www.w3.org/2008/05/skos-xl#");
    XNamespace skos = XNamespace.Get("http://www.w3.org/2004/02/skos/core#");
    var targetAtt = fundingDoc.Descendants(skos + "Concept").Elements(skosxl + "prefLabel")
        .ToLookup(s => (string)s.Element(skosxl + "literalForm"), s => (string)s.Parent.Attribute(rdf + "about"));
    XDocument outDoc = XDocument.Parse(File.ReadAllText(file), LoadOptions.PreserveWhitespace);
    foreach (var f in outDoc.Descendants("funding-source").Elements("institution-wrap"))
    {
        if (f.Element("institution-id") == null)
        {
            var name = (string)f.Element("institution");
            var x = targetAtt[name].FirstOrDefault(); // just take the first one
            if (x != null)
                f.Add(new XElement("institution-id", new XAttribute("institution-id-type", "fundref"), x.Substring(@"http://dx.doi.org/".Length)));
        }
    }
    outDoc.Save(file);
}
Console.ReadLine();
But it is not working. Can somebody help?
See code below:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Data;
using System.Data.OleDb;
namespace ConsoleApplication31
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        const string DATABASE = @"c:\temp\test1.xml";
        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            XElement article = doc.Root;
            XNamespace ns = article.GetDefaultNamespace();
            XDocument docDatabase = XDocument.Load(DATABASE);
            XElement rdf = docDatabase.Root;
            XNamespace nsSkosxl = rdf.GetNamespaceOfPrefix("skosxl");
            XNamespace nsRdf = rdf.GetNamespaceOfPrefix("rdf");
            List<XElement> prefLabels = rdf.Descendants(nsSkosxl + "prefLabel").ToList();
            Dictionary<string, List<string>> dictLabels = prefLabels
                .GroupBy(x => (string)x.Descendants(nsSkosxl + "literalForm").FirstOrDefault(), y => (string)y.Element(nsSkosxl + "Label").Attribute(nsRdf + "about"))
                .ToDictionary(x => x.Key, y => y.ToList());
            List<XElement> fundingSources = article.Descendants(ns + "funding-source").ToList();
            foreach (XElement fundingSource in fundingSources)
            {
                XElement institutionWrap = fundingSource.Element(ns + "institution-wrap");
                string institution = (string)fundingSource;
                if (dictLabels.ContainsKey(institution))
                {
                    institutionWrap.Add(new XElement("institution-id", new object[] {
                        new XAttribute("institution-id-type", "fundref"),
                        dictLabels[institution]
                    }));
                }
                else
                {
                    Console.WriteLine("Dictionary doesn't contain : '{0}'", institution);
                }
            }
            Console.ReadLine();
        }
    }
}
I think this is what you are looking for (I modified jdweng's code a little):
const string FILENAME = @"c:\temp\test.xml";
const string DATABASE = @"c:\temp\test1.xml";
public static void Main(string[] args)
{
    XDocument doc = XDocument.Load(FILENAME);
    XElement article = doc.Root;
    XNamespace ns = article.GetDefaultNamespace();
    XDocument docDatabase = XDocument.Load(DATABASE);
    XElement rdf = docDatabase.Root;
    XNamespace nsSkosxl = rdf.GetNamespaceOfPrefix("skosxl");
    XNamespace nsSkos = rdf.GetNamespaceOfPrefix("skos");
    XNamespace nsRdf = rdf.GetNamespaceOfPrefix("rdf");
    List<XElement> prefLabels = rdf.Descendants(nsSkos + "Concept").ToList();
    Dictionary<string, List<string>> dictLabels = prefLabels
        .GroupBy(x => (string)x.Descendants(nsSkosxl + "literalForm").FirstOrDefault(), y => (string)y.Parent.Element(nsSkos + "Concept").Attribute(nsRdf + "about").Value.Substring(18))
        .ToDictionary(x => x.Key, y => y.ToList());
    List<XElement> fundingSources = article.Descendants(ns + "funding-source").ToList();
    foreach (XElement fundingSource in fundingSources)
    {
        XElement institutionWrap = fundingSource.Element(ns + "institution-wrap");
        string institution = (string)fundingSource;
        if (dictLabels.ContainsKey(institution))
        {
            institutionWrap.Add(new XElement("institution-id", new object[] {
                new XAttribute("institution-id-type", "fundref"),
                dictLabels[institution]
            }));
        }
    }
    doc.Save(FILENAME);
    Console.WriteLine("Done");
    Console.ReadLine();
}
Deleted xml elements are not being removed
In the following snippet the final string "removed" still has the deleted elements. I am converting to byte[] first because the final implementation's input will be in that form. I loop through the saved nodes and delete them from the loaded element. What gives? All ideas greatly appreciated.
static void Main(string[] args)
{
    var str = @"
<CustomerCareReport xmlns='http://schemas.datacontract.org/2004/07/Foo.Bar.Frameworks.Reporting.Stack.Infrastructure.DataObjects.FireEms' xmlns:i='http://www.w3.org/2001/XMLSchema-instance'>
  <OtherInformation>
    <Documents xmlns:a='http://schemas.datacontract.org/2004/07/Foo.Bar.Frameworks.Reporting.Stack.Infrastructure.DataObjects.Documents'>
      <a:Document>
        <a:Contents>Z2dn</a:Contents>
        <a:Description>yoyo</a:Description>
        <a:Id>1</a:Id>
      </a:Document>
      <a:Document>
        <a:Contents>ZmhmaA0KZmZmDQpmamZqZmpmDQo=</a:Contents>
        <a:Description>test 2</a:Description>
        <a:Id>2</a:Id>
      </a:Document>
    </Documents>
  </OtherInformation>
</CustomerCareReport>
";
    var element = XElement.Parse(str);
    XNamespace ns = element.Name.Namespace;
    // convert to bytes first because final implementation input is byte array
    byte[] bytes = Encoding.ASCII.GetBytes(str);
    using (var stream = new MemoryStream(bytes))
    {
        var deleteMarked = new List<XElement>();
        var xelement = XElement.Load(stream);
        var docs = xelement.Descendants(ns + "Documents");
        foreach (XElement doc in docs.Descendants())
        {
            if (doc.Name.LocalName.Equals("Document"))
            {
                var delDoc = doc;
                deleteMarked.Add(delDoc);
            }
        }
        foreach (var deldoc in deleteMarked)
        {
            foreach (var node in deldoc.Elements())
            {
                node.Remove();
            }
        }
        if (deleteMarked.Count > 0)
        {
            stream.Close();
            bytes = stream.ToArray();
        }
        var removed = System.Text.Encoding.Default.GetString(bytes);
        Console.WriteLine(removed);
    }
}
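One likely explanation, offered as a sketch rather than a confirmed diagnosis: the removals only change the in-memory XElement tree, while stream.ToArray() hands back the original input bytes, so the printed string never reflects the edits. Calling Remove() inside a foreach over Elements() can also end the enumeration early, which is why the sketch snapshots the matches with ToList() first. The names DocumentStripper and RemoveElements are hypothetical, and the sketch removes the matching elements themselves rather than only their children.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class DocumentStripper
{
    // Returns new XML bytes with every element whose local name matches removed.
    public static byte[] RemoveElements(byte[] xmlBytes, string localName)
    {
        XElement root;
        using (var input = new MemoryStream(xmlBytes))
        {
            root = XElement.Load(input);
        }

        // ToList() snapshots the matches so removing nodes does not disturb
        // the lazy Descendants() enumeration.
        var matches = root.Descendants()
                          .Where(e => e.Name.LocalName == localName)
                          .ToList();
        foreach (XElement match in matches)
        {
            match.Remove();
        }

        // The tree changed, not the input bytes, so serialize the tree again.
        using (var output = new MemoryStream())
        {
            root.Save(output);
            return output.ToArray();
        }
    }
}
```

With the snippet above this would be called as DocumentStripper.RemoveElements(bytes, "Document"). The returned bytes are whatever encoding XElement.Save writes to the stream (UTF-8 by default), so they should be decoded with a matching encoding rather than Encoding.Default.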
Issue with processing an xml file in c#
I have a 5 GB XML file which needs to be processed, so I used XmlReader, but I am having a hard time processing it. I have the following part, and I want to take the values of levelid, levelUl, primaryCode and primaryPower from the sections under <ab:pin id="1022">, <ab:pin id="1023">, <ab:pin id="1024"> etc. The problem I am facing is that there are different sections with the same element names (levelid, levelUl, primaryCode, primaryPower etc.) but different values, and I am getting incorrect values. How do I correct my code?
The following is part of the 5 GB XML file:
<ab:pin id="1022">
  <ab:attributes>
    <ab:levelid>1022</ab:levelid>
    <ab:levelUl>9837</ab:levelUl>
    <ab:primaryCode>25</ab:primaryCode>
    <ab:primaryPower>330</ab:primaryPower>
. . . .
<ab:pin id="1023">
  <ab:attributes>
    <ab:levelid>1023</ab:levelid>
    <ab:levelUl>9833</ab:levelUl>
    <ab:primaryCode>35</ab:primaryCode>
    <ab:primaryPower>340</ab:primaryPower>
The following is the code I have so far:
XmlReader reader = XmlReader.Create(path);
reader.MoveToContent();
string nsUn = reader.LookupNamespace("ab");
while (!reader.EOF)
{
    reader.ReadToFollowing("levelid", nsUn);
    if (!reader.EOF)
    {
        XElement cell = (XElement)XElement.ReadFrom(reader);
        level_id = cell.Value;
        ins3gericson.Add(new TestField("level_id", level_id, 2));
    }
    reader.ReadToFollowing("levelUl", nsUn);
    if (!reader.EOF)
    {
        XElement cell = (XElement)XElement.ReadFrom(reader);
        ins3gericson.Add(new TestField("levelUl", cell.Value, 2));
    }
    reader.ReadToFollowing("primaryCode", nsUn);
    if (!reader.EOF)
    {
        XElement cell = (XElement)XElement.ReadFrom(reader);
        ins3gericson.Add(new TestField("primaryCode", cell.Value, 2));
    }
    reader.ReadToFollowing("primaryPower", nsUn);
    if (!reader.EOF)
    {
        XElement cell = (XElement)XElement.ReadFrom(reader);
        ins3gericson.Add(new TestField("primaryPower", cell.Value, 2));
    }
}
Here is my suggestion:
using (XmlReader xr = XmlReader.Create("input.xml"))
{
    xr.MoveToContent();
    XNamespace ab = xr.LookupNamespace("ab");
    while (xr.Read())
    {
        if (xr.NodeType == XmlNodeType.Element && xr.NamespaceURI == ab && xr.LocalName == "pin")
        {
            XElement pin = (XElement)XNode.ReadFrom(xr);
            var data = from atts in pin.Elements(ab + "attributes")
                       select new
                       {
                           levelid = (string)atts.Element(ab + "levelid"),
                           levelUl = (string)atts.Element(ab + "levelUl"),
                           primaryCode = (string)atts.Element(ab + "primaryCode"),
                           primaryPower = (string)atts.Element(ab + "primaryPower")
                       };
            Console.WriteLine("levelId: {0}; levelUl: {1}, ...", data.First().levelid, data.First().levelUl);
            // store/output values here
        }
    }
}
Obviously it all depends on how large the ab:pin elements are, but normally with huge XML input the individual elements fit well into memory. Be careful with the XmlReader, though: if you have adjacent ab:pin elements without any whitespace between them, the above might skip an element, so the code would need some more fine-tuning, along the lines of
using (XmlReader xr = XmlReader.Create("../../XMLFile1.xml"))
{
    xr.MoveToContent();
    XNamespace ab = xr.LookupNamespace("ab");
    while (xr.Read())
    {
        while (xr.NodeType == XmlNodeType.Element && xr.NamespaceURI == ab && xr.LocalName == "pin")
        {
            XElement pin = (XElement)XNode.ReadFrom(xr);
            var data = from atts in pin.Elements(ab + "attributes")
                       select new
                       {
                           levelid = (string)atts.Element(ab + "levelid"),
                           levelUl = (string)atts.Element(ab + "levelUl"),
                           primaryCode = (string)atts.Element(ab + "primaryCode"),
                           primaryPower = (string)atts.Element(ab + "primaryPower")
                       };
            Console.WriteLine("levelId: {0}; levelUl: {1}, ...", data.First().levelid, data.First().levelUl);
        }
    }
}
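The skipping behaviour mentioned above can be reproduced in isolation. This is a small sketch with made-up names (AdjacentElementDemo, CountNaive, CountCareful) that counts elements two ways: the naive loop calls Read() after every XNode.ReadFrom, while the careful loop only calls Read() when it did not just consume an element. It relies on XNode.ReadFrom leaving the reader positioned on the node that follows the element it consumed.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class AdjacentElementDemo
{
    // Naive pattern: Read() always runs after ReadFrom, so whatever node
    // ReadFrom left the reader on is stepped over without being inspected.
    public static int CountNaive(string xml, string localName)
    {
        int count = 0;
        using (XmlReader xr = XmlReader.Create(new StringReader(xml)))
        {
            while (xr.Read())
            {
                if (xr.NodeType == XmlNodeType.Element && xr.LocalName == localName)
                {
                    XNode.ReadFrom(xr);
                    count++;
                }
            }
        }
        return count;
    }

    // Careful pattern: after ReadFrom the current node is re-checked before
    // the reader advances, so directly adjacent elements are not lost.
    public static int CountCareful(string xml, string localName)
    {
        int count = 0;
        using (XmlReader xr = XmlReader.Create(new StringReader(xml)))
        {
            xr.MoveToContent();
            while (!xr.EOF)
            {
                if (xr.NodeType == XmlNodeType.Element && xr.LocalName == localName)
                {
                    XNode.ReadFrom(xr);
                    count++;
                }
                else
                {
                    xr.Read();
                }
            }
        }
        return count;
    }
}
```

For input with no whitespace between siblings, such as "<r><p>1</p><p>2</p><p>3</p><p>4</p></r>", the naive loop misses every second element, while both loops agree when the siblings are separated by whitespace. The nested while in the answer above achieves the same effect as the careful variant.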
Best method for comparing XML with string
I am looking for the best way to compare XML data with a string. The data is stored in an XML file called test.xml and must be compared with the NAAM descendant; if there is a match, more info from the XML must be added to a text box and a picture box. My (working) code:
var xmlDocument = XDocument.Load("test.xml"); // link XML
var key1 = xmlDocument.Descendants("NAME"); // XML file path
var key2 = xmlDocument.Descendants("TITLE"); // XML title
var key3 = xmlDocument.Descendants("BRAND"); // XML image
var key4 = xmlDocument.Descendants("TYPE"); // XML brand
var key5 = xmlDocument.Descendants("SOORT"); // XML type
var key6 = xmlDocument.Descendants("NAAM"); // XML name
List<string> file = new List<string>();
List<string> title = new List<string>();
List<string> brand = new List<string>();
List<string> type = new List<string>();
List<string> soort = new List<string>();
List<string> naam = new List<string>();
int i = 0;
foreach (var key in key1) { file.Add(key.Value.Trim()); }
foreach (var key in key2) { title.Add(key.Value.Trim()); }
foreach (var key in key3) { brand.Add(key.Value.Trim()); }
foreach (var key in key4) { type.Add(key.Value.Trim()); }
foreach (var key in key5) { soort.Add(key.Value.Trim()); }
foreach (var key in key6) { naam.Add(key.Value.Trim()); }
foreach (var Name in naam)
{
    if (textBox3.Text.ToString() == Name.ToString())
    {
        PDFLocation = file[i].ToString();
        pictureBox1.Image = pdfhandler.GetPDFthumbNail(PDFLocation);
        textBox4.Text = title[i].ToString() + "\r\n" + brand[i].ToString() + "\r\n" + type[i].ToString() + "\r\n" + soort[i].ToString() + "\r\n" + textBox3.Text + "\r\n";
    }
    i++;
}
I think this is not the best way to do it, but I can't see a better way.
Update, solution:
foreach (XElement element in xmlDocument.Descendants("PDFDATA"))
{
    if (textBox3.Text.ToString() == element.Element("NAAM").Value.Trim())
    {
        PDFLocation = element.Element("NAME").Value.ToString();
        pictureBox1.Image = pdfhandler.GetPDFthumbNail(PDFLocation);
        textBox4.Text = element.Element("TITLE").Value + "\r\n" + element.Element("BRAND").Value + "\r\n";
        break;
    }
}
Instead of thinking of the XML as a bunch of individual lists of data, it helps to think of it as objects. Then you can loop through the elements one at a time and don't need to split them up into individual lists. This not only removes duplicate code but, more importantly, creates a better abstraction of the data you are working with, which makes it easier to read and understand what the code is doing.
foreach (XElement element in xmlDocument.Descendants("PDFDATA"))
{
    if (textBox3.Text == element.Element("NAAM").Value.Trim())
    {
        PDFLocation = element.Element("NAME").Value;
        pictureBox1.Image = pdfhandler.GetPDFthumbNail(PDFLocation);
        textBox4.Text = element.Element("TITLE").Value + "\r\n"
            + element.Element("BRAND").Value + "\r\n"
            + element.Element("TYPE").Value + "\r\n";
        // access rest of properties...
    }
}
LINQ to read XML file and print results
I have the following XML file (Data.xml):
<root>
  <sitecollection name="1A">
    <site name="1B">
      <maingroup name="1C">
        <group name="1D">
        </group>
      </maingroup>
    </site>
  </sitecollection>
  <sitecollection name="2A">
    <site name="2B">
      <maingroup name="2C">
        <group name="2D">
        </group>
      </maingroup>
    </site>
  </sitecollection>
</root>
And I need to print all the child elements in this format: 1A 1B 1C 1D 2A 2B 2C 2D
I have the following code so far, which needs some adjustment. I could also change it completely if there's an easier method. Thanks for your help.
class xmlreader
{
    public static void Main()
    {
        // Xdocument to read XML file
        XDocument xdoc = XDocument.Load("Data.xml");
        var result = new System.Text.StringBuilder();
        var lv1s = from lv1 in xdoc.Descendants("sitecollection")
                   select new { sitecollection = lv1.Attribute("name").Value, maingroup = lv1.Descendants("group") };
        var lv2s = from lv2 in xdoc.Descendants("site")
                   select new { site = lv2.Attribute("name").Value, sitetittle = lv2.Descendants() };
        var lv3s = from lv3 in xdoc.Descendants("maingroup")
                   select new { maingroup = lv3.Attribute("name").Value, };
        var lv4s = from lv4 in xdoc.Descendants("group")
                   select new { grouppage = lv4.Attribute("name").Value, };
        // Loop to print results
        foreach (var lv1 in lv1s)
        {
            result.AppendLine(lv1.sitecollection);
            foreach (var lv2 in lv2s)
            {
                result.AppendLine(" " + lv2.Attribute("name").Value);
                foreach (var lv3 in lv3s)
                {
                    result.AppendLine(" " + lv3.Attribute("name").Value);
                    foreach (var lv4 in lv4s)
                    {
                        result.AppendLine(" " + lv4.Attribute("name").Value);
                    }
                }
            }
        }
    }
}
With such a uniform hierarchy, recursion can do the job with a lot less code:
void PrintNames(StringBuilder result, string indent, XElement el)
{
    // Attribute (singular) returns null when the attribute is missing
    var attr = el.Attribute("name");
    if (attr != null)
    {
        result.Append(indent);
        result.Append(attr.Value);
        result.Append(System.Environment.NewLine);
    }
    indent = indent + " ";
    foreach (var child in el.Elements())
    {
        PrintNames(result, indent, child);
    }
}
...
var sb = new StringBuilder();
PrintNames(sb, String.Empty, xdoc.Root);
How about the following: find all elements with a name attribute, then add spaces based on their depth.
var result = new System.Text.StringBuilder();
var namedElements = doc.Descendants().Where(el => el.Attribute("name") != null);
foreach (var el in namedElements)
{
    int depth = el.Ancestors().Count();
    for (int i = 0; i < depth; i++)
        result.Append(" ");
    result.Append(el.Attribute("name").Value);
    result.Append(System.Environment.NewLine);
}
NOTE: The above is from memory, so please check the syntax!
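For reference, here is a self-contained sketch of the recursive approach that can be compiled as-is. The names NamePrinter, Render and Append are made up for illustration, and the layout (one space of indentation per nesting level, starting at zero for the children of the root) is an assumption, since the exact spacing of the desired output is not clear from the question.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class NamePrinter
{
    // Renders the "name" attribute of every element below root, one per line,
    // indented by one space per nesting level.
    public static string Render(XElement root)
    {
        var sb = new StringBuilder();
        foreach (XElement child in root.Elements())
        {
            Append(sb, child, 0);
        }
        return sb.ToString();
    }

    static void Append(StringBuilder sb, XElement el, int depth)
    {
        // Attribute (singular) returns null when the attribute is absent.
        XAttribute name = el.Attribute("name");
        if (name != null)
        {
            sb.Append(' ', depth).Append(name.Value).Append('\n');
        }
        foreach (XElement child in el.Elements())
        {
            Append(sb, child, depth + 1);
        }
    }
}
```

It could be called as Console.Write(NamePrinter.Render(XDocument.Load("Data.xml").Root)). Passing the depth as an int rather than building an indent string is only a small convenience; either works.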