The C# code is below:
using (XmlReader Reader = XmlReader.Create(DocumentPath, XmlSettings))
{
while (Reader.Read())
{
switch (Reader.NodeType)
{
case XmlNodeType.Element:
//doing function
break;
case XmlNodeType.Text:
if(Reader.Value.StartsWith("_"))
{
// I want to replace the Reader.Value to new string
}
break;
case XmlNodeType.EndElement:
//doing function
break;
default:
//doing function
break;
}
}
}
I want to set a new value when the node type is XmlNodeType.Text.
There are three steps to this operation:
Load the document as an object model(XML is a hierarchy, if that makes it easier)
Alter the value in the object model
Resave the model as a file
The method of loading is up to you, but I would recommend using XDocument and the related LINQ to XML classes for smaller tasks like this. It is crazy easy, as demonstrated in this stack.
Edit - a useful quote for your scenario
XML elements can contain text content. Sometimes the content is simple
(the element only contains text content), and sometimes the content is
mixed (the contents of the element contains both text and other
elements). In either case, each chunk of text is represented as an
XText node.
From MSDN - XText class
Here is a code sample for a console application using the xml from the comments:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml.Linq;
namespace TextNodeChange
{
class Program
{
static void Main(string[] args)
{
string input = @"<Info><title>a</title><content>texttexttexttexttext<tag>TAGNAME</tag>texttexttexttexttext</content>texttexttexttexttext</Info>";
XDocument doc = XDocument.Parse(input);
Console.WriteLine("Input document");
Console.WriteLine(doc);
//get the all of the text nodes in the content element
var textNodes = doc.Element("Info").Element("content").Nodes().OfType<XText>().ToList();
//update the second text node
textNodes[1].Value = "THIS IS AN ALTERED VALUE!!!!";
Console.WriteLine();
Console.WriteLine("Output document");
Console.WriteLine(doc);
Console.ReadLine();
}
}
}
Why are 3 steps required?
Elements in an XML file have variable length. Altering a value may change the length of that part of the file and overwrite other parts. Therefore, you have to deserialize the whole document, make the changes, and save it again.
You can't replace Reader.Value; it is a read-only property. You need to use an XmlWriter to create your own XML and replace any value you want.
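To illustrate the XmlWriter suggestion, here is a minimal sketch that copies a document node by node and passes every text node through a replacement function. The class name TextNodeRewriter and the method Rewrite are made up for this example, and it deliberately skips node types such as the XML declaration, processing instructions, and DTDs, so treat it as a starting point rather than a complete copier.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original question or answers.
public static class TextNodeRewriter
{
    // Copies inputXml to a new string, replacing each text node's value
    // with whatever 'replace' returns for it.
    public static string Rewrite(string inputXml, Func<string, string> replace)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(new StringReader(inputXml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Remember this before the reader visits the attributes.
                        bool isEmpty = reader.IsEmptyElement;
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, false);
                        if (isEmpty) writer.WriteEndElement();
                        break;
                    case XmlNodeType.Text:
                        // The reader's Value is read-only, so the new value goes to the writer.
                        writer.WriteString(replace(reader.Value));
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                }
            }
        }
        // The writer is disposed (and flushed) before the result is read.
        return output.ToString();
    }
}
```

For the case in the question, the call could look like Rewrite(xml, v => v.StartsWith("_") ? "new string" : v). Because nothing is held in memory beyond the current node, this approach should also suit files that are too large for XDocument.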
Related
I have an xml document which has no root node. It looks like this:
<?xml version="1.0"?>
<Line>
<City>Paris</City>
<Country>France</Country>
</Line>
<Line>
<City>Lissabon</City>
<Country>Spain</Country>
</Line>
Now I want to read it Line by Line and write the contents to a database. However, XmlDocument seems to insist that a root node must exist. How can I process this file?
If you want to parse it as an XML document, you can add a root node like Denis proposed in his comment.
If you would just like to read each line and write it to a database, you can handle the file like an ordinary (text) file and read its contents line by line using a StreamReader.
This would look something like this:
string line;
// Read the file and process it line by line.
// The using statement closes the reader even if processing throws.
using (var reader = new StreamReader(FILEPATH))
{
    while ((line = reader.ReadLine()) != null)
    {
        // Depending on what you need, you could strip the XML tags
        // and write the line to the database
    }
}
You could try something like this (simple WinForms app with a button and a rich text box to display output for testing):
using System;
using System.Text;
using System.Xml;
using System.Windows.Forms;
namespace WindowsFormsApp11
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
StringBuilder sb = new StringBuilder();
XmlReaderSettings settings = new XmlReaderSettings
{
ConformanceLevel = ConformanceLevel.Fragment
};
using (XmlReader reader = XmlReader.Create(@"c:\ab\countries.xml", settings))
{
while(reader.Read())
{
if (reader.Name != "Line") // Ignore the <Line> nodes
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
sb.Append(string.Format("{0}:", reader.Name));
break;
case XmlNodeType.Text:
sb.Append(string.Format(" {0}{1}", reader.Value, Environment.NewLine));
break;
}
}
}
}
richTextBox1.Text = sb.ToString();
}
}
}
Maybe not the best solution, but you could read your XML into a List (or array) and insert the missing nodes:
// Read lines into List
var list = File.ReadLines("doc.xml").ToList();
// Insert missing nodes
list.Insert(1, "<root>"); // Use 1, because 0 is XML directive
list.Insert(list.Count, "</root>"); //Add closing tag to the end
// Create final XML string with LINQ
var xml_str = list.Aggregate("", (acc, s) => acc + s);
// Having a string, we can create, for instance, XElement (or XDocument)
var xml = XElement.Parse(xml_str);
Console.WriteLine(xml.Element("Line").Element("City").Value);
//Output: Paris
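As a possible middle ground between the approaches above, the fragment-level XmlReader can hand each Line element to LINQ to XML one at a time, so no root node has to be added and the whole file never sits in memory. This is only a sketch: the class name FragmentReader and the method ReadLines are invented for the example, and it assumes the repeated top-level element is called Line as in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class FragmentReader
{
    // Lazily yields each top-level <Line> element of a rootless XML stream.
    public static IEnumerable<XElement> ReadLines(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            // Fragment level allows several top-level elements.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Line")
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each yielded element can then be queried with (string)line.Element("City") and (string)line.Element("Country") before writing the row to the database.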
I have a large single-node XML file that I have saved as a string. I want to parse it for a specific element and output its inner text. E.g., I want to read the FrameNo element and output BINGO to a message box. The desired element appears only once in the XML document. I prefer using XmlDocument.
I have tried numerous C# XML examples but am unable to get any output.
The XML text is
<Aircraft z:Id="i1" xmlns="http://xxx.yyyyycontract.gov/2018/03/Boeing.xxxxxxxxxxxxxx.Airframe"
xmlns:i="http://www.xxxxxxx.com/2019/XMLSchema-instance"
xmlns:z="http://xxxxxxx.xxxxxxxxx.com/2005/01/Serialization/"><Timestamp i:nil="true"/>
<Uuid>00000000-0000-0000-0000-000000000000</Uuid><Comments i:nil="true"/><Facility>..........
and so on to the end of the .xml
<FrameNo>BINGO</FrameNo><WDate i:nil="true"/></Aircraft>
This is the code section in which I want the code to execute.
private void buttonLoad_Click(object sender, EventArgs e)
{
}
I think, this is self-explanatory
using System.Xml.Linq;
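// textXML holds the XML as a string, so Parse is used instead of Load
XElement root = XElement.Parse(textXML);
// The elements live in the document's default namespace
XNamespace ns = root.GetDefaultNamespace();
XElement myElement = root.Element(ns + "FrameNo");
if (myElement != null)
    myData = myElement.Value;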
Thanks to jdweng I wanted to share the final code for others to use. This will function in a method like below
private void buttonMaint_Click(object sender, EventArgs e)
{
XDocument doc = XDocument.Parse(xmlinputstr); // input string from memory or input file
XNamespace ns = doc.Root.GetDefaultNamespace();
string[] Frame = doc.Descendants(ns + "FrameNo").Select(x => (string)x).ToArray(); // selects element to read + trailing character of >
string frame = string.Join("", Frame); //converts from array to string
if (string.IsNullOrEmpty(frame)) // check for empty result
{
txtFrame.Text = "not found"; //outputs to textbox
}
else
{
txtFrame.Text = (frame); //outputs to textbox
}
}
Comments are there for clarity
You need to use the default namespace. See my xml linq solution below :
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.xml";
static void Main(string[] args)
{
string xml = File.ReadAllText(FILENAME);
XDocument doc = XDocument.Parse(xml);
XNamespace ns = doc.Root.GetDefaultNamespace();
XElement frameNo = doc.Descendants(ns + "FrameNo").FirstOrDefault();
string frame = (string)frameNo;
string[] serialNumbers = doc.Descendants(ns + "SerialNumber").Select(x => (string)x).ToArray();
}
}
}
Another weird snag has shown up. Some of the elements are named like this:
<a:SupplierServDoc>
The inner text of this element is a base64 packet. There is no problem processing the base64 packet.
The code from the above answers does output the base64 correctly but cannot handle the : in the element name. It throws a hex 3A character error.
I have this code that outputs the inner text, but not as a base64 packet. I have also looked into using a prefix to handle the :, but with worse results. I am outputting the base64 inner text as a .txt file when finished.
XNamespace ad = "http://www.mmmmmmmmmm.com";
XName k = ad + "SupplierServDoc";
string[] WING = doc.Descendants(k).Select(x => (string)x).ToArray();
string wing = string.Join("", WING);
if (string.IsNullOrEmpty(wing))
{
MessageBox.Show("a:SupplierServDoc Base 64 code not found");
}
else
{
MessageBox.Show("Test " + wing);
}
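For what it's worth, the 3A error is what LINQ to XML reports when a name containing a colon (hex 3A) is passed as a plain element name; the prefix has to be turned into its namespace first. A rough sketch of that lookup follows. The class PrefixedLookup and the method ReadPrefixed are invented for the example, and it assumes the prefix is declared on the root element, which may not hold for every document.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class PrefixedLookup
{
    // Returns the text of the first element written as prefix:localName,
    // or null when the prefix is not declared or no such element exists.
    public static string ReadPrefixed(string xml, string prefix, string localName)
    {
        XDocument doc = XDocument.Parse(xml);

        // Resolve the prefix (e.g. "a") to the namespace it is bound to.
        XNamespace ns = doc.Root.GetNamespaceOfPrefix(prefix);
        if (ns == null) return null;

        // ns + localName builds a valid XName; "a:SupplierServDoc" would not.
        return (string)doc.Descendants(ns + localName).FirstOrDefault();
    }
}
```

With the document from the question, the call would be ReadPrefixed(xmlinputstr, "a", "SupplierServDoc"), and the returned string is the base64 text exactly as it appears in the element.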
I'm trying to parse large XML file (size near about 600MB) and using
It takes a long time, and finally the entire process is aborted. The process ends with an exception.
Message: "Thread is aborted"
Method:
private string ReadXml(XmlTextReader reader, string fileName)
{
string finalXML = "";
string s1 = "";
try
{
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element: // The node is an element.
s1 += "<" + reader.Name + ">";
break;
case XmlNodeType.Text: //Display the text in each element.
s1 += reader.Value;
break;
case XmlNodeType.EndElement: //Display the end of the element.
s1 += "</" + reader.Name + ">";
break;
}
finalXML = s1;
}
}
catch (Exception ex)
{
Logger.Logger.LogMessage(ex, "File Processing error: " + fileName);
}
reader.Close();
reader.Dispose();
return finalXML;
}
And then reading and deserializing:
string finalXML = string.Empty;
XmlTextReader reader = new XmlTextReader(unzipfile);
finalXML = await ReadXml(reader, fileName);
var xmlremovenamespae = Helper.RemoveAllNamespaces(finalXML);
XmlParseObjectNew.BizData myxml = new XmlParseObjectNew.BizData();
using (StringReader sr = new StringReader(xmlremovenamespae))
{
XmlSerializer serializer = new XmlSerializer(typeof(XmlParseObjectNew.BizData));
myxml = (XmlParseObjectNew.BizData)serializer.Deserialize(sr);
}
Is there any better way to read & parse large xml file? need a suggestion.
I tried this and it works fine.
fileName = "your file path";
Try this code; it parses an XML file larger than 500 MB within a few seconds.
using (TextReader textReader = new StreamReader(fileName))
{
using (XmlTextReader reader = new XmlTextReader(textReader))
{
reader.Namespaces = false;
// YourXmlClassType is a placeholder for your own serializable class
XmlSerializer serializer = new XmlSerializer(typeof(YourXmlClassType));
parseData = (YourXmlClassType)serializer.Deserialize(reader);
}
}
The problem is, as mentioned by Jon Skeet and DiskJunky, that your dataset is simply too large to load into memory and your code is not optimized for handling this. That is why various classes are throwing an 'out of memory exception'.
First of all, string concatenation. Using simple concatenation (a + b) with multiple strings is usually a bad idea due to the way strings work. I would recommend looking up online how to handle string concatenation effectively (for example, Jon Skeet's Concatenating Strings Efficiently).
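As a small illustration of the concatenation point, the sketch below builds a string from many pieces with a StringBuilder instead of repeated +=. The class name ConcatDemo and the method JoinParts are made up for the example.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration.
public static class ConcatDemo
{
    // Appends every part to one growing buffer. Repeated 's1 += part'
    // would instead allocate a new, longer string on each iteration.
    public static string JoinParts(IEnumerable<string> parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

This only reduces the copying cost; the finished string still has to fit in memory, so it does not by itself solve the problem for a 600 MB file.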
However, that is just an optimization of your code; the main issue is the sheer size of the XML file you are trying to load into memory. To handle large datasets, it is usually better to 'stream' the data, processing chunks of data instead of the entire file.
As you have not shown an example of your XML, I took the liberty of making a simple example to illustrate what I mean.
Consider you have the following XML:
<root>
<specialelement>
<value1>somevalue</value1>
<value2>somevalue</value2>
</specialelement>
<specialelement>
<value1>someothervalue</value1>
<value2>someothervalue</value2>
</specialelement>
...
</root>
Of this XML you want to parse the specialelement into an object, with the following class definition:
[XmlRoot("specialelement")]
public class ExampleClass
{
[XmlElement(ElementName = "value1")]
public string Value1 { get; set; }
[XmlElement(ElementName = "value2")]
public string Value2 { get; set; }
}
I'll assume we can process each SpecialElement individually, and define a handler for this as follows:
public void HandleElement(ExampleClass item)
{
// Process stuff
}
Now we can use the XmlTextReader to read each element in the XML individually. When we reach our specialelement, we keep track of the data contained within that XML element. When we reach the end of our specialelement, we deserialize it into an object and send it to our handler for processing. For example:
using (var reader = new XmlTextReader( /* your inputstream */ ))
{
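// Serializer for the element type; created once and reused for every element
var serializer = new XmlSerializer(typeof(ExampleClass));
// Buffer for the element contents
StringBuilder sb = new StringBuilder(1000);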
// Read till next node
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
// Clear the stringbuilder when we start with our element
if (string.Equals(reader.Name, "specialelement"))
{
sb.Clear();
}
// Append current element without namespace
sb.Append("<").Append(reader.Name).Append(">");
break;
case XmlNodeType.Text: //Display the text in each element.
sb.Append(reader.Value);
break;
case XmlNodeType.EndElement:
// Append the closure element
sb.Append("</").Append(reader.Name).Append(">");
// Check if we have finished reading our element
if (string.Equals(reader.Name, "specialelement"))
{
// The stringbuilder now contains the entire 'SpecialElement' part
using (TextReader textReader = new StringReader(sb.ToString()))
{
// Deserialize
var deserializedElement = (ExampleClass)serializer.Deserialize(textReader);
// Send to handler
HandleElement(deserializedElement);
}
}
break;
}
}
}
Because we start processing the data as it comes in from the stream, we do not have to load the entire file into memory. This keeps the memory usage of the program low and prevents out-of-memory exceptions.
Check out this fiddle to see it in action.
Note that this is a quick example; there are still plenty of places where you can improve and optimize this code further.
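One possible refinement, offered as a sketch rather than a drop-in replacement: XmlSerializer can deserialize straight from an XmlReader that is positioned on the element, which avoids rebuilding each element as a string first. The names SpecialElement and StreamingDeserializer are invented here, and the code assumes the elements carry no namespaces, as in the sample XML.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical class mirroring the sample XML.
[XmlRoot("specialelement")]
public class SpecialElement
{
    [XmlElement("value1")]
    public string Value1 { get; set; }

    [XmlElement("value2")]
    public string Value2 { get; set; }
}

public static class StreamingDeserializer
{
    // Lazily yields one deserialized object per <specialelement>.
    public static IEnumerable<SpecialElement> ReadAll(TextReader input)
    {
        var serializer = new XmlSerializer(typeof(SpecialElement));

        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "specialelement")
                {
                    // Deserialize consumes the element and leaves the reader on
                    // the following node, so Read() is not called in this branch.
                    yield return (SpecialElement)serializer.Deserialize(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

A caller would loop with foreach (var item in StreamingDeserializer.ReadAll(new StreamReader(path))) and process each item as it arrives; only one element is materialized at a time.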
I am trying to read abc.xml which has this element
<RunTimeStamp>
9/22/2011 2:58:34 PM
</RunTimeStamp>
I am trying to read the value of this element from the XML file and store it in a string. Once I am done with the processing, I get the current timestamp and write the new timestamp back to the XML file.
Here's my code so far. Please help and guide me; your help will be appreciated.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using log4net;
using System.Xml;
namespace TestApp
{
class TestApp
{
static void Main(string[] args)
{
Console.WriteLine("\n--- Starting the App --");
XmlTextReader reader = new XmlTextReader("abc.xml");
String DateVar = null;
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element: // The node is an element.
Console.Write("<" + reader.Name);
Console.WriteLine(">");
if(reader.Name.Equals("RunTimeStamp"))
{
DateVar = reader.Value;
}
break;
case XmlNodeType.Text: //Display the text in each element.
Console.WriteLine(reader.Value);
break;
/*
case XmlNodeType.EndElement: //Display the end of the element.
Console.Write("</" + reader.Name);
Console.WriteLine(">");
break;
*/
}
}
Console.ReadLine();
// after done with the processing.
XmlTextWriter writer = new XmlTextWriter("abc.xml", null);
}
}
}
I personally wouldn't use XmlReader etc here. I'd just load the whole file, preferably with LINQ to XML:
XDocument doc = XDocument.Load("abc.xml");
XElement timestampElement = doc.Descendants("RunTimeStamp").First();
string value = (string) timestampElement;
// Then later...
timestampElement.Value = newValue;
doc.Save("abc.xml");
Much simpler!
Note that if the value is an XML-format date/time, you can cast to DateTime instead:
DateTime value = (DateTime) timestampElement;
then later:
timestampElement.SetValue(DateTime.UtcNow); // Or whatever; Value itself only accepts a string
However, that will only handle valid XML date/time formats - otherwise you'll need to use DateTime.TryParseExact etc.
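For a value like the one in the question (9/22/2011 2:58:34 PM), a TryParseExact call might look like the sketch below. The format string "M/d/yyyy h:mm:ss tt" and the use of the invariant culture are assumptions based on that single sample; the class TimestampParser is invented for the example.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class TimestampParser
{
    // Assumed format, inferred from the sample value "9/22/2011 2:58:34 PM".
    private const string Format = "M/d/yyyy h:mm:ss tt";

    // Returns the parsed timestamp, or null when the text does not match.
    public static DateTime? Parse(XElement element)
    {
        // The element in the question has line breaks around the value.
        string text = ((string)element ?? string.Empty).Trim();

        if (DateTime.TryParseExact(text, Format, CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out DateTime parsed))
        {
            return parsed;
        }
        return null;
    }
}
```

When writing the new timestamp back, DateTime.Now.ToString(Format, CultureInfo.InvariantCulture) would keep the file in the same shape, assuming that shape is what other consumers of the file expect.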
LINQ to XML is the best way to do it. It is much simpler and easier, as shown by @Jon.
My goal is to build an engine that takes the latest HL7 3.0 CDA documents and make them backward compatible with HL7 2.5 which is a radically different beast.
The CDA document is an XML file which when paired with its matching XSL file renders a HTML document fit for display to the end user.
In HL7 2.5 I need to get the rendered text, devoid of any markup, and fold it into a text stream (or similar) that I can write out in 80 character lines to populate the HL7 2.5 message.
So far, I'm taking the approach of using XslCompiledTransform to transform my XML document using XSLT and produce a resultant HTML document.
My next step is to take that document (or perhaps at a step before this) and render the HTML as text. I have searched for a while, but can't figure out how to accomplish this. I'm hoping it's something easy that I'm just overlooking, or that I just can't find the magical search terms. Can anyone offer some help?
FWIW, I've read the 5 or 10 other questions in SO which embrace or admonish using RegEx for this, and don't think that I want to go down that road. I need the rendered text.
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using System.Xml.XPath;
public class TransformXML
{
public static void Main(string[] args)
{
try
{
string sourceDoc = "C:\\CDA_Doc.xml";
string resultDoc = "C:\\Result.html";
string xsltDoc = "C:\\CDA.xsl";
XPathDocument myXPathDocument = new XPathDocument(sourceDoc);
XslCompiledTransform myXslTransform = new XslCompiledTransform();
XmlTextWriter writer = new XmlTextWriter(resultDoc, null);
myXslTransform.Load(xsltDoc);
myXslTransform.Transform(myXPathDocument, null, writer);
writer.Close();
StreamReader stream = new StreamReader (resultDoc);
}
catch (Exception e)
{
Console.WriteLine ("Exception: {0}", e.ToString());
}
}
}
Since you have the XML source, consider writing an XSL that will give you the output you want without the intermediate HTML step. It would be far more reliable than trying to transform the HTML.
This will leave you with just the text:
class Program
{
static void Main(string[] args)
{
var blah = new System.IO.StringReader(sourceDoc);
var reader = System.Xml.XmlReader.Create(blah);
StringBuilder result = new StringBuilder();
while (reader.Read())
{
result.Append( reader.Value);
}
Console.WriteLine(result);
}
static string sourceDoc = "<html><body><p>this is a paragraph</p><p>another paragraph</p></body></html>";
}
Or you can use a regular expression:
public static string StripHtml(String htmlText)
{
// replace all tags with spaces...
htmlText = Regex.Replace(htmlText, @"<(.|\n)*?>", " ");
// .. then eliminate all double spaces
while (htmlText.Contains("  "))
{
    htmlText = htmlText.Replace("  ", " ");
}
// clear out non-breaking spaces and & character code
htmlText = htmlText.Replace("&nbsp;", " ");
htmlText = htmlText.Replace("&amp;", "&");
return htmlText;
}
Can you use something like this which uses lynx and perl to render the html and then convert that to plain text?
This is a great use case for XSL-FO and FOP. FOP isn't just for PDF output; one of the other major outputs it supports is text. You should be able to construct a simple XSLT + FO stylesheet that has the specifications (i.e. line width) that you want.
This solution is a bit more heavyweight than just using xml->xslt->text as ScottSEA suggested, but if you have any more complex formatting requirements (e.g. indenting), it will be much easier to express in FO than to mock up in XSLT.
I would avoid regexs for extracting the text. That's too low-level and guaranteed to be brittle. If you just want text and 80 character lines, the default xslt template will only print element text. Once you have only the text, you can apply whatever text processing is necessary.
Incidentally, I work for a company that produces CDAs as part of our product (voice recognition for dictations). I would look into an XSLT that transforms the 3.0 directly into 2.5. Depending on the fidelity you want to keep between the two versions, the full XSLT route will probably be your easiest bet if what you really want to achieve is conversion between the formats. That's what XSLT was built to do.
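To make the "default XSLT template" idea above concrete, here is a rough sketch: a stylesheet with no templates of its own falls back to XSLT's built-in rules, which output only text nodes, and the result can then be folded into fixed-width lines. The class PlainText and both method names are invented for the example, and the greedy wrapping is just one simple choice; a single word longer than the width is left on a line of its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper for illustration.
public static class PlainText
{
    // No templates are defined, so the built-in rules copy text nodes only.
    private const string TextOnlyXslt =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "</xsl:stylesheet>";

    // Returns the concatenated text content of the XML document.
    public static string ExtractText(string xml)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(TextOnlyXslt)))
        {
            transform.Load(stylesheet);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            transform.Transform(input, null, output);
        }
        return output.ToString();
    }

    // Greedy word wrap: fills each line up to 'width' characters.
    public static List<string> Wrap(string text, int width)
    {
        var lines = new List<string>();
        var current = new StringBuilder();

        // A null separator array splits on any whitespace.
        foreach (string word in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (current.Length > 0 && current.Length + 1 + word.Length > width)
            {
                lines.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0) current.Append(' ');
            current.Append(word);
        }
        if (current.Length > 0) lines.Add(current.ToString());
        return lines;
    }
}
```

Calling Wrap(ExtractText(cdaXml), 80) would give the 80-character lines; note that text from adjacent elements is joined without a separator, so a real stylesheet would probably add templates that emit line breaks after block-level elements.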