Separating portions of XML file with text in C#

Separating portions of XML file with text in C# - c#

I need to write XML files with a tilde as a separator between portions, like so:
....
<Company>
<CompanyDetail>Blah</CompanyDetail>
<Phone>0000000000</Phone>
</Company>
~
<Company>
<CompanyDetail>Blah</CompanyDetail>
<Phone>0000000000</Phone>
</Company>
....
The way to do that normally in C# would be along the lines of
writer = new XmlTextWriter(fileName, Encoding.UTF8);
writer.Formatting = Formatting.Indented;
writer.WriteStartDocument();
int remainingCompanies = companyList.Count;
foreach (Company company in companyList)
{
writer.WriteStartElement("Company");
writer.WriteStartElement("CompanyDetail");
writer.WriteString("company.companyDetail.toString()");
writer.WriteEndElement();
writer.WriteStartElement("Phone");
writer.WriteString("company.phone.toString()");
writer.WriteEndElement();
writer.WriteEndElement();
if (remainingCompanies-- > 1)
{
writer.WriteString(\n~\n);
}
}
But whenever I do this, the resulting XML file ends up being poorly formatted like so:
<Company>
<CompanyDetail>FirstCompany</CompanyDetail>
<Phone>1111111111</Phone>
</Company>
~
<Company><CompanyDetail>SecondCompany</CompanyDetail><Phone>2222222222</Phone></Company>
~
<Company><CompanyDetail>ThirdCompany</CompanyDetail><Phone>3333333333</Phone></Company>
....
When there's much more information for each company than just CompanyDetail and Phone, you can imagine how difficult it gets to look through the single line to visually find what you need in the XML.
My current workaround is to replace the tilde with a comment, but how do I have a tilde separating parts of this XML file AND maintain clean formatting?

Related

Avoid skipping elements in xml reader

Let's suppose that I have a xml like this:
<articles>
<article>
<id>1</id>
<name>A1</name>
<price>10</price>
</article>
<article>
<id>2</id>
<name>A2</name>
</article>
<article>
<id>3</id>
<name>A3</name>
<price>30</price>
</article>
</articles>
As you can see article A2 is missing price tag.
I have a real world case where I parse xml file where some tags in some articles are missing (which I didn't know earlier). I wrote a very simple parser like this:
using (XmlReader reader = XmlReader.Create(new StringReader(myXml)))
{
while (true)
{
bool articleExists = reader.ReadToFollowing("article");
if (!articleExists) return;
reader.ReadToFollowing("id");
string id = reader.ReadElementContentAsString();
reader.ReadToFollowing("name");
string name = reader.ReadElementContentAsString();
reader.ReadToFollowing("price");
string price = reader.ReadElementContentAsString();
//do something with these values
}
But if there is no price tag in article 2 xmlreader will jump to price tag in article A3 and I get articles mixed up and some data skipped, right?
How can I protect from this? So if some tag in article node is absent, then let's say default value is used?
I would still like to use xmlreader if possible. My real file is 200 MB big so I need a simple,fast and efficient solution that won't hang the system.

Getting data from xml file and comparing it to a text file

So I have two files: a mot file and an xml file. What I need to do with these files is to read data from the xml file and compare it to the mot file if it exists. That's the general idea.
Before anything else, for those who are unfamiliar with what a mot
file is (I don't also have much knowledge about it, just the basics)...
(From Wikipedia) A mot file (or a Motorola S-Record
file) is a file format that conveys binary information in ASCII Hex text form.
(from another source) An S-record file consists of a
sequence of specially formatted ASCII character strings. An S-record
will be less than or equal to 78 bytes in length.
The format of a S-Record is:
S | Type | Record Length | Address (starting address) | Data | Checksum
(e.g. S21404200047524D5354524D0000801410AA5AA555F9)
([parsed] S2 14 042000 47524D5354524D0000801410AA5AA555 F9)
The specific idea is that I have data AA BB CC DD and so on allocated in addresses 0x042000 ~ 0x04200F. What’s written in the xml would be:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<data-set xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<record>
<File name="Test.mot">
<Address id="042000">
<Data>AA</Data>
</Address>
</File>
</record>
<record>
<File name="Test.mot">
<Address id="042001">
<Data>BB CC DD</Data>
</Address>
</File>
</record>
<record>
<File name="Test.mot">
<Address id="042004">
<Data>EE FF</Data>
</Address>
</File>
</record>
Then the program would get the data and address from he XML and search the .mot file for any hits. So if a mot file has a record S214042000AABBCCDDEEFF01234567891A2B3C4D5EF9, then this is supposed to bring a match with what's in the xml. Result to true, or 1. If anything in the xml doesn't have a match, then it would return with false or 0.
The problem now would be I’m not well-versed with C# much less with XML although I did have a tiny bit of experience with both. I initially thought it would be something like this:
using (StreamReader sr = new StreamReader("Test.mot"))
{
String line =String.Empty;
while ((line = sr.ReadLine()) != null)
{
if (line.Contains("042004") & line.Contains("EE FF"))
{
Console.WriteLine("Success");
}
else
{
Console.WriteLine("Failure");
}
}
}
But obviously, it didn't result with what I expected. And Failure keeps popping up. Am I right to use StreamReader to read the .mot file? And with regards to the XML file, will XMLDocument work? How do I get data from the xml and compare it with the .mot file? Could someone walk me through how to get this done or provide guides how to properly start with this.
Let me know if I'm not clear on anything.
EDIT:
I thought of an idea. I'm not sure if it's doable, though. Let's say the program will read the mot S-Record file, and it will identify the type of the record. From there every record line listed in the file would be broken down as shown in the sample below:
sample record line: "S214042000AABBCCDDEEFF01234567891A2B3C4D5EF9"
S2 - type w/c means there would be a 3-byte address
14 - record length
F9 - checksum
042000 - AA
042001 - BB
042002 - CC
042003 - DD
...
04200F - 5E
With this new list, I think or I hope it would be easier for the program to use the data in the XML to locate it in the mot file.
Tell me if this will work, or if there are any alternatives.

Correct me when i'm wrong as it is full of assumptions:
the XML only gives the starting values of the data package under the mot file:
||||||||||||
S214042000AABBCCDDEEFF01234567891A2B3C4D5EF9
AABBCCDDEEFF
You could read out the xml and place each record in a record class
public class Record
{
string FileName{get;set;}
string Id {get;set;}
string Data {get;set;}
public Record(){} //default constructor
}
with the XmlDocument class you could read out the xml.
something like:
var document = new XmlDocument();
document.LoadXml("your.xml");
var records = document.SelectNodes("record");
var recordList = new List<Record>();
foreach(var r in records)
{
var file = r.SelectSingleNode("file");
var fileName = file.Attributes["name"].Value;
var address = file.SelectSingleNode("Address");
var id = address.Attributes["id"].Value;
var data = address.SelectSingleNode("Data").InnerText.Replace(" ", "");
recordList.Add(new Record{FileName = fileName, Id = id, Data = data});
}
Afterwards you can then readout everyline of the mot file by position:
since the location of the 042000 always be the 5 - 10 character
var fn = "Test.mot";
using (StreamReader sr = new StreamReader(fn))
{
var record = recordList.Single(r=> r.FileName);
String line =String.Empty;
while ((line = sr.ReadLine()) != null)
{
if (line.SubString(4,6) == record.Id && line.SubString(10, record.Data.Length) == record.Data)
{
Console.WriteLine("Success");
}
else
{
Console.WriteLine("Failure");
}
}
}
Let me know if it helped you out a bit

XDocument changes tab to space

I have a xml-document that simplified looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Node1 separator=" " />
There is a \t as attribute value.
When executing this code
var path = #"C:\test.xml";
var doc = XDocument.Load(path);
doc.Save(path);
the attribute value changed from tab to space.
<?xml version="1.0" encoding="utf-8"?>
<Node1 separator=" " />
Is there a way to preserve the origin value, because it is required to be a tab?

This is "XML whitespace normalization in attributes" portion of XML:Attribute-Value Normalization which is default behavior when handling XML documents.
For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value
You should be able to use XmlTextReader.Normalization property as described here. XmlDocument can load from reader XmlDocument.Load.
var path = #"C:\test.xml";
XmlDocument doc = new XmlDocument();
XmlTextReader reader = new XmlTextReader(path);
doc.Load(reader);
var s = doc.SelectSingleNode("*/#*").InnerText;
Console.WriteLine("|{0}|, {1}", (int)s[0], s.Length); // prints 9 - ASCII code of tab
doc.Save(path);

text append to existing file

<namespace>
<root>
<node1>
<element 1>
<element 1>
</node1>
<node2>
<element 1>
<element 1>
</node2>
</root>
string stringContains = string.Empty;
foreach(string line in File.ReadLines(Path))
{
if(line.Contains("<Root>"))
{
stringContains = line.Replace("<Root>", "<Root>" + newelement.OuterXml);
}
}
File.AppendAllText(Path, stringContains);
I must replace root with other nodes,
so that i am appending text. However the above code appends existing text with stringContains

Try this.
string stringContains = string.Empty;
var lines = File.ReadLines(Path);
foreach (var line in lines)
{
if(line.Contains("<Root>"))
{
stringContains = line.Replace("<Root>", "<Root>" + newelement.OuterXml);
}
}
File.WriteAllLines(Path, lines);
but if your file size is huge then this is bad practise.

Note: You cannot change just one line in a file, you need to read all content, change desired part and write all content again.
According to your info about file size (GBs long) your only way to achive desired behaviour is to create new file and write content of you initial file changing only lines you need.
Procesing should be done line-by-line (as you showed in your sample) to achive minimum use of memory
However, this approach requieres 2X space on your HDD till operation is complete (assuming you will delete initial file after all is done)
Regarding details: this question provides complete sample of how to achive described approach

Using XDocument to write raw XML

I'm trying to create a spreadsheet in XML Spreadsheet 2003 format (so Excel can read it). I'm writing out the document using the XDocument class, and I need to get a newline in the body of one of the <Cell> tags. Excel, when it reads and writes, requires the files to have the literal string
embedded in the string to correctly show the newline in the spreadsheet. It also writes it out as such.
The problem is that XDocument is writing CR-LF (\r\n) when I have newlines in my data, and it automatically escapes ampersands for me when I try to do a .Replace() on the input string, so I end up with &#10; in my file, which Excel just happily writes out as a string literal.
Is there any way to make XDocument write out the literal
as part of the XML stream? I know I can do it by deriving from XmlTextWriter, or literally just writing out the file with a TextWriter, but I'd prefer not to if possible.

I wonder if it might be better to use XmlWriter directly, and WriteRaw?
A quick check shows that XmlDocument makes a slightly better job of it, but xml and whitespace gets tricky very quickly...

I battled with this problem for a couple of days and finally came up with this solution. I used XMLDocument.Save(Stream) method, then got the formatted XML string from the stream. Then I replaced the &#10; occurrences with
and used the TextWriter to write the string to a file.
string xml = "<?xml version=\"1.0\"?><?mso-application progid='Excel.Sheet'?><Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:x=\"urn:schemas-microsoft-com:office:excel\" xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:html=\"http://www.w3.org/TR/REC-html40\">";
xml += "<Styles><Style ss:ID=\"s1\"><Alignment ss:Vertical=\"Center\" ss:WrapText=\"1\"/></Style></Styles>";
xml += "<Worksheet ss:Name=\"Default\"><Table><Column ss:Index=\"1\" ss:AutoFitWidth=\"0\" ss:Width=\"75\" /><Row><Cell ss:StyleID=\"s1\"><Data ss:Type=\"String\">Hello&#10;&#10;World</Data></Cell></Row></Table></Worksheet></Workbook>";
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.LoadXml(xml); //load the xml string
System.IO.MemoryStream stream = new System.IO.MemoryStream();
doc.Save(stream); //save the xml as a formatted string
stream.Position = 0; //reset the stream position since it will be at the end from the Save method
System.IO.StreamReader reader = new System.IO.StreamReader(stream);
string formattedXML = reader.ReadToEnd(); //fetch the formatted XML into a string
formattedXML = formattedXML.Replace("&#10;", "
"); //Replace the unhelpful &#10;'s with the wanted endline entity
System.IO.TextWriter writer = new System.IO.StreamWriter("C:\\Temp\test1.xls");
writer.Write(formattedXML); //write the XML to a file
writer.Close();

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Separating portions of XML file with text in C# - c#

Related

Avoid skipping elements in xml reader

Getting data from xml file and comparing it to a text file

XDocument changes tab to space

text append to existing file

Using XDocument to write raw XML

Categories

Resources