XmlReader parsing single line of xml - c#

I have an XML file that is not well-formed, but I only need to retrieve the values from it. The whole file is a single line, and it looks something like this:
<Real64List>...numbers value....</Real64List> <Real64List>...numbers value...</Real64List><Uint32List>...numbers value..</Uint32List>.
I was able to parse the first part, but when I try to parse the second and third parts I get the error "multiple root on line 1". How can I parse the second Real64List and the Uint32List? Thanks in advance!
Here is the snippet of the code I'm using.
using (XmlReader reader = XmlReader.Create("generator_0000001.xml"))
{
List<float> vertex = new List<float>();
List<Vector3> verticesList = new List<Vector3>();
while (!reader.EOF)
{
if (reader.NodeType == XmlNodeType.Element && reader.Name == "Real64List")
{
string xmlValue = reader.Value;
string[] coordinates = xmlValue.Split(',');
for (int i = 0; i < coordinates.Length; i++)
{
vertex.Add(float.Parse(coordinates[i]));
}
for (int i = 0; i < vertex.Count; i += 3)
{
var myVect = new Vector3(vertex[i], vertex[i + 1], vertex[i + 2]);
verticesList.Add(myVect);
}
}
}

You could wrap those XML elements in a new root node. The document then has a single root, so you should be able to parse it as valid XML. Some straightforward examples can be found on MSDN.
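As a rough illustration of the wrapping idea, here is a minimal sketch. It assumes the file has no XML declaration and that the numbers are comma-separated (as the Split(',') in the question suggests). The class name FragmentParser, the method ReadValues and the synthetic <root> element are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class FragmentParser
{
    // Wraps the multi-root fragment in a synthetic <root> element so that
    // it becomes a single well-formed document, then collects the numbers
    // from every element with the requested name.
    public static List<double> ReadValues(string fragment, string elementName)
    {
        XElement root = XElement.Parse("<root>" + fragment + "</root>");
        return root.Elements(elementName)
                   .SelectMany(e => e.Value.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
                   .Select(s => double.Parse(s.Trim(), CultureInfo.InvariantCulture))
                   .ToList();
    }
}
```

For the file in the question, this could be called as FragmentParser.ReadValues(File.ReadAllText("generator_0000001.xml"), "Real64List"), and again with "Uint32List" for the third part.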

Related

IEnumerable<XElement> compare is not same

I read an XML file using the following two techniques.
By parsing the entire file contents with XElement.Parse: XElement.Parse(File.ReadAllText(xmlfile))
Note: I know I shouldn't have used this technique.
By using the Load method of XDocument: XDocument.Load(xmlfile);
Then I tried creating a list of XElement from each with the following code snippet. To me the results look the same, but when I compare the two IEnumerable objects, they are not equal.
What am I overlooking? Here is the code snippet:
// Read the xml db file.
XElement xEle = XElement.Parse(File.ReadAllText(xmlfile));
XDocument xDoc = XDocument.Load(xmlfile);
List<XElement> xElementCollection = xEle.Elements("Map").ToList();
List<XElement> xDocumentCollection = xDoc.Descendants("Map").ToList();
bool bCompare = xElementCollection.Equals(xDocumentCollection);
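bCompare evaluates to false, but when I look at the data in both lists, they look the same.
List<T>.Equals only checks whether both variables refer to the same list object, so it returns false for two separate lists even when their contents match. You basically need to go through each element in both lists and compare them to each other by value using the XNode.DeepEquals method.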
if (xElementCollection.Count != xDocumentCollection.Count)
{
bCompare = false;
}
else
{
bCompare = true;
for (int x = 0, y = 0;
x < xElementCollection.Count && y < xDocumentCollection.Count; x++, y++)
{
if (!XNode.DeepEquals(xElementCollection[x], xDocumentCollection[y]))
bCompare = false;
}
}
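The same element-by-element comparison can also be written with LINQ. This is only a sketch of an alternative, not part of the answer above; the class name XElementListComparer and the method DeepSequenceEqual are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XElementListComparer
{
    // Returns true when both lists have the same length and every pair of
    // elements at the same position is equal by value (XNode.DeepEquals).
    public static bool DeepSequenceEqual(IList<XElement> first, IList<XElement> second)
    {
        return first.Count == second.Count
            && first.Zip(second, (a, b) => XNode.DeepEquals(a, b)).All(same => same);
    }
}
```

With the lists from the question, the call would be XElementListComparer.DeepSequenceEqual(xElementCollection, xDocumentCollection).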

Clearing a line of a txt file by the line ID

I have looked all over for the answer to this, but I can't find it anywhere. I need to be able to clear a line from a txt file by the last integer in the line (the ID number), but I have no idea how to do that. Please help? Basically I was thinking that I need to find the last integer, and if it does not equal to the input, then it would move to the next line until it finds the right integer. Then that line is cleared. Here is some of my code that obviously doesn't work:
public static void TicID(CommandArgs args)
{
if (args.Parameters.Count == 1)
{
if (i == 1)
{
try
{
string idToDelete = args.Parameters[0];
StreamReader idreader = new StreamReader("Tickets.txt");
StreamWriter iddeleter = new StreamWriter("Tickets.txt");
string id = Convert.ToString(idreader.Read());
string line = null;
while (idreader.Peek() >= 0)
{
if (String.Compare(id, idToDelete) == 0)
{
iddeleter.WriteLine(line);
}
else
{
idreader.ReadLine();
}
}
}
The most straightforward way to delete lines is to write the lines that should not be deleted:
var idToDelete = "1";
var path = @"C:\Temp\Test.txt";
var lines = File.ReadAllLines(path);
using (var writer = new StreamWriter(path, false)) {
for (var i = 0; i < lines.Length; i++) {
var line = lines[i];
//assuming it's a CSV file
var cols = line.Split(',');
var id = cols[cols.Length - 1];
if (id != idToDelete) {
writer.WriteLine(line);
}
}
}
This is the LINQ-way:
var lines = from line in File.ReadAllLines(path)
let cols = line.Split(',')
let id = cols[cols.Length - 1]
where id != idToDelete
select line;
File.WriteAllLines(path, lines);
Create a StreamReader object and read the file line by line into a string array (or a list) using StreamReader's instance method ReadLine(). Then find the line you want in that collection and delete it.
Important note:
do { /*see description above*/ } while (streamReader.Peek() != -1);

Manipulating lines of data

I have millions of lines generated from data updated every second which look like this:
104500 4783
104501 8930
104502 21794
104503 21927
104505 5746
104506 9968
104509 5867
104510 46353
104511 7767
104512 4903
The column on the left represents time (hhmmss format), and the column on the right is data which is updated second-by-second. As you can see however, it isn't actually second-by-second, and there are some missing times (10:45:04, 10:45:07, 10:45:08 are missing in this example). My goal is to add in the missing seconds, and to use the data from the previous second for that missing second, like this:
104500 4783
104501 8930
104502 21794
104503 21927
104504 21927 --
104505 5746
104506 9968
104507 9968 --
104508 9968 --
104509 5867
104510 46353
104511 7767
104512 4903
I don't want the "--" in the result, I just put those there to mark the added lines. So far I've tried to accomplish this using StreamReader and StreamWriter, but it doesn't seem like they're going to get me what I want. I'm a newbie programmer and a newbie to C#, so if you could just point me in the right direction, that would be great. I'm really just wondering if this is even possible to do in C#...I've spent a lot of time on MSDN and here on SO looking for a solution to this, but so far haven't found any.
Edit: The lines are in a text file, and I want to store the newly created data in a new text file.
There are a few things you need to put together.
Read a file line-by-line: See here: Reading a Text File One Line at a Time
Writing a file line-by-line : StreamWriter.WriteLine
Keep track of the last read line. (Just use a variable in your while loop where you read the lines)
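Check whether there is a gap, for example by splitting off the first column (string.Split) and parsing it as a time. Note that TimeSpan.Parse reads a bare number such as "104500" as a day count, so rearrange it to hh:mm:ss first. If there is a gap, write the last read line again with the timespan incremented.
OK, here is the whole shooting match, tested and working against your test data: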
public void InjectMissingData()
{
DataLine lastDataLine = null;
using (var writer = new StreamWriter(File.Create("c:\\temp\\out.txt")))
{
using (var reader = new StreamReader("c:\\temp\\in.txt"))
{
while (!reader.EndOfStream)
{
var dataLine = DataLine.Parse(reader.ReadLine());
while (lastDataLine != null && dataLine.Occurence - lastDataLine.Occurence > TimeSpan.FromSeconds(1))
{
lastDataLine = new DataLine(lastDataLine.Occurence + TimeSpan.FromSeconds(1), lastDataLine.Data);
writer.WriteLine(lastDataLine.Line);
}
writer.WriteLine(dataLine.Line);
lastDataLine = dataLine;
}
}
}
}
public class DataLine
{
public static DataLine Parse(string line)
{
var timeString = string.Format("{0}:{1}:{2}", line.Substring(0, 2), line.Substring(2, 2),
line.Substring(4, 2));
return new DataLine(TimeSpan.Parse(timeString), long.Parse(line.Substring(7, line.Length - 7).Trim()));
}
public DataLine(TimeSpan occurence, long data)
{
Occurence = occurence;
Data = data;
}
public TimeSpan Occurence { get; private set; }
public long Data { get; private set; }
public string Line
{
get { return string.Format("{0}{1}{2} {3}",
Occurence.Hours.ToString().PadLeft(2, Char.Parse("0")),
Occurence.Minutes.ToString().PadLeft(2, Char.Parse("0")),
Occurence.Seconds.ToString().PadLeft(2, Char.Parse("0")),
Data); }
}
}
In addition to the other answers: since you are talking about huge files, consider using MemoryMappedFiles; the MSDN documentation describes how to use them from C#.
This will not necessarily improve performance, but it definitely reduces memory usage.
So far as inserting new entries between certain ones goes, I would advise reading in the text file into separated lines, and then storing them in a List. That way, you can use the Insert(...) method to insert your new lines. From there, you can write the lines back into the file.
When reading the lines, you can use either of the static helper methods in the System.IO.File class: ReadAllText and ReadAllLines.
Note: I've added links to the MSDN Documentation for each of the methods and classes I've mentioned, since you said you are new to C# and programming in general.
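A minimal sketch of the List and Insert(...) approach described above. The class name GapFiller and its methods are made up for this example, and it assumes every line has the form "hhmmss value" with a single space. Each Insert shifts the rest of the list, so for millions of lines a streaming reader/writer is likely the better fit; this only illustrates the idea.

```csharp
using System;
using System.Collections.Generic;

public static class GapFiller
{
    // Converts "104500" into a TimeSpan of 10:45:00.
    static TimeSpan ParseTime(string hhmmss)
    {
        return new TimeSpan(
            int.Parse(hhmmss.Substring(0, 2)),
            int.Parse(hhmmss.Substring(2, 2)),
            int.Parse(hhmmss.Substring(4, 2)));
    }

    // Copies the input into a List and inserts one line per missing second,
    // repeating the value of the line before the gap.
    public static List<string> FillGaps(IEnumerable<string> input)
    {
        var lines = new List<string>(input);
        for (int i = 1; i < lines.Count; i++)
        {
            string[] prev = lines[i - 1].Split(' ');
            TimeSpan prevTime = ParseTime(prev[0]);
            TimeSpan curTime = ParseTime(lines[i].Split(' ')[0]);
            if (curTime - prevTime > TimeSpan.FromSeconds(1))
            {
                // The inserted line becomes the "previous" line of the next
                // iteration, so longer gaps are filled one second at a time.
                TimeSpan next = prevTime + TimeSpan.FromSeconds(1);
                lines.Insert(i, string.Format("{0:00}{1:00}{2:00} {3}",
                    next.Hours, next.Minutes, next.Seconds, prev[1]));
            }
        }
        return lines;
    }
}
```

Combined with the helpers mentioned above, usage could look like File.WriteAllLines(newPath, GapFiller.FillGaps(File.ReadAllLines(path))), where path and newPath are placeholders.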
String prevTime = null;
String prevData = null;
String line;
while ((line = myStreamReader.ReadLine()) != null)
{
String[] parts = line.Split(new Char[] { ' ' });
String time = parts[0];
String data = parts[1];
if (prevTime != null)
{
Int32 iPrevTime = Int32.Parse(prevTime);
Int32 iCurrentTime = Int32.Parse(time);
// Repeat the previous value for every missing second.
// Note: plain integer arithmetic does not roll over at minute or hour
// boundaries (104559 is followed by 104600, not 104560).
for (Int32 t = iPrevTime + 1; t < iCurrentTime; t++)
AddData(t.ToString(), prevData);
}
AddData(time, data);
prevTime = time;
prevData = data;
}
Here is some pseudo-code to get you started. I think you will want this type of algorithm.
Here's some rough code for you. I'm not properly disposing everything, it's just to get you started.
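// Needs: using System; using System.Globalization; using System.IO;
DateTime lastTime = DateTime.MinValue;
string lastValue = null;
StreamReader reader = File.OpenText("path");
StreamWriter writer = new StreamWriter(File.OpenWrite("newPath"));
while (!reader.EndOfStream)
{
string[] lineData = reader.ReadLine().Split(' ');
// The time column is hhmmss without separators, so parse it with an explicit format.
DateTime currentTime = DateTime.ParseExact(lineData[0], "HHmmss", CultureInfo.InvariantCulture);
string value = lineData[1];
if (lastValue != null)
{
// Repeat the previous value for each missing second.
while (lastTime < currentTime.AddSeconds(-1))
{
lastTime = lastTime.AddSeconds(1);
writer.WriteLine("{0:HHmmss} {1}", lastTime, lastValue);
}
}
writer.WriteLine("{0:HHmmss} {1}", currentTime, value);
lastTime = currentTime;
lastValue = value;
}
// Flush the buffered output and release both files.
writer.Close();
reader.Close();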
This assumes the times are never more than a second apart. If that assumption is wrong, it's easy enough to modify the below so it writes the lastValue in a loop for each second missing.
Update I missed in your example that it can in fact miss multiple seconds. I changed the example below to address that.
// Needs: using System; using System.Globalization; using System.IO;
using (StreamReader reader = OpenYourInputFile())
using (StreamWriter writer = OpenYourOutputFile())
{
TimeSpan? lastTime = null;
TimeSpan currentTime, maxDiff = TimeSpan.FromSeconds(1);
string lastValue = null, currentLine, currentValue, format = "{0:HHmmss} {1}";
while ((currentLine = reader.ReadLine()) != null)
{
string[] s = currentLine.Split(' ');
// ParseExact takes the input first, then the format; "HH" is the 24-hour clock.
currentTime = DateTime.ParseExact(s[0], "HHmmss", CultureInfo.InvariantCulture).TimeOfDay;
currentValue = s[1];
if (lastTime.HasValue && currentTime - lastTime.Value > maxDiff)
{
// Repeat the previous value for every second strictly between the two readings.
int missing = (int)(currentTime - lastTime.Value).TotalSeconds - 1;
for (int x = 1; x <= missing; x++)
writer.WriteLine(string.Format(format, DateTime.Today.Add(lastTime.Value).AddSeconds(x), lastValue));
}
writer.WriteLine(string.Format(format, DateTime.Today.Add(currentTime), currentValue));
lastTime = currentTime;
lastValue = currentValue;
}
}
string line;//The line that is read.
string previousLine = "0 0";
int prevTime = 0;
//These "using"'s are so that the resources they use will be freed when the block ( i.e. {} ) is finished.
using (System.IO.StreamReader originalFile = new System.IO.StreamReader("c:\\users\\Me\\t.txt"))
using (System.IO.StreamWriter newFile = new System.IO.StreamWriter("c:\\users\\Me\\t2.txt"))
{
while ((line = originalFile.ReadLine()) != null)
{
//"Split" changes the words in "line" (- that are separated by a space) to an array.
//"Parse" takes the first in that array (by using "[0]") and changes it into an integer.
int time = int.Parse(line.Split(' ')[0]);
while (prevTime != 0 && time > ++prevTime) newFile.WriteLine(prevTime.ToString() + " " + previousLine.Split(' ')[1]);
previousLine = line;
prevTime = time;
newFile.WriteLine(line);
}
}

Reading a file and mapping values

I found an implementation of a parallel coordinates application in c#. What I am trying to achieve is that I want to be able to read a CSV file and map the values and Labels onto the coordinates. The method mapping the values is assigning the values manually. Instead, I want those values to be read from the CSV file.
Here is the current method:
public void DataBind()
{
IList<DemoInfo> infos = new List<DemoInfo>();
for (int i = 0; i < ObjectsCount; i++)
{
var x = new DemoInfo();
x.X = m_Random.NextDouble() * 400 - 100;
x.Y = m_Random.NextDouble() * 500 - 100;
x.Z = m_Random.NextDouble() * 600 - 300;
x.V = m_Random.NextDouble() * 800 - 100;
x.K = 1.0;
//x.M = i % 2 == 0 ? 1.0 : -20.0;
x.M = i;
x.Tag = i + 1;
infos.Add(x);
}
var dataSource = new MultiDimensionalDataSource<DemoInfo>(infos, 6);
dataSource.MapDimension(0, info => info.X);
dataSource.MapDimension(1, info => info.Y);
dataSource.MapDimension(2, info => info.Z);
dataSource.MapDimension(3, info => info.V);
dataSource.MapDimension(4, info => info.K);
dataSource.MapDimension(5, info => info.M);
//dataSource.MapDimensionToOpacity(0, 0.5);
dataSource.MapTag(info => info.Tag);
dataSource.Labels[0] = "X";
dataSource.Labels[1] = "Y";
dataSource.Labels[2] = "Z";
dataSource.Labels[3] = "V";
dataSource.Labels[4] = "K";
dataSource.Labels[5] = "M";
dataSource.HelperAxisLabel = "Helper axis";
DataSource = dataSource;
}
Here is some of the data in the CSV File:
SWW Institutions Undergradutes Postgraduates
University College 2085 250
Metropolitan University 4715 1135
Would really appreciate your help !!
Thanks.
I am not sure how your CSV file maps to the DemoInfo class. Also, the example below is based on a CSV file, but your sample data looks like a TSV file. If it is a TSV file, just replace ',' with '\t'. Something else to watch out for is strings that contain your delimiter, such as an SWW Institutions value like "University, Madison".
You can open the file to read the lines of text and split the line based on your delimiter.
using (var sr = File.OpenText(path))
{
var line = string.Empty;
while ((line = sr.ReadLine()) != null)
{
var dataPoints = line.Split(',');
// Create Your Data Mappings Here
// dataPoints[0]...
}
}
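To make the "create your data mappings" step a little more concrete, here is a sketch under stated assumptions: the first line is a header, the columns are name, undergraduates, postgraduates, and the delimiter never occurs inside a value. The InstitutionInfo class and InstitutionParser are hypothetical names for this example; they are not part of the parallel coordinates library.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical row type for the sample data in the question.
public class InstitutionInfo
{
    public string Name { get; set; }
    public double Undergraduates { get; set; }
    public double Postgraduates { get; set; }
}

public static class InstitutionParser
{
    // Skips the header line, splits each remaining line on the delimiter
    // and converts the two numeric columns to double.
    public static List<InstitutionInfo> Parse(IEnumerable<string> lines, char delimiter)
    {
        return lines.Skip(1)
                    .Where(line => !string.IsNullOrWhiteSpace(line))
                    .Select(line => line.Split(delimiter))
                    .Select(cols => new InstitutionInfo
                    {
                        Name = cols[0].Trim(),
                        Undergraduates = double.Parse(cols[1], CultureInfo.InvariantCulture),
                        Postgraduates = double.Parse(cols[2], CultureInfo.InvariantCulture)
                    })
                    .ToList();
    }
}
```

The resulting list could then take the place of the randomly generated infos in DataBind, with one MapDimension call per numeric property (for example info => info.Undergraduates) and the header names as labels, following the pattern already used in the question's code.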

changing a node type to #text whilst keeping the innernodes with the HtmlAgilityPack

I'm using the HtmlAgilityPack to parse an XML file that I'm converting to HTML. Some of the nodes will be converted to an HTML equivalent. The others that are unnecessary I need to remove while maintaining the contents. I tried converting it to a #text node with no luck. Here's my code:
private HtmlNode ConvertElementsPerDatabase(HtmlNode parentNode, bool transformChildNodes)
{
var listTagsToReplace = XmlTagMapping.SelectAll(string.Empty); // Custom Dataobject
var node = parentNode;
if (node != null)
{
var bNodeFound = false;
if (node.Name.Equals("xref"))
{
bNodeFound = true;
node = NodeXref(node);
}
if (node.Name.Equals("graphic"))
{
bNodeFound = true;
node = NodeGraphic(node);
}
if (node.Name.Equals("ext-link"))
{
bNodeFound = true;
node = NodeExtLink(node);
}
foreach (var infoTagToReplace in listTagsToReplace)
{
if (node.Name.Equals(infoTagToReplace.XmlTag))
{
bNodeFound = true;
node.Name = infoTagToReplace.HtmlTag;
if (!string.IsNullOrEmpty(infoTagToReplace.CssClass))
node.Attributes.Add("class", infoTagToReplace.CssClass);
if (node.HasAttributes)
{
var listTagAttributeToReplace = XmlTagAttributeMapping.SelectAll_TagId(infoTagToReplace.Id); // Custom Dataobject
for (int i = 0; i < node.Attributes.Count; i++ )
{
var bDeleteAttribute = true;
foreach (var infoTagAttributeToReplace in listTagAttributeToReplace)
{
if (infoTagAttributeToReplace.XmlName.Equals(node.Attributes[i].Name))
{
node.Attributes[i].Name = infoTagAttributeToReplace.HtmlName;
bDeleteAttribute = false;
}
}
if (bDeleteAttribute)
node.Attributes.Remove(node.Attributes[i].Name);
}
}
}
}
if (transformChildNodes)
for (int i = 0; i < parentNode.ChildNodes.Count; i++)
parentNode.ChildNodes[i] = ConvertElementsPerDatabase(parentNode.ChildNodes[i], true);
if (!bNodeFound)
{
// Replace with #text
}
}
return parentNode;
}
At the end I need to do the node replacement (where you see the "Replace with #text" comment) if the node is not found. I've been ripping my hair (what's left of it) out all day and it's probably something silly. I'm unable to get the help to compile and there is no online version. Help Stackoverflow! You're my only hope. ;-)
I would think you could just do this:
return new HtmlNode(HtmlNodeType.Text, parentNode.OwnerDocument, 0);
This of course adds the node to the head of the document, but I assume you have some sort of code in place to handle where in the document the node should be added.
Regarding the documentation comment, the current (as of this writing) download of the Html Agility Pack documentation contains a CHM file which doesn't require compilation in order to view.
