Most efficient way to parse a delimited string into an HTML table in C#

I've got the following delimited string with pairs:
1,5|2,5|3,5
I want to create a table as follows:
<table>
<tr><td>1</td><td>5</td></tr>
<tr><td>2</td><td>5</td></tr>
<tr><td>3</td><td>5</td></tr>
</table>
What's the most efficient way in C#?

Parse the string (simple splitting is enough) and use the .NET XML classes (or the Html Agility Pack, for the purists out there) to generate the table. It might be overkill compared with building up the string manually, especially for simple data, but it is less verbose, it escapes special characters such as < and & for you, and it should be easier to extend later.
Using LINQ to XML:
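using System.Linq;
using System.Xml.Linq;

var str = "1,5|2,5|3,5";
var table =
    new XElement("table",
        str.Split('|')                    // one <tr> per "a,b" pair
            .Select(pair =>
                new XElement("tr",
                    pair.Split(',')       // one <td> per value in the pair
                        .Select(num => new XElement("td", num))
                )
            )
    ).ToString();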
Yields the string:
<table>
<tr>
<td>1</td>
<td>5</td>
</tr>
<tr>
<td>2</td>
<td>5</td>
</tr>
<tr>
<td>3</td>
<td>5</td>
</tr>
</table>
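If you need the markup on a single line rather than indented, XElement.ToString also accepts SaveOptions.DisableFormatting. A minimal sketch, assuming the same "a,b|c,d" input format as above; the local function name ToTable is hypothetical. Because the values are passed in as element content, characters such as < and & are escaped automatically.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: builds the table with LINQ to XML and serializes it
// without indentation or line breaks.
static string ToTable(string input) =>
    new XElement("table",
        input.Split('|').Select(pair =>
            new XElement("tr",
                pair.Split(',').Select(value => new XElement("td", value)))))
    .ToString(SaveOptions.DisableFormatting);

Console.WriteLine(ToTable("1,5|2,5|3,5"));
// <table><tr><td>1</td><td>5</td></tr><tr><td>2</td><td>5</td></tr><tr><td>3</td><td>5</td></tr></table>

Console.WriteLine(ToTable("a<b,x&y"));
// <table><tr><td>a&lt;b</td><td>x&amp;y</td></tr></table>
```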

Version 1: Straightforward
String html = "<table>";
Array.ForEach<String>("1,5|2,5|3,5".Split('|'), r =>   // r is one "a,b" row
{
    html += "<tr>";
    Array.ForEach(r.Split(','), c =>                   // c is one cell value
    {
        html += String.Format("<td>{0}</td>", c);
    });
    html += "</tr>";
});
html += "</table>";
Tested and working.
Version 2: Without the delegate
String html = "<table>";
foreach (String r in "1,5|2,5|3,5".Split('|'))
{
    html += "<tr>";
    foreach (String c in r.Split(','))
        html += String.Format("<td>{0}</td>", c);
    html += "</tr>";
}
html += "</table>";

If you are after efficiency, don't build the string with repeated concatenation: .NET strings are immutable, so every += copies everything built so far. Use a StringBuilder instead:
// requires: using System.Text;
private static string ToTable(string input)
{
    // Pre-size the buffer to cut down on reallocations while appending.
    var result = new StringBuilder(input.Length * 2);
    result.AppendLine("<table>");
    foreach (var row in input.Split('|'))       // rows are separated by '|'
    {
        result.Append("<tr>");
        foreach (var cell in row.Split(','))     // cells are separated by ','
            result.AppendFormat("<td>{0}</td>", cell);
        result.AppendLine("</tr>");
    }
    result.AppendLine("</table>");
    return result.ToString();
}
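None of the string-building versions escape the cell values, so a value containing < or & would produce broken markup. A minimal sketch of the StringBuilder approach with System.Net.WebUtility.HtmlEncode added; the helper name ToHtmlTable is hypothetical, and the input is assumed to use '|' between rows and ',' between cells.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper: a single StringBuilder, with every cell HTML-encoded.
static string ToHtmlTable(string input)
{
    var sb = new StringBuilder(input.Length * 8); // rough initial capacity
    sb.Append("<table>");
    foreach (var row in input.Split('|'))
    {
        sb.Append("<tr>");
        foreach (var cell in row.Split(','))
        {
            // Append the pieces directly instead of parsing a format string per cell.
            sb.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
        }
        sb.Append("</tr>");
    }
    sb.Append("</table>");
    return sb.ToString();
}

Console.WriteLine(ToHtmlTable("1,5|2,5|3,5"));
// <table><tr><td>1</td><td>5</td></tr><tr><td>2</td><td>5</td></tr><tr><td>3</td><td>5</td></tr></table>
```

Chaining Append calls also avoids the format-string parsing that AppendFormat performs for every cell.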

In the code-behind, use String.Split to turn the string into a flat IList of values, set that list as the DataSource of a native ASP.NET DataList control, and call DataBind(). With RepeatColumns="2" and RepeatDirection="Horizontal", the DataList lays the values out as a two-column table:
<asp:DataList ID="YourDataList" RepeatLayout="Table" RepeatColumns="2" RepeatDirection="Horizontal" runat="server">
<ItemTemplate>
<%# Container.DataItem %>
</ItemTemplate>
</asp:DataList>
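For RepeatColumns="2" to reproduce the pairs, the data source has to be the flat sequence 1, 5, 2, 5, 3, 5 in row order. A sketch of just that code-behind step, assuming the same input string; ToDataSource is a hypothetical helper name, and the Page_Load wiring is shown only as comments because it needs ASP.NET Web Forms to run.

```csharp
using System;
using System.Collections.Generic;

// Split on both separators at once: "1,5|2,5|3,5" -> 1, 5, 2, 5, 3, 5.
// RepeatColumns="2" then starts a new table row after every second value.
static IList<string> ToDataSource(string input) =>
    input.Split(new[] { '|', ',' });

var values = ToDataSource("1,5|2,5|3,5");
Console.WriteLine(string.Join(" ", values)); // 1 5 2 5 3 5

// In the code-behind (requires ASP.NET Web Forms):
// YourDataList.DataSource = values;
// YourDataList.DataBind();
```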

Related

Find indexes in String using multiple search items and one single iteration

I have the following HTML sample document:
.....
<div class="TableElement">
<table>
<tr>
<th class="boxToolTip" title="La quotazione di A2A è in rialzo o in ribasso?"> </th>
..
<th class="boxToolTip" class="ColumnLast" title="Trades più recenti su A2A">Ora <img title='' alt='' class='quotePageRTupgradeLink' href='#quotePageRTupgradeContainer' id='cautionImageEnt' src='/common/images/icons/caution_sign.gif'/></th>
</tr>
<tr class="odd">
..
<td align="center"><span id="quoteElementPiece6" class="PriceTextUp">1,619</span></td>
<td align="center"><span id="quoteElementPiece7" class="">1,6235</span></td>
<td align="center"><span id="quoteElementPiece8" class="">1,591</span></td>
<td align="center"><span id="quoteElementPiece9" class="">1,5995</span></td>
..
</tr>
</table>
</div>
......
I need to get the values corresponding to quoteElementPiece 6, 7, 8, 9 and 17 (the last one is further down in the document).
I am simply searching one by one in the code at the moment:
int index6 = doc.IndexOf("quoteElementPiece6");
..
int index17 = doc.IndexOf("quoteElementPiece17");
I want to improve this by scanning in one go and having all the indexes for the substrings I need. Example:
var searchstrings = new string[]
{
"quoteElementPiece6",
"quoteElementPiece7",
"quoteElementPiece8",
"quoteElementPiece9",
"quoteElementPiece17"
};
int[] indexes = getIndexes(document, searchstrings); // indexes should be in the same order as searchstrings
Is there anything native in .NET that does this (LINQ, for instance)?
I know there are HTML parser libraries, but I prefer to avoid them; I would like to learn a technique that works for any kind of document.

var words = new []{
"quoteElementPiece6",
"quoteElementPiece7"};
// I take for granted your `document` is a string and not an `HtmlDocument` or whatnot.
var result = words.Select(word=>document.IndexOf(word));
Console.WriteLine(string.Join(",", result));
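The answer above still calls IndexOf once per word, i.e. one scan of the document per search string. If you literally want a single pass, a hand-rolled sketch could look like the following; GetIndexes mirrors the hypothetical getIndexes name from the question. It is not asymptotically faster (each position is still compared against every word not yet found); a multi-pattern algorithm such as Aho-Corasick would be needed for that. Note also that string.IndexOf(string) is culture-sensitive by default, while this sketch compares ordinally.

```csharp
using System;

// Hypothetical implementation of the question's getIndexes: walks the document
// once and, at each position, tests every search string that has not been found yet.
// Returns the first index of each search string (or -1), in searchStrings order.
static int[] GetIndexes(string document, string[] searchStrings)
{
    var indexes = new int[searchStrings.Length];
    Array.Fill(indexes, -1);
    int remaining = searchStrings.Length;

    for (int pos = 0; pos < document.Length && remaining > 0; pos++)
    {
        for (int k = 0; k < searchStrings.Length; k++)
        {
            string word = searchStrings[k];
            if (indexes[k] >= 0 || pos + word.Length > document.Length)
                continue;
            // Ordinal comparison of word against document[pos .. pos + word.Length).
            if (string.CompareOrdinal(document, pos, word, 0, word.Length) == 0)
            {
                indexes[k] = pos;
                remaining--;
            }
        }
    }
    return indexes;
}

var doc = "<span id=\"quoteElementPiece7\">1,6235</span><span id=\"quoteElementPiece6\">1,619</span>";
var found = GetIndexes(doc, new[] { "quoteElementPiece6", "quoteElementPiece7", "quoteElementPiece17" });
Console.WriteLine(string.Join(",", found));
```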

You can do this with a lambda over the list (List<T>.ForEach). Check my solution:
var doc = "this is my document";
List<string> searchstrings = new List<string>
{
"quoteElementPiece6",
"quoteElementPiece7",
"quoteElementPiece8",
"quoteElementPiece9",
"quoteElementPiece17"
};
var lastIndexOfList = new List<int>(searchstrings.Count);
searchstrings.ForEach(x => lastIndexOfList.Add(doc.LastIndexOf(x)));

var pattern = @"(?s)<tr class=""odd"">.+?</tr>";
var tr = Regex.Match(html, pattern).Value.Replace("&nbsp;", ""); // &nbsp; is not a predefined XML entity, so XElement.Parse would reject it
var xml = XElement.Parse(tr);
var nums = xml
.Descendants()
.Where(n => (string)n.Attribute("id") != null)
.Where(n => n.Attribute("id").Value.StartsWith("quoteElementPiece"))
.Select(n => Regex.Match(n.Attribute("id").Value, "[0-9]+").Value);
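The query above returns the numeric suffixes of the ids rather than the prices. If you want the values themselves, the same LINQ to XML approach can map each id to its element text. A sketch, assuming the row has already been isolated and cleaned so that XElement.Parse accepts it; here it is a string literal built from the question's sample row.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sample row copied from the question (assumed to be well-formed XML).
var tr = XElement.Parse(
    "<tr class=\"odd\">" +
    "<td align=\"center\"><span id=\"quoteElementPiece6\" class=\"PriceTextUp\">1,619</span></td>" +
    "<td align=\"center\"><span id=\"quoteElementPiece7\" class=\"\">1,6235</span></td>" +
    "<td align=\"center\"><span id=\"quoteElementPiece8\" class=\"\">1,591</span></td>" +
    "<td align=\"center\"><span id=\"quoteElementPiece9\" class=\"\">1,5995</span></td>" +
    "</tr>");

// Map each quoteElementPiece id to the text inside its element.
var values = tr.Descendants()
    .Where(e => ((string)e.Attribute("id") ?? "").StartsWith("quoteElementPiece", StringComparison.Ordinal))
    .ToDictionary(e => (string)e.Attribute("id"), e => e.Value);

Console.WriteLine(values["quoteElementPiece6"]); // 1,619
```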

How to extract a text version of the HTML in an XML document?

Suppose I have an XML document that looks something like (basically represents an HTML report):
<html>
<head>...</head>
<body>
<div>
<table>
<tr>
<td>Stuff</td>
</tr>
<tr>
<td>More stuff<br /><br />More stuff on another line and some whitespace... </td>
</tr>
<tr>
<td> Some leading whitespace before this stuff<br />Stuff</td>
</tr>
</table>
</div>
</body>
</html>
I want to (using C#) convert this document into a simple text string that looks something like:
Stuff
More stuff
More stuff on another line and some whitespace...
Some leading whitespace before this stuff
Stuff
It should be smart enough to convert table rows into new lines and to insert a new line wherever an inline br tag appears within a cell. It should also keep any whitespace in the table cells intact. I tried loading the document with the XmlDocument class and reading the InnerText property of the body node, but that does not produce the output I am looking for (newlines and whitespace are not preserved). Is there a simple way to do this? I know one way would be to extract the HTML as one string and apply several regular expressions to handle the newlines and whitespace. Thanks!

Try this please:
// requires: using System.Linq; using System.Text; using System.Xml; using System.Xml.Linq;
var doc = XElement.Load("test.xml");
var sb = new StringBuilder();
foreach (var text in doc.DescendantNodes().Where(node => node.NodeType == XmlNodeType.Text))
{
sb.AppendLine(((XText)text).Value);
}
More concise:
foreach (var text in doc.DescendantNodes().OfType<XText>())
{
sb.AppendLine(text.Value); // Value is the decoded text; ToString() would return XML-escaped text (e.g. &amp; for &)
}
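Scoping the query to body keeps the contents of head out of the result. A self-contained sketch using a trimmed copy of the question's document; the title element is an assumption standing in for the elided head.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Trimmed-down version of the question's report (assumed well-formed XML).
var xml =
    "<html><head><title>Report</title></head><body><div><table>" +
    "<tr><td>Stuff</td></tr>" +
    "<tr><td>More stuff<br /><br />More stuff on another line and some whitespace... </td></tr>" +
    "<tr><td> Some leading whitespace before this stuff<br />Stuff</td></tr>" +
    "</table></div></body></html>";

// By default XElement.Parse drops whitespace-only text nodes between tags, so each
// remaining XText is a run of cell text delimited by tags or <br /> elements.
// Leading and trailing spaces inside those runs are kept.
var body = XElement.Parse(xml).Element("body");
var lines = body.DescendantNodes().OfType<XText>().Select(t => t.Value).ToList();

Console.WriteLine(string.Join(Environment.NewLine, lines));
```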

How to Get element that inside another element by class in HtmlAgilityPack

Hello, I am making a request with HttpWebResponse and getting back an HTML page that contains the data I need, for example a table with dates that I need to save to an array list and then to an XML file.
Example of html Page
<table>
<tr>
<td class="padding5 sorting_1">
<span>01.03.14</span>
</td>
<td class="padding5 sorting_1">
<span>10.03.14</span>
</td>
</tr>
</table>
I am using the HtmlAgilityPack. With the code below I can get the text of span elements that have a given class:
private static List<string> GetListDataByClass(string HtmlSourse, string Class)
{
List<string> data = new List<string>();
HtmlAgilityPack.HtmlDocument DocToParse = new HtmlAgilityPack.HtmlDocument();
DocToParse.LoadHtml(HtmlSourse);
foreach (HtmlNode node in DocToParse.DocumentNode.SelectNodes("//span[@class='" + Class + "']"))
{
if(node.InnerText!=null) data.Add(node.InnerText);
}
return data;
}
But in my case it is the td that has the class, so I tried:
foreach (HtmlNode node in DocToParse.DocumentNode.SelectNodes("//td[@class='" + Class + "']"))
but this did not work.
So I need to read this data to get the dates 01.03.14 and 10.03.14.
Any ideas how I can get these dates?

Just change the XPath query to:
DocToParse.DocumentNode.SelectNodes("//td[@class='" + Class + "']/span")
This will select all the spans that are inside a td element with the corresponding class.
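Note that @class='...' compares the whole attribute string, so with class="padding5 sorting_1" it only matches if you pass both class names in exactly that order and spacing. To match a single class token, the usual XPath 1.0 idiom pads the attribute with spaces. The sketch below runs the expression through LINQ to XML's XPathSelectElements so it needs no extra packages; Html Agility Pack's SelectNodes also takes XPath 1.0 expressions, so the same string should work there, but that is an assumption not exercised here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// The sample table from the question; it happens to be well-formed XML.
var table = XElement.Parse(
    "<table><tr>" +
    "<td class=\"padding5 sorting_1\"><span>01.03.14</span></td>" +
    "<td class=\"padding5 sorting_1\"><span>10.03.14</span></td>" +
    "</tr></table>");

// Match td elements whose class list contains the token "padding5",
// then step down to their span children.
const string xpath =
    ".//td[contains(concat(' ', normalize-space(@class), ' '), ' padding5 ')]/span";

var dates = table.XPathSelectElements(xpath).Select(span => span.Value).ToList();
Console.WriteLine(string.Join(", ", dates)); // 01.03.14, 10.03.14
```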

How to Get element by class in HtmlAgilityPack

Hello, I am making a request with HttpWebResponse and getting back an HTML page that contains the data I need, for example a table with dates that I need to save to an array list and then to an XML file.
Example of html Page
<table>
<tr>
<td class="padding5 sorting_1">
<span class="DateHover">01.03.14</span>
</td>
<td class="padding5 sorting_1">
<span class="DateHover" >10.03.14</span>
</td>
</tr>
</table>
Here is my code, which does not work. I am using the HtmlAgilityPack:
private static string GetDataByIClass(string HtmlIn, string ClassToGet)
{
HtmlAgilityPack.HtmlDocument DocToParse = new HtmlAgilityPack.HtmlDocument();
DocToParse.LoadHtml(HtmlIn);
HtmlAgilityPack.HtmlNode InputNode = DocToParse.GetElementbyId(ClassToGet); // here is the problem: there is no DocToParse.GetElementbyClass method
if (InputNode != null)
{
if (InputNode.Attributes["value"].Value != null)
{
return InputNode.Attributes["value"].Value;
}
}
return null;
}
So I need to read this data to get the dates 01.03.14 and 10.03.14 so that I can save them to an array list (and then to an XML file).
Any ideas how I can get these dates?

Html Agility Pack supports XPath, so you can do something like this:
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//span[@class='" + ClassToGet + "']"))
{
string value = node.InnerText;
// etc...
}
This means: select all SPAN elements anywhere in the document (// searches from the root through all descendants) whose CLASS attribute equals the given value, then get the inner text of each one.

xpath and htmlagility pack

I figured it out! I will leave this posted in case another newbie like me has the same question.
Answer: ("./td[2]/span[@class='smallfont']")
I am a novice at XPath and the Html Agility Pack. I am so close yet so far.
GOAL: to pull out 4:30am
by using the following with the Html Agility Pack:
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@id='weekdays']/tr[2]")) {
    string time = table.SelectSingleNode("./td[2]").InnerText;
}
This gets me down to "\r\n\t\t\r\n\t\t\t4:30am\r\n\t\t\r\n\t", but when I try to do anything with the span I get XPath exceptions. What must I add to ("./td[2]") to end up with just 4:30am?
HTML
<td class="alt1 espace" nowrap="nowrap" style="text-align: center;">
<span class="smallfont">4:30am</span>
</td>

I don't know if LINQ is an option, but you could also have done something like this:
var time = string.Empty;
var html =
"<td class=\"alt1 espace\" nowrap=\"nowrap\" style=\"text-align: center;\"><span class=\"smallfont\">4:30am</span></td>";
var document = new HtmlDocument() { OptionWriteEmptyNodes = true, OptionOutputAsXml = true };
document.LoadHtml(html);
var timeSpan =
document.DocumentNode.Descendants("span").Where(
n => n.Attributes["class"] != null && n.Attributes["class"].Value == "smallfont").FirstOrDefault();
if (timeSpan != null)
time = timeSpan.InnerHtml;
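If you stay with XPath, another option is to let normalize-space() strip the tabs and line breaks. To keep this sketch self-contained it uses LINQ to XML's XPathEvaluate on a made-up row rather than the Html Agility Pack; the first td ("Mon") is an assumption, since the question only shows the second cell.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

// A row shaped like the question's HTML, including the whitespace around the span.
var row = XElement.Parse(
    "<tr><td>Mon</td>" +
    "<td class=\"alt1 espace\" nowrap=\"nowrap\" style=\"text-align: center;\">\r\n\t\t" +
    "<span class=\"smallfont\">4:30am</span>\r\n\t</td></tr>");

// normalize-space() trims leading/trailing whitespace and collapses internal runs,
// so the string value of the second cell comes back as just the time.
var time = (string)row.XPathEvaluate("normalize-space(td[2])");

// Targeting the span directly, as in the question's own answer, gives the same text.
var viaSpan = (string)row.XPathEvaluate("string(td[2]/span[@class='smallfont'])");

Console.WriteLine(time);    // 4:30am
Console.WriteLine(viaSpan); // 4:30am
```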
