I need to select the value (hours) related to a specific date. For example, in the HTML below I need to read the value 24:20 based on the number 6.
This is the HTML:
<div class="day-ofmonth">
<div class="day-ofmonth">
<span class="day-num">6</span>
<span class="available-time">24:20</span>
</div>
<div class="day-ofmonth">
<span class="day-num">7</span>
<span class="available-time">133:50</span>
</div>
<div class="day-ofmonth">
<div class="day-ofmonth">
If I use:
IWebElement t_value = d.FindElement(By.XPath(".//*[@id='calinfo']/div[9]/span[2]"));
var t_again2 = t_value.GetAttribute("textContent");
I get 24:20, but I need to get that value (24:20 in this case) based on the number 6 (the day of the month) rather than on a fixed XPath position, because the date is different every day. If anyone can point me in the right direction, thanks.
string availableTime = null;
// Find all elements with class = 'day-num'
var dayNums = d.FindElements(By.XPath("//span[@class='day-num']"));
foreach (IWebElement dayNum in dayNums)
{
// check if text is equal to 6
if (dayNum.Text == "6")
{
// get the following sibling with class = 'available-time', then get the text
availableTime = dayNum.FindElement(By.XPath("following-sibling::span[@class='available-time']")).Text;
break;
}
}
A one-liner solution:
string availableTime = d.FindElement(By.XPath("//span[@class='day-num' and text()='6']/following-sibling::span[@class='available-time']")).Text;
xpath=//span[text()='6']/following-sibling::span[1]
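The same XPath can be built from the day number instead of hard-coding 6. Below is a minimal sketch of that idea. It assumes a well-formed copy of the markup and uses System.Xml.Linq rather than Selenium so that it runs without a browser; DayLookup, Sample and AvailableTimeFor are hypothetical names, not part of any library.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class DayLookup
{
    // Well-formed copy of the calendar markup from the question (assumption: two days only).
    public static XDocument Sample()
    {
        return XDocument.Parse(
            "<div id='calinfo'>" +
            "<div class='day-ofmonth'><span class='day-num'>6</span><span class='available-time'>24:20</span></div>" +
            "<div class='day-ofmonth'><span class='day-num'>7</span><span class='available-time'>133:50</span></div>" +
            "</div>");
    }

    // Returns the available time shown next to the given day, or null if that day is not listed.
    public static string AvailableTimeFor(XDocument calendar, int day)
    {
        // Same expression as the one-liner above, with the day number substituted in.
        string xpath = "//span[@class='day-num' and text()='" + day + "']" +
                       "/following-sibling::span[@class='available-time']";
        return calendar.XPathSelectElement(xpath)?.Value;
    }
}
```

For today's date the call would be DayLookup.AvailableTimeFor(DayLookup.Sample(), DateTime.Today.Day). With Selenium, the same string would presumably be passed to By.XPath in place of the hard-coded expression.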
I am building a scraper to be used on many sites (too many to scrape manually using a web scraping tool such as Octoparse).
Each site will probably be different in structure. Some sites may have data that I wish to scrape; some may not. This is to be determined using a list of keywords/keyphrases. On the sites I do want to parse, the data is likely to be presented as a list of some kind. However, the HTML elements used to present the list are indeterminate (i.e. it could be a ul/li list, a list of divs, a table, etc.).
If a keyword/keyphrase is found, I wish for not only that element to be parsed, but all others that may be part of the same list/group.
Example 1
<div>
<h1>Random content I am not interested in</h1>
</div>
<div>
<h1>Some more random content I am not interested in</h1>
</div>
<div>
<ul>
<li>Dogs</li>
<li>Cats</li>
<li>Birds</li>
</ul>
</div>
Example 2
<div>
<h1>Random content I am not interested in</h1>
</div>
<div>
<h1>Some more random content I am not interested in</h1>
</div>
<div>
<div>
<div>
<div>
<h1>Bob</h1>
<p>A description of Bob</p>
</div>
<div>
<h1>Ben</h1>
<p>A description of Ben</p>
</div>
<div>
<h1>Bill</h1>
<p>A description of Bill</p>
</div>
</div>
</div>
</div>
From example one, if I had identified the element Dogs, I would like the result to be Dogs, Cats, Birds.
From example two, if I had identified Ben, I would like the result to be 3 div elements, each of which contains the heading and paragraph; the key is that all results are to include HTML, not just text.
Any help/guidance would be much appreciated.
I managed something like this:
static IEnumerable<string> FindSimilarItems(string html, string[] values, int maxDepth)
{
var doc = new HtmlDocument();
doc.LoadHtml(html);
var output = new List<string>();
foreach (var value in values)
{
var rootElement = doc.DocumentNode.SelectSingleNode($"//*[text()='{value}']");
if (rootElement == null) continue;
for (int i = 0; i < maxDepth; i++)
{
var newXpath = RemoveXpathGroupIndex(rootElement.XPath, i);
var newElements = doc.DocumentNode.SelectNodes(newXpath);
if (newElements == null || newElements.Count <= 1) continue; // SelectNodes returns null when nothing matches
output.AddRange(newElements.Select(x => x.InnerText));
}
}
return output.GroupBy(x => x).Select(x => x.First()).ToList();
}
static string RemoveXpathGroupIndex(string xpath, int groupElement)
{
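var splited = xpath.Split('/');
var pickedElement = splited.Length - 1 - groupElement;
if (pickedElement < 0) return xpath; // requested depth exceeds the path length
var bracket = splited[pickedElement].IndexOf('[');
if (bracket >= 0) splited[pickedElement] = splited[pickedElement].Substring(0, bracket); // drop the positional index, e.g. li[2] -> li
return string.Join("/", splited);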
}
This code:
var similarItems = FindSimilarItems(input1, new string[] { "Dogs" }, 3);
Will return
["Dogs", "Cats", "Birds"]
I need to render "placeholders" when the iterations of a foreach loop do not fill out a row in a grid system.
There is room for 3 columns per row.
If the last row holds only 1 item, render 2 empty columns with e.g. a background image.
If the last row holds only 2 items, render 1 empty column with e.g. a background image.
If the total is 3, 6, 9, 12, etc., render no placeholders.
I'm looking for a more dynamic way to express this logic (and cleaner code).
var loop = GetLoop("ItemPublisher:Items.List");
int totalItems = loop.Count;
int remainders = totalItems % 3;
int placeholders = 3 - remainders;
string renderPlaceholders = placeholders == 3 ? "0" : placeholders.ToString();
int index = 0;
@foreach(var i in loop){
var title = i.GetString("ItemPublisher:Item.Title");
var imagepath = !string.IsNullOrEmpty(i.GetString("ItemPublisher:Item.Image.ImagePath")) ? i.GetString("ItemPublisher:Item.Image.ImagePath") : "/Files/Images/placeholder.jpg";
<div class="grid__col-md-4">
<h4>@title</h4>
<img src="/Admin/Public/GetImage.ashx?width=992&height=560&crop=0&Compression=75&image=@imagepath"/>
</div>
}
@*Render placeholders*@
@if(placeholders == 1)
{
var imagepath = "/Files/Images/placeholder.jpg";
<div class="grid__col-md-4 placeholder">
<h4></h4>
<img src="/Admin/Public/GetImage.ashx?width=992&height=560&crop=0&Compression=75&image=@imagepath"/>
</div>
}
else if(placeholders == 2)
{
var imagepath = "/Files/Images/placeholder.jpg";
<div class="grid__col-md-4 placeholder">
<h4></h4>
<img src="/Admin/Public/GetImage.ashx?width=992&height=560&crop=0&Compression=75&image=@imagepath"/>
</div>
<div class="grid__col-md-4 placeholder">
<h4></h4>
<img src="/Admin/Public/GetImage.ashx?width=992&height=560&crop=0&Compression=75&image=@imagepath"/>
</div>
}
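The number of placeholders can be computed with a single modulo expression, which removes the need for one branch per case. The following is a minimal sketch of that arithmetic only; GridPlaceholders and Count are hypothetical names, and the column count of 3 is taken from the question.

```csharp
using System;

public static class GridPlaceholders
{
    // Number of empty cells needed to fill the last row of a grid.
    // The outer "% columns" turns a full last row (remainder 0) into 0 placeholders instead of 'columns'.
    public static int Count(int totalItems, int columns = 3)
    {
        if (columns <= 0) throw new ArgumentOutOfRangeException(nameof(columns));
        if (totalItems < 0) throw new ArgumentOutOfRangeException(nameof(totalItems));
        return (columns - totalItems % columns) % columns;
    }
}
```

In the view, a single for loop running from 0 up to that count could then render the placeholder markup once per iteration, replacing the separate if / else if blocks.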
I just can't figure this one out.
I have to search through all nodes whose class is "item extend featured" (code below). In each of those nodes I need to select the InnerText of <h2 class="itemtitle"> and the href value of the link inside it, plus the InnerText of each div inside <div class="title-additional">.
<li class="item extend featured">
<div class="title-box">
<h2 class="itemtitle">
<a target="_top" href="www.example.com/example1/example2/exammple4/example4" title="PC Number 1">PC Number 1</a>
</h2>
<div class="title-additional">
<div class="title-km">150 km</div>
<div class="title-year">2009</div>
<div class="title-price">250 €</div>
</div>
The output should be something like this:
Title:
href:
Title-km:
Title-year:
Title-Price:
--------------
Title:
href:
Title-km:
Title-year:
Title-Price:
--------------
So, the question is, how to traverse through all "item extend featured" nodes in html and select items I need above from each node?
As I understand, something like this should work but it breaks halfway
EDIT: I just noticed, there are ads on the site that share the exact same class and they obviously don't have the elements I need. More problems to think about.
var items1 = htmlDoc.DocumentNode.SelectNodes("//*[@class='item extend featured']");
foreach (var e in items1)
{
var test = e.SelectSingleNode(".//a[@target='_top']").InnerText;
Console.WriteLine(test);
}
var page = new HtmlDocument();
page.Load(path);
var lists = page.DocumentNode.SelectNodes("//li[@class='item extend featured']");
foreach(var list in lists)
{
var link = list.SelectSingleNode(".//*[@class='itemtitle']/a");
string title = link.GetAttributeValue("title", string.Empty);
string href = link.GetAttributeValue("href", string.Empty);
string km = list.SelectSingleNode(".//*[@class='title-km']").InnerText;
string year = list.SelectSingleNode(".//*[@class='title-year']").InnerText;
string price = list.SelectSingleNode(".//*[@class='title-price']").InnerText;
Console.WriteLine("Title: {0}\r\n href: {1}\r\n Title-km: {2}\r\n Title-year: {3}\r\n Title-Price: {4}\r\n\r\n", title, href, km, year, price);
}
What you are trying to achieve requires multiple XPath expressions as you can't return multiple results at different levels using one query (unless you use Union perhaps).
What you might be looking for is something similar to this:
var listItems = htmlDoc.DocumentNode.SelectNodes("//li[@class='item extend featured']");
foreach(var li in listItems) {
// the leading '.' keeps each query relative to the current <li> instead of searching the whole document
var title = li.SelectNodes(".//h2/a/text()");
var href = li.SelectNodes(".//h2/a/@href");
var title_km = li.SelectNodes(".//div[@class='title-additional']/div[@class='title-km']/text()");
var title_... // other divs
}
Note: code not tested
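Regarding the ads that share the same class: one option is to look up every required child first and skip the item if any of them is missing. The sketch below illustrates that pattern under some assumptions. It uses System.Xml.Linq on a well-formed fragment instead of HtmlAgilityPack so that it has no external dependencies, and FeaturedItems, Sample and Extract are hypothetical names. With HtmlAgilityPack the equivalent check would be for SelectSingleNode returning null.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.XPath;

public static class FeaturedItems
{
    // Well-formed sample: one real listing and one ad that shares the class (assumed markup).
    public static XDocument Sample()
    {
        return XDocument.Parse(
            "<ul>" +
            "<li class='item extend featured'><div class='title-box'>" +
            "<h2 class='itemtitle'><a target='_top' href='www.example.com/example1' title='PC Number 1'>PC Number 1</a></h2>" +
            "<div class='title-additional'><div class='title-km'>150 km</div>" +
            "<div class='title-year'>2009</div><div class='title-price'>250</div></div>" +
            "</div></li>" +
            "<li class='item extend featured'><div class='ad'>Sponsored</div></li>" +
            "</ul>");
    }

    // Returns one formatted line per listing; items missing any expected child (e.g. ads) are skipped.
    public static List<string> Extract(XDocument page)
    {
        var rows = new List<string>();
        foreach (var li in page.XPathSelectElements("//li[@class='item extend featured']"))
        {
            var link = li.XPathSelectElement(".//h2[@class='itemtitle']/a");
            var km = li.XPathSelectElement(".//div[@class='title-km']");
            var year = li.XPathSelectElement(".//div[@class='title-year']");
            var price = li.XPathSelectElement(".//div[@class='title-price']");
            if (link == null || km == null || year == null || price == null)
                continue; // not a real listing, most likely an ad

            string href = (string)link.Attribute("href");
            rows.Add("Title: " + link.Value + " | href: " + href +
                     " | Title-km: " + km.Value + " | Title-year: " + year.Value +
                     " | Title-Price: " + price.Value);
        }
        return rows;
    }
}
```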
I'm using MVCGrid.Net.
Here is my cshtml page
<div class="well">
<div class="form-inline">
<div class="form-group">
<input type="text" class="form-control" placeholder="Opprtunity ID" data-mvcgrid-type="filter" data-mvcgrid-option="opprtunityid" />
</div>
<div class="form-group">
<input class="form-control" placeholder="Cluster" data-mvcgrid-type="filter" data-mvcgrid-option="Cluster" />
</div>
<button type="button" class="btn btn-default" data-mvcgrid-apply-filter="click">Apply</button>
</div>
</div>
I have two simple search inputs. When I try to bind them to the MVCGrid config file, I can't see their values in QueryOptions.
Here is my grid-options:
.WithRetrieveDataMethod((context) =>
{
var options = context.QueryOptions;
int totalRecords;
var repo = DependencyResolver.Current.GetService<General>();
string sortColumn = options.GetSortColumnData<string>();
var items = repo.GetData(out totalRecords,
options.GetFilterString("opprtunityid"),
options.GetFilterString("Cluster"),
//active,
options.GetLimitOffset(),
options.GetLimitRowcount(),
sortColumn, options.SortDirection == SortDirection.Dsc);
return new QueryResult<SourcedPartner>()
{
Items = items,
TotalRecords = totalRecords
};
})
Here options.GetFilterString("opprtunityid") returns null.
Can someone explain why?
When using MVCGrid.Net, you have to make sure that you set up the table definition in MVCGridConfig.cs .
Key elements to filtering are:
1) When declaring the column, you must make sure that you add the following code to the column definition -
.AddColumns(cols => {
cols.Add("opprtunityid").WithVisibility(false) // name must match the data-mvcgrid-option value used in the view
.WithFiltering(true) // MUST have filtering enabled on column definition, otherwise it will not appear in QueryOptions
.WithValueExpression(i => i.OpportunityID);
cols.Add("Cluster").WithHeaderText("Cluster")
.WithFiltering(true)
.WithVisibility(false)
.WithAllowChangeVisibility(true)
.WithValueExpression(i => i.Cluster);
2) You must make sure to include filtering as part of your MVCGridBuilder construction -
MVCGridDefinitionTable.Add("Filtered", new MVCGridBuilder<SourcedPartner>()
.AddColumns(....)
.WithSorting(true, "MySortedColumnName")
.WithFiltering(true) // This lets the GridContext know that something will populate QueryOptions.Filters section
.WithRetrieveDataMethod((context) =>
{
var options = context.QueryOptions;
string opID = options.GetFilterString("opprtunityid");
string cluster = options.GetFilterString("Cluster");
.......
});
When you debug your code, the Filters portion of QueryOptions will be populated with your values from the input boxes. If there is no value, you will have a zero length string that you must check for.
The column needs to have filtering enabled. The builder must have filtering enabled. The column name must match the data-mvcgrid-option name.
When all of these things are set up, you should see the value from your inputs in the Filter section of the QueryOptions.
Know this is late, hope this helps.
Look in the URL to see what variable/s are being sent and set accordingly in options.GetFilterString(******).
Worked for me.
I am using MVC + EF
I have a feed XML file URL that gets updated with items every 7 minutes. Every time a new item is added, I retrieve all the items into a list variable and then add them to my database table. After that I fill a new list variable, which is my ViewModel, from the database table. Then I declare the ViewModel inside my view, which is a .cshtml file, loop through all of the objects and display them.
How can I make sure that the newest items are placed at the top and not at the bottom, and that the numbers are displayed in the correct order?
This is how I display the items inside my .cshtml file. Note that I use ++number, so the newest item needs to be 1 and so on:
@model Project.Viewmodel.ItemViewModel
@{
int number = 0;
}
<div id="news-container">
@foreach (var item in Model.NewsList.OrderByDescending(n => n.PubDate))
{
<div class="grid">
<div class="number">
<p class="number-data">@(++number)</p>
</div>
<p class="news-title">@(item.Title)</p>
<div class="item-content">
<div class="imgholder">
<img src="@item.Imageurl" />
<p class="news-description">
@(item.Description)
<br />@(item.PubDate) |
Source
</p>
</div>
</div>
</div>
}
</div>
This is how I fill the view model that I iterate through in the .cshtml file to display the items:
private void FillProductToModel(ItemViewModel model, News news)
{
var productViewModel = new NewsViewModel
{
Description = news.Description,
NewsId = news.Id,
Title = news.Title,
link = news.Link,
Imageurl = news.Image,
PubDate = news.Date,
};
model.NewsList.Add(productViewModel);
}
If you check this image, that's how it gets displayed with the numbers, which is incorrect.
The arrows show how it should be. How can I accomplish that?
Any kind of help is appreciated :)
Note: when I remove .OrderByDescending, the numbers are correct on each grid. But I need .OrderByDescending because I want the latest added item at the top.
Try this:
@model Project.Viewmodel.ItemViewModel
@{
int number = 0;
var NewsItems = Model.NewsList.OrderByDescending(n => n.PubDate).ToList();
}
<div id="news-container">
@foreach (var item in NewsItems)
{
<div class="grid">
<div class="number">
<p class="number-data">@(++number)</p>
</div>
<p class="news-title">@(item.Title)</p>
<div class="item-content">
<div class="imgholder">
<img src="@item.Imageurl" />
<p class="news-description">
@(item.Description)
<br />@(item.PubDate) |
Source
</p>
</div>
</div>
</div>
}
</div>
Looking at your sketch I assume you have float: left or display: inline-block for a grid class. Adding float: right might do the trick.
If that does not help please post CSS you have.
Just a quick word:
you are passing NewsViewModel to the view but iterating over ItemViewModel. Why?
Do you think this may be the cause of the problem?
Regards
You could sort your news list using the CompareTo method:
model.NewsList.Sort((a, b) => b.PubDate.Date.CompareTo(a.PubDate.Date));
Once you have the list sorted correctly, you can simply use CSS to display the news list two items per row. See this fiddle.
The fiddle is a revised one which was provided to me in a similar question I asked before.
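As a rough illustration of the sorting step, the sketch below orders items newest first and attaches the display number in the same pass, so the number always follows the sorted order. NewsOrdering, NewsItem and Numbered are hypothetical names, and NewsItem is a simplified stand-in for the NewsViewModel from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NewsOrdering
{
    // Simplified stand-in for NewsViewModel (assumption: only the fields needed here).
    public sealed class NewsItem
    {
        public string Title { get; set; }
        public DateTime PubDate { get; set; }
    }

    // Returns the items newest first, each paired with its 1-based display number.
    public static List<(int Number, NewsItem Item)> Numbered(IEnumerable<NewsItem> items)
    {
        return items
            .OrderByDescending(n => n.PubDate)                        // newest item first
            .Select((item, index) => (Number: index + 1, Item: item)) // number follows the sorted position
            .ToList();
    }
}
```

Because the number is derived from the position after sorting, the newest item is always 1. How the numbered boxes are then arranged on the page is still a matter for the CSS.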
Try this one
private void FillProductToModel(ItemViewModel model, IEnumerable<News> newsItems)
{
// Sort newest first, then map each entity to a view model
foreach (var news in newsItems.OrderByDescending(x => x.Date))
{
var productViewModel = new NewsViewModel
{
Description = news.Description,
NewsId = news.Id,
Title = news.Title,
link = news.Link,
Imageurl = news.Image,
PubDate = news.Date,
};
model.NewsList.Add(productViewModel);
}