Getting Href property with AngleSharp LINQ query - C#

I am trying to understand how to use AngleSharp.
I made this code based on the example (https://github.com/AngleSharp/AngleSharp):
// Setup the configuration to support document loading
var config = Configuration.Default.WithDefaultLoader();
// Address of the page to load
var address = "http://store.scramblestuff.com/";
// Asynchronously get the document in a new context using the configuration
var document = await BrowsingContext.New(config).OpenAsync(address);
// This CSS selector gets the desired content
var menuSelector = "#storeleft a";
// Perform the query to get all menu links
var menuItems = document.QuerySelectorAll(menuSelector);
// We are only interested in the text - select it with LINQ
var titles = menuItems.Select(m => m.TextContent).ToList();
var output = string.Join("\n", titles);
Console.WriteLine(output);
This works as expected, but now I want to access the Href property and I am unable to do so:
var links = menuItems.Select(m => m.Href).ToList();
When I look in the debugger, the Results View shows that each item is an HtmlAnchorElement with an Href property, but I am obviously not accessing it correctly.
None of the examples in the documentation show a property being accessed, so I guess it is too simple to need showing, but I cannot see how to do it.
Can anyone show me how I should be accessing an HTML property with AngleSharp?
Edit:
This works when I cast it to the correct type:
foreach (IHtmlAnchorElement menuLink in menuItems)
{
    Console.WriteLine(menuLink.Href.ToString());
}
How would I write that as a LINQ statement like the titles variable?

Alternative to har07's answer:
var menuItems = document.QuerySelectorAll(menuSelector).OfType<IHtmlAnchorElement>();

You can cast to IHtmlAnchorElement as follows:
var links = menuItems.Select(m => ((IHtmlAnchorElement)m).Href).ToList();
or use Cast<IHtmlAnchorElement>():
var links = menuItems.Cast<IHtmlAnchorElement>()
                     .Select(m => m.Href)
                     .ToList();
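The OfType and Cast variants behave differently when the selector also matches elements that are not anchors. Here is a minimal sketch of that difference using plain LINQ. The INode, Anchor and Span types are hypothetical stand-ins for the AngleSharp element types, so it runs without the library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CastVersusOfType
{
    // Hypothetical stand-ins for IElement / IHtmlAnchorElement (illustration only).
    public interface INode { }
    public sealed class Anchor : INode { public string Href { get; set; } = ""; }
    public sealed class Span : INode { }

    // OfType<T>() skips every element that is not a T.
    public static List<string> HrefsWithOfType(IEnumerable<INode> nodes) =>
        nodes.OfType<Anchor>().Select(a => a.Href).ToList();

    // Cast<T>() throws InvalidCastException at the first element that is not a T.
    public static List<string> HrefsWithCast(IEnumerable<INode> nodes) =>
        nodes.Cast<Anchor>().Select(a => a.Href).ToList();

    public static void Main()
    {
        var nodes = new List<INode>
        {
            new Anchor { Href = "/a" },
            new Span(),
            new Anchor { Href = "/b" }
        };

        Console.WriteLine(string.Join(", ", HrefsWithOfType(nodes)));

        try
        {
            HrefsWithCast(nodes);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("Cast<T>() threw on the non-anchor element");
        }
    }
}
```

With a selector like "#storeleft a" every match should be an anchor, so both variants should give the same result. OfType is the safer choice when the selector is broader.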

I'm a bit late to this topic, but you can use
string link = menuItem.GetAttribute("href");
or this if it's a list of items:
List<string> menuItems = LinkList.Select(item => item.GetAttribute("href")).ToList();

Related

Parsing with AngleSharp

I am writing a program to parse some data from a website using AngleSharp. Unfortunately I didn't find any documentation, which makes it really hard to understand.
How can I get only the link using QuerySelectorAll? Right now I'm getting the whole <a ...>...</a> element with the name of the article:
1. Name of article
The method I'm using now:
var items = document.QuerySelectorAll("a").Where(item => item.ClassName != null && item.ClassName.Contains("object-title-a text-truncate"));
In the previous example I also used ClassName.Contains("object-name"), but table cells have no class at all. As I understand it, to pick out the right element I may also need to use some information about its parent. So here is the question: how can I get the value '4' from this table cell?
....<th class="strong">Room</th>
<td>4</td>....
Regarding your first question, here is an example that extracts the link address.
This is a link to another related Stack Overflow post.
var source = @"<a href='http://kinnisvaraportaal-kv-ee.postimees.ee/muua-odra-tanaval-kesklinnas-valmiv-suur-ja-avar-k-2904668.html?nr=1&search_key=69ec78d9b1758eb34c58cf8088c96d10' class='object-title-a text-truncate'>1. Name of article</a>";
var parser = new HtmlParser();
var doc = parser.Parse(source);
var selector = "a";
var menuItems = doc.QuerySelectorAll(selector).OfType<IHtmlAnchorElement>();
foreach (var i in menuItems)
{
Console.WriteLine(i.Href);
}
For your second question, you can check the example in the documentation. Here is the code sample:
// Setup the configuration to support document loading
var config = Configuration.Default.WithDefaultLoader();
// Load the names of all The Big Bang Theory episodes from Wikipedia
var address = "https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes";
// Asynchronously get the document in a new context using the configuration
var document = await BrowsingContext.New(config).OpenAsync(address);
// This CSS selector gets the desired content
var cellSelector = "tr.vevent td:nth-child(3)";
// Perform the query to get all cells with the content
var cells = document.QuerySelectorAll(cellSelector);
// We are only interested in the text - select it with LINQ
var titles = cells.Select(m => m.TextContent);
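The Wikipedia sample shows the general pattern but not the th/td case from the question. Here is a rough sketch of one way to read the cell that follows a given header: find the th by its text, then take its next sibling element. It assumes a recent AngleSharp release where a document can be opened from a string with OpenAsync(req => req.Content(...)). The class and method names are made up for illustration. The fragment is wrapped in a table because an HTML parser drops th and td tags that appear outside one.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;

public static class RoomCellExample
{
    // Hypothetical helper: returns the text of the cell right after the <th> whose text equals header.
    public static async Task<string> ReadCellAfterHeaderAsync(string html, string header)
    {
        var context = BrowsingContext.New(Configuration.Default);
        var document = await context.OpenAsync(req => req.Content(html));

        // Find the header cell by its text content.
        var th = document.QuerySelectorAll("th")
                         .FirstOrDefault(h => h.TextContent.Trim() == header);

        // The value cell is the element immediately after the header cell (null if nothing matched).
        return th?.NextElementSibling?.TextContent.Trim();
    }

    public static async Task Main()
    {
        // Assumed markup: the question's fragment placed inside a table row.
        var html = "<table><tr><th class='strong'>Room</th><td>4</td></tr></table>";
        Console.WriteLine(await ReadCellAfterHeaderAsync(html, "Room"));
    }
}
```

If the position alone is enough, the CSS adjacent sibling selector "th.strong + td" should select the same cell without the LINQ filter, but it matches every such pair on the page.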

How do I isolate the name property of a model being included in my controller for the view?

I have two models, DiscountTypes and Discounts. In my view I can display the DiscountTypes.Name property of my Discounts in my foreach loop by adding the following to my controller:
var eDiscounts = db.EDiscounts.Include(e => e.DiscountTypes);
and in my view:
@Html.DisplayFor(modelItem => item.DiscountTypes.Name)
This works fine. However, I would like to get the actual string being printed so that I can change the values in a switch statement, for example:
case "Apparel and Accessories":
DiscountTypes.Name = "foo";
break;
I have tried this:
var DTDiscounts = db.EDiscounts.Include(e => e.DiscountTypes);
var DTD = DTDiscounts.ToList();
var myDis = (from e in DTD select e.DiscountTypes.Name).ToList();
ViewBag.DType = myDis;
This simply prints out System.Collections.Generic.List`1[System.String].
I have also tried using .First(), but it just gives me the first type in the iteration repeated for all rows. How would I go about doing this?
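The List`1 output appears because printing a list calls its ToString(), which returns the type name rather than the contents. Here is a small sketch of relabeling each name individually, outside of MVC and Entity Framework. Only the "Apparel and Accessories" to "foo" mapping comes from the question. The Relabel helper and the "Electronics" value are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DiscountNames
{
    // Hypothetical mapping; extend the switch with the real discount type names.
    public static string Relabel(string name)
    {
        switch (name)
        {
            case "Apparel and Accessories":
                return "foo";
            default:
                return name; // leave unknown names unchanged
        }
    }

    // Applies the mapping to every name, keeping one result per input row.
    public static List<string> RelabelAll(IEnumerable<string> names) =>
        names.Select(Relabel).ToList();

    public static void Main()
    {
        var names = new List<string> { "Apparel and Accessories", "Electronics" };

        // Printing the list itself shows the type name, not the items.
        Console.WriteLine(names);

        // Joining (or looping over) the items shows the actual strings.
        Console.WriteLine(string.Join(", ", RelabelAll(names)));
    }
}
```

In a view, the same idea would mean calling something like Relabel on item.DiscountTypes.Name for each row rather than passing the whole list through ViewBag.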

Trying to get a list of a single field from all the documents in my Mongo database

I'm using the latest driver. My documents are of the form
{
"ItemID": 292823,
....
}
First problem: I'm attempting to get a list of all the ItemIDs, and then sort them. However, my search is just pulling back all the _id, and none of the ItemIDs. What am I doing wrong?
var f = Builders<BsonDocument>.Filter.Empty;
var p = Builders<BsonDocument>.Projection.Include(x => x["ItemID"]);
var found= collection.Find(f).Project<BsonDocument>(p).ToList().ToArray();
When I attempt to query the output, I get the following.
found[0].ToJson()
"{ \"_id\" : ObjectId(\"56fc4bd9ea834d0e2c23a4f7\") }"
It's missing ItemID, and just has the mongo id.
Solution: I messed up the case. It's itemID, not ItemID. I'm still having trouble with the sorting.
Second problem: I tried changing the second line to have x["ItemID"].AsInt32, but then I got an InvalidOperationException with the error
Rewriting child expression from type 'System.Int32' to type
'MongoDB.Bson.BsonValue' is not allowed, because it would change the
meaning of the operation. If this is intentional, override
'VisitUnary' and change it to allow this rewrite.
I want them as ints so that I can add a sort to the query. My sort was the following:
var s = Builders<BsonDocument>.Sort.Ascending(x => x);
var found= collection.Find(f).Project<BsonDocument>(p).Sort(s).ToList().ToArray();
Would this be the correct way to sort it?
Found the solution.
//Get all documents
var f = Builders<BsonDocument>.Filter.Empty;
//Just pull itemID
var p = Builders<BsonDocument>.Projection.Include(x => x["itemID"]);
//Sort ascending by itemID
var s = Builders<BsonDocument>.Sort.Ascending("itemID");
//Apply the builders, and then use the Select method to pull up the itemID's as ints
var found = collection.Find(f)
.Project<BsonDocument>(p)
.Sort(s)
.ToList()
.Select(x=>x["itemID"].AsInt32)
.ToArray();

Create a list of items within 'var response'

I have the following code:
var request = new GeocodingRequest();
request.Address = postcode;
request.Sensor = "false";
var response = GeocodingService.GetResponse(request);
var result = response.Results. ...?
I'd very much like to get result as a list, but I can't seem to convert it. I know I can do something like response.Results.ToList<string>();, but have had no luck.
Can anyone help please :)
Well you can just use:
GeocodingResult[] results = response.Results;
or
List<GeocodingResult> results = response.Results.ToList();
If you want a list of strings, you'll need to decide how you want to convert each result into a string. For example, you might use:
List<string> results = response.Results
.Select(result => result.FormattedAddress)
.ToList();
It is defined as:
[JsonProperty("results")]
public GeocodingResult[] Results { get; set; }
If you want to turn it into a list, call response.Results.ToList().
But why do you want a list? A list lets you insert items, but I don't think you need that here.
Assuming response.Results is an IEnumerable, just make sure the System.Linq namespace is imported and call response.Results.ToList().

return all xml items with same name in linq

How can I query an XML file that has multiple items with the same name so that I get all of them back? Currently I only get the first result.
I managed to get it to work with the following code, but this returns all items where the search criterion is met.
What I want as output is two results where the location is Dublin, for example.
The question is: how can I achieve this with LINQ to XML?
Cheers Chris,
Here is the code
string location = "Oslo";
var training = (from item in doc.Descendants("item")
where item.Value.Contains(location)
select new
{
event = item.Element("event").Value,
event_location = item.Element("location").Value
}).ToList();
The xml file looks like this
<training>
<item>
<event>C# Training</event>
<location>Prague</location>
<location>Oslo</location>
<location>Amsterdam</location>
<location>Athens</location>
<location>Dublin</location>
<location>Helsinki</location>
</item>
<item>
<event>LINQ Training</event>
<location>Bucharest</location>
<location>Oslo</location>
<location>Amsterdam</location>
<location>Helsinki</location>
<location>Brussels</location>
<location>Dublin</location>
</item>
</training>
You're using item.Element("location") which returns the first location element under the item. That's not necessarily the location you were looking for!
I suspect you actually want something more like:
string location = "Oslo";
var training = from loc in doc.Descendants("location")
               where loc.Value == location
               select new
               {
                   // "event" is a reserved word in C#, so it needs the @ prefix
                   @event = loc.Parent.Element("event").Value,
                   event_location = loc.Value
               };
But then again, what value does event_location then provide, given that it's always going to be the location you've passed into the query?
If this isn't what you want, please give more details - your question is slightly hard to understand at the moment. Details of what your current code gives and what you want it to give would be helpful - as well as what you mean by "name" (in that it looks like you actually mean "value").
EDIT: Okay, so it sounds like you want:
string location = "Oslo";
var training = from loc in doc.Descendants("location")
               where loc.Value == location
               select new
               {
                   // "event" is a reserved word in C#, so it needs the @ prefix
                   @event = loc.Parent.Element("event").Value,
                   event_locations = loc.Parent.Elements("location")
                                        .Select(e => e.Value)
               };
event_locations will now be a sequence of strings. You can get the output you want with:
foreach (var entry in training)
{
    Console.WriteLine("Event: {0}; Locations: {1}",
                      entry.@event,
                      string.Join(", ", entry.event_locations.ToArray()));
}
Give that a try and see if it's what you want...
This might not be the most efficient way of doing it, but this query works:
var training = (from item in root.Descendants("item")
where item.Value.Contains(location)
select new
{
name = item.Element("event").Value,
location = (from node in item.Descendants("location")
where node.Value.Equals(location)
select node.Value).FirstOrDefault(),
}).ToList();
(Note that the code wouldn't compile if the property name was event, so I changed it to name.)
I believe the problem with your code was that the location node retrieved when creating the anonymous type didn't search for the node with the desired value.
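For anyone who wants to try this end to end, here is a self-contained sketch that searches by location element and then reads the event and all sibling locations from the parent item. The TrainingQuery class and EventsAt method are names made up for illustration. The XML is the sample from the question, inlined as a string.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TrainingQuery
{
    // Sample XML copied from the question.
    public const string Xml = @"<training>
  <item>
    <event>C# Training</event>
    <location>Prague</location>
    <location>Oslo</location>
    <location>Amsterdam</location>
    <location>Athens</location>
    <location>Dublin</location>
    <location>Helsinki</location>
  </item>
  <item>
    <event>LINQ Training</event>
    <location>Bucharest</location>
    <location>Oslo</location>
    <location>Amsterdam</location>
    <location>Helsinki</location>
    <location>Brussels</location>
    <location>Dublin</location>
  </item>
</training>";

    // Returns one "event: loc1, loc2, ..." line per item that lists the given location.
    public static string[] EventsAt(XDocument doc, string location) =>
        doc.Descendants("location")
           .Where(loc => loc.Value == location)          // match the location element itself
           .Select(loc => loc.Parent.Element("event").Value + ": " +
                          string.Join(", ", loc.Parent.Elements("location").Select(e => e.Value)))
           .ToArray();

    public static void Main()
    {
        var doc = XDocument.Parse(Xml);
        foreach (var line in EventsAt(doc, "Dublin"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Searching for "Dublin" should return both items, while "Athens" should return only the C# Training item.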
