Parsing with AngleSharp

I'm writing a program to parse some data from a website using AngleSharp. Unfortunately, I couldn't find any documentation, which makes it really hard to understand.
How can I use QuerySelectorAll to get only the link? Right now I get the whole <a ...>...</a> element together with the name of the article:
1. Name of artucle
The method I'm using now:
var items = document.QuerySelectorAll("a").Where(item => item.ClassName != null && item.ClassName.Contains("object-title-a text-truncate"));
In a previous example I also used ClassName.Contains("object-name"), but table cells have no class at all. As I understand it, to select the right element I may also need to use some information about its parent. So here is the question: how can I get this '4' value from the table cell?
....<th class="strong">Room</th>
<td>4</td>....

Regarding your first question:
Here is an example of how you can extract the link address.
There is also a related Stack Overflow post on the same topic.
var source = @"<a href='http://kinnisvaraportaal-kv-ee.postimees.ee/muua-odra-tanaval-kesklinnas-valmiv-suur-ja-avar-k-2904668.html?nr=1&search_key=69ec78d9b1758eb34c58cf8088c96d10' class='object-title-a text-truncate'>1. Name of artucle</a>";
var parser = new HtmlParser();
// Note: AngleSharp 0.10 and later renamed Parse to ParseDocument
var doc = parser.Parse(source);
var selector = "a";
// Keep only anchor elements so that the typed Href property is available
var menuItems = doc.QuerySelectorAll(selector).OfType<IHtmlAnchorElement>();
foreach (var i in menuItems)
{
    Console.WriteLine(i.Href);
}
For your second question, see the example in the documentation. The code sample from there is below:
// Setup the configuration to support document loading
var config = Configuration.Default.WithDefaultLoader();
// Load the names of all The Big Bang Theory episodes from Wikipedia
var address = "https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes";
// Asynchronously get the document in a new context using the configuration
var document = await BrowsingContext.New(config).OpenAsync(address);
// This CSS selector gets the desired content
var cellSelector = "tr.vevent td:nth-child(3)";
// Perform the query to get all cells with the content
var cells = document.QuerySelectorAll(cellSelector);
// We are only interested in the text - select it with LINQ
var titles = cells.Select(m => m.TextContent);
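
Applied to the table cell from the question, the same idea could look roughly like the sketch below. It assumes AngleSharp 0.10 or later (where the parser method is ParseDocument), and the surrounding table markup is made up for illustration, since the question only shows the th and td. Two options are shown: a sibling selector, and a lookup by the header text.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Parser;

class RoomCellExample
{
    static void Main()
    {
        // Hypothetical markup modelled on the fragment in the question
        var html = "<table><tr><th class=\"strong\">Room</th><td>4</td></tr></table>";
        var document = new HtmlParser().ParseDocument(html);

        // Option 1: pure CSS, the td that directly follows a th with class "strong"
        var byCss = document.QuerySelector("th.strong + td")?.TextContent;

        // Option 2: find the header cell by its text, then step to the next sibling element
        var byText = document.QuerySelectorAll("th")
            .FirstOrDefault(th => th.TextContent.Trim() == "Room")
            ?.NextElementSibling
            ?.TextContent;

        Console.WriteLine(byCss);
        Console.WriteLine(byText);
    }
}
```

Option 2 is probably the safer choice when several header cells share the "strong" class, because it keys on the label rather than on the styling.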

Related

MongoDB search using text index not working c#

I have added a text index on my collection. I am trying to filter the data using a text search combined with some additional filters, but the text search does not work well with the other filters.
{$text:{$search:"test"},Type:"5"}
The above query returns all 42 entries matching the criteria in MongoDB Atlas.
But when I run it from C#, I think I am building the query incorrectly. What am I missing here?
var collection = db.GetCollection<TestTbl>("TestTbl");
var filter = Builders<TestTbl>.Filter.Text(searchtext)
    & Builders<TestTbl>.Filter.Eq("TypeID", TypeID);
var data = collection.Find(filter).ToList();
Here data is returned as null.
When I use only the text filter, it works fine:
{$text:{$search:"test"}}
var collection = db.GetCollection<TestTbl>("TestTbl");
var filter = Builders<TestTbl>.Filter.Text(searchtext);
var data = collection.Find(filter).ToList();

The error was in my model: the field was of type Guid, and I forgot to specify the BSON string representation for it.
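
For anyone hitting the same thing, the fix might look roughly like this. The class shape is an assumption (only the TypeID name comes from the question), and it assumes the values are stored as strings in the collection.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

// Hypothetical shape of the TestTbl model; only TypeID is taken from the question
public class TestTbl
{
    public ObjectId Id { get; set; }

    // The collection stores this value as a string, so the driver must
    // serialize the Guid as a string when it builds the filter
    [BsonRepresentation(BsonType.String)]
    public Guid TypeID { get; set; }
}
```

With the attribute in place, the Eq filter compares against the string form of the Guid instead of a binary value, which is presumably why the combined filter started matching.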

How do I grab a parameter within a document in MongoDB (C#)

I have the following problem: I filtered the collection to get the specific document I need. That document contains some fields with values. How do I grab a specific field and its value from that document?
MongoClient client = new MongoClient();
var db = client.GetDatabase("myfirstdb");
var collection = db.GetCollection<PlayerInfo>("players");
var filter = Builders<PlayerInfo>.Filter.Eq("playerName", player.Name);
//find in document filter the parameter "isAdmin" and grab its value.
This is what my document looks like.

You have to use a Project clause when you perform the Find operation with the filter. The code below will do it:
bool isAdmin = collection.Find(filter).Project(x => x.isAdmin).FirstOrDefault();
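
A slightly fuller sketch of the same idea, in case it helps. The PlayerInfo class and the PlayerQueries helper are assumptions for illustration; only the playerName and isAdmin names come from the question.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical model; only playerName and isAdmin appear in the question
public class PlayerInfo
{
    public ObjectId Id { get; set; }
    public string playerName { get; set; }
    public bool isAdmin { get; set; }
}

public static class PlayerQueries
{
    // Returns the isAdmin flag of the first player with the given name
    public static bool IsAdmin(IMongoCollection<PlayerInfo> collection, string name)
    {
        var filter = Builders<PlayerInfo>.Filter.Eq(p => p.playerName, name);

        // Project asks the server for the isAdmin field only
        return collection.Find(filter).Project(p => p.isAdmin).FirstOrDefault();
    }
}
```

Note that FirstOrDefault returns false when no document matches, so a missing player and a non-admin player look the same to the caller.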

Avoid full table scan in LiteDB?

See code:
var lines = new List<PosLine>(){
    new PosLine{Name="John", Address="dummy1", Tstamp=DateTime.Now},
    new PosLine{Name="Jane", Address="dummy2", Tstamp=DateTime.Now}
};
using(var db = new LiteDatabase(@"test.db"))
{
    var posLines = db.GetCollection<PosLine>("POS");
    foreach(var line in lines)
    {
        var id = posLines.Insert(line);
        Console.WriteLine("id=" + id.ToString());
    }
    var names = posLines.FindAll().Select(p => p.Name).ToList();
    foreach(var name in names)
    {
        Console.WriteLine("name=" + name);
    }
}
The line var names = posLines.FindAll().Select(p => p.Name).ToList(); tries to get a list of "Name", but in this case, it's a full table scan. Is there a way to avoid full table scan, like if I create an index on "Name" property, and then fetch all names from that index?

If you are reading all documents, you will never avoid a full scan. With an index on Name you can do a full index scan instead (avoiding a full "table" scan). The difference between the two is the deserialization time and the amount of data read (a full index scan is much cheaper).
Unfortunately, the current version of LiteDB has no option to get the index keys only. It would be quite easy to implement, so open an issue on GitHub and it could be added in the next version.

Getting Href property with anglesharp linq query

I am trying to understand how to use AngleSharp.
I made this code based on the example (https://github.com/AngleSharp/AngleSharp):
// Setup the configuration to support document loading
var config = Configuration.Default.WithDefaultLoader();
// Address of the store page to load
var address = "http://store.scramblestuff.com/";
// Asynchronously get the document in a new context using the configuration
var document = await BrowsingContext.New(config).OpenAsync(address);
// This CSS selector gets the desired content
var menuSelector = "#storeleft a";
// Perform the query to get all menu links
var menuItems = document.QuerySelectorAll(menuSelector);
// We are only interested in the text - select it with LINQ
var titles = menuItems.Select(m => m.TextContent).ToList();
var output = string.Join("\n", titles);
Console.WriteLine(output);
This works as expected, but now I want to access the Href property and I am unable to do so:
var links = menuItems.Select(m => m.Href).ToList();
When I look in the debugger, I can see in the results view that the HtmlAnchorElement objects have an Href property, but I am obviously not accessing it correctly.
None of the examples in the documentation show a property being accessed, so I guess it's something so simple that it doesn't need to be shown, but I can't see how to do it.
Can anyone show me how I should access an HTML property with AngleSharp?
Edit:
This works when I cast it to the correct type:
foreach (IHtmlAnchorElement menuLink in menuItems)
{
    Console.WriteLine(menuLink.Href.ToString());
}
How would I write that as a LINQ statement like the titles variable?

Alternative to har07's answer:
var menuItems = document.QuerySelectorAll(menuSelector).OfType<IHtmlAnchorElement>();

You can cast to IHtmlAnchorElement as follows:
var links = menuItems.Select(m => ((IHtmlAnchorElement)m).Href).ToList();
or use Cast<IHtmlAnchorElement>():
var links = menuItems.Cast<IHtmlAnchorElement>()
    .Select(m => m.Href)
    .ToList();

I'm a bit late to this topic, but you can use
string link = menuItem.GetAttribute("href");
or this if it's a list of items:
List<string> menuItems = LinkList.Select(item => item.GetAttribute("href")).ToList();
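
To compare the approaches in this thread side by side, here is a small sketch. It assumes AngleSharp 0.10 or later, and the markup and URLs are made up; only the "#storeleft a" selector comes from the question.

```csharp
using System;
using System.Linq;
using AngleSharp.Html.Dom;
using AngleSharp.Html.Parser;

class HrefExample
{
    static void Main()
    {
        // Hypothetical markup; the id matches the "#storeleft a" selector from the question
        var html = "<div id=\"storeleft\">"
                 + "<a href=\"/puzzles\">Puzzles</a>"
                 + "<a href=\"https://example.com/timers\">Timers</a>"
                 + "</div>";
        var document = new HtmlParser().ParseDocument(html);
        var menuItems = document.QuerySelectorAll("#storeleft a");

        // Typed access: Href is the link resolved against the document's base URL
        var resolved = menuItems.OfType<IHtmlAnchorElement>()
            .Select(a => a.Href)
            .ToList();

        // Untyped access: GetAttribute returns the attribute text exactly as written
        var raw = menuItems.Select(a => a.GetAttribute("href")).ToList();

        Console.WriteLine(string.Join("\n", resolved));
        Console.WriteLine(string.Join("\n", raw));
    }
}
```

The two lists can differ for relative links, so which one to use depends on whether you want the absolute URL or the original attribute value. OfType also silently skips any element that is not an anchor, whereas Cast throws on one.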

Get items value from Dynamodb search result

I have the following code to query DynamoDB:
Search search = table.Query(new QueryOperationConfig { Filter = filter, AttributesToGet = list of attributes});
I can see there is data in search by expanding its node while debugging, but I could not find an easy way to get the items' keys and values directly.
I tried this:
List<Document> documentSet = new List<Document>();
do
{
    documentSet = search.GetNextSet();
    foreach (var document in documentSet)
    {
        HttpContext.Current.Response.Write(document["columnName"]);
        HttpContext.Current.Response.Write(document["columnName"]);
    }
} while (!search.IsDone);
Is there a direct way to get the keys and values from the Search object, as JSON, as a table, or in any other form?
Thanks

Given an individual Document (in your case, the document object), you can call document.ToJson() or document.ToJsonPretty() to retrieve the JSON representation of the document. This blog post provides more details.
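
As a rough illustration, the sketch below builds a Document locally instead of querying a table, so the attribute names and values are made up. In the code from the question, each document would come from search.GetNextSet() instead.

```csharp
using System;
using Amazon.DynamoDBv2.DocumentModel;

class DocumentJsonExample
{
    static void Main()
    {
        // Hypothetical item built by hand; in the question it would come from search.GetNextSet()
        var document = new Document();
        document["columnName"] = "some value";
        document["Rooms"] = 4;

        // A Document is a dictionary of attribute names to DynamoDBEntry values,
        // so the keys can be enumerated directly
        foreach (var attribute in document)
        {
            Console.WriteLine(attribute.Key);
        }

        // The whole item as a JSON string
        Console.WriteLine(document.ToJson());
    }
}
```

Inside the do/while loop from the question, calling document.ToJson() on each item in documentSet should give one JSON string per item without naming the columns up front.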
