HtmlAgilityPack - Parse table and assign rows to custom model - c#

So I'm trying to scrape some website data (specifically the first table here). I am using the table xpath, and trying to get the specific row data assigned to my model.
public static async Task<List<SuspensionModel>> GetSuspensionData()
{
var htmlDocument = new HtmlDocument();
var httpResponseMessage = await _httpClient.GetAsync(_2020SuspUrl);
await EnsureSuccessStatusCode(httpResponseMessage);
var SuspStatsAsHtml = await httpResponseMessage.Content.ReadAsStringAsync();
htmlDocument.LoadHtml(SuspStatsAsHtml);
var suspData = ParseTable(htmlDocument, "/html/body/div[3]/div[3]/div[5]/div[1]/table[1]/tbody/tr");
//return ;
}
private static List<SuspensionModel> ParseTable(HtmlDocument htmlDocument, string xPath)
{
var returnData = new List<SuspensionModel>();
foreach (HtmlNode row in htmlDocument.DocumentNode.SelectNodes(xPath))
{
HtmlNodeCollection cells = row.SelectNodes("td");
var arr = new String[7];
for (int i = 0; i < cells.Count; ++i)
{
arr[i] = cells[i].InnerText;
}
var susp = new SuspensionModel
{
IncidentDate = DateTime.Parse(arr[0]),
OffenderName = arr[1],
OffenderTeam = arr[2],
OffenseDesc = arr[3],
ActionDate = DateTime.Parse(arr[4]),
OffenseLength = arr[5],
SalaryLoss = int.Parse(arr[6])
};
returnData.Add(susp);
}
return returnData;
}
In my ParseTable method, where I am assigning values in my model, how can I access the specific cell data in the given row? Basically, I want to do something like:
foreach row, step through each cell and assign to the correct model value. As I have it now, my cells variable always returns null, so I assume I am not using HtmlAgilityPack correctly.
Any assistance is appreciated here!

I ended up resolving this. I was missing two things, and it turns out it wasn't related to HtmlAgilityPack.
I needed to add .Skip(1) to my foreach row so that it skipped the table header row.
foreach (HtmlNode row in htmlDocument.DocumentNode.SelectNodes(xPath).Skip(1))
I needed to fix my SalaryLoss value. I was assigning it as an int, but I needed to change that to a double as it was a currency value.
SalaryLoss = double.Parse(arr[6], System.Globalization.NumberStyles.Currency)
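As an alternative to .Skip(1), the parser can skip any row that has no td cells, which also covers a missing table. The following is a minimal sketch, not the poster's final code: SuspensionRow, TableParser, the three-column layout, the sample HTML, and the en-US culture are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public class SuspensionRow // hypothetical, trimmed-down model
{
    public DateTime IncidentDate { get; set; }
    public string OffenderName { get; set; } = "";
    public double SalaryLoss { get; set; }
}

public static class TableParser
{
    public static List<SuspensionRow> ParseRows(string html, string rowXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<SuspensionRow>();
        var rows = doc.DocumentNode.SelectNodes(rowXPath);
        if (rows == null) return result; // SelectNodes returns null when nothing matches

        // Assumption: the page uses US-style dates and "$" amounts.
        var usCulture = CultureInfo.GetCultureInfo("en-US");

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("td");
            // Header rows typically use <th>, so "td" matches nothing and cells is null.
            if (cells == null || cells.Count < 3) continue;

            result.Add(new SuspensionRow
            {
                IncidentDate = DateTime.Parse(cells[0].InnerText.Trim(), usCulture),
                OffenderName = cells[1].InnerText.Trim(),
                SalaryLoss = double.Parse(cells[2].InnerText.Trim(), NumberStyles.Currency, usCulture)
            });
        }
        return result;
    }

    public static void Main()
    {
        const string html =
            "<table><tr><th>Date</th><th>Name</th><th>Loss</th></tr>" +
            "<tr><td>1/15/2020</td><td>Player A</td><td>$1,234.50</td></tr></table>";

        foreach (var r in ParseRows(html, "//table/tr"))
            Console.WriteLine($"{r.IncidentDate:yyyy-MM-dd} {r.OffenderName} {r.SalaryLoss}");
    }
}
```

Checking for null cells instead of skipping a fixed number of rows keeps the loop working if the table gains a second header row or a footer row.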

Related

C# GoogleSheets Update Request not Executing

I'm trying to update values in a Google Spreadsheet. The code runs up to addRequest.Execute(); but the Execute call itself never seems to run.
This does work if I run an Append request; however, I'm not trying to append, I'm trying to update.
I have the following scopes for the program:
static readonly string[] Scope = { SheetsService.Scope.Spreadsheets, DriveService.Scope.Drive };
var range = $"{ClashImport[i][0].ToString()}!B7:F106";
var REALInsertList = new sData.ValueRange();
var InsertList = new List<object>();
for (int n = 0; n < DataImport[i].Count; n++) {
InsertList.Add(DataImport[i][n].AccountName);
InsertList.Add(DataImport[i][n].AccountID);
InsertList.Add(DataImport[i][n].Banned);
InsertList.Add(DataImport[i][n].Suspended);
InsertList.Add(DataImport[i][n].History);
}
REALInsertList.Values = new List<IList<object>> { InsertList };
var addRequest = sheetsService.Spreadsheets.Values.Update(REALInsertList, SheetToImportTo, range);
addRequest.ValueInputOption = SpreadsheetsResource.ValuesResource.UpdateRequest.ValueInputOptionEnum.USERENTERED;
addRequest.Execute();
This example will help you to achieve what you are trying to do:
// Define request parameters.
// The ID of the spreadsheet to update.
string spreadsheetId = "YOUR-SPREADSHEET-ID"; // TODO: Update placeholder value.
// How the input data should be interpreted.
string valueInputOption = "RAW"; // TODO: Update placeholder value. Ex -> RAW
// The new values to apply to the spreadsheet.
List<ValueRange> data = new List<ValueRange>(); // Instantiate a list of type ValueRange
ValueRange values = new ValueRange(); // Instantiate a ValueRange object
values.Range = "A1:B2"; // The range you want to update
// Depending on your number of rows, create some logic to populate them
List<object> firstRow = new List<object> { "Hello", 2};
List<object> secondRow = new List<object> { 3, "Hey!"};
// Populate the values to be inserted in the sheet
values.Values = new List<IList<object>> { firstRow, secondRow };
// add values to the data ValueRange List
data.Add(values);
// TODO: Assign values to desired properties of `requestBody`:
BatchUpdateValuesRequest requestBody = new BatchUpdateValuesRequest();
requestBody.ValueInputOption = valueInputOption;
requestBody.IncludeValuesInResponse = true;
requestBody.Data = data;
// Build and make the request
SpreadsheetsResource.ValuesResource.BatchUpdateRequest request
= service.Spreadsheets.Values.BatchUpdate(requestBody, spreadsheetId);
BatchUpdateValuesResponse response = request.Execute();
IList<IList<object>> updatedValues = response.Responses[0].UpdatedData.Values;
// Print updated values
Console.WriteLine("These are the updated values");
foreach (var row in updatedValues)
{
Console.WriteLine("{0}, {1}", row[0], row[1]);
}
Console.Read();
Following the documentation for the spreadsheets.values.batchUpdate method and its "Try this API" panel, I was able to work out how to build the update request body.
Notice I created a List<ValueRange>, which I populate with the appropriate data and in that way make the request.
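One thing worth checking in the question's code: its loop adds every field of every record to a single inner list, which makes one very wide row, while the range B7:F106 is only five columns wide. The thread does not confirm that this is what stopped the Update request, so treat the following as a sketch of how the values could be shaped instead, with one inner list per record. AccountRecord and SheetRows are hypothetical names, and the snippet deliberately avoids the Google client library so that it runs on its own; the result has the IList<IList<object>> shape that ValueRange.Values is assigned in the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class AccountRecord // hypothetical stand-in for the items in DataImport[i]
{
    public string AccountName { get; set; } = "";
    public string AccountID { get; set; } = "";
    public bool Banned { get; set; }
    public bool Suspended { get; set; }
    public string History { get; set; } = "";
}

public static class SheetRows
{
    // One inner list per spreadsheet row, five cells each (columns B to F).
    public static IList<IList<object>> ToRows(IEnumerable<AccountRecord> records) =>
        records
            .Select(r => (IList<object>)new List<object>
                { r.AccountName, r.AccountID, r.Banned, r.Suspended, r.History })
            .ToList();

    public static void Main()
    {
        var rows = ToRows(new[]
        {
            new AccountRecord { AccountName = "alice", AccountID = "1", History = "none" },
            new AccountRecord { AccountName = "bob", AccountID = "2", Banned = true, History = "spam" }
        });

        Console.WriteLine($"{rows.Count} rows x {rows[0].Count} columns"); // 2 rows x 5 columns
    }
}
```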
Documentation
For more info, you can check:
.NET Quickstart
Google Sheets API .NET reference documentation

Why is this ArrayList duplicating the rows

I am a junior developer and I am trying to populate an ArrayList from a Dictionary. My problem is that rather than just adding a new record to the ArrayList, it adds the new record but also overwrites the values of all the other records already in the list.
If I inspect the values as the ArrayList is being populated, I see the values from the Dictionary as expected. But when that row is inserted into the ArrayList, all of the existing rows are overwritten with the data from the current Dictionary row. So I end up with an ArrayList containing several rows that are all duplicates of the last record added from the dictionary. Can someone please tell me what I am doing wrong? My code is shown below.
ArrayList arrData = new ArrayList();
eSummary edata = new eSummary();
//Starts with the first 50 records retrieved and adds them to the ArrayList. Loops through to get the remaining records
while (blnEmpty)
{
if (response.IsSuccessStatusCode)
{
string json = response.Content.ReadAsStringAsync().Result;
var jss = new JavaScriptSerializer();
var dict = jss.Deserialize<Dictionary<string, dynamic>>(json);
for (int i = 0; i < dict.Values.Sum(x => x.Count); i++)
{
foreach (var item in dict)
{
string checkId = (dict["data"][i]["Id"]);
edata.Id = dict["data"][i]["Id"];
edata.idExternal = (dict["data"][i]["idExternal"]) == null ? "" : (dict["data"][i]["idExternal"]);
edata.Type = "Video";
edata.ownerId = (dict["data"][i]["uploadedByOwnerId"]);
edata.dateUploaded = Convert.ToDateTime((dict["data"][i]["dateUploaded"]));
edata.durationSeconds = Convert.ToDouble((dict["data"][i]["durationSeconds"]));
edata.category = (dict["data"][i]["categories"]).Count < 1 ? string.Empty : (dict["data"][i]["categories"][0]);
edata.title = (dict["data"][i]["title"]) == string.Empty ? string.Empty : (dict["data"][i]["title"]);
edata.dateRecordStarted = Convert.ToDateTime((dict["data"][i]["dateRecordStart"]));
edata.DateAPIRan = DateTime.Now;
if (CheckAutoTag(checkId, dict["data"][i]["tags"]))
{
edata.AutoTagged = true;
}
else edata.AutoTagged = false;
arrData.Add(edata);
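edata is a reference type, and you create it once, outside the loop. Each iteration updates that same object and adds another reference to it, so every entry in the list points to one object holding the last values written.
You need to call new eSummary() inside the loop, set the values on the new object, and then add that to your list.
Also note that you should not be using ArrayList in modern C#. Use a List<eSummary> instead.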
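The difference can be reproduced in a few lines. This is a minimal sketch: ESummary here is a hypothetical one-property stand-in for the question's eSummary class, and the ids are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ESummary // hypothetical, trimmed-down version of eSummary
{
    public string Id { get; set; } = "";
}

public static class ReferenceDemo
{
    // Mirrors the question: one object created outside the loop and added repeatedly.
    public static List<ESummary> BuildShared(IEnumerable<string> ids)
    {
        var list = new List<ESummary>();
        var shared = new ESummary();
        foreach (var id in ids)
        {
            shared.Id = id;   // overwrites the same object every time
            list.Add(shared); // adds another reference to that one object
        }
        return list;
    }

    // The fix: a new object per iteration, stored in a typed List<T>.
    public static List<ESummary> BuildFresh(IEnumerable<string> ids)
    {
        var list = new List<ESummary>();
        foreach (var id in ids)
        {
            var item = new ESummary { Id = id };
            list.Add(item);
        }
        return list;
    }

    public static void Main()
    {
        var ids = new[] { "a", "b", "c" };
        Console.WriteLine(string.Join(",", BuildShared(ids).Select(e => e.Id))); // c,c,c
        Console.WriteLine(string.Join(",", BuildFresh(ids).Select(e => e.Id)));  // a,b,c
    }
}
```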

How to add distinct value in database using Entity Framework

IEnumerable<WebsiteWebPage> data = GetWebPages();
foreach (var value in data)
{
if (value.WebPage.Contains(".htm"))
{
WebsiteWebPage pagesinfo = new WebsiteWebPage();
pagesinfo.WebPage = value.WebPage;
pagesinfo.WebsiteId = websiteid;
db.WebsiteWebPages.Add(pagesinfo);
}
}
db.SaveChanges();
I want to add only distinct values to the database in the code above. Please help me do this, as I have not been able to find a solution.
IEnumerable<WebsiteWebPage> data = GetWebPages();
foreach (var value in data)
{
if (value.WebPage.Contains(".htm"))
{
var a = db.WebsiteWebPages.Where(i => i.WebPage == value.WebPage.ToString()).ToList();
if (a.Count == 0)
{
WebsiteWebPage pagesinfo = new WebsiteWebPage();
pagesinfo.WebPage = value.WebPage;
pagesinfo.WebsiteId = websiteid;
db.WebsiteWebPages.Add(pagesinfo);
db.SaveChanges();
}
}
}
This is the code that I used to add distinct data. I hope it helps.
In addition to the code sample Furkan Öztürk supplied, make sure your DB has a constraint so that you cannot enter duplicate values in the column. A belt-and-braces approach.
I assume that by "distinct values" you mean "distinct value.WebPage values":
// get existing values (if you ever need this)
var existingWebPages = db.WebsiteWebPages.Select(v => v.WebPage);
// get your pages
var webPages = GetWebPages().Where(v => v.WebPage.Contains(".htm"));
// get distinct WebPage values except existing ones
var distinctWebPages = webPages.Select(v => v.WebPage).Distinct().Except(existingWebPages);
// create WebsiteWebPage objects
var websiteWebPages = distinctWebPages.Select(v =>
new WebsiteWebPage { WebPage = v, WebsiteId = websiteid});
// save all at once
db.WebsiteWebPages.AddRange(websiteWebPages);
db.SaveChanges();
Assuming that you need them to be unique by WebPage and WebsiteId:
IEnumerable<WebsiteWebPage> data = GetWebPages();
foreach (var value in data)
{
if (value.WebPage.Contains(".htm"))
{
WebsiteWebPage pagesinfo = new WebsiteWebPage();
if (db.WebsiteWebPages.All(c=>c.WebPage != value.WebPage|| c.WebsiteId != websiteid))
{
pagesinfo.WebPage = value.WebPage;
pagesinfo.WebsiteId = websiteid;
db.WebsiteWebPages.Add(pagesinfo);
}
}
}
db.SaveChanges();
UPDATE
To optimize this (given that your table contains much more data than your current list), override Equals (and GetHashCode) in your WebsiteWebPage class to define your uniqueness criteria, then:
var myWebsiteWebPages = data.Select(x=> new WebsiteWebPage { WebPage = x.WebPage, WebsiteId = websiteid}).Distinct();
var duplicates = db.WebsiteWebPages.Where(x=> myWebsiteWebPages.Contains(x));
db.WebsiteWebPages.AddRange(myWebsiteWebPages.Where(x=> !duplicates.Contains(x)));
This uses a single database query to retrieve ONLY the duplicates, which are then excluded from the list before it is added.
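For reference, here is a minimal in-memory sketch of what overriding Equals and GetHashCode on the entity could look like, and how Distinct and Except then behave. It uses plain lists rather than a DbContext, and the sample data is made up. One caveat to keep in mind: these overrides apply to LINQ to Objects; when Entity Framework translates a query to SQL it does not call them, so the database-side comparison may need to be written against the key columns instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WebsiteWebPage
{
    public int WebsiteId { get; set; }
    public string WebPage { get; set; } = "";

    // Two pages are "the same" when both the site and the page name match.
    public override bool Equals(object obj) =>
        obj is WebsiteWebPage other
        && WebsiteId == other.WebsiteId
        && WebPage == other.WebPage;

    // Must agree with Equals: equal objects produce equal hash codes.
    public override int GetHashCode() => (WebsiteId, WebPage).GetHashCode();
}

public static class DistinctDemo
{
    public static void Main()
    {
        var incoming = new List<WebsiteWebPage>
        {
            new WebsiteWebPage { WebsiteId = 1, WebPage = "a.htm" },
            new WebsiteWebPage { WebsiteId = 1, WebPage = "a.htm" }, // duplicate in the input
            new WebsiteWebPage { WebsiteId = 1, WebPage = "b.htm" }
        };
        var existing = new List<WebsiteWebPage>
        {
            new WebsiteWebPage { WebsiteId = 1, WebPage = "b.htm" } // pretend this row is already stored
        };

        // Distinct drops the repeated a.htm; Except drops b.htm because it already exists.
        var toInsert = incoming.Distinct().Except(existing).ToList();

        foreach (var page in toInsert)
            Console.WriteLine(page.WebPage); // a.htm
    }
}
```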
You can use the following code:
IEnumerable<WebsiteWebPage> data = GetWebPages();
var templist = new List<WebsiteWebPage>();
foreach (var value in data)
{
if (value.WebPage.Contains(".htm"))
{
WebsiteWebPage pagesinfo = new WebsiteWebPage();
pagesinfo.WebPage = value.WebPage;
pagesinfo.WebsiteId = websiteid;
templist.Add(pagesinfo);
}
}
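// Group by WebPage: every item here has the same WebsiteId, so grouping by it would keep only one row
var distinctList = templist.GroupBy(x => x.WebPage).Select(group => group.First()).ToList();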
db.WebsiteWebPages.AddRange(distinctList);
db.SaveChanges();
Or you can use MoreLINQ's DistinctBy here to filter the list by a key, like:
var res = templist.DistinctBy(x => x.WebPage).ToList();
db.WebsiteWebPages.AddRange(res);
db.SaveChanges();

How to add data from Firebase to DataGridView using FireSharp

I just want to retrieve data from Firebase into a DataGridView. The code I have already retrieves the data; however, it writes everything to the same row instead of creating a new one. I'm a beginner in coding, so I really need help with that.
I read online that Firebase doesn't count data, so I needed to create a counter and update it each time I add or delete data. I did that and it's working. I then created a method to load the data.
private async Task firebaseData()
{
int i = 0;
FirebaseResponse firebaseResponse = await client.GetAsync("Counter/node");
Counter_class counter = firebaseResponse.ResultAs<Counter_class>();
int foodCount = Convert.ToInt32(counter.food_count);
while (true)
{
if (i == foodCount)
{
break;
}
i++;
try
{
FirebaseResponse response2 = await client.GetAsync("Foods/0" + i);
Foods foods = response2.ResultAs<Foods>();
this.dtGProductsList.Rows[0].Cells[0].Value = foods.menuId;
this.dtGProductsList.Rows[0].Cells[1].Value = foods.name;
this.dtGProductsList.Rows[0].Cells[2].Value = foods.image;
this.dtGProductsList.Rows[0].Cells[3].Value = foods.price;
this.dtGProductsList.Rows[0].Cells[4].Value = foods.discount;
this.dtGProductsList.Rows[0].Cells[5].Value = foods.description;
}
catch
{
}
}
MessageBox.Show("Done");
}
Note: A DataTable (dataTable) already exists, and there is a DataGridView whose columns (ID, Name, Image, Price, Discount, Description) match the number and order of the .Cells[x] indexes. When the Form loads, dtGProductsList.DataSource = dataTable; is set. I tried replacing [0] with [i].
I expect each retrieved record to go into a new row rather than the same one, without skipping rows. I'm sorry if this is too simple, but I can't see a way out.
I faced the same problem and here is my solution:
// XClass stands for your own item type (Foods in the question)
FirebaseResponse firebaseResponse = await client.GetAsync("Counter/node");
string JsTxt = firebaseResponse.Body;
if (JsTxt == "null")
{
return;
}
dynamic data = JsonConvert.DeserializeObject<dynamic>(JsTxt);
var list = new List<XClass>();
foreach (var itemDynamic in data)
{
list.Add(JsonConvert.DeserializeObject<XClass>(
((JProperty)itemDynamic).Value.ToString()));
}
// Now you have a list you can loop through and show in any suitable visual control
foreach (XClass _Xcls in list)
{
Invoke((MethodInvoker)delegate {
DataGridViewRow row = (DataGridViewRow)dg.Rows[0].Clone();
row.Cells[0].Value = _Xcls...
row.Cells[1].Value = _Xcls...
row.Cells[2].Value = _Xcls...
......
dg.Rows.Insert(0, row);
});
}
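Because the question binds the grid to a DataTable (dtGProductsList.DataSource = dataTable), another option is to add one row to the DataTable per retrieved item and let the bound grid display it; a data-bound DataGridView does not accept rows added directly to its Rows collection. The sketch below is not the approach from the answer above. It only shows the DataTable side, so it runs without WinForms or FireSharp, and Food is a hypothetical mirror of the question's Foods class with made-up values.

```csharp
using System;
using System.Data;

public class Food // hypothetical mirror of the question's Foods class
{
    public string MenuId { get; set; } = "";
    public string Name { get; set; } = "";
    public string Image { get; set; } = "";
    public string Price { get; set; } = "";
    public string Discount { get; set; } = "";
    public string Description { get; set; } = "";
}

public static class GridData
{
    // Same column order as the grid described in the question.
    public static DataTable CreateTable()
    {
        var table = new DataTable();
        foreach (var name in new[] { "ID", "Name", "Image", "Price", "Discount", "Description" })
            table.Columns.Add(name);
        return table;
    }

    // Each call appends a new row instead of overwriting Rows[0].
    public static void AddFood(DataTable table, Food food) =>
        table.Rows.Add(food.MenuId, food.Name, food.Image, food.Price, food.Discount, food.Description);

    public static void Main()
    {
        var table = CreateTable();
        AddFood(table, new Food { MenuId = "01", Name = "Soup", Price = "5" });
        AddFood(table, new Food { MenuId = "02", Name = "Salad", Price = "7" });

        Console.WriteLine(table.Rows.Count);      // 2
        Console.WriteLine(table.Rows[1]["Name"]); // Salad
    }
}
```

In the question's loop, the AddFood call would take the place of the six assignments to Rows[0].Cells[x].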

C# ClosedXML assign values from cells in a specific row to string

I'm using ClosedXML elsewhere in my script where I'm iterating through every row like this and it works.
var workbook = new XLWorkbook(ObjectRepPath);
var rows = workbook.Worksheet(1).RangeUsed().RowsUsed().Skip(1);
foreach (var row in rows)
{
objPage = row.Cell(1).GetString();
objElement = row.Cell(2).GetString();
if (objPage == page && objElement == element)
{
locType = row.Cell(3).GetString();
locParm = row.Cell(4).GetString();
}
}
After that I need to pull the data from the cells in a randomly selected row. Here's what I've got so far, which is not working...
var workbook = new XLWorkbook(extFile);
var ws = workbook.Worksheets.Add("Cell Values");
var rnd = new Random();
int rowNum = rnd.Next(2, workbook.Worksheet(1).RangeUsed().RowsUsed().Count());
var dataRow = ws.Row(rowNum);
string dangit = dataRow.Cell(1).GetString();
System.Diagnostics.Debug.WriteLine("Why is this dang thing not working... " + dangit);
Output: Why is this dang thing not working...
It just comes back empty. No error. Does anyone see something I don't?
Alright, I found the solution.
I changed the line ...
var ws = workbook.Worksheets.Add("Cell Values");
to ....
var ws = workbook.Worksheet(1);
and now this works ....
Storage.StreetAddress = ws.Cell(xlRow, 1).GetString();
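A small self-contained sketch of the working version follows. It builds a workbook in memory so that it needs no file on disk; the sheet name, the sample values, and the variable names are assumptions made for illustration. It also adds 1 to the upper bound of Random.Next, because that bound is exclusive: assuming the used range starts at row 1, rnd.Next(2, rowCount) as written in the question would never pick the last used row.

```csharp
using System;
using ClosedXML.Excel; // NuGet package: ClosedXML

public static class RandomRowDemo
{
    public static void Main()
    {
        // In-memory workbook standing in for the file opened in the question.
        using var workbook = new XLWorkbook();
        var ws = workbook.Worksheets.Add("Data");
        ws.Cell(1, 1).SetValue("Street"); // header row
        ws.Cell(2, 1).SetValue("1 Main St");
        ws.Cell(3, 1).SetValue("2 Oak Ave");
        ws.Cell(4, 1).SetValue("3 Pine Rd");

        // Read from the sheet that holds the data, not from a newly added empty one.
        // LastRowUsed() returns null on an empty sheet; this sheet is known to have data.
        int lastRow = ws.LastRowUsed().RowNumber();

        var rnd = new Random();
        int rowNum = rnd.Next(2, lastRow + 1); // rows 2..lastRow; the upper bound is exclusive

        string street = ws.Cell(rowNum, 1).GetString();
        Console.WriteLine($"Row {rowNum}: {street}");
    }
}
```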
