Parsing HTML in C# having a hard time with GetByteArrayAsync

public BarchartParser()
{
// Initialize list
StockSymbols = new List<string>();
// Add items
ParseBarchart();
}
The above is the constructor, which calls this method:
private async void ParseBarchart()
{
try
{
#region Get Html Document
// Get response from site
HttpClient http = new HttpClient();
var response = await http.GetByteArrayAsync(BARCHART_WEBSITE);
/* Break or W/e happens on this line ^^^ */
// Encode html response to UTF-8
string source = Encoding.GetEncoding(BARCHART_ENCODING)
.GetString(response, 0, response.Length - 1);
// Get html
HtmlDocument document = new HtmlDocument();
document.LoadHtml(source);
#endregion
#region Get Data From Table
// Get table containing stock info
HtmlNode table = document.DocumentNode.Descendants()
.Single<HtmlNode>
(
x => (x.Name == "table") &&
(x.Attributes["class"] != null) &&
(x.Attributes["class"].Value.Equals("datatable ajax")) &&
(x.Attributes["id"].Value.Equals("dt1"))
);
// Get 'tbody' element from table
HtmlNode tbody = table.Descendants("tbody").FirstOrDefault();
// Get all rows from the table
List<HtmlNode> allStocks = tbody.Descendants("tr").ToList();
// For each row, id is "td1_X" where X is the symbol of the stock
foreach (HtmlNode row in allStocks)
{
StockSymbols.Add(row.Attributes["id"].Value.ToString()
.Split(new char[] { '_' })[1]);
}
#endregion
}
catch
{
StockSymbols = new List<string>();
StockSymbols.Add("this didn't work");
}
}
And the code from a simple form application that uses this:
BarchartParser barchartData;
public Form1()
{
InitializeComponent();
barchartData = new BarchartParser();
}
private void Form1_Load(object sender, EventArgs e)
{
if (barchartData.StockSymbols != null && barchartData.StockSymbols.Count > 0)
MessageBox.Show(barchartData.StockSymbols[0]);
else
MessageBox.Show("barchartData.StockSymbols is null or count == 0");
this.Close();
}
I'm not exactly sure what's going on here. It worked the one time I debugged it and then it stopped working.
This code is part of a method that is called from a constructor. When this throw, or whatever it is, happens, execution just continues to the next breakpoint that I set in debug mode. Does anyone have a clue what may be causing this?
Edit: I know it's not a throw because the code in the catch block doesn't run. Execution simply moves on.
For context, I'm following this guide: https://code.msdn.microsoft.com/Parsing-Html-using-C-721be358/sourcecode?fileId=122353&pathId=1834557721

You are not awaiting the async method, so only the synchronous portion of the method (basically everything up to the first real await) is executed as part of your constructor call. At that await, control returns to the constructor, which is why the debugger appears to jump to your next breakpoint. The rest of the method runs later, after the constructor has already returned, and because the method is async void an exception that escapes it can bring the process down.
Generally you can't call async methods from a constructor without a good chance of deadlock if you try to call .Result or .Wait() (see "await vs Task.Wait - Deadlock?"). As an option, you can see whether the approach in "Fire-and-forget with async vs 'old async delegate'" works for your case.
The proper fix is to move the async operation out of the synchronous method (such as the constructor) into an explicit async method and await it from the caller.
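One way to read "an explicit async method" is an async factory. The following is only a minimal sketch under assumptions: CreateAsync, ParseBarchartAsync and the fetchHtml delegate are hypothetical names that do not appear in the question, and the HtmlAgilityPack parsing is replaced by a stand-in that treats each line of the input as a row id such as "td1_AAPL".

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class BarchartParser
{
    public List<string> StockSymbols { get; } = new List<string>();

    // The constructor does no async work at all.
    private BarchartParser() { }

    // Hypothetical async factory: callers await this instead of calling "new".
    // fetchHtml is an assumed delegate that would wrap the HttpClient call.
    public static async Task<BarchartParser> CreateAsync(Func<Task<string>> fetchHtml)
    {
        var parser = new BarchartParser();
        await parser.ParseBarchartAsync(fetchHtml);
        return parser;
    }

    // Returns Task (not void), so the caller can await it and observe exceptions.
    private async Task ParseBarchartAsync(Func<Task<string>> fetchHtml)
    {
        string source = await fetchHtml();

        // Stand-in for the HTML parsing in the question:
        // each non-empty line is a row id of the form "td1_X".
        foreach (string id in source.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            StockSymbols.Add(id.Trim().Split('_')[1]);
        }
    }
}
```

In the form, this would probably mean making Form1_Load an async event handler and assigning barchartData = await BarchartParser.CreateAsync(...) there, so that StockSymbols is only read after parsing has finished.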
Hacky fix (with likely deadlock):
public BarchartParser()
{
...
// Only compiles if ParseBarchart returns Task instead of void
ParseBarchart().Wait();
}

When an await statement throws an exception, only a try block can catch it. I suggest you add a try-catch or try-finally to catch the exception and handle it properly.

Related

C# TPL: Possible to restart a failed Pipeline at an arbitrary step?

I have a data processing job that consists of about 20 sequential steps. The steps all fall under one of three categories:
do some file manipulation
import / export data from a database
make a call to a 3rd party web API
I've refactored the code from one long, awful looking method to a pipeline pattern, using examples here and here. All of the steps are TransformBlock, such as
var stepThirteenPostToWebApi = new TransformBlock<FileInfo, System.Guid>(async csv =>
{
dynamic task = await ApiUtils.SubmitData(csv.FullName);
return task.guid;
});
The code works most of the time, but occasionally a step in the pipeline fails for whatever reason - let's say a corrupt file can't be read in step 6 of 20 (just an example - any step could fail). The pipeline stops running further tasks, as it should.
However, the 3rd party web API introduces a challenge - we are charged for each job we initiate whether we execute all 20 steps or just the first one.
I would like to be able to fix whatever went wrong in the problem step (again, for our example let's say I fix the corrupt file in step 6 of 20), then pick back up at step 6. The 3rd party web API has a GUID for each job, and is asynchronous, so that should be fine - after the problem is fixed, it will happily let a job resume with remaining steps.
My question: Is it possible (and if so advisable?) to design a pipeline that could begin at any step, assuming the pre-requisites for that step were valid?
It would look something like:
job fails on step 6 and logs step 5 as the last successful step
a human comes along and fixes whatever caused step 6 to fail
a new pipeline is started at step 6
I realize a brute-force way would be to have StartAtStep2(), StartAtStep3(), StartAtStep4() methods. That doesn't seem like a good design, but I'm a bit new at this pattern so maybe that's acceptable.

The brute-force way is not that bad. For example, your code above would just need:
bool StartAtStepThirteen(FileInfo csv)
{
return stepThirteenPostToWebApi.Post(csv);
}
The setup of the chain should be a separate method from the execution of the chain. You should save stepThirteenPostToWebApi in a class-level variable in a class that represents the entire chain; the setup of the chain could be done in the class's constructor.
Here is a simple 3-step version of the process. When an error happens, instead of faulting the task chain I log the error and pass null along the chain for invalid entries. You could make that log method raise an event, and then the user can decide what to do with the bad entry.
public class WorkChain
{
private readonly TransformBlock<string, FileInfo> stepOneGetFileInfo;
private readonly TransformBlock<FileInfo, System.Guid?> stepTwoPostToWebApi;
private readonly ActionBlock<System.Guid?> stepThreeDisplayIdToUser;
public WorkChain()
{
stepOneGetFileInfo = new TransformBlock<string, FileInfo>(new Func<string, FileInfo>(GetFileInfo));
stepTwoPostToWebApi = new TransformBlock<FileInfo, System.Guid?>(new Func<FileInfo, Task<Guid?>>(PostToWebApi));
stepThreeDisplayIdToUser = new ActionBlock<System.Guid?>(new Action<Guid?>(DisplayIdToUser));
stepOneGetFileInfo.LinkTo(stepTwoPostToWebApi, new DataflowLinkOptions() {PropagateCompletion = true});
stepTwoPostToWebApi.LinkTo(stepThreeDisplayIdToUser, new DataflowLinkOptions() {PropagateCompletion = true});
}
public void PostToStepOne(string path)
{
bool result = stepOneGetFileInfo.Post(path);
if (!result)
{
throw new InvalidOperationException("Failed to post to stepOneGetFileInfo");
}
}
public void PostToStepTwo(FileInfo csv)
{
bool result = stepTwoPostToWebApi.Post(csv);
if (!result)
{
throw new InvalidOperationException("Failed to post to stepTwoPostToWebApi");
}
}
public void PostToStepThree(Guid id)
{
bool result = stepThreeDisplayIdToUser.Post(id);
if (!result)
{
throw new InvalidOperationException("Failed to post to stepThreeDisplayIdToUser");
}
}
public void CompleteAdding()
{
stepOneGetFileInfo.Complete();
}
public Task Completion { get { return stepThreeDisplayIdToUser.Completion; } }
private FileInfo GetFileInfo(string path)
{
try
{
return new FileInfo(path);
}
catch (Exception ex)
{
LogGetFileInfoError(ex, path);
return null;
}
}
private async Task<Guid?> PostToWebApi(FileInfo csv)
{
if (csv == null)
return null;
try
{
dynamic task = await ApiUtils.SubmitData(csv.FullName);
return task.guid;
}
catch (Exception ex)
{
LogPostToWebApiError(ex, csv);
return null;
}
}
private void DisplayIdToUser(Guid? obj)
{
if(obj == null)
return;
Console.WriteLine(obj.Value);
}
}
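To illustrate the idea of resuming at a later step, here is a small self-contained sketch that is separate from WorkChain. It assumes the System.Threading.Tasks.Dataflow library is available; ResumeDemo, RunFromStepTwoAsync and the three toy steps are made-up names for illustration only. The chain is wired up once, and the caller then posts directly into the second block, skipping the first.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ResumeDemo
{
    // Builds a three-step chain, then starts at step two with a value
    // that a previous (failed) run is assumed to have already produced.
    public static async Task<List<string>> RunFromStepTwoAsync(int stepOneResult)
    {
        var log = new List<string>();

        var stepOne = new TransformBlock<string, int>(s => int.Parse(s));
        var stepTwo = new TransformBlock<int, int>(n => { log.Add("two"); return n * 2; });
        var stepThree = new ActionBlock<int>(n => log.Add("three:" + n));

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        stepOne.LinkTo(stepTwo, link);
        stepTwo.LinkTo(stepThree, link);

        // Resume: skip step one and post straight into step two.
        if (!stepTwo.Post(stepOneResult))
        {
            throw new InvalidOperationException("Failed to post to stepTwo");
        }

        // Completing the head lets completion propagate down the chain
        // once the already-accepted item has been processed.
        stepOne.Complete();
        await stepThree.Completion;
        return log;
    }
}
```

Calling RunFromStepTwoAsync(21) runs only steps two and three, so the log contains "two" followed by "three:42".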

GET Request dont save data

I'm trying to save some data from a GET request. I use StartCoroutine to make the request and a lambda expression to save the data.
My code is this:
using UnityEngine;
using System.Collections;
public class Test : MonoBehaviour {
// Use this for initialization
public void Start () {
string url1 = "http://localhost/virtualTV/query/?risorsa=";
string ciao = "http://desktop-pqb3a65:8080/marmotta/resource/ef299b79-35f2-4942-a33b-7e4d7b7cbfb5";
url1 = url1 + ciao;
WWW www1 = new WWW(url1);
var main=new JSONObject(JSONObject.Type.OBJECT);
var final= new JSONObject(JSONObject.Type.OBJECT);
StartCoroutine(firstParsing((value)=>{main = value;
final= main.Copy();
Debug.Log(main);
}));
Debug.Log(final);
}
public IEnumerator firstParsing( System.Action<JSONObject> callback)
{
string url2 = "http://localhost/virtualTV/FirstQuery/?risorsa=";
string ciao = "http://desktop-pqb3a65:8080/marmotta/resource/ef299b79-35f2-4942-a33b-7e4d7b7cbfb5";
url2 = url2 + ciao;
WWW www2 = new WWW(url2);
yield return www2;
string json = www2.text;
//Parsing del json con creazione di un array
var firstjson = new JSONObject(json);
var tempVideo = new JSONObject(JSONObject.Type.OBJECT);
var array2 = new JSONObject(JSONObject.Type.OBJECT);
tempVideo.AddField ("id", firstjson.GetField ("id"));
tempVideo.AddField ("type", firstjson.GetField ("type"));
tempVideo.AddField ("url", firstjson.GetField ("url"));
array2.Add (tempVideo);
yield return array2;
callback (array2);
Debug.Log ("First Run" + array2);
}
When I try to use final after the statement
final = main.Copy()
it is empty. Can you help me save the value in the variable final? Thanks, all.

A coroutine's execution is spread across many frames. When a coroutine reaches a yield return statement, control returns to the calling method, which carries on and finishes executing; the coroutine only resumes on a later frame, once the yielded operation has finished.
In your case, the Debug.Log(final) statement in Start executes as soon as yield return www2; in firstParsing is executed. The callback hasn't been called yet which is why final is empty.
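The ordering can be reproduced without Unity. In this plain C# sketch, the hand-written MoveNext calls stand in for Unity's coroutine scheduler, and CoroutineOrderDemo, Run and the log strings are invented for illustration; it is not Unity API.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CoroutineOrderDemo
{
    static IEnumerator FirstParsing(Action<string> callback, List<string> log)
    {
        log.Add("coroutine started");
        yield return null;            // stands in for "yield return www2"
        callback("parsed json");      // only runs when the coroutine is resumed
        log.Add("coroutine finished");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        string final = null;
        IEnumerator routine = FirstParsing(value => final = value, log);

        // Like StartCoroutine: run the coroutine up to its first yield, then return.
        routine.MoveNext();
        log.Add("after StartCoroutine: final is " + (final ?? "null"));

        // Like later frames: the scheduler resumes the coroutine until it ends.
        while (routine.MoveNext()) { }
        log.Add("later: final is " + final);
        return log;
    }
}
```

The log records "final is null" right after the simulated StartCoroutine call and "final is parsed json" only after the coroutine has been resumed, which mirrors what happens with Debug.Log(final) in Start.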
To access final outside the callback after it has been assigned, set a bool flag to true once final is assigned in the callback. Something like this:
StartCoroutine(firstParsing((value)=>{main = value;
final= main.Copy();
Debug.Log(main);
isFinalAssigned = true;
}));
// In another method
if(isFinalAssigned)
{
// Access final
}
Note that the if statement above is only useful in a method that is called periodically, like Update. If you're accessing final in a method that is called only once (like OnEnable), you will have to wait for final to be assigned. You can use another coroutine for this, like
IEnumerator DoSomethingWithFinal()
{
while(!isFinalAssigned)
yield return null; // Wait for next frame
// Do something with final
}
The easiest way out is to consume (access) final in your callback.
EDIT2: From your comments, you can do something like the following. You will have to use coroutines, because blocking the main game thread is not a good idea.
private JSONObject final = null; // Make final a field
Wherever you use final, you have two options.
Use a null check if(final == null) return; This can be impractical.
Wait for final to be assigned in a coroutine and do something as a callback. This is the only way you can do what you want cleanly.
Look below for the implementation.
// Calls callback after final has been assigned
IEnumerator WaitForFinal(System.Action callback)
{
while(final == null)
yield return null; // Wait for next frame
callback();
}
// This whole method depends on final.
// This should be similar to your method set up if you have
// good coding standards (not very long methods, each method does only 1 thing)
void MethodThatUsesFinal()
{
if (final == null)
{
// Waits till final is assigned and calls this method again
StartCoroutine(WaitForFinal(MethodThatUsesFinal));
return;
}
// use final
}

Recursive Function Calls Throw StackOverFlowException

I have to call a function recursively, but after a while it throws a StackOverflowException. When I use Invoke(new Action(Start)) instead, it throws the same exception, only sooner.
How can I overcome this problem?
Example Code:
private void Start()
{
// run select query
mysql(selectQueryString.ToString());
msdr = mysql();
// is finished
if (!msdr.HasRows)
{
this.Finish();
return;
}
// get mysql fields
string[] mysqlFields = Common.GetFields(ref msdr);
while (msdr.Read())
{
// set lastSelectID
lastSelectID = Convert.ToInt32(msdr[idFieldName].ToString());
// fill mssql stored procedure parameters
for (int i = 0; i < matchTable.Count; i++)
{
string valueToAdd = Common.ConvertToEqualivantString(matchTable[i].Type, matchTable[i].Value, ref msdr, ref id, matchTable[i].Parameters);
sql.Ekle(matchTable[i].Key, valueToAdd);
}
// execute adding operation
lastInsertID = (int)sql(false);
// update status bar
this.UpdateStatusBar();
// update menues
this.UpdateMenues();
// increment id for "{id}" statement
id++;
}
// close data reader
msdr.Close();
msdr.Dispose();
mysql.DisposeCommand();
// increment select limit
selectQueryString.LimitFirst += selectQueryString.LimitLast;
// call itself until finish
this.Start();
}

When the last statement in a function is a call to the function itself, you have tail recursion. Some languages guarantee that tail recursion is optimized so that it cannot overflow the stack; C# makes no such guarantee, so every call to Start adds a stack frame until the stack runs out.
Recursion is not a good pattern for data that can be of an arbitrary length. Simply replace recursion by a while loop:
private void Start()
{
while(true) {
// run select query
mysql(selectQueryString.ToString());
msdr = mysql();
// is finished
if (!msdr.HasRows)
{
this.Finish();
break;
}
// rest of your code..
}
}
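The same shape as a self-contained sketch: the paging state (offset) lives in a local variable that the loop advances, so the stack depth stays constant no matter how many pages there are. PagingLoop, ProcessAllPages and the fetchPage delegate are hypothetical stand-ins for the MySQL query with its LimitFirst/LimitLast values.

```csharp
using System;
using System.Collections.Generic;

public static class PagingLoop
{
    // fetchPage(offset, pageSize) is an assumed stand-in for running the
    // select query with a LIMIT clause; it returns an empty list when done.
    public static int ProcessAllPages(Func<int, int, List<int>> fetchPage, int pageSize)
    {
        int offset = 0;
        int processed = 0;

        while (true)
        {
            List<int> rows = fetchPage(offset, pageSize);
            if (rows.Count == 0)
            {
                break; // no more rows: finished
            }

            processed += rows.Count; // stand-in for inserting each row
            offset += pageSize;      // advance the select limit, then loop again
        }

        return processed;
    }
}
```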

Calling C# ASMX Web Service

I have an ASMX web service that I need to use as part of a piece of work. I am calling this service from an ASPX page to create new entities on a 3rd party system. I have no access to the underlying code of that service; it simply allows me to communicate with another system.
I'm having trouble finding out whether I am calling the service correctly, and I wonder if anyone could offer some advice.
I have installed the ASMX page, which has given me a class 'ConfirmConnector', on which I call the BeginProcessOperations method. I want to wait for that to return and then parse the results. The results should be XML, which I then step through to get the data I am after.
The trouble is that sometimes this process just dies on me: when I call my EndProcessOperations method, nothing happens. I don't get an error or anything else; my code just dies and the method returns.
My calling code is:
private void sendConfirmRequest(XmlManipulator requestXML)
{
file.WriteLine("Sending CONFIRM Request!");
AsyncCallback callBack = new AsyncCallback(processConfirmXML); // assign the callback method for this call
IAsyncResult r = conn.BeginProcessOperations(requestXML, callBack, AsyncState);
System.Threading.WaitHandle[] waitHandle = { r.AsyncWaitHandle }; // set up a wait handle so that the process doesnt automatically return to the ASPX page
System.Threading.WaitHandle.WaitAll(waitHandle, -1);
}
My handler code is :
/*
* Process the response XML from the CONFIRM Connector
*/
private static void processConfirmXML(IAsyncResult result)
{
try
{
file.WriteLine("Received Response from CONFIRM!");
if(result == null)
{
file.WriteLine("RESPONSE is null!!");
}
if(conn == null)
{
file.WriteLine("conn is null!!");
}
file.WriteLine("Is Completed : " + result.IsCompleted);
XmlNode root = conn.EndProcessOperations(result);
file.WriteLine("got return XML");
//writeXMLToFile("C:/response.xml",root.InnerXml);
file.WriteLine(root.InnerXml);
Can anyone advise whether I am handling this in the correct way, and does anyone have any idea why my code randomly bombs after this line in the handler:
XmlNode root = conn.EndProcessOperations(result);
Thanks for your help,
Paul

Thanks for looking, but I solved my problem. The issue appeared to be related to my callback operation.
I changed the code to call my begin and end methods in the same block of code, and I haven't had an issue since then.
private void sendConfirmRequest(XmlManipulator requestXML)
{
//ConfirmConnector conn = new ConfirmConnector();
file.WriteLine("Sending CONFIRM Request!");
//AsyncCallback callBack = new AsyncCallback(processConfirmXML); // assign the callback method for this call
//IAsyncResult r = conn.BeginProcessOperations(requestXML, callBack, AsyncState);
//System.Threading.WaitHandle[] waitHandle = { r.AsyncWaitHandle }; // set up a wait handle so that the process doesnt automatically return to the ASPX page
//System.Threading.WaitHandle.WaitAll(waitHandle, -1);
file.WriteLine("Calling BeginProcessOperations");
IAsyncResult result = conn.BeginProcessOperations(requestXML, null, null);
// Wait for the WaitHandle to become signaled.
result.AsyncWaitHandle.WaitOne();
file.WriteLine("Calling EndProcessOperations");
XmlNode root = conn.EndProcessOperations(result);
processConfirmXML(root);
file.WriteLine("got return XML");
//writeXMLToFile("C:/response.xml",root.InnerXml);
file.WriteLine(root.InnerXml);
// Close the wait handle.
result.AsyncWaitHandle.Close();
}
Thanks
Paul

Recursive Async HttpWebRequests

Suppose I have the following class:
public class FooBar
{
List<Items> _items = new List<Items>();
public List<Items> FetchItems(int parentItemId)
{
FetchSingleItem(parentItemId);
return _items;
}
private void FetchSingleItem(int itemId)
{
Uri url = new Uri(String.Format("http://SomeURL/{0}.xml", itemId));
HttpWebRequest webRequest = (HttpWebRequest)HttpWebRequest.Create(url);
webRequest.BeginGetResponse(ReceiveResponseCallback, webRequest);
}
void ReceiveResponseCallback(IAsyncResult result)
{
// End the call and extract the XML from the response and add item to list
_items.Add(itemFromXMLResponse);
// If this item is linked to another item then fetch that item
if (anotherItemIdExists == true)
{
FetchSingleItem(anotherItemId);
}
}
}
There could be any number of linked items that I will only know about at runtime.
What I want to do is make the initial call to FetchSingleItem and then wait until all calls have completed then return List<Items> to the calling code.
Could someone point me in the right direction? I'm more than happy to refactor the whole thing if need be (which I suspect will be the case!)

Getting the hang of asynchronous coding is not easy, especially when there is some sequential dependency between one operation and the next. This is exactly the sort of problem that I wrote the AsyncOperationService to handle; it's a cunningly short bit of code.
First, a little light reading for you: Simple Asynchronous Operation Runner – Part 2. By all means read part 1, but it's a bit heavier than I had intended. All you really need is the AsyncOperationService code from it.
Now, in your case you would convert your fetch code to something like the following.
private IEnumerable<AsyncOperation> FetchItems(int startId)
{
XDocument itemDoc = null;
int currentId = startId;
while (currentId != 0)
{
yield return DownloadString(new Uri(String.Format("http://SomeURL/{0}.xml", currentId), UriKind.Absolute),
itemXml => itemDoc = XDocument.Parse(itemXml) );
// Do stuff with itemDoc like creating your item and placing it in the list.
// Assign the next linked ID to currentId or if no other items assign 0
}
}
Note the blog also has an implementation of DownloadString which in turn uses WebClient which simplifies things. However the principles still apply if for some reason you must stick with HttpWebRequest. (Let me know if you are having trouble creating an AsyncOperation for this)
You would then use this code like this:-
int startId = GetSomeIDToStartWith();
FooBar myFoo = new FooBar();
myFoo.FetchItems(startId).Run((err) =>
{
// Clear IsBusy
if (err == null)
{
// All items are now fetched continue doing stuff here.
}
else
{
// "Oops something bad happened" code here
}
});
// Set IsBusy
Note that the call to Run is asynchronous, code execution will appear to jump past it before all the items are fetched. If the UI is useless to the user or even dangerous then you need to block it in a friendly way. The best way (IMO) to do this is with the BusyIndicator control from the toolkit, setting its IsBusy property after the call to Run and clearing it in the Run callback.

All you need is a thread sync thingy. I chose ManualResetEvent.
However, I don't see the point of using asynchronous IO since you always wait for the request to finish before starting a new one. But the example might not show the whole story?
public class FooBar
{
private ManualResetEvent _completedEvent = new ManualResetEvent(false);
List<Items> _items = new List<Items>();
public List<Items> FetchItems(int parentItemId)
{
FetchSingleItem(parentItemId);
// Block the caller until the last callback signals completion
_completedEvent.WaitOne();
return _items;
}
private void FetchSingleItem(int itemId)
{
Uri url = new Uri(String.Format("http://SomeURL/{0}.xml", itemId));
HttpWebRequest webRequest = (HttpWebRequest)HttpWebRequest.Create(url);
webRequest.BeginGetResponse(ReceiveResponseCallback, webRequest);
}
void ReceiveResponseCallback(IAsyncResult result)
{
// End the call and extract the XML from the response and add item to list
_items.Add(itemFromXMLResponse);
// If this item is linked to another item then fetch that item
if (anotherItemIdExists == true)
{
FetchSingleItem(anotherItemId);
}
else
_completedEvent.Set();
}
}
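On platforms where async/await is available, the same "follow the chain until there is no linked item" logic can be written as an ordinary loop. This is a different technique from either answer above, sketched under assumptions: Item, NextId, ItemFetcher and the fetchSingleItem delegate are hypothetical and stand in for the HTTP request and XML parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical item shape: NextId is null when there is no linked item.
public class Item
{
    public int Id { get; set; }
    public int? NextId { get; set; }
}

public static class ItemFetcher
{
    // fetchSingleItem is an assumed delegate wrapping the request for one item.
    public static async Task<List<Item>> FetchItemsAsync(
        int startId, Func<int, Task<Item>> fetchSingleItem)
    {
        var items = new List<Item>();
        int? currentId = startId;

        // Each request is awaited before the next starts, so the list is
        // complete (and only touched by one request at a time) when returned.
        while (currentId.HasValue)
        {
            Item item = await fetchSingleItem(currentId.Value);
            items.Add(item);
            currentId = item.NextId;
        }

        return items;
    }
}
```

The caller awaits FetchItemsAsync and receives the list only after every linked item has been fetched, without blocking a thread on a wait handle.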
