Parsing javascript HTML using HTMLAgilityPack - c#

I'm trying to parse the following HTML snippet using the HTML Agility Pack:
<body id="station_page" class="">
...
<div>....</div>
<script type="text/javascript">
if (Blablabla == undefined) { var Blablabla = {}; }
Blablabla .Data1= "I want this data";
Blablabla .BlablablaData =
{ "Data2":"I want this data",
"Blablabla":"",
"Blablabla":0 }
{ "Blablabla":123,
"Data3":"I want this data",
"Blablabla":123}
Blablabla .Data4= I want this data;
</script>...
I'm trying to get those four data values (Data1, Data2, Data3, Data4). First, I tried to find the script element:
doc.DocumentNode.SelectSingleNode("//script[@type='text/javascript']").InnerHtml
How can I check whether it's really the right script?
Once I have found the relevant script, how can I get those four data values (Data1, Data2, Data3, Data4)?

You can't parse JavaScript with the HTML Agility Pack; it only supports HTML parsing. You can get to the script you need with an XPath expression like this:
doc.DocumentNode.SelectSingleNode("//script[contains(text(), 'Blablabla')]").InnerHtml
But you'll need to parse the JavaScript itself with another method (regex, a JS grammar, etc.).
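As a rough illustration of the regex route, here is a minimal sketch that pulls quoted values out of the script text once the XPath query has returned it. The class and method names (ScriptDataExtractor, GetAssignedString, GetQuotedPair) are made up for this example, and the patterns assume the values are plain double-quoted strings with no escaped quotes inside. An unquoted value like Data4 in the question would need its own pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HTML Agility Pack.
public static class ScriptDataExtractor
{
    // Returns the string assigned as `objectName.property = "..."`, or null if not found.
    // Whitespace around the dot and the equals sign is tolerated.
    public static string GetAssignedString(string script, string objectName, string property)
    {
        var pattern = Regex.Escape(objectName) + @"\s*\.\s*" + Regex.Escape(property)
                    + @"\s*=\s*""(?<value>[^""]*)""";
        var match = Regex.Match(script, pattern);
        return match.Success ? match.Groups["value"].Value : null;
    }

    // Returns the value of the first `"key":"value"` pair, or null if not found.
    public static string GetQuotedPair(string script, string key)
    {
        var pattern = "\"" + Regex.Escape(key) + "\"\\s*:\\s*\"(?<value>[^\"]*)\"";
        var match = Regex.Match(script, pattern);
        return match.Success ? match.Groups["value"].Value : null;
    }
}
```

With the script from the question, GetAssignedString(script, "Blablabla", "Data1") would be used for Data1, and GetQuotedPair(script, "Data2") and GetQuotedPair(script, "Data3") for the values inside the object literals.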

Related

How to extract JSON embedded on a HTML page using C#

The JSON I wish to use is embedded on a HTML page. Within a tag on the page there is a statement:
<script>
jsonRAW = {... heaps of JSON... }
Is there a parser to extract this from HTML? I have looked at Json.NET, but it requires well-formed JSON as input.
You can try the HTML Agility Pack, which can be downloaded as a NuGet package.
After installing it, see this tutorial on how to use the HTML Agility Pack.
The link has more info, but in code it works like this:
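using System.Linq;
using HtmlAgilityPack;

var urlLink = "http://www.google.com/jsonPage"; // 1. URL of the page that embeds the JSON.
var web = new HtmlWeb();                        // 2. Create the HTML Agility Pack web client.
var doc = web.Load(urlLink);                    // 3. Download and parse the page.
if (doc.ParseErrors != null && doc.ParseErrors.Any()) {
    // 4. Deal with any parse errors here.
}
var scriptNode = doc.DocumentNode.SelectSingleNode("//script"); // 5. Access the DOM, e.g. the first <script> element.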
There are other things in between but this should get you started.
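One of the "things in between" is cutting the JSON object out of the script text before handing it to a JSON parser. The following is a minimal sketch of that step using brace matching. The class and method names (EmbeddedJson, ExtractObject) are invented for this example, and it assumes the embedded data is valid JSON with double-quoted strings, so that braces inside strings can be skipped reliably.

```csharp
using System;

// Hypothetical helper, not part of HTML Agility Pack or Json.NET.
public static class EmbeddedJson
{
    // Returns the text of the first balanced {...} object that follows `marker`,
    // or null if the marker or a balanced object is not found.
    public static string ExtractObject(string source, string marker)
    {
        int markerIndex = source.IndexOf(marker, StringComparison.Ordinal);
        if (markerIndex < 0) return null;

        int start = source.IndexOf('{', markerIndex + marker.Length);
        if (start < 0) return null;

        int depth = 0;
        bool inString = false; // true while inside a double-quoted string
        bool escaped = false;  // true right after a backslash inside a string

        for (int i = start; i < source.Length; i++)
        {
            char c = source[i];
            if (inString)
            {
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                continue; // braces inside strings do not count
            }
            if (c == '"') inString = true;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0)
                return source.Substring(start, i - start + 1);
        }
        return null; // braces never balanced
    }
}
```

The input would be the text of the script node found with SelectSingleNode, with "jsonRAW" as the marker. The returned string can then be passed to a JSON parser such as Json.NET.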

How do I inject javascript code via webview?

Basically, what I want to do is this, but I can't seem to find anything similar for the XAML WebView control. What I ultimately need to do is capture an incoming JSON file from the WebView. As it is, I get a bad request from the server and an unsupported file exception from the WebView. I thought about injecting JavaScript that would alert me, so that I could get the body of the incoming JSON and bypass all the errors.
There are two main things you can do:
Call functions programmatically
Inject any code by using the HTML string
Function Calling
You can use InvokeScript to call JavaScript functions.
If the web page contains a script such as:
<script lang="en-us" type="text/javascript">
function myFunction() {
alert("I am an alert box!");
}
</script>
Then in C# you can call:
MyWebview.InvokeScript("myFunction", null);
This executes the script function myFunction.
Injecting Text
If you download the HTML page and all other needed files (using the Windows HttpClient), you can inject any code by manipulating the HTML string and then navigating to that string.
Let's say you want to change the above script to add another function, "SayHelloWorld". Then you can:
Search the file for something you know will be there, such as: <script lang="en-us" type="text/javascript">
Using string manipulation, add the desired text, such as a function (but this can be anything)
Navigate to the string
The C# code:
string MyWebPageString = GetWebpageString(WebpageUri);
string ScriptTagString = "<script lang=\"en-us\" type=\"text/javascript\">";
int IndexOfScriptTag = MyWebPageString.IndexOf(ScriptTagString);
if (IndexOfScriptTag >= 0) // IndexOf returns -1 when the tag is not found
{
    int LengthOfScriptTag = ScriptTagString.Length;
    // The leading space separates the injected function from the opening tag.
    string InsertionScriptString = " function SayHelloWorld() { window.external.notify(\"Hello World!\");} ";
    // Insert directly after the opening <script> tag.
    MyWebPageString = MyWebPageString.Insert(IndexOfScriptTag + LengthOfScriptTag, InsertionScriptString);
}
MyWebview.NavigateToString(MyWebPageString);
As a result, the script in the page you navigate to will look like this:
<script lang="en-us" type="text/javascript"> function SayHelloWorld() { window.external.notify("Hello World!");}
function myFunction() {
alert("I am an alert box!");
}
</script>
Since the injection can be applied to any area, even the HTML, you should be able to figure something out.
Hope this helps. Good luck.
This answer was based on this MSDN blog

HTMLAgilitypack read html page info with ajax calls

I am using HtmlAgilityPack to read specific HTML elements from a specific URL.
The problem I am facing is that the content of one of the HTML tags is filled in by AJAX requests. How can I read it?
<div id="priceInfo"></div>
The code I use to read the URL is:
HtmlWeb _htmlWeb = new HtmlWeb();
HtmlAgilityPack.HtmlDocument _webDoc = _htmlWeb.Load(webUrl);
// HtmlNodeCollection _priceNode = Gets the node with id priceInfo
The content of this div is filled in by an AJAX request, and I want to read it after it has been filled. How can I do that?
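HtmlAgilityPack is meant to be used on the server side. It parses the HTML that the server returns and does not execute JavaScript, so content added later by an AJAX call never appears in the document it loads. From what you describe, you are trying to read a value that only exists on the client side.
You should look into using jQuery/JavaScript to read it once the AJAX call is done:
$.ajax({ /* ... */ })
  .done(function (result) {
    // handle the returned result...
    alert($("#priceInfo").html()); // show the contents of the div once it has been filled
  });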
http://api.jquery.com/jQuery.ajax/

Html Tag in textbox/input text

When I enter an HTML tag such as < b> or < test> (without the space after "<") in my TextBoxes and submit the form, I get this error:
Sys.WebForms.PageRequestManagerServerErrorException: An unknown error occurred while processing the request on the server.
The status code returned from the server was: 500
I don't want to set "ValidateRequest" to false, because of the security implications.
I thought of writing a JavaScript function that inserts a space after "<"; that might work, I guess.
Any ideas?
You can escape your input using javascript before posting it back.
See existing answers:
Escaping HTML strings with jQuery
Escape HTML using jQuery
Fastest method to escape HTML tags as HTML entities?
On the C# side, use HttpUtility.HtmlDecode(string) to decode the text again.
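A minimal sketch of the server-side half, assuming the client escaped the input before posting. It uses System.Net.WebUtility, which offers the same HtmlDecode/HtmlEncode operations as HttpUtility without a reference to System.Web. The class and method names (EscapedInput, Decode, EncodeForOutput) are invented for this example.

```csharp
using System;
using System.Net;

// Hypothetical helper showing the decode/encode round trip.
public static class EscapedInput
{
    // Turns the escaped text posted by the client back into the user's original input.
    public static string Decode(string posted)
    {
        return WebUtility.HtmlDecode(posted);
    }

    // Re-encodes the text before writing it back into a page,
    // so the tags are displayed as text instead of being rendered.
    public static string EncodeForOutput(string text)
    {
        return WebUtility.HtmlEncode(text);
    }
}
```

Decoding gives you the raw text the user typed, so it should be encoded again whenever it is written back into HTML.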
You can escape/unescape your html content using JavaScript (jQuery) as shown below:
<script>
function htmlEncode(value) {
return $('<div/>').text(value).html();
}
function htmlDecode(value) {
return $('<div/>').html(value).text();
}
</script>

How to Get JavaScript value From c# back end

Hi guys, I have a JavaScript function that creates a value. Is there any way for me to capture that value in my C# back-end code?
function GetKey(){ return Key; } //key is a combination value
Thank you
C# in backend
public String MyVariable;
MyVariable = "Some Value";
ASPX
<%=MyVariable %>
via ajax
You can put it into your ASPX page code and pass it to your JS (if you are using ASP.NET).
If Key is an object, you can serialize it to a JSON string and use a JSON parser to deserialize it. If it is a basic type, using <%= Key %> in your ASPX page is fine. (Note that you cannot put this code in a .js file.)
<script type="text/javascript">
var key = JSON.parse('<%=JsonConvert.SerializeObject(someObj.GetKey()) %>');
//now you can use key in your js logic
</script>
