How to get raw page source (not generated source) from c#


The goal is to get the raw source of the page, that is, without running any scripts or letting the browser reformat the page at all. For example, if the source in the response is <table><tr></table>, I don't want to get <table><tbody><tr></tr></tbody></table>. How can I do this from C# code?
More info: for example, typing "view-source:http://feeds.gawker.com/kotaku/full" in the browser's address bar gives you an XML file, but if you just open "http://feeds.gawker.com/kotaku/full" the browser renders an HTML page. What I want is the XML file. I hope this is clear.

Here's one way, but it's not really clear what you actually want.
using (var wc = new WebClient())
{
    var source = wc.DownloadString("http://google.com");
}

If you mean when rendering your own page, you can get access to the raw page content using a ResponseFilter, or by overriding the page's Render method. I would question your motives for doing this, though.
Scripts run client-side, so they have no bearing on any C# code.
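
For the ResponseFilter route, a minimal sketch might look like the following. The class name CapturingFilter is hypothetical, and the sketch assumes classic ASP.NET, where HttpResponse.Filter accepts any writable Stream. The filter forwards every byte unchanged and keeps a copy for inspection.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical pass-through filter: forwards all bytes to the wrapped stream
// and keeps a copy so the raw output can be inspected afterwards.
public sealed class CapturingFilter : Stream
{
    private readonly Stream _inner;
    private readonly MemoryStream _copy = new MemoryStream();

    public CapturingFilter(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Raw output written so far, decoded as UTF-8 (assumes a UTF-8 response).
    public string CapturedText => Encoding.UTF8.GetString(_copy.ToArray());

    public override void Write(byte[] buffer, int offset, int count)
    {
        _copy.Write(buffer, offset, count);   // keep a copy
        _inner.Write(buffer, offset, count);  // pass through unchanged
    }

    public override void Flush() => _inner.Flush();

    // A response filter is write-only, so reading and seeking are not supported.
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

In a page or HTTP module it would presumably be installed with Response.Filter = new CapturingFilter(Response.Filter); before any output is written.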

You can use a tool such as Fiddler to see what is actually being sent over the wire.
disclaimer: I think Fiddler is amazing

Related

CefSharp possible to load html content?

I need to create an application which loads an HTML "template" file and fills it with current data values. So far no problem, but does anyone know how to load the resulting HTML into the CefSharp browser?
I found some old topics here mentioning a "loadHtml()" function, but this function isn't there anymore.
Thanks in advance

You need to add a using CefSharp; statement to your code to access the LoadHtml extension methods.
chromiumWebBrowser.LoadHtml(html);

const string html = "<html><head><title>Test</title></head><body><h1>Html Encoded in URL!</h1></body></html>";
var base64EncodedHtml = Convert.ToBase64String(Encoding.UTF8.GetBytes(html));
browser.Load("data:text/html;base64," + base64EncodedHtml);
From the project wiki on github: Loading HTML/CSS/JavaScript/etc from disk/database/embedded resource/stream

Access to content of a process information

I create an instance of IE with this code:
System.Diagnostics.Process p =
    System.Diagnostics.Process.Start("IEXPLORE.EXE",
        @"http://www.asnaf.ir/moreinfounit.php?sSdewfwo87kjLKH7624QAZMLLPIdyt75576rtffTfdef22de=1&iIkjkkewr782332ihdsfJHLKDSJKHWPQ397iuhdf87D3dffR=2009585&gGtkh87KJg89jhhJG75gjhu64HGKvuttt87guyr6e67JHGVt=117&cCli986gjdfJK755jh87KJ87hgf9871g00113kjJIZAEQ798=0a26e8ea07358781d128aa4bc98dd89a");
I want to get the contents of the opened window. Is it possible to read the HTML content by this process?

Use the following code:
using (var client = new WebClient())
{
    string result = client.DownloadString("http://www.asnaf.ir/moreinfounit.php?sSdewfwo87kjLKH7624QAZMLLPIdyt75576rtffTfdef22de=1&iIkjkkewr782332ihdsfJHLKDSJKHWPQ397iuhdf87D3dffR=2009585&gGtkh87KJg89jhhJG75gjhu64HGKvuttt87guyr6e67JHGVt=117&cCli986gjdfJK755jh87KJ87hgf9871g00113kjJIZAEQ798=0a26e8ea07358781d128aa4bc98dd89a");
    // TODO: your logic here
}

No. Your processes run in different virtual address spaces. It would be a serious security vulnerability if you could read the memory allocated by another process.
Edit: Consider using something like a WebBrowser control in your original process. That way you could easily retrieve the page it displays.

It might be possible, but I'd actually use an HttpWebRequest to obtain the HTML content. If you really just want to get the HTML content for a given HTTP URL, using IE as a separate process is definitely not the way to go.

You should use WebClient class to retrieve web page content. Check this link:
http://msdn.microsoft.com/en-us/library/system.net.webclient(v=vs.80).aspx

Download content from the internet with code

I have to download some content from a website every day so I figure it will be nice to have a program that will do it... The problem is that the website requires authentication.
My current solution is by using System.Windows.Forms.WebBrowser control. I currently do something like:
/* Create browser */
System.Windows.Forms.WebBrowser browser = new System.Windows.Forms.WebBrowser();
/* navigate to desired site */
browser.Navigate("http://stackoverflow.com/");
// wait for browser to download dom
/* Get all tags of type input */
var elements = browser.Document.Body.GetElementsByTagName("input");
/* let's look for the one we are interested in */
foreach (System.Windows.Forms.HtmlElement curInput in elements)
{
    if (curInput.GetAttribute("name") == "q")
    {
        curInput.SetAttribute("value", "I changed the value of this input");
        break;
    }
}
// etc
I think this approach works, but it is not the best solution. I have tried to use the WebClient class, which looked like it should work, but for some reason it does not. I believe the reason is that I have to save the cookies?
So my question is: how can I track all the bytes that get sent to the server and all the bytes that come back, in order to download what I need? In other words, I would like the WebClient to act like a web browser, and once I get to the page I need, I should be able to parse the data I need just by looking at the source.
I would appreciate it if someone could show me an example of how to do this. Google Chrome does a pretty good job of displaying lots of information:
Thanks in advance,
Antonio

Answering your question:
The best utility I know for tracking traffic is Fiddler (it's free).
For sending advanced HTTP requests, you should use the System.Net.HttpWebRequest class, which also has the CookieContainer and Headers properties, allowing you to do whatever you want.
Hope it helps.
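
As a rough sketch of the CookieContainer idea: if every request shares one container, cookies set by a login response are sent back on later requests, much as a browser would do. The CookieSession class, its member names, and the user agent string are hypothetical and only for illustration. Note that WebRequest.Create is marked obsolete in newer .NET versions, where HttpClientHandler.CookieContainer plays a similar role.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: all requests share one CookieContainer, so cookies
// received from one response are sent along with the following requests.
public sealed class CookieSession
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    // Creates a request that is wired to the shared cookie container.
    public HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;
        request.UserAgent = "Mozilla/5.0 (compatible; MyDownloader/1.0)"; // placeholder value
        return request;
    }

    // Performs a GET and returns the response body as text.
    public string Get(string url)
    {
        var request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A login would typically be a POST built with CreateRequest, followed by session.Get(...) calls for the pages to download. Fiddler can show which form fields and headers the real browser sends.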

Place Image from DLL onto a web page

I'm so stuck on something I thought would be easy.
I have a DLL that returns an Image object.
I just can't figure out how to display that image on a web page.
I've tried a few ways, and googled a million different variations.
Is it not possible to just bind an Image object to an element on the page, like an HtmlImage or a simple img?
Or do I need to convert the Image to a Stream? Or a Bitmap? I'm really stuck!
Any help appreciated.
V

With ASP.NET Web Forms, the easiest way is to create a custom .ashx file.
In Visual Studio, create a new custom handler (I'm not sure of the name of the template in Visual Studio). This will create an .ashx file.
In the code of this handler, write something like this (I don't have VS at hand to test the syntax):
// Requires: using System.Drawing; using System.Drawing.Imaging; using System.IO;
public void ProcessRequest(System.Web.HttpContext context)
{
    byte[] raw;
    using (var ms = new MemoryStream())
    {
        // GetFromDll() stands for the DLL call that returns the Image
        Image myImage = GetFromDll();
        myImage.Save(ms, ImageFormat.Png);
        raw = ms.ToArray();
    }
    context.Response.ContentType = "image/png";
    context.Response.BinaryWrite(raw);
}
Then, in your browser, navigate to http://yourserver/app/yourhandler.ashx.
If you want, you can add a URL parameter and read it from the Request.QueryString collection.

It's not as simple as binding. On the client side images are retrieved from the web server as a separate GET request, which means you have to have a URL that resolves to an image. The other option, as Asif suggested, is embedding your image in the HTML as a Base64 string, which is bad practice for shared images (see Steve B's comment).
You either have to provide a URL (a route that returns the image file in MVC, or a custom page with the proper content type and Response.Write in WebForms), or embed the image in the HTML.
EDIT:
There is also a third option involving custom HTTP handlers. These have the advantage of bypassing the app framework and serving the content almost directly off the web server, see MSDN.

Convert your image to a Base64 string and then set it as the src of the <img/> tag.
<img/> can display an image given as a Base64 string (a data URI).
Alternatively, you can save the image to disk and use its path in the <img/>.
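
A minimal sketch of the Base64 approach, assuming the image has already been encoded to bytes (for example PNG bytes obtained by saving the Image into a MemoryStream). The ImageDataUri class and its method names are hypothetical.

```csharp
using System;

public static class ImageDataUri
{
    // Builds a data URI from already-encoded image bytes.
    public static string ToDataUri(byte[] imageBytes, string mimeType)
    {
        if (imageBytes == null) throw new ArgumentNullException(nameof(imageBytes));
        return "data:" + mimeType + ";base64," + Convert.ToBase64String(imageBytes);
    }

    // Wraps the data URI in an <img> tag that can be written into the page.
    public static string ToImgTag(byte[] imageBytes, string mimeType)
    {
        return "<img src=\"" + ToDataUri(imageBytes, mimeType) + "\" alt=\"\" />";
    }
}
```

The resulting string can be assigned to the Src of an HtmlImage or written into the markup. Because the bytes travel inside the HTML, the browser cannot cache the image separately, which is why this suits small, page-specific images better than shared ones.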

Generate PDF from ASP.NET from raw HTML/CSS content?

I'm sending emails that have invoices attached as PDFs. I'm already - elsewhere in the application - creating the invoices in an .aspx page. I'd like to use Server.Execute to return the output HTML and generate a PDF from that. Otherwise, I'd have to use a reporting tool to "draw" the invoice on a PDF. That blows for lots of reasons, not the least of which is that I'd have to update both the .aspx page and the report for every minor change. What to do...

There is no way to generate a PDF from an HTML string directly within .NET, but there are a number of third-party controls that work well.
I've had success with this one: http://www.html-to-pdf.net
and this: http://www.htmltopdfasp.net
The important questions to ask are:
Does it render correctly as compared to the 3 major browsers: IE, FF and Safari/Chrome?
Does it handle CSS fine?
Does the control have its own rendering engine? If so, bounce it. You don't want to trust a home-grown rendering engine - the browsers have a hard enough time getting everything pixel perfect.
What dependencies does the third party control require? The fewer, the better.
There are a few others but they deal with ActiveX displays and such.

We use a product called ABCPDF for this and it works fantastic.
http://www.websupergoo.com/abcpdf-1.htm

This sounds like a job for Prince. It can take HTML and CSS and generate a PDF, which you can then present to your users. It supports CSS3 better than most web browsers (staff include Håkon Wium Lie, the inventor of CSS).
See the samples, especially the ones for Wikipedia pages, for the beautiful output it can generate. There's also an interesting Google Tech Talk with the authors.
Edit: There is a .NET wrapper available.

wkhtmltopdf is a free executable that generates PDF from HTML. It is written in C++, and NReco HtmlToPdf is a .NET wrapper library for it. I implemented this using the wrapper library and it worked very well: it does everything on its own, and you just need to give it HTML as the data source.
/// <summary>
/// Converts html into PDF using nReco dll and wkhtmltopdf.exe.
/// </summary>
private byte[] ConvertHtmlToPDF()
{
    HtmlToPdfConverter nRecohtmltoPdfObj = new HtmlToPdfConverter();
    nRecohtmltoPdfObj.Orientation = PageOrientation.Portrait;
    nRecohtmltoPdfObj.PageFooterHtml = CreatePDFFooter();
    nRecohtmltoPdfObj.CustomWkHtmlArgs = "--margin-top 35 --header-spacing 0 --margin-left 0 --margin-right 0";
    return nRecohtmltoPdfObj.GeneratePdf(CreatePDFScript() + ShowHtml() + "</body></html>");
}
The function above is an excerpt from the post linked below, which explains it in detail.
HTML to PDF in ASP.Net
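
If you would rather call wkhtmltopdf directly instead of going through a wrapper, a rough sketch could look like this. It assumes the executable is installed and that the HTML has been saved to a file first. The WkHtmlToPdf class, its method names, and the paths are hypothetical, and the --margin-top value is only an example taken from the arguments above.

```csharp
using System.Diagnostics;

public static class WkHtmlToPdf
{
    // Builds the command line: wkhtmltopdf [options] <input> <output>.
    public static ProcessStartInfo BuildStartInfo(string exePath, string inputHtmlPath, string outputPdfPath)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = "--margin-top 35 \"" + inputHtmlPath + "\" \"" + outputPdfPath + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Runs the converter and reports whether it exited with code 0.
    public static bool Run(string exePath, string inputHtmlPath, string outputPdfPath)
    {
        using (var process = Process.Start(BuildStartInfo(exePath, inputHtmlPath, outputPdfPath)))
        {
            process.WaitForExit();
            return process.ExitCode == 0;
        }
    }
}
```

A call might look like WkHtmlToPdf.Run(@"C:\tools\wkhtmltopdf.exe", "invoice.html", "invoice.pdf"), after which the PDF can be read from disk and attached to the email.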

The initial question is about converting another aspx page containing an invoice to a PDF document. The invoice is probably using some session data and the user suggests to use Server.Execute() to obtain the invoice page HTML code and then to convert that code to PDF. Converting the invoice page URL directly is not possible because a new session would be created during conversion and the session data would be lost.
This is actually a good technique to preserve session data during conversion which is applied in Convert a HTML Page to PDF in Same Session ASP.NET Demo of the EvoPdf library. The complete C# code to get the HTML string rendered by the invoice page and to convert that string to PDF is:
// Execute the invoice page and get the HTML string rendered by this page
TextWriter outTextWriter = new StringWriter();
Server.Execute("Invoice.aspx", outTextWriter);
string htmlStringToConvert = outTextWriter.ToString();
// Create a HTML to PDF converter object with default settings
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
// Use the current page URL as base URL
string baseUrl = HttpContext.Current.Request.Url.AbsoluteUri;
// Convert the page HTML string to a PDF document in a memory buffer
byte[] outPdfBuffer = htmlToPdfConverter.ConvertHtml(htmlStringToConvert, baseUrl);
// Send the PDF as response to browser
// Set response content type
Response.AddHeader("Content-Type", "application/pdf");
// Instruct the browser to open the PDF file as an attachment or inline
Response.AddHeader("Content-Disposition", String.Format("attachment; filename=Convert_Page_in_Same_Session.pdf; size={0}", outPdfBuffer.Length.ToString()));
// Write the PDF document buffer to HTTP response
Response.BinaryWrite(outPdfBuffer);
// End the HTTP response and stop the current page processing
Response.End();

As long as you can make sure to use proper XHTML, you could also use a product like Alt-Soft's Xml2PDF to convert XML (XHTML) into PDF by means of XSLT/XSL-FO.
It takes a bit of a learning curve to master, but it works very well once you've "got" it!
Marc

Since you are producing the answer, you can use a tool like Report.NET:
http://sourceforge.net/projects/report/
I disagree with the answers that say you cannot convert directly from output to PDF, however, as you can "re-call" the page, get the HTML as a stream, and convert it. I am not sure what tool you would want to use to do this, though. In other words, it is possible, but I am not sure it is worth it. PDF creation libraries like Report.NET are easier to use, even though they force you to duplicate some logic and offer no automagic conversion.
I have not tried this component, but I have heard good things about it from those who have. The model is more like HTML, but I am not sure you can simply send a rendered ASPX to it to create PDF:
http://www.websupergoo.com/abcpdf-8.htm

If you search Google for HTML to PDF software, you'll get a pile of results.
There are about 10 leaders, but most of them use IE DLLs in background mode.
Just a couple of them use their own parsing engine.
Please try the PDF Duo .NET component in your ASP.NET project if you wish to create a PDF programmatically.
It is a light component for generating PDF invoices, reports, etc.

I'd go a different route. Assuming you are using SQL Server, use SSRS and generate the PDF that way.

A possible minimal solution to use Server.Execute() to obtain the HTML of the invoice page and convert that code to a PDF using winnovative html to pdf api for .net is:
TextWriter outTextWriter = new StringWriter();
Server.Execute("Invoice.aspx", outTextWriter);
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
byte[] pdfBytes = htmlToPdfConverter.ConvertHtml(outTextWriter.ToString(),
    HttpContext.Current.Request.Url.AbsoluteUri);

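You can use PDFsharp or iTextSharp to create the PDF. Licensing differs: PDFsharp is open source under the MIT license, while iTextSharp 5.x is AGPL-licensed with a commercial option.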
