Save an HTML page with all styles and images in C#?

How do I save an HTML page with all styles and images in C#? I need to make a programmatic implementation of a browser's 'Save' feature which doesn't rely on Internet Explorer (WebBrowser component).

I don't think this is very easy.
Download the HTML for the page using WebClient and write the text to an HTML file. Then use an HTML parser to find all linked images and save them in a subdirectory. Do the same for the CSS.
If you do not want to save all the images, you can just prepend the URL of the page to all relative image links. Also note that some URLs are not relative (they are already absolute), and you will have to handle those separately. And don't forget to scan the CSS files for linked images as well.
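The relative-versus-absolute link handling described above can be sketched with `System.Uri`, which resolves a link against a base address and leaves absolute links alone. This is only a rough illustration: the class and method names (`LinkResolver`, `ToAbsolute`) and the example.com addresses are made up, and a real page may also declare a `<base>` element that changes the base address.

```csharp
using System;

static class LinkResolver
{
    // Resolves a link found in a page against the page's own URL.
    // Relative links are combined with the base; absolute links come back unchanged.
    public static string ToAbsolute(string pageUrl, string link)
    {
        var baseUri = new Uri(pageUrl, UriKind.Absolute);
        return new Uri(baseUri, link).AbsoluteUri;
    }

    static void Main()
    {
        const string page = "http://example.com/articles/page.html"; // hypothetical page
        Console.WriteLine(ToAbsolute(page, "images/logo.png"));              // relative to the page's folder
        Console.WriteLine(ToAbsolute(page, "/css/site.css"));                // relative to the site root
        Console.WriteLine(ToAbsolute(page, "http://cdn.example.net/a.png")); // already absolute
    }
}
```

The same helper could be used both ways: to build the download URL for each image or stylesheet, or to rewrite the links in the saved HTML so they point back at the original site.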

I have a similar problem to solve. The biggest problem for you will be the images referenced from CSS; they are very difficult to parse.
So I chose to use FiddlerCore to achieve that.
It might help you too.
The difficult part of your task is creating your own directory structure and changing the image paths accordingly.
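For anyone who wants to try parsing the CSS by hand rather than capturing traffic, here is a rough sketch of pulling `url(...)` references out of a stylesheet with a regular expression. This is not what FiddlerCore does; it is a different, simpler technique. It is only an approximation of CSS syntax: it ignores comments, escaped characters, and `@import "file.css"` written without `url()`. The names `CssUrlScanner` and `FindUrls` are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CssUrlScanner
{
    // Matches url(...) with optional single or double quotes around the target.
    // Group 1 captures the opening quote (if any) so \1 can require the same closing quote.
    static readonly Regex UrlToken = new Regex(
        @"url\(\s*(['""]?)(?<target>[^'"")]+)\1\s*\)",
        RegexOptions.IgnoreCase);

    // Returns every url(...) target in the stylesheet text, skipping embedded data: URIs.
    public static List<string> FindUrls(string css)
    {
        var result = new List<string>();
        foreach (Match m in UrlToken.Matches(css))
        {
            string target = m.Groups["target"].Value.Trim();
            // data: URIs carry their content inline, so there is nothing to download.
            if (!target.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
                result.Add(target);
        }
        return result;
    }
}
```

Each returned target still has to be resolved against the URL of the stylesheet itself (not the page), since relative paths in CSS are relative to the CSS file.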

Related

Converting HTML to image

I have a web app that displays the profiles of over 600 people, and each profile displays a word cloud. The word cloud is rendered using HTML.
The client has requested that the same word cloud appear in an Excel macro that does pretty much the same thing as the web app.
I have seen a few solutions that save an image from a rendered page, but is there a way to create images from the HTML programmatically, without selecting each of the 600 profiles manually?

Since rendering HTML is a browser's job, you could look at doing it with JavaScript.
Write a small program in jQuery (or your favorite JS framework) that renders the word cloud of every profile on a canvas, and then use this:
http://www.nihilogic.dk/labs/canvas2image/
to save an image of the rendered canvas.

I know it has been a while since I asked this question, but I'll answer it for the benefit of others. I ended up using a tool called IECapt to capture the rendering of a web page into a BMP, JPEG or PNG image file:
http://iecapt.sourceforge.net/
I then wrote a unit test to iterate over the various URLs, passing each one as an argument to the IECapt utility. I was able to render over 600 images in a few seconds.
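A loop like the one described might look roughly like the sketch below. It assumes IECapt is invoked with `--url=` and `--out=` arguments as shown on its project page; the class name, the `profileN.png` file naming, and the executable path are made up for illustration, and the original unit test is not shown in this thread.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class BatchCapture
{
    // Builds the start info for one capture. The --url/--out switches are
    // assumed from the IECapt documentation; check them against your version.
    public static ProcessStartInfo BuildCapture(string exePath, string url, string outFile)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = $"--url=\"{url}\" --out=\"{outFile}\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Runs the capture tool once per URL, waiting for each run to finish.
    public static void CaptureAll(string exePath, string[] urls, string outDir)
    {
        for (int i = 0; i < urls.Length; i++)
        {
            // Hypothetical naming scheme: profile1.png, profile2.png, ...
            string outFile = Path.Combine(outDir, $"profile{i + 1}.png");
            using (Process p = Process.Start(BuildCapture(exePath, urls[i], outFile)))
            {
                p?.WaitForExit();
            }
        }
    }
}
```

Calling something like `BatchCapture.CaptureAll(@"C:\tools\IECapt.exe", urls, @"C:\out")` would then produce one image per profile URL; both paths here are placeholders.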

Server-side printing in C#/ASP.NET

On the server that my application runs on, a virtual PDF printer is being installed (I don't know much about it yet, except that it's from Adobe), and my application needs to use this 'printer' to create PDFs from HTML pages (mostly a GridView), and then redirect the user to the URL where the PDF is stored.
I've been looking at the PrintDocument object in System.Drawing.Printing, but I've read that you can't simply feed it an HTML page. What are my choices? The easiest option would be to 'print' a given HTML page (choosing what to print and what not to print using CSS), but from what I've read this is fairly difficult, so I'm thinking about somehow constructing whatever object PrintDocument needs programmatically, if that makes sense.
Any ideas on how I should do this?

There are some free or cheap libraries for creating PDFs on the fly. I've used iTextSharp before and it worked pretty well. It takes a bit of time to get up to speed on how it works, but I'd suggest checking it out.
There are also printing services like Neevia DocConverter that will monitor a folder and automatically convert whatever you put in it to a PDF, JPG, etc. You can set it up so that if you drop a URL shortcut in the folder, it will render the web page at that URL to PDF. It's a bit more of a pain if you want to do real-time rendering, but it works very well for generating mass reports in batches that you want to post to a website or email later.

Response.BinaryWrite DIV

Is there a way to write a PDF to a div from a database, i.e. retrieve a byte[] from the database and Response.BinaryWrite it to a div?
We do a similar thing for images using src = "anotherpage.aspx", where the image is written by anotherpage.
Is this possible with a PDF without using an IFrame?

If what you're trying to do is show a PDF file inside a DIV, you're going down the wrong path. You either need to:
Convert the PDF to Flash (à la FlashPaper)
or
Convert the PDF to HTML (like Scribd does using HTML 5).
Then you can embed the PDF inside a DIV. But no browser I know of supports directly embedding PDFs.
Otherwise you have to put the PDF in an IFRAME, but how this is shown is PDF plug-in dependent.

No. The reason it works with a src=otherpage.aspx request is that the src attribute results in the user's web browser making a completely separate request for the other resource. You're serving up an additional page to make that happen. Writing a PDF file directly is trying to inject the PDF into the same request as your page - not really "similar" to your img src at all. In fact, what is most similar to the "src=otherpage.aspx" method is the iframe approach that you mentioned.
As a side note, your "AnotherPage.aspx" example should really be changed to "AnotherPage.ashx". Note the letter 'h' in there. That means you're using a handler rather than a page, which will perform better.

get current page number of pdf document in asp.net

I am trying to implement a feature where I open a multi-page PDF file (say, in an iframe), highlight a section of the document, and get the page number (the one displayed in the PDF toolbar).
E.g. if the toolbar displays 2/7, meaning I am currently on page 2, I need to capture that page number. It sounds simple, but I have not been able to find a .dll or function that exposes this property.
Any help would be appreciated. Thanks.

I don't think this is possible: there's no way to control PDFs with JavaScript in the browser, which is what you'd need to do.
This article suggests the same: http://codingforums.com/showthread.php?t=43436.
Content of link:
In short, no, you can't do that.
I really don't think JS can read properties of PDFs, since PDFs are viewed in the browser through a plugin, i.e. a viewport for another application (for want of a better explanation).
You may be better off trying a different route, such as generating the pages as images and implementing your own paging. It depends on your content and requirements, of course. ABCPDF from http://www.websupergoo.com/ is free (with a link-back); I'm not sure if that's any help to you.

Is there a way to replace a text in a PDF file with itextsharp?

I'm using iTextSharp to generate the PDFs, but I need to change some text dynamically.
I know it's possible to change text if there is an AcroField, but my PDF doesn't have any. It just has plain text, and I need to change some of it.
Does anyone know how to do it?

Actually, I have a blog post on how to do it! But like IanGilham said, it depends on whether you have control over the original PDF. The basic idea is that you set up a form on the page and replace the form fields with the text you want. (You can style the form so it doesn't look like a form.)
If you don't have control over the PDF and you find a way to do it, let me know!
Here is a link to the full post:
Using a template to programmatically create PDFs with C# and iTextSharp

I haven't used itextsharp, but I have been using PDFNet SDK to explore the content of a large pile of PDFs for localisation over the last few weeks.
I would say that what you require is absolutely achievable, but how difficult it is will depend entirely on how much control you have over the quality of the files. In my case, the files can be constructed from any combination of images, text in any random order, tables, forms, paths, single pixel graphics and scanned pages, some of which are composed from hundreds of smaller images. Let's just say we're having fun with it.
In the PDFTron way of doing things, you would have to implement a viewer (sample available), and add some code over a text selection. Given the complexities of the format, it may be necessary to implement a simple editor in a secondary dialog with the ability to expand the selection to the next line (or whatever other fundamental object is used to make up text). The string could then be edited and applied by copying the entire page of the document into a new page, replacing the selected elements with your new string. You would probably have to do some mathematics to get this to work well though, as just about everything in PDF is located on the page by means of an affine transform.
Good luck. I'm sure there are people on here with some experience of itextsharp and PDF in general.

This question comes up from time to time on the mailing list. The same answer is given time and time again - NO. See this thread for the official answer from the person who created iText.
This question should be a FAQ on the itextsharp tag wiki.
