How do I print an HTML document from a web service? - c#

I want to print HTML from a C# web service. The web browser control is overkill, and does not function well in a service environment, nor does it function well on a system with very tight security constraints. Is there any sort of free .NET library that will support the printing of a basic HTML page? Here is the code I have so far, which does not run properly.
public void PrintThing(string document)
{
if (Thread.CurrentThread.GetApartmentState() != ApartmentState.STA)
{
Thread thread =
new Thread((ThreadStart) delegate { PrintDocument(document); });
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
}
else
{
PrintDocument(document);
}
}
protected void PrintDocument(string document)
{
WebBrowser browser = new WebBrowser();
browser.DocumentText = document;
while (browser.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
browser.Print();
}
This works fine when called from UI-type threads, but nothing happens when called from a service-type thread. Changing Print() to ShowPrintPreviewDialog() yields the following IE script error:
Error: dialogArguments.___IE_PrintType is null or not an object.
URL: res://ieframe.dll/preview.dlg
And a small empty print preview dialog appears.

You can print from the command line using the following:
rundll32.exe
%WINDIR%\System32\mshtml.dll,PrintHTML
"%1"
Where %1 is the file path of the HTML file to be printed.
If you don't need to print from memory (or can afford to write to the disk in a temp file) you can use:
using (Process printProcess = new Process())
{
    string systemPath = Environment.GetFolderPath(Environment.SpecialFolder.System);
    printProcess.StartInfo.FileName = systemPath + @"\rundll32.exe";
    printProcess.StartInfo.Arguments = systemPath + @"\mshtml.dll,PrintHTML """ + fileToPrint + @"""";
    printProcess.Start();
}
N.B. I think this only works on Windows 2000 and above.
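If the HTML only exists in memory, one option is to write it to a temporary file first and pass that path to the rundll32 command. The following is a minimal sketch under that assumption. The class HtmlPrintSketch and its method names are hypothetical and purely illustrative, and nothing here guarantees that mshtml printing behaves better under a service account than the WebBrowser control does.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of any library.
static class HtmlPrintSketch
{
    // Writes the in-memory HTML to a uniquely named file in the temp folder.
    public static string WriteTempHtml(string html)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
        File.WriteAllText(path, html);
        return path;
    }

    // Builds the same rundll32 / mshtml.dll,PrintHTML command shown above.
    public static ProcessStartInfo BuildPrintCommand(string htmlPath)
    {
        string systemPath = Environment.GetFolderPath(Environment.SpecialFolder.System);
        return new ProcessStartInfo
        {
            FileName = Path.Combine(systemPath, "rundll32.exe"),
            Arguments = Path.Combine(systemPath, "mshtml.dll") + ",PrintHTML \"" + htmlPath + "\"",
            UseShellExecute = false
        };
    }

    // Writes the HTML to disk, starts the print command and waits for it to exit.
    public static void PrintHtml(string html)
    {
        string path = WriteTempHtml(html);
        using (Process printProcess = Process.Start(BuildPrintCommand(path)))
        {
            printProcess.WaitForExit();
        }
    }
}
```

The temporary file is deliberately left in place here. If you clean it up, do so only once you are sure the print job has been handed off, because the process reads the file after it starts.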

I know that Visual Studio itself (at least in 2003 version) references the IE dll directly to render the "Design View".
It may be worth looking into that.
Otherwise, I can't think of anything beyond the Web Browser control.

Easy! Split your problem into two simpler parts:
render the HTML to PDF
print the PDF (SumatraPDF)
-print-to-default $file.pdf prints a PDF file on a default printer
-print-to $printer_name $file.pdf prints a PDF on a given printer
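The two SumatraPDF switches above can be driven from C# by starting the executable with the right arguments. This is only a sketch: the class SumatraPrintSketch, its method names, and the idea that the caller supplies the path to SumatraPDF.exe are assumptions for illustration. Only the -print-to-default and -print-to switches come from the answer itself.

```csharp
using System.Diagnostics;

// Hypothetical helper, not part of SumatraPDF or .NET.
static class SumatraPrintSketch
{
    // Builds the argument string for the default printer or for a named printer.
    public static string BuildArguments(string pdfPath, string printerName = null)
    {
        string quotedPdf = "\"" + pdfPath + "\"";
        return string.IsNullOrEmpty(printerName)
            ? "-print-to-default " + quotedPdf
            : "-print-to \"" + printerName + "\" " + quotedPdf;
    }

    // Starts SumatraPDF (path supplied by the caller) and waits for it to finish.
    public static void Print(string sumatraExePath, string pdfPath, string printerName = null)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = sumatraExePath,
            Arguments = BuildArguments(pdfPath, printerName),
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
        }
    }
}
```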

If you've got it in the budget (~$3000), check out PrinceXML.
It will render HTML into a PDF, functions well in a service environment, and supports advanced features such as not breaking a page in the middle of a table cell (which a lot of browsers don't currently support).

A tool that works very well for me is HiQPdf. https://www.hiqpdf.com/
The price is reasonable (starts at $245) and it can render HTML to a PDF and also manage the printing of the PDF files directly.

Maybe this will help. http://www.codeproject.com/KB/printing/printhml.aspx
Also, I'm not sure which thread you are trying to access the browser control from, but it needs to be an STA thread.
Note - The project referred to in the link does allow you to navigate to a page and perform a print without showing the print dialog.

I don't know the specific tools, but there are some utilities that record / replay clicks. In other words, you could automate the "click" on the print dialog. (I know this is a hack, but when all else fails...)

Related

How to get text from image using C# .NET [duplicate]

Is there an API to use Onenote OCR capabilities to recognise text in images automatically?
If you have OneNote client on the same machine as your program will execute you can create a page in OneNote and insert the image through the COM API. Then you can read the page in XML format which will include the OCR'ed text.
You want to use
Application.CreateNewPage to create a page
Application.UpdatePageContent to insert the image
Application.GetPageContent to read the page content and look for OCRData and OCRText elements in the XML.
OneNote COM API is documented here: http://msdn.microsoft.com/en-us/library/office/jj680120(v=office.15).aspx
When you put an image on a page in OneNote through the API, any images will automatically be OCR'd. The user will then be able to search any text in the images in OneNote. However, you cannot pull the image back and read the OCR'd text at this point.
If this is a feature that interests you, I invite you to go to our UserVoice site and submit this idea: http://onenote.uservoice.com/forums/245490-onenote-developers
update: vote on the idea: https://onenote.uservoice.com/forums/245490-onenote-developer-apis/suggestions/10671321-make-ocr-available-in-the-c-api
-- James
There is a really good sample of how to do this here:
http://www.journeyofcode.com/free-easy-ocr-c-using-onenote/
The main bit of code is:
private string RecognizeIntern(Image image)
{
this._page.Reload();
this._page.Clear();
this._page.AddImage(image);
this._page.Save();
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
return null;
}
As I will be deleting my blog (which was mentioned in another post), I thought I should add the content here for future reference:
Usage
Let's start by taking a look at how to use the component: the class OnenoteOcrEngine implements the core functionality and the interface IOcrEngine, which provides a single method:
public interface IOcrEngine
{
string Recognize(Image image);
}
Excluding any error handling, it can be used in a way similar to the following one:
using (var ocrEngine = new OnenoteOcrEngine())
using (var image = Image.FromFile(imagePath))
{
var text = ocrEngine.Recognize(image);
if (text == null)
Console.WriteLine("nothing recognized");
else
Console.WriteLine("Recognized: " + text);
}
Implementation
The implementation is far less straightforward. Prior to Office 2010, Microsoft Office Document Imaging (MODI) was available for OCR. Unfortunately, this is no longer the case. Further research confirmed that OneNote's OCR functionality is not directly exposed in the form of an API, but suggestions were made to manually parse OneNote documents for the text (see "Is it possible to do OCR on a Tiff image using the OneNote interop API?" or "need a document to extract text from image using onenote Interop?"). And that's exactly what I did:
Connect to OneNote using COM interop
Create a temporary page containing the image to process
Show the temporary page (important because OneNote won't perform the OCR otherwise)
Poll for an OCRData tag containing an OCRText tag in the XML code of the page.
Delete the temporary page
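The polling step boils down to searching the page XML for OCRText elements nested inside OCRData. The following sketch shows one way to do that with LINQ to XML. It is not the author's ReadOcrText implementation: the class OcrXmlSketch is hypothetical, the page XML is assumed to have already been fetched as a string, and the namespace is the 2013 schema URI quoted further down in this post.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not taken from the original component.
static class OcrXmlSketch
{
    // Namespace of the OneNote 2013 page schema, as quoted in this post.
    static readonly XNamespace One = "http://schemas.microsoft.com/office/onenote/2013/onenote";

    // Returns the recognized text, or null if no OCR result is present yet.
    public static string ReadOcrText(string pageXml)
    {
        XDocument doc = XDocument.Parse(pageXml);
        var texts = doc.Descendants(One + "OCRData")
                       .Elements(One + "OCRText")
                       .Select(e => e.Value)
                       .ToList();
        return texts.Count == 0 ? null : string.Join(Environment.NewLine, texts);
    }
}
```

Returning null when nothing is found matches the polling loop, which keeps retrying until a non-null result arrives or the attempts run out.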
Challenges included the parsing of the XML code for which I decided to use LINQ to XML. For example, inserting the image was done using the following code:
private XElement CreateImageTag(Image image)
{
var img = new XElement(XName.Get("Image", OneNoteNamespace));
var data = new XElement(XName.Get("Data", OneNoteNamespace));
data.Value = this.ToBase64(image);
img.Add(data);
return img;
}
private string ToBase64(Image image)
{
using (var memoryStream = new MemoryStream())
{
image.Save(memoryStream, ImageFormat.Png);
var binary = memoryStream.ToArray();
return Convert.ToBase64String(binary);
}
}
Note the usage of XName.Get("Image", OneNoteNamespace) (where OneNoteNamespace is the constant "http://schemas.microsoft.com/office/onenote/2013/onenote") for creating the element with the correct namespace, and the method ToBase64, which serializes a GDI image from memory into Base64. Unfortunately, polling (see "What is wrong with polling?" for a discussion of the topic) in combination with a timeout is necessary to determine whether the detection process has completed successfully:
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
Results
The results are not perfect. Considering the quality of the images, however, they are more than satisfactory in my opinion. I could successfully use the component in my project. One very annoying issue remains: sometimes OneNote crashes during the process. Most of the time a simple restart fixes this, but trying to recognise text from some images reproducibly crashes OneNote.
Code / Download
Check out the code at GitHub
I'm not sure about OCR, but the documentation site for the OneNote API is here:
http://msdn.microsoft.com/en-us/library/office/dn575425.aspx#sectionSection1

Copying the content from a WebView under WinRT

I've got a WebView with some HTML content which I want to convert into RTF. I've looked at the RTF conversion functions out there and they all look a little flaky to be honest. So my idea is to copy content from the WebView into a RichEditBox, and save to RTF from there.
I've seen this example numerous times.
WebBrowser1.Document.ExecCommand("SelectAll", false, null);
WebBrowser1.Document.ExecCommand("Copy", false, null);
Unfortunately, WinRT's WebView control doesn't have a Document property, so I can't do this.
Is there any way to pull the content from the control? To be clear, I don't want the HTML itself - I can get that already using
InvokeScript("eval", new string[] { "document.getElementById('editor').innerHTML;" });
What I want is the actual rendered HTML - the same as if I were to select everything in my WebView, press CTRL+C and then paste it into wordpad.
This is part of a series of questions I asked in trying to accomplish a bigger task - converting HTML to RTF in a Windows Store App.
I'm delighted to report that the above can be done. I finally figured out how to do it, using DataPackage - normally used for sharing content between apps.
First, this javascript function must exist in the HTML loaded in the webview.
function select_body() {
var range = document.body.createTextRange();
range.select();
}
Next, you'll need to add using Windows.ApplicationModel.DataTransfer; to the top of your document. Not enough StackOverflow answers mention the namespaces used. I always have to go hunting for them.
Here's the code that does the magic:
// call the select_body function to select the body of our document
MyWebView.InvokeScript("select_body", null);
// capture a DataPackage object
DataPackage p = await MyWebView.CaptureSelectedContentToDataPackageAsync();
// extract the RTF content from the DataPackage
string RTF = await p.GetView().GetRtfAsync();
// SetText of the RichEditBox to our RTF string
MyRichEditBox.Document.SetText(Windows.UI.Text.TextSetOptions.FormatRtf, RTF);
I've spent about 2 weeks trying to get this to work. It's such a relief to finally discover I don't have to manually encode the file to RTF. Now if I can just get it to work the other way around, I'll be ecstatic. Not essential to the app I'm building, but it would be a lovely feature.
UPDATE
In retrospect you probably don't need to have the function in the HTML, you could probably get away with this (though I haven't tested):
MyWebView.InvokeScript("execScript", new string[] {"document.body.createTextRange().select();"});

c# IE Plugin BHO get the pdf

I need help for a little IE Plugin (Browser Helper Object).
What the Plugin should do:
If the user clicks a link that points to a PDF, the plugin should call an exe file installed on the computer. The exe file would check the PDF and open either the default PDF application or a special one.
What I have done:
My BHO starts with my IE. I used this code as a starting point:
http://www.codeproject.com/Articles/19971/How-to-attach-to-Browser-Helper-Object-BHO-with-C
I disabled all Adobe plugins so the IE download window shows up. With this code I can parse the HTML body, add HTML markup, etc., but that's not what I'm trying to do...
My Problem:
I don't know how to grab the PDF. If I call a PDF download link directly, the cast of the site object to an InternetExplorer or WebBrowser object fails.
public int SetSite(object site)
{
if (site != null)
{
ieInstance = site as InternetExplorer;
ieInstance.DocumentComplete += new DWebBrowserEvents2_DocumentCompleteEventHandler(this.OnDocumentComplete);
}
else if (ieInstance != null)
{
ieInstance.DocumentComplete -= new DWebBrowserEvents2_DocumentCompleteEventHandler(this.OnDocumentComplete);
}
return 0;
}
Document = '((SHDocVw.InternetExplorer)(ieInstance)).Document' threw an exception of type 'System.Runtime.InteropServices.COMException'
Could someone tell me how I can grab the PDF before the download window appears in IE? I know there is an event that fires before the download, but that event doesn't help me either.
What are you trying to achieve more broadly?
You can't grab an HTML document instance when the browser is showing a PDF file because there isn't one: the PDF object is loaded instead of the HTML-rendering object.
Depending on what you're trying to achieve, there are likely better ways to accomplish it (most likely by registering a MIME Filter for the application/pdf MIME type).

Printing transformed XML

Due to circumstances beyond my control, I'm replacing all of our Crystal Reports with home-built XML reports, which are working beautifully. For most of the reports that pop up a Crystal Reports viewer, the following code opens them nicely in IE, transforming it to HTML via an XSLT stylesheet.
ProcessStartInfo psi = new ProcessStartInfo(reportFilename)
{
UseShellExecute = true
};
using (Process p = new Process {StartInfo = psi})
{
p.Start();
}
The problem is that some of the reports just print directly to a printer, never showing the report to the user, which works fine in CR. I can't figure out how to do this using the code above.
I'd prefer not to specifically launch an IE process if possible, but I am guaranteed that they're running Windows, so that's not a hard requirement. Also, will printing directly in this manner transform the XML into HTML via the XSL and print that, or just print the actual XML text?
EDIT: I've already tried adding:
Verb = "Print"
to the ProcessStartInfo object, but that winds up with an exception being thrown saying:
"No application is associated with the specified file for this operation"
EDIT AGAIN: Specifying IE as the exe to launch loaded the XML again just fine, but does not offer a "print" action. Adding "window.print()" in a JavaScript block works, but requires manually clicking the print button after allowing the script to run, because IE blocks it.
EDIT THE THIRD: My boss told me not to worry about it, that they can print from IE. I still want to figure this out. I've tried the command line "print.exe", but that only prints the raw XML to the printer. Tried XslCompiledTransform with a PrintDocument, but that's not what I'm looking for, either.
Figured it out, finally. I just created an invisible WebBrowser control that does the IE rendering, and on DocumentCompleted, call its Print() method. Worked like a charm using the default printer settings.
private static void PrintReport(string reportFilename)
{
WebBrowser browser = new WebBrowser();
browser.DocumentCompleted += browser_DocumentCompleted;
browser.Navigate(reportFilename);
}
private static void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser browser = sender as WebBrowser;
if (null == browser)
{
return;
}
browser.Print();
browser.Dispose();
}
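If you need the transformed HTML explicitly rather than relying on IE to apply the stylesheet, XslCompiledTransform can produce it as a string, which could then be saved and loaded into the WebBrowser control as above. This is a sketch only: the class ReportTransformSketch and its method name are hypothetical, and it assumes the stylesheet exists as a separate .xslt file whose path you know.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, not part of the original answer.
static class ReportTransformSketch
{
    // Applies the stylesheet at xsltPath to the XML report and returns the result as a string.
    public static string TransformToHtml(string xmlPath, string xsltPath)
    {
        var transform = new XslCompiledTransform();
        transform.Load(xsltPath);

        using (var output = new StringWriter())
        {
            // Using the stylesheet's own output settings respects its xsl:output element.
            using (XmlWriter writer = XmlWriter.Create(output, transform.OutputSettings))
            {
                transform.Transform(xmlPath, writer);
            }
            return output.ToString();
        }
    }
}
```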
