Get the data sent to the printer - C#

In my project we need to use a virtual printer, catch the file it produces (most of the time it's a bitmap), extract data from it, and transform it into XML like this:
<document name="file://C:\DOCUME~1\ilanit\LOCALS~1\Temp\p0129600584.htm">
<lineXY x="0" y="0" height="1656" width="2275" />
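A minimal sketch of writing that structure with `System.Xml.Linq`, assuming your own bitmap analysis has already produced the region coordinates. `BuildDocument` is a hypothetical helper name, the sample path is illustrative, and the snippet uses C# 9 top-level statements.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: builds <document name="..."> with one <lineXY .../> child.
// The coordinates are whatever your own bitmap analysis produced.
static XElement BuildDocument(string sourceName, int x, int y, int height, int width) =>
    new XElement("document",
        new XAttribute("name", sourceName),
        new XElement("lineXY",
            new XAttribute("x", x),
            new XAttribute("y", y),
            new XAttribute("height", height),
            new XAttribute("width", width)));

// Example values taken from the question; the path is illustrative only.
var doc = BuildDocument(@"file://C:\Temp\capture.htm", 0, 0, 1656, 2275);
Console.WriteLine(doc.ToString(SaveOptions.DisableFormatting));
```

Building the tree with `XElement`/`XAttribute` rather than concatenating strings means characters such as `&` or quotes in the file name are escaped for you.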

Is it something like Redmon you are looking for (used in conjunction with output to file and then launching an application)? If so, you can use it, or there are others out there too. Redmon is a little dated and depending on the OS you might have issues. If you can, add more detail and specifics to your question, as it's a bit confusing.
UPDATE (based on comments): If the source is a PDF or some other document (i.e. Word) that has actual text and not just graphics (scan/image) type data, you could use a PostScript driver (type 1 might work best) and then extract the text after you capture the print file. If you are not going to use the print file for actual output and just need the data, you can always try the Generic Text driver in Windows, as it will ignore graphics and just put the text in the output file. As long as the output is consistent, a little regex should be able to pull out what you need.
If the data is graphical in nature, such as a scanned image that you are printing, you will need to capture the print job, turn it into a graphic image (as it will be a print file with PCL or PostScript, etc.) and then run it through an OCR engine to pull out what you need.
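As a rough illustration of the regex step, here is a sketch that assumes the Generic / Text Only output contains lines of the form `Label: value`. That layout, the sample text, and the `ExtractFields` name are assumptions to adapt to your own documents.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pulls "Label: value" pairs out of plain text captured from the Generic / Text Only driver.
// The "Label: value" layout is an assumption; change the pattern to match your real output.
static Dictionary<string, string> ExtractFields(string printedText)
{
    var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    // [ \t]* keeps each match on one line; [ \t\r]* before $ swallows a trailing CR from CRLF text.
    var pattern = new Regex(@"^[ \t]*(?<label>[A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*(?<value>.+?)[ \t\r]*$",
                            RegexOptions.Multiline);
    foreach (Match m in pattern.Matches(printedText))
        fields[m.Groups["label"].Value] = m.Groups["value"].Value;
    return fields;
}

// Hypothetical sample of what the text driver might write.
var parsed = ExtractFields("INVOICE\r\nInvoice No: 12345\r\nTotal: 99.50\r\n");
Console.WriteLine(parsed["Invoice No"]);
```

Lines without a colon (like the `INVOICE` heading) simply don't match, which is usually what you want from noisy print output.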

Related

Embed word document into another WITHOUT icon

How do I embed a Word document into another Word document via the OpenXML SDK, but showing its content rather than a Word icon? The same way we do it manually in Word: Insert object from file, WITHOUT checking "Display as icon".
I've found this article, but it uses an icon. I've also tried the OpenXML SDK Productivity Tool, but it only shows generated binary data.
EDITED:
I use the following code:
DrawAspect = OleDrawAspectValues.Content
and then I add an image part:
var imagePart = mainDocumentPart.AddNewPart<ImagePart>("image/x-emf", imagePartId);
GenerateImagePart(imagePart);
But my image part is just the byte array of Word's icon.
So in this case the following happens: when I open the generated document, it shows the embedded document as an icon, but when I double-click the embedded document, edit it and save the changes, the embedded document is shown as content. So maybe it's possible in some way to show this content without editing the embedded document? Should I use the bytes of a screenshot of the document instead of the bytes of Word's icon?
I'm not sure I described it clearly, so please ask if anything is unclear.
I'm afraid what you are asking for is almost impossible.
As far as the Word file is concerned, the only difference between the icon and the embedded content is the image.
When you don't use an icon, Word pretty much just takes a screenshot of the document you are embedding and inserts that in place of the icon graphic.
I've uploaded an example I grabbed from a Word file I made. Found this little gem in the /media folder inside the .docx file.
So basically, if you can't live with the icon, your only choice is to somehow grab a picture of the Word file you want to embed and insert that instead of the icon image.
However you go about that, it won't be pretty. First of all, the Open XML SDK contains no such functionality. I tried playing around a bit with Office Interop as well, but had no luck.
I only see two possible ways to achieve this.
The first one is via Interop. You'll need to install a "pretend printer", like the ones that print to PDF instead of sending output to a physical printer, except this one needs to print to an image format. The format of the file in the media folder was .emf, but I'm not positive that's a requirement.
Anyway, should the above somehow be possible, you could embed that picture, pretty much using the Microsoft example you linked, and just change the size of the "icon", which would now be an image of the document.
The second possibility would be to open the Word document as a process, set the zoom to 72% (or whatever makes the document the only thing on screen on your desktop), then grab a screenshot, crop it down to just the document, and use that as your image for the embedding.
For the record, I don't recommend you do any of the above, but those are the only options I see.
Should someone have a better solution to this I'm all ears.
Finally, should you decide that you want to push on with this, I'll be happy to code up an example of option number 2 if you reply and tell me you'd like that.
Kaspar
There is a nice wrapper API around Open XML (Document Builder 2.2) specifically designed to merge documents, with the flexibility of choosing which paragraphs to merge, etc. You can download it from here.
Using this tool you can embed a paragraph of another word document or entire word document as per your requirement.
The documentation and screen casts on how to use it are here.
Hope this helps.

Extended printer properties

I'm working with a WinForms app. I have an RDLC report that will be printed on 11x17 and then folded (printer supports folding). I'm rendering to EMF and drawing to pages of a PrintDocument. This works fine except for folding.
What I'd like to do is store the settings that make the printer fold. The users would select a preset from a dropdown and the app would select the printer, the paper size, the tray, whether to duplex, and whether to fold. Storing the PrinterSettings object covers most of this, but doesn't save the folding option.
I first attempted to store/retrieve something I read about called DEVMODE. For reference: http://nicholas.piasecki.name/blog/2008/11/programmatically-selecting-complex-printer-options-in-c-shar/. What I found is that even though I had extra data specific to the driver, all the bytes were 0 regardless of what driver-specific settings I changed. I'm not sure where I went wrong with this, but I abandoned it and looked at the printing capabilities in WPF.
I found that I could configure a PrintTicket for my settings, store it, and retrieve it later. It seems a bit convoluted just to save the settings, but I think I have it working. At least it seems to show up correctly in the PrintDialog. However, I'm now stuck trying to figure out how to print my report.
As I understand it, I can't take a PrintDocument from WinForms printing and use it in WPF. I also read EMF format is not supported in WPF. I thought I would render each EMF to a bitmap, then print those. But the text in my report is fuzzy and I'm not having any luck clearing it up.
Starting with a stream that contains EMF bytes that I know will render sharply with PrintDocument, I tried saving it to a file. No settings I provide seem to save it with crisp text.
var pageImage = new Metafile(stream);
pageImage.Save(filename);
All this just to add the ability to fold. Am I just completely on the wrong track? I don't see how this should be so hard. I guess I either need to find another way to save/restore custom printer settings or I need a way to render these EMF files better.
I also tried rendering the report directly to BMP format and it's also poor quality.
I tried something slightly different and it worked! I reused my original PrintDocument code and printed to an XPS file. Then I printed the XPS file using my PrintTicket and it works fine.

How to connect to a print driver in C#?

I have a task of converting a bunch of formats like .pdf, .doc, .jpg, .xls, .txt and .bmp files into .png format. I found a print driver that does that.
But how do I connect to that printer driver in .NET? This will be a server-side component. I need to print documents into a folder using this print driver.
I am wondering how that can be done.
Thanks
Based on your updated comments, it sounds as if you are looking to convert a variety of images and document types to a single common image type. The process of taking one of the several source formats you mention and converting it to a bitmapped format such as .PNG is called RENDERING or RASTERIZING: you take one of the input formats, render it to a bitmap representation, then write it to a file in .PNG format.
While it certainly might be possible to do this using a print driver, you would typically be relying on an installed application to which you can pass the source document for printing to the driver. For this to work, each of the source file types you want to handle this way needs an installed application which can take actions from the shell and do what you request. For example, if you want to do this with a .DOC file, you need Microsoft Word installed, as it properly responds to the PRINT shell command.
However, the limitation of the shell-based method is that it always prints to the DEFAULT system printer, so your driver would need to be set up as the default printer on the machine that runs your process. You would therefore need to check whether each of the source types you want to handle has an installed or installable application that lets you print it using the shell and the PRINT action verb.
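For the shell route, a minimal sketch with `System.Diagnostics.Process`. It assumes Windows, an installed application that registers the `print` verb for the file type, and that the PNG-producing driver is the default printer. `CreateShellPrintStartInfo` and `PrintViaShell` are illustrative names, and the 60-second timeout is an arbitrary choice.

```csharp
using System.Diagnostics;
using System.IO;

// Builds a request for the shell "print" verb on a document.
// UseShellExecute must be true for verbs; it defaults to false on .NET Core / .NET 5+.
static ProcessStartInfo CreateShellPrintStartInfo(string documentPath) => new ProcessStartInfo
{
    FileName = Path.GetFullPath(documentPath),
    Verb = "print",
    UseShellExecute = true,
    CreateNoWindow = true,
    WindowStyle = ProcessWindowStyle.Hidden
};

// Starts the registered application and waits (up to a timeout) for it to exit.
// The job goes to the default printer, because that is what the "print" verb uses.
static void PrintViaShell(string documentPath, int timeoutMs = 60_000)
{
    // Process.Start can return null when the shell hands the request to an existing process.
    using var process = Process.Start(CreateShellPrintStartInfo(documentPath));
    // Some applications hand the job to the spooler and exit; others stay open,
    // hence a bounded wait instead of an unbounded one.
    process?.WaitForExit(timeoutMs);
}
```

Keeping the `ProcessStartInfo` construction separate from starting the process makes it easy to log or inspect exactly what will be sent to the shell for each file type while you test which applications honour the verb.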
Reference URLs:
Windows Shell Verbs and File Associations
Creating Shortcut Menu Handlers
The problem with this technique is not all applications respond to the PRINT verb correctly or at all. This usually works with all the major Microsoft applications, but you should test any other document types you want to support before going much further with this technique.
This also raises other questions that this doesn't even begin to address, such as what to do about multi-page formats. You listed a few image types that are straightforward and can be converted to PNG files pretty directly. But how do you want to render a multi-page Word document into PNG format? Do you intend to produce one very large PNG with all the pages one after another? Or one PNG file per source page? The print driver method might not give you much control over that.
Depending on some of these details and just how much control and reliability you need in the process, you might want to consider a completely different route to your process. Maybe you should consider using tools/libraries that can read the source file formats you want to support and render them directly, after which you can save into your PNG files. One library I have used in the past that would seem to fit and allow you a high degree of control over the conversion (rendering/rasterization) process is LeadTools. It is a fairly pricey product, but my experience with it has been that it does support a wide variety of formats reliably.
LeadTools PDF and Document Readers SDK
There may be some other open source tools available that you could pull together to support this type of functionality, but I'm not familiar with any to point you to anything specific. But hopefully this helps give you some information to look at putting together a process that might be more reliable and give you greater control than trying to coerce a printer driver to do something you might not quite be able to make work reliably.
Server-side component implies something that doesn't have a human sitting at it (at least, not the human that is trying to use that printer). If this is the case then a print driver will not work - Print drivers that write their output to disk instead of a device always, in my experience, ask the user to select a place to save the file (present a Save As dialog).
To elaborate a little on what Boo mentioned:
Depending on the printer driver you are using, you may be able to tell it where to save your file.
The problem is that with a printer, the normal workflow is that you print from some application to a .png file, so the application itself has to know how to open and render the content of the original file; the printer does not do that part.
To continue down this path, you have to make sure your server component knows how to read and render content of each file type (.jpg, .pdf, .doc, etc.).
Assuming your server component knows how to render the content, the next step from here is to use the .NET Printing namespace to print your content to the .png printer.
For more details, see: http://msdn.microsoft.com/en-us/magazine/cc188767.aspx

How to get "printer ready bytes" from a source in c#?

I'm in a bit of trouble here, hoping you can help a fellow programmer out.
I have an application that receives a pointer to raw bytes (plus length and stuff) and sends said raw data to a printer. This is important, I have no choice but to use this method to get any printing done.
If I send a raw string, it will print with no problem. However, I need to be able to print formatted text, images, etc. So the thing is... I would like to be able to get printer ready bytes from a given source (maybe a pdf, or html, does not matter as long as it contains formatted text and/or images). It would be like "splitting" the print command like so:
a) Open file and read data
b) Load printer data into memory
c) Send bytes to printer
Obviously, I've got a) and c) covered, it's b) the one that's breaking my head.
Any thoughts?
Thanks in advance for your help.
What you need is the printer processor to receive your print command and create the formatted data. You wouldn't want to do this yourself, I hope (formatting printer-ready data, even if you know PS, AFP, PCL or whatever it is nowadays by heart, is very hard and months of work). Instead, the printer processor of Windows should be used.
If you're on Windows (I assume so because you use C#, though perhaps you use Mono), you can send any print job to a file (simply use the FILE: port). To create the formatted data, use any PDF library you have, or use RTF, which is supported by the .NET Framework, and print it to the selected printer (which should match the printer on the other end of your application), configured on port FILE:.
The raw print data is then on disk, which you can simply read in as a byte array and send to your actual printer using the application you already got.
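The last step can be sketched like this, assuming the driver wrote its output to a known path and that your existing application accepts a pointer plus a length. `SendSpoolFile` and the `sendRaw` callback are hypothetical stand-ins for that application's entry point.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Reads the printer-ready bytes the driver wrote through the FILE: port and
// hands them to a raw-send routine as pointer + length.
// 'sendRaw' is a hypothetical stand-in for your existing application's entry point.
static void SendSpoolFile(string spoolFilePath, Action<IntPtr, int> sendRaw)
{
    byte[] data = File.ReadAllBytes(spoolFilePath);
    // Pin the array so the GC cannot move it while native code reads from the pointer.
    GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
    try
    {
        sendRaw(handle.AddrOfPinnedObject(), data.Length);
    }
    finally
    {
        handle.Free();
    }
}
```

The pointer is only valid inside the callback; if your application keeps it and reads it later, copy the data to unmanaged memory (for example with `Marshal.AllocHGlobal` and `Marshal.Copy`) instead of pinning.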

How to read a .pdf file programmatically and convert it into audio (.mp3 format)?

I want to parse a PDF file from my C# app and create an audio file from it.
How would I do that?
I'm particularly looking for a good PDF-to-text library or a way to strip the text out of a PDF file.
You preferably have a tagged PDF document as your input document. This means that the document contains tags to mark up the logical structure of the document (typically a PDF document will only contain visual information).
This PDF could then be converted into DAISY format, which is a standard for digital talking books, i.e. an intermediate XML format storing the text of books along with the logical structure and navigation features.
This DAISY XML can either be converted to an audio format, or you can use a DAISY reader, a physical device like an MP3 player, to listen to the book.
There is a presentation available at the Daisy web site explaining the principles of this toolchain:
Accessible PDF to DAISY/NIMAS Conversion
Use Festival for the text-to-speech. Various PDF-to-text APIs exist...
You need the Speech SDK from Microsoft. Read the instructions here
As the other posters outlined, first you have to extract the text from the .pdf file. PDF is an open format now, so you can probably find a parser through Google.
Then you have to extract the text you want to convert to speech from the file, ignoring things like figure titles, page headers, table of contents etc.
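One crude way to drop running headers and footers, sketched under the assumption that your PDF library gives you the text one page at a time: treat any line that shows up on most pages (after masking digits, so "Page 1" and "Page 2" compare equal) as a header or footer. `StripRepeatedLines` and the 80% threshold are illustrative choices, not a technique taken from any particular library; figure titles and tables of contents would need other rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Removes lines that appear on at least `threshold` of the pages (running headers/footers).
// Digits are masked before comparing, so page numbers don't make footers look unique.
// Blank lines are dropped too. With fewer than two pages nothing is removed.
static List<string> StripRepeatedLines(IReadOnlyList<string> pages, double threshold = 0.8)
{
    static string Key(string line) =>
        new string(line.Trim().Select(c => char.IsDigit(c) ? '#' : c).ToArray());

    var pageLines = pages.Select(p => p.Split('\n').Select(l => l.TrimEnd('\r')).ToList()).ToList();

    // Count on how many pages each (masked) line occurs, at most once per page.
    var counts = pageLines
        .SelectMany(lines => lines.Select(Key).Where(k => k.Length > 0).Distinct())
        .GroupBy(k => k)
        .ToDictionary(g => g.Key, g => g.Count());

    int minPages = Math.Max(2, (int)Math.Ceiling(threshold * pages.Count));

    return pageLines
        .Select(lines => string.Join("\n", lines.Where(l =>
        {
            var k = Key(l);
            return k.Length > 0 && counts[k] < minPages;
        })))
        .ToList();
}
```

A repetition heuristic like this is cheap and needs no layout information, but it will also remove any genuine sentence that happens to repeat on most pages, so check the output on your own documents.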
Once you've got the text, you need to convert it to speech. This is probably the hardest part.
A while ago I was fiddling around with generating voice files for a gaming mod, since I'm a rotten voice actor.
Cepstral had the best TTS converters I could find. (The free ones had an annoying tendency to insert Cepstral advertisements in the speech, but I could manually edit this out for what I was doing.)
It turns out that there's a speech synthesis markup language which can be used to give the TTS converter clues about which syllable to place accents on, etc. Here's a linky:
http://www.w3.org/TR/speech-synthesis/
How you go about automatically adding the SSML to the text is a bit beyond me.
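The mechanical part, at least, is simple. Here is a minimal sketch that wraps plain paragraphs in an SSML 1.0 `<speak>` document, one `<p>` per paragraph with a short `<break>` between them, using `System.Xml.Linq` so special characters are escaped for you. Deciding where accents or emphasis belong is the hard part and is not attempted here; `ToSsml` and the 500 ms pause are illustrative choices.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Wraps blank-line-separated paragraphs in a minimal SSML 1.0 document.
// Only structure and pauses are added; no prosody or emphasis analysis is done.
static XDocument ToSsml(string text, string lang = "en-US")
{
    XNamespace ns = "http://www.w3.org/2001/10/synthesis";
    var paragraphs = text.Replace("\r\n", "\n")
                         .Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries)
                         .Select(p => p.Trim())
                         .Where(p => p.Length > 0);

    var speak = new XElement(ns + "speak",
        new XAttribute("version", "1.0"),
        new XAttribute(XNamespace.Xml + "lang", lang));

    foreach (var paragraph in paragraphs)
    {
        // Insert a pause between paragraphs, but not before the first one.
        if (speak.HasElements)
            speak.Add(new XElement(ns + "break", new XAttribute("time", "500ms")));
        speak.Add(new XElement(ns + "p", paragraph));
    }
    return new XDocument(speak);
}

Console.WriteLine(ToSsml("Hello world.\n\nThis is the second paragraph."));
```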
Anyway, the TTS converter will produce an audio file, and the final step would be to compress the audio at the desired bit rate in mp3 format.
If your sole task is to listen to speech synthesized text from a PDF, how about the Acrobat "Read out loud" function at the bottom of the "View" menu?
I guess it's a hard thing to do. Firstly you need to read the text in that pdf, and then use some mechanism of synthetic voice generation to create the audio content. Then you have to store it as an mp3.
On Mac OS X, you can extract the text of the PDF and then pipe it into "say". You should find equivalent synthesizers on other operating systems.
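If you drive this from C# on a Mac, a sketch along these lines could work, assuming the PDF text has already been written to a file. It relies on the `say` options `-f` (read the text from a file) and `-o` (write audio to a file, AIFF by default); `CreateSayStartInfo` and `Speak` are illustrative names, and the resulting AIFF would still need an MP3 encoder.

```csharp
using System.Diagnostics;

// Builds a "say -f <text file> -o <audio file>" invocation for macOS.
// ArgumentList (.NET Core 2.1+) passes each argument separately, so paths with spaces need no quoting.
static ProcessStartInfo CreateSayStartInfo(string textFile, string aiffFile)
{
    var psi = new ProcessStartInfo("say") { UseShellExecute = false };
    psi.ArgumentList.Add("-f");
    psi.ArgumentList.Add(textFile);
    psi.ArgumentList.Add("-o");
    psi.ArgumentList.Add(aiffFile);
    return psi;
}

// Runs the synthesizer and waits for it to finish writing the audio file.
static void Speak(string textFile, string aiffFile)
{
    using var process = Process.Start(CreateSayStartInfo(textFile, aiffFile));
    process?.WaitForExit();
}
```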
It's not all that complicated to do, provided that you don't re-invent the wheel, but instead simply reuse existing technology (i.e. text to speech engines like festival), as well as OCR engines to process the PDF files.
The most complicated thing is probably working with different PDF layouts (columns, rows, embedded graphics, footnotes, URLs, etc.), which may obfuscate the text recognition process.
However, in general (if this is not supposed to be a learning experience), it is certainly easier to just resort to using existing software solutions:
Visual Text to Speech
Text to speech tools
zero 2000
