PostScript - Error when Using Ghostscript "pdfwrite" - c#

I want to preface this with the understanding that I am working with legacy code, so I am having to live with less-than-ideal situations and am doing some quirky stuff because of that. Until I can get approval to rewrite, I will have to make do.
Context
Here is my situation. The application is a "simple" one in that it reports off of a SQL database. For better or for worse, it builds its reports with PostScript. It makes use of Ghostscript DLLs, which are embedded in the application directory. Here is the kicker: it has been requested that I include SSIS reports whose output is already in PDF format. For compatibility's sake, I need to convert these PDFs into PostScript, even though in most situations they will be converted right back to PDF later on. I know this is most likely bad design, but there is certain functionality that requires this, and it just is what it is for the time being. I am using Ghostscript to handle the conversions.
Observed Behavior
The following behavior is what is observed once the PDF is converted to PS, passed through the application, and then converted back to PDF.
When using "-sDEVICE=pswrite", everything works except that the reports are produced with poor resolution no matter how I tweak the resolution option.
When using "-sDEVICE=ps2write", which I understand to be the currently recommended device, the PDF will not render back, and the following error is produced:
ERROR:
undefined
OFFENDING COMMAND:
U1!‘WVt92\a
STACK:
--nostringval--
20
The above error is only produced when using a report from a report server that is accessed via web client. I can confirm that the PDF returns successfully and is not corrupt.
When running local SSIS packages on the application the produced PDF is able to be handled successfully.
When the original PDF is converted to PS using ps2write, the header comments are populated as follows:
%!PS-Adobe-3.0
%%BoundingBox: 0 0 612 792
%%Creator: GPL Ghostscript 905 (ps2write)
%%LanguageLevel: 2
%%CreationDate: D:20171003154139-05'00'
%%Pages: 3
%%EndComments
pswrite produces:
%!PS-Adobe-3.0
%%Pages: (atend)
%%BoundingBox: 21 30 761 576
%%HiResBoundingBox: 21.600000 30.400000 760.566016 575.100000
%.....................................
%%Creator: GPL Ghostscript 905 (pswrite)
%%CreationDate: 2017/10/03 15:53:40
%%DocumentData: Clean7Bit
%%LanguageLevel: 2
%%EndComments
%%BeginProlog
Suspicion
I suspect that either the PDF follows a standard that cannot be converted to PostScript (for example, a newer PDF version that can't be handled), or it contains something incompatible, such as a font or image.
Is there any way to hunt this down for sure? Has anyone come across similar situations, and what was the solution? Any pointers as to what to look into or things to try?

To be honest, nobody is likely going to be able to help without seeing the original PDF file. Even a dummy file will be fine provided it exhibits the error.
However, the first thing that springs to mind is that you appear to be using Ghostscript 9.05. That is now 5 years old; the current release is (about to be) 9.22. There have been numerous fixes to ps2write in that time, at least 50, and the first thing I would suggest you do is upgrade and see if the problem goes away.
Secondly, you haven't been clear on why you need to convert the PDF files to PostScript. If all you are doing is feeding those back through Ghostscript along with some additional PostScript in order to convert the assemblage into PDF, you do not need to turn the PDF files into PostScript first. Ghostscript is entirely capable of taking a mixture of PDF and PostScript files, so you can simply inject the PDF in between the PostScript from your SQL output to produce a single combined PDF.
This has a number of advantages. First and most obviously, you shouldn't get your conversion problem. Secondly, any construct in the PDF file which cannot be represented in PostScript (e.g. transparency) means that the content will be rendered to an image and the PostScript will simply contain a big bitmap, just as it does with the pswrite output; avoiding the conversion means that won't happen. Thirdly, it will be quicker than first converting all the PDF files to PostScript.
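As a rough illustration of feeding mixed inputs straight to pdfwrite, here is a minimal Python sketch that only builds the argument list. The file names and the `gswin64c` executable name are assumptions for the example; since the application in the question loads the Ghostscript DLLs instead, the same arguments would presumably be handed to the API's init call rather than a command line.

```python
from typing import List, Sequence


def build_gs_merge_command(inputs: Sequence[str], output_pdf: str,
                           gs_exe: str = "gswin64c") -> List[str]:
    """Build an argument list asking Ghostscript to merge the inputs into one PDF.

    The inputs may be a mix of PostScript and PDF files; Ghostscript
    processes them in the order given.
    """
    if not inputs:
        raise ValueError("at least one input file is required")
    return [
        gs_exe,                        # assumed executable name (Windows console build)
        "-dBATCH",                     # exit after the last file instead of prompting
        "-dNOPAUSE",                   # do not pause between pages
        "-sDEVICE=pdfwrite",           # produce PDF output
        f"-sOutputFile={output_pdf}",
        *inputs,
    ]


# Hypothetical file names: PostScript from the report engine around an SSIS PDF.
cmd = build_gs_merge_command(
    ["report_head.ps", "ssis_report.pdf", "report_tail.ps"], "combined.pdf"
)
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` without any shell quoting.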
If you absolutely can't do that, then I would try the current code and see if it's better. If not, then you have found a bug, and I would suggest you report it at https://bugs.ghostscript.com. You will need to be able to supply an example file and command line, though.

Related

How to get RDLC report render PDF with ToUnicode entry for copypasting non-ansi text from the resulting PDF

Preface: we have reports generated in a C# application using the Microsoft.Reporting.WebForms.LocalReport class from an RDLC file. They are rendered in PDF format. The text in the report is mostly in Cyrillic. The problem is that it's impossible to copy it from the resulting PDF file; you get garbage.
The reason you get garbage is that the text is written using the "Identity-H" encoding for the font. It's not a real encoding, it's just an assignment of CIDs (basically, numbers) to the glyphs used in the PDF file. Adobe's PDF format has the "ToUnicode" entry for this reason: it is what should store the correspondence between CIDs and Unicode characters. If this information were present, it would be possible to copy/paste text from the file correctly.
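A quick way to check a given file for this is to scan its raw bytes for the two names involved. The Python sketch below is only a heuristic under a stated assumption: it sees font dictionaries only when they are stored uncompressed, so a PDF that packs its objects into compressed object streams would report zero for both counts.

```python
def font_mapping_report(pdf_bytes: bytes) -> dict:
    """Count Identity-H encodings and ToUnicode entries visible in the raw bytes."""
    return {
        "identity_h": pdf_bytes.count(b"/Identity-H"),
        "to_unicode": pdf_bytes.count(b"/ToUnicode"),
    }


# A hand-written fragment of a font dictionary, not taken from a real report.
sample = (b"<< /Type /Font /Subtype /Type0 /BaseFont /ABCDEF+Arial "
          b"/Encoding /Identity-H /DescendantFonts [5 0 R] >>")
report = font_mapping_report(sample)
print(report)
```

A non-zero `identity_h` together with a zero `to_unicode` matches the symptom described above.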
Obviously, this class doesn't write it. While researching the problem, I came across this page that recognizes the lack of copy/paste support and praises it finally being implemented... in SQL Server 2016 Reporting Services.
Well, we don't use the ServerReport class or SQL Server RS. Or SQL Server 2016. It would be a weird and far too large architectural change to move to it just because managers complain that they cannot copy text from PDFs.
So, is there a workaround? I doubt no one has faced this problem before. Maybe writing this ToUnicode entry was implemented in LocalReport in a newer version of .NET? Did someone write some sort of wrapper class that takes a byte array of the PDF and enhances it? Or maybe people render the report to DOCX and then use some other library to make a correct PDF out of that?

Embedding Pdf with OpenXml in PowerPoint fails for newer versions

I need to programmatically embed PDF documents in PowerPoint via OpenXml. According to "Embedding files into Open XML documents using C#", it is possible to create the necessary picture as well as the OLE object via the OLE32.StgCreateStorageEx methods.
Unfortunately this doesn't work with current versions of PDF. On a 64-bit OS, this seems to work only with Adobe version 9. Higher versions fail with error code 0x8000FFFF, which translates to "Catastrophic failure". I have confirmed this by testing. Even version 9 does not work reliably.
As a fallback, I used Google's pdfium to create a PNG from the first page. Unfortunately this only gets me halfway, as the resulting OLE object is very different from the original one. That does not hurt until the user tries to open the embedded document by double-clicking it within PowerPoint. Then an error message comes up, saying the application for the document cannot be found.
Here my questions:
Does anyone have information about how to improve the procedure so that it works even with newer versions?
Does anybody know what changes are necessary to incorporate an object similar to the one a PDF application creates itself?
Any hint is highly appreciated.
I finally got it working. Have a look here for an explanation.
Actually, there is only one difference compared to the code in "Embedding files into Open XML documents using C#". When calling StgCreateStorageEx, OLE32.STGFMT.STGFMT_DOCFILE has to be used instead of STGFMT_STORAGE.
That makes it work even with newer Adobe versions.

Edit VSAM file using C#

We are looking at different ways to update a VSAM file.
One of the things that we would like to do is to stop writing any new COBOL code.
We were wondering whether it is possible to download a VSAM file from the mainframe to a Windows server, use a C# program to edit it, and then transfer it back to the mainframe.
Has anyone tried this?
And yes we are moving away from the use of VSAM, but it takes time.
There are plenty of other options for updating a VSAM file other than a COBOL program.
Transferring the file there and back again seems a perverse and error-prone way to update a simple VSAM file. Most VSAM files contain a mix of character, integer, and packed-decimal data. C# has no built-in support for mainframe packed decimal, and any attempt to translate EBCDIC to ASCII during file transfer will corrupt the packed-decimal and binary values, so you will need to manipulate raw EBCDIC characters.
Obviously you can write a COBOL program! (seems perverse not to!)
You could also write a C, C++, Java or PL/1 program all of which run on the mainframe, all of which have VSAM support.
You could extract to a sequential file, update it with a script (zsh, Rexx, Perl, etc.), and reload. (Your site may have an add-in that allows direct update from Rexx.)
Most largish mainframe sites have an add on utility like File-Aid, Startool or Ditto which allow direct editing of VSAM files.
The MS way would be to use Biztalk Microsoft Host Integration Server to access VSAM.
There might be other non-MS drivers, which I am not aware of - maybe via DB2 Windows drivers.
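To make the packed-decimal point concrete, here is a small Python sketch of what handling a raw record off the mainframe involves. The record layout (a 5-byte EBCDIC name followed by a 3-byte packed-decimal amount with two implied decimal places) is invented for the example, and code page 037 is an assumption; a real file would follow its copybook and the site's code page.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field.

    Every nibble is one decimal digit, except the low nibble of the last
    byte, which is the sign: 0xB or 0xD means negative, 0xA/0xC/0xE/0xF
    mean positive or unsigned.
    """
    if not raw:
        raise ValueError("empty field")
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    if any(d > 9 for d in digits) or sign_nibble < 0x0A:
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)


# Hypothetical record: text must be decoded as EBCDIC, the amount left as raw bytes.
record = "SMITH".encode("cp037") + b"\x12\x34\x5C"
name = record[:5].decode("cp037")
amount = unpack_comp3(record[5:8], scale=2)
print(name, amount)
```

Only the character slice goes through the code-page conversion; running the whole record through an EBCDIC-to-ASCII translation would rewrite the amount bytes as if they were text.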

How to connect to a print driver in C#?

I have a task of converting a bunch of file types such as .pdf, .doc, .jpg, .xls, .txt, and .bmp into .png format. I found a print driver that does that.
But how do I connect to that printer driver in .NET? This will be a server-side component. I need to print documents into a folder using this print driver.
I am wondering how that can be done.
Thanks
Based on your updated comments, it sounds as if you are looking to convert a variety of image and document types to a single common image type. The process of taking one of the several possible source formats you mention and converting it to a bitmapped format such as .PNG is referred to as RENDERING or RASTERIZING. You want to take one of the input formats, render it to a bitmap representation, then write it to a file in .PNG format.
While it certainly might be possible to do this using a print driver, you would typically be relying on an installed application to which you can pass the source document for printing to the driver. For this to work, each of the source file types you want to handle this way needs an installed application that can take actions from the shell and do what you request. So, for example, if you want to do this with a .DOC file, you need Microsoft Word installed, as it does properly respond to the PRINT shell command.
However, the limitation of the shell-based method is that it is always going to print to the DEFAULT system printer, so your driver would need to be set up as the default printer on the machine where you run your process. You would therefore need to check whether each of the source types you want to handle has an installed or installable application that will let you print it using the shell and the PRINT action verb.
Reference URLs:
Windows Shell Verbs and File Associations
Creating Shortcut Menu Handlers
The problem with this technique is not all applications respond to the PRINT verb correctly or at all. This usually works with all the major Microsoft applications, but you should test any other document types you want to support before going much further with this technique.
This also raises other questions that this doesn't even begin to address, such as what to do about multi-page formats. You listed a few image types that are straightforward and can be converted to PNG files pretty directly. But how do you want to render a multi-page Word document into PNG format? Do you intend to produce only one very large PNG with all the pages one after another? Or do you intend to produce one PNG file per source document page? The print driver method might not give you very much control over that.
Depending on some of these details and just how much control and reliability you need in the process, you might want to consider a completely different route to your process. Maybe you should consider using tools/libraries that can read the source file formats you want to support and render them directly, after which you can save into your PNG files. One library I have used in the past that would seem to fit and allow you a high degree of control over the conversion (rendering/rasterization) process is LeadTools. It is a fairly pricey product, but my experience with it has been that it does support a wide variety of formats reliably.
LeadTools PDF and Document Readers SDK
There may be some other open source tools available that you could pull together to support this type of functionality, but I'm not familiar with any to point you to anything specific. But hopefully this helps give you some information to look at putting together a process that might be more reliable and give you greater control than trying to coerce a printer driver to do something you might not quite be able to make work reliably.
Server-side component implies something that doesn't have a human sitting at it (at least, not the human that is trying to use that printer). If this is the case then a print driver will not work - Print drivers that write their output to disk instead of a device always, in my experience, ask the user to select a place to save the file (present a Save As dialog).
To elaborate a little bit on what Boo mentioned :
Depending on the printer driver you are using, you may be able to tell it where to save your file.
The catch with using a printer is that, while you can normally print from any application to a .png file, the application itself has to know how to open and render the content of the original file (which is a separate matter from talking to the printer).
To continue down this path, you have to make sure your server component knows how to read and render content of each file type (.jpg, .pdf, .doc, etc.).
Assuming your server component knows how to render the content, the next step from here is to use the .NET Printing namespace to print your content to the .png printer.
For more details go to : http://msdn.microsoft.com/en-us/magazine/cc188767.aspx

How to convert a printer driver to a stand-alone console application which can generate a printer file containing the bytes to be sent to the printer?

I have a situation where the only way to generate a certain datafile is to print it manually to FILE: under Windows and save it in a file for further processing.
I would really like to have a small stand-alone program which embeds this binary printer driver so I can run it from a batch file and have it generate that binary file for me, as we can then fully automate the "save file in Visio, 'print' it and upload it to the final destination and trigger a remote test".
Is this possible with a suitable Windows SDK? I am a Java programmer, so I do not know Visual Studio and the possibilities with MSDN - yet! - but I'd appreciate pointers.
EDIT: I have the installation files for that printer driver, both 32 and 64 bit. Older versions may include a 16 bit driver.
EDIT: The "print to FILE:" functionality is just what was recommended by the documentation. I have played a little bit with using the LPR-protocol to see what it can do. I'd still prefer the "invoke small binary" approach.
The general problem you describe is difficult to solve. A printer driver mostly consists of some well-known components such as the Print Monitor, Print Processor, etc., which are well documented in the Windows Driver Kit: http://msdn.microsoft.com/en-us/library/ff560885%28v=VS.85%29.aspx. Some years ago I wrote a Print Monitor that ran for many years at a customer site, so I know exactly what I am writing about. A Print Monitor is nothing more than a DLL with well-documented functions. The same is true of most other printer components. Those DLLs are loaded and called by the Spooler. A modern printer driver has no components that run in kernel mode, so one can load most of the DLLs that make up a printer driver and call the corresponding functions.
You are interested in using one concrete printer driver, so the first thing to do is to examine how this driver is implemented. If you find out which component does the job you need, you will probably be able to load that DLL in your process and produce the output you need. Could you post a URL where I could download this driver?
UPDATED: I thought a little more about your requirements. It seems to me you can go the way suggested by the developer of the printer driver. If the driver can print to the local port FILE:, then it can print to any printer port. So you can take the source of the Port Monitor Server DLL from C:\WinDDK\7600.16385.1\src\print\monitors\localmon (see also http://msdn.microsoft.com/en-us/library/ff556478%28v=VS.85%29.aspx, http://msdn.microsoft.com/en-us/library/ff549405%28v=VS.85%29.aspx and http://msdn.microsoft.com/en-us/library/ff563806%28v=VS.85%29.aspx) and make a small modification. (It is a Windows 32/64-bit DLL, not a real driver.) Instead of saving the results to a file, you can dispatch them to your application. It will work 100% without any tricks. If you have problems understanding localmon, I can give you some tips. It is really not complex. The main change you have to make is to modify the LcmStartDocPort, LcmWritePort, LcmReadPort, and LcmEndDocPort functions in localmon.c. One simple thing that distinguishes a port monitor DLL from a typical DLL is that, instead of exporting all of its functions, it exports only InitializePrintMonitor2, which supplies pointers to all the other functions.
UPDATED 2: One more tip on using the "Local Port" monitor. If you go into the printer configuration, choose "Add Port...", select "Local Port", and click "New Port...", you can type any file name, such as "C:\temp\my.bin". Then everything you print through that printer will be written to this file without any user interaction. The name can be any Win32 file name (UNC names and named pipes are also allowed). This way you can realize some scenarios without any programming with the DDK.
UPDATED 3: I looked at the printer driver from different angles and took another look at the API in the DDK. I now want to recommend that you choose the easiest way, which is also the way that will be fully supported by the driver manufacturer. I suggest the following:
Install a printer with the driver you need and choose as the output port a Local Port with a fixed file name (see Update 2). Here I have named the destination file C:\TEMP\Output.afp. This gives you exactly the situation that the driver manufacturer recommends: a fixed file name behaves the same as the FILE: port. So if you print to the printer, you get the Output.afp file in the C:\TEMP directory. To detect the end of writing, you can use ReadDirectoryChangesW or the FindFirstChangeNotification / FindNextChangeNotification functions with dwNotifyFilter equal to FILE_NOTIFY_CHANGE_LAST_WRITE. You then receive a notification after the last write time of the file changes, which means after the end of writing, after the file is closed, and after the cache is sufficiently flushed. At that point Output.afp is not locked and you can safely read the results.
For printing simple documents you can use the WritePrinter function (see http://msdn.microsoft.com/en-us/library/dd162959%28VS.85%29.aspx and the remark in the documentation http://msdn.microsoft.com/en-us/library/dd145226%28VS.85%29.aspx). For writing complex files with bitmaps, color, and different fonts, you have to use the typical GDI API, as is usual in Windows (see http://msdn.microsoft.com/en-us/library/dd162865%28v=VS.85%29.aspx).
This solution does not look as spectacular as writing a printer driver component or simulating the spooler environment for the printer driver, but it will work, will work safely, and will be fully supported by the driver manufacturer.
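For a caller that cannot easily use the Win32 change-notification functions (the asker mentions working mostly in Java), a cruder substitute is to poll the output file until its size and modification time stop changing. The Python sketch below shows that polling approach, which is a different technique from the notifications recommended above; the function name and the timing defaults are assumptions.

```python
import os
import time


def wait_until_stable(path: str, quiet_seconds: float = 2.0,
                      poll_interval: float = 0.25, timeout: float = 60.0) -> bool:
    """Return True once the file exists and has not changed for quiet_seconds.

    Returns False if that does not happen before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    last_seen = None
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        try:
            info = os.stat(path)
            current = (info.st_size, info.st_mtime_ns)
        except FileNotFoundError:
            current = None
        now = time.monotonic()
        if current is None or current != last_seen:
            # Missing or still changing: restart the quiet period.
            last_seen = current
            stable_since = now
        elif now - stable_since >= quiet_seconds:
            return True
        time.sleep(poll_interval)
    return False


# Example call for the port file used in this answer:
# wait_until_stable(r"C:\TEMP\Output.afp")
```

Polling can be fooled by a driver that pauses mid-job for longer than the quiet period, which is one reason the notification-based approach is the more robust choice on Windows.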
(It's been 10 years since I did anything like this, but I don't think the overall concepts have changed all that much:)
What you want to do is implement a custom print processor. A print processor is the piece of code that takes the output that the printer driver generates and transports it to the output device. Print processors are implemented as regular user-mode DLLs. You should be able to find everything you need, including samples, in the Windows DDK.
A while ago we built a commercial application that captured print streams from any Windows application and converted the result to XML and TIFF images.
We did make a prototype with the DDK, but ended up buying an SDK for the print capturing.
The SDK was from BlackIce. Although it wasn't a free SDK, distribution of the runtimes was royalty-free.
Implementation was done with Visual C (unmanaged) and VB6.
The printer driver had to be installed on the server/PC that drove the printing process.
I remember that the tricky part was controlling the printer settings at runtime (keeping the TIFFs compressed, the output directory for the files, the paper size (A4 or Letter), and other settings defined in the DEVMODE print control structure).
UPDATE: (Your comment to @Oleg about MO:DCA P triggered my memory. Although it is not about a printer driver...)
For our commercial product, we also had to make a customization to convert MO:DCA (AFP) documents to TIFFs and XML.
This SDK had to be able to extract both images and ASCII text to enable later conversions.
Conversions were then made in batch from AFP documents in one folder to XML and TIFFs.
We chose to convert the AFP file after it had been printed (not during print).
The SDK is SnowBound RasterMaster and is available in different flavours (we used the Windows API with ActiveX, and I see now that it is available for Java).
So if your requirement is to convert an AFP document to something else (extract images and extract ASCII text), you could try out the software from SnowBound. Make sure you also get the Optional Feature to be able to extract ASCII text from the MO:DCA documents.
This software SDK is more expensive, but it did the job.
They offer a trial version here.
At the moment there is one missing link for me in your explanation, so let me rephrase what I understood:
You have a special printer driver on your Windows system that is configured to print to a file.
You would like to have a simple batch program that can hand something to this printer driver to output a binary file.
You have a toolchain where this file can be further processed.
Now my missing part is: what do you want to give to your little batch script so that it produces your binary file? Do you have a Visio file which should be automatically printed through this driver?
If yes, you should take a look at this little batch script. It is able to take any file with a registered file extension and send it to the default printer with its default settings. By using these settings you can change the printer settings within your Windows system from a batch file, making your special driver the default one and putting the output into a file.
So if I understood you correctly, I don't have the complete solution, but I think this is a good starting point for accomplishing your task.
Update
OK, after reading your comment, I fully understand what you would like to achieve. To get this to work, you have to follow Per Larsen's advice and write your own driver with the Windows DDK (or, to be more precise, the Windows Driver Kit [WDK]) that encapsulates the already existing driver.
In short: your driver registers as a new printer driver. When it is called, it gets all the raw bytes from the application, passes them to the driver that can generate your data file, gets the output back from that driver, and does with it whatever you like.
Some samples to get started can also be found in MSDN as overview or more precisely here.
But just to say it right beforehand: This is not an easy or simple task and the effort is quite high. Maybe trying to manipulate the driver settings of your special driver through the already given batches or a simple application (written with AutoIt) can also solve your problem, by just interacting (automatically) with the settings of the driver.
I can live with "When a user prints any file to this particular Windows printer, then automatically capture the bytes that would have been sent to the printer".
In that case, you want something like RedMon, which redirects the bytes which would have gone to the printer into the input for another program.
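With that kind of redirection, the receiving program only has to read the job from its standard input and store it somewhere. A minimal Python sketch of such a receiver follows; the directory name, file suffix, and function name are all hypothetical, and it assumes the redirecting port has been configured to launch the script once per print job.

```python
import uuid
from pathlib import Path
from typing import BinaryIO


def capture_job(stream: BinaryIO, out_dir: Path, suffix: str = ".prn") -> Path:
    """Copy one print job from a binary stream into a uniquely named file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"job-{uuid.uuid4().hex}{suffix}"
    with open(target, "wb") as out:
        while True:
            chunk = stream.read(64 * 1024)  # read in blocks, jobs can be large
            if not chunk:
                break
            out.write(chunk)
    return target


# When launched per job, the script would read the job from stdin, e.g.:
# import sys
# capture_job(sys.stdin.buffer, Path("captured_jobs"))
```

Writing each job to its own file avoids having two jobs overwrite each other, which a single fixed output name cannot guarantee.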
Just to reiterate, probably the simplest capture method is using a new Local Port configured as a filename. You can monitor the output file as previously discussed to catch the output.
Otherwise, you want to write your own port monitor - not a printer driver or a print processor. All a port monitor does is receive the already rendered data from the printer driver, and sends it to the output device. So writing your own port monitor will allow you to go in and change the output port associated with the existing printer driver to be your own output port, and your port monitor can simply write the data to a file, probably one with a unique filename in a dedicated directory.
Printer drivers are far too complicated for what you want to do, and while a print processor could also capture the output data, you'd probably get entangled in some scantily documented system issues you won't want to have to figure out.
The LocalMon sample in the Windows Driver Kit is THE starting point for writing a port monitor. However, it manages all the system local ports and is quite a bit more complex than you need. In fact, much of it is just likely to confuse you. I'd recommend you start with LocalMon, and compare it to the Redmon source, which is much simpler because it manages a dedicated port. Beware that the Redmon source was taken from localmon long ago and appears to have a few bugs, so use Redmon as a reference and pare back the LocalMon code to what's needed to just write the output to a file.
You don't embed drivers in executables; drivers are for the operating system to communicate with the hardware.
You print via the Operating system.
Your 'batch' needs to select the correct printer, and print...
