Get SpotColor Names

Get SpotColor Names - c#

Is there a way to get the names of the spot colors used in a PDF?
I'm using C#. Maybe there is a workaround with Ghostscript?
[Image: color separation]
I searched the Ghostscript docs but didn't find anything.
I also tried iTextSharp.

The output of -dPDFINFO is determined by the file contents, so start with a valid empty file. Using the Windows build of version 10.00 (gswin64c), the command
gswin64c -dPDFINFO blank.pdf
should produce output like this (note: this is a console copy):
GPL Ghostscript 10.0.0 (2022-09-21)
Copyright (C) 2022 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
File has 1 page.
Producer: GPL Ghostscript 10.00.0
CreationDate: D:20230115003354Z00'00'
ModDate: D:20230115003354Z00'00'
Processing pages 1 through 1.
Page 1 MediaBox: [0 0 595 842]
C:\Apps\PDF\GS\gs1000w64\bin>
To suppress the copyright banner, use -q.
To save the output to a file, use stream 2 (stderr) redirection:
gswin64c -q -dBATCH -dPDFINFO blank.pdf 2>out.txt
To filter the output, use pipe filters.
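The same steps can be scripted from C#. This is a minimal sketch, assuming .NET 5 or later and gswin64c on the PATH; the class and method names are made up for illustration. It runs Ghostscript with -q -dBATCH -dPDFINFO, collects stderr (the stream the 2> redirection above writes to the file), and then keeps only the lines containing a search term, ignoring case, much like piping through find /i.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class GhostscriptInfo
{
    // Runs "gswin64c -q -dBATCH -dPDFINFO <pdf>" and returns what Ghostscript wrote to stderr,
    // which is the stream that "2>out.txt" captures on the command line.
    public static string RunPdfInfo(string pdfPath, string gsExe = "gswin64c")
    {
        var psi = new ProcessStartInfo(gsExe)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true,
        };
        // ArgumentList quotes each entry, so paths containing spaces are safe.
        foreach (var arg in new[] { "-q", "-dBATCH", "-dPDFINFO", pdfPath })
            psi.ArgumentList.Add(arg);

        using var process = Process.Start(psi)!;
        // Drain stdout in the background so neither pipe can fill up and block Ghostscript.
        var stdoutTask = process.StandardOutput.ReadToEndAsync();
        string report = process.StandardError.ReadToEnd();
        stdoutTask.Wait();
        process.WaitForExit();
        return report;
    }

    // Equivalent of piping through `find /i "term"`: keep the lines that contain term, ignoring case.
    public static List<string> FindLines(string text, string term) =>
        text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Where(line => line.Contains(term, StringComparison.OrdinalIgnoreCase))
            .ToList();
}
```

For example, GhostscriptInfo.FindLines(GhostscriptInfo.RunPdfInfo("blank.pdf"), "mediabox") would keep only the MediaBox line of the report shown above.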
Does it have spot colours
What are they
As long as no open standard for spot colours exists, TCPDF users will have to buy a colour book by one of the colour manufacturers and insert the values and names of spot colours directly
So here the names are given on an RGB scale:
- Dark Green is 0,71,57
- Light Yellow is 255,246,142
- Black is 39,36,37
- Red is 166,40,52
- Green is 0,132,75
- Blue is 0,97,157
- Yellow is 255,202,9
But black is not full black; is there a better way? Yes, of course:
type example_037.pdf | find /i "/separation"
Now we can see the CMYK spots.
In this simplified case the CMYK values are shown after each name, for example Full Black = [0.000000 0.000000 0.000000 1.000000].
Note that the separation is often encoded inside compressed PDF data, so you need to decompress the data first to do the search. There are several tools that can do the decompression; some common cross-platform ones are qpdf (FOSS), mutool (a partner to Ghostscript) and PDFtk, amongst others.
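The type | find /i "/separation" search can be sketched in C# as well. This minimal version, with hypothetical class and method names, assumes .NET 5 or later and a PDF that has already been decompressed (for example with qpdf --qdf --object-streams=disable in.pdf out.pdf), because it only scans the raw bytes for /Separation followed by a name. It decodes the #xx escapes that PDF uses in names, so /PANTONE#20185#20C becomes "PANTONE 185 C". It is not a real PDF parser: spot colours inside DeviceN colour spaces are missed, the special names All and None may show up, and name bytes are mapped as Latin-1.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SeparationScanner
{
    // "/Separation", optional whitespace, then a PDF name token such as /PANTONE#20185#20C.
    private static readonly Regex SeparationName = new Regex(
        @"/Separation\s*/([^\s/\[\]<>(){}%]+)", RegexOptions.IgnoreCase);

    // Returns the distinct spot colour names found in the raw (uncompressed) bytes of a PDF.
    public static List<string> FindSpotNames(byte[] pdfBytes)
    {
        // Latin-1 maps every byte to exactly one char, so binary data cannot break the decoding.
        string raw = Encoding.Latin1.GetString(pdfBytes);
        var names = new List<string>();
        foreach (Match m in SeparationName.Matches(raw))
        {
            string name = DecodeName(m.Groups[1].Value);
            if (!names.Contains(name)) names.Add(name);
        }
        return names;
    }

    // PDF names escape special characters as #xx (two hex digits); "#20" is a space.
    private static string DecodeName(string token)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < token.Length; i++)
        {
            if (token[i] == '#' && i + 2 < token.Length &&
                int.TryParse(token.Substring(i + 1, 2), NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture, out int code))
            {
                sb.Append((char)code);
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                sb.Append(token[i]);
            }
        }
        return sb.ToString();
    }
}
```

Typical use would be SeparationScanner.FindSpotNames(File.ReadAllBytes("out.pdf")) on the decompressed copy.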

Yes, you can extract the spot color names used in a PDF using Ghostscript. Here is an example of how to do it with C# and Ghostscript:
using System;
using System.Diagnostics;
using System.IO;
namespace GetSpotColorsFromPDF
{
class Program
{
static void Main(string[] args)
{
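// Verbatim strings (@"...") keep the backslashes in Windows paths literal
var filePath = @"path\to\pdf";
var outputFile = @"path\to\output.txt";
var gsProcess = new Process
{
StartInfo = new ProcessStartInfo
{
FileName = @"path\to\gswin64c.exe",
// -dBATCH makes Ghostscript exit when done instead of waiting at its interactive prompt;
// the paths are quoted so that names containing spaces survive
Arguments = $"-dNODISPLAY -dBATCH -dDumpSpotColors -sOutputFile=\"{outputFile}\" \"{filePath}\"",
RedirectStandardOutput = true,
UseShellExecute = false,
CreateNoWindow = true,
}
};
gsProcess.Start();
// Drain the redirected stdout before waiting, otherwise a full pipe can block Ghostscript
gsProcess.StandardOutput.ReadToEnd();
gsProcess.WaitForExit();
var output = File.ReadAllText(outputFile);
Console.WriteLine(output);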
}
}
}
Note that you need Ghostscript installed on your machine, and you must specify the path to the gswin64c.exe executable in the code. The -dNODISPLAY and -dDumpSpotColors arguments are intended to extract the spot color information from the PDF, and the -sOutputFile argument specifies the file the extracted information is written to (check that your Ghostscript version documents -dDumpSpotColors before relying on it). That file is then read into a string and printed to the console.

cpdf -list-spot-colors in.pdf will list them, one per line, to standard output.
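Calling cpdf from C# and splitting its standard output gives the names as a list. A minimal sketch, assuming .NET 5 or later and that the cpdf executable is on the PATH; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class CpdfSpots
{
    // Runs "cpdf -list-spot-colors <pdf>" and returns the spot colour names it prints.
    public static List<string> ListSpotColors(string pdfPath, string cpdfExe = "cpdf")
    {
        var psi = new ProcessStartInfo(cpdfExe)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,
            CreateNoWindow = true,
        };
        psi.ArgumentList.Add("-list-spot-colors");
        psi.ArgumentList.Add(pdfPath);

        using var process = Process.Start(psi)!;
        string stdout = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        return ParseLines(stdout);
    }

    // One name per line: strip line endings and trailing spaces, drop blank lines.
    public static List<string> ParseLines(string stdout) =>
        stdout.Split('\n')
              .Select(line => line.TrimEnd('\r', ' '))
              .Where(line => line.Length > 0)
              .ToList();
}
```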

Related

How to use OCR to read text from image

I'm trying to use IronOCR to read the text from an image. I manually downloaded their DLL, included it in my project and followed the example they provide on their website. However, no text is returned at all, even when I try a different file or their sample image. Is there any step that I'm missing?
Here is the image I'm using to try it: myImage

Try the following code with the latest version of IronOCR (currently 2021.2.1), which was updated to use Tesseract 4 and 5. It returned a perfect result on your example image.
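using System;
using IronOcr;
var Ocr = new IronTesseract();
using (var Input = new OcrInput(@"F:\input_image.png"))
{
// Straighten a slightly rotated image before recognition
Input.Deskew();
var Result = Ocr.Read(Input);
// The recognized text is available as Result.Text
Console.WriteLine(Result.Text);
}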
The Deskew filter gave the best results, but there are other filters that may be useful, listed here:
https://ironsoftware.com/csharp/ocr/examples/ocr-image-filters-for-net-tesseract/

Printer settings into PostScript or PCL file

I need:
Print a large number of PDFs in duplex, using a specific printer feeder
I have:
printing using ghostscript with 'mswinpr2' device
using (GhostscriptProcessor processor = new GhostscriptProcessor(new GhostscriptVersionInfo("gsdll32.dll")))
{
List<string> switches = new List<string>();
switches.Add("-dPrinted");
switches.Add("-dBATCH");
switches.Add("-dNOPAUSE");
switches.Add("-dNumCopies=1");
switches.Add("-dPDFFitPage");
switches.Add("-dFIXEDMEDIA");
switches.Add("-dNoCancel");
switches.Add("-sFONTPATH=C:\\Windows\\Fonts");
switches.Add("-sDEVICE=mswinpr2");
switches.Add($"-sOutputFile=%printer%{settings.PrinterName}");
switches.Add("D:\\11.pdf");
processor.StartProcessing(switches.ToArray(), null);
}
Problem:
One job in the print queue consisting of 2 pages takes more than 50 MB, and I have more than 1500 PDFs with 1,000,000 pages.
What I plan to do:
Convert the PDFs to PCL or PS, edit these files and somehow pass the settings (duplex and specific feeder), then send the edited PCL or PS file as RAW data to the printer.
Question:
How can I pass the settings to PCL or PS?

Since PDF files can't contain device-specific information, you clearly don't need to pick such information from the input, which makes life simpler.
Ghostscript's ps2write device is capable of inserting document wide or page specific PostScript into its output. So you can 'pass the settings' using that.
For PCL you (probably) need to write some device-specific PJL and insert it into the PCL output. However, PCL is nowhere near as uniform as PostScript, so it'll be up to you to find out what needs to be prefixed to the file.
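A minimal sketch of the PJL approach in C#: it wraps an existing PCL job in a Universal Exit Language (UEL) sequence and a PJL header before the file is sent to the printer as RAW data. The class name is hypothetical, and the only setting shown is the common @PJL SET DUPLEX command; tray or feeder selection is device-specific, so those commands have to come from your printer's PJL documentation.

```csharp
using System.IO;
using System.Text;

public static class PjlWrapper
{
    // Universal Exit Language sequence: ESC %-12345X
    private const string Uel = "\u001B%-12345X";

    // Prefixes a PCL job with a PJL header and terminates it with UEL.
    // The PJL lines are examples only; check which commands your printer honours.
    public static byte[] Wrap(byte[] pcl, bool duplex)
    {
        var header = new StringBuilder();
        header.Append(Uel).Append("@PJL\r\n");
        header.Append(duplex ? "@PJL SET DUPLEX=ON\r\n" : "@PJL SET DUPLEX=OFF\r\n");
        // Hand the following bytes to the PCL interpreter.
        header.Append("@PJL ENTER LANGUAGE=PCL\r\n");

        using var ms = new MemoryStream();
        byte[] head = Encoding.ASCII.GetBytes(header.ToString());
        ms.Write(head, 0, head.Length);
        ms.Write(pcl, 0, pcl.Length);
        byte[] tail = Encoding.ASCII.GetBytes(Uel);
        ms.Write(tail, 0, tail.Length);
        return ms.ToArray();
    }
}
```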
[EDIT]
You don't use -sPSDocOptions; PSDocOptions is a distiller parameter, so you need:
gswin64c.exe -q -dSAFER -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=D:\out.ps -c "<</PSDocOptions (<</Duplex true /NumCopies 10>> setpagedevice)>> setdistillerparams" -f D:\0.pdf
Notice that you don't need -f (as you have in your command line) unless you have first used -c. The -f switch is used as a terminator for -c.
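The same ps2write call can be driven from C#. This is a minimal sketch, assuming .NET 5 or later for ProcessStartInfo.ArgumentList; the class and method names are made up for illustration. Passing each switch as its own argument means the -c PostScript string needs no extra shell quoting, though any unbalanced parentheses inside it would still need PostScript escaping.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class Ps2WriteJob
{
    // Mirrors: -q -dSAFER -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=<ps>
    //          -c "<</PSDocOptions (...)>> setdistillerparams" -f <pdf>
    public static List<string> BuildArgs(string inputPdf, string outputPs, string psDocOptions) =>
        new List<string>
        {
            "-q", "-dSAFER", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=ps2write",
            "-sOutputFile=" + outputPs,
            "-c", $"<</PSDocOptions ({psDocOptions})>> setdistillerparams",
            "-f", inputPdf, // -f ends the -c PostScript and names the input file
        };

    public static int Run(string gsExe, IEnumerable<string> args)
    {
        var psi = new ProcessStartInfo(gsExe) { UseShellExecute = false };
        // Each entry becomes one argv element for Ghostscript.
        foreach (var a in args) psi.ArgumentList.Add(a);
        using var p = Process.Start(psi)!;
        p.WaitForExit();
        return p.ExitCode;
    }
}
```

For example, BuildArgs(@"D:\0.pdf", @"D:\out.ps", "<</Duplex true /NumCopies 10>> setpagedevice") reproduces the command line above, and the list can then be passed to Run together with the path to gswin64c.exe.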

Print PDF using GhostScript

I need your help with the following issue, which has been bothering me for a while. We have a small C# utility that prints a given PDF using Ghostscript. It prints, but fails to retain the page formatting. However, pages print as expected when I use Adobe Acrobat in place of Ghostscript. So I presume I am making some obvious mistake in the Ghostscript command-line arguments.
Background
The following is the core C# logic, which prints a given PDF file whose pages vary in style. The PDF file has pages:
with inconsistent font style and colour
where some pages have a normal font size while others are printed extra small
where some pages have the recommended margin but others have a very small margin
where some pages are in colour and the rest in grey
where some pages are landscape while others are portrait
In short, the PDF I am trying to print is a consolidation (individual PDFs joined into one large PDF) of numerous small PDF documents with varying font styles, sizes and margins.
Issue
The following logic uses Ghostscript (v9.02) to print the PDF file. Although it prints any given PDF, it fails to retain the page formatting, including header, footer, font size, margin and orientation (my PDF file has both landscape and portrait pages).
Interestingly, if I use Acrobat Reader to print the same PDF, it prints as expected, with all page-level formatting.
PDF specimen: First section, Second section
void PrintDocument()
{
var psInfo = new ProcessStartInfo();
psInfo.Arguments =
String.Format(
" -dPrinted -dBATCH -dNOPAUSE -dNOSAFER -q -dNumCopies=1 -sDEVICE=ljet4 -sOutputFile=\"\\\\spool\\{0}\" \"{1}\"",
GetDefaultPrinter(), @"C:\PDFOutput\test.pdf");
psInfo.FileName = @"C:\Program Files\gs\gs9.10\bin\gswin64c.exe";
psInfo.UseShellExecute = false;
using (var process= Process.Start(psInfo))
{
process.WaitForExit();
}
}

I think you asked this question before, and it's also quite clear from your code sample that you are using GSView, not Ghostscript.
Now, while GSView does use Ghostscript to do the heavy lifting, it's a concern that you are unable to differentiate between these two applications.
You still haven't provided an example PDF file to look at, nor a command line, though you have now at least managed to quote the Ghostscript version. You also need to give a command line (no, I'm not prepared to assemble it from reading your code), and you should try this from the command line, not inside your own application, in order to show that it's not your application making the error.
You should consider upgrading Ghostscript to the current version.
Note that a quick perusal of your code indicates that you are specifying a number of command line options (eg -dPDFSETTINGS) which are only appropriate for converting a file into PDF, not for any other purpose (such as printing).
So, as I said before, provide a specimen file to reproduce the problem, and a command line (preferably a Ghostscript command line) which causes the problem. Knowing which printer you are using would probably be useful too, although it's highly unlikely I will have a duplicate to test on.

Answer - UPDATE 16/12/2013
I managed to get it fixed and wanted to share the working solution in case it helps others. Special thanks to 'KenS', who spent a lot of time guiding me.
To summarize, I finally decided to use GSView along with Ghostscript to print PDFs, bypassing Adobe. The core logic is given below:
//PrintParamter is a custom data structure to capture file related info
private void PrintDocument(PrintParamter fs, string printerName = null)
{
if (!File.Exists(fs.FullyQualifiedName)) return;
var filename = fs.FullyQualifiedName ?? string.Empty;
printerName = printerName ?? GetDefaultPrinter(); //get your printer here
var processArgs = string.Format("-dAutoRotatePages=/All -dNOPAUSE -dBATCH -sPAPERSIZE=a4 -dFIXEDMEDIA -dPDFFitPage -dEmbedAllFonts=true -dSubsetFonts=true -dPDFSETTINGS=/prepress -dNOPLATFONTS -sFONTPATH=\"C:\\Program Files\\gs\\gs9.10\\fonts\" -noquery -dNumCopies=1 -all -colour -printer \"{0}\" \"{1}\"", printerName, filename);
try
{
var gsProcessInfo = new ProcessStartInfo
{
WindowStyle = ProcessWindowStyle.Hidden,
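FileName = gsViewEXEInstallationLocation, // GSView's print executable (the -printer/-all options suggest gsprint.exe); defined elsewhere
Arguments = processArgs
};
using (var gsProcess = Process.Start(gsProcessInfo))
{
gsProcess.WaitForExit();
}
}
catch (Exception ex)
{
// The original snippet ends inside the try block; handle or log failures as appropriate
Console.Error.WriteLine(ex.Message);
}
}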

You could use GSPRINT.
I've managed to make it work by copying only gsprint.exe, gswin64c.exe and gsdll64.dll into a directory and launching it from there.
Sample code:
// This uses gsprint (mind the paths)
private const string gsPrintExecutable = @"C:\gs\gsprint.exe";
private const string gsExecutable = @"C:\gs\gswin64c.exe";
string pdfPath = @"C:\myShinyPDF.PDF";
string printerName = "MY PRINTER";
string processArgs = string.Format("-ghostscript \"{0}\" -copies=1 -all -printer \"{1}\" \"{2}\"", gsExecutable, printerName, pdfPath );
var gsProcessInfo = new ProcessStartInfo
{
WindowStyle = ProcessWindowStyle.Hidden,
FileName = gsPrintExecutable ,
Arguments = processArgs
};
using (var gsProcess = Process.Start(gsProcessInfo))
{
gsProcess.WaitForExit();
}

Try the following command within Process.Start():
gswin32c.exe -sDEVICE=mswinpr2 -dBATCH -dNOPAUSE -dNOPROMPT -dNoCancel -dPDFFitPage -sOutputFile="%printer%\\[printer_servername]\[printername]" "[filepath_to_pdf]"
It should look like this in C#:
string gsArgs = "-sDEVICE=mswinpr2 -dBATCH -dNOPAUSE -dNOPROMPT -dNoCancel -dPDFFitPage -sOutputFile=\"%printer%\\\\[printer_servername]\\[printername]\" \"[filepath_to_pdf]\"";
// Start Ghostscript directly: CMD.exe without a /C switch would open a shell instead of running the command
System.Diagnostics.Process.Start("gswin32c.exe", gsArgs);
This will place the specified PDF file into the print queue.
Note: your gswin32c.exe must be in the same directory as your C# program. I haven't tested this code.

Convert PDF to JPG or PNG using C# or Command Line [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I need to convert a PDF file to images. I used for testing purposes "Total PDF Converter" which offers a command line, but it's shareware and I need to find a free alternative.
Does anyone know such a tool, or maybe even a free C# library?

The convert tool (or magick since version 7) from the ImageMagick bundle can do this (and a whole lot more).
In its simplest form, it's just
convert myfile.pdf myfile.png
or
magick myfile.pdf myfile.png

As a Ghostscript answer is missing, and there is no hint yet about multipage PDF export, I think adding another variant is OK.
gs -dBATCH -dNOPAUSE -sDEVICE=pnggray -r300 -dUseCropBox -sOutputFile=item-%03d.png examples.pdf
Options description:
-dBATCH and -dNOPAUSE just tell gs to run in batch mode, which means, more or less, it will not ask any questions. These parameters are also important if you want to run the command in a bash script.
-sDEVICE tells gs what output format to produce. pnggray is for grayscale, png16m for 24-bit RGB color. If you insist on creating JPEGs, use -sDEVICE=jpeg to produce color JPEG files. Use the -dJPEGQ=N parameter (N is an integer from 0 to 100, default 75) to control the JPEG quality.
-r300 sets the scan resolution to 300 dpi. If you prefer smaller output sizes, use -r70, or if your input PDF has a high resolution, use -r600. If you have a PDF with 300 dpi and specify -r600, your images will be upscaled.
-dUseCropBox tells gs to use the CropBox if one is defined. A CropBox specifies an area of interest on a page. If you have a PDF with a large white margin and you don't want this margin in your output, this option might help.
-sOutputFile defines the name(s) of the output file(s). The %03d part tells gs to include a counter for multiple files. A two-page PDF would result in two files named item-001.png and item-002.png.
The last (unnamed) parameter is the input file.
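If you drive this command from C#, a small helper can build the argument list and predict the file names the %03d counter produces. This is a sketch with made-up names; it assumes gs runs with the output directory as its working directory and that the counter starts at 1, as in the item-001.png example above. Passing the arguments through ProcessStartInfo.ArgumentList avoids any shell expansion of %, whereas inside a Windows .bat file the % has to be doubled (item-%%03d.png).

```csharp
using System.Collections.Generic;
using System.IO;

public static class GsPngExport
{
    // Mirrors: gs -dBATCH -dNOPAUSE -sDEVICE=pnggray -r300 -dUseCropBox -sOutputFile=item-%03d.png examples.pdf
    public static List<string> BuildArgs(string inputPdf, string device = "pnggray", int dpi = 300) =>
        new List<string>
        {
            "-dBATCH", "-dNOPAUSE",
            "-sDEVICE=" + device,
            "-r" + dpi,
            "-dUseCropBox",
            "-sOutputFile=item-%03d.png",
            inputPdf,
        };

    // The %03d counter is zero-padded to three digits: item-001.png, item-002.png, ...
    public static IEnumerable<string> ExpectedFiles(string outputDir, int pageCount)
    {
        for (int page = 1; page <= pageCount; page++)
            yield return Path.Combine(outputDir, $"item-{page:D3}.png");
    }
}
```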
Availability:
ImageMagick's convert command uses the gs command internally for PDF input, so if you can convert a PDF with ImageMagick, you already have gs installed.
Install ghostscript:
RHEL:
yum install ghostscript
SLES:
zypper install ghostscript
Debian/Ubuntu:
sudo apt-get install ghostscript
Windows:
You can find Windows binaries under http://www.ghostscript.com/download/gsdnld.html

I have found this solution, which worked for me: https://github.com/jhabjan/Ghostscript.NET. It is also available as a NuGet package.
Here is the sample code for converting all pdf pages into png images:
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using Ghostscript.NET;
using Ghostscript.NET.Rasterizer;

private static void Test()
{
// Load the native Ghostscript DLL shipped next to the executable
var localGhostscriptDll = Path.Combine(Environment.CurrentDirectory, "gsdll64.dll");
var localDllInfo = new GhostscriptVersionInfo(localGhostscriptDll);
int desired_x_dpi = 96;
int desired_y_dpi = 96;
string inputPdfPath = "test.pdf";
string outputPath = Environment.CurrentDirectory;
GhostscriptRasterizer _rasterizer = new GhostscriptRasterizer();
_rasterizer.Open(inputPdfPath, localDllInfo, false);
// Page numbers are 1-based
for (int pageNumber = 1; pageNumber <= _rasterizer.PageCount; pageNumber++)
{
string pageFilePath = Path.Combine(outputPath, "Page-" + pageNumber.ToString() + ".png");
// Dispose each page image to release its GDI+ resources
using (Image img = _rasterizer.GetPage(desired_x_dpi, desired_y_dpi, pageNumber))
{
img.Save(pageFilePath, ImageFormat.Png);
}
}
_rasterizer.Close();
}

@Thomas's answer didn't work in my case.
I guess it only works if you have images in your PDF.
In my case what worked was pdftoppm (source from https://askubuntu.com/a/50180/37527):
pdftoppm input.pdf outputname -png
This will output each page in the PDF using the format outputname-01.png, with 01 being the index of the page.
Converting a single page of the PDF
pdftoppm input.pdf outputname -png -f {page} -singlefile
Change {page} to the page number. It's indexed at 1, so -f 1 would be the first page.
Specifying the converted image's resolution
The default resolution for this command is 150 DPI. Increasing it will result in both a larger file size and more detail.
To increase the resolution of the converted PDF, add the options -rx {resolution} and -ry {resolution}. For example:
pdftoppm input.pdf outputname -png -rx 300 -ry 300

You may want to check this free solution
http://www.codeproject.com/Articles/32274/How-To-Convert-PDF-to-Image-Using-Ghostscript-API
It easily converts PDF to images (single file or multiple files),
is open source, and uses Ghostscript (free download).
Example of its use:
var converter = new PDFConverter();
converter.JPEGQuality = 90;
converter.OutputFormat = "jpg";
string output = "output.jpg";
converter.Convert("input.pdf", output);

You should use iTextSharp. It's a port of an open-source Java project for manipulating PDFs.
http://sourceforge.net/projects/itextsharp/

The 2JPEG command-line tool can do it, for example:
2jpeg.exe -src "C:\In\*.pdf" -dst "C:\Out"

How do I get a Video Thumbnail in .Net?

I'm looking to implement a function that retrieves a single frame from an input video, so I can use it as a thumbnail.
Something along these lines should work:
// filename examples: "test.avi", "test.dvr-ms"
// position is from 0 to 100 percent (0.0 to 1.0)
// returns a bitmap
byte[] GetVideoThumbnail(string filename, float position)
{
}
Does anyone know how to do this in .Net 3.0?
The correct solution will be the "best" implementation of this function.
Bonus points for avoiding selection of blank frames.

I ended up rolling my own standalone class (with the single method I described); the source can be viewed here. Media Browser is GPL, but I am happy for the code I wrote for that file to be public domain. Keep in mind it uses interop from the DirectShow.NET project, so you will have to clear that portion of the code with them.
This class will not work for DVR-MS files; you need to inject a DirectShow filter for those.

This project will do the trick for AVIs: http://www.codeproject.com/KB/audio-video/avifilewrapper.aspx
For other formats, you might look into DirectShow. There are a few projects that might help:
http://sourceforge.net/projects/directshownet/
http://code.google.com/p/slimdx/

1- Get the latest version of ffmpeg.exe from: http://ffmpeg.arrozcru.org/builds/
2- Extract the archive and copy ffmpeg.exe to your website
3- Use this code:
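// Requires: using System.Diagnostics;
Process ffmpeg;
string video;
string thumb;
video = Server.MapPath("first.avi");
thumb = Server.MapPath("frame.jpg");
ffmpeg = new Process();
// Seek to 7 s, grab one frame and encode it as JPEG; paths are quoted in case they contain spaces
ffmpeg.StartInfo.Arguments = " -i \"" + video + "\" -ss 00:00:07 -vframes 1 -f image2 -vcodec mjpeg \"" + thumb + "\"";
ffmpeg.StartInfo.FileName = Server.MapPath("ffmpeg.exe");
ffmpeg.StartInfo.UseShellExecute = false;
ffmpeg.Start();
// Wait so the thumbnail exists before it is used
ffmpeg.WaitForExit();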
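The question asks for a position between 0.0 and 1.0 rather than a fixed time such as 00:00:07. A minimal sketch of turning that fraction into an ffmpeg -ss value, assuming the clip duration is already known (for example from ffprobe, which this sketch does not call); the class name is hypothetical, and the hh specifier only covers clips shorter than 24 hours.

```csharp
using System;
using System.Globalization;

public static class ThumbnailTime
{
    // Converts a fraction of the clip's duration (0.0 to 1.0) into an -ss value formatted hh:mm:ss.fff.
    public static string ToSeekArgument(TimeSpan duration, double position)
    {
        if (position < 0.0 || position > 1.0)
            throw new ArgumentOutOfRangeException(nameof(position), "Expected a value from 0.0 to 1.0.");
        var offset = TimeSpan.FromTicks((long)(duration.Ticks * position));
        return offset.ToString(@"hh\:mm\:ss\.fff", CultureInfo.InvariantCulture);
    }
}
```

The result can replace the hard-coded 00:00:07 in the arguments above, for example "-ss " + ThumbnailTime.ToSeekArgument(duration, 0.1).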

There are some libraries at www.mitov.com that may help. It's a generic wrapper for DirectShow functionality, and I think one of the demos shows how to take a frame from a video file.

This is also worth a look:
http://www.codeproject.com/Articles/13237/Extract-Frames-from-Video-Files
