How to write converted PDFs to memory and view them from there? - C#

Right now I am using Ghostscript in Unity to convert PDFs to JPGs and view them in my project.
Currently it works like this:
- PDFs are converted into multiple JPEGs (one for each page)
- The converted JPEGs are written to disk
- They are then read in as bytes into a Texture2D
- That Texture2D is assigned to a GameObject's RawImage component
This works perfectly in Unity, but (now comes the hiccup) my project is intended to run on the Microsoft HoloLens.
The HoloLens runs on the Windows 10 API, but in a limited capacity.
The issue arises when I try to convert PDFs and view them on the HoloLens. Quite simply, the HoloLens cannot create or delete files outside of its known folders (Pictures, Documents, etc.).
My idea for solving this is to write the converted JPEG files to memory instead of to disk, and view them from there.
In talking with the Ghostscript developers, I was told that Ghostscript.NET does what I am looking for: it converts PDFs and lets you view them from memory (I believe it does this with the GhostscriptRasterizer/GhostscriptViewer classes, but again, I don't understand it very well).
I've been led to look at the latest Ghostscript.NET docs to work out how this is done, but I simply don't understand them well enough to approach this.
My question, then, is: based on how I'm using Ghostscript now, how do I use Ghostscript.NET in my project to write the converted JPEGs into memory and view them there?
Here's how I'm doing it now (code-wise):
//instantiate
byte[] fileData;
Texture2D tex = null;
//if a PDF file exists at the current head path
if (File.Exists(CurrentHeadPath))
{
    //Transform pdf to jpg
    PdfToImage.PDFConvert pp = new PDFConvert();
    pp.OutputFormat = "jpeg"; //format
    pp.JPEGQuality = 100; //100% quality
    pp.ResolutionX = 300; //dpi
    pp.ResolutionY = 500;
    pp.OutputToMultipleFile = true;
    CurrentPDFPath = "Data/myFiles/pdfconvimg.jpg";
    //this call is what actually converts the pdf to jpeg files
    pp.Convert(CurrentHeadPath, CurrentPDFPath);
    //this just loads the first image
    if (File.Exists("Data/myFiles/pdfconvimg" + 1 + ".jpg"))
    {
        //reads in the jpeg file by bytes
        fileData = File.ReadAllBytes("Data/myFiles/pdfconvimg" + 1 + ".jpg");
        tex = new Texture2D(2, 2);
        tex.LoadImage(fileData); //..this will auto-resize the texture dimensions.
        //Read Texture into RawImage component (look the component up once)
        RawImage rawImage = PdfObject.GetComponent<RawImage>();
        rawImage.texture = tex;
        rawImage.rectTransform.sizeDelta = new Vector2(288, 400);
        rawImage.enabled = true;
    }
    else
    {
        Debug.Log("reached eof");
    }
}
The Convert function comes from a script called PDFConvert, which I obtained from CodeProject, specifically the article "How To Convert PDF to Image Using Ghostscript API".

From the Ghostscript.NET documentation, take a look at the example code labeled "Using GhostscriptRasterizer class", specifically the following lines:
Image img = _rasterizer.GetPage(desired_x_dpi, desired_y_dpi, pageNumber);
img.Save(pageFilePath, ImageFormat.Png);
The Image class here is System.Drawing.Image, which also has a Save overload whose first parameter is a System.IO.Stream. If you pass it a MemoryStream instead of a file path, the encoded page stays in memory and never touches the disk.
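A minimal sketch of the in-memory step, assuming Ghostscript.NET's `GetPage` gives you a `System.Drawing.Image` as in the lines above (whether System.Drawing is usable in a HoloLens build is something you would need to verify). The helper only uses the BCL: it hands a MemoryStream to whatever "save to stream" call you give it and returns the encoded bytes. The class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class InMemoryImage
{
    // Hypothetical helper: runs a "save to stream" callback (for example
    // s => img.Save(s, ImageFormat.Jpeg) with System.Drawing.Image) against
    // a MemoryStream and returns the encoded bytes, so nothing is written to disk.
    public static byte[] EncodeToBytes(Action<Stream> save)
    {
        if (save == null) throw new ArgumentNullException(nameof(save));
        using (var ms = new MemoryStream())
        {
            save(ms);
            // ToArray copies the whole buffer regardless of the current Position
            return ms.ToArray();
        }
    }
}
```

In Unity the returned bytes would replace the `File.ReadAllBytes` call: `byte[] bytes = InMemoryImage.EncodeToBytes(s => img.Save(s, ImageFormat.Jpeg)); tex = new Texture2D(2, 2); tex.LoadImage(bytes);`, and the rest of the RawImage code stays the same.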

Related

printing a Texture2D to file or as image for debugging purposes

I am trying to figure out how an image is processed (what it looks like, etc.) at several stages of image processing operations, so I have written the following coroutine to capture a screenshot. However, I am not sure how to output the captured shots at the different stages (for example, as PNG images).
public void CaptureFrame(RectTransform rect)
{
    StartCoroutine(Co_Capture(rect));
}
private IEnumerator Co_Capture(RectTransform rect)
{
    // Wait until rendering has finished so ReadPixels sees the completed frame
    yield return new WaitForEndOfFrame();
    _texture = new Texture2D((int)rect.sizeDelta.x, (int)rect.sizeDelta.y);
    _texture.ReadPixels(new Rect(rect.anchoredPosition.x - (rect.sizeDelta.x * 0.5f), rect.anchoredPosition.y - (rect.sizeDelta.y * 0.5f), rect.sizeDelta.x, rect.sizeDelta.y), 0, 0);
    _texture.Apply();
    OnImageCaptured(new ScreenShotEventArgs(_texture));
}
Given that I build the app for an Android device, ideally I would like to have the images saved both in a folder somewhere (when running in the Unity3D Editor) and on the device, for example in its data folder. Something akin to the Debug.Log messages you print at different stages to see which parts of your code are reached, etc.
What options do I have? What is the code/command/method for saving an image through Unity3D, and also on an Android tablet, so that the developer can see the images at various stages of processing?
You got some of it right in the comment section. You use EncodeToPNG() to convert the texture to a byte array, then save it with File.WriteAllBytes. What you didn't get right is the path.
Save to Application.persistentDataPath if you want this to work on Android, iOS, and any other platform. It is a must to save into a folder instead of directly into Application.persistentDataPath.
So the path should be Application.persistentDataPath + "/yourfoldername/imagename.png".
Again, this: Application.persistentDataPath + "/imagename.png" will fail on iOS because you are trying to write directly to the sandbox, which requires a folder like I mentioned above.
Finally, I noticed you are using + to concatenate paths. Avoid doing that and use Path.Combine instead:
string tempPath = Path.Combine(Application.persistentDataPath, "yourfoldername");
Directory.CreateDirectory(tempPath); // File.WriteAllBytes does not create missing folders
tempPath = Path.Combine(tempPath, "Img1.png");
This will save the texture as Img1.png inside that folder, which is a valid, writable location on all devices:
File.WriteAllBytes(tempPath, _texture.EncodeToPNG());
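To get one file per processing stage, the same steps can be wrapped in a small helper. This is a sketch using only System.IO; `DebugImageWriter`, `Save` and `StageFileName` are hypothetical names, and in Unity `baseDir` would be `Application.persistentDataPath` while `data` would be `_texture.EncodeToPNG()`.

```csharp
using System;
using System.IO;

public static class DebugImageWriter
{
    // Hypothetical helper: writes encoded image bytes into a subfolder of baseDir
    // and returns the full path of the written file.
    public static string Save(string baseDir, string folderName, string fileName, byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        // Build paths with Path.Combine instead of string concatenation
        string folder = Path.Combine(baseDir, folderName);
        // File.WriteAllBytes does not create missing folders; this is a no-op if it exists
        Directory.CreateDirectory(folder);
        string fullPath = Path.Combine(folder, fileName);
        File.WriteAllBytes(fullPath, data);
        return fullPath;
    }

    // Zero-padded stage numbers keep the files sorted in processing order
    public static string StageFileName(int stage, string label)
    {
        return "stage_" + stage.ToString("D2") + "_" + label + ".png";
    }
}
```

After `_texture.Apply()` you could call `string p = DebugImageWriter.Save(Application.persistentDataPath, "debugshots", DebugImageWriter.StageFileName(1, "captured"), _texture.EncodeToPNG());` and `Debug.Log(p)` to see where each stage ended up, both in the Editor and on the device.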

Why won't some Tiff files load with TiffLib.NET

I am having trouble loading TIFF files in C#. I downloaded some sample TIFF files and was able to load them fine; however, when I attempt to load any TIFF files generated by PCI Geomatica or ArcGIS, the ReadRGBAImage call fails (returns false). Other than IMAGEWIDTH and IMAGELENGTH, every other tag I've tried to retrieve has returned null (e.g. XRESOLUTION). Does anyone have any idea why this is happening? The relevant code is below:
using (Tiff tif = Tiff.Open(fileName, "r"))
{
    // Find the width and height of the image
    FieldValue[] value = tif.GetField(TiffTag.IMAGEWIDTH);
    int width = value[0].ToInt();
    value = tif.GetField(TiffTag.IMAGELENGTH);
    int height = value[0].ToInt();
    // Read the image into the memory buffer
    int[] raster = new int[height * width];
    if (!tif.ReadRGBAImage(width, height, raster))
    {
        System.Windows.Forms.MessageBox.Show("Could not read image");
        return null;
    }
}
Thanks!
Without a file to reproduce the issue I can't be 100% sure, but it looks like your file cannot be converted to an RGBA raster with LibTiff.Net.
It's not a bug; you are just using a less common flavor of TIFF. Some say TIFF stands for "Thousands of Incompatible File Formats", and there is certainly some truth in that.
The library can read (and decode!) your file. You can get the decoded raster using the ReadEncodedStrip and/or ReadScanline methods. Converting that raster to RGBA is left to you.
And don't forget that not every image can be converted to RGBA without losing some of the image data.
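As an illustration of doing that conversion yourself, here is a sketch assuming the file turns out to be single-band, unsigned 16-bit (check the BITSPERSAMPLE, SAMPLESPERPIXEL and SAMPLEFORMAT tags first; your files may differ). Once you have the samples from ReadScanline/ReadEncodedStrip as `ushort` values, it stretches them linearly to 8-bit gray and packs each pixel the way ReadRGBAImage does (alpha, blue, green, red from the high byte down). The stretch is exactly the lossy step mentioned above: up to 65,536 values collapse into 256. `RasterConvert` is a hypothetical name.

```csharp
using System;

public static class RasterConvert
{
    // Hypothetical helper: linearly stretches single-band 16-bit samples to 8-bit gray
    // and packs each pixel as 0xAABBGGRR, the layout libtiff's ReadRGBAImage uses.
    public static int[] GrayToRgba(ushort[] samples)
    {
        if (samples == null) throw new ArgumentNullException(nameof(samples));
        int[] rgba = new int[samples.Length];
        if (samples.Length == 0) return rgba;

        // Find the actual value range so the full 0..255 output range is used
        ushort min = ushort.MaxValue, max = ushort.MinValue;
        foreach (ushort s in samples)
        {
            if (s < min) min = s;
            if (s > max) max = s;
        }
        double range = max - min;

        for (int i = 0; i < samples.Length; i++)
        {
            // A constant image maps to black; otherwise scale into 0..255 (lossy)
            int g = range == 0 ? 0 : (int)Math.Round((samples[i] - min) * 255.0 / range);
            // Alpha = 255 and R = G = B = g for a gray pixel
            rgba[i] = unchecked((int)(0xFF000000u | ((uint)g << 16) | ((uint)g << 8) | (uint)g));
        }
        return rgba;
    }
}
```

A min/max stretch is only one choice; for elevation-like data you might prefer a fixed range so that several tiles map to the same gray levels.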

MemoryStream (pdf) to Ghostscript to MemoryStream (jpg)

I did see "PDF to Image using GhostScript. No image file has to be created", but that only (sort of) answered half my question. Is it possible to use GhostScriptSharp (or the regular Ghostscript DLL) to convert a PDF in a MemoryStream to a JPG in a MemoryStream? I'm talking about a PDF form filled in dynamically with iTextSharp, which I am already writing to a MemoryStream to save to a database or stream to an HTTP response, and I'd really love to avoid saving to a file (and the subsequent cleanup) if I can.
The only answer to the question I referenced claimed that one has to go down to the Ghostscript DLL to do the latter part, but it was obvious I would need to do a good bit of legwork to figure out what that meant. Does anyone have a good resource that could help me on this journey?
The thing is that the PDF language, unlike the PostScript language, inherently requires random access to the file. If you provide a PDF directly on standard input or via a pipe, Ghostscript will copy it to a temporary file before interpreting it. So there is no point in passing the PDF as a MemoryStream (or byte array), as it will end up on disk anyway before it is interpreted.
Take a look at Ghostscript.NET and its GhostscriptRasterizer sample for the 'in-memory' output.
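Since the PDF ends up on disk anyway, a pragmatic option is to write the MemoryStream to a temporary file yourself and make the cleanup automatic. A sketch using only the BCL; `TempPdf.WithTempFile` is a hypothetical name, and the callback is wherever you would hand the path to Ghostscript.

```csharp
using System;
using System.IO;

public static class TempPdf
{
    // Hypothetical helper: copies the stream to a uniquely named temp file, runs the
    // callback with that path (e.g. a Ghostscript conversion), and always deletes the file.
    public static T WithTempFile<T>(Stream pdf, Func<string, T> work)
    {
        if (pdf == null) throw new ArgumentNullException(nameof(pdf));
        if (work == null) throw new ArgumentNullException(nameof(work));

        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".pdf");
        try
        {
            using (FileStream fs = File.Create(path))
            {
                // Copy from the start in case the stream was just written to (e.g. by iTextSharp)
                if (pdf.CanSeek) pdf.Position = 0;
                pdf.CopyTo(fs);
            }
            return work(path);
        }
        finally
        {
            // Cleanup runs even if the callback throws
            if (File.Exists(path)) File.Delete(path);
        }
    }
}
```

For example, `TempPdf.WithTempFile(pdfStream, path => ConvertPdf(path))`, where `ConvertPdf` stands in for your own (hypothetical) Ghostscript call that returns the JPEG bytes.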
Ghostscript.NET is a wrapper for the Ghostscript DLL. It can now take a stream object and return an image that can be saved to a stream. Here is an example that I used on an ASP.NET page to render the pages of a PDF held in a memory stream as images. I haven't completely figured out the best way to handle the Ghostscript DLL and where to put it on the server.
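// Assumed usings: System, System.Drawing, System.Drawing.Imaging, System.IO,
// Ghostscript.NET, Ghostscript.NET.Rasterizer, and presumably
// using AspImage = System.Web.UI.WebControls.Image;
void PDFToImage(MemoryStream inputMS, int dpi)
{
    // Point Ghostscript.NET at the native Ghostscript DLL (adjust the path for your server)
    GhostscriptVersionInfo version = new GhostscriptVersionInfo(
        new Version(0, 0, 0), @"C:\PathToDll\gsdll32.dll",
        string.Empty, GhostscriptLicense.GPL);
    using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
    {
        rasterizer.Open(inputMS, version, false);
        // Page numbers are 1-based
        for (int i = 1; i <= rasterizer.PageCount; i++)
        {
            using (MemoryStream ms = new MemoryStream())
            using (Image img = rasterizer.GetPage(dpi, dpi, i))
            {
                // Encode the rendered page as JPEG into memory instead of a file
                img.Save(ms, ImageFormat.Jpeg);
                AspImage newPage = new AspImage();
                // The MIME type must match the format used in Save above
                newPage.ImageUrl = "data:image/jpeg;base64," + Convert.ToBase64String(ms.ToArray());
                Document1Image.Controls.Add(newPage);
            }
        }
    }
}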
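The data URI's MIME type has to match the format passed to `img.Save`, otherwise browsers may mislabel or reject the image. A tiny BCL-only helper (hypothetical name `DataUri.From`) keeps the two in one place:

```csharp
using System;

public static class DataUri
{
    // Hypothetical helper: builds a data URI whose MIME type matches the encoder
    // actually used, e.g. "image/jpeg" after img.Save(ms, ImageFormat.Jpeg).
    public static string From(byte[] bytes, string mimeType)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        if (string.IsNullOrEmpty(mimeType)) throw new ArgumentException("MIME type required", nameof(mimeType));
        return "data:" + mimeType + ";base64," + Convert.ToBase64String(bytes);
    }
}
```

Inside the loop this would read `newPage.ImageUrl = DataUri.From(ms.ToArray(), "image/jpeg");`.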

c# PDF to Bmp for free

I am writing a program that uses OCR (tessnet2) to scan an image file and extract certain information. This was easy before I found out that I was going to be scanning attachments of PDFs from an Exchange server.
The first problem I am working on is how to convert my PDFs to BMP files. From what I can tell so far of TessNet2, it can only read in image files, specifically BMP. So I am now tasked with converting a PDF of indeterminate size (2 to 15 pages) to BMP images. After that is done I can easily scan each image using the code I have already built with TessNet2.
I have seen Ghostscript used for this task; I'm just wondering whether there is another free solution, or if one of you fine humans could give me a crash course on how to do this using Ghostscript.
You can use ImageMagick too. And it's totally free! No trial or payment.
Just download and install ImageMagick, then add the Magick.NET NuGet package to your project.
Here is the code. Hope it helps (even though the question was asked 6 years ago...)!
Procedure:
using System.Collections.Generic;
using System.IO;
using ImageMagick;
public void PDFToBMP(string output)
{
    MagickReadSettings settings = new MagickReadSettings();
    // Setting the density to 500 dpi will create an image with better quality
    settings.Density = new Density(500);
    string[] files = GetFiles();
    foreach (string file in files)
    {
        string fileNameWithoutExt = Path.GetFileNameWithoutExtension(file);
        string path = Path.Combine(output, fileNameWithoutExt);
        using (MagickImageCollection images = new MagickImageCollection())
        {
            // Pass the settings so the density is applied while the PDF is rasterized
            images.Read(file, settings);
            int page = 1;
            foreach (var image in images)
            {
                image.Format = MagickFormat.Bmp; //if you want another image format, just change it here...
                image.Write(path + "-" + page + ".bmp"); //...and the extension here! One file per page
                page++;
            }
        }
    }
}
Function GetFiles():
public string[] GetFiles()
{
    string dir = @"your\path"; // placeholder: the folder that holds the PDFs
    // CreateDirectory does nothing if the folder already exists
    Directory.CreateDirectory(dir);
    List<string> list = new List<string>();
    foreach (FileInfo info in new DirectoryInfo(dir).GetFiles())
    {
        // HACK: Just skip the protected samples file...
        if (info.Name.IndexOf("protected") == -1)
            list.Add(info.FullName);
    }
    return list.ToArray();
}
Found a CodeProject article on converting PDFs to Images:
http://www.codeproject.com/Articles/57100/Simple-and-Free-PDF-to-Image-Conversion
I recognize this is a very old question, but it is an ongoing problem. If you are targeting .NET 6 or later, I hope you would take a look at my library Melville.PDF.
Melville.Pdf is an MIT-licensed C# implementation of a PDF renderer. I hope it serves a need that I have felt for some time.
If you are trying to get text out of PDF documents, render + OCR may be the hard way around. Some PDF files are just a thin wrapper around image objects, but many actually contain text. Melville.PDF does not do text extraction (yet), but text extraction might be an easier way to get text out of some files.

C# import of Adobe Illustrator (.AI) files render to Bitmap?

Does anyone know how to load a .AI file (Adobe Illustrator) and then rasterize/render the vectors into a Bitmap so I could generate e.g. a JPG or PNG from it?
I would like to produce thumbnails and render the big version with a transparent background in PNG if possible.
Of course it's "possible" if you know the specs of the .AI format, but does anyone have any knowledge or code to share for a start? Or perhaps just a link to some components?
C# .NET please :o)
Code is most interesting as I know nothing about reading vector points and drawing splines.
Well, if Gregory is right that ai files are pdf-compatible, and you are okay with using GPL code, there is a project called GhostscriptSharp on github that is a .NET interface to the Ghostscript engine that can render PDF.
With the newer AI versions, you should be able to convert from PDF to image. There are plenty of libraries that do this that are cheap, so I would choose buy over build on this one. If you need to convert the older AI files, all bets are off. I am not sure what format they were in.
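To tell which kind of .ai file you have before handing it to a PDF renderer, a rough sketch is to look for the PDF header near the start of the file. This assumes, as the answers above suggest, that PDF-compatible AI files are stored as PDF and so contain `%PDF-` at the beginning, while older PostScript-based AI files typically start with `%!PS-Adobe` instead. The 1024-byte window tolerates a little leading junk; `AiFileProbe` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;

public static class AiFileProbe
{
    // Hypothetical helper: returns true if "%PDF-" appears within the first 1024 bytes,
    // which suggests the file can be handed to a PDF renderer such as Ghostscript.
    public static bool LooksLikePdf(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        byte[] buffer = new byte[1024];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full or EOF
        while (read < buffer.Length)
        {
            int n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }
        // Non-ASCII bytes decode as '?', which cannot create a false match
        string head = Encoding.ASCII.GetString(buffer, 0, read);
        return head.IndexOf("%PDF-", StringComparison.Ordinal) >= 0;
    }
}
```

Usage: `using (var fs = File.OpenRead(aiPath)) { bool pdfCompatible = AiFileProbe.LooksLikePdf(fs); }`. A false result means a PDF-based renderer probably won't handle the file.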
private void btnGetAIThumb_Click(object sender, EventArgs e)
{
    // Drive Illustrator through COM automation (Illustrator must be installed)
    Illustrator.Application app = new Illustrator.Application();
    Illustrator.Document doc = app.Open(@"F:/AI_Prog/2009Calendar.ai", Illustrator.AiDocumentColorSpace.aiDocumentRGBColor, null);
    // Export the document as a 24-bit PNG
    doc.Export(@"F:/AI_Prog/2009Calendar.png", Illustrator.AiExportType.aiPNG24, null);
    // Close without saving any changes to the .ai file
    doc.Close(Illustrator.AiSaveOptions.aiDoNotSaveChanges);
    doc = null;
}
Instead of Illustrator.AiExportType.aiPNG24, you can choose the JPEG, GIF, Flash, SVG, or Photoshop export types.
I have tested this with Pdf2Png and it worked fine with both .pdf and .ai files.
But I don't know how it handles transparency.
string pdf_filename = @"C:\test.ai";
//string pdf_filename = @"C:\test.pdf";
string png_filename = "converted.png";
List<string> errors = cs_pdf_to_image.Pdf2Image.Convert(pdf_filename, png_filename);
