Extract image from PDF using itextsharp

I am trying to extract all the images from a PDF using iTextSharp but can't seem to overcome this one hurdle.
The error occurs on the line System.Drawing.Image ImgPDF = System.Drawing.Image.FromStream(MS); with the message "Parameter is not valid".
I think it works when the image is a bitmap but not for any other format.
I have the following code (sorry for the length):
private void Form1_Load(object sender, EventArgs e)
{
FileStream fs = File.OpenRead(@"reader.pdf");
byte[] data = new byte[fs.Length];
fs.Read(data, 0, (int)fs.Length);
List<System.Drawing.Image> ImgList = new List<System.Drawing.Image>();
iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = null;
iTextSharp.text.pdf.PdfReader PDFReaderObj = null;
iTextSharp.text.pdf.PdfObject PDFObj = null;
iTextSharp.text.pdf.PdfStream PDFStremObj = null;
try
{
RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(data);
PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);
for (int i = 0; i <= PDFReaderObj.XrefSize - 1; i++)
{
PDFObj = PDFReaderObj.GetPdfObject(i);
if ((PDFObj != null) && PDFObj.IsStream())
{
PDFStremObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
iTextSharp.text.pdf.PdfObject subtype = PDFStremObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);
if ((subtype != null) && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
{
byte[] bytes = iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw((iTextSharp.text.pdf.PRStream)PDFStremObj);
if ((bytes != null))
{
try
{
System.IO.MemoryStream MS = new System.IO.MemoryStream(bytes);
MS.Position = 0;
System.Drawing.Image ImgPDF = System.Drawing.Image.FromStream(MS);
ImgList.Add(ImgPDF);
}
catch (Exception)
{
}
}
}
}
}
PDFReaderObj.Close();
}
catch (Exception ex)
{
throw new Exception(ex.Message);
}
} //Form1_Load

Resolved...
I also got the "Parameter is not valid" exception. After a lot of work, and
with the help of the link provided by der_chirurg
(http://kuujinbo.info/iTextSharp/CCITTFaxDecodeExtract.aspx), I resolved it.
The code follows:
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using iTextSharp.text.pdf.parser;
using Dotnet = System.Drawing.Image;
using iTextSharp.text.pdf;
namespace PDF_Parsing
{
partial class PDF_ImgExtraction
{
string imgPath;
private void ExtractImage(string pdfFile)
{
// Open the file passed in once, rather than re-reading it for every page.
PdfReader pdf = new PdfReader(pdfFile);
for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
{
PdfDictionary pg = pdf.GetPageN(pageNumber);
PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
string width = tg.Get(PdfName.WIDTH).ToString();
string height = tg.Get(PdfName.HEIGHT).ToString();
ImageRenderInfo imgRI = ImageRenderInfo.CreateForXObject(new Matrix(float.Parse(width), float.Parse(height)), (PRIndirectReference)obj, tg);
RenderImage(imgRI);
}
}
}
}
private void RenderImage(ImageRenderInfo renderInfo)
{
PdfImageObject image = renderInfo.GetImage();
using (Dotnet dotnetImg = image.GetDrawingImage())
{
if (dotnetImg != null)
{
using (MemoryStream ms = new MemoryStream())
{
dotnetImg.Save(ms, ImageFormat.Tiff);
Bitmap d = new Bitmap(dotnetImg);
d.Save(imgPath);
}
}
}
}
}
}

You need to check the stream's /Filter to see what image format a given image uses. It may be a standard image format:
DCTDecode (jpeg)
JPXDecode (jpeg 2000)
JBIG2Decode (jbig is a B&W only format)
CCITTFaxDecode (fax format, PDF supports group 3 and 4)
Other than that, you'll need to get the raw bytes (as you are), and build an image using the image stream's width, height, bits per component, number of color components (could be CMYK, indexed, RGB, or Something Weird), and a few others, as defined in section 8.9 of the ISO PDF SPECIFICATION (available for free).
So in some cases your code will work, but in others, it'll fail with the exception you mentioned.
PS: When you have an exception, PLEASE include the stack trace every single time. Pretty please with sugar on top?
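As a rough illustration of the filter check described above, the sketch below maps a /Filter name to a likely file extension. ImageFilterHelper and GuessExtension are hypothetical names, not part of iTextSharp. Reading the filter from the stream dictionary is left out, and the input is assumed to be the name in its string form, such as "/DCTDecode".

```csharp
using System;

public static class ImageFilterHelper
{
    // Returns a file extension when the raw stream bytes are already a
    // complete image file, or null when the data must be decoded or
    // wrapped in a container first (an assumption of this sketch).
    public static string GuessExtension(string filterName)
    {
        switch (filterName)
        {
            case "/DCTDecode":   // JPEG data
                return "jpg";
            case "/JPXDecode":   // JPEG 2000 data
                return "jp2";
            default:             // JBIG2, CCITT fax, Flate, and everything else
                return null;
        }
    }
}
```

When the result is null, the raw bytes should not be handed to Image.FromStream as they are; that is the case in which the "Parameter is not valid" exception is to be expected.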

Works for me like this, using these two methods:
public static List<System.Drawing.Image> ExtractImagesFromPDF(byte[] bytes)
{
var imgs = new List<System.Drawing.Image>();
var pdf = new PdfReader(bytes);
try
{
for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
{
PdfDictionary pg = pdf.GetPageN(pageNumber);
List<PdfObject> objs = FindImageInPDFDictionary(pg);
foreach (var obj in objs)
{
if (obj != null)
{
int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(System.Globalization.CultureInfo.InvariantCulture));
PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
PdfStream pdfStrem = (PdfStream)pdfObj;
var pdfImage = new PdfImageObject((PRStream)pdfStrem);
var img = pdfImage.GetDrawingImage();
imgs.Add(img);
}
}
}
}
finally
{
pdf.Close();
}
return imgs;
}
private static List<PdfObject> FindImageInPDFDictionary(PdfDictionary pg)
{
var res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
var xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
var pdfObgs = new List<PdfObject>();
if (xobj != null)
{
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
var tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
var type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
if (PdfName.IMAGE.Equals(type)) // image at the root of the pdf
{
pdfObgs.Add(obj);
}
else if (PdfName.FORM.Equals(type)) // image inside a form
{
FindImageInPDFDictionary(tg).ForEach(o => pdfObgs.Add(o));
}
else if (PdfName.GROUP.Equals(type)) // image inside a group
{
FindImageInPDFDictionary(tg).ForEach(o => pdfObgs.Add(o));
}
}
}
}
return pdfObgs;
}

In newer versions of iTextSharp, the first parameter of ImageRenderInfo.CreateForXObject is no longer a Matrix but a GraphicsState. @der_chirurg's approach should work. I tested it myself with the information from the following link and it worked beautifully:
http://www.thevalvepage.com/swmonkey/2014/11/26/extract-images-from-pdf-files-using-itextsharp/

To extract all images on all pages, it is not necessary to implement different filters. iTextSharp has an image renderer, which saves all images in their original image type.
Just follow the approach found here: http://kuujinbo.info/iTextSharp/CCITTFaxDecodeExtract.aspx You don't need to implement the HttpHandler.

I added a library on GitHub which extracts the images in a PDF and compresses them.
It could be useful when you start playing with the very powerful iTextSharp library.
Here is the link: https://github.com/rock-walker/PdfCompression

This works for me and I think it's a simple solution:
Write a custom RenderListener and implement its RenderImage method, something like this:
public void RenderImage(ImageRenderInfo info)
{
PdfImageObject image = info.GetImage();
Parser.Matrix matrix = info.GetImageCTM();
var fileType = image.GetFileType();
ImageFormat format;
switch (fileType)
{//you may add more types here
case "jpg":
case "jpeg":
format = ImageFormat.Jpeg;
break;
case "png":
format = ImageFormat.Png;
break;
case "bmp":
format = ImageFormat.Bmp;
break;
case "tiff":
format = ImageFormat.Tiff;
break;
case "gif":
format = ImageFormat.Gif;
break;
default:
format = ImageFormat.Jpeg;
break;
}
var pic = image.GetDrawingImage();
var x = matrix[Parser.Matrix.I31];
var y = matrix[Parser.Matrix.I32];
var width = matrix[Parser.Matrix.I11];
var height = matrix[Parser.Matrix.I22];
if (x < <some value> && y < <some value>)
{
return;//ignore these images
}
pic.Save(<path and name>, format);
}

I have used this library in the past without any problems.
http://www.winnovative-software.com/PdfImgExtractor.aspx
private void btnExtractImages_Click(object sender, EventArgs e)
{
if (pdfFileTextBox.Text.Trim().Equals(String.Empty))
{
MessageBox.Show("Please choose a source PDF file", "Choose PDF file", MessageBoxButtons.OK);
return;
}
// the source pdf file
string pdfFileName = pdfFileTextBox.Text.Trim();
// start page number
int startPageNumber = int.Parse(textBoxStartPage.Text.Trim());
// end page number
// when it is 0 the extraction will continue up to the end of document
int endPageNumber = 0;
if (textBoxEndPage.Text.Trim() != String.Empty)
endPageNumber = int.Parse(textBoxEndPage.Text.Trim());
// create the PDF images extractor object
PdfImagesExtractor pdfImagesExtractor = new PdfImagesExtractor();
pdfImagesExtractor.LicenseKey = "31FAUEJHUEBQRl5AUENBXkFCXklJSUlQQA==";
// the demo output directory
string outputDirectory = Path.Combine(Application.StartupPath, @"DemoFiles\Output");
Cursor = Cursors.WaitCursor;
// set the handler to be called when an image was extracted
pdfImagesExtractor.ImageExtractedEvent += pdfImagesExtractor_ImageExtractedEvent;
try
{
// start images counting
imageIndex = 0;
// call the images extractor to raise the ImageExtractedEvent event when an image is extracted from a PDF page
// the pdfImagesExtractor_ImageExtractedEvent handler below will be executed for each extracted image
pdfImagesExtractor.ExtractImagesInEvent(pdfFileName, startPageNumber, endPageNumber);
// Alternatively you can use the ExtractImages() and ExtractImagesToFile() methods
// to extract the images from a PDF document in memory or to image files in a directory
// uncomment the line below to extract the images to an array of ExtractedImage objects
//ExtractedImage[] pdfPageImages = pdfImagesExtractor.ExtractImages(pdfFileName, startPageNumber, endPageNumber);
// uncomment the lines below to extract the images to image files in a directory
//string outputDirectory = System.IO.Path.Combine(Application.StartupPath, @"DemoFiles\Output");
//pdfImagesExtractor.ExtractImagesToFile(pdfFileName, startPageNumber, endPageNumber, outputDirectory, "pdfimage");
}
catch (Exception ex)
{
// The extraction failed
MessageBox.Show(String.Format("An error occurred. {0}", ex.Message), "Error");
return;
}
finally
{
// uninstall the event handler
pdfImagesExtractor.ImageExtractedEvent -= pdfImagesExtractor_ImageExtractedEvent;
Cursor = Cursors.Arrow;
}
try
{
System.Diagnostics.Process.Start(outputDirectory);
}
catch (Exception ex)
{
MessageBox.Show(string.Format("Cannot open output folder. {0}", ex.Message));
return;
}
}
/// <summary>
/// The ImageExtractedEvent event handler called after an image was extracted from a PDF page.
/// The event is raised when the ExtractImagesInEvent() method is used
/// </summary>
/// <param name="args">The handler argument containing the extracted image and the PDF page number</param>
void pdfImagesExtractor_ImageExtractedEvent(ImageExtractedEventArgs args)
{
// get the image object and page number from the event handler argument
Image pdfPageImageObj = args.ExtractedImage.ImageObject;
int pageNumber = args.ExtractedImage.PageNumber;
// save the extracted image to a PNG file
string outputPageImage = Path.Combine(Application.StartupPath, @"DemoFiles\Output",
"pdfimage_" + pageNumber.ToString() + "_" + imageIndex++ + ".png");
pdfPageImageObj.Save(outputPageImage, ImageFormat.Png);
args.ExtractedImage.Dispose();
}

Related

How to scan using NTwain

I am new to scanners. I tried a bunch of packages and ended up with NTwain, a NuGet library. I am struggling to work out how to start my scanner and save the images using the API. How can I understand it? Here's what I have so far.
Edit
I found out how to start the scan and save the result, but for some reason I can't get both sides of the paper. I don't know whether my encoder is wrong when saving as a multi-page TIFF or whether it's something you have to set using NTwain.
Edit 2
I figured it out. I didn't know that scanners refer to double-sided scanning as "Duplex": myDS.Capabilities.CapDuplexEnabled.SetValue(BoolType.True);
public static void GetScanner()
{
// Create appId
var appId = TWIdentity.CreateFromAssembly(DataGroups.Image, Assembly.GetExecutingAssembly());
// Attach
var session = new TwainSession(appId);
List<Image> scannedImages = new List<Image>();
session.TransferReady += (s, e) =>
{
Debug.Print("TransferReady is a go.");
};
session.DataTransferred += (s, e) =>
{
if (e.NativeData != IntPtr.Zero)
{
// Handle image data
if (e.NativeData != IntPtr.Zero)
{
var stream = e.GetNativeImageStream();
if (stream != null)
{
//Save Image to list
scannedImages.Add(Image.FromStream(stream));
}
}
}
};
// Open it
session.Open();
// Open the first source found
DataSource myDS = session.FirstOrDefault();
myDS.Open();
myDS.Capabilities.CapDuplexEnabled.SetValue(BoolType.True);
// Start Scan
myDS.Enable(SourceEnableMode.NoUI, false, IntPtr.Zero);
//Close Session
myDS.Close();
// Save Images to specific folder as tiffs
int n = 0;
foreach(Image image in scannedImages)
{
//Get the codec for tiff files
ImageCodecInfo info = null;
foreach (ImageCodecInfo ice in ImageCodecInfo.GetImageEncoders())
if (ice.MimeType == "image/tiff")
info = ice;
//Save as Multi-Page Tiff
System.Drawing.Imaging.Encoder enc = System.Drawing.Imaging.Encoder.SaveFlag;
EncoderParameters ep = new EncoderParameters(1);
ep.Param[0] = new EncoderParameter(enc, (long)EncoderValue.MultiFrame);
//Construct save path
var saveFolderPath = @"C:\Projects\SavingMethods\SavingMethods\ScannedImages\";
string fileName = "Testfile" + n + ".tiff";
var completedFilePath = Path.Combine(saveFolderPath, fileName);
//Save Image
image.Save(completedFilePath, info, ep);
n++;
}
}

I ended up figuring it out by myself but would like to thank ckuri for the comment as his link did help me immensely.
public static void GetScanner()
{
// Create appId
var appId = TWIdentity.CreateFromAssembly(DataGroups.Image,
Assembly.GetExecutingAssembly());
// Attach
var session = new TwainSession(appId);
List<Image> scannedImages = new List<Image>();
session.TransferReady += (s, e) =>
{
Debug.Print("TransferReady is a go.");
};
session.DataTransferred += (s, e) =>
{
if (e.NativeData != IntPtr.Zero)
{
// Handle image data
if (e.NativeData != IntPtr.Zero)
{
var stream = e.GetNativeImageStream();
if (stream != null)
{
//Save Image to list
scannedImages.Add(Image.FromStream(stream));
}
}
}
};
// Open it
session.Open();
// Open the first source found
DataSource myDS = session.FirstOrDefault();
myDS.Open();
myDS.Capabilities.CapDuplexEnabled.SetValue(BoolType.True);
// Start Scan
myDS.Enable(SourceEnableMode.NoUI, false, IntPtr.Zero);
//Close Session
myDS.Close();
// Save Images to specific folder as tiffs
int n = 0;
foreach(Image image in scannedImages)
{
//Get the codec for tiff files
ImageCodecInfo info = null;
foreach (ImageCodecInfo ice in ImageCodecInfo.GetImageEncoders())
if (ice.MimeType == "image/tiff")
info = ice;
//Save as Multi-Page Tiff
System.Drawing.Imaging.Encoder enc = System.Drawing.Imaging.Encoder.SaveFlag;
EncoderParameters ep = new EncoderParameters(1);
ep.Param[0] = new EncoderParameter(enc, (long)EncoderValue.MultiFrame);
//Construct save path
var saveFolderPath = @"C:\Projects\SavingMethods\SavingMethods\ScannedImages\";
string fileName = "Testfile" + n + ".tiff";
var completedFilePath = Path.Combine(saveFolderPath, fileName);
//Save Image
image.Save(completedFilePath, info, ep);
n++;
}
}

Use PDFsharp-wpf to convert txt files to pdf in C#

I am having issues converting my files to pdf. I am completely new to C# (coming from java) and I am trying to build an application that first prints out a log into a text file then converts it to a PDF. If you know a good resource or example please point me in the right direction without judgement. Thank you!
createPDF(myFileString);
void createPDF(string txtFile)
{
string line = null;
System.IO.TextReader readFile = new StreamReader(myFileString);
int yPoint = 0;
PdfDocument pdf = new PdfDocument();
pdf.Info.Title = "TXT to PDF";
PdfPage pdfPage = pdf.AddPage();
XGraphics graph = XGraphics.FromPdfPage(pdfPage);
while (true)
{
line = readFile.ReadLine();
if (line == null)
{
break; // TODO: might not be correct. Was : Exit While
}
else
{
yPoint = yPoint + 40;
}
}
string pdfFilename = "txttopdf.pdf";
pdf.Save(pdfFilename);
readFile.Close();
readFile = null;
Process.Start(pdfFilename);
}

Extract Image and its name from pdf using iTextSharp

I am using iTextSharp in C# to extract images and their names from a catalog PDF. I am able to extract the images, but I am struggling to extract the corresponding image name (as shown in the attached screenshot) and save the file with that name. Please find the code below and let me know your suggestions.
Sample PDF: https://docdro.id/PwBsNR9
Code:
private static List<System.Drawing.Image> ExtractImages(String PDFSourcePath)
{
List<System.Drawing.Image> ImgList = new List<System.Drawing.Image>();
iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = null;
iTextSharp.text.pdf.PdfReader PDFReaderObj = null;
iTextSharp.text.pdf.PdfObject PDFObj = null;
iTextSharp.text.pdf.PdfStream PDFStremObj = null;
try
{
RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(PDFSourcePath);
PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);
for (int i = 0; i <= PDFReaderObj.XrefSize - 1; i++)
{
PDFObj = PDFReaderObj.GetPdfObject(i);
if ((PDFObj != null) && PDFObj.IsStream())
{
PDFStremObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
iTextSharp.text.pdf.PdfObject subtype = PDFStremObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);
if ((subtype != null) && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
{
try
{
iTextSharp.text.pdf.parser.PdfImageObject PdfImageObj =
new iTextSharp.text.pdf.parser.PdfImageObject((iTextSharp.text.pdf.PRStream)PDFStremObj);
System.Drawing.Image ImgPDF = PdfImageObj.GetDrawingImage();
ImgList.Add(ImgPDF);
}
catch (Exception)
{
}
}
}
}
PDFReaderObj.Close();
}
catch (Exception ex)
{
throw new Exception(ex.Message);
}
return ImgList;
}

Unfortunately the example PDF is not tagged. Thus, one has to otherwise try and associate title text and image, either by analyzing the location in respect to each other or by exploiting a pattern in the content streams.
In the case at hand analyzing the location in respect to each other is feasible as the title always is (at least partially) drawn on the matching image or is the text right beneath it. Thus, one could in a first pass extract the text with position from a page and in a second one the images, at the same time looking for a title in the previously extracted text in the image area or right beneath. Alternatively one could first extract images with position and size and then extract the text in these areas.
But there also is a certain pattern in the content streams: The title is always drawn in a single text drawing instruction right after the corresponding image is drawn. Thus, one can also go ahead and in one pass extract images and the next drawn text as associated title.
Either approach can be implemented using the iText parser API. For example in case of the latter approach as follows: first, one implements a render listener that behaves as described, i.e. saves images and the following text:
internal class ImageWithTitleRenderListener : IRenderListener
{
int imageNumber = 0;
String format;
bool expectingTitle = false;
public ImageWithTitleRenderListener(String format)
{
this.format = format;
}
public void BeginTextBlock()
{ }
public void EndTextBlock()
{ }
public void RenderText(TextRenderInfo renderInfo)
{
if (expectingTitle)
{
expectingTitle = false;
File.WriteAllText(string.Format(format, imageNumber, "txt"), renderInfo.GetText());
}
}
public void RenderImage(ImageRenderInfo renderInfo)
{
imageNumber++;
expectingTitle = true;
PdfImageObject imageObject = renderInfo.GetImage();
if (imageObject == null)
{
Console.WriteLine("Image {0} could not be read.", imageNumber);
}
else
{
File.WriteAllBytes(string.Format(format, imageNumber, imageObject.GetFileType()), imageObject.GetImageAsBytes());
}
}
}
Then one parses the document pages using that render listener:
using (PdfReader reader = new PdfReader(@"EVERMOTION ARCHMODELS VOL.78.pdf"))
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
ImageWithTitleRenderListener listener = new ImageWithTitleRenderListener(@"EVERMOTION ARCHMODELS VOL.78-{0:D3}.{1}");
for (var i = 1; i <= reader.NumberOfPages; i++)
{
parser.ProcessContent(i, listener);
}
}
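The position-based alternative mentioned above (looking for text inside or right beneath the image area) could be sketched roughly as follows. This is plain C# with no iTextSharp calls: TextChunk, TitleMatcher and the margin parameter are hypothetical names introduced for illustration, and the coordinates are assumed to have been collected beforehand in PDF user space, where y grows upward.

```csharp
using System;
using System.Collections.Generic;

public sealed class TextChunk
{
    public float X;      // start of the text baseline
    public float Y;      // baseline height (y grows upward in PDF user space)
    public string Text;
}

public static class TitleMatcher
{
    // Returns the first chunk that starts inside the image rectangle or
    // within `margin` units below its bottom edge, or null if none does.
    public static string FindTitle(float imgX, float imgY, float imgWidth, float imgHeight,
                                   IEnumerable<TextChunk> chunks, float margin)
    {
        foreach (TextChunk c in chunks)
        {
            bool insideHorizontally = c.X >= imgX && c.X <= imgX + imgWidth;
            bool insideOrJustBelow = c.Y >= imgY - margin && c.Y <= imgY + imgHeight;
            if (insideHorizontally && insideOrJustBelow)
            {
                return c.Text;
            }
        }
        return null;
    }
}
```

Here (imgX, imgY) is the lower-left corner of the image. A suitable margin depends on the layout of the catalog and would need to be tuned against the actual document.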

I hope this helps.
I am doing something similar:
// existing pdf path
PdfReader reader = new PdfReader(path);
PRStream pst;
PdfImageObject pio;
PdfObject po;
// number of objects in pdf document
int n = reader.XrefSize;
//FileStream fs = null;
// set image file location
//String path = "E:/";
for (int i = 0; i < n; i++)
{
// get the object at the index i in the objects collection
po = reader.GetPdfObject(i);
// object not found so continue
if (po == null || !po.IsStream())
continue;
//cast object to stream
pst = (PRStream)po;
//get the object type
PdfObject type = pst.Get(PdfName.SUBTYPE);
//check if the object is the image type object
if (type != null && type.ToString().Equals(PdfName.IMAGE.ToString()))
{
//get the image
pio = new PdfImageObject(pst);
// fs = new FileStream(path + "image" + i + ".jpg", FileMode.Create);
//read bytes of image in to an array
byte[] imgdata = pio.GetImageAsBytes();
try
{
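Stream stream = new MemoryStream(imgdata);
// Note: a MemoryStream is never a FileStream, so this cast always yields null
// and nothing is printed; pass the stream to SaveStreamToFile (below) to write it to disk.
FileStream fs = stream as FileStream;
if (fs != null) Console.WriteLine(fs.Name);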
}
catch
{
}
}
}
Now you can save your stream.
public void SaveStreamToFile(string fileFullPath, Stream stream)
{
if (stream.Length == 0) return;
// Create a FileStream object to write a stream to a file
using (FileStream fileStream = System.IO.File.Create(fileFullPath, (int)stream.Length))
{
// Fill the bytes[] array with the stream data
byte[] bytesInStream = new byte[stream.Length];
stream.Read(bytesInStream, 0, (int)bytesInStream.Length);
// Use FileStream object to write to the specified file
fileStream.Write(bytesInStream, 0, bytesInStream.Length);
}
}
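A possible simplification of the helper above, assuming .NET Framework 4 or later where Stream.CopyTo is available. StreamSaver is a hypothetical name; the sketch also rewinds seekable streams first, which the original helper does not do.

```csharp
using System.IO;

public static class StreamSaver
{
    public static void Save(string fileFullPath, Stream stream)
    {
        // Rewind seekable streams so the whole content is copied,
        // not just what follows the current position.
        if (stream.CanSeek)
        {
            stream.Position = 0;
        }
        using (FileStream fileStream = File.Create(fileFullPath))
        {
            // CopyTo loops internally until the source is exhausted.
            stream.CopyTo(fileStream);
        }
    }
}
```

Unlike a single Stream.Read call, CopyTo does not rely on the source returning all of its bytes in one read.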

Converting InkCanvas Strokes to a Byte Array and back again

I am working on a program which converts the inkcanvas strokes to a byte array for encryption and then saves it in a txt file. Essentially I need to convert a byte array to inkcanvas strokes. I have the first half of the code done (which converts the inkcanvas strokes to a byte array):
private byte[] InkCanvasToByte()
{
using (MemoryStream ms = new MemoryStream())
{
if(myInkCanvas.Strokes.Count > 0)
{
myInkCanvas.Strokes.Save(ms, true);
byte[] unencryptedSignature = ms.ToArray();
return unencryptedSignature;
}
else
{
return null;
}
}
}
But I need help writing a method to convert the byte array into inkcanvas strokes in order to convert the inkcanvas strokes to a jpg.
So far I have created a method which opens the byte array file and writes it to a byte array variable:
private void ReadByteArrayFromFile()
{
string Chosen_File = "";
Microsoft.Win32.OpenFileDialog ofd = new Microsoft.Win32.OpenFileDialog();
ofd.Filter = "All Files (*.*)|*.*";
ofd.FilterIndex = 1;
ofd.Multiselect = false;
bool? userClickedOK = ofd.ShowDialog();
if (userClickedOK == true)
{
Chosen_File = ofd.FileName;
}
byte[] bytesFromFile = File.ReadAllBytes(Chosen_File);
}
Now all I need to do is convert that byte array back into an image, possibly through InkCanvas strokes. I'll update this post with a solution if I find one!
EDIT: Hmm. I'm using the code from that link and I get: "The input stream is not a valid binary format. The Starting contents (in bytes) are: 00-FB-03-03-06-48-11-45-35-46-35-11-00-00-80-3F-1F ..."
The code I'm using is:
private void ReadByteArrayFromFile(string Chosen_File)
{
byte[] bytesFromFile = File.ReadAllBytes(Chosen_File);
try
{
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream(bytesFromFile);
MyCustomStrokes customStrokes = bf.Deserialize(ms) as MyCustomStrokes;
for(int i = 0; i < customStrokes.StrokeCollection.Length; i++)
{
if(customStrokes.StrokeCollection[i] != null)
{
StylusPointCollection stylusCollection = new
StylusPointCollection(customStrokes.StrokeCollection[i]);
Stroke stroke = new Stroke(stylusCollection);
StrokeCollection strokes = new StrokeCollection();
strokes.Add(stroke);
this.MyInkPresenter.Strokes.Add(strokes);
}
}
}
catch (Exception ex)
{
System.Windows.MessageBox.Show(ex.Message);
}
}
private void DecryptByteArray(byte[] encryptedArray)
{
}
}
[Serializable]
public sealed class MyCustomStrokes
{
public MyCustomStrokes() { }
/// <SUMMARY>
/// The first index is for the stroke no.
/// The second index is for the keep the 2D point of the Stroke.
/// </SUMMARY>
public Point[][] StrokeCollection;
}
}

My problem was that I didn't serialize the output when saving the file, so deserializing it on load raised an error. Here is the corrected code:
private void SaveByteArrayToFile(byte[] byteArray)
{
var dialog = new System.Windows.Forms.FolderBrowserDialog();
string filepath = "";
if (dialog.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
filepath += dialog.SelectedPath;
System.Windows.MessageBox.Show(filepath);
}
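// Path.Combine inserts the directory separator that plain concatenation would omit.
filepath = System.IO.Path.Combine(filepath, "Signature.txt");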
MyCustomStrokes customStrokes = new MyCustomStrokes();
customStrokes.StrokeCollection = new Point[myInkCanvas.Strokes.Count][];
for (int i = 0; i < myInkCanvas.Strokes.Count; i++)
{
customStrokes.StrokeCollection[i] =
new Point[this.myInkCanvas.Strokes[i].StylusPoints.Count];
for (int j = 0; j < myInkCanvas.Strokes[i].StylusPoints.Count; j++)
{
customStrokes.StrokeCollection[i][j] = new Point();
customStrokes.StrokeCollection[i][j].X =
myInkCanvas.Strokes[i].StylusPoints[j].X;
customStrokes.StrokeCollection[i][j].Y =
myInkCanvas.Strokes[i].StylusPoints[j].Y;
}
}
MemoryStream ms = new MemoryStream();
BinaryFormatter bf = new BinaryFormatter();
bf.Serialize(ms, customStrokes);
// ToArray returns only the bytes written; GetBuffer would include unused buffer capacity.
File.WriteAllBytes(filepath, Encrypt(ms.ToArray()));
}
private void ReadByteArrayFromFile(string Chosen_File)
{
byte[] bytesFromFile = File.ReadAllBytes(Chosen_File);
byte[] decryptedBytes = Decrypt(bytesFromFile);
try
{
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream(decryptedBytes);
MyCustomStrokes customStrokes = bf.Deserialize(ms) as MyCustomStrokes;
for(int i = 0; i < customStrokes.StrokeCollection.Length; i++)
{
if(customStrokes.StrokeCollection[i] != null)
{
StylusPointCollection stylusCollection = new
StylusPointCollection(customStrokes.StrokeCollection[i]);
Stroke stroke = new Stroke(stylusCollection);
StrokeCollection strokes = new StrokeCollection();
strokes.Add(stroke);
this.MyInkPresenter.Strokes.Add(strokes);
}
}
}
catch (Exception ex)
{
System.Windows.MessageBox.Show(ex.Message);
}
}
[Serializable]
public sealed class MyCustomStrokes
{
public MyCustomStrokes() { }
/// <SUMMARY>
/// The first index is for the stroke no.
/// The second index is for the keep the 2D point of the Stroke.
/// </SUMMARY>
public Point[][] StrokeCollection;
}
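When the contents of a MemoryStream are handed to an encryption routine, it matters which accessor is used: GetBuffer() exposes the whole internal array, which may be longer than the data written, whereas ToArray() returns exactly the written bytes. A minimal sketch of the difference, independent of the ink code (BufferDemo is a hypothetical name):

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    // Returns { length of ToArray(), length of GetBuffer() } after writing 3 bytes.
    public static int[] Lengths()
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[] { 1, 2, 3 }, 0, 3);
            // ToArray copies only the bytes written so far; GetBuffer hands
            // back the internal array, including any unused capacity.
            return new[] { ms.ToArray().Length, ms.GetBuffer().Length };
        }
    }
}
```

The first value is 3; the second is at least 3 and depends on how the stream grew its buffer.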

Pdf Merge Issue in ItextSharp (After Merging Pdfs don't retain their Values)

We are trying to merge three PDFs using ITextSharp. The problem is after merging we are able to get Data from the first PDF only, while the other two PDFs don't retain their values.
All these PDFs have the same structure (i.e. they use the same templates with different data), so my assumption is they are having same fields (AcroFields) which may be creating this problem while merging.
Here is the merge code :
public void MergeFiles(string destinationFile, string[] sourceFiles)
{
try
{
int f = 0;
string outFile = destinationFile;
Document document = null;
PdfCopy writer = null;
while (f < sourceFiles.Length)
{
// Create a reader for a certain document
PdfReader reader = new PdfReader(sourceFiles[f]);
// Retrieve the total number of pages
int n = reader.NumberOfPages;
//Trace.WriteLine("There are " + n + " pages in " + sourceFiles[f]);
if (f == 0)
{
// Step 1: Creation of a document-object
document = new Document(reader.GetPageSizeWithRotation(1));
// Step 2: Create a writer that listens to the document
writer = new PdfCopy(document, new FileStream(outFile, FileMode.Create));
// Step 3: Open the document
document.Open();
}
// Step 4: Add content
PdfImportedPage page;
for (int i = 0; i < n; )
{
++i;
if (writer != null)
{
page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
}
PRAcroForm form = reader.AcroForm;
if (form != null)
{
if (writer != null)
{
writer.CopyAcroForm(reader);
}
}
f++;
}
// Step 5: Close the document
if (document != null)
{
document.Close();
}
}
catch (Exception)
{
//handle exception
}
}
This is called as follows :
string[] sourcenames = { @"D:\1.pdf", @"D:\2.pdf", @"D:\3.pdf" };
string destinationname = @"D:\pdf\mergeall\merge3.pdf";
MergeFiles(destinationname, sourcenames);

I figured it out myself after a little searching. The solution follows.
I created a function that renames the fields in each PDF, so that field names no longer collide after merging.
private static int counter = 0;
private void renameFields(PdfReader pdfReader)
{
try
{
string prepend = String.Format("_{0}", counter++);
foreach (DictionaryEntry de in pdfReader.AcroFields.Fields)
{
pdfReader.AcroFields.RenameField(de.Key.ToString(), prepend + de.Key.ToString());
}
}
catch (Exception)
{
throw; // rethrow without resetting the stack trace
}
}
This function is called in the MergeFiles function as follows:
// Create a reader for a certain document
PdfReader reader = new PdfReader(sourceFiles[f]);
renameFields(reader);
// Retrieve the total number of pages
int n = reader.NumberOfPages;
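The renaming idea can be illustrated without iTextSharp: each document's field names get a distinct prefix, so that same-named fields from different documents no longer collide after merging. The sketch below uses a plain Dictionary as a stand-in for the field collection (an assumption for illustration only; FieldRenamer and PrefixAll are hypothetical names). It copies the keys before renaming, because modifying a .NET dictionary while enumerating it throws.

```csharp
using System;
using System.Collections.Generic;

public static class FieldRenamer
{
    private static int counter = 0;

    // Prefixes every key with "_<n>", where n increases on each call.
    public static void PrefixAll(Dictionary<string, string> fields)
    {
        string prefix = string.Format("_{0}", counter++);
        // Snapshot the keys so the dictionary is not modified while enumerated.
        foreach (string name in new List<string>(fields.Keys))
        {
            string value = fields[name];
            fields.Remove(name);
            fields[prefix + name] = value;
        }
    }
}
```

Calling PrefixAll once per source document gives each document its own prefix, mirroring the counter used in renameFields.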
