Looking for a way to convert iTextSharp.text.pdf.BarcodeQRCode to System.Drawing.Image
This is what I have so far...
public System.Drawing.Image GetQRCode(string content)
{
iTextSharp.text.pdf.BarcodeQRCode qrcode = new iTextSharp.text.pdf.BarcodeQRCode(content, 115, 115, null);
iTextSharp.text.Image img = qrcode.GetImage();
MemoryStream ms = new MemoryStream(img.OriginalData);
return System.Drawing.Image.FromStream(ms);
}
In line 3 above using img.OriginalData returns null
Using img.RawData on line 3 instead thows invalid parameter error on line 4.
I've googled some of the code samples on how to perform the thing you want and your code (the "OriginalData" approach) is basicaly the same: https://csharp.hotexamples.com/examples/iTextSharp.text.pdf/BarcodeQRCode/-/php-barcodeqrcode-class-examples.html .
However, I don't see how it could work. From my investigations of BarcodeQRCode#getImage it seems that OriginalData is not set while processing such a barcode, so it will always be null.
More than that, the code you mention belongs to iText 5, which is end of life and no longer maintained (with an exception of considerable security fixes), so it's recommended to update to iText 7.
As for iText 7, I do see how to achieve the same in Java, since barcode classes do have a createAwtImage method there. .NET, on the other hand, lacks such a functionality, so I'd day that one unfortunately couldn't do it in .NET.
There are some good reasons for that. iText's Images (and a barcode could be easily converted to an iText's Image object as shown here: https://kb.itextpdf.com/home/it7kb/faq/how-to-generate-2d-barcode-as-vector-image) represent a PDF's XObject. In PDF syntax, an image file (jpg, png, etc.) is an XObject with the raw image data stored inside. However, an XObject can also contain PDF syntaxt content (it is not just used for image files). So to render such a content one needs to process the data from PDF syntax to image syntax, which is not that easy. There are some means in Java's awt to do so, that's why it's implemented in Java. As for .NET, since there is no out-of-the-box means to convert PDF images to System.Drawing.Image, it was decided not to implement it.
To conclude, there is another iText product, pdfRender, which allows one to convert PDF files (and you could create a page just for a barcode) to images. Perhaps you might want to play with it: https://itextpdf.com/en/products/itext-7/convert-pdf-to-image-pdfrender
Related
I need to convert image files to PDF without using third party libraries in C#. The images can be in any format like (.jpg, .png, .jpeg, .tiff).
I am successfully able to do this with the help of itextsharp; here is the code.
string value = string.Empty;//value contains the data from a json file
List<string> sampleData;
public void convertdata()
{
//sampleData = Newtonsoft.Json.JsonConvert.DeserializeObject<List<string>>(value);
var jsonD = System.IO.File.ReadAllLines(#"json.txt");
sampleData = Newtonsoft.Json.JsonConvert.DeserializeObject<List<string>>(jsonD[0]);
Document document = new Document();
using (var stream = new FileStream("test111.pdf", FileMode.Create, FileAccess.Write, FileShare.None))
{
PdfWriter.GetInstance(document, stream);
document.Open();
foreach (var item in sampleData)
{
newdata = Convert.FromBase64String(item);
var image = iTextSharp.text.Image.GetInstance(newdata);
document.Add(image);
Console.WriteLine("Conversion done check folder");
}
document.Close();
}
But now I need to perform the same without using third party library.
I have searched the internet but I am unable to get something that can suggest a proper answer. All I am getting is to use it with either "itextsharp" or "PdfSharp" or with the "GhostScriptApi".
Would someone suggest a possible solution?
This is doable but not practical in the sense that it would very likely take way too much time for you to implement. The general procedure is:
Open the image file format
Either copy the encoded bytes verbatim to a stream in a PDF document you have created or decode the image data and re-encode it in a PDF stream (whether it's the former or latter depends on the image format)
Save the PDF
This looks easy (it's only three points after all :-)) but when you start to investigate you'll see that it's very complicated.
First of all you need to understand enough of the PDF specification to write a new PDF file from scratch, doing all of the right things. The PDF specification is way over 1000 pages by now; you don't need all of it but you need to support a good portion of it to write a proper PDF document.
Secondly you will need to understand every image file format you want to support. That by itself is not trivial (the TIFF file format for example is so broad that it's a nightmare to support a reasonable fraction of TIFF files out there). In some cases you'll be able to simply copy the bulk of an image file format into your PDF document (jpeg files fall in that category for example), that's a complication you want to support because uncompressing the JPEG file and then recompressing it in a PDF stream will cause quality loss.
So... possible? Yes. Plausible? No. Unless you have gotten lots and lots of time to complete this project.
The structure of the simpliest PDF document with one single page and one single image is the following:
- pdf header
- pdf document catalog
- pages info
- image
- image header
- image data
- page
- reference to image
- list of references to objects inside pdf document
Check this Python code that is doing the following steps to convert image to PDF:
Writes PDF header;
Checks image data to find which filter to use. You should better select just one format like FlateDecode codec (used by PDF to compress images without loss);
Writes "catalog" object which is basically is the array of references to page objects.
Writes image object header;
Writes image data (pixels by pixels, converted to the given codec format) as the "stream" object in pdf;
Writes "page" object which contains "image" object;
Writes "trailer" section with the set of references to objects inside PDF and their starting offsets. PDF format stores references of objects at the end of PDF document.
I would write my own ASP.NET Web Service or Web API service and call it within the app :)
This should be a pretty trivial programming task in C#, however after I have searched a while I simply cannot find anything relevant on how to remove metadata.
I want to remove jpg and png image metadata such as: folder path, shared with, owner and computer.
My application is an MVC 4 application. In my website users can upload an image I get this image at this ActionResult method
if (image != null)
{
photo.ImageFileName = image.FileName;
photo.ImageMimeType = image.ContentType;
photo.PhotoFile = new byte[image.ContentLength];
image.InputStream.Read(photo.PhotoFile, 0, image.ContentLength);
}
Photo is a property in the model, goes like this.
public byte[] PhotoFile { get; set; }
I imagine the way to remove above mentioned metadata or just all metadata, would be to use some coding like this
if (image != null)
{
image = image.RemoveAllMetaData; !!!
I dont mind using some 3rd party dll as long as it is compatible with NET 4.
Thanks.
'Metadata' here is a bit ambiguous--Do you mean the data which is required for a viewer to properly determine the image format so it can be displayed, saving only the raw image data? Or, more likely, do you mean the extra information, such as author, camera type, GPS location, etc, that is often added via the EXIF tags?
If you mean something like the EXIF data, there's a lot of programming material already on the web about how to add/modify/remove EXIF tags, and even some apps which already strips such tags: http://www.steelbytes.com/?mid=30 for example.
If you mean you just want the raw image data, you'll probably have to read and process the image first, since both JPEG and PNG do not contain simply the raw image data; It's encoded with various methods--which is why they contain metadata to tell you how to decode it in the first place. You'll have to learn/explore the JPEG and PNG data formats to extract the original raw image data (or a reasonable facsimile in the case of a "lossy" encoding).
All the above is well-documented on various websites which can be found on Google, and many include image manipulation libraries which can handle these chores for you. I suspect you just didn't know to search for something like "JPEG PNG EXIF METADATA".
BTW, EXIF applies to JPEG's, where EXIF is, loosely (and not fully technically correct) an addition of data (extension) to the end of the JPEG file, which can usually simply be truncated to remove. A quick Google search for me turned up something like libexif.sourceforge.net and other similar results.
I'm not entirely certain about the PNG format, but I believe the PNG format (which does call such items "metadata" as well) was written to include such data as part of the file format rather than an "extension" tagged on after the fact like EXIF is. PNG, however, is open source, and you can obtain libraries and code for manipulating them from the PNG website (www.libpng.org).
There's an app for that but it's written in Perl. It doesn't recompress the image and it's here http://www.sno.phy.queensu.ca/~phil/exiftool
Found it in this thread
How to remove EXIF data without recompressing the JPEG?
Do what all the social media websites do. Create a new image file, stream in the image byte data and use the file you created than the original one that was uploaded. Of course, now you will need to find out the original image's color depth and so on so that the image you create is not of a lower quality -- unless you need to do a disk or image resize as well.
While extracting text from PDF file using iTextSharp using the below piece of code, I am getting this error: “Could not find image data or EI” while debugging the code found that this error is coming in certain pages but not all pages, then further investigated and also found that generally there are two types image in pdf xObject image and Inline Image and using the below piece of code Inline Image ca not be handled. There are few few comments in this issue in other similar post that suggested to use latest version(5.5.0) itextsharp, that also i did but no luck. My basic purpose is to extract the text in the page not image. How can I handle the Inline image or how can I extract only the text regardless what type of image the page having.
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
PdfContentByte pdfData = pdfStamper.GetUnderContent(page);
LocTextExtractionStrategy its = new LocTextExtractionStrategy();
pdfData = pdfStamper.GetUnderContent(page);
string extractedTextInCurrentPage=PdfTextExtractor.GetTextFromPage(pdfReader, page, its);//In this line exception is throwing
}
Please share your PDF.
This is why:
Your PDF contains an inline image. Inline images are problematic in ISO-32000-1, but I personally saw to it that the problem will be solved in ISO-32000-2 (for PDF 2.0, to be expected in 2017).
In ISO-32000-1, an inline images starts with the BI operator, followed by some parameters. The length of the image bytes isn't one of those parameters. The actual image bytes are enclosed by an ID and an EI operator.
Software parsing PDF syntax needs to search for these operators and usually does a good job at it: find BI, then take the bytes between ID and EI. However: what to do when you encounter an image of which EI is part of the image bytes?
This hardly ever happens, but it was reported to us as a problem and we solved this in recent iText versions by converting the bytes between ID and EI to an image. If that fails, iText continues searching for the next EI. If iText doesn't find that EI parameter, you get the exception you mention.
This is a cumbersome process and, being a member of the ISO committee that writes the PDF standards, I introduced a new inline image parameter into the spec: the parameter /L will informs parsers exactly how many bytes are to be expected between the ID and EI operators. At the same time, I saw to it that the recommendation of keeping inline images smaller than 4 KB became normative: in PDF 2.0, it will be illegal to have inline images with more than 4096 bytes. Of course: this doesn't help you. PDF 2.0 doesn't exist yet. My work in the ISO committee only helps to solve the problem on the long term.
On the short term, we've written a work-around that solves the problem for the PDFs that were reported to us, but apparently, you've found a PDF that escapes the workaround. If you want us to solve the problem, you'll have to share the PDF.
i am having a trouble in retrieving images and text in a pdf file at the same, i was able to get images and text in a pdf file but not at the same time (this will cause a question of whether to render the image first or the text first for example in my panel control?), maybe if you guys can help me define what does each constants in pdfname means? i tried using pdfname.all but it returns null, but when using pdfname.resources it returns procset, font and xobject. i used xobject for image, but what are procset and font (could this be the style of the text? does it have pdfname.text for retrieving text)?
thanks in advance.
First of all,
i am having a trouble in retrieving images and text in a pdf file at the same
for this task you should use the iText(Sharp) parser API. In iTextSharp you essentially implement IRenderListener (an interface with methods for being informed about (bitmap) images and text fragments in a content stream) and process the page contents with it:
PdfReader reader = new PdfReader(...);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
int pageNumber = [... the number of the page you are interested in; may be a loop variable ...];
IRenderListener listener = new [... your IRenderListener implementation ...]
parser.ProcessContent(pageNumber, listener);
You ask
whether to render the image first or the text first for example in my panel control
The IRenderListener methods also retrieve information on the location of the bitmap or text fragment in question.
For ideas how the text fragments may be combined in your listener, you may want to be inspired by the implementations SimpleTextExtractionStrategy or LocationTextExtractionStrategy present in iTextSharp.
If you insist on doing it manually, though...
maybe if you guys can help me define what does each constants in pdfname means?
You find the definitions of what the names map to in the PDF specification ISO 32000-1:2008 a copy of which Adobe made available here.
when using pdfname.resources it returns procset, font and xobject. i used xobject for image, but what are procset and font (could this be the style of the text?
The contents of the page Resource Dictionaries are explained in section 7.8.3 of the specification.
does it have pdfname.text for retrieving text)?
You'll find how test is presented in page content streams and xobjects in section 9.
I want to add image to pdf file. I'm using iTextSharp for this.
I have the following code:
var imageBanner = iTextSharp.text.Image.GetInstance(bannerImagePath);
The problem in that the RawData property is equal NULL for jpg images, but for png is all ok.
Please read chapter 10 of the book "iText in Action". The Image class is abstract. It has different implementations for different image types. Some image types exist in PDF. For instance: a JPG (DCTDecode) can be copied literally into a PDF. File types such as PNG and GIF don't exist in PDF, so they need to be converted to raw data first; they are compressed (FlateDecode) later on in the process.
As there's absolutely no need for any 'processing' when dealing with JPGs, no memory is wasted on creating a raw image. It would be bad if RawData weren't null, hence my question: why is it a problem for you? you should be happy that RawData is null!