Inputting image data to PDF with iText

I have an ImageButton on my website with a dynamic source that looks like this: "data:image/svg+xml;base64,...."
I am trying to insert that image into a PDF. This is the code I use:
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(new Uri(((ImageButton) FindControl(fieldKey)).ImageUrl));
I get either a "The URI is empty" error or a "path not found" error.
Any ideas how to approach this?

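I doubt anyone will search for this, but I figured it out, so here is the answer.
To put a data-URI image into a PDF, strip the prefix and decode the remaining Base64 text into a byte array. The prefix passed to Replace must match the actual source exactly (the code here strips a PNG prefix); otherwise nothing is removed and FromBase64String throws.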
string theSource = ((ImageButton)FindControl(fieldKey)).ImageUrl.Replace("data:image/png;base64,", "");
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(Convert.FromBase64String(theSource));
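A slightly more defensive variant of the same idea is sketched here: instead of hard-coding one prefix, it cuts the data URI at the first comma. The DataUri class and its ToBytes method are hypothetical names made up for illustration; they are not part of iTextSharp or ASP.NET.

```csharp
using System;

public static class DataUri
{
    // Hypothetical helper: returns the decoded payload of a base64 data URI,
    // whatever MIME type the header declares.
    public static byte[] ToBytes(string dataUri)
    {
        if (dataUri == null || !dataUri.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Not a data URI.");

        // Everything before the first comma is the header, e.g. "data:image/png;base64".
        int comma = dataUri.IndexOf(',');
        if (comma < 0)
            throw new FormatException("Data URI has no payload separator.");

        string header = dataUri.Substring(5, comma - 5);
        if (!header.EndsWith(";base64", StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Only base64 data URIs are handled here.");

        return Convert.FromBase64String(dataUri.Substring(comma + 1));
    }
}
```

The result can be passed to iTextSharp.text.Image.GetInstance(byte[]) as in the answer. Note that this only decodes the bytes; as far as I know iTextSharp's Image class expects a raster format, so an SVG payload like the one in the question would probably need converting first.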

Related

How can I convert blob image from database to html image src

I have a blob field in the database that stores an image. I want to use it as the src of an HTML img tag. How can this be done? I am using ASP.NET MVC with C# as my backend.

One way is to convert it to a Base64 string: first read the blob into a byte array, then encode that array as Base64.
For example:
var imgUrl = $"data:{imageType};base64,{Convert.ToBase64String(buffer)}";
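The line above needs an imageType from somewhere. If the database does not store it, one option is to guess it from the first bytes of the blob. The sketch below assumes only PNG, JPEG and GIF need recognising; ImageDataUri, GuessMimeType and Build are hypothetical names used for illustration.

```csharp
using System;

public static class ImageDataUri
{
    // Guess the MIME type from well-known file signatures (magic numbers).
    public static string GuessMimeType(byte[] data)
    {
        if (data.Length >= 4 && data[0] == 0x89 && data[1] == 0x50 && data[2] == 0x4E && data[3] == 0x47)
            return "image/png";   // 0x89 'P' 'N' 'G'
        if (data.Length >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF)
            return "image/jpeg";  // JPEG start-of-image marker
        if (data.Length >= 4 && data[0] == 0x47 && data[1] == 0x49 && data[2] == 0x46 && data[3] == 0x38)
            return "image/gif";   // "GIF8"
        return "application/octet-stream"; // unknown: the browser will most likely not render it
    }

    // Build the value for an <img src="..."> attribute.
    public static string Build(byte[] buffer)
    {
        return $"data:{GuessMimeType(buffer)};base64,{Convert.ToBase64String(buffer)}";
    }
}
```

For large images it may be better to serve the bytes from a separate controller action instead, since a data URI makes the HTML roughly a third larger than the image itself.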

Asp.Net : Convert PDF to Image

I have a pdf file in a base64 string format. I need to convert it to an image (any type) to use further in my code.
I have searched SO as well as the web and I have not been successful with anything.
Does anyone have a tried and true method for this?
I can use a third party. I just need a working one!!
Thank you

I used the Apitron PDF Rasterizer to convert my file. It is very simple to use and works great: no missing DLLs, and clear documentation. However, you do need to purchase it.
Thanks to all for the assistance.

Base64 is a web-friendly string representation of a byte array. You can convert it back to a byte array like this:
byte[] decodedBytes = Convert.FromBase64String(encodedText);
Then you can simply save it to a file:
System.IO.File.WriteAllBytes("file.pdf", decodedBytes);
Finally, convert the PDF to an image.
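Before handing the file to a rasterizer, it can help to confirm that the decoded bytes really are a PDF. The sketch below checks for the "%PDF-" signature at the very start of the data, which is a simplifying assumption; PdfBase64, LooksLikePdf and SaveDecodedPdf are hypothetical names. The rasterizing step itself depends on the library chosen and is not shown.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfBase64
{
    // True when the data starts with the "%PDF-" file signature.
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] magic = Encoding.ASCII.GetBytes("%PDF-");
        if (data.Length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }

    // Decode the Base64 text and write it to disk, refusing anything that is not a PDF.
    public static void SaveDecodedPdf(string encodedText, string path)
    {
        byte[] decodedBytes = Convert.FromBase64String(encodedText);
        if (!LooksLikePdf(decodedBytes))
            throw new InvalidDataException("Decoded data does not start with %PDF-.");
        File.WriteAllBytes(path, decodedBytes);
    }
}
```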

Extracting text from PDF with iTextSharp is not working for some PDF

I am using the following code to extract text from the first page of PDF files with iTextSharp:
public static string ExtractTextFromPDFFirstPage(string fileName)
{
    string text = null;
    using (var pdfReader = new PdfReader(fileName))
    {
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        text = PdfTextExtractor.GetTextFromPage(pdfReader, 1, strategy);
        text = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text)));
    }
    return text;
}
It works quite well for many PDFs, but not for some others.
Working PDF: http://data.hexagosoft.com/LFBO.pdf
Non-working PDF: http://data.hexagosoft.com/LFBP.pdf
These two PDFs seem quite similar, but one works and the other does not.
I guess the fact that their producer tags differ is a clue here.
Another clue is that this function works for any other page of the PDF that has no chart.
I also tried Ghostscript, without success.
The Encoding line seems to be useless as well.
How can I extract the text of the first page of the non-working PDF using iTextSharp?
Thanks

Both documents use fonts with unofficial glyph names in their Encoding/Differences array, and neither uses a ToUnicode map. The glyph naming seems fairly straightforward: the number following the MT prefix is the ASCII code of the glyph.
The first document works because the mapping is not changed at all, so iText will use the default encoding (I guess):
/Differences[65/MT65/MT66/MT67 71/MT71/MT72/MT73 76/MT76 78/MT78 83/MT83]
The other document really changes the mapping:
/Differences [2 /MT76 /MT105 /MT103 /MT104 /MT116 /MT110 /MT32 /MT97 /MT100 /MT115 /MT58 ]
This means that, for example, character code 2 should map to the glyph named MT76. That is an unofficial/private glyph name that iText doesn't know, so it has no information beyond the character code 2 and will use that code in the final result (I guess).
Without implementing logic for the MT-prefixed glyph names, it is impossible to get the correct text out of this document. In any case, nothing defines that a glyph name beginning with MT followed by an integer maps to the ASCII value. That is either an accident or a convention of the font designer or creation tool, wherever the font came from.
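To make the idea concrete, here is a rough sketch of what such logic could look like. It relies on the unverified convention described above (MT followed by a decimal ASCII code) and on the guess that the extracted string contains the raw character codes. MtGlyphMap, Parse and Remap are hypothetical names; none of this is iTextSharp API, and getting the Differences array out of the PDF is left aside.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class MtGlyphMap
{
    // Parse the text of a /Differences array into a map from character code to ASCII character.
    // A number sets the current code; each following glyph name takes that code, then the code increments.
    public static Dictionary<int, char> Parse(string differences)
    {
        var map = new Dictionary<int, char>();
        int code = -1; // no code seen yet
        foreach (Match m in Regex.Matches(differences, @"/(?<name>[^\s/\[\]]+)|(?<code>\d+)"))
        {
            if (m.Groups["code"].Success)
            {
                code = int.Parse(m.Groups["code"].Value);
                continue;
            }
            if (code < 0) continue; // e.g. the "/Differences" key itself

            string name = m.Groups["name"].Value;
            // Assumed convention: "MT" + decimal ASCII code.
            if (name.StartsWith("MT", StringComparison.Ordinal)
                && int.TryParse(name.Substring(2), out int ascii)
                && ascii >= 0 && ascii < 128)
            {
                map[code] = (char)ascii;
            }
            code++;
        }
        return map;
    }

    // Replace every character whose code appears in the map; leave the rest untouched.
    public static string Remap(string extracted, Dictionary<int, char> map)
    {
        var sb = new StringBuilder(extracted.Length);
        foreach (char c in extracted)
        {
            sb.Append(map.TryGetValue(c, out char mapped) ? mapped : c);
        }
        return sb.ToString();
    }
}
```

With the second document's array, code 2 maps to 'L' (ASCII 76) and code 12 to ':' (ASCII 58). Each font in the PDF has its own Differences array, so a real implementation would need one map per font.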

The second PDF (LFBP.pdf) contains an incorrect mapping from glyphs to text: you see the correct glyphs, but the text representation was not encoded correctly when the PDF was generated. If you have a lot of files like this, a working approach could be:
detect broken pages while extracting text by searching for a phrase that should appear on every page, such as "service"
process those pages separately using OCR, with a tool such as Tesseract and its .NET wrapper
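The detection step can be sketched as follows, assuming the text of each page has already been extracted into a list. BrokenPageDetector and FindPagesNeedingOcr are hypothetical names, and the OCR call itself is not shown.

```csharp
using System;
using System.Collections.Generic;

public static class BrokenPageDetector
{
    // Returns the 1-based numbers of pages whose extracted text lacks the expected phrase.
    public static List<int> FindPagesNeedingOcr(IReadOnlyList<string> pageTexts, string expectedPhrase)
    {
        var broken = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            string text = pageTexts[i] ?? string.Empty;
            if (text.IndexOf(expectedPhrase, StringComparison.OrdinalIgnoreCase) < 0)
            {
                broken.Add(i + 1); // PDF page numbers start at 1
            }
        }
        return broken;
    }
}
```

The check is only as good as the phrase: it has to be something that genuinely appears on every page, otherwise healthy pages get sent to OCR as well.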

FromBase64String/UTF Encoding

My issue concerns a string of data that I'm getting back from an API call. I'm passing the raw data into FromBase64String and then decoding the byte array back to a string. I'm expecting a valid pdfsharp return that I'm saving to a file. None of the decoded string values below contain the correct data. I know the original Base64-encoded string returned by the API is valid, since I can open it in Notepad++ and use a Base64 decoder to create a properly formatted PDF document.
byte[] todecode_byte = Convert.FromBase64String(data);
string decodedUTF7 = Encoding.UTF7.GetString(todecode_byte);
string decodedUTF8 = Encoding.UTF8.GetString(todecode_byte);
The closest representation to what I think it should be (the Notepad++ converted version) is the UTF7 one, but some data seems to be missing from the images embedded in the document. The UTF8 version has some structural differences compared to the working document.
For example...
My control...
%PDF-1.7
%ÓôÌá
1 0 obj
<<
UTF7...
%PDF-1.7
%ÓôÌá
1 0 obj
<<
UTF8...
%PDF-1.7
%ï¿½ï¿½ï¿½ï¿½
1 0 obj
<<
But, again, the UTF7 version seems to have issues with the images that are embedded further down in the document. Either way, both versions create an 88k PDF document that opens as a blank page. The control (made with Notepad++), when saved as a PDF document, is about half that size and opens displaying all of the correct information.

"I'm expecting a valid pdfsharp return that I'm saving to a file."
If it's meant to be a PDF file, I wouldn't try to convert that to a string at all. It's simply not text - it's binary data. It should be as simple as:
byte[] binaryData = Convert.FromBase64String(data);
File.WriteAllBytes("file.pdf", binaryData);
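To see why the string round trip damages the file, here is a small sketch that decodes bytes as UTF-8 and encodes them again. BinaryRoundTrip and SurvivesUtf8 are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class BinaryRoundTrip
{
    // True when decoding the bytes as UTF-8 text and re-encoding gives back the same bytes.
    public static bool SurvivesUtf8(byte[] original)
    {
        string asText = Encoding.UTF8.GetString(original);
        byte[] back = Encoding.UTF8.GetBytes(asText);
        return Convert.ToBase64String(back) == Convert.ToBase64String(original);
    }
}
```

Plain ASCII such as "%PDF-1.7" survives, but the binary marker line of the PDF (bytes 25 D3 F4 CC E1, shown as %ÓôÌá in the question) does not: each byte that is invalid as UTF-8 is replaced by U+FFFD, which is written back as the three bytes EF BF BD. That is the ï¿½ visible in the UTF8 dump, and it is consistent with the output file being larger than the original.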

Insert image into xml file using c#

I've looked everywhere for the answer to this question but can't find anything, so I'm hoping you can help me here.
Basically, I want to insert an image into an element of an XML document that I have, using C#.
I understand I have to turn it into bytes, but I'm unsure how to do this and how to then insert it into the correct element.
Please help, as I am a newbie.

Read all the bytes into memory using File.ReadAllBytes().
Convert the bytes to a Base64 string using Convert.ToBase64String().
Write the Base64-encoded string to your element content.
Doneski!
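The steps above can be sketched with LINQ to XML as follows. XmlImage, ToElement and FromElement are hypothetical names, and the element name is whatever your schema uses.

```csharp
using System;
using System.Xml.Linq;

public static class XmlImage
{
    // Wrap the image bytes in an element whose content is the Base64 text.
    public static XElement ToElement(string elementName, byte[] imageBytes)
    {
        return new XElement(elementName, Convert.ToBase64String(imageBytes));
    }

    // Reverse step: read the element content and decode it back to bytes.
    public static byte[] FromElement(XElement element)
    {
        return Convert.FromBase64String(element.Value);
    }
}
```

A call such as XmlImage.ToElement("Image", File.ReadAllBytes("photo.png")) would produce the element to add to the document; "Image" and "photo.png" are made-up names here.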

Here's an example in C# for writing and reading images to/from XML.

You can use a CDATA part or simply put all the bytes in their hexadecimal form as a string.
Another option is to use a base64 encoding
The element you use is up to you.
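For the hexadecimal option, a minimal sketch could look like this. HexCodec is a hypothetical name; the resulting string goes into the element content in the same way as a Base64 string would.

```csharp
using System;

public static class HexCodec
{
    // Two uppercase hex digits per byte, no separators.
    public static string Encode(byte[] data)
    {
        return BitConverter.ToString(data).Replace("-", "");
    }

    // Reverse step: every pair of hex digits becomes one byte.
    public static byte[] Decode(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new FormatException("Hex string must have an even number of digits.");

        var result = new byte[hex.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
        }
        return result;
    }
}
```

Hex doubles the size of the data, whereas Base64 adds about a third, so Base64 is usually the more compact choice for images.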

http://www.dreamincode.net/code/snippet1335.htm seems to do exactly what you want to do. It might be something you might want to try out. Note that it is in VB.NET which you can easily convert to C#.

XML can only contain characters, it can't contain an image. There are various ways you can represent an image using characters, for example by encoding the image in PNG and then encoding the PNG in base64; or you could generate an element that contains a link to a URI from where the image can be retrieved. All such conventions have to be agreed between sender and recipient. So before you rush into base64 encoding, check that this is what the recipient expects.
