How to set CurrentCulture for a PDF document in iTextSharp (C#)

Document doc = new Document(iTextSharp.text.PageSize.LETTER.Rotate(), 10, 10, 5, 5);
string nazivPDFa = txt_datumFiskalnogIsecka.Text +" "+ txt_nazivKompanije.Text;
PdfWriter pdf = PdfWriter.GetInstance(doc, new FileStream(nazivPDFa + ".pdf", FileMode.CreateNew));
doc.Open();
Paragraph klijent = new Paragraph(ispisiKlijenta.Text);
PdfPTable tabelaNK = new PdfPTable(1);
PdfPCell kl = new PdfPCell(new Phrase(klijent));
kl.BorderColor = BaseColor.BLACK;
tabelaNK.AddCell(kl);
doc.Add(tabelaNK);
I have created a PDF document with iTextSharp, and when I fill it with Serbian text, it doesn't show characters like š, ć, č, đ, ž.
Example: I wrote "nešto" and I got "neto".
I have a lot of things in that PDF, and it would take forever to set the current culture on every element.

You aren't using a font when you create your Paragraph. In that case, the Standard Type 1 font Helvetica will be used and it won't be embedded. As Helvetica only supports a limited set of characters, your glyphs won't appear. This is very well documented in the official documentation. It's a pity you try to run before you've learned how to walk.
Several things can be at play.
First, you need to make sure that the encoding of ispisiKlijenta.Text is correct. For instance, is that string in CP1250 or in Unicode? When you write KlijenT.Add("Tekući račun: " +txt_brRacunaKompanije.Text);, you are writing bad code (at least if you were writing Java) because you introduce special characters in your source that may disappear when the code is compiled or executed in a different environment with a different encoding.
Then, you need to provide a font program that knows how to draw the glyphs you need. For instance: Helvetica doesn't know about CP1250, but arial.ttf does (and so do many other fonts, but you need to check first).
Then, you need to decide how you'll use that font. Will you embed the font as a simple font, as is done in the EncodingExample where we create this PDF, or will you embed the font as a composite font, as is done in the UnicodeExample where we create this PDF? Both PDFs may look identical to you, but they aren't. The choice you make will have an impact on the design of your application.
Once you've made a decision about the font and once you've created a Font object, e.g. named font, you need to use that object when creating a Paragraph.
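The steps above could look roughly like the sketch below. It assumes iTextSharp 5.x and a TrueType font file that contains the Serbian glyphs; the class name SerbianTextDemo, the method Write and the fontPath parameter are made up for illustration and are not part of the original code.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class SerbianTextDemo
{
    // fontPath is an assumption: any TrueType font that contains š, ć, č, đ, ž
    // (arial.ttf is the one mentioned in the answer).
    public static void Write(Stream output, string fontPath, string text)
    {
        // IDENTITY_H embeds the font as a composite (Unicode) font.
        // Passing BaseFont.CP1250 instead would embed it as a simple font.
        BaseFont baseFont = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(baseFont, 12);

        Document doc = new Document(PageSize.LETTER.Rotate(), 10, 10, 5, 5);
        PdfWriter.GetInstance(doc, output);
        doc.Open();

        // The font is passed explicitly, so Helvetica is no longer the default.
        PdfPTable table = new PdfPTable(1);
        PdfPCell cell = new PdfPCell(new Phrase(text, font));
        cell.BorderColor = BaseColor.BLACK;
        table.AddCell(cell);
        doc.Add(table);

        doc.Close();
    }
}
```

A call could look like SerbianTextDemo.Write(new FileStream("out.pdf", FileMode.Create), @"c:\windows\fonts\arial.ttf", "nešto"). Creating the Font once and reusing it for every Paragraph or Phrase avoids touching each element separately.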

Related

Try To Understand ITextSharp

I am trying to build an application that converts a PDF to an Excel file with C#.
I searched for a library to help me with this, but most of them are commercially licensed, so I ended up with iTextSharp.dll.
It's good that it is free, but I rarely find good open-source documentation for it.
These are some of the links that I have read:
https://yoda.entelect.co.za/view/9902/extracting-data-from-pdf-files
https://www.mikesdotnetting.com/article/80/create-pdfs-in-asp-net-getting-started-with-itextsharp
http://www.thedevelopertips.com/DotNet/ASPDotNet/Read-PDF-and-Convert-to-Stream.aspx?id=34
There are more, but most of them do not really explain what the code does.
So this is the most common iText code in C#:
StringBuilder text = new StringBuilder(); // my new file that will have pdf content?
PdfReader pdfReader = new PdfReader(myPath); // This maybe how IText read the pdf?
for (int page = 1; page <= pdfReader.NumberOfPages; page++) // looping for read all content in pdf?
{
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy(); // ?
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy); // ?
currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.UTF8.GetBytes(currentText))); // maybe how IText convert the data to text?
text.Append(currentText); // maybe the full content?
}
pdfReader.Close(); // to close the PdfReader?
As you can see, I still do not have a clear understanding of the iText code that I have. Please tell me whether my understanding is correct, and explain the code that I still do not understand.
Thank you.
Let me start by explaining a bit about PDF.
PDF is not a 'what you see is what you get'-format.
Internally, PDF is more like a file containing instructions for rendering software. Unless you are working with a tagged PDF file, a PDF document does not naturally have a concept of 'paragraph' or 'table'.
If you open a PDF in notepad for instance, you might see something like
7 0 obj
<</BaseFont/Helvetica-Oblique/Encoding/WinAnsiEncoding/Subtype/Type1/Type/Font>>
endobj
Instructions in the document get gathered into 'objects' and objects are numbered, and can be cross-referenced.
As Bruno already indicated in the comments, this means that finding out what a table is, or what the content of a table is, can be really hard.
The PDF document itself can only tell you things like:
object 8 is a line from [50, 100] to [150, 100]
object 125 is a piece of text, in font Helvetica, at position [50, 110]
With the iText core library you can
get all of these objects (which iText calls PathRenderInfo, TextRenderInfo and ImageRenderInfo objects)
get the graphics state when the object was rendered (which font, font-size, color, etc)
This can allow you to write your own parsing logic.
For instance:
gather all the PathRenderInfo objects
remove everything that is not a perfect horizontal or vertical line
make clusters of everything that intersects at 90 degree angles
if a cluster contains more than a given threshold of lines, consider it a table
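A minimal sketch of that heuristic is shown below. It does not use iText at all: Segment and TableHeuristic are hypothetical types, and the sketch assumes the line segments have already been extracted from the PathRenderInfo objects into plain coordinates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a straight line segment taken from the page's path data.
public sealed class Segment
{
    public readonly double X1, Y1, X2, Y2;
    public Segment(double x1, double y1, double x2, double y2) { X1 = x1; Y1 = y1; X2 = x2; Y2 = y2; }
    public bool IsHorizontal(double eps) { return Math.Abs(Y1 - Y2) <= eps && Math.Abs(X1 - X2) > eps; }
    public bool IsVertical(double eps) { return Math.Abs(X1 - X2) <= eps && Math.Abs(Y1 - Y2) > eps; }
}

public static class TableHeuristic
{
    // Returns every cluster of axis-aligned lines that has at least minLines members.
    public static List<List<Segment>> FindTableCandidates(
        IEnumerable<Segment> segments, int minLines, double eps = 0.5)
    {
        // Step 1 and 2: keep only horizontal or vertical lines.
        var lines = segments.Where(s => s.IsHorizontal(eps) || s.IsVertical(eps)).ToList();

        // Step 3: union-find over lines that cross at a right angle.
        int[] parent = Enumerable.Range(0, lines.Count).ToArray();
        Func<int, int> find = null;
        find = i => parent[i] == i ? i : (parent[i] = find(parent[i]));

        for (int i = 0; i < lines.Count; i++)
            for (int j = i + 1; j < lines.Count; j++)
                if (CrossAtRightAngle(lines[i], lines[j], eps))
                    parent[find(i)] = find(j);

        // Step 4: a cluster with enough lines is treated as a table candidate.
        return Enumerable.Range(0, lines.Count)
            .GroupBy(i => find(i))
            .Where(g => g.Count() >= minLines)
            .Select(g => g.Select(i => lines[i]).ToList())
            .ToList();
    }

    static bool CrossAtRightAngle(Segment a, Segment b, double eps)
    {
        if (a.IsHorizontal(eps) && b.IsVertical(eps)) return Crosses(a, b, eps);
        if (a.IsVertical(eps) && b.IsHorizontal(eps)) return Crosses(b, a, eps);
        return false;
    }

    // h is horizontal, v is vertical: they touch if v's x lies within h's span
    // and h's y lies within v's span (with a small tolerance).
    static bool Crosses(Segment h, Segment v, double eps)
    {
        double minX = Math.Min(h.X1, h.X2), maxX = Math.Max(h.X1, h.X2);
        double minY = Math.Min(v.Y1, v.Y2), maxY = Math.Max(v.Y1, v.Y2);
        return v.X1 >= minX - eps && v.X1 <= maxX + eps
            && h.Y1 >= minY - eps && h.Y1 <= maxY + eps;
    }
}
```

The tolerance eps and the threshold minLines are tuning knobs chosen for this sketch; real documents usually need further checks, for example tables drawn with thin filled rectangles instead of lines.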
Luckily, the pdf2Data solution (an iText add-on) already does that kind of thing for you.
For more information go to http://pdf2data.online/

itextsharp editor for creating tags

I would like to know if there is a program for editing iTextSharp code.
Right now I am writing code directly in VS, like:
// Delivery address details
int left_margin = 40;
int top_margin = 720;
writeText(cb, "Delivery address", left_margin, top_margin, f_cb, 10);
writeText(cb, drHead["delCustomerName"].ToString(), left_margin, top_margin-12, f_cn, 10);
writeText(cb, drHead["delAddress1"].ToString(), left_margin, top_margin-24, f_cn, 10);
writeText(cb, drHead["delAddress2"].ToString(), left_margin, top_margin-36, f_cn, 10);
writeText(cb, drHead["delAddress3"].ToString(), left_margin, top_margin-48, f_cn, 10);
writeText(cb, drHead["delZipcode"].ToString(), left_margin, top_margin-60, f_cn, 10);
writeText(cb, drHead["delCity"].ToString() + ", " + drHead["delCountry"].ToString(), left_margin+65, top_margin-60, f_cn, 10);
But I would like to use an editor for this because it is horrible to write by hand. I am searching for an editor but I cannot find one.
I'm guessing that you inherited some code, right? I say that because writeText() is not from iTextSharp, it is someone's specific implementation that wraps around iTextSharp's code.
iTextSharp is a library that ultimately produces PDFs and as such is bound by the PDF specification. This is first and foremost the most important thing to understand. Even if you already know this you must always keep this top of mind.
iText runs in two basic modes, 1) full abstraction where you add "paragraphs" and "tables" and 2) manual placement mode where you say "write this text using this font at this coordinate and don't worry, I've measured everything so I know where to insert line breaks so it doesn't look weird."
Full abstraction mode (my terminology, not theirs) uses the Document object and you create new Paragraph objects and Add them to the Document and iText figures the rest out for you. There's less control but if you can live in this mode your life will be so much easier.
Manual placement mode (once again, my terminology, not theirs) uses instances of the PdfWriter and PdfContentByte classes to issue commands that are much closer to the PDF spec. Usage of this mode kind of assumes that you've at least partially read the spec. For instance, you should be aware that "paragraphs" and "tables" don't actually exist in a PDF.
The variable that you are using, cb, is almost definitely an instance of a PdfContentByte so you are using manual placement mode.
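To make the difference concrete, here is a small sketch of both modes side by side, assuming iTextSharp 5.x. The WriteText helper is only a guess at what a wrapper like the one in the question might do; the real writeText() is not shown in the question, so treat its body as hypothetical.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PlacementModesDemo
{
    public static void Write(Stream output)
    {
        Document doc = new Document(PageSize.A4);
        PdfWriter writer = PdfWriter.GetInstance(doc, output);
        doc.Open();

        // Full abstraction: iText decides where the text goes.
        doc.Add(new Paragraph("Delivery address"));

        // Manual placement: you supply the coordinates yourself.
        PdfContentByte cb = writer.DirectContent;
        BaseFont bf = BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
        WriteText(cb, "Delivery address", 40, 720, bf, 10);

        doc.Close();
    }

    // Hypothetical helper, similar in spirit to the writeText() from the question.
    static void WriteText(PdfContentByte cb, string text, float x, float y, BaseFont font, float size)
    {
        cb.BeginText();
        cb.SetFontAndSize(font, size);
        cb.ShowTextAligned(PdfContentByte.ALIGN_LEFT, text, x, y, 0);
        cb.EndText();
    }
}
```

If the layout is mostly stacked lines of text, as in the delivery address block, the first mode removes the need to compute offsets such as top_margin-12 by hand.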
Now to answer your question.
No, there are no GUI programs to edit code that's specific to iTextSharp. However, there are programs such as Adobe Acrobat, Adobe Illustrator and Adobe InDesign that do know how to work with PDFs. There are probably free ones out there, too. You can use these programs, find the x,y coordinates of things and manually write those into your code.

C# iTextSharp multi fonts in a single cell

First off, I'm not that great with C# and it's been a while since I've worked with it.
I'm making a windows form for a friend that delivers packages. So I want to transfer his current paper form, into a .pdf with the library iTextSharp. He still needs to print the form to get the customer signature and so on.
What I need:
I want the table to have a little headline, "Company name" for example, the text should be a little smaller than the text input from the windows form(richTextBox1)
Currently I'm using cells and was wondering if I can use 2 different font sizes within the same cell?
What I have:
table.AddCell("Static headline" + Chunk.NEWLINE + richTextBox1.Text);
What I "want":
var normalFont = FontFactory.GetFont(FontFactory.HELVETICA, 9);
var boldFont = FontFactory.GetFont(FontFactory.HELVETICA_BOLD, 12);
table.AddCell("Static headline", boldFont + Chunk.NEWLINE + richTextBox1.Text, normalFont);
You're passing a String and a Font to the AddCell() method. That's not going to work. You need the AddCell() method that takes a Phrase object or a PdfPCell object as parameter.
A Phrase is an object that consists of different Chunks, and the different Chunks can have different font sizes. Please read chapter 2 of my book for more info about this object.
Phrase phrase = new Phrase();
phrase.Add(
new Chunk("Some BOLD text", new Font(Font.FontFamily.TIMES_ROMAN, 12, Font.BOLD))
);
phrase.Add(new Chunk(", some normal text", new Font()));
table.AddCell(phrase);
A PdfPCell is an object to which you can add different objects, such as Phrases, Paragraphs, Images,...
PdfPCell cell = new PdfPCell();
cell.AddElement(new Paragraph("Hello"));
cell.AddElement(list);
cell.AddElement(image);
In this snippet list is of type List and image is of type Image.
The first snippet uses text mode; the second snippet uses composite mode. Cells behave very differently depending on the mode you use.
This is all explained in the documentation; you can find hundreds of C# examples here.

consolidate the fonts between merges pdfs itextsharp C#

I need to merge multiple PDFs together. I am using iTextSharp to create all the PDFs. I need to reduce the size of the PDFs to the lowest possible size. I know the fonts are being duplicated for each PDF. Is there a way to use only one set of fonts throughout the merged PDF? For example, pdf1 is 2.8 MB and pdf2 is 2.8 MB; I merge them together and the result is about 5.7 MB. I know for a fact that both of those PDFs use the same font, but the font data is being duplicated even though it's in the same PDF.
I tried setting the compression properties to best compression and enabling full compression, and that barely reduced the size.
However, when I ran the PDF through Acrobat X Pro and optimized it, the size was reduced by more than 90%, from about 160 MB to 5 MB. The usage audit says that 90% of the PDF is fonts before optimizing.
Is there a way to consolidate the fonts between merged PDFs?
My answer consists of two parts:
You're not telling us how you're merging the PDFs. Let's hope you've read the official documentation and that you're using PdfSmartCopy. If not, you're doing it wrong. PdfSmartCopy examines the content of the different PDFs and reuses possibly redundant objects (such as reused images, XObjects, fonts). Note that there were some bugs in earlier versions of PdfSmartCopy so please make sure you're using the latest version.
If the different PDFs use different subsets of the same font, you're out of luck. iText doesn't merge font subsets. Merging different font subsets would involve rewriting content streams, creating new fonts if we're talking about simple font sets that require more than 256 characters if the subsets are merged, etc...
You could rename the subsets.
For example, if you had
Helvetica (subset)
and
Helvetica (subset)
you would create
Helvetica-1 (subset)
and
Helvetica-2 (subset)
if a binary comparison of the font streams shows that they are different implementations.
According to
https://turreta.com/2013/12/13/remove-duplicate-fonts-in-pdf-files/
iTextSharp.text.Document tdocument = new iTextSharp.text.Document();
iTextSharp.text.pdf.PdfSmartCopy smart =
    new iTextSharp.text.pdf.PdfSmartCopy(tdocument,
        new FileStream(@"newAddressPath", FileMode.Create));
tdocument.Open();
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(@"yourPDFMergeFile");
// Where the magic happens: PdfSmartCopy reuses identical objects across the imported pages
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    smart.AddPage(smart.GetImportedPage(reader, i));
}
// Close the document first so the output is written, then release the reader
tdocument.Close();
reader.Close();

Not applying the CSS while generating PDF using iTextsharp.dll

I am generating PDF using iTextSharp.dll, but the problem is that I am not able to apply that CSS. I have one div:
<div id="personal" class="headerdiv">
Personal Data
</div>
now my .aspx.cs code is like this:
iTextSharp.text.html.simpleparser.StyleSheet styles = new iTextSharp.text.html.simpleparser.StyleSheet();
styles.LoadTagStyle("#headerdiv", "height", "30px");
styles.LoadTagStyle("#headerdiv", "font-weight", "bold");
styles.LoadTagStyle("#headerdiv", "font-family", "Cambria");
styles.LoadTagStyle("#headerdiv", "font-size", "20px");
styles.LoadTagStyle("#headerdiv", "background-color", "Blue");
styles.LoadTagStyle("#headerdiv", "color", "White");
styles.LoadTagStyle("#headerdiv", "padding-left", "5px");
HTMLWorker worker = new HTMLWorker(document);
worker.SetStyleSheet(styles);
// step 4: we open document and start the worker on the document
document.Open();
worker.StartDocument();
// step 5: parse the html into the document
worker.Parse(reader);
// step 6: close the document and the worker
worker.EndDocument();
worker.Close();
document.Close();
There's a couple of things going on here. First and foremost, the HTML/CSS parser in iText and iTextSharp are far from complete. They're definitely very powerful but still have a ways to go. Each version gets better so always make sure that you're using the latest version.
Second, I've seen more HTML/CSS activity in an add-on for iText/iTextSharp called XMLWorker that you might want to look at. You don't "load styles" anymore, you just pass raw HTML/CSS in and it figures out a lot of things. You can see some examples here, see a list of supported CSS attributes here, download it here (and get the two missing files here and here).
Third, LoadTagStyle is for loading style attributes for HTML tags, not CSS IDs or Classes. You want to use LoadStyle to load by class:
styles.LoadStyle("<classname>", "<attribute>", "<value>");
Unfortunately this method still doesn't do what you want it to do always. For instance, to change the font size you'd think you'd say:
styles.LoadStyle("headerdiv", "font-size", "60pt");
But to get it to work you can only use relative HTML font sizes (1,2,-1, etc) or PT sizes and you must use the size property:
styles.LoadStyle("headerdiv", "size", "60pt");
//or
styles.LoadStyle("headerdiv", "size", "2");
The LoadStyle honestly feels like an afterthought that was only partially completed and I recommend not using it actually. Instead I recommend writing the style attributes directly inline if you can:
string html = "<div id=\"personal\" class=\"headerdiv\" style=\"padding-left:50px;font-size:60pt;font-family:Cambria;font-weight:700;\">Personal Data</div>";
Obviously this defeats the points of CSS and once again, that's why they're working on the new XMLWorker above.
Lastly, to use fonts by name you have to register them with iTextSharp first, it won't go looking for them:
iTextSharp.text.FontFactory.Register(@"c:\windows\fonts\cambria.ttc", "Cambria");
In case someone is still having issues with this: the latest version of iTextSharp (currently 5.3.2) significantly improves the HTMLWorker processor.
You can get it here: http://sourceforge.net/projects/itextsharp/
The correct way to reference the background color is through the HtmlTags class:
styles.LoadTagStyle(HtmlTags.HEADERCELL, HtmlTags.BACKGROUNDCOLOR, "Blue");
