PDFClown Detect empty text location

I am using the PDFClown library in C# to parse and extract the text from a daily report in PDF. The issue I am having is detecting when a text value is missing. With the TextExtractor, the extracted text contains no placeholder for the missing value, as I had expected it would. The PDF document has a box where the missing text should be, so it seems there should be some way to detect that the value is not there. There is no form in this document.

Related

Unwanted characters in Acrobat PDF conversion of auto-detected Word fillable fields. Deleting fillable field characters using iTextSharp5

Acrobat DC, Office 365, iTextSharp5, Win10 Pro 64-bit
I have a Word document containing several pages of text and one empty TextBox between two of the lines. I am attempting to use the Acrobat "Prepare Form" feature to convert that document to PDF with the TextBox as a fillable field, and Acrobat has no problem auto-detecting the TextBox and making it fillable. The problem, however, is that the converted TextBox contains text from either the line of text above it or the line below it.
I've read that this is caused by placing the TextBox too close to those lines in the Word document and sure enough, by leaving three or four empty lines of space above and below the text box the issue goes away. However, that's an unacceptable amount of wasted space. I tried putting continuous section breaks above and below the TextBox in Word as well as typing spaces in the TextBox but that doesn't help. I also tried it with a 1x1 table instead of a TextBox but the same problem occurred.
I then tried deleting the unwanted text from the PDF TextBox field and saving it that way, which appeared to be a reasonable solution. However, when I used an iTextSharp5 program to detect the PDF's fillable fields it could no longer detect the empty field. I wouldn't mind leaving the original unwanted text in the PDF TextBox field if there were some way to remove it with iTextSharp, but it doesn't seem to have that ability.
Because I have many Word documents to convert to fillable PDFs and might need to update them occasionally, it simply isn't practical for me to manually add the fillable fields to the converted PDFs each time an update is needed. Any suggestions are welcome :-)

Getting Hyperlinks Working in Rich Text Box Using RTF

I am trying to format a hyperlink in a Rich Text Box using the Rich Text Format. I can get basic formatting working thanks to this answer, for example making text bold. However, I cannot get the RTF-formatted hyperlink to work. I found an example of making an RTF link here. However, when I try to put this in the Rich Text Box as seen below, it causes my application to crash. Any suggestions as to what I'm missing here?
string my_hyperlink_text = @"{\field{\*\fldinst HYPERLINK ""http://www.google.com/""}{\fldrslt Google}}";
// Making sure the control is a RichTextBox with the expected name
if (rtbControl is RichTextBox && rtbControl.Name == "name_of_control")
{
    RichTextBox rtb = rtbControl as RichTextBox;
    rtb.Rtf = my_hyperlink_text;
}

An easy way to get RTF working is to write your text in Microsoft Word, copy and paste it into WordPad, and then save it as an RTF file from there.
The detour through MS Word is needed because WordPad does not support entering links in its UI, although it handles them correctly when they come from other sources, such as the clipboard. Saving from WordPad rather than directly from Word is worthwhile because MS Word creates massively bloated RTF.
The RTF file you create this way can then be opened in any text editor and used as a string constant in your program.
In your case, I suppose that the prefix (the {\rtf1 header that opens an RTF document) and maybe the color table are missing and are causing the problem.
By the way: WordPad is not much more than a wrapper around the Windows RTF control, i.e. the same control that you are using in your code.
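
To illustrate the point about the missing prefix, here is a minimal sketch of a helper that wraps the field code from the question in a complete RTF document. The class name RtfLink and the method Build are hypothetical names chosen for this example, and the sketch assumes the URL and link text contain no backslashes or braces, which would need escaping. Whether the result renders as a clickable link may depend on the version of the rich edit control your application uses.

```csharp
using System;

public static class RtfLink
{
    // Builds a complete RTF document containing one hyperlink field.
    // The leading {\rtf1\ansi ... } group is the header that the
    // fragment in the question lacks.
    // Assumption: url and text contain no '\', '{' or '}' characters.
    public static string Build(string url, string text)
    {
        return @"{\rtf1\ansi{\field{\*\fldinst{HYPERLINK """ + url
             + @"""}}{\fldrslt{" + text + "}}}}";
    }
}
```

The returned string can then be assigned as in the question, for example rtb.Rtf = RtfLink.Build("http://www.google.com/", "Google");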

Gibberish text in Word output when the report is generated using a template (.dotx) file

I have a collection of RTF text and I need to generate a report in Word by pasting the RTF content from the collection in a loop. I use a Word template (.dotx) file as the base so that the report is generated in a structured manner. The base template contains placeholder texts which need to be replaced with the RTF content. While replacing the placeholder text with RTF content, I'm facing the following two issues:
1. maintaining the source formatting, i.e. applying the format of the RTF content (font, color, bold, etc.) in the generated Word report
2. pasting the RTF text in the proper order with the proper content
In order to maintain source formatting while copying the RTF content, I use
Range.PasteAndFormat(WdRecoveryType.wdFormatOriginalFormatting)
This ensures that when I copy the RTF content from my collection, the original source formatting (i.e. as per the RTF text) gets applied. There are a few other methods to paste the content, i.e. Paste and PasteSpecial, but they don't maintain the source formatting; upon pasting, the text takes the default font of the base Word template file. The problem with the PasteAndFormat method is that the generated output is somewhat gibberish when the report is generated using the template file, while the output is correct when it is generated without the template file.
I have created a sample project (available at https://drive.google.com/open?id=1es1aBgewbJvQxmOAQF3FMhx3inu3keVy) which exactly reproduces the problem I face. In the sample application, when you enter 1, it generates the report without using the template file, and that is the kind of final report I want. When you enter 2, it generates the report with gibberish text, and I'm not able to figure out the reason for such output.
Can anyone please help me figure out what the problem is with the sample application code and help me generate the report with option 2 so that it matches the one generated with option 1?

How to retrieve the Font of word from RichTextBox in Winforms?

I am developing an app in WinForms using C#. It has a small window that includes a RichTextBox. The user can write in the RichTextBox and, by pressing Ctrl+B and Ctrl+I, change the font to bold or italic. When the application is closed, the text is saved. When the application is restarted, the text is loaded into the RichTextBox again. The problem is that I cannot store the font the user was writing with. If a user had a word in bold, for example, the word is no longer bold after the app restarts. Is there a way to store the state of a word?

The RTF property of the RichTextBox returns the formatted text, so that's what you need to store:
"You can use this property ... to extract the text of the control with the specified RTF formatting defined in the text of the control."

As @stuartd mentioned, the RTF property can be used to solve the aforementioned problem. I store the myRichTextBox.Rtf property in a string and then in a file. After the app restarts, I read the file and assign the read value to myRichTextBox.Rtf.
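
The save-and-restore approach described above could be sketched roughly as follows. The class name RtfStore, its methods, and the file path are hypothetical names chosen for this example; only the Rtf property and the control name myRichTextBox come from the answers above.

```csharp
using System;
using System.IO;

public static class RtfStore
{
    // Writes the RTF markup (e.g. the value of myRichTextBox.Rtf) to disk.
    public static void Save(string path, string rtf)
    {
        File.WriteAllText(path, rtf);
    }

    // Returns the saved markup, or null when nothing has been saved yet.
    public static string Load(string path)
    {
        return File.Exists(path) ? File.ReadAllText(path) : null;
    }
}

// Possible usage inside the form (path is a hypothetical file location):
//   On closing:  RtfStore.Save(path, myRichTextBox.Rtf);
//   On startup:  string rtf = RtfStore.Load(path);
//                if (rtf != null) myRichTextBox.Rtf = rtf;
```

Checking for null before assigning avoids setting the Rtf property on the first run, when no file exists yet.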

Retrieve PDF field "description text" with iTextSharp

Is it possible to get the text shown in the red squares in my attached image? The image shows part of a PDF document with several fields and their "titles".
I don't know what they are called, so I'm having a hard time searching for a solution :(
I can get all the field names and types. But when I debug, I can't seem to find any option where the "caption", or whatever it is called, for a field is stored or accessed. If there is a link between the field and that text, it would be lovely :)
If you don't know how to access that information with code, do you know what the text might be called, so that I can try searching and debugging some more myself?
Edit - added a bigger image; sorry for the NSA blackout of text, I'm not sure I can share the customer's PDF document...
Edit2 - added some PDFReader data about the document from the VS QuickWatch window
