How to read hyperlinks with their anchor text from a PDF file in C#

I can extract the link targets from a PDF file (for example http://google.com), but I also need the anchor text, for example "click here".
How can I get the anchor text of a link?
I extracted the URLs using the approach from this question: Reading hyperlinks from pdf file
For example:
Anchor a = new Anchor("Test Anchor");
a.Reference = "http://www.google.com";
myParagraph.Add(a);
Here I get http://www.google.com, but I need the anchor text, i.e. Test Anchor.
Any suggestions?

In the PDF you need to identify the region (the annotation's /Rect) where the link is placed and then use iTextSharp to read the text beneath that region.
This way you can extract the text underneath the link. The limitation of this approach is that if the link region is wider than the text, the extraction will read the full text under that region.
// Needs: using System.Collections.Generic; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
// Returns one (anchor text, URL) pair per URI link annotation in the document.
private List<KeyValuePair<string, string>> GetAllHyperlinksFromPDFDocument(string pdfFilePath)
{
var links = new List<KeyValuePair<string, string>>();
PdfReader R = new PdfReader(pdfFilePath);
try
{
//Loop through each page (page numbers are 1-based)
for (int i = 1; i <= R.NumberOfPages; i++)
{
//Get the current page
PdfDictionary PageDictionary = R.GetPageN(i);
//Get all of the annotations for the current page
PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if (Annots == null || Annots.Size == 0)
continue;
//Loop through each annotation
foreach (PdfObject A in Annots.ArrayList)
{
//Resolve the (possibly indirect) object to the annotation dictionary
PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(A);
//Make sure this annotation is a link
if (!PdfName.LINK.Equals(AnnotationDictionary.Get(PdfName.SUBTYPE)))
continue;
//Make sure this annotation has a URI ACTION
PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
if (AnnotationAction == null || !PdfName.URI.Equals(AnnotationAction.Get(PdfName.S)))
continue;
//Get action link URL
PdfString Link = AnnotationAction.GetAsString(PdfName.URI);
string linkReference = Link != null ? Link.ToString() : "";
//Get action link text: extract only the text that falls inside the link's /Rect
PdfArray LinkLocation = AnnotationDictionary.GetAsArray(PdfName.RECT);
iTextSharp.text.Rectangle rect = PdfReader.GetNormalizedRectangle(LinkLocation);
RenderFilter[] renderFilter = { new RegionTextRenderFilter(rect) };
ITextExtractionStrategy textExtractionStrategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), renderFilter);
string linkText = PdfTextExtractor.GetTextFromPage(R, i, textExtractionStrategy).Trim();
links.Add(new KeyValuePair<string, string>(linkText, linkReference));
}
}
}
finally
{
R.Close();
}
return links;
}

Unfortunately, I don't think you're going to be able to do this, at least not without a lot of guesswork. In HTML this would be easy because a hyperlink and its text are stored together:
<a href="http://www.google.com">Click here</a>
However, in a PDF these two entities are not stored with any form of relationship. What we think of as a "hyperlink" within a PDF is technically a PDF Annotation that just happens to be sitting on top of text. You can see this by opening a PDF in an editing program such as Adobe Acrobat Pro. You can change the text but the "clickable" area doesn't change. You can also move and resize the "clickable" area and put it anywhere in the document.
When creating PDFs, iText/iTextSharp abstract this away so you don't have to think about this. You can create a "hyperlink" with clickable text but when it generates a PDF it ultimately will create the text as normal text, calculate the rectangle coordinates and then put an annotation at that rectangle.
I did say that you could try to guess at this, and it might or might not work for you. To do this you'd need to get the rectangle of the annotation and then find the text that is also at those coordinates. It won't be an exact match, however, because of padding issues. If you absolutely have to get the text under a hyperlink, then this is the only way that I know of for doing this. Good luck!
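The guesswork described above can be sketched without any PDF library. This sketch assumes you already have the page's text as chunks with bounding boxes (the `TextChunk` class is hypothetical; with iTextSharp you would collect these values from a render listener) and the four numbers of the link's /Rect. A chunk is kept when its center point lies inside the rectangle grown by a small padding, which tolerates the padding mismatch mentioned above. The 2-point default padding is an arbitrary assumption, not a value from the PDF specification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: a piece of page text and its bounding box in PDF user space
// (origin at bottom-left, y grows upward).
public sealed class TextChunk
{
    public string Text { get; }
    public float Left { get; }
    public float Bottom { get; }
    public float Right { get; }
    public float Top { get; }

    public TextChunk(string text, float left, float bottom, float right, float top)
    {
        Text = text; Left = left; Bottom = bottom; Right = right; Top = top;
    }
}

public static class LinkTextGuesser
{
    // Returns the text whose center point lies inside the link rectangle, grown by
    // `padding` points on every side.
    public static string GuessAnchorText(IEnumerable<TextChunk> chunks,
        float llx, float lly, float urx, float ury, float padding = 2f)
    {
        // The /Rect corners are not guaranteed to be in lower-left/upper-right order.
        float left = Math.Min(llx, urx) - padding;
        float right = Math.Max(llx, urx) + padding;
        float bottom = Math.Min(lly, ury) - padding;
        float top = Math.Max(lly, ury) + padding;

        var hits = chunks
            .Where(c =>
            {
                float cx = (c.Left + c.Right) / 2f;
                float cy = (c.Bottom + c.Top) / 2f;
                return cx >= left && cx <= right && cy >= bottom && cy <= top;
            })
            // Reading order: upper lines first, then left to right.
            .OrderByDescending(c => c.Top)
            .ThenBy(c => c.Left)
            .Select(c => c.Text);

        return string.Join(" ", hits);
    }
}
```

Using the center point rather than requiring the whole chunk to fit means a link rectangle that is slightly smaller than the glyphs still matches, while neighbouring words whose centers fall outside the padded rectangle are excluded.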

Related

How to query a BaseParagraph element's position in order to add a link annotation without saving the file?

I'm creating a simple PDF file with some text and a hyperlink attached to that text:
Document pdfDocument = new Document();
Page pdfPage = pdfDocument.Pages.Add();
TextFragment textFragment = new TextFragment("My Text");
Table table = new Table();
Row row = table.Rows.Add();
Cell cell = row.Cells.Add();
cell.Paragraphs.Add(textFragment);
pdfPage.Paragraphs.Add(table);
LinkAnnotation link = new LinkAnnotation(pdfPage, textFragment.Rectangle); //[Before Save]textFragment.Rectangle: 0,0,35.56,10
link.Action = new GoToURIAction("Link1 before save");
pdfPage.Annotations.Add(link);
pdfDocument.Save(dataDir + "SimplePDFWithLink.pdf");
The problem is that the link annotation is assigned to the pre-save rectangle [0,0,33.56,10] at the bottom of the page, whereas the TextFragment is rendered in a different rectangle. (I can't set the Position property here because I don't know it; it is relative to the table cell.)
To solve this, I tried saving the document first and only then searching for the TextFragment using TextFragmentAbsorber:
pdfDocument.Save(dataDir + "SimplePDFWithLink.pdf");
//[After Save]textFragment.Rectangle: 0,0,90,770
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
pdfPage.Accept(textFragmentAbsorber);
foreach (TextFragment absorbedTextFragment in textFragmentAbsorber.TextFragments)
{
link = new LinkAnnotation(pdfPage, absorbedTextFragment.Rectangle);
link.Action = new GoToURIAction("Link 2 after save");
pdfPage.Annotations.Add(link);
}
pdfDocument.Save(dataDir + "SimplePDFWithLink.pdf");
My Question:
Is it possible to add a simple link to a TextFragment (which is a BaseParagraph, not a StructureElement) without saving the document first?
Here is a simple demo of the outcome; you can see that before the document is saved, the link is added at the bottom left of the page instead of over the text rectangle:
Update:
If I set the TextFragment's Position to some arbitrary value, the link is added exactly over the text, but I don't know what the element's Position will be because it is built dynamically inside a Table.
Using a TextSegment inside the TextFragment works and adds the link without saving the file first:
TextFragment textFragment = new TextFragment("My Text");
TextSegment textSegment = new TextSegment("Link to File");
textSegment.Hyperlink = new Aspose.Pdf.WebHyperlink("www.google.com");
textFragment.Segments.Add(textSegment);
It is worth mentioning that this also works when linking to a file on the user's file system, for example:
textSegment.Hyperlink = new Aspose.Pdf.WebHyperlink(@"Files\foo.png");
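A pitfall with Windows paths in snippets like this one: in a regular C# string literal a backslash starts an escape sequence, so "Files\foo.png" contains a form-feed character (\f) instead of a backslash, and a path such as "C:\test.ppt" would contain a tab (\t). This small illustration (the `PathLiterals` class is just for demonstration) compares the three ways of writing the literal:

```csharp
public static class PathLiterals
{
    // Regular literal: "\f" is the form-feed escape (U+000C), so there is no backslash.
    public const string Regular = "Files\foo.png";

    // Verbatim literal: the backslash is kept as-is.
    public const string Verbatim = @"Files\foo.png";

    // Escaped backslash: produces the same string as the verbatim literal.
    public const string Escaped = "Files\\foo.png";
}
```

Either the verbatim form or the doubled backslash gives the intended path; the regular form silently produces a different, 12-character string.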

RichTextBox links appearing on same position

I have managed to get links to appear in my RichTextBox. The first entry is correct, but when I append a new line that also contains a link, the first link appears in the same position as the new one. When I click the link, it still opens the first entry's hyperlink.
I want each line to have its own hyperlink (on the underlined text).
Code used to append a link:
public void AppendLink(string text, string linkText)
{
LinkLabel link = new LinkLabel();
link.Text = text;
link.LinkClicked += new LinkLabelLinkClickedEventHandler(this.link_LinkClicked);
LinkLabel.Link data = new LinkLabel.Link();
data.LinkData = linkText;
link.Links.Add(data);
link.Location = this.logTextBox.GetPositionFromCharIndex(this.logTextBox.TextLength);
this.logTextBox.Controls.Add(link);
logTextBox.SelectionFont = UNDERLINE_FONT;
this.logTextBox.AppendText(text);
}
Called like this:
AppendLogLine("Sealed ");
AppendLink(itemName, GetItemLink(itemName));
AppendLog(" is an unknown item. Keeping.");
AppendLog and AppendLogLine do the same as AppendLink, except that they don't create a link and they use a different font.

Replace text in Word with text from C# form

I'm trying to make an application in C#. When a radio button is selected, I'd like to open a Microsoft Word document (an invoice) and replace some text with text from my form. The Word document also contains some text boxes with text.
I've tried to implement the code from Word Automation Find and Replace not including Text Boxes, but when I select the radio button, a window appears asking me to choose "the encoding that makes the document readable", and then the Word document opens full of black triangles and other garbage instead of my original invoice template.
This is how my invoice looks afterwards:
Here is what I've tried:
string documentLocation = @"C:\Documents\Visual Studio 2015\Project\Invoice.doc";
private void yes_radioBtn_CheckedChanged(object sender, EventArgs e)
{
FindReplace(documentLocation, "HotelName", "MyHotelName");
Process process = new Process();
process.StartInfo.FileName = documentLocation;
process.Start();
}
private void FindReplace(string documentLocation, string findText, string replaceText)
{
var app = new Microsoft.Office.Interop.Word.Application();
var doc = app.Documents.Open(documentLocation);
var range = doc.Range();
range.Find.Execute(FindText: findText, Replace: WdReplace.wdReplaceAll, ReplaceWith: replaceText);
var shapes = doc.Shapes;
foreach (Shape shape in shapes)
{
var initialText = shape.TextFrame.TextRange.Text;
var resultingText = initialText.Replace(findText, replaceText);
shape.TextFrame.TextRange.Text = resultingText;
}
doc.Save();
doc.Close();
Marshal.ReleaseComObject(app);
}
If your Word template is the same each time, you essentially:
1. Copy the template.
2. Work on the copy.
3. Save it in the desired format.
4. Delete the template copy.
For each section you want to replace in your Word document, insert a bookmark at that location (this is the easiest way to put text into a specific area).
I always create a function to accomplish this and pass in the path as well as all of the text that replaces my in-document bookmarks. The function call can get long sometimes, but it works for me.
Application app = new Application();
Document doc = app.Documents.Open("sDocumentCopyPath.docx");
if (doc.Bookmarks.Exists("bookmark_1"))
{
object oBookMark = "bookmark_1";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text = "My text to replace bookmark_1";
}
if (doc.Bookmarks.Exists("bookmark_2"))
{
object oBookMark = "bookmark_2";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text = "My text to replace bookmark_2";
}
doc.ExportAsFixedFormat("myNewPdf.pdf", WdExportFormat.wdExportFormatPDF);
((_Document)doc).Close();
((_Application)app).Quit();
This code should get you up and running unless you want to pass in all the values into a function.
EDIT: If you need more examples, I'm also working on a blog post with a lot more detail, in case this isn't clear enough for your use case.
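The copy / work / save / delete sequence from this answer can be wrapped in a small helper so the template copy is always cleaned up, even if the Word automation throws. This is a sketch using only System.IO; the `work` callback is a hypothetical stand-in for the bookmark filling and ExportAsFixedFormat call shown above, and putting the copy in the user's temp folder is an assumption, not something the answer specifies.

```csharp
using System;
using System.IO;

public static class TemplateCopy
{
    // Copies the template, lets `work` edit and export the copy, then always deletes
    // the copy so the original template is never modified.
    public static void WithTemplateCopy(string templatePath, Action<string> work)
    {
        if (!File.Exists(templatePath))
            throw new FileNotFoundException("Template not found", templatePath);

        // Keep the original extension (.doc/.docx) so Word recognizes the copy's format.
        string copyPath = Path.Combine(
            Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + Path.GetExtension(templatePath));

        File.Copy(templatePath, copyPath);   // 1. copy the template
        try
        {
            work(copyPath);                  // 2. work on the copy, 3. save in the desired format
        }
        finally
        {
            File.Delete(copyPath);           // 4. delete the template copy
        }
    }
}
```

Inside the callback, close the document and quit Word before returning; otherwise File.Delete can fail because Word still has the copy open.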

Underline a portion of text using iTextSharp

I have an application that uses itextsharp to fill PDF form fields.
One of these fields contains text with tags, for example:
<U>This text should be underlined</U>.
I'd like the text enclosed in <U>...</U> to be underlined.
How could I do that?
How could I approach it with HTMLWorker, for example?
Here's the portion of code where I write my description:
for (int i = 0; i < linesDescription.Count; i++)
{
int count = linesDescription[i].Count();
int countTrim = linesDescription[i].Trim().Count();
Chunk cnk = new Chunk(linesDescription[i] + GeneralPurpose.ReturnChar, TextStyle);
if (firstOpe && i > MaxLinePerPage - 1)
LongDescWrapped_dt_extra.Add(cnk);
else
LongDescWrapped_dt.Add(cnk);
}
Ordinary text fields do not support rich text. If you want the fields to remain interactive, you will need rich text fields. These are fields whose flags are set so that they accept an RV (rich text value) entry. This is explained here: Set different parts of a form field to have different fonts using iTextSharp (Note that I didn't succeed in getting this to work, but you may have better luck.)
If it is OK for you to flatten the form (i.e. remove all interactivity), please take a look at the FillWithUnderline example:
public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.setFormFlattening(true);
AcroFields form = stamper.getAcroFields();
FieldPosition pos = form.getFieldPositions("Name").get(0);
ColumnText ct = new ColumnText(stamper.getOverContent(pos.page));
ct.setSimpleColumn(pos.position);
ElementList elements = XMLWorkerHelper.parseToElementList("<div>Bruno <u>Lowagie</u></div>", null);
for (Element element : elements) {
ct.addElement(element);
}
ct.go();
stamper.close();
}
In this example, we don't fill out the field; instead we get the field's position (a page number and a rectangle). We then use ColumnText to add content at this position. As we are inputting HTML, we use XML Worker to parse the HTML into iText objects that we can add to the ColumnText object.
This is a Java example, but it should be easy to port this to C# if you know how to code in C# (which I don't).
You can try this:
Chunk chunk = new Chunk("Underlined Text", FontFactory.GetFont(FontFactory.TIMES_ROMAN, 12.0f, iTextSharp.text.Font.BOLD | iTextSharp.text.Font.UNDERLINE));
Paragraph reportHeadline = new Paragraph(chunk);
reportHeadline.SpacingBefore = 12.0f;
pdfDoc.Add(reportHeadline);
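To handle the tags from the question, the text can first be split into plain and underlined runs; each run could then become its own Chunk, using a font with iTextSharp.text.Font.UNDERLINE for the underlined runs as in the snippet above. This sketch assumes the tags are exactly <U> and </U> (case-sensitive, not nested) and does no iTextSharp work itself; `UnderlineMarkup` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

public static class UnderlineMarkup
{
    // Splits text such as "Plain <U>underlined</U> plain" into (text, underlined) runs.
    // Only the literal tags <U> and </U> are recognized; an unclosed <U> underlines
    // everything up to the end of the string.
    public static List<KeyValuePair<string, bool>> Split(string input)
    {
        const string Open = "<U>";
        const string Close = "</U>";
        var runs = new List<KeyValuePair<string, bool>>();
        int pos = 0;
        bool underlined = false;

        while (pos < input.Length)
        {
            // Look for the tag that would end the current state.
            string tag = underlined ? Close : Open;
            int next = input.IndexOf(tag, pos, StringComparison.Ordinal);
            int end = next < 0 ? input.Length : next;

            // Emit the text between the current position and the tag (if any).
            if (end > pos)
                runs.Add(new KeyValuePair<string, bool>(input.Substring(pos, end - pos), underlined));

            if (next < 0)
                break;
            pos = next + tag.Length;   // skip the tag itself
            underlined = !underlined;  // toggle state after each tag
        }
        return runs;
    }
}
```

For the sample line from the question this yields three runs: "Plain "-style text before the tag, the underlined sentence, and the trailing period.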

How to read the hyperlink of an image in PowerPoint using C#

I have inserted an image into PowerPoint using C# and added a hyperlink to the picture, and it works perfectly. But now I need to read back the hyperlink of that picture using C#.
For text, I insert it with a hyperlink into PowerPoint using C# and read the hyperlink back with the method below:
for (int i = 0; i < presentation.Slides.Count; i++)
{
foreach (var item in presentation.Slides[i + 1].Shapes)
{
var shape = (PPT.Shape)item;
if (shape.HasTextFrame == MsoTriState.msoTrue)
{
if (shape.TextFrame.HasText == MsoTriState.msoTrue)
{
var textRange = shape.TextFrame.TextRange;
var text = textRange.Text;
string address=textRange.ActionSettings[PPT.PpMouseActivation.ppMouseClick].Hyperlink.Address;
}
}
}
}
Here I get the hyperlink address in the variable address. Likewise, I need to get the hyperlink from the image that I inserted into the PPT using C#.
Is that possible?
One option is to iterate through the Shapes on a slide and check whether each one contains a hyperlink. Alternatively, give your picture an identifier when you create it and then find it by that identifier.
private void GetHyperlink()
{
Microsoft.Office.Interop.PowerPoint.Application objApp = new Microsoft.Office.Interop.PowerPoint.Application();
objApp.Visible = Microsoft.Office.Core.MsoTriState.msoTrue;
Presentations objPresSet = objApp.Presentations;
Presentation p = objPresSet.Open(@"C:\test.ppt");
Slide slide = p.Slides[1];
// or Slide slide = objApp.ActiveWindow.View.Slide;
for (int i = 1; i <= slide.Shapes.Count; i++)
{
//If the hyperlink address is filled then display it in MessageBox
string address = slide.Shapes[i].ActionSettings[PpMouseActivation.ppMouseClick].Hyperlink.Address;
if (!string.IsNullOrEmpty(address))
MessageBox.Show(address);
}
}
This code will display all hyperlinks on slide 1.
Also it would be interesting to use Open XML SDK for this purpose instead of automation.
Interesting link:
http://www.aspose.com/docs/display/slidesnet/Finding+a+Shape+in+a+Slide
EDIT:
I suggest you first modify the code that creates the image with hyperlink like this:
//Add a picture
Shape pic = slide.Shapes.AddPicture(@"C:\Users\Public\Pictures\Sample Pictures\koala.jpg",
Microsoft.Office.Core.MsoTriState.msoFalse,
Microsoft.Office.Core.MsoTriState.msoTrue,
shape.Left, shape.Top, shape.Width, shape.Height);
//Here you have various options how to distinguish your shape
pic.Name = "MyPic";
pic.AlternativeText = "Koala";
pic.Tags.Add("MyPic", "http://www.google.com/");
//adding hyperlink, etc...
pic.ActionSettings[PpMouseActivation.ppMouseClick].Hyperlink.Address = @"http://www.google.com/";
Then when you read the file you can use tags or name to distinguish shapes:
//tags check
if (slide.Shapes[i].Tags.Count > 0 && slide.Shapes[i].Tags["MyPic"]!=null && slide.Shapes[i].ActionSettings[PpMouseActivation.ppMouseClick].Hyperlink.Address != null)
MessageBox.Show(slide.Shapes[i].ActionSettings[PpMouseActivation.ppMouseClick].Hyperlink.Address);
//or
//name check
if (slide.Shapes[i].Name.Equals("MyPic", StringComparison.InvariantCultureIgnoreCase) && slide.Shapes[i].ActionSettings[PpMouseActivation.ppMouseClick].Hyperlink.Address != null)
MessageBox.Show(slide.Shapes[i].ActionSettings[PpMouseActivation.ppMouseClick].Hyperlink.Address);
