Removing hyperlinks from PDF with iTextSharp

I'm using iTextSharp to produce PDF files whose source is Outlook emails, so there are mailto:, website and other links everywhere. These links are not underlined; they only become apparent when you hover over them, and clicking opens the target. I looked at this question: Remove hyperlinks from a PDF document (iTextSharp) and tried both solutions, but neither removed the links. Can someone offer some advice on how to remove hyperlinks from a PDF using iTextSharp?
I am using Adobe Acrobat Standard X.
Example: http://s000.tinyupload.com/?file_id=05714019267649134441

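The solution turned out to be a preference in Acrobat/Reader rather than anything in the PDF itself: the viewer was turning URL-like text into clickable links on the fly. Go to Edit -> Preferences -> General tab and uncheck "Create links from URLs". The best part is that this setting stays in effect when you open other documents. Credit to Chris Haas.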
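For completeness: if a PDF does contain genuine link annotations (as opposed to links the viewer generates from URL text via "Create links from URLs"), they can be stripped with iTextSharp. The following is only a minimal sketch, assuming iTextSharp 5.x; the names `LinkStripper` and `RemoveLinkAnnotations` are hypothetical and not part of the library.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class LinkStripper // hypothetical helper, not part of iTextSharp
{
    // Removes every annotation whose /Subtype is /Link from each page,
    // writes the result to outputPath and returns how many were removed.
    public static int RemoveLinkAnnotations(string inputPath, string outputPath)
    {
        int removed = 0;
        PdfReader reader = new PdfReader(inputPath);
        try
        {
            for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
            {
                PdfDictionary page = reader.GetPageN(pageNumber);
                PdfArray annots = page.GetAsArray(PdfName.ANNOTS);
                if (annots == null) continue; // page has no annotations

                // Walk backwards so a removal does not shift unvisited indexes.
                for (int i = annots.Size - 1; i >= 0; i--)
                {
                    PdfDictionary annot = annots.GetAsDict(i);
                    if (annot != null &&
                        PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE)))
                    {
                        annots.Remove(i);
                        removed++;
                    }
                }
            }

            using (FileStream output = new FileStream(outputPath, FileMode.Create))
            {
                // The stamper writes the modified document when it is closed.
                PdfStamper stamper = new PdfStamper(reader, output);
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
        return removed;
    }
}
```

If this returns 0 and the links still appear in the viewer, that suggests the links are being generated by the viewer, which is the case the preference above addresses.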

Related

Show PowerPoint in Web

How can I show a PowerPoint file on the web using C#? I want to get the file from a URL and then somehow show the presentation in HTML.
I do not want to use drive in my project.
Any help is appreciated. Thanks.

I was looking for a simple way to do this as well recently and found this answer by @wclear:
Just to update this question, as there is a new way to embed PowerPoint presentations in a web page. If you have an account on OneDrive, do the following using PowerPoint Online (accessing PowerPoint via the browser) to embed a presentation:
Click 'File', then 'Share', then 'Embed'.
Click the 'Generate' button to generate the HTML code to be embedded.
Copy the 'Embed Code' and paste it into the HTML of a website.

Clicking a link in PDF file

I need some help with clicking a link in a downloaded PDF file.
Using C#, Selenium and AutoIt, I have downloaded a PDF file. Now I want to open that PDF file and click on a dynamic link that is placed on page 2.
I was able to open the PDF, but I couldn't work out how to click the link inside it.

WebDriver does not support clicking links inside a PDF. Use an open-source PDF API such as Apache PDFBox to extract all the links, and then navigate to them with WebDriver.
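Since the question is in C#, the same idea can be sketched with iTextSharp in place of PDFBox. This is a substitution for illustration, not what the answer proposes. It assumes iTextSharp 5.x and that the link is a real link annotation with a URI action; `PdfLinkReader` and `GetLinkUris` are hypothetical names.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class PdfLinkReader // hypothetical helper, not part of iTextSharp
{
    // Returns the target URIs of all link annotations on the given 1-based page.
    public static List<string> GetLinkUris(string pdfPath, int pageNumber)
    {
        var uris = new List<string>();
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            PdfDictionary page = reader.GetPageN(pageNumber);
            if (page == null) return uris; // no such page

            PdfArray annots = page.GetAsArray(PdfName.ANNOTS);
            if (annots == null) return uris; // page has no annotations

            for (int i = 0; i < annots.Size; i++)
            {
                PdfDictionary annot = annots.GetAsDict(i);
                if (annot == null ||
                    !PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE)))
                {
                    continue; // not a link annotation
                }

                // A URI link stores its target in the /A action dictionary under /URI.
                PdfDictionary action = annot.GetAsDict(PdfName.A);
                if (action == null) continue;

                PdfString uri = action.GetAsString(PdfName.URI);
                if (uri != null) uris.Add(uri.ToString());
            }
        }
        finally
        {
            reader.Close();
        }
        return uris;
    }
}
```

Each returned URI could then be opened in the existing Selenium session, for example with `driver.Navigate().GoToUrl(uri)`, instead of trying to click inside the PDF viewer.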

Embed word document into another WITHOUT icon

How can I embed a Word document into another Word document via the OpenXML SDK so that it shows the content, not a Word icon? In other words, the same as doing it manually in Word: Insert object from file -> WITHOUT checking "Display as icon"?
I've found this article, but it uses an icon. I've also tried the OpenXML SDK Productivity Tool, but it shows only generated binary data.
EDITED:
I use the following code:
DrawAspect = OleDrawAspectValues.Content
and then I add an image part:
var imagePart = mainDocumentPart.AddNewPart<ImagePart>("image/x-emf", imagePartId);
GenerateImagePart(imagePart);
But my image part is just an array of bytes of Word's icon.
So the following happens: when I open the generated document, the embedded document is shown as an icon, but when I double-click it, edit it and save the changes, it is shown as content. So maybe it's possible somehow to show this content without editing the embedded document? Instead of the bytes of Word's icon, should I use the bytes of a screenshot of the document?
I'm not sure I described this clearly, so please ask.

I'm afraid what you are asking for is almost impossible.
As far as the Word file is concerned, the only difference between the icon and the embedded content is the image.
When you don't use an icon, Word pretty much just takes a screenshot of the document you are embedding and inserts that in place of the icon graphic.
I've uploaded an example I grabbed from a Word file I made. I found this little gem in the /media folder inside the .docx file.
So basically, if you can't live with the icon, your only choice is to somehow grab a picture of the Word file you want to embed and insert that instead of the icon image.
How you'd go about that can't be pretty. First of all, the Open XML SDK contains no such functionality. I tried playing around a bit with Office interop as well, but had no luck.
I only see two possible ways to achieve this.
The first one is via Interop. You'll need to install a "pretend printer" like the ones that print to PDF instead of sending the job to a physical printer. This one, however, needs to print to an image format. The format of the file in the media folder was .emf, but I'm not positive that's a requirement.
Anyway, should the above somehow be possible, you could embed that picture, pretty much using the example you linked from Microsoft, and just change the size of the "icon", which would now be an image of the document.
The second possibility would be to open the Word document as a process, set the zoom to 72% (or whatever makes the document the only thing on screen on your desktop), grab a screenshot, crop it down to just the document and then use that as your image for the embedding.
For the record, I don't recommend you do any of the above, but those are the only options I see.
Should someone have a better solution to this, I'm all ears.
Finally, should you decide that you want to push on with this, I'll be happy to code up an example of option number 2 if you reply and tell me you'd like that.
Kaspar

There is a nice wrapper API (Document Builder 2.2) around open xml specially designed to merge documents, with flexibility of choosing the paragraphs to merge etc. You can download it from here.
Using this tool you can embed a paragraph of another word document or entire word document as per your requirement.
The documentation and screen casts on how to use it are here.
Hope this helps.

Finding hyperlinks inside a PDF document?

I'm currently using Aspose PDF Kit to split a 'master PDF' up into individual documents + thumbnails. This works well at the moment, but the device I'll be rendering the PDF on won't know about the annotations/links within the PDF.
I understand there is a way to parse the PDF document to detect the X/Y position of a hyperlink etc. Is there a simple way to extract or iterate across the document data so I can write it to an external XML file?

You may want to try Docotic.Pdf library for this (disclaimer: I work for Bit Miracle).
The library can be used to retrieve all hyperlinks in a document. You may retrieve bounding box, text and other properties of a link, too.
Please take a look at "Extract text from link target" sample. It may help you to get started.

Pager HTML viewer

We are currently developing a Windows Forms application in VS 2008 C#. This application is for reading long (200 - 300 pages) law documents, and it handles about 30 - 40 docs. The application searches in document text, switches between documents, etc.
Our customer has sent the docs in separate *.rtf files for us to "put it into the application". We decided to convert the rtf files into HTML, using the MS Word's "Save as" function, and then selecting "filtered HTML". In this solution, the application can show the documents in a WebBrowser control.
Our problem is that the customer wants an additional "Pager view" function, where the user can read the documents as if they were a book: the pages are shown on a virtual sheet of paper, and the user can click next page, previous page, etc., like in the browser's Print preview dialog.
I have searched the internet for a paged HTML viewer, but I haven't found anything. Could you suggest any solution or component for showing the HTML pages in pager mode?
As a last resort, we can also keep the original rtf files for the Pager view. In that case, is there any solution for viewing RichText files in pager mode? (We want to avoid this if possible.)
Waiting for your answer,
Peter

I don't know of any components that can display HTML in pages, but a couple of possible solutions could be:
edit the HTML documents and manually separate them into linked pages (or into hidden divs, with JavaScript to hide/unhide the divs for navigating)
convert the RTF docs to XPS format and use WPF's DocumentViewer control. Since your app is WinForms, you'd probably have to do something like this:
http://www.codeproject.com/KB/dialog/WinFormWPFIntegration.aspx
(though someone commented on that page about a memory-leak :S that's something to keep an eye open for...)
