Edit PDF text using C#

How can I find and then hide (or delete) a specific text phrase?
For example, I have created a PDF file containing all sorts of data such as images, tables, text, etc.
Now I want to find a specific phrase like "Hello World" wherever it appears in the file and somehow hide it or, better still, delete it from the PDF.
Finally, I want to get the resulting PDF with the phrase removed.
I have tried iTextSharp and Spire, but couldn't find anything that worked.

Try the following code snippet to hide a specific text phrase in a PDF using Spire.PDF.
using Spire.Pdf;
using Spire.Pdf.General.Find;
using System.Drawing;
namespace HideText
{
class Program
{
static void Main(string[] args)
{
//load PDF file
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Example.pdf");
//find every occurrence of "Hello World", page by page
foreach (PdfPageBase page in doc.Pages)
{
PdfTextFind[] finds = page.FindText("Hello World").Finds;
//cover each result on this page with a white background
foreach (PdfTextFind find in finds)
{
find.ApplyRecoverString("", Color.White, false);
}
}
//save to file
doc.SaveToFile("output.pdf");
}
}
}

The following snippet, which uses iText 7 with the pdfSweep add-on, lets you find and redact text in a PDF document:
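using System.Text.RegularExpressions;
using iText.Kernel.Colors;
using iText.Kernel.Pdf;
using iText.PdfCleanup.Autosweep;
// SRC and DEST are the input and output file paths
PdfDocument pdf = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST));
ICleanupStrategy cleanupStrategy = new RegexBasedCleanupStrategy(new Regex(@"Alice", RegexOptions.IgnoreCase)).SetRedactionColor(ColorConstants.PINK);
PdfAutoSweep autoSweep = new PdfAutoSweep(cleanupStrategy);
autoSweep.CleanUp(pdf);
pdf.Close();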
Pay attention to the license: it is AGPL unless you buy a commercial license.

Related

Searching PDF file for specific word then saving that specific PDF page C#

I am trying to create an app which will search through a bulk PDF file for a letterID (e.g. 1234567) entered via a textbox. If it finds the ID, it will then save that specific page to a new document. I'm currently using PDFsharp, but I'm struggling to find anything online that resembles what I am trying to achieve.
UPDATE: I have solved my issue and managed to get a working result from it! Will update thread tomorrow with code as it may help others.
private void btnSearch_Click(object sender, EventArgs e)
{
string letterID = txtLetterID.Text;
// Open the input file in Import Mode
PdfDocument inputPDFFile = PdfReader.Open(path, PdfDocumentOpenMode.Import);
PdfDictionary dictionary = new PdfDictionary(inputPDFFile);
string id = dictionary.Elements.GetString(letterID);
//Get the total pages in the PDF
var totalPagesInInputPDFFile = inputPDFFile.PageCount;
if (id.Equals(letterID))
{
//Create an instance of the PDF document in memory
PdfDocument outputPDFDocument = new PdfDocument();
// Add a specific page to the PdfDocument instance
outputPDFDocument.AddPage(inputPDFFile.Pages[totalPagesInInputPDFFile - 1]);
//save the PDF document
SaveOutputPDF(outputPDFDocument, totalPagesInInputPDFFile);
}
else
{
lblMessage.Text = "Letter ID not found in this set of letters";
}
}

How to show PDF file in MigraDoc.Rendering.Forms.DocumentPreview (in WinForms)?

I have a simple question: how can you show a PDF file using PagePreview?
I have a full path name: document.FileName = "c:\scans\Insurance_34345.pdf";
pagePreview.Preview(document.FileName); or something...
If there is another way to show a PDF, that's fine too. I want to show it on a WinForms Form.
I tried this, but I don't know what I have to do...
in the Designer
private MigraDoc.Rendering.Forms.DocumentPreview dpvScannedDoc;
Part of the code
string fullPadnaam = Path.Combine(defaultPath, document.FileName);
//PdfDocument pdfDocument = new PdfDocument(fullPadnaam);
//PdfPage page = new PdfPage(pdfDocument);
//XGraphics gfx = XGraphics.FromPdfPage(page);
MigraDoc.DocumentObjectModel.Document pdfDocument = new MigraDoc.DocumentObjectModel.Document();
pdfDocument.ImagePath = fullPadnaam;
var docRenderer = new DocumentRenderer(pdfDocument);
docRenderer.PrepareDocument();
var inPdfDoc = PdfReader.Open(fullPadnaam, PdfDocumentOpenMode.ReadOnly);
for (var i = 0; i < inPdfDoc.PageCount; i++)
{
pdfDocument.AddSection();
docRenderer.PrepareDocument();
var page = inPdfDoc.Pages[i];
var gfx = XGraphics.FromPdfPage(page);
docRenderer.RenderPage(gfx, i + 1);
}
var renderer = new PdfDocumentRenderer();
renderer.Document = pdfDocument;
renderer.RenderDocument();
// MigraDoc.DocumentObjectModel.IO.DdlWriter dw = new MigraDoc.DocumentObjectModel.IO.DdlWriter("HelloWorld.mdddl");
// dw.WriteDocument(pdfDocument);
// dw.Close();
//renderer.PdfDocument.rea(outFilePath);
//string ddl = MigraDoc.DocumentObjectModel.IO.DdlWriter.WriteToString(document1);
dpvScannedDoc.Show( pdfDocument);

PDFsharp does not render PDF files. You cannot show PDF files using the PagePreview.
If you use the XGraphics class for drawing then you can use shared code that draws on the PagePreview and on PDF pages.
The PagePreview sample can be found in the sample package and here:
http://www.pdfsharp.net/wiki/Preview-sample.ashx
If you have code that creates a new PDF file using PDFsharp then you can use the PagePreview to show on screen what you would otherwise draw on PDF pages. You cannot draw existing PDF pages using the PagePreview because PDFsharp does not render PDF.

The MigraDoc DocumentPreview can display MDDDL files (your sample code creates a file "HelloWorld.mdddl"), but it cannot display PDF files.
If the MDDDL uses PDF files as images, they will not show up in the preview. They will show when creating a PDF from the MDDDL.

How can I remove image properties such as local path that Adobe Illustrator has been embedded to PDF file?

I'm trying to replace an image in a PDF file using iTextSharp (not the Java version). It works fine, but the only problem is that when I open the PDF with Adobe Illustrator, it always opens with the old hard link. That means Adobe Illustrator always shows the old image from before the replacement. Oddly, it displays fine in Adobe Reader (the replaced image is visible).
This is the code snippet I've tried:
public static void ReplaceImage(string pdfIn, string imagePath, string pdfOut)
{
PdfReader reader = new PdfReader(pdfIn);
PdfStamper stamper = new PdfStamper(reader, new FileStream(pdfOut, FileMode.Create));
PdfWriter writer = stamper.Writer;
Image img = Image.GetInstance(SysDrawing.Image.FromFile(imagePath), ImageFormat.Tiff);
PdfDictionary page = reader.GetPageN(1);
PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
PdfDictionary xobject = resources.GetAsDict(PdfName.XOBJECT);
PdfDictionary properties = resources.GetAsDict(PdfName.PROPERTIES);
PdfDictionary procset = resources.GetAsDict(PdfName.PROCSET);
if (xobject != null)
{
List<PdfName> imgs = new List<PdfName>();
foreach (var ele in xobject.Keys)
{
PdfIndirectReference iref = xobject.GetAsIndirectObject(ele);
imgs.Add(ele);
if (iref.IsIndirect())
{
try
{
PdfDictionary pg = (PdfDictionary)PdfReader.GetPdfObject(iref);
if (pg != null)
{
PdfReader.KillIndirect(iref);
if (PdfName.IMAGE.Equals(pg.GetAsName(PdfName.SUBTYPE)))
{
if (img.ImageMask != null)
writer.AddDirectImageSimple(img.ImageMask);
writer.AddDirectImageSimple(img, iref);
}
}
else
{
PdfReader.KillIndirect(iref);
writer.AddDirectImageSimple(img, iref);
}
}
catch {
continue;
}
}
}
}
//stamper.SetFullCompression();
stamper.Close();
stamper.Dispose();
reader.RemoveUnusedObjects();
reader.RemoveAnnotations();
reader.RemoveFields();
reader.Close();
reader.Dispose();
}
Any answer would be appreciated!

Your PDF contains two different documents: one described using PDF syntax and one described using Adobe Illustrator syntax. These two different documents should look identical, but as you have changed the PDF version of the document, they no longer do.
You perceive the document as only one document, because the AI document is stored inside the PDF document. In another question on SO ("Insert hidden digest in pdf using iText library"), mkl explains the mechanism.
In his answer, mkl explains how to add hidden data (in that case a hidden digest, in your case the document in AI format) to a PDF.
You can remove this second document like this:
// iTextSharp (C#) equivalent of the Java calls getCatalog() / remove()
PdfDictionary catalog = reader.Catalog;
catalog.Remove(PdfName.PIECEINFO);
Of course, this throws away the Adobe Illustrator data entirely, so you won't be able to edit the PDF in Adobe Illustrator anymore. If you want the image to change in the AI syntax, you need a library that is able to change AI syntax (and I don't know of any such library).
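For completeness, here is a minimal sketch of how that removal might be wired into a full read/write cycle with iTextSharp 5. The class and method names (PieceInfoRemover, RemovePieceInfo) are made up for illustration. Also clearing /PieceInfo on each page is my own assumption, not part of the answer above, since some files keep the private application data at page level.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class PieceInfoRemover
{
    // Hypothetical helper: copies pdfIn to pdfOut without the /PieceInfo entries.
    public static void RemovePieceInfo(string pdfIn, string pdfOut)
    {
        PdfReader reader = new PdfReader(pdfIn);
        try
        {
            // Remove the private application data from the document catalog,
            // as described in the answer above.
            reader.Catalog.Remove(PdfName.PIECEINFO);

            // Assumption: some files store the data per page, so clear it there too.
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                reader.GetPageN(i).Remove(PdfName.PIECEINFO);
            }

            using (FileStream output = new FileStream(pdfOut, FileMode.Create))
            {
                // The stamper writes the modified document when it is closed.
                PdfStamper stamper = new PdfStamper(reader, output);
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

You would call this on the output of ReplaceImage, e.g. PieceInfoRemover.RemovePieceInfo("replaced.pdf", "clean.pdf"), where both file names are placeholders.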

Replace a bookmark in a Word document with the contents of another word document

I'm looking to replace a bookmark in a word document with the entire contents of another word document. I was hoping to do something along the lines of the following, but appending the xml does not seem to be enough as it does not include pictures.
using Word = Microsoft.Office.Interop.Word;
...
Word.Application wordApp = new Word.Application();
Word.Document doc = wordApp.Documents.Add(filename);
var bookmark = doc.Bookmarks.OfType<Bookmark>().First();
var doc2 = wordApp.Documents.Add(filename2);
bookmark.Range.InsertXML(doc2.Contents.XML);
The second document contains a few images and a few tables of text.
Update: Progress made by using XML, but still doesn't satisfy adding pictures as well.

You've jumped in deep.
If you're using the object model (bookmark.Range) and trying to insert a picture you can use the clipboard or bookmark.Range.InlineShapes.AddPicture(...). If you're trying to insert a whole document you can copy/paste the second document:
Object objUnit = Word.WdUnits.wdStory;
wordApp.Selection.EndKey(ref objUnit, ref oMissing);
wordApp.ActiveWindow.Selection.PasteAndFormat(Word.WdRecoveryType.wdPasteDefault);
If you're using XML there may be other problems, such as formatting, images, headers/footers not coming in correctly.
Depending on the task, it may be better to use DocumentBuilder and the OpenXML SDK. If you're writing a Word add-in you can use the object API, and it will likely perform about the same. If you're processing documents without Word, go with the OpenXML SDK and DocumentBuilder. The issue with DocumentBuilder is that if it doesn't work, there aren't many workarounds to try. It's open source, but not the cleanest piece of code if you need to troubleshoot it.
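As a rough sketch of the copy/paste route with the object model, something along these lines might work. It assumes Word is installed, that the project targets C# 4 or later (so the optional COM parameters can be omitted), and that the first bookmark is the one to replace. BookmarkReplacer, ReplaceFirstBookmark and outputPath are names invented for this example, and it opens the files directly rather than using Documents.Add as in the question.

```csharp
using System.Linq;
using Word = Microsoft.Office.Interop.Word;

public static class BookmarkReplacer
{
    // Hypothetical helper: pastes the whole of filename2 over the first bookmark in filename.
    public static void ReplaceFirstBookmark(string filename, string filename2, string outputPath)
    {
        Word.Application wordApp = new Word.Application();
        try
        {
            Word.Document doc = wordApp.Documents.Open(filename);
            Word.Document doc2 = wordApp.Documents.Open(filename2, ReadOnly: true);

            Word.Bookmark bookmark = doc.Bookmarks.Cast<Word.Bookmark>().First();

            // Copy the complete second document (text, tables, pictures) to the clipboard...
            doc2.Content.Copy();
            // ...and paste it over the bookmark's range.
            bookmark.Range.Paste();

            doc2.Close(SaveChanges: false);
            doc.SaveAs2(outputPath);
            doc.Close();
        }
        finally
        {
            // Always shut down the Word process, even if something above throws.
            wordApp.Quit();
        }
    }
}
```

Because this goes through the clipboard, it is not a good fit for server-side or unattended processing; for that, the OpenXML route mentioned above is the safer choice.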

You can do this with the OpenXML SDK and DocumentBuilder. In outline, here is what you will need:
1> Inject insert key in main doc
public WmlDocument GetProcessedTemplate(string templatePath, string insertKey)
{
WmlDocument templateDoc = new WmlDocument(templatePath);
using (MemoryStream mem = new MemoryStream())
{
mem.Write(templateDoc.DocumentByteArray, 0, templateDoc.DocumentByteArray.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open([source], true))
{
XDocument xDoc = doc.MainDocumentPart.GetXDocument();
XElement bookMarkPara = [get bookmarkPara to replace];
bookMarkPara.ReplaceWith(new XElement(PtOpenXml.Insert, new XAttribute("Id", insertKey)));
doc.MainDocumentPart.PutXDocument();
}
templateDoc.DocumentByteArray = mem.ToArray();
}
return templateDoc;
}
2> Use document builder to merge
List<Source> documentSources = new List<Source>();
var insertKey = "INSERT_HERE_1";
var processedTemplate = GetProcessedTemplate([docPath], insertKey);
documentSources.Add(new Source(processedTemplate, true));
documentSources.Add(new Source(new WmlDocument([docToInsertFilePath]), insertKey));
DocumentBuilder.BuildDocument(documentSources, [outputFilePath]);

Populate a word template using C# in ASP.NET MVC3

I have read some posts about populating Word documents, but I need to populate a Word document (Office 2007) using C#. For example, I want to have a Word document with a label [NAME], use that label in C# to insert my value, and do all this in an ASP.NET MVC3 controller. Any ideas?

You could use the OpenXML SDK provided by Microsoft to manipulate Word documents. And here's a nice article (it's actually the third of a series of 3 articles) with a couple of examples.
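To give an idea of what that can look like, here is a minimal sketch that replaces a placeholder such as [NAME] in the main document body with the OpenXML SDK. TemplateFiller and Fill are hypothetical names. It assumes the placeholder sits inside a single text run; Word sometimes splits typed text across several runs, in which case this simple approach will not find it. Headers and footers are not touched.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TemplateFiller
{
    // Hypothetical helper: copies the template and replaces every occurrence
    // of the placeholder in the main document body.
    public static void Fill(string templatePath, string outputPath, string placeholder, string value)
    {
        // Work on a copy so the template itself is never modified.
        File.Copy(templatePath, outputPath, true);

        using (WordprocessingDocument doc = WordprocessingDocument.Open(outputPath, true))
        {
            // Text elements hold the visible text of each run.
            foreach (Text text in doc.MainDocumentPart.Document.Descendants<Text>().ToList())
            {
                if (text.Text.Contains(placeholder))
                {
                    text.Text = text.Text.Replace(placeholder, value);
                }
            }

            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

In an MVC3 controller action you could call TemplateFiller.Fill(templatePath, outputPath, "[NAME]", name) and then return the generated file with File(outputPath, ...); the variable names here are placeholders.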

You can do it like this:
- Add bookmarks to your Word document template
- Work on a copy of your Word template
- Set the bookmark values from C# code and save or print your file.
Be careful to release your Word process correctly if you handle several documents in your application :)

OP's solution extracted from the question:
The solution i found is this:
static void Main(string[] args)
{
Console.WriteLine("Starting up Word template updater ...");
//get path to template and instance output
string docTemplatePath = @"C:\Users\user\Desktop\Doc Offices XML\earth.docx";
string docOutputPath = @"C:\Users\user\Desktop\Doc Offices XML\earth_Instance.docx";
//create copy of template so that we don't overwrite it
File.Copy(docTemplatePath, docOutputPath);
Console.WriteLine("Created copy of template ...");
//stand up object that reads the Word doc package
using (WordprocessingDocument doc = WordprocessingDocument.Open(docOutputPath, true))
{
//create XML string matching custom XML part
string newXml = "<root>" +
"<Earth>Outer Space</Earth>" +
"</root>";
MainDocumentPart main = doc.MainDocumentPart;
main.DeleteParts<CustomXmlPart>(main.CustomXmlParts);
//MainDocumentPart mainPart = doc.AddMainDocumentPart();
//add and write new XML part
CustomXmlPart customXml = main.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (StreamWriter ts = new StreamWriter(customXml.GetStream()))
{
ts.Write(newXml);
}
//closing WordprocessingDocument automatically saves the document
}
Console.WriteLine("Done");
Console.ReadLine();
}
