I have an MS Word file that contains some sentences, and I need to insert images between the lines. With the AddPicture method in Microsoft.Office.Interop.Word I can insert an image, but not at a particular position.
I did not find any method other than AddPicture for inserting an image into an existing Word file. I am trying to insert an image after a particular line: after the line containing "apple" there should be an image of an apple.
Here I am creating a paragraph and trying to add the image to it. This is my initial file:
It contains paragraphs with the words apple, mango, and grape.
This is the output of my code (below)
The image should be inserted after the apple line
Required output:
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Metadata;
using Word = Microsoft.Office.Interop.Word;
using System.IO;
namespace ConsoleApp2
{
    class Program
    {
        static void Main(string[] args)
        {
            Word.Application ap = new Word.Application();
            Word.Document document = ap.Documents.Open(@"C:\Users\ermcnnj\Desktop\Doc1.docx");
            //document.InlineShapes.AddPicture(@"C:\Users\ermcnnj\Desktop\apple.png");
            String read = string.Empty;
            List<string> data = new List<string>();
            for (int i = 0; i < document.Paragraphs.Count; i++)
            {
                string temp = document.Paragraphs[i + 1].Range.Text.Trim();
                if (temp != string.Empty && temp.Contains("Apple"))
                {
                    var pPicture = document.Paragraphs.Add();
                    pPicture.Format.SpaceAfter = 10f;
                    document.InlineShapes.AddPicture(@"C:\Users\ermcnnj\Desktop\apple.png", Range: pPicture.Range);
                }
            }
        }
    }
}
The above is the code I am using.
The following code snippet illustrates how this can be done. Note that, for the sake of clarity, it is simplified to set only the text to be found. There are many additional properties that might need to be specified; read up on the Find functionality in Word's Language Reference.
If a search term is found, the Range associated with Find changes to the found term and further action can be taken. In this case, a new (empty) paragraph is inserted after the found term. (The question specifies that the term is the entire content of a paragraph, so that's what this example assumes!) The Range is then moved to this new paragraph and the InlineShape inserted.
Note how the graphic is assigned to an InlineShape object. If anything needs to be done to this object, work with the object variable ils.
Word.Application ap = new Word.Application();
Word.Document document = ap.Documents.Open(@"C:\Users\ermcnnj\Desktop\Doc1.docx");
Word.Range rng = document.Content;
Word.Find wdFind = rng.Find;
wdFind.Text = "apple";
bool found = wdFind.Execute();
if (found)
{
    rng.InsertAfter("\n");
    rng.MoveStart(Word.WdUnits.wdParagraph, 1);
    Word.InlineShape ils = rng.InlineShapes.AddPicture(@"C:\Test\avatar.jpg", false, true, rng);
}
I have code that copies the content of one PowerPoint slide into another. Below is an example of how images are processed.
foreach (OpenXmlElement element in sourceSlide.CommonSlideData.ShapeTree.ChildElements.ToList())
{
string elementType = element.GetType().ToString();
if (elementType.EndsWith(".Picture"))
{
// Deep clone the element.
elementClone = element.CloneNode(true);
var picture = (Picture)elementClone;
// Get the picture's original rId
var blip = picture.BlipFill.Blip;
string rId = blip.Embed.Value;
// Retrieve the ImagePart from the original slide by rId
ImagePart sourceImagePart = (ImagePart)sourceSlide.SlidePart.GetPartById(rId);
// Add the image part to the new slide, letting OpenXml generate the new rId
ImagePart targetImagePart = targetSlidePart.AddImagePart(sourceImagePart.ContentType);
// And copy the image data.
targetImagePart.FeedData(sourceImagePart.GetStream());
// Retrieve the new ID from the target image part,
string id = targetSlidePart.GetIdOfPart(targetImagePart);
// and assign it to the picture.
blip.Embed.Value = id;
// Get the shape tree that we're adding the clone to and append to it.
ShapeTree shapeTree = targetSlide.CommonSlideData.ShapeTree;
shapeTree.Append(elementClone);
}
}
This code works fine. For other scenarios like Graphic Frames, it looks a bit different, because each graphic frame can contain multiple picture objects.
// Go thru all the Picture objects in this GraphicFrame.
foreach (var sourcePicture in element.Descendants<Picture>())
{
string rId = sourcePicture.BlipFill.Blip.Embed.Value;
ImagePart sourceImagePart = (ImagePart)sourceSlide.SlidePart.GetPartById(rId);
var contentType = sourceImagePart.ContentType;
var targetPicture = elementClone.Descendants<Picture>().First(x => x.BlipFill.Blip.Embed.Value == rId);
var targetBlip = targetPicture.BlipFill.Blip;
ImagePart targetImagePart = targetSlidePart.AddImagePart(contentType);
targetImagePart.FeedData(sourceImagePart.GetStream());
string id = targetSlidePart.GetIdOfPart(targetImagePart);
targetBlip.Embed.Value = id;
}
Now I need to do the same thing with OLE objects.
// Go thru all the embedded objects in this GraphicFrame.
foreach (var oleObject in element.Descendants<OleObject>())
{
// Get the rId of the embedded OLE object.
string rId = oleObject.Id;
// Get the EmbeddedPart from the source slide.
var embeddedOleObj = sourceSlide.SlidePart.GetPartById(rId);
// Get the content type.
var contentType = embeddedOleObj.ContentType;
// Create the Target Part. Let OpenXML assign an rId.
var targetObjectPart = targetSlide.SlidePart.AddNewPart<EmbeddedObjectPart>(contentType, null);
// Get the embedded OLE object data from the original object.
var objectStream = embeddedOleObj.GetStream();
// And give it to the ObjectPart.
targetObjectPart.FeedData(objectStream);
// Get the new rId and assign it to the OLE Object.
string id = targetSlidePart.GetIdOfPart(targetObjectPart);
oleObject.Id = id;
}
But it didn't work. The resulting PowerPoint is corrupted.
What am I doing wrong?
NOTE: All of the code works except for the rId handling in the OLE Object. I know it works because if I simply pass the original rId from the source object to the target Object Part, like this:
var targetObjectPart = targetSlide.SlidePart
.AddNewPart<EmbeddedObjectPart>(contentType, rId);
it will function properly, so long as that rId doesn't already exist in the target slide, which will obviously not work every time like I need it to.
The source slide and target slide are coming from different PPTX files. We're using OpenXML, not Office Interop.
Since you did not provide the full code, it is difficult to tell what's wrong.
My guess would be that you are not modifying the correct object.
In your code example for Pictures, you are creating and modifying elementClone.
In your code example for OLE objects, you are working with and modifying oleObject (which is a descendant of element), and it is not exactly clear from the context whether it is part of the source document or of the target document.
You can try this minimal example:
use a new pptx with one embedded ole object for c:\testdata\input.pptx
use a new pptx (a blank one) for c:\testdata\output.pptx
After running the code, I was able to open the embedded ole object in the output document.
using DocumentFormat.OpenXml.Presentation;
using DocumentFormat.OpenXml.Packaging;
using System.Linq;
namespace ooxml
{
class Program
{
static void Main(string[] args)
{
CopyOle("c:\\testdata\\input.pptx", "c:\\testdata\\output.pptx");
}
private static void CopyOle(string inputFile, string outputFile)
{
using (PresentationDocument sourceDocument = PresentationDocument.Open(inputFile, true))
{
using (PresentationDocument targetDocument = PresentationDocument.Open(outputFile, true))
{
var sourceSlidePart = sourceDocument.PresentationPart.SlideParts.First();
var targetSlidePart = targetDocument.PresentationPart.SlideParts.First();
foreach (var element in sourceSlidePart.Slide.CommonSlideData.ShapeTree.ChildElements)
{
//clones an element, does not copy the actual relationship target (e.g. ppt\embeddings\oleObject1.bin)
var elementClone = element.CloneNode(true);
//for each cloned OleObject, fix its relationship
foreach(var clonedOleObject in elementClone.Descendants<OleObject>())
{
//find the original EmbeddedObjectPart in the source document
//(we can use the id from the clonedOleObject to do that, since it contained the same id
// as the source ole object)
var sourceObjectPart = sourceSlidePart.GetPartById(clonedOleObject.Id);
//create a new EmbeddedObjectPart in the target document and copy the data from the original EmbeddedObjectPart
var targetObjectPart = targetSlidePart.AddEmbeddedObjectPart(sourceObjectPart.ContentType);
targetObjectPart.FeedData(sourceObjectPart.GetStream());
//update the relationship target on the clonedOleObject to point to the newly created EmbeddedObjectPart
clonedOleObject.Id = targetSlidePart.GetIdOfPart(targetObjectPart);
}
//add cloned element to the document
targetSlidePart.Slide.CommonSlideData.ShapeTree.Append(elementClone);
}
targetDocument.PresentationPart.Presentation.Save();
}
}
}
}
}
As for troubleshooting, the OOXML Tools chrome extension was helpful.
It lets you compare the structure of two documents, which makes it much easier to analyze what went wrong.
Examples:
if you were to only clone all elements, you could see that /ppt/embeddings/* and /ppt/media/* would be missing
or you can check whether the relationships are correct (e.g. input document uses "rId1" to reference the embedded data and the output document uses "R3a2fa0c37eaa42b5")
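The first check above can also be approximated without a browser extension, because a .pptx file is a ZIP archive. The following is only a rough sketch: the class and method names are made up for illustration, and it relies on nothing beyond System.IO.Compression (on .NET Framework this needs a reference to System.IO.Compression.FileSystem). It counts package entries per folder, so a missing ppt/embeddings or ppt/media folder in the output file stands out.

```csharp
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class PackageInspector
{
    // Counts entries per folder, e.g. "ppt/embeddings/oleObject1.bin"
    // is counted under "ppt/embeddings"; root-level entries use "".
    public static Dictionary<string, int> CountByFolder(IEnumerable<string> entryNames)
    {
        return entryNames
            .Select(name =>
            {
                int slash = name.LastIndexOf('/');
                return slash < 0 ? "" : name.Substring(0, slash);
            })
            .GroupBy(folder => folder)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Reads the entry names of an Office package (which is a ZIP archive).
    public static List<string> ReadEntryNames(string packagePath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(packagePath))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }
}
```

Calling PackageInspector.CountByFolder(PackageInspector.ReadEntryNames(path)) for both the input and the output file and comparing the two dictionaries shows which folders lost parts during the copy. It does not validate the relationships themselves.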
How can I find and then hide (or delete) specific text phrase?
For example, I have created a PDF file containing all sorts of data such as images, tables, text etc.
Now, I want to find a specific phrase like "Hello World" wherever it is mentioned in the file and somehow hide it, or -better even- delete it from the PDF.
And finally get the PDF after deleting this phrase.
I have tried iTextSharp and Spire, but couldn't find anything that worked.
Try the following code snippet to hide a specific text phrase in a PDF using Spire.PDF.
using Spire.Pdf;
using Spire.Pdf.General.Find;
using System.Drawing;
namespace HideText
{
    class Program
    {
        static void Main(string[] args)
        {
            //load PDF file
            PdfDocument doc = new PdfDocument();
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Example.pdf");
            //find every occurrence of "Hello World" on each page
            foreach (PdfPageBase page in doc.Pages)
            {
                PdfTextFind[] finds = page.FindText("Hello World").Finds;
                //cover each occurrence with a white background and an empty replacement string
                foreach (PdfTextFind find in finds)
                {
                    find.ApplyRecoverString("", Color.White, false);
                }
            }
            //save to file
            doc.SaveToFile("output.pdf");
        }
    }
}
The following snippet, which uses iText 7 with the pdfSweep add-on, lets you find and redact text in a PDF document:
PdfDocument pdf = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST));
ICleanupStrategy cleanupStrategy = new RegexBasedCleanupStrategy(new Regex(@"Alice", RegexOptions.IgnoreCase)).SetRedactionColor(ColorConstants.PINK);
PdfAutoSweep autoSweep = new PdfAutoSweep(cleanupStrategy);
autoSweep.CleanUp(pdf);
pdf.Close();
Pay attention to the license: it is AGPL unless you buy a commercial license.
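Before running a redaction, it can help to check what the regular expression actually matches. The sketch below is only an illustration on a plain string (the class and method names are made up, and it does not touch any PDF library); the text that pdfSweep extracts from a page may be split or ordered differently from what you see on screen.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for previewing a pattern; not part of iText.
public static class RedactionPatternPreview
{
    // Returns every substring of `text` matched by `pattern`, ignoring case,
    // mirroring the RegexOptions.IgnoreCase setting used in the snippet above.
    public static List<string> FindMatches(string text, string pattern)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        return regex.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

For example, RedactionPatternPreview.FindMatches("Alice met ALICE; alice left.", @"Alice") returns the three matches "Alice", "ALICE" and "alice".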
I am trying to modify a Word document and insert data at specific positions (I have a template document that I must prepare by filling in all the blank spaces). I am using the Microsoft.Office.Interop.Word library, and so far I have only figured out how to insert text at the end of the document. My code is below; maybe someone can help me out. Thanks!
private void button1_Click(object sender, EventArgs e)
{
    string str = null;
    OpenFileDialog dia = new OpenFileDialog();
    if (dia.ShowDialog() == System.Windows.Forms.DialogResult.OK)
    {
        str = dia.FileName;
        Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
        Microsoft.Office.Interop.Word.Document doc1 = app.Documents.Open(str);
        object missing = System.Reflection.Missing.Value;
        doc1.Content.Text += "Merge?";
        app.Visible = true;
        doc1.Save();
        this.Close();
    }
}
For the sake of simplicity, first add a bookmark in MS Word as follows:
Select the region where you want to add the text, then go to Insert > Bookmark in Word.
Then give the bookmark a name.
Then use the following modified version of Ben's code:
Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
Document doc = app.Documents.Open(Path.Combine(Environment.CurrentDirectory, "Report.doc"));
Dictionary<string, string> bookmarks = new Dictionary<string, string> { { "DateOfIssue", "23-06-2018"}, { "TotalNumOfPages", "20" } };
foreach (var bookmark in bookmarks)
{
Bookmark bm = doc.Bookmarks[bookmark.Key];
Range range = bm.Range;
range.Text = bookmark.Value;
doc.Bookmarks.Add(bookmark.Key, range);
}
Finally, the output is as follows:
You can use the Range object to insert text at a specific position (see the Range documentation on MSDN).
doc1.Range(0, 0).Text = "Hello World";
If you have a template and the text always has to be inserted at the same location, you could also use a Bookmark (see the Bookmark documentation on MSDN).
[Update]
Here is a complete example to add text to a word document by a bookmark:
Application app = new Microsoft.Office.Interop.Word.Application();
Document doc = app.Documents.Open(@"your file");
string bookmark = "BookmarkName";
Bookmark bm = doc.Bookmarks[bookmark];
Range range = bm.Range;
range.Text = "Hello World";
doc.Bookmarks.Add(bookmark, range);
With this solution, the bookmark will not be deleted and you can add/modify it later again with the same piece of code.
You can use the following to insert a string into another string in a specific position.
doc1.Content.Text = doc1.Content.Text.Insert(10, "Merge?");
Source: https://msdn.microsoft.com/en-us/library/system.string.insert(v=vs.110).aspx
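One thing to keep in mind with String.Insert is that it throws an ArgumentOutOfRangeException when the index is negative or greater than the string's length, so a fixed index such as 10 fails on a shorter text. A minimal sketch of a guard, using a made-up helper name and working on plain strings only:

```csharp
using System;

// Hypothetical helper; it only wraps String.Insert.
public static class TextInsertion
{
    // Inserts `value` at `index`, clamping the index to the valid range
    // [0, text.Length] so that short texts do not cause an exception.
    public static string InsertClamped(string text, int index, string value)
    {
        int safeIndex = Math.Max(0, Math.Min(index, text.Length));
        return text.Insert(safeIndex, value);
    }
}
```

With this, TextInsertion.InsertClamped("Hello World", 5, ",") gives "Hello, World", and TextInsertion.InsertClamped("Hi", 10, "!") gives "Hi!" instead of throwing.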
I've been searching the Internet for two weeks and found some interesting approaches to my problem, but none of them gives me the answer.
My goal is the following:
I want to find a text in a static PDF file and replace it with another text.
I would like to keep the design of the content. Is it really that hard?
I found a way, but with it I lose everything except the plain text:
using (PdfReader reader = new PdfReader(path))
{
StringBuilder text = new StringBuilder();
for (int i = 1; i <= reader.NumberOfPages; i++)
{
text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
text.Replace(txt_SuchenNach.Text, txt_ErsetzenMit.Text);
}
return text.ToString();
}
The second try I had was way better, but needs fields where I can change the text inside:
string fileNameExisting = path;
string fileNameNew = @"C:\TEST.pdf";
using (FileStream existingFileStream = new FileStream(fileNameExisting, FileMode.Open))
using (FileStream newFileStream = new FileStream(fileNameNew, FileMode.Create))
{
    // Open the PDF
    PdfReader pdfReader = new PdfReader(existingFileStream);
    PdfStamper stamper = new PdfStamper(pdfReader, newFileStream);
    var form = stamper.AcroFields;
    var fieldKeys = form.Fields.Keys;
    foreach (string fieldKey in fieldKeys)
    {
        var value = pdfReader.AcroFields.GetField(fieldKey);
        form.SetField(fieldKey, value.Replace(txt_SuchenNach.Text, txt_ErsetzenMit.Text));
    }
    // Make the text field non-editable (it then looks like normal text)
    stamper.FormFlattening = true;
    stamper.Close();
    pdfReader.Close();
}
This keeps the formatting of the rest of the text and changes only the text I searched for. I need a solution for text that is NOT in a text field.
Thanks for all your answers and your help.
The general issue is that text objects may use embedded fonts with specific glyphs assigned to specific letters. I.e. if you have a text object with some text like "abcdef" then the embedded font may contain glyphs for these ("abcdef" letters) only but not for other letters. So if you replace "abcdef" with "xyz" then the PDF will not display these "xyz" as no glyphs are available for these letters to be displayed.
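To make the glyph problem concrete, here is a small sketch that compares a replacement text against the set of characters a font subset covers. It is only an illustration: the class and method names are made up, and the available characters are passed in as a plain string, because reading them out of a real embedded font is outside the scope of this sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; it does not use any PDF library.
public static class GlyphSubsetCheck
{
    // Returns the distinct characters of `replacement` that do not occur
    // in `availableChars`, in the order in which they first appear.
    public static string MissingCharacters(string availableChars, string replacement)
    {
        var available = new HashSet<char>(availableChars);
        return new string(replacement.Where(c => !available.Contains(c)).Distinct().ToArray());
    }
}
```

GlyphSubsetCheck.MissingCharacters("abcdef", "xyz") returns "xyz", meaning none of the replacement letters could be displayed with that subset, while GlyphSubsetCheck.MissingCharacters("abcdef", "face") returns an empty string.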
So I would consider the following workflow:
Iterate through all the text objects;
Add new text objects, created from scratch, on top of the PDF file and set the same properties (font, position, etc.) but with different text. This step may require the fonts used in the original PDF to be installed on your machine, but you can check which fonts are installed and use another font for the new text object. This way iTextSharp or another PDF tool will embed a new font object for the new text object.
Remove original text object once you have created a duplicated text object;
Process every text object with the workflow described above;
Save the modified PDF document into a new file.
I have worked on the same requirement and I am able to achieve this by the following steps.
Step1: Locating Source Pdf File and Destination file Path
Step2: Read Source Pdf file and Searching for the location of string that we want to replace
Step3: Replacing the string with new one.
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using PDFExtraction;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
namespace PDFReplaceTextUsingItextSharp
{
public partial class ExtractPdf : System.Web.UI.Page
{
static iTextSharp.text.pdf.PdfStamper stamper = null;
protected void Page_Load(object sender, EventArgs e)
{
}
protected void Replace_Click(object sender, EventArgs e)
{
string ReplacingVariable = txtReplace.Text;
string sourceFile = "Source File Path";
string descFile = "Destination File Path";
PdfReader pReader = new PdfReader(sourceFile);
stamper = new iTextSharp.text.pdf.PdfStamper(pReader, new System.IO.FileStream(descFile, System.IO.FileMode.Create));
PDFTextGetter("ExistingVariableinPDF", ReplacingVariable , StringComparison.CurrentCultureIgnoreCase, sourceFile, descFile);
stamper.Close();
pReader.Close();
}
/// <summary>
/// This method is used to search for the location words in pdf and update it with the words given from replacingText variable
/// </summary>
/// <param name="pSearch">Searchable String</param>
/// <param name="replacingText">Replacing String</param>
/// <param name="SC">Case Ignorance</param>
/// <param name="SourceFile">Path of the source file</param>
/// <param name="DestinationFile">Path of the destination file</param>
public static void PDFTextGetter(string pSearch, string replacingText, StringComparison SC, string SourceFile, string DestinationFile)
{
try
{
iTextSharp.text.pdf.PdfContentByte cb = null;
iTextSharp.text.pdf.PdfContentByte cb2 = null;
iTextSharp.text.pdf.PdfWriter writer = null;
iTextSharp.text.pdf.BaseFont bf = null;
if (System.IO.File.Exists(SourceFile))
{
PdfReader pReader = new PdfReader(SourceFile);
for (int page = 1; page <= pReader.NumberOfPages; page++)
{
myLocationTextExtractionStrategy strategy = new myLocationTextExtractionStrategy();
cb = stamper.GetOverContent(page);
cb2 = stamper.GetOverContent(page);
//Send some data contained in PdfContentByte; the first value always seems to be zero for me and the second 100,
//but I'm not sure whether this could change in some cases
strategy.UndercontentCharacterSpacing = (int)cb.CharacterSpacing;
strategy.UndercontentHorizontalScaling = (int)cb.HorizontalScaling;
//It's not really needed to get the text back, but we have to call this line ALWAYS,
//because it triggers the process that will get all chunks from PDF into our strategy Object
string currentText = PdfTextExtractor.GetTextFromPage(pReader, page, strategy);
//The real getter process starts in the following line
List<iTextSharp.text.Rectangle> MatchesFound = strategy.GetTextLocations(pSearch, SC);
//Set the fill color of the shapes. I don't use a border because it would make the rect bigger,
//but a thin border could be a solution if the current rect is not big enough to cover all the text it should cover
cb.SetColorFill(BaseColor.WHITE);
//MatchesFound contains all matches with their locations; each one is covered with a white rectangle and the replacement text is drawn on top:
foreach (iTextSharp.text.Rectangle rect in MatchesFound)
{
//width
cb.Rectangle(rect.Left, rect.Bottom, 60, rect.Height);
cb.Fill();
cb2.SetColorFill(BaseColor.BLACK);
bf = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
cb2.SetFontAndSize(bf, 9);
cb2.BeginText();
cb2.ShowTextAligned(0, replacingText, rect.Left, rect.Bottom, 0);
cb2.EndText();
cb2.Fill();
}
}
}
}
catch (Exception ex)
{
}
}
}
}
I'm looking to replace a bookmark in a word document with the entire contents of another word document. I was hoping to do something along the lines of the following, but appending the xml does not seem to be enough as it does not include pictures.
using Word = Microsoft.Office.Interop.Word;
...
Word.Application wordApp = new Word.Application();
Word.Document doc = wordApp.Documents.Add(filename);
var bookmark = doc.Bookmarks.OfType<Bookmark>().First();
var doc2 = wordApp.Documents.Add(filename2);
bookmark.Range.InsertXML(doc2.Contents.XML);
The second document contains a few images and a few tables of text.
Update: Progress made by using XML, but still doesn't satisfy adding pictures as well.
You've jumped in deep.
If you're using the object model (bookmark.Range) and trying to insert a picture you can use the clipboard or bookmark.Range.InlineShapes.AddPicture(...). If you're trying to insert a whole document you can copy/paste the second document:
Object objUnit = Word.WdUnits.wdStory;
wordApp.Selection.EndKey(ref objUnit, ref oMissing);
wordApp.ActiveWindow.Selection.PasteAndFormat(Word.WdRecoveryType.wdPasteDefault);
If you're using XML there may be other problems, such as formatting, images, headers/footers not coming in correctly.
Depending on the task, it may be better to use DocumentBuilder and the Open XML SDK. If you're writing a Word add-in you can use the object API, and it will likely perform the same; if you're processing documents without Word, go with the Open XML SDK and DocumentBuilder. The issue with DocumentBuilder is that if it doesn't work, there aren't many workarounds to try. It's open source, but it's not the cleanest piece of code if you try troubleshooting it.
You can do this with the Open XML SDK and DocumentBuilder. In outline, here is what you will need:
1> Inject an insert key into the main document
public WmlDocument GetProcessedTemplate(string templatePath, string insertKey)
{
WmlDocument templateDoc = new WmlDocument(templatePath);
using (MemoryStream mem = new MemoryStream())
{
mem.Write(templateDoc.DocumentByteArray, 0, templateDoc.DocumentByteArray.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open([source], true))
{
XDocument xDoc = doc.MainDocumentPart.GetXDocument();
XElement bookMarkPara = [get bookmarkPara to replace];
bookMarkPara.ReplaceWith(new XElement(PtOpenXml.Insert, new XAttribute("Id", insertKey)));
doc.MainDocumentPart.PutXDocument();
}
templateDoc.DocumentByteArray = mem.ToArray();
}
return templateDoc;
}
2> Use DocumentBuilder to merge the documents
List<Source> documentSources = new List<Source>();
var insertKey = "INSERT_HERE_1";
var processedTemplate = GetProcessedTemplate([docPath], insertKey);
documentSources.Add(new Source(processedTemplate, true));
documentSources.Add(new Source(new WmlDocument([docToInsertFilePath]), insertKey));
DocumentBuilder.BuildDocument(documentSources, [outputFilePath]);