Add text and picture in .docx file - c#

I use an Office Word file (a template) that contains repetitive default text and a photo, which I have to replace with another photo and other text.
How can I define specific zones in the template and then find those zones in C# to replace them?

I think the best way is to work out how to manipulate the Word XML structure to include the data you want.
For filling and altering templates you can use the Open XML SDK from Microsoft.
You can also follow a manual approach without the SDK: you add a custom XML resource that includes your changes/resources for the template.
If you don't need to be that flexible, you can use the standard content control / picture content control in Word and replace their contents afterwards in C#. It depends on how flexible you want to be in replacing elements.
You can find a good and complete example of using picture content control here: Picture content control handling

OK, in the end I tried this approach: use a Word file with content controls and an XML file to bind data to them.
For that I use the following code:
string outFile = @"D:\template_created.docx";
string docPath = @"D:\template.docx";
string xmlPath = @"D:\template.xml";
File.Copy(docPath, outFile);
using (WordprocessingDocument doc = WordprocessingDocument.Open(outFile, true))
{
    MainDocumentPart mdp = doc.MainDocumentPart;
    if (mdp.CustomXmlParts != null)
    {
        mdp.DeleteParts<CustomXmlPart>(mdp.CustomXmlParts);
    }
    CustomXmlPart cxp = mdp.AddCustomXmlPart(CustomXmlPartType.CustomXml);
    FileStream fs = null;
    try
    {
        fs = new FileStream(xmlPath, FileMode.Open);
        cxp.FeedData(fs);
        mdp.Document.Save();
    }
    finally
    {
        if (fs != null)
        {
            fs.Dispose();
        }
    }
}
When I run the app, it creates the custom XML part and appends it to my Word file. When I open the Word file there is no error, but none of the content controls are filled.

My final approach was to use content controls in my Word document, each with a unique id. Then I can find those ids in C# and replace the content.
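The lookup described above can be sketched without the SDK, since a .docx file is a ZIP archive whose main part is normally word/document.xml. This is only a minimal sketch under stated assumptions: the controls are identified by their tag (the w:tag element) rather than their numeric id, only text is replaced (pictures are not handled), and the names ContentControlFiller and SetText are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class ContentControlFiller
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces the text of every content control whose <w:tag w:val="..."/> equals tag.
    // The stream must be readable, writable and seekable. Returns the number of controls updated.
    public static int SetText(Stream docx, string tag, string newText)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Update, leaveOpen: true))
        {
            // Assumption: the main document part lives at the conventional path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found.");

            using (Stream s = entry.Open())
            {
                XDocument xml = XDocument.Load(s, LoadOptions.PreserveWhitespace);
                int updated = 0;

                foreach (XElement sdt in xml.Descendants(W + "sdt").ToList())
                {
                    XElement tagEl = sdt.Element(W + "sdtPr")?.Element(W + "tag");
                    if ((string)tagEl?.Attribute(W + "val") != tag) continue;

                    // Put the new text in the first <w:t> and drop any further ones.
                    var texts = sdt.Descendants(W + "t").ToList();
                    if (texts.Count == 0) continue;
                    texts[0].Value = newText;
                    foreach (XElement extra in texts.Skip(1)) extra.Remove();
                    updated++;
                }

                // Overwrite the entry with the modified XML.
                s.SetLength(0);
                xml.Save(s, SaveOptions.DisableFormatting);
                return updated;
            }
        }
    }
}
```

For example, with a copy of the template opened as a FileStream with FileAccess.ReadWrite, ContentControlFiller.SetText(stream, "CustomerName", "Alice") would fill every control tagged CustomerName (a made-up tag). With the Open XML SDK the same idea applies: enumerate the content control elements and match on their tag.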

Related

How to convert docx to html file using open xml with formatting

I know there are a lot of questions with the same title, but I am currently having some issues and none of them showed me the correct way to go.
I am using Open XML SDK 2.5 along with PowerTools to convert a .docx file to an .html file; the conversion uses the HtmlConverter class.
I can successfully convert the docx file into an HTML file, but the problem is that the HTML file doesn't retain the original formatting of the document. For example, font size, color, underline, bold, etc. are not reflected in the HTML file.
Here is my existing code:
public void ConvertDocxToHtml(string fileName)
{
    byte[] byteArray = File.ReadAllBytes(fileName);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        memoryStream.Write(byteArray, 0, byteArray.Length);
        using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
        {
            HtmlConverterSettings settings = new HtmlConverterSettings()
            {
                PageTitle = "My Page Title"
            };
            XElement html = HtmlConverter.ConvertToHtml(doc, settings);
            File.WriteAllText(@"E:\Test.html", html.ToStringNewLineOnAttributes());
        }
    }
}
So I just want to know if there is any way to retain the formatting in the converted HTML file.
I know about some third-party APIs that do the same thing, but I would prefer a way to do this using Open XML or some other open source tool.
PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See https://openxmldeveloper.org/
Your end result will not look exactly the way your Word document does, but this link might help.
You might want to find an external tool to help you do this, such as Aspose.Words.
You can use the OpenXML Viewer extension for Firefox to convert with formatting.
http://openxmlviewer.codeplex.com
This works for me. Hope this helps.

Creating report with Aspose.Word without losing formatting

I am using Aspose.Words to create reports from a template file (.docx file type).
After using Aspose.Words to modify the template file and save it as a new file, the formatting of the template file was lost (bold text, comments, etc.).
I have tried:
Aspose.Words.Document doc = new Document(inputStream);
var outputStream = new MemoryStream();
doc.Save(outputStream, SaveFormat.Docx);
What I did not expect is that outputStream contains far fewer bytes than inputStream, although I have not yet made any modification to doc. That may be the reason why the report file loses its formatting.
What should I try now?
OK, the problem is that the version of Aspose.Words I'm currently using does not support the docx file type. It can still read the text of a .docx file, but only the text (without any associated formatting).

Editing Outlook.MailItem's RTFBody with C#

I'm trying to append a string to an outgoing Outlook.MailItem. In the send event handler I have:
switch (mailItem.BodyFormat) {
    case Outlook.OlBodyFormat.olFormatRichText:
        byte[] mailItemBytes = mailItem.RTFBody as byte[];
        System.Text.Encoding encoding = new System.Text.ASCIIEncoding();
        string RTF = encoding.GetString(mailItemBytes);
        RTF += "my string";
        byte[] moreMailItemBytes = encoding.GetBytes(RTF);
        mailItem.RTFBody = moreMailItemBytes;
        break;
    // ...
}
but the received email does not contain my string.
I know this is old and already has a green check, but after searching for similar issues I found a page that gives a good answer on how to modify the RTF body in an Outlook project using the Word.Document object model. https://www.add-in-express.com/forum/read.php?FID=5&TID=12738
Basically you treat the text as a Word doc and forget about working with RTF altogether. You will need to add a reference to Microsoft.Office.Interop.Word to your project first.
Then add a using directive to your project:
using Word = Microsoft.Office.Interop.Word;
Then add your code:
Word.Document doc = Inspector.WordEditor as Word.Document;
//text body
string text = doc.Content.Text;
//end of file
int endOfFile = (text.Length) > 0 ? text.Length - 1 : 0;
//Select the point to add or modify text
Word.Range myRange = doc.Range(endOfFile, endOfFile);
//add your text to end of file
myRange.InsertAfter("my string");
RTF is a pretty elaborate format for files and isn't going to be as easy as concatenating a string. You may try using the RichTextBox control and importing the data there first, adding text to it, then re-grabbing the formatted value back. A bit clunky, but a lot easier than parsing an RTF file.
As an alternative, you may be able to find a library that parses and works with RTF, but that means a dependency for your application and most likely another DLL to include in the release.
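To illustrate why plain concatenation does not work: an RTF document is a single brace-delimited group, so anything appended after the final closing brace falls outside the document, and readers generally ignore it. A minimal sketch of the smallest possible workaround is to insert the new text just before that last brace. It assumes the body decodes cleanly as ASCII, that the last `}` in the string closes the document, and that the appended text contains no non-ASCII characters; RtfAppender and its methods are hypothetical names.

```csharp
using System;

public static class RtfAppender
{
    // Escapes the three characters that have special meaning in RTF.
    public static string EscapeRtf(string plain)
    {
        return plain.Replace("\\", "\\\\").Replace("{", "\\{").Replace("}", "\\}");
    }

    // Inserts a new paragraph containing the given plain text
    // immediately before the last closing brace of the RTF string.
    public static string AppendParagraph(string rtf, string plain)
    {
        int lastBrace = rtf.LastIndexOf('}');
        if (lastBrace < 0)
            throw new ArgumentException("No closing brace found; not an RTF document.", nameof(rtf));
        return rtf.Insert(lastBrace, "\\par " + EscapeRtf(plain));
    }
}
```

In the question's handler this would replace the `RTF += "my string";` line with `RTF = RtfAppender.AppendParagraph(RTF, "my string");`. Going through the Word object model or a RichTextBox, as suggested above, remains the more robust route for anything beyond simple text.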

using iTextSharp with MVC

I've got a two part question about using iTextSharp. I've built a simple MVC application to store information about "lessons learned" in a SQL Server database. When a user is looking at the details for a lesson I want them to be able to save the lesson as a PDF. I've got some working code but 1) I'm not sure how sound my approach is and 2) the MVC application uses a TinyMCE rich text editor and when I put the rich text into the PDF the html tags are being displayed. How do I get the PDF to honor the html formatting (bold fonts, unordered lists, paragraphs, etc.)?
Below is the code I'm using to generate the PDF. I would really appreciate feedback if I'm going about this incorrectly.
Thanks.
public FilePathResult GetPDF(int id)
{
    Lesson lesson = lessonRepository.GetLesson(id);
    string pdf = @"C:\Projects\Forms\LessonsLearned\PDF\template_test.pdf";
    string outputFilePath = @"C:\Temp\pdf_output\test_template_filled.pdf";
    PdfReader pdfReader = null;
    try
    {
        pdfReader = new PdfReader(pdf);
        using (FileStream pdfOutputFile = new FileStream(outputFilePath, FileMode.Create))
        {
            PdfStamper pdfStamper = null;
            try
            {
                pdfStamper = new PdfStamper(pdfReader, pdfOutputFile);
                AcroFields acroFields = pdfStamper.AcroFields;
                acroFields.SetField("title", lesson.Title);
                acroFields.SetField("owner", lesson.Staff.FullName);
                acroFields.SetField("date", lesson.DateEntered.ToShortDateString());
                // fields with rich text
                acroFields.SetField("situation", Server.HtmlDecode(lesson.Situation));
                acroFields.SetField("description", Server.HtmlDecode(lesson.Description));
                pdfStamper.FormFlattening = true;
            }
            finally
            {
                if (pdfStamper != null)
                {
                    pdfStamper.Close();
                }
            }
        }
    }
    finally
    {
        // pdfReader is still null if the PdfReader constructor threw
        if (pdfReader != null)
        {
            pdfReader.Close();
        }
    }
    return File(outputFilePath, "application/pdf", "Lesson_" + lesson.ID + ".pdf");
}
The only major component of your approach I'd change is how the data is returned. Right now you are using a fixed file path, which means that two people making requests at the same time will result in one of them getting an error. Since you are not saving the file for later use, I would skip the FileStream entirely and use a MemoryStream. You can then use FileStreamResult to return the stream with a MIME type of application/pdf.
For the second part you will have some trouble. PDF and HTML are not related, so HTML tags (which are just plain text) have no special meaning in a PDF document. If you want to convert a user's HTML (generated by your rich text control) into suitable PDF rich text, you will need an HTML parser in the middle.
iText includes the HTMLWorker class, which is a partial HTML parser (meaning it won't handle all html tags or structures) designed to return PDF compatible chunks. You could also use something like the HTMLAgilityPack to have more control over which tags are converted, but you'd then have to do the translation yourself. You could also examine your rich edit control, to see if it can return rich text in an easier to parse format.

Accessing the content of the file

//Introduction
Hey, Welcome.....
This is the tutorial
//EndIntro
//Help1
Select a Stock
To use this software you first need to select the stock. To do that, simply enter the stock symbol in the stock text-box (such as "MSFT").
To continue enter "MSFT" in the stock symbol box.
//EndHelp1
//Help2
Download Stock Data
Next step is to to download the stock data from the online servers. To start the process simply press the "Update" button or hit the <ENTER> key.
After stock data is downloaded the "Refresh" button will appear instead. Press it when you want to refresh the data with the latest quote.
To continue make sure you are online and press the "Update" button
//EndHelp2
The first time, I want to display the content between //Introduction and //EndIntro; the second time, the content between //Help1 and //EndHelp1; and so on.
That's a very open-ended question - what sort of file? To read binary data from it you'd usually use:
using (Stream stream = File.OpenRead(filename))
{
    // Read from the stream here
}
or
byte[] data = File.ReadAllBytes(filename);
To read text you could use any of:
using (TextReader reader = File.OpenText(filename))
{
    // Read from the reader
}
or
string text = File.ReadAllText(filename);
or
string[] lines = File.ReadAllLines(filename);
If you could give more details about the kind of file you want to read, we could help you with more specific advice.
EDIT: To display content from an RTF file, I suggest you load it as text (but be careful of the encoding - I don't know what encoding RTF files use) and then display it in a RichTextBox control by setting the Rtf property. Make the control read-only to avoid the user editing the control (although if the user does edit the control, that wouldn't alter the file anyway).
If you only want to display part of the file, I suggest you load the file, find the relevant bit of text, and use it appropriately with the Rtf property. If you load the whole file as a single string you can use IndexOf and Substring to find the relevant start/end markers and take the substring between them; if you read the file as multiple lines you can look for the individual lines as start/end markers and then concatenate the content between them.
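The IndexOf/Substring approach just described might look like the following minimal sketch. SectionReader and ExtractSection are hypothetical names, and the marker strings are the ones from the sample file in the question.

```csharp
using System;

public static class SectionReader
{
    // Returns the trimmed text between startMarker and endMarker,
    // or null if either marker is not found.
    public static string ExtractSection(string text, string startMarker, string endMarker)
    {
        int start = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += startMarker.Length;

        // Search for the end marker only after the start marker.
        int end = text.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return text.Substring(start, end - start).Trim();
    }
}
```

With the whole file loaded through File.ReadAllText, a call such as SectionReader.ExtractSection(text, "//Help1", "//EndHelp1") would give the first help section, which can then be shown in whatever control you use.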
(I also suggest that next time you ask a question, you include this sort of detail to start with rather than us having to tease it out of you.)
EDIT: As Mark pointed out in a comment, RTF files should have a header section. What you've shown isn't really an RTF file in the first place - it's just plain text. If you really want RTF, you could have a header section and then the individual sections. A better alternative would probably be to have separate files for each section - it would be cleaner that way.
I'm not sure I understand your question correctly, but you can read and write content using the System.IO.StreamReader and StreamWriter classes:
string content = string.Empty;
using (StreamReader sr = new StreamReader("C:\\sample.txt"))
{
    content = sr.ReadToEnd();
}
using (StreamWriter sw = new StreamWriter("C:\\Sample1.txt"))
{
    sw.Write(content);
}
Your question needs more clarification. Look at System.IO.File for many ways to read data.
The easiest way of reading a text file is probably this:
string[] lines = File.ReadAllLines("filename.txt");
Note that this automatically handles closing the file, so no using statement is needed.
If the file is large or you don't need all the lines, you might prefer reading the text file in a streaming manner:
using (StreamReader streamReader = File.OpenText(path))
{
    while (true)
    {
        string line = streamReader.ReadLine();
        if (line == null)
        {
            break;
        }
        // Do something with line...
    }
}
If the file contains XML data you might want to open it using an XML parser:
XDocument doc = XDocument.Load("input.xml");
var nodes = doc.Descendants();
There are many, many other ways to read data from a file. Could you be more specific about what the file contains and what information you need to read?
Update: To read an RTF file and display it:
richTextBox.Rtf = File.ReadAllText("input.rtf");
