Read a RTF File and Remove the dynamic text

My RTF file contains text at the beginning like the following:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Trebuchet MS;}{\f1\fswiss\fprq2\fcharset0 Verdana;}{\f2\fnil\fcharset0 Tahoma;}{\f3\fnil\fcharset0 Arial;}{\f4\froman\fprq2\fcharset0 Times New Roman;}}
How can I read the RTF file and replace this text with anything I wish?

If you want to generate the RTF header yourself, you should take a look at the RTF specification.
Otherwise you might be able to simply use the RichTextBox control, set the font style, color, etc., and read the header from its Rtf property.
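If the goal is only to swap out the font table shown in the question, one option is to locate the {\fonttbl ...} group by matching braces and replace it as a string. The following is a minimal sketch under that assumption; the class and method names are made up for illustration, and it does not handle embedded binary data (\bin) or validate the replacement.

```csharp
using System;

public static class RtfHeaderDemo
{
    // Returns the index just past the "}" that closes the group opening at "start".
    static int FindGroupEnd(string rtf, int start)
    {
        int depth = 0;
        for (int i = start; i < rtf.Length; i++)
        {
            char c = rtf[i];
            if (c == '\\') { i++; continue; } // skip the character after a backslash (covers \{, \} and \\)
            if (c == '{') depth++;
            else if (c == '}' && --depth == 0) return i + 1;
        }
        throw new FormatException("Unbalanced RTF group.");
    }

    // Replaces the whole {\fonttbl ...} group with the given text (hypothetical helper).
    public static string ReplaceFontTable(string rtf, string newFontTable)
    {
        int start = rtf.IndexOf(@"{\fonttbl", StringComparison.Ordinal);
        if (start < 0) return rtf; // no font table found: leave the input unchanged
        int end = FindGroupEnd(rtf, start);
        return rtf.Substring(0, start) + newFontTable + rtf.Substring(end);
    }

    public static void Main()
    {
        string rtf = @"{\rtf1\ansi{\fonttbl{\f0\fnil Arial;}{\f1\fswiss Verdana;}}\pard hi\par}";
        Console.WriteLine(ReplaceFontTable(rtf, @"{\fonttbl{\f0\fnil Tahoma;}}"));
    }
}
```

For a real file, the input string could come from File.ReadAllText and the result be written back with File.WriteAllText.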

Related

Use OpenXML to replace text in DOCX file - strange content

I'm trying to use the OpenXML SDK and the samples on Microsoft's pages to replace placeholders with real content in Word documents.
It used to work as described here, but after editing the template file in Word adding headers and footers it stopped working. I wondered why and some debugging showed me this:
Which is the content of texts in this piece of code:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(DocumentFile, true))
{
    var texts = wordDoc.MainDocumentPart.Document.Body.Descendants<Text>().ToList();
}
So what I see here is that the body of the document is "fragmented", even though in Word the content looks like this:
Can somebody tell me how I can get around this?
I have been asked what I'm trying to achieve. Basically I want to replace user defined "placeholders" with real content. I want to treat the Word document like a template. The placeholders can be anything. In my above example they look like {var:Template1}, but that's just something I'm playing with. It could basically be any word.
So for example if the document contains the following paragraph:
Do not use the name USER_NAME
The user should be able to replace the USER_NAME placeholder with the word admin for example, keeping the formatting intact. The result should be
Do not use the name admin
The problem I see with working at paragraph level (concatenating the content and then replacing the content of the paragraph) is that I fear losing formatting that should be kept, as in
Do not use the name admin

Various things can fragment text runs. Most frequently proofing markup (as apparently is the case here, where there are "squigglies") or rsid (used to compare documents and track who edited what, when), as well as the "Go back" bookmark Word sets in the background. These become readily apparent if you view the underlying WordOpenXML (using the Open XML SDK Productivity Tool, for example) in the document.xml "part".
It usually helps to go an element level "higher". In this case, get the list of Paragraph descendants and from there get all the Text descendants and concatenate their InnerText.
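The idea of concatenating text per paragraph can be sketched as follows. To stay dependency-free, this sketch uses LINQ to XML on a raw WordprocessingML fragment instead of the Open XML SDK; the class name, method name, and sample XML are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphTextDemo
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per w:p element, built by joining all of its w:t descendants.
    public static List<string> ParagraphTexts(XElement body)
    {
        return body.Descendants(W + "p")
                   .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                   .ToList();
    }

    public static void Main()
    {
        // A placeholder split into three runs by proofing markup.
        string xml =
            "<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:p>" +
            "<w:r><w:t>{var:</w:t></w:r>" +
            "<w:proofErr w:type='spellStart'/>" +
            "<w:r><w:t>Template1</w:t></w:r>" +
            "<w:proofErr w:type='spellEnd'/>" +
            "<w:r><w:t>}</w:t></w:r>" +
            "</w:p>" +
            "</w:body>";

        foreach (string text in ParagraphTexts(XElement.Parse(xml)))
            Console.WriteLine(text); // prints "{var:Template1}"
    }
}
```

This only makes the placeholder visible as a whole; writing a replacement back while keeping run formatting is a separate step.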

OpenXML is indeed fragmenting your text.
I created a library that does exactly this: it renders a Word template with the values from a JSON object.
From the documentation of docxtemplater:
Why you should use a library for this
Docx is a zipped format that contains some xml. If you want to build a simple replace {tag} by value system, it can already become complicated, because the {tag} is internally separated into <w:t>{</w:t><w:t>tag</w:t><w:t>}</w:t>. If you want to embed loops to iterate over an array, it becomes a real hassle.
The library basically does the following to keep the formatting:
If the text is:
<w:t>Hello</w:t>
<w:t>{name</w:t>
<w:t>} !</w:t>
<w:t>How are you ?</w:t>
The result would be:
<w:t>Hello</w:t>
<w:t>John !</w:t>
<w:t>How are you ?</w:t>
You also have to replace the tag with <w:t xml:space="preserve"> to ensure that spaces are not stripped out if there are any in your variables.
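A simplified version of this idea can be sketched with LINQ to XML. This is not docxtemplater's actual algorithm (that library is JavaScript); it is an illustrative C# approximation in which the replacement value goes into the first w:t element that overlaps the tag, the remaining overlapping elements keep only the text after the tag, and xml:space="preserve" is set on every element that was touched. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TagReplaceDemo
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces the first occurrence of "tag" in a paragraph, even if it spans several w:t elements.
    public static bool ReplaceFirst(XElement paragraph, string tag, string value)
    {
        if (string.IsNullOrEmpty(tag)) return false;
        var texts = paragraph.Descendants(W + "t").ToList();
        string joined = string.Concat(texts.Select(t => t.Value));
        int start = joined.IndexOf(tag, StringComparison.Ordinal);
        if (start < 0) return false;
        int end = start + tag.Length;

        int offset = 0;
        bool valueWritten = false;
        foreach (XElement t in texts)
        {
            string original = t.Value;
            int tStart = offset, tEnd = offset + original.Length;
            offset = tEnd;
            if (tEnd <= start || tStart >= end) continue; // this element does not overlap the tag

            string before = tStart < start ? original.Substring(0, start - tStart) : "";
            string after = tEnd > end ? original.Substring(end - tStart) : "";
            t.Value = before + (valueWritten ? "" : value) + after;
            valueWritten = true;
            t.SetAttributeValue(XNamespace.Xml + "space", "preserve");
        }
        return true;
    }

    public static void Main()
    {
        XElement p = XElement.Parse(
            "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:r><w:t>Hello</w:t></w:r><w:r><w:t>{name</w:t></w:r>" +
            "<w:r><w:t>} !</w:t></w:r><w:r><w:t>How are you ?</w:t></w:r></w:p>");
        ReplaceFirst(p, "{name}", "John");
        Console.WriteLine(p);
    }
}
```

Because each w:t stays inside its original run, the run properties (bold, color, and so on) of the surrounding text are left alone; the replacement value takes on the formatting of the run where the tag started.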

Is it possible to read French characters into a C# string from an .eml file?

I have a project where I need to generate a .pdf file based on the content of an .eml file. When dealing with just English characters, I'm fine: the PDF is created flawlessly and everything works (after I strip all the needless HTML junk).
However, an issue arises when I try to read in an .eml file that is filled with French characters. In particular, the French characters are stored as number codes like =E9, =E8, &#339, and so on.
So my issue is this. I read the .eml file in with:
string content = File.ReadAllText(filePath, Encoding.UTF8);
However, it comes in as plain text and I don't know how to make the system interpret the =E9, =E8, etc. codes as French characters. I can always Regex.Replace everything, but I'm hoping for a more elegant solution. Is there any way to take that long string of plain text and interpret the embedded codes properly, so that the French characters appear instead of their respective codes, without using something like 30 Regex.Replace expressions?
Do note that I can't use any built-in iTextSharp functionality, since I also need to be able to incorporate French characters (pulled from that .eml file) into the file name of the PDF.
Thanks

You can use regexes, but two regexes should be enough:
text = Regex.Replace(text, @"=([0-9A-Fa-f]{2})", match => ((char)uint.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
text = Regex.Replace(text, @"&#(\d+);", match => ((char)uint.Parse(match.Groups[1].Value)).ToString());
A different way would be to find a MIME parsing library which exposes methods for parsing parts of MIME messages, that way you'd decode the =E9 codes. Then, you'd need to call WebUtility.HtmlDecode to parse the HTML entities.
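Without a MIME library, the two steps can also be sketched by hand. The =E9 style codes are quoted-printable escapes for raw bytes, so the sketch below collects the bytes and decodes them with an explicit charset, drops soft line breaks ("=" at the end of a line), and then calls WebUtility.HtmlDecode for the HTML entities. The class and method names are assumptions, and the charset is assumed to be ISO-8859-1 here; in a real message it should be taken from the Content-Type header of the part.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class EmlTextDemo
{
    // Decodes quoted-printable escapes using the given charset, then decodes HTML entities.
    public static string Decode(string input, Encoding charset)
    {
        var bytes = new List<byte>(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c != '=')
            {
                // Literal character: re-encode it so it survives the byte round-trip.
                bytes.AddRange(charset.GetBytes(c.ToString()));
                continue;
            }
            // Soft line break: "=" followed by LF or CRLF is removed entirely.
            if (i + 1 < input.Length && input[i + 1] == '\n') { i += 1; continue; }
            if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n') { i += 2; continue; }
            // Escaped byte: "=" followed by two hex digits.
            if (i + 2 < input.Length && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                i += 2;
                continue;
            }
            bytes.Add((byte)'='); // stray "=": keep it as-is
        }
        string text = charset.GetString(bytes.ToArray());
        return WebUtility.HtmlDecode(text); // turns entities such as &#339; into characters
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1"); // assumed charset
        Console.WriteLine(Decode("Cr=E8me br=FBl=E9e, c&#339;ur", latin1));
    }
}
```

Decoding bytes through an Encoding instead of casting each code to char also covers messages whose body is UTF-8, where one character is written as several =XX codes.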

Add header and footer to an RTF file using c#

We have an MVC app that outputs RTF files based on templates (which themselves are RTF files).
The code that my colleague wrote uses System.Windows.Forms.RichTextBox to convert text to an RTF file (to be more exact, it uses the Rtf property of RichTextBox). I was thinking of adding headers and footers to the template RTF files, but RichTextBox appears to remove those. Additionally, some of the documents that we generate are composed of multiple templates (more often than not, a single template does not equal a single page, and one template can be injected in the middle of another), so that's one more reason why including headers and footers in the templates would not work.
Is there any way to add headers and footer in C# to RTF documents created in the way described above?
I tried fishing something on the subject from the internet, but I wasn't able to find anything concrete.

I was searching for a library that could possibly solve my problem and I came across this one:
.NET RTF Writer Library in C#
The library itself doesn't exactly solve my problem on its own, but the documents generated by it are easy to read and without all the crap Word would put into them. The demo for this library generates a document that has a header and a footer. The code of those two looks more or less like this:
{\header
{\pard\fi0\qd
This is a header
\par}
}
{\footer
{\pard\fi0\qc
{\fs30
This is a footer
}\par}
}
I still need to figure out how to apply correct formatting here, but that should be relatively easy to find. So, I can solve my initial problem by injecting the code above into the RTF code generated by RichTextBox. I'm not sure if the position of those two tags matters, but I guess I will find that out soon enough...
Here is the code that I use to inject the header and footer:
public string AddHeaderAndFooter(string rtf)
{
    // Open file that stores header and footer
    string headerCode = System.IO.File.ReadAllText(Server.MapPath("~/DocTemplates/header.txt"));
    // Inject header and footer code directly before the last "}" character.
    // Insert places the text in front of the character at the given index, so no "- 1" is needed;
    // subtracting one could split a control word such as \par that ends right before the brace.
    return rtf.Insert(rtf.LastIndexOf('}'), headerCode);
}
Note I have the header and footer in a static txt file, because it actually contains images in RTF readable format and that would be too big to put in the code. I haven't noticed any problems related to the fact that header and footer are defined at the end of the RTF file.

Rtf to WordML Convert in C#

I have a Windows application that generates reports.
It has templates in RTF such as "{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang2057{\\fonttbl{\\f0\\fnil\\fcharset0 Arial;}}\r\n\\viewkind4\\uc1\\pard\\fs20\\tab\\tab\\tab\\tab af\\par\r\n}\r\n", which are written to a Word doc file. Word then saves the document as XML and closes it. Then, tags like (say) are extracted and some new
The problem here is Word, which is used as the converter in this process: it consumes valuable time in the loop, where it opens a Word instance, saves, closes, and deletes.
Please correct any mistakes I have made and help me with an alternative way to convert to WordML.

Use Aspose.Words
// your RTF string
string rtfStrx = "{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang2057{\\fonttbl{\\f0\\fnil\\fcharset0 Arial;}}\r\n\\viewkind4\\uc1\\pard\\fs20\\tab\\tab\\tab\\tab af\\par\r\n}\r\n";
// convert string to bytes for memory stream
byte[] rtfBytex = Encoding.UTF8.GetBytes(rtfStrx);
MemoryStream rtfStreamx = new MemoryStream(rtfBytex);
Document rtfDocx = new Document(rtfStreamx);
rtfDocx.Save(@"C:\Temp.xml", SaveFormat.WordML);
This saves your RTF text in a new document as WordML. I cannot say how much time it will take in a loop, but it will surely take much less time than physically opening and closing MS Word.

Unless I am missing something, I assume that you are trying to create an Office XML file from an RTF template? I think you can use the Open XML SDK to create the XML file. Specifically, the DocumentReflector that comes with that SDK seems to be a good fit for that. See this example. Also, there is an article at http://www.codeguru.com/cpp/controls/richedit/conversions/article.php/c5377/ which shows how to convert from RTF to HTML and might guide you.

Use the WPF RichTextBox to convert RTF to XAML. Since XAML is XML, you can use XSLT or LINQ to convert it to your desired XML structure.

Making text bold using Stream Writer in C#

How can I make text bold using StreamWriter? Here is my code:
string path = Application.StartupPath + "\\WZ.PNR";
StreamWriter writer = new StreamWriter(path);
textPrint.ToText(writer, Width, FSection, FAlign, DSection, DAlign, Format);
writer.WriteLine();
writer.Close();
I am writing some text and I need to make some of it bold. How can I do it?
Thanks

StreamWriter is for writing plain text. You need markup of some kind to make text bold. Options include:
RTF
HTML
TeX
How are you expecting to open the generated file? The application will need to understand whatever file format you choose. There's no general concept of "a bold character" - the letter E is the letter E; if you want it styled that styling data is separate.
Given your file extension, are you trying to create a PeerNet Label Designer file? If so, you'll need to find out the appropriate file format - I don't know whether it's a text format, binary etc.

First, you should build your path this way:
string path = Path.Combine(Application.StartupPath, "WZ.PNR");
After this little improvement, let's take a look at your .pnr file...
So you open this file and would like to write some bold text to it?
Do you have some kind of program that is already able to create and view such a .pnr file?
I think you must have one; otherwise, how would you know that it is possible to have bold text within such a file?
If you do have a program that generates such a file with bold text, just make a new file, enter three words, "one two three", and make the 'two' bold. Save this file, open it with a good plain text editor (e.g. Notepad++) or a good hex editor, and try to find out how the bold text is represented.
For example, open WordPad, create a new RTF file and insert the above example. After saving it and re-opening it in a plain text editor you'll get:
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}}
\viewkind4\uc1\pard\f0\fs20 one \b two\b0 three\par
}
As you can see, the bold text is produced by using '\b ' to enable and '\b0 ' to disable bold. There is also plenty of other information, such as the font used, the charset, etc.
That's called reverse engineering, which is what you do if you don't have any specs. ;-)
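If RTF turns out to be an acceptable output format, the same '\b ' and '\b0 ' markers can be written with a plain StreamWriter. The following is a minimal sketch, assuming ASCII-only text; the class name, method names, and output file name are made up for illustration, and non-ASCII characters would need additional RTF escapes that are not shown here.

```csharp
using System;
using System.IO;
using System.Text;

public static class RtfBoldDemo
{
    // Escapes the three characters that have special meaning in RTF.
    public static string Escape(string text)
    {
        return text.Replace("\\", "\\\\").Replace("{", "\\{").Replace("}", "\\}");
    }

    // Builds a one-paragraph RTF document with the middle part in bold.
    public static string BuildRtf(string before, string bold, string after)
    {
        var sb = new StringBuilder();
        sb.Append(@"{\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Arial;}}");
        sb.Append(@"\pard\f0\fs20 ");
        sb.Append(Escape(before));
        sb.Append(@"\b ").Append(Escape(bold)).Append(@"\b0 "); // bold on, text, bold off
        sb.Append(Escape(after));
        sb.Append(@"\par}");
        return sb.ToString();
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "bold-demo.rtf"); // hypothetical output file
        using (var writer = new StreamWriter(path, false, Encoding.ASCII))
        {
            writer.Write(BuildRtf("one ", "two", " three"));
        }
        Console.WriteLine(path);
    }
}
```

The space after '\b' and '\b0' only terminates the control word and is not rendered, which is why the visible spaces are part of the text arguments.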

The extension PNR suggests some sort of Printer file. That means you'll have to look up the escape codes for that particular kind of printer.
