Subscript text in PDF (C#)

How do I insert a subscript character in a string in C#?
I have no problem appending a superscript "2" to the same string using char.ConvertFromUtf32(178), but I struggle to find a similar solution for subscripted text. In fact, I'm struggling to find any solution at all to this rather embarrassing issue.

Plain text doesn't have formatting, like superscript, subscript, bold, italic and/or colors.
You need to use some "rich text" format.
The type of "rich text" depends on where you want to use it. Examples: HTML, RTF.
For PDF you need to look into the formatting options provided by your PDF creation library.

The PDF creation library I'm using did not offer much.
One workaround I found was to pick the equivalent Unicode characters from Character Map and append them to the existing string.
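That workaround can be automated for digits. Superscript two (U+00B2, i.e. char.ConvertFromUtf32(178)) lives in Latin-1, but Unicode also has dedicated subscript digits U+2080 to U+2089 in one contiguous range, plus subscript + - = ( ) at U+208A to U+208E. The sketch below (the class name SubscriptDigits is illustrative) maps those characters and leaves everything else alone. Whether the result actually shows up in the PDF depends on the embedded font containing these glyphs, which is an assumption you should verify with your library.

```csharp
using System.Text;

public static class SubscriptDigits
{
    // Characters whose subscript forms sit at U+208A..U+208E, in this order.
    private const string Signs = "+-=()";

    // Replaces 0-9 and + - = ( ) with their Unicode subscript forms.
    // Any other character is copied unchanged.
    public static string ToSubscript(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            int sign = Signs.IndexOf(c);
            if (c >= '0' && c <= '9')
                sb.Append((char)(0x2080 + (c - '0')));   // SUBSCRIPT ZERO..NINE
            else if (sign >= 0)
                sb.Append((char)(0x208A + sign));        // SUBSCRIPT PLUS SIGN..RIGHT PARENTHESIS
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, SubscriptDigits.ToSubscript("H2O") returns "H₂O".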

Related

Use OpenXML to replace text in DOCX file - strange content

I'm trying to use the OpenXML SDK and the samples on Microsoft's pages to replace placeholders with real content in Word documents.
It used to work as described here, but after I edited the template file in Word to add headers and footers, it stopped working. I wondered why, and some debugging showed me this:
This is the content of texts in the following code:
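using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(DocumentFile, true))
{
    // Every <w:t> element in the body, in document order
    var texts = wordDoc.MainDocumentPart.Document.Body.Descendants<Text>().ToList();
}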
So what I see here is that the body of the document is "fragmented", even though in Word the content looks like this:
Can somebody tell me how I can get around this?
I have been asked what I'm trying to achieve. Basically I want to replace user defined "placeholders" with real content. I want to treat the Word document like a template. The placeholders can be anything. In my above example they look like {var:Template1}, but that's just something I'm playing with. It could basically be any word.
So for example if the document contains the following paragraph:
Do not use the name USER_NAME
The user should be able to replace the USER_NAME placeholder with the word admin for example, keeping the formatting intact. The result should be
Do not use the name admin
My concern with working at the paragraph level (concatenating the content and then replacing the paragraph's content) is that I would lose formatting that should be kept, as in
Do not use the name admin
Various things can fragment text runs. The most frequent are proofing markup (as is apparently the case here, where there are "squigglies"), rsids (used to compare documents and track who edited what, and when), and the "Go back" bookmark that Word sets in the background. These become readily apparent if you view the underlying WordOpenXML in the document.xml "part" (using the Open XML SDK Productivity Tool, for example).
It usually helps to go one element level higher. In this case, get the list of Paragraph descendants, and from each paragraph get all the Text descendants and concatenate their InnerText.
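To illustrate the "one level higher" idea without the SDK, here is a sketch that works directly on the document.xml part with System.Xml.Linq: it joins the w:t text of each w:p, so a placeholder split across runs becomes searchable as a whole. With the Open XML SDK the rough equivalent is Body.Descendants&lt;Paragraph&gt;() and each paragraph's InnerText. The class name is hypothetical. Two caveats: a paragraph nested inside another (for example in a text box) is counted in both, and this only locates the placeholder; writing the replacement back while keeping run formatting is a separate problem.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphText
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per <w:p>, concatenating all of its <w:t> descendants.
    // Runs split by proofing marks or rsids are joined back together.
    public static List<string> GetParagraphTexts(string documentXml)
    {
        XDocument doc = XDocument.Parse(documentXml);
        return doc.Descendants(W + "p")
                  .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                  .ToList();
    }
}
```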
OpenXML is indeed fragmenting your text:
I created a library that does exactly this: render a Word template with the values from a JSON object.
From the documentation of docxtemplater:
Why you should use a library for this
Docx is a zipped format that contains some xml. If you want to build a simple replace {tag} by value system, it can already become complicated, because the {tag} is internally separated into <w:t>{</w:t><w:t>tag</w:t><w:t>}</w:t>. If you want to embed loops to iterate over an array, it becomes a real hassle.
The library basically does the following to keep formatting:
If the text is:
<w:t>Hello</w:t>
<w:t>{name</w:t>
<w:t>} !</w:t>
<w:t>How are you ?</w:t>
The result would be:
<w:t>Hello</w:t>
<w:t>John !</w:t>
<w:t>How are you ?</w:t>
You also have to change the tag to <w:t xml:space="preserve"> so that any leading or trailing spaces in your variables are not stripped out.
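The transformation above can be sketched in C# on a plain list of run texts standing in for the w:t elements. This is not docxtemplater's code or API; SplitTagReplacer is a hypothetical name. Each tag's value is written into the run where the tag starts, so it inherits that run's formatting, and the tag's remaining characters are dropped from the following runs. Unlike the example above, the sketch does not merge leftover text into the first run, so "Hello", "{name", "} !", "How are you ?" becomes "Hello", "John", " !", "How are you ?".

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class SplitTagReplacer
{
    // Replaces {tag} placeholders whose characters may be spread over several runs.
    // Unknown tags are left untouched.
    public static List<string> Replace(IList<string> runs, IDictionary<string, string> values)
    {
        // Flatten to (run index, character) pairs so tags can be matched across runs.
        var chars = new List<(int Run, char C)>();
        for (int r = 0; r < runs.Count; r++)
            foreach (char c in runs[r])
                chars.Add((r, c));

        var output = runs.Select(_ => new StringBuilder()).ToList();
        int i = 0;
        while (i < chars.Count)
        {
            if (chars[i].C == '{')
            {
                int close = chars.FindIndex(i + 1, x => x.C == '}');
                if (close >= 0)
                {
                    string name = new string(chars.Skip(i + 1).Take(close - i - 1).Select(x => x.C).ToArray());
                    if (values.TryGetValue(name, out string value))
                    {
                        // The value goes into the run holding '{'; the rest of the tag is skipped.
                        output[chars[i].Run].Append(value);
                        i = close + 1;
                        continue;
                    }
                }
            }
            output[chars[i].Run].Append(chars[i].C);
            i++;
        }
        return output.Select(sb => sb.ToString()).ToList();
    }
}
```

When writing the runs back into w:t elements, set xml:space="preserve" as described above, since a run such as " !" starts with a space.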

How do I write subscript lowercase letters in a string

As the question says, how do I write subscript letters for example, Fnet (net is subscripted), in a string?
Is there any shortcut key for creating a subscript lowercase letter? I only found a few subscript lowercase letters, ₐ ₑ ᵢ ⱼ ₒ ᵣ ᵤ ᵥ ₓ; the other letters are missing.
Instead of searching for direct subscript support in a raw string, you should use a control like RichTextBox, which supports displaying subscripts directly. For other controls, you can override the OnPaint method and do custom text rendering using the GDI+ APIs.
I don't believe you do.
If we were to change the question to "How do I write bold characters in a string" you would naturally tell me that you can't, and you have to apply some styling instead using CSS or something. Same situation here...
You're looking at formatting as opposed to content.
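On the letters listed in the question: Unicode also has subscript h, k, l, m, n, p, s and t (U+2095 to U+209C, added in Unicode 6.0), so "Fnet" can in fact be written as Fₙₑₜ where the font supports them. Font support for these newer characters is patchy, and b, c, d, f, g, q, w, y and z have no subscript form at all, so the advice above to use formatting still applies in general. A sketch with a hypothetical class name that converts only when every letter has a subscript form and otherwise signals the caller to fall back to real formatting (for example a RichTextBox):

```csharp
using System.Collections.Generic;
using System.Text;

public static class SubscriptLetters
{
    // Lowercase Latin letters that have a Unicode subscript code point.
    // h, k, l, m, n, p, s, t (U+2095..U+209C) were added in Unicode 6.0,
    // so older fonts may not render them.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['a'] = '\u2090', ['e'] = '\u2091', ['o'] = '\u2092', ['x'] = '\u2093',
        ['h'] = '\u2095', ['k'] = '\u2096', ['l'] = '\u2097', ['m'] = '\u2098',
        ['n'] = '\u2099', ['p'] = '\u209A', ['s'] = '\u209B', ['t'] = '\u209C',
        ['i'] = '\u1D62', ['r'] = '\u1D63', ['u'] = '\u1D64', ['v'] = '\u1D65',
        ['j'] = '\u2C7C',
    };

    // Converts every character or returns false (e.g. for b, c, d, f, g, q, w, y, z).
    public static bool TryToSubscript(string text, out string result)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (!Map.TryGetValue(c, out char sub))
            {
                result = null;
                return false;
            }
            sb.Append(sub);
        }
        result = sb.ToString();
        return true;
    }
}
```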

Best way to extract only the bold text from a PDF

iTextSharp is a great tool, I can use
PdfTextExtractor.GetTextFromPage(reader, iPage) + " ";
and it works great, but is there a way to extract only the bold text (e.g. the headlines) from the pdf, and not everything?
Any solution is useful, regardless of the programming language. Thank you
From within iText, you need to use the classes from the com.itextpdf.text.pdf.parser package.
Specifically, you'll need to use a PdfTextExtractor with a custom TextExtractionStrategy that checks the font name. Bold fonts USUALLY have the word "bold" in their name.
Potential Issues:
1) Not everything that looks like text is rendered with fonts and letters. It can be paths or a bitmap. The only way to extract such text is with OCR, and there's no way to get font info.
2) Font Encoding. The bytes that map to the glyphs you're seeing in the PDF may not have a map from those bytes to actual character information.
3) Not all bold-looking text is made with a bold font. Some bold text is made by stroking the text outline with a fairly thin line as well as the usual filling. In this case, the text render mode will be set to "stroke & fill" instead of the usual "fill". This is pretty rare, but it does happen from time to time.
An easy way to test for problems 1 and 2 is to attempt to copy and paste the text within Reader/Acrobat. If you can't select it, it's almost certainly paths or an image. If you can select it but the characters come out as random junk when pasted, then iText will come up with the same junk.
Problem 3 isn't that hard to test for programmatically, though you have to handle it on a case-by-case basis. You need to call TextRenderInfo.getTextRenderMode(): 0 is fill (the standard way of doing things), and 2 is "stroke and fill".
So your TextExtractionStrategy can stub out beginTextBlock, endTextBlock, renderImage, and getResultantText. In your renderText implementation, you'll have to check the font name (for "bold", case-insensitive) and the text render mode. If either of those matches, the text is part of one of your headings.
All this is supposing that you are dealing with arbitrary PDF files. If all your PDFs come from the same source, you can start cutting corners. I'll leave that as an Exercise For The Reader.
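The check described above can be isolated from the extraction library. This is a sketch with hypothetical names: in an iTextSharp RenderText callback you would pass in the chunk's text, its font name and the value of GetTextRenderMode(). How you obtain the font name (for example through renderInfo.GetFont()) is an assumption to verify against your iTextSharp version. Subset fonts often carry a prefix such as "ABCDEF+Arial-BoldMT", which a substring test still matches.

```csharp
using System;
using System.Text;

public static class BoldHeuristic
{
    // PDF text render mode 2 = fill, then stroke (sometimes used to fake bold).
    public const int RenderModeFillStroke = 2;

    // True if the font name contains "bold" (case-insensitive)
    // or the text is drawn with fill-then-stroke.
    public static bool IsLikelyBold(string fontName, int textRenderMode)
    {
        bool boldName = fontName != null &&
            fontName.IndexOf("bold", StringComparison.OrdinalIgnoreCase) >= 0;
        return boldName || textRenderMode == RenderModeFillStroke;
    }
}

// Keeps only the chunks that pass the heuristic; a real extraction strategy
// would call Add(...) from its RenderText callback.
public sealed class BoldTextCollector
{
    private readonly StringBuilder _text = new StringBuilder();

    public void Add(string text, string fontName, int textRenderMode)
    {
        if (BoldHeuristic.IsLikelyBold(fontName, textRenderMode))
            _text.Append(text);
    }

    public string GetResultantText() => _text.ToString();
}
```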
One of your best bets for this job surely is TET by pdflib.com with its ability to extract to the TETML format. Available for Windows, Mac OS X, Linux, Solaris, AIX, HP-UX...
I'm not sure if it does indeed recognize "headlines" as such (because PDF does not know much of structural markups, only visual ones) -- but it surely can tell you exact position and font used by each string of characters.

Making text bold using Stream Writer in C#

How do I make text bold using StreamWriter? Here is my code:
string path = Application.StartupPath + "\\WZ.PNR";
StreamWriter writer = new StreamWriter(path);
textPrint.ToText(writer, Width, FSection, FAlign, DSection, DAlign, Format);
writer.WriteLine();
writer.Close();
I am writing some text and I need to make some of it bold. How do I do that?
Thanks
StreamWriter is for writing plain text. You need markup of some kind to make text bold. Options include:
RTF
HTML
TeX
How are you expecting to open the generated file? The application will need to understand whatever file format you choose. There's no general concept of "a bold character" - the letter E is the letter E; if you want it styled that styling data is separate.
Given your file extension, are you trying to create a PeerNet Label Designer file? If so, you'll need to find out the appropriate file format - I don't know whether it's a text format, binary etc.
First, you should build your path like this:
string path = Path.Combine(Application.StartupPath, "WZ.PNR");
After this small improvement, let's take a look at your .pnr file.
So you open this file and would like to write some bold text to it?
Do you have some kind of program that can already create and view such a .pnr file?
I assume you do; otherwise, how would you know that such a file can contain bold text?
If you have such a program, make a new file, enter three words, "one two three", and make 'two' bold. Save the file, open it with a good plain text editor (e.g. Notepad++) or a hex editor, and try to find out how the bold formatting is stored.
For example, open WordPad create a new rtf-File and insert the above example. After saving it and re-opening in a plain text editor you'll get:
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}}
\viewkind4\uc1\pard\f0\fs20 one \b two\b0 three\par
}
As you can see, bold is switched on with '\b ' and off with '\b0 '. The file also contains plenty of other information, such as the font and charset used.
That's called reverse-engineering if you don't have any specs. ;-)
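Applied to the question, StreamWriter can only produce bold text by writing markup that the target application understands. Assuming RTF is acceptable (which may well not be true for a .PNR file), here is a minimal sketch that reproduces the \b / \b0 pattern shown above; RtfWriter and its methods are illustrative names. Backslashes and braces are escaped because they are RTF syntax, and non-ASCII characters are written as \uN? escapes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RtfWriter
{
    // Builds a minimal RTF document; each (text, bold) pair becomes a run,
    // with \b switching bold on and \b0 switching it off.
    public static string Build(IEnumerable<(string Text, bool Bold)> runs)
    {
        var sb = new StringBuilder(@"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0\fs20 ");
        foreach (var (text, bold) in runs)
        {
            if (bold) sb.Append(@"\b ");
            foreach (char c in text)
            {
                if (c == '\\' || c == '{' || c == '}') sb.Append('\\').Append(c); // RTF control characters
                else if (c > 127) sb.Append(@"\u").Append((short)c).Append('?');  // \uN with '?' fallback
                else sb.Append(c);
            }
            if (bold) sb.Append(@"\b0 ");
        }
        return sb.Append(@"\par}").ToString();
    }

    // StreamWriter still only writes characters; the bold comes from the markup.
    public static void Save(string path, IEnumerable<(string Text, bool Bold)> runs)
    {
        using (var writer = new StreamWriter(path, false, Encoding.ASCII))
        {
            writer.Write(Build(runs));
        }
    }
}
```

Note that the space after \b0 is consumed as the control word's delimiter, so a run that should start with a space (" three") ends up with two spaces after \b0 in the output.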
The extension PNR suggests some sort of Printer file. That means you'll have to look up the escape codes for that particular kind of printer.

iText - how to do search/replace on existing RTF document

Currently I'm working on a simple Mail-Merge module.
I need to load a plain *.RTF template, replace all words enclosed in [[field]] tags, and finally print the result.
I found the iText library which is free and capable of loading/saving pdfs and rtf.
I managed to load rtf, merge a few copies to one huge doc but I have no idea how to replace [[field]] by custom data like customer name/address.
Is that feature present, and if yes - how to do it?
The solution platform is C#/.NET.
I don't think that pdf is the way you want to go.
According to this article it is extremely difficult at best and not possible at worst.
Would something like RTFLib work better for you?
Finally I decided to use *.docx and "Open XML SDK 2.0 for Microsoft Office" .NET strongly typed wrapper.
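You can use the RichTextBox control (System.Windows.Forms) to find and replace placeholders.
RichTextBox rtb = new RichTextBox();
rtb.LoadFile("template.rtf");
string placeHolder = "[[placeholder_name]]";
string newValue = "new value";
// Find returns -1 when the text is not found, so loop until every occurrence is replaced
int pos = rtb.Find(placeHolder);
while (pos >= 0)
{
    rtb.Select(pos, placeHolder.Length);
    rtb.SelectedText = newValue;
    pos = rtb.Find(placeHolder, pos + newValue.Length, RichTextBoxFinds.None);
}
After this you can get the RTF-formatted text with:
string rtf = rtb.Rtf;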
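If WinForms is not an option, a plain string substitution on the RTF source can also work, with the same caveat as in the OpenXML question above: it only finds placeholders that appear contiguously in the RTF. If the editor inserted control words in the middle of a [[field]] (spell-check marks, revision ids), the pattern won't match. A sketch with hypothetical names, assuming field names consist of letters, digits and underscores:

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class RtfMailMerge
{
    // Matches [[field]] where field is letters, digits or underscores.
    private static readonly Regex Field = new Regex(@"\[\[(\w+)\]\]");

    // Replaces known fields with RTF-escaped values; unknown fields are kept.
    public static string Merge(string rtf, IDictionary<string, string> values) =>
        Field.Replace(rtf, m => values.TryGetValue(m.Groups[1].Value, out string v) ? Escape(v) : m.Value);

    // Escapes RTF control characters and writes non-ASCII as \uN with a '?' fallback.
    private static string Escape(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c == '\\' || c == '{' || c == '}') sb.Append('\\').Append(c);
            else if (c > 127) sb.Append(@"\u").Append((short)c).Append('?');
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```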
