I'm building a Word document in OpenXML with C#.
One of the fonts I must use is a custom-made branded font. This font will not be available on customer machines.
Is it possible to embed a font file within the .docx file and reference that font in font styles? If so, how can this be done with the C# SDK?
So far that does not seem to be possible, but I might have missed the documentation page somewhere.
P.S. I already have a PDF with embedded fonts. Now I need a Word document that looks the same.
Sounds like what you need is a .pdf. So unless it absolutely must be a .docx I think that's your best option.
Help on generating a .pdf in C# can be found here.
Using the technique in this answer, I successfully embedded the contents of an RTF file into an existing Word DOCX file, using OpenXML 2.5. Or so I thought.
We've now discovered that while the created file works fine in MS Word and Word Online, the document displays without the RTF content on other viewers such as:
Google Docs preview functionality
Windows Phone 8.1 (which has Office functionality built in)
Various iOS and Android viewers
In all cases, the document displays completely correctly except that the RTF content is just missing.
I did think it might be an issue in the viewers rather than the DOCX file, but for several tools to have the same issue makes me suspect it is a bug in our code.
It's a bit of an obscure case so trying to figure out the problem is proving difficult.
The technique you used (altChunk) relies on the viewer to convert the RTF content into WordML.
As you've discovered, many viewers don't do this.
To avoid this issue, you really have to convert the RTF content to WordML in your own code.
I need to convert PDF files into .doc files using C#. The computer does not have Office installed. Any good ideas on how I can approach this? I did some research, and most people use the interop services.
You need to understand that PDF is not really implemented as a single document format.
If your PDF docs are created by rendering text to a PDF file, then direct PDF conversion is not only possible, but can be very good (reliable).
If the source of your PDF is either a scanner or a fax (essentially a scanner...), then what you have is a document with a "picture" of text. This scenario is more difficult to deal with. If you open up the markup, there is no 'text' to be converted. In this situation you have to use some form of OCR (optical character recognition), which is less reliable due to a variety of issues.
If you have the option of intercepting the data before it is rendered to PDF (say like in SSRS or Crystal) then it would be better for you to bypass the PDF stage and move your data to a Word document.
If you are constrained to receiving faxes and then needing to interpret their content, prepare for OCR hell. It has been a while since I was there, so I hope that it has gotten better.
Even without Office installed on your machine, you have access (with Visual Studio) to the Office developer toolkit, which will allow you to build documents to be distributed in the Word formats (.doc/.docx).
One option may be to convert the PDF to HTML, which can then be opened in Word.
Use the Aspose PDF kit to convert the PDF to text, and then convert the text to .doc using a FileStream or Aspose doc.
Background:
I have PDFs that I am generating programmatically. I need to be able to send a PDF directly to a printer from the server (not through an intermediate application). At the moment I can do all of the above (generate the PDF, send it to the printer), but because the fonts aren't embedded in the PDF, the printer is doing font substitution.
Why the fonts aren't embedded when generated:
I am creating PDFs using SQL Reporting Services 2008. There is a known issue with SQL Reporting Services in that it will not embed fonts unless a series of requirements are met (http://technet.microsoft.com/en-us/library/ms159713%28SQL.100%29.aspx). Don't ask me why, but the PDF meets all of MS's listed requirements and the fonts still show up as not embedded. There is no real control over whether the fonts are embedded, so I have accepted that this isn't working and moved on. The workaround suggested by Microsoft (see the link, under 'When will Reporting Services do font embedding') is to post-process the PDF to manually embed the fonts.
Goal
Take an already generated PDF document, programmatically 'open' it and embed the fonts, resave the PDF.
Approach
I was pointed towards iTextSharp, but most of the examples are for the Java version and I'm having trouble translating to the iTextSharp version (I can't find any documentation for iTextSharp).
I am working on this post for what I need to do: Itext embed font in a PDF.
However for the life of me, I cannot seem to use the ByteArrayOutputStream object. It can't seem to find it. I've researched and researched but nobody seems to say what class it's in or where I find it so I can include it in the using statements. I've even cracked open Reflector and can't seem to find it anywhere.
This is what I have so far and it compiles etc. etc.
(result is my byte[] of the generated PDF).
PdfReader pdf = new PdfReader(result);
BaseFont unicode = BaseFont.CreateFont("Georgia", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
// the next line doesn't work as I need a ByteArrayOutputStream variable to pass in
PdfStamper stamper = new PdfStamper(pdf, MISSINGBYTEARRAYOUTPUTSTREAMVARIABLE);
stamper.AcroFields.SetFieldProperty("test", "textfont", unicode, null);
stamper.Close();
pdf.Close();
So can anybody either help me with using iTextSharp to embed fonts into a PDF or point me in the right direction?
I'm more than happy to use any other solutions other than iTextSharp to complete this goal, but it needs to be free and able to be used by a business for an internal application (i.e. Affero GPL).
This may not be the answer you are looking for (since you want to get your problems solved programmatically, not by an external tool).
But you can use the Ghostscript command line to retroactively embed missing fonts into PDFs that were created without them:
gs \
-sFONTPATH=/path/to/fonts:/another/dir/with/more/fonts \
-o output-pdf-with-embedded-fonts.pdf \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/prepress \
input-pdf-where-some-fonts-are-not-embedded.pdf
One important thing is that the missing fonts must all be available in one of the directories pointed to by the -sFONTPATH=... switch.
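Since the question asks for something programmatic, the same Ghostscript command can be launched from C#. This is only a minimal sketch, not a tested integration: the class and method names (GhostscriptFontEmbedder, BuildArguments, Run) are made up for illustration, and the executable name you pass in (for example gs, or gswin64c.exe on 64-bit Windows) depends on your installation. The switches are exactly the ones from the command above.

```csharp
using System.Diagnostics;

static class GhostscriptFontEmbedder
{
    // Builds the same switches as the command line shown above.
    // fontPath is passed through unchanged, so the caller supplies the
    // platform's list separator (':' on Unix, ';' on Windows).
    public static string BuildArguments(string fontPath, string inputPdf, string outputPdf)
    {
        return string.Join(" ", new[]
        {
            "-sFONTPATH=" + Quote(fontPath),
            "-o " + Quote(outputPdf),
            "-sDEVICE=pdfwrite",
            "-dPDFSETTINGS=/prepress",
            Quote(inputPdf)
        });
    }

    // Naive quoting so that paths containing spaces survive.
    // Assumption: paths contain no double quotes and do not end in a backslash.
    static string Quote(string value)
    {
        return "\"" + value + "\"";
    }

    // Runs Ghostscript and returns its exit code (0 normally means success).
    public static int Run(string gsExecutable, string fontPath, string inputPdf, string outputPdf)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gsExecutable,
            Arguments = BuildArguments(fontPath, inputPdf, outputPdf),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call would look like GhostscriptFontEmbedder.Run("gs", "/path/to/fonts", "input.pdf", "output.pdf"). Since the PDF in the question is a byte[], it would have to be written to a temporary file first and the result read back afterwards.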
Besides Ghostscript, it is also possible to use Poppler and Cairo. There is a command pdftocairo from Poppler that converts PDF to PDF via pdftocairo -pdf input.pdf output.pdf. It also considers font substitutions set in a Fontconfig configuration file. This is very helpful if you do not have all fonts on your system that are referenced in a PDF file, but know which other font you have installed is a good-looking replacement. After processing, the substitution font is embedded.
I had this problem on a Mac with a PDF I was submitting to IEEE. Using Adobe Reader and Preview, I was able to get around this. I think any pdf printer might work in place of Preview if you are on a PC.
Here are the steps I took. You can individually fix each figure, or fix the whole document.
1. Open the PDF file using Adobe Reader.
2. Right click on the image, and click “Document Properties.”
3. Click “Fonts.” Check whether any font is not embedded. The list should show “Courier” or another font name.
4. If your PDF isn’t a standard page size, click on “Description” and look at the page size. Write this down. Ex. 19.4 x 5.22 in.
5. Open the PDF in Preview. Go to File->Print. If the PDF isn’t a standard page size, click on Paper Size and choose custom. You will need to create a custom page size equal to the one you wrote down in step 4. Don’t forget to set the margins to 0 on all sides. After doing that, you’ll need to set the scale in the print dialog to 100%.
6. In the lower left of the print dialog (in Preview on a Mac), click “PDF” to print the PDF to a new PDF. Select the destination and print.
7. Open the new PDF in Adobe Reader and verify that the fonts are now embedded.
I hope this helps.
I had this problem today with an existing PDF I uploaded to lulu.com to make a printed copy. It was rejected for not having all fonts embedded.
I found that if I opened it in Acrobat X and saved it as a PostScript (.ps) file, then double-clicking this .ps file in File Explorer opened it in Acrobat X Distiller, which automatically created a new PDF file with all fonts embedded!
Naturally this would mean you must have all the fonts needed on your computer. Otherwise a program like InFix can make font substitutions.
I have several Word templates and I wish to use these to dynamically create Word documents in my app. I wish to avoid using automation at all costs as this is no good. I know that I can use both HTML and XML to create word documents but I just don't know where to start with regards to using a template that may well have images in the footer or the header of a document.
I use the OpenXML SDK with Word 2007. After you get the hang of it, it's not so bad. I have several template docx files that I scan through to search and replace for placeholder strings with what I want, and then can stitch together multiple templates into one document if I want to. It's nice because I can start with docx files as the template and modify them while the whole time staying within the realm of the docx format. If an image is in the docx when you start modifying it, it'll be there after you re-save it after modification (provided you didn't programmatically remove it of course).
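To give an idea of what the search and replace looks like, here is a minimal sketch using the OpenXML SDK (the DocumentFormat.OpenXml package). The class name TemplateFiller and the {{Name}}-style placeholders are made up for illustration. It only touches the main document body, and it assumes each placeholder sits inside a single text run; Word sometimes splits text across runs, in which case the placeholder will not be found.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class TemplateFiller
{
    // Copies the template and replaces placeholder strings in the copy,
    // so the original template file is never modified.
    public static void Fill(string templatePath, string outputPath,
                            IDictionary<string, string> values)
    {
        File.Copy(templatePath, outputPath, true);

        using (WordprocessingDocument doc = WordprocessingDocument.Open(outputPath, true))
        {
            // Snapshot the text elements of the body before editing them.
            List<Text> texts = doc.MainDocumentPart.Document.Descendants<Text>().ToList();

            foreach (Text text in texts)
            {
                foreach (KeyValuePair<string, string> pair in values)
                {
                    if (text.Text.Contains(pair.Key))
                    {
                        text.Text = text.Text.Replace(pair.Key, pair.Value);
                    }
                }
            }

            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

Headers and footers live in their own parts of the package, so this sketch leaves them (and any images in them) untouched. Placeholders there would need the same loop over those parts.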
If you have more details with what you'll be doing, let us know.
You could use DocX. It's free, very easy to use, has nice tutorials, and is feature rich. It works only with DOCX documents, though. Also, development is currently on hold until the author finishes his semester. Here's a detailed blog about it.
The blog has a good example of using a template in its Invoice Example.
MigraDoc (http://www.pdfsharp.net/MigraDocOverview.ashx) is a free utility for exporting PDF/Word/HTML files. I've not worked with it using templates yet; however, you could use the DDL files to persist a layout so that it can be reused.
So, I have used Pdf995's PDF print driver from a web browser to print web pages and eventually use PdfEdit995 to join these various PDF files into one large PDF.
Now I have a lot of large PDF documents that I wish to add bookmarks to, but am hoping there is a relatively easy way of doing this programmatically (using C#, preferably) - basically, I want to find, within each PDF, text that is large enough to qualify as a header, and use that text as the bookmark.
Any tips/advice/direction? Thanks!
It's definitely possible to do this, but I would recommend finding a PDF library that does most of the leg work. Technically you could do it all yourself with the aid of the PDF specification, but that'd probably take more time than it's worth.
The library will need to be able to let you find text in a document and then return the page and size, font, etc, of the text and create bookmarks (also known as outlines) based on that information programmatically.
My company's product, Quick PDF Library, can help you do this, and so can PDFKit.NET. I'm sure there are other libraries out there that support this functionality too. As far as free libraries go, from what I've seen I don't believe that PDFSharp or iText will meet all of your requirements in this case, but I'm sure someone will correct me if I'm wrong.
If you'd prefer to develop a solution for this entirely yourself, then the PDF reference is available online for free.