How can I check if a PDF file is using embedded fonts? - c#

I have a folder where multiple clients upload multiple PDF files.
Some of them use embedded fonts, some don't.
I've been working on a service that optimizes (in terms of file size) the PDF files in this folder.
Each user may be uploading around 400 files, weighing anywhere between 80 KB and 10 MB, and my task is to optimize all of them to the smallest possible file size with minimal quality loss.
The PDF Library is doing a great job with it. My only problem is that I can't remove all embedded fonts from all files, since some of the files might use these fonts and the result would be a file that I can't use.
So my questions are:
How can I detect which files use embedded fonts and which don't?
When optimizing the files that use embedded fonts, how can I remove only the unused fonts?
What I want to achieve is to remove all embedded fonts from most of the files, but keep them in the files where I actually need them. I understand that this depends on the fonts installed on my system (the files will stay on a single system, so portability is not that important to me). So I am trying to find a way to identify, before optimizing, which files will look OK without embedded fonts and which files need to keep them.

APDFL has a PDFontIsEmbedded() call. The DotNet interface's Font class has an Embedded property. Saving with the GarbageCollect SaveFlag should remove any unreferenced indirect objects, including fonts.
Note that Resource Dictionaries could potentially be shared by multiple pages so that fonts not used by one page might be used by another page that uses the same resource dictionary.
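If you only want a rough pre-check before handing a file to the library, here is a minimal sketch that does not use APDFL at all. It relies on the fact that a PDF font descriptor points to an embedded font program through a /FontFile, /FontFile2 or /FontFile3 key. The class and method names are made up for this example. It is only a heuristic: in files that store their objects in compressed object streams the key will not be visible in the raw bytes, so a "false" result is not conclusive and the library's Embedded property remains the reliable check.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any PDF library.
public static class EmbeddedFontProbe
{
    // "/FontFile" is also a prefix of "/FontFile2" and "/FontFile3",
    // so a single search covers all three keys.
    private const string FontFileKey = "/FontFile";

    public static bool LooksEmbedded(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // content does not break the text search.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);
        return text.IndexOf(FontFileKey, StringComparison.Ordinal) >= 0;
    }
}
```

Usage would be something like EmbeddedFontProbe.LooksEmbedded(File.ReadAllBytes(path)) for each uploaded file.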

Adobe PDF Library versions 15 and up include a service that will optimize PDF files for you.
The Optimizer has a function to subset all embedded fonts. What that will do is create a subset of each font limited to only the glyphs of that font actually used by the document. The API is below.
void Datalogics::PDFL::PDFOptimizer::SetOption (OptimizerOption option, bool value)
void Datalogics::PDFL::PDFOptimizer::Optimize (Document document, string newPath)
This is the option that you need:
SubsetAllEmbeddedFonts
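Putting the two calls together might look roughly like the sketch below. PDFOptimizer, SetOption, Optimize and SubsetAllEmbeddedFonts come from the signatures quoted above. The Library and Document setup and the wrapper class name are my assumptions about the .NET interface, so check them against the samples shipped with your version.

```csharp
using Datalogics.PDFL;

// Hypothetical wrapper; only the optimizer calls are taken from the answer.
public static class FontSubsetter
{
    public static void SubsetFonts(string inputPath, string outputPath)
    {
        using (Library lib = new Library())              // assumed: library must be initialized first
        using (Document doc = new Document(inputPath))   // assumed: constructor that opens a file path
        using (PDFOptimizer optimizer = new PDFOptimizer())
        {
            // Keep only the glyphs each embedded font actually uses.
            optimizer.SetOption(OptimizerOption.SubsetAllEmbeddedFonts, true);
            optimizer.Optimize(doc, outputPath);
        }
    }
}
```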

Related

How to PDF Resize generated by rdlc and itextsharp

I have generated PDFs using rdlc and then combined multiple PDF files into a single document using the iTextSharp PdfSmartCopy class. But the resulting PDF is large and I want to reduce its size. I have tried compressing it using iTextSharp, but that doesn't reduce the size. When I upload the PDF file to ilivepdf.com for compression, it compresses the 21MB file to 1MB.
Often, the problem is related to embedded fonts.
You see, PDF really strives to preserve your document exactly how you made it.
To do that, a PDF library can decide to embed a font. You can imagine this as simply putting the font file into the PDF document.
But, here comes the tricky part.
The PDF specification took into account that this may be overkill.
I mean, if you are only using the 50-something characters typically used in Western languages, it makes little sense to embed the entire font.
So PDF supports a feature called "font subsetting". This means, instead of embedding the entire font, only those characters that are actually used are embedded in the document.
So what is going wrong exactly when you're merging these documents?
(I will skip a lot of the technical details.)
In order to differentiate between a fully embedded font, system font, or subset embedded font, iText generates a new font name for your fonts whenever it embeds them.
So a document containing a subset of Times New Roman might have "AUHFDI+Times" in its resources.
Similarly, a second document (again containing a subset of Times New Roman) might list "VHUIEF+Times" as one of its resources.
I believe it simply adds a random tag of 6 uppercase letters, followed by a plus sign, in front of the font name. (ex-iText developer here)
PdfSmartCopy has to decide what to do with these resources. And sadly, it doesn't know whether these fonts are actually the same. So it decides to embed both these subsets into the new document.
This is a huge file-size penalty.
If you have 100 documents, all using a subset of the same font, that subset will be embedded 100 times.
The other tool you listed might actually check whether these fonts are the same (and if they are, embed them only once). Or the other tool might simply not care that much and assume based on the partial name match that they are the same.
The ideal solution would of course be to compare the actual characters in the font, to see whether these two subsets can be merged.
But that would be much more difficult (and might potentially be a performance penalty).
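To get a feeling for how much duplication you have, you can list the font names in the merged file and group them by base name. The PDF specification writes a subset font name as a tag of six uppercase letters plus a "+" sign in front of the real name (for example AUHFDI+Times). The sketch below strips that tag and counts how often each base name occurs. The class and method names are made up for this example, and how you obtain the list of font names from your PDF library is left open. Note that this is the "partial name match" shortcut mentioned above, not a real comparison of the glyphs.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for analysing font names pulled from a PDF.
public static class SubsetFontNames
{
    // Six uppercase letters followed by '+' mark a subset font.
    private static readonly Regex SubsetTag = new Regex(@"^[A-Z]{6}\+");

    // "AUHFDI+Times" becomes "Times"; names without a tag are returned unchanged.
    public static string BaseName(string fontName)
    {
        return SubsetTag.Replace(fontName, "");
    }

    // Counts how many fonts share the same base name.
    public static Dictionary<string, int> CountByBaseName(IEnumerable<string> fontNames)
    {
        return fontNames
            .GroupBy(BaseName)
            .ToDictionary(group => group.Key, group => group.Count());
    }
}
```

A base name with a high count is a good hint that the same font was subset and embedded once per source document.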
What can you do?
There are 14 standard fonts (the Times, Helvetica and Courier families, plus Symbol and ZapfDingbats) that do not need to be embedded. Viewers are expected to provide them (which is why they are usually not embedded).
If you have control over the process that generates the PDF documents, you could simply decide to create them using only these fonts.
Alternatively you could write a smarter PdfSmartCopy. You would need to look into how fonts are built and stored, and perform the actual comparison I mentioned earlier.
Ask for technical support at iText. If enough people request this particular feature, you may get it.
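For the first option, it can help to check which fonts in your source documents are already among the standard ones. Here is a small sketch with the 14 standard PostScript font names as the PDF specification lists them. The class name is made up, and the check is an exact match on the base font name, so a name that carries a subset tag would have to be cleaned up first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup of the 14 standard PDF font names.
public static class StandardFonts
{
    private static readonly HashSet<string> Names = new HashSet<string>(StringComparer.Ordinal)
    {
        "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
        "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
        "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
        "Symbol", "ZapfDingbats"
    };

    // True when the base font name is one of the standard 14.
    public static bool IsStandard14(string baseFontName)
    {
        return baseFontName != null && Names.Contains(baseFontName);
    }
}
```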

Reading font file content in winrt

How can I read a font file stream on the WinRT platform? I need to get font file content from C# UWP. As you probably know, there is no way to read files from the Fonts folder directly. FilePicker is also not an option for me, since it's not the user's responsibility to choose this folder. I found a way to enumerate font names using DirectWrite (C++) and then wrap it in a COM component that is available in C# (https://code.msdn.microsoft.com/FontExplorer-lets-you-f01d415e#content). I wonder if a similar thing can be done to read font file content as byte[] or Stream?
You cannot directly read the TTF file from a UWP app without the user navigating to the file manually. The UWP application is not allowed to open files without the user being prompted unless they are in specific locations.
Also, as mentioned in a comment, many fonts may not be distributed or embedded without special licenses.
Good news: PDF export doesn't make much sense in Windows 10. Windows 10 has a built-in PDF printer. So it's better to kill two birds with one stone: implement printing and get PDF export free of charge.
Assuming you have already created an IDWriteFontFile instance, it's easy to read an arbitrary file fragment:
1. Get the file reference key with IDWriteFontFile::GetReferenceKey().
2. Get the loader interface with IDWriteFontFile::GetLoader().
3. Create a stream instance with IDWriteFontFileLoader::CreateStreamFromKey() using the key from step 1.
4. Use IDWriteFontFileStream::ReadFileFragment/ReleaseFileFragment to read from the file stream into your buffer.

XNA - how to determine font type and font size used in compiled XNB file

XNB files are created by Microsoft XNA and distributed with many
games. XNB is a general serialization format capable of representing
arbitrary .NET objects, but there are common definitions for textures,
sound samples, 3D models, fonts, and other game data. XNB files may
use LZX compression (referred to as the Xbox XMemCompress format).
I have decompressed XNB files with fonts and I need to find out which font and font size they use. I don't have the source of the original application. I would like to use the same font design in another application.
The current XNB files with fonts don't have all the special characters I need. I'm able to generate a new spritefont with Unicode characters and compile it to XNB files, but because I don't know which font was used, my add-on looks visibly different.
Does anyone know how to detect which font and font size are used in an XNB file?
I was also thinking about changing the encoding with a hex editor, but I didn't find any information on where the encoding is stored or how easily it can be changed.
Sample file with font:
https://onedrive.live.com/redir?resid=B6196AD97CA6B88A!251&authkey=!ADaJmio5n3RO2zM&ithint=file%2c.zip
Until now I found below helpful resources:
What is an XNB File?
Compiled (XNB) Content Format
You can use the "Compiled (XNB) Content Format" project from MSDN.
I haven't tried it myself, but according to Microsoft, you can use it to open an XNB file and see its contents printed on the screen:
It also includes an example .xnb parser, written in native C++, which
demonstrates how to parse a compiled XNB file by printing its contents
to the screen.
Check it out here.
You can find other similar tools, but I always prefer to go to the owner of the product itself, this way you have better chances of having something useful.
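Before running a full parser it can be useful to check the file header, for example to confirm that a file really is decompressed. The sketch below assumes the header layout described in the Compiled (XNB) Content Format document: the three bytes "XNB", one platform byte, one format version byte, one flags byte (0x01 for HiDef content, 0x80 for compressed data) and a little-endian 32-bit total file size. The class is made up for this example and stops after the header. It does not decode the font data itself.

```csharp
using System.IO;
using System.Text;

// Hypothetical reader for the fixed-size XNB header only.
public sealed class XnbHeader
{
    public char Platform { get; private set; }      // e.g. 'w' for Windows
    public byte Version { get; private set; }       // format version number
    public bool IsHiDef { get; private set; }       // flag bit 0x01
    public bool IsCompressed { get; private set; }  // flag bit 0x80
    public uint FileSize { get; private set; }      // total size stored in the header

    public static XnbHeader Read(Stream stream)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var reader = new BinaryReader(stream, Encoding.ASCII, true))
        {
            byte[] magic = reader.ReadBytes(3);
            if (magic.Length != 3 || magic[0] != 'X' || magic[1] != 'N' || magic[2] != 'B')
                throw new InvalidDataException("Not an XNB file.");

            var header = new XnbHeader();
            header.Platform = (char)reader.ReadByte();
            header.Version = reader.ReadByte();
            byte flags = reader.ReadByte();
            header.IsHiDef = (flags & 0x01) != 0;
            header.IsCompressed = (flags & 0x80) != 0;
            header.FileSize = reader.ReadUInt32(); // BinaryReader reads little-endian
            return header;
        }
    }
}
```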

Handle lots of images in C# application

I'm trying to create a card game in C# and for this I have a lot of images that I need to load. They're all JPG images and there are about 7000 of them.
I would like to make sure that if you download the game, the images will not be easily accessible, meaning that they should not just be JPG images in a subfolder of the application. So I thought about embedding them in a DLL file.
But how do I do this? And how do I handle this efficiently? Is there a technique for this sort of thing, or is another method preferable?
I would like to make sure that [...] the images will not be easily accessible
First, you should ask yourself why you want to forbid this. If you just want to prevent someone else from manipulating the pictures, you can leave them in a bunch of subfolders as JPGs, generate a checksum for each file, and verify the checksums when the program loads the pictures.
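A minimal sketch of the checksum idea, using SHA-256 from the .NET class library. The class and method names are made up for this example. Where you keep the expected values is up to you, but they should live somewhere harder to edit than the image folder itself (for example compiled into the program).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for verifying image files at load time.
public static class ImageChecksum
{
    // Returns the SHA-256 hash of the file as an uppercase hex string.
    public static string Sha256Hex(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }

    // True when the file still has the hash recorded at build time.
    public static bool Matches(string path, string expectedHex)
    {
        return string.Equals(Sha256Hex(path), expectedHex, StringComparison.OrdinalIgnoreCase);
    }
}
```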
If you want to prevent reuse of the pictures, you can leave them in a bunch of subfolders, but not as JPGs. Encrypt them, for example with the standard AES algorithm. But beware: that won't prevent anyone from taking screenshots while your application is running, so you should consider whether it's really worth the effort.
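A minimal sketch of the AES idea with the classes from System.Security.Cryptography. The class name and the file layout (a random IV stored in front of the encrypted bytes) are choices made for this example. The key must be 16, 24 or 32 bytes long, and since the program has to contain the key to decrypt the pictures, a determined user can still extract it.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: output layout is [IV][AES-encrypted image bytes].
public static class ImageCrypto
{
    public static byte[] Encrypt(byte[] plain, byte[] key)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV(); // fresh IV per file
            using (var output = new MemoryStream())
            {
                output.Write(aes.IV, 0, aes.IV.Length);
                using (var crypto = new CryptoStream(output, aes.CreateEncryptor(), CryptoStreamMode.Write))
                {
                    crypto.Write(plain, 0, plain.Length);
                }
                // ToArray still works after the stream has been closed.
                return output.ToArray();
            }
        }
    }

    public static byte[] Decrypt(byte[] data, byte[] key)
    {
        using (Aes aes = Aes.Create())
        {
            byte[] iv = new byte[aes.BlockSize / 8];
            Array.Copy(data, iv, iv.Length);
            aes.Key = key;
            aes.IV = iv;
            using (var input = new MemoryStream(data, iv.Length, data.Length - iv.Length))
            using (var crypto = new CryptoStream(input, aes.CreateDecryptor(), CryptoStreamMode.Read))
            using (var output = new MemoryStream())
            {
                crypto.CopyTo(output);
                return output.ToArray();
            }
        }
    }
}
```

At load time you would decrypt into memory and create the image from a MemoryStream, so the plain JPG never touches the disk.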
EDIT: if you want to embed the images because installation gets easier when you have just one big file to deploy instead of 7000 single files, then you may write a helper program for creating resource files programmatically. See this page from Microsoft, especially the part about .resource files, to learn how to utilize the ResourceWriter class for that purpose.
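Such a helper program could look roughly like this. ResourceWriter and ResourceSet are the real classes from System.Resources. The class name, the "*.jpg" filter and the choice to use the file name without extension as the resource name are assumptions made for this sketch (two files whose names differ only in case would collide).

```csharp
using System.IO;
using System.Resources;

// Hypothetical packer: one .resources file holding all images as byte arrays.
public static class ImagePacker
{
    // Writes every JPG in the directory into a single .resources file
    // and returns the number of images packed.
    public static int Pack(string imageDirectory, string resourcesPath)
    {
        int count = 0;
        using (var writer = new ResourceWriter(resourcesPath))
        {
            foreach (string file in Directory.GetFiles(imageDirectory, "*.jpg"))
            {
                writer.AddResource(Path.GetFileNameWithoutExtension(file), File.ReadAllBytes(file));
                count++;
            }
        } // disposing the writer generates the output file
        return count;
    }

    // Reads one image back by its resource name.
    public static byte[] Load(string resourcesPath, string name)
    {
        using (var resources = new ResourceSet(resourcesPath))
        {
            return (byte[])resources.GetObject(name);
        }
    }
}
```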
If you have 7000 images, you need a database. Microsoft SQL Server Compact 4.0 is an option. It's small and easy to use.
I'm assuming that this is a Windows application.
To embed an image in the assembly:
1. Right-click the image file and select Properties.
2. In the Properties pane, set the Build Action to Embedded Resource.
The image then becomes an embedded resource when the application is compiled.
Then you can access the image from the assembly like this (the strongly typed Properties.Resources accessor is generated when the image is added through the project's Resources designer):
global::[[MyNameSpace]].Properties.Resources.[[ImageName]]
For example: this.pictureBox1.Image = global::[[MyNameSpace]].Properties.Resources.[[ImageName]]
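A file whose Build Action is Embedded Resource can also be read as a raw stream with Assembly.GetManifestResourceStream. A minimal sketch follows. The helper class is made up for this example, and the resource name normally follows the pattern default namespace, then folder, then file name (Assembly.GetManifestResourceNames() lists the exact names if you are unsure).

```csharp
using System.IO;
using System.Reflection;

// Hypothetical helper for reading embedded files as byte arrays.
public static class EmbeddedImages
{
    // Returns the resource bytes, or null when no resource has that name.
    public static byte[] Read(Assembly assembly, string resourceName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            if (stream == null)
                return null;

            using (var buffer = new MemoryStream())
            {
                stream.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }
}
```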

Do long class names affect the size of a XAP file?

There is a custom control in my Silverlight application, which is put on the canvas about one thousand times. I'm concerned about the size of a XAP file of this app.
Will the XAP file explicitly contain the name of this control that many times? Will I reduce the size of the XAP file by changing the name of the custom control from, say, 10 letters to only 2 letters? How can I calculate the impact of the length of the class names on the final size of the XAP file?
How can I calculate the impact of the length of the class names on the final size of the XAP file?
Simple answer: you could test it. Build your XAP file, rename the class, rebuild the XAP file, and see what the size difference is, if any.
I'd expect the name of the class to only occur once in the metadata, but if you're giving long names to the instances in the XAML, that may make a difference.
If size is important to you I think you are worrying about the wrong thing. You are more likely to include unused code, or DLLs, or large resources, than have names that are too long. Remember the file is zipped and text compression is very good so the effect of repeating the same long control name 1000s of times is minimal.
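You can convince yourself of the compression effect with a small experiment. The sketch below builds a made-up piece of markup that repeats a control name many times and measures its size after Deflate compression, which is the compression method ZIP files normally use. The class, the markup shape and the control names are invented for this example, so it only approximates what happens inside a real XAP.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical experiment: how much does a repeated name cost after compression?
public static class NameLengthExperiment
{
    // Builds markup that repeats the control name 'count' times.
    public static string BuildMarkup(string controlName, int count)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append("<local:").Append(controlName).Append(" />\n");
        }
        return builder.ToString();
    }

    // Returns the size in bytes of the text after Deflate compression.
    public static long DeflatedSize(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // leaveOpen: true so the length can be read after the deflate stream is flushed.
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(raw, 0, raw.Length);
            }
            return output.Length;
        }
    }
}
```

Compare DeflatedSize(BuildMarkup("CardHolder", 1000)) with DeflatedSize(BuildMarkup("CH", 1000)): the uncompressed texts differ by 8000 characters, but the compressed sizes should differ by far less.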
Also don't forget to re-zip your XAP files with better zip software (as the zip compression you get in XAP files built by Visual Studio is not optimal!)
You are much better off worrying about progressive loading as most Silverlight apps out there are way too bulky (and take way more than the expected/recommended 8 second attention span of most web-users).
A XAP file is just a ZIP file containing your compiled binaries. Your XAML files are embedded as resources in those compiled binaries.
So yes - the length of the name of your control will affect the file size of your XAP file.
How much is a different question. The XAP file is a compressed ZIP file. As an example, I have a XAP file with 2 DLLs in it. One compressed down from 282 KB to 94 KB, the other from 46 KB to 16 KB.
I would worry more about why your control appears 1000 times in a single page however. Surely there must be a better way of doing this. Perhaps even doing it in code would be easier/better?
