Get glyph names from TTF file in C# - c#

I have a TTF file (the one that can be downloaded from https://materialdesignicons.com/, to be precise) and I want to get the name and Unicode numeric value of each glyph it contains (something like what https://andreinitescu.github.io/IconFont2Code/ does, but I'm obviously too dumb to pick the essential code parts out of its GitHub project). I do not want the name of the font or anything else, just the name of each individual glyph.
I can see the names when I open the file with, for example, Notepad++, but I cannot identify a pattern that I could use to get the names programmatically.
I have now searched the internet for more than a day straight and I can't find any helpful answers. It can't be that hard to get it working - or can it?

Related

Where to get a collection of differently encoded text files

I am writing a program that detects the character encoding of a file and then converts the file to Unicode. To test the application I would like to have a collection of text files in different character encodings, at least the most common ones. I've been googling for a while but didn't find any. I'm sure I am not the first one with this problem, so I don't understand why I cannot find anything like that.
Does anyone know an easy way to get many differently encoded text files?

Edit XPS content

I have got an application that is supposed to send a formatted document to a printer with some barcodes.
I've made other applications that work with printers and print directly through the print server by sending an .xps file, so I thought I would try to make an .xps file, change the text and be done with it. However, every article I can find on the net is about creating XPS files, not changing them. I feel like it should be possible, and it would be nice not to have to resort to installing Office on the server and printing through there. Then I might as well use Open XML and a .docx file.
It is very simple. Let's say I want to change the text INCNUMMER in a .xps file to "testing123". How would I go about that?
I have tried the whole unzip, open the xml, find the text, edit, rezip but I'm afraid there's too much about the .xps format I don't understand to make that work.
Best regards, Kaspar.
As you already know, an XPS file is just a ZIP archive containing a number of files and folders that have particular names and a defined structure.
At the root level there is a Documents folder which will typically contain just a single document folder named 1. Inside that is a Pages folder containing one or more .fpage files: these define the content of each page in the document.
Documents
  1
    Pages
      1.fpage
      2.fpage
      etc.
If you open up these .fpage files in a text editor you will see that they are just XML files. Each page is typically represented by a <Canvas> element that contains multiple <Path> and <Glyphs> elements (text is represented by the latter). However, even though <Glyphs> elements do have a UnicodeString attribute the value of that attribute cannot be changed in isolation.
Each <Glyphs> element also has an Indices attribute. If you remove this attribute altogether and change the UnicodeString attribute at the same time, this almost works. However, you will probably find that when viewing the file in the XPS Viewer application certain characters in the text are replaced by question mark symbols.
Font glyphs are embedded in the XPS file (odttf files in the Resources folder), and the software that generated the XPS file will only embed glyphs that are used in the source document. For example, this means that (for a given font) if you did not use the letter "A" in the source document, then the glyph for that letter will not be written to the resources of the XPS file. Hence if you change the UnicodeString attribute to include a letter "A" then that character will display as a question mark in the viewer because it has no glyph resource that tells it how that character must be drawn.
If you have control over the source document (the one that later gets converted to XPS) then I suppose you could include a piece of text containing all of the characters that you are likely to use, and set its colour to white so that it doesn't print, but I'm not sure whether the XPS printer driver would strip that text out anyway. If it didn't then you could probably do something like this:
Open the relevant .fpage XML file
Search all UnicodeString attributes of <Glyphs> elements to find the text you want
Replace that text with something else
Remove the Indices attribute from the changed <Glyphs> elements
Save the updated XML back to the file
Re-zip then change the extension from ZIP to XPS
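The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than C# (the same zip and XML operations exist in .NET), and the function names `replace_glyph_text` and `rewrite_xps` are made up for this example. It assumes the pages are plain XML parts ending in .fpage and that every character of the new text already has an embedded glyph; otherwise the question-mark problem described above still applies.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def replace_glyph_text(fpage_xml: bytes, old: str, new: str) -> bytes:
    """Replace `old` with `new` in the UnicodeString of <Glyphs> elements."""
    root = ET.fromstring(fpage_xml)
    # Keep the page's default namespace unprefixed when serializing again.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}")[0])
    for elem in root.iter():
        # Match on the local name so the exact namespace URI does not matter.
        if elem.tag.split("}")[-1] != "Glyphs":
            continue
        text = elem.get("UnicodeString", "")
        if old in text:
            elem.set("UnicodeString", text.replace(old, new))
            # The stored glyph indices no longer match the new text.
            elem.attrib.pop("Indices", None)
    return ET.tostring(root, encoding="utf-8")


def rewrite_xps(xps_bytes: bytes, old: str, new: str) -> bytes:
    """Copy an XPS (zip) package, editing every .fpage part on the way."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(xps_bytes)) as zin, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.lower().endswith(".fpage"):
                data = replace_glyph_text(data, old, new)
            zout.writestr(item.filename, data)
    return out.getvalue()
```

A call such as `rewrite_xps(open("in.xps", "rb").read(), "INCNUMMER", "testing123")` returns the bytes of the modified package, which can then be written to a new .xps file. Whether a given viewer or printer accepts the re-serialized XML is not guaranteed, so test with a real document.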

Getting the Glyph Name from a TTF or OTF font file

Hi, does anyone know how to get the glyph names from a TTF or OTF font file in C#? I'm willing to parse the file directly to get them if necessary. I found the post "Access opentype glyph names from WPF" here, but it got no answer.
P.S. I have created the font myself and am writing a program that generates a CSS (LESS or SASS) file, so that the glyphs I have made can be used easily in web pages, like Bootstrap or FontAwesome :)
In TrueType-based fonts (.TTF files), you can try parsing the 'post' table. It's fairly easy to figure out. But, only format 2.0 explicitly stores glyph names. If the post table is format 3.0, there are no glyph names stored (there are a couple of other formats defined, but fonts using them are very, very rare). In that case, your only option is to back-track using Unicode values from the 'cmap'...there are some standard references for Unicode-to-glyph names that may be useful.
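As a rough illustration of the 'post' table approach, here is a minimal sketch in Python (the byte-level logic should port directly to a C# `BinaryReader`, keeping in mind that all font fields are big-endian). The function name `read_post_glyph_names` is invented for this example. It assumes a single-font file rather than a .ttc collection, handles only format 2.0, and returns a placeholder such as `<mac#3>` for the 258 standard Macintosh glyph names instead of embedding that list.

```python
import struct


def read_post_glyph_names(font: bytes) -> list:
    """Return glyph names, ordered by glyph index, from a format 2.0 'post' table."""
    # Font header: sfnt version (4 bytes), then numTables (uint16) at offset 4.
    num_tables = struct.unpack_from(">H", font, 4)[0]
    post_offset = None
    for i in range(num_tables):
        # Each 16-byte table record: tag, checksum, offset, length.
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", font, 12 + 16 * i)
        if tag == b"post":
            post_offset = offset
            break
    if post_offset is None:
        raise ValueError("font has no 'post' table")

    version = struct.unpack_from(">I", font, post_offset)[0]
    if version != 0x00020000:
        raise ValueError("this 'post' table version stores no glyph names")

    pos = post_offset + 32  # skip the fixed 32-byte 'post' header
    num_glyphs = struct.unpack_from(">H", font, pos)[0]
    indices = struct.unpack_from(f">{num_glyphs}H", font, pos + 2)
    pos += 2 + 2 * num_glyphs

    # Indices >= 258 point into a list of Pascal strings (length byte + text).
    custom_count = max((i - 257 for i in indices if i >= 258), default=0)
    custom = []
    for _ in range(custom_count):
        length = font[pos]
        custom.append(font[pos + 1:pos + 1 + length].decode("ascii", "replace"))
        pos += 1 + length

    # Indices below 258 refer to the standard Macintosh names (placeholder here).
    return [custom[i - 258] if i >= 258 else f"<mac#{i}>" for i in indices]
```

The list position is the glyph index, so pairing a name with its Unicode value still requires reading the 'cmap' table, which maps code points to glyph indices.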
For CFF-based fonts (.OTF files), glyph names are stored inside of the 'CFF ' table. That's a bit trickier to parse, but if you're only looking for the glyph name references it shouldn't be too difficult to figure out.

opening a file format

I've been looking on certain sites for some time now, but I can't seem to find anything usable about file formats.
There is a certain file format on my computer which I want to re-create in order to make add-ons for a program. Unfortunately I would be the first to do so for that format, which makes it all the harder. There are programs that add information to the file, but unfortunately those programs are not open source. Still, that does mean it's possible to figure out the file format somehow.
The closest I came to finding usable information about re-creating a file format was "open it in notepad or a hex editor, and see if you can find anything usable".
This file format contains information, so nothing like music files or images, in case you're wondering.
I'm just wondering if there is any guide on how to create a file format, or on figuring out how an existing file format works. I believe this sort of format is called a tabulated data format?
It really does depend on the file format.
Ideally, you find some documentation on how the file works, and use that. This is easy if the file uses a public format, so for HTML files or PNG files you can easily find that information. Proprietary formats often have published specs too, or at least a publicly available API for manipulating them, depending on the company's policy on actively encouraging this sort of extension.
Next best is using examples of working code (whether published source or reverse engineered in itself) that deal with the file as a reference implementation.
Otherwise, reverse engineering is as good as you can do. Opening it in notepad and a hex editor (even with a binary format, looking at it parsed as text can tell you something; even with a text-based format, looking at it in a hex editor can tell you if they are making use of non-printable characters) is indeed the way to go. It's a detective job and while sometimes easy, often very hard, esp. since you may miss ways they deal with edge-cases that aren't hit in the samples you use.
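If no hex editor is at hand, the side-by-side view it gives (raw bytes next to their text interpretation) is easy to approximate. The following is a small sketch in Python; `hexdump` is just an illustrative helper, not part of any tool mentioned here.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values and printable ASCII, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.', as most hex editors do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return "\n".join(lines)
```

Dumping the first few hundred bytes of several sample files and comparing them often reveals a fixed signature, length fields, or readable strings that hint at the layout.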
The difficulty with obscure formats distributed with games is that they are often compiled from either a declarative definition language, a scripting language or directly from a set of resources like textures and meshes.
In some games, one compiled file will contain bits and pieces of all of the above, with no available documentation on the tools and formats used to piece it together. Some people call that "fun".
If you can't get anything from the hex, can't find any documentation and can't find a tool to read the file, you're probably best off asking the community to see if anyone is familiar with the technology.

Extract data from nested tables in PDF

I have a few PDF files that were created from Word or Excel files.
I need to get the information that's in the tables.
The text in the document is not an image, so I'm able to extract the text using tools such as PDFBox.
Once I have the text I have no way of knowing which table cell it belongs to, because I don't know where the table borders are.
I've tried a few desktop tools such as ABBYY or Solid PDF Converter, and they are able to convert the files into nice Word documents, but this doesn't suit my needs as I want to be able to do this programmatically in C#.
Some of the tables have nested tables, which I think makes this a little bit more difficult.
I appreciate your help
The difficulty here is caused by the fact that the text in the PDF is not contained within any table. It might look like it is, but underneath the surface, it is not.
So there are a couple of options that I can think of. But none of them are going to be quite as satisfying as you'd probably like.
There are some companies that offer SDKs for PDF to Excel/Word conversion. Investintech and Iceni are a couple of examples. But these solutions are not free.
If you know the exact layout of the PDF files that you need to extract the table data from, then you can use any SDK that lets you extract text from a PDF and also tells you the exact co-ordinates of the extracted text. Using this method you need to know in advance where the text is going to be, so that you can extract text from a specific area on the page. It obviously won't work if you need to process any random document.
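The region-based idea can be illustrated independently of any particular SDK. The sketch below is in Python and assumes, purely for illustration, that the extraction library has already produced a list of words with page coordinates whose origin is the top-left corner; the `Word` type and `words_in_region` function are hypothetical and not part of any PDF library.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # horizontal position of the word's origin
    y: float  # vertical position, assumed to grow downwards


def words_in_region(words, left, top, right, bottom):
    """Return the text of all words whose origin lies inside the rectangle."""
    inside = [w for w in words
              if left <= w.x <= right and top <= w.y <= bottom]
    # Sort top-to-bottom, then left-to-right, to approximate reading order.
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)
```

With one rectangle per known table cell, each cell's content can be read by calling `words_in_region` with that cell's bounds. Native PDF coordinates start at the bottom-left corner, so the comparison on `y` may need to be flipped depending on what the chosen SDK reports.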
It's a difficult task, but hopefully this will give you a starting point.
