How can I determine from the .NET runtime whether a given font has a glyph for a character? I want to switch the font to Arial Unicode MS if I have text that the specified font does not have a glyph for (very common for CJK).
Update: I'm looking for a C# (i.e. all managed code) solution. I think GlyphTypeface may be what I need, but I can't see a way in it to ask whether a given character has a glyph. You can get the entire map back, but I assume that would be an expensive call.
I've written some Unicode tools, and the technique I use is to get the map and cache it for each font used.
IDictionary<int, ushort> characterMap = glyphTypeface.CharacterToGlyphMap;
gives you the defined glyph index per code point.
msdn ref
glyphExists = characterMap.ContainsKey(CodePoint);
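A minimal sketch of the caching idea, assuming the map for each font comes from GlyphTypeface.CharacterToGlyphMap. The GlyphCoverage class, its method names and the loadMap callback are hypothetical, not part of any library. It walks the string by code point rather than by char, so characters outside the BMP (surrogate pairs) are looked up correctly; a string containing a lone surrogate would make char.ConvertToUtf32 throw.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: caches one code-point-to-glyph map per font name.
public static class GlyphCoverage
{
    private static readonly Dictionary<string, IDictionary<int, ushort>> Cache =
        new Dictionary<string, IDictionary<int, ushort>>();

    // loadMap is assumed to return GlyphTypeface.CharacterToGlyphMap for the font;
    // it is only invoked the first time a font name is seen.
    public static IDictionary<int, ushort> GetMap(
        string fontName, Func<IDictionary<int, ushort>> loadMap)
    {
        IDictionary<int, ushort> map;
        if (!Cache.TryGetValue(fontName, out map))
        {
            map = loadMap();
            Cache[fontName] = map;
        }
        return map;
    }

    // True if every code point in text has an entry in the map.
    public static bool HasAllGlyphs(IDictionary<int, ushort> map, string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            int codePoint = char.ConvertToUtf32(text, i);
            if (char.IsSurrogatePair(text, i)) i++; // skip the low surrogate
            if (!map.ContainsKey(codePoint)) return false;
        }
        return true;
    }
}
```

If HasAllGlyphs returns false for the requested font, the caller can switch to a fallback font such as Arial Unicode MS.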
Related
Using iTextSharp, how can I determine if a parsed chunk of text is both bolded and underlined?
Details:
I'm trying to parse PDF files in C# specifically for text that is both bolded and underlined. Using iTextSharp, I can derive from LocationTextExtractionStrategy and get the text, the location, the font, etc. from the iTextSharp.text.pdf.parser.TextRenderInfo object passed to the overridden .RenderText method.
However, determining whether the text is bold and/or underlined from the TextRenderInfo object has not been straightforward.
I tried to use TextRenderInfo.GetFont() to find the font properties, but was unsuccessful.
I can currently determine if the text is Bold or not, by accessing the private Graphics State field on the TextRenderInfo object and checking its .Font.PostscriptFontName property for the word "Bold" (Ugly, but appears to work.)
Biggest issue: I haven't found anything to determine if the text is underlined. How can I determine this?
Here is my current attempt:
private FieldInfo _gsField = typeof(TextRenderInfo).GetField("gs",
    BindingFlags.GetField | BindingFlags.NonPublic | BindingFlags.Instance);

//Automatically called for each chunk of text in the PDF
public override void RenderText(TextRenderInfo renderInfo)
{
    base.RenderText(renderInfo);
    //UNDONE:Need to determine if text is underlined. How?
    //NOTE: renderInfo.GetFont().FontWeight does not contain any actual information
    var gs = (GraphicsState)_gsField.GetValue(renderInfo);
    var textChunkInfo = new TextChunkInfo(renderInfo);
    _allLocations.Add(textChunkInfo);
    if (gs.Font.PostscriptFontName.Contains("Bold"))
        //Add this to our found collection
        FoundItems.Add(new TextChunkInfo(renderInfo));
    if (!_lineHeights.Contains(textChunkInfo.LineHeight))
        _lineHeights.Add(textChunkInfo.LineHeight);
}
Full source code of current attempt at: GitHub Repository (Two examples (example.pdf and example2.pdf) are included with text similar to what I'll be searching through.)
I tried to use TextRenderInfo.GetFont() to find the font properties, but was unsuccessful.
I can currently determine if the text is Bold or not, by accessing the private Graphics State field on the TextRenderInfo object and checking its .Font.PostscriptFontName property for the word "Bold" (Ugly, but appears to work.)
I don't quite understand this differentiation. TextRenderInfo.GetFont() is exactly the same as the Font property of the private Graphics State field of TextRenderInfo.
That being said, though, this is indeed one of the major ways to determine boldness.
Bold writing in PDFs is achieved using either
  explicitly bold fonts (which is the better way); in this case one can try to determine whether or not the fonts are bold by
    looking at the font name: it may contain a substring "bold" or something similar;
    looking at some optional properties of the font, e.g. font weight, but beware, they are optional;
    inspecting the embedded font file, if applicable.
  None of these methods is foolproof;
  or the same font as for non-bold text, with special techniques to make it appear bold (a.k.a. poor man's bold), e.g.
    not only filling the glyph contours but also drawing a thicker line along them for a bold impression,
    drawing the glyph twice, the second time slightly displaced, also for a bold impression.
Underlined writing in PDFs is usually achieved by explicitly drawing a line or a very thin rectangle under the text. You can try to detect such lines by implementing IExtRenderListener, parsing the page in question with it to determine line locations, and then matching them with text positions during text extraction. Both can also be done in a single pass, but beware: the underlines need not be drawn before the text or even shortly thereafter; the PDF producer may first draw all text and only then draw all underlines. Furthermore, I've also come across a funny construction: very short (e.g. 1pt) but very wide (e.g. 50pt) vertical lines, which are effectively seen as horizontal ones...
IExtRenderListener extends the IRenderListener with three new methods, ModifyPath, RenderPath, and ClipPath. Whenever some path is drawn, be it a single line, a rectangle, or some very complex path, you'll first get a number of ModifyPath calls (at least one)
/**
* Called when the current path is being modified. E.g. new segment is being added,
* new subpath is being started etc.
*
* @param renderInfo Contains information about the path segment being added to the current path.
*/
void ModifyPath(PathConstructionRenderInfo renderInfo);
defining the lines and curves the path consists of, then at most one ClipPath call
/**
* Called when the current path should be set as a new clipping path.
*
* @param rule Either {@link PathPaintingRenderInfo#EVEN_ODD_RULE} or {@link PathPaintingRenderInfo#NONZERO_WINDING_RULE}
*/
void ClipPath(int rule);
(if and only if the path shall serve as clip path for the following drawing operations), and finally exactly one RenderPath call
/**
* Called when the current path should be rendered.
*
* @param renderInfo Contains information about the current path which should be rendered.
* @return The path which can be used as a new clipping path.
*/
Path RenderPath(PathPaintingRenderInfo renderInfo);
defining how that path shall be drawn (any combination of filling its interior and stroking the path itself).
I.e. for recognizing underlines, you'll have to collect the path pieces provided via ModifyPath and decide whether they might describe one or more underlines as soon as the RenderPath call comes.
Theoretically underlines could also be created differently, e.g. using a bitmap image, but I'm not aware of pdf producers doing so.
By the way, in your example PDF underlines appear consistently to be drawn using a MoveTo to the line starting point, a LineTo to its end, and then a Stroke to simply stroke the path. Thus, you'll get two ModifyPath calls (one with operation value MOVETO, one with LINETO) and one RenderPath call (with operation STROKE) respectively for each underline.
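The matching step, i.e. deciding whether a collected stroked line belongs under a given text chunk, could be sketched as follows. This is only an illustration under stated assumptions: the Segment and TextSpan types and the UnderlineMatcher class are hypothetical, the coordinates are assumed to have already been transformed into the same space (y increasing upwards, horizontal text), and the two thresholds are guesses that would need tuning. It does not call iTextSharp itself; the segments would be filled from the MOVETO/LINETO data gathered in ModifyPath once RenderPath reports a stroke.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value types; in practice the coordinates would come from the
// collected path segments and from the text chunk's baseline.
public struct Segment
{
    public float X1, Y1, X2, Y2;
    public Segment(float x1, float y1, float x2, float y2)
    { X1 = x1; Y1 = y1; X2 = x2; Y2 = y2; }
}

public struct TextSpan
{
    public float StartX, EndX, BaselineY, FontSize;
    public TextSpan(float startX, float endX, float baselineY, float fontSize)
    { StartX = startX; EndX = endX; BaselineY = baselineY; FontSize = fontSize; }
}

public static class UnderlineMatcher
{
    // A stroked segment counts as an underline if it is (nearly) horizontal,
    // lies at most maxDrop * FontSize below the baseline, and covers at least
    // minOverlap of the text's width. Both thresholds are assumptions.
    public static bool IsUnderlined(TextSpan text, IEnumerable<Segment> strokedSegments,
        float maxDrop = 0.3f, float minOverlap = 0.8f)
    {
        float textWidth = text.EndX - text.StartX;
        if (textWidth <= 0) return false;

        foreach (Segment s in strokedSegments)
        {
            if (Math.Abs(s.Y1 - s.Y2) > 0.5f) continue; // not horizontal

            float drop = text.BaselineY - s.Y1; // positive when below the baseline
            if (drop < 0 || drop > maxDrop * text.FontSize) continue;

            float left = Math.Max(text.StartX, Math.Min(s.X1, s.X2));
            float right = Math.Min(text.EndX, Math.Max(s.X1, s.X2));
            if (right - left >= minOverlap * textWidth) return true;
        }
        return false;
    }
}
```

Underlines drawn as thin filled rectangles would first have to be reduced to a segment (for example, the rectangle's horizontal centre line) before being passed to this check.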
The Docotic.Pdf library has properties that return true or false for this.
In C#:
bool FONT_ITALIC = data.Font.Italic;
bool FONT_UNDERLINE = data.Font.Underline;
Check the value of FONT_ITALIC/FONT_UNDERLINE.
I have tried this, but it did not always give the correct value.
Any suggestions are welcome.
I'm using the following code to load a font into memory for generating an image with GDI+:
var fontCollection = new PrivateFontCollection();
fontCollection.AddFontFile(Server.MapPath("~/fonts/abraham-webfont.ttf"));
fontCollection.Families.Count(); // => This line tells me, that the collection has 0 items.
No exception is thrown, but the Families property of fontCollection is still empty after AddFontFile has run.
I've verified that the path is valid (File.Exists returns true):
Response.Write(System.IO.File.Exists(Server.MapPath("~/fonts/abraham-webfont.ttf"))); // # => Renders "True"
The TTF file seems to work fine when I open it, so it's not an invalid TTF file.
Any suggestions?
Answer from Hans Passant solved the problem:
PrivateFontCollection is notoriously flakey. One failure mode that's pretty common today is that the font is actually an OpenType font with TrueType outlines. GDI+ only supports "pure" ones. The shoe fits, the web says that Abraham is an OpenType font. Works in WPF, not in Winforms.
I am developing a Windows forms application that includes a DataGridView. This DataGridView has 3 columns, all of which are simply text cells:
Timestamp
Connection
Message
The issue I'm running into is that when I add a row (programmatically), I'm finding that the text disappears if it is too long. To be specific, if the text exceeds 4563 characters in length, then the text disappears.
I know that the DataGridViewTextBoxColumn class has a property called MaxInputLength that can limit the number of characters entered. But according to the documentation, it only affects text that is input manually by the user. I, however, am inputting this text programmatically.
Just to make sure though, I set this property very high but the disappearing text issue still arises when I pass the 4563 character limit.
One thing I have noticed is that the text is still there (i.e. the scroll bar along the bottom can still be scrolled as though the text is still there) but I cannot see the text itself. I can also edit the text.
I can add characters until the 4563 limit but as soon as I pass that, the text disappears. If I press backspace to return to exactly 4563 characters, the text reappears.
I am developing this using .NET 4.0, since I have to support Windows XP.
Here's the short answer that will probably disappoint you: it's a reported bug, verified by Microsoft and closed as "Not important enough to fix". There may be more instances of it, but it's been known since at least 2011; see the report "DataGridView control shows blank cell if large string is entered and column resized to max". The "workaround" is to just limit the cell's width, but for you that may not be satisfactory.
However, curiosity got the best of me so I started looking into it a little deeper; Here's the first observation worth mentioning:
If you look at the series of pictures, you'll notice I replicated your problem with the default font size/style and the specific number 5460. What's so special about 5460? Well, nothing in particular, except that as your character count crosses it, the ContentBounds and Width of the column pass 32767. What's so special about 32767? Other than being the default MaxInputLength of a DataGridViewTextBoxCell, it's the upper limit of a signed short or Int16 (2^15-1). I highly doubt it's a coincidence that the issue occurs here, though not because of anything to do with MaxInputLength per se. I'd be willing to bet you first noticed the issue at 4563 characters because your font size expanded the width to 32767 as well.
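The relationship between character count and pixel width can be checked with a few lines of arithmetic. This is a rough sketch that assumes a constant average character width and ignores cell padding; the WidthThreshold class is made up for illustration, and the 6 pixels per character used for the default font is the figure this answer uses for its own estimates.

```csharp
using System;

public static class WidthThreshold
{
    public const int SignedShortMax = short.MaxValue; // 32767

    // Number of characters at which the text width reaches the signed 16-bit
    // limit, assuming a constant average character width in pixels.
    public static int CharsAtLimit(double avgCharWidth)
    {
        return (int)(SignedShortMax / avgCharWidth);
    }

    // Average character width implied by an observed character threshold.
    public static double ImpliedCharWidth(int observedChars)
    {
        return (double)SignedShortMax / observedChars;
    }
}
```

With 6 pixels per character, CharsAtLimit gives 5461, close to the 5460 seen with the default font. Conversely, a threshold of 4563 characters implies an average width of roughly 7.18 pixels per character, which is consistent with a slightly larger font.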
The next question is: why? I'm not really sure. I went down the rabbit hole and disassembled some of the .NET 4.0 DataGridView* libraries to find out. It's a pretty massive and complicated control, and I haven't been able to draw any definite conclusions, but one thing I found that's worth noting is that the absolute maximum width a column can assume is 65536 (2^16), one more than the largest value of an unsigned Int16 (65535):
You see this check in a lot of private internal places when adding or resizing a column, and I tested it. The size won't go larger.
This is ironic for two reasons. For one, using the default settings, you can only display 10922 characters (65536 / 6 pixels per character) in a column, despite the editing input length being 32767 characters and the length being arbitrary when set programmatically.
Second, why would this issue start cropping up at exactly the maximum of the signed variant of the column's maximum width? Hmmmm. This is totally a guess, but I think somewhere along the line the max value for whatever renders the text was set as a regular short instead of an unsigned short... or something along those lines. I have my suspicions about the PaintPrivate() method in the implementation of DataGridViewTextBoxCell, so if you're feeling frisky, maybe put a microscope to it. You'll need an IL disassembler to see this stuff that's not exposed publicly. Specifically, this is the part of the code I'm suspicious of:
if (text != null && (paint && !flag2 || computeContentBounds))
{
    int y = cellStyle.WrapMode == DataGridViewTriState.True ? 1 : 2;
    rectangle3.Offset(0, y);
    // ISSUE: explicit reference operation
    // ISSUE: variable of a reference type
    Rectangle& local = @rectangle3;
    // ISSUE: explicit reference operation
    int width = (^local).Width;
    // ISSUE: explicit reference operation
    (^local).Width = width;
    rectangle3.Height -= y + 1;
    if (rectangle3.Width > 0 && rectangle3.Height > 0)
    {
        TextFormatFlags cellStyleAlignment = DataGridViewUtilities.ComputeTextFormatFlagsForCellStyleAlignment(this.DataGridView.RightToLeftInternal, cellStyle.Alignment, cellStyle.WrapMode);
        if (paint)
        {
            if (DataGridViewCell.PaintContentForeground(paintParts))
            {
                if ((cellStyleAlignment & TextFormatFlags.SingleLine) != TextFormatFlags.Default)
                    cellStyleAlignment |= TextFormatFlags.EndEllipsis;
                TextRenderer.DrawText((IDeviceContext) graphics, text, cellStyle.Font, rectangle3, flag3 ? cellStyle.SelectionForeColor : cellStyle.ForeColor, cellStyleAlignment);
            }
        }
        else
            rectangle1 = DataGridViewUtilities.GetTextBounds(rectangle3, text, cellStyleAlignment, cellStyle);
    }
}
Sorry for the book!
TL;DR USE A SMALL ASS FONT IF YOU WANT TO PACK CHARACTERS INTO HUGE CELLS.
I'm using a PrivateFontCollection to load a font via AddMemoryFont. I retrieve the FontFamily, and then I query it using IsStyleAvailable to determine which styles the font supports. However, with myriad fonts every single call to IsStyleAvailable returns true.
PrivateFontCollection pfc = new PrivateFontCollection();
var fontBuffer = Marshal.AllocCoTaskMem(dta.Length);
Marshal.Copy(dta, 0, fontBuffer, dta.Length);
pfc.AddMemoryFont(fontBuffer, dta.Length);
System.Drawing.FontFamily fam = pfc.Families[0];
if (fam.IsStyleAvailable(d.FontStyle.Bold)) //do something
Does anyone know how to get the actual style information from the FontFamily? If you look at the C:\Windows\Fonts folder you can see the supported styles. For example: Agency FB supports Bold; Regular, but when I query it in this fashion I get styles for Underline, Strikeout, and Italic, as well as Bold and Regular.
Is there a better way to go about this?
The font engine in Windows knows how to synthesize a style from the unstyled base font. It isn't particularly difficult to do on paper: just make the stems fatter to get bold, tilt them to get italic, draw a line to get underline or strike-out. It isn't exactly as pretty as the dedicated outlines that a good designer will create, but it certainly gets the job done. So when you ask "can you do that?" you'll get a resounding "sure thing!"
Since you explicitly added the TTF files, you already know what styles are directly supported without synthesis and should not need to ask. Finding out anyway is perhaps possible with pinvoke and/or digging through the TTF tables but it is going to be ugly and certainly not directly supported by .NET. There's no winapi function I know of that tells you directly.
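As an illustration of the "digging through the TTF tables" route, the sketch below reads the fsSelection field of the OS/2 table directly from the font bytes (such as the dta array in the question). In the OpenType specification, bit 0 of fsSelection marks italic and bit 5 marks bold. The FontStyleBits class and its method names are hypothetical, and the code assumes a single-font TrueType/OpenType file; font collections (.ttc) are not handled.

```csharp
using System;

public static class FontStyleBits
{
    // Font files store multi-byte values in big-endian order.
    private static int U16(byte[] d, int o) { return (d[o] << 8) | d[o + 1]; }
    private static long U32(byte[] d, int o)
    {
        return ((long)d[o] << 24) | ((long)d[o + 1] << 16) | ((long)d[o + 2] << 8) | d[o + 3];
    }

    // Returns the OS/2 fsSelection field, or -1 if no usable OS/2 table is found.
    public static int ReadFsSelection(byte[] font)
    {
        if (font == null || font.Length < 12) return -1;
        int numTables = U16(font, 4);
        for (int i = 0; i < numTables; i++)
        {
            int rec = 12 + 16 * i; // 16-byte table records follow the 12-byte header
            if (rec + 16 > font.Length) return -1;
            bool isOs2 = font[rec] == 'O' && font[rec + 1] == 'S'
                      && font[rec + 2] == '/' && font[rec + 3] == '2';
            if (!isOs2) continue;
            long offset = U32(font, rec + 8);
            if (offset + 64 > font.Length) return -1;
            return U16(font, (int)offset + 62); // fsSelection sits at byte 62 of OS/2
        }
        return -1;
    }

    public static bool IsItalic(int fsSelection)
    {
        return fsSelection >= 0 && (fsSelection & 0x01) != 0;
    }

    public static bool IsBold(int fsSelection)
    {
        return fsSelection >= 0 && (fsSelection & 0x20) != 0;
    }
}
```

Calling FontStyleBits.ReadFsSelection(dta) before AddMemoryFont tells you which style that particular file was designed as, independent of what GDI+ is willing to synthesize.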
I'm wondering if there are any simple ways to get a list of all fixed-width (monospaced) fonts installed on a user's system in C#?
I'm using .net 3.5 so have access to the WPF System.Windows.Media namespace and LINQ to get font information, but I'm not sure what I'm looking for.
I want to be able to provide a filtered list of monospaced fonts and/or pick out monospaced fonts from a larger list of fonts (as seen in the VS options dialog).
Have a look at:
http://www.pinvoke.net/default.aspx/Structures/LOGFONT.html
Use one of the structures in there, then loop over families, instantiating a Font, and getting the LogFont value and checking lfPitchAndFamily.
The following code is written on the fly and untested, but something like the following should work:
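foreach (FontFamily ff in System.Drawing.FontFamily.Families)
{
    if (ff.IsStyleAvailable(FontStyle.Regular))
    {
        using (Font font = new Font(ff, 10))
        {
            // LOGFONT must be declared as a class for ToLogFont to fill it
            LOGFONT lf = new LOGFONT();
            font.ToLogFont(lf);
            // The two lowest bits hold the pitch; FIXED_PITCH is 1
            if ((lf.lfPitchAndFamily & 0x03) == 1)
            {
                // do stuff here...
            }
        }
    }
}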
Unfortunately, the ToLogFont function does not fill the lfPitchAndFamily field with correct values. In my case it's always 0.
One approximation to detect which fonts might be fixed-width is the following:
foreach ( FontFamily ff in FontFamily.Families ) {
    if ( ff.IsStyleAvailable( FontStyle.Regular ) ) {
        float diff;
        using ( Font font = new Font( ff, 16 ) ) {
            diff = TextRenderer.MeasureText( "WWW", font ).Width - TextRenderer.MeasureText( "...", font ).Width;
        }
        if ( Math.Abs( diff ) < float.Epsilon * 2 ) {
            Debug.WriteLine( ff.ToString() );
        }
    }
}
Keep in mind that there are several false positives, for example Wingdings.
AFAIK you can't do it using BCL libraries only. You have to use WinAPI interop.
You need to analyze the two lowest bits of the LOGFONT.lfPitchAndFamily member. There is a constant FIXED_PITCH (meaning the font is fixed-width) that can be compared against those bits of lfPitchAndFamily.
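The bit test itself is small. A sketch, assuming the lfPitchAndFamily byte has already been obtained through interop (for example from a font enumeration callback); the PitchInfo class is hypothetical, while the constant values are the ones defined in wingdi.h:

```csharp
using System;

public static class PitchInfo
{
    // Pitch values from wingdi.h.
    public const byte DEFAULT_PITCH = 0;
    public const byte FIXED_PITCH = 1;
    public const byte VARIABLE_PITCH = 2;

    // The two lowest bits of lfPitchAndFamily hold the pitch;
    // bits 4-7 hold the font family (FF_MODERN, FF_SWISS, ...).
    public static bool IsFixedPitch(byte lfPitchAndFamily)
    {
        return (lfPitchAndFamily & 0x03) == FIXED_PITCH;
    }
}
```

For example, a value of 0x31 (FF_MODERN combined with FIXED_PITCH) is reported as fixed-width, while 0x22 (FF_SWISS combined with VARIABLE_PITCH) is not.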
Here is a useful article:
Enumerating Fonts
Enumerating fonts can be a little confusing, and unless you want to enumerate all fonts on your system, can be a little more difficult than MSDN suggests. This article will explain exactly the steps you need to use to find every fixed-width font on your system, and also enumerate every possible size for each individual font.