C#: perform text rendering on a string

First, please look at the Text Rendering article.
Text rendering is the process of converting a string to a format that is readable to the user.
It's 2020 and yet Unity doesn't support RTL complex-script languages, so I'm looking to render the text myself into its final form. Apparently text rendering happens at a higher level than typing. Let me show an example:
U+0633 => س
U+0633 U+0633 => سس (raw string)
You can see there are two of the same character, but if you type them in Windows they render in different shapes. The actual rendered characters are:
U+FEB3 U+FEB2 => ﺳﺲ (rendered string)
As you can see, both character codes differ from what was typed (if you look closely, they are actually different characters), yet the result on screen looks the same. So somewhere along the way there is an API that renders the text properly. According to Microsoft's Text Rendering article, text is normally stored in raw format and rendered when displayed.
Question: is there a simple way in C# (.NET Framework) to convert a raw string into a rendered string?
Example: is there a function to convert "U+0633 U+0633" to "U+FEB3 U+FEB2"? It's probably similar to what Uniscribe or DirectWrite do. I need to convert what I type into what I see!
P.S.:
1- I'm aware of some Unity assets that do this, but they re-implement what has been available in every OS for many years, and they are neither complete nor sound. I'd like to do it with the OS renderer, which is complete and sound.
2- I guess that if it were this easy, the Unity developers would already use it. So if there is no straightforward way, please simply say there is no such thing.
Thanks!
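
To make the raw-versus-rendered distinction concrete, here is a minimal sketch of contextual shaping for the single letter used in the example. It is not the OS renderer: the class name, the one-entry table and the joining rule (every letter in the table joins on both sides) are assumptions for illustration. Real shaping engines also handle non-joining letters, ligatures and marks, and they typically hand back glyphs rather than presentation-form code points.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ArabicShaperSketch
{
    // Presentation forms per raw letter, in the order: isolated, final, initial, medial.
    // Only SEEN (U+0633) is listed; a real table covers the whole script.
    private static readonly Dictionary<char, char[]> Forms = new Dictionary<char, char[]>
    {
        { '\u0633', new[] { '\uFEB1', '\uFEB2', '\uFEB3', '\uFEB4' } },
    };

    public static string Shape(string raw)
    {
        var sb = new StringBuilder(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            char[] forms;
            if (!Forms.TryGetValue(raw[i], out forms))
            {
                sb.Append(raw[i]); // unknown characters pass through unchanged
                continue;
            }
            // Simplifying assumption: a letter joins to any neighbour that is in the table.
            bool joinsPrev = i > 0 && Forms.ContainsKey(raw[i - 1]);
            bool joinsNext = i + 1 < raw.Length && Forms.ContainsKey(raw[i + 1]);
            int index = joinsPrev ? (joinsNext ? 3 : 1) : (joinsNext ? 2 : 0);
            sb.Append(forms[index]);
        }
        return sb.ToString();
    }
}
```

With this table, ArabicShaperSketch.Shape("\u0633\u0633") yields U+FEB3 followed by U+FEB2, matching the rendered string in the example.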

For RTL languages you can use RTL Text Mesh Pro.
With this plugin, you can show Persian, Arabic or any other RTL language in the editor, or pass the text in directly from a script.

I've created a free package for the editor. Maybe it can help.
You can flip the text to RTL and then copy and paste it wherever you like, or you can call the function SwitchRTL("the Text to flip"); from the runtime script. To do it from your own script, just include the runtime script so you can call it there.
Link below:
https://assetstore.unity.com/packages/tools/utilities/right2left-232988

Related

Does C# remove the characters &#10;&#8226; from a string?

I have this C# code used to populate a label on the screen of a phone. Note that it's not HTML source being used here.
c1Label.Text = "To select cards for your deck you can one of a number of options&#10;&#8226;";
and this XAML
<local:JustifiedLabel x:Name="c1Label" Text="To select cards for your deck you can one of a number of options&#10;&#8226;" />
The former shows &#10; as part of the text, but the XAML version works fine and shows this as a line feed followed by a bullet.
This is to be expected. The two languages (C# and XML) have different rules, especially regarding which characters are "special" and how they have to be escaped when you want to use them anyway. In the C# string
"&#10;&#8226;"
the contents are exactly those characters as written, since they have no special meaning to the C# compiler. In XML they are numeric character references, an escape mechanism for including arbitrary characters.
Conversely, in C# the following
"\n \u2022"
represents a line feed and a bullet. But in XML it's just the exact characters as written.
You can construct endless such examples with almost any two different languages. Yes, this means you cannot just copy text from one language and expect it to represent the same string in another language. If you're transforming one language into another, it's easy to handle programmatically; when you're copying things around manually, you just have to live with this and adapt accordingly.
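
A small sketch of the difference, assuming the references in question are &#10; (line feed) and &#8226; (bullet). The class name is made up for illustration, and WebUtility.HtmlDecode stands in for the decoding that the XAML parser performs on its own; it is one way to do this, not the only one.

```csharp
using System;
using System.Net;

// Hypothetical demo class, for illustration only.
public static class EscapeDemo
{
    // C# escape sequences: the compiler turns these into a line feed and a bullet.
    public static readonly string FromCSharpEscapes = "\n\u2022";

    // Numeric character references are ordinary characters to the C# compiler,
    // so this string really contains '&', '#', '1', '0', ';' and so on.
    public static readonly string RawReferences = "&#10;&#8226;";

    // Decoding has to be requested explicitly when the text does not pass through an XML parser.
    public static string Decode(string text)
    {
        return WebUtility.HtmlDecode(text);
    }
}
```

Here EscapeDemo.Decode(EscapeDemo.RawReferences) gives the same two-character string as FromCSharpEscapes, while RawReferences itself stays twelve characters long.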

Plain text search in markdown text

I am trying to write code (in C#) that can search for any plain-text word or phrase in a markdown file. Currently I'm doing this by a long-winded method: convert the markdown to HTML, strip HTML element tags out of the HTML text and then use a simple regular expression to search that for the word/phrase in question. Needless to say, this can be pretty slow.
A concrete example might show the problem. Say the markdown file contains
Something ***significant***
I would like to be able to find that by providing the search phrase something significant (i.e. ignoring the ***'s).
Is there an efficient way of doing this (i.e. that avoids the conversion to HTML) and doesn't involve me writing my own markdown parser?
Edit:
I want a generic way to search for any text or phrase in markdown text that contains any valid markdown formatting. The first answers were ways to match the specific text example I gave.
Edit:
I should have made it clear: this is required for a simple user-facing search, and the markdown files could contain any valid markdown formatting. For this reason I need to be able to ignore anything in the markdown that the user wouldn't see as text if they converted the markdown to HTML. E.g. the markdown text that specifies an image (like ![Valid XHTML](http://w3.org/Icons/valid-xhtml10)) should be skipped during the search. Converting to HTML produces decent results for the user because it reflects reasonably accurately what the user sees (but it's a slow solution, especially when there's a lot of markdown text to look through).
Use a regexp:
// requires: using System.Text.RegularExpressions;
var str = "Something ***significant***";
var regexp = new Regex("Something.+significant.+");
Console.WriteLine(regexp.Match(str).Success);
I want to do the same thing, and I can think of one way to achieve it.
Your method has two steps:
1. Get the plain text out of the markdown source (which itself has two steps: markdown -> HTML, and HTML -> stripped to plain text)
2. Search within the plain text
Now, if the markdown source is persisted in a data store, then you may be able to also persist the plain text for search purposes only. So the step to extract the plain text from the markdown may be executed only once when persisting the markdown source (or every time the markdown source is updated), but the code that actually searches in the markdown could be executed immediately on the already persisted plain text data as many times as you want.
For example, if you have a relational DB with a column like markdown_text, you could also create a plain_text column and recreate its value every time the markdown_text column is changed.
Users won't mind if saving their markdown takes a few milliseconds (or even seconds) longer than before. Users tend to feel safe when something that alters the system's state takes some time (they feel that something is actually happening in the system) rather than happening immediately (they feel that something went wrong and their command did not execute). But they will be frustrated if searching takes more than a few ms to complete. In general, users want queries to complete immediately but accept commands taking some time (not more than a few seconds though).
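
The idea can be sketched as follows. This is an in-memory stand-in for the plain_text column, not a database implementation: the class and method names are hypothetical, and the markdown-to-plain-text converter is passed in as a delegate so that the existing (slow) HTML-based conversion can be reused unchanged.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keeps a plain-text copy next to each markdown document.
public sealed class PlainTextSearchIndex
{
    private readonly Func<string, string> toPlainText;
    private readonly Dictionary<string, string> plainTextById = new Dictionary<string, string>();

    public PlainTextSearchIndex(Func<string, string> toPlainText)
    {
        this.toPlainText = toPlainText;
    }

    // Command path: the slow conversion runs once, whenever the markdown is saved.
    public void Save(string id, string markdown)
    {
        plainTextById[id] = toPlainText(markdown);
    }

    // Query path: only the stored plain text is searched, so no conversion happens here.
    public List<string> Search(string phrase)
    {
        var hits = new List<string>();
        foreach (var entry in plainTextById)
        {
            if (entry.Value.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hits.Add(entry.Key);
            }
        }
        return hits;
    }
}
```

For example, after index.Save("doc1", "Something ***significant***") with a converter that strips the markdown, index.Search("something significant") returns the id "doc1".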
Try this:
string input = "Something ***significant***";
string v = input.Replace("***", "");
Console.WriteLine(v);

Font conversion Kurtidev to mangal

I have a Microsoft Access file in which data is stored in the Kurtidev font. I have to convert it to the Mangal font. Is there any (free) API available to do so? If not, please suggest ways to do it programmatically.
If you are talking about Kruti Dev, these fonts use Latin/English code points (the U+0000 range) to encode Devanagari glyph shapes. Mangal, on the other hand, correctly uses code points from the Devanagari range (U+0900). In other words, Devanagari 'Ka' (क), U+0915 in Mangal, is instead assigned to U+0064 in Kruti Dev. U+0064 is supposed to be Latin 'd'.
To get your database contents to display correctly with Mangal, you'll need to transform the text from "Devanagari-as-Latin" code points into real Devanagari code points. That will require some sort of translation table.
A search for 'kruti dev to unicode converter' turns up a number of tools, some free, some paid. There's even an online version. I was unable to find source code for any of these tools, which would likely contain the translation table that you need, but you may be able to find something. Or you might be able to derive your own table by running a full complement of Kruti-encoded text through a converter and examining the output.
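
A minimal sketch of what applying such a translation table could look like. The only entry shown is the one mentioned above (U+0064 to U+0915); the rest of the table is not reproduced here and would have to come from one of those converters. The class name is hypothetical, and a complete converter very likely needs more than a one-to-one lookup (for example multi-character sequences and reordering), which this sketch does not attempt.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class KrutiDevSketch
{
    // Legacy code point -> Devanagari code point.
    // Only the single mapping given in the answer is filled in; the real table is much larger.
    private static readonly Dictionary<char, char> Table = new Dictionary<char, char>
    {
        { '\u0064', '\u0915' }, // Latin 'd' slot in Kruti Dev -> DEVANAGARI LETTER KA
    };

    public static string ToUnicode(string legacy)
    {
        var sb = new StringBuilder(legacy.Length);
        foreach (char c in legacy)
        {
            char mapped;
            // Characters missing from the table are copied through unchanged.
            sb.Append(Table.TryGetValue(c, out mapped) ? mapped : c);
        }
        return sb.ToString();
    }
}
```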

simple spell checking tool in C#

What I'm trying to achieve is an input field where you can type how you think a word is spelled. It will then search my text file, named words.txt, find words with a similar spelling, and show the results in a new window.
Thanks in advance.
This is the one I have used, and it sounds like exactly what you want:
Make similar suggestions for input text by remembering old inputs
You can see it in action in the screen capture video here
P.S. In one instance I pre-populated a dictionary.dic file to suit, and in the above example I added some other rules around LogParser's SQL-like syntax to provide IntelliSense. HTH
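
For the words.txt lookup described in the question, one common technique (not necessarily what the linked tool does) is to rank candidate words by Levenshtein edit distance. The sketch below assumes one word per line in the file; the class name, method names and the maxDistance parameter are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class SpellSuggest
{
    // Levenshtein distance: minimum number of single-character
    // insertions, deletions and substitutions turning a into b.
    public static int Distance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.Length];
    }

    // Returns the words within maxDistance of the input, closest first (case-insensitive).
    public static List<string> Suggest(string input, IEnumerable<string> words, int maxDistance)
    {
        string needle = input.ToLowerInvariant();
        return words
            .Select(w => new { Word = w, D = Distance(needle, w.ToLowerInvariant()) })
            .Where(x => x.D <= maxDistance)
            .OrderBy(x => x.D)
            .Select(x => x.Word)
            .ToList();
    }
}
```

The word list could be supplied with System.IO.File.ReadLines("words.txt"), and the returned list bound to whatever control the results window uses.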

Best way to extract only the bold text from a PDF

iTextSharp is a great tool, I can use
PdfTextExtractor.GetTextFromPage(reader, iPage) + " ";
and it works great, but is there a way to extract only the bold text (e.g. the headlines) from the PDF, and not everything?
Any solution is useful, regardless of the programming language. Thank you.
From within iText, you need to use the classes from the com.itextpdf.text.pdf.parser package.
Specifically, you'll need to use a PdfTextExtractor with a custom TextExtractionStrategy that checks the font name. Bold fonts USUALLY have the word "bold" in their name.
Potential Issues:
1) Not everything that looks like text is rendered with fonts and letters. It can be paths or a bitmap. The only way to extract such text is with OCR, and there's no way to get font info.
2) Font Encoding. The bytes that map to the glyphs you're seeing in the PDF may not have a map from those bytes to actual character information.
3) Not all bold-looking text is made with a bold font. Some bold text is made by stroking the text outline with a fairly thin line as well as the usual filling. In this case, the text render mode will be set to "stroke & fill" instead of the usual "fill". This is pretty rare, but it does happen from time to time.
An easy way to test for problems 1 and 2 is to attempt to copy and paste the text within Reader/Acrobat. If you can't select it, it's almost certainly paths or an image. If you can select it but the characters come out as random junk when pasted, then iText will come up with the same junk.
Problem 3 isn't that hard to test for programmatically, though you have to handle it on a case-by-case basis. You need to call TextRenderInfo.getTextRenderMode(). 0 is fill (the standard way of doing things), and 2 is "stroke and fill".
So your TextExtractionStrategy can stub out beginTextBlock, endTextBlock, renderImage, and getResultantText. In your renderText implementation, you'll have to check the font name (for "bold", case insensitive) and the text render mode. If the font name contains "bold" or the render mode is "stroke and fill", the text is part of one of your headings.
All this is supposing that you are dealing with arbitrary PDF files. If all your PDFs come from the same source, you can start cutting corners. I'll leave that as an Exercise For The Reader.
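
The check that renderText would perform can be sketched in isolation. The class and method names below are hypothetical, and the wiring into iText is left out on purpose: the font name and render mode are assumed to be read from the TextRenderInfo passed to renderText and handed to this helper as plain values.

```csharp
using System;

// Hypothetical helper, for illustration only; it does not reference iText itself.
public static class BoldCheck
{
    // Text render mode 2 is "stroke and fill", as described above; 0 is plain fill.
    public const int StrokeAndFill = 2;

    // True if the text run looks bold: either the font name contains "bold"
    // (case insensitive) or the outline is stroked in addition to being filled.
    public static bool LooksBold(string fontName, int textRenderMode)
    {
        bool boldName = fontName != null &&
            fontName.IndexOf("bold", StringComparison.OrdinalIgnoreCase) >= 0;
        return boldName || textRenderMode == StrokeAndFill;
    }
}
```

A strategy would append a run's text to its result only when LooksBold returns true for that run, and return the accumulated text from getResultantText.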
One of your best bets for this job surely is TET by pdflib.com with its ability to extract to the TETML format. Available for Windows, Mac OS X, Linux, Solaris, AIX, HP-UX...
I'm not sure if it does indeed recognize "headlines" as such (because PDF does not know much of structural markups, only visual ones) -- but it surely can tell you exact position and font used by each string of characters.
