I have a Microsoft Access file in which the data is stored in the Kurtidev font. I have to convert it to the Mangal font. Is there any free API available to do so? If not, please suggest ways to do it programmatically.
If you are talking about Kruti Dev, these fonts use Latin code points (the U+0000 block) to encode Devanagari glyph shapes. Mangal, on the other hand, correctly uses code points from the Devanagari block (U+0900). In other words, Devanagari 'Ka' (क), which is U+0915 in Mangal, is instead assigned to U+0064 in Kruti Dev, the code point that is supposed to be Latin 'd'.
To get your database contents to display correctly with Mangal, you'll need to transform the text from "Devanagari-as-Latin" code points into real Devanagari code points. That will require some sort of translation table.
A search for 'kruti dev to unicode converter' turns up a number of tools, some free, some paid; there's even an online version. I was unable to find source code for any of these tools (which would likely contain the translation table you need), but you may have better luck. Alternatively, you might derive your own table by running a full complement of Kruti-encoded text through a converter and examining the output.
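A minimal sketch of what a table-driven conversion could look like, assuming you have already built or obtained the translation table. The class name `KrutiDevConverter` and its tiny table are hypothetical: only the `d` to U+0915 (क) entry comes from the answer above, while the `k` entry and the handling of `f` as a short-i sign typed before its consonant are assumptions to verify against a real converter. Legacy fonts store text in visual order, so some signs have to be moved after their consonant when converting to Unicode's logical order. You would run each text field of the Access table through `convert` and write the result back.

```java
import java.util.HashMap;
import java.util.Map;

class KrutiDevConverter {
    // HYPOTHETICAL, deliberately tiny table; a real one needs hundreds of entries.
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("d", "\u0915"); // DEVANAGARI LETTER KA (from the answer)
        TABLE.put("k", "\u093E"); // DEVANAGARI VOWEL SIGN AA (assumed, verify)
    }

    // ASSUMPTION: "f" encodes the short-i sign, typed BEFORE the consonant
    // (visual order), while Unicode stores U+093F AFTER the consonant.
    private static final char PRE_BASE_I = 'f';
    private static final String VOWEL_SIGN_I = "\u093F";

    static String convert(String kruti) {
        StringBuilder out = new StringBuilder();
        boolean pendingI = false;
        int i = 0;
        while (i < kruti.length()) {
            if (kruti.charAt(i) == PRE_BASE_I) {
                pendingI = true; // emit it after the next converted unit
                i++;
                continue;
            }
            // Longest match wins, so multi-character sequences beat single ones.
            String best = null;
            for (String key : TABLE.keySet()) {
                if (kruti.startsWith(key, i) && (best == null || key.length() > best.length())) {
                    best = key;
                }
            }
            if (best == null) {
                out.append(kruti.charAt(i)); // unmapped: pass through unchanged
                i++;
            } else {
                out.append(TABLE.get(best));
                i += best.length();
            }
            if (pendingI) {
                out.append(VOWEL_SIGN_I); // reorder into logical position
                pendingI = false;
            }
        }
        if (pendingI) {
            out.append(VOWEL_SIGN_I); // dangling sign at end of input
        }
        return out.toString();
    }
}
```

For example, `convert("fd")` yields क followed by U+093F (कि) rather than the sign before the letter.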
First, please look at the Text Rendering article.
Text rendering is the process of converting a string to a format that is readable to the user.
It's 2020, and Unity still doesn't support complex RTL scripts, so I'm looking to render text into its final form myself. Apparently text rendering happens at a higher level than text input. Let me show an example:
U+0633 => س
U+0633 U+0633 => سس (raw string)
You can see there are two of the same character, but if you type them in Windows they render as different shapes. The actual rendered characters are:
U+FEB3 U+FEB2 => ﺳﺲ (rendered string)
As you can see, these two character codes both differ from what was typed (if you look closely, they are actually different characters), yet what you see is the same. So somewhere along the way there is an API that renders the text properly. According to Microsoft's Text Rendering article, text is normally stored in raw form and rendered only when displayed.
Question: is there a simple way in C# (.Net Framework) to convert raw string into rendered string?
Example: is there a function to convert "U+0633 U+0633" to "U+FEB3 U+FEB2"? It's probably similar to what Uniscribe or DirectWrite do. I need to convert what I type into what I see!
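To make the transformation concrete, here is a toy sketch of contextual shaping for a single letter, SEEN (U+0633), whose isolated, final, initial and medial presentation forms are U+FEB1 to U+FEB4. It reproduces the example above (two SEENs become U+FEB3 U+FEB2). The class `SeenShapingSketch` is hypothetical and is not an OS API: real shaping needs joining types for every letter (some letters never join to the following one), ligatures such as lam-alef, and bidi reordering, which is why relying on the platform shaper is attractive.

```java
class SeenShapingSketch {
    private static final char SEEN = '\u0633';
    // Presentation forms of SEEN, indexed as {isolated, final, initial, medial}.
    private static final char[] SEEN_FORMS = {'\uFEB1', '\uFEB2', '\uFEB3', '\uFEB4'};

    static String shape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != SEEN) {
                out.append(c); // this toy only knows one letter
                continue;
            }
            // SEEN is dual-joining; here only neighbouring SEENs count as joining partners.
            boolean joinsPrev = i > 0 && raw.charAt(i - 1) == SEEN;
            boolean joinsNext = i + 1 < raw.length() && raw.charAt(i + 1) == SEEN;
            int form = joinsPrev ? (joinsNext ? 3 : 1) : (joinsNext ? 2 : 0);
            out.append(SEEN_FORMS[form]);
        }
        return out.toString();
    }
}
```

With three SEENs in a row, the middle one gets the medial form: U+FEB3 U+FEB4 U+FEB2.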
P.S.:
1- I'm aware of some Unity assets that do this, but they reimplement what has been available in every OS for years, and they are not complete or sound. I'd like to use the OS renderer, which is complete and sound.
2- I guess that if it were this easy, the Unity developers would already use it, so if there is no straightforward way, please simply say there is no such thing.
Thanks!
For RTL languages, you can use RTL Text Mesh Pro.
With this plugin, you can show Persian, Arabic, or any other RTL language in the editor, or pass the text in directly from a script.
I've created a free package for the editor. Maybe it can help.
You can flip the text to RTL and then copy and paste it wherever you like. If you want to do it from code, include the runtime script in your project and call the function SwitchRTL("the Text to flip"); from your own script.
Link below:
https://assetstore.unity.com/packages/tools/utilities/right2left-232988
Jianpu notes are something like this:
So I want to make an application where the user can specify the notes and the output is the sound of those notes.
My problem is that I don't know how to display notes like the ones above in a RichTextBox.
There are some fonts out there, but you will have to test their quality.
Here is one for Jianpu notes (more details here), but it may not work without problems.
Here is a solution for Erhu Players & Jianpu Readers
And creating a set of notes with a free font maker is also an option.
And finally you might do it all in .Net, including all the painting, but try the fonts first!
There is U+0307 (combining dot above) (looks like 1̇ ) or U+0358 (combining dot above right) (looks like 1͘ ), but in my opinion they don't work very well for your task. I think U+0301 (combining acute accent) (looks like 1́ ) is better, although not very accurate.
For the bottom part, U+0316 (combining grave accent below) (looks like 1̖ ) is not very nice. You can try U+0323 (combining dot below) (looks like 1̣).
You add the combining characters after the normal character, and you can combine several of them (like 1̣́). Note that the results may vary between fonts. In my experience, the fonts with the best Unicode support are Arial and Times New Roman. I usually open Word, go to Insert > Symbol, and try what looks best.
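A minimal sketch of building such strings, assuming the usual Jianpu convention of digits 1 to 7 (0 for a rest) with dots above for a higher octave and dots below for a lower one. The class `JianpuNote` and its method are hypothetical; it uses U+0307 and U+0323 from the answer, and you could swap in U+0301 for the top mark if it looks better in your font. The resulting string can be assigned directly to a RichTextBox's text; whether stacked marks render cleanly depends on the font.

```java
class JianpuNote {
    static final char DOT_ABOVE = '\u0307'; // COMBINING DOT ABOVE (U+0301 is an alternative)
    static final char DOT_BELOW = '\u0323'; // COMBINING DOT BELOW

    // degree: 0 (rest) to 7; octaveShift > 0 adds dots above, < 0 adds dots below.
    static String note(int degree, int octaveShift) {
        if (degree < 0 || degree > 7) {
            throw new IllegalArgumentException("Jianpu degrees run from 0 to 7");
        }
        StringBuilder sb = new StringBuilder();
        sb.append((char) ('0' + degree)); // the base character comes first
        char mark = octaveShift > 0 ? DOT_ABOVE : DOT_BELOW;
        for (int k = 0; k < Math.abs(octaveShift); k++) {
            sb.append(mark); // combining marks always follow their base character
        }
        return sb.toString();
    }
}
```

For example, `note(5, -2)` is the digit 5 followed by two U+0323 marks.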
For the best results, I recommend looking for a specialized font that has all the tones built in, or creating such a font yourself. CorelDRAW (in version 6) was able to create fonts; I guess newer versions still can.
I'm testing an SDK that extracts text from a searchable PDF. One of the SDK's dependencies was recently updated, and it's causing an existing test on Hebrew text to fail. I don't know Hebrew, nor enough about how the involved technologies represent right-to-left languages.
The NUnit test asserts that the extracted text matches the C# string "מנבוצץז ".
string hebrewText = reader.ReadToEnd();
Assert.AreEqual("מנבוצץז ", hebrewText);
The rasterized PDF has what I believe are the same characters, but in the opposite order.
The unit test fails with this message:
Expected: "מנבוצץז "
But was: " זץצובנמ"
Although the actual result more closely matches what I see in the rasterized PDF, I'm not completely sure the original test is wrong.
Are Hebrew characters in a C# string supposed to be read right-to-left like printed Hebrew text?
Does any part of the .NET stack tamper with the direction of Hebrew strings?
What about NUnit?
Are Hebrew characters embedded in a searchable PDF normally supposed to go in the same direction as the rasterized text?
Anything else I should know before deciding whether to "fix" this unit test?
There are various ways to encode RTL languages. The most common way (and Windows' default) is to use logical ordering, which means the first letter is encoded as the first character in a string (or file). So whether the first letter appears visually on the left or the right side of the screen doesn't affect the order in which the letters are stored.
Now, as for the text appearing in Visual Studio, it depends on the version. As far as I remember, prior to Visual Studio 2010 the code editor displayed Hebrew backwards; this was apparent when you tried to select Hebrew text, as the selection reversed in an odd, visually confusing way. The issue no longer appears to exist in Visual Studio 2010 (at least with SP1, which I just tested).
Let's take a Hebrew word for which the direction is more clear to non-Hebrew speakers than the string specified in your text:
יון
The word happens to be the Hebrew word for an ion, and on your screen, it should appear as three letters where the tallest letter is on the left and the shortest is on the right. In a .NET string, the expression "יון".Substring(0, 1) will produce the short letter, since it's the first letter in the string. The string can also be written as "\u05D9\u05D5\u05DF" where the leftmost Unicode character \u05D9 represents the short letter displayed on the right, which clearly demonstrates the order in which the letters are stored.
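The same logical-order check can be sketched in Java, whose strings are also sequences of UTF-16 code units. The class `LogicalOrderDemo` is illustrative; `java.text.Bidi` is the JDK's implementation of the Unicode bidirectional algorithm, used here only to show that the direction is inferred from the characters themselves rather than stored with the string.

```java
import java.text.Bidi;

class LogicalOrderDemo {
    // The word for "ion" from the answer: YOD, VAV, FINAL NUN, in storage order.
    static final String ION = "\u05D9\u05D5\u05DF";

    static char firstStored(String s) {
        return s.charAt(0); // index 0 is the first letter read, i.e. the rightmost on screen
    }

    static boolean isAllRtl(String s) {
        // Base direction is taken from the first strong character.
        return new Bidi(s, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isRightToLeft();
    }

    public static void main(String[] args) {
        System.out.printf("U+%04X%n", (int) firstStored(ION)); // prints U+05D9
        System.out.println(isAllRtl(ION)); // prints true
    }
}
```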
Since the string in your test case is nonsensical, I can't tell you whether it was a wrong test all along or a correct test that should pass. If the image you uploaded has been rendered correctly, then the actual result of your test appears to be correct and the expected value incorrect, so you should fix the test.
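Before editing the expected value, it may help to confirm that the two strings in the failure message are exact reverses of each other, which is what the "opposite order" observation suggests. A small sketch (the class `ReversalCheck` is hypothetical; the strings are the ones from the NUnit message, written as escapes):

```java
class ReversalCheck {
    // StringBuilder.reverse() keeps surrogate pairs intact while reversing.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    static boolean isReverseOf(String expected, String actual) {
        return reverse(expected).equals(actual);
    }

    public static void main(String[] args) {
        String expected = "\u05DE\u05E0\u05D1\u05D5\u05E6\u05E5\u05D6 "; // "מנבוצץז "
        String actual = " \u05D6\u05E5\u05E6\u05D5\u05D1\u05E0\u05DE";   // " זץצובנמ"
        System.out.println(isReverseOf(expected, actual)); // prints true
    }
}
```

If this holds, the old and new dependency versions differ only in whether they emit logical or visual order, and the decision comes down to which order the rest of your pipeline expects.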
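I believe that all strings in C# are stored internally in the same (logical) order regardless of script; the direction isn't stored with the string but inferred from the characters' Unicode bidirectional properties, and invisible marks such as U+200F (RIGHT-TO-LEFT MARK) can be inserted to influence it.
More than likely, but only at display time: RTL GUIs and rendered text, for example, need certain properties (specifically RightToLeft and RightToLeftLayout) to be set in order to display correctly.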
NUnit shouldn't. Nor should it care. IMHO a reversed string != the original string.
I couldn't comment. I'd assume that they should be whatever the test is expecting though, assuming it was passing at first.
Don't do half measures with RTL, it really doesn't like it. Either have full RTL support, or nothing. It can be pretty nasty, I wish you the best of luck!
iTextSharp is a great tool. I can use
PdfTextExtractor.GetTextFromPage(reader, iPage) + " ";
and it works great, but is there a way to extract only the bold text (e.g. the headlines) from the pdf, and not everything?
Any solution is useful, regardless of the programming language. Thank you
From within iText, you need to use the classes from the com.itextpdf.text.pdf.parser package.
Specifically, you'll need to use a PdfTextExtractor with a custom TextExtractionStrategy that checks the font name. Bold fonts USUALLY have the word "bold" in their name.
Potential Issues:
1) Not everything that looks like text is rendered with fonts and letters. It can be paths or a bitmap. The only way to extract such text is with OCR, and there's no way to get font info.
2) Font encoding. The bytes that select the glyphs you see in the PDF may have no mapping to actual character information.
3) Not all bold-looking text is made with a bold font. Some bold text is made by stroking the text outline with a fairly thin line as well as the usual filling. In this case, the text render mode will be set to "stroke & fill" instead of the usual "fill". This is pretty rare, but it does happen from time to time.
An easy way to test for problems 1 and 2 is to attempt to copy and paste the text within Reader/Acrobat. If you can't select it, it's almost certainly paths or an image. If you can select it but the characters come out as random junk when pasted, then iText will come up with the same junk.
Problem 3 isn't that hard to test for programmatically, though you have to handle it on a case-by-case basis. You need to call TextRenderInfo.getTextRenderMode(): 0 is fill (the standard way of doing things), and 2 is "stroke and fill".
So your TextExtractionStrategy can stub out beginTextBlock, endTextBlock, renderImage, and getResultantText. In your renderText implementation, you'll have to check the font name (for "bold", case-insensitive) and the text render mode. If either indicates bold, the text is part of one of your headings.
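The decision rule above can be sketched without the iText dependency, so it can be tested on its own. The class `BoldTextFilter` is hypothetical: in iText 5, its `renderText` body would sit inside your strategy's `renderText(TextRenderInfo)`, fed with `renderInfo.getText()`, `renderInfo.getFont().getPostscriptFontName()` and `renderInfo.getTextRenderMode()` (check that your iText version exposes these), and the strategy would be passed to `PdfTextExtractor.getTextFromPage(reader, page, strategy)`.

```java
import java.util.Locale;

class BoldTextFilter {
    static final int RENDER_MODE_FILL = 0;        // the usual mode
    static final int RENDER_MODE_FILL_STROKE = 2; // "stroke and fill" fake bold

    private final StringBuilder result = new StringBuilder();

    // Heuristic from the answer: "bold" in the font name, or fill+stroke rendering.
    static boolean looksBold(String fontName, int renderMode) {
        boolean boldName = fontName != null
                && fontName.toLowerCase(Locale.ROOT).contains("bold");
        return boldName || renderMode == RENDER_MODE_FILL_STROKE;
    }

    // Keep only the runs that pass the heuristic.
    void renderText(String text, String fontName, int renderMode) {
        if (looksBold(fontName, renderMode)) {
            result.append(text);
        }
    }

    String getResultantText() {
        return result.toString();
    }
}
```

Subset fonts often carry a six-letter prefix such as `ABCDEE+Arial-BoldMT`, which is why the check is a substring test rather than an exact name match.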
All this is supposing that you are dealing with arbitrary PDF files. If all your PDFs come from the same source, you can start cutting corners. I'll leave that as an Exercise For The Reader.
One of your best bets for this job surely is TET by pdflib.com with its ability to extract to the TETML format. Available for Windows, Mac OS X, Linux, Solaris, AIX, HP-UX...
I'm not sure if it does indeed recognize "headlines" as such (because PDF does not know much of structural markups, only visual ones) -- but it surely can tell you exact position and font used by each string of characters.
Please tell me how I can show symbols like lambda or mu using C#.NET in a desktop application. What I think is that we may do it using ASCII values and Convert.ToChar(). If I am right, then please give me a link to a page where I can get the ASCII values of all such scientific symbols.
Please give me a link to any URL that contains a list of such ASCII numbers.
Open the Windows character map (charmap.exe), select a Unicode font (Arial should suffice) and copy the symbols into your source code or resources. It's just characters. Of course, you can also switch to Greek keyboard layout, so you can write the characters directly rather than going the charmap route.
Note that you need to use a Unicode font for the labels. You can use charmap to look up which font has Greek characters.
Please tell me how I can show symbols like lambda or mu using C#.NET in a desktop application.
You don't have to do anything special. Just use whatever letters you want in either the IDE or in strings in the program. C# treats Greek letters the same as any other letters; they are not special.
What I think is that we may do it using ASCII values and Convert.ToChar().
Hold on, I have a phone call. Oh, it's for you. It's 1968 calling, and they want their character set back. :-)
ASCII proper only has 95 printable characters, and Greek letters are not among them. ASCII was invented for teletypes back in the 1960s; we don't use it anymore. Characters in modern programming environments are represented using Unicode, which provides uniform support for tens of thousands of characters in dozens of alphabets.
If I am right, then please give me a link to a page where I can get the ASCII values of all such scientific symbols.
You can get a list of all the Unicode characters at unicode.org. But like I said, you don't need to. You can just embed the character you want directly in the text. There's no need to resort to clumsy tricks like Unicode escapes. (Unless, of course, you are planning on sending your source code to your coworkers using a 1970s-era teletype machine.)
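To see that these letters live outside ASCII and to look up their official Unicode names without leaving code, a short sketch (in Java, whose `char` is a UTF-16 code unit just like C#'s; the class `GreekLetters` is illustrative). Note that Unicode spells the letter's name "LAMDA", and that mu has a look-alike, U+00B5 MICRO SIGN, which is a different character.

```java
class GreekLetters {
    static final char LAMBDA = '\u03BB';
    static final char MU = '\u03BC';

    // ASCII only covers code points 0 to 127.
    static boolean isAscii(char c) {
        return c < 128;
    }

    public static void main(String[] args) {
        for (char c : new char[] {LAMBDA, MU}) {
            // Prints e.g. "U+03BB GREEK SMALL LETTER LAMDA ascii=false"
            System.out.printf("U+%04X %s ascii=%b%n", (int) c, Character.getName(c), isAscii(c));
        }
    }
}
```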
C# applications are all Unicode - so there should be no problem assigning Unicode strings to the controls' text, for example:
textBox1.Text = "this is a lambda symbol - λ";
Try this
char c = '\u03BB'; // lambda; use '\u03BC' for mu
System.Console.WriteLine(c.ToString());
does it work for you?