Arabic special characters Unicode \u064b \u064d \u0647 rendered incorrectly when saving as PDF - c#

Arabic special characters with the Unicode code points
\u064b (U+064B ARABIC FATHATAN)
\u064d (U+064D ARABIC KASRATAN)
\u0647 (U+0647 ARABIC LETTER HEH)
are rendered incorrectly when trying to save as PDF.
Attached is a sample VS2017 project with a sample Word file and the code used for the conversion, omitting only the licensing part.
Also see the support ticket on the Aspose free forums:
https://forum.aspose.com/t/arabic-special-characters-unicode-u064b-u064d-u0647-rendered-incorrectly-save-as-pdf/193389?u=mohamed.atia88
Thanks in advance,
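For reference, a minimal sketch of the kind of conversion code the sample project contains, assuming Aspose.Words is the library in use (the file names below are placeholders, not taken from the attached sample):

using Aspose.Words;

class Converter
{
    static void Main()
    {
        // License setup omitted, as in the attached sample.
        var doc = new Document("sample.docx");

        // Save as PDF; the Arabic combining marks above are
        // expected to survive this conversion.
        doc.Save("sample.pdf", SaveFormat.Pdf);
    }
}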

Related

MigraDoc RTF Document cannot translate special characters

I created an application where we want to create .rtf and .pdf documents.
The documents also contain characters like ä, ü, ö, and ß, and we have the big issue that those special characters are not shown correctly in the RTF document.
For creating the RTF document, we are using MigraDoc and the RtfDocumentRenderer.
The PDF is created correctly... And for the RTF document, we have already tried a few things:
setting the UTF encoding before calling the renderer
changing the culture info
creating the document as a byte array and encoding that byte array with Unicode instead of the default character encoding, but without success
The current version 1.51 of PDFsharp/MigraDoc targets .NET 2.0/.NET 3.5.
The next version (coming soon) targets .NET 6 and properly deals with the change of the default encoding.
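For context, a minimal sketch of the rendering path in question, assuming the standard MigraDoc 1.51 API (the sample text and file name are illustrative):

using System;
using MigraDoc.DocumentObjectModel;
using MigraDoc.RtfRendering;

class RtfExample
{
    static void Main()
    {
        // Build a document containing the problematic characters.
        var document = new Document();
        var section = document.AddSection();
        section.AddParagraph("Grüße: ä ü ö ß");

        // Render to RTF; on newer runtimes the changed default
        // encoding is what version 1.51 does not yet handle.
        var renderer = new RtfDocumentRenderer();
        renderer.Render(document, "output.rtf", Environment.CurrentDirectory);
    }
}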

Create DOCX file with support for RTL (hebrew, arabic) using C# .NET

In the last several weeks I have found that the DOCX (Xceed) component does not support Hebrew very well: if I write something in Hebrew (or Arabic) and need a dot or colon at the end of the line, it appears at the beginning.
I'm still waiting for the developer to answer me, but meanwhile I have looked at all the options out there:
Aspose (way pricey, and seems to only work as HTML to DOCX)
OpenXML (very hard to work with, and I couldn't find the right example for me)
HtmlToOpenXML (doesn't have RTL support, and support on GitHub is slow)
OpenXML PowerTools (only does DOCX to HTML)
and several more.
Please, if anyone out there has a solution that can work with images, tables, and many other basic things,
I need your help.
Thanks for your inquiry. Aspose.Words supports a wide range of document formats; it is not limited to HTML and DOCX. See the documentation to learn which formats are supported:
https://docs.aspose.com/display/wordsnet/Product+Overview
Aspose.Words can be used both for converting documents between various formats and for creating documents from scratch using its rich API or its reporting features. You can also easily format text or a whole paragraph as right-to-left:
https://apireference.aspose.com/net/words/aspose.words/font/properties/bidi
https://apireference.aspose.com/net/words/aspose.words/paragraphformat/properties/bidi
The same option can be applied to a Section or a Table (see the sketch after these links):
https://apireference.aspose.com/net/words/aspose.words/pagesetup/properties/bidi
https://apireference.aspose.com/net/words/aspose.words.tables/table/properties/bidi
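A minimal sketch of those Bidi properties in use, assuming the standard DocumentBuilder API (the sample text and output file name are illustrative):

using Aspose.Words;

class RtlExample
{
    static void Main()
    {
        var doc = new Document();
        var builder = new DocumentBuilder(doc);

        // Mark the paragraph and the text run as right-to-left,
        // so trailing punctuation renders on the correct side.
        builder.ParagraphFormat.Bidi = true;
        builder.Font.Bidi = true;
        builder.Writeln("שלום עולם.");

        // Section-level RTL goes through the page setup.
        builder.PageSetup.Bidi = true;

        doc.Save("rtl.docx");
    }
}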
If you have further questions, please do not hesitate to ask here or in the Aspose.Words support forum.

Some special characters have rendering problem as a pdf file in LocalReport

I am using Microsoft.Reporting.WinForms.dll to render my RDLC as a PDF file, but when I open the PDF file with Adobe Reader I have a problem with some special Turkish characters. In the PDF they look normal, but when I try to use Ctrl+F to search for some words in the PDF file, I can't find them, even though my PDF file includes these Turkish characters. Also, when I copy and paste these words out of the file, I get characters like 􀃹􀃻􀁏􀁈􀁐􀀃􀀮. It is interesting because I also use the same DLL to render my RDLC as an Excel file, with the same class, the same code, and the same method, and I don't have this problem in the Excel file.
I use the byte[] Render(string format) method in WinForms.dll for rendering. Maybe some special characters' codes are out of range for the byte array, and that could be why not every character is rendered in the PDF format, but I am not sure about this.
Thanks...
According to a Microsoft article, there is an issue with special characters that was fixed in SQL Server 2014; the corresponding ReportViewer DLL would be the 2015 runtime.
Maybe you should upgrade.
I had a similar problem. My application generates PDFs using LocalReport.
My solution was:
1. Modify the RDLC XML schema to use the 2016 version. Replace what you have with this:
<Report xmlns="http://schemas.microsoft.com/sqlserver/reporting/2016/01/reportdefinition" xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner">
Then adjust the rest of the schema; it is no longer the same as in previous versions (the DataSources move up, ...).
2. Remove <EmbedFonts>None</EmbedFonts> from DeviceInfo.
With these changes, the special characters were drawn and printed correctly.
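For reference, a hedged sketch of the rendering call this change affects, assuming the standard LocalReport.Render overload (the DeviceInfo below simply omits the EmbedFonts element, as suggested above; the class and method names are illustrative):

using System.IO;
using Microsoft.Reporting.WinForms;

class ReportExporter
{
    public static void ExportPdf(LocalReport report, string path)
    {
        // DeviceInfo without <EmbedFonts>None</EmbedFonts>, leaving the
        // renderer free to embed the fonts the report actually uses.
        const string deviceInfo =
            "<DeviceInfo><OutputFormat>PDF</OutputFormat></DeviceInfo>";

        string mimeType, encoding, extension;
        string[] streams;
        Warning[] warnings;

        byte[] bytes = report.Render(
            "PDF", deviceInfo,
            out mimeType, out encoding, out extension,
            out streams, out warnings);

        File.WriteAllBytes(path, bytes);
    }
}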

Can Encoding.Default recognize utf8 characters? Should I really not use it?

Well, when using IO.File.ReadAllText(path) or ReadAllText(path, System.Text.Encoding.UTF8) to read a text file that is saved in ANSI encoding, non-Latin characters aren't displayed correctly.
So I decided to use Encoding.Default. It worked just fine, but I see recommendations against using it everywhere (like here and here) because it "will only guarantee that all UTF-7 character sets will be read correctly". Microsoft also says:
Gets an encoding for the operating system's current ANSI code page.
However, it seems to me that it can recognize a file with any encoding. I tested that on a file that contains Chinese, Japanese, and Arabic characters (the file is saved in UTF-8 encoding), and I was able to display the file correctly.
Code used:
Dim loadedText As String = IO.File.ReadAllText(path, System.Text.Encoding.Default)
MessageBox.Show(loadedText, "utf8")
Output: (screenshot showing the message box with the text displayed correctly)
So, my question in points:
Is there something I'm missing here?
Why is it not recommended to use Encoding.Default when reading a file?
I know that a file with ANSI encoding would be displayed incorrectly if the default system encoding/system locale is changed, which is something I don't care about in my current case. But...
Is there even another way to prevent this from happening?
Side note: please don't mind me using the c# tag. Although my code is in VB, any answer with C# code is welcome.
File.ReadAllText actually tries to auto-detect the encoding. Only if the encoding cannot be determined from a byte order mark (BOM) is the encoding argument used to decode the file:
This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected.
If you used Encoding.UTF8 to write the file, then it includes a BOM, and your Encoding.Default is likely being ignored.
Using Encoding.Default is not recommended because it is the operating system's current ANSI code page, which is limited to that code page's character set. In other words, a text file created in Notepad (ANSI encoding) on Czech Windows will be displayed incorrectly on English Windows. For this reason, everything should be saved and opened in UTF-8 encoding:
Saved in ANSI and opened in Unicode may not work.
Saved in Unicode and opened in ANSI will not work.
Saved in ANSI and opened in another ANSI may not work.
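A small self-contained sketch of the BOM-detection behavior described above (the temp file and sample text are illustrative):

using System;
using System.IO;
using System.Text;

class BomDemo
{
    static void Main()
    {
        string path = Path.GetTempFileName();

        // Encoding.UTF8 emits a BOM (EF BB BF) at the start of the file.
        File.WriteAllText(path, "中文 日本語 عربي", Encoding.UTF8);

        // ReadAllText detects the BOM and decodes as UTF-8, so the
        // Encoding.Default argument is effectively ignored here.
        string text = File.ReadAllText(path, Encoding.Default);
        Console.WriteLine(text);  // prints the original characters
    }
}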

Special characters in SharpPDF

I'm using the sharpPDF DLL (http://sharppdf.sourceforge.net) to create PDFs in C#. Everything works great, but I don't get any special characters (actually these are Polish letters such as ą, ć, ł, Ó, ...) in my output. I'm saving strings in that PDF.
Is there any way to get that working?
Thanks.
Unfortunately, SharpPDF has a lot of issues with special characters, and no further development is planned to correct the special-characters problem.
Sorry.
