Does Rider or C# support all emojis? [closed] - c#

I'm building a console game and I want to print some emojis such as 👾 or ⚔️, but Rider doesn't render them correctly.
These emojis render correctly: 👾
string heroName = $"👾{hero.Name}👾";
while these don't: ⚔️☠️
string itemName = $"⚔️{item.Name}⚔️";
My question is: is there a problem with the way I declare the strings, or is it a Rider encoding setting that I need to change?

Disclaimer:
I believe I have some relevant input on your problem.
I'm not a Unicode expert and don't understand it deeply; I'm only sharing observations based on what I could verify.
You haven't defined what "rendering correctly" and "rendering incorrectly" mean, so it's a bit hard to give you a specific answer.
I assume the problem is that the emojis are printed in the console as ? characters.
The title of your question is:
Does Rider or C# support all emojis?
The problem of rendering emojis in the Windows Console has nothing to do with Rider or C#.
Partial solution
To render Unicode emojis correctly you need to:
Set encoding for the output and print emojis:
using System;
using System.Text;
Console.OutputEncoding = Encoding.Unicode;
Console.WriteLine("👾⚔☠"); // Without variation selectors
Console.WriteLine("👾⚔️☠️"); // From the original question
Use Windows Terminal to run your app:
Emojis fail to render in CMD (when it is not hosted in Windows Terminal) because the legacy console uses a GDI rendering engine that currently doesn't support font fallback.
Additional fragmented observations below:
It is unclear where you got your emojis from.
The first emoji that you provided is OK.
Copy it: 👾
Go to Full Emoji List, v15.0
Press Ctrl + F, paste it
It's found as U+1F47E – Alien monster
The second emoji is different; you can't find it in the Full Emoji List, v15.0.
However, you can find a Crossed swords one. It looks exactly like yours on that page.
But the emoji from that list looks different in my browser on the current Stack Overflow page: ⚔
On my Windows machine it looks like a colored picture on the "Full Emoji List, v15.0" page only because of the Segoe UI Emoji font.
If we convert the emojis from your question to UTF-8 bytes, we get:
👾 - f0 9f 91 be
⚔️ - e2 9a 94 ef b8 8f
☠️ - e2 98 a0 ef b8 8f
If you google ef b8 8f, you'll find a mysterious Variation Selector-16 (U+FE0F) on the UTF-8 code page (#66).
So, if you remove the extra Variation Selector-16 bytes (ef b8 8f) from e2 98 a0 ef b8 8f, you end up with the bytes e2 98 a0, which decode back to the Unicode character ☠. It looks more simplistic in my browser.
This is related to surrogate pairs and variation selectors (Microsoft Learn).
I believe these will work in more cases than the original ones:
👾
⚔
☠
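The byte comparison and the removal of Variation Selector-16 described above can be sketched in C#. This is only an illustration: the class name EmojiBytes and its two helper methods are made up for this example, and stripping U+FE0F is a workaround for terminals that mishandle it, not a general recommendation.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class, not part of any library.
static class EmojiBytes
{
    // U+FE0F, Variation Selector-16, asks for emoji-style presentation.
    private const char VariationSelector16 = '\uFE0F';

    // Returns the UTF-8 bytes of a string as lowercase hex pairs separated by spaces.
    public static string ToUtf8Hex(string text) =>
        string.Join(" ", Encoding.UTF8.GetBytes(text).Select(b => b.ToString("x2")));

    // Returns a copy of the string with every Variation Selector-16 removed.
    public static string StripVariationSelector16(string text) =>
        new string(text.Where(c => c != VariationSelector16).ToArray());
}
```

For instance, EmojiBytes.ToUtf8Hex("\u2694\uFE0F") gives e2 9a 94 ef b8 8f, and after StripVariationSelector16 the same call gives e2 9a 94, which matches the bytes listed above. Note that "\U0001F47E" (👾) has a Length of 2 in C#, because it is stored as a surrogate pair.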
In Visual Studio 2022 on Windows 10 they look like this:
But when I run the app, they look like this in the console:
I assume this is the result you describe as "not rendered correctly".
By default, console output encoding is set to OSEncoding:
So, you need to change it to Encoding.Unicode. But it still won't work in CMD.
If you run it in Windows Terminal, it renders text differently, therefore, it will work.
But it will work differently depending on the font being used.
Additional references:
Windows Command-Line: Inside the Windows Console
Unicode in Windows CMD
StackOverflow answer from Peter Duniho for "How to show emoji in c# console output?"
Everything You Need To Know About Emoji 🍭

Related

How can I set halftone (sethalftone) to each separation color with device tiffsep1 and other separation ones?

The code works, but enabling the commented-out lines causes an error. The error is not solved by changing -sDEVICE to tiffgray, for example.
String[] ARGS = new String[] {
"",
"-sDEVICE=tiffsep1",
"-r1200",
"-o out.tiff",
"SOSample.pdf",
//"-c",
//"<< /HalftoneType 1 /Frequency 300 /Angle 45 /SpotFunction {180 mul cos exch 180 mul cos add 2 div} >> sethalftone",
//"-f"
};
How can I use sethalftone with Ghostscript, and how can I set it for each color of tiffsep1? What am I doing wrong with a single color, and how do I make it work for separations?
I'm using:
[DllImport("gsdll64.dll", EntryPoint = "gsapi_init_with_args")]
public static extern int INSTANCEStart(IntPtr instance, int argc, string[] argv);
and so on.
I'm working with Ghostscript 9.52.
Something that could help (\"):
"-c",
"\"<</Orientation 1>> setpagedevice\"",
You need to use the sethalftone PostScript operator in order to change the halftone. Obviously this will involve writing some PostScript.
Not only that, but you really need to set the default halftone, or set the halftone at the start of the page, because the current PDF interpreter in Ghostscript does an initgraphics at the start of every page of a PDF file.
For all of this you are going to need a copy of the PostScript Language Reference Manual, which you can get from somewhere on the Adobe web site. They keep moving stuff around so I'm not going to try and post a link, just google for the name of the manual. You want the third edition.
So you need to write a BeginPage procedure, which you will find covered in Chapter 6 under device control, pages 427 onwards.
The BeginPage procedure will need to set a halftone, and you will find halftones covered in Section 7.4, page 480 onwards. You will presumably want to use either a type 2 or type 4 halftone dictionary.
When you've assembled that, you then need to pass it to Ghostscript before you process the PDF file. The simplest method is to put the PostScript program in a file (called eg setup.ps) and then put that filename on the command line immediately before the PDF filename.
Eg:
gs -r1200 -sDEVICE=tiffsep1 -o out%d.tif setup.ps sample.pdf
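For the DLL call used in the question, the same command line could be expressed as an argument array along these lines. This is a sketch only: GhostscriptArgs and Build are hypothetical names, the contents of setup.ps are not shown here, and the only point is that the PostScript file comes immediately before the PDF file.

```csharp
using System;

// Hypothetical helper, not part of the Ghostscript API.
static class GhostscriptArgs
{
    // Builds an argv array equivalent to:
    //   gs -r1200 -sDEVICE=tiffsep1 -o out%d.tif <setupFile> <pdfFile>
    public static string[] Build(string setupFile, string pdfFile)
    {
        return new[]
        {
            "",                   // argv[0] is ignored by Ghostscript
            "-r1200",
            "-sDEVICE=tiffsep1",
            "-o", "out%d.tif",    // switch and value as separate entries, as a shell would pass them
            setupFile,            // PostScript setup, processed before the PDF
            pdfFile
        };
    }
}
```

The resulting array and its Length would then be passed as argv and argc to the gsapi_init_with_args import shown in the question.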
Note that PDF files can contain a halftone specification themselves (this is deprecated in PDF 2.0) and Ghostscript will honour any halftone in a PDF file.
Finally; this is an unusual request and, given that you are writing code to link to the Ghostscript DLL, makes me think you may be using Ghostscript commercially. You should review the AGPL to ensure you are complying with the terms of the license. If you plan on distributing your application you will almost certainly need a commercial license.

itext reading pdf 1s as up arrows ERROR

For some reason iTextSharp is now reading a PDF that contains numbers such as 4123 as 4*23, where the * is actually an arrow pointing up. I'm not sure why this is happening. Please help.
Thanks.
Sample file is located here: https://dl.dropboxusercontent.com/u/116833/SAMPLE%20PDF.pdf
The reason for the arrows is that the file actually tries to mislead text extractors that follow the guidelines of Section 9.10.2, Mapping Character Codes to Unicode Values, of the PDF specification ISO 32000-1, while not confusing those that prefer ActualText marked-content sequence entries. The former method is led to believe the '3's are arrows, while the latter is told the '3's are threes.
Most likely this is done to prevent automated text extraction while allowing manual copy&paste because Adobe Reader does prefer the ActualText marked-content sequence entries (thus, manual extraction works all right) while many programmatic extractors prefer the former method.
As far as I read the relevant sections of the specification, it prefers neither way over the other.
Details
E.g. look at the first part number:
BT
/T1_1 1 Tf
10 0 0 10 69.1456 750.2834 Tm
(1 )Tj
ET
EMC
/Span <</MCID 14 >>BDC
BT
/T1_1 1 Tf
10 0 0 10 89.5488 750.2834 Tm
(2)Tj
/Span<</ActualText<FEFF0033>>> BDC
(3)Tj
EMC
(412109 )Tj
ET
EMC
As you see, the '3' is marked with an ActualText entry indicating that it is indeed a three (<FEFF0033> is the UTF-16BE encoding, with byte order mark, of the Unicode digit three).
The font T1_1, on the other hand, offers a ToUnicode stream containing the mapping
...
<30> <0030>
<31> <0031>
<32> <0032>
<33> <0018>
<34> <0034>
<35> <0035>
...
As you see, while the other digits (0x30 is '0', 0x31 is '1', ... , 0x39 is '9') are mapped identically, the '3', i.e. 0x33, is mapped to the Unicode code point 0x0018, and
U+0018 is the Unicode hex value of the character <control>, which is categorized as "control character" in the Unicode 6.0 character table.
"<control>" was previously named "CANCEL" in older versions of Unicode.
(cf. http://www.marathon-studios.com/unicode/U0018/Control)
In some contexts this control character is displayed as an upwards arrow.
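To make the two competing sources of text concrete, here is a small C# sketch that does not use iTextSharp at all. PdfTextDemo and its members are hypothetical names; the dictionary simply restates the ToUnicode entries quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical demo class, independent of any PDF library.
static class PdfTextDemo
{
    // Decodes a PDF hex string such as "FEFF0033" as UTF-16BE,
    // skipping a leading byte order mark (FE FF) if present.
    public static string DecodeUtf16BeHex(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);

        int offset = bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF ? 2 : 0;
        return Encoding.BigEndianUnicode.GetString(bytes, offset, bytes.Length - offset);
    }

    // The ToUnicode entries quoted above, keyed by character code.
    public static readonly Dictionary<byte, char> ToUnicode = new Dictionary<byte, char>
    {
        { 0x30, '\u0030' }, { 0x31, '\u0031' }, { 0x32, '\u0032' },
        { 0x33, '\u0018' }, // the misleading entry: '3' maps to a control character
        { 0x34, '\u0034' }, { 0x35, '\u0035' },
    };
}
```

DecodeUtf16BeHex("FEFF0033") returns "3", which is what an ActualText-aware extractor sees, while ToUnicode[0x33] is U+0018, which is what an extractor relying on the ToUnicode map sees.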

unidentified char preventing me from parsing coordinates into an int

OK, I'll try to explain the problem, although it's going to be a bit hard.
I'm trying to parse some information from a page that contains coordinates.
Copying and pasting from it gives you something like this:
Distance Position
5.8 ‎‭(‭‭77‬‬|‭-‭2‬‬)‬‎
6.3 ‎‭(‭‭76‬‬|‭-‭1‬‬)‬‎
7.8 ‎‭(‭‭76‬‬|‭‭6‬‬)‬‎
9.2 ‎‭(‭‭91‬‬|‭‭3‬‬)‬‎
9.5 ‎‭(‭‭79‬‬|‭‭10‬‬)‬‎
12.2 ‎‭(‭‭80‬‬|‭‭13‬‬)‬‎
15 ‎‭(‭‭82‬‬|‭-‭14‬‬)‬‎
15 ‎‭(‭‭81‬‬|‭‭16‬‬)‬‎
The problem is that between the "(" and the number there is an unidentified character: if you press the right arrow key once, the cursor doesn't move, but if you press it a few times, it does.
I haven't encountered this anywhere before. The website is written in PHP, if that helps.
Also, when I paste the text here, the character disappears and I can move freely through the text.
Please help me with this: my software tries to parse the coordinates into an int, and because of that character it throws a format exception.
While viewing in UTF-8, I see nothing, while changing the encoding to ANSI, I am left with:
5.8 ‎‭(‭‭77‬‬|‭-‭2‬‬)‬‎
6.3 ‎‭(‭‭76‬‬|‭-‭1‬‬)‬‎
7.8 ‎‭(‭‭76‬‬|‭‭6‬‬)‬‎
9.2 ‎‭(‭‭91‬‬|‭‭3‬‬)‬‎
9.5 ‎‭(‭‭79‬‬|‭‭10‬‬)‬‎
12.2 ‎‭(‭‭80‬‬|‭‭13‬‬)‬‎
15 ‎‭(‭‭82‬‬|‭-‭14‬‬)‬‎
15 ‎‭(‭‭81‬‬|‭‭16‬‬)‬‎
The text seems to contain the left-to-right mark (U+200E), and the encoding was swapped once or twice.
Since the text comes from a website, my first guess would be that your browser settings are not correct (wrong encoding set).
Either way, you can try cleaning the text.
Code:
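// Verbatim string, so the regex escapes are not read as C# escape sequences.
// '|' is allowed because it separates the two coordinates.
Regex rgx = new Regex(@"[^a-zA-Z0-9_\n %\[\]\.\(\)&|-]");
data = rgx.Replace(data, "");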
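An alternative to whitelisting characters is to drop only the invisible ones and then parse the pair. The sketch below assumes the stray characters are Unicode format characters (category Cf), which covers the left-to-right mark and the bidirectional controls; CoordinateParser and its methods are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for strings such as "(77|-2)" polluted with invisible characters.
static class CoordinateParser
{
    // Drops characters in Unicode category Cf (format), which includes
    // U+200E (left-to-right mark) and the bidi controls U+202A..U+202E.
    public static string StripFormatChars(string text) =>
        new string(text.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.Format).ToArray());

    // Cleans the text and parses the first "(x|y)" pair; returns false if none is found.
    public static bool TryParse(string text, out int x, out int y)
    {
        x = 0;
        y = 0;
        Match m = Regex.Match(StripFormatChars(text), @"\((-?[0-9]+)\|(-?[0-9]+)\)");
        return m.Success
            && int.TryParse(m.Groups[1].Value, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture, out x)
            && int.TryParse(m.Groups[2].Value, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture, out y);
    }
}
```

Using int.TryParse instead of int.Parse means a row that still cannot be read yields false rather than a FormatException.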

Using Font to create barcode - is this correct? Am I missing something?

I've seen a lot of threads and posts about .NET and generating barcodes. Many people talk about libraries, DLLs and "out of the box" applications. I ask myself: do I really need all of this? I'm currently able to create a barcode without any extra library; it works fine with a font. I just need a barcode font, say a free one like 3 of 9. I can install it (actually I don't even have to install it; having the path to the file is enough), and then I do something like this:
Font f = new Font("Free 3 of 9", 80);
this.Font = f;
Label l = new Label();
l.Text = "*STACKOVERFLOW*";
l.Size = new System.Drawing.Size(800, 600);
this.Controls.Add(l);
this.Size = new Size(800, 600);
And I can display the barcode. And you know what? My phone is able to read it. That's how easy it is. I could save it as a JPG, I could paste it into an XML file, and so on. If I want another kind of barcode, I just need the new font, change 3 of 9 to something else, and that's it.
So here is my question: what am I missing? Everybody says "use something that's already done", "use libraries" and so on. What problems could I run into if I continue without extra libraries? Any suggestions? Thank you.
3 of 9 is just one way to encode barcodes. Usually when we think of barcodes we are thinking of the UPC standards, but for most barcode readers, 3 of 9 works just fine. If it works for your application, don't waste your time with a third-party library and just use the 3 of 9 font.
One thing to consider is that the 3 of 9 standard does not contain a check digit, which some standards use to detect read errors. The Wikipedia page on the encoding has some details on this, but it sounds like it's not really a big deal because of the way the characters are encoded.
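One thing a font cannot do is validate the input. A minimal sketch of that step, assuming the basic Code 39 character set and the asterisk start/stop convention used in the question's "*STACKOVERFLOW*" example (Code39Text and Wrap are hypothetical names, and a particular font may accept more or fewer characters):

```csharp
using System;

// Hypothetical helper that prepares text for a 3 of 9 (Code 39) font.
static class Code39Text
{
    // Characters of the basic Code 39 set; '*' is reserved as the start/stop character.
    private const string Allowed = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%";

    // Upper-cases the input, validates every character, and wraps it in start/stop asterisks.
    public static string Wrap(string data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        string upper = data.ToUpperInvariant();
        foreach (char c in upper)
        {
            if (Allowed.IndexOf(c) < 0)
                throw new ArgumentException($"Character '{c}' cannot be encoded in Code 39.", nameof(data));
        }
        return "*" + upper + "*";
    }
}
```

With this, the label text from the question would be set as l.Text = Code39Text.Wrap("stackoverflow"), and input containing an asterisk or other unsupported character is rejected before it reaches the label.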

decrypt an encrypted value?

I have an old Paradox database (I can convert it to Access 2007) which contains more than 200,000 records. This database has two columns: the first one is named "Word" and the second one is named "Mean". It is a dictionary database and my client wants to convert this old database to ASP.NET and SQL.
However, we don't know what key or method is used to encrypt or encode the "Mean" column which is in the Unicode format. The software itself has been written in Delphi 7 and we don't have the source code. My client only knows the credentials for logging in to database. The problem is decoding the Mean column.
What I do have is the compiled windows application and the Paradox database. This software can decode the "Mean" column for each "Word" so the method and/or key is in its own compiled code(.exe) or one of the files in its directory.
For example, we know that in the following row "Zymurgy" means exactly "مبحث عمل تخمیر در شیمی علمی, تخمیر شناسی", since the application translates it like that. Here is what the record looks like when I open the database in Access:
Word Mean
Zymurgy 5OBnGguKPdDAd7L2lnvd9Lnf1mdd2zDBQRxngsCuirK5h91sVmy0kpRcue/+ql9ORmP99Mn/QZ4=
Therefore we're trying to discover how the value in the Mean column is converted to "مبحث عمل تخمیر در شیمی علمی, تخمیر شناسی". I think the "Mean" column value in above row is encoded in Base64 string format, but decoding the Base64 string does not yet result in the expected text.
The extensions for files in the win app directory are dll, CCC, DAT, exe (other than the main app file), SYS, FAM, MB, PX, TV, VAL.
Any kind of help is appreciated.
Here are two more examples. Remember that the double quotes at the start and end are not part of the strings:
word: "abdominal"
coded value: "vwtj0bmj7jdF9SS8sbrIalBoKMDvTbpraFgG4gP/G9GLx5iU/E98rQ=="
translation in Farsi: "شکمی, بطنی, وریدهای شکمی, ماهیان بطنی"
word: "cart"
coded value: "KHoCkDsIndb6OKjxVxsh+Ti+iA/ZqP9sz28e4/cQzMyLI+ToPbiLOaECWQ8XKXTz"
translation in Farsi: "ارابه, گاری, دوچرخه, چرخ, با گاری بردن"
Here is the result of decoding the Base64 value for "Zymurgy" in different encodings:
1- in unicode the result is: "ᩧ訋퀽矀箖�柖�섰᱁艧껀늊螹泝汖銴岔꫾也捆￉鹁"
2- in utf32 the result is: "��������������"
3- in utf7 the result is: "äàg\v=ÐÀw²ö{Ýô¹ßÖg]Û0ÁAgÀ®²¹ÝlVl´\\¹ïþª_NFcýôÉÿA"
4- in utf8 the result is: "��g\v�=��w���{����g]�0�Ag��������lVl���\\����_NFc����A�"
5- in 1256 the result is: "نàg\vٹ=ذہw²ِ–{فô¹كضg]غ0ءAg‚ہ®ٹ²¹‡فlVl´’”\\¹ï‏ھ_NFc‎ôةےA"
I have also discovered that the Paradox database system is very complex when it comes to key management; most of the time the keys are "compound keys", which is why it is problematic and why it was abandoned.
UPDATE: I'm trying to automate this with AutoIt v3, because as I understand it the decryption can't be worked out in a day or two. Now I have another problem, related to text and fonts. When I copy the translated text into Notepad, it turns into unrecognizable text unless I change Notepad's font to the font used by the translation software. If I type something in Farsi in Notepad, it displays correctly regardless of the font I've chosen. More interesting still, when I copy the text into another program such as MS Office Word, it displays correctly no matter which font I choose.
So how can I get around this?
In this situation, I would think about writing a script/program to simply pull all the data out through the existing program.
You could write an application to send keypresses to the app which would select and copy each value in turn.
It would take a while to run, but you could just leave it overnight (how big is your database?) and it only has to run once.
Not sure how easy this would be, since I haven't seen this app of course - might this work?
Take a debugger like OllyDbg or SoftICE. Find the place where the "Mean" value is decoded or encoded, then step through the instructions one by one and check all registers to find out what is done. I have done so numerous times. That should help you get started, since you have the application that is able to decode this data. You also have a reference word. That's all you need.
Also take into consideration that Unicode (UTF-16) can be little or big endian, so you might try swapping the bytes. UTF-8 can be a pain, since some characters are stored as one byte and others as two or more bytes.
You can also try to take words which are almost identical in Farsi and try to compare the outputs. That could lead to a reconstruction of a custom code page, if there is one.
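The byte-swapping suggestion can be tried quickly in C#. This sketch only decodes the Base64 text and interprets the raw bytes under both UTF-16 byte orders; MeanColumnProbe and its methods are hypothetical names, and nothing here decrypts anything.

```csharp
using System;
using System.Text;

// Hypothetical probe for inspecting a "Mean" value; it does not attempt decryption.
static class MeanColumnProbe
{
    // Turns the Base64 text from the "Mean" column into raw bytes.
    public static byte[] Decode(string base64) => Convert.FromBase64String(base64);

    // Interprets the raw bytes as UTF-16 little endian ("Unicode" in .NET terms).
    public static string AsUtf16LittleEndian(byte[] data) => Encoding.Unicode.GetString(data);

    // Interprets the same bytes with the byte order swapped (UTF-16 big endian).
    public static string AsUtf16BigEndian(byte[] data) => Encoding.BigEndianUnicode.GetString(data);
}
```

For the "Zymurgy" value, Decode returns 56 bytes; the values for "abdominal" and "cart" decode to 40 and 48 bytes. Comparing these lengths with the lengths of the known translations is one more data point to use alongside the debugger approach.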
