Unity text URL adds extra characters [duplicate] - c#

WordPress is putting this at the end of my permalink on the live site: %E2%80%8E. Does anyone know why? Thanks, guys!

This happens when you copy the post title from MS Word, WordPad, or a similar editor. The pasted text carries an invisible character with it, much like an end-of-line character.

Step 1) Identify the link, and open the post or page it appears on in the WordPress Dashboard.
Step 2) We need to delete the invisible character causing the issue, so delete the last several characters from the URL, including the quotation mark, so that this
Step 3) Manually retype what was deleted.
Step 4) Click Update then go and check the revised post to confirm the problem is resolved.
https://www.wpkb.com/fix-wordpress-links-%E2%80%8E-end/

These invisible Unicode characters really are there, even though nobody put them there on purpose. You can notice them when moving the cursor across them with the arrow keys. They are usually added by formatting editors like Word. It's crazy, but Edge adds them even to the window title =-O (messing with password managers), and the MS Teams Wiki adds them to code snippets (which are meant for preserving space-indented plain text).
It's complicated to get rid of them, because almost all plain-text editors and browsers (hence all web apps) today support Unicode, and even Ctrl-Shift pasting preserves them. Even if you try to backspace over them, editors usually keep them to preserve the RTL/LTR text orientation for you.
Copy the title into a hex editor, remove the characters there, and copy it back. Or copy just the ASCII part from the address bar (if they are URL-encoded) and clear the title field by selecting all (Ctrl-A).
I use:
PSPad (natively)
Notepad++ (with HEX-Editor plugin)
Common invisible characters:
Code point   UTF-8 hex   Name
U+200B       e2 80 8b    ZERO WIDTH SPACE
U+200E       e2 80 8e    LEFT-TO-RIGHT MARK
U+200F       e2 80 8f    RIGHT-TO-LEFT MARK
https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128
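As a rough illustration of the characters listed above, here is a minimal C# sketch that strips them from a string and shows how one of them ends up percent-encoded in a URL. The class and method names (InvisibleChars, Strip, PercentEncode) are made up for this example and are not part of WordPress or any library; only Uri.EscapeDataString and StringBuilder are real .NET APIs.

```csharp
using System;
using System.Text;

public static class InvisibleChars
{
    // Hypothetical helper: the three code points discussed in this answer.
    private static readonly char[] Marks = { '\u200B', '\u200E', '\u200F' };

    // Returns a copy of the input with those characters removed.
    public static string Strip(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Keep every character that is not one of the invisible marks.
            if (Array.IndexOf(Marks, c) < 0) sb.Append(c);
        }
        return sb.ToString();
    }

    // Shows what a single character turns into inside a URL (UTF-8, percent-encoded).
    public static string PercentEncode(char c)
    {
        return Uri.EscapeDataString(c.ToString());
    }
}
```

InvisibleChars.PercentEncode('\u200E') yields "%E2%80%8E", the same sequence that shows up at the end of the permalink, and InvisibleChars.Strip("my-post\u200E") yields "my-post".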

Yes, if you copied it from some editor.
A simple solution is to copy the content from the editor and paste it into a plain-text editor such as Notepad that is set to a non-Unicode (ANSI) encoding, since that encoding cannot represent the character.
You will then easily notice the offending character (the one behind '%E2%80%8E') in your text.

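These are all non-ASCII characters,
for example äÄçÇéÉêöÖÐþúÚ.
To remove them, use this code:
function remove_non_ascii(str) {
    // Return an empty string for null, undefined, or empty input.
    if (str === null || str === undefined || str === '') {
        return '';
    }
    // Keep only printable ASCII (space through tilde); drop everything else.
    return str.toString().replace(/[^\x20-\x7E]/g, '');
}
console.log(remove_non_ascii('äÄçÇéÉêHello-WorldöÖÐþúÚ')); // Hello-World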

If you use certain characters in your link, WordPress will show %E2%80%8E in their place. For example, if you use a half space (CTRL + Space or CTRL + Shift + 2) in your link, WordPress shows %E2%80%8E. Solution: use only plain text and hyphens (-) in your links.

Related

uGUI Text Field, How to Remove "Replacement Characters" (uFFFD aka �)?

Using the uGUI Text component, I'm getting "replacement characters" aka � and I can't find a way to remove them.
I'm getting a string from the Instagram api which contains unicode characters for both non-alphabet language characters (for Japanese for example) which I need.
However, the unicode characters for the emojis come in as replacement characters aka �.
I don't require the emojis and they can be stripped out however, I can't find a method to do this.
I'm unable to use TextMeshPro because I can't generate a font asset with all the Unicode characters needed to display the various languages (this could be user error, but when I try, the process hangs).
I notice these � characters don't appear in the Inspector or console so there must be a way to ignore or remove them.
I'm setting the string like this
body.text = System.Uri.UnescapeDataString(postData.text);
I've tried a number of things that haven't worked including
body.text = body.text.Replace('\uFFFD','\'');//doesn't work
body.text = Regex.Replace(body.text, @"^[\ufffd]", string.Empty);//doesn't work
I've also tried breaking up the string as a char array. When I try to print to console I get this error when it hits a replacement character:
foreach (char item in postData.text.ToCharArray())
print(item); //Error: UTF-16 to UTF-8 conversion failed because the input string is invalid
Any help with this would be greatly appreciated!
Thank you.
Unity 2018.4.4, c#
Found the answer!
This post provided a solution: How do I remove emoji characters from a string?
body.text = Regex.Replace(body.text, @"\p{Cs}", "");
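For anyone who wants this as a reusable helper, here is a minimal sketch built on the same \p{Cs} idea. The class name EmojiStripper is made up for this example, and also removing U+FFFD is an extra assumption on top of the answer above, not something the original fix needed.

```csharp
using System.Text.RegularExpressions;

public static class EmojiStripper
{
    // \p{Cs} matches UTF-16 surrogate code units (the halves that emoji are
    // made of); \uFFFD is the replacement character itself.
    private static readonly Regex Unwanted = new Regex(@"[\p{Cs}\uFFFD]");

    // Returns the input without surrogates or replacement characters.
    public static string Strip(string input)
    {
        return input == null ? string.Empty : Unwanted.Replace(input, string.Empty);
    }
}
```

With this, EmojiStripper.Strip("Hi \uD83D\uDE00!") gives "Hi !", while Japanese text such as "\u65E5\u672C\u8A9E" passes through unchanged. Note that this removes every character outside the Basic Multilingual Plane, not only emoji, so rare CJK extension characters would be dropped as well.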

Why do I get an CS1056 Unexpected character '' on this code

I'm getting this unexpected character '' error and I don't understand why.
var list = new List<MyModel>();
list.Add(new MyModel() {
variable1 = 942,
variable2 = 2001,
variable3 = "my text",
variable4 = 123
​}); // CS1056 Unexpected character '' on this line
Judging by the error message, and by the error code I got from an online compiler after copy/pasting your code, this line contains a character that is not visible but that the compiler is trying to interpret. Try erasing every character from your closing bracket back to the number 3, then press Enter again. It should then work (it did for me).
I just deleted the file Version=v4.0.AssemblyAttributes.cs(1,1,1,1) located in my temp folder C:\Users\MyUser\AppData\Local\Temp and then it works perfectly.
For .NET Core you have to delete .NETCoreApp,Version=v2.1.AssemblyAttributes.cs
As mentioned by Daneau in the accepted answer, the problem is caused by a character that is not visible in the IDE.
Here are several ways to find the invisible character with Notepad++.
Solution 1: Show Symbol
Copy the code to Notepad++.
Select View -> Show Symbol -> Show All Characters.
This can show invisible control characters.
Solution 2: Convert to ANSI
Copy the code to Notepad++.
Select Encoding -> Convert to ANSI.
This will convert the invisible character to ? if it is a non-ANSI character.
Solution 3: Remove non-ASCII characters
Copy the code to Notepad++.
Open the Find window (Ctrl+F).
Select the Replace tab.
In "Find what" write: [^\x00-\x7F]
Leave "Replace with" empty.
In "Search Mode" select "Regular expression".
Find and remove the non-ASCII characters.
This will remove non-ASCII characters.
Note: This can also remove valid non-ASCII characters (in strings and comments), so skip those matches if you have any.
Tip: Use the HEX-Editor plugin
Use the Notepad++ HEX-Editor plugin to see the binary code of the text. Any character outside the range 0x00 - 0x7F (0 - 127) is a non-ASCII character and a suspect for being the problem.
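If you would rather do the same hunt in code, here is a minimal sketch that scans source text and reports every suspect character with its position. SuspectCharFinder is a made-up name for this example, and treating ASCII control characters other than tab, CR, and LF as suspects is my own assumption about what counts as suspicious.

```csharp
using System.Collections.Generic;

public static class SuspectCharFinder
{
    // Lists every character that is outside 0x00-0x7F, or that is a control
    // character other than tab, CR, or LF, formatted as "line:column U+XXXX".
    public static List<string> Find(string source)
    {
        var hits = new List<string>();
        int line = 1, column = 1;
        foreach (char c in source)
        {
            bool nonAscii = c > '\u007F';
            bool oddControl = char.IsControl(c) && c != '\t' && c != '\r' && c != '\n';
            if (nonAscii || oddControl)
            {
                hits.Add($"{line}:{column} U+{(int)c:X4}");
            }
            // Track the position so the report can point at the exact spot.
            if (c == '\n') { line++; column = 1; }
            else { column++; }
        }
        return hits;
    }
}
```

For the text "a = 1;\n\u200B});" this reports a single hit, "2:1 U+200B", which is a zero width space sitting in front of the closing bracket. Like Solution 3, it also flags legitimate non-ASCII characters in strings and comments.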
Just reporting my direct experience.
As Daneau wrote, I had a character (ASCII DLE, which I had copied while messing with a Zebra printer) hiding in the text. I could not afford to rewrite everything, so I used the Notepad++ "View->Show Symbol->Show All Characters" feature.
I apologize for not commenting on Daneau's entry, but I don't have enough reputation.
Write the code again without copying it. That worked for me
Go to C:\Users\UserName\AppData\Local\Temp\ and clear the data, or remove the file named in the error; that will solve the issue.
VS will recreate the required file automatically, no worries.
I got this error when I moved my application from one folder to another, I resolved this by deleting the Debug folder inside the obj folder.
It indeed has to do with copy pasting code and characters that you cannot see. The easiest way to fix it is by passing your copy pasted code into a note application or simple text program which will automatically remove these invisible characters. After that simply copy the code from the text editor and paste it into your IDE.
For some reason this happened to me on every project in my solution. My fix was to delete all bin and obj folders in my solution.

StatusStrip Labels Text are mirrored [duplicate]

I am using a StringBuilder in C# to append some text, which can be English (left to right) or Arabic (right to left)
stringBuilder.Append("(");
stringBuilder.Append(text);
stringBuilder.Append(") ");
stringBuilder.Append(text);
If text = "A", then output is "(A) A"
But if text = "بتث", then output is "(بتث) بتث"
Any ideas?
This is a well-known flaw in the Windows text rendering engine when it is asked to render right-to-left text such as Arabic or Hebrew. It has a difficult problem to solve: people often fall back to Western words and punctuation when there is no good alternative word available in the language, brand and company names for example. The renderer tries to guess the proper render order by looking at the code points, with characters in the Latin character set clearly having to be rendered left-to-right.
But it fumbles at punctuation, with brackets being the most visible. You have to be explicit about it so it knows what to do: use the Unicode right-to-left mark, U+200F, or \u200f in C# code. Conversely, use the left-to-right mark, U+200E, if you know you need LTR rendering.
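A minimal sketch of what that could look like with the StringBuilder from the question. The class BidiLabel and both method names are made up for this example, and the exact positions of the marks are my assumption about how the bidirectional algorithm resolves the brackets, so check the result in your actual control.

```csharp
using System.Text;

public static class BidiLabel
{
    private const char Lrm = '\u200E'; // LEFT-TO-RIGHT MARK
    private const char Rlm = '\u200F'; // RIGHT-TO-LEFT MARK

    // Intended to keep the overall layout left-to-right: "(text) text".
    public static string LeftToRight(string text)
    {
        var sb = new StringBuilder();
        sb.Append('(').Append(text).Append(')');
        sb.Append(Lrm);               // gives the closing bracket an LTR neighbour
        sb.Append(' ').Append(text);
        return sb.ToString();
    }

    // Intended to lay the whole label out right-to-left.
    public static string RightToLeft(string text)
    {
        var sb = new StringBuilder();
        sb.Append(Rlm);               // gives the opening bracket an RTL neighbour
        sb.Append('(').Append(text).Append(") ").Append(text);
        return sb.ToString();
    }
}
```

The marks are invisible and have no width, so for English input the output looks the same as before; they only change how the neutral brackets and the space are ordered next to Arabic or Hebrew text.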
Use AppendFormat instead of just Append:
stringBuilder.AppendFormat("({0}) {0}", text)
This may fix the issue, but it may not. You need to look at the text value: it probably has LTR/RTL marker characters embedded. These need to be either removed or corrected in the value.
I had a similar issue and managed to solve it by creating a function that checks each char's Unicode value. If it is from page FE, I add 202C after it, as shown below. Without this, RTL and LTR get mixed up for what I wanted.
string us = string.Format("\uFE9E\u202C\uFE98\u202C\uFEB8\u202C\uFEC6\u202C\uFEEB\u202C\u0020\u0660\u0662\u0664\u0668 Aa1");

Correct Hebrew character sequence in C# and searchable PDFs

I'm testing an SDK that extracts text from a searchable PDF. One of the SDK's dependencies was recently updated, and it's causing an existing test on Hebrew text to fail. I don't know Hebrew nor enough about how the involved technologies represent right-to-left languages.
The NUnit test asserts that the extracted text matches the C# string "מנבוצץז ".
string hebrewText = reader.ReadToEnd();
Assert.AreEqual("מנבוצץז ", hebrewText);
The rasterized PDF has what I believe are the same characters, but in the opposite order.
The unit test fails with this message:
Expected: "מנבוצץז "
But was: " זץצובנמ"
Although the actual result more closely matches what I see in the rasterized PDF, I'm not completely sure the original test is wrong.
1. Are Hebrew characters in a C# string supposed to be read right-to-left like printed Hebrew text?
2. Does any part of the .NET stack tamper with the direction of Hebrew strings?
3. What about NUnit?
4. Are Hebrew characters embedded in a searchable PDF normally supposed to go in the same direction as the rasterized text?
5. Anything else I should know before deciding whether to "fix" this unit test?
There are various ways to encode RTL languages. The most common way (and the Windows default) is to use logical ordering, which means the first letter is encoded as the first character in a string (or file). So whether the first letter visually appears on the left or the right side of the screen doesn't affect the order in which the letters are stored.
Now, as for the text appearing in Visual Studio, it depends on the version. As far as I remember, prior to Visual Studio 2010 the code editor displayed Hebrew backwards, which became apparent when you tried to select Hebrew text: it reversed in an odd way (which was visually confusing). It appears this issue no longer exists in Visual Studio 2010 (at least with SP1, which I just tested).
Let's take a Hebrew word for which the direction is more clear to non-Hebrew speakers than the string specified in your text:
יון
The word happens to be the Hebrew word for an ion, and on your screen, it should appear as three letters where the tallest letter is on the left and the shortest is on the right. In a .NET string, the expression "יון".Substring(0, 1) will produce the short letter, since it's the first letter in the string. The string can also be written as "\u05D9\u05D5\u05DF" where the leftmost Unicode character \u05D9 represents the short letter displayed on the right, which clearly demonstrates the order in which the letters are stored.
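To check whether two strings are the same letters in opposite storage order, a small helper like the following can be used. This is only a sketch; LogicalOrder and Reverse are names made up for this example, and the reversal works on UTF-16 code units, which is fine for plain Hebrew letters but would break surrogate pairs and combining sequences.

```csharp
using System;

public static class LogicalOrder
{
    // Reverses the UTF-16 code units of a string.
    public static string Reverse(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```

With string ion = "\u05D9\u05D5\u05DF", ion.Substring(0, 1) is "\u05D9" and LogicalOrder.Reverse(ion) is "\u05DF\u05D5\u05D9". If reversing the text your SDK returned gives exactly the expected string, then one of the two is in visual order and the other in logical order.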
Since the string in your test case is nonsensical, I can't tell you whether it was a wrong test all along or whether it is a correct test that should pass. If the image you uploaded has been rendered correctly, then it appears the actual result of your test is correct and the expected value is incorrect, so you should fix the test.
1. I believe that all strings in C# will be stored internally as LTR; RTL strings will have a non-printable character (or something) denoting that they are indeed RTL.
2. More than likely. RTL GUIs and rendered text for example need certain properties (specifically RightToLeft and RightToLeftLayout) to be set in order to display correctly.
3. NUnit shouldn't. Nor should it care. IMHO a reversed string != the original string.
4. I couldn't comment. I'd assume that they should be whatever the test is expecting though, assuming it was passing at first.
5. Don't do half measures with RTL, it really doesn't like it. Either have full RTL support, or nothing. It can be pretty nasty, I wish you the best of luck!

C# Unknown Text Found

I'm creating a program to transfer text from a Word document to a database. During some testing I came across some unexpected text inside a textbox after setting its text to a table cell range as follows:
textBox1.Text = oDoc.Tables[1].Cell(1, 3).Range.Text;
What appeared in the form was:
What wasn't expected was the dot at the end of the text and I have no idea what it is supposed to represent. The dot can be highlighted but if you try and copy and paste it nothing appears. You can delete the dot manually. Can anyone help me identify what this is?
The identification bit shouldn't be too hard:
string text = oDoc.Tables[1].Cell(1, 3).Range.Text;
textBox1.Text = ((int) text[4]).ToString("x4");
That will give you the Unicode UTF-16 code unit for that character... you can then find out what it is on the Unicode web site. (I usually look at the Charts page or the directory of PDFs and guess which chart it will be in based on the numbering - it's not ideal, and there are probably better ways, but it's always worked well enough for me...)
Of course when you've identified it you'll still need to work out what the heck it's doing there... does the original Word document just have "HOLD"?
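To look at every character at once rather than a single index, a small dump helper can be used. This is a sketch only: CodeUnitDump is a made-up name, and the sample string "HOLD\r\a" in the note below is an assumption about what Word returns for a table cell (a carriage return followed by U+0007 as the end-of-cell marker), not something confirmed in the question.

```csharp
using System.Linq;

public static class CodeUnitDump
{
    // Formats every UTF-16 code unit of the string as U+XXXX, space-separated.
    public static string Dump(string text)
    {
        return string.Join(" ", text.Select(c => $"U+{(int)c:X4}"));
    }
}
```

For the assumed value, CodeUnitDump.Dump("HOLD\r\a") gives "U+0048 U+004F U+004C U+0044 U+000D U+0007". Assigning CodeUnitDump.Dump(oDoc.Tables[1].Cell(1, 3).Range.Text) to textBox1.Text would show what is really in your cell.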
