razor page (blazor) is encoding the characters - c#

I have a little website and when I do Ctrl+U on the website the UTF-8 characters are not being printed correctly.
For example, if I try to print programación, it prints programaci&#xF3;n.
This is happening only when I print the value of a variable, it does not happen when I hardcode the text.
For example, <div>@variable</div> prints the content with the encoded values, like programaci&#xF3;n. But the value itself, as shown in the inspector, is programación.
But if in the following line I write <div>programación</div> it prints it correctly.
I tried to use HttpUtility.HtmlDecode but I got the same result.
I also tried the meta charset=UTF-8 and I saved the .razor file as UTF-8, as specified in this post, but none of those worked.
Is there any way of printing those characters correctly?
I'm using .NET 5, if that matters.

By default, Razor encodes all non-ASCII characters, i.e. those outside of the Basic Latin range. If you want other ranges to be left alone, you need to configure that. You do so in the ConfigureServices method by specifying the ranges that Razor should not encode. The character you are having problems with is in the Latin-1 Supplement range, so you include that:
services.Configure<WebEncoderOptions>(options =>
{
    options.TextEncoderSettings = new TextEncoderSettings(
        UnicodeRanges.BasicLatin,
        UnicodeRanges.Latin1Supplement);
});
Note that whatever you set here will override the default settings, which is why you also need to include the UnicodeRanges.BasicLatin range. If you are unsure which character sets you should include, you can check here: http://www.unicode.org/charts/. Alternatively, you can simply specify UnicodeRanges.All.
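To see the effect of the allowed ranges without running the site, the same kind of encoder can be exercised directly. This is only a sketch: it builds its own encoder with HtmlEncoder.Create from System.Text.Encodings.Web rather than using the instance Razor resolves from the service container, and EncoderDemo is a made-up helper name.

```csharp
using System.Text.Encodings.Web;
using System.Text.Unicode;

public static class EncoderDemo
{
    // Hypothetical helper: HTML-encodes text while leaving characters
    // from the given Unicode ranges untouched.
    public static string Encode(string text, params UnicodeRange[] allowed)
    {
        HtmlEncoder encoder = HtmlEncoder.Create(allowed);
        return encoder.Encode(text);
    }
}
```

Calling EncoderDemo.Encode("programación", UnicodeRanges.BasicLatin) should give programaci&#xF3;n, while adding UnicodeRanges.Latin1Supplement as a second range should return programación unchanged.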

Related

XML invalid using the following characters £ ` –

I am trying to create an RSS feed that will validate using the W3C validator.
I keep getting problems from the following URLS containing the characters £, ` or -
Here are the URLs:
http://www.example.co.uk/news/2012/april/stamp-rationing-–-why-the-royal-mail-are-ripping-you-off
Here is the error:
This feed does not validate.
line 14, column 119: link must be a full and valid URL: http://www.example.co.uk/news/2012/april/stamp-rationing-–-why-the-royal-mail-are-ripping-you-off [help]
... –-why-the-royal-mail-are-ripping-you-off
I have tried replacing the symbols with escape characters but this doesn't work. Here are the escape characters I have been using:
Text = Text.Replace("-", "&#45");
Text = Text.Replace("£", "%C2%A");
Text = Text.Replace("`", "%60");
Text = Text.Replace("’", "%60");
Does anyone have any idea how to solve this problem? Here are some more links that are causing me problems:
http://www.example.co.uk/news/2012/march/for-sale-3-bed-detached-london-home-£15,000
Error:
This feed does not validate.
line 14, column 106: link must be a full and valid URL: http://www.example.co.uk/news/2012/march/for-sale-3-bed-detached-london-home-£15,000 [help]
... -sale-3-bed-detached-london-home-£15,000
You will need to URL encode the URLs before posting them in the RSS:
var encoded = HttpUtility.UrlEncode(aUrl);
Note that the URLs will not be usable directly as :, / etc will also get encoded.
If you want the values of these to be valid XML, use SecurityElement.Escape instead.
var escaped = SecurityElement.Escape(aUrl);
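Because encoding the whole URL also encodes ":" and "/", one alternative is to encode only the individual path segments. The sketch below assumes the link is assembled from a base URL plus separate segments, and it swaps in Uri.EscapeDataString instead of HttpUtility.UrlEncode; FeedLinks and BuildLink are hypothetical names.

```csharp
using System;
using System.Linq;

public static class FeedLinks
{
    // Percent-encodes each path segment on its own, so the scheme, the host
    // and the "/" separators stay intact while characters such as "£", "–"
    // and "," are escaped as UTF-8 bytes.
    public static string BuildLink(string baseUrl, params string[] pathSegments)
    {
        string path = string.Join("/", pathSegments.Select(s => Uri.EscapeDataString(s)));
        return baseUrl.TrimEnd('/') + "/" + path;
    }
}
```

With the segments "news", "2012", "march" and "for-sale-3-bed-detached-london-home-£15,000", the last segment should come out as for-sale-3-bed-detached-london-home-%C2%A315%2C000.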
I'm building an API for my system, and I've been using some stuff to normalize the fields. Try filtering this with PHP:
$value = preg_replace('/[^a-z]/i', '', $value);
$value = preg_replace_callback('/[^\x09\x0A\x0D\x20-\x7F]/', function ($m) { return '&#' . ord($m[0]) . ';'; }, $value);
$value = htmlentities($value, ENT_NOQUOTES, 'UTF-8', false);
The answer is either to use UTF-8 encoding or to convert non-ASCII characters to XML entities.
UTF-8 encoding: Make sure the document is output in UTF-8, and includes the relevant encoding headers.
See also UTF-8 encoding xml in PHP
Entity encoding: Convert all non ASCII characters to XML entities.
XML entities look like this: &#163; (that one is for the £ sign). Most programming languages will either do this automatically for you as you generate the XML document, or provide standard functions for doing it. You didn't specify the language you're using, but the above should help you find the appropriate API functions.
One thing you should not be doing is generating XML data manually (i.e. outputting tags and attributes as strings), or string-replacing the entities manually. You should be using the proper APIs for it. Generating XML (or any other standard data format) manually is always likely to end in problems like this, and it does seem a bit crazy to do it the hard way if the tools are right there in front of you to do it properly.
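As an illustration of letting an API do the escaping, here is a minimal sketch that assumes the feed is produced from C# (as the Text.Replace calls in the question suggest) and uses System.Xml.Linq. FeedItems and BuildItem are hypothetical names, and a real RSS item would carry more elements than these two.

```csharp
using System.Xml.Linq;

public static class FeedItems
{
    // Builds an <item> element. XElement escapes markup characters such as
    // "&" and "<" in the text content when the element is serialized, so no
    // manual string replacement is needed.
    public static string BuildItem(string title, string link)
    {
        var item = new XElement("item",
            new XElement("title", title),
            new XElement("link", link));
        return item.ToString(SaveOptions.DisableFormatting);
    }
}
```

Note that this only takes care of XML escaping. Whether the link itself is a valid URL still depends on percent-encoding it first.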

url encoding c# mismatch encoding

I have a large URL that I am encoding using System.Web.HttpUtility.UrlEncode. When I encode it, it's not encoded like the working example I have. I am not sure what the problem is, maybe a different character type or something. I have put an example of what is supposed to be created and what is actually being created. Thanks for any help, I am lost on this one.
Working example (look how this one has Did%252Citag%252 and the other doesn't)
22%7Chttp%3A%2F%2Fv17.nonxt1.googlevideo.com%2Fvideoplayback%3Fid%3D0b608733ae5257c3%26itag%3D22%26source%3Dpicasa%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1333533157%26sparams%3Did%252Citag%252Csource%252Cip%252Cipbits%252Cexpire%26signature%3D8AD67D74F34FBAFBBA87616C0AED4A336DF0982A.129E2B5E648F8A2F35A34F312AC5C3C957A1C40A%26key%3Dlh1%2C35%7Chttp%3A%2F%2Fv18.nonxt3.googlevideo.com%2Fvideoplayback%3Fid%3D0b608733ae5257c3%26itag%3D35%26source%3Dpicasa%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1333533157%26sparams%3Did%252Citag%252Csource%252Cip%252Cipbits%252Cexpire%26signature%3D7A58A11994C710872E945D0EAA6E43B6BFB8A648.B9C1D9FB377E1A49EBF3DC6C166C0B6E3E94EC24%26key%3Dlh1%2C34%7Chttp%3A%2F%2Fv6.nonxt1.googlevideo.com%2Fvideoplayback%3Fid%3D0b608733ae5257c3%26itag%3D34%26source%3Dpicasa%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1333533157%26sparams%3Did%252Citag%252Csource%252Cip%252Cipbits%252Cexpire%26signature%3D260B10850A3448C849B8B8F1F2AF5E31244E71BC.6D7420FD66B85D40982BFB2C847EDB46021C63AE%26key%3Dlh1%2C5%7Chttp%3A%2F%2Fv23.nonxt7.googlevideo.com%2Fvideoplayback%3Fid%3D0b608733ae5257c3%26itag%3D5%26source%3Dpicasa%26ip%3D0.0.0.0%26ipbits%3D0%26expire%3D1333533157%26sparams%3Did%252Citag%252Csource%252Cip%252Cipbits%252Cexpire%26signature%3D9894DCDA7D2634EE0006CE0F6E0E29ABF7A8F253.18765D7CD7BDE80ED1A47DC8EC559C3E05C92F56%26key%3Dlh1
Here is an example of the one I am creating (see this one encodes as did%2citag%2)
5%7chttp%3a%2f%2fv23.nonxt7.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d5%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3dC0E2993011931D9F5FCAFAF54E821415F6042DDD.477CD23B021563A6DE30E858E35C21046E0B0BA6%26key%3dlh1%2c18%7chttp%3a%2f%2fv11.nonxt4.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d18%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3d696501A8ACBA0E1246173B040E0FB81DA8EBCDC7.944BA6C08C630EFFC2456D66BAD12376D7E377B2%26key%3dlh1%2c34%7chttp%3a%2f%2fv6.nonxt1.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d34%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3dDDD3D9081F7F2FF462D17CFAE6CAB72AEB86DEA9.3275E0EE8921EF728132035FC94BEF5926A0B7C1%26key%3dlh1%2c35%7chttp%3a%2f%2fv18.nonxt3.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d35%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3d7826E7470450F9F473BC7A845967EF3AC655CFB.3850F952F5D68151D325CD754C581CD66B0BC4D7%26key%3dlh1%2c22%7chttp%3a%2f%2fv17.nonxt1.googlevideo.com%2fvideoplayback%3fid%3d0b608733ae5257c3%26itag%3d22%26source%3dpicasa%26ip%3d0.0.0.0%26ipbits%3d0%26expire%3d1333562840%26sparams%3did%2citag%2csource%2cip%2cipbits%2cexpire%26signature%3d32FAAE6AE74B22BFB3DBD4300CEEDBC1A12A9ED4.8014678ABB1AEE93FB4B1C36E2C74C89102DC112%26key%3dlh1
It looks like the URL in the first example is double encoded. That is, if you look at the decoded sparams parameter, it is represented as
sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire
In your second example
sparams=id,itag,source,ip,ipbits,expire
So, what is happening in the first example is that they UrlEncode the value first, construct the URL using this encoded value, and then UrlEncode the constructed URL.
UPDATE: This is a general practice to follow if the value of your query string contains characters which need to be UrlEncoded (e.g. , & space ? etc.)
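The two-step encoding described above can be sketched as follows. This is an assumption-laden illustration, not the code that produced the working example: it uses Uri.EscapeDataString in place of HttpUtility.UrlEncode, and DoubleEncoding and EncodeNested are made-up names.

```csharp
using System;

public static class DoubleEncoding
{
    // Encodes a parameter value, embeds it in an inner URL, then encodes
    // that whole URL again so it can travel as a single value.
    public static string EncodeNested(string innerBaseUrl, string paramName, string rawValue)
    {
        string encodedValue = Uri.EscapeDataString(rawValue);   // "," -> "%2C"
        string innerUrl = innerBaseUrl + "?" + paramName + "=" + encodedValue;
        return Uri.EscapeDataString(innerUrl);                   // "%2C" -> "%252C"
    }
}
```

For example, EncodeNested("http://host/videoplayback", "sparams", "id,itag") should yield http%3A%2F%2Fhost%2Fvideoplayback%3Fsparams%3Did%252Citag, which has the same %252C pattern as the working example.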
According to w3c standards, your example is fine. There is no %252 symbol.
I'm not sure exactly what you are expecting, but when you fire these strings into a URL Decoder, this is what you get:
String 1
22|http://v17.nonxt1.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=22&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333533157&sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire&signature=8AD67D74F34FBAFBBA87616C0AED4A336DF0982A.129E2B5E648F8A2F35A34F312AC5C3C957A1C40A&key=lh1,35|http://v18.nonxt3.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=35&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333533157&sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire&signature=7A58A11994C710872E945D0EAA6E43B6BFB8A648.B9C1D9FB377E1A49EBF3DC6C166C0B6E3E94EC24&key=lh1,34|http://v6.nonxt1.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=34&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333533157&sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire&signature=260B10850A3448C849B8B8F1F2AF5E31244E71BC.6D7420FD66B85D40982BFB2C847EDB46021C63AE&key=lh1,5|http://v23.nonxt7.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=5&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333533157&sparams=id%2Citag%2Csource%2Cip%2Cipbits%2Cexpire&signature=9894DCDA7D2634EE0006CE0F6E0E29ABF7A8F253.18765D7CD7BDE80ED1A47DC8EC559C3E05C92F56&key=lh1
String 2
5|http://v23.nonxt7.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=5&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=C0E2993011931D9F5FCAFAF54E821415F6042DDD.477CD23B021563A6DE30E858E35C21046E0B0BA6&key=lh1,18|http://v11.nonxt4.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=18&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=696501A8ACBA0E1246173B040E0FB81DA8EBCDC7.944BA6C08C630EFFC2456D66BAD12376D7E377B2&key=lh1,34|http://v6.nonxt1.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=34&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=DDD3D9081F7F2FF462D17CFAE6CAB72AEB86DEA9.3275E0EE8921EF728132035FC94BEF5926A0B7C1&key=lh1,35|http://v18.nonxt3.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=35&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=7826E7470450F9F473BC7A845967EF3AC655CFB.3850F952F5D68151D325CD754C581CD66B0BC4D7&key=lh1,22|http://v17.nonxt1.googlevideo.com/videoplayback?id=0b608733ae5257c3&itag=22&source=picasa&ip=0.0.0.0&ipbits=0&expire=1333562840&sparams=id,itag,source,ip,ipbits,expire&signature=32FAAE6AE74B22BFB3DBD4300CEEDBC1A12A9ED4.8014678ABB1AEE93FB4B1C36E2C74C89102DC112&key=lh1
Your URLs are quite different, and they also have leading characters that I'm not sure you want.

Why does Console.Write treat character x266A differently?

I'm writing a console app that needs to print some atypical (for a console app) unicode characters such as musical notes, box drawing symbols, etc.
Most characters show up correctly, or show a ? if the glyph doesn't exist for whatever font the console is using, however I found one character which behaves oddly which can be demonstrated with the lines below:
Console.Write("ABC");
Console.Write('♪'); //This is the same as: Console.Write((char)0x266A);
Console.Write("XYZ");
When this is run it will print ABC then move the cursor back to the start of the line and overwrite it with XYZ. Why does this happen?
The console doesn't use Unicode, so the characters have to be translated to an 8-bit code page. The ♪ character is converted to the character with code 13 (hex 0x0D), which is CR, or Carriage Return.
In most code pages, for example code page 850, the glyph for the CR character is an eighth note, and U+266A is specified as its Unicode equivalent.
However, if you write the CR character to the console, it will not display the eighth note glyph; instead it is interpreted as the control character CR, which moves the cursor to the beginning of the line.
Console.Write('♪'); is considered Unicode. My guess is that it translates it to the closest ASCII character. You should be using U+1D160 or the appropriate Unicode musical equivalent.
There are the required primitives to generate musical output in the Unicode code set (starting at U+1D100). For example, U+1D11A is a 5-line staff, U+1D158 is a closed notehead.
See http://www.unicode.org/charts/PDF/U1D100.pdf
..then the issue becomes making sure that you have a typeface with the appropriate glyphs included (and dealing with the issues of spacing things correctly, etc.)
If you're looking to generate printed output, you should look at Lilypond, which is an OSS music notation package that uses a text file format to define the musical content and then generates gorgeous output.
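One practical detail about the musical symbols block starting at U+1D100: those code points lie outside the Basic Multilingual Plane, so they do not fit into a single C# char and a cast like (char)0x1D160 cannot represent them. A minimal sketch, with MusicSymbols and FromCodePoint as hypothetical names:

```csharp
public static class MusicSymbols
{
    // Converts a Unicode code point to a string. Code points above U+FFFF
    // come back as two UTF-16 code units (a surrogate pair).
    public static string FromCodePoint(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }
}
```

FromCodePoint(0x266A) returns a one-char string, while FromCodePoint(0x1D160) returns a string of length 2. Whether the console can actually display the result still depends on its encoding and font.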

Remove anchor from URL in C#

I'm trying to pull in an src value from an XML document, and in one that I'm testing it with, the src is:
<content src="content/Orwell - 1984 - 0451524934_split_2.html#calibre_chapter_2"/>
That creates a problem when trying to open the file. I'm not sure what that #(stuff) suffix is called, so I had no luck searching for an answer. I'd just like a simple way to remove it if possible. I suppose I could write a function to search for a # and remove anything after, but that would break if the filename contained a # symbol (or can a file even have that symbol?)
Thanks!
If you had the src in a string you could use
srcstring.Substring(0,srcstring.LastIndexOf("#"));
This returns the src without the # and everything after it. If the values you are retrieving are all web URLs then this should work; the # is a bookmark in a URL that takes you to a specific part of the page.
You should be OK assuming that URLs won't contain a "#"
The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it.
Source (search for "#" or "unsafe").
Therefore just use String.Split() with the "#" as the split character. This should give you 2 parts. In the highly unlikely event it gives more, just discard the last one and rejoin the remainder.
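A small helper along these lines might look like the sketch below. It assumes the fragment starts at the first "#", which is how URL syntax defines it (use LastIndexOf instead if you prefer to cut at the last one), and it guards against values without any "#", where calling Substring with the -1 returned by the index lookup would throw. UrlFragments and StripFragment are made-up names.

```csharp
public static class UrlFragments
{
    // Returns the text before the first '#', or the input unchanged
    // when it contains no '#'.
    public static string StripFragment(string src)
    {
        int hashIndex = src.IndexOf('#');
        return hashIndex < 0 ? src : src.Substring(0, hashIndex);
    }
}
```

For the src in the question, StripFragment returns content/Orwell - 1984 - 0451524934_split_2.html.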
From Wikipedia:
# is used in a URL of a webpage or other resource to introduce a "fragment identifier" – an id which defines a position within that resource. For example, in the URL http://en.wikipedia.org/wiki/Number_sign#Other_uses the portion after the # (Other_uses) is the fragment identifier, in this case indicating that the display should be moved to show the tag marked by ... in the HTML
It's not always safe to remove the anchor of the URL. What I mean is that Ajax-like sites make use of the anchor to keep track of the context. For example, Gmail: if you go to http://www.gmail.com/#inbox, you go directly to your inbox, but if you go to http://www.gmail.com/#all, you'll go to all your mail.
The page can show different content based on the anchor, although the anchor itself is handled by script in the browser and is not sent to the server with the request.

How to use strange characters in a query string

I am using Silverlight / ASP.NET and C#. What if I want to do this from Silverlight, for instance?
// I have left out the quotes to show you literally what the characters
// are that I want to use
string password = vtakyoj#"5
string encodedPassword = HttpUtility.UrlEncode(password, Encoding.UTF8);
// encoded password now = vtakyoj%23%225
Uri uri = new Uri("http://www.url.com/page.aspx#password=vtakyoj%23%225");
HtmlPage.Window.Navigate(uri);
If I debug and look at the value of uri it shows up as this (we are still inside the silverlight app),
http://www.url.com?password=vtakyoj%23"5
So the %22 has become a quote for some reason.
If I then debug inside the page.aspx code (which of course is ASP .NET) the value of Request["password"] is actually this,
vtakyoj#"5
Which is the original value. How does that work? I would have thought that I would have to go,
HttpUtility.UrlDecode(Request["password"], Encoding.UTF8)
To get the original value.
Hope this makes sense?
Thanks.
First, let's start with the UTF8 business. Essentially, in this case there isn't any. When a string contains only characters within the standard ASCII range (as your password does), a UTF-8 encoding of that string is identical to a single-byte ASCII string.
You start with this:-
vtakyoj#"5
The HttpUtility.UrlEncode somewhat aggressively encodes it to:-
vtakyoj%23%225
It has encoded the # and the ", however only # has special meaning in a URL. Hence when you view the string value of the Uri object in Silverlight you see:-
vtakyoj%23"5
Edit (answering supplementary questions)
How does it know to decode it?
All data in a URL must be properly encoded; that's part of its being a valid URL. Hence the web server can rightly assume that all data in the query string has been appropriately encoded.
What if I had a real string which had %23 in it?
The correct encoding for "%23" would be "%2523", where %25 is %.
Is that a documented feature of Request["Password"] that it decodes it?
Well I dunno, you'd have to check the documentation I guess. BTW, use Request.QueryString["Password"]; the presence of this same indexer directly on Request was for the convenience of porting classic ASP to .NET. It doesn't make any real difference, but it's better for clarity since it's easier to make the distinction between QueryString values and Form values.
if I don't use UTF8 the characters are being filtered out.
Are you sure that non-ASCII characters may be present in the password? Can you provide an example? Your current example does not need encoding with UTF-8.
If Request["password"] is to work, you need "http://url.com?password=" + HttpUtility.UrlEncode("abc%$^##"). I.e. you need ? to separate the query string from the rest of the URL.
Also, the user-info syntax in a URL is username:password@hostname, but it has been disabled in IE7 and above, IIRC.
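The encode and decode round trip discussed in these answers can be tried out in isolation. The sketch below is not what ASP.NET does internally; it simply uses Uri.EscapeDataString and Uri.UnescapeDataString to show the same percent-encoding rules, with QueryValues as a made-up helper name.

```csharp
using System;

public static class QueryValues
{
    // Percent-encodes a value for use in a query string.
    public static string Encode(string value) => Uri.EscapeDataString(value);

    // Reverses Encode. A web framework normally performs this step for you
    // when it parses the query string.
    public static string Decode(string value) => Uri.UnescapeDataString(value);
}
```

Encoding vtakyoj#"5 gives vtakyoj%23%225, and decoding that returns the original. A value that literally contains %23 is encoded as %2523, because the percent sign itself becomes %25, so it decodes back to %23 rather than to #.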
