Should I set charset/codepage when working with digits only? - c#

I have C# code that returns coordinates:
<%@ Page Language="C#" CodePage="65001" CodeFile="codefile.aspx.cs" Inherits="codefile" %>
Response.Write(coordinates());
It outputs something like this:
77.0444687 12.9120790
Do I need to set the CodePage="65001"?
Is that appropriate?

CodePage 65001 is the Windows identifier for UTF-8, and the default encoding for ASP.NET is UTF-8, so setting this value is not necessary, but it is not inappropriate either. You are just restating the default. If the default ever changes in a newer version of the .NET Framework, stating the value explicitly would be more useful.
Read Code Page Identifiers for more information.

Every page already has a CodePage by default: it is first set in the system's web.config, can then be overridden in the site's web.config, and can finally be overridden in the page declaration at the top of the page.
You do not need to change the code page for numbers; the system default, which is probably UTF-8, is fine.
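To illustrate why the code page does not matter for digits, here is a minimal sketch that compares the bytes produced by a few common encodings. The class and method names are made up for this example, and ISO-8859-1 stands in for a typical single-byte Western code page.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only; not part of the original page.
public static class DigitsEncodingDemo
{
    // True when the text encodes to identical bytes in UTF-8, ASCII, and ISO-8859-1.
    public static bool SameBytesInCommonEncodings(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        byte[] latin1 = Encoding.GetEncoding("iso-8859-1").GetBytes(text);
        return utf8.SequenceEqual(ascii) && utf8.SequenceEqual(latin1);
    }

    public static void Main()
    {
        // Digits, dots, and spaces are plain ASCII, so every encoding agrees.
        Console.WriteLine(SameBytesInCommonEncodings("77.0444687 12.9120790"));
        // An accented character is where the encodings start to differ.
        Console.WriteLine(SameBytesInCommonEncodings("programación"));
    }
}
```

The first call should report that the bytes match and the second that they do not, which is why the CodePage setting only starts to matter once non-ASCII text is written to the response.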

Related

razor page (blazor) is encoding the characters

I have a little website and when I do Ctrl+U on the website the UTF-8 characters are not being printed correctly.
For example, if I try to print programación it prints programaci&#xF3;n
This is happening only when I print the value of a variable, it does not happen when I hardcode the text.
For example, <div>@variable</div> prints the content with the encoded values, like programaci&#xF3;n. But the value itself, as shown in the inspector, is programación.
But if in the following line I write <div>programación</div> it prints it correctly.
I tried to use HttpUtility.HtmlDecode but I got the same result.
I also tried the meta charset=UTF-8 tag and saved the .razor file as UTF-8, as specified in this post, but neither of those worked.
Is there any way of printing those characters correctly?
I'm using .NET5 if that matters.
By default, Razor encodes all non-ASCII characters, i.e. those outside of the Basic Latin range. If you want other ranges to be left alone, you need to configure that. You do so in the ConfigureServices method by specifying the ranges that Razor should not encode. The character you are having problems with is in the Latin-1 Supplement range, so you include that:
services.Configure<WebEncoderOptions>(options =>
{
    options.TextEncoderSettings = new TextEncoderSettings(
        UnicodeRanges.BasicLatin,
        UnicodeRanges.Latin1Supplement);
});
Note that whatever you set here will override the default settings, which is why you also need to include the UnicodeRanges.BasicLatin range. If you are unsure which character sets you should include, you can check here: http://www.unicode.org/charts/. Alternatively, you can simply specify UnicodeRanges.All.
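The effect of the allowed ranges can be seen outside of Razor with a small console sketch. It assumes Razor's output goes through an HtmlEncoder built from the configured settings; the class and method names are hypothetical.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Hypothetical demo class; only HtmlEncoder and UnicodeRanges come from the framework.
public static class RazorEncodingDemo
{
    // The default encoder only leaves Basic Latin characters untouched.
    public static string EncodeWithDefault(string text) =>
        HtmlEncoder.Default.Encode(text);

    // An encoder that also leaves the Latin-1 Supplement range untouched.
    public static string EncodeAllowingLatin1(string text) =>
        HtmlEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Latin1Supplement)
                   .Encode(text);

    public static void Main()
    {
        Console.WriteLine(EncodeWithDefault("programación"));    // programaci&#xF3;n
        Console.WriteLine(EncodeAllowingLatin1("programación")); // programación
    }
}
```

Either output displays the same in the browser; the difference is only visible in the page source.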

How can I change System.Text.Encoding.Default?

When I call:
oStreamReader = new StreamReader(_sFileName, System.Text.Encoding.Default);
I don't get the accented characters (I expect French characters with accents).
When I display System.Text.Encoding.Default, I get:
{System.Text.SBCSCodePageEncoding}
[System.Text.SBCSCodePageEncoding]: {System.Text.SBCSCodePageEncoding}
BodyName: "iso-8859-1"
CodePage: 1252
DecoderFallback: {System.Text.InternalDecoderBestFitFallback}
EncoderFallback: {System.Text.InternalEncoderBestFitFallback}
EncodingName: "Europe de l'Ouest (Windows)"
HeaderName: "Windows-1252"
IsBrowserDisplay: true
IsBrowserSave: true
IsMailNewsDisplay: true
IsMailNewsSave: true
IsReadOnly: true
IsSingleByte: true
WebName: "Windows-1252"
WindowsCodePage: 1252
Is it not expected to be UTF-8?
Where can I set System.Text.Encoding.Default?
Is it bound to the Windows settings?
Thanks a lot in advance.
Eric.
Is it not expected to be UTF-8?
On .NET Framework, it's your configured Windows code page. On .NET Core, it is UTF-8.
From the docs:
In .NET Framework on the Windows desktop, the Default property always gets the system's active code page and creates a Encoding object that corresponds to it. The active code page may be an ANSI code page, which includes the ASCII character set along with additional characters that vary by code page. Because all Default encodings based on ANSI code pages lose data, consider using the Encoding.UTF8 encoding instead. UTF-8 is often identical in the U+00 to U+7F range, but can encode characters outside the ASCII range without loss
Where can I set System.Text.Encoding.Default?
It's your configured Windows code page.
Is it bound to the Windows settings?
Yep
oStreamReader = new StreamReader(_sFileName, System.Text.Encoding.Default);
The easiest thing is to just do:
oStreamReader = new StreamReader(_sFileName);
StreamReader will try to detect the encoding used from the byte order marks, but will fall back to UTF-8 if that fails, so just let it do that.
There should be almost no need to ever type Encoding.Default in your code: it's a badly-named property which should be ignored.
Different computers can use different encodings as the default, and the default encoding can change on a single computer. Refer to https://learn.microsoft.com/en-us/dotnet/api/system.text.encoding.default?view=net-5.0 for details. This page clearly explains how System.Text.Encoding should be used.
I would not suggest changing default encoding because the encoding returned by the Default property uses best-fit fallback to map unsupported characters to characters supported by the code page. Rather, use the encoding required by your specific code.
oStreamReader = new StreamReader(_sFileName, System.Text.Encoding.UTF8);
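As a rough illustration of why the encoding passed to StreamReader has to match the bytes in the file, the sketch below writes a short French word to a temporary file and reads it back with a matching and a non-matching encoding. The helper name is hypothetical, and ISO-8859-1 is used as a stand-in for a single-byte Western code page.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only.
public static class StreamReaderEncodingDemo
{
    // Writes the text with one encoding (no BOM) and reads it back with another.
    public static string WriteThenRead(string text, Encoding writeAs, Encoding readAs)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, writeAs.GetBytes(text));
            using (var reader = new StreamReader(path, readAs))
            {
                return reader.ReadToEnd();
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Matching encodings preserve the accents.
        Console.WriteLine(WriteThenRead("été", latin1, latin1) == "été");
        Console.WriteLine(WriteThenRead("été", Encoding.UTF8, Encoding.UTF8) == "été");

        // Single-byte data read as UTF-8 loses the accented characters.
        Console.WriteLine(WriteThenRead("été", latin1, Encoding.UTF8) == "été");
    }
}
```

If the accents are lost with one encoding, the file was most likely saved in a different one, so the fix is to name the encoding the file was actually written in.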

Encoded characters are not rendering correctly

I am using the shorthand for HttpUtility.HtmlEncode to encode the data going into my textboxes.
<asp:TextBox ID="txtProperty" runat="server" Text='<%#: Bind("Property")%>'></asp:TextBox>
My understanding of how encoded characters behave is that when your web browser renders them, they should display as the characters they represent and not the actual encoded characters, as this example code on the MSDN website suggests.
However, my encoded characters do not behave this way.
For example, a '£' character retrieved from a database displays in the textbox as its encoded form and not as the '£' character itself.
I don't think it has anything to do with how my website is configured to handle encoding, because if I manually set the text as the encoded characters in the HTML:
<asp:TextBox ID="txtProperty" runat="server" Text="&#163;"></asp:TextBox>
It renders the encoded characters correctly, as '£'.
This indicates to me that it is a problem with the way I am using HtmlEncode.
Still, I tried explicitly setting the encoding to UTF-8 in my web.config and it made no difference.
Could someone explain this behavior, or what might be the problem here?
When you use <%#: Bind("Property")%>, ASP.NET already takes care of HTML-encoding the string. If you pre-encode it, you end up in the double-encoding scenario.
See ScottGu's New <%: %> Syntax for HTML Encoding Output in ASP.NET 4 (and ASP.NET MVC 2):
ASP.NET 4 introduces a new IHtmlString interface (along with a concrete implementation: HtmlString) that you can implement on types to indicate that its value is already properly encoded (or otherwise examined) for displaying as HTML, and that therefore the value should not be HTML-encoded again.
The <%: %> code-nugget syntax checks for the presence of the IHtmlString interface and will not HTML encode the output of the code expression if its value implements this interface.
This allows developers to avoid having to decide on a per-case basis whether to use <%= %> or <%: %> code-nuggets.
Instead you can always use <%: %> code nuggets, and then have any properties or data-types that are already HTML encoded implement the IHtmlString interface.
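The double-encoding scenario can be reproduced in a few lines. This is only a sketch: it uses System.Net.WebUtility as a stand-in for whatever encoder the page applies, and the class name is made up for the example.

```csharp
using System;
using System.Net;

// Hypothetical demo class; WebUtility stands in for the page's HTML encoder.
public static class DoubleEncodingDemo
{
    public static string EncodeOnce(string value) => WebUtility.HtmlEncode(value);

    // What happens when already-encoded data is encoded again on output.
    public static string EncodeTwice(string value) =>
        WebUtility.HtmlEncode(WebUtility.HtmlEncode(value));

    public static void Main()
    {
        string once = EncodeOnce("£");
        string twice = EncodeTwice("£");

        Console.WriteLine(once);
        Console.WriteLine(twice);

        // The browser decodes exactly once, so a double-encoded value
        // shows up as the entity text instead of the character.
        Console.WriteLine(WebUtility.HtmlDecode(once) == "£");
        Console.WriteLine(WebUtility.HtmlDecode(twice) == once);
    }
}
```

The second encoding pass escapes the ampersand of the entity produced by the first pass, which is why the textbox shows the entity text rather than the pound sign.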

Html Encoding of output in legacy ASP.NET site

I have a legacy ASP.Net site (recently upgraded to .NET 4.0) which never had Request Validation turned on and it doesn't Html encode any user input at all.
My solution was to turn on request validation and to catch the HttpRequestValidationException in Global.asax and redirect the user to an error page. I don't Html Encode the user input as I'll have to do it in thousands of places. I hope my approach will stop any XSS vectors getting saved into database.
However, in case there is already an XSS vector stored in the database, I reckon I should also HTML-encode all output. Unfortunately, I have very limited dev and test resources to achieve this. I came up with a list of changes I need to go through:
Search and Replace all <%= %> with <%: %>.
Search and Replace all Labels with Literals and add Mode="Encode".
Wrap all eval() with HtmlEncode.
My questions are:
Is there any simpler way of turning on all output to be automatically Html encoded?
Am I missing anything from above list?
Any pitfalls I should be careful about?
Thanks.
Search and Replace all <%= %> with <%: %>.
Don't forget the <%# expressions and Response.Write calls, which will be harder to replace.
Search and Replace all Labels with Literals and add Mode="Encode".
But you will lose all formatting on the previously generated spans and break the DOM and the corresponding JS/CSS.
You would also have to find all Literals with Mode="PassThrough" and set them to Encode.
Wrap all eval() with HtmlEncode.
Yes, it seems like a subset of the <%# matter above.
Also, you could have some custom controls with funky render methods.
Assuming there is "only" a relational DB as the back end, if I had access to the DB, I would first identify the problematic tables and columns whose values contain markup.
I would then:
clean up those values in the DB as best I can.
ensure HTML encoding of the corresponding outputs in my pages.
I could then go for a basic global replace of <%= with <%: and sanitize outputs in the long run.

Displaying Swedish characters in aspx page

In an aspx page, a combo box is displaying Swedish characters incorrectly. It displays "Réunion" instead of "'Re'union". The value is retrieved from an Oracle database. Please suggest workarounds to fix this issue. Note: I have tried the Culture and UICulture attributes, but they did not work.
Either your HTML page uses an encoding other than the default UTF-8, or you are reading wrong values from the database. You can check the encoding headers and the document encoding with Firebug or IE Dev Tools. You can also check whether the column in the database is Unicode or ASCII; if it is not Unicode, you will need to handle the encoding. There are two simple tests you can do:
1. Add some Swedish text directly into a C# string and assign it to a label. See how it renders. If it is OK, then your page encoding is OK.
2. Put a breakpoint after you retrieve the value from the database and check whether it is displayed correctly in the debugger window.
If 1 does not display correctly but 2 does, then you have an encoding problem with the page. If 1 is displayed correctly but 2 is not, you have a problem when reading or writing values to the database.
First of all, determine if you receive the string correctly from the Oracle database (in debugger, view the received string). If the string is already received wrong, it means you have not properly set the database charset on your connection. You should fix that; a nasty workaround would be to “ungarble” the garbled string by something like Encoding.UTF8.GetString(Encoding.GetEncoding(1252).GetBytes(garbledString)).
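A minimal sketch of that "ungarble" workaround follows. It assumes the garbling came from UTF-8 bytes being decoded with a single-byte Western encoding; ISO-8859-1 is used here instead of code page 1252 so the example runs without registering extra encoding providers. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical demo class for illustration only.
public static class MojibakeDemo
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Simulates the bug: UTF-8 bytes decoded with the wrong single-byte encoding.
    public static string Garble(string text) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(text));

    // Reverses it: recover the original bytes, then decode them as UTF-8.
    public static string Ungarble(string garbled) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(garbled));

    public static void Main()
    {
        string garbled = Garble("Réunion");
        Console.WriteLine(garbled);           // RÃ©union
        Console.WriteLine(Ungarble(garbled)); // Réunion
    }
}
```

This only works when no bytes were lost during the wrong decoding, so fixing the connection charset remains the better option.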
