Special phone and e-mail characters - c#

I am parsing web pages with .NET (C#, HtmlAgilityPack). Some values in the page markup (phone, e-mail) are stored in a special format. The target values are, for instance, "+420 221 513 222" and "revize@secar.cz", but in the HTML source the values look like this:
<span class="p none">420%8722%AC1%87513%87%AC222</span>
<a class="e none">rev%DBize%DB%A7se%DBcar%DB%96cz</a>
I think I am missing something. I tried the Replace function etc., but to no avail. Can somebody help me convert these values to the right string values? (Regex?)
Thank you for your help.

You could use:
HttpUtility.HtmlDecode(S)
This can be found in the System.Web namespace.

Sure. You're looking for Uri.UnescapeDataString(url). However, it doesn't quite decode all of it at the same time. So what you need to do is use it in a loop, like this:
public static string DecodeUrlString(this string url)
{
    string newUrl;
    // Keep unescaping until the string stops changing.
    while ((newUrl = Uri.UnescapeDataString(url)) != url)
        url = newUrl;
    return newUrl;
}
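To illustrate why the loop is needed, here is a minimal sketch using a made-up, double-encoded input (not one of the values from the question): "%25" decodes to "%", so a single pass only peels off one layer. The class names are placeholders, and whether the obfuscated values from the question decode this way is not confirmed here.

```csharp
using System;

public static class UrlDecodingExtensions
{
    // Same loop as in the answer above: unescape until nothing changes.
    public static string DecodeUrlString(this string url)
    {
        string newUrl;
        while ((newUrl = Uri.UnescapeDataString(url)) != url)
            url = newUrl;
        return newUrl;
    }
}

public static class DecodeDemo
{
    public static void Main()
    {
        // Hypothetical input that was percent-encoded twice.
        string input = "info%2540example.com";

        string once = Uri.UnescapeDataString(input); // info%40example.com
        string fully = input.DecodeUrlString();      // info@example.com

        Console.WriteLine(once);
        Console.WriteLine(fully);
    }
}
```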

Related

Ampersand in query string messing up simple mailto of a link in the body

I'm trying to do a simple mailto inside my C# ASP.Net web app.
string url = HttpContext.Current.Request.Url.AbsoluteUri;
System.Diagnostics.Process.Start("mailto:?subject=View Rig Map&body=" + url);
However, if the URL has a query string with an ampersand (&) separating the name-value pairs, like "http://localhost:51771/MuseumViewer.aspx?MuseumIDs=3301&CountryIDs=1", the link is cut off in the body of the email at "http://localhost:51771/MuseumViewer.aspx?MuseumIDs=3301".
I don't really want to do anything fancy because all I need to do is have the link in the body of the email. Can anyone help me with this? Would it work if I put the mailto on the client side?
UPDATE with SOLUTION
I'm having a tough time deciding on who to pick as the answer but here is the solution I used:
string url = HttpContext.Current.Request.Url.AbsoluteUri;
string link = Server.UrlEncode(url);
System.Diagnostics.Process.Start("mailto:?subject=View Rig Map&body=" + link);
%26 is the URL escape code for an ampersand. Try running UrlEncode() on the url.
Add reference to System.Web to your project.
Use the below lines in your app
string url = "http://localhost:51771/MuseumViewer.aspx?MuseumIDs=3301&CountryIDs=1";
System.Diagnostics.Process.Start("mailto:?subject=View Rig Map&body=" + System.Web.HttpUtility.UrlEncodeUnicode(url));
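For comparison, here is a small sketch of the same idea using Uri.EscapeDataString, which needs no System.Web reference and encodes spaces as %20 rather than "+". MailtoLinkBuilder and Build are hypothetical names used only for this example; the sketch just builds the string and leaves starting the mail client to the caller.

```csharp
using System;

public static class MailtoLinkBuilder
{
    // Hypothetical helper: percent-encodes subject and body so that
    // characters such as '&' and '=' inside them cannot be mistaken
    // for separators of the mailto: parameters.
    public static string Build(string subject, string body)
    {
        return "mailto:?subject=" + Uri.EscapeDataString(subject)
             + "&body=" + Uri.EscapeDataString(body);
    }

    public static void Main()
    {
        string url = "http://localhost:51771/MuseumViewer.aspx?MuseumIDs=3301&CountryIDs=1";
        // The '&' in the URL becomes %26, so the body is no longer cut off there.
        Console.WriteLine(Build("View Rig Map", url));
    }
}
```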

How to read the query string when it contains unencoded data?

I have the below asp.net page which accepts a "url" query string key whose value can be an un-encoded url:
http://localhost:4104/WebSiteForTest/TinyUrl.aspx?url=http://www.google.co.uk/#hl=en&q=life&oq=life&aq=f&aqi=g-s1g9&aql=&gs_sm=3&gs_upl=2803373l2803701l2l2803826l4l4l0l0l0l0l188l453l0.3l3l0&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=94681dc4659502d1&biw=1680&bih=883
Now, from this page, how can I read all of the text after ".aspx?"?
I checked the Request.Url.AbsoluteUri property and it only showed
"http://localhost:4104/WebSiteForTest/TinyUrl.aspx?url=http://www.google.co.uk/"
I also checked with the Request.QueryString with the below code:
private void getQueryString()
{
    var sb = new StringBuilder();
    var queryStringCount = Request.QueryString.Keys.Count;
    for (int keyIndex = 0; keyIndex < queryStringCount; keyIndex++)
    {
        sb.Append(Request.QueryString.Keys[keyIndex]).Append("=").Append(Request.QueryString[keyIndex]);
        if (keyIndex != (queryStringCount - 1))
        {
            sb.Append("&");
        }
    }
}
However, the code after "#" doesn't appear in any query string.
How can I read the text after ".aspx?"?
If you say it's not possible, how does Google use "#" in its URLs when you search for something?!
http://www.google.co.uk/#hl=en&site=&q=life&oq=life&aq=f&aqi=g-s1g9&aql=&gs_sm=3&gs_upl=3317l3630l0l3755l4l4l0l0l0l0l125l391l3.1l4l0&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=94681dc4659502d1&biw=1680&bih=849
Thanks,
It's not possible to get the value after the anchor (#) on the server side; you can verify this with Fiddler or a similar tool. The browser simply strips everything after the anchor, so you have to deal with this on the client.
Retrieving Anchor Link In URL for ASP.Net
c# get complete URL with "#"
Update:
I don't know exactly how Google does this, but if you watch the request from your question in Fiddler, the initial request is followed by another one without the #.
So my advice is to look with Fiddler at how Google does this, or maybe ask another question.
Use Request.QueryString
http://localhost:4104/WebSiteForTest/TinyUrl.aspx?url=http://www.google.co.uk/#hl=en&q=life&oq=life&aq=f&aqi=g-s1g9&aql=&gs_sm=3&gs_upl=2803373l2803701l2l2803826l4l4l0l0l0l0l188l453l0.3l3l0&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=94681dc4659502d1&biw=1680&bih=883
<%=Request.QueryString("url")%> will get the ?url parameter
I assume you're using C# to do this. You can easily get the parameters and their values by iterating through the request object. Or in this case, since you know the name of the parameter, simply do this:
String url = Request.QueryString["url"];
More information on iterating through your request parameters can be found here.
The Uri Type works as well.
String yourHttpUri = "....";
Uri yourURI = new Uri(yourHttpUri);
yourURI.Query // "?url=http://www.google.co.uk/"
yourURI.Fragment // "#hl=en&q=life&oq=life&aq=f&aqi=g-s1g9&aql=&gs_sm=3&gs_upl=2803373l2803701l2l2803826l4l4l0l0l0l0l188l453l0.3l3l0&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=94681dc4659502d1&biw=1680&bih=883"
Edit:
Have you tried Request.Url.ToString(); (And create a new Uri from the result)
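A short sketch of how System.Uri splits such a string, using a shortened version of the URL from the question. Note that this only works when you already hold the complete string; as said above, the fragment is not sent to the server. The second half assumes the sender can be changed to encode the inner URL first, which may not apply in your case.

```csharp
using System;

public static class FragmentDemo
{
    public static void Main()
    {
        // Shortened version of the URL from the question.
        var uri = new Uri("http://localhost:4104/WebSiteForTest/TinyUrl.aspx?url=http://www.google.co.uk/#hl=en&q=life");
        Console.WriteLine(uri.Query);    // ?url=http://www.google.co.uk/
        Console.WriteLine(uri.Fragment); // #hl=en&q=life

        // Assumption: the sender encodes the inner URL before appending it.
        // Then '#' and '&' travel as %23 and %26 inside a single query value.
        string inner = "http://www.google.co.uk/#hl=en&q=life";
        var safe = new Uri("http://localhost:4104/WebSiteForTest/TinyUrl.aspx?url="
                           + Uri.EscapeDataString(inner));
        string value = Uri.UnescapeDataString(safe.Query.Substring("?url=".Length));
        Console.WriteLine(value == inner); // True
    }
}
```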

Parse reply/quote and make it a hyperlink ASP.NET

I'm working on a messageboard and I'd like to have the following reply/quote system:
#5432 //post number
This is a reply to a post
#5647
This is a reply to another post
This is plain text; server-side, I want to replace it so it ends up like this:
<a href="#5432">#5432</a>
...
I think the regex would be ^#\d+, but I don't know how to implement it, especially with multiple occurrences.
This is ASP.NET + C#, btw.
Here is how you can do it in c#:
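string str = @"#5432 //post number
This is a reply to a post
#5647
This is a reply to another post";
// Wrap every #number in a link to the anchor of the same name.
string html = Regex.Replace(str, @"(#)(\d+)", @"<a href=""$1$2"">$1$2</a>");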
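If only post numbers at the start of a line should become links, as the ^#\d+ pattern in the question suggests, a variant with RegexOptions.Multiline could look like the sketch below. PostLinker and LinkPostNumbers are hypothetical names; Regex.Replace handles all occurrences in one call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostLinker
{
    // With RegexOptions.Multiline, '^' matches at the start of every line,
    // so each leading "#1234" is replaced, not just the first one.
    public static string LinkPostNumbers(string text)
    {
        return Regex.Replace(text, @"^#(\d+)", "<a href=\"#$1\">#$1</a>", RegexOptions.Multiline);
    }

    public static void Main()
    {
        string post = "#5432 //post number\nThis is a reply to a post\n#5647\nThis is a reply to another post";
        Console.WriteLine(LinkPostNumbers(post));
    }
}
```

A "#12" in the middle of a line is left alone by this pattern; drop the ^ anchor if those should be linked too.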

Broken encoding after postback

I have a query string with a parameter value that contains the norwegian character å encoded as %e5. The page contains a form with an action attribute which is automatically filled by ASP.Net. When the URL is output into said attribute it is printed with a full two byte encoding: %u00e5.
When posting back this seems to be ok when debugging the code behind. However the page actually does a redirect to itself (for some other reason) and the redirect location header looks like this: Location: /myFolder/MyPage.aspx?Param1=%C3%A5
So the %e5 has been translated to %C3%A5 which breaks the output somehow.
In HTML text the broken characters look like Ã¥ after having been output via HttpUtility.HtmlEncode.
The entire web application is ISO8859-1 encoded.
PS. When removing the u00 from the output %u00e5 in the action attribute before posting the form, everything is output nicely. But the error seems to be the translation from %e5 to %C3%A5. (And of course the self redirect, but that's another matter.)
Any pointers?
The solution I ended up with was encoding the redirect URL manually.
public void ReloadPage()
{
    UrlBuilder url = new UrlBuilder(Context, Request.Path);
    foreach (string queryParam in Request.QueryString.AllKeys)
    {
        string queryParamValue = Request.QueryString[queryParam];
        url.AddQueryItem(queryParam, queryParamValue);
    }
    Response.Redirect( url.ToString(), true);
}
The url.AddQueryItem basically does HttpContext.Server.UrlDecode(queryParamValue) and the url.ToString builds the query string and for each query item does HttpContext.Server.UrlEncode( queryParamValue).
The UrlBuilder is a class already present in our library, so once I found the problem and realized that C#/.Net didn't provide tools for this, coding the fix was quick :)
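To see where %e5 and %C3%A5 come from, here is a small sketch that encodes and decodes the same character with both encodings via HttpUtility (which needs a reference to System.Web). It only illustrates the mismatch described in the question; it is not the UrlBuilder class mentioned above, and EncodingComparison is a placeholder name.

```csharp
using System;
using System.Text;
using System.Web;

public static class EncodingComparison
{
    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string aRing = "\u00e5"; // the character å

        // One byte in ISO-8859-1, two bytes in UTF-8.
        Console.WriteLine(HttpUtility.UrlEncode(aRing, latin1));        // %e5
        Console.WriteLine(HttpUtility.UrlEncode(aRing, Encoding.UTF8)); // %c3%a5

        // Decoding the UTF-8 bytes as ISO-8859-1 gives two characters
        // (U+00C3 and U+00A5) instead of the original one.
        string broken = HttpUtility.UrlDecode("%C3%A5", latin1);
        Console.WriteLine(broken.Length); // 2

        // Decoding with the encoding that produced the bytes restores å.
        Console.WriteLine(HttpUtility.UrlDecode("%C3%A5", Encoding.UTF8) == aRing); // True
    }
}
```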

Getting U+fffd/65533 instead of special character from Query String

I have a C# .NET web project that has a globalization tag set to:
<globalization requestEncoding="utf-8" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>
When this URL is requested by a Flash application (you get the same problem when you enter the URL manually in a browser): c_product_search.aspx?search=kjøkken (alternatively: c_product_search.aspx?search=kj%F8kken)
Both return the following character codes:
k U+006b 107
j U+006a 106
� U+fffd 65533
k U+006b 107
k U+006b 107
e U+0065 101
n U+006e 110
I don't know too much about character encoding, but it seems that the ø has been given a unicode replacement character, right?
I tried to change the globalization tag to:
<globalization requestEncoding="iso-8859-1" responseEncoding="utf-8" culture="nb-no" uiCulture="no"/>
That made the request work. However, now, other searches on my page stopped working.
I also tried the following with similar results:
NameValueCollection qs = HttpUtility.ParseQueryString(Request.QueryString.ToString(), Encoding.GetEncoding("iso-8859-1"));
string search = (string)qs["search"];
What should I do?
Kind Regards,
nitech
The problem comes from the combination of Firefox and ASP.NET. When you manually enter a URL in Firefox's address bar and the URL contains French or Swedish characters, Firefox encodes the URL with "ISO-8859-1" by default.
But when ASP.NET receives such a URL, it assumes that it is UTF-8 encoded, and the encoded characters become "U+fffd". I couldn't find a way in ASP.NET to detect that the URL is "ISO-8859-1". Request.Encoding is set to utf-8 ... :(
Several solutions exist:
Put <globalization requestEncoding="iso-8859-1" responseEncoding="iso-8859-1"/> in your Web.config. But you may run into other problems, and your application won't be standard anymore (it will not work with languages like Japanese) ... And anyway, I prefer using UTF-8!
Go to about:config in Firefox and set the value of network.standard-url.encode-query-utf8 to true. It will now work for you (Firefox will encode all your URLs with UTF-8). But not for anybody else ...
The least bad solution I could come up with was to handle this in code. If the default decoding didn't work, we reparse the QueryString with ISO-8859-1:
string query = Request.QueryString["search"];
if (query.Contains("%ufffd"))
    query = HttpUtility.ParseQueryString(Request.Url.Query, Encoding.GetEncoding("iso-8859-1"))["search"];
query = HttpUtility.UrlDecode(query);
It works with hyperlinks and manually-entered url, in french, english, or japanese. But I don't know how it will handle other encodings like ISO8859-5 (russian) ...
Does anyone have a better solution ?
This solves only the problem of manually-entered url. In your hyperlinks, don't forget to encode url parameters with HttpUtility.UrlEncode on the server, or encodeURIComponent on the javascript code. And use HttpUtility.UrlDecode to decode it.
public string GetEncodedQueryString(string key)
{
    string query = Request.QueryString[key];
    if (query != null)
        // U+FFFD means the default decoding failed; reparse as ISO-8859-1.
        if (query.Contains((char)0xfffd))
            query = HttpUtility.ParseQueryString(Request.Url.Query, Encoding.GetEncoding("iso-8859-1"))[key];
    return query;
}
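The same fallback can be tried outside a web request. The sketch below is a variation on the method above that takes the raw query string as a parameter instead of reading the Request object; QueryFallback and GetValue are hypothetical names, and like the original it only covers the UTF-8 versus ISO-8859-1 case.

```csharp
using System;
using System.Text;
using System.Web;

public static class QueryFallback
{
    // Parse as UTF-8 first; if the value contains the replacement
    // character U+FFFD, the bytes were not valid UTF-8, so reparse
    // the raw query string as ISO-8859-1.
    public static string GetValue(string rawQuery, string key)
    {
        string value = HttpUtility.ParseQueryString(rawQuery, Encoding.UTF8)[key];
        if (value != null && value.IndexOf('\uFFFD') >= 0)
            value = HttpUtility.ParseQueryString(rawQuery, Encoding.GetEncoding("iso-8859-1"))[key];
        return value;
    }

    public static void Main()
    {
        // Both calls yield "kjøkken" (written here as kj\u00f8kken).
        Console.WriteLine(GetValue("?search=kj%F8kken", "search") == "kj\u00f8kken");    // True
        Console.WriteLine(GetValue("?search=kj%C3%B8kken", "search") == "kj\u00f8kken"); // True
    }
}
```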
I think your problem is in the Flash application, not in .NET.
It sends the special character in a weird way.
Try to URL-encode the search string before you send it to the server.
If the app is expecting the URL-encoded request to be based on UTF-8, the character "ø" should be "%C3%B8", not "%F8". Whatever function you're using to escape/encode that request, you probably need to pass it the name of the underlying character encoding, "UTF-8".
It turns out that ActionScript 2.0 will send the URL encoded/escaped with UTF-8 while ActionScript 3.0 used ISO-8859-1. The way to solve this was to change the Request.Encoding value inside Global.asax if an encoding is specified in the URL:
void Application_BeginRequest(object sender, EventArgs e)
{
    HttpContext ctx = HttpContext.Current;
    // encoding specified?
    if (!String.IsNullOrEmpty(Request["encoding"]))
    {
        ctx.Request.ContentEncoding = System.Text.Encoding.GetEncoding(ctx.Request["encoding"]);
    }
}
Could it be done differently?
Regards,
nitech
