How to get QueryString from a href?

How to get QueryString from a href? - c#

I am trying to stop XSS attack so I am using html agility pack to make my whitelist and Microsoft Anti-Cross Site Scripting Library to deal with the rest.
Now I am looking at encoding all html hrefs. I get a big string of html code that can contain hrefs. Accours to MS Library they have an URL encode but if you encode the whole URl then it can't be used. So in the example they just encode the query string
UrlEncode Untrusted input is used in a
URL (such as a value in a
querystring) Click
Here!
http://msdn.microsoft.com/en-us/library/aa973813.aspx
So now my questions is how do I parse through a href and find the query string. Is it always just "?" then query string or can it have spaces and be written in different ways?
Edit
This urls will not be written by me but the users who will share them. So that's why I need a way to make sure I get all query strings and not just ones in valid format. If it can work invalid format I have to grab these ones too. Hackers won't care if it is valid format or not as long as it still does what they want.

I believe it is always the part after the ? but you can easily use the Uri class for this:
Uri uri = new Uri("http://foo.com/page.html?query");
string query = uri.Query;
That will include the ? itself. Of course, you can fetch the other bits as well, which could be handy.

what about using encrypted query string and in your code you can decrypt it
OR you can use Request.PathInfo that make you not need ? in query string

Here's a W3C reference addressing the composition of URIs with querystrings, which says in part:
The question mark ("?", ASCII 3F hex)
is used to delimit the boundary
between the URI of a queryable object,
and a set of words used to express a
query on that object.

Related

C# Parse HTML Post Data

I have MemoryStream data (HTML POST Data) which i need to parse it.
Converting it to string give result like below
key1=value+1&key2=val++2
Now the problem is that all this + are space in html. Am not sure why space is converting to +
This is how i am converting MemoryStream to string
Encoding.UTF8.GetString(request.PostData.ToArray())

If you are using Content-Type of application/x-www-form-urlencoded, your data needs to be url encoded.
Use System.Web.HttpUtility.UrlEncode():
using System.Web;
var data = HttpUtility.UrlEncode(request.PostData);
See more in MSDN.
You can also use JSON format for POST.

I suppose that the data you are retrieving are encoded with URL rules.
You can discover why data are encoded to this format reading this simple article from W3c school.
To encode/decode your post string you may use this couple of methods:
System.Web.HttpUtility.UrlEncode(yourString); // Encode
System.Web.HttpUtility.UrlDecode(yourString); // Decode
You can find more informations about URL manipulation functions here.
Note: If you need to encode/decode an array of string you need to enumerate your collection with a for or foreach statement. Remember that with this kind of cycles you cannot directly change the cycle variable value during the enumeration (so probably you need a temporary storage variable).
At least, to efficiently parse strings, I suggest you to use the System.Text.RegularExpression.Regex class and learn the regex "language".
You can find some example on how to use Regex here; Regex101 site has also a C# code generator that shows you how to translate your regex into code.

decode querystring within context.Request

i have an ashx file that requires some query-string values to deliver appropriate images.
The problem is some search engines urlencode then htmlencode those urls in their cache or when they index those.
So for example instead of getting
/preview.ashx?file=abcd.jpg&root=small
i get
/preview.ashx?file=abcd.jpg&root=small
this basically throws off the context.Request.QueryString["root"] so it thinks that there's no variable root
i guess the second key in the querystring is amp;root i.e the part after the & sign.
What i'm wondering is if there's a way to automatically html and urldecode the querystring on serverside so that the program doesn't get confused.

There is no harm in calling HttpUtility.HtmlDecode or HttpUtility.UrlDecode more than once.
string queryString = "/preview.ashx?file=abcd.jpg&root=small";
var parsedString = HttpUtility.HtmlDecode(queryString);
var root = HttpUtility.ParseQueryString(parsedString)["root"];

To URL encode and decode you can use the following methods:
string encoded = System.Web.HttpUtility.UrlEncode(url);
string decoded = System.Web.HttpUtility.UrlDecode(url);
I've seen instances in the past where the Query String has been doubly encoded. In which case you'll need to doubly decode — this may be your issue...

C# mvc2 encode url

i'm trying to encode an url with the code below;
var encodedUrl = HttpUtility.UrlEncode("http://www.example.com");
var decodedUrl = HttpUtility.UrlDecode("http%3A%2F%2Fwww%2Eexample%2Ecom%2F");
I'm working with the google webmaster tools api and this api expects an URL as shown in the decodedUrl variable above. Every single character is encoded there.
When i use the httputility encode function i get the following result;
http%3a%2f%2fwww.example.com
How can i use the encoding variable in such a way that every character in the url is encoded?

I'm pretty sure that HtmlUtility and AntiXss (another MS tool for encoding urls) aren't going to help here. A "." in a url is considered valid and so doesn't need to be encoded.
I think you're going to have to post-process your encoded string to further encode other characters that are not valid within teh google webmaster tools API.
i.e. do something like this...
var encodedUrl = HttpUtility.UrlEncode("http://www.example.com")
.Replace(".", "%2E");
... assuming that "." is the only character you're having problems with.

The period is not a reserved character in a URL, so it won't be encoded. See this question and answer for an elegant solution.

How to use strange characters in a query string

I am using silverlight / ASP .NET and C#. What if I want to do this from silverlight for instance,
// I have left out the quotes to show you literally what the characters
// are that I want to use
string password = vtakyoj#"5
string encodedPassword = HttpUtility.UrlEncode(encryptedPassword, Encoding.UTF8);
// encoded password now = vtakyoj%23%225
URI uri = new URI("http://www.url.com/page.aspx#password=vtakyoj%23%225");
HttpPage.Window.Navigate(uri);
If I debug and look at the value of uri it shows up as this (we are still inside the silverlight app),
http://www.url.com?password=vtakyoj%23"5
So the %22 has become a quote for some reason.
If I then debug inside the page.aspx code (which of course is ASP .NET) the value of Request["password"] is actually this,
vtakyoj#"5
Which is the original value. How does that work? I would have thought that I would have to go,
HttpUtility.UrlDecode(Request["password"], Encoding.UTF8)
To get the original value.
Hope this makes sense?
Thanks.

First lets start with the UTF8 business. Esentially in this case there isn't any. When a string contains characters with in the standard ASCII character range (as your password does) a UTF8 encoding of that string is identical to a single byte ASCII string.
You start with this:-
vtakyoj#"5
The HttpUtility.UrlEncode somewhat aggressively encodes it to:-
vtakyoj%23%225
Its encoded the # and " however only # has special meaning in a URL. Hence when you view string value of the Uri object in Silverlight you see:-
vtakyoj%23"5
Edit (answering supplementary questions)
How does it know to decode it?
All data in a url must be properly encoded thats part of its being valid Url. Hence the webserver can rightly assume that all data in the query string has been appropriately encoded.
What if I had a real string which had %23 in it?
The correct encoding for "%23" would be "%3723" where %37 is %
Is that a documented feature of Request["Password"] that it decodes it?
Well I dunno, you'd have check the documentation I guess. BTW use Request.QueryString["Password"] the presence of this same indexer directly on Request was for the convenience of porting classic ASP to .NET. It doesn't make any real difference but its better for clarity since its easier to make the distinction between QueryString values and Form values.
if I don't use UFT8 the characters are being filtered out.
Aare you sure that non-ASCII characters may be present in the password? Can you provide an example you current example does not need encoding with UTF-8?

If Request["password"] is to work, you need "http://url.com?password=" + HttpUtility.UrlEncode("abc%$^##"). I.e. you need ? to separate the hostname.
Also the # syntax is username:password#hostname, but it has been disabled in IE7 and above IIRC.

QueryString malformed after URLDecode

I'm trying to pass in a Base64 string into a C#.Net web application via the QueryString. When the string arrives the "+" (plus) sign is being replaced by a space. It appears that the automatic URLDecode process is doing this. I have no control over what is being passed via the QueryString. Is there any way to handle this server side?
Example:
http://localhost:3399/Base64.aspx?VLTrap=VkxUcmFwIHNldCB0byAiRkRTQT8+PE0iIHBsdXMgb3IgbWludXMgNSBwZXJjZW50Lg==
Produces:
VkxUcmFwIHNldCB0byAiRkRTQT8 PE0iIHBsdXMgb3IgbWludXMgNSBwZXJjZW50Lg==
People have suggested URLEncoding the querystring:
System.Web.HttpUtility.UrlEncode(yourString)
I can't do that as I have no control over the calling routine (which is working fine with other languages).
There was also the suggestion of replacing spaces with a plus sign:
Request.QueryString["VLTrap"].Replace(" ", "+");
I had though of this but my concern with it, and I should have mentioned this to start, is that I don't know what other characters might be malformed in addition to the plus sign.
My main goal is to intercept the QueryString before it is run through the decoder.
To this end I tried looking at Request.QueryString.toString() but this contained the same malformed information. Is there any way to look at the raw QueryString before it is URLDecoded?
After further testing it appears that .Net expects everything coming in from the QuerString to be URL encoded but the browser does not automatically URL encode GET requests.

The suggested solution:
Request.QueryString["VLTrap"].Replace(" ", "+");
Should work just fine. As for your concern:
I had though of this but my concern with it, and I should have mentioned this to start, is that I don't know what other characters might be malformed in addition to the plus sign.
This is easy to alleviate by reading about base64. The only non alphanumeric characters that are legal in modern base64 are "/", "+" and "=" (which is only used for padding).
Of those, "+" is the only one that has special meaning as an escaped representation in URLs. While the other two have special meaning in URLs (path delimiter and query string separator), they shouldn't pose a problem.
So I think you should be OK.

You could manually replace the value (argument.Replace(' ', '+')) or consult the HttpRequest.ServerVariables["QUERY_STRING"] (even better the HttpRequest.Url.Query) and parse it yourself.
You should however try to solve the problem where the URL is given; a plus sign needs to get encoded as "%2B" in the URL because a plus otherwise represents a space.
If you don't control the inbound URLs, the first option would be preferred as you avoid the most errors this way.

I'm having this exact same issue except I have control over my URL. Even with Server.URLDecode and Server.URLEncode it doesn't convert it back to a + sign, even though my query string looks as follows:
http://localhost/childapp/default.aspx?TokenID=0XU%2fKUTLau%2bnSWR7%2b5Z7DbZrhKZMyeqStyTPonw1OdI%3d
When I perform the following.
string tokenID = Server.UrlDecode(Request.QueryString["TokenID"]);
it still does not convert the %2b back into a + sign. Instead I have to do the following:
string tokenID = Server.UrlDecode(Request.QueryString["TokenID"]);
tokenID = tokenID.Replace(" ", "+");
Then it works correctly. Really odd.

I had similar problem with a parameter that contains Base64 value and when it comes with '+'.
Only Request.QueryString["VLTrap"].Replace(" ", "+"); worked fine for me;
no UrlEncode or other encoding helping because even if you show encoded link on page yourself with '+' encoded as a '%2b' then it's browser that changes it to '+' at first when it showen and when you click it then browser changes it to empty space. So no way to control it as original poster says even if you show links yourself. The same thing with such links even in html emails.

If you use System.Uri.UnescapeDataString(yourString) it will ignore the +. This method should only be used in cases like yours where when the string was encoded using some sort of legacy approach either on the client or server.
See this blog post:
http://blogs.msdn.com/b/yangxind/archive/2006/11/09/don-t-use-net-system-uri-unescapedatastring-in-url-decoding.aspx

If you URLEncode the string before adding it to the URL you will not have any of those problems (the automatic URLDecode will return it to the original state).

Well, obviously you should have the Base64 string URLEncoded before sending it to the server.
If you cannot accomplish that, I would suggest simply replacing any embedded spaces back to +; since b64 strings are not suposed to have spaces, its a legitimate tactic...

System.Web.HttpUtility.UrlEncode(yourString) will do the trick.

As a quick hack you could replace space with plus character before base64-decoding.

I am by no means a C# developer but it looks like you need to url ENCODE your Base64 string before sending it as a url.

Can't you just assume a space is a + and replace it?
Request.QueryString["VLTrap"].Replace(" ", "+");
;)

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.