HttpWebRequest sometimes incorrectly decoding parts of a query parameter

I have spotted something which seems off in the HttpWebRequest object.
If I run the following:
var q = "кот (";
(Note here 'кот' is written in Russian, apparently it means cat.)
var encoded = Uri.EscapeDataString(q);
var url = $"https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q={encoded}";
I get the following value in url:
https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=%D0%BA%D0%BE%D1%82%20%28
If I then run this:
var r = (HttpWebRequest)WebRequest.Create(url);
r.GetResponse();
Fiddler shows that the request actually sent is:
https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=%D0%BA%D0%BE%D1%82%20(
Note that the ( is not encoded as %28, as it was in the URL I used to construct the HttpWebRequest.
If instead I use:
q = "CAT (";
i.e. no Russian characters, only Latin ones, I get this as the URL:
https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=CAT%20%28
And this is also the request observed in Fiddler.
To summarise, it seems that when Latin and non-Latin characters are mixed, the ( is not sent encoded.
Does anybody have any suggestions how to solve this?
UPDATE:
This matters because, as far as I can tell, it is the reason my API queries to Twitter fail: it appears to break our OAuth1 signing, and we get:
HTTP/1.1 401 Authorization Required
{"errors":[{"code":32,"message":"Could not authenticate you."}]}
I can even edit the failing request in Fiddler and replace the ( with a %28 in the GET request, and it then succeeds on replaying it with this single change.

This may help... it appears that the behaviour of the Uri class is as follows:
var q = "кот (";
var encoded = Uri.EscapeDataString(q);
// encoded = %D0%BA%D0%BE%D1%82%20%28
var uri = new Uri("https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=" + encoded);
// uri.AbsoluteUri = https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=%D0%BA%D0%BE%D1%82%20(
var uri2 = new Uri("https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=CAT (");
// uri2.AbsoluteUri = https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=CAT%20(
var uri3 = new Uri("https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=кот (");
// uri3.AbsoluteUri = https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=%D0%BA%D0%BE%D1%82%20(
I cannot debug into the .NET Framework code at the moment, but I suspect that Uri.EscapeDataString might behave incorrectly with respect to non-Latin characters and brackets.
Does the request with the unencoded bracket actually work? If so I would suggest the bracket does not need encoding...
UPDATE: I actually think this might be a bug in Uri?

Related

unable to remove the slash "\" in json string in c#

I am unable to remove the backslashes when posting to an API in C#. Backslashes are added to the value by default; is there a way to remove them from the string? I am sending a string array to the API. I have also tried Replace, but it does not work.
"[\"9782163865630.jpg\",\"9946239664158.jpg\",\"9946237403166.jpg\",\"10056487272478.jpg\",\"10056486322206.jpg\",\"10060074352670.jpg\",\"9999843459102.jpg\",\"9716071170078.jpg\",\"9716071497758.jpg\",\"10052987715614.jpg\",\"10052985683998.jpg\",\"10056390115358.jpg\",\"10056391622686.jpg\",\"10056391360542.jpg\",\"9837103120414.jpg\",\"9837102923806.jpg\",\"9837104857118.jpg\"]"
public void PostWebAPI(List<string> FileNameList)
{
string json = JsonConvert.SerializeObject(FileNameList).ToString();
json = json.Replace(@"\", "");
var client = new RestClient("eg.api.stackflow.com/post");
client.Timeout = -1;
var request = new RestRequest(Method.POST);
request.AlwaysMultipartFormData = true;
request.AddParameter("filePaths", json);
request.AddParameter("bucketAsDir", "false");
IRestResponse response = client.Execute(request);
}
Visual Studio debugging:

The backslash \ is not an actual character in your string; it's an escape character for the double quote: \". It tells the compiler that the " following the backslash is not a string delimiter but a regular character in the string.
Suppose you want a string that contains the text "Hello" (with the quotes, not just Hello). You would write the following:
string s = "\"Hello\"";
s is really "Hello", but the debugger shows it as "\"Hello\"" because it has no better way to disambiguate " as a string delimiter from " as part of the string itself.
In short, the escape character \ inside a string tells the compiler that the following character should not be interpreted in the default way. Other examples:
\": a regular double quote instead of the string delimiter "
\0: null character instead of a regular 0
\n: new line character instead of a regular n
\t: tab character instead of a regular t
\\: backslash instead of the escape character \
etc.
Check here for the whole list.
So, to make a long story short, don't worry, your string really is: ["9782163865630.jpg","9946239664158.jpg","9946237403166.jpg",.... You can verify this by simply printing out the string to the console: Console.WriteLine(json);
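The point can also be checked in code rather than by eye. The following is a minimal sketch; the class and method names (EscapeDemo, CountBackslashes) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;

public static class EscapeDemo
{
    // Hypothetical helper: counts the backslash characters really stored in a string.
    public static int CountBackslashes(string s)
    {
        return s.Count(c => c == '\\');
    }

    // Hypothetical helper: the literal below is written with \" escapes in source,
    // but the stored value is just a quote, five letters and another quote.
    public static string QuotedHello()
    {
        return "\"Hello\"";
    }
}
```

Calling EscapeDemo.CountBackslashes(EscapeDemo.QuotedHello()) gives 0, and the string's Length is 7, which is consistent with the backslashes existing only in the source code and in the debugger's display.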

The backslash isn't actually in the string. You're trying to remove something that doesn't exist. The debugger is just escaping the double quotes. Clicking the magnifier icon gives you some options for how the debugger displays it.

Each double quote " is a special symbol in C#.
The backslashes \ are just escape sequences for the quotes ".
They do not cause an error in your result.
Try writing this in Visual Studio:
string myString = "This is my "string""; // Error
You can put a backslash before each quote (\") to fix it:
string myString = "This is my \"string\""; // This works
Try this here

I had this problem with the code above when using RestClient, so I switched to HttpClient, and now the API no longer returns an error. With RestClient the backslashes were added to the request parameter; with HttpClient they are not, because the body is sent as StringContent with UTF-8 encoding and "application/json", so the actual value is passed to the API parameter.
public async Task CallAPIAsync(List<string> objFileNameList)
{
var Info = new APIModel
{
filePaths = objFileNameList,
bucketAsDir = "false"
};
string request = JsonConvert.SerializeObject(Info);
using (var client = new HttpClient())
{
client.Timeout = Timeout.InfiniteTimeSpan;
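var stringContent = new StringContent(request, Encoding.UTF8, "application/json");
// Placeholder host; an absolute URI needs a scheme, otherwise new Uri(...) throws.
client.BaseAddress = new Uri("https://eg.api.stackflow.com/post");
var response = await client.PostAsync("post", stringContent);
// Await the read instead of blocking on .Result inside an async method.
var message = await response.Content.ReadAsStringAsync();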
}
}

Uri.PathAndQuery doesnt include hash Query

Is this expected? For example:
https://google.com/hello?w=orld#hi
Uri.PathAndQuery would result:
/hello?w=orld
Fully excluding the # bit even though I require it.
What should I do here?
Should I manually do a PathAndQuery like operation perhaps:
string fullUri = Uri.ToString();
Uri.Host + "/" + fullUri.Substring(fullUri.IndexOf(Uri.Host) + Uri.Host.Length)
Essentially it combines google.com, / and hello?w=orld#hi, which would be the expected result.
I'm retrieving this specifically for writing a request to a stream:
{0} {1} HTTP/1.1\r\n {0} = Method {1} = pathandquery

The #hi part is called "fragment", you can access it through .Fragment. Since the property is called PathAndQuery, not PathAndQueryAndFragment, I assume this works as intended. As far as I know there is no method or property available which includes the fragment, but you can easily attach it:
var uri = new Uri("https://google.com/hello?w=orld#hi");
var pathAndQueryAndFragment = $"{uri.PathAndQuery}{uri.Fragment}";
But be aware that the fragment part is usually not submitted to the server.
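For the request-line use case from the question, this suggests leaving the fragment out. The following is only a sketch: the RequestLine class and its Build method are hypothetical names, and it assumes the "{0} {1} HTTP/1.1\r\n" format quoted in the question.

```csharp
using System;

public static class RequestLine
{
    // Hypothetical helper: builds the first line of an HTTP/1.1 request.
    public static string Build(string method, Uri uri)
    {
        // uri.Fragment (for example "#hi") is deliberately not appended:
        // the fragment is handled by the client and is not part of the request target.
        return string.Format("{0} {1} HTTP/1.1\r\n", method, uri.PathAndQuery);
    }
}
```

With the URL from the question, RequestLine.Build("GET", new Uri("https://google.com/hello?w=orld#hi")) yields "GET /hello?w=orld HTTP/1.1" followed by CRLF, while uri.Fragment still holds "#hi" if it is needed on the client side.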

How can I get a part/subdomain of my URL in C#?

I have a URL like the following
http://yellowcpd.testpace.net
How can I get yellowcpd from this? I know I can do that with string parsing, but is there a builtin way in C#?

Assuming your URLs will always be testpace.net, try this:
var subdomain = Request.Url.Host.Replace("testpace.net", "").TrimEnd('.');
It'll just give you the non-testpace.net part of the Host. If you don't have Request.Url.Host, you can do new Uri(myString).Host instead.

Try this:
string host = Request.Url.Host; // AbsolutePath would return the path, not the host name
var myvalues = host.Split('.'); // myvalues[0] is "yellowcpd"

How can I get yellowcpd from this? I know I can do that with string
parsing, but is there a builtin way in C#?
.NET doesn't provide a built-in feature to extract specific parts from Uri.Host. You will have to use string manipulation or a regular expression yourself.
The only constant part of the domain string is the TLD, the very last part of the domain string, e.g. .com, .net, .uk. The position of everything else depends on the particular TLD, so you can't assume the next-to-last part is the "domain name"; for a .co.uk host it would be .co.

This fits the bill.
Split over two lines:
string rawURL = Request.Url.Host;
// Index 1 is the second label of the host name; use index 0 for the first label.
string domainName = rawURL.Split('.')[1];
Or over one:
string domainName = Request.Url.Host.Split('.')[1];

The simple answer to your question is no there isn't a built in method to extract JUST the sub-domain. With that said this is the solution that I use...
public enum GetSubDomainOption
{
ExcludeWWW,
IncludeWWW
};
public static class Extentions
{
public static string GetSubDomain(this Uri uri,
GetSubDomainOption getSubDomainOption = GetSubDomainOption.IncludeWWW)
{
var subdomain = new StringBuilder();
for (var i = 0; i < uri.Host.Split(new char[]{'.'}).Length - 2; i++)
{
//Ignore any www values if the ExcludeWWW option is set
if(getSubDomainOption == GetSubDomainOption.ExcludeWWW && uri.Host.Split(new char[]{'.'})[i].ToLowerInvariant() == "www") continue;
//I use a ternary operator here; it could easily be converted to an if/else if you are in the "ternary operators are evil" crowd
subdomain.Append((i < uri.Host.Split(new char[]{'.'}).Length - 3 &&
uri.Host.Split(new char[]{'.'})[i+1].ToLowerInvariant() != "www") ?
uri.Host.Split(new char[]{'.'})[i] + "." :
uri.Host.Split(new char[]{'.'})[i]);
}
return subdomain.ToString();
}
}
USAGE:
var subDomain = Request.Url.GetSubDomain(GetSubDomainOption.ExcludeWWW);
or
var subDomain = Request.Url.GetSubDomain();
I currently have the default set to include the WWW. You could easily reverse this by switching the optional parameter value in the GetSubDomain() method.
In my opinion this gives an option that looks nice in code and, without digging in, appears to be built into C#. Just to confirm your expectations: I tested three values, and this method always returns just "yellowcpd" when the exclude flag is used.
www.yellowcpd.testpace.net
yellowcpd.testpace.net
www.yellowcpd.www.testpace.net
One assumption that I use is that...splitting the hostname on a . will always result in the last two values being the domain (i.e. something.com)

As others have pointed out, you can do something like this:
var req = new HttpRequest(filename: "search", url: "http://www.yellowcpd.testpace.net", queryString: "q=alaska");
var host = req.Url.Host;
var yellow = host.Split('.')[1];
The portion of the URL you want is part of the host name. You may hope to find some method that directly addresses that portion of the name, e.g. "the subdomain (yellowcpd) within testpace.net", but this is probably not possible, because the rules for valid host names allow any number of labels separated by periods (see Valid Host Names). You will have to add further restrictions to get what you want, e.g. "Separate the host name into labels, discard www if present and take the next label".
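The rule quoted above ("separate the host name into labels, discard www if present and take the next label") could be sketched as follows. HostLabels and FirstLabelAfterWww are hypothetical names, and the sketch assumes the host really does have a subdomain; for a bare testpace.net it would return "testpace".

```csharp
using System;
using System.Linq;

public static class HostLabels
{
    // Hypothetical helper: returns the first host label that is not a leading "www".
    public static string FirstLabelAfterWww(Uri uri)
    {
        return uri.Host
            .Split('.')                                   // separate the host into labels
            .SkipWhile(label => label.Equals("www", StringComparison.OrdinalIgnoreCase))
            .FirstOrDefault();                            // take the next label, or null if none
    }
}
```

For both http://yellowcpd.testpace.net and http://www.yellowcpd.testpace.net this returns "yellowcpd". It does not try to recognise multi-part TLDs such as .co.uk, so it only answers the question "which label comes first", not "what is the registered domain".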

How to decode Javascript Unicode into C# strings

For example the JSON callback we get on a google autosearch:
window.google.td && window.google.td('tljp1322487273527014', 4,{e:"HY7TTtmRFZPe8QPCvf30Dw",c:1,u:"http://www.google.co.uk/s?hl\x3den\x26cp\x3d5\x26gs_id\x3d17\x26xhr\x3dt\x26q\x3dowasp\x26pf\x3dp\x26sclient\x3dpsy-ab\x26source\x3dhp\x26pbx\x3d1\x26oq\x3d\x26aq\x3d\x26aqi\x3d\x26aql\x3d\x26gs_sm\x3d\x26gs_upl\x3d\x26bav\x3don.2,or.r_gc.r_pw.,cf.osb\x26fp\x3dbd20912ccdf288ab\x26biw\x3d387\x26bih\x3d362\x26tch\x3d4\x26ech\x3d15\x26psi\x3d5o3TTqCqCsnD0QXA7sUI.1322487273527.1\x26wrapid\x3dtljp1322487273527014",d:"[\x22owasp\x22,[[\x22owasp\x22,0,\x220\x22],[\x22owasp\\u003Cb\\u003E top 10\\u003C\\/b\\u003E\x22,0,\x221\x22],[\x22owasp\\u003Cb\\u003E top 10 2011\\u003C\\/b\\u003E\x22,0,\x222\x22],[\x22owasp\\u003Cb\\u003E zap\\u003C\\/b\\u003E\x22,0,\x223\x22]],{\x22j\x22:\x2217\x22}]"});window.google.td && window.google.td('tljp1322487273527014', 4,{e:"HY7TTtmRFZPe8QPCvf30Dw",c:0,u:"http://www.google.co.uk/s?hl\x3den\x26cp\x3d5\x26gs_id\x3d17\x26xhr\x3dt\x26q\x3dowasp\x26pf\x3dp\x26sclient\x3dpsy-ab\x26source\x3dhp\x26pbx\x3d1\x26oq\x3d\x26aq\x3d\x26aqi\x3d\x26aql\x3d\x26gs_sm\x3d\x26gs_upl\x3d\x26bav\x3don.2,or.r_gc.r_pw.,cf.osb\x26fp\x3dbd20912ccdf288ab\x26biw\x3d387\x26bih\x3d362\x26tch\x3d4\x26ech\x3d15\x26psi\x3d5o3TTqCqCsnD0QXA7sUI.1322487273527.1\x26wrapid\x3dtljp1322487273527014",d:""});
more specifically, how to go from:
"\x22te\\u003Cb\\u003Esco\\u003C\\/b\\u003E\x22,0,\x220\x22"
to
"te\u003Cb\u003Esco\u003C\/b\u003E",0,"0"
to
"te<b>sco</b>"
Note that the System.Web UrlDecode and HtmlDecode are not able to handle this.
Interestingly, the AntiXss almost does the reverse, since it can go from:
"te<b>sco</b>"
To
te\00003Cb\00003Esco\00003C\00002Fb\00003E
Security angle
These decodings have a number of security implications, since the result will be rendered by the browser. For example, if in JavaScript/jQuery we have a variable with the payload
var xss = "te\u003Cscript\u003Ealert\u002812\u0029\u003C\u002Fscript\u003E"
the script will run if the variable is assigned to a div's HTML:
$("#header").html(xss)

The \x.... escapes are unusual, but the \u escapes are fine.
Following the previous answer:
string str = @"P\u003e\u003cp\u003e Notes \u003cstrong\u003e Разработчик: \u003c/STRONG\u003e \u003cbr /\u003eЕсли игра Безразлично";
Regex regex = new Regex(@"\\u([0-9a-f]{4})", RegexOptions.IgnoreCase);
str = regex.Replace(str, match => char.ConvertFromUtf32(Int32.Parse(match.Groups[1].Value , System.Globalization.NumberStyles.HexNumber)));

It appears that "\x22te\\u003Cb\\u003Esco\\u003C\\/b\\u003E\x22,0,\x220\x22" uses JavaScript \xNN hex escapes. Nothing built in decodes this string out of the box, but the following should work:
var regex = new Regex(@"\\x([a-fA-F0-9]{2})");
var replaced = regex.Replace(input, match => char.ConvertFromUtf32(Int32.Parse(match.Groups[1].Value, System.Globalization.NumberStyles.HexNumber)));
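The two regular expressions above can be combined into one pass that also unescapes a doubled backslash, which matters for the question's input because \\u003C has to become \u003C on the first pass and < only on the second. The following is a sketch under that assumption; JsEscapes and DecodeOnce are hypothetical names, and named escapes such as \n or \t are not translated.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class JsEscapes
{
    // Matches \xNN, \uNNNN, or a backslash followed by any other single character.
    private static readonly Regex Escape =
        new Regex(@"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|(.))");

    // Hypothetical helper: removes exactly one level of JavaScript-style escaping.
    public static string DecodeOnce(string input)
    {
        return Escape.Replace(input, m =>
        {
            if (m.Groups[3].Success)
            {
                // \\ becomes \, \/ becomes /, \" becomes " (no special handling of \n, \t).
                return m.Groups[3].Value;
            }
            string hex = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
            return ((char)int.Parse(hex, NumberStyles.HexNumber)).ToString();
        });
    }
}
```

Applied once to the sample from the question it gives "te\u003Cb\u003Esco\u003C\/b\u003E",0,"0", and applied a second time it gives "te<b>sco</b>",0,"0". As the security note in the question points out, the decoded result should still be treated as untrusted markup.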

extract query string from a URL string

I am reading from the browser history, and when I come across a Google query I want to extract the query string. I am not using Request or HttpUtility since I am simply parsing a string. However, when I come across URLs like this, my program fails to parse them properly:
http://www.google.com.mt/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=mt&source=hp&biw=986&bih=663&q=hotmail&meta=&btnG=Fittex+bil-Google
What I was trying to do is get the index of q= and the index of & and take the text in between, but in this case the index of the first & is smaller than the index of q=, which gives me errors.
Any suggestions?
Thanks for your answers, all seem good :) P.S. It's not that I don't want to use HttpUtility; I couldn't. When I add a reference to System.Web, HttpUtility isn't included! It's only included in an ASP.NET application. Thanks again

It's not clear why you don't want to use HttpUtility. You could always add a reference to System.Web and use it:
var parsedQuery = HttpUtility.ParseQueryString(input);
Console.WriteLine(parsedQuery["q"]);
If that's not an option then perhaps this approach will help:
var query = input.Split('&')
.Single(s => s.StartsWith("q="))
.Substring(2);
Console.WriteLine(query);
It splits on & and looks for the single split result that begins with "q=" and takes the substring at position 2 to return everything after the = sign. The assumption is that there will be a single match, which seems reasonable for this case, otherwise an exception will be thrown. If that's not the case then replace Single with Where, loop over the results and perform the same substring operation in the loop.
EDIT: to cover the scenario mentioned in the comments this updated version can be used:
int index = input.IndexOf('?');
var query = input.Substring(index + 1)
.Split('&')
.SingleOrDefault(s => s.StartsWith("q="));
if (query != null)
Console.WriteLine(query.Substring(2));

If you don't want to use System.Web.HttpUtility (thus be able to use the client profile), you can still use Mono HttpUtility.cs which is only an independent .cs file that you can embed in your application. Then you can simply use the ParseQueryString method inside the class to parse the query string properly.

here is the solution -
string GetQueryString(string url, string key)
{
var uri = new Uri(url);
var newQueryString = HttpUtility.ParseQueryString(uri.Query);
// The indexer returns null when the key is missing, so no ToString() call is needed.
return newQueryString[key];
}

Why don't you write code that returns the string after q= up to the next &?
For example:
string s = url.Substring(url.IndexOf("q=") + 2); // skip past "q="
int newIndex = s.IndexOf("&");
string newString = newIndex >= 0 ? s.Substring(0, newIndex) : s; // q may be the last parameter
Cheers

Use the tools available:
String UrlStr = "http://www.google.com.mt/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=mt&source=hp&biw=986&bih=663&q=hotmail&meta=&btnG=Fittex+bil-Google";
NameValueCollection Items = HttpUtility.ParseQueryString(new Uri(UrlStr).Query);
String QValue = Items["q"];

If you really need to do the parsing yourself, and are only interested in the value for 'q' then the following would work:
string url = @"http://www.google.com.mt/search?" +
"client=firefoxa&rls=org.mozilla%3Aen-" +
"US%3Aofficial&channel=s&hl=mt&source=hp&" +
"biw=986&bih=663&q=hotmail&meta=&btnG=Fittex+bil-Google";
int question = url.IndexOf("?");
if(question>-1)
{
int qindex = url.IndexOf("q=", question);
if (qindex > -1)
{
int ampersand = url.IndexOf('&', qindex);
string token = null;
if (ampersand > -1)
token = url.Substring(qindex+2, ampersand - qindex - 2);
else
token = url.Substring(qindex+2);
Console.WriteLine(token);
}
}
But do try to look at using a proper URL parser, it will save you a lot of hassle in the future.
(I amended this answer to include a check for the '?' token and to support 'q' values at the end of the query string, without a trailing '&'.)

And that's why you should use Uri and HttpUtility.ParseQueryString.
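Where System.Web is not available, as the asker reports, a small parser built on Uri alone may be enough. The following is a sketch, not a replacement for HttpUtility.ParseQueryString: QueryParser and Parse are hypothetical names, '+' is assumed to mean a space as in form encoding, and when a name repeats the last value wins.

```csharp
using System;
using System.Collections.Generic;

public static class QueryParser
{
    // Hypothetical helper: parses the query part of an absolute URL into name/value pairs.
    public static Dictionary<string, string> Parse(string url)
    {
        var result = new Dictionary<string, string>();
        string query = new Uri(url).Query; // starts with '?', or is empty when there is no query

        foreach (string pair in query.TrimStart('?')
                     .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            result[Decode(name)] = Decode(value); // a repeated name overwrites the earlier value
        }
        return result;
    }

    // Treats '+' as a space, then resolves %XX escapes.
    private static string Decode(string s)
    {
        return Uri.UnescapeDataString(s.Replace('+', ' '));
    }
}
```

For the Google URL in the question, QueryParser.Parse(url)["q"] is "hotmail" and the btnG entry is "Fittex bil-Google". Because the Uri class isolates the query first, the position of q relative to the other parameters no longer matters.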

HttpUtility is fine for the .NET Framework. However, that class is not available for WinRT apps. If you want to get the parameters from a URL in a Windows Store app, you need to use WwwFormUrlDecoder. You create an object of this class from the query string you want to read the parameters from; the object has an enumerator and also supports lambda expressions.
Here's an example
var stringUrl = "http://localhost/?name=Jonathan&lastName=Morales";
var decoder = new WwwFormUrlDecoder(stringUrl);
//Using GetFirstByName method
string nameValue = decoder.GetFirstByName("name");
//nameValue has "Jonathan"
//Using Lambda Expressions
var parameter = decoder.FirstOrDefault(p => p.Name.Contains("last")); //IWwwFormUrlDecoderEntry variable type
string parameterName = parameter.Name; //lastName
string parameterValue = parameter.Value; //Morales
You can also see http://www.dzhang.com/blog/2012/08/21/parsing-uri-query-strings-in-windows-8-metro-style-apps
