extract query string from a URL string

extract query string from a URL string - c#

I am reading from history, and I want that when i come across a google query, I can extract the query string. I am not using request or httputility since i am simply parsing a string. however, when i come across URLs like this, my program fails to parse it properly:
http://www.google.com.mt/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=mt&source=hp&biw=986&bih=663&q=hotmail&meta=&btnG=Fittex+bil-Google
what i was trying to do is get the index of q= and the index of & and take the words in between but in this case the index of & will be smaller than q= and it will give me errors.
any suggestions?
thanks for your answers, all seem good :) p.s. i couldn't use httputility, not I don't want to. when i add a reference to system.web, httputility isn't included! it's only included in an asp.net application. Thanks again

It's not clear why you don't want to use HttpUtility. You could always add a reference to System.Web and use it:
var parsedQuery = HttpUtility.ParseQueryString(input);
Console.WriteLine(parsedQuery["q"]);
If that's not an option then perhaps this approach will help:
var query = input.Split('&')
.Single(s => s.StartsWith("q="))
.Substring(2);
Console.WriteLine(query);
It splits on & and looks for the single split result that begins with "q=" and takes the substring at position 2 to return everything after the = sign. The assumption is that there will be a single match, which seems reasonable for this case, otherwise an exception will be thrown. If that's not the case then replace Single with Where, loop over the results and perform the same substring operation in the loop.
EDIT: to cover the scenario mentioned in the comments this updated version can be used:
int index = input.IndexOf('?');
var query = input.Substring(index + 1)
.Split('&')
.SingleOrDefault(s => s.StartsWith("q="));
if (query != null)
Console.WriteLine(query.Substring(2));

If you don't want to use System.Web.HttpUtility (thus be able to use the client profile), you can still use Mono HttpUtility.cs which is only an independent .cs file that you can embed in your application. Then you can simply use the ParseQueryString method inside the class to parse the query string properly.

here is the solution -
string GetQueryString(string url, string key)
{
string query_string = string.Empty;
var uri = new Uri(url);
var newQueryString = HttpUtility.ParseQueryString(uri.Query);
query_string = newQueryString[key].ToString();
return query_string;
}

Why don't you create a code which returns the string from the q= onwards till the next &?
For example:
string s = historyString.Substring(url.IndexOf("q="));
int newIndex = s.IndexOf("&");
string newString = s.Substring(0, newIndex);
Cheers

Use the tools available:
String UrlStr = "http://www.google.com.mt/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=mt&source=hp&biw=986&bih=663&q=hotmail&meta=&btnG=Fittex+bil-Google";
NameValueCollection Items = HttpUtility.ParseQueryString(UrlStr);
String QValue = Items["q"];

If you really need to do the parsing yourself, and are only interested in the value for 'q' then the following would work:
string url = #"http://www.google.com.mt/search?" +
"client=firefoxa&rls=org.mozilla%3Aen-" +
"US%3Aofficial&channel=s&hl=mt&source=hp&" +
"biw=986&bih=663&q=hotmail&meta=&btnG=Fittex+bil-Google";
int question = url.IndexOf("?");
if(question>-1)
{
int qindex = url.IndexOf("q=", question);
if (qindex > -1)
{
int ampersand = url.IndexOf('&', qindex);
string token = null;
if (ampersand > -1)
token = url.Substring(qindex+2, ampersand - qindex - 2);
else
token = url.Substring(qindex+2);
Console.WriteLine(token);
}
}
But do try to look at using a proper URL parser, it will save you a lot of hassle in the future.
(amended this question to include a check for the '?' token, and support 'q' values at the end of the query string (without the '&' at the end) )

And that's why you should use Uri and HttpUtility.ParseQueryString.

HttpUtility is fine for the .Net Framework. However that class is not available for WinRT apps. If you want to get the parameters from a url in a Windows Store App you need to use WwwFromUrlDecoder. You create an object from this class with the query string you want to get the parameters from, the object has an enumerator and supports also lambda expressions.
Here's an example
var stringUrl = "http://localhost/?name=Jonathan&lastName=Morales";
var decoder = new WwwFormUrlDecoder(stringUrl);
//Using GetFirstByName method
string nameValue = decoder.GetFirstByName("name");
//nameValue has "Jonathan"
//Using Lambda Expressions
var parameter = decoder.FirstOrDefault(p => p.Name.Contains("last")); //IWwwFormUrlDecoderEntry variable type
string parameterName = parameter.Name; //lastName
string parameterValue = parameter.Value; //Morales
You can also see http://www.dzhang.com/blog/2012/08/21/parsing-uri-query-strings-in-windows-8-metro-style-apps

Related

Remove remaining url after htm?

i need to remove the url path starting from /deposit/jingdongpay.htm?bid=4089
'?' i just need /deposit/jingdongpay.htm.
It means that the url will remove anything that comes after .htm
Any idea how to do it.

In Python, you can use str.split():
url = "/deposit/jingdongpay.htm?bid=4089"
split_url = url.split('?')
>>> split_url
['/deposit/jingdongpay.htm', 'bid=4089']
>>> split_url[0]
'/deposit/jingdongpay.htm'

You can use the String.Split method:
string url = "/deposit/jingdongpay.htm?bid=4089";
string result = url.Split('?')[0];
Another approach would be to use String.Substring:
string result = url.Substring(0, url.IndexOf('?'));
Or maybe if you are interested in a LINQ solution:
string result = new string(url.TakeWhile(c => c != '?').ToArray());
result = "/deposit/jingdongpay.htm"

How can I get a part/subdomain of my URL in C#?

I have a URL like the following
http://yellowcpd.testpace.net
How can I get yellowcpd from this? I know I can do that with string parsing, but is there a builtin way in C#?

Assuming your URLs will always be testpace.net, try this:
var subdomain = Request.Url.Host.Replace("testpace.net", "").TrimEnd('.');
It'll just give you the non-testpace.net part of the Host. If you don't have Request.Url.Host, you can do new Uri(myString).Host instead.

try this
string url = Request.Url.AbsolutePath;
var myvalues= url.Split('.');

How can I get yellowcpd from this? I know I can do that with string
parsing, but is there a builtin way in C#?
.Net doesn't provide a built-in feature to extract specific parts from Uri.Host. You will have to use string manipulation or a regular expression yourself.
The only constant part of the domain string is the TLD. The TLD is the very last bit of the domain string, eg .com, .net, .uk etc. Everything else under that depends on the particular TLD for its position (so you can't assume the next to last part is the "domain name" as, for .co.uk it would be .co

This fits the bill.
Split over two lines:
string rawURL = Request.Url.Host;
string domainName = rawURL .Split(new char[] { '.', '.' })[1];
Or over one:
string rawURL = Request.Url.Host.Split(new char[] { '.', '.' })[1];

The simple answer to your question is no there isn't a built in method to extract JUST the sub-domain. With that said this is the solution that I use...
public enum GetSubDomainOption
{
ExcludeWWW,
IncludeWWW
};
public static class Extentions
{
public static string GetSubDomain(this Uri uri,
GetSubDomainOption getSubDomainOption = GetSubDomainOption.IncludeWWW)
{
var subdomain = new StringBuilder();
for (var i = 0; i < uri.Host.Split(new char[]{'.'}).Length - 2; i++)
{
//Ignore any www values of ExcludeWWW option is set
if(getSubDomainOption == GetSubDomainOption.ExcludeWWW && uri.Host.Split(new char[]{'.'})[i].ToLowerInvariant() == "www") continue;
//I use a ternary operator here...this could easily be converted to an if/else if you are of the ternary operators are evil crowd
subdomain.Append((i < uri.Host.Split(new char[]{'.'}).Length - 3 &&
uri.Host.Split(new char[]{'.'})[i+1].ToLowerInvariant() != "www") ?
uri.Host.Split(new char[]{'.'})[i] + "." :
uri.Host.Split(new char[]{'.'})[i]);
}
return subdomain.ToString();
}
}
USAGE:
var subDomain = Request.Url.GetSubDomain(GetSubDomainOption.ExcludeWWW);
or
var subDomain = Request.Url.GetSubDomain();
I currently have the default set to include the WWW. You could easilly reverse this by switching the optional parameter value in the GetSubDomain() method.
In my opinion this allows for an option that looks nice in code and without digging in appears to be 'built-in' to c#. Just to confirm your expectations...I tested three values and this method will always return just the "yellowcpd" if the exclude flag is used.
www.yellowcpd.testpace.net
yellowcpd.testpace.net
www.yellowcpd.www.testpace.net
One assumption that I use is that...splitting the hostname on a . will always result in the last two values being the domain (i.e. something.com)

As others have pointed out, you can do something like this:
var req = new HttpRequest(filename: "search", url: "http://www.yellowcpd.testpace.net", queryString: "q=alaska");
var host = req.Url.Host;
var yellow = host.Split('.')[1];
The portion of the URL you want is part of the hostname. You may hope to find some method that directly addresses that portion of the name, e.g. "the subdomain (yellowcpd) within TestSpace", but this is probably not possible, because the rules for valid host names allow for any number of labels (see Valid Host Names). The host name can have any number of labels, separated by periods. You will have to add additional restrictions to get what you want, e.g. "Separate the host name into labels, discard www if present and take the next label".

Getting string after a specific slash

I have a string and I want to get whatever is after the 3rd slash so.
I don't know of any other way I can do this, I don't really want to use regex if I dont need it.
http://www.website.com/hello for example would be hello
I have used str.LastIndexOf('/') before like:
string str3 = str.Substring(str.LastIndexOf('/') + 1);
However I am still trying to figure out how to do this for a slash that is not the first or last

string s = "some/string/you/want/to/split";
string.Join("/", s.Split('/').Skip(3).ToArray());

As suggested by C.Evenhuis, you should rely on the native System.Uri class:
string url = "http://stackoverflow.com/questions/20213490/getting-string-after-a-specific-slash"
Uri asUri = new Uri(url);
string result = asUri.LocalPath;
Console.WriteLine(result);
(live at http://csharpfiddle.com/LlLbriBm)
This will output:
/questions/20213490/getting-string-after-a-specific-slash
If you don't want the first / in the result, simply use:
string url = "http://stackoverflow.com/questions/20213490/getting-string-after-a-specific-slash"
Uri asUri = new Uri(url);
string result = asUri.LocalPath.TrimStart('/');
Console.WriteLine(result);
You should take a look in the System.Uri class documentation. There's plenty of property that can you can play with, depending on what you want to actually keep in the url (url parameters, hashtag, etc.)

If you're manipulating URLs, then use the Uri class instead of rolling your own.
But if you want to do it manually for educational reasons, you could do something like this:
int startPos = 0;
for(int i = 0; i < 3; i++)
{
startPos = s.IndexOf('/', startPos)+1;
}
var stringOfInterest = s.Substring(startPos);
There are lots of ways this might fail if the string isn't in the form you expect, so it's just an example to get you started.
Although this is premature optimisation, this sort of approach is more efficient than smashing the whole string into components and putting them back together again.

Getting substring between two separators in an arbitrary position

I have following string:
string source = "Test/Company/Business/Department/Logs.tvs/v1";
The / character is the separator between various elements in the string. I need to get the last two elements of the string. I have following code for this purpose. This works fine. Is there any faster/simpler code for this?
CODE
static void Main()
{
string component = String.Empty;
string version = String.Empty;
string source = "Test/Company/Business/Department/Logs.tvs/v1";
if (!String.IsNullOrEmpty(source))
{
String[] partsOfSource = source.Split('/');
if (partsOfSource != null)
{
if (partsOfSource.Length > 2)
{
component = partsOfSource[partsOfSource.Length - 2];
}
if (partsOfSource.Length > 1)
{
version = partsOfSource[partsOfSource.Length - 1];
}
}
}
Console.WriteLine(component);
Console.WriteLine(version);
Console.Read();
}

Why no regular expression? This one is fairly easy:
.*/(?<component>.*)/(?<version>.*)$
You can even label your groups so for your match all you need to do is:
component = myMatch.Groups["component"];
version = myMatch.Groups["version"];

The following should be faster, as it only scans as much of the string as it needs to to find two / and it doesn't bother splitting up the whole string:
string component = "";
string version = "";
string source = "Test/Company/Business/Department/Logs.tvs/v1";
int last = source.LastIndexOf('/');
if (last != -1)
{
int penultimate = source.LastIndexOf('/', last - 1);
version = source.Substring(last + 1);
component = source.Substring(penultimate + 1, last - penultimate - 1);
}
That said, as with all performance questions: profile! Try the two side-by-side with a big list of real-life inputs and see which is fastest.
(Also, this will leave empty strings rather than throw an exception if there is no slash in the input... but throw if source is null, lazy me.)

Your approach is the most suitable one given that your are looking for substrings at a particular index. A LINQ expression to do the same in this case will likely not improve the code or its readability.
For reference, there is some great information from Microsoft here on working with strings and LINQ. In particular see the article here which covers some examples with both LINQ and RegEx.
EDIT: +1 For Matt's named group within RegEx approach... that's the nicest solution I've seen.

Your code mostly looks fine. A couple of points to note:
String.Split() will never return null, so you don't need the null check on it.
If the source string has fewer than two / characters, how would you deal with that? (The Original Post was updated to address this)
Do you really want to just output empty strings if your source string is null or empty (or invalid)? If you have specific expectations about the nature of the input, you may want to consider failing fast when those expectations are not met.

You could try something like this but I doubt it would be much faster. You could do some meassurements with System.Diagnostics.StopWatch to see if you feel the need.
string source = "Test/Company/Business/Department/Logs.tvs/v1";
int index1 = source.LastIndexOf('/');
string last = source.Substring(index1 + 1);
string substring = source.Substring(0, index1);
int index2 = substring.LastIndexOf('/');
string secondLast = substring.Substring(index2 + 1);

I would try
string source = "Test/Company/Business/Department/Logs.tvs/v1";
var components = source.Split('/').Reverse().Take(2);
String last = string.Empty;
var enumerable = components as string[] ?? components.ToArray();
if (enumerable.Count() == 2)
last = enumerable.FirstOrDefault();
var secondLast = enumerable.LastOrDefault();
Hope this will help

you can retrieve the last two words using the process as below:
string source = "Test/Company/Business/Department/Logs.tvs/v1";
String[] partsOfSource = source.Split('/');
if(partsOfSourch.length>2)
for(int i=partsOfSourch.length-2;i<=partsOfSource.length-1;i++)
console.writeline(partsOfSource[i]);

Regex for replace page number in URL

I'm not good with regex and I'm not able to figure out an applicable solution, so after a good amount of searching I'm still unable to nail this.
I have an URL with an optional page=123 parameter. There can be other optional get parameters in the url too which can occur before or after the page parameter.
I need to replace that parameter to page=--PLACEHOLDER-- to be able to use it with my paging function.
If the page parameter does not occur in the url, I would like to add it the way described before.
I'm trying to write an extension method for on string for this, but a static function would be just as good.
A bit of explanation would be appreciated too, as it would give me a good lesson in regex and hopefully next time I wouldn't have to ask.
Also I'm using asp.net mvc-3 but for compatibility reasons a complex rewrite occurs before the mvc-s routing, and I'm not able to access that. So please don't advise me to use mvc-s routing for this because I cant.

I suggest skipping the regex and using another approach:
Extract the querystring from the url.
Build a HttpValueCollection from the querystring using HttpUtility.ParseQueryString
Replace the page parameter in the collection.
Call .ToString() on the collection and you get a new querystring back.
Construct the altered url using the original minus the old querystring plus the new one.
Something like:
public static string SetPageParameter(this string url, int pageNumber)
{
var queryStartIndex = url.IndexOf("?") + 1;
if (queryStartIndex == 0)
{
return string.Format("{0}?page={1}", url, pageNumber);
}
var oldQueryString = url.Substring(queryStartIndex);
var queryParameters = HttpUtility.ParseQueryString(oldQueryString);
queryParameters["page"] = pageNumber;
return url.Substring(0, queryStartIndex) + queryParameters.ToString();
}
I haven't verified that this compiles, but it should give you an idea.

You want it as a static method with regular expression, here is a first state :
public static string ChangePage(string sUrl)
{
string sRc = string.Empty;
const string sToReplace = "&page=--PLACEHOLDER--";
Regex regURL = new Regex(#"^http://.*(&?page=(\d+)).*$");
Match mPage = regURL.Match(sUrl);
if (mPage.Success) {
GroupCollection gc = mPage.Groups;
string sCapture = gc[1].Captures[0].Value;
// gc[2].Captures[0].Value) is the page number
sRc = sUrl.Replace(sCapture, sToReplace);
}
else {
sRc = sUrl+sToReplace;
}
return sRc;
}
With a small test :
static void Main(string[] args)
{
string sUrl1 = "http://localhost:22666/HtmlEdit.aspx?mid=0&page=123&test=12";
string sUrl2 = "http://localhost:22666/HtmlEdit.aspx?mid=0&page=125612";
string sUrl3 = "http://localhost:22666/HtmlEdit.aspx?mid=0&pager=12";
string sUrl4 = "http://localhost:22666/HtmlEdit.aspx?page=12&mid=0";
string sRc = string.Empty;
sRc = ChangePage(sUrl1);
Console.WriteLine(sRc);
sRc = ChangePage(sUrl2);
Console.WriteLine(sRc);
sRc = ChangePage(sUrl3);
Console.WriteLine(sRc);
sRc = ChangePage(sUrl4);
Console.WriteLine(sRc);
}
which give the result :
http://localhost:22666/HtmlEdit.aspx?mid=0&page=--PLACEHOLDER--&test=12
http://localhost:22666/HtmlEdit.aspx?mid=0&page=--PLACEHOLDER--
http://localhost:22666/HtmlEdit.aspx?mid=0&pager=12&page=--PLACEHOLDER--
http://localhost:22666/HtmlEdit.aspx?&page=--PLACEHOLDER--&mid=0

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

extract query string from a URL string - c#

here is the solution - string GetQueryString(string url, string key) { string query_string = string.Empty; var uri = new Uri(url); var newQueryString = HttpUtility.ParseQueryString(uri.Query); query_string = newQueryString[key].ToString(); return query_string; }

Why don't you create a code which returns the string from the q= onwards till the next &? For example: string s = historyString.Substring(url.IndexOf("q=")); int newIndex = s.IndexOf("&"); string newString = s.Substring(0, newIndex); Cheers

And that's why you should use Uri and HttpUtility.ParseQueryString.

Related

Remove remaining url after htm?

How can I get a part/subdomain of my URL in C#?

Getting string after a specific slash

Getting substring between two separators in an arbitrary position

Regex for replace page number in URL

Categories

Resources