Retrieving a portion of a url

Retrieving a portion of a url - c#

I need your collective wisdom. I am needing to get a portion of a url so that I can pass it as a parameter to make some stuff happen. Here's what I've got:
here is an example of the url "somesite/somepage/johndoe21911", I am needing to get the "21911" so that I can pass it into this:
var url = Request.ApplicationPath.Replace("/", "");
Session["agencyId"] = _Apps.GetGehaAgencyData(portion needed goes here);
Any direction is greatly appreciated

If your URL looks like an actual URL (with the http:// part) then you could use Uri class:
private static void Extract()
{
Uri uri = new Uri("http://somesite/somepage/johndoe21911");
string last = uri.Segments.LastOrDefault();
string numOnly = Regex.Replace(last, "[^0-9 _]", string.Empty);
Console.WriteLine(last);
Console.WriteLine(numOnly);
}
If it's exactly like in your example (without the http:// part) then you could do something like this:
private static void Extract()
{
string uri = "http://somesite/somepage/johndoe21911";
string last = uri.Substring(uri.LastIndexOf('/') + 1);
string numOnly = Regex.Replace(last, "[^0-9 _]", string.Empty);
Console.WriteLine(last);
Console.WriteLine(numOnly);
}
Above is assuming you want ALL numerics from the last segment of the URL, which is what you've said your requirement is. That is, if your URL were to look like this:
somesite/somepage/john123doe456"
This will extract 123456.
If you want only the last 5 characters, you could simply use string.Substring() to extract the last five characters.
If you want numerics which are at the end of the string then this would work.
private static void Extract()
{
string uri = "somesite/somepage/john123doe21911";
string last = uri.Substring(uri.LastIndexOf('/') + 1);
string numOnly = Regex.Match(last, #"\d+$").Value;
Console.WriteLine(last);
Console.WriteLine(numOnly);
}
Oh and saying I've come across some stuff on google, but wasn't really sure on how to implement them is a very lazy answer. If you Google you can find countless examples of how to do all these things, even on this site itself. Please from next time onward do your research first and try yourself first.

Related

trim string before character but still keep the remain part after it

So I have this string which I have to trim and manipulate a little with it.
My string example:
string test = "studentName_123.pdf";
Now, what I want to do is somehow extract only the _123 part and at the end I need to have studentName.pdf
What I have tried:
string test_extracted = test.Substring(0, test.LastIndexOf("_") )+".pdf";
This also works but the thing is that I don't want to add the ".pdf" suffix at the end of the string manually because I can have strings that are not pdf, for ex. studentName.docx , studentName.png.
So basically I just want the "_123" part removed but still keep the remain part after that.

I think this might help you:
string test = "studentName_123.pdf";
string test_extracted = test.Substring(0, test.LastIndexOf("_") )+ test.Substring(test.LastIndexOf("."),test.Length - test.LastIndexOf(".") );
Using Remove(int startIndex, int count):
string test = "studentName_123.pdf";
string test_extracted = test.Remove(test.LastIndexOf("_"), test.LastIndexOf(".") - test.LastIndexOf("_"));

Sounds like you mean something like this?
string extension = Path.GetExtension(test);
string pdfName = Path.GetFileNameWithoutExtension(test).Split('_')[0];
string fullName = pdfName + extension;

Since you know what value you will always be replacing in your strings, "_123", to base on your example, just utilize the replace method and replace it with nothing since the method expects two arguments;
string test_extracted = test.replace('_123', '');

This could be solved with a regular expression like this
(\w*)_.*(\.\w*) where the first capture group (\w*) matches everything before the underscore and the second group (\.\w*) matches the file extensions.
Lastly we just have to concat the groups without the stuff inbetween like so:
string test = "studentName_123.pdf";
var regex = Regex.Match(test, #"(\w*)_.*(\.\w*)");
string newString = regex.Groups[1].Value + regex.Groups[2].Value;

How extract Specific names from url?

I already have Listbox full with URLs like this I convert them to String
http://example.com/1392/Music/1392/Shahrivar/21/Avicii%20-%20True/01.%20Avicii%20Ft.%20Aloe%20Blacc%20-%20Wake%20Me%20Up%20(CDQ)%20%5b320%5d.mp3
and I wanna extract for example on this link Name of Song: "Avicii Ft Aloe Blacc -Wake Me Up " I'm using c# I already extract links from a web page and now I only need to extract names from links. thanks already for any suggestions or help.

Try this:
using System;
using System.Linq;
using System.Net;
namespace ConsoleApplication1
{
class Program
{
static void Main (string[] args)
{
var url = "http://example.com/1392/Music/1392/Shahrivar/21/Avicii%20-%20True/01.%20Avicii%20Ft.%20Aloe%20Blacc%20-%20Wake%20Me%20Up%20(CDQ)%20%5b320%5d.mp3";
var uri = new Uri (url);
string[] segments = uri.Segments.Select (x => WebUtility.UrlDecode (x).TrimEnd ('/')).ToArray ();
}
}
}

If you know the structure of the URL you are scraping you should be able to break-off the useless part of the string.
For example, if you know that the URL follows the form:
http://example.com/1392/Music/1392/Shahrivar/21/{Artist}-{Album}/{Track Information}
Roughly, I think the following would allow you to extract the information you want from a single link:
void Main (string[] args)
{
var example = #"http://example.com/1392/Music/1392/Shahrivar/21/Avicii%20-%20True/01.%20Avicii%20Ft.%20Aloe%20Blacc%20-%20Wake%20Me%20Up%20(CDQ)%20%5b320%5d.mp3";
var parts = example.split('/');
var album = parts[7];
var trackInfo = parts[8];
var trackParts = trackInfo.split('-');
var artist = trackParts[0];
var trackTitle = trackParts[1];
Console.WriteLine(trackTitle);
}
Here I am splitting the URL by '/', which is a messy solution, but for a simple case, it works. Then I am finding the index within the tokenized string where the desired information can be found. once I have the track information, I know the convention is to separate the Artist from the Title by a '-', so I split again and then have both artist and title.
You can refactor this into a method which takes the URL, and returns an object containing the Artist and song title. After that, you might want to use a string.Replace on the '%20' with ' '.

First of all, use HttpUtility.DecodeUrl. This function will decode HTML special chars, leaving a plain string to work with. You can then simply split by /.

How can I get a part/subdomain of my URL in C#?

I have a URL like the following
http://yellowcpd.testpace.net
How can I get yellowcpd from this? I know I can do that with string parsing, but is there a builtin way in C#?

Assuming your URLs will always be testpace.net, try this:
var subdomain = Request.Url.Host.Replace("testpace.net", "").TrimEnd('.');
It'll just give you the non-testpace.net part of the Host. If you don't have Request.Url.Host, you can do new Uri(myString).Host instead.

try this
string url = Request.Url.AbsolutePath;
var myvalues= url.Split('.');

How can I get yellowcpd from this? I know I can do that with string
parsing, but is there a builtin way in C#?
.Net doesn't provide a built-in feature to extract specific parts from Uri.Host. You will have to use string manipulation or a regular expression yourself.
The only constant part of the domain string is the TLD. The TLD is the very last bit of the domain string, eg .com, .net, .uk etc. Everything else under that depends on the particular TLD for its position (so you can't assume the next to last part is the "domain name" as, for .co.uk it would be .co

This fits the bill.
Split over two lines:
string rawURL = Request.Url.Host;
string domainName = rawURL .Split(new char[] { '.', '.' })[1];
Or over one:
string rawURL = Request.Url.Host.Split(new char[] { '.', '.' })[1];

The simple answer to your question is no there isn't a built in method to extract JUST the sub-domain. With that said this is the solution that I use...
public enum GetSubDomainOption
{
ExcludeWWW,
IncludeWWW
};
public static class Extentions
{
public static string GetSubDomain(this Uri uri,
GetSubDomainOption getSubDomainOption = GetSubDomainOption.IncludeWWW)
{
var subdomain = new StringBuilder();
for (var i = 0; i < uri.Host.Split(new char[]{'.'}).Length - 2; i++)
{
//Ignore any www values of ExcludeWWW option is set
if(getSubDomainOption == GetSubDomainOption.ExcludeWWW && uri.Host.Split(new char[]{'.'})[i].ToLowerInvariant() == "www") continue;
//I use a ternary operator here...this could easily be converted to an if/else if you are of the ternary operators are evil crowd
subdomain.Append((i < uri.Host.Split(new char[]{'.'}).Length - 3 &&
uri.Host.Split(new char[]{'.'})[i+1].ToLowerInvariant() != "www") ?
uri.Host.Split(new char[]{'.'})[i] + "." :
uri.Host.Split(new char[]{'.'})[i]);
}
return subdomain.ToString();
}
}
USAGE:
var subDomain = Request.Url.GetSubDomain(GetSubDomainOption.ExcludeWWW);
or
var subDomain = Request.Url.GetSubDomain();
I currently have the default set to include the WWW. You could easilly reverse this by switching the optional parameter value in the GetSubDomain() method.
In my opinion this allows for an option that looks nice in code and without digging in appears to be 'built-in' to c#. Just to confirm your expectations...I tested three values and this method will always return just the "yellowcpd" if the exclude flag is used.
www.yellowcpd.testpace.net
yellowcpd.testpace.net
www.yellowcpd.www.testpace.net
One assumption that I use is that...splitting the hostname on a . will always result in the last two values being the domain (i.e. something.com)

As others have pointed out, you can do something like this:
var req = new HttpRequest(filename: "search", url: "http://www.yellowcpd.testpace.net", queryString: "q=alaska");
var host = req.Url.Host;
var yellow = host.Split('.')[1];
The portion of the URL you want is part of the hostname. You may hope to find some method that directly addresses that portion of the name, e.g. "the subdomain (yellowcpd) within TestSpace", but this is probably not possible, because the rules for valid host names allow for any number of labels (see Valid Host Names). The host name can have any number of labels, separated by periods. You will have to add additional restrictions to get what you want, e.g. "Separate the host name into labels, discard www if present and take the next label".

Getting string after a specific slash

I have a string and I want to get whatever is after the 3rd slash so.
I don't know of any other way I can do this, I don't really want to use regex if I dont need it.
http://www.website.com/hello for example would be hello
I have used str.LastIndexOf('/') before like:
string str3 = str.Substring(str.LastIndexOf('/') + 1);
However I am still trying to figure out how to do this for a slash that is not the first or last

string s = "some/string/you/want/to/split";
string.Join("/", s.Split('/').Skip(3).ToArray());

As suggested by C.Evenhuis, you should rely on the native System.Uri class:
string url = "http://stackoverflow.com/questions/20213490/getting-string-after-a-specific-slash"
Uri asUri = new Uri(url);
string result = asUri.LocalPath;
Console.WriteLine(result);
(live at http://csharpfiddle.com/LlLbriBm)
This will output:
/questions/20213490/getting-string-after-a-specific-slash
If you don't want the first / in the result, simply use:
string url = "http://stackoverflow.com/questions/20213490/getting-string-after-a-specific-slash"
Uri asUri = new Uri(url);
string result = asUri.LocalPath.TrimStart('/');
Console.WriteLine(result);
You should take a look in the System.Uri class documentation. There's plenty of property that can you can play with, depending on what you want to actually keep in the url (url parameters, hashtag, etc.)

If you're manipulating URLs, then use the Uri class instead of rolling your own.
But if you want to do it manually for educational reasons, you could do something like this:
int startPos = 0;
for(int i = 0; i < 3; i++)
{
startPos = s.IndexOf('/', startPos)+1;
}
var stringOfInterest = s.Substring(startPos);
There are lots of ways this might fail if the string isn't in the form you expect, so it's just an example to get you started.
Although this is premature optimisation, this sort of approach is more efficient than smashing the whole string into components and putting them back together again.

Regex for replace page number in URL

I'm not good with regex and I'm not able to figure out an applicable solution, so after a good amount of searching I'm still unable to nail this.
I have an URL with an optional page=123 parameter. There can be other optional get parameters in the url too which can occur before or after the page parameter.
I need to replace that parameter to page=--PLACEHOLDER-- to be able to use it with my paging function.
If the page parameter does not occur in the url, I would like to add it the way described before.
I'm trying to write an extension method for on string for this, but a static function would be just as good.
A bit of explanation would be appreciated too, as it would give me a good lesson in regex and hopefully next time I wouldn't have to ask.
Also I'm using asp.net mvc-3 but for compatibility reasons a complex rewrite occurs before the mvc-s routing, and I'm not able to access that. So please don't advise me to use mvc-s routing for this because I cant.

I suggest skipping the regex and using another approach:
Extract the querystring from the url.
Build a HttpValueCollection from the querystring using HttpUtility.ParseQueryString
Replace the page parameter in the collection.
Call .ToString() on the collection and you get a new querystring back.
Construct the altered url using the original minus the old querystring plus the new one.
Something like:
public static string SetPageParameter(this string url, int pageNumber)
{
var queryStartIndex = url.IndexOf("?") + 1;
if (queryStartIndex == 0)
{
return string.Format("{0}?page={1}", url, pageNumber);
}
var oldQueryString = url.Substring(queryStartIndex);
var queryParameters = HttpUtility.ParseQueryString(oldQueryString);
queryParameters["page"] = pageNumber;
return url.Substring(0, queryStartIndex) + queryParameters.ToString();
}
I haven't verified that this compiles, but it should give you an idea.

You want it as a static method with regular expression, here is a first state :
public static string ChangePage(string sUrl)
{
string sRc = string.Empty;
const string sToReplace = "&page=--PLACEHOLDER--";
Regex regURL = new Regex(#"^http://.*(&?page=(\d+)).*$");
Match mPage = regURL.Match(sUrl);
if (mPage.Success) {
GroupCollection gc = mPage.Groups;
string sCapture = gc[1].Captures[0].Value;
// gc[2].Captures[0].Value) is the page number
sRc = sUrl.Replace(sCapture, sToReplace);
}
else {
sRc = sUrl+sToReplace;
}
return sRc;
}
With a small test :
static void Main(string[] args)
{
string sUrl1 = "http://localhost:22666/HtmlEdit.aspx?mid=0&page=123&test=12";
string sUrl2 = "http://localhost:22666/HtmlEdit.aspx?mid=0&page=125612";
string sUrl3 = "http://localhost:22666/HtmlEdit.aspx?mid=0&pager=12";
string sUrl4 = "http://localhost:22666/HtmlEdit.aspx?page=12&mid=0";
string sRc = string.Empty;
sRc = ChangePage(sUrl1);
Console.WriteLine(sRc);
sRc = ChangePage(sUrl2);
Console.WriteLine(sRc);
sRc = ChangePage(sUrl3);
Console.WriteLine(sRc);
sRc = ChangePage(sUrl4);
Console.WriteLine(sRc);
}
which give the result :
http://localhost:22666/HtmlEdit.aspx?mid=0&page=--PLACEHOLDER--&test=12
http://localhost:22666/HtmlEdit.aspx?mid=0&page=--PLACEHOLDER--
http://localhost:22666/HtmlEdit.aspx?mid=0&pager=12&page=--PLACEHOLDER--
http://localhost:22666/HtmlEdit.aspx?&page=--PLACEHOLDER--&mid=0

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Retrieving a portion of a url - c#

Related

trim string before character but still keep the remain part after it

How extract Specific names from url?

How can I get a part/subdomain of my URL in C#?

Getting string after a specific slash

Regex for replace page number in URL

Categories

Resources