How to extract first segment out of URI using Uri class

I am using the Uri class in my application and need the first segment of a user-entered URI, to check whether it starts with http://, https://, ftp://, etc.
If it doesn't, I have to add one myself.
I have already searched Google and Stack Overflow, but nothing I found matched my exact requirement.
string path,downloadURL;
path = this.savePath.Text;
downloadURL = this.downloadURL.Text;
// I have done this, but it doesn't check whether a scheme is already present.
downloadURL = "http://" + downloadURL;
Uri tmp = new Uri(downloadURL);
//extracts the last element
string EndPathFileName = tmp.Segments.Last();
// something like this but it returns only '/'.
//string StartPathFileName = tmp.Segments.First();
//Console.WriteLine(StartPathFileName);
Any suggestions?

Well there are a few options depending on what behavior you want...
You could just check if it contains :// which might be enough for what you want:
if(!downloadURL.Contains("://"))
downloadURL = "http://" + downloadURL;
Note that this would allow things such as "rubbish://www.example.com"
If you wanted to be a bit more cautious, you could check if the string starts with one of your predetermined values. For example:
if(!downloadURL.StartsWith("http://") && !downloadURL.StartsWith("https://") && !downloadURL.StartsWith("ftp://"))
downloadURL = "http://" + downloadURL;
Though this would mean that "rubbish://www.example.com" would become "http://rubbish://www.example.com".
You could go for a mix of both options, but keep in mind it can become very difficult to cope with all kinds of user input.
One final suggestion, which is even more robust, might be as follows:
string[] approvedSchemes = new string[] { "http", "https", "ftp" };
string userScheme = "";
if(downloadURL.Contains("://"))
{
// Get the first scheme defined, we will use this if it is in the approved list.
userScheme = downloadURL.Substring(0, downloadURL.IndexOf("://"));
// To cater for multiple :// remove all of them
downloadURL = downloadURL.Substring(downloadURL.LastIndexOf("://") + 3);
}
// Check if the user defined scheme is in the approved list, if not then set to http.
if(Array.IndexOf(approvedSchemes, userScheme.ToLowerInvariant()) > -1)
downloadURL = userScheme + "://" + downloadURL;
else
downloadURL = "http://" + downloadURL;

You need to use the Uri.Scheme property:
Uri baseUri = new Uri("http://www.contoso.com/");
Console.WriteLine(baseUri.Scheme); //http
Uri anotherUri = new Uri("https://www.contoso.com/");
Console.WriteLine(anotherUri.Scheme); //https
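Combining the two answers above, a minimal sketch might let Uri do the parsing and then compare Uri.Scheme against an allow-list. The SchemeHelper class, the Normalize name, and the contents of the list are assumptions made for illustration; they are not part of either answer.

```csharp
using System;
using System.Linq;

public static class SchemeHelper
{
    // Hypothetical allow-list; adjust to the schemes your application supports.
    private static readonly string[] ApprovedSchemes = { "http", "https", "ftp" };

    // Returns a parsed Uri, or null when the input cannot be parsed
    // or uses a scheme that is not on the allow-list.
    public static Uri Normalize(string input)
    {
        string candidate = (input ?? string.Empty).Trim();

        // Assume http when the user typed no scheme at all.
        if (!candidate.Contains("://"))
        {
            candidate = "http://" + candidate;
        }

        Uri uri;
        if (Uri.TryCreate(candidate, UriKind.Absolute, out uri) &&
            ApprovedSchemes.Contains(uri.Scheme, StringComparer.OrdinalIgnoreCase))
        {
            return uri;
        }
        return null;
    }
}
```

Returning null for something like "rubbish://www.example.com" leaves it to the caller to decide whether to reject the input or fall back to http, which seems safer than silently rewriting it.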

Related

Remove additional slashes from URL

In ASP.NET it is possible to get the same content from almost identical URLs such as
localhost:9000/Index.aspx and localhost:9000//Index.aspx or even localhost:9000///Index.aspx
But that doesn't look good to me.
How can I remove these additional slashes before the user reaches a page, and where should I do it?

Use this:
url = Regex.Replace(url, @"/+", @"/");
It handles any number of consecutive slashes.

see
https://stackoverflow.com/a/19689870
https://msdn.microsoft.com/en-us/library/9hst1w91.aspx#Examples
You need to specify a base Uri and a relative path to get the canonicalized behavior.
Uri baseUri = new Uri("http://www.contoso.com");
Uri myUri = new Uri(baseUri, "catalog/shownew.htm");
Console.WriteLine(myUri.ToString());

This solution is not that pretty but very easy
do
{
url = url.Replace("//", "/");
}
while(url.Contains("//"));
This will work for many slashes in your url but the runtime is not that great.

Remove them, for example:
url = url.Replace("///", "/").Replace("//", "/");

Regex.Replace("http://localhost:3000///////asdasd/as///asdasda///asdasdasd//", @"/+", @"/").Replace(":/", "://")
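The replace-then-repair approach above can also be written as a single pattern. The following is a sketch only: the SlashCleaner and Collapse names are made up for illustration, and the pattern assumes the only "//" worth keeping is the one right after the scheme's colon. It would also collapse "//" inside a query string, so apply it to the path portion if that matters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlashCleaner
{
    // Collapses every run of two or more slashes into a single slash.
    // The negative lookbehind stops a match from starting directly after
    // a colon, so the "//" in "scheme://" survives.
    public static string Collapse(string url)
    {
        return Regex.Replace(url, @"(?<!:)/{2,}", "/");
    }
}
```

Unlike the do/while loop, this makes one pass over the string regardless of how many slashes are repeated.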

Here is a code snippet for combining URL segments that also removes duplicate slashes:
public class PathUtils
{
public static string UrlCombine(params string[] components)
{
var isUnixPath = Path.DirectorySeparatorChar == '/';
for (var i = 1; i < components.Length; i++)
{
if (Path.IsPathRooted(components[i])) components[i] = components[i].TrimStart('/', '\\');
}
var url = Path.Combine(components);
if (!isUnixPath)
{
url = url.Replace(Path.DirectorySeparatorChar, '/');
}
return Regex.Replace(url, @"(?<!(http:|https:))//", @"/");
}
}

C# , splitting parts unclear

I have seen some topics about this already, but they were a bit unclear about what I want.
I'm making a special web browser in C#,
and I want to split the TextBox text.
Like:
Textbox1.Text = "http://google.com/lol";
I want,
Label1.Text = "http://google.com";
And
Label2.Text = "/lol";
But I want it to detect the URL from TextBox1.Text, not just google.com but every URL.
Like:
[
Label1.Text = Label1.Text("ignorefrom/");
Label2.Text = Label2.Text("ignorefromno/");
]
Of course the thing above isn't possible, but that's basically what I mean.
Does anyone know how I can do that?
Possibly a better explanation:
I'm making a web browser and I want to detect the
URL, for example: http://google.com/lol
I want the first and second parts of a URL in labels,
so: http://google.com in Label1 and /lol in Label2, for every URL.
I have seen multiple topics about this, but they were a bit different from my case.

You should check out the documentation for the .NET class Uri. It has the functionality you're looking for.
Example:
var url = new Uri("http://www.google.com/some/path/file.aspx");
Console.WriteLine(url.Host); // prints www.google.com
Console.WriteLine(url.AbsolutePath); // prints /some/path/file.aspx
Properties used in this example:
Host
AbsolutePath

I would use the right tool, in this case you could use Uri.TryCreate:
string url = "http://google.com/lol";
Uri uri;
if (Uri.TryCreate(url, UriKind.Absolute, out uri))
{
Textbox1.Text = uri.Scheme + Uri.SchemeDelimiter + uri.Host;
Label1.Text = uri.AbsolutePath;
}
If you want to include the query as mentioned in a comment you should use uri.PathAndQuery. Then "http://google.com/lol?lol=1" returns /lol?lol=1 as desired.
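As a sketch of how the two halves could be produced in one place, the helper below uses Uri.GetLeftPart(UriPartial.Authority) for the scheme-and-host part and Uri.PathAndQuery for the rest. The UrlSplitter class and TrySplit method are hypothetical names, not something from the answer above.

```csharp
using System;

public static class UrlSplitter
{
    // Splits an absolute URL into "scheme://host[:port]" and "/path?query".
    // Returns false (and leaves both outputs null) when the text is not
    // an absolute URI.
    public static bool TrySplit(string url, out string root, out string pathAndQuery)
    {
        root = null;
        pathAndQuery = null;

        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return false;
        }

        root = uri.GetLeftPart(UriPartial.Authority);
        pathAndQuery = uri.PathAndQuery;
        return true;
    }
}
```

In the question's terms, root would go into Label1.Text and pathAndQuery into Label2.Text. GetLeftPart also keeps a non-default port, which the manual Scheme + SchemeDelimiter + Host concatenation drops.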

This should give the result you want:
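using System;
Uri uri = new Uri("http://google.com/lol");
// GetLeftPart(UriPartial.Authority) returns only scheme and host; AbsoluteUri would return the whole URL.
Label1.Text = uri.GetLeftPart(UriPartial.Authority);
Label2.Text = uri.PathAndQuery;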

Get only the prefix from a host name in a given URL [duplicate]

I need to get the domain name without the top level domain suffix of a given url.
e.g
Url :www.google.com then output=google
Url :http://www.google.co.uk/path1/path2 then output=google
Url :http://google.co.uk/path1/path2 then output=google
Url :http://google.com then output=google
Url :http://google.co.in then output=google
Url :http://mail.google.co.in then output=google
For that I tried this code:
var uri = new Uri("http://www.google.co.uk/path1/path2");
var sURL = uri.Host;
string[] aa = sURL.Split('.');
MessageBox.Show(aa[1]);
But I can't get the correct output every time (especially for URLs without www). I then searched on Google and tried to solve it, but that didn't help. I also saw the related question on Stack Overflow, but it didn't work for me.

This answer is just for completeness, because I think it would be a valid approach if it weren't so complicated and didn't essentially abuse the DNS system. Note that this isn't 100% foolproof either (and it requires access to a DNS server).
Extract the full domain name of the URL. Let's take http://somepart.subdomain.example.org/some/files as an example. We'd get somepart.subdomain.example.org.
Split the domain name at dots: {"somepart", "subdomain", "example", "org"}.
Take the rightmost part (org) and see whether it is a known (top level) domain name.
If it is, the next part to the left is the domain name you're looking for.
If it isn't, try to retrieve an IP for this.
If there's an IP for it, the last added part is your domain name.
If there isn't an IP either, add the next part to the left and repeat these checks (in this example you'd now test for example.org).

The right answer to your question is: no, you can't.
The only approach that comes close, in a dirty and hard-to-maintain way, is to keep a list of all existing top-level domains (you can find an incomplete one in this SO answer):
var allTld = new[] {".com", ".it",".co.uk"}; // in practice this would be a very large list of all TLDs
string urlToCheck = "www.google.com";//sports-ak.espn.go.com/nfl/ http://www.google.co.uk/path1/path2
if (!urlToCheck.StartsWith("http", StringComparison.OrdinalIgnoreCase))
{
urlToCheck = string.Concat("http://", urlToCheck);
}
var uri = new Uri(urlToCheck);
string domain = string.Empty;
for (int i = 0; i < allTld.Length; i++)
{
var index = uri.Host.LastIndexOf(allTld[i], StringComparison.OrdinalIgnoreCase);
if (index>-1)
{
domain = uri.Host.Substring(0, index);
index = domain.LastIndexOf(".", StringComparison.Ordinal);
if (index>-1)
{
domain = domain.Substring(index + 1);break;
}
}
}
if (string.IsNullOrEmpty(domain))
{
throw new Exception(string.Format("TLD of url {0} is missing", urlToCheck));
}
IMHO You should ask yourself: Do I really need the name without the TLD?
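A slightly tighter variant of the suffix-list idea is sketched below. It matches with EndsWith rather than LastIndexOf and tries the longest suffix first, so "co.uk" wins over "uk". The DomainName class, the GetName method, and the tiny suffix list used in the example are assumptions for illustration; a real list would be far larger.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainName
{
    // Returns the label immediately to the left of the longest matching
    // suffix, or null when no suffix in the list matches the host.
    public static string GetName(string host, IEnumerable<string> suffixes)
    {
        foreach (string suffix in suffixes.OrderByDescending(s => s.Length))
        {
            if (host.EndsWith("." + suffix, StringComparison.OrdinalIgnoreCase))
            {
                // Strip ".suffix", then keep only the last remaining label.
                string rest = host.Substring(0, host.Length - suffix.Length - 1);
                int dot = rest.LastIndexOf('.');
                return dot >= 0 ? rest.Substring(dot + 1) : rest;
            }
        }
        return null;
    }
}
```

For example, DomainName.GetName(new Uri("http://mail.google.co.in").Host, new[] { "com", "uk", "in", "co.uk", "co.in" }) yields "google". The result is still only as good as the suffix list that is passed in.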

This is the best you can get. It's not a maintainable solution, and it is not a fast one (GetDomain.GetDomainFromUrl should be optimized).
Use GetDomain.GetDomainFromUrl.
In TldPatterns.EXACT, add "co.uk" (I don't know why it isn't there in the first place).
Do some other minor string manipulation.
This is what it should look like:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
class TldPatterns
{
private TldPatterns()
{
// Prevent instantiation.
}
/**
* If a hostname is contained in this set, it is a TLD.
*/
static public string[] EXACT = new string[] {
"gov.uk",
"mil.uk",
"co.uk",
//...
};
}
public class Program
{
static void Main(string[] args)
{
string[] urls = new[] {"www.google.com", "http://www.google.co.uk/path1/path2 ", "http://google.co.uk/path1/path2 ",
"http://google.com", "http://google.co.in"};
foreach (var item in urls)
{
string url = item;
if (!Regex.IsMatch(item, "^\\w+://"))
url = "http://" + item;
var domain = GetDomain.GetDomainFromUrl(url);
Console.WriteLine("Original : " + item);
Console.WriteLine("URL : " + url);
Console.WriteLine("Domain : " + domain);
Console.WriteLine("Domain Part : " + domain.Substring(0, domain.IndexOf('.')));
Console.WriteLine();
}
}
}
Outputs:
Original : www.google.com
URL : http://www.google.com
Domain : google.com
Domain Part : google
Original : http://www.google.co.uk/path1/path2
URL : http://www.google.co.uk/path1/path2
Domain : google.co.uk
Domain Part : google
Original : http://google.co.uk/path1/path2
URL : http://google.co.uk/path1/path2
Domain : google.co.uk
Domain Part : google
Original : http://google.com
URL : http://google.com
Domain : google.com
Domain Part : google
Original : http://google.co.in
URL : http://google.co.in
Domain : google.co.in
Domain Part : google

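I have tested the following regex against your cases. It captures the first host label after an optional www. prefix, so a host with another subdomain, such as mail.google.co.in, would give mail rather than google.
string url = "http://www.google.co.uk/path1/path2";
Regex rgx = new Regex(@"(http(s?)://)?(www\.)?((?<content>.*?)\.){1}([\w]+\.?)+");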
Match MatchResult = rgx.Match(url);
string result = MatchResult.Groups["content"].Value; //google

String that came from Request.Url.ToString() mysteriously changes to another string when manipulating/comparing the first characters

I'm aware that there are easier ways to do this and, believe me, I've tried them. I'm of course open to any suggestions =). You don't need to read the whole code, just the part that says where the problem lies. Also, I'm debugging Perl-style (printing each step) so you can see what happens. Oh, and did I mention that in my development environment everything works as intended?
Here's the code:
string GetPortalAlias()
{
String myURL2 = Request.Url.ToString();
URLLabel.Text = "Original Request.Url.ToString() returned: \"" + myURL2 + "\"";
string myURL = string.Copy(myURL2);
URLLabel.Text = "Copying it to myURL, it's now: \"" + myURL + "\"";
myURL = myURL.ToLower().Trim();
URLLabel.Text += "<br>Trimming and ToLower myURL.<br>The new url is \"" + myURL + "\"" + "<br>";
myURL = myURL.Replace(":80", "");
URLLabel.Text += "Replacing the \":80\".<br> The new url is\"" + myURL + "\"<br>";
//***HERE LIES THE PROBLEM***
myURL = myURL.Replace("http://", "");
URLLabel.Text += "Replacing the \"http://\".<br> The new url is\"" + myURL + "\"<br>";
//***PROBLEM ENDS***
myURL = myURL.Remove(myURL.IndexOf("/"));
URLLabel.Text += "Removing everything after the \"/\"." + "<br> The new url is \"" + myURL + "\"<br>";
URLLabel.Text += "<br>GetPortalAlias Returning \"" + myURL + "\"";
return myURL;
}
Believe it or not, the output produced in the webpage is this:
Copying it to myURL, it's now: "http://sar.smg.com.ar/Default.aspx?TabID=912"
Trimming and ToLower myURL.
The new url is "http://sar.smg.com.ar/default.aspx?tabid=912"
Replacing the ":80".
The new url is"http://sar.smg.com.ar/default.aspx?tabid=912"
Replacing the "http://".
The new url is"intranetqa/default.aspx?tabid=912"
Removing everything after the "/".
The new url is "intranetqa"
GetPortalAlias Returning "intranetqa"
So... for some reason whenever it reaches the replace section it mysteriously mutates to start with "intranetqa" instead of "sar.smg.com.ar". "intranetqa" is our default hostname. CHANGING OR TAKING AWAY ANY CHARACTER OF HTTP:// IN ANY WAY MUTATES THE STRING.
I do a string.Copy because I'm aware that if two strings are equal the compiler stores them in the same place, so I wanted to rule that out. Taking those lines away and assigning Request.Url.ToString() to myURL directly changes nothing at all. They were just a test to see if that helped.
Here's a list of the things I've tried:
All combinations of string / String, none worked.
I've tried Request.Url.Host and it just gave me "intranetqa".
I've used Request.Url.AbsoluteUri, and that's why I have the replace :80 line.
Using the .ToCharArray function gives me back the intranetqa thing.
myURL = myURL.Substring(6) gives back the intranetqa thing.
string.Contains("sar.smg.com.ar") gives back false.
I believe the trick lies around here:
Uri uriAddress1 = Request.Url; and "The parts are <br>" + "Part 1: " + uriAddress1.Segments[0] + "<br>Part 2: " + uriAddress1.Segments[1]; gives Part 1: "/" and Part 2: "Default.aspx". Trying to access part 3 (index 2) throws an exception.
Request.Url does not have the first part, but when I call the ToString() method, it seems to have a "fake" first part.

Between your browser and the server are a reverse proxy and an output re-writer. These may be the same component, or separate components.
The URL your server actually sees is always of the form http://intranetqa/default.aspx?tabid=912 (after the reverse proxy/URL re-writer has intercepted the request).
The output your server produces is actually like:
Copying it to myURL, it's now: "http://intranetqa/Default.aspx?TabID=912"
Trimming and ToLower myURL.
The new url is "http://intranetqa/default.aspx?tabid=912"
Replacing the ":80".
The new url is"http://intranetqa/default.aspx?tabid=912"
Replacing the "http://".
The new url is"intranetqa/default.aspx?tabid=912"
Removing everything after the "/".
The new url is "intranetqa"
GetPortalAlias Returning "intranetqa"
The output re-writer is inspecting the output from your server and doing a replace of http://intranetqa with http://sar.smg.com.ar. Once you strip the http:// off of the front of these strings, it's no longer a match and so replacement no longer occurs.
If you want to know what the original requesting URL/host are, hopefully the reverse proxy either already adds, or can be configured to add, an extra header to the request with the original URL.
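If the proxy can be configured that way, reading the header might look like the sketch below. Everything specific here is an assumption: the header name X-Forwarded-Host is a common convention but your proxy may use a different one, and the ForwardedHost class and Resolve method are made-up names. The helper takes a plain NameValueCollection so it can be tried outside ASP.NET.

```csharp
using System;
using System.Collections.Specialized;

public static class ForwardedHost
{
    // Returns the host the client originally asked for, based on a
    // proxy-supplied header (assumed here to be "X-Forwarded-Host"),
    // or fallbackHost when the header is absent or empty.
    public static string Resolve(NameValueCollection headers, string fallbackHost)
    {
        string forwarded = headers["X-Forwarded-Host"];
        if (string.IsNullOrWhiteSpace(forwarded))
        {
            return fallbackHost;
        }

        // A chain of proxies may append several hosts separated by commas;
        // the first entry is treated as the one the client used.
        return forwarded.Split(',')[0].Trim();
    }
}
```

Inside a page this could be called as ForwardedHost.Resolve(Request.Headers, Request.Url.Host). Only trust such a header when the request is known to come through your own proxy, since clients can send it too.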

You can try something like this
Uri uriAddress1 = new Uri("http://www.contoso.com/title/index.htm");
Console.WriteLine("The parts are {0}, {1}, {2}", uriAddress1.Segments[0], uriAddress1.Segments[1], uriAddress1.Segments[2]);
See the Uri.Segments property.
This is a better way to handle URIs and their segments.

Try to use this property instead:
String myURL2 = Request.Url.AbsoluteUri;

Here is an Extension method that I use to pull the SiteRootPath. You should be able to easily adjust it however you need it. You will need access to the HttpContext for what I currently have below, however, you don't sound like you need that.
using System;
using System.Web;
namespace FlixPicks.Web.Extensions
{
public static class HttpContextExtensions
{
public static string SiteRootPath(this HttpContext context)
{
if (context == null || context.Request == null) { return null; }
return context.Request.Url.SiteRootPath(context.Request.ApplicationPath);
}
public static string SiteRootPath(this HttpContextBase context)
{
return context.Request.Url.SiteRootPath(context.Request.ApplicationPath);
}
private static string SiteRootPath(this Uri url, string applicationPath)
{
if (url == null) { return null; }
// Formatting the fully qualified website url/name.
string appPath = string.Format(
"{0}://{1}{2}{3}",
url.Scheme,
url.Host,
url.Port == 80 ? string.Empty : ":" + url.Port,
applicationPath);
// Remove ending slash(es) if one or more exist, to consistently return
// a path without an ending slash. Could just as well have chosen to always include an ending slash.
while (appPath.EndsWith("/") || appPath.EndsWith("\\"))
{
appPath = appPath.Substring(0, appPath.Length - 1);
}
return appPath;
}
}
}
Good luck,
Tom

Don't you want to achieve part of what is done here?
Something like
string host = Request.Url.IsDefaultPort ?
Request.Url.Host :
Request.Url.Authority;
If you want to persist with the old method change it like this
string GetPortalAlias()
{
var rawUrl = Request.Url.ToString();
var lowerTrimmedUrl = rawUrl.ToLower().Trim();
var withoutPortUrl = lowerTrimmedUrl.Replace(":80", "");
var withoutProtocolUrl = withoutPortUrl.Replace("http://", "");
var justHostUrl = withoutProtocolUrl.Remove(withoutProtocolUrl.IndexOf("/"));
var evolution = new StringBuilder();
evolution.AppendFormat(
"{0}<br>",
HttpUtility.HtmlEncode(rawUrl));
evolution.AppendFormat(
"{0}<br>",
HttpUtility.HtmlEncode(lowerTrimmedUrl));
evolution.AppendFormat(
"{0}<br>",
HttpUtility.HtmlEncode(withoutPortUrl));
evolution.AppendFormat(
"{0}<br>",
HttpUtility.HtmlEncode(withoutProtocolUrl));
evolution.AppendFormat(
"{0}<br>",
HttpUtility.HtmlEncode(justHostUrl));
URLLabel.Text = evolution.ToString();
return justHostUrl;
}
So you can see what's going on.

What is the quickest way to get the absolute uri for the root of the app in asp.net?

What is the simplest way to get: http://www.[Domain].com in asp.net?
There doesn't seem to be one method which can do this, the only way I know is to do some string acrobatics on server variables or Request.Url. Anyone?

We can use Uri and its base-URI constructor:
new Uri(this.Request.Url, "/") for the root of the website
new Uri(this.Request.Url, this.ResolveUrl("~/")) for the root of the application

You can do it like this:
string.Format("{0}://{1}:{2}", Request.Url.Scheme, Request.Url.Host, Request.Url.Port)
And you'll get the generic URI syntax <protocol>://<host>:<port>

You can use something like this.
System.Web.HttpContext.Current.Server.ResolveUrl("~/")
It maps to the root of the application. If you are inside a virtual directory, you will need to do a bit more work.
Edit
Old posting contained incorrect method call!

I like the way CMS handled this question best, using String.Format and the Page.Request variables. I'd just like to tweak it slightly. I just tested it on one of my pages, so I'll copy the code here:
String baseURL = string.Format(
(Request.Url.Port != 80) ? "{0}://{1}:{2}" : "{0}://{1}",
Request.Url.Scheme,
Request.Url.Host,
Request.Url.Port);

System.Web.UI.Page.Request.Url

this.Request.Url.Host

I use this property on Page to handle virtual directories and default ports:
string FullApplicationPath {
get {
StringBuilder sb = new StringBuilder();
sb.AppendFormat("{0}://{1}", Request.Url.Scheme, Request.Url.Host);
if (!Request.Url.IsDefaultPort)
sb.AppendFormat(":{0}", Request.Url.Port);
if (!string.Equals("/", Request.ApplicationPath))
sb.Append(Request.ApplicationPath);
return sb.ToString();
}
}

This method handles http/https, port numbers and query strings.
'Returns current page URL
Function fullurl() As String
Dim strProtocol, strHost, strPort, strurl, strQueryString As String
strProtocol = Request.ServerVariables("HTTPS")
strPort = Request.ServerVariables("SERVER_PORT")
strHost = Request.ServerVariables("SERVER_NAME")
strurl = Request.ServerVariables("url")
strQueryString = Request.ServerVariables("QUERY_STRING")
If strProtocol = "off" Then
strProtocol = "http://"
Else
strProtocol = "https://"
End If
If strPort <> "80" Then
strPort = ":" & strPort
Else
strPort = ""
End If
If strQueryString.Length > 0 Then
strQueryString = "?" & strQueryString
End If
Return strProtocol & strHost & strPort & strurl & strQueryString
End Function

I had to deal with something similar: I needed a way to programmatically set the tag to point to my website root.
The accepted solution wasn't working for me because of localhost and virtual directories.
So I came up with the following solution; it works on localhost with or without virtual directories and, of course, under IIS websites.
string.Format("{0}://{1}:{2}{3}", Request.Url.Scheme, Request.Url.Host, Request.Url.Port, ResolveUrl("~"))

Combining the best of what I've seen on this question so far, this one takes care of:
http and https
standard ports (80, 443) and non standard
application hosted in a sub-folder of the root
string url = String.Format(
Request.Url.IsDefaultPort ? "{0}://{1}{3}" : "{0}://{1}:{2}{3}",
Request.Url.Scheme, Request.Url.Host,
Request.Url.Port, ResolveUrl("~/"));
