How to get website name from domain name? - c#

I fetch the domain from the URL as follows:
var uri = new Uri("Http://www.google.com");
var host = uri.Host;
//host ="www.google.com"
But I want only google.com in Host,
host = "google.com"

Given the accepted answer, I guess the issue was not knowing how to manipulate strings rather than how to deal with URIs, but for anyone else who ends up here:
The Uri class does not have a property for this, so you will have to parse it yourself.
Presumably you do not know the subdomain ahead of time, so a simple replace may not be possible.
This is not trivial, since TLDs are so varied (http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) and there may be multiple parts to the host name (e.g. http://pre.subdomain.domain.co.uk).
You will have to decide exactly what you want to get and how complex you want the solution to be:
simple - do a string replace, see ekad's answer
medium - a regex that works most of the time, see Strip protocol and subdomain from a URL
complex - refer to a list of suffixes in order to figure out what is subdomain and what is domain, e.g. Get the subdomain from a URL
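As a rough illustration of the "complex" option, here is a minimal sketch that looks a host up against a suffix list. The class name DomainParser, the method GetRegistrableDomain and the five-entry suffix set are all hypothetical and only there for the example; real code would load a complete list such as the Public Suffix List.

```csharp
using System;
using System.Collections.Generic;

public static class DomainParser
{
    // Hypothetical, deliberately tiny suffix list for illustration only.
    // A real implementation would load a complete, maintained list.
    private static readonly HashSet<string> KnownSuffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "com", "org", "net", "uk", "co.uk"
        };

    // Returns the longest matching suffix plus the one label to its left,
    // or null when no suffix in the list matches.
    public static string GetRegistrableDomain(string host)
    {
        string[] labels = host.Split('.');

        // Start at index 1 so at least one label remains left of the suffix;
        // lower indexes mean longer candidate suffixes, so the longest wins.
        for (int i = 1; i < labels.Length; i++)
        {
            string suffix = string.Join(".", labels, i, labels.Length - i);
            if (KnownSuffixes.Contains(suffix))
            {
                return labels[i - 1] + "." + suffix;
            }
        }
        return null;
    }
}
```

With this sketch, DomainParser.GetRegistrableDomain(new Uri("http://pre.subdomain.domain.co.uk").Host) gives "domain.co.uk", and "www.google.com" gives "google.com".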

If host begins with "www.", you can strip that prefix. String.Replace would also remove any later "www." inside the host, so take the substring after the prefix instead:
var uri = new Uri("Http://www.google.com");
var host = uri.Host.ToLower();
if (host.StartsWith("www."))
{
    host = host.Substring("www.".Length);
}

Related

How to maintain the right URL in C#/ASP.NET?

I have been given some code. One of its pages shows search results, and after listing the items it lets the user click on one of the records, which is expected to bring up a page where the selected record can be modified.
However, when it tries to bring up that page, IE shows "This page cannot be displayed".
It is obvious the URL is wrong, because first I see something like http://www.Something.org/Search.aspx and then it turns into http://localhost:61123/ProductPage.aspx
I searched the code and found the following line, which I think is the cause. Now the question I have to ask:
What should I do to avoid using a static URL and make it dynamic, so that it always points to the right domain?
string url = string.Format("http://localhost:61123/ProductPage.aspx?BC={0}&From={1}", barCode, "Search");
Response.Redirect(url);
Thanks.
Use HttpContext.Current.Request.Url in your code to see the current URL. It is a Uri that exposes many things, including Host. Host alone has no scheme or port, so for a redirect prefix use GetLeftPart(UriPartial.Authority), which returns the scheme, host and port together.
By the way, if you're using C# 6 or later (it shipped alongside .NET 4.6) you can create the string like so:
string baseUrl = HttpContext.Current.Request.Url.GetLeftPart(UriPartial.Authority);
string url = $"{baseUrl}/ProductPage.aspx?BC={barCode}&From=Search";
Or you can use string.Format:
string baseUrl = HttpContext.Current.Request.Url.GetLeftPart(UriPartial.Authority);
string url = string.Format("{0}/ProductPage.aspx?BC={1}&From={2}", baseUrl, barCode, "Search");
You can store the host segment in the appSettings section of your Web.config file under a key such as HostUrlSegment, with one value per configuration/environment:
the Debug / Development Web.config holds the localhost value
the Production / Release Web.config uses a config override to replace the localhost value with the something.org host
Then use it in your code like so:
// Creates a URI using the HostUrlSegment set in the current web.config
Uri hostUri = new Uri(ConfigurationManager.AppSettings.Get("HostUrlSegment"));
// Combines the host with the other URL segments (similar to Path.Combine(..))
// to construct a proper URL. The $ prefix is C# string interpolation
// (makes for readable code).
Uri fullUri = new Uri(hostUri, $"ProductPage.aspx?BC={barCode}&From=Search");
// fullUri.AbsoluteUri will contain the proper URL
Response.Redirect(fullUri.AbsoluteUri);
The Uri class has a lot of useful properties and methods that give you the relative URL, the absolute URL, the fragment, the host name and so on.
This should do it.
string url = string.Format("ProductPage.aspx?BC={0}&From={1}", barCode, "Search");
Response.Redirect(url);
If you are using C# 6 or later, you can also use this string interpolation version:
string url = $"ProductPage.aspx?BC={barCode}&From=Search";
Response.Redirect(url);
You should just be able to omit the host name to stay on the current domain.
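One thing the relative redirect does not cover is escaping. If barCode can contain characters such as & or spaces, the query string breaks. A minimal sketch of building the relative URL with Uri.EscapeDataString follows; the class ProductLinks and the method BuildProductUrl are hypothetical names used only for this example.

```csharp
using System;

public static class ProductLinks
{
    // Builds a relative URL (no scheme or host), so a redirect stays on
    // whatever domain served the current request.
    public static string BuildProductUrl(string barCode, string from)
    {
        return string.Format(
            "ProductPage.aspx?BC={0}&From={1}",
            Uri.EscapeDataString(barCode),   // escape reserved characters in the value
            Uri.EscapeDataString(from));
    }
}
```

In the page you would then call Response.Redirect(ProductLinks.BuildProductUrl(barCode, "Search")).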

strip off prefix of URL's

I have got the following log of URL strings. The logs contain millions of records.
www.example.com/p1?q=k
example.com/p1?q=k
http://example.com/p1?q=k
https://example.com/p1?q=k
http://www.example.com/p1?q=k
I used the C# Uri class, but it throws a format exception for entries such as "example.com/p1?q=K".
I was wondering if there is a generally accepted, standard method for dealing with such different types of URL to get the website name and the relative URL.
P.S.: I could strip off http:// and https:// by using a regex or string comparison, but I am curious to know if there are any elegant solutions.
If you try this with your existing examples as they are, it will not work for all of them. However, you can play around with it and add some prepending code where needed, which means you will need a few variables to store the http://, https:// and www. prefixes.
System.Uri uriPre = new Uri ("http://www.example.com/p1?q=k");
string uriString = uriPre.Host + uriPre.PathAndQuery;
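// Strip only a leading "www."; Replace would also hit "www." later in the string.
if (uriString.StartsWith("www.")) uriString = uriString.Substring(4);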
yields
"example.com/p1?q=k"
You will have to figure out the rest of the code yourself, because only you know when to use the different protocols; the example above is just a starting point.
To expand on Alexei Levenkov's answer, here is an example that uses Uri.TryCreate to try to create a new Uri:
Uri tempValue;
if (Uri.TryCreate("example.com/p1?q=k", UriKind.Relative, out tempValue))
{
    // do something with tempValue, or return it
}
Uri is the class that is designed to deal with URIs:
var noSchemaRelativeUri = new Uri("example.com/foo", UriKind.Relative);
Either UriBuilder or Uri(Uri base, Uri relative) can be used to construct an absolute Uri.
To pick between relative and absolute you can use Uri.TryCreate.
Note: strictly speaking, "www.example.com" and "example.com" are unrelated domain names, so converting one to the other is not guaranteed to produce a registered domain name (although most sites do register both and redirect between them).
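Putting the pieces from these answers together, a minimal sketch for the log entries in the question could look like this. It assumes that entries without a scheme are http and that a leading "www." can be dropped, which (as noted above) is a convention rather than a guarantee. LogUrlParser and TrySplit are hypothetical names.

```csharp
using System;

public static class LogUrlParser
{
    // Splits a log entry into host and path-plus-query.
    // Returns false when the entry cannot be parsed as an absolute URI.
    public static bool TrySplit(string entry, out string host, out string pathAndQuery)
    {
        host = null;
        pathAndQuery = null;

        // Assumption: entries with no scheme are treated as http.
        bool hasScheme =
            entry.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            entry.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
        string candidate = hasScheme ? entry : "http://" + entry;

        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri))
        {
            return false;
        }

        host = uri.Host;
        // Assumption: "www.example.com" and "example.com" are the same site.
        if (host.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
        {
            host = host.Substring(4);
        }

        pathAndQuery = uri.PathAndQuery;
        return true;
    }
}
```

For all five sample lines in the question this yields the host "example.com" and the path and query "/p1?q=k". TryCreate is used instead of the constructor so that malformed entries among millions of records do not throw.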

Extract domain with subdomain in side class

I have an ASP.NET 3.5 web application written in C# 2008.
I want to extract the full domain name from the current URL inside a class method.
For example:
I have a current URL like:
http://subdomain.domain.com/pagename.aspx
OR
https://subdomain.domain.com/pagename.aspx?param=value&param2=value2
Then the result should be like,
http://subdomain.domain.com
OR
https://subdomain.domain.com
You're looking for the Uri class:
new Uri(str).GetLeftPart(UriPartial.Authority)
Create a Uri and query the Host property:
var uri = new Uri(str);
var host = uri.Host;
(Later)
I just realized that you want the scheme and the domain. In that case, @SLaks's answer is the one you want. You could do it by combining uri.Scheme and uri.Host, but that can get messy for things like mailto URLs.

Parsing string for Domain / hostName

Our customers can enter websites as domain names. They can also enter the mail addresses of their contacts.
Now we need to find customers whose website domain can be associated with the domains of the mail addresses.
So my idea is to extract the host from the web address and from the mail address and compare them.
So what's the most reliable algorithm to get the host name from a URL?
for example a host can be:
foo.com
www.foo.com
http://foo.com
https://foo.com
https://www.foo.com
The result should always be foo.com
Rather than relying on an unreliable regex, use System.Uri to do the parsing for you. Use code like this:
string uriStr = "www.foo.com";
if (!uriStr.Contains(Uri.SchemeDelimiter)) {
uriStr = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, uriStr);
}
Uri uri = new Uri(uriStr);
string domain = uri.Host; // will return www.foo.com
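Note that GetLeftPart(UriPartial.Authority) does not return the bare domain; it returns the scheme plus the authority:
string authority = uri.GetLeftPart( UriPartial.Authority ); // will return http://www.foo.com
To get foo.com, strip a leading "www." from uri.Host.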
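To tie this back to the question, here is a minimal sketch that normalizes a website entry to a bare host and compares it with the domain part of a mail address. HostMatcher, NormalizeHost, MailDomain and SameDomain are hypothetical names, and dropping "www." is an assumption about what should count as the same domain.

```csharp
using System;

public static class HostMatcher
{
    // Returns the lower-case host without a leading "www.",
    // or null if the input cannot be parsed as a URI.
    public static string NormalizeHost(string website)
    {
        string candidate = website.Trim();
        if (!candidate.Contains(Uri.SchemeDelimiter))
        {
            // Assumption: entries with no scheme are treated as http.
            candidate = Uri.UriSchemeHttp + Uri.SchemeDelimiter + candidate;
        }

        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri))
        {
            return null;
        }

        string host = uri.Host.ToLowerInvariant();
        return host.StartsWith("www.", StringComparison.Ordinal)
            ? host.Substring(4)
            : host;
    }

    // Returns the lower-cased part after the last '@', or null if there is none.
    public static string MailDomain(string address)
    {
        int at = address.LastIndexOf('@');
        if (at < 0 || at == address.Length - 1)
        {
            return null;
        }
        return address.Substring(at + 1).Trim().ToLowerInvariant();
    }

    // True when the website host and the mail domain are the same string.
    public static bool SameDomain(string website, string address)
    {
        string host = NormalizeHost(website);
        return host != null && host == MailDomain(address);
    }
}
```

All five inputs listed in the question normalize to "foo.com" here. A mail domain such as mail.foo.com would not match; handling that needs the suffix-list approach rather than plain string comparison.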
Here's a regular expression (in JavaScript syntax) that will match the URLs you have provided. The http:// or https:// prefix is optional, as is the www. prefix; everything after that is matched up to a possible path:
var expression = /(https?:\/\/)?(www\.)?([^\/]*)(\/.*)?$/;
This means that
var result = 'https://www.foo.com.vu/blah'.replace(expression, '$3')
evaluates to
result === 'foo.com.vu'
There is already a URL parser in C# for extracting this information.
Here are some examples: http://www.stev.org/post/2011/06/27/C-HowTo-Parse-a-URL.aspx
See the following URL. The Host property, unlike Authority, will not include the port number.
http://msdn.microsoft.com/en-us/library/system.uri.host(v=vs.110).aspx

filter duplicate URLs domain from List c#

I have a list of 100,000 URLs in a List(Of String), which can contain URLs in the following forms:
yahoo.com
http://yahoo.com
http://www.yahoo.com
I have tried using a combination of regex and the Uri class, but that didn't help, so I dumped the code. I also tried using this code, but it only removes exact duplicates, since it is not domain specific.
list = new ArrayList<T>(new HashSet<T>(list))
How can I filter these duplicates and keep just one of these URLs if they contain the same name, e.g. yahoo?
Thanks
[EDIT]
Please note that:
all URLs are of different domains, but there can be duplicates like the example I gave above
also, I am using .NET 2.0, so I can't use LINQ
This worked for me
[TestMethod]
public void TestMethod1()
{
var sites = new List<string> {"yahoo.com", "http://yahoo.com", "http://www.yahoo.com"};
var result = sites.Select(
s =>
s.StartsWith("http://www.")
? s
: s.StartsWith("http://")
? "http://www." + s.Substring(7)
: "http://www." + s).Distinct();
Assert.AreEqual(1, result.Count());
}
I think the Uri Class would be able to help in this case. I am not at a VS machine where I can test; however, pass the Uri constructor the string of the Url, and try the Host property for comparison:
List<string> distinctHosts = new List<string>();
foreach (string url in UrlList)
{
Uri uri = new Uri(url);
if (!distinctHosts.Contains(uri.Host))
{
distinctHosts.Add(uri.Host);
}
}
This feels a bit primitive, and could probably be more elegant - possibly without a foreach; but like I said, I'm not at a development machine where I could work with it.
I think this would be able to handle any variation of a valid Url. Building an ArrayList is not a good idea; in my opinion, Regex would require that you maintain some sort of custom 'MatchList' that could get unwieldy.
As @Damokles points out, you should have some form of validation. The Uri class does require a protocol such as 'http://' or 'ftp://'. You do not want to assume 'badurl.com' is actually invalid, however:
if (!url.StartsWith("http://")) { /* add protocol */ } // then check Host domain as above
...should be sufficient simply to retrieve a distinct host or domain name. I recommend any option that does not require guessing the index position of any part of the Url as that is tightly bound to specific formats.
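Since the question is limited to .NET 2.0 (no LINQ, and no HashSet, which arrived in .NET 3.5), here is a minimal sketch of the same idea using a Dictionary for the lookups, which avoids the linear List.Contains scan on 100,000 entries. UrlDeduplicator and DistinctByHost are hypothetical names, and treating "www.yahoo.com" and "yahoo.com" as the same host is an assumption based on the example in the question.

```csharp
using System;
using System.Collections.Generic;

public static class UrlDeduplicator
{
    // Keeps the first URL seen for each host and drops later duplicates.
    public static List<string> DistinctByHost(IEnumerable<string> urls)
    {
        // Dictionary used as a set; the bool value is unused.
        Dictionary<string, bool> seen =
            new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase);
        List<string> result = new List<string>();

        foreach (string url in urls)
        {
            // Assumption: entries with no scheme are treated as http.
            string candidate =
                url.IndexOf("://", StringComparison.Ordinal) >= 0 ? url : "http://" + url;

            Uri uri;
            if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri))
            {
                continue; // skip entries that cannot be parsed
            }

            string host = uri.Host;
            if (host.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
            {
                host = host.Substring(4);
            }

            if (!seen.ContainsKey(host))
            {
                seen.Add(host, true);
                result.Add(url);
            }
        }
        return result;
    }
}
```

For the three yahoo.com entries in the question this returns a list with the single entry "yahoo.com", because that is the first form encountered.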
You can do this with the Uri class and LINQ extension methods. The trick is to normalize the URL before using it with the Uri class. Also note that the Uri class requires the scheme, so it has to be added where it is not present. You can use a different property of the Uri class to achieve different results. The example below returns all unique hosts and treats yahoo.com differently from www.yahoo.com.
string[] urls = new[] {
"yahoo.com",
"http://yahoo.com",
"http://www.yahoo.com" };
var unique = urls.
Select(url => new System.Uri(
url.Contains("://") ? url : "http://" + url).Host).
Distinct();
(Edited to clean up formatting and to make the scheme addition part support both "http://" and "https://")
Then try a regex such as .*?(\w+\.\w+)$ assuming you don't have anything after the TLD.
