Comparing different URLs for the same domain - C#

Introduction:
I have a start URL, say www.example.com. I run a scraper on this URL to collect all the internal links (those belonging to the same site) as well as the external links.
Problem:
I am using the code below to compare each URL I find with the main URL, www.example.com, to see whether both have the same domain; if they do, I treat the found URL as internal.
Uri baseUri = new Uri(url); //main URL
Uri myUri = new Uri(baseUri, strRef); //strRef is new found link
//domain = baseUri.Host;
domain = baseUri.Host.Replace("www.", string.Empty).Replace("http://", string.Empty).Replace("https://", string.Empty).Trim();
string domain2=myUri.Host.Replace("www.", string.Empty).Replace("http://", string.Empty).Replace("https://", string.Empty).Trim();
strRef = myUri.ToString();
if (domain2 == domain)
{
// DO STUFF
}
Is the above logic correct? If I find a new URL such as http://news.example.com, the domain name found becomes news.example.com, which does not match the domain name of the main URL. Is this correct? Should it match or not? And what is a better way, if mine is not good enough?

Here is a solution that finds the main domain from a subdomain:
string url = "http://www.xxx.co.uk";
string strRef = "http://www.news.xxx.co.uk";
Uri baseUri = new Uri(url); //main URL
Uri myUri = new Uri(baseUri, strRef); //strRef is new found link
var domain = baseUri.Host;
domain = baseUri.Host.Replace("www.", string.Empty).Replace("http://", string.Empty).Replace("https://", string.Empty).Trim();
// here is the solution
string domain2 = GetDomainName(strRef);
strRef = myUri.ToString();
if (domain2 == (domain))
{ //DO STUFF
}
// Requires: using System; using System.Linq; (Contains on the array is a LINQ extension)
private static string GetDomainName(string url)
{
string domain = new Uri(url).DnsSafeHost.ToLower();
var tokens = domain.Split('.');
if (tokens.Length > 2)
{
//Add only second level exceptions to the < 3 rule here
string[] exceptions = { "info", "firm", "name", "com", "biz", "gen", "ltd", "web", "net", "pro", "org" };
var validTokens = 2 + ((tokens[tokens.Length - 2].Length < 3 || exceptions.Contains(tokens[tokens.Length - 2])) ? 1 : 0);
domain = string.Join(".", tokens, tokens.Length - validTokens, validTokens);
}
return domain;
}
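Whether news.example.com should count as internal is a policy decision rather than something the Uri class decides for you. If you do want subdomains to count, a minimal sketch could compare hosts by suffix instead of by equality. The names LinkClassifier, StripWww and IsInternal are made up for this illustration, and the sketch assumes the start URL is the site's main host (with or without "www."):

```csharp
using System;

public static class LinkClassifier
{
    // Removes one leading "www." label, ignoring case.
    private static string StripWww(string host)
    {
        return host.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? host.Substring(4)
            : host;
    }

    // True when href points at the start host or at one of its subdomains.
    public static bool IsInternal(Uri siteRoot, string href)
    {
        // Resolve relative links against the start URL; reject malformed input.
        if (!Uri.TryCreate(siteRoot, href, out Uri resolved))
            return false;

        // Skip non-web links such as mailto: or javascript:.
        if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
            return false;

        string rootHost = StripWww(siteRoot.Host);
        string linkHost = StripWww(resolved.Host);

        // Same host, or a subdomain of it. The leading dot prevents
        // "notexample.com" from matching "example.com".
        return linkHost.Equals(rootHost, StringComparison.OrdinalIgnoreCase)
            || linkHost.EndsWith("." + rootHost, StringComparison.OrdinalIgnoreCase);
    }
}
```

With a start URL of http://www.example.com, this treats "/about" and "http://news.example.com/a" as internal and "http://notexample.com" as external. Unlike the token-counting approach above, it needs no list of second-level suffixes, but it only works when the start URL is not itself a subdomain you want to generalize from.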

Related

How can I get the current url protocol in the MVC controller?

This method returns the domain and port:
public Uri BuildUrl()
{
Uri domain = new Uri(Request.GetDisplayUrl());
return new Uri(domain.Host + (domain.IsDefaultPort ? "" : ":" + domain.Port));
}
But the result does not include the "http://" or "https://" part. How can I get it?
You need to get the protocol from the Scheme property of Uri.
public Uri BuildUrl()
{
Uri domain = new Uri(Request.GetDisplayUrl());
return new Uri(domain.Scheme + "://" + domain.Host + (domain.IsDefaultPort ? "" : ":" + domain.Port));
}
There is a Scheme property, so you can use UriBuilder:
var uriBuilder = new UriBuilder(domain.Host)
{
Scheme = domain.Scheme,
Port = domain.IsDefaultPort ? -1 : domain.Port
};
return uriBuilder.Uri;
But your method doesn't make much sense. Why not simply use:
public Uri BuildUrl() => new Uri(Request.GetDisplayUrl());
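If the goal is only "scheme + host + port" of an existing Uri, another option that avoids string concatenation is Uri.GetLeftPart with UriPartial.Authority. A small sketch follows; the names UrlParts and GetOrigin are invented for this example, and it works on a plain System.Uri, so it assumes you already have the request URL as a Uri:

```csharp
using System;

public static class UrlParts
{
    // Returns scheme + host, plus the port when it is not the scheme's default.
    public static Uri GetOrigin(Uri uri)
    {
        // GetLeftPart(UriPartial.Authority) yields e.g. "https://example.com:8443".
        return new Uri(uri.GetLeftPart(UriPartial.Authority));
    }
}
```

For example, GetOrigin(new Uri("https://example.com:8443/a/b?x=1")) keeps the https scheme and port 8443 and drops the path and query.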

Find the requester's current URL

I use this method to get the URL:
public Uri GetAbsoluteUri()
{
var request = _httpContextAccessor.HttpContext.Request;
UriBuilder uriBuilder = new UriBuilder();
uriBuilder.Scheme = request.Scheme;
uriBuilder.Host = request.Host.Host;
uriBuilder.Path = request.Path.ToString();
uriBuilder.Query = request.QueryString.ToString();
return uriBuilder.Uri;
}
public string RootPath => Path.Combine(WebRootPath, RootFolderName);
public string GetProductPicturePath()
{
return Path.Combine(GetAbsoluteUri().ToString(), RootFolderName, ProductPictureFolder);
}
public string GetProductMainPicturePath()
{
string path = Path.Combine(GetAbsoluteUri().ToString(), RootFolderName, ProductPictureFolder, ProductMainPictureFolder);
return path;
}
public string GetNewPath()
{
string productMainPicturePath = GetProductMainPicturePath();
return Path.Combine(productMainPicturePath);
}
Finally, I call GetNewPath(), but it gives me this address:
https://localhost/api/Product/GetProductList/Upload/ProductPictureFolder/ProductMainPicture/77777.png
I have two problems with this URL:
1 - It does not contain the port (https://localhost/api), but I need it to return something like this: http://localhost:4200/api
2 - It includes the controller name and the action name, but I need something like this: https://localhost/Upload/ProductPictureFolder/ProductMainPicture/77777.png
Instead it returns this: https://localhost/api/Product/GetProductList/Upload/ProductPictureFolder/ProductMainPicture/77777.png
I do not need the /api/Product/GetProductList part.
Product: controller name
GetProductList: action name
How can I solve this problem?
Regarding "1 - it does not contain the port in the URL": to get the port, you can use this snippet:
if (request.Host.Port.HasValue)
uriBuilder.Port = request.Host.Port.Value;
Regarding "2 - it includes the controller name and the action name, but I need something like this:
https://localhost/Upload/ProductPictureFolder/ProductMainPicture/77777.png":
I suggest setting the UriBuilder Path based on your needs rather than taking it from the request. Something like this:
// Build your upload file path here.
// Join with '/' because this is a URL path; Path.Combine would use '\' on Windows.
var relativePath = $"{folderName}/{filename}";
var request = _httpContextAccessor.HttpContext.Request;
var uriBuilder = new UriBuilder
{
Host = request.Host.Host,
Scheme = request.Scheme,
Path = relativePath
};
if (request.Host.Port.HasValue)
uriBuilder.Port = request.Host.Port.Value;
var imageUrl = uriBuilder.ToString();

How to get requesting page controller name in mvc [duplicate]

There's a lot of information for building Uris from Controller and Action names, but how can I do this the other way around?
Basically, all I'm trying to achieve is to get the Controller and Action names from the referring page (i.e. Request.UrlReferrer). Is there an easy way to achieve this?
I think this should do the trick:
// Split the url to url + query string
var fullUrl = Request.UrlReferrer.ToString();
var questionMarkIndex = fullUrl.IndexOf('?');
string queryString = null;
string url = fullUrl;
if (questionMarkIndex != -1) // There is a QueryString
{
url = fullUrl.Substring(0, questionMarkIndex);
queryString = fullUrl.Substring(questionMarkIndex + 1);
}
// Arranges
var request = new HttpRequest(null, url, queryString);
var response = new HttpResponse(new StringWriter());
var httpContext = new HttpContext(request, response);
var routeData = RouteTable.Routes.GetRouteData(new HttpContextWrapper(httpContext));
// Extract the data
var values = routeData.Values;
var controllerName = values["controller"];
var actionName = values["action"];
var areaName = values["area"];
My Visual Studio is currently down so I could not test it, but it should work as expected.
To expand on gdoron's answer, the Uri class has methods for grabbing the left and right parts of the URL without having to do string parsing:
var referrer = Request.UrlReferrer;
var url = referrer.GetLeftPart(UriPartial.Path);
// Query includes the leading '?', so strip it when present
var queryString = referrer.Query.Length > 0 ? referrer.Query.Substring(1) : string.Empty;
// Arranges
var request = new HttpRequest(null, url, queryString);
var response = new HttpResponse(new StringWriter());
var httpContext = new HttpContext(request, response);
var routeData = RouteTable.Routes.GetRouteData(new HttpContextWrapper(httpContext));
// Extract the data
var values = routeData.Values;
var controllerName = values["controller"];
var actionName = values["action"];
var areaName = values["area"];
To add to gdoron's accepted answer, I found that the action doesn't get populated if a custom route attribute is used. The following works for me:
public static void SetUpReferrerRouteVariables(HttpRequestBase httpRequestBase, ref string previousAreaName, ref string previousControllerName, ref string previousActionName)
{
// No referrer found, perhaps page accessed directly, just return.
if (httpRequestBase.UrlReferrer == null) return;
// Split the url to url + QueryString.
var fullUrl = httpRequestBase.UrlReferrer.ToString();
var questionMarkIndex = fullUrl.IndexOf('?');
string queryString = null;
var url = fullUrl;
if (questionMarkIndex != -1) // There is a QueryString
{
url = fullUrl.Substring(0, questionMarkIndex);
queryString = fullUrl.Substring(questionMarkIndex + 1);
}
// Arrange.
var request = new HttpRequest(null, url, queryString);
var response = new HttpResponse(new StringWriter());
var httpContext = new HttpContext(request, response);
var routeData = RouteTable.Routes.GetRouteData(new HttpContextWrapper(httpContext));
if (routeData == null) throw new AuthenticationRedirectToReferrerDataNotFoundException();
// Extract the data.
var previousValues = routeData.Values;
previousAreaName = previousValues["area"] == null ? string.Empty : previousValues["area"].ToString();
previousControllerName = previousValues["controller"] == null ? string.Empty : previousValues["controller"].ToString();
previousActionName = previousValues["action"] == null ? string.Empty : previousValues["action"].ToString();
if (previousActionName != string.Empty) return;
var routeDataAsListFromMsDirectRouteMatches = (List<RouteData>)previousValues["MS_DirectRouteMatches"];
var routeValueDictionaryFromMsDirectRouteMatches = routeDataAsListFromMsDirectRouteMatches.FirstOrDefault();
if (routeValueDictionaryFromMsDirectRouteMatches == null) return;
previousActionName = routeValueDictionaryFromMsDirectRouteMatches.Values["action"].ToString();
if (previousActionName == "") previousActionName = "Index";
}
Here is a lightweight way to do this without creating response objects.
var values = RouteDataContext.RouteValuesFromUri(Request.UrlReferrer);
var controllerName = values["controller"];
var actionName = values["action"];
It uses this custom HttpContextBase class:
public class RouteDataContext : HttpContextBase {
public override HttpRequestBase Request { get; }
private RouteDataContext(Uri uri) {
var url = uri.GetLeftPart(UriPartial.Path);
var qs = uri.GetComponents(UriComponents.Query,UriFormat.UriEscaped);
Request = new HttpRequestWrapper(new HttpRequest(null,url,qs));
}
public static RouteValueDictionary RouteValuesFromUri(Uri uri) {
return RouteTable.Routes.GetRouteData(new RouteDataContext(uri)).Values;
}
}
@gordon's solution works, but if you want to go back to the previous action you need to use:
return RedirectToAction(actionName.ToString(), controllerName.ToString(),values);
The RouteData object can access this info:
var controller = RouteData.Values["controller"].ToString();
var action = RouteData.Values["action"].ToString();
This is a method I wrote to extract a simplified URL from the referrer, because my URL contained a token (ending with "))/"). From the result you can easily extract the controller and action:
private static string GetURLSimplified(string url)
{
string separator = "))/";
string callerURL = "";
if (url.Length > 3)
{
int index = url.IndexOf(separator);
if (index >= 0) // separator found; otherwise callerURL stays empty
callerURL = url.Substring(index + separator.Length);
}
return callerURL;
}
I don't believe there is any built-in way to retrieve the previous Controller/Action method call. What you could always do is wrap the controllers and action methods so that they are recorded in a persistent data store, and then when you require the last Controller/Action method, just retrieve it from the database (or whatever you so choose).
Why would you need to construct an ActionLink from a URL? The purpose of ActionLink is just the opposite: to make a URL from some data. So in your page just do:
var fullUrl = Request.UrlReferrer.ToString();
<a href="@fullUrl">Back</a>

Most appropriate way of getting absolute URLs from crawled urls

Assume that I have the following root URL:
http://www.monstermmorpg.com
Now I will show several example URLs and the target URL I want for each:
url1: http://www.monstermmorpg.com/
url2: http://www.monstermmorpg.com/Register#21312
url3: Register#21312
url4: /Register
url5: Register
url6: /Register?news=true&news2=true
// there may be more forms that lead to the same URL, but I don't have the full list at the moment
I need a function that, given the root URL, turns the URLs above into the following:
url1: http://www.monstermmorpg.com
url2: http://www.monstermmorpg.com/Register
url3: http://www.monstermmorpg.com/Register
url4: http://www.monstermmorpg.com/Register
url5: http://www.monstermmorpg.com/Register
url6: http://www.monstermmorpg.com/Register?news=true&news2=true
There is the method below, but I believe it is insufficient. Is there a better one?
This is for a C# .NET 4.5 WPF application.
Uri baseUri= new Uri("http://www.contoso.com");
Uri myUri = new Uri(baseUri,"catalog/shownew.htm?date=today");
Console.WriteLine(myUri.AbsoluteUri);
static void Main(string[] args)
{
var baseUrl = "http://www.monstermmorpg.com";
var urls = new string[] {
"http://www.monstermmorpg.com/",
"http://www.monstermmorpg.com/Register#21312",
"Register#21312",
"/Register",
"Register",
"/Register?news=true&news2=true" };
var absoluteUrls = new List<string>();
foreach (var url in urls)
{
if (url.StartsWith("http"))
{
var uri = new Uri(url);
// GetLeftPart(UriPartial.Query) keeps scheme, host, path and query, and drops the #fragment
absoluteUrls.Add(uri.GetLeftPart(UriPartial.Query));
}
else
{
var urlWithSlash = url;
if (!urlWithSlash.StartsWith("/"))
urlWithSlash = "/" + url;
var uri = new Uri(baseUrl + urlWithSlash);
absoluteUrls.Add(uri.GetLeftPart(UriPartial.Query));
}
}
// Now absoluteUrls contains
//url1: http://www.monstermmorpg.com/
//url2: http://www.monstermmorpg.com/Register
//url3: http://www.monstermmorpg.com/Register
//url4: http://www.monstermmorpg.com/Register
//url5: http://www.monstermmorpg.com/Register
//url6: http://www.monstermmorpg.com/Register?news=true&news2=true
}
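The Uri(Uri, string) constructor from the question can also handle all six cases on its own, since it leaves absolute URLs untouched and resolves relative ones against the base. A minimal sketch, assuming the fragment should always be dropped (UrlNormalizer and ToAbsolute are names made up for this example):

```csharp
using System;

public static class UrlNormalizer
{
    // Resolves href against baseUri and returns it without the #fragment.
    public static string ToAbsolute(Uri baseUri, string href)
    {
        // Absolute hrefs replace the base entirely; relative ones are resolved against it.
        var resolved = new Uri(baseUri, href);

        // Keep scheme, host, path and query; drop the fragment.
        return resolved.GetLeftPart(UriPartial.Query);
    }
}
```

With new Uri("http://www.monstermmorpg.com") as the base, "Register#21312", "/Register" and "Register" all come out as http://www.monstermmorpg.com/Register. One thing to keep in mind when crawling: a relative link such as "Register" is relative to the page it was found on, so the base should normally be that page's URL rather than the site root.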

Check if it is root domain in string

I'm new to C#.
Let's say I have a string:
string testurl = "http://www.mytestsite.com/hello";
if (test url == root domain) {
// do something
}
I want to check whether the string "testurl" is the root domain, i.e. http://www.mytestsite.com or http://mytestsite.com, etc.
Thanks.
Use the Uri class:
var testUrl = new Uri("http://www.mytestsite.com/hello");
if (testUrl.AbsolutePath== "/")
{
Console.WriteLine("At root");
}
else
{
Console.WriteLine("Not at root");
}
This nicely deals with any normalization issues that may be required (e.g. treating http://www.mytestsite.com and http://www.mytestsite.com/ the same).
You may try like this:
string testurl = "http://www.mytestsite.com/hello";
if ( GetDomain.GetDomainFromUrl(testurl) == rootdomain) {
// do something
}
You can also try using the Uri.Host property.
The following example writes the host name (www.contoso.com) of the server to the console.
Uri baseUri = new Uri("http://www.contoso.com:8080/");
Uri myUri = new Uri(baseUri, "shownew.htm?date=today");
Console.WriteLine(myUri.Host);
If the host name returned is equal to "mytestsite.com" or "www.mytestsite.com" (Host never includes the "http://" part), you are done.
string testurl = "http://www.mytestsite.com/hello";
string prefix = testurl.Split(new[] { "//" }, StringSplitOptions.None)[0] + "//"; // e.g. "http://"
string url = testurl.Replace(prefix, "");
string root = prefix + url.Split('/')[0]; // scheme + host, without the path
if (testurl == root) {
// do something
}
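Building on the AbsolutePath answer above, here is a slightly stricter sketch that also tolerates malformed input and treats a URL with a query string or fragment as "not the root". Whether those should count as root is an assumption on my part, and RootUrlCheck and IsSiteRoot are hypothetical names:

```csharp
using System;

public static class RootUrlCheck
{
    // True when url is an absolute http/https URL with no path, query or fragment.
    public static bool IsSiteRoot(string url)
    {
        // TryCreate avoids the exception that new Uri(...) throws on bad input.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
            return false;

        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;

        // Uri normalizes an empty path to "/", so both "http://host" and "http://host/" pass.
        return uri.AbsolutePath == "/"
            && uri.Query.Length == 0
            && uri.Fragment.Length == 0;
    }
}
```

This returns true for both http://www.mytestsite.com and http://mytestsite.com/, and false for http://www.mytestsite.com/hello.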
