I'm seeing unexpected behavior with the System.Uri class.
When a System.Uri instance is created from a URL string that contains patterns such as ..., ...#, or .#, System.Uri removes the repeated . characters.
This is weird, but I believe this behavior is based on RFC 2396.
The problem begins when I try to download the HTML from this URL: http://www.submarino.com.br/produto/1/23853463/mundo+segundo+steve+jobs,+o:+as+frases+mais+inspiradoras+...
and System.Uri removes all the repeated . characters. Because the web site doesn't recognize the "new" URL, it redirects to the original URL. Then a "System.Net.WebException: Too many automatic redirections were attempted" is thrown and the page is never reached.
How can I solve this issue?
You can use reflection to remove that particular attribute. Use this before your Uri call:
// Requires: using System.Reflection;
MethodInfo getSyntax = typeof(UriParser).GetMethod("GetSyntax", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.NonPublic);
FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
if (getSyntax != null && flagsField != null)
{
    foreach (string scheme in new[] { "http", "https" })
    {
        UriParser parser = (UriParser)getSyntax.Invoke(null, new object[] { scheme });
        if (parser != null)
        {
            int flagsValue = (int)flagsField.GetValue(parser);
            // Clear the CanonicalizeAsFilePath attribute
            if ((flagsValue & 0x1000000) != 0)
                flagsField.SetValue(parser, flagsValue & ~0x1000000);
        }
    }
}
It has been reported to Connect before.
I need a check that returns true for the following website URLs.
URLs that start with www. must pass, and google.com should also return true.
www.google.com
google.com
http://google.com
http://www.google.com
https://google.com
https://www.google.com
I have been using IsWellFormedUriString and haven't gotten anywhere. It keeps returning true. I also have used Uri.TryCreate and can't get it to work either. There is so much on Stack Overflow regarding this topic but none of them are working. I must be doing something wrong.
Here is my ValidateUrl function:
public static bool ValidateUrl(string url)
{
    try
    {
        if (url.Substring(0, 3) != "www." && url.Substring(0, 4) != "http" && url.Substring(0, 5) != "https")
        {
            url = "http://" + url;
        }
        if (Uri.IsWellFormedUriString(url, UriKind.RelativeOrAbsolute))
        {
            Uri strUri = new Uri(url);
            return true;
        }
        else
        {
            return false;
        }
    }
    catch (Exception exc)
    {
        throw exc;
    }
}
And I am calling it like this:
if (ValidateUrl(url) == false) {
    validationErrors.Add(new Error()
    {
        fieldName = "url",
        errorDescription = "Url is not in correct format."
    });
}
It is returning true for htp:/google.com. I know there's a lot on this site regarding this topic but I have been trying to get this to work all day yesterday and nothing is working.
If you want users to be able to copy the URL from the database into a browser and reach a valid site, I think you should validate the URL format and also verify that the URL exists.
For example:
Uri.IsWellFormedUriString("http://www.google.com", UriKind.Absolute);
This returns true because the URL is well formed.
WebRequest request = WebRequest.Create("http://www.google.com");
try
{
    // Dispose the response so the connection is released
    using (request.GetResponse()) { }
}
catch (Exception)
{
    // Rethrow without resetting the stack trace
    throw;
}
An exception is thrown if no response can be obtained from the URL.
If I understand your question correctly, I would check it like this:
public static bool ValidateUrl(string url)
{
    if (url.StartsWith("https://www.") || url.StartsWith("http://www.") || url.StartsWith("https://google.com") || url.StartsWith("http://google.com"))
    {
        return true;
    }
    else
    {
        return false;
    }
}
This returns true for any URL that starts with https://www. or http://www., and for google.com when it is prefixed with http:// or https://. Everything else returns false.
If you want to test whether an HTTP(S) URL is valid, you should use this
(credit: stackoverflow.com/a/56116499/215552):
Uri uriResult;
bool result = Uri.TryCreate(uriName, UriKind.Absolute, out uriResult)
    && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps);
So in your case:
public static bool ValidateUrl(string url)
{
    Uri uriResult;
    return Uri.TryCreate(url, UriKind.Absolute, out uriResult)
        && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps);
}
// EDIT: try this:
public static bool ValidateUrl(string URL)
{
    string Pattern = @"(http(s)?://)?([\w-]+\.)+[\w-]+[\w-]+[\.]+[\][a-z.]{2,3}$+([./?%&=]*)?";
    Regex Rgx = new Regex(Pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
    return Rgx.IsMatch(URL);
}
I got it working by writing a small helper method that uses Regex to validate the url.
The following URLs pass:
google.com
www.google.com
http://google.com
http://www.google.com
https://google.com/test/test
https://www.google.com/test
It fails on:
www.google.com/a bad path with white space/
Below is the helper method I created:
public static bool ValidateUrl(string value, bool required, int minLength, int maxLength)
{
    value = value.Trim();
    if (required == false && value == "") return true;
    if (required && value == "") return false;
    Regex pattern = new Regex(@"^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$");
    Match match = pattern.Match(value);
    if (match.Success == false) return false;
    return true;
}
This allows users to input any valid url, plus it accounts for bad url paths with white space which is exactly what I needed.
I have just started using Retrace by Stackify to monitor my application and have seen thousands of errors like this:
System.FormatException: Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
at System.Guid.TryParseGuidWithNoStyle
at System.Guid.TryParseGuid
at System.Guid..ctor
at System.DirectoryServices.AccountManagement.ADStoreCtx.IdentityClaimToFilter
These errors are happening thousands of times a day and I can't work out quite why. Firstly, my application works like so:
MVC front end - using Windows Authentication (using RestSharp to call backend)
Web API back end, using Windows Authentication passed from RestSharp NTLM Authentication.
RestSharp wrapper
public object WebRequest<T>(string controller, Dictionary<string, string> parameters, Method apiMethod, string action)
{
    RestClient client = new RestClient(Url + controller + "/");
    client.Authenticator = new NtlmAuthenticator();
    RestRequest request = new RestRequest(action, apiMethod);
    if (parameters != null && parameters.Count > 0)
    {
        foreach (var parameter in parameters)
        {
            request.AddParameter(parameter.Key, parameter.Value);
        }
    }
    object result = JsonToObject<T>(client.Execute(request).Content);
    return result;
}
Helper Methods
@helper Username()
{
    PrincipalContext ctx = new PrincipalContext(ContextType.Domain);
    var username = System.Web.HttpContext.Current.User.Identity.Name.Replace(@"DOMAIN\", "");
    @username
}
@helper UserFullName()
{
    using (var context = new PrincipalContext(ContextType.Domain))
    {
        var principal = UserPrincipal.FindByIdentity(context, User.Identity.Name);
        if (principal != null)
        {
            var fullName = string.Format("{0}", principal.DisplayName);
            @fullName
        }
    }
}
Any suggestions on where this error may be happening or what I can do to narrow it down? It seems to happen on every page from what I can tell in Stackify.
There is an overload of FindByIdentity that lets you specify what kind of identity the identityValue actually is. For example, try:
UserPrincipal.FindByIdentity(context, IdentityType.SamAccountName, User.Identity.Name);
Because a GUID is a valid identity type for this call, it seems the problem arises with the non-specific overload, which presumably tries to sniff what the value is and attempts to parse it as a GUID along the way.
In the picture above, I have Request Body of a POST request with FiddlerCore dll.
Here is how I capture it:
private void FiddlerApplication_AfterSessionComplete(Session sess)
{
    string requestBody = "";
    if (sess.oRequest != null)
    {
        if (sess.oRequest.headers != null)
        {
            requestBody = sess.GetRequestBodyAsString();
        }
    }
}
However, I only need to capture the body when it contains parameters (the last two lines in the picture); in the other cases I don't need it.
So far I filter by string matching. What would be the proper way to do this?
NOTE: Each line on the picture is a different request, for a total of 5.
If there is no content type then ignore it. Figure out the ones you do want and take those.
private void FiddlerApplication_AfterSessionComplete(Session sess) {
    if (sess == null || sess.oRequest == null || sess.oRequest.headers == null)
        return;
    // Ignore HTTPS CONNECT requests and any other non-POST requests
    if (sess.RequestMethod != "POST")
        return;
    var reqHeaders = sess.oRequest.headers.ToString(); // request headers
    // Get the content type of the request
    var contentType = sess.oRequest["Content-Type"];
    // Let's assume approvedContent is a List<string> of approved content types.
    // Ignore requests that do not have a content type
    // or are not in the approved list of types.
    if (string.IsNullOrEmpty(contentType) || !approvedContent.Any(c => contentType.Contains(c)))
        return;
    var reqBody = sess.GetRequestBodyAsString(); // get the body of the request
    //...other code.
}
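The handler above assumes an approvedContent list exists. As a rough sketch only, the allow-list check could be pulled into a small helper that does not depend on FiddlerCore. The class name ContentTypeFilter, the method IsApproved, and the two listed content types are hypothetical choices for illustration, not something taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContentTypeFilter
{
    // Hypothetical allow-list: content types whose bodies usually carry parameters.
    private static readonly List<string> ApprovedContent = new List<string>
    {
        "application/x-www-form-urlencoded",
        "application/json"
    };

    // Returns true only when a content type is present and contains an approved entry.
    // A substring match is used so values such as "application/json; charset=utf-8" pass.
    public static bool IsApproved(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return false;

        return ApprovedContent.Any(
            c => contentType.IndexOf(c, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

Inside the handler, the content-type test would then reduce to something like if (!ContentTypeFilter.IsApproved(contentType)) return;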
Are there any helper classes available in .NET to allow me to build a Url?
For example, if a user enters a string:
stackoverflow.com
and I try to pass that to an HttpWebRequest:
WebRequest.CreateHttp(url);
It will fail, because it is not a valid url (it has no prefix).
What I want is to be able to parse the partial URL the user entered:
Uri uri = new Uri(url);
and then fix the missing pieces:
if (uri.Port == 0)
uri.Port = 3333;
if (uri.Scheme == "")
uri.Scheme = "https";
Does .NET have any classes that can be used to parse and manipulate URIs?
The UriBuilder class can't do the job
The value that the user entered (e.g. stackoverflow.com:3333) is valid; I just need a class to pick it apart. I tried using the UriBuilder class:
UriBuilder uriBuilder = new UriBuilder("stackoverflow.com:3333");
Unfortunately, the UriBuilder class cannot handle such input:
uriBuilder.Path = 3333
uriBuilder.Port = -1
uriBuilder.Scheme = stackoverflow.com
So I need a class that can understand host:port, which becomes especially important when the scheme is not necessarily http, but could be.
Bonus Chatter
Console application.
From the other question
Some examples of URLs that require parsing:
server:8088
server:8088/func1
server:8088/func1/SubFunc1
http://server
http://server/func1
http://server/func/SubFunc1
http://server:8088
http://server:8088/func1
http://server:8088/func1/SubFunc1
magnet://server
magnet://server/func1
magnet://server/func/SubFunc1
magnet://server:8088
magnet://server:8088/func1
magnet://server:8088/func1/SubFunc1
http://[2001:db8::1]
http://[2001:db8::1]:80
The format of a URL is:
foo://example.com:8042/over/there?name=ferret#nose
\_/   \_________/ \__/\_________/\__________/ \__/
 |         |       |       |          |         |
scheme    host    port    path      query   fragment
Bonus Chatter
Just to point out again that UriBuilder does not work:
https://dotnetfiddle.net/s66kdZ
If you need to ensure that some string coming as user input is valid url you could use the Uri.TryCreate method:
Uri uri;
string someUrl = ...
if (!Uri.TryCreate(someUrl, UriKind.Absolute, out uri))
{
    // the someUrl string did not contain a valid url
    // inform your users about that
}
else
{
    var request = WebRequest.Create(uri);
    // ... safely proceed with executing the request
}
Now if on the other hand you want to be building urls in .NET there's the UriBuilder class specifically designed for that purpose. Let's take an example. Suppose you wanted to build the following url: http://example.com/path?foo=bar&baz=bazinga#some_fragment where the bar and bazinga values are coming from the user:
string foo = ... // coming from user input
string baz = ... // coming from user input
var uriBuilder = new UriBuilder("http://example.com/path");
var parameters = HttpUtility.ParseQueryString(string.Empty);
parameters["foo"] = foo;
parameters["baz"] = baz;
uriBuilder.Query = parameters.ToString();
uriBuilder.Fragment = "some_fragment";
Uri finalUrl = uriBuilder.Uri;
var request = WebRequest.Create(finalUrl);
// ... safely proceed with executing the request
You can use the UriBuilder class.
var builder = new UriBuilder(url);
builder.Port = 3333;
builder.Scheme = "https";
var result = builder.Uri;
To be valid a URI needs to have the scheme component. "server:8088" is not a valid URI. "http://server:8088" is. See https://www.rfc-editor.org/rfc/rfc3986
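Building on that point, one possible approach is to supply the missing scheme before handing the text to Uri, and only then fill in the port. The following is a minimal sketch under stated assumptions, not a complete parser: UrlNormalizer and Normalize are hypothetical names, the defaults "https" and 3333 come from the question, and any input without "://" is assumed to have no scheme (so server:8088 is read as host:port).

```csharp
using System;

public static class UrlNormalizer
{
    // Hypothetical helper: adds a scheme and a port when the user's input omits them.
    public static Uri Normalize(string input, string defaultScheme = "https", int defaultPort = 3333)
    {
        string candidate = input.Trim();

        // Assumption: no "://" means no scheme was typed, so prepend the default one.
        if (candidate.IndexOf("://", StringComparison.Ordinal) < 0)
        {
            candidate = defaultScheme + "://" + candidate;
        }

        // Throws UriFormatException when the text still is not a valid absolute URI.
        var parsed = new Uri(candidate, UriKind.Absolute);
        var builder = new UriBuilder(parsed);

        // Uri reports the scheme's default port when none was typed, so an omitted port
        // cannot be told apart from an explicit default such as ":443". Both are replaced.
        if (parsed.IsDefaultPort)
        {
            builder.Port = defaultPort;
        }

        return builder.Uri;
    }
}
```

With these assumptions, Normalize("stackoverflow.com") yields https://stackoverflow.com:3333/, while Normalize("server:8088/func1") keeps port 8088 and only gains the https scheme. Inputs whose query string contains "://" but that have no scheme of their own would need a stricter check than the one shown.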
There are the Uri.IsWellFormedUriString and Uri.TryCreate methods, but they seem to return true for file paths, etc.
How do I check whether a string is a valid (not necessarily active) HTTP URL for input validation purposes?
Try this to validate HTTP URLs (uriName is the URI you want to test):
Uri uriResult;
bool result = Uri.TryCreate(uriName, UriKind.Absolute, out uriResult)
&& uriResult.Scheme == Uri.UriSchemeHttp;
Or, if you want to accept both HTTP and HTTPS URLs as valid (per J0e3gan's comment):
Uri uriResult;
bool result = Uri.TryCreate(uriName, UriKind.Absolute, out uriResult)
&& (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps);
This method works for both http and https, in just one line :)
if (Uri.IsWellFormedUriString("https://www.google.com", UriKind.Absolute))
MSDN: IsWellFormedUriString
Try this:
bool IsValidURL(string URL)
{
    string Pattern = @"^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$";
    Regex Rgx = new Regex(Pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
    return Rgx.IsMatch(URL);
}
It accepts URLs such as:
http(s)://www.example.com
http(s)://stackoverflow.example.com
http(s)://www.example.com/page
http(s)://www.example.com/page?id=1&product=2
http(s)://www.example.com/page#start
http(s)://www.example.com:8080
http(s)://127.0.0.1
127.0.0.1
www.example.com
example.com
public static bool CheckURLValid(this string source)
{
    Uri uriResult;
    return Uri.TryCreate(source, UriKind.Absolute, out uriResult) && uriResult.Scheme == Uri.UriSchemeHttp;
}
Usage:
string url = "htts://adasd.xc.";
if (url.CheckURLValid())
{
    // valid process
}
UPDATE: (single line of code) Thanks @GoClimbColorado
public static bool CheckURLValid(this string source) => Uri.TryCreate(source, UriKind.Absolute, out Uri uriResult) && uriResult.Scheme == Uri.UriSchemeHttps;
Usage:
string url = "htts://adasd.xc.";
if (url.CheckURLValid())
{
    // valid process
}
All the answers here either allow URLs with other schemes (e.g., file://, ftp://) or reject human-readable URLs that don't start with http:// or https:// (e.g., www.google.com) which is not good when dealing with user inputs.
Here's how I do it:
public static bool ValidHttpURL(string s, out Uri resultURI)
{
    if (!Regex.IsMatch(s, @"^https?:\/\/", RegexOptions.IgnoreCase))
        s = "http://" + s;
    if (Uri.TryCreate(s, UriKind.Absolute, out resultURI))
        return (resultURI.Scheme == Uri.UriSchemeHttp ||
                resultURI.Scheme == Uri.UriSchemeHttps);
    return false;
}
Usage:
string[] inputs = new[] {
    "https://www.google.com",
    "http://www.google.com",
    "www.google.com",
    "google.com",
    "javascript:alert('Hack me!')"
};
foreach (string s in inputs)
{
    Uri uriResult;
    bool result = ValidHttpURL(s, out uriResult);
    Console.WriteLine(result + "\t" + uriResult?.AbsoluteUri);
}
Output:
True https://www.google.com/
True http://www.google.com/
True http://www.google.com/
True http://google.com/
False
After Uri.TryCreate you can check Uri.Scheme to see whether it is HTTP or HTTPS.
As an alternative approach to using a regex, this code uses Uri.TryCreate per the OP, but then also checks the result to ensure that its Scheme is one of http or https:
bool passed =
Uri.TryCreate(url, UriKind.Absolute, out Uri uriResult)
&& (uriResult.Scheme == Uri.UriSchemeHttp
|| uriResult.Scheme == Uri.UriSchemeHttps);
This would return bool:
Uri.IsWellFormedUriString(a.GetAttribute("href"), UriKind.Absolute)
Problem: Valid URLs should include all of the following “prefixes”: https, http, www
Url must contain http:// or https://
Url may contain only one instance of www.
Url Host name type must be Dns
Url max length is 100
Solution:
public static bool IsValidUrl(string webSiteUrl)
{
    if (webSiteUrl.StartsWith("www."))
    {
        webSiteUrl = "http://" + webSiteUrl;
    }
    return Uri.TryCreate(webSiteUrl, UriKind.Absolute, out Uri uriResult)
        && (uriResult.Scheme == Uri.UriSchemeHttp
            || uriResult.Scheme == Uri.UriSchemeHttps)
        && uriResult.Host.Replace("www.", "").Split('.').Count() > 1
        && uriResult.HostNameType == UriHostNameType.Dns
        && uriResult.Host.Length > uriResult.Host.LastIndexOf(".") + 1
        && 100 >= webSiteUrl.Length;
}
Validated with Unit Tests
Positive Unit Test:
[TestCase("http://www.example.com/")]
[TestCase("https://www.example.com")]
[TestCase("http://example.com")]
[TestCase("https://example.com")]
[TestCase("www.example.com")]
public void IsValidUrlTest(string url)
{
    bool result = UriHelper.IsValidUrl(url);
    // Expected value first, actual value second
    Assert.AreEqual(true, result);
}
Negative Unit Test:
[TestCase("http.www.example.com")]
[TestCase("http:www.example.com")]
[TestCase("http:/www.example.com")]
[TestCase("http://www.example.")]
[TestCase("http://www.example..com")]
[TestCase("https.www.example.com")]
[TestCase("https:www.example.com")]
[TestCase("https:/www.example.com")]
[TestCase("http:/example.com")]
[TestCase("https:/example.com")]
public void IsInvalidUrlTest(string url)
{
    bool result = UriHelper.IsValidUrl(url);
    // Expected value first, actual value second
    Assert.AreEqual(false, result);
}
Note: the IsValidUrl method is not meant to accept a relative URL path such as example.com.
See:
Should I Use Relative or Absolute URLs?
Uri uri = null;
return Uri.TryCreate(url, UriKind.Absolute, out uri) && uri != null;
Here url is the string you have to test.
I've created this function to help with URL validation; you can customize it as you like. Note that it is written in Python 3.10.6.
import re


def url_validator(url: str) -> bool:
    """
    Use this function to keep only valid URLs.

    :param url: the URL to check
    :type url: str
    :return: True if the passed URL is valid, otherwise False
    :rtype: bool
    """
    # The following regex is copied from the Django source code
    # to validate a URL using a regex.
    regex = re.compile(
        r"^(?:http|ftp)s?://"  # http://, https://, ftp:// or ftps://
        r"(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|"  # domain...
        r"localhost|"  # localhost...
        r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"  # ...or ip
        r"(?::\d+)?"  # optional port
        r"(?:/?|[/?]\S+)$",
        re.IGNORECASE,
    )
    # Add any sites you want to reject here
    blocked_sites: list[str] = []
    for site in blocked_sites:
        if site in url:
            return False
    # If the URL is not blocked, return True only when it matches the regex
    if re.match(regex, url):
        return True
    return False