URL display with proper output using System.Uri in C#

I have an application in which I have stored a lot of websites without validating them. Now I am validating the URL entered, but the URLs that were already stored are still there as they are.
I want strict display code that also corrects the users' typos and just gives me a proper URL to deal with.
The data that is already in the system has a lot of typos such as ...http://example.com, htp://example.com or ttp://example.com. I want the code to tackle that and come up with the proper URL, either by stripping the invalid part with a regex or by correcting it.
What is the best approach to achieve this?

You can obviously pick out the correct ones with a regex.
However, you will need to write your own logic to fix those that are 'broken'. You could pull these out with another regex and then simply search and replace the broken element. There are going to be limitations to this, as you can only really check the protocol prefix and not the domain part itself.
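A minimal sketch of that "search and replace the broken element" step, assuming the typos look like the ones in the question (stray leading dots, a dropped or doubled 'h'/'t', a missing colon). The UrlRepair class and FixScheme name are hypothetical. It only touches the protocol prefix and lets System.Uri have the final say, so, as noted above, it cannot tell whether the domain itself is right.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

public static class UrlRepair
{
    // A mangled http/https prefix at the start of the value: optional stray dots,
    // up to two 'h', one or two 't', 'p', an optional 's' (captured), and then
    // either a colon (with up to two slashes) or one or two slashes.
    // Requiring ':' or '/' keeps hosts such as "tp-link.com" from being mangled.
    private static readonly Regex BrokenScheme = new Regex(
        @"^\.*h{0,2}t{1,2}p(s?)(?::/{0,2}|/{1,2})",
        RegexOptions.IgnoreCase);

    // Returns a repaired absolute http/https URL, or null if the repair failed.
    public static string? FixScheme(string stored)
    {
        string trimmed = stored.Trim();

        // Already a well-formed http/https URL: leave it untouched.
        if (IsHttp(trimmed))
            return trimmed;

        // Strip a broken prefix if there is one, remembering whether it had an 's'.
        Match m = BrokenScheme.Match(trimmed);
        string rest = m.Success ? trimmed.Substring(m.Length) : trimmed;
        string scheme = m.Success && m.Groups[1].Length > 0 ? "https://" : "http://";

        string candidate = scheme + rest;
        return IsHttp(candidate) ? candidate : null;
    }

    private static bool IsHttp(string value) =>
        Uri.TryCreate(value, UriKind.Absolute, out var uri) &&
        (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}
```

Anything the pattern does not recognise is simply prefixed with http://, so the repaired values are still worth a review before they overwrite the stored data.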

Here is my try:
http(s)?://(www\.)?[a-zA-Z0-9\-./]+
where [a-zA-Z0-9\-./] includes all the characters that you want users to be able to use.
P.S. Please be aware that if you write this pattern in a regular C# string literal, you must double every backslash (\\), otherwise your expression might not work properly. A verbatim string (@"...") avoids the doubling.
Hope it gets you started.
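One way to put this pattern to work in C# is sketched below, combined with a System.Uri check since that is what the question asks about. Assumptions: the pattern is anchored with ^ and $ (the version above matches anywhere in a string), the dot in "www." is escaped, and the UrlPattern class and LooksValid name are made up. It also shows that the verbatim and doubled-backslash spellings produce the same pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlPattern
{
    // Verbatim string: each backslash reaches the regex engine unchanged.
    public const string Verbatim = @"^https?://(www\.)?[a-zA-Z0-9\-./]+$";

    // The same pattern as a regular string literal: every backslash is doubled.
    public const string Escaped = "^https?://(www\\.)?[a-zA-Z0-9\\-./]+$";

    private static readonly Regex Pattern = new Regex(Verbatim);

    // Regex for the allowed characters, System.Uri for the structural check.
    public static bool LooksValid(string url) =>
        Pattern.IsMatch(url) &&
        Uri.TryCreate(url, UriKind.Absolute, out var uri) &&
        (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}
```

Note that this character class rejects query strings (? = &), so widen it if your stored URLs contain them.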

Related

No luck with HTML decoding based on safe HTML tags (vb.net or c#)

I've spent quite a bit of time trying to figure out the best way to handle this. I'm HTML encoding rich text from untrusted user input prior to storing it in the database.
I've bounced back and forth between multiple discussions, and it seems the safest method is to:
HTML encode absolutely everything, and only decode based on a white/safe list prior to sending it back to the client.
However, I'm also seeing strong suggestions for using http://htmlagilitypack.codeplex.com/
This compares user input against your safe/white list.
I've read:
C# HtmlDecode Specific tags only
https://eksith.wordpress.com/2011/06/14/whitelist-santize-htmlagilitypack/
And really, about 10 other posts and have become frustrated because now I can't figure out the best way to handle this.
I've tried using Regex.Replace with a pattern for each acceptable tag:
For Each tag In AcceptableTags.Split(CChar("|")).ToList()
    pattern = "<" + "\s*/?\s*" + tag + ".*?" + ">"
    Regex = New Regex(pattern)
    input = Regex.Replace(input, pattern)
Next
This doesn't seem to work well at all.
Is there someone out there who has a tried-and-true method, with an example implementation they wouldn't mind sharing? I'll take C# or VB.NET.
It depends on your data. Whitelisting during the initial validation is fine if, for example, you're trying to keep HTML out of a phone number. On the other hand, if you can't be specific about what's in and what's out, then just leave it "raw".
It's highly unlikely that storing encoded data in a database is the correct thing to do.
Any system of even marginal complexity will have non-HTML clients it will have to serve data to. When you do have an HTML client, you need to escape the output appropriate to HTML. Same for XML. Similarly, if you decide today you like JSON better, you'll encode to that. CSV? No problem - put quotes around your values (and escape any quotes) in case they have commas. Use parameters when doing SQL. Get the idea?
TL;DR:
Whitelist input if you can.
Saving specifically encoded data is probably wrong.
Always, always, always escape appropriately for your output.
Never try to do your own escaping; always use a trusted library. You will never do a good enough job.
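To make "store it raw, escape it for the output" concrete, here is a small sketch in which the same raw value is escaped by a library for two different outputs: WebUtility.HtmlEncode for an HTML page and System.Text.Json for a JSON response. The OutputEscaping class name is hypothetical; for SQL, stick to parameters as the answer says.

```csharp
using System.Net;
using System.Text.Json;

public static class OutputEscaping
{
    // The database holds the raw text; each output format gets its own escaping.
    public static string ForHtml(string raw) => WebUtility.HtmlEncode(raw);

    // A JSON string literal; System.Text.Json also escapes HTML-sensitive
    // characters such as < > & by default.
    public static string ForJson(string raw) => JsonSerializer.Serialize(raw);
}
```

Because nothing is encoded at storage time, adding another output format later just means adding another encoder at the edge.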

Mixed character and integer based identifier best practice in web applications

I'm developing a web application in which I need to identify a certain page using an identifier.
Usually I would use an auto-increment integer, which relates to the ID of the item in the DB.
Like this, for example:
http://example.com/item/1
But I see more and more use of identifiers like this (TinyURL and YouTube):
http://example.com/item/1BHYQJh1
And I wonder, should I go for this solution?
What is the benefit? Is it just to shorten the ID in case you get up to a really long integer?
Or is it to "hack-proof" the solution so that people can't "guess" the URL by replacing 1 with 2?
I really appreciate the last one; I would like to add this extra security to my application. But does anyone know of any code snippets that do this exact thing?
Examples in C# would be great.
This is not really a programming issue, but...
I prefer 'nice' URLs and I am not alone, and to me plain numbers are nicer than 1BHY..., but YMMV.
The 'guessing' you mention is not relevant here. If the user is allowed to access /2, then it doesn't matter. If he is not allowed, then basing the security on obscure URLs is a poor choice. What if someone types the wrong value and stumbles upon a page not meant for him?
If you need security, you need to check whether the current user is allowed to access the page at specified URL and act accordingly.
I don't understand what 'examples in C#' mean. These are URLs, they are not expressed in C#.
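For the shortening part of the question only, a common trick is to write the integer ID in base 62 (digits plus upper- and lower-case letters). This is a sketch with a hypothetical Base62 class and an assumed alphabet order; as the answer says, it is not a security measure, since anyone can decode it.

```csharp
using System;
using System.Text;

public static class Base62
{
    // Assumed alphabet: 0-9, then A-Z, then a-z (62 symbols).
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static string Encode(long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (value == 0) return "0";

        var sb = new StringBuilder();
        while (value > 0)
        {
            // Prepend the least significant base-62 digit.
            sb.Insert(0, Alphabet[(int)(value % 62)]);
            value /= 62;
        }
        return sb.ToString();
    }

    public static long Decode(string text)
    {
        long result = 0;
        foreach (char c in text)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0) throw new FormatException($"'{c}' is not a base-62 digit.");
            result = checked(result * 62 + digit);
        }
        return result;
    }
}
```

The page handler would decode the segment back to the database ID and then still perform the authorization check described above.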
You could use Guid.NewGuid() to create a 'unique' identifier.
See also: Is a GUID unique 100% of the time?
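Guid.NewGuid().ToString("N") gives 32 hex characters, which is long for a URL. A sketch of a shorter, URL-safe form, assuming you are happy to Base64-encode the 16 GUID bytes and swap the two URL-unsafe characters (the ShortGuid class is hypothetical):

```csharp
using System;

public static class ShortGuid
{
    // 16 GUID bytes -> 24 Base64 characters ending in "==" -> 22 URL-safe characters.
    public static string Encode(Guid guid) =>
        Convert.ToBase64String(guid.ToByteArray())
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');

    // Undo the character swaps and restore the padding before decoding.
    public static Guid Decode(string id) =>
        new Guid(Convert.FromBase64String(id.Replace('-', '+').Replace('_', '/') + "=="));
}
```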

Strategy for auto encoding text inputs?

To prevent my application from crashing with the error "A potentially dangerous Request.Form value was detected...", I just turned page validation off. I want to revisit this and solve it correctly.
Is there a good strategy for this? If people are entering '<' and '>', I think the only way to save their data is to encode it via JavaScript. I have tried catching it in the code-behind, but by then it is too late. I am thinking of inheriting from the TextBox and automatically encoding/decoding the input with client scripts. I also have to think about all the angle brackets that are already saved in my database.
Any suggestions or experience with this?
I gather from your question that you don't want your client to send you "dangerous" content, so it's desirable to leave page validation turned on as a last line of defense, instead of turning it off and using Server.HtmlEncode on each user input value (you might miss one, and it is a lot of work).
I would go for a javascript solution, for example you could use a library such as jQuery, and hook into the submit events of the forms, and tidy the input before submitting. Much cleaner than creating your own derived textbox.
For the users without javascript, or that try to "hack" your little script, sc#!w them, they will reach your last line of defense, and get an error.
It's best to think of the built-in page validation as a safety device that isn't applicable to all cases. There are more than a few times when it is completely impossible to do something with it turned on. In these cases we turn it off, and deal with the validation ourselves.
The most obvious case is that sometimes we actually do want to send big chunks of HTML to the server. Of course, doing so still has to be made secure, but "oh, that looks like a big chunk of HTML! throw a security exception!" obviously isn't the correct way to do that.
So, in these cases it's perfectly sensible to turn off page validation and add your own server-side validation. It does mean that you have to scrutinize how this input will be used more closely than before. Follow the path of every input datum (not just those where you expect to see characters like <) and ensure that either it will never be sent back to the client unescaped, or that it is thoroughly inspected to guarantee safety.
You can escape dangerous characters before posting the data, like this (encodeURIComponent() is the standard replacement for the deprecated escape()):
string = encodeURIComponent(string);
and then on the server side:
var stringVal = Server.UrlDecode(Request["string"]);
Something like that.
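The round trip can be checked without a browser. This sketch assumes Uri.EscapeDataString as a stand-in for the client-side encodeURIComponent (the two agree on characters like <, >, & and spaces but may differ on a few punctuation marks) and WebUtility.UrlDecode in place of Server.UrlDecode; the EscapedPost class is hypothetical.

```csharp
using System;
using System.Net;

public static class EscapedPost
{
    // Stand-in for the client-side escaping: no raw '<' or '>' survives,
    // so request validation has nothing to object to.
    public static string EncodeLikeClient(string raw) => Uri.EscapeDataString(raw);

    // Mirrors the explicit Server.UrlDecode call on the server.
    public static string DecodeOnServer(string posted) => WebUtility.UrlDecode(posted);
}
```

Keep in mind that the decoded value is raw markup again, so it still has to be encoded when it is written back to a page.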
Have you considered using
Server.HtmlEncode(input)
There is no real need to do it on the client side using JavaScript. You can easily do it on the server side using the above technique.
This may also be a duplicate of this question.
/BB

Is there a simple way in C#/ASP.NET to validate that user input is a URL to guard against XSS attacks?

We've got an interstitial page that warns people when they're leaving our site. The trouble is it takes querystring parameters and blindly generates a page, thus it's vulnerable to XSS attacks. I've been tasked with fixing it and I want to do it right.
You should call Server.HtmlEncode to properly escape your generated HTML.
Yes, try this:
if (Uri.IsWellFormedUriString(url, UriKind.Absolute) && url.StartsWith("http"))
    Response.Write(string.Format("{0}",
        HttpUtility.HtmlEncode(url)));
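A slightly stricter variant of that check, as a sketch: it compares the parsed scheme instead of using StartsWith("http") (which would also accept schemes such as "httpx:") and encodes the URL for both the attribute and the link text. The ExitLink class is hypothetical, and the anchor markup is an assumption, since the format string above only shows "{0}".

```csharp
#nullable enable
using System;
using System.Net;

public static class ExitLink
{
    // Returns an encoded anchor for absolute http/https URLs, or null otherwise.
    public static string? Render(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
            return null;

        // HtmlEncode also encodes double quotes, so the value is safe inside href="...".
        string encoded = WebUtility.HtmlEncode(url);
        return $"<a href=\"{encoded}\">{encoded}</a>";
    }
}
```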
So, things not to do:
Use regex.
Use HtmlEncode without thought.
Things to do:
Treat all input as untrusted.
Encode input before it is output. However, make sure you're using the right type of encoding. If you put user input in an attribute, use HtmlAttributeEncode; if it's just HTML, use HtmlEncode; if you put it into JavaScript, it's JavaScriptEncode. If your JavaScript puts it into a div, then it's HtmlEncode followed by JavaScriptEncode.
Consider using AntiXSS, which provides more encoding mechanisms and uses a safe-list approach, which is inherently safer.
Whitelist the exit URLs so people cannot use this page as an open redirector. Do not have a parameter that holds the URL; rather, pass a GUID that is used to look up the URL in a database, session table or similar.
(Disclosure : I own AntiXSS)
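The GUID-lookup suggestion can be sketched as follows. The ExitUrlRegistry class is hypothetical and keeps the mapping in memory purely for illustration; a real site would store it in a database or session as suggested. The point is that the interstitial page never accepts a URL from the querystring, only an id it issued itself.

```csharp
using System;
using System.Collections.Generic;

public sealed class ExitUrlRegistry
{
    private readonly Dictionary<Guid, string> _urls = new Dictionary<Guid, string>();

    // Called when the site renders an outbound link: store the URL, hand out only the id.
    public Guid Register(string url)
    {
        Guid id = Guid.NewGuid();
        _urls[id] = url;
        return id;
    }

    // Called by the interstitial page with the querystring value.
    // Anything that is not a known id (including a raw URL) is rejected.
    public bool TryResolve(string idFromQuery, out string url)
    {
        if (Guid.TryParse(idFromQuery, out Guid id) && _urls.TryGetValue(id, out var found))
        {
            url = found;
            return true;
        }
        url = string.Empty;
        return false;
    }
}
```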
The best way is to get rid of the page entirely and just accept that it's a website and make it act like a website. Websites link to other resources; it's why the web has over 200 million sites instead of about a dozen.
Failing that, your best bet is to start with HtmlEncoding as a quick fix, and then replace it with a lookup of IDs that map to the destination sites.
But really, those "ZOMG you are leaving!" pages are horrible. They're even worse than the sites that open new tabs for every so-called "external" link.

Email templating engine

I am about to write a simple email manager for the site I'm working on (ASP.NET/C#); the site sends out various emails, such as on account creation, news, some user actions, etc. So there will be some email templates with placeholders like [$FirstName] which will be replaced by actual values. Pretty standard stuff. I'm just wondering if someone can advise on existing code. Again, I need something very simple, without many bells and whistles, and obviously with source code (and free).
Any ideas/comments will be highly appreciated!
Thanks,
Andrey
There are several threads on Stack Overflow about this already, but I ended up rolling my own solution from various suggestions there.
I used this FormatWith extension method to take care of simple templating, and then I made a basic Email base class to take care of common tasks, like pulling in an appropriate template and replacing all the requisite info, as well as providing a Send() method.
All the emails I need to send have their own subclass deriving from the base, and define things unique to them, such as TemplateText, BindingData, Recipients, and Subject. Having them each in their own class makes them very easy to unit test independently of the rest of the app.
So that your app can work with these email classes without really caring which one it's using, it's also a good idea to implement an interface, with any shared methods (the only one I cared about was Send()), so then your app can instantiate whatever email class it wants and work with them in the same way. Maybe generics could be used, too, but this was what I came up with.
IEmail email = new MyEmailClass();
email.Send();
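A rough sketch of the shape this answer describes, using its member names (TemplateText, BindingData, Recipients, Subject, Send). Everything else is assumed: EmailBase and WelcomeEmail are hypothetical, the {Key} replacement is a plain string.Replace loop rather than the FormatWith extension, and the actual sending is delegated to an injected transport so the classes can be unit tested without an SMTP server.

```csharp
using System;
using System.Collections.Generic;

public interface IEmail
{
    void Send();
}

public abstract class EmailBase : IEmail
{
    // Hypothetical transport: (recipients, subject, body). In production this could wrap SmtpClient.
    private readonly Action<IReadOnlyList<string>, string, string> _transport;

    protected EmailBase(Action<IReadOnlyList<string>, string, string> transport) => _transport = transport;

    protected abstract string TemplateText { get; }
    protected abstract IReadOnlyDictionary<string, string> BindingData { get; }
    public abstract IReadOnlyList<string> Recipients { get; }
    public abstract string Subject { get; }

    // Replace each {Key} placeholder in the template with its bound value.
    public string Render()
    {
        string body = TemplateText;
        foreach (var pair in BindingData)
            body = body.Replace("{" + pair.Key + "}", pair.Value);
        return body;
    }

    public void Send() => _transport(Recipients, Subject, Render());
}

public sealed class WelcomeEmail : EmailBase
{
    private readonly string _firstName, _address;

    public WelcomeEmail(string firstName, string address,
                        Action<IReadOnlyList<string>, string, string> transport)
        : base(transport)
    {
        _firstName = firstName;
        _address = address;
    }

    protected override string TemplateText => "Hi {FirstName}, welcome aboard!";
    protected override IReadOnlyDictionary<string, string> BindingData =>
        new Dictionary<string, string> { ["FirstName"] = _firstName };
    public override IReadOnlyList<string> Recipients => new[] { _address };
    public override string Subject => "Welcome";
}
```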
Edit: There are many more suggestions here: Can I set up HTML/Email Templates with ASP.NET?
I always do the following. Templates are text strings with numbered placeholders ({0}, {1}, ...). To use a template, I load the string (from whatever store) and then call string.Format(template, param1, param2, ...).
Simple, and it works well. When you need something stronger you can move to a framework of some kind, but string.Format has always worked well for me.
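A minimal illustration of the string.Format approach; the SimpleTemplates class and the template text are made up. One thing to watch in HTML emails: literal braces (for example in inline CSS) must be doubled as {{ and }}, otherwise string.Format throws a FormatException.

```csharp
public static class SimpleTemplates
{
    // Numbered placeholders; the CSS braces are doubled so string.Format treats them literally.
    public const string Welcome =
        "Hello {0}, your account on {1} is ready. <style>p {{ color: black; }}</style>";

    public static string Fill(string template, params object[] args) => string.Format(template, args);
}
```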
Note:
Alison R's link takes this method to the next step, using .NET 3.5's anonymous types to great effect. If you are on 3.5, I recommend using the FormatWith there (I will); otherwise this way works well.
Having just done this myself, there is some great information at: Sending Email Both in HTML and Plain Text. Best part is, you don't need anything other than .NET.
Essentially, you create an HTML page (AKA, your formatted e-mail) with the tags that you want to replace (in the case of this solution, tags will be in the format of: <%TAGNAME%>). You then utilize the information found at the above website to create a mail template with the tags filled with the appropriate data, and the injections will be done for you into your HTML template. Then, you just use the SMTP classes built into .NET and send the mail on its way. It's very simple and straightforward.
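The linked article isn't reproduced here, but the two ingredients described (filling <%TAGNAME%> tags, then sending HTML and plain text together with the built-in classes) can be sketched with System.Net.Mail. The MultipartMail class and its methods are hypothetical; AlternateView and MailMessage are the real framework types. Values inserted into the HTML part should themselves be HTML-encoded.

```csharp
using System.Collections.Generic;
using System.Net.Mail;
using System.Net.Mime;
using System.Text;

public static class MultipartMail
{
    // Replace each <%TAGNAME%> token with its value.
    public static string FillTags(string template, IDictionary<string, string> values)
    {
        foreach (var pair in values)
            template = template.Replace("<%" + pair.Key + "%>", pair.Value);
        return template;
    }

    // multipart/alternative lists parts in increasing order of preference
    // (RFC 2046), so the plain-text part goes first and the HTML part last.
    public static MailMessage Build(string from, string to, string subject,
                                    string textBody, string htmlBody)
    {
        var message = new MailMessage(from, to) { Subject = subject };
        message.AlternateViews.Add(
            AlternateView.CreateAlternateViewFromString(textBody, Encoding.UTF8, MediaTypeNames.Text.Plain));
        message.AlternateViews.Add(
            AlternateView.CreateAlternateViewFromString(htmlBody, Encoding.UTF8, MediaTypeNames.Text.Html));
        return message;
    }
}
```

Sending is then a matter of passing the message to an SmtpClient configured for your mail server.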
Let me know if you have any additional questions. Hope that helps!
If you are using ASP.NET, you already have a templating engine available to you. Simply create an ASP.NET page that produces the results you want (using whatever controls, etc.), and set the ContentType of the response to the appropriate type (either text or HTML, depending on the email format).
Make sure that this URL is not publicly exposed.
Then, in your code, you would create an HttpWebRequest/HttpWebResponse or WebClient and then fetch the URL and get the contents. The ASP.NET engine will process the request and return your formatted results, which you can then email.
If you want something simpler, why not use a regex and match? Just make sure you have a fairly unique identifier for your fields (a prefix and suffix that you can guarantee will never be used otherwise, or for which you can at least write an escape sequence), and you could easily use the Match method to do the replacement.
The only "gotcha" to the RegEx approach is that if you have to do embedded templating, then that's going to require a little more work.
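A sketch of the regex approach using the [$FirstName] placeholder style from the question as the unique prefix/suffix. The RegexTemplate class is hypothetical. Replacing through a MatchEvaluator in a single pass means a value that happens to contain "[$Something]" is not expanded again, and unknown placeholders are left as they are.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexTemplate
{
    // Placeholder syntax assumed from the question: [$Name].
    private static readonly Regex Token = new Regex(@"\[\$(\w+)\]");

    // Single pass: each match is looked up once; unknown names are kept verbatim.
    public static string Fill(string template, IReadOnlyDictionary<string, string> values) =>
        Token.Replace(template, m => values.TryGetValue(m.Groups[1].Value, out var v) ? v : m.Value);
}
```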
