ASP.NET MVC Alter Markup before Output

Excuse my limited knowledge here.
In the past I have used Steve Sanderson's method to HTML-encode by default at runtime: http://blog.stevensanderson.com/2007/12/19/aspnet-mvc-prevent-xss-with-automatic-html-encoding/
I need to alter img src and a href attributes before they are sent to the user's browser. There is a solution using JavaScript, but it is not ideal for several reasons. Intercepting the compiler is not an option either, because it would mean unnecessarily routing even trivial HTML through Response.Write.
Is there something I can do with HTTP modules or the view engine?
Any thoughts?
UPDATE: I do not need to HTML-encode the attributes, but I do need to change them.
Cheers.

Use a response filter (HttpResponse.Filter). It works with any ASP.NET project, including MVC, and it should work even if you're using a different view engine, because it intercepts the output at a lower level.
Here's an actual example that strips whitespace:
https://web.archive.org/web/20211029043851/https://www.4guysfromrolla.com/articles/120308-1.aspx
I've used this before to rewrite links before sending to the client, but I can't find the code at the moment.
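The 4GuysFromRolla article linked above filters whitespace, but the same technique works for attributes. Below is a minimal sketch (not the code the answerer mentions): a `Stream` subclass, the hypothetical `AttributeRewriteFilter`, that buffers the response and rewrites `img src` and `a href` values with a caller-supplied function when the stream is closed. It assumes a buffered response, UTF-8 output (real code would use `Response.ContentEncoding`) and double-quoted attributes. A regex is only adequate for markup you control. You would typically attach it to HTML responses only, for example from `Application_BeginRequest` in Global.asax or from an MVC action filter, with something like `Response.Filter = new AttributeRewriteFilter(Response.Filter, url => ...)`.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical response filter: buffers the whole body, then rewrites
// <img src="..."> and <a href="..."> values once, when the stream is closed.
public class AttributeRewriteFilter : Stream
{
    // Group 1 is everything up to and including the opening quote; "val" is the URL.
    private static readonly Regex AttrPattern = new Regex(
        @"(<img\b[^>]*?\bsrc=""|<a\b[^>]*?\bhref="")(?<val>[^""]*)""",
        RegexOptions.IgnoreCase);

    private readonly Stream _inner;                 // wrapped stream (e.g. the original Response.Filter)
    private readonly Func<string, string> _rewrite; // your URL rewriting rule
    private readonly MemoryStream _buffer = new MemoryStream();
    private bool _closed;

    public AttributeRewriteFilter(Stream inner, Func<string, string> rewrite)
    {
        _inner = inner;
        _rewrite = rewrite;
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => _buffer.Length;
    public override long Position
    {
        get => _buffer.Position;
        set => throw new NotSupportedException();
    }

    // Collect everything the page writes; nothing reaches the client yet.
    public override void Write(byte[] buffer, int offset, int count) =>
        _buffer.Write(buffer, offset, count);

    // Ignore intermediate flushes so the regex always sees complete tags.
    public override void Flush() { }

    // Rewrite the buffered markup, then pass it on to the wrapped stream.
    public override void Close()
    {
        if (_closed) return;
        _closed = true;

        string html = Encoding.UTF8.GetString(_buffer.ToArray());
        string rewritten = AttrPattern.Replace(html,
            m => m.Groups[1].Value + _rewrite(m.Groups["val"].Value) + "\"");

        byte[] output = Encoding.UTF8.GetBytes(rewritten);
        _inner.Write(output, 0, output.Length);
        _inner.Flush();
        _inner.Close();
        base.Close();
    }

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

Buffering until `Close` is the design choice that matters: ASP.NET may write the page in several chunks, and a tag split across two `Write` calls would otherwise be missed.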

Related

What is the best way to filter bad HTML Content from Posts using AntiXSS Library?

I want to create an ASP.NET website and prevent cross-site scripting (XSS). I have a page with Summernote (a WYSIWYG HTML editor) which, when submitted, posts HTML code to an MVC action via a form or an Ajax POST.
This method saves the code in my database as the content/body of a message. On another page you can display the content, which shows formatting such as lists.
For security reasons I want to filter the content I receive from the client. I am using Microsoft's AntiXSS library.
Part of my MVC code:
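// Sanitizer is Microsoft.Security.Application.Sanitizer from the AntiXSS 4.x library
[ValidateInput(false), HttpPost, ValidateAntiForgeryToken]
public ActionResult CreateMessage(string subject, string body)
{
    var cleanBody = Sanitizer.GetSafeHtmlFragment(body);
    // do the database thing here
    return RedirectToAction("Index"); // hypothetical target: an action method must return a result
}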
The major problem is that it breaks my HTML elements with the <img> tag, because it empties the src attribute.
The markup should be:
<p><img src="data:image/png;base64,some/ultra/long/picture/code/here" data-filename="grafik.png"></p>
What remains:
<p><img src="" alt=""><img src=""></p>
What can I do to prevent this?
Is there a way to add an exception rule?
Is there another, better way?
How does it work?
Thanks for your help!

There is no such thing as the "AntiXSS Library" anymore. It used to be a separate library, but Microsoft moved it into .NET, so it's now under System.Web.Security.AntiXss.
This matters because what you need is a sanitizer. The way you are using AntiXSS now, it works from a list of allowed HTML tags and allowed attributes for those tags, and removes everything else from your HTML. That's not very good for you, because you only want to remove JavaScript, regardless of tags or attributes. Take <a> with its href attribute, for example. You most probably want to let your users insert links, but you don't want them to be able to insert JavaScript via <a href="javascript: ...">. So you cannot filter out href on <a>, but if you leave it unchecked, your page will be vulnerable to XSS.
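To make the `javascript:` problem concrete: a sanitizer has to keep `href`/`src` but inspect their values. Here is a minimal sketch of that single check, using a hypothetical `UrlAttributeCheck.IsSafeUrl` helper and an illustrative allow-list that also admits image `data:` URIs like the PNG one in the question. It assumes the attribute value has already been HTML-decoded, and it is only one piece of what a real sanitizer does, not a replacement for one.

```csharp
using System;
using System.Linq;

public static class UrlAttributeCheck
{
    // Illustrative allow-list; any other scheme (javascript:, vbscript:, ...) is rejected.
    private static readonly string[] AllowedPrefixes =
    {
        "http:", "https:", "mailto:",
        "data:image/png;base64,", "data:image/jpeg;base64,", "data:image/gif;base64,"
    };

    // Returns true if an href/src value (already HTML-decoded) may be kept.
    public static bool IsSafeUrl(string value)
    {
        if (value == null) return false;

        // Browsers strip leading spaces/control characters and remove tabs and newlines
        // inside URLs, so "java\tscript:" still runs. Dropping every char <= U+0020
        // before checking is a conservative superset of that behavior.
        string compact = new string(value.Where(c => c > ' ').ToArray()).ToLowerInvariant();

        int colon = compact.IndexOf(':');
        if (colon < 0) return true; // no scheme: a relative URL

        // A ':' after the path, query or fragment has started is not a scheme separator.
        int delimiter = compact.IndexOfAny(new[] { '/', '?', '#' });
        if (delimiter >= 0 && delimiter < colon) return true;

        return AllowedPrefixes.Any(p => compact.StartsWith(p, StringComparison.Ordinal));
    }
}
```

An allow-list is used rather than a block-list because new or obfuscated dangerous schemes then fail closed.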
So you want a sanitizer that only removes JavaScript. The original AntiXSS library had a sanitizer, but when Microsoft moved it into .NET, the sanitizer was left out.
So, in short, AntiXSS will not help you with your current use case.
You can find proper HTML sanitizers, for example Google Caja (which includes a client-side sanitizer), among many others. The point is that even if the sanitizer runs in JavaScript on the client, you are fine as long as you are careful not to insert the data into the page DOM before sanitizing it.
In practice, you could save any data from the HTML editor to your database as is, without any transformation (mind SQL injection, of course, but current data access technologies should have that covered). Then, when the data is displayed, send it to the client without adding it to the page DOM (for example as JSON data, properly JSON-encoded of course), run your sanitizer to remove any JavaScript, and only then add it to the page.
This approach is good because your WYSIWYG HTML editor will likely have a preview screen. Don't forget to sanitize previews as well, otherwise the preview will be vulnerable to XSS. If sanitization happened on the server, you would have to send the editor contents to the server, sanitize them, and send them back to the user for preview, which is not very user-friendly.
Also note that many WYSIWYG editors support hooking into their rendering to add such a sanitizer. If an editor supports neither this nor a sanitizer of its own, it cannot be made secure against XSS.

Building SPA application. Is calling RenderBody necessary?

I'm building an SPA using Backbone.js, and for its back end I want to use ASP.NET Web API. I need only one page, and this fact causes me a lot of confusion.
ApiController returns JSON responses, and as far as I understand there's no need for ASP.NET-specific views at all. Am I right?
Can I use plain HTML for my main page? Or should I use *.cshtml and put a call to RenderBody instead?
If I choose the first option, how will I handle validation?
Thanks!

Well, the trick is that if you want search engines to be able to index your page, or people to be able to share it on Facebook with a custom icon/description, etc., you'll need to serve back static HTML: none of those bots are able to run your JavaScript to render the page the way a browser does.
If you're uninterested in this, then yes, you can completely avoid RenderBody.

Localizing JavaScript strings in an ASP.NET Web Forms application

One of the apps I work on is a large ASP.NET 3.5 Web Forms app that is used in the U.S., Central and South America, and Europe. We're also starting to do more work with AJAX, so we need to settle on a strategy for localizing strings in our JavaScript controls.
For example, we have a control written as a jQuery plugin that formats data into a sortable table. The buttons and table columns need to be localizable.
We're currently using two different approaches to handle this scenario, but I'm not completely satisfied with either.
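Write the bulk of the code in jQuery plugin style, then place a script block on the .aspx page that pulls values from a .resx file and feeds them into the plugin code. Here's an example (<%$ Resources: %> expressions are only allowed in server-control attributes, so the values are read with GetLocalResourceObject and serialized into JavaScript string literals):
<script type="text/javascript">
var view;
$(function() {
    // JavaScriptSerializer (System.Web.Extensions, ASP.NET 3.5) emits a quoted,
    // escaped string literal for each value from the page's local .resx file.
    view = {
        columnHeaders: {
            productNumber: <%= new System.Web.Script.Serialization.JavaScriptSerializer().Serialize(GetLocalResourceObject("WidgetProductNumber_HeaderText")) %>,
            productDescription: <%= new System.Web.Script.Serialization.JavaScriptSerializer().Serialize(GetLocalResourceObject("WidgetProductDescription_HeaderText")) %>
        }
    };
});
</script>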
Place the JavaScript in plain .js files with custom tokens in place of strings. We have a hand-rolled HttpModule that parses JavaScript files and replaces the tokens with values from any existing .resx file whose name matches the name of the JavaScript file being processed.
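For readers unfamiliar with this kind of module, here is a minimal sketch of its token-replacement step. The `{{res:Key}}` token syntax and the `ScriptTokenReplacer` name are hypothetical, and the sketch assumes the matching .resx values have already been loaded into a dictionary (for example via a `ResourceManager`). Each known token becomes a quoted, escaped JavaScript string literal.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class ScriptTokenReplacer
{
    // Hypothetical token syntax: {{res:ResourceKey}}
    private static readonly Regex Token = new Regex(@"\{\{res:(?<key>[A-Za-z0-9_]+)\}\}");

    // Replaces each known token with a JavaScript string literal.
    // Unknown keys are left untouched so missing translations are easy to spot.
    public static string Replace(string script, IDictionary<string, string> resources) =>
        Token.Replace(script, m =>
            resources.TryGetValue(m.Groups["key"].Value, out var text)
                ? ToJsLiteral(text)
                : m.Value);

    // Quotes a value, escaping quotes, backslashes, control characters and U+2028/U+2029.
    private static string ToJsLiteral(string s)
    {
        var sb = new StringBuilder("\"");
        foreach (char c in s)
        {
            if (c == '"' || c == '\\')
                sb.Append('\\').Append(c);
            else if (c < ' ' || c == '\u2028' || c == '\u2029')
                sb.Append("\\u").Append(((int)c).ToString("x4"));
            else
                sb.Append(c);
        }
        return sb.Append('"').ToString();
    }
}
```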
Both approaches have problems. I'd prefer to keep our JavaScript code separate from our .aspx pages to make it more unobtrusive and reusable.
The HttpModule approach is clever but a little opaque to developers. I'm also looking to implement a JavaScript bundler called Rejuicer, which is also written as an HttpModule, and getting these to work together seems like it would require customizing the open source code. I'd prefer to use the code as it's written so that we can upgrade it as the project progresses.
Are there any other tried-and-true strategies for approaching this problem in ASP.NET?

It seems that both approaches are a little more complex/cumbersome than necessary. Keep it simple.
1) Using an .ashx, a custom HTTP handler, or a web service, create a .NET object (anonymous or custom, it doesn't matter) that matches the client-side JSON object.
2) Populate the server-side object's properties with the localized values.
3) Set the response content type to application/json (or text/javascript if the response is loaded through a script tag, as in client-side option 2 below).
4) Using the JavaScriptSerializer class, serialize the object into the response stream.
From the client side, you have two options:
1) Use an AJAX call to the .ashx/handler/service to set your client-side "view" object to the response JSON.
2) Create a script tag with src="the/path/to/the/serviceOrHandler". In this case you would need to include the JavaScript variable declaration in your response output.
Let me know if you need a code sample.
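A minimal sketch of the server side of client option 2, assuming a flat set of strings (rather than the nested `columnHeaders` object in the question) and a hypothetical `LocalizationScript.Build` helper. A small hand-written encoder stands in for `JavaScriptSerializer` so the sketch has no dependencies. Inside the handler's `ProcessRequest` you would set `context.Response.ContentType = "text/javascript"` and `context.Response.Write(...)` the result, then reference the handler with a `<script src="...">` tag placed before the plugin code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class LocalizationScript
{
    // Produces e.g.: var view = {"productNumber":"..."};
    public static string Build(string variableName, IDictionary<string, string> strings)
    {
        // Sort keys so the output is deterministic (and therefore cache-friendly).
        var pairs = strings
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => Quote(kv.Key) + ":" + Quote(kv.Value));
        return "var " + variableName + " = {" + string.Join(",", pairs) + "};";
    }

    // Minimal JSON string escaping: quotes, backslashes, control chars, U+2028/U+2029.
    private static string Quote(string s)
    {
        var sb = new StringBuilder("\"");
        foreach (char c in s)
        {
            if (c == '"' || c == '\\')
                sb.Append('\\').Append(c);
            else if (c < ' ' || c == '\u2028' || c == '\u2029')
                sb.Append("\\u").Append(((int)c).ToString("x4"));
            else
                sb.Append(c);
        }
        return sb.Append('"').ToString();
    }
}
```

With option 1 (an AJAX call) you would return only the object part, without the `var` declaration.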

I just stumbled onto this question, and I have another answer to throw into the ring. It isn't my work, but it looks like a fairly elegant solution: it involves writing a localization handler to serve up ASP.NET resources to JavaScript. Here are a couple of links:
http://weblog.west-wind.com/posts/2009/Apr/02/A-Localization-Handler-to-serve-ASPNET-Resources-to-JavaScript
http://www.tikalk.com/use-aspnet-resource-strings-within-javascript-files/

How can I use asp.net to generate and return an HTML document as a string (outside of a web context)

I need to write a system to generate HTML email from a data model -
I was going to create a templating system to build the model into an HTML representation using HTML 'fragments' stored in an XML template. But it occurs to me that it might be better to use ASP or ASP.NET than to write my own templating system.
What I am wondering is whether/how it would be possible to use ASP (maybe ASP.NET MVC?) to return an HTML string. I wouldn't be running on a web server or responding to an HTTP request.
I have not done any ASP or ASP.NET yet; my experience of ASP stretches to 'Create new project' in Visual Studio, but maybe now is a good time to learn!
Thank You!

The standard ASP.NET view engine (ASP.NET Web Forms) is very difficult to use this way, because it is tightly tied to the HttpContext and really doesn't want to give you a string back; it wants to stream into the HttpResponse instead. So you'd generally need IIS up and running to make it work.
XSLT (as you are thinking) is a pretty decent option, as is your own template replacement scheme if things are simple enough. If things are more complex, some other options include:
The Spark View Engine
The new ASP.NET Razor view engine
Either of those should let you get a string out of a template without too much trouble.
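For the "simple enough" case, a template replacement scheme can be very small and needs no web server. A sketch, assuming a hypothetical `{{FieldName}}` placeholder syntax and that the model has been flattened into a dictionary of strings; values are HTML-encoded with `WebUtility.HtmlEncode` so model data cannot inject markup into the email.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class EmailTemplate
{
    // Hypothetical placeholder syntax: {{FieldName}}
    private static readonly Regex Placeholder = new Regex(@"\{\{(?<name>\w+)\}\}");

    // Replaces each placeholder with the HTML-encoded value;
    // placeholders with no matching value render as empty strings.
    public static string Render(string template, IDictionary<string, string> values) =>
        Placeholder.Replace(template, m =>
            values.TryGetValue(m.Groups["name"].Value, out var v)
                ? WebUtility.HtmlEncode(v)
                : string.Empty);
}
```

Once you need loops or conditionals in the template, that is the point where Razor or Spark become the better fit.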

The simplest way is to make an .aspx page that renders the email and then read it on the server using WebClient or an HttpWebRequest.
string emailHtml;
using (var client = new System.Net.WebClient()) // WebClient is IDisposable
    emailHtml = client.DownloadString(UrlOfPage); // UrlOfPage: address of the .aspx page that renders the email
There are other ways to capture the output, and I am sure you can find articles about them on Google, but from personal experience this is the simplest way to go.
Also beware of the HTML/CSS limitations of many email clients. They are not the same as a browser.

Parsing HTML generated from Legacy ASP Application to create ASP.NET 2.0 Pages

One of my friends is working on a good way to generate .aspx pages from the HTML pages produced by a legacy ASP application.
The idea is to run the legacy app, capture the HTML output, clean the HTML with some tool (say HtmlTidy), and parse/transform it into ASPX (using XSLT or a custom tool), so that the existing HTML elements, divs, images, styles, etc. get converted neatly into an .aspx page (too much to ask? ;) ).
Are there any existing tools/scripts/utilities that do this?

Here's what you do.
Define what the legacy app is supposed to do. Write down the scenarios of getting pages, posting forms, navigating, etc.
Write unit test-like scripts for the various scenarios.
Use the Python HTTP client library to exercise the legacy app in your various scripts.
If your scripts work, you (a) actually understand the legacy app, (b) can make it do the various things it's supposed to do, and (c) you can reliably capture the HTML response pages.
Update your scripts to capture the HTML responses.
You have the pages. Now you can think about what you need for your ASPX pages.
Edit the HTML by hand to make it into ASPX.
Write something that uses Beautiful Soup to massage the HTML into a form suitable for ASPX. This might be some replacement of text or tags with <asp:... tags.
Create some other, more useful data structure out of the HTML -- one that reflects the structure and meaning of the pages, not just the HTML tags. Generate the ASPX pages from that more useful structure.

Just found the HTML Agility Pack to be useful enough, as they understand C# better than Python.

I know this is an old question, but in a similar situation (50k+ legacy ASP pages that need to display in a .NET framework), I did the following.
Created a rewrite engine (HttpModule) which catches all incoming requests and looks for anything that is from the old site.
In a separate class (keep things organized!), use WebClient or HttpWebRequest, etc. to open a connection to the old server and download the rendered HTML.
Use the HTML Agility Pack (very slick) to extract the content I'm interested in; in our case, this is always inside a div with the class "bdy".
Throw this into a cache (a SQL table in this example).
Each hit checks the cache and either (a) retrieves the page and builds the cache entry, or (b) just gets the page from the cache.
An .aspx page built specifically for displaying legacy content receives the rewritten request and displays the relevant content from the legacy page inside an ASP.NET Literal control.
The cache is there for performance: since the first request for a given page involves at least two hits (one from the browser to the new server, and one from the new server to the old server), I store cacheable data on the new server so that subsequent requests don't have to go back to the old server. We also cache images, CSS, scripts, etc.
It gets messy when you have to handle forms, cookies, etc., but these can all be stored in your cache and passed through to the old server with each request if necessary. I also store content expiration dates and other headers that I get back from the legacy server, and I make sure to pass those back to the browser when rendering the cached page. Just remember to take as content-agnostic an approach as possible: you're effectively building an in-page web proxy that lets IIS render old ASP the way it wants, and then manipulating the output.
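The cache-aside flow described above (a hit serves the stored copy, a miss fetches from the legacy server and builds the entry, and the legacy server's expiration date is respected) can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: a `ConcurrentDictionary` stands in for the SQL table, the `fetchFromLegacy` delegate stands in for the WebClient download plus HTML Agility Pack extraction, and `LegacyPageCache` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

public class LegacyPageCache
{
    private sealed class Entry
    {
        public string Html;
        public DateTime ExpiresUtc;
    }

    // In-memory stand-in for the SQL cache table.
    private readonly ConcurrentDictionary<string, Entry> _entries =
        new ConcurrentDictionary<string, Entry>();
    private readonly Func<string, (string Html, DateTime ExpiresUtc)> _fetchFromLegacy;
    private readonly Func<DateTime> _utcNow;

    public LegacyPageCache(Func<string, (string Html, DateTime ExpiresUtc)> fetchFromLegacy,
                           Func<DateTime> utcNow = null)
    {
        _fetchFromLegacy = fetchFromLegacy;
        _utcNow = utcNow ?? (() => DateTime.UtcNow); // injectable clock, handy for testing
    }

    public string Get(string legacyUrl)
    {
        // Hit: a fresh copy is stored, so the legacy server is not contacted.
        if (_entries.TryGetValue(legacyUrl, out var entry) && entry.ExpiresUtc > _utcNow())
            return entry.Html;

        // Miss or expired: fetch from the legacy server and (re)build the entry.
        // Two concurrent misses may both fetch; the last write wins.
        var fetched = _fetchFromLegacy(legacyUrl);
        _entries[legacyUrl] = new Entry { Html = fetched.Html, ExpiresUtc = fetched.ExpiresUtc };
        return fetched.Html;
    }
}
```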
Works very well: I have all of the old pages working seamlessly within our ASP.NET app. This saved us a solid year of development time that would have been required if we had had to touch every legacy ASP page.
Good luck!

We Keep Coding
