Allowing and finding links while removing HTML

I recently asked a couple of questions on here related to two subjects:
1) Stopping HTML that a user posts in a text field from rendering as HTML on a web page
2) Detecting links in a string, and where they start and end
I am having problems trying to put the two together.
Overall, I have a text box that a user can type into. They are allowed to type in anything they want.
When it is posted to the server, I want to find all links in that text and save them to a database table, then show the text they typed on the web page without any HTML except what I put in myself.
So if they type www.google.com, I will turn it into http://www.google.com
I can do that no problem. However, if they type something like <p style="margin-left:50px">www.google.com</p>, it will find the link and change the link, but the web page will render the margin bit as actual HTML.
I was recommended to use HTML encoding. However, if I do it AFTER I have saved the links into the database, the indices (the start and length of each link in the text) are off.
If I do the HTML encoding BEFORE I save the links, the links may get messed up. If they type in
<a href="www.google.com">www.google.com</a>
It will encode the text, and the link my regex will find is
www.google.com">www.google.com</a&gt
I either need to improve my regex or find another way.
For reference, my regex is
@"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:#=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])"

If I understood this correctly, you need to display any HTML tags the user types as-is, as literal text. Try replacing the < and > characters with &lt; and &gt; respectively.
If you do this before you run the regex replace, it should sort out your issue.
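
As a rough sketch of this encode-first order in C#: the text is HTML-encoded once, and the link search then runs on the encoded string, so the stored start and length refer to the same text that is later displayed. The class name LinkFinder and the simplified pattern are assumptions made for illustration, not the regex from the question. WebUtility.HtmlEncode encodes quotes and ampersands as well as < and >, and the pattern deliberately stops at &, so a URL with several query-string parameters would be cut short.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class LinkFinder
{
    // Simplified, hypothetical pattern: starts at "www." or "http(s)://",
    // and must end on a letter, digit or slash (so a trailing "." is left out).
    private static readonly Regex LinkPattern = new Regex(
        @"(?:www\.|https?://)[a-z0-9./_-]*[a-z0-9/]",
        RegexOptions.IgnoreCase);

    // Encode first, so any tags the user typed become inert text.
    public static string Encode(string raw)
    {
        return WebUtility.HtmlEncode(raw);
    }

    // Search the ENCODED text, so Start/Length match what will be displayed.
    public static List<(int Start, int Length, string Url)> FindLinks(string encoded)
    {
        var links = new List<(int Start, int Length, string Url)>();
        foreach (Match m in LinkPattern.Matches(encoded))
        {
            links.Add((m.Index, m.Length, m.Value));
        }
        return links;
    }
}
```

The positions returned by FindLinks can be saved to the database together with the encoded text; the <a> tags are then inserted at those positions when the page is rendered.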

Related

Include hyperlinks in description of phone call entity

My code generates a phone call activity in each customer lead, then records the SMS conversation between that customer and the company. The description may contain a URL to an image that the customer finds relevant.
I can put the URL as text in the description property, but I would like to turn it into a hyperlink (something like an HTML <a> tag). That way I can click to open it directly instead of copying and pasting the URL first.
How can I achieve this?

The description field on a phone call is just a plain old text field, so you can't add any formatting or hyperlinks there.
You could make a separate field, single line of text, with a type of URL. Then the URL you input should act as a typical hyperlink.
The data types are documented at Create and Edit Fields.

I believe you're asking about being able to have a large text area, within CRM, that is editable, but allows you to enter, or at least click on, hyperlinks.
I see two supported solutions, but both would take a lot of customization.
1) Create an HTML web resource that loads the text from the field, parses it looking for hyperlinks, and then adds the correct <a> tagging so that the links are clickable.
2) Search for a client-side wiki markup text editor widget of some sort (possibly something like http://goessner.net/articles/wiky/ ?), and then format the hyperlinks with the correct markup.

None of the CRM data types support native, full-blown HTML rendering. If you use JavaScript to update the value so that it renders as HTML, it will trigger a save to the database, because CRM sees the attribute value as modified.

Parsing HTML - Getting the paragraph with the most text

I am trying to parse an HTML page (the page isn't known in advance and changes often, but it is always a news site). Basically, I need to pull the news out of a bunch of code downloaded from the site, which I'm trying to do with a regex like this:
Match m = Regex.Match(x.Result, @"<p>(.+?)</p>");
Obvious bad idea - it pulls down anything tagged as a paragraph.
Any better ways to pull a news article or large body of text, separated from the code, from a website?

Well, this may not be exactly what you want (you haven't provided a lot of detail), but you can strip all tags from a page with a pair of simple regexes.
Remove JavaScript and CSS:
<(script|style).*?</\1>
Remove tags:
<.*?>
Credit goes to this existing answer. What you will be left with is the "plain text" from the page.
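
A minimal C# sketch of applying those two patterns in order might look like the following. The class and method names are hypothetical. RegexOptions.Singleline is assumed so that . also matches line breaks inside script and style blocks, and the final whitespace clean-up is an addition that is not part of the answer. Regex-based stripping like this is crude and can be fooled by unusual markup.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    public static string StripTags(string html)
    {
        // Singleline lets "." span line breaks; IgnoreCase covers <SCRIPT> etc.
        var options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

        // 1) Drop script and style blocks together with their contents.
        string withoutCode = Regex.Replace(html, @"<(script|style).*?</\1>", " ", options);

        // 2) Drop every remaining tag, keeping the text between tags.
        string withoutTags = Regex.Replace(withoutCode, @"<.*?>", " ", options);

        // Extra step (assumption): collapse the leftover runs of whitespace.
        return Regex.Replace(withoutTags, @"\s+", " ").Trim();
    }
}
```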

Best method or control to display text from a file in an asp.net webpage

This may be a totally newbie question, but here it goes. I have an ASP.NET web page on which I need to display text from a .txt file. I am trying to figure out the best control or method to do this with. I looked at using an iframe, but it does a very poor job of displaying the text from the file (for instance, there is no word wrap in an iframe). I don't really expect anyone to solve this for me completely, but if you have any suggestions or know of any tutorials or explanations where someone has done this, I would be very grateful.
Thanks

You can, for example, add a Literal control, assign File.ReadAllText("yourfile.txt") to its Text property, and replace \r\n with <br />.

You should just read the text file in code (using a StreamReader, for example). Once you have that text, just output it to your web page.
If you're using Web Forms, you could place a Label and then set the text of that Label.
If you're using MVC, you could put it in the ViewBag and then output the value from the ViewBag in your view (or use a custom view model).

You could use a Literal or Label control. Make sure that the control that you use encodes the text in order to avoid XSS vulnerabilities (or encode the text manually if necessary).
It may also be necessary to substitute line endings with <br/> tags.
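
Putting these suggestions together, a small sketch could encode the file's text first and only then turn line endings into <br /> tags, so the tags inserted by the code are the only markup that reaches the browser. The class name TextFileRenderer and the control name in the usage comment are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Net;

public static class TextFileRenderer
{
    // Encode first (guards against XSS), then insert our own line-break tags.
    public static string ToHtml(string text)
    {
        string encoded = WebUtility.HtmlEncode(text);
        return encoded
            .Replace("\r\n", "\n")   // normalise Windows line endings
            .Replace("\r", "\n")     // normalise old Mac line endings
            .Replace("\n", "<br />");
    }

    public static string LoadAsHtml(string path)
    {
        return ToHtml(File.ReadAllText(path));
    }
}

// Hypothetical usage in a Web Forms code-behind, with a Literal named FileLiteral:
//   FileLiteral.Text = TextFileRenderer.LoadAsHtml(Server.MapPath("~/yourfile.txt"));
```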

Safe HTML in ASP.NET Controls

I'm sure this is a common question...
I want the user to be able to enter and format a description.
Right now I have a multiline textbox that they can enter plain text into. It would be nice if they could do a little HTML formatting. Is this something I am going to have to handle myself? Do I parse the input and only validate it if it contains "safe" tags like <ul><li><b> etc?
I am saving this description in an SQL db. In order to display this HTML properly do I need to use a literal on the page and just dump it in the proper area or is there a better control for what I am doing?
Also, is there a free control like the one on SO for user input/minor editing?

Have a look at the AntiXSS library. The current release (3.1) has a method called GetSafeHtmlFragment, which can be used to do the kind of parsing you're talking about.
A Literal is probably the correct control for outputting this HTML, as the Literal just outputs what's put into it and lets the browser render any HTML. Labels will output all the markup including tags.
The AJAX Control Toolkit has a text editor.

"Also, is there a free control like the one on SO for user input/minor editing?"
Stack Overflow uses the WMD control and Markdown, as explained here:
https://blog.stackoverflow.com/2008/09/what-was-stack-overflow-built-with/

You will need to check which tags are entered to avoid cross-site scripting (XSS) attacks etc. You could use a regex to check that any tags are on a 'whitelist' you have and strip out any others.
You can check out this link for a list of rich text editors.
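
One way to sketch a whitelist in C# is shown below. Note that it is a variation on the idea above rather than the exact approach: instead of stripping unknown tags, it HTML-encodes the whole input and then restores only a few attribute-free tags, so anything not on the list is displayed as text. The class name SafeHtml and the tag list are assumptions; a maintained sanitizer library is likely a safer choice for production use.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class SafeHtml
{
    // Matches an ENCODED opening or closing tag from the whitelist,
    // with no attributes allowed, e.g. "&lt;b&gt;" or "&lt;/li&gt;".
    private static readonly Regex AllowedTag = new Regex(
        @"&lt;(/?)(b|i|ul|li)&gt;", RegexOptions.IgnoreCase);

    public static string Sanitize(string input)
    {
        // Step 1: make everything inert.
        string encoded = WebUtility.HtmlEncode(input);

        // Step 2: turn only the whitelisted tags back into real markup.
        return AllowedTag.Replace(encoded, "<$1$2>");
    }
}
```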

In addition to the other answers, you will need to set ValidateRequest="false" in the @Page directive of the page that contains the textbox. This turns off the standard ASP.NET validation that prevents HTML from being posted from a textbox. You should then use your own validation routine, such as the one @PhilPursglove mentions.

Get a subsection of HTML document

I am trying to get a subsection of an HTML page. The functionality I am looking for is similar to the one implemented on most blogs. Usually, on the main page of the blog, you only see a section of the post, and when you click on the title you get the full content of that blog post.
There must be code that exists to get that subsection without breaking the HTML.
Does anyone know of good .NET code that does that?
EDIT: I need to keep the HTML formatting of the content, so stripping all the HTML isn't really an option. I wouldn't mind taking a fixed-length substring of the content (i.e. the first 800 characters or so) but then not breaking the HTML would be a nightmare.
Thanks!

I would strip the HTML from the content string first (How can I strip HTML tags from a string in ASP.NET?), then take the leftmost characters of the resulting string.

Usually this works by taking a substring of the contents of that blog post before the blog post is rendered into html.

That wouldn't be done by cutting the page output directly (messing with the HTML).
Handle that with server-side code displaying a trim of the blog content.

Usually the way that's done isn't by chunking off a piece of the HTML. Rather, there's a database that contains the blog posts, and the main page has its own HTML/CSS which dynamically loads only the first X paragraphs of each blog post.

To my mind the "simplest thing that could possibly work" would be to scan the blog post that you want to summarize until you get to the first close-paragraph </p> tag.
Don't be tempted to scan the HTML with a regex.
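
A minimal sketch of that scan, using a plain string search rather than a regex, could look like this. The class and method names are hypothetical, and it assumes the post's HTML starts with a <p> element whose first closing </p> tag ends the first paragraph; nested or malformed markup would need a real HTML parser.

```csharp
using System;

public static class Excerpt
{
    // Returns everything up to and including the first closing </p> tag.
    // If the post contains no </p>, the whole post is returned unchanged.
    public static string FirstParagraph(string html)
    {
        const string closeTag = "</p>";
        int end = html.IndexOf(closeTag, StringComparison.OrdinalIgnoreCase);
        return end < 0 ? html : html.Substring(0, end + closeTag.Length);
    }
}
```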
