Remove anchor from URL in C# - c#

I'm trying to pull in an src value from an XML document, and in one that I'm testing it with, the src is:
<content src="content/Orwell - 1984 - 0451524934_split_2.html#calibre_chapter_2"/>
That creates a problem when trying to open the file. I'm not sure what that #(stuff) suffix is called, so I had no luck searching for an answer. I'd just like a simple way to remove it if possible. I suppose I could write a function to search for a # and remove anything after, but that would break if the filename contained a # symbol (or can a file even have that symbol?)
Thanks!

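If you have the src in a string, you could use
int i = srcstring.LastIndexOf("#");
string path = i >= 0 ? srcstring.Substring(0, i) : srcstring;
which returns the src without the # and everything after it, or the src unchanged when it contains no # (passing the -1 that LastIndexOf returns in that case straight to Substring would throw). If the values you are retrieving are all web URLs then this should work; the # introduces a bookmark that takes you to a specific part of the page.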
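A minimal sketch of the same idea as a reusable helper, assuming the src value is already in a string. The names UrlFragments and StripFragment are made up for illustration. It cuts at the first # and returns the input unchanged when there is none, since Substring throws when it is handed a length of -1.

```csharp
using System;

public static class UrlFragments
{
    // Returns src without its "#fragment" suffix, or src unchanged if it has none.
    public static string StripFragment(string src)
    {
        if (src == null) throw new ArgumentNullException(nameof(src));
        int hash = src.IndexOf('#');   // the fragment starts at the first '#'
        return hash >= 0 ? src.Substring(0, hash) : src;
    }
}
```

With the src from the question, UrlFragments.StripFragment("content/Orwell - 1984 - 0451524934_split_2.html#calibre_chapter_2") returns "content/Orwell - 1984 - 0451524934_split_2.html".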

You should be OK assuming that URLs won't contain a literal "#":
The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it.
Source (search for "#" or "unsafe").
Therefore just use String.Split() with the "#" as the split character. This should give you 2 parts. In the highly unlikely event it gives more, just discard the last one and rejoin the remainder.

From Wikipedia:
# is used in a URL of a webpage or other resource to introduce a "fragment identifier" – an id which defines a position within that resource. For example, in the URL http://en.wikipedia.org/wiki/Number_sign#Other_uses the portion after the # (Other_uses) is the fragment identifier, in this case indicating that the display should be moved to show the tag marked by ... in the HTML
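For absolute URLs like the Wikipedia one, the System.Uri class already separates the fragment from the rest. The sketch below is only an illustration (the class and method names are invented here); it relies on Uri.Fragment and Uri.GetLeftPart, both of which only work for absolute URIs.

```csharp
using System;

public static class FragmentDemo
{
    // Returns the fragment including the leading '#', or "" when there is none.
    public static string GetFragment(string absoluteUrl)
    {
        return new Uri(absoluteUrl).Fragment;
    }

    // Returns scheme, host, path and query, leaving the fragment out.
    public static string WithoutFragment(string absoluteUrl)
    {
        return new Uri(absoluteUrl).GetLeftPart(UriPartial.Query);
    }
}
```

For http://en.wikipedia.org/wiki/Number_sign#Other_uses, GetFragment returns "#Other_uses" and WithoutFragment returns "http://en.wikipedia.org/wiki/Number_sign". A relative path such as the src in the question is not an absolute URI, so it needs a plain string approach instead.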

It's not always safe to remove the anchor from the URL. Ajax-style sites use the anchor to keep track of context. Take Gmail, for example: if you go to http://www.gmail.com/#inbox you go directly to your inbox, but if you go to http://www.gmail.com/#all you go to all your mail.
The anchor itself is never sent to the server, but script running in the page can read it and show different content, so two URLs that differ only in the anchor may not display the same thing.

Related

If an anchor tag is stored in Redis, does it cut off the closing anchor tag upon retrieval?

This seems to me like it would certainly be no, but I'm storing three links in three keys of a Redis hash. In Redis, the data is stored as < a href ="exampleurl.com">title</a> but when I retrieve the links into my C# project, it returns to the page as < a href ="exampleurl.com">title
(no closing tag.)
Is it safe to say that Redis doesn't randomly cut off the closing anchor tag from a given key or do I need to do something with the returned data to have it present to the user as a hyperlink?
Edit - I'm using the StackExchange.Redis namespace and to get/set the hash I'm using HashSet and HashGet on my Redis database instance. When viewing in a Redis GUI, the hyperlinks are all there, properly formatted. I'm setting an asp Literal and in the code behind doing
Link1 = cache.HashGet(finalUrl, "sidebar:link1");
this.uxLink1.Text = Link1;
uxLink1 is in the front-end, displayed to the user.
It's odd, I can append a random string like + "hello" to the Link1 variable but not </a> like how StackOverflow doesn't let you just write out the closing anchor bracket without making it a code sample.
The anchor tag stored in Redis has a space in between the opening bracket and the a like so "< a" This was the problem. When I take out the space between the two characters, the links load up without a problem. Never would have guessed that!

Regex to match URL / URI except when contained in an img tag

Credit to dfowler's excellent Jabbr project, I am borrowing code to embed linked content from user posts. The code is from here and uses a regex to extract URLs for additional processing and embedding.
In my case, I run the user posts through a markdown processor first, before attempting this embed. The markdown processor (MarkdownDeep) will, if the user formats the markdown correctly, transform any given image markdown into valid HTML img tag. That works great, however, using the embedded content providers will make the image appear twice, since it shows up validly from the markdown transform, then gets embedded as well afterwards.
So, I believe the solution to my problem lies in changing the regex to not match when the found URL is already contained within a valid img tag.
For ease of answering the regex so far is:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))
I think I want to use negative look-ahead like in this answer to exclude the img, but I'm too poor at regex syntax to implement it myself.
NOTE: I want it to still match images if they just appear in the text. So http://www.example.com/sites/default/files/DellComputer.jpg would match
or in a hyperlink <a href='http://www.example.com/sites/default/files/DellComputer.jpg'> would match but <img src='http://www.example.com/sites/default/files/DellComputer.jpg'> would not.
Thanks for the help, I know some of you have savant-level regex talents, I just never could do them.
For the simple approach, just prepend
(?<!img.*)
to the beginning of your regex. It will match as it already does, but will reject it if img comes somewhere before it on the line. So, the entire regex:
(?<!img.*)(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))
Again, not changed except a few characters on the beginning.
If you need it to be smarter about where the img sits before the URL on the line, I would recommend using a tool other than regex.
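As a rough illustration of the lookbehind idea, here is a sketch that swaps in a much simpler URL pattern than the one above (that simplification, and the names UrlFinder and FindUrls, are assumptions made for this example). It also narrows the lookbehind to `<img[^>]*`, so a URL is skipped only while an `<img` tag is still open, rather than whenever "img" appears anywhere earlier on the line. This relies on .NET allowing variable-length lookbehind.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlFinder
{
    // Simplified URL pattern (an assumption for this sketch), preceded by a
    // negative lookbehind that rejects URLs sitting inside an unclosed <img ... tag.
    private static readonly Regex UrlOutsideImg = new Regex(
        @"(?<!<img[^>]*)https?://[^\s'""<>]+",
        RegexOptions.IgnoreCase);

    // Returns every URL in html that is not inside an img tag.
    public static List<string> FindUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in UrlOutsideImg.Matches(html))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

Given a string containing an img tag, a hyperlink and a bare URL, FindUrls returns only the last two. It still treats HTML as flat text, so it can be fooled by things like a `>` inside an attribute value.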

Replacing URL encoded data with their symbols

Hi, I am trying to format my URL so that it looks more user friendly. So far I have managed to replace spaces with "-", but it seems that special characters like # and : are displayed as encoded data. This is what I mean:
http://localhost:51208/Home/Details/C%23-in-Depth%2c-Second-Edition/BookId-3
The "#" symbol is displayed as %23 and the "," is displayed as %2c. I would like to be able to replace these encodings with the original symbols.
Does such a way exist?
Oh no, you totally don't want to replace it with #. This symbol has a special meaning in an url. It represents the fragment identifier and its value is never sent to the server. This basically means that if there's a # symbol in your url, everything that follows it gets truncated and never sent to the server. You may take a look at the following post to see what StackOverflow uses to format the slug in the question title. You could run your string through this replace function in order to make sure that no dangerous characters are left.
I would also recommend you reading the following blog post from Scott Hanselman where he covers the various scenarios you might encounter with IIS if you attempt to send special characters in the path portion of your url. I am quoting his conclusion here:
After ALL this effort to get crazy stuff in the Request Path, it's
worth mentioning that simply keeping the values as a part of the Query
String (remember WAY back at the beginning of this post?) is easier,
cleaner, more flexible, and more secure.
Just replace "#" with "sharp" and ":" with "-"; you cannot simply put those special characters in the URL.
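A small sketch of that kind of replacement, not the function Stack Overflow itself uses. The names Slugs and ToSlug are hypothetical, and collapsing every other non-alphanumeric run into a single "-" is an assumption about what a friendly slug should look like.

```csharp
using System.Text.RegularExpressions;

public static class Slugs
{
    // Turns a title into a URL-safe slug made only of letters, digits and '-'.
    public static string ToSlug(string title)
    {
        // Spell out '#' first so "C#" stays recognizable.
        string s = title.Replace("#", "sharp");
        // Collapse every run of other non-alphanumeric characters into one '-'.
        s = Regex.Replace(s, "[^A-Za-z0-9]+", "-");
        // Drop any '-' left over at either end.
        return s.Trim('-');
    }
}
```

Slugs.ToSlug("C# in Depth, Second Edition") returns "Csharp-in-Depth-Second-Edition", which needs no percent-encoding at all.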

How do I encode an URL?

When I run my project I get the url http://localhost:5973/PageToPageValuePass/Default.aspx I want to Encode the URL since sometimes I need to transfer data from page to page. When the urls are encoded then it increases the reliability.
Server.UrlEncode("http://www.google.com/c#");
I get this, but how do I use it to help me encode the url?
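If you're encoding part of the path (note that EscapeUriString leaves reserved characters such as # untouched, so use EscapeDataString for a single segment):
System.Uri.EscapeDataString("c#")
If you're encoding 'arguments':
String.Format("http://something/?test={0}", System.Uri.EscapeDataString("c#"));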
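To make the 'arguments' case concrete, here is a minimal sketch built on Uri.EscapeDataString. The helper names (UrlParts, BuildQueryUrl) are invented for this example, and it handles a single name/value pair only.

```csharp
using System;

public static class UrlParts
{
    // Appends one query parameter to baseUrl, escaping both name and value so
    // reserved characters such as '#', '&' and '=' cannot change the URL's structure.
    public static string BuildQueryUrl(string baseUrl, string name, string value)
    {
        return baseUrl + "?" + Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
    }
}
```

UrlParts.BuildQueryUrl("http://something/", "test", "c#") returns "http://something/?test=c%23", so the # reaches the server as data instead of starting a fragment.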
Try this.
In ASP.NET:
Server.UrlEncode("http://www.google.com/c#");
In WinForms, with a reference to System.Web.dll:
HttpUtility.UrlEncode("http://www.google.com/c#");
Url encoding is used to ensure that special symbols included in a url (most likely in a querystring) are not mistakenly interpreted as those used in the parsing and processing of a url. For example, the + symbol is used to indicate a space in a url. However, if you were intending for a + symbol to be a part of your querystring then you would want to encode that querystring before sending it to a browser.
For example. Imagine you have written a page that receives a math equation on the querystring and displays that equation on the page.
The url might be: http://yoursite.com/displayMath.aspx?equation=3+5
The + symbol in this case is intended to be a meaningful part of the equation. However, without a UrlEncode it would be interpreted as representing a space. Reading this value from the querystring on the receiving page would yield "3 5", which is not what was intended.
Instead of redirecting to that url directly, you would want to URL-encode the equation value first (encoding the whole url would also escape the :// and the ?). You might write the following code:
string equation = "3+5";
string url = String.Format("http://yoursite.com/displayMath.aspx?equation={0}", Server.UrlEncode(equation));
Response.Redirect(url);
This would ensure that a subsequent Request.QueryString["equation"] would receive the equation intact because any special symbols would first be encoded.
I'm not sure I understand your use case for encoding urls. If you could perhaps provide more information on what you are trying to achieve I will attempt to answer more fully. For now I hope that this information is useful.
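The 3+5 round trip can be tried outside ASP.NET as well. This sketch uses System.Net.WebUtility as a stand-in for Server.UrlEncode (that substitution, and the names EquationUrl, Build and Decode, are assumptions for illustration), and it encodes only the query value, never the whole URL.

```csharp
using System.Net;

public static class EquationUrl
{
    // Builds the example URL, encoding only the equation value.
    public static string Build(string equation)
    {
        return "http://yoursite.com/displayMath.aspx?equation=" + WebUtility.UrlEncode(equation);
    }

    // Mimics what the receiving page does when it decodes a raw query value.
    public static string Decode(string rawValue)
    {
        return WebUtility.UrlDecode(rawValue);
    }
}
```

EquationUrl.Build("3+5") yields a URL ending in equation=3%2B5. Decoding the raw "3+5" gives "3 5", while decoding "3%2B5" gives back "3+5".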
Say you want to create a link with some parameters; you can use it as follows:
aspx:
<a id="myLink" runat="server" target="_self">Click Here</a>
code behind:
myLink.Href = Page.ResolveClientUrl("~/MyPage.aspx") + "?id=" +
Server.UrlEncode("put here what ever you want to url encode");
Or as in your question:
myLink.Href = "http://www.google.com/" + Server.UrlEncode("C#");
This will put the following in the HTML:
<a id="myLink" target="_self" href="http://www.google.com/C%23">

Response.Redirect using ~ Path

I have a method in which I want to redirect the user back to a login page located at the root of my web application.
I'm using the following code:
Response.Redirect("~/Login.aspx?ReturnPath=" + Request.Url.ToString());
This doesn't work though. My assumption was that ASP.NET would automatically resolve the URL into the correct path. Normally, I would just use
Response.Redirect("../Login.aspx?ReturnPath=" + Request.Url.ToString());
but this code is on a master page, and can be executed from any folder level. How do I get around this issue?
I think you need to drop the "~/" and replace it with just "/"; I believe / is the root.
STOP RIGHT THERE! :-) unless you want to hardcode your web app so that it can only be installed at the root of a web site.
"~/" is the correct thing to use, but the reason that your original code didn't work as expected is that ResolveUrl (which is used internally by Redirect) first tries to work out whether the path you are passing it is an absolute URL (e.g. "http://server/foo/bar.htm" as opposed to "foo/bar.htm"). Unfortunately it does this by simply looking for a colon character ':' in the URL you give it. In this case it finds a colon in the URL you give in the ReturnPath query string value, which fools it, so your '~/' doesn't get resolved.
The fix is that you should be URL-encoding the ReturnPath value which escapes the problematic ':' along with any other special characters.
Response.Redirect("~/Login.aspx?ReturnPath=" + Server.UrlEncode(Request.Url.ToString()));
Additionally, I recommend that you (or anyone) never use Uri.ToString - because it gives a human-readable, more "friendly" version of the URL - not a necessarily correct one (it unescapes things). Instead use Uri.AbsoluteUri - like so:
Response.Redirect("~/Login.aspx?ReturnPath=" + Server.UrlEncode(Request.Url.AbsoluteUri));
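To see why the encoding matters, the sketch below builds the same redirect target outside ASP.NET. It uses Uri.EscapeDataString as a stand-in for Server.UrlEncode, and the names LoginRedirect and BuildLoginUrl are hypothetical. The point is only that the result no longer contains a ':' for the absolute-URL check to trip over.

```csharp
using System;

public static class LoginRedirect
{
    // Builds the app-relative login URL with the current page as an escaped ReturnPath.
    public static string BuildLoginUrl(Uri currentPage)
    {
        return "~/Login.aspx?ReturnPath=" + Uri.EscapeDataString(currentPage.AbsoluteUri);
    }
}
```

For new Uri("http://localhost/app/Page.aspx?x=1") this returns "~/Login.aspx?ReturnPath=http%3A%2F%2Flocalhost%2Fapp%2FPage.aspx%3Fx%3D1".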
You can resolve the URL first
Response.Redirect(ResolveUrl("~/Login.aspx") + "?ReturnPath=" + Server.UrlEncode(Request.Url.AbsoluteUri));
and add the parameters after it has been resolved.
What about using the following? (Note that it assumes http and a login page at the site root.)
Response.Redirect(String.Format("http://{0}/Login.aspx?ReturnPath={1}", Request.ServerVariables["SERVER_NAME"], Server.UrlEncode(Request.Url.AbsoluteUri)));
