I'm using ASP.NET/C# and I'm looking to create unique(?) URIs for a small CMS I am creating.
I am generating the URI segment from my article's title, so, for example, if the title is "My amazing article" the URI would be www.website.com/news/my-amazing-article
There are two parts to this. First, which characters do you think I need to strip out? I am replacing spaces with "-" and I think I should strip out the "/" character too. Can you think of any others that might cause problems? "?" perhaps? Should I remove all non-alpha characters?
Second question: above I mentioned the URIs MAY need to be unique. I was going to check the URI list before adding to ensure uniqueness; however, I see Stack Overflow uses a number plus a URI. This, I assume, allows titles to be duplicated? Do you think this would be a better way?
Transform all diacritics into their base character, then replace anything that is not a letter or a digit (as determined by Char.IsLetterOrDigit) with a space.
Finally, replace each run of spaces with a single dash.
This is what we use in our software:
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

/// <summary>
/// Convert a name into a string that can be appended to a Uri.
/// </summary>
private static string EscapeName(string name)
{
    if (!string.IsNullOrEmpty(name))
    {
        name = NormalizeString(name);
        // Replace every non-alphanumeric character with a space
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < name.Length; i++)
        {
            builder.Append(char.IsLetterOrDigit(name[i]) ? name[i] : ' ');
        }
        name = builder.ToString();
        // Trim so that leading/trailing punctuation does not become a dash,
        // then collapse each run of spaces into a single dash
        name = Regex.Replace(name.Trim(), @"[ ]+", "-");
    }
    return name;
}
/// <summary>
/// Strips diacritics from the value by replacing accented characters with their base character (e.g. "é" becomes "e").
/// </summary>
/// <param name="value">The string to normalize.</param>
/// <returns>The string with all non-spacing marks (accents) removed. Characters without a decomposition, such as "ß" or CJK text, are left unchanged.</returns>
/// <seealso href="http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net"/>
private static string NormalizeString(string value)
{
    // FormD splits "é" into "e" followed by a combining accent
    string normalizedFormD = value.Normalize(NormalizationForm.FormD);
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < normalizedFormD.Length; i++)
    {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(normalizedFormD[i]);
        // Keep everything except the combining marks
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            builder.Append(normalizedFormD[i]);
        }
    }
    // Recompose whatever is left
    return builder.ToString().Normalize(NormalizationForm.FormC);
}
As for using those generated names as unique IDs, I would advise against it. Use the generated name as an SEO helper, but not as the key used to resolve a page. Look at how Stack Overflow references its pages:
http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
Here 249087 is the ID; the name after it is not needed to find the page, but it is helpful for bookmarks and SEO. These two URLs point to the same page:
http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
http://stackoverflow.com/questions/249087/
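A minimal sketch of the ID-plus-slug idea, assuming a hypothetical `/news/{id}/{slug}` route for the CMS in the question: only the numeric ID is used to look the article up, so the slug is pure decoration and a changed title does not break old links. `ArticleUrls`, `Build`, and `TryGetId` are illustrative names, not part of ASP.NET.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the numeric ID resolves the article, the slug is decoration.
public static class ArticleUrls
{
    // Matches "/news/{id}", "/news/{id}/" and "/news/{id}/{any-slug}"
    private static readonly Regex Route = new Regex(@"^/news/(\d+)(?:/[^/]*)?/?$");

    // Builds the canonical URL for an article from its ID and current slug.
    public static string Build(int id, string slug)
    {
        return string.IsNullOrEmpty(slug) ? "/news/" + id : "/news/" + id + "/" + slug;
    }

    // Extracts the ID from a request path; whatever slug follows it is ignored.
    public static bool TryGetId(string path, out int id)
    {
        id = 0;
        Match m = Route.Match(path);
        return m.Success && int.TryParse(m.Groups[1].Value, out id);
    }
}
```

Because `TryGetId` ignores the slug, a link such as `/news/249087/old-title` still resolves after the title is edited; comparing the request path with `Build(id, currentSlug)` tells you when you might want to issue a permanent redirect to the canonical URL.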
You'll want to consult IETF RFC 3986, which defines URI syntax, including which characters are allowed where.
Beyond validity, you may also want a readable URI. In that case, eliminate all non-alphanumeric characters.
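As a quick illustration of what RFC 3986 allows without escaping, the sketch below accepts only the "unreserved" characters from section 2.3 (ASCII letters, digits, `-`, `.`, `_`, `~`). Anything else, including the `/` and `?` mentioned in the question, is either a delimiter or would have to be percent-encoded. `UriSegment` and `IsSafeSlug` are hypothetical helper names.

```csharp
public static class UriSegment
{
    // RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    public static bool IsUnreserved(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // True if the slug can be used as a path segment with no percent-encoding.
    // "." and ".." are rejected because RFC 3986 treats them as dot-segments.
    public static bool IsSafeSlug(string slug)
    {
        if (string.IsNullOrEmpty(slug) || slug == "." || slug == "..") return false;
        foreach (char c in slug)
        {
            if (!IsUnreserved(c)) return false;
        }
        return true;
    }
}
```

Note that accented letters such as "é" are not unreserved, which is why the diacritic-stripping approach above is useful before this check.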
On Stack Overflow, the title can be changed, hence the use of the ID as a unique, unchanging identifier in the URI. If your titles cannot change, you should be OK using just the text. If titles can be edited after publication, an ID may be preferable.
For question 1: Rob Conery has a pretty useful Regex-based solution to cleaning strings for slug-generation. Here's the extension method (just add this to a static class):
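using System.Text.RegularExpressions;

public static class SlugExtensions
{
    // Anything that is not a lowercase ASCII letter, digit or dash
    private static readonly Regex InvalidChars = new Regex(@"[^a-z0-9\-]");

    public static string CreateSlug(this string source)
    {
        var slug = "";
        if (!string.IsNullOrEmpty(source))
        {
            slug = source.Trim().ToLowerInvariant();
            slug = slug.Replace(' ', '-');
            slug = InvalidChars.Replace(slug, "");
            // Collapse runs of dashes only after removal, so "a - b" becomes "a-b"
            slug = Regex.Replace(slug, "-{2,}", "-");
            // If more than half of the input was stripped, give up rather than return a misleading slug
            if (slug.Length * 2 < source.Length)
                return "";
            if (slug.Length > 100)
                slug = slug.Substring(0, 100);
        }
        return slug;
    }
}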
For question 2, you could simply put a UNIQUE constraint on the column in the database if you want the slugs to be unique. You can then catch the exception and give the user useful feedback. If you don't like that, relying on the post identifier is probably a good alternative.
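If you would rather avoid the error in the common case, one option is to check existing slugs first and append a counter, in the style of "my-amazing-article-2". This is a minimal sketch: `existing` stands in for whatever lookup the CMS uses (for example a database query), and `SlugUniqueness`/`MakeUnique` are hypothetical names. Two concurrent requests can still pick the same slug, so the UNIQUE constraint is still worth keeping as the final guard.

```csharp
using System.Collections.Generic;

public static class SlugUniqueness
{
    // Returns baseSlug if it is unused, otherwise baseSlug-2, baseSlug-3, ...
    public static string MakeUnique(string baseSlug, ISet<string> existing)
    {
        if (!existing.Contains(baseSlug)) return baseSlug;
        for (int n = 2; ; n++)
        {
            string candidate = baseSlug + "-" + n;
            if (!existing.Contains(candidate)) return candidate;
        }
    }
}
```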
Related
I am writing a method which processes a large number of SQL procedures written by our previous SQL developer.
I am trying to search the files for the following strings CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION, CREATE TRIGGER.
The search for these strings in the file needs to be case-insensitive
and should match any number of spaces between the two words, e.g.
CREATE VIEW or CREATE   VIEW.
When it finds a match, it needs to replace CREATE with CREATE OR ALTER.
The script shall ignore occurrences such as CREATE TABLE.
The script shall ignore occurrences such as CREATE OR ALTER PROCEDURE.
I started by writing a procedure to process the files line by line (this is because the text to search is always contained within the line), but I got stuck...
using System;
using System.IO;
using System.Text;

/// <summary>
/// This method processes each individual line, executing the replacement where necessary
/// </summary>
/// <param name="line"></param>
/// <returns></returns>
private static string ProcessLine(string line)
{
    // how do I perform the logic here?
    return line;
}

/// <summary>
/// This method will process each individual file and create a new file with the _new suffix
/// </summary>
/// <param name="file"></param>
public static void ProcessSqlFile(FileInfo file)
{
    StringBuilder sb = new StringBuilder();
    var lines = File.ReadAllLines(file.FullName);
    for (var i = 0; i < lines.Length; i += 1)
    {
        sb.Append(ProcessLine(lines[i]));
        sb.Append(Environment.NewLine);
    }
    var outputName = Path.Combine(file.DirectoryName, file.Name + "_new");
    File.WriteAllText(outputName, sb.ToString());
}

static void Main(string[] args)
{
    var inputPath = new DirectoryInfo(@"...");
    var files = inputPath.GetFiles("*.sql");
    foreach (var fileInfo in files)
    {
        ProcessSqlFile(fileInfo);
    }
}
You can use a regular expression (regex) for this. For example, you could use the following pattern:
\bcreate\s+(view|procedure|function|trigger)\b
..and replace with:
CREATE OR ALTER $1
Regex pattern details:
\b - Ensure a word boundary (avoid matching partial words).
\s+ - Match one or more whitespace characters.
(view|procedure|function|trigger) - Match any of the listed words and capture it in group 1.
\b - Ensure a word boundary.
Replacement:
CREATE OR ALTER - Literal string.
$1 - Whatever was captured in group 1.
Full C# example:
using System;
using System.Text.RegularExpressions;

string input = "I am trying to search the files for the following strings " +
    "CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION, CREATE TRIGGER";
string output = Regex.Replace(input, @"\bcreate\s+(view|procedure|function|trigger)\b",
    "CREATE OR ALTER $1", RegexOptions.IgnoreCase);
Console.WriteLine(output);
Disclaimer:
As GSerg and Charlieface pointed out in the comments, this (and similar solutions) would produce false positives inside string literals. If you might have those, you'd be better off using an SQL parser, since a regex pattern that covers all edge cases would be overly complicated.
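For reference, here is a minimal sketch of how the same pattern could be dropped into the question's `ProcessLine`. It is wrapped in a hypothetical `SqlScriptRewriter` class only so it compiles on its own; the regex is built once and reused for every line. "CREATE OR ALTER PROCEDURE" is left alone because "OR" is not in the alternation, and "CREATE TABLE" does not match at all.

```csharp
using System.Text.RegularExpressions;

public static class SqlScriptRewriter
{
    // Same pattern as above; IgnoreCase covers "create view", "Create   View", etc.
    private static readonly Regex CreateStatement = new Regex(
        @"\bcreate\s+(view|procedure|function|trigger)\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Drop-in body for the question's ProcessLine
    public static string ProcessLine(string line)
    {
        return CreateStatement.Replace(line, "CREATE OR ALTER $1");
    }
}
```

Two caveats: T-SQL also accepts the abbreviation `CREATE PROC`, which this pattern (like the one above) does not match; and because the question processes one line at a time, a statement split across lines is missed. Running the same `Replace` over `File.ReadAllText` instead would catch that case, since `\s+` also matches line breaks.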
The solution that seems simplest to me doesn't use regex at all, just plain text processing.
I ran the following method on a few SQL files and it looked good to me.
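private static readonly string[] ObjectTypes = { "VIEW", "PROCEDURE", "FUNCTION", "TRIGGER" };

private static string ProcessLine(string line)
{
    if (line.IndexOf("CREATE", StringComparison.OrdinalIgnoreCase) < 0)
    {
        return line;
    }
    // Passing null as the separator splits on any whitespace (spaces, tabs)
    var wordArray = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    for (var i = 0; i < wordArray.Length - 1; i++)
    {
        if (!string.Equals(wordArray[i], "CREATE", StringComparison.OrdinalIgnoreCase) ||
            Array.IndexOf(ObjectTypes, wordArray[i + 1].ToUpperInvariant()) < 0) continue;
        // Insert " OR ALTER" after the first CREATE on the line, keeping its original casing.
        // This assumes the first "CREATE" text on the line is the one that matched.
        int index = line.IndexOf("CREATE", StringComparison.OrdinalIgnoreCase);
        return line.Insert(index + "CREATE".Length, " OR ALTER");
    }
    return line;
}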
I found this question, which achieves what I am looking for; however, I have one problem: the "start" and "end" of the substring are the same character.
My string is:
.0.label unicode "Area - 110"
and I want to extract the text between the inverted commas ("Area - 110").
In the linked question, the answers all use specific identifiers and IndexOf solutions. The problem is that if I do the same, IndexOf will likely return the same value.
Additionally, if I use Split, the text I want to keep is not a fixed length (it could be one word, it could be seven), so I also have trouble specifying the indexes of the first and last word in that collection.
Regarding "The problem is that if I do the same, IndexOf will likely return the same value":
A common trick in this situation is to use LastIndexOf to find the location of the closing double-quote:
string str = ".0.label unicode \"Area - 110\"";
int start = str.IndexOf('"');
int end = str.LastIndexOf('"');
if (start >= 0 && end > start)
{
    // We have two separate locations
    Console.WriteLine(str.Substring(start + 1, end - start - 1));
}
I would do it like this:
string str = ".0.label unicode \"Area - 110\"";
str = str.Substring(str.IndexOf("\"") + 1);   // drop everything up to and including the opening quote
str = str.Substring(0, str.IndexOf("\""));    // cut at the closing quote, leaving "Area - 110"
In fact, this is one of my most used helper methods/extensions, because it is quite versatile:
/// <summary>
/// Isolates the text in between the parameters, exclusively, using invariant, case-sensitive comparison.
/// Both parameters may be null to skip either step. If specified but not found, a FormatException is thrown.
/// </summary>
public static string Isolate(this string str, string entryString, string exitString)
{
    if (!string.IsNullOrEmpty(entryString))
    {
        int entry = str.IndexOf(entryString, StringComparison.InvariantCulture);
        if (entry == -1) throw new FormatException($"String.Isolate failed: \"{entryString}\" not found in string \"{Preview(str)}\".");
        str = str.Substring(entry + entryString.Length);
    }
    if (!string.IsNullOrEmpty(exitString))
    {
        int exit = str.IndexOf(exitString, StringComparison.InvariantCulture);
        if (exit == -1) throw new FormatException($"String.Isolate failed: \"{exitString}\" not found in string \"{Preview(str)}\".");
        str = str.Substring(0, exit);
    }
    return str;
}

// First 80 characters, for error messages (replaces a Truncate extension that is not part of .NET and not shown here)
private static string Preview(string s) => s.Length > 80 ? s.Substring(0, 80) : s;
You'd use that like this:
string str = ".0.label unicode \"Area - 110\"";
string output = str.Isolate("\"", "\"");
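A regular expression is another option when the opening and closing delimiters are the same character. This is a minimal sketch with a hypothetical `QuotedText.FirstQuoted` helper; note that it returns the first complete "..." pair, whereas the `IndexOf`/`LastIndexOf` approach spans from the first quote to the last one, which differs when a line contains more than one quoted string.

```csharp
using System.Text.RegularExpressions;

public static class QuotedText
{
    // Returns the text between the first pair of double quotes, or null if there is no pair.
    public static string FirstQuoted(string input)
    {
        Match m = Regex.Match(input, "\"([^\"]*)\"");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```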
My code should translate a phrase into Pig Latin. Every word must have "ay" at the end, and the first letter of each word should be moved to just before "ay".
For example, wall = "allway"
Any ideas? This is the easiest way I could think of:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace english_to_pig_latin
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("THIS IS A English to Pig Latin translator");
            Console.WriteLine("ENTER Phrase");
            string[] phrase = Console.ReadLine().Split(' ');
            int words = phrase.Length;
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < words; i++)
            {
                //to add ay in the end
                /*sb.Append(phrase[i].ToString());
                sb.Append("ay ");
                Console.WriteLine(sb);*/
            }
            Console.ReadLine();
        }
    }
}
First you need to define your Pig Latin rules. Your description lacks real Pig Latin rules: for instance, English "sharp" is correctly "Pig-Latinized" as 'arpshay', not 'harpsay' as your explanation implies. (I prefer 'arp-sh-ay', because the hyphens make Pig Latin easier to read and make it possible to translate it back into English.) I suggest you first find some rules for Pig Latin. Your start is a good one: your code now splits a phrase into (almost) words. Note, though, that your code will turn "Please, Joe" into "Please," and "Joe", and you probably do not want that comma sent to your word-by-word translator.
When defining your rules, I suggest you consider how to Pig-Latinize these words:
hello --> 'ellohay' (a normal word),
string --> 'ingstray' ('str' is the whole consonant string moved to the end),
apple --> 'appleway', 'appleay', or 'appleyay', (depending on your dialect of Pig-Latin),
queen --> 'eenquay' ('qu' is the consonant string here),
yellow --> 'ellowyay' (y is consonant here),
rhythm --> 'ythmrhay' (y is vowel here),
sky --> 'yskay' (y is vowel here).
Note that any word starting with 'qu' (like 'queen') is a special case that needs to be handled too. Also note that y is usually a consonant when it begins an English word, but a vowel in the middle or at the end of a word.
The hyphenated Pig Latin versions of these words would be:
ello-h-ay, ing-str-ay, ('apple-way', 'apple-ay', or 'apple-yay'), 'een-qu-ay', 'ellow-y-ay', 'ythm-rh-ay', and 'y-sk-ay'. The hyphenation makes the words easier to read and lets a computer parser reverse the Pig Latin back into English. Unfortunately, many people just write each Pig Latin word without any hyphens, so the translation cannot simply be reversed without ambiguity.
Real Pig Latin goes by the sound of the word, not the spelling, so without a very complex word-to-phoneme system this is far too difficult. But most (good) Pig Latin translators handle the above cases and ignore other exceptions, because English spelling is a very poor guide to how words sound.
So my first suggestion is to get a set of rules. My second suggestion is to use two functions, PigLatinizePhrase() and PigLatinizeWord(): PigLatinizePhrase() parses a phrase into words (and punctuation) and calls PigLatinizeWord() for each word, skipping punctuation. You can loop through each character and use char.IsLetter to test whether it is a letter. If it is, append it to a StringBuilder and move on to the next character. If it is not a letter and the StringBuilder is not empty, send the accumulated word to your word translator, then append the non-letter to your result. That is the logic for PigLatinizePhrase(). Here is my code, which does just that:
/// <summary>
/// Translates English text into Pig Latin word by word, leaving punctuation and whitespace in place.
/// </summary>
/// <param name="eng">English text, paragraphs, etc.</param>
/// <param name="suffixWithNoOnset">Used to differentiate between Pig Latin dialects.
/// Known dialects may use any of: "ay", "-ay", "way", "-way", "yay", or "-yay".
/// Corresponding translations for 'egg' will yield: "eggay", "egg-ay", "eggway", "egg-way", "eggyay", "egg-yay".
/// Or for 'I': "Iay", "I-ay", "Iway", "I-way", "Iyay", "I-yay".
/// </param>
/// <returns>The Pig Latin text, or null if eng is null.</returns>
public static string PigLatinizePhrase(string eng, string suffixWithNoOnset = "-ay")
{
    if (eng == null) { return null; } // don't break if null
    var word = new StringBuilder(); // only current word, built char by char
    var pig = new StringBuilder(); // pig latin text
    foreach (char thisChar in eng)
    {
        // the "'" test is so "I'll", "can't", and "Ashley's" will work right.
        if (char.IsLetter(thisChar) || thisChar == '\'')
        {
            word.Append(thisChar);
        }
        else
        {
            if (word.Length > 0)
            {
                pig.Append(PigLatinizeWord(word.ToString(), suffixWithNoOnset));
                word = new StringBuilder();
            }
            pig.Append(thisChar);
        }
    }
    // Translate the last word if the text does not end with a non-letter
    if (word.Length > 0)
    {
        pig.Append(PigLatinizeWord(word.ToString(), suffixWithNoOnset));
    }
    return pig.ToString();
} // public static string PigLatinizePhrase(string eng, string suffixWithNoOnset = "-ay")
The suffixWithNoOnset parameter is simply passed through to PigLatinizeWord(), and it determines which 'dialect' of Pig Latin is used. (See the XML comment before the method for more detail.)
For PigLatinizeWord(), once I actually programmed it, I found it convenient to split the functionality into two methods: one parses the English word into the two parts Pig Latin cares about, and the other does what is desired with those two parts, depending on which version of Pig Latin is wanted. Here's the source code for these two functions:
/// <summary>
/// Translates a single English word into Pig Latin.
/// </summary>
/// <param name="eng">English word before being translated to Pig Latin.</param>
/// <param name="suffixWithNoOnset">Used to differentiate between Pig Latin dialects.
/// Known dialects may use any of: "ay", "-ay", "way", "-way", "yay", or "-yay".
/// Corresponding translations for 'egg' will yield: "eggay", "egg-ay", "eggway", "egg-way", "eggyay", "egg-yay".
/// Or for 'I': "Iay", "I-ay", "Iway", "I-way", "Iyay", "I-yay".
/// </param>
/// <returns>The Pig Latin form of the word.</returns>
public static string PigLatinizeWord(string eng, string suffixWithNoOnset = "-ay")
{
    if (eng == null || eng.Length == 0) { return eng; } // don't break if null or empty
    string[] onsetAndEnd = GetOnsetAndEndOfWord(eng);
    string o = onsetAndEnd[0]; // 'Onset' of first syllable that gets moved to end of word
    string e = onsetAndEnd[1]; // 'End' of word, without the onset
    bool hyphenate = suffixWithNoOnset.IndexOf('-') >= 0;
    var sb = new StringBuilder();
    if (e.Length > 0) { sb.Append(e); if (hyphenate && o.Length > 0) { sb.Append('-'); } }
    if (o.Length > 0) { sb.Append(o); if (hyphenate) { sb.Append('-'); } sb.Append("ay"); }
    else { sb.Append(suffixWithNoOnset); }
    return sb.ToString();
} // public static string PigLatinizeWord(string eng)

/// <summary>
/// Splits a word into its onset (leading consonants, or "qu") and the rest of the word.
/// </summary>
public static string[] GetOnsetAndEndOfWord(string word)
{
    if (word == null) { return null; }
    string uppr = word.ToUpperInvariant();
    if (uppr.StartsWith("QU", StringComparison.Ordinal)) { return new string[] { word.Substring(0, 2), word.Substring(2) }; }
    if (word.Length == 0) { return new string[] { string.Empty, string.Empty }; }
    if ("AOEUI".IndexOf(uppr[0]) >= 0) // tests first letter/character
    { return new string[] { string.Empty, word }; }
    for (int x = 1; x < word.Length; x++)
    {
        if ("AOEUIY".IndexOf(uppr[x]) >= 0) // tests each character after first letter/character
        { return new string[] { word.Substring(0, x), word.Substring(x) }; }
    }
    return new string[] { string.Empty, word };
} // public static string[] GetOnsetAndEndOfWord(string word)
I have written a PigLatinize() method in JavaScript before, which was a lot of fun for me. :) I enjoyed making this C# version with more features, including the ability to translate into 6 different 'dialects' of Pig Latin, especially since C# is my favorite (programming) language. ;)
I think you need this transformation: phrase[i].Substring(1) + phrase[i][0] + "ay"
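Put into a complete method, that transformation might look like the sketch below. It applies only the asker's simple rule (move the first letter to the end, then add "ay"); it does no vowel, "qu", or punctuation handling, and `SimplePigLatin.Translate` is a hypothetical name.

```csharp
using System;
using System.Text;

public static class SimplePigLatin
{
    // "wall" -> "allway": first letter moved to the end, then "ay"
    public static string Translate(string phrase)
    {
        // RemoveEmptyEntries so that double spaces do not produce empty "words"
        string[] words = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var sb = new StringBuilder();
        foreach (string word in words)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(word.Substring(1)).Append(word[0]).Append("ay");
        }
        return sb.ToString();
    }
}
```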
I'm writing an automatic e-mailer. It has to scan the database every X minutes and email people reminders, etc.
I have all the underlying code ready. All I need now is to format the emails.
Is there any predefined templating system in C#, so that I can create a folder with different templates containing tags such as {NAME}, and just find and replace those?
I can do it manually by opening a *.txt document and replacing those specific tags, etc., but is there anything smarter? I don't want to reinvent the wheel.
I'd look at using StringTemplate: http://www.stringtemplate.org/
You can do it with MVC 3's Razor templates, even in non-web applications.
An Internet search for Razor templates non-web will turn up many examples.
It's not too difficult to write from scratch. I wrote this quick utility to do exactly what you described. It looks for tokens in the pattern {token} and replaces them with values from a NameValueCollection: each token in the string corresponds to a key in the collection and is replaced with that key's value.
It also has the added bonus of being simple enough to customize exactly as you need it.
using System.Collections.Specialized;

public static string ReplaceTokens(string value, NameValueCollection tokens)
{
    if (tokens == null || tokens.Count == 0 || string.IsNullOrEmpty(value)) return value;
    foreach (string key in tokens.Keys)
    {
        // The key "name" corresponds to the token "{name}" in the text
        string token = "{" + key + "}";
        value = value.Replace(token, tokens[key]);
    }
    return value;
}
USAGE:
public static bool SendEcard(string fromName, string fromEmail, string toName, string toEmail, string message, string imageUrl)
{
    var body = GetEmailBody();
    var tokens = new NameValueCollection();
    tokens["sitedomain"] = "http://example.com";
    tokens["fromname"] = fromName;
    tokens["fromemail"] = fromEmail;
    tokens["toname"] = toName;
    tokens["toemail"] = toEmail;
    tokens["message"] = message;
    tokens["image"] = imageUrl;
    var msg = CreateMailMessage();
    msg.Body = StringUtility.ReplaceTokens(body, tokens);
    //...send email and return whether it succeeded
}
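A variant of the same idea does the replacement in a single regex pass instead of one `String.Replace` per key. This is a sketch, not part of the answer above: `TemplateTokens.Fill` is a hypothetical name, and it takes a plain dictionary rather than a `NameValueCollection`. Tokens with no matching key are left in place so typos in a template stay visible, and a value that itself contains "{something}" is not expanded again. For templates kept as files in a folder, `File.ReadAllText(path)` would supply the `template` argument.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateTokens
{
    // {NAME}, {LINK}, ... : a token is one or more word characters in braces
    private static readonly Regex Token = new Regex(@"\{(\w+)\}");

    // Replaces every {key} in one pass; unknown keys are left as-is.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Token.Replace(template, m =>
        {
            string value;
            return values.TryGetValue(m.Groups[1].Value, out value) ? value : m.Value;
        });
    }
}
```

If tags should match regardless of case, the dictionary can be created with `StringComparer.OrdinalIgnoreCase`.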
You can use NVelocity:
string templateDir = HttpContext.Current.Server.MapPath("Templates");
string templateName = "SimpleTemplate.vm";
INVelocityEngine fileEngine =
NVelocityEngineFactory.CreateNVelocityFileEngine(templateDir, true);
IDictionary context = new Hashtable();
context.Add(parameterName , value);
var output = fileEngine.Process(context, templateName);
If you are using ASP.NET 4, you can download RazorMail from the NuGet Gallery. It lets you create emails using the Razor view engine outside the context of an MVC HTTP request.
More details can be found via the following links...
http://www.nuget.org/List/Packages/RazorMail
https://github.com/wduffy/RazorMail/
I use this a lot: plug in a Regex and a method that selects the replacement value based on each match.
using System;
using System.Text;
using System.Text.RegularExpressions;

/// <summary>
/// Replaces every match of a regex with text chosen by a delegate.
/// </summary>
/// <param name="input">The text to perform the replacement upon</param>
/// <param name="pattern">The regex used to perform the match</param>
/// <param name="fnReplace">A delegate that selects the appropriate replacement text</param>
/// <returns>The newly formed text after all replacements are made</returns>
public static string Transform(string input, Regex pattern, Converter<Match, string> fnReplace)
{
    int currIx = 0;
    StringBuilder sb = new StringBuilder();
    foreach (Match match in pattern.Matches(input))
    {
        // Copy the text between the previous match and this one, then the replacement
        sb.Append(input, currIx, match.Index - currIx);
        string replace = fnReplace(match);
        sb.Append(replace);
        currIx = match.Index + match.Length;
    }
    // Copy whatever follows the last match
    sb.Append(input, currIx, input.Length - currIx);
    return sb.ToString();
}
Example Usage
Dictionary<string, string> values = new Dictionary<string, string>();
values.Add("name", "value");
TemplateValues tv = new TemplateValues(values);
Assert.AreEqual("valUE", tv.ApplyValues("$(name:ue=UE)"));
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

/// <summary>
/// Matches a makefile macro name in text, i.e. "$(field:name=value)" where field is any alphanumeric + ('_', '-', or '.') text identifier
/// returned from group "field". The "replace" group contains everything after the identifier and before the last ')'. The "name" and "value" groups
/// match the name/value replacement pairs.
/// </summary>
class TemplateValues
{
    static readonly Regex MakefileMacro = new Regex(@"\$\((?<field>[\w\-.]*)(?<replace>(?:\:(?<name>[^:=\)]+)=(?<value>[^:\)]*))+)?\)");
    IDictionary<string, string> _variables;

    public TemplateValues(IDictionary<string, string> values)
    {
        _variables = values;
    }

    public string ApplyValues(string template)
    {
        // Transform is the helper shown above; it must be in scope here (e.g. defined in the same class)
        return Transform(template, MakefileMacro, ReplaceVariable);
    }

    private string ReplaceVariable(Match m)
    {
        string value;
        string fld = m.Groups["field"].Value;
        if (!_variables.TryGetValue(fld, out value))
        {
            value = String.Empty;
        }
        if (value != null && m.Groups["replace"].Success)
        {
            // "replace" is captured once for the whole sequence, so iterate over the
            // individual "name"/"value" captures (one per ":name=value" pair)
            for (int i = 0; i < m.Groups["name"].Captures.Count; i++)
            {
                string replace = m.Groups["name"].Captures[i].Value;
                string with = m.Groups["value"].Captures[i].Value;
                value = value.Replace(replace, with);
            }
        }
        return value;
    }
}
(Even though you've already accepted an answer, for future reference:) I got some excellent responses to a similar question of mine: Which approach to templating in C sharp should I take?
Is there any .NET library that removes all problematic characters from a string and leaves only alphanumerics, hyphens and underscores (or a similar subset) in an intelligent way? This is for use in URLs, file names, etc.
I'm looking for something similar to stringex which can do the following:
A simple prelude
"simple English".to_url => "simple-english"
"it's nothing at all".to_url => "its-nothing-at-all"
"rock & roll".to_url => "rock-and-roll"
Let's show off
"$12 worth of Ruby power".to_url => "12-dollars-worth-of-ruby-power"
"10% off if you act now".to_url => "10-percent-off-if-you-act-now"
You don't even wanna trust Iconv for this next part
"kick it en Français".to_url => "kick-it-en-francais"
"rock it Español style".to_url => "rock-it-espanol-style"
"tell your readers 你好".to_url => "tell-your-readers-ni-hao"
You can try this:
string str = phrase.ToLower(); // optional, but without it uppercase letters are removed by the next regex
str = str.Trim();
str = Regex.Replace(str, @"[^a-z0-9\s_]", ""); // remove invalid chars
str = Regex.Replace(str, @"\s+", " ").Trim(); // convert multiple spaces into one space
str = str.Substring(0, str.Length <= 400 ? str.Length : 400).Trim(); // cut and trim it
str = Regex.Replace(str, @"\s", "-"); // replace the remaining spaces with hyphens
Perhaps this question can help you on your way. It gives you code showing how Stack Overflow generates its URLs (more specifically, how question titles are turned into nice URLs).
Link to the question, where Jeff Atwood shows their code
From your examples, the closest thing I've found (although I don't think it does everything that you're after) is:
My Favorite String Extension Methods in C#
and also:
ÜberUtils - Part 3 : Strings
Since neither of these solutions gives you exactly what you're after (going by the examples in your question), and assuming the goal here is to make your string "safe", I'd second Hogan's advice and go with Microsoft's Anti-Cross Site Scripting Library, or at least use it as a basis for something you create yourself, perhaps deriving from the library.
Here's a link to a class that builds a number of string extension methods (like the first two examples) but leverages Microsoft's AntiXSS Library:
Extension Methods for AntiXss
Of course, you can always combine the algorithms (or similar ones) used within the AntiXSS library with the kind of algorithms websites often use to generate "slug" URLs (as Stack Overflow and many blog platforms do).
Here's an example of a good C# slug generator:
Improved C# Slug Generator
You could use HttpUtility.UrlEncode, but that encodes everything rather than replacing or removing problematic characters. So your spaces would become + and ' would be encoded as well. Not a solution, but maybe a starting point.
If the goal is to make the string "safe", I recommend Microsoft's AntiXSS library.
There will be no library capable of exactly what you want, since you are stating specific rules you want applied, e.g. $x => x-dollars, x% => x-percent. You will almost certainly have to write your own method to achieve this. It shouldn't be too difficult: a string extension method that uses one or more Regexes to make the replacements would be a nice, concise way of doing it.
e.g.
public static string ToUrl(this string text)
{
    return Regex.Replace(text.Trim(), ..., ...);
}
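A minimal sketch of what such an extension method could look like, covering only the English rules from the question (dollars, percent, "&", apostrophes). Transliteration of accented or Chinese text is deliberately not attempted, and the `UrlSlugs` class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class UrlSlugs
{
    public static string ToUrl(this string text)
    {
        string s = text.Trim().ToLowerInvariant();
        s = Regex.Replace(s, @"\$(\d+(?:[.,]\d+)?)", "$1 dollars"); // "$12" -> "12 dollars"
        s = Regex.Replace(s, @"(\d+(?:[.,]\d+)?)%", "$1 percent");  // "10%" -> "10 percent"
        s = s.Replace("&", " and ");
        s = s.Replace("'", "");                                     // "it's" -> "its"
        s = Regex.Replace(s, @"[^a-z0-9]+", "-");                   // any other run becomes one dash
        return s.Trim('-');
    }
}
```

Applying the special-case rules before the catch-all `[^a-z0-9]+` replacement matters: once "$" and "%" have been turned into dashes, there is nothing left to translate.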
Something the Ruby version doesn't make clear (but the original Perl version does) is that the algorithm it's using to transliterate non-Roman characters is deliberately simplistic -- "better than nothing" in both senses. For example, while it does have a limited capability to transliterate Chinese characters, this is entirely context-insensitive -- so if you feed it Japanese text then you get gibberish out.
The advantage of this simplistic nature is that it's pretty trivial to implement. You just have a big table of Unicode characters and their corresponding ASCII "equivalents". You could pull this straight from the Perl (or Ruby) source code if you decide to implement this functionality yourself.
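To illustrate the table-driven approach, here is a sketch with a deliberately tiny, hand-picked table. The entries are illustrative only; a real implementation would use the full table from the Perl or Ruby sources. `Transliterator.ToAscii` is a hypothetical name, and giving the Chinese entries a trailing space (so adjacent syllables do not run together) is an assumption of this sketch.

```csharp
using System.Collections.Generic;
using System.Text;

public static class Transliterator
{
    // Tiny illustrative excerpt; real tables cover thousands of code points.
    private static readonly Dictionary<char, string> Table = new Dictionary<char, string>
    {
        { 'ç', "c" }, { 'ñ', "n" }, { 'é', "e" }, { 'ß', "ss" }, { 'æ', "ae" },
        { '你', "ni " }, { '好', "hao " },
    };

    public static string ToAscii(string input)
    {
        var sb = new StringBuilder();
        foreach (char c in input)
        {
            string replacement;
            if (c < 128) sb.Append(c);                                      // ASCII passes through
            else if (Table.TryGetValue(c, out replacement)) sb.Append(replacement);
            // characters missing from the table are dropped ("better than nothing")
        }
        return sb.ToString();
    }
}
```

The lookup is context-free, which is exactly the limitation described above: every character is mapped the same way regardless of the language it appears in.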
I'm using something like this in my blog.
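using System.Text.RegularExpressions;

public class Post
{
    public string Subject { get; set; }

    public string ResolveSubjectForUrl()
    {
        // Replace every non-word character with "-", then collapse runs of dashes
        return Regex.Replace(Regex.Replace(this.Subject.ToLower(), "[^\\w]", "-"), "[-]{2,}", "-");
    }
}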
I couldn't find any library that does it like the Ruby one, so I ended up writing my own method. Here it is, in case anyone cares:
using System;
using System.Text.RegularExpressions;

public static class UrlStringExtensions
{
    /// <summary>
    /// Turn a string into something that's URL and Google friendly.
    /// </summary>
    /// <param name="str">The text to convert.</param>
    /// <returns>A dash-separated string containing only ASCII letters, digits and dashes.</returns>
    public static string ForUrl(this string str) {
        return str.ForUrl(true);
    }

    public static string ForUrl(this string str, bool MakeLowerCase) {
        // Go to lowercase.
        if (MakeLowerCase) {
            str = str.ToLower();
        }
        // Replace accented characters with the closest ones:
        char[] from = "ÂÃÄÀÁÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöøùúûüýÿ".ToCharArray();
        char[] to = "AAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaceeeeiiiidnoooooouuuuyy".ToCharArray();
        for (int i = 0; i < from.Length; i++) {
            str = str.Replace(from[i], to[i]);
        }
        // Thorn http://en.wikipedia.org/wiki/%C3%9E
        str = str.Replace("Þ", "TH");
        str = str.Replace("þ", "th");
        // Eszett http://en.wikipedia.org/wiki/%C3%9F
        str = str.Replace("ß", "ss");
        // AE http://en.wikipedia.org/wiki/%C3%86
        str = str.Replace("Æ", "AE");
        str = str.Replace("æ", "ae");
        // Esperanto (x-system) http://en.wikipedia.org/wiki/Esperanto_orthography
        from = "ĈĜĤĴŜŬĉĝĥĵŝŭ".ToCharArray();
        to = "CXGXHXJXSXUXcxgxhxjxsxux".ToCharArray();
        for (int i = 0; i < from.Length; i++) {
            // Each accented letter becomes two characters, e.g. "ĉ" -> "cx"
            str = str.Replace(from[i].ToString(), new string(to, i * 2, 2));
        }
        // Currencies: move the symbol after the amount ("$12" -> "12 $"), then spell it out.
        str = new Regex(@"([¢€£\$])([0-9\.,]+)").Replace(str, "$2 $1");
        str = str.Replace("¢", "cents");
        str = str.Replace("€", "euros");
        str = str.Replace("£", "pounds");
        str = str.Replace("$", "dollars");
        // Ands
        str = str.Replace("&", " and ");
        // More aesthetically pleasing contractions
        str = str.Replace("'", "");
        str = str.Replace("’", "");
        // Except alphanumeric, everything else is a dash.
        str = new Regex(@"[^A-Za-z0-9-]").Replace(str, "-");
        // Remove dashes at the beginning or end.
        str = str.Trim("-".ToCharArray());
        // Compact duplicated dashes.
        str = new Regex("-+").Replace(str, "-");
        // Let's URL-encode just in case (the original called a custom UrlEncode() extension;
        // only [A-Za-z0-9-] remain at this point, so this should not change anything).
        return Uri.EscapeDataString(str);
    }
}