Replacing text with a cleaned word, case-insensitively, in C#

I have a list of bad words; if one of them is found in the text string, it is replaced by a cleaned word.
E.g. the bad word "woof" is replaced by w$$f.
But this currently only works when the word in the list has the same case as the matched word in the sentence.
var badWords = new List<string> { "woof", "meow" };
var text = "I have a cat named meow and a dog name Woof.";
Should become: "I have a cat named m$$w and a dog name W$$f."
public string CensorText(string text)
{
if (string.IsNullOrWhiteSpace(text))
{
return text;
}
foreach (string word in CensoredWords)
{
text = text.Replace(word, WordCleaner(word));
}
return text;
}
private static string WordCleaner(string wordToClean)
{
string firstChar = wordToClean.Substring(0,1);
string lastChar = wordToClean.Substring(wordToClean.Length - 1);
string centerHash = new string('$', wordToClean.Length-2);
return string.Concat(firstChar, centerHash, lastChar);
}
How can I make it case-insensitive when looping through the words and cleaning them? The simpler the answer, the better.

Try replacing:
text = text.Replace(word, WordCleaner(word));
with
text = text.Replace(word.ToLower(), WordCleaner(word));
This converts any upper case letter to a lower case one.
Edit
I've realised that I've made the wrong variable into lower case.
Change:
public string CensorText(string text)
{
To:
public string CensorText(string text)
{
text = text.ToLower();
Edit 2
To retain the original sentence with only the censored words changed, it is much easier to use Regex instead. First, revert your code back to how it was in the question.
Now replace:
text = text.Replace(word, WordCleaner(word));
with:
text = Regex.Replace(text, word, WordCleaner(word), RegexOptions.IgnoreCase);
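Note that the replacement above inserts the cleaned version of the list word, so "Woof" would come back as "w$$f" rather than "W$$f". A minimal sketch of one way to keep the casing of the matched text is to pass a match evaluator and clean the matched value itself. The class name Censor, the extra censoredWords parameter, the length guard in WordCleaner, and the use of Regex.Escape with \b word boundaries are assumptions added for illustration; they are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Censor
{
    // Keeps the first and last character and masks everything in between.
    public static string WordCleaner(string wordToClean)
    {
        if (wordToClean.Length <= 2) return wordToClean; // nothing to mask
        return wordToClean[0]
             + new string('$', wordToClean.Length - 2)
             + wordToClean[wordToClean.Length - 1];
    }

    public static string CensorText(string text, IEnumerable<string> censoredWords)
    {
        if (string.IsNullOrWhiteSpace(text)) return text;
        foreach (string word in censoredWords)
        {
            // Regex.Escape guards against list words that contain regex metacharacters;
            // \b restricts the match to whole words (assumption: whole-word matching is wanted).
            string pattern = @"\b" + Regex.Escape(word) + @"\b";
            // The evaluator cleans the text that was actually matched, so its casing survives.
            text = Regex.Replace(text, pattern, m => WordCleaner(m.Value), RegexOptions.IgnoreCase);
        }
        return text;
    }

    public static void Main()
    {
        var badWords = new List<string> { "woof", "meow" };
        Console.WriteLine(CensorText("I have a cat named meow and a dog name Woof.", badWords));
        // prints: I have a cat named m$$w and a dog name W$$f.
    }
}
```

Because the match is anchored on word boundaries, punctuation directly after a word (such as the final period) is left untouched.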

Here's a simple option you can use.
The benefit is that you don't care which of the words is lower case; it works either way. Note that String.Compare returns an int, which is why we check that it is 0 for a match.
string input = "the Woof is on Fire, we don't need no bucket, leT the ...";
string[] bad_words = new string[] {"woof","fire","BucKet", "Let"};
foreach (var word in input.Split(' ')) {
    // The OrdinalIgnoreCase comparison is what makes the match case-insensitive:
    if (bad_words.Any(b => String.Compare(word, b, StringComparison.OrdinalIgnoreCase) == 0))
        Console.Write(WordCleaner(word) + " ");
    else
        Console.Write(word + " ");
}
Output:
the W$$f is on F$$e we don't need no b$$$$t l$T the ...
Seems fine to me. Note that if you split on spaces, a word with a comma right after it keeps that comma as part of the word, so "Fire," and "bucket," only match once the punctuation has been trimmed off.

Related

Replacing first 16 digits in a string with Regex.Replace

I'm trying to replace only the first 16 digits of a string with Regex. I want it replaced with "*". I need to take this string:
"Request=Credit Card.Auth
Only&Version=4022&HD.Network_Status_Byte=*&HD.Application_ID=TZAHSK!&HD.Terminal_ID=12991kakajsjas&HD.Device_Tag=000123&07.POS_Entry_Capability=1&07.PIN_Entry_Capability=0&07.CAT_Indicator=0&07.Terminal_Type=4&07.Account_Entry_Mode=1&07.Partial_Auth_Indicator=0&07.Account_Card_Number=4242424242424242&07.Account_Expiry=1024&07.Transaction_Amount=142931&07.Association_Token_Indicator=0&17.CVV=200&17.Street_Address=123
Road SW&17.Postal_Zip_Code=90210&17.Invoice_Number=INV19291"
And replace the credit card number with an asterisk, which is why I say the first 16 digits, as that is how many digits are in a credit card. I am first splitting the string where there is a "." and then checking if it contains "card" and "number". Then if it finds it I want to replace the first 16 numbers with "*"
This is what I've done:
public void MaskData(string input)
{
if (input.Contains("."))
{
string[] userInput = input.Split('.');
foreach (string uInput in userInput)
{
string lowerCaseInput = uInput.ToLower();
string containsCard = "card";
string containsNumber = "number";
if (lowerCaseInput.Contains(containsCard) && lowerCaseInput.Contains(containsNumber))
{
tbStoreInput.Text += Regex.Replace(lowerCaseInput, @"[0-9]", "*") + Environment.NewLine;
}
else
{
tbStoreInput.Text += lowerCaseInput + Environment.NewLine;
}
}
}
}
I am aware that the regex is wrong, but I'm not sure how to match only the first 16 digits; right now it puts asterisks across the entire line, as seen here:
"account_card_number=****************&**"
I don't want it to show the asterisks after the "&".
Same answer as in the comments but explained.
Your regex pattern "[0-9]" matches a single digit, so every individual digit, including the digits after the &, is a match and gets replaced.
What you want is a quantifier that restricts the match to a fixed number of characters, i.e. 16. Your regex then becomes "[0-9]{16}" and the replacement string becomes 16 asterisks, so only those characters are affected by the replace operation.
Disclaimer
My answer is purposely broader than what is asked by OP but I saw it as an opportunity to raise awareness of other tools that are available in C# (which are objects).
String replacement
Regex is not the only tool available for replacing one simple string with another. Instead of
Regex.Replace(lowerCaseInput, @"[0-9]{16}", "****************")
it can also be
new StringBuilder()
    .Append(lowerCaseInput.Substring(0, 20)) // "account_card_number=" is 20 characters
    .Append(new string('*', 16))
    .Append(lowerCaseInput.Substring(36)) // everything after the 16 digits
    .ToString();
Shifting from procedural to object
Now the real meat comes in the possibility to encapsulate the logic into an object which holds a kind of string representation of a dictionary (entries being separated by '.' while keys and values are separated by '=').
The only behavior this object has is to give back a string representation of the initial input but with some value (1 in your case) masked to user (I assume for some security reason).
public sealed class CreditCardRequest
{
private readonly string _input;
public CreditCardRequest(string input) => _input = input;
public static implicit operator string(CreditCardRequest request) => request.ToString();
public override string ToString()
{
var entries = _input.Split(".", StringSplitOptions.RemoveEmptyEntries)
.Select(entry => entry.Split("="))
.ToDictionary(kv => kv[0].ToLower(), kv =>
{
if (kv[0] == "Account_Card_Number")
{
return new StringBuilder()
.Append(new string('*', 16))
.Append(string.Concat(kv[1].Skip(16))) // Skip yields chars; Concat turns them back into a string
.ToString();
}
else
{
return kv[1];
}
});
var output = new StringBuilder();
foreach (var kv in entries)
{
output.AppendFormat("{0}={1}{2}", kv.Key, kv.Value, Environment.NewLine);
}
return output.ToString();
}
}
Usage becomes as follows:
tbStoreInput.Text = new CreditCardRequest(input);
The concerns of your code are now independent of each other (the rule for parsing the input is no longer tied to a UI component) and the implementation details are hidden.
You can even decide to use Regex in CreditCardRequest.ToString() if you wish; the UI won't ever notice the change.
The method would then become:
public override string ToString()
{
var output = new StringBuilder();
if (_input.Contains("."))
{
foreach (string uInput in _input.Split('.'))
{
if (uInput.StartsWith("Account_Card_Number"))
{
output.AppendLine(Regex.Replace(uInput.ToLower(), @"[0-9]{16}", "****************"));
}
else
{
output.AppendLine(uInput.ToLower());
}
}
}
return output.ToString();
}
You can match the 16 digits after the account number key and replace them with 16 asterisks:
(?<=\baccount_card_number=)[0-9]{16}\b
Or you can use a capture group and use that group in the replacement like $1****************
\b(account_card_number=)[0-9]{16}\b
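As a rough sketch, the capture-group variant could be called from C# as shown below. The class and method names (CardMasker, MaskCardNumber) are hypothetical, and RegexOptions.IgnoreCase is an assumption added so that the pattern also matches input that has not been lower-cased first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CardMasker
{
    public static string MaskCardNumber(string input)
    {
        // Group 1 captures the key and the '='; $1 writes it back in front of the mask.
        return Regex.Replace(
            input,
            @"\b(account_card_number=)[0-9]{16}\b",
            "$1****************",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string sample = "07.Account_Card_Number=4242424242424242&07.Account_Expiry=1024";
        Console.WriteLine(MaskCardNumber(sample));
        // prints: 07.Account_Card_Number=****************&07.Account_Expiry=1024
    }
}
```

Since the pattern is tied to the key name, digits elsewhere in the request (for example the expiry) stay as they are.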

How do I use .ToUpper or .ToLower on a certain word in a big sentence?

string testSentence = "this is a test sentence and I WANT TO SEE HOW IT WILL LOOK LIKE hoping this part is big";
int firstLetter = testSentence.IndexOf("this");
int length = "this is a test sentence and".Length;
string upperSentence = testSentence.Substring(firstLetter, length).ToUpper();
int secondLetter = testSentence.IndexOf(" I");
int length2 = " I WANT TO SEE HOW IT WILL LOOK LIKE".Length;
string lowerSentence = testSentence.Substring(secondLetter, length2).ToLower();
int thirdSentence = testSentence.IndexOf(" hoping");
int length1 = " hoping this part is big".Length;
string get = testSentence.Substring(thirdSentence, length1).ToUpper();
Console.WriteLine(upperSentence + lowerSentence + get);
Can somebody please tell me how to put only one word in the middle of a sentence into all upper-case or all lower-case letters? For example, make the word "LOOK" lower case. Does the .Length call have to be used, or is there a way other than literally typing out the word or part of the sentence that I want to convert to upper or lower case?
The problem I have with this is that I cannot isolate just one word and change its case, because then the rest of the string after that word is also converted to lower/upper case.
In line with @Franck's advice, think about the problem as a human. You have three things:
a sentence
words that you want to be big
words that you want to be small
The sentence can really be broken down into a collection of words (ignoring punctuation). You want to go through the collection of words and change some of them to be uppercase and some of them to be lowercase - all the rest of the words you want to leave as they are.
Here is some code that does this:
using System;
using System.Linq;
public class Program
{
public static void Main()
{
// modify these as you please
string testSentence = "this is a test sentence and I WANT TO SEE HOW IT WILL LOOK LIKE hoping this part is big";
string[] wordsToUpperCase = new string[] { "hoping", "test" };
string[] wordsToLowerCase = new string[] { "look", "is" };
// start processing
// split the sentence into words
string[] wordsInSentence = testSentence.Split(' ');
// a List to hold the reworked words
var outputWords = new System.Collections.Generic.List<string>();
// process the words
foreach (string currentWord in wordsInSentence)
{
// check the wordsToUpperCase array for the current word
bool shouldUpperCaseThisWord = wordsToUpperCase.Any(stringToTest => stringToTest.Equals(currentWord, StringComparison.CurrentCultureIgnoreCase));
// check the wordsToLowerCase array for the current word
bool shouldLowerCaseThisWord = wordsToLowerCase.Any(stringToTest => stringToTest.Equals(currentWord, StringComparison.CurrentCultureIgnoreCase));
// add the current word to the output list
if (shouldUpperCaseThisWord)
outputWords.Add(currentWord.ToUpper());
else if (shouldLowerCaseThisWord)
outputWords.Add(currentWord.ToLower());
else
outputWords.Add(currentWord);
}
string finalOutput = String.Join(" ", outputWords);
Console.WriteLine(finalOutput);
}
}
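If only a single known word needs its case changed, a whole-word regex replace is another option that avoids both splitting the sentence and typing out the surrounding text. This is a minimal sketch; the names WordCase, LowerWord and UpperWord are hypothetical, and the \b word boundaries assume the word is made of ordinary word characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCase
{
    // Lower-cases every whole-word occurrence of 'word', whatever its current casing.
    public static string LowerWord(string sentence, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Replace(sentence, pattern, m => m.Value.ToLowerInvariant(), RegexOptions.IgnoreCase);
    }

    // Upper-cases every whole-word occurrence of 'word'.
    public static string UpperWord(string sentence, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Replace(sentence, pattern, m => m.Value.ToUpperInvariant(), RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(LowerWord("I WANT TO SEE HOW IT WILL LOOK LIKE", "LOOK"));
        // prints: I WANT TO SEE HOW IT WILL look LIKE
    }
}
```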

How to concatenate the whole text from a file into one string, avoiding empty lines between strings

How do I get the whole text of a document concatenated into one string? I'm trying to split the text on periods: string[] words = s.Split('.'); I want to take this text from a text document. But if my text document contains empty lines between strings, for example:
pat said, “i’ll keep this ring.”
she displayed the silver and jade wedding ring which, in another time track,
she and joe had picked out; this
much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
result looks like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track
3. she and joe had picked out this
4. much of the alternate world she had elected to retain
5. he wondered what if any legal basis she had kept in addition
6. none he hoped wisely however he said nothing
7. better not even to ask
but desired correct output should be like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track she and joe had picked out this much of the alternate world she had elected to retain
3. he wondered what if any legal basis she had kept in addition
4. none he hoped wisely however he said nothing
5. better not even to ask
So to do this first I need to process text file content to get whole text as single string, like this:
pat said, “i’ll keep this ring.” she displayed the silver and jade wedding ring which, in another time track, she and joe had picked out; this much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
I can't do this the same way as I would with list content, for example: string concat = String.Join(" ", text.ToArray());
I'm not sure how to concatenate the text of a text document into a string.
I think this is what you want:
var fileLocation = @"c:\myfile.txt";
var stringFromFile = File.ReadAllText(fileLocation);
// replace Environment.NewLine with whatever newline sequence your file uses
var withoutNewLines = stringFromFile.Replace(Environment.NewLine, " ");
// modify to remove any unwanted characters
var withoutUglyCharacters = Regex.Replace(withoutNewLines, "[“’”,;-]", "");
// collapse double spaces into single ones
var withoutTwoSpaces = withoutUglyCharacters.Replace("  ", " ");
var result = withoutTwoSpaces.Split('.').Where(i => !string.IsNullOrWhiteSpace(i)).Select(i => i.Trim()).ToList();
So first you read all the text from your file, then you remove all unwanted characters, and then you split on '.' and return the non-empty items.
Have you tried replacing double new-lines before splitting using a period?
static string[] GetSentences(string filePath) {
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
var lines = string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line)));
var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
return sentences;
}
I haven't tested this, but it should work.
Explanation:
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
Throws an exception if the file could not be found. It is advisable to surround the method call with a try/catch.
var lines = string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line)));
Creates a string, and ignores any lines which are purely whitespace or empty.
var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
Creates a string array, where the string is split at every period and whitespace following the period.
E.g:
The string "I came. I saw. I conquered" would become
I came
I saw
I conquered
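The behaviour described above can be checked without a file on disk. The sketch below takes the lines directly, drops the blank ones, and splits on a period followed by whitespace. Two details are assumptions rather than part of the answer: the lines are joined with a single space (joining with an empty string would glue the last word of one line to the first word of the next), and the pattern is written as \.\s+ so that all whitespace after the period is consumed. The names SentenceSplitter and GetSentences are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    public static string[] GetSentences(IEnumerable<string> lines)
    {
        // Skip empty or whitespace-only lines, then rebuild one long string.
        string text = string.Join(" ",
            lines.Where(line => !string.IsNullOrWhiteSpace(line))
                 .Select(line => line.Trim()));
        // Split at every period that is followed by whitespace.
        return Regex.Split(text, @"\.\s+");
    }

    public static void Main()
    {
        var lines = new[] { "I came.", "", "I saw. I", "conquered" };
        foreach (string sentence in GetSentences(lines))
            Console.WriteLine(sentence);
        // prints "I came", "I saw" and "I conquered" on separate lines
    }
}
```

To read from a file, the lines could come from File.ReadLines(filePath), as in the method above. A period at the very end of the text is not followed by whitespace, so it stays attached to the last sentence.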
Update:
Here's the method as a one-liner, if that's your style?
static string[] SplitSentences(string filePath) => File.Exists(filePath) ? Regex.Split(string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line))), @"\.[\s]{1,}?") : null;
I would suggest iterating through all the characters and checking whether each one is in the range 'a' <= char <= 'z' or is a space. If it matches, append it to the current line; otherwise, if it is a '.' character, end the current line and start a new one:
List<string> lines = new List<string>();
string line = string.Empty;
foreach(char c in str)
{
if((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20)
line += c;
else if(c == '.')
{
lines.Add(line.Trim());
line = string.Empty;
}
}
Or if you prefer "one-liner"s :
IEnumerable<string> lines = new string(str.Select(c => (char)(((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20) ? c : c == '.' ? '\n' : '\0')).ToArray()).Split('\n').Select(s => s.Trim());
I may be wrong about this, but I would think that you may not want to alter the string when you are splitting it. For example, there are curly single/double quotes (“) in part of the string, and removing them may not be desired. That raises a related issue: reading a text file that contains such quotes (as your example text does) like below:
var stringFromFile = File.ReadAllText(fileLocation);
will not display those characters properly in a text box or the console if the file is not UTF-8 encoded, because the default encoding used by ReadAllText is UTF-8. For example, the quotes will display as replacement characters (diamonds) in a text box on a form and as question marks (?) in the console. To keep the quotes and have them display properly, you can pass the OS's current ANSI encoding (Encoding.Default on .NET Framework) as a parameter to ReadAllText, like below:
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
Below is code using a simple Split call to split the string on periods (.). Hope this helps.
private void button1_Click(object sender, EventArgs e) {
string fileLocation = @"C:\YourPath\YourFile.txt";
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
string bigString = stringFromFile.Replace(Environment.NewLine, "");
string[] result = bigString.Split('.');
int count = 1;
foreach (string s in result) {
if (s != "") {
textBox1.Text += count + ". " + s.Trim() + Environment.NewLine;
Console.WriteLine(count + ". " + s.Trim());
count++;
}
else {
// period at the end of the string
}
}
}

Finding the number of occurrences of strings in a specific format in a given text

I have a large string, where there can be specific words (text followed by a single colon, like "test:") occurring more than once. For example, like this:
word:
TEST:
word:
TEST:
TEST: // random text
"word" occurs twice and "TEST" occurs thrice, but the amount can be variable. Also, these words don't have to be in the same order and there can be more text in the same line as the word (as shown in the last example of "TEST"). What I need to do is append the occurrence number to each word, for example the output string needs to be this:
word_ONE:
TEST_ONE:
word_TWO:
TEST_TWO:
TEST_THREE: // random text
The RegEx for getting these words which I've written is ^\b[A-Za-z0-9_]{4,}\b:. However, I don't know how to accomplish the above in a fast way. Any ideas?
Regex is perfect for this job - using Replace with a match evaluator:
This example is not tested nor compiled:
public class Fix
{
public static String Execute(string largeText)
{
return Regex.Replace(largeText, @"^(\w{4,}):", new Fix().Evaluator, RegexOptions.Multiline); // Multiline lets ^ match at the start of every line
}
private Dictionary<String, int> counters = new Dictionary<String, int>();
private static String[] numbers = {"ONE", "TWO", "THREE", /* ... */};
public String Evaluator(Match m)
{
String word = m.Groups[1].Value;
int count;
if (!counters.TryGetValue(word, out count))
count = 0;
count++;
counters[word] = count;
return word + "_" + numbers[count-1] + ":";
}
}
This should return what you requested when calling:
result = Fix.Execute(largeText);
I think you can do this with Regex.Replace(string, string, MatchEvaluator) and a dictionary.
static Dictionary<string, int> wordCount = new Dictionary<string, int>();
static string AppendIndex(Match m)
{
    // the match includes the trailing ':', so strip it before counting
    string matchedString = m.Value.TrimEnd(':');
    if (wordCount.ContainsKey(matchedString))
        wordCount[matchedString] = wordCount[matchedString] + 1;
    else
        wordCount.Add(matchedString, 1);
    return matchedString + "_" + wordCount[matchedString] + ":"; // in the format: word_1:, word_2:
}
string inputText = "....";
string regexText = @"";
static void Main()
{
    string text = "....";
    // RegexOptions.Multiline lets ^ match at the start of every line
    string result = Regex.Replace(text, @"^\b[A-Za-z0-9_]{4,}\b:",
        new MatchEvaluator(AppendIndex), RegexOptions.Multiline);
}
see this:
http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx
If I understand you correctly, regex is not necessary here.
You can split your large string by the ':' character. Maybe you also need to read line by line (split by '\n'). After that you just create a dictionary (IDictionary<string, int>), which counts the occurrences of certain words. Every time you find word x, you increase the counter in the dictionary.
EDIT
Read your file line by line OR split the string by '\n'
Check if your delimiter is present. Either by splitting by ':' OR using regex.
Get the first item from the split array OR the first match of your regex.
Use a dictionary to count your occurrences.
if (dictionary.ContainsKey(key)) dictionary[key]++;
else dictionary.Add(key, 1);
If you need words instead of numbers, then create another dictionary for these, so that dictionary[key] equals "one" if key equals 1. Maybe there is another solution for that.
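The steps above can be sketched roughly as follows, without any regex. The names OccurrenceTagger, Tag and NumberWords are hypothetical, the list of number words is cut off at five purely for illustration (higher counts fall back to digits), and char.IsLetterOrDigit is an assumption that accepts a somewhat wider set of characters than the question's [A-Za-z0-9_] class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OccurrenceTagger
{
    // Illustrative lookup only; extend it as needed.
    private static readonly string[] NumberWords = { "ONE", "TWO", "THREE", "FOUR", "FIVE" };

    public static string Tag(string largeText)
    {
        var counts = new Dictionary<string, int>();
        string[] lines = largeText.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon < 4) continue; // no delimiter, or a word shorter than 4 characters

            string key = lines[i].Substring(0, colon);
            if (!key.All(c => char.IsLetterOrDigit(c) || c == '_')) continue;

            counts.TryGetValue(key, out int count); // count stays 0 for a new key
            count++;
            counts[key] = count;

            string suffix = count <= NumberWords.Length ? NumberWords[count - 1] : count.ToString();
            // Re-attach the colon and whatever followed it on the line.
            lines[i] = key + "_" + suffix + lines[i].Substring(colon);
        }
        return string.Join("\n", lines);
    }

    public static void Main()
    {
        Console.WriteLine(Tag("word:\nTEST:\nword:\nTEST:\nTEST: // random text"));
    }
}
```

Keeping a running count per key is what gives each line its own occurrence number, rather than the total number of times the word appears.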
Look at this example (I know it's not perfect and not so nice).
Let's leave aside the exact argument for the Split function; I think it can help.
static void Main(string[] args)
{
string a = "word:word:test:-1+234=567:test:test:";
string[] tks = a.Split(':');
Regex re = new Regex(@"^\b[A-Za-z0-9_]{4,}\b");
var res = from x in tks
where re.Matches(x).Count > 0
select x + DecodeNO(tks.Count(y=>y.Equals(x)));
foreach (var item in res)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static string DecodeNO(int n)
{
switch (n)
{
case 1:
return "_one";
case 2:
return "_two";
case 3:
return "_three";
}
return "";
}

How to capitalize the first character of each word, or the first character of a whole string, with C#?

I could write my own algorithm to do it, but I feel there should be the equivalent to ruby's humanize in C#.
I googled it but only found ways to humanize dates.
Examples:
A way to turn "Lorem Lipsum Et" into "Lorem lipsum et"
A way to turn "Lorem lipsum et" into "Lorem Lipsum Et"
As discussed in the comments on @miguel's answer, you can use TextInfo.ToTitleCase, which has been available since .NET 1.1. Here is some code corresponding to your example:
string lipsum1 = "Lorem lipsum et";
// Creates a TextInfo based on the "en-US" culture.
TextInfo textInfo = new CultureInfo("en-US",false).TextInfo;
// Changes a string to titlecase.
Console.WriteLine("\"{0}\" to titlecase: {1}",
lipsum1,
textInfo.ToTitleCase( lipsum1 ));
// Will output: "Lorem lipsum et" to titlecase: Lorem Lipsum Et
It leaves words that are entirely upper case, such as "LOREM LIPSUM ET", unchanged, because it treats them as acronyms; this way "IEEE" (Institute of Electrical and Electronics Engineers) won't become "ieee" or "Ieee".
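A small sketch of that all-caps behaviour, and of the usual workaround of lower-casing the input first, is shown below. The names TitleCaseDemo and ToTitleCaseLoweringFirst are hypothetical, and CultureInfo.InvariantCulture is used here instead of "en-US" only so that the snippet does not depend on which cultures are installed.

```csharp
using System;
using System.Globalization;

public static class TitleCaseDemo
{
    private static readonly TextInfo Info = CultureInfo.InvariantCulture.TextInfo;

    // Lower-cases first so that all-caps words are no longer treated as acronyms.
    public static string ToTitleCaseLoweringFirst(string s)
    {
        return Info.ToTitleCase(Info.ToLower(s));
    }

    public static void Main()
    {
        Console.WriteLine(Info.ToTitleCase("LOREM LIPSUM ET"));          // prints: LOREM LIPSUM ET
        Console.WriteLine(ToTitleCaseLoweringFirst("LOREM LIPSUM ET"));  // prints: Lorem Lipsum Et
    }
}
```

The trade-off is that lower-casing first also flattens genuine acronyms, so "IEEE" would come out as "Ieee".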
However if you only want to capitalize the first character you can do the solution that is over here… or you could just split the string and capitalize the first one in the list:
string lipsum2 = "Lorem Lipsum Et";
string lipsum2lower = textInfo.ToLower(lipsum2);
string[] lipsum2split = lipsum2lower.Split(' ');
bool first = true;
foreach (string s in lipsum2split)
{
if (first)
{
Console.Write("{0} ", textInfo.ToTitleCase(s));
first = false;
}
else
{
Console.Write("{0} ", s);
}
}
// Will output: Lorem lipsum et
There is another elegant solution:
Define the function ToTitleCase in a static class of your project
using System.Globalization;
public static string ToTitleCase(this string title)
{
return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(title.ToLower());
}
And then use it like a string extension anywhere on your project:
"have a good day !".ToTitleCase() // "Have A Good Day !"
Using regular expressions for this looks much cleaner:
string s = "the quick brown fox jumps over the lazy dog";
s = Regex.Replace(s, @"(^\w)|(\s\w)", m => m.Value.ToUpper());
All the examples seem to make the other characters lowered first which isn't what I needed.
customerName = CustomerName <-- Which is what I wanted
this is an example = This Is An Example
public static string ToUpperEveryWord(this string s)
{
// Check for empty string.
if (string.IsNullOrEmpty(s))
{
return string.Empty;
}
var words = s.Split(' ');
var t = "";
foreach (var word in words)
{
t += char.ToUpper(word[0]) + word.Substring(1) + ' ';
}
return t.Trim();
}
If you just want to capitalize the first character, just stick this in a utility method of your own:
return string.IsNullOrEmpty(str)
? str
: char.ToUpperInvariant(str[0]) + str.Substring(1).ToLowerInvariant();
There's also a library method to capitalize the first character of every word:
http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase.aspx
CSS technique is ok but only changes the presentation of the string in the browser. A better method is to make the text itself capitalised before sending to browser.
Most of the above implementations are OK, but none of them addresses what happens if you have mixed-case words that need to be preserved, or if you want to use true Title Case, for example:
"Where to Study PHd Courses in the USA"
or
"IRS Form UB40a"
Also using CultureInfo.CurrentCulture.TextInfo.ToTitleCase(string) preserves upper case words as in
"sports and MLB baseball" which becomes "Sports And MLB Baseball" but if the whole string is put in upper case, then this causes an issue.
So I put together a simple function that allows you to keep the capital and mixed case words and make small words lower case (if they are not at the start and end of the phrase) by including them in a specialCases and lowerCases string arrays:
public static string TitleCase(string value) {
string titleString = ""; // destination string, this will be returned by function
if (!String.IsNullOrEmpty(value)) {
string[] lowerCases = new string[12] { "of", "the", "in", "a", "an", "to", "and", "at", "from", "by", "on", "or"}; // list of lower case words that should only be capitalised at start and end of title
string[] specialCases = new string[7] { "UK", "USA", "IRS", "UCLA", "PHd", "UB40a", "MSc" }; // list of words that need capitalisation preserved at any point in title
string[] words = value.ToLower().Split(' ');
bool wordAdded = false; // flag to confirm whether this word appears in special case list
int counter = 1;
foreach (string s in words) {
// check if word appears in lower case list
foreach (string lcWord in lowerCases) {
if (s.ToLower() == lcWord) {
// if lower case word is the first or last word of the title then it still needs capital so skip this bit.
if (counter == 1 || counter == words.Length) { break; }; // counter starts at 1, so 1 is the first word
titleString += lcWord;
wordAdded = true;
break;
}
}
// check if word appears in special case list
foreach (string scWord in specialCases) {
if (s.ToUpper() == scWord.ToUpper()) {
titleString += scWord;
wordAdded = true;
break;
}
}
if (!wordAdded) { // word does not appear in special cases or lower cases, so capitalise first letter and add to destination string
titleString += char.ToUpper(s[0]) + s.Substring(1).ToLower();
}
wordAdded = false;
if (counter < words.Length) {
titleString += " "; //dont forget to add spaces back in again!
}
counter++;
}
}
return titleString;
}
This is just a quick and simple method - and can probably be improved a bit if you want to spend more time on it.
If you want to keep the capitalisation of smaller words like "a" and "of", then just remove them from the lowerCases string array. Different organisations have different rules on capitalisation.
You can see an example of this code in action on this site: Egg Donation London - this site automatically creates breadcrumb trails at the top of the pages by parsing the url eg "/services/uk-egg-bank/introduction" - then each folder name in the trail has hyphens replaced with spaces and capitalises the folder name, so uk-egg-bank becomes UK Egg Bank. (preserving the upper case 'UK')
An extension of this code could be to have a lookup table of acronyms and uppercase/lowercase words in a shared text file, database table or web service so that the list of mixed case words can be maintained from one single place and apply to many different applications that rely on the function.
There is no prebuilt solution for proper linguistic capitalization in .NET. What kind of capitalization are you going for? Are you following the Chicago Manual of Style conventions? AMA or MLA? Even plain English sentence capitalization has thousands of special exceptions for words. I can't speak to what Ruby's humanize does, but I imagine it doesn't follow linguistic rules of capitalization and instead does something much simpler.
Internally, we encountered this same issue and had to write a fairly large amount code just to handle proper (in our little world) casing of article titles, not even accounting for sentence capitalization. And it indeed does get "fuzzy" :)
It really depends on what you need - why are you trying to convert the sentences to proper capitalization (and in what context)?
I have achieved the same using custom extension methods. For the first letter of the first sub-string, use the method yourString.ToFirstLetterUpper(). For the first letter of every sub-string, excluding articles and some prepositions, use the method yourString.ToAllFirstLetterInUpper(). Below is a console program:
class Program
{
static void Main(string[] args)
{
Console.WriteLine("this is my string".ToAllFirstLetterInUpper());
Console.WriteLine("uniVersity of lonDon".ToAllFirstLetterInUpper());
}
}
public static class StringExtension
{
public static string ToAllFirstLetterInUpper(this string str)
{
var array = str.Split(" ");
for (int i = 0; i < array.Length; i++)
{
if (array[i] == "" || array[i] == " " || listOfArticles_Prepositions().Contains(array[i])) continue;
array[i] = array[i].ToFirstLetterUpper();
}
return string.Join(" ", array);
}
private static string ToFirstLetterUpper(this string str)
{
return str?.First().ToString().ToUpper() + str?.Substring(1).ToLower();
}
private static string[] listOfArticles_Prepositions()
{
return new[]
{
"in","on","to","of","and","or","for","a","an","is"
};
}
}
OUTPUT
This is My String
University of London
Process finished with exit code 0.
As far as I know, there's no way to do that without writing (or cribbing) code. C# nets (ha!) you upper, lower and title (what you have) cases:
http://support.microsoft.com/kb/312890/EN-US/
