My code should translate a phrase into Pig Latin. For every word, the first letter should be moved to the end of the word, followed by "ay".
Example: wall = "allway"
Any ideas? This is the simplest approach I could think of:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace english_to_pig_latin
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("THIS IS A English to Pig Latin translator");
Console.WriteLine("ENTER Phrase");
string[] phrase = Console.ReadLine().Split(' ');
int words = phrase.Length;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < words; i++)
{
//to add ay in the end
/*sb.Append(phrase[i].ToString());
sb.Append("ay ");
Console.WriteLine(sb);*/
}
Console.ReadLine();
}
}
}
First you need to define your Pig Latin rules; your description doesn't match the usual ones. For instance, English "sharp" is correctly "Pig-Latinized" as 'arpshay', not 'harpsay' as your rule would produce. (I prefer to write 'arp-sh-ay', because the hyphens make Pig Latin easier to read and make it possible to translate back into English.) I suggest you first settle on a set of rules. Your code is a good start: it separates a phrase into (almost) words. Note, though, that it will split "Please, Joe" into "Please," and "Joe", and you probably do not want that comma sent to your word-by-word translator.
When defining your rules, I suggest you consider how to Pig-Latinize these words:
hello --> 'ellohay' (a normal word),
string --> 'ingstray' ('str' is the whole consonant string moved to the end),
apple --> 'appleway', 'appleay', or 'appleyay', (depending on your dialect of Pig-Latin),
queen --> 'eenquay' ('qu' is the consonant string here),
yellow --> 'ellowyay' (y is consonant here),
rhythm --> 'ythmrhay' (y is vowel here),
sky --> 'yskay' (y is vowel here).
Note that a word starting with 'qu' (like 'queen') is a special case that needs to be handled too. Note also that y is probably a consonant when it begins an English word, but a vowel when it appears in the middle or at the end of a word.
The hyphenated Pig Latin versions of these words would be:
ello-h-ay, ing-str-ay, ('apple-way', 'apple-ay', or 'apple-yay'), 'een-qu-ay', 'ellow-y-ay', 'ythm-rh-ay', and 'y-sk-ay'. The hyphenation allows both easier reading as well as an ability to reverse the Pig Latin back into English by a computer parser. But unfortunately, many people just cram the Pig Latin word together without showing any hyphenation separation, so reversing the translation cannot be done simply without ambiguity.
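To illustrate why the hyphens make the translation reversible, here is a minimal sketch of the reverse step for a single word. It assumes the English word itself contains no hyphens, and the names HyphenatedPigLatin and ToEnglishWord are made up for this example:

```csharp
using System;

public static class HyphenatedPigLatin
{
    // Reverses one hyphenated Pig Latin word.
    // "ello-h-ay" has three parts (end, onset, "ay"); "apple-way" has two (end, suffix).
    public static string ToEnglishWord(string pig)
    {
        string[] parts = pig.Split('-');
        if (parts.Length == 3) { return parts[1] + parts[0]; } // put the onset back in front
        if (parts.Length == 2) { return parts[0]; }            // no onset: just drop the suffix
        return pig;                                            // not hyphenated Pig Latin, leave as-is
    }
}
```

Because the onset is kept as its own part, "een-qu-ay" and "y-sk-ay" can be reversed without guessing where the consonant cluster ends.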
Real Pig Latin goes by the sound of the word, not the spelling, so without a very complex word-to-phoneme system a fully correct translator is way too difficult. Most (good) Pig Latin translators handle the above cases and ignore other exceptions, because English spelling is a very poor guide to how words sound.
So my first suggestion is to get a set of rules. My second suggestion is to use two functions, PigLatinizePhrase() and PigLatinizeWord(), where PigLatinizePhrase() parses a phrase into words (and punctuation) and calls PigLatinizeWord() for each word, excluding any punctuation. You can use a simple loop through each character and test it with char.IsLetter. If it's a letter, add it to a string builder and move to the next character. If it's not a letter and the string builder is not empty, send that word to your word translator, and then add the non-letter to your result. That is the logic for the PigLatinizePhrase() method. Here is my code, which does just that:
/// <summary>
/// Translates English text to Pig Latin word by word, leaving punctuation and whitespace in place.
/// </summary>
/// <param name="eng">English text, paragraphs, etc.</param>
/// <param name="suffixWithNoOnset">Used to differentiate between Pig Latin dialects.
/// Known dialects may use any of: "ay", "-ay", "way", "-way", "yay", or "-yay".
/// Corresponding translations for 'egg' will yield: "eggay", "egg-ay", "eggway", "egg-way", "eggyay", "egg-yay".
/// Or for 'I': "Iay", "I-ay", "Iway", "I-way", "Iyay", "I-yay".
/// </param>
/// <returns>The Pig Latin text, or null if the input was null.</returns>
public static string PigLatinizePhrase(string eng, string suffixWithNoOnset = "-ay")
{
if (eng == null) { return null; } // don't break if null
var word = new StringBuilder(); // only current word, built char by char
var pig = new StringBuilder(); // pig latin text
char prevChar = '\0';
foreach (char thisChar in eng)
{
// the "'" test is so "I'll", "can't", and "Ashley's" will work right.
if (char.IsLetter(thisChar) || thisChar == '\'')
{
word.Append(thisChar);
}
else
{
if (word.Length > 0)
{
pig.Append(PigLatinizeWord(word.ToString(), suffixWithNoOnset));
word = new StringBuilder();
}
pig.Append(thisChar);
}
prevChar = thisChar;
}
if (word.Length > 0)
{
pig.Append(PigLatinizeWord(word.ToString(), suffixWithNoOnset));
}
return pig.ToString();
} // public static string PigLatinizePhrase(string eng, string suffixWithNoOnset = "-ay")
The suffixWithNoOnset variable is simply passed directly to the PigLatinizeWord() method and it determines exactly which 'dialect' of Pig Latin will be used. (See the XML comment before the method in the source code for more clarity.)
For the PigLatinizeWord() method, upon actually programming it, i found that it was very convenient to split this functionality into two methods, one method to parse the English word into the 2 parts that Pig Latin cares about, and another to actually do what is desired with those 2 parts, depending on which version of Pig Latin is desired. Here's the source code for these two functions:
/// <summary>
/// Translates a single English word to Pig Latin.
/// </summary>
/// <param name="eng">English word before being translated to Pig Latin.</param>
/// <param name="suffixWithNoOnset">Used to differentiate between Pig Latin dialects.
/// Known dialects may use any of: "ay", "-ay", "way", "-way", "yay", or "-yay".
/// Corresponding translations for 'egg' will yield: "eggay", "egg-ay", "eggway", "egg-way", "eggyay", "egg-yay".
/// Or for 'I': "Iay", "I-ay", "Iway", "I-way", "Iyay", "I-yay".
/// </param>
/// <returns>The Pig Latin word, or the input unchanged if it was null or empty.</returns>
public static string PigLatinizeWord(string eng, string suffixWithNoOnset = "-ay")
{
if (eng == null || eng.Length == 0) { return eng; } // don't break if null or empty
string[] onsetAndEnd = GetOnsetAndEndOfWord(eng);
// string h = string.Empty;
string o = onsetAndEnd[0]; // 'Onset' of first syllable that gets moved to end of word
string e = onsetAndEnd[1]; // 'End' of word, without the onset
bool hyphenate = suffixWithNoOnset.Contains('-');
// if (hyphenate) { h = "-"; }
var sb = new StringBuilder();
if (e.Length > 0) { sb.Append(e); if (hyphenate && o.Length > 0) { sb.Append('-'); } }
if (o.Length > 0) { sb.Append(o); if (hyphenate) { sb.Append('-'); } sb.Append("ay"); }
else { sb.Append(suffixWithNoOnset); }
return sb.ToString();
} // public static string PigLatinizeWord(string eng)
public static string[] GetOnsetAndEndOfWord(string word)
{
if (word == null) { return null; }
// string[] r = ",".Split(',');
string uppr = word.ToUpperInvariant();
if (uppr.StartsWith("QU")) { return new string[] { word.Substring(0,2), word.Substring(2) }; }
int x = 0; if (word.Length <= x) { return new string[] { string.Empty, string.Empty }; }
if ("AOEUI".Contains(uppr[x])) // tests first letter/character
{ return new string[] { word.Substring(0, x), word.Substring(x) }; }
while (++x < word.Length)
{
if ("AOEUIY".Contains(uppr[x])) // tests each character after first letter/character
{ return new string[] { word.Substring(0, x), word.Substring(x) }; }
}
return new string[] { string.Empty, word };
} // public static string[] GetOnsetAndEndOfWord(string word)
I have written a PigLatinize() method in JavaScript before, which was a lot of fun for me. :) I enjoyed making my C# version with more features, giving it the ability to translate to 6 different 'dialects' of Pig Latin, especially since C# is my favorite (programming) language. ;)
I think you need this transformation: phrase[i].Substring(1) + phrase[i][0] + "ay"
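As a minimal sketch, that transformation applied to every word of a phrase could look like this. It assumes words are separated by spaces and carry no punctuation; SimplePigLatin and Translate are names invented for the example:

```csharp
using System;
using System.Linq;

public static class SimplePigLatin
{
    // Moves the first letter of each word to the end and appends "ay".
    public static string Translate(string phrase)
    {
        // RemoveEmptyEntries avoids empty "words" when the input has repeated spaces.
        var words = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(w => w.Substring(1) + w[0] + "ay"));
    }
}
```

With this rule, Translate("wall") gives "allway", matching the example in the question.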
Related
I'm trying to replace only the first 16 digits of a string with Regex. I want it replaced with "*". I need to take this string:
"Request=Credit Card.Auth Only&Version=4022&HD.Network_Status_Byte=*&HD.Application_ID=TZAHSK!&HD.Terminal_ID=12991kakajsjas&HD.Device_Tag=000123&07.POS_Entry_Capability=1&07.PIN_Entry_Capability=0&07.CAT_Indicator=0&07.Terminal_Type=4&07.Account_Entry_Mode=1&07.Partial_Auth_Indicator=0&07.Account_Card_Number=4242424242424242&07.Account_Expiry=1024&07.Transaction_Amount=142931&07.Association_Token_Indicator=0&17.CVV=200&17.Street_Address=123 Road SW&17.Postal_Zip_Code=90210&17.Invoice_Number=INV19291"
And replace the credit card number with an asterisk, which is why I say the first 16 digits, as that is how many digits are in a credit card. I am first splitting the string where there is a "." and then checking if it contains "card" and "number". Then if it finds it I want to replace the first 16 numbers with "*"
This is what I've done:
public void MaskData(string input)
{
if (input.Contains("."))
{
string[] userInput = input.Split('.');
foreach (string uInput in userInput)
{
string lowerCaseInput = uInput.ToLower();
string containsCard = "card";
string containsNumber = "number";
if (lowerCaseInput.Contains(containsCard) && lowerCaseInput.Contains(containsNumber))
{
tbStoreInput.Text += Regex.Replace(lowerCaseInput, @"[0-9]", "*") + Environment.NewLine;
}
else
{
tbStoreInput.Text += lowerCaseInput + Environment.NewLine;
}
}
}
}
I am aware that the Regex is wrong, but I'm not sure how to match only the first 16 digits. Right now it's putting asterisks across the entire line, as seen here:
"account_card_number=****************&**"
I don't want it to show the asterisks after the "&".
Same answer as in the comments, but explained.
Your regex pattern "[0-9]" matches a single digit, so every individual digit, including the digits after the &, is a match and gets replaced.
What you want is a quantifier that restricts the match to a fixed number of characters, i.e. 16. Your pattern becomes "[0-9]{16}", so only that run of digits is affected by the replace operation. Because the whole run is now a single match, the replacement string must contain 16 asterisks rather than one.
Disclaimer
My answer is purposely broader than what is asked by OP but I saw it as an opportunity to raise awareness of other tools that are available in C# (which are objects).
String replacement
Regex is not the only tool available to replace a simple string by another. Instead of
Regex.Replace(lowerCaseInput, @"[0-9]{16}", "****************")
it can also be
new StringBuilder()
.Append(lowerCaseInput.Substring(0, 20)) // "account_card_number=" is 20 characters long
.Append(new string('*', 16))
.Append(lowerCaseInput.Substring(36)) // whatever follows the 16 digits
.ToString();
Shifting from procedural to object
Now the real meat comes in the possibility to encapsulate the logic into an object which holds a kind of string representation of a dictionary (entries being separated by '.' while keys and values are separated by '=').
The only behavior this object has is to give back a string representation of the initial input but with some value (1 in your case) masked to user (I assume for some security reason).
public sealed class CreditCardRequest
{
private readonly string _input;
public CreditCardRequest(string input) => _input = input;
public static implicit operator string(CreditCardRequest request) => request.ToString();
public override string ToString()
{
var entries = _input.Split(".", StringSplitOptions.RemoveEmptyEntries)
.Select(entry => entry.Split("="))
.ToDictionary(kv => kv[0].ToLower(), kv =>
{
if (kv[0] == "Account_Card_Number")
{
return new StringBuilder()
.Append(new string('*', 16))
.Append(kv[1].Substring(16)) // keep whatever follows the 16 digits
.ToString();
}
else
{
return kv[1];
}
});
var output = new StringBuilder();
foreach (var kv in entries)
{
output.AppendFormat("{0}={1}{2}", kv.Key, kv.Value, Environment.NewLine);
}
return output.ToString();
}
}
Usage becomes as follows:
tbStoreInput.Text = new CreditCardRequest(input);
The concerns of your code are now independent of each other (the rule to parse the input is no longer tied to a UI component) and the implementation details are hidden.
You can even decide to use Regex in CreditCardRequest.ToString() if you wish to, the UI won't ever notice the change.
The ToString() method would then become:
public override string ToString()
{
var output = new StringBuilder();
if (_input.Contains("."))
{
foreach (string uInput in _input.Split('.'))
{
if (uInput.StartsWith("Account_Card_Number"))
{
output.AppendLine(Regex.Replace(uInput.ToLower(), @"[0-9]{16}", "****************"));
}
else
{
output.AppendLine(uInput.ToLower());
}
}
}
return output.ToString();
}
You can match the 16 digits after the account number and replace them with 16 asterisks:
(?<=\baccount_card_number=)[0-9]{16}\b
Or you can use a capture group and use that group in the replacement like $1****************
\b(account_card_number=)[0-9]{16}\b
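In C#, the capture-group variant could be used roughly as follows. This is a sketch rather than the exact code from the answers: the CardMasker and Mask names are invented, and RegexOptions.IgnoreCase is an added assumption so that the input does not have to be lowercased first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CardMasker
{
    // Masks the 16 digits that directly follow "account_card_number=".
    // $1 re-inserts the captured key, so only the digits are replaced.
    public static string Mask(string input)
    {
        return Regex.Replace(
            input,
            @"\b(account_card_number=)[0-9]{16}\b",
            "$1" + new string('*', 16),
            RegexOptions.IgnoreCase);
    }
}
```

Anchoring the match to the key name means other digit runs in the request, such as the expiry or the amount, are left untouched.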
Obviously I'm new to this, hence the content of this project. I have written some code that will translate English into Pig Latin. Easy enough. The problem is I want to find a way to translate the Pig Latin back into English using a logic block. The clone string just seems like a cheap way to do it. Any suggestions? Here's my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace FunctionTest
{
public class PigLatinClass
{
public static void pigTalk(string sentence)
{
try
{
while (sentence != "exit")
{
string firstLetter;
string afterFirst;
string pigLatinOut = "";
int x;
string vowel = "AEIOUaeiou";
Console.WriteLine("Enter a sentence to convert into PigLatin");
sentence = Console.ReadLine();
string[] pieces = sentence.Split();
foreach (string piece in pieces)
{
afterFirst = piece.Substring(1);
firstLetter = piece.Substring(0, 1);
x = vowel.IndexOf(firstLetter);
if (x == -1)
{
pigLatinOut = (afterFirst + firstLetter + "ay ");
}
else
{
pigLatinOut = (firstLetter + afterFirst + "way ");
}
Console.Write(pigLatinOut);
}
Console.WriteLine("Press Enter to flip the sentence back.");
Console.ReadKey(true);
string clonedString = null;
clonedString = (String)sentence.Clone();
Console.WriteLine(clonedString);
}
}
catch (Exception e)
{
Console.WriteLine(e.ToString());
}
}
}
}
The problem is there's no real rule that would work. For example: if the 3rd letter from the end is "w", you might want to say this is a vowel word, but a consonant word starting with "w" also fits this rule. If the first letter is a vowel, you might again want to say that this is a vowel word, but a consonant word can fit this rule too, since its first letter is moved to the back (pat = atpay). The only way I can think of is an if statement that checks whether "w" is in the 3rd position from the end and the word starts with a vowel, which would call for the && operator, and Visual Studio yells at you if you use it with strings.
The problem is that the English to Pig Latin translation is not a bijective function.
For example, take two English words like "all" and "wall": both translate to the same Pig Latin word, "allway".
This tells you that if you get a word like "allway", you can't give a unique English translation, but (at least) two.
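The collision is easy to reproduce with a small sketch that uses the same word rule as the code in the question (vowel-initial words get "way", other words move their first letter to the end and get "ay"). The class name PigLatinCollision is invented for this example:

```csharp
using System;

public static class PigLatinCollision
{
    // Same rule as the question's loop body, applied to a single non-empty word.
    public static string ToPigLatin(string word)
    {
        if ("AEIOUaeiou".IndexOf(word[0]) >= 0) { return word + "way"; }
        return word.Substring(1) + word[0] + "ay";
    }
}
```

Both ToPigLatin("all") and ToPigLatin("wall") return "allway", so no function taking only "allway" as input can know which of the two words to give back.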
I'm assuming this is homework.
What your professor probably wants is for you to convert a sentence to pig latin, and from pig latin. Keeping a copy of the original string only lets you "flip back" from sentences you already know the non-pig latin version of. It doesn't allow you to flip back from any string.
I think you want to structure your program like this:
public class PigLatinClass
{
public static string ToPigLatin(string sentence)
{
// Convert a string to pig latin
}
public static string FromPigLatin(string sentence)
{
// Convert a string from pig latin (opposite logic of above)
}
public static void PigTalk()
{
string sentence;
Console.WriteLine("Enter a sentence to convert into PigLatin");
sentence = Console.ReadLine();
sentence = ToPigLatin(sentence);
Console.WriteLine(sentence);
Console.WriteLine("Press Enter to flip the sentence back.");
Console.ReadKey(true);
sentence = FromPigLatin(sentence);
Console.WriteLine(sentence);
}
}
I'm using asp.net/C# and I'm looking to create unique(?) uris for a small CMS system I am creating.
I am generating the uri segment from my articles title, so for example if the title is "My amazing article" the uri would be www.website.com/news/my-amazing-article
There are two parts to this. Firstly, which characters do you think I need to strip out? I am replacing spaces with "-" and I think I should strip out the "/" character too. Can you think of any more that might cause problems? "?" perhaps? Should I remove all non-alpha characters?
Second question, above I mentioned the uris MAY need to be unique. I was going to check the uri list before adding to ensure uniqueness, however I see stack overflow uses a number plus a uri. This I assume allows titles to be duplicated? Do you think this would be a better way?
Transform all diacritics into their base character and then strip anything that is not a letter or a digit using Char.IsLetterOrDigit.
Then replace all spaces by a single dash.
This is what we use in our software.
/// <summary>
/// Convert a name into a string that can be appended to a Uri.
/// </summary>
private static string EscapeName(string name)
{
if (!string.IsNullOrEmpty(name))
{
name = NormalizeString(name);
// Replaces all non-alphanumeric character by a space
StringBuilder builder = new StringBuilder();
for (int i = 0; i < name.Length; i++)
{
builder.Append(char.IsLetterOrDigit(name[i]) ? name[i] : ' ');
}
name = builder.ToString();
// Replace multiple spaces into a single dash
name = Regex.Replace(name, @"[ ]{1,}", @"-", RegexOptions.None);
}
return name;
}
/// <summary>
/// Strips the value of any non-English characters by replacing them with their English equivalents.
/// </summary>
/// <param name="value">The string to normalize.</param>
/// <returns>A string where all characters are part of the basic english ANSI encoding.</returns>
/// <seealso cref="http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net"/>
private static string NormalizeString(string value)
{
string normalizedFormD = value.Normalize(NormalizationForm.FormD);
StringBuilder builder = new StringBuilder();
for (int i = 0; i < normalizedFormD.Length; i++)
{
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(normalizedFormD[i]);
if (uc != UnicodeCategory.NonSpacingMark)
{
builder.Append(normalizedFormD[i]);
}
}
return builder.ToString().Normalize(NormalizationForm.FormC);
}
Concerning using those generated names as unique IDs, I would advise against it. Use the generated name as an SEO helper, but not as a key resolver. If you look at how Stack Overflow references its pages:
http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
                                   ^--ID  ^--Unneeded name but helpful for bookmarks and SEO
You can find the ID there. These two URL point to the same page:
http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
http://stackoverflow.com/questions/249087/
You want to consult IETF RFC 3986, which describes URIs and what is legal and not legal.
Beyond validity, maybe you want a readable URI, as well. In that case eliminate all non-alphanumeric characters.
In stackoverflow, the title is changeable, hence the use of the ID for a unique yet unchanging distinguisher for the URI. If you don't have changeable titles, then you should be ok just using the text. If you can edit titles after publication, then an id may be preferable.
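A minimal sketch of the ID-plus-slug scheme, assuming a "/news/{id}/{slug}" layout like the one in the question. The ArticleUrls class and both method names are hypothetical and not part of any framework:

```csharp
using System;

public static class ArticleUrls
{
    // Builds e.g. "/news/42/my-amazing-article"; the slug is only there for readers and SEO.
    public static string Build(int id, string slug)
    {
        return "/news/" + id + "/" + slug;
    }

    // Reads the ID back and ignores whatever slug follows it,
    // so a renamed or missing title still resolves to the same article.
    public static bool TryGetId(string path, out int id)
    {
        id = 0;
        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return segments.Length >= 2
            && segments[0] == "news"
            && int.TryParse(segments[1], out id);
    }
}
```

Since only the ID is used for lookup, two articles can share a title and a title can be edited after publication without breaking existing links.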
For question 1: Rob Conery has a pretty useful Regex-based solution to cleaning strings for slug-generation. Here's the extension method (just add this to a static class):
public static string CreateSlug(this string source)
{
var regex = new Regex(@"([^a-z0-9\-]?)");
var slug = "";
if (!string.IsNullOrEmpty(source))
{
slug = source.Trim().ToLower();
slug = slug.Replace(' ', '-');
slug = slug.Replace("---", "-");
slug = slug.Replace("--", "-");
if (regex != null)
slug = regex.Replace(slug, "");
if (slug.Length * 2 < source.Length)
return "";
if (slug.Length > 100)
slug = slug.Substring(0, 100);
}
return slug;
}
For question 2, you could just place a UNIQUE constraint on the column in the database if you want them to be unique. This will allow you to trap the exception and provide useful user input. If you don't like that, then relying on the post identifier is probably a good alternative.
I could write my own algorithm to do it, but I feel there should be the equivalent to ruby's humanize in C#.
I googled it but only found ways to humanize dates.
Examples:
A way to turn "Lorem Lipsum Et" into "Lorem lipsum et"
A way to turn "Lorem lipsum et" into "Lorem Lipsum Et"
As discussed in the comments of @miguel's answer, you can use TextInfo.ToTitleCase which has been available since .NET 1.1. Here is some code corresponding to your example:
string lipsum1 = "Lorem lipsum et";
// Creates a TextInfo based on the "en-US" culture.
TextInfo textInfo = new CultureInfo("en-US",false).TextInfo;
// Changes a string to titlecase.
Console.WriteLine("\"{0}\" to titlecase: {1}",
lipsum1,
textInfo.ToTitleCase( lipsum1 ));
// Will output: "Lorem lipsum et" to titlecase: Lorem Lipsum Et
It leaves words that are all caps, such as "LOREM LIPSUM ET", unchanged, because it treats them as acronyms, so that "IEEE" (Institute of Electrical and Electronics Engineers) won't become "ieee" or "Ieee".
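A short sketch of that behaviour and the usual workaround of lowercasing first. It uses the invariant culture rather than "en-US" purely to keep the example independent of installed cultures, and the TitleCaseDemo class with its two methods is invented for illustration:

```csharp
using System;
using System.Globalization;

public static class TitleCaseDemo
{
    private static readonly TextInfo Info = CultureInfo.InvariantCulture.TextInfo;

    // Words that are entirely uppercase are treated as acronyms and left alone.
    public static string AsIs(string s) { return Info.ToTitleCase(s); }

    // Lowercasing first makes all-caps input title-cased too, at the cost of real acronyms.
    public static string LowerFirst(string s) { return Info.ToTitleCase(Info.ToLower(s)); }
}
```

The trade-off is that LowerFirst can no longer tell "IEEE" from an ordinary shouted word, so it will turn it into "Ieee".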
However, if you only want to capitalize the first character, you can use the solution that is over here... or you could just split the string and capitalize the first word in the list:
string lipsum2 = "Lorem Lipsum Et";
string lipsum2lower = textInfo.ToLower(lipsum2);
string[] lipsum2split = lipsum2lower.Split(' ');
bool first = true;
foreach (string s in lipsum2split)
{
if (first)
{
Console.Write("{0} ", textInfo.ToTitleCase(s));
first = false;
}
else
{
Console.Write("{0} ", s);
}
}
// Will output: Lorem lipsum et
There is another elegant solution:
Define the function ToTitleCase in a static class of your project:
using System.Globalization;
public static string ToTitleCase(this string title)
{
return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(title.ToLower());
}
And then use it like a string extension anywhere on your project:
"have a good day !".ToTitleCase() // "Have A Good Day !"
Using regular expressions for this looks much cleaner:
string s = "the quick brown fox jumps over the lazy dog";
s = Regex.Replace(s, @"(^\w)|(\s\w)", m => m.Value.ToUpper());
All the examples seem to make the other characters lowered first which isn't what I needed.
customerName = CustomerName <-- Which is what I wanted
this is an example = This Is An Example
public static string ToUpperEveryWord(this string s)
{
// Check for empty string.
if (string.IsNullOrEmpty(s))
{
return string.Empty;
}
var words = s.Split(' ');
var t = "";
foreach (var word in words)
{
t += char.ToUpper(word[0]) + word.Substring(1) + ' ';
}
return t.Trim();
}
If you just want to capitalize the first character, just stick this in a utility method of your own:
return string.IsNullOrEmpty(str)
? str
: char.ToUpperInvariant(str[0]) + str.Substring(1).ToLowerInvariant();
There's also a library method to capitalize the first character of every word:
http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase.aspx
CSS technique is ok but only changes the presentation of the string in the browser. A better method is to make the text itself capitalised before sending to browser.
Most of the above implementations are OK, but none of them address the issue of what happens if you have mixed case words that need to be preserved, or if you want to use true Title Case, for example:
"Where to Study PHd Courses in the USA"
or
"IRS Form UB40a"
Also using CultureInfo.CurrentCulture.TextInfo.ToTitleCase(string) preserves upper case words as in
"sports and MLB baseball" which becomes "Sports And MLB Baseball" but if the whole string is put in upper case, then this causes an issue.
So I put together a simple function that allows you to keep the capital and mixed case words and make small words lower case (if they are not at the start and end of the phrase) by including them in a specialCases and lowerCases string arrays:
public static string TitleCase(string value) {
string titleString = ""; // destination string, this will be returned by function
if (!String.IsNullOrEmpty(value)) {
string[] lowerCases = new string[12] { "of", "the", "in", "a", "an", "to", "and", "at", "from", "by", "on", "or"}; // list of lower case words that should only be capitalised at start and end of title
string[] specialCases = new string[7] { "UK", "USA", "IRS", "UCLA", "PHd", "UB40a", "MSc" }; // list of words that need capitalisation preserved at any point in title
string[] words = value.ToLower().Split(' ');
bool wordAdded = false; // flag to confirm whether this word appears in special case list
int counter = 1;
foreach (string s in words) {
// check if word appears in lower case list
foreach (string lcWord in lowerCases) {
if (s.ToLower() == lcWord) {
// if lower case word is the first or last word of the title then it still needs capital so skip this bit.
if (counter == 1 || counter == words.Length) { break; } // counter is 1-based, so 1 is the first word
titleString += lcWord;
wordAdded = true;
break;
}
}
// check if word appears in special case list
foreach (string scWord in specialCases) {
if (s.ToUpper() == scWord.ToUpper()) {
titleString += scWord;
wordAdded = true;
break;
}
}
if (!wordAdded) { // word does not appear in special cases or lower cases, so capitalise first letter and add to destination string
titleString += char.ToUpper(s[0]) + s.Substring(1).ToLower();
}
wordAdded = false;
if (counter < words.Length) {
titleString += " "; //dont forget to add spaces back in again!
}
counter++;
}
}
return titleString;
}
This is just a quick and simple method - and can probably be improved a bit if you want to spend more time on it.
If you want to capitalise smaller words like "a" and "of" as well, just remove them from the lowerCases string array. Different organisations have different rules on capitalisation.
You can see an example of this code in action on this site: Egg Donation London - this site automatically creates breadcrumb trails at the top of the pages by parsing the url eg "/services/uk-egg-bank/introduction" - then each folder name in the trail has hyphens replaced with spaces and capitalises the folder name, so uk-egg-bank becomes UK Egg Bank. (preserving the upper case 'UK')
An extension of this code could be to have a lookup table of acronyms and uppercase/lowercase words in a shared text file, database table or web service so that the list of mixed case words can be maintained from one single place and apply to many different applications that rely on the function.
There is no prebuilt solution for proper linguistic capitalization in .NET. What kind of capitalization are you going for? Are you following the Chicago Manual of Style conventions? AMA or MLA? Even plain English sentence capitalization has thousands of special exceptions for words. I can't speak to what Ruby's humanize does, but I imagine it doesn't follow linguistic rules of capitalization and instead does something much simpler.
Internally, we encountered this same issue and had to write a fairly large amount of code just to handle proper (in our little world) casing of article titles, not even accounting for sentence capitalization. And it does indeed get "fuzzy" :)
It really depends on what you need - why are you trying to convert the sentences to proper capitalization (and in what context)?
I have achieved the same using custom extension methods. For the first letter of the first substring, use the method yourString.ToFirstLetterUpper(). For the first letter of every substring, excluding articles and some prepositions, use the method yourString.ToAllFirstLetterInUpper(). Below is a console program:
class Program
{
static void Main(string[] args)
{
Console.WriteLine("this is my string".ToAllFirstLetterInUpper());
Console.WriteLine("uniVersity of lonDon".ToAllFirstLetterInUpper());
}
}
public static class StringExtension
{
public static string ToAllFirstLetterInUpper(this string str)
{
var array = str.Split(" ");
for (int i = 0; i < array.Length; i++)
{
if (array[i] == "" || array[i] == " " || listOfArticles_Prepositions().Contains(array[i])) continue;
array[i] = array[i].ToFirstLetterUpper();
}
return string.Join(" ", array);
}
private static string ToFirstLetterUpper(this string str)
{
return str?.First().ToString().ToUpper() + str?.Substring(1).ToLower();
}
private static string[] listOfArticles_Prepositions()
{
return new[]
{
"in","on","to","of","and","or","for","a","an","is"
};
}
}
OUTPUT
This is My String
University of London
Far as I know, there's not a way to do that without writing (or cribbing) code. C# nets (ha!) you upper, lower and title (what you have) cases:
http://support.microsoft.com/kb/312890/EN-US/
Is there a built-in mechanism in .NET to match patterns other than regular expressions? I'd like to match using UNIX-style (glob) wildcards (* = any number of any character).
I'd like to use this for an end-user facing control. I fear that permitting all RegEx capabilities will be very confusing.
I like my code a little more semantic, so I wrote this extension method:
using System.Text.RegularExpressions;
namespace Whatever
{
public static class StringExtensions
{
/// <summary>
/// Compares the string against a given pattern.
/// </summary>
/// <param name="str">The string.</param>
/// <param name="pattern">The pattern to match, where "*" means any sequence of characters, and "?" means any single character.</param>
/// <returns><c>true</c> if the string matches the given pattern; otherwise <c>false</c>.</returns>
public static bool Like(this string str, string pattern)
{
return new Regex(
"^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$",
RegexOptions.IgnoreCase | RegexOptions.Singleline
).IsMatch(str);
}
}
}
(change the namespace and/or copy the extension method to your own string extensions class)
Using this extension, you can write statements like this:
if (file.Name.Like("*.jpg")) // file is e.g. a FileInfo instance
{
....
}
Just sugar to make your code a little more legible :-)
Just for the sake of completeness: since 2016, .NET Core has had a NuGet package called Microsoft.Extensions.FileSystemGlobbing that supports advanced globbing paths.
One example is searching nested folder structures and files with wildcards, which is very common in web development scenarios:
wwwroot/app/**/*.module.js
wwwroot/app/**/*.js
This works somewhat similarly to how .gitignore files determine which files to exclude from source control.
I found the actual code for you:
Regex.Escape( wildcardExpression ).Replace( @"\*", ".*" ).Replace( @"\?", "." );
The 2- and 3-argument variants of the listing methods like GetFiles() and EnumerateDirectories() take a search string as their second argument that supports filename globbing, with both * and ?.
class GlobTestMain
{
static void Main(string[] args)
{
string[] exes = Directory.GetFiles(Environment.CurrentDirectory, "*.exe");
foreach (string file in exes)
{
Console.WriteLine(Path.GetFileName(file));
}
}
}
would yield
GlobTest.exe
GlobTest.vshost.exe
The docs state that there are some caveats with matching extensions. They also state that 8.3 file names are matched (which may be generated automatically behind the scenes), which can result in "duplicate" matches for some patterns.
The methods that support this are GetFiles(), GetDirectories(), and GetFileSystemEntries(). The Enumerate variants also support this.
If you want to avoid regular expressions this is a basic glob implementation:
public static class Globber
{
public static bool Glob(this string value, string pattern)
{
int pos = 0;
while (pattern.Length != pos)
{
switch (pattern[pos])
{
case '?':
break;
case '*':
for (int i = value.Length; i >= pos; i--)
{
if (Glob(value.Substring(i), pattern.Substring(pos + 1)))
{
return true;
}
}
return false;
default:
if (value.Length == pos || char.ToUpper(pattern[pos]) != char.ToUpper(value[pos]))
{
return false;
}
break;
}
pos++;
}
return value.Length == pos;
}
}
Use it like this:
Assert.IsTrue("text.txt".Glob("*.txt"));
If you use VB.NET, you can use the Like operator, which has glob-like syntax.
http://www.getdotnetcode.com/gdncstore/free/Articles/Intoduction%20to%20the%20VB%20NET%20Like%20Operator.htm
I have written a globbing library for .NETStandard, with tests and benchmarks. My goal was to produce a library for .NET, with minimal dependencies, that doesn't use Regex, and outperforms Regex.
You can find it here:
github.com/dazinator/DotNet.Glob
https://www.nuget.org/packages/DotNet.Glob/
I wrote a FileSelector class that does selection of files based on filenames. It also selects files based on time, size, and attributes. If you just want filename globbing then you express the name in forms like "*.txt" and similar. If you want the other parameters then you specify a boolean logic statement like "name = *.xls and ctime < 2009-01-01" - implying an .xls file created before January 1st 2009. You can also select based on the negative: "name != *.xls" means all files that are not xls.
Check it out.
Open source. Liberal license.
Free to use elsewhere.
Based on previous posts, I threw together a C# class:
using System;
using System.Text.RegularExpressions;
public class FileWildcard
{
Regex mRegex;
public FileWildcard(string wildcard)
{
string pattern = string.Format("^{0}$", Regex.Escape(wildcard)
.Replace(@"\*", ".*").Replace(@"\?", "."));
mRegex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
}
public bool IsMatch(string filenameToCompare)
{
return mRegex.IsMatch(filenameToCompare);
}
}
Using it would go something like this:
FileWildcard w = new FileWildcard("*.txt");
if (w.IsMatch("Doug.Txt"))
Console.WriteLine("We have a match");
The matching is NOT the same as the System.IO.Directory.GetFiles() method, so don't use them together.
From C# you can use .NET's LikeOperator.LikeString method. That's the backing implementation for VB's LIKE operator. It supports patterns using *, ?, #, [charlist], and [!charlist].
You can use the LikeString method from C# by adding a reference to the Microsoft.VisualBasic.dll assembly, which is included with every version of the .NET Framework. Then you invoke the LikeString method just like any other static .NET method:
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
...
bool isMatch = LikeOperator.LikeString("I love .NET!", "I love *", CompareMethod.Text);
// isMatch should be true.
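A few more patterns, as a sketch of the other wildcards listed above. The wrapper class LikeSketch and its Matches method are names I made up; only LikeOperator.LikeString itself comes from Microsoft.VisualBasic:

```csharp
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;

public static class LikeSketch
{
    // Thin wrapper around LikeString using case-insensitive (Text) comparison.
    public static bool Matches(string input, string pattern)
    {
        return LikeOperator.LikeString(input, pattern, CompareMethod.Text);
    }
}

// LikeSketch.Matches("Doug.Txt", "*.txt")          // true:  * matches any run of characters
// LikeSketch.Matches("report7.txt", "report#.txt") // true:  # matches a single digit
// LikeSketch.Matches("reportA.txt", "report#.txt") // false: 'A' is not a digit
// LikeSketch.Matches("bat", "[bc]at")              // true:  [charlist] matches one listed character
// LikeSketch.Matches("cat", "[!c]at")              // false: [!charlist] excludes 'c'
```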
https://www.nuget.org/packages/Glob.cs
https://github.com/mganss/Glob.cs
A GNU Glob for .NET.
You can get rid of the package reference after installing and just compile the single Glob.cs source file.
And since it's an implementation of GNU Glob, it's cross-platform and cross-language once you find another similar implementation. Enjoy!
I don't know if the .NET framework has glob matching, but couldn't you replace the * with .*? and use regexes?
Just out of curiosity I glanced into Microsoft.Extensions.FileSystemGlobbing, and it was dragging in quite large dependencies on quite a few libraries, so I wondered why I couldn't try to write something similar myself.
Well, easier said than done. I quickly noticed that it was not such a trivial function after all. For example, "*.txt" should match files only in the current directory, while "**.txt" should also harvest subfolders.
Microsoft also tests some odd matching pattern sequences like "./*.txt". I'm not sure who actually needs the "./" kind of string, since they are removed anyway while processing.
(https://github.com/aspnet/FileSystem/blob/dev/test/Microsoft.Extensions.FileSystemGlobbing.Tests/PatternMatchingTests.cs)
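To illustrate the difference in scope between those two patterns using only the base class library, here is a small sketch built on SearchOption. It is not how Microsoft.Extensions.FileSystemGlobbing works internally, and the class and method names are made up for the example:

```csharp
using System.IO;
using System.Linq;

public static class SearchDepthSketch
{
    // Roughly what "*.txt" should mean: .txt files directly inside dir only.
    public static string[] TopLevelTxt(string dir)
    {
        return Directory.EnumerateFiles(dir, "*.txt", SearchOption.TopDirectoryOnly)
            .Select(f => Path.GetFileName(f))
            .ToArray();
    }

    // Roughly what "**.txt" should mean: .txt files in dir and in every subfolder.
    public static string[] AllTxt(string dir)
    {
        return Directory.EnumerateFiles(dir, "*.txt", SearchOption.AllDirectories)
            .Select(f => Path.GetFileName(f))
            .ToArray();
    }
}
```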
Anyway, I've coded my own function. There will be two copies of it: one in svn (I might bugfix it later on), and one copied here for demo purposes. I recommend copy-pasting from the svn link.
SVN Link:
https://sourceforge.net/p/syncproj/code/HEAD/tree/SolutionProjectBuilder.cs#l800
(Search for matchFiles function if not jumped correctly).
And here is a local copy of the function:
/// <summary>
/// Matches files from folder _dir using glob file pattern.
/// In glob file pattern matching * reflects to any file or folder name, ** refers to any path (including sub-folders).
/// ? refers to any character.
///
/// There exists also 3-rd party library for performing similar matching - 'Microsoft.Extensions.FileSystemGlobbing'
/// but it was dragging a lot of dependencies, I've decided to survive without it.
/// </summary>
/// <returns>List of files matches your selection</returns>
static public String[] matchFiles( String _dir, String filePattern )
{
if (filePattern.IndexOfAny(new char[] { '*', '?' }) == -1) // Speed up matching: with no asterisk / wildcard, the pattern is simply a file path.
{
String path = Path.Combine(_dir, filePattern);
if (File.Exists(path))
return new String[] { filePattern };
return new String[] { };
}
String dir = Path.GetFullPath(_dir); // Make it absolute, just so we can extract relative paths later on.
String[] pattParts = filePattern.Replace("/", "\\").Split('\\');
List<String> scanDirs = new List<string>();
scanDirs.Add(dir);
//
// By default glob pattern matching specifies "*" to any file / folder name,
// which corresponds to any character except folder separator - in regex that's "[^\\]*"
// glob matching also allows a double asterisk "**", which also recurses into subfolders.
// We split here each part of match pattern and match it separately.
//
for (int iPatt = 0; iPatt < pattParts.Length; iPatt++)
{
bool bIsLast = iPatt == (pattParts.Length - 1);
bool bRecurse = false;
String regex1 = Regex.Escape(pattParts[iPatt]); // Escape special regex control characters ("*" => "\*", "." => "\.")
String pattern = Regex.Replace(regex1, @"\\\*(\\\*)?", delegate (Match m)
{
if (m.ToString().Length == 4) // "**" => "\*\*" (escaped) - we need to recurse into sub-folders.
{
bRecurse = true;
return ".*";
}
else
return @"[^\\]*";
}).Replace(@"\?", ".");
if (pattParts[iPatt] == "..") // Special kind of control, just to scan upper folder.
{
for (int i = 0; i < scanDirs.Count; i++)
scanDirs[i] = scanDirs[i] + "\\..";
continue;
}
// Anchor the pattern so that e.g. "*.txt" does not also match "a.txt.bak".
Regex re = new Regex("^" + pattern + "$", RegexOptions.Compiled | RegexOptions.IgnoreCase);
int nScanItems = scanDirs.Count;
for (int i = 0; i < nScanItems; i++)
{
String[] items;
if (!bIsLast)
items = Directory.GetDirectories(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
else
items = Directory.GetFiles(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
foreach (String path in items)
{
String matchSubPath = path.Substring(scanDirs[i].Length + 1);
if (re.Match(matchSubPath).Success)
scanDirs.Add(path);
}
}
scanDirs.RemoveRange(0, nScanItems); // Remove items what we have just scanned.
} //for
// Make relative and return.
return scanDirs.Select( x => x.Substring(dir.Length + 1) ).ToArray();
} //matchFiles
If you find any bugs, I'll be glad to fix them.
I wrote a solution that does it. It does not depend on any library and it does not support "!" or "[]" operators. It supports the following search patterns:
C:\Logs\*.txt
C:\Logs\**\*P1?\**\asd*.pdf
/// <summary>
/// Finds files for the given glob path. It supports ** * and ? operators. It does not support !, [] or ![] operators
/// </summary>
/// <param name="path">the path</param>
/// <returns>The files that match the glob</returns>
private ICollection<FileInfo> FindFiles(string path)
{
List<FileInfo> result = new List<FileInfo>();
//The name of the file can be any but the following chars '<','>',':','/','\','|','?','*','"'
const string folderNameCharRegExp = @"[^\<\>:/\\\|\?\*" + "\"]";
const string folderNameRegExp = folderNameCharRegExp + "+";
//We obtain the file pattern
string filePattern = Path.GetFileName(path);
List<string> pathTokens = new List<string>(Path.GetDirectoryName(path).Split('\\', '/'));
//We obtain the root path from where the rest of files will obtained
string rootPath = null;
bool containsWildcardsInDirectories = false;
for (int i = 0; i < pathTokens.Count; i++)
{
if (!pathTokens[i].Contains("*")
&& !pathTokens[i].Contains("?"))
{
if (rootPath != null)
rootPath += "\\" + pathTokens[i];
else
rootPath = pathTokens[i];
pathTokens.RemoveAt(0);
i--;
}
else
{
containsWildcardsInDirectories = true;
break;
}
}
if (Directory.Exists(rootPath))
{
//We build the regular expression that the folders should match
string regularExpression = rootPath.Replace("\\", "\\\\").Replace(":", "\\:").Replace(" ", "\\s");
foreach (string pathToken in pathTokens)
{
if (pathToken == "**")
{
regularExpression += string.Format(CultureInfo.InvariantCulture, @"(\\{0})*", folderNameRegExp);
}
else
{
regularExpression += @"\\" + pathToken.Replace("*", folderNameCharRegExp + "*").Replace(" ", "\\s").Replace("?", folderNameCharRegExp);
}
}
Regex globRegEx = new Regex(regularExpression, RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
string[] directories = Directory.GetDirectories(rootPath, "*", containsWildcardsInDirectories ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
foreach (string directory in directories)
{
if (globRegEx.Matches(directory).Count > 0)
{
DirectoryInfo directoryInfo = new DirectoryInfo(directory);
result.AddRange(directoryInfo.GetFiles(filePattern));
}
}
}
return result;
}
Unfortunately the accepted answer will not handle escaped input correctly, because the string replacement .Replace("\*", ".*") fails to distinguish between "*" and "\*". It will happily replace the "*" in both of these strings, leading to incorrect results.
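A small demonstration of the problem, assuming the Escape-then-Replace conversion shown in the earlier answers (the helper name NaiveGlobToRegex is mine, not from any library):

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveGlobDemo
{
    // The Escape-then-Replace conversion from the earlier answers.
    public static string NaiveGlobToRegex(string glob)
    {
        return "^" + Regex.Escape(glob).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
    }

    public static void Run()
    {
        // Intended meaning of the glob: a literal asterisk followed by ".txt".
        string regex = NaiveGlobToRegex(@"\*.txt");
        Console.WriteLine(regex);                                   // ^\\.*\.txt$
        Console.WriteLine(Regex.IsMatch("*.txt", regex));           // False
        Console.WriteLine(Regex.IsMatch(@"\anything.txt", regex));  // True
    }
}
```

The escaped asterisk ends up as "a literal backslash followed by anything", so the file actually named "*.txt" is rejected.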
Instead, a basic tokenizer can be used to convert the glob path into a regex pattern, which can then be matched against a filename using Regex.Match. This is a more robust and flexible solution.
Here is a method to do this. It handles ?, *, and **, and surrounds each of these globs with a capture group, so the values of each glob can be inspected after the Regex has been matched.
static string GlobbedPathToRegex(ReadOnlySpan<char> pattern, ReadOnlySpan<char> dirSeparatorChars)
{
StringBuilder builder = new StringBuilder();
builder.Append('^');
ReadOnlySpan<char> remainder = pattern;
while (remainder.Length > 0)
{
int specialCharIndex = remainder.IndexOfAny('*', '?');
if (specialCharIndex >= 0)
{
ReadOnlySpan<char> segment = remainder.Slice(0, specialCharIndex);
if (segment.Length > 0)
{
string escapedSegment = Regex.Escape(segment.ToString());
builder.Append(escapedSegment);
}
char currentCharacter = remainder[specialCharIndex];
char nextCharacter = specialCharIndex < remainder.Length - 1 ? remainder[specialCharIndex + 1] : '\0';
switch (currentCharacter)
{
case '*':
if (nextCharacter == '*')
{
// We have a ** glob expression
// Match any character, 0 or more times.
builder.Append("(.*)");
// Skip over **
remainder = remainder.Slice(specialCharIndex + 2);
}
else
{
// We have a * glob expression
// Match any character that isn't a dirSeparatorChar, 0 or more times.
if(dirSeparatorChars.Length > 0) {
builder.Append($"([^{Regex.Escape(dirSeparatorChars.ToString())}]*)");
}
else {
builder.Append("(.*)");
}
// Skip over *
remainder = remainder.Slice(specialCharIndex + 1);
}
break;
case '?':
builder.Append("(.)"); // Regex equivalent of ?
// Skip over ?
remainder = remainder.Slice(specialCharIndex + 1);
break;
}
}
else
{
// No more special characters, append the rest of the string
string escapedSegment = Regex.Escape(remainder.ToString());
builder.Append(escapedSegment);
remainder = ReadOnlySpan<char>.Empty;
}
}
builder.Append('$');
return builder.ToString();
}
Then, to use it:
string testGlobPathInput = "/Hello/Test/Blah/**/test*123.fil?";
string globPathRegex = GlobbedPathToRegex(testGlobPathInput, "/"); // Could use "\\/" directory separator chars on Windows
Console.WriteLine($"Globbed path: {testGlobPathInput}");
Console.WriteLine($"Regex conversion: {globPathRegex}");
string testPath = "/Hello/Test/Blah/All/Hail/The/Hypnotoad/test_somestuff_123.file";
Console.WriteLine($"Test Path: {testPath}");
var regexGlobPathMatch = Regex.Match(testPath, globPathRegex);
Console.WriteLine($"Match: {regexGlobPathMatch.Success}");
for(int i = 0; i < regexGlobPathMatch.Groups.Count; i++) {
Console.WriteLine($"Group [{i}]: {regexGlobPathMatch.Groups[i]}");
}
Output:
Globbed path: /Hello/Test/Blah/**/test*123.fil?
Regex conversion: ^/Hello/Test/Blah/(.*)/test([^/]*)123\.fil(.)$
Test Path: /Hello/Test/Blah/All/Hail/The/Hypnotoad/test_somestuff_123.file
Match: True
Group [0]: /Hello/Test/Blah/All/Hail/The/Hypnotoad/test_somestuff_123.file
Group [1]: All/Hail/The/Hypnotoad
Group [2]: _somestuff_
Group [3]: e
I have created a gist here as a canonical version of this method:
https://gist.github.com/crozone/9a10156a37c978e098e43d800c6141ad