Check if an uppercase Letter is inside a string - c#

I am currently learning C# and RegEx. I am working on a small word crawler. I get a big list of words and filter out those that don't match my RegEx.
Here's my code:
var WordRegex = new Regex("^[a-zA-Z]{4,}$", RegexOptions.Compiled | RegexOptions.CultureInvariant);
var secondRegex = new Regex("([A-Z]{1})");
var words = new List<string>();
var finalList = new List<string>();
foreach (var word in words)
{
if (WordRegex.IsMatch(word) && secondRegex.Matches(word).Count == 1 || secondRegex.Matches(word).Count == 0)
{
finalList.Add(word);
}
}
So this works fine if the word is 'McLaren' (two uppercase letters): it won't add it to finalList. But if the word is something like 'stackOverflow' (one uppercase letter, but not at the start of the string), it does add it to finalList. Is there any simple way to prevent this problem?
PS: if there is any better way than RegEx let me know!
Here are some examples:
("McLaren");//false
("Nissan");//true
("BMW");//false
("Subaru");//true
("Maserati");//true
("Mercedes Benz");//false
("Volkswagen");//true
("audi");//true
("Alfa Romeo");//false
("rollsRoyce");//false
("drive");//true
Those marked true should be accepted and the others shouldn't.
What I want to achieve is that the regex shouldn't add a word when it's written like 'rollsRoyce', but if it's written like 'Rollsroyce' or 'RollsRoyce' it should be accepted. So I have to check if there are uppercase letters inside the string.

If you want to check if the string contains an upper letter - this would be my approach
string sValue = "stackOverflow";
bool result = !sValue.Any(x => char.IsUpper(x));
Update for the edited question
string sValue = "stackOverflow";
bool result = sValue.Skip(1).Any(char.IsUpper);
This skips the first character and determines whether the rest of the string contains at least one uppercase letter.
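A minimal sketch of the "ignore the first character, then look for an uppercase letter" idea, tried against a few of the words from the question. The class and method names (`UppercaseCheck`, `HasInnerUppercase`) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Linq;

static class UppercaseCheck
{
    // True when any character after the first one is an uppercase letter.
    public static bool HasInnerUppercase(string s) =>
        s.Skip(1).Any(char.IsUpper);

    static void Main()
    {
        foreach (var w in new[] { "stackOverflow", "Nissan", "audi", "McLaren" })
            Console.WriteLine($"{w}: {HasInnerUppercase(w)}");
    }
}
```

A word would then be kept only when `HasInnerUppercase` returns false, in addition to whatever length and letters-only check is already in place.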

There is a very easy solution without regex or linq:
bool hasUppercase = !str.Equals(str.ToLower());
Now you can easily check:
if(!hasUppercase) {
// no uppercase letter
}
else {
// there is an uppercase letter
}
Just checking if the string is equal to its lowercased self.

Related

Determine if string is made up of characters from a different string (Scrabble-like program)

I am writing a program in C# that goes through a list of words and determines if they can be made up by a string that a user input. Just like the Scrabble game.
For example, when the user inputs the string "vacation", my program is supposed to go through a list of words that I already have and should return true when it gets to words like "cat". So it doesn't necessarily have to use ALL the letters.
Another example could be the word "overflow", it should return true with words like "over", "flow", "low", "lover". If the input word has repeating characters by N times, the word that matches can also have that letter up to N times but no more.
I currently have something like this:
var desiredChars = "ent";
var word = "element";
bool contains = desiredChars.All(word.Contains);
However, this checks if it contains all of the letters. I want to check if it contains ONLY those letters or less but ONLY those that can be made up with letters that the user passed.
If it wasn't for the issue of possible multiple letters (for "overflow", the word "fool" is a match, but "wow" isn't, because there aren't two w characters in the letter set), this Linq code would work
string letters = "overflow";
string word = "lover";
bool match = !word.Except(letters).Any(); // unfortunately, not sufficient
So, to handle the multiple letter issue, something like this is needed:
var letterChars = letters.ToList();
bool match = word.All(i => letterChars.Remove(i));
Here, we return true only if all the letters in the word can successfully be removed from the set of letters. Note that you only need to check those words in your dictionary that start with one of the letters in your letter set.
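As an alternative to removing letters from a list, the same rule (each letter may be used at most as often as it appears in the letter set) can be sketched with a dictionary of counts. This is only an illustration; the names `LetterBag` and `CanBuild` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class LetterBag
{
    // True when every character of 'word' can be taken from 'letters',
    // using each available letter at most once.
    public static bool CanBuild(string letters, string word)
    {
        // Count how many times each letter is available.
        var available = new Dictionary<char, int>();
        foreach (var c in letters)
        {
            available.TryGetValue(c, out var n);
            available[c] = n + 1;
        }

        // Spend one available letter per character of the word.
        foreach (var c in word)
        {
            if (!available.TryGetValue(c, out var left) || left == 0)
                return false;
            available[c] = left - 1;
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(CanBuild("overflow", "lover")); // True
        Console.WriteLine(CanBuild("overflow", "fool"));  // True
        Console.WriteLine(CanBuild("overflow", "wow"));   // False
    }
}
```

The counting version avoids the linear search that `List<char>.Remove` performs for each character, which may matter when checking a large dictionary of words.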
That worked for your example:
public static bool IsWordPartOfString(string startString, string word)
{
var tempTable = startString.ToArray();
foreach (var c in word)
{
var index = Array.FindIndex(tempTable, myChar => myChar == c);
if (index == -1)
{
return false;
}
tempTable[index] = ' ';
}
return true;
}
Steps:
1) Convert startString into an array
2) Iterate chars of the tested word
3) If char not found in startString return false
4) If char found in startString find it in the tempTable and remove so it cannot be reused (to prevent scenario when startString has only one occurrence of a letter but the test word has multiple)
5) If possible to iterate through the whole word it means it all can be constructed from the letters in initial string so return true.

Get only Whole Words from a .Contains() statement

I've used .Contains() to find if a sentence contains a specific word however I found something weird:
I wanted to find if the word "hi" was present in a sentence which are as follows:
The child wanted to play in the mud
Hi there
Hector had a hip problem
if(sentence.Contains("hi"))
{
//
}
I only want the SECOND sentence to be filtered however all 3 gets filtered since CHILD has a 'hi' in it and hip has a 'hi' in it. How do I use the .Contains() such that only whole words get picked out?
Try using Regex:
if (Regex.Match(sentence, @"\bhi\b", RegexOptions.IgnoreCase).Success)
{
//
}
This works just fine for me on your input text.
Here's a Regex solution:
Regex has a Word Boundary Anchor using \b
Also, if the search string might come from user input, you might consider escaping the string using Regex.Escape
This example should filter a list of strings the way you want.
string findme = "hi";
string pattern = @"\b" + Regex.Escape(findme) + @"\b";
Regex re = new Regex(pattern,RegexOptions.IgnoreCase);
List<string> data = new List<string> {
"The child wanted to play in the mud",
"Hi there",
"Hector had a hip problem"
};
var filtered = data.Where(d => re.IsMatch(d));
DotNetFiddle Example
You could split your sentence into words - you could split at each space and then trim any punctuation. Then check if any of these words are 'hi':
var punctuation = sentence.Where(Char.IsPunctuation).Distinct().ToArray();
var words = sentence.Split().Select(x => x.Trim(punctuation));
var containsHi = words.Contains("hi", StringComparer.OrdinalIgnoreCase);
See a working demo here: https://dotnetfiddle.net/AomXWx
You could write your own extension method for string like:
static class StringExtension
{
public static bool ContainsWord(this string s, string word)
{
string[] ar = s.Split(' ');
foreach (string str in ar)
{
if (str.ToLower() == word.ToLower())
return true;
}
return false;
}
}

c# trying to change first letter to uppercase but doesn't work

I have to convert the first letter of every word the user inputs into uppercase. I don't think I'm doing it right so it doesn't work but I'm not sure where has gone wrong D: Thank you in advance for your help! ^^
static void Main(string[] args)
{
Console.Write("Enter anything: ");
string x = Console.ReadLine();
string pattern = "^";
Regex expression = new Regex(pattern);
var regexp = new System.Text.RegularExpressions.Regex(pattern);
Match result = expression.Match(x);
Console.WriteLine(x);
foreach(var match in x)
{
Console.Write(match);
}
Console.WriteLine();
}
If your exercise isn't regex operations, there are built-in utilities to do what you are asking:
System.Globalization.TextInfo ti = System.Globalization.CultureInfo.CurrentCulture.TextInfo;
string titleString = ti.ToTitleCase("this string will be title cased");
Console.WriteLine(titleString);
Prints:
This String Will Be Title Cased
If your exercise is about regex, see this previous StackOverflow answer: Sublime Text: Regex to convert Uppercase to Title Case?
First of all, your Regex "^" matches the start of a line. If you need to match each word in a multi-word line, you'll need a different Regex, e.g. "[A-Za-z]".
You're also not doing anything to actually change the first letter to upper case. Note that strings in C# are immutable (they cannot be changed after creation), so you will need to create a new string which consists of the first letter of the original string, upper cased, followed by the rest of the string. Give that part a try on your own. If you have trouble, post a new question with your attempt.
string pattern = "(?:^|(?<= ))(.)";
^ doesn't capture anything by itself. You can replace the captured character with its uppercase form by applying a function to $1. See demo.
https://regex101.com/r/uE3cC4/29
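In C#, "applying a function to $1" can be done by passing a `MatchEvaluator` lambda to `Regex.Replace`. The following is a rough sketch using the pattern above; the names `RegexTitleCase` and `CapitalizeWords` are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexTitleCase
{
    // Upper-cases the character at the start of the string and each
    // character that directly follows a space.
    public static string CapitalizeWords(string input) =>
        Regex.Replace(input, @"(?:^|(?<= ))(.)", m => m.Groups[1].Value.ToUpper());

    static void Main()
    {
        Console.WriteLine(CapitalizeWords("hello big world")); // Hello Big World
    }
}
```

Note that `ToUpper()` uses the current culture; `ToUpperInvariant()` may be preferable if the result must not depend on the machine's locale.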
I would approach this using extension methods.
PHP has a nice method called ucfirst.
So I translated that into C#
public static string UcFirst(this string s)
{
// Nothing to capitalize in a null or empty string
if (string.IsNullOrEmpty(s))
return s;
var stringArr = s.ToCharArray();
// Upper-case only the first character
stringArr[0] = char.ToUpper(stringArr[0]);
return new string(stringArr);
}
Usage:
[Test]
public void UcFirst()
{
string s = "john";
s = s.UcFirst();
Assert.AreEqual("John", s);
}
Obviously you would still have to split your sentence into a list and call UcFirst for each item in the list.
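The split-and-capitalize step could look roughly like the sketch below. To keep it self-contained it uses a small local helper, `UpperFirst`, as a stand-in for the UcFirst extension; both `UpperFirst` and `CapitalizeEachWord` are illustrative names, and splitting on single spaces is an assumption about the input.

```csharp
using System;
using System.Linq;

static class WordCapitalizer
{
    // Hypothetical stand-in for the UcFirst extension shown above.
    static string UpperFirst(string word) =>
        string.IsNullOrEmpty(word)
            ? word
            : char.ToUpper(word[0]) + word.Substring(1);

    // Splits on single spaces, capitalizes each piece, and joins them back.
    public static string CapitalizeEachWord(string sentence) =>
        string.Join(" ", sentence.Split(' ').Select(w => UpperFirst(w)));

    static void Main()
    {
        Console.WriteLine(CapitalizeEachWord("john paul jones")); // John Paul Jones
    }
}
```

Because `Split(' ')` keeps empty entries, runs of several spaces survive the round trip unchanged.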
Google C# extension methods if you need help with what is going on.
One more way to do it with regex:
string input = "this string will be title cased, even if there are.cases.like.that";
string output = Regex.Replace(input, @"(?<!\w)\w", m => m.Value.ToUpper());
I hope this may help
public static string CapsFirstLetter(string inputValue)
{
char[] values = new char[inputValue.Length];
int count = 0;
foreach (char f in inputValue){
if (count == 0){
values[count] = Convert.ToChar(f.ToString().ToUpper());
}
else{
values[count] = f;
}
count++;
}
return new string(values);
}

Using regex or string manipulation when creating permalinks

I have the following method (which looks expensive, too) for creating permalinks, but it lacks a few things that are quite important for a nice permalink:
public string createPermalink(string text)
{
text = text.ToLower().TrimStart().TrimEnd();
foreach (char c in text.ToCharArray())
{
if (!char.IsLetterOrDigit(c) && !char.IsWhiteSpace(c))
{
text = text.Replace(c.ToString(), "");
}
if (char.IsWhiteSpace(c))
{
text = text.Replace(c, '-');
}
}
if (text.Length > 200)
{
text = text.Remove(200);
}
return text;
}
A few things it is lacking:
If someone enters text like this:
"My choiches are:foo,bar" it gets returned as "my-choiches-arefoobar"
when it should be "my-choiches-are-foo-bar".
And if someone enters multiple whitespace characters, they get returned as "---", which is not nice to have in a URL.
Is there some better way to do this in regex(I really only used it few times)?
UPDATE:
Requirement was:
Any non-digit, non-letter chars at the beginning or end are not allowed
Any non-digit, non-letter chars should be replaced by "-"
The "-" replacements should not repeat, like "---"
And finally, truncating the string at index 200 to ensure it's not too long
Change to
public string createPermalink(string text)
{
text = text.ToLower();
StringBuilder sb = new StringBuilder(text.Length);
// We want to skip the first hyphenable characters and go to the "meat" of the string
bool lastHyphen = true;
// You can enumerate directly a string
foreach (char c in text)
{
if (char.IsLetterOrDigit(c))
{
sb.Append(c);
lastHyphen = false;
}
else if (!lastHyphen)
{
// We use lastHyphen to not put two hyphens consecutively
sb.Append('-');
lastHyphen = true;
}
if (sb.Length == 200)
{
break;
}
}
// Remove the last hyphen
if (sb.Length > 0 && sb[sb.Length - 1] == '-')
{
sb.Length--;
}
return sb.ToString();
}
If you really want to use regexes, you can do something like this (based on the code of Justin)
Regex rgx = new Regex(@"^\W+|\W+$");
Regex rgx2 = new Regex(@"\W+");
return rgx2.Replace(rgx.Replace(text.ToLower(), string.Empty), "-");
The first regex searches for non-word characters (1 or more) at the beginning (^) or at the end of the string ($) and removes them. The second one replaces one or more non-word characters with -.
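To make the two-step replacement concrete, here is a small sketch that wraps both regexes in one method and runs it on the input from the question. The names `PermalinkRegexDemo`, `Edges`, `Inner`, and `ToPermalink` are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class PermalinkRegexDemo
{
    // Non-word characters at the very start or very end of the string.
    static readonly Regex Edges = new Regex(@"^\W+|\W+$");
    // Any remaining run of non-word characters.
    static readonly Regex Inner = new Regex(@"\W+");

    public static string ToPermalink(string text) =>
        Inner.Replace(Edges.Replace(text.ToLower(), string.Empty), "-");

    static void Main()
    {
        Console.WriteLine(ToPermalink("My choiches are:foo,bar")); // my-choiches-are-foo-bar
        Console.WriteLine(ToPermalink("  Hello,   world!  "));     // hello-world
    }
}
```

One thing to keep in mind: `\W` does not match the underscore, so underscores pass through this version untouched.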
This should solve the problem that you have explained. Please let me know if it needs any further explanation.
Just as an FYI, the regex makes use of lookarounds to get it done in one run
//This will find any non-character word, lumping them in one group if more than 1
//It will ignore non-character words at the beginning or end of the string
Regex rgx = new Regex(@"(?!\W+$)\W+(?<!^\W+)");
//This will then replace those matches with a -
string result = rgx.Replace(input, "-");
To keep the string from going beyond 200 characters, you will have to use substring. If you do this before the regex, then you will be ok, but if you do it after, then you run the risk of having a trailing dash again, FYI.
example:
myString.Substring(0,200)
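Note that `Substring(0, 200)` throws an `ArgumentOutOfRangeException` when the string is shorter than 200 characters. If the cut has to happen after the replacement, one possible sketch is to guard the length and trim a dash left at the cut point; `PermalinkTruncation` and `Truncate` are hypothetical names.

```csharp
using System;

static class PermalinkTruncation
{
    // Cuts 'slug' to at most 'maxLength' characters and drops any dash
    // left dangling at the cut point.
    public static string Truncate(string slug, int maxLength)
    {
        var cut = slug.Length <= maxLength ? slug : slug.Substring(0, maxLength);
        return cut.TrimEnd('-');
    }

    static void Main()
    {
        Console.WriteLine(Truncate("my-choiches-are-foo-bar", 12)); // my-choiches
        Console.WriteLine(Truncate("short", 200));                  // short
    }
}
```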
I use an iterative approach for this - because in some cases you might want certain characters to be turned into words instead of having them turned into '-' characters - e.g. '&' -> 'and'.
But when you're done you'll also end up with a string that potentially contains multiple '-' - so you have a final regex that collapses all multiple '-' characters into one.
So I would suggest using an ordered list of regexes, and then run them all in order. This code is written to go in a static class that is then exposed as a single extension method for System.String - and is probably best merged into the System namespace.
I've hacked it from code I use, which had extensibility points (e.g. you could pass in a MatchEvaluator on construction of the replacement object for more intelligent replacements; and you could pass in your own IEnumerable of replacements, as the class was public), and therefore it might seem unnecessarily complicated - judging by the other answers I'm guessing everybody will think so (but I have specific requirements for the SEO of the strings that are created).
The list of replacements I use might not be exactly correct for your uses - if not, you can just add more.
private class SEOSymbolReplacement
{
private Regex _rx;
private string _replacementString;
public SEOSymbolReplacement(Regex r, string replacement)
{
//null-checks required.
_rx = r;
_replacementString = replacement;
}
public string Execute(string input)
{
//null-check required
return _rx.Replace(input, _replacementString);
}
}
private static readonly SEOSymbolReplacement[] Replacements = {
new SEOSymbolReplacement(new Regex(@"#", RegexOptions.Compiled), "Sharp"),
new SEOSymbolReplacement(new Regex(@"\+", RegexOptions.Compiled), "Plus"),
new SEOSymbolReplacement(new Regex(@"&", RegexOptions.Compiled), " And "),
new SEOSymbolReplacement(new Regex(@"[|:'\\/,_]", RegexOptions.Compiled), "-"),
new SEOSymbolReplacement(new Regex(@"\s+", RegexOptions.Compiled), "-"),
new SEOSymbolReplacement(new Regex(@"[^\p{L}\d-]",
RegexOptions.IgnoreCase | RegexOptions.Compiled), ""),
new SEOSymbolReplacement(new Regex(@"-{2,}", RegexOptions.Compiled), "-")};
/// <summary>
/// Transforms the string into an SEO-friendly string.
/// </summary>
/// <param name="str"></param>
public static string ToSEOPathString(this string str)
{
if (str == null)
return null;
string toReturn = str;
foreach (var replacement in Replacements)
{
toReturn = replacement.Execute(toReturn);
}
return toReturn;
}

Find all words without figures using RegEx

I found this code to get all words of a string,
static string[] GetWords(string input)
{
MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");
var words = from m in matches.Cast<Match>()
where !string.IsNullOrEmpty(m.Value)
select TrimSuffix(m.Value);
return words.ToArray();
}
static string TrimSuffix(string word)
{
int apostrapheLocation = word.IndexOf('\'');
if (apostrapheLocation != -1)
{
word = word.Substring(0, apostrapheLocation);
}
return word;
}
Please explain the code.
How can I get words without figures?
Regarding question 2, how to get words without figures:
You'll have to replace \w with [A-Za-z]
So that your RegEx becomes @"\b[A-Za-z']*\b"
And then you'll have to think about TrimSuffix(). The regEx allows apostrophes but TrimSuffix() will extract only the left part. So "it's" will become "it".
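A quick way to see the effect of the letters-only character class is to run it on a sentence that mixes words and figures. The sketch below makes two assumptions that differ from the original code: it uses `+` instead of `*` so that empty matches never occur, and it leaves out TrimSuffix so the raw matches are visible. `LetterWords` and `GetLetterWords` are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LetterWords
{
    // Returns the tokens made up only of ASCII letters and apostrophes.
    // Tokens that contain a digit, such as "cool4u", produce no match at all.
    public static string[] GetLetterWords(string input) =>
        Regex.Matches(input, @"\b[A-Za-z']+\b")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    static void Main()
    {
        Console.WriteLine(string.Join(", ", GetLetterWords("It's 2 late for cool4u words")));
        // It's, late, for, words
    }
}
```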
In
MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");
the code uses a regex that looks for any word: \b is a word boundary, and \w is the word-character class, which in .NET matches letters (with or without accents), digits, and the underscore; the ' is simply added to that set. So basically it finds the beginning and the end of each word and selects what lies between.
then
var words = from m in matches.Cast<Match>()
where !string.IsNullOrEmpty(m.Value)
select TrimSuffix(m.Value);
is LINQ query syntax, which lets you write SQL-like queries inside your code. It takes every match from the regex, skips the empty ones, and passes each remaining value through TrimSuffix. It is also where you can add your figure validation.
and This:
static string TrimSuffix(string word)
{
int apostrapheLocation = word.IndexOf('\'');
if (apostrapheLocation != -1)
{
word = word.Substring(0, apostrapheLocation);
}
return word;
}
removes the apostrophe and everything after it from words that contain one, keeping only the part before it;
e.g. for the word don't it returns only don.
