How can I search for an exact match in a string? For example, If I had a string with this text:
label
label:
labels
And I search for label, I only want to get the first match, not the other two. I tried the Contains and IndexOf method, but they also give me the 2nd and 3rd matches.
You can use a regular expression like this:
bool contains = Regex.IsMatch("Hello1 Hello2", #"(^|\s)Hello(\s|$)"); // yields false
bool contains = Regex.IsMatch("Hello1 Hello", #"(^|\s)Hello(\s|$)"); // yields true
The \b is a word boundary check, and used like above it will be able to match whole words only.
I think the regex version should be faster than Linq.
Reference
You can try to split the string (in this case the right separator can be the space but it depends by the case) and after you can use the equals method to see if there's the match e.g.:
private Boolean findString(String baseString,String strinfToFind, String separator)
{
foreach (String str in baseString.Split(separator.ToCharArray()))
{
if(str.Equals(strinfToFind))
{
return true;
}
}
return false;
}
And the use can be
findString("Label label Labels:", "label", " ");
It seems you've got a delimiter (crlf) between the words so you could include the delimiter as part of the search string.
If not then I'd go with Liviu's suggestion.
You could try a LINQ version:
string str = "Hello1 Hello Hello2";
string another = "Hello";
string retVal = str.Split(" \n\r".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.First( p => p .Equals(another));
Related
I have a list of words that I want to remove from a string I use the following method
string stringToClean = "The.Flash.2014.S07E06.720p.WEB-DL.HEVC.x265.RMTeam";
string[] BAD_WORDS = {
"720p", "web-dl", "hevc", "x265", "Rmteam", "."
};
var cleaned = string.Join(" ", stringToClean.Split(' ').Where(w => !BAD_WORDS.Contains(w, StringComparer.OrdinalIgnoreCase)));
but it is not working And the following text is output
The.Flash.2014.S07E06.720p.WEB-DL.HEVC.x265.RMTeam
For this it would be a good idea to create a reusable method that splits a string into words. I'll do this as an extension method of string. If you are not familiar with extension methods, read extension methods demystified
public static IEnumerable<string> ToWords(this string text)
{
// TODO implement
}
Usage will be as follows:
string text = "This is some wild text!"
List<string> words = text.ToWords().ToList();
var first3Words = text.ToWords().Take(3);
var lastWord = text.ToWords().LastOrDefault();
Once you've got this method, the solution to your problem will be easy:
IEnumerable<string> badWords = ...
string inputText = ...
IEnumerable<string> validWords = inputText.ToWords().Except(badWords);
Or maybe you want to use Except(badWords, StringComparer.OrdinalIgnoreCase);
The implementation of ToWords depends on what you would call a word: everything delimited by a dot? or do you want to support whitespaces? or maybe even new-lines?
The implementation for your problem: A word is any sequence of characters delimited by a dot.
public static IEnumerable<string> ToWords(this string text)
{
// find the next dot:
const char dot = '.';
int startIndex = 0;
int dotIndex = text.IndexOf(dot, startIndex);
while (dotIndex != -1)
{
// found a Dot, return the substring until the dot:
int wordLength = dotIndex - startIndex;
yield return text.Substring(startIndex, wordLength;
// find the next dot
startIndex = dotIndex + 1;
dotIndex = text.IndexOf(dot, startIndex);
}
// read until the end of the text. Return everything after the last dot:
yield return text.SubString(startIndex, text.Length);
}
TODO:
Decide what you want to return if text starts with a dot ".ABC.DEF".
Decide what you want to return if the text ends with a dot: "ABC.DEF."
Check if the return value is what you want if text is empty.
Your split/join don't match up with your input.
That said, here's a quick one-liner:
string clean = BAD_WORDS.Aggregate(stringToClean, (acc, word) => acc.Replace(word, string.Empty));
This is basically a "reduce". Not fantastically performant but over strings that are known to be decently small I'd consider it acceptable. If you have to use a really large string or a really large number of "words" you might look at another option but it should work for the example case you've given us.
Edit: The downside of this approach is that you'll get partials. So for example in your token array you have "720p" but the code I suggested here will still match on "720px" but there are still ways around it. For example instead of using string's implementation of Replace you could use a regex that will match your delimiters something like Regex.Replace(acc, $"[. ]{word}([. ])", "$1") (regex not confirmed but should be close and I added a capture for the delimiter in order to put it back for the next pass)
Let's say I have a string such as 123ad456. I want to make a method that separates the groups of numbers into a list, so then the output will be something like 123,456.
I've tried doing return Regex.Match(str, #"-?\d+").Value;, but that only outputs the first occurrence of a number, so the output would be 123. I also know I can use Regex.Matches, but from my understanding, that would output 123456, not separating the different groups of numbers.
I also see from this page on MSDN that Regex.Match has an overload that takes the string to find a match for and an int as an index at which to search for the match, but I don't see an overload that takes in the above in addition to a parameter for the regex pattern to search for, and the same goes for Regex.Matches.
I guess the approach to use would be to use a for loop of some sort, but I'm not entirely sure what to do. Help would be greatly appreciated.
All you have to to use Matches instead of Match. Then simply iterate over all matches:
string result = "";
foreach (Match match in Regex.Matches(str, #"-?\d+"))
{
result += match.result;
}
You may iterate over string data using foreach and use TryParse to check each character.
foreach (var item in stringData)
{
if (int.TryParse(item.ToString(), out int data))
{
// int data is contained in variable data
}
}
Using a combination of string.Join and Regex.Matches:
string result = string.Join(",", Regex.Matches(str, #"-?\d+").Select(m => m.Value));
string.Join performs better than continually appending to an existing string.
\d+ is the regex for integer numbers;
//System.Text.RegularExpressions.Regex
resultString = Regex.Match(subjectString, #"\d+").Value;
returns a string with the very first occurence of a number in subjectString.
Int32.Parse(resultString) will then give you the number.
I've used .Contains() to find if a sentence contains a specific word however I found something weird:
I wanted to find if the word "hi" was present in a sentence which are as follows:
The child wanted to play in the mud
Hi there
Hector had a hip problem
if(sentence.contains("hi"))
{
//
}
I only want the SECOND sentence to be filtered however all 3 gets filtered since CHILD has a 'hi' in it and hip has a 'hi' in it. How do I use the .Contains() such that only whole words get picked out?
Try using Regex:
if (Regex.Match(sentence, #"\bhi\b", RegexOptions.IgnoreCase).Success)
{
//
};
This works just fine for me on your input text.
Here's a Regex solution:
Regex has a Word Boundary Anchor using \b
Also, if the search string might come from user input, you might consider escaping the string using Regex.Escape
This example should filter a list of strings the way you want.
string findme = "hi";
string pattern = #"\b" + Regex.Escape(findme) + #"\b";
Regex re = new Regex(pattern,RegexOptions.IgnoreCase);
List<string> data = new List<string> {
"The child wanted to play in the mud",
"Hi there",
"Hector had a hip problem"
};
var filtered = data.Where(d => re.IsMatch(d));
DotNetFiddle Example
You could split your sentence into words - you could split at each space and then trim any punctuation. Then check if any of these words are 'hi':
var punctuation = source.Where(Char.IsPunctuation).Distinct().ToArray();
var words = sentence.Split().Select(x => x.Trim(punctuation));
var containsHi = words.Contains("hi", StringComparer.OrdinalIgnoreCase);
See a working demo here: https://dotnetfiddle.net/AomXWx
You could write your own extension method for string like:
static class StringExtension
{
public static bool ContainsWord(this string s, string word)
{
string[] ar = s.Split(' ');
foreach (string str in ar)
{
if (str.ToLower() == word.ToLower())
return true;
}
return false;
}
}
I have to convert the first letter of every word the user inputs into uppercase. I don't think I'm doing it right so it doesn't work but I'm not sure where has gone wrong D: Thank you in advance for your help! ^^
static void Main(string[] args)
{
Console.Write("Enter anything: ");
string x = Console.ReadLine();
string pattern = "^";
Regex expression = new Regex(pattern);
var regexp = new System.Text.RegularExpressions.Regex(pattern);
Match result = expression.Match(x);
Console.WriteLine(x);
foreach(var match in x)
{
Console.Write(match);
}
Console.WriteLine();
}
If your exercise isn't regex operations, there are built-in utilities to do what you are asking:
System.Globalization.TextInfo ti = System.Globalization.CultureInfo.CurrentCulture.TextInfo;
string titleString = ti.ToTitleCase("this string will be title cased");
Console.WriteLine(titleString);
Prints:
This String Will Be Title Cased
If you operation is for regex, see this previous StackOverflow answer: Sublime Text: Regex to convert Uppercase to Title Case?
First of all, your Regex "^" matches the start of a line. If you need to match each word in a multi-word line, you'll need a different Regex, e.g. "[A-Za-z]".
You're also not doing anything to actually change the first letter to upper case. Note that strings in C# are immutable (they cannot be changed after creation), so you will need to create a new string which consists of the first letter of the original string, upper cased, followed by the rest of the string. Give that part a try on your own. If you have trouble, post a new question with your attempt.
string pattern = "(?:^|(?<= ))(.)"
^ doesnt capture anything by itself.You can replace by uppercase letters by applying function to $1.See demo.
https://regex101.com/r/uE3cC4/29
I would approach this using Model Extensions.
PHP has a nice method called ucfirst.
So I translated that into C#
public static string UcFirst(this string s)
{
var stringArr = s.ToCharArray(0, s.Length);
var char1ToUpper = char.Parse(stringArr[0]
.ToString()
.ToUpper());
stringArr[0] = char1ToUpper;
return string.Join("", stringArr);
}
Usage:
[Test]
public void UcFirst()
{
string s = "john";
s = s.UcFirst();
Assert.AreEqual("John", s);
}
Obviously you would still have to split your sentence into a list and call UcFirst for each item in the list.
Google C# Model Extensions if you need help with what is going on.
One more way to do it with regex:
string input = "this string will be title cased, even if there are.cases.like.that";
string output = Regex.Replace(input, #"(?<!\w)\w", m => m.Value.ToUpper());
I hope this may help
public static string CapsFirstLetter(string inputValue)
{
char[] values = new char[inputValue.Length];
int count = 0;
foreach (char f in inputValue){
if (count == 0){
values[count] = Convert.ToChar(f.ToString().ToUpper());
}
else{
values[count] = f;
}
count++;
}
return new string(values);
}
I have a string like the one below and I want to replace the FieldNN instances with the ouput from a function.
So far I have been able to replace the NN instances with the output from the function. But I am not sure how I can delete the static "field" portion with the same regex.
input string:
(Field30="2010002257") and Field1="yuan" not Field28="AAA"
required output:
(IncidentId="2010002257") and Author="yuan" not Recipient="AAA"
This is the code I have so far:
public string translateSearchTerm(string searchTerm) {
string result = "";
result = Regex.Replace(searchTerm.ToLower(), #"(?<=field).*?(?=\=)", delegate(Match Match) {
string fieldId = Match.ToString();
return String.Format("_{0}", getFieldName(Convert.ToInt64(fieldId)));
});
log.Info(String.Format("result={0}", result));
return result;
}
which gives:
(field_IncidentId="2010002257") and field_Author="yuan" not field_Recipient="aaa"
The issues I would like to resolve are:
Remove the static "field" prefixes from the output.
Make the regex case-insenitive on the "FieldNN" parts and not lowercase the quoted text portions.
Make the regex more robust so that the quoted string parts an use either double or single quotes.
Make the regex more robust so that spaces are ignored: FieldNN = "AAA" vs. FieldNN="AAA"
I really only need to address the first issue, the other three would be a bonus but I could probably fix those once I have discovered the right patterns for whitespace and quotes.
Update
I think the pattern below solves issues 2. and 4.
result = Regex.Replace(searchTerm, #"(?<=\b(?i:field)).*?(?=\s*\=)", delegate(Match Match)
To fix first issue use groups instead of positive lookbehind:
public string translateSearchTerm(string searchTerm) {
string result = "";
result = Regex.Replace(searchTerm.ToLower(), #"field(.*?)(?=\=)", delegate(Match Match) {
string fieldId = Match.Groups[1].Value;
return getFieldName(Convert.ToInt64(fieldId));
});
log.Info(String.Format("result={0}", result));
return result;
}
In this case "field" prefix will be included in each match and will be replaced.