Extract all numbers from string - c#

Let's say I have a string such as 123ad456. I want to make a method that separates the groups of numbers into a list, so then the output will be something like 123,456.
I've tried doing return Regex.Match(str, #"-?\d+").Value;, but that only outputs the first occurrence of a number, so the output would be 123. I also know I can use Regex.Matches, but from my understanding, that would output 123456, not separating the different groups of numbers.
I also see from this page on MSDN that Regex.Match has an overload that takes the string to find a match for and an int as an index at which to search for the match, but I don't see an overload that takes in the above in addition to a parameter for the regex pattern to search for, and the same goes for Regex.Matches.
I guess the approach to use would be to use a for loop of some sort, but I'm not entirely sure what to do. Help would be greatly appreciated.

All you have to to use Matches instead of Match. Then simply iterate over all matches:
string result = "";
foreach (Match match in Regex.Matches(str, #"-?\d+"))
{
result += match.result;
}

You may iterate over string data using foreach and use TryParse to check each character.
foreach (var item in stringData)
{
if (int.TryParse(item.ToString(), out int data))
{
// int data is contained in variable data
}
}

Using a combination of string.Join and Regex.Matches:
string result = string.Join(",", Regex.Matches(str, #"-?\d+").Select(m => m.Value));
string.Join performs better than continually appending to an existing string.

\d+ is the regex for integer numbers;
//System.Text.RegularExpressions.Regex
resultString = Regex.Match(subjectString, #"\d+").Value;
returns a string with the very first occurence of a number in subjectString.
Int32.Parse(resultString) will then give you the number.

Related

C# Char Array remove at specific index

Not to sure the best way to remove the char from the char array if the char at a given index is a number.
private string TextBox_CharacterCheck(string tocheckTextBox)
{
char[] charlist = tocheckTextBox.ToCharArray();
foreach (char character in charlist)
{
if (char.IsNumber(character))
{
}
}
return (new string(charlist));
}
Thanks in advance.
// this is now resolved. thank you to all who contributed
You could use the power of Linq:
return new string(tocheckTextBox.Where(c => !char.IsNumber(c)).ToArray())
This is fairly easy using Regex:
var result = Regex.Replace("a1b2c3d4", #"\d", "");
(as #Adassko notes, you can use "[0-9]" instead of #"\d" if you just want the digits 0 to 9, and not any other numeric characters).
You can also do it fairly efficiently using a StringBuilder:
var sb = new StringBuilder();
foreach (var ch in "a1b2c3d4")
{
if (!char.IsNumber(ch))
{
sb.Append(ch);
}
}
var result = sb.ToString();
You can also do it with linq:
var result = new string("a1b2c3d4".Where(x => !char.IsNumber(x)).ToArray());
Use Regex:
private string TextBox_CharacterCheck(string tocheckTextBox)
{
return Regex.Replace(tocheckTextBox, #"[\d]", string.Empty);;
}
System.String is immutable. You could use string.Replace or a regular expression to remove unwanted characters into a new string.
your best bet is to use regular expressions.
strings are immutable meaning that you can't change them - you need to rewrite the whole string - to do it in optimal way you should use StringBuilder class and Append every character that you want.
Also watch out for your code - char.IsNumber checks not only for characters 0-9, it also returns true for every numeric character such as ٢ and you probably don't want that.
here's the full list of characters returning true:
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789
you should also use [0-9] rather than \d in your regular expressions if you want only parsable digits.
You can also use a trick to .Split your string on your character, then .Join it back. This not only allows you to remove one or more characters, it also lets you to replace it with some other character.
I use this trick to remove incorrect characters from file name:
string.Join("-", possiblyIncorrectFileName.Split(Path.GetInvalidFileNameChars()))
this code will replace any character that cannot be used in valid file name to -
You can use LINQ to remove the char from the char array if the char at a given index is a number.
CODE
//This will return you the list of char discarding the number.
var removedDigits = tocheckTextBox.Where(x => !char.IsDigit(x));
//This will return the string without numbers.
string output = string.join("", removedDigits);

Can anyone improve this 'list of IP addresses' regex? [duplicate]

What is the regular expression to validate a comma delimited list like this one:
12365, 45236, 458, 1, 99996332, ......
I suggest you to do in the following way:
(\d+)(,\s*\d+)*
which would work for a list containing 1 or more elements.
This regex extracts an element from a comma separated list, regardless of contents:
(.+?)(?:,|$)
If you just replace the comma with something else, it should work for any delimiter.
It depends a bit on your exact requirements. I'm assuming: all numbers, any length, numbers cannot have leading zeros nor contain commas or decimal points. individual numbers always separated by a comma then a space, and the last number does NOT have a comma and space after it. Any of these being wrong would simplify the solution.
([1-9][0-9]*,[ ])*[1-9][0-9]*
Here's how I built that mentally:
[0-9] any digit.
[1-9][0-9]* leading non-zero digit followed by any number of digits
[1-9][0-9]*, as above, followed by a comma
[1-9][0-9]*[ ] as above, followed by a space
([1-9][0-9]*[ ])* as above, repeated 0 or more times
([1-9][0-9]*[ ])*[1-9][0-9]* as above, with a final number that doesn't have a comma.
Match duplicate comma-delimited items:
(?<=,|^)([^,]*)(,\1)+(?=,|$)
Reference.
This regex can be used to split the values of a comma delimitted list. List elements may be quoted, unquoted or empty. Commas inside a pair of quotation marks are not matched.
,(?!(?<=(?:^|,)\s*"(?:[^"]|""|\\")*,)(?:[^"]|""|\\")*"\s*(?:,|$))
Reference.
/^\d+(?:, ?\d+)*$/
i used this for a list of items that had to be alphanumeric without underscores at the front of each item.
^(([0-9a-zA-Z][0-9a-zA-Z_]*)([,][0-9a-zA-Z][0-9a-zA-Z_]*)*)$
You might want to specify language just to be safe, but
(\d+, ?)+(\d+)?
ought to work
I had a slightly different requirement, to parse an encoded dictionary/hashtable with escaped commas, like this:
"1=This is something, 2=This is something,,with an escaped comma, 3=This is something else"
I think this is an elegant solution, with a trick that avoids a lot of regex complexity:
if (string.IsNullOrEmpty(encodedValues))
{
return null;
}
else
{
var retVal = new Dictionary<int, string>();
var reFields = new Regex(#"([0-9]+)\=(([A-Za-z0-9\s]|(,,))+),");
foreach (Match match in reFields.Matches(encodedValues + ","))
{
var id = match.Groups[1].Value;
var value = match.Groups[2].Value;
retVal[int.Parse(id)] = value.Replace(",,", ",");
}
return retVal;
}
I think it can be adapted to the original question with an expression like #"([0-9]+),\s?" and parse on Groups[0].
I hope it's helpful to somebody and thanks for the tips on getting it close to there, especially Asaph!
In JavaScript, use split to help out, and catch any negative digits as well:
'-1,2,-3'.match(/(-?\d+)(,\s*-?\d+)*/)[0].split(',');
// ["-1", "2", "-3"]
// may need trimming if digits are space-separated
The following will match any comma delimited word/digit/space combination
(((.)*,)*)(.)*
Why don't you work with groups:
^(\d+(, )?)+$
If you had a more complicated regex, i.e: for valid urls rather than just numbers. You could do the following where you loop through each element and test each of them individually against your regex:
const validRelativeUrlRegex = /^(^$|(?!.*(\W\W))\/[a-zA-Z0-9\/-]+[^\W_]$)/;
const relativeUrls = "/url1,/url-2,url3";
const startsWithComma = relativeUrls.startsWith(",");
const endsWithComma = relativeUrls.endsWith(",");
const areAllURLsValid = relativeUrls
.split(",")
.every(url => validRelativeUrlRegex.test(url));
const isValid = areAllURLsValid && !endsWithComma && !startsWithComma

Extracting and Manipulating Strings in C#.Net

We have a requirement to extract and manipulate strings in C#. Net. The requirement is - we have a string
($name$:('George') AND $phonenumer$:('456456') AND
$emailaddress$:("test#test.com"))
We need to extract the strings between the character - $
Therefore, in the end, we need to get a list of strings containing - name, phonenumber, emailaddress.
What would be the ideal way to do it? are there any out of the box features available for this?
Regards,
John
The simplest way is to use a regular expression to match all non-whitespace characters between $ :
var regex=new Regex(#"\$\w+\$");
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test#test.com\"))";
var matches=regex.Matches(input);
This will return a collection of matches. The .Value property of each match contains the matching string. \$ is used because $ has special meaning in regular expressions - it matches the end of a string. \w means a non-whitespace character. + means one or more.
Since this is a collection, you can use LINQ on it to get eg an array with the values:
var values=matches.OfType<Match>().Select(m=>m.Value).ToArray();
That array will contain the values $name$,$phonenumer$,$emailaddress$.
Capture by name
You can specify groups in the pattern and attach names to them. For example, you can group the field name values:
var regex=new Regex(#"\$(?<name>\w+)\$");
var names=regex.Matches(input)
.OfType<Match>()
.Select(m=>m.Groups["name"].Value);
This will return name,phonenumer,emailaddress. Parentheses are used for grouping. (?<somename>pattern) is used to attach a name to the group
Extract both names and values
You can also capture the field values and extract them as a separate field. Once you have the field name and value, you can return them, eg as an object or anonymous type.
The pattern in this case is more comples:
#"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)"
Parentheses are escaped because we want them to match the values. Both ' and " characters are used in values, so ['"] is used to specify a choice of characters. The pattern is a literal string (ie starts with #) so the double quotes have to be escaped: ['""] . Any character has to be matched .+ but only up to the next character in the pattern .+?. Without the ? the pattern .+ would match everything to the end of the string.
Putting this together:
var regex = new Regex(#"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)");
var myValues = regex.Matches(input)
.OfType<Match>()
.Select(m=>new { Name=m.Groups["name"].Value,
Value=m.Groups["value"].Value
})
.ToArray()
Turn them into a dictionary
Instead of ToArray() you could convert the objects to a dictionary with ToDictionary(), eg with .ToDictionary(it=>it.Name,it=>it.Value). You could omit the select step and generate the dictionary from the matches themselves :
var myDict = regex.Matches(input)
.OfType<Match>()
.ToDictionary(m=>m.Groups["name"].Value,
m=>m.Groups["value"].Value);
Regular expressions are generally fast because they don't split the string. The pattern is converted to efficient code that parses the input and skips non-matching input immediatelly. Each match and group contain only the index to their starting and ending character in the input string. A string is only generated when .Value is called.
Regular expressions are thread-safe, which means a single Regex object can be stored in a static field and reused from multiple threads. That helps in web applications, as there's no need to create a new Regex object for each request
Because of these two advantages, regular expressions are used extensively to parse log files and extract specific fields. Compared to splitting, performance can be 10 times better or more, while memory usage remains low. Splitting can easily result in memory usage that's multiple times bigger than the original input file.
Can it go faster?
Yes. Regular expressions produce parsing code that may not be as efficient as possible. A hand-written parser could be faster. In this particular case, we want to start capturing text if $ is detected up until the first $. This can be done with the following method :
IEnumerable<string> GetNames(string input)
{
var builder=new StringBuilder(20);
bool started=false;
foreach(var c in input)
{
if (started)
{
if (c!='$')
{
builder.Append(c);
}
else
{
started=false;
var value=builder.ToString();
yield return value;
builder.Clear();
}
}
else if (c=='$')
{
started=true;
}
}
}
A string is an IEnumerable<char> so we can inspect one character at a time without having to copy them. By using a single StringBuilder with a predetermined capacity we avoid reallocations, at least until we find a key that's larger than 20 characters.
Modifying this code to extract values though isn't so easy.
Here's one way to do it, but certainly not very elegant. Basically splitting the string on the '$' and taking every other item will give you the result (after some additional trimming of unwanted characters).
In this example, I'm also grabbing the value of each item and then putting both in a dictionary:
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test#test.com\"))";
var inputParts = input.Replace(" AND ", "")
.Trim(')', '(')
.Split(new[] {'$'}, StringSplitOptions.RemoveEmptyEntries);
var keyValuePairs = new Dictionary<string, string>();
for (int i = 0; i < inputParts.Length - 1; i += 2)
{
var key = inputParts[i];
var value = inputParts[i + 1].Trim('(', ':', ')', '"', '\'', ' ');
keyValuePairs[key] = value;
}
foreach (var kvp in keyValuePairs)
{
Console.WriteLine($"{kvp.Key} = {kvp.Value}");
}
// Wait for input before closing
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();
Output

Check and Return Keyword in a List of String

I was wondering if there's a way to check if a string contains any of the keyword is a list, and return the found keyword if found.
For example, I have a list of keywords.
List<string> keywords = new List<string>{"word1", "word2", "word3"};
And I have a sentence (string) I want to check against keywords:
string sentence = "something something something word2 something something";
Is there a way to search for the keywords in the sentence and return the found one? For example, return word2.
I know I can probably just use a forloop to loop through keywords, but since there will be at least 20 keyword in my actual program, I don't want to do so as it makes my code kinda messy.
My original idea is like this:
string SearchKeywords(List<string> keywords, string sentence){
foreach (string word in keywords){
if (sentence.Contains(word)) return word;
}
return ""; //return blank string if no match found
}
I'm wondering if there's a built-in function that I can use to do the job. Thanks!
Using regex, you can create an alternation using your keywords to get a pattern of word1|word2|word3. They should be escaped via Regex.Escape to avoid conflicting with any regex metacharacters. Ignoring case is done by adding the RegexOptions.IgnoreCase option.
string pattern = String.Join("|", keywords.Select(k => Regex.Escape(k)));
Match m = Regex.Match(sentence, pattern, RegexOptions.IgnoreCase);
if (m.Success)
{
Console.WriteLine("Keyword found: {0}", m.Value);
}
else
{
Console.WriteLine("No keywords found!");
}
If you change your mind and want to find multiple matches, use Regex.Matches instead and loop through its results.
You could use Linq's FirstOrDefault extension method:
string SearchKeywords(List<string> keywords, string sentence){
return keywords.FirstOrDefault(w => sentence.Contains(w)) ?? "";
}
The ?? "" at the end simply means that if no keyword is found within the string, your method should return an empty string.
If you know that you'll only get one word per sentence (and you're sure you're only checking one sentence at a time), I'd use p.s.w.g's answer. For a similar one which will look through a longer string (and thus possibly return multiple results) just use .Where() like this:
IEnumerable<string> SearchKeywords(List<string> keywords, string sentence)
{
return keywords.Where(w => sentence.ToLower().Contains(w.ToLower()));
}
In this case, you don't need the null coalescing operator (??) because the Enumerable will be returned either way, if there aren't any matches, then it will return an Enumerable with no elements.
Note that this will find partial-word matches (for instance, keyword "cray" would match on sentence word "crayon." You can fix that by adding spaces around your keywords (thus: " cray "), or by splitting your sentence into an array with .Split() and check the two Enumerables against each other:
IEnumerable<string> SearchKeywords(List<string> keywords, string sentence)
{
var splitSentence = sentence.ToLower().Split(' ');
return keywords.Intersect(splitSentence);
}

what is a good pattern to processes each individual regex match through a method

I'm trying to figure out a pattern where I run a regex match on a long string, and each time it finds a match, it runs a replace on it. The thing is, the replace will vary depending on the matched value. This new value will be determined by a method. For example:
var matches = Regex.Match(myString, myPattern);
while(matches.Success){
Regex.Replace(myString, matches.Value, GetNewValue(matches.Groups[1]));
matches = matches.NextMatch();
}
The problem (i think) is that if I run the Regex.Replace, all of the match indexes get messed up so the result ends up coming out wrong. Any suggestions?
If you replace each pattern with a fixed string, Regex.replace does that for you. You don't need to iterate the matches:
Regex.Replace(myString, myPattern, "replacement");
Otherwise, if the replacement depends upon the matched value, use the MatchEvaluator delegate, as the 3rd argument to Regex.Replace. It receives an instance of Match and returns string. The return value is the replacement string. If you don't want to replace some matches, simply return match.Value:
string myString = "aa bb aa bb";
string myPattern = #"\w+";
string result = Regex.Replace(myString, myPattern,
match => match.Value == "aa" ? "0" : "1" );
Console.WriteLine(result);
// 0 1 0 1
If you really need to iterate the matches and replace them manually, you need to start replacement from the last match towards the first, so that the index of the string is not ruined for the upcoming matches. Here's an example:
var matches = Regex.Matches(myString, myPattern);
var matchesFromEndToStart = matches.Cast<Match>().OrderByDescending(m => m.Index);
var sb = new StringBuilder(myString);
foreach (var match in matchesFromEndToStart)
{
if (IsGood(match))
{
sb.Remove(match.Index, match.Length)
.Insert(match.Index, GetReplacementFor(match));
}
}
Console.WriteLine(sb.ToString());
Just be careful, that your matches do not contain nested instances. If so, you either need to remove matches which are inside another match, or rerun the regex pattern to generate new matches after each replacement. I still recommend the second approach, which uses the delegates.
If I understand your question correctly, you want to perform a replace based on a constant Regular Expression, but the replacement text you use will change based on the actual text that the regex matches on.
The Captures property of the Match Class (not the Match method) returns a collection of all the matches with your regex within the input string. It contains information like the position within the string, the matched value and the length of the match. If you iterate over this collection with a foreach loop you should be able to treat each match individually and perform some string manipulations where you can dynamically modify the replacement value.
I would use something like
Regex regEx = new Regex("some.*?pattern");
string input = "someBLAHpattern!";
foreach (Match match in regEx.Matches(input))
{
DoStuffWith(match.Value);
}

Categories