Regex alternative for extracting numeric and non-numeric strings - c#

Using the expression below, I get the expected output: the input is split into a string array of numeric and non-numeric parts.
Regex _re = new Regex(@"(?<=\D)(?=\d)|(?<=\d)(?=\D)", RegexOptions.Compiled);
_re.Split("2323dfdf233fgfgfg ddfdf334").Dump(); // the input can be any alphanumeric string and may start with a digit or a letter
How can I achieve the same thing without using Regex? Do I need to parse each char and segregate them myself? I have a large array of text that needs to be processed this way, but I cannot use regex for the inputs provided.

For a Linq solution, you can combine the use of Enumerable.Skip() and Enumerable.TakeWhile() while checking for char.IsDigit() to determine whether the character is a digit or not. For example:
string inputString = "2323dfdf233fgfgfg ddfdf334";
var list = new List<string>();
int usedLength = 0;
while (usedLength < inputString.Length)
{
bool isDigit = char.IsDigit(inputString[usedLength]);
string item = string.Concat(inputString.Skip(usedLength).
TakeWhile((c) => char.IsDigit(c) == isDigit));
usedLength += item.Length;
list.Add(item);
}
Then you can easily iterate through the list:
foreach (string item in list)
Console.WriteLine(item);
Output:
2323
dfdf
233
fgfgfg ddfdf
334

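This solution makes a single pass over the string with a StringBuilder, so it should be fast enough; check it with larger strings. It assumes the input is not empty.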
string str = "2323dfdf233fgfgfg ddfdf334";
var strings = new List<string>();
var sb = new StringBuilder();
var lastCharIsNumber = char.IsDigit(str[0]);
foreach (var c in str) {
if (char.IsDigit(c) ) {
if (!lastCharIsNumber) {
strings.Add(sb.ToString());
sb.Clear();
}
lastCharIsNumber = true;
}
else {
if (lastCharIsNumber) {
strings.Add(sb.ToString());
sb.Clear();
}
lastCharIsNumber = false;
}
sb.Append(c);
}
strings.Add(sb.ToString());
strings.Dump();

Related

Find two strings in list with a regular expression

I need to find two strings within a list that contains the characters from another string, which are not in order. To make it clear, an example could be a list of animals like:
lion
dog
bear
cat
And a given string is: oodilgn.
The answer here would be: lion and dog
Each character from the string will be used only once.
Is there a regular expression that will allow me to do this?
You could try putting the given string between square brackets. A character class matches any one of the listed letters, so when it is repeated with a quantifier it accepts them in any order. This is not a perfect solution (it does not check how many times each letter is used), but it will catch the majority of your list.
For example, write oodilgn as [oodilgn], then require a minimum number of letters, say 3, with the curly-bracket quantifier {3,}. The full regex looks like this:
[oodilgn]{3,}
This pattern basically says: find any run of at least three consecutive letters, in any order, taken from the letters between the brackets.
Demo: https://regex101.com/r/MCWHjQ/2
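To see what that character class does and does not check, here is a minimal sketch that runs the pattern over the animal list from the question. The class name CharClassDemo, the helper Filter and the extra word "noon" are illustrative additions, not part of the answer above, and the sketch assumes the given string contains only plain letters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: keeps the words that contain a run of at least
    // three characters taken from `letters`.
    // Assumes `letters` holds only plain letters (nothing special inside []).
    public static List<string> Filter(string letters, IEnumerable<string> words)
    {
        var regex = new Regex("[" + letters + "]{3,}");
        return words.Where(w => regex.IsMatch(w)).ToList();
    }

    public static void Main()
    {
        var words = new[] { "lion", "dog", "bear", "cat", "noon" };
        // prints: lion, dog, noon
        Console.WriteLine(string.Join(", ", Filter("oodilgn", words)));
    }
}
```

Note that "noon" is accepted even though oodilgn contains only one n: the pattern checks which letters appear, not how often, so it cannot enforce the rule that each character is used only once.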
Here is an example algorithm that does the job. I have assumed that the two words together don't need to use all the letters of the text; if they do, enable the additional check that is commented out in the code. I also return only the first pair of words that fits.
Here is how you call it from an outside function, Main or otherwise:
static void Main(string[] args)
{
var text = "oodilgn";
var listOfWords = new List<string> { "lion", "dog", "bear", "cat" };
ExtractWordsWithSameLetters(text, listOfWords);
}
Here is the function with the algorithm. All string manipulations are done entirely with regex.
public static void ExtractWordsWithSameLetters(string text, List<string> listOfWords)
{
string firstWord = null;
string secondWord = null;
for (var i = 0; i < listOfWords.Count - 1; i++)
{
var textCopy = text;
var firstWordIsMatched = true;
foreach (var letter in listOfWords[i])
{
var pattern = $"(.*?)({letter})(.*?)";
var regex = new Regex(pattern);
if (regex.IsMatch(textCopy))
{
textCopy = regex.Replace(textCopy, "$1*$3", 1);
}
else
{
firstWordIsMatched = false;
break;
}
}
if (!firstWordIsMatched)
{
continue;
}
firstWord = listOfWords[i];
for (var j = i + 1; j < listOfWords.Count; j++)
{
// Work on a copy so that a failed candidate does not consume letters
var remaining = textCopy;
var secondWordIsMatched = true;
foreach (var letter in listOfWords[j])
{
var pattern = $"(.*?)({letter})(.*?)";
var regex = new Regex(pattern);
if (regex.IsMatch(remaining))
{
remaining = regex.Replace(remaining, "$1*$3", 1);
}
else
{
secondWordIsMatched = false;
break;
}
}
if (secondWordIsMatched)
{
textCopy = remaining;
secondWord = listOfWords[j];
break;
}
}
if (secondWord == null)
{
firstWord = null;
}
else
{
//if (textCopy.ToCharArray().Any(l => l != '*'))
//{
// break;
//}
break;
}
}
if (firstWord != null)
{
Console.WriteLine($"{firstWord} { secondWord}");
}
}
The function is far from optimised but does what you want. If you want to return the results instead of printing them, either create an array, put firstWord and secondWord in it and change the return type to string[], or add two out parameters. In both cases you will need to check the result in the calling function.
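The out-parameter variant could look roughly like the sketch below. It is not the regex-based function above: for brevity it swaps in a plain letter-count dictionary, and the names PairFinder, TryFindPair and Consume are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class PairFinder
{
    // Hypothetical helper: returns the letter counts left after taking the
    // letters of `word` out of `available`, or null if a letter is missing.
    private static Dictionary<char, int> Consume(Dictionary<char, int> available, string word)
    {
        var remaining = new Dictionary<char, int>(available);
        foreach (var letter in word)
        {
            int count;
            if (!remaining.TryGetValue(letter, out count) || count == 0)
            {
                return null;
            }
            remaining[letter] = count - 1;
        }
        return remaining;
    }

    // Returns true and sets first/second to the first pair of words that can
    // be built from `text` using each character at most once.
    public static bool TryFindPair(string text, IList<string> words,
                                   out string first, out string second)
    {
        var letters = new Dictionary<char, int>();
        foreach (var c in text)
        {
            int count;
            letters.TryGetValue(c, out count); // count stays 0 if c is new
            letters[c] = count + 1;
        }

        for (var i = 0; i < words.Count - 1; i++)
        {
            var afterFirst = Consume(letters, words[i]);
            if (afterFirst == null) continue;

            for (var j = i + 1; j < words.Count; j++)
            {
                if (Consume(afterFirst, words[j]) != null)
                {
                    first = words[i];
                    second = words[j];
                    return true;
                }
            }
        }

        first = null;
        second = null;
        return false;
    }

    public static void Main()
    {
        string first, second;
        var words = new List<string> { "lion", "dog", "bear", "cat" };
        if (TryFindPair("oodilgn", words, out first, out second))
        {
            Console.WriteLine(first + " " + second); // prints: lion dog
        }
    }
}
```

The caller checks the boolean result before reading first and second, which is the check in the calling function mentioned above.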
Please try this:
Regex r=new Regex("^[oodilgn]+$");
var list=new List<String>(){"lion","dog","fish","god"};
var output=list.Where(x=>r.IsMatch(x));
Result:
output=["lion","dog","god"];

How do I check if a string contains a string from an array of strings?

So here is my example
string test = "Hello World, I am testing this string.";
string[] myWords = {"testing", "string"};
How do I check if the string test contains any of these words? If it does, how do I replace each matched word with a number of asterisks equal to its length?
You can use a regex:
public string AstrixSomeWords(string test)
{
Regex regex = new Regex(@"\b\w+\b");
return regex.Replace(test, AsterixWord);
}
private string AsterixWord(Match match)
{
string word = match.Groups[0].Value;
if (myWords.Contains(word))
return new String('*', word.Length);
else
return word;
}
I have checked the code and it seems to work as expected.
If the number of words in myWords is large you might consider using HashSet for better performance.
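A minimal sketch of that HashSet variant, assuming the same whole-word pattern as above; the class name WordMasker and the method Mask are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordMasker
{
    // Replaces every whole word found in `wordsToMask` with asterisks
    // of the same length. Matching is case-sensitive.
    public static string Mask(string text, IEnumerable<string> wordsToMask)
    {
        // HashSet lookups take roughly constant time, unlike scanning an array
        var lookup = new HashSet<string>(wordsToMask);
        return Regex.Replace(text, @"\b\w+\b",
            match => lookup.Contains(match.Value)
                ? new string('*', match.Value.Length)
                : match.Value);
    }

    public static void Main()
    {
        string test = "Hello World, I am testing this string.";
        string[] myWords = { "testing", "string" };
        // prints: Hello World, I am ******* this ******.
        Console.WriteLine(Mask(test, myWords));
    }
}
```

Building the set once and reusing it matters most when the same word list is applied to many input strings.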
bool cont = false;
string test = "Hello World, I am testing this string.";
string[] myWords = { "testing", "string" };
foreach (string a in myWords)
{
if( test.Contains(a))
{
int no = a.Length;
test = test.Replace(a, new string('*', no));
}
}
var containsAny = myWords.Any(x => test.Contains(x));
Something like this:
foreach (var word in myWords){
if(test.Contains(word)){
string astr = new string('*', word.Length);
test = test.Replace(word, astr);
}
}
EDIT: Refined

Splitting a string array

I have a string array string[] arr, which contains values like N36102W114383, N36102W114382, etc.
I want to split each string so that I get two values, like N36082 and W115080.
What is the best way to do this?
This should work for you.
Regex regexObj = new Regex(@"\w\d+"); // matches a word character followed by a sequence of digits
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
// matchResults.Value holds the current piece: first N36102, then W114383
matchResults = matchResults.NextMatch();
}
If you have the fixed format every time you can just do this:
string[] split_data = data_string.Insert(data_string.IndexOf("W"), ",")
.Split(',');
Here you insert a recognizable delimiter into your string and then split it by this delimiter.
Forgive me if this doesn't quite compile, but I'd just break down and write the string processing function by hand:
public static IEnumerable<string> Split(string str)
{
char [] chars = str.ToCharArray();
int last = 0;
for(int i = 1; i < chars.Length; i++) {
if(char.IsLetter(chars[i])) {
yield return new string(chars, last, i - last);
last = i;
}
}
yield return new string(chars, last, chars.Length - last);
}
If you use C#, please try:
String[] code = new Regex("(?:([A-Z][0-9]+))").Split(text).Where(e => e.Length > 0 && e != ",").ToArray();
In case you're only looking for the format NxxxxxWxxxxx, this will do just fine:
Regex r = new Regex(@"(N[0-9]+)(W[0-9]+)");
Match mc = r.Match(arr[i]);
string N = mc.Groups[1].Value;
string W = mc.Groups[2].Value;
Using string.Split and char.IsLetter, this is relatively easy in C#.
Don't forget to write unit tests - the following may have some corner case errors!
// input has form "N36102W114383, N36102W114382"
// output: "N36102", "W114383", "N36102", "W114382", ...
string[] ParseSequenceString(string input)
{
string[] inputStrings = input.Split(',');
List<string> outputStrings = new List<string>();
foreach (string value in inputStrings) {
// Trim the space that follows each comma before parsing
List<string> valuesInString = ParseValuesInString(value.Trim());
outputStrings.AddRange(valuesInString);
}
return outputStrings.ToArray();
}
// input has form "N36102W114383"
// output: "N36102", "W114383"
List<string> ParseValuesInString(string inputString)
{
List<string> outputValues = new List<string>();
string currentValue = string.Empty;
foreach (char c in inputString)
{
// A letter starts a new value, so flush the one collected so far
if (char.IsLetter(c) && currentValue.Length > 0)
{
outputValues.Add(currentValue);
currentValue = string.Empty;
}
currentValue += c;
}
if (currentValue.Length > 0)
{
outputValues.Add(currentValue);
}
return outputValues;
}

Extracting variable number of token pairs from a string to a pair of arrays

Here is the requirement.
I have a string with multiple entries of a particular format. Example below
string SourceString = "<parameter1(value1)><parameter2(value2)><parameter3(value3)>";
I want to get the output as below
string[] parameters = {"parameter1","parameter2","parameter3"};
string[] values = {"value1","value2","value3"};
The above string is just an example with 3 pairs of parameter values. The string may have 40, 52, 75 - any number of entries (less than 100 in one string).
Like this I have multiple strings in an array. I want to do this operation for all the strings in the array.
Could anyone please advise how to achieve this? I'm a novice in C#.
Is using regex a better solution or is there any other method?
Any help is much appreciated.
If you don't like regexes, you could do something like this:
class Program
{
static void Main()
{
string input = "<parameter1(value1)>< parameter2(value2)>";
string[] Items = input.Replace("<", "").Split('>');
List<string> parameters = new List<string>();
List<string> values = new List<string>();
foreach (var item in Items)
{
if (item != "")
{
KeyValuePair<string, string> kvp = GetInnerItem(item);
parameters.Add(kvp.Key);
values.Add(kvp.Value);
}
}
// if you really wanted your results in arrays
//
string[] parametersArray = parameters.ToArray();
string[] valuesArray = values.ToArray();
}
public static KeyValuePair<string, string> GetInnerItem(string item)
{
//expects parameter1(value1)
string[] s = item.Replace(")", "").Split('(');
return new KeyValuePair<string, string>(s[0].Trim(), s[1].Trim());
}
}
It might be a wee bit quicker than the RegEx method but certainly not as flexible.
You could use RegEx class in combination with an expression to parse the string and generate these arrays by looping through MatchCollections.
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
This does it:
string[] parameters = null;
string[] values = null;
// string SourceString = "<parameter1(value1)><parameter2(value2)><parameter3(value3)>";
string SourceString = @"<QUEUE(C2.BRH.ARB_INVPUSH01)><CHANNEL(C2.MONITORING_CHANNEL)><QMGR(C2.MASTER_NAME.TRACKER)>";
// string regExpression = @"<([^\(]+)[\(]([\w]+)";
string regExpression = @"<([^\(]+)[\(]([^\)]+)";
Regex r = new Regex(regExpression);
MatchCollection collection = r.Matches(SourceString);
parameters = new string[collection.Count];
values = new string[collection.Count];
for (int i = 0; i < collection.Count; i++)
{
Match m = collection[i];
parameters[i] = m.Groups[1].Value;
values[i] = m.Groups[2].Value;
}

Extracting strings in .NET

I have a string that looks like this:
var expression = @"Args(""token1"") + Args(""token2"")";
I want to retrieve a collection of strings that are enclosed in Args("") in the expression.
How would I do this in C# or VB.NET?
Regex:
string expression = "Args(\"token1\") + Args(\"token2\")";
Regex r = new Regex("Args\\(\"([^\"]+)\"\\)");
List<string> tokens = new List<string>();
foreach (var match in r.Matches(expression)) {
string s = match.ToString();
int start = s.IndexOf('\"');
int end = s.LastIndexOf('\"');
tokens.Add(s.Substring(start + 1, end - start - 1));
}
Non-regex (this assumes that the string is in the correct format!):
string expression = "Args(\"token1\") + Args(\"token2\")";
List<string> tokens = new List<string>();
int index;
while (!String.IsNullOrEmpty(expression) && (index = expression.IndexOf("Args(\"")) >= 0) {
int start = expression.IndexOf('\"', index);
string s = expression.Substring(start + 1);
int end = s.IndexOf("\")");
tokens.Add(s.Substring(0, end));
expression = s.Substring(end + 2);
}
There is another regular expression method for accomplishing this, using lookahead and lookbehind assertions:
Regex regex = new Regex("(?<=Args\\(\").*?(?=\"\\))");
string input = "Args(\"token1\") + Args(\"token2\")";
MatchCollection matches = regex.Matches(input);
foreach (var match in matches)
{
Console.WriteLine(match.ToString());
}
This strips away the Args sections of the string, giving just the tokens.
If you want token1 and token2, you can use the following regex; each token is then available as Groups[1].Value of a match:
string input = @"Args(""token1"") + Args(""token2"")";
MatchCollection matches = Regex.Matches(input, @"Args\(""([^""]+)""\)");
Sorry if this is not what you are looking for.
If your collection looks like this:
IList<String> expression = new List<String> { "token1", "token2" };
var collection = expression.Select(s => Args(s));
As long as Args returns the same type as the queried collection type, this should work okay.
You can then iterate over the collection like so:
foreach (var s in collection)
{
Console.WriteLine(s);
}
