Using linq to count substrings in a string?


I could use the following LINQ expression to count the number of occurrences of a word:
string test = "And And And";
int j = test.Split(' ').Count(x => x.Contains("And"));
However, what if I were searching for "And And"? Is there a way to use LINQ to count words without using Split? Do any of these methods take longer than O(n)?

You can use a regular expression:
string test = "And And And";
int j = Regex.Matches(test, "And").Cast<Match>().Count();
BTW, do you want to allow overlapping occurrences? i.e. if you're looking for "And And", do you consider that test contains 1 or 2 occurrences of it?
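Both readings of that question can be checked with a short sketch. The class and method names below are hypothetical. Regex.Matches never returns overlapping matches, so it gives the first count; wrapping the escaped phrase in a zero-width lookahead, (?=...), lets a match start at every position where the phrase begins, which gives the overlapping count. Regex.Escape makes sure a search term containing regex metacharacters is matched literally.

```csharp
using System.Text.RegularExpressions;

public static class SubstringCounter
{
    // Non-overlapping count: each match consumes the characters it matched,
    // so the next search resumes after the end of the previous match.
    public static int CountNonOverlapping(string text, string phrase) =>
        Regex.Matches(text, Regex.Escape(phrase)).Count;

    // Overlapping count: a lookahead consumes nothing, so the engine tries
    // again at the next position and can find overlapping starts.
    public static int CountOverlapping(string text, string phrase) =>
        Regex.Matches(text, "(?=" + Regex.Escape(phrase) + ")").Count;
}
```

With test = "And And And", CountNonOverlapping(test, "And And") returns 1 and CountOverlapping(test, "And And") returns 2.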

I found a clever solution that can be evaluated server-side by most LINQ-to-ORM providers:
string search = "foo";
int searchLength = search.Length;
var result = qry.Select(i => new { Object = i, Occurrences = (i.SomeProperty.Length - i.SomeProperty.Replace(search, "").Length) / searchLength });
The idea is to replace the substring by an empty string and then divide the difference in string length by the length of the search term.
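The same arithmetic works on a plain in-memory string. This is a minimal sketch with a hypothetical method name; the empty-string guard is an addition, since an empty search term would make the division fail. Because string.Replace substitutes occurrences from left to right without overlap, the result is the non-overlapping count.

```csharp
using System;

public static class ReplaceCounter
{
    public static int CountByReplace(string text, string search)
    {
        // Dividing by search.Length below would fail for an empty term
        if (string.IsNullOrEmpty(search))
            throw new ArgumentException("Search term must not be empty.", nameof(search));

        // Every removed occurrence shortens the string by search.Length characters
        int removed = text.Length - text.Replace(search, "").Length;
        return removed / search.Length;
    }
}
```

For example, CountByReplace("And And And", "And") returns 3, and CountByReplace("And And And", "And And") returns 1.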

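You can use IndexOf:
string input = "And And And";
string what = "And";
int count = 0;
int pos = -what.Length;
for (;;)
{
    // Resume the search just past the previous match (non-overlapping)
    pos = input.IndexOf(what, pos + what.Length, StringComparison.Ordinal);
    if (pos == -1) break;
    count++;
}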

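This is not quite LINQ, but you can also write an extension method like the one below. It is probably more efficient than a LINQ solution, because it does not allocate an array of substrings the way Split does:
public static int CountSubStrings(this string input, string delimiter, bool ignoreCase = false)
{
    // An empty delimiter would match at every position and never advance
    if (string.IsNullOrEmpty(delimiter))
        throw new ArgumentException("Delimiter must not be empty.", nameof(delimiter));

    int instancesNo = 0;
    int pos = 0;
    StringComparison comparison = ignoreCase
        ? StringComparison.InvariantCultureIgnoreCase
        : StringComparison.InvariantCulture;
    while ((pos = input.IndexOf(delimiter, pos, comparison)) != -1)
    {
        pos += delimiter.Length; // skip past this match (non-overlapping)
        instancesNo++;
    }
    return instancesNo;
}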

Related

C#: Need to split a string into a string[] and keeping the delimiter (also a string) at the beginning of the string

I think I am too dumb to solve this problem...
I have some formulas which need to be "translated" from one syntax to another.
Let's say I have a formula like this (it's a simple one; others have many "Ceilings" in it):
string formulaString = "If([Param1] = 0, 1, Ceiling([Param2] / 0.55) * [Param3])";
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
My attempt is to split formulaString at "Ceiling(", so I can iterate through the string array and insert my string at the correct index (counting every "(" and ")" to get the right index).
What I have so far:
// splits correctly, but loses "CEILING("
string[] parts = formulaString.Split(new[] { "CEILING(" }, StringSplitOptions.None);
// almost correct, but "CEILING(" ends up as a separate element
string[] parts = Regex.Split(formulaString, @"(CEILING\()");
// splits at almost every letter
string[] parts = Regex.Split(formulaString, @"(?=[(CEILING\()])");
When everything is done, I concatenate the parts so I have my complete formula again.
What regex pattern do I need to produce the parts below? (Or is there any other method that would help me?)
part1 = "If([Param1] = 0, 1, ";
part2 = "Ceiling([Param2] / 0.55) * [Param3])";
//part3 = next "CEILING(" in a longer formula and so on...

As I mentioned in a comment, you almost had it: (?=Ceiling). Unfortunately, this is incomplete for your use case.
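For reference, here is a sketch of that lookahead split with the character class from the question's last attempt removed (the class and method names are hypothetical). The zero-width pattern (?=Ceiling\() splits just before each "Ceiling(" without consuming anything, so the delimiter stays at the start of the following part, which is the shape asked for. It does not insert "; 1" yet, which is why it is incomplete on its own.

```csharp
using System.Text.RegularExpressions;

public static class CeilingSplitDemo
{
    // Zero-width lookahead: each split position is the start of "Ceiling(",
    // and no characters are removed from the resulting parts.
    public static string[] SplitBeforeCeiling(string formula) =>
        Regex.Split(formula, @"(?=Ceiling\()");
}
```

For the sample formula this returns "If([Param1] = 0, 1, " and "Ceiling([Param2] / 0.55) * [Param3])".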
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
This should work in regex engines that support variable-length lookbehind (for example .NET, or JavaScript since ES2018):
string[] parts = Regex.Split(formulaString, @"(?<=Ceiling\([^)]*(?=\)))");
string modifiedFormula = String.Join("; 1", parts);
The regex
(?<=Ceiling\([^)]*(?=\)))
(?<= ) Positive lookbehind
Ceiling\( Search for literal "Ceiling("
[^)] Match any char which is not ")" ..
* .. 0 or more times
(?=\)) Positive lookahead for ")", effectively making us stop before the ")"
This regex is a zero-width assertion, so nothing is lost: it cuts your string just before the first ")" that follows each "Ceiling(".
This solution breaks whenever there are parentheses nested inside "Ceiling()", including nested "Ceiling()" calls. In that case the robust option is to write your own parser, for the same reasons you can't parse arbitrary markup with regex.

Regex.Replace(formulaString, @"(?<=Ceiling\()(.*?)(?=\))", "$1; 1");
Note: this works for a plain Ceiling(...), but not for nested "Ceilings", and not for Ceiling(AnotherFunc(x)) either. For those you need something like:
Regex.Replace(formulaString, @"(?<=Ceiling\()((.*\((?>[^()]+|(?1))*\))*|[^\)]*)(\))", "$1; 1$3");
but I could not get that to work in .NET, which does not support recursive subpattern calls such as (?1); it needs an engine with recursion support, such as PCRE.
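.NET has no (?1) recursion, but it does support balancing groups, which can match nested parentheses. The sketch below is an alternative approach rather than a translation of the pattern above, and the class and method names are hypothetical: it matches "Ceiling(" plus a balanced argument, stops just before the matching ")", and uses a MatchEvaluator that recursively converts the argument so that nested Ceiling calls are handled too.

```csharp
using System.Text.RegularExpressions;

public static class CeilingConverter
{
    // "Ceiling(" followed by a balanced argument. The balancing groups
    // (?<open>\() and (?<-open>\)) push and pop nesting levels, and
    // (?(open)(?!)) rejects the match if any "(" is left unclosed.
    // The final ")" is only checked with a lookahead, not consumed.
    private static readonly Regex CeilingCall = new Regex(
        @"Ceiling\((?<body>(?>[^()]+|(?<open>\()|(?<-open>\)))*)(?(open)(?!))(?=\))");

    public static string ConvertCeilings(string formula) =>
        CeilingCall.Replace(formula, m =>
            // Convert nested Ceiling calls inside the argument first, then
            // append "; 1"; the untouched ")" follows right after it.
            "Ceiling(" + ConvertCeilings(m.Groups["body"].Value) + "; 1");
}
```

With this, "Ceiling(Ceiling(x) / 2)" becomes "Ceiling(Ceiling(x; 1) / 2; 1)", and "Ceiling(Max(a, b))" becomes "Ceiling(Max(a, b); 1)". A "Ceiling(" without a matching ")" is left unchanged.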

This is my solution:
private string ConvertCeiling(string formula)
{
    int ceilingsCount = formula.CountOccurences("Ceiling(");
    int startIndex = 0;
    int bracketCounter;
    for (int i = 0; i < ceilingsCount; i++)
    {
        // Find the next "Ceiling(" at or after the previous one
        startIndex = formula.IndexOf("Ceiling(", startIndex, StringComparison.Ordinal);
        bracketCounter = 0;
        // Walk forward from "Ceiling(", tracking the parenthesis depth
        for (int j = startIndex; j < formula.Length; j++)
        {
            var c = formula[j];
            if (c == '(')
            {
                bracketCounter++;
            }
            if (c == ')')
            {
                bracketCounter--;
                if (bracketCounter == 0)
                {
                    // found the ")" that closes this "Ceiling("
                    formula = formula.Insert(j, "; 1");
                    startIndex++; // move past this "Ceiling(" for the next search
                    break;
                }
            }
        }
    }
    return formula;
}
And the CountOccurences extension method:
public static int CountOccurences(this string value, string parameter)
{
    int counter = 0;
    int startIndex = 0;
    int indexOfCeiling;
    do
    {
        indexOfCeiling = value.IndexOf(parameter, startIndex, StringComparison.Ordinal);
        if (indexOfCeiling < 0)
        {
            break;
        }
        else
        {
            startIndex = indexOfCeiling + 1;
            counter++;
        }
    } while (true);
    return counter;
}

Split string after specific character or after max length

I want to split a string in the following way:
string s = "012345678x0123x01234567890123456789";
s.SplitString("x",10);
should be split into:
012345678
x0123
x012345678
9012345678
9
That is, the input string should be split at each "x" (which starts the next part) or after 10 characters, whichever comes first.
Here is what I've tried so far:
public static IEnumerable<string> SplitString(this string sInput, string search, int maxlength)
{
int index = Math.Min(sInput.IndexOf(search), maxlength);
int start = 0;
while (index != -1)
{
yield return sInput.Substring(start, index-start);
start = index;
index = Math.Min(sInput.IndexOf(search,start), maxlength);
}
}

I would go with this regular expression:
([^x]{1,10})|(x[^x]{1,9})
which means:
Match 1 to 10 characters that are not x, OR match an x followed by 1 to 9 characters that are not x.
Here is working example:
string regex = "([^x]{1,10})|(x[^x]{1,9})";
string input = "012345678x0123x01234567890123456789";
var results = Regex.Matches(input, regex)
.Cast<Match>()
.Select(m => m.Value);
which produces the values you listed.
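A parameterized sketch of the same regex idea, with a hypothetical class and method name. One thing to watch: because the second alternative requires at least one character after the x ({1,9}), an x that is immediately followed by another x, or that ends the input, matches neither alternative and is silently dropped from the results. The sketch uses {0,maxLength-1} for that part, and embeds the separator as a \uXXXX escape so any character can be used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexChunker
{
    // Splits before every `separator` and after at most `maxLength` characters.
    public static IEnumerable<string> SplitAtSeparatorOrLength(this string input, char separator, int maxLength)
    {
        if (maxLength < 1)
            throw new ArgumentOutOfRangeException(nameof(maxLength));

        // \uXXXX is literal both inside and outside a character class,
        // so any separator character is safe to embed in the pattern.
        string s = $@"\u{(int)separator:X4}";

        // Alternative 1: 1..maxLength non-separators.
        // Alternative 2: a separator plus 0..maxLength-1 non-separators
        // ({0,...} rather than {1,...}, so "xx" or a trailing "x" is kept).
        string pattern = $"[^{s}]{{1,{maxLength}}}|{s}[^{s}]{{0,{maxLength - 1}}}";

        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value);
    }
}
```

For the sample input with 'x' and 10 this yields the five parts from the question; for "axxb" it yields "a", "x", "xb".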

Personally, I don't like regex. It produces code that is hard to debug, and it is hard to work out what it is meant to do when you first look at it. So, for a longer but more explicit solution, I would go with something like this:
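public static IEnumerable<string> SplitString(this string sInput, char search, int maxlength)
{
    var result = new List<string>();
    var count = 0;      // index of the current character
    var lastSplit = 0;  // start index of the current chunk
    foreach (char c in sInput)
    {
        // Split before the separator, or once the chunk is maxlength long;
        // count > lastSplit avoids an empty first chunk when the input starts with the separator
        if ((c == search && count > lastSplit) || count - lastSplit == maxlength)
        {
            result.Add(sInput.Substring(lastSplit, count - lastSplit));
            lastSplit = count;
        }
        count++;
    }
    if (count > lastSplit) // don't add an empty chunk for empty input
        result.Add(sInput.Substring(lastSplit, count - lastSplit));
    return result;
}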
Note that I changed the search parameter from a string to a char. This code can probably be optimised further, but it is nice and readable, which for me is more important.

What is the best way to find length of split characters from the given string by using String.Split() Method or Linq Lambda Expression in C#

I have a string "RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa". I want to know the number of "a" characters in the given string. As far as I know, there are two ways to find the count: 1) using String.Split() and 2) using a LINQ lambda expression.
My observations:
1) If I use String.Split(), it returns the wrong result.
2) If I use the LINQ lambda expression, it returns the correct result.
My question is: how can I get the count of a given character in a string by using String.Split()?
Also, which is the better way to count a given character in a string: String.Split() or a LINQ lambda expression?
Here is the complete example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace SplitData
{
class Program
{
static void Main(string[] args)
{
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'a');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'r');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'R');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'm');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'd');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'g');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 's');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'o');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'c');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'u');
SplitData("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'f');
Console.ReadKey();
}
private static void SplitData(string data,char split)
{
// using a lambda expression
int len = data.AsEnumerable().Where(x => x.ToString().ToLower().Contains(split)).Count();
Console.WriteLine("Total '" + split + "' available are:{0} using lambda", len.ToString());
//using normal split function
len = data.Split(split).Length;
Console.WriteLine("Total '" + split + "' available are:{0} using normal split", len.ToString());
}
}
}

string str = "RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa";
int countA = str.Count(r => r == 'a');
If you want a case-insensitive count:
string str = "RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa";
char searchChar = 'a';
int countA = str.Count(r => char.ToUpperInvariant(r) == char.ToUpperInvariant(searchChar));
As for the better option between string.Split and LINQ Count: IMO, LINQ is more readable. I am not sure about the performance, but I suspect the LINQ version is faster, since Split has to allocate an array and a new string for every piece.
Note that Split returns one more piece than there are separators, so you have to subtract 1 to get the count. If you want to use string.Split case-insensitively, construct a character array with two elements (the lowercase and the uppercase letter) and then call Split like this:
string str = "RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa";
char searchChar = 'a';
char[] delimiters = new char[2];
delimiters[0] = char.ToLowerInvariant(searchChar);
delimiters[1] = char.ToUpperInvariant(searchChar);
var count = str.Split(delimiters).Length - 1;

You mean you want to count the occurrences of a letter? Like this?
String data = "RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa";
Char letter = 'a';
Int32 totalOccurrences = data.Count(character => character == letter);

For a case-insensitive comparison, you can use a StringComparer instance or the equivalent StringComparison value. As for how you write it, pick your poison. =)
// caller specifies comparison type
int Count1(string str, char searchChar, StringComparison comparison = StringComparison.CurrentCultureIgnoreCase)
{
    string searchStr = searchChar.ToString();
    int count = 0;
    for (int i = 0; i < str.Length; i++)
        if (string.Equals(searchStr, str[i].ToString(), comparison))
            count++;
    return count;
}
// ordinal comparison
int Count2(string str, char searchChar)
{
    int count = 0;
    for (int i = 0; i < str.Length; i++)
        if (searchChar == str[i])
            count++;
    return count;
}
// ordinal comparison (Split returns one more piece than there are separators)
int Count3(string str, char searchChar)
{
    return str.Split(searchChar).Length - 1;
}
// ordinal comparison
int Count4(string str, char searchChar)
{
    return str.Count(c => c == searchChar);
}
// caller specifies comparison type
int Count5(string str, char searchChar, StringComparison comparison = StringComparison.CurrentCultureIgnoreCase)
{
    string searchStr = searchChar.ToString();
    return str.Count(c => string.Equals(c.ToString(), searchStr, comparison));
}

Not a fancy LINQ solution, but nevertheless:
int count = CountChar("RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa", 'a');
.....
int CountChar(string input, char toFind)
{
int count = 0;
int pos = -1;
while((pos = input.IndexOf(toFind, pos+1)) != -1)
count++;
return count;
}
This uses the String.IndexOf overload that starts searching from a given position; for case-insensitive searches there is also an overload that takes a string and a StringComparison.
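To illustrate the case-insensitive option, here is a sketch of the same IndexOf loop with hypothetical class and method names. It searches for the character as a one-character string with StringComparison.OrdinalIgnoreCase; choosing OrdinalIgnoreCase is an assumption, and a culture-aware comparison may be preferable if you need linguistic casing rules.

```csharp
using System;

public static class CharCounter
{
    // Same loop as CountChar, but using the string overload of IndexOf
    // that accepts a StringComparison, so 'a' and 'A' are both counted.
    public static int CountCharIgnoreCase(string input, char toFind)
    {
        string needle = toFind.ToString();
        int count = 0;
        int pos = -1;
        while ((pos = input.IndexOf(needle, pos + 1, StringComparison.OrdinalIgnoreCase)) != -1)
            count++;
        return count;
    }
}
```

For the question's string, CountCharIgnoreCase(str, 'r') returns 2 (both capital R's), where the case-sensitive CountChar(str, 'r') returns 0.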
EDIT: Now I was curious, so I measured the timing of this and the lambda solution.
The difference is remarkable:
void Main()
{
string str = "RamaSubbaReddyabcdaacacakkkoooahgafffgahgghsa";
Stopwatch sw = new Stopwatch();
sw.Start();
for(int i = 0; i < 10000000; i++)
{
int count = CountChar(str, 'a');
}
sw.Stop();
Console.WriteLine("Using IndexOf:" + sw.ElapsedMilliseconds.ToString());
sw.Reset();
sw.Start();
for(int i = 0; i < 10000000; i++)
{
int countA = str.Count(r => r == 'a');
}
sw.Stop();
Console.WriteLine("Using Count:" + sw.ElapsedMilliseconds.ToString());
}
The first loop ends in 1160 milliseconds and the second one in 6200 milliseconds.
Can someone spot any problems with these measurements?

Finding multiple indexes from source string

Basically, I need to do String.IndexOf(), but get an array of all matching indexes in the source string.
Is there an easy way to get an array of indexes?
I Googled a lot before asking this question, but did not find an easy solution to this simple problem.

How about this extension method:
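public static IEnumerable<int> IndexesOf(this string haystack, string needle)
{
    // An empty needle would match at the same index forever
    if (string.IsNullOrEmpty(needle))
        throw new ArgumentException("Needle must not be empty.", nameof(needle));

    int lastIndex = 0;
    while (true)
    {
        int index = haystack.IndexOf(needle, lastIndex, StringComparison.Ordinal);
        if (index == -1)
        {
            yield break;
        }
        yield return index;
        lastIndex = index + needle.Length; // continue after this match
    }
}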
Note that when looking for "AA" in "XAAAY" this code will now only yield 1.
If you really need an array, call ToArray() on the result. (This is assuming .NET 3.5 and hence LINQ support.)
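If overlapping matches are wanted as well, the only change is how far the search start advances after each hit. Here is a sketch with a hypothetical name and an allowOverlap flag; like the version above it uses ordinal comparison and rejects an empty needle, which would otherwise yield the same index forever.

```csharp
using System;
using System.Collections.Generic;

public static class IndexFinder
{
    // allowOverlap = false: "AA" in "XAAAY" yields 1 (resume after the match).
    // allowOverlap = true:  yields 1 and 2 (resume one character later).
    public static IEnumerable<int> AllIndexesOf(this string haystack, string needle, bool allowOverlap = false)
    {
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("Needle must not be empty.", nameof(needle));

        int step = allowOverlap ? 1 : needle.Length;
        int index = haystack.IndexOf(needle, 0, StringComparison.Ordinal);
        while (index != -1)
        {
            yield return index;
            index = haystack.IndexOf(needle, index + step, StringComparison.Ordinal);
        }
    }
}
```

Because this is an iterator, the empty-needle exception is only thrown once enumeration starts, for example when ToArray() is called.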

var indexs = "Prashant".MultipleIndex('a');
//Extension Method's Class
public static class Extensions
{
static int i = 0;
public static int[] MultipleIndex(this string StringValue, char chChar)
{
var indexs = from rgChar in StringValue
where rgChar == chChar && i != StringValue.IndexOf(rgChar, i + 1)
select new { Index = StringValue.IndexOf(rgChar, i + 1), Increament = (i = i + StringValue.IndexOf(rgChar)) };
i = 0;
return indexs.Select(p => p.Index).ToArray<int>();
}
}

You would have to loop, I suspect:
int start = 0;
string s = "abcdeafghaji";
int index;
while ((index = s.IndexOf('a', start)) >= 0)
{
Console.WriteLine(index);
start = index + 1;
}

A regex-based solution can be more reliable when you only want whole words. IndexOf finds every occurrence, including matches inside longer words, which can lead to unexpected results. This function avoids that by using \b word boundaries from the Regex library:
public static IEnumerable<int> IndexesOf(string haystack, string needle)
{
    // Escape the needle so characters such as "." or "(" are matched literally
    Regex r = new Regex(@"\b(" + Regex.Escape(needle) + @")\b");
    MatchCollection m = r.Matches(haystack);
    return from Match o in m select o.Index;
}

How do I extract text that lies between parentheses (round brackets)?

I have a string "User name (sales)" and I want to extract the text between the brackets. How would I do this?
I suspect Substring, but I can't work out how to read until the closing bracket, since the length of the text will vary.

If you wish to stay away from regular expressions, the simplest way I can think of is:
string input = "User name (sales)";
string output = input.Split('(', ')')[1];

A very simple way to do it is by using regular expressions:
Regex.Match("User name (sales)", @"\(([^)]*)\)").Groups[1].Value
As a response to the (very funny) comment, here's the same Regex with some explanation:
\( # Escaped parenthesis, means "starts with a '(' character"
( # Parentheses in a regex mean "put (capture) the stuff
# in between into the Groups array"
[^)] # Any character that is not a ')' character
* # Zero or more occurrences of the aforementioned "non ')' char"
) # Close the capturing group
\) # "Ends with a ')' character"

Assuming that you only have one pair of parentheses:
string s = "User name (sales)";
int start = s.IndexOf("(") + 1;
int end = s.IndexOf(")", start);
string result = s.Substring(start, end - start);
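The snippet above throws ArgumentOutOfRangeException when a parenthesis is missing, because IndexOf then returns -1 and Substring receives a negative length. A slightly more defensive sketch (hypothetical class and method names) returns null instead:

```csharp
public static class ParenthesisExtractor
{
    // Returns the text between the first '(' and the first ')' after it,
    // or null when either parenthesis is missing.
    public static string TextInParentheses(string s)
    {
        int open = s.IndexOf('(');
        if (open == -1) return null;
        int close = s.IndexOf(')', open + 1);
        if (close == -1) return null;
        return s.Substring(open + 1, close - open - 1);
    }
}
```

TextInParentheses("User name (sales)") returns "sales", and TextInParentheses("User name") returns null.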

Use this function:
public string GetSubstringByString(string a, string b, string c)
{
return c.Substring((c.IndexOf(a) + a.Length), (c.IndexOf(b) - c.IndexOf(a) - a.Length));
}
and here is the usage:
GetSubstringByString("(", ")", "User name (sales)")
and the output would be:
sales

Regular expressions might be the best tool here. If you are not familiar with them, I recommend installing Expresso, a great little regex tool.
Something like:
Regex regex = new Regex("\\((?<TextInsideBrackets>\\w+)\\)");
string incomingValue = "Username (sales)";
string insideBrackets = null;
Match match = regex.Match(incomingValue);
if(match.Success)
{
insideBrackets = match.Groups["TextInsideBrackets"].Value;
}

string input = "User name (sales)";
string output = input.Substring(input.IndexOf('(') + 1, input.IndexOf(')') - input.IndexOf('(') - 1);

A regex maybe? I think this would work...
\(([a-z]+?)\)

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
private IEnumerable<string> GetSubStrings(string input, string start, string end)
{
    // Lazy (.*?) so each match stops at the first "end" after "start"
    Regex r = new Regex(Regex.Escape(start) + "(.*?)" + Regex.Escape(end));
    MatchCollection matches = r.Matches(input);
    foreach (Match match in matches)
        yield return match.Groups[1].Value;
}

int start = input.IndexOf("(") + 1;
int length = input.IndexOf(")") - start;
output = input.Substring(start, length);

Use a Regular Expression:
string test = "(test)";
string word = Regex.Match(test, @"\((\w+)\)").Groups[1].Value;
Console.WriteLine(word);

input.Remove(input.IndexOf(')')).Substring(input.IndexOf('(') + 1);

The regex method is superior, I think, but if you want to use the humble Substring:
string input= "my name is (Jayne C)";
int start = input.IndexOf("(");
int stop = input.IndexOf(")");
string output = input.Substring(start+1, stop - start - 1);
or
string input = "my name is (Jayne C)";
string output = input.Substring(input.IndexOf("(") +1, input.IndexOf(")")- input.IndexOf("(")- 1);

var input = "12(34)1(12)(14)234";
var output = "";
for (int i = 0; i < input.Length; i++)
{
if (input[i] == '(')
{
var start = i + 1;
var end = input.IndexOf(')', i + 1);
output += input.Substring(start, end - start) + ",";
}
}
if (output.Length > 0) // remove last comma
output = output.Remove(output.Length - 1);
// output: "34,12,14"
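The same "34,12,14" result can also be produced with the capture-group regex from the earlier answer, applied to every match. This is a sketch with hypothetical names; unlike the loop, an unmatched "(" is simply skipped instead of making Substring throw.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class AllGroupsExtractor
{
    // Collects the text inside every "(...)" pair and joins it with commas.
    public static string JoinParenthesized(string input) =>
        string.Join(",", Regex.Matches(input, @"\(([^)]*)\)")
                              .Cast<Match>()
                              .Select(m => m.Groups[1].Value));
}
```

JoinParenthesized("12(34)1(12)(14)234") returns "34,12,14"; an input without parentheses returns an empty string.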

Here is a general-purpose, readable function that avoids regex:
// Returns the text between 'start' and 'end'.
string ExtractBetween(string text, string start, string end)
{
int iStart = text.IndexOf(start);
iStart = (iStart == -1) ? 0 : iStart + start.Length;
int iEnd = text.LastIndexOf(end);
if(iEnd == -1)
{
iEnd = text.Length;
}
int len = iEnd - iStart;
return text.Substring(iStart, len);
}
To call it in your particular example you can do:
string result = ExtractBetween("User name (sales)", "(", ")");

I'm finding that regular expressions are extremely useful but very difficult to write. So, I did some research and found this tool that makes writing them so easy.
Don't shy away from them because the syntax is difficult to figure out. They can be so powerful.

This code is faster than most (if not all) of the solutions here. It is packaged as a String extension method and does not support recursive nesting:
public static string GetNestedString(this string str, char start, char end)
{
int s = -1;
int i = -1;
while (++i < str.Length)
if (str[i] == start)
{
s = i;
break;
}
int e = -1;
while(++i < str.Length)
if (str[i] == end)
{
e = i;
break;
}
if (e > s)
return str.Substring(s + 1, e - s - 1);
return null;
}
This one is a little longer and slower, but it handles recursive nesting:
public static string GetNestedString(this string str, char start, char end)
{
int s = -1;
int i = -1;
while (++i < str.Length)
if (str[i] == start)
{
s = i;
break;
}
int e = -1;
int depth = 0;
while (++i < str.Length)
if (str[i] == end)
{
e = i;
if (depth == 0)
break;
else
--depth;
}
else if (str[i] == start)
++depth;
if (e > s)
return str.Substring(s + 1, e - s - 1);
return null;
}

I've been using and abusing C# 9 recently, and I can't help throwing in Spans even in questionable scenarios... Just for the fun of it, here's a variation on the answers above:
var input = "User name (sales)";
var txtSpan = input.AsSpan();
var startPoint = txtSpan.IndexOf('(') + 1;
var length = txtSpan.LastIndexOf(')') - startPoint;
var output = txtSpan.Slice(startPoint, length).ToString();
For the OP's specific scenario, it produces the right output.
(Personally, I'd use a regex, as posted by others. It makes it easier to handle the trickier scenarios where the solution above falls apart.)
A better version (as extension method) I made for my own project:
//Note: This only captures the first occurrence, but
//can be easily modified to scan across the text (I'd prefer Slicing a Span)
public static string ExtractFromBetweenChars(this string txt, char openChar, char closeChar)
{
ReadOnlySpan<char> span = txt.AsSpan();
int firstCharPos = span.IndexOf(openChar);
int lastCharPos = -1;
if (firstCharPos != -1)
{
for (int n = firstCharPos + 1; n < span.Length; n++)
{
if (span[n] == openChar) firstCharPos = n; //This allows the opening char position to change
if (span[n] == closeChar) lastCharPos = n;
if (lastCharPos > firstCharPos) break;
//This would correctly extract "sales" from this [contrived]
//example: "just (a (name (sales) )))(test"
}
if (lastCharPos == -1) return ""; // no closing char after the opening one
return span.Slice(firstCharPos + 1, lastCharPos - firstCharPos - 1).ToString();
}
return "";
}

Very similar to @Gustavo Baiocchi Costa's answer, but the length is calculated with an intermediate Substring:
int innerTextStart = input.IndexOf("(") + 1;
int innerTextLength = input.Substring(innerTextStart).IndexOf(")");
string output = input.Substring(innerTextStart, innerTextLength);

I came across this while looking for a solution to a very similar problem.
Here is a snippet from my actual code. It takes the substring starting from the first character (index 0) up to the separator:
string separator = "\n"; //line terminator
string output;
string input= "HowAreYou?\nLets go there!";
output = input.Substring(0, input.IndexOf(separator));
