Count occurrences of a string pattern in a string using LINQ - C#

I'm working on correctly mapping links on websites.
I need to be able to count how often ../ occurs in a string. At the moment I have a function that loops through the string and counts. While this works, I'm looking for a LINQ solution.
I know that I can count a single character like this:
int count = Href.Count(f => f == '/');
But can I use LINQ to count how often the pattern ../ occurs? Is this possible?

You can do that nicely with Regex
var dotdotslash = new Regex(@"\.\./");
string test = "../../bla/../";
int count = dotdotslash.Matches(test).Count;
↓
3

You could use this extension method:
public static int ContainsCount(this string input, string subString, bool countIntersecting = true, StringComparison comparison = StringComparison.CurrentCulture)
{
int occurences = 0;
int step = countIntersecting ? 1 : subString.Length;
int index = -step;
while ((index = input.IndexOf(subString, index + step, comparison)) >= 0)
occurences++;
return occurences;
}
which returns the number of sub-strings in a given string with pure string-methods:
int count = Href.ContainsCount("../");
Plain string methods such as IndexOf typically avoid the overhead of LINQ or regex, so this approach tends to be the more efficient one.
This method supports counting overlapping sub-strings (the default) and non-overlapping sub-strings.
This shows the difference:
string str = "ottotto";
int count = str.ContainsCount("otto"); // 2
count = str.ContainsCount("otto", false); // 1

Yes, it's possible, but it's very awkward, it will be slow, and it will be hard to read. Don't use it.
How would you count occurrences of a string within a string?
src.Select((c, i) => src.Substring(i)).Count(sub => sub.StartsWith(target))
Alternatively, this looks pretty beautiful:
public static class StringExtensions
{
public static IEnumerable<int> IndexOfAll(this string input, string value){
var currentIndex = 0;
while((currentIndex = input.IndexOf(value, currentIndex)) != -1)
yield return currentIndex++;
}
}
and usage:
"TESTHATEST"
.IndexOfAll("TEST")
.Count()
.Dump();

Regular expression (see Dmitry Ledentsov's answer) is much better here; however Linq is also possible:
String source = @"abc../def../";
// 2
int result = source
.Where((item, index) => source.Substring(index).StartsWith(@"../"))
.Count();

Actually, you can do it in a really LINQy (and awkward :) ) way like this:
private static int CountPatternAppearancesInString(string str, string pattern)
{
var count = str
.Select(
(_, index) =>
index < str.Length - pattern.Length + 1 &&
str.Skip(index)
.Take(pattern.Length)
.Zip(pattern, (strChar, patternChar) => strChar == patternChar)
.All(areEqual => areEqual))
.Count(isMatch => isMatch);
return count;
}
Or, using some of the String-provided methods:
private static int CountPatternAppearancesInString(string str, string pattern)
{
var count = str
.Select(
(_, index) =>
index < str.Length - pattern.Length + 1 &&
str.IndexOf(pattern, index, pattern.Length) >= 0)
.Count(isMatch => isMatch);
return count;
}
But, as already said, it is suboptimal and serves for illustration purpose only.

Related

Odd-Even Index String Sort

Task:
Implement the method at each iteration of which, the odd characters of the string are combined and wrapped to its beginning, and the even characters are wrapped to the end.
"source" The source string.
"count" The count of iterations.
My code:
public static string ShuffleChars(string s, int count)
{
string res = string.Empty;
for (int i = 0; i <= count; i++)
{
res = $"{string.Concat(s.Where((x, i) => i % 2 == 0))}{string.Concat(s.Where((x, i) => i % 2 != 0))}";
}
}
return res;
I can shuffle the string once, but I don't know how to repeat the iterations on the resulting value. I tried using a for loop, but it is not working. Please help.
I need the result to look like this:
1."123456789"
2."135792468" first iteration
3."159483726" second iteration
4."198765432" third iteration
But if I use a loop, it returns "135792468" whether count = 2 or count = 10, and I don't know why.
The problems with your code are:
You return from inside the loop. This prevents any but the first iteration from completing.
You use <= instead of < in your loop condition. Since we start at 0, this will iterate count + 1 times.
You use the same variable name i for the loop counter as you do in the Where clause, which is illegal since they're in the same scope.
To resolve these issues (and use string.Concat instead of string.Join):
public static string ShuffleChars(string s, int count)
{
for (int i = 0; i < count; i++)
{
s = string.Concat(s.Where((item, index) => index % 2 == 0)) +
string.Concat(s.Where((item, index) => index % 2 != 0));
}
return s;
}
Testing the output:
static void Main()
{
var input = "123456789";
Console.WriteLine($"Starting input = {input}");
Console.WriteLine($"One iteration = {ShuffleChars(input, 1)}");
Console.WriteLine($"Two iterations = {ShuffleChars(input, 2)}");
Console.WriteLine($"Three iterations = {ShuffleChars(input, 3)}");
Console.Write("\nDone! Press any key to exit...");
Console.ReadKey();
}
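Output:
Starting input = 123456789
One iteration = 135792468
Two iterations = 159483726
Three iterations = 198765432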
If I'm reading your problem correctly, you pretty much want to do what your code is doing; shuffle the characters at odd positions to the beginning and even positions to the end.
However, you want to continue to shuffle them, the number of times that you pass in, count. If you try to just loop what you have, you're continuing to use the original string that you passed in, s, and then will always end up returning the same value.
The easiest way to accomplish this is to declare an output string that you continue to assign to until you break out of the loop. So something like:
public static string ShuffleChars(string s, int count)
{
var output = s;
for(int i = 0; i < count; i++) {
output = string.Join("", output.Where((v, j) => j % 2 == 0))
+ string.Join("", output.Where((v, j) => j % 2 != 0));
}
return output;
}
The key here is that you are declaring a new value, output, and initializing it to your string value you passed in. Then for each iteration of the loop, you reassign the value of output to the new value. Finally, once you break out of the loop, you return the final value of output.
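As others have stated, there are other ways you could write the assignment line. Personally, I probably prefer string interpolation, although each filtered sequence still has to be turned back into a string with string.Concat (interpolating the sequence directly would print its type name):
output = $"{string.Concat(output.Where((v, j) => j % 2 == 0))}{string.Concat(output.Where((v, j) => j % 2 != 0))}";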
You could convert the string to an IEnumerable<char>, and then apply the same LINQ transformations a count number of times. Finally materialize the IEnumerable<char> to a char[] using the ToArray operator, and then convert the array back to a string.
public static string ShuffleChars(string s, int count)
{
IEnumerable<char> chars = s;
foreach (var _ in Enumerable.Range(0, count))
{
chars = chars
.Select((c, i) => (c, i))
.OrderBy(e => e.i % 2)
.Select(e => e.c);
}
return new String(chars.ToArray());
}

find Count of Substring in string C#

I am trying to find how many times the word "Serotonin" appears in the gathered web data but cannot find a method for finding the number of times.
IEnumerator OnMouseDown()
{
string GatheredData;
StringToFind = "Serotonin"
string url = "https://en.wikipedia.org/wiki/Dopamine";
WWW www = new WWW(url);
yield return www;
GatheredData = www.text;
//Attempted methods below
M1_count = GatheredData.Contains(StringToFind);
M1_count = GatheredData.Count(StringToFind);
M1_count = GatheredData.IndexOf(StringToFind);
}
I can easily use the results of methods 1 and 3 when I tell it which index to look at, and method 2 would work but only accepts chars, not strings.
I have searched online and on here but found nothing about counting the occurrences of StringToFind.
Assume string is like this
string test = "word means collection of chars, and every word has meaning";
then just use regex to find how many times word is matched in your test string like this
int count = Regex.Matches(test, "word").Count;
output would be 2
The solution
int count = Regex.Matches(someString, potencialSubstring).Count;
did not work for me, even though I used Regex.Escape(str).
So I wrote it myself. It is quite slow, but performance is not an issue in my app.
private static List<int> StringOccurencesCount(String haystack, String needle, StringComparison strComp)
{
var results = new List<int>();
int index = haystack.IndexOf(needle, strComp);
while (index != -1)
{
results.Add(index);
index = haystack.IndexOf(needle, index + needle.Length, strComp);
}
return results;
}
Maybe someone will find this useful.
Improvement on @Petr Nohejl's excellent answer:
public static int Count (this string s, string substr, StringComparison strComp = StringComparison.CurrentCulture)
{
int count = 0, index = s.IndexOf(substr, strComp);
while (index != -1)
{
count++;
index = s.IndexOf(substr, index + substr.Length, strComp);
}
return count;
}
This does not use Regex.Matches and probably has better performance and is more predictable.
See on .NET Fiddle
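One likely reason the Regex.Matches approach can misbehave, as reported above, is that the search string is interpreted as a pattern rather than as literal text. The following is only an illustrative sketch (the class and method names are hypothetical and not taken from the answers) comparing an unescaped pattern, an escaped pattern, and a plain IndexOf loop on a needle that contains a regex metacharacter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstringCountDemo
{
    // Treats needle as a regex pattern: "." matches any character.
    public static int CountAsPattern(string haystack, string needle)
    {
        return Regex.Matches(haystack, needle).Count;
    }

    // Escapes metacharacters so that needle is matched literally.
    public static int CountEscaped(string haystack, string needle)
    {
        return Regex.Matches(haystack, Regex.Escape(needle)).Count;
    }

    // Plain IndexOf loop: non-overlapping matches, ordinal comparison.
    public static int CountWithIndexOf(string haystack, string needle)
    {
        if (string.IsNullOrEmpty(needle)) return 0;
        int count = 0;
        int index = haystack.IndexOf(needle, StringComparison.Ordinal);
        while (index != -1)
        {
            count++;
            index = haystack.IndexOf(needle, index + needle.Length, StringComparison.Ordinal);
        }
        return count;
    }
}
```

With the haystack "a.b axb a.b" and the needle "a.b", the unescaped pattern also counts "axb", while the escaped pattern and the IndexOf loop count only the two literal occurrences.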
If you don't worry about performance, here are 3 alternative solutions:
int Count(string src, string target)
=> src.Length - src.Replace(target, target[1..]).Length;
int Count(string src, string target)
=> src.Split(target).Length - 1;
int Count(string src, string target)
=> Enumerable
.Range(0, src.Length - target.Length + 1)
.Count(index => src.Substring(index, target.Length) == target);
Oh yes, I have it now.
I will split() the array and get the length
2nd to that I will IndexOf until I return a -1
Thanks for help in the comments!
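A possible solution would be to use Regex with word boundaries. The pattern has to be a verbatim string (in a regular string literal \b is a backspace character), and the match has to ignore case so that the capitalized search term is still found:
var count = Regex.Matches(GatheredData, String.Format(@"\b{0}\b", Regex.Escape(StringToFind)), RegexOptions.IgnoreCase).Count;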

IndexOf n-th character of that type

I need to get the index of the n-th character of that type in a string.
Example:
string test = "asdfasdfasdf";
int index = test.IndexOf("a", 3);
and index would be 8, since the 3rd a has the index of 8
Is there a function that does that or do you have any idea how to do that in a smart way?
You can do it in a single line, but it is not pretty:
var s = "aabbabhjhjdsfbaxt";
var idx = s.Select((c, i) => new {c, i})
.Where(p => p.c == 'a')
.Skip(2)
.FirstOrDefault()?.i ?? -1;
The idea is to pair up characters with their indexes, filter by character, skip n-1 items, and take the next one if it exists.
Another approach would be to use regex with look-behind:
var idx = Regex.Match(s, "(?<=(a[^a]*){2})a").Index;
This matches an 'a' preceded by two more 'a's, possibly with other characters in the middle.
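One caveat with the regex variant: when there is no n-th occurrence, Regex.Match returns a failed match whose Index is 0, which cannot be told apart from a hit at position 0. Below is a minimal sketch that checks Success first. The helper name NthIndexOfRegex is hypothetical, and building the pattern from the character and n is an assumption layered on top of the answer's fixed pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexNthIndex
{
    // Returns the 0-based index of the n-th (1-based) occurrence of c, or -1.
    public static int NthIndexOfRegex(string s, char c, int n)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (n <= 0) return -1;

        // \uXXXX is a literal character both inside and outside a character class.
        string ch = "\\u" + ((int)c).ToString("x4");

        // For 'a' and n = 3 this is equivalent to (?<=(?:a[^a]*){2})a
        string pattern = "(?<=(?:" + ch + "[^" + ch + "]*){" + (n - 1) + "})" + ch;

        Match match = Regex.Match(s, pattern);
        return match.Success ? match.Index : -1;
    }
}
```

For the string used above, NthIndexOfRegex("aabbabhjhjdsfbaxt", 'a', 3) gives 4, and asking for a fifth 'a' gives -1 instead of 0.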
A simple for loop will do the trick, but you can place the for loop in an extension method to get this functionality as a one-liner throughout your application.
Like this:
public static int NthIndexOf(this string text, char letter, int occurrence)
{
if (text == null) throw new ArgumentNullException(nameof(text));
int count = 0;
for (int index = 0; index < text.Length; index++)
if (text[index] == letter)
{
if (++count == occurrence)
return index;
}
return -1;
}
Usage:
string test = "asdfasdfasdf";
int index = test.NthIndexOf('a', 3); // 8
You can create an extension method for this like that:
public static int NthIndexOf(this string s, char c, int occurence)
{
if (s == null) throw new ArgumentNullException(nameof(s));
if (occurence <= 0) return -1;
int i = -1, o = 0;
do
{
i = s.IndexOf(c, i+1);
o++;
} while (i >= 0 && o < occurence);
return i;
}
After checking the input arguments, it repeatedly calls string.IndexOf() to get the n-th index of the desired character c.
If no further occurrence of c is found (IndexOf returns -1), the loop breaks.
Usage:
string test = "asdfasdfasdf";
int index = test.NthIndexOf('a', 3); // 8
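Of course you can do the same with a parameter of type string for c instead of char. If overlapping matches should be skipped, the loop would then continue from i + c.Length after each match instead of i + 1, while the very first search still has to start at index 0.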
As René Vogt said you should use an extension method chaining calls to IndexOf() for performance reasons:
public static int NthIndexOf(this string s, string value, int n)
{
var index = -1;
for (; n > 0; n--)
{
index = s.IndexOf(value, index + 1);
if (index < 0) break; // fewer than n occurrences: stop instead of restarting the search at 0
}
return index;
}
This returns the 0-based start index of the (1-based) nth occurrence of the string value.
Here is a working example: https://dotnetfiddle.net/xlqf2B

.indexOf for multiple results

Let's say I have a text and I want to locate the positions of each comma. The string, a shorter version, would look like this:
string s = "A lot, of text, with commas, here and,there";
Ideally, I would use something like:
int[] i = s.indexOf(',');
but since indexOf only returns the first comma, I instead do:
List<int> list = new List<int>();
for (int i = 0; i < s.Length; i++)
{
if (s[i] == ',')
list.Add(i);
}
Is there an alternative, more optimized way of doing this?
Here is an extension method for that, used the same way as IndexOf:
public static IEnumerable<int> AllIndexesOf(this string str, string searchstring)
{
int minIndex = str.IndexOf(searchstring);
while (minIndex != -1)
{
yield return minIndex;
minIndex = str.IndexOf(searchstring, minIndex + searchstring.Length);
}
}
so you can use
s.AllIndexesOf(","); // 5 14 27 37
https://dotnetfiddle.net/DZdQ0L
You could use Regex.Matches(string, string) method. This will return a MatchCollection and then you could determine the Match.Index. MSDN has a good example,
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"\b\w+es\b";
string sentence = "Who writes these notes?";
foreach (Match match in Regex.Matches(sentence, pattern))
Console.WriteLine("Found '{0}' at position {1}",
match.Value, match.Index);
}
}
// The example displays the following output:
// Found 'writes' at position 4
// Found 'notes' at position 17
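Applied to the comma question, the same Regex.Matches call could be sketched roughly as follows. The class and method names are hypothetical, and this is meant as an illustration of collecting Match.Index values, not as a faster alternative to the loop.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CommaFinder
{
    // Returns the index of every comma in the text, in order of appearance.
    public static int[] CommaIndexes(string text)
    {
        return Regex.Matches(text, ",")
            .Cast<Match>()          // MatchCollection is non-generic on older frameworks
            .Select(m => m.Index)
            .ToArray();
    }
}
```

For the string from the question, "A lot, of text, with commas, here and,there", this yields 5, 14, 27 and 37.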
IndexOf also allows you to add another parameter for where to start looking. You can set that parameter to be the last known comma location +1. For example:
string s = "A lot, of text, with commas, here and, there";
int loc = s.IndexOf(',');
while (loc != -1) {
Console.WriteLine(loc);
loc = s.IndexOf(',', loc + 1);
}
You could use the overload of the IndexOf method that also takes a start index to get the following comma, but you would still have to do that in a loop, and it would perform pretty much the same as the code that you have.
You could use a regular expression to find all commas, but that produces quite some overhead, so that's not more optimised than what you have.
You could write a LINQ query to do it in a different way, but that also has some overhead so it's not more optimised than what you have.
So, there are many alternative ways, but not any way that is more optimised.
A bit unorthodox, but why not use a split? Might be less aggressive than iterating over the entire string
string longString = "Some, string, with, commas.";
string[] splitString = longString.Split(",");
int numSplits = splitString.Length - 1;
Debug.Log("number of commas "+numSplits);
Debug.Log("first comma index = "+GetIndex(splitString, 0)+" second comma index = "+GetIndex(splitString, 1));
public int GetIndex(string[] stringArray, int num)
{
int charIndex = 0;
for (int n = num; n >= 0; n--)
{
charIndex+=stringArray[n].Length;
}
return charIndex + num;
}

Splitting string into pairs with c#

Is there a way to break a string into pairs without looking at indexes? e.g.
TVBMCVTVFGTVTB would be broken into a list of strings as such:
[TV,BM,CV,TV,FG,TV,TB]
Perhaps I should have worded the question to ask whether there is a function similar to string.Join or string.Split for breaking strings into groups.
Oh come on, just use indexes like this:
public static class StringExtensions {
public static IEnumerable<string> TakeEvery(this string s, int count) {
int index = 0;
while(index < s.Length) {
if(s.Length - index >= count) {
yield return s.Substring(index, count);
}
else {
yield return s.Substring(index, s.Length - index);
}
index += count;
}
}
}
I have added no guard clauses.
Usage:
var items = "TVBMCVTVFGTVTB".TakeEvery(2);
foreach(var item in items) {
Console.WriteLine(item);
}
If you like some esoteric solutions:
1)
string s = "TVBMCVTVFGTVTB";
var splitted = Enumerable.Range(0, s.Length)
.GroupBy(x => x / 2)
.Select(x => new string(x.Select(y => s[y]).ToArray()))
.ToList();
2)
string s = "ABCDEFGHIJKLMN";
var splitted = Enumerable.Range(0, (s.Length + 1) / 2)
.Select(i =>
s[i * 2] +
((i * 2 + 1 < s.Length) ?
s[i * 2 + 1].ToString() :
string.Empty))
.ToList();
If you REALLY want to avoid using indexes...
You could use a Regex such as "\w\w" or "\w{2,2}", or some variation like that, together with the Regex.Matches method (see MSDN) to get a MatchCollection which would contain the matches as pairs of characters. Change \w in the regex pattern to suit your needs.
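A minimal sketch of the regex idea, assuming the input consists only of word characters. The names PairSplitter and SplitIntoPairs are hypothetical, and the pattern \w{1,2} is a variation chosen here so that a trailing single character of an odd-length string is kept.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PairSplitter
{
    // Splits the input into chunks of two word characters;
    // the last chunk has one character if the length is odd.
    public static List<string> SplitIntoPairs(string input)
    {
        return Regex.Matches(input, @"\w{1,2}")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Because the quantifier is greedy, each match takes two characters whenever two are available, so "TVBMCVTVFGTVTB" is split into TV, BM, CV, TV, FG, TV, TB.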
List<string> result = new List<string>();
while (original.Length > 0)
{
result.Add(new String(original.Take(2).ToArray()));
original = new String(original.Skip(2).ToArray());
}
return result;
The LINQ probably uses indices somewhere internally, but I didn't touch any of them so I consider this valid. It works for odd-length originals, too.
Edit:
Thanks Heinzi for the correction. Demo: http://rextester.com/MWCKYD83206
Convert the string into a char array and then iterate along that making new strings out of pairs of characters.
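The char array approach described above could look roughly like this. It is only a sketch with hypothetical names (CharArrayPairs, ToPairs), and it does step through the array by index, so it does not avoid indexes in the strict sense of the question.

```csharp
using System;
using System.Collections.Generic;

public static class CharArrayPairs
{
    // Walks the char array two characters at a time and builds a string per pair.
    public static List<string> ToPairs(string input)
    {
        char[] chars = input.ToCharArray();
        var pairs = new List<string>();
        for (int i = 0; i < chars.Length; i += 2)
        {
            // The last chunk may contain a single character.
            int length = Math.Min(2, chars.Length - i);
            pairs.Add(new string(chars, i, length));
        }
        return pairs;
    }
}
```

The string(char[], int, int) constructor copies the requested slice, so no intermediate substrings of the original string are needed.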
1) Split list into pairs
var s = "TVBMCVTVFGTVTB";
var pairs = Enumerable.Range(0, s.Length / 2)
.Select(i => String.Concat(s.Skip(i * 2).Take(2)));
This will work if you know that s always is of even length, or you don't accept, or don't care about, ending with singletons for strings with odd length.
2) Split list into pairs - include any singleton remainders
If you want to include singleton remainders, for odd length strings, you can simply use ceiling:
var s = "TVBMCVTVFGTVTB";
var pairs = Enumerable.Range(0, (int)Math.Ceiling(s.Length / 2D))
.Select(i => String.Concat(s.Skip(i * 2).Take(2)));
