How to check how many times a string occurs in another string - C#

I want to see how many times a string occurs in another string. For example, I want to see how many times 2018 occurs in this paragraph:
zaeazeaze2018
azeazeazeazeaze2018azezaaze
azeaze4azeaze2018
In this case it occurs 3 times.
I tried the following code, but the problem is that it always returns 0, and I can't find the mistake:
public static string count(string k)
{
int i = 0;
foreach(var line in k)
{
if (line.ToString().Contains("Bestellung sehen"))
{
i++;
i = +i;
}
}
return i.ToString();
}

Use this:
string text = "Hello2018,world2018\r\nWe have five 2018 here\r\n2018is coming2018";
int counter = Regex.Matches(text, "2018").Count;
Console.WriteLine(counter); // writes: 5

You can use regular expressions to handle such cases. They give you a lot of flexibility for pattern matching in a string. Here is a sample using regular expressions:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string str="zaeazeaze2018azeazeazeazeaze2018azezaazeazeaze4azeaze2018";
string regexPattern = @"2018";
int numberOfOccurence = Regex.Matches(str, regexPattern).Count;
Console.WriteLine(numberOfOccurence);
}
}
Working example: https://dotnetfiddle.net/PGgbm8
If you look at the line string regexPattern = @"2018";, it sets the pattern to find all occurrences of 2018 in your string. You can change this pattern to suit your needs. For example, if I changed the pattern to string regexPattern = @"\d+";, the output would be 4, because that pattern matches every run of digits in the string (2018, 2018, 4 and 2018).

This can be accomplished using Regular Expressions with the following:
using System.Text.RegularExpressions;
public static int count(string fullString, string searchPattern)
{
int i = Regex.Matches(fullString, searchPattern).Count;
return i;
}
For example, the following returns 2 as an int, not string:
count("asdfasdfasfdfindmeasdfadfasdasdfasdffindmesadf","findme")
I find this is quick enough for most of my use cases.
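One caveat: Regex.Matches treats searchPattern as a regular expression, so a search string containing characters such as . or + is not matched literally. Here is a minimal sketch, assuming a literal, non-overlapping count is what you want. The names SubstringCounter and CountLiteral are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstringCounter
{
    // Counts non-overlapping literal occurrences of search in text.
    // Regex.Escape neutralises regex metacharacters such as '.' or '+',
    // so the search string is matched character for character.
    public static int CountLiteral(string text, string search)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(search))
        {
            return 0;
        }
        return Regex.Matches(text, Regex.Escape(search)).Count;
    }
}
```

Without the Regex.Escape call, searching for "a.b" would also count "axb", because . matches any character.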

// str is the string passed to your count method
// str2 is the substring to look for, i.e. "Bestellung sehen"
int n = str2.Length;
int k = 0;
// Stop once fewer than n characters remain
for (int i = 0; i + n <= str.Length; i++)
{
// Compare the n characters starting at i with the substring
if (str.Substring(i, n) == str2)
{
k++;
}
}
return k.ToString();
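If you would rather avoid both regular expressions and manual substring comparisons, String.IndexOf can do the scanning. This is a minimal sketch, assuming non-overlapping matches and ordinal (culture-insensitive) comparison are what you want. The names IndexOfCounter and Count are made up here.

```csharp
using System;

public static class IndexOfCounter
{
    // Counts non-overlapping occurrences of search in text using IndexOf.
    public static int Count(string text, string search)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(search))
        {
            return 0;
        }

        int count = 0;
        int index = text.IndexOf(search, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            // Continue searching just past the match that was found.
            index = text.IndexOf(search, index + search.Length, StringComparison.Ordinal);
        }
        return count;
    }
}
```

Because the search resumes after each match, Count("aaa", "aa") is 1 rather than 2.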

Related

Regex C# is it possible to use a variable in substitution?

I have a bunch of strings in a text, which look something like this:
h1. this is the Header
h3. this one the header too
h111. and this
And I have a function that is supposed to process this text depending on the level (iteration, say) at which it is called:
public void ProcessHeadersInText(string inputText, int atLevel = 1)
So when it is called as
ProcessHeadersInText(inputText, 2)
the output should be:
<h3>this is the Header</h3>
<h5>this one the header too</h5>
<h9>and this</h9>
(The last one looks like this because if the value after the letter h is more than 9, it is supposed to be 9 in the output.)
So I started to think about using regex.
Here's the example: https://regex101.com/r/spb3Af/1/
(As you can see, I came up with the regex (^(h([\d]+)\.+?)(.+?)$) and tried to use the substitution <h$3>$4</h$3> with it.)
It's almost what I'm looking for, but I need to add some logic for the heading level.
Is it possible to use variables in the substitution?
Or do I need to find another way (extract all headings first, replace them taking the function argument and the header value into account, and only then use the regex I wrote)?
The regex you may use is
^h(\d+)\.+\s*(.+)
If you need to make sure the match does not span across lines, you may replace \s with [^\S\r\n]. See the regex demo.
When replacing in C#, parse the Group 1 value to int and add atLevel to it inside a match evaluator passed to the Regex.Replace method.
Here is the example code that will help you:
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.IO;
public class Test
{
// Demo: https://regex101.com/r/M9iGUO/2
public static readonly Regex reg = new Regex(@"^h(\d+)\.+\s*(.+)", RegexOptions.Compiled | RegexOptions.Multiline);
public static void Main()
{
var inputText = "h1. Topic 1\r\nblah blah blah, because of bla bla bla\r\nh2. PartA\r\nblah blah blah\r\nh3. Part a\r\nblah blah blah\r\nh2. Part B\r\nblah blah blah\r\nh1. Topic 2\r\nand its cuz blah blah\r\nFIN";
var res = ProcessHeadersInText(inputText, 2);
Console.WriteLine(res);
}
public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
return reg.Replace(inputText, m =>
string.Format("<h{0}>{1}</h{0}>",
// Add the level first, then cap the result at 9
Math.Min(9, int.Parse(m.Groups[1].Value) + atLevel),
m.Groups[2].Value.Trim()));
}
}
See the C# online demo
Note I am using .Trim() on m.Groups[2].Value as . matches \r. You may use TrimEnd('\r') to get rid of this char.
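To see why the trimming is needed: in .NET, . matches every character except \n, so with Windows line endings the trailing \r ends up in Group 2. Here is a small sketch using the same pattern. The class and method names (HeaderContent, RawTitle, CleanTitle) are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class HeaderContent
{
    private static readonly Regex HeaderRegex =
        new Regex(@"^h(\d+)\.+\s*(.+)", RegexOptions.Multiline);

    // Returns Group 2 exactly as captured, including a trailing '\r' if present.
    public static string RawTitle(string text) =>
        HeaderRegex.Match(text).Groups[2].Value;

    // Removes only the trailing carriage return and keeps other whitespace.
    public static string CleanTitle(string text) =>
        RawTitle(text).TrimEnd('\r');
}
```

For the input "h1. Topic 1\r\nblah", RawTitle returns "Topic 1\r" and CleanTitle returns "Topic 1".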
You can use a regex like the one below to fix your issue.
Regex.Replace(s, @"^(h\d+)\.(.*)$", @"<$1>$2</$1>", RegexOptions.Multiline)
Let me explain what this does:
// This will capture the header number which is followed
// by a '.' but ignore the . in the capture
(h\d+)\.
// This will capture the remaining of the string till the end
// of the line (see the multi-line regex option being used)
(.*)$
The parentheses capture into groups that can be referenced as "$1" for the first capture and "$2" for the second capture.
Try this:
private static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
// Group 1 = value after 'h'
// Group 2 = Content of header without leading whitespace
string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
return Regex.Replace(inputText, pattern, match => EvaluateHeaderMatch(match, atLevel), RegexOptions.Multiline);
}
private static string EvaluateHeaderMatch(Match m, int atLevel)
{
int hVal = int.Parse(m.Groups[1].Value) + atLevel;
if (hVal > 9) { hVal = 9; }
return $"<h{hVal}>{m.Groups[2].Value}</h{hVal}>";
}
Then just call
ProcessHeadersInText(input, 2);
This uses the Regex.Replace(string, string, MatchEvaluator, RegexOptions) overload with a custom evaluator function.
You could of course streamline this solution into a single function with an inline lambda expression:
public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
return Regex.Replace(inputText, pattern,
match =>
{
int hVal = int.Parse(match.Groups[1].Value) + atLevel;
if (hVal > 9) { hVal = 9; }
return $"<h{hVal}>{match.Groups[2].Value}</h{hVal}>";
},
RegexOptions.Multiline);
}
There are a lot of good solutions in this thread, but I don't think you really need a regex for your problem. For fun and as a challenge, here is a non-regex solution:
Try it online!
using System;
using System.Linq;
public class Program
{
public static void Main()
{
string extractTitle(string x) => x.Substring(x.IndexOf(". ") + 2);
string extractNumber(string x) => x.Remove(x.IndexOf(". ")).Substring(1);
string build(string n, string t) => $"<h{n}>{t}</h{n}>";
var inputs = new [] {
"h1. this is the Header",
"h3. this one the header too",
"h111. and this" };
foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
{
Console.WriteLine(line);
}
}
}
I use C# 7 local functions and C# 6 interpolated strings. If you want, I can use older C#. The code should be easy to read; I can add comments if needed.
C# 5 version:
using System;
using System.Linq;
public class Program
{
static string extractTitle(string x)
{
return x.Substring(x.IndexOf(". ") + 2);
}
static string extractNumber(string x)
{
return x.Remove(x.IndexOf(". ")).Substring(1);
}
static string build(string n, string t)
{
return string.Format("<h{0}>{1}</h{0}>", n, t);
}
public static void Main()
{
var inputs = new []{
"h1. this is the Header",
"h3. this one the header too",
"h111. and this"
};
foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
{
Console.WriteLine(line);
}
}
}

Unusual Regex behavior in c#

I have a Regex that is behaving rather oddly and I can't figure out why. Original Regex:
Regex regex = new Regex(@"(?i)\d\.\d\dv");
This expression returns/matches an equivalent to 1.35V or 1.35v, which is what I want. However, it is not exclusive enough for my program and it returns some strings I don't need.
Modified Regex:
Regex rgx = new Regex(@"(?i)\d\.\d\dv\s");
Simply by adding '\s' to the expression, it matches/returns DDR3, which is not at all what I want. I'm guessing some sort of inversion is occurring, but I don't understand why and I can't seem to find a reference to explain it. All I wanted to do was add a space to the end of expression to filter a few more results.
Any help would be greatly appreciated.
EDIT:
Here is a functional test case with a generic version of what is going on in my code. Just create a new WPF project in Visual Studio, copy and paste the code, and it should reproduce the results for you.
namespace WpfApplication1
{
/// <summary>
/// Interaction logic for MainWindow.xaml
/// </summary>
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
}
Regex rgx1 = new Regex(@"(?i)\d\.\d\dv");
Regex rgx2 = new Regex(@"(?i)\d\.\d\dv\s");
string testCase = @"DDR3 Vdd | | | | | 1.35v |";
string str = null;
public void IsMatch(string input)
{
Match rgx1Match = rgx1.Match(input);
if (rgx1Match.Success)
{
GetInfo(input);
}
}
public void GetInfo(string input)
{
Match rgx1Match = rgx1.Match(input);
Match rgx2Match = rgx2.Match(input);
string[] tempArray = input.Split();
int index = 0;
if (rgx1Match.Success)
{
index = GetMatchIndex(rgx1, tempArray);
str = tempArray[index].Trim();
global::System.Windows.Forms.MessageBox.Show("First expression match: " + str);
}
if (rgx2Match.Success)
{
index = GetMatchIndex(rgx2, tempArray);
str = tempArray[index].Trim();
System.Windows.Forms.MessageBox.Show(input);
global::System.Windows.Forms.MessageBox.Show("Second expression match: " + str);
}
}
public int GetMatchIndex(Regex expression, string[] input)
{
int index = 0;
for (int i = 0; i < input.Length; i++)
{
if (index < 1)
{
Match rgxMatch = expression.Match(input[i]);
if (rgxMatch.Success)
{
index = i;
}
}
}
return index;
}
private void button1_Click(object sender, RoutedEventArgs e)
{
string line;
IsMatch(testCase);
}
}
}
The GetMatchIndex method is called a number of times in other parts of the code without incident; it is just with this one Regex that I've hit a stumbling block.
The behavior you are seeing has entirely to do with your application logic, and very little to do with the regular expression. In GetMatchIndex, you are defaulting index = 0. So what happens if none of the entries in string[] input match? You get back index = 0, which is the index of DDR3, the first element in string[] input.
You don't see that behavior in the first regular expression, because it matches 1.35v. However, when you add the space to the end, it doesn't match any of the entries in the split input, so you get back the first one by default which happens to be DDR3. Also, if (rgx1Match.Success) doesn't really help, because you check for a match in the entire string first (which does match because there's a space there), and then search for the index after splitting, which removed the spaces!
The fix is pretty simple: When you are returning an index from an array in a programming language that uses 0-based numbering, the standard way to represent "not found" is with -1 so it doesn't get confused with the valid result of 0. So default index to -1 instead and handle a result of -1 as a special case, i.e., display an error message to the user like "No matches".
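A minimal sketch of that fix, assuming the caller checks for -1 before indexing into the array. The wrapper class name MatchFinder is made up; GetMatchIndex keeps the signature from the question.

```csharp
using System.Text.RegularExpressions;

public static class MatchFinder
{
    // Returns the index of the first element that matches the expression,
    // or -1 when no element matches.
    public static int GetMatchIndex(Regex expression, string[] input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (expression.IsMatch(input[i]))
            {
                return i;
            }
        }
        return -1;
    }
}
```

With the test case from the question, the first expression finds 1.35v at index 7 of the split array, while the expression ending in \s returns -1, because Split() has already removed the whitespace it requires.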
Your question is incorrect:
new Regex(@"(?i)\d\.\d\dv\s").Match("DDR3").Success
is false
In fact, the results seem to work exactly as you'd like.

Replacing / with regex

I have a question about replacing some characters using regex or any other efficient, best-practice approach.
Here is my input, which mostly has the same form: A/ABC/N/ABC/123
The output should look like this: A_ABC_NABC123. Basically, the first two / should be changed to _ and the rest removed.
Of course I could do it with several String.Replace calls one by one, but I don't think that is a good way to do it. I am looking for a better solution.
So how can I do it with Regex?
This will do it, although there may be a simpler way:
static class CustomReplacer
{
public static string Replace(string input)
{
int i = 0;
return Regex.Replace(input, "/", m => i++ < 2 ? "_" : "");
}
}
var replaced = CustomReplacer.Replace("A/ABC/N/ABC/123");
I've wrapped the code like this to make sure you don't accidentally reuse the int variable.
Edit: There's also an overload of Regex.Replace that stops after a certain number of replacements, but you'd have to do it in two steps: replace the first two / with _, then replace the remaining / with nothing.
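The two-step variant might look like the sketch below, assuming the overload meant is the instance method Regex.Replace(string input, string replacement, int count). The class name SlashReplacer is made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class SlashReplacer
{
    private static readonly Regex Slash = new Regex("/");

    public static string Replace(string input)
    {
        // Step 1: replace only the first two slashes with underscores.
        string firstPass = Slash.Replace(input, "_", 2);
        // Step 2: remove every slash that is left.
        return Slash.Replace(firstPass, "");
    }
}
```

This avoids the mutable counter captured by the lambda, at the cost of scanning the string twice.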
Try this:
string st = "A/ABC/N/ABC/123";
string [] arrStr = st.Split(new char[] { '/' });
st = string.Empty;
for (int i = 0; i < arrStr.Length; i++)
{
if (i < 2)
st += arrStr[i] + "_";
else
st += arrStr[i];
}

Count regex replaces (C#)

Is there a way to count the number of replacements a Regex.Replace call makes?
E.g. for Regex.Replace("aaa", "a", "b"); I want to get the number 3 out (result is "bbb"); for Regex.Replace("aaa", "(?<test>aa?)", "${test}b"); I want to get the number 2 out (result is "aabab").
Ways I can think of to do this:
1. Use a MatchEvaluator that increments a captured variable, doing the replacement manually
2. Get a MatchCollection and iterate over it, doing the replacement manually and keeping a count
3. Search first and get a MatchCollection, get the count from that, then do a separate replace
Methods 1 and 2 require manual parsing of $ replacements; method 3 requires matching the string against the regex twice. Is there a better way?
Thanks to both Chevex and Guffa. I started looking for a better way to get the results and found that there is a Result method on the Match class that does the substitution. That's the missing piece of the jigsaw. Example code below:
using System.Text.RegularExpressions;
namespace regexrep
{
class Program
{
static int Main(string[] args)
{
string fileText = System.IO.File.ReadAllText(args[0]);
int matchCount = 0;
string newText = Regex.Replace(fileText, args[1],
(match) =>
{
matchCount++;
return match.Result(args[2]);
});
System.IO.File.WriteAllText(args[0], newText);
return matchCount;
}
}
}
With a file test.txt containing aaa, the command line regexrep test.txt "(?<test>aa?)" ${test}b will set %errorlevel% to 2 and change the text to aabab.
You can use a MatchEvaluator that runs for each replacement, that way you can count how many times it occurs:
int cnt = 0;
string result = Regex.Replace("aaa", "a", m => {
cnt++;
return "b";
});
The second case is trickier as you have to produce the same result as the replacement pattern would:
int cnt = 0;
string result = Regex.Replace("aaa", "(?<test>aa?)", m => {
cnt++;
return m.Groups["test"] + "b";
});
This should do it.
int count = 0;
// text is the input string, declared elsewhere
text = Regex.Replace(text,
@"(((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)", // Example expression. This one captures URLs.
match =>
{
string replacementValue = String.Format("<a href='{0}'>{0}</a>", match.Value);
count++;
return replacementValue;
});
I am not on my dev computer so I can't do it right now, but I am going to experiment later and see if there is a way to do this with lambda expressions instead of declaring the method IncrementCount() just to increment an int.
EDIT modified to use a lambda expression instead of declaring another method.
EDIT2 If you don't know the pattern in advance, you can still get all the groupings (The $ groups you refer to) within the match object as they are included as a GroupCollection. Like so:
int count = 0;
// text is the input string, declared elsewhere
text = Regex.Replace(text,
@"(((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)", // Example expression. This one captures URLs.
match =>
{
string replacementValue = String.Format("<a href='{0}'>{0}</a>", match.Value);
count++;
foreach (Group g in match.Groups)
{
Console.WriteLine(g.Value); // Do stuff with g.Value
}
return replacementValue;
});

C# - Simplest way to remove first occurrence of a substring from another string

I need to remove the first (and ONLY the first) occurrence of a string from another string.
Here is an example removing the string "\\Iteration". This:
ProjectName\\Iteration\\Release1\\Iteration1
would become this:
ProjectName\\Release1\\Iteration1
Here is some code that does this:
const string removeString = "\\Iteration";
int index = sourceString.IndexOf(removeString);
int length = removeString.Length;
String startOfString = sourceString.Substring(0, index);
String endOfString = sourceString.Substring(index + length);
String cleanPath = startOfString + endOfString;
That seems like a lot of code.
So my question is this: Is there a cleaner/more readable/more concise way to do this?
int index = sourceString.IndexOf(removeString);
string cleanPath = (index < 0)
? sourceString
: sourceString.Remove(index, removeString.Length);
sourceString.Replace(removeString, "");
string myString = sourceString.Remove(sourceString.IndexOf(removeString),removeString.Length);
EDIT: @OregonGhost is right. I myself would break the code up with conditionals to check for such an occurrence, but I was operating under the assumption that the strings were guaranteed to belong to each other by some requirement. It is possible that business-required exception handling rules are expected to catch this possibility. I myself would use a couple of extra lines to perform conditional checks and also to make it a little more readable for junior developers who may not take the time to read it thoroughly enough.
Wrote a quick TDD Test for this
[TestMethod]
public void Test()
{
var input = @"ProjectName\Iteration\Release1\Iteration1";
var pattern = @"\\Iteration";
var rgx = new Regex(pattern);
var result = rgx.Replace(input, "", 1);
Assert.IsTrue(result.Equals(@"ProjectName\Release1\Iteration1"));
}
rgx.Replace(input, "", 1); tells the regex to look in input for text matching the pattern and replace it with "", at most 1 time.
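Note that the pattern in the test escapes the backslash by hand. If the string to remove is arbitrary text, Regex.Escape can do that for you. This is a minimal sketch under that assumption; the names RegexRemover and RemoveFirst are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexRemover
{
    // Removes the first literal occurrence of remove from source.
    // Regex.Escape turns the text into a pattern that matches it literally.
    public static string RemoveFirst(string source, string remove)
    {
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(remove))
        {
            return source;
        }
        return new Regex(Regex.Escape(remove)).Replace(source, "", 1);
    }
}
```

Here RemoveFirst(@"ProjectName\Iteration\Release1\Iteration1", @"\Iteration") yields ProjectName\Release1\Iteration1, leaving the later Iteration1 segment untouched.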
You could use an extension method for fun. Typically I don't recommend attaching extension methods to such a general-purpose class as string, but like I said, this is fun. I borrowed @Luke's answer since there is no point in reinventing the wheel.
[Test]
public void Should_remove_first_occurrance_of_string() {
var source = "ProjectName\\Iteration\\Release1\\Iteration1";
Assert.That(
source.RemoveFirst("\\Iteration"),
Is.EqualTo("ProjectName\\Release1\\Iteration1"));
}
public static class StringExtensions {
public static string RemoveFirst(this string source, string remove) {
int index = source.IndexOf(remove);
return (index < 0)
? source
: source.Remove(index, remove.Length);
}
}
Here is a simple method that solves this problem (it can be used as an extension method):
public static string RemoveFirstInstanceOfString(this string value, string removeString)
{
int index = value.IndexOf(removeString, StringComparison.Ordinal);
return index < 0 ? value : value.Remove(index, removeString.Length);
}
Usage:
string valueWithPipes = "| 1 | 2 | 3";
string valueWithoutFirstpipe = valueWithPipes.RemoveFirstInstanceOfString("|");
//Output, valueWithoutFirstpipe = " 1 | 2 | 3";
Inspired by and modified from @LukeH's and @Mike's answers.
Don't forget the StringComparison.Ordinal to prevent issues with Culture settings.
https://www.jetbrains.com/help/resharper/2018.2/StringIndexOfIsCultureSpecific.1.html
I definitely agree that this is perfect for an extension method, but I think it can be improved a bit.
public static string Remove(this string source, string remove, int firstN)
{
if(firstN <= 0 || string.IsNullOrEmpty(source) || string.IsNullOrEmpty(remove))
{
return source;
}
int index = source.IndexOf(remove);
return index < 0 ? source : source.Remove(index, remove.Length).Remove(remove, --firstN);
}
This does a bit of recursion which is always fun.
Here is a simple unit test as well:
[TestMethod()]
public void RemoveTwiceTest()
{
string source = "look up look up look it up";
string remove = "look";
int firstN = 2;
string expected = " up  up look it up"; // two spaces remain where the second "look" was removed
string actual;
actual = source.Remove(remove, firstN);
Assert.AreEqual(expected, actual);
}
