Using Regex.Replace to keep characters that can vary


I have the following:
string text = "version=\"1,0\"";
I want to replace the comma with a dot while keeping the 1 and the 0, BUT keeping in mind that they may be different in different situations! It could be version="2,3".
The naive way, which doesn't work, would be:
for (int i = 0; i <= 9; i++)
{
for (int z = 0; z <= 9; z++)
{
text = Regex.Replace(text, "version=\"i,z\"", "version=\"i.z\"");
}
}
But of course, it's a string literal, so i and z are matched as the letters i and z rather than as the loop values.
I could also try the lame but working way:
text = Regex.Replace(text, "version=\"1,", "version=\"1.");
text = Regex.Replace(text, "version=\"2,", "version=\"2.");
text = Regex.Replace(text, "version=\"3,", "version=\"3.");
And so on.. but it would be lame.
Any hints on how to handle this in a single replace?
Edit: I have other commas that I don't want to replace, so text.Replace(",", ".") won't do.

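You need a regex like this to locate the comma:
Regex reg = new Regex("(version=\"[0-9]),([0-9]\")");
Then do the replacement:
text = reg.Replace(text, "$1.$2");
You can use $1, $2, etc. to refer to the matching groups in order. Note that [0-9] matches exactly one digit on each side of the comma; use [0-9]+ if a version part can have several digits.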

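(?<=version=")(\d+),
You can try this pattern. Replace each match with $1 followed by a dot, i.e. "$1.", so the captured digits are kept and the comma becomes a dot. See the demo:
https://regex101.com/r/sJ9gM7/52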
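For anyone who wants to try the lookbehind version from C#, here is a minimal sketch. The class and method names (VersionFixer, Fix) are made up for illustration, and it assumes only the first comma after version=" needs to change.

```csharp
using System.Text.RegularExpressions;

public static class VersionFixer
{
    // Lookbehind: the match must be preceded by version=" but that prefix
    // is not part of the match. Group 1 holds the digits before the comma.
    private static readonly Regex FirstComma =
        new Regex(@"(?<=version="")(\d+),");

    public static string Fix(string text)
    {
        // "$1." re-emits the captured digits and puts a dot where the comma was.
        return FirstComma.Replace(text, "$1.");
    }
}
```

Commas elsewhere in the text are left alone because they are not preceded by version=" and digits.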

You can perhaps use capture groups to keep the numbers in front and after for replacement afterwards for a more 'traditional way' to do it:
string text = "version=\"1,0\"";
var regex = new Regex(@"version=""(\d*),(\d*)""");
var result = regex.Replace(text, "version=\"$1.$2\"");
Using parens like the above in a regex is to create a capture group (so the matched part can be accessed later when needed) so that in the above, the digits before and after the comma will be stored in $1 and $2 respectively.
But I decided to delve a little further and consider the case where there is more than one comma to replace in the version, i.e. if the text was version="1,1,0". It would be tedious to do the above, because you would need a separate replace for each 'type' of version. So here's one solution that uses what is sometimes called a callback in other languages (I'm not a C# dev, but I fiddled around with lambda functions and it seems to work :)):
private static string SpecialReplace(string text)
{
var result = text.Replace(',', '.');
return result;
}
public static void Main()
{
string text = "version=\"1,0,0\"";
var regex = new Regex(@"version=""[\d,]*""");
var result = regex.Replace(text, x => SpecialReplace(x.Value));
Console.WriteLine(result);
}
The above gives version="1.0.0".
@"version=""[\d,]*""" will first match any sequence of digits and commas within version="...", then pass it to the next line for the replace.
The replace takes the matched text, passes it to the lambda function which takes it to the function SpecialReplace, where a simple text replace is carried out only on the matched part.

Related

Remove list of words from string

I have a list of words that I want to remove from a string. I use the following method:
string stringToClean = "The.Flash.2014.S07E06.720p.WEB-DL.HEVC.x265.RMTeam";
string[] BAD_WORDS = {
"720p", "web-dl", "hevc", "x265", "Rmteam", "."
};
var cleaned = string.Join(" ", stringToClean.Split(' ').Where(w => !BAD_WORDS.Contains(w, StringComparer.OrdinalIgnoreCase)));
but it is not working, and the following text is output:
The.Flash.2014.S07E06.720p.WEB-DL.HEVC.x265.RMTeam

For this it would be a good idea to create a reusable method that splits a string into words. I'll do this as an extension method of string. If you are not familiar with extension methods, read extension methods demystified
public static IEnumerable<string> ToWords(this string text)
{
// TODO implement
}
Usage will be as follows:
string text = "This is some wild text!";
List<string> words = text.ToWords().ToList();
var first3Words = text.ToWords().Take(3);
var lastWord = text.ToWords().LastOrDefault();
Once you've got this method, the solution to your problem will be easy:
IEnumerable<string> badWords = ...
string inputText = ...
IEnumerable<string> validWords = inputText.ToWords().Except(badWords);
Or maybe you want to use Except(badWords, StringComparer.OrdinalIgnoreCase);
The implementation of ToWords depends on what you would call a word: everything delimited by a dot? or do you want to support whitespaces? or maybe even new-lines?
The implementation for your problem: A word is any sequence of characters delimited by a dot.
public static IEnumerable<string> ToWords(this string text)
{
    // find the next dot:
    const char dot = '.';
    int startIndex = 0;
    int dotIndex = text.IndexOf(dot, startIndex);
    while (dotIndex != -1)
    {
        // found a dot, return the substring up to the dot:
        int wordLength = dotIndex - startIndex;
        yield return text.Substring(startIndex, wordLength);
        // find the next dot
        startIndex = dotIndex + 1;
        dotIndex = text.IndexOf(dot, startIndex);
    }
    // read until the end of the text. Return everything after the last dot:
    yield return text.Substring(startIndex);
}
TODO:
Decide what you want to return if text starts with a dot ".ABC.DEF".
Decide what you want to return if the text ends with a dot: "ABC.DEF."
Check if the return value is what you want if text is empty.
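If you don't need a hand-written splitter, the same idea can be sketched with string.Split. The names WordFilter and RemoveWords are hypothetical, and this assumes a word is anything between dots and that empty words (from leading, trailing or doubled dots) should simply be dropped. It filters with a HashSet instead of Except, because Except is a set operation and would also remove duplicate words from the input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    public static string RemoveWords(string input, IEnumerable<string> badWords)
    {
        // Case-insensitive lookup, so "web-dl" also removes "WEB-DL".
        var bad = new HashSet<string>(badWords, StringComparer.OrdinalIgnoreCase);

        var kept = input
            .Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !bad.Contains(word));

        // Join the surviving words with a space, as in the question.
        return string.Join(" ", kept);
    }
}
```

With the input and bad words from the question (minus the "." entry, which the split already handles), this should leave "The Flash 2014 S07E06".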

Your split/join don't match up with your input.
That said, here's a quick one-liner:
string clean = BAD_WORDS.Aggregate(stringToClean, (acc, word) => acc.Replace(word, string.Empty));
This is basically a "reduce". Not fantastically performant, but over strings that are known to be decently small I'd consider it acceptable. If you have to use a really large string or a really large number of "words" you might look at another option. Note that string.Replace is case-sensitive, so with the example input "web-dl" will not remove "WEB-DL"; on .NET Core 2.0 and later you can pass StringComparison.OrdinalIgnoreCase as a third argument to Replace.
Edit: The downside of this approach is that you'll get partial matches. For example, your token array has "720p", but the code I suggested here will also match inside "720px". There are ways around that: instead of string's Replace you could use a regex that also matches your delimiters, something like Regex.Replace(acc, $"[. ]{word}([. ])", "$1") (regex not confirmed but should be close; I added a capture for the trailing delimiter in order to put it back for the next pass).
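Here is one way the delimiter-aware regex could look, as a rough sketch rather than a tested drop-in. TokenRemover and RemoveTokens are made-up names; it assumes the delimiters are '.' and ' ', that the token list is not empty, and that leftover runs of delimiters should collapse to a single dot.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenRemover
{
    public static string RemoveTokens(string input, IEnumerable<string> tokens)
    {
        // Escape each token so characters like '.' or '+' are matched literally.
        string alternatives = string.Join("|", tokens.Select(Regex.Escape));

        // A token only counts when a delimiter (or the edge of the string)
        // sits on both sides, so "720p" does not match inside "720px".
        string pattern = @"(?<=^|[. ])(?:" + alternatives + @")(?=$|[. ])";
        string stripped = Regex.Replace(input, pattern, string.Empty, RegexOptions.IgnoreCase);

        // Collapse the delimiter runs left behind and trim the ends.
        return Regex.Replace(stripped, @"[. ]{2,}", ".").Trim('.', ' ');
    }
}
```

Because the lookarounds do not consume the delimiters, all tokens can be removed in a single pass instead of one pass per word.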

Regex to strip characters except given ones?

I would like to strip strings but only leave the following:
[a-zA-Z]+[_a-zA-Z0-9-]*
I am trying to output strings that start with a letter and can then contain alphanumeric characters, underscores, and dashes. How can I do this with RegEx or another function?

Because everything in the second part of the regex is in the first part, you could do something like this:
String foo = "_-abc.!##$5o993idl;)"; // your string here.
//First replace removes all the characters you don't want.
foo = Regex.Replace(foo, "[^_a-zA-Z0-9-]", "");
//Second replace removes any characters from the start that aren't allowed there.
foo = Regex.Replace(foo, "^[^a-zA-Z]+", "");
So start out by paring it down to only the allowed characters. Then get rid of any allowed characters that can't be at the beginning.
Of course, if your regex gets more complicated, this solution falls apart fairly quickly.

Assuming that you've got the strings in a collection, I would do it this way:
foreach element in the collection try match the regex
if !success, remove the string from the collection
Or the other way round - if it matches, add it to a new collection.
If the strings are not in a collection can you add more details as to what your input looks like ?
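The "add matches to a new collection" variant could be sketched like this. IdentifierFilter and KeepValid are hypothetical names, and it assumes each string must match the pattern from the question in full, hence the ^ and $ anchors.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdentifierFilter
{
    // The question's pattern, anchored so the whole string has to match.
    private static readonly Regex Identifier =
        new Regex(@"^[a-zA-Z]+[_a-zA-Z0-9-]*$");

    public static List<string> KeepValid(IEnumerable<string> candidates)
    {
        // Build a new list instead of removing from the collection being iterated.
        return candidates.Where(s => Identifier.IsMatch(s)).ToList();
    }
}
```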

If you want to pull out all of the identifiers matching your regular expression, you can do it like this:
var input = " _wontmatch f_oobar0 another_valid ";
var re = new Regex( @"\b[a-zA-Z][_a-zA-Z0-9-]*\b" );
foreach( Match match in re.Matches( input ) )
Console.WriteLine( match.Value );

Use MatchCollection matchColl = Regex.Matches("input string","your regex");
Then use:
string [] outStrings = new string[matchColl.Count]; //A string array to contain all required strings
for (int i=0; i < matchColl.Count; i++ )
outStrings[i] = matchColl[i].ToString();
You will have all the required strings in outStrings. Hope this helps.

Edited
var s = Regex.Matches(input_string, "[a-z]+(_*-*[a-z0-9]*)*", RegexOptions.IgnoreCase);
string output_string="";
foreach (Match m in s)
{
output_string = output_string + m;
}
MessageBox.Show(output_string);

string in c#. replace certain number in a loop

I have a string, "12341234115151_log_1.txt" (the length of this string is not fixed, but the "log" pattern is always the same).
I have a for loop.
On each iteration, I want to set the number after "log" to i,
like "12341234115151_log_2.txt"
"12341234115151_log_3.txt"
....
to
"12341234115151_log_123.txt"
in c#, what is a good way to do so?
thanks.

A regex is ideal for this. You can use the Regex.Replace method and use a MatchEvaluator delegate to perform the numerical increment.
string input = "12341234115151_log_1.txt";
string pattern = @"(\d+)(?=\.)";
string result = Regex.Replace(input, pattern,
m => (int.Parse(m.Groups[1].Value) + 1).ToString());
The pattern breakdown is as follows:
(\d+): this matches and captures any digit, at least once
(?=\.): this is a look-ahead which ensures that a period (or dot) follows the number. A dot must be escaped to be a literal dot instead of a regex metacharacter. We know that the value you want to increment is right before the ".txt" so it should always have a dot after it. You could also use (?=\.txt) to make it clearer and be explicit, but you may have to use RegexOptions.IgnoreCase if your filename extension can have different cases.

You can use Regex, like this:
var r = new Regex("^(.*_log_)(\\d+)\\.txt$");
for ... {
var newname = r.Replace(filename, "${1}"+i+".txt");
}

Use regular expressions to get the counter, then just append them together.
If I've read your question right...
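Something along these lines, as a rough sketch: pull the fixed parts out of the name once with a regex, then build each new name in the loop. LogNames and Generate are invented names, and the pattern assumes the name always ends in _log_, a number, and a single extension.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogNames
{
    public static IEnumerable<string> Generate(string template, int first, int last)
    {
        // Group 1: everything up to and including "_log_".
        // Group 2: the extension, including its dot.
        Match m = Regex.Match(template, @"^(.*_log_)\d+(\.[^.]+)$");
        if (!m.Success)
            throw new ArgumentException("Name does not look like <prefix>_log_<n>.<ext>", nameof(template));

        for (int i = first; i <= last; i++)
            yield return m.Groups[1].Value + i + m.Groups[2].Value;
    }
}
```

Since the regex only runs once, the loop itself is plain string concatenation.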

How about,
for (int i =0; i<some condition; i++)
{
string name = "12341234115151_log_"+ i.ToString() + ".txt";
}

C# Regular Expression to return only the numbers

Let's say I have the following within my source code (the array below), and I want to return only the numbers within each string.
The source is coming from a website, just FYI, and I already have it parsed out so that it comes into the program, but then I need to actually parse these numbers to return what I want. I'm just having a doozy of a time trying to figure it out :(
For example, the first entry should become: 13|100|0
How could I write this regex?
var cData = new Array(
"g;13|g;100|g;0",
"g;40|g;100|g;1.37",
"h;43|h;100|h;0",
"h;27|h;100|h;0",
"i;34|i;100|i;0",
"i;39|i;100|i;0",
);

Not sure you actually need regex here.
var str = "g;13|g;100|g;0";
str = str.Replace("g;", "");
would give you "13|100|0".
Or a slight improvement on spinon's answer:
// \- included in case numbers can be negative. Leave it out if not.
Regex.Replace("g;13|g;100|g;0", @"[^0-9|.\-]", "");
Or an option using split and join:
String.Join("|", "g;13|g;100|g;0".Split('|').Select(pipe => pipe.Split(';')[1]));

I would use something like this so you only keep digits and the separator (add . to the character class if values such as 1.37 must keep their decimal point):
Regex.Replace("g;13|g;100|g;0", "[^0-9|]", "");

Regex might be overkill in this case. Given the uniform delimiting of | and ; I would recommend String.Split(). Then you could either split again or use String.Replace() to get rid of the extra chars (i.e. g;).
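If the end goal is actual numbers rather than a cleaned-up string, the split approach could be sketched as below. PipeParser and Parse are made-up names; it assumes every field has the form letter;number and that the decimal separator is always a dot, which is why it parses with the invariant culture.

```csharp
using System.Globalization;
using System.Linq;

public static class PipeParser
{
    public static double[] Parse(string row)
    {
        return row
            .Split('|')                                        // "g;40", "g;100", "g;1.37"
            .Select(field => field.Substring(field.IndexOf(';') + 1)) // keep what follows ';'
            .Select(number => double.Parse(number, CultureInfo.InvariantCulture))
            .ToArray();
    }
}
```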

It looks like you have a number of solutions, but I'll throw in one more where you iterate over the matches and read the captured group from each one to get the number out if you want.
Regex regexObj = new Regex(@"\w;([\d.]+)\|?");
// Matches returns one Match per field; a single Match would only cover the first field.
foreach (Match match in regexObj.Matches("g;13|g;100|g;0"))
{
    Group groupObj = match.Groups[1];
    if (groupObj.Success)
    {
        //groupObj.Value will be the number you want
    }
}
I hope this helps.

Formatting sentences in a string using C#

I have a string with multiple sentences. How do I capitalize the first letter of the first word in every sentence? Something like paragraph formatting in Word.
e.g. "this is some code. the code is in C#. "
The output must be "This is some code. The code is in C#".
One way would be to split the string on '.', capitalize the first letter of each piece, and then rejoin.
Is there a better solution?

In my opinion, when it comes to potentially complex rules-based string matching and replacing - you can't get much better than a Regex-based solution (despite the fact that they are so hard to read!). This offers the best performance and memory efficiency, in my opinion - you'll be surprised at just how fast this'll be.
I'd use the Regex.Replace overload that accepts an input string, regex pattern and a MatchEvaluator delegate. A MatchEvaluator is a function that accepts a Match object as input and returns a string replacement.
Here's the code:
public static string Capitalise(string input)
{
//now the first character
return Regex.Replace(input, @"(?<=(^|[.;:])\s*)[a-z]",
(match) => { return match.Value.ToUpper(); });
}
The regex uses the (?<=) construct (zero-width positive lookbehind) to restrict matches to a-z characters preceded by the start of the string or by one of the punctuation marks you want. In the [.;:] bit you can add the extra ones you want (e.g. [.;:?"] to add the ? and " characters).
This means, also, that your MatchEvaluator doesn't have to do any unnecessary string joining (which you want to avoid for performance reasons).
All the other stuff mentioned by one of the other answerers about using the RegexOptions.Compiled is also relevant from a performance point of view. The static Regex.Replace method does offer very similar performance benefits, though (there's just an additional dictionary lookup).
Like I say - I'll be surprised if any of the other non-regex solutions here will work better and be as fast.
EDIT
Have put this solution up against Ahmad's as he quite rightly pointed out that a look-around might be less efficient than doing it his way.
Here's the crude benchmark I did:
public string LowerCaseLipsum
{
get
{
//went to lipsum.com and generated 10 paragraphs of lipsum
//which I then initialised into the backing field with @"[lipsumtext]".ToLower()
return _lowerCaseLipsum;
}
}
[TestMethod]
public void CapitaliseAhmadsWay()
{
List<string> results = new List<string>();
DateTime start = DateTime.Now;
Regex r = new Regex(@"(^|\p{P}\s+)(\w+)", RegexOptions.Compiled);
for (int f = 0; f < 1000; f++)
{
results.Add(r.Replace(LowerCaseLipsum, m => m.Groups[1].Value
+ m.Groups[2].Value.Substring(0, 1).ToUpper()
+ m.Groups[2].Value.Substring(1)));
}
TimeSpan duration = DateTime.Now - start;
Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}
[TestMethod]
public void CapitaliseLookAroundWay()
{
List<string> results = new List<string>();
DateTime start = DateTime.Now;
Regex r = new Regex(@"(?<=(^|[.;:])\s*)[a-z]", RegexOptions.Compiled);
for (int f = 0; f < 1000; f++)
{
results.Add(r.Replace(LowerCaseLipsum, m => m.Value.ToUpper()));
}
TimeSpan duration = DateTime.Now - start;
Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}
In a release build, my solution was about 12% faster than Ahmad's (1.48 seconds as opposed to 1.68 seconds).
Interestingly, however, if it was done through the static Regex.Replace method, both were about 80% slower, and my solution was slower than Ahmad's.
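As a side note on the crude benchmark: DateTime.Now has fairly coarse resolution, so Stopwatch is usually the better tool for this kind of timing. A minimal sketch follows; the Timing helper and its Measure method are hypothetical and not part of the test code above.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    public static TimeSpan Measure(Action action, int iterations)
    {
        // One untimed warm-up call so JIT compilation is not counted.
        action();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        return sw.Elapsed;
    }
}
```

Each of the two test methods could then pass its Replace call as the action, with 1000 iterations, and print the returned TimeSpan.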

Here's a regex solution that uses the punctuation category to avoid having to specify .!?" etc. although you should certainly check if it covers your needs or set them explicitly. Read up on the "P" category under the "Supported Unicode General Categories" section located on the MSDN Character Classes page.
string input = @"this is some code. the code is in C#? it's great! In ""quotes."" after quotes.";
string pattern = @"(^|\p{P}\s+)(\w+)";
// compiled for performance (might want to benchmark it for your loop)
Regex rx = new Regex(pattern, RegexOptions.Compiled);
string result = rx.Replace(input, m => m.Groups[1].Value
+ m.Groups[2].Value.Substring(0, 1).ToUpper()
+ m.Groups[2].Value.Substring(1));
If you decide not to use the \p{P} class you would have to specify the characters yourself, similar to:
string pattern = @"(^|[.?!""]\s+)(\w+)";
EDIT: below is an updated example to demonstrate 3 patterns. The first shows how all punctuations affect casing. The second shows how to pick and choose certain punctuation categories by using class subtraction. It uses all punctuations while removing specific punctuation groups. The third is similar to the 2nd but using different groups.
The MSDN link doesn't spell out what some of the punctuation categories refer to, so here's a breakdown:
P: all punctuations (comprises all of the categories below)
Pc: underscore _
Pd: dash -
Ps: open parenthesis, brackets and braces ( [ {
Pe: closing parenthesis, brackets and braces ) ] }
Pi: initial single/double quotes (MSDN says it "may behave like Ps/Pe depending on usage")
Pf: final single/double quotes (MSDN Pi note applies)
Po: other punctuation such as commas, colons, semi-colons and slashes ,, :, ;, \, /
Carefully compare how the results are affected by these groups. This should grant you a great degree of flexibility. If this doesn't seem desirable then you may use specific characters in a character class as shown earlier.
string input = @"foo ( parens ) bar { braces } foo [ brackets ] bar. single ' quote & "" double "" quote.
dash - test. Connector _ test. Comma, test. Semicolon; test. Colon: test. Slash / test. Slash \ test.";
string[] patterns = {
@"(^|\p{P}\s+)(\w+)", // all punctuation chars
@"(^|[\p{P}-[\p{Pc}\p{Pd}\p{Ps}\p{Pe}]]\s+)(\w+)", // all punctuation chars except Pc/Pd/Ps/Pe
@"(^|[\p{P}-[\p{Po}]]\s+)(\w+)" // all punctuation chars except Po
};
// RegexOptions.Compiled is left out here for the sake of example
foreach (string pattern in patterns)
{
Console.WriteLine("*** Current pattern: {0}", pattern);
string result = Regex.Replace(input, pattern,
m => m.Groups[1].Value
+ m.Groups[2].Value.Substring(0, 1).ToUpper()
+ m.Groups[2].Value.Substring(1));
Console.WriteLine(result);
Console.WriteLine();
}
Notice that "Dash" is not capitalized using the last pattern and it's on a new line. One way to make it capitalized is to use the RegexOptions.Multiline option. Try the above snippet with that to see if it meets your desired result.
Also, for the sake of example, I didn't use RegexOptions.Compiled in the above loop. To use both options OR them together: RegexOptions.Compiled | RegexOptions.Multiline.
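To see in isolation what RegexOptions.Multiline changes, here is a small sketch. LineStartCapitaliser is a made-up name and the pattern is deliberately simpler than the ones above: it only upper-cases a word character that sits at the very start of a line.

```csharp
using System.Text.RegularExpressions;

public static class LineStartCapitaliser
{
    public static string Capitalise(string input)
    {
        // With Multiline, ^ matches at the start of every line,
        // not only at the start of the whole input.
        return Regex.Replace(
            input,
            @"^\w",
            m => m.Value.ToUpperInvariant(),
            RegexOptions.Multiline);
    }
}
```

Without the Multiline option, only the first character of the whole input would be affected.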

You have a few different options:
Your approach of splitting the string, capitalizing and then re-joining
Using regular expressions to perform a replace of the expressions (which can be a bit tricky for case)
Write a C# iterator that iterates over each character and yields a new IEnumerable<char> with the first letter after a period in upper case. May offer benefit of a streaming solution.
Loop over each char and upper-case those that appear immediately after a period (whitespace ignored) - a StringBuilder may make this easier.
The code below takes the last approach, using a StringBuilder:
public static string ToSentenceCase( string someString )
{
    var sb = new StringBuilder( someString.Length );
    bool wasPeriodLastSeen = true; // We want first letter to be capitalized
    foreach( var c in someString )
    {
        if( wasPeriodLastSeen && !char.IsWhiteSpace( c ) )
        {
            sb.Append( char.ToUpper( c ) );
            wasPeriodLastSeen = false;
        }
        else
        {
            if( c == '.' ) // you may want to expand this to other punctuation
                wasPeriodLastSeen = true;
            sb.Append( c );
        }
    }
    return sb.ToString();
}

I don't know why, but I decided to give yield return a try, based on what LBushkin had suggested. Just for fun.
static IEnumerable<char> CapitalLetters(string sentence)
{
//capitalize first letter
bool capitalize = true;
char lastLetter;
for (int i = 0; i < sentence.Length; i++)
{
lastLetter = sentence[i];
yield return (capitalize) ? Char.ToUpper(sentence[i]) : sentence[i];
if (Char.IsWhiteSpace(lastLetter) && capitalize == true)
continue;
capitalize = false;
if (lastLetter == '.' || lastLetter == '!') //etc
capitalize = true;
}
}
To use it:
string sentence = new String(CapitalLetters("this is some code. the code is in C#.").ToArray());

Do your work in a StringBuilder.
Lowercase the whole thing.
Loop through and uppercase leading chars.
Call ToString.
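Those four steps could be sketched roughly as follows. SentenceCase and Format are invented names, and it assumes a sentence ends at '.', '!' or '?'. Be aware that lowercasing everything first also flattens capitals you may want to keep, so "C#" comes out as "c#".

```csharp
using System.Text;

public static class SentenceCase
{
    public static string Format(string text)
    {
        // Steps 1 and 2: work in a StringBuilder over the lowercased text.
        var sb = new StringBuilder(text.ToLowerInvariant());
        bool atSentenceStart = true;

        // Step 3: uppercase the first letter after each sentence end.
        for (int i = 0; i < sb.Length; i++)
        {
            char c = sb[i];
            if (atSentenceStart && char.IsLetter(c))
            {
                sb[i] = char.ToUpperInvariant(c);
                atSentenceStart = false;
            }
            else if (c == '.' || c == '!' || c == '?')
            {
                atSentenceStart = true;
            }
        }

        // Step 4: back to a string.
        return sb.ToString();
    }
}
```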
