Regular expression to replace string except in square brackets - C#

I need to replace every forward slash (/) with > except for the ones inside square brackets.
Input string:
string str = "//div[1]/li/a[#href='https://www.facebook.com/']";
Tried pattern (did not work):
string regex = @"\/(?=$|[^]]+\||\[[^]]+\]\/)";
var pattern = Regex.Replace(str, regex, ">");
Expected Result:
">>div[1]>li>a[#href='https://www.facebook.com/']"

Your idea of using a lookaround was good, but instead of a positive lookahead, use a negative lookbehind:
(?<!\[[^\]]*)(\/)
After updating your C# code:
string pattern = @"(?<!\[[^\]]*)(\/)";
string input = "//div[1]/li/a[#href='https://www.facebook.com/']";
var result = Regex.Replace(input, pattern, ">");
You will get
>>div[1]>li>a[#href='https://www.facebook.com/']
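Here is a self-contained sketch of the negative-lookbehind pattern above, written as a C# top-level program. It works because .NET accepts variable-length lookbehind such as (?<!\[[^\]]*). Many other regex engines reject that construct. ReplaceSlashes is a hypothetical helper name. The second input, with two bracketed parts, is made up to show that slashes inside every [...] are kept.

```csharp
using System;
using System.Text.RegularExpressions;

// A '/' is replaced only when no unclosed '[' precedes it,
// i.e. when it is not inside a [...] part.
// .NET allows the variable-length lookbehind used here.
const string Pattern = @"(?<!\[[^\]]*)/";

// Hypothetical helper wrapping the replacement.
string ReplaceSlashes(string xpath) => Regex.Replace(xpath, Pattern, ">");

string first = ReplaceSlashes("//div[1]/li/a[#href='https://www.facebook.com/']");
// Made-up input with two bracketed parts, each containing a slash.
string second = ReplaceSlashes("/a[#x='1/2']/b[#y='c/d']");

Console.WriteLine(first);  // >>div[1]>li>a[#href='https://www.facebook.com/']
Console.WriteLine(second); // >a[#x='1/2']>b[#y='c/d']
```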

If you're willing to also use String.Replace you can do the following:
string input = "//div[1]/li/a[#href='https://www.facebook.com/']";
string expected = ">>div[1]>li>a[#href='https://www.facebook.com/']";
var groups = Regex.Match(input, @"^(.*)(\[.*\])$")
.Groups
.Cast<Group>()
.Select(g => g.Value)
.Skip(1);
var left = groups.First().Replace('/', '>');
var right = groups.Last();
var actual = left + right;
Assert.Equal(expected, actual);
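This splits the string into two groups. In the first group, every / is replaced by >, as you describe. The second group (the final bracketed part) is appended as is. Basically, you don't care what is between the square brackets. Note that this assumes only that final bracketed part needs protecting. Slashes inside any earlier brackets would still be replaced.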
(The Assert is from an xUnit unit test.)
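The Match/Skip approach above assumes that only the final bracketed part at the end of the string needs protecting. If bracketed parts can appear anywhere, or nest, a plain character scan that tracks bracket depth is one alternative. This is a swapped-in technique, not part of the answer above. ReplaceOutsideBrackets is a hypothetical helper, and it assumes the brackets in the input are balanced.

```csharp
using System;
using System.Text;

// Hypothetical helper: replaces 'target' with 'replacement' only when the
// character sits outside every [...] block. Nesting is tracked by depth.
static string ReplaceOutsideBrackets(string input, char target, string replacement)
{
    var sb = new StringBuilder(input.Length);
    int depth = 0;
    foreach (char c in input)
    {
        if (c == '[') depth++;
        else if (c == ']' && depth > 0) depth--;

        // Only characters at depth 0 are outside all brackets.
        if (c == target && depth == 0)
            sb.Append(replacement);
        else
            sb.Append(c);
    }
    return sb.ToString();
}

string result = ReplaceOutsideBrackets(
    "//div[1]/li/a[#href='https://www.facebook.com/']", '/', ">");
Console.WriteLine(result); // >>div[1]>li>a[#href='https://www.facebook.com/']
```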

You could either match everything from an opening to a closing square bracket, or capture the / in a capturing group.
In the replacement, replace the captured / with a > and put the bracketed matches back unchanged.
Pattern
\[[^]]+\]|(/)
\[[^]]+\] Match from opening [ till closing ]
| Or
(/) Capture / in group 1
For example
string str = "//div[1]/li/a[#href='https://www.facebook.com/']";
string regex = @"\[[^]]+\]|(/)";
str = Regex.Replace(str, regex, m => m.Groups[1].Success ? ">" : m.Value);
Console.WriteLine(str);
Output
>>div[1]>li>a[#href='https://www.facebook.com/']

Related

C# Filter a word with an undefined number of spaces between characters

For example:
I can create a word with spaces between its letters, like this:
string example = "**example**";
List<string> outputs = new List<string>();
string example_output = "";
foreach (char c in example)
{
    example_output += c + " ";
}
Then I can loop over it to remove all the spaces and add the results to the outputs list.
The problem is that I need it to work in scenarios with double spaces and more.
For example:
string text = "This is a piece of text for this **example**.";
I basically want to detect AND remove 'example',
but I want to do that even when it is written as e xample, e x ample or example.
In my scenario, since it's a spam filter, I can't just remove the spaces from the whole sentence as below, because I need to .Replace() the word with exactly the same spaces the user typed:
.Replace(" ", "");
How would I achieve this?
TL;DR:
I want to filter out a word with multiple spaces combinations without altering any other parts of the line.
So example, e xample, e x ample and e x a m ple should all be caught by a single filter word.
As a plan B, I wouldn't mind a method that generates the word with every combination of spaces.
You can use this regex to achieve that:
(e[\s]*x[\s]*a[\s]*m[\s]*p[\s]*l[\s]*e)
You could use a regex for that: e\s*x\s*a\s*m\s*p\s*l\s*e
\s matches any whitespace character, and * means zero or more occurrences of it.
Small snippet:
const string myInput = "e x ample";
var regex = new Regex(@"e\s*x\s*a\s*m\s*p\s*l\s*e"); // @ verbatim string: "\s" in a regular string literal does not compile
var match = regex.Match(myInput);
if (match.Success)
{
// We have a match! Bad word
}
Here is the link to the regex: https://regex101.com/r/VFjzTg/1
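Because the pattern allows whitespace only between the letters, the Match object tells you exactly what the user typed, spaces included, and where it is. Here is a minimal sketch that assumes you want to remove only the first occurrence and leave the rest of the line untouched. The sample sentence is made up.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"e\s*x\s*a\s*m\s*p\s*l\s*e");
// Made-up input with irregular spacing inside the filtered word.
string text = "This is spam: e x  ample, and nothing else changes.";

Match match = regex.Match(text);
string cleaned = text;
if (match.Success)
{
    // match.Value is exactly what the user typed, spaces included.
    Console.WriteLine($"Found '{match.Value}' at index {match.Index}");

    // Remove only that span; the rest of the line stays as typed.
    cleaned = text.Remove(match.Index, match.Length);
}
Console.WriteLine(cleaned);
```

To remove every occurrence instead, regex.Replace(text, "") does the same thing across the whole line.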
I see that the problem is to ignore the spaces in the match string, but not touch them anywhere else in the string.
You could create a regular expression out of your matchword, allowing arbitrary whitespace between each character.
// Prepare the regex. This needs to be done only once for many applications.
string findword = "example";
// Escape each character so regex metacharacters like * ( ) \ . + ? are matched literally.
string[] tmp = new string[findword.Length];
for (int i = 0; i < tmp.Length; i++)
    tmp[i] = System.Text.RegularExpressions.Regex.Escape(findword.Substring(i, 1));
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(string.Join("\\s*", tmp));
// On each text to filter, do this:
string inp = "A text with the exa mple word in it.";
string outp = r.Replace(inp, "");
System.Console.WriteLine(outp);
You can try regular expressions:
using System.Linq;
using System.Text.RegularExpressions;
....
// Having a word to find
string toFind = "Example";
// we build the regular expression
Regex regex = new Regex(
@"\b" + string.Join(@"\s*", toFind.Select(c => Regex.Escape(c.ToString()))) + @"\b",
RegexOptions.IgnoreCase);
// Then we apply regex built for the required text:
string text = "This is a piece of text for this **example**. And more (e X amp le)";
string result = regex.Replace(text, "");
Console.Write(result);
Outcome:
This is a piece of text for this ****. And more ()
Edit: if you want to ignore diacritics, modify the regular expression so that each letter may be followed by combining marks (\p{Mn}). That is what an accent becomes after FormD normalization:
string toFind = "Example";
Regex regex = new Regex(@"\b" + string.Join(@"\s*",
toFind.Select(c => Regex.Escape(c.ToString()) + @"\p{Mn}*")) + @"\b",
RegexOptions.IgnoreCase);
Then normalize the text before matching (NormalizationForm needs using System.Text;):
string text = "This is a piece of text for this **examplé**. And more (e X amp le)";
string result = regex.Replace(text.Normalize(NormalizationForm.FormD), "");
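Here is a runnable sketch of the diacritics variant. After FormD normalization, an accented letter such as é becomes a base letter followed by a combining mark. Combining accents belong to the Unicode category Mn (nonspacing mark), so each letter is allowed to be followed by \p{Mn}*. The accented sample is written as \u00E9 so that the source file encoding does not matter.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

string toFind = "Example";

// Each letter may be followed by combining marks (category Mn),
// and by any amount of whitespace before the next letter.
var regex = new Regex(
    @"\b" + string.Join(@"\s*",
        toFind.Select(c => Regex.Escape(c.ToString()) + @"\p{Mn}*")) + @"\b",
    RegexOptions.IgnoreCase);

// "\u00E9" is the precomposed 'é'; FormD splits it into 'e' + U+0301.
string text = "This is a piece of text for this **exampl\u00E9**. And more (e X amp le)";
string result = regex.Replace(text.Normalize(NormalizationForm.FormD), "");

Console.WriteLine(result); // This is a piece of text for this ****. And more ()
```

The replacement runs on the decomposed text. If other accented characters in the line should be recomposed, call .Normalize(NormalizationForm.FormC) on the result.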

Regex - Removing specific characters before the final occurrence of #

So, I'm trying to remove certain characters [.&#] before the final occurrence of a #, but after that final #, those characters should be allowed.
This is what I have so far.
string pattern = @"\.|\&|\#(?![^#]+$)|[^a-zA-Z#]";
string input = "username#middle&something.else#company.com";
// input, pattern, replacement
string result = Regex.Replace(input, pattern, string.Empty);
Console.WriteLine(result);
Output: usernamemiddlesomethingelse#companycom
This currently removes every occurrence of the specified characters, including the . after the final #. Only the final # itself is kept. How can I get this to work?
You may use
[.&#]+(?=.*#)
Or the equivalent [.&#]+(?![^#]*$).
Details
[.&#]+ - 1 or more ., & or # chars
(?=.*#) - followed by any 0 or more chars (other than LF), as many as possible, and then a #.
In C#:
string pattern = @"[.&#]+(?=.*#)";
string input = "username#middle&something.else#company.com";
string result = Regex.Replace(input, pattern, string.Empty);
Console.WriteLine(result);
// => usernamemiddlesomethingelse#company.com
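Here is a small check that the two patterns above agree on single-line input. The last two inputs are made-up edge cases: one with a single #, and one with no # at all, where nothing is removed. The patterns may differ on input that contains line breaks, because . does not match \n while [^#] does.

```csharp
using System;
using System.Text.RegularExpressions;

// The two patterns from the answer.
string[] patterns = { @"[.&#]+(?=.*#)", @"[.&#]+(?![^#]*$)" };

// Single-line inputs; the last two are made-up edge cases.
string[] inputs =
{
    "username#middle&something.else#company.com",
    "a.b#c.d",          // one '#': only the '.' before it is removed
    "no-hash.example",  // no '#': nothing is removed
};

foreach (string input in inputs)
{
    foreach (string pattern in patterns)
    {
        string output = Regex.Replace(input, pattern, string.Empty);
        Console.WriteLine($"{pattern}  {input}  ->  {output}");
    }
}
```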
A simple alternative to a complex regex, using Substring and LastIndexOf:
string pattern = @"[.#&]";
string input = "username#middle&something.else#company.com";
string inputBeforeLastAt = input.Substring(0, input.LastIndexOf('#'));
// input, pattern, replacement
string result = Regex.Replace(inputBeforeLastAt, pattern, string.Empty) + input.Substring(input.LastIndexOf('#'));
Console.WriteLine(result);
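The Substring/LastIndexOf version above throws when the input contains no # at all. In that case LastIndexOf returns -1, and Substring(0, -1) raises ArgumentOutOfRangeException. Below is a guarded sketch, wrapped in a hypothetical CleanBeforeLastHash helper. It assumes such input should be returned unchanged, which is also what the [.&#]+(?=.*#) regex does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strips '.', '#' and '&' before the last '#'.
static string CleanBeforeLastHash(string input)
{
    int last = input.LastIndexOf('#');
    if (last < 0)
        return input; // no '#': nothing to clean, and Substring(0, -1) would throw

    string head = Regex.Replace(input.Substring(0, last), "[.#&]", string.Empty);
    return head + input.Substring(last);
}

Console.WriteLine(CleanBeforeLastHash("username#middle&something.else#company.com"));
// usernamemiddlesomethingelse#company.com
Console.WriteLine(CleanBeforeLastHash("no.hash.here"));
// no.hash.here
```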

Matching a result in C#

I have the following string
background-image: url('https://s3-eu-west-1.amazonaws.com/files.domain.com/uploads/image/file/168726/carousel_IMG_6455.jpg')
and I just want to get the image URL.
My code is:
image = image.Replace(@"'", "\"");
Match match = Regex.Match(image, #"'([^']*)");
match.Success is false, so I cannot get the image URL.
Is something missing? This used to work, but not anymore.
The following pattern achieves your result without using string.Replace:
var pattern = @"'(?<url>.*)'";
Match match = Regex.Match(image, pattern);
Console.WriteLine($"Match: {match.Groups["url"].Value}");
If you want double quotes around the URL, add this:
var result = $"\"{match.Groups["url"].Value}\"";
No need for a regex, just:
Split the string on the ' substring
Find the element starting with http
Return the first item found.
C# demo:
var s = "background-image: url('https://s3-eu-west-1.amazonaws.com/files.domain.com/uploads/image/file/168726/carousel_IMG_6455.jpg')";
var res = s.Split(new[] {"'"}, StringSplitOptions.None)
.Where(v => v.StartsWith("http"))
.FirstOrDefault();
Console.WriteLine(res);
// => https://s3-eu-west-1.amazonaws.com/files.domain.com/uploads/image/file/168726/carousel_IMG_6455.jpg
If you need to use a regex, use the standard pattern for matching a string between two strings, start(.*?)end. Here (.*?) captures into Group 1 any 0 or more chars other than a newline, as few as possible, because the *? quantifier is lazy:
var s = "background-image: url('https://s3-eu-west-1.amazonaws.com/files.domain.com/uploads/image/file/168726/carousel_IMG_6455.jpg')";
var res = Regex.Match(s, @"'(.*?)'").Groups[1].Value; // Value is "" when there is no match
Console.WriteLine(res);
// => https://s3-eu-west-1.amazonaws.com/files.domain.com/uploads/image/file/168726/carousel_IMG_6455.jpg
The regex (\".*\")
will match the URL, including its double quotes, given the input string:
background-image: url("https://s3-eu-west-1.amazonaws.com/files.domain.com/uploads/image/file/168726/carousel_IMG_6455.jpg")
image = image.Replace(@"'", "\"");
Match match = Regex.Match(image, "(\\\".*\\\")");
Edit:
If you are looking for something that will match pairs of single quotes or double quotes you could use:
(\".*\"|'.*')
Match match = Regex.Match(image, "(\\\".*\\\"|'.*')");
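In the question's code, Replace turns every single quote into a double quote before the regex searches for a single quote. After that, the pattern can no longer match anything, so dropping that Replace line, or searching for a double quote instead, is enough. If the url(...) value may use single quotes, double quotes or none, one option is a backreference: capture the opening quote and require the same one before the closing parenthesis. This is a swapped-in technique, not from the answers above, and the example.com inputs are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Group 1 captures the opening quote (', " or nothing); the backreference \1
// requires the same quote before ')'; the lazy (.*?) in group 2 is the URL.
// In a verbatim string, "" is a literal double quote.
var urlRegex = new Regex(@"url\((['""]?)(.*?)\1\)");

// Made-up inputs covering single quotes, double quotes and no quotes.
string[] styles =
{
    "background-image: url('https://example.com/a.jpg')",
    "background-image: url(\"https://example.com/b.jpg\")",
    "background-image: url(https://example.com/c.jpg)",
};

var urls = new List<string>();
foreach (string style in styles)
{
    Match m = urlRegex.Match(style);
    if (m.Success)
        urls.Add(m.Groups[2].Value);
}

foreach (string url in urls)
    Console.WriteLine(url);
```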

How do I replace all instances of any special characters between each occurrence of a set of delimiters in a string?

I'm attempting to replace all instances of any special characters between each occurrence of a set of delimiters in a string. I believe the solution will include some combination of a regular expression match to retrieve the text between each set of delimiters and a regular expression replace to replace each offending character within the match with a space. Here’s what I have so far:
string input = "***XX*123456789~N3*123 E. Fake St. Apt# 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W. False Ave.*Apt. #6B~N4*Beverly Hills*CA*90210~DMG*";
string matchPattern = "(~N3\\*)(.*?)(~N4\\*)";
string replacePattern = "[^0-9a-zA-Z ]?";
var matches = Regex.Matches(input, matchPattern);
foreach (Match match in matches)
{
match.Value = "~N3*" + Regex.Replace(match.Value, replacePattern, " ") + "~N4*";
}
MessageBox.Show(input);
I would expect the message box to show the following:
"***XX*123456789~N3*123 E Fake St Apt 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W False Ave *Apt 6B~N4*Beverly Hills*CA*90210~DMG*"
Obviously this isn’t working because I can’t assign to the matched value inside the loop, but I hope you can follow my thought process. It is important that any characters which are not between the delimiters remain unchanged. Any direction or advice would be helpful. Thank you so much!
Use Regex.Replace with a match evaluator, and call the second Regex.Replace inside it:
string input = "***XX*123456789~N3*123 E. Fake St. Apt# 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W. False Ave.*Apt. #6B~N4*Beverly Hills*CA*90210~DMG*";
string matchPattern = @"(~N3\*)(.*?)(~N4\*)";
string replacePattern = "[^0-9a-zA-Z ]";
string res = Regex.Replace(input, matchPattern, m =>
string.Format("{0}{1}{2}",
m.Groups[1].Value,
Regex.Replace(m.Groups[2].Value, replacePattern, " "), // Here, you modify just inside the 1st regex matches
m.Groups[3].Value));
Console.Write(res); // Just to print the demo result
// => ***XX*123456789~N3*123 E  Fake St  Apt  456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W  False Ave  Apt   6B~N4*Beverly Hills*CA*90210~DMG*
// (each replaced character becomes its own space, so some spaces are doubled)
Actually, since ~N3* and ~N4* are literal strings, you may use a single capturing group in the pattern and then add those delimiters as hard-coded in the match evaluator, but it is up to you to decide what suits you best.
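The single-capturing-group variant mentioned above might look like this. Only the text between the delimiters is captured, and the evaluator writes the literal ~N3* and ~N4* back. It assumes the delimiters are fixed strings.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "***XX*123456789~N3*123 E. Fake St. Apt# 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W. False Ave.*Apt. #6B~N4*Beverly Hills*CA*90210~DMG*";

// Group 1 holds only the text between ~N3* and ~N4* (lazy, so each pair
// is handled separately); the delimiters are re-added as literals.
string res = Regex.Replace(input, @"~N3\*(.*?)~N4\*", m =>
    "~N3*" + Regex.Replace(m.Groups[1].Value, "[^0-9a-zA-Z ]", " ") + "~N4*");

Console.WriteLine(res);
```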

Extract string that contains only letters in C#

string input = "5991 Duncan Road";
var onlyLetters = new String(input.Where(Char.IsLetter).ToArray());
Output: DuncanRoad
But I expect the output to be Duncan Road. What do I need to change?
For input like yours, you do not need a regex. Just skip all non-letter symbols at the beginning with SkipWhile():
Bypasses elements in a sequence as long as a specified condition is true and then returns the remaining elements.
C# code:
var input = "5991 Duncan Road";
var onlyLetters = new String(input.SkipWhile(p => !Char.IsLetter(p)).ToArray());
Console.WriteLine(onlyLetters);
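SkipWhile only drops non-letters at the start of the string, so an address with a trailing number keeps it. Here is a sketch, using a made-up address, that applies SkipWhile from both ends by reversing the sequence. Spaces between words are preserved.

```csharp
using System;
using System.Linq;

// Made-up address with digits at both ends.
string address = "5991 Duncan Road 12";

// SkipWhile only drops non-letters at the start, so trailing digits survive.
string leadingOnly = new string(address.SkipWhile(c => !char.IsLetter(c)).ToArray());

// Sketch: skip, reverse, skip again, reverse back. This trims non-letters
// from both ends while keeping the spaces between words.
string bothEnds = new string(address
    .SkipWhile(c => !char.IsLetter(c))
    .Reverse()
    .SkipWhile(c => !char.IsLetter(c))
    .Reverse()
    .ToArray());

Console.WriteLine($"[{leadingOnly}]"); // [Duncan Road 12]
Console.WriteLine($"[{bothEnds}]");    // [Duncan Road]
```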
A regex solution that removes numbers that are not part of words, along with the adjoining whitespace:
var res = Regex.Replace(str, @"\s+(?<!\p{L})\d+(?!\p{L})|(?<!\p{L})\d+(?!\p{L})\s+", string.Empty);
You can use this lookaround-based regex:
var repl = Regex.Replace(input, @"(?<![a-zA-Z])[^a-zA-Z]|[^a-zA-Z](?![a-zA-Z])", "");
//=> Duncan Road
(?<![a-zA-Z])[^a-zA-Z] matches a non-letter that is not preceded by another letter.
| is regex alternation
[^a-zA-Z](?![a-zA-Z]) matches a non-letter that is not followed by another letter.
You can still use LINQ filtering with Char.IsLetter || Char.IsWhiteSpace. To remove all leading and trailing whitespace chars, you can call String.Trim:
string input = "5991 Duncan Road";
string res = String.Join("", input.Where(c => Char.IsLetter(c) || Char.IsWhiteSpace(c)))
.Trim();
Console.WriteLine(res); // Duncan Road
