C# Replace some char from a Regex Match [duplicate] - c#

In the end, I want to replace all the \t characters that are enclosed within double quotes.
I'm currently on Regex101 trying various iterations of my regex. This is the closest I have so far:
originString = blah\t\"blah\tblah\"\t\"blah\"\tblah\tblah\t\"blah\tblah\t\tblah\t\"\t\"\tbleh\"
regex = \t?+\"{1}[^"]?+([\t])?+[^"]?+\"
\t?+ maybe one or more tab
\"{1} a double quote
[^"]?+ anything but a double quote
([\t])?+ capture all the tabs
[^"]?+ anything but a double quote
\"{1} a double quote
My logic is flawed!
I need your help in grouping the tab characters.

Match the double-quoted substrings with a simple "[^"]+" regex (if there are no escape sequences to account for) and replace the tabs only within each match, using a match evaluator:
var str = "A tab\there \"inside\ta\tdouble-quoted\tsubstring\" some\there";
var pattern = "\"[^\"]+\""; // A pattern to match a double quoted substring with no escape sequences
var result = Regex.Replace(str, pattern, m =>
m.Value.Replace("\t", "-")); // Replace the tabs inside double quotes with -
Console.WriteLine(result);
// => A tab here "inside-a-double-quoted-substring" some here
See the C# demo
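If the quoted substrings can contain backslash-escaped quotes, the simple pattern above would stop at the first escaped quote. A minimal sketch of one way to account for that, assuming backslash is the only escape character (the input string here is made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input: the quoted part contains escaped quotes (\") and two tabs.
        var str = "x\ty \"a\tb \\\"c\\\"\td\" z\tw";
        // A quote, then runs of chars that are neither quote nor backslash,
        // interleaved with backslash-escaped chars, then the closing quote.
        var pattern = @"""[^""\\]*(?:\\.[^""\\]*)*""";
        // Replace tabs only inside each matched quoted substring.
        var result = Regex.Replace(str, pattern, m => m.Value.Replace("\t", "-"));
        Console.WriteLine(result);
    }
}
```

The tabs outside the quotes are left untouched; only the two tabs inside the quoted substring become hyphens.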

You can use this:
\"[^\"]*\"
originally answered here

Related

Separate title string with no spaces into words

I want to find and separate words in a title that has no spaces.
Before:
ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)"Test"'Test'[Test]
After:
This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
I'm looking for a regular expression rule that can do the following.
I thought I'd identify each word if it starts with an uppercase letter.
But also preserve all uppercase words as not to space them into A L L U P P E R C A S E.
Additional rules:
Space a letter if it touches a number: Hello2019World Hello 2019 World
Ignore spacing initials that contain periods, hyphens, or underscores T.E.S.T.
Ignore spacing if between brackets, parentheses, or quotes [Test] (Test) "Test" 'Test'
Preserve hyphens Hello-World
C#
https://rextester.com/GAZJS38767
// Title without spaces
string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";
// Detect where to space words
string[] split = Regex.Split(title, "(?<!^)(?=(?<![.\\-'\"([{])[A-Z][\\d+]?)");
// Trim each word of extra spaces before joining
split = (from e in split
select e.Trim()).ToArray();
// Join into new title
string newtitle = string.Join(" ", split);
// Display
Console.WriteLine(newtitle);
Regular expression
I'm having trouble with spacing before the numbers, brackets, parentheses, and quotes.
https://regex101.com/r/9IIYGX/1
(?<!^)(?=(?<![.\-'"([{])(?<![A-Z])[A-Z][\d+?]?)
(?<!^) // Negative look behind
(?= // Positive look ahead
(?<![.\-'"([{]) // Ignore if starts with punctuation
(?<![A-Z]) // Ignore if starts with double Uppercase letter
[A-Z] // Space after each Uppercase letter
[\d+]? // Space after number
)
Solution
Thanks for all your combined effort in the answers. Here's a Regex example. I'm applying this to file names and have excluded the special characters \/:*?"<>|.
https://rextester.com/FYEVE73725
https://regex101.com/r/xi8L4z/1
Here is a regex which seems to work well, at least for your sample input:
(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)
This pattern says to split on a boundary that meets one of the following conditions:
what precedes is a lowercase letter and what follows is an uppercase letter
what precedes is a digit and what follows is a letter (or vice versa)
what precedes and what follows are both non-word characters (e.g. quote, parenthesis, etc.)
string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";
string[] split = Regex.Split(title, "(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\\W)(?=\\W)");
split = (from e in split select e.Trim()).ToArray();
string newtitle = string.Join(" ", split);
This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
Note: You might also want to add this assertion to the regex alternation:
(?<=\W)(?=\w)|(?<=\w)(?=\W)
We got away with this here, because this boundary condition never happened. But you might need it with other inputs.
The first few parts are similar to @revo's answer: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}. Additionally, I add the following regex to put a space between a number and a letter: (?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])|(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z]) and, to handle cases like OTPIsADevice, a lookahead and lookbehind that find an uppercase letter next to a lowercase one: (((?<!^)[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))
Note that | is the OR operator, which lets each of the alternatives be tried.
Regex: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])|(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z])|(((?<!^)[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))
Demo
Update
Improved a bit:
From: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])|(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z])
into: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=\p{L})\d which does the same thing.
(((?<!^)(?<!\p{P})[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))|(?<!^)(?=[[({&])|(?<=[)\]}!&}]) is adapted from the OP's comment, which adds exceptions for some punctuation: (((?<!^)(?<!['([{])[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))|(?<!^)(?=[[({&])|(?<=[)\\]}!&}])
Final regex:
(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=\p{L})\d|(((?<!^)(?<!\p{P})[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))|(?<!^)(?=[[({&])|(?<=[)\]}!&}])
Demo
Aiming for simplicity rather than huge regex, I would recommend this code with small simple patterns (comments with explanation are in code):
string str = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)\"Test\"'Test'[Test]";
// insert a space when a lowercase letter is followed by an uppercase letter
str = Regex.Replace(str, "(?<=[a-z])(?=[A-Z])", " ");
// insert a space whenever a digit is followed by a letter
str = Regex.Replace(str, @"(?<=\d)(?=[A-Za-z])", " ");
// insert a space when a letter is followed by a digit
str = Regex.Replace(str, @"(?<=[A-Za-z])(?=\d)", " ");
// insert a space before one of the characters ( [ " ' when it is followed by a letter or digit
str = Regex.Replace(str, @"(?=[(\[""'][a-zA-Z0-9])", " ");
// insert a space after one of the characters ) ] " '
str = Regex.Replace(str, @"(?<=[)\]""'])", " ");
You could reduce the requirements to shorten the steps of a regular expression using a different interpretation of them. For example, the first requirement would be the same as to say, preserve capital letters if they are not preceded by punctuation marks or capital letters.
The following regex works almost for all of the mentioned requirements and may be extended to include or exclude other situations:
(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}
You have to use the Replace() method with a space followed by $0 as the substitution string.
See live demo here
.NET (See it in action):
string input = @"ThisIsAnExample.TitleHELLO-WORLD2019T.E.S.T.(Test)""Test""'Test'[Test]";
Regex regex = new Regex(@"(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}", RegexOptions.Multiline);
Console.WriteLine(regex.Replace(input, @" $0"));

C# Regex, match but not include the first character before matched string

How can I make this C# Regex to not include the first character before the URL in the matching results:
((?!\").)https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+)
This will match:
Xhttps://twitter.com/oppomobileindia/status/798397636780953600
Notice the first X letter.
I want it to match the URLs that start without double quotes. Also not include the first character before the https for those URLs that do not start with double quotes.
An actual example that I use in my code:
var str = "<div id=\"content\">\n" +
"<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>\n" +
"<p>\"https://twitter.com/oppomobileindia/status/11111111111111111111</p></div>";
var pattern = @"(?<!""')https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";
var rgx = new Regex(pattern);
var results = rgx.Replace(str, "XXX");
In the above example, only the first URL should be replaced, because the second one has a double quotation mark before the URL. The replacement should also cover only the exact match, without the character that precedes the matched string.
Use a (?<!") negative lookbehind:
var re = @"(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";
The (?<!") means that there cannot be a " immediately before the current location.
In C#, you do not need to escape / inside the pattern since regex delimiters are not used when defining the regex.
Note on the C# syntax: if you want to define a " inside a verbatim string literal, double it. In a regular string literal, escape the " and \:
var re = "(?<!\")https?://twitter\\.com/(?:#!/)?(\\w+)/status(?:es)?/(\\d+)";
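As a quick check of the lookbehind, here is a small self-contained sketch that applies the pattern to a shortened version of the HTML from the question (the surrounding class and the "XXX" replacement are only for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Two URLs: the second one is immediately preceded by a double quote.
        var str = "<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>" +
                  "<p>\"https://twitter.com/oppomobileindia/status/11111111111111111111</p>";
        // (?<!"") fails wherever the previous character is a double quote.
        var re = @"(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";
        Console.WriteLine(Regex.Replace(str, re, "XXX"));
        // prints <p>XXX</p><p>"https://twitter.com/oppomobileindia/status/11111111111111111111</p>
    }
}
```

Because a lookbehind is zero-width, the character before the URL (here the `>`) is checked but never becomes part of the match, so it survives the replacement.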

Regex pattern for splitting a delimited string in curly braces

I have the following string
{token1;token2;token3@somewhere.com;...;tokenn}
I need a Regex pattern, that would give a result in array of strings such as
token1
token2
token3@somewhere.com
...
...
...
tokenn
Would also appreciate a suggestion if can use the same pattern to confirm the format of the string, means string should start and end in curly braces and at least 2 values exist within the anchors.
You may use an anchored regex with named repeated capturing groups:
\A{(?<val>[^;]*)(?:;(?<val>[^;]*))+}\z
See the regex demo
\A - start of string
{ - a {
(?<val>[^;]*) - Group "val" capturing 0+ (due to * quantifier, if the value cannot be empty, use +) chars other than ;
(?:;(?<val>[^;]*))+ - 1 or more occurrences (thus, requiring at least 2 values inside {...}) of the sequence:
; - a semi-colon
(?<val>[^;]*) - Group "val" capturing 0+ chars other than ;
} - a literal }
\z - end of string.
.NET regex keeps each capture in a CaptureCollection stack, which is why all the values captured into the "val" group can be accessed after a match is found.
C# demo:
var s = "{token1;token2;token3;...;tokenn}";
var pat = @"\A{(?<val>[^;]*)(?:;(?<val>[^;]*))+}\z";
var caps = new List<string>();
var result = Regex.Match(s, pat);
if (result.Success)
{
caps = result.Groups["val"].Captures.Cast<Capture>().Select(t=>t.Value).ToList();
}
Read this (similar to your problem): How to keep the delimiters of Regex.Split?
For your regex testing, use this: http://www.regexlib.com/RETester.aspx?AspxAutoDetectCookieSupport=1.
But a regex is relatively resource-intensive compared with plain string operations.
In your case it would be better to use the Split method of the string class, for example: "token1;token2;token3;...;tokenn".Split(';');. It returns an array of the strings that you want to obtain.
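The Split suggestion does not by itself check the format the question asks about (curly braces at both ends, at least two values). A minimal sketch of how that check might be combined with Split, where TryParseTokens is a hypothetical helper name and not part of any library:

```csharp
using System;

class Program
{
    // Hypothetical helper: validates the {a;b;...} shape and returns the tokens.
    static bool TryParseTokens(string s, out string[] tokens)
    {
        tokens = new string[0];
        // The string must start with '{' and end with '}'.
        if (s.Length < 2 || s[0] != '{' || s[s.Length - 1] != '}')
            return false;
        // Strip the braces and split the inner text on semicolons.
        var parts = s.Substring(1, s.Length - 2).Split(';');
        // Require at least two values inside the braces.
        if (parts.Length < 2)
            return false;
        tokens = parts;
        return true;
    }

    static void Main()
    {
        string[] tokens;
        if (TryParseTokens("{token1;token2;token3}", out tokens))
            Console.WriteLine(string.Join(",", tokens)); // prints token1,token2,token3
        Console.WriteLine(TryParseTokens("{single}", out tokens)); // prints False
    }
}
```

Like the [^;]* groups in the regex answer, this accepts empty values between semicolons; rejecting them would need an extra check.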

Match a particular word after double quotes - c#,regex

I want to match a particular word which is followed by double quotes.
I am using regex @"\bspecific\S*id\b" which will match anything that starts with specific and ends with id.
But, I want something which should match
"specific-anything-id"(it should be with double quotes)
**<specific-anything-id>** - should not match
specific-"anything"-id - should not match
You can include the double quotes and use a negated character class [^"] (matching any char but ") rather than \S (that can also match double quotes as it matches any non-whitespace character):
var pattern = @"""specific[^""]*id""";
You do not need word boundaries either here.
See the regex demo and a C# demo:
var s = "\"specific-anything-id\" <specific-anything-id> specific-\"anything\"-id";
var matches = Regex.Matches(s, @"""specific[^""]*id""");
foreach (Match m in matches)
Console.WriteLine(m.Value); // => "specific-anything-id"
Do:
"([^"]+)"
the matched group would contain the ID you want.

Find words in text by start and stop characters

I need to find and replace all words in text.
Format of these words :
start with (long), end with ;
example
(long)Row["Id"];
What is the regex pattern for this format? I tried some, but they don't work for me.
Thanks.
\(long\)(.*?);
(.*?) is a lazy group: it captures as few characters as necessary to reach the ; at the end. As for the (long), you need to escape the parentheses.
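To illustrate why the lazy quantifier matters here, this sketch runs the pattern and a greedy variant over a made-up line containing two casts. The ToLong() call in the replacement is only placeholder text, not an existing method:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input with two (long) casts on one line.
        var input = "a = (long)Row[\"Id\"]; b = (long)Row[\"Key\"];";

        // Lazy: each match stops at the nearest semicolon.
        Console.WriteLine(Regex.Replace(input, @"\(long\)(.*?);", "$1.ToLong();"));
        // prints a = Row["Id"].ToLong(); b = Row["Key"].ToLong();

        // Greedy: a single match runs to the last semicolon on the line.
        Console.WriteLine(Regex.Replace(input, @"\(long\)(.*);", "$1.ToLong();"));
        // prints a = Row["Id"]; b = (long)Row["Key"].ToLong();
    }
}
```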
Try the following:
var input = "(long)Row[\"Id\"];";
var result = Regex.Replace(input, @"\(long\)([^;]+)", "$1.ToLong()");
The following expression: \(long\)([^;]+):
\(: Matches an opening parenthesis (.
long: Matches the word long literally.
\): Matches a closing parenthesis ).
([^;]+): Matches one or more non-semicolon characters and puts them into capturing group 1.
As an alternative to regex, you can use String.StartsWith and String.EndsWith methods.
For example;
string[] lines = File.ReadAllLines(@"C:\Users\Public\TestFolder\Text.txt");
foreach(string word in lines)
{
if (word.StartsWith("(long)", StringComparison.InvariantCulture) && word.EndsWith(";", StringComparison.InvariantCulture))
{
//Replace your string here.
}
}
