So I have the following regex.replace in C#:
Regex.Replace(inputString, @"^([^,]*,){5}(.*)", @"$1somestring,$2");
where 5 is a variable number in code, but that's not really relevant since at the time of execution it will always have a set value (like 5, for example). Same with somestring,.
Essentially I want to input somestring, between the two groups. The output works for somestring,$2, but $1 is just printed as $1. So say whatever (.*) grabs = "2, a, f2" the resulting string I'd get out is $1somestring,2,a,f2 no matter what $1 is. Is this because of the repeating group feature {5}? If so, how do I grab the collection of repeats and put it in place of where I have $1 right now?
Edit: I know the first group captures correctly, as well. I grab the content of somestring, using this regex:
Regex.Match(line, @"^([^,]*,){5}([0-9]+\.[0-9]+),.*");
The first part is identical to the first group in the replacement regex, and it works fine, so there shouldn't be an issue (and they're both used on the same string).
Edit2:
OK, I'll try to explain more of the process, since someone said it was hard to understand. I have three variables: line, a string I work with, and latIndex and lonIndex, which are just ints (they tell me between which commas the two doubles I'm looking for are located). I have the two following matches:
var latitudeMatch = Regex.Match(line, @"^([^,]*,){" + latIndex + @"}([0-9]+\.[0-9]+),.*");
var longitudeMatch = Regex.Match(line, @"^([^,]*,){" + lonIndex + @"}([0-9]+\.[0-9]+),.*");
I then grab the doubles:
var latitude = latitudeMatch.Groups[2].Value;
var longitude = longitudeMatch.Groups[2].Value;
I use these doubles to get a string from a web API, which I store in a variable called veiRef. Then I want to insert this string after the doubles, using the following code (insert after lat or lon, depending on which one appears last):
if (latIndex > lonIndex)
{
line = Regex.Replace(line, @"^([^,]*,){" + (latIndex + 1) + @"}(.*)", $@"$1{veiRef},$2");
}
else
{
line = Regex.Replace(line, @"^([^,]*,){" + (lonIndex + 1) + @"}(.*)", $@"$1{veiRef},$2");
}
However, this results in a string line which doesn't have the content of $1 inserted before it ($2 works fine).
You have a repeated capturing group at the start of the pattern that you need to turn into a non-capturing one and wrap with a capturing group. Then, you may access the whole part of the match with the $1 backreference.
var line = "a, s, f, double, double, 12, sd, 1";
var latIndex = 5;
var pat = $@"^((?:[^,]*,){{{latIndex+1}}})(.*)";
// Console.WriteLine(pat); // => ^((?:[^,]*,){6})(.*)
var veiRef = "str";
line = Regex.Replace(line, pat, $"${{1}}{veiRef.Replace("$","$$")}$2");
Console.WriteLine(line); // => a, s, f, double, double, 12,str sd, 1
The pattern - ^((?:[^,]*,){6})(.*) - now contains ((?:[^,]*,){6}) after ^, and this is now what $1 holds after a match is found.
Since your replacement string is dynamic, you need to make sure any $ inside gets doubled (hence, .Replace("$","$$")) and that the first backreference is unambiguous, thus it should look like ${1} (it will work regardless of whether veiRef starts with a digit or not).
Replacement string in detail:
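The wrapped group and the escaped replacement can be combined into one small helper. This is only a sketch: the name InsertAfterField and its parameters are hypothetical (they do not appear in the answer), and it assumes plain comma-separated fields with no quoting.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer): inserts `text`
// right after the first `fieldCount` comma-terminated fields of `line`.
static string InsertAfterField(string line, int fieldCount, string text)
{
    // The repeated group is non-capturing; the outer group captures all repeats.
    string pattern = $@"^((?:[^,]*,){{{fieldCount}}})(.*)";

    // Double every '$' so the inserted text is taken literally, and write the
    // first backreference as ${1} so a leading digit in `text` cannot be read
    // as part of the group number.
    string replacement = "${1}" + text.Replace("$", "$$") + "$2";

    return Regex.Replace(line, pattern, replacement);
}

Console.WriteLine(InsertAfterField("a, s, f, double, double, 12, sd, 1", 6, "str,"));
// => a, s, f, double, double, 12,str, sd, 1
Console.WriteLine(InsertAfterField("a,b,c", 2, "9$1,"));
// => a,b,9$1,c
```

The second call shows why both precautions matter: the inserted text starts with a digit and contains a $, yet it ends up in the output unchanged.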
It is an interpolated string literal...
$" - declaration of the interpolated string literal (start)
${{1}} - a literal ${1} string (the { and } must be doubled to denote literal symbols)
{veiRef.Replace("$","$$")} - a piece of C# code inside the interpolated string literal (we delimit this part where code is permitted with single {...})
$2 - a literal $2 string
" - end of the interpolated string literal.
Adding an extra group around the repeating capturing group seems to provide the desired output for the example you gave.
Regex.Replace("a, s, f, double, double, 12, sd, 1", @"^(([^,]*,){5})(.*)", @"$1somestring,$3");
I'm not an expert on RegEx and someone can probably explain it better than I, but:-
Group 1 is the set of 5 repeating capture groups
Group 2 is the last of the repeating capture groups
Group 3 is the text after the 5 repeating capture groups.
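The group numbering described above can be checked directly. A minimal sketch using the same input and pattern; the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

var m = Regex.Match("a, s, f, double, double, 12, sd, 1", @"^(([^,]*,){5})(.*)");

string allRepeats = m.Groups[1].Value;          // all five repetitions together
string lastRepeat = m.Groups[2].Value;          // only the final repetition
string rest = m.Groups[3].Value;                // everything after the fifth comma
int repeatCount = m.Groups[2].Captures.Count;   // .NET also keeps each repetition

Console.WriteLine($"[{allRepeats}] [{lastRepeat}] [{rest}] {repeatCount}");
// => [a, s, f, double, double,] [ double,] [ 12, sd, 1] 5
```

A repeated capturing group reports only its last iteration through Value, which is why the outer group is needed to get the whole prefix.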
Related
I am new to C# programming language and came across the following problem
I have a string " avenue 4 TH some more words". I want to remove space between 4 and TH. I have written a regex which helps in determining whether "4 TH" is available in a string or not.
[0-9]+\s(th|nd|st|rd)
string result = "avenue 4 TH some more words";
var match = Regex.IsMatch(result,"\\b" + item + "\\b",RegexOptions.IgnoreCase) ;
Console.WriteLine(match);//True
Is there anything in C# which will remove the space
something like Regex.Replace(result, "[0-9]+\\s(th|nd|st|rd)", "[0-9]+(th|nd|st|rd)", RegexOptions.IgnoreCase);
so that end result looks like
avenue 4TH some more words
You may use
var pattern = @"(?i)(\d+)\s*(th|[nr]d|st)\b";
var match = string.Concat(Regex.Match(result, pattern)?.Groups.Cast<Group>().Skip(1));
See the C# demo yielding 4TH.
The regex - (?i)(\d+)\s*(th|[nr]d|st)\b - matches 1 or more digits capturing the value into Group 1, then 0 or more whitespaces are matched with \s*, and then th, nd, rd or st as whole words (as \b is a word boundary) are captured into Group 2.
The Regex.Match(result, pattern)? part tries to match the pattern in the string. If there is a match, the match object's Groups property is accessed and all groups are cast to a Group list with Groups.Cast<Group>(). Since the first group is the whole match value, we .Skip(1) it.
The rest - the values of Group 1 and Group 2 - are concatenated with string.Concat.
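The code above yields only the joined token (4TH). To get the whole sentence back with the space removed, as the question asks, the same two groups can be used in a replacement. A sketch under that assumption; JoinOrdinals is a hypothetical name, and \s+ is used instead of \s* because only matches that contain whitespace need changing.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes the whitespace between a number and an
// ordinal suffix (th, nd, rd, st), matching the suffix case-insensitively.
static string JoinOrdinals(string text) =>
    Regex.Replace(text, @"(\d+)\s+(th|[nr]d|st)\b", "$1$2", RegexOptions.IgnoreCase);

Console.WriteLine(JoinOrdinals("avenue 4 TH some more words"));
// => avenue 4TH some more words
```

The \b after the suffix keeps words such as "those" from being treated as an ordinal suffix.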
I'm trying to come up with a regular expression that matches the text in bold in all the examples.
Between the string "JZ" and any character before "-"
JZ123456789-301A
JZ134255872-22013
Between the string "JZ" and the last character
JZ123456789D
I have tried the following but it only works for the first example
(?<=JZ).*(?=-)
You can use (?<=JZ)[0-9]+, presuming the desired text will always be numeric.
You may use
JZ([^-]*)(?:-|.$)
and grab Group 1 value. See the regex demo.
Details
JZ - a literal substring
([^-]*) - Capturing group 1: zero or more chars other than -
(?:-|.$) - a non-capturing group matching either - or any char at the end of the string
C# code:
var m = Regex.Match(s, @"JZ([^-]*)(?:-|.$)");
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value);
}
If, for some reason, you need to obtain the required value as a whole match, use lookarounds:
(?<=JZ)[^-]*(?=-|.$)
See this regex variation demo. Use m.Value in the code above to grab the value.
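The capturing-group pattern can be run against all three inputs from the question. A small sketch; ExtractAfterJz is a hypothetical wrapper, and returning an empty string on a failed match is an assumption made here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the capturing-group pattern shown above.
// Returns an empty string when the input does not match (an assumption).
static string ExtractAfterJz(string s)
{
    var m = Regex.Match(s, @"JZ([^-]*)(?:-|.$)");
    return m.Success ? m.Groups[1].Value : string.Empty;
}

foreach (var input in new[] { "JZ123456789-301A", "JZ134255872-22013", "JZ123456789D" })
{
    Console.WriteLine($"{input} -> {ExtractAfterJz(input)}");
}
// JZ123456789-301A -> 123456789
// JZ134255872-22013 -> 134255872
// JZ123456789D -> 123456789
```

In the third input there is no hyphen, so [^-]* first runs to the end of the string and then gives back one character so that .$ can match the final D.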
A one-line answer without regex:
string s,r;
// if your string always starts with JZ
s = "JZ123456789-301A";
r = string.Concat(s.Substring(2).TakeWhile(char.IsDigit));
Console.WriteLine(r); // output : 123456789
// if your string starts with anything
s = "A12JZ123456789-301A";
r = string.Concat(s.Substring(s.IndexOf("JZ") + 2).TakeWhile(char.IsDigit));
Console.WriteLine(r); // output : 123456789
Basically, we remove everything before and including the delimiter "JZ", then we take each char while it is a digit. The Concat is used to transform the IEnumerable<char> into a string. I think it is easier to read.
I am developing C# MVC application. I got an account name and its code from one field from the view but I have to segregate them for storing them in database. I have used Regular Expression and successfully separated the code from rest of the string. But in the string part I can only get the string before the space or hyphen. My Regex is:
string numberPart = Regex.Match(s, @"\d+").Value;
string alphaPart = Regex.Match(s, @"[a-zA-Z]+\s+").Value;
d.code = numberPart;
d.name = alphaPart;
"2103010001 - SALES - PACKING SERV - MUTTON ( 1F )"
this is my complete string from the view. When I used the above Regex for separating code and description, I get the following,
numberPart = 2103010001
alphaPart = SALES
What I want is:
numberPart = 2103010001
alphaPart = SALES - PACKING SERV - MUTTON ( 1F )
What would be the appropriate expression to do this?
For the second regex, you essentially want "everything after (and including) the first letter". Thus you can simply try
string alphaPart = Regex.Match(s, @"[a-zA-Z].*").Value;
If you want to be more specific, you can restrict the "after" part to just the characters you expect, maybe
string alphaPart = Regex.Match(s, @"[a-zA-Z][a-zA-Z0-9 ()-]*").Value;
but you still need the leading [a-zA-Z] because otherwise you'd match the number part too.
Just split on the first - character.
Regex.Split(input, @"(?<=^[^-]*?)\s*-\s*");
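Applied to the string from the question, the split produces the code and the full description. A sketch only: SplitAccount is a hypothetical name, and returning an empty name when no hyphen is present is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: splits "code - name" on the first hyphen only.
// The lookbehind succeeds only while no '-' occurs earlier in the string,
// so later hyphens stay inside the name.
static (string Code, string Name) SplitAccount(string input)
{
    string[] parts = Regex.Split(input, @"(?<=^[^-]*?)\s*-\s*");
    return (parts[0], parts.Length > 1 ? parts[1] : string.Empty);
}

var account = SplitAccount("2103010001 - SALES - PACKING SERV - MUTTON ( 1F )");
Console.WriteLine(account.Code);  // => 2103010001
Console.WriteLine(account.Name);  // => SALES - PACKING SERV - MUTTON ( 1F )
```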
Suppose I have a string
Likes (20)
I want to fetch the sub-string enclosed in round brackets (in the above case it's 20) from this string. This sub-string can change dynamically at runtime. It might be any other number from 0 to infinity. To achieve this, my idea is to use a for loop that traverses the whole string: when a ( is present, it starts adding the characters to another character array, and when ) is encountered, it stops adding the characters and returns the array. But I think this might have poor performance. I know very little about regular expressions, so is there a regular expression solution available or any function that can do that in an efficient way?
If you don't fancy using regex you could use Split:
string foo = "Likes (20)";
string[] arr = foo.Split(new char[]{ '(', ')' }, StringSplitOptions.None);
string count = arr[1];
Count = 20
This will work fine regardless of the number in the brackets ()
e.g:
Likes (242535345)
Will give:
242535345
Works also with pure string methods:
string result = "Likes (20)";
int index = result.IndexOf('(');
if (index >= 0)
{
result = result.Substring(index + 1); // take part behind (
index = result.IndexOf(')');
if (index >= 0)
result = result.Remove(index); // remove part from )
}
For a strict matching, you can do:
Regex reg = new Regex(@"^Likes \((\d+)\)$");
Match m = reg.Match(yourstring);
this way you'll have all you need in m.Groups[1].Value.
As suggested by I4V, assuming you have only that sequence of digits in the whole string, as in your example, you can use the simpler version:
var res = Regex.Match(str, @"\d+");
and in this case, you can get the value you are looking for with res.Value
EDIT
In case the value enclosed in brackets is not just numbers, you can just change the \d with something like [\w\d\s] if you want to allow in there alphabetic characters, digits and spaces.
Even with Linq:
var s = "Likes (20)";
var s1 = new string(s.SkipWhile(x => x != '(').Skip(1).TakeWhile(x => x != ')').ToArray());
const string likes = "Likes (20)";
int likesCount = int.Parse(likes.Substring(likes.IndexOf('(') + 1, likes.IndexOf(')') - likes.IndexOf('(') - 1));
Matching when the part in parentheses is supposed to be a number:
string inputstring = "Likes (20)";
Regex reg = new Regex(@"\((\d+)\)");
string num = reg.Match(inputstring).Groups[1].Value;
Explanation:
By definition regexp matches a substring, so unless you indicate otherwise the string you are looking for can occur at any place in your string.
\d stands for digits. It will match any single digit.
We want it to potentially be repeated several times, and we want at least one. The + sign is regexp for previous symbol or group repeated 1 or more times.
So \d+ will match one or more digits. It will match 20.
To ensure that we get the number that is in parentheses we say that it should be between ( and ). These are special characters in regexp so we need to escape them.
\(\d+\) would match (20), and we are almost there.
Since we want the part inside the parentheses, and not including the parentheses, we tell regexp that the digits part is a single group.
We do that by using parentheses in our regexp. \((\d+)\) will still match (20), but now it will note that 20 is a subgroup of this match and we can fetch it by Match.Groups[].
For any string in parentheses things get a little bit harder.
Regex reg = new Regex(@"\((.+)\)");
Would work for many strings. (the dot matches any character) But if the input is something like "This is an example(parantesis1)(parantesis2)", you would match (parantesis1)(parantesis2) with parantesis1)(parantesis2 as the captured subgroup. This is unlikely to be what you are after.
The solution can be to do the matching for "any character except a closing parenthesis":
Regex reg = new Regex(@"\(([^)]+)\)");
This will find (parantesis1) as the first match, with parantesis1 as .Groups[1].
It will still fail for nested parentheses, but since regular expressions are not the correct tool for nested parentheses I feel that this case is a bit out of scope.
If you know that the string always starts with "Likes " before the group then Saves solution is better.
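The "anything except a closing parenthesis" idea also extends to inputs with several parenthesized parts. A minimal sketch; InsideParentheses is a hypothetical name, and like the pattern above it does not handle nested parentheses.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the content of every (...) pair in the input.
// [^)]+ stops at the first closing parenthesis, so adjacent pairs are
// reported separately. Nested parentheses are not supported.
static string[] InsideParentheses(string input) =>
    Regex.Matches(input, @"\(([^)]+)\)")
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

Console.WriteLine(string.Join(" | ", InsideParentheses("This is an example(parantesis1)(parantesis2)")));
// => parantesis1 | parantesis2
Console.WriteLine(string.Join(" | ", InsideParentheses("Likes (20)")));
// => 20
```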
I'm working with X12 EDI Files (Specifically 835s for those of you in Health Care), and I have a particular vendor who's using a non-HIPAA compliant version (3090, I think). The problem is that in a particular segment (PLB- again, for those who care) they're sending a code which is no longer supported by the HIPAA Standard. I need to locate the specific code, and update it with a corrected code.
I think a Regex would be best for this, but I'm still very new to Regex, and I'm not sure where to begin. My current methodology is to turn the file into an array of strings, find the array that starts with "PLB", break that into an array of strings, find the code, and change it. As you can guess, that's very verbose code for something which should be (I'd think) fairly simple.
Here's a sample of what I'm looking for:
~PLB|1902841224|20100228|49>KC15X078001104|.08~
And here's what I want to change it to:
~PLB|1902841224|20100228|CS>KC15X078001104|.08~
Any suggestions?
UPDATE: After review, I found I hadn't quite defined my question well enough. The record above is an example, but it is not necessarily a specific formatting match. There are three things which could change between this record and some other (in another file) I'd have to fix. They are:
The Pipe (|) could potentially be any non-alpha numeric character. The file itself will define which character (normally a Pipe or Asterisk).
The > could also be any other non-alpha numeric character (most often : or >)
The set of numbers immediately following the PLB is an identifier, and could change in format and length. I've only ever seen numeric Ids there, but technically it could be alphanumeric, and it won't necessarily be 10 characters.
My Plan is to use String.Format() with my Regex match string so that | and > can be replaced with the correct characters.
And for the record. Yes, I hate ANSI X12.
Assuming that the "offending" code is always 49, you can use the following:
resultString = Regex.Replace(subjectString, @"(?<=~PLB\|\d{10}\|\d{8}\|)49(?=>\w+\|)", "CS");
This looks for 49 if it's the first element after a | delimiter, preceded by a group of 8 digits, another |, a group of 10 digits, yet another |, and ~PLB. It also looks if it is followed by >, then any number of alphanumeric characters, and one more |.
With the new requirements (and the lucky coincidence that .NET is one of the few regex flavors that allow variable repetition inside lookbehind), you can change that to:
resultString = Regex.Replace(subjectString, @"(?<=~PLB\1\w+\1\d{8}(\W))49(?=\W\w+\1)", "CS");
Now any non-alphanumeric character is allowed as separator instead of | or > (but in the case of | it has to be always the same one), and the restrictions on the number of characters for the first field have been loosened.
Another, similar approach that works on any valid X12 file to replace a single data value with another on a matching segment:
public void ReplaceData(string filePath, string segmentName,
int elementPosition, int componentPosition,
string oldData, string newData)
{
string text = File.ReadAllText(filePath);
Match match = Regex.Match(text,
@"^ISA(?<e>.).{100}(?<c>.)(?<s>.)(\w+.*?\k<s>)*IEA\k<e>\d*\k<e>\d*\k<s>$");
if (!match.Success)
throw new InvalidOperationException("Not an X12 file");
char elementSeparator = match.Groups["e"].Value[0];
char componentSeparator = match.Groups["c"].Value[0];
char segmentTerminator = match.Groups["s"].Value[0];
var segments = text
.Split(segmentTerminator)
.Select(s => s.Split(elementSeparator)
.Select(e => e.Split(componentSeparator)).ToArray())
.ToArray();
foreach (var segment in segments.Where(s => s[0][0] == segmentName &&
s.Count() > elementPosition &&
s[elementPosition].Count() > componentPosition &&
s[elementPosition][componentPosition] == oldData))
{
segment[elementPosition][componentPosition] = newData;
}
File.WriteAllText(filePath,
string.Join(segmentTerminator.ToString(), segments
.Select(e => string.Join(elementSeparator.ToString(),
e.Select(c => string.Join(componentSeparator.ToString(), c))
.ToArray()))
.ToArray()));
}
The regular expression used validates a proper X12 interchange envelope and assures that all segments within the file contain at least a one character name element. It also parses out the element and component separators as well as the segment terminator.
Assuming that your code is always a two digit number that comes after a pipe character | and before the greater than sign > you can do it like this:
var result = Regex.Replace(yourString, @"(\|)(\d{2})(>)", @"$1CS$3");
Yes, you can break it down with regex.
If I understand your example correctly, the 2 characters between the | and the > need to be letters and not digits.
~PLB\|\d{10}\|\d{8}\|(\d{2})>\w{14}\|\.\d{2}~
This pattern will match the old one and capture the characters between the | and the >. Which you can then use to modify (lookup in a db or something) and do a replace with the following pattern:
(?<=\|)\d{2}(?=>)
This will look for the ~PLB|#|#| at the start and replace the 2 numbers before the > with CS.
Regex.Replace(testString, @"(?<=~PLB\|[0-9]{10}\|[0-9]{8})(\|)([0-9]{2})(>)", @"$1CS$3");
The X12 protocol standard allows the specification of element and component separators in the header, so anything that hard-codes the "|" and ">" characters could eventually break. Since the standard mandates that the characters used as separators (and segment terminators, e.g., "~") cannot appear within the data (there is no escape sequence to allow them to be embedded), parsing the syntax is very simple. Maybe you're already doing something similar to this, but for readability...
// The original segment string (without segment terminator):
string segment = "PLB|1902841224|20100228|49>KC15X078001104|.08";
// Parse the segment into elements, then the fourth element
// into components (bounds checking is omitted for brevity):
var elements = segment.Split('|');
var components = elements[3].Split('>');
// If the first component is the bad value, replace it with
// the correct value (again, not checking bounds):
if (components[0] == "49")
components[0] = "CS";
// Reassemble the segment by joining the components into
// the fourth element, then the elements back into the
// segment string:
elements[3] = string.Join(">", components);
segment = string.Join("|", elements);
Obviously more verbose than a single regular expression but parsing X12 files is as easy as splitting strings on a single character. Except for the fixed length header (which defines the delimiters), an entire transaction set can be parsed with Split:
// Starting with a string that contains the entire 835 transaction set:
var segments = transactionSet.Split('~');
var segmentElements = segments.Select(s => s.Split('|')).ToArray();
// segmentElements contains an array of element arrays,
// each composite element can be split further into components as shown earlier
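Putting the two snippets above together, and passing the separators in rather than hard-coding them, gives something like the following. It is a sketch under stated assumptions: FixPlbCode and its parameters are hypothetical names, the caller has already read the three separators from the header, and the text contains no line breaks between segments.

```csharp
using System;
using System.Linq;

// Hypothetical helper: in every PLB segment, replaces oldCode with newCode
// when it is the first component of the fourth element. The separators are
// parameters because each X12 file declares its own.
static string FixPlbCode(string transactionSet, char segmentTerminator,
                         char elementSep, char componentSep,
                         string oldCode, string newCode)
{
    var segments = transactionSet.Split(segmentTerminator).Select(segment =>
    {
        string[] elements = segment.Split(elementSep);
        if (elements[0] != "PLB" || elements.Length < 4)
            return segment;                       // not a PLB segment, or too short

        string[] components = elements[3].Split(componentSep);
        if (components[0] != oldCode)
            return segment;                       // nothing to fix here

        components[0] = newCode;
        elements[3] = string.Join(componentSep.ToString(), components);
        return string.Join(elementSep.ToString(), elements);
    });

    return string.Join(segmentTerminator.ToString(), segments);
}

string sample = "ST|835|0001~PLB|1902841224|20100228|49>KC15X078001104|.08~";
Console.WriteLine(FixPlbCode(sample, '~', '|', '>', "49", "CS"));
// => ST|835|0001~PLB|1902841224|20100228|CS>KC15X078001104|.08~
```

Because only the first component of the fourth PLB element is compared, a 49 that appears in another segment or another position is left alone.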
What I found is working is the following:
parts = original.Split(record);
for(int i = parts.Length -1; i >= 0; i--)
{
string s = parts[i];
string nString =String.Empty;
if (s.StartsWith("PLB"))
{
string[] elems = s.Split(elem);
if (elems[3].Contains("49" + subelem.ToString()))
{
string regex = string.Format(@"(\{0})49({1})", elem, subelem);
nString = Regex.Replace(s, regex, @"$1CS$2");
}
I'm still having to split my original file into a set of strings and then evaluate each string, but that seems to be working now.
If anyone knows how to get around that string.Split up at the top, I'd love to see a sample.