I am developing a C# MVC application. I receive an account name and its code in a single field from the view, but I have to separate them before storing them in the database. I used regular expressions and successfully separated the code from the rest of the string, but for the name part I only get the text up to the first space or hyphen. My code is:
string numberPart = Regex.Match(s, @"\d+").Value;
string alphaPart = Regex.Match(s, @"[a-zA-Z]+\s+").Value;
d.code = numberPart;
d.name = alphaPart;
"2103010001 - SALES - PACKING SERV - MUTTON ( 1F )"
This is the complete string from the view. When I use the regexes above to separate the code and the description, I get the following:
numberPart = 2103010001
alphaPart = SALES
What I want is:
numberPart = 2103010001
alphaPart = SALES - PACKING SERV - MUTTON ( 1F )
What would be the appropriate expression to do this?
For the second regex, you essentially want "everything after (and including) the first letter". Thus you can simply try
string alphaPart = Regex.Match(s, @"[a-zA-Z].*").Value;
If you want to be more specific, you can restrict the "after" part to just the characters you expect, maybe
string alphaPart = Regex.Match(s, @"[a-zA-Z][a-zA-Z0-9 ()-]*").Value;
but you still need the leading [a-zA-Z] because otherwise you'd match the number part too.
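To make the two matches concrete, here is a minimal sketch that wraps them in one helper. The class and method names (AccountFieldParser, Split) are made up for illustration, and it assumes the field always has the digits first and the name after them, as in the sample string.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
static class AccountFieldParser
{
    // Code: the first run of digits.
    // Name: everything from the first letter to the end of the line.
    public static (string Code, string Name) Split(string s)
    {
        string code = Regex.Match(s, @"\d+").Value;
        string name = Regex.Match(s, @"[a-zA-Z].*").Value;
        return (code, name);
    }
}
```

Calling AccountFieldParser.Split on the sample string from the question should give 2103010001 as the code and SALES - PACKING SERV - MUTTON ( 1F ) as the name.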
Just split on the first - character.
Regex.Split(input, @"(?<=^[^-]*?)\s*-\s*");
I have a string which I extract from an HTML document like this:
var elas = htmlDoc.DocumentNode.SelectSingleNode("//a[@class='a-size-small a-link-normal a-text-normal']");
if (elas != null)
{
//
_extractedString = elas.Attributes["href"].Value;
}
The HREF attribute contains this part of the string:
gp/offer-listing/B002755TC0/
I'm trying to extract the B002755TC0 value, but the string varies in length, so I cannot simply use the Substring method with fixed positions to extract it.
Instead, I was wondering whether there is a clever way to do this, perhaps by matching the part of the string that comes right before the value I'm after.
For example, I know for a fact that every href has the structure shown above, so I could simply match this keyword:
offer-listing/
I would find this keyword and then extract the part of the string that follows it (B002755TC0) up to the next "/" character.
Can someone help me out with this?
This is a perfect job for a regular expression :
string text = "gp/offer-listing/B002755TC0/";
Regex pattern = new Regex(@"offer-listing/(\w+)/");
Match match = pattern.Match(text);
string whatYouAreLookingFor = match.Groups[1].Value;
Explanation : we just match the exact pattern you need.
'offer-listing/'
followed by any combination of (at least one) 'word characters' (letters, digits and underscore; note that a hyphen is not a word character),
followed by a slash.
The parentheses () mean 'capture this group' (so we can extract it later with match.Groups[1]).
EDIT: if you want to extract also from this : /dp/B01KRHBT9Q/
Then you could use this pattern :
Regex pattern = new Regex(@"/(\w+)/$");
which will match both this string and the previous. The $ stands for the end of the string, so this literally means :
capture the characters in between the last two slashes of the string
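As a rough sketch of how the end-anchored pattern could be used defensively, the helper below checks Match.Success before reading the group. The names HrefParser and TryExtractId are invented for this example, and it assumes the href always ends with a slash, as in both samples.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static class HrefParser
{
    // Captures the word characters between the last two slashes of the string.
    private static readonly Regex LastSegment = new Regex(@"/(\w+)/$");

    // Returns true and sets id when the href ends in "/<word chars>/".
    public static bool TryExtractId(string href, out string id)
    {
        Match m = LastSegment.Match(href);
        id = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

With this shape, an href that does not follow the expected structure yields false instead of silently producing an empty string.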
Though there is already an accepted answer, I thought I'd share another solution that doesn't use Regex. Find the position of your pattern in the input and add its length; the wanted text starts at that index. To find the end, search for the first "/" after the beginning of the wanted text:
string input = "gp/offer-listing/B002755TC0/";
string pat = "offer-listing/";
int beginning = input.IndexOf(pat) + pat.Length; // index just past "offer-listing/"
int end = input.IndexOf("/", beginning); // first slash after the wanted text
string result = input.Substring(beginning, end - beginning);
If your desired output is always the last piece, you could also use split and get the last non-empty piece:
// Last() requires "using System.Linq;"
string result2 = input.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries).Last();
So I have the following regex.replace in C#:
Regex.Replace(inputString, @"^([^,]*,){5}(.*)", @"$1somestring,$2");
where 5 is a variable number in code, but that's not really relevant since at the time of execution it will always have a set value (like 5, for example). Same with somestring,.
Essentially I want to input somestring, between the two groups. The output works for somestring,$2, but $1 is just printed as $1. So say whatever (.*) grabs = "2, a, f2" the resulting string I'd get out is $1somestring,2,a,f2 no matter what $1 is. Is this because of the repeating group feature {5}? If so, how do I grab the collection of repeats and put it in place of where I have $1 right now?
Edit: I know the first group captures correctly, as well. I grab the content of somestring, using this regex:
Regex.Match(line, @"^([^,]*,){5}([0-9]+\.[0-9]+),.*");
The first part is identical to the first group in the replacement regex, and it works fine, so there shouldn't be an issue (both are used on the same string).
Edit2:
OK, I'll try to explain more of the process, since someone said it was hard to understand. I have three variables: line, the string I work with, and latIndex and lonIndex, which are ints telling me between which commas the two doubles I'm looking for are located. I have the following two matches:
var latitudeMatch = Regex.Match(line, @"^([^,]*,){" + latIndex + @"}([0-9]+\.[0-9]+),.*");
var longitudeMatch = Regex.Match(line, @"^([^,]*,){" + lonIndex + @"}([0-9]+\.[0-9]+),.*");
I then grab the doubles:
var latitude = latitudeMatch.Groups[2].Value;
var longitude = longitudeMatch.Groups[2].Value;
I use these doubles to get a string from a web API, which I store in a variable called veiRef. Then I want to insert it after the doubles, using the following code (inserting after lat or lon, whichever appears last):
if (latIndex > lonIndex)
{
line = Regex.Replace(line, @"^([^,]*,){" + (latIndex + 1) + @"}(.*)", $@"$1{veiRef},$2");
}
else
{
line = Regex.Replace(line, @"^([^,]*,){" + (lonIndex + 1) + @"}(.*)", $@"$1{veiRef},$2");
}
However, this results in a string line which doesn't have the content of $1 inserted before it ($2 works fine).
You have a repeated capturing group at the start of the pattern that you need to turn into a non-capturing one and wrap with a capturing group. Then, you may access the whole part of the match with the $1 backreference.
var line = "a, s, f, double, double, 12, sd, 1";
var latIndex = 5;
var pat = $@"^((?:[^,]*,){{{latIndex+1}}})(.*)";
// Console.WriteLine(pat); // => ^((?:[^,]*,){6})(.*)
var veiRef = "str";
line = Regex.Replace(line, pat, $"${{1}}{veiRef.Replace("$","$$")}$2");
Console.WriteLine(line); // => a, s, f, double, double, 12,str sd, 1
The pattern - ^((?:[^,]*,){6})(.*) - now contains ((?:[^,]*,){6}) after ^, and this is now what $1 holds after a match is found.
Since your replacement string is dynamic, you need to make sure any $ inside it gets doubled (hence .Replace("$","$$")) and that the first backreference is unambiguous, so it should be written as ${1} (this works regardless of whether veiRef starts with a digit).
Replacement string in details:
It is an interpolated string literal...
$" - declaration of the interpolated string literal (start)
${{1}} - a literal ${1} string (the { and } must be doubled to denote literal symbols)
{veiRef.Replace("$","$$")} - a piece of C# code inside the interpolated string literal (we delimit this part where code is permitted with single {...})
$2 - a literal $2 string
" - end of the interpolated string literal.
Adding an extra group around the repeating capturing group seems to provide the desired output for the example you gave.
Regex.Replace("a, s, f, double, double, 12, sd, 1", @"^(([^,]*,){5})(.*)", @"$1somestring,$3");
I'm not an expert on RegEx and someone can probably explain it better than I, but:-
Group 1 is the set of 5 repeating capture groups
Group 2 is the last of the repeating capture groups
Group 3 is the text after the 5 repeating capture groups.
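Putting the pieces from the answers above together, a reusable version might look like the sketch below. It uses the non-capturing inner group so that $1 is the whole repeated prefix, and it escapes $ in the inserted value. CsvLineEditor and InsertAfterFields are names invented for this example, and the sketch assumes fields never contain quoted commas.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
static class CsvLineEditor
{
    // Inserts value plus a trailing comma after the first fieldCount
    // comma-terminated fields. Lines with fewer fields are returned unchanged.
    public static string InsertAfterFields(string line, int fieldCount, string value)
    {
        // {{ and }} are literal braces; {fieldCount} is the interpolated count.
        string pattern = $@"^((?:[^,]*,){{{fieldCount}}})(.*)";
        // ${1} keeps the backreference unambiguous even if value starts with a digit;
        // doubling $ keeps the value from being read as a substitution.
        string replacement = "${1}" + value.Replace("$", "$$") + ",$2";
        return Regex.Replace(line, pattern, replacement);
    }
}
```

For example, InsertAfterFields("a,b,c,d", 2, "X") should produce a,b,X,c,d.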
Slightly similar to this question, I want to replace argv contents:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
to this:
"-help=none\n-URL=(default)\n-password=********\n-uname=Khanna\n-p=100"
I have tried very basic string find and search operations (using IndexOf, Substring, etc.). I am looking for a more elegant solution to replace this part of the string:
-password=AnyPassword
to:
-password=*******
And keep other part of string intact. I am looking if String.Replace or Regex replace may help.
What I've tried (not much of error-checks):
// The arguments use a single leading dash, so search for "-password=".
var pwd_index = argv.IndexOf("-password=");
string converted;
if (pwd_index >= 0)
{
var leftPart = argv.Substring(0, pwd_index);
var pwdStr = argv.Substring(pwd_index);
var rightPart = pwdStr.Substring(pwdStr.IndexOf("\n") + 1);
converted = leftPart + "-password=********\n" + rightPart;
}
else
converted = argv;
Console.WriteLine(converted);
Solution
Similar to Rubens Farias' solution but a little bit more elegant:
string argv = "-help=none\n-URL=(default)\n-password=\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)[^\n]*", "$1********");
It matches password= literally, stores it in capture group 1 (referenced as $1 in the replacement), and then keeps matching until a \n is reached.
This yields a constant number of *'s, though. But revealing how many characters a password has might already convey too much information to an attacker anyway.
Working example: https://dotnetfiddle.net/xOFCyG
Regular expression breakdown
( // Store the following match in capture group $1.
password= // Match "password=" literally.
)
[ // Match one from a set of characters.
^ // Negate a set of characters (i.e., match anything not
// contained in the following set).
\n // The character set: consists only of the new line character.
]
* // Match the preceding element (the character set) zero or more times.
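As a variation on the pattern broken down above, the sketch below anchors the match to the start of a line so that only an argument literally named -password is masked. That anchoring is an assumption added for this example (the original pattern matches password= anywhere), and ArgMasker and MaskPassword are made-up names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static class ArgMasker
{
    // (?m) makes ^ match at the start of every line, so "-oldpassword=" is left alone.
    // [^\n]* consumes the value up to, but not including, the next newline.
    public static string MaskPassword(string argv)
    {
        return Regex.Replace(argv, @"(?m)^(-password=)[^\n]*", "$1********");
    }
}
```

Because the value is matched with [^\n]*, this also works when -password is the last argument and no newline follows it.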
This code replaces the password value by several "*" characters:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)([\s\S]*?\n)",
match => match.Groups[1].Value + new String('*', match.Groups[2].Value.Length - 1) + "\n");
You can also remove the new String() part and replace it by a string constant
I am trying to create a regex for this task, but I really can't get my head around regex beyond very simple cases :-( :
The problem: I have this ("SQL like") query:
SELECT tcmcs003.*, tccom130.nama, tccom705.dsca, tcmcs052.dsca, tccom100.nama
FROM tcmcs003, tccom130,tccom705,tcmcs052,tccom100
WHERE tcmcs003.cadr REFERS TO tccom130
AND tcmcs003.casi REFERS TO tccom705
AND tcmcs003.cprj REFERS TO tcmcs052
AND tcmcs003.bpid REFERS TO tccom100
ORDER BY tcmcs003._index1
I want to "extract" all the table names and column names, and after that I want to simply add my characters to them...
For example replace:
SELECT tcmcs003.*, tccom130.nama
with:
SELECT tcmcs003XXX.*, tccom130XXX.namaYYY
So far, the "best" regex I have is this:
(?<gselect>SELECT\s+)*(?<tname>\w{5}\d{3})*(?<spaces>[\.\,\s])+(?<colname>\w{4})*
And replacement pattern:
${gselect}${tname}XXX${spaces}${colname}YYY
The output is really terrible :-(
SELECT tcmcs003.
m130
.nama
m705
.dsca
s052
.dsca
m100
.nama
FROM
s003
m130
,m705
,s052
,m100
WHER
s003
.cadr
REFE
m130
s003
How can I write the regex?
I want to repeatedly capture something like
[(any string)(table name)(\.a dot or not)(column name)(any string) ] (repeat N times)
EDIT
I am writing in C#
The pattern should be a bit more general than:
\b(tc(?:mcs|com)\d{3}XXX.\w+)\b
in the sense that table name is 5 characters (the first is always a t, followed by 4 random chars) followed by 3 random digits
table column is 4 random chars
Instead of trying to match the whole command, I'll simply match each table or column independently. Since table names contain digits, there's little chance the patterns will match anything else.
Match column names with:
\b(t\w{4}\d{3}\.\w{4})\b
Match table names with:
\b(t\w{4}\d{3})\b
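Then, we can replace each with the desired value: "$1YYY" and "$1XXX" respectively. Run the column replacement first: once XXX has been appended to a table name, the column pattern no longer matches. The patterns use these constructs: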
\b Matches a word boundary (a word char on one side and not a word char on the other).
\w{4} Matches 4 word chars ([A-Za-z0-9_]).
\d{3} Matches 3 digits ([0-9]).
Code:
string input = @"SELECT tcmcs003.*, tccom130.nama, tccom705.dsca, tcmcs052.dsca, tccom100.nama
FROM tcmcs003, tccom130,tccom705,tcmcs052,tccom100
WHERE tcmcs003.cadr REFERS TO tccom130
AND tcmcs003.casi REFERS TO tccom705
AND tcmcs003.cprj REFERS TO tcmcs052
AND tcmcs003.bpid REFERS TO tccom100
ORDER BY tcmcs003._index1";
string Pattern1 = @"\b(t\w{4}\d{3}\.\w{4})\b";
string Pattern2 = @"\b(t\w{4}\d{3})\b";
Regex r1 = new Regex(Pattern1);
Regex r2 = new Regex(Pattern2);
string replacement1 = "YYY";
string replacement2 = "XXX";
string result = "";
result = r1.Replace(input, "$1" + replacement1);
result = r2.Replace(result, "$1" + replacement2);
Console.WriteLine(result);
I have a regular expression
Regex r = new Regex(@"(\s*)([A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]\d(?!.*[DFIOQU])(?:[A-Z](\s?)\d[A-Z]\d))(\s*)",RegexOptions.IgnoreCase);
and a string
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
I have to fetch C1C 1C1. This works fine.
But if I modify the test string to
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
then it is unable to find the pattern, i.e. C1C 1C1.
Any idea why this expression is failing?
You have a negative look ahead:
(?!.*[DFIOQU])
That matches the "O" in "ON" and since it is a negative look ahead, the whole pattern fails. And, as an aside, I think you want to replace this:
[A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]
With this:
[A-CEGHJ-NPR-TVYX]
A pipe (|) is a literal character inside a character class, not an alternation, and you can use ranges to help highlight the characters that you're leaving out.
A single regex might not be the best way to parse that string. Or perhaps you just need a looser regex.
Your negative lookahead (?!.*[DFIOQU]) asserts that none of the characters DFIOQU appears anywhere later in the string.
In your second string there is an O at the end, in ON, so the match fails.
If you remove the .* from the negative lookahead, it will only check the character that directly follows instead of the whole rest of the string (is that what you want?).
\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])(?:[A-Z]\s?\d[A-Z]\d))\s*
Then it works. It now only checks that none of the characters in the class comes directly after the digit; I don't know if this is intended.
By the way, I removed the | from your first character class, since it's not needed, and also some parentheses around your whitespace, also not needed.
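To check the adjusted pattern against both test strings from the question, a small sketch like the following could be used. CodeFinder and Find are hypothetical names, and the sketch assumes uppercase input, so RegexOptions.IgnoreCase is left out.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static class CodeFinder
{
    // The lookahead (?![DFIOQU]) only inspects the single character after the
    // first digit, so a trailing word such as "ON" no longer blocks the match.
    private static readonly Regex Pattern = new Regex(
        @"\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])[A-Z]\s?\d[A-Z]\d)\s*");

    // Returns the captured code without surrounding whitespace, or "" if none is found.
    public static string Find(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Both strings from the question should now yield C1C 1C1, since group 1 excludes the whitespace consumed by the leading and trailing \s*.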
As I understand it, you need to find the C1C 1C1 text in your string.
I used this regex to do it:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
After that you can extract the text from the named groups:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
RegexOptions myRegexOptions = RegexOptions.Multiline;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
string secondStr = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
Match match = myRegex.Match(strTargetString);
string c1c = match.Groups["c1c"].Value;
string c1c2 = match.Groups["c1c2"].Value;
Console.WriteLine(c1c + " " +c1c2);