regex to strip number from var in string - c#

I have a long string that contains a variable assignment:
var abc = '123456'
I want to extract the 123456 from it.
I have tried a regex, but it's not working properly:
Regex regex = new Regex("(?<abc>+)=(?<var>+)");
Match m = regex.Match(body);
if (m.Success)
{
string key = m.Groups["var"].Value;
}
How can I get the number from the var abc?
Thanks for your help and time

var body = @" fsd fsda f var abc = '123456' fsda fasd f";
Regex regex = new Regex(@"var (?<name>\w*) = '(?<number>\d*)'");
Match m = regex.Match(body);
Console.WriteLine("name: " + m.Groups["name"]);
Console.WriteLine("number: " + m.Groups["number"]);
prints:
name: abc
number: 123456

Your regex is not correct:
(?<abc>+)=(?<var>+)
The + characters are quantifiers, meaning that the preceding element is repeated at least once. Here there is nothing for them to repeat: (?<name>...) is a named capture group, and inside each group the + is the very first thing in the pattern.
You perhaps meant:
(?<abc>.+)=(?<var>.+)
And a better regex might be:
(?<abc>[^=]+)=\s*'(?<var>[^']+)'
[^=]+ will match any character except an equal sign.
\s* means any number of space characters (will also match tabs, newlines and form feeds though)
[^']+ will match any character except a single quote.
To specifically match the variable abc, you then put it like this:
(?<abc>abc)\s*=\s*'(?<var>[^']+)'
(I added some more allowances for spaces)
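As a minimal sketch of how that last pattern could be used from C#: the helper name ExtractValue and the group name value are made up for illustration, and Regex.Escape is added on the assumption that the variable name might contain regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VarNumberDemo
{
    // Hypothetical helper: returns the quoted value assigned to the given
    // variable name, or null when no such assignment is found.
    public static string ExtractValue(string body, string variableName)
    {
        // \b keeps "abc" from matching inside a longer name such as "xabc".
        string pattern = @"\b" + Regex.Escape(variableName) + @"\s*=\s*'(?<value>[^']+)'";
        Match m = Regex.Match(body, pattern);
        return m.Success ? m.Groups["value"].Value : null;
    }

    public static void Main()
    {
        string body = " fsd fsda f var abc = '123456' fsda fasd f";
        Console.WriteLine(ExtractValue(body, "abc")); // 123456
    }
}
```

Checking m.Success before reading the group avoids silently treating an empty string as a result when the variable is missing.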

From the example you provided, the number can be extracted like this:
Console.WriteLine (
Regex.Match("var abc = '123456'", @"(?<var>\d+)").Groups["var"].Value); // 123456
\d+ means one or more digits.
But I surmise your data doesn't look like your example.

Try this:
var body = @"my word 1, my word 2, my word var abc = '123456' 3, my word x";
Regex regex = new Regex(@"(?<=var \w+ = ')\d+");
Match m = regex.Match(body);

Related

C# Filter a word with an undefined number of spaces between characters

For example:
I can create a word with spaces between its characters like this:
string example = "**example**";
List<string> outputs = new List<string>();
string example_output = "";
foreach (char c in example)
{
    example_output += c + " ";
}
And then I can loop over it to remove the spaces and add each variant to the outputs list.
The problem is that I need it to work in scenarios where there are double spaces and more.
For example:
string text = "This is a piece of text for this **example**.";
I basically want to detect AND remove 'example'.
But I want to do that even when it is written as e xample, e x ample or example.
And in my scenario, since it's a spam filter, I can't just strip the spaces from the whole sentence as shown below, because I would need to .Replace() the word with exactly the same spaces the user typed.
.Replace(" ", "");
How would I achieve this?
TLDR:
I want to filter out a word in any combination of spaces without altering any other part of the line.
So example, e xample, e x ample, e x a m ple
all become a filtered word.
I wouldn't mind a method that generates the word with all space combinations as plan B.

You can use this regex to achieve that:
(e[\s]*x[\s]*a[\s]*m[\s]*p[\s]*l[\s]*e)

You could use a regex for that: e\s*x\s*a\s*m\s*p\s*l\s*e
\s matches any whitespace character and * means zero or more occurrences of it.
Small snippet:
const string myInput = "e x ample";
var regex = new Regex(@"e\s*x\s*a\s*m\s*p\s*l\s*e");
var match = regex.Match(myInput);
if (match.Success)
{
// We have a match! Bad word
}
Here the link for the regex: https://regex101.com/r/VFjzTg/1

As I understand it, the problem is to ignore whitespace inside the word being matched without touching whitespace anywhere else in the string.
You could build a regular expression from your search word that allows arbitrary whitespace between each pair of characters.
// prepare regex. Need to do this only once for many applications.
string findword = "example";
// TODO: would need to escape special chars like * ( ) \ . + ? here.
string[] tmp = new string[findword.Length];
for(int i=0;i<tmp.Length;i++)tmp[i]=findword.Substring(i,1);
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(string.Join("\\s*",tmp));
// on each text to filter, do this:
string inp = "A text with the exa mple word in it.";
string outp;
outp = r.Replace(inp,"");
System.Console.WriteLine(outp);
Left out the escaping of regex-special-chars for brevity.

You can try regular expressions:
using System.Linq;
using System.Text.RegularExpressions;
....
// Having a word to find
string toFind = "Example";
// we build the regular expression
Regex regex = new Regex(
    @"\b" + string.Join(@"\s*", toFind.Select(c => Regex.Escape(c.ToString()))) + @"\b",
    RegexOptions.IgnoreCase);
// Then we apply regex built for the required text:
string text = "This is a piece of text for this **example**. And more (e X amp le)";
string result = regex.Replace(text, "");
Console.Write(result);
Outcome:
This is a piece of text for this ****. And more ()
Edit: if you want to ignore diacritics, extend the regular expression so that each letter may be followed by combining marks (Unicode category Mn):
string toFind = "Example";
Regex regex = new Regex(@"\b" + string.Join(@"\s*",
    toFind.Select(c => Regex.Escape(c.ToString()) + @"\p{Mn}*")),
    RegexOptions.IgnoreCase);
and Normalize the text before matching, so that accented letters are split into a base letter plus combining marks:
string text = "This is a piece of text for this **examplé**. And more (e X amp le)";
string result = regex.Replace(text.Normalize(NormalizationForm.FormD), "");

Replacing multiple occurrences of string using string builder by regex pattern matching

We are trying to replace all matching patterns (regex) in a string builder with their respective "groups".
Firstly, we are trying to find the count of all occurrences of that pattern and loop through them (count - termination condition). For each match we are assigning the match object and replace them using their respective groups.
Here only the first occurrence is replaced and the other matches are never replaced.
*str* - contains the actual string
Regex - ('.*')\s*=\s*(.*)
To match pattern:
'nam_cd'=isnull(rtrim(x.nam_cd),''),
'Company'=isnull(rtrim(a.co_name),'')
Pattern : created using https://regex101.com/
*matches.Count* - gives the correct count (here 2)
String pattern = @"('.*')\s*=\s*(.*)";
MatchCollection matches = Regex.Matches(str, pattern);
StringBuilder sb = new StringBuilder(str);
Match match = Regex.Match(str, pattern);
for (int i = 0; i < matches.Count; i++)
{
    String First = String.Empty;
    Console.WriteLine(match.Groups[0].Value);
    Console.WriteLine(match.Groups[1].Value);
    First = match.Groups[2].Value.TrimEnd('\r');
    First = First.Trim();
    First = First.TrimEnd(',');
    Console.WriteLine(First);
    sb.Replace(match.Groups[0].Value, First + " as " + match.Groups[1].Value + " ,", match.Index, match.Groups[0].Value.Length);
    match = match.NextMatch();
}
Current output:
SELECT DISTINCT
isnull(rtrim(f.fleet),'') as 'Fleet' ,
'cust_clnt_id' = isnull(rtrim(x.cust_clnt_id),'')
Expected output:
SELECT DISTINCT
isnull(rtrim(f.fleet),'') as 'Fleet' ,
isnull(rtrim(x.cust_clnt_id),'') as 'cust_clnt_id'

A regex solution like this is too fragile. If you need to parse arbitrary SQL, you need a dedicated parser. There are examples of how to parse SQL properly in Parsing SQL code in C#.
If you are sure there are no "wild", unbalanced ( and ) in your input, you may use a regex as a workaround for a one-off job:
var result = Regex.Replace(s, @"('[^']+')\s*=\s*(\w+\((?>[^()]+|(?<o>\()|(?<-o>\)))*\))", "\n $2 as $1");
Details
('[^']+') - Capturing group 1 ($1): ', 1 or more chars other than ' and then '
\s*=\s* - = enclosed with 0+ whitespaces
(\w+\((?>[^()]+|(?<o>\()|(?<-o>\)))*\)) - Capturing group 2 ($2):
\w+ - 1+ word chars
\((?>[^()]+|(?<o>\()|(?<-o>\)))*\) - a (...) substring with any amount of balanced (...)s inside.
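The pattern described above can be tried on the two sample lines from the question. This is only a sketch: the class and method names are invented for illustration, and the replacement string omits the leading newline used in the answer so the result is easier to compare. Because Regex.Replace rewrites all matches in one pass, it also sidesteps what is likely the cause of the original problem, namely match indexes that go stale once the first StringBuilder replacement changes the text length.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AliasRewriteDemo
{
    // A quoted alias, '=', then a function call whose parentheses are
    // balanced by means of .NET balancing groups (the "o" group).
    private const string Pattern =
        @"('[^']+')\s*=\s*(\w+\((?>[^()]+|(?<o>\()|(?<-o>\)))*\))";

    // Hypothetical helper: turns 'alias'=expr(...) into expr(...) as 'alias'.
    public static string Rewrite(string sql)
    {
        return Regex.Replace(sql, Pattern, "$2 as $1");
    }

    public static void Main()
    {
        string sql = "'nam_cd'=isnull(rtrim(x.nam_cd),''),\n" +
                     "'Company'=isnull(rtrim(a.co_name),'')";
        Console.WriteLine(Rewrite(sql));
    }
}
```

Text outside the matches, such as the comma and line break between the two columns, is left untouched.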

Find hashtags in string

I am working on a Xamarin.Forms PCL project in C# and would like to detect all the hashtags in a string.
I tried splitting at spaces and checking whether each word begins with a #, but the problem is that if the post contains two spaces in a row, like "Hello #World Test", it would lose the double space.
string body = "Example string with a #hashtag in it";
string newbody = "";
foreach (var word in body.Split(' '))
{
if (word.StartsWith("#"))
newbody += "[" + word + "]";
newbody += word;
}
Goal output:
Example string with a [#hashtag] in it
I also want a hashtag to contain only A-Z, a-z, 0-9 and _, stopping at any other character:
Test #H3ll0_W0rld$%Test => Test [#H3ll0_W0rld]$%Test
Other Stack questions try to detect the hashtag and extract it; I would like to work with it and put it back in the string without losing anything that methods such as splitting by certain characters would lose.

You can use Regex with #\w+ and $&
Explanation
# matches the character # literally
\w matches any word character ([a-zA-Z0-9_]; in .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is used)
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$& Includes a copy of the entire match in the replacement string.
Example
var input = "asdads sdfdsf #burgers, #rabbits dsfsdfds #sdf #dfgdfg";
var regex = new Regex(@"#\w+");
var matches = regex.Matches(input);
foreach (var match in matches)
{
Console.WriteLine(match);
}
or
var result = regex.Replace(input, "[$&]" );
Console.WriteLine(result);
Output
#burgers
#rabbits
#sdf
#dfgdfg
asdads sdfdsf [#burgers], [#rabbits] dsfsdfds [#sdf] [#dfgdfg]

Use a regular expression: #\w*
string pattern = @"#\w*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
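Since the question asks for hashtags limited to A-Z, a-z, 0-9 and _, a sketch with an explicit character class may be closer to the goal than \w. The class and method names below are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashtagDemo
{
    // Wraps every hashtag in square brackets. Only the matched text is
    // replaced, so repeated spaces and other characters survive unchanged.
    public static string BracketHashtags(string body)
    {
        // $& in the replacement stands for the whole match.
        return Regex.Replace(body, "#[A-Za-z0-9_]+", "[$&]");
    }

    public static void Main()
    {
        Console.WriteLine(BracketHashtags("Test #H3ll0_W0rld$%Test"));
        // Test [#H3ll0_W0rld]$%Test
    }
}
```

The + quantifier requires at least one character after the #, so a lone # is left alone.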

RegEx string between N and (N+1)th Occurrence

I am trying to find the nth substring between two special characters. For example:
one|two|three|four|five
Say I am looking for the string between the nth and (n+1)th occurrence of the '|' character, here the 2nd and 3rd, which turns out to be 'three'. I want to do it using a regex. Could someone guide me?
My current attempt is as follows:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;

If you have full access to the C# code, you should consider a plain splitting approach (ElementAtOrDefault needs using System.Linq):
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to code (for example, in a tool powered by the .NET regex engine):
^(?:[^|]*\|){2}([^|]*)
It matches
^ - start of string
(?:[^|]*\|){2} - exactly 2 (adjust as needed) sequences of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $@"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
The (?<=...) positive lookbehind only checks for the presence of a pattern immediately to the left of the current location; if that pattern is not matched, the match fails.

Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.

Use this regex and then select the n-th match (zero-based, in this case index 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?<=\|)[^\|]*");
var m = r.Matches(subtext)[2]; // m.Value is "three"

search string for everything before a set of characters in C#

I'm looking for a way to search a string for everything before a set of characters in C#. For Example, if this is my string value:
This is is a test.... 12345
I want to build a new string with all of the characters before "12345".
So my new string would equal "This is is a test.... "
Is there a way to do this?
I've found Regex examples where you can focus on one character but not a sequence of characters.

You don't need to use a Regex:
public string GetBitBefore(string text, string end)
{
var index = text.IndexOf(end);
if (index == -1) return text;
return text.Substring(0, index);
}

You can use a lazy quantifier to match anything, followed by a lookahead:
var match = Regex.Match("This is is a test.... 12345", @".*?(?=\d{5})");
where:
.*? lazily matches everything (up to the lookahead)
(?=…) is a positive lookahead: the pattern must be matched, but is not included in the result
\d{5} matches exactly five digits. I'm assuming this is your lookahead; you can replace it

You can do so with help of regex lookahead.
.*(?=12345)
Example:
var data = "This is is a test.... 12345";
var rxStr = ".*(?=12345)";
var rx = new System.Text.RegularExpressions.Regex (rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var match = rx.Match(data);
if (match.Success) {
Console.WriteLine (match.Value);
}
The code snippet above prints everything up to 12345:
This is is a test....

This should get you started:
var reg = new Regex("^(.+)12345$");
var match = reg.Match("This is is a test.... 12345");
var group = match.Groups[1]; // This is is a test....
Of course you'd want to do some additional validation, but this is the basic idea.

^ means start of string
$ means end of string
The asterisk tells the engine to match the preceding token zero or more times. The plus tells the engine to match the preceding token one or more times
{min,max} sets the minimum and maximum number of repetitions
\d matches a single character that is a digit, \w matches a "word character" (alphanumeric characters plus underscore), and \s matches a whitespace character (includes tabs and line breaks).
[^a] is a negated character class: it matches any character except a
The dot matches a single character, except line break characters
In your case there are many ways to accomplish the task.
E.g. matching everything up to the first digit: ^[^\d]*
If you know the exact set of characters and they are not only digits, don't use a regex; use IndexOf(). If you know the separator between the first and second part, such as "...", you can use Split().
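The two suggestions above, the ^[^\d]* pattern and IndexOf(), could be sketched side by side like this. The class and method names are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixDemo
{
    // Regex variant: everything from the start of the string up to
    // (but not including) the first digit.
    public static string BeforeFirstDigit(string text)
    {
        return Regex.Match(text, @"^[^\d]*").Value;
    }

    // IndexOf variant: everything before a known marker; the whole
    // text is returned when the marker is absent.
    public static string BeforeMarker(string text, string marker)
    {
        int index = text.IndexOf(marker, StringComparison.Ordinal);
        return index < 0 ? text : text.Substring(0, index);
    }

    public static void Main()
    {
        string input = "This is is a test.... 12345";
        Console.WriteLine("[" + BeforeFirstDigit(input) + "]");
        Console.WriteLine("[" + BeforeMarker(input, "12345") + "]");
    }
}
```

The regex variant stops at any digit, so it only fits when the leading text is known to contain none; the IndexOf variant has no such restriction.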

Take a look at this snippet:
class Program
{
static void Main(string[] args)
{
string input = "This is is a test.... 12345";
// Here we call Regex.Matches.
MatchCollection matches = Regex.Matches(input, @"(?<MySentence>(\w+\s*)*)(?<MyNumberPart>\d*)");
foreach (Match item in matches)
{
Console.WriteLine(item.Groups["MySentence"]);
Console.WriteLine("******");
Console.WriteLine(item.Groups["MyNumberPart"]);
}
Console.ReadKey();
}
}

You could also just split; this is not as efficient as the IndexOf solution:
string value = "oiasjdoiasj12345";
string end = "12345";
string result = value.Split(new string[] { end }, StringSplitOptions.None)[0]; // Take the first part of the result; not the quickest, but fairly simple
