RegEx: Split string by separator and then by another

I need a particular splitting behavior.
Assume there is a
sourceString = @"name1$$value1^name2$$value2^name3$$value3";
possibly a much longer string.
I'd like to first split by the ^ separator and then by the $$ separator to create a dictionary from these name-value pairs.
The string is stored in a file, so it may be very long, and repeated split operations may take too much time.
I'm hoping there is a regex that matches on ^ with an inner group match on $$.

This regex (.*?)\$\$(.*?)(?:\^|$) will match the name-value pairs. To use it, you can use the following code:
var input = "name1$$value1^name2$$value2^name3$$value3";
var pattern = @"(.*?)\$\$(.*?)(?:\^|$)";
var hash = new Dictionary<string, string>();
var match = Regex.Match(input, pattern);
while (match.Success)
{
hash.Add(match.Groups[1].Value, match.Groups[2].Value);
match = match.NextMatch();
}

Why not use:
sourceString.Split(new char[] {'^'}, StringSplitOptions.RemoveEmptyEntries)
Then you can do the same for $$
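A minimal sketch of the two-step split suggested above, assuming each pair contains a $$ separator. The names PairParser and Parse are hypothetical, and skipping malformed pairs and letting a repeated name overwrite the earlier value are assumptions rather than anything the question specifies.

```csharp
using System;
using System.Collections.Generic;

public static class PairParser
{
    // Splits "name1$$value1^name2$$value2" into a dictionary.
    // Assumption: pairs without a "$$" separator are skipped,
    // and a repeated name overwrites the earlier value.
    public static Dictionary<string, string> Parse(string source)
    {
        var result = new Dictionary<string, string>();
        foreach (var pair in source.Split(new[] { '^' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Limit to 2 parts so a "$$" inside the value stays in the value.
            var parts = pair.Split(new[] { "$$" }, 2, StringSplitOptions.None);
            if (parts.Length == 2)
                result[parts[0]] = parts[1];
        }
        return result;
    }
}
```

Calling PairParser.Parse(sourceString) on the string from the question should give a dictionary with the three names as keys.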

Related

Regex match and replace operators in math operation

Given an input string
12/3
12*3/12
(12*54)/(3/4)
I need to find each operand (number) and replace it with a string that contains it
some12text/some3text
some12text*some3text/some12text
(some12text*some54text)/(some3text/some4text)
practical application:
From a backend (C#), I have the following string
34*157
which I need to translate to:
document.getElementById("34").value*document.getElementById("157").value
and return to the screen, where it can be run in an eval() function.
So far I have
var pattern = @"\d+";
var input = "12/3";
Regex r = new Regex(pattern);
var matches = r.Matches(input);
foreach (Match match in matches)
{
// im at a loss what to match and replace here
}
Caution: I cannot do a blanket input.Replace() in the foreach loop, as it may incorrectly replace part of another number, e.g. in (12/123) it should only replace the first 12
Caution 2: I can use string.Remove and string.Insert, but that mutates the string after the first match, so it throws off the index of the next match
Any pointers appreciated

Here you go
string pattern = @"\d+"; // matches 1-n consecutive digits
var input = "(12*54)/(3/4)";
string result = Regex.Replace(input, pattern, "some$0Text");
$0 refers to the entire match of the pattern \d+. You can also write
string result = Regex.Replace(input, pattern, m => "some"+ m.Groups[0]+ "Text");
Fiddle: https://dotnetfiddle.net/JUknx2
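Applied to the practical case from the question, the same evaluator overload can wrap each number in a getElementById call. This is only a sketch; the names ExpressionRewriter and ToDomExpression are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class ExpressionRewriter
{
    // Replaces each run of digits with a DOM lookup for the element with that id.
    // Regex.Replace works on the original string in a single pass, so "12" and
    // "123" are handled as separate whole matches.
    public static string ToDomExpression(string expression)
    {
        return Regex.Replace(
            expression,
            @"\d+",
            m => "document.getElementById(\"" + m.Value + "\").value");
    }
}
```

Because every match is replaced in one pass, neither of the two cautions from the question (partial replacement of a longer number, shifting indexes) should apply here.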

Match the last bracket

I have a string which contains some text followed by some brackets with different content (possibly empty). I need to extract the last bracket with its content:
atext[d][][ef] // should return "[ef]"
other[aa][][a] // should return "[a]"
xxxxx[][xx][x][][xx] // should return "[xx]"
yyyyy[] // should return "[]"
I have looked into RegexOptions.RightToLeft and read up on lazy vs greedy matching, but I can't for the life of me get this one right.

This regex will work
.*(\[.*\])
A more efficient version, which uses a negated character class instead of .* inside the brackets
.*(\[[^\]]*\])
C# Code
string input = "atext[d][][ef]\nother[aa][][a]\nxxxxx[][xx][x][][xx]\nyyyyy[]";
string pattern = "(?m).*(\\[.*\\])";
Regex rgx = new Regex(pattern);
Match match = rgx.Match(input);
while (match.Success)
{
Console.WriteLine(match.Groups[1].Value);
match = match.NextMatch();
}
It may give unexpected results for nested [] or unbalanced []

Alternatively, you could reverse the string using a function similar to this:
public static string Reverse( string s )
{
char[] charArray = s.ToCharArray();
Array.Reverse( charArray );
return new string( charArray );
}
And then you could perform a simple Regex search to just look for the first [someText] group or just use a for loop to iterate through and then stop when the first ] is reached.
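The backward scan described above does not strictly need an explicit reversal, since string.LastIndexOf already searches from the end. A minimal sketch, assuming the brackets are balanced and not nested; BracketFinder and LastBracket are hypothetical names.

```csharp
using System;

public static class BracketFinder
{
    // Returns the last "[...]" group of the string, or null if there is none.
    // Assumption: brackets are balanced and not nested.
    public static string LastBracket(string s)
    {
        int close = s.LastIndexOf(']');
        if (close < 0) return null;

        // Search backward from the closing bracket for its opening bracket.
        int open = s.LastIndexOf('[', close);
        if (open < 0) return null;

        return s.Substring(open, close - open + 1);
    }
}
```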

With negative lookahead:
\[[^\]]*\](?!\[)
This is relatively efficient and flexible, without the evil .*. It will also work with longer text that contains multiple instances.

The correct way in .NET is indeed to use the RegexOptions.RightToLeft option with the Regex.Match(String, String, RegexOptions) method.
This keeps the pattern simple and efficient: it causes little backtracking and, since the pattern ends with a literal character (the closing bracket), the engine can quickly search for candidate positions in the string where the pattern may succeed before the "normal" walk of the regex engine.
public static void Main()
{
string input = @"other[aa][][a]";
string pattern = @"\[[^][]*]";
Match m = Regex.Match(input, pattern, RegexOptions.RightToLeft);
if (m.Success)
Console.WriteLine("Found '{0}' at position {1}.", m.Value, m.Index);
}

Using regex to remove everything that is not in between '<#'something'#>' and replace it with commas

I have a string, for example
<#String1#> + <#String2#> , <#String3#> --<#String4#>
And I want to use regex/string manipulation to get the following result:
<#String1#>,<#String2#>,<#String3#>,<#String4#>
I don't really have any experience doing this, any tips?

There are multiple ways to do something like this, and it depends on exactly what you need. However, if you want to use a single regex operation to do it, and you only want to fix stuff that comes between the bracketed strings, then you could do this:
string input = "<#String1#> + <#String2#> , <#String3#> --<#String4#>";
string pattern = "(?<=>)[^<>]+(?=<)";
string replacement = ",";
string result = Regex.Replace(input, pattern, replacement);
The pattern uses [^<>]+ to match any non-pointy-bracket characters, but it combines it with a look-behind statement ((?<=>)) and a look-ahead statement (?=<) to make sure that it only matches text that occurs between a closing and another opening set of brackets.
If you need to remove text that comes before the first < or after the last >, or if you find the look-around statements confusing, you may want to consider simply matching the text that comes between the brackets, then looping through all the matches and building a new string yourself, rather than using the Regex.Replace method. For instance:
string input = "sdfg<#String1#> + <#String2#> , <#String3#> --<#String4#>ag";
string pattern = @"<[^<>]+>";
List<String> values = new List<string>();
foreach (Match m in Regex.Matches(input, pattern))
values.Add(m.Value);
string result = String.Join(",", values);
Or, the same thing using LINQ:
string input = "sdfg<#String1#> + <#String2#> , <#String3#> --<#String4#>ag";
string pattern = @"<[^<>]+>";
string result = String.Join(",", Regex.Matches(input, pattern).Cast<Match>().Select(x => x.Value));

If you're just after string manipulation and don't necessarily need a regex, you could simply use the string.Replace method.
yourString = yourString.Replace("#> + <#", "#>,<#");

Get a special substring in C#

I need to extract a substring from an existing string. The string starts with uninteresting characters (including commas, spaces and numbers) and ends with ", 123," or ", 57," or something similar, where the number can change. I only need the number.
Thanks

public static void Main(string[] args)
{
string input = "This is 2 much junk, 123,";
var match = Regex.Match(input, @"(\d+),$"); // Ends with at least one digit
// followed by comma,
// grab the digits.
if(match.Success)
Console.WriteLine(match.Groups[1]); // Prints '123'
}

Regex to match numbers: Regex regex = new Regex(@"\d+");
Source (slightly modified): Regex for numbers only

I think this is what you're looking for:
Remove all non numeric characters from a string using Regex
using System.Text.RegularExpressions;
...
string newString = Regex.Replace(oldString, "[^.0-9]", "");
(If you don't want to allow the decimal delimiter in the final result, remove the . from the regular expression above).

Try something like this :
String numbers = new String(yourString.TakeWhile(x => char.IsNumber(x)).ToArray());

You can use \d+ to match all runs of digits within a given string.
So your code would be
var lst = Regex.Matches(input, @"\d+")
.Cast<Match>()
.Select(x => x.Value);
lst now contains all the numbers.
But if your input always has the form given in your question, you don't need a regex:
int start = input.LastIndexOf(", ") + 2; // first digit after the last ", "
int end = input.LastIndexOf(","); // position of the trailing comma
string number = input.Substring(start, end - start);

Regex to strip characters except given ones?

I would like to strip strings but only leave the following:
[a-zA-Z]+[_a-zA-Z0-9-]*
I am trying to output strings that start with a letter and can then contain alphanumeric characters, underscores, and dashes. How can I do this with RegEx or another function?

Because everything in the second part of the regex is in the first part, you could do something like this:
String foo = "_-abc.!@#$5o993idl;)"; // your string here.
//First replace removes all the characters you don't want.
foo = Regex.Replace(foo, "[^_a-zA-Z0-9-]", "");
//Second replace removes any characters from the start that aren't allowed there.
foo = Regex.Replace(foo, "^[^a-zA-Z]+", "");
So start out by paring it down to only the allowed characters. Then get rid of any allowed characters that can't be at the beginning.
Of course, if your regex gets more complicated, this solution falls apart fairly quickly.

Assuming that you've got the strings in a collection, I would do it this way:
foreach element in the collection try match the regex
if !success, remove the string from the collection
Or the other way round - if it matches, add it to a new collection.
If the strings are not in a collection can you add more details as to what your input looks like ?
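The filtering steps above can be sketched as follows. This assumes the whole string must match the pattern from the question, so the pattern is anchored with ^ and \z; the names IdentifierFilter and KeepValid are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdentifierFilter
{
    // Anchored version of the pattern from the question:
    // the entire string must match, not just a part of it.
    private static readonly Regex Identifier =
        new Regex(@"^[a-zA-Z]+[_a-zA-Z0-9-]*\z");

    // Returns a new list containing only the strings that match;
    // the input collection is left untouched.
    public static List<string> KeepValid(IEnumerable<string> items)
    {
        return items.Where(s => Identifier.IsMatch(s)).ToList();
    }
}
```

Building a new list corresponds to the "other way round" variant; removing items from a collection while iterating over it with foreach would throw in C#.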

If you want to pull out all of the identifiers matching your regular expression, you can do it like this:
var input = " _wontmatch f_oobar0 another_valid ";
var re = new Regex( @"\b[a-zA-Z][_a-zA-Z0-9-]*\b" );
foreach( Match match in re.Matches( input ) )
Console.WriteLine( match.Value );

Use MatchCollection matchColl = Regex.Matches("input string","your regex");
Then use:
string [] outStrings = new string[matchColl.Count]; //A string array to contain all required strings
for (int i=0; i < matchColl.Count; i++ )
outStrings[i] = matchColl[i].ToString();
You will have all the required strings in outStrings. Hope this helps.

Edited
var s = Regex.Matches(input_string, "[a-z]+(_*-*[a-z0-9]*)*", RegexOptions.IgnoreCase);
string output_string="";
foreach (Match m in s)
{
output_string = output_string + m;
}
MessageBox.Show(output_string);
