C# Regex split by comma outside the { } - c#

I am not as familiar with RegEx as I probably should be.
However, I am looking for an expression(s) that matches a variant of values.
My string:
2020/09/10 05:41:02,ABC,888,!"#$%'()=~|{`}*+_?><-^\#[;:]./\,{"data1-1":"48.16","data1-2":"!"#$%'()=~|{`}*+_?><-^\#[;:]./\"}
I am trying to split on the commas using a regular expression:
string regex = "," + @"\s*(?![^{}]*\})";
List<string> listResult = Regex.Split(myString, regex).ToList();
The received results are not correct.
Can regular expressions be used in this case?
What could I use to split that string on every comma outside the { }? Cheers

I'm not sure how this works with regular expressions. However, instead of using regex, you could just create a list with your delimiters and use the string.split method:
char[] delim = new [] {','}; //in your case just one delimiter
var listResult = myString.Split(delim, StringSplitOptions.RemoveEmptyEntries);
The string.split method returns an array.

You can check how comma separated value format (CSV) is usually parsed.
Here with a regex : https://stackoverflow.com/a/18147076/6424355
Splitting on commas is simpler if you don't need to handle quotes.
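A lookahead such as (?![^{}]*\}) only inspects the text up to the next brace, so it can misfire when braces are nested or appear inside values. One alternative is a plain character scan that tracks brace depth. The following is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, and it assumes the braces in the input are balanced.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BraceAwareSplitter
{
    // Splits on commas that are not nested inside { }.
    // Assumption: braces are balanced; a stray '}' at depth 0 is kept as ordinary text.
    public static List<string> SplitOutsideBraces(string input)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int depth = 0;

        foreach (char c in input)
        {
            if (c == '{')
            {
                depth++;
            }
            else if (c == '}' && depth > 0)
            {
                depth--;
            }

            if (c == ',' && depth == 0)
            {
                // Top-level comma: close the current field.
                parts.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        parts.Add(current.ToString());
        return parts;
    }
}
```

Calling BraceAwareSplitter.SplitOutsideBraces(myString) returns the top-level fields, with any { ... } block kept intact as a single field, commas included.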

Related

Trim().Split causes a problem in Contains()

I am getting a string, trimming it first, then splitting it and assigning the result to a string[]. Then I use every element in the array in a string.Contains() or string.StartsWith() call. The interesting thing is that even if the string contains the element, Contains() doesn't work properly, and the situation is the same for StartsWith(). Does anyone have any idea about the problem?
P.S.: I trimmed strings after splitting and problem was solved.
string inputTxt = "tasklist";
string commands = "net, netsh, tasklist";
string[] maliciousConsoleCommands = commands.Trim(' ').Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i])) {
return false;
}
}
//this code works but no idea why previous code didn't work.
string[] maliciousConsoleCommands = commands.Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i].Trim(' '))) {
return false;
}
}
I expected the first version to work properly, but the problem was only solved by trimming after splitting.
Your delimiter is not a comma char, it's a comma followed by a white-space - so instead of splitting by ',', simply split by ", ":
string[] maliciousConsoleCommands = commands.Split(new string[] {", "}, StringSplitOptions.None);
This will return the items without the leading space so the trim will be redundant.
It seems, you should Trim each item :
// ["net", "netsh", "tasklist"]
string[] maliciousConsoleCommands = commands
.Split(',') // "net" " netsh", " tasklist" - note leading spaces
.Select(item => item.Trim()) // removing leading spaces from each item
.ToArray();
Finally, if you want to test if inputTxt is malicious:
if (commands
.Split(',')
.Select(item => item.Trim()) // You can combine Select and Any
.Any(item => inputTxt.StartsWith(item)))
return false;
The first code you presented won't work because it trims the initial string: "net, netsh, tasklist" has no leading or trailing spaces, so it stays unchanged after trimming, and splitting it by comma then produces entries that have a leading space. Thus, you get unexpected results. You should trim after splitting the string.
Second code also won't work, because you use Trim after StartsWith, which return bool value. You can't apply Trim to bool, this code should not even compile.
Yet another way to split if the commands themselves have no spaces is to use ' ' itself as a delimiter, and discard empty entries :
var maliciousConsoleCommands = commands.Split(new[]{',',' '},StringSplitOptions.RemoveEmptyEntries)
.ToArray();
This avoids the temporary strings generated by every string manipulation command.
For your code to work though, you'd have to use Contains for each command, instead of using StartsWith :
var isSuspicious = maliciousCommands.Any(cmd=>input.Contains(cmd));
Or even :
var isSuspicious = maliciousCommands.Any(input.Contains);
This can get rather slow if you have many commands, or if the input text is large.
Regular expression alternative
A far faster technique would be to use a Regular expression. This performs a lot faster than searching individual keywords :
var regex=new Regex("net|netsh|tasklist");
var isSuspicious=regex.IsMatch(inputTxt);
Regular expressions are thread-safe which means they can be created once and reused by different threads/requests.
By using Match/Matches instead of IsMatch the regex could return the actual keywords that were detected :
var detection=regex.Match(inputTxt);
if (detection.Success)
{
var detectedKeyword=detection.Value;
....
}
Converting the original comma-separated list to a regular expression can be performed with a single String.Replace(", ", "|") or another regular expression that can handle any whitespace character :
string commands = "net , netsh, \ttasklist";
var pattern=Regex.Replace(commands,@"\s*,\s*","|").Dump(); // Dump() is a LINQPad helper that prints the value
var regex=new Regex(pattern);
Detecting whole words only
Both Contains and the original regular expression would match tasklist1 as well as tasklist. It's possible to match whole words only, if the pattern is surrounded by the word delimiter, \b :
@"\b(" + pattern + @")\b"
This will match tasklist and net but reject tasklist1.
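The whole-word approach can be sketched end to end as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and each keyword is passed through Regex.Escape so that any regex metacharacters in the list are treated literally, which the original answer does not do.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordDetector
{
    // Builds a pattern such as \b(?:net|netsh|tasklist)\b from a comma-separated list.
    public static Regex BuildWholeWordRegex(string commands)
    {
        var keywords = commands
            .Split(',')
            .Select(k => k.Trim())          // drop the whitespace around each keyword
            .Where(k => k.Length > 0)       // ignore empty entries
            .Select(Regex.Escape);          // treat keywords as literal text

        string pattern = @"\b(?:" + string.Join("|", keywords) + @")\b";
        return new Regex(pattern);
    }
}
```

The returned Regex can be built once and reused, for example with IsMatch(inputTxt) to test the input or Match(inputTxt).Value to read the detected keyword.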

Splitting on “,” but not “/,”

Question: How do I write an expression to split a string on ',' but not '/,'? Later I'll want to replace '/,' with ', '.
Details...
Delimiter: ','
Skip Char: '/'
Example input: "Mister,Bill,is,made,of/,clay"
I want to split this input into an array: {"Mister", "Bill", "is", "made", "of, clay"}
I know how to do this with a char prev, cur; and some indexers, but that seems beta.
Java Regex has a split functionality, but I don't know how to replicate this behavior in C#.
Note: This isn't a duplicate question, this is the same question but for a different language.
I believe you're looking for a negative lookbehind:
var regex = new Regex("(?<!/),");
var result = regex.Split(str);
this will split str on all commas that are not preceded by a slash. If you want to keep the '/,' in the string then this will work for you.
Since you said that you wanted to split the string and later replace the '/,' with ', ', you'll want to do the above first then you can iterate over the result and replace the strings like so:
var replacedResult = result.Select(s => s.Replace("/,", ", "));
string s = "Mister,Bill,is,made,of/,clay";
var arr = s.Replace("/,"," ").Split(',');
result : {"Mister", "Bill", "is", "made", "of clay"}
Using Regex:
var result = Regex.Split("Mister,Bill,is,made,of/,clay", "(?<=[^/]),");
Just use a Replace to remove the commas from your string :
s.Replace("/,", "//").Split(',').Select(x => x.Replace("//", ","));
You can use this in C#:
string regex = @"(?<!\/),";
var match = Regex.Split("Mister,Bill,is,made,of/,clay", regex, RegexOptions.IgnoreCase);
After that you can replace /, and continue your operation as you like
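Putting the two steps from the question together (split on unescaped commas, then turn '/,' into ', ') could look like the sketch below. The class and method names are hypothetical; the pattern is the negative lookbehind suggested in the first answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedCommaSplitter
{
    // Splits on commas that are not preceded by '/', then replaces each "/," with ", ".
    public static string[] SplitUnescaped(string input)
    {
        return Regex.Split(input, "(?<!/),")
            .Select(part => part.Replace("/,", ", "))
            .ToArray();
    }
}
```

For the example input "Mister,Bill,is,made,of/,clay" this returns five elements, the last of which is "of, clay".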

Split separates strings and ignore/remove other delimited string

I'm reading a comma-delimited list of strings from a config file. I need to check the following steps
1) check to see if the string has `[`, if it is then remove or ignore...
2) split `,` `-` //which i am doing below...
Here is what I am able to do so far:
string mediaString = "Cnn-[news],msnbc";
string[] split = mediaString.Split(new Char[] { ',', '-' }); //gets me the bracket
what I want is to ignore/remove the string which is in the brackets so the end result should be like this:
mediaString =
Cnn
msnbc
Using Linq:
mediaString.Split(new Char[] { ',', '-' }).Where(val => !val.Contains('['));
You can make the test (val.Contains(...)) as sophisticated as you like (e.g. starts and ends with, regular expression, specific values, call an object provided via a DI framework if you want to get all enterprisey).
Use Regex replace to clean your string
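string str = @"Cnn-[news],msnbc";
Regex regex = new Regex(@"\[.*\]");
string cleanStr = regex.Replace(str, "");
// Removing "[news]" leaves "Cnn-,msnbc", so drop the empty entry between '-' and ','
string[] split = cleanStr.Split(new Char[] { ',', '-' }, StringSplitOptions.RemoveEmptyEntries);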
Without using LINQ or regex:
Split your string as you are doing now.
Create a data structure of string type, for example a List<string>.
Run over the results array and, for each entry, check whether it contains the specified character; if it doesn't, add it to the List.
In the end you should have a List with the required result.
The regex solution is far more elegant, but if you cannot use regex this one should do it.
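The steps above can be sketched as a short loop. The class and method names are hypothetical, and the check is the simplest possible one: any piece containing '[' is skipped.

```csharp
using System;
using System.Collections.Generic;

public static class MediaListParser
{
    // Splits on ',' and '-' and keeps only the pieces that do not contain '['.
    public static List<string> ParseWithoutBrackets(string mediaString)
    {
        string[] pieces = mediaString.Split(new char[] { ',', '-' });
        var result = new List<string>();

        foreach (string piece in pieces)
        {
            if (piece.IndexOf('[') < 0)
            {
                result.Add(piece);
            }
        }

        return result;
    }
}
```

For "Cnn-[news],msnbc" the list contains "Cnn" and "msnbc".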

C# Why i can not split the string?

string myNumber = "3.44";
Regex regex1 = new Regex(".");
string[] substrings = regex1.Split(myNumber);
foreach (var substring in substrings)
{
Console.WriteLine("The string is : {0} and the length is {1}",substring, substring.Length);
}
Console.ReadLine();
I tried to split the string by ".", but the split returns 4 empty strings. Why?
. means "any character" in regular expressions. So don't split using a regex - split using String.Split:
string[] substrings = myNumber.Split('.');
If you really want to use a regex, you could use:
Regex regex1 = new Regex(@"\.");
The @ makes it a verbatim string literal, to stop you from having to escape the backslash. The backslash within the string itself is an escape for the dot within the regex parser.
the easiest solution would be: string[] val = myNumber.Split('.');
. is a reserved character in regex. if you literally want to match a period, try:
Regex regex1 = new Regex(@"\.");
However, you're better off simply using myNumber.Split('.');
The dot matches a single character, without caring what that character
is. The only exception are newline characters.
Source: http://www.regular-expressions.info/dot.html
Therefore your code is effectively splitting the string at every character.
Use this instead.
string[] substr = myNumber.Split('.');
Keep it simple, use String.Split() method;
string[] substrings = myNumber.Split('.');
It has another overload which allows specifying split options:
public string[] Split(
char[] separator,
StringSplitOptions options
)
You don't need regex you do that by using Split method of string object
string myNumber = "3.44";
String[] substrings = myNumber.Split('.');
foreach (var substring in substrings)
{
Console.WriteLine("The string is : {0} and the length is {1}",substring, substring.Length);
}
Console.ReadLine();
The period "." is being interpreted as any single character instead of a literal period.
Instead of using regular expressions you could just do:
string[] substrings = myNumber.Split('.');
In Regex patterns, the period character matches any single character. If you want the Regex to match the actual period character, you must escape it in the pattern, like so:
@"\."
Now, this case is somewhat simple for Regex matching; you could instead use String.Split() which will split based on the occurrence of one or more static strings or characters:
string[] substrings = myNumber.Split('.');
try
Regex regex1 = new Regex(@"\.");
EDIT: Er... I guess under a minute after Jon Skeet is not too bad, anyway...
You'll want to place an escape character before the "." - like this "\\."
"." in a regex matches any character, so if you pass 4 characters to a regex with only ".", every character is treated as a separator and only empty strings are returned. Check out this page for common operators.
Try
Regex regex1 = new Regex("[.]");
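When the separator is not hard-coded, Regex.Escape can add the backslash for you. The helper below is a small sketch with hypothetical names; it assumes the separator should always be matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralRegexSplit
{
    // Escapes the separator so characters like '.' lose their special regex meaning,
    // then splits the input on it.
    public static string[] SplitOnLiteral(string input, string separator)
    {
        return Regex.Split(input, Regex.Escape(separator));
    }
}
```

LiteralRegexSplit.SplitOnLiteral("3.44", ".") returns "3" and "44", whereas Regex.Split("3.44", ".") returns only empty strings because every character matches the pattern.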

How can I split a string using regex to return a list of values?

How can I take the string foo[]=1&foo[]=5&foo[]=2 and return a collection with the values 1,5,2 in that order. I am looking for an answer using regex in C#. Thanks
In C# you can use capturing groups
private void RegexTest()
{
String input = "foo[]=1&foo[]=5&foo[]=2";
String pattern = #"foo\[\]=(\d+)";
Regex regex = new Regex(pattern);
foreach (Match match in regex.Matches(input))
{
Console.Out.WriteLine(match.Groups[1]);
}
}
I don't know C#, but...
In java:
String[] nums = yourString.split("&?foo\\[\\]=");
The argument to the String.split() method is a regex telling the method where to split the String.
I'd use this particular pattern:
string re = @"foo\[\]=(?<value>\d+)";
So something like (not tested):
Regex reValues = new Regex(re,RegexOptions.Compiled);
List<int> values = new List<int>();
foreach (Match m in reValues.Matches(...putInputStringHere...))
{
// Groups is an indexer, and the captured text must be parsed rather than cast
values.Add(int.Parse(m.Groups["value"].Value));
}
Use the Regex.Split() method with an appropriate regex. This will split on parts of the string that match the regular expression and return the results as a string[].
Assuming you want all the values in your querystring without checking if they're numeric, (and without just matching on names like foo[]) you could use this: "&?[^&=]+="
string[] values = Regex.Split("foo[]=1&foo[]=5&foo[]=2", "&?[^&=]+=");
Incidentally, if you're playing with regular expressions the site http://gskinner.com/RegExr/ is fantastic (I'm just a fan).
Assuming you're dealing with numbers this pattern should match:
/=(\d+)&?/
This should do:
using System.Text.RegularExpressions;
Regex.Replace(s, @"[^0-9]", "");
Where s is your String where you want the numbers to be extracted.
Just make sure to escape the ampersand like so:
/=(\d+)\&/
Here's an alternative solution using the built-in string.Split function:
string x = "foo[]=1&foo[]=5&foo[]=2";
string[] separator = new string[2] { "foo[]=", "&" };
string[] vals = x.Split(separator, StringSplitOptions.RemoveEmptyEntries);
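To get the values as integers in their original order, the capturing-group approach can be combined with LINQ. This is a sketch with hypothetical names; it assumes every value is a run of digits small enough to fit in an int, since int.Parse throws otherwise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QueryValueExtractor
{
    // Returns the numeric values of all foo[]=N pairs, in the order they appear.
    public static List<int> ExtractFooValues(string query)
    {
        return Regex.Matches(query, @"foo\[\]=(\d+)")
            .Cast<Match>()                               // MatchCollection is non-generic on older frameworks
            .Select(m => int.Parse(m.Groups[1].Value))   // group 1 holds the digits
            .ToList();
    }
}
```

For "foo[]=1&foo[]=5&foo[]=2" the list contains 1, 5 and 2 in that order.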
