Trim().Split causes a problem in Contains()


I get a string, trim it, split it, and assign the result to a string[]. Then I use each element of the array in a string.Contains() or string.StartsWith() call. Interestingly, even when the string contains the element, Contains() does not find it, and the same is true for StartsWith(). Does anyone have any idea what the problem is?
P.S.: Trimming the strings after splitting solved the problem.
string inputTxt = "tasklist";
string commands = "net, netsh, tasklist";
string[] maliciousConsoleCommands = commands.Trim(' ').Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i])) {
return false;
}
}
//this code works but no idea why previous code didn't work.
string[] maliciousConsoleCommands = commands.Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i].Trim(' '))) {
return false;
}
}
I expected the first version to work properly, but the problem was only solved by trimming after splitting.

Your delimiter is not a comma char, it's a comma followed by a white-space - so instead of splitting by ',', simply split by ", ":
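string[] maliciousConsoleCommands = commands.Split(new string[] {", "}, StringSplitOptions.None);
This will return the items without the leading space, so the trim will be redundant. Note that the string[] overload of Split requires a StringSplitOptions argument.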

It seems you should Trim each item:
// ["net", "netsh", "tasklist"]
string[] maliciousConsoleCommands = commands
.Split(',') // "net" " netsh", " tasklist" - note leading spaces
.Select(item => item.Trim()) // removing leading spaces from each item
.ToArray();
Finally, if you want to test if inputTxt is malicious:
if (commands
.Split(',')
.Select(item => item.Trim()) // You can combine Select and Any
.Any(item => inputTxt.StartsWith(item)))
return false;

The first snippet doesn't work because it trims the initial string: "net, netsh, tasklist" has no leading or trailing spaces, so it stays unchanged after trimming, and splitting it by comma then produces entries that have a leading space (" netsh", " tasklist"). Thus, you get unexpected results. You should trim after splitting the string.
The second snippet works because Trim is applied to each array element before it is passed to StartsWith, so the leading spaces are gone by the time the comparison runs.
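The difference between the two orders can be made visible with a minimal sketch. The class and method names below (SplitTrimDemo, TrimThenSplit, SplitThenTrim) are made up for illustration and do not come from the question:

```csharp
using System;
using System.Linq;

public static class SplitTrimDemo
{
    // Trims the whole string first, then splits: spaces after commas survive.
    public static string[] TrimThenSplit(string commands)
    {
        return commands.Trim(' ').Split(',');
    }

    // Splits first, then trims every item: no leading spaces remain.
    public static string[] SplitThenTrim(string commands)
    {
        return commands.Split(',').Select(item => item.Trim()).ToArray();
    }

    public static void Main()
    {
        const string commands = "net, netsh, tasklist";
        // Brackets make the leading spaces visible.
        Console.WriteLine(string.Concat(TrimThenSplit(commands).Select(s => "[" + s + "]")));
        // prints [net][ netsh][ tasklist]
        Console.WriteLine(string.Concat(SplitThenTrim(commands).Select(s => "[" + s + "]")));
        // prints [net][netsh][tasklist]
    }
}
```

With the first variant, "tasklist".StartsWith(" tasklist") is false, which is exactly the behaviour described in the question.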

Yet another way to split, if the commands themselves contain no spaces, is to use ' ' as a delimiter as well and discard empty entries:
var maliciousConsoleCommands = commands.Split(new[]{',',' '},StringSplitOptions.RemoveEmptyEntries);
This avoids the temporary strings that every additional string manipulation call (such as Trim) generates.
For your code to work though, you'd have to use Contains for each command, instead of using StartsWith :
var isSuspicious = maliciousCommands.Any(cmd=>input.Contains(cmd));
Or even :
var isSuspicious = maliciousCommands.Any(input.Contains);
This can get rather slow if you have many commands, or if the input text is large.
Regular expression alternative
A faster technique would be to use a regular expression. This typically performs better than searching for each keyword individually:
var regex=new Regex("net|netsh|tasklist");
var isSuspicious=regex.IsMatch(inputTxt);
Regular expressions are thread-safe which means they can be created once and reused by different threads/requests.
By using Match/Matches instead of IsMatch the regex could return the actual keywords that were detected :
var detection=regex.Match(inputTxt);
if (detection.Success)
{
var detectedKeyword=detection.Value;
....
}
Converting the original comma-separated list to a regular expression can be performed with a single String.Replace(", ", "|") or with another regular expression that can handle any whitespace character:
string commands = "net , netsh, \ttasklist";
var pattern=Regex.Replace(commands,@"\s*,\s*","|").Dump(); // Dump() is a LINQPad extension that prints the value
var regex=new Regex(pattern);
Detecting whole words only
Both Contains and the original regular expression would match tasklist1 as well as tasklist. It's possible to match whole words only, if the pattern is surrounded by the word delimiter, \b :
@"\b(" + pattern + @")\b"
This will match tasklist and net but reject tasklist1.
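Putting the two steps together (turning the list into an alternation and wrapping it in \b) might look like the sketch below. BuildWholeWordRegex and CommandMatcher are hypothetical names, and the extra Regex.Escape call is an assumption added here so that keywords containing regex metacharacters are matched literally:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommandMatcher
{
    // Hypothetical helper: builds \b(?:net|netsh|tasklist)\b from a comma-separated list.
    public static Regex BuildWholeWordRegex(string commands)
    {
        var keywords = commands
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(k => k.Trim())               // drop spaces and tabs around each keyword
            .Where(k => k.Length > 0)
            .Select(k => Regex.Escape(k));       // keep special characters literal
        return new Regex(@"\b(?:" + string.Join("|", keywords) + @")\b");
    }

    public static void Main()
    {
        var regex = BuildWholeWordRegex("net , netsh, \ttasklist");
        Console.WriteLine(regex.IsMatch("tasklist /v"));            // True
        Console.WriteLine(regex.IsMatch("tasklist1"));              // False
        Console.WriteLine(regex.Match("netsh wlan show").Value);    // netsh
    }
}
```

The sketch assumes the list contains at least one keyword; an empty list would produce a pattern that matches at every word boundary.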

Related

C# Regex split by comma outside the { }

I am not as familiar with RegEx as I probably should be.
However, I am looking for an expression(s) that matches a variant of values.
My string:
2020/09/10 05:41:02,ABC,888,!"#$%'()=~|{`}*+_?><-^\#[;:]./\,{"data1-1":"48.16","data1-2":"!"#$%'()=~|{`}*+_?><-^\#[;:]./\"}
I am trying to split comma using regular expression to get the result below:
string regex = "," + @"\s*(?![^{}]*\})";
List<string> listResult = Regex.Split(myString, regex).ToList();
The received results are not correct.
Can regular expressions be used in this case?
What could I use to split that string at every comma outside the { }? Cheers

I'm not sure how this works with regular expressions. However, instead of using regex, you could just create an array with your delimiters and use the string.Split method:
char[] delim = new [] {','}; //in your case just one delimiter
var listResult = myString.Split(delim, StringSplitOptions.RemoveEmptyEntries);
The string.Split method returns an array.

You can check how the comma-separated values (CSV) format is usually parsed.
Here it is with a regex: https://stackoverflow.com/a/18147076/6424355
Splitting on commas is simpler if you don't need quotes.
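As an alternative to a regex, a small hand-written scanner can split on commas while tracking brace depth. This is only a sketch under stated assumptions: the names BraceAwareSplitter and SplitOutsideBraces are invented for illustration, braces are assumed to be balanced, and braces inside quoted values get no special treatment:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BraceAwareSplitter
{
    // Splits on commas that are not nested inside { ... }.
    public static List<string> SplitOutsideBraces(string input)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int depth = 0; // number of currently open braces

        foreach (char c in input)
        {
            if (c == '{') depth++;
            else if (c == '}' && depth > 0) depth--;

            if (c == ',' && depth == 0)
            {
                // A top-level comma ends the current field.
                parts.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        parts.Add(current.ToString()); // last field
        return parts;
    }

    public static void Main()
    {
        var parts = SplitOutsideBraces("2020/09/10,ABC,{\"a\":\"1\",\"b\":\"2\"},end");
        foreach (var part in parts)
        {
            Console.WriteLine(part);
        }
    }
}
```

The commas inside the braces stay in the third field, so the sample input yields four parts.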

C# Char Array remove at specific index

Not too sure of the best way to remove a char from the char array if the char at a given index is a number.
private string TextBox_CharacterCheck(string tocheckTextBox)
{
char[] charlist = tocheckTextBox.ToCharArray();
foreach (char character in charlist)
{
if (char.IsNumber(character))
{
}
}
return (new string(charlist));
}
Thanks in advance.
// this is now resolved. thank you to all who contributed

You could use the power of Linq:
return new string(tocheckTextBox.Where(c => !char.IsNumber(c)).ToArray());

This is fairly easy using Regex:
var result = Regex.Replace("a1b2c3d4", @"\d", "");
(as @Adassko notes, you can use "[0-9]" instead of @"\d" if you just want the digits 0 to 9, and not any other numeric characters).
You can also do it fairly efficiently using a StringBuilder:
var sb = new StringBuilder();
foreach (var ch in "a1b2c3d4")
{
if (!char.IsNumber(ch))
{
sb.Append(ch);
}
}
var result = sb.ToString();
You can also do it with linq:
var result = new string("a1b2c3d4".Where(x => !char.IsNumber(x)).ToArray());

Use Regex:
private string TextBox_CharacterCheck(string tocheckTextBox)
{
return Regex.Replace(tocheckTextBox, @"[\d]", string.Empty);
}

System.String is immutable. You could use string.Replace or a regular expression to remove unwanted characters into a new string.

Your best bet is to use regular expressions.
Strings are immutable, meaning that you can't change them; you need to rewrite the whole string. To do that in an optimal way, you should use the StringBuilder class and Append every character that you want to keep.
Also watch out for your code: char.IsNumber checks not only for the characters 0-9, it also returns true for every numeric character such as ٢, and you probably don't want that.
Here's the full list of characters returning true:
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙０１２３４５６７８９
You should also use [0-9] rather than \d in your regular expressions if you want only parsable digits.
You can also use a trick: .Split your string on the unwanted characters, then .Join it back. This not only allows you to remove one or more characters, it also lets you replace them with some other character.
I use this trick to remove invalid characters from a file name:
string.Join("-", possiblyIncorrectFileName.Split(Path.GetInvalidFileNameChars()))
This code replaces any character that cannot be used in a valid file name with -.
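The difference between "ASCII digits only" and "anything char.IsNumber accepts" can be shown with a short sketch. DigitStripper and its two methods are hypothetical names chosen for this example:

```csharp
using System;
using System.Linq;
using System.Text;

public static class DigitStripper
{
    // Removes only the ASCII digits 0-9.
    public static string RemoveAsciiDigits(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c < '0' || c > '9')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Removes every character for which char.IsNumber returns true.
    public static string RemoveNumberChars(string input)
    {
        return new string(input.Where(c => !char.IsNumber(c)).ToArray());
    }

    public static void Main()
    {
        string sample = "a1b2\u0662"; // \u0662 is the Arabic-Indic digit two
        Console.WriteLine(RemoveAsciiDigits(sample).Length); // 3: the Arabic-Indic digit is kept
        Console.WriteLine(RemoveNumberChars(sample));        // ab
    }
}
```

Which variant is appropriate depends on whether the text box is meant to reject only 0-9 or every Unicode numeric character.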

You can use LINQ to remove the char from the char array if the char at a given index is a number.
Code:
// This returns the sequence of chars with the digits discarded.
var removedDigits = tocheckTextBox.Where(x => !char.IsDigit(x));
// This returns the string without digits.
string output = string.Join("", removedDigits);

How to remove multiple, repeating & unnecessary punctuation from string in C#?

Considering strings like this:
"This is a string....!"
"This is another...!!"
"What is this..!?!?"
...
// There are LOTS of examples of weird/angry sentence-endings like the ones above.
I want to replace the unnecessary punctuation at the end to make it look like this:
"This is a string!"
"This is another!"
"What is this?"
What I basically do is:
- split by space
- check if last char in string contains a punctuation
- start replacing with the patterns below
I have tried a very big ".Replace(string, string)" function, but it does not work - there has to be a simpler regex I guess.
Documentation:
Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.
As well as:
Because this method returns the modified string, you can chain together successive calls to the Replace method to perform multiple replacements on the original string.
Something is wrong here.
EDIT: ALL the proposed solutions work fine! Thank you very much!
This one was the best suited solution for my project:
Regex re = new Regex("[.?!]*(?=[.?!]$)");
string output = re.Replace(input, "");

Your solution works almost fine (demo), the only issue is when the same sequence could be matched starting at different spots. For example, ..!?!? from your last line is not part of the substitution list, so ..!? and !? get replaced by two separate matches, producing ?? in the output.
It looks like your strategy is pretty straightforward: in a chain of multiple punctuation characters the last character wins. You can use regular expressions to do the replacement:
[!?.]*([!?.])
and replace it with $1, i.e. the capturing group that has the last character:
string s;
while ((s = Console.ReadLine()) != null) {
s = Regex.Replace(s, "[!?.]*([!?.])", "$1");
Console.WriteLine(s);
}

Simply
[.?!]*(?=[.?!]$)
should do it for you. Like
Regex re = new Regex("[.?!]*(?=[.?!]$)");
Console.WriteLine(re.Replace("This is a string....!", ""));
This replaces all punctuation characters but the last with nothing.
[.?!]* matches any number of consecutive punctuation characters, and the (?=[.?!]$) is a positive lookahead making sure it leaves one at the end of the string.

Or you can do it without regExps:
string TrimPuncMarks(string str)
{
HashSet<char> punctMarks = new HashSet<char>() {'.', '!', '?'};
int i = str.Length - 1;
for (; i >= 0; i--)
{
if (!punctMarks.Contains(str[i]))
break;
}
// the very last punctuation mark, or null if there are no punctuation marks at the end
char? suffix = i < str.Length - 1 ? str[str.Length - 1] : (char?)null;
return str.Substring(0, i+1) + suffix;
}
Debug.Assert("What is this?" == TrimPuncMarks("What is this..!?!?"));
Debug.Assert("What is this" == TrimPuncMarks("What is this"));
Debug.Assert("What is this." == TrimPuncMarks("What is this."));

Extracting and Manipulating Strings in C#.Net

We have a requirement to extract and manipulate strings in C#.NET. The requirement is: we have a string
($name$:('George') AND $phonenumer$:('456456') AND
$emailaddress$:("test@test.com"))
We need to extract the strings between the $ characters.
Therefore, in the end, we need to get a list of strings containing - name, phonenumber, emailaddress.
What would be the ideal way to do it? are there any out of the box features available for this?
Regards,
John

The simplest way is to use a regular expression to match all word characters between $ :
var regex=new Regex(@"\$\w+\$");
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test@test.com\"))";
var matches=regex.Matches(input);
This will return a collection of matches. The .Value property of each match contains the matching string. \$ is used because $ has special meaning in regular expressions - it matches the end of a string. \w means a word character (a letter, digit, or underscore). + means one or more.
Since this is a collection, you can use LINQ on it to get eg an array with the values:
var values=matches.OfType<Match>().Select(m=>m.Value).ToArray();
That array will contain the values $name$,$phonenumer$,$emailaddress$.
Capture by name
You can specify groups in the pattern and attach names to them. For example, you can group the field name values:
var regex=new Regex(@"\$(?<name>\w+)\$");
var names=regex.Matches(input)
.OfType<Match>()
.Select(m=>m.Groups["name"].Value);
This will return name,phonenumer,emailaddress. Parentheses are used for grouping. (?<somename>pattern) is used to attach a name to the group
Extract both names and values
You can also capture the field values and extract them as a separate field. Once you have the field name and value, you can return them, eg as an object or anonymous type.
The pattern in this case is more complex:
@"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)"
The parentheses around the values are escaped because we want to match them literally. Both ' and " characters are used around values, so ['"] is used to specify a choice of characters. The pattern is a verbatim string (i.e. it starts with @), so the double quotes have to be escaped by doubling them: ['""] . Any character has to be matched, hence .+, but only up to the next character in the pattern, hence the lazy .+?. Without the ? the pattern .+ would match everything to the end of the string.
Putting this together:
var regex = new Regex(@"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)");
var myValues = regex.Matches(input)
.OfType<Match>()
.Select(m=>new { Name=m.Groups["name"].Value,
Value=m.Groups["value"].Value
})
.ToArray();
Turn them into a dictionary
Instead of ToArray() you could convert the objects to a dictionary with ToDictionary(), eg with .ToDictionary(it=>it.Name,it=>it.Value). You could omit the select step and generate the dictionary from the matches themselves :
var myDict = regex.Matches(input)
.OfType<Match>()
.ToDictionary(m=>m.Groups["name"].Value,
m=>m.Groups["value"].Value);
Regular expressions are generally fast because they don't split the string. The pattern is converted to efficient code that parses the input and skips non-matching input immediately. Each match and group contains only the indexes of its starting and ending characters in the input string. A string is only generated when .Value is called.
Regular expressions are thread-safe, which means a single Regex object can be stored in a static field and reused from multiple threads. That helps in web applications, as there's no need to create a new Regex object for each request.
Because of these two advantages, regular expressions are used extensively to parse log files and extract specific fields. Compared to splitting, performance can be 10 times better or more, while memory usage remains low. Splitting can easily result in memory usage that's multiple times bigger than the original input file.
Can it go faster?
Yes. Regular expressions produce parsing code that may not be as efficient as possible. A hand-written parser could be faster. In this particular case, we want to start capturing text when a $ is detected and stop at the next $. This can be done with the following method :
IEnumerable<string> GetNames(string input)
{
var builder=new StringBuilder(20);
bool started=false;
foreach(var c in input)
{
if (started)
{
if (c!='$')
{
builder.Append(c);
}
else
{
started=false;
var value=builder.ToString();
yield return value;
builder.Clear();
}
}
else if (c=='$')
{
started=true;
}
}
}
A string is an IEnumerable<char> so we can inspect one character at a time without having to copy them. By using a single StringBuilder with a predetermined capacity we avoid reallocations, at least until we find a key that's larger than 20 characters.
Modifying this code to extract values though isn't so easy.

Here's one way to do it, but certainly not very elegant. Basically splitting the string on the '$' and taking every other item will give you the result (after some additional trimming of unwanted characters).
In this example, I'm also grabbing the value of each item and then putting both in a dictionary:
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test@test.com\"))";
var inputParts = input.Replace(" AND ", "")
.Trim(')', '(')
.Split(new[] {'$'}, StringSplitOptions.RemoveEmptyEntries);
var keyValuePairs = new Dictionary<string, string>();
for (int i = 0; i < inputParts.Length - 1; i += 2)
{
var key = inputParts[i];
var value = inputParts[i + 1].Trim('(', ':', ')', '"', '\'', ' ');
keyValuePairs[key] = value;
}
foreach (var kvp in keyValuePairs)
{
Console.WriteLine($"{kvp.Key} = {kvp.Value}");
}
// Wait for input before closing
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();

Using Regex.Replace to keep characters that can be vary

I have the following:
string text = "version=\"1,0\"";
I want to replace the comma with a dot, while keeping the 1 and 0, BUT keeping in mind that they may be different in different situations! It could be version="2,3" .
The smart ass and noob-unworking way to do it would be:
for (int i = 0; i <= 9; i++)
{
for (int z = 0; z <= 9; z++)
{
text = Regex.Replace(text, "version=\"i,z\"", "version=\"i.z\"");
}
}
But of course.. it's a string, and I don't want i and z to behave as literal text in there.
I could also try the lame but working way:
text = Regex.Replace(text, "version=\"1,", "version=\"1.");
text = Regex.Replace(text, "version=\"2,", "version=\"2.");
text = Regex.Replace(text, "version=\"3,", "version=\"3.");
And so on.. but it would be lame.
Any hints on how to single-handedly handle this?
Edit: I have other commas that I don't wanna replace, so text.Replace(",",".") can't do

You need a regex like this to locate the comma
Regex reg = new Regex("(version=\"[0-9]),([0-9]\")");
Then do the replacement:
text = reg.Replace(text, "$1.$2");
You can use $1, $2, etc. to refer to the matching groups in order.

(?<=version=")(\d+),
You can try this. See the demo. Replace with $1. (the captured digits followed by a dot).
https://regex101.com/r/sJ9gM7/52
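In C#, the lookbehind pattern above could be applied roughly as follows. This is a sketch only: VersionFixer and Fix are hypothetical names, and the sample input with a second attribute is made up to show that other commas are left alone:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VersionFixer
{
    // (?<=version=") is a lookbehind: it must precede the match but is not part of it.
    // (\d+) captures the digits before the comma so they can be re-inserted as $1.
    private static readonly Regex CommaAfterDigits =
        new Regex(@"(?<=version="")(\d+),");

    public static string Fix(string text)
    {
        // "$1." re-inserts the captured digits and writes a dot in place of the comma.
        return CommaAfterDigits.Replace(text, "$1.");
    }

    public static void Main()
    {
        Console.WriteLine(Fix("version=\"1,0\" other=\"a,b\""));
        // prints version="1.0" other="a,b"
    }
}
```

Only the comma that directly follows the digits after version=" is replaced, so a version such as "1,1,0" would need a different approach.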

You can perhaps use capture groups to keep the numbers in front and after for replacement afterwards for a more 'traditional way' to do it:
string text = "version=\"1,0\"";
var regex = new Regex(@"version=""(\d*),(\d*)""");
var result = regex.Replace(text, "version=\"$1.$2\"");
Using parens like the above in a regex is to create a capture group (so the matched part can be accessed later when needed) so that in the above, the digits before and after the comma will be stored in $1 and $2 respectively.
But I decided to delve a little bit further and let's consider the case if there are more than one comma to replace in the version, i.e. if the text was version="1,1,0". It would actually be tedious to do the above, and you would have to make one replace for each 'type' of version. So here's one solution that is sometimes called a callback in other languages (not a C# dev, but I fiddled around lambda functions and it seems to work :)):
private static string SpecialReplace(string text)
{
var result = text.Replace(',', '.');
return result;
}
public static void Main()
{
string text = "version=\"1,0,0\"";
var regex = new Regex(@"version=""[\d,]*""");
var result = regex.Replace(text, x => SpecialReplace(x.Value));
Console.WriteLine(result);
}
The above gives version="1.0.0".
@"version=""[\d,]*""" will first match any sequence of digits and commas within version="...", then pass it to the next line for the replace.
The replace takes the matched text, passes it to the lambda function which takes it to the function SpecialReplace, where a simple text replace is carried out only on the matched part.
