I'm trying to remove some of the commas from my strings, but not all of them. I've searched all over the forum but I can't find a solution. Let me explain with an example:
So basically, I have a file with many lines. They look like this:
,,,,,9,33380,32785,14,,,,,50,,,,,,3,,,,600
,,,,19,33399,32774,14,,,,,50,,,,,,2,,,,600
,,,,19,33399,32784,14,,,,,50,,,,,,3,,,,600
,,,,38,33380,32789,14,,,,,50,,,,,,1,,,,600
,,,,38,33404,32793,14,,,,,50,,,,,,1,,,,600
,,,,79,33394,32795,14,,,,,50,,,,,,2,,,,600
,,,,83,33396,32789,14,,,,,50,,,,,,5,,,,600
,,,100,33399,32779,14,,,,,50,,,,,,3,,,,600
,,,101,33399,32797,14,,,,,50,,,,,,2,,,,600
The output I want keeps a single comma between the values and removes any leading commas from the beginning of the string, like this:
9,33380,32785,14,50,3,600
19,33399,32774,14,50,2,600
19,33399,32784,14,50,3,600
38,33380,32789,14,50,1,600
38,33404,32793,14,50,1,600
79,33394,32795,14,50,2,600
83,33396,32789,14,50,5,600
100,33399,32779,14,50,3,600
101,33399,32797,14,50,2,600
I've tried text.Replace(",", ""), but that removes them all.
Unfortunately I am not very good at RegEx either and I don't even know if this is possible with that.
Any help would be appreciated!
You can use the following regex to condense multiple commas into one, and then Trim to remove any leading or trailing commas:
var result = Regex.Replace(input, ",+", ",").Trim(',');
There are a couple of solutions.
One is to split by the , delimiter, remove the empty entries (',,'), and then rebuild the string with the same , delimiter:
var result = string.Join(",", ",,,,,9,33380,3272774,,".Split(new [] { ',' }, StringSplitOptions.RemoveEmptyEntries));
Here's my solution
string text = ",,,,,9,33380,32785,14,,,,,50,,,,,,3,,,,600";
// Collapse every pair of commas until no run of commas is left.
while (text.Contains(",,"))
{
    text = text.Replace(",,", ",");
}
// Drop the single leading comma that remains, if any.
if (text.StartsWith(","))
{
    text = text.Substring(1);
}
hope it can help you :)
@juharr's answer looks great, but if performance is a concern then it may be better to trim before replacing, and to match only runs of two or more commas, giving:
var result = Regex.Replace(input.Trim(','), ",,+", ",");
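For reference, the regex and split-and-join approaches above can be put side by side. This is only a minimal sketch: the class and method names (CommaCleaner, WithRegex, WithSplit) are made up for illustration and do not come from any of the answers.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommaCleaner
{
    // Trim outer commas first, then collapse each run of two or more commas into one.
    public static string WithRegex(string input) =>
        Regex.Replace(input.Trim(','), ",{2,}", ",");

    // Split on commas, drop the empty entries, and join what is left.
    public static string WithSplit(string input) =>
        string.Join(",", input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries));

    static void Main()
    {
        const string line = ",,,,,9,33380,32785,14,,,,,50,,,,,,3,,,,600";
        Console.WriteLine(WithRegex(line)); // 9,33380,32785,14,50,3,600
        Console.WriteLine(WithSplit(line)); // 9,33380,32785,14,50,3,600
    }
}
```

Both give the same result on the sample line; which one is faster on a large file is something you would have to measure.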
I am getting a string and trimming it first, then splitting it and assigning the result to a string[]. Then I use every element of the array in a string.Contains() or string.StartsWith() call. The interesting thing is that even when the string contains the element, Contains() doesn't work properly, and the situation is the same for StartsWith(). Does anyone have any idea what the problem is?
P.S.: I trimmed strings after splitting and problem was solved.
string inputTxt = "tasklist";
string commands = "net, netsh, tasklist";
string[] maliciousConsoleCommands = commands.Trim(' ').Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i])) {
return false;
}
}
//this code works but no idea why previous code didn't work.
string[] maliciousConsoleCommands = commands.Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i].Trim(' '))) {
return false;
}
}
I expected the first version to work properly, but the problem was only solved by trimming after splitting.
Your delimiter is not a comma char; it's a comma followed by a white-space. So instead of splitting by ',', simply split by ", ":
string[] maliciousConsoleCommands = commands.Split(new string[] { ", " }, StringSplitOptions.None);
This returns the items without the leading space, so the trim becomes redundant.
It seems you should Trim each item:
// ["net", "netsh", "tasklist"]
string[] maliciousConsoleCommands = commands
.Split(',') // "net" " netsh", " tasklist" - note leading spaces
.Select(item => item.Trim()) // removing leading spaces from each item
.ToArray();
Finally, if you want to test if inputTxt is malicious:
if (commands
    .Split(',')
    .Select(item => item.Trim()) // You can combine Select and Any
    .Any(item => inputTxt.StartsWith(item)))
    return false;
The first code you presented won't work because you trim the initial string: "net, netsh, tasklist" has no leading or trailing spaces, so it stays unchanged after trimming, and splitting it by comma then produces entries that have a leading space. Thus, you get unexpected results. You should trim after splitting the string.
The second snippet does work, because Trim(' ') is applied to each array element inside the StartsWith call, so the leading space is removed before the comparison.
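To make the difference visible, here is a small sketch that prints what each ordering produces. The class and method names (SplitTrimDemo, TrimThenSplit, SplitThenTrim) are hypothetical, and the '|' separator is used only so the leading spaces show up in the output.

```csharp
using System;
using System.Linq;

static class SplitTrimDemo
{
    // Trimming the whole string first only touches its two ends.
    public static string[] TrimThenSplit(string s) => s.Trim(' ').Split(',');

    // Trimming after the split cleans every individual entry.
    public static string[] SplitThenTrim(string s) =>
        s.Split(',').Select(item => item.Trim()).ToArray();

    static void Main()
    {
        const string commands = "net, netsh, tasklist";
        Console.WriteLine(string.Join("|", TrimThenSplit(commands))); // net| netsh| tasklist
        Console.WriteLine(string.Join("|", SplitThenTrim(commands))); // net|netsh|tasklist
    }
}
```

With the first ordering, "tasklist" is compared against " tasklist" (note the space), which is why StartsWith and Contains appear to fail.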
Yet another way to split, if the commands themselves contain no spaces, is to use ' ' itself as a delimiter and discard empty entries:
var maliciousConsoleCommands = commands.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
This avoids the temporary strings generated by every string manipulation command.
For your code to work, though, you'd have to use Contains for each command instead of StartsWith:
var isSuspicious = maliciousCommands.Any(cmd=>input.Contains(cmd));
Or even :
var isSuspicious = maliciousCommands.Any(input.Contains);
This can get rather slow if you have many commands, or if the input text is large.
Regular expression alternative
A potentially much faster technique is to use a regular expression, which can perform better than searching for each keyword individually:
var regex=new Regex("net|netsh|tasklist");
var isSuspicious=regex.IsMatch(inputTxt);
Regular expressions are thread-safe which means they can be created once and reused by different threads/requests.
By using Match/Matches instead of IsMatch the regex could return the actual keywords that were detected :
var detection=regex.Match(inputTxt);
if (detection.Success)
{
var detectedKeyword=detection.Value;
....
}
Converting the original comma-separated list to a regular expression can be done with a single String.Replace(", ", "|"), or with another regular expression that can handle any whitespace character:
string commands = "net , netsh, \ttasklist";
var pattern = Regex.Replace(commands, @"\s*,\s*", "|");
var regex = new Regex(pattern);
Detecting whole words only
Both Contains and the original regular expression would match tasklist1 as well as tasklist. It's possible to match whole words only if the pattern is surrounded by the word-boundary anchor, \b:
@"\b(" + pattern + @")\b"
This will match tasklist and net but reject tasklist1.
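Putting the pieces of this answer together, a whole-word detector could look roughly like the sketch below. The names CommandDetector and Build are hypothetical. The Regex.Escape call and the non-capturing group are my own additions, not part of the answer: the escaping is a precaution in case a command ever contains a regex metacharacter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CommandDetector
{
    // Turn "net , netsh, \ttasklist" into \b(?:net|netsh|tasklist)\b
    public static Regex Build(string commaSeparated)
    {
        var alternatives = commaSeparated
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(c => c.Trim())
            .Where(c => c.Length > 0)
            .Select(Regex.Escape); // assumption: escape metacharacters defensively
        return new Regex(@"\b(?:" + string.Join("|", alternatives) + @")\b");
    }

    static void Main()
    {
        var regex = Build("net , netsh, \ttasklist");
        Console.WriteLine(regex.Match("run netsh wlan show").Value); // netsh
        Console.WriteLine(regex.IsMatch("tasklist1"));               // False
    }
}
```

The Regex instance can be built once and reused, as noted above.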
I have a string which consists of a number of ordered terms separated by newlines (\n), as shown in the following example (note that the string is an element of an array of strings):
term 1
term 2
.......
.......
term n
I want to split off only a specific number of terms, let's say 1000, and discard the rest. I'm trying the following code:
string[] training = traindocs[tr].Trim().Split('\n');
List <string> trainterms = new List<string>();
for (int i = 0; i < 1000; i++)
{
if (i >= training.Length)
break;
trainterms.Add(training[i].Trim().Split('\t')[0]);
}
Can I do this without using a List or any other data structure? I mean, just extract the specific number of terms into the array (training) directly? Thanks in advance.
How about LINQ? The .Take() extension method kind of seems to fit your bill:
List<string> trainterms = traindocs[tr].Trim().Split('\n').Take(1000).ToList();
According to MSDN you can use an overloaded version of the split method.
public string[] Split(char[] separator, int count, StringSplitOptions options)
Parameters
separator (System.Char[]): An array of Unicode characters that delimit the substrings in this string, an empty array that contains no delimiters, or null.
count (System.Int32): The maximum number of substrings to return.
options (System.StringSplitOptions): StringSplitOptions.RemoveEmptyEntries to omit empty array elements from the array returned; or StringSplitOptions.None to include empty array elements in the array returned.
Return Value
System.String[]: An array whose elements contain the substrings in this string that are delimited by one or more characters in separator. For more information, see the Remarks section.
So something like so:
String str = "A,B,C,D,E,F,G,H,I";
String[] str2 = str.Split(new Char[]{','}, 5, StringSplitOptions.RemoveEmptyEntries);
System.Console.WriteLine(str2.Length);
System.Console.Read();
Would print: 5
EDIT:
Upon further investigation it seems that the count parameter just instructs when the splitting stops. The rest of the string will be kept in the last element.
So the code above would yield the following result: [0] = A, [1] = B, [2] = C, [3] = D, [4] = E,F,G,H,I, which is not what you seem to be after.
To fix this, you would need to do something like so:
String str = "A\nB\nC\nD\nE\nF\nG\nH\nI";
List<String> myList = str.Split(new Char[]{'\n'}, 5, StringSplitOptions.RemoveEmptyEntries).ToList<String>();
myList[myList.Count - 1] = myList[myList.Count - 1].Split(new Char[] { '\n' })[0];
System.Console.WriteLine(myList.Count);
foreach (String str1 in myList)
{
System.Console.WriteLine(str1);
}
System.Console.Read();
The code above will only retain the first 5 (in your case, 1000) elements. Thus, I think that Darin's solution might be cleaner, if you will.
If you want the most efficient (fastest) way, use the overload of String.Split that takes the total number of items required, keeping in mind that the last element then holds the unsplit remainder.
If you want the easy way, use LINQ.
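Both suggestions from this thread can be sketched next to each other. The names TermExtractor, WithTake and WithCount are made up for illustration. The limit + 1 trick in the second method is my own way of dealing with the remainder element described earlier, not something the answers spell out, and I have not measured which version is actually faster.

```csharp
using System;
using System.Linq;

static class TermExtractor
{
    // LINQ version: keep at most `limit` lines, then take the first tab-separated field of each.
    public static string[] WithTake(string doc, int limit) =>
        doc.Trim()
           .Split('\n')
           .Take(limit)
           .Select(line => line.Trim().Split('\t')[0])
           .ToArray();

    // Split-with-count version: ask for limit + 1 pieces so the unsplit remainder
    // lands in an extra element, which Take(limit) then discards.
    public static string[] WithCount(string doc, int limit) =>
        doc.Trim()
           .Split(new[] { '\n' }, limit + 1, StringSplitOptions.None)
           .Take(limit)
           .Select(line => line.Trim().Split('\t')[0])
           .ToArray();

    static void Main()
    {
        const string doc = "alpha\t1\nbeta\t2\ngamma\t3\ndelta\t4";
        Console.WriteLine(string.Join(",", WithTake(doc, 2)));  // alpha,beta
        Console.WriteLine(string.Join(",", WithCount(doc, 2))); // alpha,beta
    }
}
```

Either way the result is a plain string[], so no List is needed.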
Today I was wondering if there is a better way to write the following code sample.
string keyword = " abc, foo , bar";
string match = "foo";
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach(string s in split)
{
    if (s.Trim() == match)
    {
        // handle the match here
        break;
    }
}
Is there a way to perform trim() without manually iterating through each item? I'm looking for something like 'split by the following chars and automatically trim each result'.
Ah, immediately before posting I found
List<string> parts = line.Split(';').Select(p => p.Trim()).ToList();
in How can I split and trim a string into parts all on one line?
Still I'm curious: Might there be a better solution to this? (Or would the compiler probably convert them to the same code output as the Linq-Operation?)
Another possible option (that avoids LINQ, for better or worse):
string line = " abc, foo , bar";
string[] parts= Array.ConvertAll(line.Split(','), p => p.Trim());
However, if you just need to know if it is there - perhaps short-circuit?
bool contains = line.Split(',').Any(p => p.Trim() == match);
var parts = line
.Split(';')
.Select(p => p.Trim())
.Where(p => !string.IsNullOrWhiteSpace(p))
.ToArray();
I know this is 10 years too late but you could have just split by ' ' as well:
string[] split= keyword.Split(new char[] { ',', ';', ' ' }, StringSplitOptions.RemoveEmptyEntries);
Because you're also splitting by the space char AND instructing the split to remove the empty entries, you'll have what you need.
If spaces only surround the words in the comma-separated string, this will work:
var keyword = " abc, foo , bar";
var array = keyword.Replace(" ", "").Split(',');
if (array.Contains("foo"))
{
Debug.Print("Match");
}
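One caveat with the two space-based answers above is that they only behave like trimming when the entries contain no inner spaces. A minimal sketch of the difference follows. The class and method names are hypothetical, and the "new york" entry is an invented input, not something from the question.

```csharp
using System;

static class SpaceHandlingDemo
{
    // Treats every space as a delimiter, so an entry with an inner space is cut in two.
    public static string[] SplitOnSpaces(string s) =>
        s.Split(new[] { ',', ';', ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Removes every space before splitting, so inner spaces vanish as well.
    public static string[] StripSpaces(string s) => s.Replace(" ", "").Split(',');

    // Trims each entry after splitting, so inner spaces are preserved.
    public static string[] TrimEach(string s) =>
        Array.ConvertAll(s.Split(','), p => p.Trim());

    static void Main()
    {
        const string keyword = " new york, foo ";
        Console.WriteLine(string.Join("|", SplitOnSpaces(keyword))); // new|york|foo
        Console.WriteLine(string.Join("|", StripSpaces(keyword)));   // newyork|foo
        Console.WriteLine(string.Join("|", TrimEach(keyword)));      // new york|foo
    }
}
```

For single-word keywords like the ones in the question, all three return the same entries.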
I would suggest using regular expressions on the original string, looking for the pattern "any number of spaces followed by one of your delimiters followed by one or more spaces" and remove those spaces. Then split.
Try this:
string keyword = " abc, foo , bar";
string match = "foo";
string[] split = Regex.Split(keyword.Trim(), @"\s*[,;]\s*");
if (split.Contains(match))
{
// do stuff
}
You're going to find a lot of different methods of doing this, and the differences in performance and accuracy aren't going to be readily apparent. I'd recommend plugging them all into a testing suite like NUnit, both to find which one comes out on top and to check which ones are accurate.
Use small, medium, and large amounts of text in loops to examine the various situations.
Starting with .NET 5, there is an easier option:
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.TrimEntries);
You can combine it with the option to remove empty entries:
string[] split= keyword.Split(new char[] { ',', ';' }, StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);
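As a quick check of how the combined flags behave, here is a small sketch. It assumes .NET 5 or later, since StringSplitOptions.TrimEntries does not exist before that; the class and method names (TrimEntriesDemo, Parse) are made up for illustration.

```csharp
using System;

static class TrimEntriesDemo
{
    // Split on ',' or ';', trim each entry, and drop entries that end up empty.
    public static string[] Parse(string keyword) =>
        keyword.Split(new[] { ',', ';' },
            StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);

    static void Main()
    {
        Console.WriteLine(string.Join("|", Parse(" abc, foo , bar;; "))); // abc|foo|bar
    }
}
```

With both flags set, entries that consist only of white-space are removed as well, so the trailing ";; " in the example contributes nothing.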