C# Char Array remove at specific index - c#

Not to sure the best way to remove the char from the char array if the char at a given index is a number.
private string TextBox_CharacterCheck(string tocheckTextBox)
{
char[] charlist = tocheckTextBox.ToCharArray();
foreach (char character in charlist)
{
if (char.IsNumber(character))
{
}
}
return (new string(charlist));
}
Thanks in advance.
// this is now resolved. thank you to all who contributed

You could use the power of Linq:
return new string(tocheckTextBox.Where(c => !char.IsNumber(c)).ToArray())

This is fairly easy using Regex:
var result = Regex.Replace("a1b2c3d4", #"\d", "");
(as #Adassko notes, you can use "[0-9]" instead of #"\d" if you just want the digits 0 to 9, and not any other numeric characters).
You can also do it fairly efficiently using a StringBuilder:
var sb = new StringBuilder();
foreach (var ch in "a1b2c3d4")
{
if (!char.IsNumber(ch))
{
sb.Append(ch);
}
}
var result = sb.ToString();
You can also do it with linq:
var result = new string("a1b2c3d4".Where(x => !char.IsNumber(x)).ToArray());

Use Regex:
private string TextBox_CharacterCheck(string tocheckTextBox)
{
return Regex.Replace(tocheckTextBox, #"[\d]", string.Empty);;
}

System.String is immutable. You could use string.Replace or a regular expression to remove unwanted characters into a new string.

your best bet is to use regular expressions.
strings are immutable meaning that you can't change them - you need to rewrite the whole string - to do it in optimal way you should use StringBuilder class and Append every character that you want.
Also watch out for your code - char.IsNumber checks not only for characters 0-9, it also returns true for every numeric character such as ٢ and you probably don't want that.
here's the full list of characters returning true:
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789
you should also use [0-9] rather than \d in your regular expressions if you want only parsable digits.
You can also use a trick to .Split your string on your character, then .Join it back. This not only allows you to remove one or more characters, it also lets you to replace it with some other character.
I use this trick to remove incorrect characters from file name:
string.Join("-", possiblyIncorrectFileName.Split(Path.GetInvalidFileNameChars()))
this code will replace any character that cannot be used in valid file name to -

You can use LINQ to remove the char from the char array if the char at a given index is a number.
CODE
//This will return you the list of char discarding the number.
var removedDigits = tocheckTextBox.Where(x => !char.IsDigit(x));
//This will return the string without numbers.
string output = string.join("", removedDigits);

Related

Trim().Split causes a problem in Contains()

I am getting a string and trimming it first, then splitting it and assigning it to a string[]. Then, I am using every element in the array for a string.Contains() or string.StartsWith() method. Interesting thing is that even if the string contains element, Contains() doesn't work properly. And situation is same for StartsWith(), too. Does anyone have any idea about the problem?
P.S.: I trimmed strings after splitting and problem was solved.
string inputTxt = "tasklist";
string commands = "net, netsh, tasklist";
string[] maliciousConsoleCommands = commands.Trim(' ').Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i])) {
return false;
}
}
//this code works but no idea why previous code didn't work.
string[] maliciousConsoleCommands = commands.Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i].Trim(' '))) {
return false;
}
}
I expected to work properly but it is solved by trimming after splitting.
Your delimiter is not a comma char, it's a comma followed by a white-space - so instead of splitting by ',', simply split by ", ":
string[] maliciousConsoleCommands = commands.Split(new string[] {", "});
This will return the items without the leading space so the trim will be redundant.
It seems, you should Trim each item :
// ["net", "netsh, "tasklist"]
string[] maliciousConsoleCommands = commands
.Split(',') // "net" " netsh", " tasklist" - note leading spaces
.Select(item => item.Trim()) // removing leading spaces from each item
.ToArray();
Finally, if you want to test if inputTxt is malicious:
if (commands
.Split(',')
.Select(item => item.Trim()) // You can combine Select and Any
.Any(item => inputTxt.StartsWith(item))
return false;
First code you presented won't work because you want to trim initial string, so "net, netsh, tasklist" will stay unchanged after trimming (no leading and trailing spaces), then splitting it by comma will produce entries, that have leading space. Thus, you will get unexpected results. You should be trimming after splitting the string.
Second code also won't work, because you use Trim after StartsWith, which return bool value. You can't apply Trim to bool, this code should not even compile.
Yet another way to split if the commands themselves have no spaces is to use ' ' itself as a delimiter, and discard empty entries :
var maliciousConsoleCommands = commands.Split(new[]{',',' '},StringSplitOptions.RemoveEmptyEntries)
.ToArray();
This avoids the temporary strings generated by every string manipulation command.
For your code to work though, you'd have use Contains for each command, instead of using StartWith :
var isSuspicious = maliciousCommands.Any(cmd=>input.Contains(cmd));
Or even :
var isSuspicious = maliciousCommands.Any(input.Contains);
This can get rather slow if you have multiple commands, or if the input text is large
Regular expression alternative
A far faster technique would be to use a Regular expression. This performs a lot faster than searching individual keywords :
var regex=new Regex("net|netsh|tasklist");
var isSuspicious=regex.IsMatch(inputTxt);
Regular expressions are thread-safe which means they can be created once and reused by different threads/requests.
By using Match/Matches instead of IsMatch the regex could return the actual keywords that were detected :
var detection=regex.Match(inputTxt);
if (detection.Success)
{
var detectedKeyword=detection.Value;
....
}
Converting the original comma-separated list to a regular expression can be performed with a single String.Replace(", ") or another regular expression that can handle any whitespace character :
string commands = "net , netsh, \ttasklist";
var pattern=Regex.Replace(commands,#"\s*,\s*","|").Dump();
var regex=new Regex(pattern);
Detecting whole words only
Both Contains and the original regular expression would match tasklist1 as well as tasklist. It's possible to match whole words only, if the pattern is surrounded by the word delimiter, \b :
#"\b(" + pattern + #")\b"
This will match tasklist and net but reject tasklist1

Extract all numbers from string

Let's say I have a string such as 123ad456. I want to make a method that separates the groups of numbers into a list, so then the output will be something like 123,456.
I've tried doing return Regex.Match(str, #"-?\d+").Value;, but that only outputs the first occurrence of a number, so the output would be 123. I also know I can use Regex.Matches, but from my understanding, that would output 123456, not separating the different groups of numbers.
I also see from this page on MSDN that Regex.Match has an overload that takes the string to find a match for and an int as an index at which to search for the match, but I don't see an overload that takes in the above in addition to a parameter for the regex pattern to search for, and the same goes for Regex.Matches.
I guess the approach to use would be to use a for loop of some sort, but I'm not entirely sure what to do. Help would be greatly appreciated.
All you have to to use Matches instead of Match. Then simply iterate over all matches:
string result = "";
foreach (Match match in Regex.Matches(str, #"-?\d+"))
{
result += match.result;
}
You may iterate over string data using foreach and use TryParse to check each character.
foreach (var item in stringData)
{
if (int.TryParse(item.ToString(), out int data))
{
// int data is contained in variable data
}
}
Using a combination of string.Join and Regex.Matches:
string result = string.Join(",", Regex.Matches(str, #"-?\d+").Select(m => m.Value));
string.Join performs better than continually appending to an existing string.
\d+ is the regex for integer numbers;
//System.Text.RegularExpressions.Regex
resultString = Regex.Match(subjectString, #"\d+").Value;
returns a string with the very first occurence of a number in subjectString.
Int32.Parse(resultString) will then give you the number.

Extracting and Manipulating Strings in C#.Net

We have a requirement to extract and manipulate strings in C#. Net. The requirement is - we have a string
($name$:('George') AND $phonenumer$:('456456') AND
$emailaddress$:("test#test.com"))
We need to extract the strings between the character - $
Therefore, in the end, we need to get a list of strings containing - name, phonenumber, emailaddress.
What would be the ideal way to do it? are there any out of the box features available for this?
Regards,
John
The simplest way is to use a regular expression to match all non-whitespace characters between $ :
var regex=new Regex(#"\$\w+\$");
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test#test.com\"))";
var matches=regex.Matches(input);
This will return a collection of matches. The .Value property of each match contains the matching string. \$ is used because $ has special meaning in regular expressions - it matches the end of a string. \w means a non-whitespace character. + means one or more.
Since this is a collection, you can use LINQ on it to get eg an array with the values:
var values=matches.OfType<Match>().Select(m=>m.Value).ToArray();
That array will contain the values $name$,$phonenumer$,$emailaddress$.
Capture by name
You can specify groups in the pattern and attach names to them. For example, you can group the field name values:
var regex=new Regex(#"\$(?<name>\w+)\$");
var names=regex.Matches(input)
.OfType<Match>()
.Select(m=>m.Groups["name"].Value);
This will return name,phonenumer,emailaddress. Parentheses are used for grouping. (?<somename>pattern) is used to attach a name to the group
Extract both names and values
You can also capture the field values and extract them as a separate field. Once you have the field name and value, you can return them, eg as an object or anonymous type.
The pattern in this case is more comples:
#"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)"
Parentheses are escaped because we want them to match the values. Both ' and " characters are used in values, so ['"] is used to specify a choice of characters. The pattern is a literal string (ie starts with #) so the double quotes have to be escaped: ['""] . Any character has to be matched .+ but only up to the next character in the pattern .+?. Without the ? the pattern .+ would match everything to the end of the string.
Putting this together:
var regex = new Regex(#"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)");
var myValues = regex.Matches(input)
.OfType<Match>()
.Select(m=>new { Name=m.Groups["name"].Value,
Value=m.Groups["value"].Value
})
.ToArray()
Turn them into a dictionary
Instead of ToArray() you could convert the objects to a dictionary with ToDictionary(), eg with .ToDictionary(it=>it.Name,it=>it.Value). You could omit the select step and generate the dictionary from the matches themselves :
var myDict = regex.Matches(input)
.OfType<Match>()
.ToDictionary(m=>m.Groups["name"].Value,
m=>m.Groups["value"].Value);
Regular expressions are generally fast because they don't split the string. The pattern is converted to efficient code that parses the input and skips non-matching input immediatelly. Each match and group contain only the index to their starting and ending character in the input string. A string is only generated when .Value is called.
Regular expressions are thread-safe, which means a single Regex object can be stored in a static field and reused from multiple threads. That helps in web applications, as there's no need to create a new Regex object for each request
Because of these two advantages, regular expressions are used extensively to parse log files and extract specific fields. Compared to splitting, performance can be 10 times better or more, while memory usage remains low. Splitting can easily result in memory usage that's multiple times bigger than the original input file.
Can it go faster?
Yes. Regular expressions produce parsing code that may not be as efficient as possible. A hand-written parser could be faster. In this particular case, we want to start capturing text if $ is detected up until the first $. This can be done with the following method :
IEnumerable<string> GetNames(string input)
{
var builder=new StringBuilder(20);
bool started=false;
foreach(var c in input)
{
if (started)
{
if (c!='$')
{
builder.Append(c);
}
else
{
started=false;
var value=builder.ToString();
yield return value;
builder.Clear();
}
}
else if (c=='$')
{
started=true;
}
}
}
A string is an IEnumerable<char> so we can inspect one character at a time without having to copy them. By using a single StringBuilder with a predetermined capacity we avoid reallocations, at least until we find a key that's larger than 20 characters.
Modifying this code to extract values though isn't so easy.
Here's one way to do it, but certainly not very elegant. Basically splitting the string on the '$' and taking every other item will give you the result (after some additional trimming of unwanted characters).
In this example, I'm also grabbing the value of each item and then putting both in a dictionary:
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test#test.com\"))";
var inputParts = input.Replace(" AND ", "")
.Trim(')', '(')
.Split(new[] {'$'}, StringSplitOptions.RemoveEmptyEntries);
var keyValuePairs = new Dictionary<string, string>();
for (int i = 0; i < inputParts.Length - 1; i += 2)
{
var key = inputParts[i];
var value = inputParts[i + 1].Trim('(', ':', ')', '"', '\'', ' ');
keyValuePairs[key] = value;
}
foreach (var kvp in keyValuePairs)
{
Console.WriteLine($"{kvp.Key} = {kvp.Value}");
}
// Wait for input before closing
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();
Output

c# How do trim all non numeric character in a string

what is the faster way to trim all alphabet in a string that have alphabet prefix.
For example, input sting "ABC12345" , and i wish to havee 12345 as output only.
Thanks.
Please use "char.IsDigit", try this:
static void Main(string[] args)
{
var input = "ABC12345";
var numeric = new String(input.Where(char.IsDigit).ToArray());
Console.Read();
}
You can use Regular Expressions to trim an alphabetic prefix
var input = "ABC123";
var trimmed = Regex.Replace(input, #"^[A-Za-z]+", "");
// trimmed = "123"
The regular expression (second parameter) ^[A-Za-z]+ of the replace method does most of the work, it defines what you want to be replaced using the following rules:
The ^ character ensures a match only exists at the start of a string
The [A-Za-z] will match any uppercase or lowercase letters
The + means the upper or lowercase letters will be matched as many times in a row as possible
As this is the Replace method, the third parameter then replaces any matches with an empty string.
The other answers seem to answer what is the slowest way .. so if you really need the fastest way, then you can find the index of the first digit and get the substring:
string input = "ABC12345";
int i = 0;
while ( input[i] < '0' || input[i] > '9' ) i++;
string output = input.Substring(i);
The shortest way to get the value would probably be the VB Val method:
double value = Microsoft.VisualBasic.Conversion.Val("ABC12345"); // 12345.0
You would have to regular expression. It seems you are looking for only digits and not letters.
Sample:
string result =
System.Text.RegularExpressions.Regex.Replace("Your input string", #"\D+", string.Empty);

Regex to remove all special characters from string?

I'm completely incapable of regular expressions, and so I need some help with a problem that I think would best be solved by using regular expressions.
I have list of strings in C#:
List<string> lstNames = new List<string>();
lstNames.add("TRA-94:23");
lstNames.add("TRA-42:101");
lstNames.add("TRA-109:AD");
foreach (string n in lstNames) {
// logic goes here that somehow uses regex to remove all special characters
string regExp = "NO_IDEA";
string tmp = Regex.Replace(n, regExp, "");
}
I need to be able to loop over the list and return each item without any special characters. For example, item one would be "TRA9423", item two would be "TRA42101" and item three would be TRA109AD.
Is there a regular expression that can accomplish this for me?
Also, the list contains more than 4000 items, so I need the search and replace to be efficient and quick if possible.
EDIT:
I should have specified that any character beside a-z, A-Z and 0-9 is special in my circumstance.
It really depends on your definition of special characters. I find that a whitelist rather than a blacklist is the best approach in most situations:
tmp = Regex.Replace(n, "[^0-9a-zA-Z]+", "");
You should be careful with your current approach because the following two items will be converted to the same string and will therefore be indistinguishable:
"TRA-12:123"
"TRA-121:23"
[^a-zA-Z0-9] is a character class matches any non-alphanumeric characters.
Alternatively, [^\w\d] does the same thing.
Usage:
string regExp = "[^\w\d]";
string tmp = Regex.Replace(n, regExp, "");
This should do it:
[^a-zA-Z0-9]
Basically it matches all non-alphanumeric characters.
You can use:
string regExp = "\\W";
This is equivalent to Daniel's "[^a-zA-Z0-9]"
\W matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
For my purposes I wanted all English ASCII chars, so this worked.
html = Regex.Replace(html, "[^\x00-\x80]+", "")
Depending on your definition of "special character", I think "[^a-zA-Z0-9]" would probably do the trick. That would find anything that is not a small letter, a capital letter, or a digit.
tmp = Regex.Replace(n, #"\W+", "");
\w matches letters, digits, and underscores, \W is the negated version.
If you don't want to use Regex then another option is to use
char.IsLetterOrDigit
You can use this to loop through each char of the string and only return if true.
public static string Letters(this string input)
{
return string.Concat(input.Where(x => char.IsLetter(x) && !char.IsSymbol(x) && !char.IsWhiteSpace(x)));
}

Categories