C# string manipulation - c#

I have a string like
A150[ff;1];A160;A100;D10;B10
from which I want to extract A150, D10, B10.
Between these valid strings there can be any characters. The one consistent part is the semicolon between each pair of legitimate strings.
Note that the junk characters I am trying to remove can themselves contain a semicolon.

Without having more detail for the specific rules, it looks like you want to use String.Split(';') and then construct a regex to parse out the string you really need foreach string in your newly created collection. Since you said that the semi colon can appear in the "junk" it's irrelevant since it won't match your regex.

var input = "A150[ff+1];A160;A150[ff-1]";
var temp = new List<string>();
foreach (var s in input.Split(';'))
{
    temp.Add(Regex.Replace(s, "(A[0-9]*)\\[*.*", "$1"));
}
foreach (var s1 in temp.Distinct())
{
    Console.WriteLine(s1);
}
produces the output
A150
A160

First, you should use
string s="A150[ff;1];A160;A100;D10;B1";
int index = s.IndexOf("A160");
This gives you the index of A160 (or of any other substring you search for).
Then call s.Remove(index, count) to cut out the part you don't want.

If you only want to remove the 'junk' in between the '[' and ']' characters, you can use a regex for that:
Regex regex = new Regex(@"\[[^\]]+\]");
string result = regex.Replace("A150[ff;1];A160;A100;D10;B10", "");
Then use String.Split to get the individual items.
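The two steps above (strip the bracketed junk, then split) can be sketched as a single helper. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes that junk always appears inside a matched pair of square brackets, which the question does not guarantee.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original answers.
public static class BracketJunkRemover
{
    // Removes every "[...]" group (including any ';' inside it),
    // then splits what is left on ';'.
    public static string[] ExtractItems(string input)
    {
        // Assumption: junk is always enclosed in '[' and ']'.
        string cleaned = Regex.Replace(input, @"\[[^\]]*\]", "");
        return cleaned.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Calling BracketJunkRemover.ExtractItems("A150[ff;1];A160;A100;D10;B10") should yield the five items A150, A160, A100, D10 and B10; picking out only some of them would need an extra filtering rule that the question does not spell out.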

Related

C# Char Array remove at specific index

Not too sure of the best way to remove a char from the char array if the char at a given index is a number.
private string TextBox_CharacterCheck(string tocheckTextBox)
{
    char[] charlist = tocheckTextBox.ToCharArray();
    foreach (char character in charlist)
    {
        if (char.IsNumber(character))
        {
        }
    }
    return (new string(charlist));
}
Thanks in advance.
// this is now resolved. thank you to all who contributed
You could use the power of Linq:
return new string(tocheckTextBox.Where(c => !char.IsNumber(c)).ToArray());
This is fairly easy using Regex:
var result = Regex.Replace("a1b2c3d4", @"\d", "");
(as @Adassko notes, you can use "[0-9]" instead of @"\d" if you just want the digits 0 to 9, and not any other numeric characters).
You can also do it fairly efficiently using a StringBuilder:
var sb = new StringBuilder();
foreach (var ch in "a1b2c3d4")
{
    if (!char.IsNumber(ch))
    {
        sb.Append(ch);
    }
}
var result = sb.ToString();
You can also do it with linq:
var result = new string("a1b2c3d4".Where(x => !char.IsNumber(x)).ToArray());
Use Regex:
private string TextBox_CharacterCheck(string tocheckTextBox)
{
    return Regex.Replace(tocheckTextBox, @"[\d]", string.Empty);
}
System.String is immutable. You could use string.Replace or a regular expression to produce a new string without the unwanted characters.
Your best bet is to use regular expressions.
Strings are immutable, meaning that you can't change them in place; you have to build a new string. To do that efficiently, use the StringBuilder class and Append every character that you want to keep.
Also watch out in your code: char.IsNumber checks not only for the characters 0-9, it also returns true for every other numeric character, such as ٢, and you probably don't want that.
Here's the full list of characters returning true:
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789
You should also use [0-9] rather than \d in your regular expressions if you want only parsable digits.
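To make the difference concrete, here is a small sketch that contrasts an ASCII-only filter with one based on char.IsNumber. The class and method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not from the original answers.
public static class DigitStripper
{
    // Removes only the ASCII digits '0' to '9'.
    public static string RemoveAsciiDigits(string text)
    {
        return new string(text.Where(c => c < '0' || c > '9').ToArray());
    }

    // Removes every character that char.IsNumber reports as numeric,
    // which includes digits from other scripts.
    public static string RemoveUnicodeNumbers(string text)
    {
        return new string(text.Where(c => !char.IsNumber(c)).ToArray());
    }
}
```

For an input such as "a1b\u0662c" (where \u0662 is the Arabic-Indic digit two), the first method keeps the Arabic-Indic digit and the second removes it.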
You can also use a trick: .Split the string on the unwanted characters, then .Join it back. This not only lets you remove one or more characters, it also lets you replace them with some other character.
I use this trick to remove invalid characters from a file name:
string.Join("-", possiblyIncorrectFileName.Split(Path.GetInvalidFileNameChars()))
This code replaces every character that cannot be used in a valid file name with -.
You can use LINQ to remove every digit from the string.
// This returns the sequence of chars with the digits discarded.
var removedDigits = tocheckTextBox.Where(x => !char.IsDigit(x));
// This returns the string without the digits.
string output = string.Join("", removedDigits);

Using regex to remove everything that is not in between '<#'something'#>' and replace it with commas

I have a string, for example
<#String1#> + <#String2#> , <#String3#> --<#String4#>
And I want to use regex/string manipulation to get the following result:
<#String1#>,<#String2#>,<#String3#>,<#String4#>
I don't really have any experience doing this, any tips?
There are multiple ways to do something like this, and it depends on exactly what you need. However, if you want to use a single regex operation to do it, and you only want to fix stuff that comes between the bracketed strings, then you could do this:
string input = "<#String1#> + <#String2#> , <#String3#> --<#String4#>";
string pattern = "(?<=>)[^<>]+(?=<)";
string replacement = ",";
string result = Regex.Replace(input, pattern, replacement);
The pattern uses [^<>]+ to match any non-pointy-bracket characters, but it combines it with a look-behind statement ((?<=>)) and a look-ahead statement (?=<) to make sure that it only matches text that occurs between a closing and another opening set of brackets.
If you need to remove text that comes before the first < or after the last >, or if you find the look-around statements confusing, you may want to consider simply matching the text that comes between the brackets and then looping through all the matches to build a new string yourself, rather than using the Regex.Replace method. For instance:
string input = "sdfg<#String1#> + <#String2#> , <#String3#> --<#String4#>ag";
string pattern = @"<[^<>]+>";
List<String> values = new List<string>();
foreach (Match m in Regex.Matches(input, pattern))
    values.Add(m.Value);
string result = String.Join(",", values);
Or, the same thing using LINQ:
string input = "sdfg<#String1#> + <#String2#> , <#String3#> --<#String4#>ag";
string pattern = @"<[^<>]+>";
string result = String.Join(",", Regex.Matches(input, pattern).Cast<Match>().Select(x => x.Value));
If you're just after string manipulation and don't necessarily need a regex, you could simply use the string.Replace method.
yourString = yourString.Replace("#> + <#", "#>,<#");

Regex to strip characters except given ones?

I would like to strip strings but only leave the following:
[a-zA-Z]+[_a-zA-Z0-9-]*
I am trying to output strings that start with a character, then can have alphanumeric, underscores, and dashes. How can I do this with RegEx or another function?
Because everything in the second part of the regex is in the first part, you could do something like this:
String foo = "_-abc.!@#$5o993idl;)"; // your string here.
//First replace removes all the characters you don't want.
foo = Regex.Replace(foo, "[^_a-zA-Z0-9-]", "");
//Second replace removes any characters from the start that aren't allowed there.
foo = Regex.Replace(foo, "^[^a-zA-Z]+", "");
So start out by paring it down to only the allowed characters. Then get rid of any allowed characters that can't be at the beginning.
Of course, if your regex gets more complicated, this solution falls apart fairly quickly.
Assuming that you've got the strings in a collection, I would do it this way:
foreach element in the collection try match the regex
if !success, remove the string from the collection
Or the other way round - if it matches, add it to a new collection.
If the strings are not in a collection, can you add more details about what your input looks like?
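The "add it to a new collection if it matches" variant described above could look roughly like this. It is a sketch only: the class and method names are invented, and the pattern from the question is anchored with ^ and $ on the assumption that each string must match as a whole.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original answers.
public static class IdentifierFilter
{
    // The question's pattern, anchored so the whole string has to match.
    private static readonly Regex Identifier =
        new Regex(@"^[a-zA-Z]+[_a-zA-Z0-9-]*$");

    // Returns a new list holding only the strings that match the pattern.
    public static List<string> KeepValid(IEnumerable<string> candidates)
    {
        return candidates.Where(s => Identifier.IsMatch(s)).ToList();
    }
}
```

Given "f_oobar0", "_bad", "another-valid" and "9x", this keeps "f_oobar0" and "another-valid" and drops the two strings that do not start with a letter.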
If you want to pull out all of the identifiers matching your regular expression, you can do it like this:
var input = " _wontmatch f_oobar0 another_valid ";
var re = new Regex( @"\b[a-zA-Z][_a-zA-Z0-9-]*\b" );
foreach( Match match in re.Matches( input ) )
    Console.WriteLine( match.Value );
Use MatchCollection matchColl = Regex.Matches("input string","your regex");
Then use:
string [] outStrings = new string[matchColl.Count]; //A string array to contain all required strings
for (int i=0; i < matchColl.Count; i++ )
outStrings[i] = matchColl[i].ToString();
You will have all the required strings in outStrings. Hope this helps.
Edited
var s = Regex.Matches(input_string, "[a-z]+(_*-*[a-z0-9]*)*", RegexOptions.IgnoreCase);
string output_string="";
foreach (Match m in s)
{
output_string = output_string + m;
}
MessageBox.Show(output_string);

How to split E-mail address from a text?

In a text box, I keep e-mail addresses.
For example:
Text_box.value="a@hotmail.com,b@hotmail.com,c@hotmail.com"
How can I split out all of the e-mail addresses? Should I use Regex?
In the end, I want to keep only the e-mail addresses that the user entered correctly.
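string[] s = Text_box.Text.Split(',');
// Verbatim string so that \b and \. reach the regex engine; IgnoreCase because the pattern only lists A-Z.
Regex R = new Regex(@"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", RegexOptions.IgnoreCase);
var temp = from t in s where R.IsMatch(t) select t;
List<string> final = new List<string>();
final.AddRange(temp);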
use this
string[] emails = list.Split(new char[]{','});
This prints only the e-mail addresses that match and skips everything that does not.
private void Match()
{
    Regex validationExpression = new Regex(@"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*");
    string text = "whatever@gmail;a@hotmail.com,gmail@sync,b@hotmail.com,c@hotmail.com,what,dhinesh@c";
    MatchCollection matchCollection = validationExpression.Matches(text);
    foreach (Match matchedEmailAddress in matchCollection)
    {
        Console.WriteLine(matchedEmailAddress.Value);
    }
    Console.ReadLine();
}
This will print
a@hotmail.com
b@hotmail.com
c@hotmail.com
Other things will not be matched by the regular expression.
"a@hotmail.com,b@hotmail.com,c@hotmail.com".Split(',');
There are two ways to split a string.
1) Every string object has a Split() method that takes an array of characters or an array of strings. The elements of this array are used as the delimiters.
string[] parts = Text_box.value.Split(new char[] {','});
2) Although string.Split() is enough in this example, we can achieve the same result using regular expressions. The regex split is:
string[] parts = Regex.Split(Text_box.value, @",");
You have to use a correct regexp to find all forms of e-mail addresses (with Latin letters).
Check Wikipedia ( http://en.wikipedia.org/wiki/Email_address ) for the correct syntax of an e-mail address (the easier way) or RFC 5322 and RFC 5321 (much harder to understand).
I'm using this:
(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|""(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*"")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Please tell me what the problem is c# regex.split()

string temp_constraint = row["Constraint_Name"].ToString();
string split_string = "FK_"+tableName+"_";
string[] words = Regex.Split(temp_constraint, split_string);
I am trying to split a string using another string.
temp_constraint = FK_ss_foo_ss_fee
split_string = FK_ss_foo_
but it returns a single dimension array with the same string as in temp_constraint
Please help
Your split operation works fine for me:
string temp_constraint = "FK_ss_foo_ss_fee";
string split_string = "FK_ss_foo_";
string[] words = Regex.Split(temp_constraint, split_string);
foreach (string word in words)
{
Console.WriteLine(">{0}<", word);
}
Output:
><
>ss_fee<
I think the problem is that your variables are not set to what you think they are. You will need to debug to find the error elsewhere in your program.
I would also avoid using Split for this (both Regex and String.Split). You aren't really splitting the input - you are removing a string from the start. Split might not always do what you want. Imagine if you have a foreign key like the following:
FK_ss_foo_ss_fee_FK_ss_foo_ss_bee
You want to get ss_fee_FK_ss_foo_ss_bee but split would give you ss_fee_ and ss_bee. This is a contrived example, but it does demonstrate that what you are doing is not a split.
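Since the goal is really to remove a prefix, a plain prefix check is one way to express it. The following is a minimal sketch with made-up class and method names; it assumes an ordinal (case-sensitive) comparison is what you want for constraint names.

```csharp
using System;

// Hypothetical helper, not from the original answers.
public static class PrefixStripper
{
    // Returns value without the leading prefix, or value unchanged
    // when it does not start with that prefix.
    public static string RemovePrefix(string value, string prefix)
    {
        return value.StartsWith(prefix, StringComparison.Ordinal)
            ? value.Substring(prefix.Length)
            : value;
    }
}
```

With the contrived key above, PrefixStripper.RemovePrefix("FK_ss_foo_ss_fee_FK_ss_foo_ss_bee", "FK_ss_foo_") returns ss_fee_FK_ss_foo_ss_bee, because only the first occurrence at the start is removed.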
You should use String.Split instead
string[] words =
temp_constraint.Split(new []{split_string}, StringSplitOptions.None);
The String.Split overload that takes a character array splits on each individual character, which is often not ideal when the delimiter is a whole word; for that case use the overload that takes a string array.
The following article shows how to split text by an entire word:
http://www.bytechaser.com/en/functions/ufgr7wkpwf/split-text-by-words-and-not-character-arrays.aspx
