Split alphanumeric string to array containing the alphabet and numeric characters separately - c#

I'm looking to find a way to split an alphanumeric string like
"Foo123Bar"
into an array that contains it like so
array[0] = "Foo"
array[1] = "123"
array[2] = "Bar"
I'm not sure what the best way to achieve this is, especially because the strings I'm comparing follow no specific pattern as far as which is first, alphabet or numbers, or how many times they each appear. For example it could look like any of the following:
"Foo123Bar"
"123Bar"
"Foobar123"
"Foo123Bar2"
I'm trying to find out if there is a more efficient way of doing this other than splitting the string character by character and checking to see if it's numeric.

string input = "Foo123Bar";
var array = Regex.Matches(input, @"\D+|\d+")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
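As a rough, self-contained sketch of the regex approach, here it is run against the sample inputs from the question. The helper name SplitAlphaNumeric is hypothetical, not part of the answer. Note that \D matches any non-digit, so spaces and punctuation would end up in the non-digit chunks, and \d is Unicode-aware in .NET by default.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the alternating runs of
// non-digits (\D+) and digits (\d+) in their original order.
static string[] SplitAlphaNumeric(string input) =>
    Regex.Matches(input, @"\D+|\d+")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

foreach (var sample in new[] { "Foo123Bar", "123Bar", "Foobar123", "Foo123Bar2" })
{
    // Print the chunks of each sample separated by " | ".
    Console.WriteLine(string.Join(" | ", SplitAlphaNumeric(sample)));
}
```

An empty input simply yields an empty array, because the pattern finds no matches.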

I don't think you can avoid checking every character of the string.
You could try something like this:
string[] SplitMyString(string s)
{
if( s.Length == 0 )
return new string[0];
List<string> split = new List<string>(1);
split.Add("");
bool isNumeric = s[0] >= '0' && s[0] <= '9';
foreach(char c in s)
{
bool AddString = false;
if( c >= '0' && c <= '9' )
{
AddString = !isNumeric;
isNumeric = true;
}
else
{
AddString = isNumeric;
isNumeric = false;
}
if( AddString )
split.Add(c.ToString());
else
split[split.Count-1] += c;
}
return split.ToArray();
}

Related

How to get string to contain only numbers, dashes, and space?

I am using regex to check if the string contains only numbers, dashes and spaces:
Regex regex = new Regex(@"^[0-9- ]+$");
if(!regex.IsMatch(str))
How can I do this without regex?
You can use linq to iterate over the characters, and char.IsDigit to check for a digit.
bool invalid = myString.Any( x => x != ' ' && x != '-' && !char.IsDigit(x) );
Here's a LINQ solution:
var allowedChars = "1234567890- ";
var str = "3275-235 23-325";
if (str.All(x => allowedChars.Contains(x))){
Console.WriteLine("true");
}

Replace string if starts with string in List

I have a string that looks like this
s = "<Hello it´s me, <Hi how are you <hay"
and a List
List<string> ValidList = { "Hello", "hay" }
I need the result string to look like this:
string result = "<Hello it´s me, ?Hi how are you <hay"
So, if a < is followed by a word that belongs to the list, it is kept; if it is followed by something that doesn't belong to the list, the < is replaced by ?.
I tried using IndexOf to find the position of each < and then checking whether the text after it StartsWith any of the strings in the list, leaving it alone if so.
foreach (var vl in ValidList)
{
int nextLt = 0;
while ((nextLt = strAux.IndexOf('<', nextLt)) != -1)
{
//is element, leave it
if (!(strAux.Substring(nextLt + 1).StartsWith(vl)))
{
//its not, replace
strAux = string.Format(@"{0}?{1}", strAux.Substring(0, nextLt), strAux.Substring(nextLt + 1, strAux.Length - (nextLt + 1)));
}
nextLt++;
}
}
To give the solution I gave as a comment its proper answer:
Regex.Replace(s, string.Format("<(?!{0})", string.Join("|", ValidList)), "?")
This uses a regular expression to replace the unwanted < characters with ?. To recognize those characters, we use a negative lookahead. For the example word list it looks like this: (?!Hello|hay). It matches only if the current position is not followed by Hello or hay. Since the character we are matching is <, the full expression becomes <(?!Hello|hay).
Now we just need to account for the dynamic ValidList by creating the regular expression on the fly. We use string.Format and string.Join there.
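One caveat with building the pattern on the fly: the list entries are inserted verbatim, so an entry containing regex metacharacters (such as . or +) would be interpreted as pattern syntax. A minimal sketch that escapes each entry with Regex.Escape follows; the helper name ReplaceInvalidOpeners and its parameter names are hypothetical. Like the original expression, it only checks that the text after < starts with a valid word, so "<haystack" would also be kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces every '<' that is not followed by one of validWords with '?'.
static string ReplaceInvalidOpeners(string text, IEnumerable<string> validWords)
{
    // Escape each entry so characters like '.' or '+' are matched literally.
    var escaped = validWords.Select(Regex.Escape).ToList();

    // With no valid words at all, every '<' is invalid
    // (an empty lookahead "(?!)" would never match anything).
    if (escaped.Count == 0)
        return text.Replace('<', '?');

    // '<' followed by none of the alternatives is replaced by '?'.
    string pattern = "<(?!" + string.Join("|", escaped) + ")";
    return Regex.Replace(text, pattern, "?");
}

var words = new List<string> { "Hello", "hay" };
Console.WriteLine(ReplaceInvalidOpeners("<Hello it´s me, <Hi how are you <hay", words));
```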
Something like this without using RegEx or LINQ
string s = "<Hello it´s me, <Hi how are you <hay";
List<string> ValidList = new List<string>() { "Hello", "hay" };
var arr = s.Split(new[] { '<' }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < arr.Length; i++)
{
// Assume the segment is invalid until it starts with an entry from ValidList.
bool flag = true;
foreach (var item in ValidList)
{
if (arr[i].StartsWith(item))
{
flag = false;
break;
}
}
if (flag)
arr[i] = "?" + arr[i];
else
arr[i] = "<" + arr[i];
}
Console.WriteLine(string.Concat(arr));
A possible solution using LINQ. It splits the string on < and checks whether the "word" that follows (the text up to the first blank space) is in the valid list, prepending < or ? accordingly. Finally, it joins everything back together:
List<string> ValidList = new List<string>{ "Hello", "hay" };
string str = "<Hello it´s me, <Hi how are you <hay";
var res = String.Join("",str.Split(new char[] { '<' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => ValidList.Contains(x.Split(' ').First()) ? "<" + x : "?"+x));

Find a fixed length string with specific string part in C#

I want to find a string of fixed length with specific substring. But I need to do it like we can do in SQL queries.
Example:
I have strings like -
AB012345
AB12345
AB123456
AB1234567
AB98765
AB987654
I want to select strings that have AB at first and 6 characters afterwards. Which can be done in SQL by SELECT * FROM [table_name] WHERE [column_name] LIKE 'AB______' (6 underscores after AB).
So the result will be:
AB012345
AB123456
AB987654
I need to know if there is any way to select strings in such way with C#, by using AB______.
You can use Regular Expressions to filter the result:
List<string> sList = new List<string>(){"AB012345",
"AB12345",
"AB123456",
"AB1234567",
"AB98765",
"AB987654"};
var qry = sList.Where(s=>Regex.Match(s, @"^AB\d{6}$").Success);
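The pattern above restricts the six trailing characters to digits, whereas the _ wildcard in SQL's LIKE matches any single character. If the goal is to reuse a pattern such as AB______ directly, one option is to translate it into a regular expression. The sketch below assumes a hypothetical helper named LikeToRegex that supports only _ and %, with no escape character or bracket classes; case sensitivity in SQL depends on the collation, and this sketch is simply case-sensitive.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: converts a simple SQL LIKE pattern into an anchored Regex.
static Regex LikeToRegex(string likePattern)
{
    var sb = new StringBuilder("^");
    foreach (char c in likePattern)
    {
        if (c == '_') sb.Append('.');          // exactly one character
        else if (c == '%') sb.Append(".*");    // any run of characters, possibly empty
        else sb.Append(Regex.Escape(c.ToString())); // everything else is literal
    }
    sb.Append(@"\z");                          // anchor at the very end of the string
    // Singleline lets '.' match newlines too, like LIKE wildcards do.
    return new Regex(sb.ToString(), RegexOptions.Singleline);
}

var candidates = new[] { "AB012345", "AB12345", "AB123456", "AB1234567", "AB98765", "AB987654" };
var like = LikeToRegex("AB______");
foreach (var hit in candidates.Where(s => like.IsMatch(s)))
{
    Console.WriteLine(hit);
}
```

For the sample list this selects AB012345, AB123456 and AB987654, the same rows the SQL query in the question returns.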
Considering you have a string array:
string[] str = new string[3]{"AB012345", "A12345", "AB98765"};
var result = str.Where(x => x.StartsWith("AB") && x.Length == 8).ToList();
The logic: a string is selected if it starts with AB and its length is 8.
This should do it:
List<string> sList = new List<string>(){
"AB012345",
"AB12345",
"AB123456",
"AB1234567",
"AB98765",
"AB987654"};
List<string> sREsult = sList.Where(x => x.Length == 8 && x.StartsWith("AB")).ToList();
First, x.Length == 8 checks the length; then x.StartsWith("AB") checks the required characters at the start of the string.
This can be achieved using string.StartsWith and string.Length like this:
public bool CheckStringValid (String input)
{
if (input.StartsWith ("AB") && input.Length == 8)
{
return true;
}
else
{
return false;
}
}
This will return true if string matches your criteria.
Hope this helps.
var strlist = new List<string>()
{
"AB012345",
"AB12345",
"AB123456",
"AB1234567",
"AB98765",
"AB987654"
};
var result = strlist.Where(
s => (s.StartsWith("AB") &&(s.Length == 8))
);
foreach(var v in result)
{
Console.WriteLine(v.ToString());
}

How to remove characters from a string using LINQ

I have a string like
XQ74MNT8244A
I need to remove all the letters from the string,
so the output will be
748244
How can I do this?
new string("XQ74MNT8244A".Where(char.IsDigit).ToArray()) == "748244"
Two options. Using Linq on .Net 4 (on 3.5 it is similar - it doesn't have that many overloads of all methods):
string s1 = String.Concat(str.Where(Char.IsDigit));
Or, using a regular expression:
string s2 = Regex.Replace(str, @"\D+", "");
I should add that IsDigit and \D are Unicode-aware, so they accept quite a few digits besides 0-9, for example "542abc٣٤".
You can easily adapt them to a check between 0 and 9, or to [^0-9]+.
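To make the difference concrete, here is a small sketch comparing the Unicode-aware filter with two ASCII-only variants. The variable names are illustrative only, and the sample string is written with \u escapes for the two Arabic-Indic digits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// "542abc" followed by the Arabic-Indic digits three and four.
string str = "542abc\u0663\u0664";

// Unicode-aware: keeps every character in the decimal-digit category.
string unicodeDigits = string.Concat(str.Where(char.IsDigit));

// ASCII-only: an explicit range check, or a character class instead of \D.
string asciiLinq = string.Concat(str.Where(c => c >= '0' && c <= '9'));
string asciiRegex = Regex.Replace(str, "[^0-9]+", "");

Console.WriteLine(unicodeDigits.Length); // 5 (the three ASCII digits plus the two Arabic-Indic ones)
Console.WriteLine(asciiLinq);            // 542
Console.WriteLine(asciiRegex);           // 542
```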
string value = "HTQ7899HBVxzzxx";
Console.WriteLine(new string(
value.Where(x => (x >= '0' && x <= '9'))
.ToArray()));
If you need only digits and you really want Linq try this:
new string(yourString.ToCharArray().Where(x => char.IsDigit(x)).ToArray());
Using LINQ:
public string FilterString(string input)
{
return new string(input.Where(char.IsNumber).ToArray());
}
Something like this?
"XQ74MNT8244A".ToCharArray().Where(x => { var i = 0; return Int32.TryParse(x.ToString(), out i); })
string s = "XQ74MNT8244A";
var x = new string(s.Where(c => (c >= '0' && c <= '9')).ToArray());
How about an extension method (and overload) that does this for you:
public static string NumbersOnly(this string Instring)
{
return Instring.NumbersOnly("");
}
public static string NumbersOnly(this string Instring, string AlsoAllowed)
{
char[] aChar = Instring.ToCharArray();
int intCount = 0;
string strTemp = "";
for (intCount = 0; intCount <= Instring.Length - 1; intCount++)
{
if (char.IsNumber(aChar[intCount]) || AlsoAllowed.IndexOf(aChar[intCount]) > -1)
{
strTemp = strTemp + aChar[intCount];
}
}
return strTemp;
}
The overload is so you can retain "-", "$" or "." as well, if you wish (instead of strictly numbers).
Usage:
string numsOnly = "XQ74MNT8244A".NumbersOnly();

How do I convert spaces, except for those within quotes, to commas in C#?

Suppose I have a string like this:
one two three "four five six" seven eight
and I want to convert it to this:
one,two,three,"four five six",seven,eight
What's the easiest way to do this in C#?
Assuming that quotes cannot be escaped, you can do the following.
public string SpaceToComma(string input) {
var builder = new System.Text.StringBuilder();
var inQuotes = false;
foreach ( var cur in input ) {
switch ( cur ) {
case ' ':
builder.Append(inQuotes ? cur : ',');
break;
case '"':
inQuotes = !inQuotes;
builder.Append(cur);
break;
default:
builder.Append(cur);
break;
}
}
return builder.ToString();
}
static string Space2Comma(string s)
{
return string.Concat(s.Split('"').Select
((x, i) => i % 2 == 0 ? x.Replace(' ', ',') : '"' + x + '"').ToArray());
}
My first guess is to use a parser that's already written and simply change the delimiter and quote character to fit your needs (which are a space and " respectively).
It looks like this is available to you in C#:
http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
Perhaps if you changed the delimiter to " ", it may suit your needs to read in the file, and then it's just a matter of calling String.Join() for each line.
I would use the Regex class for this purpose.
Regular expressions can be used to match your input, break it down into individual groups, which you can then reassemble however you want. You can find documentation on the regex classes here.
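// Match either a complete quoted section or a run of non-whitespace characters.
Regex rx = new Regex(@"""[^""]*""|\S+");
MatchCollection matches = rx.Matches("first second \"third fourth fifth\" sixth");
string result = string.Join(",", matches.Cast<Match>().Select(x => x.Value).ToArray());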
Here's a more reusable function that I came up with:
private string ReplaceWithExceptions(string source, char charToReplace,
char replacementChar, char exceptionChar)
{
bool ignoreReplacementChar = false;
char[] sourceArray = source.ToCharArray();
for (int i = 0; i < sourceArray.Length; i++)
{
if (sourceArray[i] == exceptionChar)
{
ignoreReplacementChar = !ignoreReplacementChar;
}
else
{
if (!ignoreReplacementChar)
{
if (sourceArray[i] == charToReplace)
{
sourceArray[i] = replacementChar;
}
}
}
}
return new string(sourceArray);
}
Usage:
string test = "one two three \"four five six\" seven eight";
System.Diagnostics.Debug.WriteLine(ReplaceWithExceptions(test, char.Parse(" "),
char.Parse(","), char.Parse("\"")));
This may be overkill, but if you believe the problem may generalize, such as needing to split by other types of characters or having additional rules that define a token, you should consider either using a parser generator such as Coco/R or writing a simple parser on your own. Coco/R, for instance, will generate a lexer and parser from an EBNF grammar you provide. The lexer will be a DFA, or a state machine, which is a generalized form of the code provided by JaredPar. Your grammar definition for Coco/R would look like this:
CHARACTERS
alphanum = 'A'..'Z' + 'a'..'z' + '0'..'9'.
TOKENS
unit = '"' {alphanum|' '} '"' | {alphanum}.
Then the produced lexer will scan and tokenize your input accordingly.
Per my comment to the original question, if you don't need the quotes in the final result, this will get the job done. If you do need the quotes, feel free to ignore this.
private String SpaceToComma(string input)
{
String[] temp = input.Split(new Char[] { '"' }, StringSplitOptions.RemoveEmptyEntries);
for (Int32 i = 0; i < temp.Length; i += 2)
{
temp[i] = temp[i].Trim().Replace(' ', ',');
}
return String.Join(",", temp);
}
@Mehrdad beat me to it, but I guess I'll post it anyway:
static string Convert(string input)
{
var slices = input
.Split('"')
.Select((s, i) => i % 2 != 0
? @"""" + s + @""""
: s.Trim().Replace(' ', ','));
return string.Join(",", slices.ToArray());
}
LINQified and tested :-) ... For a full console app: http://pastebin.com/f23bac59b
