how to check if string has more than two repeating characters - c#

I'm trying to check if a string contains more than two repeating characters.
for example
'aabcd123' = ok
'aaabcd123' = not ok
'aabbab11!#' = ok
'aabbbac123!' = not ok
I've tried something like this but with no luck
if (string.Distinct().Count() > 2){
//do something
}
any help would be appreciated.

This one worked for me:
public bool IsOK(string s)
{
if(s.Length < 3) return true;
return !s.Where((c,i)=> i >= 2 && s[i-1] == c && s[i-2] == c).Any();
}
'aabcd123' : OK
'aaabcd123' : not OK
'aabbab11!#' : OK
'aabbbac123!' : not OK

Just for classic loops sake:
public bool HasRepeatingChars(string str)
{
for(int i = 0; i < str.Length - 2; i++)
if(str[i] == str[i+1] && str[i] == str[i+2])
return true;
return false;
}

You can use Regex for this one:
return Regex.IsMatch(inputString, #"(.)\1{2,}", RegexOptions.IgnoreCase)
This checks for any character, then for at least 2 times that same character. Works even with strings like "AaA" and can be modified to find out exactly what those characters are, where they are occurring in the string, and much more (also allows you to replace those substrings with something else)
More information:
http://msdn.microsoft.com/en-us/library/6f7hht7k.aspx
http://msdn.microsoft.com/en-us/library/az24scfc.aspx

Since you want to find runs of three characters and not just three characters in a string, you need to loop through and look at three consecutive characters to see if they are the same. A loop like this will work.
string myString = "aaabbcd";
bool hasMoreThanTwoRepeatingCharsInARow = false;
for(int index = 2; index < myString.Length; index++)
{
if(myString[index] == myString[index - 1] && myString[index] == myString[index - 2])
{
hasMoreThanTwoRepeatingCharsInARow = true;
}
}
I would stick this in a method and make the variables better and you're good to go!

[TestMethod]
public void Test()
{
const string sample1 = "aabcd123";
const string sample2 = "aaabcd123";
const string sample3 = "aabbab11!#";
const string sample4 = "aabbbac123!";
Assert.IsTrue(IsOk(sample1));
Assert.IsFalse(IsOk(sample2));
Assert.IsTrue(IsOk(sample3));
Assert.IsFalse(IsOk(sample4));
}
private bool IsOk(string str)
{
char? last = null;
var i = 1;
foreach (var c in str)
{
if (last == c)
{
i++;
if (i > 2) return false;
}
else
{
i = 1;
}
last = c;
}
return true;
}

Use the regex
(?:.*<letter>.*){2}
For letter b for example you'd use
(?:.*b.*){2}
For all vowels
(?:.*[aeiou].*){2}
For regex usage on c# refer to https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex?view=net-5.0

Related

How can i Split words inside this .txt in c# [duplicate]

How to split text into words?
Example text:
'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'
The words in that line are:
Oh
you
can't
help
that
said
the
Cat
we're
all
mad
here
I'm
mad
You're
mad
Split text on whitespace, then trim punctuation.
var text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
var punctuation = text.Where(Char.IsPunctuation).Distinct().ToArray();
var words = text.Split().Select(x => x.Trim(punctuation));
Agrees exactly with example.
First, Remove all special characeters:
var fixedInput = Regex.Replace(input, "[^a-zA-Z0-9% ._]", string.Empty);
// This regex doesn't support apostrophe so the extension method is better
Then split it:
var split = fixedInput.Split(' ');
For a simpler C# solution for removing special characters (that you can easily change), add this extension method (I added a support for an apostrophe):
public static string RemoveSpecialCharacters(this string str) {
var sb = new StringBuilder();
foreach (char c in str) {
if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '\'' || c == ' ') {
sb.Append(c);
}
}
return sb.ToString();
}
Then use it like so:
var words = input.RemoveSpecialCharacters().Split(' ');
You'll be surprised to know that this extension method is very efficient (surely much more efficient then the Regex) so I'll suggest you use it ;)
Update
I agree that this is an English only approach but to make it Unicode compatible all you have to do is replace:
(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
With:
char.IsLetter(c)
Which supports Unicode, .Net Also offers you char.IsSymbol and char.IsLetterOrDigit for the variety of cases
Just to add a variation on #Adam Fridental's answer which is very good, you could try this Regex:
var text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
var matches = Regex.Matches(text, #"\w+[^\s]*\w+|\w");
foreach (Match match in matches) {
var word = match.Value;
}
I believe this is the shortest RegEx that will get all the words
\w+[^\s]*\w+|\w
If you don't want to use a Regex object, you could do something like...
string mystring="Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.";
List<string> words=mystring.Replace(",","").Replace(":","").Replace(".","").Split(" ").ToList();
You'll still have to handle the trailing apostrophe at the end of "that,'"
This is one of solution, i dont use any helper class or method.
public static List<string> ExtractChars(string inputString) {
var result = new List<string>();
int startIndex = -1;
for (int i = 0; i < inputString.Length; i++) {
var character = inputString[i];
if ((character >= 'a' && character <= 'z') ||
(character >= 'A' && character <= 'Z')) {
if (startIndex == -1) {
startIndex = i;
}
if (i == inputString.Length - 1) {
result.Add(GetString(inputString, startIndex, i));
}
continue;
}
if (startIndex != -1) {
result.Add(GetString(inputString, startIndex, i - 1));
startIndex = -1;
}
}
return result;
}
public static string GetString(string inputString, int startIndex, int endIndex) {
string result = "";
for (int i = startIndex; i <= endIndex; i++) {
result += inputString[i];
}
return result;
}
If you want to use the "for cycle" to check each char and save all punctuation in the input string I've create this class. The method GetSplitSentence() return a list of SentenceSplitResult. In this list there are saved all the words and all the punctuation & numbers. Each punctuation or numbers saved is an item in the list. The sentenceSplitResult.isAWord is used to check if is a word or not. [Sorry for my English]
public class SentenceSplitResult
{
public string word;
public bool isAWord;
}
public class StringsHelper
{
private readonly List<SentenceSplitResult> outputList = new List<SentenceSplitResult>();
private readonly string input;
public StringsHelper(string input)
{
this.input = input;
}
public List<SentenceSplitResult> GetSplitSentence()
{
StringBuilder sb = new StringBuilder();
try
{
if (String.IsNullOrEmpty(input)) {
Logger.Log(new ArgumentNullException(), "GetSplitSentence - input is null or empy");
return outputList;
}
bool isAletter = IsAValidLetter(input[0]);
// Each char i checked if is a part of a word.
// If is YES > I can store the char for later
// IF is NO > I Save the word (if exist) and then save the punctuation
foreach (var _char in input)
{
isAletter = IsAValidLetter(_char);
if (isAletter == true)
{
sb.Append(_char);
}
else
{
SaveWord(sb.ToString());
sb.Clear();
SaveANotWord(_char);
}
}
SaveWord(sb.ToString());
}
catch (Exception ex)
{
Logger.Log(ex);
}
return outputList;
}
private static bool IsAValidLetter(char _char)
{
if ((Char.IsPunctuation(_char) == true) || (_char == ' ') || (Char.IsNumber(_char) == true))
{
return false;
}
return true;
}
private void SaveWord(string word)
{
if (String.IsNullOrEmpty(word) == false)
{
outputList.Add(new SentenceSplitResult()
{
isAWord = true,
word = word
});
}
}
private void SaveANotWord(char _char)
{
outputList.Add(new SentenceSplitResult()
{
isAWord = false,
word = _char.ToString()
});
}
You could try using a regex to remove the apostrophes that aren't surrounded by letters (i.e. single quotes) and then using the Char static methods to strip all the other characters. By calling the regex first you can keep the contraction apostrophes (e.g. can't) but remove the single quotes like in 'Oh.
string myText = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
Regex reg = new Regex("\b[\"']\b");
myText = reg.Replace(myText, "");
string[] listOfWords = RemoveCharacters(myText);
public string[] RemoveCharacters(string input)
{
StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
if (Char.IsLetter(c) || Char.IsWhiteSpace(c) || c == '\'')
sb.Append(c);
}
return sb.ToString().Split(' ');
}

How to create a sequence of strings between "start" and "end" strings [duplicate]

I have a question about iterate through the Alphabet.
I would like to have a loop that begins with "a" and ends with "z". After that, the loop begins "aa" and count to "az". after that begins with "ba" up to "bz" and so on...
Anybody know some solution?
Thanks
EDIT: I forgot that I give a char "a" to the function then the function must return b. if u give "bnc" then the function must return "bnd"
First effort, with just a-z then aa-zz
public static IEnumerable<string> GetExcelColumns()
{
for (char c = 'a'; c <= 'z'; c++)
{
yield return c.ToString();
}
char[] chars = new char[2];
for (char high = 'a'; high <= 'z'; high++)
{
chars[0] = high;
for (char low = 'a'; low <= 'z'; low++)
{
chars[1] = low;
yield return new string(chars);
}
}
}
Note that this will stop at 'zz'. Of course, there's some ugly duplication here in terms of the loops. Fortunately, that's easy to fix - and it can be even more flexible, too:
Second attempt: more flexible alphabet
private const string Alphabet = "abcdefghijklmnopqrstuvwxyz";
public static IEnumerable<string> GetExcelColumns()
{
return GetExcelColumns(Alphabet);
}
public static IEnumerable<string> GetExcelColumns(string alphabet)
{
foreach(char c in alphabet)
{
yield return c.ToString();
}
char[] chars = new char[2];
foreach(char high in alphabet)
{
chars[0] = high;
foreach(char low in alphabet)
{
chars[1] = low;
yield return new string(chars);
}
}
}
Now if you want to generate just a, b, c, d, aa, ab, ac, ad, ba, ... you'd call GetExcelColumns("abcd").
Third attempt (revised further) - infinite sequence
public static IEnumerable<string> GetExcelColumns(string alphabet)
{
int length = 0;
char[] chars = null;
int[] indexes = null;
while (true)
{
int position = length-1;
// Try to increment the least significant
// value.
while (position >= 0)
{
indexes[position]++;
if (indexes[position] == alphabet.Length)
{
for (int i=position; i < length; i++)
{
indexes[i] = 0;
chars[i] = alphabet[0];
}
position--;
}
else
{
chars[position] = alphabet[indexes[position]];
break;
}
}
// If we got all the way to the start of the array,
// we need an extra value
if (position == -1)
{
length++;
chars = new char[length];
indexes = new int[length];
for (int i=0; i < length; i++)
{
chars[i] = alphabet[0];
}
}
yield return new string(chars);
}
}
It's possible that it would be cleaner code using recursion, but it wouldn't be as efficient.
Note that if you want to stop at a certain point, you can just use LINQ:
var query = GetExcelColumns().TakeWhile(x => x != "zzz");
"Restarting" the iterator
To restart the iterator from a given point, you could indeed use SkipWhile as suggested by thesoftwarejedi. That's fairly inefficient, of course. If you're able to keep any state between call, you can just keep the iterator (for either solution):
using (IEnumerator<string> iterator = GetExcelColumns())
{
iterator.MoveNext();
string firstAttempt = iterator.Current;
if (someCondition)
{
iterator.MoveNext();
string secondAttempt = iterator.Current;
// etc
}
}
Alternatively, you may well be able to structure your code to use a foreach anyway, just breaking out on the first value you can actually use.
Edit: Made it do exactly as the OP's latest edit wants
This is the simplest solution, and tested:
static void Main(string[] args)
{
Console.WriteLine(GetNextBase26("a"));
Console.WriteLine(GetNextBase26("bnc"));
}
private static string GetNextBase26(string a)
{
return Base26Sequence().SkipWhile(x => x != a).Skip(1).First();
}
private static IEnumerable<string> Base26Sequence()
{
long i = 0L;
while (true)
yield return Base26Encode(i++);
}
private static char[] base26Chars = "abcdefghijklmnopqrstuvwxyz".ToCharArray();
private static string Base26Encode(Int64 value)
{
string returnValue = null;
do
{
returnValue = base26Chars[value % 26] + returnValue;
value /= 26;
} while (value-- != 0);
return returnValue;
}
The following populates a list with the required strings:
List<string> result = new List<string>();
for (char ch = 'a'; ch <= 'z'; ch++){
result.Add (ch.ToString());
}
for (char i = 'a'; i <= 'z'; i++)
{
for (char j = 'a'; j <= 'z'; j++)
{
result.Add (i.ToString() + j.ToString());
}
}
I know there are plenty of answers here, and one's been accepted, but IMO they all make it harder than it needs to be. I think the following is simpler and cleaner:
static string NextColumn(string column){
char[] c = column.ToCharArray();
for(int i = c.Length - 1; i >= 0; i--){
if(char.ToUpper(c[i]++) < 'Z')
break;
c[i] -= (char)26;
if(i == 0)
return "A" + new string(c);
}
return new string(c);
}
Note that this doesn't do any input validation. If you don't trust your callers, you should add an IsNullOrEmpty check at the beginning, and a c[i] >= 'A' && c[i] <= 'Z' || c[i] >= 'a' && c[i] <= 'z' check at the top of the loop. Or just leave it be and let it be GIGO.
You may also find use for these companion functions:
static string GetColumnName(int index){
StringBuilder txt = new StringBuilder();
txt.Append((char)('A' + index % 26));
//txt.Append((char)('A' + --index % 26));
while((index /= 26) > 0)
txt.Insert(0, (char)('A' + --index % 26));
return txt.ToString();
}
static int GetColumnIndex(string name){
int rtn = 0;
foreach(char c in name)
rtn = rtn * 26 + (char.ToUpper(c) - '#');
return rtn - 1;
//return rtn;
}
These two functions are zero-based. That is, "A" = 0, "Z" = 25, "AA" = 26, etc. To make them one-based (like Excel's COM interface), remove the line above the commented line in each function, and uncomment those lines.
As with the NextColumn function, these functions don't validate their inputs. Both with give you garbage if that's what they get.
Here’s what I came up with.
/// <summary>
/// Return an incremented alphabtical string
/// </summary>
/// <param name="letter">The string to be incremented</param>
/// <returns>the incremented string</returns>
public static string NextLetter(string letter)
{
const string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (!string.IsNullOrEmpty(letter))
{
char lastLetterInString = letter[letter.Length - 1];
// if the last letter in the string is the last letter of the alphabet
if (alphabet.IndexOf(lastLetterInString) == alphabet.Length - 1)
{
//replace the last letter in the string with the first leter of the alphbat and get the next letter for the rest of the string
return NextLetter(letter.Substring(0, letter.Length - 1)) + alphabet[0];
}
else
{
// replace the last letter in the string with the proceeding letter of the alphabet
return letter.Remove(letter.Length-1).Insert(letter.Length-1, (alphabet[alphabet.IndexOf(letter[letter.Length-1])+1]).ToString() );
}
}
//return the first letter of the alphabet
return alphabet[0].ToString();
}
just curious , why not just
private string alphRecursive(int c) {
var alphabet = "abcdefghijklmnopqrstuvwxyz".ToCharArray();
if (c >= alphabet.Length) {
return alphRecursive(c/alphabet.Length) + alphabet[c%alphabet.Length];
} else {
return "" + alphabet[c%alphabet.Length];
}
}
This is like displaying an int, only using base 26 in stead of base 10. Try the following algorithm to find the nth entry of the array
q = n div 26;
r = n mod 26;
s = '';
while (q > 0 || r > 0) {
s = alphabet[r] + s;
q = q div 26;
r = q mod 26;
}
Of course, if you want the first n entries, this is not the most efficient solution. In this case, try something like daniel's solution.
I gave this a go and came up with this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Alphabetty
{
class Program
{
const string alphabet = "abcdefghijklmnopqrstuvwxyz";
static int cursor = 0;
static int prefixCursor;
static string prefix = string.Empty;
static bool done = false;
static void Main(string[] args)
{
string s = string.Empty;
while (s != "Done")
{
s = GetNextString();
Console.WriteLine(s);
}
Console.ReadKey();
}
static string GetNextString()
{
if (done) return "Done";
char? nextLetter = GetNextLetter(ref cursor);
if (nextLetter == null)
{
char? nextPrefixLetter = GetNextLetter(ref prefixCursor);
if(nextPrefixLetter == null)
{
done = true;
return "Done";
}
prefix = nextPrefixLetter.Value.ToString();
nextLetter = GetNextLetter(ref cursor);
}
return prefix + nextLetter;
}
static char? GetNextLetter(ref int letterCursor)
{
if (letterCursor == alphabet.Length)
{
letterCursor = 0;
return null;
}
char c = alphabet[letterCursor];
letterCursor++;
return c;
}
}
}
Here is something I had cooked up that may be similar. I was experimenting with iteration counts in order to design a numbering schema that was as small as possible, yet gave me enough uniqueness.
I knew that each time a added an Alpha character, it would increase the possibilities 26x but I wasn't sure how many letters, numbers, or the pattern I wanted to use.
That lead me to the code below. Basically you pass it an AlphaNumber string, and every position that has a Letter, would eventually increment to "z\Z" and every position that had a Number, would eventually increment to "9".
So you can call it 1 of two ways..
//This would give you the next Itteration... (H3reIsaStup4dExamplf)
string myNextValue = IncrementAlphaNumericValue("H3reIsaStup4dExample")
//Or Loop it resulting eventually as "Z9zzZzzZzzz9zZzzzzzz"
string myNextValue = "H3reIsaStup4dExample"
while (myNextValue != null)
{
myNextValue = IncrementAlphaNumericValue(myNextValue)
//And of course do something with this like write it out
}
(For me, I was doing something like "1AA000")
public string IncrementAlphaNumericValue(string Value)
{
//We only allow Characters a-b, A-Z, 0-9
if (System.Text.RegularExpressions.Regex.IsMatch(Value, "^[a-zA-Z0-9]+$") == false)
{
throw new Exception("Invalid Character: Must be a-Z or 0-9");
}
//We work with each Character so it's best to convert the string to a char array for incrementing
char[] myCharacterArray = Value.ToCharArray();
//So what we do here is step backwards through the Characters and increment the first one we can.
for (Int32 myCharIndex = myCharacterArray.Length - 1; myCharIndex >= 0; myCharIndex--)
{
//Converts the Character to it's ASCII value
Int32 myCharValue = Convert.ToInt32(myCharacterArray[myCharIndex]);
//We only Increment this Character Position, if it is not already at it's Max value (Z = 90, z = 122, 57 = 9)
if (myCharValue != 57 && myCharValue != 90 && myCharValue != 122)
{
myCharacterArray[myCharIndex]++;
//Now that we have Incremented the Character, we "reset" all the values to the right of it
for (Int32 myResetIndex = myCharIndex + 1; myResetIndex < myCharacterArray.Length; myResetIndex++)
{
myCharValue = Convert.ToInt32(myCharacterArray[myResetIndex]);
if (myCharValue >= 65 && myCharValue <= 90)
{
myCharacterArray[myResetIndex] = 'A';
}
else if (myCharValue >= 97 && myCharValue <= 122)
{
myCharacterArray[myResetIndex] = 'a';
}
else if (myCharValue >= 48 && myCharValue <= 57)
{
myCharacterArray[myResetIndex] = '0';
}
}
//Now we just return an new Value
return new string(myCharacterArray);
}
}
//If we got through the Character Loop and were not able to increment anything, we retun a NULL.
return null;
}
Here's my attempt using recursion:
public static void PrintAlphabet(string alphabet, string prefix)
{
for (int i = 0; i < alphabet.Length; i++) {
Console.WriteLine(prefix + alphabet[i].ToString());
}
if (prefix.Length < alphabet.Length - 1) {
for (int i = 0; i < alphabet.Length; i++) {
PrintAlphabet(alphabet, prefix + alphabet[i]);
}
}
}
Then simply call PrintAlphabet("abcd", "");

How to keep only numbers and some special characters in a string?

I have a string containing regular characters, special characters and numbers. I'm trying to remove the regular characters, just keeping the numbers and the special characters. I use a loop to check if a character is a special character or a number. Then, I replace it with an empty string. However, this doesn't seem to work because I get an error "can't apply != to string or char". My code is below. If possible, please give me some ideas to fix this. Thanks.
public string convert_string_to_no(string val)
{
string str_val = "";
int val_len = val.Length;
for (int i = 0; i < val_len; i++)
{
char myChar = Convert.ToChar(val.Substring(i, 1));
if ((char.IsDigit(myChar) == false) && (myChar != "-"))
{
str_val = str_val.replace(str_val.substring(i,1),"");
}
}
return str_val;
}
you can use regular expressions to do that.its faster than using loop and clean
String test ="Q1W2-hjkxas1-EE3R4-5T";
Regex rgx = new Regex("[^0-9-]");
Console.WriteLine(rgx.Replace(test, ""));
check the working code here
It seem try to change "-" to '-', and better to construct the string and not replacing the char.
public string convert_string_to_no(string val)
{
string str_val = "";
int val_len = val.Length;
for (int i = 0; i < val_len; i++)
{
char myChar = Convert.ToChar(val.Substring(i, 1));
if (char.IsDigit(myChar) && myChar == '-')
{
str_val += myChar;
}
}
return str_val;
}
I perfer Linq:
public static class StringExtensions
{
public static string ToLimitedString(this string instance,
string validCharacters)
{
// null reference checking...
var result = new string(instance
.Where(c => validCharacters.Contains(c))
.ToArray());
return result;
}
}
usage:
var test ="Q1W2-hjkxas1-EE3R4-5T";
var limited = test.ToLimitedString("01234567890-");
Console.WriteLine(limited);
result:
12-1-34-5
DotNetFiddle Example

How to split text into words?

How to split text into words?
Example text:
'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'
The words in that line are:
Oh
you
can't
help
that
said
the
Cat
we're
all
mad
here
I'm
mad
You're
mad
Split text on whitespace, then trim punctuation.
var text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
var punctuation = text.Where(Char.IsPunctuation).Distinct().ToArray();
var words = text.Split().Select(x => x.Trim(punctuation));
Agrees exactly with example.
First, Remove all special characeters:
var fixedInput = Regex.Replace(input, "[^a-zA-Z0-9% ._]", string.Empty);
// This regex doesn't support apostrophe so the extension method is better
Then split it:
var split = fixedInput.Split(' ');
For a simpler C# solution for removing special characters (that you can easily change), add this extension method (I added a support for an apostrophe):
public static string RemoveSpecialCharacters(this string str) {
var sb = new StringBuilder();
foreach (char c in str) {
if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '\'' || c == ' ') {
sb.Append(c);
}
}
return sb.ToString();
}
Then use it like so:
var words = input.RemoveSpecialCharacters().Split(' ');
You'll be surprised to know that this extension method is very efficient (surely much more efficient then the Regex) so I'll suggest you use it ;)
Update
I agree that this is an English only approach but to make it Unicode compatible all you have to do is replace:
(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
With:
char.IsLetter(c)
Which supports Unicode, .Net Also offers you char.IsSymbol and char.IsLetterOrDigit for the variety of cases
Just to add a variation on #Adam Fridental's answer which is very good, you could try this Regex:
var text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
var matches = Regex.Matches(text, #"\w+[^\s]*\w+|\w");
foreach (Match match in matches) {
var word = match.Value;
}
I believe this is the shortest RegEx that will get all the words
\w+[^\s]*\w+|\w
If you don't want to use a Regex object, you could do something like...
string mystring="Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.";
List<string> words=mystring.Replace(",","").Replace(":","").Replace(".","").Split(" ").ToList();
You'll still have to handle the trailing apostrophe at the end of "that,'"
This is one of solution, i dont use any helper class or method.
public static List<string> ExtractChars(string inputString) {
var result = new List<string>();
int startIndex = -1;
for (int i = 0; i < inputString.Length; i++) {
var character = inputString[i];
if ((character >= 'a' && character <= 'z') ||
(character >= 'A' && character <= 'Z')) {
if (startIndex == -1) {
startIndex = i;
}
if (i == inputString.Length - 1) {
result.Add(GetString(inputString, startIndex, i));
}
continue;
}
if (startIndex != -1) {
result.Add(GetString(inputString, startIndex, i - 1));
startIndex = -1;
}
}
return result;
}
public static string GetString(string inputString, int startIndex, int endIndex) {
string result = "";
for (int i = startIndex; i <= endIndex; i++) {
result += inputString[i];
}
return result;
}
If you want to use the "for cycle" to check each char and save all punctuation in the input string I've create this class. The method GetSplitSentence() return a list of SentenceSplitResult. In this list there are saved all the words and all the punctuation & numbers. Each punctuation or numbers saved is an item in the list. The sentenceSplitResult.isAWord is used to check if is a word or not. [Sorry for my English]
public class SentenceSplitResult
{
public string word;
public bool isAWord;
}
public class StringsHelper
{
private readonly List<SentenceSplitResult> outputList = new List<SentenceSplitResult>();
private readonly string input;
public StringsHelper(string input)
{
this.input = input;
}
public List<SentenceSplitResult> GetSplitSentence()
{
StringBuilder sb = new StringBuilder();
try
{
if (String.IsNullOrEmpty(input)) {
Logger.Log(new ArgumentNullException(), "GetSplitSentence - input is null or empy");
return outputList;
}
bool isAletter = IsAValidLetter(input[0]);
// Each char i checked if is a part of a word.
// If is YES > I can store the char for later
// IF is NO > I Save the word (if exist) and then save the punctuation
foreach (var _char in input)
{
isAletter = IsAValidLetter(_char);
if (isAletter == true)
{
sb.Append(_char);
}
else
{
SaveWord(sb.ToString());
sb.Clear();
SaveANotWord(_char);
}
}
SaveWord(sb.ToString());
}
catch (Exception ex)
{
Logger.Log(ex);
}
return outputList;
}
private static bool IsAValidLetter(char _char)
{
if ((Char.IsPunctuation(_char) == true) || (_char == ' ') || (Char.IsNumber(_char) == true))
{
return false;
}
return true;
}
private void SaveWord(string word)
{
if (String.IsNullOrEmpty(word) == false)
{
outputList.Add(new SentenceSplitResult()
{
isAWord = true,
word = word
});
}
}
private void SaveANotWord(char _char)
{
outputList.Add(new SentenceSplitResult()
{
isAWord = false,
word = _char.ToString()
});
}
You could try using a regex to remove the apostrophes that aren't surrounded by letters (i.e. single quotes) and then using the Char static methods to strip all the other characters. By calling the regex first you can keep the contraction apostrophes (e.g. can't) but remove the single quotes like in 'Oh.
string myText = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
Regex reg = new Regex("\b[\"']\b");
myText = reg.Replace(myText, "");
string[] listOfWords = RemoveCharacters(myText);
public string[] RemoveCharacters(string input)
{
StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
if (Char.IsLetter(c) || Char.IsWhiteSpace(c) || c == '\'')
sb.Append(c);
}
return sb.ToString().Split(' ');
}

Parse an integer from a string with trailing garbage

I need to parse a decimal integer that appears at the start of a string.
There may be trailing garbage following the decimal number. This needs to be ignored (even if it contains other numbers.)
e.g.
"1" => 1
" 42 " => 42
" 3 -.X.-" => 3
" 2 3 4 5" => 2
Is there a built-in method in the .NET framework to do this?
int.TryParse() is not suitable. It allows trailing spaces but not other trailing characters.
It would be quite easy to implement this but I would prefer to use the standard method if it exists.
You can use Linq to do this, no Regular Expressions needed:
public static int GetLeadingInt(string input)
{
return Int32.Parse(new string(input.Trim().TakeWhile(c => char.IsDigit(c) || c == '.').ToArray()));
}
This works for all your provided examples:
string[] tests = new string[] {
"1",
" 42 ",
" 3 -.X.-",
" 2 3 4 5"
};
foreach (string test in tests)
{
Console.WriteLine("Result: " + GetLeadingInt(test));
}
foreach (var m in Regex.Matches(" 3 - .x. 4", #"\d+"))
{
Console.WriteLine(m);
}
Updated per comments
Not sure why you don't like regular expressions, so I'll just post what I think is the shortest solution.
To get first int:
Match match = Regex.Match(" 3 - .x. - 4", #"\d+");
if (match.Success)
Console.WriteLine(int.Parse(match.Value));
There's no standard .NET method for doing this - although I wouldn't be surprised to find that VB had something in the Microsoft.VisualBasic assembly (which is shipped with .NET, so it's not an issue to use it even from C#).
Will the result always be non-negative (which would make things easier)?
To be honest, regular expressions are the easiest option here, but...
public static string RemoveCruftFromNumber(string text)
{
int end = 0;
// First move past leading spaces
while (end < text.Length && text[end] == ' ')
{
end++;
}
// Now move past digits
while (end < text.Length && char.IsDigit(text[end]))
{
end++;
}
return text.Substring(0, end);
}
Then you just need to call int.TryParse on the result of RemoveCruftFromNumber (don't forget that the integer may be too big to store in an int).
I like #Donut's approach.
I'd like to add though, that char.IsDigit and char.IsNumber also allow for some unicode characters which are digits in other languages and scripts (see here).
If you only want to check for the digits 0 to 9 you could use "0123456789".Contains(c).
Three example implementions:
To remove trailing non-digit characters:
var digits = new string(input.Trim().TakeWhile(c =>
("0123456789").Contains(c)
).ToArray());
To remove leading non-digit characters:
var digits = new string(input.Trim().SkipWhile(c =>
!("0123456789").Contains(c)
).ToArray());
To remove all non-digit characters:
var digits = new string(input.Trim().Where(c =>
("0123456789").Contains(c)
).ToArray());
And of course: int.Parse(digits) or int.TryParse(digits, out output)
This doesn't really answer your question (about a built-in C# method), but you could try chopping off characters at the end of the input string one by one until int.TryParse() accepts it as a valid number:
for (int p = input.Length; p > 0; p--)
{
int num;
if (int.TryParse(input.Substring(0, p), out num))
return num;
}
throw new Exception("Malformed integer: " + input);
Of course, this will be slow if input is very long.
ADDENDUM (March 2016)
This could be made faster by chopping off all non-digit/non-space characters on the right before attempting each parse:
for (int p = input.Length; p > 0; p--)
{
char ch;
do
{
ch = input[--p];
} while ((ch < '0' || ch > '9') && ch != ' ' && p > 0);
p++;
int num;
if (int.TryParse(input.Substring(0, p), out num))
return num;
}
throw new Exception("Malformed integer: " + input);
string s = " 3 -.X.-".Trim();
string collectedNumber = string.empty;
int i;
for (x = 0; x < s.length; x++)
{
if (int.TryParse(s[x], out i))
collectedNumber += s[x];
else
break; // not a number - that's it - get out.
}
if (int.TryParse(collectedNumber, out i))
Console.WriteLine(i);
else
Console.WriteLine("no number found");
This is how I would have done it in Java:
int parseLeadingInt(String input)
{
NumberFormat fmt = NumberFormat.getIntegerInstance();
fmt.setGroupingUsed(false);
return fmt.parse(input, new ParsePosition(0)).intValue();
}
I was hoping something similar would be possible in .NET.
This is the regex-based solution I am currently using:
int? parseLeadingInt(string input)
{
int result = 0;
Match match = Regex.Match(input, "^[ \t]*\\d+");
if (match.Success && int.TryParse(match.Value, out result))
{
return result;
}
return null;
}
Might as well add mine too.
string temp = " 3 .x£";
string numbersOnly = String.Empty;
int tempInt;
for (int i = 0; i < temp.Length; i++)
{
if (Int32.TryParse(Convert.ToString(temp[i]), out tempInt))
{
numbersOnly += temp[i];
}
}
Int32.TryParse(numbersOnly, out tempInt);
MessageBox.Show(tempInt.ToString());
The message box is just for testing purposes, just delete it once you verify the method is working.
I'm not sure why you would avoid Regex in this situation.
Here's a little hackery that you can adjust to your needs.
" 3 -.X.-".ToCharArray().FindInteger().ToList().ForEach(Console.WriteLine);
public static class CharArrayExtensions
{
public static IEnumerable<char> FindInteger(this IEnumerable<char> array)
{
foreach (var c in array)
{
if(char.IsNumber(c))
yield return c;
}
}
}
EDIT:
That's true about the incorrect result (and the maintenance dev :) ).
Here's a revision:
public static int FindFirstInteger(this IEnumerable<char> array)
{
bool foundInteger = false;
var ints = new List<char>();
foreach (var c in array)
{
if(char.IsNumber(c))
{
foundInteger = true;
ints.Add(c);
}
else
{
if(foundInteger)
{
break;
}
}
}
string s = string.Empty;
ints.ForEach(i => s += i.ToString());
return int.Parse(s);
}
private string GetInt(string s)
{
int i = 0;
s = s.Trim();
while (i<s.Length && char.IsDigit(s[i])) i++;
return s.Substring(0, i);
}
Similar to Donut's above but with a TryParse:
private static bool TryGetLeadingInt(string input, out int output)
{
var trimmedString = new string(input.Trim().TakeWhile(c => char.IsDigit(c) || c == '.').ToArray());
var canParse = int.TryParse( trimmedString, out output);
return canParse;
}

Categories