Extract all occurrences of specific characters from strings - c#

I have something like this in my code.
mystring.Split(new[]{"/","*"}, StringSplitOptions.RemoveEmptyEntries);
However, what I actually want is to split mystring into two arrays: one holding the separated items above, and the other holding the delimiters, in the order they appear in the string.
I could call .IndexOf repeatedly until I have extracted all of them, but that feels redundant. Is there a way to do this in .NET? If possible, I want to avoid LINQ.
Thanks.

Something like:
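var separators = new char[] { '/', '*' };
var words = new List<string>();
var delimiters = new List<string>();
var idx = source.IndexOfAny(separators);
var prevIdx = 0;
while (idx > -1)
{
    // Text between the previous delimiter and this one (skip empty runs such as "**")
    if (idx - prevIdx > 0)
        words.Add(source.Substring(prevIdx, idx - prevIdx));
    delimiters.Add(source.Substring(idx, 1));
    prevIdx = idx + 1;
    idx = source.IndexOfAny(separators, idx + 1);
}
// Don't lose the text after the last delimiter
if (prevIdx < source.Length)
    words.Add(source.Substring(prevIdx));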
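For reference, here is the same IndexOfAny loop wrapped in a self-contained method that returns two arrays, as the question asks. The name SplitKeepDelimiters and the out-parameter signature are illustrative choices, not a framework API. The body only uses calls available since .NET 2.0; the demo assumes C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits `source` on any of `separators`, returning the
// non-empty pieces and the delimiters (in order of appearance) as two arrays.
static void SplitKeepDelimiters(string source, char[] separators,
                                out string[] words, out string[] delimiters)
{
    var wordList = new List<string>();
    var delimiterList = new List<string>();
    int prevIdx = 0;
    int idx = source.IndexOfAny(separators);
    while (idx > -1)
    {
        if (idx > prevIdx)                       // skip empty runs such as "**"
            wordList.Add(source.Substring(prevIdx, idx - prevIdx));
        delimiterList.Add(source[idx].ToString());
        prevIdx = idx + 1;
        idx = source.IndexOfAny(separators, prevIdx);
    }
    if (prevIdx < source.Length)                 // text after the last delimiter
        wordList.Add(source.Substring(prevIdx));
    words = wordList.ToArray();
    delimiters = delimiterList.ToArray();
}

SplitKeepDelimiters("mytest/string*isthis**and not/this", new[] { '/', '*' },
                    out string[] parts, out string[] delims);
Console.WriteLine(string.Join("|", parts));   // mytest|string|isthis|and not|this
Console.WriteLine(string.Join("", delims));   // /***/
```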

If I understand the questioner correctly, he wants the actual separated items as well as the delimiters.
I think the following code will work:
List<string> SeparatedItems = new List<string>();
List<string> Delimiters = new List<string>();
string sTestString = "mytest/string*isthis**and not/this";
string sSeparatedItemString = String.Empty;
foreach(char c in sTestString) {
if(c == '/' || c == '*') {
Delimiters.Add(c.ToString());
if(sSeparatedItemString != String.Empty) {
SeparatedItems.Add(sSeparatedItemString);
sSeparatedItemString = String.Empty;
}
}
else {
sSeparatedItemString += c.ToString();
}
}
if(sSeparatedItemString != String.Empty) {
SeparatedItems.Add(sSeparatedItemString);
}
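One caveat with the loop above: `sSeparatedItemString += c.ToString()` allocates a new string for every character, so the cost grows quadratically with long items. Below is a sketch of the same character-by-character loop using a StringBuilder. The variable names differ from the answer for readability, and the demo assumes C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

string sTestString = "mytest/string*isthis**and not/this";
var separatedItems = new List<string>();
var delimiters = new List<string>();
var current = new StringBuilder();   // accumulates the item currently being read

foreach (char c in sTestString)
{
    if (c == '/' || c == '*')
    {
        delimiters.Add(c.ToString());
        if (current.Length > 0)          // skip empty items between "**"
        {
            separatedItems.Add(current.ToString());
            current.Clear();
        }
    }
    else
    {
        current.Append(c);
    }
}
if (current.Length > 0)                  // flush the trailing item
    separatedItems.Add(current.ToString());

Console.WriteLine(string.Join("|", separatedItems)); // mytest|string|isthis|and not|this
Console.WriteLine(string.Join("", delimiters));      // /***/
```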

Try this:
var items = new List<string>();
var delimiters = new List<string>();
items.AddRange(Regex.Split(text, @"(?<=/)|(?=/)|(?<=\*)|(?=\*)"));
for (int i = 0; i < items.Count; )
{
string item = items[i];
if (item == "*" || item == "/")
{
delimiters.Add(item);
items.RemoveAt(i);
}
else if (item == "")
{
items.RemoveAt(i);
}
else
{
i++;
}
}
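When the pattern contains capturing parentheses, Regex.Split includes the captured text in the result array. That means a single capture group can keep the delimiters, instead of four lookarounds. A sketch assuming the same '/' and '*' delimiters and C# 9 top-level statements. Empty strings produced by adjacent delimiters are skipped:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "mytest/string*isthis**and not/this";
var items = new List<string>();
var delimiters = new List<string>();

// The capture group ([/*]) makes Regex.Split keep each delimiter in the output.
foreach (string part in Regex.Split(text, "([/*])"))
{
    if (part == "/" || part == "*")
        delimiters.Add(part);
    else if (part.Length > 0)   // adjacent delimiters ("**") produce empty strings
        items.Add(part);
}

Console.WriteLine(string.Join("|", items));     // mytest|string|isthis|and not|this
Console.WriteLine(string.Join("", delimiters)); // /***/
```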

You could consider a regular expression that uses named groups. Try nested named groups, with the outer group including the separator and the inner group capturing only the content.
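Below is a minimal sketch of the named-group idea. For simplicity it uses two alternative named groups, item and delim, rather than the nested groups described above; that substitution is an assumption of this sketch. Each match fills exactly one of the groups, so walking the matches in order yields both lists (C# 9 top-level statements assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "mytest/string*isthis**and not/this";
var items = new List<string>();
var delimiters = new List<string>();

// Each match is either a run of non-delimiter characters or a single delimiter.
var pattern = new Regex(@"(?<item>[^/*]+)|(?<delim>[/*])");
foreach (Match m in pattern.Matches(text))
{
    if (m.Groups["item"].Success)
        items.Add(m.Groups["item"].Value);
    else
        delimiters.Add(m.Groups["delim"].Value);
}

Console.WriteLine(string.Join("|", items));     // mytest|string|isthis|and not|this
Console.WriteLine(string.Join("", delimiters)); // /***/
```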

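Since you're running in .NET 2.0, I'd say using IndexOf is one of the most straightforward ways to solve the problem:
public static int CountOccurences(string input, string pattern)
{
    if (string.IsNullOrEmpty(pattern))
        return 0; // an empty pattern would match at every position and loop forever
    int count = 0;
    int i = 0;
    // The parentheses matter: assign the index first, then compare it with -1
    while ((i = input.IndexOf(pattern, i)) != -1)
    {
        count++;
        i += pattern.Length; // continue searching after this match
    }
    return count;
}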
The solution Rob Smyth suggests would also work, but I find this the easiest and most understandable one.

Related

Get count of unique characters between first and last letter

I'm trying to get the count of unique characters between the first and last letter of a word. For example, if I type Yellow the expected output is Y3w, if I type People it should be P4e, and if I type Money it should be M3y. This is what I tried:
//var strArr = wordToConvert.Split(' ');
string[] strArr = new[] { "Money","Yellow", "People" };
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
//ignore 2-letter words
string newword = null;
int distinctCount = 0;
int k = word.Length;
int samecharcount = 0;
int count = 0;
for (int i = 1; i < k - 2; i++)
{
if (word.ElementAt(i) != word.ElementAt(i + 1))
{
count++;
}
else
{
samecharcount++;
}
}
distinctCount = count + samecharcount;
char frst = word[0];
char last = word[word.Length - 1];
newword = String.Concat(frst, distinctCount.ToString(), last);
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine("Output: " + result);
Console.WriteLine("----------------------------------------------------");
With this code I get the expected output for Yellow, but it doesn't work for People and Money. How can I fix this? I'm also wondering whether there is a better way to do this, for example using LINQ or Regex.
Here's an implementation that uses Linq:
string[] strArr = new[]{"Money", "Yellow", "People"};
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
// we want the first letter, the last letter, and the distinct count of everything in between
var first = word.First();
var last = word.Last();
var others = word.Skip(1).Take(word.Length - 2);
// Case sensitive
var distinct = others.Distinct();
// Case insensitive
// var distinct = others.Select(c => char.ToLowerInvariant(c)).Distinct();
string newword = first + distinct.Count().ToString() + last;
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine(result);
Output:
M3y Y3w P4e
Note that this is case-sensitive, so for FiIsSh the count is 4 (F4h); the commented-out case-insensitive line gives 2 (F2h).
Maybe not the most performant, but here is another example using LINQ:
var words = new[] { "Money","Yellow", "People" };
var transformedWords = words.Select(Transform);
var sentence = String.Join(' ', transformedWords);
public string Transform(string input)
{
if (input.Length < 3)
{
return input;
}
var count = input.Skip(1).SkipLast(1).Distinct().Count();
return $"{input[0]}{count}{input[^1]}";
}
You can implement it with the help of LINQ, e.g. (C# 8+):
private static string EncodeWord(string value) => value.Length <= 2
? value
: $"{value[0]}{value.Substring(1, value.Length - 2).Distinct().Count()}{value[^1]}";
Demo:
string[] tests = new string[] {
"Money","Yellow", "People"
};
var report = string.Join(Environment.NewLine, tests
.Select(test => $"{test} :: {EncodeWord(test)}"));
Console.Write(report);
Outcome:
Money :: M3y
Yellow :: Y3w
People :: P4e
A lot of people have put up some good solutions. I have two solutions for you: one uses LINQ and the other does not.
LINQ (probably not much different from the others):
if (str.Length < 3) return str;
var midStr = str.Substring(1, str.Length - 2);
var midCount = midStr.Distinct().Count();
return string.Concat(str[0], midCount, str[str.Length - 1]);
Non-LINQ:
if (str.Length < 3) return str;
var uniqueLetters = new Dictionary<char, int>();
var midStr = str.Substring(1, str.Length - 2);
foreach (var c in midStr)
{
if (!uniqueLetters.ContainsKey(c))
{
uniqueLetters.Add(c, 0);
}
}
var midCount = uniqueLetters.Count; // Count property, not the LINQ Count() extension
return string.Concat(str[0], midCount, str[str.Length - 1]);
I tested this with the following 6 strings:
Yellow
Money
Purple
Me
You
Hiiiiiiiii
Output:
LINQ: Y3w, Non-LINQ: Y3w
LINQ: M3y, Non-LINQ: M3y
LINQ: P4e, Non-LINQ: P4e
LINQ: Me, Non-LINQ: Me
LINQ: Y1u, Non-LINQ: Y1u
LINQ: H1i, Non-LINQ: H1i
Performance-wise I'd guess they're pretty much the same, if not identical, but I haven't run any real perf test on the two approaches. The only real difference is that the second route expands Distinct() into what it probably does under the covers anyway (I haven't looked at the source to confirm that, but it's a pretty common way to get a count of unique items). And the first route is certainly less code.
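Along the same lines as the non-LINQ version, a HashSet<char> does the deduplication without the dummy dictionary values. It also makes a case-insensitive option easy to add. The Abbreviate name and the ignoreCase flag below are illustrative, HashSet<T> needs .NET 3.5 or later, and the demo assumes C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: first letter + number of distinct inner characters + last letter.
static string Abbreviate(string str, bool ignoreCase = false)
{
    if (str.Length < 3) return str;               // nothing between first and last
    var unique = new HashSet<char>();
    for (int i = 1; i < str.Length - 1; i++)      // only the inner characters
    {
        char c = ignoreCase ? char.ToLowerInvariant(str[i]) : str[i];
        unique.Add(c);                            // Add ignores duplicates
    }
    return string.Concat(str[0], unique.Count, str[str.Length - 1]);
}

foreach (var w in new[] { "Yellow", "Money", "People", "Me", "FiIsSh" })
    Console.WriteLine(Abbreviate(w));             // prints Y3w, M3y, P4e, Me, F4h (one per line)
Console.WriteLine(Abbreviate("FiIsSh", ignoreCase: true)); // F2h
```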
I would use LINQ for that purpose:
string[] words = new string[] { "Yellow" , "People", "Money", "Sh" }; // "Sh" tests 2-letter words (or you can insert 0 and then remove the ternary operator)
foreach (string word in words)
{
int uniqeCharsInBetween = word.Substring(1, word.Length - 2).ToCharArray().Distinct().Count();
string result = word[0] + (uniqeCharsInBetween == 0 ? string.Empty : uniqeCharsInBetween.ToString()) + word[word.Length - 1];
Console.WriteLine(result);
}

Replace string if starts with string in List

I have a string that looks like this
s = "<Hello it´s me, <Hi how are you <hay"
and a List
List<string> ValidList = new List<string> { "Hello", "hay" };
I need the result string to be:
string result = "<Hello it´s me, ?Hi how are you <hay"
So wherever a < is followed by a word from the list, it should be kept; if a < is followed by something that is not in the list, the < should be replaced by ?.
I tried using IndexOf to find the position of each <, and leaving it alone if the text after it starts with any of the strings in the list:
foreach (var vl in ValidList)
{
int nextLt = 0;
while ((nextLt = strAux.IndexOf('<', nextLt)) != -1)
{
//is element, leave it
if (!(strAux.Substring(nextLt + 1).StartsWith(vl)))
{
//its not, replace
strAux = string.Format(@"{0}?{1}", strAux.Substring(0, nextLt), strAux.Substring(nextLt + 1, strAux.Length - (nextLt + 1)));
}
nextLt++;
}
}
To turn the solution I gave as a comment into a proper answer:
Regex.Replace(s, string.Format("<(?!{0})", string.Join("|", ValidList)), "?")
This (obviously) uses regular expressions to replace the unwanted < characters with ?. To recognize those characters, we use a negative lookahead expression. For the example word list, this looks like (?!Hello|hay), which matches only if what follows is not Hello or hay. Here we are matching <, so the full expression becomes <(?!Hello|hay).
Now we just need to account for the dynamic ValidList by creating the regular expression on the fly. We use string.Format and string.Join there.
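One caveat with building the pattern this way: list entries are inserted into the regex verbatim. An entry containing a metacharacter such as . or + would change what the pattern matches. Below is a sketch that escapes each entry with Regex.Escape first. It uses the sample data from the question and assumes C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string s = "<Hello it´s me, <Hi how are you <hay";
var ValidList = new List<string> { "Hello", "hay" };

// Escape each entry so characters such as '.' or '+' are matched literally.
string alternatives = string.Join("|", ValidList.ConvertAll(w => Regex.Escape(w)));
// Replace every '<' that is NOT immediately followed by one of the valid words.
string result = Regex.Replace(s, "<(?!" + alternatives + ")", "?");

Console.WriteLine(result); // <Hello it´s me, ?Hi how are you <hay
```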
Something like this, without using RegEx or LINQ:
string s = "<Hello it´s me, <Hi how are you <hay";
List<string> ValidList = new List<string>() { "Hello", "hay" };
// Assumes s starts with '<', so every piece after splitting was preceded by one
var arr = s.Split(new[] { '<' }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < arr.Length; i++)
{
    bool isValid = false;
    foreach (var item in ValidList)
    {
        // StartsWith, not Contains: only the text right after '<' matters
        if (arr[i].StartsWith(item, StringComparison.Ordinal))
        {
            isValid = true;
            break;
        }
    }
    arr[i] = (isValid ? "<" : "?") + arr[i];
}
Console.WriteLine(string.Concat(arr));
A possible solution using LINQ. It splits the string on < and checks whether the following "word" (the text up to the first space) is in ValidList, adding < or ? accordingly. Finally, it joins everything back together:
List<string> ValidList = new List<string>{ "Hello", "hay" };
string str = "<Hello it´s me, <Hi how are you <hay";
var res = String.Join("",str.Split(new char[] { '<' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => ValidList.Contains(x.Split(' ').First()) ? "<" + x : "?"+x));

C# string.split() separate string by uppercase

I've been using the Split() method to split strings, but this only works if you give string.Split() a character to split on. Is there any way to split a string wherever it sees an uppercase letter?
Is it possible to get the individual words from an unseparated string like:
DeleteSensorFromTemplate
so that the result string is:
Delete Sensor From Template
Use Regex.Split:
string[] split = Regex.Split(str, @"(?<!^)(?=[A-Z])");
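To get the single string from the question rather than an array, join the pieces back together with spaces. Below is a short sketch assuming the input is PascalCase with no existing spaces (C# 9 top-level statements):

```csharp
using System;
using System.Text.RegularExpressions;

string str = "DeleteSensorFromTemplate";
// Split before every uppercase letter except the one at the start of the string.
string[] split = Regex.Split(str, @"(?<!^)(?=[A-Z])");
string sentence = string.Join(" ", split);

Console.WriteLine(split.Length); // 4
Console.WriteLine(sentence);     // Delete Sensor From Template
```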
Another way with regex:
public static string SplitCamelCase(string input)
{
return System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", " $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim();
}
If you do not like RegEx and you really just want to insert the missing spaces, this will do the job too:
public static string InsertSpaceBeforeUpperCase(this string str)
{
var sb = new StringBuilder();
char previousChar = char.MinValue; // Unicode '\0'
foreach (char c in str)
{
if (char.IsUpper(c))
{
// If not the first character and previous character is not a space, insert a space before uppercase
if (sb.Length != 0 && previousChar != ' ')
{
sb.Append(' ');
}
}
sb.Append(c);
previousChar = c;
}
return sb.ToString();
}
I had some fun with this one and came up with a function that splits by case, as well as groups together caps (it assumes title case for whatever follows) and digits.
Examples:
Input -> "TodayIUpdated32UPCCodes"
Output -> "Today I Updated 32 UPC Codes"
Code (please excuse the funky symbols I use)...
public static string[] SplitByCase(this string s) { // extension methods must be static (in a static class)
var ʀ = new List<string>();
var ᴛ = new StringBuilder();
var previous = SplitByCaseModes.None;
foreach(var ɪ in s) {
SplitByCaseModes mode_ɪ;
if(string.IsNullOrWhiteSpace(ɪ.ToString())) {
mode_ɪ = SplitByCaseModes.WhiteSpace;
} else if("0123456789".Contains(ɪ)) {
mode_ɪ = SplitByCaseModes.Digit;
} else if(ɪ == ɪ.ToString().ToUpper()[0]) {
mode_ɪ = SplitByCaseModes.UpperCase;
} else {
mode_ɪ = SplitByCaseModes.LowerCase;
}
if((previous == SplitByCaseModes.None) || (previous == mode_ɪ)) {
ᴛ.Append(ɪ);
} else if((previous == SplitByCaseModes.UpperCase) && (mode_ɪ == SplitByCaseModes.LowerCase)) {
if(ᴛ.Length > 1) {
ʀ.Add(ᴛ.ToString().Substring(0, ᴛ.Length - 1));
ᴛ.Remove(0, ᴛ.Length - 1);
}
ᴛ.Append(ɪ);
} else {
ʀ.Add(ᴛ.ToString());
ᴛ.Clear();
ᴛ.Append(ɪ);
}
previous = mode_ɪ;
}
if(ᴛ.Length != 0) ʀ.Add(ᴛ.ToString());
return ʀ.ToArray();
}
private enum SplitByCaseModes { None, WhiteSpace, Digit, UpperCase, LowerCase }
Here's another way, if you don't want to use a StringBuilder or RegEx (both are perfectly acceptable answers). I just want to offer a different solution:
string Split(string input)
{
string result = "";
for (int i = 0; i < input.Length; i++)
{
if (char.IsUpper(input[i]))
{
result += ' ';
}
result += input[i];
}
return result.Trim();
}

How to Split an Already Split String

I have the following code:
foreach (var item in betSlipwithoutStake)
{
test1 = item.Text;
splitText = test1.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
if (!test.Exists(str => str == splitText[0]))
test.Add(splitText[0]);
}
I'm getting values like "Under 56.5 Points (+56.5)".
Now I want to split each item in the list again and keep everything after '(', so that I get a new list I can use. How can I do that?
If you want to extract the value inside the parentheses:
foreach (var item in betSlipwithoutStake)
{
    test1 = item.Text;
    splitText = test1.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
    // "Under 56.5 Points (+56.5)".Split('(', ')') gives { "Under 56.5 Points ", "+56.5", "" }
    string value = splitText[0].Contains("(")
        ? splitText[0].Split('(', ')')[1]
        : splitText[0];
    // check for duplicates against the value actually being added
    if (!test.Exists(str => str == value))
        test.Add(value);
}
Well, assuming you are after a solution without regular expressions, and that you have a List<string> test declared, you can follow up with a substring, with indexes (and some error handling):
foreach (var item in betSlipwithoutStake)
{
    test1 = item.Text;
    splitText = test1.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
    if (splitText.Length == 0)
        continue;
    string stringToCheck = splitText[0];
    int openParenIndex = stringToCheck.IndexOf('(');
    int closeParenIndex = stringToCheck.LastIndexOf(')');
    if (openParenIndex >= 0 && closeParenIndex > openParenIndex)
    {
        // get what's inside the outermost set of parens, without the parens themselves
        int length = closeParenIndex - openParenIndex - 1;
        stringToCheck = stringToCheck.Substring(openParenIndex + 1, length);
    }
    // use the extracted value, not splitText[0]
    if (!test.Exists(str => str == stringToCheck))
        test.Add(stringToCheck);
}
You can find out about all of the methods available on strings in the System.String class documentation.
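If regular expressions are acceptable after all, one pattern can pull out the text between '(' and the next ')'. This is a sketch assuming the List<string> test from the answers above. Only the first sample string comes from the question; the other two are made up to show the duplicate case and the no-parentheses case (C# 9 top-level statements assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var test = new List<string>();
string[] sample = { "Under 56.5 Points (+56.5)", "Over 56.5 Points (+56.5)", "No parens here" };

foreach (string text in sample)
{
    // Group 1 captures everything between '(' and the next ')'.
    Match m = Regex.Match(text, @"\(([^)]*)\)");
    string value = m.Success ? m.Groups[1].Value : text;   // fall back to the whole text
    if (!test.Contains(value))                              // keep the list free of duplicates
        test.Add(value);
}

Console.WriteLine(string.Join(", ", test)); // +56.5, No parens here
```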

Splitting a string array

I have a string array string[] arr, which contains values like N36102W114383, N36102W114382 etc...
I want to split each string so that I get values like N36082 and W115080.
What is the best way to do this?
This should work for you.
Regex regexObj = new Regex(@"[A-Z]\d+"); // matches an uppercase letter followed by a sequence of digits
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    Console.WriteLine(matchResults.Value); // two matches: N36102 and W114383
    matchResults = matchResults.NextMatch();
}
If the format is the same every time, you can just do this:
string[] split_data = data_string.Insert(data_string.IndexOf("W"), ",")
    .Split(',');
Here you insert a recognizable delimiter into your string and then split it by this delimiter.
Forgive me if this doesn't quite compile, but I'd just break down and write the string processing function by hand:
public static IEnumerable<string> Split(string str)
{
char [] chars = str.ToCharArray();
int last = 0;
for(int i = 1; i < chars.Length; i++) {
if(char.IsLetter(chars[i])) {
yield return new string(chars, last, i - last);
last = i;
}
}
yield return new string(chars, last, chars.Length - last);
}
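Alternatively, with Regex.Split and LINQ (needs System.Linq and System.Text.RegularExpressions):
// The capture group keeps each code in the output; the filter drops the empty and ", " pieces
String[] code = new Regex("(?:([A-Z][0-9]+))").Split(text).Where(e => e.Trim(' ', ',').Length > 0).ToArray();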
In case you're only looking for the format NxxxxxWxxxxx, this will do just fine:
Regex r = new Regex(@"(N[0-9]+)(W[0-9]+)");
Match mc = r.Match(arr[i]);
string N = mc.Groups[1].Value; // e.g. "N36102"
string W = mc.Groups[2].Value; // e.g. "W114383"
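To apply this to the whole arr array from the question, loop over it and read both groups from each match. This sketch assumes every valid element has exactly the NxxxxxWxxxxx shape, which is why the pattern is anchored with ^ and $. Anything else is skipped (C# 9 top-level statements assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string[] arr = { "N36102W114383", "N36102W114382" };
var north = new List<string>();
var west = new List<string>();
var r = new Regex(@"^(N[0-9]+)(W[0-9]+)$");   // anchored: the whole element must match

foreach (string value in arr)
{
    Match mc = r.Match(value.Trim());
    if (!mc.Success)
        continue;                               // skip malformed elements
    north.Add(mc.Groups[1].Value);              // e.g. "N36102"
    west.Add(mc.Groups[2].Value);               // e.g. "W114383"
}

Console.WriteLine(string.Join(" ", north)); // N36102 N36102
Console.WriteLine(string.Join(" ", west));  // W114383 W114382
```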
Using string.Split and char.IsLetter, this is relatively easy in C#.
Don't forget to write unit tests - the following may have some corner case errors!
// input has form "N36102W114383, N36102W114382"
// output: "N36102", "W114383", "N36102", "W114382", ...
string[] ParseSequenceString(string input)
{
    string[] inputStrings = input.Split(',');
    List<string> outputStrings = new List<string>();
    foreach (string value in inputStrings)
    {
        // Trim the space that follows each comma
        List<string> valuesInString = ParseValuesInString(value.Trim());
        outputStrings.AddRange(valuesInString);
    }
    return outputStrings.ToArray();
}
// input has form "N36102W114383"
// output: "N36102", "W114383"
List<string> ParseValuesInString(string inputString)
{
    List<string> outputValues = new List<string>();
    string currentValue = string.Empty;
    foreach (char c in inputString)
    {
        // A letter starts a new value, so flush the one collected so far
        if (char.IsLetter(c) && currentValue.Length > 0)
        {
            outputValues.Add(currentValue);
            currentValue = string.Empty;
        }
        currentValue += c;
    }
    if (currentValue.Length > 0)
        outputValues.Add(currentValue);
    return outputValues;
}
