Determine which character was used in String.Split() - c#

If I am using String.Split() how can I find out which character caused the split? For instance, when "Apple|Car" splits, I want to know that it did so via the pipe character and not a comma or hyphen.
When I see the "Car" item, I'd want to know it was split from "Apple" with a pipe, and split from "Plane" with a comma.
var splitChars = new Char [] {'|', ',', '-'};
string item1 = "Apple|Car,Plane-Truck";
var mySplit = item1.Split(splitChars);

string myMessage = "Apple|Car,Plane-Truck";
//Break apart string
var splits = myMessage.Split(new Char[] { '|', ',', '-' });
int accumulated_length = 0;
foreach (string piece in splits)
{
    accumulated_length += piece.Length + 1;
    if (accumulated_length <= myMessage.Length)
    {
        Console.WriteLine("{0} was split at {1}", piece, myMessage[accumulated_length - 1]);
    }
    else
    {
        Console.WriteLine("{0} was the last one", piece);
    }
}

It will split on all of them in the example you've given. In general, though, you can simply check which of the defined split characters are contained in the string:
var sourceString = "Apple|Car,Plane-Truck";
var allSplitChars = new[] {'|', ',', '-', '.', '!', '?'};
// Find only the characters that are contained in the source string
List<char> charsUsedToSplit = allSplitChars.Where(sourceString.Contains).ToList();

Any characters in the list will be used for the split. Can you clarify what you're actually trying to do? In your example the tokens after the split will be "Apple", "Car", "Plane", "Truck", so each of your characters will be used to split.
If you're trying to determine which character caused the split for each token, then perhaps you might implement the split yourself and keep track:
List<Tuple<String, Char>> Splitter(string msg, char[] chars) {
    var offset = 0;
    var splitChars = new HashSet<char>(chars);
    var splits = new List<Tuple<String, Char>>();
    for(var idx = 0; idx < msg.Length; idx++) {
        if (splitChars.Contains(msg[idx])) {
            // pair the token with the character that ended it
            var split = Tuple.Create(msg.Substring(offset, idx - offset), msg[idx]);
            splits.Add(split);
            offset = idx + 1;
        }
    }
    // note: the text after the last delimiter ("Truck" below) is not returned
    return splits;
}
string myMessage = "Apple|Car,Plane-Truck";
var splits = Splitter(myMessage, new [] {'|', ',', '-'});
foreach (var piece in splits)
{
    Console.WriteLine("word: {0}, split by: {1}", piece.Item1, piece.Item2);
}
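As a variation on the hand-written split above, here is a minimal sketch that also reports the final token, which has no delimiter after it. The helper name SplitWithDelimiters and the use of a nullable char for "no delimiter" are assumptions made for illustration, and the snippet is written with top-level statements (C# 9 or later):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pairs each token with the delimiter that ended it
// (null for the final token, which has no delimiter after it).
static List<(string Token, char? Delimiter)> SplitWithDelimiters(string text, char[] delimiters)
{
    var result = new List<(string Token, char? Delimiter)>();
    int start = 0;
    int index;
    while ((index = text.IndexOfAny(delimiters, start)) != -1)
    {
        result.Add((text.Substring(start, index - start), text[index]));
        start = index + 1;
    }
    result.Add((text.Substring(start), null)); // remainder after the last delimiter
    return result;
}

foreach (var (token, delimiter) in SplitWithDelimiters("Apple|Car,Plane-Truck", new[] { '|', ',', '-' }))
{
    Console.WriteLine(delimiter.HasValue
        ? $"{token} was followed by '{delimiter}'"
        : $"{token} was the last token");
}
```

Using IndexOfAny avoids checking every character by hand; consecutive delimiters produce empty tokens, just as String.Split does without RemoveEmptyEntries.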

Related

C# Count the words in a string

How do I do this with basic string functions and a loop? I want to count the words in a string. My problem is that it only works when the user does not use multiple spaces.
Here is my code:
string phrase;
int word = 1;
Console.Write("Enter a phrase: ");
phrase = Console.ReadLine();
for (int i = 0; i < phrase.Length; i++)
{
    if (phrase[i] == ' ')
    {
        word++;
    }
}
Console.WriteLine(word);
One approach is to use a regular expression to "condense" all consecutive spaces into a single instance. Then the job is simple.
var str = "aaa bb cccc d e";
var regex = new Regex(@"\s+");
Console.WriteLine(regex.Replace(str, " ")?.Split(' ')?.Count());
If you can use LINQ, I suggest this approach:
string[] source = phrase.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries);
var matchQuery = from word in source
select word;
int wordCount = matchQuery.Count();
Console.WriteLine(wordCount);
I would create a string array and fill it with the Split method when reading the data. Split breaks the text wherever it sees a defined character (a single character such as a letter or a symbol). In your case the defined character would be a space, that is ' '. So my code would be something like:
string phrase;
string[] seperated; // this is where you would split the full name
int word = 1;
Console.Write("Enter a phrase: ");
phrase = Console.ReadLine();
seperated=phrase.Split(' ');
for (int i = 0; i<seperated.Length; i++)
{
Console.WriteLine(seperated[i]); // this would print each word one by one
}
Once the full name has been split into the seperated array, you can use the separated first name, last name, etc. the way you want. seperated[0] would be the first word, seperated[1] would be the second word, and so on. If the name consists of 5 words in total, the last word can be reached with seperated[4].
Instead of the for loop you could use Split() and Linq:
var splitPhrase = phrase.Split(' ');
var wordCount = splitPhrase.Count(x=>x != "");
or use StringSplitOptions, as per comment:
var words = phrase.Split(' ', StringSplitOptions.RemoveEmptyEntries);
var wordCount = words.Count();
You can use a regex pattern; \S matches anything but whitespace:
string str = "Test words test";
MatchCollection collection = Regex.Matches(str, @"[\S]+");
int numberOfWords = collection.Count;
First of all, we have to define word. If word is
Any non empty sequence of letters
we can use a simple regular expression pattern: \p{L}+
Code:
using System.Text.RegularExpressions;
...
int word = Regex.Matches(phrase, @"\p{L}+").Count;
Edit: in case you don't want regular expressions you can implement FSM - Finite State Machine:
int word = 0;
bool inWord = false;
foreach (var c in phrase)
if (char.IsLetter(c)) {
if (!inWord) // we count beginnings of each word
word += 1;
inWord = true;
}
else
inWord = false;
Here we have two states, inWord == true and inWord == false, which indicate whether the current character is within a word or not. With these states we can count the beginning of every word.
You can achieve this by using the following function. It returns only the number of words in the given sentence.
public int totalWords(string sentence) {
int wordCount = 0;
for (int i = 0; i < sentence.Length - 1; i++)
{
if (sentence[i] == ' ' && Char.IsLetter(sentence[i + 1]) && (i > 0))
{
wordCount++;
}
}
wordCount++;
return wordCount;
}
Assuming your words are separated by a space you can just Split the string and get the length of the resulting array:
string[] words = phrase.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
int numberOfWords = words.Length;
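If tabs or other whitespace can also separate words, a null separator array tells String.Split to treat every whitespace character as a delimiter. A minimal sketch, assuming that a "word" means any run of non-whitespace characters (CountWords is a hypothetical name, and the snippet uses top-level statements):

```csharp
using System;

// Assumption: a "word" is any run of non-whitespace characters.
static int CountWords(string phrase)
{
    // A null separator array makes Split treat every whitespace character as a delimiter;
    // RemoveEmptyEntries discards the empty strings produced by consecutive delimiters.
    return phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
}

Console.WriteLine(CountWords("  one   two\tthree  ")); // 3
```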

How to stop String.Concat(); from removing whitespaces? [duplicate]

I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, @"(?<=[.,;])");
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
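To make the look-behind split concrete, here is a small sketch on a sample string (the input is an assumption chosen for illustration). Each delimiter stays attached to the end of the piece that precedes it:

```csharp
using System;
using System.Text.RegularExpressions;

// Split at every position that is immediately preceded by '.', ',' or ';'.
string[] parts = Regex.Split("one,two.three;four", @"(?<=[.,;])");
foreach (string part in parts)
{
    Console.WriteLine(part);
}
// one,
// two.
// three;
// four
```

Note that an input ending in a delimiter would also produce a trailing empty string, since the position after the last character matches as well.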
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking to split a mathematical formula, you can use the following Regex
@"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer as well...
Instead of string[] parts = Regex.Split(originalString, @"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, @"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, @"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but it's not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method to do this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid adding the character to a new line, try this:
string[] substrings = Regex.Split(input, @"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what a regex does anyway).
When you find a splitter, spin off a substring.
A sketch:
string toSplit = "a,b;c"; // example input
char[] splitChars = { ',', ';' }; // example split characters
var afterSplit = new List<string>();
int hold = 0;
for (int counter = 0; counter < toSplit.Length; counter++)
{
    if (Array.IndexOf(splitChars, toSplit[counter]) >= 0)
    {
        // the piece runs from the previous cut up to and including the splitter
        afterSplit.Add(toSplit.Substring(hold, counter - hold + 1));
        hold = counter + 1;
    }
}
if (hold < toSplit.Length)
    afterSplit.Add(toSplit.Substring(hold)); // whatever follows the last splitter
Obviously, choose the appropriate names and split characters.
That will do what you're asking.
veggerby's answer modified to
have no empty string items in the list
have a fixed string as delimiter like "ab" instead of a single character
var delimiter = "ab";
var text = "ab33ab9ab";
var parts = Regex.Split(text, $@"({Regex.Escape(delimiter)})")
    .Where(p => p != string.Empty)
    .ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like * or () and which thus have to be escaped.
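To see why the escaping matters, here is a small sketch with a delimiter made up entirely of regex metacharacters. The delimiter "(+)" and the sample text are assumptions chosen for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string delimiter = "(+)";      // every character here is special in a regex
string text = "a(+)b(+)c";

// Escaped, the delimiter becomes \(\+\); wrapping it in a capture group
// makes Regex.Split return the delimiter as its own element.
var parts = Regex.Split(text, $"({Regex.Escape(delimiter)})")
                 .Where(p => p != string.Empty)
                 .ToList();

Console.WriteLine(string.Join(" ", parts)); // a (+) b (+) c
```

Without Regex.Escape the same pattern would be "((+))", which is not a valid regular expression.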
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = @"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
@"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
    .Split("\n")
    // the lambda parameter must not reuse the name of the local variable x
    .Select(l => l.Contains('\r') ? l + '\n' : l)
    .AsEnumerable()
) {
    Console.Write(line);
}
yields
line 1 one
line 2 two
I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string @this, char[] delimiters, int count)
{
    var splits = new List<string>();
    int next = -1;
    while (splits.Count + 1 < count && (next = @this.IndexOfAny(delimiters, next + 1)) >= 0)
    {
        splits.Add(@this.Substring(0, next));
        @this = new string(@this.Skip(next).ToArray());
    }
    splits.Add(@this);
    return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}

how can i split or is there any other Method.? [duplicate]

This question already has answers here:
Split string into string array of single characters
(8 answers)
Closed 6 years ago.
eg.
string str="A+B-D*E";
I want to get an array like this:
string[] list=new string[]{"A","+","B","-","D","*","E"};
So I tried searching, but it did not work out.
https://msdn.microsoft.com/en-us/library/system.string.split.aspx
Update: I don't want ToArray or ToCharArray.
Actually my example is wrong. I want a string[]
For example:
String sample = "AB+CD+EF";
String[] result = new[]{"AB","+","CD","+","EF"};
Simply convert to a character array, as a string is nothing but a list of characters:
var result = input.ToArray();
Or better
result = input.ToCharArray();
which is a string method, not just an extension method of IEnumerable<char>.
Assuming that the result should be a string[]
string str = "A+B-D*E";
string[] result = str.Select(x => x.ToString()).ToArray();
If the output type could also be a char[], I'd recommend
char[] result = str.ToCharArray();
Store it in a character array instead
string s = "A+B-D*E";
var chars = s.ToCharArray();
OR
var chars = s.ToArray();
If you want to split by multiple characters and you want to keep the delimiters you could use this method:
public static string[] Split(string input, bool keepDelimiters, params char[] delimiter)
{
if (input == null) throw new ArgumentNullException(nameof(input));
if (delimiter == null) throw new ArgumentNullException(nameof(delimiter));
if (input.Length <= 1) return new[] {input};
List<string> tokens = new List<string>();
int start = 0, index;
while ((index = input.IndexOfAny(delimiter, start)) != -1) // -1 means no further delimiter; 0 is a valid match position
{
tokens.Add(input.Substring(start, index - start));
if (keepDelimiters)
tokens.Add(input[index].ToString());
start = index + 1;
}
if (start < input.Length)
tokens.Add(input.Substring(start));
return tokens.ToArray();
}
Your sample:
string[] result = Split("A+B-D*E", true, '+', '-', '*', '/');
foreach(string token in result)
Console.WriteLine(token);
Result:
A
+
B
-
D
*
E

Read numbers from the console given in a single line, separated by a space

I have a task to read n given numbers in a single line, separated by a space ( ) from the console.
I know how to do it when I read every number on a separate line (Console.ReadLine()) but I need help with how to do it when the numbers are on the same line.
You can use String.Split. You can provide the character(s) that you want to use to split the string into multiple parts. If you provide none, all white-space characters are assumed to be split characters (so new-line, tab, etc.):
string[] tokens = line.Split(); // all spaces, tab- and newline characters are used
or, if you want to use only spaces as delimiter:
string[] tokens = line.Split(' ');
If you want to parse them to int you can use Array.ConvertAll():
int[] numbers = Array.ConvertAll(tokens, int.Parse); // fails if the format is invalid
If you want to check if the format is valid use int.TryParse.
You can split the line using String.Split():
var line = Console.ReadLine();
var numbers = line.Split(' ');
foreach(var number in numbers)
{
int num;
if (Int32.TryParse(number, out num))
{
// num is your number as integer
}
}
You can use Linq to read the line then split and finally convert each item to integers:
int[] numbers = Console
.ReadLine()
.Split(new Char[] {' '}, StringSplitOptions.RemoveEmptyEntries)
.Select(item => int.Parse(item))
.ToArray();
You simply need to split the data entered.
string numbersLine = Console.ReadLine();
string[] numbers = numbersLine.Split(new char[] { ' '});
// Convert to int or whatever and use
This will help you to remove extra blank spaces present at the end or beginning of the input string.
string daat1String = Console.ReadLine();
daat1String = daat1String.TrimEnd().TrimStart();
string[] data1 = daat1String.Split(null);
int[] data1Int = Array.ConvertAll(data1, int.Parse);
You can do
int[] Numbers = Array.ConvertAll(Console.ReadLine().Split(' '),(item) => Convert.ToInt32(item));
The above line gets the individual integers from a line in which they are separated by a single space. Two or more spaces between numbers will result in an error.
int[] Numbers = Array.ConvertAll(Console.ReadLine().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries), (item) => Convert.ToInt32(item));
This variation fixes the error and works even when there are two or more spaces between numbers in a line.
you can use this function, it's very helpful
static List<string> inputs = new List<string>();
static int input_pointer = 0;
public static string cin(char sep = ' ')
{
if (input_pointer >= inputs.Count)
{
string line = Console.ReadLine();
inputs = line.Split(sep).OfType<string>().ToList();
input_pointer = 0;
}
string v = inputs[input_pointer];
input_pointer++;
return v;
}
Example:
for(var i =0; i<n ; i++)
for (var j = 0; j<n; j++)
{
M[i,j] = Convert.ToInt16(cin());
}

How do i remove all trailing space from word(both front and end white space) in comma seperated string

My string is like this:
Abc , xyz , pqr
Final output:
Abc,xyz,pqr
I want to remove all whitespace around my words (both before and after) whenever I encounter a comma in my string; the condition is that the string may contain commas or spaces as separators.
Eg:
Abc pqr, ttt ooo
output:
Abc,pqr,ttt,ooo
(no space before or after the word)
So all spaces and commas are separators and you want to remove all consecutive duplicates. You can use String.Split with StringSplitOptions.RemoveEmptyEntries and String.Join:
string[] parts = input.Split(new []{' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
string result = string.Join(",", parts);
This is my favourite since it's readable, maintainable and efficient. I've tested it against a regex approach with a 60000-length string and 1000 repetitions:
Regex: 11.68 seconds
String.Split + String.Join: 1.28 seconds
But if the string is very large you might want to use a StringBuilder approach.
This is the best result so far:
public static string SplitAnyKeepSingleSeparator(string input, string separator, params char[] delimiter)
{
if(input == null) return null;
input = input.Trim(delimiter);
StringBuilder sb = new StringBuilder(input.Length);
int index = 0;
int delimiterIndex = input.IndexOfAny(delimiter);
while (delimiterIndex != -1)
{
string token = input.Substring(index, delimiterIndex - index);
sb.Append(token).Append(separator);
index = delimiterIndex + 1;
while (delimiter.Contains(input[index])) index++;
delimiterIndex = input.IndexOfAny(delimiter, index);
}
sb.Append(input.Substring(index));
return sb.ToString();
}
But with a 60,000 character-string it's still less efficient than the String.Split+Join approach.
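For reference, here is the String.Split + String.Join approach applied to the two inputs from the question. The wrapper name Normalize is an assumption for illustration, and the snippet uses top-level statements:

```csharp
using System;

// Treat spaces and commas as separators, drop the empty entries that
// consecutive separators produce, then rejoin the words with single commas.
static string Normalize(string input)
{
    string[] parts = input.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
    return string.Join(",", parts);
}

Console.WriteLine(Normalize("Abc , xyz , pqr"));   // Abc,xyz,pqr
Console.WriteLine(Normalize("Abc pqr, ttt ooo"));  // Abc,pqr,ttt,ooo
```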
The simplest and the best in performance:
private static string SplitWordsByComma(string s)
{
    return Regex.Replace(s.Trim(' ', ','), @"(?<=\b\w+\b)[\s,]+", ",");
}
fastest (and properly working for bound cases):
private static string SplitWordsByComma(string s)
{
var sb = new StringBuilder(s.Length);
for (int i = 0; i < s.Length; i++)
{
while (i < s.Length && !char.IsLetter(s[i]))
{
i++;
}
while (i < s.Length && char.IsLetter(s[i]))
{
sb.Append(s[i++]);
}
sb.Append(',');
}
return sb.Remove(sb.Length - 1, 1).ToString();
}
This should work :)
string inputStr = "ABC, cde , fgh, IJk";
string outputStr = inputStr.Replace(" ", "");
OR
string outputStr = string.Join(",", inputStr.Split(new char[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries));
Try this one..
string input = "input:plumber, plumber output:plumber,,plumber";
input = input.Replace(" ", ",").Trim();
while (input.Contains(",,"))
{
input = input.Replace(",,", ",");
}
Edited...
I have tested the answers provided and compared with mine. (in VB.net)
Dim input As String = IO.File.ReadAllText("C:\Users\SARVESH\Desktop\abc.txt")
Dim stp As New Diagnostics.Stopwatch
stp.Start()
input = input.Replace(" ", ",").Trim()
While input.Contains(",,")
input = input.Replace(",,", ",")
End While
stp.Stop()
MessageBox.Show(stp.ElapsedMilliseconds)
input = IO.File.ReadAllText("C:\Users\SARVESH\Desktop\abc.txt")
stp.Reset()
stp.Restart()
input = System.Text.RegularExpressions.Regex.Replace(input.Trim(" "c, ","c), "(?<=\b\w+\b)[\s,]+", ",")
stp.Stop()
MessageBox.Show(stp.ElapsedMilliseconds)
input = IO.File.ReadAllText("C:\Users\SARVESH\Desktop\abc.txt")
stp.Reset()
stp.Restart()
Dim parts As String() = input.Split({" "c, ","c}, StringSplitOptions.RemoveEmptyEntries)
Dim result As String = String.Join(",", parts)
stp.Stop()
MessageBox.Show(stp.ElapsedMilliseconds)
Tested on a string of 10616532 characters.
Tim Schmelter's code works much faster; it does the task in 141 ms.
Mine does the same in 512 ms.
regex does the same in 1221 ms.
