C# - Identify the matching character when using String.Split(CharArray)

C# - Identify the matching character when using String.Split(CharArray) - c#

If I use the Split() function on a string, passing in various split characters as a char[] parameter, and given that the matching split character is removed from the string, how can I identify which character it matched & split on?
string inputString = "Hello, there| world";
char[] splitChars = new char[] { ',','|' }
foreach(string section in inputString.Split(splitChars))
{
Console.WriteLine(section) // [0] Hello [1} there [2] world (no splitChars)
}
I understand that perhaps its not possible to retain this information with my approach. If thats the case, could you suggest an alternative approach?

The C# Regex.Split() method documented here can return the split characters as well as the words between them.
string inputString = "Hello, there| world";
string pattern = #"(,)|([|])";
foreach (string result in Regex.Split(inputString, pattern))
{
Console.WriteLine("'{0}'", result);
}
the result is:
'Hello'
','
' there'
'|'
' world'

Use the Regex.Split() method. I have wrapped this method in the following extension method that is as easy to use as string.Split() itself:
public static string[] ExtendedSplit(this string input, char[] splitChars)
{
string pattern = string.Join("|", splitChars.Select(x => "(" + Regex.Escape(x.ToString()) + ")"));
return Regex.Split(input, pattern);
}
Usage:
string inputString = "Hello, there| world";
char[] splitChars = new char[]{',', '|'};
foreach (string result in inputString.ExtendedSplit(splitChars))
{
Console.WriteLine("'{0}'", result);
}
Output:
'Hello'
','
' there'
'|'
' world'

No, but its rather trivial to write one yourself. Remember, framework methods aren't magic, somebody wrote them. If something doesn't exactly match your needs, write one that does!
static IEnumerable<(string Sector, char Separator)> Split(
this string s,
IEnumerable<char> separators,
bool removeEmptyEntries)
{
var buffer = new StringBuilder();
var separatorsSet = new HashSet<char>(separators);
foreach (var c in s)
{
if (separatorsSet.Contains(c))
{
if (!removeEmptyEntries || buffer.Length > 0)
yield return (buffer.ToString(), c);
buffer.Clear();
}
else
buffer.Append(c);
}
if (buffer.Length > 0)
yield return (buffer.ToString(), default(char));
}

Related

Split word from string

I use this method for splitting words from string, but \n doesn't consider. How can I solve it?
public string SplitXWord(string text, int wordCount)
{
string output = "";
IEnumerable<string> words = text.Split().Take(wordCount);
foreach (string word in words)
{
output += " " + word;
}
return output;
}

Well, string.Split() splits by white-spaces only
https://learn.microsoft.com/en-us/dotnet/api/system.string.split?view=net-6.0
Split is used to break a delimited string into substrings. You can use either a character array or a string array to specify zero or more delimiting characters or strings. If no delimiting characters are specified, the string is split at white-space characters.
bold is mine.
So far so good, string.Split() splits on spaces ' ', tabulation '\t', new line '\n', carriage return '\r' etc.:
Console.Write(string.Join(", ", "a\nb\rc\td e".Split()));
produces
a, b, c, d, e
If you want to split on your cown delimiters, you should prvide them:
Console.Write(string.Join(", ", "a\nb\rc\td e".Split(new char[] {' ', '\t'})));
note that \r and \n are preserved, when splitted on ' ' and 't'
a
b
c, d, e
So, it seems that your method should be something like this:
using System.Linq;
...
//DONE: static - we don't want this here
public static string SplitXWord(string text, int wordCount) {
//DONE: don't forget about degenerated cases
if (string.IsNullOrWhiteSpace(text) || wordCount <= 0)
return "";
//TODO: specify delimiters on which you want to split
return string.Join(" ", text
.Split(
new char[] { ' ', '\t' },
wordCount + 1,
StringSplitOptions.RemoveEmptyEntries)
.Take(wordCount));
}

Use the overload of Split method which accepts an array of char separators and clears the empty entries
string str = "my test \n\r string \n is here";
string[] words = str.Split(new []{' ', '\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
UPDATE:
Another solution with regex and keeping line characters:
string str = "my test\r\n string\n is here";
var wordsByRegex = Regex.Split(str, #"(?= ).+?(\r|\n|\r\n)?").ToList();

fiddle
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp17
{
class Program
{
static void Main(string[] args)
{
string myStr = "hello my friend \n whats up \n bro";
string[] mySplitStr = myStr.Split("\n");
mySplitStr.ToList().ForEach(str=>{
Console.WriteLine(str);
//to remove the white spaces
//Console.WriteLine(str.Replace(" ",""));
});
Console.ReadLine();
}
}
}

How can I trim strings down to the first occurrence of a "；" or a "["

I have c# strings that look like this:
"かたむく；かたぶく[ok]"
"そば[側,傍]；そく[側]；はた"
"くすり"
"おととい[gikun]；おとつい[gikun]；いっさくじつ"
How can I trim these down so that the output has only text up to the first occurrence of a "；" character (not a normal semicolon) or a "[" or if neither are present then the new string would be the same as the existing.
"かたむく"
"そば"
"くすり"
"おととい"
Is that something that would be best done with Regex or should I use some indexOf type of code to do this?

You don't need a Regex, just string.IndexOfAny. Something like:
var inputs = new[]
{
"かたむく；かたぶく[ok]",
"そば[側,傍]；そく[側]；はた",
"くすり",
"おととい[gikun]；おとつい[gikun]；いっさくじつ"
};
var separators = new[] {' ', '['};
foreach (var input in inputs)
{
var separatorPosition = input.IndexOfAny(separators);
if (separatorPosition >= 0)
{
Debug.WriteLine($"Split: {input.Substring(0, separatorPosition)}");
}
else
{
Debug.WriteLine($"No Split: {input}");
}
}
I get the following output from your inputs:
Split: かたむく；かたぶく
Split: そば
No Split: くすり
Split: おととい
It doesn't quite match what you show, but I think it's correct (and what you show isn't)

Expanding on my comment, "IndexOf can be used to find the first index of the [ character, and Substring can return the string up to that point."
public static string GetSubstringToChar(string input, char delimeter = '[')
{
if (input == null || !input.Contains(delimeter)) return input;
return input.Substring(0, input.IndexOf(delimeter));
}
To make this work with multiple delimeters, we can pass in an array of delimeter characters and use IndexOfAny:
public static string GetSubstringToChar(string input, char[] delimeters)
{
if (input == null || !input.Any(delimeters.Contains)) return input;
return input.Substring(0, input.IndexOfAny(delimeters));
}
You could then call this like:
var strings = new List<string>
{
"かたむく；かたぶく[ok]",
"そば[側,傍]；そく[側]；はた",
"くすり",
"おととい[gikun]；おとつい[gikun]；いっさくじつ",
};
var delimeters = new[] { '；', '[' };
foreach (var str in strings)
{
Console.WriteLine(GetSubstringToChar(str, delimeters));
}

An extension method with a little validation will do the job.
public static string GetUntil(this string input, char[] delimiters)
{
if (input == null || input.IndexOfAny(delimiters) == -1)
return input;
else
return input.Split(delimiters)[0];
}
then call like:
var test = "かたむく；かたぶく[ok]".GetUntil(new char[] { ' ', '[' });

reverse words BUT dot should be at the end

I have sentence:
"I love Marry."
and I would like to get:
"Marry love I." (dot at the end)
How can I do that?
public static string ReverseWords(string originalString)
{
return string.Join(" ", originalString.Split(' ').Where(x => !string.IsNullOrEmpty(x)).Reverse());
}

You can remove the last '.' before the split.
Demo:
public static string ReverseWords(string originalString)
{
var input = originalString.EndsWith(".") ? originalString.Remove(originalString.Length - 1) : originalString; // will trim ending '.'
return string.Join(" ", input.Split().Reverse()) + ".";
}
Try it online!

Try this. I am making it into several statements for readability.
var words = originalString.Split(new [] {' ', '.'}, StringSplitOptions.RemoveEmptyEntries).Reverse();
That gets your words in reverse order, and avoids the need for your Where clause. Then join them back with the period:
return string.Join(' ', words) + '.';

Do it in two steps where you split on . first;
return
string.Join(".",
originalString.Split('.')
.ToList()
.Select(s => string.Join(" ", s.Split(' ').Where(x => !string.IsNullOrEmpty(x)).Reverse())));

For single sentences, remove the dot and append it again in the end.
To remove the dot you can use TrimEnd which will remove all dots from the end of the string. If there is none, nothing is removed:
public static string ReverseWords(string originalString)
{
originalString = originalString.TrimEnd('.');
originalString = string.Join(" ", originalString.Split(' ').Where(x => !string.IsNullOrEmpty(x)).Reverse());
return originalString + ".";
}
For multiple senctences you can split the input string at the ., which will give you an array of sentences without dots. Then you simply reverse each part, append a dot and put them back together (I used a StringBuilder to do that):
public static string ReverseWordsMultiple(string originalString)
{
String[] sentences = originalString.Split(new char[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
StringBuilder builder = new StringBuilder();
foreach (String senctence in sentences)
{
builder.Append(string.Join(" ", senctence.Split(' ').Where(x => !string.IsNullOrEmpty(x)).Reverse()));
builder.Append(". ");
}
return builder.ToString().TrimEnd();
}

Find substring ignoring specified characters

Do any of you know of an easy/clean way to find a substring within a string while ignoring some specified characters to find it. I think an example would explain things better:
string: "Hello, -this- is a string"
substring to find: "Hello this"
chars to ignore: "," and "-"
found the substring, result: "Hello, -this"
Using Regex is not a requirement for me, but I added the tag because it feels related.
Update:
To make the requirement clearer: I need the resulting substring with the ignored chars, not just an indication that the given substring exists.
Update 2:
Some of you are reading too much into the example, sorry, i'll give another scenario that should work:
string: "?A&3/3/C)412&"
substring to find: "A41"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)41"
And as a bonus (not required per se), it will be great if it's also not safe to assume that the substring to find will not have the ignored chars on it, e.g.: given the last example we should be able to do:
substring to find: "A3C412&"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)412&"
Sorry if I wasn't clear before, or still I'm not :).
Update 3:
Thanks to everyone who helped!, this is the implementation I'm working with for now:
http://www.pastebin.com/pYHbb43Z
An here are some tests:
http://www.pastebin.com/qh01GSx2
I'm using some custom extension methods I'm not including but I believe they should be self-explainatory (I will add them if you like)
I've taken a lot of your ideas for the implementation and the tests but I'm giving the answer to #PierrOz because he was one of the firsts, and pointed me in the right direction.
Feel free to keep giving suggestions as alternative solutions or comments on the current state of the impl. if you like.

in your example you would do:
string input = "Hello, -this-, is a string";
string ignore = "[-,]*";
Regex r = new Regex(string.Format("H{0}e{0}l{0}l{0}o{0} {0}t{0}h{0}i{0}s{0}", ignore));
Match m = r.Match(input);
return m.Success ? m.Value : string.Empty;
Dynamically you would build the part [-, ] with all the characters to ignore and you would insert this part between all the characters of your query.
Take care of '-' in the class []: put it at the beginning or at the end
So more generically, it would give something like:
public string Test(string query, string input, char[] ignorelist)
{
string ignorePattern = "[";
for (int i=0; i<ignoreList.Length; i++)
{
if (ignoreList[i] == '-')
{
ignorePattern.Insert(1, "-");
}
else
{
ignorePattern += ignoreList[i];
}
}
ignorePattern += "]*";
for (int i = 0; i < query.Length; i++)
{
pattern += query[0] + ignorepattern;
}
Regex r = new Regex(pattern);
Match m = r.Match(input);
return m.IsSuccess ? m.Value : string.Empty;
}

Here's a non-regex string extension option:
public static class StringExtensions
{
public static bool SubstringSearch(this string s, string value, char[] ignoreChars, out string result)
{
if (String.IsNullOrEmpty(value))
throw new ArgumentException("Search value cannot be null or empty.", "value");
bool found = false;
int matches = 0;
int startIndex = -1;
int length = 0;
for (int i = 0; i < s.Length && !found; i++)
{
if (startIndex == -1)
{
if (s[i] == value[0])
{
startIndex = i;
++matches;
++length;
}
}
else
{
if (s[i] == value[matches])
{
++matches;
++length;
}
else if (ignoreChars != null && ignoreChars.Contains(s[i]))
{
++length;
}
else
{
startIndex = -1;
matches = 0;
length = 0;
}
}
found = (matches == value.Length);
}
if (found)
{
result = s.Substring(startIndex, length);
}
else
{
result = null;
}
return found;
}
}

EDIT: here's an updated solution addressing the points in your recent update. The idea is the same except if you have one substring it will need to insert the ignore pattern between each character. If the substring contains spaces it will split on the spaces and insert the ignore pattern between those words. If you don't have a need for the latter functionality (which was more in line with your original question) then you can remove the Split and if checking that provides that pattern.
Note that this approach is not going to be the most efficient.
string input = #"foo ?A&3/3/C)412& bar A341C2";
string substring = "A41";
string[] ignoredChars = { "&", "/", "3", "C", ")" };
// builds up the ignored pattern and ensures a dash char is placed at the end to avoid unintended ranges
string ignoredPattern = String.Concat("[",
String.Join("", ignoredChars.Where(c => c != "-")
.Select(c => Regex.Escape(c)).ToArray()),
(ignoredChars.Contains("-") ? "-" : ""),
"]*?");
string[] substrings = substring.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string pattern = "";
if (substrings.Length > 1)
{
pattern = String.Join(ignoredPattern, substrings);
}
else
{
pattern = String.Join(ignoredPattern, substring.Select(c => c.ToString()).ToArray());
}
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine("Index: {0} -- Match: {1}", match.Index, match.Value);
}
Try this solution out:
string input = "Hello, -this- is a string";
string[] searchStrings = { "Hello", "this" };
string pattern = String.Join(#"\W+", searchStrings);
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine(match.Value);
}
The \W+ will match any non-alphanumeric character. If you feel like specifying them yourself, you can replace it with a character class of the characters to ignore, such as [ ,.-]+ (always place the dash character at the start or end to avoid unintended range specifications). Also, if you need case to be ignored use RegexOptions.IgnoreCase:
Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
If your substring is in the form of a complete string, such as "Hello this", you can easily get it into an array form for searchString in this way:
string[] searchString = substring.Split(new[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);

This code will do what you want, although I suggest you modify it to fit your needs better:
string resultString = null;
try
{
resultString = Regex.Match(subjectString, "Hello[, -]*this", RegexOptions.IgnoreCase).Value;
}
catch (ArgumentException ex)
{
// Syntax error in the regular expression
}

You could do this with a single Regex but it would be quite tedious as after every character you would need to test for zero or more ignored characters. It is probably easier to strip all the ignored characters with Regex.Replace(subject, "[-,]", ""); then test if the substring is there.
Or the single Regex way
Regex.IsMatch(subject, "H[-,]*e[-,]*l[-,]*l[-,]*o[-,]* [-,]*t[-,]*h[-,]*i[-,]*s[-,]*")

Here's a non-regex way to do it using string parsing.
private string GetSubstring()
{
string searchString = "Hello, -this- is a string";
string searchStringWithoutUnwantedChars = searchString.Replace(",", "").Replace("-", "");
string desiredString = string.Empty;
if(searchStringWithoutUnwantedChars.Contains("Hello this"))
desiredString = searchString.Substring(searchString.IndexOf("Hello"), searchString.IndexOf("this") + 4);
return desiredString;
}

You could do something like this, since most all of these answer require rebuilding the string in some form.
string1 is your string you want to look through
//Create a List(Of string) that contains the ignored characters'
List<string> ignoredCharacters = new List<string>();
//Add all of the characters you wish to ignore in the method you choose
//Use a function here to get a return
public bool subStringExist(List<string> ignoredCharacters, string myString, string toMatch)
{
//Copy Your string to a temp
string tempString = myString;
bool match = false;
//Replace Everything that you don't want
foreach (string item in ignoredCharacters)
{
tempString = tempString.Replace(item, "");
}
//Check if your substring exist
if (tempString.Contains(toMatch))
{
match = true;
}
return match;
}

You could always use a combination of RegEx and string searching
public class RegExpression {
public static void Example(string input, string ignore, string find)
{
string output = string.Format("Input: {1}{0}Ignore: {2}{0}Find: {3}{0}{0}", Environment.NewLine, input, ignore, find);
if (SanitizeText(input, ignore).ToString().Contains(SanitizeText(find, ignore)))
Console.WriteLine(output + "was matched");
else
Console.WriteLine(output + "was NOT matched");
Console.WriteLine();
}
public static string SanitizeText(string input, string ignore)
{
Regex reg = new Regex("[^" + ignore + "]");
StringBuilder newInput = new StringBuilder();
foreach (Match m in reg.Matches(input))
{
newInput.Append(m.Value);
}
return newInput.ToString();
}
}
Usage would be like
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this"); //Should match
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this2"); //Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A41"); // Should match
RegExpression.Example("?A&3/3/C) 412&", "&/3C\\)", "A41"); // Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A3C412&"); // Should match
Output
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this
was matched
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this2
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A41
was matched
Input: ?A&3/3/C) 412&
Ignore: &/3C)
Find: A41
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A3C412&
was matched

How do I split a string by a multi-character delimiter in C#?

What if I want to split a string using a delimiter that is a word?
For example, This is a sentence.
I want to split on is and get This and a sentence.
In Java, I can send in a string as a delimiter, but how do I accomplish this in C#?

http://msdn.microsoft.com/en-us/library/system.string.split.aspx
Example from the docs:
string source = "[stop]ONE[stop][stop]TWO[stop][stop][stop]THREE[stop][stop]";
string[] stringSeparators = new string[] {"[stop]"};
string[] result;
// ...
result = source.Split(stringSeparators, StringSplitOptions.None);
foreach (string s in result)
{
Console.Write("'{0}' ", String.IsNullOrEmpty(s) ? "<>" : s);
}

You can use the Regex.Split method, something like this:
Regex regex = new Regex(#"\bis\b");
string[] substrings = regex.Split("This is a sentence");
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
Edit: This satisfies the example you gave. Note that an ordinary String.Split will also split on the "is" at the end of the word "This", hence why I used the Regex method and included the word boundaries around the "is". Note, however, that if you just wrote this example in error, then String.Split will probably suffice.

Based on existing responses on this post, this simplify the implementation :)
namespace System
{
public static class BaseTypesExtensions
{
/// <summary>
/// Just a simple wrapper to simplify the process of splitting a string using another string as a separator
/// </summary>
/// <param name="s"></param>
/// <param name="pattern"></param>
/// <returns></returns>
public static string[] Split(this string s, string separator)
{
return s.Split(new string[] { separator }, StringSplitOptions.None);
}
}
}

string s = "This is a sentence.";
string[] res = s.Split(new string[]{ " is " }, StringSplitOptions.None);
for(int i=0; i<res.length; i++)
Console.Write(res[i]);
EDIT: The "is" is padded on both sides with spaces in the array in order to preserve the fact that you only want the word "is" removed from the sentence and the word "this" to remain intact.

...In short:
string[] arr = "This is a sentence".Split(new string[] { "is" }, StringSplitOptions.None);

Or use this code; ( same : new String[] )
.Split(new[] { "Test Test" }, StringSplitOptions.None)

You can use String.Replace() to replace your desired split string with a character that does not occur in the string and then use String.Split on that character to split the resultant string for the same effect.

Here is an extension function to do the split with a string separator:
public static string[] Split(this string value, string seperator)
{
return value.Split(new string[] { seperator }, StringSplitOptions.None);
}
Example of usage:
string mystring = "one[split on me]two[split on me]three[split on me]four";
var splitStrings = mystring.Split("[split on me]");

var dict = File.ReadLines("test.txt")
.Where(line => !string.IsNullOrWhitespace(line))
.Select(line => line.Split(new char[] { '=' }, 2, 0))
.ToDictionary(parts => parts[0], parts => parts[1]);
or
enter code here
line="to=xxx#gmail.com=yyy#yahoo.co.in";
string[] tokens = line.Split(new char[] { '=' }, 2, 0);
ans:
tokens[0]=to
token[1]=xxx#gmail.com=yyy#yahoo.co.in

string strData = "This is much easier"
int intDelimiterIndx = strData.IndexOf("is");
int intDelimiterLength = "is".Length;
str1 = strData.Substring(0, intDelimiterIndx);
str2 = strData.Substring(intDelimiterIndx + intDelimiterLength, strData.Length - (intDelimiterIndx + intDelimiterLength));

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# - Identify the matching character when using String.Split(CharArray) - c#

Related

Split word from string

How can I trim strings down to the first occurrence of a "；" or a "["

reverse words BUT dot should be at the end

Find substring ignoring specified characters

How do I split a string by a multi-character delimiter in C#?

Categories

Resources