Splitting a string in C#, why is this not working?

I have the following string:
string myString = " The objective for test.\vVision\v* Deliver a test goals\v** Comprehensive\v** Control\v* Alignment with cross-Equities strategy\vApproach\v*An acceleration ";
and I am trying to split on "\v".
I tried this, but it doesn't seem to work:
char[] delimiters = new char[] { '\v' };
string[] split = myString.Split(delimiters);
for (int i = 0; i < split.Length; i++) {
}
split.Length shows up as 1. Any suggestions?

"\v" is two characters, not one, in your original string: a backslash followed by a v. In that string the \ is not an escape character the way it is in a C# string literal.
You need to split on the literal two-character sequence, which means using the overload of Split that takes a string array:
string[] split = myString.Split(new string[] {"\\v"}, StringSplitOptions.None);
Note how the "\" character has to be escaped as "\\" in a regular string literal.
Your '\v', on the other hand, is a single control character (the vertical tab, U+000B), not two characters.
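If you are not sure which of the two forms your data actually contains, one option is to pass both as separators. This is a minimal sketch, not taken from the answers here; the class and method names (`VtSplitter`, `SplitOnVerticalTab`) are hypothetical:

```csharp
using System;

public static class VtSplitter
{
    // Hypothetical helper: splits on either the real vertical-tab character
    // ("\v", one char, U+000B) or the two-char sequence backslash + 'v' (@"\v").
    public static string[] SplitOnVerticalTab(string text)
    {
        // The two separators start with different characters, so they never overlap.
        string[] separators = { "\v", @"\v" };
        return text.Split(separators, StringSplitOptions.None);
    }
}
```

With the question's string this returns 8 pieces, whether it contains real vertical tabs or the backslash-v form.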
I think your question itself is slightly misleading.
Your example string, if entered into C# as written, will actually work the way you expected, because \v in a regular (non-verbatim) C# string literal is an escape sequence for a single special character, the vertical tab:
string test = " The objective for test.\vVision\v* Deliver a test goals\v** Comprehensive\v** Control\v* Alignment with cross-Equities strategy\vApproach\v*An acceleration ";
char[] delimiters = new char[] { '\v' };
Console.WriteLine(test.Split(delimiters).Length); // Prints 8
However, I think your actual string really does have backslash-v in it rather than escaped \v:
string test = " The objective for test.\\vVision\\v* Deliver a test goals\\v** Comprehensive\\v** Control\\v* Alignment with cross-Equities strategy\\vApproach\\v*An acceleration ";
char[] delimiters = new char[] { '\v' };
Console.WriteLine(test.Split(delimiters).Length); // Prints 1, like you say you see.
So you can fix it as described above by using an array of strings to split the string:
string test = " The objective for test.\\vVision\\v* Deliver a test goals\\v** Comprehensive\\v** Control\\v* Alignment with cross-Equities strategy\\vApproach\\v*An acceleration ";
string[] delimiters = new [] { "\\v" };
Console.WriteLine(test.Split(delimiters, StringSplitOptions.None).Length); // Prints 8
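To check which case you are in before choosing a fix, a small diagnostic helper can look for each form separately. This is a hypothetical sketch (`DelimiterCheck.Describe` is not a framework method); it uses an ordinal comparison for the two-character search, which is the safe choice for exact character matching:

```csharp
using System;

public static class DelimiterCheck
{
    // Reports which form of "\v" a string actually contains.
    public static string Describe(string text)
    {
        // IndexOf(char) always compares ordinally: looks for U+000B.
        bool hasControlChar = text.IndexOf('\v') >= 0;
        // Looks for a backslash followed by 'v'.
        bool hasBackslashV = text.IndexOf(@"\v", StringComparison.Ordinal) >= 0;

        if (hasControlChar && hasBackslashV) return "both";
        if (hasControlChar) return "vertical tab";
        if (hasBackslashV) return "backslash-v";
        return "none";
    }
}
```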

Use something like this:
string[] separators = {@"\v"};
string value = @"The objective for test.\vVision\v* Deliver a test goals\v** Comprehensive\v** Control\v* Alignment with cross-Equities strategy\vApproach\v*An acceleration";
string[] words = value.Split(separators, StringSplitOptions.RemoveEmptyEntries);
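A verbatim string literal (prefixed with @) does not treat \ as an escape character, so @"\v" is the same two-character string as "\\v", while "\v" is a single character. A tiny sketch to confirm the lengths (the `LiteralDemo` name is just for illustration):

```csharp
public static class LiteralDemo
{
    // 1: the vertical-tab control character
    public static int RegularLength() => "\v".Length;

    // 2: a backslash followed by 'v'
    public static int VerbatimLength() => @"\v".Length;

    // The verbatim form and the escaped-backslash form are the same two characters.
    public static bool VerbatimEqualsEscaped() => @"\v" == "\\v";
}
```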

If you don't need the resulting array for later use, you can combine the split and loop into one call.
foreach (string s in myString.Split(new[] { "\v" }, StringSplitOptions.RemoveEmptyEntries))
{
//s is the string you can now use in your loop
}
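The choice of StringSplitOptions matters when separators are adjacent or sit at the start or end of the string. A short sketch (hypothetical `SplitOptionsDemo` class) comparing the two options on the same input:

```csharp
using System;

public static class SplitOptionsDemo
{
    // Keeps the empty strings produced by leading, trailing, or doubled separators.
    public static string[] WithNone(string text) =>
        text.Split(new[] { "\v" }, StringSplitOptions.None);

    // Drops those empty strings.
    public static string[] WithoutEmpty(string text) =>
        text.Split(new[] { "\v" }, StringSplitOptions.RemoveEmptyEntries);
}
```

For "\vA\v\vB\v", WithNone returns five entries ("", "A", "", "B", "") and WithoutEmpty returns only "A" and "B".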

Use it like this:
string myString = " The objective for test.\vVision\v* Deliver a test goals\v** Comprehensive\v** Control\v* Alignment with cross-Equities strategy\vApproach\v*An acceleration";
string[] delimiters = new string[] { "\v" };
string[] split = myString.Split(delimiters, StringSplitOptions.None);
for (int i = 0; i < split.Length; i++) // start at 0, otherwise the first piece is skipped
{
}

Try running it like this.
public static string FirstName(string fullName)
{
if (fullName == null)
return null;
var split = fullName.Split(',');
return split.Length > 0 ? split[0] : string.Empty;
}

Related

Determine which character was used in String.Split()

If I am using String.Split() how can I find out which character caused the split? For instance, when "Apple|Car" splits, I want to know that it did so via the pipe character and not a comma or hyphen.
When I see the "Car" item, I'd want to know it was split from "Apple" with a pipe, and split from "Plane" with a comma.
var splitChars = new Char [] {'|', ',', '-'};
string item1 = "Apple|Car,Plane-Truck";
var mySplit = item1.Split(splitChars);
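Because every separator here is a single character, you can track the running length of the pieces and look up the character that follows each one:
string myMessage = "Apple|Car,Plane-Truck";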
//Break apart string
var splits = myMessage.Split(new Char[] { '|', ',', '-' });
int accumulated_length = 0;
foreach (string piece in splits)
{
accumulated_length += piece.Length + 1;
if (accumulated_length <= myMessage.Length)
{
Console.WriteLine("{0} was split at {1}", piece, myMessage[accumulated_length - 1]);
}
else
{
Console.WriteLine("{0} was the last one", piece);
}
}
It will split on all of them in the example you've given. But in general, you would just check which of the defined split characters are contained in the string:
var sourceString = "Apple|Car,Plane-Truck";
var allSplitChars = new[] {'|', ',', '-', '.', '!', '?'};
// Find only the characters that are contained in the source string
List<char> charsUsedToSplit = allSplitChars.Where(sourceString.Contains).ToList();
Any characters in the list will be used for the split. Can you clarify what you're actually trying to do? In your example, the tokens after the split will be "Apple", "Car", "Plane", "Truck", so each of your characters will be used to split.
If you're trying to determine which character caused the split for each token, then perhaps you might implement the split yourself and keep track:
List<Tuple<string, char>> Splitter(string msg, char[] chars) {
var offset = 0;
var splitChars = new HashSet<char>(chars);
var splits = new List<Tuple<string, char>>();
for (var idx = 0; idx < msg.Length; idx++) {
if (splitChars.Contains(msg[idx])) {
// Token that ends here, paired with the character that ended it
splits.Add(Tuple.Create(msg.Substring(offset, idx - offset), msg[idx]));
offset = idx + 1;
}
}
// The last token is not followed by a separator; '\0' marks that
splits.Add(Tuple.Create(msg.Substring(offset), '\0'));
return splits;
}
string myMessage = "Apple|Car,Plane-Truck";
var splits = Splitter(myMessage, new [] {'|', ',', '-'});
foreach (var piece in splits)
{
Console.WriteLine("word: {0}, split by: {1}", piece.Item1, piece.Item2);
}
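The question also asks to see, for a token such as "Car", both the separator before it and the one after it. A minimal sketch of that, assuming C# 7 value tuples; `SplitWithNeighbors.Split` is a hypothetical helper, and null marks the start or end of the string:

```csharp
using System;
using System.Collections.Generic;

public static class SplitWithNeighbors
{
    // For each token, records the separator that precedes it and the one that follows it.
    public static List<(string Token, char? Before, char? After)> Split(string msg, char[] separators)
    {
        var result = new List<(string Token, char? Before, char? After)>();
        int start = 0;
        char? before = null; // nothing precedes the first token

        while (true)
        {
            int idx = msg.IndexOfAny(separators, start);
            if (idx < 0)
            {
                // Last token: nothing follows it.
                result.Add((msg.Substring(start), before, null));
                return result;
            }
            result.Add((msg.Substring(start, idx - start), before, msg[idx]));
            before = msg[idx]; // this separator precedes the next token
            start = idx + 1;
        }
    }
}
```

For "Apple|Car,Plane-Truck" the entry for "Car" has Before = '|' and After = ','.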

reverse words in a string in c# keeping number of whitespaces same

I am trying to write a function to reverse the words in a string in C#.
Ex: "This is some text, hello world"
should be printed like
"world hello, text some is This". The number of white spaces must be the same in the reversed string, and special characters like the comma must be placed right after the preceding word, as shown in the reversed string.
I tried the following, but it does not handle special characters like ',':
public static string reverseStr(string s)
{
string result = "";
string word = "";
foreach (char c in s)
{
if (c == ' ')
{
result = word + ' ' + result;
word= "";
}
else
{
word = word + c;
}
}
result = word + ' ' + result;
return result;
}
What do you mean by "with special characters like comma"? Are there other characters that need to be treated differently? This turns "This is some text, hello world" into your expected result "world hello, text some is This":
string input = "This is some text, hello world";
string result = string.Join(" ", input.Split(' ', ',').Reverse()).Replace("  ", ", "); // ", " splits into an empty entry, which Join turns into a double space
UPDATE
If you want to handle every special character, you need a regex solution:
string result2 = string.Join(string.Empty, System.Text.RegularExpressions.Regex.Split(input, @"([^\w]+)").Reverse());
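Because the pattern has a capture group, Regex.Split keeps each run of non-word characters (spaces, ", " and so on) in the output, so reversing the array and concatenating it preserves the whitespace exactly. A self-contained sketch of that idea (the `WordReverser` name is hypothetical), using Array.Reverse instead of LINQ:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordReverser
{
    public static string Reverse(string input)
    {
        // Alternating word / non-word runs; the capture group keeps the separators.
        string[] parts = Regex.Split(input, @"(\W+)");
        Array.Reverse(parts);
        // Concatenate without adding anything, so whitespace counts are unchanged.
        return string.Concat(parts);
    }
}
```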
Here's a solution using regex:
Regex.Replace(
string.Join("", //3. Join reversed elements
Regex.Split(input, @"(\s+)|(,)") //1. Split by space and comma, keep delimiters
.Reverse()), //2. Reverse split elements
@"(\s+),", @",$1"); //4. Fix comma position in joined string
The following solution keeps all whitespace.
It first classifies every character as either a separator or word/content, and stores a list of chunks (each item contains the start and end index, together with a boolean telling whether the chunk contains separators or word characters).
Then it writes the chunks to a result string in reverse order.
The order of characters inside each chunk is preserved, whether the chunk is a separator or word/content: this also keeps any double space or other run of separators intact, without needing to post-check their sequence or quantity.
public static string Reverse(string text, Func<char, bool> separatorPredicate)
{
// Get all chars from source text
var aTextChars = text.ToCharArray();
// Find the start and end position of every chunk
var aChunks = new List<Tuple<int, int, bool>>();
{
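// Start with the kind of the first char, so a leading separator gets its own chunk
var bLast = aTextChars.Length > 0 && separatorPredicate(aTextChars[0]);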
var ixStart = 0;
// Loops all characters
for (int ixChar = 0; ixChar < aTextChars.Length; ixChar++)
{
var ch = aTextChars[ixChar];
// Current char is a separator?
var bNow = separatorPredicate(ch);
// Current char kind (separator/word) is different from previous
if ((ixChar > 0) && (bNow != bLast))
{
aChunks.Add(Tuple.Create(ixStart, ixChar - 1, bLast));
ixStart = ixChar;
bLast = bNow;
}
}
// Add remaining chars
aChunks.Add(Tuple.Create(ixStart, aTextChars.Length - 1, bLast));
}
var result = new StringBuilder();
// Loops all chunks in reverse order
for (int ixChunk = aChunks.Count - 1; ixChunk >= 0; ixChunk--)
{
var chunk = aChunks[ixChunk];
result.Append(text.Substring(chunk.Item1, chunk.Item2 - chunk.Item1 + 1));
}
return result.ToString();
}
public static string Reverse(string text, char[] separators)
{
return Reverse(text, ch => Array.IndexOf(separators, ch) >= 0);
}
public static string ReverseByPunctuation(string text)
{
return Reverse(text, new[] { ' ', '\t', '.', ',', ';', ':' });
}
public static string ReverseWords(string text)
{
return Reverse(text, ch => !char.IsLetterOrDigit(ch));
}
There are 4 methods:
Reverse(string text, Func<char, bool> separatorPredicate) receives the source text and a delegate that decides whether a character is a separator.
Reverse(string text, char[] separators) receives the source text and an array of chars to be treated as separators (any other char is word/content).
ReverseByPunctuation(string text) receives only the source text and delegates the computation to the second overload, passing a predefined set of separator chars.
ReverseWords(string text) receives only the source text and delegates the computation to the first overload, passing a delegate that treats everything that is not a letter or digit as a separator.

Indent multiple lines of text

I need to indent multiple lines of text (in contrast to this question for a single line of text).
Let's say this is my input text:
First line
Second line
Last line
What I need is this result:
    First line
    Second line
    Last line
Notice the indentation in each line.
This is what I have so far:
var textToIndent = @"First line
Second line
Last line.";
var splittedText = textToIndent.Split(new string[] {Environment.NewLine}, StringSplitOptions.None);
var indentAmount = 4;
var indent = new string(' ', indentAmount);
var sb = new StringBuilder();
foreach (var line in splittedText) {
sb.Append(indent);
sb.AppendLine(line);
}
var result = sb.ToString();
Is there a safer/simpler way to do it?
My concern is the Split call, which might be tricky if text is transferred between Linux, Mac and Windows: new lines might not be split correctly on the target machine.
Since you are indenting all the lines, how about doing something like:
var result = indent + textToIndent.Replace("\n", "\n" + indent);
This covers both Windows (\r\n) and Unix (\n) line endings, since both end with \n.
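If old Mac-style line breaks (a lone \r) can also occur, one option is to match all three kinds of line break with a regular expression and put the indent after each one. A sketch under that assumption; `IndentHelper.Indent` is a hypothetical helper:

```csharp
using System.Text.RegularExpressions;

public static class IndentHelper
{
    // Windows (\r\n), old Mac (\r) or Unix (\n) line break.
    // \r\n is listed first so it is consumed as a single break.
    private static readonly Regex LineBreak = new Regex(@"\r\n|\r|\n");

    // Prefixes every line with `indent`, keeping each original line break as it was.
    public static string Indent(string text, string indent) =>
        indent + LineBreak.Replace(text, m => m.Value + indent);
}
```

Like the Replace one-liner, this also adds an indent after a trailing line break.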
Just replace your newline with newline + indent:
var indentAmount = 4;
var indent = new string(' ', indentAmount);
textToIndent = indent + textToIndent.Replace(Environment.NewLine, Environment.NewLine + indent);
The following solution may seem long-winded compared to other solutions posted here; but it has a few distinct advantages:
It will preserve line separators / terminators exactly as they are in the input string.
It will not append superfluous indentation characters at the end of the string.
It might run faster, as it uses only very primitive operations (character comparisons and copying; no substring searches, nor regular expressions). (But that's just my expectation; I haven't actually measured.)
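using System.Text;

public static class StringExtensions
{
public static string Indent(this string str, int count = 1, char indentChar = ' ')
{
var indented = new StringBuilder();
var i = 0;
while (i < str.Length)
{
// Indent the line that starts at position i
indented.Append(indentChar, count);
// Search from i itself, so an empty line ("\n" at position i) is handled too
var j = str.IndexOf('\n', i);
if (j >= i)
{
// Copy the line including its '\n' (a preceding '\r' is copied as well)
indented.Append(str, i, j - i + 1);
i = j + 1;
}
else
{
break;
}
}
// Copy the last line; this is empty if the text ended with '\n'
indented.Append(str, i, str.Length - i);
return indented.ToString();
}
}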
Stakx's answer got me thinking about not appending superfluous indentation characters. I think it is best to avoid those characters not only at the end, but also in the middle and at the beginning of the string (on lines that contain nothing else).
I used a Regex to replace new lines only if they are not followed by another new line or by the end of the string, and another Regex to avoid adding the first indent in case the string begins with a new line:
using System.Text.RegularExpressions;
Regex regexForReplace = new Regex(@"(\n)(?![\r\n]|$)");
Regex regexForFirst = new Regex(@"^([\r\n]|$)");
string Indent(string textToIndent, int indentAmount = 1, char indentChar = ' ')
{
var indent = new string(indentChar, indentAmount);
string firstIndent = regexForFirst.Match(textToIndent).Success ? "" : indent;
return firstIndent + regexForReplace.Replace(textToIndent, @"$1" + indent);
}
I create the Regex objects outside the method in order to speed up multiple replacements.
This solution can be tested at: https://ideone.com/9yu5Ih
If you need a string extension that adds a generic indent to a multi line string you can use:
public static string Indent(this string input, string indent)
{
return string.Join(Environment.NewLine, input.Split(Environment.NewLine).Select(item => string.IsNullOrEmpty(item.Trim()) ? item : indent + item));
}
This extension skips empty and whitespace-only lines.
This solution is really simple to understand if you know LINQ, and it's simpler to debug and change if you need to adapt it to different scopes.

Eliminating characters from string

I am reading from a file, and some of the data is coming in like this:
"\"ZIP\""
so when I try to assign it, it's causing errors. I want to get rid of the extra \", so if I assign it to a string like
string s = data[1].ToString();
then s is "\"ZIP\"".
I just want it to be "ZIP". I tried:
string s = data[1].ToString().replace("\\\"","");
but no luck. Any help would be much appreciated.
Just try:
var result = "\"ZIP\"".Replace("\"", "");
Or:
var result = "\"ZIP\"".Trim('"');
String.Trim can be used with an array of chars to remove those characters from the start and end of a string:
char[] charsToTrim = { '"', '\\'};
string s = data[1].ToString().Trim(charsToTrim);
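Assuming "\"ZIP\"" is how the debugger displays the value, that explains why replacing backslash-quote finds nothing: the debugger escapes the quotes for display, but the string itself is just the five characters "ZIP" with a double quote at each end. There is no backslash to remove. A small sketch to show it (`QuoteDemo` is an illustrative name):

```csharp
public static class QuoteDemo
{
    // Five characters: a double quote, Z, I, P, a double quote. No backslash.
    public const string Raw = "\"ZIP\"";

    // Searches for backslash + quote, which does not occur, so nothing changes.
    public static string ReplaceBackslashQuote(string s) => s.Replace("\\\"", "");

    // Removes the double quotes at both ends.
    public static string TrimQuotes(string s) => s.Trim('"');
}
```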
Alternatively, remove the quote characters by splitting on them and building a new string. You can include as many characters in the separator array as you want.
StringBuilder sb = new StringBuilder();
string[] parts = inputString.Split(new char[] { '"' }, StringSplitOptions.RemoveEmptyEntries);
int size = parts.Length;
for (int i = 0; i < size; i++)
sb.Append(parts[i]);
string strWithoutEscape = sb.ToString();
Try:
data[1].ToString().Replace("\"", "");
Notice that C# method names are case-sensitive: the methods are ToString and Replace, so toString or replace will not compile.

Finding a char from an array, spliting at that point and then inserting another char after

I am basically looking for a way to check whether a string contains any of a certain list of chars, and if it contains one of them, to split the string and then insert the same char in front of/after it. This is because these chars break my search when they are input, since SQL does not handle them well.
This is how far I have actually got so far:
string[] errorChars = new string[]
{
"!",
"}",
"{",
"'",
};
for (int i = 0; i < errorChars.Count(); i++)
{
if(fTextSearch.Contains(errorChars[i]))
{
}
}
The problem with several answers (in their current rendition) is that they are dropping your split character. If you need to keep your split character, try this:
StringBuilder sb = new StringBuilder();
string[] splitString = fTextSearch.Split(errorChars, StringSplitOptions.None);
int numNewCharactersAdded = 0;
foreach( string itm in splitString)
{
sb.Append(itm); //append string
if (fTextSearch.Length > (sb.Length - numNewCharactersAdded))
{
sb.Append(fTextSearch[sb.Length - numNewCharactersAdded]); //append splitting character
sb.Append(fTextSearch[sb.Length - numNewCharactersAdded - 1]); //append it again
numNewCharactersAdded ++;
}
}
fTextSearch = sb.ToString();
Here's an IDEOne example
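The same doubling can also be done in a single pass without Split, by copying the string character by character. A sketch (the `CharDoubler.DoubleChars` name is hypothetical); whether doubling is the right way to escape these characters depends on your database and on how the search string is used:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharDoubler
{
    // Copies the input, writing every character from specialChars twice.
    public static string DoubleChars(string input, IEnumerable<char> specialChars)
    {
        var special = new HashSet<char>(specialChars);
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            sb.Append(c);
            if (special.Contains(c))
                sb.Append(c); // write it a second time
        }
        return sb.ToString();
    }
}
```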
I think what you really want is a replace function. Strings are immutable, so Replace returns a new string that you have to assign back:
for (int i = 0; i < errorChars.Length; i++)
{
if (fTextSearch.Contains(errorChars[i]))
{
fTextSearch = fTextSearch.Replace(errorChars[i], errorChars[i] + errorChars[i]);
}
}
Doubling up the character may not be the answer, though. If you need the escape character \ instead, the replace call would be
fTextSearch = fTextSearch.Replace(errorChars[i], "\\" + errorChars[i]);
