How to use Regex to split a string AND include whitespace - c#

I can't seem to find (or write) a simple way of splitting the following sentence into words while also returning the whitespace between the words as separate elements.
(VS 2010, C#, .net4.0).
String text = "This is a test.";
Desired result:
[0] = This
[1] = " "
[2] = is
[3] = " "
[4] = a
[5] = " "
[6] = test.
The closest I have come is:
string[] words = Regex.Split(text, @"\s");
but of course, this drops the whitespace.
Suggestions are appreciated. Thanks
Edit: There may be one or more spaces between the words. I would like each run of spaces between two words to be returned as a single "word" that contains all of those spaces. E.g., if there are 5 spaces between two words, that element would be:
String spaceword = "     "; // a string of 5 spaces

Change your pattern to (\s+):
String text = "This        is a   test.";
string[] words = Regex.Split(text, @"(\s+)");
for (int i = 0; i < words.Length; i++)
{
    Console.WriteLine(i.ToString() + "," + words[i].Length.ToString() + " = " + words[i]);
}
Here's the output:
0,4 = This
1,8 =
2,2 = is
3,1 =
4,1 = a
5,3 =
6,5 = test.
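The reason this works is that Regex.Split includes the text matched by capturing groups in the returned array. Below is a minimal sketch of that behavior; the class and method names are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitKeepingWhitespace
{
    // Hypothetical helper: splits on runs of whitespace. Because the pattern
    // is wrapped in a capturing group, Regex.Split returns each run as its
    // own array element instead of discarding it.
    public static string[] SplitWithSeparators(string text)
    {
        return Regex.Split(text, @"(\s+)");
    }

    static void Main()
    {
        // Brackets make the whitespace-only elements visible.
        foreach (string part in SplitWithSeparators("This  is a test."))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

One thing to keep in mind: if the input starts or ends with whitespace, the array begins or ends with an empty string, so you may want to filter those out.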

You can use LINQ to add spaces manually between them:
var parts = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var result = parts.SelectMany((x, idx) => idx != parts.Length - 1
        ? new[] { x, " " }
        : new[] { x }).ToList();

You can try this regex, \S+|\s+, which uses the alternation (or) operator |:
var arr = Regex.Matches(text, @"\S+|\s+").Cast<Match>()
    .Select(i => i.Value)
    .ToArray();
It matches runs of non-whitespace (words) and runs of whitespace alike; the LINQ calls just turn the matches into a string array, arr.
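As a rough sketch of how the matching approach behaves, the snippet below wraps it in a helper (the names are invented for illustration). Since every character is either whitespace or non-whitespace, concatenating the tokens should give back the original string, and unlike Regex.Split it produces no empty entries when the input starts with spaces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TokenizeWithMatches
{
    // Hypothetical helper: returns alternating runs of non-whitespace and
    // whitespace, in the order they appear in the input.
    public static string[] Tokenize(string text)
    {
        return Regex.Matches(text, @"\S+|\s+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        // Leading whitespace becomes the first token; nothing is dropped.
        foreach (string token in Tokenize("  This is   a test."))
        {
            Console.WriteLine("[" + token + "]");
        }
    }
}
```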

Related

Ignore regex separator between "display name"

I need to split the string on the separator ">,", but this separator is also allowed inside the quoted "" part that holds the display name, e.g.:
""display>, name>," <email@tegg.com>, "<display,> >" <display_a@email.com>";
I need it to be separated like:
[["display>, name>," <email@tegg.com>,] ["<display,> >" <display_a@email.com>"]]
At the moment I'm using this:
aux = Regex.Split(addresses, @"(?<=\>,)");
But this doesn't work when the display name contains ">,"
E.g. str:
str = "\"Some name>,\" <example@email.com>, \"<display,> >\" <display_a@email.com>'";
You can use
var matches = Regex.Matches(str, @"""([^""]*)""\s*<([^<>]*)>")
    .Cast<Match>()
    .Select(x => new[] { x.Groups[1].Value, x.Groups[2].Value })
    .ToArray();
See the C# demo:
var p = @"""([^""]*)""\s*<([^<>]*)>";
var str = "\"Some name>,\" <example@email.com>, \"<display,> >\" <display_a@email.com>'";
var matches = Regex.Matches(str, p).Cast<Match>().Select(x => new[] { x.Groups[1].Value, x.Groups[2].Value }).ToArray();
foreach (var pair in matches)
    Console.WriteLine("{0} : {1}", pair[0], pair[1]);
Output:
Some name>, : example@email.com
<display,> > : display_a@email.com
See also the regex demo. Details:
" - a " char
([^"]*) - Group 1: any zero or more chars other than "
" - a " char
\s* - zero or more whitespaces
< - a < char
([^<>]*) - Group 2: any zero or more chars other than < and >
> - a > char
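If the numbered groups are hard to keep track of, the same pattern can be written with named groups. This is only a sketch: the group names "name" and "email" and the AddressParser helper are assumptions made for illustration, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AddressParser
{
    // Hypothetical helper: returns (display name, address) pairs in input order.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        // Same pattern as in the answer, with the two groups given names.
        var pattern = @"""(?<name>[^""]*)""\s*<(?<email>[^<>]*)>";
        foreach (Match m in Regex.Matches(input, pattern))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["name"].Value, m.Groups["email"].Value));
        }
        return pairs;
    }

    static void Main()
    {
        var input = "\"Some name>,\" <example@email.com>, \"<display,> >\" <display_a@email.com>";
        foreach (var pair in Parse(input))
        {
            Console.WriteLine("{0} : {1}", pair.Key, pair.Value);
        }
    }
}
```

Like the original pattern, this assumes the display name never contains a double quote.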

How to combine two texts?

I have a single-line text box and a multiline text box, and I want to prepend the word in the single-line text box to each line of the multiline text box.
Like this:
Single-line text: "Hello" (I have to use variables)
Multiline words:
1998
1999
2000
Expected results:
Hello1998
Hello1999
Hello2000
Please help me.
I use the code below, but it does not work when one of the inputs is the single-line text box, and I have to work with both text boxes:
string left = string.Format(add.Text , Environment.NewLine);
string right = string.Format(textBox1.Text, Environment.NewLine);
string[] leftSplit = left.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
string[] rightSplit = right.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
string output = "";
if (leftSplit.Length == rightSplit.Length)
{
    for (int i = 0; i < leftSplit.Length; i++)
    {
        output += leftSplit[i] + ":" + rightSplit[i] + Environment.NewLine;
    }
}
result.Text = output;
Could you please advise me on the right approach?
If the single-line text box holds only one word, there is no need to split it into an array.
Let's consider it as string left = "Hello";
and textBox1 contains the multiline words, i.e.
string right = textBox1.Text; // right contains 1998, 1999 and 2000 separated by line breaks
Then you can try the approach below:
var concatString = right.Split(new[] { Environment.NewLine }, StringSplitOptions.None).Select(x => left + x);
string result = string.Join(Environment.NewLine, concatString);
Output :
Hello1998
Hello1999
Hello2000
TextBox.GetLineText(int) will help you:
var singlelineText = singlelineTextBox.Text;
var composedLines = new List<string>();
for (var i = 0; i < multilineTextBox.LineCount; i++)
{
    composedLines.Add(singlelineText + multilineTextBox.GetLineText(i));
}
result.Text = string.Join(Environment.NewLine, composedLines);
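The answers above can be sketched as one UI-independent helper. This is only an illustration under a few assumptions: the LinePrefixer and PrefixLines names are made up, the text may use either "\r\n" or "\n" line breaks, and blank lines are meant to be skipped.

```csharp
using System;
using System.Linq;

static class LinePrefixer
{
    // Hypothetical helper: prepends prefix to every non-empty line of
    // multilineText and joins the results with the platform line break.
    public static string PrefixLines(string prefix, string multilineText)
    {
        // Accept both Windows and Unix line breaks (an assumption about the input).
        string[] lines = multilineText.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(Environment.NewLine, lines.Select(line => prefix + line));
    }

    static void Main()
    {
        Console.WriteLine(PrefixLines("Hello", "1998\r\n1999\n2000"));
    }
}
```

In a form you would call it with the two text boxes' Text values, e.g. result.Text = LinePrefixer.PrefixLines(add.Text, textBox1.Text);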

Split a special character string using c#

I have a string as below
"/calc 2 3 +"
How can I split it so that I get
str1="/calc"
str2= "2 3 +"
Is there any method in C# that does this kind of splitting?
Thanks!
If it's just that string and it is always split where you show it, then you can do this:
var x = @"/calc 2 3 +";
var str1 = x.Substring(0, 5);
var str2 = x.Substring(6);
Otherwise, no, there's no special method that does it, because you don't have a unique delimiter.
Use IndexOf method to find the first occurrence of space character. And then use Substring method to split the string into 2.
string strInput = @"/calc 2 3 +";
var list = strInput.Split(' ').ToList();
string str1 = list[0];
list.RemoveAt(0); // RemoveAt returns void, so remove the first item before joining
string str2 = String.Join(" ", list);
I'm surprised that no one provided the most obvious answer - using one of the string.Split overloads that allows you to specify the maximum number of substrings to return:
string input = "/calc 2 3 +";
var result = input.Split(new[] { ' ' }, 2);
Debug.Assert(result.Length == 2 && result[0] == "/calc" && result[1] == "2 3 +");
There are lots of ways to do that. The easiest one is to use the index of the first space.
var mystr = @"/calc 2 3 +";
int index = mystr.IndexOf(' ');
var str1 = mystr.Substring(0, index);
var str2 = mystr.Substring(index + 1);
If there is a pattern in your text, you can also use Regex.
string a = "/calc 2 3 +";
string[] array = a.Split(' ');
string str1 = array[0];
string str2 = string.Format("{0} {1} {2}", array[1], array[2], array[3]);
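Several of the snippets above assume the input always contains a space. A small sketch that builds on the Split overload with a maximum count and also copes with a command that has no arguments might look like this; the CommandSplitter name and the choice to return an empty argument string are assumptions for illustration.

```csharp
using System;

static class CommandSplitter
{
    // Hypothetical helper: returns { command, arguments }. When the input
    // contains no space, arguments is an empty string.
    public static string[] SplitCommand(string input)
    {
        // A count of 2 stops splitting after the first space, so the rest
        // of the string stays together in parts[1].
        string[] parts = input.Split(new[] { ' ' }, 2);
        string command = parts[0];
        string arguments = parts.Length > 1 ? parts[1] : string.Empty;
        return new[] { command, arguments };
    }

    static void Main()
    {
        string[] result = SplitCommand("/calc 2 3 +");
        Console.WriteLine("str1=" + result[0]);
        Console.WriteLine("str2=" + result[1]);
    }
}
```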

how to split a string by whitespaces in C#?

How can I split this by whitespace? (The first line is its header.)
I tried this code, but I get an "index out of range" error at cbay.ABS = columnsC[5], because the second line returns only 4 elements instead of 6 like the first line. I want the second line to return 6 elements as well.
using (StringReader strrdr = new StringReader(strData))
{
    string str;
    while ((str = strrdr.ReadLine()) != null)
    {
        // str = str.Trim();
        if ((Regex.IsMatch(str.Substring(0, 1), @"J")) || (Regex.IsMatch(str.Substring(0, 1), @"C")))
        {
            columnsC = Regex.Split(str, " +");
            cbay.AC = columnsC[1];
            cbay.AU = columnsC[2];
            cbay.SA = columnsC[3];
            cbay.ABS = columnsC[5];
            // cbay.ABS = str;
        }
    }
}
To get only the words without redundant whitespace, you can pass StringSplitOptions.RemoveEmptyEntries as the second argument to the string's Split method. Split breaks the string at every space, and this option drops the empty entries produced by consecutive spaces. Instead of using Regex, check this simple example:
string inputString = "Some string with words separated with multiple blank characters";
string[] words = inputString.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string resultString = String.Join(" ", words); // joins the words without multiple whitespaces, this is for test only.
EDIT: In your particular case, if you use this string, where the parts are separated by multiple whitespace characters (at least three), it will work. Check the example:
string inputString = "J 16 16 13 3 3";
string[] words = inputString.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
EDIT2: This is the simplest and crudest solution to your problem, but I think it will work:
if (str.Length > 0 && ((str[0] == 'J') || (str[0] == 'C')))
{
    columnsC = str.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    if (str[0] == 'J')
    {
        cbay.AC = columnsC[1];
        cbay.AU = columnsC[2];
        cbay.SA = columnsC[3];
        cbay.ABS = columnsC[5];
    }
    else
    {
        cbay.AU = columnsC[1];
        cbay.SA = columnsC[2];
    }
}
You could first replace multiple spaces with zeros and then split on the remaining single spaces:
var test = "test 1 2 3";
var items = test.Replace(" ", "0").Split(' ');
You might get some 00 positions if there are many spaces, but that will still work I guess
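Whichever split you pick, the "index out of range" error itself can be avoided by checking the column count before indexing. The sketch below is one way to do that; ColumnReader, SplitColumns and GetColumn are hypothetical names, and returning null for a missing column is an assumption about what the caller wants.

```csharp
using System;

static class ColumnReader
{
    // Splits a line on any white-space character (spaces, tabs, ...).
    // Passing a null separator array makes Split use white space as the delimiter.
    public static string[] SplitColumns(string line)
    {
        return line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns the column at index, or null when the line has fewer columns.
    public static string GetColumn(string[] columns, int index)
    {
        return index >= 0 && index < columns.Length ? columns[index] : null;
    }

    static void Main()
    {
        string[] columns = SplitColumns("C   16  \t 13   3");
        // Only 4 columns are present, so index 5 yields "(missing)" instead of throwing.
        Console.WriteLine(GetColumn(columns, 1) ?? "(missing)");
        Console.WriteLine(GetColumn(columns, 5) ?? "(missing)");
    }
}
```

Note that splitting on whitespace cannot tell which column is blank in a short line. If the data is laid out in fixed-width columns, reading each field by position with Substring may be a better fit.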

C# Regex.Split and Regular expression

I have a string that I need to split twice, and then select the part that comes after a special character.
Let's say:
string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
string[] guids = Regex.Split(myString,";#");
So here I am getting an array of two elements, each containing Value + GUID. But I need only the GUIDs, like:
[0]82e146e7-bc85-4bd4-a691-23d55c686f4b
[1]55140947-00d0-4d75-9b5c-00d8d5ab8436
Is there any way of doing it in one or two lines?
You can do this but just because you can do it in one line doesn't mean you should (readability comes into play if you get too fancy here). There's obviously no validation here at all.
string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
string[] guids = Regex.Split(myString, ";#")
    .SelectMany(s => Regex.Split(s, @"\|").Skip(1))
    .ToArray();
Assert.AreEqual(2, guids.Length);
Assert.AreEqual("82e146e7-bc85-4bd4-a691-23d55c686f4b", guids[0]);
Assert.AreEqual("55140947-00d0-4d75-9b5c-00d8d5ab8436", guids[1]);
You could easily do this without a regex if the last part of each element is always a GUID (36 characters long):
string[] guids = myString.Split(';').Select(c => c.Substring(c.Length - 36)).ToArray();
string[] guids = myString.Split(';').Select(x => x.Split('|')[1]).ToArray();
string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
//split the string by ";#"
string[] results = myString.Split(new string[] { ";#" }, StringSplitOptions.RemoveEmptyEntries);
//remove the "value|" part
results[0] = results[0].Substring(results[0].IndexOf('|') + 1);
results[1] = results[1].Substring(results[1].IndexOf('|') + 1);
//Same as above, but in a for loop. Useful if there are more than 2 GUIDs to find
//for(int i = 0; i < results.Length; i++)
//    results[i] = results[i].Substring(results[i].IndexOf('|') + 1);
foreach (string result in results)
    Console.WriteLine(result);
var guids = Regex
    .Matches(myString, @"HEX{8}-HEX{4}-HEX{4}-HEX{4}-HEX{12}".Replace("HEX", "[A-Fa-f0-9]"))
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
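None of the answers above validates what it extracts. If that matters, one option is to split as before and let Guid.TryParse (available from .NET 4.0) decide whether each candidate really is a GUID. This is a sketch only; the GuidExtractor name and the decision to silently skip invalid entries are assumptions.

```csharp
using System;
using System.Collections.Generic;

static class GuidExtractor
{
    // Hypothetical helper: splits on ";#", takes the text after the last '|'
    // in each entry, and keeps it only if it parses as a GUID.
    public static List<Guid> ExtractGuids(string input)
    {
        var guids = new List<Guid>();
        foreach (string entry in input.Split(new[] { ";#" }, StringSplitOptions.RemoveEmptyEntries))
        {
            // LastIndexOf returns -1 when there is no '|', so the whole entry is tried.
            string candidate = entry.Substring(entry.LastIndexOf('|') + 1);
            Guid parsed;
            if (Guid.TryParse(candidate, out parsed))
            {
                guids.Add(parsed);
            }
        }
        return guids;
    }

    static void Main()
    {
        string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
        foreach (Guid g in ExtractGuids(myString))
        {
            Console.WriteLine(g);
        }
    }
}
```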
