Splitting string on multi-character delimeter - c#

string Idstr="ID03I010102010210AEMPD4677EID03I020102020208L8159734ID03I030102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201";
string[] stringSeparators = new string[] { "ID03I0" };
string[] result;
result = IdStr.Split(stringSeparators, StringSplitOptions.RemoveEmptyEntries);
This is the result:
result[0]=10102010210AEMPD4677E
result[1]=20102020208L8159734
result[3]=30102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201
Desired result:
result[0]=ID03I010102010210AEMPD4677E
result[1]=ID03I020102020208L8159734
result[3]=ID03I030102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201
As you can see I want to include delimiter ID03I0 to the elements.
NOTE: I know I can include it by hardcoding it. But that's not the way I want to do it.

result = IdStr.Split(stringSeparators, StringSplitOptions.RemoveEmptyEntries)
.Select(x => stringSeparators[0] + x).ToArray();
This adds the seperator to the beginning at every element within your array.
EDIT: Unfortunately with this approach you are limited to use just one single delimiter. So if you want to add more you´d use Regex instead.

Following Regex pattern should work.
string input = "ID03I010102010210AEMPD4677EID03I020102020208L8159734ID03I030102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201";
string delimiter = "ID03I0";//Modify it as you need
string pattern = string.Format("(?<=.)(?={0})", delimiter);
string[] result = Regex.Split(input, pattern);
Online Demo
Adapted from this answer.

Related

Complex string split C#

I have input file like this:
input.txt
aa#aa.com bb#bb.com "Information" "Hi there"
cc#cc.com dd#dd.com "Follow up" "Interview"
I have used this method:
string[] words = item.Split(' ');
However, it splits every words with space. I also have spaces in quotes strings but I won't split those spaces.
Basically I want to parse this input from file to this output:
From = aa#aa.com
To = bb#bb.com
Subject = Information
Body = Hi there
How do I split these strings in C#?
Simply you can use Regex as it is said in this question
var stringValue = "aa#aa.com bb#bb.com \"Information\" \"Hi there\"";
var parts = Regex.Matches(stringValue, #"[\""].+?[\""]|[^ ]+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
//parts: aa#aa.com
bb#bb.com
"Information"
"Hi there"
Also you may try Replace function to remove those " characters.
The String.Split() method has an overload that allows you to specify the number of splits required. You can get what you want like this:
Read one line at a time
Call input.Split(new string[" "], 3, StringSplitOptions.None) - this returns an array of strings with 3 parts. Since email addresses don't have spaces in them, the first two strings will be the from/to addresses, and the third string will be the subject and message. Assume the result of this call is stored in firstSplit[], then firstSplit[0] is the from address, firstSplit[1] is the to address, and firstSplit[2] is the subject and message combined.
Call firstSplit[2].Split(new string[""" """], 2, StringSplitOptions.None) - this searches for the string " " in the concatenated subject+message from the previous call, which should pinpoint the separator between the end of the subject and the start of the message. This will give you the subject and message in another array. (The double-quotes inside are doubled to escape them)
This assumes you disallow double quotes in your subject and message. If you do allow double quotes, then you need to ensure you escape them before putting it in the file in the first place.
You can do this without using regex by just using IndexOf and SubString just put it in a loop if you have multiple emails to parse.
It's not pretty but it would be faster than RegEx if you're doing a lot of them.
string content = #"abba#aa.com dddb#bdd.com ""Information"" ""Hi there""";
string firstEmail = content.Substring(0, content.IndexOf(" ", StringComparison.Ordinal));
string secondEmail = content.Substring(firstEmail.Length, content.IndexOf(" ", firstEmail.Length + 1) - firstEmail.Length);
int firstQuote = content.IndexOf("\"", StringComparison.Ordinal);
string subjectandMessage = content.Substring(firstQuote, content.Length - content.IndexOf("\"", firstQuote, StringComparison.Ordinal));
String[] words = subjectandMessage.Split(new string[] { "\" \"" }, StringSplitOptions.None);
Console.WriteLine(firstEmail);
Console.WriteLine(secondEmail);
Console.WriteLine(words[0].Remove(0,1));
Console.WriteLine(words[1].Remove(words[1].Length -1));
Output:
aa#aa.com
bb#bb.com
Information
Hi there
As Spencer pointed out, read this file line by line using File.ReadAllLines() method and then apply String.Split[] method with spaces using something like this:
string[] elements = string.Split(new char[0]);
UPDATE
Not a pretty solution, but this is how I think it can work:
string[] readText = File.ReadAllLines(' ');
//Take value of first 3 fields by simple readText[index]; (index: 0-2)
string temp = "";
for(int i=3; i<readText.Length; i++)
{
temp += readText[i];
}
Requires reference to Microsoft.VisualBasic, but a bit more reliable than Regex:
using (var tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser("input.txt")) {
for (tfp.SetDelimiters(" "); !tfp.EndOfData;) {
string[] fields = tfp.ReadFields();
Debug.Print(string.Join(",", fields)); // "aa#aa.com,bb#bb.com,Information,Hi there"
}
}

Split string into array that is enclosed in capturing parentheses

I have the following string in my project:
((1,01/31/2015)(1,Filepath)(1,name)(1,code)(1,String)(1, ))
I want to split this string into parts where i get the information within the capturing parentheses (for example 1,Filepath or (1,Filepath), but the whole string is in capturing parentheses too as you can see. The result i then try to put into array with string[] array = Regex.Split(originalString,SomeRegexHere)
Now i am wondering what would be the best approach be, just remove the first and last character of the string so i don't have the capturing parentheses enclosing the whole string, or is there some way to use Regular expressions on this to get the result i want to ?
string s = "((1,01/31/2015)(1,Filepath)(1,name)(1,code)(1,String)(1, ))";
var data = s.Split(new string[]{"(", ")"}, StringSplitOptions.RemoveEmptyEntries)
your Data would be then
["1,01/31/2015",
"1,Filepath",
"1,name",
"1,code",
"1,String",
"1,"]
You can create a substring without the first 2 and last 2 brackets and then split this on the enclosing brackets
var s = "((1,01/31/2015)(1,Filepath)(1,name)(1,code)(1,String)(1, ))";
var result = s.Substring(2, s.Length - 4)
.Split(new string[]{")("}, StringSplitOptions.RemoveEmptyEntries);
foreach(var r in result)
Console.WriteLine(r);
Output
1,01/31/2015
1,Filepath
1,name
1,code
1,String
1,
Example
(?<=\()[^()]*(?=\))
Just do a match and get your contents instead of splitting.See demo.
https://regex101.com/r/eS7gD7/15

Splitting on “,” but not “/,”

Question: How do I write an expression to split a string on ',' but not '/,'? Later I'll want to replace '/,' with ', '.
Details...
Delimiter: ','
Skip Char: '/'
Example input: "Mister,Bill,is,made,of/,clay"
I want to split this input into an array: {"Mister", "Bill", "is", "made", "of, clay"}
I know how to do this with a char prev, cur; and some indexers, but that seems beta.
Java Regex has a split functionality, but I don't know how to replicate this behavior in C#.
Note: This isn't a duplicate question, this is the same question but for a different language.
I believe you're looking for a negative lookbehind:
var regex = new Regex("(?<!/),");
var result = regex.Split(str);
this will split str on all commas that are not preceded by a slash. If you want to keep the '/,' in the string then this will work for you.
Since you said that you wanted to split the string and later replace the '/,' with ', ', you'll want to do the above first then you can iterate over the result and replace the strings like so:
var replacedResult = result.Select(s => s.Replace("/,", ", ");
string s = "Mister,Bill,is,made,of/,clay";
var arr = s.Replace("/,"," ").Split(',');
result : {"Mister", "Bill", "is", "made", "of clay"}
Using Regex:
var result = Regex.Split("Mister,Bill,is,made,of/,clay", "(?<=[^/]),");
Just use a Replace to remove the commas from your string :
s.Replace("/,", "//").Split(',').Select(x => x.Replace("//", ","));
You can use this in c#
string regex = #"(?:[^\/]),";
var match = Regex.Split("Mister,Bill,is,made,of/,clay", regex, RegexOptions.IgnoreCase);
After that you can replace /, and continue your operation as you like

Splitting a string in C#

I am trying to split a string in C# the following way:
Incoming string is in the form
string str = "[message details in here][another message here]/n/n[anothermessage here]"
And I am trying to split it into an array of strings in the form
string[0] = "[message details in here]"
string[1] = "[another message here]"
string[2] = "[anothermessage here]"
I was trying to do it in a way such as this
string[] split = Regex.Split(str, #"\[[^[]+\]");
But it does not work correctly this way, I am just getting an empty array or strings
Any help would be appreciated!
Use the Regex.Matches method instead:
string[] result =
Regex.Matches(str, #"\[.*?\]").Cast<Match>().Select(m => m.Value).ToArray();
The Split method returns sub strings between the instances of the pattern specified. For example:
var items = Regex.Split("this is a test", #"\s");
Results in the array [ "this", "is", "a", "test" ].
The solution is to use Matches instead.
var matches = Regex.Matches(str, #"\[[^[]+\]");
You can then use Linq to easily get an array of matched values:
var split = matches.Cast<Match>()
.Select(m => m.Value)
.ToArray();
Another option would be to use lookaround assertions for your splitting.
e.g.
string[] split = Regex.Split(str, #"(?<=\])(?=\[)");
This approach effectively splits on the void between a closing and opening square bracket.
Instead of using a regex you could use the Split method on the string like so
Split(new[] { '\n', '[', ']' }, StringSplitOptions.RemoveEmptyEntries)
You'll loose [ and ] around your results with this method but it's not hard to add them back in as needed.

Substring of a variant string

I have the following return of a printer:
{Ta005000000000000000000F 00000000000000000I 00000000000000000N 00000000000000000FS 00000000000000000IS 00000000000000000NS 00000000000000000}
Ok, I need to save, in a list, the return in parts.
e.g.
[0] "Ta005000000000000000000F"
[1] "00000000000000000I"
[2] "00000000000000000N"
...
The problem is that the number of characters varies.
A tried to make it going into the 'space', taking the substring, but failed...
Any suggestion?
Use String.Split on a single space, and use StringSplitOptions.RemoveEmptyEntries to make sure that multiple spaces are seen as only one delimiter:
var source = "00000000000000000FS 0000000...etc";
var myArray = source.Split(' ', StringSplitOptions.RemoveEmptyEntries);
#EDIT: An elegant way to get rid of the braces is to include them as separators in the Split (thanks to Joachim Isaksson in the comments):
var myArray = source.Split(new[] {' ', '{', '}'}, StringSplitOptions.RemoveEmptyEntries);
You could use a Regex for this:
string input = "{Ta005000000000000000000F 00000000000000000I 00000000000000000N 00000000000000000FS 00000000000000000IS 00000000000000000NS 00000000000000000}";
IEnumerable<string> matches = Regex.Matches(input, "[0-9a-zA-Z]+").Select(m => m.Value);
You can use string.split to create an array of substrings. Split allows you to specify multiple separator characters and to ignore repeated splits if necessary.
You could use the .Split member of the "String" class and split the parts up to that you want.
Sample would be:
string[] input = {Ta005000000000000000000F 00000000000000000I 00000000000000000N 00000000000000000FS 00000000000000000IS 00000000000000000NS 00000000000000000};
string[] splits = input.Split(' ');
Console.WriteLine(splits[0]); // Ta005000000000000000000F
And so on.
Just off the bat. Without considering the encompassing braces:
string printMsg = "Ta005000000000000000000F 00000000000000000I
00000000000000000N 00000000000000000FS
00000000000000000IS 00000000000000000NS 00000000000000000";
string[] msgs = printMsg.Split(' ').ForEach(s=>s.Trim()).ToArray();
Could work.

Categories