Split a file using specific word in C# [duplicate] - c#

This question already has answers here:
How to tell a RegEx to be greedy on an 'Or' Expression
(2 answers)
Closed 2 years ago.
There is a file which I want to split:
MSH|^~\&||||^asdasdasd|||asdasd|637226866166648574|637226866166648574|2.4
EVN|asd|20200416|20200416
PID|1|PW9074asdasd41|asd|PW907441|asdsad^wqe^wqeqwe||19700524|M
MSH|^~\&||||^qweqwewqe|||qwewqeqw|637226866166648574|637226866166648574|2.4
EVN|P03|20200416|20200416
PID|1|PW907441|PW907441|PW907441|Purvis^Walter^Rayshawn||19700524|M
I want to split it on MSH so that the result is an array of strings:
array[0]=
"MSH|^~\&||||^asdasdasd|||asdasd|637226866166648574|637226866166648574|2.4
EVN|asd|20200416|20200416
PID|1|PW9074asdasd41|asd|PW907441|asdsad^wqe^wqeqwe||19700524|M";
array[1]=
"MSH|^~\&||||^qweqwewqe|||qwewqeqw|637226866166648574|637226866166648574|2.4
EVN|P03|20200416|20200416
PID|1|PW907441|PW907441|PW907441|Purvis^Walter^Rayshawn||19700524|M";
What I have tried so far:
string[] sentences = Regex.Split(a, @"\W*((?i)MSH(?-i))\W*");
result:
array[0]="";
array[1]="MSH";
array[2]="asdasdasd|||asdasd|637226866166648574|637226866166648574|2.4
EVN|asd|20200416|20200416
PID|1|PW9074asdasd41|asd|PW907441|asdsad^wqe^wqeqwe||19700524|M";
array[3]="MSH";
array[4]="qweqwewqe|||qwewqeqw|637226866166648574|637226866166648574|2.4
EVN|P03|20200416|20200416
PID|1|PW907441|PW907441|PW907441|Purvis^Walter^Rayshawn||19700524|M";
Or at least the split should not drop the |^~\&||||^ that follows MSH.

You can simply use the Split() method for this. The code below produces an IEnumerable<string>, which you can turn into an array with ToArray() if you want:
void Main()
{
    // Verbatim string literal (@"...") keeps the backslash and the line breaks as-is.
    string s = @"MSH|^~\&||||^asdasdasd|||asdasd|637226866166648574|637226866166648574|2.4
EVN|asd|20200416|20200416
PID|1|PW9074asdasd41|asd|PW907441|asdsad^wqe^wqeqwe||19700524|M
MSH|^~\&||||^qweqwewqe|||qwewqeqw|637226866166648574|637226866166648574|2.4
EVN|P03|20200416|20200416
PID|1|PW907441|PW907441|PW907441|Purvis^Walter^Rayshawn||19700524|M";

    // Split removes the "MSH" separator, so prepend it to each piece again.
    foreach (var element in s.Split(new string[] { "MSH" }, StringSplitOptions.RemoveEmptyEntries).Select(x => $"MSH{x}"))
    {
        Console.WriteLine(element);
    }
}

If you want to split on MSH, Cetin Basoz is right. This works perfectly well:
var sentences = a.Split(new String[] { "MSH" }, StringSplitOptions.RemoveEmptyEntries);
If you want the split to be case-insensitive, you can use this, which is much simpler than the regex you used previously:
var sentences = Regex.Split(a, "MSH", RegexOptions.IgnoreCase);
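Both calls above remove the MSH text from the pieces. As a minimal sketch of an alternative, assuming each message starts on its own line with "MSH|", a zero-width lookahead splits in front of the marker without consuming it. The class and method names here (Hl7Splitter, SplitMessages) are hypothetical and only for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Hl7Splitter
{
    // Splits before every line that starts with "MSH|", keeping the marker in each piece.
    public static string[] SplitMessages(string text)
    {
        // (?m) lets ^ match at the start of each line; (?=...) is a zero-width
        // lookahead, so nothing is consumed and "MSH|^~\&" stays in the output.
        string[] parts = Regex.Split(text, @"(?m)(?=^MSH\|)");

        // Trim trailing line breaks and drop the empty piece before the first MSH.
        return parts
            .Select(p => p.TrimEnd('\r', '\n'))
            .Where(p => p.Length > 0)
            .ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        // Shortened sample input (an assumption, not the full file from the question).
        string a = "MSH|^~\\&|first\nEVN|x\nPID|1\nMSH|^~\\&|second\nEVN|y\nPID|2";
        foreach (string message in Hl7Splitter.SplitMessages(a))
        {
            Console.WriteLine("[" + message + "]");
        }
    }
}
```

Anchoring on the line start also avoids splitting on an "MSH" that happens to appear inside a field value, which a plain Split on "MSH" would do.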

Related

Regex Split behaviour when splitting string by variable length in C# [duplicate]

This question already has answers here:
C# Regex.Split: Removing empty results
(9 answers)
Closed 3 years ago.
An application produces a flat file where each line represents data to be imported into another application. The type of data is irrelevant to this question, but suppose the first line is a string of numbers "0123456789" and the delimiter is a different width for each column. For example, I have to split the strings into an array of different lengths, e.g. 1,2,3,4 giving;
0
12
345
6789
The following code using Regex.Split(s,s) tests this; but can anyone explain why the string is split into 6 groups when I expected 4?
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string data = "0123456789";
        string splitPattern = "^";
        for (int x = 1; x < 5; ++x)
        {
            splitPattern += string.Format("(.{{{0}}})", x);
        }
        string[] processedData = Regex.Split(data, splitPattern);
        Console.WriteLine($"Using {splitPattern} to split {data} yields {processedData.Length} results.");
        foreach (string d in processedData)
        {
            Console.WriteLine(String.Format("[{0}]", d));
        }
    }
}
Running this code results in the following printed lines;
Using ^(.{1})(.{2})(.{3})(.{4}) to split 0123456789 yields 6 results.
[]
[0]
[12]
[345]
[6789]
[]
In reality the data includes text, numbers and punctuation. Also, the column lengths are not incremental, but I was stumped by the way this was split.
Edit
Thanks for the answers and comments. I don't consider this to be a duplicate of C# Regex.Split: Removing empty results since the user actually edited their question to explain it was relating to their regex pattern. I understand now that the behaviour I've noticed is expected and after thinking about it, appreciate why this is so. The pattern in Regex.Split(data, splitPattern) kind of denotes where the delimiter should be. So if the pattern matches the start (and end), then an empty string is the result before (and after) the match.
I prefer Split over Match in this instance since it returns a simple string[] instead of a Match.
That's because Split returns the pieces of the input before and after each match of the expression. When the expression contains capturing groups, the captured text is included in the result as well.
See tweaked demo: https://dotnetfiddle.net/gUnxGP
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string data = "01234,56789";
        var splitPattern = ","; // two results
        string[] processedData = Regex.Split(data, splitPattern);
        Console.WriteLine($"Using {splitPattern} to split {data} yields {processedData.Length} results.");
        foreach (string d in processedData)
        {
            Console.WriteLine(String.Format("[{0}]", d));
        }

        splitPattern = "(,)"; // three results (includes the comma itself)
        processedData = Regex.Split(data, splitPattern);
        Console.WriteLine($"Using {splitPattern} to split {data} yields {processedData.Length} results.");
        foreach (string d in processedData)
        {
            Console.WriteLine(String.Format("[{0}]", d));
        }
    }
}
/* Output
Using , to split 01234,56789 yields 2 results.
[01234]
[56789]
Using (,) to split 01234,56789 yields 3 results.
[01234]
[,]
[56789]
*/
As Wiktor commented, you probably should be using Matches instead of Split.
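A minimal sketch of the match-based approach, assuming the column widths are known up front. It uses Regex.Match and reads the capture groups, so no empty leading or trailing entries appear. FixedWidth and Columns are hypothetical names chosen for this example:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FixedWidth
{
    // Returns one string per column width, or an empty array if the line is too short.
    public static string[] Columns(string line, params int[] widths)
    {
        // Builds a pattern such as ^(.{1})(.{2})(.{3})(.{4}) from the widths.
        string pattern = "^" + string.Concat(widths.Select(w => "(.{" + w + "})"));
        Match m = Regex.Match(line, pattern);
        if (!m.Success)
        {
            return new string[0];
        }

        // Group 0 is the whole match, so skip it and keep only the captures.
        return m.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string column in FixedWidth.Columns("0123456789", 1, 2, 3, 4))
        {
            Console.WriteLine("[" + column + "]");
        }
    }
}
```

The result is still a plain string[], which was the stated reason for preferring Split over Match.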

How to split a string by multiple chars? [duplicate]

This question already has answers here:
Split a string by another string in C#
(11 answers)
Closed 6 years ago.
I have a string like this: string ip = "192.168.10.30 | SomeName".
I want to split it by the | (including the spaces). Unfortunately, that is not possible with this code:
string[] address = ip.Split(new char[] {'|'}, StringSplitOptions.RemoveEmptyEntries);
as this leads to "192.168.10.30 ". I know I can add .Trim() to address[0] but is that really the right approach?
Simply adding the spaces (' | ') to the search pattern gives me an "Unrecognized escape sequence" error.
You can split by string, not by character:
var result = ip.Split(new string[] {" | "}, StringSplitOptions.RemoveEmptyEntries);
The Split method accepts a character array, so you can add the space as a second separator in that array. Since you are using RemoveEmptyEntries, the empty entries produced between adjacent separators are dropped from the final result. Note that this also splits on any space inside the name itself.
Use it like this:
string[] address = ip.Split(new char[] { '|',' '}, StringSplitOptions.RemoveEmptyEntries);
You will get two items in the array:
"192.168.10.30" and "SomeName"
This might do the trick for you
string[] address = ip.Split(new char[] { '|' }, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Trim()).ToArray();
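To compare the string-separator and char-array approaches, here is a small sketch run on a hypothetical input whose name contains a space ("Some Name" is an assumption, not from the question). AddressParser and its method names are made up for illustration:

```csharp
using System;

public static class AddressParser
{
    // Splits on the exact three-character separator " | ".
    public static string[] SplitOnString(string ip)
    {
        return ip.Split(new[] { " | " }, StringSplitOptions.None);
    }

    // Splits on every '|' and every ' ', dropping empty entries.
    public static string[] SplitOnChars(string ip)
    {
        return ip.Split(new[] { '|', ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}

public static class Program
{
    public static void Main()
    {
        string ip = "192.168.10.30 | Some Name";
        Console.WriteLine(AddressParser.SplitOnString(ip).Length); // 2 parts
        Console.WriteLine(AddressParser.SplitOnChars(ip).Length);  // 3 parts: the name is split too
    }
}
```

If the name can never contain spaces, both approaches give the same two items; otherwise the string separator (or splitting on '|' and trimming) is the safer choice.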

Split string into elements of a string array [duplicate]

This question already has answers here:
How can I split a string with a string delimiter? [duplicate]
(7 answers)
Closed 8 years ago.
I've got a string with the format of:
blockh->127.0.0.1 testlocal.de->127.0.0.1 testlocal2.com
Now I need to separate the elements; the best way would be a string array, I think. I would need to get only these elements separated:
127.0.0.5 somerandompage.de
127.0.0.1 anotherrandompage.com
How to split and filter the array to get only these elements?
Using the .Filter() method doesn't do the job.
You can use the string Split method:
var st = "blockh->127.0.0.1 testlocal.de->127.0.0.1 testlocal2.com";
var result = st.Split(new [] { "->" }, StringSplitOptions.None);
You can achieve the same with a Regex:
var result = Regex.Split(st, "->");
As a note from @Chris, both of these will split the string into an array with 3 elements:
blockh
127.0.0.1 testlocal.de
127.0.0.1 testlocal2.com
In case you want to get rid of blockh, you can do a regex match using an IP address and domain regex:
var ip = new Regex(@"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b\s*(([\w][\w\-\.]*)\.)?([\w][\w\-]+)(\.([\w][\w\.]*))");
var result = ip.Matches(st).Cast<Match>()
.Select(m => m.Value)
.ToArray();
This will get only the two elements containing IP addresses.
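If the unwanted prefix is always the first element, a simpler sketch is to split on "->" and skip the first piece. That positional assumption is mine, not something the question guarantees, and HostsParser and Entries are hypothetical names:

```csharp
using System;
using System.Linq;

public static class HostsParser
{
    // Splits on "->" and drops the leading label (assumed to always be the first element).
    public static string[] Entries(string st)
    {
        return st.Split(new[] { "->" }, StringSplitOptions.None)
                 .Skip(1)
                 .ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        var st = "blockh->127.0.0.1 testlocal.de->127.0.0.1 testlocal2.com";
        foreach (string entry in HostsParser.Entries(st))
        {
            Console.WriteLine(entry);
        }
    }
}
```

Unlike the regex above, this does not validate that each entry really contains an IP address and a domain.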
You can use a string Split() method to do that.
var s = "testlocal->testlocal2";
var splitted = s.Split(new[] {"->"}, StringSplitOptions.RemoveEmptyEntries); //result splitted[0]=testlocal, splitted[1]=testlocal2
If the simpler versions for splitting a string don't work for you, you will likely be served best by defining a Regular Expression and extracting matches.
That is described in detail in this MSDN article: http://msdn.microsoft.com/en-us/library/ms228595.aspx
For more information on how Regular Expressions can look this page is very helpful: http://regexone.com/

How can I split a string with a string delimiter? [duplicate]

This question already has answers here:
How do I split a string by a multi-character delimiter in C#?
(10 answers)
Closed 4 years ago.
I have this string:
"My name is Marco and I'm from Italy"
I'd like to split it, with the delimiter being is Marco and, so I should get an array with
My name at [0] and
I'm from Italy at [1].
How can I do it with C#?
I tried with:
.Split("is Marco and")
But it wants only a single char.
string[] tokens = str.Split(new[] { "is Marco and" }, StringSplitOptions.None);
If you have a single character delimiter (like for instance ,), you can reduce that to (note the single quotes):
string[] tokens = str.Split(',');
.Split(new string[] { "is Marco and" }, StringSplitOptions.None)
Consider the spaces surrounding "is Marco and". Do you want to include the spaces in your result, or do you want them removed? It's quite possible that you want to use " is Marco and " as separator...
You are splitting a string on a fairly complex substring. I'd use regular expressions instead of String.Split. The latter is more for tokenizing your text.
For example:
var rx = new System.Text.RegularExpressions.Regex("is Marco and");
var array = rx.Split("My name is Marco and I'm from Italy");
Try this function instead.
string source = "My name is Marco and I'm from Italy";
string[] stringSeparators = new string[] {"is Marco and"};
var result = source.Split(stringSeparators, StringSplitOptions.None);
You could use the IndexOf method to get a location of the string, and split it using that position, and the length of the search string.
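The IndexOf idea could look roughly like this. It is a sketch that splits at the first occurrence only and returns the whole input when the separator is absent; both behaviours are design assumptions, and DelimiterSplit and SplitOnce are hypothetical names:

```csharp
using System;

public static class DelimiterSplit
{
    // Splits source at the first occurrence of separator.
    // Returns a single-element array when the separator is not found.
    public static string[] SplitOnce(string source, string separator)
    {
        int index = source.IndexOf(separator, StringComparison.Ordinal);
        if (index < 0)
        {
            return new[] { source };
        }

        // Everything before the separator, then everything after it.
        return new[]
        {
            source.Substring(0, index),
            source.Substring(index + separator.Length)
        };
    }
}

public static class Program
{
    public static void Main()
    {
        // The surrounding spaces are part of the separator here, so the pieces come out trimmed.
        string[] parts = DelimiterSplit.SplitOnce("My name is Marco and I'm from Italy", " is Marco and ");
        Console.WriteLine(parts[0]);
        Console.WriteLine(parts[1]);
    }
}
```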
You can also use a regular expression. A simple Google search turned up this:
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string value = "cat\r\ndog\r\nanimal\r\nperson";
        // Split the string on line breaks.
        // ... The return value from Split is a string[] array.
        string[] lines = Regex.Split(value, "\r\n");
        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
Read C# Split String Examples - Dot Net Pearls and the solution can be something like:
var results = yourString.Split(new string[] { "is Marco and" }, StringSplitOptions.None);
There is a version of string.Split that takes an array of strings and a StringSplitOptions parameter:
http://msdn.microsoft.com/en-us/library/tabh47cf.aspx

C# - Parsing a line of text - what's the best way to do this? [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Parse multiple doubles from string in C#
Say I have a line of text that looks as follows:
"45.690 24.1023 .09223 4.1334"
What would be the most efficient way, in C#, to extract just the numbers from this line? The number of spaces between each number varies and is unpredictable from line to line. I have to do this thousands of times, so efficiency is key.
Thanks.
IEnumerable<double> doubles = s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    .Select<string, double>(double.Parse);
Updated to use StringSplitOptions.RemoveEmptyEntries since the number of spaces varies
Use a Regex split. This will allow you to split on any whitespace of any length between your numbers:
string input = "45.690 24.1023 .09223 4.1334";
string pattern = "\\s+"; // Split on runs of whitespace (\s* would also match the empty string between every character)
string[] substrings = Regex.Split(input, pattern);
foreach (string match in substrings)
{
    Console.WriteLine("'{0}'", match);
}
I haven't measured, but simplicity is key if you are trying to be efficient, so probably something like this (note that it collects digit characters only):
var chars = new List<char>();
for (int i = 0; i < text.Length; ++i)
{
    if (char.IsDigit(text[i]))
        chars.Add(text[i]);
}
You want efficient.....
var regex = new Regex(@"([\d\.]+)", RegexOptions.Compiled);
var matches = regex.Matches(input);
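Whichever way the tokens are extracted, they still have to be converted to numbers. A minimal sketch, assuming the values always use '.' as the decimal separator, passes CultureInfo.InvariantCulture so the result does not depend on the machine's locale. NumberLine and ParseNumbers are hypothetical names, and this has not been benchmarked against the other answers:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class NumberLine
{
    // Splits on spaces, ignores the empty entries caused by repeated spaces,
    // and parses each token as a double using the invariant culture.
    public static double[] ParseNumbers(string line)
    {
        return line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(token => double.Parse(token, CultureInfo.InvariantCulture))
                   .ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        double[] values = NumberLine.ParseNumbers("45.690   24.1023 .09223  4.1334");
        foreach (double value in values)
        {
            Console.WriteLine(value.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```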
