Split string pattern

Split string pattern - c#

I have a string that I need to split into an array of strings. Each value is enclosed in pipe characters | and the values are separated by commas.
|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||
The array should have the following 8 values after the split
111
2,2
room 1
13'2'' x 13'8''
""
""
""
""
By "" I simply mean an empty string. Note that a value can itself contain a comma, e.g. 2,2. I think the best way to do this is probably Regex.Split, but I am not sure how to write the correct regular expression. Any suggestions, or a better way of achieving this, would be really appreciated.

You can use Regex.Matches() to get the values instead of Split(), as long as the values between the pipe characters don't contain the pipe character itself:
(?<=\|)[^|]*(?=\|)
This will match zero or more non-pipe characters [^|]* which are preceded (?<=\|) and followed by a pipe (?=\|).
In C#:
var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");
foreach (Match match in results)
    Console.WriteLine("Found '{0}' at position {1}",
        match.Value, match.Index);
EDIT: The pattern also matches the separator commas, because each of them sits between two pipe characters as well. Since a separator always follows a value, the separators always land at the odd indexes of the match collection, so we can walk only the even indexes to get the true values, like this:
var input = "|room 1|,|,|,||,||,||,||,||,||";
var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");
for (int i = 0; i < results.Count; i += 2)
    Console.WriteLine("Found '{0}'", results[i].Value);
This can be also used in the first example above.

Assuming all fields are enclosed by a pipe and delimited by a comma you can use |,| as the delimiter, removing the leading and trailing |
Dim data = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||"
Dim delim = New String() {"|,|"}
Dim results = data.Substring(1, data.Length - 2).Split(delim, StringSplitOptions.None)
For Each s In results
Console.WriteLine(s)
Next
Output:
111
2,2
room 1
13'2'' x 13'8''
""
""
""
""
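Since the question is tagged C#, here is a rough C# sketch of the same idea as the VB code above. The class and method names (PipeFields, Split) are made up for illustration, and it assumes every field is wrapped in pipes and that no field contains the sequence |,| itself.

```csharp
using System;

public static class PipeFields
{
    // Splits input such as |a|,|b,c|,|| into its fields.
    // Assumption: every field is wrapped in pipes and no field contains "|,|".
    public static string[] Split(string data)
    {
        if (data.Length < 2 || data[0] != '|' || data[data.Length - 1] != '|')
            throw new FormatException("Expected input wrapped in pipe characters.");

        // Drop the outer pipes, then split on the three-character delimiter.
        string inner = data.Substring(1, data.Length - 2);
        return inner.Split(new[] { "|,|" }, StringSplitOptions.None);
    }
}
```

Calling PipeFields.Split on the string from the question should give the eight values listed there, with the last four being empty strings. StringSplitOptions.None matters here: RemoveEmptyEntries would drop those empty fields.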

No need to use a regex, remove the pipes and split the string on the comma:
var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var parts = input.Split(',').Select(x => x.Replace("|", string.Empty));
or
var parts = input.Replace("|", string.Empty).Split(',');
EDIT: OK, in that case, use a while loop to parse the string:
var values = new List<string>();
var str = @"|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
while (str.Length > 0)
{
    var open = str.IndexOf('|');
    var close = str.IndexOf('|', open + 1);
    // Substring takes (start, length), so the length is close - open - 1.
    var value = str.Substring(open + 1, close - open - 1);
    values.Add(value);
    // Skip the closing pipe and the comma that follows it.
    str = close < str.Length - 1
        ? str.Substring(close + 2)
        : string.Empty;
}

You could try this:
string a = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
string[] result = a.Split('|').Where(s => !s.Contains(",")).Select(s => s.Replace("|",String.Empty)).ToArray();

Maybe this will work for you:
var data = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|", "").Split(',');
Regards.,
k
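EDIT: If the values can contain commas, replace the |,| delimiter with a placeholder character that never occurs in the data, then split on that placeholder: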
string data = "|111|,|2,2|,|,3|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|,|", "¬").Replace("|", "").Split('¬');

Check, if this fits your needs...
var str = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
// Iterate through all the matches (anything between | and |, non-greedy)
foreach (Match m in Regex.Matches(str, @"\|(.*?)\|"))
{
    // Groups[0] is the entire match including the | symbols; Groups[1] is the part inside ()
    Console.WriteLine(m.Groups[1].Value);
}
Though, to find anything between | and |, you could (and probably should) use the negated character class [^|] instead of the . character.
At least for the specified use case, it gives the result you're expecting.
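A small sketch of the [^|] variant suggested above, wrapped in a helper so the values come back as a list. The names PipeRegex and Extract are hypothetical, and the sketch assumes values never contain a pipe character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PipeRegex
{
    // Returns the text between each pair of pipes.
    // Assumption: a value never contains a pipe character.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        // The pattern consumes both pipes of a value, so the separator
        // commas between values are skipped rather than matched.
        foreach (Match m in Regex.Matches(input, @"\|([^|]*)\|"))
            values.Add(m.Groups[1].Value);
        return values;
    }
}
```

Because each match consumes its opening and closing pipe, there is no need to skip every other match, and values that contain commas (such as 2,2) come through intact.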

Related

Splitting a string at first number and then returning 2 strings

Having some trouble adapting my splitting of a string into 2 parts to do it from the first number. It's currently splitting on the first space, but that won't work long term because cities have spaces in them too.
Current code:
var parameters = "Chicago 1234 Anytown, NY";
var commands = parameters.Split(new[] { ' ' }, 2);
var originCity = commands[0];
var destination = commands[1];
This works great for a city that has a single-word name, but it breaks on:
var parameters = "Los Angeles 1234 Anytown, NY";
I've tried several different approaches that I just haven't been able to work out. Any ideas on being able to return 2 strings as the following:
originCity = Los Angeles
destination = 1234 Anytown, NY

String.Split() on its own won't work here.
Instead, you need to find the index of the first digit. You can use .IndexOfAny() with an array of the digit characters (a char[]) to do this.
int numberIndex = line.IndexOfAny("0123456789".ToCharArray());
You can then take two substrings: one before the index, the other from the index onward. IndexOfAny returns -1 when the string contains no digit, so check for that first.
string before = line.Substring(0, numberIndex);
string after = line.Substring(numberIndex);
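Put together, the approach might look like the sketch below. The names FirstDigitSplit and Split are illustrative, and trimming the space before the number is an assumption about what the asker wants.

```csharp
using System;

public static class FirstDigitSplit
{
    private static readonly char[] Digits = "0123456789".ToCharArray();

    // Splits a line at its first digit. When there is no digit,
    // the whole line is returned as the first part.
    public static (string Before, string After) Split(string line)
    {
        int numberIndex = line.IndexOfAny(Digits);
        if (numberIndex < 0)
            return (line.Trim(), string.Empty);

        // TrimEnd removes the space that separates the city from the number.
        return (line.Substring(0, numberIndex).TrimEnd(), line.Substring(numberIndex));
    }
}
```

For "Los Angeles 1234 Anytown, NY" this should return "Los Angeles" and "1234 Anytown, NY". Like the regex answers, it cannot handle a city name that itself starts with a number.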

You could use Regex. In the following, match is the first match in the regex results.
var match = Regex.Match(s, "[0-9]");
if (match.Success)
{
int index = match.Index;
originCity = s.Substring(0, index);
destination = s.Substring(index, s.Length - index);
}
Or you can do it yourself:
int index = -1;
for (int i = 0; i < s.Length; i++)
{
    // Record the position of the first digit, not its numeric value.
    if (char.IsDigit(s[i]))
    {
        index = i;
        break;
    }
}
...

You should see if using a regular expression will do what you need here. At least with the sample data you're showing, the expression:
(\D+)(\d+)(\D+)
would group the results into non-numeric characters up to the first numeric character, the numeric characters until a non-numeric is encountered, and then the rest of the non-numeric characters. Here is how it would be used in code:
var pattern = @"(\D+)(\d+)(\D+)";
var input = "Los Angeles 1234 Anytown, NY";
var result = Regex.Match(input, pattern);
var city = result.Groups[1].Value.Trim();
// Group 3 already starts with the space that follows the number.
var destination = $"{result.Groups[2].Value}{result.Groups[3].Value}";
This falls apart in cases like 29 Palms, California, or if the numbers contain commas, decimal points, etc., so it is certainly not a silver bullet. But I don't know your data, and such a simple solution may be good enough.

C# regular expression

I have string like this:
{F971h}[0]<0>some result code: 1
and I want to split it into:
F971
0
0
some result code: 1
I know I can first split "{|}|[|]|<|>" it into:
{F971h}
[0]
<0>
some result code: 1
and next: {F971h} -> F971; [0] -> 0; etc.
But how can I do it with one regular expression?
I tried something like this:
Regex rgx = new Regex(@"(?<timestamp>[0-9A-F]+)" + @"(?<subsystem>\d+)" + @"(?<level>\d+)" + @"(?<messagep>[0-9A-Za-z]+)");
var result = rgx.Matches(input);

You can try just Split without any regular expressions:
string source = "{F971h}[0]<0>some result code: 1";
string[] items = source.Split(new char[] { '{', '}', '[', ']', '<', '>' },
StringSplitOptions.RemoveEmptyEntries);
Test:
// F971h
// 0
// 0
// some result code: 1
Console.Write(String.Join(Environment.NewLine, items));

There are two issues with your regex:
You do not allow lowercase ASCII letters in the first capture group (add a-z or a RegexOptions.IgnoreCase flag)
The delimiting characters are missing in the pattern (<, >, [, ], etc.)
Use
{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)
See the regex demo
Since the messagep group should match just the rest of the line, I suggest just using .+ at the end. Else, you'd need to replace your [0-9A-Za-z]+ that does not allow whitespace with something like [\w\s]+ (match all word chars and whitespaces, 1 or more times).
C# code:
var s = @"{F971h}[0]<0>some result code: 1";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)";
var m = Regex.Match(s, pat);
if (m.Success)
{
Console.Out.WriteLine(m.Groups["timestamp"].Value);
Console.Out.WriteLine(m.Groups["subsystem"].Value);
Console.Out.WriteLine(m.Groups["level"].Value);
Console.Out.WriteLine(m.Groups["messagep"].Value);
}
Or for a multiline string containing multiple matches:
var s = "{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>[^\r\n]+)";
var res = System.Text.RegularExpressions.Regex.Matches(s, pat)
.Cast<System.Text.RegularExpressions.Match>()
.Select(x => new[] {
x.Groups["timestamp"].Value,
x.Groups["subsystem"].Value,
x.Groups["level"].Value,
x.Groups["messagep"].Value})
.ToList();

You can get it like that:
string line = @"{F971h}[0]<0>some result code: 1";
var matchCollection = Regex.Matches(line, @"\{(?<timestamp>.*?)\}\[(?<subsystem>.*?)\]<(?<level>.*?)>(?<messagep>.*)");
if (matchCollection.Count > 0)
{
string timestamp = matchCollection[0].Groups["timestamp"].Value;
string subsystem = matchCollection[0].Groups["subsystem"].Value;
string level = matchCollection[0].Groups["level"].Value;
string messagep = matchCollection[0].Groups["messagep"].Value;
Console.Out.WriteLine("First part is {0}, second: {1}, third: {2}, last: {3}", timestamp, subsystem, level, messagep);
}
else
{
Console.Out.WriteLine("No match found.");
}
You can watch it live here on regex storm. You'll have to learn about:
Named capture groups
Repetitions

Thank you all! The code below works for me. I had missed that the input can contain multiple lines:
{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5
code:
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<message>.+)";
var collection = Regex.Matches(input, pat);
foreach (Match m in collection)
{
var timestamp = m.Groups["timestamp"];
var subsystem = m.Groups["subsystem"];
var level = m.Groups["level"];
var message = m.Groups["message"];
}
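One detail the answers leave open is that the question asked for F971 rather than F971h. A possible sketch that drops the suffix is below; it assumes the timestamp is hexadecimal and always ends in h, which the thread never confirms, and the names LogLineParser and Parse are invented. It also uses [^\r\n]* for the message, because in .NET the . character matches \r, so .+ would leave a trailing \r on every message when lines end in \r\n.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // Assumption: the timestamp is hexadecimal and always carries a trailing 'h'.
    private static readonly Regex LinePattern = new Regex(
        @"\{(?<timestamp>[0-9A-Fa-f]+)h\}\[(?<subsystem>\d+)\]<(?<level>\d+)>(?<message>[^\r\n]*)");

    // Returns one string[4] per log line: timestamp, subsystem, level, message.
    public static List<string[]> Parse(string input)
    {
        var rows = new List<string[]>();
        foreach (Match m in LinePattern.Matches(input))
        {
            rows.Add(new[]
            {
                m.Groups["timestamp"].Value,
                m.Groups["subsystem"].Value,
                m.Groups["level"].Value,
                m.Groups["message"].Value
            });
        }
        return rows;
    }
}
```

If the h suffix is optional in your data, h? in place of h should cover both forms.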

Extract multiple white spaces from string

I want to get the runs of white space that are longer than one space.
The following gives me the empty strings between each letter as well as the white spaces. However, I only want to extract the two-space string between c and d, and the three-space string between f and g.
string b = "ab c  def   gh";
List<string> c = Regex.Split(b, @"[^\s]").ToList();
UPDATE:
The following works, but I'm looking for a more elegant way of achieving this:
c.RemoveAll(x => x == "" || x == " ");
The desired result would be a List<string> containing "  " and "   " (two spaces and three spaces).

If you want a List<String> as the result, you could execute this LINQ query:
string b = "ab c  def   gh";
List<String> c = Regex
    .Matches(b, @"\s{2,}")
    .OfType<Match>()
    .Select(match => match.Value)
    .ToList();

This should give you your desired List.
string b = "ab c  def   gh";
var regex = new Regex(@"\s\s+");
var result = new List<string>();
foreach (Match m in regex.Matches(b))
result.Add(m.Value);

If all you are interested in are these groups of whitespaces, you could use
foreach (Match match in Regex.Matches(b, @"\s\s+")) {
    // ... do something with match.Value
}
This guarantees that you will match at least 2 whitespaces.

Rather than splitting using a Regex, try using Regex.Matches to get all items matching your pattern - in this case I've used a pattern to match two or more whitespace characters, which I think is what you want?
var matchValues = Regex.Matches("ab c  def   gh", "\\s\\s+")
    .OfType<Match>().Select(m => m.Value).ToList();
Annoyingly, on .NET Framework the MatchCollection returned by Regex.Matches isn't IEnumerable<Match>, hence the need to use OfType<> in the LINQ expression.

You can use the following single line :
var list = Regex.Matches(value, @"[ ]{2,}").Cast<Match>().Select(match => match.Value).ToList();
Hope it will help you.

How to trim characters from certain patterned words in string?

Given the following string:
string s = "I need drop the 1 from the end of AAAAAAAA1 and BBBBBBBB1";
How can I trim the "1" from any 8-character string that ends in a 1? I got as far as finding a working Regex pattern that finds these strings, and I'm guessing I could use TrimEnd to remove the "1", but how do I modify the string itself?
Regex regex = new Regex("\\w{8}1");
foreach (Match match in regex.Matches(s))
{
MessageBox.Show(match.Value.TrimEnd('1'));
}
The result I'm looking for would be "I need drop the 1 from the end of AAAAAAAA and BBBBBBBB"

Regex.Replace is the tool for the job:
var regex = new Regex("\\b(\\w{8})1\\b");
s = regex.Replace(s, "$1");
I slightly modified the regular expression to match the description of what you are trying to do more closely.
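As a self-contained illustration of that Regex.Replace call, here is a short sketch; the class and method names (TrailingOne, Drop) are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingOne
{
    // Removes a trailing '1' from every word made of exactly
    // eight word characters followed by that '1'.
    public static string Drop(string s)
    {
        // \b anchors both ends to word boundaries, so the lone "1" and
        // longer words that merely end in 1 are left alone.
        // $1 in the replacement is the captured eight-character prefix.
        return Regex.Replace(s, @"\b(\w{8})1\b", "$1");
    }
}
```

Strings are immutable in .NET, so Replace returns a new string; the result has to be assigned back, as in s = TrailingOne.Drop(s).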

Here a non-regex approach:
s = string.Join(" ", s.Split().Select(w => w.Length == 9 && w.EndsWith("1") ? w.Substring(0, 8) : w));

In VB with LINQ:
Dim l = 8
Dim s = "I need drop the 1 from the end of AAAAAAAA1 and BBBBBBBB1"
Dim d = s.Split(" ").Aggregate(Function(p1, p2) p1 & " " & If(p2.Length = l + 1 And p2.EndsWith("1"), p2.Substring(0, p2.Length - 1), p2))

Try this:
s = s.Replace(match.Value, match.Value.TrimEnd('1'));
And the s string will have the value you want.

Extracting string between two characters?

I want to extract the email addresses between < and >, for example:
input string : "abc" <abc@gmail.com>; "pqr" <pqr@gmail.com>;
output string : abc@gmail.com;pqr@gmail.com

Without regex, you can use this:
public static string GetStringBetweenCharacters(string input, char charFrom, char charTo)
{
int posFrom = input.IndexOf(charFrom);
if (posFrom != -1) //if found char
{
int posTo = input.IndexOf(charTo, posFrom + 1);
if (posTo != -1) //if found char
{
return input.Substring(posFrom + 1, posTo - posFrom - 1);
}
}
return string.Empty;
}
And then:
GetStringBetweenCharacters("\"abc\" <abc@gmail.com>;", '<', '>')
you will get
abc@gmail.com

string input = @"""abc"" <abc@gmail.com>; ""pqr"" <pqr@gmail.com>;";
var output = String.Join(";", Regex.Matches(input, @"\<(.+?)\>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value));

Tested
string input = "\"abc\" <abc@gmail.com>; \"pqr\" <pqr@gmail.com>;";
var matchedValuesConcatenated = string.Join(";",
    Regex.Matches(input, @"(?<=<)([^>]+)(?=>)")
        .Cast<Match>()
        .Select(m => m.Value));
(?<=<) is a non-capturing lookbehind, so < must precede the match but is not included in the output.
The capturing group matches any character other than >, one or more times.
You can also use non-capturing groups, @"(?:<)([^>]+)(?:>)", but then the brackets are part of the match, so read m.Groups[1].Value instead of m.Value.
The answer from LB +1 is also correct. I just did not realize it was correct until I wrote an answer myself.

Use the String.IndexOf(char, int) method to search for < starting at a given index in the string (e.g. the last index that you found a > character at, i.e. at the end of the previous e-mail address - or 0 when looking for the first address).
Write a loop that repeats for as long as you find another < character, and everytime you find a < character, look for the next > character. Use the String.Substring(int, int) method to extract the e-mail address whose start and end position is then known to you.
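The loop described above could be sketched roughly as follows. The names BracketExtractor and ExtractBetween are made up, and the sketch assumes the brackets are not nested.

```csharp
using System;
using System.Collections.Generic;

public static class BracketExtractor
{
    // Collects every substring enclosed by the open and close characters.
    // Assumption: the delimiters are not nested.
    public static List<string> ExtractBetween(string input, char open, char close)
    {
        var found = new List<string>();
        int searchFrom = 0;
        while (searchFrom < input.Length)
        {
            int start = input.IndexOf(open, searchFrom);
            if (start < 0) break;               // no further opening character

            int end = input.IndexOf(close, start + 1);
            if (end < 0) break;                 // unterminated final entry

            found.Add(input.Substring(start + 1, end - start - 1));
            searchFrom = end + 1;               // resume after the closing character
        }
        return found;
    }
}
```

Joining the result with string.Join(";", ...) should produce the output format from the question.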

Could use the following regex and some linq.
var regex = new Regex(@"\<(.*?)\>");
var input = @"""abc"" <abc@gmail.com>; ""pqr"" <pqr@gmail.com>";
var matches = regex.Matches(input);
var res = string.Join(";", matches.Cast<Match>().Select(x => x.Value.Replace("<","").Replace(">","")).ToArray());
The < and > brackets are stripped afterwards; you could instead read x.Groups[1].Value, which already excludes them.

string str = "\"abc\" <abc@gmail.com>; \"pqr\" <pqr@gmail.com>;";
string output = string.Empty;
int open;
while ((open = str.IndexOf('<')) != -1)
{
    int close = str.IndexOf('>', open + 1);
    if (close == -1) break;
    // Substring takes (start, length); separate the addresses with ';'.
    if (output.Length > 0) output += ";";
    output += str.Substring(open + 1, close - open - 1);
    str = str.Substring(close + 1);
}
