Splitting data with Regex (Regular Expressions) - c#

I need some help matching data in this example string:
req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}
Every parameter is separated by ",", but its value is also enclosed in {} and can itself contain commas (as in sku and qty). I need some help with the regex, as I am not very good with it.
Desired Output:
req = "REQUESTER_NAME"
key = "abc"
act = "UPDATE"
sku[0] = "ABC123"
sku[1] = "DEF-123"
qty[0] = 10
qty[1] = 5

I would suggest the following:
1. Use String.Split with "}," as the separator (e.g. output req:{REQUESTER_NAME}). Splitting on ',' alone would also break multi-valued parameters such as sku:{ABC123,DEF-123}.
2. Split each pair with ':' as the separator (e.g. output "req", "{REQUESTER_NAME}").
3. Use String.Replace to replace the characters '{' and '}' with "" (e.g. output REQUESTER_NAME).
4. Split again with ',' as the separator (e.g. output "ABC123", "DEF-123").
That should parse it for you. You can store the results in your data structure as they come in (e.g. the name is available at step 2, while the value is available at step 3 for some parameters and at step 4 for others).
Hope that helps.
Note:
- If you are not familiar with String.Split: http://www.dotnetperls.com/split-vbnet (VB.NET examples)
- If you are not familiar with String.Replace: http://www.dotnetperls.com/replace-vbnet (VB.NET examples)
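A minimal C# sketch of those split/replace steps, assuming the values themselves never contain "}," or ':' and collecting everything into a Dictionary<string, string[]>. The names StepParser and Parse are hypothetical. It splits the pairs on "}," rather than on ',' so that multi-valued parameters such as sku stay together until the last step:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a sketch of the split/replace steps, not a library API.
// Assumes values never contain "}," or ':'.
public static class StepParser
{
    public static Dictionary<string, string[]> Parse(string input)
    {
        var result = new Dictionary<string, string[]>();

        // Step 1: split into "name:{value" pairs on "}," (the last pair keeps its '}')
        foreach (string pair in input.Split(new[] { "}," }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Step 2: split the pair into name and value at the first ':'
            string[] nameAndValue = pair.Split(new[] { ':' }, 2);
            if (nameAndValue.Length != 2)
                continue; // skip malformed pairs

            // Step 3: strip the braces
            string value = nameAndValue[1].Replace("{", "").Replace("}", "");

            // Step 4: split multi-valued parameters on ','
            result[nameAndValue[0]] = value.Split(',');
        }
        return result;
    }
}
```

The qty values come back as strings; use int.Parse(...) on them if you need numbers.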

The sample below may help solve your problem, though it involves a lot of string manipulation.
string input = "req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}";
Console.WriteLine(input);
// Split into "name:{value" chunks; the last chunk keeps its closing '}'
string[] words = input.Split(new string[] { "}," }, StringSplitOptions.RemoveEmptyEntries);
foreach (string item in words)
{
    if (item.Contains(":"))
    {
        // Prefix every extra value with its name, e.g. "sku:{ABC123,sku:DEF-123"
        string modifiedString = item.Replace(",", "," + item.Substring(0, item.IndexOf(':')) + ":");
        string[] wordsColl = modifiedString.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string item1 in wordsColl)
        {
            // Strip the braces, leaving "name:value", e.g. "sku:ABC123"
            string finalString = item1.Replace("{", "");
            finalString = finalString.Replace("}", "");
            Console.WriteLine(finalString);
        }
    }
}

First, use Regex.Matches to get the parameters inside { and }.
string str = "req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}";
MatchCollection matches = Regex.Matches(str, @"\{.+?\}");
string[] arr = matches.Cast<Match>()
    .Select(m => m.Groups[0].Value.Trim(new char[] { '{', '}', ' ' }))
    .ToArray();
foreach (string s in arr)
    Console.WriteLine(s);
Output:
REQUESTER_NAME
abc
UPDATE
ABC123,DEF-123
10,5
Then use Regex.Split to get the parameter names:
string[] arr1 = Regex.Split(str, @"\{.+?\}")
    .Select(x => x.Trim(new char[] { ',', ':', ' ' }))
    .Where(x => !string.IsNullOrEmpty(x)) // needed to get rid of empty strings
    .ToArray();
foreach (string s in arr1)
    Console.WriteLine(s);
Output:
req
key
act
sku
qty
Now you can iterate over the parameters, something like this:
for (int i = 0; i < arr.Length; i++)
{
    if (arr1[i] == "req")
    {
        // arr[i] contains the req parameter
    }
    else if (arr1[i] == "sku")
    {
        // arr[i] contains the sku parameters; split it on ',' to process each one
        string[] skus = arr[i].Split(',');
    }
}
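The Matches and Split calls above can also be folded into a single pass. The sketch below is an alternative, not the answer's own code: it uses one pattern with named groups and assumes values never contain '}'. RegexParser and the group names name/value are illustrative:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: one regex with named groups instead of Matches + Split.
public static class RegexParser
{
    // name = word characters before ':', value = everything between '{' and the next '}'
    private static readonly Regex PairPattern =
        new Regex(@"(?<name>\w+):\{(?<value>[^}]*)\}");

    public static Dictionary<string, string[]> Parse(string input)
    {
        var result = new Dictionary<string, string[]>();
        foreach (Match m in PairPattern.Matches(input))
        {
            // Multi-valued parameters such as sku and qty are split on ','
            result[m.Groups["name"].Value] = m.Groups["value"].Value.Split(',');
        }
        return result;
    }
}
```

Because each match carries both the name and its value, there is no need to keep two parallel arrays in step.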

Kishore's answer is correct. This extension method may help implement that suggestion:
<Extension()>
Function WideSplit(InputString As String, SplitToken As String) As String()
    Dim aryReturn As String()
    Dim intIndex As Integer = InputString.IndexOf(SplitToken)
    If intIndex = -1 Then
        aryReturn = {InputString}
    Else
        ReDim aryReturn(1)
        aryReturn(0) = InputString.Substring(0, intIndex)
        aryReturn(1) = InputString.Substring(intIndex + SplitToken.Length)
    End If
    Return aryReturn
End Function
The function must be declared in a Module, and System.Runtime.CompilerServices must be imported for the <Extension()> attribute. You can then use it like this:
Dim stringToParse As String = "req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}"
Dim strTemp As String
Dim aryTemp As String()
strTemp = stringToParse.WideSplit("req:{")(1)
aryTemp = strTemp.WideSplit("},key:{")
req = aryTemp(0)
aryTemp = aryTemp(1).WideSplit("},act:{")
key = aryTemp(0)
'etc...
You may be able to do this more memory-efficiently, though, as this method creates a number of temporary strings.
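For C# readers, a rough equivalent of WideSplit as a C# extension method might look like this (the class name StringExtensions is only illustrative). The built-in input.Split(new[] { token }, 2, StringSplitOptions.None) should give the same result for a non-empty token, since the count argument stops splitting after the first match:

```csharp
using System;

// Illustrative C# counterpart of the VB WideSplit extension above.
public static class StringExtensions
{
    // Splits at the first occurrence of token and returns at most two parts.
    public static string[] WideSplit(this string input, string token)
    {
        int index = input.IndexOf(token, StringComparison.Ordinal);
        if (index == -1)
            return new[] { input }; // token not found: return the whole string

        return new[] { input.Substring(0, index), input.Substring(index + token.Length) };
    }
}
```

Used the same way as the VB version, "REQUESTER_NAME},key:{abc}".WideSplit("},key:{") yields the requester name and the rest of the string.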

Kishore's solution is perfect, but here is another solution that works with regex:
Dim input As String = "req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}"
Dim Array = Regex.Split(input, ":{|}|,")
This does essentially the same thing: it uses a regex to split on ":{", "}" and ",". The solution is a bit shorter, though. The values will be put into the array like this:
"req", "REQUESTER_NAME","", ... , "qty", "10", "5", ""
Notice after the parameter and its value(s) there will be an empty string in the array. When looping over the array you can use this to let the program know when a new parameter starts. Then you can create a new array/data structure to store its values.
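A sketch of that loop in C#, assuming the same ":{|}|," pattern and that no value is itself empty (an empty value would be mistaken for the end-of-parameter marker). SplitLoopParser is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: walks the Regex.Split output, using "" as the
// marker that one parameter's values have ended.
public static class SplitLoopParser
{
    public static Dictionary<string, List<string>> Parse(string input)
    {
        var result = new Dictionary<string, List<string>>();
        string currentName = null; // null means "the next element is a parameter name"

        foreach (string token in Regex.Split(input, ":{|}|,"))
        {
            if (token.Length == 0)
            {
                currentName = null; // empty string: the current parameter is finished
            }
            else if (currentName == null)
            {
                currentName = token; // start a new parameter
                result[currentName] = new List<string>();
            }
            else
            {
                result[currentName].Add(token); // another value for the current parameter
            }
        }
        return result;
    }
}
```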

Related

how to get before and after text in the string which contains 'Or' in it

I want code that takes a string containing 'Or', takes the text before and after 'Or', and stores each part in a separate variable.
I tried the Substring function:
var text = "Actor or Actress";
var result = text.Substring(0, text.LastIndexOf("or"));
but this only gives me "Actor". I want both "Actor" and "Actress", in separate variables and as whole words, since the text can be anything in place of 'Actor or Actress'.
You need to use one of the flavors of String.Split that accepts an array of string delimiters:
string text = "Actor or Actress";
string[] delim = new string[] { " or " }; // add spaces around it to avoid splitting "Actor" at the "or" at its end
string[] elements = text.Split(delim, StringSplitOptions.None);
foreach (string elem in elements)
{
Console.WriteLine(elem);
}
Output:
Actor
Actress
Note: I am using .NET Framework 4.8; .NET Core 2.0 and later (including .NET 6) also have an overload of String.Split that accepts a single string delimiter, so there is no need to create delim as an array of strings unless you want to split on several variations, such as " Or " and " or ".
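If the separator may appear as "or", "Or" or "OR", a regular expression with RegexOptions.IgnoreCase avoids listing every variation. A minimal sketch, assuming the separator is always surrounded by whitespace (OrSplitter is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: splits on the standalone word "or" in any casing.
public static class OrSplitter
{
    // Requiring whitespace on both sides means the "or" at the end of
    // "Actor" is not treated as a separator.
    public static string[] SplitOnOr(string text) =>
        Regex.Split(text, @"\s+or\s+", RegexOptions.IgnoreCase);
}
```

The two halves can then be assigned to separate variables, e.g. string[] parts = OrSplitter.SplitOnOr(text); followed by parts[0] and parts[1].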
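Split() should do the job (include the spaces around "or", otherwise the "or" at the end of "Actor" is also treated as a separator):
text.Split(" or ");
Use the Split() method:
var text = "Actor or Actress";
Console.WriteLine(text.Split(" or ")[0]);
Console.WriteLine(text.Split(" or ")[1]);
Output
Actor
Actress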
Try this (it assumes that neither side contains spaces):
string[] array = text.Split(' ');
foreach (string item in array)
{
    if (item != "or")
    {
        Console.WriteLine(item);
    }
}
Output
Actor
Actress

Split a string by a newline in C#

I have a string like this:
SITE IÇINDE OLMASI\nLÜKS INSAA EDILMIS OLMASI\nSITE IÇINDE YÜZME HAVUZU,
VB. SOSYAL YASAM ALANLARININ OLMASI.\nPROJESİNE UYGUN YAPILMIŞ OLMASI
I'm trying to split and save this string like this:
string[] array2 = mystring.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
foreach (var str in array2)
{
    if (str != null && str != "")
    {
        _is.RelatedLook.InternalPositive += str;
    }
}
I also tried
mystring.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
Neither of these splits my string. How can I split it correctly? Thanks
var result = mystring.Split(new string[] {"\\n"}, StringSplitOptions.None);
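Your string apparently contains the literal two-character sequence \n (a backslash followed by n) rather than a real newline character, so the separator has to contain an escaped backslash: "\\n" in a regular string literal, or @"\n" in a verbatim one.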
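If you are not sure which form the text contains (for example, some strings arrive with real line breaks and others with the escaped \n), one option is to pass both kinds of separators to String.Split. This is a sketch under that assumption; NewlineSplitter is a hypothetical name. "\r\n" is listed before "\n" and "\r" so that a Windows line break is treated as a single separator:

```csharp
using System;

// Hypothetical helper: splits on the literal backslash-n sequence as well as
// on real line breaks, dropping empty entries.
public static class NewlineSplitter
{
    public static string[] SplitLines(string text) =>
        text.Split(new[] { "\\n", "\r\n", "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);
}
```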
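In LINQPad I was able to split it (this works when the string contains real newline characters rather than a literal backslash followed by n):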
var ug = "SITE IÇINDE OLMASI\nLÜKS INSAA EDILMIS OLMASI\nSITE IÇINDE YÜZME HAVUZU, VB. SOSYAL YASAM ALANLARININ OLMASI.\nPROJESİNE UYGUN YAPILMIŞ OLMASI";
var test = ug.Split('\n');
test.Dump();
Convert the current platform's newline sequence (Environment.NewLine) to a string, and split by that, i.e.:
string clipboardText = Clipboard.GetText();
string[] seperatingTags = { Environment.NewLine.ToString() };
List<string> Lines = clipboardText.Split(seperatingTags, StringSplitOptions.RemoveEmptyEntries).ToList();
Splitting by newline is tricky because line endings are not consistent: you can get several blank lines in a row or different combinations of \r and \n. I tried many methods, including some in this thread, and finally came up with my own solution, which seems to handle all the cases I have come across.
I am using Regex.Split with some cleanup, wrapped in an extension method:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
public static class StringExtensions
{
    public static IEnumerable<string> SplitByLine(this string str)
    {
        // The captured \r/\n groups also end up in the output; Trim + Where removes them along with blank lines
        return Regex
            .Split(str, @"((\r)+)?(\n)+((\r)+)?")
            .Select(i => i.Trim())
            .Where(i => !string.IsNullOrEmpty(i));
    }
}
usage
var lines = "string with\nnew lines\r\n\n\n with all kind of weird com\n\r\r\rbinations".SplitByLine();

How to split a defined string from another string and get the first item after split

I have this string D:\ASN\Documents\ENU\LO\ANL\File\05003ede-59bf-45c6-bb57-a6111e9f18e0\linux-cheat-sheet.pdf and I want to remove D:\ASN\Documents\ENU\LO from it and then get the first folder after the split (in this case ANL).
I tried something like this:
string fullpath = "D:\\ASN\\Documents\\ENU\\LO\\ANL\\File\\05003ede-59bf-45c6-bb57-a6111e9f18e0\\linux-cheat-sheet.pdf";
string[] sep = new string[] { "D:\\ASN\\Documents\\ENU\\LO" };
string[] result = fullpath.Split(sep, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in result)
{
    Console.Write(s.Substring(s.IndexOf(@"\") + 1));
}
But this gives me ANL\File\05003ede-59bf-45c6-bb57-a6111e9f18e0\linux-cheat-sheet.pdf, and I need only ANL. How can this be achieved? Is there another way to do this?
TIA
string samplePath = "D:\\ASN\\Documents\\ENU\\LO";
var result = fullpath.Replace(samplePath, "").Split('\\')[1];
You can replace the first part (samplePath) with nothing, removing it (or use Substring to take the part of fullpath after the first samplePath.Length characters), and then split the result on '\'. Because the remaining string starts with '\', element 0 is empty and element 1 is the folder you want.
Here's a working version: https://dotnetfiddle.net/k4tfGP
You can split the second string on \ and take the first element:
string sampleString = @"ANL\File\05003ede-59bf-45c6-bb57-a6111e9f18e0\linux-cheat-sheet.pdf";
string[] stringArray = sampleString.Split('\\');
string wantedString = stringArray[0];
This is not what Split() is intended for. Split() is generally used to divide a string into multiple sections based on a separator. In your case, you might have wanted to use it to separate the sub-folders by splitting on '\'.
But you want something else: to remove a section of text. If you know that the text will always be at the start, try this:
string result = fullpath.Substring("D:\\ASN\\Documents\\ENU\\LO".Length);
This returns the original string minus the first X characters, where X is exactly the length of the string you want to remove. Here that leaves "\ANL\File\...", which you can then split on '\' to get ANL.
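Combining that Substring call with a StartsWith check and a search for the next '\' gives the folder name directly. A sketch, assuming a case-insensitive comparison is acceptable (as it is for Windows paths); PathHelper and FirstSegmentAfter are hypothetical names. It uses plain string operations with '\' as the separator, so it behaves the same on any OS:

```csharp
using System;

// Hypothetical helper: returns the first folder after basePath,
// or null when fullPath does not start with basePath.
public static class PathHelper
{
    public static string FirstSegmentAfter(string fullPath, string basePath)
    {
        if (!fullPath.StartsWith(basePath, StringComparison.OrdinalIgnoreCase))
            return null;

        // Drop the base path and any leading separator: "\ANL\File\..." -> "ANL\File\..."
        string remainder = fullPath.Substring(basePath.Length).TrimStart('\\');

        // Everything before the next separator is the folder we want
        int next = remainder.IndexOf('\\');
        return next == -1 ? remainder : remainder.Substring(0, next);
    }
}
```

Note that a plain prefix test would also accept a path that continues with ...\LOGS\... for the base ...\LO; if that matters, check that the character right after the base path is '\'.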
string fullpath =
"D:\\ASN\\Documents\\ENU\\LO\\ANL\\File\\05003ede-59bf-45c6-bb57-a6111e9f18e0\\linux-cheat-sheet.pdf";
string[] sep = new string[] {"D:\\ASN\\Documents\\ENU\\LO"};
string[] result = fullpath.Split(sep, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in result)
{
    Console.Write(s.Substring(s.IndexOf(@"\") + 1, s.IndexOf(@"\", 2) - 1));
}
String.IndexOf returns the index of the first occurrence, but it has overloads that take a starting index. Here the remaining string always starts with "\" at index 0, so the second search starts at index 2 to skip it and find the "\" after ANL; because the leading "\" is at index 0, that index minus 1 is also the length of ANL.
string basepath = "D:\\ASN\\Documents\\ENU\\LO\\";
string fullpath = "D:\\ASN\\Documents\\ENU\\LO\\ANL\\File\\05003ede-59bf-45c6-bb57-a6111e9f18e0\\linux-cheat-sheet.pdf";
fullpath = fullpath.Replace(basepath, "");
string returnValue = fullpath.Remove(fullpath.IndexOf("\\"), fullpath.Length-fullpath.IndexOf("\\"));
Worked here...

C# how to split a string backwards?

What I'm trying to do is split a string backwards, meaning right to left.
string startingString = "<span class=\"address\">Hoopeston,, IL 60942</span><br>";
Normally I would do this:
string[] splitStarting = startingString.Split('>');
so splitStarting[1] would be "Hoopeston,, IL 60942</span"
then I would do
string[] splitAgain = splitStarting[1].Split('<');
so splitAgain[0] would be "Hoopeston,, IL 60942"
Now this is what I want to do: split on ' ' (a space) from the right, for the last 2 spaces only.
For example, my array would come back like this:
[0]="60942"
[1]="IL"
[2] = "Hoopeston,,"
To make this even harder, I only ever want the first two splits from the right. Normally I would do something like this:
string[] cityStateZip = splitAgain[0].Split(new char[] { ' ' }, 3);
but how would you do that backwards? The reason is that the city could be two words, so the extra ' ' would break the city name.
A regular expression with named groups makes this much simpler. There's no need to reverse strings; just pluck out what you want.
var pattern = @">(?<city>.*) (?<state>.*) (?<zip>.*?)<";
var expression = new Regex(pattern);
Match m = expression.Match(startingString);
if (m.Success)
{
    Console.WriteLine("Zip: " + m.Groups["zip"].Value);
    Console.WriteLine("State: " + m.Groups["state"].Value);
    Console.WriteLine("City: " + m.Groups["city"].Value);
}
Testing the pattern in a regex tester against a two-word city (Las Vegas) gives the following results:
Found 1 match:
1. >Las Vegas,, IL 60942< has 3 groups:
1. Las Vegas,, (city)
2. IL (state)
3. 60942 (zip)
One possible solution - not optimal but easy to code - is to reverse the string, then to split that string using the "normal" function, then to reverse each of the individual split parts.
Another possible solution is to use regular expressions instead.
I think you should do it like this:
var s = splitAgain[0];
var zipCodeStart = s.LastIndexOf(' ');
var zipCode = s.Substring(zipCodeStart + 1);
s = s.Substring(0, zipCodeStart);
var stateStart = s.LastIndexOf(' ');
var state = s.Substring(stateStart + 1);
var city = s.Substring(0, stateStart );
var result = new [] {zipCode, state, city};
Result will contain what you requested.
If Split could do everything, there would be so many overloads that it would become confusing.
Don't use Split; just write it yourself with Substring and LastIndexOf.
string str = "Hoopeston,, IL 60942";
string[] parts = new string[3];
int place = str.LastIndexOf(' ');
parts[0] = str.Substring(place+1);
int place2 = str.LastIndexOf(' ',place-1);
parts[1] = str.Substring(place2 + 1, place - place2 -1);
parts[2] = str.Substring(0, place2);
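The same LastIndexOf idea generalises to a small helper that splits from the right with a maximum count, mirroring Split(separator, count) but working backwards. A sketch; RightSplit and SplitFromRight are hypothetical names, and the pieces come back right-to-left so they match the [0]=zip, [1]=state, [2]=city order asked for:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a right-to-left counterpart of Split(separator, count).
public static class RightSplit
{
    // Returns at most maxParts pieces ordered right-to-left. The last piece is
    // whatever is left over on the left, and may still contain separators.
    public static string[] SplitFromRight(string text, char separator, int maxParts)
    {
        if (maxParts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxParts));

        var parts = new List<string>();
        string remaining = text;
        while (parts.Count < maxParts - 1)
        {
            int index = remaining.LastIndexOf(separator);
            if (index == -1)
                break; // no more separators: stop early
            parts.Add(remaining.Substring(index + 1));
            remaining = remaining.Substring(0, index);
        }
        parts.Add(remaining); // e.g. a multi-word city such as "East St Louis,,"
        return parts.ToArray();
    }
}
```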
You can use a regular expression to get the three parts of the string inside the tag, and use LINQ extensions to get the strings in the right order.
Example:
string startingString = "<span class=\"address\">East St Louis,, IL 60942</span><br>";
string[] city =
Regex.Match(startingString, @"^.+>(.+) (\S+) (\S+?)<.+$")
.Groups.Cast<Group>().Skip(1)
.Select(g => g.Value)
.Reverse().ToArray();
Console.WriteLine(city[0]);
Console.WriteLine(city[1]);
Console.WriteLine(city[2]);
Output:
60942
IL
East St Louis,,
How about
using System.Linq;
...
splitAgain[0].Split(' ').Reverse().ToArray()
-edit-
OK, I missed the last part about multi-word cities. You can still use LINQ, though:
splitAgain[0].Split(' ').Reverse().Take(2).ToArray()
would get you the
[0]="60942"
[1]="IL"
The city would not be included here, though. You could still do the whole thing in one statement, but it would be a little messy:
var elements = splitAgain[0].Split(' ');
var result = elements
.Reverse()
.Take(2)
.Concat( new[ ] { String.Join( " " , elements.Take( elements.Length - 2 ).ToArray( ) ) } )
.ToArray();
So we are:
- splitting the string,
- reversing it,
- taking the first two elements (originally the last two),
- then making a new array with a single string element, built from the original array of elements minus the last 2 elements (state and zip code), and appending it.
As I said, it's a little messy, but it will get you the array you want. If you don't need an array in that exact format, you could simplify the code above a little.
You could also do:
var result = new[ ]{
elements[elements.Length - 1], //last element
elements[elements.Length - 2], //second to last
String.Join( " " , elements.Take( elements.Length - 2 ).ToArray( ) ) //rebuild original string - 2 last elements
};
At first I thought you should use the Array.Reverse() method, but I see now that the issue is splitting on ' ' (space).
Your first value could contain a space (e.g. "New York"), so you don't want to split on every space.
If you know the string will only ever have 3 values in it, you could use String.LastIndexOf(" ") and then String.Substring() to trim that value off, then do the same again to find the middle value. You will then be left with the first value, with or without spaces.
I was facing a similar issue with audio file-name conventions.
I did it this way: convert the string to a char array, reverse it, split it, and then reverse each part back to normal.
// fullAddress is the "city state zip" text, e.g. splitAgain[0]
char[] addressInCharArray = fullAddress.ToCharArray();
Array.Reverse(addressInCharArray);
// Split the reversed text at most 3 ways, so a multi-word city stays in the last part
string[] parts = (new string(addressInCharArray)).Split(new char[] { ' ' }, 3);
string[] subAddress = new string[parts.Length];
int j = 0;
foreach (string part in parts)
{
    // Reverse each part back to its original reading order
    addressInCharArray = part.ToCharArray();
    Array.Reverse(addressInCharArray);
    subAddress[j++] = new string(addressInCharArray);
}

C# Path operations

I need to get
"first_level" and "second_level\third_level" from the original path "first_level\second_level\third_level", i.e. something that splits the path into two parts at the first separator. Is there a C# method in the .NET library that does that?
Use the Split overload that takes a count for the maximum number of substrings to return:
string input = @"first_level\second_level\third_level";
string[] result = input.Split(new[] { '\\' }, 2);
foreach (string s in result)
Console.WriteLine(s);
// result[0] = "first_level"
// result[1] = "second_level\third_level"
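If the path might use '/' as well as '\', IndexOfAny with both characters finds whichever separator comes first. A sketch (PathSplit and SplitAtFirstSeparator are hypothetical names); literal characters are used rather than Path.DirectorySeparatorChar because on Linux and macOS both Path.DirectorySeparatorChar and Path.AltDirectorySeparatorChar are '/':

```csharp
using System;

// Hypothetical helper: splits a relative path into (first, rest) at its first separator.
public static class PathSplit
{
    public static (string First, string Rest) SplitAtFirstSeparator(string path)
    {
        // Accept both Windows-style and Unix-style separators
        int index = path.IndexOfAny(new[] { '\\', '/' });
        if (index == -1)
            return (path, string.Empty); // no separator: everything is the first part

        return (path.Substring(0, index), path.Substring(index + 1));
    }
}
```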
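string myPath = @"first_level\second_level\third_level";
string[] levels = myPath.Split('\\');
and
levels[0] will be equal to first_level
levels[1] will be equal to second_level
levels[2] will be equal to third_level
Is this what you're asking? (To get "second_level\third_level" back as one string, you would have to join the remaining parts again, e.g. string.Join("\\", levels, 1, levels.Length - 1).)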
