C# Regex.Split and Regular expression

I have a string that I need to split twice, keeping only the part that comes after a special character.
Let's say:
string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
string[] guids = Regex.Split(myString,";#");
This gives me an array of two elements, each containing a value plus a GUID. But I need only the GUIDs, like:
[0]82e146e7-bc85-4bd4-a691-23d55c686f4b
[1]55140947-00d0-4d75-9b5c-00d8d5ab8436
Is there any way of doing it in one or two lines?

You can do this, but just because you can do it in one line doesn't mean you should (readability suffers if you get too fancy here). Note that there is no validation here at all.
string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
string[] guids = Regex.Split(myString, ";#")
.SelectMany(s => Regex.Split(s, @"\|").Skip(1))
.ToArray();
Assert.AreEqual(2, guids.Length);
Assert.AreEqual("82e146e7-bc85-4bd4-a691-23d55c686f4b", guids[0]);
Assert.AreEqual("55140947-00d0-4d75-9b5c-00d8d5ab8436", guids[1]);

You could easily do this without a regex if the last part of each entry is always a GUID (36 characters):
string[] guids = myString.Split(';').Select(c => c.Substring(c.Length - 36)).ToArray();

string[] guids = myString.Split(';').Select(x => x.Split('|')[1]).ToArray();

string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
//split the string by ";#"
string[] results = myString.Split(new string[] { ";#" }, StringSplitOptions.RemoveEmptyEntries);
//remove the "value|" part
results[0] = results[0].Substring(results[0].IndexOf('|') + 1);
results[1] = results[1].Substring(results[1].IndexOf('|') + 1);
//Same as above, but in a for loop. Useful if there are more than 2 GUIDs to find
//for(int i = 0; i < results.Length; i++)
// results[i] = results[i].Substring(results[i].IndexOf('|') + 1);
foreach(string result in results)
Console.WriteLine(result);

var guids = Regex
.Matches(myString, @"HEX{8}-HEX{4}-HEX{4}-HEX{4}-HEX{12}".Replace("HEX", "[A-Fa-f0-9]"))
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
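
Since the first answer points out that none of these snippets validate anything, here is a minimal sketch of one way to add validation: split as before, then keep only the pieces that Guid.TryParse accepts. The class and method names (GuidExtractor, Extract) are hypothetical and only for illustration; the input format is assumed to be the one from the question.

```csharp
using System;
using System.Collections.Generic;

public static class GuidExtractor
{
    // Splits on ";#", takes the text after the first '|' in each entry,
    // and keeps only the pieces that Guid.TryParse accepts.
    public static List<Guid> Extract(string input)
    {
        var result = new List<Guid>();
        string[] entries = input.Split(new[] { ";#" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string entry in entries)
        {
            int bar = entry.IndexOf('|');
            if (bar < 0)
                continue; // no separator, so there is nothing to parse

            Guid parsed;
            if (Guid.TryParse(entry.Substring(bar + 1), out parsed))
                result.Add(parsed);
        }
        return result;
    }

    public static void Main()
    {
        string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
        foreach (Guid g in Extract(myString))
            Console.WriteLine(g);
    }
}
```

Entries without a '|' or with a malformed GUID are skipped silently in this sketch; whether to skip or throw depends on how much you trust the input.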

Related

Regex to split by a Targeted String up to a certain character

I have an LDAP query from which I need to build the domain.
So, split by "DC=" up to a comma.
INPUT:
LDAP://DC=SOMETHINGS,DC=ELSE,DC=NET\account
RESULT:
SOMETHINGS.ELSE.NET

You can do it pretty simply using the regex pattern DC=(\w*).
var str = @"LDAP://DC=SOMETHINGS,DC=ELSE,DC=NET\account";
var result = String.Join(".", Regex.Matches(str, @"DC=(\w*)")
.Cast<Match>()
.Select(m => m.Groups[1].Value));

Without Regex you can do:
string ldapStr = @"LDAP://DC=SOMETHINGS,DC=ELSE,DC=NET\account";
int startIndex = ldapStr.IndexOf("DC=");
// The domain components end at the backslash that precedes the account name.
int endIndex = ldapStr.IndexOf('\\');
string output = null;
if (startIndex >= 0 && endIndex > startIndex)
{
    string domainComponentStr = ldapStr.Substring(startIndex, endIndex - startIndex);
    output = String.Join(".", domainComponentStr.Split(new[] {"DC=", ","}, StringSplitOptions.RemoveEmptyEntries));
}
If you are always going to get the string in a similar format, then you can also do:
string ldapStr = @"LDAP://DC=SOMETHINGS,DC=ELSE,DC=NET\account";
var outputStr = String.Join(".", ldapStr.Split(new[] {"DC=", ",","\\"}, StringSplitOptions.RemoveEmptyEntries)
.Skip(1)
.Take(3));
And you will get:
outputStr = "SOMETHINGS.ELSE.NET"

Parse for words starting with # character in a string

I have to write a program which parses a string for words starting with '#' and return the words along with the # symbol.
I have tried something like:
char[] delim = { '#' };
string[] strArr = commenttext.Split(delim);
return strArr;
But it returns all the words without '#' in an array.
I need something pretty straightforward. No LINQ-like things.
If the string is "abc #ert #xyz" then I should get back #ert and #xyz.

If you define "word" as "separated by spaces" then this would work:
string[] strArr = commenttext.Split(' ')
.Where(w => w.StartsWith("#"))
.ToArray();
If you need something more complex, a Regular Expression might be more appropriate.
"I need something pretty straightforward. No LINQ-like things."
The non-Linq equivalent would be:
var words = commenttext.Split(' ');
List<string> temp = new List<string>();
foreach(string w in words)
{
if(w.StartsWith("#"))
temp.Add(w);
}
string[] strArr = temp.ToArray();
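
The regular-expression route mentioned above can also be written without LINQ. This is a minimal sketch, assuming a "word" is any run of non-whitespace characters and that only a '#' at the start of such a run counts. The names HashWords and Find are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HashWords
{
    // Returns every whitespace-delimited word that starts with '#',
    // keeping the '#' itself.
    public static List<string> Find(string text)
    {
        var result = new List<string>();
        // (?<!\S) requires start-of-string or whitespace before the '#';
        // \S+ then takes everything up to the next whitespace.
        foreach (Match m in Regex.Matches(text, @"(?<!\S)#\S+"))
            result.Add(m.Value);
        return result;
    }

    public static void Main()
    {
        foreach (string word in Find("abc #ert #xyz"))
            Console.WriteLine(word);
    }
}
```

Because of the lookbehind, a '#' in the middle of a word (as in "a#b") is ignored, while "#C#" is returned whole.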

If you're against using LINQ, which you should not be unless you're required to use older .NET versions, an approach along these lines would suit your needs.
string[] words = commenttext.Split(' ');
for (int i = 0; i < words.Length; i++)
{
    string word = words[i];
    if (word.StartsWith("#"))
    {
        // save in array / list
    }
}

const string test = "#Amir abcdef #Stack #C# mnop xyz";
var splited = test.Split(' ').Where(m => m.StartsWith("#")).ToList();
foreach (var b in splited)
{
Console.WriteLine(b.Substring(1, b.Length - 1));
}
Console.ReadKey();

Extracting ONLY number values between specific characters while ignoring string-number combination

I have for example this string,
|SomeText1|123|0#$0#62|SomeText2|456|6#83|SomeText3#61|SomeText1#41|SomeText5#62|SomeText3#82|SomeText9#40|SomeText2#$1#166|SomeText2|999|7#146|SomeText2#167|SomeText2#166|
I want to extract only number values and add them to list and later sum them. That means values,
|123|,|456|, |999|.
All other values like,
|SomeText1|,|SomeText2|,|SomeText2#$1#166|
shouldn't be in list.
I'm working with C#. I tried something like,
int sum = 0;
List<int> results = new List<int>();
Regex regexObj = new Regex(@"\|(.*?)\|");
Match matchResults = regexObj.Match(s);
while (matchResults.Success)
{
results.Add(Convert.ToInt32(matchResults));
matchResults = matchResults.NextMatch();
}
for (int i = 0; i < results.Count; i++)
{
int bonusValues = results[i];
sum = sum + bonusValues;
}
So the basic idea is to extract the values between | | characters and ignore the ones that are not pure digits, like
|#16543TextBL#aBLa564B|

string input = @"|SomeText1|123|0#$0#62|SomeText2|456|6#83|SomeText3#61|SomeText1#41|SomeText5#62|SomeText3#82|SomeText9#40|SomeText2#$1#166|SomeText2|999|7#146|SomeText2#167|SomeText2#166|";
var numbers = Regex.Matches(input, @"\|(\d+)\|")
.Cast<Match>()
.Select(m => m.Groups[1].Value).ToList();
var sum = numbers.Sum(n => int.Parse(n));
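
One caveat with a pattern of the form \|(\d+)\| is that each match consumes its closing pipe, so in input such as "|1|2|3|" the middle field is skipped. The sketch below is one possible variation using lookarounds, which assert the surrounding pipes without consuming them. The names PipeNumbers and SumNumericFields are hypothetical, and [0-9] is used instead of \d so that only ASCII digits reach int.Parse.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PipeNumbers
{
    // (?<=\|) and (?=\|) check for the pipes on either side without
    // consuming them, so consecutive numeric fields are all matched.
    public static int SumNumericFields(string input)
    {
        return Regex.Matches(input, @"(?<=\|)[0-9]+(?=\|)")
                    .Cast<Match>()
                    .Sum(m => int.Parse(m.Value));
    }

    public static void Main()
    {
        // Fields like "0#$0#62" and "6#83" are ignored: 123 + 456 = 579
        Console.WriteLine(SumNumericFields("|SomeText1|123|0#$0#62|SomeText2|456|6#83|"));
        // Adjacent numeric fields are all counted: 1 + 2 + 3 = 6
        Console.WriteLine(SumNumericFields("|1|2|3|"));
    }
}
```

For the exact string in the question the two patterns give the same result, because no two numeric fields sit next to each other there.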

If regex isn't a definite requirement, you could use LINQ:
stringName.Split("|".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.Where(x => x.All(char.IsDigit)).ToList();
With the sum:
stringName.Split("|".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.Where(x => x.All(char.IsDigit)).Sum(x => int.Parse(x));

Try the code below:
Regex regexObj = new Regex(@"\d+");
\d represents any digit, and + means one or more.
If you want to catch negative numbers as well, you can use -?\d+.
Regex regexObj = new Regex(@"-?\d+");
Note that as a C# string literal, the pattern should be written as "\\d+" or @"\d+".

If you don't specifically need regex, you could simply split the string with:
string[] splits = s.Split('|');
Then you could loop through the resulting strings and try to parse each one as an integer:
for (int i = 0; i < splits.Length; i++)
{
    int num;
    bool isNum = int.TryParse(splits[i], out num);
    if (isNum)
    {
        list.Add(num);
    }
}

Extracting parts of a string c#

In C# what would be the best way of splitting this sort of string?
%%x%%a,b,c,d
So that I end up with the value between the %% AND another variable containing everything right of the second %%
i.e. var x = "x"; var y = "a,b,c,d"
Where a,b,c.. could be an infinite comma-separated list. I need to extract the list and the value between the two double-percentage signs.
To combat the infinite part, I thought perhaps of separating the string into %%x%% and a,b,c,d. At that point I can just use something like this to get x:
var tag = "%%";
var startTag = tag;
int startIndex = s.IndexOf(startTag) + startTag.Length;
int endIndex = s.IndexOf(tag, startIndex);
return s.Substring(startIndex, endIndex - startIndex);
Would the best approach be to use regex, or to use lots of IndexOf and Substring calls to do the extracting based on the static %% characters?

Given that what you want is "x,a,b,c,d", the Split() function is actually pretty powerful, and regex would be overkill for this.
Here's an example:
string test = "%%x%%a,b,c,d";
string[] result = test.Split(new char[] { '%', ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in result) {
Console.WriteLine(s);
}
Basically, we ask it to split by both '%' and ',' and to ignore empty results (e.g. the result between "%%"). Here's the result:
x
a
b
c
d

To Extract X:
If %% is always at the start then;
string s = "%%x%%a,b,c,d,h";
s = s.Substring(2,s.LastIndexOf("%%")-2);
//Console.WriteLine(s);
Else;
string s = "v,u,m,n,%%x%%a,b,c,d,h";
s = s.Substring(s.IndexOf("%%")+2,s.LastIndexOf("%%")-s.IndexOf("%%")-2);
//Console.WriteLine(s);
If you need to get them all at once then use this;
string s = "m,n,%%x%%a,b,c,d";
var myList = s.ToArray()
.Where(c=> (c != '%' && c!=','))
.Select(c=>c).ToList();

This'll let you do it all in one go:
string pattern = "^%%(.+?)%%(?:(.+?)(?:,|$))*$";
string input = "%%x%%a,b,c,d";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
// "x"
string first = match.Groups[1].Value;
// { "a", "b", "c", "d" }
string[] repeated = match.Groups[2].Captures.Cast<Capture>()
.Select(c => c.Value).ToArray();
}

You can use char.IsLetter to get a list of all the letters:
string test = "%%x%%a,b,c,d";
var l = test.Where(c => char.IsLetter(c)).ToArray();
var output = string.Join(", ", l.OrderBy(c => c));

Since you want the value between the %% and everything after in separate variables and you don't need to parse the CSV, I think a RegEx solution would be your best choice.
var inputString = @"%%x%%a,b,c,d";
var regExPattern = @"^%%(?<x>.+)%%(?<csv>.+)$";
var match = Regex.Match(inputString, regExPattern);
foreach (var item in match.Groups)
{
Console.WriteLine(item);
}
The pattern has two named groups called x and csv, so rather than just looping, you can easily reference them by name and assign their values:
var x = match.Groups["x"].Value;
var y = match.Groups["csv"].Value;
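
For comparison, the same two values can be obtained without a regex by using the String.Split overload that takes a maximum count. This is a rough sketch that assumes the input always starts with %%tag%%; the names TagSplitter and TrySplit are hypothetical.

```csharp
using System;

public static class TagSplitter
{
    // Splits "%%x%%a,b,c,d" into the tag ("x") and the remainder ("a,b,c,d").
    // Returns false when the input does not start with %%...%%.
    public static bool TrySplit(string input, out string tag, out string rest)
    {
        tag = null;
        rest = null;
        // A count of 3 stops splitting after the second "%%",
        // so any later "%%" stays inside the remainder.
        string[] parts = input.Split(new[] { "%%" }, 3, StringSplitOptions.None);
        if (parts.Length != 3 || parts[0].Length != 0)
            return false;

        tag = parts[1];
        rest = parts[2];
        return true;
    }

    public static void Main()
    {
        string x, y;
        if (TrySplit("%%x%%a,b,c,d", out x, out y))
        {
            Console.WriteLine(x);
            Console.WriteLine(y);
        }
    }
}
```

The remainder is left as a single string here; calling Split(',') on it afterwards gives the individual list items if they are needed.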

C# Linq non-vowels

From the given string
(i.e)
string str = "dry sky one two try";
var nonVowels = str.Split(' ').Where(x => !x.Contains("aeiou")); (not working).
How can I extract the words that contain no vowels?

Come on now y'all. IndexOfAny is where it's at. :)
// if this is public, it's vulnerable to people setting individual elements.
private static readonly char[] Vowels = "aeiou".ToCharArray();
// C# 3
var nonVowelWords = str.Split(' ').Where(word => word.IndexOfAny(Vowels) < 0);
// C# 2
List<string> words = new List<string>(str.Split(' '));
words.RemoveAll(delegate(string word) { return word.IndexOfAny(Vowels) >= 0; });

This should work:
var nonVowels = str.Split(' ').Where(x => x.Intersect("aeiou").Count() == 0);
String.Contains looks for the whole substring "aeiou", not for any one of its characters. Using Enumerable.Contains would only work for a single char, so you'd need multiple calls. Intersect handles this case.

Something like:
var nonVowelWords = str.Split(' ').Where(x => !Regex.IsMatch(x, @"[aeiou]"));

string str = "dry sky one two try";
var nonVowels = str.ToCharArray()
.Where(x => !new [] {'a', 'e', 'i', 'o', 'u'}.Contains(x));
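
All of the snippets above treat only lowercase a, e, i, o and u as vowels. If the input may contain capital letters, one simple option is to include both cases in the lookup array, as in this sketch built on the IndexOfAny idea. The names VowelFilter and WordsWithoutVowels are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class VowelFilter
{
    // Both cases are listed so that "Apple" counts as containing a vowel.
    private static readonly char[] Vowels = "aeiouAEIOU".ToCharArray();

    // Returns the space-separated words that contain no vowel at all.
    public static List<string> WordsWithoutVowels(string text)
    {
        var result = new List<string>();
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (word.IndexOfAny(Vowels) < 0)
                result.Add(word);
        }
        return result;
    }

    public static void Main()
    {
        // Prints: dry sky try
        Console.WriteLine(string.Join(" ", WordsWithoutVowels("dry sky one two try")));
    }
}
```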
