Regex to split by a Targeted String up to a certain character - c#

I have an LDAP query from which I need to build the domain,
so I need to split by "DC=" up to the next comma.
INPUT:
LDAP://DC=SOMETHINGS,DC=ELSE,DC=NET\account
RESULT:
SOMETHINGS.ELSE.NET

You can do it quite simply with the DC=(\w*) regex pattern:
using System.Linq;
using System.Text.RegularExpressions;
var str = @"LDAP://DC=SOMETHINGS,DC=ELSE,DC=NET\account";
// Group 1 captures the text after each "DC=".
var result = String.Join(".", Regex.Matches(str, @"DC=(\w*)")
.Cast<Match>()
.Select(m => m.Groups[1].Value));
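One thing to watch with DC=(\w*): \w does not match a hyphen, so a component such as DC=my-corp would be cut down to "my". Below is a minimal sketch that instead captures everything up to the next comma or backslash. The LdapDomain class and FromPath method are hypothetical names, and matching "dc=" case-insensitively is an assumption:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LdapDomain
{
    // Hypothetical helper: joins every DC=... component with dots.
    // [^,\\]+ accepts any run of characters other than a comma or a backslash,
    // so hyphenated components such as DC=my-corp are kept whole.
    // IgnoreCase (an assumption) also accepts lowercase "dc=".
    public static string FromPath(string ldapPath)
    {
        var components = Regex.Matches(ldapPath, @"DC=([^,\\]+)", RegexOptions.IgnoreCase)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value);
        return String.Join(".", components);
    }
}
```

With the question's input, LdapDomain.FromPath(@"LDAP://DC=SOMETHINGS,DC=ELSE,DC=NET\account") returns "SOMETHINGS.ELSE.NET".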

Without Regex you can do:
string ldapStr = @"LDAP://DC=SOMETHINGS,DC=ELSE,DC=NET\account";
int startIndex = ldapStr.IndexOf("DC=");
string output = null;
if (startIndex >= 0)
{
// The domain components end at the backslash before the account name (or at the end of the string).
int endIndex = ldapStr.IndexOf('\\', startIndex);
if (endIndex < 0)
endIndex = ldapStr.Length;
string domainComponentStr = ldapStr.Substring(startIndex, endIndex - startIndex);
output = String.Join(".", domainComponentStr.Split(new[] {"DC=", ","}, StringSplitOptions.RemoveEmptyEntries));
}
If you will always get the string in this same format, you can also do:
string ldapStr = @"LDAP://DC=SOMETHINGS,DC=ELSE,DC=NET\account";
// Splitting on "DC=", "," and "\" yields "LDAP://", the three components, and "account".
var outputStr = String.Join(".", ldapStr.Split(new[] {"DC=", ",","\\"}, StringSplitOptions.RemoveEmptyEntries)
.Skip(1)
.Take(3));
And you will get:
outputStr = "SOMETHINGS.ELSE.NET"

Related

Split a String on 2nd last occurrence of comma in C#

Say I have a string:
var str = "xy,yz,zx,ab,bc,cd";
and I want to split it at the second-to-last comma in C#, i.e.
a = "xy,yz,zx,ab"
b = "bc,cd"
How can I achieve this result?
Let's find the required comma index with the help of LastIndexOf:
var str = "xy,yz,zx,ab,bc,cd";
// index of the 2nd last occurrence of ','
int index = str.LastIndexOf(',', str.LastIndexOf(',') - 1);
Then use Substring:
string a = str.Substring(0, index);
string b = str.Substring(index + 1);
Let's have a look:
Console.WriteLine(a);
Console.WriteLine(b);
Outcome:
xy,yz,zx,ab
bc,cd
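The same LastIndexOf idea generalizes to the n-th last comma: search backward repeatedly, each time starting just before the previous hit. This is a sketch; SplitAtNthLast is a hypothetical helper name, and returning null when there are too few separators is a design choice, not something the question specifies:

```csharp
using System;

public static class StringSplitHelpers
{
    // Hypothetical helper: splits text at the n-th last occurrence of separator
    // (n = 1 is the last one, n = 2 the second-to-last, and so on).
    // Returns null when the separator occurs fewer than n times.
    public static string[] SplitAtNthLast(string text, char separator, int n)
    {
        if (n < 1)
            throw new ArgumentOutOfRangeException(nameof(n));

        int index = text.Length;
        for (int i = 0; i < n; i++)
        {
            // Search backward from just before the previous hit (or from the end).
            index = index == 0 ? -1 : text.LastIndexOf(separator, index - 1);
            if (index < 0)
                return null;
        }
        return new[] { text.Substring(0, index), text.Substring(index + 1) };
    }
}
```

SplitAtNthLast("xy,yz,zx,ab,bc,cd", ',', 2) gives "xy,yz,zx,ab" and "bc,cd".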
Alternative "readable" approach ;)
const string text = "xy,yz,zx,ab,bc,cd";
var words = text.Split(',');
var firstBatch = Math.Max(words.Length - 2, 0);
var first = string.Join(",", words.Take(firstBatch));
var second = string.Join(",", words.Skip(firstBatch));
first.Should().Be("xy,yz,zx,ab"); // Pass OK (Should() comes from the FluentAssertions package)
second.Should().Be("bc,cd"); // Pass OK
You could handle this via regex replacement:
var str = "xy,yz,zx,ab,bc,cd";
var a = Regex.Replace(str, @",[^,]+,[^,]+$", "");
var b = Regex.Replace(str, @"^.*,([^,]+,[^,]+)$", "$1");
Console.WriteLine(a);
Console.WriteLine(b);
This prints:
xy,yz,zx,ab
bc,cd
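If you get Microsoft's System.Interactive extensions from NuGet, you can do this (on .NET Core 2.0 and later, Enumerable.TakeLast is built in). Note that this only produces the second part, b:
string output = String.Join(",", str.Split(',').TakeLast(2));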

Finding the longest substring regex?

Does anyone know how to find the longest substring composed of letters using MatchCollection?
public static Regex pattern2 = new Regex("[a-zA-Z]");
public static string zad3 = "ala123alama234ijeszczepsa";
You can loop over all matches and get the longest:
string max = "";
foreach (Match match in Regex.Matches(zad3, "[a-zA-Z]+"))
if (max.Length < match.Value.Length)
max = match.Value;
Try this (the pattern needs the + quantifier, [a-zA-Z]+, or every match is a single letter):
MatchCollection matches = pattern2.Matches(zad3);
List<string> strLst = new List<string>();
foreach (Match match in matches)
strLst.Add(match.Value);
var maxStr1 = strLst.OrderByDescending(s => s.Length).First();
or, more concisely:
var maxStr2 = matches.Cast<Match>().Select(m => m.Value).OrderByDescending(s => s.Length).First();
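A shorter alternative splits on the runs of digits and takes the longest piece (this assumes the string holds only letters and digits):
string zad3 = "ala123alama234ijeszczepsa54dsfd";
// Order the pieces by length; Max(x => x) would pick the alphabetically largest piece instead.
string max = Regex.Split(zad3, @"\d+").OrderByDescending(x => x.Length).First();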
You must add the repetition operator + to your Regex pattern so that each match covers a whole run of letters rather than a single letter:
[a-zA-Z] should be [a-zA-Z]+
You can get the longest value using LINQ. Order by the match length descending and then take the first entry. If there are no matches the result is null.
string pattern2 = "[a-zA-Z]+";
string zad3 = "ala123alama234ijeszczepsa";
var matches = Regex.Matches(zad3, pattern2);
string result = matches
.Cast<Match>()
.OrderByDescending(x => x.Value.Length)
.FirstOrDefault()?.Value;
The string named result in this example is:
ijeszczepsa
A shorter version of the same LINQ query:
string longest= Regex.Matches(zad3, pattern2).Cast<Match>()
.OrderByDescending(x => x.Value.Length).FirstOrDefault()?.Value;
If you don't want to use regex, you can find the length of the longest run in O(n) like this (every non-digit character counts as a letter here):
string zad3 = "ala123alama234ijeszczepsa";
int max=0;
int count=0;
for (int i=0 ; i<zad3.Length ; i++)
{
if (zad3[i]>='0' && zad3[i]<='9')
{
if (count > max)
max=count;
count=0;
continue;
}
count++;
}
if (count > max)
max=count;
Console.WriteLine(max);
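If you need the substring itself rather than its length, the same single pass can record where the best run starts. This is a sketch with a hypothetical LetterRuns.Longest helper; unlike the loop above, it uses char.IsLetter, so spaces and punctuation also end a run:

```csharp
using System;

public static class LetterRuns
{
    // Hypothetical helper: one O(n) pass that returns the longest run of
    // letters itself, keeping the first run on ties.
    public static string Longest(string text)
    {
        int bestStart = 0, bestLength = 0;
        int runStart = 0;
        for (int i = 0; i <= text.Length; i++)
        {
            // Treat the end of the string as a terminator so the last run is checked.
            bool isLetter = i < text.Length && char.IsLetter(text[i]);
            if (isLetter)
                continue;
            int runLength = i - runStart;
            if (runLength > bestLength)
            {
                bestStart = runStart;
                bestLength = runLength;
            }
            runStart = i + 1;
        }
        return text.Substring(bestStart, bestLength);
    }
}
```

LetterRuns.Longest("ala123alama234ijeszczepsa") returns "ijeszczepsa"; an input without letters returns an empty string.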

Split a string containing digits

I have a string like
"abc kskd 8.900 prew"
and I need to split it so that I get "abc kskd" and "8.900 prew".
How can I achieve this in C#?
Get the index of first digit using LINQ then use Substring:
var input = "abc kskd 8.900 prew";
var index = input.Select( (x,idx) => new {x, idx})
.Where(c => char.IsDigit(c.x))
.Select(c => c.idx)
.First();
var part1 = input.Substring(0, index).TrimEnd(); // drop the space before the number
var part2 = input.Substring(index);
If you don't need anything complicated, this should do:
var data = "abc kskd 8.900 prew";
var digits = "0123456789".ToCharArray();
var idx = data.IndexOfAny(digits);
if (idx != -1)
{
// TrimEnd removes the space before the number (and, unlike idx - 1, does not fail when idx is 0).
var firstPart = data.Substring(0, idx).TrimEnd();
var secondPart = data.Substring(idx);
}
IndexOfAny is actually very fast.
This could also be modified to separate the string into more parts (using the startIndex parameter), but you didn't ask for that.
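As a sketch of that multi-part variant, IndexOfAny can be called repeatedly with a moving startIndex. The class and method names are hypothetical, and the rule that a number only starts a new part when it begins a space-delimited word is an assumption:

```csharp
using System;
using System.Collections.Generic;

public static class NumberSplitter
{
    static readonly char[] Digits = "0123456789".ToCharArray();

    // Hypothetical helper: cuts the text before every word that starts with a
    // digit, resuming the IndexOfAny search after each digit it has examined.
    public static List<string> SplitBeforeNumbers(string text)
    {
        var parts = new List<string>();
        int partStart = 0;
        int searchFrom = 0;
        while (true)
        {
            int idx = text.IndexOfAny(Digits, searchFrom);
            if (idx < 0)
                break;
            // Only split where the digit begins a word (it is preceded by a space).
            if (idx > partStart && text[idx - 1] == ' ')
            {
                parts.Add(text.Substring(partStart, idx - 1 - partStart));
                partStart = idx;
            }
            searchFrom = idx + 1;
        }
        parts.Add(text.Substring(partStart));
        return parts;
    }
}
```

For "abc kskd 8.900 prew" this yields "abc kskd" and "8.900 prew"; for "abc 1 def 22 x" it yields "abc", "1 def" and "22 x".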
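It's straightforward with a regular expression. Split consumes the non-word character before the first digit, the captured group keeps the rest, and the trailing empty entry is filtered out:
var str = "abc kskd 8.900 prew";
var result = Regex.Split(str, @"\W(\d.*)").Where(x => x != "").ToArray();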
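Try this:
public string[] SplitText(string text)
{
var digits = "0123456789".ToCharArray();
var startIndex = 0;
while (startIndex < text.Length)
{
// Find the next digit at or after startIndex.
var index = text.IndexOfAny(digits, startIndex);
if (index < 0)
{
break;
}
// Search backward from just before the digit, but not past startIndex, for a space.
var spaceIndex = index > startIndex
? text.LastIndexOf(' ', index - 1, index - startIndex)
: -1;
if (spaceIndex >= 0)
{
return new String[] { text.Substring(0, spaceIndex), text.Substring(spaceIndex + 1) };
}
// No space before this digit; continue after it so the loop cannot repeat forever.
startIndex = index + 1;
}
return new String[] {text};
}
This is similar to what @Dominic Kexel provided, but for when you don't want to use LINQ.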
// Splits just before "8.900"; the first part keeps its trailing space.
string[] result = Regex.Split("abc kskd 8.900 prew", @"\w*(?=\d+\.\d)");

I want to parse my script file with regex in C#

I have a script file like the one below:
[grade]
[achievement]
[gold multiple]
250
[level]
34
99
[pre required quest]
38
[/pre required quest]
For example:
lex("grade") should return "[achievement]"
lex("level") should return "34,99"
Maybe I can do it with LINQ, but I can't find a way.
I tried:
scripts = File.ReadAllText(scriptFilePath);
string gradeKeyword = @"(?<=\[grade\]\r\n).*?\r\n*(?=\[.*\]\r\n)";
Regex reg = new Regex(gradeKeyword);
Match mat = reg.Match(scripts);
It didn't work (I want to get [achievement]).
By the way, can I do that with LINQ?
You could try not using a regex.
public static IEnumerable<string> GetScriptSection(string file, string section)
{
// "file" is the script's full text, e.g. from File.ReadAllText.
var startMatch = string.Format("[{0}]", section);
var endMatch = string.Format("[/{0}]", section);
var lines = file.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Select(s => s.Trim()).ToList();
int headerIndex = lines.FindIndex(f => f == startMatch);
if (headerIndex == -1)
{
return Enumerable.Empty<string>(); // section not found
}
int startIndex = headerIndex + 1;
// Prefer an explicit closing tag such as [/pre required quest].
int endIndex = lines.FindLastIndex(f => f == endMatch);
if(endIndex == -1)
{
// Otherwise stop at the next [header]; the first value line may itself be bracketed, like [achievement].
endIndex = lines.FindIndex(startIndex, f => f.StartsWith("[") && lines.IndexOf(f) > startIndex);
endIndex = endIndex == -1 ? lines.Count : endIndex;
}
return lines.GetRange(startIndex, endIndex - startIndex).Where(l => !string.IsNullOrWhiteSpace(l)).ToList();
}
But I would just use YAML, XML, or some other widely used format instead of rolling my own.
By default, . in a .NET regular expression does not match the newline character \n. To let it match across line breaks, you have to specify the RegexOptions.Singleline option.
Try the following expression:
string gradeKeyword = "\\[level\\]\\r\\n([^\\[].+\\r\\n)+";
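One likely reason the regex in the question fails (an assumption, since the file's exact bytes aren't shown) is that it hard-codes \r\n, so it never matches a file saved with \n line endings. Here is a sketch of the requested lex function that accepts both endings. ScriptLexer.Lex is a hypothetical name, and it assumes the first line after a header is always a value (even when bracketed, like [achievement]) while any later line starting with [ ends the section:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptLexer
{
    // Hypothetical helper: returns the values listed under [key], joined with
    // commas, or null if the section is missing. Works with \r\n or \n endings.
    public static string Lex(string script, string key)
    {
        // Header line, then the first value line (taken as-is), then any
        // following lines that do not start with '['.
        var pattern = @"^\[" + Regex.Escape(key) + @"\]\r?\n"
                    + @"(?<body>[^\r\n]*(?:\r?\n(?!\[)[^\r\n]*)*)";
        var match = Regex.Match(script, pattern, RegexOptions.Multiline);
        if (!match.Success)
            return null;
        var values = match.Groups["body"].Value
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(",", values);
    }
}
```

With the sample file, Lex(text, "level") returns "34,99" and Lex(text, "grade") returns "[achievement]".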

Retrieve String Containing Specific substring C#

I have output in string format like the following:
"ABCDED 0000A1.txt PQRSNT 12345"
I want to retrieve the substring(s) containing .txt from the string above. For example, here it should return 0000A1.txt.
Thanks
You can either split the string at whitespace boundaries like it's already been suggested or repeatedly match the same regex like this:
var input = "ABCDED 0000A1.txt PQRSNT 12345 THE.txt FOO";
var match = Regex.Match (input, @"\b([\w\d]+\.txt)\b");
while (match.Success) {
Console.WriteLine ("TEST: {0}", match.Value);
match = match.NextMatch ();
}
Split will work if spaces are the separator. If you use other separators, you can add them as needed:
string input = "ABCDED 0000A1.txt PQRSNT 12345";
// Path.HasExtension is true for tokens like "0000A1.txt"; FirstOrDefault returns the first such token, or null.
string filename = input.Split(' ').FirstOrDefault(f => System.IO.Path.HasExtension(f));
filename is "0000A1.txt", and this works for any extension.
You can use a C# regex match :)
Here is the code; plug it in and try it. Please comment.
string test = "afdkljfljalf dkfjd.txt lkjdfjdl";
// Group 1 captures the lowercase letters/digits before ".txt" together with the extension.
string ffile = Regex.Match(test, @"([a-z0-9]+\.txt)").Groups[1].Value;
Console.WriteLine(ffile);
Reference: regexp
I did something like this:
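string subString = "";
char period = '.';
int iSubStrIndex = -1;
if (myString != null)
{
// Remember the position of the last '.' in the string.
for (int i = 0; i < myString.Length; i++)
{
if (myString[i] == period)
iSubStrIndex = i;
}
if (iSubStrIndex >= 0)
{
// Widen to the surrounding spaces so only the file name is returned, not the rest of the line.
int start = myString.LastIndexOf(' ', iSubStrIndex) + 1;
int end = myString.IndexOf(' ', iSubStrIndex);
if (end < 0)
end = myString.Length;
subString = myString.Substring(start, end - start);
}
}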
Hope that helps.
First, split your string into an array:
char[] whitespace = new char[] { ' ', '\t' };
string[] parts = myStr.Split(whitespace);
Then find the element containing .txt:
// Find the first element containing ".txt".
// (This Contains overload with StringComparison needs .NET Core 2.1 or later.)
string value1 = Array.Find(parts,
element => element.Contains(".txt", StringComparison.Ordinal));
Now value1 holds "0000A1.txt".
Happy coding.
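Since the question asks for substring(s), here is a sketch that returns every .txt token instead of only the first. TxtFinder.FindTxtFiles is a hypothetical name, and splitting on spaces and tabs plus case-insensitive matching of the extension are assumptions:

```csharp
using System;
using System.Linq;

public static class TxtFinder
{
    // Hypothetical helper: returns every whitespace-separated token that ends
    // in ".txt" (case-insensitively), in the order the tokens appear.
    public static string[] FindTxtFiles(string input)
    {
        return input
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => token.EndsWith(".txt", StringComparison.OrdinalIgnoreCase))
            .ToArray();
    }
}
```

For "ABCDED 0000A1.txt PQRSNT 12345 THE.txt FOO" it returns "0000A1.txt" and "THE.txt"; an input with no .txt tokens gives an empty array.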
