I have this string:
abc-8479fe75-82rb5-45g00-b563-a7346703098b
I want to go through this string starting from the end and grab everything till the 5th occurence of the "-" sign.
how do I substring this string to only show (has to be through the procedure described above): -8479fe75-82rb5-45g00-b563-a7346703098b
You can also define handy extension method if that's common use case:
public static class StringExtensions
{
public static string GetSubstringFromReversedOccurence(this string #this,
char #char,
int occurence)
{
var indexes = new List<int>();
var idx = 0;
while ((idx = #this.IndexOf(#char, idx + 1)) > 0)
{
indexes.Add(idx);
}
if (occurence > indexes.Count)
{
throw new Exception("Not enough occurences of character");
}
var desiredIndex = indexes[indexes.Count - occurence];
return #this.Substring(desiredIndex);
}
}
According to your string rules, I would do a simple split :
var text = "abc-8479fe75-82rb5-45g00-b563-a7346703098b";
var split = text.split('-');
if (split.Lenght <= 5)
return text;
return string.Join('-', split.Skip(split.Lenght - 5);
We could use a regex replacement with a capture group:
var text = "abc-8479fe75-82rb5-45g00-b563-a7346703098b";
var pattern = #"([^-]+(?:-[^-]+){4}).*";
var output = Regex.Replace(text, pattern, "$1");
Console.WriteLine(output); // abc-8479fe75-82rb5-45g00-b563
If we can rely on the input always having 6 hyphen separated components, we could also use this version:
var text = "abc-8479fe75-82rb5-45g00-b563-a7346703098b";
var pattern = #"-[^-]+$";
var output = Regex.Replace(text, pattern, "");
Console.WriteLine(output); // abc-8479fe75-82rb5-45g00-b563
I think this should be the most performant
public string GetSubString(string text)
{
var count = 0;
for (int i = text.Length-1; i > 0; i--)
{
if (text[i] == '-') count++;
if (count == 5) return text.Substring(i);
}
return null;
}
I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, #"(?<=[.,;])")
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
#"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", #"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer aswell...
Instead of string[] parts = Regex.Split(originalString, #"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, #"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777- cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, #"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but its not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method do to this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid adding character to new line try this :
string[] substrings = Regex.Split(input,#"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway.
When you find a splitter, then spin off a substring.
pseudo code
int hold, counter;
List<String> afterSplit;
string toSplit
for(hold = 0, counter = 0; counter < toSplit.Length; counter++)
{
if(toSplit[counter] = /*split charaters*/)
{
afterSplit.Add(toSplit.Substring(hold, counter));
hold = counter;
}
}
That's sort of C# but not really. Obviously, choose the appropriate function names.
Also, I think there might be an off-by-1 error in there.
But that will do what you're asking.
veggerby's answer modified to
have no string items in the list
have fixed string as delimiter like "ab" instead of single character
var delimiter = "ab";
var text = "ab33ab9ab"
var parts = Regex.Split(text, $#"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, () and thus have to be escaped.
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = #"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
#"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(x => x.Contains('\r') ? x + '\n' : x)
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string #this, char[] delimiters, int count)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = #this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(#this.Substring(0, next));
#this = new string(#this.Skip(next).ToArray());
}
splits.Add(#this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}
i want to split a string the following way:
string s = "012345678x0123x01234567890123456789";
s.SplitString("x",10);
should be split into
012345678
x0123
x012345678
9012345678
9
e.g. the inputstring should be split after the character "x" or length 10 - what comes first.
here is what i've tried so far:
public static IEnumerable<string> SplitString(this string sInput, string search, int maxlength)
{
int index = Math.Min(sInput.IndexOf(search), maxlength);
int start = 0;
while (index != -1)
{
yield return sInput.Substring(start, index-start);
start = index;
index = Math.Min(sInput.IndexOf(search,start), maxlength);
}
}
I would go with this regular expression:
([^x]{1,10})|(x[^x]{1,9})
which means:
Match at most 10 characters that are not x OR match x followed by at most 9 characters thar are not x
Here is working example:
string regex = "([^x]{1,10})|(x[^x]{1,9})";
string input = "012345678x0123x01234567890123456789";
var results = Regex.Matches(input, regex)
.Cast<Match>()
.Select(m => m.Value);
which produces values by you.
Personally I don't like RegEx. It creates code that is hard to de-bug and is very hard to work out what it is meant to be doing when you first look at it. So for a more lengthy solution I would go with something like this.
public static IEnumerable<string> SplitString(this string sInput, char search, int maxlength)
{
var result = new List<string>();
var count = 0;
var lastSplit = 0;
foreach (char c in sInput)
{
if (c == search || count - lastSplit == maxlength)
{
result.Add(sInput.Substring(lastSplit, count - lastSplit));
lastSplit = count;
}
count ++;
}
result.Add(sInput.Substring(lastSplit, count - lastSplit));
return result;
}
Note I changed the first parameter to a char (from a string). This code can probably be optimised some more, but it is nice and readable, which for me is more important.
Hey I have an input string that looks like this:
Just a test Post [c] hello world [/c]
the output should be:
hello world
can anybody help?
I tried to use:
Regex regex = new Regex("[c](.*)[/c]");
var v = regex.Match(post.Content);
string s = v.Groups[1].ToString();
You may do this without Regex. Consider this extension method:
public static string GetStrBetweenTags(this string value,
string startTag,
string endTag)
{
if (value.Contains(startTag) && value.Contains(endTag))
{
int index = value.IndexOf(startTag) + startTag.Length;
return value.Substring(index, value.IndexOf(endTag) - index);
}
else
return null;
}
and use it:
string s = "Just a test Post [c] hello world [/c] ";
string res = s.GetStrBetweenTags("[c]", "[/c]");
In regex
[character_group]
means:
Matches any single character in character_group.
Note that \, *, +, ?, |, {, [, (,), ^, $,., # and white space are Character Escapes and you have to use \ to use them in your expression:
\[c\](.*)\[/c\]
The backslash character \ in a regular expression indicates that the character that follows it either is a special character, or should be interpreted literally.
so that your code should be work correctly if you edit your regex:
Regex regex = new Regex("\[c\](.*)\[/c\]");
var v = regex.Match(post.Content);
string s = v.Groups[1].ToString();
Change your code to:
Regex regex = new Regex(#"\[c\](.*)\[/c\]");
var v = regex.Match(post.Content);
string s = v.Groups[1].Value;
Piggybacking on #horgh's answer, this adds an inclusive/exclusive option:
public static string ExtractBetween(this string str, string startTag, string endTag, bool inclusive)
{
string rtn = null;
int s = str.IndexOf(startTag);
if (s >= 0)
{
if(!inclusive)
s += startTag.Length;
int e = str.IndexOf(endTag, s);
if (e > s)
{
if (inclusive)
e += startTag.Length;
rtn = str.Substring(s, e - s);
}
}
return rtn;
}
You looking for something like this?
var regex = new Regex(#"(?<=\[c\]).*?(?=\[/c\])");
foreach(Match match in regex.Matches(someString))
Console.WriteLine(match.Value);
This code takes into account also identical opening tags and can ignore tag case
public static string GetTextBetween(this string value, string startTag, string endTag, StringComparison stringComparison = StringComparison.CurrentCulture)
{
if (!string.IsNullOrEmpty(value))
{
int startIndex = value.IndexOf(startTag, stringComparison) + startTag.Length;
if (startIndex > -0)
{
var endIndex = value.IndexOf(endTag, startIndex, stringComparison);
if (endIndex > 0)
{
return value.Substring(startIndex, endIndex - startIndex);
}
}
}
return null;
}
I am scraping some website content which is like this - "Company Stock Rs. 7100".
Now, what i want is to extract the numeric value from this string. I tried split but something or the other goes wrong with my regular expression.
Please let me know how to get this value.
Use:
var result = Regex.Match(input, #"\d+").Value;
If you want to find only number which is last "entity" in the string you should use this regex:
\d+$
If you want to match last number in the string, you can use:
\d+(?!\D*\d)
int val = int.Parse(Regex.Match(input, #"\d+", RegexOptions.RightToLeft).Value);
I always liked LINQ:
var theNumber = theString.Where(x => char.IsNumber(x));
Though Regex sounds like the native choice...
This code will return the integer at the end of the string. This will work better than the regular expressions in the case that there is a number somewhere else in the string.
public int getLastInt(string line)
{
int offset = line.Length;
for (int i = line.Length - 1; i >= 0; i--)
{
char c = line[i];
if (char.IsDigit(c))
{
offset--;
}
else
{
if (offset == line.Length)
{
// No int at the end
return -1;
}
return int.Parse(line.Substring(offset));
}
}
return int.Parse(line.Substring(offset));
}
If your number is always after the last space and your string always ends with this number, you can get it this way:
str.Substring(str.LastIndexOf(" ") + 1)
Here is my answer ....it is separating numeric from string using C#....
static void Main(string[] args)
{
String details = "XSD34AB67";
string numeric = "";
string nonnumeric = "";
char[] mychar = details.ToCharArray();
foreach (char ch in mychar)
{
if (char.IsDigit(ch))
{
numeric = numeric + ch.ToString();
}
else
{
nonnumeric = nonnumeric + ch.ToString();
}
}
int i = Convert.ToInt32(numeric);
Console.WriteLine(numeric);
Console.WriteLine(nonnumeric);
Console.ReadLine();
}
}
}
You can use \d+ to match the first occurrence of a number:
string num = Regex.Match(input, #"\d+").Value;