I have a string containing special characters, and I need to replace those characters with an index, left-padded with '0'.
A quick example to explain:
I have the string "0980 0099 8383 $$$$" and an index (integer) 3.
The result should be "0980 0099 8383 0003".
The special characters are not necessarily in sequence.
The source string could be empty, or it may not contain any special characters.
I've already written functions that work:
public static class StringExtensions
{
public static string ReplaceCounter(this string source, int counter, string character)
{
string res = source;
try
{
if (!string.IsNullOrEmpty(character))
{
if (res.Contains(character))
{
// Get ALL Indexes position of character
var Indexes = GetIndexes(res, character);
int max = GetMaxValue(Indexes.Count);
while (counter >= max)
{
counter -= max;
}
var new_value = counter.ToString().PadLeft(Indexes.Count, '0');
for (int i = 0; i < Indexes.Count; i++)
{
res = res.Remove(Indexes[i], 1).Insert(Indexes[i], new_value[i].ToString());
}
}
}
}
catch (Exception)
{
res = source;
}
return res;
}
private static List<int> GetIndexes(string mainString, string toFind)
{
var Indexes = new List<int>();
for (int i = mainString.IndexOf(toFind); i > -1; i = mainString.IndexOf(toFind, i + 1))
{
// for loop end when i=-1 (line.counter not found)
Indexes.Add(i);
}
return Indexes;
}
private static int GetMaxValue(int numIndexes)
{
int max = 0;
for (int i = 0; i < numIndexes; i++)
{
if (i == 0)
max = 9;
else
max = max * 10 + 9;
}
return max;
}
}
But I don't really like it (first of all because I'm passing the character as a string and not as a char).
string source = "000081059671####=1811";
int index = 5;
string character = "#";
string result = source.ReplaceCounter(index, character);
Can it be made more efficient and compact?
Can some good soul help me?
Thanks in advance
EDIT
The index is variable so:
If the index is 15
string source = "000081059671####=1811";
int index = 15;
string character = "#";
string result = source.ReplaceCounter(index, character);
// result = "0000810596710015=1811"
There should be a check for the index exceeding the max number.
In the code I posted above, when this happens I subtract the "max" value from the index until index < max number.
What is the max number? If there are 4 special characters (as in the example below), the max number is 9999.
string source = "000081059671####=1811";
// max number 9999
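The wrap-around rule described above can be sketched on its own. This is only an illustration under stated assumptions: the class and method names are hypothetical, the counter is assumed non-negative, and there is assumed to be between 1 and 9 placeholders so the maximum fits in an int.

```csharp
using System;

public static class CounterWrapSketch
{
    // Hypothetical helper: largest value that fits in `placeholders` digits,
    // e.g. 4 placeholders -> 9999. Assumes 1 <= placeholders <= 9.
    public static int MaxValue(int placeholders)
    {
        int max = 0;
        for (int i = 0; i < placeholders; i++)
            max = max * 10 + 9;
        return max;
    }

    // For a non-negative counter this gives the same result as the
    // "subtract max until counter < max" loop from the question.
    public static int Wrap(int counter, int placeholders)
    {
        return counter % MaxValue(placeholders);
    }
}
```

With 4 placeholders, a counter of 15 stays 15 and 10000 wraps to 1. Note that under this rule 9999 itself wraps to 0, because the loop condition is counter >= max.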
Yet another edit
From a comment it seems that more than one digit can be used. In this case the counter can be converted to a string and treated as a char[] to pick the character to use in each iteration :
public static string ReplaceCounter(this string source,
int counter,
char character)
{
var sb=new StringBuilder(source);
var replacements=counter.ToString();
int r=replacements.Length-1;
for(int i=sb.Length-1;i>=0;i--)
{
if(sb[i]==character)
{
sb[i]=r>=0 ? replacements[r--] : '0';
}
}
return sb.ToString();
}
This can be used for any number of digits. "0980 0099 8383 $$$$".ReplaceCounter(15,'$') produces 0980 0099 8383 0015
An edit
After posting the original answer I remembered one can modify a string without allocations by using a StringBuilder. In this case, the last match needs to be replaced with one character, all other matches with another. This can be a simple reverse iteration:
public static string ReplaceCounter(this string source,
int counter,
char character)
{
var sb=new StringBuilder(source);
bool useChar=true;
for(int i=sb.Length-1;i>=0;i--)
{
if(sb[i]==character)
{
sb[i]=useChar?(char)('0'+counter):'0';
useChar=false;
}
}
return sb.ToString();
}
Console.WriteLine("0000##81#059671####=1811".ReplaceCounter(5,'#'));
Console.WriteLine("0980 0099 8383 $$$$".ReplaceCounter(3,'$'));
------
0000008100596710005=1811
0980 0099 8383 0003
Original Answer
Any string modification operation produces a new temporary string that needs to be garbage collected. This adds up so quickly that avoiding temporary strings can result in >10x speed improvements when processing lots of text or lots of requests. That's better than using parallel processing.
You can use Regex.Replace to perform complex replacements without allocating temporary strings. You can use one of the Replace overloads that use a MatchEvaluator to produce dynamic output, not just a single value.
In this case :
var source = "0000##81#059671####=1811";
var result = Regex.Replace(source,"#", m=>m.NextMatch().Success?"0":"5");
Console.WriteLine(result);
--------
0000008100596710005=1811
Match.NextMatch() returns the next match in the source, so m.NextMatch().Success can be used to identify the last match and replace it with the index.
This would fail if the character was one of the Regex pattern characters. This can be avoided by escaping the character with Regex.Escape(string)
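To see why escaping matters, here is a small sketch comparing the two patterns for the `$` placeholder. The class and method names are made up for the illustration; only Regex.Replace and Regex.Escape come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeSketch
{
    // Unescaped: "$" is the end-of-input anchor, so the literal '$' is left
    // untouched and the replacement is appended at the end of the string.
    public static string Unescaped(string source) =>
        Regex.Replace(source, "$", "X");

    // Escaped: Regex.Escape("$") yields the pattern \$, which matches the
    // literal character.
    public static string Escaped(string source) =>
        Regex.Replace(source, Regex.Escape("$"), "X");
}
```

For the input "a$b", Unescaped returns "a$bX" while Escaped returns "aXb".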
This can be packed in an extension method
public static string ReplaceCounter(this string source,
int counter,
string character)
{
return Regex.Replace(source,
Regex.Escape(character),
m=>m.NextMatch().Success?"0":counter.ToString());
}
public static string ReplaceCounter(this string source,
int counter,
char character)
=>ReplaceCounter(source,counter,character.ToString());
This code
var source= "0980 0099 8383 $$$$";
var result=source.ReplaceCounter(3,"$");
Returns
0980 0099 8383 0003
I would suggest this solution (it gets rid of the helper methods):
public static class StringExtensions
{
public static string ReplaceCounter(this string source, int counter, char character)
{
string res = source;
string strCounter = counter.ToString();
bool counterTooLong = false;
int idx;
// Going from the end backwards, we fill with counter digits.
for(int i = strCounter.Length - 1; i >= 0; i--)
{
idx = res.LastIndexOf(character);
// if we run out of special characters, break the loop.
if (idx == -1)
{
counterTooLong = true;
break;
}
res = res.Remove(idx, 1).Insert(idx, strCounter[i].ToString());
}
// If we could not fit the counter, we simply throw exception
if (counterTooLong) throw new InvalidOperationException();
// If we did not fill all placeholders, we fill it with zeros.
while (-1 != (idx = res.IndexOf(character))) res = res.Remove(idx, 1).Insert(idx, "0");
return res;
}
}
Here's fiddle
I would like to split a string with delimiters but keep the delimiters in the result.
How would I do this in C#?
If the split chars were ,, ., and ;, I'd try:
using System.Text.RegularExpressions;
...
string[] parts = Regex.Split(originalString, @"(?<=[.,;])");
(?<=PATTERN) is positive look-behind for PATTERN. It should match at any place where the preceding text fits PATTERN so there should be a match (and a split) after each occurrence of any of the characters.
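A quick sketch of what the look-behind split gives on a concrete input (the class and method names are only for illustration):

```csharp
using System.Text.RegularExpressions;

public static class LookBehindSplitSketch
{
    // Splits after each '.', ',' or ';' so every delimiter stays attached
    // to the end of the piece that precedes it.
    public static string[] SplitAfter(string input) =>
        Regex.Split(input, @"(?<=[.,;])");
}
```

SplitAfter("a,b.c;d") yields the four pieces "a,", "b.", "c;" and "d". If the input ends with a delimiter, the split position at the very end leaves an empty last entry, which you may want to filter out.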
If you want the delimiter to be its "own split", you can use Regex.Split e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you are looking for splitting a mathematical formula, you can use the following Regex
@"([*()\^\/]|(?<!E)[\+\-])"
This will ensure you can also use constants like 1E-02 and avoid having them split into 1E, - and 02
So:
Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")
Yields:
10E-02
*
x
+
sin
(
x
)
^
2
Building off from BFree's answer, I had the same goal, but I wanted to split on an array of characters similar to the original Split method, and I also have multiple splits per string:
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if(index-start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
Just in case anyone wants this answer as well...
Instead of string[] parts = Regex.Split(originalString, @"(?<=[.,;])") you could use string[] parts = Regex.Split(originalString, @"(?=yourmatch)") where yourmatch is whatever your separator is.
Supposing the original string was
777 - cat
777 - dog
777 - mouse
777 - rat
777 - wolf
Regex.Split(originalString, @"(?=777)") would return
777 - cat
777 - dog
and so on
This version does not use LINQ or Regex and so it's probably relatively efficient. I think it might be easier to use than the Regex because you don't have to worry about escaping special delimiters. It returns an IList<string> which is more efficient than always converting to an array. It's an extension method, which is convenient. You can pass in the delimiters as either an array or as multiple parameters.
/// <summary>
/// Splits the given string into a list of substrings, while outputting the splitting
/// delimiters (each in its own string) as well. It's just like String.Split() except
/// the delimiters are preserved. No empty strings are output.</summary>
/// <param name="s">String to parse. Can be null or empty.</param>
/// <param name="delimiters">The delimiting characters. Can be an empty array.</param>
/// <returns></returns>
public static IList<string> SplitAndKeepDelimiters(this string s, params char[] delimiters)
{
var parts = new List<string>();
if (!string.IsNullOrEmpty(s))
{
int iFirst = 0;
do
{
int iLast = s.IndexOfAny(delimiters, iFirst);
if (iLast >= 0)
{
if (iLast > iFirst)
parts.Add(s.Substring(iFirst, iLast - iFirst)); //part before the delimiter
parts.Add(new string(s[iLast], 1));//the delimiter
iFirst = iLast + 1;
continue;
}
//No delimiters were found, but at least one character remains. Add the rest and stop.
parts.Add(s.Substring(iFirst, s.Length - iFirst));
break;
} while (iFirst < s.Length);
}
return parts;
}
Some unit tests:
text = "[a link|http://www.google.com]";
result = text.SplitAndKeepDelimiters('[', '|', ']');
Assert.IsTrue(result.Count == 5);
Assert.AreEqual(result[0], "[");
Assert.AreEqual(result[1], "a link");
Assert.AreEqual(result[2], "|");
Assert.AreEqual(result[3], "http://www.google.com");
Assert.AreEqual(result[4], "]");
A lot of answers to this! One I knocked up to split by various strings (the original answer caters for just characters i.e. length of 1). This hasn't been fully tested.
public static IEnumerable<string> SplitAndKeep(string s, params string[] delims)
{
var rows = new List<string>() { s };
foreach (string delim in delims)//delimiter counter
{
for (int i = 0; i < rows.Count; i++)//row counter
{
int index = rows[i].IndexOf(delim);
if (index > -1
&& rows[i].Length > index + 1)
{
string leftPart = rows[i].Substring(0, index + delim.Length);
string rightPart = rows[i].Substring(index + delim.Length);
rows[i] = leftPart;
rows.Insert(i + 1, rightPart);
}
}
}
return rows;
}
This seems to work, but it's not been tested much.
public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
List<string> splitValues = new List<string>();
int itemStart = 0;
for (int pos = 0; pos < value.Length; pos++)
{
for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
{
if (separators[sepIndex] == value[pos])
{
// add the section of string before the separator
// (unless its empty and we are discarding empty sections)
if (itemStart != pos || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, pos - itemStart));
}
itemStart = pos + 1;
// add the separator
splitValues.Add(separators[sepIndex].ToString());
break;
}
}
}
// add anything after the final separator
// (unless its empty and we are discarding empty sections)
if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
{
splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
}
return splitValues.ToArray();
}
Recently I wrote an extension method to do this:
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
I'd say the easiest way to accomplish this (except for the argument Hans Kesting brought up) is to split the string the regular way, then iterate over the array and add the delimiter to every element but the last.
To avoid adding the character to a new line, try this:
string[] substrings = Regex.Split(input, @"(?<=[-])");
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
result[i] += separator;
(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)
(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)
Iterate through the string character by character (which is what regex does anyway).
When you find a splitter, spin off a substring.
The idea, fleshed out into C#:
using System;
using System.Collections.Generic;
static List<string> SplitKeepingSplitters(string toSplit, char[] splitChars)
{
    var afterSplit = new List<string>();
    int hold = 0;
    for (int counter = 0; counter < toSplit.Length; counter++)
    {
        // On a splitter, spin off everything since the previous one.
        if (Array.IndexOf(splitChars, toSplit[counter]) >= 0 && counter > hold)
        {
            afterSplit.Add(toSplit.Substring(hold, counter - hold));
            hold = counter;
        }
    }
    // The tail keeps its leading splitter.
    if (hold < toSplit.Length) afterSplit.Add(toSplit.Substring(hold));
    return afterSplit;
}
Obviously, choose the appropriate function names.
That will do what you're asking.
veggerby's answer, modified to:
have no empty string items in the list
have a fixed string as delimiter, like "ab", instead of a single character
var delimiter = "ab";
var text = "ab33ab9ab";
var parts = Regex.Split(text, $@"({Regex.Escape(delimiter)})")
.Where(p => p != string.Empty)
.ToList();
// parts = "ab", "33", "ab", "9", "ab"
The Regex.Escape() is there just in case your delimiter contains characters which regex interprets as special pattern commands (like *, ( and )) and which therefore have to be escaped.
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace ConsoleApplication9
{
class Program
{
static void Main(string[] args)
{
string input = @"This;is:a.test";
char sep0 = ';', sep1 = ':', sep2 = '.';
string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
Regex regex = new Regex(pattern);
MatchCollection matches = regex.Matches(input);
List<string> parts=new List<string>();
foreach (Match match in matches)
{
parts.Add(match.ToString());
}
}
}
}
I wanted to do a multiline string like this but needed to keep the line breaks so I did this
string x =
@"line 1 {0}
line 2 {1}
";
foreach(var line in string.Format(x, "one", "two")
.Split("\n")
.Select(part => part.Contains('\r') ? part + '\n' : part) // renamed so the lambda parameter does not shadow the outer x
.AsEnumerable()
) {
Console.Write(line);
}
yields
line 1 one
line 2 two
I came across the same problem but with multiple delimiters. Here's my solution:
public static string[] SplitLeft(this string @this, char[] delimiters, int count = int.MaxValue)
{
var splits = new List<string>();
int next = -1;
while (splits.Count + 1 < count && (next = @this.IndexOfAny(delimiters, next + 1)) >= 0)
{
splits.Add(@this.Substring(0, next));
@this = @this.Substring(next);
// The remainder now starts with the delimiter; resume the search just after it.
next = 0;
}
splits.Add(@this);
return splits.ToArray();
}
Sample with separating CamelCase variable names:
var variableSplit = variableName.SplitLeft(
Enumerable.Range('A', 26).Select(i => (char)i).ToArray());
I wrote this code to split and keep delimiters:
private static string[] SplitKeepDelimiters(string toSplit, char[] delimiters, StringSplitOptions splitOptions = StringSplitOptions.None)
{
var tokens = new List<string>();
int idx = 0;
for (int i = 0; i < toSplit.Length; ++i)
{
if (delimiters.Contains(toSplit[i]))
{
tokens.Add(toSplit.Substring(idx, i - idx)); // token found
tokens.Add(toSplit[i].ToString()); // delimiter
idx = i + 1; // start idx for the next token
}
}
// last token
tokens.Add(toSplit.Substring(idx));
if (splitOptions == StringSplitOptions.RemoveEmptyEntries)
{
tokens = tokens.Where(token => token.Length > 0).ToList();
}
return tokens.ToArray();
}
Usage example:
string toSplit = "AAA,BBB,CCC;DD;,EE,";
char[] delimiters = new char[] {',', ';'};
string[] tokens = SplitKeepDelimiters(toSplit, delimiters, StringSplitOptions.RemoveEmptyEntries);
foreach (var token in tokens)
{
Console.WriteLine(token);
}
For instance, I have a string and I only want the character '<' to appear 10 times in the string, and create a substring where the cutoff point is the 10th appearance of that character. Is this possible?
A manual solution could be like the following:
class Program
{
static void Main(string[] args)
{
int maxNum = 10;
string initialString = "a<b<c<d<e<f<g<h<i<j<k<l<m<n<o<p<q<r<s<t<u<v<w<x<y<z";
string[] splitString = initialString.Split('<');
string result = "";
Console.WriteLine(splitString.Length);
if (splitString.Length > maxNum)
{
for (int i = 0; i < maxNum; i++) {
result += splitString[i];
result += "<";
}
}
else
{
result = initialString;
}
Console.WriteLine(result);
Console.ReadKey();
}
}
By the way, it may be better to try to do it using Regex (in case you may have other replacement rules in the future, or need to make changes, etc). However, given your problem, something like that will work, too.
You can use TakeWhile for this. Given the string s, your character < as c, and your count 10 as count, the following function would solve your problem:
public static string foo(string s, char c, int count)
{
var i = 0;
return string.Concat(s.TakeWhile(x => (x == c ? i++ : i) < count));
}
Regex.Matches can be used to count the number of occurrences of a pattern in a string.
It also gives the position of each occurrence, through the Capture.Index property.
You can read the Index of the Nth occurrence and cut your string there:
(The RegexOptions are there just in case the pattern is something different. Modify as required.)
int cutAtOccurrence = 10;
string input = "one<two<three<four<five<six<seven<eight<nine<ten<eleven<twelve<thirteen<fourteen<fifteen";
var regx = Regex.Matches(input, "<", RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
if (regx.Count >= cutAtOccurrence) {
input = input.Substring(0, regx[cutAtOccurrence - 1].Index);
}
input is now:
one<two<three<four<five<six<seven<eight<nine<ten
If you need to use this procedure many times, it's better to build a method that returns a StringBuilder instead.
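As a rough sketch of wrapping the procedure in a reusable method: the helper name and signature below are hypothetical, and for simplicity it returns a plain string rather than a StringBuilder. The pattern argument is treated as a regular expression, so a literal containing special characters would need Regex.Escape first.

```csharp
using System.Text.RegularExpressions;

public static class CutSketch
{
    // Hypothetical helper: returns the text before the n-th match of
    // `pattern`, or the whole input when there are fewer than n matches.
    public static string CutAtOccurrence(string input, string pattern, int n)
    {
        if (n < 1) return input;
        MatchCollection matches = Regex.Matches(input, pattern);
        return matches.Count >= n
            ? input.Substring(0, matches[n - 1].Index)
            : input;
    }
}
```

For example, CutSketch.CutAtOccurrence("a<b<c<d", "<", 2) returns "a<b", and asking for the 5th occurrence in the same input returns it unchanged.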
In C#, I have a string in the following format:
a number|a number|a number,a number
for example: 1|2|3,4
I consider each number a different part of the string. In the previous example, 1 is the first part, 2 is the second, and so on.
I want to be able to replace a specific part of the string, given the index of the part I want to change.
It's not that hard to do with String.Split, but the part with the comma makes it tedious, since I then need to check whether the index is 3 or 4 and also split on the comma.
Is there a more elegant way to replace a specific part of the string? Maybe somehow with a regular expression?
EDIT: I will add a requirement which I didn't write before:
What if I want to, for example, take the 3rd part of the string and replace it with the number that is there plus 2? For example, 1|2|3,4 becomes 1|2|5,4, where the 5 is NOT a constant but depends on the input string given.
You can create the following method
static string Replace(string input, int index, string replacement)
{
int matchIndex = 0;
return Regex.Replace(input, @"\d+", m => matchIndex++ == index ? replacement : m.Value);
}
Usage:
string input = "1|2|3,4";
string output = Replace(input, 1, "hello"); // "1|hello|3,4"
As Eric Herlitz suggested, you can use another regex: the negation of the delimiters. For example, if you expect , and | delimiters, you can replace \d+ with the [^,|]+ regex. If you expect ,, | and # delimiters, you can use the [^,|#]+ regex.
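A minimal sketch of the "negated delimiters" variant, assuming the parts may be arbitrary text separated by , and |. The class and method names are only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PartReplaceSketch
{
    // A "part" here is any run of characters that is not ',' or '|'.
    public static string ReplacePart(string input, int index, string replacement)
    {
        int matchIndex = 0;
        return Regex.Replace(input, @"[^,|]+",
            m => matchIndex++ == index ? replacement : m.Value);
    }
}
```

PartReplaceSketch.ReplacePart("a|bb|c,d", 2, "X") returns "a|bb|X,d", so non-numeric parts are handled the same way as numbers.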
If you need to do some mathematical operations, you're free to do so:
static string Replace(string input, int index, int add)
{
int matchIndex = 0;
return Regex.Replace(input, @"\d+", m => matchIndex++ == index ? (int.Parse(m.Value) + add).ToString() : m.Value );
}
Example:
string input = "1|2|3,4";
string output = Replace(input, 2, 2); // 1|2|5,4
You can even make it generic:
static string Replace(string input, int index, Func<string,string> operation)
{
int matchIndex = 0;
return Regex.Replace(input, @"\d+", m => matchIndex++ == index ? operation(m.Value) : m.Value);
}
Example:
string input = "1|2|3,4";
string output = Replace(input, 2, value => (int.Parse(value) + 2).ToString()); // 1|2|5,4
Try this:
static void Main()
{
string input = "1|2|3|4,5,6|7,8|9|23|29,33";
Console.WriteLine(ReplaceByIndex(input, "hello", 23));
Console.ReadLine();
}
static string ReplaceByIndex(string input, string replaceWith, int index)
{
int indexStart = input.IndexOf(index.ToString());
int indexEnd = input.IndexOf(",", indexStart);
if (input.IndexOf("|", indexStart) < indexEnd)
indexEnd = input.IndexOf("|", indexStart);
string part1 = input.Substring(0, indexStart);
string part2 = "";
if (indexEnd > 0)
{
part2 = input.Substring(indexEnd, input.Length - indexEnd);
}
return part1 + replaceWith + part2;
}
This is assuming the numbers are in ascending order.
Use Regex.Split for the input and Regex.Matches to collect your delimiters
string input = "1|2|3,4,5,6|7,8|9";
string pattern = @"[,|]+";
// Collect the values
string[] substrings = Regex.Split(input, pattern);
// Collect the delimiters
MatchCollection matches = Regex.Matches(input, pattern);
// Replace anything you like, i.e.
substrings[3] = "222";
// Rebuild the string
int i = 0;
string newString = string.Empty;
foreach (string substring in substrings)
{
newString += string.Concat(substring, matches.Count >= i + 1 ? matches[i++].Value : string.Empty);
}
This will output "1|2|3,222,5,6|7,8|9"
Try this (tested):
public static string Replace(string input, int value, int index)
{
string pattern = @"(\d+)|(\d+)|(\d+),(\d+)";
return Regex.Replace(input, pattern, match =>
{
if (match.Index == index * 2) // multiply by 2 for the | and , characters; assumes single-digit parts
{
return value.ToString();
}
return match.Value;
});
}
Usage example:
string input = "1|2|3,4";
string output = Replace(input, 9, 1);
Updated with new requirement:
public static string ReplaceIncrement(string input, int incrementValue, int index)
{
string pattern = @"(\d+)|(\d+)|(\d+),(\d+)";
return Regex.Replace(input, pattern, match =>
{
if (match.Index == index * 2)
{
return (int.Parse(match.Value) + incrementValue).ToString();
}
return match.Value;
});
}
I am scraping some website content which is like this - "Company Stock Rs. 7100".
Now, what I want is to extract the numeric value from this string. I tried Split, but something or other goes wrong with my regular expression.
Please let me know how to get this value.
Use:
var result = Regex.Match(input, @"\d+").Value;
If you want to find only a number that is the last "entity" in the string, use this regex:
\d+$
If you want to match the last number in the string, you can use:
\d+(?!\D*\d)
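The difference between the two patterns shows up when text follows the number. A small sketch, with class and method names made up for the illustration:

```csharp
using System.Text.RegularExpressions;

public static class LastNumberSketch
{
    // Matches only when the digits are the very last thing in the input.
    public static string AtEnd(string input) =>
        Regex.Match(input, @"\d+$").Value;

    // Matches the last run of digits, even when other text follows it.
    public static string LastAnywhere(string input) =>
        Regex.Match(input, @"\d+(?!\D*\d)").Value;
}
```

For "Company Stock Rs. 7100" both return "7100". For "Lot 12: Rs. 7100 approx", AtEnd returns an empty string (no match), while LastAnywhere still returns "7100".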
int val = int.Parse(Regex.Match(input, @"\d+", RegexOptions.RightToLeft).Value);
I always liked LINQ:
var theNumber = theString.Where(x => char.IsNumber(x));
Though Regex sounds like the native choice...
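Note that the query above yields an IEnumerable<char>, not a number. A minimal sketch of turning it into a string follows. The names are hypothetical, and char.IsDigit is swapped in for char.IsNumber here, since IsNumber also accepts characters such as fractions that int.Parse cannot handle.

```csharp
using System;
using System.Linq;

public static class DigitsSketch
{
    // Collects every digit character in the input, in order, into one string.
    // Digits from separate numbers are concatenated ("a1b22" -> "122").
    public static string DigitsOnly(string input) =>
        new string(input.Where(char.IsDigit).ToArray());
}
```

DigitsSketch.DigitsOnly("Company Stock Rs. 7100") returns "7100", which can then be passed to int.Parse. This only suits inputs that contain a single number.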
This code will return the integer at the end of the string. This will work better than the regular expressions in the case that there is a number somewhere else in the string.
public int getLastInt(string line)
{
int offset = line.Length;
for (int i = line.Length - 1; i >= 0; i--)
{
char c = line[i];
if (char.IsDigit(c))
{
offset--;
}
else
{
if (offset == line.Length)
{
// No int at the end
return -1;
}
return int.Parse(line.Substring(offset));
}
}
return int.Parse(line.Substring(offset));
}
If your number is always after the last space and your string always ends with this number, you can get it this way:
str.Substring(str.LastIndexOf(" ") + 1)
Here is my answer. It separates the numeric characters from the rest of the string in C#:
static void Main(string[] args)
{
String details = "XSD34AB67";
string numeric = "";
string nonnumeric = "";
char[] mychar = details.ToCharArray();
foreach (char ch in mychar)
{
if (char.IsDigit(ch))
{
numeric = numeric + ch.ToString();
}
else
{
nonnumeric = nonnumeric + ch.ToString();
}
}
int i = Convert.ToInt32(numeric);
Console.WriteLine(numeric);
Console.WriteLine(nonnumeric);
Console.ReadLine();
}
You can use \d+ to match the first occurrence of a number:
string num = Regex.Match(input, @"\d+").Value;