Find quoted strings and replace content between double quotes - c#

I have a string, e.g.
"24.09.2019","545","878","5"
that should be processed to
"{1}","{2}","{3}","{4}"
Now I am trying to use a regular expression:
string replacementString="{NG}";
Regex regex = new Regex("\\\"[0-9\\.]+\\\"");
MatchCollection matches = regex.Matches(originalString);
List<string> replacements = new List<string>();
for (int x = 0; x < matches.Count; x++)
{
string replacement = String.Copy(replacementString);
replacement = replacement.Replace("NG", (x + 1).ToString());
replacements.Add(replacement);
Match match = matches[x];
}
replacements.Reverse();
int cnt = 0;
foreach (var match in matches.Cast<Match>().Reverse())
{
originalStringTmp = originalStringTmp.Replace(
match.Index,
match.Length,
replacements[cnt]);
cnt++;
}
And
public static string Replace(this string s, int index, int length, string replacement)
{
var builder = new StringBuilder();
builder.Append(s.Substring(0, index));
builder.Append(replacement);
builder.Append(s.Substring(index + length));
return builder.ToString();
}
But in this case the result is
{1},{2},{3},{4}
What regular expression should I use instead of
\"[0-9\.]+\"
to achieve the result
"{1}","{2}","{3}","{4}"
with C# regular expression?

Let's use Regex.Replace to replace every quoted value in the string (I've assumed that a quote inside a value is escaped by doubling it: "abc""def" -> abc"def):
string source = "\"24.09.2019\",\"545\",\"878\",\"5\"";
int index = 0;
string result = Regex.Replace(source, "\"([^\"]|\"\")*\"", m => $"\"{{{++index}}}\"");
Demo:
Func<string, string> convert = (source => {
int index = 0;
return Regex.Replace(source, "\"([^\"]|\"\")*\"", m => $"\"{{{++index}}}\"");
});
String[] tests = new string[] {
"abc",
"\"abc\", \"def\"\"fg\"",
"\"\"",
"\"24.09.2019\",\"545\",\"878\",\"5\"",
"name is \"my name\"; value is \"78\"\"\"\"\"",
"empty: \"\" and not empty: \"\"\"\""
};
string demo = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-50} -> {convert(test)}"));
Console.Write(demo);
Outcome:
abc -> abc
"abc", "def""fg" -> "{1}", "{2}"
"" -> "{1}"
"24.09.2019","545","878","5" -> "{1}","{2}","{3}","{4}"
name is "my name"; value is "78""""" -> name is "{1}"; value is "{2}"
empty: "" and not empty: """" -> empty: "{1}" and not empty: "{2}"
Edit: You can easily elaborate the replacement, e.g. if you want to replace integer numbers only:
Func<string, string> convert = (source => {
int index = 0;
// we have a match "m" and a running counter "index";
// our task is to provide the string that will be put in place of the match
return Regex.Replace(
source,
"\"([^\"]|\"\")*\"",
m => int.TryParse(m.Value.Trim('"'), out int _drop)
? $"\"{{{++index}}}\"" // if the match is a valid integer, replace it
: m.Value); // if not, keep it intact
});
In the general case:
Func<string, string> convert = (source => {
int index = 0;
// we have a match "m" and a running counter "index";
// our task is to provide the string that will be put in place of the match
return Regex.Replace(
source,
"\"([^\"]|\"\")*\"",
m => {
// now we have a match "m", with its value "m.Value"
// and the counter "index",
// and we have to return the string that will be put in place of the match
// if you want the unquoted value, i.e. abc"def instead of "abc""def"
// string unquoted = Regex.Replace(
// m.Value, "\"+", match => new string('"', match.Value.Length / 2));
return m.Value; //TODO: put the relevant code here (m.Value keeps the match intact)
});
});
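As for the literal question about the pattern: the quotes disappear because the original expression matches them, so they are replaced together with the digits. A minimal sketch, assuming the quoted values contain only digits and dots, that uses a lookbehind and a lookahead so the quotes are checked but not consumed:

```csharp
using System;
using System.Text.RegularExpressions;

string source = "\"24.09.2019\",\"545\",\"878\",\"5\"";
int index = 0;

// (?<=") and (?=") assert the surrounding quotes without including them in the match,
// so only the digits and dots between the quotes are replaced.
string result = Regex.Replace(source, "(?<=\")[0-9.]+(?=\")", m => "{" + (++index) + "}");

Console.WriteLine(result); // "{1}","{2}","{3}","{4}"
```

Unlike the pattern above, this one does not handle doubled quotes or non-numeric values; it only keeps the asker's character class and leaves the quotes in place.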

string originalString = "\"24.09.2019\",\"545\",\"878\",\"5\"";
var regex = new Regex("\"[0-9\\.]+\"");
var matches = regex.Matches(originalString);
string result = string.Join(',', Enumerable.Range(1, matches.Count).Select(n => $"\"{{{n}}}\""));
Input:
"24.09.2019","545","878","5"
Result:
"{1}","{2}","{3}","{4}"

Related

Getting numbers from a string with chars glued

I need to recover each number from a glued string.
For example, from these strings:
string test = "number1+3";
string test1 = "number 1+4";
I want to recover (1 and 3) and (1 and 4).
How can I do this?
Code:
string test = "number1+3";
List<int> res = new List<int>();
string[] digits = Regex.Split(test, @"\D+");
foreach (string value in digits)
{
int number;
if (int.TryParse(value, out number))
{
res.Add(number);
}
}
This regex should work:
string pattern = @"\d+";
string test = "number1+3";
foreach (Match match in Regex.Matches(test, pattern))
Console.WriteLine("Found '{0}' at position {1}",
match.Value, match.Index);
Note that if you intend to use the pattern many times, it can be better for performance to create a Regex instance once and reuse it than to call the static method each time:
var res = new List<int>();
var regex = new Regex(@"\d+");
void addMatches(string text) {
foreach (Match match in regex.Matches(text))
{
int number = int.Parse(match.Value);
res.Add(number);
}
}
string test = "number1+3";
addMatches(test);
string test1 = "number 1+4";
addMatches(test1);
This calls for a regular expression:
(\d+)\+(\d+)
Match m = Regex.Match(input, @"(\d+)\+(\d+)");
string first = m.Groups[1].Captures[0].Value;
string second = m.Groups[2].Captures[0].Value;
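Regex.Match returns an unsuccessful match when the input contains no "digits+digits" pair, so it may be worth checking Success before reading the groups. A small sketch along those lines; ParsePair is a hypothetical helper name, not something from the answers above:

```csharp
using System;
using System.Text.RegularExpressions;

// ParsePair (hypothetical helper): returns null when no "digits+digits" pair is found.
(int First, int Second)? ParsePair(string input)
{
    Match m = Regex.Match(input, @"(\d+)\+(\d+)");
    if (!m.Success)
        return null;

    // Group 1 is the number before the plus sign, group 2 the number after it.
    return (int.Parse(m.Groups[1].Value), int.Parse(m.Groups[2].Value));
}

Console.WriteLine(ParsePair("number1+3"));
Console.WriteLine(ParsePair("number 1+4"));
Console.WriteLine(ParsePair("no digits here").HasValue);
```

Returning a nullable tuple keeps the "not found" case explicit instead of throwing when the groups are empty.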
An alternative to regular expressions:
string test = "number 1+4";
int[] numbers = test.Replace("number", string.Empty, StringComparison.InvariantCultureIgnoreCase)
.Trim()
.Split("+", StringSplitOptions.RemoveEmptyEntries)
.Select(x => Convert.ToInt32(x))
.ToArray();

Finding the longest substring regex?

Does anyone know how to find the longest substring composed of letters using MatchCollection?
public static Regex pattern2 = new Regex("[a-zA-Z]");
public static string zad3 = "ala123alama234ijeszczepsa";
You can loop over all matches and get the longest:
string max = "";
foreach (Match match in Regex.Matches(zad3, "[a-zA-Z]+"))
if (max.Length < match.Value.Length)
max = match.Value;
Try this (with pattern2 changed to "[a-zA-Z]+"):
MatchCollection matches = pattern2.Matches(zad3);
List<string> strLst = new List<string>();
foreach (Match match in matches)
strLst.Add(match.Value);
var maxStr1 = strLst.OrderByDescending(s => s.Length).First();
or, more compactly:
var maxStr2 = matches.Cast<Match>().Select(m => m.Value).OrderByDescending(s => s.Length).First();
Another option is to split on the digits and take the longest piece; order by length rather than calling Max(x => x), which compares the strings alphabetically:
string zad3 = "ala123alama234ijeszczepsa54dsfd";
string max = Regex.Split(zad3, @"\d+").OrderByDescending(x => x.Length).First();
You must change your Regex pattern to include the repetition operator + so that it matches more than once.
[a-zA-Z] should be [a-zA-Z]+
You can get the longest value using LINQ. Order by the match length descending and then take the first entry. If there are no matches the result is null.
string pattern2 = "[a-zA-Z]+";
string zad3 = "ala123alama234ijeszczepsa";
var matches = Regex.Matches(zad3, pattern2);
string result = matches
.Cast<Match>()
.OrderByDescending(x => x.Value.Length)
.FirstOrDefault()?.Value;
The string named result in this example is:
ijeszczepsa
Using LINQ, the short version:
string longest= Regex.Matches(zad3, pattern2).Cast<Match>()
.OrderByDescending(x => x.Value.Length).FirstOrDefault()?.Value;
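If you do not want to use regex, you can find the length of the longest run in O(n) like this (note that it treats every non-digit character as a letter and prints the length, not the substring):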
string zad3 = "ala123alama234ijeszczepsa";
int max=0;
int count=0;
for (int i=0 ; i<zad3.Length ; i++)
{
if (zad3[i]>='0' && zad3[i]<='9')
{
if (count > max)
max=count;
count=0;
continue;
}
count++;
}
if (count > max)
max=count;
Console.WriteLine(max);
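The loop above reports only the length. A sketch of the same single pass that returns the substring itself; LongestLetterRun is a hypothetical helper name, and char.IsLetter is an assumption that accepts any Unicode letter rather than just a-z and A-Z:

```csharp
using System;

// LongestLetterRun (hypothetical helper): longest run of consecutive letters in text.
string LongestLetterRun(string text)
{
    int bestStart = 0, bestLength = 0; // longest run seen so far
    int runStart = 0;                  // start index of the current run

    for (int i = 0; i <= text.Length; i++)
    {
        // A run ends at a non-letter character or at the end of the string.
        if (i == text.Length || !char.IsLetter(text[i]))
        {
            int runLength = i - runStart;
            if (runLength > bestLength)
            {
                bestStart = runStart;
                bestLength = runLength;
            }
            runStart = i + 1;
        }
    }
    return text.Substring(bestStart, bestLength);
}

Console.WriteLine(LongestLetterRun("ala123alama234ijeszczepsa")); // ijeszczepsa
```

When two runs have the same length the first one is kept, and an input without letters yields an empty string.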

Replace a text by incrementing each occurence

I have a text say:
Hello
abc
Hello
def
Hello
I want to convert it to
Hello1
abc
Hello2
def
Hello3
i.e., I need to append a number after each occurrence of the text "Hello".
Currently I have written this code:
var xx = File.ReadAllText("D:\\test.txt");
var regex = new Regex("Hello", RegexOptions.IgnoreCase);
var matches = regex.Matches(xx);
int i = 1;
foreach (var match in matches.Cast<Match>())
{
string yy = match.Value;
xx = Replace(xx, match.Index, match.Length, match.Value + (i++));
}
and the Replace method above used is:
public static string Replace(string s, int index, int length, string replacement)
{
var builder = new StringBuilder();
builder.Append(s.Substring(0, index));
builder.Append(replacement);
builder.Append(s.Substring(index + length));
return builder.ToString();
}
Currently the above code does not work: it replaces text in the middle of the string.
Can you help me fix that?
Assuming Hello is just a placeholder for a more complex pattern, here is a simple fix: use a match evaluator inside Regex.Replace where you may use variables:
var s = "Hello\nabc\nHello\ndef\nHello";
var i = 0;
var result = Regex.Replace(
s, "Hello", m => string.Format("{0}{1}",m.Value,++i), RegexOptions.IgnoreCase);
Console.WriteLine(result);
See the C# demo
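The original loop fails because every replacement makes the string longer, so the Index values collected before the loop no longer point at the later matches. If an index-based approach is preferred, one option is to apply the changes from the last match to the first, so that the earlier indices stay valid. A sketch under that assumption, with string.Insert standing in for the Replace helper from the question:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Hello\nabc\nHello\ndef\nHello";
var matches = Regex.Matches(text, "Hello", RegexOptions.IgnoreCase).Cast<Match>().ToList();

// Walk backwards: inserting text after a later match does not shift earlier indices.
for (int n = matches.Count - 1; n >= 0; n--)
{
    Match match = matches[n];
    // n is zero-based, so the number for this occurrence is n + 1.
    text = text.Insert(match.Index + match.Length, (n + 1).ToString());
}

Console.WriteLine(text);
```

For the sample input this produces Hello1, abc, Hello2, def, Hello3 on separate lines.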

Get String (Text) before next upper letter

I have the following:
string test = "CustomerNumber";
or
string test2 = "CustomerNumberHello";
the result should be:
string result = "Customer";
The first word from the string is the result, the first word goes until the first uppercase letter, here 'N'
I already tried some things like this:
var result = string.Concat(s.Select(c => char.IsUpper(c) ? " " + c.ToString() : c.ToString()))
.TrimStart();
But without success, hope someone could offer me a small and clean solution (without RegEx).
The following should work:
var result = new string(
test.TakeWhile((c, index) => index == 0 || char.IsLower(c)).ToArray());
You could just go through the string to see which characters have (ASCII) values below 97, i.e. below 'a', and cut the string there. Not the prettiest or LINQiest way, but it works for strings made of letters only:
string test2 = "CustomerNumberHello";
for (int i = 1; i < test2.Length; i++)
{
if (test2[i] < 97)
{
test2 = test2.Remove(i, test2.Length - i);
break;
}
}
Console.WriteLine(test2); // Prints Customer
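Try this:
private static string GetFirstWord(string source)
{
// start at index 1 so that a leading capital letter is skipped
int end = source.IndexOfAny("ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray(), 1);
// IndexOfAny returns -1 when there is no further capital letter
return end < 0 ? source : source.Substring(0, end);
}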
Use the regex [A-Z][a-z]+; it splits the string into words that start with a capital letter. Here is an example:
string regex = "[A-Z][a-z]+";
MatchCollection mc = Regex.Matches(richTextBox1.Text, regex);
foreach (Match match in mc)
if (!match.ToString().Equals(""))
Console.WriteLine(match.ToString() + "\n");
I have tested, this works:
string cust = "CustomerNumberHello";
string[] str = System.Text.RegularExpressions.Regex.Split(cust, @"[a-z]+");
string str2 = cust.Remove(cust.IndexOf(str[1], 1));

Extracting strings in .NET

I have a string that looks like this:
var expression = @"Args(""token1"") + Args(""token2"")";
I want to retrieve a collection of strings that are enclosed in Args("") in the expression.
How would I do this in C# or VB.NET?
Regex:
string expression = "Args(\"token1\") + Args(\"token2\")";
Regex r = new Regex("Args\\(\"([^\"]+)\"\\)");
List<string> tokens = new List<string>();
foreach (Match match in r.Matches(expression)) {
// group 1 captures the text between the quotes
tokens.Add(match.Groups[1].Value);
}
Non-regex (this assumes that the string is in the correct format!):
string expression = "Args(\"token1\") + Args(\"token2\")";
List<string> tokens = new List<string>();
int index;
while (!String.IsNullOrEmpty(expression) && (index = expression.IndexOf("Args(\"")) >= 0) {
int start = expression.IndexOf('\"', index);
string s = expression.Substring(start + 1);
int end = s.IndexOf("\")");
tokens.Add(s.Substring(0, end));
expression = s.Substring(end + 2);
}
There is another regular expression method for accomplishing this, using lookahead and lookbehind assertions:
Regex regex = new Regex("(?<=Args\\(\").*?(?=\"\\))");
string input = "Args(\"token1\") + Args(\"token2\")";
MatchCollection matches = regex.Matches(input);
foreach (var match in matches)
{
Console.WriteLine(match.ToString());
}
This strips away the Args sections of the string, giving just the tokens.
If you want token1 and token2, you can use the following regex; the token itself is captured in group 1 of each match:
string input = @"Args(""token1"") + Args(""token2"")";
MatchCollection matches = Regex.Matches(input, @"Args\(""([^""]+)""\)");
Sorry if this is not what you are looking for.
If your collection looks like this:
IList<String> expression = new List<String> { "token1", "token2" };
var collection = expression.Select(s => Args(s));
As long as Args returns the same type as the queried collection type, this should work.
You can then iterate over the collection like so:
foreach (var s in collection)
{
Console.WriteLine(s);
}
