How to manipulate a string that contains different patterns in C#? - c#

I have strings in the following format:
Proposal is given to {Jwala Vora#3/13} for {Amazon Vally#2/11} {1#3/75} by {MdOffice employee#1/1}
The string contains { } pairs at varying positions, and there may be any number of them.
I want to replace each pair with another string that I compute from the text between the braces.
How can I do this?

You could try regular expressions. Specifically, Regex.Replace variants using MatchEvaluator should do the trick. See http://msdn.microsoft.com/en-US/library/cft8645c(v=vs.80).aspx for more information.
Something along these lines:
using System;
using System.Text.RegularExpressions;
public class Replacer
{
public string Replace(string input)
{
// The regular expression passed as the second argument to the Replace method
// matches strings in the format "{value0#value1/value2}", i.e. three strings
// separated by "#" and "/" all surrounded by braces.
var result = Regex.Replace(
input,
@"{(?<value0>[^#]+)#(?<value1>[^/]+)/(?<value2>[^}]+)}",
ReplaceMatchEvaluator);
return result;
}
private string ReplaceMatchEvaluator(Match m)
{
// m.Value contains the matched string including the braces.
// This method is invoked once per matching portion of the input string.
// We can then extract each of the named groups in order to access the
// substrings of each matching portion as follows:
var value0 = m.Groups["value0"].Value; // Contains first value, e.g. "Jwala Vora"
var value1 = m.Groups["value1"].Value; // Contains second value, e.g. "3"
var value2 = m.Groups["value2"].Value; // Contains third value, e.g. "13"
// Here we can do things like convert value1 and value2 to integers...
var intValue1 = Int32.Parse(value1);
var intValue2 = Int32.Parse(value2);
// etc.
// Here we return the value with which the matching portion is replaced.
// This would be some function of value0, value1 and value2 as well as
// any other data in the Replacer class.
return "xyz";
}
}
public static class Program
{
public static void Main(string[] args)
{
var replacer = new Replacer();
var result = replacer.Replace("Proposal is given to {Jwala Vora#3/13} for {Amazon Vally#2/11} {1#3/75} by {MdOffice employee#1/1}");
Console.WriteLine(result);
}
}
This program will output Proposal is given to xyz for xyz xyz by xyz.
You'll need to provide your app-specific logic in the ReplaceMatchEvaluator method to process value0, value1 and value2 as appropriate. The class Replacer can contain additional members that can be used to implement the replacement logic in ReplaceMatchEvaluator. Strings are processed by calling Replace on an instance of the Replacer class.

Well, you can split the string by '{' and '}' and determine the contents that way.
But I think a better way would be to find the characters by index. Then you know the start and end index of each pair of curly brackets, and you can reconstruct the string with the placeholders replaced.
The best method may be Regex.Replace, but that will only help to replace the placeholders with values you want. I think your requirement is to also parse the text inside the curly brackets and choose the inserted value based on it, so this may not work well. See also: Find and Replace a section of a string with wildcard type search
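The index-based approach described in this answer could be sketched as follows. This is only a minimal illustration, not the answerer's code: the class name BraceScanner, the method ReplaceBraced and the compute callback are hypothetical, and the sketch assumes braces are never nested or escaped.

```csharp
using System;
using System.Text;

public static class BraceScanner
{
    // Replaces every "{...}" span with compute(text between the braces).
    // Assumption: braces are not nested. If no complete pair remains,
    // the rest of the input is copied through unchanged.
    public static string ReplaceBraced(string input, Func<string, string> compute)
    {
        var output = new StringBuilder();
        int position = 0;
        while (position < input.Length)
        {
            int open = input.IndexOf('{', position);
            int close = open < 0 ? -1 : input.IndexOf('}', open + 1);
            if (close < 0)
            {
                // No complete pair left: copy the remainder and stop.
                output.Append(input, position, input.Length - position);
                break;
            }
            // Copy the literal text that precedes the opening brace.
            output.Append(input, position, open - position);
            // Hand the text between the braces to the caller-supplied logic.
            output.Append(compute(input.Substring(open + 1, close - open - 1)));
            position = close + 1;
        }
        return output.ToString();
    }
}
```

A call such as BraceScanner.ReplaceBraced(text, inner => inner.Split('#')[0]) would keep only the part before the '#' in each placeholder; any other computation can be plugged in the same way.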

You may use the Regex.Replace Method (String, String, MatchEvaluator) method and the {.*?} pattern. The following example uses a dictionary to replace the values, but you may replace this with your own logic.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
class Program
{
static Dictionary<string, string> _dict = new Dictionary<string, string>();
static void Main(string[] args)
{
_dict.Add("{Jwala Vora#3/13}","someValue1");
_dict.Add("{Amazon Vally#2/11}", "someValue2");
_dict.Add("{1#3/75}", "someValue3");
_dict.Add("{MdOffice employee#1/1}", "someValue4");
var input = @"Proposal is given to {Jwala Vora#3/13} for {Amazon Vally#2/11} {1#3/75} by {MdOffice employee#1/1}";
var result = Regex.Replace(input, @"{.*?}", Evaluate);
Console.WriteLine(result);
}
private static string Evaluate(Match match)
{
return _dict[match.Value];
}
}

Can't you do something with string.Format()?
For example
string.Format("Proposal is given to {0} for {1} {2} by {3}", "Jwala Vora", "Amazon Vally", 1, "MdOffice employee");

Related

Splitting string value c#

I want to know how to split a string value into two parts. In my ASP application I'm passing a string value from the view to the controller.
I then want to split the whole value into two parts.
For example: most of the time the first two characters are a text value (like "PO", "SS", "GS") and the rest are digits (SS235452).
The length of the numeric part is not fixed, since it is generated randomly, so I want to split from the beginning of the string. I need help with that.
My current code is
string approvalnumber = approvalCheck.ApprovalNumber.ToUpper();
Thanks.
Since you mentioned that the first part always has 2 letters and only the second part varies, you can use the Substring method of String as shown below.
var textPart = input.Substring(0,2);
var numPart = input.Substring(2);
The first line fetches 2 characters starting at index zero, and the second fetches all characters from index 2 onward. You can parse the second part into a number (for example with int.Parse) if required.
Note that the second call to Substring omits the second parameter. That parameter is the length; when it is omitted, Substring returns everything up to the end of the string.
You could try using regex to extract alpha, numbers from the string.
This javascript function returns only numbers from the input string.
function getNumbers(input) {
return input.match(/[0-9]+/g);
}
I'd use a RegExp. Considering the fact that you indicate ASP-NET-4 I assume you can't use tuples, out var etc. so it'd go as follows:
using System.Text.RegularExpressions;
using FluentAssertions;
using Xunit;
namespace Playground
{
public class Playground
{
public struct ProjectCodeMatch
{
public string Code { get; set; }
public int? Number { get; set; }
}
[Theory]
[InlineData("ABCDEFG123", "ABCDEFG", 123)]
[InlineData("123456", "", 123456)]
[InlineData("ABCDEFG", "ABCDEFG", null)]
[InlineData("ab123", "AB", 123)]
public void Split_Works(string input, string expectedCode, int? expectedNumber)
{
ProjectCodeMatch result;
var didParse = TryParse(input, out result);
didParse.Should().BeTrue();
result.Code.Should().Be(expectedCode);
result.Number.Should().Be(expectedNumber);
}
private static bool TryParse(string input, out ProjectCodeMatch result)
{
/*
* A word on this RegExp:
* ^ - the match must happen at the beginning of the string (nothing before that)
* (?<Code>[a-zA-Z]+) - grab any number of letters and name this part the "Code" group
* (?<Number>\d+) - grab any number of numbers and name this part the Number group
* {0,1} this group must occur at most 1 time
* $ - the match must end at the end of the string (nothing after that)
*/
var regex = new Regex(@"^(?<Code>[a-zA-Z]+){0,1}(?<Number>\d+){0,1}$");
var match = regex.Match(input);
if (!match.Success)
{
result = default(ProjectCodeMatch); // explicit type: the bare default literal needs C# 7.1 or later
return false;
}
int number;
var isNumber = int.TryParse(match.Groups["Number"].Value, out number);
result = new ProjectCodeMatch
{
Code = match.Groups["Code"].Value.ToUpper(),
Number = isNumber ? (int?)number : null // cast so both branches share the type int?
};
return true;
}
}
}
A linq answer:
string d = "PO1232131";
string letters = string.Join("", d.TakeWhile(a => Char.IsLetter(a))); // "PO"

Is it possible to store a regex match and use part of it as a list enumerator?

I have created a MadLibs-style game where the user enters responses to prompts, which in turn replace blanks, represented by %s0, %s1, etc., in a story. I have this working using a for loop, but someone suggested I could do it using a regex. What I have so far is below; it replaces all instances of %s+number with "wibble". Is it possible to store the number found by the regex in a temporary variable and use it to return a value from the list Words? E.g. return Regex.Replace(story, pattern, Globals.Words[x]); where x is the number returned by the regex pattern as it goes over the string.
static void Main(string[] args)
{
Globals.Words = new List<string>();
Globals.Words.Add("nathan");
Globals.Words.Add("bob");
var text = "Once upon a time there was a %s0 and it was %s1";
Console.WriteLine(FindEscapeCharacters(text));
}
public static string FindEscapeCharacters(string story)
{
var pattern = @"%s([0-9]+)";
return Regex.Replace(story, pattern, "wibble");
}
Thanks in advance, Nathan.
Not a direct answer to your question about regexes, but if I understand you correctly, there is an easier way to do this:
string baseString = "I have a {0} {1} in my {0} {2}.";
List<string> words = new List<string>() { "red", "cat", "hat" };
string outputString = String.Format(baseString, words.ToArray());
outputString will be "I have a red cat in my red hat."
Is that not what you want, or is there more to your question that I'm missing?
Minor elaboration
String.Format uses the following signature:
string Format(string format, params object[] values)
The neat thing about params is that you can either list values separately:
var a = String.Format("...", valueA, valueB, valueC);
but you can also pass in an array directly:
var a = String.Format("...", valueArray);
Note that you can't mix and match the two approaches.
Yes, you are very close in your attempt with Regex.Replace; the last step is to change constant "wibble" into lambda match => how_to_replace_the_match:
var text = "Once upon a time there was a %s0 and it was %s1";
// Once upon a time there was a nathan and it was bob
var result = Regex.Replace(
text,
"%s([0-9]+)",
match => Globals.Words[int.Parse(match.Groups[1].Value)]);
Edit: If you don't want to refer to capturing groups by their numbers, you can name them explicitly:
// Once upon a time there was a nathan and it was bob
var result = Regex.Replace(
text,
"%s(?<number>[0-9]+)",
match => Globals.Words[int.Parse(match.Groups["number"].Value)]);
There is an overload of Regex.Replace that, rather than taking a string for the last argument, takes a MatchEvaluator delegate - a function that takes a Match object and returns a string.
You could make that function parse the integer from the Match's Groups[1].Value property and then use that to index into your list, returning the string you find.

C# Regex, any more efficient way to parse string enclosed by symbol?

I'm not sure if it's okay to ask... But here goes.
I implemented a method that parses a string using regex; each match is passed through a delegate, in order (actually, order is not important, I think... wait, is it? But I wrote it this way, and it's not fully tested):
Pattern Regex.Replace: @"(?<!\\)\$.+?\$" then String.Replace: @"\$", @"$"; Replace string enclosed by dollar signs. Ignores backslash-escaped ones, then erases the backslash. Ex: "$global name$" -> "motherofglobalvar", "Money \$9000" -> "Money $9000"
Pattern Regex.Replace @"(?<!\\)%.+?%" then String.Replace @"\%", @"%"; Replace string enclosed by percent signs. Ignores backslash-escaped ones, then erases the backslash. Same as previous example: "%local var%" -> "lordoflocalvar", "It's over 9000\%" -> "It's over 9000%"
Pattern Regex.Replace @"(?<!\\)#" then String.Replace @"\#", @"#"; Replace char '#' with whitespace, ' '. Ignores backslash-escaped ones, then erases the backslash. Ex: "I#hit#the#ground#too#hard" -> "I hit the ground too hard", "qw\#op" -> "qw#op"
What I've done without much experience (I think):
//parse variable
public static string ParseVariable(string text)
{
return Regex.Replace(Regex.Replace(Regex.Replace(text, @"(?<!\\)\$.+?\$", match =>
{
string trim = match.Value.Trim('$');
string trimUpper = trim.ToUpper();
return variableGlobal.ContainsKey(trim) ? variableGlobal[trim] : match.Value;
}).Replace(@"\$", @"$"), @"(?<!\\)%.+?%", match =>
{
string trim = match.Value.Trim('%');
string trimUpper = trim.ToUpper();
return variableLocal.ContainsKey(trim) ? variableLocal[trim] : match.Value;
}).Replace(@"\%", @"%"), @"(?<!\\)#", " ").Replace(@"\#", @"#");
}
In short, what I used is: Regex.Replace().Replace()
Since I need to parse 3 kinds of symbols, I chained it as following: Regex.Replace(Regex.Replace(Regex.Replace().Replace()).Replace()).Replace()
Is there any more efficient way than this? I mean, like without need to go through the text 6 times? (3 times regex.replace, 3 times string.replace, where each replace modifies the text to be used by the next replace )
Or is it the best way it can do?
Thanks.
Here's a unique take on the problem, I think. You can build a class that will be used to construct the overall pattern piece-by-piece. This class will be responsible for the generating of the MatchEvaluator delegate that will be passed to Replace as well.
class RegexReplacer
{
public string Pattern { get; private set; }
public string Replacement { get; private set; }
public string GroupName { get; private set; }
public RegexReplacer NextReplacer { get; private set; }
public RegexReplacer(string pattern, string replacement, string groupName, RegexReplacer nextReplacer = null)
{
this.Pattern = pattern;
this.Replacement = replacement;
this.GroupName = groupName;
this.NextReplacer = nextReplacer;
}
public string GetAggregatedPattern()
{
string constructedPattern = this.Pattern;
string alternation = (this.NextReplacer == null ? string.Empty : "|" + this.NextReplacer.GetAggregatedPattern()); // If there isn't another replacer, then we won't have an alternation; otherwise, we build an alternation between this pattern and the next replacer's "full" pattern
constructedPattern = string.Format("(?<{0}>{1}){2}", this.GroupName, this.Pattern, alternation); // The (?<XXX>) syntax builds a named capture group. This is used by our GetReplaceDelegate method.
return constructedPattern;
}
public MatchEvaluator GetReplaceDelegate()
{
return (match) =>
{
if (match.Groups[this.GroupName] != null && match.Groups[this.GroupName].Length > 0) // Did we get a hit on the group name?
{
return this.Replacement;
}
else if (this.NextReplacer != null) // No? Then is there another replacer to inspect?
{
MatchEvaluator next = this.NextReplacer.GetReplaceDelegate();
return next(match);
}
else
{
return match.Value; // No? Then simply return the value
}
};
}
}
It should be obvious as to what Pattern and Replacement represent. GroupName is kind of a hack to let the replacement evaluator know which RegexReplacer fragment resulted in the match. NextReplacer points to another replacer instance that holds a different pattern fragment (et al.).
The idea here is to have a kind of linked list of objects that represents the overall pattern. You can call GetAggregatedPattern on the outermost replacer to get the full pattern: each replacer calls the next replacer's GetAggregatedPattern to get that replacer's pattern fragment, to which it concatenates its own fragment. GetReplaceDelegate generates a MatchEvaluator. This MatchEvaluator compares its own GroupName to the Match's captured groups. If the group name was captured, then we have a hit, and we return this replacer's Replacement value. Otherwise, we step into the next replacer (if there is one) and repeat the group name comparison. If there is no hit on any replacer, then we simply yield back the original value (i.e. what was matched by the pattern; this should be rare).
The usage of such might look like this:
string target = @"$global name$ Money \$9000 %local var% It's over 9000\% I#hit#the#ground#too#hard qw\#op";
RegexReplacer dollarWrapped = new RegexReplacer(@"(?<!\\)\$[^$]+\$", "motherofglobalvar", "dollarWrapped");
RegexReplacer slashDollar = new RegexReplacer(@"\\\$", "$", "slashDollar", dollarWrapped); // "\$" becomes "$", as the question requires
RegexReplacer percentWrapped = new RegexReplacer(@"(?<!\\)%[^%]+%", "lordoflocalvar", "percentWrapped", slashDollar);
RegexReplacer slashPercent = new RegexReplacer(@"\\%", "%", "slashPercent", percentWrapped); // "\%" becomes "%"
RegexReplacer singleAt = new RegexReplacer(@"(?<!\\)#", " ", "singleAt", slashPercent);
RegexReplacer slashAt = new RegexReplacer(@"\\#", "#", "slashAt", singleAt);
RegexReplacer replacer = slashAt;
string pattern = replacer.GetAggregatedPattern();
MatchEvaluator evaluator = replacer.GetReplaceDelegate();
string result = Regex.Replace(target, pattern, evaluator);
Because you want each replacer to know if it got a hit, and because we are hacking this by using group names, you want to make sure that each group name is distinct. A simple way to ensure this would be to use a name that's identical to the variable name since you can't have two variables with the same name within the same scope.
You can see above that I am building each part of the pattern separately, but as I build, I pass the previous replacer as a 4th parameter to the current replacer. This builds the chain of replacers. Once built, I use the last replacer constructed in order to generate the overall pattern and evaluator. If you use anything but, then you will only have part of the overall pattern. Finally, it's simply a matter of passing the generated pattern and evaluator to the Replace method.
Keep in mind that this approach was targeted more at the problem as described. It may work in more general scenarios, but I've only worked with what you've presented. Also, since this is more of a parsing question, a parser may be the proper route to take--although the learning curve is going to be higher.
Also keep in mind that I haven't profiled this code. It certainly doesn't loop over the target string multiple times, but it does involve additional method calls during replacement. You would certainly want to test it in your environment.

Pass an array to a function (And use the function to split the array)

I want to pass a string array (separated by commas), then use a function to split the passed array by a comma, and add in a delimiter in place of the comma.
I will show you what I mean in further detail with some broken code:
String FirstData = "1";
String SecondData = "2" ;
String ThirdData = "3" ;
String FourthData = null;
FourthData = AddDelimiter(FirstData,SecondData,ThirdData);
public String AddDelimiter(String[] sData)
{
// foreach ","
String OriginalData = null;
// So, here ... I want to somehow split 'sData' by a ",".
// I know I can use the split function - which I'm having
// some trouble with - but I also believe there is some way
// to use the 'foreach' function? I wish i could put together
// some more code here but I'm a VB6 guy, and the syntax here
// is killing me. Errors everywhere.
return OriginalData;
}
Syntax doesn't matter much here; you need to get to know the Base Class Library. Also, you apparently want to join strings, not split them:
var s = string.Join(",", arrayOFStrings);
Also, if you want to pass n strings to a method like that, you need the params keyword:
public string Join( params string[] data) {
return string.Join(",", data);
}
To split:
string[] splitString = sData.Split(new char[] {','});
To join in new delimiter, pass in the array of strings to String.Join:
string colonString = String.Join(":", splitString);
I think you are better off using Replace, since all you want to do is replace one delimiter with another:
string differentDelimiter = sData.Replace(",", ":");
If you have several objects and you want to put them in an array, you can write:
string[] allData = new string[] { FirstData, SecondData, ThirdData };
you can then simply give that to the function:
FourthData = AddDelimiter(allData);
C# has a nice trick, if you add a params keyword to the function definition, you can treat it as if it's a function with any number of parameters:
public String AddDelimiter(params String[] sData) { … }
…
FourthData = AddDelimiter(FirstData, SecondData, ThirdData);
As for the actual implementation, the easiest way is to use string.Join():
public String AddDelimiter(String[] sData)
{
// you can use any other string instead of ":"
return string.Join(":", sData);
}
But if you wanted to build the result yourself (for example if you wanted to learn how to do it), you could do it using string concatenation (oneString + anotherString), or even better, using StringBuilder:
public String AddDelimiter(String[] sData)
{
StringBuilder result = new StringBuilder();
bool first = true;
foreach (string s in sData)
{
if (!first)
result.Append(':');
result.Append(s);
first = false;
}
return result.ToString();
}
One version of the Split function takes an array of characters. Here is an example:
string[] splitstuff = sData[0].Split(new char[] {','});
If you don't need to perform any processing on the parts in between and just need to replace the delimiter, you could easily do so with the Replace method on the String class:
string newlyDelimited = oldString.Replace(',', ':');
For large strings, this will give you better performance, as you won't have to do a full pass through the string to break it apart and then do a pass through the parts to join them back together.
However, if you need to work with the individual parts (to recompose them into another form that does not resemble a simple replacement of the delimiter), then you would use the Split method on the String class to get an array of the delimited items and then plug those into the format you wish.
Of course, this means you have to have some sort of explicit knowledge about what each part of the delimited string means.
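To illustrate the Split-and-recompose case described above, here is a minimal sketch. The field layout "id,name,city" and the names DelimitedRecord and Describe are assumptions made up for this example; they are not part of the question.

```csharp
using System;

public static class DelimitedRecord
{
    // Hypothetical layout: "id,name,city".
    // Splits the record and recomposes it into a form that is not a
    // simple delimiter swap, which is where Split is actually needed.
    public static string Describe(string record)
    {
        string[] parts = record.Split(',');
        if (parts.Length != 3)
            throw new FormatException("Expected exactly three comma-separated fields.");
        return string.Format("{1} (#{0}) lives in {2}",
            parts[0].Trim(), parts[1].Trim(), parts[2].Trim());
    }
}
```

The explicit length check reflects the point made above: the code has to know what each position in the delimited string means.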

Parsing formatted string

I am trying to create a generic formatter/parser combination.
Example scenario:
I have a string for string.Format(), e.g. var format = "{0}-{1}"
I have an array of object (string) for the input, e.g. var arr = new[] { "asdf", "qwer" }
I am formatting the array using the format string, e.g. var res = string.Format(format, arr)
What I am trying to do is to revert back the formatted string back into the array of object (string). Something like (pseudo code):
var arr2 = string.Unformat(format, res)
// when: res = "asdf-qwer"
// arr2 should be equal to arr
Anyone have experience doing something like this? I'm thinking about using regular expressions (modify the original format string, and then pass it to Regex.Matches to get the array) and run it for each placeholder in the format string. Is this feasible or is there any other more efficient solution?
While the comments about lost information are valid, sometimes you just want to get the string values out of a string with known formatting.
One method is this blog post written by a friend of mine. He implemented an extension method called string[] ParseExact(), akin to DateTime.ParseExact(). Data is returned as an array of strings, but if you can live with that, it is terribly handy.
public static class StringExtensions
{
public static string[] ParseExact(
this string data,
string format)
{
return ParseExact(data, format, false);
}
public static string[] ParseExact(
this string data,
string format,
bool ignoreCase)
{
string[] values;
if (TryParseExact(data, format, out values, ignoreCase))
return values;
else
throw new ArgumentException("Format not compatible with value.");
}
public static bool TryExtract(
this string data,
string format,
out string[] values)
{
return TryParseExact(data, format, out values, false);
}
public static bool TryParseExact(
this string data,
string format,
out string[] values,
bool ignoreCase)
{
int tokenCount = 0;
format = Regex.Escape(format).Replace("\\{", "{");
for (tokenCount = 0; ; tokenCount++)
{
string token = string.Format("{{{0}}}", tokenCount);
if (!format.Contains(token)) break;
format = format.Replace(token,
string.Format("(?'group{0}'.*)", tokenCount));
}
RegexOptions options =
ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
Match match = new Regex(format, options).Match(data);
if (tokenCount != (match.Groups.Count - 1))
{
values = new string[] { };
return false;
}
else
{
values = new string[tokenCount];
for (int index = 0; index < tokenCount; index++)
values[index] =
match.Groups[string.Format("group{0}", index)].Value;
return true;
}
}
}
You can't unformat because information is lost. String.Format is a "destructive" algorithm, which means you can't (always) go back.
Create a new class that keeps track of the "{0}-{1}" and the { "asdf", "qwer" } as members, override ToString(), and modify your code a little. Note that string is sealed in C#, so the class cannot inherit from it; it has to hold the format and the arguments itself.
This means a bit more change in the calling code, which must use the new type wherever it needs the original values back.
IMO, that's the best way to do this.
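A holder class along those lines might look like the following. This is a minimal sketch, not the answerer's code; FormattedText and its members are hypothetical names, and the class stands alone rather than deriving from string.

```csharp
using System;

public sealed class FormattedText
{
    private readonly string _format;
    private readonly object[] _arguments;

    public FormattedText(string format, params object[] arguments)
    {
        if (format == null) throw new ArgumentNullException("format");
        _format = format;
        // Copy the array so later changes by the caller do not leak in.
        _arguments = arguments == null ? new object[0] : (object[])arguments.Clone();
    }

    // The original format string, e.g. "{0}-{1}".
    public string Format { get { return _format; } }

    // A copy of the original arguments, so nothing is lost by formatting.
    public object[] Arguments { get { return (object[])_arguments.Clone(); } }

    // Produces the formatted text on demand.
    public override string ToString()
    {
        return string.Format(_format, _arguments);
    }
}
```

With this shape there is nothing to "unformat": the caller keeps the FormattedText instance and reads Arguments instead of parsing the output of ToString().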
It's simply not possible in the generic case. Some information will be "lost" (string boundaries) in the Format method. Assume:
String.Format("{0}-{1}", "hello-world", "stack-overflow");
How would you "Unformat" it?
Assuming "-" is not in the original strings, can you not just use Split?
var arr2 = formattedString.Split('-');
Note that this only applies to the presented example with an assumption. Any reverse algorithm is dependent on the kind of formatting employed; an inverse operation may not even be possible, as noted by the other answers.
A simple solution might be to
replace all format tokens with (.*)
escape all other special characters in format
make the regex match non-greedy
This would resolve the ambiguities to the shortest possible match.
(I'm not good at RegEx, so please correct me, folks :))
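The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a general solution: Unformatter and Unformat are hypothetical names, and the sketch assumes plain placeholders {0}, {1}, ... that are numbered contiguously from 0, each used once, with no alignment or format specifiers and no escaped braces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Unformatter
{
    // Returns the captured values, or null if data does not fit the format.
    public static string[] Unformat(string format, string data)
    {
        // Escape the literal text; Regex.Escape turns "{0}" into "\{0}".
        string pattern = Regex.Escape(format);
        // Swap each escaped placeholder for a lazy (non-greedy) named group.
        pattern = Regex.Replace(pattern, @"\\\{(\d+)\}", "(?<g$1>.*?)");
        // Anchor the pattern so the whole input has to fit the format.
        Match match = Regex.Match(data, "^" + pattern + "$", RegexOptions.Singleline);
        if (!match.Success)
            return null;

        // Collect g0, g1, ... until a group name is not found.
        var values = new List<string>();
        for (int i = 0; match.Groups["g" + i].Success; i++)
            values.Add(match.Groups["g" + i].Value);
        return values.ToArray();
    }
}
```

Because the groups are lazy, an ambiguous input such as "hello-world-stack-overflow" with the format "{0}-{1}" resolves to "hello" and "world-stack-overflow", which may or may not be what was originally formatted.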
After formatting, you can put the resulting string and the array of objects into a dictionary with the string as key:
Dictionary<string,string []> unFormatLookup = new Dictionary<string,string []>();
...
var arr = new string [] {"asdf", "qwer" };
var res = string.Format(format, arr);
unFormatLookup.Add(res,arr);
and in Unformat method, you can simply pass a string and look up that string and return the array used:
string [] Unformat(string res)
{
string [] arr;
unFormatLookup.TryGetValue(res,out arr); //you can also check the return value of TryGetValue and throw an exception if the input string is not in the dictionary.
return arr;
}
