Parsing formatted string - c#

I am trying to create a generic formatter/parser combination.
Example scenario:
I have a string for string.Format(), e.g. var format = "{0}-{1}"
I have an array of object (string) for the input, e.g. var arr = new[] { "asdf", "qwer" }
I am formatting the array using the format string, e.g. var res = string.Format(format, arr)
What I am trying to do is convert the formatted string back into the array of objects (strings). Something like (pseudo code):
var arr2 = string.Unformat(format, res)
// when: res = "asdf-qwer"
// arr2 should be equal to arr
Does anyone have experience doing something like this? I'm thinking about using regular expressions (modify the original format string, then pass it to Regex.Matches to get the array) and running it for each placeholder in the format string. Is this feasible, or is there a more efficient solution?

While the comments about lost information are valid, sometimes you just want to get the string values out of a string with known formatting.
One method is this blog post written by a friend of mine. He implemented an extension method called string[] ParseExact(), akin to DateTime.ParseExact(). Data is returned as an array of strings, but if you can live with that, it is terribly handy.
public static class StringExtensions
{
public static string[] ParseExact(
this string data,
string format)
{
return ParseExact(data, format, false);
}
public static string[] ParseExact(
this string data,
string format,
bool ignoreCase)
{
string[] values;
if (TryParseExact(data, format, out values, ignoreCase))
return values;
else
throw new ArgumentException("Format not compatible with value.");
}
public static bool TryExtract(
this string data,
string format,
out string[] values)
{
return TryParseExact(data, format, out values, false);
}
public static bool TryParseExact(
this string data,
string format,
out string[] values,
bool ignoreCase)
{
int tokenCount = 0;
format = Regex.Escape(format).Replace("\\{", "{");
for (tokenCount = 0; ; tokenCount++)
{
string token = string.Format("{{{0}}}", tokenCount);
if (!format.Contains(token)) break;
format = format.Replace(token,
string.Format("(?'group{0}'.*)", tokenCount));
}
RegexOptions options =
ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
Match match = new Regex(format, options).Match(data);
if (!match.Success || tokenCount != (match.Groups.Count - 1))
{
values = new string[] { };
return false;
}
else
{
values = new string[tokenCount];
for (int index = 0; index < tokenCount; index++)
values[index] =
match.Groups[string.Format("group{0}", index)].Value;
return true;
}
}
}

You can't unformat because information is lost. String.Format is a "destructive" algorithm, which means you can't (always) go back.
You could keep the information around instead: create a class with members that keep track of the "{0}-{1}" and the { "asdf", "qwer" }, override ToString(), and modify your code a little.
Note that string is sealed in C#, so the class cannot inherit from string; it has to be a separate wrapper type, which means modifying your code a little more.
IMO, that's the best way to do this.
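A minimal sketch of such a wrapper, assuming a separate class rather than a string subclass. The name FormattedString and its members are made up for illustration:

```csharp
using System;

// Hypothetical wrapper type: keeps the format string and the arguments
// next to each other so nothing has to be recovered from the output.
public sealed class FormattedString
{
    public string Format { get; }
    public object[] Args { get; }

    public FormattedString(string format, params object[] args)
    {
        Format = format;
        Args = args;
    }

    // The formatted text is produced on demand from the stored parts.
    public override string ToString()
    {
        return string.Format(Format, Args);
    }
}
```

With this, new FormattedString("{0}-{1}", "asdf", "qwer") can be passed around in place of the plain string, and the original array is always available through Args.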

It's simply not possible in the generic case. Some information will be "lost" (string boundaries) in the Format method. Assume:
String.Format("{0}-{1}", "hello-world", "stack-overflow");
How would you "Unformat" it?
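To make the ambiguity concrete, here is a small sketch (the class and method names are invented for this example) showing two different argument pairs that format to the same text:

```csharp
using System;

public static class AmbiguityDemo
{
    // Formats two different argument pairs with the same format string
    // and reports whether the results are indistinguishable.
    public static bool ProducesSameOutput()
    {
        string a = string.Format("{0}-{1}", "hello-world", "stack-overflow");
        string b = string.Format("{0}-{1}", "hello", "world-stack-overflow");
        // Both are "hello-world-stack-overflow": the boundary between
        // argument 0 and argument 1 is not recorded anywhere in the result.
        return a == b;
    }
}
```

Any Unformat has to pick one of the possible splits, so it can only be correct if the values are known not to contain the separator.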

Assuming "-" is not in the original strings, can you not just use Split?
var arr2 = formattedString.Split('-');
Note that this only applies to the presented example with an assumption. Any reverse algorithm is dependent on the kind of formatting employed; an inverse operation may not even be possible, as noted by the other answers.

A simple solution might be to
replace all format tokens with (.*)
escape all other special characters in format
make the regex match non-greedy
This would resolve the ambiguities to the shortest possible match.
(I'm not good at RegEx, so please correct me, folks :))
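The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the Unformatter class and its Unformat method are hypothetical names, only plain {0}, {1}, ... placeholders are handled (no {0:N2}, no escaped {{ }}), and each placeholder is assumed to appear once:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Unformatter
{
    // Returns the captured values, or null when the input does not fit the format.
    public static string[] Unformat(string format, string input)
    {
        // Splitting on a pattern with a capture group keeps the captured
        // indexes: even positions are literal text, odd positions are indexes.
        string[] pieces = Regex.Split(format, @"\{(\d+)\}");
        var pattern = new StringBuilder("^");
        int maxIndex = -1;
        for (int i = 0; i < pieces.Length; i++)
        {
            if (i % 2 == 0)
            {
                // Literal text: escape regex metacharacters.
                pattern.Append(Regex.Escape(pieces[i]));
            }
            else
            {
                // Placeholder: non-greedy named group.
                int index = int.Parse(pieces[i]);
                maxIndex = Math.Max(maxIndex, index);
                pattern.Append("(?<g" + index + ">.*?)");
            }
        }
        pattern.Append("$");

        Match match = Regex.Match(input, pattern.ToString(), RegexOptions.Singleline);
        if (!match.Success) return null;

        var values = new string[maxIndex + 1];
        for (int i = 0; i <= maxIndex; i++)
            values[i] = match.Groups["g" + i].Value;
        return values;
    }
}
```

Because the groups are non-greedy, an ambiguous input such as "hello-world-stack-overflow" with the format "{0}-{1}" resolves to the shortest first value ("hello"), which may or may not be what was originally formatted.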

After formatting, you can put the resulting string and the array of objects into a dictionary with the string as key:
Dictionary<string,string []> unFormatLookup = new Dictionary<string,string []>();
...
var arr = new string [] {"asdf", "qwer" };
var res = string.Format(format, arr);
unFormatLookup.Add(res,arr);
and in Unformat method, you can simply pass a string and look up that string and return the array used:
string [] Unformat(string res)
{
string [] arr;
unFormatLookup.TryGetValue(res,out arr); //you can also check the return value of TryGetValue and throw an exception if the input string is not in.
return arr;
}

Related

Minimal API - multi-value parameters separated by commas to array of strings

I have query parameters such as /api/items?sizes=m,l,xxl, meaning they are separated by commas. I want to accept them as array of strings ([FromQuery] string[] sizes).
How do I do that? I know how to split the string; the issue is how to accept string[] and make sure the binder knows how to split the string.
string[] sizes = request.Sizes.Split(",", StringSplitOptions.RemoveEmptyEntries);
Such transformation is not supported even for MVC binders (it will require query string in one of the following formats: ?sizes[0]=3344&sizes[1]=2222 or ?sizes=24041&sizes=24117).
You can try using custom binding:
public class ArrayParser
{
public string[] Value { get; init; }
public static bool TryParse(string? value, out ArrayParser result)
{
result = new()
{
Value = value?.Split(',', StringSplitOptions.RemoveEmptyEntries) ?? Array.Empty<string>()
};
return true;
}
}
And usage:
app.MapGet("/api/query-arr", (ArrayParser sizes) => sizes.Value);
Try using %2c in the URL to replace the commas.

Round-trip-safe escaping of strings in C#

I am confused by all the different escaping mechanisms for strings in C#. What I want is an escaping/unescaping method that:
1) Can be used on any string
2) escape+unescape is guaranteed to return the initial string
3) Replaces all punctuation with something else. If that is too much to ask, then at least commas, braces, and #. I am fine with spaces not being escaped.
4) Is unlikely to ever change.
Does it exist?
EDIT: This is for purposes of serializing and deserializing app-generated attributes. So my object may or may not have values for Attribute1, Attribute2, Attribute3, etc. Simplifying a bit, the idea is to do something like the below. Goal is to have the encoded collection be brief and more-or-less human-readable.
I am asking what methods would make sense to use for Escape and Unescape.
public abstract class GenericAttribute {
const string key1 = "KEY1"; //It is fine to put some restrictions on the keys, i.e. no punctuation
const string key2 = "KEY2";
public abstract string Encode(); // NO RESTRICTIONS ON WHAT ENCODE MIGHT RETURN
public static GenericAttribute FromKeyValuePair (string key, string value) {
switch (key) {
case key1: return new ConcreteAttribute1(value);
case key2: return new ConcreteAttribute2(value);
// etc.
}
}
}
public class AttributeCollection {
Dictionary <string, GenericAttribute> Content {get;set;}
public string Encode() {
string r = "";
bool first = true;
foreach (KeyValuePair<string, GenericAttribute> pair in this.Content) {
if (first) {
first = false;
} else {
r+=",";
}
r+=(pair.Key + "=" + Escape(pair.Value.Encode()));
}
return r;
}
public AttributeCollection(string encodedCollection) {
// input string is the return value of the Encode method
this.Content = new Dictionary<string, GenericAttribute>();
string[] array = encodedCollection.Split(',');
foreach(string component in array) {
int equalsIndex = component.IndexOf('=');
string key = component.Substring(0, equalsIndex);
string value = component.Substring(equalsIndex+1);
GenericAttribute attribute = GenericAttribute.FromKeyValuePair(key, Unescape(value));
this.Content[key]=attribute;
}
}
}
I'm not entirely sure what you're asking, but I believe your intent is for the escaped character to be included, even with the escape.
var content = @"\'Hello";
Console.WriteLine(content);
// Output:
\'Hello
By utilizing the @ (a verbatim string literal), the backslash is kept as-is, making it a part of your string. That is for the server side with C#; to account for other languages and escape formats, only you would know that.
You can find some great information on C# escaping here:
MSDN Blog
Try using HttpServerUtility.UrlEncode and HttpServerUtility.UrlDecode. I think that will encode and decode all the things you want.
See the MSDN Docs and here is a description of the mapping on Wikipedia.
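As a sketch of the percent-encoding idea, here is a pair of helpers built on Uri.EscapeDataString and Uri.UnescapeDataString rather than HttpServerUtility (a swap made here so the example does not depend on System.Web). The RoundTrip class name is made up, and the behavior described assumes a current .NET runtime:

```csharp
using System;

public static class RoundTrip
{
    // Percent-encodes everything except letters, digits and - . _ ~
    // so commas, braces, '#' and '=' never appear in the result.
    public static string Escape(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Reverses Escape by decoding the %XX sequences.
    public static string Unescape(string value)
    {
        return Uri.UnescapeDataString(value);
    }
}
```

Note that spaces are encoded too (as %20), so the output is somewhat less readable than the question hoped for, and strings containing unpaired surrogate characters may not round-trip.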

What is the best way to check if a string can be parsed into an int array?

I need to determine if a string can be parsed into an array of int. The string MAY be in the format
"124,456,789,0"
In which case it can be converted thus:
int[] Ids = SearchTerm.Split(',').Select(int.Parse).ToArray();
However the string may also be something like:
"Here is a string, it is very nice."
In which case the parsing fails.
The logic currently branches in two directions based on whether the string contains a comma character (assuming that only the array-like strings will contain this character) but this logic is now flawed and comma characters are now appearing in other strings.
I could put a Try..Catch around it but I am generally averse to controlling logic flow by exceptions.
Is there an easy way to do this?
I could put a Try..Catch around it but I am generally averse to controlling logic flow by exceptions
Good attitude. If you can avoid the exception, do so.
A number of answers have suggested
int myint;
bool parseFailed = SearchTerm.Split(',')
.Any( s => !int.TryParse(s, out myint));
Which is not bad, but not great either. I would be inclined to first, write a better helper method:
static class Extensions
{
public static int? TryParseAsInteger(this string s)
{
int j;
bool success = int.TryParse(s, out j);
if (success)
return j;
else
return null;
}
}
Now you can say:
bool parseFailed = SearchTerm.Split(',')
.Any( s => s.TryParseAsInteger() == null);
But I assume that what you really want is the parsed state if it can succeed, rather than just answering the question "would a parse succeed?" With this helper method you can say:
List<int?> parse = SearchTerm.Split(',')
.Select( s => s.TryParseAsInteger() )
.ToList();
And now if the list contains any nulls, you know that it was bad; if it doesn't contain any nulls then you have the results you wanted:
int[] results = parse.Contains(null) ? null : parse.Select(x=>x.Value).ToArray();
int myint;
bool parseFailed = SearchTerm.Split(',')
.Any( s => !int.TryParse(s, out myint));
You can use multiline lambda expression to get int.TryParse for every Split method result:
var input = "124,456,789,0";
var parts = input.Split(new [] {","}, StringSplitOptions.RemoveEmptyEntries);
var numbers
= parts.Select(x =>
{
int v;
if (!int.TryParse(x, out v))
return (int?)null;
return (int?)v;
}).ToList();
if (numbers.Any(x => !x.HasValue))
Console.WriteLine("string cannot be parsed as int[]");
else
Console.WriteLine("OK");
It will not only check if value can be parsed to int, but also return the value if it can, so you don't have to do the parsing twice.
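The same parse-once idea can also be wrapped in a method that follows the usual Try pattern. A minimal sketch, where IntArrayParser and TryParseIntArray are hypothetical names and the input is assumed to be comma-separated:

```csharp
using System;

public static class IntArrayParser
{
    // Returns true and the parsed values when every comma-separated
    // part is a valid int; otherwise returns false and null.
    public static bool TryParseIntArray(string text, out int[] values)
    {
        string[] parts = text.Split(',');
        var parsed = new int[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            // Stop at the first part that is not an integer.
            if (!int.TryParse(parts[i], out parsed[i]))
            {
                values = null;
                return false;
            }
        }
        values = parsed;
        return true;
    }
}
```

The caller can then branch on the return value without catching exceptions and without parsing the string a second time.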
You can use a Regex to determine whether the string matches your pattern, something like this:
string st = "124,456,789,0";
string pattS = @"[0-9](?:\d{0,2})";
Regex regex = new Regex(pattS);
var res = regex.Matches(st);
foreach (var re in res)
{
//your code here
}
tested on rubular.com here
How about,
int dummy;
var parsable = SearchTerm.Split(',').All(s => int.TryParse(s, out dummy));
but if you are doing that you might as well just catch the exception
Why don't you first remove the commas from the string and use
bool res = int.TryParse(text1, out num1);
Example below has no limit.
The BigInteger type is an immutable type that represents an arbitrarily large integer whose value in theory has no upper or lower bounds.
BigInteger MSDN
string test = "20,100,100,100,100,100,100";
test = test.Replace(",", "");
BigInteger num1 = 0;
bool res = BigInteger.TryParse(test, out num1);

how to manipulate string which contains different pattern in C#?

I have the following type of string format:
Proposal is given to {Jwala Vora#3/13} for {Amazon Vally#2/11} {1#3/75} by {MdOffice employee#1/1}
The string contains pairs of { } at different positions, and there may be any number of them.
Now I want to replace each pair with another string, which I will compute depending on the text between the { } pair.
How do I do this?
You could try regular expressions. Specifically, Regex.Replace variants using MatchEvaluator should do the trick. See http://msdn.microsoft.com/en-US/library/cft8645c(v=vs.80).aspx for more information.
Something along these lines:
using System;
using System.Text.RegularExpressions;
public class Replacer
{
public string Replace(string input)
{
// The regular expression passed as the second argument to the Replace method
// matches strings in the format "{value0#value1/value2}", i.e. three strings
// separated by "#" and "/" all surrounded by braces.
var result = Regex.Replace(
input,
@"{(?<value0>[^#]+)#(?<value1>[^/]+)/(?<value2>[^}]+)}",
ReplaceMatchEvaluator);
return result;
}
private string ReplaceMatchEvaluator(Match m)
{
// m.Value contains the matched string including the braces.
// This method is invoked once per matching portion of the input string.
// We can then extract each of the named groups in order to access the
// substrings of each matching portion as follows:
var value0 = m.Groups["value0"].Value; // Contains first value, e.g. "Jwala Vora"
var value1 = m.Groups["value1"].Value; // Contains second value, e.g. "3"
var value2 = m.Groups["value2"].Value; // Contains third value, e.g. "13"
// Here we can do things like convert value1 and value2 to integers...
var intValue1 = Int32.Parse(value1);
var intValue2 = Int32.Parse(value2);
// etc.
// Here we return the value with which the matching portion is replaced.
// This would be some function of value0, value1 and value2 as well as
// any other data in the Replacer class.
return "xyz";
}
}
public static class Program
{
public static void Main(string[] args)
{
var replacer = new Replacer();
var result = replacer.Replace("Proposal is given to {Jwala Vora#3/13} for {Amazon Vally#2/11} {1#3/75} by {MdOffice employee#1/1}");
Console.WriteLine(result);
}
}
This program will output Proposal is given to xyz for xyz xyz by xyz.
You'll need to provide your app-specific logic in the ReplaceMatchEvaluator method to process value0, value1 and value2 as appropriate. The class Replacer can contain additional members that can be used to implement the replacement logic in ReplaceMatchEvaluator. Strings are processed by calling Replace on an instance of the Replacer class.
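If a whole class feels heavy, the same technique can be sketched with a lambda as the MatchEvaluator. The PlaceholderReplacer name and the compute delegate are assumptions for illustration, and the pattern here assumes the two trailing values are always digits:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlaceholderReplacer
{
    // Matches "{name#a/b}" where a and b are runs of digits.
    private static readonly Regex Placeholder =
        new Regex(@"\{(?<name>[^#{}]+)#(?<a>\d+)/(?<b>\d+)\}");

    // Calls compute once per placeholder and substitutes its result.
    public static string Replace(string input, Func<string, int, int, string> compute)
    {
        return Placeholder.Replace(input, m => compute(
            m.Groups["name"].Value,
            int.Parse(m.Groups["a"].Value),
            int.Parse(m.Groups["b"].Value)));
    }
}
```

For example, PlaceholderReplacer.Replace("to {Jwala Vora#3/13} for {1#3/75}", (name, a, b) => name.ToUpperInvariant() + ":" + (a + b)) yields "to JWALA VORA:16 for 1:78".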
Well, you can split the string by '{' and '}' and determine the contents that way.
But I think a better way would be to find the characters by index; then you know the starting and ending index of each pair of curly brackets, so you can reconstruct the string with the placeholders replaced.
The best method may be Regex.Replace, but a plain replacement only substitutes fixed values for the placeholders; I think your requirement is to also parse the text inside the curly brackets and choose the inserted value based on it, so this may not work well on its own. See also: Find and Replace a section of a string with wildcard type search
You may use the Regex.Replace Method (String, String, MatchEvaluator) method and the {.*?} pattern. The following example uses a dictionary to replace the values, but you may replace this with your own logic.
class Program
{
static Dictionary<string, string> _dict = new Dictionary<string, string>();
static void Main(string[] args)
{
_dict.Add("{Jwala Vora#3/13}","someValue1");
_dict.Add("{Amazon Vally#2/11}", "someValue2");
_dict.Add("{1#3/75}", "someValue3");
_dict.Add("{MdOffice employee#1/1}", "someValue4");
var input = @"Proposal is given to {Jwala Vora#3/13} for {Amazon Vally#2/11} {1#3/75} by {MdOffice employee#1/1}";
var result = Regex.Replace(input, @"{.*?}", Evaluate);
Console.WriteLine(result);
}
private static string Evaluate(Match match)
{
return _dict[match.Value];
}
}
Can't you do something with string.Format()?
For example
string.Format("Proposal is given to {0} for {1} {2} by {3}", "Jwala Vora", "Amazon Vally", 1, "MdOffice employee");

Pass an array to a function (And use the function to split the array)

I want to pass a string array (separated by commas), then use a function to split the passed array by a comma, and add in a delimiter in place of the comma.
I will show you what I mean in further detail with some broken code:
String FirstData = "1";
String SecondData = "2" ;
String ThirdData = "3" ;
String FourthData = null;
FourthData = AddDelimiter(FirstData,SecondData,ThirdData);
public String AddDelimiter(String[] sData)
{
// foreach ","
String OriginalData = null;
// So, here ... I want to somehow split 'sData' by a ",".
// I know I can use the split function - which I'm having
// some trouble with - but I also believe there is some way
// to use the 'foreach' function? I wish i could put together
// some more code here but I'm a VB6 guy, and the syntax here
// is killing me. Errors everywhere.
return OriginalData;
}
Syntax doesn't matter much here, you need to get to know the Base Class Library. Also, you want to join strings apparently, not split it:
var s = string.Join(",", arrayOFStrings);
Also, if you want to pass n string to a method like that, you need the params keyword:
public string Join( params string[] data) {
return string.Join(",", data);
}
To split:
string[] splitString = sData.Split(new char[] {','});
To join in new delimiter, pass in the array of strings to String.Join:
string colonString = String.Join(":", splitString);
I think you are better off using Replace, since all you want to do is replace one delimiter with another:
string differentDelimiter = sData.Replace(",", ":");
If you have several objects and you want to put them in an array, you can write:
string[] allData = new string[] { FirstData, SecondData, ThirdData };
you can then simply give that to the function:
FourthData = AddDelimiter(allData);
C# has a nice trick, if you add a params keyword to the function definition, you can treat it as if it's a function with any number of parameters:
public String AddDelimiter(params String[] sData) { … }
…
FourthData = AddDelimiter(FirstData, SecondData, ThirdData);
As for the actual implementation, the easiest way is to use string.Join():
public String AddDelimiter(String[] sData)
{
// you can use any other string instead of ":"
return string.Join(":", sData);
}
But if you wanted to build the result yourself (for example if you wanted to learn how to do it), you could do it using string concatenation (oneString + anotherString), or even better, using StringBuilder:
public String AddDelimiter(String[] sData)
{
StringBuilder result = new StringBuilder();
bool first = true;
foreach (string s in sData)
{
if (!first)
result.Append(':');
result.Append(s);
first = false;
}
return result.ToString();
}
One version of the Split function takes an array of characters. Here is an example:
string[] splitstuff = sData[0].Split(new char [] {','});
If you don't need to perform any processing on the parts in between and just need to replace the delimiter, you could easily do so with the Replace method on the String class:
string newlyDelimited = oldString.Replace(',', ':');
For large strings, this will give you better performance, as you won't have to do a full pass through the string to break it apart and then do a pass through the parts to join them back together.
However, if you need to work with the individual parts (to recompose them into another form that does not resemble a simple replacement of the delimiter), then you would use the Split method on the String class to get an array of the delimited items and then plug those into the format you wish.
Of course, this means you have to have some sort of explicit knowledge about what each part of the delimited string means.
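As a rough illustration of the split-then-recompose case, the sketch below trims each part and wraps it in brackets before joining with a new delimiter. The DelimiterTools class, the Recompose method and the bracket format are all invented for this example:

```csharp
using System;
using System.Linq;

public static class DelimiterTools
{
    // Splits on commas, reshapes each part, and joins with ':'.
    public static string Recompose(string delimited)
    {
        string[] parts = delimited.Split(',');
        // A plain Replace could not do this, because each part is
        // modified (trimmed and bracketed) before being joined again.
        return string.Join(":", parts.Select(p => "[" + p.Trim() + "]"));
    }
}
```

Calling DelimiterTools.Recompose("1, 2,3") gives "[1]:[2]:[3]".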
