String tokens in .NET - c#

I am writing a app in .NET which will generate random text based on some input. So if I have text like "I love your {lovely|nice|great} dress" I want to choose randomly from lovely/nice/great and use that in text. Any suggestions in C# or VB.NET are welcome.

You could do it using a regex to make a replacement for each {...}. The Regex.Replace function can take a MatchEvaluator which can do the logic for selecting a random value from the choices:
Random random = new Random();
string s = "I love your {lovely|nice|great} dress";
s = Regex.Replace(s, #"\{(.*?)\}", match => {
string[] options = match.Groups[1].Value.Split('|');
int index = random.Next(options.Length);
return options[index];
});
Console.WriteLine(s);
Example output:
I love your lovely dress
Update: Translated to VB.NET automatically using .NET Reflector:
Dim random As New Random
Dim s As String = "I love your {lovely|nice|great} dress"
s = Regex.Replace(s, "\{(.*?)\}", Function (ByVal match As Match)
Dim options As String() = match.Groups.Item(1).Value.Split(New Char() { "|"c })
Dim index As Integer = random.Next(options.Length)
Return options(index)
End Function)

This may be a bit of an abuse of the custom formatting functionality available through the ICustomFormatter and IFormatProvider interfaces, but you could do something like this:
public class ListSelectionFormatter : IFormatProvider, ICustomFormatter
{
#region IFormatProvider Members
public object GetFormat(Type formatType)
{
if (typeof(ICustomFormatter).IsAssignableFrom(formatType))
return this;
else
return null;
}
#endregion
#region ICustomFormatter Members
public string Format(string format, object arg, IFormatProvider formatProvider)
{
string[] values = format.Split('|');
if (values == null || values.Length == 0)
throw new FormatException("The format is invalid. At least one value must be specified.");
if (arg is int)
return values[(int)arg];
else if (arg is Random)
return values[(arg as Random).Next(values.Length)];
else if (arg is ISelectionPicker)
return (arg as ISelectionPicker).Pick(values);
else
throw new FormatException("The argument is invalid.");
}
#endregion
}
public interface ISelectionPicker
{
string Pick(string[] values);
}
public class RandomSelectionPicker : ISelectionPicker
{
Random rng = new Random();
public string Pick(string[] values)
{
// use whatever logic is desired here to choose the correct value
return values[rng.Next(values.Length)];
}
}
class Stuff
{
public static void DoStuff()
{
RandomSelectionPicker picker = new RandomSelectionPicker();
string result = string.Format(new ListSelectionFormatter(), "I am feeling {0:funky|great|lousy}. I should eat {1:a banana|cereal|cardboard}.", picker, picker);
}
}

String.Format("static text {0} more text {1}", randomChoice0, randomChoice1);

write a simple parser that will get the information in the braces, split it with string.Split , get a random index for that array and build up the string again.
use the StringBuilder for building the result due to performance issues with other stringoperations.

Related

Advanced string formatter with pattern support in C#

I would like to have a flexible template that can translate cases similar to:
WHnnn => WH001, WH002, WH003... (nnn is just a number indicated 3 digits)
INVyyyyMMdd => INV20220228
ORDERyyyyMMdd-nnn => ORDER20220228-007
I know that I can use the following code to achieve a specific template:
string.Format("INV{0:yyyy-MM-dd}", DateTime.Now)
Which should have the same result as case 2 above. But that's not flexible. As the customer may customize their own template as long as I can understand/support, like the third case above.
I know even for the third case, I can do something like this:
string.Format("ORDER{0:yyyy-MM-dd}-{1:d3}", DateTime.Now, 124)
But that's clumsy, as I would like the template (input) to be just like this:
ORDERyyyyMMdd-nnn
The requirement is to support all the supported patterns by string.Format in C#, but the template can be any combination of those patterns.
I would probably use a custom formatter for this case.
Create a new class that will contain date/time & number and will implement IFormattable interface.
There is one tip: use some internal format in style INV{nnn} or INV[nnn] where only the part in {} or [] will be replaced with the value.
Otherwise there could be unwanted changes like in Inv contains 'n'. You could get output as I7v.
In your examples the N is upper case, but will it be the case even after each customisation?
Code (simplified version):
internal sealed class InvoiceNumberInfo : IFormattable
{
private static readonly Regex formatMatcher = new Regex(#"^(?<before>.*?)\[(?<code>\w+?)\](?<after>.*)$");
private readonly DateTime date;
private readonly int number;
public InvoiceNumberInfo(DateTime date, int number)
{
this.date = date;
this.number = number;
}
public string ToString(string format, IFormatProvider formatProvider)
{
var output = format;
while (true)
{
var match = formatMatcher.Match(output);
if (!match.Success)
{
return output;
}
output = match.Groups["before"].Value + FormatValue(match.Groups["code"].Value) + match.Groups["after"].Value;
}
}
private string FormatValue(string code)
{
if (code[0] == 'n')
{
var numberFormat = "D" + code.Length.ToString(CultureInfo.InvariantCulture);
return this.number.ToString(numberFormat);
}
else
{
return this.date.ToString(code);
}
}
}
Use:
internal static class Program
{
public static void Main(string[] args)
{
if (args.Length == 0)
{
Console.WriteLine("No format to display");
return;
}
var inv = new InvoiceNumberInfo(DateTime.Now, number: 7);
foreach (string format in args)
{
Console.WriteLine("Format: '{0}' is formatted as {1:" + format + "}", format, inv);
}
}
}
And output:
Format: 'WH[nnn]' is formatted as WH007
Format: 'INV[yyyyMMdd]' is formatted as INV20220227
Format: 'ORDER[yyyyMMdd]-[nnn]' is formatted as ORDER20220227-007
NOTE This is only a simplified version, proof of concept. Use proper error checking for your code.

c# if contains AND/OR

Is there a variant of this?
if (blabla.Contains("I'm a noob") | blabla.Contains("sry") | blabla.Contains("I'm a noob "+"sry"))
{
//stuff
}
like:
if (blabla.Contains("I'm a noob) and/or ("sry")
{
//stuff
}
Help is appreciated!
You can't collapse it quite as far as you asked, but you can do:
if (blabla.Contains("I'm a noob") || blabla.Contains("sry"))
{
//stuff
}
The "and" case is handled here because a string with both would actually pass both of the statements in the "Or".
As far as I'm aware, there are no built-in methods to do this. But with a little LINQ and extension methods, you can create your own methods that will check to see if a string contains any or all tokens:
public static class ExtensionMethods{
public static bool ContainsAny(this string s, params string[] tokens){
return tokens.Any(t => s.Contains(t));
}
public static bool ContainsAll(this string s, params string[] tokens){
return tokens.All(t => s.Contains(t));
}
}
You could use it like this (remember, params arrays take a variable number of parameters, so you're not limited to just two like in my example):
var str = "this is a string";
Console.WriteLine(str.ContainsAny("this", "fake"));
Console.WriteLine(str.ContainsAny("doesn't", "exist"));
Console.WriteLine(str.ContainsAll("this", "is"));
Console.WriteLine(str.ContainsAll("this", "fake"));
Output:
True
False
True
False
Edit:
For the record, LINQ is not necessary. You could just as easily write them this way:
public static class ExtensionMethods{
public static bool ContainsAny(this string s, params string[] tokens){
foreach(string token in tokens)
if(s.Contains(token)) return true;
return false;
}
public static bool ContainsAll(this string s, params string[] tokens){
foreach(string token in tokens)
if(!s.Contains(token)) return false;
return true;
}
}
var arr = new[]{"I'm a noob" ,"sry", "I'm a noob +sry"};
if(arr.Any(x => blabla.Contains(x)))
{
}
You can use a regex:
Regex r = new Regex("I'm a noob|sry|I'm a noob sry");
if(r.IsMatch(blabla)) {
//TODO: do something
}
Regular expressions have other advanced features like: a* matches with the empty string, a, aa, aaa,...
The funny part is that if you "compile" the regex (for instance using new Regex("I'm a noob|sry|I'm a noob sry",RegexOptions.Compiled), C# will turn it automatically into the fastest solution mechanism possible. For instance if blabla is a 100k chars string, you will only run once over the entire string. And for instance redundant parts like I'm a noob sry will be omitted automatically.

Enum and string match

I'm essentially trying to read an xml file. One of the values has a suffix, e.g. "30d". This is meant to mean '30 days'. So I'm trying to convert this to a DateTime.Now.AddDays(30). To read this field in the XML, i decided to use an Enum:
enum DurationType { Min = "m", Hours = "h", Days = "d" }
Now I'm not exactly sure how exactly to approach this efficiently (I'm a little daft when it comes to enums). Should I separate the suffix, in this case "d", out of the string first, then try and match it in the enum using a switch statement?
I guess if you dumb down my question, it'd be: What's the best way to get from 30d, to DateTime.Now.AddDays(30) ?
You could make an ExtensionMethod to parse the string and return the DateTime you want
Something like:
public static DateTime AddDuration(this DateTime datetime, string str)
{
int value = 0;
int mutiplier = str.EndsWith("d") ? 1440 : str.EndsWith("h") ? 60 : 1;
if (int.TryParse(str.TrimEnd(new char[]{'m','h','d'}), out value))
{
return datetime.AddMinutes(value * mutiplier);
}
return datetime;
}
Usage:
var date = DateTime.Now.AddDuration("2d");
This seems like a good place to use regular expressions; specifically, capture groups.
Below is a working example:
using System;
using System.Text.RegularExpressions;
namespace RegexCaptureGroups
{
class Program
{
// Below is a breakdown of this regular expression:
// First, one or more digits followed by "d" or "D" to represent days.
// Second, one or more digits followed by "h" or "H" to represent hours.
// Third, one or more digits followed by "m" or "M" to represent minutes.
// Each component can be separated by any number of spaces, or none.
private static readonly Regex DurationRegex = new Regex(#"((?<Days>\d+)d)?\s*((?<Hours>\d+)h)?\s*((?<Minutes>\d+)m)?", RegexOptions.IgnoreCase);
public static TimeSpan ParseDuration(string input)
{
var match = DurationRegex.Match(input);
var days = match.Groups["Days"].Value;
var hours = match.Groups["Hours"].Value;
var minutes = match.Groups["Minutes"].Value;
int daysAsInt32, hoursAsInt32, minutesAsInt32;
if (!int.TryParse(days, out daysAsInt32))
daysAsInt32 = 0;
if (!int.TryParse(hours, out hoursAsInt32))
hoursAsInt32 = 0;
if (!int.TryParse(minutes, out minutesAsInt32))
minutesAsInt32 = 0;
return new TimeSpan(daysAsInt32, hoursAsInt32, minutesAsInt32, 0);
}
static void Main(string[] args)
{
Console.WriteLine(ParseDuration("30d"));
Console.WriteLine(ParseDuration("12h"));
Console.WriteLine(ParseDuration("20m"));
Console.WriteLine(ParseDuration("1d 12h"));
Console.WriteLine(ParseDuration("5d 30m"));
Console.WriteLine(ParseDuration("1d 12h 20m"));
Console.WriteLine("Press any key to exit.");
Console.ReadKey();
}
}
}
EDIT: Below is an alternative, slightly more condensed version of the above, though I'm not sure which one I prefer more. I'm usually not a fan of overly dense code.
I adjusted the regular expression to put a limit of 10 digits on each number. This allows me to safely use the int.Parse function, because I know that the input consists of at least one digit and at most ten (unless it didn't capture at all, in which case it would be empty string: hence, the purpose of the ParseInt32ZeroIfNullOrEmpty method).
// Below is a breakdown of this regular expression:
// First, one to ten digits followed by "d" or "D" to represent days.
// Second, one to ten digits followed by "h" or "H" to represent hours.
// Third, one to ten digits followed by "m" or "M" to represent minutes.
// Each component can be separated by any number of spaces, or none.
private static readonly Regex DurationRegex = new Regex(#"((?<Days>\d{1,10})d)?\s*((?<Hours>\d{1,10})h)?\s*((?<Minutes>\d{1,10})m)?", RegexOptions.IgnoreCase);
private static int ParseInt32ZeroIfNullOrEmpty(string input)
{
return string.IsNullOrEmpty(input) ? 0 : int.Parse(input);
}
public static TimeSpan ParseDuration(string input)
{
var match = DurationRegex.Match(input);
return new TimeSpan(
ParseInt32ZeroIfNullOrEmpty(match.Groups["Days"].Value),
ParseInt32ZeroIfNullOrEmpty(match.Groups["Hours"].Value),
ParseInt32ZeroIfNullOrEmpty(match.Groups["Minutes"].Value),
0);
}
EDIT: Just to take this one more step, I've added another version below, which handles days, hours, minutes, seconds, and milliseconds, with a variety of abbreviations for each. I split the regular expression into multiple lines for readability. Note, I also had to adjust the expression by using (\b|(?=[^a-z])) at the end of each component: this is because the "ms" unit was being captured as the "m" unit. The special syntax of "?=" used with "[^a-z]" indicates to match the character but not to "consume" it.
// Below is a breakdown of this regular expression:
// First, one to ten digits followed by "d", "dy", "dys", "day", or "days".
// Second, one to ten digits followed by "h", "hr", "hrs", "hour", or "hours".
// Third, one to ten digits followed by "m", "min", "minute", or "minutes".
// Fourth, one to ten digits followed by "s", "sec", "second", or "seconds".
// Fifth, one to ten digits followed by "ms", "msec", "millisec", "millisecond", or "milliseconds".
// Each component may be separated by any number of spaces, or none.
// The expression is case-insensitive.
private static readonly Regex DurationRegex = new Regex(#"
((?<Days>\d{1,10})(d|dy|dys|day|days)(\b|(?=[^a-z])))?\s*
((?<Hours>\d{1,10})(h|hr|hrs|hour|hours)(\b|(?=[^a-z])))?\s*
((?<Minutes>\d{1,10})(m|min|minute|minutes)(\b|(?=[^a-z])))?\s*
((?<Seconds>\d{1,10})(s|sec|second|seconds)(\b|(?=[^a-z])))?\s*
((?<Milliseconds>\d{1,10})(ms|msec|millisec|millisecond|milliseconds)(\b|(?=[^a-z])))?",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
private static int ParseInt32ZeroIfNullOrEmpty(string input)
{
return string.IsNullOrEmpty(input) ? 0 : int.Parse(input);
}
public static TimeSpan ParseDuration(string input)
{
var match = DurationRegex.Match(input);
return new TimeSpan(
ParseInt32ZeroIfNullOrEmpty(match.Groups["Days"].Value),
ParseInt32ZeroIfNullOrEmpty(match.Groups["Hours"].Value),
ParseInt32ZeroIfNullOrEmpty(match.Groups["Minutes"].Value),
ParseInt32ZeroIfNullOrEmpty(match.Groups["Seconds"].Value),
ParseInt32ZeroIfNullOrEmpty(match.Groups["Milliseconds"].Value));
}
update:
Don't vote for this. I'm leaving it simply because it's an alternative approach. Instead look at sa_ddam213 and Dr. Wily's Apprentice's answers.
Should I separate the suffix, in this case "d", out of the string
first, then try and match it in the enum using a switch statement?
Yes.
For a fully working example:
private void button1_Click( object sender, EventArgs e ) {
String value = "30d";
Duration d = (Duration)Enum.Parse(typeof(Duration), value.Substring(value.Length - 1, 1).ToUpper());
DateTime result = d.From(new DateTime(), value);
MessageBox.Show(result.ToString());
}
enum Duration { D, W, M, Y };
static class DurationExtensions {
public static DateTime From( this Duration duration, DateTime dateTime, Int32 period ) {
switch (duration)
{
case Duration.D: return dateTime.AddDays(period);
case Duration.W: return dateTime.AddDays((period*7));
case Duration.M: return dateTime.AddMonths(period);
case Duration.Y: return dateTime.AddYears(period);
default: throw new ArgumentOutOfRangeException("duration");
}
}
public static DateTime From( this Duration duration, DateTime dateTime, String fullValue ) {
Int32 period = Convert.ToInt32(fullValue.ToUpper().Replace(duration.ToString(), String.Empty));
return From(duration, dateTime, period);
}
}
I really don't see how using an enum helps here.
Here's how I might approach it.
string s = "30d";
int typeIndex = s.IndexOfAny(new char[] { 'd', 'w', 'm' });
if (typeIndex > 0)
{
int value = int.Parse(s.Substring(0, typeIndex));
switch (s[typeIndex])
{
case 'd':
result = DateTime.Now.AddDays(value);
break;
case 'w':
result = DateTime.Now.AddDays(value * 7);
break;
case 'm':
result = DateTime.Now.AddMonths(value);
break;
}
}
Depending on the reliability of your input data, you might need to use int.TryParse() instead of int.Parse(). Otherwise, this should be all you need.
Note: I've also written a sscanf() replacement for .NET that would handle this quite easily. You can see the code for that in the article A sscanf() Replacement for .NET.
Try the following code, assuming that values like "30d" are in a string 'val'.
DateTime ConvertValue(string val) {
if (val.Length > 0) {
int prefix = Convert.ToInt32(val.Length.Remove(val.Length-1));
switch (val[val.Length-1]) {
case 'd': return DateTime.Now.AddDays(prefix);
case 'm': return DateTime.Now.AddMonths(prefix);
// etc.
}
throw new ArgumentException("string in unexpected format.");
}
Example of a console application example/tutorial:
enum DurationType
{
[DisplayName("m")]
Min = 1,
[DisplayName("h")]
Hours = 1 * 60,
[DisplayName("d")]
Days = 1 * 60 * 24
}
internal class Program
{
private static void Main(string[] args)
{
string input1 = "10h";
string input2 = "1d10h3m";
var x = GetOffsetFromDate(DateTime.Now, input1);
var y = GetOffsetFromDate(DateTime.Now, input2);
}
private static Dictionary<string, DurationType> suffixDictionary
{
get
{
return Enum
.GetValues(typeof (DurationType))
.Cast<DurationType>()
.ToDictionary(duration => duration.GetDisplayName(), duration => duration);
}
}
public static DateTime GetOffsetFromDate(DateTime date, string input)
{
MatchCollection matches = Regex.Matches(input, #"(\d+)([a-zA-Z]+)");
foreach (Match match in matches)
{
int numberPart = Int32.Parse(match.Groups[1].Value);
string suffix = match.Groups[2].Value;
date = date.AddMinutes((int)suffixDictionary[suffix]);
}
return date;
}
}
[AttributeUsage(AttributeTargets.Field)]
public class DisplayNameAttribute : Attribute
{
public DisplayNameAttribute(String name)
{
this.name = name;
}
protected String name;
public String Name { get { return this.name; } }
}
public static class ExtensionClass
{
public static string GetDisplayName<TValue>(this TValue value) where TValue : struct, IConvertible
{
FieldInfo fi = typeof(TValue).GetField(value.ToString());
DisplayNameAttribute attribute = (DisplayNameAttribute)fi.GetCustomAttributes(typeof(DisplayNameAttribute), false).FirstOrDefault();
if (attribute != null)
return attribute.Name;
return value.ToString();
}
}
Uses an attribute to define your suffix, uses the enum value to define your offset.
Requires:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text.RegularExpressions;
It may be considered a hack to use the enum integer value but this example will still let you parse out all the Enums (for any other use like switch case) with little tweaks.
Enums can't be backed with non-numeric types, so string-based enums are out. It's possible you may be overthinking it. Without knowing any more about the problem, the most straightforward solution seems to be splitting off the last character, converting the rest to an int, and then handling each final char as a separate case.
I'd suggest using regexp to strip the number first and than execute Enum.Parse Method to evaluate the value of the enum. Than you can use a switch (see Corylulu's answer) to get the right offset, based on the parsed number and enum value.

Converting string to double in C#

I have a long string with double-type values separated by # -value1#value2#value3# etc
I splitted it to string table. Then, I want to convert every single element from this table to double type and I get an error. What is wrong with type-conversion here?
string a = "52.8725945#18.69872650000002#50.9028073#14.971600200000012#51.260062#15.5859949000000662452.23862099999999#19.372202799999250800000045#51.7808372#19.474096499999973#";
string[] someArray = a.Split(new char[] { '#' });
for (int i = 0; i < someArray.Length; i++)
{
Console.WriteLine(someArray[i]); // correct value
Convert.ToDouble(someArray[i]); // error
}
There are 3 problems.
1) Incorrect decimal separator
Different cultures use different decimal separators (namely , and .).
If you replace . with , it should work as expected:
Console.WriteLine(Convert.ToDouble("52,8725945"));
You can parse your doubles using overloaded method which takes culture as a second parameter. In this case you can use InvariantCulture (What is the invariant culture) e.g. using double.Parse:
double.Parse("52.8725945", System.Globalization.CultureInfo.InvariantCulture);
You should also take a look at double.TryParse, you can use it with many options and it is especially useful to check wheter or not your string is a valid double.
2) You have an incorrect double
One of your values is incorrect, because it contains two dots:
15.5859949000000662452.23862099999999
3) Your array has an empty value at the end, which is an incorrect double
You can use overloaded Split which removes empty values:
string[] someArray = a.Split(new char[] { '#' }, StringSplitOptions.RemoveEmptyEntries);
Add a class as Public and use it very easily like convertToInt32()
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
/// <summary>
/// Summary description for Common
/// </summary>
public static class Common
{
public static double ConvertToDouble(string Value) {
if (Value == null) {
return 0;
}
else {
double OutVal;
double.TryParse(Value, out OutVal);
if (double.IsNaN(OutVal) || double.IsInfinity(OutVal)) {
return 0;
}
return OutVal;
}
}
}
Then Call The Function
double DirectExpense = Common.ConvertToDouble(dr["DrAmount"].ToString());
Most people already tried to answer your questions.
If you are still debugging, have you thought about using:
Double.TryParse(String, Double);
This will help you in determining what is wrong in each of the string first before you do the actual parsing.
If you have a culture-related problem, you might consider using:
Double.TryParse(String, NumberStyles, IFormatProvider, Double);
This http://msdn.microsoft.com/en-us/library/system.double.tryparse.aspx has a really good example on how to use them.
If you need a long, Int64.TryParse is also available: http://msdn.microsoft.com/en-us/library/system.int64.tryparse.aspx
Hope that helps.
private double ConvertToDouble(string s)
{
char systemSeparator = Thread.CurrentThread.CurrentCulture.NumberFormat.CurrencyDecimalSeparator[0];
double result = 0;
try
{
if (s != null)
if (!s.Contains(","))
result = double.Parse(s, CultureInfo.InvariantCulture);
else
result = Convert.ToDouble(s.Replace(".", systemSeparator.ToString()).Replace(",", systemSeparator.ToString()));
}
catch (Exception e)
{
try
{
result = Convert.ToDouble(s);
}
catch
{
try
{
result = Convert.ToDouble(s.Replace(",", ";").Replace(".", ",").Replace(";", "."));
}
catch {
throw new Exception("Wrong string-to-double format");
}
}
}
return result;
}
and successfully passed tests are:
Debug.Assert(ConvertToDouble("1.000.007") == 1000007.00);
Debug.Assert(ConvertToDouble("1.000.007,00") == 1000007.00);
Debug.Assert(ConvertToDouble("1.000,07") == 1000.07);
Debug.Assert(ConvertToDouble("1,000,007") == 1000007.00);
Debug.Assert(ConvertToDouble("1,000,000.07") == 1000000.07);
Debug.Assert(ConvertToDouble("1,007") == 1.007);
Debug.Assert(ConvertToDouble("1.07") == 1.07);
Debug.Assert(ConvertToDouble("1.007") == 1007.00);
Debug.Assert(ConvertToDouble("1.000.007E-08") == 0.07);
Debug.Assert(ConvertToDouble("1,000,007E-08") == 0.07);
In your string I see: 15.5859949000000662452.23862099999999 which is not a double (it has two decimal points). Perhaps it's just a legitimate input error?
You may also want to figure out if your last String will be empty, and account for that situation.

Parsing formatted string

I am trying to create a generic formatter/parser combination.
Example scenario:
I have a string for string.Format(), e.g. var format = "{0}-{1}"
I have an array of object (string) for the input, e.g. var arr = new[] { "asdf", "qwer" }
I am formatting the array using the format string, e.g. var res = string.Format(format, arr)
What I am trying to do is to revert back the formatted string back into the array of object (string). Something like (pseudo code):
var arr2 = string.Unformat(format, res)
// when: res = "asdf-qwer"
// arr2 should be equal to arr
Anyone have experience doing something like this? I'm thinking about using regular expressions (modify the original format string, and then pass it to Regex.Matches to get the array) and run it for each placeholder in the format string. Is this feasible or is there any other more efficient solution?
While the comments about lost information are valid, sometimes you just want to get the string values of of a string with known formatting.
One method is this blog post written by a friend of mine. He implemented an extension method called string[] ParseExact(), akin to DateTime.ParseExact(). Data is returned as an array of strings, but if you can live with that, it is terribly handy.
public static class StringExtensions
{
public static string[] ParseExact(
this string data,
string format)
{
return ParseExact(data, format, false);
}
public static string[] ParseExact(
this string data,
string format,
bool ignoreCase)
{
string[] values;
if (TryParseExact(data, format, out values, ignoreCase))
return values;
else
throw new ArgumentException("Format not compatible with value.");
}
public static bool TryExtract(
this string data,
string format,
out string[] values)
{
return TryParseExact(data, format, out values, false);
}
public static bool TryParseExact(
this string data,
string format,
out string[] values,
bool ignoreCase)
{
int tokenCount = 0;
format = Regex.Escape(format).Replace("\\{", "{");
for (tokenCount = 0; ; tokenCount++)
{
string token = string.Format("{{{0}}}", tokenCount);
if (!format.Contains(token)) break;
format = format.Replace(token,
string.Format("(?'group{0}'.*)", tokenCount));
}
RegexOptions options =
ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
Match match = new Regex(format, options).Match(data);
if (tokenCount != (match.Groups.Count - 1))
{
values = new string[] { };
return false;
}
else
{
values = new string[tokenCount];
for (int index = 0; index < tokenCount; index++)
values[index] =
match.Groups[string.Format("group{0}", index)].Value;
return true;
}
}
}
You can't unformat because information is lost. String.Format is a "destructive" algorithm, which means you can't (always) go back.
Create a new class inheriting from string, where you add a member that keeps track of the "{0}-{1}" and the { "asdf", "qwer" }, override ToString(), and modify a little your code.
If it becomes too tricky, just create the same class, but not inheriting from string and modify a little more your code.
IMO, that's the best way to do this.
It's simply not possible in the generic case. Some information will be "lost" (string boundaries) in the Format method. Assume:
String.Format("{0}-{1}", "hello-world", "stack-overflow");
How would you "Unformat" it?
Assuming "-" is not in the original strings, can you not just use Split?
var arr2 = formattedString.Split('-');
Note that this only applies to the presented example with an assumption. Any reverse algorithm is dependent on the kind of formatting employed; an inverse operation may not even be possible, as noted by the other answers.
A simple solution might be to
replace all format tokens with (.*)
escape all other special charaters in format
make the regex match non-greedy
This would resolve the ambiguities to the shortest possible match.
(I'm not good at RegEx, so please correct me, folks :))
After formatting, you can put the resulting string and the array of objects into a dictionary with the string as key:
Dictionary<string,string []> unFormatLookup = new Dictionary<string,string []>
...
var arr = new string [] {"asdf", "qwer" };
var res = string.Format(format, arr);
unFormatLookup.Add(res,arr);
and in Unformat method, you can simply pass a string and look up that string and return the array used:
string [] Unformat(string res)
{
string [] arr;
unFormatLoopup.TryGetValue(res,out arr); //you can also check the return value of TryGetValue and throw an exception if the input string is not in.
return arr;
}

Categories