In my ASP.NET project I need to validate some basic data types for user input, such as numeric, decimal, and datetime values.
What is the best approach in terms of performance: Regex.IsMatch() or TryParse()?
Thanks in advance.
TryParse and Regex.IsMatch are used for two fundamentally different things. Regex.IsMatch tells you if the string in question matches some particular pattern. It returns a yes/no answer. TryParse actually converts the value if possible, and tells you whether it succeeded.
Unless you're very careful in crafting the regular expression, Regex.IsMatch can return true when TryParse will return false. For example, consider the simple case of parsing a byte. With TryParse you have:
byte b;
bool isGood = byte.TryParse(myString, out b);
If the value in myString is between 0 and 255, TryParse will return true.
Now, let's try with Regex.IsMatch. Let's see, what should that regular expression be? We can't just say @"\d+" or even @"\d{1,3}". Specifying the format becomes a very difficult job. You have to handle leading 0s, leading and trailing white space, and allow 255 but not 256.
And that's just for parsing a 3-digit number. The rules get even more complicated when you're parsing an int or long.
Regular expressions are great for determining form. They suck when it comes to determining value. Since our standard data types all have limits, checking the value is part of figuring out whether or not the number is valid.
You're better off using TryParse whenever possible, if only to save yourself the headache of trying to come up with a reliable regular expression that will do the validation. It's likely (I'd say almost certain) that a particular TryParse for any of the native types will execute faster than the equivalent regular expression.
The above said, I've probably spent more time on this answer than your Web page will spend executing your TryParse or Regex.IsMatch--total throughout its entire life. The time to execute these things is so small in the context of everything else your Web site is doing, any time you spend pondering the problem is wasted.
Use TryParse if you can, because it's easier. Otherwise use Regex.
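To make the form-versus-value point concrete, here is a minimal sketch comparing a naive pattern with byte.TryParse. The class and method names are made up for illustration, and the invariant culture is my assumption; it is not a claim about how your page should be structured.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper names, for illustration only.
public static class ByteValidation
{
    // Checks form only: one to three ASCII digits and nothing else.
    public static bool LooksLikeByte(string s)
    {
        return Regex.IsMatch(s, @"^[0-9]{1,3}$");
    }

    // Checks form and value: succeeds only when the text parses to 0..255.
    // NumberStyles.Integer permits leading/trailing white space and a leading sign.
    public static bool IsByte(string s)
    {
        byte b;
        return byte.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out b);
    }
}
```

With these definitions, "256" passes the pattern but fails TryParse, while " 42 " fails the pattern but parses fine, which is exactly the mismatch described above.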
As others would say, the best way to answer that is to measure it ;)
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // One timing sample (in milliseconds) per iteration for each case.
        List<double> meansFailedTryParse = new List<double>();
        List<double> meansFailedRegEx = new List<double>();
        List<double> meansSuccessTryParse = new List<double>();
        List<double> meansSuccessRegEx = new List<double>();
        for (int i = 0; i < 1000; i++)
        {
            // Input that is not a valid integer.
            string input = "123abc";
            int res;
            bool res2;
            var sw = Stopwatch.StartNew();
            res2 = Int32.TryParse(input, out res);
            sw.Stop();
            meansFailedTryParse.Add(sw.Elapsed.TotalMilliseconds);
            //Console.WriteLine("Result of " + res2 + " try parse :" + sw.Elapsed.TotalMilliseconds);
            sw = Stopwatch.StartNew();
            res2 = Regex.IsMatch(input, @"^[0-9]*$");
            sw.Stop();
            meansFailedRegEx.Add(sw.Elapsed.TotalMilliseconds);
            //Console.WriteLine("Result of " + res2 + " Regex.IsMatch :" + sw.Elapsed.TotalMilliseconds);
            // Input that is a valid integer.
            input = "123";
            sw = Stopwatch.StartNew();
            res2 = Int32.TryParse(input, out res);
            sw.Stop();
            meansSuccessTryParse.Add(sw.Elapsed.TotalMilliseconds);
            //Console.WriteLine("Result of " + res2 + " try parse :" + sw.Elapsed.TotalMilliseconds);
            sw = Stopwatch.StartNew();
            res2 = Regex.IsMatch(input, @"^[0-9]*$");
            sw.Stop();
            meansSuccessRegEx.Add(sw.Elapsed.TotalMilliseconds);
            //Console.WriteLine("Result of " + res2 + " Regex.IsMatch :" + sw.Elapsed.TotalMilliseconds);
        }
        Console.WriteLine("Failed TryParse mean execution time " + meansFailedTryParse.Average());
        Console.WriteLine("Failed Regex mean execution time " + meansFailedRegEx.Average());
        Console.WriteLine("successful TryParse mean execution time " + meansSuccessTryParse.Average());
        Console.WriteLine("successful Regex mean execution time " + meansSuccessRegEx.Average());
    }
}
Don't try to make regexes do everything.
Sometimes a simple regex will get you 90% of the way and to make it do everything you need the complexity grows ten times or more.
Then I often find that the simplest solution is to use the regex to check the form and then rely on good old code for the value checking.
Take a date, for example: use a regex to check for a match on a date format, and then use capturing groups to check the values of the individual parts.
I'd guess TryParse is quicker, but more importantly, it's more expressive.
The regular expressions can get pretty ugly when you consider all the valid values for each data type you're using. For example, with DateTime you have to ensure the month is between 1 and 12, and that the day is within the valid range for that particular month.
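As a rough sketch of the DateTime case, assuming the input is expected in yyyy-MM-dd form (the format string and the helper name are my assumptions, not something from the question), DateTime.TryParseExact already knows how many days each month has:

```csharp
using System;
using System.Globalization;

// Hypothetical helper name, for illustration only.
public static class DateValidation
{
    // Accepts only the yyyy-MM-dd form and rejects dates that cannot exist,
    // such as month 13 or 30 February.
    public static bool IsIsoDate(string s)
    {
        DateTime parsed;
        return DateTime.TryParseExact(
            s,
            "yyyy-MM-dd",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);
    }
}
```

A pattern like \d{4}-\d{2}-\d{2} would accept "2023-02-30"; the helper above does not, and you get the month and leap-year rules for free.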
Solved, thanks for the help!
So I got an assignment for school, and no matter how much I search the net or read my books I can't figure out the answer to the question.
I have been programming for about 4 hours, so that's why the question is phrased weirdly, I think.
Console.WriteLine("Enter a number with any number of decimals.");
string input;
input = Console.ReadLine();
decimal myNumber = decimal.Parse(input);
Console.WriteLine("Please specify how many decimals you want to be shown.");
string input2;
input2 = Console.ReadLine();
int myDecimal = int.Parse(input2);
Console.WriteLine(("Your number with the choosen number of decimals: {0:f3}"), myNumber);
So, when I run it and enter 2,1234567 as my number and 5 as my number of decimals, it prints 2,123 instead of 2,12345.
I know it prints 3 decimals because of the 3 after the f, but I can't figure out how to change the 3 into the amount chosen by the user.
I have tried {0:f(myDecimal)}, {myDecimal:f} and {0:f(int = myDecimal)}, none of which I expected to work as I was just testing things out.
The answer is probably really simple, and I'm probably just overthinking things, but help would be very much appreciated!
You need a format-ception here:
// {{ and }} are escapes for literal { and }, so with myDecimal = 5 this yields "{0:f5}"
var numberFormat = string.Format("{{0:f{0}}}", myDecimal);
Console.WriteLine("Your number with the choosen number of decimals: " + numberFormat, myNumber);
You can use ToString too
decimal myNumber = 224323.545656M;
int myDecimal = 4;
Console.WriteLine(String.Format("Your number with the choosen number of decimals: {0} " , myNumber.ToString("F" + myDecimal)));
You could just simply change your last Console.WriteLine() call to this:
Console.WriteLine("Your number with the choosen number of decimals: {0}",
myNumber.ToString("f" + input2));
The part that changes is: myNumber.ToString("f" + input2). What this does is use string concatenation to build your format string from your variable input2.
I made a fiddle here.
Please keep in mind, though, that the format string you are using ("F") will round to that number of decimal places (e.g. 1.236 would become 1.24 with format "F2").
You need to build the format string dynamically. At this point using a substituted string is harder than ToString:
var digits = 3;
var input = 2.123456789;
var format = $"F{digits}";
var output = $"Some text {input.ToString(format)} some more text";
I am using Double.TryParse() to find out whether a given string is a number. I do not know how TryParse works, but when I give it an input like 54.34.23 it returns true. I am working on an MVC5 application in Visual Studio Express 2013.
So is 54.34.23 really a number, or do I have to do something else with TryParse for it to return false for the above input?
Adding a bit more detail 1.2.3.4 also returns true.
if (!double.TryParse(setValue.Value, out val))
{
ModelState.AddModelError("Value", "Value can only be a number");
return View(setValue);
}
Have you considered actually testing TryParse() in isolation to be sure what result it gives you? I just tested it here with the following code in Main()...
double x;
Console.WriteLine("Parse: {0}", double.TryParse("54.34.23", out x).ToString());
Console.WriteLine("Value: {0}", x);
...which gives a result of...
Parse: False
Value: 0
Since I'm getting a different result to you, it seems we have (as Ewan pointed out in the comments) a localisation issue and to fix this you need to specify which rules you would like TryParse() to use via the localised TryParse() method (documented at msdn.microsoft.com.)
This takes 4 parameters and allows you to specify how the parser works in regard to what facets are valid (negative numbers, decimal points, exponents etc.)
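A minimal sketch of that four-parameter overload, assuming you want "." to always mean the decimal separator regardless of the server's culture (the helper name and the choice of NumberStyles.Float with the invariant culture are my assumptions; pick whatever rules match your input):

```csharp
using System.Globalization;

// Hypothetical helper name, for illustration only.
public static class NumberValidation
{
    // Parses with the invariant culture, so "." is the decimal separator.
    // NumberStyles.Float does not include AllowThousands, so group
    // separators are rejected rather than silently skipped.
    public static bool IsInvariantDouble(string s, out double value)
    {
        return double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

Called this way, "54.34.23" is rejected on any machine, while "54.34" and "1e3" are still accepted.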
If your thread culture uses "." as the thousands separator, "54.34.23" will parse to 543423.
[TestMethod]
public void TestMethod1()
{
string n = "54.34.23";
double d1;
double d2;
Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE");
bool test = double.TryParse(n, out d1);
Console.WriteLine("test : " + test);
Console.WriteLine("d1 : " + d1);
}
However, I'm not sure that this is your problem. I suggest you write a unit test for your code; you may find something else is happening.
I'm curious why I would use string formatting when I can use concatenation, such as:
Console.WriteLine("Hello {0} !", name);
Console.WriteLine("Hello "+ name + " !");
Why prefer the first one over the second?
You picked too simple of an example.
String formatting:
allows you to use the same variable multiple times: ("{0} + {0} = {1}", x, 2*x)
automatically calls ToString on its arguments: ("{0}: {1}", someKeyObj, someValueObj)
allows you to specify formatting: ("The value will be {0:N3} (or {1:P}) on {2:MMMM yyyy gg}", x, y, theDate)
allows you to set padding easily: (">{0,3}<", "hi"); // "> hi<" (use {0,-3} to pad on the right instead)
You can trade the string for a dynamic string later.
For example:
// In a land far, far away
string nameFormat = "{0} {1}";
// In our function
string firstName = "John";
string lastName = "Doe";
Console.WriteLine(nameFormat, firstName, lastName);
Here, you can change nameFormat to e.g. "{1}, {0}" and you don't need to change any of your other code. With concatenation, you would need to edit your code or possibly duplicate your code to handle both cases.
This is useful in localization/internationalization.
There isn't a singular correct answer to this question. There are a few issues you want to address:
Performance
The performance differences in your examples (and in real apps) are minimal. If you start writing MANY concatenations, you will gradually see better memory performance with the formatted string. Refer to Ben's answer
Readability
You will be better off with a formatted string when you have formatting, or have many different variables to stringify:
string formatString = "Hello {0}, today is {1:yyyy-MM-dd}";
Console.WriteLine(formatString, userName, DateTime.Today);
Extensibility
Your situation will determine what's best. You tell me which is better when you need to add an item between Username and Time in the log:
Console.WriteLine(
@"Error!
Username: " + userName + @"
Time: " + time.ToString("HH:mm:ss") + @"
Function: " + functionName + @"
Action: " + actionName + @"
status: " + status + @"
---");
or
Console.WriteLine(@"Error!
Username: {0}
Time: {1}
Function: {2}
Action: {3}
status: {4}
---",
userName, time.ToString("HH:mm:ss"), functionName, actionName, status);
Conclusion
I would choose the formatted string most of the time... But I wouldn't hesitate at all to use concatenation when it was easier.
I think the main thing here is readability, so I always choose whatever reads best in each case.
Note:
With string interpolation of C# 6 your code could be simplified to this:
Console.WriteLine($"Hello {name}!");
Which I think is better than your two suggested options.
String formatting allows you to keep the format string separate, and use it where it's needed properly without having to worry about concatenation.
string greeting = "Hello {0}!";
Console.WriteLine(greeting, name);
As for why you would use it in the exact example you gave... force of habit, really.
I think a good example is about i18n and l10n
If you have to change a string between different languages, this: "bla "+variable+"bla bla.."
will give problems to a program that creates substitutions for your strings in a different language,
while this: "bla {0} blabla" is easily convertible (you will get {0} as part of the string).
Formatting is usually preferred for most of the reasons explained by other members here. There are couple more reasons I want to throw in from my short programming experience:
Formatting will help in generating Culture aware strings.
Formatting can be more performant than repeated concatenation, for example in a loop. Remember, each separate concatenation operation creates a temporary intermediate string. If you have a bunch of strings you need to concatenate, you are better off using String.Join or StringBuilder.
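For what it's worth, here is a small sketch of the two alternatives mentioned in the last point. The class and method names are invented for the example, and ", " is just an arbitrary separator:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper names, for illustration only.
public static class Joining
{
    // Lets the framework size and build the result in one go.
    public static string WithJoin(IEnumerable<string> parts)
    {
        return string.Join(", ", parts);
    }

    // Appends into one growing buffer instead of creating a new
    // string for every intermediate result.
    public static string WithBuilder(IEnumerable<string> parts)
    {
        var sb = new StringBuilder();
        bool first = true;
        foreach (string part in parts)
        {
            if (!first)
            {
                sb.Append(", ");
            }
            sb.Append(part);
            first = false;
        }
        return sb.ToString();
    }
}
```

Both return the same text for the same input; String.Join is the shorter choice when all you need is a separator, and StringBuilder is handy when the pieces are produced by more involved logic.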
You are using a trivial example where there is not much of a difference. However, if you have a long string with many parameters it is much nicer to be able to use a format string instead of many, many + signs and line breaks.
It also allows you to format numerical data as you wish, i.e., currency, dates, etc.
The first format is recommended. It allows you to specify specific formats for other types, like displaying a hex value, or displaying some specific string format.
e.g.
string displayInHex = String.Format("{0,10:X}", value); // to display in hex
It is also more consistent. You can use the same convention to display your Debug statement.
e.g.
Debug.WriteLine (String.Format("{0,10:X}", value));
Last but not least, it helps in the localisation of your program.
In addition to reasons like Ignacio's, many (including me) find String.Format-based code much easier to read and alter.
string x = String.Format(
"This file was last modified on {0} at {1} by {2}, and was last backed up {3}.",
date, time, user, backupDate);
vs.
string x = "This file was last modified on " + date + " at "
+ time + " by " + user + " and was last backed up " + backupDate + ".";
I have found the former approach (using string.Format) very useful when overriding ToString() methods in Entity classes.
For example, in my Product class;
public override string ToString()
{
return string.Format("{0} : {1} ({2} / {3})",
this.id,
this.description,
this.brand,
this.model);
}
Then, when users decide they want the product to appear differently it's easy to change the order/contents or layout of the string that is returned.
Of course you still concatenate this string together, but I feel string.Format makes the whole thing a bit more readable and maintainable.
I guess that's the short answer I'm giving then isn't it - readability and maintainability for lengthy or complex strings.
Since the release of C# 6 a few years ago, it is also possible to use string interpolation. For example, this:
var n = 12.5847;
System.Console.WriteLine(string.Format("Hello World! {0:C2}", n));
Becomes that:
var n = 12.5847;
System.Console.WriteLine($"Hello World! {n:C2}");
And both give you this result (the currency symbol depends on the current culture):
Hello World! £12.58
Is there a way to tell the String.Format() function (without writing my own function) how many placeholders there are dynamically? It would be great to say 15, and know I'd have {0}-{14} generated for me. I'm generating text files and I often have more than 25 columns. It would greatly help.
OK,
I will rephrase my question. I wanted to know if it is at all possible to tell the String.Format function at execution time how many place-holders I want in my format string without typing them all out by hand.
I'm guessing by the responses so far, I will just go ahead and write my own method.
Thanks!
You could use Enumerable.Range and LINQ to generate your message string.
string.Concat(Enumerable.Range(0, 7).Select(i => "{" + i + "}"))
generates the following string:
"{0}{1}{2}{3}{4}{5}{6}"
Adding a bit to AlbertEin's response, I don't believe String.Format can do this for you out-of-the-box. You'll need to dynamically create the format string prior to using the String.Format method, as in:
var builder = new StringBuilder();
for(var i = 0; i < n; ++i)
{
builder.AppendFormat("{0}", "{" + i + "}");
}
String.Format(builder.ToString(), ...);
This isn't exactly readable, though.
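If the placeholders need a column separator between them, which seems likely for the text files in the question, one possible sketch is a tiny helper around string.Join. The name BuildFormat and the separator parameter are my own invention, not part of the framework:

```csharp
using System.Linq;

// Hypothetical helper name, for illustration only.
public static class FormatBuilder
{
    // Builds "{0}<sep>{1}<sep>...{count-1}" for use with String.Format.
    public static string BuildFormat(int count, string separator)
    {
        return string.Join(
            separator,
            Enumerable.Range(0, count).Select(i => "{" + i + "}"));
    }
}
```

Usage would look like String.Format(FormatBuilder.BuildFormat(3, "\t"), a, b, c). If there is no per-column formatting at all, string.Join on the values themselves does the same job without a format string.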
Why use string.Format when there is no formatting (at least from what I can see in your question)? You could use simple concatenation with a StringBuilder instead.
There is a way to do this directly:
Just create a custom IFormatProvider, then use this overload of string.Format.
For example, if you want to always have 12 decimal places, you can do:
CultureInfo culture = (CultureInfo)Thread.CurrentThread.CurrentCulture.Clone(); // Copy your current culture (Clone returns object)
NumberFormatInfo nfi = culture.NumberFormat;
nfi.NumberDecimalDigits = 12; // Default precision for the "F" and "N" specifiers
string newResult = string.Format(culture, "{0:F}", myDouble); // "F" without a precision uses NumberDecimalDigits, so 12 decimal places
Before using String.Format to format a string in C#, I would like to know how many parameters does that string accept?
For example, if the string was "{0} is not the same as {1}", I would like to know that this string accepts two parameters.
Or, if the string was "{0} is not the same as {1} and {2}", the string accepts 3 parameters.
How can I find this efficiently?
String.Format receives a format string and a params object[] array, which can hold an arbitrarily large number of items.
For every object value, its .ToString() method will be called to resolve the corresponding format item.
EDIT: Seems I misread your question. If you want to know how many arguments your format requires, you can discover that by using a regular expression:
string pattern = "{0} {1:00} {{2}}, Failure: {0}{{{1}}}, Failure: {0} ({0})";
int count = Regex.Matches(Regex.Replace(pattern,
@"(\{{2}|\}{2})", ""), // removes escaped curly brackets
@"\{\d+(?:\:?[^}]*)\}").Count; // returns 6
As Benjamin noted in the comments, maybe you need to know the number of different references instead. If you don't mind using LINQ, here you go:
int count = Regex.Matches(Regex.Replace(pattern,
@"(\{{2}|\}{2})", ""), // removes escaped curly brackets
@"\{(\d+)(?:\:?[^}]*)\}").OfType<Match>()
.SelectMany(match => match.Groups.OfType<Group>().Skip(1))
.Select(index => Int32.Parse(index.Value))
.Max() + 1; // returns 2
This also addresses the last problem @280Z28 spotted.
Edit by 280Z28: This will not validate the input, but for any valid input will give the correct answer:
int count2 =
Regex.Matches(
pattern.Replace("{{", string.Empty),
@"\{(\d+)")
.OfType<Match>()
.Select(match => int.Parse(match.Groups[1].Value))
.Union(Enumerable.Repeat(-1, 1))
.Max() + 1;
You'll have to parse through the string and find the highest integer value between the {}'s...then add one.
...or count the number of sets of {}'s.
Either way, it's ugly. I'd be interested to know why you need to be able to figure out this number programmatically.
EDIT
As 280Z28 mentioned, you'll have to account for the various idiosyncrasies of what can be included between the {}'s (multiple {}'s, formatting strings, etc.).
I rely on ReSharper to analyze that for me, and it is a pity that Visual Studio does not ship with such a neat feature.