IsNumeric Helper Method for large numbers


I'm trying to create a simple helper function that determines whether a value is truly numeric. It should handle null and negative numbers, and I want to do this without the help of VB's IsNumeric. Having just learned LINQ, I thought it would be a perfect fit.
The other thing I'd like is to be able to pass a string, integer, long, or any other type, so I was thinking an 'object' parameter is what I really want. Sure, I could always convert the value to a string before calling the helper method, but is it possible to accept any type directly?
Here's the code I have so far; all I need is to change the parameter type. I can't imagine it isn't possible. Any ideas?
private static bool IsNumeric(string input)
{
if (input == null) throw new ArgumentNullException("input");
if (string.IsNullOrEmpty(input)) return false;
int periodCount = 0; //accept a string w/ 1dec to support values w/ a float
return input.Trim()
.ToCharArray()
.Where(c =>
{
if (c == '.') periodCount++;
return Char.IsDigit(c) && periodCount <= 1;
})
.Count() == input.Trim().Length;
}

Maybe?
private static bool IsNumeric<T>(T input)
{
double d;
return double.TryParse(input.ToString(), NumberStyles.Any, CultureInfo.InvariantCulture, out d);
}
bool b1 = IsNumeric(1); //<-- true
bool b2 = IsNumeric(1.0); //<-- true
bool b3 = IsNumeric("a"); //<-- false
bool b4 = IsNumeric("3E+10"); //<-- true
bool b5 = IsNumeric("1,234,567.0"); //<-- true

There are several things to look at here. First, your code won't work with anything containing a decimal point, because the '.' character itself fails the Char.IsDigit test.
return Char.IsDigit(c) && periodCount <= 1; should be return (Char.IsDigit(c) || c == '.') && periodCount <= 1;
Secondly, accepting any type is entirely possible. Taking an object parameter makes your code accept anything, as you wanted:
private static bool IsNumeric(object input)
{
if (input == null) throw new ArgumentNullException("input");
string inputStr = input.ToString();
if (string.IsNullOrEmpty(inputStr)) return false;
int periodCount = 0; //accept a string w/ 1dec to support values w/ a float
return inputStr.Trim()
.ToCharArray()
.Where(c =>
{
if (c == '.') periodCount++;
return (Char.IsDigit(c) || c == '.') && periodCount <= 1;
})
.Count() == inputStr.Trim().Length;
}
However, it's very complicated, and it still rejects negative numbers. A much simpler way to do it would be:
private static bool IsNumeric(object input)
{
if (input == null) throw new ArgumentNullException("input");
double test;
return double.TryParse(input.ToString(), out test);
}

It depends on how big your numbers must be. Try these options:
return double.TryParse(input, out result); // Largest range (about ±1.8e308), but only 15-17 significant digits.
return decimal.TryParse(input, out result); // 28-29 significant digits, but a smaller range than double (about ±7.9e28), and slower.
return BigInteger.TryParse(input, out result); // Arbitrarily large integers, but slower and does not support decimals.
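One way the three options could be combined is sketched below. This is only a sketch under assumptions: the class name NumericCheck, the choice to return false for null, and the use of the invariant culture are mine, not part of the answers above. It tries BigInteger first (arbitrarily large whole numbers), then decimal, then double (for exponent notation).

```csharp
using System;
using System.Globalization;
using System.Numerics;

public static class NumericCheck
{
    // Hypothetical helper: returns true when the value's invariant-culture
    // text form parses as a whole number, a decimal, or a finite double.
    public static bool IsNumeric(object input)
    {
        if (input == null) return false; // assumption: null is simply "not numeric"

        string s = Convert.ToString(input, CultureInfo.InvariantCulture);
        CultureInfo inv = CultureInfo.InvariantCulture;

        // Arbitrarily large integers, e.g. 40-digit values.
        if (BigInteger.TryParse(s, NumberStyles.Integer, inv, out _)) return true;

        // Fractions and thousands separators, e.g. "1,234,567.0".
        if (decimal.TryParse(s, NumberStyles.Number, inv, out _)) return true;

        // Exponent notation, e.g. "3E+10". NaN and infinities are rejected.
        double d;
        return double.TryParse(s, NumberStyles.Float | NumberStyles.AllowThousands, inv, out d)
            && !double.IsNaN(d) && !double.IsInfinity(d);
    }
}
```

Usage would look like NumericCheck.IsNumeric("-12.5") or NumericCheck.IsNumeric(42L). Whether inputs such as "NaN" should count as numeric is a design choice; this sketch says no.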

Related

Make class to encapsulate isLong/isDouble Function

I'm trying to make a class that contains 4 functions: isLong, isDouble, stringToLong, and stringToDouble. I am trying to do this without using a TryParse function. Ideally each function would receive a string and return the appropriate type (bool, bool, long, and double, respectively).
For instance, if I enter the number 100000, isLong returns true.
Below is an example of how I did isLong, but I am having difficulty doing the same for isDouble (which has to accept decimals) and for both stringToLong and stringToDouble.
public static bool isLong(string s)
{
bool ret = true;
int i;
s = s.Trim();
if (s[0] == '-')
{
i = 1;
}
else
{
i = 0;
}
for (; (i < s.Length); i = i + 1)
{
ret = ret && ((s[i] >= '0') && (s[i] <= '9'));
}
return (ret);
}

You could use the MinValue and MaxValue constants to check numeric types. For instance, you could define a method like this:
public bool IsLong(decimal value)
{
return value >= long.MinValue && value <= long.MaxValue && value == (long)value;
}
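For the isDouble part of the question, a hand-written scan in the same spirit as isLong might look like the following. This is a minimal sketch, not a full replacement for TryParse: the class name NumberText is hypothetical, and it assumes '.' is the only decimal separator and ignores exponent notation and range limits.

```csharp
using System;

public static class NumberText
{
    // Accepts an optional sign, digits, and at most one '.'.
    // At least one digit is required, so "", "-" and "." are rejected.
    public static bool IsDouble(string s)
    {
        if (s == null) return false;
        s = s.Trim();

        int i = 0;
        if (i < s.Length && (s[i] == '-' || s[i] == '+')) i++;

        int digits = 0;
        bool seenPoint = false;
        for (; i < s.Length; i++)
        {
            char c = s[i];
            if (c >= '0' && c <= '9') digits++;
            else if (c == '.' && !seenPoint) seenPoint = true;
            else return false; // second '.', letter, or any other character
        }
        return digits > 0;
    }
}
```

The digit counter is what guards against the empty-string and lone-sign cases, which the isLong loop in the question does not handle.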

How to know if a string length contains a specified amount of capped letters

I'm trying to find out whether a string has a length between 5 and 10 and, at the same time, 7-10 of its letters are in upper case. The idea is to detect whether a message sent by a user is 70%-100% capped.
This is what I have tried so far:
bool IsMessageUpper(string input)
{
if (input.Length.Equals(5 <= 10) && (input.Take(7).All(c => char.IsLetter(c) && char.IsUpper(c))))
{
return true;
}
else
{
return false;
}
}

You can rewrite your method in this way:
bool IsMessageUpper(string input)
{
int x = input.Length;
return x>=7 && x<= 10 && input.Count(char.IsUpper) >= 7;
}
You can also add a safety check to handle undesired inputs such as null (the length test then fails first, so input.Count is never called on null):
bool IsMessageUpper(string input)
{
int x = (input ?? "").Length;
return x>=7 && x<= 10 && input.Count(char.IsUpper) >= 7;
}
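Since the stated goal is "70%-100% capped", a ratio-based check may fit better than fixed counts. The following is a sketch under assumptions: the name IsMostlyUpper and the default threshold of 0.7 are mine, and the ratio is taken over letters only, so digits and spaces do not dilute it.

```csharp
using System;
using System.Linq;

public static class CapsCheck
{
    // Returns true when at least `threshold` of the letters are upper case.
    // Strings without any letters are treated as "not capped".
    public static bool IsMostlyUpper(string input, double threshold = 0.7)
    {
        if (string.IsNullOrEmpty(input)) return false;

        int letters = input.Count(char.IsLetter);
        if (letters == 0) return false;

        int upper = input.Count(char.IsUpper);
        return (double)upper / letters >= threshold;
    }
}
```

A length guard like the x >= 7 && x <= 10 test above could be added in front if only short messages should be considered.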

String == operator: how did Microsoft write it?

I want to know how Microsoft wrote the algorithm for string comparison (string.Equals and string.Compare).
Do they compare character by character like this:
int matched = 1;
for (int i = 0; i < str1.Length; i++)
{
if (str1[i] == str2[i])
{
matched++;
}
else
{
break;
}
}
if (matched == str1.Length) return true;
Or do they match everything at once:
if (str1[0] == str2[0] && str1[1] == str2[1] && str1[2] == str2[2]) return true;
I tried pressing F12 on the string.Equals function, but it took me to the function declaration, not the actual code. Thanks.
After Thilo suggested looking at the source, I was able to find this. This is how Microsoft wrote it:
public static bool Equals(String a, String b) {
if ((Object)a==(Object)b) {
return true;
}
if ((Object)a==null || (Object)b==null) {
return false;
}
if (a.Length != b.Length)
return false;
return EqualsHelper(a, b);
}
But this raises the question of whether it is faster to check character by character or to do a complete match.

Looking at the source (copied below), the steps are:
- null check
- reference identity
- different length => not equal
- go over the binary encoding of the characters in a bit of an unrolled loop
"this raises the question of whether it is faster to check character by character or to do a complete match"
I don't understand the question. You cannot do a "complete match" without checking each of the characters. What you can do is bail out as soon as you find a mismatch. That reduces runtime a bit, but does not change the fact that it is O(n).
// Determines whether two strings match.
[Pure]
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public bool Equals(String value) {
if (this == null) //this is necessary to guard against reverse-pinvokes and
throw new NullReferenceException(); //other callers who do not use the callvirt instruction
if (value == null)
return false;
if (Object.ReferenceEquals(this, value))
return true;
if (this.Length != value.Length)
return false;
return EqualsHelper(this, value);
}
[System.Security.SecuritySafeCritical] // auto-generated
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private unsafe static bool EqualsHelper(String strA, String strB)
{
Contract.Requires(strA != null);
Contract.Requires(strB != null);
Contract.Requires(strA.Length == strB.Length);
int length = strA.Length;
fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
{
char* a = ap;
char* b = bp;
// unroll the loop
#if AMD64
// for AMD64 bit platform we unroll by 12 and
// check 3 qword at a time. This is less code
// than the 32 bit case and is shorter
// pathlength
while (length >= 12)
{
if (*(long*)a != *(long*)b) return false;
if (*(long*)(a+4) != *(long*)(b+4)) return false;
if (*(long*)(a+8) != *(long*)(b+8)) return false;
a += 12; b += 12; length -= 12;
}
#else
while (length >= 10)
{
if (*(int*)a != *(int*)b) return false;
if (*(int*)(a+2) != *(int*)(b+2)) return false;
if (*(int*)(a+4) != *(int*)(b+4)) return false;
if (*(int*)(a+6) != *(int*)(b+6)) return false;
if (*(int*)(a+8) != *(int*)(b+8)) return false;
a += 10; b += 10; length -= 10;
}
#endif
// This depends on the fact that the String objects are
// always zero terminated and that the terminating zero is not included
// in the length. For odd string sizes, the last compare will include
// the zero terminator.
while (length > 0)
{
if (*(int*)a != *(int*)b) break;
a += 2; b += 2; length -= 2;
}
return (length <= 0);
}
}
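The same sequence of checks can be written in safe managed code. This is only an illustration of the logic, not the framework's implementation: the names OrdinalCompare and OrdinalEquals are made up, and it compares one char at a time instead of several per iteration as the unrolled unsafe loop does.

```csharp
using System;

public static class OrdinalCompare
{
    public static bool OrdinalEquals(string a, string b)
    {
        if (ReferenceEquals(a, b)) return true;    // same object, or both null
        if (a == null || b == null) return false;  // exactly one is null
        if (a.Length != b.Length) return false;    // cheap early rejection

        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;        // bail out at first mismatch
        }
        return true;
    }
}
```

Both versions are O(n) in the worst case. The unsafe version only reduces the constant factor by comparing 2 or 4 chars per read.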

Is there a higher performance method for removing rare unwanted chars from a string?

EDIT
Apologies if the original unedited question is misleading.
This question is not asking how to remove Invalid XML Chars from a string, answers to that question would be better directed here.
I'm not asking you to review my code.
What I'm looking for in answers is a function with the signature
string <YourName>(string input, Func<char, bool> check);
that will have performance similar or better than RemoveCharsBufferCopyBlackList. Ideally this function would be more generic and if possible simpler to read, but these requirements are secondary.
I recently wrote a function to strip invalid XML chars from a string. In my application the strings can be modestly long and the invalid chars occur rarely. This exercise got me thinking: in what ways can this be done in safe managed C#, and which would offer the best performance for my scenario?
Here is my test program. I've substituted the "valid XML predicate" with one that omits the char 'X'.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
class Program
{
static void Main()
{
var attempts = new List<Func<string, Func<char, bool>, string>>
{
RemoveCharsLinqWhiteList,
RemoveCharsFindAllWhiteList,
RemoveCharsBufferCopyBlackList
};
const string GoodString = "1234567890abcdefgabcedefg";
const string BadString = "1234567890abcdefgXabcedefg";
const int Iterations = 100000;
var timer = new Stopwatch();
var testSet = new List<string>(Iterations);
for (var i = 0; i < Iterations; i++)
{
if (i % 1000 == 0)
{
testSet.Add(BadString);
}
else
{
testSet.Add(GoodString);
}
}
foreach (var attempt in attempts)
{
//Check function works and JIT
if (attempt.Invoke(BadString, IsNotUpperX) != GoodString)
{
throw new ApplicationException("Broken Function");
}
if (attempt.Invoke(GoodString, IsNotUpperX) != GoodString)
{
throw new ApplicationException("Broken Function");
}
timer.Reset();
timer.Start();
foreach (var t in testSet)
{
attempt.Invoke(t, IsNotUpperX);
}
timer.Stop();
Console.WriteLine(
"{0} iterations of function \"{1}\" performed in {2}ms",
Iterations,
attempt.Method,
timer.ElapsedMilliseconds);
Console.WriteLine();
}
Console.ReadKey();
}
private static bool IsNotUpperX(char value)
{
return value != 'X';
}
private static string RemoveCharsLinqWhiteList(string input,
Func<char, bool> check)
{
return new string(input.Where(check).ToArray());
}
private static string RemoveCharsFindAllWhiteList(string input,
Func<char, bool> check)
{
return new string(Array.FindAll(input.ToCharArray(), check.Invoke));
}
private static string RemoveCharsBufferCopyBlackList(string input,
Func<char, bool> check)
{
char[] inputArray = null;
char[] outputBuffer = null;
var blackCount = 0;
var lastb = -1;
var whitePos = 0;
for (var b = 0; b < input.Length; b++)
{
if (!check.Invoke(input[b]))
{
var whites = b - lastb - 1;
if (whites > 0)
{
if (outputBuffer == null)
{
outputBuffer = new char[input.Length - blackCount];
}
if (inputArray == null)
{
inputArray = input.ToCharArray();
}
Buffer.BlockCopy(
inputArray,
(lastb + 1) * 2,
outputBuffer,
whitePos * 2,
whites * 2);
whitePos += whites;
}
lastb = b;
blackCount++;
}
}
if (blackCount == 0)
{
return input;
}
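// Use input.Length: inputArray is still null if no run of kept chars has been copied yet (e.g. "Xabc").
var remaining = input.Length - 1 - lastb;
if (remaining > 0)
{
if (outputBuffer == null) outputBuffer = new char[input.Length - blackCount];
if (inputArray == null) inputArray = input.ToCharArray();
Buffer.BlockCopy(
inputArray,
(lastb + 1) * 2,
outputBuffer,
whitePos * 2,
remaining * 2);
}
if (outputBuffer == null) return string.Empty; // every char was removed
return new string(outputBuffer, 0, input.Length - blackCount);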
}
}
If you run the attempts you'll note that the performance improves as the functions get more specialised. Is there a faster and more generic way to perform this operation? Or if there is no generic option is there a way that is just faster?
Please note that I am not actually interested in removing 'X' and in practice the predicate is more complicated.

You certainly don't want to use LINQ to Objects aka enumerators to do this if you require high performance. Also, don't invoke a delegate per char. Delegate invocations are costly compared to the actual operation you are doing.
RemoveCharsBufferCopyBlackList looks good (except for the delegate call per character).
I recommend that you inline the contents of the delegate hard-coded. Play around with different ways to write the condition. You may get better performance by first checking the current char against a range of known good chars (e.g. 0x20-0xFF) and if it matches let it through. This test will pass almost always so you can save the expensive checks against individual characters which are invalid in XML.
Edit: I just remembered I solved this problem a while ago:
static readonly string invalidXmlChars =
Enumerable.Range(0, 0x20)
.Where(i => !(i == '\u000A' || i == '\u000D' || i == '\u0009'))
.Select(i => (char)i)
.ConcatToString()
+ "\uFFFE\uFFFF";
public static string RemoveInvalidXmlChars(string str)
{
return RemoveInvalidXmlChars(str, false);
}
internal static string RemoveInvalidXmlChars(string str, bool forceRemoveSurrogates)
{
if (str == null) throw new ArgumentNullException("str");
if (!ContainsInvalidXmlChars(str, forceRemoveSurrogates))
return str;
str = str.RemoveCharset(invalidXmlChars);
if (forceRemoveSurrogates)
{
for (int i = 0; i < str.Length; i++)
{
if (IsSurrogate(str[i]))
{
str = str.Where(c => !IsSurrogate(c)).ConcatToString();
break;
}
}
}
return str;
}
static bool IsSurrogate(char c)
{
return c >= 0xD800 && c < 0xE000;
}
internal static bool ContainsInvalidXmlChars(string str)
{
return ContainsInvalidXmlChars(str, false);
}
public static bool ContainsInvalidXmlChars(string str, bool forceRemoveSurrogates)
{
if (str == null) throw new ArgumentNullException("str");
for (int i = 0; i < str.Length; i++)
{
if (str[i] < 0x20 && !(str[i] == '\u000A' || str[i] == '\u000D' || str[i] == '\u0009'))
return true;
if (str[i] >= 0xD800)
{
if (forceRemoveSurrogates && str[i] < 0xE000)
return true;
if ((str[i] == '\uFFFE' || str[i] == '\uFFFF'))
return true;
}
}
return false;
}
Note that RemoveInvalidXmlChars first invokes ContainsInvalidXmlChars to save the string allocation. Most strings do not contain invalid XML chars, so we can be optimistic. ConcatToString and RemoveCharset are helper extension methods that are not shown here.
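The same "be optimistic, allocate only when needed" idea can be applied to the generic signature the question asks for. The sketch below is an assumption-laden illustration, not a benchmarked answer: the names CharFilter and RemoveChars are made up, and it still pays for one delegate call per char, which is the cost warned about above.

```csharp
using System;
using System.Text;

public static class CharFilter
{
    // Returns the input unchanged (same instance) when every char passes
    // `check`; otherwise builds a new string without the rejected chars.
    public static string RemoveChars(string input, Func<char, bool> check)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (check == null) throw new ArgumentNullException(nameof(check));

        // Optimistic scan: find the first rejected char, if any.
        int firstBad = -1;
        for (int i = 0; i < input.Length; i++)
        {
            if (!check(input[i])) { firstBad = i; break; }
        }
        if (firstBad < 0) return input; // common case: no allocation at all

        // Copy the clean prefix in one call, then filter the remainder.
        var sb = new StringBuilder(input.Length - 1);
        sb.Append(input, 0, firstBad);
        for (int i = firstBad + 1; i < input.Length; i++)
        {
            char c = input[i];
            if (check(c)) sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With the test program's predicate, CharFilter.RemoveChars(BadString, IsNotUpperX) would be called exactly like the other attempts. How it compares with RemoveCharsBufferCopyBlackList would have to be measured.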

How to validate Guid in .net

Please tell me how to validate a GUID in .NET. Also, is it always unique?

Guids are unique 99.99999999999999999999999999999999% of the time.
It depends on what you mean by validate.
Code to determine that a Guid string is in fact a Guid is as follows:
private static Regex isGuid =
new Regex(@"^(\{){0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}(\}){0,1}$", RegexOptions.Compiled);
internal static bool IsGuid(string candidate, out Guid output)
{
bool isValid = false;
output = Guid.Empty;
if(candidate != null)
{
if (isGuid.IsMatch(candidate))
{
output=new Guid(candidate);
isValid = true;
}
}
return isValid;
}

2^128 is a very, very large number. It is a billion times larger than the number of picoseconds in the life of the universe. Too large by a long shot to ever validate, the answer is doomed to be "42". Which is the point of using them: you don't have to. If you worry about getting duplicates then you worry for the wrong reason. The odds your machine will be destroyed by a meteor impact are considerably larger.
Duck!

Here's a non-Regex answer that should be pretty fast:
public static bool IsHex(this char c)
{
return ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'));
}
public static bool IsGuid(this string s)
{
// Length of a proper GUID, without any surrounding braces.
const int len_without_braces = 36;
// Delimiter for GUID data parts.
const char delim = '-';
// Delimiter positions.
const int d_0 = 8;
const int d_1 = 13;
const int d_2 = 18;
const int d_3 = 23;
// Before Delimiter positions.
const int bd_0 = 7;
const int bd_1 = 12;
const int bd_2 = 17;
const int bd_3 = 22;
if (s == null)
return false;
if (s.Length != len_without_braces)
return false;
if (s[d_0] != delim ||
s[d_1] != delim ||
s[d_2] != delim ||
s[d_3] != delim)
return false;
for (int i = 0;
i < s.Length;
i = i + (i == bd_0 ||
i == bd_1 ||
i == bd_2 ||
i == bd_3
? 2 : 1))
{
if (!IsHex(s[i])) return false;
}
return true;
}

You cannot validate a GUID's uniqueness. You just hope it was generated with a tool that produces unique 16-byte values. As for validation, this simple code might work (assuming you are dealing with the GUID's string representation):
bool ValidateGuid(string theGuid)
{
try { Guid aG = new Guid(theGuid); }
catch { return false; }
return true;
}

If you're looking for a way to determine if it's the format of the actual .Net Guid type, take a look at this article. A quick regex does the trick.

This question was already discussed in this post, where you may find more interesting details.

In .NET 4, you can use this extension method:
public static class GuidExtensions
{
public static bool IsGuid(this string value)
{
Guid output;
return Guid.TryParse(value, out output);
}
}
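Guid.TryParse accepts several textual forms (with or without hyphens, braces, or parentheses). If only one form should be accepted, Guid.TryParseExact with a format specifier can be used instead. A small sketch, where the class and method names are hypothetical:

```csharp
using System;

public static class GuidFormatCheck
{
    // Accepts only the 36-character hyphenated form ("D" format),
    // e.g. 6F9619FF-8B86-D011-B42D-00C04FC964FF.
    public static bool IsHyphenatedGuid(string value)
    {
        Guid output;
        return Guid.TryParseExact(value, "D", out output);
    }
}
```

This matches what the hand-written IsGuid above checks (36 characters, no braces) without a regex or an exception-based test.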

I wrote an extension for this:
public static bool TryParseGuid(this Guid? guidString)
{
if (guidString != null && guidString != Guid.Empty)
{
if (Guid.TryParse(guidString.ToString(), out _))
{
return true;
}
else
return false;
}
return false;
}
