Complex regex pattern for science - C#

For a project I'm working on, we need to analyse, calculate and process data with R. To do some accurate calculations, our scientists would like a regular expression that matches the following inputs:
12
1.12
1.00021
234.0012
23.020
123.012
123.0000000000012
1.0000000000023
As you can see, the decimal places can contain any number of zeros, but the value is only valid if the zeros are followed by a number between 10 and 99 (inclusive).
So the following should not be valid:
1
0.0001
0.02
8.000000001
1.01
Hope someone has a solution or a direction, because I'm quite stuck.

If I understand your question correctly, decimal places can have any number of zeros followed by 10-99, right?
\d+(\.0*[1-9][0-9])
What I don't see is how you make a distinction between 12 being valid and 1 not.
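To make the open point above concrete, here is a minimal Python sketch that runs the answer's pattern against the sample values from the question. The second pattern rests on an assumption the question never confirms: that a value without a decimal part is valid only when it is itself a number from 10 to 99. The names DECIMAL_ONLY, WITH_INTEGERS and is_valid are made up for illustration.

```python
import re

# The pattern from the answer: a decimal part is mandatory.
DECIMAL_ONLY = re.compile(r"\d+\.0*[1-9][0-9]")

# Assumption (not confirmed by the question): a value with no decimal part
# is valid only when it is itself a number from 10 to 99.
WITH_INTEGERS = re.compile(r"[1-9][0-9]|\d+\.0*[1-9][0-9]")

VALID = ["12", "1.12", "1.00021", "234.0012", "23.020",
         "123.012", "123.0000000000012", "1.0000000000023"]
INVALID = ["1", "0.0001", "0.02", "8.000000001", "1.01"]


def is_valid(text):
    """Return True when the whole input matches the extended pattern."""
    return WITH_INTEGERS.fullmatch(text) is not None


for sample in VALID + INVALID:
    print(sample, is_valid(sample))

# The answer's pattern alone rejects "12" because it has no decimal part.
print(DECIMAL_ONLY.fullmatch("12") is None)
```

Using fullmatch instead of search means the whole input has to match, so "8.000000001" is not accepted on the strength of a matching prefix.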

Related

Can you explain this double to string conversion behavior in C#?

I'm trying to figure out why this happens and what is C# doing here.
Let's say we have a double: 277.3599853515625 (that's 13 digits after the period)
Then 277.3599853515625.ToString() -> "277.359985351563"
We lost a digit and it looks like the number got rounded UP.
But Math.Round(277.3599853515625,12) -> 277.359985351562 (looks like normal math rounding results in rounding DOWN)
I thought maybe if I give ToString() the formatting I want it would do the correct thing (give me the entire number):
277.3599853515625.ToString("0.#############") -> "277.359985351563" (that's 13 # signs, and still lost a digit and rounded UP)
If I reduce the last digit from 5 to 4, it rounds DOWN:
277.3599853515624.ToString("0.#############") -> "277.359985351562"
So it is clearly rounding, but the rounding rules differ from normal math rounding. My first thought was that it just treats 5 differently: where normal rounding rounds 5 down, ToString rounds it up. But look at this:
277.3599853515624999.ToString("0.#############") -> "277.359985351563" (WHAT?!?!?!?)
Do you have any idea what is happening here and what exactly C#'s logic in ToString() does?
The reason I'm asking is that I need to understand how to replicate the same behavior in a different language.
Thank you.

277.3599853515625.ToString() -> "277.359985351563"
or
277.3599853515624.ToString("0.#############") -> "277.359985351562"
In this case, the ToString method is using MidpointRounding.AwayFromZero so that is why it converts 2 to 3 when the last digit is 5.
For Reference, use this link: https://learn.microsoft.com/en-us/dotnet/api/system.midpointrounding?view=net-6.0#system-midpointrounding-awayfromzero
Math.Round(277.3599853515625,12) -> 277.359985351562
In this case, Math.Round uses MidpointRounding.ToEven by default and rounds midpoint values to the nearest even number. You need to pass a specific MidpointRounding value explicitly if you do not want ToEven.
For Reference, use this link: https://learn.microsoft.com/en-us/dotnet/api/system.math.round?view=net-6.0
277.3599853515624999.ToString("0.#############") -> "277.359985351563"
(WHAT?!?!?!?)
Here, there are two concepts. One is that the literal 277.3599853515624999 is a Double, and a Double holds only about 15-16 significant digits. The extra digits are lost before ToString ever runs: the literal is stored as the same value as 277.3599853515625.
Console.WriteLine(277.3599853515624999.GetType()); // System.Double
Double: 15-16 significant digits (64 bit)
Decimal: 28-29 significant digits (128 bit)
Thus, if you change 277.3599853515624999 to the Decimal literal 277.3599853515624999m and call ToString() on it,
then you get 277.3599853515624999.
The second one is that the stored value is then rounded with
MidpointRounding.AwayFromZero.
You can play with the below code:
// The m suffix makes this a Decimal literal, so all the digits are kept.
string h = 277.3599853515624999m.ToString();
// Without the suffix the literal is a Double.
Console.WriteLine(277.3599853515624999.GetType()); // System.Double
Console.WriteLine(h);
// Ask Math.Round explicitly for away-from-zero midpoint rounding.
string hhh = Math.Round(277.345, 2, MidpointRounding.AwayFromZero).ToString();
Console.WriteLine(hhh);
I hope now there is a clear picture.
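Since the question is about reproducing this in another language, here is a rough Python sketch of the two effects described above. It is not a reimplementation of ToString(); it only shows that the longer literal collapses to the same double, and that the two midpoint rules give the two results seen in the question. In Python's decimal module, ROUND_HALF_UP rounds ties away from zero and ROUND_HALF_EVEN rounds ties to the even digit.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# A 64-bit float cannot tell these two literals apart.
same_double = float("277.3599853515624999") == 277.3599853515625
print(same_double)

# Decimal(float) gives the exact value stored in the double.
exact = Decimal(277.3599853515625)
step = Decimal("0.000000000001")  # 12 decimal places

half_even = exact.quantize(step, rounding=ROUND_HALF_EVEN)  # like Math.Round's default
half_up = exact.quantize(step, rounding=ROUND_HALF_UP)      # ties away from zero

print(exact)
print(half_even)
print(half_up)
```

The stored value is an exact midpoint at 12 decimal places, which is why the choice of midpoint rule is the only thing that separates the two outputs.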

What is the time format 'MMMMMMMMSS'?

When processing a file from a telecom company, I came across this in the specifications:
When reading in that data, how can I convert that format to something usable in C#? I have no idea what the MMMMMMMMSS format is!

The only logical explanation I can think of is the following:
Since this is a call duration, suppose a call lasted 10:10:05 (10 hours, 10 minutes and 5 seconds). I assume they want to represent this in minutes and seconds only, so in the given format it would be 61005, which is 610 minutes and 5 seconds. The 5 remaining bytes can then be filled with padding zeros or with space characters (since you mentioned that's what they used to represent a value).
Hope that helps.

I would expect each of these to be zero-padded. Regardless, split the last two characters off to derive seconds and cents, respectively. The first 8 characters represent minutes and dollars. A one minute (exactly) call would be 7 zeros followed by a 1 followed by two zeros. A ten minute and ten second call would be 6 zeros followed by 1010.
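Under the reading given in these answers (8 characters of minutes followed by 2 characters of seconds), parsing could look like the Python sketch below. The helper name parse_mmss is made up, and the handling of space padding is an assumption, since the specification itself is not shown.

```python
from datetime import timedelta


def parse_mmss(field):
    """Parse a 10-character MMMMMMMMSS field into a timedelta.

    Hypothetical helper: assumes 8 characters of minutes followed by
    2 characters of seconds, padded on the left with zeros or spaces.
    """
    if len(field) != 10:
        raise ValueError("expected exactly 10 characters")
    minutes = int(field[:8].strip() or "0")  # all-space minutes count as 0
    seconds = int(field[8:])
    return timedelta(minutes=minutes, seconds=seconds)


print(parse_mmss("0000000100"))  # one minute exactly
print(parse_mmss("0000001010"))  # ten minutes and ten seconds
print(parse_mmss("     61005"))  # 610 minutes and 5 seconds
```

In C# the same split would map naturally onto a TimeSpan built from the two integer parts.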

Float value retrieved wrong from database [duplicate]

How do you explain floating point inaccuracy to fresh programmers and laymen who still think computers are infinitely wise and accurate?
Do you have a favourite example or anecdote which seems to get the idea across much better than a precise, but dry, explanation?
How is this taught in Computer Science classes?

There are basically two major pitfalls people stumble into with floating-point numbers.
The problem of scale. Each FP number has an exponent which determines the overall “scale” of the number, so you can represent either really small values or really large ones, though the number of digits you can devote to that is limited. Adding two numbers of different scale will sometimes result in the smaller one being “eaten” since there is no way to fit it into the larger scale.
PS> $a = 1; $b = 0.0000000000000000000000001
PS> Write-Host a=$a b=$b
a=1 b=1E-25
PS> $a + $b
1
As an analogy for this case you could picture a large swimming pool and a teaspoon of water. Both are of very different sizes, but individually you can easily grasp how much they roughly are. Pouring the teaspoon into the swimming pool, however, will leave you still with roughly a swimming pool full of water.
(If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)
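The same "teaspoon in a swimming pool" effect can be reproduced in Python, whose floats are 64-bit doubles. This is only a small sketch; math.ulp needs Python 3.9 or later and returns the gap between a float and the next larger one.

```python
import math

pool = 1.0
teaspoon = 1e-25

# The teaspoon is far smaller than the gap between 1.0 and its neighbour,
# so adding it changes nothing.
absorbed = (pool + teaspoon == pool)
print(absorbed)

# Anything as large as one gap does register.
gap = math.ulp(pool)
print(gap)
print(pool + gap > pool)
```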
Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a limited amount of binary digits. Some languages mask this, though:
PS> "{0:N50}" -f 0.1
0.10000000000000000000000000000000000000000000000000
But you can “amplify” the representation error by repeatedly adding the numbers together:
PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
9,99999999999998
I can't think of a nice analogy to properly explain this, though. It's basically the same problem why you can represent 1/3 only approximately in decimal because to get the exact value you need to repeat the 3 indefinitely at the end of the decimal fraction.
Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.
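For comparison, here is a short Python sketch of the same two observations. Decimal(0.1) prints the exact value that is actually stored for 0.1, and the loop repeats the hundred additions from the PowerShell example. math.fsum is shown only as one way to avoid the accumulated error; it is not something the answer itself mentions.

```python
import math
from decimal import Decimal

# The exact binary value hiding behind the literal 0.1.
stored = Decimal(0.1)
print(stored)

# Adding it a hundred times lets the tiny error pile up.
total = 0.0
for _ in range(100):
    total += 0.1
print(total, total == 10.0)

# fsum tracks the lost low-order bits and rounds only once at the end.
print(math.fsum([0.1] * 100))
```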
Then there is another problem, though most people don't stumble into it unless they're doing huge amounts of numerical stuff. But then, those already know about the problem. Since many floating-point numbers are merely approximations of the exact value, for a given approximation f of a real number r there can be infinitely many more real numbers r1, r2, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let's say that rmin is the minimum possible value of r that results in f and rmax the maximum possible value of r for which this holds; then you get an interval [rmin, rmax] where any number in that interval can be your actual number r.
Now, if you perform calculations on that number—adding, subtracting, multiplying, etc.—you lose precision. Every number is just an approximation, therefore you're actually performing calculations with intervals. The result is an interval too and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from that calculation. But that's merely one number from the interval of possible results, taking into account precision of your original operands and the precision loss due to the calculation.
That sort of thing is called Interval arithmetic and at least for me it was part of our math course at the university.
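The interval [rmin, rmax] described above can be computed exactly for a given double. The Python sketch below (Python 3.9+ for math.nextafter and math.ulp) takes the midpoints between f and its two neighbouring floats; whether the endpoints themselves round to f depends on the tie-breaking rule, which this sketch ignores. The function name rounding_interval is made up for illustration.

```python
import math
from fractions import Fraction


def rounding_interval(f):
    """Return (lo, hi): the reals strictly between them all round to f."""
    below = Fraction(math.nextafter(f, -math.inf))  # next float down, exact
    above = Fraction(math.nextafter(f, math.inf))   # next float up, exact
    exact = Fraction(f)
    return (below + exact) / 2, (exact + above) / 2


lo, hi = rounding_interval(0.1)

# The real number 1/10 is inside the interval, but it is not the float itself.
print(lo < Fraction(1, 10) < hi)
print(Fraction(0.1) == Fraction(1, 10))

# Width of the interval: one unit in the last place of 0.1.
print(float(hi - lo))
```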

Show them that the base-10 system suffers from exactly the same problem.
Try to represent 1/3 as a decimal representation in base 10. You won't be able to do it exactly.
So if you write "0.3333", you will have a reasonably exact representation for many use cases.
But if you move that back to a fraction, you will get "3333/10000", which is not the same as "1/3".
Other fractions, such as 1/2 can easily be represented by a finite decimal representation in base-10: "0.5"
Now base-2 and base-10 suffer from essentially the same problem: both have some numbers that they can't represent exactly.
While base-10 has no problem representing 1/10 as "0.1", in base-2 you'd need an infinite representation starting with "0.000110011...".
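The expansions quoted above can be reproduced with ordinary long division in any base. Here is a small Python sketch; the function name expand is made up for illustration and it only handles bases up to 10.

```python
def expand(numerator, denominator, base, digits):
    """First `digits` fractional digits of numerator/denominator in `base`."""
    out = []
    remainder = numerator % denominator
    for _ in range(digits):
        remainder *= base
        out.append(str(remainder // denominator))  # next digit
        remainder %= denominator                   # carry the rest forward
    return "0." + "".join(out)


print(expand(1, 3, 10, 8))    # 1/3 in base 10: never terminates
print(expand(1, 2, 10, 4))    # 1/2 in base 10: terminates
print(expand(1, 10, 10, 4))   # 1/10 in base 10: terminates
print(expand(1, 10, 2, 12))   # 1/10 in base 2: repeats forever
```

A fraction terminates in a given base exactly when the remainder reaches zero, which for 1/10 in base 2 never happens.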

How's this for an explanation to the layman? One way computers represent numbers is by counting discrete units. These are digital computers. For whole numbers, those without a fractional part, modern digital computers count powers of two: 1, 2, 4, 8, ... Place value, binary digits, blah, blah, blah. For fractions, digital computers count inverse powers of two: 1/2, 1/4, 1/8, ... The problem is that many numbers can't be represented by a sum of a finite number of those inverse powers. Using more place values (more bits) will increase the precision of the representation of those 'problem' numbers, but never get it exactly because it only has a limited number of bits. Some numbers can only be represented exactly with an infinite number of bits.
Snooze...
OK, you want to measure the volume of water in a container, and you only have 3 measuring cups: full cup, half cup, and quarter cup. After counting the last full cup, let's say there is one third of a cup remaining. Yet you can't measure that because it doesn't exactly fill any combination of available cups. It doesn't fill the half cup, and the overflow from the quarter cup is too small to fill anything. So you have an error - the difference between 1/3 and 1/4. This error is compounded when you combine it with errors from other measurements.

In python:
>>> 1.0 / 10
0.10000000000000001
Explain how some fractions cannot be represented precisely in binary. Just like some fractions (like 1/3) cannot be represented precisely in base 10.

Another example, in C
printf (" %.20f \n", 3.6);
incredibly gives
3.60000000000000008882

Here is my simple understanding.
Problem:
The value 0.45 cannot be accurately represented by a float and is rounded up to 0.450000018. Why is that?
Answer:
An int value of 45 is represented by the binary value 101101.
To make the value 0.45, it would be exact if you could take 45 x 10^-2 (= 45 / 10^2).
But that’s impossible because you must use the base 2 instead of 10.
So the closest to 10^2 = 100 would be 128 = 2^7. The total number of bits you need is 9 : 6 for the value 45 (101101) + 3 bits for the value 7 (111).
Then the value 45 x 2^-7 = 0.3515625. Now you have a serious inaccuracy problem. 0.3515625 is not nearly close to 0.45.
How do we improve this inaccuracy? Well we could change the value 45 and 7 to something else.
How about 460 x 2^-10 = 0.44921875. You are now using 9 bits for 460 and 4 bits for 10. Then it’s a bit closer but still not that close. However if your initial desired value was 0.44921875 then you would get an exact match with no approximation.
So the formula for your value would be X = A x 2^B. Where A and B are integer values positive or negative.
Obviously, the larger the numbers can be, the higher your accuracy becomes. However, as you know, the number of bits available to represent the values A and B is limited. For float you have a total of 32 bits. Double has 64 and Decimal has 128.
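The X = A x 2^B idea can be tried out directly. The Python sketch below first checks the two products quoted above, then picks the nearest integer A for a given power of two, which for 2^-10 turns out to be 461 rather than 460. The helper name closest is made up; float.as_integer_ratio is a standard method that exposes the A and 2^B a 64-bit double actually uses.

```python
def closest(value, bits):
    """Nearest a * 2**-bits to value, returned as (a, approximation)."""
    a = round(value * 2 ** bits)
    return a, a * 2.0 ** -bits


# The two attempts from the explanation above.
print(45 * 2.0 ** -7)
print(460 * 2.0 ** -10)

# Picking the nearest integer A for each power of two gets a little closer.
print(closest(0.45, 7))
print(closest(0.45, 10))
print(closest(0.45, 24))

# What a 64-bit double really stores: an integer over a power of two.
numerator, denominator = (0.45).as_integer_ratio()
print(numerator, denominator)
```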

A cute piece of numerical weirdness may be observed if one converts 9999999.4999999999 to a float and back to a double. The result is reported as 10000000, even though that value is obviously closer to 9999999, and even though 9999999.499999999 correctly rounds to 9999999.
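This can be checked in Python by pushing a double through a 32-bit float with the struct module. The explanation in the comments is my own reading of why it happens, not something the post states: the longer literal is already stored as 9999999.5 as a double, and that exact tie is then rounded to the even neighbour when narrowed to a float.

```python
import struct


def to_float32(x):
    """Round a 64-bit float to the nearest 32-bit float and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# As a double, the longer literal is already indistinguishable from 9999999.5.
print(9999999.4999999999 == 9999999.5)

# Near ten million, 32-bit floats are whole numbers, so 9999999.5 is a tie.
longer = to_float32(9999999.4999999999)
shorter = to_float32(9999999.499999999)
print(longer)
print(shorter)
```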

Creditcard verification with regex?

What is the right way to verify a credit card with a regex? If there is one, which should I use? There are tons online. If not, how should I verify it?

See this link: Finding or Verifying Credit Card Numbers with Regular Expressions
Visa: ^4[0-9]{12}(?:[0-9]{3})?$ All Visa card numbers start with a 4. New cards have 16 digits. Old cards have 13.
MasterCard: ^5[1-5][0-9]{14}$ All MasterCard numbers start with the numbers 51 through 55. All have 16 digits.
American Express: ^3[47][0-9]{13}$ American Express card numbers start with 34 or 37 and have 15 digits.
Diners Club: ^3(?:0[0-5]|[68][0-9])[0-9]{11}$ Diners Club card numbers begin with 300 through 305, 36 or 38. All have 14 digits. There are Diners Club cards that begin with 5 and have 16 digits. These are a joint venture between Diners Club and MasterCard, and should be processed like a MasterCard.
Discover: ^6(?:011|5[0-9]{2})[0-9]{12}$ Discover card numbers begin with 6011 or 65. All have 16 digits.
JCB: ^(?:2131|1800|35\d{3})\d{11}$ JCB cards beginning with 2131 or 1800 have 15 digits. JCB cards beginning with 35 have 16 digits.
Bye.
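As a sketch of how the patterns listed above might be used, here is a small Python lookup. The names CARD_PATTERNS and card_type are made up for illustration, the sample numbers are the kind of widely published test numbers and not real cards, and stripping spaces and hyphens first is my own assumption about the input.

```python
import re

# Patterns copied from the list above.
CARD_PATTERNS = {
    "Visa": r"^4[0-9]{12}(?:[0-9]{3})?$",
    "MasterCard": r"^5[1-5][0-9]{14}$",
    "American Express": r"^3[47][0-9]{13}$",
    "Diners Club": r"^3(?:0[0-5]|[68][0-9])[0-9]{11}$",
    "Discover": r"^6(?:011|5[0-9]{2})[0-9]{12}$",
    "JCB": r"^(?:2131|1800|35\d{3})\d{11}$",
}
COMPILED = {name: re.compile(pattern) for name, pattern in CARD_PATTERNS.items()}


def card_type(number):
    """Return the first card type whose pattern matches, or None."""
    digits = number.replace(" ", "").replace("-", "")
    for name, pattern in COMPILED.items():
        if pattern.fullmatch(digits):
            return name
    return None


for sample in ["4111 1111 1111 1111", "5555555555554444", "378282246310005",
               "30569309025904", "6011111111111117", "3530111333300000", "1234"]:
    print(sample, card_type(sample))
```

A match only says the number has a plausible prefix and length; it says nothing about whether the card exists.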

How can I use credit card numbers containing spaces? covers everything you should need.

I think you're looking for the Luhn Algorithm. It's a simple checksum formula used to validate a variety of identification numbers.
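A minimal Python sketch of the Luhn check follows. Working from the rightmost digit, every second digit is doubled, 9 is subtracted from any result above 9, and the grand total must be a multiple of 10. The function name luhn_valid and the decision to ignore spaces and hyphens are my own choices.

```python
def luhn_valid(number):
    """Return True when the digits in `number` pass the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:      # every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9      # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
print(luhn_valid("79927398713"))
```

This catches most single-digit typos, but like the regexes it cannot tell whether the account actually exists.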

That depends on how accurate you want your pre-validation to be. To validate everything you can, you need to compute what the last digit of the card should be and compare to what is entered, which a RegEx cannot do.
For the algorithm and other details see this link, which also provides a list of common number prefixes that you could validate against.

-- Edit:
In fact, I'll slightly disagree with myself and agree with cletus. Validate as much as you can (without getting into details of specific types of credit cards [IMHO]) before sending it on. And it goes without saying (hopefully) that this validation should be done in JavaScript, to make it fast, then on the server, to double check (and for people with JavaScript disabled).
-- Previous Response:
Don't bother; just let the provider verify it when you actually attempt payment. No legitimate reason to try and verify it yourself. You can use this though, if you really feel like it.

Can't figure out what this SubString.PadLeft is doing

In this code I am debugging, I have this code snippet:
ddlExpYear.SelectedItem.Value.Substring(2).PadLeft(2, '0');
What does this return? I really can't run this too much as it is part of a live credit card application. The DropDownList as you could imagine from the name contains the 4-digit year.
UPDATE: Thanks everyone. I don't do a lot of .NET development so setting up a quick test isn't as quick for me.

It takes the last two digits of the year and pads the left side with zeroes to a minimum width of 2 characters. Looks like a "just in case" for expiration years ending in 08, 07, etc., making sure that the leading zero is present.

This prints "98" to the console.
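using System;
class Program
{
    static void Main(string[] args)
    {
        // Substring(2) drops "19"; "98" is already 2 characters, so PadLeft adds nothing.
        Console.Write("1998".Substring(2).PadLeft(2, '0'));
        Console.Read(); // wait for input so the console window stays open
    }
}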

Of course you can run this. You just can't run it in the application you're debugging. To find out what it's doing, and not just what it looks like it's doing, make a new web application, put in a DropDownList, put a few static years in it, and then put in the code you've mentioned and see what it does. Then you'll know for certain.

Something stupid. It's getting the value of the selected item and taking everything after the first two characters. If that is only one character, then it adds a '0' to the beginning of it, and if it is zero characters, then it returns '00'. The reason I say this is stupid is because if you need the value to be two characters long, why not just set it like that to begin with when you are creating the drop down list?

It looks like it's grabbing the substring from the 3rd character (if 0 based) to the end, then if the substring has a length less than 2 it's making the length equal to 2 by adding 0 to the left side.

PadLeft ensures that you receive at least two characters from the input, padding the input (on the left side) with the appropriate character. So input, in this case, might be 12. You get "12" back. Or input might be 9, in which case, you get "09" back.
This is an example of complex chaining (see "Is there any benefit in Chaining" post) gone awry, and making code appear overly complex.

The substring returns the value with the first two characters skipped, the padleft pads the result with leading zeros:
string s = "2014";
MessageBox.Show(s.Substring(2).PadLeft(2, 'x')); //14
string s2 = "14";
MessageBox.Show(s2.Substring(2).PadLeft(2, 'x')); //xx
My guess is the code is trying to convert the year to a 2 digit value.

The PadLeft only does something if the user enters a year that is either 2 or 3 digits long.
With a 1-digit year, you get an exception (Substring errs).
With a 2-digit year (07, 08, etc), it will return 00. I would say this is an error.
With a 3-digit year (207, 208), which the author may have assumed to be typos, it would return the last digit padded with a zero -- 207 -> 07; 208 -> 08.
As long as the user must choose a year and isn't allowed to enter a year, the PadLeft is unnecessary -- the Substring(2) does exactly what you need given a 4-digit year.

This code seems to be trying to grab a 2 digit year from a four digit year (ddlexpyear is the hint)
It takes strings and returns strings, so I will eschew the string delimiters:
1998 -> 98
2000 -> 00
2001 -> 01
2012 -> 12
Problem is that it doesn't do a good job. In these cases, the padding doesn't actually help. Removing the pad code does not affect the cases it gets correct.
So the code works (with or without the pad) for 4 digit years, what does it do for strings of other lengths?
null: exception
0: exception
1: exception
2: always returns "00". e.g. the year 49 (when the Jews were expelled from Rome) becomes "00". This is bad.
3: saves the last digit, and puts a "0" in front of it. Correct in 10% of cases (when the second digit is actually a zero, like 304, or 908), but quite wrong in the remainder (like 915, 423, and 110)
5: keeps everything after the first two digits, which is also wrong; "10549" should probably be "49" but is instead "549".
As you can expect, the problem continues with longer strings.
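The cases above can be tabulated without touching the live application. The sketch below emulates the two C# calls in Python rather than calling .NET, so treat it as an approximation: substring mimics String.Substring(startIndex), which throws when the start index is past the end of the string, and str.rjust plays the role of PadLeft. Both function names are made up for illustration.

```python
def substring(text, start):
    """Emulate C# String.Substring(startIndex) for a non-null string."""
    if start < 0 or start > len(text):
        # C# throws ArgumentOutOfRangeException here.
        raise IndexError("startIndex is outside the string")
    return text[start:]


def two_digit_year(value):
    """Emulate value.Substring(2).PadLeft(2, '0')."""
    return substring(value, 2).rjust(2, "0")


for year in ["1998", "2000", "2012", "49", "207", "10549", "7"]:
    try:
        print(repr(year), "->", repr(two_digit_year(year)))
    except IndexError as error:
        print(repr(year), "-> exception:", error)
```

Only the 4-character inputs produce a sensible two-digit year, which matches the conclusion that the padding adds nothing for a drop-down of 4-digit years.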

OK so it's taking the value from the drop down, ABCD
Then it takes the substring from position 2, CD
And then it, err, left pads it with zeros to a width of 2 if it needs to, CD
Or, if you've just ended X, then it would substring to X and pad to OX

It's taking the last two digits of the year, then pad to the left with a "0".
So 2010 would be 10, 2009 would be 09.
Not sure why the developer didn't just set the value on the dropdown to the last two digits, or why you would need to left pad it (unless you were dealing with years 0-9 AD).
