I'm importing a csv file in C#, sometimes with '.', sometimes with ',' as decimal separator.
Is there a better way to determine the decimal separator than scanning from the last character back to the first occurrence of a separator?
Thanks in advance.
Franklin Albricias.
If you know the correct culture in advance (for example, because you know the user that created the file), you can try to parse the provided value using the appropriate CultureInfo or NumberFormatInfo:
Decimal value = Decimal.Parse(input, new CultureInfo("es-ES"));
But if the culture is not known in advance, you'll have to check it manually by examining the characters until you find a separator. (And even that approach assumes that you are guaranteed to always have a decimal separator, such that one is written as 1.0 rather than 1.)
You can't just try each expected format one after the other because you may get false positives.
10,000 means something valid but different for both formats.
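The manual check described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, and it assumes the input contains only digits plus '.' and ',' characters. It returns null when the text is ambiguous (such as 10,000) instead of guessing.

```csharp
using System;

public static class SeparatorGuess
{
    // Hypothetical helper: returns '.' or ',' when the text gives a clear
    // answer, or null when there is no separator or the case is ambiguous.
    public static char? GuessDecimalSeparator(string input)
    {
        int last = input.LastIndexOfAny(new[] { '.', ',' });
        if (last < 0) return null;              // "1": nothing to go on

        char candidate = input[last];
        char other = candidate == '.' ? ',' : '.';

        // Both characters present: the right-most one is the decimal separator.
        if (input.IndexOf(other) >= 0) return candidate;

        // The same character appears more than once ("1,234,567"): it is
        // grouping digits, so the decimal separator must be the other one.
        if (input.IndexOf(candidate) != last) return other;

        // A single separator followed by exactly three digits ("10,000")
        // is valid in both formats, so refuse to guess.
        int digitsAfter = input.Length - last - 1;
        if (digitsAfter == 3) return null;

        return candidate;
    }
}
```

A caller that gets null back has to fall back on other evidence, for example other values in the same column.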
Why not use both as a separator?
Have a look at NumberFormatInfo
Edit:
For each value try to parse it with one of the separators.
If that fails try to parse it with the other.
This depends on the actual data stored in the csv file and the data separation character (';' or ',' or ' ').
If all data is always in floating point notation you can use a regular expression that checks both cases. You can use "\d+,\d+" to check for values separated by ',' or "\d+\.\d+" for values using '.' as separator.
Under the assumption that the file contains only numbers (no strings or anything else) and there are at least two columns, you can do the following.
Go through the first line and look for a semicolon. If you find one, you have semicolon separated numbers with commas as decimal separator, else comma separated numbers with points as decimal separator.
In all other cases you will have to use a heuristic (and sometimes get the wrong conclusion) or you have to strictly parse the file under both assumptions.
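The semicolon check can be sketched as follows, under the same assumptions as the answer (numbers only, at least two columns). The class and method names are hypothetical. The number format is built by cloning the invariant format so the sketch does not depend on which cultures are installed.

```csharp
using System.Globalization;
using System.Linq;

public static class CsvDialect
{
    // A semicolon in the first line implies ';' as the field separator
    // and ',' as the decimal separator.
    public static bool UsesSemicolons(string firstLine)
    {
        return firstLine.IndexOf(';') >= 0;
    }

    public static decimal[] ParseLine(string line, bool semicolons)
    {
        char fieldSeparator = semicolons ? ';' : ',';

        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NumberDecimalSeparator = semicolons ? "," : ".";
        format.NumberGroupSeparator = semicolons ? "." : ",";

        // Group separators are deliberately not allowed here, so a stray
        // separator causes a FormatException instead of a silent misread.
        const NumberStyles styles = NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;

        return line.Split(fieldSeparator)
                   .Select(field => decimal.Parse(field.Trim(), styles, format))
                   .ToArray();
    }
}
```

Calling UsesSemicolons on the first line once and passing the result to ParseLine for every row keeps the whole file on one interpretation.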
Currently I am doing it this way.
string strPoint = "12.5";
string strComma = "12,5";
Console.WriteLine("strPoint: " + float.Parse(strPoint,System.Globalization.CultureInfo.InvariantCulture));
Console.WriteLine("strComma: " + float.Parse(strComma,System.Globalization.CultureInfo.InvariantCulture));
Result:
strPoint: 12,5 and strComma: 125.
Shouldn't strComma be 12.5? What could be the reason for this? Please advise.
Remove InvariantCulture from the second Parse and use a culture whose decimal separator is a comma (your current culture, judging by the output). The decimal separator of InvariantCulture is a dot, not a comma. You can verify that using:
CultureInfo.InvariantCulture.NumberFormat.NumberDecimalSeparator;
In first code snippet you are using dot as a separator and using the InvariantCulture for Parse and it is parsing correctly because the InvariantCulture uses dot as a separator.
In the second code snippet you are using a comma, and it is dropped rather than read as a decimal point because it is not the decimal separator of InvariantCulture; the same culture cannot use two different decimal separators at the same time.
In InvariantCulture, the comma is the thousands separator, and for correct strings, the result of parsing cannot depend on whether the thousands separator is present (1000 and 1,000 are two different representations of the same number). float.Parse, however, does not enforce that the thousands separator is only used in the appropriate places, it simply skips it entirely.
I think float.Parse() will treat "," as a group separator, not as the decimal separator, which is ".".
Hence the group separator simply vanishes from the output shown.
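To make the behaviour described in these answers concrete, here is a small sketch. The class and method names are hypothetical. Instead of relying on a particular installed culture, it clones the invariant number format and swaps the separators.

```csharp
using System.Globalization;

public static class CommaParsing
{
    // Default styles include AllowThousands, so ',' is skipped as a group separator.
    public static float ParseInvariant(string s)
    {
        return float.Parse(s, CultureInfo.InvariantCulture);
    }

    // Same parse, but with ',' declared as the decimal separator.
    public static float ParseWithCommaDecimal(string s)
    {
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NumberDecimalSeparator = ",";
        format.NumberGroupSeparator = ".";
        return float.Parse(s, format);
    }

    // NumberStyles.Float does not include AllowThousands, so "12,5" is
    // rejected under InvariantCulture instead of being read as 125.
    public static bool TryParseNoGrouping(string s, out float value)
    {
        return float.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

ParseInvariant("12,5") gives 125, ParseWithCommaDecimal("12,5") gives 12.5, and TryParseNoGrouping("12,5", out value) returns false, which is often the safer failure mode when importing files.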
I need to format a number so that zero is shown as an empty string. This has to work in a single format statement, for both ints and decimals:
For example:
int myInt=0;
decimal myDecimal=0.0m;
// ... Some other code
string formattedResult1=String.Format("{0}",myInt);
string formattedResult2=String.Format("{0}",myDecimal);
The expected results are:
"" (i.e., string.Empty) if the item to be formatted is zero
and a numeric value (e.g., "123.456" for the decimal version) if it isn't.
I need for this to occur exclusively as a result of the format specification in the format string.
This should do:
string formattedResult1 = string.Format("{0:0.######;-0.######;\"\"}", myInt);
The colon introduces a numeric format string. The numeric format string is divided into 3 sections by semicolons: section 1 is for positive numbers, section 2 for negative numbers, and section 3 for zeros. To define a blank string you need to delimit it with double quotes; leaving the section empty does not work.
See MSDN for the full syntax.
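As a quick check that the same format string covers both types, here is a small sketch. The class and method names are hypothetical, and InvariantCulture is passed explicitly only so that the decimal point in the result does not depend on the machine's settings.

```csharp
using System.Globalization;

public static class BlankZero
{
    // Three sections: positive; negative; zero (an empty quoted literal).
    public const string Format = "{0:0.######;-0.######;\"\"}";

    public static string Show(object value)
    {
        return string.Format(CultureInfo.InvariantCulture, Format, value);
    }
}
```

With this, BlankZero.Show(0) and BlankZero.Show(0.0m) both produce an empty string, while BlankZero.Show(123.456m) produces "123.456".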
Based on the accepted answer above, I have done the same thing in Microsoft Report Builder.
This worked for me (shows 2 decimal places, blank for zero):
#,##0.00;-#,##0.00;""
Regular Expressions have always seemed like black magic to me and I have never been able to get my head around building them.
I am now in need of a Reg Exp (for validation purposes) that checks that the user enters a number according to the following rules.
no alpha characters
can have decimal
can have commas for the thousands, but the commas must be correctly placed
Some examples of VALID values:
1.23
100
1,234
1234
1,234.56
0.56
1,234,567.89
INVALID values:
1.ab
1,2345.67
0,123.45
1.24,687
You can try the following expression
^([1-9]\d{0,2}(,\d{3})+|[1-9]\d*|0)(\.\d+)?$
Explanation:
The part before the point consists of
either 1-3 digits followed by (one or more) comma plus three digits
or just digits (at least one)
If a dot follows, it must itself be followed by at least one digit.
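The expression can be checked against the examples from the question with a few lines of C#. This is only a sketch and the class name is hypothetical. Note that in .NET, \d also matches non-ASCII digits and $ tolerates a trailing newline, which may or may not matter for your validation.

```csharp
using System.Text.RegularExpressions;

public static class NumberPattern
{
    private static readonly Regex Pattern =
        new Regex(@"^([1-9]\d{0,2}(,\d{3})+|[1-9]\d*|0)(\.\d+)?$");

    // True when the input is a number with optional, correctly placed
    // thousands commas and an optional fractional part.
    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

NumberPattern.IsValid accepts every value in the VALID list (for example "1,234,567.89") and rejects every value in the INVALID list (for example "1,2345.67").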
^(((([1-9][0-9]{0,2})(,[0-9]{3})*)|([0-9]+)))?(\.[0-9]+)?$
This works for all of your examples of valid data, and will also accept decimals that start with a decimal point. (I.e. .61, .07, etc.)
I noticed that all of your examples of valid decimals (1.23, 1,234.56, and 1,234,567.89) had exactly two digits after the decimal point. I'm not sure if this is coincidence, or if you actually require exactly two digits after the decimal point. (I.e. maybe you're working with money values.) The regular expression as I've written it works for any number of digits after the decimal point. (I.e. 1.2345 and 1,234.56789 would be considered valid.) If you need there to be exactly two digits after the decimal point, change the end of the regular expression from +)?$ to {2})?$.
try to use this regex
^(\d{1,3}[,](\d{3}[,])*\d{3}(\.\d{1,3})?|\d{1,3}(\.\d+)?)$
I know you asked for a regex but I think it's much saner to just call double.TryParse() and consider your input acceptable if that method returns true.
double dummy;
var isValid=double.TryParse(text, out dummy);
It won't match your testcases exactly; the major difference being that it is very lenient with commas (so it will accept two of your INVALID inputs).
I'm not sure why you care, but if you really do want comma strictness you could do a preprocessing step where you only check the validity of comma placement and then call double.TryParse() only if the string passes the comma placement test. (If you want to be truly careful, you'll have to honor the CultureInfo so you can know what character is used for separators, and how many digits there are between separators, in the environment your program finds itself in)
Either approach results in code that is more "obviously right" than a regex. For example, you won't have to live with the fear that your regex left out some important case, like scientific notation.
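The preprocessing idea could be sketched like this. It is an illustration under stated assumptions, not a culture-aware implementation: the class and method names are hypothetical, and it hard-codes ',' as the group separator, '.' as the decimal separator and groups of three digits, which is exactly what a CultureInfo-aware version would look up instead.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class StrictNumber
{
    // Integer part with commas: 1-3 digits (not starting with 0),
    // then one or more groups of exactly three digits.
    private static readonly Regex GroupedInteger =
        new Regex(@"^[+-]?[1-9]\d{0,2}(,\d{3})+$");

    public static bool IsValid(string text)
    {
        int dot = text.IndexOf('.');
        string integerPart = dot >= 0 ? text.Substring(0, dot) : text;
        string fraction = dot >= 0 ? text.Substring(dot + 1) : "";

        // Step 1: comma placement. No commas after the decimal point,
        // and any commas before it must form proper groups.
        if (fraction.IndexOf(',') >= 0) return false;
        if (integerPart.IndexOf(',') >= 0 && !GroupedInteger.IsMatch(integerPart)) return false;

        // Step 2: let the framework decide whether the rest is a number.
        double unused;
        return double.TryParse(text,
            NumberStyles.Float | NumberStyles.AllowThousands,
            CultureInfo.InvariantCulture, out unused);
    }
}
```

Because the final decision is still made by double.TryParse, inputs such as scientific notation are accepted without any extra pattern work.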
I am using standard input and output to pass two base64 strings from one application to another. What would be the best way of separating them so that I can get them as two separate strings in the other application? I was thinking of using a simple comma to separate them and then just using
string[] s = output.Split(',');
Where output is the data I read in from standard output.
Example with the comma:
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCv5E5Y0Wrad5FrLjeUsA71Gipl3mhjIuCw1xhj
jDwXN87lIhpE32UvItf+mvp8flQ+fhi5H0PditDCzUFg8lXuiuOXxelLWEXA8hs7jc+4zzR5ps3R
fOv3M6H8K5XGkwWLhVNQX47sAGyY/43JdbfX7+FsYUFeHW/wa2yKSMZS3wIDAQAB
,HNJpFQyeyJoVbeEgkw/WNtzR0JTPIa1hlK1C8LbFcVcJfL33ssq3gbzi0zxn0n2WxBYKJZj2Kqbs
lVrmFbQJRgvq4ZNF4F8z+xjL9RVVE/rk5x243c3Szh05Phzx+IUyXJe6GkITDmsxcwovvzSaGhzU
3qQkNbhIN0fVyynpg0Kfm0WytuW71ku1eq45ibcczgwQLRJX1GKzC9wH7x/V36i6SpyrxZ/+uCIL
4QgnKt6x4QG7Gfk3Msam6h6JTFdzkeHJjq6JzeapdQn5LxeMY0jLGc4cadMCvy/Jdrcg02pG2wOO
/gJT77xvX+d1igi+BQ/YpFlwXI0BIuRwMAeLojmZdRYjJ+LY69auxgpnQvSF4A+Wc6Jo8m1pzzHB
yQvA8KyiRwbyijoBOsg+oK18UPFWeJ5hE3e+8l/WSEcii+oPgXyXTnK+seesGdOPeem3HukNyIps
L/StHZEkzeJFTr8LIB9HLqDikYU2mQjTiK5cIExoyy2Go+0ndL84rCzMZAlfFlffocL9x+SGyeer
M1mxmyDtmiQfDphEZixHOylciKUhWR00dhxkVRQ4Q9LYCeyGfDiewL+rm5se/ePCklWtTGycV9HM
H5vYLhgIkf5W6+XcqcJlE6vp4WWxmKHQYqRAdfW5MYWskx7jBDTMV2MLy7N6gQRQa/OpK8ruAbVf
MwWP1sGyhAxgrw/UxTH1tW498WI5JtQR3oub3+Uj5AqydhwzQtWM58WfVQXdv2bFZmGH7d9A+C95
DQ8QXKrV7Ot/wVq5KKLgpJy8iMe/G/iyXOmQhkLnZ3qvBaIJd+E2ZIVPty6XGMwgC4JebArr+a6V
Cb/SO+vR+eZmXLln/w==
All you have to do is use a separator that is not a valid Base64 character. The comma is not a Base64 character, so you can use it.
Base64 characters are [0-9a-zA-Z/=+] (digits, uppercase and lowercase letters, forward slash, plus sign and equals sign).
This seems like a good solution. The comma cannot be part of a base64 index table so it is a safe separator.
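A minimal round trip with a comma separator might look like this. The class and method names are hypothetical. Convert.FromBase64String ignores white space, so line breaks inside the encoded data (as in the example above) do not need to be stripped first.

```csharp
using System;

public static class Base64Pair
{
    // Sender side: encode both payloads and join them with a comma.
    public static string Join(byte[] first, byte[] second)
    {
        return Convert.ToBase64String(first) + "," + Convert.ToBase64String(second);
    }

    // Receiver side: split on the comma and decode each half.
    public static byte[][] Split(string output)
    {
        string[] parts = output.Split(',');
        if (parts.Length != 2)
            throw new FormatException("Expected exactly two comma-separated base64 strings.");

        return new[]
        {
            Convert.FromBase64String(parts[0]),
            Convert.FromBase64String(parts[1])
        };
    }
}
```

The length check catches truncated or concatenated output early, before a decoding error would surface somewhere less obvious.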
You can wrap it in some XML. A CDATA section is perfect for that.
Is there an easy way to take a dynamic decimal value and create a validation regular expression that can handle this?
For example, I know that /1[0-9]{1}[0-9]{1}/ should match anything from 100-199, so what would be the best way to programmatically create a similar structure given any decimal number?
I was thinking that I could just loop through each digit and build one from there, but I have no idea how I would go about that.
Ranges are difficult to handle correctly with regular expressions. REs are a tool for text-based analysis or pattern matching, not semantic analysis. The best that you can probably do safely is to recognize a string that is a number with a certain number of digits. You can build REs for the maximum or minimum number of digits for a range using a base 10 logarithm. For example, to match a number between a and b where b > a, construct the RE by:
re = "[1-9][0-9]{"
re += str(floor(log10(a)))
re += ","
re += str(floor(log10(b)))
re += "}"
Note: the example is in no particular programming language. Sorry, C# not really spoken here.
There are some boundary point issues, but the basic idea is to construct an RE like [1-9][0-9]{2} for anything between 100 and 999 and then, if the string matches the expression, convert to an integer and do the range analysis in value space instead of lexical space.
With all of that said... I would go with Mehrdad's solution and use something provided by the language like decimal.TryParse and then range check the result.
^[-]?\d+(\.\d+)?$
will validate a number with an optional decimal point and / or minus sign at the front
No, is the simple answer. Generating the regex that will work correctly would be more complicated than doing the following:
Decimal regex (find the decimal numbers in a string). "^\$?[+-]?[\d,]*(\.\d*)?$"
Convert result to decimal and compare to your range. (decimal.TryParse)
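Put together, the two steps could be sketched as below. The class and method names are hypothetical. The sketch assumes '.' as the decimal separator and strips the optional leading '$' by hand, because the invariant culture's currency symbol is not '$'.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class RangeCheck
{
    private static readonly Regex DecimalPattern =
        new Regex(@"^\$?[+-]?[\d,]*(\.\d*)?$");

    // Step 1: check the overall shape with the regex.
    // Step 2: convert to decimal and compare in value space.
    public static bool IsInRange(string input, decimal min, decimal max)
    {
        if (!DecimalPattern.IsMatch(input)) return false;

        decimal value;
        string number = input.TrimStart('$');
        if (!decimal.TryParse(number, NumberStyles.Number,
                CultureInfo.InvariantCulture, out value))
        {
            return false;   // e.g. the empty string, which the regex lets through
        }

        return value >= min && value <= max;
    }
}
```

For the 100-199 example from the question, RangeCheck.IsInRange("150", 100m, 199m) is true and RangeCheck.IsInRange("250", 100m, 199m) is false.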
This depends on where and what you want to parse.
Use the RegEx below to parse strings for numbers.
It can handle commas and dots.
[^\d.,](?<number>(\d{1,3}(\.\d{3})*,\d+|\d{1,3}(,\d{3})*\.\d+|\d*[,\.]\d+|\d+))[^\d.,]