When you generate a GUID you get something like aaaef973-d8ce-4c92-95b4-3635bb2d42d5.
Is it always the same? Is it always going to have the following format:
8 chars "-" 4 chars "-" 4 chars "-" 4 chars "-" 12 chars?
I'm asking because I need to convert a GUID without "-" to a GUID with "-" and vice versa.
No; there are other formats, such as the format you listed except with braces. There are also more complex formats. Here are some of the formats MSDN lists:
UUID formats
32 digits: 00000000000000000000000000000000 (N)
32 digits separated by hyphens: 00000000-0000-0000-0000-000000000000 (D)
32 digits separated by hyphens, enclosed in braces: {00000000-0000-0000-0000-000000000000} (B)
32 digits separated by hyphens, enclosed in parentheses: (00000000-0000-0000-0000-000000000000) (P)
Four hexadecimal values enclosed in braces, where the fourth value is a subset of eight hexadecimal values that is also enclosed in braces: {0x00000000,0x0000,0x0000,{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00}} (X)
—MSDN
You should simply rely upon it being 32 hexadecimal characters; there can be a variety of ways to present it. Check the Wikipedia article for more information, including a description of how they are commonly written.
For your conversion you should really rely on the static Guid.Parse() methods. Using a mix of your example and the ones in icktoofay's answer, this works nicely:
var z = Guid.Parse("aaaef973-d8ce-4c92-95b4-3635bb2d42d5");
z = Guid.Parse("{aaaef973-d8ce-4c92-95b4-3635bb2d42d5}");
z = Guid.Parse("{0x00000000,0x0000,0x0000,{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00}}");
Then, for outputting them with or without hyphens etc., you can use the Guid.ToString() method with one of the established format codes.
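For example, a rough sketch of round-tripping between the hyphenless and hyphenated forms (the format codes correspond to the MSDN list above):
var g = Guid.Parse("aaaef973d8ce4c9295b43635bb2d42d5"); // Parse accepts the hyphenless "N" form too
Console.WriteLine(g.ToString("N")); // aaaef973d8ce4c9295b43635bb2d42d5
Console.WriteLine(g.ToString("D")); // aaaef973-d8ce-4c92-95b4-3635bb2d42d5
Console.WriteLine(g.ToString("B")); // {aaaef973-d8ce-4c92-95b4-3635bb2d42d5}
Console.WriteLine(g.ToString("P")); // (aaaef973-d8ce-4c92-95b4-3635bb2d42d5)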
Most of the time, GUIDs are displayed as 32 hexadecimal digits, such as {21EC2020-3AEA-1069-A2DD-08002B30309D} (unless they're encoded in Base64), and they are usually stored as 128-bit integers. They won't always have hyphens, though.
In some legacy .NET code I've come across a number of custom numeric format strings like this:
###,##0.00
What is the difference between this and:
#,#0.00
?
EDIT:
Here are some example inputs I've tried, all of which yield the same result for both masks: 1000000, 1000, 100, 10, 1.456, -30000, 0.002.
EDIT:
@Sahuagin suggested that these masks could be the same because the culture is set to group digits in threes. However, even using this I can't demonstrate a difference:
var culture = new CultureInfo("en-US");
culture.NumberFormat.NumberGroupSizes = new[] { 1, 2, 3, 4 };
culture.NumberFormat.NumberGroupSizes.Dump();
1234567890.ToString("#,#0.00", culture).Dump(); // 1234,567,89,0.00
1234567890.ToString("###,#0.00", culture).Dump(); // 1234,567,89,0.00
More generally, I understand that # is an "optional" digit which won't create leading or trailing zeroes. However, it seems like just a single # before the decimal point is enough to get all leading digits. The MSDN docs seem to differentiate between # and ## but the explanation doesn't make much sense to me and I haven't found an example where it makes a difference.
# indicates a place where a digit will appear, if one exists in the number. Even if there aren't enough # symbols to cover every digit, the leading digits automatically appear (though you must have at least one # or one 0, or no number will appear at all), but the placeholders are still useful in some kinds of formats, for example a telephone number:
var value = 1234567890;
Console.WriteLine("{0:###-###-####}", value);
// outputs 123-456-7890
In your example of #,#0.00, I think that only manages to still format correctly (with groups of three) because , is a special grouping symbol, and the culture info is set to group digits in threes. Without that, you would get something like 123-45.67, if you used a - instead of a , for example.
Here is more specific information about , from MSDN:
The "," character serves as both a group separator and a number scaling specifier.
Group separator: If one or more commas are specified between two digit placeholders (0 or #) that format the integral digits of a number, a group separator character is inserted between each number group in the integral part of the output.
The NumberGroupSeparator and NumberGroupSizes properties of the current NumberFormatInfo object determine the character used as the number group separator and the size of each number group. For example, if the string "#,#" and the invariant culture are used to format the number 1000, the output is "1,000".
Number scaling specifier: If one or more commas are specified immediately to the left of the explicit or implicit decimal point, the number to be formatted is divided by 1000 for each comma. For example, if the string "0,," is used to format the number 100 million, the output is "100".
So in your first example of ###,##0.00, it could probably be reduced to #,0.00, if desired, although #,##0.00 is what I usually use since it is much more clear.
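To make the quoted rules concrete, here's a quick sketch using the invariant culture (comma group separator, three-digit groups); the outputs follow directly from the MSDN examples above:
var inv = CultureInfo.InvariantCulture; // System.Globalization
int n = 1234567890;
Console.WriteLine(n.ToString("#,#0.00", inv));          // 1,234,567,890.00
Console.WriteLine(n.ToString("###,##0.00", inv));       // 1,234,567,890.00 (identical)
int thousand = 1000;
Console.WriteLine(thousand.ToString("#,#", inv));       // 1,000
int hundredMillion = 100000000;
Console.WriteLine(hundredMillion.ToString("0,,", inv)); // 100 (divided by 1000 twice by the scaling commas)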
I came across this line of code today:
int c = (int)'c';
I was not aware you could cast a char to an int. So I tested it out, and found that a=97, b=98, c=99, d=100 etc etc...
Why is 'a' 97? What do those numbers relate to?
Everyone else (so far) has referred to ASCII. That's a very limited view - it works for 'a', but doesn't work for anything with an accent etc - which can very easily be represented by char.
A char is just an unsigned 16-bit integer, which is a UTF-16 code unit. Usually that's equivalent to a Unicode character, but not always - sometimes multiple code units are required for a single full character. See the documentation for System.Char for more details.
The implicit conversion from char to int (you don't need the cast in your code) just converts that 16-bit unsigned integer to a 32-bit signed integer in the natural, non-lossy way - just as if you had a ushort.
Note that every valid character in ASCII has the same value in UTF-16, which is why the two are often confused when the examples are only ones from the ASCII set.
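As a small illustration of that point (the code-unit values come from the Unicode charts: 'é' is U+00E9, 'அ' is U+0B85):
char ascii = 'a';     // U+0061: same value in ASCII and UTF-16
char accented = 'é';  // U+00E9: not representable in 7-bit ASCII
char tamil = 'அ';     // U+0B85
int a = ascii;        // implicit conversion, no cast required
int e = accented;
int t = tamil;
Console.WriteLine(a); // 97
Console.WriteLine(e); // 233
Console.WriteLine(t); // 2949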
97 is the UTF-16 code unit value of the letter a.
Basically, this number is the value of the UTF-16 code unit for the given character.
These are the ASCII values representing the characters.
They are the decimal representations of their ASCII counterparts:
http://www.asciitable.com/index/asciifull.gif
so 'a' would be 97
They are character codes, commonly known as ASCII values.
Technically, though, the character codes are not ASCII.
I saw this post on Jon Skeet's blog where he talks about string reversing. I wanted to try the example he showed myself, but it seems to work... which leads me to believe that I have no idea how to create a string that contains a surrogate pair which will actually cause the string reversal to fail. How does one actually go about creating a string with a surrogate pair in it so that I can see the failure myself?
The simplest way is to use \U######## where the U is capital, and the # denote exactly eight hexadecimal digits. If the value exceeds 0000FFFF hexadecimal, a surrogate pair will be needed:
string myString = "In the game of mahjong \U0001F01C denotes the Four of circles";
You can check myString.Length to see that the one Unicode character occupies two .NET Char values. Note that the char type has a couple of static methods that will help you determine if a char is a part of a surrogate pair.
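For example, a rough sketch using those static helpers (\uD83C and \uDC1C are the two surrogate halves of U+1F01C):
string myString = "In the game of mahjong \U0001F01C denotes the Four of circles";
int index = myString.IndexOf('\uD83C');                                         // position of the high surrogate
Console.WriteLine(char.IsHighSurrogate(myString[index]));                       // True
Console.WriteLine(char.IsLowSurrogate(myString[index + 1]));                    // True
Console.WriteLine(char.IsSurrogatePair(myString[index], myString[index + 1]));  // True
Console.WriteLine(char.ConvertToUtf32(myString, index).ToString("X"));          // 1F01C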
If you use a .NET language that does not have something like the \U######## escape sequence, you can use the method ConvertFromUtf32, for example:
string fourCircles = char.ConvertFromUtf32(0x1F01C);
Addition: If your C# source file has an encoding that allows all Unicode characters, like UTF-8, you can just put the character directly in the file (by copy-paste). For example:
string myString = "In the game of mahjong 🀜 denotes the Four of circles";
The character is UTF-8 encoded in the source file (in my example) but will be UTF-16 encoded (surrogate pairs) when the application runs and the string is in memory.
(Not sure if Stack Overflow software handles my mahjong character correctly. Try clicking "edit" to this answer and copy-paste from the text there, if the "funny" character is not here.)
The term "surrogate pair" refers to a means of encoding Unicode characters with high code-points in the UTF-16 encoding scheme (see this page for more information);
In the Unicode character encoding, characters are mapped to values between 0x000000 and 0x10FFFF. Internally, a UTF-16 encoding scheme is used to store strings of Unicode text in which two-byte (16-bit) code sequences are considered. Since two bytes can only contain the range of characters from 0x0000 to 0xFFFF, some additional complexity is used to store values above this range (0x010000 to 0x10FFFF).
This is done using pairs of code points known as surrogates. The surrogate characters are classified in two distinct ranges known as low surrogates and high surrogates, depending on whether they are allowed at the start or the end of the two-code sequence.
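As a rough sketch of the mechanics (this is the standard UTF-16 algorithm, not anything specific to .NET): subtract 0x10000 from the code point, put the top 10 bits in a high surrogate (0xD800-0xDBFF) and the bottom 10 bits in a low surrogate (0xDC00-0xDFFF):
int codePoint = 0x1F01C;                    // outside the BMP
int v = codePoint - 0x10000;                // a 20-bit value
char high = (char)(0xD800 + (v >> 10));     // 0xD83C
char low = (char)(0xDC00 + (v & 0x3FF));    // 0xDC1C
string pair = new string(new[] { high, low });
Console.WriteLine(pair == char.ConvertFromUtf32(codePoint)); // True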
Try this yourself:
String surrogate = "abc" + Char.ConvertFromUtf32(Int32.Parse("2A601", NumberStyles.HexNumber)) + "def";
Char[] surrogateArray = surrogate.ToCharArray();
Array.Reverse(surrogateArray);
String surrogateReversed = new String(surrogateArray);
or this, if you want to stick with the blog example:
String surrogate = "Les Mise" + Char.ConvertFromUtf32(Int32.Parse("0301", NumberStyles.HexNumber)) + "rables";
Char[] surrogateArray = surrogate.ToCharArray();
Array.Reverse(surrogateArray);
String surrogateReversed = new String(surrogateArray);
and then check the string values with the debugger. Jon Skeet is damn right... strings and dates seem easy but they are absolutely NOT.
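If you'd rather check in the console than the debugger, here's a rough sketch building on the first snippet above (the U+2A601 one); after the reversal the low surrogate ends up before the high surrogate, which is invalid UTF-16:
foreach (char ch in surrogateReversed)
{
    Console.Write("{0:X4} ", (int)ch); // 0066 0065 0064 DE01 D869 0063 0062 0061
}
Console.WriteLine();
Console.WriteLine(char.IsLowSurrogate(surrogateReversed[3])); // True - but it now comes before its high surrogate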
I need for this to work in a single format statement and to work for both ints and decimals:
For example:
int myInt=0;
decimal myDecimal=0.0m;
// ... Some other code
string formattedResult1=String.Format("{0}",myInt);
string formattedResult2=String.Format("{0}",myDecimal);
The expected results are:
"" (i.e., string.Empty) if the item to be formatted is zero
and a numeric value (e.g., "123.456" for the decimal version) if it isn't.
I need for this to occur exclusively as a result of the format specification in the format string.
This should do:
string formattedResult1 = string.Format("{0:0.######;-0.######;\"\"}", myInt);
The colon introduces a numeric format string. The numeric format string is divided into three sections by semicolons: the first section is for positive numbers, the second for negative numbers, and the third for zeros. To define a blank string for the zero section you need to delimit it with double quotes, otherwise the format parser doesn't like it.
See MSDN for the full syntax.
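For example, a quick sketch applying that format to both an int and a decimal (outputs assume a culture with "." as the decimal separator):
int myInt = 0;
decimal myDecimal = 123.456m;
string fmt = "{0:0.######;-0.######;\"\"}";
Console.WriteLine(string.Format(fmt, myInt));     // prints an empty line (the "" zero section)
Console.WriteLine(string.Format(fmt, myDecimal)); // 123.456
Console.WriteLine(string.Format(fmt, -0.5m));     // -0.5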
Based on the accepted answer above, I have done the same thing in Microsoft Report Builder.
This worked for me (shows 2 decimal places, blank for zero):
#,##0.00;-#,##0.00;""
How do I get the numeric value of a Unicode character in C#?
For example, if the Tamil character அ (U+0B85) is given, the output should be 2949 (i.e. 0x0B85).
See also
C++: How to get decimal value of a unicode character in c++
Java: How can I get a Unicode character's code?
Multi code-point characters
Some characters require multiple code points. In these examples, each code point is still in the Basic Multilingual Plane, so each one is a single UTF-16 code unit:
(i.e. U+0072 U+0327 U+030C)
(i.e. U+0072 U+0338 U+0327 U+0316 U+0317 U+0300 U+0301 U+0302 U+0308 U+0360)
The larger point being that one "character" can require more than 1 UTF-16 code unit; it can require more than 2, or even more than 3.
The larger point being that one "character" can require dozens of Unicode code points. In UTF-16 in C# that means more than 1 char. One character can require 17 char values.
My question was about converting a char into a UTF-16 encoding value. Even if an entire string of 17 char values only represents one "character", I still want to know how to convert each UTF-16 code unit into a numeric value.
e.g.
String s = "அ";
int i = Unicode(s[0]);
Where Unicode returns the integer value, as defined by the Unicode standard, for the first character of the input expression.
It's basically the same as Java. If you've got it as a char, you can just convert to int implicitly:
char c = '\u0b85';
// Implicit conversion: char is basically a 16-bit unsigned integer
int x = c;
Console.WriteLine(x); // Prints 2949
If you've got it as part of a string, just get that single character first:
string text = GetText();
int x = text[2]; // Or whatever...
Note that characters not in the basic multilingual plane will be represented as two UTF-16 code units. There is support in .NET for finding the full Unicode code point, but it's not simple.
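A rough sketch of the non-BMP case (char.ConvertToUtf32 is the helper that understands surrogate pairs; U+1F01C is just an assumed example):
string astral = "\U0001F01C";                      // one code point, two chars
Console.WriteLine(astral.Length);                  // 2
Console.WriteLine((int)astral[0]);                 // 55356 (0xD83C - just the high surrogate)
Console.WriteLine(char.ConvertToUtf32(astral, 0)); // 127004 (0x1F01C - the actual code point)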
((int)'அ').ToString()
If you have the character as a char, you can cast that to an int, which will represent the character's numeric value. You can then print that out in any way you like, just like with any other integer.
If you wanted hexadecimal output instead, you can use:
((int)'அ').ToString("X4")
X is for hexadecimal, 4 is for zero-padding to four characters.
How do I get the numeric value of a Unicode character in C#?
A char is not necessarily the whole Unicode code point. In UTF-16 encoded languages such as C#, you may actually need 2 chars to represent a single "logical" character. And your string lengths might not be what you expect; the MSDN documentation for the String.Length property says:
"The Length property returns the number of Char objects in this instance, not the number of Unicode characters."
So, if your Unicode character is encoded in just one char, it is already numeric (essentially an unsigned 16-bit integer). You may want to cast it to some of the integer types, but this won't change the actual bits that were originally present in the char.
If your Unicode character is 2 chars, you'll need to multiply one by 2^16 and add it to the other, resulting in a uint numeric value:
char c1 = ...;
char c2 = ...;
uint c = ((uint)c1 << 16) | c2;
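A quick sketch with a concrete non-BMP character (U+1F01C, chosen as an example); note that this packs the two UTF-16 code units into one uint, which is not the same number as the Unicode code point:
string s = char.ConvertFromUtf32(0x1F01C); // two chars: 0xD83C and 0xDC1C
char c1 = s[0];
char c2 = s[1];
uint packed = ((uint)c1 << 16) | c2;
Console.WriteLine(packed.ToString("X"));        // D83CDC1C
Console.WriteLine(char.ConvertToUtf32(c1, c2)); // 127004 (0x1F01C), if the code point is what you actually want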
How do I get the decimal value of a Unicode character in C#?
When you say "decimal", this usually means a character string containing only characters that a human being would interpret as decimal digits.
If you can represent your Unicode character by only one char, you can convert it to decimal string simply by:
char c = 'அ';
string s = ((ushort)c).ToString();
If you have 2 chars for your Unicode character, convert them to a uint as described above, then call uint.ToString.
--- EDIT ---
AFAIK diacritical marks are considered separate "characters" (and separate code points) despite being visually rendered together with the "base" character. Each of these code points taken alone is still at most 2 UTF-16 code units.
BTW I think the proper name for what you are talking about is not "character" but "combining character". So yes, a single combining character can have more than 1 code point and therefore more than 2 code units. If you want a decimal representation of such a combining character, you can probably do it most easily through BigInteger:
string c = "\x0072\x0338\x0327\x0316\x0317\x0300\x0301\x0302\x0308\x0360";
string s = (new BigInteger(Encoding.Unicode.GetBytes(c))).ToString();
Depending on which order of significance you want for the code-unit "digits", you may want to reverse c first.
char c = 'அ';
short code = (short)c;    // 2949; note that a cast to short would go negative for chars above U+7FFF
ushort code2 = (ushort)c; // 2949
This is an example of using Plane 1, the Supplementary Multilingual Plane (SMP):
string single_character = "\U00013000"; // the first Egyptian hieroglyph, given in hex
// in UTF-16 it is encoded as 4 bytes (a surrogate pair) instead of 2
// get the Unicode code point using UTF-32 (a fixed 4-byte encoding)
Encoding enc = new UTF32Encoding(false, true, true); // little-endian, throws on invalid data
byte[] b = enc.GetBytes(single_character);
Int32 code = BitConverter.ToInt32(b, 0); // the code point, in decimal
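As a quick sanity check (assuming a little-endian machine, which matches the non-big-endian UTF32Encoding above):
Console.WriteLine(code);                                     // 77824
Console.WriteLine(code.ToString("X"));                       // 13000
Console.WriteLine(char.ConvertToUtf32(single_character, 0)); // 77824 - same value without the intermediate byte[]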