CodePointAt equivalent in C#

I have this code in Java and it works fine:
String a = "ABC";
System.out.println(a.length());
for (int n = 0; n < a.length(); n++)
    System.out.println(a.codePointAt(n));
The output as expected is
3
65
66
67
I am a little confused about a.length() because it is supposed to return the length in chars, but a String must store every character below 256 in 16 bits, or whatever a Unicode character needs.
But the question is: how can I do the same in C#?
I need to scan a string and act depending on some unicode characters found.
The real code I need to translate is
String str = this.getString();
int cp;
boolean escaping = false;
for (int n = 0; n < str.length(); n++)
{
    //===================================================
    cp = str.codePointAt(n); //LOOKING FOR SOME EQUIVALENT IN C#
    //===================================================
    if (!escaping)
    {
        ....
//Closing all braces below.
Thanks in advance.
How much I love Java :). I just need to deliver a Windows app that is a client of a Java/Linux app server.

The exact translation would be this:
string a = "ABC⤶"; // Let's throw in a rare Unicode char
Console.WriteLine(a.Length);
for (int n = 0; n < a.Length; n++)
    Console.WriteLine((int)a[n]); // a[n] returns a char, which we can cast to an int
// final result: 4 65 66 67 10550
In C# you don't need codePointAt at all; you can get the Unicode value directly by casting the character to an int (or, in an assignment, it is cast implicitly). So you can get your cp simply by doing
cp = (int)str[n];
How much I love C# :)
However, this is valid only for characters in the Basic Multilingual Plane. Surrogate pairs are handled as two separate chars when you index the string, so they won't be printed as one value. If you really need to handle UTF-32, you can refer to this answer, which basically uses
int cp = Char.ConvertToUtf32(a, n);
after incrementing the loop by two when needed (because the character is encoded as two chars), using the Char.IsSurrogatePair() condition.
Your translation would then become
string a = "ABC\U0001F01C";
Console.WriteLine(a.Count(x => !char.IsHighSurrogate(x)));
for (var i = 0; i < a.Length; i += char.IsSurrogatePair(a, i) ? 2 : 1)
    Console.WriteLine(char.ConvertToUtf32(a, i));
Please note the change from a.Length to a little bit of LINQ for the count, because a surrogate pair is counted as two chars. We simply count how many characters are not high surrogates to get the real number of code points.
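As a minimal sketch of the difference (same sample string as above; assumes a using System.Linq directive is in scope):
string a = "ABC\U0001F01C";
Console.WriteLine(a.Length);                               // 5 : the astral character takes two UTF-16 code units
Console.WriteLine(a.Count(x => !char.IsHighSurrogate(x))); // 4 : the actual number of code points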

The following code gets the code point at each position of a string:
var s = "\uD834\uDD61";
for (var i = 0; i < s.Length; i += char.IsSurrogatePair(s, i) ? 2 : 1)
{
    var codepoint = char.ConvertToUtf32(s, i);
    Console.WriteLine("U+{0:X4}", codepoint);
}

Related

Binary of a number

Is there a simple way to convert decimal/ASCII 6-bit decimal numbers from 1 to 100 to a binary representation?
To be more specific, I'm interested in 6-bit binary ASCII. So I made this to get an Int32.
For example, "u" is changed to 61 instead of 117 in standard decimal ASCII.
Then this 61 needs to be "111101" instead of the traditional "01110101", but after this 48 + 8 math it's not important, as now it's normal binary, just with 6 bits used.
foreach (char c in partToDecode)
{
    var sum = c - 48;
    if (sum > 40)
    {
        sum = sum - 8;
    }
}
Found this, but I don't have a clue how to transpose it to C#:
void binary(unsigned n) {
    unsigned i;
    // Loop from the most significant bit down to the least significant
    for (i = 1u << 31; i > 0; i >>= 1)
        printf("%u", !!(n & i));
}
. . .
binary(65);
You can try Convert.ToString, e.g.
int source = 61;
// "111101"
string result = Convert.ToString(source, 2).PadLeft(6, '0');
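If it helps, here is a rough sketch combining the 48/8 offset math from your own snippet with Convert.ToString; partToDecode is just a stand-in for your input string:
string partToDecode = "u"; // hypothetical input
foreach (char c in partToDecode)
{
    int sum = c - 48;      // 'u' (117) -> 69
    if (sum > 40)
    {
        sum = sum - 8;     // 69 -> 61
    }
    // 6-bit binary with leading zeros: 61 -> "111101"
    Console.WriteLine(Convert.ToString(sum, 2).PadLeft(6, '0'));
}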

How to load a huuuuuge string into a BigInteger in C# and not lose the ASCII encoding

I am using BigInteger.Parse(some string) but it takes forever and I'm not even sure if it finishes.
However, I can convert the huge string to a byte array and jam the byte array into a BigInteger constructor in very little time but it munges the original number stored in the string because of the endian issue with BigInteger and byte arrays.
Is there a way to convert the string to a byte array and put the byte array into the BigInteger object while preserving the original number stored in ASCII in the string?
String s = "12345"; // Some huge string, millions of digits.
BigInteger bi = new BigInteger(Encoding.ASCII.GetBytes(s)); // very fast, but the 12345 is lost
// OR...
BigInteger bi = BigInteger.Parse(s); // Takes forever, therefore unusable.
The byte[] representation of BigInteger has little to do with the ASCII characters. Much like the byte representation of an int has little to do with the ASCII representation of it.
To parse the number, each character must be converted to the digit value, and added to the previously parsed value multiplied by 10. That is probably why it's taking so long, and any version you write will probably not perform better. It has to do something like:
var nr = 0;
foreach (var c in "123") nr = nr * 10 + (c - '0');
Edit
While it is not possible to perform the conversion by just converting to a byte array, the library implementation is slower than it has to be (at least for simple scenarios that do not need internationalization, for example). Using the trick suggested by Rudy Velthuis in the comments, and not taking hex formats or internationalization into account, I was able to produce a version which for 303104 characters runs ~5 times faster (from 18.2 s to 3.75 s). For 1 million digits the fast method takes 47 s; long, but it is a huge number:
public static class Helper
{
    static BigInteger[] factors = Enumerable.Range(0, 19).Select(i => BigInteger.Pow(10, i)).ToArray();

    public static BigInteger ParseFast(string str)
    {
        var result = new BigInteger(0);
        var n = str.Length;
        var hasSgn = str[0] == '-';
        int j;
        for (var i = hasSgn ? 1 : 0; i < n; i += j - i)
        {
            long gr = 0;
            for (j = i; j < i + 18 && j < n; j++)
            {
                gr = gr * 10 + (str[j] - '0');
            }
            result = result * factors[j - i] + gr;
        }
        if (hasSgn)
        {
            result = BigInteger.MinusOne * result;
        }
        return result;
    }
}
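Hypothetical usage, with hugeDigits standing in for the millions-of-digits string:
BigInteger bi = Helper.ParseFast(hugeDigits);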

Byte/char buffer to long and/or double

In my code I need to convert the string representation of integers to long and double values.
The string representation is a byte array (byte[]). For example, for the number 12345 the string representation is { 49, 50, 51, 52, 53 }.
Currently, I use the following obvious code for conversion to long (and almost the same code for conversion to double):
private long bytesToIntValue()
{
    string s = System.Text.Encoding.GetEncoding("Latin1").GetString(bytes);
    return long.Parse(s, CultureInfo.InvariantCulture);
}
This code works as expected, but in my case I want something better. It's because currently I must convert bytes to string first.
In my case, bytesToIntValue() gets called about 12 million times and about 25% of all memory allocations are made in this method.
Sure, I want to optimize this part. I want to perform conversions without intermediate string (+ speed, - allocation).
What would you recommend? How can I perform conversions without intermediate strings? Is there a faster method to perform conversions?
EDIT:
The byte arrays I am dealing with always contain ASCII-encoded data. Numbers can be negative. For double values exponential format is allowed. Hexadecimal integers are not allowed.
How can I perform conversions without intermediate strings?
Well you can easily convert each byte to a char. For example - untested:
private static long ConvertAsciiBytesToInt32(byte[] bytes)
{
    long value = 0;
    foreach (byte b in bytes)
    {
        value *= 10L;
        char c = (char)b; // conversion from byte; effectively ISO-8859-1
        if (c < '0' || c > '9')
        {
            throw new ArgumentException("Bytes contains non-digit: " + c);
        }
        value += (c - '0');
    }
    return value;
}
Note that this really does assume it's ASCII (or compatible) - if your byte array is actually UTF-16 (for example) then it will definitely do the wrong thing.
Also note that this doesn't perform any sort of length validation or overflow checking... and it doesn't cope with negative numbers. You could add all of these if you want, but we don't know enough about your requirements to know if it's worth adding the complexity.
I'm not sure that there is an easy way to do that.
Please note that it won't work with other encodings. The test on my computer showed that this is only about 3 times faster (I don't think it's worth it).
The code + test:
class MainClass
{
    public static void Main(string[] args)
    {
        string str = "12341234";
        byte[] buffer = Encoding.ASCII.GetBytes(str);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++)
        {
            long val = BufferToLong.GetValue(buffer);
        }
        Console.WriteLine(sw.ElapsedMilliseconds);
        sw.Restart();
        for (int i = 0; i < 1000000; i++)
        {
            string valStr = Encoding.ASCII.GetString(buffer);
            long val = long.Parse(valStr);
        }
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}
static class BufferToLong
{
    public static long GetValue(byte[] buffer)
    {
        long number = 0;
        foreach (byte currentByte in buffer)
        {
            char currentChar = (char)currentByte;
            int currentDigit = currentChar - '0';
            number *= 10;
            number += currentDigit;
        }
        return number;
    }
}
In the end, I created a C# version of the strtol function. This function comes with the CRT, and the source code of the CRT comes with Visual Studio.
The resulting method is almost the same as the code provided by Jon Skeet in his answer, but it also contains some checks for overflow.
In my case all the changes proved to be very useful in terms of speed and memory.
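For reference, a rough sketch of what such a strtol-style parser over the byte array might look like (this is a reconstruction, not the original code; the name ParseAsciiLong and the exact checks are illustrative only):
static long ParseAsciiLong(byte[] bytes)
{
    long value = 0;
    int i = 0;
    bool negative = bytes.Length > 0 && bytes[0] == (byte)'-';
    if (negative || (bytes.Length > 0 && bytes[0] == (byte)'+'))
        i = 1;
    for (; i < bytes.Length; i++)
    {
        int digit = bytes[i] - '0';
        if (digit < 0 || digit > 9)
            throw new FormatException("Unexpected byte: " + (char)bytes[i]);
        // Check for overflow before multiplying, much like the CRT strtol does.
        // (This sketch rejects long.MinValue itself, which a full strtol would accept.)
        if (value > (long.MaxValue - digit) / 10)
            throw new OverflowException();
        value = value * 10 + digit;
    }
    return negative ? -value : value;
}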

Convert int32 to string in base 16

I'm currently trying to convert a .NET JSON encoder to NETMF, but I have hit a problem with Convert.ToString() as there is no such thing in NETMF.
The original line of the encoder looks like this:
Convert.ToString(codepoint, 16);
And after looking at the documentation for Convert.ToString(Int32, Int32), it says it's for converting an Int32 into its string representation in base 2, 8, 10 or 16, by providing the int as the first parameter and the base as the second.
What are some low level code of how to do this or how would I go about doing this?
As you can see from the code, I only need conversion from an Int32 to base 16.
EDIT
Ah, the encoder also then wants to do:
PadLeft(4, '0');
on the string - is this just adding four '0' characters to the start of the string?
If you mean you want to change a 32-bit integer value into a string which shows the value in hexadecimal:
string hex = intValue.ToString("x");
For variations, please see Stack Overflow question Convert a number into the hex value in .NET.
Disclaimer: I'm not sure if this function exists in NETMF, but it is so fundamental that I think it should.
Here’s some sample code for converting an integer to hexadecimal (base 16):
int num = 48764; // assign your number

// Generate hexadecimal number in reverse.
var sb = new StringBuilder();
do
{
    sb.Append(hexChars[num & 15]);
    num >>= 4;
}
while (num > 0);

// Pad with leading 0s for a minimum length of 4 characters.
while (sb.Length < 4)
    sb.Append('0');

// Reverse string and get result.
char[] chars = new char[sb.Length];
sb.CopyTo(0, chars, 0, sb.Length);
Array.Reverse(chars);
string result = new string(chars);
PadLeft(4, '0') prepends leading 0s to the string to ensure a minimum length of 4 characters.
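For example:
Console.WriteLine("5b".PadLeft(4, '0'));    // "005b"
Console.WriteLine("12345".PadLeft(4, '0')); // "12345" - already longer than 4, so it is left unchanged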
The hexChars value lookup may be trivially defined as a string:
internal static readonly string hexChars = "0123456789ABCDEF";
Edit: Replacing StringBuilder with List<char>:
// Generate hexadecimal number in reverse.
List<char> builder = new List<char>();
do
{
    builder.Add(hexChars[num & 15]);
    num >>= 4;
}
while (num > 0);

// Pad with leading 0s for a minimum length of 4 characters.
while (builder.Count < 4)
    builder.Add('0');

// Reverse string and get result.
char[] chars = new char[builder.Count];
for (int i = 0; i < builder.Count; ++i)
    chars[i] = builder[builder.Count - i - 1];
string result = new string(chars);
Note: Refer to the “Hexadecimal Number Output” section of Expert .NET Micro Framework for a discussion of this conversion.

how to loop through the digits of a binary number?

I have a binary number, 1011011; how can I loop through all of its binary digits one after the other?
I know how to do this for decimal integers by using modulo and division.
int n = 0x5b; // 1011011
Really you should just do this; hexadecimal is in general a much better representation:
printf("%x", n); // this prints "5b"
To get it in binary (with emphasis on easy understanding), try something like this:
printf("%s", "0b"); // common prefix to denote that binary follows
bool leading = true; // we're still reading leading zeroes
// go from the most significant bit down to the least significant
for (int i = sizeof(n) * CHAR_BIT - 1; i >= 0; --i) {
    int bit = (n >> i) & 1;
    if (bit) leading = false; // once we see a 1, we are no longer reading leading zeroes
    if (!leading)
        printf("%d", bit);
}
if (leading) // all bits were zero, so just print 0
    printf("0");
// at this point, for n = 0x5b, we'll have printed 0b1011011
You can use modulo and division by 2 exactly like you would in base 10. You can also use binary operators, but if you already know how to do it in base 10, it would be easier to just use division and modulo.
Expanding on Frédéric and Gabi's answers, all you need to do is realise that the rules in base 2 are no different from those in base 10 - you just need to do your division and modulus with a divisor of 2 instead of 10.
The next step is simply to use number >> 1 instead of number / 2 and number & 0x1 instead of number % 2 to improve performance. Mind you, with modern optimising compilers there's probably no difference...
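As a small illustration in C# (this prints the digits least significant first; the shift/mask variant behaves the same for non-negative values):
int number = 0x5b; // 1011011 in binary
while (number > 0)
{
    int digit = number % 2; // or: number & 0x1
    Console.Write(digit);
    number /= 2;            // or: number >>= 1
}
// prints 1101101, i.e. the bits of 1011011 read from least significant to most significant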
Use an AND with increasing powers of two...
In C, at least, you can do something like:
while (val != 0)
{
    printf("%d", val & 0x1);
    val = val >> 1;
}
To expand on @Marco's answer with an example:
uint value = 0x82fa9281;
for (int i = 0; i < 32; i++)
{
    bool set = (value & 0x1) != 0;
    value >>= 1;
    Console.WriteLine("Bit set: {0}", set);
}
What this does is test the last bit, and then shift everything one bit.
If you're already starting with a string, you could just iterate through each of the characters in the string:
var values = "1011011".Reverse().ToArray(); // requires System.Linq
for (var index = 0; index < values.Length; index++)
{
    var isSet = values[index] == '1'; // Boolean.Parse only works on "true"/"false", not 0/1
    // do whatever
}
byte input = Convert.ToByte("1011011", 2);
BitArray arr = new BitArray(new[] { input });
foreach (bool value in arr)
{
// ...
}
You can simply loop through every bit. The following C-like pseudocode lets you pick the bit number you want to check. (You might also want to google endianness.)
for (int bitnumber = 7; bitnumber >= 0; bitnumber--) // or whichever bit numbers you need
{
    printf("%d", (val & (1 << bitnumber)) ? 1 : 0);
}
The code basically writes 1 if the bit is set or 0 if not. We shift the value 1 (which in binary is 1 ;) ) left by bitnumber bits and then we AND it with val to see whether that bit is set. Simple as that!
So if bitnumber is 2 we simply do this:
00000100 (the value 1 shifted left by 2)
AND
10110110 (whatever your value is)
=
00000100 = true! Both values have bit 2 set!
