I've been stumped on this for a few days now. My ultimate goal is to take the following C++ snippet and port it to C#. This may not be possible; I don't normally work with C++, so I'm hoping for some guidance. Here is the basic method:
char* EncryptStr(char* pszPlainRec, char* pszEncrKey, int iStartPos, int iLength)
{
    int i;
    for (i = iStartPos; i < iStartPos + iLength; i++)
    {
        pszPlainRec[i] = (~pszPlainRec[i]) ^ pszEncrKey[i - iStartPos];
    }
    return pszPlainRec;
}
Prior to the complement operation:
pszPlainRec[8] = '0' = 48
pszEncrKey[0] = 'T' = 84
I get that the complement operator ~ turns the value negative, but how is it generating the right angle bracket '>' from a negative value? If this is some weird voodoo C++ magic, is there an equivalent way in C# or VB.NET? Ideally I'd like to return a string, but I can't parse a negative number to a char in C#.
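For what it's worth, here is a minimal C# sketch of the same transformation, assuming the data is treated as raw bytes rather than .NET chars (parameter names adapted from the C++; untested):
static byte[] EncryptStr(byte[] plainRec, byte[] encrKey, int startPos, int length)
{
    for (int i = startPos; i < startPos + length; i++)
    {
        // ~ and ^ promote to int; the unchecked cast back to byte keeps
        // only the low 8 bits, matching the C++ store into a char
        plainRec[i] = unchecked((byte)(~plainRec[i] ^ encrKey[i - startPos]));
    }
    return plainRec;
}
A string can then be recovered with, e.g., Encoding.GetEncoding("Latin1").GetString(...), assuming that matches whatever single-byte encoding the C++ side used.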
I have this code in Java and it works fine:
String a = "ABC";
System.out.println(a.length());
for (int n = 0; n < a.length(); n++)
System.out.println(a.codePointAt(n));
The output as expected is
3
65
66
67
I am a little confused about a.length(), because it is supposed to return the length in chars, but String must store every char (even those below 256) in the 16 bits, or whatever, a Unicode character needs.
But the question is: how can I do the same in C#?
I need to scan a string and act depending on some Unicode characters found in it.
The real code I need to translate is
String str = this.getString();
int cp;
boolean escaping = false;
for (int n = 0; n < len; n++)
{
    //===================================================
    cp = str.codePointAt(n); // LOOKING FOR SOME EQUIVALENT IN C#
    //===================================================
    if (!escaping)
    {
        ....
// Closing all braces below.
Thanks in advance.
How much I love Java :). I just need to deliver a Windows app that is a client of a Java/Linux app server.
The exact translation would be this:
string a = "ABC⤶"; //Let's throw in a rare unicode char
Console.WriteLine(a.Length);
for (int n = 0; n < a.Length; n++)
Console.WriteLine((int)a[n]); //a[n] returns a char, which we can cast in an integer
//final result : 4 65 66 68 10550
In C# you don't need codePointAt at all; you can get the Unicode number directly by casting the character to an int (or in an assignment, it is cast implicitly). So you can get your cp simply by doing
cp = (int)str[n];
How much I love C# :)
However, this is valid only for low Unicode values. Surrogate pairs are handled as two different characters when you break the string down, so they won't be printed as one value. If you really need to handle UTF32, you can refer to this answer, which basically uses
int cp = Char.ConvertToUtf32(a, n);
incrementing the loop index by two when the Char.IsSurrogatePair() condition holds (because the code point is encoded in two chars).
Your translation would then become
string a = "ABC\U0001F01C";
Console.WriteLine(s.Count(x => !char.IsHighSurrogate(x)));
for (var i = 0; i < a.Length; i += char.IsSurrogatePair(a, i) ? 2 : 1)
Console.WriteLine(char.ConvertToUtf32(a, i));
Please note the change from a.Length to a little bit of LINQ for the count, because surrogate pairs are counted as two chars. We simply count how many chars are not high surrogates to get the true count of actual characters. (This needs using System.Linq;.)
The following code gets the code point of a part of a string:
var s = "\uD834\uDD61";
for (var i = 0; i < s.Length; i += char.IsSurrogatePair(s, i) ? 2 : 1)
{
var codepoint = char.ConvertToUtf32(s, i);
Console.WriteLine("U+{0:X4}", codepoint);
}
In my code I need to convert the string representation of integers to long and double values.
The string representation is a byte array (byte[]). For example, for the number 12345 the string representation is { 49, 50, 51, 52, 53 }.
Currently, I use the following obvious code for the conversion to long (and almost the same code for the conversion to double):
private long bytesToIntValue()
{
    string s = System.Text.Encoding.GetEncoding("Latin1").GetString(bytes);
    return long.Parse(s, CultureInfo.InvariantCulture);
}
This code works as expected, but I want something better, because it forces me to convert the bytes to a string first.
In my case, bytesToIntValue() gets called about 12 million times, and about 25% of all memory allocations are made in this method.
Sure, I want to optimize this part. I want to perform the conversion without the intermediate string (more speed, fewer allocations).
What would you recommend? How can I perform conversions without intermediate strings? Is there a faster method to perform conversions?
EDIT:
The byte arrays I am dealing with always contain ASCII-encoded data. Numbers can be negative. For double values the exponential format is allowed. Hexadecimal integers are not allowed.
How can I perform conversions without intermediate strings?
Well you can easily convert each byte to a char. For example - untested:
private static long ConvertAsciiBytesToInt64(byte[] bytes)
{
    long value = 0;
    foreach (byte b in bytes)
    {
        value *= 10L;
        char c = (char)b; // Effectively ISO-8859-1
        if (c < '0' || c > '9')
        {
            throw new ArgumentException("Bytes contains non-digit: " + c);
        }
        value += (c - '0');
    }
    return value;
}
Note that this really does assume it's ASCII (or compatible) - if your byte array is actually UTF-16 (for example) then it will definitely do the wrong thing.
Also note that this doesn't perform any sort of length validation or overflow checking... and it doesn't cope with negative numbers. You could add all of these if you want, but we don't know enough about your requirements to know if it's worth adding the complexity.
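For example, a sign check up front and the checked keyword would cover two of those gaps; a sketch under the same assumptions (ASCII data, untested):
private static long ConvertSignedAsciiBytesToInt64(byte[] bytes)
{
    if (bytes.Length == 0)
    {
        throw new ArgumentException("Empty input");
    }
    long sign = 1;
    int start = 0;
    if (bytes[0] == (byte)'-') // optional leading minus sign
    {
        sign = -1;
        start = 1;
    }
    long value = 0;
    for (int i = start; i < bytes.Length; i++)
    {
        char c = (char)bytes[i];
        if (c < '0' || c > '9')
        {
            throw new ArgumentException("Bytes contains non-digit: " + c);
        }
        checked // throw OverflowException instead of silently wrapping
        {
            value = value * 10 + (c - '0');
        }
    }
    // Note: long.MinValue itself is rejected by this sketch, since its
    // magnitude does not fit in a positive long.
    return sign * value;
}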
I'm not sure that there is an easy way to do that.
Please note that it won't work with other encodings. The test on my computer showed that this is only about 3 times faster (I don't think it's worth it).
The code + test:
using System;
using System.Diagnostics;
using System.Text;

class MainClass
{
    public static void Main(string[] args)
    {
        string str = "12341234";
        byte[] buffer = Encoding.ASCII.GetBytes(str);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++)
        {
            long val = BufferToLong.GetValue(buffer);
        }
        Console.WriteLine(sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < 1000000; i++)
        {
            string valStr = Encoding.ASCII.GetString(buffer);
            long val = long.Parse(valStr);
        }
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
}
static class BufferToLong
{
    public static long GetValue(byte[] buffer)
    {
        long number = 0;
        foreach (byte currentByte in buffer)
        {
            char currentChar = (char)currentByte;
            int currentDigit = currentChar - '0';
            number *= 10;
            number += currentDigit;
        }
        return number;
    }
}
In the end, I created a C# version of the strtol function. That function ships with the CRT, and the CRT source code comes with Visual Studio.
The resulting method is almost the same as the code provided by @Jon Skeet in his answer, but it also contains some checks for overflow.
In my case all the changes proved to be very useful in terms of speed and memory.
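The strtol-style overflow check is, in essence, the following (a sketch of the idea, not the actual ported code; the names are mine, not the CRT's):
static class StrtolStyle
{
    // long.MaxValue == 9223372036854775807, so:
    const long Cutoff = long.MaxValue / 10; // 922337203685477580
    const long Cutlim = long.MaxValue % 10; // 7

    // Reject the next digit if value * 10 + digit would exceed long.MaxValue,
    // tested before the multiplication happens (as CRT strtol does).
    public static long AccumulateDigit(long value, int digit)
    {
        if (value > Cutoff || (value == Cutoff && digit > Cutlim))
        {
            throw new OverflowException("Value does not fit in an Int64.");
        }
        return value * 10 + digit;
    }
}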
C++ and Java both include the [-]0xh.hhhhp+/-d format in the syntax of the language; other languages like Python and C99 have library support for parsing these strings (float.fromhex, scanf).
I have not yet found a way to parse this exact hex-encoded exponential format in C# or using the .NET libraries.
Is there a good way to handle this, or a decent alternative encoding? (decimal encoding is not exact).
Example strings:
0x1p-8
-0xfe8p-12
Thank you
Unfortunately, I don't know of any method built-in to .NET that compares to Python's float.fromhex(). So I suppose the only thing you can do is roll your own .fromhex() in C#. This task can range in difficulty from "Somewhat Easy" to "Very Difficult" depending on how complete and how optimized you'd like your solution to be.
Officially, the IEEE 754 spec allows for a fractional part within the hexadecimal coefficient (e.g. 0xf.e8p-12), which adds a layer of complexity for us, since (much to my frustration) .NET also does not support Double.Parse() for hexadecimal strings.
If you can constrain the problem to examples like you've provided where you only have integers as the coefficient, you can use the following solution using string operations:
public static double Parsed(string hexVal)
{
    // Requires using System.Globalization; for NumberStyles.
    int index = 0;
    int sign = 1;
    double exponent = 0d;

    // Check sign
    if (hexVal[index] == '-')
    {
        sign = -1;
        index++;
    }
    else if (hexVal[index] == '+')
        index++;

    // Consume 0x
    if (hexVal[index] == '0')
    {
        if (hexVal[index + 1] == 'x' || hexVal[index + 1] == 'X')
            index += 2;
    }

    int coeff_start = index;
    int coeff_end = hexVal.Length - coeff_start; // length of the coefficient

    // Check for exponent
    int p_index = hexVal.IndexOfAny(new char[] { 'p', 'P' });
    if (p_index == 0)
        throw new FormatException("No Coefficient");
    else if (p_index > -1)
    {
        coeff_end = p_index - index;
        int exp_start = p_index + 1;
        int exp_end = hexVal.Length;
        exponent = Convert.ToDouble(hexVal.Substring(exp_start, exp_end - exp_start));
    }

    var coeff = (double)(Int32.Parse(hexVal.Substring(coeff_start, coeff_end), NumberStyles.AllowHexSpecifier));
    var result = sign * (coeff * Math.Pow(2, exponent));
    return result;
}
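For instance, fed the sample strings from the question, this returns:
Console.WriteLine(Parsed("0x1p-8"));     // 0.00390625   (1 * 2^-8)
Console.WriteLine(Parsed("-0xfe8p-12")); // -0.994140625 (-4072 / 4096)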
If you're seeking an identical function to Python's fromhex(), you can try your hand at converting the CPython implementation into C# if you'd like. I tried, but got in over my head as I'm not very familiar with the standard and had trouble following all the overflow checks they were looking out for. They also allow other things like unlimited leading and trailing whitespace, which my solution does not allow for.
My solution is the "Somewhat Easy" solution. I'm guessing if you really knew your stuff, you could build the sign, exponent and mantissa at the bit level instead of multiplying everything out. You could definitely do it in one pass as well, rather than cheating with the .Substring() methods.
Hopefully this at least gets you on the right track.
I have written C# code for formatting and parsing numbers in the hexadecimal floating-point format described in IEEE 754r and supported by C99, C++11 and Java. The code is part of the BSD-licenced FParsec library for F# and is contained in a single file:
https://bitbucket.org/fparsec/main/src/tip/FParsecCS/HexFloat.cs
The supported format is described a bit at http://www.quanttec.com/fparsec/reference/charparsers.html#members.floatToHexString
The test code (written in F#) can be found at https://bitbucket.org/fparsec/main/src/tip/Test/HexFloatTests.fs
I was looking at the source code of a project, and I noticed the following statement (both keyByte and codedByte are of type byte):
return (byte)(keyByte - codedByte);
I'm now trying to understand what the result would be in cases where keyByte is smaller than codedByte, which results in a negative integer.
After some experiments to understand the result of casting a negative integer which has a value in the range [-255 : -1], I got the following results:
byte result = unchecked((byte)(-6));  // result = 250
byte result = unchecked((byte)(-50)); // result = 206
byte result = unchecked((byte)(-17)); // result = 239
byte result = unchecked((byte)(-20)); // result = 236
So, provided that -256 < a < 0, I was able to determine the result by:
result = 256 + a;
My question is: should I always expect this to be the case?
Yes, that will always be the case (i.e. it is not simply dependent on your environment or compiler, but is defined as part of the C# language spec). See http://msdn.microsoft.com/en-us/library/aa691349(v=vs.71).aspx:
In an unchecked context, the result is truncated by discarding any high-order bits that do not fit in the destination type.
The next question is, if you take away the high-order bits of a negative int between -256 and -1, and read it as a byte, what do you get? This is what you've already discovered through experimentation: it is 256 + x.
Note that endianness does not matter because we're discarding the high-order (or most significant) bits, not the "first" 24 bits. So regardless of which end we took it from, we're left with the least significant byte that made up that int.
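A minimal illustration of both contexts (assuming the default project setting is unchecked):
int x = -6;
byte b = unchecked((byte)x); // 250, i.e. 256 + (-6)
Console.WriteLine(b);
// byte c = checked((byte)x); // would throw an OverflowException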
Yes. Remember, there's no such thing as "-" in the domain of a .Net "Byte":
http://msdn.microsoft.com/en-us/library/e2ayt412.aspx
Because Byte is an unsigned type, it cannot represent a negative
number. If you use the unary minus (-) operator on an expression that
evaluates to type Byte, Visual Basic converts the expression to Short
first. (Note: substitute any CLR/.Net language for "Visual Basic")
ADDENDUM:
Here's a sample app:
using System;

namespace TestByte
{
    class Program
    {
        static void Main(string[] args)
        {
            for (int i = -255; i < 256; i++)
            {
                byte b = (byte)i;
                Console.WriteLine("i={0}, b={1}", i, b);
            }
        }
    }
}
And here's the resulting output:
testbyte|more
i=-255, b=1
i=-254, b=2
i=-253, b=3
i=-252, b=4
i=-251, b=5
...
i=-2, b=254
i=-1, b=255
i=0, b=0
i=1, b=1
...
i=254, b=254
i=255, b=255
Here is an algorithm that performs the same logic as casting to byte, to help you understand it:
For positives:
byte bNum = (byte)(iNum % 256);
For negatives:
byte bNum = (byte)((256 + (iNum % 256)) % 256); // the extra % 256 covers negative multiples of 256
It's like searching for the k which causes x + 256k to fall in the range 0 ... 255. There can only be one k that produces a result in that range, and that result is the result of casting to byte.
Another way of looking at it is as if it "cycles around the byte value range":
Let's take iNum = -712 as an example, and define bNum = 0.
We shall do iNum++; bNum--; until iNum == 0:
iNum = -712;
bNum = 0;
iNum++; // -711
bNum--; // 255 (cycles to the maximum value)
iNum++; // -710
bNum--; // 254
... // And so on, as if the iNum value is being *consumed* within the byte value range cycle.
This is, of course, just an illustration to see how logically it works.
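A small snippet to confirm that the cast and the formula agree for the -712 example (just an illustration):
int iNum = -712;
byte viaCast    = unchecked((byte)iNum);              // 56
byte viaFormula = (byte)(((iNum % 256) + 256) % 256); // 56 as well
Console.WriteLine("{0} {1}", viaCast, viaFormula);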
This is what happens in an unchecked context. You could say that the runtime (or the compiler, if the Int32 you cast to Byte is known at compile time) adds or subtracts 256 as many times as needed until it finds a representable value.
In a checked context, an exception (or compile-time error) results. See http://msdn.microsoft.com/en-us/library/khy08726.aspx
Yes - unless you get an exception.
.NET defines all arithmetic operations only on 4-byte and larger data types. So the only non-obvious point is how converting an int to a byte works.
For a conversion from an integral type to another integral type, the result of conversion depends on overflow checking context (says the ECMA 334 standard, Section 13.2.1).
So, in the following context
checked
{
    return (byte)(keyByte - codedByte);
}
you will see a System.OverflowException. Whereas in the following context:
unchecked
{
    return (byte)(keyByte - codedByte);
}
you are guaranteed to always get the wrap-around results you observed; adding or subtracting a multiple of 256 from the difference makes no change to the resulting byte. For example, (byte)(2 - 255) yields 3, because -253 + 256 = 3.
This is true regardless of how the hardware represents signed values. The CLR standard (ECMA 335) specifies, in Section 12.1, that the Int32 type is a "32-bit two's-complement signed value". (Well, that also matches all platforms on which .NET or mono is currently available anyway, so one could almost guess that it would work anyway, but it is good to know that the practice is supported by the language standard and portable.)
Some teams do not want to specify overflow checking contexts explicitly, because they have a policy of checking for overflows early in development cycle, but not in released code. In these cases you can safely do byte arithmetic like this:
return (byte)((keyByte - codedByte + 256) % 256);
(Adding 256 before taking the remainder keeps the intermediate value non-negative, so the cast cannot overflow even in a checked context.)
I have some old code like this:
private int ParseByte(byte theByte)
{
    byte[] bytes = new byte[1];
    bytes[0] = theByte;
    BitArray bits = new BitArray(bytes);
    if (bits[0])
        return 1;
    else
        return 0;
}
It's long and I figured I could trim it down like this:
private int ParseByte(byte theByte)
{
    return theByte >> 7;
}
But, I'm not getting the same values as the first function. The byte either contains 00000000 or 10000000. What am I missing here? Am I using an incorrect operator?
The problem is that, in the first function, bits[0] returns the least significant bit, but the second function is returning the most significant bit. To modify the second function to get the least significant bit:
private int ParseByte(byte theByte)
{
    return theByte & 1; // mask off the least significant bit
}
To modify the first function to return the most significant bit, you should use bits[7] -- not bits[0].
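A quick way to confirm that indexing order (a minimal standalone illustration):
using System;
using System.Collections;

class BitOrderDemo
{
    static void Main()
    {
        // 128 == 10000000 in binary: only the most significant bit is set
        BitArray bits = new BitArray(new byte[] { 128 });
        Console.WriteLine(bits[0]); // False (index 0 is the least significant bit)
        Console.WriteLine(bits[7]); // True  (index 7 is the most significant bit)
    }
}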
The equivalent function to the first snippet is:
return theByte & 1;
In the second snippet you were checking the most significant bit, and in the first snippet the least significant.
Do you want to return int or string? Anyway - you can use modulo:
return theByte % 2 == 0 ? "0" : "1"
OK, you edited ... and want to return int
A word about your shifting operation: you would have to use << instead of >>. But this returns (when you cast to byte instead of int) 0 or 128, not 0 or 1. So you could rewrite your second solution as:
return (byte)(theByte << 7) == 128 ? 1 : 0;
But the other answers contain much better solutions than this.
Perhaps the first function should check for bits[7] ?
You have an extra zero in your binary numbers (you have 9 digits in each). I'm assuming that's just a typo.
Are you sure you're doing your ordering correctly? Binary is traditionally written right to left, not left to right like most other numbering systems. If the binary number you showed is properly formatted (meaning that 10000000 is really the number 128 and not the number 1), then your first code snippet shouldn't work and the second should. If you're writing it backwards (meaning 10000000 is 1, not 128), then you don't even need to bit-shift. Just AND it with 1 (theByte & 1).
In fact, regardless of the approach, a bitwise AND (the & operator) seems more appropriate. Given that your first function works and the second does not, I'm assuming you just wrote the number backwards and need to AND it with 1, as described above.
According to a user on Microsoft's site, the BitArray internally stores the bits in Int32s, with the bit order within each element reversed relative to what you might expect. That could cause the problem. For a solution and further info you can visit the link.
1st: The first function does not work, as it tries to return a string instead of an int.
But what you might want is this:
private static int ParseByte(byte theByte)
{
    return theByte & 1;
}
However you might also want this:
private static string ParseByteB(byte theByte)
{
    return (theByte & 1).ToString();
}