Convert string to ASCII without exceptions (like TryParse)

Convert string to ASCII without exceptions (like TryParse) - c#

I am implementing a TryParse() method for an ASCII string class. The method takes a string and converts it to a C-style string (i.e. a null-terminated ASCII string).
I had been using only a Parse(), doing the conversion to ASCII using::
public static bool Parse(string s, out byte[] result)
{
result = null;
if (s == null || s.Length < 1)
return false;
byte[]d = new byte[s.Length + 1]; // Add space for null-terminator
System.Text.Encoding.ASCII.GetBytes(s).CopyTo(d, 0);
// GetBytes can throw exceptions
// (so can CopyTo() but I can replace that with a loop)
result = d;
return true;
}
However, as part of the idea of a TryParse is to remove the overhead of exceptions, and GetBytes() throws exceptions, I'm looking for a different method that does not do so.
Maybe there is a TryGetbytes()-like method?
Or maybe we can reason about the expected format of a standard .Net string and perform the change mathematically (I'm not overly familiar with UTF encodings)?
EDIT: I guess for non-ASCII chars in the string, the TryParse() method should return false
EDIT: I expect when I get around to implementing the ToString() method for this class I may need to do the reverse there.

Two options:
You could just ignore Encoding entirely, and write the loop yourself:
public static bool TryParse(string s, out byte[] result)
{
result = null;
// TODO: It's not clear why you don't want to be able to convert an empty string
if (s == null || s.Length < 1)
{
return false;
}
byte buffer = new byte[s.Length + 1]; // Add space for null-terminator
for (int i = 0; i < s.Length; i++)
{
char c = s[i];
if (c > 127)
{
return false;
}
buffer[i] = (byte) c;
}
result = buffer;
return true;
}
That's simple, but may be slightly slower than using Encoding.GetBytes.
The second option would be to use a custom EncoderFallback:
public static bool TryParse(string s, out byte[] result)
{
result = null;
// TODO: It's not clear why you don't want to be able to convert an empty string
if (s == null || s.Length < 1)
{
return false;
}
var fallback = new CustomFallback();
var encoding = new ASCIIEncoding { EncoderFallback = fallback };
byte buffer = new byte[s.Length + 1]; // Add space for null-terminator
// Use overload of Encoding.GetBytes that writes straight into the buffer
encoding.GetBytes(s, 0, s.Length, buffer, 0);
if (fallback.HadErrors)
{
return false;
}
result = buffer;
return true;
}
That would require writing CustomFallback though - it would need to basically keep track of whether it had ever been asked to handle invalid input.
If you didn't mind an encoding processing the data twice, you could call Encoding.GetByteCount with a UTF-8-based encoding with a replacement fallback (with a non-ASCII replacement character), and check whether that returns the same number of bytes as the number of chars in the string. If it does, call Encoding.ASCII.GetBytes.
Personally I'd go for the first option unless you have reason to believe it's too slow.

There are two possible exceptions that Encoding.GetBytes might throw according to the documentation.
ArgumentNullException is easily avoided. Do a null check on your input and you can ensure this is never thrown.
EncoderFallbackException needs a bit more investigation... Reading the documentation:
A fallback strategy determines how an encoder handles invalid characters or how a decoder handles invalid bytes.
And if we looking in the documentation for ASCII encoding we see this:
It uses replacement fallback to replace each string that it cannot encode and each byte that it cannot decode with a question mark ("?") character.
That means it doesn't use the Exception Fallback and thus will never throw an EncoderFallbackException.
So in summary if you are using ASCII encoding and ensure you don't pass in a null string then you will never have an exception thrown by the call to GetBytes.

The GetBytes method is throwing an exception because your Encoding.EncoderFallback specifies that it should throw an exception.
Create an encoding object with EncoderReplacementFallback to avoid exceptions on unencodable characters.
Encoding encodingWithFallback = new ASCIIEncoding() { DecoderFallback = DecoderFallback.ReplacementFallback };
encodingWithFallback.GetBytes("Hɘ££o wor£d!");
This way imitates the TryParse methods of the primitive .NET value types:
bool TryEncodingToASCII(string s, out byte[] result)
{
if (s == null || Regex.IsMatch(s, "[^\x00-\x7F]")) // If a single ASCII character is found, return false.
{
result = null;
return false;
}
result = Encoding.ASCII.GetBytes(s); // Convert the string to ASCII bytes.
return true;
}

Related

Convert string to number array c#

I have a follow string example
0 0 1 2.33 4
2.1 2 11 2
There are many ways to convert it to an array, but I need the fastest one, because files can contain 1 billion elements.
string can contain an indefinite number of spaces between numbers
i'am trying
static void Main()
{
string str = "\n\n\n 1 2 3 \r 2322.2 3 4 \n 0 0 ";
byte[] byteArray = Encoding.ASCII.GetBytes(str);
MemoryStream stream = new MemoryStream(byteArray);
var values = ReadNumbers(stream);
}
public static IEnumerable<object> ReadNumbers(Stream st)
{
var buffer = new StringBuilder();
using (var sr = new StreamReader(st))
{
while (!sr.EndOfStream)
{
char digit = (char)sr.Read();
if (!char.IsDigit(digit) && digit != '.')
{
if (buffer.Length == 0) continue;
double ret = double.Parse(buffer.ToString() , culture);
buffer.Clear();
yield return ret;
}
else
{
buffer.Append(digit);
}
}
if (buffer.Length != 0)
{
double ret = double.Parse(buffer.ToString() , culture);
buffer.Clear();
yield return ret;
}
}
}

There are a few things you can do to improve the performance of your code. First, you can use the Split method to split the string into an array of strings, where each element of the array is a number in the string. This will be faster than reading each character of the string one at a time and checking if it is a digit.
Next, you can use double.TryParse to parse each element of the array into a double, rather than using double.Parse and catching any potential exceptions. TryParse will be faster because it does not throw an exception if the string is not a valid double.
Here is an example of how you could implement this:
public static IEnumerable<double> ReadNumbers(string str)
{
string[] parts = str.Split(new[] {' ', '\n', '\r', '\t'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string part in parts)
{
if (double.TryParse(part, NumberStyles.Any, CultureInfo.InvariantCulture, out double value))
{
yield return value;
}
}
}

I'd rather suggest the simpliest solution first and haunt for nano-seconds if there really is a problem with that code.
var doubles = myInput.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Select(x => double.Parse(x, whateverCulture))
Do that for every line in your file, not for the entire file at once, as reading such a huge file at once may crush your memory.
Pretty easy to understand. Afterwards perform a benchmark-test with your data and see if it really affects performance when trying to parse the data. However chances are the actual bottleneck is reading that huge file- which essentially is a IO-thing.

You can improve your solution by trying to avoid creating many objects on the heap. Especially buffer.ToString() is called repeatedly and creates new strings. You can use a ReadOnlySpan<char> struct to slice the string and at the same time avoid heap allocations. A span provides pointers into the original string without making copies of it or parts of it when slicing.
Also do not return the doubles as object, as this will box them. I.e., it will store them on the heap. See: Boxing and Unboxing (C# Programming Guide). If you prefer your solution over mine, use an IEnumerable<double> as return type of your method.
The use of ReadOnlySpans; however, has the disadvantage that it cannot be used in iterator methods. The reason is that a ReadOnlySpan must be allocated on the stack, but an iterator method wraps its state in a class. If you try, you will get the Compiler Error CS4013:
Instance of type cannot be used inside a nested function, query expression, iterator block or async method
Therefore, we must either store the numbers in a collection or consume them in-place. Since I don't know what you want to do with the numbers, I use the former approach:
public static List<double> ReadNumbers(string input)
{
ReadOnlySpan<char> inputSpan = input.AsSpan();
int start = 0;
bool isNumber = false;
var list = new List<double>(); // Improve by passing the expected maximum length.
int i;
for (i = 0; i < inputSpan.Length; i++) {
char c = inputSpan[i];
bool isDigit = Char.IsDigit(c);
if (isDigit && !isNumber) {
start = i;
isNumber = true;
} else if (isNumber && !isDigit && c != '.') {
isNumber = false;
if (Double.TryParse(inputSpan[start..i], CultureInfo.InvariantCulture, out double d)) {
list.Add(d);
}
}
}
if (isNumber) {
if (Double.TryParse(inputSpan[start..i], CultureInfo.InvariantCulture, out double d)) {
list.Add(d);
}
}
return list;
}
inputSpan[start..i] creates a slice as ReadOnlySpan<char>.
Test
string str = "\n\n\n 1 2 3 \r 2322.2 3 4 \n 0 0 ";
foreach (double d in ReadNumbers(str)) {
Console.WriteLine(d);
}
But whenever you are asking for speed, you must run benchmarks to compare the different approaches. Very often what seems a superior solution may fail in the benchmark.
See also: All About Span: Exploring a New .NET Mainstay

Convert char array with zero-char element to C# string

When I receive data from unmanaged code written in C (WinAPI) it asks to reserve a number of bytes and pass the handle (pointer) to the string.
Using Marshal.AllocHGlobal(150) didit.
In return, I received the number of chars, terminated by '/0' - C style.
When I build string from this char array using new string(charBuff) it doesn't cut the string at the '/0' point.
Well, I could use Substring + IndexOf, but is there any elegant way to cut it using some special existing method?

Ok, I found it after wakening up.
It's
Marshal.PtrToStringUni(IntPtr)
string MyStringFromWinAPI()
{
string result;
IntPtr strPtr = Marshal.AllocHGlobal(500);
// here would be any API that gets reserved buffer to rerturn string value
SendMessage(camHwnd, WM_CAP_DRIVER_GET_NAME_UNICODE, 500, strPtr);
// now you could follow 2 ways
// 1-st one is long and boring
char[] charBuff = new char[500];
Marshal.Copy(strPtr, charBuff, 0, 500);
Marshal.FreeHGlobal((IntPtr)strPtr);
result = new string(charBuff);
result = result.Substring(0, result.IndexOf('\0'));
return result;
// or more elegant way
result = Marshal.PtrToStringUni(strPtr);
Marshal.FreeHGlobal((IntPtr)strPtr);
return result;
}

How to display text being held in a int variable?

My variable holds some text but is currently being stored as an int (the class used reads the bytes at a memory address and converts to int. Variable.ToString just displays the decimal representation, but doesn't encode it to readable text, or in other words, I would now like to convert the data from int to string with ascii encoding or something.

Here is a demo (based on our Q+A above).
Note: Settings a string with the null terminator as a test, then encoding it into ASCII bytes, then using unsafe (you will need to allow that in Build Option in project properties), itearte through each byte and convert it until 0x0 is reached.
private void button1_Click(object sender, EventArgs e)
{
var ok = "OK" + (char)0;
var ascii = Encoding.ASCII;
var bin = ascii.GetBytes( ok );
var sb = new StringBuilder();
unsafe
{
fixed (byte* p = bin)
{
byte b = 1;
var i = 0;
while (b != 0)
{
b = p[i];
if (b != 0) sb.Append( ascii.GetString( new[] {b} ) );
i++;
}
}
}
Console.WriteLine(sb);
}
Note the FIXED statement, this is required managed strings/arrayts etc are not guaranteed to be statically placed in memory - this ensures it during that section.

assuming an int variable
int x=10;
you can convert this into string as
string strX = x.ToString();
Try this
string s = "9quali52ty3";
byte[] ASCIIValues = Encoding.ASCII.GetBytes(s);
foreach(byte b in ASCIIValues) {
Console.WriteLine(b);
}

Int32.ToString() has an overload that takes a format string. Take a look at the available format strings and use one of those.

Judging by your previous question, the int you have is (probably) a pointer to the string. Depending on whether the data at the pointer is chars or bytes, do one of these to get your string:
var s = new string((char*)myInt);
var s = new string((sbyte*)myInt);

OK. If you variable is a pointer, then Tim is pointing you in the right direction (assuming it is an address and not an offset from an address - in which case you will need the start address to offset from).
If, on the other hand, your variable contains four encoded ascii characters (of a byte each), then you need to split to bytes and convert each byte to a character. Something like this Console.WriteLine(TypeDescriptor.GetConverter(myUint).ConvertTo(myUint, typeof(string))); from Here - MSDN ByteConverter

Cannot implicitly convert type 'char*' to 'bool'

The program below is from the book "Cracking the coding interview", by Gayle Laakmann McDowell.
The original code is written in C.
Here is the original code:
void reverse(char *str) {
char * end = str;
char tmp;
if (str) {
while (*end) {
++end;
}
--end;
while (str < end) {
tmp = *str;
*str++ = *end;
*end-- = tmp;
}
}
}
I am trying to convert it in C#. After researching via Google and playing with the code, below is what I have. I am a beginner and really stuck. I am not getting the value I am expecting. Can someone tell me what I am doing wrong?
class Program
{
unsafe void reverse(char *str)
{
char* end = str;
char tmp;
if (str) // Cannot implicitly convert type 'char*' to 'bool'
{
while(*end) // Cannot implicitly convert type 'char*' to 'bool'
{
++end;
}
--end;
while(str < end)
{
tmp = *str;
*str += *end;
*end -= tmp;
}
}
}
public static void Main(string[] args)
{
}
}

I can't really remember if this ever worked in C#, but I am quite certain it should not work now.
To start off by answering your question. There is no automatic cast between pointers and bool. You need to write
if(str != null)
Secondly, you can't convert char to bool. Moreover, there is no terminating character for C# strings, so you can't even implement this. Normally, you would write:
while(*end != '\0') // not correct, for illustration only
But there is no '\0' char, or any other magic-termination-char. So you will need to take an int param for length.
Going back to the big picture, this sort of code seems like a terribly inappropriate place to start learning C#. It's way too low level, few C# programmers deal with pointers, chars and unsafe contexts on a daily basis.
... and if you must know how to fix your current code, here's a working program:
unsafe public static void Main(string[] args)
{
var str = "Hello, World";
fixed(char* chr = str){
reverse(chr, str.Length);
}
}
unsafe void reverse(char *str, int length)
{
char* end = str;
char tmp;
if (str != null) //Cannot implicitly convert type 'char*' to 'bool'
{
for(int i = 0; i < length; ++i) //Cannot implicitly convert type 'char*' to 'bool'
{
++end;
}
--end;
while(str<end)
{
tmp = *str;
*str = *end;
*end = tmp;
--end;
++str;
}
}
}
Edit: removed a couple of .Dump() calls, as I was trying it out in LINQPad :)

C# is not like C in that you cannot use an integer value as an implicit bool. You need to manually convert it. One example:
if (str != 0)
while (*end != 0)
A word of warning: If you are migrating from C, there are a few things that can trip you up in a program like this. The main one is that char is 2 bytes. strings and chars are UTF-16 encoded. The C# equivalent of char is byte. Of course, you should use string rather than C-strings.
Another thing: If you got your char* by converting a normal string to a char*, forget your entire code. This is not C. Strings are not null-terminated.
Unless this is homework, you would much rather be doing something like this:
string foo = "Hello, World!";
string bar = foo.Reverse(); // bar now holds "!dlroW ,olleH";

As you discovered, you can use pointers in C#, if you use the unsafe keyword. But you should do that only when really necessary and when you really know what you're doing. You certainly shouldn't use pointers if you're just beginning with the language.
Now, to your actual question: you are given a string of characters and you want to reverse it. Strings in C# are represented as the string class. And string is immutable, so you can't modify it. But you can convert between a string and an array of characters (char[]) and you can modify that. And you can reverse an array by using the static method Array.Reverse(). So, one way to write your method would be:
string Reverse(string str)
{
if (str == null)
return null;
char[] array = str.ToCharArray(); // convert the string to array
Array.Reverse(array); // reverse the array
string result = new string(array); // create a new string out of the array
return result; // and return it
}
If you wanted to write the code that actually does the reversing, you can do that too (as an exercise, I wouldn't do it in production code):
string Reverse(string str)
{
if (str == null)
return null;
char[] array = str.ToCharArray();
// use indexes instead of pointers
int start = 0;
int end = array.Length - 1;
while (start < end)
{
char tmp = array[start];
array[start] = array[end];
array[end] = tmp;
start++;
end--;
}
return new string(array);
}

Try making it more explicit what you are checking in the conditions:
if(str != null)
while(*end != '\0')
You might also want to watch out for that character swapping code: it looks like you've got a +/- in there. If that's supposed to update your pointer positions, I'd suggest making those separate operations.

Best solution for an StringToInt function in C#

I were asked to do an StringToInt / Int.parse function on the white board in an job interview last week and did not perform very good but I came up with some sort of solution. Later when back home I made one in Visual Studion and I wonder if there are any better solution than mine below.
Have not bothred with any more error handling except checking that the string only contains digits.
private int StrToInt(string tmpString)
{
int tmpResult = 0;
System.Text.Encoding ascii = System.Text.Encoding.ASCII;
byte[] tmpByte = ascii.GetBytes(tmpString);
for (int i = 0; i <= tmpString.Length-1; i++)
{
// Check whatever the Character is an valid digit
if (tmpByte[i] > 47 && tmpByte[i] <= 58)
// Here I'm using the lenght-1 of the string to set the power and multiply this to the value
tmpResult += (tmpByte[i] - 48) * ((int)Math.Pow(10, (tmpString.Length-i)-1));
else
throw new Exception("Non valid character in string");
}
return tmpResult;
}

I'll take a contrarian approach.
public int? ToInt(this string mightBeInt)
{
int convertedInt;
if (int.TryParse(mightBeInt, out convertedInt))
{
return convertedInt;
}
return null;
}
After being told that this wasn't the point of the question, I'd argue that the question tests C coding skills, not C#. I'd further argue that treating strings as arrays of characters is a very bad habit in .NET, because strings are unicode, and in any application that might be globalized, making any assumption at all about character representations will get you in trouble, sooner or later. Further, the framework already provides a conversion method, and it will be more efficient and reliable than anything a developer would toss off in such a hurry. It's always a bad idea to re-invent framework functionality.
Then I would point out that by writing an extension method, I've created a very useful extension to the string class, something that I would actually use in production code.
If that argument loses me the job, I probably wouldn't want to work there anyway.
EDIT: As a couple of people have pointed out, I missed the "out" keyword in TryParse. Fixed.

Converting to a byte array is unnecessary, because a string is already an array of chars. Also, magic numbers such as 48 should be avoided in favor of readable constants such as '0'. Here's how I'd do it:
int result = 0;
for (int i = str.Length - 1, factor = 1; i >= 0; i--, factor *= 10)
result += (str[i] - '0') * factor;
For each character (starting from the end), add its numeric value times the correct power of 10 to the result. The power of 10 is calculated by multiplying it with 10 repeatedly, instead of unnecessarily using Math.Pow.

I think your solution is reasonably ok, but instead of doing math.pow, I would do:
tmpResult = 10 * tmpResult + (tmpByte[i] - 48);
Also, check the length against the length of tmpByte rather than tmpString. Not that it normally should matter, but it is rather odd to loop over one array while checking the length of another.
And, you could replace the for loop with a foreach statement.

If you want a simple non-framework using implementation, how 'bout this:
"1234".Aggregate(0, (s,c)=> c-'0'+10*s)
...and a note that you'd better be sure that the string consists solely of decimal digits before using this method.
Alternately, use an int? as the aggregate value to deal with error handling:
"12x34".Aggregate((int?)0, (s,c)=> c>='0'&&c<='9' ? c-'0'+10*s : null)
...this time with the note that empty strings evaluate to 0, which may not be most appropriate behavior - and no range checking or negative numbers are supported; both of which aren't hard to add but require unpretty looking wordy code :-).
Obviously, in practice you'd just use the built-in parsing methods. I actually use the following extension method and a bunch of nearly identical siblings in real projects:
public static int? ParseAsInt32(this string s, NumberStyles style, IFormatProvider provider) {
int val;
if (int.TryParse(s, style, provider, out val)) return val;
else return null;
}
Though this could be expressed slightly shorter using the ternary ? : operator doing so would mean relying on side-effects within an expression, which isn't a boon to readability in my experience.

Just because i like Linq:
string t = "1234";
var result = t.Select((c, i) => (c - '0') * Math.Pow(10, t.Length - i - 1)).Sum();

I agree with Cyclon Cat, they probably want someone who will utilize existing functionality.
But I would write the method a little bit different.
public int? ToInt(this string mightBeInt)
{
int number = 0;
if (Int32.TryParse(mightBeInt, out number))
return number;
return null;
}
Int32.TryParse does not allow properties to be given as out parameter.

I was asked this question over 9000 times on interviews :) This version is capable of handling negative numbers and handles other conditions very well:
public static int ToInt(string s)
{
bool isNegative = false, gotAnyDigit = false;
int result = 0;
foreach (var ch in s ?? "")
{
if(ch == '-' && !(gotAnyDigit || isNegative))
{
isNegative = true;
}
else if(char.IsDigit(ch))
{
result = result*10 + (ch - '0');
gotAnyDigit = true;
}
else
{
throw new ArgumentException("Not a number");
}
}
if (!gotAnyDigit)
throw new ArgumentException("Not a number");
return isNegative ? -result : result;
}
and a couple of lazy tests:
[TestFixture]
public class Tests
{
[Test]
public void CommonCases()
{
foreach (var sample in new[]
{
new {e = 123, s = "123"},
new {e = 110, s = "000110"},
new {e = -011000, s = "-011000"},
new {e = 0, s = "0"},
new {e = 1, s = "1"},
new {e = -2, s = "-2"},
new {e = -12223, s = "-12223"},
new {e = int.MaxValue, s = int.MaxValue.ToString()},
new {e = int.MinValue, s = int.MinValue.ToString()}
})
{
Assert.AreEqual(sample.e, Impl.ToInt(sample.s));
}
}
[Test]
public void BadCases()
{
var samples = new[] { "1231a", null, "", "a", "-a", "-", "12-23", "--1" };
var errCount = 0;
foreach (var sample in samples)
{
try
{
Impl.ToInt(sample);
}
catch(ArgumentException)
{
errCount++;
}
}
Assert.AreEqual(samples.Length, errCount);
}
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Convert string to ASCII without exceptions (like TryParse) - c#

Related

Convert string to number array c#

Convert char array with zero-char element to C# string

How to display text being held in a int variable?

Cannot implicitly convert type 'char*' to 'bool'

Best solution for an StringToInt function in C#

Categories

Resources