Processing large strings only a lot of numbers in C# - c#

We need to process a lot of strings, which contain many numbers (formatted as strings) and are separated by an arbitrary character.
Ex: numbers separated by a space:
var s = "1234.555 43434.43 434.436 85656.253 564.5656 <etc.>"
Now we are using String.Split to first create an array of strings (containing each number) and then parse these using double.Parse() to a numeric type.
var stringArr = s.Split(separator, StringSplitOptions.RemoveEmptyEntries);
// Some iteration/loop in which at some point this routine is called
var d = double.Parse(stringArr[i]);
But we notice this takes up a lot of resources (memory allocation and CPU), so we would like to create at least a less memory-consuming solution.
The first step is to enumerate the number occurrences in the string and then parse the substrings to a number.
For enumerating, we already found a nice solution on SO:
What are the alternatives to Split a string in c# that don't use String.Split()
However, this routine then uses Substring to read a parting string from the entire string, so I would expect this results (again) in an extra allocation of memory to be able to use/read that partial string...
If so (regarding memory allocation), is there a way to only parse the found part directly from the original string (without allocating a new string) to a numeric value...
I don't mind using unsafe code, if necessary... It will only be marked unsafe during the brief period the numbers are extracted.

separated by an arbitrary character
this forces you to either only handle integers, or to enforce some limitations on what characters can be used as separators and/or decimal separator. For example
1 234.567
might be interpreted as either a single number, two number, or three numbers depending on what characters you want to allow as separators. Note that the thousand separator and decimal separators depend on culture just to make things more complicated.
If you want to minimize memory usage, Double.Parse has a overload that takes a ReadOnlySpan<char> s as input. So just iterate over your string, once you encounter a number character you save the index and iterate to the next non-number character, and use Span<T>.Slice to create a span over the number digits and parse these. Continue until you have parsed all characters.
But you will still need to define what characters counts as part of a number or not. Note that you might want to Parse with the CultureInfo.InvariantCulture to get consistent behavior regardless of region.

I fiddled something.
Based on the assumption, that the "arbitrary" split char is known.
I also added the variant as entrypoint to a DataFlowPipeline, because ReadOnlySpan has some quirks with async and enumeration :D
It is a naive approach to give an idea. You may want to add some sanity checks etc.
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;
using System.Threading.Tasks;
public class Program
{
public static async Task Main()
{
var s = "1234.555 43434.43 434.436 85656.253 564.5656";
//foreach( var d in Parse(s, ' ') ) Console.WriteLine(d);
var ab = new ActionBlock<double>(d => Console.WriteLine(d));
ParseAndEnqueue(s, ' ', ab);
await ab.Completion;
}
public static IEnumerable<double> Parse(ReadOnlySpan<char> input, char splitChar)
{
var result = new List<double>();
int nextIndex = input.IndexOf(splitChar);
while( nextIndex >= 0 )
{
if(double.TryParse(input[..nextIndex], out double myVal)) result.Add(myVal);
if( nextIndex+1 >= input.Length ) break;
input = input[(nextIndex+1)..];
nextIndex = input.IndexOf(splitChar);
if( nextIndex < 0 )
if(double.TryParse(input, out double myLastVal)) result.Add(myLastVal);
}
return result;
}
public static void ParseAndEnqueue(ReadOnlySpan<char> input, char splitChar, ActionBlock<double> ab)
{
int nextIndex = input.IndexOf(splitChar);
while( nextIndex >= 0 )
{
if(double.TryParse(input[..nextIndex], out double myVal)) ab.Post(myVal);
if( nextIndex+1 >= input.Length ) break;
input = input[(nextIndex+1)..];
nextIndex = input.IndexOf(splitChar);
if( nextIndex < 0 )
if(double.TryParse(input, out double myLastVal)) ab.Post(myLastVal);
}
ab.Complete();
}
}
=> Fiddle

Related

Read numbers from a file

I have a problem my code is not efficient enough. He thinks he knows the content. How do I write the code so it can work with any file. So he practically only excludes numbers and ignores the words (strings).
public static int SumUpFileContent(string file)
{
int sum = 0;
var lines = File.ReadAllLines(file);
foreach (var line in lines)
{
if (int.TryParse(line, out int i))
sum += i;
}
return sum;
}
Keep in mind :
This doesn't work with numbers that have decimals, only integers. replace int.TryParse() with double.TryParse() if you have to.
The data must come in a very specific format (i.e. every entry must be on its own line)
From the information you provided you could split the file content in to an array
then for each item in the array use an int.tryParse to see if it is a number. (this is assumed that the numbers are always int)

Unity formatting multiple numbers

So I'm a complete newb to unity and c# and I'm trying to make my first mobile incremental game. I know how to format a variable from (e.g.) 1000 >>> 1k however I have several variables that can go up to decillion+ so I imagine having to check every variable's value seperately up to decillion+ will be quite inefficient. Being a newb I'm not sure how to go about it, maybe a for loop or something?
EDIT: I'm checking if x is greater than a certain value. For example if it's greater than 1,000, display 1k. If it's greater than 1,000,000, display 1m...etc etc
This is my current code for checking if x is greater than 1000 however I don't think copy pasting this against other values would be very efficient;
if (totalCash > 1000)
{
totalCashk = totalCash / 1000;
totalCashTxt.text = "$" + totalCashk.ToString("F1") + "k";
}
So, I agree that copying code is not efficient. That's why people invented functions!
How about simply wrapping your formatting into function, eg. named prettyCurrency?
So you can simply write:
totalCashTxt.text = prettyCurrency(totalCashk);
Also, instead of writing ton of ifs you can handle this case with logarithm with base of 10 to determine number of digits. Example in pure C# below:
using System.IO;
using System;
class Program
{
// Very simple example, gonna throw exception for numbers bigger than 10^12
static readonly string[] suffixes = {"", "k", "M", "G"};
static string prettyCurrency(long cash, string prefix="$")
{
int k;
if(cash == 0)
k = 0; // log10 of 0 is not valid
else
k = (int)(Math.Log10(cash) / 3); // get number of digits and divide by 3
var dividor = Math.Pow(10,k*3); // actual number we print
var text = prefix + (cash/dividor).ToString("F1") + suffixes[k];
return text;
}
static void Main()
{
Console.WriteLine(prettyCurrency(0));
Console.WriteLine(prettyCurrency(333));
Console.WriteLine(prettyCurrency(3145));
Console.WriteLine(prettyCurrency(314512455));
Console.WriteLine(prettyCurrency(31451242545));
}
}
OUTPUT:
$0.0
$333.0
$3.1k
$314.5M
$31.5G
Also, you might think about introducing a new type, which implements this function as its ToString() overload.
EDIT:
I forgot about 0 in input, now it is fixed. And indeed, as #Draco18s said in his comment nor int nor long will handle really big numbers, so you can either use external library like BigInteger or switch to double which will lose his precision when numbers becomes bigger and bigger. (e.g. 1000000000000000.0 + 1 might be equal to 1000000000000000.0). If you choose the latter you should change my function to handle numbers in range (0.0,1.0), for which log10 is negative.

Determine a numeric range that may contain alpha and numeric characters

I am not even sure how to phrase this question so I apologize in advance. I have a form used by our QA. it requires input of serial numbers - a lot of them (sometimes hundreds). I have two text boxes on the form for lower and upper numbers in the range (doesn't have to be this way but it is my best guess). I know how to to do this if it were just integers (see code below) but that is not the only format.Examples of the format could include a date code ("170508/1234") or could include alpha characters (ABC1234). there is a wide variety of formats but in every case, I want to find the range from the last set of numbers (typically last four digits - Like "170508/1234 170508/1235... and ABC1234 ABC1235 ... not to exclude 1234 1235 ....).
Thank you in advance
private void btnSN_Click(object sender, EventArgs e)
{
int from = Convert.ToInt32(txtSnStart.Text.Trim());
int to = Convert.ToInt32(txtSnEnd.Text.Trim());
for (int i = from; i <= to; i++)
{
txtSN.Text += i.ToString() +" ";
}
You can make use of RexEx for this, the RegEx expression below should match any number of digits at the end.:
(\d+)$
Here is the modified code:
// find the group of digits at end of entered text
var fromMatch = Regex.Match(txtSnStart.Text.Trim(), #"(\d+)$");
int from = Convert.ToInt32(fromMatch.Groups[1].Value);
// strip the matched digit group from entered text to get the prefix
string prefix = txtSnStart.Text.Trim().SubString(0, txtSnStart.Text.Trim().LastIndexOf(from.ToString()));
var toMatch = Regex.Match(txtSnEnd.Text.Trim(), #"(\d+)$");
int to = Convert.ToInt32(toMatch.Groups[1].Value);
for (int i = from; i <= to; i++)
{
// combine the prefix and range value
txtSN.Text += string.Format("{0}{1} ", prefix, i.ToString());
}
This problem becomes easier if you break it into smaller pieces.
This will become especially important once it becomes apparent that one small part of the problem might become more complex.
The inputs are a prefix, a number to begin the range, and a number to end the range. Given that, here's a class with a function to return those strings:
public class StringRangeCreator
{
public IEnumerable<string> CreateStringRange(
string prefix, int rangeStart, int rangeEnd)
{
if (rangeStart > rangeEnd) throw new ArgumentException(
$"{nameof(rangeStart)} cannot be greater than {nameof(rangeEnd)}");
return Enumerable.Range(rangeStart, (rangeEnd-rangeStart) + 1)
.Select(n => prefix + n);
}
}
Enumerable.Range creates a range of numbers. The Select takes that range and returns a set of strings consisting of the prefix concatenated with the number in the range.
And a unit test to make sure it works:
[TestClass]
public class DetermineRangeFromLastFourCharacters
{
[TestMethod]
public void ReturnsExpectedRangeOfStrings()
{
var result = new StringRangeCreator().CreateStringRange("abc", 1, 10).ToList();
Assert.AreEqual(10, result.Count);
Assert.AreEqual("abc1", result[0]);
Assert.AreEqual("abc10", result[9]);
}
}
But if these are serial numbers then you maybe you don't want ABC8, ABC9, ABC10. You might want ABC08, ABC09, ABC10. All the same length.
So here's the modified class. I'm guessing a little at what the expected behavior might be. It lets you specify an optional minimum number of digits so that you can pad accordingly:
public class StringRangeCreator
{
public IEnumerable<string> CreateStringRange(
string prefix, int rangeStart, int rangeEnd, int minimumDigits = 1)
{
if (rangeStart > rangeEnd) throw new ArgumentException(
$"{nameof(rangeStart)} cannot be greater than {nameof(rangeEnd)}");
return Enumerable.Range(rangeStart, (rangeEnd-rangeStart) + 1)
.Select(n => prefix + n.ToString("0").PadLeft(minimumDigits, '0'));
}
}
And another unit test:
[TestMethod]
public void PadsNumbersAccordingToParameter()
{
var result = new StringRangeCreator().CreateStringRange("abc", 999, 1001, 3).ToList();
Assert.AreEqual(3, result.Count);
Assert.AreEqual("abc999", result[0]);
Assert.AreEqual("abc1001", result[2]);
}
Some other scenarios you might want to account for are negative numbers or extremely large ranges.
Now that the problem of creating the result set is separated into a class with simple inputs and outputs that you can test, what remains is to take the input from the form and break it down into these inputs. That might include making sure that both strings have the same prefixes and making sure that the last parts of the strings are numbers.
But the whole thing will be a little simpler once it's not all in one big method. That also makes it easier to change when you realize something different about how you want it to work. Validating your input can be one step, and then getting the results can be another.
The unit tests help because you don't want to have to debug the whole thing to see if it works. Enumerable.Range didn't work the way I thought it did, so I had to fix a bug. It was easy to find and fix with the unit tests. It would have been harder if I had to run a whole app and then step through it in the debugger.

Rearranging a string of numbers from highest to lowest C#

What I'm trying to do is to create a function that will rearrange a string of numbers like "1234" to "4321". I'm certain that there are many much more efficient ways to do this than my method but I just want to see what went wrong with what I did because I'm a beginner at programming and can use the knowledge to get better.
My thought process for the code was to:
find the largest number in the inputted string
add the largest number into a list
remove the largest number from the inputted string
find the largest number again from the (now shorter) string
So I made a function that found the largest number in a string and it worked fine:
static int LargestNumber(string num)
{
int largestnumber = 0;
char[] numbers = num.ToCharArray();
foreach (var number in numbers)
{
int prevNumber = (int) char.GetNumericValue(number);
if (prevNumber >= largestnumber)
{
largestnumber = prevNumber;
}
}
return largestnumber;
}
Now the rearranging function is what I am having problems with:
static List<int> Rearrange(string num)
{
List<int> rearranged = new List<int>(); // to store rearranged numbers
foreach (var number in num) //for every number in the number string
{
string prevnumber = number.ToString(); // the previous number in the loop
if (prevnumber == LargestNumber(num).ToString()) // if the previous number is the larges number in the inputted string (num)
{
rearranged.Add(Convert.ToInt32(prevnumber)); // put the previous number into the list
// removing the previous number (largest) from the inputted string and update the inputted string (which should be now smaller)
StringBuilder sb = new StringBuilder(num);
sb.Remove(num.IndexOf(number), 1);
num = sb.ToString();
}
}
return rearranged; // return the final rearranged list of numbers
}
When I run this code (fixed for concatenation):
var rearranged = Rearrange("3250");
string concat = String.Join(" ", rearranged.ToArray());
Console.WriteLine(concat);
All I get is:
5
I'm not sure what I'm missing or what I'm doing wrong - the code doesn't seem to be going back after removing '5' which i s the highest number then removing the next highest number/
Your issue is your if statement within your loop.
if (prevnumber == LargestNumber(num).ToString()
{
rearranged.Add(Convert.ToInt32(prevnumber));
//...
}
You only ever add to your List rearranged if the value of prevnumber is the largest value, which is false for every number but 5, so the only value that ever gets added to the list is 5.
That's the answer to why it's only returning 5, but I don't think that will make your method work properly necessarily. You're doing a very dangerous thing by changing the value of the collection you are iterating over (the characters in num) from within the loop itself. Other answers have been written for you containing a method that rearranges the numbers as you've described.
Your Rearrange method is returning List<int> when you try to write that to the console, the best it can do is write System.Collections.Generic.List1[System.Int32] (its type)
Instead of trying to write the list, convert it first into a data type that can be written (string for example)
eg:
var myList = Rearrange("3250");
string concat = String.Join(" ", myList.ToArray());
Console.WriteLine(concat);
Building on pats comment you could iterate through your list and write them to the console.
e.g.
foreach(var i in Rearrange(3250))
{
console.writeline(i.ToString());
}
or if you want to see the linq example.
using system.linq;
Rearrange(3250).foreach(i => console.writeline(i.ToString()));
--edit after seeing you're only getting '5' output
This is because your function only adds number to the list if they are the largest number in your list, which is why 5 is only being added and returned.
Your Rearrange method can be written easily using Array.Sort (or similar with (List<T>) :
int[] Rearrange(int num)
{
var arr = num.ToString ().ToCharArray ();
Array.Sort (arr, (d1, d2) => d2 - d1);
return Array.ConvertAll (arr, ch => ch - '0');
}
Just reading your first sentence
Does not test for integers
static int ReversedNumber(string num)
{
char[] numbers = num.ToCharArray();
Array.Sort(numbers);
Array.Reverse(numbers);
Debug.WriteLine(String.Concat(numbers));
return (int.Parse(String.Concat(numbers)));
}
Because your foreach loop within Rearrange method only loop through the original num. The algorithm doesn't continue to go through the new num string after you have removed the largest number.
You can find the problem by debugging, this foreach loop in Rearrange goes only 4 times if your input string is "3250".

Constantly Incrementing String

So, what I'm trying to do this something like this: (example)
a,b,c,d.. etc. aa,ab,ac.. etc. ba,bb,bc, etc.
So, this can essentially be explained as generally increasing and just printing all possible variations, starting at a. So far, I've been able to do it with one letter, starting out like this:
for (int i = 97; i <= 122; i++)
{
item = (char)i
}
But, I'm unable to eventually add the second letter, third letter, and so forth. Is anyone able to provide input? Thanks.
Since there hasn't been a solution so far that would literally "increment a string", here is one that does:
static string Increment(string s) {
if (s.All(c => c == 'z')) {
return new string('a', s.Length + 1);
}
var res = s.ToCharArray();
var pos = res.Length - 1;
do {
if (res[pos] != 'z') {
res[pos]++;
break;
}
res[pos--] = 'a';
} while (true);
return new string(res);
}
The idea is simple: pretend that letters are your digits, and do an increment the way they teach in an elementary school. Start from the rightmost "digit", and increment it. If you hit a nine (which is 'z' in our system), move on to the prior digit; otherwise, you are done incrementing.
The obvious special case is when the "number" is composed entirely of nines. This is when your "counter" needs to roll to the next size up, and add a "digit". This special condition is checked at the beginning of the method: if the string is composed of N letters 'z', a string of N+1 letter 'a's is returned.
Here is a link to a quick demonstration of this code on ideone.
Each iteration of Your for loop is completely
overwriting what is in "item" - the for loop is just assigning one character "i" at a time
If item is a String, Use something like this:
item = "";
for (int i = 97; i <= 122; i++)
{
item += (char)i;
}
something to the affect of
public string IncrementString(string value)
{
if (string.IsNullOrEmpty(value)) return "a";
var chars = value.ToArray();
var last = chars.Last();
if(char.ToByte() == 122)
return value + "a";
return value.SubString(0, value.Length) + (char)(char.ToByte()+1);
}
you'll probably need to convert the char to a byte. That can be encapsulated in an extension method like static int ToByte(this char);
StringBuilder is a better choice when building large amounts of strings. so you may want to consider using that instead of string concatenation.
Another way to look at this is that you want to count in base 26. The computer is very good at counting and since it always has to convert from base 2 (binary), which is the way it stores values, to base 10 (decimal--the number system you and I generally think in), converting to different number bases is also very easy.
There's a general base converter here https://stackoverflow.com/a/3265796/351385 which converts an array of bytes to an arbitrary base. Once you have a good understanding of number bases and can understand that code, it's a simple matter to create a base 26 counter that counts in binary, but converts to base 26 for display.

Categories