I have a configuration value expressed as a binary number to allow several options within the same value.
E.g. the value of 5 would be "101" or both 4 and 1.
Does anyone know of the best/fastest way to "input" the value '5' and get a list of {1,4} back?
If you want to get powers of 2 which the value consists of:
int value = 5;
var addendums = Enumerable.Range(0, sizeof(int) * 8 - 1)
.Select(i => (1 << i) & value)
.Where(x => x != 0)
.ToList();
Result:
[ 1, 4 ]
Note that if you want to have addendums in descending order, you can apply Reverse() after filtering sequence.
TL;DR The first step generates the bit positions to test. The second argument of Enumerable.Range is a count, not an upper bound, so Range(0, sizeof(int) * 8 - 1) yields the 31 indexes 0, 1, 2, ..., 30. That covers every bit of a non-negative Int32 value; index 31 is the sign bit.
The next step selects the result of a bitwise AND between 1 shifted left by the index (that is, the corresponding power of 2) and the value itself (only the first 4 bits are shown here):
i   1 << i     value      (1 << i) & value
    (binary)   (binary)   (binary)   (decimal)
0   0001       0101       0001       1
1   0010       0101       0000       0
2   0100       0101       0100       4
3   1000       0101       0000       0
...
All that remains after this step is to filter out the zeroes.
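The three steps above (generate bit indexes, AND with the shifted 1, drop the zeroes) can be wrapped in a small reusable method. This is only a sketch: the class name BitFlags and the method name Decompose are made up for illustration, and it assumes the configuration value is never negative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BitFlags
{
    // Hypothetical helper: returns the powers of two that add up to a non-negative value.
    public static List<int> Decompose(int value)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value), "This sketch only handles non-negative values.");

        // Bit indexes 0..30; bit 31 is the sign bit and is never set for non-negative values.
        return Enumerable.Range(0, 31)
            .Select(i => (1 << i) & value)   // isolate bit i
            .Where(x => x != 0)              // keep only the bits that are set
            .ToList();
    }
}
```

BitFlags.Decompose(5) gives the list 1, 4, and BitFlags.Decompose(37) gives 1, 4, 32.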
Some bit shifting and & later...
int n = 5+32;
var lst = new List<int>();
int i = 1;
while (n > 0)
{
if ((n & i) == i)
{
lst.Add(i);
n &= ~i;
}
i <<= 1; // equivalent to i *= 2
}
A little more esoteric, with the use of xor (^):
if (n != 0)
{
while (true)
{
if ((n & i) != 0)
{
lst.Add(i);
n ^= i;
if (n == 0)
{
break;
}
}
i <<= 1; // equivalent to i *= 2
}
}
I have made this little sample. It expresses an integer as a sum of powers of two. Those powers should be your input options.
class Program
{
static void Main(string[] args)
{
var input = 5;
var options = new List<uint>();
for (uint currentPow = 1; currentPow != 0; currentPow <<= 1)
if ((currentPow & input) != 0)
options.Add(currentPow);
foreach (var option in options)
Console.WriteLine(option);
Console.ReadLine();
}
}
And the output is: 1 4
EDIT>>> In fact this does the same as @Sergey Berezovskiy's answer but without LINQ
Hope it helps
The naive approach:
int originalInput = 42;
int input = originalInput;
// Generate binary numbers
var binaryNumbers = Enumerable.Range(0, 31).Select(n => (int)Math.Pow(2, n)).ToArray();
// Largest first
Array.Reverse(binaryNumbers);
var result = new List<int>();
foreach (var bin in binaryNumbers)
{
if (input >= bin)
{
result.Add(bin);
input -= bin;
}
}
Console.WriteLine($"{originalInput} decomposed: " + string.Join(" ", result));
Generate a range of power-of-two numbers, ranging from 2^30 (1073741824) down to 2^0 (1), then check whether the input is equal to or larger than each of those numbers, and if so, add that number to the result list and subtract it from the input.
Now that that's all written out, see how Sergey's answer greatly reduces the code required by some Linq and bitshifting magic.
A hybrid solution, inspired by combining both answers:
var input = 42;
var output = Enumerable.Range(0, 31)
.Select(n => (int)Math.Pow(2, n))
.Where(p => (p & input) > 0);
Console.WriteLine($"{input} decomposed: " + string.Join(" ", output));
A maybe more traditional and easy to understand solution. You convert the number into a string binary representation, and then analyze each character to extract the corresponding decimal representations of each bit at 1.
int number = 5;
string binaryRep = Convert.ToString(number, 2);
List<int> myList = new List<int>();
int pow = 0;
for(int i = binaryRep.Count() - 1; i >= 0; i--)
{
if(binaryRep[i] == '1')
{
myList.Add((int)Math.Pow(2, pow));
}
pow++;
}
Short and fast:
int input = 5;
var list = new List<int>();
for (int i = 1, j = input; i <= j; i *= 2, input >>= 1){
if ((input & 1) == 1)
list.Add(i);
}
To show binary representation use
int value = 7;
var binary = Convert.ToString(value, 2);
To see binary numbers:
private int[] ToBinaryNumbers(int value)
{
var binary = Convert.ToString(value, 2).Reverse();
int ix = 0;
return binary.Select(x => { var res = x == '1' ? (int?)Math.Pow(2, ix) : (int?)null; ix++; return res; }).Where(x => x.HasValue).Select(x => x.Value).ToArray();
}
This will give you 1,2,4 for 7 or 1,8 for 9
Example:
a = "56 65 74 100 99 68 86 180 90", ordered by numbers weights becomes: "100 180 90 56 65 74 68 86 99"
When two numbers have the same "weight", let us class them as if they were strings and not numbers: 100 is before 180 because its "weight" (1) is less than the one of 180 (9) and 180 is before 90 since, having the same "weight" (9) it comes before as a string.
All numbers in the list are positive numbers and the list can be empty.
My tests:
[TestMethod]
public void Test1()
{
Assert.AreEqual("2000 103 123 4444 99",
WeightSort.orderWeight("103 123 4444 99 2000"));
}
[TestMethod]
public void Test2()
{
Assert.AreEqual("11 11 2000 10003 22 123 1234000 44444444 9999",
WeightSort.orderWeight("2000 10003 1234000 44444444 9999 11 11 22 123"));
}
My class to calculate the order of the weights:
public class WeightSort
{
public static string orderWeight(string strng)
{
List<int> list = strng.Split(' ').Select(Int32.Parse).OrderBy(i => i).ToList();
List<int> SumofNums = new List<int>();
List<string> SumandNums = new List<string>();
List<string> SumandNums2 = new List<string>();
List<string> Nums = new List<string>();
foreach (var itm in list)
{
int num = (int)GetSumOfDigits(itm);
SumofNums.Add(num);
SumandNums.Add(itm + "," + num);
}
SumofNums = SumofNums.OrderBy(i => i).ToList();
string txt = "";
foreach (var itm in SumofNums)
{
var item = itm.ToString();
if (!Nums.Contains(item))
{
foreach (var itm2 in SumandNums)
{
var itm3 = itm2.Split(',');
if (item == itm3[1])
{
SumandNums2.Add(itm2);
if (string.IsNullOrEmpty(txt))
txt = itm3[0];
else
txt = txt + " " + itm3[0];
}
}
Nums.Add(item);
}
}
return txt;
}
static long GetSumOfDigits(long n)
{
long num2 = 0;
long num3 = n;
long r = 0;
while (num3 != 0)
{
r = num3 % 10;
num3 = num3 / 10;
num2 = num2 + r;
}
return num2;
}
}
I can handle if there is only one but not duplicates.
Please help me rewrite my class so it can handle the duplicates also..
Sum of digits:
string weights = "103 123 4444 99 2000";
1) 2000, digit sum = 2;
2) 103, digit sum = 4;
3) 123, digit sum = 6;
4) 4444, digit sum = 16;
5) 99, digit sum = 18;
the correct order is "2000 103 123 4444 99"
You can use Linq if sorting by weight means ordering
1) by sum of digits, then
2) lexicographically ("as strings").
The implementation:
String a = "56 65 74 100 99 68 86 180 90";
// 100 180 90 56 65 74 68 86 99
String result = String.Join(" ", a
.Split(' ')
.OrderBy(item => item.Sum(ch => ch - '0')) // sum of digits
.ThenBy(item => item)); // lexicographic ("as string")
Try this:
var input = "103 123 4444 99 2000";
var sorted = input.Split(' ').OrderBy(s => s.Sum(c => c - '0')).ThenBy(s => s);
var result = string.Join(" ", sorted);
Addition: I realize now that Dmitry's answer had evolved into the same as mine before I posted mine.
New addition: If you find that s.Sum(c => c - '0') looks like a hack, you can add using System.Globalization; and write s.Sum((Func<char, int>)CharUnicodeInfo.GetDecimalDigitValue) instead.
You can validate in the lambda. For example:
var sorted = input.Split(' ')
.OrderBy(s => s.Sum(c => { if (c < '0' || c > '9') { throw new ArgumentOutOfRangeException("c", "Unexpected character."); } return c - '0'; }))
.ThenBy(s => s);
You can also do this by creating a comparer which tells you whether a value is greater than or less than another value and can then be used. The code largely speaks for itself:
void Main()
{
var strings = new List<string>("2000 10003 1234000 44444444 9999 11 11 22 123".Split(' '));
strings.Sort(new MyComparer());
Console.WriteLine(String.Join(" ", strings));
}
public class MyComparer : IComparer<string>
{
public int Compare(string a, string b)
{
var aWeight = GetWeight(a);
var bWeight = GetWeight(b);
if (aWeight==bWeight)
{
return String.Compare(a,b);
}
else
{
return aWeight < bWeight ? -1 : 1;
}
}
private int GetWeight(string number)
{
var weight = 0;
foreach(var digit in number)
{
weight+=Int32.Parse(digit.ToString());
}
return weight;
}
}
The key thing is the MyComparer class which defines a single public method that takes two values in. It gets the weights of the objects and if they are the same it falls back to string comparison.
This comparer can then be passed to a sort function such as that of List<T> to then do the sorting.
This is much lengthier but I thought it worth sharing as it is a little more reusable (eg if you do this in a lot of places in your code you can have your logic in a single class) and it can sometimes be a bit more readable.
I also note I am not a fan of ch - '0' as a way of getting the int value of a character since it is not always obvious at a glance what it does if you don't know the trick. Also in the event of non numeric characters it will still do things, just not necessarily anything sensible. Mine will just throw a good old fashioned exception that can be caught if you pass it any non-numeric data.
I found this solution pretty short and clear:
var orderedNumbers = "56 65 74 100 99 68 86 180 90".Split(' ')
.OrderBy(GetWeight)
.ThenBy(x => x);
var result = String.Join(" ", orderedNumbers);
This first calculates the weight of each number and sorts by that value. If two weights are equal, the ThenBy clause comes into play and further orders the result by string comparison (the items being sorted are still strings).
With
int GetWeight(string number)
{
return number.Sum(x => CharUnicodeInfo.GetDecimalDigitValue(x));
}
I've got this - possibly trivial - loop/combinations problem similar to binary combinations. I don't know how to approach it efficiently. Consider this scenario, I need unique loop to pass through all these combinations in a sequence:
Round ABC
01. 000 <- values of A=0, B=0, C=0
02. 001
03. 010
04. 011
05. 100
06. 101
07. 110
08. 111
09. 002
10. 012
11. 102
12. 112
13. 020
14. 021
15. 120
16. 121 <- values of A=1, B=2, C=1
17. 022
18. 122
19. 220
20. 221
21. 222
Except there are 12 letters (A-L), and also the "bit" size is not just 0,1 or 2 but any integer number (from 0 possibly up-to 1000 or 1024, not to make it crazy). I know it's a huge load of combinations, but I'll just scrap just top few that also fulfill my other conditions. So no need to worry about computational madness.
Disclaimer: The order has to be exactly as shown above. NOT a multiple FOR loops going first 0-1024 for C, then B.
Thanks in advance, I just can't seem to find the way to "algorithm it".
Update: Added whole sequence for combinations of ABC/012
regards,
Kate
Explanation:
I've encountered this problem when trying to tackle problem of analyzing sum of money for its combination of coins/notes:
For example $5001 to find out x optimal combinations.
10x $500 + 1x $1
50x $100 + 1x $1
..
Now letters (A,B,C..) correspond to a number of possible values of banknotes or coins ($1, $5,.. $100). While base correspond to a number of pieces of that banknotes/coins (for example $5001/$5000 = 1piece max.)
If I guess your sequence right, it will be easier to generate it recursively.
Here is an approach in Java which should generate a sequence that matches your scenario.
I hope it helps you (maybe I'll add more explanation later):
public static void init() {
// define constants
final int length = 3;
final char maxValue = '3';
// define buffer
final char[] array = new char[length]; java.util.Arrays.fill(array, '0');
final boolean[] alreadySet = new boolean[length]; java.util.Arrays.fill(alreadySet, false);
// fill first digit, then let the recursion take place
for(char c = '1'; c <= (char)(maxValue); c++) {
// iterate from lowest to highest digit
for(int i = array.length-1; i >= 0; i--) {
// set value
array[i] = c;
alreadySet[i] = true;
// print value
System.out.println(new String(array));
// call recursion
recursive(array, c, i, alreadySet, length);
// unset value
alreadySet[i] = false;
array[i] = '0';
}
}
}
public static void recursive(char[] array, char lastValue, int lastIndex, boolean[] alreadySet, int leftToSet) {
// if we didn't set all digits
if(leftToSet > 0) {
// iterate from lowest to highest digit
for(int i = array.length-1; i >= 0; i--) {
// missing all digits already set
if(!alreadySet[i]) {
// count from 1 to lastValue-1
for(char c = '1'; c < lastValue; c++) {
// set value
array[i] = c;
alreadySet[i] = true;
// print value
System.out.println(new String(array));
// call recursion
recursive(array, c, i, alreadySet, leftToSet-1);
// unset value
alreadySet[i] = false;
array[i] = '0';
}
}
}
char c = lastValue;
// iterate from lowest to highest digit
for(int i = array.length-1; i > lastIndex; i--) {
// missing all digits already set
if(!alreadySet[i]) {
// set value
array[i] = c;
alreadySet[i] = true;
// print value
System.out.println(new String(array));
// call recursion
recursive(array, c, i, alreadySet, leftToSet-1);
// unset value
alreadySet[i] = false;
array[i] = '0';
}
}
}
}
A rough sketch in pseudo C#/Java:
Mapping A-L to indexes 0-11
const int[] maxvalues = { define max values for each var }
int[] counters = { initialize with 0s }
while (true)
{
for(i in 11..0)
{
counters[i]++;
if (counters[i] < maxvalues[i])
break; // for
counters[i] = 0;
}
if (counters[0] == maxvalues[0])
break; // while
print(counters.ToDisplayString());
}
(Just noted that the second sequence does not match the first sequence in OP. If OP is correct, I guess I didn't "get" the sequence)
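For reference, here is a runnable C# version of the odometer-style counter sketched above. The names Odometer and Count are made up for illustration, and each entry of maxValues is assumed to be an exclusive upper bound of at least 1. The loop ends once the carry runs off the leftmost position. Like the sketch, it enumerates in plain counting order (000, 001, 002, 010, ...), which is not the order requested in the question.

```csharp
using System;
using System.Collections.Generic;

static class Odometer
{
    // maxValues[i] is the exclusive upper bound for position i (assumed to be >= 1).
    public static IEnumerable<int[]> Count(int[] maxValues)
    {
        var counters = new int[maxValues.Length];
        while (true)
        {
            // Hand out a copy so callers cannot disturb the running state.
            yield return (int[])counters.Clone();

            // Increment the rightmost position, carrying to the left on overflow.
            int i = counters.Length - 1;
            while (i >= 0)
            {
                counters[i]++;
                if (counters[i] < maxValues[i])
                    break;
                counters[i] = 0;
                i--;
            }

            if (i < 0)
                yield break; // every position wrapped around: all combinations visited
        }
    }
}
```

For three positions with values 0 to 2, foreach (var c in Odometer.Count(new[] { 3, 3, 3 })) Console.WriteLine(string.Join("", c)); prints the 27 combinations starting with 000, 001, 002, 010.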
The sequence of numbers you've described can be enumerated by counting upward from 0 in a base representation of numbers one higher than the amount of "letters" used to create your individual sequences.
One simple way to do this is to use a radix converter from base 10 which will act on a variable being incremented in a single loop from 0 to the maximum number of combinations you are looking to achieve.
Here is an implementation:
void Main()
{
for(int i=0; i< 50; i++){
Console.Write(convert(5,i));
Console.Write("\n");
}
}
string convert(int N, int M){
Stack<int> stack = new Stack<int>();
while (M >= N){
stack.Push(M %N);
M = M / N;
}
string str = M.ToString();
while(stack.Count() > 0)
str = str + stack.Pop().ToString();
return str;
}
Starting output:
0
1
2
3
4
10
11
12
13
14
20
21
22
23
24
30
31
32
33
34
40
41
42
43
44
100
101
102
103
104
There are many similar questions, but apparently no perfect match, that's why I'm asking.
I'd like to split a random string (e.g. 123xx456yy789) by a list of string delimiters (e.g. xx, yy) and include the delimiters in the result (here: 123, xx, 456, yy, 789).
Good performance is a nice bonus. Regex should be avoided, if possible.
Update: I did some performance checks and compared the results (too lazy to formally check them though). The tested solutions are (in random order):
Gabe
Guffa
Mafu
Regex
Other solutions were not tested because either they were similar to another solution or they came in too late.
This is the test code:
class Program
{
private static readonly List<Func<string, List<string>, List<string>>> Functions;
private static readonly List<string> Sources;
private static readonly List<List<string>> Delimiters;
static Program ()
{
Functions = new List<Func<string, List<string>, List<string>>> ();
Functions.Add ((s, l) => s.SplitIncludeDelimiters_Gabe (l).ToList ());
Functions.Add ((s, l) => s.SplitIncludeDelimiters_Guffa (l).ToList ());
Functions.Add ((s, l) => s.SplitIncludeDelimiters_Naive (l).ToList ());
Functions.Add ((s, l) => s.SplitIncludeDelimiters_Regex (l).ToList ());
Sources = new List<string> ();
Sources.Add ("");
Sources.Add (Guid.NewGuid ().ToString ());
string str = "";
for (int outer = 0; outer < 10; outer++) {
for (int i = 0; i < 10; i++) {
str += i + "**" + DateTime.UtcNow.Ticks;
}
str += "-";
}
Sources.Add (str);
Delimiters = new List<List<string>> ();
Delimiters.Add (new List<string> () { });
Delimiters.Add (new List<string> () { "-" });
Delimiters.Add (new List<string> () { "**" });
Delimiters.Add (new List<string> () { "-", "**" });
}
private class Result
{
public readonly int FuncID;
public readonly int SrcID;
public readonly int DelimID;
public readonly long Milliseconds;
public readonly List<string> Output;
public Result (int funcID, int srcID, int delimID, long milliseconds, List<string> output)
{
FuncID = funcID;
SrcID = srcID;
DelimID = delimID;
Milliseconds = milliseconds;
Output = output;
}
public void Print ()
{
Console.WriteLine ("S " + SrcID + "\tD " + DelimID + "\tF " + FuncID + "\t" + Milliseconds + "ms");
Console.WriteLine (Output.Count + "\t" + string.Join (" ", Output.Take (10).Select (x => x.Length < 15 ? x : x.Substring (0, 15) + "...").ToArray ()));
}
}
static void Main (string[] args)
{
var results = new List<Result> ();
for (int srcID = 0; srcID < 3; srcID++) {
for (int delimID = 0; delimID < 4; delimID++) {
for (int funcId = 3; funcId >= 0; funcId--) { // i tried various orders in my tests
Stopwatch sw = new Stopwatch ();
sw.Start ();
var func = Functions[funcId];
var src = Sources[srcID];
var del = Delimiters[delimID];
for (int i = 0; i < 10000; i++) {
func (src, del);
}
var list = func (src, del);
sw.Stop ();
var res = new Result (funcId, srcID, delimID, sw.ElapsedMilliseconds, list);
results.Add (res);
res.Print ();
}
}
}
}
}
As you can see, it was really just a quick and dirty test, but I ran the test multiple times and with different order and the result was always very consistent. The measured time frames are in the range of milliseconds up to seconds for the larger datasets. I ignored the values in the low-millisecond range in my following evaluation because they seemed negligible in practice. Here's the output on my box:
S 0 D 0 F 3 11ms
1
S 0 D 0 F 2 7ms
1
S 0 D 0 F 1 6ms
1
S 0 D 0 F 0 4ms
0
S 0 D 1 F 3 28ms
1
S 0 D 1 F 2 8ms
1
S 0 D 1 F 1 7ms
1
S 0 D 1 F 0 3ms
0
S 0 D 2 F 3 30ms
1
S 0 D 2 F 2 8ms
1
S 0 D 2 F 1 6ms
1
S 0 D 2 F 0 3ms
0
S 0 D 3 F 3 30ms
1
S 0 D 3 F 2 10ms
1
S 0 D 3 F 1 8ms
1
S 0 D 3 F 0 3ms
0
S 1 D 0 F 3 9ms
1 9e5282ec-e2a2-4...
S 1 D 0 F 2 6ms
1 9e5282ec-e2a2-4...
S 1 D 0 F 1 5ms
1 9e5282ec-e2a2-4...
S 1 D 0 F 0 5ms
1 9e5282ec-e2a2-4...
S 1 D 1 F 3 63ms
9 9e5282ec - e2a2 - 4265 - 8276 - 6dbb50fdae37
S 1 D 1 F 2 37ms
9 9e5282ec - e2a2 - 4265 - 8276 - 6dbb50fdae37
S 1 D 1 F 1 29ms
9 9e5282ec - e2a2 - 4265 - 8276 - 6dbb50fdae37
S 1 D 1 F 0 22ms
9 9e5282ec - e2a2 - 4265 - 8276 - 6dbb50fdae37
S 1 D 2 F 3 30ms
1 9e5282ec-e2a2-4...
S 1 D 2 F 2 10ms
1 9e5282ec-e2a2-4...
S 1 D 2 F 1 10ms
1 9e5282ec-e2a2-4...
S 1 D 2 F 0 12ms
1 9e5282ec-e2a2-4...
S 1 D 3 F 3 73ms
9 9e5282ec - e2a2 - 4265 - 8276 - 6dbb50fdae37
S 1 D 3 F 2 40ms
9 9e5282ec - e2a2 - 4265 - 8276 - 6dbb50fdae37
S 1 D 3 F 1 33ms
9 9e5282ec - e2a2 - 4265 - 8276 - 6dbb50fdae37
S 1 D 3 F 0 30ms
9 9e5282ec - e2a2 - 4265 - 8276 - 6dbb50fdae37
S 2 D 0 F 3 10ms
1 0**634226552821...
S 2 D 0 F 2 109ms
1 0**634226552821...
S 2 D 0 F 1 5ms
1 0**634226552821...
S 2 D 0 F 0 127ms
1 0**634226552821...
S 2 D 1 F 3 184ms
21 0**634226552821... - 0**634226552821... - 0**634226552821... - 0**634226
552821... - 0**634226552821... -
S 2 D 1 F 2 364ms
21 0**634226552821... - 0**634226552821... - 0**634226552821... - 0**634226
552821... - 0**634226552821... -
S 2 D 1 F 1 134ms
21 0**634226552821... - 0**634226552821... - 0**634226552821... - 0**634226
552821... - 0**634226552821... -
S 2 D 1 F 0 517ms
20 0**634226552821... - 0**634226552821... - 0**634226552821... - 0**634226
552821... - 0**634226552821... -
S 2 D 2 F 3 688ms
201 0 ** 634226552821217... ** 634226552821217... ** 634226552821217... ** 6
34226552821217... **
S 2 D 2 F 2 2404ms
201 0 ** 634226552821217... ** 634226552821217... ** 634226552821217... ** 6
34226552821217... **
S 2 D 2 F 1 874ms
201 0 ** 634226552821217... ** 634226552821217... ** 634226552821217... ** 6
34226552821217... **
S 2 D 2 F 0 717ms
201 0 ** 634226552821217... ** 634226552821217... ** 634226552821217... ** 6
34226552821217... **
S 2 D 3 F 3 1205ms
221 0 ** 634226552821217... ** 634226552821217... ** 634226552821217... ** 6
34226552821217... **
S 2 D 3 F 2 3471ms
221 0 ** 634226552821217... ** 634226552821217... ** 634226552821217... ** 6
34226552821217... **
S 2 D 3 F 1 1008ms
221 0 ** 634226552821217... ** 634226552821217... ** 634226552821217... ** 6
34226552821217... **
S 2 D 3 F 0 1095ms
220 0 ** 634226552821217... ** 634226552821217... ** 634226552821217... ** 6
34226552821217... **
I compared the results and this is what I found:
All 4 functions are fast enough for common usage.
The naive version (aka what I wrote initially) is the worst in terms of computation time.
Regex is a bit slow on small datasets (probably due to initialization overhead).
Regex does well on large data and hits a similar speed as the non-regex solutions.
The performance-wise best seems to be Guffa's version overall, which is to be expected from the code.
Gabe's version sometimes omits an item, but I did not investigate this (bug?).
To conclude this topic, I suggest using Regex, which is reasonably fast. If performance is critical, I'd prefer Guffa's implementation.
Despite your reluctance to use regex it actually nicely preserves the delimiters by using a group along with the Regex.Split method:
string input = "123xx456yy789";
string pattern = "(xx|yy)";
string[] result = Regex.Split(input, pattern);
If you remove the parentheses from the pattern, using just "xx|yy", the delimiters are not preserved. Be sure to use Regex.Escape on each delimiter if it contains any metacharacters that hold special meaning in regex. These characters include \, *, +, ?, |, {, [, (, ), ^, $, ., #. For instance, a delimiter of . should be escaped as \.. Given a list of delimiters, you need to "OR" them using the pipe | symbol, and that too is a character that gets escaped. To properly build the pattern use the following code (thanks to @gabe for pointing this out):
var delimiters = new List<string> { ".", "xx", "yy" };
string pattern = "(" + String.Join("|", delimiters.Select(d => Regex.Escape(d))
.ToArray())
+ ")";
The parentheses are concatenated rather than included in the pattern since they would be incorrectly escaped for your purposes.
EDIT: In addition, if the delimiters list happens to be empty, the final pattern would incorrectly be () and this would cause blank matches. To prevent this a check for the delimiters can be used. With all this in mind the snippet becomes:
string input = "123xx456yy789";
// to reach the else branch set delimiters to new List<string>();
var delimiters = new List<string> { ".", "xx", "yy", "()" };
if (delimiters.Count > 0)
{
string pattern = "("
+ String.Join("|", delimiters.Select(d => Regex.Escape(d))
.ToArray())
+ ")";
string[] result = Regex.Split(input, pattern);
foreach (string s in result)
{
Console.WriteLine(s);
}
}
else
{
// nothing to split
Console.WriteLine(input);
}
If you need a case-insensitive match for the delimiters use the RegexOptions.IgnoreCase option: Regex.Split(input, pattern, RegexOptions.IgnoreCase)
EDIT #2: the solution so far matches split tokens that might be a substring of a larger string. If the split token should be matched completely, rather than part of a substring, such as a scenario where words in a sentence are used as the delimiters, then the word-boundary \b metacharacter should be added around the pattern.
For example, consider this sentence (yea, it's corny): "Welcome to stackoverflow... where the stack never overflows!"
If the delimiters were { "stack", "flow" } the current solution would split "stackoverflow" and return 3 strings { "stack", "over", "flow" }. If you needed an exact match, then the only place this would split would be at the word "stack" later in the sentence and not "stackoverflow".
To achieve an exact match behavior alter the pattern to include \b as in \b(delim1|delim2|delimN)\b:
string pattern = @"\b("
+ String.Join("|", delimiters.Select(d => Regex.Escape(d)))
+ @")\b";
Finally, if trimming the spaces before and after the delimiters is desired, add \s* around the pattern as in \s*(delim1|delim2|delimN)\s*. This can be combined with \b as follows:
string pattern = @"\s*\b("
+ String.Join("|", delimiters.Select(d => Regex.Escape(d)))
+ @")\b\s*";
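To make the whole-word behaviour concrete, here is a small self-contained sketch that applies the combined \s*\b(...)\b\s* pattern to the example sentence. The class name WholeWordSplit and its Split method are made up for illustration, and the sketch assumes every delimiter starts and ends with a word character (otherwise \b will not behave as intended).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class WholeWordSplit
{
    // Splits on whole-word delimiters, keeps them in the result,
    // and trims the whitespace around each delimiter.
    public static string[] Split(string input, IEnumerable<string> delimiters)
    {
        var escaped = delimiters.Select(d => Regex.Escape(d)).ToArray();
        if (escaped.Length == 0)
            return new[] { input }; // nothing to split on

        string pattern = @"\s*\b(" + string.Join("|", escaped) + @")\b\s*";
        return Regex.Split(input, pattern);
    }
}
```

With the delimiters { "stack", "flow" }, the sentence "Welcome to stackoverflow... where the stack never overflows!" is split into three parts: "Welcome to stackoverflow... where the", "stack" and "never overflows!". The word "stackoverflow" is left intact because neither delimiter matches it as a whole word.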
Ok, sorry, maybe this one:
string source = "123xx456yy789";
foreach (string delimiter in delimiters)
source = source.Replace(delimiter, ";" + delimiter + ";");
string[] parts = source.Split(';');
Here's a solution that doesn't use a regular expression and doesn't make more strings than necessary:
public static List<string> Split(string searchStr, string[] separators)
{
List<string> result = new List<string>();
int length = searchStr.Length;
int lastMatchEnd = 0;
for (int i = 0; i < length; i++)
{
for (int j = 0; j < separators.Length; j++)
{
string str = separators[j];
int sepLen = str.Length;
if (((searchStr[i] == str[0]) && (sepLen <= (length - i))) && ((sepLen == 1) || (String.CompareOrdinal(searchStr, i, str, 0, sepLen) == 0)))
{
result.Add(searchStr.Substring(lastMatchEnd, i - lastMatchEnd));
result.Add(separators[j]);
i += sepLen - 1;
lastMatchEnd = i + 1;
break;
}
}
}
if (lastMatchEnd != length)
result.Add(searchStr.Substring(lastMatchEnd));
return result;
}
A naive implementation
public IEnumerable<string> SplitX (string text, string[] delimiters)
{
var split = text.Split (delimiters, StringSplitOptions.None);
foreach (string part in split) {
yield return part;
text = text.Substring (part.Length);
string delim = delimiters.FirstOrDefault (x => text.StartsWith (x));
if (delim != null) {
yield return delim;
text = text.Substring (delim.Length);
}
}
}
I came up with a solution for something similar a while back. To efficiently split a string you can keep a list of the next occurrence of each delimiter. That way you minimise the number of times that you have to look for each delimiter.
This algorithm will perform well even for a long string and a large number of delimiters:
string input = "123xx456yy789";
string[] delimiters = { "xx", "yy" };
int[] nextPosition = delimiters.Select(d => input.IndexOf(d)).ToArray();
List<string> result = new List<string>();
int pos = 0;
while (true) {
int firstPos = int.MaxValue;
string delimiter = null;
for (int i = 0; i < nextPosition.Length; i++) {
if (nextPosition[i] != -1 && nextPosition[i] < firstPos) {
firstPos = nextPosition[i];
delimiter = delimiters[i];
}
}
if (firstPos != int.MaxValue) {
result.Add(input.Substring(pos, firstPos - pos));
result.Add(delimiter);
pos = firstPos + delimiter.Length;
for (int i = 0; i < nextPosition.Length; i++) {
if (nextPosition[i] != -1 && nextPosition[i] < pos) {
nextPosition[i] = input.IndexOf(delimiters[i], pos);
}
}
} else {
result.Add(input.Substring(pos));
break;
}
}
(With reservations for any bugs, I just threw this version together now and I haven't tested it thoroughly.)
This has the same semantics as String.Split with StringSplitOptions.RemoveEmptyEntries (that is, empty tokens are not included), except that the separators are returned as well.
It can be made faster by using unsafe code to iterate over the source string, though this requires you to write the iteration mechanism yourself rather than using yield return.
It allocates the absolute minimum (a substring per non separator token plus the wrapping enumerator) so realistically to improve performance you would have to:
use even more unsafe code (by using 'CompareOrdinal' I effectively am)
mainly in avoiding the overhead of character lookup on the string with a char buffer
make use of domain specific knowledge about the input sources or tokens.
you may be happy to eliminate the null check on the separators
you may know that the separators are almost never individual characters
The code is written as an extension method, so it has to live in a static class:
public static IEnumerable<string> SplitWithTokens(
this string str,
string[] separators)
{
if (separators == null || separators.Length == 0)
{
yield return str;
yield break;
}
int prev = 0;
for (int i = 0; i < str.Length; i++)
{
foreach (var sep in separators)
{
if (!string.IsNullOrEmpty(sep))
{
if (((str[i] == sep[0]) &&
(sep.Length <= (str.Length - i)))
&&
((sep.Length == 1) ||
(string.CompareOrdinal(str, i, sep, 0, sep.Length) == 0)))
{
if (i - prev != 0)
yield return str.Substring(prev, i - prev);
yield return sep;
i += sep.Length - 1;
prev = i + 1;
break;
}
}
}
}
if (str.Length - prev > 0)
yield return str.Substring(prev, str.Length - prev);
}
My first post/answer...this is a recursive approach.
static void Split(string src, string[] delims, ref List<string> final)
{
if (src.Length == 0)
return;
int endTrimIndex = src.Length;
foreach (string delim in delims)
{
//get the index of the first occurrence of this delim
int indexOfDelim = src.IndexOf(delim);
//check to see if this delim is at the beginning of src
if (indexOfDelim == 0)
{
endTrimIndex = delim.Length;
break;
}
//see if this delim comes before previously searched delims
else if (indexOfDelim < endTrimIndex && indexOfDelim != -1)
endTrimIndex = indexOfDelim;
}
final.Add(src.Substring(0, endTrimIndex));
Split(src.Remove(0, endTrimIndex), delims, ref final);
}
Say that I have a set of numbers:
Group1 = 10, Group2 = 15, Group3 = 20, Group4 = 30
I want to output the summation of all subsets of numbers
10 + 15 = 25
10 + 15 + 20 = 45
10 + 15 + 20 + 30 = 75
15 + 20 = 35
15 + 20 + 30 = 65
20 + 30 = 50
10 + 20 = 30
10 + 30 = 40
10 + 20 + 30 = 60
... (assumed the rest is typed out)
Each of these groups will have a name, so I would want to print out the names used in the calculation before the result:
Group1 + Group2 = 25
How to do such a thing?
EDIT: to JacobM who edited tags, this is NOT homework and would appreciate an ask before you start editing it as such. I am actually at a customer site who is trying to balance a set of numbers, and the result is coming up incorrectly. My thought was to identify which group of numbers is equal to the delta between the 2 sets, and that would identify the problem directly.
Note: this would be float values, not integers.
EDIT2: added arbitrary so that it is understood that I can not just type this out once with a bunch of string.format's .. I could easily use excel at that point.
My thought was to identify which group of numbers is equal to the delta between the 2 sets, and that would identify the problem directly.
The problem "given an integer s, and a set of integers, does any non-empty subset of the set sum to s?" is known as the "subset sum problem". It is extremely well studied, and it is NP-Complete. (See this link for a related problem.)
That is to say it is amongst the hardest problems to solve in a reasonable amount of time. It is widely believed (though at present not proved) that no polynomial-time algorithm can possibly exist for this problem. The best you can do is something like O(2^n) for a set containing n elements.
(I note that your problem is in floats, not integers. It doesn't really matter, as long as you correctly handle the comparison of the calculated sum to the target sum to handle any rounding error that might have accrued in doing the sum.)
For a small number of elements -- you say you have only 15 or so in the set -- your best bet is to just try them all exhaustively. Here's how you do that.
The trick is to realize that there is one subset for each integer from 0 to 2^n - 1. If you look at those numbers in binary:
0000
0001
0010
0011
...
each one corresponds to a subset. The first has no members. The second has just group 1. The third has just group 2. The fourth has group 1 and group 2. And so on.
The pseudocode is easy enough:
for each integer i from 1 to 2^n - 1
{
sum = 0;
for each integer b from 1 to n
{
if the bth bit of i is on then sum = sum + group(b)
}
if sum == target then print out i in binary and quit
}
quit with no solution
Obviously this is O(n 2^n). If you can find an algorithm that always does better than O(c^n), or prove that you cannot find such an algorithm then you'll be famous forever.
The Wikipedia article has a better algorithm that gives an answer much faster most but not all of the time. I would go with the naive algorithm first since it will only take you a few minutes to code up; if it is unacceptably slow then go for the faster, more complex algorithm.
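For illustration, the naive bitmask search described in the pseudocode could look roughly like this in C#. The names SubsetSum and Find and the tolerance parameter are made up for this sketch; it assumes at most 30 groups and compares the floating-point sum to the target within the given tolerance instead of using ==.

```csharp
using System;
using System.Collections.Generic;

static class SubsetSum
{
    // Returns the names of the first non-empty subset whose sum is within
    // `tolerance` of `target`, or null when no subset matches.
    public static List<string> Find(string[] names, double[] values, double target, double tolerance)
    {
        int n = values.Length;
        if (n > 30)
            throw new ArgumentException("This sketch handles at most 30 groups.");

        // Every mask from 1 to 2^n - 1 stands for one non-empty subset.
        for (int mask = 1; mask < (1 << n); mask++)
        {
            double sum = 0;
            for (int b = 0; b < n; b++)
            {
                if ((mask & (1 << b)) != 0)
                    sum += values[b];
            }

            if (Math.Abs(sum - target) <= tolerance)
            {
                // Translate the set bits back into group names.
                var used = new List<string>();
                for (int b = 0; b < n; b++)
                {
                    if ((mask & (1 << b)) != 0)
                        used.Add(names[b]);
                }
                return used;
            }
        }
        return null; // no subset matches
    }
}
```

With the groups from the question (10, 15, 20, 30) and a target of 50, the first match is Group3 + Group4. The running time grows as O(n 2^n), so this is only practical for small sets such as the 15 or so groups mentioned above.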
This matches every possible combination...
static void Main(string[] args)
{
Dictionary<string, float> groups = new Dictionary<string, float>();
groups.Add("Group1", 10);
groups.Add("Group2", 15);
groups.Add("Group3", 20);
groups.Add("Group4", 30);
for (int i=0; i < groups.Count - 1; i++)
{
Iterate(groups, i, 0, "");
}
Console.Read();
}
private static void Iterate(Dictionary<string, float> groups, int k, float sum, string s)
{
KeyValuePair<string, float> g = groups.ElementAt(k);
if (string.IsNullOrEmpty(s))
{
s = g.Key;
}
else
{
s += " + " + g.Key;
Console.WriteLine(s + " = " + (sum + g.Value));
}
for (int i = k + 1; i < groups.Count; i++)
{
Iterate(groups, i, sum + g.Value, s);
}
}
I've asked a question about converting an integer to byte representation to solve a problem similar to this.
Converting integer to a bit representation
Here's my 10 cents. It uses the notion that I think @DK was hinting at. You take an integer and convert it to a binary number that represents a bitmask of groups to add: 1 means add it, 0 means skip it. It's in VB but should be convertible to C# pretty easily.
'//Create the group of numbers
Dim Groups As New List(Of Integer)({10, 15, 20, 30})
'//Find the highest bitmask value (same as 2^Groups.Count() - 1, but this reads better for me)
Dim MaxCount = Convert.ToInt32(New String("1"c, Groups.Count), 2)
'//Will hold our string representation of the current bitmask (0011, 1010, etc)
Dim Bits As String
'//Will hold our current total
Dim Total As Integer
'//Will hold the names of the groups added
Dim TextPart As List(Of String)
'//Loop through all possible combination
For I = 0 To MaxCount
'//Create our bitmask
Bits = Convert.ToString(I, 2).PadLeft(Groups.Count, "0")
'//Make sure we have got at least 2 groups
If Bits.Count(Function(ch) ch = "1"c) <= 1 Then Continue For
'//Re-initialize our group array
TextPart = New List(Of String)
'//Reset our total
Total = 0
'//Loop through each bit
For C = 0 To Bits.Count - 1
'//If its a 1, add it
If Bits(C) = "1"c Then
Total += Groups(C)
TextPart.Add("Group" & (C + 1))
End If
Next
'//Output
Trace.WriteLine(Join(TextPart.ToArray(), " + ") & " = " & Total)
Next
Outputs:
Group3 + Group4 = 50
Group2 + Group4 = 45
Group2 + Group3 = 35
Group2 + Group3 + Group4 = 65
Group1 + Group4 = 40
Group1 + Group3 = 30
Group1 + Group3 + Group4 = 60
Group1 + Group2 = 25
Group1 + Group2 + Group4 = 55
Group1 + Group2 + Group3 = 45
Group1 + Group2 + Group3 + Group4 = 75
This is a fairly classic combination problem. See this post for more details:
Algorithm to return all combinations of k elements from n
Effectively what you want to do is iterate from N-choose-1 through N-choose-N and calculate the sums of each subset.
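As a rough sketch of that idea, the following C# walks the subsets in order of size, from N-choose-1 up to N-choose-N, and computes each sum. The names SubsetsBySize, Choose and AllSums are hypothetical, and the index-advancing loop is just one common way to generate k-combinations, not something taken from the linked post.

```csharp
using System;
using System.Collections.Generic;

public static class SubsetsBySize
{
    // Yields every combination of exactly k indices from 0 .. n-1, in lexicographic order.
    public static IEnumerable<int[]> Choose(int n, int k)
    {
        if (k < 0 || k > n) yield break;

        var picked = new int[k];
        for (int i = 0; i < k; i++) picked[i] = i;

        while (true)
        {
            // Hand out a copy so callers cannot disturb the iteration state.
            yield return (int[])picked.Clone();

            // Find the rightmost position that has not reached its maximum value.
            int pos = k - 1;
            while (pos >= 0 && picked[pos] == n - k + pos) pos--;
            if (pos < 0) yield break;

            // Advance it and reset everything to its right.
            picked[pos]++;
            for (int i = pos + 1; i < k; i++) picked[i] = picked[i - 1] + 1;
        }
    }

    // Yields (indices, sum) for every non-empty subset, smallest subsets first.
    public static IEnumerable<KeyValuePair<int[], double>> AllSums(IReadOnlyList<double> values)
    {
        for (int k = 1; k <= values.Count; k++)
        {
            foreach (var combo in Choose(values.Count, k))
            {
                double sum = 0;
                foreach (int index in combo) sum += values[index];
                yield return new KeyValuePair<int[], double>(combo, sum);
            }
        }
    }
}
```

For four values this visits 4 + 6 + 4 + 1 = 15 subsets, so you can stop as soon as a sum matches the delta you are looking for.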
Well as already said the key to your solution lies in getting all the possible combinations! You could put something like this in a static class to register it as an extension method:
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int length = -1)
{
switch (length)
{
case -1:
foreach (var combination in Enumerable.Range(1, elements.Count()).Select(count => elements.Combinations(count)).SelectMany(c => c))
yield return combination;
break;
case 0:
yield return new T[0];
break;
default:
if (length < -1) throw new ArgumentOutOfRangeException("length");
foreach (var combination in
elements
.SelectMany((element, index) =>
elements
.Skip(index + 1)
.Combinations(length - 1)
.Select(previous => (new[] { element }).Concat(previous))))
yield return combination;
break;
}
}
... and use it like this:
static void Main(string[] args)
{
var groups = new[]
{
new Tuple<string, int>("Group1", 15),
new Tuple<string, int>("Group2", 5),
new Tuple<string, int>("Group3", 17),
};
foreach (var sum in groups
.Combinations()
.Select(x =>
string.Join(" + ", x.Select(tuple => tuple.Item1)) +
" = " +
x.Sum(tuple => tuple.Item2)))
{
Console.WriteLine(sum);
}
Console.ReadLine();
}
Output:
Group1 = 15
Group2 = 5
Group3 = 17
Group1 + Group2 = 20
Group1 + Group3 = 32
Group2 + Group3 = 22
Group1 + Group2 + Group3 = 37
Okay, the last one wasn't as straightforward as I thought. I actually tested it this time, and it gives the correct results.
void PrintInner( string output, float total, List<KeyValuePair<string, float>> children )
{
var parent = children[0];
var innerChildren = new List<KeyValuePair<string, float>>();
innerChildren.AddRange( children );
innerChildren.Remove( parent );
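bool isSingleGroup = output == ""; // An empty prefix means only one group would be printed.
output += parent.Key + ":" + parent.Value.ToString();
total += parent.Value;
if( !isSingleGroup ) // Prevents outputting "Group1:10 = 10"; remove the check if single groups are wanted.
Console.WriteLine( output + " = " + total.ToString() );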
output += " + ";
while( innerChildren.Count > 0 )
{
PrintInner( output, total, innerChildren );
innerChildren.RemoveAt( 0 );
}
}
void PrintAll()
{
var items = new List<KeyValuePair<string,float>>()
{
new KeyValuePair<string,float>( "Group1", 10 ),
new KeyValuePair<string,float>( "Group2", 15 ),
new KeyValuePair<string,float>( "Group3", 20 ),
new KeyValuePair<string,float>( "Group4", 30 )
};
while( items.Count > 0 )
{
PrintInner( "", 0, items );
items.RemoveAt( 0 );
}
}
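If Group is a custom data type you can overload the +, -, *, /, == and != operators (the assignment operator = itself cannot be overloaded in C#); +=, -=, *= and /= then become available automatically through the corresponding binary operators, as shown here: MSDN: Operator Overloading Tutorial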
If your data type is a native data type: int (Int32), long, decimal, double, or float you can do the operations you have.
To output the summation of your numbers you can use:
String.Format("{0} + {1} = {2}", Group1, Group2, (Group1 + Group2));
or
String.Format("{0} + {1} + {2} = {3}", Group1, Group2, Group3, (Group1 + Group2 + Group3));
Finally if in those examples Group is a custom data type, you would also have to overload the ToString() method so that it can display properly.
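As a minimal sketch of what that could look like, assuming a custom type that carries a name and a value: the type name NamedGroup and its members are hypothetical, and here the + operator combines the names as well as the values so that ToString() alone produces the whole line.

```csharp
using System;
using System.Globalization;

// Hypothetical group type; the name and members are assumptions for illustration.
public struct NamedGroup
{
    public string Name { get; }
    public double Value { get; }

    public NamedGroup(string name, double value)
    {
        Name = name;
        Value = value;
    }

    // Overloading binary + also makes += available for this type.
    public static NamedGroup operator +(NamedGroup left, NamedGroup right)
    {
        return new NamedGroup(left.Name + " + " + right.Name, left.Value + right.Value);
    }

    // Without this override, formatting would fall back to the type name.
    public override string ToString()
    {
        return Name + " = " + Value.ToString(CultureInfo.InvariantCulture);
    }
}
```

With this in place, Console.WriteLine(new NamedGroup("Group1", 10) + new NamedGroup("Group2", 15)) prints "Group1 + Group2 = 25".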
<bleepzter/>
OK, Part 2 - OO Algorithm Design?
So let's say you have the following:
public class Set: List<float>
{
public Set():base(){}
public static Set operator+(Set set1, Set set2)
{
Set result = new Set();
result.AddRange(set1.ToArray());
result.AddRange(set2.ToArray());
return result;
}
public float Sum
{
get
{
if( this.Count == 0 )
return 0F;
return this.Sum();
}
}
public override string ToString()
{
// Join the members with " + "; string.Join calls ToString() on each float.
string result = string.Join(" + ", this);
return String.Format("{0} = {1}", result, this.Sum);
}
}
The object Set will have a Sum property, as well as a ToString() method that will display the sum and all of its content.