Sort a List<StringBuilder> - C#

Requirement: Iterate through a sorted list of strings, adding a char at the beginning of each string, then re-sorting. This may need to be done a few thousand times. I tried using a regular List of strings but, as expected, the process was way too slow.
I was going to try a List of StringBuilders but there is no direct way to sort the list. Any workarounds come to mind?

You've stated you can't sort a List<StringBuilder> - however, you can, if you supply your own sort comparison:
List<StringBuilder> strings = new List<StringBuilder>();
// ...
strings.Sort((s1, s2) => s1.ToString().CompareTo(s2.ToString()));
The problem here, as @phoog notes, is that in order to do so it allocates a lot of extra strings and isn't very efficient. The sort he provides is better. To figure out which approach is better, we can supply a test. You can see the fiddle here: http://dotnetfiddle.net/Px4fys
The fiddle uses very few strings and very few iterations because it's in a fiddle and there's a memory limit. If you paste this into a console app and run in Release you'll find there are huge differences. As @phoog also suggests, LinkedList<char> wins hands down. StringBuilder is the slowest.
If we bump up the values and run in Release mode:
const int NumStrings = 1000;
const int NumIterations = 1500;
We'll find the results:
List<StringBuilder> - Elapsed Milliseconds: 27,678
List<string> - Elapsed Milliseconds: 2,932
LinkedList<char> - Elapsed Milliseconds: 912
EDIT 2: When I bumped both values up to 3000:
List<StringBuilder> - Elapsed Milliseconds: // Had to comment out - was taking several minutes
List<string> - Elapsed Milliseconds: 45,928
LinkedList<char> - Elapsed Milliseconds: 6,823
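The sizes of those gaps line up with the cost model: inserting at index 0 of a growable character buffer copies the whole buffer on every call, while a linked list links a new head node in constant time. A minimal sketch of the two approaches (written in Java, which this thread also uses; the class and method names here are illustrative, not the benchmark code above):

```java
import java.util.LinkedList;

public class PrependSketch {
    // Prepend each char of `chars` to the front, the slow way:
    // every insert(0, ...) shifts the whole buffer right by one.
    static String viaStringBuilder(String seed, String chars) {
        StringBuilder sb = new StringBuilder(seed);
        for (char c : chars.toCharArray()) {
            sb.insert(0, c);          // O(current length) per call
        }
        return sb.toString();
    }

    // Same result via a linked list: addFirst is O(1) per call.
    static String viaLinkedList(String seed, String chars) {
        LinkedList<Character> list = new LinkedList<>();
        for (char c : seed.toCharArray()) {
            list.addLast(c);
        }
        for (char c : chars.toCharArray()) {
            list.addFirst(c);         // O(1) per call
        }
        StringBuilder out = new StringBuilder(list.size());
        for (char c : list) {
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaStringBuilder("xyz", "abc")); // prints "cbaxyz"
        System.out.println(viaLinkedList("xyz", "abc"));    // prints "cbaxyz"
    }
}
```

Both produce the same logical string; the difference is purely the per-prepend cost, which is what dominates after thousands of iterations.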

The string builders will be a bit quicker than strings, but still slow, since you have to copy the entire buffer to add a character at the beginning.
You can create a custom comparison method (or comparer object if you prefer) and pass it to the List.Sort method:
int CompareStringBuilders(StringBuilder a, StringBuilder b)
{
    for (int i = 0; i < a.Length && i < b.Length; i++)
    {
        var comparison = a[i].CompareTo(b[i]);
        if (comparison != 0)
            return comparison;
    }
    return a.Length.CompareTo(b.Length);
}
Invoke it like this:
var list = new List<StringBuilder>();
//...
list.Sort(CompareStringBuilders);
You would probably do better to look for a different solution to your problem, however.
Linked lists offer quick prepending, so how about using LinkedList<char>? This might not work if you need other StringBuilder functions, of course.
StringBuilder was rewritten for .NET 4, so I've struck out my earlier comments about slow prepending of characters. If performance is an issue, you should test to see where the problems actually lie.

Thanks to all for the suggestions posted. I checked these, and I have to say that I'm astonished at how incredibly well LinkedList works - except for memory usage.
Another surprise is the slow sorting speed of the StringBuilder list. It works quickly, as expected, for the char-insert phase. But the posted benchmarks above reflect what I've found: StringBuilder sorts very slowly for some reason. Painfully slow.
A List of strings sorts faster. But counter to intuition, a List of LinkedList<char> sorts very fast. I have no idea how navigating a linked list could possibly be faster than simple indexing of a buffer (as in strings and StringBuilder), but it is. I would never have thought to try LinkedList. Compliments to McAden for the insight!
But unfortunately, LinkedList runs the system out of RAM. So, back to the drawing board.

Sort the StringBuilders as described in Phoog's answer, but keep the strings in reverse order in the StringBuilder instances - this way, you can optimize the "prepending" of each new character by appending it to the end of the StringBuilder's current value:
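The trick in miniature: store each string reversed, so "prepend" becomes an amortized O(1) append, and make the comparer walk from the end of the buffer (the logical front). A sketch in Java for illustration, with hypothetical helper names:

```java
public class ReversedPrepend {
    // The string is stored reversed, so "prepend c" is just append(c).
    static void prepend(StringBuilder reversed, char c) {
        reversed.append(c);
    }

    // Compare two reversed buffers as if they were in normal order:
    // walk both from the end, which is the logical beginning.
    static int compareReversed(StringBuilder a, StringBuilder b) {
        int i = a.length() - 1;
        int j = b.length() - 1;
        while (i >= 0 && j >= 0) {
            int cmp = Character.compare(a.charAt(i), b.charAt(j));
            if (cmp != 0) return cmp;
            i--; j--;
        }
        // One is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        // Logical strings "bc" and "abc", stored reversed.
        StringBuilder x = new StringBuilder("cb");   // represents "bc"
        StringBuilder y = new StringBuilder("cba");  // represents "abc"
        System.out.println(compareReversed(x, y) > 0); // "bc" > "abc" → true
        prepend(x, 'a');                               // x now represents "abc"
        System.out.println(compareReversed(x, y));     // equal → prints 0
    }
}
```

The single Reverse() pass at the start and end is O(n) each, paid once, instead of an O(n) buffer copy on every one of the thousands of prepends.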
Update: with test program
class Program
{
static readonly Random _rng = new Random();
static void Main(string[] args)
{
int stringCount = 2500;
int initialStringSize = 100;
int maxRng = 4;
int numberOfPrepends = 2500;
int iterations = 5;
Console.WriteLine( "String Count: {0}; # of Prepends: {1}; # of Unique Chars: {2}", stringCount, numberOfPrepends, maxRng );
var startingStrings = new List<string>();
for( int i = 0; i < stringCount; ++i )
{
var sb = new StringBuilder( initialStringSize );
for( int j = 0; j < initialStringSize; ++j )
{
sb.Append( _rng.Next( 0, maxRng ) );
}
startingStrings.Add( sb.ToString() );
}
for( int i = 0; i < iterations; ++i )
{
TestUsingStringBuilderAppendWithReversedStrings( startingStrings, maxRng, numberOfPrepends );
TestUsingStringBuilderPrepend( startingStrings, maxRng, numberOfPrepends );
}
var input = Console.ReadLine();
}
private static void TestUsingStringBuilderAppendWithReversedStrings( IEnumerable<string> startingStrings, int maxRng, int numberOfPrepends )
{
var builders = new List<StringBuilder>();
var start = DateTime.Now;
foreach( var str in startingStrings )
{
builders.Add( new StringBuilder( str ).Reverse() );
}
for( int i = 0; i < numberOfPrepends; ++i )
{
foreach( var sb in builders )
{
sb.Append( _rng.Next( 0, maxRng ) );
}
builders.Sort( ( x, y ) =>
{
var comparison = 0;
var xOffset = x.Length;
var yOffset = y.Length;
while( 0 < xOffset && 0 < yOffset && 0 == comparison )
{
--xOffset;
--yOffset;
comparison = x[ xOffset ].CompareTo( y[ yOffset ] );
}
if( 0 != comparison )
{
return comparison;
}
return xOffset.CompareTo( yOffset );
} );
}
builders.ForEach( sb => sb.Reverse() );
var end = DateTime.Now;
Console.WriteLine( "StringBuilder Reverse Append - Total Milliseconds: {0}", end.Subtract( start ).TotalMilliseconds );
}
private static void TestUsingStringBuilderPrepend( IEnumerable<string> startingStrings, int maxRng, int numberOfPrepends )
{
var builders = new List<StringBuilder>();
var start = DateTime.Now;
foreach( var str in startingStrings )
{
builders.Add( new StringBuilder( str ) );
}
for( int i = 0; i < numberOfPrepends; ++i )
{
foreach( var sb in builders )
{
sb.Insert( 0, _rng.Next( 0, maxRng ) );
}
builders.Sort( ( x, y ) =>
{
var comparison = 0;
for( int offset = 0; offset < x.Length && offset < y.Length && 0 == comparison; ++offset )
{
comparison = x[ offset ].CompareTo( y[ offset ] );
}
if( 0 != comparison )
{
return comparison;
}
return x.Length.CompareTo( y.Length );
} );
}
var end = DateTime.Now;
Console.WriteLine( "StringBulder Prepend - Total Milliseconds: {0}", end.Subtract( start ).TotalMilliseconds );
}
}
public static class Extensions
{
public static StringBuilder Reverse( this StringBuilder stringBuilder )
{
var endOffset = stringBuilder.Length - 1;
char a;
for( int beginOffset = 0; beginOffset < endOffset; ++beginOffset, --endOffset )
{
a = stringBuilder[ beginOffset ];
stringBuilder[ beginOffset ] = stringBuilder[ endOffset ];
stringBuilder[ endOffset ] = a;
}
return stringBuilder;
}
}
Results (2500 strings initially at 100 characters, 2500 prepends):

Related

Quick Sort Implementation with large numbers [duplicate]

I learnt about quicksort and how it can be implemented both recursively and iteratively.
In Iterative method:
Push the range (0...n) into the stack
Partition the given array with a pivot
Pop the top element.
Push the partitions (index range) onto a stack if the range has more than one element
Do the above 3 steps, till the stack is empty
And the recursive version is the normal one defined in wiki.
I learnt that recursive algorithms are always slower than their iterative counterparts.
So, which method is preferred in terms of time complexity (memory is not a concern)?
Which one is fast enough to use in a programming contest?
Is C++ STL sort() using a recursive approach?
In terms of (asymptotic) time complexity - they are both the same.
"Recursive is slower than iterative" - the rationale behind this statement is the overhead of the recursive stack (saving and restoring the environment between calls).
However, these are a constant number of ops per call, and they don't change the number of "iterations".
Both recursive and iterative quicksort are O(nlogn) average case and O(n^2) worst case.
EDIT:
Just for the fun of it, I ran a benchmark with the (Java) code attached to the post, and then ran a Wilcoxon statistical test to check what the probability is that the running times are indeed distinct.
The results may be conclusive (P_VALUE=2.6e-34, https://en.wikipedia.org/wiki/P-value. Remember that the P_VALUE is P(T >= t | H) where T is the test statistic and H is the null hypothesis). But the answer is not what you expected.
The average of the iterative solution was 408.86 ms, while that of the recursive one was 236.81 ms.
(Note - I used Integer and not int as the argument to recursiveQsort() - otherwise the recursive version would have achieved much better results, because it wouldn't have to box a lot of integers, which is also time consuming. I did it because the iterative solution has no choice but to do so.)
Thus - your assumption is not true; the recursive solution is faster (at least for my machine and Java) than the iterative one, with P_VALUE=2.6e-34.
public static void recursiveQsort(int[] arr,Integer start, Integer end) {
if (end - start < 2) return; //stop clause
int p = start + ((end-start)/2);
p = partition(arr,p,start,end);
recursiveQsort(arr, start, p);
recursiveQsort(arr, p+1, end);
}
public static void iterativeQsort(int[] arr) {
Stack<Integer> stack = new Stack<Integer>();
stack.push(0);
stack.push(arr.length);
while (!stack.isEmpty()) {
int end = stack.pop();
int start = stack.pop();
if (end - start < 2) continue;
int p = start + ((end-start)/2);
p = partition(arr,p,start,end);
stack.push(p+1);
stack.push(end);
stack.push(start);
stack.push(p);
}
}
private static int partition(int[] arr, int p, int start, int end) {
int l = start;
int h = end - 2;
int piv = arr[p];
swap(arr,p,end-1);
while (l < h) {
if (arr[l] < piv) {
l++;
} else if (arr[h] >= piv) {
h--;
} else {
swap(arr,l,h);
}
}
int idx = h;
if (arr[h] < piv) idx++;
swap(arr,end-1,idx);
return idx;
}
private static void swap(int[] arr, int i, int j) {
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
public static void main(String... args) throws Exception {
Random r = new Random(1);
int SIZE = 1000000;
int N = 100;
int[] arr = new int[SIZE];
int[] millisRecursive = new int[N];
int[] millisIterative = new int[N];
for (int t = 0; t < N; t++) {
for (int i = 0; i < SIZE; i++) {
arr[i] = r.nextInt(SIZE);
}
int[] tempArr = Arrays.copyOf(arr, arr.length);
long start = System.currentTimeMillis();
iterativeQsort(tempArr);
millisIterative[t] = (int)(System.currentTimeMillis()-start);
tempArr = Arrays.copyOf(arr, arr.length);
start = System.currentTimeMillis();
recursiveQsort(tempArr,0,arr.length);
millisRecursive[t] = (int)(System.currentTimeMillis()-start);
}
int sum = 0;
for (int x : millisRecursive) {
System.out.println(x);
sum += x;
}
System.out.println("end of recursive. AVG = " + ((double)sum)/millisRecursive.length);
sum = 0;
for (int x : millisIterative) {
System.out.println(x);
sum += x;
}
System.out.println("end of iterative. AVG = " + ((double)sum)/millisIterative.length);
}
Recursion is NOT always slower than iteration, and quicksort is a perfect example of it. The only way to do this iteratively is to create a stack structure yourself - in other words, to do the same thing the compiler does when we use recursion, and you will probably do it worse than the compiler. There will also be more jumps if you don't use recursion (to pop and push values onto the stack).
That's the solution I came up with in JavaScript. I think it works.
const myArr = [33, 103, 3, 726, 200, 984, 198, 764, 9]
document.write('initial order :', JSON.stringify(myArr), '<br><br>')
qs_iter(myArr)
document.write('_Final order :', JSON.stringify(myArr))
function qs_iter(items) {
if (!items || items.length <= 1) {
return items
}
var stack = []
var low = 0
var high = items.length - 1
stack.push([low, high])
while (stack.length) {
var range = stack.pop()
low = range[0]
high = range[1]
if (low < high) {
var pivot = Math.floor((low + high) / 2)
stack.push([low, pivot])
stack.push([pivot + 1, high])
while (low < high) {
while (low < pivot && items[low] <= items[pivot]) low++
while (high > pivot && items[high] > items[pivot]) high--
if (low < high) {
var tmp = items[low]
items[low] = items[high]
items[high] = tmp
}
}
}
}
return items
}
Let me know if you found a mistake :)
Mister Jojo's update:
This code just shuffles the values; only in rare cases can that happen to produce a sorted result - in other words, never. For those who have a doubt, I put it in a snippet.

What is the fastest implementation of sql like 'x%' in c# collections on a key

I have a need to do very quick prefix "sql like" searches over hundreds of thousands of keys. I have tried doing performance tests using a SortedList, a Dictionary, and a SortedDictionary, which I do like so:
var dictionary = new Dictionary<string, object>();
// add a million random strings
var results = dictionary.Where(x=>x.Key.StartsWith(prefix));
I find that they all take a long time; Dictionary is the fastest, and SortedDictionary the slowest.
Then I tried a Trie implementation from http://www.codeproject.com/Articles/640998/NET-Data-Structures-for-Prefix-String-Search-and-S, which is orders of magnitude faster, i.e. milliseconds instead of seconds.
So my question is, is there no .NET collection I can use for the said requirement? I would have assumed that this would be a common requirement.
My basic test :
class Program
{
static readonly Dictionary<string, object> dictionary = new Dictionary<string, object>();
static Trie<object> trie = new Trie<object>();
static void Main(string[] args)
{
var random = new Random();
for (var i = 0; i < 100000; i++)
{
var randomstring = RandomString(random, 7);
dictionary.Add(randomstring, null);
trie.Add(randomstring, null);
}
var lookups = new string[10000];
for (var i = 0; i < lookups.Length; i++)
{
lookups[i] = RandomString(random, 3);
}
// compare searching
var sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
var exists = dictionary.Any(k => k.Key.StartsWith(lookup));
}
sw.Stop();
Console.WriteLine("dictionary.Any(k => k.Key.StartsWith(randomstring)) took : {0} ms", sw.ElapsedMilliseconds);
// test other collections
sw.Restart();
foreach (var lookup in lookups)
{
var exists = trie.Retrieve(lookup).Any();
}
sw.Stop();
Console.WriteLine("trie.Retrieve(lookup) took : {0} ms", sw.ElapsedMilliseconds);
Console.ReadKey();
}
public static string RandomString(Random random,int length)
{
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
}
Results:
dictionary.Any(k => k.Key.StartsWith(randomstring)) took : 80990 ms
trie.Retrieve(lookup) took : 115 ms
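That gap makes sense once you see what a trie does: a prefix check walks at most prefix-length nodes, regardless of how many keys are stored, whereas the LINQ scan tests every key. The core of a trie is only a couple dozen lines; a minimal sketch (in Java for illustration - this is a hypothetical toy class, not the CodeProject Trie<T>):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal prefix trie: supports add() and a startsWith() check.
// Each node holds a map from the next character to the child node,
// so a prefix query costs O(prefix length), independent of key count.
public class PrefixTrie {
    private final Map<Character, PrefixTrie> children = new HashMap<>();

    public void add(String key) {
        PrefixTrie node = this;
        for (char c : key.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new PrefixTrie());
        }
    }

    public boolean startsWith(String prefix) {
        PrefixTrie node = this;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        trie.add("ABCDEFG");
        trie.add("ABCXYZ");
        System.out.println(trie.startsWith("ABC")); // prints true
        System.out.println(trie.startsWith("ABD")); // prints false
    }
}
```

A production trie would also store the value at terminal nodes and support retrieving all keys under a prefix, but the lookup cost model is the same.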
If sorting matters, try to use a SortedList instead of SortedDictionary. They both have the same functionality but they are implemented differently. SortedList is faster when you want to enumerate the elements (and you can access the elements by index), and SortedDictionary is faster if there are a lot of elements and you want to insert a new element in the middle of the collection.
So try this:
var sortedList = new SortedList<string, object>();
// populate list...
sortedList.Keys.Any(k => k.StartsWith(lookup));
If you have a million elements, but you don't want to re-order them once the dictionary is populated, you can combine their advantages: populate a SortedDictionary with the random elements, and then create a new List<KeyValuePair<,>> or SortedList<,> from that.
So, after a little testing I found something close enough using BinarySearch. The only con is that you have to sort the keys from a to z. And the bigger the list, the slower it gets, so a ternary search is about the fastest you can actually get on a binary PC architecture.
Method (credit should go to @Guffa):
public static int BinarySearchStartsWith(List<string> words, string prefix, int min, int max)
{
    while (max >= min)
    {
        var mid = (min + max) / 2;
        // Compare only as many characters as the word actually has,
        // so words shorter than the prefix don't throw.
        var len = Math.Min(prefix.Length, words[mid].Length);
        var comp = string.CompareOrdinal(words[mid].Substring(0, len), prefix);
        if (comp >= 0)
        {
            if (comp > 0)
                max = mid - 1;
            else
                return mid;
        }
        else
            min = mid + 1;
    }
    return -1;
}
And the test implementation:
var keysToList = dictionary.Keys.OrderBy(q => q).ToList();
sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
    bool exists = BinarySearchStartsWith(keysToList, lookup, 0, keysToList.Count - 1) != -1;
}
sw.Stop();
If you can sort the keys once and then use them repeatedly to look up the prefixes, then you can use a binary search to speed things up.
To get the maximum performance, I use two arrays, one for keys and one for values, and the overload of Array.Sort() which sorts a main array and an adjunct array in tandem.
Then you can use Array.BinarySearch() to search for the nearest key which starts with a given prefix, and return the indices for those that match.
When I try it, it seems to only take around 0.003ms per check if there are one or more matching prefixes.
Here's a runnable console application to demonstrate (remember to do your timings on a RELEASE build):
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;
namespace Demo
{
class Program
{
public static void Main()
{
int count = 1000000;
object obj = new object();
var keys = new string[count];
var values = new object[count];
for (int i = 0; i < count; ++i)
{
keys[i] = randomString(5, 16);
values[i] = obj;
}
// Sort key array and value arrays in tandem to keep the relation between keys and values.
Array.Sort(keys, values);
// Now you can use StartsWith() to return the indices of strings in keys[]
// that start with a specific string. The indices can be used to look up the
// corresponding values in values[].
Console.WriteLine("Count of ZZ = " + StartsWith(keys, "ZZ").Count());
// Test a load of times with 1000 random prefixes.
var prefixes = new string[1000];
for (int i = 0; i < 1000; ++i)
prefixes[i] = randomString(1, 8);
var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; ++i)
for (int j = 0; j < 1000; ++j)
StartsWith(keys, prefixes[j]).Any();
Console.WriteLine("1,000,000 checks took {0} for {1} ms each.", sw.Elapsed, sw.ElapsedMilliseconds/1000000.0);
}
public static IEnumerable<int> StartsWith(string[] array, string prefix)
{
int index = Array.BinarySearch(array, prefix);
if (index < 0)
index = ~index;
// We might have landed partway through a set of matches, so find the first match.
if (index < array.Length)
while ((index > 0) && array[index-1].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
--index;
while ((index < array.Length) && array[index].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
yield return index++;
}
static string randomString(int minLength, int maxLength)
{
int length = rng.Next(minLength, maxLength);
const string CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
return new string(Enumerable.Repeat(CHARS, length)
.Select(s => s[rng.Next(s.Length)]).ToArray());
}
static readonly Random rng = new Random(12345);
}
}

How to Know This is the Last Iteration of a Loop?

Is there a fancy way to know you are in the last iteration of a loop over a List without using counters?
List<string> myList = new List<string>() {"are", "we", "there", "yet"};
foreach(string myString in myList) {
// Is there a fancy way to find out here if this is the last time through?
}
No, you will have to use a regular for (int i = 0; i < myList.Count; i++) { ... } loop for that.
How about using a for loop instead?
List<string> myList = new List<string>() {"are", "we", "there", "yet"};
for (var i=0; i<myList.Count; i++)
{
var myString = myList[i];
if (i==myList.Count-1)
{
// this is the last item in the list
}
}
foreach is useful - but if you need to keep count of what you are doing, or index things by count, then just revert to a good old for loop.
There are two ways I would do it.
First, with a for loop instead of foreach
for(int i = 0; i < myList.Count; i++)
{
string myString = myList[i];
bool isLast = i == myList.Count - 1;
...
}
Or, if this needs to work with enumerators, change the order of things. Normally MoveNext is done as the control of the while loop, but if we do it right at the beginning of the loop, we can use its return to determine if we're at the end of the list or not.
IEnumerator<string> enumerator = myList.GetEnumerator();
bool isLast = !enumerator.MoveNext();
while (!isLast)
{
    string myString = enumerator.Current;
    isLast = !enumerator.MoveNext();
    ...
}
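The same one-step-lookahead pattern reads a little shorter with Java's Iterator, since hasNext() is separate from next(). A small sketch (the render helper is a hypothetical example, joining all but the last item with a separator):

```java
import java.util.Iterator;
import java.util.List;

public class LastIteration {
    // Renders the list, marking the last element by omitting its separator.
    // Uses only the iterator: no counter, no size() check inside the loop.
    static String render(List<String> items) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            String current = it.next();
            boolean isLast = !it.hasNext(); // one-step lookahead
            sb.append(current);
            if (!isLast) sb.append(", ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(List.of("are", "we", "there", "yet")));
        // prints "are, we, there, yet"
    }
}
```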
There's no efficient way to do this.
Just use a for loop and an index. It runs faster anyway.
Well... there is no way to know that, at least not as easily as you might guess. A list is just a structure that orders items one after another; using only the foreach construct, you have no way to tell whether an item is the last one. It's best to use a for loop together with the Count property to know when you hit the last (or any other) item.
var length = list.Count;
for (var idx = 0; idx < list.Count; idx++)
{
    if (idx == (length - 1))
    {
        // The last item. Do something.
    }
}
Hope it helps.
No, there isn't, you'll have to use a counter if you want to use foreach.
Use the for statement instead of foreach.
The "fancy" way is to maintain one element of lookahead, but why on earth would you want to? Just maintain a count. Here's the fancy way anyway - no counters, no checking the Count property. All it uses is an enumerator:
using System;
using System.Collections.Generic;
namespace Sandbox
{
class Program
{
enum ListPosition : byte
{
First = 0x01 ,
Only = First|Last ,
Middle = 0x02 ,
Last = 0x04 ,
Exhausted = 0x00 ,
}
private static void WalkList( List<int> numbers )
{
List<int>.Enumerator numberWalker = numbers.GetEnumerator();
bool currFetched = numberWalker.MoveNext();
int currValue = currFetched ? numberWalker.Current : default( int );
bool nextFetched = numberWalker.MoveNext();
int nextValue = nextFetched ? numberWalker.Current : default( int );
ListPosition position ;
if ( currFetched && nextFetched ) position = ListPosition.First ;
else if ( currFetched && ! nextFetched ) position = ListPosition.Only ;
else if ( ! currFetched ) position = ListPosition.Exhausted ;
else throw new InvalidOperationException( "Reached Unreachable Code. Hmmm...that doesn't seem quite right" );
while ( position != ListPosition.Exhausted )
{
string article = ( position==ListPosition.Middle?"a":"the" );
Console.WriteLine( " {0} is {1} {2} item in the list" , currValue , article , position );
currFetched = nextFetched ;
currValue = nextValue ;
nextFetched = numberWalker.MoveNext() ;
nextValue = nextFetched?numberWalker.Current:default( int ) ;
if ( currFetched && nextFetched ) position = ListPosition.Middle ;
else if ( currFetched && ! nextFetched ) position = ListPosition.Last ;
else if ( ! currFetched ) position = ListPosition.Exhausted ;
else throw new InvalidOperationException( "Reached Unreachable Code. Hmmm...that doesn't seem quite right" );
}
Console.WriteLine() ;
return ;
}
static void Main( string[] args )
{
List<int> list1 = new List<int>( new []{ 1 , } ) ;
List<int> list2 = new List<int>( new []{ 1 , 2 , } ) ;
List<int> list3 = new List<int>( new []{ 1 , 2 , 3 , } ) ;
List<int> list4 = new List<int>( new []{ 1 , 2 , 3 , 4 , } ) ;
Console.WriteLine( "List 1:" ) ; WalkList( list1 ) ;
Console.WriteLine( "List 2:" ) ; WalkList( list2 ) ;
Console.WriteLine( "List 3:" ) ; WalkList( list3 ) ;
Console.WriteLine( "List 4:" ) ; WalkList( list4 ) ;
return ;
}
}
}
Depends on your definition of "fancy". This might qualify:
if (myList.Count > 0)
{
    myList.Take(myList.Count - 1).ToList().ForEach(myString => doSomething(myString));
    doSomethingElse(myList.Last());
}
To me, that seems close to what you're looking for. Note that it's not super high-performance, but it's short and pretty readable, at least to my eyes. You could also write it this way:
if (myList.Count > 0)
{
    foreach (string myString in myList.Take(myList.Count - 1))
    {
        doSomething(myString);
    }
    doSomethingElse(myList.Last());
}
Here's another alternative, which I'd consider the most obvious, but it's probably the fastest way of doing this:
if (myList.Count > 0)
{
    for (int i = 0; i < myList.Count - 1; i++)
    {
        doSomething(myList[i]);
    }
    doSomethingElse(myList[myList.Count - 1]);
}

A simpler big adder in C#?

I just had a task in school to write a big adder, meaning a method that can add very large numbers together.
We had 10 minutes and I did complete it on time. The teacher approved it.
I am not too satisfied with the result, though, and I thought perhaps I was taking the wrong approach.
Here is my version:
using System;
using System.Text;

namespace kæmpe_adder
{
    static class Program
    {
        static void Main()
        {
            var x = "1111";
            var y = "111111111";
            Console.WriteLine(BigAdder(x, y));
            Console.ReadLine();
        }

        public static StringBuilder BigAdder(string x, string y)
        {
            var a = new StringBuilder(x);
            var b = new StringBuilder(y);
            return BigAdder(a, b);
        }

        public static StringBuilder BigAdder(StringBuilder x, StringBuilder y)
        {
            int biggest;
            int carry = 0;
            int sum;
            var stringSum = new StringBuilder();

            if (x.Length > y.Length)
            {
                y.FillString(x.Length - y.Length);
                biggest = x.Length;
            }
            else if (y.Length > x.Length)
            {
                x.FillString(y.Length - x.Length);
                biggest = y.Length;
            }
            else
            {
                biggest = y.Length;
            }

            for (int i = biggest - 1; i >= 0; i--)
            {
                sum = Convert.ToInt32(x[i].ToString()) + Convert.ToInt32(y[i].ToString()) + carry;
                carry = sum / 10;
                stringSum.Insert(0, sum % 10);
            }

            if (carry != 0)
            {
                stringSum.Insert(0, carry);
            }

            return stringSum;
        }

        public static void FillString(this StringBuilder str, int max)
        {
            for (int i = 0; i < max; i++)
            {
                str.Insert(0, "0");
            }
        }
    }
}
When I wrote it, I thought of how you do it with binaries.
Is there a shorter and/or perhaps simpler way to do this?
From the algebraic point of view your code looks correct. From the design point of view, you would definitely prefer to encapsulate each of these big numbers in a class, so that you don't have to reference the strings/string builders all the time. I am also not a big fan of the FillString approach; it seems more reasonable to add the digits while both numbers have non-zero values, and then just add the carry to the longer number until you are done.
I'm not sure what the question about binaries was getting at - normal-length numbers (32-bit and 64-bit) are added by the CPU as a single operation.
There are a number of open source implementations you could look to for inspiration.
http://www.codeproject.com/KB/cs/biginteger.aspx
http://biginteger.codeplex.com/
In general, I would recommend using an array of byte or long for best performance, but the conversion from a string to the array would be non-trivial.
Store the numbers in reverse order; this makes finding equivalent places trivial.
This makes it easier to add differently sized numbers:
int place = 0;
int carry = 0;
while ( place < shorter.Length ) {
result.Append (AddDigits (longer[place], shorter[place], ref carry));
++place;
}
while ( place < longer.Length ) {
result.Append (AddDigits (longer[place], 0, ref carry));
++place;
}
if ( carry != 0 )
result.Append (carry.ToString ());
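Combining the suggestions above (digit-at-a-time carry, no zero-padding, and appending rather than inserting at the front), the whole adder fits in a few lines. A sketch in Java for illustration; the add method name is hypothetical:

```java
public class BigAdder {
    // Adds two non-negative decimal numbers given as ordinary strings,
    // walking the digits from the right so no zero-padding is needed.
    static String add(String x, String y) {
        StringBuilder result = new StringBuilder(); // built least-significant first
        int i = x.length() - 1;
        int j = y.length() - 1;
        int carry = 0;
        while (i >= 0 || j >= 0 || carry != 0) {
            int sum = carry;
            if (i >= 0) sum += x.charAt(i--) - '0';
            if (j >= 0) sum += y.charAt(j--) - '0';
            carry = sum / 10;
            result.append((char) ('0' + sum % 10)); // append, not insert(0, ...)
        }
        return result.reverse().toString();         // one reverse at the end
    }

    public static void main(String[] args) {
        System.out.println(add("1111", "111111111")); // prints "111112222"
        System.out.println(add("999", "1"));          // prints "1000"
    }
}
```

Appending and reversing once at the end avoids the repeated Insert(0, ...) buffer copies of the original version.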

C#: Cleanest way to divide a string array into N instances N items long

I know how to do this in an ugly way, but am wondering if there is a more elegant and succinct method.
I have a string array of e-mail addresses. Assume the string array is of arbitrary length -- it could have a few items or it could have a great many items. I want to build another string consisting of say, 50 email addresses from the string array, until the end of the array, and invoke a send operation after each 50, using the string of 50 addresses in the Send() method.
The question more generally is what's the cleanest/clearest way to do this kind of thing. I have a solution that's a legacy of my VBScript learnings, but I'm betting there's a better way in C#.
You want elegant and succinct, I'll give you elegant and succinct:
var fifties = from index in Enumerable.Range(0, addresses.Length)
              group addresses[index] by index / 50;
foreach (var fifty in fifties)
    Send(string.Join(";", fifty.ToArray()));
Why mess around with all that awful looping code when you don't have to? You want to group things by fifties, so group them by fifties.
That's what the group operator is for!
UPDATE: commenter MoreCoffee asks how this works. Let's suppose we wanted to group by threes, because that's easier to type.
var threes = from index in Enumerable.Range(0, addresses.Length)
group addresses[index] by index/3;
Let's suppose that there are nine addresses, indexed zero through eight
What does this query mean?
The Enumerable.Range is a range of nine numbers starting at zero, so 0, 1, 2, 3, 4, 5, 6, 7, 8.
Range variable index takes on each of these values in turn.
We then go over each corresponding addresses[index] and assign it to a group.
What group do we assign it to? To group index/3. Integer arithmetic rounds towards zero in C#, so indexes 0, 1 and 2 become 0 when divided by 3. Indexes 3, 4, 5 become 1 when divided by 3. Indexes 6, 7, 8 become 2.
So we assign addresses[0], addresses[1] and addresses[2] to group 0, addresses[3], addresses[4] and addresses[5] to group 1, and so on.
The result of the query is a sequence of three groups, and each group is a sequence of three items.
Does that make sense?
Remember also that the result of the query expression is a query which represents this operation. It does not perform the operation until the foreach loop executes.
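The same grouping arithmetic (element i lands in chunk i / chunkSize) works in any language. Here is a sketch in Java, which this thread also touches, using plain index division instead of LINQ; the Chunker class is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Splits a list into consecutive chunks of at most chunkSize elements.
    // Same idea as grouping by index / chunkSize; subList avoids copying.
    static <T> List<List<T>> chunk(List<T> source, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < source.size(); i += chunkSize) {
            chunks.add(source.subList(i, Math.min(i + chunkSize, source.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Nine elements grouped by threes, matching the worked example above.
        System.out.println(chunk(List.of(0, 1, 2, 3, 4, 5, 6, 7, 8), 3));
        // prints [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
        System.out.println(chunk(List.of(1, 2, 3, 4, 5), 2));
        // prints [[1, 2], [3, 4], [5]]
    }
}
```

Note that subList returns views over the source list, so this is cheap, but the views are only valid while the source is not structurally modified.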
Seems similar to this question: Split a collection into n parts with LINQ?
A modified version of Hasan Khan's answer there should do the trick:
public static IEnumerable<IEnumerable<T>> Chunk<T>(
    this IEnumerable<T> list, int chunkSize)
{
    // Group by a computed index rather than a mutable counter (i++),
    // so the query is safe to enumerate more than once.
    var chunks = from pair in list.Select((item, index) => new { item, index })
                 group pair.item by pair.index / chunkSize into part
                 select part.AsEnumerable();
    return chunks;
}
Usage example:
var addresses = new[] { "a@example.com", "b@example.org", ...... };
foreach (var chunk in Chunk(addresses, 50))
{
SendEmail(chunk.ToArray(), "Buy V14gr4");
}
It sounds like the input consists of separate email address strings in a large array, not several email address in one string, right? And in the output, each batch is a single combined string.
string[] allAddresses = GetLongArrayOfAddresses();
const int batchSize = 50;
for (int n = 0; n < allAddresses.Length; n += batchSize)
{
string batch = string.Join(";", allAddresses, n,
Math.Min(batchSize, allAddresses.Length - n));
// use batch somehow
}
Assuming you are using .NET 3.5 and C# 3, something like this should work nicely:
string[] addresses = new string[] {"1", "2", "3", "4"....};
for (int i = 0; i < addresses.Length; i += 50)
{
    string batch = string.Join(";", addresses.Skip(i).Take(50).ToArray());
    DoSomething(batch);
}
I would just loop through the array and use a StringBuilder to create the list (I'm assuming it's separated by ";" like you would for email). Just send when you hit mod 50 or the end.
void Foo(string[] addresses)
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < addresses.Length; i++)
    {
        sb.Append(addresses[i]);
        if ((i + 1) % 50 == 0 || i == addresses.Length - 1)
        {
            Send(sb.ToString());
            sb = new StringBuilder();
        }
        else
        {
            sb.Append("; ");
        }
    }
}

void Send(string addresses)
{
}
I think we need to have a little bit more context on what exactly this list looks like to give a definitive answer. For now I'm assuming that it's a semicolon-delimited list of email addresses. If so, you can do the following to get a chunked-up list.
public IEnumerable<string> DivideEmailList(string list)
{
    var last = 0;
    var cur = list.IndexOf(';');
    while (cur >= 0)
    {
        yield return list.Substring(last, cur - last);
        last = cur + 1;
        cur = list.IndexOf(';', last);
    }
    // Don't drop the final address after the last semicolon.
    if (last < list.Length)
        yield return list.Substring(last);
}

public IEnumerable<List<string>> ChunkEmails(string list)
{
    using (var e = DivideEmailList(list).GetEnumerator())
    {
        var chunk = new List<string>();
        while (e.MoveNext())
        {
            chunk.Add(e.Current);
            if (chunk.Count == 50)
            {
                yield return chunk;
                chunk = new List<string>();
            }
        }
        if (chunk.Count != 0)
        {
            yield return chunk;
        }
    }
}
I think this is simple and fast enough. The example below divides the long sentence into chunks of 15 characters, but you can pass the batch size as a parameter to make it dynamic. Here I simply join using "/n".
private static string Concatenated(string longsentence)
{
    const int batchSize = 15;
    string concatenated = "";
    int chunks = longsentence.Length / batchSize;
    int currentIndex = 0;
    while (chunks > 0)
    {
        var sub = longsentence.Substring(currentIndex, batchSize);
        concatenated += sub + "/n";
        chunks -= 1;
        currentIndex += batchSize;
    }
    if (currentIndex < longsentence.Length)
    {
        int start = currentIndex;
        var finalsub = longsentence.Substring(start);
        concatenated += finalsub;
    }
    return concatenated;
}
This shows the result of the split operation:
var parts = Concatenated(longsentence).Split(new string[] { "/n" }, StringSplitOptions.None);
Extension methods based on Eric's answer:
public static IEnumerable<IEnumerable<T>> SplitIntoChunks<T>(this T[] source, int chunkSize)
{
    var chunks = from index in Enumerable.Range(0, source.Length)
                 group source[index] by index / chunkSize;
    return chunks;
}

public static T[][] SplitIntoArrayChunks<T>(this T[] source, int chunkSize)
{
    var chunks = from index in Enumerable.Range(0, source.Length)
                 group source[index] by index / chunkSize;
    return chunks.Select(e => e.ToArray()).ToArray();
}
