How to shuffle string characters to right and left until int.MaxValue? - c#

My task is to make an organized shuffle: from the source string, all characters at odd positions go to the left and characters at even positions go to the right.
I have done it like this, and it works fine for the normal scenario:
public static string ShuffleChars(string source, int count)
{
    if (string.IsNullOrWhiteSpace(source) || source.Length == 0)
    {
        throw new ArgumentException(null);
    }
    if (count < 0)
    {
        throw new ArgumentException(null);
    }
    for (int i = 0; i < count; i++)
    {
        source = string.Concat(source.Where((item, index) => index % 2 == 0)) +
                 string.Concat(source.Where((item, index) => index % 2 != 0));
    }
    return source;
}
Now the problem is, what if the count is int.MaxValue or another huge number in the millions? It will loop through a lot of iterations. How can I optimize the code in terms of speed and resource consumption?

You should be able to determine from the string's length how many iterations it will take before it's back to its original sort order. Then take the input count modulo that iteration count, and only iterate that many times.
For example, a string that is three characters long will be back to its original sort order in 2 iterations. If the input count said to do 11 iterations, we know that 11 % 2 == 1, so we only need to iterate one time.
Once you determine a formula for how many iterations it takes to reach the original sort order for any length of string, you can always reduce the number of iterations to that number or less.
Coming up with a formula will be tricky, however. A string with 14 characters takes 12 iterations until it matches itself, but a string with 15 characters only takes 4 iterations.
Therefore, a shortcut might be to simply start iterating until we reach the original sort order (or the specified count, whichever comes first). If we reach the count first, then we return that answer. Otherwise, we can determine the answer from the idea in the first paragraph - take the modulus of the input count and the iteration count, and return that answer.
This would require that we store the values from our iterations (in a dictionary, for example) so we can retrieve a specific previous value.
For example:
public static string ShuffleChars(string source, int count)
{
    string s = source;
    var results = new Dictionary<int, string>();
    for (int i = 0; i < count; i++)
    {
        s = string.Concat(s.Where((item, index) => index % 2 == 0)) +
            string.Concat(s.Where((item, index) => index % 2 != 0));
        // If we've returned to the original string, the cycle length is i + 1.
        // A remainder of 0 means the answer is the original string itself;
        // otherwise return the saved value at (remainder - 1).
        if (s == source)
        {
            int remainder = count % (i + 1);
            return remainder == 0 ? source : results[remainder - 1];
        }
        // Otherwise, save the value for later
        else
        {
            results[i] = s;
        }
    }
    // If we get here it means we hit the requested count before
    // ever returning to the original sort order of the input
    return s;
}

Instead of creating new immutable strings on each loop, you could work with a mutable array of characters (char[]), and swap characters between places. This would be the most efficient in terms of memory consumption, but doing the swaps on a single array could be quite tricky. Using two arrays is much easier, because you can just copy characters from one array to the other, and at the end of each loop swap the two arrays.
One more optimization you could do is to work with the indices of the characters instead of the characters themselves. I am not sure this will make any difference in practice; note that in .NET a char occupies 2 bytes and an int 4 bytes regardless of the machine architecture, so the index arrays actually use more memory, and the payoff, if any, comes from touching the source string only once at the end. Here is an implementation, with all these ideas put together:
public static string ShuffleChars(string source, int count)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
    // Instantiate the two arrays
    int[] indices = new int[source.Length];
    int[] temp = new int[source.Length];
    // Initialize the indices array with incremented numbers
    for (int i = 0; i < indices.Length; i++)
        indices[i] = i;
    for (int k = 0; k < count; k++)
    {
        // Copy the odd-positioned (even-indexed) characters to the front
        for (int i = 0, j = 0; j < indices.Length; i += 1, j += 2)
            temp[i] = indices[j];
        // Copy the even-positioned (odd-indexed) characters to the back, in order
        int lastEven = (indices.Length >> 1 << 1) - 1;
        for (int i = indices.Length - 1, j = lastEven; j >= 0; i -= 1, j -= 2)
            temp[i] = indices[j];
        // Swap the two arrays, using value tuples
        (indices, temp) = (temp, indices);
    }
    // Map the indices to characters from the source string
    return String.Concat(indices.Select(i => source[i]));
}

Related

TapeEquilibrium: Single failed test case... where is the error?

TLDR: Codility "Challenge" - my results: Where is the error?
Short Description (Full Description): Given an array, split it into two parts (an upper and a lower part) and return the minimum difference between the sums of the two parts, in absolute value.
My thought process is:
create an "Upper" and "Lower" bucket for sums.
In one pass of the array, we get a sum for the "Upper" bucket.
Then, one array value at a time, move the numbers into lower (Upper-n, Lower+n).
At each step, get the difference (Abs(Upper-lower))
Monitor lowest "Minimum"
Submitted Code:
public int solution(int[] A)
{
    // Quick results:
    if (A == null) return -1;
    if (A.Length == 0) return -1; // Can't split
    if (A.Length == 1) return -1; // Can't split
    if (A.Length == 2) return Math.Abs(A[0] - A[1]); // Only one way to split

    // Hold above/below/result...
    long lower = 0;
    long upper = 0;
    var min = long.MaxValue;

    // Pass #1: Sum all to get "upper"
    for (long i = 0; i < A.Length; i++) upper += A[i];

    // Pass #2:
    // foreach in array
    // ... Shift number from upper to lower
    // ... Calculate new difference/minimum
    for (var i = 0; i < A.Length; i++)
    {
        lower += A[i];
        upper -= A[i];
        var diff = Math.Abs(upper - lower);
        min = Math.Min(min, diff);
        if (diff == 0) return 0;
    }

    return (int) min;
}
Out of 13 test cases, the only one that Codility fails me on is "Small Numbers". It says "Wrong answer, expected 20 got 0". It doesn't show the test data it uses, so I'm left guessing as to why.
Where is my error? I think I've stared at it too much, but I can't seem to figure out what case would break my function.
Edit: fixed the translation. The code submitted to Codility uses a foreach, and the code I have here is a for. Corrected the variables in the loop.
The problem is that you didn't take into account one of the rules: 0 < P < N.
Your second loop is assuming 0 < P <= N.
Assume this input:
10, 10, -20
Your code would return 0, but 20 would be correct: the two valid splits give |10 - (-10)| = 20 (P = 1) and |20 - (-20)| = 40 (P = 2). Your loop goes one step too far, to P = 3, where lower = 0 and upper = 0, so it returns 0 even though that isn't a valid split.
Fix:
Change your second loop header to
for (var i = 0; i < A.Length - 1; i++)

How to efficiently generate all combinations (at all depths) whose sum is within a specified range

Suppose you have a set of values (1,1,1,12,12,16): how would you generate all possible combinations (without repetition) whose sum is within a predefined range [min,max]? For example, here are all the combinations (of all depths) whose sum lies between 13 and 17:
1 12
1 1 12
1 1 1 12
16
1 16
This assumes that each item of the same value is indistinguishable, so you don't have three results of 1 12 in the final output. Brute force is possible, but in situations where the number of items is large, the number of combinations at all depths is astronomical. In the example above, there are (3 + 1) * (2 + 1) * (1 + 1) = 24 combinations at all depths. Thus, the total combinations is the product of the number of items of any given value + 1. Of course we can logically throw out huge number of combinations whose partial sum is greater than the max value (e.g. the set 16 12 is already bigger than the max value of 17, so skip any combinations that have a 16 and 12 in them).
I originally thought I could convert the input array into two arrays and increment them kind of like an odometer. But I am getting completely stuck on this recursive algorithm that breaks early. Any suggestions?
{
    int uniqueValues = 3;
    int[] maxCounts = new int[uniqueValues];
    int[] values = new int[uniqueValues];

    // easy code to bin the data, just hardcoding for example
    maxCounts[0] = 3;
    values[0] = 1;
    maxCounts[1] = 2;
    values[1] = 12;
    maxCounts[2] = 1;
    values[2] = 16;

    GenerateCombinationsHelper(new List<int[]>(), 13, 17, 0, 0, maxCounts, new int[3], values);
}
private void GenerateCombinationsHelper(List<int[]> results, int min, int max, int currentValue, int index, int[] maxValues, int[] currentCombo, int[] values)
{
    if (index >= maxValues.Length)
    {
        return;
    }
    while (currentCombo[index] < maxValues[index])
    {
        currentValue += values[index];
        if (currentValue > max)
        {
            return;
        }
        currentCombo[index]++;
        if (currentValue < min)
        {
            GenerateCombinationsHelper(results, min, max, currentValue, index + 1, maxValues, currentCombo, values);
        }
        else
        {
            results.Add((int[])currentCombo.Clone());
        }
    }
}
Edit
The integer values are just for demonstration. It can be any object that has some sort of numerical value (int, double, float, etc.).
Typically there will only be a handful of unique values (~10 or so), but there can be several thousand items in total.
Switch the main call to:
GenerateCombinationsHelper2(new List<int[]>(), 13, 17, 0, maxCounts, new int[3], values);
and then add this code:
private void GenerateCombinationsHelper2(List<int[]> results, int min, int max, int index, int[] maxValues, int[] currentCombo, int[] values)
{
    int max_count = Math.Min((int)Math.Ceiling((double)max / values[index]), maxValues[index]);
    for (int count = 0; count <= max_count; count++)
    {
        currentCombo[index] = count;
        if (index < currentCombo.Length - 1)
        {
            GenerateCombinationsHelper2(results, min, max, index + 1, maxValues, currentCombo, values);
        }
        else
        {
            int sum = Sum(currentCombo, values);
            if (sum >= min && sum <= max)
            {
                int[] copy = new int[currentCombo.Length];
                Array.Copy(currentCombo, copy, copy.Length);
                results.Add(copy);
            }
        }
    }
}

private static int Sum(int[] combo, int[] values)
{
    int sum = 0;
    for (int i = 0; i < combo.Length; i++)
    {
        sum += combo[i] * values[i];
    }
    return sum;
}
It returns the 5 valid answers.
The general tendency with this kind of problem is that there are relatively few values that will show up, but each value shows up many, many times. Therefore you first want to create a data structure that efficiently describes the combinations that will add up to the desired values, and only then figure out all of the combinations that do so. (If you know the term "dynamic programming", that's exactly the approach I'm describing.)
The obvious data structure in C# terms would be a Hashtable whose keys are the totals that the combination adds up to, and whose values are arrays listing the positions of the last elements that can be used in a combination that could add up to that particular total.
How do you build that data structure?
First you start with a Hashtable which contains the total 0 as a key, and an empty array as a value. Then for each element of your array you create a list of the new totals you can reach from the previous totals, and append your element's position to each one of their values (inserting a new one if needed). When you've gone through all of your elements, you have your data structure.
Now you can search that data structure just for the totals that are in the range you want. And for each such total, you can write a recursive program that will go through your data structure to produce the combinations. This step can indeed have a combinatorial explosion, but the nice thing is that EVERY combination produced is actually a combination in your final answer. So if this phase takes a long time, it is because you have a lot of final answers!
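For concreteness, here is a minimal sketch of the table-building phase described above, in C# (a Dictionary<int, List<int>> stands in for the Hashtable; the method name and shapes are illustrative, not a definitive implementation):
static Dictionary<int, List<int>> BuildTotalsTable(int[] items)
{
    // Key: a reachable total. Value: positions of elements that can be
    // the last element of some combination adding up to that total.
    var table = new Dictionary<int, List<int>> { [0] = new List<int>() };
    for (int pos = 0; pos < items.Length; pos++)
    {
        // Snapshot the keys so this element only extends totals that
        // existed before we processed it (requires System.Linq for ToList)
        foreach (int total in table.Keys.ToList())
        {
            int newTotal = total + items[pos];
            if (!table.TryGetValue(newTotal, out List<int> positions))
            {
                table[newTotal] = positions = new List<int>();
            }
            positions.Add(pos);
        }
    }
    return table;
}
From the finished table you would pick out the keys in [min, max] and walk the position lists recursively to reconstruct the actual combinations.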
Try this algorithm:
int[] arr = { 1, 1, 1, 12, 12, 16 };
for (int i = 0; i < (1 << arr.Length); i++)
{
    // Treat i as a bitmask that selects a subset of arr
    for (int j = 0; j < arr.Length; j++)
        if ((i & (1 << j)) != 0)
            Console.Write("{0} ", arr[j]);
    Console.WriteLine();
}
This is quite similar to the subset sum problem, which just happens to be NP-complete.
Wikipedia says the following about NP-complete problems:
Although any given solution to such a problem can be verified quickly,
there is no known efficient way to locate a solution in the first
place; indeed, the most notable characteristic of NP-complete problems
is that no fast solution to them is known. That is, the time required
to solve the problem using any currently known algorithm increases
very quickly as the size of the problem grows. This means that the
time required to solve even moderately sized versions of many of these
problems can easily reach into the billions or trillions of years,
using any amount of computing power available today. As a consequence,
determining whether or not it is possible to solve these problems
quickly, called the P versus NP problem, is one of the principal
unsolved problems in computer science today.
If indeed there is a way to solve this besides brute-forcing through the powerset and finding all subsets which sum up to a value within the given range, then I would be very interested in hearing it.
An idea for another implementation:
Create, from the list of numbers, a list of stacks; each stack represents a number that appears in the list, and that number is pushed onto its stack as many times as it appears in the numbers list. Moreover, this list is sorted.
The idea is that you iterate through the stack list; from each stack you pop one number at a time, as long as it doesn't exceed the max value, and recurse, and you also make one additional call that skips the current stack.
This algorithm avoids many redundant computations, such as trying different elements that have the same value when adding that value would exceed the maximal value.
I was able to solve pretty large problems with this algorithm (50 numbers and more), depending on the min and max values, obviously when the interval is very big the number of combinations may be huge.
Here's the code:
static void GenerateLimitedCombinations(List<int> intList, int minValue, int maxValue)
{
    intList.Sort();
    List<Stack<int>> StackList = new List<Stack<int>>();
    Stack<int> NewStack = new Stack<int>();
    NewStack.Push(intList[0]);
    StackList.Add(NewStack);
    for (int i = 1; i < intList.Count; i++)
    {
        if (intList[i - 1] == intList[i])
            StackList[StackList.Count - 1].Push(intList[i]);
        else
        {
            NewStack = new Stack<int>();
            NewStack.Push(intList[i]);
            StackList.Add(NewStack);
        }
    }
    GenerateLimitedCombinations(StackList, minValue, maxValue, 0, new List<int>(), 0);
}

static void GenerateLimitedCombinations(List<Stack<int>> stackList, int minValue, int maxValue, int currentStack, List<int> currentCombination, int currentSum)
{
    if (currentStack == stackList.Count)
    {
        if (currentSum >= minValue)
        {
            foreach (int tempInt in currentCombination)
            {
                Console.Write(tempInt + " ");
            }
            Console.WriteLine();
        }
    }
    else
    {
        int TempSum = currentSum;
        List<int> NewCombination = new List<int>(currentCombination);
        Stack<int> UndoStack = new Stack<int>();
        while (stackList[currentStack].Count != 0 && stackList[currentStack].Peek() + TempSum <= maxValue)
        {
            int AddedValue = stackList[currentStack].Pop();
            UndoStack.Push(AddedValue);
            NewCombination.Add(AddedValue);
            TempSum += AddedValue;
            GenerateLimitedCombinations(stackList, minValue, maxValue, currentStack + 1, new List<int>(NewCombination), TempSum);
        }
        while (UndoStack.Count != 0)
        {
            stackList[currentStack].Push(UndoStack.Pop());
        }
        GenerateLimitedCombinations(stackList, minValue, maxValue, currentStack + 1, currentCombination, currentSum);
    }
}
Here's a test program:
static void Main(string[] args)
{
    Random Rnd = new Random();
    List<int> IntList = new List<int>();
    int NumberOfInts = 10, MinValue = 19, MaxValue = 21;
    for (int i = 0; i < NumberOfInts; i++) { IntList.Add(Rnd.Next(1, 10)); }
    for (int i = 0; i < NumberOfInts; i++) { Console.Write(IntList[i] + " "); } Console.WriteLine(); Console.WriteLine();
    GenerateLimitedCombinations(IntList, MinValue, MaxValue);
    Console.ReadKey();
}

Fast intersection of two sorted integer arrays

I need to find the intersection of two sorted integer arrays and do it very fast.
Right now, I am using the following code:
int i = 0, j = 0;
while (i < arr1.Count && j < arr2.Count)
{
    if (arr1[i] < arr2[j])
    {
        i++;
    }
    else
    {
        if (arr2[j] < arr1[i])
        {
            j++;
        }
        else
        {
            intersect.Add(arr2[j]);
            j++;
            i++;
        }
    }
}
Unfortunately it might take hours to do all the work.
How to do it faster? I found this article where SIMD instructions are used. Is it possible to use SIMD in .NET?
What do you think about:
http://docs.go-mono.com/index.aspx?link=N:Mono.Simd Mono.SIMD
http://netasm.codeplex.com/ NetASM(inject asm code to managed)
and something like http://www.atrevido.net/blog/PermaLink.aspx?guid=ac03f447-d487-45a6-8119-dc4fa1e932e1
EDIT:
When I say thousands, I mean the following (in code):
for (var i = 0; i < arrCollection1.Count - 1; i++)
{
    for (var j = i + 1; j < arrCollection2.Count; j++)
    {
        Intersect(arrCollection1[i], arrCollection2[j]);
    }
}
UPDATE
The fastest I got was 200 ms with arrays of size 10 million, using the unsafe version (last piece of code).
The test I did:
var arr1 = new int[10000000];
var arr2 = new int[10000000];
for (var i = 0; i < 10000000; i++)
{
    arr1[i] = i;
    arr2[i] = i * 2;
}

var sw = Stopwatch.StartNew();
var result = arr1.IntersectSorted(arr2);
sw.Stop();

Console.WriteLine(sw.Elapsed); // 00:00:00.1926156
Full Post:
I've tested various ways to do it and found this to be very good:
public static List<int> IntersectSorted(this int[] source, int[] target)
{
    // Set initial capacity to a "full-intersection" size
    // This prevents multiple re-allocations
    var ints = new List<int>(Math.Min(source.Length, target.Length));
    var i = 0;
    var j = 0;
    while (i < source.Length && j < target.Length)
    {
        // Compare only once and let compiler optimize the switch-case
        switch (source[i].CompareTo(target[j]))
        {
            case -1:
                i++;
                // Saves us a JMP instruction
                continue;
            case 1:
                j++;
                // Saves us a JMP instruction
                continue;
            default:
                ints.Add(source[i++]);
                j++;
                // Saves us a JMP instruction
                continue;
        }
    }
    // Free unused memory (sets capacity to actual count)
    ints.TrimExcess();
    return ints;
}
For further improvement you can remove the ints.TrimExcess();, which will also make a nice difference, but you should think if you're going to need that memory.
Also, if you know that you might break out of loops that consume the intersection, and you don't have to have the results as an array/list, you should change the implementation to an iterator:
public static IEnumerable<int> IntersectSorted(this int[] source, int[] target)
{
    var i = 0;
    var j = 0;
    while (i < source.Length && j < target.Length)
    {
        // Compare only once and let compiler optimize the switch-case
        switch (source[i].CompareTo(target[j]))
        {
            case -1:
                i++;
                // Saves us a JMP instruction
                continue;
            case 1:
                j++;
                // Saves us a JMP instruction
                continue;
            default:
                yield return source[i++];
                j++;
                // Saves us a JMP instruction
                continue;
        }
    }
}
Another improvement is to use unsafe code:
public static unsafe List<int> IntersectSorted(this int[] source, int[] target)
{
    var ints = new List<int>(Math.Min(source.Length, target.Length));
    fixed (int* ptSrc = source)
    {
        var maxSrcAdr = ptSrc + source.Length;
        fixed (int* ptTar = target)
        {
            var maxTarAdr = ptTar + target.Length;
            var currSrc = ptSrc;
            var currTar = ptTar;
            while (currSrc < maxSrcAdr && currTar < maxTarAdr)
            {
                switch ((*currSrc).CompareTo(*currTar))
                {
                    case -1:
                        currSrc++;
                        continue;
                    case 1:
                        currTar++;
                        continue;
                    default:
                        ints.Add(*currSrc);
                        currSrc++;
                        currTar++;
                        continue;
                }
            }
        }
    }
    ints.TrimExcess();
    return ints;
}
In summary, the most major performance hit was in the if-else's.
Turning it into a switch-case made a huge difference (about 2 times faster).
Have you tried something simple like this:
var a = Enumerable.Range(1, int.MaxValue / 100).ToList();
var b = Enumerable.Range(50, int.MaxValue / 100 - 50).ToList();
//var c = a.Intersect(b).ToList();
List<int> c = new List<int>();

var t1 = DateTime.Now;
foreach (var item in a)
{
    if (b.BinarySearch(item) >= 0)
        c.Add(item);
}
var t2 = DateTime.Now;
var tres = t2 - t1;
This piece of code takes one array of 21,474,836 elements and another of 21,474,786.
If I use var c = a.Intersect(b).ToList(); I get an OutOfMemoryException.
The full cross product would be 461,167,507,485,096 iterations using a nested foreach.
But with this simple code, the intersection occurred in TotalSeconds = 7.3960529 (using one core).
Now I am still not happy, so I am trying to increase the performance by doing this in parallel; as soon as I finish, I will post it.
Yorye Nathan gave me the fastest intersection of two arrays with the last "unsafe code" method. Unfortunately it was still too slow for me: I needed to make combinations of array intersections, which goes up to 2^32 combinations, pretty much, no? I made the following modifications and adjustments and the time dropped to 2.6 times faster. You need to do some pre-optimization beforehand; surely you can do it one way or another. I am using only indexes instead of the actual objects, ids, or some other abstract comparison. So, for example, if you have to intersect big numbers like this:
Arr1: 103344, 234566, 789900, 1947890
Arr2: 150034, 234566, 845465, 23849854
put everything into one combined array:
Arr: 103344, 234566, 789900, 1947890, 150034, 845465, 23849854
and use, for the intersection, the ordered indexes into the combined array:
Arr1Index: 0, 1, 2, 3
Arr2Index: 1, 4, 5, 6
Now we have smaller numbers with which we can build some other nice arrays. What I did, after taking the method from Yorye, was to take Arr2Index and expand it into (theoretically) a boolean array, in practice a byte array because of the memory-size implications:
Arr2IndexCheck: 0, 1, 0, 0, 1, 1, 1
That is more or less a dictionary which tells me, for any index, whether the second array contains it.
The next step: I did not use memory allocation inside the method, which also took time; instead, I pre-created the result array before calling the method, so while searching for my combinations I never instantiate anything. Of course you have to deal with the length of this array separately, so maybe you need to store it somewhere.
Finally the code looks like this:
public static unsafe int IntersectSorted2(int[] arr1, byte[] arr2Check, int[] result)
{
    int length;
    fixed (int* pArr1 = arr1, pResult = result)
    fixed (byte* pArr2Check = arr2Check)
    {
        int* maxArr1Adr = pArr1 + arr1.Length;
        int* arr1Value = pArr1;
        int* resultValue = pResult;
        while (arr1Value < maxArr1Adr)
        {
            if (*(pArr2Check + *arr1Value) == 1)
            {
                *resultValue = *arr1Value;
                resultValue++;
            }
            arr1Value++;
        }
        length = (int)(resultValue - pResult);
    }
    return length;
}
You can see the result array size is returned by the function; then you do what you wish with it (resize it, keep it). Obviously the result array has to be at least as large as the smaller of arr1 and arr2.
The big improvement is that I only iterate through the first array, which should preferably be the smaller of the two, so you have fewer iterations. Fewer iterations mean fewer CPU cycles, right?
So here is the really fast intersection of two ordered arrays, for when you need really high performance ;).
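To make the setup concrete, here is a hypothetical usage sketch under the same assumptions as above (the index arrays and sizes are made up for illustration):
int[] arr1Index = { 0, 1, 2, 3 };
int[] arr2Index = { 1, 4, 5, 6 };
int indexSpace = 7; // one slot per distinct index in the combined array

// Expand the second index array into the byte "check" array
var arr2Check = new byte[indexSpace];
foreach (int idx in arr2Index)
    arr2Check[idx] = 1;

// Pre-create the result array once, before the combination search
var result = new int[arr1Index.Length];
int length = IntersectSorted2(arr1Index, arr2Check, result);
// 'length' says how many leading entries of 'result' are valid (here 1: the index 1)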
Are arrCollection1 and arrCollection2 collections of arrays of integers? In this case you should get some notable improvement by starting the second loop from i+1 as opposed to 0.
C# doesn't support SIMD. Additionally, and I haven't yet figured out why, DLLs that use SSE aren't any faster when called from C# than the non-SSE equivalent functions. Also, all the SIMD extensions I know of don't work with branching anyway, i.e. your "if" statements.
If you're using .net 4.0, you can use Parallel For to gain speed if you have multiple cores. Otherwise you can write a multithreaded version if you have .net 3.5 or less.
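For example, a minimal sketch of parallelizing the questioner's outer loop with Parallel.For (assuming the arrCollection1/arrCollection2 collections and Intersect method from the question's EDIT, and that Intersect is safe to call concurrently):
// Requires System.Threading.Tasks (.NET 4.0+)
Parallel.For(0, arrCollection1.Count - 1, i =>
{
    for (var j = i + 1; j < arrCollection2.Count; j++)
    {
        Intersect(arrCollection1[i], arrCollection2[j]);
    }
});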
Here is a method similar to yours:
IList<int> intersect(int[] arr1, int[] arr2)
{
    IList<int> intersect = new List<int>();
    int i = 0, j = 0;
    int iMax = arr1.Length - 1, jMax = arr2.Length - 1;
    while (i < iMax && j < jMax)
    {
        while (i < iMax && arr1[i] < arr2[j]) i++;
        if (arr1[i] == arr2[j]) intersect.Add(arr1[i]);
        while (i < iMax && arr1[i] == arr2[j]) i++; // prevent redundant entries
        while (j < jMax && arr2[j] < arr1[i]) j++;
        if (arr1[i] == arr2[j]) intersect.Add(arr1[i]);
        while (j < jMax && arr2[j] == arr1[i]) j++; // prevent redundant entries
    }
    return intersect;
}
This one also prevents any entry from appearing twice. With two sorted arrays, both of size 10 million, it completed in about a second. The compiler is supposed to remove array bounds checks if you use array.Length in a for statement; I don't know whether that works in a while statement, though.

Do something for every hundredth element in an array

I have 1 million elements in an array. I need to divide these elements into groups of 100, run a function on each group, and then continue with the next hundred.
foreach (string value in lines)
{
    filescreated++;
    if (filescreated == ?????????)
    {
        // do stuff
    }
}
???? is equal to "value divisible by 100"
foreach (...)
{
    filescreated++;
    if (filescreated % 100 == 0)
    {
        // do stuff for every 100th element
    }

    // do other stuff for every element
}
Reference: modulus (%) operator
Use this if you need to do something special for every 100th element, but you still need to process every element.
If you only need to process every 100th element, refer to Reed's answer.
What about this (if you need them in order and % isn't good for you)? The question is confusing: you talk about every hundredth element, then afterwards about packs of 100, so here is a guess.
string[] lines = new string[1000000];
for (int i = 0; i < 10000; i++)
{
    for (int j = 0; j < 100; j++)
    {
        DoSomething(lines[100 * i + j], i);
    }
}
I need to divide these elements into groups of 100, do a function and continue working on the next hundred
You can do this directly, if this is an array, just by using a for loop and incrementing by 100:
int chunkSize = 100;
for (int start = 0; start < lines.Length; start += chunkSize)
{
    ProcessSectionOfArray(lines, start, Math.Min(start + chunkSize - 1, lines.Length - 1));
}
Here is a solution which separates the partitioning logic into a separate function.
// A separate static function
private static IEnumerable<IEnumerable<T>> BreakIntoBlocks<T>(T[] source, int blockSize)
{
    for (int i = 0; i < source.Length; i += blockSize)
    {
        yield return source.Skip(i).Take(blockSize);
    }
}

// And in your code
string[] lines = new string[1000000];
foreach (IEnumerable<string> stringBlock in BreakIntoBlocks(lines, 100))
{
    // stringBlock is a block of 100 elements
    // Here is where you put the code that processes each separate group
}
The attempt above should be faster than my first attempt (below)
int blockSize = 100;
int i = 0;
IEnumerable<IEnumerable<string>> query = from s in lines
                                         let num = i++
                                         group s by num / blockSize into g
                                         select g;

foreach (IEnumerable<string> stringBlock in query)
{
    // stringBlock will be a block of 100 elements.
    // Process those 100 elements here.
}
The problem with using the grouping clause is that LINQ will assign every one of those 1,000,000 elements to a group before it returns the first block.

Byte Array Manipulation - Interview Question

I was having this discussion with my friend, who was asked this question in an interview. The question goes like this: write a function which takes a two-dimensional byte array as input along with an integer n. Initially, all elements of the M*N byte array are zero, and the problem is to fill n elements of the byte array with the value 1. For instance, if M=5, N=5 and n=10, the byte array should have 10 of its 25 elements set to 1 and the other 15 set to 0. The filled cells should be chosen at random, and each cell in the byte array should be filled only once. I was fascinated to try solving this on my own. I've attached the code I've come up with so far.
public Boolean ByteArrayFiller(int a, int b, int n)
{
    int count = n;
    int iLocalCount = 0;
    byte[,] bArray = new byte[a, b];

    // (C# arrays are already zero-initialized, so this loop is redundant)
    for (int i = 0; i < a; i++)
        for (int j = 0; j < b; j++)
            bArray[i, j] = 0;

    Random randa = new Random();
    int iRandA = randa.Next(a);
    int iRandB = randa.Next(b);

    while (iLocalCount < n)
    {
        if (bArray[iRandA, iRandB] == 0)
        {
            bArray[iRandA, iRandB] = 1;
            iLocalCount++;
        }
        iRandA = randa.Next(a);
        iRandB = randa.Next(b);
    }

    //do
    //{
    //    //iRandA = randa.Next(a);
    //    //iRandB = randa.Next(b);
    //    bArray[iRandA, iRandB] = 1;
    //    iLocalCount++;
    //} while (iLocalCount <= count && bArray[iRandA, iRandB] == 0);

    return true;
}
The code I wrote is in C#, but it's straightforward to understand. It serves the purpose of the question (I did some trial runs and the results came out correct), but I used a Random object in C# (equivalent to Math.random in Java) to fill up the byte array, and I keep wondering what happens if Random keeps returning the same values for a and b. There is a chance for this to go on indefinitely. Is that the point of the question? Or is the solution I came up with good enough?
I am curious to see how the experts here would solve this problem. I am just looking for new ideas to expand my horizons. Any pointers would be greatly appreciated. Thanks for taking the time to read this post!
A while loop trying random locations until it finds a good one is generally a very bad approach. If n = M*N, then the last cell will only have a probability of 1/(M*N) of being hit on each try, so you expect about M*N random draws for that one cell alone. If M*N is sufficiently large, this can be extremely inefficient.
If M*N is not too large, I would create a temporary array of M*N size, fill it with the numbers 0 through (M*N)-1, and then permute it - i.e. walk through it and swap the current value with that of a randomly chosen other position.
Then you go to the first n elements in your array and set the appropriate cells (row = value / columns, col = value % columns).
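A minimal sketch of that approach (the method shape is illustrative; the tuple swap needs a recent C# compiler, otherwise use a temp variable):
static byte[,] FillRandomCells(int rows, int cols, int n)
{
    var rnd = new Random();

    // Temporary array holding every cell position 0 .. rows*cols - 1
    int[] positions = new int[rows * cols];
    for (int i = 0; i < positions.Length; i++)
        positions[i] = i;

    // Permute it: swap each value with a randomly chosen earlier one
    for (int i = positions.Length - 1; i > 0; i--)
    {
        int j = rnd.Next(i + 1);
        (positions[i], positions[j]) = (positions[j], positions[i]);
    }

    // Take the first n positions and set the corresponding cells
    var array = new byte[rows, cols];
    for (int i = 0; i < n; i++)
        array[positions[i] / cols, positions[i] % cols] = 1;

    return array;
}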
I would treat the array, logically, as a one-dimensional array. Fill the first n positions with the prescribed value, and then shuffle the array.
Given a byte array, and the number of rows and columns in the array, and assuming that the array is already filled with 0:
int NumElements = NumRows * NumCols;
for (int i = 0; i < NumElementsToFill; ++i)
{
    int row = i / NumCols;
    int col = i % NumCols;
    array[row, col] = 1;
}

// Now shuffle the array
Random rnd = new Random();
for (int i = 0; i < NumElements; ++i)
{
    int irow = i / NumCols;
    int icol = i % NumCols;
    int swapWith = rnd.Next(i + 1);
    int swapRow = swapWith / NumCols;
    int swapCol = swapWith % NumCols;
    byte temp = array[irow, icol];
    array[irow, icol] = array[swapRow, swapCol];
    array[swapRow, swapCol] = temp;
}
The key here is converting the one-dimensional index into row/col values. I used / and %. You could also use Math.DivRem. Or create Action methods that do the get and set for you.
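For example, Math.DivRem returns the quotient and yields the remainder in one call (a sketch using the row/col convention above):
int row = Math.DivRem(i, NumCols, out int col); // row = i / NumCols, col = i % NumCols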
Choose a number which is larger than both N and M and is prime (or co-prime to both N and M). Let's call this number p.
Loop until you've set n numbers:
Generate a random number less than N*M. Call this number l.
Then the next place to put the number will be p*l % (N*M), if that position hasn't been set.
A downside to this approach is that if the array is filling up, you'll have more collisions.
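As a literal sketch of this probing scheme (p, the grid dimensions, and the method shape are all assumptions for illustration; note the collision behavior described above):
static void FillByPrimeProbing(byte[,] grid, int n, int p)
{
    int rows = grid.GetLength(0), cols = grid.GetLength(1);
    int total = rows * cols;
    var rnd = new Random();
    int filled = 0;
    while (filled < n)
    {
        int l = rnd.Next(total);               // random number less than N*M
        int pos = (int)((long)p * l % total);  // p*l % (N*M)
        if (grid[pos / cols, pos % cols] == 0) // skip positions already set
        {
            grid[pos / cols, pos % cols] = 1;
            filled++;
        }
    }
}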
Basically, you need to choose n unique random numbers from the range [0, p) (where p = M * N), and map them to positions in the 2-dimensional array.
Naive approaches are 1) generate non-unique numbers with retry, or 2) fill an array with the numbers from 0 to p-1, shuffle it, and take the first n numbers (takes O(p) time, O(p) memory).
Another approach is to choose them with the following algorithm (O(n^2) time, O(n) memory, code in Java):
public Set<Integer> chooseUniqueRandomNumbers(int n, int p) {
    Set<Integer> choosen = new TreeSet<Integer>();
    Random rnd = new Random();
    for (int i = 0; i < n; i++) {
        // Generate a random number from the range [0, p - i)
        int c = rnd.nextInt(p - i);
        // Adjust it as if it were chosen from the range [0, p), excluding already chosen numbers
        Iterator<Integer> it = choosen.iterator();
        while (it.hasNext() && it.next() <= c) c++;
        choosen.add(c);
    }
    return choosen;
}
Mapping of generated numbers to positions of 2-dimensional array is trivial.
