Storing sums of chunks of an array in one pass - C#

Let's say I have the array
1,2,3,4,5,6,7,8,9,10,11,12
if my chunk size = 4
then I want to be able to have a method that will output an array of ints, int[] a, where:
a[0] = 1
a[1] = 3
a[2] = 6
a[3] = 10
a[4] = 14
a[5] = 18
a[6] = 22
a[7] = 26
a[8] = 30
a[9] = 34
a[10] = 38
a[11] = 42
note that each a[n] is the sum of the original values at positions n, n-1, n-2 and n-3 (fewer terms at the start of the array), because the chunk size is 4 and thus I sum the last 4 items
I need to have the method without a nested loop
for (int i = 0; i < 12; i++)
{
    int counter = 0;
    for (int k = i; k >= 0; k--)
    {
        // do summation
        counter++;
        if (counter == 4)
            break;
    }
}
For example, I don't want to have something like that, in order to keep the code efficient.
Also, the chunk size may change, so I cannot hard-code:
a[3] = a[0] + a[1] + a[2] + a[3]
Edit
The reason why I asked this question is because I need to implement checksum rolling for my data structures class. I basically open a file for reading, and I then have a byte array. I will perform a hash function on parts of the file. Let's say the file is 100 bytes; I split it into chunks of 10 bytes and perform a hash function on each chunk, so I get 10 hashes. Then I need to compare those hashes with a second file that is similar. Let's say the second file has the same 100 bytes but with an additional 5, so it contains a total of 105 bytes. Because those extra bytes may have been in the middle of the file, if I perform the same algorithm that I did on the first file it is not going to work. I hope I explained myself correctly. And because some files are large, it is not efficient to have a nested loop in my algorithm.
Also, the real rolling hash functions are very complex. Most of them are in C++ and I have a hard time understanding them. That's why I want to create my own very simple hashing function, just to demonstrate how checksum rolling works...
Edit 2
int chunkSize = 4;
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }; // the bytes of the file
int[] b = new int[a.Length];   // array where we will place the checksums
int[] sum = new int[a.Length]; // array needed to avoid nested loop
for (int i = 0; i < a.Length; i++)
{
    int temp = 0;
    if (i == 0)
    {
        temp = 1; // avoid indexing sum[-1] on the first iteration
    }
    sum[i] += a[i] + sum[i - 1 + temp];
    if (i < chunkSize)
    {
        b[i] = sum[i];
    }
    else
    {
        b[i] = sum[i] - sum[i - chunkSize];
    }
}
The problem with this algorithm is that with large files the sum will at some point become larger than int.MaxValue, so it is not going to work...
but at least now it is more efficient. Getting rid of that nested loop helped a lot!
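One way to sidestep that overflow, sketched below purely as an illustration (it is not part of the original question), is to drop the cumulative sum array and keep a single running window total: subtract the item that falls out of the window as each new item comes in, so the total never exceeds chunkSize times the largest byte value.
int chunkSize = 4;
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }; // the bytes of the file
long[] b = new long[a.Length]; // the checksums
long windowSum = 0;            // running total of the last chunkSize items
for (int i = 0; i < a.Length; i++)
{
    windowSum += a[i];                 // new item enters the window
    if (i >= chunkSize)
        windowSum -= a[i - chunkSize]; // oldest item leaves the window
    b[i] = windowSum;
}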
Edit 3
Based on edit two I have worked this out. It does not work with large files, and the checksum algorithm is very bad, but at least I think it demonstrates the rolling hashing that I am trying to explain...
Part1(@"A:\fileA.txt");
Part2(@"A:\fileB.txt", null);
.....
// split the file in chunks and return the checksums of the chunks
private static UInt64[] Part1(string file)
{
    UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];
    var stream = File.OpenRead(file);
    int chunkSize = (int)Math.Pow(2, 22); // 10 => kilobyte, 20 => megabyte, 30 => gigabyte, etc.
    byte[] buffer = new byte[chunkSize];
    int bytesRead; // how many bytes were read
    int counter = 0; // counter
    while ( // while bytesRead > 0
        (bytesRead =
        (stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
        ) > 0)
    {
        hashes[counter] = 0;
        for (int i = 0; i < bytesRead; i++)
        {
            hashes[counter] = hashes[counter] + buffer[i]; // simple algorithm, not realistic as a file checksum
        }
        counter++;
    } // end while loop
    return hashes;
}
// split the file in chunks, rolling it. In reality this file will be on a different computer..
private static void Part2(string file, UInt64[] hash)
{
    UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];
    var stream = File.OpenRead(file);
    int chunkSize = (int)Math.Pow(2, 22); // chunks must be as big as in the previous method
    byte[] buffer = new byte[chunkSize];
    int bytesRead; // how many bytes were read
    int counter = 0; // counter
    UInt64[] sum = new UInt64[(int)Math.Pow(2, 20)];
    while ( // while bytesRead > 0
        (bytesRead =
        (stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
        ) > 0)
    {
        for (int i = 0; i < bytesRead; i++)
        {
            int temp = 0;
            if (counter == 0)
                temp = 1;
            sum[counter] += (UInt64)buffer[i] + sum[counter - 1 + temp];
            if (counter < chunkSize)
            {
                hashes[counter] = (UInt64)sum[counter];
            }
            else
            {
                hashes[counter] = (UInt64)sum[counter] - (UInt64)sum[counter - chunkSize];
            }
            counter++;
        }
    } // end while loop
    // missing: compare the hashes arrays
}

Add an array r for the result, and initialize its first chunk members using a loop from 0 to chunk-1. Now observe that to get r[i+1] you can add a[i+1] to r[i], and subtract a[i-chunk+1]. Now you can do the rest of the items in a single non-nested loop:
for (int i = chunk - 1; i < N - 1; i++) {
    r[i + 1] = a[i + 1] + r[i] - a[i - chunk + 1];
}
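Putting that together, a minimal C# sketch (assuming a non-empty input array a and chunk <= a.Length) might look like this:
int[] RollingChunkSums(int[] a, int chunk)
{
    int N = a.Length;
    int[] r = new int[N];
    // first chunk members: plain running sum
    r[0] = a[0];
    for (int i = 1; i < chunk && i < N; i++)
        r[i] = r[i - 1] + a[i];
    // remaining members: add the new element, drop the one that left the window
    for (int i = chunk - 1; i < N - 1; i++)
        r[i + 1] = a[i + 1] + r[i] - a[i - chunk + 1];
    return r;
}
For the sample 1..12 with chunk = 4 this produces 1, 3, 6, 10, 14, 18, ..., 42, as requested.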

You can get this down to a single for loop, though that may not be good enough. To do that, just note that c[i+1] = c[i]-a[i-k+1]+a[i+1]; where a is the original array, c is the chunky array, and k is the size of the chunks.

I understand that you want to compute a rolling hash function to hash every n-gram (where n is what you call the "chunk size"). Rolling hashing is sometimes called "recursive hashing". There is a wikipedia entry on the topic:
http://en.wikipedia.org/wiki/Rolling_hash
A common algorithm to solve this problem is Karp-Rabin. Here is some pseudo-code which you should be able to easily implement in C#:
B ← 37
s ← empty First-In-First-Out (FIFO) structure (e.g., a linked list)
x ← 0 (L-bit integer)
z ← 0 (L-bit integer)
for each character c do
    append c to s
    x ← (B x − B^n z + c) mod 2^L
    yield x
    if length(s) = n then
        remove oldest character y from s
        z ← y
    end if
end for
Note that because B^n is a constant, the main loop only does two multiplications, one subtraction and one addition. The "mod 2^L" operation can be done very fast (use a mask, or unsigned integers with L=32 or L=64, for example).
Specifically, your C# code might look like this, where n is the "chunk" size (just set B = 37, and Btothen = 37 raised to the power n):
r[0] = 0
for (int i=1 ; i < N ; i++) {
r[i] = a[i] + B * r[i-1] - Btothen * a[i-n];
}
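Note that the snippet above is only schematic (for i < n it would read before the start of a). A fuller C# sketch of the same Karp-Rabin-style recursion, assuming B = 37 and L = 64 so that unchecked ulong arithmetic provides the mod 2^L for free, could look like this (KarpRabinRoll is just a placeholder name, not an existing API):
using System.Collections.Generic;

static IEnumerable<ulong> KarpRabinRoll(byte[] data, int n)
{
    const ulong B = 37;
    ulong bToTheN = 1;
    for (int i = 0; i < n; i++) bToTheN *= B; // B^n mod 2^64, via ulong wrap-around

    var window = new Queue<byte>(); // FIFO holding the last n bytes
    ulong x = 0;
    foreach (byte c in data)
    {
        ulong z = 0;
        if (window.Count == n)
            z = window.Dequeue();               // oldest byte leaving the window
        window.Enqueue(c);
        x = unchecked(B * x - bToTheN * z + c); // x = (B*x - B^n*z + c) mod 2^L
        yield return x;                         // hash of the current window (or prefix)
    }
}
Comparing two files then reduces to comparing the yielded values at the window positions of interest.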
Karp-Rabin is not ideal however. I wrote a paper where better solutions are discussed:
Daniel Lemire and Owen Kaser, Recursive n-gram hashing is pairwise independent, at best, Computer Speech & Language 24 (4), pages 698-710, 2010.
http://arxiv.org/abs/0705.4676
I also published the source code (Java and C++, alas no C# but it should not be hard to go from Java to C#):
https://github.com/lemire/rollinghashjava
https://github.com/lemire/rollinghashcpp

How about storing off the last chunk_size values as you step through?
Allocate an array of size chunk_size, set them all to zero, and then set the element at i % chunk_size with your current element at each iteration of i, and then add up all the values?
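Read literally, that suggestion might be sketched like this (the inner re-summation only touches chunk_size slots, so for small chunks it is cheap; for large chunks you would keep a running total instead):
int[] ChunkSums(int[] a, int chunkSize)
{
    int[] last = new int[chunkSize];     // circular buffer of the most recent values
    int[] result = new int[a.Length];
    for (int i = 0; i < a.Length; i++)
    {
        last[i % chunkSize] = a[i];      // overwrite the slot that just left the window
        int sum = 0;
        for (int j = 0; j < chunkSize; j++)
            sum += last[j];              // re-add the buffered values
        result[i] = sum;
    }
    return result;
}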

using System;
class Sample {
    static void Main() {
        int chunkSize = 4;
        int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
        int[] b = new int[a.Length];
        int sum = 0;
        int d = chunkSize * (chunkSize - 1) / 2;
        // Note: this relies on the input being exactly 1, 2, 3, ..., i.e. each value i
        // doubles as a 1-based index into b.
        foreach (var i in a) {
            if (i < chunkSize) {
                sum += i;
                b[i - 1] = sum;
            } else {
                b[i - 1] = chunkSize * i - d;
            }
        }
        Console.WriteLine(String.Join(",", b)); // 1,3,6,10,14,18,22,26,30,34,38,42
    }
}

Related

How to shuffle string characters to right and left until int.MaxValue?

My task is to make an organized shuffle: from the source string, all characters at odd (1-based) positions go to the left and those at even positions go to the right.
I have done it like this, and it is fine for the normal scenario:
public static string ShuffleChars(string source, int count)
{
if (string.IsNullOrWhiteSpace(source) || source.Length == 0)
{
throw new ArgumentException(null);
}
if (count < 0)
{
throw new ArgumentException(null);
}
for (int i = 0; i < count; i++)
{
source = string.Concat(source.Where((item, index) => index % 2 == 0)) +
string.Concat(source.Where((item, index) => index % 2 != 0));
}
return source;
}
Now the problem is, what if count is int.MaxValue or another huge number in the millions? It will loop through a lot. How can I optimize the code in terms of speed and resource consumption?
You should be able to determine by the string's length how many iterations it will take before it's back to its original sort order. Then take the modulus of the iteration count and the input count, and only iterate that many times.
For example, a string that is three characters will be back to its original sort order in 2 iterations. If the input count was to do 11 iterations, we know that 11 % 2 == 1, so we only need to iterate one time.
Once you determine a formula for how many iterations it takes to reach the original sort order for any length of string, you can always reduce the number of iterations to that number or less.
Coming up with a formula will be tricky, however. A string with 14 characters takes 12 iterations until it matches itself, but a string with 15 characters only takes 4 iterations.
Therefore, a shortcut might be to simply start iterating until we reach the original sort order (or the specified count, whichever comes first). If we reach the count first, then we return that answer. Otherwise, we can determine the answer from the idea in the first paragraph - take the modulus of the input count and the iteration count, and return that answer.
This would require that we store the values from our iterations (in a dictionary, for example) so we can retrieve a specific previous value.
For example:
public static string ShuffleChars(string source, int count)
{
    string s = source;
    var results = new Dictionary<int, string>();
    for (int i = 0; i < count; i++)
    {
        s = string.Concat(s.Where((item, index) => index % 2 == 0)) +
            string.Concat(s.Where((item, index) => index % 2 != 0));
        // If we've repeated our original string, return the saved
        // value of the input count modulus the current iteration
        if (s == source)
        {
            int remainder = count % (i + 1);
            return remainder == 0 ? source : results[remainder - 1];
        }
        // Otherwise, save the value for later
        else
        {
            results[i] = s;
        }
    }
    // If we get here it means we hit the requested count before
    // ever returning to the original sort order of the input
    return s;
}
Instead of creating new immutable strings on each loop, you could work with a mutable array of characters (char[]), and swap characters between places. This would be the most efficient in terms of memory consumption, but doing the swaps on a single array could be quite tricky. Using two arrays is much easier, because you can just copy characters from one array to the other, and at the end of each loop swap the two arrays.
One more optimization you could do is to work with the indices of the char array instead of its values. I am not sure if this will make any difference in practice: in C# a char is 2 bytes and an int is 4 bytes regardless of the platform, so the index array is not smaller, but it avoids copying the characters themselves until the very end. Here is an implementation, with all these ideas put together:
public static string ShuffleChars(string source, int count)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
// Instantiate the two arrays
int[] indices = new int[source.Length];
int[] temp = new int[source.Length];
// Initialize the indices array with incremented numbers
for (int i = 0; i < indices.Length; i++)
indices[i] = i;
for (int k = 0; k < count; k++)
{
// Copy the odds to the temp array
for (int i = 0, j = 0; j < indices.Length; i += 1, j += 2)
temp[i] = indices[j];
// Copy the evens to the temp array
int lastEven = (indices.Length >> 1 << 1) - 1;
for (int i = indices.Length - 1, j = lastEven; j >= 0; i -= 1, j -= 2)
temp[i] = indices[j];
// Swap the two arrays, using value tuples
(indices, temp) = (temp, indices);
}
// Map the indices to characters from the source string
return String.Concat(indices.Select(i => source[i]));
}

Breaking up underlying binary in Byte array into 10 or 12 bit words: C#

I'm parsing binary data from a file that comes in as a byte array. I'm trying to split the underlying binary of the array into 'words' (every 10 or 12 bits). I have a function that does this but it is pretty time consuming as I'm dealing with a lot of data. I have limited programming experience so I'm sure there's a better way to accomplish this.
private void separateWords(List<byte[]> minorFrames, int wordSize, int frameLength)
{
UInt16[] wordArray = new UInt16[frameLength];
foreach (byte[] array in minorFrames)
{
// Convert byte array to bit array
// Bits need to be reversed on a byte boundary
byte[] temp = new byte[array.Length];
for (int i = 0; i < array.Length; i++)
{
temp[i] = ReverseBits(array[i]);
}
BitArray binaryArray = new BitArray(temp);
for (int i = 0; i < (binaryArray.Length / wordSize); i++ )
{
UInt16 newWord = 0;
for (int j = 0; j < wordSize; j++)
{ // Converts every n bits to UInt16
if (binaryArray[j + (i*wordSize)])
newWord += Convert.ToUInt16(Math.Pow(2, ((wordSize-1)-j)));
}
wordArray[i]=newWord; // Populate formatted minor frame
}
words.Add(wordArray); // add populated minor frame to list
}
}
Ideally I'd like to operate directly on the byte array. The 'words' will be saved into UInt16's to keep the output size as small as possible.
My current thought is:
Shift first 10 bits into UInt16 variable
add variable to array of words
shift entire byte array over 10 bits
repeat
I'm having some trouble shifting bits into a UInt16 though, and unsure how to shift an entire array. Maybe there's a better way to approach this?
Math.Pow is very time consuming. Consider using Bitwise and shift operators (C# reference).
if (binaryArray[j + i * wordSize]) {
newWord |= (ushort)(1 << (wordSize - 1 - j));
}
After some feedback I've rewritten the loop to:
public List<UInt16[]> separateWords(List<byte[]> minorFrames, int wordSize, int frameLength)
{
List<UInt16[]> framedWords = new List<UInt16[]>();
UInt16 newWord = 0;
foreach (byte[] array in minorFrames)
{
int bitcount = 1;
int wordCount = 0;
BitArray binaryArray = new BitArray(array);
UInt16[] wordArray = new UInt16[frameLength];
for (int i = 1; i <= array.Length; i++)
{
for (int j = 1; j <= 8; j++)
{
newWord <<= 1; // Make room for next bit
newWord |= Convert.ToUInt16(binaryArray[(i * 8) - j]); // Adds next bit in array
if (bitcount % wordSize == 0) // Only if multiple of wordsize
{
wordArray[wordCount] = newWord; // Populate formatted minor frame
newWord = 0; // Reset for next word
wordCount++; // Advance index
}
bitcount++;
}
}
framedWords.Add(wordArray); // add populated minor frame to list
}
return framedWords;
}
This took the run time from 12 minutes to 2.5 minutes.
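Going a step further, here is a hedged sketch that drops BitArray entirely and shifts bits straight out of the byte array into a running accumulator; it assumes the words are packed most-significant-bit first, which may or may not match the actual frame format:
using System.Collections.Generic;

public static List<ushort[]> SeparateWordsNoBitArray(List<byte[]> minorFrames, int wordSize, int frameLength)
{
    var framedWords = new List<ushort[]>();
    foreach (byte[] frame in minorFrames)
    {
        var words = new ushort[frameLength];
        int wordCount = 0;
        ulong accumulator = 0; // bits gathered so far
        int bitsHeld = 0;      // how many of those bits are still unconsumed
        foreach (byte b in frame)
        {
            accumulator = (accumulator << 8) | b; // push 8 more bits in
            bitsHeld += 8;
            while (bitsHeld >= wordSize && wordCount < frameLength)
            {
                bitsHeld -= wordSize;
                words[wordCount++] = (ushort)((accumulator >> bitsHeld) & ((1u << wordSize) - 1));
            }
        }
        framedWords.Add(words);
    }
    return framedWords;
}
This removes the per-bit BitArray indexing and the byte-reversal step, at the cost of hard-coding the bit-order assumption.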

Mathematically updating the Max of a C# Integer Queue after an Enqueue and Dequeue [duplicate]

Given an array of size n and k, how do you find the maximum for every contiguous subarray of size k?
For example
arr = 1 5 2 6 3 1 24 7
k = 3
ans = 5 6 6 6 24 24
I was thinking of having an array of size k and each step evict the last element out and add the new element and find maximum among that. It leads to a running time of O(nk). Is there a better way to do this?
You may have heard about doing it in O(n) using a deque.
Well, that is a well-known algorithm for this question.
The method I am describing here is quite simple and also has time complexity O(n).
Your Sample Input:
n=10 , W = 3
10 3
1 -2 5 6 0 9 8 -1 2 0
Answer = 5 6 6 9 9 9 8 2
Concept: Dynamic Programming
Algorithm:
N is the number of elements in the array and W is the window size, so the number of windows = N-W+1.
Now divide the array into blocks of size W, starting from index 1.
Here we divide into blocks of size 'W' = 3. For your sample input the blocks are:
[1 -2 5] [6 0 9] [8 -1 2] [0]
We have divided into blocks because we will calculate the maximum in 2 ways: A.) by traversing from left to right, B.) by traversing from right to left.
But how?
Firstly, traversing from left to right: for each element ai in a block we find the maximum from the START of the block up to that element ai.
So here,
LR: 1 1 5 | 6 6 9 | 8 8 8 | 0
Secondly, traversing from right to left: for each element ai in a block we find the maximum from the END of the block down to that element ai.
So here,
RL: 5 5 5 | 9 9 9 | 8 2 2 | 0
Now we have to find maximum for each subarray or window of size 'W'.
So, starting from index = 1 to index = N-W+1 .
max_val[index] = max(RL[index], LR[index+w-1]);
for index=1: max_val[1] = max(RL[1],LR[3]) = max(5,5)= 5
Similarly, for every index i (i <= N-W+1), the values RL[i] and LR[i+w-1]
are compared and the maximum of the two is the answer for that subarray.
So Final Answer : 5 6 6 9 9 9 8 2
Time Complexity: O(n)
Implementation code:
#include <iostream>
#include <cstdio>
#include <cstring>
#include <algorithm>
#define LIM 100001
using namespace std;
int arr[LIM]; // Input Array
int LR[LIM]; // maximum from Left to Right
int RL[LIM]; // maximum from Right to left
int max_val[LIM]; // number of subarrays(windows) will be n-k+1
int main(){
int n, w, i, k; // 'n' is number of elements in array
// 'w' is Window's Size
cin >> n >> w;
k = n - w + 1; // 'K' is number of Windows
for(i = 1; i <= n; i++)
cin >> arr[i];
for(i = 1; i <= n; i++){ // for maximum Left to Right
if(i % w == 1) // that means START of a block
LR[i] = arr[i];
else
LR[i] = max(LR[i - 1], arr[i]);
}
for(i = n; i >= 1; i--){ // for maximum Right to Left
if(i == n) // Maybe the last block is not of size 'W'.
RL[i] = arr[i];
else if(i % w == 0) // that means END of a block
RL[i] = arr[i];
else
RL[i] = max(RL[i+1], arr[i]);
}
for(i = 1; i <= k; i++) // maximum
max_val[i] = max(RL[i], LR[i + w - 1]);
for(i = 1; i <= k ; i++)
cout << max_val[i] << " ";
cout << endl;
return 0;
}
I'll try to give a proof (by @johnchen902):
If k % w != 1 (k is not the begin of a block)
Let k* = the begin of the block containing k + w - 1 (i.e. the block right after k's block)
ans[k] = max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k + w - 1])
       = max( max( arr[k], arr[k + 1], ..., arr[k* - 1]),
              max( arr[k*], arr[k* + 1], ..., arr[k + w - 1]) )
= max( RL[k], LR[k+w-1] )
Otherwise (k is the begin of a block)
ans[k] = max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k + w - 1])
= RL[k] = LR[k+w-1]
= max( RL[k], LR[k+w-1] )
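Since the question that brought people here is about C#, a 0-based C# translation of this block-maximum idea (a sketch assuming the whole array is available up front, not a stream) might look like:
static int[] SlidingWindowMax(int[] arr, int w)
{
    int n = arr.Length;
    int[] lr = new int[n]; // max from the start of i's block up to i
    int[] rl = new int[n]; // max from i up to the end of i's block
    for (int i = 0; i < n; i++)
        lr[i] = (i % w == 0) ? arr[i] : Math.Max(lr[i - 1], arr[i]);
    for (int i = n - 1; i >= 0; i--)
        rl[i] = (i == n - 1 || (i + 1) % w == 0) ? arr[i] : Math.Max(rl[i + 1], arr[i]);
    var result = new int[n - w + 1];
    for (int i = 0; i + w - 1 < n; i++)
        result[i] = Math.Max(rl[i], lr[i + w - 1]);
    return result;
}
For the sample input 1 -2 5 6 0 9 8 -1 2 0 with w = 3 this returns 5 6 6 9 9 9 8 2, matching the answer above.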
The dynamic programming approach is very neatly explained by Shashank Jain. I would like to explain how to do the same using a deque.
The key is to maintain the max element at the front of the queue (for a window) and discard the useless elements; we also need to discard the elements that are out of the index range of the current window.
Useless elements = if the current element is greater than the last element of the queue, then the last element of the queue is useless.
Note: we are storing the index in the queue, not the element itself. It will be more clear from the code itself.
1. If the current element is greater than the last element of the queue, then the last element of the queue is useless. We need to delete that last element
(and keep deleting until the last element of the queue is smaller than the current element).
2. If current_index - k >= q.front(), that means we are going out of the window, so we need to delete the element from the front of the queue.
vector<int> max_sub_deque(vector<int> &A,int k)
{
deque<int> q;
for(int i=0;i<k;i++)
{
while(!q.empty() && A[i] >= A[q.back()])
q.pop_back();
q.push_back(i);
}
vector<int> res;
for(int i=k;i<A.size();i++)
{
res.push_back(A[q.front()]);
while(!q.empty() && A[i] >= A[q.back()] )
q.pop_back();
while(!q.empty() && q.front() <= i-k)
q.pop_front();
q.push_back(i);
}
res.push_back(A[q.front()]);
return res;
}
Since each element is enqueued and dequeued at most once, the time complexity is O(n + n) = O(2n) = O(n).
And the size of the queue cannot exceed the limit k, so the space complexity is O(k).
An O(n) time solution is possible by combining the two classic interview questions:
Make a stack data-structure (called MaxStack) which supports push, pop and max in O(1) time.
This can be done using two stacks, the second one containing the maximum seen so far.
Model a queue with a stack.
This can done using two stacks. Enqueues go into one stack, and dequeues come from the other.
For this problem, we basically need a queue, which supports enqueue, dequeue and max in O(1) (amortized) time.
We combine the above two, by modelling a queue with two MaxStacks.
To solve the question, we queue k elements, query the max, dequeue, enqueue k+1 th element, query the max etc. This will give you the max for every k sized sub-array.
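As a concrete C# illustration of that construction (a sketch; the class and method names are placeholders, not an existing library type), each "MaxStack" stores next to every value the maximum of everything below it, and the queue is built from two of them:
using System;
using System.Collections.Generic;

class MaxQueue
{
    // Each entry holds (value, max of this stack up to and including this entry).
    private readonly Stack<(int Value, int Max)> _in = new Stack<(int Value, int Max)>();
    private readonly Stack<(int Value, int Max)> _out = new Stack<(int Value, int Max)>();

    public void Enqueue(int x)
    {
        int max = _in.Count == 0 ? x : Math.Max(x, _in.Peek().Max);
        _in.Push((x, max));
    }

    public int Dequeue()
    {
        if (_out.Count == 0)            // refill the out-stack, reversing the order
            while (_in.Count > 0)
            {
                int v = _in.Pop().Value;
                int max = _out.Count == 0 ? v : Math.Max(v, _out.Peek().Max);
                _out.Push((v, max));
            }
        return _out.Pop().Value;
    }

    public int Max() // assumes the queue is not empty
    {
        if (_out.Count == 0) return _in.Peek().Max;
        if (_in.Count == 0) return _out.Peek().Max;
        return Math.Max(_in.Peek().Max, _out.Peek().Max);
    }
}

static int[] MaxPerWindow(int[] arr, int k)
{
    var q = new MaxQueue();
    var result = new List<int>();
    for (int i = 0; i < arr.Length; i++)
    {
        q.Enqueue(arr[i]);
        if (i >= k) q.Dequeue();              // keep exactly the last k elements
        if (i >= k - 1) result.Add(q.Max());  // max of the current window
    }
    return result.ToArray();
}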
I believe there are other solutions too.
1)
I believe the queue idea can be simplified. We maintain a queue and a max for every window. We enqueue a new element, and dequeue all elements which are not greater than the new element.
2) Maintain two new arrays which maintain the running max for each block of k, one array for one direction (left to right/right to left).
3) Use a hammer: Preprocess in O(n) time for range maximum queries.
The 1) solution above might be the most optimal.
You need a fast data structure that can add, remove and query for the max element in less than O(n) time (you can just use an array if O(n) or O(nlogn) is acceptable). You can use a heap, a balanced binary search tree, a skip list, or any other sorted data structure that performs these operations in O(log(n)).
The good news is that most popular languages have a sorted data structure implemented that supports these operations for you. C++ has std::set and std::multiset (you probably need the latter) and Java has PriorityQueue and TreeSet.
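C# has no built-in multiset, but one way to sketch the same idea is a SortedSet of (value, index) pairs: the index makes every entry distinct, so duplicates are handled, and the set's Max property gives the window maximum, for O(n log k) overall (the method name below is just a placeholder):
using System.Collections.Generic;

static int[] MaxPerWindowSorted(int[] arr, int k)
{
    // (value, index) pairs are all distinct, so a SortedSet can stand in for a multiset here
    var window = new SortedSet<(int Value, int Index)>();
    var result = new List<int>();
    for (int i = 0; i < arr.Length; i++)
    {
        window.Add((arr[i], i));
        if (i >= k) window.Remove((arr[i - k], i - k)); // drop the element leaving the window
        if (i >= k - 1) result.Add(window.Max.Value);   // largest value currently in the window
    }
    return result.ToArray();
}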
Here is the java implementation
public static Integer[] maxsInEveryWindows(int[] arr, int k) {
Deque<Integer> deque = new ArrayDeque<Integer>();
/* Process first k (or first window) elements of array */
for (int i = 0; i < k; i++) {
// For very element, the previous smaller elements are useless so
// remove them from deque
while (!deque.isEmpty() && arr[i] >= arr[deque.peekLast()]) {
deque.removeLast(); // Remove from rear
}
// Add new element at rear of queue
deque.addLast(i);
}
List<Integer> result = new ArrayList<Integer>();
// Process rest of the elements, i.e., from arr[k] to arr[n-1]
for (int i = k; i < arr.length; i++) {
// The element at the front of the queue is the largest element of
// previous window, so add to result.
result.add(arr[deque.getFirst()]);
// Remove all elements smaller than the currently
// being added element (remove useless elements)
while (!deque.isEmpty() && arr[i] >= arr[deque.peekLast()]) {
deque.removeLast();
}
// Remove the elements which are out of this window
while (!deque.isEmpty() && deque.getFirst() <= i - k) {
deque.removeFirst();
}
// Add current element at the rear of deque
deque.addLast(i);
}
// Print the maximum element of last window
result.add(arr[deque.getFirst()]);
return result.toArray(new Integer[0]);
}
Here is the corresponding test case
@Test
public void maxsInWindowsOfSizeKTest() {
Integer[] result = ArrayUtils.maxsInEveryWindows(new int[]{1, 2, 3, 1, 4, 5, 2, 3, 6}, 3);
assertThat(result, equalTo(new Integer[]{3, 3, 4, 5, 5, 5, 6}));
result = ArrayUtils.maxsInEveryWindows(new int[]{8, 5, 10, 7, 9, 4, 15, 12, 90, 13}, 4);
assertThat(result, equalTo(new Integer[]{10, 10, 10, 15, 15, 90, 90}));
}
Using a heap (or tree), you should be able to do it in O(n * log(k)). I'm not sure if this would be indeed better.
Here is the Python implementation in O(n) time, thanks to @Shashank Jain:
from sys import stdin, stdout
from operator import *

n, w = map(int, stdin.readline().strip().split())
Arr = list(map(int, stdin.readline().strip().split()))
k = n - w + 1  # number of windows
leftA = [0] * n
rightA = [0] * n
result = [0] * k
for i in range(n):
    if i % w == 0:
        leftA[i] = Arr[i]
    else:
        leftA[i] = max(Arr[i], leftA[i - 1])
for i in range(n - 1, -1, -1):
    if i % w == (w - 1) or i == n - 1:
        rightA[i] = Arr[i]
    else:
        rightA[i] = max(Arr[i], rightA[i + 1])
for i in range(k):
    result[i] = max(rightA[i], leftA[i + w - 1])
print(*result, sep=' ')
Method 1: O(n) time, O(k) space
We use a deque (it is like a list but with constant-time insertion and deletion from both ends) to store the index of useful elements.
The index of the current max is kept at the leftmost element of deque. The rightmost element of deque is the smallest.
In the following, for easier explanation we say an element from the array is in the deque, while in fact the index of that element is in the deque.
Let's say {5, 3, 2} are already in the deque (again, in fact their indexes are).
If the next element we read from the array is bigger than 5 (remember, the leftmost element of deque holds the max), say 7: We delete the deque and create a new one with only 7 in it (we do this because the current elements are useless, we have found a new max).
If the next element is less than 2 (which is the smallest element of deque), say 1: We add it to the right ({5, 3, 2, 1})
If the next element is bigger than 2 but less than 5, say 4: We remove elements from right that are smaller than the element and then add the element from right ({5, 4}).
Also we keep elements of the current window only (we can do this in constant time because we are storing the indexes instead of elements).
from collections import deque

def max_subarray(array, k):
    deq = deque()
    for index, item in enumerate(array):
        if len(deq) == 0:
            deq.append(index)
        elif index - deq[0] >= k:  # the max element is out of the window
            deq.popleft()
        elif item > array[deq[0]]:  # found a new max
            deq = deque()
            deq.append(index)
        elif item < array[deq[-1]]:  # the array item is smaller than all the deque elements
            deq.append(index)
        elif item > array[deq[-1]] and item < array[deq[0]]:
            while item > array[deq[-1]]:
                deq.pop()
            deq.append(index)
        if index >= k - 1:  # start printing when the first window is filled
            print(array[deq[0]])
Proof of O(n) time: The only part we need to check is the while loop. In the whole runtime of the code, the while loop can perform at most O(n) operations in total. The reason is that the while loop pops elements from the deque, and since in other parts of the code, we do at most O(n) insertions into the deque, the while loop cannot exceed O(n) operations in total. So the total runtime is O(n) + O(n) = O(n)
Method 2: O(n) time, O(n) space
This is the explanation of the method suggested by S Jain (as mentioned in the comments of his post, this method doesn't work with data streams, which most sliding window questions are designed for).
The reason that method works is explained using the following example:
array = [5, 6, 2, 3, 1, 4, 2, 3]
k = 4
blocks:  [5, 6, 2, 3] [1, 4, 2, 3]
LR:       5  6  6  6   1  4  4  4
RL:       6  6  3  3   4  4  3  3
result:   6  6  4  4  4
To get the max for the window [2, 3, 1, 4],
we can get the max of [2, 3] and max of [1, 4], and return the bigger of the two.
Max of [2, 3] is calculated in the RL pass and max of [1, 4] is calculated in LR pass.
Using a Fibonacci heap, you can do it in O(n + (n-k) log k), which is equal to O(n log k) for small k; for k close to n this becomes O(n).
The algorithm: in fact, you need:
n inserts to the heap
n-k deletions
n-k findmax's
How much do these operations cost in Fibonacci heaps? Insert and findmax are O(1) amortized; deletion is O(log k) amortized, since the heap never holds more than about k elements. So, we have
O(n + (n-k) log k + (n-k)) = O(n + (n-k) log k)
Sorry, this should have been a comment but I am not allowed to comment for now.
@leo and @Clay Goddard
You can save yourselves from re-computing the maximum by storing both the maximum and the 2nd maximum of the window at the beginning
(the 2nd maximum will be the maximum only if there are two maximums in the initial window). If the maximum slides out of the window, you still have the next best candidate to compare with the new entry. So you get O(n); otherwise, if you allowed the whole re-computation again, the worst case order would be O(nk), where k is the window size.
class MaxFinder
{
// finds the max and its index
static int[] findMaxByIteration(int arr[], int start, int end)
{
int max, max_ndx;
max = arr[start];
max_ndx = start;
for (int i=start; i<end; i++)
{
if (arr[i] > max)
{
max = arr[i];
max_ndx = i;
}
}
int result[] = {max, max_ndx};
return result;
}
// optimized to skip iteration, when previous windows max element
// is present in current window
static void optimizedPrintKMax(int arr[], int n, int k)
{
int i, j, max, max_ndx;
// for first window - find by iteration.
int result[] = findMaxByIteration(arr, 0, k);
System.out.printf("%d ", result[0]);
max = result[0];
max_ndx = result[1];
for (j=1; j <= (n-k); j++)
{
// if previous max has fallen out of current window, iterate and find
if (max_ndx < j)
{
result = findMaxByIteration(arr, j, j+k);
max = result[0];
max_ndx = result[1];
}
// optimized path, just compare max with new_elem that has come into the window
else
{
int new_elem_ndx = j + (k-1);
if (arr[new_elem_ndx] > max)
{
max = arr[new_elem_ndx];
max_ndx = new_elem_ndx;
}
}
System.out.printf("%d ", max);
}
}
public static void main(String[] args)
{
int arr[] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
//int arr[] = {1,5,2,6,3,1,24,7};
int n = arr.length;
int k = 3;
optimizedPrintKMax(arr, n, k);
}
}
package com;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class SlidingWindow {
public static void main(String[] args) {
int[] array = { 1, 5, 2, 6, 3, 1, 24, 7 };
int slide = 3;//say
List<Integer> result = new ArrayList<Integer>();
for (int i = 0; i < array.length - (slide-1); i++) {
result.add(getMax(array, i, slide));
}
System.out.println("MaxList->>>>" + result.toString());
}
private static Integer getMax(int[] array, int i, int slide) {
List<Integer> intermediate = new ArrayList<Integer>();
System.out.println("Initial::" + intermediate.size());
while (intermediate.size() < slide) {
intermediate.add(array[i]);
i++;
}
Collections.sort(intermediate);
return intermediate.get(slide - 1);
}
}
Here is the solution in O(n) time complexity with auxiliary deque
public class TestSlidingWindow {
public static void main(String[] args) {
int[] arr = { 1, 5, 7, 2, 1, 3, 4 };
int k = 3;
printMaxInSlidingWindow(arr, k);
}
public static void printMaxInSlidingWindow(int[] arr, int k) {
Deque<Integer> queue = new ArrayDeque<Integer>();
Deque<Integer> auxQueue = new ArrayDeque<Integer>();
int[] resultArr = new int[(arr.length - k) + 1];
int maxElement = 0;
int j = 0;
for (int i = 0; i < arr.length; i++) {
queue.add(arr[i]);
if (arr[i] > maxElement) {
maxElement = arr[i];
}
/** we need to maintain the auxiliary deque to maintain max element in case max element is removed.
We add the element to deque straight away if subsequent element is less than the last element
(as there is a probability if last element is removed this element can be max element) otherwise
remove all lesser element then insert current element **/
if (auxQueue.size() > 0) {
if (arr[i] < auxQueue.peek()) {
auxQueue.push(arr[i]);
} else {
while (auxQueue.size() > 0 && (arr[i] > auxQueue.peek())) {
auxQueue.pollLast();
}
auxQueue.push(arr[i]);
}
}else {
auxQueue.push(arr[i]);
}
if (queue.size() > k) {
int removedEl = queue.removeFirst();
if (maxElement == removedEl) {
maxElement = auxQueue.pollFirst();
}
}
if (queue.size() == k) {
resultArr[j++] = maxElement;
}
}
for (int i = 0; i < resultArr.length; i++) {
System.out.println(resultArr[i]);
}
}
}
static void countDistinct(int arr[], int n, int k)
{
System.out.print("\nMaximum integer in the window : ");
// Traverse through every window
for (int i = 0; i <= n - k; i++) {
System.out.print(findMaximuminAllWindow(Arrays.copyOfRange(arr, i, arr.length), k)+ " ");
}
}
private static int findMaximuminAllWindow(int[] win, int k) {
// TODO Auto-generated method stub
int max= Integer.MIN_VALUE;
for(int i=0; i<k;i++) {
if(win[i]>max)
max=win[i];
}
return max;
}
arr = 1 5 2 6 3 1 24 7
We have to find the maximum of subarray, Right?
So, What is meant by subarray?
SubArray = Partial set and it should be in order and contiguous.
From the above array
{1,5,2} {6,3,1} {1,24,7} all are the subarray examples
n = 8 // Array length
k = 3 // window size
For finding the maximum, we have to iterate through the array, and find the maximum.
From the window size k,
{1,5,2} = 5 is the maximum
{5,2,6} = 6 is the maximum
{2,6,3} = 6 is the maximum
and so on..
ans = 5 6 6 6 24 24
It can be evaluated as n-k+1.
Hence, 8-3+1 = 6,
and the length of the answer is 6, as we have seen.
How can we solve this now?
When data is moving through a pipe, the first data structure that comes to mind is a queue,
but rather than discussing it at length here, we will jump directly to the deque.
The thinking would be:
Window is fixed and data is in and out
Data is fixed and window is sliding
EX: Time series database
For the first k elements:
While (Queue is not empty and arr[Queue.back()] < arr[i]) {
    Queue.pop_back();
}
Queue.push_back(i);
For the rest:
    Print the front of the queue
    // purge expired elements
    While (Queue is not empty and Queue.front() <= i - k) {
        Queue.pop_front();
    }
    While (Queue is not empty and arr[Queue.back()] < arr[i]) {
        Queue.pop_back();
    }
    Queue.push_back(i);
arr = [1, 2, 3, 1, 4, 5, 2, 3, 6]
k = 3
for i in range(len(arr) - k + 1):
    print(max(arr[i:i + k]), end=' ')  # 3 3 4 5 5 5 6
Two approaches.
Segment Tree O(nlog(n-k))
Build a maximum segment-tree.
Query between [i, i+k)
Something like..
public static void printMaximums(int[] a, int k) {
int n = a.length;
SegmentTree tree = new SegmentTree(a);
for (int i=0; i<=n-k; i++) System.out.print(tree.query(i, i+k));
}
Deque O(n)
If the next element is greater than the rear element, remove the rear element.
If the element in the front of the deque is out of the window, remove the front element.
public static void printMaximums(int[] a, int k) {
int n = a.length;
Deque<int[]> deck = new ArrayDeque<>();
List<Integer> result = new ArrayList<>();
for (int i=0; i<n; i++) {
while (!deck.isEmpty() && a[i] >= deck.peekLast()[0]) deck.pollLast();
deck.offer(new int[] {a[i], i});
while (!deck.isEmpty() && deck.peekFirst()[1] <= i - k) deck.pollFirst();
if (i >= k - 1) result.add(deck.peekFirst()[0]);
}
System.out.println(result);
}
Here is an optimized version of the naive (conditional) nested loop approach I came up with which is much faster and doesn't require any auxiliary storage or data structure.
As the program moves from window to window, the start index and end index moves forward by 1. In other words, two consecutive windows have adjacent start and end indices.
For the first window of size W , the inner loop finds the maximum of elements with index (0 to W-1). (Hence i == 0 in the if in 4th line of the code).
Now instead of computing for the second window which only has one new element, since we have already computed the maximum for elements of indices 0 to W-1, we only need to compare this maximum to the only new element in the new window with the index W.
But if the element at index 0 was the maximum, and it is the only element not part of the new window, we need to compute the maximum of indices 1 to W again using the inner loop (hence the second condition maxm == arr[i-1] in the if in line 4); otherwise we just compare the maximum of the previous window with the only new element in the new window.
void print_max_for_each_subarray(int arr[], int n, int k)
{
int maxm;
for(int i = 0; i < n - k + 1 ; i++)
{
if(i == 0 || maxm == arr[i-1]) {
maxm = arr[i];
for(int j = i+1; j < i+k; j++)
if(maxm < arr[j]) maxm = arr[j];
}
else {
maxm = maxm < arr[i+k-1] ? arr[i+k-1] : maxm;
}
cout << maxm << ' ';
}
cout << '\n';
}
You can use the Deque data structure to implement this. A Deque has the unique feature that you can insert and remove elements from both ends of the queue, unlike a traditional queue where you can only insert at one end and remove from the other.
Following is the code for the above problem.
public int[] maxSlidingWindow(int[] nums, int k) {
int n = nums.length;
int[] maxInWindow = new int[n - k + 1];
Deque<Integer> dq = new LinkedList<Integer>();
int i = 0;
for(; i<k; i++){
while(!dq.isEmpty() && nums[dq.peekLast()] <= nums[i]){
dq.removeLast();
}
dq.addLast(i);
}
for(; i <n; i++){
maxInWindow[i - k] = nums[dq.peekFirst()];
while(!dq.isEmpty() && dq.peekFirst() <= i - k){
dq.removeFirst();
}
while(!dq.isEmpty() && nums[dq.peekLast()] <= nums[i]){
dq.removeLast();
}
dq.addLast(i);
}
maxInWindow[i - k] = nums[dq.peekFirst()];
return maxInWindow;
}
the resultant array will have n - k + 1 elements where n is length of the given array, k is the given window size.
We can solve this in Python by applying slicing.
def sliding_window(a, k, n):
    max_val = []
    for i in range(n - k + 1):
        val = a[i:i + k]
        val1 = max(val)
        max_val.append(val1)
    return max_val
Driver Code
a = [15, 2, 3, 4, 5, 6, 2, 4, 9, 1, 5]
n = len(a)
k = 3
sl = sliding_window(a, k, n)
print(sl)
Create a TreeMap of size k. Put the first k elements as keys in it and assign any value like 1 (it doesn't matter). A TreeMap keeps its keys sorted, so the first key in the map is the min and the last key is the max element. Then, for each new element, remove the one element from the map whose index in arr is i-k and add the new one. Here, I have assumed that the input elements are taken in an array arr and that from that array we are filling the map of size k. Each insert and remove in the TreeMap costs O(log k), so this approach takes O(n log k) time overall.
100% working Tested (Swift)
func maxOfSubArray(arr: [Int], n: Int, k: Int) -> [Int] {
    var length = arr.count
    var resultArray = [Int]()
    for i in 0..<arr.count {
        if length + 1 > k {
            let tempArray = Array(arr[i..<k+i])
            resultArray.append(tempArray.max()!)
        }
        length = length - 1
    }
    print(resultArray)
    return resultArray
}
This way we can use:
maxOfSubArray(arr: [1,2,3,1,4,5,2,3,6], n: 9, k: 3)
Result:
[3, 3, 4, 5, 5, 5, 6]
Just notice that you only have to re-scan the new window if:
* The new element in the window is smaller than the previous maximum (if it's bigger, it is for sure the new maximum),
OR
* The element that just dropped out of the window was the current maximum.
In this case, re-scan the window.
For how big a k? For reasonable-sized k, you can create k k-sized buffers and just iterate over the array keeping track of max-element pointers in the buffers - it needs no extra data structures and is O(n), with k^2 pre-allocation.
A complete working solution in Amortised Constant O(1) Complexity.
https://github.com/varoonverma/code-challenge.git
Compare the first k elements and find the max; this is your first number.
Then compare the next element to the previous max. If the next element is bigger, that is the max of the next subarray; if it's equal or smaller, the max for that subarray is the same.
Then move on to the next number.
max(1 5 2) = 5
max(5 6) = 6
max(6 6) = 6
... and so on
max(3 24) = 24
max(24 7) = 24
It's only slightly better than your answer

How many numbers between 1 and 10 billion contain 14

I tried this code but it takes too long and I cannot get the result:
public long getCounter([FromBody]object req)
{
JObject param = Utility.GetRequestParameter(req);
long input = long.Parse(param["input"].ToString());
long counter = 0;
for (long i = 14; i <= input; i++)
{
string s = i.ToString();
if (s.Contains("14"))
{
counter += 1;
}
}
return counter;
}
please help
We can examine all non-negative numbers < 10^10. Every such number can be represented with the sequence of 10 digits (with leading zeroes allowed).
How many numbers include 14
Dynamic programming solution. Let's find the number of sequences of a specific length that ends with the specific digit and contains (or not) subsequence 14:
F(len, digit, 0) is the number of sequences of length len that ends with digit and do not contain 14, F(len, digit, 1) is the number of such sequences that contain 14. Initially F(0, 0, 0) = 1. The result is the sum of all F(10, digit, 1).
C++ code to play with: https://ideone.com/2aS17v. The answer seems to be 872348501.
How many times the numbers include 14
First, let's place 14 at the end of the sequence:
????????14
Every '?' can be replaced with any digit from 0 to 9. Thus, there are 10^8 numbers in the interval that contain 14 at the end. Then consider the ???????14?, ??????14??, ..., 14???????? numbers. There are 9 possible locations for the 14 sequence. The answer is 10^8 * 9 = 900000000.
[Added by Matthew Watson]
Here's the C# version of the C++ implementation; it runs in less than 100ms:
using System;
namespace Demo
{
public static class Program
{
public static void Main(string[] args)
{
const int M = 10;
int[,,] f = new int [M + 1, 10, 2];
f[0, 0, 0] = 1;
for (int len = 1; len <= M; ++len)
{
for (int d = 0; d <= 9; ++d)
{
for (int j = 0; j <= 9; ++j)
{
f[len,d,0] += f[len - 1,j,0];
f[len,d,1] += f[len - 1,j,1];
}
}
f[len,4,0] -= f[len - 1,1,0];
f[len,4,1] += f[len - 1,1,0];
}
int sum = 0;
for (int i = 0; i <= 9; ++i)
sum += f[M,i,1];
Console.WriteLine(sum); // 872,348,501
}
}
}
If you want a brute force solution it could be something like this (please notice that we should avoid time-consuming string operations like ToString and Contains):
int count = 0;
// Let's use all CPU's cores: Parallel.For
Parallel.For(0L, 10000000000L, (v) => {
for (long x = v; x > 10; x /= 10) {
// Get rid of ToString and Contains here
if (x % 100 == 14) {
Interlocked.Increment(ref count); // We want an atomic (thread safe) operation
break;
}
}
});
Console.Write(count);
It returns 872348501 within 6 min (Core i7 with 4 cores at 3.2GHz)
UPDATE
My parallel code calculated the result as 872,348,501 in 9 minutes on my 8-core Intel Core i7 PC.
(There is a much better solution above that takes less than 100ms, but I shall leave this answer here since it provides corroborating evidence for the fast answer.)
You can use multiple threads (one per processor core) to reduce the calculation time.
At first I thought that I could use AsParallel() to speed this up - however, it turns out that you can't use AsParallel() on sequences with more than 2^31 items.
(For completeness I'm including my faulty implementation using AsParallel at the end of this answer).
Instead, I've written some custom code to break the problem down into a number of chunks equal to the number of processors:
using System;
using System.Linq;
using System.Threading.Tasks;
namespace Demo
{
class Program
{
static void Main()
{
int numProcessors = Environment.ProcessorCount;
Task<long>[] results = new Task<long>[numProcessors];
long count = 10000000000;
long elementsPerProcessor = count / numProcessors;
for (int i = 0; i < numProcessors; ++i)
{
long end;
long start = i * elementsPerProcessor;
if (i != (numProcessors - 1))
end = start + elementsPerProcessor;
else // Last thread - go right up to the last element.
end = count;
results[i] = Task.Run(() => processElements(start, end));
}
long sum = results.Select(r => r.Result).Sum();
Console.WriteLine(sum);
}
static long processElements(long inclusiveStart, long exclusiveEnd)
{
long total = 0;
for (long i = inclusiveStart; i < exclusiveEnd; ++i)
if (i.ToString().Contains("14"))
++total;
return total;
}
}
}
The following code does NOT work because AsParallel() doesn't work on sequences with more than 2^31 items.
static void Main(string[] args)
{
var numbersContaining14 =
from number in numbers(0, 10000000000).AsParallel()
where number.ToString().Contains("14")
select number;
Console.WriteLine(numbersContaining14.LongCount());
}
static IEnumerable<long> numbers(long first, long count)
{
for (long i = first, last = first + count; i < last; ++i)
yield return i;
}
You compute the count of numbers of a given length ending in 1, 4 or something else that don't contain 14. Then you can extend the length by 1.
Then the count of numbers that do contain 14 is the count of all numbers minus those that don't contain a 14.
private static long Count(int len) {
long e1=0, e4=0, eo=1;
long N=1;
for (int n=0; n<len; n++) {
long ne1 = e4+e1+eo, ne4 = e4+eo, neo = 8*(e1+e4+eo);
e1 = ne1; e4 = ne4; eo = neo;
N *= 10;
}
return N - e1 - e4 - eo;
}
You can reduce this code a little, noting that eo = 8*e1 except for the first iteration, and then avoiding the local variables.
private static long Count(int len) {
long e1=1, e4=1, N=10;
for (int n=1; n<len; n++) {
e4 += 8*e1;
e1 += e4;
N *= 10;
}
return N - 9*e1 - e4;
}
For both of these, Count(10) returns 872348501.
One easy way to calculate the answer is:
you can fix 14 at a place and count the combinations of the remaining digits around it,
and do this for all the possible positions where 14 can be placed such that the number is still less than 10000000000. Let's take an example:
***14*****,
Here the '*' before 14 can be filled by 900 ways and the * after 14 can be filled by 10^5 ways so total occurrence will be 10^5*(900),
Similarly you can fix 14 at other positions to calculate the result and this solution will be very fast O(10) or simply O(1), while the previous solution was O(N), where N is 10000000000
You can use the fact that in each 1000 (that is, from 1 to 1000, from 1001 to 2000, etc.)
"14" is found 20 times, so when you receive your input, divide it by 1000. For example, if you received 1200, then 1200/1000
gives a result of 1 with remainder 200, so we have 1 * 20 "14"s, and then you can loop over the remaining 200.
You can extend this to 10000 (that is, count how many "14"s there are in 10000 and store it in a global variable), start by dividing by 10000 and applying the equation above, then divide the remainder by 1000, apply the equation again, and add the two results.
This builds on the fact that in every hundred (that is, from 1 to 100, from 201 to 300, ...) "14" is found only once, except for the second hundred (101 to 200).

Segmented Aggregation within an Array

I have a large array of primitive value types. The array is in fact one-dimensional, but logically represents a 2-dimensional field. As you read from left to right, each value needs to become (the original value of the current cell) + (the result calculated in the cell to the left). Obviously with the exception of the first element of each row, which is just the original value.
I already have an implementation which accomplishes this, but is entirely iterative over the entire array and is extremely slow for large (1M+ elements) arrays.
Given the following example array,
0 0 1 0 0
2 0 0 0 3
0 4 1 1 0
0 1 0 4 1
Becomes
0 0 1 1 1
2 2 2 2 5
0 4 5 6 6
0 1 1 5 6
And so forth to the right, up to problematic sizes (1024x1024)
The array needs to be updated (ideally), but another array can be used if necessary. Memory footprint isn't much of an issue here, but performance is critical as these arrays have millions of elements and must be processed hundreds of times per second.
The individual cell calculations do not appear to be parallelizable given their dependence on values starting from the left, so GPU acceleration seems impossible. I have investigated PLINQ, but its need for indices makes it very difficult to implement.
Is there another way to structure the data to make it faster to process?
If efficient GPU processing is feasible using an innovative technique, this would be vastly preferable, as this is currently texture data which has to be pulled from and pushed back to the video card.
Proper coding and a bit of insight into how .NET works helps as well :-)
Some rules of thumb that apply in this case:
If you can hint the JIT that the indexing will never get out of bounds of the array, it will remove the extra branch.
You should only spread the work over multiple threads if it's really slow (f.ex. >1 second). Otherwise task switching, cache flushes etc. will probably just eat up the added speed and you'll end up worse.
If possible, make memory access predictable, even sequential. If you need another array, so be it - if not, prefer that.
Use as few IL instructions as possible if you want speed. Generally this seems to work.
Test multiple iterations. A single iteration might not be good enough.
Using these rules, you can make a small test case as follows. Note that I've upped the stakes to 4Kx4K since 1K is just so fast you cannot measure it :-)
public static void Main(string[] args)
{
int width = 4096;
int height = 4096;
int[] ar = new int[width * height];
Random rnd = new Random(213);
for (int i = 0; i < ar.Length; ++i)
{
ar[i] = rnd.Next(0, 120);
}
// (5)...
for (int j = 0; j < 10; ++j)
{
Stopwatch sw = Stopwatch.StartNew();
int sum = 0;
for (int i = 0; i < ar.Length; ++i) // (3) sequential access
{
if ((i % width) == 0)
{
sum = 0;
}
// (1) --> the JIT will notice this won't go out of bounds because [0<=i<ar.Length]
// (5) --> '+=' is an expression generating a 'dup'; this creates less IL.
ar[i] = (sum += ar[i]);
}
Console.WriteLine("This took {0:0.0000}s", sw.Elapsed.TotalSeconds);
}
Console.ReadLine();
}
One of these iterations will take roughly 0.0174 sec here, and since this is about 16x the worst case scenario you describe, I suppose your performance problem is solved.
If you really want to parallelize it to make it faster, I suppose that is possible, even though you will lose some of the optimizations in the JIT (specifically: (1)). However, if you have a multi-core system like most people, the benefits might outweigh this:
for (int j = 0; j < 10; ++j)
{
Stopwatch sw = Stopwatch.StartNew();
Parallel.For(0, height, (a) =>
{
int sum = ar[width * a]; // start from the row's first element, which itself stays unchanged
for (var i = width * a + 1; i < width * (a + 1); i++)
{
ar[i] = (sum += ar[i]);
}
});
Console.WriteLine("This took {0:0.0000}s", sw.Elapsed.TotalSeconds);
}
If you really, really need performance, you can compile it to C++ and use P/Invoke. Even if you don't use the GPU, I suppose the SSE/AVX instructions might already give you a significant performance boost that you won't get with .NET/C#. Also I'd like to point out that the Intel C++ compiler can automatically vectorize your code - even for Xeon Phis. Without a lot of effort, this might give you a nice boost in performance.
Well, I don't know too much about GPU, but I see no reason why you can't parallelize it as the dependencies are only from left to right.
There are no dependencies between rows.
0 0 1 0 0 - process on core1 |
2 0 0 0 3 - process on core1 |
-------------------------------
0 4 1 1 0 - process on core2 |
0 1 0 4 1 - process on core2 |
Although the above statement is not completely true: there are still hidden dependencies between rows when it comes to the memory cache.
It's possible that there's going to be cache thrashing. You can read about "cache false sharing" in order to understand the problem, and see how to overcome it.
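One hedged way to reduce that risk in C# is to hand each thread a contiguous block of rows via Partitioner.Create instead of letting single rows be scheduled arbitrarily; the sketch below assumes the same flat values array with a given width and height as in the other answers:
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Each thread receives a contiguous range of rows, so the cache lines
// it writes are far away from those written by other threads.
Parallel.ForEach(Partitioner.Create(0, height), range =>
{
    for (int row = range.Item1; row < range.Item2; row++)
    {
        int sum = 0;
        for (int i = row * width; i < (row + 1) * width; i++)
            values[i] = (sum += values[i]);
    }
});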
As @Chris Eelmaa told you, it is possible to do a parallel execution by row. Using Parallel.For, it could be rewritten like this:
static int[,] values = new int[,]{
{0, 0, 1, 0, 0},
{2, 0, 0, 0, 3},
{0, 4, 1, 1, 0},
{0, 1, 0, 4 ,1}};
static void Main(string[] args)
{
int rows=values.GetLength(0);
int columns=values.GetLength(1);
Parallel.For(0, rows, (row) =>
{
for (var column = 1; column < columns; column++)
{
values[row, column] += values[row, column - 1];
}
});
for (var row = 0; row < rows; row++)
{
for (var column = 0; column < columns; column++)
{
Console.Write("{0} ", values[row, column]);
}
Console.WriteLine();
}
}
Since, as stated in your question, you actually have a one-dimensional array, the code would be a bit faster like this:
static void Main(string[] args)
{
var values = new int[1024 * 1024];
Random r = new Random();
for (int i = 0; i < 1024; i++)
{
for (int j = 0; j < 1024; j++)
{
values[i * 1024 + j] = r.Next(25);
}
}
int rows = 1024;
int columns = 1024;
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
Parallel.For(0, rows, (row) =>
{
for (var column = 1; column < columns; column++)
{
values[(row * columns) + column] += values[(row * columns) + column - 1];
}
});
}
Console.WriteLine(sw.Elapsed);
}
But not as fast as a GPU. To use parallel GPU processing you will have to rewrite it in C++ AMP or take a look on how to port this parallel for to cudafy: http://w8isms.blogspot.com.es/2012/09/cudafy-me-part-3-of-4.html
You may as well store the array as a jagged array; the data is the same, just split into one small array per row. So, instead of,
int[] texture;
you have,
int[][] texture;
Isolate the row operation as,
private static Task ProcessRow(int[] row)
{
var v = row[0];
for (var i = 1; i < row.Length; i++)
{
v = row[i] += v;
}
return Task.FromResult(true);
}
then you can write a function that does,
Task.WhenAll(texture.Select(ProcessRow)).Wait();
If you want to remain with a 1-dimensional array, a similar approach will work, just change ProcessRow.
private static Task ProcessRow(int[] texture, int start, int limit)
{
var v = texture[start];
for (var i = start + 1; i < limit; i++)
{
v = texture[i] += v;
}
return Task.FromResult(true);
}
then once,
var rowSize = 1024;
var rows =
Enumerable.Range(0, texture.Length / rowSize)
.Select(i => Tuple.Create(i * rowSize, (i * rowSize) + rowSize))
.ToArray();
then on each cycle.
Task.WhenAll(rows.Select(t => ProcessRow(texture, t.Item1, t.Item2))).Wait();
Either way, each row is processed in parallel.
