Given an array of size n and k, how do you find the maximum for every contiguous subarray of size k?
For example
arr = 1 5 2 6 3 1 24 7
k = 3
ans = 5 6 6 6 24 24
I was thinking of having an array of size k and each step evict the last element out and add the new element and find maximum among that. It leads to a running time of O(nk). Is there a better way to do this?
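For reference, the O(nk) idea described in the question can be sketched in a few lines. This is only a baseline to compare the faster answers against; the function name `window_max_naive` is made up for illustration.

```python
def window_max_naive(arr, k):
    """Return the maximum of every contiguous window of size k in O(n*k) time."""
    # One slice and one max() per window start; assumes 1 <= k <= len(arr).
    return [max(arr[i:i + k]) for i in range(len(arr) - k + 1)]


print(window_max_naive([1, 5, 2, 6, 3, 1, 24, 7], 3))  # [5, 6, 6, 6, 24, 24]
```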
You may have heard of the O(n) solution that uses a deque; that is the well-known algorithm for this question.
The method I describe here is also quite simple and has time complexity O(n).
Your Sample Input:
n=10 , W = 3
10 3
1 -2 5 6 0 9 8 -1 2 0
Answer = 5 6 6 9 9 9 8 2
Concept: Dynamic Programming
Algorithm:
N is the number of elements in the array and W is the window size, so the number of windows is N-W+1.
Now divide the array into blocks of size W, starting from index 1.
Here W = 3, so for your sample input the blocks are:
[1 -2 5] [6 0 9] [8 -1 2] [0]
We divide the array into blocks because we will calculate maxima in two ways: A) by traversing from left to right, and B) by traversing from right to left.
But how?
First, traverse from left to right. For each element ai in a block, store the maximum from the START of that block up to ai.
So here,
LR = 1 1 5 | 6 6 9 | 8 8 8 | 0
Second, traverse from right to left. For each element ai in a block, store the maximum from the END of that block back to ai.
So here,
RL = 5 5 5 | 9 9 9 | 8 2 2 | 0
Now we have to find the maximum for each subarray (window) of size W.
So, for index = 1 to index = N-W+1:
max_val[index] = max(RL[index], LR[index+w-1]);
for index=1: max_val[1] = max(RL[1],LR[3]) = max(5,5)= 5
Similarly, for every index i (i <= n-w+1), the values RL[i] and LR[i+w-1]
are compared, and the larger of the two is the answer for that subarray.
So Final Answer : 5 6 6 9 9 9 8 2
Time Complexity: O(n)
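The two passes described above can be reproduced with a short sketch. It uses 0-based indices (the explanation is 1-based), and the helper name `block_maxima` is an assumption made for this illustration only.

```python
def block_maxima(arr, w):
    """Return (LR, RL): running maxima inside each block of size w (0-indexed)."""
    n = len(arr)
    lr = [0] * n
    rl = [0] * n
    for i in range(n):
        # A block starts at every index that is a multiple of w.
        lr[i] = arr[i] if i % w == 0 else max(lr[i - 1], arr[i])
    for i in range(n - 1, -1, -1):
        # A block ends at index w-1, 2w-1, ... or at the last element.
        at_block_end = (i % w == w - 1) or (i == n - 1)
        rl[i] = arr[i] if at_block_end else max(rl[i + 1], arr[i])
    return lr, rl


arr = [1, -2, 5, 6, 0, 9, 8, -1, 2, 0]
w = 3
lr, rl = block_maxima(arr, w)
# Window i covers arr[i .. i+w-1]; its max is max(RL[i], LR[i+w-1]).
answer = [max(rl[i], lr[i + w - 1]) for i in range(len(arr) - w + 1)]
print(answer)  # [5, 6, 6, 9, 9, 9, 8, 2]
```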
Implementation code:
#include <iostream>
#include <cstdio>
#include <cstring>
#include <algorithm>
#define LIM 100001
using namespace std;
int arr[LIM]; // Input Array
int LR[LIM]; // maximum from Left to Right
int RL[LIM]; // maximum from Right to left
int max_val[LIM]; // maximum of each window; there are n-w+1 windows
int main(){
int n, w, i, k; // 'n' is number of elements in array
// 'w' is Window's Size
cin >> n >> w;
k = n - w + 1; // 'K' is number of Windows
for(i = 1; i <= n; i++)
cin >> arr[i];
for(i = 1; i <= n; i++){ // for maximum Left to Right
if(i % w == 1) // that means START of a block
LR[i] = arr[i];
else
LR[i] = max(LR[i - 1], arr[i]);
}
for(i = n; i >= 1; i--){ // for maximum Right to Left
if(i == n) // Maybe the last block is not of size 'W'.
RL[i] = arr[i];
else if(i % w == 0) // that means END of a block
RL[i] = arr[i];
else
RL[i] = max(RL[i+1], arr[i]);
}
for(i = 1; i <= k; i++) // maximum
max_val[i] = max(RL[i], LR[i + w - 1]);
for(i = 1; i <= k ; i++)
cout << max_val[i] << " ";
cout << endl;
return 0;
}
Proof sketch (by @johnchen902):
If k % w != 1 (k is not the beginning of a block)
Let k* = the beginning of the block containing k + w - 1
ans[k] = max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k + w - 1])
= max( max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k* - 1]),
max( arr[k*], arr[k* + 1], arr[k* + 2], ..., arr[k + w - 1]) )
= max( RL[k], LR[k+w-1] )
Otherwise (k is the beginning of a block)
ans[k] = max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k + w - 1])
= RL[k] = LR[k+w-1]
= max( RL[k], LR[k+w-1] )
The dynamic programming approach is explained very neatly by Shashank Jain. I would like to explain how to do the same using a deque.
The key is to keep the maximum element of the current window at the front of the queue, discarding useless elements as well as elements whose index has fallen out of the current window.
Useless elements: if the current element is greater than or equal to the last element of the queue, then that last element is useless.
Note: we store indices in the queue, not the elements themselves. This will become clearer from the code.
1. If the current element is greater than or equal to the last element of the queue, that last element is useless and we delete it
(and we keep deleting until the last element of the queue is larger than the current element or the queue is empty).
2. If current_index - k >= q.front(), the front element has left the window, so we delete it from the front of the queue.
vector<int> max_sub_deque(vector<int> &A,int k)
{
deque<int> q;
for(int i=0;i<k;i++)
{
while(!q.empty() && A[i] >= A[q.back()])
q.pop_back();
q.push_back(i);
}
vector<int> res;
for(int i=k;i<A.size();i++)
{
res.push_back(A[q.front()]);
while(!q.empty() && A[i] >= A[q.back()] )
q.pop_back();
while(!q.empty() && q.front() <= i-k)
q.pop_front();
q.push_back(i);
}
res.push_back(A[q.front()]);
return res;
}
Since each element is enqueued and dequeued at most once, the time complexity is O(n+n) = O(2n) = O(n).
The size of the queue cannot exceed k, so the space complexity is O(k).
An O(n) time solution is possible by combining the two classic interview questions:
Make a stack data structure (called MaxStack) which supports push, pop and max in O(1) time.
This can be done using two stacks; the second one holds the maximum seen so far.
Model a queue with stacks.
This can be done using two stacks. Enqueues go into one stack, and dequeues come from the other.
For this problem, we basically need a queue, which supports enqueue, dequeue and max in O(1) (amortized) time.
We combine the above two, by modelling a queue with two MaxStacks.
To solve the question, we enqueue k elements, query the max, dequeue, enqueue the (k+1)-th element, query the max, and so on. This gives the max for every k-sized subarray.
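A minimal sketch of that combination follows. The class names `MaxStack` and `MaxQueue` and the function `window_max` are illustrative; here each stack entry carries its running maximum in a tuple, which is one way to realize the "second stack" mentioned above.

```python
class MaxStack:
    """Stack with O(1) push, pop and max."""

    def __init__(self):
        # Each entry is (value, max of this value and everything below it).
        self._items = []

    def push(self, value):
        current_max = value if not self._items else max(value, self._items[-1][1])
        self._items.append((value, current_max))

    def pop(self):
        return self._items.pop()[0]

    def max(self):
        return self._items[-1][1]

    def __len__(self):
        return len(self._items)


class MaxQueue:
    """Queue built from two MaxStacks; enqueue, dequeue, max are O(1) amortized."""

    def __init__(self):
        self._inbox = MaxStack()   # receives enqueued values
        self._outbox = MaxStack()  # serves dequeues in FIFO order

    def enqueue(self, value):
        self._inbox.push(value)

    def dequeue(self):
        if not self._outbox:
            # Reversing the inbox into the outbox restores FIFO order.
            while self._inbox:
                self._outbox.push(self._inbox.pop())
        return self._outbox.pop()

    def max(self):
        # The queue's max is the larger of the two stack maxima.
        return max(s.max() for s in (self._inbox, self._outbox) if s)


def window_max(arr, k):
    queue, result = MaxQueue(), []
    for i, value in enumerate(arr):
        queue.enqueue(value)
        if i >= k:
            queue.dequeue()  # drop the element that left the window
        if i >= k - 1:
            result.append(queue.max())
    return result


print(window_max([1, 5, 2, 6, 3, 1, 24, 7], 3))  # [5, 6, 6, 6, 24, 24]
```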
I believe there are other solutions too.
1) I believe the queue idea can be simplified. We maintain a queue and a max for every window of k. When we enqueue a new element, we remove all elements which are not greater than the new element.
2) Maintain two new arrays which maintain the running max for each block of k, one array for one direction (left to right/right to left).
3) Use a hammer: Preprocess in O(n) time for range maximum queries.
Solution 1) above is probably the best of these.
You need a fast data structure that can add, remove and query for the max element in less than O(n) time (you can just use an array if O(n) or O(nlogn) is acceptable). You can use a heap, a balanced binary search tree, a skip list, or any other sorted data structure that performs these operations in O(log(n)).
The good news is that most popular languages have a sorted data structure implemented that supports these operations for you. C++ has std::set and std::multiset (you probably need the latter) and Java has PriorityQueue and TreeSet.
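As one possible illustration of the heap variant, here is a sketch using Python's `heapq` with lazy deletion. That is a swapped-in simplification: instead of removing the element that leaves the window, stale entries are discarded only when they reach the top. The function name `window_max_heap` is hypothetical, and because the heap can grow to n entries this version is O(n log n) rather than O(n log k).

```python
import heapq


def window_max_heap(arr, k):
    heap = []    # entries are (-value, index), so the largest value sits at heap[0]
    result = []
    for i, value in enumerate(arr):
        heapq.heappush(heap, (-value, i))
        # Lazily discard maxima whose index has already left the window.
        # The entry just pushed has index i, so the heap never becomes empty here.
        while heap[0][1] <= i - k:
            heapq.heappop(heap)
        if i >= k - 1:
            result.append(-heap[0][0])
    return result


print(window_max_heap([1, 5, 2, 6, 3, 1, 24, 7], 3))  # [5, 6, 6, 6, 24, 24]
```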
Here is the java implementation
public static Integer[] maxsInEveryWindows(int[] arr, int k) {
Deque<Integer> deque = new ArrayDeque<Integer>();
/* Process first k (or first window) elements of array */
for (int i = 0; i < k; i++) {
// For every element, the previous smaller elements are useless so
// remove them from deque
while (!deque.isEmpty() && arr[i] >= arr[deque.peekLast()]) {
deque.removeLast(); // Remove from rear
}
// Add new element at rear of queue
deque.addLast(i);
}
List<Integer> result = new ArrayList<Integer>();
// Process rest of the elements, i.e., from arr[k] to arr[n-1]
for (int i = k; i < arr.length; i++) {
// The element at the front of the queue is the largest element of
// previous window, so add to result.
result.add(arr[deque.getFirst()]);
// Remove all elements smaller than the currently
// being added element (remove useless elements)
while (!deque.isEmpty() && arr[i] >= arr[deque.peekLast()]) {
deque.removeLast();
}
// Remove the elements which are out of this window
while (!deque.isEmpty() && deque.getFirst() <= i - k) {
deque.removeFirst();
}
// Add current element at the rear of deque
deque.addLast(i);
}
// Print the maximum element of last window
result.add(arr[deque.getFirst()]);
return result.toArray(new Integer[0]);
}
Here is the corresponding test case
@Test
public void maxsInWindowsOfSizeKTest() {
Integer[] result = ArrayUtils.maxsInEveryWindows(new int[]{1, 2, 3, 1, 4, 5, 2, 3, 6}, 3);
assertThat(result, equalTo(new Integer[]{3, 3, 4, 5, 5, 5, 6}));
result = ArrayUtils.maxsInEveryWindows(new int[]{8, 5, 10, 7, 9, 4, 15, 12, 90, 13}, 4);
assertThat(result, equalTo(new Integer[]{10, 10, 10, 15, 15, 90, 90}));
}
Using a heap (or tree), you should be able to do it in O(n * log(k)). I'm not sure if this would be indeed better.
Here is a Python implementation that runs in O(n), with thanks to @Shashank Jain.
from sys import stdin
n, w = map(int, stdin.readline().strip().split())
Arr = list(map(int, stdin.readline().strip().split()))
k = n - w + 1  # number of windows
leftA = [0] * n   # running maximum inside each block, left to right
rightA = [0] * n  # running maximum inside each block, right to left
result = [0] * k
for i in range(n):
    if i % w == 0:  # start of a block
        leftA[i] = Arr[i]
    else:
        leftA[i] = max(Arr[i], leftA[i - 1])
for i in range(n - 1, -1, -1):
    if i % w == (w - 1) or i == n - 1:  # end of a block (the last block may be shorter)
        rightA[i] = Arr[i]
    else:
        rightA[i] = max(Arr[i], rightA[i + 1])
for i in range(k):
    result[i] = max(rightA[i], leftA[i + w - 1])
print(*result, sep=' ')
Method 1: O(n) time, O(k) space
We use a deque (it is like a list but with constant-time insertion and deletion from both ends) to store the index of useful elements.
The index of the current max is kept at the leftmost element of deque. The rightmost element of deque is the smallest.
In the following, for easier explanation we say an element from the array is in the deque, while in fact the index of that element is in the deque.
Let's say {5, 3, 2} are already in the deque (again, in fact their indexes are).
If the next element we read from the array is bigger than 5 (remember, the leftmost element of deque holds the max), say 7: We delete the deque and create a new one with only 7 in it (we do this because the current elements are useless, we have found a new max).
If the next element is less than 2 (which is the smallest element of deque), say 1: We add it to the right ({5, 3, 2, 1})
If the next element is bigger than 2 but less than 5, say 4: We remove elements from right that are smaller than the element and then add the element from right ({5, 4}).
Also we keep elements of the current window only (we can do this in constant time because we are storing the indexes instead of elements).
from collections import deque
def max_subarray(array, k):
    deq = deque()  # indexes of candidate maxima; their values decrease from left to right
    for index, item in enumerate(array):
        # the max element is out of the window
        if deq and index - deq[0] >= k:
            deq.popleft()
        # remove elements from the right that are not bigger than the new item;
        # if the item is a new max, this empties the deque
        while deq and item >= array[deq[-1]]:
            deq.pop()
        # the item is now smaller than everything left in the deque
        deq.append(index)
        if index >= k - 1:  # start printing when the first window is filled
            print(array[deq[0]])
Proof of O(n) time: The only part we need to check is the while loop. In the whole runtime of the code, the while loop can perform at most O(n) operations in total. The reason is that the while loop pops elements from the deque, and since in other parts of the code, we do at most O(n) insertions into the deque, the while loop cannot exceed O(n) operations in total. So the total runtime is O(n) + O(n) = O(n)
Method 2: O(n) time, O(n) space
This is the explanation of the method suggested by S Jain (as mentioned in the comments of his post, this method doesn't work with data streams, which most sliding window questions are designed for).
The reason that method works is explained using the following example:
array = [5, 6, 2, 3, 1, 4, 2, 3]
k = 4
blocks: [5, 6, 2, 3 | 1, 4, 2, 3]
LR:      5  6  6  6   1  4  4  4
RL:      6  6  3  3   4  4  3  3
max:     6  6  4  4   4
To get the max for the window [2, 3, 1, 4],
we can get the max of [2, 3] and max of [1, 4], and return the bigger of the two.
Max of [2, 3] is calculated in the RL pass and max of [1, 4] is calculated in LR pass.
Using Fibonacci heap, you can do it in O(n + (n-k) log k), which is equal to O(n log k) for small k, for k close to n this becomes O(n).
The algorithm: in fact, you need:
n inserts to the heap
n-k deletions
n-k findmax's
How much do these operations cost in a Fibonacci heap? Insert and findmax are O(1) amortized; deletion is O(log m) amortized, where m is the heap size, which never exceeds k here. So, we have
O(n + (n-k) log k + (n-k)) = O(n + (n-k) log k)
Sorry, this should have been a comment but I am not allowed to comment for now.
@leo and @Clay Goddard
You can save yourselves from re-computing the maximum by storing both the maximum and the 2nd maximum of the window at the beginning
(the 2nd maximum equals the maximum only if the maximum occurs twice in the initial window). If the maximum slides out of the window, you still have the next best candidate to compare with the new entry. In the best case this gets you close to O(n); if you re-compute the whole window every time, the worst case is O(nk), where k is the window size.
class MaxFinder
{
// finds the max and its index
static int[] findMaxByIteration(int arr[], int start, int end)
{
int max, max_ndx;
max = arr[start];
max_ndx = start;
for (int i=start; i<end; i++)
{
if (arr[i] > max)
{
max = arr[i];
max_ndx = i;
}
}
int result[] = {max, max_ndx};
return result;
}
// optimized to skip iteration, when previous windows max element
// is present in current window
static void optimizedPrintKMax(int arr[], int n, int k)
{
int i, j, max, max_ndx;
// for first window - find by iteration.
int result[] = findMaxByIteration(arr, 0, k);
System.out.printf("%d ", result[0]);
max = result[0];
max_ndx = result[1];
for (j=1; j <= (n-k); j++)
{
// if previous max has fallen out of current window, iterate and find
if (max_ndx < j)
{
result = findMaxByIteration(arr, j, j+k);
max = result[0];
max_ndx = result[1];
}
// optimized path, just compare max with new_elem that has come into the window
else
{
int new_elem_ndx = j + (k-1);
if (arr[new_elem_ndx] > max)
{
max = arr[new_elem_ndx];
max_ndx = new_elem_ndx;
}
}
System.out.printf("%d ", max);
}
}
public static void main(String[] args)
{
int arr[] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
//int arr[] = {1,5,2,6,3,1,24,7};
int n = arr.length;
int k = 3;
optimizedPrintKMax(arr, n, k);
}
}
package com;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class SlidingWindow {
public static void main(String[] args) {
int[] array = { 1, 5, 2, 6, 3, 1, 24, 7 };
int slide = 3;//say
List<Integer> result = new ArrayList<Integer>();
for (int i = 0; i < array.length - (slide-1); i++) {
result.add(getMax(array, i, slide));
}
System.out.println("MaxList->>>>" + result.toString());
}
private static Integer getMax(int[] array, int i, int slide) {
List<Integer> intermediate = new ArrayList<Integer>();
System.out.println("Initial::" + intermediate.size());
while (intermediate.size() < slide) {
intermediate.add(array[i]);
i++;
}
Collections.sort(intermediate);
return intermediate.get(slide - 1);
}
}
Here is a solution in O(n) time complexity using an auxiliary deque:
import java.util.ArrayDeque;
import java.util.Deque;
public class TestSlidingWindow {
public static void main(String[] args) {
int[] arr = { 1, 5, 7, 2, 1, 3, 4 };
int k = 3;
printMaxInSlidingWindow(arr, k);
}
public static void printMaxInSlidingWindow(int[] arr, int k) {
Deque<Integer> queue = new ArrayDeque<Integer>();
Deque<Integer> auxQueue = new ArrayDeque<Integer>();
int[] resultArr = new int[(arr.length - k) + 1];
int j = 0;
for (int i = 0; i < arr.length; i++) {
queue.addLast(arr[i]);
/** The auxiliary deque holds the candidates for the maximum in decreasing order,
so its first element is always the maximum of the current window.
Before adding the current element, remove every smaller element from the back:
those elements can never be the maximum again. Equal elements are kept so that
duplicates of the maximum are counted correctly when one of them leaves the window. **/
while (!auxQueue.isEmpty() && arr[i] > auxQueue.peekLast()) {
auxQueue.pollLast();
}
auxQueue.addLast(arr[i]);
if (queue.size() > k) {
int removedEl = queue.removeFirst();
// If the element leaving the window is the current maximum, drop it as well
if (removedEl == auxQueue.peekFirst()) {
auxQueue.pollFirst();
}
}
if (queue.size() == k) {
resultArr[j++] = auxQueue.peekFirst();
}
}
for (int i = 0; i < resultArr.length; i++) {
System.out.println(resultArr[i]);
}
}
}
static void printWindowMaximums(int arr[], int n, int k)
{
System.out.print("\nMaximum integer in the window : ");
// Traverse through every window
for (int i = 0; i <= n - k; i++) {
// Copy only the k elements of the current window, not the rest of the array
System.out.print(findMaximuminAllWindow(Arrays.copyOfRange(arr, i, i + k), k)+ " ");
}
}
private static int findMaximuminAllWindow(int[] win, int k) {
int max= Integer.MIN_VALUE;
for(int i=0; i<k;i++) {
if(win[i]>max)
max=win[i];
}
return max;
}
arr = 1 5 2 6 3 1 24 7
We have to find the maximum of each subarray, right?
So, what is meant by a subarray?
A subarray is a part of the array whose elements are contiguous and keep their original order.
From the above array,
{1,5,2}, {6,3,1} and {1,24,7} are all examples of subarrays.
n = 8 // Array length
k = 3 // window size
To find the maxima, we iterate through the array and take the maximum of each window.
With window size k,
{1,5,2} = 5 is the maximum
{5,2,6} = 6 is the maximum
{2,6,3} = 6 is the maximum
and so on..
ans = 5 6 6 6 24 24
The number of windows is n-k+1.
Hence, 8-3+1 = 6
And the length of the answer is 6, as we have seen.
How can we solve this now?
When data moves through a pipe, the first data structure that comes to mind is the queue.
Rather than discussing the plain queue at length, we jump directly to the deque.
There are two ways to think about it:
The window is fixed and the data moves in and out
The data is fixed and the window slides
Example: a time series database
For the first k elements (i = 0 .. k-1):
    While (Queue is not empty and arr[Queue.back()] < arr[i]) {
        Queue.pop_back();
    }
    Queue.push_back(i);
For the rest (i = k .. n-1):
    Print arr[Queue.front()]
    // purge expired elements
    While (Queue is not empty and Queue.front() <= i-k) {
        Queue.pop_front();
    }
    While (Queue is not empty and arr[Queue.back()] < arr[i]) {
        Queue.pop_back();
    }
    Queue.push_back(i);
Finally, print arr[Queue.front()] for the last window.
arr = [1, 2, 3, 1, 4, 5, 2, 3, 6]
k = 3
for i in range(len(arr) - k + 1):
    print(max(arr[i:i + k]), end=' ')  # 3 3 4 5 5 5 6
Two approaches.
Segment Tree O(n log n)
Build a maximum segment-tree.
Query between [i, i+k)
Something like..
public static void printMaximums(int[] a, int k) {
int n = a.length;
SegmentTree tree = new SegmentTree(a);
for (int i=0; i<=n-k; i++) System.out.print(tree.query(i, i+k) + " ");
}
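The SegmentTree class used above is not shown in the answer. As a rough sketch of what such a structure could look like, here is an iterative max segment tree in Python; the class name `MaxSegmentTree` and its `query(left, right)` method over the half-open range [left, right) are assumptions for illustration, not the answer's actual class.

```python
class MaxSegmentTree:
    """Iterative segment tree answering max over values[left:right] in O(log n)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [float("-inf")] * (2 * self.n)
        self.tree[self.n:] = values                 # leaves live at n .. 2n-1
        for i in range(self.n - 1, 0, -1):          # build parents bottom-up
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, left, right):
        best = float("-inf")
        left += self.n
        right += self.n
        while left < right:
            if left & 1:                            # left is a right child: take it
                best = max(best, self.tree[left])
                left += 1
            if right & 1:                           # right bound is exclusive
                right -= 1
                best = max(best, self.tree[right])
            left //= 2
            right //= 2
        return best


a = [1, 5, 2, 6, 3, 1, 24, 7]
k = 3
tree = MaxSegmentTree(a)
print([tree.query(i, i + k) for i in range(len(a) - k + 1)])  # [5, 6, 6, 6, 24, 24]
```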
Deque O(n)
If the next element is greater than the rear element, remove the rear element.
If the element in the front of the deque is out of the window, remove the front element.
public static void printMaximums(int[] a, int k) {
int n = a.length;
Deque<int[]> deck = new ArrayDeque<>();
List<Integer> result = new ArrayList<>();
for (int i=0; i<n; i++) {
while (!deck.isEmpty() && a[i] >= deck.peekLast()[0]) deck.pollLast();
deck.offer(new int[] {a[i], i});
while (!deck.isEmpty() && deck.peekFirst()[1] <= i - k) deck.pollFirst();
if (i >= k - 1) result.add(deck.peekFirst()[0]);
}
System.out.println(result);
}
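Here is an optimized version of the naive (conditional) nested loop approach I came up with, which is much faster on typical inputs and doesn't require any auxiliary storage or data structure, although its worst case (for example a strictly decreasing array) is still O(nk).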
As the program moves from window to window, the start index and end index moves forward by 1. In other words, two consecutive windows have adjacent start and end indices.
For the first window of size W , the inner loop finds the maximum of elements with index (0 to W-1). (Hence i == 0 in the if in 4th line of the code).
Now instead of computing for the second window which only has one new element, since we have already computed the maximum for elements of indices 0 to W-1, we only need to compare this maximum to the only new element in the new window with the index W.
But if the element at index 0 was the maximum, it is the only element that is not part of the new window, so we need to recompute the maximum with the inner loop over indices 1 to W (hence the second condition maxm == arr[i-1] in the if on line 4). Otherwise, we just compare the maximum of the previous window with the only new element in the new window.
void print_max_for_each_subarray(int arr[], int n, int k)
{
int maxm;
for(int i = 0; i < n - k + 1 ; i++)
{
if(i == 0 || maxm == arr[i-1]) {
maxm = arr[i];
for(int j = i+1; j < i+k; j++)
if(maxm < arr[j]) maxm = arr[j];
}
else {
maxm = maxm < arr[i+k-1] ? arr[i+k-1] : maxm;
}
cout << maxm << ' ';
}
cout << '\n';
}
You can use the Deque data structure to implement this. A deque has the useful property that you can insert and remove elements at both ends, unlike a traditional queue where you can only insert at one end and remove from the other.
Following is the code for the above problem.
public int[] maxSlidingWindow(int[] nums, int k) {
int n = nums.length;
int[] maxInWindow = new int[n - k + 1];
Deque<Integer> dq = new LinkedList<Integer>();
int i = 0;
for(; i<k; i++){
while(!dq.isEmpty() && nums[dq.peekLast()] <= nums[i]){
dq.removeLast();
}
dq.addLast(i);
}
for(; i <n; i++){
maxInWindow[i - k] = nums[dq.peekFirst()];
while(!dq.isEmpty() && dq.peekFirst() <= i - k){
dq.removeFirst();
}
while(!dq.isEmpty() && nums[dq.peekLast()] <= nums[i]){
dq.removeLast();
}
dq.addLast(i);
}
maxInWindow[i - k] = nums[dq.peekFirst()];
return maxInWindow;
}
The resultant array will have n - k + 1 elements, where n is the length of the given array and k is the given window size.
We can solve it in Python by applying slicing.
def sliding_window(a, k, n):
    max_val = []
    for i in range(n - k + 1):
        val = a[i:i + k]  # the current window of k elements
        max_val.append(max(val))
    return max_val
Driver Code
a = [15, 2, 3, 4, 5, 6, 2, 4, 9, 1, 5]
n = len(a)
k = 3
sl = sliding_window(a, k, n)
print(sl)
Create a TreeMap that holds the k elements of the current window. Put the first k elements into it as keys, with the number of occurrences as the value so that duplicates are handled. A TreeMap keeps its keys sorted, so the first key in the map is the min and the last key is the max element of the window. Then, for each new index i, remove the element arr[i-k] from the map (decrement its count and delete the key when the count reaches 0) and add arr[i]. Here, I have assumed that the input elements are in an array arr and that we fill the map from that array. Since every insertion and removal in a TreeMap costs O(log k), this approach takes O(n log k) time overall, not O(n).
100% working Tested (Swift)
func maxOfSubArray(arr:[Int],n:Int,k:Int)->[Int]{
var remaining = arr.count // number of elements from index i to the end
var resultArray = [Int]()
for i in 0..<arr.count{
if remaining >= k{ // a full window of size k still fits
let tempArray = Array(arr[i..<k+i])
resultArray.append(tempArray.max()!)
}
remaining = remaining - 1
}
print(resultArray)
return resultArray
}
This way we can use:
maxOfSubArray(arr: [1,2,3,1,4,5,2,3,6], n: 9, k: 3)
Result:
[3, 3, 4, 5, 5, 5, 6]
Just notice that you only have to re-scan the new window if both of these hold:
* The new element in the window is smaller than the previous maximum (if it's bigger, the new element is for sure the maximum).
AND
* The element that just left the window was the current maximum.
In this case, re-scan the window.
How big is k? For reasonably sized k, you can create k buffers of size k and just iterate over the array, keeping track of the max element pointers in the buffers. This needs no extra data structures and is O(n) after a k^2 pre-allocation.
A complete working solution in Amortised Constant O(1) Complexity.
https://github.com/varoonverma/code-challenge.git
Compare the first k elements and find the max; this is your first number.
Then compare the next element to the previous max. If the next element is bigger, it is the max of the next subarray; if it's equal or smaller, the max for that subarray stays the same, provided the previous max is still inside the window (otherwise the window has to be re-scanned).
Then move on to the next number.
max(1 5 2) = 5
max(5 6) = 6
max(6 6) = 6
... and so on
max(3 24) = 24
max(24 7) = 24
It's only slightly better than your answer
I tried this code, but it takes so long that I cannot get the result.
public long getCounter([FromBody]object req)
{
JObject param = Utility.GetRequestParameter(req);
long input = long.Parse(param["input"].ToString());
long counter = 0;
for (long i = 14; i <= input; i++)
{
string s = i.ToString();
if (s.Contains("14"))
{
counter += 1;
}
}
return counter;
}
please help
We can examine all non-negative numbers < 10^10. Every such number can be represented with the sequence of 10 digits (with leading zeroes allowed).
How many numbers include 14
Dynamic programming solution. Let's find the number of sequences of a specific length that end with a specific digit and contain (or do not contain) the substring 14:
F(len, digit, 0) is the number of sequences of length len that end with digit and do not contain 14; F(len, digit, 1) is the number of such sequences that do contain 14. Initially F(0, 0, 0) = 1. The result is the sum of all F(10, digit, 1).
C++ code to play with: https://ideone.com/2aS17v. The answer seems to be 872348501.
How many times the numbers include 14
First, let's place 14 at the end of the sequence:
????????14
Every '?' can be replaced with any digit from 0 to 9. Thus, there are 10^8 numbers in the interval that contain 14 at the end. Then consider the numbers ???????14?, ??????14??, ..., 14????????. There are 9 possible locations of the 14 sequence. The answer is 10^8 * 9 = 900000000.
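The placement argument generalizes to (m - 1) * 10^(m - 2) occurrences among all zero-padded m-digit sequences, which is 9 * 10^8 for m = 10. A small brute-force sketch can confirm the formula for short lengths; the function name `count_occurrences` is made up for this check.

```python
from itertools import product


def count_occurrences(num_digits):
    """Count every occurrence of "14" over all zero-padded digit strings."""
    total = 0
    for digits in product("0123456789", repeat=num_digits):
        s = "".join(digits)
        # "14" cannot overlap with itself, so counting start positions is enough.
        total += sum(1 for i in range(num_digits - 1) if s[i:i + 2] == "14")
    return total


for m in (2, 3, 4):
    assert count_occurrences(m) == (m - 1) * 10 ** (m - 2)

print((10 - 1) * 10 ** (10 - 2))  # 900000000
```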
[Added by Matthew Watson]
Here's the C# version of the C++ implementation; it runs in less than 100ms:
using System;
namespace Demo
{
public static class Program
{
public static void Main(string[] args)
{
const int M = 10;
int[,,] f = new int [M + 1, 10, 2];
f[0, 0, 0] = 1;
for (int len = 1; len <= M; ++len)
{
for (int d = 0; d <= 9; ++d)
{
for (int j = 0; j <= 9; ++j)
{
f[len,d,0] += f[len - 1,j,0];
f[len,d,1] += f[len - 1,j,1];
}
}
f[len,4,0] -= f[len - 1,1,0];
f[len,4,1] += f[len - 1,1,0];
}
int sum = 0;
for (int i = 0; i <= 9; ++i)
sum += f[M,i,1];
Console.WriteLine(sum); // 872,348,501
}
}
}
If you want a brute force solution, it could be something like this (please notice that we should avoid time-consuming string operations like ToString and Contains):
int count = 0;
// Let's use all CPU's cores: Parallel.For
Parallel.For(0L, 10000000000L, (v) => {
for (long x = v; x > 10; x /= 10) {
// Get rid of ToString and Contains here
if (x % 100 == 14) {
Interlocked.Increment(ref count); // We want an atomic (thread safe) operation
break;
}
}
});
Console.Write(count);
It returns 872348501 within 6 min (Core i7 with 4 cores at 3.2GHz)
UPDATE
My parallel code calculated the result as 872,348,501 in 9 minutes on my 8-core Intel Core i7 PC.
(There is a much better solution above that takes less than 100ms, but I shall leave this answer here since it provides corroborating evidence for the fast answer.)
You can use multiple threads (one per processor core) to reduce the calculation time.
At first I thought that I could use AsParallel() to speed this up - however, it turns out that you can't use AsParallel() on sequences with more than 2^31 items.
(For completeness I'm including my faulty implementation using AsParallel at the end of this answer).
Instead, I've written some custom code to break the problem down into a number of chunks equal to the number of processors:
using System;
using System.Linq;
using System.Threading.Tasks;
namespace Demo
{
class Program
{
static void Main()
{
int numProcessors = Environment.ProcessorCount;
Task<long>[] results = new Task<long>[numProcessors];
long count = 10000000000;
long elementsPerProcessor = count / numProcessors;
for (int i = 0; i < numProcessors; ++i)
{
long end;
long start = i * elementsPerProcessor;
if (i != (numProcessors - 1))
end = start + elementsPerProcessor;
else // Last thread - go right up to the last element.
end = count;
results[i] = Task.Run(() => processElements(start, end));
}
long sum = results.Select(r => r.Result).Sum();
Console.WriteLine(sum);
}
static long processElements(long inclusiveStart, long exclusiveEnd)
{
long total = 0;
for (long i = inclusiveStart; i < exclusiveEnd; ++i)
if (i.ToString().Contains("14"))
++total;
return total;
}
}
}
The following code does NOT work because AsParallel() doesn't work on sequences with more than 2^31 items.
static void Main(string[] args)
{
var numbersContaining14 =
from number in numbers(0, 10000000000).AsParallel()
where number.ToString().Contains("14")
select number;
Console.WriteLine(numbersContaining14.LongCount());
}
static IEnumerable<long> numbers(long first, long count)
{
for (long i = first, last = first + count; i < last; ++i)
yield return i;
}
You compute the count of numbers of a given length ending in 1, 4 or something else that don't contain 14. Then you can extend the length by 1.
Then the count of numbers that do contain 14 is the count of all numbers minus those that don't contain a 14.
private static long Count(int len) {
long e1=0, e4=0, eo=1;
long N=1;
for (int n=0; n<len; n++) {
long ne1 = e4+e1+eo, ne4 = e4+eo, neo = 8*(e1+e4+eo);
e1 = ne1; e4 = ne4; eo = neo;
N *= 10;
}
return N - e1 - e4 - eo;
}
You can reduce this code a little, noting that eo = 8*e1 except for the first iteration, and then avoiding the local variables.
private static long Count(int len) {
long e1=1, e4=1, N=10;
for (int n=1; n<len; n++) {
e4 += 8*e1;
e1 += e4;
N *= 10;
}
return N - 9*e1 - e4;
}
For both of these, Count(10) returns 872348501.
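As a quick cross-check, the first recurrence can be ported to Python and compared against brute force for small lengths. This is a sketch only; `count_with_14` and `brute_force` are names chosen for this illustration.

```python
def count_with_14(length):
    """Numbers below 10**length whose decimal form contains "14"."""
    # Counts of digit strings WITHOUT "14", split by last digit:
    # ending in 1, ending in 4, ending in anything else (the empty string counts as "other").
    e1, e4, other = 0, 0, 1
    for _ in range(length):
        e1, e4, other = (
            e1 + e4 + other,        # a 1 may follow anything
            e4 + other,             # a 4 may not follow a 1
            8 * (e1 + e4 + other),  # the 8 remaining digits may follow anything
        )
    return 10 ** length - (e1 + e4 + other)


def brute_force(length):
    # Leading zeros cannot create a "14", so plain str() is equivalent here.
    return sum("14" in str(i) for i in range(10 ** length))


assert all(count_with_14(n) == brute_force(n) for n in range(1, 5))
print(count_with_14(10))  # 872348501
```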
One easy way to calculate the answer is this:
You can fix 14 at one position and count the combinations of the remaining digits around it,
and do this for all the possible positions where 14 can be placed such that the number is still less than 10000000000. Let's take an example:
***14*****
Here the '*' before 14 can be filled in 900 ways and the '*' after 14 can be filled in 10^5 ways, so the total number of occurrences will be 10^5 * 900.
Similarly, you can fix 14 at the other positions to calculate the result. This solution will be very fast, O(10) or simply O(1), while the previous solution was O(N), where N is 10000000000.
You can use the fact that in each 1000 (that is from 1 to 1000 and from 1001 to 2000 etc)
the 14 is found: 19 times so when you receive your input divide it by 1000 for example you received 1200 so 1200/1000
the result is 1 and remainder 200, so we have 1 * 19 "14"s and then you can loop over the 200.
you can extend for 10000 (that is count how many "14"s there are in 10000 and fix it to a global variable) and start dividing by 10000 then and apply the equation above, then you divide the remainder by 1000 and apply the equation and add the two results.
You can extend it as the fact that for all hundreds (that is from 1 to 100 and from 201 to 300) the "14" is found only 1 except for the second hundred (101 to 200).
I have a large array of primitive value types. The array is in fact one-dimensional, but logically represents a 2-dimensional field. As you read from left to right, each value needs to become (the original value of the current cell) + (the result calculated in the cell to the left), with the obvious exception of the first element of each row, which just keeps its original value.
I already have an implementation which accomplishes this, but is entirely iterative over the entire array and is extremely slow for large (1M+ elements) arrays.
Given the following example array,
0 0 1 0 0
2 0 0 0 3
0 4 1 1 0
0 1 0 4 1
Becomes
0 0 1 1 1
2 2 2 2 5
0 4 5 6 6
0 1 1 5 6
And so forth to the right, up to problematic sizes (1024x1024)
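To pin down the required transformation, here is a small reference sketch of the row-wise running sum on the example above. It is not meant as a performance solution; the function name `row_prefix_sums` and the flat, row-major layout are assumptions based on the description.

```python
from itertools import accumulate


def row_prefix_sums(flat, width):
    """Running sum along each row of a flat, row-major 2D field."""
    out = []
    for start in range(0, len(flat), width):
        # Each row restarts the sum, so rows are independent of each other.
        out.extend(accumulate(flat[start:start + width]))
    return out


flat = [0, 0, 1, 0, 0,
        2, 0, 0, 0, 3,
        0, 4, 1, 1, 0,
        0, 1, 0, 4, 1]
result = row_prefix_sums(flat, 5)
for row in range(4):
    print(*result[row * 5:(row + 1) * 5])
# 0 0 1 1 1
# 2 2 2 2 5
# 0 4 5 6 6
# 0 1 1 5 6
```

Because every row restarts its sum, the rows can be processed independently of one another, even though the cells within a row depend on their left neighbours.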
The array needs to be updated (ideally), but another array can be used if necessary. Memory footprint isn't much of an issue here, but performance is critical as these arrays have millions of elements and must be processed hundreds of times per second.
The individual cell calculations do not appear to be parallelizable given their dependence on values starting from the left, so GPU acceleration seems impossible. I have investigated PLINQ but requisite for indices makes it very difficult to implement.
Is there another way to structure the data to make it faster to process?
If efficient GPU processing is feasible using an innovative technique, this would be vastly preferable, as this is currently texture data which has to be pulled from and pushed back to the video card.
Proper coding and a bit of insight into how .NET works help as well :-)
Some rules of thumb that apply in this case:
1. If you can hint to the JIT that the indexing will never go out of the bounds of the array, it will remove the extra bounds-check branch.
2. You should parallelize across multiple threads only if it's really slow (e.g. >1 second). Otherwise task switching, cache flushes, etc. will probably just eat up the added speed and you'll end up worse off.
3. If possible, make memory access predictable, even sequential. If you need another array, so be it; if not, prefer working in place.
4. Use as few IL instructions as possible if you want speed. Generally this seems to work.
5. Test multiple iterations. A single iteration might not be good enough.
Using these rules, you can make a small test case as follows. Note that I've upped the stakes to 4Kx4K since 1K is just so fast you cannot measure it :-)
using System;
using System.Diagnostics;

public static class Program
{
    public static void Main(string[] args)
    {
        int width = 4096;
        int height = 4096;
        int[] ar = new int[width * height];
        Random rnd = new Random(213);
        for (int i = 0; i < ar.Length; ++i)
        {
            ar[i] = rnd.Next(0, 120);
        }
        // (5) repeat the measurement several times
        for (int j = 0; j < 10; ++j)
        {
            Stopwatch sw = Stopwatch.StartNew();
            int sum = 0;
            for (int i = 0; i < ar.Length; ++i) // (3) sequential access
            {
                if ((i % width) == 0)
                {
                    sum = 0; // start of a new row
                }
                // (1) --> the JIT will notice this won't go out of bounds because [0<=i<ar.Length]
                // (4) --> '+=' is an expression generating a 'dup'; this creates less IL.
                ar[i] = (sum += ar[i]);
            }
            Console.WriteLine("This took {0:0.0000}s", sw.Elapsed.TotalSeconds);
        }
        Console.ReadLine();
    }
}
One of these iterations will take roughly 0.0174 sec here, and since this array is about 16x the size of the worst-case scenario you describe, I suppose your performance problem is solved.
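To check the loop against the small example from the question, the same idea can be wrapped in a helper. This is only a sketch: the names `PrefixSumDemo` and `RowPrefixSums` and the `width` parameter are made up for illustration and are not part of the answer above.

```csharp
using System;

public static class PrefixSumDemo
{
    // Hypothetical helper: in-place running sum along each row of a
    // row-major array whose rows are `width` elements long.
    public static void RowPrefixSums(int[] data, int width)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; ++i)
        {
            if (i % width == 0)
            {
                sum = 0; // a new row starts here
            }
            data[i] = (sum += data[i]);
        }
    }

    public static void Main()
    {
        // The 4x5 example array from the question, stored row by row.
        int[] data =
        {
            0, 0, 1, 0, 0,
            2, 0, 0, 0, 3,
            0, 4, 1, 1, 0,
            0, 1, 0, 4, 1
        };
        const int width = 5;
        RowPrefixSums(data, width);
        for (int row = 0; row < data.Length / width; ++row)
        {
            for (int col = 0; col < width; ++col)
            {
                Console.Write("{0} ", data[row * width + col]);
            }
            Console.WriteLine();
        }
    }
}
```

Run on that input, the helper should reproduce the "Becomes" grid shown in the question.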
If you really want to parallelize it to make it faster, I suppose that is possible, even though you will lose some of the optimizations in the JIT (specifically: (1)). However, if you have a multi-core system like most people, the benefits might outweigh these:
// requires: using System.Threading.Tasks;
for (int j = 0; j < 10; ++j)
{
    Stopwatch sw = Stopwatch.StartNew();
    Parallel.For(0, height, (a) =>
    {
        int sum = 0;
        // start at the first cell of row 'a' so that its value is included in the running sum
        for (var i = width * a; i < width * (a + 1); i++)
        {
            ar[i] = (sum += ar[i]);
        }
    });
    Console.WriteLine("This took {0:0.0000}s", sw.Elapsed.TotalSeconds);
}
If you really, really need performance, you can write it in C++ and call it via P/Invoke. Even if you don't use the GPU, I suppose the SSE/AVX instructions might already give you a significant performance boost that you won't get with .NET/C#. Also I'd like to point out that the Intel C++ compiler can automatically vectorize your code, even for Xeon Phi. Without a lot of effort, this might give you a nice boost in performance.
Well, I don't know too much about GPU, but I see no reason why you can't parallelize it as the dependencies are only from left to right.
There are no dependencies between rows.
0 0 1 0 0 - process on core1 |
2 0 0 0 3 - process on core1 |
-------------------------------
0 4 1 1 0 - process on core2 |
0 1 0 4 1 - process on core2 |
The above statement is not completely true, though. There are still hidden dependencies between rows when it comes to the memory cache.
It's possible that there's going to be cache thrashing. You can read about "false sharing" to understand the problem and see how to overcome it.
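One way to follow the picture above, where each core gets a contiguous band of rows, is to hand out ranges of rows rather than single rows. The sketch below uses `Partitioner.Create` from `System.Collections.Concurrent`; the names `ChunkedRows` and `RowPrefixSumsChunked` are assumptions for illustration, and whether this actually reduces false sharing on a given machine would have to be measured.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ChunkedRows
{
    // Hypothetical helper: running sum along each row, with the rows
    // handed to worker threads in contiguous [from, to) bands.
    public static void RowPrefixSumsChunked(int[] data, int width)
    {
        int height = data.Length / width;
        if (height == 0)
        {
            return; // Partitioner.Create rejects an empty range
        }
        Parallel.ForEach(Partitioner.Create(0, height), range =>
        {
            for (int row = range.Item1; row < range.Item2; row++)
            {
                int start = row * width;
                int end = start + width;
                int sum = 0;
                for (int i = start; i < end; i++)
                {
                    data[i] = (sum += data[i]);
                }
            }
        });
    }

    public static void Main()
    {
        // The example rows from the diagram above, stored row by row.
        int[] data =
        {
            0, 0, 1, 0, 0,
            2, 0, 0, 0, 3,
            0, 4, 1, 1, 0,
            0, 1, 0, 4, 1
        };
        RowPrefixSumsChunked(data, 5);
        for (int row = 0; row < 4; row++)
        {
            for (int col = 0; col < 5; col++)
            {
                Console.Write("{0} ", data[row * 5 + col]);
            }
            Console.WriteLine();
        }
    }
}
```

Because each band covers whole rows that sit next to each other in memory, two threads can only touch the same cache line at the boundary between two bands.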
As @Chris Eelmaa told you, it is possible to process the rows in parallel. Using Parallel.For, it could be written like this:
static int[,] values = new int[,]{
    {0, 0, 1, 0, 0},
    {2, 0, 0, 0, 3},
    {0, 4, 1, 1, 0},
    {0, 1, 0, 4, 1}};
static void Main(string[] args)
{
    int rows = values.GetLength(0);
    int columns = values.GetLength(1);
    // each row is independent, so the rows can be processed in parallel
    Parallel.For(0, rows, (row) =>
    {
        for (var column = 1; column < columns; column++)
        {
            values[row, column] += values[row, column - 1];
        }
    });
    for (var row = 0; row < rows; row++)
    {
        for (var column = 0; column < columns; column++)
        {
            Console.Write("{0} ", values[row, column]);
        }
        Console.WriteLine();
    }
}
Since, as stated in your question, you have a one-dimensional array, the code for that case would be a bit faster:
static void Main(string[] args)
{
    var values = new int[1024 * 1024];
    Random r = new Random();
    for (int i = 0; i < 1024; i++)
    {
        for (int j = 0; j < 1024; j++)
        {
            values[i * 1024 + j] = r.Next(25);
        }
    }
    int rows = 1024;
    int columns = 1024;
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        Parallel.For(0, rows, (row) =>
        {
            for (var column = 1; column < columns; column++)
            {
                values[(row * columns) + column] += values[(row * columns) + column - 1];
            }
        });
    }
    Console.WriteLine(sw.Elapsed);
}
But it is not as fast as a GPU. To use parallel GPU processing you will have to rewrite it in C++ AMP, or take a look at how to port this Parallel.For to CUDAfy: http://w8isms.blogspot.com.es/2012/09/cudafy-me-part-3-of-4.html
You could also store the array as a jagged array; each row is then still stored contiguously, although the rows become separate array objects. So, instead of,
int[] texture;
you have,
int[][] texture;
Isolate the row operation as,
private static Task ProcessRow(int[] row)
{
    // Task.Run queues the work on the thread pool, so different rows can run concurrently
    return Task.Run(() =>
    {
        var v = row[0];
        for (var i = 1; i < row.Length; i++)
        {
            v = row[i] += v;
        }
    });
}
then you can write a function that does,
Task.WhenAll(texture.Select(ProcessRow)).Wait();
If you want to remain with a 1-dimensional array, a similar approach will work, just change ProcessRow.
private static Task ProcessRow(int[] texture, int start, int limit)
{
    // Task.Run queues the work on the thread pool, so different rows can run concurrently
    return Task.Run(() =>
    {
        var v = texture[start];
        for (var i = start + 1; i < limit; i++)
        {
            v = texture[i] += v;
        }
    });
}
then, once, build the start and end index of each row,
var rowSize = 1024;
var rows =
    Enumerable.Range(0, texture.Length / rowSize)
        .Select(i => Tuple.Create(i * rowSize, (i * rowSize) + rowSize))
        .ToArray();
and then, on each cycle,
Task.WhenAll(rows.Select(t => ProcessRow(texture, t.Item1, t.Item2))).Wait();
Either way, each row is processed in parallel.