Checking for consecutive numbers - c#

I'm trying to implement a method for checking consecutive numbers in C#.
Given an integer list of n elements, it should return true if the numbers are consecutive and false otherwise.
So for example, 12345, 45678, 54321 would all be true.
And 435276, 243516, 974264 would be false.
My code seems to be performing as expected. But it's missing the end element.
for (int i = 0; i < inputList.Count - 1; i++)
{
if (inputList[i] < inputList[i + 1])
{
Console.WriteLine($"{inputList[i]} is consecutive when compared to {inputList[i + 1]}");
consecutiveCheck = true;
}
else
{
Console.WriteLine($"{inputList[i]} is not consecutive when compared to {inputList[i + 1]}");
consecutiveCheck = false;
break;
}
}
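For reference, one reading of the examples above is that a list is consecutive when every neighbouring pair differs by exactly 1 and the direction never changes. Here is a minimal sketch under that assumption. The names ConsecutiveCheck and IsConsecutive are hypothetical, and treating lists with fewer than two elements as consecutive is also an assumption:

```csharp
using System;
using System.Collections.Generic;

public static class ConsecutiveCheck
{
    // Returns true when each neighbouring pair differs by exactly +1,
    // or each neighbouring pair differs by exactly -1.
    public static bool IsConsecutive(IReadOnlyList<int> values)
    {
        // Assumption: zero or one element counts as consecutive.
        if (values.Count < 2) return true;

        int step = values[1] - values[0];
        if (step != 1 && step != -1) return false;

        // Compare every remaining pair, including the one that ends at the last element.
        for (int i = 1; i < values.Count - 1; i++)
        {
            if (values[i + 1] - values[i] != step) return false;
        }
        return true;
    }
}
```

Note that the loop bound of Count - 1 does not skip the last element: the final pair compared is (Count - 2, Count - 1).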

Here's a simple way to do it:
int[] inputList = new [] { 12345, 45678, 54321 };
bool all_increasing = inputList.Zip(inputList.Skip(1), (x0, x1) => x1 > x0).All(x => x);

What are consecutive numbers?
A series of numbers with an equal distance between neighbouring elements.
Examples of consecutive numbers:
1,2,3,4,5
1,3,5,7,9
Examples of non-consecutive numbers:
1,2,4,8,11
1,3,6,10,11
int diff = 0;
for (var i = 0; i < numbersInput.Length-1; i++)
{
if(i==0)
{
diff = numbersInput[i + 1] - numbersInput[i];
}
else if(numbersInput[i] + diff != numbersInput[i+1])
{
return false;
}
}
return true;

You mean you are checking if they are increasing:
bool inc = true;
for (int i = 1; i < inputList.Count; i++)
{
if (inputList[i] < inputList[i - 1])
{
inc=false;
break;
}
}
Console.WriteLine($"List has consecutive numbers: {(inc?"yes":"no")}");

This code will work :
private static bool isConsecutive(int[] list)
{
switch (list.Length)
{
case 0:
throw new ArgumentException("Value cannot be an empty collection.", nameof(list));
case 1:
throw new ArgumentException("This collection contains only one element.", nameof(list));
}
int direction = list[1]-list[0];
for (var index = 0; index < list.Length; index++)
{
int nextIndex = index + 1;
if (nextIndex >= list.Length)
{
continue;
}
int diff = list[nextIndex] - list[index];
if (diff != direction)
{
return false;
}
}
return true;
}

Related

Check patterns of 5 in password "12345", "abcde"

I'm trying to validate a password string in .NET, rejecting sequential patterns (forward or reverse) of 5 or more numbers or letters.
Examples of patterns that will not be accepted:
"ABCDE",
"12345",
"54321",
"edcba"
I cannot find a decent regex pattern that handles finding the characters in order, currently just returning any sequence of 5 letters or numbers:
public bool CheckForSequence(string input)
{
return Regex.IsMatch(input.ToUpper(), @"([A-Z])(?!\1)([A-Z])(?!\1|\2)([A-Z])(?!\1|\2|\3)([A-Z])(?!\1|\2|\3|\4)([A-Z])") ||
Regex.IsMatch(input, @"([1-9])(?!\1)([1-9])(?!\1|\2)([1-9])(?!\1|\2|\3)([1-9])(?!\1|\2|\3|\4)([1-9])");
}
There are probably way better ways to do this, but, just for fun, I've made a simple brute-force algorithm:
bool CheckForSequence(string inp) {
bool InRange(int c) {
const int minLower = (int)'a';
const int maxLower = (int)'z';
const int minUpper = (int)'A';
const int maxUpper = (int)'Z';
const int minNumber = (int)'0';
const int maxNumber = (int)'9';
return (c >= minLower && c <= maxLower) || (c >= minUpper && c <= maxUpper) || (c >= minNumber && c <= maxNumber);
}
if(inp.Length < 5) return false;
for(var i = 0; i < inp.Length - 4; i++)
{
var c = (int)inp[i];
if(InRange(c))
{
var vM = c;
int x;
for(x = i+1; x < i + 5; x++)
{
if(inp[x] != vM+1 || !InRange(inp[x])) break;
vM++;
}
if(x == i+5) return true;
for(x = i+1; x < i + 5; x++)
{
if(inp[x] != vM-1 || !InRange(inp[x])) break;
vM--;
}
if(x == i+5) return true;
}
}
return false;
}
You can see it in action in this fiddle
Wiktor is correct - regex is the wrong tool for this.
Here's one possible implementation:
public static class SequenceChecker
{
private static char MapChar(char c) => c switch
{
>= '0' and <= '9' => c,
>= 'A' and <= 'Z' => c,
>= 'a' and <= 'z' => (char)(c - 'a' + 'A'),
_ => default,
};
private static bool IsSequence(ReadOnlySpan<char> input)
{
char x = MapChar(input[0]);
if (x == default) return false;
char y = MapChar(input[1]);
if (y == default) return false;
int direction = y - x;
if (Math.Abs(direction) != 1) return false;
for (int index = 2; index < input.Length; index++)
{
x = y;
y = MapChar(input[index]);
if (y == default) return false;
int nextDirection = y - x;
if (nextDirection != direction) return false;
}
return true;
}
public static bool ContainsSequence(string input, int sequenceLength = 5)
{
if (sequenceLength < 2) throw new ArgumentOutOfRangeException(nameof(sequenceLength));
if (input is null) return false;
if (input.Length < sequenceLength) return false;
for (int startIndex = 0; startIndex < 1 + input.Length - sequenceLength; startIndex++)
{
if (IsSequence(input.AsSpan(startIndex, sequenceLength)))
{
return true;
}
}
return false;
}
}
Just to add to the plethora of solutions posted so far:
public static int LongestAscendingOrDescendingRun(string s)
{
if (s.Length <= 1)
return 0;
int longest = 0;
int current = 0;
bool ascending = false;
for (int i = 1; i < s.Length; i++)
{
bool isAscending () => s[i]-s[i-1] == +1;
bool isDescending() => s[i]-s[i-1] == -1;
if (current > 0)
{
if (ascending)
{
if (isAscending())
{
longest = Math.Max(longest, ++current);
}
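else if (isDescending()) // Direction flipped: this pair starts a descending run.
{
ascending = false;
current = 2;
}
else // No longer ascending.
{
current = 0;
}
}
else // Descending.
{
if (isDescending())
{
longest = Math.Max(longest, ++current);
}
else if (isAscending()) // Direction flipped: this pair starts an ascending run.
{
ascending = true;
current = 2;
}
else // No longer descending.
{
current = 0;
}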
}
}
else // No current.
{
if (isAscending())
{
ascending = true;
current = 2;
longest = Math.Max(longest, current);
}
else if (isDescending())
{
ascending = false;
current = 2;
longest = Math.Max(longest, current);
}
}
}
return longest;
}
Like Wiktor has already said, regex isn't a good way to do this. You could find the difference between consecutive characters of the string, and complain if you find a sequence of four or more ones (or -1s).
public bool CheckForSequence(string pass)
{
int curr_diff = 0; // The difference between the i-1th and i-2th character
int consec_diff = 0; // The number of consecutive pairs having the same difference
for (int i = 1; i < pass.Length; i++)
{
int diff = pass[i] - pass[i - 1]; // The difference between the ith and i-1th character
if (Math.Abs(diff) == 1 && curr_diff == diff)
{
// If the difference is the same, increment consec_diff
// And check if the password is invalid
consec_diff++;
if (consec_diff >= 4)
return false;
}
else
{
// New diff. reset curr_diff and consec_diff
curr_diff = diff;
consec_diff = Math.Abs(diff)==1 ? 1 : 0;
// If the difference is 1, set consec_diff to 1 else 0
}
}
return consec_diff < 4;
}

Count similar adjacent items in List<string>

I'm trying to find identical adjacent items in a List and count them, e.g.:
List<string> list = new List<string> {"a", "a", "b", "d", "c", "c"};
Desired Output:
a = 2, c = 2
What I've done is use a for loop to iterate over each element of the list and check whether it has a matching adjacent element, but understandably it throws an ArgumentOutOfRangeException because I don't know how to keep track of the iterator's position so that it doesn't go out of bounds. Here's what I've done:
for (int j = 0; j < list.Count; j++)
{
if (list[j] == "b")
{
if ((list[j + 1] == "b") && (list[j - 1] == "b"))
{
adjacent_found = true;
}
}
}
Having said that, if there's another easier way to find similar adjacent elements in a List other than using for loop iteration, please advise. Thanks.
You can do something like this:
static IEnumerable<Tuple<string, int>> FindAdjacentItems(IEnumerable<string> list)
{
string previous = null;
int count = 0;
foreach (string item in list)
{
if (previous == item)
{
count++;
}
else
{
if (count > 1)
{
yield return Tuple.Create(previous, count);
}
count = 1;
}
previous = item;
}
if (count > 1)
{
yield return Tuple.Create(previous, count);
}
}
for (int i= 0; i < list.Count; i++)
{
for (int j = i + 1; j < list.Count; j++)
{
if (list[i] == list[j])
{
adjacent_found = true;
count++;
}
}
}
Check this:
// The list holds strings, so the dictionary is keyed by string.
Dictionary<string,int> dic = new Dictionary<string,int>();
for (int i = 1; i < list.Count; i++)
{
if (list[i] == list[i-1])
{
if (dic.ContainsKey(list[i]))
{
dic[list[i]] += 1;
}
else
{
dic.Add(list[i], 2);
}
}
}
To avoid ArgumentOutOfRangeException use for (int j = 1; j < list.Count - 1; j++). Desired answer can't be achieved this way. Try this:
IEnumerable<Adjacent> CountAdjacents(List<string> source)
{
var result = new List<Adjacent>();
for (var i = 0; i < source.Count() - 1; i++)
{
if (source[i] == source[i + 1])
{
if (result.Any(x => x.Word == source[i]))
{
result.Single(x => x.Word == source[i]).Quantity++;
}
else
result.Add(new Adjacent
{
Word = source[i],
Quantity = 2
});
}
}
return result;
}
class Adjacent
{
public string Word;
public int Quantity;
}
Maintain an int array of size 256, initialized to 1. Run a single O(n) loop for i = 0 to n-2, comparing each char with the next one. If they are the same, take the ASCII value of the char and increment the corresponding entry in the array.
Hope this helps!
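The counting-array idea above can be sketched roughly as follows. This assumes every item is a single character with a code below 256, which differs from the List<string> in the question. The names AdjacentCounter and CountAdjacent are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class AdjacentCounter
{
    // Returns an array indexed by character code. An entry above 1 means
    // the character appeared next to itself at least once.
    // Assumption: every character code is below 256.
    public static int[] CountAdjacent(IReadOnlyList<char> items)
    {
        var counts = new int[256];
        for (int i = 0; i < counts.Length; i++)
        {
            counts[i] = 1; // every entry starts at 1, as described above
        }

        // Compare each element with its successor: i runs from 0 to n-2.
        for (int i = 0; i + 1 < items.Count; i++)
        {
            if (items[i] == items[i + 1])
            {
                counts[items[i]]++;
            }
        }
        return counts;
    }
}
```

For the sample data a, a, b, d, c, c, the entries for 'a' and 'c' end up at 2 while all others stay at 1. Like the dictionary-based answers, this adds up the adjacent pairs per character over the whole list rather than per run.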

Find the missing integer in Codility

I need to "Find the minimal positive integer not occurring in a given sequence. "
A[0] = 1
A[1] = 3
A[2] = 6
A[3] = 4
A[4] = 1
A[5] = 2, the function should return 5.
Assume that:
N is an integer within the range [1..100,000];
each element of array A is an integer within the range [−2,147,483,648..2,147,483,647].
I wrote the code in Codility, but for many cases it did not work, and the performance test gives 0%. Please help me find where I went wrong.
class Solution {
public int solution(int[] A) {
if(A.Length ==0) return -1;
int value = A[0];
int min = A.Min();
int max = A.Max();
for (int j = min+1; j < max; j++)
{
if (!A.Contains(j))
{
value = j;
if(value > 0)
{
break;
}
}
}
if(value > 0)
{
return value;
}
else return 1;
}
}
Codility reports errors for every test case except the example, the positive-only values and the negative-only values.
Edit: Added detail to answer your actual question more directly.
"Please help me out, where I am wrong."
In terms of correctness: Consider A = {7,2,5,6,3}. The correct output, given the contents of A, is 1, but our algorithm would fail to detect this since A.Min() would return 2 and we would start looping from 3 onward. In this case, we would return 4 instead; since it's the next missing value.
Same goes for something like A = {14,15,13}. The minimal missing positive integer here is again 1 and, since all the values from 13-15 are present, the value variable will retain its initial value of value=A[0] which would be 14.
In terms of performance: Consider what A.Min(), A.Max() and A.Contains() are doing behind the scenes; each one of these is looping through A in its entirety and in the case of Contains, we are calling it repeatedly for every value between the Min() and the lowest positive integer we can find. This will take us far beyond the specified O(N) performance that Codility is looking for.
By contrast, here's the simplest version I can think of that should score 100% on Codility. Notice that we only loop through A once and that we take advantage of a Dictionary which lets us use ContainsKey; a much faster method that does not require looping through the whole collection to find a value.
using System;
using System.Collections.Generic;
class Solution {
public int solution(int[] A) {
// the minimum possible answer is 1
int result = 1;
// let's keep track of what we find
Dictionary<int,bool> found = new Dictionary<int,bool>();
// loop through the given array
for(int i=0;i<A.Length;i++) {
// if we have a positive integer that we haven't found before
if(A[i] > 0 && !found.ContainsKey(A[i])) {
// record the fact that we found it
found.Add(A[i], true);
}
}
// crawl through what we found starting at 1
while(found.ContainsKey(result)) {
// look for the next number
result++;
}
// return the smallest positive number that we couldn't find.
return result;
}
}
The simplest solution that scored perfect score was:
public int solution(int[] A)
{
int flag = 1;
A = A.OrderBy(x => x).ToArray();
for (int i = 0; i < A.Length; i++)
{
if (A[i] <= 0)
continue;
else if (A[i] == flag)
{
flag++;
}
}
return flag;
}
Fastest C# solution so far for [-1,000,000...1,000,000].
public int solution(int[] array)
{
HashSet<int> found = new HashSet<int>();
for (int i = 0; i < array.Length; i++)
{
if (array[i] > 0)
{
found.Add(array[i]);
}
}
int result = 1;
while (found.Contains(result))
{
result++;
}
return result;
}
A tiny version of another 100% with C#
using System.Linq;
class Solution
{
public int solution(int[] A)
{
// write your code in C# 6.0 with .NET 4.5 (Mono)
var i = 0;
return A.Where(a => a > 0).Distinct().OrderBy(a => a).Any(a => a != (i = i + 1)) ? i : i + 1;
}
}
A simple solution that scored 100% with C#
int Solution(int[] A)
{
var A2 = Enumerable.Range(1, A.Length + 1);
return A2.Except(A).First();
}
public class Solution {
public int solution( int[] A ) {
return Arrays.stream( A )
.filter( n -> n > 0 )
.sorted()
.reduce( 0, ( a, b ) -> ( ( b - a ) > 1 ) ? a : b ) + 1;
}
}
It seemed easiest to just filter out the negative numbers. Then sort the stream. And then reduce it to come to an answer. It's a bit of a functional approach, but it got a 100/100 test score.
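For readers following the C# answers, the same filter, sort and reduce pipeline might look like the sketch below. It is an assumed translation using LINQ's Where, OrderBy and Aggregate, not something taken from the answer itself, and the names MissingIntegerLinq and Solve are hypothetical:

```csharp
using System;
using System.Linq;

public static class MissingIntegerLinq
{
    public static int Solve(int[] a)
    {
        // Keep positives, sort them, then fold: the accumulator only advances
        // while the next value is at most 1 above it, so it stops at the first gap.
        int lastBeforeGap = a
            .Where(n => n > 0)
            .OrderBy(n => n)
            .Aggregate(0, (prev, next) => next - prev > 1 ? prev : next);

        return lastBeforeGap + 1;
    }
}
```

Because of the sort, this runs in O(N log N) rather than the O(N) of the hash-based answers.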
Got an 100% score with this solution:
https://app.codility.com/demo/results/trainingUFKJSB-T8P/
public int MissingInteger(int[] A)
{
A = A.Where(a => a > 0).Distinct().OrderBy(c => c).ToArray();
if (A.Length== 0)
{
return 1;
}
for (int i = 0; i < A.Length; i++)
{
//Console.WriteLine(i + "=>" + A[i]);
if (i + 1 != A[i])
{
return i + 1;
}
}
return A.Max() + 1;
}
JavaScript solution using Hash Table with O(n) time complexity.
function solution(A) {
let hashTable = {}
for (let item of A) {
hashTable[item] = true
}
let answer = 1
while(true) {
if(!hashTable[answer]) {
return answer
}
answer++
}
}
The Simplest solution for C# would be:
int value = 1;
// Check for an empty array first: Min() and Max() throw on an empty sequence.
if (A.Length == 0) return value;
int min = A.Min();
int max = A.Max();
if (min < 0 && max < 0) return value;
List<int> range = Enumerable.Range(1, max).ToList();
List<int> current = A.ToList();
List<int> valid = range.Except(current).ToList();
if (valid.Count() == 0)
{
max++;
return value = max;
}
else
{
return value = valid.Min();
}
This assumes the search starts at 1. If it should start at the array's minimum value instead, Enumerable.Range should start from Min.
MissingInteger solution in C
int solution(int A[], int N) {
int i=0,r[N];
memset(r,0,(sizeof(r)));
for(i=0;i<N;i++)
{
if(( A[i] > 0) && (A[i] <= N)) r[A[i]-1]=A[i];
}
for(i=0;i<N;i++)
{
if( r[i] != (i+1)) return (i+1);
}
return (N+1);
}
My solution for it:
public static int solution()
{
var A = new[] { -1000000, 1000000 }; // You can try with different integers
A = A.OrderBy(i => i).ToArray(); // We sort the array first
if (A.Length == 1) // if there is only one item in the array
{
if (A[0]<0 || A[0] > 1)
return 1;
if (A[0] == 1)
return 2;
}
else // if there are more than one item in the array
{
for (var i = 0; i < A.Length - 1; i++)
{
if (A[i] >= 1000000) continue; // if it's bigger than 1M
if (A[i] < 0 || (A[i] + 1) >= (A[i + 1])) continue; //if it's smaller than 0, if the next integer is bigger or equal to next integer in the sequence continue searching.
if (1 < A[0]) return 1;
return A[i] + 1;
}
}
if (1 < A[0] || A[A.Length - 1] + 1 == 0 || A[A.Length - 1] + 1 > 1000000)
return 1;
return A[A.Length-1] +1;
}
class Solution {
public int solution(int[] A) {
int size=A.length;
int small,big,temp;
for (int i=0;i<size;i++){
for(int j=0;j<size;j++){
if(A[i]<A[j]){
temp=A[j];
A[j]=A[i];
A[i]=temp;
}
}
}
int z=1;
for(int i=0;i<size;i++){
if(z==A[i]){
z++;
}
//System.out.println(a[i]);
}
return z;
}
}
In C# you can solve the problem by making use of built-in library functions. However, performance is poor when the array contains very large integers.
public int solution(int[] A)
{
var numbers = Enumerable.Range(1, Math.Abs(A.Max())+1).ToArray();
return numbers.Except(A).ToArray()[0];
}
Let me know if you find a better solution performance wise
C# - MissingInteger
Find the smallest missing integer between 1 - 1000.000.
Assumptions of the OP take place
TaskScore/Correctness/Performance: 100%
using System;
using System.Linq;
namespace TestConsole
{
class Program
{
static void Main(string[] args)
{
var A = new int[] { -122, -5, 1, 2, 3, 4, 5, 6, 7 }; // 8
var B = new int[] { 1, 3, 6, 4, 1, 2 }; // 5
var C = new int[] { -1, -3 }; // 1
var D = new int[] { -3 }; // 1
var E = new int[] { 1 }; // 2
var F = new int[] { 1000000 }; // 1
var x = new int[][] { A, B, C, D, E, F };
x.ToList().ForEach((arr) =>
{
var s = new Solution();
Console.WriteLine(s.solution(arr));
});
Console.ReadLine();
}
}
// ANSWER/SOLUTION
class Solution
{
public int solution(int[] A)
{
// clean up array for negatives and duplicates, do sort
A = A.Where(entry => entry > 0).Distinct().OrderBy(it => it).ToArray();
int lowest = 1, aLength = A.Length, highestIndex = aLength - 1;
for (int i = 0; i < aLength; i++)
{
var currInt = A[i];
if (currInt > lowest) return lowest;
if (i == highestIndex) return ++lowest;
lowest++;
}
return 1;
}
}
}
Got 100% - C# Efficient Solution
public int solution (int [] A){
int len = A.Length;
HashSet<int> realSet = new HashSet<int>();
HashSet<int> perfectSet = new HashSet<int>();
int i = 0;
while ( i < len)
{
realSet.Add(A[i]); //convert array to set to get rid of duplicates, order int's
perfectSet.Add(i + 1); //create perfect set so can find missing int
i++;
}
perfectSet.Add(i + 1);
if (realSet.All(item => item < 0))
return 1;
int notContains =
perfectSet.Except(realSet).Where(item=>item!=0).FirstOrDefault();
return notContains;
}
class Solution {
public int solution(int[] a) {
int smallestPositive = 1;
while(a.Contains(smallestPositive)) {
smallestPositive++;
}
return smallestPositive;
}
}
Well, this is a new winner now. At least on C# and my laptop. It's 1.5-2 times faster than the previous champion and 3-10 times faster, than most of the other solutions. The feature (or a bug?) of this solution is that it uses only basic data types. Also 100/100 on Codility.
public int Solution(int[] A)
{
bool[] B = new bool[(A.Length + 1)];
for (int i = 0; i < A.Length; i++)
{
if ((A[i] > 0) && (A[i] <= A.Length))
B[A[i]] = true;
}
for (int i = 1; i < B.Length; i++)
{
if (!B[i])
return i;
}
return A.Length + 1;
}
Simple C++ solution. No additional memory needed; running time is O(N*log(N)):
int solution(vector<int> &A) {
sort (A.begin(), A.end());
int prev = 0; // the biggest integer greater than 0 found until now
for( auto it = std::begin(A); it != std::end(A); it++ ) {
if( *it > prev+1 ) break;// gap found
if( *it > 0 ) prev = *it; // ignore integers smaller than 1
}
return prev+1;
}
int[] A = {1, 3, 6, 4, 1, 2};
Set<Integer> integers = new TreeSet<>();
for (int i = 0; i < A.length; i++) {
if (A[i] > 0) {
integers.add(A[i]);
}
}
Integer[] arr = integers.toArray(new Integer[0]);
final int[] result = {Integer.MAX_VALUE};
final int[] prev = {0};
final int[] curr2 = {1};
integers.stream().forEach(integer -> {
if (prev[0] + curr2[0] == integer) {
prev[0] = integer;
} else {
result[0] = prev[0] + curr2[0];
}
});
if (Integer.MAX_VALUE == result[0]) result[0] = arr[arr.length-1] + 1;
System.out.println(result[0]);
I was surprised but this was a good lesson. LINQ IS SLOW. my answer below got me 11%
public int solution (int [] A){
if (Array.FindAll(A, x => x >= 0).Length == 0) {
return 1;
} else {
var lowestValue = A.Where(x => Array.IndexOf(A, (x+1)) == -1).Min();
return lowestValue + 1;
}
}
I think I kinda look at this a bit differently but gets a 100% evaluation. Also, I used no library:
public static int Solution(int[] A)
{
var arrPos = new int[1_000_001];
for (int i = 0; i < A.Length; i++)
{
if (A[i] >= 0)
arrPos[A[i]] = 1;
}
for (int i = 1; i < arrPos.Length; i++)
{
if (arrPos[i] == 0)
return i;
}
return 1;
}
public int solution(int[] A) {
// write your code in Java SE 8
Set<Integer> elements = new TreeSet<Integer>();
long lookFor = 1;
for (int i = 0; i < A.length; i++) {
elements.add(A[i]);
}
for (Integer integer : elements) {
if (integer == lookFor)
lookFor += 1;
}
return (int) lookFor;
}
I tried to use recursion in C# instead of sorting, because I thought it would show more coding skill to do it that way, but it didn't perform well on the large performance tests. I suppose it's best to just do it the easy way.
class Solution {
public int lowest=1;
public int solution(int[] A) {
// write your code in C# 6.0 with .NET 4.5 (Mono)
if (A.Length < 1)
return 1;
for (int i=0; i < A.Length; i++){
if (A[i]==lowest){
lowest++;
solution(A);
}
}
return lowest;
}
}
Here is my solution in javascript
function solution(A) {
// write your code in JavaScript (Node.js 8.9.4)
let result = 1;
let haveFound = {}
let len = A.length
for (let i=0;i<len;i++) {
haveFound[`${A[i]}`] = true
}
while(haveFound[`${result}`]) {
result++
}
return result
}
class Solution {
public int solution(int[] A) {
var sortedList = A.Where(x => x > 0).Distinct().OrderBy(x => x).ToArray();
var output = 1;
for (int i = 0; i < sortedList.Length; i++)
{
if (sortedList[i] != output)
{
return output;
}
output++;
}
return output;
}
}
You should just use a HashSet instead of a Dictionary, since its lookup time is also constant. The code is shorter and cleaner.
public int solution (int [] A){
int answer = 1;
var set = new HashSet<int>(A);
while (set.Contains(answer)){
answer++;
}
return answer;
}
This snippet should work correctly.
using System;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
int result = 1;
List<int> lst = new List<int>();
lst.Add(1);
lst.Add(2);
lst.Add(3);
lst.Add(18);
lst.Add(4);
lst.Add(1000);
lst.Add(-1);
lst.Add(-1000);
lst.Sort();
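foreach(int curVal in lst)
{
if(curVal <= 0)
continue; // ignore non-positive values
if(curVal > result)
break; // gap found: 'result' is the smallest missing positive
if(curVal == result)
result++; // found the expected value, look for the next one
}
Console.WriteLine(result);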
}
}

Search for an Array or List in a List

Have
List<byte> lbyte
Have
byte[] searchBytes
How can I search lbyte for not just a single byte but for the index of the searchBytes?
E.G.
Int32 index = lbyte.FirstIndexOf(searchBytes);
Here is the brute force I came up with.
Not the performance I am looking for.
public static Int32 ListIndexOfArray(List<byte> lb, byte[] sbs)
{
if (sbs == null) return -1;
if (sbs.Length == 0) return -1;
if (sbs.Length > 8) return -1;
if (sbs.Length == 1) return lb.IndexOf(sbs[0]); // index of the byte, not its value
Int32 sbsLen = sbs.Length;
Int32 sbsCurMatch = 0;
for (int i = 0; i < lb.Count; i++)
{
if (lb[i] == sbs[sbsCurMatch])
{
sbsCurMatch++;
if (sbsCurMatch == sbsLen)
{
//int index = lb.FindIndex(e => sbs.All(f => f.Equals(e))); // fails to find a match
return i - sbsLen + 1; // start index of the match
}
}
else
{
// Rewind so the next attempt starts one position after the failed start.
i -= sbsCurMatch;
sbsCurMatch = 0;
}
}
return -1;
}
Brute force is always an option. Although slow in comparison to some other methods, in practice it's usually not too bad. It's easy to implement and quite acceptable if lbyte isn't huge and doesn't have pathological data.
It's the same concept as brute force string searching.
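A minimal sketch of that brute-force idea: try every start position and compare element by element. The names ByteSearch and IndexOfSequence are hypothetical, and returning -1 for an empty needle is an assumption chosen to match the question's code:

```csharp
using System;
using System.Collections.Generic;

public static class ByteSearch
{
    // Returns the first index at which 'needle' occurs in 'haystack', or -1.
    public static int IndexOfSequence(IReadOnlyList<byte> haystack, IReadOnlyList<byte> needle)
    {
        // Assumption: an empty needle is treated as "not found".
        if (needle.Count == 0) return -1;

        // Only start positions where the whole needle still fits are tried.
        for (int start = 0; start + needle.Count <= haystack.Count; start++)
        {
            int matched = 0;
            while (matched < needle.Count && haystack[start + matched] == needle[matched])
            {
                matched++;
            }
            if (matched == needle.Count) return start;
        }
        return -1;
    }
}
```

Both List<byte> and byte[] implement IReadOnlyList<byte>, so a call such as ByteSearch.IndexOfSequence(lbyte, searchBytes) needs no conversion. The worst case is O(n*m), which is where the pathological data mentioned above comes in.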
You may find Boyer-Moore algorithm useful here. Convert your list to an array and search. The algorithm code is taken from this post.
static int SimpleBoyerMooreSearch(byte[] haystack, byte[] needle)
{
int[] lookup = new int[256];
for (int i = 0; i < lookup.Length; i++) { lookup[i] = needle.Length; }
for (int i = 0; i < needle.Length; i++)
{
lookup[needle[i]] = needle.Length - i - 1;
}
int index = needle.Length - 1;
var lastByte = needle.Last();
while (index < haystack.Length)
{
var checkByte = haystack[index];
if (haystack[index] == lastByte)
{
bool found = true;
for (int j = needle.Length - 2; j >= 0; j--)
{
if (haystack[index - needle.Length + j + 1] != needle[j])
{
found = false;
break;
}
}
if (found)
return index - needle.Length + 1;
else
index++;
}
else
{
index += lookup[checkByte];
}
}
return -1;
}
You can then search like this. If lbyte will remain constant after a certain time, you can just convert it to an array once and pass that.
//index is returned, or -1 if 'searchBytes' is not found
int startIndex = SimpleBoyerMooreSearch(lbyte.ToArray(), searchBytes);
Update based on comment: here's an IList implementation, which means that arrays and lists (and anything else that implements IList) can be passed.
static int SimpleBoyerMooreSearch(IList<byte> haystack, IList<byte> needle)
{
int[] lookup = new int[256];
for (int i = 0; i < lookup.Length; i++) { lookup[i] = needle.Count; }
for (int i = 0; i < needle.Count; i++)
{
lookup[needle[i]] = needle.Count - i - 1;
}
int index = needle.Count - 1;
var lastByte = needle[index];
while (index < haystack.Count)
{
var checkByte = haystack[index];
if (haystack[index] == lastByte)
{
bool found = true;
for (int j = needle.Count - 2; j >= 0; j--)
{
if (haystack[index - needle.Count + j + 1] != needle[j])
{
found = false;
break;
}
}
if (found)
return index - needle.Count + 1;
else
index++;
}
else
{
index += lookup[checkByte];
}
}
return -1;
}
Since arrays and lists implement IList, there's no conversion necessary when calling it in your case.
int startIndex = SimpleBoyerMooreSearch(lbyte, searchBytes);
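Another way is to use LINQ with lambda expressions (this needs using System.Linq; it returns -1 when there is no match):
int index = Enumerable.Range(0, Math.Max(0, lbyte.Count - searchBytes.Length + 1))
.Where(i => lbyte.Skip(i).Take(searchBytes.Length).SequenceEqual(searchBytes))
.DefaultIfEmpty(-1)
.First();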

Count occurrences in byte list/array using another byte list/array

I am trying to count how many times a byte sequence occurs in another byte sequence. However, it must not re-use bytes that it has already counted. For example, given the string
k.k.k.k.k.k. and the byte sequence k.k, it should find only 3 occurrences rather than 5, because they would be broken down like [k.k].[k.k].[k.k]. and not like [k.[k].[k].[k].[k].k], where they overlap and essentially just shift 2 to the right.
Ideally, the aim is to get an idea of how a compression dictionary or run-length encoding might look, so the goal would be to get
k.k.k.k.k.k. down to just 2 parts, as (k.k.k.) is the biggest and best symbol you can have.
Here is source so far:
using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Text;
using System.IO;
static class Compression
{
static int Main(string[] args)
{
List<byte> bytes = File.ReadAllBytes("ok.txt").ToList();
List<List<int>> list = new List<List<int>>();
// Starting Numbers of bytes - This can be changed manually.
int StartingNumBytes = bytes.Count;
for (int i = StartingNumBytes; i > 0; i--)
{
Console.WriteLine("i: " + i);
for (int ii = 0; ii < bytes.Count - i; ii++)
{
Console.WriteLine("ii: " + ii);
// New pattern comes with refresh data.
List<byte> pattern = new List<byte>();
for (int iii = 0; iii < i; iii++)
{
pattern.Add(bytes[ii + iii]);
}
DisplayBinary(bytes, "red");
DisplayBinary(pattern, "green");
int matches = 0;
// foreach (var position in bytes.ToArray().Locate(pattern.ToArray()))
for (int position = 0; position < bytes.Count; position++) {
if (pattern.Count > (bytes.Count - position))
{
continue;
}
for (int iiii = 0; iiii < pattern.Count; iiii++)
{
if (bytes[position + iiii] != pattern[iiii])
{
//Have to use goto because C# doesn't support continue <level>
goto outer;
}
}
// If it made it this far, it has found a match.
matches++;
Console.WriteLine("Matches: " + matches + " Orig Count: " + bytes.Count + " POS: " + position);
if (matches > 1)
{
int numBytesToRemove = pattern.Count;
for (int ra = 0; ra < numBytesToRemove; ra++)
{
// Remove it at the position it was found at, once it
// deletes the first one, the list will shift left and you'll need to be here again.
bytes.RemoveAt(position);
}
DisplayBinary(bytes, "red");
Console.WriteLine(pattern.Count + " Bytes removed.");
// Since you deleted some bytes, set the position less because you will need to redo the pos.
position = position - 1;
}
outer:
continue;
}
List<int> sublist = new List<int>();
sublist.Add(matches);
sublist.Add(pattern.Count);
// Some sort of calculation to determine how good the symbol was
sublist.Add(bytes.Count-((matches * pattern.Count)-matches));
list.Add(sublist);
}
}
Display(list);
Console.Read();
return 0;
}
static void DisplayBinary(List<byte> bytes, string color="white")
{
switch(color){
case "green":
Console.ForegroundColor = ConsoleColor.Green;
break;
case "red":
Console.ForegroundColor = ConsoleColor.Red;
break;
default:
break;
}
for (int i=0; i<bytes.Count; i++)
{
if (i % 8 ==0)
Console.WriteLine();
Console.Write(GetIntBinaryString(bytes[i]) + " ");
}
Console.WriteLine();
Console.ResetColor();
}
static string GetIntBinaryString(int n)
{
char[] b = new char[8];
int pos = 7;
int i = 0;
while (i < 8)
{
if ((n & (1 << i)) != 0)
{
b[pos] = '1';
}
else
{
b[pos] = '0';
}
pos--;
i++;
}
//return new string(b).TrimStart('0');
return new string(b);
}
static void Display(List<List<int>> list)
{
//
// Display everything in the List.
//
Console.WriteLine("Elements:");
foreach (var sublist in list)
{
foreach (var value in sublist)
{
Console.Write("{0,4}", value);
}
Console.WriteLine();
}
//
// Display total count.
//
int count = 0;
foreach (var sublist in list)
{
count += sublist.Count;
}
Console.WriteLine("Count:");
Console.WriteLine(count);
}
static public int SearchBytePattern(byte[] pattern, byte[] bytes)
{
int matches = 0;
// precomputing this shaves some seconds from the loop execution
// maxloop is the last start index at which the pattern still fits
int maxloop = bytes.Length - pattern.Length;
for (int i = 0; i <= maxloop; i++)
{
if (pattern[0] == bytes[i])
{
bool ismatch = true;
for (int j = 1; j < pattern.Length; j++)
{
if (bytes[i + j] != pattern[j])
{
ismatch = false;
break;
}
}
if (ismatch)
{
matches++;
i += pattern.Length - 1;
}
}
}
return matches;
}
}
Refer to the post for what the non-binary content of the file should be. Here is the binary data:
011010110010111001101011001011100110101100101110011010110010111001101011001011100110101100101110
I am hoping to end up with something smaller than what I started with.
private static int CountOccurences(byte[] target, byte[] pattern)
{
var targetString = BitConverter.ToString(target);
var patternString = BitConverter.ToString(pattern);
return new Regex(patternString).Matches(targetString).Count;
}
With this solution you'd have access to the individual indexes that matched (while enumerating) or you could call Count() on the result to see how many matches there were:
public static IEnumerable<int> Find<T>(T[] pattern, T[] sequence, bool overlap)
{
int i = 0;
while (i < sequence.Length - pattern.Length + 1)
{
if (pattern.SequenceEqual(sequence.Skip(i).Take(pattern.Length)))
{
yield return i;
i += overlap ? 1 : pattern.Length;
}
else
{
i++;
}
}
}
Call it with overlap: false to solve your problem or overlap: true to see the overlapped matches (if you're interested.)
I have a couple of other methods with a slightly different API (and better performance) here, including one that works directly on streams of bytes.
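As a usage sketch, here is the Find method above applied to the k.k.k.k.k.k. example from the question. The method body is repeated so the snippet compiles on its own. The wrapper class PatternFinder and the helper CountMatches are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PatternFinder
{
    // Same logic as the Find method above, repeated so the sketch is self-contained.
    public static IEnumerable<int> Find<T>(T[] pattern, T[] sequence, bool overlap)
    {
        int i = 0;
        while (i < sequence.Length - pattern.Length + 1)
        {
            if (pattern.SequenceEqual(sequence.Skip(i).Take(pattern.Length)))
            {
                yield return i;
                // Skip past the whole match unless overlapping matches are wanted.
                i += overlap ? 1 : pattern.Length;
            }
            else
            {
                i++;
            }
        }
    }

    // Hypothetical convenience wrapper for string input.
    public static int CountMatches(string pattern, string text, bool overlap) =>
        Find(pattern.ToCharArray(), text.ToCharArray(), overlap).Count();
}
```

With these definitions, PatternFinder.CountMatches("k.k", "k.k.k.k.k.k.", overlap: false) gives 3 and the same call with overlap: true gives 5, matching the counts in the question.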
Quick and dirty, with no regex. Although I'm not sure if it answers the intent of the question, it should be relatively fast. I think I am going to run some timing tests against regex to see the relative speeds for sure:
private int CountOccurrences(string TestString, string TestPattern)
{
int PatternCount = 0;
int SearchIndex = 0;
if (TestPattern.Length == 0)
throw new ApplicationException("CountOccurrences: Unable to process because TestPattern has zero length.");
if (TestString.Length == 0)
return 0;
do
{
SearchIndex = TestString.IndexOf(TestPattern, SearchIndex);
if (SearchIndex >= 0)
{
++PatternCount;
SearchIndex += TestPattern.Length;
}
}
while ((SearchIndex >= 0) && (SearchIndex < TestString.Length));
return PatternCount;
}
private void btnTest_Click(object sender, EventArgs e)
{
string TestString1 = "k.k.k.k.k.k.k.k.k.k.k.k";
string TestPattern1 = "k.k";
System.Console.WriteLine(CountOccurrences(TestString1, TestPattern1).ToString()); // outputs 6
System.Console.WriteLine(CountOccurrences(TestString1 + ".k", TestPattern1).ToString()); // still 6
System.Console.WriteLine(CountOccurrences(TestString1, TestPattern1 + ".").ToString()); // only 5
}
