Search longest pattern in byte array in C# - c#

I need to write effective and quick method to search byte array for given pattern.
I write it this way, what do you think , how to improve? And it has one bug, it cannot return match with length 1.
public static bool SearchByteByByte(byte[] bytes, byte[] pattern)
{
bool found = false;
int matchedBytes = 0;
for (int i = 0; i < bytes.Length; i++)
{
if (pattern[0] == bytes[i] && bytes.Length - i >= pattern.Length)
{
for (int j = 1; j < pattern.Length; j++)
{
if (bytes[i + j] == pattern[j])
{
matchedBytes++;
if (matchedBytes == pattern.Length - 1)
{
return true;
}
continue;
}
else
{
matchedBytes = 0;
break;
}
}
}
}
return found;
}
Any suggestions ?

The Boyer-Moore algorithm that is used in grep is pretty efficient, and gets more efficient for longer pattern sizes. I'm pretty sure you could make it work for a byte array without too much difficulty, and its wikipedia page has an implementation in Java that should be fairly easy to port to C#.
UPDATE:
Here's an implementation of a simplified version of the Boyer-Moore algorithm for byte arrays in C#. It only uses the second jump table of the full algorithm. Based on the array sizes that you said (haystack: 2000000 bytes, needle: 10 bytes), it's about 5-8 times faster than a simple byte by byte algorithm.
static int SimpleBoyerMooreSearch(byte[] haystack, byte[] needle)
{
int[] lookup = new int[256];
for (int i = 0; i < lookup.Length; i++) { lookup[i] = needle.Length; }
for (int i = 0; i < needle.Length; i++)
{
lookup[needle[i]] = needle.Length - i - 1;
}
int index = needle.Length - 1;
var lastByte = needle.Last();
while (index < haystack.Length)
{
var checkByte = haystack[index];
if (haystack[index] == lastByte)
{
bool found = true;
for (int j = needle.Length - 2; j >= 0; j--)
{
if (haystack[index - needle.Length + j + 1] != needle[j])
{
found = false;
break;
}
}
if (found)
return index - needle.Length + 1;
else
index++;
}
else
{
index += lookup[checkByte];
}
}
return -1;
}

And it has one bug, it cannot return match with length 1
To fix this, start inner loop from zero:
public static bool SearchByteByByte(byte[] bytes, byte[] pattern)
{
bool found = false;
int matchedBytes = 0;
for (int i = 0; i < bytes.Length; i++)
{
if (pattern[0] == bytes[i] && bytes.Length - i >= pattern.Length)
{
for (int j = 0; j < pattern.Length; j++) // start from 0
{
if (bytes[i + j] == pattern[j])
{
matchedBytes++;
if (matchedBytes == pattern.Length) // remove - 1
return true;
continue;
}
else
{
matchedBytes = 0;
break;
}
}
}
}
return found;
}
UPDATE: Here is your searching algorithm after flattering and removing local variables (they are not needed)
public static bool SearchByteByByte(byte[] bytes, byte[] pattern)
{
for (int i = 0; i < bytes.Length; i++)
{
if (bytes.Length - i < pattern.Length)
return false;
if (pattern[0] != bytes[i])
continue;
for (int j = 0; j < pattern.Length; j++)
{
if (bytes[i + j] != pattern[j])
break;
if (j == pattern.Length - 1)
return true;
}
}
return false;
}

So you're looking, effectively, for the longest common substring, so see the Wikipedia article on that: http://en.wikipedia.org/wiki/Longest_common_substring_problem
... or even a reference implementation: http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#C.23 -- you will, of course, have to substitute byte[] for string there, etc.

Related

Cross Search generate char Matrix

I am trying to create a word search puzzle matrix, this is the code I have,
static void PlaceWords(List<string> words)
{
Random rn = new Random();
foreach (string p in words)
{
String s = p.Trim();
bool placed = false;
while (placed == false)
{
int nRow = rn.Next(0,10);
int nCol = rn.Next(0,10);
int nDirX = 0;
int nDirY = 0;
while (nDirX == 0 && nDirY == 0)
{
nDirX = rn.Next(3) - 1;
nDirY = rn.Next(3) - 1;
}
placed = PlaceWord(s.ToUpper(), nRow, nCol, nDirX, nDirY);
}
}
}
static bool PlaceWord(string s, int nRow, int nCol, int nDirX, int nDirY)
{
bool placed = false;
int LetterNb = s.Length;
int I = nRow;
int J = nCol;
if (MatriceIndice[nRow, nCol] == 0)
{
placed = true;
for (int i = 0; i < s.Length-1; i++)
{
I += nDirX;
J += nDirY;
if (I < 10 && I>0 && J < 10 && J>0)
{
if (MatriceIndice[I, J] == 0)
placed = placed && true;
else
placed = placed && false;
}
else
{
return false;
}
}
}
else
{
return false;
}
if(placed==true)
{
int placeI = nRow;
int placeJ = nCol;
for (int i = 0; i < s.Length - 1; i++)
{
placeI += nDirX;
placeJ += nDirY;
MatriceIndice[placeI,placeJ] = 1;
MatriceChars[placeJ, placeJ] = s[i];
}
}
return placed;
}
However it seems like it is an infinite loop. I am trying to add the code in a 1010 char matrix linked to a 1010 int matrix initially filled with 0 where I change the cases to 1 if the word is added to the matrix. How should I fix the code?
There are several errors. First,
MatriceChars[placeJ, placeJ] = s[i];
should be
MatriceChars[placeI, placeJ] = s[i];
Second,
for (int i = 0; i < s.Length - 1; i++)
(two occurrences) should be
for (int i = 0; i < s.Length; i++)
(You do want all the letters in the words, right?)
Third, when testing indices, you should use I >= 0, not I > 0, as the matrix indices start at 0.
However, the main logic of the code seems to work, but if you try to place too many words, you will indeed enter an infinite loop, since it just keeps trying and failing to place words that can never fit.

How do I convert this pseudocode Enhanced InsertionSort to C#?

I am a student and have been given the pseudocode for an Enhanced InsertionSort, but I am really struggling to convert it to C#.
The pseudocode:
Calculate length n
var i=1,j
if (a[i] < a [i-1]), then
if (a[i] < a [0]), then
set j = 0 and goto-35
else j = i/2 and goto-10
end if
else i++ and repeat-3
end if
if (a[i]
elseif (if (a[i] = = a[j]), then
set j = j+1 and goto-35
end if
else goto-25
end if
while((j-1)>=0), do
j = j-2 and if (a[i]>a[j]), then
if (a[i] < a [j+1]), then
set j = j+1 and goto-35
else set j = j+2 and goto-35
end if
else if (a[i] = =a[j]), then
Set j = j+1 and goto-35
Else return
While (((i-1)-j) > = 0 ), do
J = j+2 and (a[i] < a[j]), then
If (a[i] < a [j-1]), then
Set j=j-1 and goto-35
Else set j+j+0 and goto-35
End if
Else if (a[i]= = a[j]), then
Set j=j+1 and goto-35
Else return
End if
Swap a[i] and a[j]
j++
while(j=i-1), do
i++ and goto-3
END
So far, what I have written in C# is this:
static public void EnhancedInsertionSort(ArrayList List)
{
int n = List.Count;
int i = 1;
int j = i-1;
do
{
if (((Webpage)List[i]).getVisits() < ((Webpage)List[i - 1]).getVisits())
{
if (((Webpage)List[i]).getVisits() < ((Webpage)List[0]).getVisits())
{
j = 0;
object temp = List[i];
List[i] = List[j];
List[j] = temp;
j++;
}
else
j = i / 2;
}
else
i++;
if (((Webpage)List[i]).getVisits() < ((Webpage)List[j]).getVisits())
{
while ((j - 1) >= 0)
{
j = j - 2;
if (((Webpage)List[i]).getVisits() > ((Webpage)List[j]).getVisits())
{
if (((Webpage)List[i]).getVisits() < ((Webpage)List[j + 1]).getVisits())
{
j = j + 1;
object temp = List[i];
List[i] = List[j];
List[j] = temp;
j++;
}
}
else
{
if (((Webpage)List[i]).getVisits() == ((Webpage)List[j]).getVisits())
j = j + 1;
else return;
}
}
}
else
if (((Webpage)List[i]).getVisits() == ((Webpage)List[j]).getVisits())
j = j + 1;
} while (j == i-1);
}
But now I am completely confused and stuck. I am sure I have made some interpretation mistakes, as I am struggling with the "goto" keywords and how to convert them to C# loops. I have researched this, and understand the basic idea, but with so many goto keywords used in one algorithm I am finding the process very confusing.
I would really appreciate any help that anyone can offer!
Note: I am trying to sort an ArrayList of objects ("Webpage") in descending order on the attribute Visits (integer value).
The pseudo code is an old invention, and being pseudo code has not been updated to modern languages. I believe the old GOTO sentence should be replaces with calls to subroutines (or functions) to execute the little functions. In example:
static public void EnhancedInsertionSort(ArrayList List)
{
int n = List.Count;
int i = 1;
int j = i-1;
do
{
if ( pageWithLessVisits(List[i], List[i - 1]) )
{
if ( pageWithLessVisits(List[i], List[0]) )
{
j = 0;
object temp = List[i];
List[i] = List[j];
List[j] = temp;
j++;
}
else
j = i / 2;
}
else
i++;
if (pageWithLessVisits(List[i],List[j]))
{
while ((j - 1) >= 0)
{
j = j - 2;
if (pageWithMoreVisits(List[i], List[j]))
{
if (pageWithLessVisits(List[i],List[j + 1]))
{
j = j + 1;
object temp = List[i];
List[i] = List[j];
List[j] = temp;
j++;
}
}
else
{
if (pageWithEqualVisits(List[i], List[j]))
j = j + 1;
else return;
}
}
}
else
if (pageWithEqualVisits(List[i], List[j]))
j = j + 1;
} while (j == i-1);
}
static private bool pageWithLessVisits(Webpage page1, Webpage page2)
{
return page1.getVisits() < page2.getVisits()
}
static private bool pageWithMoreVisits(Webpage page1, Webpage page2)
{
return page1.getVisits() > page2.getVisits()
}
static private bool pageWithEqualVisits(Webpage page1, Webpage page2)
{
return page1.getVisits() == page2.getVisits()
}
You can even encapsulate the little code inside de IFs. Making the code as legible as the pseudo code.

Search for an Array or List in a List

Have
List<byte> lbyte
Have
byte[] searchBytes
How can I search lbyte for not just a single byte but for the index of the searchBytes?
E.G.
Int32 index = lbyte.FirstIndexOf(searchBytes);
Here is the brute force I came up with.
Not the performance I am looking for.
public static Int32 ListIndexOfArray(List<byte> lb, byte[] sbs)
{
if (sbs == null) return -1;
if (sbs.Length == 0) return -1;
if (sbs.Length > 8) return -1;
if (sbs.Length == 1) return lb.FirstOrDefault(x => x == sbs[0]);
Int32 sbsLen = sbs.Length;
Int32 sbsCurMatch = 0;
for (int i = 0; i < lb.Count; i++)
{
if (lb[i] == sbs[sbsCurMatch])
{
sbsCurMatch++;
if (sbsCurMatch == sbsLen)
{
//int index = lb.FindIndex(e => sbs.All(f => f.Equals(e))); // fails to find a match
IndexOfArray = i - sbsLen + 1;
return;
}
}
else
{
sbsCurMatch = 0;
}
}
return -1;
}
Brute force is always an option. Although slow in comparison to some other methods, in practice it's usually not too bad. It's easy to implement and quite acceptable if lbyte isn't huge and doesn't have pathological data.
It's the same concept as brute force string searching.
You may find Boyer-Moore algorithm useful here. Convert your list to an array and search. The algorithm code is taken from this post.
static int SimpleBoyerMooreSearch(byte[] haystack, byte[] needle)
{
int[] lookup = new int[256];
for (int i = 0; i < lookup.Length; i++) { lookup[i] = needle.Length; }
for (int i = 0; i < needle.Length; i++)
{
lookup[needle[i]] = needle.Length - i - 1;
}
int index = needle.Length - 1;
var lastByte = needle.Last();
while (index < haystack.Length)
{
var checkByte = haystack[index];
if (haystack[index] == lastByte)
{
bool found = true;
for (int j = needle.Length - 2; j >= 0; j--)
{
if (haystack[index - needle.Length + j + 1] != needle[j])
{
found = false;
break;
}
}
if (found)
return index - needle.Length + 1;
else
index++;
}
else
{
index += lookup[checkByte];
}
}
return -1;
}
You can then search like this. If lbyte will remain constant after a certain time, you can just convert it to an array once and pass that.
//index is returned, or -1 if 'searchBytes' is not found
int startIndex = SimpleBoyerMooreSearch(lbyte.ToArray(), searchBytes);
Update based on comment. Here's the IList implementation which means that arrays and lists (and anything else that implements IList can be passed)
static int SimpleBoyerMooreSearch(IList<byte> haystack, IList<byte> needle)
{
int[] lookup = new int[256];
for (int i = 0; i < lookup.Length; i++) { lookup[i] = needle.Count; }
for (int i = 0; i < needle.Count; i++)
{
lookup[needle[i]] = needle.Count - i - 1;
}
int index = needle.Count - 1;
var lastByte = needle[index];
while (index < haystack.Count)
{
var checkByte = haystack[index];
if (haystack[index] == lastByte)
{
bool found = true;
for (int j = needle.Count - 2; j >= 0; j--)
{
if (haystack[index - needle.Count + j + 1] != needle[j])
{
found = false;
break;
}
}
if (found)
return index - needle.Count + 1;
else
index++;
}
else
{
index += lookup[checkByte];
}
}
return -1;
}
Since arrays and lists implement IList, there's no conversion necessary when calling it in your case.
int startIndex = SimpleBoyerMooreSearch(lbyte, searchBytes);
Another way you could do with lambda expression
int index = lbyte.FindIndex(e => searchBytes.All(i => i.Equals(e));

How to find the longest palindrome in a given string? [duplicate]

This question already has answers here:
Write a function that returns the longest palindrome in a given string
(23 answers)
Closed 9 years ago.
Possible Duplicate:
Write a function that returns the longest palindrome in a given string
I know how to do this in O(n^2). But it seems like there exist a better solution.
I've found this, and there is a link to O(n) answer, but it's written in Haskell and not clear for me.
It would be great to get an answer in c# or similar.
I've found clear explanation of the solution here. Thanks to Justin for this link.
There you can find Python and Java implementations of the algorithm (C++ implementation contains errors).
And here is C# implementation that is just a translation of those algorithms.
public static int LongestPalindrome(string seq)
{
int Longest = 0;
List<int> l = new List<int>();
int i = 0;
int palLen = 0;
int s = 0;
int e = 0;
while (i<seq.Length)
{
if (i > palLen && seq[i-palLen-1] == seq[i])
{
palLen += 2;
i += 1;
continue;
}
l.Add(palLen);
Longest = Math.Max(Longest, palLen);
s = l.Count - 2;
e = s - palLen;
bool found = false;
for (int j = s; j > e; j--)
{
int d = j - e - 1;
if (l[j] == d)
{
palLen = d;
found = true;
break;
}
l.Add(Math.Min(d, l[j]));
}
if (!found)
{
palLen = 1;
i += 1;
}
}
l.Add(palLen);
Longest = Math.Max(Longest, palLen);
return Longest;
}
And this is its java version:
public static int LongestPalindrome(String seq) {
int Longest = 0;
List<Integer> l = new ArrayList<Integer>();
int i = 0;
int palLen = 0;
int s = 0;
int e = 0;
while (i < seq.length()) {
if (i > palLen && seq.charAt(i - palLen - 1) == seq.charAt(i)) {
palLen += 2;
i += 1;
continue;
}
l.add(palLen);
Longest = Math.max(Longest, palLen);
s = l.size() - 2;
e = s - palLen;
boolean found = false;
for (int j = s; j > e; j--) {
int d = j - e - 1;
if (l.get(j) == d) {
palLen = d;
found = true;
break;
}
l.add(Math.min(d, l.get(j)));
}
if (!found) {
palLen = 1;
i += 1;
}
}
l.add(palLen);
Longest = Math.max(Longest, palLen);
return Longest;
}
public static string GetMaxPalindromeString(string testingString)
{
int stringLength = testingString.Length;
int maxPalindromeStringLength = 0;
int maxPalindromeStringStartIndex = 0;
for (int i = 0; i < stringLength; i++)
{
int currentCharIndex = i;
for (int lastCharIndex = stringLength - 1; lastCharIndex > currentCharIndex; lastCharIndex--)
{
if (lastCharIndex - currentCharIndex + 1 < maxPalindromeStringLength)
{
break;
}
bool isPalindrome = true;
if (testingString[currentCharIndex] != testingString[lastCharIndex])
{
continue;
}
else
{
int matchedCharIndexFromEnd = lastCharIndex - 1;
for (int nextCharIndex = currentCharIndex + 1; nextCharIndex < matchedCharIndexFromEnd; nextCharIndex++)
{
if (testingString[nextCharIndex] != testingString[matchedCharIndexFromEnd])
{
isPalindrome = false;
break;
}
matchedCharIndexFromEnd--;
}
}
if (isPalindrome)
{
if (lastCharIndex + 1 - currentCharIndex > maxPalindromeStringLength)
{
maxPalindromeStringStartIndex = currentCharIndex;
maxPalindromeStringLength = lastCharIndex + 1 - currentCharIndex;
}
break;
}
}
}
if(maxPalindromeStringLength>0)
{
return testingString.Substring(maxPalindromeStringStartIndex, maxPalindromeStringLength);
}
return null;
}
C#
First I search for even length palindromes. Then I search for odd length palindromes. When it finds a palindrome, it determines the length and sets the max length accordingly. The average case complexity for this is linear.
protected static int LongestPalindrome(string str)
{
int i = 0;
int j = 1;
int oldJ = 1;
int intMax = 1;
int intCount = 0;
if (str.Length == 0) return 0;
if (str.Length == 1) return 1;
int[] intDistance = new int[2] {0,1};
for( int k = 0; k < intDistance.Length; k++ ){
j = 1 + intDistance[k];
oldJ = j;
intCount = 0;
i = 0;
while (j < str.Length)
{
if (str[i].Equals(str[j]))
{
oldJ = j;
intCount = 2 + intDistance[k];
i--;
j++;
while (i >= 0 && j < str.Length)
{
if (str[i].Equals(str[j]))
{
intCount += 2;
i--;
j++;
continue;
}
else
{
break;
}
}
intMax = getMax(intMax, intCount);
j = oldJ + 1;
i = j - 1 - intDistance[k];
}
else
{
i++;
j++;
}
}
}
return intMax;
}
protected static int getMax(int a, int b)
{
if (a > b) return a; return b;
}
Recently I wrote following code during interview...
public string FindMaxLengthPalindrome(string s)
{
string maxLengthPalindrome = "";
if (s == null) return s;
int len = s.Length;
for(int i = 0; i < len; i++)
{
for (int j = 0; j < len - i; j++)
{
bool found = true;
for (int k = j; k < (len - j) / 2; k++)
{
if (s[k] != s[len - (k - j + 1)])
{
found = false;
break;
}
}
if (found)
{
if (len - j > maxLengthPalindrome.Length)
maxLengthPalindrome = s.Substring(j, len - j);
}
if(maxLengthPalindrome.Length >= (len - (i + j)))
break;
}
if (maxLengthPalindrome.Length >= (len - i))
break;
}
return maxLengthPalindrome;
}
I got this question when i took an interview.
I found out when i was back home, unfortunately.
public static string GetMaxPalindromeString(string testingString)
{
int stringLength = testingString.Length;
int maxPalindromeStringLength = 0;
int maxPalindromeStringStartIndex = 0;
for (int i = 0; i < testingString.Length; i++)
{
int currentCharIndex = i;
for (int lastCharIndex = stringLength - 1; lastCharIndex > currentCharIndex; lastCharIndex--)
{
bool isPalindrome = true;
if (testingString[currentCharIndex] != testingString[lastCharIndex])
{
continue;
}
for (int nextCharIndex = currentCharIndex + 1; nextCharIndex < lastCharIndex / 2; nextCharIndex++)
{
if (testingString[nextCharIndex] != testingString[lastCharIndex - 1])
{
isPalindrome = false;
break;
}
}
if (isPalindrome)
{
if (lastCharIndex + 1 - currentCharIndex > maxPalindromeStringLength)
{
maxPalindromeStringStartIndex = currentCharIndex;
maxPalindromeStringLength = lastCharIndex + 1 - currentCharIndex;
}
}
break;
}
}
return testingString.Substring(maxPalindromeStringStartIndex, maxPalindromeStringLength);
}

C# Array Subset fetching

I have an array of bytes and i want to determine if the contents of this array of bytes exists within another larger array as a continuous sequence. What is the simplest way to go about doing this?
The naive approach is:
public static bool IsSubsetOf(byte[] set, byte[] subset) {
for(int i = 0; i < set.Length && i + subset.Length <= set.Length; ++i)
if (set.Skip(i).Take(subset.Length).SequenceEqual(subset))
return true;
return false;
}
For more efficient approaches, you might consider more advanced string matching algorithms like KMP.
Try to adapt some string search algorithm. One of the fastest is Boyer-Moore . It's quite easy as well. For binary data, Knuth-Morris-Pratt algorithm might work very efficiently as well.
This, which is a 1/1 port of this answer: Searching for a sequence of Bytes in a Binary File with Java
Is a very efficient way of doing so:
public static class KmpSearch {
public static int IndexOf(byte[] data, byte[] pattern) {
int[] failure = ComputeFailure(pattern);
int j = 0;
if (data.Length == 0) return -1;
for (int i = 0; i < data.Length; i++) {
while (j > 0 && pattern[j] != data[i]) {
j = failure[j - 1];
}
if (pattern[j] == data[i]) { j++; }
if (j == pattern.Length) {
return i - pattern.Length + 1;
}
}
return -1;
}
private static int[] ComputeFailure(byte[] pattern) {
int[] failure = new int[pattern.Length];
int j = 0;
for (int i = 1; i < pattern.Length; i++) {
while (j > 0 && pattern[j] != pattern[i]) {
j = failure[j - 1];
}
if (pattern[j] == pattern[i]) {
j++;
}
failure[i] = j;
}
return failure;
}
}

Categories