Replace byte 10 by 10 10 - c#

Hi I want to replace a byte[] where ever there is 10 by 10 10. here is my code.
if i have my data as "10 20 10 20 40 50 50 50 50 10 03" i want to replce it by
"10 20 10 10 20 40 50 50 50 50 10 10 03"
note: first byte is untouched
plse follow my comment, my idea is to push the byte array to nxt position and add another 10.
foreach (var b in command.ToBytes())
{
// var c = b;
transmitBuffer[count++] = b; data is formed here
addedbuffer[addall++] = b; duplication is made
}
if (addedbuffer[0] == 16 && addedbuffer[1] == 32 || addedbuffer[50] == 16 && addedbuffer[51] == 03) /
{
/condition on which to enter here
addedbuffer[0] = 0; //making 1st and 2nd value as null
addedbuffer[1] = 0;
for (int i = 0; i < addedbuffer.Length; i++) //till length i will chk
{
if (addedbuffer[i] == 10) //replace 10 by 10 10
addedbuffer[i] = 1010; // error,
}
}

You can't insert into array (you can do it with List<T>), so it looks that you have to create a new array; Linq solution:
Byte[] source = new Byte[] {
20, 10, 20, 40, 50, 50, 50, 50, 10, 03
};
var result = source
.SelectMany((item, index) =>
item == 10 && index != 0 ? new Byte[] { item, item } : new Byte[] { item })
.ToArray();
However, using List<Byte> (in order just to insert 10's) instead of Byte[] is a better way out:
List<Byte> list = List<Byte>() {
20, 10, 20, 40, 50, 50, 50, 50, 10, 03
};
// In order not to read inserted 10's we loop backward
// i >= 1: 1st byte should be preserved as is even if its == 10
for (int i = list.Count - 1; i >= 1; --i)
if (list[i] == 10)
list.Insert(i + 1, 10);

It helps to think of sequences like arrays (IEnumerable<T> in C#) as things that can be transformed into new sequences. Much like numbers can be transformed when you send them into a function, sequences can be, too.
Consider if I have a function that is defined as Add10If10(Byte b). It might looks like this:
public static Byte Add10If10(Byte b)
{
if (b == 10)
{
return b + 10;
}
return b;
}
Numbers which go into this are transformed based on the condition, and come out either 10 larger or the same. The same can be done with sequences, you can take a sequence with some number of elements, and transform it so it has more elements. The result is a new sequence:
public static IEnumerable<Byte> AddAdditional10If10(IEnumerable<Byte> values)
{
foreach (var b in values)
{
if (b == 10)
{
yield return 10;
}
yield return b;
}
}
This function returns an additional 10 for every 10 it encounters. Now that you have the right sequence, you can change how it is stored by changing it to an array:
AddAdditional10If10(addedbuffer).ToArray();

This works via conversion to strings, using String.Replace and converting back:
byte[] source = new Byte[] { 20, 10, 20, 40, 50, 50, 50, 50, 10, 03 };
string[] strArr = Array.ConvertAll(source, b => b.ToString());
string[] replArr = String.Join(" ", strArr).Replace("10", "10 10").Split();
byte[] newArr = Array.ConvertAll(replArr, str => Byte.Parse(str));
Edit:
Another approach with LINQ - elements at first and last indexes and all elements not equal to 10 are unchanged, for all remaining 10s it returns a sequence of 2 10s:
byte[] res = source.SelectMany((b, index) => index == 0
|| index == source.Length - 1
|| b != 10 ?
Enumerable.Repeat(b, 1) : Enumerable.Repeat(b, 2))
.ToArray();

This is a fairly efficient way to do it (requires two passes over the input array, but does not require any resizing of the output array):
public static byte[] Replace10With1010ExcludingFirstByte(byte[] input)
{
// Count 10s excluding first byte.
int count = input.Skip(1).Count(b => b == 10);
// Create output array of appropriate size.
byte[] result = new byte[input.Length + count];
// Copy input to output, duplicating all 10s that are not the first byte.
result[0] = input[0];
for (int i = 1, j = 1; i < input.Length; ++i, ++j)
{
result[j] = input[i];
if (input[i] == 10)
result[++j] = 10;
}
return result;
}
Call it with your original array, and use the returned array instead.
Test code (for use in a Console app):
byte[] input = {10, 20, 10, 20, 40, 50, 50, 50, 50, 10, 03};
var result = Replace10With1010ExcludingFirstByte(input);
Console.WriteLine(string.Join(", ", result));
[EDIT] It seems from one of your comments to another answer that you also want to also exclude the last byte from conversion too.
If so, use this code instead:
public static byte[] Replace10With1010ExcludingFirstAndLastByte(byte[] input)
{
// Count 10s excluding first and last byte.
int count = input.Skip(1).Take(input.Length-2).Count(b => b == 10);
// Create output array of appropriate size.
byte[] result = new byte[input.Length + count];
// Copy input to output, duplicating all 10s that are not the first byte.
result[0] = input[0];
for (int i = 1, j = 1; i < input.Length; ++i, ++j)
{
result[j] = input[i];
if ((input[i] == 10) && (i != (input.Length-1)))
result[++j] = 10;
}
return result;
}

Related

Int Array Reorder with Even Distribution in C#?

12,13,14,15,16,19,19,19,19
to
12,19,13,19,14,19,15,19,16
Hey all. Can anyone point me to clues/samples on how to distribute the first array of Int32 values, where a bunch of 19 values were appended, to the second where the 19 values are fairly evenly interspersed in the array?
I am not looking for a random shuffling, as in this example #19 could still appear consecutively if there was randomization. I want to make sure that #19 is placed in between the other numbers in a predictable pattern.
The use-case for this is something like teams taking turns presenting a topic: teams 12-16 each present once and then team #19 shows up but should not show their topic four times in a row, they should show their topic in between the other teams.
Later, if twelve values of 7 are added to the array, then they will also have to be evenly distributed into the sequence as well, the array would be 21 elements but the same rule that neither #19 or #7 should have consecutive showings.
I thought there might be something in Math.NET library that would do this, but I did not find anything. Using C# on .NET Framework 4.7.
Thanks.
Details on the following method that evenly (mostly) distributes the duplicates in your list. Duplicates can be anywhere in your list, they will be distributed.
Create a dictionary of all numbers and keep track of the number of times they appear in the list
Use a new list without any duplicates. For Each number that has duplicates, spread it over the size of this new list. Each time the distribution is even.
public static List<int> EvenlyDistribute(List<int> list)
{
List<int> original = list;
Dictionary<int, int> dict = new Dictionary<int, int>();
list.ForEach(x => dict[x] = dict.Keys.Contains(x) ? dict[x] + 1 : 1);
list = list.Where(x => dict[x] == 1).ToList();
foreach (int key in dict.Where(x => x.Value > 1).Select(x => x.Key))
{
int iterations = original.Where(x => x == key).Count();
for (int i = 0; i < iterations; i++)
list.Insert((int)Math.Ceiling((decimal)((list.Count + iterations) / iterations)) * i, key);
}
return list;
}
Usage in main:
List<int> test = new List<int>() {11,11,11,13,14,15,16,17,18,19,19,19,19};
List<int> newList = EvenlyDistribute(test);
Output
19,11,13,19,14,11,19,15,16,19,11,17,18
Here's how to do this.
var existing = new[] { 12, 13, 14, 15, 16 };
var additional = new [] { 19, 19, 19, 19 };
var lookup =
additional
.Select((x, n) => new { x, n })
.ToLookup(xn => xn.n * existing.Length / additional.Length, xn => xn.x);
var inserted =
existing
.SelectMany((x, n) => lookup[n].StartWith(x))
.ToArray();
This gives me results like 12, 19, 13, 19, 14, 19, 15, 19, 16.
The only thing that this won't do is insert a value in the first position, but otherwise it does evenly distribute the values.
In case random distribution is enough the following code is sufficient:
static void MixArray<T>(T[] array)
{
Random random = new Random();
int n = array.Length;
while (n > 1)
{
n--;
int k = random.Next(n + 1);
T value = array[k];
array[k] = array[n];
array[n] = value;
}
}
For instance:
int[] input = new int[]{12,13,14,15,16,19,19,19,19};
MixArray<int>(input);
In case you require precise evenly distribution while retaining the order of the elements, to following code will do the job:
public static T[] EvenlyDistribute<T>(T[] existing, T[] additional)
{
if (additional.Length == 0)
return existing;
if (additional.Length > existing.Length)
{
//switch arrays
T[] temp = additional;
additional = existing;
existing = temp;
}
T[] result = new T[existing.Length + additional.Length];
List<int> distribution = new List<int>(additional.Length);
double ratio = (double)(result.Length-1) / (additional.Length);
double correction = -1;
if (additional.Length == 1)
{
ratio = (double)result.Length / 2;
correction = 0;
}
double sum = 0;
for (int i = 0; i < additional.Length; i++)
{
sum += ratio;
distribution.Add(Math.Max(0, (int)(sum+correction)));
}
int existing_added = 0;
int additional_added = 0;
for (int i = 0; i < result.Length; i++)
{
if (additional_added == additional.Length)
result[i] = existing[existing_added++];
else
if (existing_added == existing.Length)
result[i] = additional[additional_added++];
else
{
if (distribution[additional_added] <= i)
result[i] = additional[additional_added++];
else
result[i] = existing[existing_added++];
}
}
return result;
}
For instance:
int[] existing = new int[] { 12, 13, 14, 15, 16};
int[] additional = new int[] { 101, 102, 103, 104};
int[] result = EvenlyDistribute<int>(existing, additional);
//result = 12, 101, 13, 102, 14, 103, 15, 104, 16

How to dynamically add indexes values of an array in C#?

I have an array where the first two smallest values have to be added, and consequently the result has to be added to next smallest and so on until it reaches the end of the array to give a final total.
However, how can I dynamically modify the method/function so if the values changes and I have 6 vehicles and 6 specs values in the array, the return of the method/function total is not restricted to just 4 indexes.
The array values are unsorted, so in order to add the first smallest, it has to be sorted. Once that's done it adds the values of the new array.
Here's what I've tried:
public static int vehicles = 4;
public static int[] specs = new int[] { 40, 8, 16, 6 };
public static int time(int vehicles, int[] specs)
{
int newValue = 0;
for (int i = 1; i < vehicles; i++)
{
newValue = specs[i];
int j = i;
while (j > 0 && specs[j - 1] > newValue)
{
specs[j] = specs[j - 1];
j--;
}
specs[j] = newValue;
}
// How can I dynamically change this below:
int result1 = specs[0] + specs[1];
int result2 = result1 + specs[2];
int result3 = result2 + specs[3];
int total = result1 + result2 + result3;
return total; // Returns 114
}
Here's the idea of how it works:
4, [40, 8, 16, 6] = 14 --> [40, 14, 16] = 30 --> [40, 30] = 70 ==>> 14 + 30 + 70 = 114
6, [62, 14, 2, 6, 28, 41 ] = 8 --> [62, 14, 8, 28, 41 ] --> 22 [62, 22, 28, 41 ] --> 50
[62, 50, 41 ] --> 91 [62, 91 ] --> 153 ==> 8 + 22 + 50 + 91 + 153 = 324
First off, if you are not restricted to arrays for some weird reason use List<int> and your life will be easier.
List<int> integers = { 14, 6, 12, 8 };
integers.Sort();
integers.Reverse();
while( integers.Count > 1 )
{
int i = integers[integers.Count - 1];
int j = integers[integers.Count - 2];
integers[integers.Count - 2] = i + j;
integers.RemoveAt(integers.Count - 1);
}
var result = integers[0];
P.S.: This can be easily modified to operate on the array version, you can't RemoveAt() from an array but can separately maintain a lastValidIndex.
I would go with the simplest version of a one line solution using LINQ:
Array.Sort(specs);
int total = specs.Select((n, i) => specs.Take(i + 1).Sum()).Sum() - (specs.Length > 1 ? specs[0] : 0);
I would use Linq.
Enumerable.Range(2, specs.Length - 1)
.Select(i => specs
.Take(i)
.Sum())
.Sum();
Explanation:
We take a range starting from 2 ending with specs.Length.
We sum the first i values of specs where i is the current value in the range.
After we have all those sums, we sum them up as well.
To learn more about linq, start here.
This code only works if the values have been sorted already.
If you want to sort the values using linq, you should use this:
IEnumerable<int> sorted = specs.OrderBy(x => x);
Enumerable.Range(2, sorted.Count() - 1)
.Select(i => sorted
.Take(i)
.Sum())
.Sum();
The OrderBy function needs to know how to get the value it should use to compare the array values. Because the array values are the values we want to compare we can just select them using x => x. This lamba takes the value and returns it again.
See comments in code for explanation.
using System;
using System.Linq;
class Program
{
static void Main()
{
//var inputs = new [] { 40, 8, 16, 6 }; // total = 114
var inputs = new[] { 62, 14, 2, 6, 28, 41 }; // total = 324
var total = 0;
var query = inputs.AsEnumerable();
while (query.Count() > 1)
{
// sort the numbers
var sorted = query.OrderBy(x => x).ToList();
// get sum of the first two smallest numbers
var sumTwoSmallest = sorted.Take(2).Sum();
// count total
total += sumTwoSmallest;
// remove the first two smallest numbers
query = sorted.Skip(2);
// add the sum of the two smallest numbers into the numbers
query = query.Append(sumTwoSmallest);
}
Console.WriteLine($"Total = {total}");
Console.WriteLine("Press any key...");
Console.ReadKey(true);
}
}
I benchmark my code and the result was bad when dealing with large dataset. I suspect it was because of the sorting in the loop. The sorting is needed because I need to find the 2 smallest numbers in each iteration. So I think I need a better way to solve this. I use a PriorityQueue (from visualstudiomagazine.com) because the elements are dequeued based on priority, smaller numbers have higher priority in this case.
long total = 0;
while (pq.Count() > 0)
{
// get two smallest numbers when the priority queue is not empty
int sum = (pq.Count() > 0 ? pq.Dequeue() : 0) + (pq.Count() > 0 ? pq.Dequeue() : 0);
total += sum;
// put the sum of two smallest numbers in the priority queue if the queue is not empty
if (pq.Count() > 0) pq.Enqueue(sum);
}
Here's some benchmark results of the new (priority queue) code and the old code in release build. Results are in milliseconds. I didn't test the 1 million data with the old code because it's too slow.
+---------+----------+-------------+
| Data | New | Old |
+---------+----------+-------------+
| 10000 | 3.9158 | 5125.9231 |
| 50000 | 16.8375 | 147219.4267 |
| 1000000 | 406.8693 | |
+---------+----------+-------------+
Full code:
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
class Program
{
static void Main()
{
const string fileName = #"numbers.txt";
using (var writer = new StreamWriter(fileName))
{
var random = new Random();
for (var i = 0; i < 10000; i++)
writer.WriteLine(random.Next(100));
writer.Close();
}
var sw = new Stopwatch();
var pq = new PriorityQueue<int>();
var numbers = File.ReadAllLines(fileName);
foreach (var number in numbers)
pq.Enqueue(Convert.ToInt32(number));
long total = 0;
sw.Start();
while (pq.Count() > 0)
{
// get two smallest numbers when the priority queue is not empty
int sum = (pq.Count() > 0 ? pq.Dequeue() : 0) + (pq.Count() > 0 ? pq.Dequeue() : 0);
total += sum;
// put the sum of two smallest numbers in the priority queue if the queue is not empty
if (pq.Count() > 0) pq.Enqueue(sum);
}
sw.Stop();
Console.WriteLine($"Total = {total}");
Console.WriteLine($"Time = {sw.Elapsed.TotalMilliseconds}");
total = 0;
var query = File.ReadAllLines(fileName).Select(x => Convert.ToInt32(x));
sw.Restart();
while (query.Count() > 0)
{
// sort the numbers
var sorted = query.OrderBy(x => x).ToList();
// get sum of the first two smallest numbers
var sumTwoSmallest = sorted.Take(2).Sum();
// count total
total += sumTwoSmallest;
// remove the first two smallest numbers
query = sorted.Skip(2);
// add the sum of the two smallest numbers into the numbers
if (query.Count() > 0)
query = query.Append(sumTwoSmallest);
}
sw.Stop();
Console.WriteLine($"Total = {total}");
Console.WriteLine($"Time = {sw.Elapsed.TotalMilliseconds}");
Console.WriteLine("Press any key...");
Console.ReadKey(true);
}
}
PriorityQueue code:
using System;
using System.Collections.Generic;
// From http://visualstudiomagazine.com/articles/2012/11/01/priority-queues-with-c.aspx
public class PriorityQueue<T> where T : IComparable<T>
{
private List<T> data;
public PriorityQueue()
{
this.data = new List<T>();
}
public void Enqueue(T item)
{
data.Add(item);
int ci = data.Count - 1; // child index; start at end
while (ci > 0)
{
int pi = (ci - 1) / 2; // parent index
if (data[ci].CompareTo(data[pi]) >= 0)
break; // child item is larger than (or equal) parent so we're done
T tmp = data[ci];
data[ci] = data[pi];
data[pi] = tmp;
ci = pi;
}
}
public T Dequeue()
{
// assumes pq is not empty; up to calling code
int li = data.Count - 1; // last index (before removal)
T frontItem = data[0]; // fetch the front
data[0] = data[li];
data.RemoveAt(li);
--li; // last index (after removal)
int pi = 0; // parent index. start at front of pq
while (true)
{
int ci = pi * 2 + 1; // left child index of parent
if (ci > li)
break; // no children so done
int rc = ci + 1; // right child
if (rc <= li && data[rc].CompareTo(data[ci]) < 0) // if there is a rc (ci + 1), and it is smaller than left child, use the rc instead
ci = rc;
if (data[pi].CompareTo(data[ci]) <= 0)
break; // parent is smaller than (or equal to) smallest child so done
T tmp = data[pi];
data[pi] = data[ci];
data[ci] = tmp; // swap parent and child
pi = ci;
}
return frontItem;
}
public T Peek()
{
T frontItem = data[0];
return frontItem;
}
public int Count()
{
return data.Count;
}
public override string ToString()
{
string s = "";
for (int i = 0; i < data.Count; ++i)
s += data[i].ToString() + " ";
s += "count = " + data.Count;
return s;
}
public bool IsConsistent()
{
// is the heap property true for all data?
if (data.Count == 0)
return true;
int li = data.Count - 1; // last index
for (int pi = 0; pi < data.Count; ++pi)
{ // each parent index
int lci = 2 * pi + 1; // left child index
int rci = 2 * pi + 2; // right child index
if (lci <= li && data[pi].CompareTo(data[lci]) > 0)
return false; // if lc exists and it's greater than parent then bad.
if (rci <= li && data[pi].CompareTo(data[rci]) > 0)
return false; // check the right child too.
}
return true; // passed all checks
}
// IsConsistent
}
// PriorityQueue
Reference:
https://visualstudiomagazine.com/articles/2012/11/01/priority-queues-with-c.aspx
https://en.wikipedia.org/wiki/Priority_queue
You can simply sort it using Array.Sort(), then get the sums in a new array which starts with the smallest value and add each next value to the most recent sum, the total will be the value of the last sum.
public static int time(int vehicles, int[] specs)
{
int i, total;
int[] sums = new int[vehicles];
Array.Sort(spec);
sums[0] = specs[0];
for (i = 1; i < vehicles; i++)
sums[i] = sums[i - 1] + spec[i];
total = sums[spec - 1];
}

Displaying mulitples of 3 and 5 under 1000 into a textbox C#

We are working on the Project Euler problems and there is one part of the code I cannot get to work.
I have displayed and calculated the sum for multiples of 3 and 5 under 10, and I have calculated the sum for the same numbers under 1000 but I cannot initially display the numbers used for the calculation within a textbox or equivalent field.
Here's a link to the code.
http://pastebin.com/MZAA88UP
I think, it's a good task for Linq:
int n = 1000;
var numbers = Enumerable
.Range(1, n - 1)
.Where(item => item % 3 == 0 || item % 5 == 0);
Having numbers as a source you can easily play with it. If you want to sum up:
// 233168
var sum = numbers.Sum();
If you want to print out the numbers:
// 3, 5, 6, 9, 10, 12, ..., 996, 999
string report = string.Join(", ", numbers);
If you want to use loops instead of Linq:
private void BtnDisplay1000_Click(object sender, RoutedEventArgs e)
{
var stringBuilder = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
if (i % 3 == 0 || i % 5 == 0)
{
stringBuilder.Append(i);
stringBuilder.Append(", ");
}
}
TxtDisplay1000.Text = (stringBuilder.ToString());
}

In C#, Rotate a List to The Right by the specified numbers of places without using LINQ?

I am trying to rotate a List of items to the right by the specified numbers of places without using LINQ and doing it the manual way, how am I able to do this?
I think I am not understanding how to approach/solve this problem in particular. Here's what I have tried so far.
A clear example of this problem could be something like this:
Initial Array(or List): 20,30,40,50,60,70
Rotate to the right by 3 places: 50,60,70,20,30,40
public void Test(List<int> items, int places)
{
items.RemoveAt(places);
items.Add(places);
}
Results from the below:
20,30,40,50,60,70
60,70,20,30,40,50 (Rotate 2)
50,60,70,20,30,40 (Rotate 3)
If you meant to shift each element n indexes to the left, just replace the line list[i] = copy[index]; with list[index] = copy[i];
Then you get these results:
20,30,40,50,60,70
40,50,60,70,20,30 (Rotate 2)
50,60,70,20,30,40 (Rotate 3)
This is a simple working generic method:
static void RotateList<T>(IList<T> list, int places)
{
// circular.. Do Nothing
if (places % list.Count == 0)
return;
T[] copy = new T[list.Count];
list.CopyTo(copy, 0);
for (int i = 0; i < list.Count; i++)
{
// % used to handle circular indexes and places > count case
int index = (i + places) % list.Count;
list[i] = copy[index];
}
}
Usage:
List<int> list = new List<int>() { 20, 30, 40, 50, 60, 70 };
RotateList(list, 3);
Or since I'm a big fan of extension Methods, you can create one:
(By using IList, this method works for arrays also)
public static class MyExtensions
{
public static void RotateList<T>(this IList<T> list, int places)
{
// circular.. Do Nothing
if (places % list.Count == 0)
return;
T[] copy = new T[list.Count];
list.CopyTo(copy, 0);
for (int i = 0; i < list.Count; i++)
{
int index = (i + places) % list.Count;
list[i] = copy[index];
}
}
Usage:
List<int> list = new List<int>() { 20, 30, 40, 50, 60, 70 };
list.RotateList(12); // circular - no changes
int[] arr = new int[] { 20, 30, 40, 50, 60, 70 };
arr.RotateList(3);
assuming you have a list 1-2-3-4-5, and you want to shift it 2 right to : 3-4-5-1-2.
for n = 1 to 2
remove the head, put it at the tail
simple loop with no linq.
Even though this sounds more like homework to me to be honest ..
int[] numbers = {20, 30, 40, 50, 60, 70};
int rotateBy = 3;
int[] rotated = new int[numbers.Length];
Array.Copy(numbers, rotateBy, rotated, 0, numbers.Length-rotateBy);
Array.Copy(numbers, 0, rotated, numbers.Length - rotateBy, rotateBy);
While this may not exactly answer your question it's worth mentioning about queues. For circular operations Queue<T>s could be a better data structure choice. To shift places it could be as simple as:
public static void Rotate<T>(this Queue<T> items, int places)
{
for (int i = 0; i < places; i++)
items.Enqueue(items.Dequeue());
}
E.g.
var q = new Queue<int>(new[] { 20, 30, 40, 50, 60, 70 });
q.Rotate(3);
Queues are also a lot more efficient than List<T>s for dequeuing operation since it doesnt have to remove from top and push entire block of array one up (read about it here: Queue<T> vs List<T>) or arrays which involves copying of the entire array.

Storing sum of chunks of array through one pass

Let's say I have the array
1,2,3,4,5,6,7,8,9,10,11,12
if my chunck size = 4
then I want to be able to have a method that will output an array of ints int[] a =
a[0] = 1
a[1] = 3
a[2] = 6
a[3] = 10
a[4] = 14
a[5] = 18
a[6] = 22
a[7] = 26
a[8] = 30
a[9] = 34
a[10] = 38
a[11] = 42
note that a[n] = a[n] + a[n-1] + a[n-2] + a[n-3] because the chunk size is 4 thus I sum the last 4 items
I need to have the method without a nested loop
for(int i=0; i<12; i++)
{
for(int k = i; k>=0 ;k--)
{
// do sumation
counter++;
if(counter==4)
break;
}
}
for example i don't want to have something like that... in order to make code more efficient
also the chunck size may change so I cannot do:
a[3] = a[0] + a[1] + a[2] + a[3]
edit
The reason why I asked this question is because I need to implement check sum rolling for my data structures class. I basically open a file for reading. I then have a byte array. then I will perform a hash function on parts of the file. lets say the file is 100 bytes. I split it in chunks of 10 bytes. I perform a hash function in each chunck thus I get 10 hashes. then I need to compare those hashes with a second file that is similar. let's say the second file has the same 100 bytes but with an additional 5 so it contains a total of 105 bytes. becasuse those extra bytes may have been in the middle of the file if I perform the same algorithm that I did on the first file it is not going to work. Hope I explain my self correctly. and because some files are large. it is not efficient to have a nested loop in my algorithm.
also the real rolling hashing functions are very complex. Most of them are in c++ and I have a hard time understanding them. That's why I want to create my own hashing function very simple just to demonstrate how check sum rolling works...
Edit 2
int chunckSize = 4;
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12 }; // the bytes of the file
int[] b = new int[a.Length]; // array where we will place the checksums
int[] sum = new int[a.Length]; // array needed to avoid nested loop
for (int i = 0; i < a.Length; i++)
{
int temp = 0;
if (i == 0)
{
temp = 1;
}
sum[i] += a[i] + sum[i-1+temp];
if (i < chunckSize)
{
b[i] = sum[i];
}
else
{
b[i] = sum[i] - sum[i - chunckSize];
}
}
the problem with this algorithm is that with large files the sum will at some point be larger than int.Max thus it is not going to work....
but at least know it is more efficient. getting rid of that nested loop helped a lot!
edit 3
Based on edit two I have worked this out. It does not work with large files and also the checksum algorithm is very bad. but at least I think it explains the hashing rolling that I am trying to explain...
Part1(#"A:\fileA.txt");
Part2(#"A:\fileB.txt", null);
.....
// split the file in chuncks and return the checksums of the chuncks
private static UInt64[] Part1(string file)
{
UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];
var stream = File.OpenRead(file);
int chunckSize = (int)Math.Pow(2, 22); // 10 => kilobite 20 => megabite 30 => gigabite etc..
byte[] buffer = new byte[chunckSize];
int bytesRead; // how many bytes where read
int counter = 0; // counter
while ( // while bytesRead > 0
(bytesRead =
(stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
) > 0)
{
hashes[counter] = 0;
for (int i = 0; i < bytesRead; i++)
{
hashes[counter] = hashes[counter] + buffer[i]; // simple algorithm not realistic to perform check sum of file
}
counter++;
}// end while loop
return hashes;
}
// split the file in chuncks rolling it. In reallity this file will be on a different computer..
private static void Part2(string file, UInt64[] hash)
{
UInt64[] hashes = new UInt64[(int)Math.Pow(2, 20)];
var stream = File.OpenRead(file);
int chunckSize = (int)Math.Pow(2, 22); // chunks must be as big as in pervious method
byte[] buffer = new byte[chunckSize];
int bytesRead; // how many bytes where read
int counter = 0; // counter
UInt64[] sum = new UInt64[(int)Math.Pow(2, 20)];
while ( // while bytesRead > 0
(bytesRead =
(stream.Read(buffer, 0, buffer.Length)) // returns the number of bytes read or 0 if no bytes read
) > 0)
{
for (int i = 0; i < bytesRead; i++)
{
int temp = 0;
if (counter == 0)
temp = 1;
sum[counter] += (UInt64)buffer[i] + sum[counter - 1 + temp];
if (counter < chunckSize)
{
hashes[counter] = (UInt64)sum[counter];
}else
{
hashes[counter] = (UInt64)sum[counter] - (UInt64)sum[counter - chunckSize];
}
counter++;
}
}// end while loop
// mising to compare hashes arrays
}
Add an array r for the result, and initialize its first chunk members using a loop from 0 to chunk-1. Now observe that to get r[i+1] you can add a[i+1] to r[i], and subtract a[i-chunk+1]. Now you can do the rest of the items in a single non-nested loop:
for (int i=chunk+1 ; i < N-1 ; i++) {
r[i+1] = a[i+1] + r[i] - a[i-chunk+1];
}
You can get this down to a single for loop, though that may not be good enough. To do that, just note that c[i+1] = c[i]-a[i-k+1]+a[i+1]; where a is the original array, c is the chunky array, and k is the size of the chunks.
I understand that you want to compute a rolling hash function to hash every n-gram (where n is what you call the "chunk size"). Rolling hashing is sometimes called "recursive hashing". There is a wikipedia entry on the topic:
http://en.wikipedia.org/wiki/Rolling_hash
A common algorithm to solve this problem is Karp-Rabin. Here is some pseudo-code which you should be able to easily implement in C#:
B←37
s←empty First-In-First-Out (FIFO) structure (e.g., a linked-list)
x←0(L-bit integer)
z←0(L-bit integer)
for each character c do
append c to s
x ← (B x−B^n z + c ) mod 2^L
yield x
if length(s) = n then
remove oldest character y from s
z ← y
end if
end for
Note that because B^n is a constant, the main loop only does two multiplications, one subtraction and one addition. The "mod 2^L" operation can be done very fast (use a mask, or unsigned integers with L=32 or L=64, for example).
Specifically, your C# code might look like this where n is the "chunk" size (just set B=37, and Btothen = 37 ^ n)
r[0] = 0
for (int i=1 ; i < N ; i++) {
r[i] = a[i] + B * r[i-1] - Btothen * a[i-n];
}
Karp-Rabin is not ideal however. I wrote a paper where better solutions are discussed:
Daniel Lemire and Owen Kaser, Recursive n-gram hashing is pairwise independent, at best, Computer Speech & Language 24 (4), pages 698-710, 2010.
http://arxiv.org/abs/0705.4676
I also published the source code (Java and C++, alas no C# but it should not be hard to go from Java to C#):
https://github.com/lemire/rollinghashjava
https://github.com/lemire/rollinghashcpp
How about storing off the last chunk_size values as you step through?
Allocate an array of size chunk_size, set them all to zero, and then set the element at i % chunk_size with your current element at each iteration of i, and then add up all the values?
using System;
class Sample {
static void Main(){
int chunckSize = 4;
int[] a = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
int[] b = new int[a.Length];
int sum = 0;
int d = chunckSize*(chunckSize-1)/2;
foreach(var i in a){
if(i < chunckSize){
sum += i;
b[i-1]=sum;
} else {
b[i-1]=chunckSize*i -d;
}
}
Console.WriteLine(String.Join(",", b));//1,3,6,10,14,18,22,26,30,34,38,42
}
}

Categories