Find blocks of same values in a collection and manipulate surrounding values

Find blocks of same values in a collection and manipulate surrounding values - c#

I really don't know how to summarize the question in the title, sorry. :)
Let's assume I have a collection (i.e. ObservableCollection) containing thousands of objects. These objects consist of an ascending timestamp and a FALSE boolean value (very simplified).
Like this:
[0] 0.01, FALSE
[1] 0.02, FALSE
[2] 0.03, FALSE
[3] 0.04, FALSE
...
Now, let's assume that within this collection, there are blocks that have their flag set to TRUE.
Like this:
[2345] 23.46, FALSE
[2346] 23.47, FALSE
[2347] 23.48, FALSE
[2348] 23.49, TRUE
[2349] 23.50, TRUE
[2350] 23.51, TRUE
[2351] 23.52, TRUE
[2352] 23.53, TRUE
[2353] 23.54, FALSE
[2354] 23.55, FALSE
...
I need to find the blocks and set all flags within 1.5 seconds before and after the block to TRUE aswell.
How can I achieve this while maintaining a reasonable performance?

Matthias G solution is correct, although quite slow – seems to have n-squared complexity.
First algo scans the input values to filter them by IsActive, retrieve timestamps and put into a new list - this is O(n), at least. Then it scans the constructed list, which may be in the worst case the whole input – O(n), and for every timestamp retrieved it scans input values to modify appropriate of them – O(n^2).
Then it builds additional list just to be scanned once and destroyed.
I'd propose a solution similar somewhat to mergesort. First scan input values and for each Active item push appropriate time interval into a queue. You may delay pushing to see if the next interval overlaps the current one – then extend the interval instead of push. When the input list is done, finally push the last delayed interval. This way your queue will contain the (almost) minimum number of time intervals you want to modify.
Then scan again the values data and compare timestamps to the first interval in a queue. If the item's timestamp falls into the current interval, mark the item Active. If it falls past the interval remove the interval from the queue and compare the timestamp to the next one – and so on, until the item is in or before the interval. Your input data are in chronological order, so the intervals will be in the same order, too. This allows accomplishing the task in a single parallel pass through both lists.

Assuming you have a data structure like this:
Edit: Changed TimeStamp to double
public class Value
{
public double TimeStamp { get; set; }
public bool IsActive { get; set; }
}
And a list of this objects called values. Then you could search for active data sets and for each of them mark the values within a range around them as active:
double range = 1.5;
var activeTimeStamps = values.Where(value => value.IsActive)
.Select(value => value.TimeStamp)
.ToList();
foreach (var timeStamp in activeTimeStamps)
{
var valuesToMakeActive =
values.Where
(
value =>
value.TimeStamp >= timeStamp - range &&
value.TimeStamp <= timeStamp + range
);
foreach (var value in valuesToMakeActive)
{
value.IsActive = true;
}
}
Anyway, I guess there will be a solution with better performance..

Related

Some clarification about arrays for a project

Or someone to explain how to do this or direct me to some resources. Here's the assignment info:
dataCaptured: integer array - This field will store a history of a limited set of recently captured measurements. Once the array is full, the class should start overwriting the oldest elements while continuing to record the newest captures. (You may need some helper fields/variables to go with this one).
mostRecentMeasure: integer - This field will store the most recent measurement captured for convenience of display.
GetRawData. Return the contents of the dataCapturedarray. Which element is the oldest one in your history? Do you need to manipulate these values? How will they be presented to the user?
Add a button to display the measurement history (GetRawData). Where/how will you display this list? What did you make GetRawData() return? Does the list start with the oldest value collected? What happens when your history array fills up and older values are overwritten? What happens when your history has not been filled up yet? Does "0" count as an actual history entry? Does your history display the units in which the data was collected?
So, I want to pass the entire array to a Textbox? How do I do that? Atm, I do have the device up and running and I have a textbox ready for the array, I think. Can someone direct me to some articles on where to pass this information? Thanks for any help.
currently i have:
private double[] dataCaptured;
public double[] GetRawData()
{
return dataCaptured;
}
very little :(

I'm not going to do your homework for you, so this isn't really an answer; it's more of a tutorial.
First, the problem you have been assigned is a common one. Consider some kind of chemical process. You build a system that periodically measures some attribute of the system (say temperature). Depending on the nature of the process, you may measure it every 200 milliseconds, every 2 seconds, every 2 minutes, whatever. But, you are going to do that forever. You want to keep the data for some time, but not forever - memory is cheap these days, but not that cheap.
So you decide to keep the last N measurements. You can keep that in an N-element array. When you get the first measurement, you put it in the 0-element of the array, the second in the 1-element, and so on. When you get to the Nth measurement, you stick it in the last element. But, with the (N+1)th element, you have a problem. The solution is to put it in the 0-element of the array. The next one goes into the 1-element of the array, and so on.
At that point, you need some bookkeeping. You need to track where you put the most recent element. You also need to keep track of where the oldest element is. You also need a way to get all the measurements in some soft of time order.
This is a circular buffer. The reason it's circular is because you count 0, 1, 2...N-1, N, 0, 1, 2 and so on, around and around forever.
When you first start filling the buffer with measurements, the oldest is in element-0, and the most recent is wherever you put the last measurement. It continues like that until you have filled the buffer. Once you do that, then the oldest is no longer at 0, it's at the element logically after the most recent one (which is either the next element, or, if you are at the end of buffer, at the 0 position).
You could track the oldest and the newest indexes in the array/buffer. An easier way is to just track whether you are in the initial haven't-filled-buffer-yet phase (a boolean) and the latest index. The oldest is either index 0 (if you haven't filled buffer yet) or the one logically after the newest one. The bookkeeping is a little easier that way.
To do this, you need a way to find the next logical index in the buffer (either index+1 or 0 if you've hit the end of the buffer). You also need a way to get all the data (start at the index logically after the current entry, and then get the next N logical entries).
By the way, I'd use float rather than double to track my measurement. In general, measurements have enough error in themselves that the extra precision offered by double is of no use. By halving the data size, you can make your buffer twice as long (this, by the way, is a consideration in a measurement and control system).
For your measurement source, I'd initially use a number generator that starts at 1.0f to start and just increments by 1.0f every cycle. It will make debugging much simpler.
private float tempValue = 1.0f;
private float GetNextValue()
{
tempValue += 1.0f;
return tempValue;
}
Once you do this for real, you may want random numbers, something like:
private readonly Random _random = new Random();
private const float MinValue = 4.0f;
private const float MaxValue = 20.0f;
private float GetNextValue()
{
var nextRandom = _random.NextDouble();
var nextValue = (float) ((MaxValue - MinValue) * nextRandom + MinValue);
return nextValue;
}
By the way, the choice of 4.0 and 20.0 is intentional - google for "4 20 measurement".
If you want to get fancy, generate a random number stream and run it through a smoothing filter (but that gets more complicated).
You asked about how to get all the values into a single text box. This should do the trick:
// concatenate a collection of float values (rendered with one decimal point)
// into a string (with some spaces separating the values)
private string FormatValuesAsString(IEnumerable<float> values)
{
var buffer = new StringBuilder();
foreach (var val in values)
{
buffer.Append($"{val:F1} ");
}
return buffer.ToString();
}
An IEnumerable<float> represents a collection of float values. An array of floats implements that interface, as does just about every other collection of floats (which means you can put an array in for that parameter (FormatValuesAsString(myArray)). You can also create a function that dynamically generates values (look up yield return). The FormatValuesAsString function will return a string you can shove into a text box. If you do that on each calculation cycle, you'll see your values shift through the text box.
You will get to know the debugger very well doing this exercise. It is very hard to get this right in the first one, or two, or three, or... tries (you'll learn why programmers curse off-by-one errors). Don't give up. It's a good assignment.
Finally, if you get confused, you can post questions as comments to this answer (within reason). Tag me (to make sure I see them).

Split list of objects

So, here is My code:
private List<IEnumerable<Row>> Split(IEnumerable<Row> rows,
IEnumerable<DateTimePeriod> periods)
{
List<IEnumerable<Row>> result = new List<IEnumerable<Row>>();
foreach (var period in periods)
{
result.Add(rows.Where(row => row.Date >= period.begin && row.Date <= period.end));
}
return result;
}
private class DateTimePeriod
{
public DateTime begin { get; set; }
public DateTime end { get; set; }
}
As you can see, this code is not the best, it iterates throught all rows for each period.
I need advice on how to optimize this code. Maybe there are suitable Enumerable methods for this?
Update: all rows and periods ordered by date, and all of rows is always in one of these periods.

A faster method would be to perform a join on the two structures, however Linq only supports equi-joins (joins where two expressions are equal). In your case you are joining on one value being in a range of values, so an equi-join is not possible.
Before starting to optimize, make sure it needs to be optimized. Would your program be significantly faster if this function were faster? How much of your app time is spent in this function?
If optimization wouldn't benefit the program overall, then don't worry about it - make sure it works and then focus on other features of the program.
That said, since you say the rows and periods are already sorted by date, you might get some performance benefit by using loops, looping through the rows until you're out of the current period, then moving to the next period. At least that way you don't enumerate rows (or periods) multiple times.

There is a little problem in your code: rows is IEnumerable so that it can be enumerated multiple times. in foreach. It's a good idea to change it to something more stable, like array, out side of foreach:
var myRows = rows as Row[] ?? rows.ToArray();
by the way. I changed your code the following code, using Resharper:
var myRows = rows as Row[] ?? rows.ToArray();
return periods.Select(period => myRows.Where(row => row.Date >= period.begin && row.Date <= period.end)).ToList();

Your best chance to optimize an O(n x m) algorithm is to transform it in multiple consecutive O(n) operations. In order to gain time you must trade off space, so maybe if you create some lookup table based on the data in one of your Enumerables will help you in this case.
For example you can construct an int array that will have a value set for each day that belongs to a period (each period has another known hardcoded value). This would be your first O(n) loop. Then you do another O(m) loop and only check if the array position corresponding to row.Date is non zero (then you look up the actual value among the hardcoded ones and you get the actual Period).
Anyway, this is more of a general idea and implementation is important. If n and m are very small you might not get any benefit, but if they are large (huge) I can bet that the Split method will run faster.
Assuming that everything you work with is already in memory (no EF involved).

Deduce a downward trend from a list of values

In of the functions in my program that is called every second I get a float value that represents some X strength.
These values keep on coming at intervals and I am looking to store a history of the last 30 values and check if there's a downward/decreasing trend in the values (there might be a 2 or 3 false positives as well, so those have to neglected). If there's a downward trend and (If the most recent value minus the first value in the history) passes a threshold of 50 (say), I want to call another function. How can such a thing be implemented in C# which has such a structure to store history of 30 values and then analyse/deduce the downward trend?

You have several choices. If you only need to call this once per second, you can use a Queue<float>, like this:
Queue<float> theQueue = new Queue<float>(30);
// once per second:
// if the queue is full, remove an item
if (theQueue.Count >= 30)
{
theQueue.Dequeue();
}
// add the new item to the queue
theQueue.Enqueue(newValue);
// now analyze the items in the queue to detect a downward trend
foreach (float f in theQueue)
{
// do your analysis
}
That's easy to implement and will be plenty fast enough to run once per second.
How you analyze the downward trend really depends on your definition.
It occurs to me that the Queue<float> enumerator might not be guaranteed to return things in the order that they were inserted. If it doesn't, then you'll have to implement your own circular buffer.

I don't know C# but I'd probably store the values as an List of some sort. Here's some pseudo code for the trend checking:
if last value - first value < threshold
return
counter = 0
for int i = 1; i < 30; i++
if val[i] > val[i-1]
counter++
if counter < false_positive_threshold
//do stuff

A Circular List is the best data structure to store last X values. There doesn't seem to be one in the standard library but there are several questions on SO how to build one.
You need to define "downward trend". It seems to me that according to your current definition ((If the most recent value minus the first value in the history) the sequence "100, 150, 155, 175, 180, 182" is a downward trend. With that definition you only need to the latest and first value in history from the circular list, simplifying it somewhat.
But you probably need a more elaborate algorithm to identify a downward trend.

Group By variable ceiling value with catch-all maximum using LINQ To NHibernate

I'd like to construct a LINQ GroupBy statement according to the following decimal categories: 0-50, 50-100, 100-250, 250-above. I found Group by variable integer range using Linq which discusses how to use a variable range, but that query has a finite upper bound. My query needs to be able to handle everything over 250 as one group. I tried using decimal.maxValue as my upper bound, but the query couldn't handle it, I suppose because the value is larger than what NHibernate can handle. Ideally, I'd like to do this without specifying a max value so the query is independent of the database.
Edit: I'm pretty sure I can do this by using a array of floor values and following the pattern in the link. But I'm curious if there is a kind of catch-all group-by construct.

Edit:
You changed your OP to state that you could use a floor function, but you wanted to find out about a default grouping.
Mathematically a floor function is equivalent. In the case of ceiling, the lower bound for the data they used is presumably 0. In the case of floor, the logical upper bound is positive infinity (effectively it ends up being the highest value the DB supports, since integers don't support the concept of infinity). It gets you where you want to go.
If you want something that might be more applicable to other situations, you could try something like this:
items.GroupBy(item =>
(
floors.FirstOrDefault(floor => floor <= item)
?? "Default"
)
.ToString()
);
It probably wouldn't work in Linq to NHibernate, as I don't think this would map well to SQL. Instead you could import the whole set into memory first (.ToList()), and then add your grouping as a Linq to Objects query.
It doesn't make a lot of sense to use it in this situation, but it might in the case of non-number-line groupings:
var groups = new HashSet<string>
{
"Orange",
"Green",
"Mauve",
};
items.GroupBy(item =>
groups.Contains(item.Color)
? item.Color
: "Default"
);
Before Edit:
You could simply reverse the logic and you'll automatically include everything below a certain value.
var floors = new[] { 250, 100, 50, 0 };
var groupings = items.GroupBy(item => floors.First(floor => floor <= item));
How it works:
Take an item 270.
The first item in the list would be the first bucket it falls under. This is because 250 <= 270.
Take an item 99.
The third item in the list would be the first bucket it falls under. 250 is not less than 99. 100 is not less than 99. But 50 is less than 99.
An item 50 would fall into the third bucket.
It is less than 250 and 100, but equal to 50.
Doesn't quite match the description in your question:
Your group description is a bit broken. You'd have to bucketize them separately for this algorithm to work. There would be a 0-50 bucket, 51-100 bucket, etc. Or a 0-49 bucket, a 50-99 bucket, etc.
A 0-50 bucket, and a 50-100 bucket couldn't exist together.

I've made an C# application which connects to my webcam and reads the images at the speed the webcam delivers them. I'm parsing the stream, in a way that I have a few jpeg's per seconds.
I don't want to write all the webcam data to the disk, I want to store images in memory. Also the application will act as a webserver which I can supply a datetime in the querystring. And the webserver must server the image closest to that time which it still has in memory.
In my code I have this:
Dictionary<DateTime, byte[]> cameraImages;
of which DateTime is the timestamp of the received image and the bytearray is the jpeg.
All of that works; also handling the webrequest works. Basically I want to clean up that dictionary by keep images according to their age.
Now I need an algorithm for it, in that it cleans up older images.
I can't really figure out an algorithm for it, one reason for it is that the datetimes aren't exactly on a specific moment, and I can't be sure that an image always arrives. (Sometimes the image stream is aborted for several minutes). But what I want to do is:
Keep all images for the first minute.
Keep 2 images per second for the first half hour.
Keep only one image per second if it's older than 30 minutes.
Keep only one image per 30 seconds if it's older than 2 hours.
Keep only one image per minute if it's older than 12 hours.
Keep only one image per hour if it's older than 24 hours.
Keep only one image per day if it's older than two days.
Remove all images older than 1 weeks.
The above intervals are just an example.
Any suggestions?

I think #Kevin Holditch's approach is perfectly reasonable and has the advantage that it would be easy to get the code right.
If there were a large number of images, or you otherwise wanted to think about how to do this "efficiently", I would propose a thought process like the following:
Create 7 queues, representing your seven categories. We take care to keep the images in this queue in sorted time order. The queue data structure is able to efficiently insert at its front and remove from its back. .NET's Queue would be perfect for this.
Each Queue (call it Qi) has an "incoming" set and an "outgoing" set. The incoming set for Queue 0 is those images from the camera, and for any other set it is equal to the outgoing set for Queue i-1
Each queue has rules on both its input and output side which determine whether the queue will admit new items from its incoming set and whether it should eject items from its back into its outgoing set. As a specific example, if Q3 is the queue "Keep only one image per 30 seconds if it's older than 2 hours", then Q3 iterates over its incoming set (which is the outcoming set of Q2) and only admits item i where i's timestamp is 30 seconds or more away from Q3.first() (For this to work correctly the items need to be processed from highest to lowest timestamp). On the output side, we eject from Q3's tail any object older than 12 hours and this becomes the input set for Q4.
Again, #Kevin Holditch's approach has the virtue of simplicity and is probably what you should do. I just thought you might find the above to be food for thought.

You could do this quite easily (although it may not be the most efficient way by using Linq).
E.g.
var firstMinImages = cameraImages.Where(
c => c.Key >= DateTime.Now.AddMinutes(-1));
Then do an equivalent query for every time interval. Combine them into one store of images and overwrite your existing store (presuming you dont want to keep them). This will work with your current criteria as the images needed get progressively less over time.

My strategy would be to Group the elements into buckets that you plan to weed out, then pick 1 element from the list to keep... I have made an example of how to do this using a List of DateTimes and Ints but Pics would work exactly the same way.
My Class used to store each Pic
class Pic
{
public DateTime when {get;set;}
public int val {get;set;}
}
and a sample of a few items in the list...
List<Pic> intTime = new List<Pic>();
intTime.Add(new Pic() { when = DateTime.Now, val = 0 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-1), val = 1 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-1.01), val = 2 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-1.02), val = 3 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-2), val = 4 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-2.1), val = 5 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-2.2), val = 6 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-3), val = 7 });
Now I create a helper function to bucket and remove...
private static void KeepOnlyOneFor(List<Pic> intTime, Func<Pic, int> Grouping, DateTime ApplyBefore)
{
var groups = intTime.Where(a => a.when < ApplyBefore).OrderBy(a=>a.when).GroupBy(Grouping);
foreach (var r in groups)
{
var s = r.Where(a=> a != r.LastOrDefault());
intTime.RemoveAll(a => s.Contains(a));
}
}
What this does is lets you specify how to group the object and set an age threshold on the grouping. Now finally to use...
This will remove all but 1 picture per Day for any pics greater than 2 days old:
KeepOnlyOneFor(intTime, a => a.when.Day, DateTime.Now.AddDays(-2));
This will remove all but 1 picture for each Hour after 1 day old:
KeepOnlyOneFor(intTime, a => a.when.Hour, DateTime.Now.AddDays(-1));

If you are on .net 4 you could use a MemoryCache for each interval with CachItemPolicy objects to expire them when you want them to expire and UpdateCallbacks to move some to the next interval.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.