Is there a faster way of obtaining the closest previous (past) DateTime from a list of DateTimes when compared to a specific time? (the list comes from a SQL database)
public DateTime? GetClosestPreviousDateTime(List<DateTime> dateTimes, DateTime specificTime)
{
    // DateTime is a struct and cannot be null, so "nothing found" needs a nullable
    DateTime? ret = null;
    var lowestDifference = TimeSpan.MaxValue;
    foreach (var date in dateTimes)
    {
        if (date >= specificTime)
            continue;
        var difference = specificTime - date;
        if (difference < lowestDifference)
        {
            lowestDifference = difference;
            ret = date;
        }
    }
    return ret;
}
The source list will be sorted since the dates in the list come from a SQL database where they are written consecutively.
It depends what you mean by "faster". The algorithm you show is O(N), and for an unsorted list you won't get faster than that, if by faster you mean a way to avoid iterating over all the dates.
But if you mean can you shave off a few microseconds with some code that doesn't emit quite as many op codes, then yes of course. But is that really the issue here?
The answer will also change based on the size of the list, how accurate you need the answer to be, whether we can make any assumptions on the data (e.g. is it already sorted).
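dateTimes.Sort();
// Only works when `search` itself is in the list and is not its first element;
// IndexOf returns -1 when the value is absent.
var closest = dateTimes[dateTimes.IndexOf(search) - 1];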
Your problem is a classic search algorithm and binary search might suit you.
Sort list: dateTimes.Sort();
Apply Binary Search algo with similar logic as in your for statement.
dateTimes.Where(x => x < specificTime).Max()
or if you want to handle the case where none exist:
dateTimes.Where(x => x < specificTime).DefaultIfEmpty().Max()
Later edit: Now you introduce new information that the List<> is already sorted. That was not in the question before.
With a sorted List<>, your algorithm is silly since it foreaches on and on, even after having reached the point where the entries "pass" the threshold specificTime. You can use instead BinarySearch (assuming List<> is sorted in ascending order and contains no duplicates):
static DateTime GetClosestPreviousDateTime(List<DateTime> dateTimes, DateTime specificTime)
{
var search = dateTimes.BinarySearch(specificTime);
var index = (search < 0 ? ~search : search) - 1;
if (index == -1)
throw new InvalidOperationException("Not found");
return dateTimes[index];
}
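If the list can contain duplicates, or you would rather get "nothing found" back than an exception, a hand-written lower-bound search is one option. The following is only a sketch: the class name SortedDateSearch and the methods LowerBound and ClosestPrevious are made up for illustration, and the list is assumed to be sorted ascending.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDateSearch
{
    // Index of the first element >= target, or sorted.Count if every element is smaller.
    // Assumes 'sorted' is in ascending order; duplicates are allowed.
    public static int LowerBound(IReadOnlyList<DateTime> sorted, DateTime target)
    {
        int lo = 0, hi = sorted.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] < target)
                lo = mid + 1;   // everything up to mid is too small
            else
                hi = mid;       // mid might be the first element >= target
        }
        return lo;
    }

    // Closest value strictly before target, or null when no such value exists.
    public static DateTime? ClosestPrevious(IReadOnlyList<DateTime> sorted, DateTime target)
    {
        int index = LowerBound(sorted, target) - 1;
        return index >= 0 ? sorted[index] : (DateTime?)null;
    }
}
```

Because the search stops at the first element that is not smaller than the target, a run of entries equal to specificTime is skipped as a whole.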
If you want to do it faster, just ask the database for the value, it will know how to find the answer fast; do not fetch the entire List<> to memory first. Use SQL or LINQ to the database.
Related
I would like some help making this comparison faster (sample below). The sample steps a comparison variable forward one hour at a time and checks every value in an array against it. If no value matches, it adds a value to a second array (the two are concatenated later).
if (ticks.TypeOf == Period.Hour)
while (compareAt <= endAt)
{
if (range.Where(d => d.time.AddMinutes(-d.time.Minute) == compareAt).Count() < 1)
gaps.Add(new SomeValue() {
...some dummy values.. });
compareAt = compareAt.AddTicks(ticks.Ticks);
}
This is too slow when it comes to, for example, hours. There are at most 365 * 24 = 8760 values in this array. In the future there will also be minutes per month, 60 * 24 * 31 = 44640 values, which makes it unusable.
If the array were usually complete (no gaps or empty slots), the loop could easily be bypassed with if (range.Count() == (hours/day * days)). That will not be the case any time soon, though.
How would I solve this more efficiently?
One example: if there are 7800 values in the array, we are missing about 950, right? Can I find just the ends of the gaps and create only the missing values? That would make the complexity depend on the number of gaps, not the number of values.
Another welcome answer is simply a more efficient loop.
[Edit]
Sorry for my bad English; I am doing my best to describe the problem.
Your performance is low because the range lookup is not using any indexing and rechecks the entire range every time.
One way to do this a lot quicker:
if (ticks.TypeOf == Period.Hour)
{
// fill a hashset with the range's unique hourly values
var rangehs = new HashSet<DateTime>();
foreach (var r in range)
{
rangehs.Add(r.time.AddMinutes(-r.time.Minute));
}
// walk all the hours
while (compareAt <= endAt)
{
// quickly check if it's a gap
if (!rangehs.Contains(compareAt))
gaps.Add(new SomeValue() { ...some dummy values..});
compareAt = compareAt.AddTicks(ticks.Ticks);
}
}
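If the values are already sorted, the hash set can be dropped and the gaps found in a single pass with two cursors. This is only a sketch under assumptions: GapFinder and MissingSlots are made-up names, and the input is assumed to be ascending and already truncated to slot boundaries (for example whole hours).

```csharp
using System;
using System.Collections.Generic;

public static class GapFinder
{
    // Yields every slot in [start, end], stepping by 'step', that is absent from 'sortedSlots'.
    // Assumes sortedSlots is ascending, truncated to slot boundaries, and step is positive.
    public static IEnumerable<DateTime> MissingSlots(
        IReadOnlyList<DateTime> sortedSlots, DateTime start, DateTime end, TimeSpan step)
    {
        int i = 0;
        for (var slot = start; slot <= end; slot += step)
        {
            // Move past stored values that fall before the slot being examined.
            while (i < sortedSlots.Count && sortedSlots[i] < slot)
                i++;

            // No stored value at this slot: it is a gap.
            if (i == sortedSlots.Count || sortedSlots[i] != slot)
                yield return slot;
        }
    }
}
```

Each stored value and each slot is visited once, so the cost is the number of values plus the number of slots rather than their product.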
I'm calculating intersection of 2 sets of sorted numbers in a time-critical part of my application. This calculation is the biggest bottleneck of the whole application so I need to speed it up.
I've tried a bunch of simple options and am currently using this:
foreach (var index in firstSet)
{
if (secondSet.BinarySearch(index) < 0)
continue;
//do stuff
}
Both firstSet and secondSet are of type List.
I've also tried using LINQ:
var intersection = firstSet.Where(t => secondSet.BinarySearch(t) >= 0).ToList();
and then looping through intersection.
But as both of these sets are sorted I feel there's a better way to do it. Note that I can't remove items from sets to make them smaller. Both sets usually consist of about 50 items each.
Please help me guys as I don't have a lot of time to get this thing done. Thanks.
NOTE: I'm doing this about 5.3 million times. So every microsecond counts.
If you have two sets which are both sorted, you can implement a faster intersection than anything provided out of the box with LINQ.
Basically, keep two IEnumerator<T> cursors open, one for each set. At any point, advance whichever has the smaller value. If they match at any point, advance them both, and so on until you reach the end of either iterator.
The nice thing about this is that you only need to iterate over each set once, and you can do it in O(1) memory.
Here's a sample implementation - untested, but it does compile :) It assumes that both of the incoming sequences are duplicate-free and sorted, both according to the comparer provided (pass in Comparer<T>.Default):
(There's more text at the end of the answer!)
static IEnumerable<T> IntersectSorted<T>(this IEnumerable<T> sequence1,
IEnumerable<T> sequence2,
IComparer<T> comparer)
{
using (var cursor1 = sequence1.GetEnumerator())
using (var cursor2 = sequence2.GetEnumerator())
{
if (!cursor1.MoveNext() || !cursor2.MoveNext())
{
yield break;
}
var value1 = cursor1.Current;
var value2 = cursor2.Current;
while (true)
{
int comparison = comparer.Compare(value1, value2);
if (comparison < 0)
{
if (!cursor1.MoveNext())
{
yield break;
}
value1 = cursor1.Current;
}
else if (comparison > 0)
{
if (!cursor2.MoveNext())
{
yield break;
}
value2 = cursor2.Current;
}
else
{
yield return value1;
if (!cursor1.MoveNext() || !cursor2.MoveNext())
{
yield break;
}
value1 = cursor1.Current;
value2 = cursor2.Current;
}
}
}
}
EDIT: As noted in comments, in some cases you may have one input which is much larger than the other, in which case you could potentially save a lot of time by using a binary search for each element from the smaller set within the larger set. This requires random access to the larger set, however (it's just a prerequisite of binary search). You can even make it slightly better than a naive binary search by using the match from the previous result to give a lower bound to the next binary search. So suppose you were looking for values 1000, 2000 and 3000 in a set with every integer from 0 to 19,999. In the first iteration, you'd need to look across the whole set - your starting lower/upper indexes would be 0 and 19,999 respectively. After you'd found a match at index 1000, however, the next step (where you're looking for 2000) can start with a lower index of 1001. As you progress, the range in which you need to search gradually narrows. Whether this is worth the extra implementation cost is a different matter, however.
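A rough sketch of that narrowing search, using the List<T>.BinarySearch overload that takes a start index and a count. The class and method names (NarrowingIntersect, Intersect) are hypothetical, and both lists are assumed to be sorted by the supplied comparer and free of duplicates.

```csharp
using System;
using System.Collections.Generic;

public static class NarrowingIntersect
{
    // Intersects a small sorted list with a large sorted list.
    // Each binary search starts where the previous one ended.
    public static List<T> Intersect<T>(List<T> small, List<T> large, IComparer<T> comparer)
    {
        var result = new List<T>();
        int lower = 0; // first index of 'large' that can still contain a match

        foreach (var item in small)
        {
            if (lower >= large.Count)
                break; // nothing left to search

            int found = large.BinarySearch(lower, large.Count - lower, item, comparer);
            if (found >= 0)
            {
                result.Add(item);
                lower = found + 1; // later items are larger, so they sit after this match
            }
            else
            {
                lower = ~found; // index of the first element larger than item
            }
        }
        return result;
    }
}
```

Whether this beats the plain two-cursor walk depends on how lopsided the two sizes are, so it is worth measuring on real data.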
Since both lists are sorted, you can arrive at the solution by iterating over them at most once (you may also get to skip part of one list, depending on the actual values they contain).
This solution keeps a "pointer" to the part of list we have not yet examined, and compares the first not-examined number of each list between them. If one is smaller than the other, the pointer to the list it belongs to is incremented to point to the next number. If they are equal, the number is added to the intersection result and both pointers are incremented.
var firstCount = firstSet.Count;
var secondCount = secondSet.Count;
int firstIndex = 0, secondIndex = 0;
var intersection = new List<int>();
while (firstIndex < firstCount && secondIndex < secondCount)
{
var comp = firstSet[firstIndex].CompareTo(secondSet[secondIndex]);
if (comp < 0) {
++firstIndex;
}
else if (comp > 0) {
++secondIndex;
}
else {
intersection.Add(firstSet[firstIndex]);
++firstIndex;
++secondIndex;
}
}
The above is a textbook C-style approach of solving this particular problem, and given the simplicity of the code I would be surprised to see a faster solution.
You're using a rather inefficient LINQ method for this sort of task; you should opt for Intersect as a starting point.
var intersection = firstSet.Intersect(secondSet);
Try this. If you measure it for performance and still find it unwieldy, cry for further help (or perhaps follow Jon Skeet's approach).
I was using Jon's approach but needed to execute this intersect hundreds of thousands of times for a bulk operation on very large sets and needed more performance. The case I was running into was heavily imbalanced list sizes (e.g. 5 and 80,000), and I wanted to avoid iterating the entire large list.
I found that detecting the imbalance and switching to an alternate algorithm gave me huge benefits on specific data sets:
public static IEnumerable<T> IntersectSorted<T>(this List<T> sequence1,
List<T> sequence2,
IComparer<T> comparer)
{
List<T> smallList = null;
List<T> largeList = null;
if (sequence1.Count() < Math.Log(sequence2.Count(), 2))
{
smallList = sequence1;
largeList = sequence2;
}
else if (sequence2.Count() < Math.Log(sequence1.Count(), 2))
{
smallList = sequence2;
largeList = sequence1;
}
if (smallList != null)
{
foreach (var item in smallList)
{
if (largeList.BinarySearch(item, comparer) >= 0)
{
yield return item;
}
}
}
else
{
//Use Jon's method
}
}
I am still unsure about the point at which you break even; I need to do some more testing.
try
firstSet.Intersect(secondSet).ToList()
or
firstSet.Join(secondSet, o => o, id => id, (o, id) => o)
My LINQ query returns an ordered sequence of calendar dates, and I need to output this sequence starting from the earliest date that is more than the given number of days apart from the starting date of the sequence.
The code below does that using linear search. It seems that I could use a binary search to find the beginning date if the LINQ query supported this.
In this contrived example I can search the list, but in my real code I am trying to avoid storing the whole sequence in memory and would prefer to use just IEnumerable.
Any ideas how to make it more efficient? I have thousands of items in my query, and doing a linear search is just lame...
Thanks
konstantin
using System;
using System.Collections.Generic;
using System.Linq;
namespace consapp
{
static class Program
{
static void Main(string[] args)
{
var dates = new List<DateTime>();
var xs = dates.OrderBy(x => x);
dates.Add(DateTime.Parse("11/10/11"));
dates.Add(DateTime.Parse("02/02/11"));
dates.Add(DateTime.Parse("11/24/11"));
dates.Add(DateTime.Parse("09/09/11"));
dates.Add(DateTime.Parse("11/10/11"));
var d = DateTime.MinValue;
double offset = 1.2;
foreach (var x in xs)
{
if (d != DateTime.MinValue)
{
offset -= (x - d).Days;
}
if (offset < 1)
{
Console.WriteLine(x.ToShortDateString());
}
d = x;
}
}
}
}
Binary search would likely be better if your data set is pre-sorted or you do not know the start date of your sequence ahead of time. However if you are sorting your dates using OrderBy like your example and you do know the start date of the sequence, why not put a Where clause in to filter out the dates that don't meet your criteria before you order the sequence?
var xs = from date in dates
where (date - target).Days < 1.2
orderby date
select date;
If you have the sorted dates as an IEnumerable in a sortedDates collection, here is how you can select the dates that are later than a threshold from the first date:
var threshold = TimeSpan.FromDays(1);
var filteredDates = sortedDates.SkipWhile(sd => sd - sortedDates.First() <= threshold);
It has the advantage over .Where that it only needs to check the leading dates until it reaches the threshold. After that it just enumerates the remaining elements.
Note that the result is an IEnumerable, so you get all the benefits of lazy evaluation.
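One thing to watch with the lambda form is that sortedDates.First() is evaluated again for every element tested, which restarts the underlying query each time. A possible variant that enumerates the source only once is sketched below; the extension method name SkipUntilApart is invented here, and the input is assumed to be sorted ascending.

```csharp
using System;
using System.Collections.Generic;

public static class DateSequenceExtensions
{
    // Lazily yields the tail of 'sortedDates' beginning at the first date that is
    // more than 'threshold' after the first date of the sequence.
    // The source is enumerated exactly once.
    public static IEnumerable<DateTime> SkipUntilApart(
        this IEnumerable<DateTime> sortedDates, TimeSpan threshold)
    {
        using (var cursor = sortedDates.GetEnumerator())
        {
            if (!cursor.MoveNext())
                yield break; // empty input

            var first = cursor.Current;
            bool emitting = false;
            do
            {
                // Once the threshold has been passed, no further comparisons are needed.
                if (!emitting && cursor.Current - first > threshold)
                    emitting = true;
                if (emitting)
                    yield return cursor.Current;
            } while (cursor.MoveNext());
        }
    }
}
```

This is still a linear scan up to the threshold; a binary search would need random access, which a plain IEnumerable does not offer.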
I have a list of dates that a machine has worked on, but it doesn't include a date that machine was down. I need to create a list of days worked and not worked. I am not sure of the best way to do this. I have started by incrementing through all the days of a range and checking to see if the date is in the list by iterating through the entire list each time. I am looking for a more efficient means of finding the dates.
class machineday
{
    public DateTime WorkingDay;
}
class machinedaycollection : List<machineday>
{
    // field assumed here; its declaration is not shown in the original post
    private string _CategoryCode;
    public List<machineday> GetAllByCat(string cat)
    {
        _CategoryCode = cat;
        List<machineday> li = this.FindAll(delegate(machineday dummy) { return true; });
        li.Sort(sortDate);
        return li;
    }
    // Comparison used by Sort: orders entries by WorkingDay, ascending
    int sortDate(machineday event1, machineday event2)
    {
        return event1.WorkingDay.CompareTo(event2.WorkingDay);
    }
}
Sort the dates and iterate the resulting list in parallel to incrementing a counter. Whenever the counter does not match the current list element, you've found a date missing in the list.
List<DateTime> days = ...;
days.Sort();
DateTime dt = days[0].Date;
for (int i = 0; i < days.Count; dt = dt.AddDays(1))
{
if (dt == days[i].Date)
{
Console.WriteLine("Worked: {0}", dt);
i++;
}
else
{
Console.WriteLine("Not Worked: {0}", dt);
}
}
(This assumes there are no duplicate days in the list.)
Build a list of valid dates and subtract your machine day collection from it using LINQ's Enumerable.Except extension method. Something like this:
IEnumerable<DateTime> dates = get_candidate_dates();
var holidays = dates.Except(machinedays.Select(m => m.WorkingDay));
The get_candidate_dates() method could even be an iterator that generates all dates within a range on the fly, rather than a pre-stored list of all dates.
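As a rough illustration of that idea, the candidate dates can come from an iterator method and be fed straight into Except. The names DowntimeFinder, EachDay and DaysNotWorked are invented for this sketch, and the range is assumed to be inclusive at both ends.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DowntimeFinder
{
    // Lazily yields each calendar day from 'first' through 'last', inclusive.
    public static IEnumerable<DateTime> EachDay(DateTime first, DateTime last)
    {
        for (var day = first.Date; day <= last.Date; day = day.AddDays(1))
            yield return day;
    }

    // Days in the range on which no work was recorded.
    // Times of day in 'workedDays' are ignored by comparing on .Date.
    public static List<DateTime> DaysNotWorked(
        IEnumerable<DateTime> workedDays, DateTime first, DateTime last)
    {
        return EachDay(first, last)
            .Except(workedDays.Select(d => d.Date))
            .ToList();
    }
}
```

Except builds a set from its argument internally, so each candidate day is checked with a hash lookup instead of a scan of the whole list.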
Enumerable's methods are reasonably smart and will usually do a decent job on the performance side of things, but if you want the fastest possible algorithm, it will depend on how you plan to consume the result.
Sorry, but I don't much like the other solutions.
I think you should build a hash table from your dates. You can do this by iterating over the working days only once.
Then you iterate over the full range of days, and for every one you ask the hash table whether the date is there or not, using
myHashTable.ContainsKey(day); // this is efficient
Simple, elegant and fast.
I think your solution takes quadratic time (it rescans the whole list for every day in the range); this one is linear, which is a good thing.
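A minimal sketch of that approach, using HashSet<DateTime> in place of a keyed hash table since only membership matters. WorkLog and Classify are hypothetical names, and the result is simply a list of (day, worked) pairs.

```csharp
using System;
using System.Collections.Generic;

public static class WorkLog
{
    // Returns one entry per calendar day in [first, last]; Value is true if the machine worked.
    public static List<KeyValuePair<DateTime, bool>> Classify(
        IEnumerable<DateTime> workedDays, DateTime first, DateTime last)
    {
        // One pass over the working days to build the lookup.
        var worked = new HashSet<DateTime>();
        foreach (var d in workedDays)
            worked.Add(d.Date);

        // One pass over the range; each lookup is a constant-time hash probe on average.
        var result = new List<KeyValuePair<DateTime, bool>>();
        for (var day = first.Date; day <= last.Date; day = day.AddDays(1))
            result.Add(new KeyValuePair<DateTime, bool>(day, worked.Contains(day)));

        return result;
    }
}
```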
Assuming the list is sorted and the machine was "working" most of the time, you may be able to avoid iterating through all the dates by grouping the dates by month and skipping the dates in between. Something like this (you'll need to clean up):
int chunksize = 60; // adjust depending on data
machineday currentDay = myMachinedaycollection[0];
for (int i = 0; i + chunksize < myMachinedaycollection.Count; i += chunksize) // stop before i + chunksize runs past the end
{
if (currentDay.WorkingDay.AddDays(chunksize) != myMachinedaycollection[i + chunksize].WorkingDay)
{
// write code to iterate through current chunk and get all the non-working days
}
currentDay = myMachinedaycollection[i + chunksize];
}
I doubt you want a list of days working and not working.
Your question title suggests that you want to know whether the system was up on a particular date. It also seems reasonable to calculate % uptime. Neither of these requires building a list of all time instants in the interval.
Sort the service times. For the first question, do a BinarySearch for the date you care about and check whether the preceding entry was the system being taken offline for maintenance or put back into service. For % uptime, take the (down for maintenance, service restored) entries pair-wise, use subtraction to find the duration of each maintenance period, and add these up. Then use subtraction to find the length of the total interval.
If your question didn't actually mean you were keeping track of maintenance intervals (or equivalently usage intervals) then you can ignore this answer.
So, here's the scenario. I have a file with a created time, and I want to choose the time from a list of times that is closest or equal to that file's created time. What would be the best way to accomplish this?
var closestTime = listOfTimes.OrderBy(t => Math.Abs((t - fileCreateTime).Ticks))
.First();
If you don't want the performance overhead of the OrderBy call then you could use something like the MinBy extension method from MoreLINQ instead:
var closestTime = listOfTimes.MinBy(t => Math.Abs((t - fileCreateTime).Ticks));
Something like this:
DateTime fileDate, closestDate;
ArrayList theDates;
long min = long.MaxValue;
foreach (DateTime date in theDates)
if (Math.Abs(date.Ticks - fileDate.Ticks) < min)
{
min = Math.Abs(date.Ticks - fileDate.Ticks);
closestDate = date;
}
The accepted answer is completely wrong. What you want is something like this:
DateTime fileDate, closestDate;
List<DateTime> theDates;
fileDate = DateTime.Today; //set to the file date
theDates = new List<DateTime>(); //load the date list, obviously (it must not be empty)
// start with the first entry as the best candidate so closestDate is always assigned
closestDate = theDates[0];
long min = Math.Abs(fileDate.Ticks - closestDate.Ticks);
long diff;
foreach (DateTime date in theDates)
{
    diff = Math.Abs(fileDate.Ticks - date.Ticks);
    if (diff < min)
    {
        min = diff;
        closestDate = date;
    }
}
var closestTime = (from t in listOfTimes
orderby (t - fileInfo.CreationTime).Duration()
select t).First();
How often will you be doing this with the same list of times? If you're only doing it once, the fastest way is probably to just scan through the list and keep track of the closest time you've seen yet. When/if you encounter a time that's closer, replace the "closest" with that closer time.
If you're doing it very often, you'd probably want to sort the list, then use a binary search.
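For the repeated-lookup case, a sketch of the sort-then-binary-search idea might look like the following. ClosestTimeFinder and Closest are made-up names, the list is assumed to have been sorted ascending once up front, and ties go to the earlier time.

```csharp
using System;
using System.Collections.Generic;

public static class ClosestTimeFinder
{
    // Returns the element of 'sorted' nearest to 'target'.
    // 'sorted' must be ascending and non-empty; on a tie the earlier time wins.
    public static DateTime Closest(List<DateTime> sorted, DateTime target)
    {
        if (sorted.Count == 0)
            throw new ArgumentException("List must not be empty.", nameof(sorted));

        int pos = sorted.BinarySearch(target);
        if (pos >= 0)
            return sorted[pos]; // exact match

        int next = ~pos; // index of the first element larger than target
        if (next == 0)
            return sorted[0]; // target is before everything
        if (next == sorted.Count)
            return sorted[sorted.Count - 1]; // target is after everything

        // Target lies between two neighbours: pick the nearer one.
        var before = sorted[next - 1];
        var after = sorted[next];
        return (target - before) <= (after - target) ? before : after;
    }
}
```

Each lookup is O(log N) after the one-off sort, which only pays off when the same list is queried many times.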
Get the difference between your file creation time and every time in your list, and sort by the absolute value of each difference. The first one should be the answer you are looking for.
Use the minimum absolute time difference between the file time and the times in the list. You might get two entries that are equally close, in which case you would need another rule to choose between them.
Not an answer, but a question regarding the various LINQ solutions proposed above. How efficient is LINQ? I have not written any "real" programs with LINQ yet, so I'm not sure on the performance.
In this example, the "listOfTimes" collection implies that we have already iterated over some file-system-based objects to gather the times. Would it have been more efficient to do the analysis during that iteration instead of later in LINQ? I recognize that these solutions may be more "elegant" or nicely abstract the "collection as database" idea, but I tend to choose efficiency (it must be readable, though) over elegance in my programming. Just wondering if the cost of LINQ might outweigh the elegance here?
var creationTimes = new [] {DateTime.Now.AddDays(-1), DateTime.Now.AddDays(-2)};
FileInfo fi = new FileInfo("C:/test.xml");
var closestTime = creationTimes
.OrderBy(c => Math.Abs(c.Subtract(fi.CreationTime).Days))
.First();
var min = listoftimes.Select(
x => new { diff = Math.Abs((x - timeoffile).Ticks), time = x}).
OrderBy(x => x.diff).
First().time;
Note: Assumes at least 1 entry in listoftimes.
I thought I would update this post to include a real world scenario. I wanted this sort of function as I have a blog showing news of the latest movie screenings.
However I don't want to list screening in the past (ie screening date past the current date) and as I wanted to show a record I needed some sort of ID passed to pick up the record.
I have left it simple so that you can follow the process and no doubt make it more efficient with LINQ et al.
First the model
public class LatestScreeeningsModel
{
public int Id { get; set; }
public DateTime Date { get; set; }
}
Then the code block you can call from your controller
private static LatestScreeeningsModel GetLatestScreening(IPublishedContent currentNode)
{
LatestScreeeningsModel latestScreening = new LatestScreeeningsModel();
DateTime fileDate;
// get a list of screenings that have not shown yet
var screenings = currentNode.AncestorsOrSelf("siteLanguage")
.FirstOrDefault().Descendants("screening")
.Select(x => new LatestScreeeningsModel() { Id = x.Id, Date = x.GetPropertyValue<DateTime>("date") })
.Where(x => x.Date > DateTime.Now).ToList();
fileDate = DateTime.Today;
long min = Math.Abs(fileDate.Ticks - screenings[0].Date.Ticks);
long diff;
foreach (var comingDate in screenings)
{
diff = Math.Abs(fileDate.Ticks - comingDate.Date.Ticks);
if (diff <= min)
{
min = diff;
latestScreening = comingDate;
}
}
return latestScreening;
}
I am using Umbraco to get the date items but it would work with any custom model, List et al.
Hope it helps
This is a generalized solution to the question, "Find the closest time from a list of times". This solution finds the closest time before and after a given search time. It assumes the list is sorted in ascending order.
//For finding the closest time in a list using a given search time...
var listOfTimes = new List<DateTime>();
listOfTimes.Add(DateTime.Parse("1/1/2000"));
listOfTimes.Add(DateTime.Parse("1/2/2000"));
listOfTimes.Add(DateTime.Parse("1/3/2000"));
listOfTimes.Add(DateTime.Parse("1/4/2000"));
listOfTimes.Add(DateTime.Parse("1/5/2000"));
var searchTime = DateTime.Parse("1/3/2000");
var closestBefore = listOfTimes.LastOrDefault(t => t < searchTime);
var closestAfter = listOfTimes.FirstOrDefault(t => t > searchTime);
Console.WriteLine(closestBefore);
Console.WriteLine(closestAfter);
/*
searchTime: 1/3/2000
before: 1/2/2000 12:00:00 AM
after: 1/4/2000 12:00:00 AM
searchTime: 1/1/1900 (edge case)
before: 1/1/0001 12:00:00 AM (DateTime.MinValue)
after: 1/1/2000 12:00:00 AM
searchTime: 1/1/2001 (edge case)
before: 1/5/2000 12:00:00 AM
after: 1/1/0001 12:00:00 AM (DateTime.MinValue)
*/