Searching with Linq

I have a collection of objects, each with an int Frame property. Given an int, I want to find the object in the collection that has the closest Frame.
Here is what I'm doing so far:
public static void Search(int frameNumber)
{
var differences = (from rec in _records
select new { FrameDiff = Math.Abs(rec.Frame - frameNumber), Record = rec }).OrderBy(x => x.FrameDiff);
var closestRecord = differences.FirstOrDefault().Record;
//continue work...
}
This is great and everything, except there are 200,000 items in my collection and I call this method very frequently. Is there a relatively easy, more efficient way to do this?

var closestRecord = _records.MinBy(rec => Math.Abs(rec.Frame - frameNumber));
using MinBy from MoreLINQ.

What you might want to try is to store the frames in a data structure that's sorted by Frame. Then you can do a binary search whenever you need to find the one closest to a given frameNumber.
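A minimal sketch of that idea, assuming the records are sorted by Frame once up front and kept in a list. The Record class and the ClosestFrameSearch.FindClosest name are hypothetical stand-ins, not part of the original code:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's record type; only Frame matters here.
public sealed class Record
{
    public int Frame { get; set; }
}

public static class ClosestFrameSearch
{
    // Assumes 'sorted' is already ordered by Frame, ascending.
    // Returns null when the list is empty.
    public static Record FindClosest(IReadOnlyList<Record> sorted, int frameNumber)
    {
        if (sorted.Count == 0) return null;

        // Lower-bound binary search: first index whose Frame >= frameNumber.
        int lo = 0, hi = sorted.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid].Frame < frameNumber) lo = mid + 1;
            else hi = mid;
        }

        // The closest record is either at 'lo' or immediately before it.
        if (lo == sorted.Count) return sorted[lo - 1];
        if (lo == 0) return sorted[0];

        Record before = sorted[lo - 1], after = sorted[lo];
        // Use long so the subtraction cannot overflow for extreme frame values.
        long diffBefore = (long)frameNumber - before.Frame;
        long diffAfter = (long)after.Frame - frameNumber;
        return diffBefore <= diffAfter ? before : after; // ties go to the earlier frame
    }
}
```

Each lookup is then O(log n) instead of a full scan or sort, at the cost of sorting the collection once (for example with `_records.OrderBy(r => r.Frame).ToList()`) and keeping it sorted when records are added.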

I don't know that I would use LINQ for this, at least not with an orderby.
static Record FindClosestRecord(IEnumerable<Record> records, int number)
{
Record closest = null;
int leastDifference = int.MaxValue;
foreach (Record record in records)
{
int difference = Math.Abs(number - record.Frame);
if (difference == 0)
{
return record; // exact match, return early
}
else if (difference < leastDifference)
{
leastDifference = difference;
closest = record;
}
}
return closest;
}

You can combine your statements into one, like this:
var closestRecord = (from rec in _records
select new { FrameDiff = Math.Abs(rec.Frame - frameNumber),
Record = rec }).OrderBy(x => x.FrameDiff).FirstOrDefault().Record;

Maybe you could divide your big item list into 5 to 10 smaller lists that are ordered by their FrameDiff or something?
That way the search is faster if you know which list you need to search.

Related

Loop to check for duplicate strings

I want to create a loop to check a list of titles for duplicates.
I currently have this:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var x in productTitles)
{
var title = x.Text;
productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var y in productTitles.Skip(productTitles.IndexOf(x) + 1))
{
if (title == y.Text)
{
Assert.Fail("Found duplicate product in the table");
}
}
}
But this takes the item I skip out of the array for the next loop, so item 2 is never checked against item 1; it moves straight to item 3.
I was under the impression that Skip just passed over the index you pass in rather than removing it from the list.

You can use GroupBy:
var anyDuplicates = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.GroupBy(p => p.Text, p => p)
.Any(g => g.Count() > 1);
Assert.That(anyDuplicates, Is.False);
or Distinct:
var productTitles = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text)
.ToArray();
var distinctProductTitles = productTitles.Distinct().ToArray();
Assert.AreEqual(productTitles.Length, distinctProductTitles.Length);
Or, if it is enough to find a first duplicate without counting all of them it's better to use a HashSet<T>:
var titles = new HashSet<string>();
foreach (var title in SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text))
{
if (!titles.Add(title))
{
Assert.Fail("Found duplicate product in the table");
}
}
All of these approaches are better in terms of computational complexity (O(n)) than what you propose (O(n²)).

You don't need a loop. Simply use the Where() function to find all same titles, and if there is more than one, then they're duplicates:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach(var x in productTitles) {
if (productTitles.Where(y => x.Text == y.Text).Count() > 1) {
Assert.Fail("Found duplicate product in the table");
}
}

I would try a slightly different way since you only need to check for duplicates in a one-dimensional array.
You only have to check the previous element with the next element within the array/collection so using Linq to iterate through all of the items seems a bit unnecessary.
Here's a piece of code to better understand:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
for ( int i = 0; i < productTitles.Count; i++ )
{
var currentObject = productTitles[i];
// compare only against the elements that come after index i
for ( int j = i + 1; j < productTitles.Count; j++ )
{
if ( currentObject.Text == productTitles[j].Text )
{
// here's your duplicate
}
}
}
Since you've checked that item at index 0 is not the same as item placed at index 3 there's no need to check that again when you're at index 3. The items will remain the same.

The Skip(IEnumerable, n) method returns an IEnumerable that doesn't "contain" the first n elements of the IEnumerable it's called on.
Also I don't know what sort of behaviour could arise from this, but I wouldn't assign a new IEnumerable to the variable over which the foreach is being executed.
Here's another possible solution with LINQ:
int i = 0;
foreach (var x in productTitles)
{
// skip x itself and everything before it, then look for a later element with the same text
var possibleDuplicate = productTitles.Skip(++i).FirstOrDefault(y => y.Text == x.Text);
//if possibleDuplicate is not default value of type
//do stuff here
}
This goes without saying, but the best solution for you will depend on what you are trying to do. Also, I think the Skip method call is more trouble than it's worth, as I'm pretty sure it will make the search less efficient.

Filter c# list using timestamp, Take first record of each 5 seconds

I have a scenario where I need to filter records based on their timings: I want the first record in each range of 5 seconds.
Example :
Input data :
data timings
1452 10:00:11
1455 10:00:11
1252 10:00:13
1952 10:00:15
1454 10:00:17
1451 10:00:19
1425 10:00:20
1425 10:00:21
1459 10:00:23
1422 10:00:24
Expected output
1452 10:00:11
1454 10:00:17
1459 10:00:23
I have tried to group the data based on timings like below
listSpacialRecords=listSpacialRecords.GroupBy(x => x.timings).Select(x => x.FirstOrDefault()).ToList();
But using this I can only filter the records that have the same time.
I hope someone can help me resolve this.
The list contains a huge amount of data, so is there any way other than looping through the list?

This works for me:
var results =
source
.Skip(1)
.Aggregate(
source.Take(1).ToList(),
(a, x) =>
{
if (x.timings.Subtract(a.Last().timings).TotalSeconds >= 5.0)
{
a.Add(x);
}
return a;
});
I get your desired output.

This should do (assuming listSpacialRecords is in order)
var result = new List<DateTime>();
var distance = TimeSpan.FromSeconds(5);
var pivot = default(DateTime);
foreach(var record in listSpacialRecords)
{
if(record.timings > pivot)
{
result.Add(record.timings); // yield return record.timings; as an alternative if you need deferred execution
pivot = record.timings +distance;
}
}
If not, the easiest but maybe not the most efficient way would be to change the foreach a little bit:
foreach(var record in listSpacialRecords.OrderBy(r => r.timings))
Doing this using only LINQ is possible, but it won't benefit readability.

assuming your class looks something like this:
public class DataNode
{
public int Data { get; set; }
public TimeSpan Timings { get; set; }
}
I wrote an extension method:
public static IEnumerable<DataNode> TimeFilter(this IEnumerable<DataNode> list, int timeDifference )
{
DataNode LastFound = null;
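foreach (var item in list.OrderBy(p => p.Timings))
{
// the first item is always taken; later items must be at least timeDifference seconds after the last one taken
if (LastFound == null || item.Timings >= LastFound.Timings.Add(new TimeSpan(0,0,timeDifference)))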
{
LastFound = item;
yield return item;
}
}
}
This can then be used like this
var list = new List<DataNode>();
var result = list.TimeFilter(5);

Something like this approach may work, using the % Operator (Modulo)
Assumptions
The list is in order
You don't care if it skips missing seconds
There is always a first element
And this is only within a 24 hour period
Note : Totally untested
var seconds = listSpacialRecords
.First() // get the first element
.timings
.TimeOfDay // convert it to TimeSpan
.TotalSeconds; // get the total seconds of the day
var result = listSpacialRecords
.Where(x => (x.timings
.TimeOfDay
.TotalSeconds - seconds) % 5 == 0)
// get the difference and mod 5
.ToList();

LINQ: Efficient way to get sum of all child items

I have the following for each loop to get sum of all child objects. Is there a better way using LINQ?
Purchase p1 = db.Purchases.FirstOrDefault(p => p.PurchaseId == 1);
int total = 0;
foreach (SellingItem item in p1.SellingItems)
{
total = total + Convert.ToInt32(item.Price);
}
REFERENCE:
How to get the sum of the volume of highest and lowest priced items in linq
Using the ALL operator in linq to filter child items of EntitySet

Sounds like you just want:
// Any reason for FirstOrDefault rather than SingleOrDefault?
var purchase = db.Purchases.FirstOrDefault(p => p.PurchaseId == 1);
if (purchase != null)
{
var total = purchase.SellingItems
.Sum(x => Convert.ToInt32(x.Price));
...
}
// TODO: Work out what to do if there aren't any such purchases
Why do you need the conversion of the price though? What type is Price, and why isn't it already the right type? (And does that really want to be int rather than decimal?)

p1.SellingItems.Sum(p => p.Price)

Try using the Linq Sum method:
Purchase p1 = db.Purchases.FirstOrDefault(p => p.PurchaseId == 1);
int total = p1.SellingItems.Sum(item => Convert.ToInt32(item.Price));
It is not more efficient, in that it will not be any faster. But it is more concise.

Query Nested Dictionary

I was curious if anyone had a good way of solving this problem efficiently. I currently have the following object.
Dictionary<int, Dictionary<double, CustomStruct>>
struct CustomStruct
{
double value1;
double value2;
...
}
Given that I know the 'int' I want to access, I need to know how to return the 'double key' for the dictionary that has the lowest sum of (value1 + value2). Any help would be greatly appreciated. I was trying to use Linq, but any method would be appreciated.

var result = dict[someInt].MinBy(kvp => kvp.Value.value1 + kvp.Value.value2).Key;
using the MinBy Extension Method from the awesome MoreLINQ project.

Using just plain LINQ:
Dictionary<int, Dictionary<double, CustomStruct>> dict = ...;
int id = ...;
var minimum =
(from kvp in dict[id]
// group the keys (double) by their sums
group kvp.Key by kvp.Value.value1 + kvp.Value.value2 into g
orderby g.Key // sort group keys (sums) in ascending order
select g.First()) // select the first key (double) in the group
.First(); // return first key in the sorted collection of keys
Whenever you want to get the minimum or maximum item using plain LINQ, you usually have to do it with a combination of GroupBy(), OrderBy() and First()/Last().

A Dictionary<TKey,TValue> is also a sequence of KeyValuePair<TKey,TValue>. You can select the KeyValuePair with the least sum of values and get its key.
Using pure LINQ to Objects:
dict[someInt].OrderBy(item => item.Value.value1 + item.Value.value2)
.Select(item => item.Key)
.FirstOrDefault();

Here is the non-LINQ way. It is not shorter than its LINQ counterparts, but it is much more efficient because it does no sorting, unlike most of the LINQ solutions, and sorting may turn out expensive if the collection is large.
The MinBy solution from dtb is a good one but it requires an external library. I do like LINQ a lot but sometimes you should remind yourself that a foreach loop with a few local variables is not archaic or an error.
CustomStruct Min(Dictionary<double, CustomStruct> input)
{
CustomStruct lret = default(CustomStruct);
double lastSum = double.MaxValue;
foreach (var kvp in input)
{
var other = kvp.Value;
var newSum = other.value1 + other.value2;
if (newSum < lastSum)
{
lastSum = newSum;
lret = other;
}
}
return lret;
}
If you want to use the LINQ method without an external library, you can create your own MinBy like this one:
public static class Extensions
{
public static T MinBy<T>(this IEnumerable<T> coll, Func<T,double> criteria)
{
T lret = default(T);
double last = double.MaxValue;
foreach (var v in coll)
{
var newLast = criteria(v);
if (newLast < last)
{
last = newLast;
lret = v;
}
}
return lret;
}
}
It is not as efficient as the first one, but it does the job and is more reusable and composable than the first one. Your solution with Aggregate is innovative, but it recalculates the sum of the current best match for every item it is compared to, because you do not carry enough state between the Aggregate calls.

Thanks for all the help guys, found out this way too:
dict[someInt].Aggregate(
(seed, o) =>
{
var v = seed.Value.TotalCut + seed.Value.TotalFill;
var k = o.Value.TotalCut + o.Value.TotalFill;
return v < k ? seed : o;
}).Key;
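The recalculation mentioned in the previous answer can be avoided by carrying the best sum along in the accumulator. This is only a sketch: the CustomStruct below is a hypothetical stand-in with public value1/value2 fields as in the question, and MinSumLookup.KeyOfMinSum is an illustrative name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's struct; fields made public for access.
public struct CustomStruct
{
    public double value1;
    public double value2;
}

public static class MinSumLookup
{
    // Returns the key whose value has the smallest value1 + value2.
    // Throws InvalidOperationException when the dictionary is empty.
    public static double KeyOfMinSum(Dictionary<double, CustomStruct> inner)
    {
        if (inner.Count == 0)
            throw new InvalidOperationException("Dictionary is empty.");

        // The accumulator carries the best key together with its sum,
        // so each element's sum is computed exactly once.
        var best = inner.Aggregate(
            (Key: 0.0, Sum: double.PositiveInfinity, Found: false),
            (acc, kvp) =>
            {
                double sum = kvp.Value.value1 + kvp.Value.value2;
                if (!acc.Found || sum < acc.Sum) return (kvp.Key, sum, true);
                return acc;
            });
        return best.Key;
    }
}
```

It would be called as `MinSumLookup.KeyOfMinSum(dict[someInt])`. This remains a single O(n) pass, like the foreach version above.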

Processing collection in sets

I have a C# generic list of customer IDs (customerIdsList). Let's say its count is 25.
I need to pass these IDs in sets of 10 (a value which would be configurable and read from app.config)
to another method, ProcessCustomerIds(), which would process these customer IDs one by one.
I.e. the first iteration will pass 10, the next will pass the next 10 customer IDs, and the last one will pass 5 IDs, and so on.
How do I achieve this using Linq?
Shall I be using Math.DivRem to do this?
int result=0;
int quotient = Math.DivRem(customerIdsList.Count, 10, out result);
Output:
quotient=2
result=5
So, I will iterate customerIdsList 2 times and invoke ProcessCustomerIds() in each step.
And if result value is greater than 0,then I will do customerIdsList.Skip(25-result) to get the last 5 customerIds from the collection.
Is there any other cleaner, more efficient way to do this? Please advise.

In our project, we have an extension method "Slice" which does exactly what you ask. It looks like this:
public static IEnumerable<IEnumerable<T>> Slice<T>(this IEnumerable<T> list, int size)
{
var slice = new List<T>();
foreach (T item in list)
{
slice.Add(item);
if (slice.Count >= size)
{
yield return slice;
slice = new List<T>();
}
}
if (slice.Count > 0) yield return slice;
}
You use it like this:
customerIdsList.Slice(10).ToList().ForEach(ProcessCustomerIds);
An important feature of this implementation is that it supports deferred execution. (Contrary to an approach using GroupBy). Granted, this doesn't matter most of the time, but sometimes it does.
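If the project targets .NET 6 or later (an assumption, since the question does not say which framework is used), the built-in Enumerable.Chunk method does the same kind of slicing with deferred execution. A minimal sketch, with int IDs and the ChunkExample name chosen purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkExample
{
    // Requires .NET 6 or later, where Enumerable.Chunk is part of System.Linq.
    // Each chunk is an array of at most 'size' elements; the last one may be shorter.
    public static List<int[]> SplitIntoBatches(IEnumerable<int> customerIds, int size)
    {
        return customerIds.Chunk(size).ToList();
    }
}
```

On older frameworks the Slice extension above remains the way to go.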

You could always use this to group the collection:
var n = 10;
var groups = customerIdsList
.Select((id, index) => new { id, index = index / n })
.GroupBy(x => x.index);
Then just run through the groups and issue the members of the group to the server one group at a time.
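Running through the groups could look roughly like this. It is a sketch only: the IDs are assumed to be ints, and BatchByGroup.ProcessInBatches and its process callback are hypothetical names standing in for the call to ProcessCustomerIds:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchByGroup
{
    // Splits ids into consecutive batches of at most batchSize
    // and hands each batch to the supplied callback.
    public static void ProcessInBatches(
        IEnumerable<int> ids, int batchSize, Action<List<int>> process)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var groups = ids
            .Select((id, index) => new { id, batch = index / batchSize })
            .GroupBy(x => x.batch, x => x.id); // keep only the id in each group

        foreach (var group in groups)
        {
            process(group.ToList());
        }
    }
}
```

Note that GroupBy has to read the whole source before it yields the first group, so this variant does not stream the way a yield-based slicer does.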

Yes, you can use Skip and Take methods.
For example:
List <MyObject> list = ...;
int pageSize = 10;
int pageCount = (list.Count + pageSize - 1) / pageSize; // round up so the last partial page is included
for (int i = 0; i < pageCount; i++){
int currentItem = i * pageSize;
var query = list.OrderBy(obj => obj.Id).Skip(currentItem).Take(pageSize);
// call method
}
Remember to order the list if you want to use Skip and Take .

A simple extension:
public static class Extensions
{
public static IEnumerable<IEnumerable<T>> Chunks<T>(this List<T> source, int size)
{
for (int i = 0; i < source.Count; i += size)
{
// Take(size) simply returns fewer items for the last chunk, so no special case is needed
yield return source.Skip(i).Take(size);
}
}
}
And then use it like:
var chunks = customerIdsList.Chunks(10);
foreach(var c in chunks)
{
ProcessCustomerIds(c);
}
