LINQ two DataTables join - C#

I am stuck with this problem.
I am able to solve it using foreach, but there has to be a better solution.
I have two DataTables.
The first has a column named "Por".
The second has two columns named "FirstPor" and "LastPor".
My goal is to use LINQ to create a new DataTable based on a condition like this:
foreach (DataRow row in FirstDatatable.Rows)
{
    foreach (DataRow secondRow in SecondDatatable.Rows)
    {
        // The columns are string-typed, so parse before comparing.
        int por = Int32.Parse((string)row["Por"]);
        if (por >= Int32.Parse((string)secondRow["FirstPor"]) && por <= Int32.Parse((string)secondRow["LastPor"]))
            FinalDataTable.ImportRow(row); // a row cannot be added to a second table; ImportRow copies it
    }
}
I am new to LINQ, which could be a problem for me. I am able to do this via Parallel.ForEach, but I think LINQ could be much faster. The condition is simple: take each number from the first table (the "Por" column) and find which row of the second table it belongs to ("Por" >= "FirstPor" && "Por" <= "LastPor"). I think it is simple for anybody who works with this every day.
Yep, there is another catch: the columns are STRING type, so a conversion is needed in the LINQ statement.
Yes, I have just modified my Parallel code into a hybrid LINQ/Parallel version, and it seems I am done. I used what James and Rahul wrote and put it into my code. Now the process takes 52 seconds for 421,000 rows :) That's much better.
public class Range
{
    public int FirstPor { get; set; }
    public int LastPor { get; set; }
    public int PackageId { get; set; }
}
var ranges = (from r in tmpDataTable.AsEnumerable()
              select new Range
              {
                  FirstPor = Int32.Parse(r["FirstPor"] as string),
                  LastPor = Int32.Parse(r["LastPor"] as string),
                  PackageId = Int32.Parse(r["PackageId"] as string)
              }).ToList();
Parallel.ForEach<DataRow>(dt.AsEnumerable(), row =>
{
    int por = Int32.Parse(row["Por"].ToString());
    lock (locker)
    {
        row["PackageId"] = ranges.First(range => por >= range.FirstPor && por <= range.LastPor).PackageId;
    }
    worker.ReportProgress(por);
});

(Untested code ahead.....)
var newRows = FirstDatatable.AsEnumerable()
    .Where(row => SecondDatatable.AsEnumerable()
        .Any(secondRow => Int32.Parse((string)row["Por"]) >= Int32.Parse((string)secondRow["FirstPor"])
                       && Int32.Parse((string)row["Por"]) <= Int32.Parse((string)secondRow["LastPor"])));
foreach (DataRow row in newRows)
    FinalDataTable.ImportRow(row); // DataRowCollection has no AddRange, and a row cannot belong to two tables
However, if speed is your real concern, my first suggestion is to dump the DataTables and use a list. I'm gonna guess that SecondDatatable is largely fixed and probably changes less than once a day. So, let's create a nice in-memory structure for that:
class Range
{
    public int FirstPor { get; set; }
    public int LastPor { get; set; }
}
var ranges = (from r in SecondDatatable.AsEnumerable()
              select new Range
              {
                  FirstPor = Int32.Parse((string)r["FirstPor"]),
                  LastPor = Int32.Parse((string)r["LastPor"])
              }).ToList();
Then our code becomes:
var newRows = FirstDatatable.AsEnumerable()
    .Where(row => ranges
        .Any(range => Int32.Parse((string)row["Por"]) >= range.FirstPor
                   && Int32.Parse((string)row["Por"]) <= range.LastPor))
    .ToList();
Which by itself should make this considerably faster.
Now, on a success, it will scan the ranges only until it finds one that matches. On a failure, it has to scan the whole list before it gives up. So the first thing we can do to speed this up is to sort the list of ranges. Then we only have to scan up to the point where the low end of the range is higher than the value we are looking for. That should roughly halve the processing time for rows that fall outside any range.
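A sketch of that idea (my addition, untested): sort once up front, then bail out of the scan as soon as a range starts above the value being looked up.
// Sort once; assumes the ranges do not overlap, as in the question.
var sortedRanges = ranges.OrderBy(r => r.FirstPor).ToList();

Func<int, bool> inAnyRange = por =>
{
    foreach (var range in sortedRanges)
    {
        if (range.FirstPor > por)
            return false; // every later range starts higher still, so give up early
        if (por <= range.LastPor)
            return true;  // por falls inside this range
    }
    return false;
};

var newRows = FirstDatatable.AsEnumerable()
    .Where(row => inAnyRange(Int32.Parse((string)row["Por"])))
    .ToList();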

Try this:-
DataTable FinalDataTable = (from x in dt1.AsEnumerable()
                            from y in dt2.AsEnumerable()
                            where Int32.Parse(x.Field<string>("Por")) >= Int32.Parse(y.Field<string>("FirstPor"))
                               && Int32.Parse(x.Field<string>("Por")) <= Int32.Parse(y.Field<string>("LastPor"))
                            select x).CopyToDataTable();
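One caveat worth adding: CopyToDataTable throws an InvalidOperationException when the query matches no rows. A guard like this (my addition) keeps it safe:
var matches = (from x in dt1.AsEnumerable()
               from y in dt2.AsEnumerable()
               where Int32.Parse(x.Field<string>("Por")) >= Int32.Parse(y.Field<string>("FirstPor"))
                  && Int32.Parse(x.Field<string>("Por")) <= Int32.Parse(y.Field<string>("LastPor"))
               select x).ToList();
// CopyToDataTable throws on an empty sequence; Clone keeps the schema with zero rows.
DataTable FinalDataTable = matches.Count > 0 ? matches.CopyToDataTable() : dt1.Clone();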
Here is the complete working fiddle (I have tested both your existing code and my LINQ code with some sample data). You can copy-paste it into your editor and test it there, because .NET Fiddle does not support AsEnumerable.

Related

How to SUM a specific column including the current data using LINQ

I have a query that should first add up the amounts in the database from 3 months ago until the current date, and if the total is more than a specific amount which I put in the condition, it should return false.
public Task<bool> SecurityCheck(CustomerData cust)
{
    var checkResult = (from x in dbContext.CustomerModel
                       where x.CustomerReference == cust.CustomerReference
                          && x.Created >= DateTime.Today.AddMonths(-3)
                       select new
                       {
                           AccumulatedAmount = x.AmountToTransfer
                       }).Sum(x => x.AccumulatedAmount);

    var finalResult = checkResult + cust.Amount;
    if (finalResult > 250000)
    {
        // return false
    }
    else
    {
        // store the model in the db
    }
}
First of all, I'm not sure if the way I query is right (the LINQ part). My second question: is there any way to sum everything, including the current incoming amount (cust.Amount), inside a single query, rather than getting the database sum first and then adding the current one to it?
It's slightly long-winded; you could make it:
dbContext.CustomerModel
.Where(cm => cm.CustomerReference == cust.CustomerReference && cm.Created >= DateTime.Today.AddMonths(-3))
.Sum(cm => cm.AmountToTransfer)
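To fold in the second part of the question, a sketch (my addition, assuming an EF async context where SumAsync is available): cust.Amount is a client-side value, so the simplest single expression adds it to the translated sum and compares in one go.
public async Task<bool> SecurityCheck(CustomerData cust)
{
    // The database sums the last three months; cust.Amount is added
    // client-side in the same expression.
    var total = await dbContext.CustomerModel
        .Where(cm => cm.CustomerReference == cust.CustomerReference
                  && cm.Created >= DateTime.Today.AddMonths(-3))
        .SumAsync(cm => cm.AmountToTransfer)
        + cust.Amount;
    return total <= 250000; // false when the total exceeds the limit
}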

Use LINQ to load cumulative average into MVC model

Writing a stats web site for a church softball team.
I have a SQL view that calculates the batting average per game from stats in a table. I use that view to build a model in MVC called PlayerStats.
I want to create and load data into a model that looks like this:
public class BattingAverage
{
    [Key]
    public int stat_id { get; set; }
    public int player_id { get; set; }
    public decimal batting_avg { get; set; }
    public decimal cumulative_batting_avg { get; set; }
}
In my controller I am doing this:
var model = (from s in PlayerStats
             where s.season_id == sid && s.player_id == pid
             orderby s.game_no
             select s).AsEnumerable()
            .Select(b => new BattingAverage
            {
                stat_id = b.stat_id,
                player_id = b.player_id,
                batting_avg = b.batting_avg,
                cumulative_batting_avg = ***[[ WHAT TO DO HERE?? ]]***
            });
I don't know how to calculate that cumulative average to load into my model. The end goal is to Json.Encode the model data for use in AmCharts.
UPDATE - I tried this:
cumulative_batting_avg = getCumulativeAvg(sid, pid, b.game_no)
And my function:
public decimal getCumulativeAvg(int season_id, int player_id, int game_no)
{
    var averages = PlayerStats.Where(g => g.season_id == season_id && g.player_id == player_id && g.game_no <= game_no).ToArray();
    var hits = averages.Select(a => a.Hits).Sum();
    var atbats = averages.Select(a => a.AtBats).Sum();
    if (atbats == 0)
    {
        return 0.0m; // m suffix for decimal type
    }
    else
    {
        return hits / atbats;
    }
}
This returned a correct average in the first row, but then zeroes for the rest. When I put a break on the return, I see that hits and atbats are properly accumulating inside the function, but for some reason the avg isn't being added to the model. What am I doing wrong?
Yeah, basically you want a subquery to pull the average across all of the games; is my understanding correct? You can use the let keyword so that, while the main query is pulling in the context of the current game, the let subquery can pull across all of the games, something like this: Simple Example Subquery Linq
Which might translate to something like:
from s in PlayerStats
let cs = context.PlayerStats.Where(i => i.player_id == s.player_id).Select(i => i.batting_avg).Average()
...
select new {
    batting_avg = s.batting_avg, /* for the current game */
    cumulative_batting_avg = cs
}
Something like that, though I might be off on the syntax a little. With that type of approach I often worry about performance with LINQ subqueries (you never know what SQL it will render), so you may want to consider a stored procedure instead (it really depends on how much data you have).
Now from the comments I think I understand what a cumulative batting average is. It sounds like for a given game, it's an average based on the sum of hits and at bats in that game and the previous games.
So let's say that you have a collection of stats that's already filtered by a player and season:
var playerSeasonStats = PlayerStats.Where(g =>
g.season_id == season_id && g.player_id == player_id && g.game_no <= game_no)
.ToArray();
I originally wrote the next part as a Linq expression. But that's just a habit, and in this case it's much simpler and easier to read with a normal for/each loop.
var averages = new List<BattingAverage>();
int cumulativeHits = 0;
int cumulativeAtBats = 0;
foreach (var stat in playerSeasonStats.OrderBy(stat => stat.game_no))
{
    cumulativeHits += stat.Hits;
    cumulativeAtBats += stat.AtBats;
    var average = new BattingAverage
    {
        player_id = stat.player_id,
        stat_id = stat.stat_id,
        batting_avg = (decimal)stat.Hits / stat.AtBats,             // cast so this isn't integer division
        cumulative_batting_avg = (decimal)cumulativeHits / cumulativeAtBats
    };
    averages.Add(average);
}

Extremely Slow Linq to Excel

I am trying to create an application that will extract some data out of an automatically generated Excel file. This could be done very easily with Access, but the file is in Excel and the solution must be a one-button sort of thing.
For some reason, simply looping through the data without performing any actions is slow. The code below is my attempt at optimizing something that was far slower. I arrived at LinqToExcel after a few attempts at this with the Interop classes, directly and through different wrappers.
I have also read the answers to a few questions on here and on Google. In an attempt to see what is causing the slowness, I removed all instructions from the relevant section except "i++". It is still very slow. I also tried to optimize it by limiting the number of records retrieved with the where clause in the third line, but that didn't work. Your help would be appreciated.
Thank you.
Dictionary<string, double> instructors = new Dictionary<string, double>();
var t = from c in excel.Worksheet("Course_201410_M1")
        // where c["COURSE CODE"].ToString().Substring(0,4) == "COSC" || c["COURSE CODE"].ToString().Substring(0,3) == "COEN" || c["COURSE CODE"].ToString().Substring(0,3) == "GEIT" || c["COURSE CODE"].ToString().Substring(0,3) == "ITAP" || c["COURSE CODE"] == "PRPL 0012" || c["COURSE CODE"] == "ASSE 4311" || c["COURSE CODE"] == "GEEN 2312" || c["COURSE CODE"] == "ITLB 1311"
        select c;
HashSet<string> uniqueForce = new HashSet<string>();
foreach (var c in t)
{
    if (uniqueForce.Add(c["Instructor"].ToString()))
        instructors.Add(c["Instructor"].ToString(), 0.0);
}
foreach (string name in instructors.Keys)
{
    var y = from d in t
            where d["Instructor"] == name
            select d;
    int i = 1;
    foreach (var z in y)
    {
        // This is the really slow part. It takes a couple of minutes to finish,
        // yet the file has fewer than 1000 records.
        i++;
    }
}
Put the query that forms var t into brackets and then call ToList() on it.
var t = (from c in excel.Worksheet("Course_201410_M1")
         select c).ToList();
Due to LINQ's lazy/deferred execution model, whenever you iterate over the collection it will re-query the data source unless you give it a List to work with.
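The same lesson applies to the per-instructor loop: once t is a materialized list, a single GroupBy makes one pass instead of one scan per instructor. A sketch, reusing the question's names:
// One pass over the materialized rows, counting rows per instructor.
var instructors = t
    .GroupBy(c => c["Instructor"].ToString())
    .ToDictionary(g => g.Key, g => (double)g.Count());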

Use LINQ to group a sequence by date with no gaps

I'm trying to select a subgroup of a list where items have contiguous dates, e.g.
ID  StaffID  Title              ActivityDate
--  -------  -----------------  ------------
1   41       Meeting with John  03/06/2010
2   41       Meeting with John  08/06/2010
3   41       Meeting Continues  09/06/2010
4   41       Meeting Continues  10/06/2010
5   41       Meeting with Kay   14/06/2010
6   41       Meeting Continues  15/06/2010
I'm using a pivot point each time; taking the example pivot item as 3, I'd like to get the following contiguous events around the pivot:
ID  StaffID  Title              ActivityDate
--  -------  -----------------  ------------
2   41       Meeting with John  08/06/2010
3   41       Meeting Continues  09/06/2010
4   41       Meeting Continues  10/06/2010
My current implementation is a laborious "walk" into the past, then into the future, to build the list:
var activity = // item number 3: Meeting Continues (09/06/2010)
var orderedEvents = activities.OrderBy(a => a.ActivityDate).ToArray();
// Walk into the past until a gap is found
var preceedingEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID);
DateTime dayBefore;
var previousEvent = activity;
while (previousEvent != null)
{
dayBefore = previousEvent.ActivityDate.AddDays(-1).Date;
previousEvent = preceedingEvents.TakeWhile(a => a.ID != previousEvent.ID).LastOrDefault();
if (previousEvent != null)
{
if (previousEvent.ActivityDate.Date == dayBefore)
relatedActivities.Insert(0, previousEvent);
else
previousEvent = null;
}
}
// Walk into the future until a gap is found
var followingEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
DateTime dayAfter;
var nextEvent = activity;
while (nextEvent != null)
{
dayAfter = nextEvent.ActivityDate.AddDays(1).Date;
nextEvent = followingEvents.SkipWhile(a => a.ID != nextEvent.ID).Skip(1).FirstOrDefault();
if (nextEvent != null)
{
if (nextEvent.ActivityDate.Date == dayAfter)
relatedActivities.Add(nextEvent);
else
nextEvent = null;
}
}
The list relatedActivities should then contain the contiguous events, in order.
Is there a better way (maybe using LINQ) for this?
I had an idea of using .Aggregate() but couldn't think how to get the aggregate to break out when it finds a gap in the sequence.
Here's an implementation:
public static IEnumerable<IGrouping<int, T>> GroupByContiguous<T>(
    this IEnumerable<T> source,
    Func<T, int> keySelector
)
{
    int keyGroup = Int32.MinValue;
    int currentGroupValue = Int32.MinValue;
    return source
        .Select(t => new { obj = t, key = keySelector(t) })
        .OrderBy(x => x.key)
        .GroupBy(x => {
            if (currentGroupValue + 1 < x.key)
            {
                keyGroup = x.key;
            }
            currentGroupValue = x.key;
            return keyGroup;
        }, x => x.obj);
}
You can either convert the dates to ints by means of subtraction, or imagine a DateTime version (easily).
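For example (my sketch, untested), days since an arbitrary epoch work as the integer key, so contiguous calendar days land in the same group:
var epoch = new DateTime(2000, 1, 1);
var groups = activities.GroupByContiguous(a => (a.ActivityDate.Date - epoch).Days);
// The group containing the pivot activity is the contiguous run around it.
var relatedActivities = groups.First(g => g.Any(a => a.ID == activity.ID)).ToList();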
In this case I think that a standard foreach loop is probably more readable than a LINQ query:
var relatedActivities = new List<TActivity>();
bool found = false;
foreach (var item in activities.OrderBy(a => a.ActivityDate))
{
int count = relatedActivities.Count;
if ((count > 0) && (relatedActivities[count - 1].ActivityDate.Date.AddDays(1) != item.ActivityDate.Date))
{
if (found)
break;
relatedActivities.Clear();
}
relatedActivities.Add(item);
if (item.ID == activity.ID)
found = true;
}
if (!found)
relatedActivities.Clear();
For what it's worth, here's a roughly equivalent -- and far less readable -- LINQ query:
var relatedActivities = activities
.OrderBy(x => x.ActivityDate)
.Aggregate
(
new { List = new List<TActivity>(), Found = false, ShortCircuit = false },
(a, x) =>
{
if (a.ShortCircuit)
return a;
int count = a.List.Count;
if ((count > 0) && (a.List[count - 1].ActivityDate.Date.AddDays(1) != x.ActivityDate.Date))
{
if (a.Found)
return new { a.List, a.Found, ShortCircuit = true };
a.List.Clear();
}
a.List.Add(x);
return new { a.List, Found = a.Found || (x.ID == activity.ID), a.ShortCircuit };
},
a => a.Found ? a.List : new List<TActivity>()
);
Somehow, I don't think LINQ was truly meant for bidirectional one-dimensional depth-first searches, but I constructed a working LINQ query using Aggregate. For this example I'm going to use a List instead of an array. Also, I'm going to use Activity to refer to whatever class you are storing the data in; replace it with whatever is appropriate for your code.
Before we even start, we need a small helper. List<T>.Add returns void, but we want to accumulate into a list and return the new list from inside the aggregate function. So all we need is a simple function like the following.
private List<T> ListWithAdd<T>(List<T> src, T obj)
{
    src.Add(obj);
    return src;
}
First, we get the sorted list of all activities, and then initialize the list of related activities. This initial list will contain the target activity only, to start.
List<Activity> orderedEvents = activities.OrderBy(a => a.ActivityDate).ToList();
List<Activity> relatedActivities = new List<Activity>();
relatedActivities.Add(activity);
We have to break this into two lists, the past and the future, just like you currently do.
We'll start with the past; the construction should look mostly familiar. Then we'll aggregate all of it into relatedActivities. This uses the ListWithAdd function we wrote earlier. You could condense it into one line and skip declaring previousEvents as its own variable, but I kept it separate for this example.
var previousEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID).Reverse();
relatedActivities = previousEvents.Aggregate<Activity, List<Activity>>(
    relatedActivities,
    (items, prevItem) => items.OrderBy(a => a.ActivityDate).First().ActivityDate
                              .Subtract(prevItem.ActivityDate).Days.Equals(1)
        ? ListWithAdd(items, prevItem)
        : items).ToList();
Next, we'll build the following events in a similar fashion, and likewise aggregate it.
var nextEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
relatedActivities = nextEvents.Aggregate<Activity, List<Activity>>(
    relatedActivities,
    (items, nextItem) => nextItem.ActivityDate
                                 .Subtract(items.OrderBy(a => a.ActivityDate).Last().ActivityDate).Days.Equals(1)
        ? ListWithAdd(items, nextItem)
        : items).ToList();
You can sort the result properly afterwards; at that point relatedActivities should contain all the activities with no gaps. It won't immediately break when it hits the first gap (I don't think you can literally break out of a LINQ query), so instead it just ignores anything it finds past a gap.
Note that this example code only operates on the actual difference in time. Your example output seems to imply that you need some other comparison factors, but this should be enough to get you started. Just add the necessary logic to the date subtraction comparison in both entries.

C# Linq statements or a foreach() for totalling subsets?

Which of these solutions is preferred?
For a List:
List<ExampleInfo> exampleList = new List<ExampleInfo>();

public class ExampleInfo
{
    internal ExampleInfo()
    { }

    /* Business Properties */
    public int Id { get; set; }
    public string Type { get; set; }
    public decimal Total { get; set; }
}
I wish to get subtotals based off the 'Total' value.
Option 1:
var subtotal1 = exampleList.Where(x => x.Type == "Subtype1").Sum(x => x.Total);
var subtotal2 = exampleList.Where(x => x.Type == "Subtype2").Sum(x => x.Total);
Option 2:
decimal subtotal1 = 0m;
decimal subtotal2 = 0m;
foreach (ExampleInfo example in exampleList)
{
    switch (example.Type)
    {
        case "Subtype1":
            subtotal1 += example.Total;
            break;
        case "Subtype2":
            subtotal2 += example.Total;
            break;
        default:
            break;
    }
}
The list will be <10 items in most cases.
Edit: Chris raised a very good point I did not mention. The program is already using .NET Framework 3.5 SP1 so compatibility isn't an important consideration here.
Regardless of list size, if you're targeting .NET 3.5 I'd go with LINQ, if only for readability.
I am a great fan of writing what you mean, not how it's done, and LINQ makes this very easy in such cases.
You can probably even pull the calculations into a single LINQ statement by grouping by Type. That way you have only one pass over the list, as in the second example:
var subtotals = from x in exampleList
                group x by x.Type into g
                select new { Type = g.Key, SubTotal = g.Sum(x => x.Total) };
(I'm not entirely sure whether the code works as is; it's just a quick adaptation of one of the 101 LINQ Samples. The syntax should be OK, though.)
Both of these examples have duplicated code, and neither is ready for a change in Type: what if it had three values? What if it had 30?
You could use LINQ to group by it and get the total:
var totals = from p in exampleList
             group p by p.Type into g
             select new { Type = g.Key, Total = g.Sum(p => p.Total) };
So totals is a collection of objects with the properties Type and Total.
Option 3
var groupings = exampleList
.GroupBy(x => x.Type, x => x.Total)
.Select(x => new { Type = x.Key, SubTotal = x.Sum() } );
You'll have a list of classes like so:
class <Anonymous>
{
public string Type { get; }
public decimal SubTotal { get; }
}
Enumerate and assign to the appropriate values, although it might be overkill for such a small set.
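For the enumerate-and-assign step, something along these lines (my sketch) avoids reintroducing the switch:
// Look the subtotals up by type; missing types default to zero.
var subtotals = groupings.ToDictionary(g => g.Type, g => g.SubTotal);
decimal subtotal1 = subtotals.ContainsKey("Subtype1") ? subtotals["Subtype1"] : 0m;
decimal subtotal2 = subtotals.ContainsKey("Subtype2") ? subtotals["Subtype2"] : 0m;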
I don't think there would be much performance difference for such small lists.
Option 1 will iterate through the list twice while Option 2 only iterates through the list once. That may be more important to note for larger lists than small ones.
Option 1 is more readable, but I would definitely make sure to make a comment that it iterates through the list twice.
The obvious advantage to Option 2 is that the code works in .NET Framework 2.0. Using LINQ means that your application requires .NET Framework 3.5.
For Option 1, internally the foreach loop will be executed twice by the runtime, so the processing time will be higher. But for fewer than 10 items it hardly makes any difference, and Option 1 seems more readable. I would go with Option 1 for fewer than 10 items.
