Filling in missing dates using a linq group by date query - c#

I have a Linq query that basically counts how many entries were created on a particular day, which is done by grouping by year, month, day. The problem is that because some days won't have any entries I need to back fill those missing "calendar days" with an entry of 0 count.
My guess is that this can probably be done with a Union or something, or maybe even some simple for loop to process the records after the query.
Here is the query:
from l in context.LoginToken
where l.CreatedOn >= start && l.CreatedOn <= finish
group l by
new{l.CreatedOn.Year, l.CreatedOn.Month, l.CreatedOn.Day} into groups
orderby groups.Key.Year , groups.Key.Month , groups.Key.Day
select new StatsDateWithCount {
Count = groups.Count(),
Year = groups.Key.Year,
Month = groups.Key.Month,
Day = groups.Key.Day
}
If I have data for 12/1 - 12/4/2009 like (simplified):
12/1/2009 20
12/2/2009 15
12/4/2009 16
I want an entry with 12/3/2009 0 added by code.
I know that in general this should be done in the DB using a denormalized table that you either populate with data or join to a calendar table, but my question is how would I accomplish this in code?
Can it be done in Linq? Should it be done in Linq?

I just did this today. I gathered the complete data from the database and then generated a "sample empty" table. Finally, I did an outer join of the empty table with the real data and used the DefaultIfEmpty() construct to deal with knowing when a row was missing from the database to fill it in with defaults.
Here's my code:
int days = 30;
// Gather the data we have in the database, which will be incomplete for the graph (i.e. missing dates/subsystems).
var dataQuery =
from tr in SourceDataTable
where (DateTime.UtcNow - tr.CreatedTime).Days < days
group tr by new { tr.CreatedTime.Date, tr.SubSystem } into g
orderby g.Key.Date ascending, g.Key.SubSystem ascending
select new MyResults()
{
Date = g.Key.Date,
SubSystem = g.Key.SubSystem,
Count = g.Count()
};
// Generate the list of subsystems we want.
var subsystems = new[] { SubSystem.Foo, SubSystem.Bar }.AsQueryable();
// Generate the list of Dates we want.
var datetimes = new List<DateTime>();
for (int i = 0; i < days; i++)
{
datetimes.Add(DateTime.UtcNow.AddDays(-i).Date);
}
// Generate the empty table, which is the shape of the output we want but without counts.
var emptyTableQuery =
from dt in datetimes
from subsys in subsystems
select new MyResults()
{
Date = dt.Date,
SubSystem = subsys,
Count = 0
};
// Perform an outer join of the empty table with the real data and use the magic DefaultIfEmpty
// to handle the "there's no data from the database case".
var finalQuery =
from e in emptyTableQuery
join realData in dataQuery on
new { e.Date, e.SubSystem } equals
new { realData.Date, realData.SubSystem } into g
from realDataJoin in g.DefaultIfEmpty()
select new MyResults()
{
Date = e.Date,
SubSystem = e.SubSystem,
Count = realDataJoin == null ? 0 : realDataJoin.Count
};
return finalQuery.OrderBy(x => x.Date).AsEnumerable();
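The outer-join idea above can be tried against in-memory data. This is only a minimal sketch, not the answerer's code: the `DayCount` class and the `FillGaps` method are hypothetical names, and the subsystem dimension is left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OuterJoinSketch
{
    // Hypothetical result shape: one row per calendar day.
    public sealed class DayCount
    {
        public DateTime Date { get; set; }
        public int Count { get; set; }
    }

    // Left outer join of every wanted date against the rows that actually exist.
    public static List<DayCount> FillGaps(
        IEnumerable<DateTime> wantedDates,
        IEnumerable<DayCount> actual)
    {
        var query =
            from date in wantedDates
            join row in actual on date equals row.Date into matches
            // DefaultIfEmpty yields a single null when no row matched this date.
            from match in matches.DefaultIfEmpty()
            orderby date
            select new DayCount
            {
                Date = date,
                Count = match == null ? 0 : match.Count
            };
        return query.ToList();
    }
}
```

Called with the four dates 12/1/2009 to 12/4/2009 and the three rows from the question, this should return four rows, with a zero count for 12/3/2009.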

I made a helper function which is designed to be used with anonymous types, and reused in as generic way as possible.
Let's say this is your query to get a list of orders for each date.
var orders = db.Orders
.GroupBy(o => o.OrderDate)
.Select(o => new
{
OrderDate = o.Key,
OrderCount = o.Count(),
Sales = o.Sum(i => i.SubTotal)
})
.OrderBy(o => o.OrderDate);
For my function to work please note this list must be ordered by date. If we had a day with no sales there would be a hole in the list.
Now for the function that will fill in the blanks with a default value (instance of anonymous type).
private static IEnumerable<T> FillInEmptyDates<T>(IEnumerable<DateTime> allDates, IEnumerable<T> sourceData, Func<T, DateTime> dateSelector, Func<DateTime, T> defaultItemFactory)
{
// iterate through the source collection
using (var iterator = sourceData.GetEnumerator())
{
    // hasCurrent becomes false once the source data is exhausted;
    // reading iterator.Current after that point is not reliable
    bool hasCurrent = iterator.MoveNext();
    // for each date in the desired list
    foreach (var desiredDate in allDates)
    {
        // check if the current item exists and is the 'desired' date
        if (hasCurrent && dateSelector(iterator.Current) == desiredDate)
        {
            // if so then return it and move to the next item
            yield return iterator.Current;
            hasCurrent = iterator.MoveNext();
            // ensure next item (if any) is not a duplicate
            if (hasCurrent && dateSelector(iterator.Current) == desiredDate)
            {
                throw new Exception("More than one item found in source collection with date " + desiredDate);
            }
        }
        else
        {
            // if the current 'desired' item doesn't exist then
            // create a dummy item using the provided factory
            yield return defaultItemFactory(desiredDate);
        }
    }
}
}
The usage is as follows:
// first you must determine your desired list of dates which must be in order
// determine this however you want
var desiredDates = ....;
// fill in any holes
var ordersByDate = FillInEmptyDates(desiredDates,
// Source list (with holes)
orders,
// How do we get a date from an order
(order) => order.OrderDate,
// How do we create an 'empty' item
(date) => new
{
OrderDate = date,
OrderCount = 0,
Sales = 0
});
Notes:
You must make sure there are no duplicates in the desired dates list.
Both desiredDates and sourceData must be in order.
Because the method is generic, if you are using an anonymous type the compiler will automatically tell you if your 'default' item is not the same 'shape' as a regular item.
Right now I include a check for duplicate items in sourceData, but there is no such check in desiredDates.
If you want to ensure the lists are ordered by date you will need to add extra code.

Essentially what I ended up doing here is creating a list of the same type with all the dates in the range and a count of 0 for each. Then I unioned the results from my original query with this list. The major hurdle was simply creating a custom IEqualityComparer.
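A minimal sketch of that union approach, assuming the `StatsDateWithCount` shape from the question (Year, Month, Day, Count). The `SameDayComparer` and `FillGaps` names are hypothetical, and the range is assumed to satisfy start <= finish.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionSketch
{
    public sealed class StatsDateWithCount
    {
        public int Year { get; set; }
        public int Month { get; set; }
        public int Day { get; set; }
        public int Count { get; set; }
    }

    // Two entries are "equal" when they refer to the same calendar day; Count is ignored.
    private sealed class SameDayComparer : IEqualityComparer<StatsDateWithCount>
    {
        public bool Equals(StatsDateWithCount a, StatsDateWithCount b)
        {
            if (ReferenceEquals(a, b)) return true;
            if (a == null || b == null) return false;
            return a.Year == b.Year && a.Month == b.Month && a.Day == b.Day;
        }

        public int GetHashCode(StatsDateWithCount x)
        {
            return (x.Year, x.Month, x.Day).GetHashCode();
        }
    }

    public static List<StatsDateWithCount> FillGaps(
        IEnumerable<StatsDateWithCount> results, DateTime start, DateTime finish)
    {
        int days = (finish.Date - start.Date).Days + 1;
        // One zero-count entry for every day in the range.
        var zeros = Enumerable.Range(0, days)
            .Select(i => start.Date.AddDays(i))
            .Select(d => new StatsDateWithCount { Year = d.Year, Month = d.Month, Day = d.Day, Count = 0 });

        // Union keeps the first occurrence of each day, so the real results go first.
        return results
            .Union(zeros, new SameDayComparer())
            .OrderBy(s => s.Year).ThenBy(s => s.Month).ThenBy(s => s.Day)
            .ToList();
    }
}
```

The order of the two sequences matters here: swapping them would let the zero-count entries win over the real counts.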

You can generate the list of dates from "start" to "finish", and then look up the count for each date in turn.
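A rough sketch of that loop, assuming the creation timestamps have already been fetched into memory. The `CountsPerDay` name and the `KeyValuePair` result shape are illustrative choices, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DateLoopSketch
{
    public static List<KeyValuePair<DateTime, int>> CountsPerDay(
        IEnumerable<DateTime> createdOn, DateTime start, DateTime finish)
    {
        // Count the entries per calendar day once, up front.
        Dictionary<DateTime, int> counts = createdOn
            .GroupBy(d => d.Date)
            .ToDictionary(g => g.Key, g => g.Count());

        var result = new List<KeyValuePair<DateTime, int>>();
        for (DateTime day = start.Date; day <= finish.Date; day = day.AddDays(1))
        {
            // TryGetValue leaves count at 0 when the day has no entries.
            counts.TryGetValue(day, out int count);
            result.Add(new KeyValuePair<DateTime, int>(day, count));
        }
        return result;
    }
}
```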

Related

LINQ: select specific value in a datatable column

My table has 4 columns: GroupName, Display, Value and ID.
How can I show only specific rows? I only want to show the data for some of the group names.
For example, I only want to show rows where GroupName = company and Display = Forbes.
Here's my LINQ query:
sample = (from c in smsDashboardDBContext.CodeDefinitions
orderby c.Display ascending
select new CodeDefinitionDTO
{
GroupName = c.GroupName,
Display = c.Display,
Value = c.Value,
Id = c.Id
}).ToList();
You can add a where clause to the query:
where c.GroupName == "company" && c.Display == "Forbes"
I only want to show some of the groupNames Data for example I only want to show Groupname = company and display = Forbes
Before the ToList, use a Where to keep only those items that you want to show:
var company = ...
var forbes = ...
var result = smsDashboardDBContext.CodeDefinitions
.OrderBy(codeDefinition => codeDefinition.Display)
.Select(codeDefinition => new CodeDefinitionDTO
{
Id = codeDefinition.Id,
GroupName = codeDefinition.GroupName,
Display = codeDefinition.Display,
Value = codeDefinition.Value,
})
.Where(codeDefinition => codeDefinition.GroupName == company
&& codeDefinition.Display == forbes);
In words:
Order all codeDefinitions in the table CodeDefinitions by ascending value of property codeDefinition.Display.
From every codeDefinition in this ordered sequence, make one new CodeDefinitionDTO with the following properties filled: Id, GroupName, Display, Value.
From this sequence of CodeDefinitionDTOs, keep only those that have a value for property GroupName that equals company and a value for property Display that equals forbes.
There is room for improvement!
Suppose your table has one million elements, and after the Where only five elements are left. Then you will have sorted almost one million elements for nothing. Consider doing the Where first, then the OrderBy, and finally the Select.
In LINQ, try to do a Where as soon as possible: all following statements will have to work on fewer items.
In LINQ, try to do a Select as late as possible, preferably just before the ToList / FirstOrDefault / ... This way the Select has to be done for as few elements as possible.
So first the Where, then the OrderBy, then the Select, and finally the ToList / FirstOrDefault, etc.:
var result = smsDashboardDBContext.CodeDefinitions
.Where(codeDefinition => ...)
.OrderBy(codeDefinition => codeDefinition.Display)
.Select(codeDefinition => new CodeDefinitionDTO
{
...
});

How to join two different databases' tables in C# with Linq?

I am trying to join tables from two different databases in C#, but it gives me an error. How can I handle that?
This is my query:
var list = (from h in db.database1.AsEnumerable()
join j in NV_DB.database2.AsEnumerable()
on h.Creation_Date equals j.Creation_Date
where j.Ship_Status == 3 && h.CustomerNo == CustomerNo
select new
{
shipName = h.ShipName,
creationDate = j.Creation_Date,
endingDate = j.Ending_Date
}
).ToList();
If I do it like this, it throws a System.OverflowException. But when I run the equivalent query in SQL, it returns just 30 records.
You need to remove `AsEnumerable`. While it does not run the SQL itself, everything after it (including your where clause) is evaluated in memory, so the entire tables are brought into memory before the join and filter are applied.
Your answer is basically the first comment in the accepted answer here: Am I misunderstanding LINQ to SQL .AsEnumerable()?
While AsEnumerable doesn't evaluate the query at the time that it's called , it definitely has an effect. Anything further called on the query will be evaluated using LINQ to objects, so you can't compose additional elements onto the query (another Where or an OrderBy or anything of that nature) that will become part of the SQL statement.
In depth explanation here: https://www.codeproject.com/Articles/732425/IEnumerable-Vs-IQueryable
While querying data from a database, IEnumerable executes the select query on the server side, loads the data in memory on the client side and then filters it. Hence it does more work and becomes slow.
While querying data from a database, IQueryable executes the select query on the server side with all filters. Hence it does less work and becomes fast.
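The switch from IQueryable to LINQ to Objects can be observed without a database. The sketch below is only an illustration: it uses an in-memory array wrapped with `AsQueryable()` as a stand-in for a real table, and the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsEnumerableSketch
{
    // Where composed directly on an IQueryable binds to Queryable.Where,
    // so the result is still an IQueryable that a provider could translate.
    public static bool StaysQueryable()
    {
        IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();
        IEnumerable<int> composed = source.Where(n => n > 2);
        return composed is IQueryable<int>;
    }

    // After AsEnumerable, Where binds to Enumerable.Where (LINQ to Objects),
    // so the filter is no longer part of the queryable expression.
    public static bool StaysQueryableAfterAsEnumerable()
    {
        IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();
        IEnumerable<int> local = source.AsEnumerable().Where(n => n > 2);
        return local is IQueryable<int>;
    }
}
```

With a real LINQ to SQL or Entity Framework table in place of the array, the second form is the one that pulls every row across before filtering.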
To debug this, start dividing your statements into smaller steps:
var list1 = db.database1.AsEnumerable().ToList();
var list2 = NV_DB.database2.AsEnumerable().ToList();
var joinResult = list1.Join(list2, // join list1 and list2
    list1Row => list1Row.Creation_Date, // from every row in list1 take the Creation_Date
    list2Row => list2Row.Creation_Date, // from every row in list2 take the Creation_Date
    (list1Row, list2Row) => new // when they match, make one new object
    {
        // You only need the following properties:
        ShipName = list1Row.ShipName,
        CreationDate = list2Row.Creation_Date,
        EndingDate = list2Row.Ending_Date,
        ShipStatus = list2Row.Ship_Status,
        CustomerNo = list1Row.CustomerNo,
    })
    .ToList();
var whereResult = joinResult
    .Where(joinedRow => joinedRow.ShipStatus == 3
        && joinedRow.CustomerNo == CustomerNo)
    .ToList();
var selectResult = whereResult.Select(whereResultRow => new
    {
        ShipName = whereResultRow.ShipName,
        CreationDate = whereResultRow.CreationDate,
        EndingDate = whereResultRow.EndingDate,
    })
    .ToList();
This is executed completely as enumerable (in your local process, not by the database management system). My guess would be that this runs smoothly.
Now combine the first few statements:
var joinResult = db.database1.AsEnumerable()
    .Join(NV_DB.database2.AsEnumerable(), // join database1 and database2
    list1Row => list1Row.Creation_Date, // from every row in database1 take the Creation_Date
    list2Row => list2Row.Creation_Date, // from every row in database2 take the Creation_Date
    (list1Row, list2Row) => new // when they match, make one new object
    {
        // You only need the following properties:
        ShipName = list1Row.ShipName,
        CreationDate = list2Row.Creation_Date,
        EndingDate = list2Row.Ending_Date,
        ShipStatus = list2Row.Ship_Status,
        CustomerNo = list1Row.CustomerNo,
    })
    .ToList();
When this works, add the Where:
var whereResult = db.database1.AsEnumerable()
.Join(NV_DB.database2.AsEnumerable(), ...)
.Where(joinedRow => joinedRow.ShipStatus == 3
&& joinedRow.CustomerNo == CustomerNo)
.ToList();
Etc.
Using your debugger, you'll find the problem within a few minutes (depending on your compilation time). My guess is that it is within your join.

How to compare previous record column value with current record column value of a list using LINQ query in C#

Compare previous record column value with current record column value in c# linq
My linq query is as follows,
var list = (from v in db.VehicleLocation
where v.VehicleId == vehicleId
select new VehicleLocationModel {
Id = v.Id,
Location = v.Location,
DateTimestamp = v.DateTimestamp,
DiffTimestamp = v.DateTimestamp - previoustimestamp
}).OrderBy(x => x.DateTimestamp).ToList();
please help me...
You can try something like that:
var list = (from v in db.VehicleLocation.Where(x => x.VehicleId == vehicleId)
from v2 in db.VehicleLocation.Where(x => x.VehicleId == vehicleId)
where v.DateTimestamp > v2.DateTimestamp
group v2 by new { v.Id, v.Location, v.DateTimestamp } into sub
select new VehicleLocationModel
{
Id = sub.Key.Id,
Location = sub.Key.Location,
DateTimestamp = sub.Key.DateTimestamp,
DiffTimestamp = sub.Key.DateTimestamp - sub.Max(x => x.DateTimestamp)
}).OrderBy(x => x.DateTimestamp).ToList();
So you have a sequence of VehicleLocations, some of them are from the vehicle with vehicleId. Every VehicleLocation has a TimeStamp.
You want all VehicleLocations of the vehicle with vehicleId, each with some of its properties, its timestamp and a value for DiffTimestamp, which is the difference between its timestamp and something you call the "previous timestamp".
First you'll have to define the previous timestamp. I guess you mean this: if you order all VehicleLocations of one particular vehicle by ascending timestamp, the "previous timestamp" of any but the first VehicleLocation is the timestamp of the VehicleLocation just before it.
To make the definition complete: the previous timestamp of the first element is the timestamp of the element itself. This makes DiffTimestamp the difference between the current timestamp and the previous timestamp, and the DiffTimestamp of the first item in the sequence TimeSpan.Zero.
I think the fastest method would be to transfer the ordered sequence of the requested properties of all VehicleLocations for the vehicle with vehicleId to local memory and then yield return the requested data:
IEnumerable<VehicleLocationModel> FetchModelsById(int vehicleId)
{
    // The database filters and orders; ToList() transfers the result to local memory once.
    var vehicleLocations = db.VehicleLocation
        .Where(vehicleLocation => vehicleLocation.VehicleId == vehicleId)
        .OrderBy(vehicleLocation => vehicleLocation.DateTimestamp)
        .Select(vehicleLocation => new VehicleLocationModel()
        {
            Id = vehicleLocation.Id,
            Location = vehicleLocation.Location,
            DateTimestamp = vehicleLocation.DateTimestamp,
        })
        .ToList();
Note: all values but DiffTimestamp are filled. We'll only yield return VehicleLocationModels if the collection contains elements. The DiffTimestamp of the first element will equal TimeSpan.Zero.
Continuing:
    // only yield return something if there are elements:
    if (vehicleLocations.Any())
    {
        // the first one will be different:
        var firstElement = vehicleLocations.First();
        firstElement.DiffTimestamp = TimeSpan.Zero;
        yield return firstElement;
        // the rest of the elements:
        DateTime previousTimestamp = firstElement.DateTimestamp;
        foreach (VehicleLocationModel location in vehicleLocations.Skip(1))
        {
            location.DiffTimestamp = location.DateTimestamp - previousTimestamp;
            yield return location;
            previousTimestamp = location.DateTimestamp;
        }
    }
}
The nice thing about this solution (apart from being easy to understand) is that the database has to do less work, it has to transfer fewer bytes to your local process (the slowest part), and both the original sequence on the database side and the resulting sequence on the local side are iterated only once. The cost is that your local process has to subtract the previous timestamp from each DateTimestamp, but this is done at most once per iterated element.

C# Filter List to remove any double object

I have searched and tested many examples in this forum but can't get a fully working method.
I am using LINQ to bulk insert a list of entity classes (RemoteReadings).
Due to unique constraints I need to filter out any items already inserted.
Uniqueness is composed of 2 columns, meterid and datetime, in the RemoteReadings table.
// approx 5000 records (I need to do this in batches of 2000 due to a
// constraint in L2S,but can do this after get this working)
List<RemoteReading> lst = createListFromCSV();
// Option 1:
// This does not work as I am comparing an in-memory list to a db list. I need to use the Contains() method.
// Actually I am trying to accomplish this in the following examples.
List<RemoteReading> myLst = (from ri in db.RemoteReadings
from l in lst
where l.meterid = ri.meterid
&& l.date = r.date
select ri).ToList();
////
// Option2:
// Get the list from DB that are in memory lst
List<RemoteReading> myLst = (from ri in db.RemoteReadings
where
// where in this list by comparing meaterid and datemeaured
(from l in lst
select
/// help here !
///
select ri).ToList<RemoteInterconnectorReading>();
// Option3:
// Get the list from lst that are not in database
// I am bit confused here !
// Tried also to remove from list any duplicates:
List<RemoteReading> result = List<RemoteReading>)myLst.Except(lst).ToList<RemoteReading>();
// Ultimately
db.RemoteReading.InsertAllOnSubmit(result);
db.submitChanges();
Any help please?
Due to limitations in EF, we can't join a DB query with an in-memory list. Also, Contains can only be used with a list of primitive values. So we need to make some effort to find the duplicates on two columns.
var newItems = createListFromCSV();
var meterIds = newItems.Select(n=> n.meterid).Distinct().ToList();
var dates = newItems.Select(n=> n.date).Distinct().ToList();
var probableMatches = (from ri in db.RemoteReadings
where meterIds.Contains(ri.meterid)
|| dates.Contains(ri.date)
select new {ri.meterid, ri.date}).ToList();
var duplicates = (from existingRi in probableMatches
join newRi in newItems
on new {existingRi.meterid, existingRi.date}
equals new {newRi.meterid, newRi.date}
select newRi).ToList();
var insertList = newItems.Except(duplicates).ToList();
db.RemoteReadings.Insert(insertList); // or whatever
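Once the candidate keys are in memory, the join plus Except step can also be expressed as a lookup on a composite key. This is a different technique from the one above, shown only as a sketch: the `Reading` class, its `MeterId` and `Date` members and the `NotYetStored` method are hypothetical stand-ins for the poster's entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CompositeKeySketch
{
    // Hypothetical stand-in for the RemoteReading entity.
    public sealed class Reading
    {
        public int MeterId { get; set; }
        public DateTime Date { get; set; }
    }

    // Keeps only the incoming readings whose (MeterId, Date) pair is not already stored.
    public static List<Reading> NotYetStored(
        IEnumerable<Reading> incoming,
        IEnumerable<(int MeterId, DateTime Date)> existingKeys)
    {
        var seen = new HashSet<(int, DateTime)>(existingKeys);
        // HashSet.Add returns false for a key that is already present, so this
        // also drops repeated keys inside the incoming list itself.
        return incoming.Where(r => seen.Add((r.MeterId, r.Date))).ToList();
    }
}
```

Value tuples compare by their components, so no custom comparer is needed for the two-column key.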
With the great help of aSharma and some other tweaks, I finally got a working and tested method. As my lists contain over 5000 items, I had to execute in batches to stay under the 2100-parameter limit on SQL RPC calls. Added some comments and credits :)
/// List<RemoteReadings> contains a list of database Entity Classes RemoteReadings
public List<RemoteReadings> removeDublicatesFirst(List<RemoteReadings> lst)
{
try
{
DataClasses1DataContext db = new DataClasses1DataContext();
var meterIds = lst.Select(n => n.meterId).Distinct().ToList();
var dates = lst.Select(n => n.mydate).Distinct().ToList();
var myfLst = new List<RemoteReadings>();
// To avoid the following SqlException, the Linq query should be executed in batches as follows.
//{System.Data.SqlClient.SqlException
// The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect.
// Too many parameters were provided in this RPC request. The maximum is 2100.
foreach (var batch in dates.Batch(2000))
{
// Gets a list of possible matches from DB.
var probableMatches = (from ri in db.RemoteReadings
where (meterIds.Contains(ri.meterId)
&& batch.Contains(ri.mydate))
select new { ri.meterId, ri.mydate }).ToList();
// Join the probableMatches with the lst in memory on unique
// constraints meterid.date to find any duplicates
var duplicates = (from existingRi in probableMatches
join newRi in lst
on new
{
existingRi.meterId,
existingRi.mydate
}
equals new { newRi.meterId, newRi.mydate }
select newRi).ToList();
//Add duplicates in a new List due to batch executions.
foreach (var s in duplicates)
{
myfLst.Add(s);
}
}
// Remove the duplicates from lst found in myfLst;
var insertList = lst.Except(myfLst).ToList();
return insertList;
}
catch
(Exception ex)
{
return null;
}
}
// Found this extension Class to divide IEnumerable in batches.
// http://stackoverflow.com/a/13731854/288865
public static class MyExtensions
{
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items,
int maxItems)
{
return items.Select((item, inx) => new { item, inx })
.GroupBy(x => x.inx / maxItems)
.Select(g => g.Select(x => x.item));
}
}

linq-to-sql grouping anonymous type

I have a table that contains appointments. These appointments have different statuses (byte from 1 to 5) and dates; the column for the date is simply called AppointDate. I pass in a list of IDs and I want to group the result based on the status AND whether the date of the appointment is past or not.
TheIDs is a list of longs that's passed in as the parameter. This is what I have so far:
var TheCounterInDB = (from a in MyDC.Appointments
where TheIDs.Contains(a.ID)
group a by a.AppointStatus into TheGroups
select new {
TheStatus = TheGroups.Key,
TheTotalCount = TheGroups.Count(),
TheLateCount = ?,
ThePendingCount = ?
}).ToList();
Basically, I want TheLateCount to be the count of all the appointments where status is 1 AND the date is past and ThePendingCount to be the count where status is 1 AND the date is not past. My anonymous type is good to return the count of all the different statuses (that's where the .Key is) but I'm wondering how to best add the date requirement into the grouping.
Thanks for your suggestions.
var TheCounterInDB = (from a in MyDC.Appointments
where TheIDs.Contains(a.ID)
group a by a.AppointStatus into TheGroups
select new {
TheStatus = TheGroups.Key,
TheTotalCount = TheGroups.Count(),
TheLateCount = TheGroups.Count(x => x.AppointStatus == 1 && x.AppointDate < DateTime.Today),
ThePendingCount = TheGroups.Count(x => x.AppointStatus == 1 && x.AppointDate >= DateTime.Today)
}).ToList();
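The predicate overload of Count used in this answer can be checked against in-memory data. The sketch below assumes simplified `Appointment` and `StatusSummary` classes (both hypothetical) and takes "today" as a parameter so the result does not depend on the clock.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StatusCountSketch
{
    // Hypothetical stand-in for the Appointments table row.
    public sealed class Appointment
    {
        public long ID { get; set; }
        public byte AppointStatus { get; set; }
        public DateTime AppointDate { get; set; }
    }

    public sealed class StatusSummary
    {
        public byte Status { get; set; }
        public int Total { get; set; }
        public int Late { get; set; }
        public int Pending { get; set; }
    }

    public static List<StatusSummary> Summarize(IEnumerable<Appointment> appointments, DateTime today)
    {
        return appointments
            .GroupBy(a => a.AppointStatus)
            .Select(g => new StatusSummary
            {
                Status = g.Key,
                Total = g.Count(),
                // Count(predicate) counts only the group members that match.
                Late = g.Count(a => a.AppointStatus == 1 && a.AppointDate < today),
                Pending = g.Count(a => a.AppointStatus == 1 && a.AppointDate >= today)
            })
            .OrderBy(s => s.Status)
            .ToList();
    }
}
```

Because the grouping key is the status itself, Late and Pending can only be non-zero in the group whose status is 1.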
