Grouping and sum

I have a list that contains instances of the following POCO class:
public class BoxReportView
{
public DateTime ProductionPlanWeekStarting { get; set; }
public DateTime ProductionPlanWeekEnding { get; set; }
public string BatchNumber { get; set; }
public string BoxRef { get; set; }
public string BoxName { get; set; }
public decimal Qty { get; set; }
public FUEL_KitItem KitItem { get; set; }
public decimal Multiplier { get; set; }
}
I want to group the report by BoxName and sum the Qty, so I tried the following:
var results = from line in kitItemsToGroup
group line by line.BoxName into g
select new BoxReportView
{
BoxRef = g.First().BoxRef,
BoxName = g.First().BoxName,
Qty = g.Count()
};
In my old report I was just doing this:
var multiplier = finishedItem.SOPOrderReturnLine.LineQuantity -
finishedItem.SOPOrderReturnLine.StockUnitDespatchReceiptQuantity;
foreach (KitItem kItem in kitItems.Cast<KitItem>().Where(z => z.IsBox).ToList())
{
kitItemsToGroup.Add(new BoxReportView() {
BatchNumber = _batchNumber,
ProductionPlanWeekEnding = _weekEndDate,
ProductionPlanWeekStarting = _weekStartDate,
BoxRef = kItem.StockCode,
KitItem = kItem,
Multiplier = multiplier,
Qty = kItem.Qty });
}
}
Then I was just returning
return kitItemsToGroup;
But because the query result is a var, I cannot return it the same way. What is the best way to handle the grouping and the sum by box name and qty?

Whether it is the best way depends upon your priorities. Is processing speed important, or is it more important that the code is easy to understand, easy to test, easy to change and easy to debug?
One of the advantages of LINQ is that it avoids enumerating the source more than necessary.
Are you sure that the users of this code will always need the complete collection? Could it be that, now or in the near future, someone only wants the first element, or decides to stop enumerating after fetching the 20th element and seeing nothing of interest?
When using LINQ, try to return IEnumerable<...> as long as possible. Let the end user who interprets your LINQ result decide whether to take only the FirstOrDefault(), Count() everything, put it in a Dictionary, or whatever. It is a waste of processing power to create a List if it is not going to be used as a List.
Your LINQ code and your foreach do completely different things. Alas, it is quite common here on StackOverflow for people to ask for LINQ statements without really specifying their requirements, so I'll have to guess something in between your LINQ statement and your foreach.
Requirement: Group the input sequence of kitItems, which are expected to be Fuel_KitItems, into groups of items with the same BoxName, and select several properties from every Fuel_KitItem in each group.
var kitItemGroups = kitItems
.Cast<Fuel_KitItem>() // only needed if kitItems is not IEnumerable<Fuel_KitItem>
// make groups of Fuel_KitItems with same BoxName:
.GroupBy(fuelKitItem => fuelKitItem.BoxName,
// ResultSelector, take the BoxName and all fuelKitItems with this BoxName:
(boxName, fuelKitItemsWithThisBoxName) => new
{
// Select only the properties you plan to use:
BoxName = boxName,
FuelKitItems = fuelKitItemsWithThisBoxName.Select(fuelKitItem => new
{
// Only Select the properties that you plan to use
BatchNumber = fuelKitItem.BatchNumber,
Qty = fuelKitItem.Qty,
...
// Not needed, they are all equal to boxName:
// BoxName = fuelKitItem.BoxName
})
// only do ToList if you are certain that the user of the result
// will need the complete list of fuelKitItems in this group
.ToList(),
});
Usage:
var kitItemGroups = ...
// I only need the KitItemGroups with a BoxName starting with "A"
var result1 = kitItemGroups.Where(group => group.BoxName.StartsWith("A"))
.ToList();
// Or I only want the first three after sorting by group size
var result2 = kitItemGroups.OrderBy(group => group.FuelKitItems.Count())
.Take(3)
.ToList();
Efficiency Improvements: As long as you don't know how your LINQ result will be used, don't make it a List. If you know that chances are high that the Count of group.FuelKitItems is needed, do a ToList.
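For the grouping and sum that the question asks about, a minimal sketch could look like the following. It assumes a simplified stand-in class (BoxLine, hypothetical) with only the three properties used here, and a hypothetical helper named SumByBoxName; the real BoxReportView has more fields, and which BoxRef to keep per group is a guess.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the question's BoxReportView.
public class BoxLine
{
    public string BoxRef { get; set; } = "";
    public string BoxName { get; set; } = "";
    public decimal Qty { get; set; }
}

public static class BoxGrouping
{
    // Returns one line per BoxName with Qty summed over the group.
    // The query is deferred: nothing runs until the caller enumerates it.
    public static IEnumerable<BoxLine> SumByBoxName(IEnumerable<BoxLine> lines)
    {
        return lines.GroupBy(
            line => line.BoxName,
            (boxName, items) => new BoxLine
            {
                // Assumption: the first BoxRef in the group represents the group.
                BoxRef = items.First().BoxRef,
                BoxName = boxName,
                // Sum the quantities instead of counting the lines.
                Qty = items.Sum(item => item.Qty)
            });
    }
}
```

The caller decides whether to materialize the result, for example with BoxGrouping.SumByBoxName(kitItemsToGroup).ToList().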

Related

Using Rx to merge multiple sources by key

I'm kinda new to the reactive extensions, but since I have a very data-flow heavy problem, I'm assuming, it could massively simplify my implementation. But it seems my problem is a bit more exotic than I anticipated.
Problem
I have multiple data sources, which all emit part of the data for the same entity. E.g. I have datasource1, which emits the first name of a person, and datasource2, which emits the last name of a person. The arrival of this data is completely unpredictable.
What I need to do now is observe both sources and use some kind of operator or subject that allows me to await both source observables. I only want to continue if both data sources have returned their specific part. Both my sources also pass a key for the data, so it's possible to link them together at a later point.
Is there a construct built into reactive which allows me to do that? Or is reactive simply the wrong toolset to solve my problem?

I can't judge whether Rx or async/await or TPL-Dataflow is a better solution, since that would probably depend on your larger application. Some reproducible code would really help.
Anyhow, here's an Rx solution. I'm assuming for now datasource1 and datasource2 are observables of different types, or easily convertible to observables of different types. If they were observables of the same type, this solution would also work, but you would have other options as well:
var firstNameSource = new Subject<FirstNameMessage>();
var lastNameSource = new Subject<LastNameMessage>();
var timeout = TimeSpan.FromSeconds(1); //Set to length of time willing to wait
var join = firstNameSource.Join(lastNameSource,
fnm => Observable.Timer(timeout),
lnm => Observable.Timer(timeout),
(fnm, lnm) => new { FirstNameMessage = fnm, LastNameMessage = lnm }
)
.Where(a => a.FirstNameMessage.Id == a.LastNameMessage.Id)
.Select(a => Tuple.Create(a.FirstNameMessage.Name, a.LastNameMessage.Name))
.Timeout(timeout)
.Catch(Observable.Empty<Tuple<string, string>>());
Using these sample classes:
public class FirstNameMessage
{
public int Id { get; set; }
public string Name { get; set; }
}
public class LastNameMessage
{
public int Id { get; set; }
public string Name { get; set; }
}
Here's some sample subscription/execution code:
join.Subscribe(t => Console.WriteLine($"{t.Item1} {t.Item2}"), () => Console.WriteLine("No more names!"));
firstNameSource.OnNext(new FirstNameMessage{Id = 1, Name = "John" });
lastNameSource.OnNext(new LastNameMessage{Id = 1, Name = "Smith" });
lastNameSource.OnNext(new LastNameMessage { Id = 2, Name = "Jones" });
await Task.Delay(TimeSpan.FromMilliseconds(500));
firstNameSource.OnNext(new FirstNameMessage { Id = 2, Name = "Paul" });
firstNameSource.OnNext(new FirstNameMessage { Id = 3, Name = "Larry" });
await Task.Delay(TimeSpan.FromMilliseconds(1500));
lastNameSource.OnNext(new LastNameMessage { Id = 3, Name = "Fail" });
firstNameSource.OnNext(new FirstNameMessage { Id = 4, Name = "Won't Work" });
lastNameSource.OnNext(new LastNameMessage { Id = 4, Name = "Subscription terminated" });
Explanation:
The crucial part of this solution is the Join operator. Whereas a standard DB/LINQ Join joins things by key, Rx's Join joins by time window. So the Join above joins any FirstNameMessage and LastNameMessage that are within timeout timespan of each other. Since we also want to join by key, that's why the Where clause is there.
The Timeout and Catch calls at the end are possibly superfluous: they just serve to terminate the subscription. It sounds like your solution may just be waiting for one value, not multiple, so that may be required.

Slow performance in getting model from list model using enumerable linq

I decided to load database records into a List<> of models and use LINQ to Objects to get records from it. It has 141,856 records in it. What we found is that it is pretty slow.
So, any suggestions or recommendations on making it run quickly?
public class Geography
{
public string Zipcode { get; set; }
public string City { get; set; }
public string State { get; set; }
}
var geography = new List<Geography>();
geography.Add(new Geography() { Zipcode = "32245", City = "Jacksonville", State = "Florida" });
geography.Add(new Geography() { Zipcode = "00001", City = "Atlanta", State = "Georgia" });
var result = geography.Where(x => (string.Equals(x.Zipcode, "32245", StringComparison.InvariantCultureIgnoreCase))).FirstOrDefault();
We have 86,000 vehicles in Inventory and want to use parallel tasks to get the work done quickly, but it becomes very slow when geography is being looked up.
await Task.WhenAll(vehicleInventoryRecords.Select(async inventory =>
{
var result = geography.Where(x => (string.Equals(x.Zipcode, inventory.Zipcode, StringComparison.InvariantCultureIgnoreCase))).FirstOrDefault();
}));

Use a Dictionary<string, Geography> to store the geography data. Looking up data in a dictionary by key is an O(1) operation on average, while searching a list is O(n).
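A minimal sketch of the dictionary approach, assuming ZIP codes are unique. The helper name GeographyIndex.Build is hypothetical, and the choice of StringComparer.OrdinalIgnoreCase is an assumption (the question used an invariant-culture comparison).

```csharp
using System;
using System.Collections.Generic;

public class Geography
{
    public string Zipcode { get; set; } = "";
    public string City { get; set; } = "";
    public string State { get; set; } = "";
}

public static class GeographyIndex
{
    // Build the index once, then reuse it for every lookup.
    // Assumption: ZIP codes are unique; a duplicate overwrites the earlier entry.
    public static Dictionary<string, Geography> Build(IEnumerable<Geography> items)
    {
        // Assumption: ordinal, case-insensitive comparison is enough for ZIP codes.
        var byZip = new Dictionary<string, Geography>(StringComparer.OrdinalIgnoreCase);
        foreach (var item in items)
        {
            byZip[item.Zipcode] = item;
        }
        return byZip;
    }
}
```

A lookup is then a single call such as byZip.TryGetValue(inventory.Zipcode, out var match), which avoids scanning the whole list for every vehicle.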

You haven't mentioned if your ZIP codes are unique, so I'll assume they aren't. If they are - look at Giorgi's answer and skip to part 2 of my answer.
1. Use lookups
Since you're looking up your geography list multiple times by the same property, you should group the values by Zipcode. You can do this easily by using ToLookup, which creates a Lookup object. It is similar to a Dictionary, except that each key can map to multiple values. Passing StringComparer.InvariantCultureIgnoreCase as the second parameter to ToLookup makes it case-insensitive.
var geography = new List<Geography>();
geography.Add(new Geography { Zipcode = "32245", City = "Jacksonville", State = "Florida" });
geography.Add(new Geography { Zipcode = "00001", City = "Atlanta", State = "Georgia" });
var geographyLookup = geography.ToLookup(x => x.Zipcode, StringComparer.InvariantCultureIgnoreCase);
var result = geographyLookup["32245"].FirstOrDefault();
This should increase your performance considerably.
2. Parallelize with PLINQ
The way you parallelize your lookups is questionable. Luckily, .NET has PLINQ. You can use AsParallel and a parallel Select to iterate over your vehicleInventoryRecords in parallel like this:
var results = vehicleInventoryRecords.AsParallel().Select(x => geographyLookup[x.Zipcode].FirstOrDefault());
Using Parallel.ForEach is another good option.
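Putting both parts together, a sketch might look like this. The VehicleInventory class and the MatchAll helper are hypothetical names; AsOrdered is added only so that results line up with the input order, which the original answer does not require.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Geography
{
    public string Zipcode { get; set; } = "";
    public string City { get; set; } = "";
    public string State { get; set; } = "";
}

// Hypothetical stand-in for the question's inventory record.
public class VehicleInventory
{
    public string Zipcode { get; set; } = "";
}

public static class GeographyMatcher
{
    // Returns one entry per vehicle, in input order; null where no ZIP code matches.
    public static List<Geography> MatchAll(
        IEnumerable<VehicleInventory> vehicles,
        IEnumerable<Geography> geography)
    {
        // Build the lookup once; it is only read afterwards.
        var lookup = geography.ToLookup(
            g => g.Zipcode, StringComparer.InvariantCultureIgnoreCase);

        return vehicles
            .AsParallel()
            .AsOrdered() // keep results aligned with the input order
            .Select(v => lookup[v.Zipcode].FirstOrDefault())
            .ToList();
    }
}
```

Whether PLINQ pays off here depends on the workload: once each lookup is a hash probe, the sequential version may already be fast enough.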

Updating entire node with mutating cypher in Neo4jclient

I need to update all the properties of a given node, using mutating cypher. I want to move away from Node and NodeReference because I understand they are deprecated, so can't use IGraphClient.Update. I'm very new to mutating cypher. I'm writing in C#, using Neo4jclient as the interface to Neo4j.
I did the following code which updates the "Name" property of a "resunit" where property "UniqueId" equals 2. This works fine. However,
* my resunit object has many properties
* I don't know which properties have changed
* I'm trying to write code that will work with different types of objects (with different properties)
With IGraphClient.Update it was possible to pass in an entire object, and it would take care of creating Cypher that sets all properties.
Can I somehow pass in my object with mutating cypher as well?
The only alternative I can see is to reflect over the object to find all properties and generate .Set for each, which I'd like to avoid. Please tell me if I'm on the wrong track here.
string newName = "A welcoming home";
var query2 = agencyDataAccessor
.GetAgencyByKey(requestingUser.AgencyKey)
.Match("(agency)-[:HAS_RESUNIT_NODE]->(categoryResUnitNode)-[:THE_UNIT_NODE]->(resunit)")
.Where("resunit.UniqueId = {uniqueId}")
.WithParams(new { uniqueId = 2 })
.With("resunit")
.Set("resunit.Name = {residentialUnitName}")
.WithParams(new { residentialUnitName = newName });
query2.ExecuteWithoutResults();

It is indeed possible to pass an entire object! Below I have an object called Thing defined as such:
public class Thing
{
public int Id { get; set; }
public string Value { get; set; }
public DateTimeOffset Date { get; set; }
public int AnInt { get; set; }
}
Then the following code creates a new Thing and inserts it into the DB, then gets it back and updates it using just one Set command:
Thing thing = new Thing{AnInt = 12, Date = new DateTimeOffset(DateTime.Now), Value = "Foo", Id = 1};
gc.Cypher
.Create("(n:Test {thingParam})")
.WithParam("thingParam", thing)
.ExecuteWithoutResults();
var thingRes = gc.Cypher.Match("(n:Test)").Where((Thing n) => n.Id == 1).Return(n => n.As<Thing>()).Results.Single();
Console.WriteLine("Found: {0},{1},{2},{3}", thingRes.Id, thingRes.Value, thingRes.AnInt, thingRes.Date);
thingRes.AnInt += 100;
thingRes.Value = "Bar";
thingRes.Date = thingRes.Date.AddMonths(1);
gc.Cypher
.Match("(n:Test)")
.Where((Thing n) => n.Id == 1)
.Set("n = {thingParam}")
.WithParam("thingParam", thingRes)
.ExecuteWithoutResults();
var thingRes2 = gc.Cypher.Match("(n:Test)").Where((Thing n) => n.Id == 1).Return(n => n.As<Thing>()).Results.Single();
Console.WriteLine("Found: {0},{1},{2},{3}", thingRes2.Id, thingRes2.Value, thingRes2.AnInt, thingRes2.Date);
Which gives:
Found: 1,Foo,12,2014-03-27 15:37:49 +00:00
Found: 1,Bar,112,2014-04-27 15:37:49 +00:00
All properties nicely updated!

List<object> Self-Filter

I have a list like
List<VoieData> listVoieData = new List<VoieData>();
and in VoieData Class I have :
public class VoieData
{
public int Depart { set; get; }
public int Arrive { set; get; }
public int DistanceDepart { set; get; }
public int DistanceArrive { set; get; }
}
Since I have a massive number of values and only want to consider my Depart numbers, I would like to filter listVoieData by keeping only the entries whose Arrive has the same value as one of the Depart values.
For example, I have
listVoieData.Select(p=>p.Depart).ToList()= List<int>{1,2,3};
listVoieData.Select(p=>p.Arrive).ToList()= List<int>{1,2,3,4,5};
I need to throw away every VoieData that has 4 or 5 as Arrive.
Right now my solution looks like this, but it's not correct:
List<VoieData> listVoieDataFilter = listVoieData .Join(listVoieData , o1 => o1.Arrive, o2 => o2.Depart, (o1, o2) => o1).ToList();
Sorry for the confusing question.
I want to remove every entry whose Arrive is different from all the Depart values in the list, and return the new list.
The comparison is not only within one VoieData (Arrive != Depart).
Thanks

I think you want to remove all objects where Arrive is not in any of the Depart from any object. In that case, first get all Depart and then filter by Arrive:
HashSet<int> allDepart = new HashSet<int>(listVoieData.Select(x => x.Depart));
var result = listVoieData.Where(v => allDepart.Contains(v.Arrive));
We use a HashSet<int> for efficiency.
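A sketch of this interpretation, which keeps only the entries whose Arrive also occurs as some Depart in the list. The helper name KeepArrivalsThatAreDepartures is hypothetical, and VoieData is reduced to the two properties involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class VoieData
{
    public int Depart { get; set; }
    public int Arrive { get; set; }
}

public static class VoieFilter
{
    public static List<VoieData> KeepArrivalsThatAreDepartures(IEnumerable<VoieData> source)
    {
        // Materialize once because the data is read twice.
        var list = source.ToList();

        // Collect every Depart value; HashSet gives fast Contains checks.
        var allDepart = new HashSet<int>(list.Select(v => v.Depart));

        // Keep an entry only if its Arrive is one of the Depart values.
        return list.Where(v => allDepart.Contains(v.Arrive)).ToList();
    }
}
```

With Depart values {1, 2, 3}, entries whose Arrive is 4 or 5 are dropped, which matches the example in the question.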

Use LINQ Where:
var records = listVoieData.Where(x => x.Arrive == x.Depart);
This will return results where both Arrive and Depart are the same.

That would be a typical case for LINQ, something like:
var res = from data in listVoieData
where data.Depart == data.Arrive
select data;
and then optionally just use res.ToArray() to run the query and get the array.

Since you've stated that you want:
I want to remove Arrive which is different from all the Depart
This can be re-phrased as, "The set of all arrivals except those in the set of departures", which translates very nicely into the following LINQ query:
var arrivalsWithNoDepartures = listVoieData.Select(p=>p.Arrive)
.Except(listVoieData.Select(p=>p.Depart));

How do i get the difference in two lists in C#?

Ok so I have two lists in C#
List<Attribute> attributes = new List<Attribute>();
List<string> songs = new List<string>();
One is a list of strings and the other is a list of an Attribute object that I created. Very simple:
class Attribute
{
public string size { get; set; }
public string link { get; set; }
public string name { get; set; }
public Attribute(){}
public Attribute(string s, string l, string n)
{
size = s;
link = l;
name = n;
}
}
I now have to compare to see what songs are not in the attributes name so for example
songs.Add("something");
songs.Add("another");
songs.Add("yet another");
Attribute a = new Attribute("500", "http://google.com", "something" );
attributes.Add(a);
I want a way to return "another" and "yet another", because they do not appear as a name in the attributes list.
So, as pseudocode:
difference = songs - attributes.names

var difference = songs.Except(attributes.Select(s=>s.name)).ToList();
edit
Added ToList() to make it a list

It's worth pointing out that the answers posted here will return a list of songs not present in attributes.names, but it won't give you a list of attributes.names not present in songs.
While this is what the OP wanted, the title may be a little misleading, especially if (like me) you came here looking for a way to check whether the contents of two lists differ. If this is what you want, you can use the following:-
var differences = new HashSet<string>(songs);
differences.SymmetricExceptWith(attributes.Select(a => a.name));
if (differences.Any())
{
// The lists differ.
}
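The two notions of "difference" can be compared side by side in a small sketch. The class and method names (ListDiff, OnlyInFirst, InExactlyOne) are hypothetical; the calls themselves are the standard Enumerable.Except and HashSet<T>.SymmetricExceptWith.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListDiff
{
    // One-directional difference: items of first that are not in second.
    public static List<string> OnlyInFirst(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Except(second).ToList();
    }

    // Symmetric difference: items that appear in exactly one of the two sequences.
    public static HashSet<string> InExactlyOne(IEnumerable<string> first, IEnumerable<string> second)
    {
        var differences = new HashSet<string>(first);
        differences.SymmetricExceptWith(second);
        return differences;
    }
}
```

For the question's data, OnlyInFirst(songs, attributes.Select(a => a.name)) yields "another" and "yet another"; InExactlyOne would additionally report any attribute name that has no matching song.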

This is the way to find all the songs which aren't included in attributes names:
var result = songs
.Where(song => !attributes.Select(a => a.name).Contains(song));
The answer using Except is also perfect and probably more efficient.
EDIT: This syntax has one advantage if you're using it in LINQ to SQL: it translates into a NOT IN SQL predicate. Except is not translated to anything in SQL. So, in that context, all the records would be retrieved from the database and excepted on the app side, which is much less efficient.

var diff = songs.Except(attributes.Select(a => a.name)).ToList();
