Raven returning wrong document in OrderByDescending Statement - c#

I have 50,000 documents in my Raven database, but when I run this query the Id of the latestProfile object is returned as 9999 (the first id in the db is 0, so this is the ten-thousandth item).
//find the profile with the highest ID now existing in the collection
var latestProfile = session.Query<SiteProfile>()
.Customize(c => c.WaitForNonStaleResults())
.OrderByDescending(p => p.Id)
.FirstOrDefault();
//latestProfile.Id is 9999 here
//See how many items there are in the collection. This returns 50,000
var count = session.Query<SiteProfile>()
.Customize(c => c.WaitForNonStaleResults()).Count();
My guess is that Raven is paging before my OrderByDescending statement, but:
The default page size is 10, and even the max is 1024.
All the parts of this are either IRavenQueryable or IQueryable.
It is also not a stale index, as I have tested this with WaitForNonStaleResults().
I expect the most recent id I added (50,000) to be the item returned here, but it is not.
Why not? This looks like a bug in Raven to me.
EDIT:
OK, so I now know exactly why, but it still looks like a bug. Here is the list of items from that same query, materialized with ToArray():
{ Id = 9999 },
{ Id = 9998 },
{ Id = 9997 },
{ Id = 9996 },
{ Id = 9995 },
{ Id = 9994 },
{ Id = 9993 },
{ Id = 9992 },
{ Id = 9991 },
{ Id = 9990 },
{ Id = 999 }, //<-- Whoops! This is text order not int order
{ Id = 9989 },
So even though my Id property is an integer, Raven stores it internally as a string and orders by that representation. Clearly Raven's Queryable implementation is resolving the ordering before checking types.
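The ordering in the list above can be reproduced without RavenDB. This is a minimal sketch that assumes the ids are compared as ordinal strings (the console log in UPDATE 2 shows sort=-__document_id, which suggests the query sorted on the string document key rather than on an integer field). IdOrderingDemo and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical demo class: reproduces the string-vs-int ordering without RavenDB.
public static class IdOrderingDemo
{
    // Ids 0..count-1 converted to strings and sorted ordinally, descending,
    // i.e. the way a string key is ordered (assumed to match what Raven did).
    public static List<string> DescendingAsStrings(int count) =>
        Enumerable.Range(0, count)
            .Select(i => i.ToString(CultureInfo.InvariantCulture))
            .OrderByDescending(s => s, StringComparer.Ordinal)
            .ToList();

    // The same ids sorted numerically, descending: what the question expected.
    public static List<int> DescendingAsInts(int count) =>
        Enumerable.Range(0, count)
            .OrderByDescending(i => i)
            .ToList();
}
```

With 50,000 ids, DescendingAsStrings starts with "9999", "9998", ..., "9990", "999", "9989", exactly like the list above, because "9999" is the greatest string that can be formed from 0 to 49999, while DescendingAsInts starts at 49999.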
I have read that you can define sort order to use integer sorting on defined indexes but really, this should not matter. In a strongly typed language integers should be sorted as integers.
Is there a way to make this Id ordering correct? Do I actually have to resort to creating a special index on the Id column just to get integers ordered correctly?
UPDATE 2:
I am now using an index as follows:
public SiteProfiles_ByProfileId()
{
Map = profiles => from profile in profiles
select new
{
profile.Id
};
Sort(x => x.Id, SortOptions.Int);
}
This is meant to force it to treat the values as integers. In the Raven server console I can see that my index is being queried:
Request # 249: GET - 3 ms - Bede.Profiles - 200 - /indexes/SiteProfiles/ByProfileId?&pageSize=1&sort=-__document_id&operationHeadersHash=-1789353429
Query:
Time: 3 ms
Index: SiteProfiles/ByProfileId
Results: 1 returned out of 20,000 total.
But it still comes back with string-ordered results. I have seen advice not to use integers as the Id, but that would cause massive issues on this project, as third parties reference the current ids (from the old service this is designed to replace).
UPDATE 3: I have a specific unit test that shows the issue. It appears to work fine for any integer property except Id.
[TestMethod]
public void Test_IndexAllowsCorrectIntSortingWhenNotId()
{
using (var store = new EmbeddableDocumentStore() {RunInMemory = true})
{
store.Initialize();
IndexCreation.CreateIndexes(typeof(MyFakeProfiles_ByProfileId).Assembly, store);
using (var session = store.OpenSession())
{
var profiles = new List<MyFakeProfile>()
{
new MyFakeProfile() { Id=80, Age = 80, FirstName = "Grandpa", LastName = "Joe"},
new MyFakeProfile() { Id=9, Age = 9,FirstName = "Jonny", LastName = "Boy"},
new MyFakeProfile() { Id=22, Age = 22, FirstName = "John", LastName = "Smith"}
};
foreach (var myFakeProfile in profiles)
{
session.Store(myFakeProfile, "MyFakeProfiles/" + myFakeProfile.Id);
}
session.SaveChanges();
var oldestPerson = session.Query<MyFakeProfile>().Customize(c => c.WaitForNonStaleResults())
.OrderByDescending(p => p.Age).FirstOrDefault();
var youngestPerson = session.Query<MyFakeProfile>().Customize(c => c.WaitForNonStaleResults())
.OrderBy(p => p.Age).FirstOrDefault();
var highestId = session.Query<MyFakeProfile>("MyFakeProfiles/ByProfileId").Customize(c => c.WaitForNonStaleResults())
.OrderByDescending(p => p.Id).FirstOrDefault();
var lowestId = session.Query<MyFakeProfile>("MyFakeProfiles/ByProfileId").Customize(c => c.WaitForNonStaleResults())
.OrderBy(p => p.Id).FirstOrDefault();
//sanity checks for ordering in Raven
Assert.AreEqual(80,oldestPerson.Age); //succeeds
Assert.AreEqual(9, youngestPerson.Age);//succeeds
Assert.AreEqual(80, highestId.Id);//fails
Assert.AreEqual(9, lowestId.Id);//fails
}
}
}
private void PopulateTestValues(IDocumentSession session)
{
var profiles = new List<MyFakeProfile>()
{
new MyFakeProfile() { Id=80, Age = 80, FirstName = "Grandpa", LastName = "Joe"},
new MyFakeProfile() { Id=9, Age = 9,FirstName = "Jonny", LastName = "Boy"},
new MyFakeProfile() { Id=22, Age = 22, FirstName = "John", LastName = "Smith"}
};
foreach (var myFakeProfile in profiles)
{
session.Store(myFakeProfile, "MyFakeProfiles/" + myFakeProfile.Id);
}
}
}
public class MyFakeProfile
{
public int Id { get; set; }
public int Age { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
}
public class MyFakeProfiles_ByProfileId : AbstractIndexCreationTask<MyFakeProfile>
{
// The index name generated by this is going to be MyFakeProfiles/ByProfileId
public MyFakeProfiles_ByProfileId()
{
Map = profiles => from profile in profiles
select new
{
profile.Id
};
Sort(x => (int)x.Id, SortOptions.Int);
}
}

You need to specify the type of the field on the index, see http://ravendb.net/docs/2.5/client-api/querying/static-indexes/customizing-results-order
Side note: IDs in RavenDB are always strings. You seem to be trying to use integer IDs; don't do that.

You can provide multiple Sort fields; at the moment you have only defined one, for Id:
public SiteProfiles_ByProfileId()
{
Map = profiles => from profile in profiles
select new
{
profile.Id
};
Sort(x => x.Id, SortOptions.Int);
Sort(x => x.Age, SortOptions.Int);
}
BUT ... I am unsure of the effects of applying a sort on a field that isn't mapped.
You may have to extend the mapping to select both fields, like this:
public SiteProfiles_ByProfileId()
{
Map = profiles => from profile in profiles
select new
{
profile.Id,
profile.Age
};
Sort(x => x.Id, SortOptions.Int);
Sort(x => x.Age, SortOptions.Int);
}

Related

Updating property values in one list with a property value average of matching items in another list

I have two Lists and need to update a property value of all the items in the 1st list with a property value average of all the matching items in another list.
class transaction
{
public string orderId;
public string parentOrderId;
public int quantity;
public decimal marketPrice;
public decimal fillPrice;
}
List<transaction> makerTransactions = new List<transaction>()
{
new transaction(){
orderId = "1",
parentOrderId = "1",
quantity = 100,
marketPrice = 75.87M,
fillPrice = 75.87M
}
};
List<transaction> takerTransactions = new List<transaction>()
{
new transaction(){
orderId = "2",
parentOrderId = "1",
quantity = 50,
marketPrice = 75.97M,
fillPrice = 75.97M
},
new transaction(){
orderId = "3",
parentOrderId = "1",
quantity = 50,
marketPrice = 75.85M,
fillPrice = 75.85M
}
};
I am trying to make this work with LINQ extension methods but can't figure out the correct way:
makerTransactions.All(mt => mt.fillPrice = takerTransactions
.Where(tt => tt.parentOrderId == mt.orderId)
.Average(ta => ta.fillPrice));
Try this:
makerTransactions.ForEach(mt => mt.fillPrice = takerTransactions
.Where(tt => tt.parentOrderId == mt.orderId)
.Average(ta => ta.fillPrice));
All is an extension method that tells you whether all the elements in a collection match a certain condition, so it's not what you need here.
To make it more efficient, first create a dictionary and use that to take the averages from:
var priceDictionary = takerTransactions
.GroupBy(tt => tt.parentOrderId)
.ToDictionary(grp => grp.Key, grp => grp.Average(ta => ta.fillPrice));
makerTransactions.ForEach(mt => mt.fillPrice = priceDictionary[mt.orderId]);
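A self-contained sketch of the dictionary approach. FillPriceUpdater and ApplyAverageFillPrices are hypothetical names, and the sketch assumes that a maker with no matching takers should keep its current fillPrice, since the priceDictionary[mt.orderId] indexer would throw a KeyNotFoundException for it. Like the question, it uses a plain (not quantity-weighted) average.

```csharp
using System.Collections.Generic;
using System.Linq;

class transaction
{
    public string orderId;
    public string parentOrderId;
    public int quantity;
    public decimal marketPrice;
    public decimal fillPrice;
}

// Hypothetical helper wrapping the GroupBy/ToDictionary approach.
static class FillPriceUpdater
{
    // Sets each maker's fillPrice to the average fillPrice of the takers whose
    // parentOrderId matches the maker's orderId.
    public static void ApplyAverageFillPrices(List<transaction> makers, List<transaction> takers)
    {
        // One pass over the takers: parentOrderId -> average fillPrice.
        Dictionary<string, decimal> priceByParent = takers
            .GroupBy(tt => tt.parentOrderId)
            .ToDictionary(grp => grp.Key, grp => grp.Average(ta => ta.fillPrice));

        foreach (var mt in makers)
        {
            // Assumption: makers without takers keep their current price
            // instead of causing a KeyNotFoundException.
            if (priceByParent.TryGetValue(mt.orderId, out var average))
            {
                mt.fillPrice = average;
            }
        }
    }
}
```

With the sample data from the question, the maker with orderId "1" ends up with fillPrice 75.91M, the average of 75.97M and 75.85M.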

Add duplicates together in List

First question :)
I have a List<Materiau> (where Materiau implements IComparable<Materiau>), and I would like to remove all duplicates and add them together
(if two Materiau are the same according to the comparator, merge the second into the first and remove the second from the list)
A Materiau contains an ID and a quantity; when I merge two Materiau using += or +, the ID stays the same and the quantities are added.
I cannot control the input of the list.
I would like something like this:
List<Materiau> materiaux = getList().mergeDuplicates();
Thank you for your time :)
Check out Linq! Specifically the GroupBy method.
I don't know how familiar you are with SQL, but LINQ lets you query collections similarly to how SQL works.
It's a bit in-depth to explain if you are totally unfamiliar, but Code Project has a wonderful example.
To sum it up:
Imagine we have this
List<Product> prodList = new List<Product>
{
new Product
{
ID = 1,
Quantity = 1
},
new Product
{
ID = 2,
Quantity = 2
},
new Product
{
ID = 3,
Quantity = 7
},
new Product
{
ID = 4,
Quantity = 3
}
};
and we wanted to group all the duplicate products, and sum their quantities.
We can do this:
var groupedProducts = prodList.GroupBy(item => item.ID)
and then select the values out of the grouping, with the aggregates as needed
var results = groupedProducts.Select( i => new Product
{
ID = i.Key, // this is what we Grouped By above
Quantity = i.Sum(prod => prod.Quantity) // we want to sum up all the quantities in this grouping
});
and boom! we have a list of aggregated products
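Applied to the question's own type, the same GroupBy idea can be wrapped in the mergeDuplicates() extension the question asks for. This is a sketch under assumptions: Materiau is given hypothetical Id and Quantity properties, two items count as "the same" when their Ids match (standing in for the question's comparator), and + returns a new Materiau that keeps the left Id. The method is named MergeDuplicates following .NET casing conventions.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Materiau.
public class Materiau
{
    public int Id { get; set; }
    public int Quantity { get; set; }

    // Keeps the left-hand Id and adds the quantities, as the question describes.
    public static Materiau operator +(Materiau a, Materiau b) =>
        new Materiau { Id = a.Id, Quantity = a.Quantity + b.Quantity };
}

public static class MateriauExtensions
{
    // Groups items by Id (assumed to be what the comparator compares) and
    // folds each group with +, starting from the first occurrence.
    public static List<Materiau> MergeDuplicates(this IEnumerable<Materiau> items) =>
        items.GroupBy(m => m.Id)
             .Select(g => g.Aggregate((acc, next) => acc + next))
             .ToList();
}
```

Usage: List<Materiau> materiaux = getList().MergeDuplicates(); GroupBy yields its groups in the order in which each key first appears, so the merged list keeps the position of the first occurrence of each item.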
Let's say you have a class
class Foo
{
public int Id { get; set; }
public int Value { get; set; }
}
and a bunch of them inside a list
var foocollection = new List<Foo> {
new Foo { Id = 1, Value = 1, },
new Foo { Id = 2, Value = 1, },
new Foo { Id = 2, Value = 1, },
};
then you can group them and build the aggregate on each group
var foogrouped = foocollection
.GroupBy( f => f.Id )
.Select( g => new Foo { Id = g.Key, Value = g.Aggregate( 0, ( a, f ) => a + f.Value ) } )
.ToList();
List<Materiau> distinctList = getList().Distinct(EqualityComparer<Materiau>.Default).ToList();

Using Linq to remove from set where key exists in other set?

What is the proper way to do set subtraction using Linq? I have a List of 8000+ banks where I want to remove a portion of those based on the routing number. The portion is in another List and routing number is the key property to both. Here is a simplification:
public class Bank
{
public string RoutingNumber { get; set; }
public string Name { get; set; }
}
var removeThese = new List<string>() { "111", "444", "777" };
var banks = new List<Bank>()
{
new Bank() { RoutingNumber = "111", Name = "First Federal" },
new Bank() { RoutingNumber = "222", Name = "Second Federal" },
new Bank() { RoutingNumber = "333", Name = "Third Federal" },
new Bank() { RoutingNumber = "444", Name = "Fourth Federal" },
new Bank() { RoutingNumber = "555", Name = "Fifth Federal" },
new Bank() { RoutingNumber = "666", Name = "Sixth Federal" },
new Bank() { RoutingNumber = "777", Name = "Seventh Federal" },
new Bank() { RoutingNumber = "888", Name = "Eight Federal" },
new Bank() { RoutingNumber = "999", Name = "Ninth Federal" },
};
var query = banks.Remove(banks.Where(x => removeThese.Contains(x.RoutingNumber)));
This should do the trick:
var toRemove = banks.Where(x => removeThese.Contains(x.RoutingNumber)).ToList();
var query = banks.RemoveAll(x => toRemove.Contains(x));
The ToList() in the first step ensures that the first query is evaluated only once, instead of being re-run over and over again whenever banks changes.
This should work too:
var query = banks.Except(toRemove);
as your second line.
EDIT
Tim Schmelter pointed out that for Except to work, you need to override Equals and GetHashCode.
So you could implement it like so:
public override string ToString()
{
// Any serialization will do, for instance JSON, CSV or XML,
// or any representation that identifies the object quickly, such as:
return "Bank: " + this.RoutingNumber;
}
public override bool Equals(System.Object obj)
{
return (obj is Bank) && this.ToString().Equals(obj.ToString());
}
public override int GetHashCode()
{
return this.ToString().GetHashCode();
}
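As an alternative to overriding Equals and GetHashCode on Bank itself, Except also has an overload that takes an IEqualityComparer<T>. A sketch assuming the routing number alone identifies a bank; BankRoutingNumberComparer is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Bank
{
    public string RoutingNumber { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer: two banks are equal when their routing numbers match,
// so Except can compare them without changing the Bank class.
public class BankRoutingNumberComparer : IEqualityComparer<Bank>
{
    public bool Equals(Bank x, Bank y) =>
        ReferenceEquals(x, y) ||
        (x != null && y != null && x.RoutingNumber == y.RoutingNumber);

    public int GetHashCode(Bank bank) =>
        bank.RoutingNumber == null ? 0 : bank.RoutingNumber.GetHashCode();
}
```

Usage: var query = banks.Except(toRemove, new BankRoutingNumberComparer()); Keep in mind that Except is a set operation, so duplicate banks in the source are also collapsed in the result.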
Generally it's less work to just pull out the ones you need rather than delete the ones you don't, i.e.:
var query = myList.Where(x => !removeThese.Contains(x.RoutingNumber));
Filtering of this type is generally done with generic LINQ constructs:
banks = banks.Where(bank => !removeThese.Contains(bank.RoutingNumber)).ToList();
In this specific case you can also use List<T>.RemoveAll to do the filtering in-place, which will be faster:
banks.RemoveAll(bank => removeThese.Contains(bank.RoutingNumber));
Also, for performance reasons, if the amount of routing numbers to remove is large you should consider putting them into a HashSet<string> instead.
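A sketch of the HashSet<string> suggestion combined with RemoveAll. BankFilter.RemoveByRoutingNumber is a hypothetical helper, and the Bank class is repeated from the question so the block compiles on its own.

```csharp
using System.Collections.Generic;

public class Bank
{
    public string RoutingNumber { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper for in-place filtering.
public static class BankFilter
{
    // Removes every bank whose routing number is in removeThese and returns
    // how many were removed. HashSet.Contains is O(1) on average, whereas
    // List.Contains scans the whole list for every bank.
    public static int RemoveByRoutingNumber(List<Bank> banks, IEnumerable<string> removeThese)
    {
        var toRemove = new HashSet<string>(removeThese);
        return banks.RemoveAll(bank => toRemove.Contains(bank.RoutingNumber));
    }
}
```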
Either use the Linq extension methods Where and ToList to create a new list or use List.RemoveAll which is more efficient since it modifies the original list:
banks = banks.Where(x => !removeThese.Contains(x.RoutingNumber)).ToList();
banks.RemoveAll(x => removeThese.Contains(x.RoutingNumber));
Of course you have to reverse the condition: Where keeps the elements for which its predicate returns true, while RemoveAll removes the elements for which its predicate returns true.
Have you tried using RemoveAll()?
var query = banks.RemoveAll(p => removeThese.Contains(p.RoutingNumber));
This will remove any values from banks whose routing number is present in removeThese.
query will contain the number of records removed from the list.
Note: the original banks list is modified in place by this call; no reassignment is required.
You can use RemoveAll()
var removedCount = banks.RemoveAll(x => removeThese.Contains(x.RoutingNumber));
or
banks = banks.Where(bank => !removeThese.Contains(bank.RoutingNumber)).ToList();

Code to collapse duplicate and semi-duplicate records?

I have a list of models of this type:
public class TourDude {
public int Id { get; set; }
public string Name { get; set; }
}
And here is my list:
public IEnumerable<TourDude> GetAllGuides {
get {
List<TourDude> guides = new List<TourDude>();
guides.Add(new TourDude() { Name = "Dave Et", Id = 1 });
guides.Add(new TourDude() { Name = "Dave Eton", Id = 1 });
guides.Add(new TourDude() { Name = "Dave EtZ5", Id = 1 });
guides.Add(new TourDude() { Name = "Danial Maze A", Id = 2 });
guides.Add(new TourDude() { Name = "Danial Maze B", Id = 2 });
guides.Add(new TourDude() { Name = "Danial", Id = 3 });
return guides;
}
}
I want to retrieve these records:
{ Name = "Dave Et", Id = 1 }
{ Name = "Danial Maze", Id = 2 }
{ Name = "Danial", Id = 3 }
The goal is mainly to collapse duplicates and near-duplicates (identifiable by the ID), taking the shortest possible value (when compared) as the name.
Where do I start? Is there a complete LINQ that will do this for me? Do I need to code up an equality comparer?
Edit 1:
var result = from x in GetAllGuides
group x.Name by x.Id into g
select new TourDude {
Test = Exts.LongestCommonPrefix(g),
Id = g.Key,
};
IEnumerable<IEnumerable<char>> test = result.First().Test;
string str = test.First().ToString();
If you want to group the items by Id and then find the longest common prefix of the Names within each group, then you can do so as follows:
var result = from x in guides
group x.Name by x.Id into g
select new TourDude
{
Name = LongestCommonPrefix(g),
Id = g.Key,
};
using the algorithm for finding the longest common prefix from here.
Result:
{ Name = "Dave Et", Id = 1 }
{ Name = "Danial Maze ", Id = 2 }
{ Name = "Danial", Id = 3 }
static string LongestCommonPrefix(IEnumerable<string> xs)
{
return new string(xs
.Transpose()
.TakeWhile(s => s.All(d => d == s.First()))
.Select(s => s.First())
.ToArray());
}
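Transpose is not a built-in System.Linq operator, so the LongestCommonPrefix above needs the linked helper code (or a third-party library) to compile. A sketch of an equivalent that only needs System.Linq; PrefixHelper is a hypothetical class name, and characters are compared ordinally.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class.
public static class PrefixHelper
{
    // Returns the longest prefix shared by every string in xs ("" for no input).
    public static string LongestCommonPrefix(IEnumerable<string> xs)
    {
        var items = xs.ToList();
        if (items.Count == 0) return string.Empty;

        // The common prefix can be no longer than the shortest string.
        int max = items.Min(s => s.Length);
        int length = 0;

        // Extend the prefix while every string has the same character at this index.
        while (length < max && items.All(s => s[length] == items[0][length]))
        {
            length++;
        }
        return items[0].Substring(0, length);
    }
}
```

It can be dropped into the query above as Name = PrefixHelper.LongestCommonPrefix(g), and gives the same results as those listed, including the trailing space in "Danial Maze ".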
I was able to achieve this by grouping the records on the ID then selecting the first record from each group ordered by the Name length:
var result = GetAllGuides.GroupBy(td => td.Id)
.Select(g => g.OrderBy(td => td.Name.Length).First());
foreach (var dude in result)
{
Console.WriteLine("{{Name = {0}, Id = {1}}}", dude.Name, dude.Id);
}

Group by in linq + select

Say I have this data (columns: pk, userId, courseId, permissionId):
1 757f27a2-e997-44f8-b2c2-6c0fd6ee2c2f 2 3
2 757f27a2-e997-44f8-b2c2-6c0fd6ee2c2f 3 1
3 757f27a2-e997-44f8-b2c2-6c0fd6ee2c2f 2 2
I have this class
class CoursePermissions
{
public string Prefix { get; set; }
public bool OwnerPermission { get; set; } // permissionId 1
public bool AddPermission { get; set; } // permissionId 2
public bool EditPermission { get; set; } // permissionId 3
}
I want to group all 3 rows by courseId (or Prefix) and then use that information to build an object.
So the end result would be
List<CoursePermissions> permissions = new List<CoursePermissions>();
CoursePermissions a = new CoursePermissions
{
Prefix = "comp101",
OwnerPermission = false,
AddPermission = true,
EditPermission = true
};
CoursePermissions b = new CoursePermissions
{
Prefix = "comp102",
OwnerPermission = true,
AddPermission = false,
EditPermission = false
};
permissions.Add(a);
permissions.Add(b);
So the above is how the objects would look if I took all the row data from the db and built them manually the way I want them to look. Of course, I need to do it somehow as a query.
In my example I have 2 students. They both belong to the same course. Student 1 has Edit and Add permissions for comp101 but only Owner permission for comp102.
I want to get all the rows back for Comp101 and put it into CoursePermissions. Then I want to get all the rows back for Comp102 and put it into CoursePermissions. Then store all these in a collection and use them.
The best I can do is something like this:
var list = session.Query<PermissionLevel>().Where(u => u.Student.StudentId == studentId).ToList();
IEnumerable<IGrouping<string, PermissionLevel>> test = list.GroupBy(x => x.Course.Prefix);
foreach (var t in test)
{
CoursePermissions c = new CoursePermissions();
foreach (var permissionLevel in t)
{
if (permissionLevel.PermissionLevelId == 1)
{
c.OwnerPermission = true;
}
}
}
It would be nice if I could get rid of the nested foreach loop and do it all as the data comes back from the query.
Here's an approach that I think is quite functional.
First set up a dictionary of actions that will set the appropriate course permission given a permission level id.
var setPermission = new Dictionary<int, Action<CoursePermissions>>()
{
{ 1, cps => cps.OwnerPermission = true },
{ 2, cps => cps.AddPermission = true },
{ 3, cps => cps.EditPermission = true },
};
Now create a function that will turn the course prefix and a list of permission level ids into a new CoursePermissions object.
Func<string, IEnumerable<int>, CoursePermissions>
buildCoursePermission = (prefix, permissionLevelIds) =>
{
var cps = new CoursePermissions() { Prefix = prefix };
foreach (var permissionLevelId in permissionLevelIds)
{
setPermission[permissionLevelId](cps);
}
return cps;
};
Now all you have left is a simple query that turns your list of permission levels into a list of course permissions.
var coursePermissionsList =
(from pl in list
group pl.PermissionLevelId by pl.Course.Prefix into gcpls
select buildCoursePermission(gcpls.Key, gcpls)).ToList();
How does that work for you?
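If the dictionary of actions feels heavy, here is a sketch of an alternative that builds each CoursePermissions directly in the select, using Contains on each group of permission level ids. The Course and PermissionLevel classes are minimal hypothetical stand-ins for the question's entities, PermissionMapper.Build is a hypothetical name, and the query runs in memory on the list already materialized by ToList().

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal hypothetical stand-ins for the question's entities.
public class Course { public string Prefix { get; set; } }

public class PermissionLevel
{
    public int PermissionLevelId { get; set; }
    public Course Course { get; set; }
}

public class CoursePermissions
{
    public string Prefix { get; set; }
    public bool OwnerPermission { get; set; } // permissionId 1
    public bool AddPermission { get; set; }   // permissionId 2
    public bool EditPermission { get; set; }  // permissionId 3
}

public static class PermissionMapper
{
    // Groups the permission level ids by course prefix and sets each flag
    // when the group contains the corresponding id.
    public static List<CoursePermissions> Build(IEnumerable<PermissionLevel> rows) =>
        (from pl in rows
         group pl.PermissionLevelId by pl.Course.Prefix into g
         select new CoursePermissions
         {
             Prefix = g.Key,
             OwnerPermission = g.Contains(1),
             AddPermission = g.Contains(2),
             EditPermission = g.Contains(3)
         }).ToList();
}
```

This trades the extensibility of the action dictionary for a query that can be read at a glance; adding a new permission means adding one more property assignment.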
