I'm working on an algorithm for recommending restaurants to the client. These recommendations are based on a few filters, but mostly on comparing the reviews people have left on restaurants. (I'll spare you the details.)
To calculate a Pearson correlation (a number that determines how well users fit with each other), I have to check where users have left a review on the same restaurant. To increase the number of matches, I've also included a match on the price range of the subjects. I'll try to explain. Here is my Restaurant class:
public class Restaurant
{
public Guid Id { get; set; }
public int PriceRange { get; set; }
}
This is a simplified version, but it's enough for my example. PriceRange is an integer from 1 to 5 that indicates how expensive the restaurant is.
Here's the for loop I'm using to check if they left reviews on the same restaurant or a review on a restaurant with the same pricerange.
//List<Review> user1Reviews is a list of all reviews from the first user
//List<Review> user2Reviews is a list of all reviews from the second user
Dictionary<Review, Review> shared_items = new Dictionary<Review, Review>();
foreach (var review1 in user1Reviews)
foreach (var review2 in user2Reviews)
if (review1.Restaurant.Id == review2.Restaurant.Id ||
review1.Restaurant.PriceRange == review2.Restaurant.PriceRange)
if (!shared_items.ContainsKey(review1))
shared_items.Add(review1, review2);
Now here's my actual problem. You can see I'm looping over the second list for each review the first user has left. Is there a way to improve the performance of these loops? I have tried using a HashSet and its Contains() method, but I need to include more criteria (i.e. the price range), and I couldn't figure out how to include that in a HashSet.
I hope it's not too confusing, and thanks in advance for any help!
Edit: After testing both LINQ and the for loops, I have concluded that the for loops are twice as fast as LINQ. Thanks for your help!
You could try replacing your inner loop by a Linq query using the criteria of the outer loop:
foreach (var review1 in user1Reviews)
{
var review2 = user2Reviews.FirstOrDefault(r2 => r2.Restaurant.Id == review1.Restaurant.Id ||
r2.Restaurant.PriceRange == review1.Restaurant.PriceRange);
if (review2 != null)
{
if (!shared_items.ContainsKey(review1))
shared_items.Add(review1, review2);
}
}
If there are multiple matches you should use Where and deal with the potential list of results.
I'm not sure it would be any quicker though as you still have to check all the user2 reviews against the user1 reviews.
However, if you wrote a custom comparer for your Restaurant class, you could use the Intersect overload that takes an IEqualityComparer to get the restaurants both users have reviewed:
var commonRestaurants = user1Reviews.Select(r => r.Restaurant)
.Intersect(user2Reviews.Select(r => r.Restaurant), new RestaurantComparer());
Where RestaurantComparer looks something like this:
// Custom comparer for the Restaurant class
class RestaurantComparer : IEqualityComparer<Restaurant>
{
// Restaurants are equal if their ids and price ranges are equal.
public bool Equals(Restaurant x, Restaurant y)
{
//Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y)) return true;
//Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
//Check whether the properties are equal.
return x.Id == y.Id && x.PriceRange == y.PriceRange;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(Restaurant restaurant)
{
//Check whether the object is null
if (Object.ReferenceEquals(restaurant, null)) return 0;
//Get hash code for the Id field.
int hashId = restaurant.Id.GetHashCode();
//Get hash code for the PriceRange field.
int hashPriceRange = restaurant.PriceRange.GetHashCode();
//Calculate the hash code for the restaurant.
return hashId ^ hashPriceRange;
}
}
You basically need a fast way to locate a review by Id or PriceRange. Normally you would use a fast hash-based lookup structure like Dictionary<TKey, TValue> with a single key, or with a composite key if the match operation were "and". Unfortunately yours is "or", so a single Dictionary doesn't work.
Not entirely, though. A single dictionary does not work, but you can use two, and since a dictionary lookup is O(1), the whole operation will still be O(N + M) (rather than O(N * M) as with the inner loop / naïve LINQ).
Since the keys are not unique, you can use lookups instead of dictionaries and keep the same efficiency:
var lookup1 = user2Reviews.ToLookup(r => r.Restaurant.Id);
var lookup2 = user2Reviews.ToLookup(r => r.Restaurant.PriceRange);
foreach (var review1 in user1Reviews)
{
var review2 = lookup1[review1.Restaurant.Id].FirstOrDefault() ??
lookup2[review1.Restaurant.PriceRange].FirstOrDefault();
if (review2 != null)
{
// do something
}
}
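As a rough, self-contained sketch of the two-lookup idea above: the Review class here is a hypothetical minimal stand-in (the question does not show it), and ReviewMatcher / FindSharedItems are names invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Restaurant
{
    public Guid Id { get; set; }
    public int PriceRange { get; set; }
}

// Hypothetical minimal Review type; the original post does not show it.
public class Review
{
    public Restaurant Restaurant { get; set; }
    public int Rating { get; set; }
}

public static class ReviewMatcher
{
    // Pairs each review of user 1 with a review of user 2 that is for the
    // same restaurant or, failing that, for the same price range.
    public static Dictionary<Review, Review> FindSharedItems(
        IEnumerable<Review> user1Reviews, IEnumerable<Review> user2Reviews)
    {
        var user2List = user2Reviews.ToList();
        // Build both lookups once: O(M).
        var byId = user2List.ToLookup(r => r.Restaurant.Id);
        var byPriceRange = user2List.ToLookup(r => r.Restaurant.PriceRange);

        var sharedItems = new Dictionary<Review, Review>();
        foreach (var review1 in user1Reviews)
        {
            // A lookup returns an empty sequence for a missing key,
            // so FirstOrDefault() simply yields null in that case.
            var review2 = byId[review1.Restaurant.Id].FirstOrDefault()
                ?? byPriceRange[review1.Restaurant.PriceRange].FirstOrDefault();
            if (review2 != null && !sharedItems.ContainsKey(review1))
                sharedItems.Add(review1, review2);
        }
        return sharedItems;
    }
}

public static class Program
{
    public static void Main()
    {
        var cheapA = new Restaurant { Id = Guid.NewGuid(), PriceRange = 1 };
        var cheapB = new Restaurant { Id = Guid.NewGuid(), PriceRange = 1 };
        var pricey = new Restaurant { Id = Guid.NewGuid(), PriceRange = 5 };

        var user1 = new List<Review>
        {
            new Review { Restaurant = cheapA, Rating = 4 },
            new Review { Restaurant = pricey, Rating = 2 },
        };
        var user2 = new List<Review> { new Review { Restaurant = cheapB, Rating = 5 } };

        var shared = ReviewMatcher.FindSharedItems(user1, user2);
        Console.WriteLine(shared.Count); // only the price-range match on cheapA/cheapB
    }
}
```

One difference from the nested loops is worth keeping in mind: this version prefers an Id match over a price-range match, whereas the original loops take whichever matching review comes first in user2Reviews.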
I have 2 classes one called Record and one called Location
public class Record
{
public string Id { get; set; }
}
public class Location
{
public string LocationId { get; set; }
public bool IsDefault { get; set; }
}
Now I need to check whether the given Id from a Record exists in a list of Locations. For example, this is my code so far for handling it. If field.Id does not exist in the current list, then it should create a new Location and add it to the list.
foreach (Record field in inputs)
{
locationResponse = await myLocationCmd.GetLocation(locationInput).ConfigureAwait(false);
foreach (Location locations in locationResponse.Locations)
{
if (locations.IsDefault == true)
{
this.BadRequest();
}
else if (locations.IsDefault == false && // Check whether field.Id exists in locationResponse.Locations)
{
locationInput.Verb = CmdletVerb.Set.ToString();
}
else if (locations.IsDefault == false && // Check whether field.Id does not exist in the locationResponse.Locations)
{
locationInput.Verb = CmdletVerb.New.ToString();
}
}
}
Currently I tried doing locationResponse.Locations.Contains(field.Id) but this produces a type error. Does anyone know the correct way to do this?
If I understood correctly, you need to loop over every object to check the ID.
locationResponse.Locations.Contains(field.Id) gives a type error because that is a list of Locations, not a list of strings.
locationResponse = await myLocationCmd.GetLocation(locationInput).ConfigureAwait(false);
foreach (Record field in inputs) {
var idExists = false;
foreach (Location loc in locationResponse.Locations) {
idExists = loc.LocationId == field.Id;
if (idExists) break;
}
Console.WriteLine(idExists);
}
I would be cautious about the size of the inputs list, though, because you are looping over the same locations list once for every field, which wastes iterations.
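One way to avoid rescanning the locations for every field is to collect the location IDs into a HashSet<string> once and test each field against it. The following is only a sketch: LocationChecks and FindMissingIds are hypothetical names, and the Record and Location classes are the simplified ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public string Id { get; set; }
}

public class Location
{
    public string LocationId { get; set; }
    public bool IsDefault { get; set; }
}

public static class LocationChecks
{
    // Returns the record IDs that have no matching LocationId.
    public static List<string> FindMissingIds(
        IEnumerable<Record> inputs, IEnumerable<Location> locations)
    {
        // Built once; each Contains call afterwards is a hash lookup.
        var knownIds = new HashSet<string>(locations.Select(l => l.LocationId));
        return inputs.Select(r => r.Id)
                     .Where(id => !knownIds.Contains(id))
                     .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var locations = new List<Location>
        {
            new Location { LocationId = "a" },
            new Location { LocationId = "b" },
        };
        var inputs = new List<Record>
        {
            new Record { Id = "a" },
            new Record { Id = "c" },
        };

        foreach (var id in LocationChecks.FindMissingIds(inputs, locations))
            Console.WriteLine(id); // IDs that would need a new Location
    }
}
```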
The shortest fix based on the given code would be using:
bool idExistsInList = locationResponse.Locations.Any(location => location.LocationId == field.Id);
And then you can use idExistsInList and !idExistsInList (respectively) in your two comments.
However, there are several further improvements to be made:
There is a bug in your code. Your inner foreach keeps overwriting the same value and will effectively only keep the value that is set in the last loop of the foreach.
The exact same bug happens for your outer foreach. The value of locationInput.Verb is effectively defined by comparing the last element of both lists, and all other evaluations are overwritten.
In general, favor LINQ over manual foreach logic. It enhances readability and cuts down on boilerplating.
Assuming this is the only logic in the outer foreach, that can also be simplified using LINQ.
For every input field, you fetch the same locationResponse. It makes no sense to do this for every field if the specific field value isn't actually used to fetch the object.
There's no need for the third if evaluation. When the first two evaluations have failed, then the third one is always going to be true.
== true can always be omitted.
== false should generally be refactored to using !
When you always set the same field/property, but a boolean value decided if you set it to one value or another, favor using the ternary myBool ? valueIfTrue : valueIfFalse.
You can extract this logic outside of the foreach loops:
var locationResponse = await myLocationCmd.GetLocation(locationInput).ConfigureAwait(false);
if(locationResponse.Locations.Any(location => location.IsDefault))
this.BadRequest();
var locationIds = locationResponse.Locations.Select(location => location.LocationId).ToList();
How to handle the foreach loops is unclear, as I'm not sure whether you're trying to find if any field (= at least one) has a match in the location list, or whether every field has a match in the location list.
If you're trying to find if any field (= at least one) has a match in the location list:
bool matchFound = inputs.Any(field => locationIds.Contains(field.Id));
If you're trying to find if every field has a match in the location list:
bool matchFound = inputs.All(field => locationIds.Contains(field.Id));
Any returns true if at least one field has a match, All only returns true if all fields have a match.
In either case, you can then continue on with:
locationInput.Verb = matchFound
? CmdletVerb.Set.ToString()
: CmdletVerb.New.ToString();
And that accomplishes the same as your code, but with less redundancy and higher readability.
A table has a Categories column which holds integers representing Property, Cars, and Others.
There are different columns of interest for each category, shown below, such that a keyword search for property will focus on the PropertyType, State, County, and NoOfBaths columns, while a keyword search for cars will focus on make, model, year and so on.
All entries have data in all columns, but the data sometimes has a slightly different meaning for different categories. For instance, the PropertyType column holds CarType data for cars and ItemType data for others, but the column is only of interest when searching property.
Property
PropertyType,
Location State,
Location County,
No of Baths
Cars
Make,
Model,
Year,
Location State
Others
Itemname,
Make,
Colour,
Location State
The columns of interest were limited to four for performance reasons. A single search text box is used in the UI, just like Google. The algorithm used to pre-identify the user's search category before the query is fired posts an acceptable 98% accuracy rate. The algorithm also makes a good guess of what could be colour, state, county etc.
The site started as a small ads site developed using C#, Entity Framework, and SQL Server.
Since it was conceived as a small project, I thought I could pull it off with LINQ to Entities. Using if statements to eliminate null fields, there were a finite number of queries (2 to the power of 4) for each category.
Eg. 1
and some listings for the queryHelper
where the null value is checked before the where clause is composed.
By the time I was done, I was not sure if a small project like that deserved this kind of logic even though it seemed more flexible and maintainable. The columns of interest could be changed without affecting the code.
The question is if there is an easier way to achieve this?
Secondly, why isn’t there an ‘Ignorable()’ function in linq such that a given portion of the where clause can be ignored if the value being compared is null or empty?
Eg. 1 modified
var results = context.Items.Where(m=>m.make.Ignorable() == make &&
m.model.Ignorable() == model && m.year.Ignorable() ==year &&
m.state.Ignorable() == state);
…
Or a symbol, say ‘¬’, which achieves the same like so
Eg. 1 modified
var results = context.Items.Where(m=>m.make ¬== make && m.model ¬== model
&& m.year ¬==year && m.state ¬== state);
…
I think a much easier and more maintainable way of doing this is to override the Equals() method in the particular class, so that changes to the properties do not require altering the LINQ queries. Let me explain with an example class, Cars, defined like this:
public class Cars
{
// Properties
public string Make { get; set; }
public string Model { get; set; }
public int Year { get; set; }
public string Location_State { get; set; }
// overridden Equals method definition
public override bool Equals(object obj)
{
return this.Equals(obj as Cars);
}
public bool Equals(Cars other)
{
if (other == null)
return false;
return (this.Make == other.Make) &&
(this.Model == other.Model) &&
(this.Year == other.Year) &&
(this.Location_State == other.Location_State);
}
}
Now let objCars be the object you wanted to compare with the cars in the context.Items then you can format your LINQ query like this:
context.Items.Where(m=> m.Equals(objCars));
Note: You can put any number of conditions in the Equals method, so you avoid checking for null or empty (or anything else) each time before executing the LINQ query, or even within it. When the class's properties change, you only need to alter the condition in the overridden method.
var q = context.Items.AsQueryable();
if (!string.IsNullOrEmpty(make))
{
q = q.Where(m => m.make == make);
}
if (!string.IsNullOrEmpty(model))
{
q = q.Where(m => m.model == model);
}
//. . .
var results = q.ToList();
The query can be built up over multiple statements before it is executed.
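The repeated if blocks can be folded into a small extension method, which comes close to the "Ignorable()" idea from the question. This is only a sketch: WhereIf is a hypothetical helper (not part of LINQ or Entity Framework), the Item class is invented for the illustration, and an in-memory list stands in for context.Items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity; the real Items table is not shown in the question.
public class Item
{
    public string Make { get; set; }
    public string Model { get; set; }
    public int Year { get; set; }
    public string State { get; set; }
}

public static class QueryableExtensions
{
    // Hypothetical helper: applies the predicate only when condition is true,
    // otherwise returns the query unchanged.
    public static IQueryable<T> WhereIf<T>(
        this IQueryable<T> source, bool condition, Expression<Func<T, bool>> predicate)
    {
        return condition ? source.Where(predicate) : source;
    }
}

public static class ItemSearch
{
    public static IQueryable<Item> Search(
        IQueryable<Item> items, string make, string model, int? year, string state)
    {
        // Each filter is skipped when its search value is missing.
        return items
            .WhereIf(!string.IsNullOrEmpty(make), m => m.Make == make)
            .WhereIf(!string.IsNullOrEmpty(model), m => m.Model == model)
            .WhereIf(year.HasValue, m => m.Year == year.Value)
            .WhereIf(!string.IsNullOrEmpty(state), m => m.State == state);
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { Make = "Ford", Model = "Focus", Year = 2010, State = "TX" },
            new Item { Make = "Ford", Model = "Fiesta", Year = 2012, State = "CA" },
            new Item { Make = "Audi", Model = "A4", Year = 2010, State = "TX" },
        }.AsQueryable();

        // Only make is supplied, so only the make filter is applied.
        Console.WriteLine(ItemSearch.Search(items, "Ford", null, null, null).Count());
    }
}
```

Because the predicates are passed as expression trees and Where is only appended when needed, a query provider would see just the filters that were actually supplied.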
I have a list List<OfferComparison> Comparison. I want to
check if all the items have Value == null in an if condition.
How can I do it with linq?
public class OfferComparison : BaseModel
{
public string Name { get; set; }
public string Value { get; set; }
public bool Valid { get; set; }
}
Updated (post C# 7) Answer
If using C# 7 or 8 then one could use the is keyword together with Linq.All:
var result = Comparison.All(item => item.Value is null)
If using C# 9 then one could use is not null together with Linq.Any, negating the result:
var result = !Comparison.Any(item => item.Value is not null);
One could also use is object or (from C# 8) is {} together with Linq.Any, again negating the result:
var result = !Comparison.Any(item => item.Value is object);
All these options give the same result, and in terms of time complexity they are all O(n). Which one is "preferred" is largely a matter of personal opinion.
Original (pre C# 7) Answer
Using linq method of All:
var result = Comparison.All(item => item.Value == null)
Basically what it does is to iterate all items of a collection and check a predicate for each of them. If one does not match - result is false
You can check by this linq statement
var isNull = Comparison.All(item => item.Value == null);
I'm not totally sure about the internal differences of All and Exists, but it might be a good idea to just check whether one of the entries is not null and then negate the result:
var result = !Comparison.Exists(o => o.Value != null);
I would expect this query to quit after the first non-null value was found and therefore to be a little more efficient.
Update: From the Enumerable.All documentation:
The enumeration of source is stopped as soon as the result can be determined.
Therefore, using All will probably not result in the entire list getting processed after a non-null value has been found.
So the aforementioned possible performance gain is not likely to occur and both solutions probably do not differ.
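A small sketch comparing the two formulations, using a cut-down OfferComparison (the BaseModel base class is omitted here because it is not shown in the question; NullChecks and its method names are made up for the illustration). It also shows an edge case: for an empty list, both forms return true.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified copy of the class from the question, without BaseModel.
public class OfferComparison
{
    public string Name { get; set; }
    public string Value { get; set; }
    public bool Valid { get; set; }
}

public static class NullChecks
{
    // True when every item has a null Value (also true for an empty list).
    public static bool AllValuesNull(IEnumerable<OfferComparison> items)
        => items.All(item => item.Value == null);

    // Equivalent formulation: no item has a non-null Value.
    public static bool NoValueSet(IEnumerable<OfferComparison> items)
        => !items.Any(item => item.Value != null);
}

public static class Program
{
    public static void Main()
    {
        var allNull = new List<OfferComparison>
        {
            new OfferComparison { Name = "a" },
            new OfferComparison { Name = "b" },
        };
        var mixed = new List<OfferComparison>
        {
            new OfferComparison { Name = "a" },
            new OfferComparison { Name = "b", Value = "x" },
        };
        var empty = new List<OfferComparison>();

        Console.WriteLine(NullChecks.AllValuesNull(allNull)); // True
        Console.WriteLine(NullChecks.AllValuesNull(mixed));   // False
        Console.WriteLine(NullChecks.AllValuesNull(empty));   // True
        Console.WriteLine(NullChecks.NoValueSet(mixed));      // False
    }
}
```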
I'm wondering if anyone has any suggestions for this problem.
I'm using intersect and except (Linq) with a custom IEqualityComparer in order to query the set differences and set intersections of two sequences of ISyncableUsers.
public interface ISyncableUser
{
string Guid { get; }
string UserPrincipalName { get; }
}
The logic behind whether two ISyncableUsers are equal is conditional. The conditions center around whether either of the two properties, Guid and UserPrincipalName, has a value. The best way to explain this logic is with code. Below is the Equals method of my custom IEqualityComparer.
public bool Equals(ISyncableUser userA, ISyncableUser userB)
{
if (userA == null && userB == null)
{
return true;
}
if (userA == null)
{
return false;
}
if (userB == null)
{
return false;
}
if ((!string.IsNullOrWhiteSpace(userA.Guid) && !string.IsNullOrWhiteSpace(userB.Guid)) &&
userA.Guid == userB.Guid)
{
return true;
}
if (UsersHaveUpn(userA, userB))
{
if (userB.UserPrincipalName.Equals(userA.UserPrincipalName, StringComparison.InvariantCultureIgnoreCase))
{
return true;
}
}
return false;
}
private bool UsersHaveUpn(ISyncableUser userA, ISyncableUser userB)
{
return !string.IsNullOrWhiteSpace(userA.UserPrincipalName)
&& !string.IsNullOrWhiteSpace(userB.UserPrincipalName);
}
The problem I'm having is with implementing GetHashCode so that the conditional equality shown above is respected. The only way I've been able to get the Intersect and Except calls to work as expected is to simply always return the same value from GetHashCode(), forcing a call to Equals.
public int GetHashCode(ISyncableUser obj)
{
return 0;
}
This works, but the performance penalty is huge, as expected. (I've tested this with non-conditional equality. With two sets containing 50000 objects, a proper hash code implementation allows execution of Intersect and Except in about 40ms. A hash code implementation that always returns 0 takes approximately 144000ms (yes, 2.4 minutes!))
So, how would I go about implementing a GetHashCode() in the scenario above?
Any thoughts would be more than welcome!
If I'm reading this correctly, your equality relation is not transitive. Picture the following three ISyncableUsers:
A { Guid: "1", UserPrincipalName: "2" }
B { Guid: "2", UserPrincipalName: "2" }
C { Guid: "2", UserPrincipalName: "1" }
A == B because they have the same UserPrincipalName
B == C because they have the same Guid
A != C because they don't share either.
From the spec,
The Equals method is reflexive, symmetric, and transitive. That is, it returns true if used to compare an object with itself; true for two objects x and y if it is true for y and x; and true for two objects x and z if it is true for x and y and also true for y and z.
If your equality relation isn't consistent, there's no way you can implement a hash code that backs it up.
From another point of view: you're essentially looking for three functions:
G mapping GUIDs to ints (if you know the GUID but the UPN is blank)
U mapping UPNs to ints (if you know the UPN but the GUID is blank)
P mapping (guid, upn) pairs to ints (if you know both)
such that G(g) == U(u) == P(g, u) for all g and u. This is only possible if you ignore g and u completely.
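The three users above can be run through a condensed version of the question's Equals logic to confirm that it is not transitive. This is an illustrative sketch: the User class and the comparer's class name are hypothetical, and GetHashCode returns 0 as in the question.

```csharp
using System;
using System.Collections.Generic;

public interface ISyncableUser
{
    string Guid { get; }
    string UserPrincipalName { get; }
}

// Hypothetical concrete type, used only for this illustration.
public class User : ISyncableUser
{
    public string Guid { get; set; }
    public string UserPrincipalName { get; set; }
}

// Condensed version of the Equals logic from the question.
public class SyncableUserComparer : IEqualityComparer<ISyncableUser>
{
    public bool Equals(ISyncableUser a, ISyncableUser b)
    {
        if (a == null || b == null) return a == null && b == null;

        bool bothHaveGuid = !string.IsNullOrWhiteSpace(a.Guid)
                         && !string.IsNullOrWhiteSpace(b.Guid);
        if (bothHaveGuid && a.Guid == b.Guid) return true;

        bool bothHaveUpn = !string.IsNullOrWhiteSpace(a.UserPrincipalName)
                        && !string.IsNullOrWhiteSpace(b.UserPrincipalName);
        return bothHaveUpn && string.Equals(a.UserPrincipalName, b.UserPrincipalName,
                                            StringComparison.InvariantCultureIgnoreCase);
    }

    // Constant hash, as in the question: correct but slow.
    public int GetHashCode(ISyncableUser obj) => 0;
}

public static class Program
{
    public static void Main()
    {
        var comparer = new SyncableUserComparer();
        var a = new User { Guid = "1", UserPrincipalName = "2" };
        var b = new User { Guid = "2", UserPrincipalName = "2" };
        var c = new User { Guid = "2", UserPrincipalName = "1" };

        Console.WriteLine(comparer.Equals(a, b)); // True  (same UserPrincipalName)
        Console.WriteLine(comparer.Equals(b, c)); // True  (same Guid)
        Console.WriteLine(comparer.Equals(a, c)); // False (nothing in common)
    }
}
```

Any hash function consistent with this Equals would have to give A and B the same hash, and B and C the same hash, and therefore A and C the same hash, even though A and C are not equal.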
If we suppose that your Equals implementation is correct, i.e. it's reflexive, transitive and symmetric, then the basic implementation for your GetHashCode function should look like this:
public int GetHashCode(ISyncableUser obj)
{
if (obj == null)
{
return SOME_CONSTANT;
}
if (!string.IsNullOrWhiteSpace(obj.UserPrincipalName) &&
<can have user object with different guid and the same name>)
{
return GetHashCode(obj.UserPrincipalName);
}
return GetHashCode(obj.Guid);
}
You should also understand that you've got rather intricate dependencies between your objects.
Indeed, let's take two ISyncableUser objects, 'u1' and 'u2', such that u1.Guid != u2.Guid but u1.UserPrincipalName == u2.UserPrincipalName and the names are not empty. The requirements for equality then impose that for any 'ISyncableUser' object 'u' such that u.Guid == u1.Guid, the condition u.UserPrincipalName == u1.UserPrincipalName must also be true. This reasoning dictates the GetHashCode implementation: for each user object it should be based either on its name or on its guid.
One way would be to maintain a dictionary of hash codes for usernames and GUIDs.
You could generate this dictionary once at the start for all users, which would probably be the cleanest solution.
You could add or update an entry in the constructor of each user.
Or, you could maintain that dictionary inside the GetHashCode function. This means your GetHashCode function has more work to do and is not free of side effects. Getting this to work with multiple threads or Parallel LINQ will need some more careful work, so I don't know whether I would recommend this approach.
Nevertheless, here is my attempt:
private Dictionary<string, int> _guidHash =
new Dictionary<string, int>();
private Dictionary<string, int> _nameHash =
new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
public int GetHashCode(ISyncableUser obj)
{
int hash = 0;
if (obj==null) return hash;
if (!String.IsNullOrWhiteSpace(obj.Guid)
&& _guidHash.TryGetValue(obj.Guid, out hash))
return hash;
if (!String.IsNullOrWhiteSpace(obj.UserPrincipalName)
&& _nameHash.TryGetValue(obj.UserPrincipalName, out hash))
return hash;
hash = RuntimeHelpers.GetHashCode(obj);
// or use some other method to generate an unique hashcode here
if (!String.IsNullOrWhiteSpace(obj.Guid))
_guidHash.Add(obj.Guid, hash);
if (!String.IsNullOrWhiteSpace(obj.UserPrincipalName))
_nameHash.Add(obj.UserPrincipalName, hash);
return hash;
}
Note that this will fail if the ISyncableUser objects do not play nice and exhibit cases like in Rawling's answer. I am assuming that users with the same GUID will have the same name or no name at all, and users with the same principalName have the same GUID or no GUID at all. (I think the given Equals implementation has the same limitations)
I am writing an application that validates some cities. Part of the validation is checking if the city is already in a list by matching the country code and cityname (or alt cityname).
I am storing my existing cities list as:
public struct City
{
public int id;
public string countrycode;
public string name;
public string altName;
public int timezoneId;
}
List<City> cityCache = new List<City>();
I then have a list of location strings that contain country codes and city names etc. I split this string and then check if the city already exists.
string cityString = GetCity(); //get the city string
string countryCode = GetCountry(); //get the country string
city = new City(); //create a new city object
if (!string.IsNullOrEmpty(cityString)) //don't bother checking if no city was specified
{
//check if city exists in the list in the same country
city = cityCache.FirstOrDefault(x => countryCode == x.countrycode && (Like(x.name, cityString ) || Like(x.altName, cityString )));
//if no city is found, search for a single match across any country
if (city.id == default(int) && cityCache.Count(x => Like(x.name, cityString ) || Like(x.altName, cityString )) == 1)
city = cityCache.FirstOrDefault(x => Like(x.name, cityString ) || Like(x.altName, cityString ));
}
if (city.id == default(int))
{
//city not matched
}
This is very slow for lots of records, as I am also checking other objects like airports and countries in the same way. Is there any way I can speed this up? Is there a faster collection for this kind of comparison than List<>, and is there a faster comparison function than FirstOrDefault()?
EDIT
I forgot to post my Like() function:
bool Like(string s1, string s2)
{
if (string.IsNullOrEmpty(s1) || string.IsNullOrEmpty(s2))
return s1 == s2;
if (s1.ToLower().Trim() == s2.ToLower().Trim())
return true;
return Regex.IsMatch(Regex.Escape(s1.ToLower().Trim()), Regex.Escape(s2.ToLower().Trim()) + ".");
}
I would use a HashSet for the CityString and CountryCode.
Something like
var validCountryCode = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
if (validCountryCode.Contains(city.CountryCode))
{
}
etc...
Personally I would do all the validation in the constructor to ensure only valid City objects exist.
Other things to watch out for performance
Use HashSet if you're looking it up in a valid list.
Use IEqualityComparer where appropriate, reuse the object to avoid the construction/GC costs.
Use a Dictionary for anything you need to lookup (e.g. timeZoneId)
Edit 1
Your cityCache could be something like,
var cityCache = new Dictionary<string, Dictionary<string, int>>();
var countryCode = "";
var cityCode = "";
var id = x;
public static bool IsCityValid(City c)
{
return
cityCache.ContainsKey(c.CountryCode) &&
cityCache[c.CountryCode].ContainsKey(c.CityCode) &&
cityCache[c.CountryCode][c.CityCode] == c.Id;
}
Edit 2
Didn't think I have to explain this, but based on the comments, maybe.
FirstOrDefault() is an O(n) operation. Essentially, every time you try to find something in a list, you can either be lucky and it is the first item, or unlucky and it is the last, for an average of list.Count / 2 comparisons. A dictionary, on the other hand, is an O(1) lookup. Using the IEqualityComparer, it computes a hash code and looks up which bucket the key sits in. It then uses Equals to pick out what you're after among the entries in that bucket, which only takes more than a comparison or two when there are loads of collisions. Even with a poor-quality GetHashCode() (short of always returning the same hash code), because Dictionary / HashSet use a prime number of buckets, your list gets split up, reducing the number of equality checks you need to perform.
So a list of 10 objects means you're running Like 5 times on average.
A Dictionary of the same 10 objects as below (depending on the quality of the HashCode), could be as little as one HashCode() call followed by one Equals() call.
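To make the dictionary idea concrete for the question's City struct, here is a rough sketch that indexes cities by a normalized (country code, name) pair, registering both name and altName. CityIndex, Build and TryFind are hypothetical names, and the sketch only handles exact (case- and whitespace-insensitive) matches, not the prefix-style matching that the Like() function also performs.

```csharp
using System;
using System.Collections.Generic;

public struct City
{
    public int id;
    public string countrycode;
    public string name;
    public string altName;
    public int timezoneId;
}

public static class CityIndex
{
    // Normalize once per key so lookups need no per-comparison string work.
    private static string Normalize(string s)
        => (s ?? string.Empty).Trim().ToLowerInvariant();

    public static Dictionary<(string Country, string Name), City> Build(IEnumerable<City> cities)
    {
        var index = new Dictionary<(string Country, string Name), City>();
        foreach (var city in cities)
        {
            AddKey(index, city.countrycode, city.name, city);
            AddKey(index, city.countrycode, city.altName, city);
        }
        return index;
    }

    private static void AddKey(Dictionary<(string Country, string Name), City> index,
                               string country, string name, City city)
    {
        if (string.IsNullOrWhiteSpace(name)) return;
        var key = (Normalize(country), Normalize(name));
        // Assumption for this sketch: the first city registered for a key wins.
        if (!index.ContainsKey(key)) index.Add(key, city);
    }

    public static bool TryFind(Dictionary<(string Country, string Name), City> index,
                               string country, string name, out City city)
        => index.TryGetValue((Normalize(country), Normalize(name)), out city);
}

public static class Program
{
    public static void Main()
    {
        var index = CityIndex.Build(new[]
        {
            new City { id = 1, countrycode = "GB", name = "London", altName = "Londres" },
            new City { id = 2, countrycode = "FR", name = "Paris" },
        });

        if (CityIndex.TryFind(index, " gb", "LONDRES ", out var match))
            Console.WriteLine(match.id); // found via the alternative name
    }
}
```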
This sounds like a good candidate for a binary tree.
For binary tree implementations in .NET, see: Objects that represent trees
EDIT:
If you want to search a collection quickly, and that collection is particularly large, then your best option is to sort it and implement a search algorithm based on that sorting.
Binary trees are a good option when you want to search quickly and insert items relatively infrequently. To keep your searches quick, though, you'll need to use a balancing binary tree.
For this to work properly, though, you'll also need a standard key to use for your cities. A numeric key would be best, but strings can work fine too. If you concatenated your city with other information (such as the state and country) you will get a nice unique key. You could also change the case to all upper- or lower-case to get a case-insensitive key.
If you don't have a key, then you can't sort your data. If you can't sort your data, then there aren't going to be many "quick" options.
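As a simple illustration of "sort, then search", the sketch below uses a sorted array with Array.BinarySearch rather than an actual balancing tree. The key format (upper-cased country code and city name joined with "|") and the SortedCityKeys helper are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedCityKeys
{
    // Hypothetical key format: "COUNTRY|CITY", upper-cased for
    // case-insensitive matching.
    public static string MakeKey(string countryCode, string cityName)
        => (countryCode.Trim() + "|" + cityName.Trim()).ToUpperInvariant();

    // Builds the key array once and sorts it with an ordinal comparer.
    public static string[] BuildSorted(IEnumerable<(string Country, string City)> cities)
    {
        var keys = cities.Select(c => MakeKey(c.Country, c.City)).ToArray();
        Array.Sort(keys, StringComparer.Ordinal);
        return keys;
    }

    // O(log n) lookup; the same comparer must be used for sorting and searching.
    public static bool Contains(string[] sortedKeys, string countryCode, string cityName)
        => Array.BinarySearch(sortedKeys, MakeKey(countryCode, cityName), StringComparer.Ordinal) >= 0;
}

public static class Program
{
    public static void Main()
    {
        var keys = SortedCityKeys.BuildSorted(new[]
        {
            ("GB", "London"),
            ("FR", "Paris"),
            ("US", "Boston"),
        });

        Console.WriteLine(SortedCityKeys.Contains(keys, "fr", " paris")); // True
        Console.WriteLine(SortedCityKeys.Contains(keys, "FR", "London")); // False
    }
}
```

A sorted array suits data that is loaded once and searched often; if cities are inserted frequently, a balanced tree (or SortedDictionary<TKey, TValue>) avoids re-sorting.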
EDIT 2:
I notice that your Like function edits your strings a lot. Strings are immutable, so every ToLower() and Trim() call allocates a new string, which gets expensive when it is repeated for every comparison. You would be much better off performing ToLower() and Trim() once, preferably when you first load your data. This will probably speed up your function considerably.