Hacker News style ordering algorithm in Linq-To-SQL

According to this site, the ordering algorithm for Hacker News goes something like this:
(p - 1) / (t + 2)^1.5
Description:
Votes divided by age factor.
p = votes (points) from users. t = time since submission in hours.
p is reduced by 1 to negate the submitter's vote. The age factor is (time since submission in hours plus two) to the power of 1.5.
Given a table structure similar to this:
Item
ID
Link
DatePosted
Item_Votes
ItemID
Value
What would be the best way to implement the algorithm using LINQ to SQL? Would I be able to write the query entirely in LINQ, or would I need to use a stored procedure or something else?
Update. Ended up using the code below based off TJB's answer:
var votesQuery =
from i in db.Items
join v in db.Item_Votes on i.ItemID equals v.ItemID
orderby
(double)(v.Value - 1) /
Math.Pow(
(DateTime.Now - i.DatePosted.Value).TotalHours + 2,
1.5) descending
select i;
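
One thing to watch in the query above: the join yields one row per vote, so v.Value is a single vote rather than the item's total points. Below is a minimal in-memory sketch of ranking by summed points, assuming each Item_Votes row carries one vote's value. The names Item, ItemVote and HnRanking are hypothetical stand-ins, not the generated LINQ to SQL classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Item and Item_Votes tables.
public record Item(int ID, string Link, DateTime DatePosted);
public record ItemVote(int ItemID, int Value);

public static class HnRanking
{
    // (p - 1) / (t + 2)^1.5, where p = points and t = age in hours.
    public static double Score(int points, double ageHours) =>
        (points - 1) / Math.Pow(ageHours + 2, 1.5);

    public static List<Item> Rank(
        IEnumerable<Item> items, IEnumerable<ItemVote> votes, DateTime now)
    {
        // Assumption: an item's points are the sum of its vote values.
        var pointsByItem = votes
            .GroupBy(v => v.ItemID)
            .ToDictionary(g => g.Key, g => g.Sum(v => v.Value));

        // Items without any vote rows are scored with 0 points.
        return items
            .OrderByDescending(i => Score(
                pointsByItem.TryGetValue(i.ID, out var p) ? p : 0,
                (now - i.DatePosted).TotalHours))
            .ToList();
    }
}
```

Passing the current time in as a parameter keeps the ordering reproducible; whether an equivalent group-and-sum query translates to SQL depends on the provider, so treat this as LINQ to Objects only.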

Using 1 Linq query (there's probably still a more efficient way)
var votesQuery =
from i in items
join v in votes on i.Id equals v.ItemId
orderby
(v.Value - 1) /
Math.Pow(
// TotalHours is the full elapsed time; Hours would only be the 0-23 component
(DateTime.Now - i.Posted).Add(new TimeSpan(2,0,0)).TotalHours,
1.5 ) descending // highest score first
select new
{
Item = i,
Vote = v
};

Use a Comparer. This will allow you to write the logic for the comparisons between rows any way you want.
Here is an example, using case-insensitive ordering:
public void LinqExample()
{
string[] words = {
"aPPLE", "AbAcUs", "bRaNcH", "BlUeBeRrY", "ClOvEr", "cHeRry"
};
var sortedWords = words.OrderBy(a => a, new CaseInsensitiveComparer());
ObjectDumper.Write(sortedWords);
}
public class CaseInsensitiveComparer : IComparer<string>
{
public int Compare(string x, string y)
{
return string.Compare(x, y, StringComparison.OrdinalIgnoreCase);
}
}
http://msdn.microsoft.com/en-us/vcsharp/aa336756.aspx

This might work (untested):
// p and t are placeholders for the item's points and its age in hours
var orderedList =
(from extracted in (
from i in unorderedList
select new
{
Item = i,
Order = (p - 1) / Math.Pow(t + 2, 1.5)
})
orderby extracted.Order descending
select extracted.Item
).ToList();

Related

LINQ to JSON group query on array

I have a sample of JSON data that I am converting to a JArray with NewtonSoft.
string jsonString = @"[{'features': ['sunroof','mag wheels']},{'features': ['sunroof']},{'features': ['mag wheels']},{'features': ['sunroof','mag wheels','spoiler']},{'features': ['sunroof','spoiler']},{'features': ['sunroof','mag wheels']},{'features': ['spoiler']}]";
I am trying to retrieve the features that are most commonly requested together. Based on the above dataset, my expected output would be:
sunroof, mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof, mag wheels, spoiler, 1
sunroof, spoiler, 1
spoiler, 1
However, my LINQ is rusty, and the code I am using to query my JSON data is returning the count of the individual features, not the features selected together:
JArray autoFeatures = JArray.Parse(jsonString);
var features = from f in autoFeatures.Select(feat => feat["features"]).Values<string>()
group f by f into grp
orderby grp.Count() descending
select new { indFeature = grp.Key, count = grp.Count() };
foreach (var feature in features)
{
Console.WriteLine("{0}, {1}", feature.indFeature, feature.count);
}
Actual Output:
sunroof, 5
mag wheels, 4
spoiler, 3
I was thinking maybe my query needs a 'distinct' in it, but I'm just not sure.

This is a problem with the Select. You are telling it to treat each value found in the arrays as its own item. What you actually need is to combine all the values of each features array into a single string and group on that. Here is how you do it:
var features = from f in autoFeatures.Select(feat => string.Join(",",feat["features"].Values<string>()))
group f by f into grp
orderby grp.Count() descending
select new { indFeature = grp.Key, count = grp.Count() };
Produces the following output
sunroof,mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof,mag wheels,spoiler, 1
sunroof,spoiler, 1
spoiler, 1
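
Note that a joined string treats "sunroof,spoiler" and "spoiler,sunroof" as different keys. If the order within an array should not matter, one option is to sort the features before joining. This is a minimal sketch on plain string sequences rather than a JArray; FeatureGroups and CountCombinations are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FeatureGroups
{
    // Builds an order-independent key per feature list, then counts each key.
    public static List<(string Key, int Count)> CountCombinations(
        IEnumerable<IEnumerable<string>> featureLists) =>
        featureLists
            // Sorting first makes {a, b} and {b, a} produce the same key.
            .Select(f => string.Join(",", f.OrderBy(s => s, StringComparer.Ordinal)))
            .GroupBy(key => key)
            .Select(g => (Key: g.Key, Count: g.Count()))
            .OrderByDescending(x => x.Count)
            .ToList();
}
```

With the JArray from the question, the input could be produced by something like autoFeatures.Select(feat => feat["features"].Values<string>()).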

You could use a HashSet to identify the distinct sets of features, and group on those sets. That way, your Linq looks basically identical to what you have now, but you need an additional IEqualityComparer class in the GroupBy to help compare one set of features to another to check if they're the same.
For example:
var featureSets = autoFeatures
.Select(feature => new HashSet<string>(feature["features"].Values<string>()))
.GroupBy(a => a, new HashSetComparer<string>())
.Select(a => new { Set = a.Key, Count = a.Count() })
.OrderByDescending(a => a.Count);
foreach (var result in featureSets)
{
Console.WriteLine($"{String.Join(",", result.Set)}: {result.Count}");
}
And the comparer class leverages the SetEquals method of the HashSet class to check if one set is the same as another (and this handles the strings being in a different order within the set, etc.)
public class HashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
public bool Equals(HashSet<T> x, HashSet<T> y)
{
// so if x and y both contain "sunroof" only, this is true
// even if x and y are a different instance
return x.SetEquals(y);
}
public int GetHashCode(HashSet<T> obj)
{
// force comparison every time by always returning the same,
// or we could do something smarter like hash the contents
return 0;
}
}
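
As the comment in GetHashCode hints, returning a constant makes every set land in the same hash bucket, so grouping degrades to comparing each set against the others. A sketch of the "something smarter" variant follows, assuming all sets use the same element comparer; ContentHashSetComparer is a hypothetical name.

```csharp
using System.Collections.Generic;

public class ContentHashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> obj)
    {
        // XOR is commutative, so enumeration order does not affect the result.
        // Using the set's own comparer keeps the hash consistent with SetEquals.
        int hash = 0;
        foreach (T item in obj)
            hash ^= item is null ? 0 : obj.Comparer.GetHashCode(item);
        return hash;
    }
}
```

It drops in wherever HashSetComparer<string> is used, for example .GroupBy(a => a, new ContentHashSetComparer<string>()). Equal sets always hash alike; unequal sets can still collide, which is fine because Equals makes the final decision.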

How to calculate a running total using linq

I have a linq query result as shown in the image. In the final query (not shown) I am grouping by Year by LeaveType. However I want to calculate a running total for the leaveCarriedOver per type over years. That is, sick LeaveCarriedOver in 2010 becomes "opening" balance for sick leave in 2011 plus the one for 2011.
I have done another query on the shown result list that looks like:
var leaveDetails1 = (from l in leaveDetails
select new
{
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = leaveDetails.Where(x => x.LeaveType == l.LeaveType).Sum(x => x.LeaveCarriedOver)
});
where leaveDetails is the result from the image.
The resulting RunningTotal is not cumulative as expected. How can I achieve my initial goal? I am open to any ideas; my last option will be to do it in JavaScript in the front-end. Thanks in advance.

The simple implementation is to get the list of possible totals first, then get the sum from the details for each of these categories.
Getting the distinct list of Year and LeaveType is a group-by followed by selecting the first element of each group. The grouping key is a Tuple<int, string>, where the int is the year and the string is the LeaveType; the result is a list holding one representative row per (Year, LeaveType) pair.
var distinctList = leaveDetails1.GroupBy(data => new Tuple<int, string>(data.Year, data.LeaveType)).Select(data => data.FirstOrDefault()).ToList();
Then we want a total for each of these elements, so we project that list into the identifying values (Year and LeaveType) plus the total, which adds a third value to the tuple: Tuple<int, string, int>.
var totals = distinctList.Select(data => new Tuple<int, string, int>(data.Year, data.LeaveType, leaveDetails1.Where(detail => detail.Year == data.Year && detail.LeaveType == data.LeaveType).Sum(detail => detail.LeaveCarriedOver))).ToList();
Reading the line above, you can see that it takes each distinct (Year, LeaveType) pair, creates a new object storing the Year and LeaveType for reference, then sets the last int to the Sum of the filtered details for that Year and LeaveType.

If I completely understand what you are trying to do then I don't think I would rely on the built in LINQ operators exclusively. I think (emphasis on think) that any combination of the built in LINQ operators is going to solve this problem in O(n^2) run-time.
If I were going to implement this in LINQ then I would create an extension method for IEnumerable that is similar to the Scan function in reactive extensions (or find a library out there that has already implemented it):
public static class EnumerableExtensions
{
public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
this IEnumerable<TSource> source,
TAccumulate seed,
Func<TAccumulate, TSource, TAccumulate> accumulator)
{
// Validation omitted for clarity.
foreach(TSource value in source)
{
seed = accumulator.Invoke(seed, value);
yield return seed;
}
}
}
Then this should do it around O(n log n) (because of the order by operations):
leaveDetails
.OrderBy(x => x.LeaveType)
.ThenBy(x => x.Year)
.Scan(new {
Year = 0,
LeaveType = "Seed",
LeaveTaken = 0,
LeaveAllocation = 0.0,
LeaveCarriedOver = 0.0,
RunningTotal = 0.0
},
(acc, x) => new {
x.Year,
x.LeaveType,
x.LeaveTaken,
x.LeaveAllocation,
x.LeaveCarriedOver,
RunningTotal = x.LeaveCarriedOver + (acc.LeaveType != x.LeaveType ? 0 : acc.RunningTotal)
});
You don't say, but I assume the data is coming from a database; if that is the case then you could get leaveDetails back already sorted and skip the sorting here. That would get you down to O(n).
If you don't want to create an extension method (or go find one) then this will achieve the same thing (just in an uglier way).
var temp = new
{
Year = 0,
LeaveType = "Who Cares",
LeaveTaken = 3,
LeaveAllocation = 0.0,
LeaveCarriedOver = 0.0,
RunningTotal = 0.0
};
var runningTotals = (new[] { temp }).ToList();
runningTotals.RemoveAt(0);
foreach(var l in leaveDetails.OrderBy(x => x.LeaveType).ThenBy(x => x.Year))
{
var s = runningTotals.LastOrDefault();
runningTotals.Add(new
{
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = l.LeaveCarriedOver + (s == null || s.LeaveType != l.LeaveType ? 0 : s.RunningTotal)
});
}
This should also be O(n log n) or O(n) if you can pre-sort leaveDetails.

If I understand the question you want something like
decimal RunningTotal = 0;
var results = leaveDetails
.GroupBy(r=>r.LeaveType)
.Select(r=> new
{
Dummy = RunningTotal = 0 ,
results = r.OrderBy(o=>o.Year)
.Select(l => new
{
l.Year,
l.LeaveType ,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = (RunningTotal = RunningTotal + l.LeaveCarriedOver )
})
})
.SelectMany(a=>a.results).ToList();
This is basically using the Select<TSource, TResult> overload to calculate the running balance, but first grouped by LeaveType so we can reset the RunningTotal for every LeaveType, and then ungrouped at the end.

You need the SUM window function here, which is not supported by EF Core or earlier versions of EF. So just write the SQL and run it via Dapper:
SELECT
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
SUM(l.LeaveCarriedOver) OVER (PARTITION BY l.LeaveType ORDER BY l.Year) AS RunningTotal
FROM leaveDetails l
Or, if you are using EF Core, use package linq2db.EntityFrameworkCore
var leaveDetails1 = from l in leaveDetails
select new
{
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = Sql.Ext.Sum(l.LeaveCarriedOver).Over().PartitionBy(l.LeaveType).OrderBy(l.Year).ToValue()
};
// switch to alternative LINQ translator
leaveDetails1 = leaveDetails1.ToLinqToDB();

linq-to-sql getting sequence contains more than one element

I have a query that looks like this: it takes a list of IDs (ThelistOfIDs) as parameter and I'm grouping for a count.
var TheCounter = (from l in MyDC.SomeTable
where ThelistOfIDs.Contains(l.ID)
group l by l.Status into groups
select new Counter()
{
CountOnes = (from g in groups
where g.Status == 1
select g).Count(),
CountTwos = (from g in groups
where g.Status == 2
select g).Count(),
}).Single();
And basically, I don't understand why I'm getting the error. I don't want to bring back the entire collection from the DB and do the count in LINQ to Objects; I want to do the count in the DB and bring back only the result.

I have not put your query into my IDE or compiled it, but I guess the problem is that
groups in your query is an IGrouping<Tkey, Telm> and not an IQueryable<Tkey>
(where Tkey is the type of l.Status and Telm is the type of l).
I think you got confused by the grouping operator.
What you want is, I guess:
var queryByStatus = from l in MyDC.SomeTable
where ThelistOfIDs.Contains(l.ID)
group l by l.Status;
var counter = new Counter()
{
CountOnes = queryByStatus.Where(l => l.Key == 1).Count(),
CountTwos = queryByStatus.Where(l => l.Key == 2).Count(),
};
EDIT:
Alternative query that obtains the same result but moves all the work into the original query, so that the DB is queried only once.
var queryCountByStatus = from l in MyDC.SomeTable
where ThelistOfIDs.Contains(l.ID)
group l by l.Status into r
select new { status = r.Key, count = r.Count() };
var countByStatus = queryCountByStatus.ToList();
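var counter = new Counter()
{
// Project to count first so FirstOrDefault yields 0, not a null row, when a status is absent
CountOnes = countByStatus.Where(l => l.status == 1).Select(l => l.count).FirstOrDefault(),
CountTwos = countByStatus.Where(l => l.status == 2).Select(l => l.count).FirstOrDefault(),
};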
Note:
The query in my edit section hits the DB only once and returns a Status -> Count mapping.
My original version needed two calls to the DB, one for CountOnes and one for CountTwos, each returning a single number.
In the edited version, one query returns the table { { 1, CountOnes }, { 2, CountTwos } }. The remaining lines just convert that result, a set of items, into a single object with the counts as properties; that conversion runs in memory on those two rows.
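
If more statuses are added later, a dictionary lookup may read more cleanly than one search per property. Here is a minimal in-memory sketch, where a plain sequence of status values stands in for the grouped query result; StatusCounts, FromStatuses and CountFor are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StatusCounts
{
    // Stand-in for the grouped DB query: maps each status to its row count.
    public static Dictionary<int, int> FromStatuses(IEnumerable<int> statuses) =>
        statuses
            .GroupBy(s => s)
            .ToDictionary(g => g.Key, g => g.Count());

    // Returns the count for a status, or 0 when no row had that status.
    public static int CountFor(IReadOnlyDictionary<int, int> countByStatus, int status) =>
        countByStatus.TryGetValue(status, out var count) ? count : 0;
}
```

Against the database, the same shape would come from calling ToDictionary(x => x.status, x => x.count) on the materialized status/count rows.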

You are grouping by Status and then projecting from that group, so you will still get one row per unique Status (that is, one per group).
So I suspect that you don't have exactly one unique Status, which is why Single() throws.

This might be what you're looking for to get...
(it's for users table I had but should be the same)
var statuscounts = (from u in db.Users
where u.UserStatus > 0
group u by u.UserStatus into groups
select new { Status = groups.Key, Count = groups.Count() });
// do this to iterate and pump into a Counter at will
foreach (var g in statuscounts)
Console.WriteLine("{0}, {1}", g.Status, g.Count);
...or even something like this...
var counter = statuscounts.AsEnumerable()
.Aggregate(new Counter(), (c, a) => {
switch (a.Status)
{
case 1: c.CountOfOnes = a.Count; return c;
case 2: c.CountOfTwos = a.Count; return c;
case 3: c.CountOfThrees = a.Count; return c;
default: c.CountOfOthers = a.Count; return c;
}});
...the point is that if you're grouping already, you should use the grouping result: it's of type IGrouping<out TKey, out TElement>, where the key is your status and the group itself is an IEnumerable<> of your records.
hope this helps

Get min value in row during LINQ query

I know that I can use .Min() to get minimum value from column, but how to get minimum value in a row?
I have following LINQ query (for testing purposes):
from p in Pravidloes
where p.DulezitostId == 3
where p.ZpozdeniId == 1 || p.ZpozdeniId == 2
where p.SpolehlivostId == 2 || p.SpolehlivostId == 3
group p by p.VysledekId into g
select new {
result = g.Key,
value = g
}
Which results into this:
I would however like to get only the MIN value of following three columns:
DulezitostId, ZpozdeniId, SpolehlivostId as a value in:
select new {
result = g.Key,
value = g // <-- here
}
The final result then should look like:
result: 2, value: 1
result: 3, value: 2
I have been looking for similar questions here and googled for few examples with grouping and aggregating queries, but found nothing that would move me forward with this problem.
Btw: Solution isn't limited to linq, if you know better way how to do it.

You could create an array of the values and do Min on those.
select new {
result = g.Key,
value = g.SelectMany(x => new int[] { x.DulezitostId, x.ZpozdeniId, x.SpolehlivostId }).Min()
}
This will return the min for those 3 values in each grouping for ALL rows of that grouping.
Which would result in something like this...
result: 3, value: 1
The below will select the min for each row in the grouping...
select new {
result = g.Key,
value = g.Select(x => new int[] { x.DulezitostId, x.ZpozdeniId, x.SpolehlivostId }.Min())
}
Which would result in something like this...
result: 3, value: 1, 2

The best solution if you're using straight LINQ is Chad's answer. However, if you're using Linq To SQL it won't work because you can't construct an array like that.
Unfortunately, I believe the only way to do this in Linq To Sql is to use Math.Min repeatedly:
select new {
result = g.Key,
value = g.Min(x => Math.Min(Math.Min(x.DulezitostId, x.ZpozdeniId), x.SpolehlivostId))
}
This will generate some ugly CASE WHEN ... statements, but it works.
The main advantage of doing it this way is that you're only returning the data you need from SQL (instead of returning all 3 columns and doing the Min in the application).

Count occurrences of values across multiple columns

I am having a terrible time finding a solution to what I am sure is a simple problem.
I started an app with data in Lists of objects. Its pertinent objects used to look like this (very simplified):
class A {
int[] Nums;
}
and
List<A> myListOfA;
I wanted to count occurrences of values in the member array over all the List.
I found this solution somehow:
var results =
from a in myListOfA
from n in a.Nums
group n by n into g
orderby g.Key
select new { number = g.Key, Occurences = g.Count() };
int NumberOfValues = results.Count();
That worked well and I was able to generate the histogram I wanted from the query.
Now I have converted to using an SQL database. The table I am using now looks like this:
MyTable {
int Value1;
int Value2;
int Value3;
int Value4;
int Value5;
int Value6;
}
I have a DataContext that maps to the DB.
I cannot figure out how to translate the previous LINQ statement to work with this. I have tried this:
MyDataContext myContext;
var results =
from d in myContext.MyTable
from n in new{ d.Value1, d.Value2, d.Value3, d.Value4, d.Value5, d.Value6 }
group n by n into g
orderby g.Key
select new { number = g.Key, Occurences = g.Count() };
I have tried some variations on the constructed array, like adding .AsQueryable() at the end, something I saw somewhere else. I have tried using group to create the array of values, but nothing works. I am a relative newbie when it comes to database languages. I just cannot find any clue anywhere on the web. Maybe I am not asking the right question. Any help is appreciated.

I received help on a microsoft site. The problem is mixing LINQ to SQL with LINQ to Objects.
This is how the query should be stated:
var results =
from d in MyContext.MyTable.AsEnumerable()
from n in new[]{d.Value1, d.Value2, d.Value3, d.Value4, d.Value5, d.Value6}
group n by n into g
orderby g.Key
select new {number = g.Key, Occureneces = g.Count()};
Works like a charm.

If you wish to use LINQ to SQL, you could try this "hack" that I recently discovered. It isn't the prettiest or cleanest code, but at least you won't have to revert to using LINQ to Objects.
var query =
from d in MyContext.MyTable
let v1 = MyContext.MyTable.Where(dd => dd.ID == d.ID).Select(dd => dd.Value1)
let v2 = MyContext.MyTable.Where(dd => dd.ID == d.ID).Select(dd => dd.Value2)
// ...
let v6 = MyContext.MyTable.Where(dd => dd.ID == d.ID).Select(dd => dd.Value6)
from n in v1.Concat(v2).Concat(v3).Concat(v4).Concat(v5).Concat(v6)
group 1 by n into g
orderby g.Key
select new
{
number = g.Key,
Occureneces = g.Count(),
};

How about creating your int array on the fly?
var results =
from d in myContext.MyTable
from n in new int[] { d.Value1, d.Value2, d.Value3, d.Value4, d.Value5, d.Value6 }
group n by n into g
orderby g.Key
select new { number = g.Key, Occurences = g.Count() };

In a relational database, such as SQL Server, collections are represented as tables. So you should actually have two tables: Samples and Values. The Samples table would represent a single "A" object, while the Values table would represent each element in A.Nums, with a foreign key pointing to one of the records in the Samples table. LINQ to SQL's O/R mapper will then create a "Values" property for each Sample object, which contains a queryable collection of the attached Values. You would then use the following query:
var results =
from sample in myContext.Samples
from value in sample.Values
group value by value into values
orderby values.Key
select new { Value = values.Key, Frequency = values.Count() };
