best algorithm for search between arrays - c#

I have a problem I need to solve in the best algorithm I can find.
Let me describe the problem first.
I have a class A that holds a number of HashSet<int> instances, each with Z items:
A -> {x,y,z | x = {0,1,2} , y = {-1,0,9} ... }
B -> {x,y,z,k | x = {0,1,-2} , y = {-1,0,19} ... }
...
Given a new int array { ... } entered by the user, the result should be the group with the most hash sets that share at least one number with the input.
For example :
A : {[1,2,3][2,3,8][-1,-2,2]}
B : {[0,-9,3][12,23,68][-11,-2,2]}
Input :
[2,3,-19]
result A : {[2,3][2,3][2]}
result B : {[3][][2]}
A : 3
B : 2
A is the correct answer.
Or something like that .
Yes, I know it's a subjective question but it's for a good cause.

Assuming you have an unknown number of samples to check against the input set, this LINQ query should do the trick:
from sample in samples
let intersectedSets =
from set in sample
let intersection = input.Intersect(set)
where intersection.Count() > 0
select intersection
orderby intersectedSets.Count() descending
select intersectedSets;
The first element is your desired sample, so calling First() on the query yields your result set. For your given example:
var samples = new[] {
new[]{
new[]{1, 2, 3},
new[]{2, 3, 8},
new[]{-1, -2, 2}
},
new[]{
new[]{0, -9, 3},
new[]{12, 23, 68},
new[]{-11, -2, 2}
}
};
var input = new[]{2, 3, -19};
var result =
(from sample in samples
let intersectedSets =
from set in sample
let intersection = input.Intersect(set)
where intersection.Count() > 0
select intersection
orderby intersectedSets.Count() descending
select intersectedSets).First();
result.Dump(); // LINQPad extension method

Apparently you want to implement this in C#. I don't know whether this is the best algorithm (in whatever context), but you could use LINQ to write it down plainly and simply:
int[][] arrays = new[] { new[] { 1, 2 }, new[] { 2, 3 }, new[] {3, 4} };
int[] input = new[] { 1, 4 };
Console.WriteLine(arrays.Count((itemarray) => itemarray.Any((item) => input.Contains(item))));
In an array of int arrays, this finds the number of arrays that contain at least one of the values of the input array. That is what your example does, though I'm not sure it is what you are asking of us.

Given a sample class HashHolder and an instance A of it:
public class HashHolder
{
public HashHolder()
{
Hashes = new List<HashSet<int>>();
}
public List<HashSet<int>> Hashes { get; set; }
}
You can group by hashset and take the maximum count between all groups:
var maxHash = A.Hashes.GroupBy(h => h)
.Select(g => new { Hash = g.Key, Count = input.Count(num => g.Key.Contains(num)) })
.OrderByDescending(g => g.Count)
.FirstOrDefault();
The result will then be maxHash.Hash if maxHash is not null.
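If the goal is the one described in the question (the group with the most hash sets that share at least one number with the input), HashSet<T>.Overlaps expresses that test directly. The following is only a minimal sketch under that reading of the question; the names GroupMatcher and BestGroupIndex are hypothetical, and groups are assumed to be passed as lists of hash sets.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupMatcher
{
    // Returns the index of the group with the most sets that share at least
    // one value with the input, or -1 when no set in any group matches.
    // On a tie, the earliest group wins.
    public static int BestGroupIndex(
        IReadOnlyList<IReadOnlyList<HashSet<int>>> groups,
        IEnumerable<int> input)
    {
        var inputSet = new HashSet<int>(input);
        int bestIndex = -1;
        int bestCount = 0;
        for (int i = 0; i < groups.Count; i++)
        {
            // Count the sets in this group that overlap the input.
            int count = groups[i].Count(set => set.Overlaps(inputSet));
            if (count > bestCount)
            {
                bestCount = count;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

With the groups A and B from the question and the input [2, 3, -19], group A has three overlapping sets and group B has two, so the sketch returns the index of A.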

How to find duplicate fields in 2 lists of different types and remove them from both lists, in C#

This is not a duplicate of: Given 2 C# Lists how to merge them and get only the non duplicated elements from both lists since he's looking at lists of the same type.
I have this scenario:
class A
{
string id;
.... some other stuff
}
class B
{
string id;
.... some other stuff
}
I would like to remove, both from A and B, elements that share an id field between the two lists.
I can do it in 3 steps: find the common ids, and then delete the records from both lists, but I'm wondering if there is something more elegant.
Edit: expected output
var A = [ 1, 3, 5, 7, 9 ]
var B = [ 1, 2, 3, 4, 5 ]
output:
A = [ 7, 9 ]
B = [ 2, 4 ]
but this is showing only the id field; as stated above, the lists are of different types, they just share ids.
You will require three steps, but you can use Linq to simplify the code.
Given two classes which have a property of the same (equatable) type, named "ID":
class Test1
{
public string ID { get; set; }
}
class Test2
{
public string ID { get; set; }
}
Then you can find the duplicates and remove them from both lists like so:
var dups =
(from item1 in list1
join item2 in list2 on item1.ID equals item2.ID
select item1.ID)
.ToArray();
list1.RemoveAll(item => dups.Contains(item.ID));
list2.RemoveAll(item => dups.Contains(item.ID));
But that is still three steps.
See .Net Fiddle example for a runnable example.
You can use LINQ Lambda expression for elegance:
var intersectValues = list2.Select(r => r.Id).Intersect(list1.Select(r => r.Id)).ToList();
list1.RemoveAll(r => intersectValues.Contains(r.Id));
list2.RemoveAll(r => intersectValues.Contains(r.Id));
Building on @Matthew Watson's answer, you can move all of it into a single LINQ expression with
(from item1 in list1
join item2 in list2 on item1.ID equals item2.ID
select item1.ID)
.ToList()
.ForEach(d =>
{
list1.RemoveAll(i1 => d == i1.ID);
list2.RemoveAll(i2 => d == i2.ID);
}
);
I don't know where you land on the performance scale. The compiler might actually split this up into the three steps you already mentioned.
You also lose some readability as the from ... select result does not have a 'speaking' name like duplicates, to directly tell you what you will be working with in the ForEach.
Complete code example at https://gist.github.com/msdeibel/d2f8a97b754cca85fe4bcac130851597
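An O(n) approach that builds a hash set of the IDs of each list:
// listA (List<A>) and listB (List<B>) stand for the two input lists.
var aIds = new HashSet<string>(listA.Select(x => x.ID));
var bIds = new HashSet<string>(listB.Select(x => x.ID));
var result1 = new List<A>(listA.Count);
var result2 = new List<B>(listB.Count);
// Keep only the items whose ID does not occur in the other list.
foreach (A item in listA)
{
    if (!bIds.Contains(item.ID))
        result1.Add(item);
}
foreach (B item in listB)
{
    if (!aIds.Contains(item.ID))
        result2.Add(item);
}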

Select random questions with where clause in linq

I need to select random questions per category
private int[] categoryId = {1, 2, 3, 4, 5, ...};
private int[] questionsPerCategory = {3, 1, 6, 11, 7, ...};
Before LINQ, I achieved this with:
SELECT TOP (@questionsPerCategory) * FROM Questions WHERE CategoriesID = @categoryId AND
InTest = '1' ORDER BY NEWID()
This was not ideal either, since I had to run it once for every categoryId.
How can I get the desired results with LINQ in a single query?
All i need is fetch
3 random questions, with categoryId = 1 and InTest = true,
1 random question, with categoryId = 2 and InTest = true,
6 random questions, with categoryId = 3 and InTest = true
and so on..
Since Guid.NewGuid is not supported by LINQ to SQL, first you need to get access to NEWID function by using the trick from the accepted answer to Random row from Linq to Sql by adding the following to your context class:
partial class YourDataContext {
[Function(Name="NEWID", IsComposable=true)]
public Guid Random()
{ // to prove not used by our C# code...
throw new NotImplementedException();
}
}
Then the query for single CategoryID and question count would be:
var query = db.Questions
.Where(e => e.CategoriesID == categoryId[i] && e.InTest)
.OrderBy(e => db.Random())
.Take(questionsPerCategory[i]);
To get the desired result for all category / question count pairs, you could build a UNION ALL SQL query by using Concat of the above single query for i = 0..N like this:
var query = categoryId.Zip(questionsPerCategory,
(catId, questions) => db.Questions
.Where(q => q.CategoriesID == catId && q.InTest)
.OrderBy(q => db.Random())
.Take(questions)
).Aggregate(Queryable.Concat)
.ToList();
This should produce the desired result with a single SQL query. Of course, it is only applicable if the number of category IDs is relatively small.
Maybe you want something like this, you do a group by then select how many you want from each category.
Edited: As pointed out by Enigmativity in the comments, Guid.NewGuid() should be used only for uniqueness, not for randomness. To produce randomness you should consult this StackOverflow post.
Demo
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
private static int[] categoryIds = new int[] {1, 2, 3, 4, 5};
private static int[] questionsPerCategory = {3, 1, 6, 11, 7};
//Part of demo
private static IEnumerable<QuestionVM> Questions = Enumerable.Range(0,100).Select(x=> new QuestionVM { Question = $"Question - {x}", CategoryId = (x % 5) + 1});
public static void Main()
{
var questions = Questions.Where(x=> x.InTest).GroupBy(x=> x.CategoryId).SelectMany(x=> x.OrderBy(y=> Guid.NewGuid()).Take(GetQuestionTake(x.Key)));
foreach(var question in questions)
Console.WriteLine($"{question.Question} - CategoryId: {question.CategoryId}");
}
///Finds out how many questions it should take by doing a search and then picking the element in the same position
private static int GetQuestionTake(int categoryId)
{
int element = categoryIds.Select((x, i) => new { i, x }).FirstOrDefault(x => x.x == categoryId).i;
return questionsPerCategory.ElementAtOrDefault(element);
}
}
//Part of demo
public class QuestionVM
{
public string Question {get;set;}
public int CategoryId {get;set;}
public bool InTest {get;set;} = true;
}
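Since ordering by Guid.NewGuid() is not a guaranteed source of randomness, one in-memory alternative is a partial Fisher-Yates shuffle per category. This is only a sketch: it assumes the questions are already loaded into memory (it will not translate to SQL), and the names RandomPicker, PickRandom and PickPerCategory are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomPicker
{
    // Picks up to `count` distinct random items with a partial Fisher-Yates shuffle.
    public static List<T> PickRandom<T>(IEnumerable<T> source, int count, Random rng)
    {
        var items = source.ToList();
        int take = Math.Min(Math.Max(count, 0), items.Count);
        for (int i = 0; i < take; i++)
        {
            // Swap position i with a random position in [i, items.Count).
            int j = rng.Next(i, items.Count);
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
        return items.GetRange(0, take);
    }

    // counts maps a category id to the number of items wanted from it;
    // categories missing from counts are skipped.
    public static List<T> PickPerCategory<T>(
        IEnumerable<T> source,
        Func<T, int> categoryOf,
        IReadOnlyDictionary<int, int> counts,
        Random rng)
    {
        return source
            .GroupBy(categoryOf)
            .Where(g => counts.ContainsKey(g.Key))
            .SelectMany(g => PickRandom(g, counts[g.Key], rng))
            .ToList();
    }
}
```

With the demo data above, the counts dictionary would map 1 to 3, 2 to 1, 3 to 6, 4 to 11 and 5 to 7, and categoryOf would be q => q.CategoryId.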
I think you are looking for the Take() method.
You should also pass parameters to the method with category id and how many questions you want to receive. Pass those parameters from your arrays.
private IQueryable<Question> Method(int id, int questionsCount)
{
// Order first, then take: taking before ordering would only reorder the rows already taken.
return Questions.Where(c => c.CategoriesId == id && c.InTest == 1).OrderBy(c => c.NewId).Take(questionsCount);
}
A common way is to order by Guid.NewGuid(), so to extend Crekate's answer above.
.OrderBy(c=>Guid.NewGuid());

Sort groups based on values within groups

I am trying to sort an array that contains logical groups of people, and the people's scores.
Name | Group | Score
----------------------
Alfred | 1 | 3
Boris | 3 | 3
Cameron| 3 | 1
Donna | 1 | 2
Emily | 2 | 2
The people should be sorted by group, based on the lowest score in the group. Therefore, group 3 is first, because it contains the person with the lowest score. Then the people in group 1 because it has the person with the next lowest score (and a lower group number than group 2).
So the result would be: Cameron, Boris, Donna, Alfred, Emily
I have accomplished this, but I am wondering if there is a better way of doing it. I receive an array, and end up sorting the array in the correct order.
I use LINQ (mostly obtained from Linq order by, group by and order by each group?) to create a target sorting array that maps where a person should be, compared to where they currently are in the array.
I then use Array.Sort using my target sorting array, but the array the LINQ statement creates is "reversed" in terms of indices and values, so I have to reverse the indices and values (not the order).
I have attached my code below. Is there a better way of doing this?
using System;
using System.Collections.Generic;
using System.Linq;
namespace Sorter
{
class Program
{
static void Main(string[] args)
{
// Sample person array.
// Lower score is better.
Person[] peopleArray = new Person[]
{
new Person { Name = "Alfred", Group = "1", Score = 3, ArrayIndex = 0 },
new Person { Name = "Boris", Group = "3", Score = 3, ArrayIndex = 1 },
new Person { Name = "Cameron", Group = "3", Score = 1, ArrayIndex = 2 },
new Person { Name = "Donna", Group = "1", Score = 2, ArrayIndex = 3 },
new Person { Name = "Emily", Group = "2", Score = 2, ArrayIndex = 4 }
};
// Create people list.
List<Person> peopleModel = peopleArray.ToList();
// Sort the people based on the following:
// Sort people into groups (1, 2, 3)
// Sort the groups by the lowest score within the group.
// So, the first group would be group 3, because it has the
// member with the lowest score (Cameron with 1).
// The people are therefore sorted in the following order:
// Cameron, Boris, Donna, Alfred, Emily
int[] targetOrder = peopleModel.GroupBy(x => x.Group)
.Select(group => new
{
Rank = group.OrderBy(g => g.Score)
})
.OrderBy(g => g.Rank.First().Score)
.SelectMany(g => g.Rank)
.Select(i => i.ArrayIndex)
.ToArray();
// This will give the following array:
// [2, 1, 3, 0, 4]
// I.e: Post-sort,
// the person who should be in index 0, is currently at index 2 (Cameron).
// the person who should be in index 1, is currently at index 1 (Boris).
// etc.
// I want to use my target array to sort my people array.
// However, the Array.Sort method works in the reverse.
// For example, in my target order array: [2, 1, 3, 0, 4]
// person currently at index 2 should be sorted into index 0.
// I need the following target order array: [3, 1, 0, 2, 4],
// person currently at index 0, should be sorted into index 3
// So, "reverse" the target order array.
int[] reversedArray = ReverseArrayIndexValue(targetOrder);
// Finally, sort the base array.
Array.Sort(reversedArray, peopleArray);
// Display names in order.
foreach (var item in peopleArray)
{
Console.WriteLine(item.Name);
}
Console.Read();
}
/// <summary>
/// "Reverses" the indices and values of an array.
/// E.g.: [2, 0, 1] becomes [1, 2, 0].
/// The value at index 0 is 2, so the value at index 2 is 0.
/// The value at index 1 is 0, so the value at index 0 is 1.
/// The value at index 2 is 1, so the value at index 1 is 2.
/// </summary>
/// <param name="target"></param>
/// <returns></returns>
private static int[] ReverseArrayIndexValue(int[] target)
{
int[] swappedArray = new int[target.Length];
for (int i = 0; i < target.Length; i++)
{
swappedArray[i] = Array.FindIndex(target, t => t == i);
}
return swappedArray;
}
}
}
As I understand, you want to sort the input array in place.
First, the sorting part can be simplified (and made more efficient) by first OrderBy Score and then GroupBy Group, utilizing the defined behavior of Enumerable.GroupBy:
The IGrouping<TKey, TElement> objects are yielded in an order based on the order of the elements in source that produced the first key of each IGrouping<TKey, TElement>. Elements in a grouping are yielded in the order they appear in source.
Once you have that, all you need is to flatten the result, iterate it (thus executing it) and put the yielded items in their new place:
var sorted = peopleArray
.OrderBy(e => e.Score)
.ThenBy(e => e.Group) // to meet your second requirement for equal Scores
.GroupBy(e => e.Group)
.SelectMany(g => g);
int index = 0;
foreach (var item in sorted)
peopleArray[index++] = item;
Not sure if I really understood what the wished outcome should be, but this at least gives same order as mentioned in example in comments:
var sortedNames = peopleArray
// group by group property
.GroupBy(x => x.Group)
// order groups by min score within the group
.OrderBy(x => x.Min(y => y.Score))
// order by score within the group, then flatten the list
.SelectMany(x => x.OrderBy(y => y.Score))
// doing this only to show that it is in right order
.Select(x =>
{
Console.WriteLine(x.Name);
return false;
}).ToList();
int[] order = Enumerable.Range(0, peopleArray.Length)
.OrderBy(i => peopleArray[i].Score)
.GroupBy(i => peopleArray[i].Group)
.SelectMany(g => g).ToArray(); // { 2, 1, 3, 0, 4 }
Array.Sort(order, peopleArray);
Debug.Print(string.Join(", ", peopleArray.Select(p => p.ArrayIndex))); // "3, 1, 0, 2, 4"
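Note that Array.Sort(keys, items) moves each item to the rank of its key, so passing the order array { 2, 1, 3, 0, 4 } directly yields Donna, Boris, Alfred, Cameron, Emily rather than the requested order. As the question describes, the keys have to be the inverse permutation. That inverse can be computed in a single O(n) pass instead of calling Array.FindIndex once per index. A minimal sketch, assuming the input really is a permutation of 0..n-1 (the names Permutation and Invert are hypothetical):

```csharp
using System;

public static class Permutation
{
    // Inverts a permutation: if target[i] == j, then inverse[j] == i.
    // Assumes target contains each value 0..Length-1 exactly once.
    public static int[] Invert(int[] target)
    {
        var inverse = new int[target.Length];
        for (int i = 0; i < target.Length; i++)
        {
            inverse[target[i]] = i;
        }
        return inverse;
    }
}
```

For the example, Permutation.Invert(order) gives { 3, 1, 0, 2, 4 }, and Array.Sort(Permutation.Invert(order), peopleArray) then leaves the people in the order Cameron, Boris, Donna, Alfred, Emily.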
If what you want is fewer lines of code, how about this?
var peoples = peopleModel.OrderBy(i => i.Score).GroupBy(g =>
g.Group).SelectMany(i => i, (i, j) => new { j.Name });
1) Order list by scores
2) Group it by grouping
3) Flatten the grouped list and create new list with "Name" property using SelectMany
For more information on using an anonymous type with SelectMany, see:
https://dzone.com/articles/selectmany-probably-the-most-p

LINQ: Determine if two sequences contain exactly the same elements

I need to determine whether or not two sets contain exactly the same elements. The ordering does not matter.
For instance, these two arrays should be considered equal:
IEnumerable<int> data = new []{3, 5, 6, 9};
IEnumerable<int> otherData = new []{6, 5, 9, 3};
One set cannot contain any elements, that are not in the other.
Can this be done using the built-in query operators? And what would be the most efficient way to implement it, considering that the number of elements could range from a few to hundreds?
If you want to treat the arrays as "sets" and ignore order and duplicate items, you can use HashSet<T>.SetEquals method:
var isEqual = new HashSet<int>(first).SetEquals(second);
Otherwise, your best bet is probably sorting both sequences in the same way and using SequenceEqual to compare them.
I suggest sorting both, and doing an element-by-element comparison.
data.OrderBy(x => x).SequenceEqual(otherData.OrderBy(x => x))
I'm not sure how fast the implementation of OrderBy is, but if it's an O(n log n) sort like you'd expect, the total algorithm is O(n log n) as well.
For some cases of data, you can improve on this by using a custom implementation of OrderBy that for example uses a counting sort, for O(n+k), with k the size of the range wherein the values lie.
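The counting idea can also be used to avoid sorting altogether when duplicates matter: count the occurrences of each element of the first sequence in a dictionary, then subtract while walking the second. This is a sketch, not part of the answers above; it assumes the elements are non-null and have sensible Equals/GetHashCode, and the names MultisetComparer and SameElements are hypothetical. The expected running time is O(n).

```csharp
using System.Collections.Generic;

public static class MultisetComparer
{
    // True when both sequences contain the same elements with the same
    // multiplicities, regardless of order. Elements must be non-null.
    public static bool SameElements<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var counts = new Dictionary<T, int>();
        foreach (T item in first)
        {
            counts.TryGetValue(item, out int n); // n is 0 when the key is missing
            counts[item] = n + 1;
        }
        foreach (T item in second)
        {
            // An element that is missing or already used up means a mismatch.
            if (!counts.TryGetValue(item, out int n) || n == 0)
                return false;
            counts[item] = n - 1;
        }
        // Anything left over appears more often in the first sequence.
        foreach (int remaining in counts.Values)
        {
            if (remaining != 0)
                return false;
        }
        return true;
    }
}
```

For the arrays in the question, MultisetComparer.SameElements(data, otherData) is true, while { 1, 1, 2 } and { 1, 2, 2 } compare as different.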
If you might have duplicates (or if you want a solution which performs better for longer lists), I'd try something like this:
static bool IsSame<T>(IEnumerable<T> set1, IEnumerable<T> set2)
{
if (set1 == null && set2 == null)
return true;
if (set1 == null || set2 == null)
return false;
List<T> list1 = set1.ToList();
List<T> list2 = set2.ToList();
if (list1.Count != list2.Count)
return false;
list1.Sort();
list2.Sort();
return list1.SequenceEqual(list2);
}
UPDATE: oops, you guys are right-- the Except() solution below needs to look both ways before crossing the street. And it has lousy perf for longer lists. Ignore the suggestion below! :-)
Here's one easy way to do it. Note that this assumes the lists have no duplicates.
bool same = data.Except (otherData).Count() == 0;
Here is another way to do it:
IEnumerable<int> data = new[] { 3, 5, 6, 9 };
IEnumerable<int> otherData = new[] { 6, 5, 9, 3 };
data = data.OrderBy(d => d);
otherData = otherData.OrderBy(d => d);
data.Zip(otherData, (x, y) => Tuple.Create(x, y)).All(d => d.Item1 == d.Item2);
First, check the length. If they are different, the sets are different.
You can do data.Intersect(otherData); and check that the length is identical.
Or simply sort the sets and iterate through them.
First check that both collections have the same number of elements, and then check that all the elements of one collection are present in the other:
IEnumerable<int> data = new[] { 3, 5, 6, 9 };
IEnumerable<int> otherData = new[] { 6, 5, 9, 3 };
bool equals = data.Count() == otherData.Count() && data.All(x => otherData.Contains(x));
This should help:
IEnumerable<int> data = new []{ 3,5,6,9 };
IEnumerable<int> otherData = new[] {6, 5, 9, 3};
if(data.All(x => otherData.Contains(x)))
{
//Code Goes Here
}

How to get the top 3 elements in an int array using LINQ?

I have the following array of integers:
int[] array = new int[7] { 1, 3, 5, 2, 8, 6, 4 };
I wrote the following code to get the top 3 elements in the array:
var topThree = (from i in array orderby i descending select i).Take(3);
When I check what's inside the topThree, I find:
{System.Linq.Enumerable.TakeIterator}
count:0
What did I do wrong and how can I correct my code?
How did you "check what's inside the topThree"? The easiest way to do so is to print them out:
using System;
using System.Linq;
public class Test
{
static void Main()
{
int[] array = new int[7] { 1, 3, 5, 2, 8, 6, 4 };
var topThree = (from i in array
orderby i descending
select i).Take(3);
foreach (var x in topThree)
{
Console.WriteLine(x);
}
}
}
Looks okay to me...
There are potentially more efficient ways of finding the top N values than sorting, but this will certainly work. You might want to consider using dot notation for a query which only does one thing:
var topThree = array.OrderByDescending(i => i)
.Take(3);
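As one illustration of avoiding the full sort, a single scan can keep a small buffer holding only the N largest values seen so far. This is a sketch of that general idea rather than anything the answer prescribes; the names TopN and Largest are hypothetical. It costs roughly O(n * N), which is attractive only when N is small compared with the input.

```csharp
using System.Collections.Generic;

public static class TopN
{
    // Returns the n largest values in descending order, scanning the input once.
    public static List<int> Largest(IEnumerable<int> source, int n)
    {
        if (n <= 0)
            return new List<int>();

        var top = new List<int>(n + 1); // kept sorted descending, at most n items
        foreach (int value in source)
        {
            // Skip values that cannot enter a full buffer.
            if (top.Count == n && value <= top[n - 1])
                continue;

            // Find the insertion point that keeps the buffer descending.
            int pos = top.Count;
            while (pos > 0 && top[pos - 1] < value)
                pos--;
            top.Insert(pos, value);

            // Drop the smallest value when the buffer overflows.
            if (top.Count > n)
                top.RemoveAt(n);
        }
        return top;
    }
}
```

For the array in the question, TopN.Largest(array, 3) returns 8, 6, 5.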
Your code seems fine to me, you maybe want to get the result back to another array?
int[] topThree = array.OrderByDescending(i=> i)
.Take(3)
.ToArray();
It's due to the deferred execution of the LINQ query.
As suggested if you add .ToArray() or .ToList() or similar you will get the correct result.
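A small sketch of what deferred execution means in practice: the query object only describes the work, and it runs each time it is enumerated, so it sees later changes to the source array, whereas a ToArray() result is a fixed snapshot. The names DeferredDemo and Run are hypothetical.

```csharp
using System.Linq;

public static class DeferredDemo
{
    public static (int[] Rerun, int[] Snapshot) Run()
    {
        int[] array = { 1, 3, 5, 2, 8, 6, 4 };

        // Nothing is sorted yet; this only builds the query.
        var lazy = array.OrderByDescending(i => i).Take(3);

        // ToArray() enumerates the query now: 8, 6, 5.
        int[] snapshot = lazy.ToArray();

        // Change the source after the query was built.
        array[0] = 9;

        // Enumerating again re-runs the query and sees the 9: 9, 8, 6.
        int[] rerun = lazy.ToArray();

        return (rerun, snapshot);
    }
}
```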
int[] intArray = new int[7] { 1, 3, 5, 2, 8, 6, 4 };
int ind=0;
var listTop3 = intArray.OrderByDescending(a=>a).Select(itm => new {
count = ++ind, value = itm
}).Where(itm => itm.count < 4);
