How to get count of duplicate values with value name - c#

I'm pretty new to Elasticsearch and I'm using the NEST library. How can I get the count of duplicate values together with the value itself?
Here is my class:
public class Book
{
public string BookName {get;set;}
public string Author {get;set;}
}
This is my data:
BookName=X, Author=a
BookName=Y, Author=a
BookName=Z, Author=b
BookName=C, Author=b
BookName=T, Author=c
Query result should be:
a- 2
b- 2
c- 1
I tried the following query but it doesn't work:
client.Search<Book>(s => s
.Aggregations(a => a
.Terms("group_by_auth", ts => ts
.Field(o => o.Author)
.Size(10)
.Aggregations(aa => aa
.Sum("sum_value", sa => sa
.Field(o => o.Author)
)
)
)
));
Mapping is:
client= new ElasticClient(connectionSettings);
client.CreateIndex("books", c => c
.Mappings(m => m
.Map<Book>(mm => mm
.Properties(ps=>ps
.Text(s=>s
.Name(a=>a.Author)
)))
));

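If you want the count for each term, the terms aggregation already returns it as each bucket's DocCount, so no Sum sub-aggregation is needed. The example below indexes into a fresh index with the default dynamic mapping, which is where the author.keyword sub-field comes from: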
var uri = new Uri("http://localhost.fiddler:9200");
ElasticClient db = new ElasticClient(uri);
var data = new[] {
new{ BookName= "X", Author="a" },
new{ BookName= "Y", Author="a" },
new{ BookName= "Z", Author="b" },
new{ BookName= "C", Author="b" },
new{ BookName= "T", Author="c" },
};
db.DeleteIndex("test");
foreach (var d in data)
{
db.Index(d, id => id.Index("test"));
}
System.Threading.Thread.Sleep(1000); // crude wait so the newly indexed documents become searchable
var items = db.Search<dynamic>(s => s.Size(0).Aggregations(aggr => aggr.Terms("group_by_auth", ts => ts.Field("author.keyword"))));
foreach (var item in items.Aggs.Terms("group_by_auth").Buckets)
{
Console.WriteLine(item.Key + "-" + item.DocCount);
}
Console.WriteLine("DONE");
Console.ReadLine();

Related

How to Add Rownum to GroupBy Linq

I have a complex LINQ Query to extract Top students in my university. Here is the query :
var query = Db.Students.AsNoTracking().Where(...).AsQueryable();
var resultgroup = query.GroupBy(st => new
{
st.Student.CourseStudyId,
st.Student.EntranceTermId,
st.Student.StudyingModeId,
st.Student.StudyLevelId
}, (key, g) => new
{
CourseStudyId = key.CourseStudyId,
EntranceTermId = key.EntranceTermId,
StudyingModeId = key.StudyingModeId,
StudyLevelId = key.StudyLevelId,
list = g.OrderByDescending(x =>
x.StudentTermSummary.TotalAverageTillTerm).Take(topStudentNumber)
}).SelectMany(q => q.list).AsQueryable();
This query gives me the top n students per group, based on the four grouping parameters and ordered by their TotalAverageTillTerm.
Now I want to add a row number within each group to simulate a total rank, like a row number in SQL. For example, in a group X the students should get X1=1, X2=2, X3=3, and in a group Y they should get Y1=1, Y2=2, Y3=3.
To reduce the problem, I can work with a single grouping key. The code looks like this:
resultgroup = query.GroupBy(st => new
{
st.Student.StudyLevelId
}, st => st, (key, g) => new
{
StudyLevelId = key.StudyLevelId,
list = g.OrderByDescending(x =>
x.StudentTermSummary.TotalAverageTillTerm)
.Take(topStudentNumber)
}).SelectMany(q => q.list).AsQueryable();
list was a list of students, but I see no sign of the student having a rank property, so I wrapped each one in an anonymous type with a Rank.
var query = Db.Students.AsNoTracking().Where(...).AsEnumerable();
var resultgroup = query.GroupBy(st => new {
st.Student.CourseStudyId,
st.Student.EntranceTermId,
st.Student.StudyingModeId,
st.Student.StudyLevelId
})
.SelectMany( g =>
g.OrderByDescending(x =>x.StudentTermSummary.TotalAverageTillTerm)
.Take(topStudentNumber)
.Select((x,i) => new {
CourseStudyId = g.Key.CourseStudyId,
EntranceTermId = g.Key.EntranceTermId,
StudyingModeId = g.Key.StudyingModeId,
StudyLevelId = g.Key.StudyLevelId,
Rank = i+1
// StudentProperty = x.Prop1,
})
)
.AsQueryable();
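The same idea can be tried in isolation on in-memory data. The following is only a sketch: the `StudentScore` and `RankedStudent` records, their properties, and the sample values are hypothetical stand-ins for the real entities, and it assumes C# 9 or later for `record`. It groups, orders each group by average, keeps the top n, and uses the indexed `Select` overload to produce a 1-based rank per group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat types used only for this illustration.
public record StudentScore(string Name, int StudyLevelId, double Average);
public record RankedStudent(int StudyLevelId, string Name, double Average, int Rank);

public static class RankDemo
{
    // Returns the top `top` students of each study level,
    // each with a 1-based rank inside its own group.
    public static List<RankedStudent> TopPerGroup(IEnumerable<StudentScore> students, int top) =>
        students
            .GroupBy(s => s.StudyLevelId)
            .SelectMany(g => g
                .OrderByDescending(s => s.Average)
                .Take(top)
                // The index i restarts at 0 for every group, so i + 1 is the rank.
                .Select((s, i) => new RankedStudent(g.Key, s.Name, s.Average, i + 1)))
            .ToList();

    public static void Main()
    {
        var students = new[]
        {
            new StudentScore("X1", 1, 19.5),
            new StudentScore("Y1", 2, 18.0),
            new StudentScore("X2", 1, 18.7),
            new StudentScore("Y2", 2, 17.1),
            new StudentScore("X3", 1, 17.9),
            new StudentScore("X4", 1, 15.0),
        };

        foreach (var r in TopPerGroup(students, 3))
            Console.WriteLine($"{r.StudyLevelId} {r.Name} {r.Rank}");
    }
}
```

Because the indexed `Select` overload runs here over in-memory sequences, this sketch sidesteps the question of whether a query provider can translate it, which is why the answer above switches to AsEnumerable() first.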
Do you mean :
var query = Db.Students.AsNoTracking().Where(...).AsQueryable();
var resultgroup = query.GroupBy(st => new
{
st.Student.CourseStudyId,
st.Student.EntranceTermId,
st.Student.StudyingModeId,
st.Student.StudyLevelId
}, (key, g) => new
{
CourseStudyId = key.CourseStudyId,
EntranceTermId = key.EntranceTermId,
StudyingModeId = key.StudyingModeId,
StudyLevelId = key.StudyLevelId,
list = g.OrderByDescending(x =>
x.StudentTermSummary.TotalAverageTillTerm)
.Take(topStudentNumber)
.Select((x, i) => new { Item = x, TotalRank = i + 1 /* 1-based item number inside the group */ }),
StudentsInGroupCount = g.Count() // number of items in this group
});
To see the results:
foreach (var item in resultgroup.ToList())
{
foreach (var s in item.list) Console.WriteLine(s.TotalRank);
}

optimize the comparison in two lists with LINQ

I have two lists of objects:
Customer (client) and Employee
I need to check whether at least one client has the same name as an employee.
Currently I have:
client.ForEach(a =>
{
if (employee.Any(m => m.Name == a.Name && m.FirstName == a.FirstName))
{
// OK TRUE
}
});
Can I make this more readable by doing it another way?
Why not check it beforehand using a join?
var mergedClients = Client.Join(listSFull,
x => new { x.Name, x.FirstName},
y => new { Name = y.Name, FirstName= y.FirstName},
(x, y) => new { x, y }).ToList();
and then iterate over the new collection:
mergedClients.ForEach(a =>
{
// your logic
});
The only disadvantage of this approach (if it bothers you) is that null values will not be included.
I would go either with Join
var isDuplicated = clients.Join(employees,
c => new { c.Name, c.FirstName },
e => new { e.Name, e.FirstName },
(c, e) => new { c, e })
.Any();
or Intersect
var clientNames = clients.Select(c => new { c.Name, c.FirstName });
var employeeNames = employees.Select(e => new { e.Name, e.FirstName });
var isDuplicated = clientNames.Intersect(employeeNames).Any();
Both of Join and Intersect use hashing, and are close to O(n).
Note: equality (and hash code) of anonymous objects (new { , }) is evaluated as for a value type. I.e. two anonymous objects are equal (implies have same hash code) when all their fields are equal.
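The note about anonymous objects can be checked with a small sketch. The `Person` class and the `SameNameExists` helper below are hypothetical names introduced only for this illustration; the point is that two separately created anonymous objects with equal fields compare equal through `Equals` and `GetHashCode`, which is what lets `Intersect` (and `Join`) match them by value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for this illustration; it does NOT override Equals,
// so two Person instances with equal fields are still different objects.
public class Person
{
    public string Name { get; set; } = "";
    public string FirstName { get; set; } = "";
}

public static class AnonymousEqualityDemo
{
    // True when at least one client shares Name and FirstName with an employee.
    public static bool SameNameExists(IEnumerable<Person> clients, IEnumerable<Person> employees)
    {
        // Projecting to anonymous objects gives field-by-field equality and hashing.
        var clientKeys = clients.Select(c => new { c.Name, c.FirstName });
        var employeeKeys = employees.Select(e => new { e.Name, e.FirstName });
        return clientKeys.Intersect(employeeKeys).Any();
    }

    public static void Main()
    {
        var a = new { Name = "Doe", FirstName = "Jane" };
        var b = new { Name = "Doe", FirstName = "Jane" };

        Console.WriteLine(a.Equals(b));                         // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // True
        Console.WriteLine(ReferenceEquals(a, b));               // False

        var clients = new[] { new Person { Name = "Doe", FirstName = "Jane" } };
        var employees = new[] { new Person { Name = "Doe", FirstName = "Jane" } };
        Console.WriteLine(SameNameExists(clients, employees));  // True
    }
}
```

Calling `clients.Intersect(employees)` directly on the `Person` objects would find nothing here, because `Person` uses reference equality by default; the anonymous projection is what makes the comparison value-based.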
=== EDIT: Ok, I was interested myself (hope your question was about performance :P)
[TestMethod]
public void PerformanceTest()
{
var random = new Random();
var clients = Enumerable.Range(0, 10000)
.Select(_ => new Person { FirstName = $"{random.Next()}",
LastName = $"{random.Next()}" })
.ToList();
var employees = Enumerable.Range(0, 10000)
.Select(_ => new Person { FirstName = $"{random.Next()}",
LastName = $"{random.Next()}" })
.ToList();
var joinElapsedMs = MeasureAverageElapsedMs(() =>
{
var isDuplicated = clients.Join(employees,
c => new { c.FirstName, c.LastName },
e => new { e.FirstName, e.LastName },
(c, e) => new { c, e })
.Any();
});
var intersectElapsedMs = MeasureAverageElapsedMs(() =>
{
var clientNames = clients.Select(c => new { c.FirstName, c.LastName });
var employeeNames = employees.Select(e => new { e.FirstName, e.LastName });
var isDuplicated = clientNames.Intersect(employeeNames).Any();
});
var anyAnyElapsedMs = MeasureAverageElapsedMs(() =>
{
var isDuplicated = clients.Any(c => employees.Any(
e => c.FirstName == e.FirstName && c.LastName == e.LastName));
});
Console.WriteLine($"{nameof(joinElapsedMs)}: {joinElapsedMs}");
Console.WriteLine($"{nameof(intersectElapsedMs)}: {intersectElapsedMs}");
Console.WriteLine($"{nameof(anyAnyElapsedMs)}: {anyAnyElapsedMs}");
}
private static double MeasureAverageElapsedMs(Action action) =>
Enumerable.Range(0, 10).Select(_ => MeasureElapsedMs(action)).Average();
private static long MeasureElapsedMs(Action action)
{
var stopWatch = Stopwatch.StartNew();
action();
return stopWatch.ElapsedMilliseconds;
}
public class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
}
Output:
joinElapsedMs: 5.9
intersectElapsedMs: 3.5
anyAnyElapsedMs: 3185.8
Note: any-any is O(n^2) - (in worst case) every employee is iterated per each iterated client.

How to sort linq with fixed values and show all the rest with another sorting

IQueryable<Employee> query = ((IEnumerable<Employee>)employeeList)
.Select(x => x)
.AsQueryable();
var strListEmployees = input.MustIncludeIdsInPage.Split(",").ToList();
//the list of employee is dynamic, it'd return 3, 4, 5 or more data
var entities = query
.OrderBy(item => strListEmployees.IndexOf(item.Id.ToString()))
.PageBy(input)
.ToList();
example data
What I want is something like this in order:
by employee name
D
F
A
B
C
E
G
H
Employee D, F, A on top (fix value in List) and show the rest with name sorting (order by).
As M. Wiśnicki mentioned, this is easily solvable since you have only 3 elements. To handle it dynamically, I would use a function that takes the list (or IEnumerable) of objects and the names that should come first.
The code below is recursive: it takes the first name from the array, puts the matching element first, and then calls itself for the remaining elements with that name removed from the array.
Something like:
public IEnumerable<Employee> GetOrderedPrefered(IEnumerable<Employee> aList, string[] aNames)
{
if (aNames.Length == 0) return aList.OrderBy(a => a.Name).ToList();
var lRes = new List<Employee>()
{
aList.FirstOrDefault(a => a.Name == aNames[0])
};
lRes.AddRange(
GetOrderedPrefered(
aList.Where(a => a.Name != aNames[0]),
aNames.Where(a => a != aNames.First()
).ToArray()
));
return lRes;
}
Usage:
var lRes = GetOrderedPrefered(persons, names);
foreach (var item in lRes)
Console.WriteLine(item.Name);
> D
> F
> A
> B
> C
> E
> G
You can use OrderBy() and ThenBy()
List<Test> tests = new List<Test>()
{
new Test() {EmployeeID = "1", Name = "A"},
new Test() {EmployeeID = "2", Name = "B"},
new Test() {EmployeeID = "3", Name = "C"},
new Test() {EmployeeID = "4", Name = "D"},
new Test() {EmployeeID = "5", Name = "E"},
new Test() {EmployeeID = "6", Name = "F"},
new Test() {EmployeeID = "7", Name = "G"},
new Test() {EmployeeID = "8", Name = "H"},
};
var x = tests.OrderBy(name => name.Name != "D")
.ThenBy(name => name.Name != "F")
.ThenBy(name => name.Name != "A")
.ThenBy(name => name.Name)
.ToList();
Result: D, F, A first, then the other names in alphabetical order.
Edit:
string[] filtr = new[] {"D", "F", "A"};
var fdata = tests.Where(d => filtr.Contains(d.Name)).OrderBy(z => Array.IndexOf(filtr, z.Name)).ToList(); // keep the fixed order D, F, A
var odata = tests.Where(d => !filtr.Contains(d.Name)).OrderBy(z => z.Name).ToList();
fdata.AddRange(odata);
var set = Enumerable.Range(0, 8)
.Select(i => new {
Name = new string(new[] { (char)('A' + i) })
});
var before = string.Join(",", set.Select(i => i.Name)); //A,B,C,D,E,F,G,H
var priorities = "D,F".Split(',').Select((v, i) => new { Value = v, Index = i });
var query = from s in set
join p in priorities on s.Name equals p.Value into m
from x in m.DefaultIfEmpty(new { Value = s.Name, Index = int.MaxValue })
orderby x.Index, s.Name
select s.Name;
var result = string.Join(",", query); //D,F,A,B,C,E,G,H
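The approaches above share one idea: give each fixed name its position in the priority list as a sort key, give everything else a key that sorts last, and break ties by name. A minimal sketch of that idea follows; `SortWithPriorities` is a hypothetical helper name, it works on plain strings rather than Employee objects, and it assumes the priority list contains no duplicates (otherwise `ToDictionary` throws).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrioritySortDemo
{
    // Puts the names listed in `priorities` first (in that order),
    // then all remaining names in ordinal alphabetical order.
    public static List<string> SortWithPriorities(
        IEnumerable<string> names, IReadOnlyList<string> priorities)
    {
        // Map each priority name to its position: D -> 0, F -> 1, A -> 2.
        var rank = priorities
            .Select((name, index) => new { name, index })
            .ToDictionary(p => p.name, p => p.index);

        return names
            // Names without a priority get int.MaxValue and therefore sort last.
            .OrderBy(n => rank.TryGetValue(n, out var r) ? r : int.MaxValue)
            .ThenBy(n => n, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        var names = new[] { "A", "B", "C", "D", "E", "F", "G", "H" };
        var sorted = SortWithPriorities(names, new[] { "D", "F", "A" });
        Console.WriteLine(string.Join(",", sorted)); // D,F,A,B,C,E,G,H
    }
}
```

The dictionary lookup keeps the key computation at constant time per element, whereas calling IndexOf on a list inside OrderBy scans the priority list for every element; with only a handful of fixed names the difference is negligible.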

string manipulation to extract substring of particular pattern

I have the following string in C#:
Count([AssignedTo]) as [AssignedTo] , Sum([Billing Amount]) as [Billing Amount] , Max([Billing Rate]) as [Billing Rate] , Min([ExecutionDate]) as [ExecutionDate] , Average([HoursSpent]) as [HoursSpent] , [Project], [Sub-Project], [TaskName], [Vendor], [Work Classification], [Work Done], Count([WorkItemType]) as [WorkItemType]
Now I want a list of all fields that have an aggregate function, using string manipulation or LINQ.
The output should look like:
Count([AssignedTo])
Sum([Billing Amount])
Max([Billing Rate])
Min([ExecutionDate])
Average([HoursSpent])
Count([WorkItemType])
Perhaps this works for you:
var aggr = new []{ "Count", "Sum", "Max", "Min", "Average"};
var allAggregates = text.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(col => new{ col, token = col.TrimStart().Split().First() })
.Where(x => x.token.Contains('(') && aggr.Any(a => x.token.StartsWith(a, StringComparison.OrdinalIgnoreCase)))
.Select(x => x.token);
Can I get only the field name that is inside the function?
I prefer string methods instead of regex if possible:
var allAggregates = text.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(col => new { col, token = col.TrimStart().Split().First() })
.Where(x => x.token.Contains('(') && aggr.Any(a => x.token.StartsWith(a, StringComparison.OrdinalIgnoreCase)))
.Select(x => {
string innerPart = x.token.Substring(x.token.IndexOf('(') + 1);
int index = innerPart.IndexOf(')');
if (index >= 0)
innerPart = innerPart.Remove(index);
return innerPart;
});
var aggregates = new []{ "Count", "Sum", "Max", "Min", "Average"};
var output=Regex.Matches(input,@"(\w+)\(.*?\)")
.Cast<Match>()
.Where(x=>aggregates.Any(y=>y==x.Groups[1].Value))
.Select(z=>z.Value);
This is another way:
string[] funcNames = new string[]{"Sum","Average","Count","Max","Min"};
string s = "Your String";
var output = from v in s.Split(',')
where funcNames.Contains(v.Split('(')[0].Trim())
select v.Split(new string[]{" as "},
StringSplitOptions.RemoveEmptyEntries)[0].Trim()
.Split('(')[1].Split(')')[0]; // remove this line to keep the whole call instead of only the field name
//Print results
foreach(var str in output) Console.WriteLine(str);
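To check both variants (the whole call and only the field name) against the string from the question, here is a self-contained sketch. The class and method names are hypothetical, and the pattern assumes that every aggregated field is written as `Name([Field])` with one of the five function names from the question, matched case-sensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AggregateParser
{
    // Group 1 = function name, group 2 = field name inside the square brackets.
    private static readonly Regex AggregateCall =
        new Regex(@"\b(Count|Sum|Max|Min|Average)\(\[([^\]]*)\]\)");

    // Returns the full calls, e.g. "Sum([Billing Amount])".
    public static List<string> Calls(string text) =>
        AggregateCall.Matches(text).Cast<Match>().Select(m => m.Value).ToList();

    // Returns only the field names, e.g. "Billing Amount".
    public static List<string> Fields(string text) =>
        AggregateCall.Matches(text).Cast<Match>().Select(m => m.Groups[2].Value).ToList();

    public static void Main()
    {
        var text = "Count([AssignedTo]) as [AssignedTo] , Sum([Billing Amount]) as [Billing Amount] , "
                 + "Max([Billing Rate]) as [Billing Rate] , Min([ExecutionDate]) as [ExecutionDate] , "
                 + "Average([HoursSpent]) as [HoursSpent] , [Project], [Sub-Project], [TaskName], [Vendor], "
                 + "[Work Classification], [Work Done], Count([WorkItemType]) as [WorkItemType]";

        foreach (var call in Calls(text)) Console.WriteLine(call);
        foreach (var field in Fields(text)) Console.WriteLine(field);
    }
}
```

Matching on the brackets rather than splitting on whitespace keeps field names that contain spaces, such as Billing Amount, in one piece.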

LINQ GroupBy and Select on different property

If I have the following collection:
var foos = new List<Foo>
{
new Foo{ Name = "A", Value = 1 },
new Foo{ Name = "B", Value = 1 },
new Foo{ Name = "B", Value = 2 },
new Foo{ Name = "C", Value = 1 },
};
And I want to end-up with:
A-1
B-2
C-1
Where in the case of the duplicate "B" I want to select the "B" with the highest Value?
Something like:
var filteredFoos = foos.GroupBy(x => x.Name).Select_Duplicate_With_Highest_Value
var query = from p in foos
group p by p.Name into g
select new
{
Name = g.Key,
Value = g.Max(a => a.Value)
};
var filteredFoos =
foos.GroupBy(x => x.Name)
.Select(x => new { Name = x.Key, Value = x.Max(f => f.Value) });
Try this query:
var filteredFoos = foos.GroupBy(x => x.Name)
.Select(p => new { p.Key, Value = p.Max(x => x.Value) });
For anyone with more than 2 columns:
var subquery = from p in foos
group p by p.Name into g
select new
{
Name = g.Key,
Value = g.Max(a => a.Value)
};
var query = from f in foos
join s in subquery
on f.Name equals s.Name
where f.Value == s.Value
select f;
If this is against SQL, make sure Name is a primitive.
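For in-memory collections there is also a way to keep the whole object without the join: order each group by Value and take the first element. The sketch below assumes LINQ to Objects and C# 9 or later; the `Note` property is a hypothetical extra column added only to show that the other columns survive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical version of Foo with one extra column for this illustration.
public record Foo(string Name, int Value, string Note);

public static class MaxPerGroupDemo
{
    // Returns, for every Name, the single Foo with the highest Value.
    public static List<Foo> HighestPerName(IEnumerable<Foo> foos) =>
        foos.GroupBy(f => f.Name)
            // OrderByDescending is a stable sort, so on a tie the element
            // that appears first in the source wins.
            .Select(g => g.OrderByDescending(f => f.Value).First())
            .ToList();

    public static void Main()
    {
        var foos = new List<Foo>
        {
            new Foo("A", 1, "first A"),
            new Foo("B", 1, "first B"),
            new Foo("B", 2, "second B"),
            new Foo("C", 1, "first C"),
        };

        foreach (var f in HighestPerName(foos))
            Console.WriteLine($"{f.Name}-{f.Value} ({f.Note})");
    }
}
```

Unlike the join above, which returns every row that ties for the maximum, this variant returns exactly one row per Name.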
