LINQ Where vs For Loop implementation [closed] - c#

I'm trying to understand C#'s LINQ implementation and how it performs compared to for and foreach loops.
Everywhere I look I see posts about how much better (in terms of performance) a for-loop implementation is compared to a LINQ one. Example1, Example2, Example3
However, when I put together my own POC to see whether I could optimize the GroupBy and Where operations, I see the opposite. Can you tell me whether my implementations can be optimized further?
//Where Implementation (Main Call)
var students = createStudentList();

var stopwatch1 = new Stopwatch();
stopwatch1.Start();
var y = students.Where(s => s.age == 32);
foreach(var entry in y){}
stopwatch1.Stop();
Console.WriteLine("1) TICKS ELAPSED WHERE: " + stopwatch1.ElapsedTicks);
Console.WriteLine("1) MILLISECONDS WHERE: " + stopwatch1.ElapsedMilliseconds);

var stopwatch2 = new Stopwatch();
stopwatch2.Start();
var y2 = WhereManual(students);
foreach(var entry in y2){}
stopwatch2.Stop();
Console.WriteLine("2) TICKS ELAPSED FOR: " + stopwatch2.ElapsedTicks);
Console.WriteLine("2) MILLISECONDS FOR: " + stopwatch2.ElapsedMilliseconds);

public List<Student> WhereManual(List<Student> students){
    var filteredList = new List<Student>();
    for(var i = 0; i < students.Count(); i++){
        var student = students[i];
        if(student.age == 32){
            filteredList.Add(student);
        }
    }
    return filteredList;
}
Output:
1) TICKS ELAPSED WHERE: 389478
1) MILLISECONDS WHERE: 38
2) TICKS ELAPSED FOR: 654023
2) MILLISECONDS FOR: 65
And for the GroupBy I have
//GroupBy Implementation (Main Call)
var students = createStudentList();

var stopwatch1 = new Stopwatch();
stopwatch1.Start();
var y = students.GroupBy(s => s.age);
foreach(var entry in y){}
stopwatch1.Stop();
Console.WriteLine("1) TICKS ELAPSED GROUPBY: " + stopwatch1.ElapsedTicks);
Console.WriteLine("1) MILLISECONDS GROUPBY: " + stopwatch1.ElapsedMilliseconds);

var stopwatch2 = new Stopwatch();
stopwatch2.Start();
var y2 = dictOperation(students);
foreach(var entry in y2){}
stopwatch2.Stop();
Console.WriteLine("2) TICKS ELAPSED FOR: " + stopwatch2.ElapsedTicks);
Console.WriteLine("2) MILLISECONDS FOR: " + stopwatch2.ElapsedMilliseconds);

public List<Student> GetStudent(Dictionary<int, List<Student>> dict, int age){
    List<Student> dictStudent;
    return dict.TryGetValue(age, out dictStudent) ? dictStudent : null;
}

public Dictionary<int, List<Student>> dictOperation(List<Student> students){
    var dict = new Dictionary<int, List<Student>>();
    for(var i = 0; i < students.Count(); i++){
        var student = students[i];
        var studentAge = student.age;
        var dictStudent = GetStudent(dict, studentAge);
        if(dictStudent == null)
        {
            dict.Add(studentAge, new List<Student>(){student});
        }
        else
        {
            dictStudent.Add(student);
        }
    }
    return dict;
}
And this is the output:
1) TICKS ELAPSED GROUPBY: 865702
1) MILLISECONDS GROUPBY: 86
2) TICKS ELAPSED FOR: 1364863
2) MILLISECONDS FOR: 136

Not much of an answer, but since I played with it a little I may as well share.
I did not spend much time looking at the GroupBy comparison because the types used are different enough that they may be the bottleneck, and I'm not familiar enough with IGrouping to create a new test right now.
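If you do want a GroupBy test where both versions do comparable work, one option is to also enumerate the inner elements of each group. A rough sketch, reusing the question's Student/age names and its dictOperation method:
var linqGroups = students.GroupBy(s => s.age);
foreach (var group in linqGroups)
{
    foreach (var student in group) { }       // walk every student in the group
}
var manualGroups = dictOperation(students);  // Dictionary<int, List<Student>>
foreach (var pair in manualGroups)
{
    foreach (var student in pair.Value) { }  // same inner walk for the dictionary
}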
I found that if you use the List<T>.Count property instead of the Count() extension method, it saved enough time (iterating over 1,000,000 items) to make the manual code faster than LINQ. Additionally, a few more milliseconds were saved by removing the assignment var student = students[i];:
public class Student { public string Name { get; set; } public int Age { get; set; } }
public class Program
{
public static List<Student> Students = new List<Student>();
public static void CreateStudents()
{
for (var i = 0; i < 1000000; i++)
{
Students.Add(new Student {Name = $"Student{i}", Age = i});
}
}
public static List<Student> WhereManualOriginal(List<Student> students)
{
var filteredList = new List<Student>();
for (var i = 0; i < students.Count(); i++)
{
var student = students[i];
if (student.Age == 32)
{
filteredList.Add(student);
}
}
return filteredList;
}
public static List<Student> WhereManualNew(List<Student> students)
{
var filteredList = new List<Student>();
for (var i = 0; i < students.Count; i++)
{
if (students[i].Age == 32)
{
filteredList.Add(students[i]);
}
}
return filteredList;
}
public static long LinqWhere()
{
var sw = Stopwatch.StartNew();
var items = Students.Where(s => s.Age == 32);
foreach (var item in items) { }
sw.Stop();
return sw.ElapsedTicks;
}
public static long ManualWhere()
{
var sw = Stopwatch.StartNew();
var items = WhereManualOriginal(Students);
foreach (var item in items) { }
sw.Stop();
return sw.ElapsedTicks;
}
public static long NewManualWhere()
{
var sw = Stopwatch.StartNew();
var items = WhereManualNew(Students);
foreach (var item in items) { }
sw.Stop();
return sw.ElapsedTicks;
}
public static void Main()
{
// Warmup stuff
CreateStudents();
WhereManualOriginal(Students);
WhereManualNew(Students);
Students.Where(s => s.Age == 32).ToList();
var linqResults = new List<long>();
var manualResults = new List<long>();
var newManualResults = new List<long>();
for (int i = 0; i < 100; i++)
{
newManualResults.Add(NewManualWhere());
manualResults.Add(ManualWhere());
linqResults.Add(LinqWhere());
}
Console.WriteLine("Linq where ......... " + linqResults.Average());
Console.WriteLine("Manual where ....... " + manualResults.Average());
Console.WriteLine("New Manual where ... " + newManualResults.Average());
GetKeyFromUser("\nDone! Press any key to exit...");
}
}
Output (screenshot omitted): averaged over 100 runs, the new manual version came out faster than the LINQ version.

Related

creating array of bad names to check and replace in c#

I'm looking to create a method that loops through a list and replaces matched values with a new value. I have something working below, but it really doesn't follow the DRY principle and looks ugly.
How could I create a dictionary of key/value pairs that would hold the values to match and their replacements?
var match = acreData.data;
foreach(var i in match)
{
if (i.county_name == "DE KALB")
{
i.county_name = "DEKALB";
}
if (i.county_name == "DU PAGE")
{
i.county_name = "DUPAGE";
}
}
For your case, you can try using LINQ and Replace to do it:
var match = acreData.data.ToList();
match.ForEach(x =>
x.county_name = x.county_name.Replace(" ", "")
);
Or you can create a mapping table to map your data to the corrected value, as @user2864740 says:
Dictionary<string, string> dict = new Dictionary<string, string>();
dict.Add("DE KALB", "DEKALB");
dict.Add("DU PAGE", "DUPAGE");
var match = acreData.data;
string val = string.Empty;
foreach (var i in match)
{
if (dict.TryGetValue(i.county_name, out val))
i.county_name = val;
}
If this were my problem, and it's possible a county could have more than one common misspelling, I would create a class to hold the correct name and the common misspellings. Then you could easily determine whether a misspelling exists and correct it. Something like this:
public class County
{
public string CountyName { get; set; }
public List<string> CommonMisspellings { get; set; }
public County()
{
CommonMisspellings = new List<string>();
}
}
Usage:
//most likely populate from db
var counties = new List<County>();
var dekalb = new County { CountyName = "DEKALB" };
dekalb.CommonMisspellings.Add("DE KALB");
dekalb.CommonMisspellings.Add("DE_KALB");
var test = "DE KALB";
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
test = counties.First(c => c.CommonMisspellings.Contains(test)).CountyName;
}
If you are simply replacing all words in a list that contain a space with the same word minus the space, then you can use the following:
var newList = match.ConvertAll(word => word.Replace(" ", ""));
ConvertAll returns a new list.
Also, I suggest not using variable names like i, j, k, etc.; use more meaningful names instead.
Sample code below:
var oldList = new List<string> {"DE KALB", "DE PAGE"};
var newList = oldList.ConvertAll(word => word.Replace(" ", ""));
We can try removing all characters except letters and the apostrophe (Cote d'Ivoire has one):
...
i.country_name = String.Concat(i.country_name
.Where(c => char.IsLetter(c) || c == '\''));
...
I made a comment under @Kevin's answer and it seems it needs further explanation. Sequential searching in a list does not scale well, and unfortunately for Kevin that is not my opinion; asymptotic computational complexity is math. While searching a dictionary is more or less O(1), searching a list is O(n). To show the practical impact for a solution with 100 countries, each with 100 misspellings, let's make a test:
public class Country
{
public string CountryName { get; set; }
public List<string> CommonMisspellings { get; set; }
public Country()
{
CommonMisspellings = new List<string>();
}
}
static void Main()
{
var counties = new List<Country>();
Dictionary<string, string> dict = new Dictionary<string, string>();
Random rnd = new Random();
List<string> allCountryNames = new List<string>();
List<string> allMissNames = new List<string>();
for (int state = 0; state < 100; ++state)
{
string countryName = state.ToString() + rnd.NextDouble();
allCountryNames.Add(countryName);
var country = new Country { CountryName = countryName };
counties.Add(country);
for (int miss = 0; miss < 100; ++miss)
{
string missname = countryName + miss;
allMissNames.Add(missname);
country.CommonMisspellings.Add(missname);
dict.Add(missname, countryName);
}
}
List<string> testNames = new List<string>();
for (int i = 0; i < 100000; ++i)
{
if (rnd.Next(20) == 1)
{
testNames.Add(allMissNames[rnd.Next(allMissNames.Count)]);
}
else
{
testNames.Add(allCountryNames[rnd.Next(allCountryNames.Count)]);
}
}
System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairs = new List<string>();
foreach (var test in testNames)
{
if (counties.Any(c => c.CommonMisspellings.Contains(test)))
{
repairs.Add(counties.First(c => c.CommonMisspellings.Contains(test)).CountryName);
}
}
st.Stop();
Console.WriteLine("List approach: " + st.ElapsedMilliseconds.ToString() + "ms");
st = new System.Diagnostics.Stopwatch();
st.Start();
List<string> repairsDict = new List<string>();
foreach (var test in testNames)
{
if (dict.TryGetValue(test, out var val))
{
repairsDict.Add(val);
}
}
st.Stop();
Console.WriteLine("Dict approach: " + st.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine("Repaired count: " + repairs.Count
+ ", check " + (repairs.SequenceEqual(repairsDict) ? "OK" : "ERROR"));
Console.ReadLine();
}
And the result is
List approach: 7264ms
Dict approach: 4ms
Repaired count: 4968, check OK
The list approach is about 1800x slower, so more than a thousand times slower in this case. The results are as expected. Whether that is a problem is another question; it depends on the concrete usage pattern of the concrete application and is out of scope for this post.
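To tie this back to the original county question, here is a minimal sketch (assuming the County class from the earlier answer and the question's acreData/county_name shape) that flattens the per-county misspelling lists into one dictionary once and then does O(1) lookups while correcting names:
var lookup = new Dictionary<string, string>();
foreach (var county in counties)
{
    foreach (var misspelling in county.CommonMisspellings)
    {
        lookup[misspelling] = county.CountyName;
    }
}

foreach (var i in acreData.data)
{
    if (lookup.TryGetValue(i.county_name, out var correct))
    {
        i.county_name = correct;
    }
}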

Build up List from another List

I've got a list of Players. Each Player has a MarketValue. I need to build up a second list by iterating through the player list and building up a team. The tricky thing is that the new team should have at least 15 players and a maximum total MarketValue of 100 million +/- 1%.
Does anyone know how to do that elegantly?
private Result<List<Player>> CreateRandomTeam(List<Player> players, int startTeamValue)
{
// start formation 4-4-2
// Threshold tw 20 mio defender 40 mio Midfielder 40 Mio Striker 50 Mio
var playerKeeperList = players.FindAll(p => p.PlayerPosition == PlayerPosition.Keeper);
var playerDefenderList = players.FindAll(p => p.PlayerPosition == PlayerPosition.Defender);
var playerMidfieldList = players.FindAll(p => p.PlayerPosition == PlayerPosition.Midfield);
var playerStrikerList = players.FindAll(p => p.PlayerPosition == PlayerPosition.Striker);
List<Player> keeperPlayers = AddRandomPlayers(playerKeeperList, 2, 0, 20000000);
List<Player> defenderPlayers = AddRandomPlayers(playerDefenderList, 4, 0, 40000000);
List<Player> midfieldPlayers = AddRandomPlayers(playerMidfieldList, 4, 0, 40000000);
List<Player> strikerPlayers = AddRandomPlayers(playerStrikerList, 2, 0, 50000000);
List<Player> team = new List<Player>();
team.AddRange(keeperPlayers);
team.AddRange(defenderPlayers);
team.AddRange(midfieldPlayers);
team.AddRange(strikerPlayers);
var currentTeamValue = team.Sum(s => s.MarketValue);
var budgetLeft = startTeamValue - currentTeamValue;
players.RemoveAll(p => team.Contains(p));
var player1 = AddRandomPlayers(players, 2, 0, budgetLeft);
team.AddRange(player1);
players.RemoveAll(p => player1.Contains(p));
currentTeamValue = team.Sum(t => t.MarketValue);
budgetLeft = startTeamValue - currentTeamValue;
var player2 = players.Aggregate((x, y) => Math.Abs(x.MarketValue - budgetLeft) < Math.Abs(y.MarketValue - budgetLeft) ? x : y);
team.Add(player2);
players.Remove(player2);
return Result<List<Player>>.Ok(team);
}
private static List<Player> AddRandomPlayers(List<Player> players, int playerCount, double minMarketValue, double threshold)
{
// TODO: AYI Implement Random TeamName assign logic
Random rnd = new Random();
var team = new List<Player>();
double assignedTeamValue = 0;
while (team.Count < playerCount)
{
var index = rnd.Next(players.Count);
var player = players[index];
if ((assignedTeamValue + player.MarketValue) <= threshold)
{
team.Add(player);
players.RemoveAt(index);
assignedTeamValue += player.MarketValue;
}
}
return team;
}
This isn't really a C# question so much as an algorithm question, so there may be a better place for it. As I understand it, you want to pick 15 numbers from a list such that the total adds up to 99-101.
It's likely that there are many solutions, all equally valid.
I think you could do it like this:
Build a list of the 14 cheapest items.
Pick the highest value, so long as the remaining space is greater than the total of the 14 cheapest.
Repeat the above, skipping any players that won't fit.
Fill the remaining places with players from the 'cheapest' list.
This will probably give you a team containing the best and worst players, and one middle-ranking player that just fits.
If you want to do some more research, this sounds like a variant of the coin change problem.
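To make the steps above concrete, here is a rough greedy sketch. It assumes the question's Player type with a MarketValue property, ignores positions, and treats 100 million +/- 1% as the budget; it is a starting point, not a tested implementation:
private static List<Player> PickGreedyTeam(List<Player> players, double budget)
{
    var byValue = players.OrderBy(p => p.MarketValue).ToList();
    var cheapest = byValue.Take(14).ToList();   // cheapest 14 kept as the filler reserve
    var team = new List<Player>();
    double spent = 0;
    // Greedily take the most expensive players that still leave enough budget
    // to fill the remaining slots from the cheap reserve.
    foreach (var candidate in byValue.Skip(14).OrderByDescending(p => p.MarketValue))
    {
        int slotsLeft = 15 - team.Count;
        if (slotsLeft <= 1) break;
        double reserveCost = cheapest.Take(slotsLeft - 1).Sum(p => p.MarketValue);
        if (spent + candidate.MarketValue + reserveCost <= budget * 1.01)
        {
            team.Add(candidate);
            spent += candidate.MarketValue;
        }
    }
    // Fill the remaining places with players from the cheap reserve.
    team.AddRange(cheapest.Take(15 - team.Count));
    return team;
}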
Just to show my solution in case someone needs it (the GA types here appear to come from the GeneticSharp library):
var selection = new EliteSelection();
var crossover = new OnePointCrossover(0);
var mutation = new UniformMutation(true);
var fitness = new TeamFitness(players, startTeamValue);
var chromosome = new TeamChromosome(15, players.Count);
var population = new Population(players.Count, players.Count, chromosome);
var ga = new GeneticAlgorithm(population, fitness, selection, crossover, mutation)
{
Termination = new GenerationNumberTermination(100)
};
ga.Start();
var bestChromosome = ga.BestChromosome as TeamChromosome;
var team = new List<Player>();
if (bestChromosome != null)
{
for (int i = 0; i < bestChromosome.Length; i++)
{
team.Add(players[(int) bestChromosome.GetGene(i).Value]);
}
// Remove assigned player to avoid duplicate assignment
players.RemoveAll(p => team.Contains(p));
return Result<List<Player>>.Ok(team);
}
return Result<List<Player>>.Error("Chromosome was null!");
There is a fitness class whose Evaluate method handles the logic to get the best result:
class TeamFitness : IFitness
{
private readonly List<Player> _players;
private readonly int _startTeamValue;
private List<Player> _selected;
public TeamFitness(List<Player> players, int startTeamValue)
{
_players = players;
_startTeamValue = startTeamValue;
}
public double Evaluate(IChromosome chromosome)
{
double f1 = 9;
_selected = new List<Player>();
var indexes = new List<int>();
foreach (var gene in chromosome.GetGenes())
{
indexes.Add((int)gene.Value);
_selected.Add(_players[(int)gene.Value]);
}
if (indexes.Distinct().Count() < chromosome.Length)
{
return int.MinValue;
}
var sumMarketValue = _selected.Sum(s => s.MarketValue);
var targetValue = _startTeamValue;
if (sumMarketValue < targetValue)
{
f1 = targetValue - sumMarketValue;
}else if (sumMarketValue > targetValue)
{
f1 = sumMarketValue - targetValue;
}
else
{
f1 = 0;
}
var keeperCount = _selected.Count(s => s.PlayerPosition == PlayerPosition.Keeper);
var strikerCount = _selected.Count(s => s.PlayerPosition == PlayerPosition.Striker);
var defCount = _selected.Count(s => s.PlayerPosition == PlayerPosition.Defender);
var middleCount = _selected.Count(s => s.PlayerPosition == PlayerPosition.Midfield);
var factor = 0;
var penaltyMoney = 10000000;
if (keeperCount > 2)
{
factor += (keeperCount - 2) * penaltyMoney;
}
if (keeperCount == 0)
{
factor += penaltyMoney;
}
if (strikerCount < 2)
{
factor += (2 - strikerCount) * penaltyMoney;
}
if (middleCount < 4)
{
factor += (4 - middleCount) * penaltyMoney;
}
if (defCount < 4)
{
factor += (4 - defCount) * penaltyMoney;
}
return 1.0 - (f1 + factor);
}
}

Replacing loops with linq code [closed]

My current code is like this:
var results = new List<Results>();
var items = new List<string>
{
"B,0",
"A,1",
"B,2",
"A,3",
"A,4",
"B,5",
"A,6",
"A,7",
"B,8"
};
int size = 2;
int temp;
var tempResults = new List<int>();
var keys = items.Select(t => t.Split(',')[0]).Distinct().ToList();
//var values = items.Select(t => t.Split(',')[1]).ToList();
//var result = items.SelectMany(k => values, (k, v) => new {k, v});
foreach (var key in keys)
{
temp = 0;
tempResults = new List<int>();
foreach (var item in items)
{
if (item.Split(',')[0] == key)
{
tempResults.Add(Int32.Parse(item.Split(',')[1]));
temp++;
}
if (temp == size)
{
results.Add(new Results
{
Key = key,
Values = new List<int>(tempResults)
});
temp = 0;
tempResults.Clear();
}
}
}
foreach (Results r in results)
{
Console.WriteLine("Key: " + r.Key);
Console.WriteLine("Values: ");
foreach (int i in r.Values)
{
Console.WriteLine(i);
}
}
Everything works fine with it, but I am using two loops to get the results needed. I want to replace them with a LINQ expression and have been trying, but I can't seem to figure it out. Any help is appreciated.
You could use a combination of LINQ methods: .GroupBy, .Select, SelectMany and some data structures like Tuple<T1, T2>.
Provided that we have class:
class Results
{
public string Key { get; set; }
public List<int> Values { get; set; }
}
The solution could be:
int k = 0;
var result =
items.Select(x => // parse initial string
{
var strValue = x.Split(',');
return Tuple.Create(strValue[0], Convert.ToInt32(strValue[1]));
})
.GroupBy(x => x.Item1, y => y.Item2) // group by key
.Select(x => Tuple.Create(x.Key, x)) // flatten to IEnumerable
.SelectMany(x => // select fixed size data chunks
x.Item2.GroupBy(y => k++ / size, z => z)
.Select(z => Tuple.Create(x.Item1, z)))
.Select(x => // cast to resulting model type
new Results()
{
Key = x.Item1,
Values = x.Item2.ToList()
})
.ToList(); // Return enumeration as list
How about writing a couple extension methods?
const int partitionSize = 2;
var itemLookup = items.ToLookup(x => x.Split(',')[0], x => Int32.Parse(x.Split(',')[1]));
var partitionedItems = itemLookup.Partition(partitionSize);
foreach (var partition in partitionedItems)
foreach (var lookup in partition)
{
Console.WriteLine("Key: " + lookup.Key);
Console.WriteLine("Values: ");
foreach (var i in lookup.ToList())
{
Console.WriteLine(i);
}
}
public static class PartitionExtensions
{
public static IList<ILookup<K, V>> Partition<K, V>(this ILookup<K, V> lookup, int size)
{
return lookup.SelectMany(l => l.ToList().Partition(size).Select(p => p.ToLookup(x => l.Key, x => x))).ToList();
}
public static IList<IList<T>> Partition<T>(this IList<T> list, int size)
{
IList<IList<T>> results = new List<IList<T>>();
var itemCount = list.Count();
var partitionCount = itemCount / size;
//your paritioning method is truncating items that don't make up a full partition
//if you want the remaining items in a partial partition, use this code instead
//var partitionCount = ((itemCount % size == 0) ? itemCount : itemCount + size) / size;
for (var i = 0; i < partitionCount; i++)
{
results.Add(list.Skip(i * size).Take(size).ToList());
}
return results;
}
}
This isn't really a way to remove the inner loop, but you could shorten your code a bit with:
....
var keys = items.Select(t => t.Split(',')[0]).Distinct().ToList();
foreach (var key in keys)
{
var forKey = items.Where(x => x.Split(',')[0] == key)
.Select(k => int.Parse(k.Split(',')[1]));
for (int x = 0; x < forKey.Count(); x += size)
{
results.Add(new Results
{
Key = key,
Values = forKey.Skip(x).Take(size).ToList()
});
}
}
....
At least this approach removes the need for the temporary variables and all the if checks inside the loop, and it will also include in your results the last value for the A key, which has only one integer in its list.
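For completeness, on newer runtimes (.NET 6 and later) the same fixed-size split can be written with Enumerable.Chunk. A sketch assuming the same items, size and Results class as above; note that, like the Skip/Take approach and unlike the original loop, it keeps the trailing partial group:
var results = items
    .Select(s => s.Split(','))
    .GroupBy(parts => parts[0], parts => int.Parse(parts[1]))  // key -> its values, in order
    .SelectMany(g => g.Chunk(size)                             // fixed-size pieces per key
        .Select(chunk => new Results { Key = g.Key, Values = chunk.ToList() }))
    .ToList();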

Shouldn't an expression be faster than reflection for getting a property value?

I know that using a compiled expression is supposed to be faster for getting a property value than using reflection. I would like to convert a list to a DataTable, and I have used both approaches:
The reflection elapsed time : 36 ms
The expression elapsed time : 2350 ms
Am I doing something wrong there?
I have tried the code below:
public class Foo
{
public long IntCode { get; set; }
public string Name { get; set; }
public string SurName { get; set; }
public int Age { get; set; }
}
class Program
{
static void Main(string[] args)
{
var r = new Random();
var foos = new List<Foo>();
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10000; i++)
{
foos.Add(new Foo { IntCode = r.Next(), Name = Guid.NewGuid().ToString(), SurName = Guid.NewGuid().ToString(), Age = r.Next() });
}
sw.Stop();
Console.WriteLine("Elapsed Time For Creating : {0}", sw.ElapsedMilliseconds);
sw.Restart();
ConvertWithReflection(foos, "IntCode", "Name", "Age");
sw.Stop();
Console.WriteLine("Elapsed Time For Converting : {0}", sw.ElapsedMilliseconds);
sw.Restart();
ConvertWithExpression(foos, "IntCode", "Name", "Age");
sw.Stop();
Console.WriteLine("Elapsed Time For Converting : {0}", sw.ElapsedMilliseconds);
Console.ReadLine();
}
public static object GetValueGetter<T>(object item,string propertyName)
{
var arg = Expression.Parameter(item.GetType(), "x");
Expression expr = Expression.Property(arg, propertyName);
var unaryExpression = Expression.Convert(expr, typeof(object));
var propertyResolver = Expression.Lambda<Func<T, object>>(unaryExpression, arg).Compile();
var value = propertyResolver((T)item);
return value;
}
public static void ConvertWithReflection<T>(IEnumerable<T> list, params string[] columnNames)
{
var t = list.ToList();
if (!t.Any()) return;
var dataTable = new DataTable();
dataTable.Columns.Add("IntCode");
dataTable.Columns.Add("Name");
dataTable.Columns.Add("SurName");
dataTable.Columns.Add("Age");
foreach (var item in t)
{
var dr = dataTable.NewRow();
for (int i = 0; i < dataTable.Columns.Count; i++)
{
var el = columnNames.ElementAtOrDefault(i);
if (el == null)
{
dr[i] = DBNull.Value;
}
else
{
var property = item.GetType().GetProperty(el);
dr[i] = property.GetValue(item, null);
}
}
dataTable.Rows.Add(dr);
}
}
public static void ConvertWithExpression<T>(IEnumerable<T> list, params string[] columnNames)
{
var t = list.ToList();
if (!t.Any()) return;
var dataTable = new DataTable();
dataTable.Columns.Add("IntCode");
dataTable.Columns.Add("Name");
dataTable.Columns.Add("SurName");
dataTable.Columns.Add("Age");
foreach (var item in t)
{
var dr = dataTable.NewRow();
for (var i = 0; i < dataTable.Columns.Count; i++)
{
var el = columnNames.ElementAtOrDefault(i);
if (el == null)
{
dr[i] = DBNull.Value;
}
else
{
dr[i] = GetValueGetter<T>(item, el);
}
}
dataTable.Rows.Add(dr);
}
}
}
You are not comparing apples to apples: your expression code constructs and compiles an expression on each iteration, producing a fair amount of throw-away work every time. The reflection code, on the other hand, uses all the optimizations that the designers of the CLR have put into the system, performing only the necessary operations.
Essentially, you are comparing preparation time + working time for expressions vs. working time for reflection. This is not the intended way of using expressions in situations where an action is repeated 10,000 times: you are expected to prepare and compile your lambdas upfront, store them in a cache of some sort, and then quickly retrieve them as needed on each iteration. Implementing some sort of caching would even out your comparison:
public static object GetValueGetter<T>(object item, string propertyName, IDictionary<string,Func<T,object>> cache) {
Func<T, object> propertyResolver;
if (!cache.TryGetValue(propertyName, out propertyResolver)) {
var arg = Expression.Parameter(item.GetType(), "x");
Expression expr = Expression.Property(arg, propertyName);
var unaryExpression = Expression.Convert(expr, typeof (object));
propertyResolver = Expression.Lambda<Func<T, object>>(unaryExpression, arg).Compile();
cache.Add(propertyName, propertyResolver);
}
return propertyResolver((T)item);
}
The call looks like this:
var cache = new Dictionary<string,Func<T,object>>();
foreach (var item in t) {
var dr = dataTable.NewRow();
for (var i = 0; i < dataTable.Columns.Count; i++) {
var el = columnNames.ElementAtOrDefault(i);
if (el == null) {
dr[i] = DBNull.Value;
} else {
dr[i] = GetValueGetter<T>(item, el, cache);
}
}
dataTable.Rows.Add(dr);
}
Now that the cost of preparation is spread across 10,000 calls, reflection becomes the slower of the two conversion methods:
Elapsed Time For Creating : 29
Elapsed Time For Converting : 84 <-- Reflection
Elapsed Time For Converting : 53 <-- Expressions
You're compiling the expression repeatedly. Compile it once and it will be faster.
If compiling an expression every time were faster, the runtime would do it automatically.
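A minimal sketch of that idea (a hypothetical helper, not part of the question's code): build the getters for the requested columns once, then reuse the compiled delegates for every row.
static Dictionary<string, Func<T, object>> BuildGetters<T>(params string[] propertyNames)
{
    var getters = new Dictionary<string, Func<T, object>>();
    foreach (var name in propertyNames)
    {
        var arg = Expression.Parameter(typeof(T), "x");
        var body = Expression.Convert(Expression.Property(arg, name), typeof(object));
        // Compile once per property; the delegate is then reused for every row.
        getters[name] = Expression.Lambda<Func<T, object>>(body, arg).Compile();
    }
    return getters;
}
Inside the conversion loop you would then write dr[i] = getters[el](item); instead of building and compiling an expression for each cell.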

c# RavenDB embedded optimize

I have a database (RavenDB) which needs to be able to handle 300 queries (full-text search) every 10 seconds. To increase performance I split the database up so that I have multiple document stores.
My code:
var watch = Stopwatch.StartNew();
int taskcnt = 0;
int sum = 0;
for (int i = 0; i < 11; i++)
{
Parallel.For(0, 7, new Action<int>((x) =>
{
for(int docomentStore = 0;docomentStore < 5; docomentStore++)
{
var stopWatch = Stopwatch.StartNew();
Task<IList<eBayItem>> task = new Task<IList<eBayItem>>(Database.ExecuteQuery, new Filter()
{
Store = "test" + docomentStore,
MaxPrice = 600,
MinPrice = 200,
BIN = true,
Keywords = new List<string>() { "Canon", "MP", "Black" },
ExcludedKeywords = new List<string>() { "G1", "T3" }
});
task.ContinueWith((list) => {
stopWatch.Stop();
sum += stopWatch.Elapsed.Milliseconds;
taskcnt++;
if (taskcnt == 300)
{
watch.Stop();
Console.WriteLine("Average time: " + (sum / (float)300).ToString());
Console.WriteLine("Total time: " + watch.Elapsed.ToString() + "ms");
}
});
task.Start();
}
}));
Thread.Sleep(1000);
}
Average query time: 514,13 ms
Total time: 00:01:29.9108016
The code where I query ravenDB:
public static IList<eBayItem> ExecuteQuery(object Filter)
{
IList<eBayItem> items;
Filter filter = (Filter)Filter;
if (int.Parse(filter.Store.ToCharArray().Last().ToString()) > 4)
{
Console.WriteLine(filter.Store); return null;
}
using (var session = Shards[filter.Store].OpenSession())
{
var query = session.Query<eBayItem, eBayItemIndexer>().Where(y => y.Price <= filter.MaxPrice && y.Price >= filter.MinPrice);
query = filter.Keywords.ToArray()
.Aggregate(query, (q, term) =>
q.Search(xx => xx.Title, term, options: SearchOptions.And));
if (filter.ExcludedKeywords.Count > 0)
{
query = filter.ExcludedKeywords.ToArray().Aggregate(query, (q, exterm) =>
q.Search(it => it.Title, exterm, options: SearchOptions.Not));
}
items = query.ToList<eBayItem>();
}
return items;
}
And the initialization of RavenDB:
static Dictionary<string, EmbeddableDocumentStore> Shards = new Dictionary<string, EmbeddableDocumentStore>();
public static void Connect()
{
Shards.Add("test0", new EmbeddableDocumentStore() { DataDirectory = "test.db" });
Shards.Add("test1", new EmbeddableDocumentStore() { DataDirectory = "test1.db" });
Shards.Add("test2", new EmbeddableDocumentStore() { DataDirectory = "test2.db" });
Shards.Add("test3", new EmbeddableDocumentStore() { DataDirectory = "test3.db" });
Shards.Add("test4", new EmbeddableDocumentStore() { DataDirectory = "test4.db" });
foreach (string key in Shards.Keys)
{
EmbeddableDocumentStore store = Shards[key];
store.Initialize();
IndexCreation.CreateIndexes(typeof(eBayItemIndexer).Assembly, store);
}
}
How can I optimize my code so the total time is lower? Is it a good idea to divide my database up into 5 different ones?
EDIT: The program now has only 1 documentStore instead of 5 (as suggested by Ayende Rahien).
Also, this is the query on its own:
Price_Range:[* TO Dx600] AND Price_Range:[Dx200 TO NULL] AND Title:(Canon) AND Title:(MP) AND Title:(Black) -Title:(G1) -Title:(T3)
No, this isn't good.
Use a single embedded RavenDB instance. If you need sharding, that involves multiple machines.
In general, RavenDB queries take a few milliseconds each. You need to show what your queries look like (you can call ToString() on them to see that).
Having shards of RavenDB in this manner means that all of them are fighting for CPU and I/O.
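For example, a quick way to see the query text before tuning (using the question's types; store stands for whichever document store is being queried):
using (var session = store.OpenSession())
{
    var query = session.Query<eBayItem, eBayItemIndexer>()
                       .Where(y => y.Price <= 600 && y.Price >= 200);
    // ToString() on the query shows the Lucene query RavenDB will execute.
    Console.WriteLine(query.ToString());
}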
I know this is an old post but this was the top search result I got.
I had the same problem: my queries were taking 500 ms. They now take 100 ms after applying the following search practices: http://ravendb.net/docs/article-page/2.5/csharp/client-api/querying/static-indexes/searching
