Optimizing LINQ routines - C#

I run a build system. Data-wise, the simplified description is that I have Configurations, and each config has 0..n Builds.
Builds produce artifacts, and some of these are stored on the server. What I am writing is a kind of rule that sums all the bytes produced by a configuration's builds and checks whether that is too much.
The code for the routine at the moment is the following:
private void CalculateExtendedDiskUsage(IEnumerable<Configuration> allConfigurations)
{
    var sw = new Stopwatch();
    sw.Start();

    // Lets take only confs that have been updated within last 7 days
    var items = allConfigurations.AsParallel().Where(x =>
        x.artifact_cleanup_type != null && x.build_cleanup_type != null &&
        x.updated_date > DateTime.UtcNow.AddDays(-7)
    ).ToList();

    using (var ctx = new LocalEntities())
    {
        Debug.WriteLine("Context: " + sw.Elapsed);
        var allBuilds = ctx.Builds;
        var ruleResult = new List<Notification>();
        foreach (var configuration in items)
        {
            // all builds for current configuration
            var configurationBuilds = allBuilds.Where(x => x.configuration_id == configuration.configuration_id)
                .OrderByDescending(z => z.build_date);
            Debug.WriteLine("Filter conf builds: " + sw.Elapsed);

            // Since I don't know which builds/artifacts have been cleaned up, calculate it manually
            if (configuration.build_cleanup_count != null)
            {
                var buildCleanupCount = "30"; // default
                if (configuration.build_cleanup_type.Equals("ReserveBuildsByDays"))
                {
                    var buildLastCleanupDate = DateTime.UtcNow.AddDays(-int.Parse(buildCleanupCount));
                    configurationBuilds = configurationBuilds.Where(x => x.build_date > buildLastCleanupDate)
                        .OrderByDescending(z => z.build_date);
                }
                if (configuration.build_cleanup_type.Equals("ReserveBuildsByCount"))
                {
                    var buildLastCleanupCount = int.Parse(buildCleanupCount);
                    configurationBuilds =
                        configurationBuilds.Take(buildLastCleanupCount).OrderByDescending(z => z.build_date);
                }
            }

            if (configuration.artifact_cleanup_count != null)
            {
                // skipped, similar to previous block
            }
            Debug.WriteLine("Done cleanup: " + sw.Elapsed);

            const int maxDiscAllocationPerConfiguration = 1000000000; // 1GB
            // Sum all disc usage per configuration
            var confDiscSizePerConfiguration = configurationBuilds
                .GroupBy(c => new { c.configuration_id })
                .Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration))
                .Select(groupedBuilds =>
                    new
                    {
                        configurationId = groupedBuilds.FirstOrDefault().configuration_id,
                        configurationPath = groupedBuilds.FirstOrDefault().configuration_path,
                        Total = groupedBuilds.Sum(c => c.artifact_dir_size),
                        Average = groupedBuilds.Average(c => c.artifact_dir_size)
                    }).ToList();
            Debug.WriteLine("Done db query: " + sw.Elapsed);

            ruleResult.AddRange(confDiscSizePerConfiguration.Select(iter => new Notification
            {
                ConfigurationId = iter.configurationId,
                CreatedDate = DateTime.UtcNow,
                RuleType = (int) RulesEnum.TooMuchDisc,
                ConfigrationPath = iter.configurationPath
            }));
            Debug.WriteLine("Finished loop: " + sw.Elapsed);
        }

        // find owners and insert...
    }
}
This does exactly what I want, but I am wondering whether I could make it any faster. Currently I see:
Context: 00:00:00.0609067
// first round
Filter conf builds: 00:00:00.0636291
Done cleanup: 00:00:00.0644505
Done db query: 00:00:00.3050122
Finished loop: 00:00:00.3062711
// avg round
Filter conf builds: 00:00:00.0001707
Done cleanup: 00:00:00.0006343
Done db query: 00:00:00.0760567
Finished loop: 00:00:00.0773370
The SQL generated by .ToList() looks very messy. (Everything that is used in a WHERE is covered by an index in the DB.)
I am testing with 200 configurations, so this adds up to 00:00:18.6326722. I have a total of ~8k items that need to get processed daily (so the whole routine takes more than 10 minutes to complete).
I have been randomly googling around the internet, and it seems that Entity Framework is not very good with parallel processing. Knowing that, I still decided to give the async/await approach a try (first time I tried it, so sorry for any nonsense).
Basically, if I move all the processing out into a separate method, like:
foreach (var configuration in items)
{
    var confDiscSizePerConfiguration = await GetData(configuration, allBuilds);
    ruleResult.AddRange(confDiscSizePerConfiguration.Select(iter => new Notification
    {
        // ... skipped
    }
And:
private async Task<List<Tmp>> GetData(Configuration configuration, IQueryable<Build> allBuilds)
{
    var configurationBuilds = allBuilds.Where(x => x.configuration_id == configuration.configuration_id)
        .OrderByDescending(z => z.build_date);
    // ..skipped

    var confDiscSizePerConfiguration = configurationBuilds
        .GroupBy(c => new { c.configuration_id })
        .Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration))
        .Select(groupedBuilds =>
            new Tmp
            {
                ConfigurationId = groupedBuilds.FirstOrDefault().configuration_id,
                ConfigurationPath = groupedBuilds.FirstOrDefault().configuration_path,
                Total = groupedBuilds.Sum(c => c.artifact_dir_size),
                Average = groupedBuilds.Average(c => c.artifact_dir_size)
            }).ToListAsync();
    return await confDiscSizePerConfiguration;
}
This, for some reason, drops the execution time for 200 items from 18 to 13 seconds. Anyway, from what I understand, since I am awaiting each .ToListAsync(), it is still processed in sequence, is that correct?
So the "can't process in parallel" claim starts to show when I replace the foreach (var configuration in items) with Parallel.ForEach(items, async configuration =>. This change results in:
A second operation started on this context before a previous
asynchronous operation completed. Use 'await' to ensure that any
asynchronous operations have completed before calling another method
on this context. Any instance members are not guaranteed to be thread
safe.
It was a bit confusing to me at first, as I await practically in every place where the compiler allows it, but possibly the data gets seeded too fast.
I tried to overcome this by being less greedy and added new ParallelOptions { MaxDegreeOfParallelism = 4 } to that parallel loop; my peasant assumption was that the default connection pool size is 100 and all I want to use is 4, which should be plenty. But it still fails.
I have also tried to create new DbContexts inside the GetData method, but it still fails. If I remember correctly (can't test now), I got:
Underlying connection failed to open
What possibilities are there to make this routine go faster?

Before going parallel, it is worth optimizing the query itself. Here are some suggestions that might improve your times:
1) Use Key when working with GroupBy. This might solve the issue of the complex and nested SQL query, as that way you instruct LINQ to use the same keys defined in GROUP BY and not to create a sub-select.
var confDiscSizePerConfiguration = configurationBuilds
    .GroupBy(c => new { ConfigurationId = c.configuration_id, ConfigurationPath = c.configuration_path })
    .Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration))
    .Select(groupedBuilds =>
        new
        {
            configurationId = groupedBuilds.Key.ConfigurationId,
            configurationPath = groupedBuilds.Key.ConfigurationPath,
            Total = groupedBuilds.Sum(c => c.artifact_dir_size),
            Average = groupedBuilds.Average(c => c.artifact_dir_size)
        })
    .ToList();
2) It seems that you are bitten by the N+1 problem. In simple words: you execute one SQL query to get all configurations and N more to get the build information. In total that would be ~8k small queries, where 2 bigger queries would suffice. If memory use is not a constraint, fetch all build data into memory and optimize for fast lookup using ToLookup.
var allBuilds = ctx.Builds.ToLookup(x => x.configuration_id);
Later you can lookup builds by:
var configurationBuilds = allBuilds[configuration.configuration_id].OrderByDescending(z => z.build_date);
3) You are doing OrderBy on configurationBuilds multiple times. Filtering does not affect record order, so you can safely remove extra calls to OrderBy:
...
configurationBuilds = configurationBuilds.Where(x => x.build_date > buildLastCleanupDate);
...
configurationBuilds = configurationBuilds.Take(buildLastCleanupCount);
...
4) There is no point in doing GroupBy, as the builds are already filtered down to a single configuration. See the sketch below.
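Under that assumption, the grouping collapses into a plain aggregate; a minimal sketch using the names from the question's loop:
// configurationBuilds already contains only this configuration's builds,
// so sum directly instead of grouping.
var total = configurationBuilds.Sum(z => z.artifact_dir_size);
if (total > maxDiscAllocationPerConfiguration)
{
    ruleResult.Add(new Notification
    {
        ConfigurationId = configuration.configuration_id,
        ConfigrationPath = configuration.configuration_path,
        CreatedDate = DateTime.UtcNow,
        RuleType = (int)RulesEnum.TooMuchDisc
    });
}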
UPDATE:
I took it one step further and created code that retrieves the same results as your provided code with a single request. It should be more performant and use less memory.
private void CalculateExtendedDiskUsage()
{
    using (var ctx = new LocalEntities())
    {
        // Assumed to be defined like this; they were referenced but not declared in the original snippet:
        const int maxDiscAllocationPerConfiguration = 1000000000; // 1GB
        const int buildCleanupCount = 30;
        var buildLastCleanupDate = DateTime.UtcNow.AddDays(-buildCleanupCount);

        var ruleResult = ctx.Configurations
            .Where(x => x.build_cleanup_count != null &&
                (
                    (x.build_cleanup_type == "ReserveBuildsByDays" &&
                     ctx.Builds.Where(y => y.configuration_id == x.configuration_id)
                               .Where(y => y.build_date > buildLastCleanupDate)
                               .Sum(y => y.artifact_dir_size) > maxDiscAllocationPerConfiguration) ||
                    (x.build_cleanup_type == "ReserveBuildsByCount" &&
                     ctx.Builds.Where(y => y.configuration_id == x.configuration_id)
                               .OrderByDescending(y => y.build_date)
                               .Take(buildCleanupCount)
                               .Sum(y => y.artifact_dir_size) > maxDiscAllocationPerConfiguration)
                )
            )
            .Select(x => new Notification
            {
                ConfigurationId = x.configuration_id,
                ConfigrationPath = x.configuration_path,
                CreatedDate = DateTime.UtcNow,
                RuleType = (int)RulesEnum.TooMuchDisc
            })
            .ToList();
    }
}

If you are going to go that route, first create a new context for every Parallel.ForEach iteration. But you really need to write a query that gets all the needed data in one trip. To speed up EF you can also disable change tracking or proxy creation on the context when you are only reading data.
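For illustration, a minimal sketch assuming the LocalEntities context and Build entity from the question; AutoDetectChangesEnabled, ProxyCreationEnabled and AsNoTracking() are the standard EF6 switches for read-only work, and configurationId is a hypothetical local variable:
// Sketch: one context per unit of work, configured for read-only querying.
using (var ctx = new LocalEntities())
{
    ctx.Configuration.AutoDetectChangesEnabled = false; // skip change-tracking bookkeeping
    ctx.Configuration.ProxyCreationEnabled = false;     // plain POCOs instead of proxies

    var builds = ctx.Builds
        .AsNoTracking() // results are not attached to the context
        .Where(b => b.configuration_id == configurationId)
        .ToList();
}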

There are a lot of places for optimizations...
There are places where you should put .ToArray() to avoid asking the server multiple times...
I did a lot of refactoring, but I'm unable to check it due to lack of more information.
Maybe this can lead you to a better solution...
private void CalculateExtendedDiskUsage(IEnumerable<Configuration> allConfigurations)
{
    var sw = new Stopwatch();
    sw.Start();

    using (var ctx = new LocalEntities())
    {
        Debug.WriteLine("Context: " + sw.Elapsed);
        var allBuilds = ctx.Builds;
        var ruleResult = GetRulesResult(sw, allConfigurations, allBuilds); // Clean Code!!!
        // find owners and insert...
    }
}

private static IEnumerable<Notification> GetRulesResult(Stopwatch sw, IEnumerable<Configuration> allConfigurations, IQueryable<Build> allBuilds)
{
    // Lets take only confs that have been updated within last 7 days
    var ruleResult = allConfigurations
        .AsParallel() // Check if you really need this right here...
        .Where(IsConfigElegible) // Clean Code!!!
        .SelectMany(x => CreateNotifications(sw, allBuilds, x))
        .ToArray();
    Debug.WriteLine("Finished loop: " + sw.Elapsed);
    return ruleResult;
}

private static bool IsConfigElegible(Configuration x)
{
    return x.artifact_cleanup_type != null &&
           x.build_cleanup_type != null &&
           x.updated_date > DateTime.UtcNow.AddDays(-7);
}

private static IEnumerable<Notification> CreateNotifications(Stopwatch sw, IEnumerable<Build> allBuilds, Configuration configuration)
{
    // all builds for current configuration
    var configurationBuilds = allBuilds
        .Where(x => x.configuration_id == configuration.configuration_id);
    // .OrderByDescending(z => z.build_date); <<< You should order only when needed (mostly at the end)
    Debug.WriteLine("Filter conf builds: " + sw.Elapsed);

    configurationBuilds = BuildCleanup(configuration, configurationBuilds); // Clean Code!!!
    configurationBuilds = ArtifactCleanup(configuration, configurationBuilds); // Clean Code!!!
    Debug.WriteLine("Done cleanup: " + sw.Elapsed);

    const int maxDiscAllocationPerConfiguration = 1000000000; // 1GB
    // Sum all disc usage per configuration
    var confDiscSizePerConfiguration = configurationBuilds
        .OrderByDescending(z => z.build_date) // I think that you can put this even later (or not have it at all)
        .GroupBy(c => c.configuration_id) // No need to create a new object, just use the property
        .Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration))
        .Select(CreateSumPerConfiguration);
    Debug.WriteLine("Done db query: " + sw.Elapsed);

    // Extracting to a variable to be able to return it as the function result
    var notifications = confDiscSizePerConfiguration
        .Select(CreateNotification);
    return notifications;
}

private static IEnumerable<Build> BuildCleanup(Configuration configuration, IEnumerable<Build> builds)
{
    // Since I don't know which builds/artifacts have been cleaned up, calculate it manually
    if (configuration.build_cleanup_count == null) return builds;

    const int buildCleanupCount = 30; // Why 'string' if you always need it as an integer?
    builds = GetDiscartBelow(configuration, buildCleanupCount, builds); // Clean Code (almost)
    builds = GetDiscartAbove(configuration, buildCleanupCount, builds); // Clean Code (almost)
    return builds;
}

private static IEnumerable<Build> ArtifactCleanup(Configuration configuration, IEnumerable<Build> configurationBuilds)
{
    if (configuration.artifact_cleanup_count != null)
    {
        // skipped, similar to previous block
    }
    return configurationBuilds;
}

private static SumPerConfiguration CreateSumPerConfiguration(IGrouping<object, Build> groupedBuilds)
{
    var build = groupedBuilds.First();
    return new SumPerConfiguration
    {
        configurationId = build.configuration_id,
        configurationPath = build.configuration_path,
        Total = groupedBuilds.Sum(c => c.artifact_dir_size),
        Average = groupedBuilds.Average(c => c.artifact_dir_size)
    };
}

private static IEnumerable<Build> GetDiscartBelow(Configuration configuration,
                                                  int buildCleanupCount,
                                                  IEnumerable<Build> configurationBuilds)
{
    if (!configuration.build_cleanup_type.Equals("ReserveBuildsByDays"))
        return configurationBuilds;

    var buildLastCleanupDate = DateTime.UtcNow.AddDays(-buildCleanupCount);
    var result = configurationBuilds
        .Where(x => x.build_date > buildLastCleanupDate);
    return result;
}

private static IEnumerable<Build> GetDiscartAbove(Configuration configuration,
                                                  int buildLastCleanupCount,
                                                  IEnumerable<Build> configurationBuilds)
{
    if (!configuration.build_cleanup_type.Equals("ReserveBuildsByCount"))
        return configurationBuilds;

    var result = configurationBuilds
        .Take(buildLastCleanupCount);
    return result;
}

private static Notification CreateNotification(SumPerConfiguration iter)
{
    return new Notification
    {
        ConfigurationId = iter.configurationId,
        CreatedDate = DateTime.UtcNow,
        RuleType = (int)RulesEnum.TooMuchDisc,
        ConfigrationPath = iter.configurationPath
    };
}

internal class SumPerConfiguration
{
    public object configurationId { get; set; } //
    public object configurationPath { get; set; } // I did use 'object' because I don't know your type data
    public int Total { get; set; }
    public double Average { get; set; }
}

Related

Can only enumerate once over IEnumerable

Given is the following code (an xUnit test):
[Fact]
public void SetFilePathTest()
{
    // Arrange
    IBlobRepository blobRepository = null;
    IEnumerable<Photo> photos = new List<Photo>()
    {
        new Photo()
        {
            File = "1.jpg"
        },
        new Photo()
        {
            File = "1.jpg"
        }
    };
    IEnumerable<CloudBlockBlob> blobs = new List<CloudBlockBlob>()
    {
        new CloudBlockBlob(new Uri("https://blabla.net/media/photos/1.jpg")),
        new CloudBlockBlob(new Uri("https://blabla.net/media/photos/2.jpg"))
    };

    // Act
    photos = blobRepository.SetFilePath2(photos, blobs);

    // Assert
    Assert.Equal(2, photos.Count());
    Assert.Equal(2, photos.Count());
}
Here is the SetFilePath2 method:
public static IEnumerable<T> SetFilePath2<T>(this IBlobRepository blobRepository, IEnumerable<T> entities, IEnumerable<CloudBlockBlob> blobs) where T : BlobEntityBase
{
    var firstBlob = blobs.FirstOrDefault();
    if (firstBlob is null == false)
    {
        var prefixLength = firstBlob.Parent.Prefix.Length;
        return entities
            .Join(blobs, x => x.File, y => y.Name.Substring(prefixLength), (entity, blob) => (entity, blob))
            .Select(x =>
            {
                x.entity.File = x.blob.Uri.AbsoluteUri;
                return x.entity;
            });
    }
    else
    {
        return Enumerable.Empty<T>();
    }
}
As you can see, I assert the very same thing twice, but only the first assert succeeds. When I step through with the debugger I can only enumerate the collection once; at the second Assert it yields no items.
Can anyone explain to me why that happens? I really don't see any problem with this code that explains this behavior.
Every time you call .Count() you basically call blobRepository.SetFilePath2(photos, blobs).Count() again: the query is deferred, so each enumeration re-runs the Join and the Select, and the Select modifies the entity. After the first enumeration, File holds the blob's absolute URI, so the join key no longer matches and the second pass yields nothing. I would recommend projecting to a new object in the Select statement if you don't mean to alter the original value. That's why you get different results.
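Alternatively, a minimal fix on the test side is to materialize the deferred query once, so both asserts read the same snapshot (a sketch):
// Act: ToList() forces a single enumeration; later Counts read the cached list.
var result = blobRepository.SetFilePath2(photos, blobs).ToList();

// Assert
Assert.Equal(2, result.Count);
Assert.Equal(2, result.Count);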

An unhandled exception of type 'System.StackOverflowException' occurred in EntityFramework dll

I get this exception when I try to process 270k records; it fails at around 12k. Can someone explain what I am missing?
The database is SQL Server and I am using EF 6. I am using PredicateBuilder to build my where clause.
The idea being:
select * from table where ((a = 'v1' and b = 'v2') or (a = 'v11' and b = 'v21') or (a = 'v12' and b = 'v22') ..)
I don't see anywhere that I still hold a reference to my object that represents the EF class. I am creating a POCO for the result I want to send back to the view.
Any ideas?
Also, I am using a CommandTimeout of 10000, and at the point where it fails, when I run the query with the same parameters in SQL Management Studio, it returns 400 rows.
When I ran the profiler, I noticed that a few seconds before I got the error, memory usage shot up to 1 GB+.
Thanks
public List<SearchResult> SearchDocuments(List<SearchCriteria> searchCriterias)
{
    List<SearchResult> results = new List<SearchResult>();
    var fieldSettings = GetData(); // make a call to database to get this data

    using (var context = CreateContext())
    {
        var theQuery = PredicateBuilder.False<ViewInSqlDatabase>();
        int skipCount = 0;
        const int recordsToProcessInOneBatch = 100;
        while (searchCriterias.Skip(skipCount).Any())
        {
            var searchCriteriasBatched = searchCriterias.Skip(skipCount).Take(recordsToProcessInOneBatch);
            foreach (var searchCriteria in searchCriteriasBatched)
            {
                var queryBuilder = PredicateBuilder.True<ViewInSqlDatabase>();
                // theQuery
                if (searchCriteria.State.HasValue)
                    queryBuilder = queryBuilder.And(a => a.State == searchCriteria.State.Value);
                if (!string.IsNullOrWhiteSpace(searchCriteria.StateFullName))
                    queryBuilder = queryBuilder.And(a => a.StateName.Equals(searchCriteria.StateFullName, StringComparison.CurrentCultureIgnoreCase));
                if (searchCriteria.County.HasValue)
                    queryBuilder = queryBuilder.And(a => a.County == searchCriteria.County.Value);
                if (!string.IsNullOrWhiteSpace(searchCriteria.CountyFullName))
                    queryBuilder = queryBuilder.And(a => a.CountyName.Equals(searchCriteria.CountyFullName, StringComparison.CurrentCultureIgnoreCase));
                if (!string.IsNullOrWhiteSpace(searchCriteria.Township))
                    queryBuilder = queryBuilder.And(a => a.Township == searchCriteria.Township);
                // and so on...for another 10 parameters

                theQuery = theQuery.Or(queryBuilder.Expand());
            }

            // this is where I get the error after 12k to 15k criterias have been processed
            var searchQuery = context.ViewInSqlDatabase.AsExpandable().Where(theQuery).Distinct().ToList();
            foreach (var query in searchQuery)
            {
                var newResultItem = SearchResult.Create(query, fieldSettings); // POCO object with no relation to database
                if (!results.Contains(newResultItem))
                    results.Add(newResultItem);
            }
            skipCount += recordsToProcessInOneBatch;
        }
    }
    return results.Distinct().OrderBy(a => a.State).ThenBy(a => a.County).ThenBy(a => a.Township).ToList();
}
Fourat is correct that you can modify your query to context.SearchResults.Where(x => (x.a == "v1" && x.b == "v2") || (x.a == "v11" && x.b == "v21") || (x.a == "v12" && x.b == "v22")).Distinct().OrderBy(a => a.State).ThenBy(a => a.County).ThenBy(a => a.Township).ToList(); What this does is make the database do the heavy lifting for you.
I would also suggest that you use lazy evaluation instead of forcing it into a list, if you can.
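One observation about the posted code (my note, not part of the original answer): theQuery is created once outside the while loop and never reset, so each batch appends another hundred Or clauses to the same expression tree; by 12k criteria the tree is deep enough that EF's recursive expression visitor can blow the stack. A sketch of the fix under that assumption:
while (searchCriterias.Skip(skipCount).Any())
{
    // Moved inside the loop: each batch gets a fresh predicate, so the
    // expression tree stays ~100 clauses wide instead of growing without bound.
    var theQuery = PredicateBuilder.False<ViewInSqlDatabase>();

    foreach (var searchCriteria in searchCriterias.Skip(skipCount).Take(recordsToProcessInOneBatch))
    {
        var queryBuilder = PredicateBuilder.True<ViewInSqlDatabase>();
        // ... build queryBuilder from searchCriteria as before ...
        theQuery = theQuery.Or(queryBuilder.Expand());
    }

    var batchResults = context.ViewInSqlDatabase.AsExpandable().Where(theQuery).Distinct().ToList();
    // ... map batchResults to SearchResult as before ...
    skipCount += recordsToProcessInOneBatch;
}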

Entity Framework throwing exception with Extension Method

I have the following code in my Controller:
public List<DockDoorViewModel> GetDoorViewModel()
{
    List<DockDoorViewModel> doors = new List<DockDoorViewModel>();
    for (int i = 1; i < 11; i++)
    {
        // This is where the Stack Trace is pointing to.
        DockDoorViewModel door = db.vwDockDoorDatas
            .Where(x => x.DockNo == i)
            .Select(x => x.ToDockDoorViewModel())
            .FirstOrDefault();
        if (door == null)
        {
            door = new DockDoorViewModel(i);
        }
        else
        {
            door.Items = db.vwDockDoorDatas
                .Where(x => x.DockNo == i)
                .Select(x => x.ToDockDoorItem())
                .ToList();
        }
        doors.Add(door);
    }
    return doors;
}
I am getting this exception when I try to run the web app:
Exception Details: System.NotSupportedException: LINQ to Entities does not recognize the method 'DockDoorMonitor.Models.DockDoorViewModel ToDockDoorViewModel(DockDoorMonitor.Models.vwDockDoorData)' method, and this method cannot be translated into a store expression.
Here is the extension method:
public static class vwDockDoorDataExtensions
{
    public static DockDoorViewModel ToDockDoorViewModel(this vwDockDoorData x)
    {
        DockDoorViewModel vm = null;
        if (x != null)
        {
            vm = new DockDoorViewModel()
            {
                ID = x.ID,
                DockNo = x.DockNo,
                loadType = x.loadType,
                LoadDescription = x.LoadDescription,
                Name = x.Name,
                LocationCode = x.LocationCode,
                SACode = x.SACode
            };
        }
        return vm;
    }

    public static DockDoorItem ToDockDoorItem(this vwDockDoorData x)
    {
        DockDoorItem vm = null;
        if (x != null)
        {
            vm = new DockDoorItem()
            {
                ID = x.ItemNo,
                Description = x.Description,
                Quantity = x.Quantity,
                UnitOfMeasure = x.UnitOfMeasure
            };
        }
        return vm;
    }
}
I've done this kind of thing before, so I'm not seeing what I am doing wrong. This is my first time with an MVC5 and EF6 application.
The error message tells you everything you need to know, really: EF can't translate your extension methods to SQL and therefore throws an exception. You need to convert your query from LINQ to Entities to LINQ to Objects, which can be done with a simple call to AsEnumerable(), e.g.
DockDoorViewModel door = db.vwDockDoorDatas.Where(x => x.DockNo == i)
    .AsEnumerable()
    .Select(x => x.ToDockDoorViewModel())
    .FirstOrDefault();
Effectively, what this does is create a hybrid query where everything before the AsEnumerable() is translated and executed as SQL, and the remainder is executed client-side, in memory.
As for your performance issues: looking at your query again, you are unnecessarily pulling across a lot of records. You are only after the first one, so why not just pull that one over, i.e.
vwDockDoorData entity = db.vwDockDoorDatas.Where(x => x.DockNo == i)
    .FirstOrDefault();
DockDoorViewModel door = entity != null ? entity.ToDockDoorViewModel() : null;
A further improvement on that would be to simply filter the records before you iterate them (given you have a start/end range), e.g.
var doorDatas = db.vwDockDoorDatas.Where(x => x.DockNo >= 1 && x.DockNo <= 10)
    .ToList();
for (int i = 1; i < 11; i++)
{
    var data = doorDatas.Where(x => x.DockNo == i).ToList();
    DockDoorViewModel door = data.Select(x => x.ToDockDoorViewModel()).FirstOrDefault();
    if (door == null)
    {
        door = new DockDoorViewModel(i);
    }
    else
    {
        door.Items = data.Select(x => x.ToDockDoorItem()).ToList();
    }
    doors.Add(door);
}
The above would only require a single trip to the DB.
You will have to load the data from SQL Server before using your mapping extension methods. You can do this (for example) with the following command:
door.Items = db.vwDockDoorDatas
    .Where(x => x.DockNo == i)
    .ToList() // Possibly use AsEnumerable() here instead, as James says
    .Select(x => x.ToDockDoorItem())
    .ToList();

jquery datatables server side filtering causes EF to timeout?

I have the following method, which filters 2 million records, but most of the time, if I want to get the last page, it causes Entity Framework to time out. Is there any way I could improve the following code so that it runs faster?
public virtual ActionResult GetData(DataTablesParamsModel param)
{
    try
    {
        int totalRowCount = 0;

        // Generate Data
        var allRecords = _echoMediaRepository.GetMediaList();

        // Apply search criteria to data
        var predicate = PredicateBuilder.True<MediaChannelModel>();
        if (!String.IsNullOrEmpty(param.sSearch))
        {
            var wherePredicate = PredicateBuilder.False<MediaChannelModel>();
            int i;
            if (int.TryParse(param.sSearch, out i))
            {
                wherePredicate = wherePredicate.Or(m => m.ID == i);
            }
            wherePredicate = wherePredicate.Or(m => m.Name.Contains(param.sSearch));
            predicate = predicate.And(wherePredicate);
        }

        if (param.iMediaGroupID > 0)
        {
            var wherePredicate = PredicateBuilder.False<MediaChannelModel>();
            var mediaTypes = new NeptuneRepository<Lookup_MediaTypes>();
            var mediaGroups = mediaTypes.FindWhere(m => m.MediaGroupID == param.iMediaGroupID)
                .Select(m => m.Name)
                .ToArray();
            wherePredicate = wherePredicate.Or(m => mediaGroups.Contains(m.NeptuneMediaType) || mediaGroups.Contains(m.MediaType));
            predicate = predicate.And(wherePredicate);
        }

        var filteredRecord = allRecords.Where(predicate);
        var columnCriteria = param.sColumns.Split(',').ToList();
        if (!String.IsNullOrEmpty(columnCriteria[param.iSortCol_0]))
        {
            filteredRecord = filteredRecord.ApplyOrder(
                columnCriteria[param.iSortCol_0],
                param.sSortDir_0 == "asc" ? QuerySortOrder.OrderBy : QuerySortOrder.OrderByDescending);
        }

        totalRowCount = filteredRecord.Count();
        var finalQuery = filteredRecord.Skip(param.iDisplayStart).Take(param.iDisplayLength).ToList();

        // Create response
        return Json(new
        {
            sEcho = param.sEcho,
            aaData = finalQuery,
            iTotalRecords = allRecords.Count(),
            iTotalDisplayRecords = totalRowCount
        }, JsonRequestBehavior.AllowGet);
    }
    catch (Exception ex)
    {
        Logger.Error(ex);
        throw;
    }
}
Your code and queries look optimized, so the problem should be a lack of indexes in the database degrading the performance of your OrderBy (which Skip relies on).
Using test code very similar to yours, I've done some tests on a local test DB with a table of 5 million rows (with XML-type columns all filled) and, as expected, queries ordered by indexed columns were really fast, but ordered by unindexed columns they could take a very, very long time.
I recommend you analyse the columns most commonly used in the dynamic Where and Order functions and do some performance tests after creating the corresponding indexes.
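To illustrate (a sketch only; it assumes a code-first EF 6.1+ mapping, where the [Index] data annotation is the usual way to declare such indexes):
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class MediaChannelModel
{
    public int ID { get; set; }

    // Indexed because it is a common dynamic Where/OrderBy target;
    // string index columns need a bounded length.
    [Index("IX_MediaChannels_Name")]
    [StringLength(256)]
    public string Name { get; set; }
}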

c# RavenDB embedded optimize

I have a database (RavenDB) which needs to be able to handle 300 queries (full-text search) every 10 seconds. To increase performance I split the database up, so I have multiple documentStores.
My code:
var watch = Stopwatch.StartNew();
int taskcnt = 0;
int sum = 0;
for (int i = 0; i < 11; i++)
{
    Parallel.For(0, 7, new Action<int>((x) =>
    {
        for (int docomentStore = 0; docomentStore < 5; docomentStore++)
        {
            var stopWatch = Stopwatch.StartNew();
            Task<IList<eBayItem>> task = new Task<IList<eBayItem>>(Database.ExecuteQuery, new Filter()
            {
                Store = "test" + docomentStore,
                MaxPrice = 600,
                MinPrice = 200,
                BIN = true,
                Keywords = new List<string>() { "Canon", "MP", "Black" },
                ExcludedKeywords = new List<string>() { "G1", "T3" }
            });
            task.ContinueWith((list) =>
            {
                stopWatch.Stop();
                sum += stopWatch.Elapsed.Milliseconds;
                taskcnt++;
                if (taskcnt == 300)
                {
                    watch.Stop();
                    Console.WriteLine("Average time: " + (sum / (float)300).ToString());
                    Console.WriteLine("Total time: " + watch.Elapsed.ToString() + "ms");
                }
            });
            task.Start();
        }
    }));
    Thread.Sleep(1000);
}
Average query time: 514,13 ms
Total time: 00:01:29.9108016
The code where I query RavenDB:
public static IList<eBayItem> ExecuteQuery(object Filter)
{
    IList<eBayItem> items;
    Filter filter = (Filter)Filter;
    if (int.Parse(filter.Store.ToCharArray().Last().ToString()) > 4)
    {
        Console.WriteLine(filter.Store);
        return null;
    }
    using (var session = Shards[filter.Store].OpenSession())
    {
        var query = session.Query<eBayItem, eBayItemIndexer>().Where(y => y.Price <= filter.MaxPrice && y.Price >= filter.MinPrice);
        query = filter.Keywords.ToArray()
            .Aggregate(query, (q, term) =>
                q.Search(xx => xx.Title, term, options: SearchOptions.And));
        if (filter.ExcludedKeywords.Count > 0)
        {
            query = filter.ExcludedKeywords.ToArray().Aggregate(query, (q, exterm) =>
                q.Search(it => it.Title, exterm, options: SearchOptions.Not));
        }
        items = query.ToList<eBayItem>();
    }
    return items;
}
And the initialization of RavenDB:
static Dictionary<string, EmbeddableDocumentStore> Shards = new Dictionary<string, EmbeddableDocumentStore>();

public static void Connect()
{
    Shards.Add("test0", new EmbeddableDocumentStore() { DataDirectory = "test.db" });
    Shards.Add("test1", new EmbeddableDocumentStore() { DataDirectory = "test1.db" });
    Shards.Add("test2", new EmbeddableDocumentStore() { DataDirectory = "test2.db" });
    Shards.Add("test3", new EmbeddableDocumentStore() { DataDirectory = "test3.db" });
    Shards.Add("test4", new EmbeddableDocumentStore() { DataDirectory = "test4.db" });
    foreach (string key in Shards.Keys)
    {
        EmbeddableDocumentStore store = Shards[key];
        store.Initialize();
        IndexCreation.CreateIndexes(typeof(eBayItemIndexer).Assembly, store);
    }
}
How can I optimize my code so my total time is lower? Is it a good idea to divide my database up into 5 different ones?
EDIT: The program now has only 1 documentStore instead of 5 (as suggested by Ayende Rahien).
Also, this is the query on its own:
Price_Range:[* TO Dx600] AND Price_Range:[Dx200 TO NULL] AND Title:(Canon) AND Title:(MP) AND Title:(Black) -Title:(G1) -Title:(T3)
No, this isn't good.
Use a single embedded RavenDB. If you need sharding, that involves multiple machines.
In general, RavenDB queries take a few ms each. You need to show what your queries look like (you can call ToString() on them to see that).
Having shards of RavenDB in this manner means that all of them are fighting for CPU and IO.
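For example, a sketch based on the query from the question; the Lucene text shown in the edit above is roughly what this prints:
// Inspecting the generated Lucene query before executing it.
var query = session.Query<eBayItem, eBayItemIndexer>()
    .Where(y => y.Price <= 600 && y.Price >= 200)
    .Search(x => x.Title, "Canon", options: SearchOptions.And);

// Prints something like:
// Price_Range:[* TO Dx600] AND Price_Range:[Dx200 TO NULL] AND Title:(Canon)
Console.WriteLine(query.ToString());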
I know this is an old post, but this was the top search result I got.
I had the same problem: my queries were taking 500 ms. They now take 100 ms after applying the following search practices: http://ravendb.net/docs/article-page/2.5/csharp/client-api/querying/static-indexes/searching
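The gist of that page is to query a static index whose text field is marked as analyzed. A hypothetical sketch of what the question's eBayItemIndexer might look like with that applied (field names taken from the question's code):
using Raven.Abstractions.Indexing;
using Raven.Client.Indexes;

// Static index: Title is analyzed, so Search() does real full-text matching
// against the index instead of exact-term matching.
public class eBayItemIndexer : AbstractIndexCreationTask<eBayItem>
{
    public eBayItemIndexer()
    {
        Map = items => from item in items
                       select new { item.Title, item.Price };
        Index(x => x.Title, FieldIndexing.Analyzed);
    }
}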
