How to Fetch a Lot of Records with EF6

How to Fetch a Lot of Records with EF6 - c#

I need to fetch a lot of records from a SQL Server database with EF6. The problem that its takes a lot of time. The main problem is entity called Series which contains Measurements. There is like 250K of them and each has 2 nested entities called FrontDropPhoto and SideDropPhoto.
[Table("Series")]
public class DbSeries
{
[Key] public Guid SeriesId { get; set; }
public List<DbMeasurement> MeasurementsSeries { get; set; }
}
[Table("Measurements")]
public class DbMeasurement
{
[Key] public Guid MeasurementId { get; set; }
public Guid CurrentSeriesId { get; set; }
public DbSeries CurrentSeries { get; set; }
public Guid? SideDropPhotoId { get; set; }
[ForeignKey("SideDropPhotoId")]
public virtual DbDropPhoto SideDropPhoto { get; set; }
public Guid? FrontDropPhotoId { get; set; }
[ForeignKey("FrontDropPhotoId")]
public virtual DbDropPhoto FrontDropPhoto { get; set; }
}
[Table("DropPhotos")]
public class DbDropPhoto
{
[Key] public Guid PhotoId { get; set; }
}
I've wrote fetch method like this (Most of the properties omitted for clarity):
public async Task<List<DbSeries>> GetSeriesByUserId(Guid dbUserId)
{
using (var context = new DDropContext())
{
try
{
var loadedSeries = await context.Series
.Where(x => x.CurrentUserId == dbUserId)
.Select(x => new
{
x.SeriesId,
}).ToListAsync();
var dbSeries = new List<DbSeries>();
foreach (var series in loadedSeries)
{
var seriesToAdd = new DbSeries
{
SeriesId = series.SeriesId,
};
seriesToAdd.MeasurementsSeries = await GetMeasurements(seriesToAdd);
dbSeries.Add(seriesToAdd);
}
return dbSeries;
}
catch (SqlException e)
{
throw new TimeoutException(e.Message, e);
}
}
}
public async Task<List<DbMeasurement>> GetMeasurements(DbSeries series)
{
using (var context = new DDropContext())
{
var measurementForSeries = await context.Measurements.Where(x => x.CurrentSeriesId == series.SeriesId)
.Select(x => new
{
x.CurrentSeries,
x.CurrentSeriesId,
x.MeasurementId,
})
.ToListAsync();
var dbMeasurementsForAdd = new List<DbMeasurement>();
foreach (var measurement in measurementForSeries)
{
var measurementToAdd = new DbMeasurement
{
CurrentSeries = series,
MeasurementId = measurement.MeasurementId,
FrontDropPhotoId = measurement.FrontDropPhotoId,
FrontDropPhoto = measurement.FrontDropPhotoId.HasValue
? await GetDbDropPhotoById(measurement.FrontDropPhotoId.Value)
: null,
SideDropPhotoId = measurement.SideDropPhotoId,
SideDropPhoto = measurement.SideDropPhotoId.HasValue
? await GetDbDropPhotoById(measurement.SideDropPhotoId.Value)
: null,
};
dbMeasurementsForAdd.Add(measurementToAdd);
}
return dbMeasurementsForAdd;
}
}
private async Task<DbDropPhoto> GetDbDropPhotoById(Guid photoId)
{
using (var context = new DDropContext())
{
var dropPhoto = await context.DropPhotos
.Where(x => x.PhotoId == photoId)
.Select(x => new
{
x.PhotoId,
}).FirstOrDefaultAsync();
if (dropPhoto == null)
{
return null;
}
var dbDropPhoto = new DbDropPhoto
{
PhotoId = dropPhoto.PhotoId,
};
return dbDropPhoto;
}
}
Relationships configured via FluentAPI:
modelBuilder.Entity<DbSeries>()
.HasMany(s => s.MeasurementsSeries)
.WithRequired(g => g.CurrentSeries)
.HasForeignKey(s => s.CurrentSeriesId)
.WillCascadeOnDelete();
modelBuilder.Entity<DbMeasurement>()
.HasOptional(c => c.FrontDropPhoto)
.WithMany()
.HasForeignKey(s => s.FrontDropPhotoId);
modelBuilder.Entity<DbMeasurement>()
.HasOptional(c => c.SideDropPhoto)
.WithMany()
.HasForeignKey(s => s.SideDropPhotoId);
I need all of this data to populate WPF DataGrid. The obvious solution is to add paging to this DataGrid. This solution is tempting but it will break the logic of my application badly. I want to create plots at runtime using this data, so I need all of it, not just some parts. I've tried to optimize it a bit by make every method to use async await, but it wasn't helpful enough. I've tried to add
.Configuration.AutoDetectChangesEnabled = false;
for each context, but loading time is still really long. How to approach this problem?

Other than the very large amount of data that you are intent on returning, the main problem is that the way your code is structured means that for each of the 250,000 Series you are performing another trip to the database to get the Measurements for the Series and a further 2 trips to get the front/side DropPhotos for each Measurement. Apart from the round-trip time for the 750,000 calls this completely avoids taking advantage of SQL's set-based performance optimisations.
Try to ensure that EF submits as few queries as possible to return your data, preferably one:
var loadedSeries = await context.Series
.Where(x => x.CurrentUserId == dbUserId)
.Select(x => new DbSeries
{
SeriesId = x.SeriesId,
MeasurementsSeries = x.MeasurementsSeries.Select(ms => new DbMeasurement
{
MeasurementId = ms.MeasurementId,
FrontDropPhotoId = ms.FrontDropPhotoId,
FrontDropPhoto = new DbDropPhoto
{
PhotoId = ms.FrontDropPhotoId
},
SideDropPhotoId = ms.SideDropPhotoId,
SideDropPhoto = new DbDropPhoto
{
PhotoId = ms.SideDropPhotoId
},
})
}).ToListAsync();

Firstly, async/await will not help you here. It isn't a "go faster" type of operation, it is about accommodating systems that "can be doing something else while this operation is computing". If anything, it makes an operation slower in exchange for making a system more responsive.
My recommendation would be to separate your concerns: On the one hand you want to display detailed data. On the other hand you want to plot an overall graph. Separate these. A user doesn't need to see details for every record at one time, paginating it server-side will greatly reduce the raw amount of data at any one time. Graphs want to see all data, but they don't care about "heavy" details like bitmaps.
The next thing would be to separate your view's model from your domain model (entity). Doing stuff like:
var measurementToAdd = new DbMeasurement
{
CurrentSeries = series,
MeasurementId = measurement.MeasurementId,
FrontDropPhotoId = measurement.FrontDropPhotoId,
FrontDropPhoto = measurement.FrontDropPhotoId.HasValue
? await GetDbDropPhotoById(measurement.FrontDropPhotoId.Value)
: null,
SideDropPhotoId = measurement.SideDropPhotoId,
SideDropPhoto = measurement.SideDropPhotoId.HasValue
? await GetDbDropPhotoById(measurement.SideDropPhotoId.Value)
: null,
};
... is just asking for trouble. Any code that accepts a DbMeasurement should receive a complete, or completable DbMeasurement, not a partially populated entity. It will burn you in the future. Define a view model for the data grid and populate it. This way you clearly differentiate what is an entity model and what is the view's model.
Next, for the data grid, absolutely implement server-side pagination:
public ICollection<MeasurementViewModel> GetMeasurements(int seriesId, int pageNumber, int pageSize)
{
using (var context = new DDropContext())
{
var measurementsForSeries = await context.Measurements
.Where(x => x.CurrentSeriesId == seriesId)
.Select(x => new MeasurementViewModel
{
MeasurementId = x.MeasurementId,
FromDropPhoto = x.FromDropPhoto.ImageData,
SideDropPhoto = x.SideDropPhoto.ImageData
})
.Skip(pageNumber*pageSize)
.Take(pageSize)
.ToList();
return measurementsForSeries;
}
}
This assumes that we want to pull image data for the rows if available. Leverage the navigation properties for related data in the query rather than iterating over results and going back to the database for each and every row.
For the graph plot you can return either the raw integer data or a data structure for just the fields needed rather than relying on the data returned for the grid. It can be pulled for the entire table without having the "heavy" image data. It may seem counter-productive to go to the database when the data might already be loaded once already, but the result is two highly efficient queries rather than one very inefficient query trying to serve two purposes.

Why are you reinventing the wheel and manually loading and constructing your related entities? You’re causing an N+1 selects problem resulting in abhorrent performance. Let EF query for related entities efficiently via .Include
Example:
var results = context.Series
.AsNoTracking()
.Include( s => s.MeasurementSeries )
.ThenInclude( ms => ms.FrontDropPhoto )
.Where( ... )
.ToList(); // should use async
This will speed up execution dramatically though it may still not be quick enough for your requirments if it needs to construct hundreds of thousands to millions of objects, in which case you can retrieve the data in concurrent batches.

Related

How to query IQueryable with Include - ThenInclude?

In .NET Core 2.2 I'm stuck with filtering IQueryable built as:
_context.Ports.Include(p => p.VesselsPorts)
.ThenInclude(p => p.Arrival)
.Include(p => p.VesselsPorts)
.ThenInclude(p => p.Departure)
.OrderBy(p => p.PortLocode);
in many-to-many relation. And the entity models are such as:
public class PortModel
{
[Key]
public string PortLocode { get; set; }
public double? MaxKnownLOA { get; set; }
public double? MaxKnownBreadth { get; set; }
public double? MaxKnownDraught { get; set; }
public virtual ICollection<VesselPort> VesselsPorts { get; set; }
}
public class VesselPort
{
public int IMO { get; set; }
public string PortLocode { get; set; }
public DateTime? Departure { get; set; }
public DateTime? Arrival { get; set; }
public VesselModel VesselModel { get; set; }
public PortModel PortModel { get; set; }
}
Based on this this SO answer I managed to create LINQ like that:
_context.Ports.Include(p => p.VesselsPorts).ThenInclude(p => p.Arrival).OrderBy(p => p.PortLocode)
.Select(
p => new PortModel
{
PortLocode = p.PortLocode,
MaxKnownBreadth = p.MaxKnownBreadth,
MaxKnownDraught = p.MaxKnownDraught,
MaxKnownLOA = p.MaxKnownLOA,
VesselsPorts = p.VesselsPorts.Select(vp => vp.Arrival > DateTime.UtcNow.AddDays(-1)) as ICollection<VesselPort>
}).AsQueryable();
BUT what I need is to find all port records, where:
VesselsPorts.Arrival > DateTime.UtcNow.AddDays(-1) quantity is greater than int x = 5 value (for the example). And I have no clue how to do it :/

Thanks to #GertArnold comment, I ended up with query:
ports = ports.Where(p => p.VesselsPorts.Where(vp => vp.Arrival > DateTime.UtcNow.AddDays(-1)).Count() > x);

When using entity framework people tend to use Include instead of Select to save them some typing. It is seldom wise to do so.
The DbContext holds a ChangeTracker. Every complete row from any table that you fetch during the lifetime of the DbContext is stored in the ChangeTracker, as well as a clone. You get a reference to the copy. (or maybe a reference to the original). If you change properties of the data you got, they are changed in the copy that is in the ChangeTracker. During SaveChanges, the original is compared to the copy, to see if the data must be saved.
So if you are fetching quite a lot of data, and use include, then every fetched items is cloned. This might slow down your queries considerably.
Apart from this cloning, you will probably fetch more properties than you actually plan to use. Database management systems are extremely optimized in combining tables, and searching rows within tables. One of the slower parts is the transfer of the selected data to your local process.
For example, if you have a database with Schools and Students, with the obvious one to many-relation, then every Student will have a foreign key to the School he attends.
So if you ask for School [10] with his 2000 Students, then every Student will have a foreign key value of [10]. If you use Include, then you will be transferring this same value 10 over 2000 times. What a waste of processing power!
In entity framework, when querying data, always use Select to select the properties, and Select only the properties that you actually plan to use. Only use Include if you plan to change the fetched items.
Certainly don't use Include to save you some typing!
Requirement: Give me the Ports with their Vessels
var portsWithTheirVessels = dbContext.Ports
.Where(port => ...) // if you don't want all Ports
.Select(port => new
{
// only select the properties that you want:
PortLocode = port.PortLoCode,
MaxKnownLOA = port.MaxKnownLOA,
MaxKnownBreadth = prot.MaxKnownBreadth,
MaxKnownDraught = ports.MaxKnownDraught,
// The Vessels in this port:
Vessels = port.VesselsPort.Select(vessel => new
{
// again: only the properties that you plan to use
IMO = vessel.IMO,
...
// do not select the foreign key, you already know the value!
// PortLocode = vessle.PortLocode,
})
.ToList(),
});
Entity framework knows your one-to-many relation, and knows that if you use the virtual ICollection that it should do a (Group-)Join.
Some people prefer to do the Group-Join themselves, or they use a version of entity framework that does not support using the ICollection.
var portsWithTheirVessels = dbContext.Ports.GroupJoin(dbContext.VesselPorts,
port => port.PortLocode, // from every Port take the primary key
vessel => vessel.PortLocode, // from every Vessel take the foreign key to Port
// parameter resultSelector: take every Port with its zero or more Vessels to make one new
(port, vesselsInThisPort) => new
{
PortLocode = port.PortLoCode,
...
Vessels = vesselsInThisPort.Select(vessel => new
{
...
})
.ToList(),
});
Alternative:
var portsWithTheirVessels = dbContext.Ports.Select(port => new
{
PortLocode = port.PortLoCode,
...
Vessels = dbContext.VesselPorts.Where(vessel => vessel.PortLocode == port.PortLocode)
.Select(vessel => new
{
...
}
.ToList(),
});
Entity framework will translate this also to a GroupJoin.

How to correctly store a number of results in a DTO?

I have stored procedure attached to a DB which should return results from just a simple search. The query is added to my entity and calls a regular method. The problem I face is storing the results from this procedure to a particular DTO as a list.
Is there any way to effectively store the results from this stored procedure as a list to the DTO?
Below is what I have so far
Controller:
[Produces("application/json")]
[RoutePrefix("api/jobs")]
public class OutputController : ApiController
{
private TestCoastalToolsEntities _output;
public OutputController()
{
_output = new TestCoastalToolsEntities();
_output.Configuration.ProxyCreationEnabled = false;
}
/**Search**/
// POST: api/postsearch
[System.Web.Http.HttpPost, System.Web.Http.Route("postsearch")]
public async Task<IHttpActionResult> PostSearch(SearchInputDTO srequest)
{
OutputDTO<SearchInputDTO> output = new OutputDTO<SearchInputDTO>();
SearchInputDTO SearchInput = null;
var searchString = srequest.SearchValue.ToString();
SearchInput.Results = _output.searchLog2(searchString);
if (_oput != null)
{
output.Success = true;
output.Results = _SearchInput.Results;
var json = new JavaScriptSerializer().Serialize(output);
return Ok(json);
}
return Ok(_ot);
}
}
}
-------------------------------------
Search DTO:
namespace toolPortal.API.Data.DTO
{
public class SearchInputDTO
{
public List<object> Results { get; set; }
public SearchInputDTO(output output) {
this.ID = output.ID;
this.Name = output.Name;
this.Job = output.Job;
this.Start = output.Start;
this.End = output.End;
this.Logs = output.Logs;
}
}
}
The expected result is that the stored procedure runs and stores the list of results to SearchInputResults. From there, those results should be stored in another DTO to be passed off on the return.

With EF you will want to leverage Select() to map the entities to your DTO, though you will need to consider the entire structure of the DTO. For instance, what is the "Logs" data structure going to comprise of? Is it a single string value, a list of strings, or a list of log records?
Using Select() you need to leverage property setters, not a constructor accepting an entity.
So a pattern like this:
public class Entity
{
public string Field { get; set; }
}
public class Dto
{
public string Field { get; set; }
}
var dtos = context.Entities
.Where(x => x.IsActive)
.Select(x => new Dto
{
Field = x.Field
})
.ToList();
Looking at your example with the constructor:
public class Dto
{
public string Field { get; private set; }
public Dto(Entity entity)
{
Field = entity.Field;
}
}
var dtos = context.Entities
.Where(x => x.IsActive)
.Select(x => new Dto(x))
.ToList();
This doesn't work with EF & Select. EF can map to an object, but only via properties and a parameterless constructor. There is a hack around this to be aware of, but avoid if you do see it:
var dtos = context.Entities
.Where(x => x.IsActive)
.ToList()
.Select(x => new Dto(x))
.ToList();
With the extra ToList() before the select, the call will work because EF will execute the query and return the list of entities, then the Select() will be performed as a Linq2Object query. The reason you should avoid this is because EF will select all properties from the entity, where we should only pull back the properties we care about. It's also easy to fall into a lazy-load performance trap if your Dto constructor population starts iterating over related entities. Using Select to load just the fields you need from an entity and any related entities allows EF to build an efficient query for just the data needed without any lazy load traps.
Using AutoMapper you can simplify this by setting up the mapping from entity to DTO then leveraging ProjectTo<Dto>().
So, if you want a DTO to represent the results (such as a success flag, error message) with a collection of the results if successful:
[Serializable]
// Our results container.
public class SearchResultsDTO
{
public bool IsSuccessful { get; private set; } = false;
public string ErrorMessage { get; private set; }
public ICollection<SearchResultDTO> Results { get; private set; } = new List<SearchResultDTO>();
private SearchResultsDTO() {}
public static SearchResultsDTO Success(ICollection<SearchResultDTO> results)
{
var results = new SearchResultsDTO
{
IsSuccessful = true,
Results = results
};
return results;
}
public static SearchResultsDTO Failure(string errorMessage)
{
var results = new SearchResultsDTO
{
ErrorMessage = errorMessage
};
return results;
}
}
[Serializable]
public class SearchResultDTO
{
public int ID {get; set;}
public string Name {get; set;}
public string Job {get; set;}
public DateTime Start {get; set;}
public DateTime End {get; set;}
public ICollection<string> Logs {get; set;} = new List<string>();
}
then to populate these from a DbContext: (Inside a Repository or wherever reads the data)
using (var context = new SearchContext())
{
var results = context.Logs
.Where(x => x.Name.Contains(sRequest))
.Select(x => new SearchResultDTO
{
ID = x.ID,
Name = x.Name,
Job = x.Job,
Start = x.Start,
End = x.End,
Logs = x.LogLines.Select(y => y.Line).ToList(),
}).ToList();
var resultDto = SearchResultsDTO.Success(results);
return resultsDto;
}
This assumes that the log entry has a Job, name, start, end date/times, and then a list of "lines" or entries to display as "Logs". (Where the Log table has a related LogLine table for example with the one or more lines) This demonstrates how to leverage Select to map not only the log record into a DTO, but also to map related records into something like a collection of strings, or a collection of other DTOs can be done as well.
Once it selects the DTO, I have it fill a container DTO using static factory methods to populate either a successful read, or a failed read. (which can be set in an exception handler for example.) Alternatively you can just new up a container class and populate properties, use a constructor /w parameters, or just return the list of DTOs. The SearchResultsDTO container is not referenced within the EF query.

Best way to load navigation properties in new entity

I am trying to add new record into SQL database using EF. The code looks like
public void Add(QueueItem queueItem)
{
var entity = queueItem.ApiEntity;
var statistic = new Statistic
{
Ip = entity.Ip,
Process = entity.ProcessId,
ApiId = entity.ApiId,
Result = entity.Result,
Error = entity.Error,
Source = entity.Source,
DateStamp = DateTime.UtcNow,
UserId = int.Parse(entity.ApiKey),
};
_statisticRepository.Add(statistic);
unitOfWork.Commit();
}
There is navigation Api and User properties in Statistic entity which I want to load into new Statistic entity. I have tried to load navigation properties using code below but it produce large queries and decrease performance. Any suggestion how to load navigation properties in other way?
public Statistic Add(Statistic statistic)
{
_context.Statistic.Include(p => p.Api).Load();
_context.Statistic.Include(w => w.User).Load();
_context.Statistic.Add(statistic);
return statistic;
}
Some of you may have question why I want to load navigation properties while adding new entity, it's because I perform some calculations in DbContext.SaveChanges() before moving entity to database. The code looks like
public override int SaveChanges()
{
var addedStatistics = ChangeTracker.Entries<Statistic>().Where(e => e.State == EntityState.Added).ToList().Select(p => p.Entity).ToList();
var userCreditsGroup = addedStatistics
.Where(w => w.User != null)
.GroupBy(g => g.User )
.Select(s => new
{
User = s.Key,
Count = s.Sum(p=>p.Api.CreditCost)
})
.ToList();
//Skip code
}
So the Linq above will not work without loading navigation properties because it use them.
I am also adding Statistic entity for full view
public class Statistic : Entity
{
public Statistic()
{
DateStamp = DateTime.UtcNow;
}
public int Id { get; set; }
public string Process { get; set; }
public bool Result { get; set; }
[Required]
public DateTime DateStamp { get; set; }
[MaxLength(39)]
public string Ip { get; set; }
[MaxLength(2083)]
public string Source { get; set; }
[MaxLength(250)]
public string Error { get; set; }
public int UserId { get; set; }
[ForeignKey("UserId")]
public virtual User User { get; set; }
public int ApiId { get; set; }
[ForeignKey("ApiId")]
public virtual Api Api { get; set; }
}

As you say, the following operations against your context will generate large queries:
_context.Statistic.Include(p => p.Api).Load();
_context.Statistic.Include(w => w.User).Load();
These are materialising the object graphs for all statistics and associated api entities and then all statistics and associated users into the statistics context
Just replacing this with a single call as follows will reduce this to a single round trip:
_context.Statistic.Include(p => p.Api).Include(w => w.User).Load();
Once these have been loaded, the entity framework change tracker will fixup the relationships on the new statistics entities, and hence populate the navigation properties for api and user for all new statistics in one go.
Depending on how many new statistics are being created in one go versus the number of existing statistics in the database I quite like this approach.
However, looking at the SaveChanges method it looks like the relationship fixup is happening once per new statistic. I.e. each time a new statistic is added you are querying the database for all statistics and associated api and user entities to trigger a relationship fixup for the new statistic.
In which case I would be more inclined todo the following:
_context.Statistics.Add(statistic);
_context.Entry(statistic).Reference(s => s.Api).Load();
_context.Entry(statistic).Reference(s => s.User).Load();
This will only query for the Api and User of the new statistic rather than for all statistics. I.e you will generate 2 single row database queries for each new statistic.
Alternatively, if you are adding a large number of statistics in one batch, you could make use of the Local cache on the context by preloading all users and api entities upfront. I.e. take the hit upfront to pre cache all user and api entities as 2 large queries.
// preload all api and user entities
_context.Apis.Load();
_context.Users.Load();
// batch add new statistics
foreach(new statistic in statisticsToAdd)
{
statistic.User = _context.Users.Local.Single(x => x.Id == statistic.UserId);
statistic.Api = _context.Api.Local.Single(x => x.Id == statistic.ApiId);
_context.Statistics.Add(statistic);
}
Would be interested to find out if Entity Framework does relationship fixup from its local cache.
I.e. if the following would populate the navigation properties from the local cache on all the new statistics. Will have a play later.
_context.ChangeTracker.DetectChanges();
Disclaimer: all code entered directly into browser so beware of the typos.

Sorry I dont have the time to test that, but EF maps entities to objects. Therefore shouldnt simply assigning the object work:
public void Add(QueueItem queueItem)
{
var entity = queueItem.ApiEntity;
var statistic = new Statistic
{
Ip = entity.Ip,
Process = entity.ProcessId,
//ApiId = entity.ApiId,
Api = _context.Apis.Single(a => a.Id == entity.ApiId),
Result = entity.Result,
Error = entity.Error,
Source = entity.Source,
DateStamp = DateTime.UtcNow,
//UserId = int.Parse(entity.ApiKey),
User = _context.Users.Single(u => u.Id == int.Parse(entity.ApiKey)
};
_statisticRepository.Add(statistic);
unitOfWork.Commit();
}
I did a little guessing of your namings, you should adjust it before testing

How about make a lookup and load only necessary columns.
private readonly Dictionary<int, UserKeyType> _userKeyLookup = new Dictionary<int, UserKeyType>();
I'm not sure how you create a repository, you might need to clean up the lookup once the saving changes is completed or in the beginning of the transaction.
_userKeyLookup.Clean();
First find in the lookup, if not found then load from context.
public Statistic Add(Statistic statistic)
{
// _context.Statistic.Include(w => w.User).Load();
UserKeyType key;
if (_userKeyLookup.Contains(statistic.UserId))
{
key = _userKeyLookup[statistic.UserId];
}
else
{
key = _context.Users.Where(u => u.Id == statistic.UserId).Select(u => u.Key).FirstOrDefault();
_userKeyLookup.Add(statistic.UserId, key);
}
statistic.User = new User { Id = statistic.UserId, Key = key };
// similar code for api..
// _context.Statistic.Include(p => p.Api).Load();
_context.Statistic.Add(statistic);
return statistic;
}
Then change the grouping a little.
var userCreditsGroup = addedStatistics
.Where(w => w.User != null)
.GroupBy(g => g.User.Id)
.Select(s => new
{
User = s.Value.First().User,
Count = s.Sum(p=>p.Api.CreditCost)
})
.ToList();

Entity Framework 6 Update Graph

What is the correct way to save a graph of objects whose state you don't know? By state I mean whether they are new or existing database entries that are being updated.
For instance, if I have:
public class Person
{
public int Id { get; set; }
public int Name { get; set; }
public virtual ICollection<Automobile> Automobiles { get; set; }
}
public class Automobile
{
public int Id { get; set; }
public int Name { get; set; }
public short Seats { get; set; }
public virtual ICollection<MaintenanceRecord> MaintenanceRecords { get; set ;}
public virtual Person Person { get; set; }
}
public class MaintenanceRecord
{
public int Id { get; set; }
public int AutomobileId { get; set; }
public DateTime DatePerformed { get; set; }
public virtual Automobile Automobile{ get; set; }
}
I'm editing models, similar to these objects above, and then passing those models into the data layer to save, where for this instance I happen to be using entity framework. So I'm translating these models into POCO entities internal to the DAL.
It appears that unless my models have a state indicating whether they are new or updated, I have quite a bit of work to do to "Save" the changes. I have to first select the Person entity, update it, then match any existing Automobiles and update those and add any new, then for each automobile check for any new or updated maintenance records.
Is there a faster/easier way of doing this? It's possible I can keep track of the Model state, which I guess would be helpful with this, but it would mean changes to code outside of the data layer which i would prefer to avoid. I'm just hoping there is a pattern of usage out there that I can follow for updates like this.

I ran into this issue a while back and have been following this thread on the EF Codeplex site. https://entityframework.codeplex.com/workitem/864
Seems like it is being considered for the next release, I'm assuming EF 7, which apparently is a pretty large internal overhaul of EF. This may be worth checking out... http://www.nuget.org/packages/RefactorThis.GraphDiff/
Back when I was working on this I found another EF post on SO, and someone had an example of how to do this manually. At the time I decided to do it manually, not sure why, GraphDiff looks pretty cool. Here is an example of what I did.
public async Task<IHttpActionResult> PutAsync([FromBody] WellEntityModel model)
{
try
{
if (!ModelState.IsValid)
{
return BadRequest(ModelState);
}
var kne = TheContext.Companies.First();
var entity = TheModelFactory.Create(model);
entity.DateUpdated = DateTime.Now;
var currentWell = TheContext.Wells.Find(model.Id);
// Update scalar/complex properties of parent
TheContext.Entry(currentWell).CurrentValues.SetValues(entity);
//We don't pass back the company so need to attached the associated company... this is done after mapping the values to ensure its not null.
currentWell.Company = kne;
// Updated geometry - ARGHHH NOOOOOO check on this once in a while for a fix from EF-Team https://entityframework.codeplex.com/workitem/864
var geometryItemsInDb = currentWell.Geometries.ToList();
foreach (var geometryInDb in geometryItemsInDb)
{
// Is the geometry item still there?
var geometry = entity.Geometries.SingleOrDefault(i => i.Id == geometryInDb.Id);
if (geometry != null)
// Yes: Update scalar/complex properties of child
TheContext.Entry(geometryInDb).CurrentValues.SetValues(geometry);
else
// No: Delete it
TheContext.WellGeometryItems.Remove(geometryInDb);
}
foreach (var geometry in entity.Geometries)
{
// Is the child NOT in DB?
if (geometryItemsInDb.All(i => i.Id != geometry.Id))
// Yes: Add it as a new child
currentWell.Geometries.Add(geometry);
}
// Update Surveys
var surveyPointsInDb = currentWell.SurveyPoints.ToList();
foreach (var surveyInDb in surveyPointsInDb)
{
// Is the geometry item still there?
var survey = entity.SurveyPoints.SingleOrDefault(i => i.Id == surveyInDb.Id);
if (survey != null)
// Yes: Update scalar/complex properties of child
TheContext.Entry(surveyInDb).CurrentValues.SetValues(survey);
else
// No: Delete it
TheContext.WellSurveyPoints.Remove(surveyInDb);
}
foreach (var survey in entity.SurveyPoints)
{
// Is the child NOT in DB?
if (surveyPointsInDb.All(i => i.Id != survey.Id))
// Yes: Add it as a new child
currentWell.SurveyPoints.Add(survey);
}
// Update Temperatures - THIS IS A HUGE PAIN = HOPE EF is updated to handle updating disconnected graphs.
var temperaturesInDb = currentWell.Temperatures.ToList();
foreach (var tempInDb in temperaturesInDb)
{
// Is the geometry item still there?
var temperature = entity.Temperatures.SingleOrDefault(i => i.Id == tempInDb.Id);
if (temperature != null)
// Yes: Update scalar/complex properties of child
TheContext.Entry(tempInDb).CurrentValues.SetValues(temperature);
else
// No: Delete it
TheContext.WellTemperaturePoints.Remove(tempInDb);
}
foreach (var temps in entity.Temperatures)
{
// Is the child NOT in DB?
if (surveyPointsInDb.All(i => i.Id != temps.Id))
// Yes: Add it as a new child
currentWell.Temperatures.Add(temps);
}
await TheContext.SaveChangesAsync();
return Ok(model);
}
catch (Exception ex)
{
Trace.WriteLine(ex.Message);
}
return InternalServerError();
}

This is a huge pain to me too. I extracted the answer from #GetFuzzy to a more reusable method:
public void UpdateCollection<TCollection, TKey>(
DbContext context, IList<TCollection> databaseCollection,
IList<TCollection> detachedCollection,
Func<TCollection, TKey> keySelector) where TCollection: class where TKey: IEquatable<TKey>
{
var databaseCollectionClone = databaseCollection.ToArray();
foreach (var databaseItem in databaseCollectionClone)
{
var detachedItem = detachedCollection.SingleOrDefault(item => keySelector(item).Equals(keySelector(databaseItem)));
if (detachedItem != null)
{
context.Entry(databaseItem).CurrentValues.SetValues(detachedItem);
}
else
{
context.Set<TCollection>().Remove(databaseItem);
}
}
foreach (var detachedItem in detachedCollection)
{
if (databaseCollectionClone.All(item => keySelector(item).Equals(keySelector(detachedItem)) == false))
{
databaseCollection.Add(detachedItem);
}
}
}
With this method in place I can use it like this:
public void UpdateProduct(Product product)
{
...
var databaseProduct = productRepository.GetById(product.Id);
UpdateCollection(context, databaseProduct.Accessories, product.Accessories, productAccessory => productAcccessory.ProductAccessoryId);
UpdateCollection(context, databaseProduct.Categories, product.Categories, productCategory => productCategory.ProductCategoryId);
...
context.SubmitChanges();
}
However when the graph gets deeper, I have a feeling this will not be sufficient.

What your looking for is the Unit of Work pattern:
http://msdn.microsoft.com/en-us/magazine/dd882510.aspx
You can either track UoW on the client and pass it in with the DTO or have the server figure it out. Both the veritable DataSet and EF Entities have their own internal implementation of UoW. For something stand alone there is this framework, but I have never used it so have no feedback:
http://genericunitofworkandrepositories.codeplex.com/
Alternatively another option is to do real time updates with undo functionality, kind of like when you go into Gmail contacts and it saves the changes as you make them with the option to undo.

It depends HOW you are accomplishing adding/changing the entities.
I think you may be trying to do too much with an entity at any given time. Allowing editing and adding at the same time can get you into a situation where your not sure what is being done with the entity, especially in a disconnected scenario. You should only perform a single action on a single entity at a time, unless you are deleting entities. Does this seem monotonous, sure, but 99% of your users want a clean and easily understandable interface. Many time we end up making screens of our applications "god" screens where everything and anything can be done. Which 9/10 times isn't needed (YAGNI).
This way, when you edit a user, you know you are doing an update operation. If you are adding a new maintenance record, you know you are creating a new record that is attached to an automobile.
To summarize, you should limit how many operations you are making available for a single screen and make sure you provide some type of unique information for the entity so you can try to look up the entity to see if it exists.

I had the similar problem, and couldnt find my own solution. I think that problem is complex. Complete solution for updating graphs in disconected scenario with EF6 I find in extension method RefactoringThis.GraphDiff produced by Brent McKendric.
Exemple brings by author is:
using (var context = new TestDbContext())
{
// Update the company and state that the company 'owns' the collection Contacts.
context.UpdateGraph(company, map => map
.OwnedCollection(p => p.Contacts, with => with
.AssociatedCollection(p => p.AdvertisementOptions))
.OwnedCollection(p => p.Addresses)
);
context.SaveChanges();
}
See more at:
http://blog.brentmckendrick.com/introducing-graphdiff-for-entity-framework-code-first-allowing-automated-updates-of-a-graph-of-detached-entities/

How to get around "Internal .NET Framework Data Provider error 1025."?

I am using the Entity Framework 4.3, POCO, database first and I am getting the following error:
Internal .NET Framework Data Provider error 1025.
QUESTION: I think that my query expresses my intent but I seem to be hitting this error, so I am wondering if anyone knows how I could structure my query differently to get around this error?
Here is the scenario...
I have a SQL server 2008 database that has 2 tables - A and B:
A
AId (int - not null - identity - primary key)
AName (nvarchar(10) - not null)
B
BId (int - not null - identity - primary key)
SomeName (nvarchar(10) - not null)
AId (int - not null - foreign key connecting to AId in the table A)
I then define the context like so:
public class DatabaseContext : DbContext
{
public DatabaseContext(string name)
: base(name)
{
Configuration.AutoDetectChangesEnabled = false;
As = Set<A>();
Bs = Set<B>();
}
public DbSet<A> As { get; private set; }
public DbSet<B> Bs { get; private set; }
}
And the entity classes like so:
public class A
{
public int AId { get; set; }
public string AName { get; set; }
public virtual ICollection<B> Bs { get; private set; }
public void AddB(B b)
{
if (b == null)
{
throw new ArgumentNullException("b");
}
if (Bs == null)
{
Bs = new List<B>();
}
if (!Bs.Contains(b))
{
Bs.Add(b);
}
b.A = this;
}
}
public class B
{
public int BId { get; set; }
public A A { get; set; }
public string SomeName { get; set; }
}
Now for the query...
What I want is all of the As where every "B SomeName" is in the list of names supplied so I do this:
var names = new[] {"Name1", "Name2"};
var ctx = new DatabaseContext("EFPlayingEntities");
var res = ctx.As.Where(a => a.Bs.Select(b => b.SomeName).All(names.Contains));
// Here I evaluate the query and I get:
// Internal .NET Framework Data Provider error 1025.
Console.WriteLine(res.Count());
To be clear about what I mean, if the table data looks like this:
AId,AName
1,A1
2,A2
3,A3
4,A4
BId,SomeName,AId
1,Name1,1
2,Name2,1
3,Name1,2
4,Name1,3
5,Name3,3
6,Name1,4
7,Name2,4
I would expect to get back A1, A2 and A4 (so that count call above would return 3).

The reason why this happens is subtle.
Queryable.All need to be called with an Expression. Passing in just the method 'reference' creates a delegate, and subsequently, Enumerable.All becomes the candidate instead of the intended Queryable.All.
This is why your solution you posted as an answer works correctly.
EDIT
so if you write the statement as this, it will work without exception:
var res = ctx.As.Where(
a => a.Bs.Select(b => b.SomeName).All(b => names.Contains(b)));

I have worked out a solution to this, in case anyone is interested. Doing the following is equivalent and does not result in the exception in the question:
var res = ctx
.Bs
.GroupBy(b => b.A)
.Where(g => g.All(b => names.Contains(b.SomeName)))
.Select(g => g.Key);
I do not know if this is the best way though!?

The semantics of your query look good to me; clearly, getting an internal provider error is not the intended behaviour! I would have expected some more explicit mesage about EF not being able to translate your query into a store operation, if that's in fact what the problem is.
Another way to do what you want would be:
var names = new[] {"Name1", "Name2"};
var nameCount = names.Length;
var ctx = new DatabaseContext("EFPlayingEntities");
var result = ctx.As
.Where(a => a.Bs
.Select(b => b.SomeName)
.Intersect(names)
.Count() == a.Bs.Count());
(get every A such that intersecting its Bs' names with the fixed list gives all the Bs)
although I haven't tried this to see if EF can translate this successfully.
Another way:
var names = new[] {"Name1", "Name2"};
var ctx = new DatabaseContext("EFPlayingEntities");
var result = ctx.As
.Where(a => !a.Bs.Select(b => b.SomeName).Except(names).Any());
(get every A such that the list of its Bs' names is reduced to nothing by taking out the fixed list)
also untried.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How to Fetch a Lot of Records with EF6 - c#

Related

How to query IQueryable with Include - ThenInclude?

How to correctly store a number of results in a DTO?

Best way to load navigation properties in new entity

Entity Framework 6 Update Graph

How to get around "Internal .NET Framework Data Provider error 1025."?

Categories

Resources