I am looking at the following sample code to include referenced documents and avoid a round trip.
var order = session.Query<Order>()
    .Customize(x => x.Include<Order>(o => o.CustomerId)) // also load the customer
    .First();
var customer = session.Load<Customer>(order.CustomerId);
My question is how does Raven know that this o=>o.CustomerId implies Customer document/collection? At no time was the entity Customer supplied in the query to get the Order entity. Yet Raven claims that the 2nd query to get Customer can be done against the cache, w/o any network trip.
If it's by naming convention, which seems like a very poor/fragile/brittle convention to adopt, what happens when I need to include more than one document?
E.g. a car was purchased under two names, so I want to link back to two customers, the primary and secondary customer/driver. Both are stored in the Customer collection.
var sale = session.Query<Sale>()
    .Customize(x => x.Include<Sale>(o => o.PrimaryCustomerId)
                     .Include<Sale>(o => o.SecondaryCustomerId)) // also load both customers
    .First();
var primaryCustomer = session.Load<Customer>(sale.PrimaryCustomerId);
var secondaryCustomer = session.Load<Customer>(sale.SecondaryCustomerId);
How can I do the above in one network trip? How would Raven even know that o=>o.PrimaryCustomerId and o=>o.SecondaryCustomerId are references to one and the same table, Customer, since obviously the property names and collection name don't line up?
Raven doesn't have the concept of "tables". It does know about "collections", but they are just a convenience mechanism. Behind the scenes, all documents are stored in one big database. The only thing that makes a "collection" is that each document has a Raven-Entity-Name metadata value.
Both the examples you showed will result in one round trip (each). Your code looks just fine to me.
My question is how does Raven know that this o=>o.CustomerId implies Customer document/collection? At no time was the entity Customer supplied in the query to get the Order entity.
It doesn't need to be supplied in the query. As long as the data stored in the CustomerId field of the Order document is a full document key, then that document will be returned to the client and loaded into session.
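For illustration, here's a minimal sketch (class shape assumed from your examples) of what that means: the reference property holds the complete document key string, not a bare number.

public class Order
{
    public string Id { get; set; }          // e.g. "orders/42"
    public string CustomerId { get; set; }  // full document key, e.g. "customers/123"
}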
Yet Raven claims that the 2nd query to get Customer can be done against the cache, w/o any network trip.
That's correct. The session container tracks all documents returned - not just the ones from the query results. So later when you call session.Load using the same document key, it already has it in session so it doesn't need to go back to the server.
Regardless of whether you query, load, or include - the document doesn't get deserialized into a static type until you pull it out of the session. That's why you specify the Customer type in the session.Load<Customer> call.
If it's by naming convention, which seems like a very poor/fragile/brittle convention to adopt ...
Nope, it's by the value stored in the property which is a document key such as "customers/123". Every document is addressable by its document key, with or without knowing the static type of the class.
what happens when I need to include more than one document?
The exact same thing. There isn't a limit on how many documents can be included or loaded into session. However, you should be sure to open the session in a using statement so it is disposed properly. The session is a "Unit of Work container".
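As a hedged sketch of that pattern (assuming store is your initialized document store):

using (var session = store.OpenSession())
{
    // query with includes, then Load the referenced documents;
    // everything is served by the single round trip
} // the session (unit of work) is disposed here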
How would Raven even know that o=>o.PrimaryCustomerId and o=>o.SecondaryCustomerId are references to one and the same table, Customer, since obviously the property names and collection name don't line up?
Again, it doesn't matter what the names of the fields are. It matters that the data in those fields contains a document id, such as "customers/123". If you aren't storing the full string identifier, then you will need to build the document key inside the lambda expression. In other words, if Sale.CustomerId contains just the number 123, then you would need to include it with .Include<Sale>(o=> "customers/" + o.CustomerId).
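Putting that together, a sketch for the numeric-id case (assuming Sale.CustomerId holds just 123):

var sale = session.Query<Sale>()
    .Customize(x => x.Include<Sale>(o => "customers/" + o.CustomerId))
    .First();

// served from the session, no extra network trip
var customer = session.Load<Customer>("customers/" + sale.CustomerId);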
We recently had a migration project that went badly wrong, and we now have thousands of duplicate records. The business has been working with them, which has made the issue worse: we now have records with the same name and address but potentially different contact information. A small number are exact duplicates. We have started the painful process of manually merging the records, but this is very slow. Can anyone suggest another way of tackling the problem, please?
You can quickly write a console app to merge them; see the MSDN sample code below.
Sample: Merge two records
// Create the target for the request.
EntityReference target = new EntityReference();
// Id is the GUID of the account that is being merged into.
// LogicalName is the type of the entity being merged to, as a string
target.Id = _account1Id;
target.LogicalName = Account.EntityLogicalName;
// Create the request.
MergeRequest merge = new MergeRequest();
// SubordinateId is the GUID of the account merging.
merge.SubordinateId = _account2Id;
merge.Target = target;
merge.PerformParentingChecks = false;
// Execute the request.
MergeResponse merged = (MergeResponse)_serviceProxy.Execute(merge);
When merging two records, you specify one record as the master record, and Microsoft Dynamics CRM treats the other record as the child record or subordinate record. It deactivates the child record and copies all of the related records (such as activities, contacts, addresses, cases, notes, and opportunities) to the master record.
Building on Arun Vinoth's answer, you might want to see what you can leverage with out-of-box duplicate detection to get sets of duplicates to apply the merge automation to.
Alternatively you can build your own dupe detection to match records on the various fields where you know dupes exist. I've done similar things to compare records across systems, including creating match codes to mimic how Microsoft does their dupe detection in CRM.
For example, a contact's match codes might be
1. the email address
2. the first name, last name, and company concatenated together without spaces.
If you need to match Companies, you can implement an algorithm like Scribe's stripcompany to generate match codes based on company names.
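A minimal sketch of such a match-code helper (illustrative only, not Scribe's actual algorithm): lower-case the key fields, strip anything that isn't a letter or digit, and concatenate.

using System.Linq;

static class MatchCodes
{
    // Builds a contact match code from name and company fields.
    public static string ForContact(string firstName, string lastName, string company)
    {
        string Normalize(string s) => new string(
            (s ?? "").ToLowerInvariant().Where(char.IsLetterOrDigit).ToArray());

        return Normalize(firstName) + Normalize(lastName) + Normalize(company);
    }
}

Records that produce identical match codes then become the candidate sets you feed into the merge automation above.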
Since this seems like a huge problem you may want to consider drastic solutions like deactivating the entire polluted data set and redoing the data import clean, then finding any of the deactivated records that got touched in the interim to merge them, then deleting the entire polluted (deactivated) data set.
Bottom line, all paths seem to lead to major headaches and the only consolation is that you get to choose which path to follow.
This is a bit of a puzzle I'm trying to figure out.
I am working on a system where we have a number of company records saved in the database. Some of these records are duplicates and are no longer wanted/required.
However, several external systems still map to these invalid records. If we were to delete them entirely, it would cause errors in the systems that still need the details of those companies.
The ideal workflow I would like would be;
The external system looks up Company ID X.
The current system has a table which has a record of all the remapped records, so when the request comes in, the table specifies to redirect Company ID X to Company ID Y.
There are a number of endpoints that could be altered one by one to do this, but that would be time-consuming and result in lots of repetition too.
My question is, using Entity Framework and .Net - is there a smart way of achieving this workflow?
My initial thoughts were to do something with the constructor for the company object, which repopulates the object from EF if a 'redirect' exists, but I don't know if this will play nice with navigation properties.
Would anyone have an idea?
Thanks very much.
You can add a self-referencing foreign key column to the table to point at the single valid company.
For example, you can add DuplicateOf column:
ALTER TABLE [Company]
    ADD [DuplicateOf] bigint NULL;

ALTER TABLE [Company]
    ADD CONSTRAINT [FK_Company_DuplicateOf]
        FOREIGN KEY ([DuplicateOf]) REFERENCES [Company] ([Id]);
and express this relation in your code:
public class Company
{
    // ...
    public Company DuplicateOf { get; set; }

    // May be useful; hides the duplicate check:
    public bool IsDuplicate => DuplicateOf != null;

    // May be useful as well: returns the unique non-duplicate company,
    // either the linked record or the current one:
    public Company EffectiveCompany => DuplicateOf ?? this;
}
You will have to go through EffectiveCompany whenever you want the non-duplicate record, and maintain this column so it always points to the correct one. It will also cost an additional query (or join) if eager-loaded.
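For example, consumers would resolve through EffectiveCompany before using a record. A sketch, assuming EF6-style eager loading (note that a duplicate pointing at another duplicate would need the chain followed iteratively):

using System.Data.Entity; // for the lambda Include extension

var company = db.Companies
    .Include(c => c.DuplicateOf)
    .Single(c => c.Id == requestedId)
    .EffectiveCompany;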
Another idea is to have a stored procedure GetCompany(bigint id) which returns the effective record: the DuplicateOf target if it exists, or the record itself otherwise. It will be good for your external systems and will let you hide all of this behind the abstraction layer of a stored procedure. If you decide to change the approach in the future, you can easily update the procedure without breaking external systems.
However, it isn't always convenient to work with stored procedures from EF.
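If you do go that route, a raw query keeps the procedure usable from EF. A sketch, assuming EF6 and the GetCompany procedure described above:

using System.Data.SqlClient;

var company = db.Database
    .SqlQuery<Company>("EXEC GetCompany @id", new SqlParameter("@id", requestedId))
    .Single();

Keep in mind that entities materialized this way are not change-tracked, and the DuplicateOf navigation property will not be populated.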
These are just ideas and not the best solutions, anyway.
In my opinion, the best solution would be to get rid of duplicates, update data everywhere and forget forever about this mess of duplicated data.
Let's say we have a code list of all the countries, including their country codes. The country code is the primary key of the Countries table, and it is used as a foreign key in many places in the database. In my application the countries are usually displayed as dropdowns on multiple forms.
Some of the countries that used to exist in the past don't exist any more, for example Serbia and Montenegro, which had the country code SCG.
I have two objectives:
don't allow the user to use these old values (so these values should not be visible in dropdowns when inserting data)
the user should still be able to (readonly) open old stuff and in this case the deprecated values should be visible in dropdowns.
I see two options:
Rename deprecated values, for instance from 'CountryName' to '!!!!!CountryName'. This approach is the easiest to implement, but has obvious drawbacks.
Add an IsActive column to the Countries table and set it to false for all deprecated values and true for the rest. On all the forms where the user can insert data, display only active values. On the read-only forms we can display all values (including deprecated ones) so the user can see old data. But on some of my forms the user should also be able to edit data, which means the deprecated values should be hidden there. That means each dropdown needs initialization logic like this: if the data displayed is read-only, include deprecated values in the dropdown; if the data is editable, exclude them. But this is a lot of work and error-prone too.
Any other ideas?
I deal with this scenario a lot and use the 'Active' flag to solve the problem, much as you described. When I populate a drop-down list with values, I only load 'active' data, plus up to one deprecated value, but only if it is in use. For example, if I am looking at a person record and that person has a deprecated country, then that country is included in the drop-down list along with the active countries. I do this in read-only AND edit modes because, in my case, if a person record has a deprecated country listed, they can continue to use it; but once they change it to a non-deprecated country and save, they can never switch back (your use case may vary).
So the key difference is: even in read-only mode I don't add all the deprecated countries to the DDL, just the deprecated country that applies to the record I am looking at, and even then only if that value was already in use.
Here is an example of the logic I use when loading the drop down list:
protected void LoadSourceDropdownList(bool AddingNewRecord, int ExistingCode)
{
    using (Entities db = new Entities())
    {
        if (AddingNewRecord) // when adding a new record, only show 'active' items in the drop-down list
            ddlSource.DataSource = (from q in db.zLeadSources where (q.Active == true) select q);
        else // for existing records, show all active items AND the current value
            ddlSource.DataSource = (from q in db.zLeadSources where ((q.Active == true) || (q.Code == ExistingCode)) select q);

        ddlSource.DataValueField = "Code";
        ddlSource.DataTextField = "Description";
        ddlSource.DataBind();
        ddlSource.Items.Insert(0, "--Select--");
        ddlSource.Items[0].Value = "0";
    }
}
If you are displaying the record as read-only, why bother loading the standing data at all?
Here's what I would do:
the record will contain the country code in any case; I would also propose returning the country description with it (which admittedly makes things a little less efficient). But when the user loads "old stuff", the business service recognises that the record will be read-only, so you don't bother loading the country list at all (which makes things more efficient).
in my presentation service I then generally check whether the list of countries is null. If not (read/write), load the data into the list box; if so (read-only), populate the list box from the data in the record: a single entry in the list equals read-only.
You can filter with CollectionViewSource, or you could just create a public IEnumerable property that filters the full list using LINQ.
CollectionViewSource Class
LINQ: in the example below, FieldDef.DispSearch is the 'active' condition. IEnumerable gives slightly better performance than List here.
public IEnumerable<FieldDefApplied> FieldDefsAppliedSearch
{
    get
    {
        return fieldDefsApplied.Where(df => df.FieldDef.DispSearch).OrderBy(df => df.FieldDef.DispName);
    }
}
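For the CollectionViewSource route, a filter predicate on the default view achieves the same thing. A sketch (WPF, reusing the property names above):

using System.Windows.Data;

var view = CollectionViewSource.GetDefaultView(fieldDefsApplied);
view.Filter = item => ((FieldDefApplied)item).FieldDef.DispSearch;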
Why would you still want to display (for instance) customer addresses with their OLD country code?
If I understand correctly, you currently still have address records that point to 'Serbia and Montenegro'. I think if you solve that problem, your current question would be non-existent.
The term "country" is perhaps a little misleading: not all the "countries" in ISO 3166 are actually independent. Rather, many of them are geographically separate territories that are legally portions or dependencies of other countries.
Also note that withdrawn country codes are reserved for 5 years, meaning that after 5 years they may be reused. So moving away from using the country code itself as the primary key makes sense to me, especially if for historical reasons you need to track previous country codes.
So why not add a 'withdrawn' field (or table) that points to the new country ids? You can still check (in SQL for instance, since you were already using a table) whether this field is empty to get a true/false answer if you need it.
The way I see it: country codes may change, countries may merge, and countries may divide.
If countries change or merge, you can update your address records with a simple query.
If countries divide, you need a way to determine which address belongs to which country.
You could use some automated system to do this (and write lengthy books about it).
OR
(when it is a forum-like site) you could ask users whose accounts still have a withdrawn country pointing to multiple alternatives to update their country entry at login, where they can only choose from the list of new countries specified in the withdrawn field.
Think of this simplified country-table setup:
id  cc  cn                      withdrawn
1   DE  Germany
2   CS  Serbia and Montenegro   6,7
3   RH  Southern Rhodesia       5
4   NL  The Netherlands
5   ZW  Zimbabwe
6   RS  Serbia
7   ME  Montenegro
In this example, address records with country-id 3 get updated to country-id 5 with a simple query; no user interaction (or other solution) needed.
But address records that specify country-id 2 will trigger a prompt to select country-id 6 or 7 (of course, the text presented to the user shows the country name), or they are selected for your custom automated update routine.
Also note: 'withdrawn' is a repeating group and as such you could/should make it into a separate table.
Implementing this idea (without downtime) in your scenario:
SQL statement to build a new country table with numerical ids as the primary key.
SQL statement to add a new 'country-id' field to the address records and fill it with the id from the new country table that corresponds to the country code currently stored in each record.
(SQL statement to) create the withdrawn table and populate it with the correct data.
Then rewrite the SQL statements that supply your forms with data.
Add the check and the 'ask user to update country' routine.
Let the new forms go live.
Wait and watch for unintended bugs.
Delete the old country table and the (now unused) country-code column from the address table.
I am very curious what other experts think about this idea!!
I'm creating a database where users can enter Error Reports and we can view them. I'm building this with C# on the ASP.NET MVC 3 framework (as the tags imply). Each Error Report has a unique ID, dubbed ReportId, so no two are stored under the same Id. Whenever a user creates a new Error, I pass their user name and store it with the rest of the report (I use User.Identity.Name.ToString() to get the name and store it as a string). I know how to get a single item from the data using a lambda expression, like so:
db.DBSetName.Single(g => g.Name == genre)
The above code is based on an MVC 3 tutorial (The Movie Store one) provided by ASP. This was how they taught me how to do it.
My major question is: is there a member function like .Single that will parse through the whole database and only output entries whose stored user name matches that of the currently logged-in user? Then I could use it to restrict users to editing only their own entries, since only their entries would be passed to the user's View.
What would be the best way to implement this? Since the ReportId will not be changed, a new data structure can be created to store the user's Errors and passed through to the Index (or Home) View of that particular controller. From there they should be able to click any edit link, which will pass the stored ReportId back to the Edit Action of this particular controller, which can then search the entire database for it. Am I right in assuming this would work? And would this be ideal, given that the other items in the database are NOT passed through to the Index in this method, meaning the User does not have access to the other items' ReportId's, which the user needs to pass into the Edit Action for it to work? If this is ideal, this is the method that requires me to know how to parse through a database and grab every element that fits a particular description (stored User Name matches User's current User Name).
Or would a better approach be to pass the whole database to the Index View and only output the entries whose user name matches the current logged-in user's? I guess this could be done with a foreach loop and a nested if statement, like so:
@foreach (var item in db.Reports)
{
    if (item.UserName == User.Identity.Name.ToString())
    {
        ...code to output table...
    }
}
But this passes the whole database, which gives the user a lot more info than they need. It also gives them potential access to info I don't want them to have. On the other hand, I don't have to make a new data structure or database, which should lower server memory usage and fetch time, right? Or are databases passed by copy? If so, this method seems kinda dumb. I also don't know whether the first method would potentially fracture the database; this one certainly would not. Also, I don't remember if I NEED an else statement in C#; I'm more familiar with C++, where you don't need one and you also don't need {}'s for single-line ifs. If I need one, please don't judge me too harshly on it!
Small note: I am using CRUD controllers scaffolded with Entity Framework to edit my database. As such, all creation, reading, updating, and deletion code has been provided for me. I have chosen not to add such basic, common code. If it is needed, I can add it. I will add what the Edit Action looks like:
public ActionResult Edit(string id)
{
    Report report = db.Reports.Find(id);
    return View(report);
}
It accepts a string as an id; ReportId is the id used and it IS a string. It is a randomly generated GUID string made with the Guid.NewGuid().ToString() function. I will also be doing the comparison of names with:
Model.UserName == User.Identity.Name.ToString()
Which was shown earlier. Sorry if this is too much text, I wanted to provide as much info as possible and not make anyone mad. If more info is needed, it can certainly be provided. So at the end of the post, the major question actually comes down to: which of the above two methods is best? And, if it's the first one, how do I implement something like that?
Thanks for your help!
Unless I'm completely misunderstanding you, you just want .Where()
Like this:
var reports = db.Reports.Where(r => r.genre == inputGenre);
This would get you an IEnumerable of Report, which you could then use however you wish.
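Applied to your scenario, filtering on the stored user name would look like:

var myReports = db.Reports.Where(r => r.UserName == User.Identity.Name);

Pass myReports to the Index view and the user only ever sees, and can only edit, their own reports.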
Just a random query regarding Microsoft Velocity.
Scenario:
Say I want ALL Orders from my database. In SQL this is fine: I can do SELECT OrderId, TotalCost... FROM Orders. This is one round trip to my database, and everyone is happy.
Now, if I'm using Memcached or (as I am now) Microsoft Velocity (CTP3), there is no easy way to do this. The two options I see are (in pseudocode):
FOR EACH OrderId
    Order = cache.TryGet(OrderId)
    IF Order IS NULL
        Order = db.Get(OrderId)
    END IF
END FOR EACH
which would be LOADS of roundtrips.
Also, consider I want to get orders by Customer
SQL: Select OrderId....TotalCost from Orders where CustomerId = MyCustomerId
One round trip, everyone is happy.
Using a cached solution there are two solutions I see really:
Solution 1:
IF CustomerOrderIdsForCustomerId DOES NOT EXIST
    POPULATE CustomerOrderIdsForCustomerId FROM DATABASE
END IF
FOR EACH OrderId IN CustomerOrderIdsForCustomerId
    Order = cache.TryGet(OrderId)
    IF Order IS NULL
        Order = db.Get(OrderId)
    END IF
END FOR EACH
Solution 2 is to hold a serialized list of all the customer orders in its own cache object. It reduces round trips, but just seems lame.
Can someone shed light on this situation please?
Just because you have a cache doesn't mean you have to use it for every query! In this instance, as you've already identified, it's not really helping you, and I'd probably go straight to the database for this sort of thing.
It depends a bit on your application though - if you think customers are regularly going to be looking at their order history, or you have some function that's analysing orders to see what products are hot, then you might want to use some caching to keep load off your SQL server. In that case, I'd probably go with holding in the cache either a DataTable of the orders, or a collection of Orders and query it with LINQ to show the orders for a customer.
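A hedged sketch of that second approach (the cache key and calls are illustrative):

// keep the whole order list under one cache key and query it in memory
var orders = cache.Get("AllOrders") as List<Order>;
if (orders == null)
{
    orders = db.Orders.ToList();
    cache.Put("AllOrders", orders);
}
var customerOrders = orders.Where(o => o.CustomerId == myCustomerId);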
Keep in mind that a cache is not supposed to be the permanent store for any data (orders, in your case). The cache can help remove some load from your DB server, but something has to load the orders into the cache before you can retrieve them. That said, here are a couple of options to consider with Velocity that avoid looping through a collection. You will always have to figure out a way to deal with data that is not in the cache, though.
Option 1: Use Regions
You can create a region and get all the objects from that region with one call. In your scenario, you could create an Orders region to store all the orders, and then use the GetObjectsInRegion method to get every order in the cache. Note however that this brings back all the orders in the cache... which may or may not be all the orders in your database.
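A sketch of the region approach (method names follow the Velocity documentation, but signatures shifted between CTPs, so treat this as illustrative):

DataCache cache = new DataCacheFactory().GetCache("default");
cache.CreateRegion("Orders");
cache.Put(order.OrderId.ToString(), order, "Orders");

// one call returns everything currently cached in the region
foreach (KeyValuePair<string, object> item in cache.GetObjectsInRegion("Orders"))
{
    var cachedOrder = (Order)item.Value;
    // ...
}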
Option 2: Use Regions And Tags
Velocity lets you tag objects that you put in cache regions and then retrieve them using those tags. In your scenario, you could tag the order objects with an "order" tag and then use the GetObjectsByTag method to retrieve them. Since you can use multiple tags, you could also tag them with their customer id and pull them out that way.
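A sketch of the tagged variant (again illustrative; tags require a region):

var tags = new List<DataCacheTag>
{
    new DataCacheTag("order"),
    new DataCacheTag("customer:" + order.CustomerId)
};
cache.Put(order.OrderId.ToString(), order, tags, "Orders");

// pull back just this customer's orders in one call
var customerOrders = cache.GetObjectsByTag(
    new DataCacheTag("customer:" + order.CustomerId), "Orders");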
These 2 options come with some caveats, so be sure to read up on the documentation:
Velocity Tag-Based Methods