I am trying to insert documents into MongoDB, but I only want unique documents: whenever a duplicate is encountered, it should simply be skipped if it already exists and the insert should move on to the next one. I am using the following code, but apparently it does not work.
var keys = IndexKeys.Ascending("TrackingNumber");
var options = IndexOptions.SetUnique(true).SetDropDups(true);
_collection.CreateIndex(keys, options);
If you really want to ignore these, it's probably best to do it in code, though that might not be that easy in a multi-client environment.
The dropDups flag is a parameter of index creation only, so it drops the duplicates it finds while building the index. It has no effect on inserts afterwards, because it is not a property of the index itself.
A better way, though not exactly the behavior you're looking for, is to use upserts, i.e. operations that insert a document if none matches and update the existing document if one does. That has the advantage of being idempotent (which the ignore strategy is not).
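As a rough illustration of that upsert approach with the same legacy driver API as the snippet above (Parcel and its TrackingNumber property are hypothetical stand-ins for your document type):

using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

// Sketch only, using the legacy 1.x C# driver API shown above.
public class Parcel
{
    // Omitted from the serialized document while empty, so a replace never
    // tries to change the _id of an existing document.
    [BsonIgnoreIfDefault]
    public ObjectId Id { get; set; }

    public string TrackingNumber { get; set; }
}

public static class ParcelStore
{
    public static void Upsert(MongoCollection<Parcel> collection, Parcel parcel)
    {
        var query = Query<Parcel>.EQ(p => p.TrackingNumber, parcel.TrackingNumber);

        // Insert when no document matches the tracking number; otherwise replace
        // the existing one. Re-running this with the same TrackingNumber never
        // creates a second document, so the operation is safe to repeat.
        collection.Update(query, Update<Parcel>.Replace(parcel), UpdateFlags.Upsert);
    }
}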
Related
I have a situation where a List object is built from values pulled from an MSSQL database. However, this particular table is mysteriously getting an errant record or two tossed in. Removing those records causes trouble even though they have no referential links to any other tables, and they get recreated without any known user action. This puts unwanted values on display and adds confusion. The specific issue is that this is a platform that allows users to search for quotes, and the filtering allows for sales rep selection. The select/dropdown field is showing these errant values, and they need to be removed.
Given that deleting the offending table rows does not produce a desirable result, I was thinking the best course of action might be to modify the code where the List object is created and either filter the values out or remove them after the object is populated. I'd like to do this in a clean, scalable fashion, using some kind of appendable data structure where I could just add a new string value if something else cropped up, as opposed to something clunky that adds new code to find and remove each value.
My thought was to create a string array and somehow loop through it to remove bad List values, but I wasn't certain that was the best approach and couldn't come up with a clean way to do it. I would think the best way would be to add a filter within the Find arguments, but I don't know how to pass an array or list in that way. Otherwise I figured I could loop through the values either before or after sorting the List and remove any matches, but I wasn't sure that was the best choice.
I have attached the current code, and would appreciate any suggestions.
int licenseeID = Helper.GetLicenseeIdByLicenseeShortName(Membership.ApplicationName);
List<User> listUsers;
if (Roles.IsUserInRole("Admin"))
{
    // get all users
    listUsers = User.Find(x => x.LicenseeID == licenseeID).ToList();
}
else
{
    // get only the current user
    listUsers = User.Find(x => (x.LicenseeID == licenseeID && x.EmailAddress == Membership.GetUser().Email)).ToList();
}
listUsers.Sort((x, y) => string.Compare(x.FirstName, y.FirstName));
-- EDIT --
I neglected to mention that I did not develop this; I merely inherited its maintenance after the original developer(s) disappeared and the coworker who was assigned to it left the company. I'm not really skilled at handling ASP.NET sites. Many object sources are hidden and unavailable for editing, I assume because they are defined in a DLL somewhere. So, for any of these objects that are sourced from database tables, altering the tables will not help, since I would not be able to get at the new data anyway.
However, I did try the following to filter out the undesirable data:
List<String> exclude = new List<String>(new String[] { "value1" , "value2" });
listUsers = User.Find(x => x.LicenseeID == licenseeID && !exclude.Contains(x.FirstName)).ToList();
Unfortunately it only resulted in an error being displayed on the page.
-- EDIT #2 --
I got the server set up to accept a new Event Viewer source so I could write info to the Application log and see what was happening. It looks like this installation of ASP.NET does not accept "Contains" on a List object inside the Find expression; an error gets kicked out stating that the method is not available.
I will probably add a bit column to the table to flag errant rows and then skip them when I query the table, something like
&& !ErrantData
Another way, which requires a bit more upkeep but no database change, would be to keep a text file that gets updated periodically; you read it and remove users from the list based on it (see the sketch below).
The bigger issue is unknown rows creeping into your database. Changing user credentials and adding creation timestamps may help you narrow down where they come from.
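If you go the text-file route, a minimal sketch might look like the following; it assumes this runs in page code-behind (so Server.MapPath is available) and that the file lives at ~/App_Data/exclude.txt with one first name per line. Both of those are assumptions for the example.

// Sketch only: the "~/App_Data/exclude.txt" path and one-name-per-line format
// are assumptions. Requires System, System.IO, System.Linq and
// System.Collections.Generic.
var excludePath = Server.MapPath("~/App_Data/exclude.txt");

// Load the exclusion names; a missing file simply means nothing to filter.
var exclude = File.Exists(excludePath)
    ? new HashSet<string>(
        File.ReadAllLines(excludePath)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0),
        StringComparer.OrdinalIgnoreCase)
    : new HashSet<string>(StringComparer.OrdinalIgnoreCase);

// Filter in memory, after Find(...).ToList(), so it does not matter that the
// data provider cannot translate Contains() inside the Find() expression.
listUsers.RemoveAll(u => exclude.Contains(u.FirstName));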
We recently had a migration project that went badly wrong, and we now have thousands of duplicate records. The business has been working with them, which has made the issue worse: we now have records with the same name and address but possibly different contact information. A small number are exact duplicates. We have started the painful process of manually merging the records, but this is very slow. Can anyone suggest another way of tackling the problem?
You can quickly write a console app to merge them; refer to the MSDN sample code below.
Sample: Merge two records
// Create the target for the request.
EntityReference target = new EntityReference();
// Id is the GUID of the account that is being merged into.
// LogicalName is the type of the entity being merged to, as a string
target.Id = _account1Id;
target.LogicalName = Account.EntityLogicalName;
// Create the request.
MergeRequest merge = new MergeRequest();
// SubordinateId is the GUID of the account merging.
merge.SubordinateId = _account2Id;
merge.Target = target;
merge.PerformParentingChecks = false;
// Execute the request.
MergeResponse merged = (MergeResponse)_serviceProxy.Execute(merge);
When merging two records, you specify one record as the master record, and Microsoft Dynamics CRM treats the other record as the child or subordinate record. It deactivates the child record and copies all of the related records (such as activities, contacts, addresses, cases, notes, and opportunities) to the master record.
Building on @Arun Vinoth's answer, you might want to see what you can leverage with the out-of-the-box duplicate detection to get sets of duplicates to feed into the merge automation.
Alternatively you can build your own dupe detection to match records on the various fields where you know dupes exist. I've done similar things to compare records across systems, including creating match codes to mimic how Microsoft does their dupe detection in CRM.
For example, a contact's match codes might be:
1. the email address
2. the first name, last name, and company concatenated together without spaces.
If you need to match companies, you can implement an algorithm like Scribe's stripcompany to generate match codes based on company names.
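For illustration, here is a rough sketch of building such match codes in C#; the normalization rules below are assumptions for the example, not the actual CRM or Scribe algorithms.

using System;
using System.Linq;

// Sketch only: normalization rules are illustrative assumptions.
public static class MatchCodes
{
    // Lowercase and strip everything but letters and digits so trivially
    // different spellings produce the same key.
    private static string Normalize(string value) =>
        new string((value ?? string.Empty)
            .ToLowerInvariant()
            .Where(char.IsLetterOrDigit)
            .ToArray());

    // Match code 1: the email address.
    public static string ByEmail(string email) => Normalize(email);

    // Match code 2: first name, last name, and company concatenated without spaces.
    public static string ByNameAndCompany(string first, string last, string company) =>
        Normalize(first) + Normalize(last) + Normalize(company);
}

Group records by either match code; any group with more than one member is a candidate duplicate set you can feed into the MergeRequest sample above.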
Since this seems like a huge problem, you may want to consider drastic solutions: deactivate the entire polluted data set, redo the data import cleanly, find any of the deactivated records that were touched in the interim and merge them, and then delete the entire polluted (deactivated) data set.
Bottom line, all paths seem to lead to major headaches and the only consolation is that you get to choose which path to follow.
I have a document I want to upsert. It has a unique index on one of the properties, so I have something like this to ensure I get no collisions
var barVal = 1;
collection.UpdateOne(
    x => x.Bar == barVal,
    new UpdateDefinitionBuilder<Foo>().Set(x => x.Bar, barVal),
    new UpdateOptions { IsUpsert = true });
But I seem to sometimes get collisions from this on the unique index on bar.
Is MongoDB atomic around upserts, so that if the filter matches, the document can't be changed before the update completes?
If it is, I probably have a problem somewhere else; if it's not, I need to handle that.
The docs don't seem to suggest that it is one way or the other.
https://docs.mongodb.com/v3.2/reference/method/Bulk.find.upsert/
https://docs.mongodb.com/v3.2/reference/method/db.collection.update/
Actually, the docs do say something about this. Here is what I found in db.collection.update#use-unique-indexes:
To avoid inserting the same document more than once, only use upsert: true if the query field is uniquely indexed.
...
With a unique index, if multiple applications issue the same update with upsert: true, exactly one update() would successfully insert a new document.
The remaining operations would either:
update the newly inserted document, or
fail when they attempted to insert a duplicate.
If the operation fails because of a duplicate index key error, applications may retry the operation which will succeed as an update operation.
So, if you have created a unique index on the field you are querying, the behavior is well defined: at most one of the concurrent upserts inserts a new document, and any that lose the race fail with a duplicate key error and can simply be retried, at which point they run as updates.
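Applied to the snippet from the question, that retry could look roughly like the sketch below (it reuses the Foo/Bar/barVal names; the duplicate key error is expected when another client wins the race, and the second attempt runs as a plain update):

// Sketch only: retries the question's upsert once if it loses the race and
// hits the unique index on Bar.
for (var attempt = 0; ; attempt++)
{
    try
    {
        collection.UpdateOne(
            x => x.Bar == barVal,
            new UpdateDefinitionBuilder<Foo>().Set(x => x.Bar, barVal),
            new UpdateOptions { IsUpsert = true });
        break;
    }
    catch (MongoWriteException ex) when (
        ex.WriteError != null &&
        ex.WriteError.Category == ServerErrorCategory.DuplicateKey &&
        attempt == 0)
    {
        // Lost the race: another client inserted the document first. Loop once
        // more; the filter now matches, so the call runs as an update.
    }
}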
I have a .NET application written in C#, and use Mongo for my database backend. One of my collections, UserSearchTerms, repeatedly (and unintentionally) has duplicate documents created.
I've teased out the problem to an update function that gets called asynchronously, and can be called multiple times simultaneously. In order to avoid problems with concurrent runs, I've implemented this code using an update which I trigger on any documents that match a specific query (unique on user and program), upserting if no documents are found.
Initially, I can guarantee that no duplicates exist and so expect that only the following two cases can occur:
No matching documents exist, triggering an upsert to add a new document
One matching document exists, and so an update is triggered only on that one document
Given these two cases, I expect that there would be no way for duplicate documents to be inserted through this function - the only time a new document should be inserted is if there are none to begin with. Yet over an hour or so, I've found that even though documents for a particular user/program pair exist, new documents for them are created.
Am I implementing this update correctly to guarantee that duplicate documents will not be created? If not, what is the proper way to implement an update in order to assure this?
This is the function in question:
public int UpdateSearchTerm(UserSearchTerm item)
{
    _userSearches = _uow.Db.GetCollection<UserSearchTerm>("UserSearchTerms");
    var query = Query.And(Query<UserSearchTerm>.EQ(ust => ust.UserId, item.UserId), Query<UserSearchTerm>.EQ(ust => ust.ProgramId, item.ProgramId));
    _userSearches.Update(query, Update<UserSearchTerm>.Replace(item), new MongoUpdateOptions { Flags = UpdateFlags.Upsert });
    return (int)_userSearches.Count(query);
}
Additional Information:
I'm using mongod version 2.6.5
The mongocsharpdriver version I'm using is 1.9.2
I'm running .NET 4.5
UserSearchTerms is the collection I store these documents in.
The query is intended to match users on both userId AND programId - my definition of a 'unique' document.
I return a count after the fact for debugging purposes.
You could add a unique compound index on userId and programId to ensure that no duplicates are inserted.
Doc : https://docs.mongodb.org/v2.4/tutorial/create-a-unique-index/
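With the 1.9.2 mongocsharpdriver from the question, creating that compound unique index could look roughly like this sketch (run it once, e.g. at startup, and note that any existing duplicates have to be cleaned up before the index build will succeed):

// Sketch only (legacy 1.9.x driver, reusing the question's types and field names).
// With this compound unique index in place, a racing upsert that loses cannot
// silently create a second document for the same UserId/ProgramId pair; it fails
// with a duplicate key error and can simply be retried as an update.
_userSearches.CreateIndex(
    IndexKeys<UserSearchTerm>.Ascending(ust => ust.UserId, ust => ust.ProgramId),
    IndexOptions.SetUnique(true));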
In MongoDB, accessed from the C# driver:
I want to keep a list of keys (ints are fine), each of which has a current value (a Dictionary<int,int> models the concept well).
I need to have multiple (10+) machines setting values in this Document. Multiple threads on each machine.
Using C# and the MongoDB driver I need to:
If the key exists, increment the value for that key.
If it does not exist, I need to Add it, and set the value to 1.
Critically Important:
It can't overwrite values others are writing (i.e., no fetching the document and calling Save() to write it back).
It has to handle adding values that don't already exist gracefully.
I might be able to run a query that inserts a new document with all of the keys set to 0, if that would help, but it won't be easy, so it is not my preferred answer.
I've tried using a Dictionary, and can't seem to figure out how to update it without the insert creating entries like:
null,
{ v=1}
which are missing the k= part and contain the null that I don't want, and which cause deserialization to blow up.
I don't care what method of serialization is used for the dictionary, and am open to any other method of storage.
Any ideas?
My best guess so far is to keep the keys in a list that is separate from the values (two Lists: append the key to the key list if it isn't found, then re-query the key list and use its position as the index into the second list). This seems like it might have concurrency issues that could be hard to track down.
I would prefer the LINQ syntax, but am open to using the .Set(string, value) syntax if that makes things work.
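Not from the thread, but a common pattern that seems to fit: store the dictionary as a subdocument (keys become field names) and let the server do the arithmetic with $inc plus an upsert, so nothing is read, modified and saved back on the client. The sketch below uses the 2.x driver and made-up names (CounterDoc, Counters, Increment); keys end up stored as strings such as "5", which can be parsed back to ints on read.

using System.Collections.Generic;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Bson.Serialization.Options;
using MongoDB.Driver;

// Sketch only: CounterDoc, Counters and Increment are hypothetical names.
// The dictionary is stored as a plain subdocument, e.g. { "Counters": { "5": 3 } },
// so a single $inc can target "Counters.<key>" entirely on the server.
public class CounterDoc
{
    public string Id { get; set; }

    [BsonDictionaryOptions(DictionaryRepresentation.Document)]
    public Dictionary<string, int> Counters { get; set; } = new Dictionary<string, int>();
}

public static class CounterStore
{
    // Atomically add 1 to the given key, creating the key (and the document) if it
    // does not exist yet. Safe from many machines/threads: the increment happens on
    // the server, so nothing another client wrote is read back and overwritten.
    public static void Increment(IMongoCollection<CounterDoc> collection, string docId, int key)
    {
        var filter = Builders<CounterDoc>.Filter.Eq(d => d.Id, docId);
        var update = Builders<CounterDoc>.Update.Inc("Counters." + key, 1);

        collection.UpdateOne(filter, update, new UpdateOptions { IsUpsert = true });
    }
}

The legacy 1.x driver supports the same pattern via Update.Inc("Counters." + key, 1) together with UpdateFlags.Upsert.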