db4o: serious problems and incoherences - c#

I am trying to figure out what happens when I tune my db4o instance, but this is really driving me crazy: it simply does not make sense to me.
Essentially I am creating two objects and storing the first in an ArrayList that belongs to the second. Then I want to remove the first object both from the whole database and from the list where I initially stored it.
Here is a basic list of the operations I am running.
User user = new User("user");
Device device = new Device("device");
user.getDeviceList().add(device); // the device goes into the user's list (implied by the description above)
objectContainer.ext().store(user, 5); // object storing depth
objectContainer.commit();
objectContainer.delete(device);
//objectContainer.close();
//objectContainer = new ...
At this point, if I close and reopen the objectContainer, the user's deviceList contains a null entry, while if I don't close the container (which a normally running application should avoid anyway), the device object is still inside the user object, even though it is no longer in the database.
I just want the object to be removed both from the list and from the database, without any null entry left in its place. Is this possible? I have tried tuning the configuration (weakReferences, activations, constraints, ...) a lot, but without any success.

Why is the object still in the list before reopening, but a null entry after?
If you delete an object, it is deleted from the database.
However, db4o doesn't modify any in-memory content. So before reopening, the collection is the 'old' in-memory representation of that collection. It still contains the reference to the object; db4o won't remove it.
After reopening, the collection is loaded from the database. Since the object has been removed from the database, a 'null' reference is used for the object that no longer exists.
db4o won't 'magically' remove objects from your in-memory objects for you. You have to ensure that the object model is in a consistent state, like any other in-memory object graph.
Here are some tips: http://community.versant.com/documentation/reference/db4o-8.1/net/reference/Content/best_practises/managing_relations.htm
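As a rough illustration of keeping the model consistent yourself, here is a minimal C# sketch assuming the db4o .NET naming (Store/Delete/Commit), a DeviceList collection on your own User class, and a hypothetical LoadUser helper standing in for however you query the user:
User user = LoadUser(objectContainer);   // query the stored user (hypothetical helper)
Device device = user.DeviceList[0];      // the device to get rid of

user.DeviceList.Remove(device);          // 1) fix the in-memory relation
objectContainer.Store(user);             // 2) persist the updated list
objectContainer.Delete(device);          // 3) delete the device from the database
objectContainer.Commit();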

Related

How to get a backup List?

I want to make a backup List for Undo/Redo,
but the objects in the backup List change after I modify the objects in the original List.
How can I deal with this problem? "const" and "in" don't seem to help.
private List<GeminiFileStruct> BackUpForUndoRedo(List<GeminiFileStruct> gfl,
                                                 ToolStripMenuItem tm)
{
    var li =
        (from i in gfl
         select i).ToList();
    tm.Enabled = true;
    return li;
}
Sorry, it used to be a struct. That caused some problems, so I changed it to a class.
Can a struct have Get/Set?
I'm new to C#.
While Bernoulli IT describes the actual problem with using a shallow copy, I wanted to provide some more background on undo/redo. There are two main approaches:
Memento pattern. Before changing an object, a memento is created that can be used to restore the state of that object. This can be applied to the whole application: before any change, the application state is serialized, just as it would be if the user saved it to a file, and that serialized state can later be restored, just like loading a file. This assumes there is a function to save to a file and that it represents the full application state. Note that serialization/deserialization implicitly creates a deep copy (a minimal sketch follows after this list).
Command pattern. Each change is done by a command object that knows how to reverse the change. A downside is that it can be complicated to make sure all actions generate these command objects correctly.
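Here is a rough, serialization-based sketch of the memento idea, assuming GeminiFileStruct is now a plain class with public get/set properties and a parameterless constructor, and that System.Text.Json (or any other serializer) is available:
using System.Collections.Generic;
using System.Text.Json;

static class UndoHelper
{
    // Deep copy via serialization: fresh instances with the same property values.
    public static List<GeminiFileStruct> TakeSnapshot(List<GeminiFileStruct> original)
    {
        string json = JsonSerializer.Serialize(original);
        return JsonSerializer.Deserialize<List<GeminiFileStruct>>(json);
    }
}
Push such a snapshot onto an undo stack before every edit; undo is then just restoring the most recent snapshot.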
What you need is a so-called deep copy of the list:
Items in the backup list will be clones of the items in the original
list. Fresh new instances of items with identical properties.
Not a shallow copy:
A backup list with "just" references to the items in the original list. With a
shallow copy, a change made to item A through the backup list also shows up on
item A in the original list, because both lists reference the same instance 😉.
Have a look at this SO post or any of these web pages: tutorial 1, tutorial 2.
Deep copying is not a trivial programming technique as you will discover. But under the right assumptions in the right context it can be done safely.
Note
As #Llama points out, a deep copy of a list of structs is automagically obtained by doing new List<TStruct>(originalListWithStructs). A struct is a value type and behaves differently compared to a reference type.
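As a small illustration of that difference, here is a sketch with a hypothetical Point struct standing in for the struct version of GeminiFileStruct; copying the list copies the values, so the backup stays independent:
using System;
using System.Collections.Generic;

struct Point { public int X; public int Y; }

class StructCopyDemo
{
    static void Main()
    {
        var original = new List<Point> { new Point { X = 1, Y = 2 } };
        var backup   = new List<Point>(original); // elements are copied by value

        var p = original[0];
        p.X = 99;
        original[0] = p;                          // change the original list only

        Console.WriteLine(backup[0].X);           // prints 1: the backup is unaffected
    }
}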

Optimize nested loop

I'm having a memory issue in my application with a nested for loop and I can't figure out how to improve it. I've tried using LINQ, but I guess internally it's the same, because the memory leak is still there.
EDIT: As I've been requested, I'll provide more information about my problem.
I've got all of my customers (about 400,000) indexed in a Lucene document store. Each customer can be present in more than one agency, some of them in as many as 200-300 agencies.
I need to retrieve all of my customers from the 'global' customer index and build a separate index for each agency, containing only the customers it can see. There are business rules and security rules that need to be applied to each agency index, so right now I can't afford to maintain a single customer index for all my agencies.
My process looks like this:
int numDocuments = 400000;

// Get a Lucene Index Searcher from an Index Factory
IndexSearcher searcher = SearcherFactory.Instance.GetSearcher(Enums.CUSTOMER);

// Builds a query that gets everything in the index
Query query = QueryHelper.GetEverythingQuery();
Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(query));

// Sorts by Agency Id
SortField sortField = new SortField("AgencyId", SortField.LONG);
Sort sort = new Sort(sortField);

TopDocs documents = searcher.Search(query, filter, numDocuments, sort);

for (int i = 0; i < numDocuments; i++)
{
    Document document = searcher.Doc(documents.scoreDocs[i].doc);

    // Builds a customer object from the lucene document
    Customer customer = new Customer(document);

    // If this nested loop is removed, the memory doesn't grow
    foreach (Agency agency in customer.Agencies)
    {
        // Gets a writer from a factory for the agency id.
        IndexWriter writer = WriterFactory.Instance.GetWriter(agency.Id);

        // Builds an agency-specific document from the customer
        Document customerDocument = customer.GetAgencyDocument(agency.Id);

        // Adds the document to the agency's lucene index
        writer.AddDocument(customerDocument);
    }
}
EDIT: The solution
The problem was that I wasn't reusing the instances of the "Document" object in the inner loop, which caused an indecent growth in the memory usage of my service. Just reusing a single instance of Document for the whole process solved my problem.
Thanks everyone.
What I believe to be happening here is:
You have too much object creation inside the loops. If at all possible, do not use the new keyword inside the loops. Initialize objects that are reusable across the loops and pass them the data to work on. Do not construct new objects inside that many nested loops, because garbage collection will become a serious problem; the garbage collector may not be able to keep up with you and will defer collection.
The first thing you can do to check whether this is true is to force garbage collection every X iterations and wait for pending finalizers. If this brings memory down, you know this is the problem, and solving it is easy: just don't create new instances on every loop iteration.
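If you want to verify this first, a rough diagnostic sketch (not production code) is to force a collection every N iterations of the outer loop from the question and watch whether the numbers stay flat:
// Sketch only: run inside the outer for loop; i is the loop counter from the question.
if (i % 10000 == 0)
{
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    Console.WriteLine("After {0} docs: {1} bytes", i, GC.GetTotalMemory(false));
}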
First you should re-use your Document and Field instances that you pass to IndexWriter.AddDocument() to minimize memory usage and relieve pressure on the garbage collector.
• Re-use Document and Field instances. As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc), and then re-add your Document instance.
Note that you cannot re-use a single Field instance within a Document, and you should not change a Field's value until the Document containing that Field has been added to the index.
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
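As a rough sketch of that advice applied to the inner loop, assuming a Lucene.NET 2.9/3.x-style API and illustrative field names (Id, Name) and customer properties (customer.Id, customer.Name):
// Create the Document and Field instances once, outside the loops.
Document reusableDoc = new Document();
Field idField   = new Field("Id",   "", Field.Store.YES, Field.Index.NOT_ANALYZED);
Field nameField = new Field("Name", "", Field.Store.YES, Field.Index.ANALYZED);
reusableDoc.Add(idField);
reusableDoc.Add(nameField);

// Inside the loops, only the field values change; no new objects per iteration.
foreach (Agency agency in customer.Agencies)
{
    IndexWriter writer = WriterFactory.Instance.GetWriter(agency.Id);
    idField.SetValue(customer.Id.ToString());
    nameField.SetValue(customer.Name);
    writer.AddDocument(reusableDoc);
}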
The key may be how you are initializing customers and customer.Agencies. If you can, rather than returning a type of List, make the return types IEnumerable<Customer> and IEnumerable<Agency>. This may allow deferred execution to happen, which should consume less memory, but may make the operation take longer.
Another option would be to run the code in batches, so use your code above, but populate List<Customer> customers in batches of, e.g., 10,000 at a time.
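A rough sketch of that batching idea, where LoadCustomers and IndexCustomer are hypothetical helpers wrapping the searcher paging and the per-agency indexing from the question:
const int batchSize = 10000;
for (int start = 0; start < numDocuments; start += batchSize)
{
    // Only batchSize Customer objects are alive at any one time.
    List<Customer> batch = LoadCustomers(start, Math.Min(batchSize, numDocuments - start));
    foreach (Customer customer in batch)
    {
        IndexCustomer(customer);
    }
    // batch goes out of scope here, so its customers become collectable.
}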
As #RedFilter said, try using IEnumerable along with the yield statement.
This may help:
http://csharpindepth.com/Articles/Chapter11/StreamingAndIterators.aspx
http://www.alteridem.net/2007/08/22/the-yield-statement-in-c/
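A minimal sketch of what that looks like here, reusing the searcher, documents and numDocuments variables from the code in the question:
// Streams customers one at a time instead of materializing them all up front.
IEnumerable<Customer> StreamCustomers(IndexSearcher searcher, TopDocs documents, int numDocuments)
{
    for (int i = 0; i < numDocuments; i++)
    {
        Document document = searcher.Doc(documents.scoreDocs[i].doc);
        yield return new Customer(document);   // only the current customer needs to stay alive
    }
}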
Looping through a list that is already loaded in memory does not change the amount of memory the list uses.
It must be something you are doing to the items in the list that is causing the memory usage.
You need to look at what you are trying to achieve and redesign your program so that it does not hold all the data in memory at the same time.
If you mean you want to reduce the memory usage, then the basic answer is to break it up.
So get all the customers for one agency into a CustomersForAgency collection, then process just that.
Clearing the CustomersForAgency collection, or letting it go out of scope, takes all those customers (and optionally that agency) out of scope, allowing .NET to reuse the memory.
That's assuming, of course, that the bulk of the memory allocation is for Customers and not for other persistent instances used for processing that you simplified out.

C#: Is there any way to easily find/update all references to an object?

I've been reading Rockford Lhotka's "Expert C# 2008 Business Objects", where there is such a thing as a data portal which nicely abstracts where the data comes from. When using the DataPortal.Update(this), which as you might guess persists 'this' to the database, an object is returned - the persisted 'this' with any changes the db made to it, eg. a timestamp.
Lhotka has written, often and very casually, that you have to make sure to update all references to the old object to point to the new returned object. Makes sense, but is there an easy way to find all references to the old object and change them? Obviously the GC tracks references; is it possible to tap into that?
Cheers
There are profiling APIs to do this, but nothing for general consumption. One possible solution, and one which I've used myself, is to implement in a base class a tracking mechanism where each instance of the object adds a WeakReference to itself to a static collection.
I have this conditionally compiled for DEBUG builds but it probably wouldn't be a good idea to rely on this in a release build.
// simplified example
// do not use as-is: performance would suck
abstract class MyCommonBaseClass {
    static readonly List<WeakReference> instances = new List<WeakReference>();

    protected MyCommonBaseClass() {
        lock (instances) {
            RemoveDeadOnes();
            instances.Add(new WeakReference(this));
        }
    }

    // added for completeness: prune entries whose targets have been collected
    static void RemoveDeadOnes() {
        instances.RemoveAll(wr => !wr.IsAlive);
    }
}
The GC doesn't actually track the references to objects. Instead, it calculates which objects are reachable, starting from global and stack roots at runtime and executing a variant of a "flood fill" algorithm.
Specifically for your problem, why not just have a proxy that holds the reference to the "real" object? That way you need to update it in only one place.
There isn't a simple way to do this directly, however, Son of Strike has this capability. It allows you to delve into all object references tracked by the CLR, and look at what objects are referencing any specific object, etc.
Here is a good tutorial for learning CLR debugging via SoS.
If you are passing object references around and those object references remain unchanged, then any changes made to the object in a persistence layer will be instantly visible to any other consumers of the object. However if your object is crossing a service boundary then the assemblies on each side of the object will be viewing different objects that are just carbon copies. Also if you have made clones of the object, or have created anonymous types that incorporate properties from the original object, then those will be tough to track down - and of course to the GC these are new objects that have no tie-in to the original object.
If you have some sort of key or ID in the object then this becomes easier. The key doesn't have to be a database ID, it can be a GUID that is new'ed up when the object is instantiated, and does not get changed for the entire lifecycle of the object (i.e. it is a property that has a getter but no setter) - as it is a property it will persist across service boundaries, so your object will still be identifiable. You can then use LINQ or even old-fashioned loops (icky!) to iterate through any collection that is likely to hold a copy of the updated object, and if one is found you can then merge the changes back in.
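A rough sketch of that idea, with hypothetical names (BusinessBase, someCachedList, MergeFrom) and requiring using System and using System.Linq:
public abstract class BusinessBase
{
    private readonly Guid instanceId = Guid.NewGuid();
    public Guid InstanceId { get { return instanceId; } }   // getter only, never changes
}

// After DataPortal.Update(...) returns the refreshed object 'updated':
var stale = someCachedList.FirstOrDefault(x => x.InstanceId == updated.InstanceId);
if (stale != null)
{
    stale.MergeFrom(updated);   // copy the refreshed values (timestamp, etc.) back in
}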
Having said this, I wouldn't think that you have too many copies floating around. If you do, then the places where these copies live should be very localized. Ensuring that your object implements INotifyPropertyChanged will also help propagate notifications of changes if you hold a list in one spot which is then bound, directly or indirectly, in several other spots.

how does linq2sql keep track of database objects?

When using Linq2sql everything automagically works. My experience is that going with the flow is not always the best solution, and that understanding how something works internally helps you use the technique optimally.
So, my question is about linq2sql.
If I do a query and get some database objects, or I create a new one, the context object somehow keeps references to these objects. If something changes in one of the objects, the context object 'knows' what has changed and needs updating.
If my references to the object are set to null, does this mean that the context object also removes its link to this object? Or is the context object slowly getting filled with tons of references, keeping my database objects from being garbage collected?
If not, how does this work?
Also, isn't it very slow for the context object to always go through the entire list to see what changed and to update it?
Any insight in how this works would be excellent!
thanks
Yes, the context keeps references to the loaded objects. That's one of the reasons why it isn't meant to be used as a single instance shared across different requests.
It keeps lists for the inserts/deletes. I am not sure whether it captures updates by adding them to a list, or whether it loops over everything at the end. But you shouldn't be loading large sets of data at a time anyway, because that alone would be a bigger hit to performance than any final check it might do on the list.
The DataContext registers for your objects' PropertyChanged event to know when they are modified. At that point it clones the original object and keeps the clone, so it can compare the two objects later when you call SubmitChanges().
If my references to the object are set to null, does this mean that the context object also removes its link to this object?
Edit: No. Sorry, in my original answer I had misinterpreted what you had written. In that case the data context still has a reference to both objects, but it will remove the relationship between those two objects on the next SubmitChanges().
Be careful though. If you created your own objects instead of using the ones generated from the .dbml, the "magic" that the datacontext performs might not work properly.
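For reference, a minimal sketch of the usual flow, with hypothetical generated types (NorthwindDataContext, Customer) coming from a .dbml:
using (var db = new NorthwindDataContext(connectionString))
{
    Customer c = db.Customers.Single(x => x.CustomerID == "ALFKI");
    c.ContactName = "New Name";   // the context is notified through the change events
    db.SubmitChanges();           // compares the tracked state and issues a single UPDATE
}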

(Commerce Server) How to get the PageGroup[] without instancing a new MarketingContext object

If you take a look at Tom Schultz's Blog, you'll see that he says that if you instance your own Context objects (such as the CommerceContext object), an instance of the SiteConfigReadOnlyFreeThreaded class is created in memory as well, and you can't do anything to destroy it. If you do this enough times, you'll eventually get warnings in your Application Log. Here's what the warning looks like:
The Commerce Server runtime has detected that more than # instances of the SiteConfigReadOnlyFreeThreaded object have been created. Creating many SiteConfigReadOnlyFreeThreaded instances will negatively affect the performance of the site. Please refer to the Commerce Server documentation for the recommended use of the SiteConfigReadOnlyFreeThreaded object.
You'll also see that Tom says to use the Current property of the Context objects to avoid this error, much like this:
ContentSelector cso = CommerceContext.Current.TargetingSystem.SelectionContexts["advertising"].GetSelector();
Doing so re-uses the same singleton instance to avoid re-creating the SiteConfigReadOnlyFreeThreaded object every time you instance a new CommerceContext class.
With me so far? Good :)
Here's what I'm really trying to do: Get a list of all of the Page Groups set up in the Marketing section of Commerce Server. As far as my knowledge goes, here's the only way to do it:
using (MarketingContext ctx = MarketingContext.Create("MyCommerceSite", "MyMarketingAuthorizationStore", AuthorizationMode.NoAuthorization))
{
PageGroup[] pageGroups = ctx.PageGroups.GetAllPageGroups();
}
As you can see, I'm creating a MarketingContext class, which also creates a SiteConfigReadOnlyFreeThreaded in memory as well, each time it's called (which happens to be frequently).
Is there a way to get the list of all page groups configured without instancing an entirely new MarketingContext object each time I want to do it?
I did some digging, and found the following:
By default, Microsoft sets the threshold for these warnings to even show up in the error log at 100. It turns out that the warnings are absolutely benign as long as the count stays consistently under 100.
In my case, I had set the threshold for showing the errors to 2, just to surface every instance that cropped up, regardless of whether it was a valid concern or not. I've since upped the limit back to 100, and I haven't seen any adverse effects so far.