C# Out of Memory retrieving entity details from Dynamics 365

Since I have millions of records in Dynamics 365 and I need to retrieve them all, I am using paging to retrieve the entity data:
// Query using the paging cookie.
// Define the paging attributes.
// The number of records per page to retrieve.
int queryCount = 5000;
// Initialize the page number.
int pageNumber = 1;
// Initialize the number of records.
int recordCount = 0;
// Define the condition expression for retrieving records.
ConditionExpression pagecondition = new ConditionExpression();
pagecondition.AttributeName = "parentaccountid";
pagecondition.Operator = ConditionOperator.Equal;
pagecondition.Values.Add(_parentAccountId);
// Define the order expression to retrieve the records.
OrderExpression order = new OrderExpression();
order.AttributeName = "name";
order.OrderType = OrderType.Ascending;
// Create the query expression and add condition.
QueryExpression pagequery = new QueryExpression();
pagequery.EntityName = "account";
pagequery.Criteria.AddCondition(pagecondition);
pagequery.Orders.Add(order);
pagequery.ColumnSet.AddColumns("name", "emailaddress1");
// Assign the pageinfo properties to the query expression.
pagequery.PageInfo = new PagingInfo();
pagequery.PageInfo.Count = queryCount;
pagequery.PageInfo.PageNumber = pageNumber;
// The current paging cookie. When retrieving the first page,
// pagingCookie should be null.
pagequery.PageInfo.PagingCookie = null;
Console.WriteLine("Retrieving sample account records in pages...\n");
Console.WriteLine("#\tAccount Name\t\tEmail Address");
while (true)
{
// Retrieve the page.
EntityCollection results = _serviceProxy.RetrieveMultiple(pagequery);
if (results.Entities != null)
{
// Retrieve all records from the result set.
foreach (Account acct in results.Entities)
{
Console.WriteLine("{0}.\t{1}\t{2}", ++recordCount, acct.Name,
acct.EMailAddress1);
}
}
// Check for more records, if it returns true.
if (results.MoreRecords)
{
Console.WriteLine("\n****************\nPage number {0}\n****************", pagequery.PageInfo.PageNumber);
Console.WriteLine("#\tAccount Name\t\tEmail Address");
// Increment the page number to retrieve the next page.
pagequery.PageInfo.PageNumber++;
// Set the paging cookie to the paging cookie returned from current results.
pagequery.PageInfo.PagingCookie = results.PagingCookie;
}
else
{
// If no more records are in the result nodes, exit the loop.
break;
}
}
After a couple of hours of pulling data, the program ends with an OutOfMemoryException. It would appear that memory usage grows with every new page. How does one release the memory to avoid this problem?

It's good that you are retrieving only a couple of columns. Getting all the columns would certainly increase memory usage.
The way you're processing each page, it would seem that the runtime should release each results object after you output it to the console.
It seems that something might be causing each results object to stay on the heap.
Maybe it's the reuse of the same pagequery object.
The only thing that looks like it persists from the results is the paging cookie.
pagequery.PageInfo.PagingCookie = results.PagingCookie;
Maybe try moving the creation of the QueryExpression into a method and creating a new one for each page.
For example:
private QueryExpression getQuery(int pageNum, string pagingCookie)
{
//all query creation code
}
What is going on with the condition for the parentaccountid? Does a single parent have millions of child accounts?
You might also want to consider tweaking the while loop:
Note that continue is a reserved keyword in C#, so the flag needs another name:
var moreRecords = true;
while (moreRecords)
{
//process
moreRecords = results.MoreRecords;
}
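Putting both suggestions together, a sketch might look like this. This is a hedged rework of the question's own loop: the query-building code is lifted from the question, getQuery is the hypothetical helper named above, and _serviceProxy and _parentAccountId are assumed from the question.

```csharp
// Build a fresh QueryExpression per page so no state carries over between pages.
private QueryExpression getQuery(int pageNum, string pagingCookie)
{
    var pagequery = new QueryExpression("account");
    pagequery.Criteria.AddCondition("parentaccountid", ConditionOperator.Equal, _parentAccountId);
    pagequery.AddOrder("name", OrderType.Ascending);
    pagequery.ColumnSet.AddColumns("name", "emailaddress1");
    pagequery.PageInfo = new PagingInfo
    {
        Count = 5000,
        PageNumber = pageNum,
        PagingCookie = pagingCookie   // null for the first page
    };
    return pagequery;
}

// Paging loop: a new query object per iteration.
int pageNumber = 1;
string cookie = null;
bool moreRecords = true;
while (moreRecords)
{
    EntityCollection results = _serviceProxy.RetrieveMultiple(getQuery(pageNumber, cookie));
    foreach (Account acct in results.Entities)
    {
        Console.WriteLine("{0}\t{1}", acct.Name, acct.EMailAddress1);
    }
    cookie = results.PagingCookie;
    pageNumber++;
    moreRecords = results.MoreRecords;
}
```

With this shape, each results object and each query object can fall out of scope after the iteration, which gives the garbage collector every chance to reclaim the page.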

Related

Find TOP (1) for each ID in array

I have a large (60m+) document collection, whereby each ID has many records in time series. Each record has an IMEI identifier, and I'm looking to select the most recent record for each IMEI in a given List<Imei>.
The brute force method is what is currently happening, whereby I create a loop for each IMEI and yield out the top most record, then return a complete collection after the loop completes. As such:
List<BsonDocument> documents = new List<BsonDocument>();
foreach(var config in imeiConfigs)
{
var filter = GetImeiFilter(config.IMEI);
var sort = GetImeiSort();
var data = _historyCollection.Find(filter).Sort(sort).Limit(1).FirstOrDefault();
documents.Add(data);
}
The end result is a List<BsonDocument> which contains the most recent BsonDocument for each IMEI, but it's not massively performant. If imeiConfigs is too large, the query takes a long time to run and return as the documents are rather large.
Is there a way to select the TOP 1 for each IMEI in a single query, as opposed to brute forcing like I am above?
Have you tried using the LINQ Take function?
List<BsonDocument> documents = new List<BsonDocument>();
foreach(var config in imeiConfigs)
{
var filter = GetImeiFilter(config.IMEI);
var sort = GetImeiSort();
var data = _historyCollection.Find(filter).Sort(sort).Take(1).FirstOrDefault();
documents.Add(data);
}
https://learn.microsoft.com/es-es/dotnet/api/system.linq.enumerable.take?view=netframework-4.8
I think the bad performance comes from Sort(sort), because the sorting forces it to go through the whole collection.
But perhaps you can improve time performance with parallelism:
List<BsonDocument> documents;
documents = imeiConfigs.AsParallel().Select((config) =>
{
var filter = GetImeiFilter(config.IMEI);
var sort = GetImeiSort();
var data = _historyCollection.Find(filter).Sort(sort).Limit(1).FirstOrDefault();
return data;
}).ToList();

Is a MongoDB bulk upsert possible? C# Driver

I'd like to do a bulk upsert in Mongo. Basically I'm getting a list of objects from a vendor, but I don't know which ones I've gotten before (and need to be updated) vs which ones are new. One by one I could do an upsert, but UpdateMany doesn't work with upsert options.
So I've resorted to selecting the documents, updating in C#, and doing a bulk insert.
public async Task BulkUpsertData(List<MyObject> newUpsertDatas)
{
var usernames = newUpsertDatas.Select(p => p.Username);
var filter = Builders<MyObject>.Filter.In(p => p.Username, usernames);
//Find all records that are in the list of newUpsertDatas (these need to be updated)
var collection = Db.GetCollection<MyObject>("MyCollection");
var existingDatas = await collection.Find(filter).ToListAsync();
//loop through all of the new data,
foreach (var newUpsertData in newUpsertDatas)
{
//and find the matching existing data
var existingData = existingDatas.FirstOrDefault(p => p.Id == newUpsertData.Id);
//If there is existing data, preserve the date created (there are other fields I preserve)
if (existingData == null)
{
newUpsertData.DateCreated = DateTime.Now;
}
else
{
newUpsertData.Id = existingData.Id;
newUpsertData.DateCreated = existingData.DateCreated;
}
}
await collection.DeleteManyAsync(filter);
await collection.InsertManyAsync(newUpsertDatas);
}
Is there a more efficient way to do this?
EDIT:
I did some speed tests.
In preparation I inserted 100,000 records of a pretty simple object. Then I upserted 200,000 records into the collection.
Method 1 is as outlined in the question. SelectMany, update in code, DeleteMany, InsertMany. This took approximately 5 seconds.
Method 2 was making a list of UpdateOneModel with Upsert = true and then doing one BulkWriteAsync. This was super slow. I could see the count in the mongo collection increasing so I know it was working. But after about 5 minutes it had only climbed to 107,000 so I canceled it.
I'm still interested if anyone else has a potential solution
Given that you've said you could do a one-by-one upsert, you can achieve what you want with BulkWriteAsync. This allows you to create one or more instances of the abstract WriteModel, which in your case would be instances of UpdateOneModel.
In order to achieve this, you could do something like the following:
var listOfUpdateModels = new List<UpdateOneModel<T>>();
// ...
var updateOneModel = new UpdateOneModel<T>(
Builders<T>.Filter. /* etc. */,
Builders<T>.Update. /* etc. */)
{
IsUpsert = true
};
listOfUpdateModels.Add(updateOneModel);
// ...
await mongoCollection.BulkWriteAsync(listOfUpdateModels);
The key to all of this is the IsUpsert property on UpdateOneModel.
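Applied to the question's MyObject scenario, a sketch might look like this. The field names come from the question; matching on Username and using SetOnInsert to preserve DateCreated are assumptions about what you would filter and set.

```csharp
using MongoDB.Driver;

var models = new List<WriteModel<MyObject>>();
foreach (var item in newUpsertDatas)
{
    // Match an existing document by username (assumed key from the question).
    var filter = Builders<MyObject>.Filter.Eq(p => p.Username, item.Username);
    // SetOnInsert only applies when the upsert inserts, so an existing
    // document keeps its original DateCreated.
    var update = Builders<MyObject>.Update
        .Set(p => p.Username, item.Username)
        .SetOnInsert(p => p.DateCreated, DateTime.Now);
    models.Add(new UpdateOneModel<MyObject>(filter, update) { IsUpsert = true });
}
var collection = Db.GetCollection<MyObject>("MyCollection");
await collection.BulkWriteAsync(models);
```

SetOnInsert also sidesteps the "preserve the date created" pre-pass from the question, since the server decides insert vs. update per document.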

How to retrieve records more than 4000 from Raven DB in Single Session [duplicate]

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it together with Skip(n) to get all of them:
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
RavenQueryStatistics stats;
do
{
nextGroupOfPoints = session.Query<T>().Statistics(out stats).Where(whereClause).Skip(i * ElementTakeCount + skipResults).Take(ElementTakeCount).ToList();
i++;
skipResults += stats.SkippedResults;
points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued over them.
If you are getting more than 10 of anything from the store (even less than the default 128) for human consumption, then something is wrong, or your problem requires different thinking than a truckload of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you have need to perform data aggregation, create map/reduce index which results in aggregated data e.g.:
Index (map):
from post in docs.Posts
select new { post.Author, Count = 1 }
Index (reduce):
from result in results
group result by result.Author into g
select new
{
Author = g.Key,
Count = g.Sum(x => x.Count)
}
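Defined in code with the client API, that same map/reduce index might look like the following sketch (the class and document type names are assumed for illustration):

```csharp
public class Posts_ByUser_Count : AbstractIndexCreationTask<Post, AuthorPostStats>
{
    public Posts_ByUser_Count()
    {
        // Map: emit one entry per post.
        Map = posts => from post in posts
                       select new { post.Author, Count = 1 };
        // Reduce: sum the per-post counts into one total per author.
        Reduce = results => from result in results
                            group result by result.Author into g
                            select new { Author = g.Key, Count = g.Sum(x => x.Count) };
    }
}
```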
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count").Where(x => x.Author == author).ToList();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

ActiveDirectory with Range not changing results using DirectorySearcher

So I'm basically trying to enumerate results from AD, and for some reason I'm unable to pull down new results, meaning it keeps continuously pulling the first 1500 results even though I tell it I want an additional range.
Can someone point out where I'm making the mistake? The code never breaks out of the loop but more importantly it pulls users 1-1500 even when I say I want users 1500-3000.
uint rangeStep = 1500;
uint rangeLow = 0;
uint rangeHigh = rangeLow + (rangeStep - 1);
bool lastQuery = false;
bool quitLoop = false;
do
{
string attributeWithRange;
if (!lastQuery)
{
attributeWithRange = String.Format("member;Range={0}-{1}", rangeLow, rangeHigh);
}
else
{
attributeWithRange = String.Format("member;Range={0}-*", rangeLow);
}
DirectoryEntry dEntryhighlevel = new DirectoryEntry("LDAP://OU=C,OU=x,DC=h,DC=nt");
DirectorySearcher dSeacher = new DirectorySearcher(dEntryhighlevel,"(&(objectClass=user)(memberof=CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt))",new string[] {attributeWithRange});
dSeacher.PropertiesToLoad.Add("givenname");
dSeacher.PropertiesToLoad.Add("sn");
dSeacher.PropertiesToLoad.Add("samAccountName");
dSeacher.PropertiesToLoad.Add("mail");
dSeacher.PageSize = 1500;
SearchResultCollection resultCollection = dSeacher.FindAll();
dSeacher.Dispose();
foreach (SearchResult userResults in resultCollection)
{
string Last_Name = userResults.Properties["sn"][0].ToString();
string First_Name = userResults.Properties["givenname"][0].ToString();
string userName = userResults.Properties["samAccountName"][0].ToString();
string Email_Address = userResults.Properties["mail"][0].ToString();
OriginalList.Add(Last_Name + "|" + First_Name + "|" + userName + "|" + Email_Address);
}
if(resultCollection.Count == 1500)
{
lastQuery = true;
rangeLow = rangeHigh + 1;
rangeHigh = rangeLow + (rangeStep - 1);
}
else
{
quitLoop = true;
}
}
while (!quitLoop);
You're mixing up two concepts which is what is causing you trouble. This is a FAQ on the SO forums so I probably should blog on this to try and clear things up.
Let me first just explain the concepts, then correct the code once the concepts are out there.
Concept one is fetching large collections of objects. When you fetch a lot of objects, you need to ask for them in batches. This is typically called "paging" through the results. When you do this you'll get back a paging cookie and can pass back the paged control in subsequent searches to keep getting a "page worth" of results with each pass.
The second concept is fetching large numbers of values from a single attribute. The simple example of this is reading the member attribute from a group (ex: doing a base search for that group). This is called "ranged retrieval." In this search mode you are doing a base search against that object for the large attribute (like member) and asking for "ranges" of values with each passing search.
The code above confuses these concepts. You are doing member range logic like you are doing range retrieval but you are in fact doing a search that is constructed to return a large # of objects like a paged search. This is why you are getting the same results over and over.
To fix this you need to first pick an approach. :) I recommend range retrieval against the group object and asking for the large member attribute in ranges. This will get you all of the members in the group.
If you go down this path, you'll notice you can't ask for attributes of those values. The only value you get is the list of members, and you then have to do follow-up searches for their attributes.
If you opt to stick with paged searches, then you'll need to:
Get rid of the Range logic, and all mentions of 1500
Set a page size of something like 1000
Instead of ranging, look up how to do paged searches (using the page search control) using your API
If you pick ranging, you'll switch from a memberOf search like this to a search of the form:
a) scope: base
b) filter: (objectclass=*)
c) base DN: OU=C,OU=x,DC=h,DC=nt
d) Attributes: member;Range=0-*
...then you will increment the 0 up as you fetch ranges of values (ie do this search over and over again for each subsequent range of values, changing only the 0 to subsequent integers)
Other nits you'll notice in my logic:
- I don't set a page size... you're not doing a paged search, so it doesn't matter.
- I don't ever hard-code the value 1500 here. It doesn't matter; there is no value in knowing or even computing it. The point is that you asked for 0-* (i.e. all), you got back 1500, so then you ask for 1500-*, then 3000-*, and so on. You don't need to know the range size, only what you have been given so far.
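Going down the ranged-retrieval path with the System.DirectoryServices classes, a sketch of that loop might look like this. The group DN is taken from the question's memberof filter; this is an illustration of the pattern above, not tested code.

```csharp
using System;
using System.Collections.Generic;
using System.DirectoryServices;
using System.Linq;

uint rangeStep = 1500;
uint rangeLow = 0;
bool moreValues = true;
var members = new List<string>();

// Base search against the group object itself (DN assumed from the question).
using (var group = new DirectoryEntry("LDAP://CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt"))
{
    while (moreValues)
    {
        // Ask for the next slice of the member attribute.
        string attr = String.Format("member;range={0}-{1}", rangeLow, rangeLow + rangeStep - 1);
        using (var searcher = new DirectorySearcher(group, "(objectClass=*)", new[] { attr }))
        {
            searcher.SearchScope = SearchScope.Base;
            SearchResult result = searcher.FindOne();

            // The server echoes the attribute back, renaming the final slice
            // to "member;range=N-*".
            string returned = result.Properties.PropertyNames
                .Cast<string>()
                .FirstOrDefault(p => p.StartsWith("member;range=", StringComparison.OrdinalIgnoreCase));
            if (returned == null)
                break;                                 // no values left

            foreach (object dn in result.Properties[returned])
                members.Add(dn.ToString());

            moreValues = !returned.EndsWith("*");      // "*" marks the last slice
            rangeLow += rangeStep;
        }
    }
}
```

Each pass returns only member DNs; fetching givenname, sn, etc. for those members then requires follow-up lookups, as noted above.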
I hope this fully answers it...
Here is a code snip of doing a paged search, per my comment below (this is what you would need to do using the System.DirectoryServices.Protocols namespace classes, going down the logical path you started above (paged searches, not ranged retrieval)):
string searchFilter = "(&(objectClass=user)(memberof=CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt))";
string baseDN = "OU=C,OU=x,DC=h,DC=nt";
var scope = SearchScope.Subtree;
var attributeList = new string[] { "givenname", "sn", "samAccountName", "mail" };
PageResultRequestControl pageSearchControl = new PageResultRequestControl(1000);
do
{
SearchRequest sr = new SearchRequest(baseDN, searchFilter, scope, attributeList);
sr.Controls.Add(pageSearchControl);
var directoryResponse = ldapConnection.SendRequest(sr);
if (directoryResponse.ResultCode != ResultCode.Success)
{
// Handle error
}
var searchResponse = (SearchResponse)directoryResponse;
pageSearchControl = null; // Reset!
foreach (var control in searchResponse.Controls)
{
if (control is PageResultResponseControl)
{
var prrc = (PageResultResponseControl)control;
if (prrc.Cookie.Length > 0)
{
pageSearchControl = new PageResultRequestControl(prrc.Cookie);
}
}
}
foreach (var entry in searchResponse.Entries)
{
// Handle the search result entry
}
} while (pageSearchControl != null);
Your problem is caused by creating a new DirectorySearcher object inside the loop. Each time, the new object will take the first 1500 records. Create the searcher instance outside the loop and use the same instance for all queries.

Finalise SQLite 3 statement

I'm developing a Metro app using the Windows 8 Release Preview and C# (VS 2012). I'm new to SQLite; I integrated SQLite 3.7.13 in my app and it is working fine. Observe my code below:
var dbPath = Path.Combine(Windows.Storage.ApplicationData.Current.LocalFolder.Path, "Test.db");
using (var db = new SQLite.SQLiteConnection(dbPath))
{
var data = db.Table<tablename>().Where(tablename => tablename.uploaded_bool == false && tablename.Sid == 26);
try
{
int iDataCount = data.Count();
int id;
if (iDataCount > 0)
{
for (int i = 0; i < iDataCount; i++)
{
Elements = data.ElementAt(i);
id = Elements.id;
/*
Doing some code
*/
}
int i = db.Delete<tablename>(new tablename() { Sid = 26 });
}
}
catch (Exception ex)
{
}
}
where "Sid" is a column in my database, and with the number "26" I will get n number of rows.
So, using a for loop I need to do some code, and after the for loop I need to delete the records with Sid = 26 from the database. So at this line
int i = db.Delete<tablename>(new tablename() { Sid = 26 });
I'm getting an "unable to close due to unfinalised statements" exception. So my question is: how do I finalise the statement in SQLite 3? Apparently SQLite 3 has a finalize method for destroying previous DB calls, but I am not sure how to implement it. Please help me.
Under the covers sqlite-net does some amazing things in an attempt to manage queries and connections for you.
For example, the line
var data = db.Table<tablename>().Where(...)
Does not actually establish a connection or execute anything against the database. Instead, it creates an instance of a class called TableQuery which is enumerable.
When you call
int iDataCount = data.Count();
TableQuery actually executes
GenerateCommand("count(*)").ExecuteScalar<int>();
When you call
Elements = data.ElementAt(i);
TableQuery actually calls
return Skip(index).Take(1).First();
Take(1).First() eventually calls GetEnumerator, which compiles a SQLite command, executes it with a limit of 1, and serializes the result back into your data class.
So, basically, every time you call data.ElementAt you are executing another query. This is different from standard .NET enumerations where you are just accessing an element in a collection or array.
I believe this is the root of your problem. I would recommend that instead of getting the count and using a for(i, ...) loop, you simply do foreach (tablename tn in data). This will cause all records to be fetched at once instead of record by record in the loop. That alone may be enough to close the query and allow you to delete the table during the loop. If not, I recommend you create a collection and add each SID to the collection during the loop. Then, after the loop go back and remove the sids in another pass. Worst case scenario, you could close the connection between loops.
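A sketch of that suggested rework, using the table and column names from the question (and assuming the loop body doesn't otherwise need the connection):

```csharp
var dbPath = Path.Combine(Windows.Storage.ApplicationData.Current.LocalFolder.Path, "Test.db");
using (var db = new SQLite.SQLiteConnection(dbPath))
{
    // One query: materialize all matching rows up front, so no statement
    // is still open when Delete runs.
    List<tablename> rows = db.Table<tablename>()
        .Where(t => t.uploaded_bool == false && t.Sid == 26)
        .ToList();

    foreach (tablename row in rows)
    {
        int id = row.id;
        /*
        Doing some code
        */
    }

    // Safe now: the earlier query has been fully enumerated and finalized.
    db.Delete<tablename>(new tablename() { Sid = 26 });
}
```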
Hope that helps.
