I'm trying to enumerate results from AD, but I'm unable to pull down new results: it keeps returning the first 1500 results even though I tell it I want an additional range.
Can someone point out where I'm making the mistake? The code never breaks out of the loop, but more importantly it pulls users 1-1500 even when I ask for users 1500-3000.
uint rangeStep = 1500;
uint rangeLow = 0;
uint rangeHigh = rangeLow + (rangeStep - 1);
bool lastQuery = false;
bool quitLoop = false;
do
{
string attributeWithRange;
if (!lastQuery)
{
attributeWithRange = String.Format("member;Range={0}-{1}", rangeLow, rangeHigh);
}
else
{
attributeWithRange = String.Format("member;Range={0}-*", rangeLow);
}
DirectoryEntry dEntryhighlevel = new DirectoryEntry("LDAP://OU=C,OU=x,DC=h,DC=nt");
DirectorySearcher dSeacher = new DirectorySearcher(dEntryhighlevel,"(&(objectClass=user)(memberof=CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt))",new string[] {attributeWithRange});
dSeacher.PropertiesToLoad.Add("givenname");
dSeacher.PropertiesToLoad.Add("sn");
dSeacher.PropertiesToLoad.Add("samAccountName");
dSeacher.PropertiesToLoad.Add("mail");
dSeacher.PageSize = 1500;
SearchResultCollection resultCollection = dSeacher.FindAll();
dSeacher.Dispose();
foreach (SearchResult userResults in resultCollection)
{
string Last_Name = userResults.Properties["sn"][0].ToString();
string First_Name = userResults.Properties["givenname"][0].ToString();
string userName = userResults.Properties["samAccountName"][0].ToString();
string Email_Address = userResults.Properties["mail"][0].ToString();
OriginalList.Add(Last_Name + "|" + First_Name + "|" + userName + "|" + Email_Address);
}
if(resultCollection.Count == 1500)
{
lastQuery = true;
rangeLow = rangeHigh + 1;
rangeHigh = rangeLow + (rangeStep - 1);
}
else
{
quitLoop = true;
}
}
while (!quitLoop);
You're mixing up two concepts, which is what is causing you trouble. This is a FAQ on the SO forums, so I should probably blog about it to clear things up.
Let me first just explain the concepts, then correct the code once the concepts are out there.
Concept one is fetching large collections of objects. When you fetch a lot of objects, you need to ask for them in batches. This is typically called "paging" through the results. When you do this you'll get back a paging cookie and can pass back the paged control in subsequent searches to keep getting a "page worth" of results with each pass.
The second concept is fetching large numbers of values from a single attribute. The simple example of this is reading the member attribute from a group (ex: doing a base search for that group). This is called "ranged retrieval." In this search mode you are doing a base search against that object for the large attribute (like member) and asking for "ranges" of values with each passing search.
The code above confuses these concepts. You are doing member range logic like you are doing range retrieval but you are in fact doing a search that is constructed to return a large # of objects like a paged search. This is why you are getting the same results over and over.
To fix this you need to first pick an approach. :) I recommend range retrieval against the group object and asking for the large member attribute in ranges. This will get you all of the members in the group.
If you go down this path, you'll notice you can't ask for attributes of those members as part of the same search. The only value you get is the list of member DNs, and you can then do searches for them. If you need the users' attributes returned directly, you end up going with paged searches instead.
If you opt to stick with paged searches, then you'll need to:
Get rid of the Range logic, and all mentions of 1500
Set a page size of something like 1000
Instead of ranging, look up how to do paged searches (using the page search control) in your API; there is a sketch of this below
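For what it's worth, if you stay with DirectorySearcher from System.DirectoryServices (as in your code), setting PageSize is all that's needed: FindAll then transparently pages through the whole result set as you enumerate it. A rough sketch, reusing the filter and attributes from your code (not tested against your directory):
using (var dEntry = new DirectoryEntry("LDAP://OU=C,OU=x,DC=h,DC=nt"))
using (var dSearcher = new DirectorySearcher(dEntry,
    "(&(objectClass=user)(memberof=CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt))",
    new[] { "givenname", "sn", "samAccountName", "mail" }))
{
    dSearcher.PageSize = 1000; // enables paging; no Range attribute, no manual loop bookkeeping
    using (SearchResultCollection results = dSearcher.FindAll())
    {
        foreach (SearchResult user in results)
        {
            // read user.Properties["sn"], ["givenname"], etc. as before
        }
    }
}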
If you pick ranging, you'll switch from a memberOf search like this to a search of the form:
a) scope: base
b) filter: (objectClass=*)
c) base DN: the group's DN, i.e. CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt from your filter above
d) Attributes: member;Range=0-*
...then you will increment the 0 up as you fetch ranges of values (ie do this search over and over again for each subsequent range of values, changing only the 0 to subsequent integers)
Other nits you'll notice in my logic:
- I don't set page size...you're not doing a paged search, so it doesn't matter.
- I don't ever hard-code the value 1500 here. It doesn't matter; there is no value in knowing or even computing it. The point is that you asked for 0-* (i.e. all), you got back 1500, so then you ask for 1500-*, then 3000-*, and so on. You don't need to know the range size, only what you have been given so far.
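To make that concrete, here is a rough sketch (usings omitted, variable names mine) of the ranged-retrieval loop against the group from your filter, using System.DirectoryServices:
string groupPath = "LDAP://CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt"; // the group whose members you want
var memberDns = new List<string>();
uint rangeLow = 0;
bool moreRanges = true;
while (moreRanges)
{
    string rangeAttribute = String.Format("member;range={0}-*", rangeLow);
    using (var groupEntry = new DirectoryEntry(groupPath))
    using (var searcher = new DirectorySearcher(groupEntry, "(objectClass=*)", new[] { rangeAttribute }, SearchScope.Base))
    {
        SearchResult result = searcher.FindOne();
        // The server echoes back the range it actually returned, e.g. "member;range=0-1499",
        // or "member;range=1500-*" when this was the final chunk.
        string returnedAttribute = null;
        foreach (string name in result.Properties.PropertyNames)
        {
            if (name.StartsWith("member;range=", StringComparison.OrdinalIgnoreCase))
            {
                returnedAttribute = name;
            }
        }
        if (returnedAttribute == null)
        {
            break; // empty group, nothing to fetch
        }
        foreach (object dn in result.Properties[returnedAttribute])
        {
            memberDns.Add(dn.ToString());
        }
        if (returnedAttribute.EndsWith("*"))
        {
            moreRanges = false; // the server signalled the last chunk
        }
        else
        {
            rangeLow += (uint)result.Properties[returnedAttribute].Count; // ask for the next chunk
        }
    }
}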
I hope this fully answers it...
Here is a code snip of doing a paged search, per my comment below (this is what you would need to do using the System.DirectoryServices.Protocols namespace classes, going down the logical path you started above (paged searches, not ranged retrieval)):
string searchFilter = "(&(objectClass=user)(memberof=CN=Users,OU=t,OU=s,OU=x,DC=h,DC=nt))";
string baseDN = "OU=C,OU=x,DC=h,DC=nt";
var scope = SearchScope.Subtree;
var attributeList = new string[] { "givenname", "sn", "samAccountName", "mail" };
PageResultRequestControl pageSearchControl = new PageResultRequestControl(1000);
do
{
SearchRequest sr = new SearchRequest(baseDN, searchFilter, scope, attributeList);
sr.Controls.Add(pageSearchControl);
var directoryResponse = ldapConnection.SendRequest(sr);
if (directoryResponse.ResultCode != ResultCode.Success)
{
// Handle error
}
var searchResponse = (SearchResponse)directoryResponse;
pageSearchControl = null; // Reset!
foreach (var control in searchResponse.Controls)
{
if (control is PageResultResponseControl)
{
var prrc = (PageResultResponseControl)control;
if (prrc.Cookie.Length > 0)
{
pageSearchControl = new PageResultRequestControl(prrc.Cookie);
}
}
}
foreach (var entry in searchResponse.Entries)
{
// Handle the search result entry
}
} while (pageSearchControl != null);
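The ldapConnection used above is assumed to already exist; a minimal way to create and bind one with System.DirectoryServices.Protocols (the server name here is hypothetical) is:
var ldapConnection = new LdapConnection("dc01.yourdomain.local"); // hypothetical domain controller
ldapConnection.SessionOptions.ProtocolVersion = 3;
ldapConnection.Credential = CredentialCache.DefaultNetworkCredentials; // System.Net
ldapConnection.Bind();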
Your problem is caused by creating a new DirectorySearcher object inside the loop. Each time through the loop, a new searcher is created that pulls the first 1500 records again. Create the searcher instance outside the loop and use the same instance for all queries.
Since I have millions of records in Dynamics 365 and I need to retrieve them all, I am using paging to retrieve the entity data:
// Query using the paging cookie.
// Define the paging attributes.
// The number of records per page to retrieve.
int queryCount = 5000;
// Initialize the page number.
int pageNumber = 1;
// Initialize the number of records.
int recordCount = 0;
// Define the condition expression for retrieving records.
ConditionExpression pagecondition = new ConditionExpression();
pagecondition.AttributeName = "parentaccountid";
pagecondition.Operator = ConditionOperator.Equal;
pagecondition.Values.Add(_parentAccountId);
// Define the order expression to retrieve the records.
OrderExpression order = new OrderExpression();
order.AttributeName = "name";
order.OrderType = OrderType.Ascending;
// Create the query expression and add condition.
QueryExpression pagequery = new QueryExpression();
pagequery.EntityName = "account";
pagequery.Criteria.AddCondition(pagecondition);
pagequery.Orders.Add(order);
pagequery.ColumnSet.AddColumns("name", "emailaddress1");
// Assign the pageinfo properties to the query expression.
pagequery.PageInfo = new PagingInfo();
pagequery.PageInfo.Count = queryCount;
pagequery.PageInfo.PageNumber = pageNumber;
// The current paging cookie. When retrieving the first page,
// pagingCookie should be null.
pagequery.PageInfo.PagingCookie = null;
Console.WriteLine("Retrieving sample account records in pages...\n");
Console.WriteLine("#\tAccount Name\t\tEmail Address");
while (true)
{
// Retrieve the page.
EntityCollection results = _serviceProxy.RetrieveMultiple(pagequery);
if (results.Entities != null)
{
// Retrieve all records from the result set.
foreach (Account acct in results.Entities)
{
Console.WriteLine("{0}.\t{1}\t{2}", ++recordCount, acct.Name,
acct.EMailAddress1);
}
}
// Check for more records, if it returns true.
if (results.MoreRecords)
{
Console.WriteLine("\n****************\nPage number {0}\n****************", pagequery.PageInfo.PageNumber);
Console.WriteLine("#\tAccount Name\t\tEmail Address");
// Increment the page number to retrieve the next page.
pagequery.PageInfo.PageNumber++;
// Set the paging cookie to the paging cookie returned from current results.
pagequery.PageInfo.PagingCookie = results.PagingCookie;
}
else
{
// If no more records are in the result nodes, exit the loop.
break;
}
}
After a couple of hours of pulling data, the program crashes with an OutOfMemoryException. It appears that memory usage grows with every new page. How do I release that memory to avoid this problem?
It's good that you are retrieving only a couple of columns; getting all the columns would certainly increase memory usage.
The way that you're processing each page, it would seem that the runtime should release each results object after you output it to the console.
It seems that something might be causing each results objects to stay on the heap.
Maybe it's the reuse of the same pagequery object.
The only thing that looks like it persists from the results is the paging cookie.
pagequery.PageInfo.PagingCookie = results.PagingCookie;
Maybe try moving the creation of the QueryExpression into a method and create a new one for each page.
For example:
private QueryExpression getQuery(int pageNum, string pagingCookie)
{
//all query creation code
}
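Fleshed out, that method could look roughly like this (a sketch that just reuses the pieces from your code above):
private QueryExpression getQuery(int pageNum, string pagingCookie)
{
    var pagequery = new QueryExpression("account");
    pagequery.ColumnSet.AddColumns("name", "emailaddress1");
    pagequery.Criteria.AddCondition("parentaccountid", ConditionOperator.Equal, _parentAccountId);
    pagequery.AddOrder("name", OrderType.Ascending);
    // A fresh PagingInfo each page, so nothing from the previous page is kept alive.
    pagequery.PageInfo = new PagingInfo
    {
        Count = 5000,               // queryCount from the question
        PageNumber = pageNum,
        PagingCookie = pagingCookie // null for the first page
    };
    return pagequery;
}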
What is going on with the condition for the parentaccountid? Does a single parent have millions of child accounts?
You might also want to consider tweaking the while loop:
var moreRecords = true;
while (moreRecords)
{
    //process
    moreRecords = results.MoreRecords;
}
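Putting the two suggestions together, the paging loop might then look roughly like this (a sketch; getQuery is the method sketched above and _serviceProxy comes from your code):
int pageNumber = 1;
string pagingCookie = null;
var moreRecords = true;
while (moreRecords)
{
    QueryExpression pagequery = getQuery(pageNumber, pagingCookie);
    EntityCollection results = _serviceProxy.RetrieveMultiple(pagequery);
    foreach (Account acct in results.Entities)
    {
        Console.WriteLine("{0}\t{1}", acct.Name, acct.EMailAddress1);
    }
    moreRecords = results.MoreRecords;
    pagingCookie = results.PagingCookie;
    pageNumber++;
}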
I followed the solution from this question How can I get a list of users from active directory? and am able to get a list of users from the AD. The issue I am having is that it takes 35 seconds to load all of the records.
There must be a more efficient way to query all of the data in one go rather than having to wait 35 seconds for it to return 700+ records. I have written a method to return the list of users. I have put some additional code to try and filter out any users that aren't accounts for humans.
public List<ActiveUser> GetActiveDirectoryUsers()
{
List<ActiveUser> response = new List<ActiveUser>();
using (var context = new PrincipalContext(ContextType.Domain, "mydomain"))
{
using (var searcher = new PrincipalSearcher(new UserPrincipal(context)))
{
foreach (var result in searcher.FindAll())
{
DirectoryEntry de = result.GetUnderlyingObject() as DirectoryEntry;
if (de.NativeGuid != null && !Convert.ToBoolean((int)de.Properties["userAccountControl"].Value & 0x0002) &&
de.Properties["department"].Value != null && de.Properties["sn"].Value != null) response.Add(new ActiveUser(de));
}
}
}
return response.OrderBy(x => x.DisplayName).ToList();
}
The constructor for ActiveUser just takes entry.Properties["whatever"] and assigns it to a property of that class. The overhead seems to be on this line:
DirectoryEntry de = result.GetUnderlyingObject() as DirectoryEntry;
I could cache the list of users to a file, but it's still too much to be dealing with over 30 seconds of loading for one list. There has to be a faster way to do it.
I had a crack at this using a number of different approaches, as a learning experience.
What I found for myself was that all methods could list a set of adspath values pretty quickly, but introducing a Console.WriteLine in the iteration caused the performance to vary drastically.
My limited C# knowledge led me to experiment with various approaches, such as an IEnumerator straight over the DirectoryEntry and a PrincipalSearcher with a context, but both of these methods are slow, and vary greatly depending on what is being done with the information.
In the end, this is what I ended up with. It was far and away the fastest, and doesn't take any noticeable performance hit when increasing the options to parse.
Note: this is actually a complete copy/paste PowerShell wrapper for the class, as I am not currently near a VM with Visual Studio.
$Source = #"
// " " <-- this just makes the code highlighter work
// Syntax: [soexample.search]::Get("LDAP Path", "property1", "property2", "etc...")
// Example: [soexample.search]::Get("LDAP://CN=Users,DC=mydomain,DC=com","givenname","sn","samaccountname","distinguishedname")
namespace soexample
{
using System;
using System.DirectoryServices;
public static class search
{
public static string Get(string ldapPath, params string[] propertiesToLoad)
{
DirectoryEntry entry = new DirectoryEntry(ldapPath);
DirectorySearcher searcher = new DirectorySearcher(entry);
searcher.SearchScope = SearchScope.OneLevel;
foreach (string p in propertiesToLoad) { searcher.PropertiesToLoad.Add(p); }
searcher.PageSize = 100;
searcher.SearchRoot = entry;
searcher.CacheResults = true;
searcher.Filter = "(sAMAccountType=805306368)";
SearchResultCollection results = searcher.FindAll();
foreach (SearchResult result in results)
{
foreach (string propertyName in propertiesToLoad)
{
foreach (object propertyValue in result.Properties[propertyName])
{
Console.WriteLine(string.Format("{0} : {1}", propertyName, propertyValue));
}
}
Console.WriteLine("");
}
return "";
}
}
}
"#
$Asem = ('System.DirectoryServices','System')
Add-Type -TypeDefinition $Source -Language CSharp -ReferencedAssemblies $Asem
I ran this on a particular domain that had 160 users, and here is the outcome:
Using the example command in the code comments:
PS > Measure-Command { [soexample.search]::Get(args as above..) }
Output:
givenname : John
sn : Surname
samaccountname : john.surname
distinguishedname : CN=John Surname,CN=Users,DC=mydomain,DC=com
etc ... 159 more ...
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 431
Ticks : 4317575
TotalDays : 4.99719328703704E-06
TotalHours : 0.000119932638888889
TotalMinutes : 0.00719595833333333
TotalSeconds : 0.4317575
TotalMilliseconds : 431.7575
Every additional string argument given seems to increase the total processing time by about 100ms.
Running it with only samaccountname takes only 0.1s to list 160 users, printed to the console.
Using Microsoft's example here, and modifying it to just list one property, took over 3 seconds, and every additional property took about a second.
A couple notes:
(sAMAccountType=805306368) turns out to be more efficient than (&(objectClass=user)(objectCategory=person)) (see https://stackoverflow.com/a/10053397/3544399 and many other examples)
searcher.CacheResults = true; didn't seem to make any difference (in my domain anyway) whether it was true or explicitly false.
searcher.PageSize = 100; makes a measurable difference. I believe the default MaxPageSize on a 2012R2 DC is 1000 (https://technet.microsoft.com/en-us/library/cc770976(v=ws.11).aspx)
The properties are not case sensitive (i.e. whatever is given to the searcher is returned in result.Properties.PropertyNames, hence why the foreach loop simply iterates those propertiesToLoad)
The three foreach loops at first glance seem unnecessary, but every removal of a loop ended up costing much more overhead in cast conversions and extension-method calls.
There may be better ways still, I've seen some elaborate examples with threading and result caching that I just wouldn't know what to do with, but the tuned DirectorySearcher does seem to be the most flexible, and this code here only requires System and System.DirectoryServices namespaces.
Not sure exactly what you do with your "//do stuff" as to whether this would help or not, but I did find this an interesting exercise as I didn't know there were so many ways to do something like this.
To give an update, I have found a partial workaround. It is faster than the method above, but it is missing some additional data. From what I've read, the method in the question is the equivalent of doing something like
SELECT id FROM sometable
foreach row in table
SELECT * FROM sometable where id = ?
So it's clear to see why it is slow. The following method executes in under a second and gives me all of the properties I need. A separate call to the directory entry needs to be made in order to obtain the remaining data, but this is quite easy to achieve as it is possible to grab just one user if you provide some search parameters.
Here is an updated method that is more efficient.
DirectoryEntry de = new DirectoryEntry("LDAP://mydomain");
using (DirectorySearcher search = new DirectorySearcher(de))
{
search.Filter = "(&(objectClass=user)(objectCategory=person))";
search.PropertiesToLoad.Add("userAccountControl");
search.PropertiesToLoad.Add("sn");
search.PropertiesToLoad.Add("department");
search.PropertiesToLoad.Add("l");
search.PropertiesToLoad.Add("title");
search.PropertiesToLoad.Add("givenname");
search.PropertiesToLoad.Add("co");
search.PropertiesToLoad.Add("displayName");
search.PropertiesToLoad.Add("distinguishedName");
foreach (SearchResult searchrecord in search.FindAll())
{
//do stuff
}
}
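When the full details for a single user are needed later, the separate lookup can be a per-user bind from the SearchResult (a rough sketch; thumbnailPhoto is just an example of an attribute not in PropertiesToLoad):
// Inside the foreach above, only when the extra data is actually required:
using (DirectoryEntry userEntry = searchrecord.GetDirectoryEntry())
{
    object photo = userEntry.Properties["thumbnailPhoto"].Value; // the expensive per-user hop happens here
}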
I am attempting to pull every user from Active Directory. I am using this method currently:
DirectorySearcher search = new DirectorySearcher();
search.Filter = "(objectClass=user)";
foreach (SearchResult result in search.FindAll())
{
if(result.Properties["mail"].Count > 0 && result.Properties["displayName"].Count > 0)
{
emailAddresses.Add(new EmailDetails
{
EmailAddress = result.Properties["mail"][0].ToString(),
EmailDisplayName = result.Properties["displayName"][0].ToString()
});
}
}
This is only giving me around 3/4 of the names I am expecting; for one, it is leaving me out. So I got curious and added a new filter to see if I could pull myself in by changing the filter to this:
search.Filter = "(&(objectClass=user)(sn=za*))";
This did in fact pull me in correctly; I am basically forcing it to pull me in by setting the filter to search for every last name that starts with za. But why is the first search filter not pulling in all of the users?
why is the first search filter I am using not pulling all of the users in?
Most likely because SizeLimit kicks in at 1000 records. Set a PageSize to enable result paging.
Doing .FindAll() with no filter to speak of and then filtering the results on the client is silly. Write a proper filter.
var search = new DirectorySearcher();
search.Filter = "(&(objectClass=user)(mail=*)(displayName=*))";
search.PageSize = 1000; // see 1.
using (var results = search.FindAll()) { // see 2.
foreach (var result in results)
{
emailAddresses.Add(new EmailDetails
{
EmailAddress = result.Properties["mail"][0].ToString(),
EmailDisplayName = result.Properties["displayName"][0].ToString()
});
}
}
Small page size = faster results but more server round trips, larger page size = slower results but less server round trips. Pick a value that works for you.
You must dispose the SearchResultCollection manually, see "Remarks" in the MSDN documentation of DirectorySearcher.FindAll(). A using block will dispose the object properly.
I'm unable to get certain fields from user objects, such as PasswordNeverExpires. Right now I'm cycling through every property returned for over 2000 users, and my conditional breakpoint never breaks once, so I know it's not being returned. If I break unconditionally, the number of properties returned by this code is always 1.
Our server is Windows Server 2003. I can get all the information I want from NetEnum commands.
I've seen others claim that they can do this and I don't see what's different about my code. When I don't provide any properties to load, it grabs about 30-37 properties. Several of these properties I need and use.
public void FetchUsers(string domainId, Sql sql)
{
var entry = new DirectoryEntry("LDAP://" + DomainControllerAddress, DomainPrefixedUsername, Password,
AuthenticationType);
var dSearch = new DirectorySearcher(entry)
{
Filter = "(&(objectClass=user)(!(objectclass=computer)))",
SearchScope = SearchScope.Subtree,
PageSize = 1000,
};
dSearch.PropertiesToLoad.Add("passwordneverexpires");
var users = dSearch.FindAll();
foreach (SearchResult ldapUser in users)
{
SaveUser(ldapUser, sql, domainId);
}
}
private void SaveUser(SearchResult ldapUser, Sql sql, string domainId)
{
if (ldapUser.Properties.PropertyNames == null) return;
foreach (string propertyName in ldapUser.Properties.PropertyNames)
{
//I'm breaking here on the condition that propertyName != 'adspath' and it never breaks
var v = ldapUser.Properties[propertyName];
}
return;
}
A few things:
The base filter you have is very inefficient. Use this instead: (&(objectCategory=person)(objectClass=user)).
There's no attribute called passwordneverexpires. You'll need to check the DONT_EXPIRE_PASSWORD bit in the userAccountControl mask on the user - see http://msdn.microsoft.com/en-us/library/aa772300%28v=vs.85%29.aspx for the list of values (there's a sketch of this below).
You never break into your loop because you're telling the client to request only that one property, which doesn't exist.
You can use a filter like: (&(objectCategory=person)(objectClass=user)(userAccountControl:1.2.840.113556.1.4.803:=65536)) to get All users with the account configuration DONT_EXPIRE_PASSWORD.
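As a sketch of the userAccountControl check (DONT_EXPIRE_PASSWORD is 0x10000, i.e. 65536, from the list linked above; dSearch and SaveUser are from the question's code):
const int DONT_EXPIRE_PASSWORD = 0x10000;
dSearch.PropertiesToLoad.Add("userAccountControl");
foreach (SearchResult ldapUser in dSearch.FindAll())
{
    int uac = (int)ldapUser.Properties["userAccountControl"][0];
    bool passwordNeverExpires = (uac & DONT_EXPIRE_PASSWORD) != 0;
    // pass passwordNeverExpires along to SaveUser as needed
}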
-jim
I'm trying to achieve a super-fast search, and decided to rely heavily on caching to achieve this. The order of events is as follows;
1) Cache what can be cached (from entire database, around 3000 items)
2) When a search is performed, pull the entire result set out of the cache
3) Filter that result set based on the search criteria. Give each search result a "relevance" score.
4) Send the filtered results down to the database via xml to get the bits that can't be cached (e.g. prices)
5) Display the final results
This is all working and going at lightning speed, but in order to achieve (3) I've given each result a "relevance" score. This is just a member integer on each search result object. I iterate through the entire result set and update this score accordingly, then order-by it at the end.
The problem I am having is that the "relevance" member is retaining this value from search to search. I assume this is because what I am updating is a reference to the search results in the cache, rather than a new object, so updating it also updates the cached version. What I'm looking for is a tidy solution to get around this. What I've come up with so far is either;
a) Clone the cache when I get it.
b) Create a separate dictionary to store relevances in and match them up at the end
Am I missing a really obvious and clean solution, or should I go down one of these routes? I'm using C# and .NET.
Hopefully it's obvious from the description what I'm getting at; here's some code anyway. This first one is the iteration through the cached results in order to do the filtering:
private List<QuickSearchResult> performFiltering(string keywords, string regions, List<QuickSearchResult> cachedSearchResults)
{
List<QuickSearchResult> filteredItems = new List<QuickSearchResult>();
string upperedKeywords = keywords.ToUpper();
string[] keywordsArray = upperedKeywords.Split(' ');
string[] regionsArray = regions.Split(',');
foreach (var item in cachedSearchResults)
{
//Check for keywords
if (keywordsArray != null)
{
if (!item.ContainsKeyword(upperedKeywords, keywordsArray))
continue;
}
//Check for regions
if (regionsArray != null)
{
if (!item.IsInRegion(regionsArray))
continue;
}
filteredItems.Add(item);
}
return filteredItems.OrderBy(t=> t.Relevance).Take(_maxSearchResults).ToList<QuickSearchResult>();
}
And here is an example of the "IsInRegion" method of the QuickSearchResult object:
public bool IsInRegion(string[] regions)
{
int relevanceScore = 0;
foreach (var region in regions)
{
int parsedRegion = 0;
if (int.TryParse(region, out parsedRegion))
{
foreach (var thisItemsRegion in this.Regions)
{
if (thisItemsRegion.ID == parsedRegion)
relevanceScore += 10;
}
}
}
Relevance += relevanceScore;
return relevanceScore > 0;
}
And basically if I search for "london" I get a score of 10 the first time, 20 the second time...
If you use the NetDataContractSerializer to serialize your objects in the cache, you could use the [DataMember] attribute to control what gets serialized and what doesn't. For instance, you could keep your temporary calculated relevance value in a field that is not serialized.
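For illustration, a minimal sketch of what that could look like on the result type, assuming the cache round-trips objects through NetDataContractSerializer (the Name property and the Region type's serializability are assumptions; Relevance and Regions come from your code):
[DataContract]
public class QuickSearchResult
{
    [DataMember]
    public string Name { get; set; }          // example serialized member

    [DataMember]
    public List<Region> Regions { get; set; } // Region must be marked up for serialization in the same way

    // Not marked [DataMember], so it is never written into the cache;
    // every deserialized copy starts with Relevance = 0.
    public int Relevance { get; set; }
}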