Caching Active Directory Data - C#

In one of my applications, I am querying Active Directory to get a list of all users below a given user (using the "Direct Reports" relationship). So basically, given the name of a person, that person is looked up in AD and their direct reports are read. Then, for every direct report, the tool needs to check the direct reports of those direct reports. Or, more abstractly: the tool uses a person as the root of the tree and then walks down the complete tree to get the names of all the leaves (which can be several hundred).
Now, my concern is obviously performance, as this needs to be done quite a few times. My idea is to manually cache that data (essentially just put all the names in a long string, store it somewhere, and update it once a day).
But I just wonder if there is a more elegant way to first get the information and then cache it, possibly using something in the System.DirectoryServices Namespace?

In order to take control over the properties that you want to be cached, you can call RefreshCache(), passing the properties that you want to hang around:
System.DirectoryServices.DirectoryEntry entry = new System.DirectoryServices.DirectoryEntry();
// Load the values of the named properties from AD into the local property cache.
entry.RefreshCache(new string[] { "cn", "www" });

Active Directory is pretty efficient at storing information and the retrieval shouldn't be that much of a performance hit. If you are really intent on storing the names, you'll probably want to store them in some sort of a tree structure, so you can see the relationships of all the people. Depending on the number of people, you might as well pull all the information you need daily and then query all the requests against your cached copy.
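If you do build the cache yourself, a minimal sketch of the tree walk with System.DirectoryServices could look like the following (the ReportWalker class is illustrative and not part of any API; how and where you persist the resulting names is up to you; the directReports attribute is standard AD):
using System.Collections.Generic;
using System.DirectoryServices;

// Collect the distinguished names of everyone below a root user by
// recursively following the "directReports" attribute.
static class ReportWalker
{
    public static List<string> GetAllReports(string rootDistinguishedName)
    {
        var names = new List<string>();
        Walk(rootDistinguishedName, names);
        return names;
    }

    private static void Walk(string distinguishedName, List<string> names)
    {
        using (var entry = new DirectoryEntry("LDAP://" + distinguishedName))
        {
            foreach (object report in entry.Properties["directReports"])
            {
                var reportDn = (string)report;
                names.Add(reportDn);
                Walk(reportDn, names); // recurse into this report's own reports
            }
        }
    }
}
Running this once a day and persisting the returned list (to a file, a database table, or the long string you mentioned) gives you the cache to query against for the rest of the day.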

AD does that sort of caching for you so don't worry about it unless performance becomes a problem. I have software doing this sort of thing all day long running on a corporate intranet that takes thousands of hits per hour and have never had to tune performance in this area.

Depends on how up to date you want the information to be. If you must have the very latest data in your report then querying directly from AD is reasonable. And I agree that AD is quite robust, a typical dedicated AD server is actually very lightly utilised in normal day to day operations but best to check with your IT department / support person.
An alternative is to have a daily script to dump the AD data into a CSV file and/or import it into a SQL database. (Oracle has a CONNECT BY feature that can automatically create multi-level hierarchies within a result set; SQL Server can do the same thing with a recursive common table expression.)
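For the daily dump, a hedged sketch using a paged DirectorySearcher to write each user's distinguished name and manager to a CSV file (the method name, file path and LDAP filter are illustrative) might look like this:
using System.DirectoryServices;
using System.IO;

// Dump (user, manager) pairs so the hierarchy can be rebuilt later,
// e.g. with CONNECT BY or a recursive CTE.
static void DumpAdUsersToCsv(string csvPath)
{
    using (var searcher = new DirectorySearcher())
    using (var writer = new StreamWriter(csvPath))
    {
        searcher.Filter = "(&(objectCategory=person)(objectClass=user))";
        searcher.PageSize = 1000; // enable paged results for large domains
        searcher.PropertiesToLoad.Add("distinguishedName");
        searcher.PropertiesToLoad.Add("manager");

        using (SearchResultCollection results = searcher.FindAll())
        {
            foreach (SearchResult result in results)
            {
                var dn = (string)result.Properties["distinguishedName"][0];
                var manager = result.Properties["manager"].Count > 0
                    ? (string)result.Properties["manager"][0]
                    : string.Empty;
                writer.WriteLine("\"{0}\",\"{1}\"", dn, manager);
            }
        }
    }
}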


Query DB with the little things, or store some "bigger chunks" of results and filter them in code?

I'm working on an application that imports video files and lets the user browse them and filter them based on various conditions. By importing I mean creating instances of my VideoFile model class and storing them in a DB table. Once hundreds of files are there, the user wants to browse them.
Now, the first choice they have in the UI is to select a DateRecorded, which calls a GetFilesByDate(Date date) method on my data access class. This method will query the SQL database, asking only for files with the given date.
On top of that, I need to filter files by, let's say, FrameRate, Resolution or UserRating. This would place additional criteria on the files already filtered by their date. I'm deciding which road to take:
1. Only query the DB for a new set of files when the desired DateRecorded changes. Handle all subsequent filtering manually in C# code, by iterating over the stored collection of _filesForSelectedDay and testing them against the current additional rules.
2. Query the DB each time any little filter changes, asking for a smaller and very specific set of files more often.
Which one would you choose, or even better, any thoughts on pros and cons of either of those?
Some additional points:
A query in GetFilesByDate is expected to return tens of items, so it's not very expensive to store the result in a collection always sitting in memory.
Later down the road I might want to select files not just for a specific day, but let's say for the entire month. This may give hundreds or thousands of items. This actually makes me lean towards option two.
The data access layer is not yet implemented. I just have a dummy class implementing the required interface, but storing the data in an in-memory collection instead of working with any kind of DB.
Once I'm there, I'll almost certainly use SQLite and store the database in a local file.
Personally I'd always go to the DB every time until it proves impractical. If it's a small amount of data then the overhead should also be small. When it gets larger then the DB comes into its own. It's unlikely you will be able to write code better than the DB, although the round trip can cost. Using the DB, your data will always be consistent and up to date.
If you find you are hitting the DB too hard then you can try caching your data and working out whether you already have some or all of the data being requested, to save time. However, then you have aging and consistency problems to deal with. You also then have servers with memory stuffed full of data that could be used for other things!
Basically, until it becomes an issue, just use the DB and use your energy on the actual problems you encounter, not the maybes.
If you've already gotten a bunch of data to begin with, there's no need to query the DB again for a subset of that set. Just store it in an object which you can query as the user refines the search.
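A minimal sketch of that option-one flow (the VideoFile shape, the repository interface and the filter values are hypothetical, mirroring the question):
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model and repository mirroring the question.
public class VideoFile
{
    public DateTime DateRecorded { get; set; }
    public double FrameRate { get; set; }
    public string Resolution { get; set; }
    public int UserRating { get; set; }
}

public interface IVideoFileRepository
{
    IEnumerable<VideoFile> GetFilesByDate(DateTime date);
}

public class FileBrowser
{
    private List<VideoFile> _filesForSelectedDay = new List<VideoFile>();

    // Hit the DB only when the selected date changes...
    public void SelectDay(IVideoFileRepository repository, DateTime date)
    {
        _filesForSelectedDay = repository.GetFilesByDate(date).ToList();
    }

    // ...and refine the cached collection in memory for every other filter.
    public IEnumerable<VideoFile> Filter(double minFrameRate, int minRating)
    {
        return _filesForSelectedDay
            .Where(f => f.FrameRate >= minFrameRate && f.UserRating >= minRating);
    }
}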

Slow search for items using extended property on Exchange

Problem at hand
Our C# Windows application uses EWS Managed API 2.0 to create appointments in a user's calendar. Each appointment has an extended property with a unique value. It later locates an appointment using FindItems and an ItemView.
Users experience significant delays the first time this search is performed. Subsequent response times are entirely acceptable.
("first time" is a little vague here, because users may experience the delay again later in the day)
// (Ews is assumed to be a namespace alias for Microsoft.Exchange.WebServices.Data,
//  and extendedPropertyDefinition is the ExtendedPropertyDefinition that carries the unique value.)
// Locate the ID of the appointment whose extended property value equals 1234:
var filter = new Ews.SearchFilter.IsEqualTo(extendedPropertyDefinition, 1234);
var view = new ItemView(1, 0);
view.PropertySet = BasePropertySet.IdOnly;
var folder = new FolderId(WellKnownFolderName.Calendar, new Mailbox("..."));
var result = service.FindItems(folder, filter, view);
Remote server is an Exchange Server 2007 SP1.
Research
MSDN ties some comments to search folders and restricted views; however, I am uncertain whether these apply to our situation.
The act of applying a view to a folder creates search folders in the
store. When a search folder is created, it is cached for later use. If
a user tries to create a search folder which already exists, the
cached search folder is used. This allows future viewings to be fairly
quick. By default, Exchange does not cache all search folders
indefinitely.
Specifically with regard to EWS:
It is also important to be aware of the fact that the first time an
Exchange store search query is issued, it will run very slowly and
possibly time out, whereas on future runs it will respond without
issue. This is caused by back-end processes that occur on the Exchange
server when a store search is performed.
They suggest creating search folders for non-changing, non-dynamic queries, which doesn't seem fitting in our case, since the query is different for each appointment.
If an application requires a specific query that has a fixed set of
nonchanging parameters, you can use search folders. [...] search
folders are useful only for nonchanging, nondynamic queries.
What we need is in essence to create an "index" - in database terms - on the property, ensuring that all searches on this specific property are fast, no matter the time or frequency.
Is it possible to "index" this property? Can anything be configured either client or server side to remove this initial delay?
I've hit the same sort of problem with an integration project. I wish there was a good solution...
You cannot create an index for a property that is not already indexed by Exchange. Creating a search folder for each is not viable if the number of Appointments grows high enough. Too many search folders on a single folder will cause further problems as they will all need to be updated when a new item is added to the folder. That's my understanding, at least. Also, Exchange 2007 is limited to 11 dynamic search folders per parent folder, so it may be even less viable depending on the number of appointments and how often they're accessed. Using existing indexed properties may not be viable as these can likely be changed by the user outside of your application. If you have some way of ensuring that the Appointments you create can only be accessed or altered from your application, then that's a different story.
The database table is a good way to go, but there's a potential snag that some people don't see until it's too late. ItemId is the obvious choice to link to your extended property, but ItemId is NOT constant. It's a calculated property based on several others. It can change if the item is moved to another folder, and it may also change with the installation of a service pack or given enough time passing, or so I've heard. I can confirm at least the first one. ItemId is not viable for long-term storage, at least not without additional checks. You could potentially store both the ItemId and your extended property. If a bind using the ItemId fails, fall back to the extended property search. If the bind is successful, then check it against the extended property in the database to be certain that it matches. Update the ItemId once you have the item if it doesn't match up. Do you need to work with anything beyond the Appointment objects, i.e. meeting responses, forward notifications, etc., or is this concerned only with the Calendar?
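A hedged sketch of that bind-then-fall-back idea with the EWS Managed API (the method, its parameters and the cached ItemId are assumptions; for brevity it also searches the service account's own calendar rather than another mailbox):
using Microsoft.Exchange.WebServices.Data;

// Try the cached ItemId first; if the bind fails or the item no longer
// matches, fall back to the extended-property search.
static Appointment FindAppointment(ExchangeService service,
                                   ExtendedPropertyDefinition uniqueProp,
                                   int uniqueValue,
                                   ItemId cachedId)
{
    if (cachedId != null)
    {
        try
        {
            var item = Appointment.Bind(service, cachedId,
                new PropertySet(BasePropertySet.IdOnly, uniqueProp));

            object storedValue;
            if (item.TryGetProperty(uniqueProp, out storedValue)
                && Equals(storedValue, uniqueValue))
            {
                return item; // the cached id is still good
            }
        }
        catch (ServiceResponseException)
        {
            // Item was moved or the id changed; fall through to the search.
        }
    }

    var filter = new SearchFilter.IsEqualTo(uniqueProp, uniqueValue);
    var view = new ItemView(1) { PropertySet = BasePropertySet.IdOnly };
    var result = service.FindItems(WellKnownFolderName.Calendar, filter, view);

    // The caller should record the new ItemId against uniqueValue at this point.
    return result.Items.Count > 0 ? (Appointment)result.Items[0] : null;
}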
It isn't pretty, but it should be a somewhat reasonable compromise. You may still have the occasional search, but they should be few and far between as long as the user doesn't decide to move Appointments to different folders or plans some Appointments way in advance, and even then the sync should help mitigate that as well. Just be prepared to repopulate that table if there are upgrades to Exchange.
Of course, if Microsoft had either added the capability to index additional properties or even added a blank string field or two to the index in Exchange Search for this very purpose, we wouldn't have this problem. Heck, an index on the GlobalObjectId properties on Appointments and associated objects would help, but alas...no. I'm not a fan of repurposing existing indexed fields. Not all of them are applicable to Appointments and the ones that are tend to be either required or editable by the user. Unless you know precisely what you're doing, repurposing those fields could potentially have unforeseen consequences down the road.
In any case, I don't claim to be an expert in all matters of EWS/Exchange, so maybe there is a better way than this. Take it with a grain of salt.
There isn't a way to switch on indexing for your property. I'm not familiar with which properties are indexed in Exchange 2007. Since your application appears to be using appointments, perhaps you could repurpose one of the other non-appointment properties to store your unique value. Perhaps use the AssistantName property via an extended property (to work around restrictions imposed by the EWS schema and service). This way, most clients will not be using that property for calendar items.
According to this topic, http://technet.microsoft.com/en-us/library/jj983804(v=exchg.150).aspx, that property is indexed (in 2013). That property has existed for a long time so it may be indexed for 2007.
Hey, this is a long shot, and not optimal by any means, but perhaps it might work for your scenario.
After reading this thread some more, I see that you are not looking for all items with your Extended Property but a specific item. Sorry I didn't catch that in my first response. I agree that the Search Folder alone would not work for you, since you would be required to update the filter each time you were searching for your item. This would obviously be pretty expensive (probably worse than your current approach).

One idea I have is creating a View that sorts by your Extended Property. I could be wrong, but I believe you can apply this view to the above Search Folder (note that I'm talking about explicitly creating the Search Folder and View and storing them in the mailbox; they can be hidden or exposed to the OL UI under the Search Folders tree). The search folder would filter only Appointments that have your Extended Property. Then the View would sort the folder by the property value.

In some reading I've been doing on the ESE internals, I've seen some commentary that indicates that sorting by a property will cause Exchange to create an index on the ESE (wish I could find it now). The section on ESE B-Tree Indexes seems to confirm this: http://books.google.com/books?id=12VMxwe3OMwC&pg=PA73&lpg=PA73&dq=how+to+create+exchange+ese+indexes&source=bl&ots=D5hJyJIEo5&sig=ppZ6RFJh3PnrzeePRWHFJOwXgeU&hl=en&sa=X&ei=QQ7HUtgggvTbBdjcgfAP&ved=0CFwQ6AEwBQ#v=onepage&q=how%20to%20create%20exchange%20ese%20indexes&f=false
You'd then have to use the same approach you used above on the Search Folder to find the specific item matching your criteria. One challenge of course is the issue with Exchange throwing away your index (which is probably what is happening in your current approach). Perhaps you could programmatically touch the search folder periodically to ensure that this doesn't happen? This link is also helpful to understand the performance impacts of creating a Search Folder/View: http://technet.microsoft.com/en-us/library/cc535025%28EXCHG.80%29.aspx
If you find a good solution (or this one works), I'm very interested to hear about it (and I'm sure many others are too). Oh the joy of Exchange Development :-)
Creating a search folder with your extended property as the criteria is the way to go. You'll pay the price while the search folder builds initially, but after the index is created as long as the folder exists and is running it will be updated automatically by Exchange. We use this technique quite successfully to find the proverbial "needle in a haystack".
http://msdn.microsoft.com/EN-US/library/dd633687(v=exchg.80).aspx
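For reference, a hedged sketch of creating such a search folder with the EWS Managed API (the display name is illustrative, and extendedPropertyDefinition is the same definition used in the question's snippet):
using Microsoft.Exchange.WebServices.Data;

// Create a search folder scoped to calendar items that carry the extended
// property, then run the narrow IsEqualTo filter against that folder.
var searchFolder = new SearchFolder(service) { DisplayName = "MyAppAppointments" };
searchFolder.SearchParameters.RootFolderIds.Add(new FolderId(WellKnownFolderName.Calendar));
searchFolder.SearchParameters.Traversal = SearchFolderTraversal.Shallow;
searchFolder.SearchParameters.SearchFilter = new SearchFilter.Exists(extendedPropertyDefinition);
searchFolder.Save(WellKnownFolderName.SearchFolders);

// Later: search within the (much smaller) search folder instead of the whole calendar.
var match = service.FindItems(searchFolder.Id,
    new SearchFilter.IsEqualTo(extendedPropertyDefinition, 1234),
    new ItemView(1));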

Using LINQ in C# for paginated results

I'm looking for a design solution for a pattern that I am going to have to repeat quite a lot throughout a website I am designing. It is going to be an ASP.NET MVC front-end, with C# WCF web services connecting to a SQL database using NHibernate.
It's a social networking site, so imagine Facebook here to get a conceptual idea. What I'm looking for is an efficient and performant way to return paginated results of large datasets; for example, a user may have 150 emails. I want to return them 10 at a time depending on what page they're on, obviously only returning the 10 that relate to that page rather than loading all 150 items into memory and only displaying 10 at a time, as I think the user experience is better with a slightly longer delay when changing pages compared to a faster initial load. After all, when do you look at emails 6 months old? The usual case is you only care about the first page of results anyway.

Similarly, a user may have had a number of interactions since their last login (e.g. your notifications feed on Facebook), but again I only want to load N results at a time. In this instance, rather than having pages, you would click a "Display more" button which would fetch the next N results, display them with another "Display more" link, and so forth until you reach the end of the dataset. I can imagine they would both use the same design, though, as they are technically both paginated results, just with different UI output and flow.
Can anyone offer some advice on a good design to use for this, bearing in mind my data retrieval is using NHibernate Queryable or Enumerables? Would I want to be loading all data from the DB in one hit, then using an iterator pattern to only return N rows from the service layer, keeping the rest of the list held in memory on the server in the user's session context so that if I made another call to retrieve the next N rows it would be held in place and keep returning N rows until the iterator finished? Or would it be best to simply retrieve N rows from the database and return those, holding nothing in session context? I can see how to return the top 10 results from a Queryable as:
var results = (from email in emails where email.UserId == userId select email).Take(10);
But I'm not sure how efficient this is; is this the fastest way of doing it? Furthermore, I don't see how to start at a certain position; this will always return only the first 10, not, say, the second 10 or the third 10.
So I'm a bit unsure what the best way to proceed is, and was hoping for some pointers and advice from people who have done something similar. Bear in mind that for my website performance is of the essence, so the user experience needs to be pretty sharp and interactive, with refreshing new results. Basically, if you were trying to simulate a Facebook news feed/wall, how would you implement it with the above architecture?
Thanks!
You can use Skip in combination with Take:
var results = (from email in emails where email.UserId == userId select email)
    .Skip((currentPage - 1) * 10)
    .Take(10);
About the web service: You really should make it a stateless web service. You could use the ASP.NET Web API for this. This enables you to build a RESTful web service.
Would I want to be loading all data from the DB in one hit...
Definitely not, you only want to pull down the records you need, not the ones you may need.
...using an iterator pattern to only return N rows from the service layer, keeping the rest of the list held in memory on the server in the user's session context...
Scalability goes right out the window with that idea.
...or would it be best to simply retrieve N rows from the database and return those, holding nothing in session context?
Now you're starting to get on the right track...
In general, you want to let the database do as much of the querying as possible, i.e. you don't want to hit the database and then have to further query the results (however, that's not always avoidable). In other words, you want to delegate most, if not all, of the heavy lifting to the database.
You mentioned you are using NHibernate, which is a pretty powerful ORM. The good news is that it does a lot of the work for you in terms of query optimization, caching data, etc. Like most ORMs nowadays, NHibernate uses deferred execution with its queries, so just watch out for things like hitting the database too early, and choose when to eager-load data instead of performing multiple queries. There is a lot to learn with NHibernate; if you haven't already, it's worth taking the time to read up about it before diving in. It will save you a lot of hassle in the long run.
Bear in mind that for my website performance is of the essence, so the user experience needs to be pretty sharp and interactive, with refreshing new results
In terms of performance (I assume you mean page load speeds), you would just want to ajaxify your site, i.e. load what needs to be loaded with the page, pull the rest in the background and update the page dynamically. To achieve the "refreshing new results" part you need to look at polling the server and pulling down new data. I am pretty sure Facebook uses a technique called long polling, which essentially keeps an active request open with the server for a set amount of time so the data appears to arrive "instantly". Polling is a different ball game altogether, though; it's about striking the balance between server load and how "fresh" the data needs to be. That's something you would need to decide yourself, and the answer is usually dependent on the type of data and the hardware capabilities of the server.
There are some links about it (like this) out there, but I liked this guy's approach. I don't know if I'd use his PagedQueryable, but his IPageable, IPagedEnumerable and PagedEnumerable are really interesting. Besides, his project introduction page may give you some ideas on how to roll your own pagination.
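As a rough sketch of the plain Skip/Take approach wrapped in a reusable helper (the Paging class and the Email entity are illustrative; with NHibernate's LINQ provider the Skip/Take pair is translated into the database's paging clause, so only one page comes back over the wire):
using System.Collections.Generic;
using System.Linq;

// Generic paging over any IQueryable source.
public static class Paging
{
    public static IList<T> GetPage<T>(IQueryable<T> source, int page, int pageSize)
    {
        return source
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .ToList();
    }
}

// Usage with NHibernate's LINQ provider (session.Query<T>() lives in NHibernate.Linq):
// var firstPage = Paging.GetPage(
//     session.Query<Email>()
//            .Where(e => e.UserId == userId)
//            .OrderByDescending(e => e.Sent),
//     page: 1, pageSize: 10);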

Should I use LINQ to SQL or XML for my project?

My program has 3 text fields, Title, Website, and PictureURL. When I click the 'save' button I want it to add the 3 entries into a log of some sort (LINQ or XML seems like the best choice). Only 1 user will be accessing the program at a time. The log will be local on the machine, and not on an external server. After the 3 fields have been saved as a single entry to the log, I want to be able to load each group of entries from the log back into the textboxes. Would either be a simpler solution or a more appropriate choice for this type of project? I am new to both hence my uncertainty for which would be better.
With the given set of requirements, it would indeed be better to stick with XML storage, since you have neither a big amount of data, nor complex search and grouping conditions, nor remote or distributed access. LINQ to XML would suit such a simple desktop application perfectly. Keep it simple.
Why not LINQ to XML? Assuming local storage is going to be, as you stated, an XML file:
http://msdn.microsoft.com/en-us/library/bb387098.aspx
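A minimal sketch of what that could look like (the file name, element names and the EntryLog class are all illustrative):
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Append one <Entry> per save and read them all back later.
static class EntryLog
{
    const string LogPath = "entries.xml";

    public static void Save(string title, string website, string pictureUrl)
    {
        XDocument doc = File.Exists(LogPath)
            ? XDocument.Load(LogPath)
            : new XDocument(new XElement("Entries"));

        doc.Root.Add(new XElement("Entry",
            new XElement("Title", title),
            new XElement("Website", website),
            new XElement("PictureURL", pictureUrl)));

        doc.Save(LogPath);
    }

    public static IEnumerable<XElement> Load()
    {
        return File.Exists(LogPath)
            ? XDocument.Load(LogPath).Root.Elements("Entry")
            : Enumerable.Empty<XElement>();
    }
}
Loading a saved entry back into the textboxes is then just a matter of reading (string)entry.Element("Title") and so on.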
It's hard to give a good answer without knowing more about your situation.
If you are just running this locally on one machine, and do not anticipate the log growing overly large, I'd say XML would be the better choice, as it requires less setup and overhead than a database.
However, if it needs to scale for size or users, you'll want to use a database. But that will add additional complexity, despite the fact that LINQ to SQL makes it simpler to use.

Ensure "Reasonable" queries only

In our organization we have the need to let employees filter data in our web application by supplying WHERE clauses. It's worked great for a long time, but we occasionally run into users providing queries that require full table scans on large tables or inefficient joins, etc.
Some clown might write something like:
select * from big_table where
Name in (select name from some_table where name like '%search everything%')
or name in ('a', 'b', 'c')
or price < 20
or price > 40
or exists (select 1 from some_other_table where col1 + col2 + col3 = 4)
or exists (select 1 from table_a, table_b)
Obviously, this is not a great way to query these tables with computed values, non-indexed columns, lots of OR's and an unrestricted join on table_a and table_b.
But for a user, this may make total sense.
So what's the best way, if any, to allow internal users to supply a query to the database while ensuring that it won't lock a dozen tables and hang the webserver for 5 minutes?
I'm guessing there's a programmatic way in C#/SQL Server to get the execution plan for a query before it runs. And if so, what factors contribute to cost? Estimated I/O cost? Estimated CPU cost? What would be reasonable limits at which to tell the user that his query's no good?
EDIT: We're a market research company. We have thousands of surveys, each with their own data. We have dozens of researchers that want to slice that data in arbitrary ways. We have tools to let them construct "valid" filters using a GUI, but some "power users" want to supply their own queries. I realize this isn't standard or best practice, but how else can I let dozens of users query tables for the rows they want using arbitrarily complex conditions and ever-changing conditions?
The premise of your question states:
In our organization we have the need to let employees filter data in our web application by supplying WHERE clauses.
I find this premise to be flawed on its face. I can't imagine a situation where I would allow users to do this. In addition to the problems you have already identified, you are opening yourself up to SQL Injection attacks.
I would highly recommend reassessing your requirements to see if you can't build a safer, more focused way of allowing your users to search.
However, if your users really are sophisticated (and trusted!) enough to be supplying WHERE clauses directly, they need to be educated on what they can and can't submit as a filter.
You can try using the following:
SET SHOWPLAN_ALL ON
GO
SET FMTONLY ON
GO
<<< Your SQL code here >>>
GO
SET FMTONLY OFF
GO
SET SHOWPLAN_ALL OFF
GO
Then you can parse through what you've got. As to where to draw the line on various things, that's going to take some experience. There are some things to watch for, but nothing that is cut and dried. It's often more of an art to examine the query plans than a science.
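As a hedged sketch of that parsing step in C# (TotalSubtreeCost is one of the columns SHOWPLAN_ALL returns; the method name and the threshold are illustrative, and picking the threshold is exactly the "art" mentioned above):
using System;
using System.Data.SqlClient;

// Ask SQL Server for the estimated plan of a user-supplied query and reject
// it if the estimated cost exceeds an arbitrary, tunable threshold.
static bool QueryLooksReasonable(string connectionString, string userSql, double maxSubtreeCost)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        using (var setOn = new SqlCommand("SET SHOWPLAN_ALL ON;", connection))
            setOn.ExecuteNonQuery();

        double worstCost = 0;
        using (var plan = new SqlCommand(userSql, connection))
        using (var reader = plan.ExecuteReader()) // returns plan rows; the query itself is not executed
        {
            do
            {
                while (reader.Read())
                {
                    object cost = reader["TotalSubtreeCost"];
                    if (cost != DBNull.Value)
                        worstCost = Math.Max(worstCost, Convert.ToDouble(cost));
                }
            } while (reader.NextResult()); // one result set per statement in the batch
        }

        using (var setOff = new SqlCommand("SET SHOWPLAN_ALL OFF;", connection))
            setOff.ExecuteNonQuery();

        return worstCost <= maxSubtreeCost;
    }
}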
As others have pointed out though, I think that your problem goes deeper than the technology implications. The fact that you let unqualified people access your database in such a way is the underlying problem. From past experience, I often see this in companies where they are too lazy or too inexperienced to properly capture their application's requirements. I'm not saying that this is necessarily the case with your corporate environment, but that's what I've seen.
In addition to trying to control what the users enter (which is a losing battle; there will always be a new hire who comes up with an imaginative query), I'd look into Resource Governor; see Managing SQL Server Workloads with Resource Governor. You put the ad-hoc queries into a separate pool and cap the allocated resources. This way you can mitigate the problem by limiting the amount of damage a bad query can do to other tasks.
And you should also consider giving access to the data by other means, like Power Pivot, and letting users massage their data as hard as they want in their own Excel. Business power users love that, and the impact on the transaction processing server is minimal.
Instead of allowing employees to directly write (append to) queries, and then trying to calculate the query cost before running it, why not create some kind of Advanced Search or filter feature that is NOT writing SQL you cannot control?
In very large enterprise organizations, this is a common practice for internal applications. Often during your design phase you will limit the criteria or put sensible limits on data ranges, but once the business gets hold of the app there will be calls from the business unit management to remove the restrictions. In my organization this is a management problem, not an engineering issue.
What we did was profile all of the criteria and find the largest offenders, both the users and the types of queries that caused the most problems, and put limitations on some of the queries. Also, some very expensive queries that were used on a regular basis were added to the app, and the app cached the results and ran the queries when load was low. We also created canned, optimized queries for standard users and gave only specified users the ability to search for anything. Just a couple of ideas.
You could make a data model for your database and allow users to use SQL Reporting Services' Report Builder. It's GUI-based and doesn't require writing WHERE clauses, so there should be a limit to how much damage they can do.
Or you could warehouse a copy of the db for the purpose of user queries, update the db every hour or so, and let them go to town... :)
I have worked a few places where this also came up. What we ended up doing was NOT allowing users unconstrained access, and promising to have IT do their best to provide queries when needed. The issue was that the database is fairly complicated, and even if users could write grammatically and syntactically correct SQL, they don't necessarily understand the relationships between the tables. In other words, even if they could write their own SQL they would get the wrong answers. We convinced the users that the risk of making the wrong decision based on a flawed or incomplete understanding of the 200 tables in the database was too high. Better to get the right answer after a day than the wrong one instantly.
The other part of this is what does IT do when user A writes a query and gets 1 answer, then user B writes what he thinks is the same query and gets a different answer? Is it IT's job to find the differences? To fix both pieces of SQL? etc. The bottom line is that I would not allow them access. I would load the system with predefined queries, as others have mentioned, and try to train mgmt why that is the only way it will work in the long run.
If you have so much data and you want to provide your customers with the ability to analyse and view the information as they want, I strongly recommend thinking about OLAP technologies.
I guess you've never heard of SQL Injection attacks? What if the user enters a DROP DATABASE command after the WHERE clause?
This is the reason that direct SELECT permission is almost never given to users in the vast majority of applications.
A far better approach would be to engineer your application around use cases so that you are able to cover a reasonable percentage of requirements with specifically designed filters/aggregation/layout options.
There are a myriad of ways to do this so some analysis of your specific problem domain will definitely be required together with research into viable methods.
Whilst direct SQL access is the most flexible option for your users, long-running queries are likely to be just the start of your headaches. SQL injection is a big concern here, whether its source is malicious or simply misguided.
(Chad mentioned this in a comment, but I think it deserves to be an answer.)
Maybe you should copy data that needs to be queried ad-hoc into a separate database, to isolate any problems from the majority of users.
