Problem at hand
Our C# Windows application uses EWS Managed API 2.0 to create appointments in a user's calendar. Each appointment has an extended property with a unique value. It later locates an appointment using FindItems and an ItemView.
Users experience significant delays the first time this search is performed. Subsequent response times are entirely acceptable.
("first time" is a little vague here, because users may experience the delay again later in the day)
// locate ID of appointment where extended property value equals 1234
// ("Ews" is presumably a namespace alias for Microsoft.Exchange.WebServices.Data):
var filter = new Ews.SearchFilter.IsEqualTo(extendedPropertyDefinition, 1234);
var view = new ItemView(1, 0);             // one result, no offset
view.PropertySet = BasePropertySet.IdOnly; // we only need the item ID
var folder = new FolderId(WellKnownFolderName.Calendar, new Mailbox("..."));
var result = service.FindItems(folder, filter, view);
The remote server is Exchange Server 2007 SP1.
Research
MSDN offers some commentary on search folders and restricted views; however, I am uncertain whether it applies to our situation.
The act of applying a view to a folder creates search folders in the
store. When a search folder is created, it is cached for later use. If
a user tries to create a search folder which already exists, the
cached search folder is used. This allows future viewings to be fairly
quick. By default, Exchange does not cache all search folders
indefinitely.
Specifically with regard to EWS:
It is also important to be aware of the fact that the first time an
Exchange store search query is issued, it will run very slowly and
possibly time out, whereas on future runs it will respond without
issue. This is caused by back-end processes that occur on the Exchange
server when a store search is performed.
They suggest creating search folders for non-changing, non-dynamic queries, which doesn't seem to fit our case, since the query is different for each appointment.
If an application requires a specific query that has a fixed set of
nonchanging parameters, you can use search folders. [...] search
folders are useful only for nonchanging, nondynamic queries.
What we need is in essence to create an "index" - in database terms - on the property, ensuring that all searches on this specific property are fast, no matter the time or frequency.
Is it possible to "index" this property? Can anything be configured either client or server side to remove this initial delay?
I've hit the same sort of problem with an integration project. I wish there was a good solution...
You cannot create an index for a property that is not already indexed by Exchange. Creating a search folder for each appointment is not viable if the number of appointments grows large enough. Too many search folders on a single folder will cause further problems, as they will all need to be updated when a new item is added to the folder. That's my understanding, at least. Also, Exchange 2007 is limited to 11 dynamic search folders per parent folder, so it may be even less viable depending on the number of appointments and how often they're accessed. Using existing indexed properties may not be viable either, as these can likely be changed by the user outside of your application. If you have some way of ensuring that the appointments you create can only be accessed or altered from your application, then that's a different story.
The database table is a good way to go, but there's a potential snag that some people don't see until it's too late. ItemId is the obvious choice to link to your extended property, but ItemId is NOT constant. It's a calculated property based on several others. It can change if the item is moved to another folder, and it may also change with the installation of a service pack or with enough time passing, or so I've heard. I can confirm the first one, at least. ItemId is not viable for long-term storage, at least not without additional checks.

You could potentially store both the ItemId and your extended property. If a bind using the ItemId fails, fall back to the extended-property search. If the bind is successful, check the result against the extended property in the database to be certain that it matches, and update the stored ItemId if it doesn't (a sketch of this fallback follows below).

Do you need to work with anything beyond the Appointment objects, i.e., meeting responses, forward notifications, etc., or is this concerned only with the Calendar?
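A minimal sketch of that fallback, reusing the filter, view, and folder variables from the question's snippet; LookUpItemId and SaveItemId are hypothetical stand-ins for your database access:

Appointment appointment = null;

// Hypothetical DB lookup: returns the stored ItemId for our unique value, or null.
string storedId = LookUpItemId(1234);
if (storedId != null)
{
    try
    {
        appointment = Appointment.Bind(service, new ItemId(storedId),
            new PropertySet(BasePropertySet.IdOnly, extendedPropertyDefinition));

        // The bind succeeded, but make sure the item still carries our unique value.
        object value;
        if (!appointment.TryGetProperty(extendedPropertyDefinition, out value) ||
            !value.Equals(1234))
        {
            appointment = null;
        }
    }
    catch (ServiceResponseException)
    {
        appointment = null; // stale ItemId; fall through to the slow search
    }
}

if (appointment == null)
{
    // Fall back to the extended-property search, then refresh the stored id.
    var result = service.FindItems(folder, filter, view);
    if (result.TotalCount > 0)
    {
        appointment = Appointment.Bind(service, result.Items[0].Id);
        SaveItemId(1234, appointment.Id.UniqueId); // hypothetical DB update
    }
}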
It isn't pretty, but it should be a somewhat reasonable compromise. You may still have the occasional slow search, but they should be few and far between, as long as the user doesn't move appointments to different folders or plan appointments way in advance, and even then the sync should help mitigate that as well. Just be prepared to repopulate that table if there are upgrades to Exchange.
Of course, if Microsoft had either added the capability to index additional properties or even added a blank string field or two to the index in Exchange Search for this very purpose, we wouldn't have this problem. Heck, an index on the GlobalObjectId properties on Appointments and associated objects would help, but alas...no. I'm not a fan of repurposing existing indexed fields. Not all of them are applicable to Appointments and the ones that are tend to be either required or editable by the user. Unless you know precisely what you're doing, repurposing those fields could potentially have unforeseen consequences down the road.
In any case, I don't claim to be an expert in all matters of EWS/Exchange, so maybe there is a better way than this. Take it with a grain of salt.
There isn't a way to switch on indexing for your property. I'm not familiar with which properties are indexed in Exchange 2007. Since your application appears to be using appointments, perhaps you could repurpose one of the other non-appointment properties to store your unique value. Perhaps use the AssistantName property via an extended property (to work around restrictions imposed by the EWS schema and service). This way, most clients will not be using that property for calendar items.
According to this topic, http://technet.microsoft.com/en-us/library/jj983804(v=exchg.150).aspx, that property is indexed (in 2013). That property has existed for a long time, so it may be indexed in 2007 as well.
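If you go this route, a sketch of addressing PR_ASSISTANT (PidTagAssistant, MAPI property tag 0x3A30, a string property) as an extended property; double-check the tag against the MAPI documentation before relying on it:

// using Microsoft.Exchange.WebServices.Data;
// Define an extended property over the MAPI tag for PR_ASSISTANT (0x3A30).
var assistantProp = new ExtendedPropertyDefinition(0x3A30, MapiPropertyType.String);

// Stamp the unique value onto the appointment at creation time.
var appointment = new Appointment(service);
appointment.Subject = "Example appointment";
appointment.SetExtendedProperty(assistantProp, "1234");
appointment.Save(WellKnownFolderName.Calendar, SendInvitationsMode.SendToNone);

// Search on it exactly as before, just through the repurposed property.
var filter = new SearchFilter.IsEqualTo(assistantProp, "1234");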
Hey, this is a long shot, and not optimal by any means, but perhaps it might work for your scenario.
After reading this thread some more, I see that you are not looking for all items with your extended property but for a specific item. Sorry I didn't catch that in my first response. I agree that the search folder alone would not work for you, since you would be required to update the filter each time you searched for an item. This would obviously be pretty expensive (probably worse than your current approach).

One idea I have is creating a view that sorts by your extended property. I could be wrong, but I believe you can apply this view to the above search folder (note that I'm talking about explicitly creating the search folder and view and storing them in the mailbox; they can be hidden or exposed in the Outlook UI under the Search Folders tree). The search folder would filter only appointments that have your extended property, and the view would then sort the folder by the property value.

In some reading I've been doing on the ESE internals, I've seen commentary indicating that sorting by a property will cause Exchange to create an index in ESE (wish I could find it now). The section on ESE B-Tree indexes here seems to confirm this: http://books.google.com/books?id=12VMxwe3OMwC&pg=PA73&lpg=PA73&dq=how+to+create+exchange+ese+indexes&source=bl&ots=D5hJyJIEo5&sig=ppZ6RFJh3PnrzeePRWHFJOwXgeU&hl=en&sa=X&ei=QQ7HUtgggvTbBdjcgfAP&ved=0CFwQ6AEwBQ#v=onepage&q=how%20to%20create%20exchange%20ese%20indexes&f=false
You'd then have to use the same approach you used above on the search folder to find the specific item matching your criteria. One challenge, of course, is the issue of Exchange throwing away your index (which is probably what is happening in your current approach). Perhaps you could programmatically touch the search folder periodically to ensure that this doesn't happen? This link is also helpful for understanding the performance impact of creating a search folder/view: http://technet.microsoft.com/en-us/library/cc535025%28EXCHG.80%29.aspx
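For what it's worth, a minimal sketch of persisting such a search folder with the EWS Managed API (the folder name is an arbitrary assumption; as far as I know, the sorted view itself cannot be created through EWS, so that part would still need MAPI or Outlook):

// using Microsoft.Exchange.WebServices.Data;
// Create a search folder over the Calendar that contains only items
// carrying the extended property, and persist it in the mailbox.
var searchFolder = new SearchFolder(service);
searchFolder.DisplayName = "MyAppAppointments"; // arbitrary name
searchFolder.SearchParameters.RootFolderIds.Add(WellKnownFolderName.Calendar);
searchFolder.SearchParameters.Traversal = SearchFolderTraversal.Shallow;
searchFolder.SearchParameters.SearchFilter =
    new SearchFilter.Exists(extendedPropertyDefinition);
searchFolder.Save(WellKnownFolderName.SearchFolders);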
If you find a good solution (or this one works), I'm very interested to hear about it (and I'm sure many others are too). Oh the joy of Exchange Development :-)
Creating a search folder with your extended property as the criteria is the way to go. You'll pay the price while the search folder builds initially, but after the index is created as long as the folder exists and is running it will be updated automatically by Exchange. We use this technique quite successfully to find the proverbial "needle in a haystack".
http://msdn.microsoft.com/EN-US/library/dd633687(v=exchg.80).aspx
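If you adopt this approach, the lookup can reuse the original query, just scoped to the persisted folder; a sketch, assuming the folder name from the sketch above:

// Find the persisted search folder by display name, then run the
// extended-property query against it instead of the whole Calendar.
var byName = new SearchFilter.IsEqualTo(FolderSchema.DisplayName, "MyAppAppointments");
var folders = service.FindFolders(WellKnownFolderName.SearchFolders, byName, new FolderView(1));
if (folders.TotalCount > 0)
{
    var view = new ItemView(1, 0);
    view.PropertySet = BasePropertySet.IdOnly;
    var filter = new SearchFilter.IsEqualTo(extendedPropertyDefinition, 1234);
    var result = service.FindItems(folders.Folders[0].Id, filter, view);
}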
Related
I am trying to efficiently find all the JIRA issues that were closed in the last week. Does anyone know how to do this in a nice way?
My messy solution right now is to loop over every issue in a project and compare its closing date and time to the current time. However, the loop must start and end with given keys, such as PROJECT-1000 to PROJECT-2000. Having to hardcode these values isn't very satisfying, and I don't want to have to raise the upper bound from 2000 every time more issues are added. I could choose a very large number that will almost certainly be higher than the highest ID (scan until PROJECT-7777777), but that slows the program down way too much.
(Keep in mind that even old issues with small ids may be closed very recently, meaning that it won't work to just scan over issues that were created since the last running of the application.)
Any suggestions for an elegant way of doing this?
You can create a filter using JQL, and then access it using the SOAP API.
First, to create the filter, use a query of this sort:
resolutionDate >= "-7d"
Then you can access your filter using the getIssuesFromFilterWithLimit SOAP function.
By the way, if for any future reason you want to find the highest issue key, check out this answer.
EDIT
To find the filter ID, go to http://your.jira.com/ManageFilters.jspa (Manage Filters) and choose the filter. Then in the URL you will see the requestId.
As C.Williamson said, there's also getIssuesFromJqlSearch(token, jqlQuery, maxIssuesReturned), which saves the use of a filter by executing the JQL directly.
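A rough C# sketch of the SOAP route, assuming a proxy generated from JIRA's jirasoapservice-v2 WSDL (the proxy class and type names come from the generator and may differ in your setup):

// Log in, run the JQL directly, and print the matching issue keys.
var jira = new JiraSoapServiceService();   // generated proxy class
jira.Url = "http://your.jira.com/rpc/soap/jirasoapservice-v2";
string token = jira.login("user", "password");
RemoteIssue[] issues = jira.getIssuesFromJqlSearch(token, "resolutionDate >= \"-7d\"", 1000);
foreach (RemoteIssue issue in issues)
    Console.WriteLine(issue.key);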
I'm trying to create a search engine for all literature (books, articles, etc), music, and videos relating to a particular spiritual group. When a keyword is entered, I want to display a link to all the PDF articles where the keyword appears, and also all the music files and video files which are tagged with the keyword in question. The user should be able to filter it with information such as author/artist, place, date/time, etc. When the user clicks on one of the results links (book names, for instance), they are taken to another page where snippets from that book everywhere the keyword is found are displayed.
I thought of using the Lucene library (or Searcharoo) to implement my PDF search, but I also need a database to tag all the other information so that results can be filtered by author/artist information, etc. So I was thinking of having tables for Text, Music, and Videos, and a field containing the path to the file for each. When a keyword is entered, I need to search the DB for music and video files, and also need to search the PDF's, and when a filter is applied, the music and video search is easy, but limiting the text search based on the filters is getting confusing.
Is my approach correct? Are there better ways to do this? Since the search content is limited only to the spiritual group, there is not an infinite number of items to search. I'd say about 100-500 books and 1000-5000 songs.
Lucene is a great way to get up and running quickly without too much effort, along with several areas for extending the indexing and searching functionality to better suit your needs. It also has several built-in analyzers for common file types, such as HTML/XML, PDF, MS Word Documents, etc.
It provides the ability to use a variety of Fields, and they don't necessarily have to be uniform across all Documents (in other words, music files might have different attributes than text-based content, such as artist, title, length, etc.), which is great for storing different types of content.
Not knowing the exact implementation of what you're working on, this may or may not be feasible, but for tagging and other related features, you might also consider using a database, such as MySQL or SQL Server side-by-side with the Lucene index. Use the Lucene index for full-text search, then once you have a result set, go to the database to extract all the relational content. Our company has done this before, and it's actually not as big of a headache as it sounds.
NOTE: If you decide to go this route, BE CAREFUL, as the "unique id" provided by Lucene is highly volatile (it changes every time the index is optimized), so you will want to store the actual id (the primary key in the database) as a separate field on the Document.
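For example, with the Lucene.Net port mentioned below (classic Field API; dbPrimaryKey, extractedText, and writer are assumed to be in scope):

// using Lucene.Net.Documents;
// Store the database primary key as a non-analyzed field so a search hit
// can be joined back to the relational data.
var doc = new Document();
doc.Add(new Field("DbId", dbPrimaryKey.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("Content", extractedText, Field.Store.NO, Field.Index.ANALYZED));
writer.AddDocument(doc);
// At query time: search Lucene, read "DbId" back from each hit,
// then fetch author/artist/place metadata from the database.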
Another added benefit: if you are set on using C#/.NET, there is a port called Lucene.Net, which is written entirely in C#. The downside is that you're a few months behind on the latest features, but if you really need them, you can always check out the Java source and implement the required updates manually.
Yes, there is a better approach. Try Solr and in particular check out facets. It will save you a lot of trouble.
If you definitely want to go the database route then you should use SQL Server with Full Text Search enabled. You can use this with Express versions, too. You can then store and search the contents of PDFs very easily (so long as you install the free Adobe PDF iFilter).
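Querying such a full-text index from C# is then straightforward; a minimal sketch, assuming a Documents table whose Content column is full-text indexed and a connectionString in scope:

// using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT Id, Title FROM Documents WHERE CONTAINS(Content, @keyword)", conn))
{
    cmd.Parameters.AddWithValue("@keyword", "\"meditation\""); // quoted FTS term
    conn.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            Console.WriteLine("{0}: {1}", reader["Id"], reader["Title"]);
}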
You could try using MS Search Server Express Edition, one of the major benefits is that it is free.
http://www.microsoft.com/enterprisesearch/en/us/search-server-express.aspx#none
I recently received a requirement that a person get a daily summary alert for any change within a SharePoint site; each site has an owner who is in charge of the content on their site.
The current way we have something working is to automatically set up alerts for every list/library within the site.
// Get the Lists on this Site
SPListCollection siteLists = currentSite.Lists;
foreach (SPList list in siteLists)
{
    if (!list.ToString().Equals("Master Page Gallery"))
    {
        if (list.ReadSecurity == 1) // user has read access to all items
        {
            // Create an Alert for this List
            Guid alertID = currentUser.Alerts.Add(list, SPEventType.All, SPAlertFrequency.Daily);
            // Set any additional properties
            SPAlert newAlert = currentUser.Alerts[alertID];
        }
    }
}
This creates two problems:
The user has a lot of different alerts created. Ideal: Only ONE email with the daily summary.
Some sort of monitor would have to be set up to check for new lists or libraries in the site and automatically set up alerts for the user.
Q: How can I create a daily summary alert for all changes in a site?
I believe the solution you're looking for is available through the auditing framework. Auditing is very robust in SP, unfortunately it's easy to get overwhelmed by the output.
Audit is a property available on the SPSite, SPWeb, SPList, and SPListItem objects.
Adjust the specific audit flags (via the Audit.AuditFlags property) to suit your needs; the specifics will depend on how you define "change", but almost anything you can think of is available.
Details about the SPAudit object are available on MSDN.
Once you've defined what/where you want to audit, you'll have to get that information back to your users.
By default, SP sets up some nice reports that are available at the site collection level ([url of site collection]/_layouts/Reporting.aspx?Category=Auditing). These may meet your needs.
Your initial solution mentioned alerts via email for the users. Given that most users want to centralize their information in email (though their MySite is a great place to put a link to the reports!), you'll have a little more work to do.
You can pull the required audit information through the object model using the SPAuditQuery and SPAuditEntryCollection objects (see the sketch after the summary below). Again, MSDN has some information on how to use these objects.
I would recommend setting up a custom SPJobDefinition that runs at the end of the day to email the users the audit report for their site. Andrew Connell has a great explanation of how to set up a custom job on his blog.
To summarize:
enable auditing for the SPWebs in question
create a report using SPAuditQuery and SPAuditEntryCollection for each SPWeb
create an SPJobDefinition that runs each night to email the report to each SPWeb owner
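A minimal sketch of the report step, assuming the SharePoint 2007 object model and a daily window (the site URL is illustrative):

// using Microsoft.SharePoint;
// Pull the last 24 hours of audit entries for one site collection.
using (SPSite site = new SPSite("http://server/sites/demo"))
{
    SPAuditQuery query = new SPAuditQuery(site);
    query.SetRangeStart(DateTime.UtcNow.AddDays(-1));
    query.SetRangeEnd(DateTime.UtcNow);

    SPAuditEntryCollection entries = site.Audit.GetEntries(query);
    foreach (SPAuditEntry entry in entries)
    {
        // entry.Occurred, entry.Event, entry.ItemId, entry.UserId, ...
        Console.WriteLine("{0} {1} {2}", entry.Occurred, entry.Event, entry.ItemId);
    }
}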
A thing to consider before enabling an auditing policy on a site is the performance overhead you add.
I would recommend keeping the footprint as small as possible here!
By that I mean: if it's only a certain content type or a certain list that you want this information from, be sure to enable the information policy only on those content types or lists!
Also keep the logging to a minimum. E.g. if you are only interested in views, not deletions or restores, log only those events!
On large sites I have seen auditing really trash performance!
Also be aware of some caveats here: even though you can enable auditing on lists (as in, not document libraries), a lot of events (for example view events) are not logged specifically for list items! This is not described anywhere (in fact I have even seen Ted Pattison mention item-level audit in an MSDN article), but I have it directly from CSS and the product team that item-level audit is not implemented in SP2007 because of performance issues. Instead you just get a list event in the log specifying that the list has been touched.
Documents are tracked fairly well, but I have seen problems with auditing view events on publishing pages (which in the API are considered documents, not list items) depending on how and where auditing was set (for example, if audit policies were implemented with inherited content types), so that's something to be aware of.
[edit: did some testing around this yesterday and it's even worse: publishing pages are only tracked if you set a site-level audit policy! If you set a policy on a list or a content type (or even a content type that inherits from a content type with a policy), you will get no SPAuditItemType.Document level events at all. Set it on a site and you will get too many audits! E.g. a view will trigger two view events, and the same with updates, so you end up with too much being logged. It definitely looks like a bug that nothing is audited when policies are put on lists and content types...]
The main message here is:
be careful what you log, since it will affect your site's performance
TEST that what you expect to log is really logged!
hth
Anders Rask
Well, it is not the case that there is no item-level auditing. Item-level auditing is implemented, but you have to turn it on for a specific item. If the list item already exists, you can get its instance and turn on auditing the same way you do for lists. The problem is how to turn it on when the list item is created; maybe a workflow could help?
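A workflow would work; another option (an assumption on my part) is an event receiver that switches auditing on as items are created:

// using Microsoft.SharePoint;
public class AuditItemReceiver : SPItemEventReceiver
{
    public override void ItemAdded(SPItemEventProperties properties)
    {
        SPListItem item = properties.ListItem;
        item.Audit.AuditFlags = SPAuditMaskType.All; // consider a narrower mask
        item.Audit.Update();
    }
}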
Is there a good tag search system that I can use in my C# .NET prototype and that will also run on ASP.NET?
When you say "tag search system" I am going to assume that you mean the ability in a social network to allow your users to tag content thereby bubbling up the things that are most popular in your site by way of a tag cloud. Also allowing your users to navigate by way of tagged content, etc. ??
I like to create a SystemObjects table which holds the various tables in my system that might have tags applied to them (or comments, or ratings, etc.), thereby allowing me to have a generic tagging system that can span my entire database.

Then I would also have a SystemObjectTags table with a reference to the SystemObjectID (telling me which table has the record that I am interested in) as well as the SystemObjectRecordID (which tells me which specific record I am interested in). From there I have all of my detail data with regards to the tags and what was tagged.

I then like to keep a running list of the tag-specific information in a Tags table, which keeps the unique tag (the string value of a tag), the TagID (which the SystemObjectTags table references), the count of that tag's usage across the system (a summed value of all uses), etc. If you have a high-traffic site this data should be kept in your cache so that you don't hit the database too frequently.
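To make that model concrete, here is the shape of those three tables sketched as plain C# classes (all names are illustrative):

// Which table in the system a tag can point at.
class SystemObject
{
    public int SystemObjectID;   // e.g. 1 = Articles, 2 = Music, 3 = Videos
    public string TableName;
}

// The unique tags themselves, with a running usage count.
class Tag
{
    public int TagID;
    public string Value;         // the string value of the tag
    public int UseCount;         // summed usage across the system
}

// One row per application of a tag to a specific record.
class SystemObjectTag
{
    public int SystemObjectID;       // which table
    public int SystemObjectRecordID; // which record in that table
    public int TagID;                // which tag
}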
With this subsystem in place you can then move on to the search capabilities. These three tables give you all the data you need to perform filtering, searching, etc. However, you might find that there is so much data here, and that the tables are so generic, that your searches are not as fast as a more optimized table structure would allow. For this reason I suggest using a Lucene.NET index to hold all of your searchable data. Lucene.NET provides very fast read times and far more flexibility in search algorithms than SQL Server's freetext functionality does.
This would then allow you to provide filtering of your content by tags, searching for content by tag, tag counts, etc. Lucene.NET is a big, scary topic though! Be prepared to do some reading to get you past the basics.
An option we are using is to put our "tags" in the Meta Keywords on the page, and then we use Bing for our search.
http://msdn.microsoft.com/en-us/library/dd251056.aspx
Our architect said it best. "Let the search engines do what they do best. Search."
You can limit the search to your site only, pull back the results and display them yourself...on your own page with your own formatting.
The only downside is that until your site is live and has been indexed, you can't fully test your search.
In one of my applications, I am querying Active Directory to get a list of all users below a given user (using the "Direct Reports" relationship). Basically, given the name of a person, that person is looked up in AD and their direct reports are read. But then, for every direct report, the tool needs to check that report's direct reports in turn. More abstractly: the tool uses a person as the root of the tree and then walks down the complete tree to get the names of all the leaves (there can be several hundred).
Now, my concern is obviously performance, as this needs to be done quite a few times. My idea is to manually cache that (essentially just put all the names in a long string and store that somewhere and update it once a day).
But I just wonder if there is a more elegant way to first get the information and then cache it, possibly using something in the System.DirectoryServices Namespace?
In order to control which properties are cached, you can call RefreshCache(), passing the properties that you want to keep around:
System.DirectoryServices.DirectoryEntry entry = new System.DirectoryServices.DirectoryEntry();
// Load the specified property values from AD into the local cache.
entry.RefreshCache(new string[] { "cn", "www" });
Active Directory is pretty efficient at storing information, and retrieval shouldn't be that much of a performance hit. If you are really intent on storing the names, you'll probably want to store them in some sort of tree structure so you can see the relationships between all the people. Depending on the number of people, you might as well pull all the information you need daily and then query all requests against your cached copy.
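If you do roll your own daily pull, a minimal sketch of the tree walk over the directReports attribute in System.DirectoryServices (names and LDAP paths are illustrative):

// using System.Collections.Generic;
// using System.DirectoryServices;
static void CollectReports(DirectoryEntry person, List<string> names)
{
    // directReports holds the DNs of everyone whose manager is this person.
    foreach (string reportDn in person.Properties["directReports"])
    {
        using (var report = new DirectoryEntry("LDAP://" + reportDn))
        {
            names.Add((string)report.Properties["displayName"].Value);
            CollectReports(report, names); // walk down to the leaves
        }
    }
}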
AD does that sort of caching for you so don't worry about it unless performance becomes a problem. I have software doing this sort of thing all day long running on a corporate intranet that takes thousands of hits per hour and have never had to tune performance in this area.
Depends on how up to date you want the information to be. If you must have the very latest data in your report then querying directly from AD is reasonable. And I agree that AD is quite robust, a typical dedicated AD server is actually very lightly utilised in normal day to day operations but best to check with your IT department / support person.
An alternative is to have a daily script to dump the AD data into a CSV file and/or import it into a SQL database. (Oracle has a SELECT CONNECT BY feature that can automatically create multi-level hierarchies within a result set. MSSQL can do a similar thing with a bit of recursion IIRC).