I'm working on a C# application that extracts metadata from all the documents in a Lotus Notes database (.nsf file, no Domino server) like this:
NotesDocumentCollection documents = _notesDatabase.AllDocuments;
if (documents.Count > 0)
{
    NotesDocument document = documents.GetFirstDocument();
    while (document != null)
    {
        // processing
        document = documents.GetNextDocument(document);
    }
}
This works well, except we also need to record all the views that a document appears in. We iterate over all the views like this:
foreach (var viewName in _notesDatabase.Views)
{
    NotesView view = _notesDatabase.GetView(viewName);
    if (view != null)
    {
        if (view.AllEntries.Count > 0)
        {
            folderCount = view.AllEntries.Count;
            NotesDocument document = view.GetFirstDocument();
            while (document != null)
            {
                // record the document/view cross reference
                document = view.GetNextDocument(document);
            }
        }
        Marshal.ReleaseComObject(view);
        view = null;
    }
}
Here are my problems and questions:
We fairly regularly encounter documents in a view that are not in the NotesDatabase.AllDocuments collection. How is that possible? Is there a better way to get all the documents in a Notes database?
Is there a way to find out all the views a document appears in without looping through all the views and documents? This part of the process can be very slow, especially on large .nsf files (35 GB!). I'd love to find a way to get just a list of view name / Document.UniversalID pairs.
If there is not a more efficient way to find all the document + view information, is it possible to do this in parallel, with a separate thread/worker/whatever processing each view?
Thanks!
Answering questions in the same order:
I'm not sure how this is possible either unless perhaps there's a special type of document that doesn't get returned by that AllDocuments property. Maybe replication conflicts are excluded?
Unfortunately there's no better way. Views are really just a saved query into the database that return a list of matching documents. There's no list of views directly associated with a document.
You may be able to do this in parallel by processing each view on its own thread, but the bottleneck is likely to be the time spent refreshing the view indexes, so it might not gain you much.
One other note: the AllEntries collection of a view is different from the set of documents in the view. Entries can include things like category rows, which are just groupings and aren't backed by actual documents. In other words, the AllEntries count can be higher than the document count.
Well, first of all, it's possible that documents are being created while your job runs. It takes time to cycle through AllDocuments, and then it takes time to cycle through all the views. Unless you are working on a copy or replica of the database that is isolated from all other possible users, you can easily run into a case where a document was created after you loaded AllDocuments but before you accessed one of the views.
Also, it may be possible that some of the objects returned by the view.GetXXXDocument() methods are deleted documents. You should probably be checking document.IsValid to avoid trying to process them.
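For example, in the view loop from the question (IsValid is a NotesDocument property in the COM API; an untested sketch):

while (document != null)
{
    // skip deletion stubs / invalid documents
    if (document.IsValid)
    {
        // record the document/view cross reference
    }
    document = view.GetNextDocument(document);
}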
I'm going to suggest using a NotesNoteCollection as a check on AllDocuments. If AllDocuments returns the full set of documents, or if a NotesNoteCollection does (after selecting documents and building the collection), then there is a way to do this that should be faster than iterating each view.
(1) Read all the selection formulas from the views, removing the word 'SELECT' and saving them in a list of pairs of {view name, formula}.
(2) Iterate through the documents (from the NotesNoteCollection or AllDocuments) and for each doc you can use foreach to iterate through the list of view/formula pairs. Use the NotesSession.Evaluate method on each formula, passing the current document in for the context. A return of True from any evaluated formula tells you the document is in the view corresponding to the formula.
It's still brute force, but it's got to be faster than iterating all views.
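A rough, untested sketch of that two-step approach, reusing the view-iteration pattern and the documents collection from the question; _notesSession is an assumed field holding the NotesSession, and Regex comes from System.Text.RegularExpressions:

// step 1: collect {view name, formula} pairs
var viewFormulas = new List<Tuple<string, string>>();
foreach (var viewName in _notesDatabase.Views)
{
    NotesView view = _notesDatabase.GetView(viewName);
    if (view == null) continue;

    // a selection formula looks like: SELECT Form = "Memo"
    string formula = view.SelectionFormula ?? string.Empty;
    formula = Regex.Replace(formula, @"^\s*SELECT\s+", string.Empty, RegexOptions.IgnoreCase);
    if (formula.Length > 0)
        viewFormulas.Add(Tuple.Create(view.Name, formula));

    Marshal.ReleaseComObject(view);
}

// step 2: evaluate every formula against every document
NotesDocument doc = documents.GetFirstDocument();
while (doc != null)
{
    foreach (var pair in viewFormulas)
    {
        // Evaluate returns a variant array; a true @formula result comes back as 1
        // (the exact interop return shape may vary)
        Array result = (Array)_notesSession.Evaluate(pair.Item2, doc);
        if (result != null && result.Length > 0
            && Convert.ToDouble(result.GetValue(result.GetLowerBound(0))) == 1)
        {
            // record: doc.UniversalID appears in the view named pair.Item1
        }
    }
    doc = documents.GetNextDocument(doc);
}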
Related
For a plugin I need to get all the ViewSheets in the .rvt file and display information from them in a XAML dialog, but my process is very, very slow the first time I use it (with the debugger: 500 ms for 83 ViewPlans; it is very slow without the debugger too). If I execute my code again, the execution is instantaneous.
My code is below. Can you help me?
Thanks in advance,
Luc
protected IEnumerable<Element> GetAllEl(Document document)
{
    var filteredElementCollector = new FilteredElementCollector(document);
    filteredElementCollector = filteredElementCollector
        .OfCategory(BuiltInCategory.OST_Sheets)
        .WhereElementIsNotElementType()
        .OfClass(typeof(ViewSheet));
    var fcElements = filteredElementCollector.ToElements();
    return fcElements;
}
I do not think there is currently a known generic solution for that problem.
Here is a recent discussion with the development team on this:
Question: for a given element id, we need to find the list of sheet ids displaying it.
Current solution: we loop through all the sheets and views and use FilteredElementCollector( doc, sheet.Id)
With the results from that, we perform one more call to FilteredElementCollector( doc, view.Id) and look for the element id.
Issue: the current solution takes a lot of time and displays a Revit progress bar saying Generating graphics.
Is there any better way to know if a given element id is available in the sheet or not?
For example, something like this would be very useful:
getAllSheets(ElementId) // returns array of sheet id
hasGuid(ElementId,sheetId) // return true/false
Does the API provide any such methods, to check whether a given ElementId is available in the sheet?
Answer: So the goal is to find a view that displays a particular element on a sheet?
Many model elements could be visible on multiple views, while most annotation elements are typically present only in one view.
What type of elements are you checking for?
And what will you do with that info?
Response: the goal is to find a view that displays a particular element on a sheet.
It can be any type of element.
Answer: Here are some previous related discussions:
Determining Views Showing an Element
The inverse, Retrieving Elements Visible in View
Response: The problem is that the first call to FilteredElementCollector( doc, viewId ) shows generating graphics in the progress bar.
Only the first time search does so. The second time, search on the same view has no issues with performance.
Answer: The first time is slow because in order to iterate on the elements visible in a view the graphics for that view must be generated.
I can't think of a workaround to get a precise answer.
You might be able to skip sheets which don't have model views in their viewport list to save a bit of time.
Some sheets may only have drafting views and schedules and annotations.
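For instance, a rough sketch of that filter (GetAllPlacedViews and ViewType are standard Revit API members; the set of view types treated as non-model here is an assumption):

// returns true if the sheet hosts at least one model view;
// sheets with only drafting views, schedules and legends could be skipped
private static bool HasModelView(Document doc, ViewSheet sheet)
{
    foreach (ElementId viewId in sheet.GetAllPlacedViews())
    {
        View view = doc.GetElement(viewId) as View;
        if (view != null
            && view.ViewType != ViewType.DraftingView
            && view.ViewType != ViewType.Schedule
            && view.ViewType != ViewType.Legend)
        {
            return true;
        }
    }
    return false;
}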
The development team provided a very helpful suggestion that worked around the generating-graphics call in a special case; see Loop Through Sheets Generating Graphics.
Maybe you can optimise in a similar manner for your specific case?
I think you may be over-filtering the ElementCollector. In my add-in, I just use this code to get the view sheets: new FilteredElementCollector(_doc).OfClass(typeof(ViewSheet));
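As a sketch, the question's GetAllEl trimmed down along those lines; the class filter alone is enough, since ViewSheet instances are never element types and already belong to the sheets category:

protected IEnumerable<Element> GetAllEl(Document document)
{
    // OfClass(typeof(ViewSheet)) is sufficient; the category and
    // element-type filters add nothing here
    return new FilteredElementCollector(document)
        .OfClass(typeof(ViewSheet))
        .ToElements();
}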
I have a situation wherein a List object is built from values pulled from an MSSQL database. However, this particular table is mysteriously getting an errant record or two tossed in. Removing the records causes trouble even though they have no referential links to any other tables, and they still get recreated without any known user action. This puts unwanted values on display and adds a bit of confusion. The specific issue is that this is a platform that allows users to run a search for quotes, and the filtering allows for sales rep selection. The select/dropdown field is showing these errant values, and they need to be removed.
Given that deleting the offending table rows does not produce a desirable result, I was thinking that maybe the best course of action was to modify the code where the List object is created and either filter the values out or remove them after the object is populated. I'd like to do this in a clean, scalable fashion by providing some kind of appendable data object where I could just add a new string value if something else cropped up, as opposed to doing something clunky that adds new code to find and remove each value.
My thought was to create a string array and somehow loop through it to remove bad List values, but I wasn't entirely certain that was the best way to approach this, and I could not for the life of me think of a clean approach. I would think the best way would be to add a filter within the Find arguments, but I don't know how to pass in an array or list that way. Otherwise I figured I'd loop through the values either before or after sorting the List and remove any matches that way, but I wasn't sure that was the best choice.
I have attached the current code, and would appreciate any suggestions.
int licenseeID = Helper.GetLicenseeIdByLicenseeShortName(Membership.ApplicationName);
List<User> listUsers;
if (Roles.IsUserInRole("Admin"))
{
    // get all users
    listUsers = User.Find(x => x.LicenseeID == licenseeID).ToList();
}
else
{
    // get only the current user
    listUsers = User.Find(x => (x.LicenseeID == licenseeID && x.EmailAddress == Membership.GetUser().Email)).ToList();
}
listUsers.Sort((x, y) => string.Compare(x.FirstName, y.FirstName));
-- EDIT --
I neglected to mention that I did not develop this; I merely inherited its maintenance after the original developer(s) disappeared and the coworker who was assigned to it left the company. I'm not really skilled at handling ASP.NET sites. Many object sources are hidden and unavailable for editing, I assume because they are defined in a DLL somewhere. So, for any of these objects that are sourced from database tables, altering the tables will not help, since I would not be able to get at the new data anyway.
However, I did try the following to filter out the undesirable data:
List<String> exclude = new List<String>(new String[] { "value1" , "value2" });
listUsers = User.Find(x => x.LicenseeID == licenseeID && !exclude.Contains(x.FirstName)).ToList();
Unfortunately it only resulted in an error being displayed to the page.
-- EDIT #2 --
I got the server set up to accept a new Event Viewer source so I could write info to the Application log and see what was happening. It looks like this installation of ASP.NET does not accept "Contains" as an action on a List object; an error gets kicked out stating that the method is not available.
I would probably add a bit column to the table to flag errant rows and then skip them when I query the table, with something like:
&& !ErrantData
Another way, which requires a bit more upkeep but no database change, would be to keep a periodically updated text file, read it, and remove users from the list based on it.
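A rough sketch of that text-file option, assuming an App_Data/exclude.txt with one first name per line; the Contains call here runs in memory after Find() has returned, so the data layer never has to translate it:

string[] exclude = File.ReadAllLines(Server.MapPath("~/App_Data/exclude.txt"));

listUsers = User.Find(x => x.LicenseeID == licenseeID)
                .ToList()
                .Where(u => !exclude.Contains(u.FirstName, StringComparer.OrdinalIgnoreCase))
                .ToList();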
The bigger issue is unknown rows creeping into your database. Changing user credentials and adding creation timestamps may help you narrow down the search.
We recently had a migration project that went badly wrong, and we now have thousands of duplicate records. The business has been working with them, which has made the issue worse, as we now have records that have the same name and address but could have different contact information. A small number are exact duplicates. We have started the painful process of manually merging the records, but this is very slow. Can anyone suggest another way of tackling the problem?
You can quickly write a console app to merge them; refer to the MSDN sample code below.
Sample: Merge two records
// Create the target for the request.
EntityReference target = new EntityReference();
// Id is the GUID of the account that is being merged into.
// LogicalName is the type of the entity being merged to, as a string
target.Id = _account1Id;
target.LogicalName = Account.EntityLogicalName;
// Create the request.
MergeRequest merge = new MergeRequest();
// SubordinateId is the GUID of the account merging.
merge.SubordinateId = _account2Id;
merge.Target = target;
merge.PerformParentingChecks = false;
// Execute the request.
MergeResponse merged = (MergeResponse)_serviceProxy.Execute(merge);
When merging two records, you specify one record as the master record, and Microsoft Dynamics CRM treats the other record as the child or subordinate record. It deactivates the child record and copies all of the related records (such as activities, contacts, addresses, cases, notes, and opportunities) to the master record.
Read more
Building on Arun Vinoth's answer, you might want to see what you can leverage with the out-of-box duplicate detection to get sets of duplicates to apply the merge automation to.
Alternatively you can build your own dupe detection to match records on the various fields where you know dupes exist. I've done similar things to compare records across systems, including creating match codes to mimic how Microsoft does their dupe detection in CRM.
For example, a contact's match codes might be
1. the email address
2. the first name, last name, and company concatenated together without spaces.
If you need to match companies, you can implement an algorithm like Scribe's stripcompany to generate match codes based on company names.
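For illustration, a minimal match-code helper along those lines (the normalization rules here are assumptions, not Scribe's actual algorithm; it needs System.Linq):

// builds a crude match code by lower-casing, stripping everything that
// isn't a letter or digit, and concatenating the parts
static string ContactMatchCode(string firstName, string lastName, string company)
{
    string Normalize(string s) =>
        new string((s ?? string.Empty)
            .ToLowerInvariant()
            .Where(char.IsLetterOrDigit)
            .ToArray());

    return Normalize(firstName) + Normalize(lastName) + Normalize(company);
}

// e.g. ContactMatchCode("Ann", "O'Brien", "Contoso Ltd.") => "annobriencontosoltd"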
Since this seems like a huge problem, you may want to consider drastic solutions: deactivate the entire polluted data set and redo the data import cleanly, then find any of the deactivated records that were touched in the interim and merge them, and finally delete the entire polluted (deactivated) data set.
Bottom line, all paths seem to lead to major headaches and the only consolation is that you get to choose which path to follow.
I am storing images in SQL Server as byte[] and then retrieving them using ViewData as follows (there are nine images in the database that I am retrieving):
Controller action:
public ActionResult show_pics2()
{
    using (cygnussolutionEntities6 db = new cygnussolutionEntities6())
    {
        // db.CommandTimeout = int.MaxValue; //For test
        var querylist = (from f in db.Images
                         select f.ImageContent);
        // get list in ViewBag
        ViewBag.DataLIst = querylist;
        // get list in View Data
        ViewData["images"] = querylist.ToList();
        return View();
    }
}
In the view, I am parsing the images and displaying them with a foreach loop over the ViewData, but it's taking very long to load in the browser. Does anyone know why this is taking so long?
It would be worth putting some timers through your code to see where the slowness is occurring: is the query to the database slow, is materialising the result into a list slow, or does the slowness occur within the view when processing the images?
Also worth considering: are you eager-loading any properties from the Images table? That will affect performance depending on the number of extra navigation properties being loaded.
Are you able to post the code within your view and your entities class?
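For example, a quick Stopwatch instrumentation of the action in the question (illustrative only):

var sw = System.Diagnostics.Stopwatch.StartNew();
var images = (from f in db.Images
              select f.ImageContent).ToList();   // the database query runs here
System.Diagnostics.Debug.WriteLine("query + materialise: " + sw.ElapsedMilliseconds + " ms");

ViewData["images"] = images;

Whatever time is not accounted for there is being spent rendering the view or transferring the images to the browser.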
If you are going to call ToList() on something and also use it elsewhere, have both uses share the result of that ToList() (unless you really need to keep the queryable interface). Then you only have to fetch the list once.
Don't obtain several blobs and use them on one page; obtain several IDs and use them to reference other resources through <img src="theImage/#id"> or similar, then have that resource served by an action that retrieves only the single image. Each such resource then loads a single image, and the requests can run in parallel.
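A hedged sketch of that per-image endpoint, reusing the question's entity context; the key column name ImageId and the content type are assumptions:

// GET /Images/Show/123 -> serves exactly one image
public ActionResult Show(int id)
{
    using (var db = new cygnussolutionEntities6())
    {
        byte[] content = db.Images
                           .Where(i => i.ImageId == id)   // ImageId assumed; use the real key column
                           .Select(i => i.ImageContent)
                           .FirstOrDefault();

        if (content == null)
            return HttpNotFound();

        return File(content, "image/jpeg");               // content type assumed
    }
}

In the view, each image then becomes something like <img src="@Url.Action("Show", "Images", new { id = item.ImageId })" />.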
Unless the images are all small, use streams that build on ADO access to the blob and stream out a chunk of 4096 bytes at a time, rather than EF. While it means giving up a lot of what EF gives you (and a lot of what MVC gives you, since you'll have to do it in a result rather than a view), it allows for memory-efficient streaming that can begin with the first loaded chunk.
Asp.NET - C#.NET
I need some advice regarding the design problem below:
I receive XML files every day. The quantity changes: e.g. yesterday 10 XML files were received, today 56 XML files, and maybe tomorrow 161 XML files, etc.
There are 12 types (12 XSDs), and at the top of each file there is an attribute called FormType, e.g. FormType="1", FormType="2", FormType="12", etc., up to 12 form types.
All of them have common fields like Name, Address, Phone.
But e.g. FormType=1 is for Construction, FormType=2 is for IT, FormType=3 is for Hospital, FormType=4 is for Advertisement, etc.
As I said, all of them have common attributes.
Requirements:
I need a search screen so the user can search these XML contents, e.g. search the text in some attributes for the XMLs received between Date_From and Date_To. But I don't have any clue how to approach this.
Problem:
I've heard about putting the XMLs in a binary field and running XPath queries against it, but I don't know the right terms to search for on Google.
I was thinking of creating a big database table, reading all the XMLs, and putting them in that table. But the issue is that some XML attributes are huge (2-3 pages of text), while the same attributes in other XML files are empty.
So if I create an NVARCHAR(MAX) column for every XML attribute and put them all in one table, after some period my database will be a big monster.
Can someone advise on the best approach to handle this issue?
I'm not 100% sure I understand your problem. I'm guessing that the query's supposed to return individual XML documents that meet some kind of user-specified criteria.
In that event, my starting point would probably be to implement a method for querying a single XML document, i.e. one that returns true if the document's a hit and false otherwise. In all likelihood, I'd make the query parameter an XPath query, but who knows? Here's a simple example:
public bool TestXml(XDocument d, string query)
{
    return d.XPathSelectElements(query).Any();
}
Next, I need a store of XML documents to query. Where does that store live, and what form does it take? At a certain level, those are implementation details that my application doesn't care about. They could live in a database, or the file system. They could be cached in memory. I'd start by keeping it simple, something like:
public IEnumerable<XDocument> XmlDocuments()
{
    DirectoryInfo di = new DirectoryInfo(XmlDirectoryPath);
    foreach (FileInfo fi in di.GetFiles())
    {
        // FileInfo exposes the full path as FullName
        yield return XDocument.Load(fi.FullName);
    }
}
Now I can get all of the documents that fulfill a request like this:
public IEnumerable<XDocument> GetDocuments(string query)
{
    return XmlDocuments().Where(x => TestXml(x, query));
}
The thing that jumps out at me when I look at this problem: I have to parse my documents into XDocument objects to query them. That's going to happen whether they live in a database or the file system. (If I stick them in a database and write a stored procedure that does XPath queries, as someone suggested, I'm still parsing all of the XML every time I execute a query; I've just moved all that work to the database server.)
That's a lot of I/O and CPU time that gets spent doing the exact same thing over and over again. If the volume of queries is anything other than tiny, I'd consider building a List<XDocument> the first time GetDocuments() is called and come up with a scheme of keeping that list in memory until new XML documents are received (or possibly updating it when new XML documents are received).
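A minimal sketch of that caching idea (the locking and invalidation policy here are assumptions for illustration):

private List<XDocument> _cache;
private readonly object _cacheLock = new object();

public IEnumerable<XDocument> GetDocuments(string query)
{
    lock (_cacheLock)
    {
        if (_cache == null)
            _cache = XmlDocuments().ToList();   // parse every file exactly once

        return _cache.Where(x => TestXml(x, query)).ToList();
    }
}

// call this when new XML documents arrive
public void InvalidateCache()
{
    lock (_cacheLock)
    {
        _cache = null;
    }
}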