I am writing a .NET Windows Service whose job is to monitor the status of documents stored in a document database (MongoDB). These documents will be modified from time to time by users via a web site. The Windows Service needs to run every, say, 5 minutes, poll all of the documents (there are hundreds of them), examine each one, and see if any of them need attention from a user (a real person). Users will be notified of required action via email.
The service will run 24/7. There is no current SQL database in the mix, and I don't really want the overhead and expense of maintaining a SQL database just to support this requirement. I do have MSMQ in the mix, alongside MongoDB. I would consider using WWF, but is there a lightweight workflow persistence store that does not rely upon SQL?
Can anyone advise as to the best strategy to support this requirement?
Thanks.
Since you can easily write your own persistence implementation for workflows, I would suggest storing the data in an XML file.
You can find an example of an XML persistence implementation here: XML Instance Store
I'm not sure how many workflows will be started in your case. If the number is big enough, it might make sense to use SQL Server, because the XML file approach can lead to concurrency and performance problems.
I'm creating a website content management system which stores a whole bunch of website articles and lets users modify these articles through the system. I'm a typical SQL Server developer, but I'm thinking maybe this system could be done in DocumentDB. We are using C# plus Web API to do the reads and writes.

I'm testing different data access technologies to see which one performs better. I have been trying LINQ, LINQ lambda, SQL, and stored procedures. The thing is, all these query methods seem to run at around 600ms to 700ms when I test via Postman. For example, one of my tests is a simple GET to http://localhost:xxxxxx/multilanguage/resources/1, which takes 600ms+. That is only a 1 KB document and there are only 5 documents stored in my collection so far.

So I guess what I want to ask is: is there a quicker way to query DocumentDB than this? The reason I ask is that I did something similar in SQL Server before (not querying documents, but relational tables). A much more complex query in a stored procedure over multiple joined tables only takes around 300ms. So I guess there should be a quicker way to do this. Thanks for any suggestions!
Most probably, if you change the implementation to a stub you will get the same performance, since what you are actually testing is the connection time between your server and the client (Postman).
There are a couple of things you can do, but do keep in mind that DocumentDB, and other NoSQL solutions, behave very differently than standard SQL Server. For example, the more nodes and RAM available to DocumentDB the better it will perform overall. The development instance of DocumentDB on Azure is understandably going to use fewer resources than a production instance. Since Azure takes care of scaling, one way to think about it is that the more data you have the better it will perform.
That said, something you are probably not used to is sharing your connection object for your whole application. That avoids the start up penalties every time you want to get your data. Summarizing Performance Tips:
Use TCP connection instead of HTTPS when you can
Use await client.OpenAsync() to avoid pausing on start up latency for the first request
Connect to the DocumentDB in the same region (keep in mind if you host across regions)
Use a singleton to access DocumentDB (it's threadsafe)
Cache your SelfLinks for quick access
Tune your page sizes so that you get only the data you intend to use
The more advanced performance tips cover index policies, etc. DocumentDB and other NoSQL databases behave differently than SQL databases. That also means your assumptions about how the APIs work are probably wrong. Make sure you are testing similar concepts. The SQL Server database connection object needs you to create/dispose of objects for each transaction so it can return those connections back to a connection pool. Treating DocumentDB the same way is going to cause the same kind of performance problems as if you didn't use a connection pool.
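To illustrate the singleton and TCP/OpenAsync tips above, here is a minimal sketch (not from the original answer) using the DocumentDB .NET SDK; the endpoint URI, key, and class name are placeholders you would replace with your own values.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

public static class DocumentDbClientFactory
{
    // Placeholder endpoint and key; replace with your own account values.
    private static readonly Uri Endpoint = new Uri("https://your-account.documents.azure.com:443/");
    private const string AuthKey = "<your-auth-key>";

    // One shared client for the whole application; DocumentClient is thread-safe.
    private static readonly Lazy<DocumentClient> LazyClient = new Lazy<DocumentClient>(() =>
        new DocumentClient(Endpoint, AuthKey, new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct, // direct TCP instead of the HTTPS gateway
            ConnectionProtocol = Protocol.Tcp
        }));

    public static DocumentClient Instance
    {
        get { return LazyClient.Value; }
    }

    // Call once at startup so the first real request does not pay the warm-up latency.
    public static Task WarmUpAsync()
    {
        return Instance.OpenAsync();
    }
}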
I have a WCF web service that currently writes a record to SQL Server 2005 roughly every second throughout the day. Our business reporting team runs SELECT queries against this live database.
I want to rethink this solution, so that the business reports are not querying our table directly. This is to prevent locking or other performance hits to my WCF web service.
So I am thinking about using another database to hold the reporting data, which will be a transformed version of the source record.
Can anyone point me towards the Microsoft technologies that will allow the WCF service to maintain 100% availability and the maximum throughput of records possible, with no performance hits?
Without having tried this myself, I would recommend you to take a look at Snapshot Isolation. From what I have read this sounds like what you might need.
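As a hedged sketch only (the connection string and table name below are placeholders, not anything from the question): once ALLOW_SNAPSHOT_ISOLATION has been enabled on the database, the reporting queries can run in a snapshot transaction from ADO.NET, so they do not take shared locks that block the service's inserts.

using System.Data;
using System.Data.SqlClient;

class ReportRunner
{
    public static int CountRecords()
    {
        // Assumes ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON has already been run.
        using (var connection = new SqlConnection("<reporting-connection-string>"))
        {
            connection.Open();

            // Snapshot readers see a consistent version of the data without
            // blocking, or being blocked by, the writer.
            using (var transaction = connection.BeginTransaction(IsolationLevel.Snapshot))
            using (var command = new SqlCommand("SELECT COUNT(*) FROM dbo.Records", connection, transaction))
            {
                int count = (int)command.ExecuteScalar();
                transaction.Commit();
                return count;
            }
        }
    }
}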
I have been working on an application for a couple of years that I update using a back-end database. The whole key is that everything is cached on the client, so that it never requires a network connection to operate, but when it does have a connection it will always pick up the latest updates. Every application update is shipped with the latest version of the database, and I wanted it to download only the minimum amount of data when the database has been updated.
I currently use a table with a timestamp to check for updates. It looks something like this.
ID - Name - Description - Severity - LastUpdated
0 - test.exe - KnownVirus - Critical - 2009-09-11 13:38
1 - test2.exe - Firewall - None - 2009-09-12 14:38
This approach was fine for what I previously needed, but I am looking to have more functions of the application use this type of dynamic approach. All the data is currently stored as XML, but I do not want to store complete XML files in the database; I only want to transmit changed data.
So how would you go about allowing a fairly simple approach to storing dynamic content (text/xml/json/xaml) in a database, and have the client only download new updates? I was thinking of having logic that can handle XML inserted directly, for example:
ID - Data - Revision
15 - XXX - 15
XXX would be something like <Content><File>Test.dll</File><Description>New DLL to load.</Description></Content> and would be inserted into the cache, but this would obviously be complicated as I would need to load them in sequence.
Another approach that has been mentioned was to base it on something similar to source control, storing the version in the root of the file and calculating the delta to figure out the minimal amount of data that needs to be sent to the client.
Anyone got any suggestions on how to approach this with no risk of data corruption? I would also like to expand with features that allow me to revert possibly bad revisions and replace them with new working ones.
It really depends on the tools you are using and the architecture you already have. Is there already a server with some logic and a data access layer?
Dynamic approaches might get complicated, slow and limit the number of solutions. Why do you need a dynamic structure? Would it be feasible to just add data by using a name-value pair approach in a relational database? Static and uniform data structures are much easier to handle.
Before going into detail, you should consider the different scenarios.
Items can be added
Items can be changed
Items can be removed (I assume)
Adding is not a big problem. The client needs to remember the last revision number it got from the server, and you write a query which gets everything since then.
Changing is basically the same. You need to take care about identifying the items: you need an unchangeable surrogate key, which seems to be the ID you already have. (Guids may be useful here.)
Removing is tricky. You need to either flag items as deleted instead of actually removing them, or have a list of removed IDs with the revision number when they had been removed.
Storing the data in the client: Consider using a relational database like SQLite in the client. (It doesn't need installation, it just stores everything in a file. Firefox, for instance, stores quite a lot in SQLite databases.) When using the same on the server, you can probably reuse some code. It is also transaction based, which helps to keep it consistent (rollback in case of error during synchronization).
XML - if you really need it - can be stored just as a string in the database.
When using an abstraction layer or ORM that supports SQLite (eg. NHibernate), you may also reuse some code even when there is another database used by the server. Note that the learning curve for such an ORM might be rather steep. If you don't know anything like this, it could be too much.
You don't need to force reuse of code in the client and server.
Synchronization itself shouldn't be very complicated. You have a revision number in the client and a last revision in the server. You get all new / changed and deleted items since then in the client and apply it to the local store. Update the local revision number. Commit. Done.
I would never update only a part of a revision, because then you can't really know what changed since the last synchronization. Because you do differential updates, it is essential to have a well defined state of the client.
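As a rough sketch of that synchronization step using SQLite on the client (the ItemChange shape and the Items/SyncState table names are invented for illustration, not part of the answer above):

using System.Collections.Generic;
using System.Data.SQLite; // System.Data.SQLite NuGet package

// Hypothetical change record as returned by the server.
public class ItemChange
{
    public int Id { get; set; }
    public string Data { get; set; }
    public long Revision { get; set; }
    public bool IsDeleted { get; set; }
}

public static class ClientSync
{
    // Apply everything changed since the last sync and advance the stored revision,
    // all inside one transaction so a failed sync rolls back cleanly.
    public static void ApplyUpdates(SQLiteConnection local, IEnumerable<ItemChange> changes, long newRevision)
    {
        using (var transaction = local.BeginTransaction())
        {
            foreach (var change in changes)
            {
                string sql = change.IsDeleted
                    ? "DELETE FROM Items WHERE Id = @id"
                    : "INSERT OR REPLACE INTO Items (Id, Data, Revision) VALUES (@id, @data, @rev)";

                using (var command = new SQLiteCommand(sql, local, transaction))
                {
                    command.Parameters.AddWithValue("@id", change.Id);
                    if (!change.IsDeleted)
                    {
                        command.Parameters.AddWithValue("@data", change.Data);
                        command.Parameters.AddWithValue("@rev", change.Revision);
                    }
                    command.ExecuteNonQuery();
                }
            }

            using (var command = new SQLiteCommand("UPDATE SyncState SET LastRevision = @rev", local, transaction))
            {
                command.Parameters.AddWithValue("@rev", newRevision);
                command.ExecuteNonQuery();
            }

            transaction.Commit();
        }
    }
}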
I would go with a solution using Sync Framework.
Quote from Microsoft:
Microsoft Sync Framework is a comprehensive synchronization platform enabling collaboration and offline access for applications, services, and devices. Developers can build synchronization ecosystems that integrate any application, any data from any store using any protocol over any network. Sync Framework features technologies and tools that enable roaming, sharing, and taking data offline.
A key aspect of Sync Framework is the ability to create custom providers. Providers enable any data sources to participate in the Sync Framework synchronization process, allowing peer-to-peer synchronization to occur.
I have just built an application pretty much exactly as you described. I built it on top of the Microsoft Sync Framework that DjSol mentioned.
I use a C# front end application with a SqlCe database, and a SQL 2005 Server at the other end.
The following articles were extremely useful for me:
Tutorial: Synchronizing SQL Server and SQL Server Compact
Walkthrough: Creating a Sync service
Step by step N-tier configuration of Sync services for ADO.NET 2.0
How to Sync schema changed database using sync framework?
You don't say what your back-end database is, but if it's SQL Server you can use SqlCE (SQL Server Compact Edition) as the client DB and then use RDA or merge replication to update the client DB as desired. This will handle all your requirements for sure; there is no need to reinvent the wheel for such a common requirement.
I have a C# Windows service which manages some stuff for my server application. This is not the main application, but a helper process used to control my actual application. The user connects to this application via WCF using a WinForms application. It all looks a bit like the IIS manager.
I need a data store for this application.
Currently, I use separate XML files which are loaded at start up, are updated in memory and flushed to disk on every change. I like this because:
We can simply edit the XML files in notepad when issues arise;
I do not have external dependencies to e.g. MSSQL express;
I do not have to update a database schema when the format changes.
However, I find that this is not stable and that the in-memory management is very fragile.
What should I use instead that is not overkill (like e.g. MSSQL Express would be) without losing too many of the above advantages?
SQLite is made for occasions like this where you need a solid data store, but do not require the power or scalability of a full database server.
If you do not want to worry about schema changes, you may be best off with your xml method or some variety of NoSQL database. What exactly is unstable about your xml setup?
If you have multiple concurrent processes accessing the xml file, you will have to load it quite often to ensure it remains synchronized. If this is a multiuser situation, xml files may not be feasible past a very very small scale. This is the problem database systems solve fairly effectively.
Try SQL CE or SQLite.
db4o
One solution would be to use an object database like db4o. It has an extremely small footprint, is fast as hell, and you can add properties to your persisted objects without needing to make schema changes. Also, you don't have to write any SQL.
Storing objects is as easy as:
using (IObjectContainer db = Db4oEmbedded.OpenFile(YapFileName))
{
    // Any plain CLR object can be stored directly; no mapping or schema needed.
    Pilot pilot1 = new Pilot("Michael Schumacher", 100);
    db.Store(pilot1);
}
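For completeness, a hedged sketch of reading the data back: the Pilot class here follows the shape used in the db4o tutorials and the file name is a placeholder. Retrieval can use a native query, so again no SQL is involved.

using System.Collections.Generic;
using Db4objects.Db4o;

public class Pilot
{
    public string Name { get; set; }
    public int Points { get; set; }

    public Pilot(string name, int points)
    {
        Name = name;
        Points = points;
    }
}

public static class PilotStore
{
    const string YapFileName = "pilots.db4o"; // placeholder file name

    public static IList<Pilot> LoadTopPilots()
    {
        using (IObjectContainer db = Db4oEmbedded.OpenFile(YapFileName))
        {
            // Native query: an ordinary predicate, evaluated by the engine.
            // Copy to a List so the results stay usable after the container closes.
            return new List<Pilot>(db.Query<Pilot>(p => p.Points >= 100));
        }
    }
}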
XML in Database
Another way to do it is to use something like SQLite or SQL CE (as mentioned by other posters) in conjunction with XML data.
Data Contract Serializer
If you're not already using the DataContractSerializer / DataContracts to generate / load your xml files, it's worth considering. It's the same robust framework that you're already using for WCF. It handles versioning pretty well. You could use this to deal with xml files on disk, or use it with a database.
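For example, a minimal sketch of loading and saving a settings file with DataContractSerializer might look like this; the Settings type and its members are placeholders for whatever you actually persist.

using System.IO;
using System.Runtime.Serialization;

// Hypothetical settings type; replace with your own data contract.
[DataContract]
public class Settings
{
    [DataMember] public string ServerName { get; set; }
    [DataMember] public int PollIntervalSeconds { get; set; }
}

public static class SettingsStore
{
    private static readonly DataContractSerializer Serializer =
        new DataContractSerializer(typeof(Settings));

    public static void Save(Settings settings, string path)
    {
        using (var stream = File.Create(path))
        {
            Serializer.WriteObject(stream, settings);
        }
    }

    public static Settings Load(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            return (Settings)Serializer.ReadObject(stream);
        }
    }
}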
Does anybody know how to query a MUMPS database from C# without using the KB/SQL ODBC driver?
We have a requirement to query a MUMPS database (Mckesson STAR Patient Care), and when we use KB/SQL it is limited to 6 concurrent users. So we are trying to query MUMPS directly without using KB/SQL.
I am expecting something like LINQ TO MUMPS.
I think Mckesson uses Intersystems' Cache as its MUMPS (M) provider. Cache has support for .NET (see the documentation here). Jesse Liberty has a pretty good article on using C#, .NET, and Windows Forms as the front end to a Cache database.
I'm not sure about LINQ (I'm no expert here), but this might give you an idea as to where to start to get your project done.
Michael
First off, I too feel your pain. I had the unfortunate experience of developing in MagicFS/Focus a couple of years back, and we had the exact same request for relational query support. Why do people always want what they can't have?
Anyway, if the version of MUMPS you're using is anything like MagicFS/Focus and you have access to the file system which holds the "database" (flat files), then one possible avenue is:
Export the flat files to XML files. To do this, you'll have to manually emit the XML from the flat files using MUMPS or your language of choice. As painful as this sounds, MUMPS may be the way to go since you may not want to determine the current record manually.
Read in the XML using LINQ to XML
Run LINQ queries.
Granted, the first step is easier said than done, and may even be more difficult if you're trying to build the XML files on the fly. A variant to this would be to manage generation of the XML files like indexes, via a nightly server process or the like.
If you're only going to query in specific ways (i.e. I want to join Foo and Bar tables on ID, and that's all I want), I would consider instead pulling and caching that data into server-side C# collections and skipping querying altogether (or pull the data across using WCF or the like and then do your LINQ queries).
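Assuming the flat files have been exported to XML as described above, a rough LINQ to XML join could look like this; foo.xml, bar.xml, and the element/attribute names are made up for illustration.

using System;
using System.Linq;
using System.Xml.Linq;

class FooBarJoin
{
    static void Main()
    {
        // Hypothetical exports: each file holds <Record Id="..."> elements.
        XDocument foo = XDocument.Load("foo.xml");
        XDocument bar = XDocument.Load("bar.xml");

        var joined =
            from f in foo.Descendants("Record")
            join b in bar.Descendants("Record")
                on (string)f.Attribute("Id") equals (string)b.Attribute("Id")
            select new
            {
                Id = (string)f.Attribute("Id"),
                Name = (string)f.Element("Name"),
                Value = (string)b.Element("Value")
            };

        foreach (var row in joined)
        {
            Console.WriteLine("{0}: {1} / {2}", row.Id, row.Name, row.Value);
        }
    }
}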
You can avoid the 6-user limitation by separating the database connection from the application instances. Use the KB/SQL ODBC driver through a middle tier: either a DAL in your application or a separate service (a Windows service).
This component can talk to the MUMPS database using at most 6 separate threads (in line with the KB/SQL limitation).
The component can use ADO.NET's ODBC support to communicate with the KB/SQL ODBC driver.
You can then consume the data from your application using LINQ for ADO.NET.
You may need to use a queuing system like MSMQ to manage queuing of the data requests if the 6 concurrent connections are insufficient for the volume of requests.
It is a good design practice to queue requests and use LINQ asynchronous calls to avoid blocking user interaction.
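As a hedged sketch of the data-access piece of that middle tier (the DSN, credentials, and table/column names are placeholders), the service could use the ODBC provider in ADO.NET to talk to the KB/SQL driver and hand back a detached DataTable that can then be queried with LINQ to DataSet:

using System;
using System.Data;
using System.Data.Odbc;
using System.Linq;

class KbSqlGateway
{
    public static DataTable GetPatients()
    {
        // Placeholder DSN pointing at the KB/SQL ODBC driver.
        using (var connection = new OdbcConnection("DSN=KBSQL;UID=user;PWD=password"))
        using (var command = new OdbcCommand("SELECT PatientId, Name FROM Patients", connection))
        using (var adapter = new OdbcDataAdapter(command))
        {
            var table = new DataTable();
            adapter.Fill(table); // the adapter opens and closes the connection itself
            return table;
        }
    }

    public static void Example()
    {
        DataTable patients = GetPatients();

        // LINQ over the detached results, so the 6-connection limit
        // stays isolated inside this middle tier.
        var names = patients.AsEnumerable()
                            .Select(r => r.Field<string>("Name"))
                            .Where(n => n != null && n.StartsWith("A"));

        foreach (string name in names)
        {
            Console.WriteLine(name);
        }
    }
}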
All current MUMPS language implementations have the ability to specify MUMPS programs that respond to a TCP/IP connection. The native MUMPS database is structured as a hierarchy of ordered multi-key and value pairs, essentially a superset of the NoSQL paradigm.
KB/SQL is a group of programs that respond to SQL/ODBC queries, translate them into MUMPS "global" data queries, retrieve and consolidate the results from MUMPS, and then send back the data in the form that the SQL/ODBC protocol expects.
If you have the permissions/security authorization for your implementation that allows you to create and run MUMPS programs (called "routines"), then you can respond to any protocol you desire from those programs. MUMPS systems can produce text or binary results on a TCP/IP port, or in a host operating system file. Many vendors explicitly keep you from doing this in their contracts to provide healthcare and financial solutions.
To my knowledge the LINQ syntax is a proprietary Microsoft product, although there are certainly LINQ-like Open Source efforts out there. I have not seen any formal definition of a line protocol for LINQ, but if there is one, a MUMPS routine can be written to communicate using that protocol. It would have to do something similar to KB/SQL however, since neither the LINQ syntax nor the SQL syntax are very close to the native MUMPS syntax.
The MUMPS data structuring and storage mechanism can be mechanically translated into an XML syntax. This may still require an extensive effort, as it is highly unlikely that the vendor of your system will provide a DTD defined for this mechanically created XML syntax, and you will still have to deal with encoded values and references which are stored in the MUMPS-based system in their raw form.
What vendor and version of MUMPS are you using? The solution will undoubtedly be dependent on the vendor's api they have exposed.