I'm working on a project where I have been asked to implement a semantic search. The scenario is a database with a table containing three pieces of information: Doctor Name, Patient Name, and Date of Visit. I was asked to create a form with three fields: Doctor, Patient and Date. When a user wants to find the doctors for a patient, the patients of a doctor, or the corresponding visit dates, they can fill in any of the fields to retrieve the information from the database. I did the coding in C#, using regular expressions for string manipulation and information retrieval. But the main requirement is that the search should work using RDF and URIs.
Now that I have done most of the coding, can someone help me work out how to build the search using RDF and URIs? Is there a solution for this? How can I integrate RDF in C#, and is there any documentation?
My supervisor's requirement is a search that works with RDF. That is, the details (patient's name, doctor's name and date) would each be in the form of a URI that locates the corresponding information in the database, so that anyone searching for a doctor or patient can just enter the name in the corresponding field and retrieve the information. I'm attaching 2 snapshots of my project for your understanding.
Image 1: http://img29.imageshack.us/i/15035706.jpg
Image 2: http://img31.imageshack.us/img31/1117/86105845.jpg
The first image is where I enter all the details to the database and the second image is the search.
This is the overall idea of my project. Can you advise me how this can be done?
I would be really grateful if someone could help me with this as soon as possible.
Doing an RDF and URI based search depends on whether your data is in RDF in the first place. If it's not, you have to convert it from its current form into RDF, either on the fly or permanently. To do it on the fly you could use a technology like D2R, which maps relational databases to RDF: http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/
There's some other Semantic Web C# stuff around, like Rowlex http://rowlex.nc3a.nato.int/ which is more OWL based, or there's my own dotNetRDF library http://www.dotnetrdf.org but that's only just about to have its first Alpha release, so I wouldn't recommend it for any production systems yet. SemWeb, as Alex mentions, is pretty good and scales particularly well. Its only disadvantage is that it targets .Net 2.0, so you need a separate library if you want to do LINQ with it.
A question about your question...
Your question is unclear about what you mean by semantic search. Are you sure you actually mean to do an RDF search, or did someone just specify "semantic search" in the spec, and you googled it and got articles about RDF? Semantic search doesn't necessarily imply a need for RDF; it could be that you actually want to do natural language search.
By this I mean that it could be that you want the ability to search for things like "patients of Dr Smith" and that your search engine should be able to interpret this as a search for patients where the doctor field corresponds to Dr Smith.
Equally I could be wrong and you could indeed be attempting to build something that sounds very like TimBL's example from his 2001 Scientific American article on the Semantic Web.
Edit
So as you do want to do proper RDF search, I would advise that you put your data into a Triple Store rather than a database, preferably one that supports SPARQL queries, so you can convert the inputs on your query form into a SPARQL query and run it against the Triple Store.
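As a rough illustration of turning the three form fields into a SPARQL query, here is a minimal Python sketch (Python only for brevity; the same string building carries over to C#). The `http://example.org/clinic#` vocabulary and the property names `doctorName`, `patientName` and `visitDate` are invented for this example; a real store would use whatever vocabulary the data is actually published with.

```python
PREFIX = "http://example.org/clinic#"  # hypothetical vocabulary, not a real ontology


def escape_literal(value):
    """Escape a user-supplied string for a double-quoted SPARQL literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def build_visit_query(doctor=None, patient=None, date=None):
    """Build a SELECT query; only the fields the user filled in become filters."""
    lines = [
        "PREFIX c: <%s>" % PREFIX,
        "SELECT ?doctor ?patient ?date WHERE {",
        "  ?visit c:doctorName ?doctor ;",
        "         c:patientName ?patient ;",
        "         c:visitDate ?date .",
    ]
    for var, value in (("doctor", doctor), ("patient", patient), ("date", date)):
        if value:
            lines.append('  FILTER(STR(?%s) = "%s")' % (var, escape_literal(value)))
    lines.append("}")
    return "\n".join(lines)


# Example: the user filled in only the Doctor field.
print(build_visit_query(doctor="Dr Smith"))
```

Escaping the user input before splicing it into the query matters for the same reason it does with SQL: a stray quote in a name would otherwise break (or alter) the query.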
Maybe take a look at Talis http://www.talis.com or Virtuoso http://www.openlinksw.com/virtuoso/
If you decide to use SemWeb then you could just use the Triple Store that it provides.
You may be able to do what you need using LinqToRdf. LinqToRdf exposes two LINQ query providers (i.e. you will need .NET 3.5+) including one that produces standards compliant SPARQL queries.
Here's a typical LinqToRdf Query, which if you're familiar with LINQ to SQL, should be totally natural:
    MusicDataContext ctx = new MusicDataContext(@"http://localhost/linqtordf/SparqlQuery.aspx");
    var q = (from t in ctx.Tracks
             where t.Year == "2006" &&
                   t.GenreName == "History 5 | Fall 2006 | UC Berkeley"
             orderby t.FileLocation
             select new {t.Title, t.FileLocation}).Skip(10).Take(5);
    foreach (var track in q)
    {
        Console.WriteLine(track.Title + ": " + track.FileLocation);
    }
I suggest you try RDFSharp (http://rdfsharp.codeplex.com/) because, as far as I can understand from your question, you probably need to quickly set up an RDF application capable of performing elementary triple-based searches like SUBJECT="xxx";PREDICATE=NULL;OBJECT="yyy".
Feel free to try it. Of course more powerful tools exist, but for your scenario I believe it is the simplest to apply.
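To make the SUBJECT/PREDICATE/OBJECT pattern concrete, here is a minimal sketch of triple-pattern matching over an in-memory list, written in plain Python. It is not RDFSharp's API; the function name `match_triples` and the example URIs are made up for illustration, and `None` plays the role of the NULL wildcard.

```python
def match_triples(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Hypothetical data: each visit is a resource with doctor and patient names.
triples = [
    ("http://example.org/visit/1", "http://example.org/clinic#doctorName", "Dr Smith"),
    ("http://example.org/visit/1", "http://example.org/clinic#patientName", "Jane Doe"),
    ("http://example.org/visit/2", "http://example.org/clinic#doctorName", "Dr Jones"),
]

# PREDICATE=NULL, OBJECT="Dr Smith": every statement whose value is "Dr Smith".
hits = match_triples(triples, predicate=None, obj="Dr Smith")
```

A second lookup with the matched subject (here the visit URI) then returns everything known about that visit, which is how the patient for a given doctor would be found.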
Using semantic web technologies for the scenario you describe is overkill. However, if you are interested in a mature .NET library for working with Semantic Web standards in .NET and SQL definitely take a look at Intellidimension's offerings.
A C# library for RDF which seems to be becoming quite popular in the community is LinqToRDF. The project was originated by Andrew Matthews and has been going since 2007, I think. The software is up on Google Code and can be found here:
LinqToRDF
Together with the library, there's also something called "LinqToRDF designer" which fits into Visual Studio and allows you to model RDF graphically.
Related
I have an application where I need to search in various text-based fields. The application is developed using NHibernate as an ORM.
I would like to implement Porter Stemming in searches, in order to be able to return relevant results even when the keyword matches a similar word, for example the description of a product contains memories while the search keyword is memory.
Can anyone suggest best practices for this type of search? The first idea that comes to mind is to store two versions of the same field in the database, for example:
Description
Description_Search
The Description column would be the text as entered by the website administrator, and is the text visible on the frontend.
The Description_Search would include the same text, but passed through a Porter-Stemming algorithm. Search queries would then be based on the Description_Search field, rather than Description.
Does this make sense? Is it a waste of space to store two versions of almost the same text?
Also, would Lucene.Net help in such a case? I am also looking into integrating Lucene.Net for full-text based searches but haven't yet looked into it in detail.
Thanks in advance!
There's no need to use two fields for this; one is enough. A field has two "values": one stored, which can be retrieved using Document.Get(...), and one indexed, which is used for searching. It's not technically required to store the values either; a common solution is to store an ID that's used to look up the original content in a database. This would also allow you to look up more information, like author information and document location.
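The split between the indexed value and the stored ID can be sketched with a toy inverted index. This is a Python illustration of the idea, not Lucene.Net code: `TinyIndex` and `crude_stem` are hypothetical names, and the stemmer is a deliberately crude suffix stripper standing in for a real Porter implementation.

```python
import re
from collections import defaultdict


def crude_stem(word):
    """Very rough suffix stripping; NOT the real Porter algorithm."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "i"
    if word.endswith("y") and len(word) > 2:
        return word[:-1] + "i"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


class TinyIndex:
    """Indexes stemmed terms but stores only the database ID of each document."""

    def __init__(self):
        self.postings = defaultdict(set)  # stemmed term -> set of database IDs

    def add(self, doc_id, text):
        for token in re.findall(r"\w+", text):
            self.postings[crude_stem(token)].add(doc_id)

    def search(self, keyword):
        # The query goes through the same stemmer as the indexed text.
        return sorted(self.postings.get(crude_stem(keyword), set()))


index = TinyIndex()
index.add(1, "Fond memories of the old workshop")
index.add(2, "A guide to garden tools")
```

The original description text never enters the index as a retrievable value; a search returns IDs, and the text shown to the user is fetched from the database by ID.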
Lucene.Net would help in this case, but it requires you to write the infrastructure yourself. You would need to take care of configuring analyzers (usually there is nothing to configure) and indexing your content. As mentioned in a comment, you could go for SQL Server's Full Text Search functionality, but that has some limitations of its own (which may not affect you).
One big problem I've had with SQL Server's FTS, and which works in Lucene.Net (this isn't really a fair comparison, since you can do almost anything in Lucene.Net because you write the code that does it), is accent sensitivity. I've been unable to configure it to use Swedish language rules, where åäö should be treated as real characters. Enabling accent sensitivity would do this, but it would also mean that all diacritics are treated as real characters, so ñ differs from n. (Imagine searching for "jalapeno" and getting no matches for "jalapeño".) Disabling accent sensitivity basically removes all diacritics, turning åäö into aao, and words turn out completely different.
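When you control the analysis step yourself, the "fold ñ but keep åäö" behaviour is a small normalisation function applied to both the indexed text and the query. The following is a minimal Python sketch of that idea using the standard `unicodedata` module; the function name `fold_accents` and the `KEEP` set are choices made for this example, not part of Lucene.Net or SQL Server.

```python
import unicodedata

# Letters that count as distinct characters under Swedish rules (assumed set).
KEEP = set("åäöÅÄÖ")


def fold_accents(text, keep=KEEP):
    """Strip diacritics from every character except those listed in keep."""
    out = []
    # NFC first, so that "a" + combining ring is seen as the single letter "å".
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)


print(fold_accents("jalapeño"), fold_accents("smörgåsbord"))
```

In a Lucene-style setup the equivalent logic would live in a custom analyzer or token filter, so the index and the query parser share exactly the same folding rules.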
Writing things in Lucene.Net (compared to SQL Server FTS) allows you to provide result highlighting (showing which phrases in a document match the query), search for similar documents, spell-checking, custom result boosting, facets, and other things that would enhance your users' search experience.
I am creating a search page where we can find the product by entering the text.
ex: Brings on the night.
My query returns the records that contain at least one word from this.
Needs:
1. The first row should contain the record with the exact sentence.
2. The second row should be the next best match.
3. The third row should be the next best match after that, and so on.
How can I achieve this? Is there an algorithm for it? It would be very helpful if anyone could share ideas.
Edit:
Sample search Order:
1. Brings on the night
2. Whoever Brings the Night
3. Night Baseball Brings
4. Night ride
5. Night Round
6. Brings flower
Geetha
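One simple way to get an ordering like the sample above is to score each title by (a) whether it contains the whole phrase and (b) how many distinct query words it shares, then sort on that pair. The Python sketch below shows the idea; the function names are made up for this example, and the tie-break between titles with the same score is simply the input order, which may not be the order you want.

```python
import re


def tokens(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def score(query, title):
    """Return (exact_phrase, matched_words); higher tuples rank first."""
    q, t = tokens(query), tokens(title)
    exact = int(any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)))
    matched = len(set(q) & set(t))
    return (exact, matched)


def rank(query, titles):
    """Drop titles sharing no word with the query, then sort best-first."""
    scored = [(score(query, title), title) for title in titles]
    kept = [(s, title) for s, title in scored if s[1] > 0]
    # sorted() is stable, so titles with equal scores keep their input order.
    return [title for s, title in
            sorted(kept, key=lambda pair: pair[0], reverse=True)]


titles = ["Night ride", "Brings flower", "Whoever Brings the Night",
          "Night Baseball Brings", "Brings on the night", "Night Round"]
ranked = rank("Brings on the night", titles)
```

A full-text engine does essentially this with more refined weights (term frequency, field length and so on), which is why the built-in ranking functions are usually the next step once this becomes too crude.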
Building a search engine is a very complex undertaking: you have to deal with ambiguity, human language, typos, and much more. You should try to use whatever full-text search comes with your database engine. SQL Server and SQLite have it out of the box, and most other databases probably have similar capabilities. These engines aren't particularly good, but they should suffice for simple scenarios. For more serious work, try Lucene, which comes in various flavors for different programming languages.
Have you tried full-text search?
http://msdn.microsoft.com/en-us/library/ms142583.aspx
As a really simple solution you could use SQL's LIKE operator. Instead of
select object_name from table_name where parameter = 'something'
you would do
select object_name from table_name where parameter LIKE '%something%'
This might work for very simple scenarios.
Some pointers
- try your RDBMS's full-text search, or investigate solutions such as Lucene/Solr
- there are implementations of edit distance (Levenshtein) in SQL, for less trivial hand-made ranking
- n-grams (bigrams, trigrams) can do a lot; see for example all the options in Postgres's internal search compared to MySQL or MSSQL
Internal RDBMS searches (Postgres might be an exception) usually have too few options, and implementing your own is usually too hard, or the RDBMS will not let you do it efficiently.
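To give a feel for what trigram matching does, here is a small Python sketch that compares two strings by the overlap of their three-character substrings (Jaccard similarity of the trigram sets). The padding scheme and function names are choices made for this example and are not taken from any particular database's implementation.

```python
def trigrams(text):
    """Return the set of 3-character substrings of the padded, lower-cased text."""
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# A transposition typo still shares several trigrams with the intended word.
print(similarity("night", "nigth"), similarity("night", "flower"))
```

Because the score degrades gradually with typos rather than dropping to zero, it works as a ranking signal next to plain word matching.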
In Java you have Lucene
There is also a port for it in php (Zend Lucene).
You also have a port to C#, Lucene.Net.
Just by changing your db models you can integrate it into the search engine.
Have a look. I've used Lucene in the past and it's always been very effective and efficient.
I'm trying to create a search engine for all literature (books, articles, etc), music, and videos relating to a particular spiritual group. When a keyword is entered, I want to display a link to all the PDF articles where the keyword appears, and also all the music files and video files which are tagged with the keyword in question. The user should be able to filter it with information such as author/artist, place, date/time, etc. When the user clicks on one of the results links (book names, for instance), they are taken to another page where snippets from that book everywhere the keyword is found are displayed.
I thought of using the Lucene library (or Searcharoo) to implement my PDF search, but I also need a database to tag all the other information so that results can be filtered by author/artist information, etc. So I was thinking of having tables for Text, Music, and Videos, each with a field containing the path to the file. When a keyword is entered, I need to search the DB for music and video files, and also search the PDFs. When a filter is applied, the music and video search is easy, but limiting the text search based on the filters is getting confusing.
Is my approach correct? Are there better ways to do this? Since the search content is limited only to the spiritual group, there is not an infinite number of items to search. I'd say about 100-500 books and 1000-5000 songs.
Lucene is a great way to get up and running quickly without too much effort, along with several areas for extending the indexing and searching functionality to better suit your needs. It also has several built-in analyzers for common file types, such as HTML/XML, PDF, MS Word Documents, etc.
It provides the ability to use a variety of Fields, and they don't necessarily have to be uniform across all Documents (in other words, music files might have different attributes than text-based content, such as artist, title, length, etc.), which is great for storing different types of content.
Not knowing the exact implementation of what you're working on, this may or may not be feasible, but for tagging and other related features, you might also consider using a database, such as MySQL or SQL Server side-by-side with the Lucene index. Use the Lucene index for full-text search, then once you have a result set, go to the database to extract all the relational content. Our company has done this before, and it's actually not as big of a headache as it sounds.
NOTE: If you decide to go this route, BE CAREFUL, as the "unique id" provided by Lucene is highly volatile (it changes every time the index is optimized), so you will want to store the actual id (the primary key in the database) as a separate field on the Document.
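The "full-text hit, then go to the database by primary key" flow can be sketched as follows. This is a Python illustration using the standard `sqlite3` module; the `items` table, its columns and the `full_text_hits` dictionary (a stand-in for the real Lucene index) are all invented for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
db.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (101, "Morning Songs", "A. Author"),
    (102, "Evening Songs", "B. Writer"),
])

# Stand-in for the full-text index: each hit carries the database primary key
# that was stored as its own field, never the index's internal document number.
full_text_hits = {"songs": [101, 102], "morning": [101]}


def search(keyword, author=None):
    """Look up keys in the index, then fetch and filter rows in the database."""
    ids = full_text_hits.get(keyword.lower(), [])
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    sql = "SELECT id, title FROM items WHERE id IN (%s)" % placeholders
    params = list(ids)
    if author is not None:
        sql += " AND author = ?"
        params.append(author)
    return db.execute(sql + " ORDER BY id", params).fetchall()
```

With the key stored on the document, filters such as author or date stay ordinary SQL conditions, and re-optimizing the index cannot break the link between a hit and its row.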
Another added benefit, if you are set on using C#.NET, there is a port called Lucene.Net, which is written entirely in C#. The down-side here is that you're a few months behind on all the latest features, but if you really need them, you can always check out the Java source and implement the required updates manually.
Yes, there is a better approach. Try Solr and in particular check out facets. It will save you a lot of trouble.
If you definitely want to go the database route then you should use SQL Server with Full Text Search enabled. You can use this with Express versions, too. You can then store and search the contents of PDFs very easily (so long as you install the free Adobe PDF iFilter).
You could try using MS Search Server Express Edition; one of its major benefits is that it is free.
http://www.microsoft.com/enterprisesearch/en/us/search-server-express.aspx#none
I've got a requirement where a user enters a few terms into a search box and clicks "go".
Does anyone have any good resources on how to implement a dynamic search that spans a few database tables?
Thanks,
Mike
I'm gonna throw in my vote for Lucene. While SQL Server does provide full text indexing and some search capabilities, it is not the greatest search engine. In my experience, it does not provide the best results or result ranking until you have a significant volume of indexed items (tens of thousands to hundreds of thousands minimum).
In contrast, Lucene is explicitly a search engine. It is an inverted index, behaving much like your run of the mill internet search engine. Lucene provides a very rich indexing and search platform, as well as some rich C# and .NET API's for querying the indexes. There is even a LINQ to Lucene provider that will allow you to query a Lucene index with LINQ.
The one drawback to using Lucene is that you have to build an index, which is a side-band process that runs independently of the database. You have to write your own tool to manage the index as well. Your search index, depending on how frequently you update it, may not be 100% up to date. Generally, that is not a huge concern, but if you have the resources, the Lucene index could be incrementally updated every few minutes to keep things "fresh".
It is called Full-text Search.
http://msdn.microsoft.com/en-us/library/ms142571.aspx
This is a pretty loaded question given the lack of detail. If you just need a simple search over a few tables/columns then a single (kludgy) search SP may be enough for you.
That said, if you need more features such as:
Searching a large set of tables
Support for large amounts of data
Searching over forms of a word
Logical operations
etc
then you might want to look into Full-Text Search (which is a part of MS SQL Server 2000 and above). The initial investment to get up to speed with Full-Text Search can be a bit off-putting, but compared to implementing the above features yourself you'll likely save a ton of time and energy.
Here are some Full-Text Search links to get you started:
Msdn Page
Initial Set Up
Set Up Video
Hope that helps.
Ok there were a few requests for more info so let me provide some.
I have several tables (ie. users, companies, addresses) and I'd like a user to be able to enter something like this:
"microsoft wa gates"
and bring up a result list containing results for "gates", "microsoft", and "washington".
Lucene seems like it could be pretty cool.
You can create an SP that receives the search terms as parameters and returns several "selects" (recordsets) to the program that called it. It can return one select for each table, and you can do whatever you need with the data in your app code.
If you need to receive only a single dataset, you can make a View using a UNION of the tables to consolidate the columns into a common schema, and then filter the view in the same way. Your application will then receive just one dataset, with all the information consolidated in the view and filtered.
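The UNION view approach can be tried out quickly with an in-memory SQLite database. The sketch below is Python with the standard `sqlite3` module; the table layouts, the view name `search_view` and the sample rows are assumptions made for the example, and the same view definition idea applies to SQL Server.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, state TEXT);
INSERT INTO users VALUES (1, 'Bill Gates');
INSERT INTO companies VALUES (1, 'Microsoft');
INSERT INTO addresses VALUES (1, 'WA');
-- One common schema over all searchable tables: (source, id, content).
CREATE VIEW search_view AS
    SELECT 'user' AS source, id, name AS content FROM users
    UNION ALL SELECT 'company', id, name FROM companies
    UNION ALL SELECT 'address', id, state FROM addresses;
""")


def search(terms):
    """Run one LIKE query per whitespace-separated term and collect the rows."""
    results = []
    for term in terms.split():
        rows = db.execute(
            "SELECT source, id, content FROM search_view "
            "WHERE content LIKE ? ORDER BY source, id",
            ("%" + term + "%",),
        ).fetchall()
        results.extend(rows)
    return results
```

Note that this matches "wa" only against the stored abbreviation; mapping it to "washington" would need a lookup table, and any `%` or `_` typed by the user acts as a LIKE wildcard unless it is escaped.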
The issue is that there is a database with around 20k customer records and I want to make a best effort to avoid duplicate entries. The database is Microsoft SQL Server 2005, and the application that maintains that database is Microsoft Dynamics/SL. I am creating an ASP.NET webservice that interacts with that database. My service can insert customer records into the database, read records from it, or modify those records. Either in my webservice, or through MS Dynamics, or in SQL Server, I would like to give a list of possible matches before a user confirms adding a new record.
So the user would submit a record, if it seems to be unique, the record will save and return a new ID. If there are possible duplications, the user can then resubmit with a confirmation saying, "yes, I see the possible duplicates, this is a new record, and I want to submit it".
This is easy if it is just a punctuation or space thing (such as if you are entering "Company, Inc." and there is a "Company Inc" in the database). But what if there are slight changes, such as "Company Corp." instead of "Company Inc", or a fat-fingered misspelling, such as "Cmpany, Inc."? Is it even possible to return records like that in the list? If it's absolutely not possible, I'll deal with what I have. It just causes more work later on, if records need to be merged due to duplications.
The specifics of which algorithm will work best for you depends greatly on your domain, so I'd suggest experimenting with a few different ones - you may even need to combine a few to get optimal results. Abbreviations, especially domain specific ones, may need to be preprocessed or standardized as well.
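The preprocessing step can be as small as lower-casing, dropping punctuation and mapping long forms to one standard abbreviation before any fuzzy comparison runs. Here is a minimal Python sketch of that; the `ABBREVIATIONS` table is a made-up starting point, and a real one would come from your own domain data.

```python
import re

# Hypothetical abbreviation table; extend it with domain-specific entries.
ABBREVIATIONS = {"incorporated": "inc", "corporation": "corp", "limited": "ltd"}


def normalize_name(name):
    """Lower-case, strip punctuation and standardize known abbreviations."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


print(normalize_name("Company, Inc."), "|", normalize_name("Acme Corporation"))
```

Comparing normalized names catches the pure punctuation and spacing duplicates outright, which leaves the phonetic and edit-distance checks to deal only with genuine spelling differences.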
For the names, you'd probably be best off with a phonetic algorithm - which takes into account pronunciation. These will score Smith and Schmidt close together, as they are easy to confuse when saying the words. Double Metaphone is a good first choice.
For fat fingering, you'd probably be better off with an edit distance algorithm - which gives a "difference" between 2 words. These would score Smith and Smoth close together - even though the 2 may slip through the phonetic search.
T-SQL has SOUNDEX and DIFFERENCE, but they are pretty poor. A Levenshtein variant is the canonical choice, but there are other good choices, most of which are fairly easy to implement in C# if you can't find a suitably licensed implementation.
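For reference, the basic Levenshtein distance fits in a dozen lines. The version below is a Python sketch of the standard two-row dynamic programming formulation (Python for brevity; it ports to C# almost line for line). What counts as "close enough" is left open, since that threshold depends on your data.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("Smith", "Smoth"), levenshtein("Company", "Cmpany"))
```

Running it over 20k records per insert is feasible, but it is usually worth comparing only against candidates that already share something cheap to index, such as the first letter or a phonetic key.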
All of these are going to be much easier to code/use from C# than T-SQL (though I did find double metaphone in a horrendous abuse of T-SQL that may work in SQL).
Though this example is in Access (and I've never actually looked at the code, or used the implementation) the included presentation gives a fairly good idea of what you'll probably end up needing to do. The code is probably worth a look, and perhaps a port from VBA.
Look into SOUNDEXing within SQL Server. I believe it will give you the fuzziness of probable matches that you're looking for.
SOUNDEX @ MSDN
SOUNDEX @ Wikipedia
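To see what kind of key Soundex produces, here is a Python sketch of the classic American Soundex rules (first letter kept, consonants mapped to digits, adjacent duplicates collapsed, H and W transparent, padded to four characters). SQL Server's built-in SOUNDEX may differ from this in edge cases, so treat it as an illustration of the idea rather than a drop-in replacement.

```python
# Build the letter -> digit table for the classic Soundex consonant groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex key of name, or "" if it has no letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset this, so a repeated code counts again
    return (result + "000")[:4]


print(soundex("Smith"), soundex("Schmidt"))
```

Two names are treated as possible duplicates when their keys are equal, so the key can be stored in an indexed column and looked up directly.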
If it's possible to integrate Lucene.NET into your solution, you should definitely try it out.
You could try using Full Text Search with FreeText (or FreeTextTable) functions to try to find possible matches.