How to search a table with full-text search, the same way as Google?

I am working on an enterprise document management system (DMS) for a big company.
The DMS database is Microsoft SQL Server 2012 and the document table is named "Document".
At this time the document table holds more than 4,000,000 records.
I need to search the document table the way Google searches, using SQL Server Full-Text Search, with very good performance (less than 1 second response time).
The user sees a single text box for intelligent search. For example, to find a document whose code contains "1107" and whose author name contains "Albert", the user types into that text box: 1107 Albert
I generated the query below to find this:
select count(*) over() totalRowFound, DocumentID
from dbo.Document
where contains(*, N'("*1107*")') AND contains(*, N'("*Albert*")')
I used * in the CONTAINS function for better search results, but the response time is about 4~7 seconds.
I know Google's algorithm is very complicated, but I want to implement a Google-like intelligent search over only 4~10 million records with less than 1 second response time.
How can I improve this query?
or
What is the best practice for a Google-like intelligent search?

With the * you are searching all columns.
Try
where contains(code, N'("*1107*")')
How does searching all columns give better results if the user wants to search a specific column?
SQL Server full-text search is not going to behave the same as Google, as they are not the same engine.
I don't think the Google engine is available.
Lucene is a free, open-source search engine.
Why did you go down the path of writing your own DMS?
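As a rough illustration of the query shape discussed here, the sketch below turns the text-box input into one parameterized CONTAINS predicate per word, so that each word may match in a different column. It assumes prefix matching is acceptable: SQL Server full-text prefix terms take the form "word*", and a leading asterisk is not treated as a wildcard. The function name build_fulltext_where and the columns argument are hypothetical, and the ? placeholders assume an ODBC-style driver.

```python
import re

def build_fulltext_where(user_input, columns="*"):
    """Build a WHERE fragment with one CONTAINS predicate per word.

    `columns` is inserted into the SQL text as-is, so it must come from
    trusted code (for example "*" or "code"), never from the user.
    """
    # \w+ keeps only letters, digits and underscores, so quotes and
    # operators typed by the user never reach the search condition.
    tokens = re.findall(r"\w+", user_input)
    predicates = []
    params = []
    for token in tokens:
        predicates.append("CONTAINS({}, ?)".format(columns))
        # Prefix term: matches words that start with the token.
        params.append('"{}*"'.format(token))
    return " AND ".join(predicates), params

where_clause, params = build_fulltext_where("1107 Albert")
print(where_clause)  # CONTAINS(*, ?) AND CONTAINS(*, ?)
print(params)        # ['"1107*"', '"Albert*"']
```

Passing columns="code" restricts the predicates to a single column, as the answer suggests; whether that is faster on a given table would need to be measured.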

When you say 'Google search', what you really mean is an inverted index. Apache's Lucene project provides this functionality with a similar style of indexing. SQL Server's Full-Text Search uses an inverted index as well.
If you want very, very fast text searches, you might want to try Lucene or Solr, as they have some features that SQL Server Full-Text Search does not (and vice versa) and, when properly configured, can perform very well.
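To make the term "inverted index" concrete, here is a minimal sketch of the idea in plain Python. It is a toy and not how Lucene or SQL Server actually store their indexes; the sample documents and the function names are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all_terms(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample data.
docs = {
    1: "Invoice 1107 written by Albert",
    2: "Report 2290 written by Albert",
    3: "Invoice 1107 written by Maria",
}
index = build_inverted_index(docs)
matches = search_all_terms(index, "1107 albert")
print(matches)  # {1}
```

The lookup cost depends on the number of query terms and the size of their posting sets rather than on the total number of documents, which is why this structure suits text search.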

Related

Ideas to improve metadata count search

I am working on a classifieds website that I am converting from SQL Server to MongoDB/NoSQL and probably Node.js. I really want to come up with a better solution for filtering by metadata (e.g. the filter on eBay). I currently have all my metadata in a table:
Metadata Value Table
AdvertID
MetaDataValue
PrentID
Level
So as you can see, the metadata has a hierarchical structure. I then perform a GROUP BY count on this table to produce the filter counts for the left pane of the site. Then, as a user drills into the metadata options, I end up running recursive GROUP BY count subqueries.
As you can imagine, this gets a little costly in time and performance. As I am going to move to a NoSQL option, I thought this would be a good time to investigate alternatives. I am open to any suggestions, but this is a hobby site, so I am looking for an open source/free solution.

Can Lucene.net be used for a tag based search system?

I'm developing an ASP.NET MVC3 app which will have a few hundred videos. I want to create a search system based on tags and other parameters like the type of user that uploaded the video, the date of the video, the video category, etc.
I have been looking around and Lucene.NET seems a really good tool for full-text search, but I don't know if it's the best solution for my project. I have read the tutorials, and they recommend keeping the search index to a minimum but also say that you should NOT hit your database to retrieve extra data that is not stored in the search index.
How is this possible?
Let me give an example: I have a video row (as a concept; this is really held in different SQL tables) which has columns for the video id, the video name, the video file name, the full path, user id, user type, tags, creation date, video category, video subcategory, video location, etc. If I want to create a Lucene search index, I think I will have to put all that information in there so that later on I can query on every parameter, right?
This seems to me a duplicate of the SQL database, but with the overhead of adding, editing and removing documents from the Lucene search index. Is this the standard scenario when using Lucene? All the examples I have seen with Lucene are based on a post id, post title and post body.
What do you think? Can you shed some light?

Yes, if you want to query multiple fields (including things like tags) from within lucene, you'll need to make that data available to lucene. It might sound like this is duplication, but it is not redundant duplication - it is restructuring the data into a very different layout - indexed for search.
It should work fine; it is pretty much how search works here on stackoverflow (which is using lucene.net to perform the search).
It should be noted, however, that a few hundred is not a large sample: frankly you could do that any way you like, and it'll take about the same amount of time. Writing a complex SQL query should work, as should full-text-search in the database (that is how stackoverflow's search used to work), as should filtering objects in-memory (at the few-hundred level, you could trivially just cache all the data excluding video frames in memory).
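As a small illustration of the in-memory option mentioned above, the sketch below caches a handful of video records and filters them by tags and other fields. The Video class, its field names and the sample data are assumptions made for the example; they are not taken from the asker's schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Video:
    # Hypothetical fields; the real data lives in several SQL tables.
    video_id: int
    name: str
    category: str
    user_type: str
    created: date
    tags: set = field(default_factory=set)

def filter_videos(videos, tags=None, category=None, user_type=None):
    """Return videos that carry all requested tags and match the other filters."""
    wanted = {t.lower() for t in (tags or [])}
    results = []
    for v in videos:
        # Subset test: every wanted tag must be present on the video.
        if wanted and not wanted <= {t.lower() for t in v.tags}:
            continue
        if category is not None and v.category != category:
            continue
        if user_type is not None and v.user_type != user_type:
            continue
        results.append(v)
    return results

# A few hundred rows like these fit comfortably in memory.
CACHE = [
    Video(1, "Intro to MVC", "tutorial", "admin", date(2011, 5, 1), {"mvc", "aspnet"}),
    Video(2, "Razor basics", "tutorial", "member", date(2011, 6, 12), {"razor", "aspnet"}),
    Video(3, "Team outing", "misc", "member", date(2011, 7, 3), {"fun"}),
]
hits = filter_videos(CACHE, tags=["aspnet"], user_type="member")
print([v.video_id for v in hits])  # [2]
```

A linear scan like this is only reasonable at the few-hundred scale described; it is not a substitute for an index on larger data sets.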

How to Store strings to Optimize Searching

I have a table containing a column of type VARCHAR. I want to search the strings in that column according to a query entered by the user, and I want to implement approximate searching. My table contains lakhs (hundreds of thousands) of records. I am thinking of the following ways to implement the search:
Load all records in C# and apply a searching algorithm to them. (But this will consume too much memory.)
Fetch records individually or in some predefined batch size and apply a searching algorithm to them. (But this will open database connections repeatedly, which may degrade performance.)
I am sure there must be some other mechanism to implement this functionality, or some technique for storing the data so that I can search it faster.
Can anybody give me a better idea for implementing this?

Lucene is one of the best ways to search. You can still store your string in the database, but build a Lucene index out of it and then use it to search.

SQL Server has built-in functionality to do exactly what you're looking to do, it's called Full-Text Search.
Overview from Microsoft here: http://msdn.microsoft.com/en-us/library/ms142571.aspx
The general concept is that you tell SQL Server what tables/columns contain searchable text, and it builds space-efficient and query efficient "full-text indexes"; these indexes are built asynchronously (so your updates/inserts are not slowed down), and since SQL Server 2005 they are stored with your database (eg in backups), so they're easily managed.
When you want to search, the query language is different from "normal" text matching.
Full-Text search is even available in the free "SQL Server 2008 Express with Advanced Services" edition, so cost is no longer a concern.

Help with Search Engine Architecture .NET C#

I'm trying to create a search engine for all literature (books, articles, etc), music, and videos relating to a particular spiritual group. When a keyword is entered, I want to display a link to all the PDF articles where the keyword appears, and also all the music files and video files which are tagged with the keyword in question. The user should be able to filter it with information such as author/artist, place, date/time, etc. When the user clicks on one of the results links (book names, for instance), they are taken to another page where snippets from that book everywhere the keyword is found are displayed.
I thought of using the Lucene library (or Searcharoo) to implement my PDF search, but I also need a database to tag all the other information so that results can be filtered by author/artist information, etc. So I was thinking of having tables for Text, Music, and Videos, and a field containing the path to the file for each. When a keyword is entered, I need to search the DB for music and video files, and also need to search the PDF's, and when a filter is applied, the music and video search is easy, but limiting the text search based on the filters is getting confusing.
Is my approach correct? Are there better ways to do this? Since the search content is limited only to the spiritual group, there is not an infinite number of items to search. I'd say about 100-500 books and 1000-5000 songs.

Lucene is a great way to get up and running quickly without too much effort, along with several areas for extending the indexing and searching functionality to better suit your needs. It also has several built-in analyzers for common file types, such as HTML/XML, PDF, MS Word Documents, etc.
It provides the ability to use a variety of Fields, and they don't necessarily have to be uniform across all Documents (in other words, music files might have different attributes than text-based content, such as artist, title, length, etc.), which is great for storing different types of content.
Not knowing the exact implementation of what you're working on, this may or may not be feasible, but for tagging and other related features, you might also consider using a database, such as MySQL or SQL Server side-by-side with the Lucene index. Use the Lucene index for full-text search, then once you have a result set, go to the database to extract all the relational content. Our company has done this before, and it's actually not as big of a headache as it sounds.
NOTE: If you decide to go this route, BE CAREFUL, as the "unique id" provided by Lucene is highly volatile (it changes every time the index is optimized), so you will want to store the actual id (the primary key in the database) as a separate field on the Document.
Another added benefit, if you are set on using C#.NET, there is a port called Lucene.Net, which is written entirely in C#. The down-side here is that you're a few months behind on all the latest features, but if you really need them, you can always check out the Java source and implement the required updates manually.
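The two-step pattern described above (full-text hit first, then a database lookup by a stored primary key) can be sketched as follows. This is only an illustration under stated assumptions: a plain dictionary stands in for the Lucene index, SQLite stands in for MySQL or SQL Server, and the names search_index, db_id and item are hypothetical.

```python
import sqlite3

# Toy stand-in for a search index. Each indexed document stores the
# database primary key in its own field ("db_id") instead of relying
# on the index's internal document number.
search_index = {
    "meditation": [{"db_id": 101}, {"db_id": 103}],
    "chant": [{"db_id": 102}],
}

def search_ids(term):
    """Step 1: ask the index for matches and read back the stored keys."""
    return [doc["db_id"] for doc in search_index.get(term.lower(), [])]

def fetch_rows(conn, ids):
    """Step 2: load the relational data for those keys from the database."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = ("SELECT id, title, author FROM item "
           "WHERE id IN ({}) ORDER BY id").format(placeholders)
    return conn.execute(sql, ids).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
    (101, "On Meditation", "A. Author"),
    (102, "Evening Chant", "B. Singer"),
    (103, "Meditation Talks", "C. Speaker"),
])

rows = fetch_rows(conn, search_ids("meditation"))
print(rows)  # [(101, 'On Meditation', 'A. Author'), (103, 'Meditation Talks', 'C. Speaker')]
```

Filters such as author or date can then be applied in the SQL step, which keeps the index itself small.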

Yes, there is a better approach. Try Solr and in particular check out facets. It will save you a lot of trouble.

If you definitely want to go the database route then you should use SQL Server with Full Text Search enabled. You can use this with Express versions, too. You can then store and search the contents of PDFs very easily (so long as you install the free Adobe PDF iFilter).

You could try using MS Search Server Express Edition; one of its major benefits is that it is free.
http://www.microsoft.com/enterprisesearch/en/us/search-server-express.aspx#none

What's the best way to implement a search?

I've got a requirement where a user enters a few terms into a search box and clicks "go".
Does anyone have any good resources on how to implement a dynamic search that spans a few database tables?
Thanks,
Mike

I'm gonna throw in my vote for Lucene. While SQL Server does provide full text indexing and some search capabilities, it is not the greatest search engine. In my experience, it does not provide the best results or result ranking until you have a significant volume of indexed items (tens of thousands to hundreds of thousands minimum).
In contrast, Lucene is explicitly a search engine. It is an inverted index, behaving much like your run of the mill internet search engine. Lucene provides a very rich indexing and search platform, as well as some rich C# and .NET API's for querying the indexes. There is even a LINQ to Lucene provider that will allow you to query a Lucene index with LINQ.
The one drawback to using Lucene is that you have to build an index, which is a side-band process that runs independently of the database. You have to write your own tool to manage the index as well. Your search index, depending on how frequently you update it, may not be 100% up to date. Generally, that is not a huge concern, but if you have the resources, the Lucene index could be incrementally updated every few minutes to keep things "fresh".
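One way to picture the incremental update mentioned above is a watermark: remember the latest modification time already indexed and re-index only rows changed since then. The sketch below is a toy under stated assumptions: a dictionary stands in for the Lucene index, and the row layout (id, text, modified) and integer timestamps are hypothetical.

```python
def incremental_update(index, rows, last_run):
    """Re-index rows modified after last_run and return the new watermark."""
    newest = last_run
    for row in rows:
        if row["modified"] > last_run:
            # Replace the indexed terms for this row.
            index[row["id"]] = row["text"].lower().split()
            newest = max(newest, row["modified"])
    return newest

index = {}
rows = [
    {"id": 1, "text": "Gates Foundation", "modified": 100},
    {"id": 2, "text": "Microsoft Redmond", "modified": 250},
]

# First run indexes everything.
watermark = incremental_update(index, rows, last_run=0)

# Later, only row 1 changes; the next run touches just that row.
rows[0]["text"] = "Gates Ventures"
rows[0]["modified"] = 300
watermark = incremental_update(index, rows, last_run=watermark)
print(index[1], watermark)  # ['gates', 'ventures'] 300
```

A scheduled job calling such a function every few minutes would give the "fresh" behaviour described; deleted rows would need separate handling, which this sketch leaves out.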

It is called Full-text Search.
http://msdn.microsoft.com/en-us/library/ms142571.aspx

This is a pretty loaded question given the lack of detail. If you just need a simple search over a few tables/columns, then a single (kludgy) search SP may be enough for you.
That said, if you need more features such as:
Searching a large set of tables
Support for large amounts of data
Searching over forms of a word
Logical operations
etc
then you might want to look into Full-Text Search (which is part of MS SQL Server 2000 and above). The initial investment to get up to speed with Full-Text Search can be a bit off-putting, but compared to implementing the above features yourself you'll likely save a ton of time and energy.
Here are some Full-Text Search links to get you started:
Msdn Page
Initial Set Up
Set Up Video
Hope that helps.

Ok there were a few requests for more info so let me provide some.
I have several tables (ie. users, companies, addresses) and I'd like a user to be able to enter something like this:
"microsoft wa gates"
and bring up a result list containing results for "gates", "microsoft", and "washington".
Lucene seems like it could be pretty cool.

You can create an SP that receives the search terms as parameters and returns several "selects" (recordsets) to the program that called it. It can return one select for each table, and you can do whatever you need with the data in your app code.
If you need to receive only a single dataset, you can create a view that uses UNION over the tables to consolidate the columns into a common schema, and then filter the view in the same way. Your application will then receive a single dataset with all the information consolidated in the view and filtered.
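The UNION view idea can be sketched as below. This is an illustration only: SQLite is used so the example runs on its own, whereas the thread is about SQL Server, and the table layouts, the view name search_all and the sample rows are assumptions. It matches each term with LIKE and does not expand abbreviations such as "wa" to "washington".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, state TEXT);
INSERT INTO users VALUES (1, 'Bill Gates'), (2, 'Jane Doe');
INSERT INTO companies VALUES (1, 'Microsoft'), (2, 'Contoso');
INSERT INTO addresses VALUES (1, 'WA'), (2, 'NY');

-- Consolidate the searchable columns into one common schema.
CREATE VIEW search_all AS
    SELECT 'user' AS source, id, name AS content FROM users
    UNION ALL
    SELECT 'company', id, name FROM companies
    UNION ALL
    SELECT 'address', id, state FROM addresses;
""")

def search(conn, query):
    """Return view rows whose content matches any of the search terms."""
    terms = query.split()
    if not terms:
        return []
    where = " OR ".join("content LIKE ?" for _ in terms)
    params = ["%{}%".format(t) for t in terms]
    sql = ("SELECT source, id, content FROM search_all "
           "WHERE {} ORDER BY source, id").format(where)
    return conn.execute(sql, params).fetchall()

results = search(conn, "microsoft wa gates")
print(results)
# [('address', 1, 'WA'), ('company', 1, 'Microsoft'), ('user', 1, 'Bill Gates')]
```

The source column tells the application which table each row came from, so a single result set can still be rendered per entity type.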
