Indexed search functionality - C#

I'd like to add a search feature to some software I'm developing. The idea is to add some kind of "indexed" search, so that as the user types in a text box, another GUI component shows the filtered results. For example:
User types: a
aaa
aba
aab
User types: aa
aa
aab
and so on.
Surely this thing has a name (as it is used almost everywhere), but I don't know it, and so far I haven't been able to find anything useful on the web. I don't need the exact code, just a link to some resources (a tutorial, etc.). Thanks.
EDIT: I'm not looking for autocomplete functionality: as I type in the text box, I'd like to see all the filtered results in (for example) a list box near the text box.

What you are trying to do is known as autocomplete (or a variation of it; you are simply filtering a list on the fly), and it is a very common feature.
It requires that you be able to look up against your data quickly, as you have to be able to update the list as the input is formed. Of course, input comes in the form of keystrokes, and some people are very fast typists.
If your list is contained in memory and is rather small, then your best bet would probably be to filter the list on the search criteria (I'll use "search criteria" for whatever is typed in the box).
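For the in-memory case, here is a minimal sketch, assuming a WinForms form with a TextBox named searchBox and a ListBox named resultsList (both names, and the sample data, are made up for illustration; it needs using System; and using System.Linq; at the top of the file):

    // Inside the Form class; searchBox and resultsList are designer-created
    // controls (hypothetical names). Wire this to searchBox.TextChanged.
    private readonly string[] items = { "aaa", "aba", "aab" };

    private void searchBox_TextChanged(object sender, EventArgs e)
    {
        string criteria = searchBox.Text;

        // StartsWith gives the prefix behavior from the question;
        // use Contains instead for substring matching.
        var matches = items
            .Where(i => i.StartsWith(criteria, StringComparison.OrdinalIgnoreCase))
            .ToArray();

        resultsList.BeginUpdate();
        resultsList.Items.Clear();
        resultsList.Items.AddRange(matches);
        resultsList.EndUpdate();
    }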
If your list is not contained in memory, then you'll need to index your data somehow. Generally, databases are not good at this sort of thing. Some have full-text indexing (SQL Server does), and if that suits your needs, you can query against that.
If you aren't using a database, then you might want to consider using Lucene.NET to index your content. If your content is small enough, I'd recommend using the RAMDirectory; otherwise, the standard FSDirectory (file-based) will do fine.
With Lucene, you'll want an n-gram analysis step so that partial input matches. The Contrib package offers the Shingle filter (token-based n-grams) and edge n-gram filters (character-based); the character-based variant is what lets you search on just the first few characters (the search criteria) and get results.
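As a rough sketch of the Lucene.NET route (this targets the 3.x API, so names may differ in other versions, and the "name" field is an assumption):

    using System;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using Version = Lucene.Net.Util.Version;

    var directory = new RAMDirectory(); // in-memory; FSDirectory.Open(path) for file-based
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);

    // Index the items.
    using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
    {
        foreach (var item in new[] { "aaa", "aba", "aab" })
        {
            var doc = new Document();
            // NOT_ANALYZED keeps each item as a single term, which suits prefix queries.
            doc.Add(new Field("name", item, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
        }
    }

    // A prefix query approximates "starts with the typed characters".
    var searcher = new IndexSearcher(directory, true);
    var hits = searcher.Search(new PrefixQuery(new Term("name", "aa")), 10);
    foreach (var scoreDoc in hits.ScoreDocs)
        Console.WriteLine(searcher.Doc(scoreDoc.Doc).Get("name"));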
Regardless of the approach you take, you need to account for the speed of the incoming input. If you perform a lookup every time a key is pressed, you'll issue a good number of requests whose results will never be used.
Generally, you might want to start searching only after the search criteria extend beyond two characters. Additionally, keep track of the requests that have been made; if a request is still outstanding when new input has been submitted, cancel the old request and submit the new one, since the old request's values won't be used.
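A sketch of that throttling and cancellation logic, assuming async lookups (PerformLookupAsync is a hypothetical wrapper around whatever search you use; it needs using System.Threading; and using System.Threading.Tasks;):

    // Field and handler inside the form/view class.
    private CancellationTokenSource _cts;

    private async void searchBox_TextChanged(object sender, EventArgs e)
    {
        string criteria = searchBox.Text;
        if (criteria.Length < 3) return;      // wait for 3+ characters

        _cts?.Cancel();                       // abandon the in-flight request
        _cts = new CancellationTokenSource();
        var token = _cts.Token;

        try
        {
            await Task.Delay(250, token);     // debounce rapid keystrokes
            var results = await PerformLookupAsync(criteria, token);
            resultsList.DataSource = results; // only the newest request lands here
        }
        catch (OperationCanceledException)
        {
            // Superseded by newer input; ignore.
        }
    }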
When it comes to the UI component, it's better to let a component vendor handle this: WinForms has an autocomplete mechanism for the TextBox, Silverlight has an AutoCompleteBox in the Silverlight Toolkit, and jQuery has an autocomplete mechanism for web pages. Use one of those and feed your data to the control using the guidelines above.

If you are talking about a WinForms TextBox, then you might look at the AutoCompleteMode and AutoCompleteCustomSource properties of the TextBox.
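For example (the sample strings are placeholders):

    // Built-in WinForms autocomplete: suggests completions in a drop-down
    // under the TextBox (note this differs from the separate-ListBox behavior
    // the asker describes).
    var source = new AutoCompleteStringCollection();
    source.AddRange(new[] { "aaa", "aba", "aab" });

    textBox1.AutoCompleteMode = AutoCompleteMode.SuggestAppend;
    textBox1.AutoCompleteSource = AutoCompleteSource.CustomSource;
    textBox1.AutoCompleteCustomSource = source;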

Related

What schema, database, and search libraries are good for storing thousands of book pages in a C# app?

I want to write a C# program to store some books totalling about 5,000 pages. There are a few important issues here where I need your help and advice:
The ability to search all of the books' content is one of the most important and challenging features of the app. The time needed to search for a word should be about the time required to search for a word in Microsoft Word or a PDF document of the same size, or somewhat more.
What method should I employ for storing the books so that suitable approaches to searching the content are available? A relational DB, MongoDB, CouchDB, etc.: which one is preferred?
If a database is used, what kind of schema and indexing is required and important?
Which method, algorithm, or library is best for searching the whole content of the books? Is it possible to use Lucene or Solr in a standalone Windows app, or would a traditional search method be better?
The program should be customizable so that publishers can add their own book content. How can I handle this feature (could I use XML)?
The users should be able to add one or more lines from the contents to their favorite list. What is the best way to deal with this?
I think Solr will be able to meet most of these requirements. For #1, you can easily develop a schema in Solr to hold various information in different formats. Solr's Admin UI has an Analysis tab that will help you greatly in developing your schema, because it allows you to test your changes on the fly with different types of data. It is a huge time saver because you don't have to create a bunch of test content and index it just to test. Additionally, if the contents of the books are in binary formats, you can use Apache Tika to perform text extraction. Solr also has a number of other bells and whistles that you may find helpful, such as highlighting and spelling suggestions for user queries.
For #2, Solr supports updates to content via JSON files that can be sent to the update handler for your collection. It also supports atomic updates, which you may find useful. In your case, you may need some kind of security solution to sit on top of Solr to prevent publishers from modifying each other's content; however, you will most likely run into this issue regardless of the type of solution you use.
For #3, I am not sure what you are really looking for here. For content search and retrieval I think you will find Solr a good fit. For general user-information storage and the like, you may need a different tool, since that is somewhat outside the scope of what Solr is meant to do.
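If you do go with Solr from a standalone C# app, the SolrNet client library is one option. A minimal sketch, assuming a "books" core with "id", "title", and "content" fields (the URL, core, and field names are all assumptions about your setup):

    using System;
    using Microsoft.Practices.ServiceLocation;
    using SolrNet;
    using SolrNet.Attributes;

    // Document class mapped to the (assumed) Solr schema.
    public class BookPage
    {
        [SolrUniqueKey("id")]
        public string Id { get; set; }

        [SolrField("title")]
        public string Title { get; set; }

        [SolrField("content")]
        public string Content { get; set; }
    }

    public static class SolrExample
    {
        public static void Run()
        {
            // One-time setup against the core URL.
            Startup.Init<BookPage>("http://localhost:8983/solr/books");
            var solr = ServiceLocator.Current.GetInstance<ISolrOperations<BookPage>>();

            // Index a page, then run a full-text query against the content field.
            solr.Add(new BookPage { Id = "1", Title = "Sample", Content = "full page text..." });
            solr.Commit();

            var results = solr.Query(new SolrQuery("content:word"));
            foreach (var page in results)
                Console.WriteLine(page.Title);
        }
    }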
Hope it helps.

Pre-process MVC Razor File For Multi-Lingual Language Strings?

In my application we have multilingual language strings stored in custom tables, since the user can edit, delete, import new languages, etc. via a UI.
Currently, at the beginning of each request I go off and get all the language strings (from our database) for the currently selected language and stick them in a dictionary.
I then have an HTML helper extension method which I use in the Razor views (see below); it fishes in the dictionary built at the beginning of the request to pull out the correct string based on the key supplied to the helper:
Html.LanguageString("MyLanguage.KeyHere")
Now this works fine. However, as the application gets bigger, we are accumulating more and more language strings. It's not an issue right now, as it's still very fast with only around 200 strings to fetch.
But it also means I'm fetching all of them, even if a page uses, say, one. Ideally, I'd like a way of pre-processing the LanguageString("") calls ahead of time and running a query at the beginning of the request for just those that are needed. Or maybe my own LINQ-based language that can be processed to produce a more efficient call.
I'm looking for some advice on how to do this, as I'd like the application to be as efficient as possible. Any advice, help, or tips are gratefully received. Thanks.
I'd suggest caching language strings at the application level rather than fetching them on every request. For example, this can be done by maintaining a static dictionary and invalidating the cache only when the user changes these strings. This will make your application more responsive and save you from implementing (IMHO) the rather more complex and not necessarily more efficient technique of loading this data on demand.
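A minimal sketch of that idea (LanguageCache and LoadStringsFromDatabase are hypothetical names; the loader would wrap your existing data access):

    using System.Collections.Concurrent;
    using System.Collections.Generic;

    public static class LanguageCache
    {
        // One dictionary of key -> translated string per culture.
        private static readonly ConcurrentDictionary<string, IDictionary<string, string>> Cache =
            new ConcurrentDictionary<string, IDictionary<string, string>>();

        public static string Get(string culture, string key)
        {
            var strings = Cache.GetOrAdd(culture, LoadStringsFromDatabase);
            string value;
            return strings.TryGetValue(key, out value) ? value : key; // fall back to the key
        }

        // Call from the admin UI whenever strings are edited, deleted, or imported.
        public static void Invalidate(string culture)
        {
            IDictionary<string, string> removed;
            Cache.TryRemove(culture, out removed);
        }

        private static IDictionary<string, string> LoadStringsFromDatabase(string culture)
        {
            // Hypothetical: query your custom tables here.
            return new Dictionary<string, string>();
        }
    }

The HTML helper then becomes a one-line call to LanguageCache.Get for the current culture.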
As a side note I'd add the following: it's usually a good practice to address these kinds of problems when they arise (rather than fixing something that is not broken) and focus on more important things. I totally agree that performance implications of a given solution must always be taken into consideration, I'm just saying that premature optimizations are not always a good idea.

Undo/Redo Best Practices with MVVM

I am working on essentially a drawing editor that allows you to define geometries based on key points on existing geometries. The user is then able to add some information about the thing they just added, such as name, expected size, etc. The API I am using to accomplish it is the awesome Reversible API, though I hope that the question extends beyond the API that I am using.
There are basically a couple of questions I am seeking a little clarity on:
1) If you are supporting Undo/Redo in an application that supports selection in a master/detail manner, should changing the state of a drawing object also cause it to be selected? For example, suppose an undo operation changes the name of an element; that change would not be obvious unless the element were selected. Is there a standard behavior for something like this?
2) When dealing with certain types of incremental changes (dragging a box, or using a numeric spinner), it seems to be standard form for the set of changes from a single user interaction (a mouse swipe, or the act of releasing the spinner button) to be grouped into one undo step; but with MVVM, I currently only know that the property has changed, not the source of the change. Is there a standard way for these types of interactions to propagate to the view model without completely disintegrating the pattern?
When in doubt the best approach is to take a look at typical behaviour of OS controls and other applications on the platform in order to be consistent with what users will be familiar with. In particular, consistency with the most commonly-used applications. If you examine how other apps approach a UI issue you can often learn a lot, especially about subtle cases you may not have considered in your own design.
1) Conventionally, undoing tends to select the changed item(s), both to highlight what changed and to move the user's input focus back to the last edit so that they can continue. This works particularly well for content like text because if you undo/redo something you typed, chances are you want to continue editing in the area of the text you've just undone/redone. The main choice for you to make with master/detail is whether to select the master object only, or to select the precise detail that changed.
2) Your undo manager can use some intelligence to conglomerate similar actions into a single undo step. For example, if the user types several characters in a row, it could notice that these actions are all alike and concatenate them into a single undo step. Just how it does this depends on how you are storing and processing the undo, but with a decent object-oriented design this should be an easy option to add (i.e. ask undo records themselves if they can be conglomerated, so you can easily add new types of undo record in future). Beware, though, that accumulating too many changes into one step can be intensely irritating, so you may find the lazier implementation of one action = one step actually achieves a better UX than trying to be too clever. I'd start with brute force and add conglomeration only if you find you end up with lots of repetitive undo sequences (like 100 single-pixel movements instead of just one 100-pixel jump).
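A sketch of the conglomeration idea (IUndoRecord and MoveRecord are made-up names for illustration, not part of the Reversible API):

    public interface IUndoRecord
    {
        void Undo();
        void Redo();
        bool TryMergeWith(IUndoRecord next); // true if `next` was absorbed
    }

    public class MoveRecord : IUndoRecord
    {
        public object Target;
        public double OldX, OldY, NewX, NewY;

        public void Undo() { /* move Target back to (OldX, OldY) */ }
        public void Redo() { /* move Target to (NewX, NewY) */ }

        public bool TryMergeWith(IUndoRecord next)
        {
            // Fold successive drags of the same object into one step:
            // keep the original start position, adopt the latest end position.
            var move = next as MoveRecord;
            if (move == null || move.Target != Target) return false;
            NewX = move.NewX;
            NewY = move.NewY;
            return true;
        }
    }

    // In the undo manager, on every push:
    //     if (undoStack.Count == 0 || !undoStack.Peek().TryMergeWith(record))
    //         undoStack.Push(record);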

Best way to store data on the client side - ASP.Net + JQuery

I'm writing an admin form for some fairly complex objects. It's a standard repeater which displays some 'basic' information (name, ID, etc.) for each object row.
Clicking 'Edit' for a row expands it (using jQuery) to reveal the full horror of all the associated editable objects. One of these is a list of documents associated with each row, which needs to be jQuery-editable, so the user can click 'Edit' to open up the full row GUI, then un/select checkboxes to de/associate documents and hit 'Save' to persist everything.
Currently I'm using nested repeaters to store the initially hidden fields: the repeater generates a hidden form field containing a comma-separated list of IDs for the associated documents. When it comes to populating the edit GUI, I split the delimited string and set/unset the checkboxes as required.
This is proving a nightmare from a maintainability perspective, and in my frustrated wanderings of the web in search of a solution I noticed jQuery has some functionality to act as a client-side database. Does anyone have any experience of this, and if so, would you recommend it? My custom JS to parse CSV strings and dynamically build the GUI is starting to grind me down a bit.
Thanks in advance,
5arx
You're getting into the realm of very advanced client-side behavior, and are bumping into a phenomenon that I think a lot of Web Forms developers hit: trying to mash two paradigms into each other.
Without going into a lot of detail, my advice would be to go with a "Pure AJAX" approach to solving your client woes. The basic outline is this:
1. Use AJAX calls to grab a JSON representation of your data structure. In your case: jQuery.ajax, jQuery.get, or jQuery.getJSON.
2. Use a client-side templating/binding framework to generate the UI and bind the JSON objects to those elements. There are a couple of good options here:
   - jQuery templating (http://api.jquery.com/category/plugins/templates/) and data binding (http://api.jquery.com/category/plugins/data-link/)
   - Knockout JS, which can use jQuery templating but implements its own version of data binding.
3. Make actions on the UI call web service methods to handle data-manipulation operations.
You can implement the JSON stuff however you feel best suits your needs, but in ASP.NET you basically have two options: WCF services or Page Methods.
It's probably going to involve some re-architecting on your part, but if you want to achieve really nice client-side behavior you are going to have to bite the bullet and just do it.
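To illustrate the Page Methods option: the page exposes static methods marked [WebMethod], and with page methods enabled (e.g. a ScriptManager with EnablePageMethods="true") ASP.NET serializes the return values to JSON for AJAX callers. Document and DocumentRepository below are hypothetical:

    using System.Collections.Generic;
    using System.Web.Services;

    public partial class AdminForm : System.Web.UI.Page
    {
        [WebMethod]
        public static List<Document> GetDocumentsForRow(int rowId)
        {
            // Hypothetical data access returning the documents for one row.
            return DocumentRepository.GetByRow(rowId);
        }

        [WebMethod]
        public static void SaveAssociations(int rowId, int[] documentIds)
        {
            // Persist the checkbox selections for this row.
            DocumentRepository.SaveAssociations(rowId, documentIds);
        }
    }

On the client, jQuery.ajax can POST JSON to AdminForm.aspx/GetDocumentsForRow and bind the response with the templating framework above.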

Localizing data that is generated dynamically

This was a hard question for me to summarize so we may need to edit this a bit.
Background
About four years ago, we had to translate our asp.net application for our clients in Mexico. Extensibility and scalability were not that much of a concern at the time (oh yes, I just said those dreadful words) because we only have U.S. and Mexican customers.
Rather than use resource files, we replaced every single piece of static text in our application with some type of server control (an ASP.NET Label, for example). We store each and every English word in a SQL database. We have added the ability to translate the English text into another language and can also add cultural overrides. For example, hello can be translated to ¡hola! in one language and overridden to ¡bueno! in a different culture. The business has full control over these translations because we built management utilities for them to control everything. The translation kicks in when we detect that the user has a browser culture other than en-US. Every form descends from a base form that iterates through each server control and executes a translation (translation data is stored as a DataTable in an application variable, per culture). I'm still amazed at how fast the control iteration is.
The problem
The business is very happy with how the translations work. In addition to the static content mentioned above, the business now wants certain data translated as well. System notes are a good example. Take "Sent Letter #XXXX to Customer": the business wants the "Sent Letter to Customer" text translated based on the user's browser culture.
I have read a couple of other posts on SO that talk about localization, but they don't address my problem. How do you translate a phrase that is dynamically generated? I could easily read the English text and translate "Sent", "Letter", "to", and "Customer" individually, but I guarantee it would look stupid to the end user, because it's a phrase. And if we stored the phrase in English less the dynamic text, the dynamic part of the system-generated note would screw up any lookups we perform on the phrase.
One thought I had... We don't have a table of system-generated note types. I suppose we could create one that had placeholders for dynamic data, and the translation engine would ignore the placeholder markers. The problem with this approach is that our SQL Server database is a replication of an old Pick database, and we don't really know all the types of system-generated phrases (they are deep in the Pick code base, in subroutines, control files, etc.). Things like notes, ticklers, and payment rejection reasons are all stored differently. Trying to normalize this data has proven difficult, and it would be a huge effort to go back and identify and change every Pick program that generates a message.
This question is very close; but I'm not dealing with just system-generated status messages but rather an infinite number of phrases and types of phrases with no central generation mechanism.
Any ideas?
The lack of a "bottleneck" -- what you identify as the (missing) "central generation mechanism" -- is the architectural problem in this situation. Ideally, rearchitecting to put such a bottleneck in place (so you can keep using your general approach with a database of culture-appropriate renditions of messages, just with "placeholders" for e.g. the #XXXX in your example) would be best.
If that's just unfeasible, you can place the "bottleneck" at the other end of the pipe: when a message is about to be emitted. At that point, or those few points, you need to try to match the (English) string that's about to be emitted against a series of well-crafted regular expressions (with "placeholders" typically like (.*?)) and thereby identify the appropriate key for the DB lookup. Yes, that is still a lot of work, but at least it should be feasible without the issues you mention regarding the old Pick code.
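A sketch of that emission-time matching (the pattern table, keys, and LookupTemplate are all hypothetical):

    using System.Text.RegularExpressions;

    public class MessagePattern
    {
        public Regex Pattern;      // captures the dynamic parts
        public string TemplateKey; // key for the translated-template lookup
    }

    public static class MessageTranslator
    {
        // One entry per known system-generated phrase.
        private static readonly MessagePattern[] Patterns =
        {
            new MessagePattern
            {
                Pattern = new Regex(@"^Sent Letter #(.*?) to Customer$"),
                TemplateKey = "SentLetterToCustomer"
            }
            // ...
        };

        public static string Translate(string englishMessage, string culture)
        {
            foreach (var p in Patterns)
            {
                var match = p.Pattern.Match(englishMessage);
                if (!match.Success) continue;

                // LookupTemplate is a hypothetical DB call that returns, e.g.,
                // "Carta #{0} enviada al cliente" for the given culture.
                string template = LookupTemplate(p.TemplateKey, culture);

                // Re-insert the captured dynamic parts into the translation.
                var args = new object[match.Groups.Count - 1];
                for (int i = 1; i < match.Groups.Count; i++)
                    args[i - 1] = match.Groups[i].Value;
                return string.Format(template, args);
            }
            return englishMessage; // unknown phrase: emit untranslated
        }

        private static string LookupTemplate(string key, string culture)
        {
            return "{0}"; // stand-in for the real DB lookup
        }
    }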
We use the technique you propose, with insertion points:
"Sent letter #{0:Letter Num} to Customer {1:Customer Full Name}"
Which might be (in reverse Pig Latin, say):
"Ustomercay {1:Customer Full Name} asway entsay etterlay #{0:Letter Num}"
Note that this handles cases where the particular target language reverses the order of insertion, etc. It does not handle subtleties like first, second, third, which have to be handled with application logic/more phrases:
"This is your {0:first, second, third} warning"
In a pinch I suppose you could try something like foisting the job off onto Google if you don't have a translation on hand for a particular phrase, and stashing the translation for later.
Stashing the translations for later provides both a data collection point for building a message catalog and a rough (if sometimes laughably wonky) dynamically built starter set of translations. Once you begin the process, track which translations have been reviewed and how frequently each have been hit. Frequently hit machine translations can then be reviewed and refined.
Dynamic machine translation is not suitable for a product that you actually expect people to pay money for. The only way to do it is with static templates containing insertion points (as Cade Roux has demonstrated in his answer).
There's no getting around a thorough refactoring of your code to make this feasible. The alternative is to do nothing with those phrases (which is what you're doing now, and it's working out okay, right?). Usually no translation is better than embarrassingly bad translation.
