Get all Lucene values that have a certain fieldName - C#

To solve this problem I created a new Lucene index where all possible distinct values of each field are indexed separately.
So it's an index with a few thousand docs that each have a single Term.
I want to extract all the values for a certain term. For example, I would like all values that have the fieldName "companyName".
Defining a WildcardQuery is of course not a solution. Neither is enumerating ALL fields and keeping only the ones with the correct fieldName.

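This should work (I take it this is still C#):
IndexReader.Open(/* path to index */).Terms(new Term("companyName", String.Empty));
Terms(...) returns a TermEnum positioned at the first term at or after the one given, so iterate it and stop as soon as the term's field is no longer "companyName".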

Related

How to save matrix data to an object that you can query with indexes

I have data in the following format
Henry |Ford |34|Yes|Absolutely
Wiliman|Tenner|44|No |Why Not?
Each row can have a different count of columns. I want to have all values saved as strings. I want to store the data in a way I can query them later with indexes but have not found the right way yet.
I came up with the idea of using a List(Of List(Of String)) but have not found a way to query it by index.
Another way would be to use a DataSet and write the data into a DataTable, but this looks to me like using a sledgehammer to crack a nut.
As Nkosi mentioned, you can just access items in your list with listToAccess(i)(j).
However, be careful: searching a list for a value is O(n), and searching a list of lists is O(n*m) for n rows of m columns. Access by index, as above, is O(1).
If your secondary lists are roughly the same size, I would recommend using a two-dimensional array (matrix) instead, sized to the longest secondary list. This wastes some space, but every cell can then be addressed directly by its two indexes.
' VB.NET array declarations take upper bounds, hence the - 1. String is used because all values are stored as strings.
Dim matrix As String(,) = New String(firstSize - 1, secondSize - 1) {}
Then you can read or write a cell with matrix(i, j).
Referencing this answer: add a list into another list in vb.net
Reading them is like accessing any other list.
To get the first field in the first record
return records(0)(0)
second field in first record
return records(0)(1)

Comparing Windows.Forms.ListViewItem Items for Uniqueness

I have a Windows.Forms.ListView where the user shall be able to add and remove entries. Particularly, those are files (with attributes) the user can pick through a dialog. Now, I want to check whether the file names / entries I get from the file picker are already in the list; in other words, there shall only be unique items in the ListView.
I could not find any way to compare ListViewItems to check whether the exact same entry and information is already present in my ListView. The only way I see now is to:
> Loop through the files I get from the picker (multiselect is true)
  > Loop through ListView.Items
    > compare ListViewItem.Text
    > Loop through ListViewItem.SubItems
      > compare .Text
If during the comparisons a complete match was found, the new entry is a duplicate and thus is not added afterwards.
This seems like an awful lot of effort to do something that I would find to be a function that is not so uncommon. Is there any other way to achieve this?
The file system itself uses only the filename to test for uniqueness, so you should do the same, no need to compare sub-items too.
Items in a ListView typically represent some object. What I usually do is assign that object (or at least some value identifying it) to the Tag property of the corresponding ListViewItem when it is added to the list. That way you get a quite simple setup where you can compare items by reading the values from the Tag property and performing the comparison on those objects instead of on their list view representation.
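The filename-only check suggested above can be done without nested loops by keeping a set of paths next to the ListView. This is only a sketch under assumptions: FileListTracker, TryAdd and Remove are hypothetical names, not WinForms API, and the case-insensitive comparer assumes a Windows-style file system.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: tracks which file paths are already shown in the ListView.
public sealed class FileListTracker
{
    // Case-insensitive comparison is an assumption (Windows file systems usually ignore case).
    private readonly HashSet<string> _paths =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Returns true if the path was new. The caller then creates the ListViewItem
    // (for example with item.Tag = path) and adds it to listView.Items.
    public bool TryAdd(string path)
    {
        if (path == null) throw new ArgumentNullException(nameof(path));
        return _paths.Add(path);
    }

    // Call this when the user removes the corresponding ListViewItem.
    public bool Remove(string path)
    {
        return path != null && _paths.Remove(path);
    }
}
```

For each entry in the dialog's FileNames array, a ListViewItem would be created only when TryAdd returns true, so a duplicate costs one hash lookup instead of a pass over all items and sub-items.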

How to fetch entries starting with the given string from a SQL Server database?

I have a database with a lot of words to be used in a tag system. I have created the necessary code for an autocomplete box, but I am not sure of how to fetch the matching entries from the database in the most efficient way.
I know of the LIKE command, but it seems to me that it is more of an EQUAL command. I get only the words that look exactly like the word I enter.
My plan is to read every row, and then use C#'s string.StartsWith() and string.Contains() functions to find words that may fit, but I am thinking that with a large database, it may be inefficient to read every row and then filter them.
Is there a way to read only rows that start with or contain a given string from SQL Server?
When using LIKE, you provide a % sign as a wildcard. If you want strings that start with Hello, you would use LIKE 'Hello%'. If you want strings with Hello anywhere in them, you would use LIKE '%Hello%'.
As for efficiency, using LIKE is not optimal. You should look into full-text search.
"I know of the LIKE command, but it seems to me that it is more of an EQUAL command. I get only the words that look exactly like the word I enter."
That's because you aren't using wildcards:
WHERE column LIKE 'abc%'
...will return rows where the column value starts with "abc". Of the wildcard forms, this is the only one that can make use of an index on the column.
WHERE column LIKE '%abc%'
...will return rows where the column value contains "abc" anywhere in it. A wildcard on the left side of a LIKE pattern means an index on the column cannot be used to seek to the matching rows.
SQL Server doesn't natively support regular expressions - you have to use CLR functions to gain access to the functionality. But it performs on par with LIKE.
Full Text Search (FTS) is the best means of searching text.
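For the autocomplete case, the two LIKE forms above can be built from the user's input on the C# side and passed as a query parameter. The sketch below assumes a hypothetical Tags table with a Word column; LikePattern, Escape, StartsWith and Contains are illustrative names. It escapes the T-SQL wildcard characters so that typed text is matched literally.

```csharp
using System.Text;

public static class LikePattern
{
    // Hypothetical query text; the Tags table and Word column are assumptions.
    public const string Query = "SELECT Word FROM Tags WHERE Word LIKE @pattern";

    // Wraps the T-SQL wildcard characters %, _ and [ in brackets so that
    // user input is matched literally.
    public static string Escape(string input)
    {
        var sb = new StringBuilder(input.Length + 8);
        foreach (char c in input)
        {
            if (c == '%' || c == '_' || c == '[')
                sb.Append('[').Append(c).Append(']');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // 'abc%': prefix match, the form that can use an index on the column.
    public static string StartsWith(string input) => Escape(input) + "%";

    // '%abc%': substring match, with a leading wildcard.
    public static string Contains(string input) => "%" + Escape(input) + "%";
}
```

The result would be supplied as the value of @pattern, for example command.Parameters.AddWithValue("@pattern", LikePattern.StartsWith(text)), so that only matching rows are read from the server.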
You can also implement StartsWith functionality with any of the following expressions (X is the length of the prefix, 3 for 'abc'):
LEFT('String in which you search', X) = 'abc'
CHARINDEX('abc', 'String in which you search') = 1
'String in which you search' LIKE 'abc%'
Use the one which performs best.
You can use CONTAINS in T-SQL, but I'm pretty sure you have to be using full-text indexing for the table involved in your query.
See the documentation for CONTAINS and "Getting started with Full-Text Search".

Inspecting Lucene.NET index with Luke want to replicate NHibernate.Search view

I am trying to put together an index using terms, which I specify as a comma-separated list. I want to replicate the display in Luke as seen here:
http://ayende.com/Blog/archive/2009/05/03/nhibernate-search-again.aspx
But my index value just shows as a single field with the comma-separated list as its value. For example:
Tags term,anotherterm
When I search my index, it returns results if I search for "term" but returns nothing if I search for "anotherterm".
I thought the indexing process would break the comma-separated list apart into separate values, but this does not seem to be the case.
Anyone got any ideas?
Thanks
The collection to index is an ISet<T>, and I suspect that T is a type with the [Indexed] attribute that has a property Name marked with the [Field] attribute.
This was answered as part of this question:
Lucene.NET search index approach

How can I create a simple class that is similar to a datatable, but without the overhead?

I want to create a simple class that is similar to a datatable, but without the overhead.
So I would load the object from a SqlDataReader and then return this custom DataTable-like object, which would give me access to the rows and columns like:
myObject[rowID]["columnname"]
How would you go about creating such an object?
I don't want any built in methods/behavior for this object except for accessing the rows and columns of the data.
Update:
I don't want a DataTable; I want something much leaner (plus I want to learn how to create such an object).
This type of structure can be easily created with a type signature of:
List<Dictionary<string, object>>
This will allow access as you specify and should be pretty easy to populate.
You can always create an object that inherits from List<Dictionary<string, object>> and implements a constructor that takes a SqlDataReader. This constructor should create a new dictionary for each row and insert an entry into the dictionary for each column, using the column name as the key.
I think you're missing something about how .NET works. The extra overhead involved in a DataTable is not significant. Can you point to a specific performance problem in existing code that you believe is caused by a DataTable? Perhaps we can help correct that in a more elegant way.
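A minimal sketch of that constructor might look like the following. RowList is a hypothetical name, and the constructor takes IDataReader (which SqlDataReader implements) so it can be tried without a database. Ignoring case in column names is an assumption, not something the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical class name; one dictionary per row, keyed by column name.
public class RowList : List<Dictionary<string, object>>
{
    // Accepts any IDataReader, which SqlDataReader implements.
    public RowList(IDataReader reader)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));

        while (reader.Read())
        {
            // Column-name lookups ignore case here; that is a design assumption.
            var row = new Dictionary<string, object>(
                reader.FieldCount, StringComparer.OrdinalIgnoreCase);

            for (int i = 0; i < reader.FieldCount; i++)
            {
                // Indexer assignment: a duplicate column name overwrites the earlier value.
                // Database NULLs arrive as DBNull.Value.
                row[reader.GetName(i)] = reader.GetValue(i);
            }

            Add(row);
        }
    }
}
```

With this in place, new RowList(command.ExecuteReader())[0]["Name"] reads the Name column of the first row. The first index is the row's position in the result set, not a database ID.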
Perhaps the specific thing you're asking about is how to use the convenient ["whatever"] indexing syntax in your own table object.
If so, I suggest you refer to this MSDN page on indexers.
Dictionary<int,object[]> would be better than List<Dictionary<string, object>>. You don't really need a dictionary for each row, since column names are the same for all rows. And if you want to have it lightweight, you should use column indexes instead of names.
So if you have a column "Name" that is the third column, the code to get its value from the row with ID 10 would be:
object val = table[10][2];
Another option is SortedList<int,object[]>... depending on the way you access the data (forward only or random access).
You could also use MultiDictionary<int,object> from PowerCollections.
From the memory usage perspective, I think the best option would be to use a single-dimension array with some slack capacity. Whenever the array fills up, you would create a new one with room for, say, 100 more rows and copy the old contents into it. You would also have to keep some sort of index of deleted rows, so that a row can be marked as deleted without resizing the array.
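The object[]-per-row idea can be combined with C# indexers so that both table[row][2] and table[row]["Name"] work while column names are stored only once. The sketch below makes assumptions: LeanTable and LeanRow are hypothetical names, rows are addressed by position in a List rather than by a database ID, and column names are compared ignoring case.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical names throughout: LeanTable and LeanRow are not framework types.
public sealed class LeanTable
{
    private readonly Dictionary<string, int> _ordinals =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    private readonly List<object[]> _rows = new List<object[]>();

    public LeanTable(IDataReader reader)
    {
        // Column names are stored once for the whole table, not once per row.
        for (int i = 0; i < reader.FieldCount; i++)
            _ordinals[reader.GetName(i)] = i;

        while (reader.Read())
        {
            var values = new object[reader.FieldCount];
            reader.GetValues(values);
            _rows.Add(values);
        }
    }

    public int Count => _rows.Count;

    // table[rowIndex] returns a lightweight view over one row.
    public LeanRow this[int rowIndex] => new LeanRow(_rows[rowIndex], _ordinals);
}

public struct LeanRow
{
    private readonly object[] _values;
    private readonly Dictionary<string, int> _ordinals;

    internal LeanRow(object[] values, Dictionary<string, int> ordinals)
    {
        _values = values;
        _ordinals = ordinals;
    }

    // row[2]: access by column position.
    public object this[int ordinal] => _values[ordinal];

    // row["Name"]: access by column name; throws KeyNotFoundException if unknown.
    public object this[string column] => _values[_ordinals[column]];
}
```

Lookup by name costs one extra dictionary access compared with lookup by ordinal, which is the trade-off between convenience and the leaner index-only access described above.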
Isn't this a DataSet/DataTable? Maybe I didn't get the question.
Also, what is the programming language?
