I'm writing a class to store a table-like structure.
Each column in this table structure has a name and an index.
Each row will be looped through, and in about 90% of cases the data will be requested by column name rather than by index.
So what's a good data structure for storing the columns, so that the index can be retrieved very quickly from the name? Right now I'm using a simple string[], but I wonder if there are faster ways to do this.
Parts of the code:
private string[] _columns;
private int _width;
private int getIndex(string columnName)
{
    for (int i = 0; i < _width; i++)
    {
        if (_columns[i] == columnName) return i;
    }
    return -1;
}
The names of the columns will stay constant once they've been set, and they're mostly 10-16 characters long.
Thanks in advance.
Since you are usually going to access columns by name, this sounds like a good place to use a map (the Dictionary class in C#) that maps strings to columns (string arrays). That gives O(1) average-case access by name instead of the O(n) linear scan in the code above.
The disadvantage is that you wouldn't be able to access columns directly by index anymore. However, this is simple to solve: just keep your list of column names and use it to index. You can then call
_columnsMap[_columns[index]]
if you ever need to index by number, and it is still O(1) time.
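A minimal sketch of that layout, assuming each column's data is a string[] and the row count is known up front. The class name Table and its members are hypothetical; only the _columns/_columnsMap idea comes from the answer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical table class: column data is stored as string arrays keyed by name,
// and the original column order is kept in a separate array for index access.
public class Table
{
    private readonly string[] _columns;                          // column names, in order
    private readonly Dictionary<string, string[]> _columnsMap;   // name -> column data

    public Table(string[] columnNames, int rowCount)
    {
        _columns = columnNames;
        // Ordinal comparison matches the == comparison used in the question.
        _columnsMap = new Dictionary<string, string[]>(columnNames.Length, StringComparer.Ordinal);
        foreach (string name in columnNames)
        {
            _columnsMap.Add(name, new string[rowCount]);
        }
    }

    // Average O(1) lookup by name.
    public string[] GetColumn(string name) => _columnsMap[name];

    // Still average O(1): translate the index to a name, then look up by name.
    public string[] GetColumn(int index) => _columnsMap[_columns[index]];
}
```

Both overloads return the same array instance, so writes through one are visible through the other.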
Use a Dictionary<string,int> to map each column name to its index.
Using your example (which doesn't show how _columns is populated):
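private IDictionary<string,int> _columns;
private int _width;
private int getIndex(string columnName)
{
    int index;
    // TryGetValue keeps the original "return -1 if not found" behaviour;
    // the indexer _columns[columnName] would throw KeyNotFoundException instead.
    return _columns.TryGetValue(columnName, out index) ? index : -1;
}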
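Since the question leaves out how the map gets filled, here is one way to build it once from the column-name array. This is a sketch: the ColumnIndex class and its constructor are hypothetical, and it assumes column names are unique.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds the name -> index map once, after the column
// names are set (the question says they stay constant afterwards).
public class ColumnIndex
{
    private readonly IDictionary<string, int> _columns;

    public ColumnIndex(string[] columnNames)
    {
        // Ordinal comparison matches the == comparison in the original loop.
        _columns = new Dictionary<string, int>(columnNames.Length, StringComparer.Ordinal);
        for (int i = 0; i < columnNames.Length; i++)
        {
            _columns.Add(columnNames[i], i); // Add throws ArgumentException on duplicate names
        }
    }

    // Returns -1 for unknown names, like the original linear search.
    public int GetIndex(string columnName)
    {
        return _columns.TryGetValue(columnName, out int index) ? index : -1;
    }
}
```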
I am creating a static class with a static method that compares an array of strings filled with the user's selections against a predefined array of what are "supposed to be the inputs".
My concerns are where to place the predefined array within the class, and whether the right data type is actually an array or a dictionary.
The predefined array that I compare against will hold at most roughly 150 strings.
Here is what I have so far:
public static class SelectionMatchHelper
{
    static readonly string[] predefinedStrings = {"No answer", "Black", "Blonde"};
    // "readonly" is not valid on a method, so it is removed here
    public static bool SelectionMatch(string[] stringsWeAreComparingAgainst, int validCount)
    {
        int numberOfMatches = 0;
        for (int x = 0; x < stringsWeAreComparingAgainst.Length; x++)
        {
            //will loop through and check if the value exists because the array to match against will not always have the same index length
            numberOfMatches += 1;
        }
        // numberOfMatches.Dump(); // LINQPad-only debug output
        if (numberOfMatches == 0 || numberOfMatches != validCount) return false;
        return true;
    }
}
What this basically does: given the number of selections the user must make, the method counts the matches, and if the count doesn't equal that number it returns false. The user's input comes from a drop-down, so this is only used to make sure my values aren't tampered with before saving.
My first question is which data type is best for this scenario: a string array/list or a dictionary? The second is where it should be placed to avoid threading issues: inside the method or outside it?
EDIT - I just want to add that the predefined values would remain the same, so I would end up making that field readonly (an array can't be const in C#).
EDIT 2 - Just re-checked my code: I won't be using CompareOrdinal because I totally forgot the part where the order matters, so it will be a key lookup. So I will remove the inside of the method so people don't get confused. The main question is still the same.
Thanks for your help everyone.
From a readability point of view, HashSet is the best type, since it exists specifically to answer "is this item present in the set?".
static readonly HashSet<string> predefinedStrings = new HashSet<string>(
    new []{"No answer", "Black", "Blonde"},
    StringComparer.Ordinal);
if (predefinedStrings.Contains("bob")) { /* ... */ }
Fortunately, HashSet is also safe for concurrent reads as long as nothing modifies it, provides O(1) average lookup time, and supports case-insensitive comparison if you need it (for example, by passing StringComparer.OrdinalIgnoreCase).
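Putting the pieces together, a sketch of the whole helper with the set as a static readonly field. It assumes, as the question describes, that the check is "count how many selections are known values and compare that count with validCount"; it does not model the ordering concern mentioned in EDIT 2.

```csharp
using System;
using System.Collections.Generic;

public static class SelectionMatchHelper
{
    // Built once by the static initializer and never modified afterwards,
    // so concurrent calls only ever read it.
    private static readonly HashSet<string> predefinedStrings = new HashSet<string>(
        new[] { "No answer", "Black", "Blonde" },
        StringComparer.Ordinal);

    // Counts how many of the user's selections are known values and checks
    // that the count equals validCount (zero matches always fails).
    public static bool SelectionMatch(string[] selections, int validCount)
    {
        int numberOfMatches = 0;
        foreach (string selection in selections)
        {
            if (selection != null && predefinedStrings.Contains(selection))
            {
                numberOfMatches++;
            }
        }
        return numberOfMatches != 0 && numberOfMatches == validCount;
    }
}
```

Because the set lives in a field rather than inside the method, it is created only once instead of on every call.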
I have an unmanaged API function, and below is its equivalent C# signature:
Myfunction(uint handle, int index, out bool flag, out int value, out string name);
Here index ranges from 0 to 59. I am able to fetch the data one item at a time: I can pass a value for index from a TextBox and I get the corresponding outputs. But how do I collect all the values into an array? I don't want to enter an index each time; I simply want to display all the values in a ListBox. How can I achieve this?
Before we start, this is not a multidimensional array. This is a simple linear array with one index.
Create a struct to hold the values for one item:
struct MyItem
{
    // public, so the fields can be filled in and read outside the struct
    public bool flag;
    public int value;
    public string name;
}
Then have a function return an array of these:
MyItem[] GetItems()
{
    MyItem[] result = new MyItem[ItemCount];
    // Populate result
    return result;
}
Alternatively, you might well store the data in a generic collection like List<MyItem>. Fundamentally, the key is to create a structure that can contain a single item, and then operate on collections of items.
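A sketch of the "populate result" step using a List<MyItem>. The Myfunction below is a hypothetical managed stand-in so the example runs; in real code you would call your actual [DllImport] declaration instead. ItemReader and the PascalCase field names are also illustrative choices.

```csharp
using System;
using System.Collections.Generic;

public struct MyItem
{
    public bool Flag;
    public int Value;
    public string Name;
}

public static class ItemReader
{
    private const int ItemCount = 60; // index runs from 0 to 59

    // Hypothetical stand-in for the real unmanaged Myfunction.
    private static void Myfunction(uint handle, int index, out bool flag, out int value, out string name)
    {
        flag = index % 2 == 0;
        value = index * 10;
        name = "Item" + index;
    }

    // Calls the API once per index and collects the results in a list.
    public static List<MyItem> GetItems(uint handle)
    {
        var result = new List<MyItem>(ItemCount);
        for (int i = 0; i < ItemCount; i++)
        {
            Myfunction(handle, i, out bool flag, out int value, out string name);
            result.Add(new MyItem { Flag = flag, Value = value, Name = name });
        }
        return result;
    }
}
```

In WinForms you could then loop over GetItems(handle) and add a formatted string for each item to your ListBox's Items collection (the ListBox name depends on your form).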
Let us assume we have an open SqlConnection and execute an arbitrary query resulting in a table consisting of one column Foo which contains integers.
To read those integers, I noticed two ways to access the values in the column:
using (var reader = command.ExecuteReader())
{
    if (reader.HasRows)
    {
        while (reader.Read())
        {
            // access value directly either by column name or index
            int i = (int)reader["Foo"];
            int j = (int)reader[0];
            // access value by using GetOrdinal
            int k = reader.GetInt32(reader.GetOrdinal("Foo"));
        }
    }
}
Both approaches result in the same values of course. I am using the first approach as it seems easier to me to use the column name to access the value directly instead of using it to get its index and then using that index to access the value.
But I am quite new to the subject, so what are the best practices here? Current conventions? Differences in performance?
All of these are correct; reader["Foo"] and reader[0] simply use the overloaded indexers the reader defines. Using the column name in the indexer is usually easier to read and is generally recommended over using the index. I'm not sure about any performance differences between them.
int i = (int)reader["Foo"];
int j = (int)reader[0];
Using Reflector, you can see that the [] indexer internally just calls these methods:
public override object this[int i]
{
    get
    {
        return this.GetValue(i);
    }
}
public override object this[string name]
{
    get
    {
        return this.GetValue(this.GetOrdinal(name));
    }
}
So the actual difference is this: if you know the position and care about performance, use the int versions of the methods.
int j = (int)reader[0];//direct array access for some column
int j = reader.GetInt32(0);
If you don't know the position, or you prefer readability, use the string versions:
// must first go to a hash table and waste time looking up the column index
int j = (int)reader["price"]; //but at least we know the meaning of the column
int j = reader.GetInt32(reader.GetOrdinal("price"));
int j = (int)reader.GetValue(reader.GetOrdinal("price"));
Finally, the difference between GetInt32 and GetValue is just that GetInt32 does the type validation and cast for you, so if you know the type of the data it makes your life easier.
PS: The performance hit of looking up a column's index by name is usually negligible, but it is not to be dismissed. I have a project where GetOrdinal is one of the most frequently called functions, called hundreds of thousands of times and adding up to seconds that I could have avoided by using ints; now that I'm hitting bottlenecks, I can't rewrite the application.
Something like this:
using (var reader = command.ExecuteReader()) {
    // if (reader.HasRows) is redundant
    // do not repeat this in the loop
    int k_index = reader.GetOrdinal("Foo");
    while (reader.Read()) {
        // reader["Foo"] is not necessarily an int;
        // better practice is to convert, since reader["Foo"] could be, say, an Oracle Number
        int i = Convert.ToInt32(reader["Foo"]);
        // usually, reader[123] (instead of reader["MyField"]) syntax looks ugly, but
        // it may happen that, say, "id" is always the first field in the query,
        // so reader[0] syntax is reasonable
        int j = Convert.ToInt32(reader[0]);
        // what if reader[index] is declared as Int64 but contains Int32 values? Convert
        int k = Convert.ToInt32(reader[k_index]);
    }
}
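To try the "look up the ordinal once, then convert" pattern without a database, here is a runnable sketch that uses an in-memory DataTable and its DataTableReader (a DbDataReader, like SqlDataReader) as a stand-in. The Int64 column is an assumption chosen to mirror the last comment in the code above; ReaderDemo and SumFoo are hypothetical names.

```csharp
using System;
using System.Data;

public static class ReaderDemo
{
    // Builds an in-memory table whose "Foo" column is Int64 (long), then reads it
    // through a DbDataReader the same way you would read a SqlDataReader.
    public static int SumFoo()
    {
        var table = new DataTable();
        table.Columns.Add("Foo", typeof(long));
        table.Rows.Add(1L);
        table.Rows.Add(2L);
        table.Rows.Add(3L);

        int sum = 0;
        using (DataTableReader reader = table.CreateDataReader())
        {
            int fooIndex = reader.GetOrdinal("Foo"); // name lookup happens once, outside the loop
            while (reader.Read())
            {
                // (int)reader[fooIndex] would throw InvalidCastException here, because
                // the boxed value is a long; Convert.ToInt32 converts it instead.
                sum += Convert.ToInt32(reader[fooIndex]);
            }
        }
        return sum;
    }
}
```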
You will use SqlDataReader a lot, so keep it as simple as possible and specify the column name:
int i = (int)reader["Foo"];
Right now I use a Dictionary to store some configuration data in my app. The data is added to the Dictionary only once, but it is queried very frequently. The Dictionary has around 2,500 items, and all keys are unique.
So right now I have something like this:
private Dictionary<string, string> Data;
public string GetValue(string key) // This gets hit very often
{
    string value;
    if (this.Data.TryGetValue(key, out value))
    {
        return value;
    }
    ...
}
Is there a more optimal way to do this?
What you have is pretty efficient. The only way to improve performance that I can think of is to use int as the dictionary key, instead of string. You would need to run performance tests to see how much it makes a difference in your use case -- it may or may not be significant.
And I would use an enum for storing the settings for convenience. Of course, this assumes you have a known set of settings.
private Dictionary<int, string> Data;
public string GetValue(MyAppSettingsEnum key)
{
    string value;
    if (this.Data.TryGetValue((int)key, out value))
    {
        return value;
    }
    ...
}
Note that I don't use the enum directly as the dictionary key, as it is more efficient to use an int as the key. More details on that issue here.
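A self-contained version of that idea. The enum members and the Settings class are hypothetical (the real settings depend on your app), and returning null for a missing key is an assumption, since the answer elides that case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical set of settings; the real names depend on your configuration.
public enum MyAppSettingsEnum
{
    ConnectionString,
    Theme,
    MaxRetries
}

public class Settings
{
    // Keyed by the enum's underlying int value rather than by the enum itself.
    private readonly Dictionary<int, string> Data = new Dictionary<int, string>();

    public void Add(MyAppSettingsEnum key, string value) => Data.Add((int)key, value);

    public string GetValue(MyAppSettingsEnum key)
    {
        // Returns null when the setting is missing (an assumption for this sketch).
        return Data.TryGetValue((int)key, out string value) ? value : null;
    }
}
```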
Using TryGetValue is a pretty optimal way of returning an item so there's not much you can improve on that front. However, if this isn't causing a bottleneck at the moment, I wouldn't worry too much about trying to optimize TryGetValue.
One thing you can do (it isn't shown in your code, so I don't know whether you already do) is to create the Dictionary with an estimated capacity. Since you know roughly how many items to expect, creating the Dictionary with that capacity reduces the number of times .NET has to resize it while it is being populated. Note that this only speeds up adding the items; it does not make lookups faster.
From MSDN:
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary.
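A minimal sketch of pre-sizing, using the Dictionary(int capacity) constructor with the roughly 2,500 entries mentioned in the question. ConfigLoader and Load are hypothetical names for wherever you populate the data.

```csharp
using System;
using System.Collections.Generic;

public static class ConfigLoader
{
    // Pre-sizes the dictionary for the ~2500 expected entries so that the
    // one-time population does not trigger repeated internal resizes.
    public static Dictionary<string, string> Load(IEnumerable<KeyValuePair<string, string>> entries)
    {
        var data = new Dictionary<string, string>(2500);
        foreach (var entry in entries)
        {
            data.Add(entry.Key, entry.Value);
        }
        return data;
    }
}
```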
The only faster option is an array, if your keys are ints within a small range.
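A sketch of the array approach, assuming the keys can be expressed as an enum whose values are small, contiguous, and start at 0. The Setting enum and ArraySettings class are hypothetical.

```csharp
using System;

// Hypothetical settings enum with small, contiguous values starting at 0.
public enum Setting
{
    ConnectionString = 0,
    Theme = 1,
    MaxRetries = 2
}

public class ArraySettings
{
    // One slot per enum value; the enum's int value is the array index,
    // so a lookup is a bounds check plus an array read, with no hashing.
    private readonly string[] _values = new string[Enum.GetValues(typeof(Setting)).Length];

    public void Set(Setting key, string value) => _values[(int)key] = value;

    public string Get(Setting key) => _values[(int)key];
}
```

If the enum values are not contiguous from 0, the array would need to be sized to the largest value plus one, which wastes slots and is where a Dictionary becomes the better fit.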
As you can see from the source code of System.Collections.Generic.Dictionary (available at http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs), the code that runs most often in your case is:
private int FindEntry(TKey key) {
    if( key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }
    if (buckets != null) {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
    }
    return -1;
}
As you can see, the lookup is fast if comparer.GetHashCode is fast and produces a good hash code distribution (ideally a perfect hash function).
The dictionary construction code is not visible in your example, but if you use the default constructor then the dictionary will use the default comparer EqualityComparer<string>.Default.
Providing your own comparer with a time- and space-efficient hash function might speed up the code.
If you don't know what a good hash function would look like in your case, using interned strings may also give you some boost (see http://www.dotnetperls.com/string-intern or MSDN: String.Intern Method).
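A sketch of plugging a custom comparer into the dictionary through the Dictionary(int capacity, IEqualityComparer<TKey> comparer) constructor. FNV-1a is used here only as a placeholder hash; it is not claimed to beat the default string hash, so measure with your real keys before adopting anything like this.

```csharp
using System;
using System.Collections.Generic;

// Illustrative comparer: ordinal equality plus a simple FNV-1a hash over the chars.
public sealed class Fnv1aStringComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => string.Equals(x, y, StringComparison.Ordinal);

    public int GetHashCode(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        unchecked
        {
            uint hash = 2166136261u;     // FNV offset basis
            foreach (char c in s)
            {
                hash ^= c;
                hash *= 16777619u;       // FNV prime
            }
            return (int)hash;
        }
    }
}
```

Usage would look like new Dictionary<string, string>(2500, new Fnv1aStringComparer()); equality stays ordinal, so lookups return the same results as with the default comparer, only the hashing changes.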
I have a ListView and I am trying to sort it by a column. I have the ColumnClick event etc. working, and it sorts, but I have the following problem:
I can't seem to add items to the ListView as integers. This is a problem because, for a column of ints that I had to call ToString() on, the sort puts 10 ahead of 2.
Does anyone know how I can add items as ints so that the sort behaves as desired? Also, not all columns are ints; there are some string columns, and I'd like the sort to work on those too.
For reference, I used the following tutorial for the sort code: http://support.microsoft.com/kb/319401
Cheers
You can create a sorter class that implements IComparer and assign it to the ListViewItemSorter property of the ListView.
IComparer has a method Compare. Two ListViewItem instances are passed to that method. You need to read the column value, then parse it to int and return the correct comparison result (int based instead of string based).
You can create your own ListViewItem class that creates the string value for the column but also holds the original int value to avoid the int.Parse call in the comparer.
Untested example:
using System.Collections;
using System.Windows.Forms;
public class MyItemComparer : IComparer
{
    public int Compare(object x, object y)
    {
        ListViewItem xItem = (ListViewItem)x;
        ListViewItem yItem = (ListViewItem)y;
        // SubItems[0] is a ListViewSubItem, so parse its Text property
        int a = int.Parse(xItem.SubItems[0].Text);
        int b = int.Parse(yItem.SubItems[0].Text);
        return a.CompareTo(b);
    }
}
You can detect whether the selected column contains numbers.
Write this in the Compare function:
int intX = 0, intY = 0;
if (int.TryParse(listviewX.SubItems[ColumnToSort].Text, out intX)
    && int.TryParse(listviewY.SubItems[ColumnToSort].Text, out intY))
{
    return intX.CompareTo(intY);
}
This may be a problem if a column contains both numbers and text.
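One way to handle mixed columns is to decide the ordering rule explicitly. This sketch compares two cell texts numerically when both parse as int, and otherwise puts numbers before text and compares text by culture. The "numbers first" rule is an assumed convention. The class is independent of WinForms, so inside a ListView sorter you would call it on SubItems[ColumnToSort].Text from Compare(object, object).

```csharp
using System;
using System.Collections.Generic;

// Compares cell texts numerically when both parse as int, otherwise as text.
// Numbers are ordered before non-numeric text so mixed columns still sort consistently.
public sealed class NumericThenTextComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        bool xIsNumber = int.TryParse(x, out int intX);
        bool yIsNumber = int.TryParse(y, out int intY);

        if (xIsNumber && yIsNumber) return intX.CompareTo(intY);
        if (xIsNumber) return -1;   // number before text
        if (yIsNumber) return 1;    // text after number
        return string.Compare(x, y, StringComparison.CurrentCulture);
    }
}
```

For example, sorting { "10", "pear", "2", "apple", "1" } with this comparer gives 1, 2, 10, apple, pear.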