I have a view created by joining several tables. Records in those tables can change, so the contents of the view's columns can change too.
The view's columns contain data such as addresses, random numbers, dates, and random strings.
I accept search text from the user and return the rows in which any column contains that text.
The view has millions of records, so a normal LIKE query won't work (it takes too long).
What is the most efficient way to search this view, given that it changes whenever its underlying tables change?
I'm using an Oracle database, C#, and Entity Framework.
For better performance, add appropriate indexes to the underlying tables. The RDBMS engine maintains these indexes automatically on every change, so an index can never return stale data: the index and the table always hold the same values.
You don't need to reindex every time. Occasionally (say, monthly) you can update the related statistics.
So indexes can improve your performance a lot, and that applies to the view as well.
The view is built on the fly on top of the underlying tables and is not a stored copy of them, so indexes on those tables help the view return the expected result faster.
Indexes, when properly designed, serve important purposes in a database server:
They let the RDBMS:
find groups of adjacent rows instead of single rows.
avoid sorting by reading the rows in the desired order.
satisfy (sometimes) entire queries from the index alone, avoiding (when possible) the need to access the table at all.
From the MySQL documentation: https://dev.mysql.com/doc/refman/5.5/en/mysql-indexes.html
https://dev.mysql.com/doc/refman/5.5/en/column-indexes.html
https://dev.mysql.com/doc/refman/5.5/en/multiple-column-indexes.html
http://code.tutsplus.com/tutorials/top-20-mysql-best-practices--net-7855
http://use-the-index-luke.com
Problem summary:
C# (MVC), Entity Framework 5.0 and Oracle.
I have a couple of million rows in a view which joins two tables.
I need to populate dropdownlists with filter possibilities.
The options in these dropdownlists should reflect the actual contents of the view for that column, distinct.
I want to update the dropdownlists whenever you select something, so that the new options reflect the filtered content, preventing you from choosing something that would give 0 results.
It's slow.
Question: what's the right way of getting these dropdownlists populated?
Now for more detail.
-- Goal of the page --
The user is presented with some dropdownlists that filter the data in a grid below. The grid represents a view (see "Database") where the results are filtered.
Each dropdownlist represents a filter for a column of the view. Once something is selected, the rest of the page updates. The other dropdownlists now contain the possible values for their corresponding columns that comply with the filter that was just applied in the first dropdownlist.
Once the user has selected a couple of filters, he/she presses the search button and the grid below the dropdownlists updates.
-- Database --
I have a view that selects almost all columns from two tables, nothing fancy there. Like this:
SELECT bsl.blabla, bvz.blabla -- etc.
FROM table1 bsl, table2 bvz
WHERE bsl.bvz_id = bvz.id AND bsl.einddatum IS NULL;
There is a total of 22 columns: 13 VARCHARs (mostly small, 1 to 20 characters, though one of them has a size of 2000!), 6 DATEs and 3 NUMBERs (one of size 38 and one of size 15,2).
There are a couple of indexes on the tables, including the relevant IDs for the WHERE clause.
Important thing to know: I cannot change the database. Maybe set an index here and there, but nothing major.
-- Entity Framework --
I created a database-first EDMX in my solution and also mapped the view. There are also classes for both tables, but I need data from both of them, so I don't know if I need them. The problem with selecting from either table alone is that you could only apply half of the filtering, but maybe there are smart ways I haven't thought of yet.
-- View --
My view is strongly bound to a viewModel. In there I have an IEnumerable for each dropdownlist. The getter for each of these gets its data from a single IEnumerable called NameOfViewObjects. Like this:
public string SelectedColumn1 { get; set; }
private IEnumerable<SelectListItem> column1Options;
public IEnumerable<SelectListItem> Column1Options
{
    get
    {
        if (column1Options == null)
        {
            column1Options = NameOfViewObjects.Select(item => item.Column1).Distinct()
                .Select(item => new SelectListItem
                {
                    Value = item,
                    Text = item,
                    Selected = item.Equals(SelectedColumn1, StringComparison.InvariantCultureIgnoreCase)
                });
        }
        return column1Options;
    }
}
The two solutions I've tried are:
- 1 -
Selecting, in a LINQ query, all the columns I need for the dropdownlists (the 2000-character varchar is not one of them, and there are only 2 date columns), doing a distinct on them and putting the results into a HashSet. Then I set NameOfViewObjects to point to this HashSet. I have to wait about 2 minutes for that to complete, but after that, populating the dropdownlists is almost instant (maybe a second for each of them).
model.Beslissingen = new HashSet<NameOfViewObject>(dbBes.NameOfViewObject
    .DistinctBy(item => new
    {
        item.VarcharColumn1,
        item.DateColumn1,
        item.DateColumn2,
        item.VarcharColumn2,
        item.VarcharColumn3,
        item.VarcharColumn4,
        item.VarcharColumn5,
        item.VarcharColumn6,
        item.VarcharColumn7,
        item.VarcharColumn8
    })
);
The big problem here is that the NameOfViewObject object is probably quite large, and even though the distinct leaves fewer than 100,000 results, it still uses over 500 MB of memory. This is unacceptable, because there will be a lot of users using this screen (a lot meaning 10 at most, 5 on average, simultaneously).
- 2 -
The other solution is to use the same LINQ query and point NameOfViewObjects to the IQueryable it produces. This means that every time the view wants to bind a dropdownlist to an IEnumerable, it fires a query that finds the distinct values for that column in a table with millions of rows, where the column it's reading from is most likely not indexed. This takes around 1 minute for each dropdownlist (I have 10), so that takes ages.
Don't forget: I need to update the dropdownlists every time one of them has its selection changed.
-- Question --
So I'm probably going about this the wrong way. Maybe one of these solutions should be combined with indexing all of the columns I use, or maybe I should store the data in memory some other way so that it takes up only a little space. But there must be someone out there who has done this before and figured out something smart. Can you please tell me what would be the best way to handle a situation like this?
Acceptable performance:
having to wait for a while (2 minutes) while the page loads, but everything is fast after that.
having to wait for a couple of seconds every time a dropdownlist changes.
the page does not use more than 500 MB of memory.
Of course you should have indexes on all columns and combinations in WHERE clauses. No index means table scan and O(N) query times. Those cannot scale under any circumstance.
You do not need millions of entries in a drop down. You need to be smarter about filtering the database down to manageable numbers of entries.
I'd take a page from Google. Their type ahead helps narrow down the entire Internet graph into groups of 25 or 50 per page, with the most likely at the top. Maybe you could manage that, too.
Perhaps a better answer is something like a search engine. If you were a Java developer you might try Lucene/SOLR and indexing. I don't know what the .NET equivalent is.
First, check your database: make sure you have the right indexes and entity relations in place.
Next, if you want to build your filter options dynamically, you need to run the query with the existing filters to find out what the next filter options can be. There are several ways to do this.
First, you can query the data and extract the values from the returned rows. This has a huge load time and wastes time returning data you don't want (unless you are live-updating the results along with the filters and don't have paging, in which case you might as well just get all the data and filter it with LINQ to Objects).
A second option is to run parallel queries, one per filter, each returning the possible values for that filter: filter A = all possible values of A in the data, filter B = all possible values of B when the data is filtered by A, filter C = all possible values of C when filtered by A and B, and so on. This is better than the first option, but not by much.
Another option is to use aggregates to speed things up: you run parallel queries as above, but instead of returning the data you return how many records match. Aggregate queries are generally quicker because far less data is returned, so this should cut your load time considerably, but you are still repeatedly querying a huge dataset, so it won't be exactly nippy.
You can tweak this further by using EXISTS to return just a 0 or 1.
In this case you would start from a table of all possible filter values and then remove the ones for which the parallel query finds no rows.
The next option, which will be the fastest by a mile, is to cache the filters in the database in a separate table.
You can then query that table: from the cache, where the filters are A, B and C, select D. The problem with this is maintaining the cache, which you would have to do in the database as part of the save functions, triggers, etc.
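As a rough illustration of the cached-filter idea, here is a minimal C# sketch. It assumes the distinct combinations of the filter columns have already been loaded into a small collection (from a cache table, for example). The names FilterRow, CascadingFilters and Column1 to Column3 are hypothetical and not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: one entry per distinct combination of filter columns.
public sealed class FilterRow
{
    public string Column1 { get; set; }
    public string Column2 { get; set; }
    public string Column3 { get; set; }
}

public static class CascadingFilters
{
    // Returns the distinct values of one column among the rows that match
    // every filter selected so far. A null filter means "nothing selected".
    public static List<string> OptionsFor(
        IEnumerable<FilterRow> cache,
        Func<FilterRow, string> column,
        string column1 = null,
        string column2 = null,
        string column3 = null)
    {
        return cache
            .Where(r => column1 == null || r.Column1 == column1)
            .Where(r => column2 == null || r.Column2 == column2)
            .Where(r => column3 == null || r.Column3 == column3)
            .Select(column)
            .Distinct()
            .OrderBy(v => v, StringComparer.Ordinal)
            .ToList();
    }
}
```

When computing the options for a given dropdownlist, the caller would pass only the selections of the other dropdownlists, for example `CascadingFilters.OptionsFor(cache, r => r.Column2, column1: "A")`. Otherwise a dropdownlist would be narrowed down to its own current selection.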
Another solution that can be added in addition to the previous suggestions is to use the /*+ result_cache */ hint, if your version of Oracle supports it (Oracle version 11g or later). If the output of the query is small enough for a drop-down list, then when a user enters criteria that matches the same criteria another user used, the results are returned in a few milliseconds instead of a few seconds or minutes. Result cache is wonderful for queries that return a small set of rows out of millions.
select /*+ result_cache */ item_desc from some_table where item_id ...
Cached results are invalidated automatically when inserts, updates, or deletes change the tables the query depends on.
I've done something 'kind of' similar in the past - if you can add a table to the database then I'd explore introducing a 'scratchpad' type table where results are temporarily stored as the user refines their search. Since multiple users could be working simultaneously the table would have to have an additional column for identifying the user.
I'd think you'd see some performance benefit since all processing is kept server-side and your app would simply be pulling data from this table. Since you're adding this table you would also have total control over it.
Essentially I'd imagine the program flow would go something like:
User selects some filters and clicks 'Search'.
Server populates scratchpad table with results from that search.
App populates results grid from scratchpad table.
User further refines search and clicks 'Search'.
Server removes/adds rows to scratchpad table as necessary.
App populates results grid from scratchpad table.
And so on.
Rather than having all users' results in one 'scratchpad' table, you could explore having a temporary 'scratchpad' table per user.
I am working on an order (MVC) system where orders transition between different states, e.g. new order, paid, shipped, etc. Each state can have multiple transitions. Originally I thought I would have a status table with an ID and Description, and a transition table holding the current status and the status it can transition to, with each transition on a single row. To populate a selection box, I would have to do a join to get the descriptions. Now I am thinking I could do it all in one table and add a comma-separated column listing the possible transitions. Is this a good idea, or is there a better way?
Relational databases encourage normalization. There are several normal forms; in practice, reaching the first three is usually good enough.
First normal form states that each column should hold only one piece of information, of one type, per row.
In your case, you would be storing a comma-delimited list of transitions. What if you have to pick only the records with a particular transition state? It will be a messy query.
Also imagine having to update that column for a particular record when a transition state changes: again a messy, error-prone, performance-killing query.
Therefore, follow the basic rules of database normalization and stick to your first idea: create a separate table, use IDs to define the transition states, and add a new row whenever a transition changes.
My Suggestion
Simply have one column for [current status] and one column for [transition], and add a new row every time either value changes.
Also add a datetime column that defaults to the current datetime. That lets you go back in history and see the status and transition states of a record at any point in time.
Store this information in only one column of only one table, and reference that column from other tables if you need to.
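To make the one-row-per-transition design concrete, here is a minimal C# sketch of how a selection box could be populated from it. It assumes the status table has been loaded into a dictionary and the transition table into a list. StatusTransition, OrderWorkflow and NextStatusDescriptions are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row of the transition table: one allowed transition per row.
public sealed class StatusTransition
{
    public int FromStatusId { get; set; }
    public int ToStatusId { get; set; }
}

public static class OrderWorkflow
{
    // Joins the transitions of the current status to the status descriptions,
    // which is the in-memory equivalent of the SQL join from the question.
    public static List<string> NextStatusDescriptions(
        int currentStatusId,
        IReadOnlyDictionary<int, string> statuses,
        IEnumerable<StatusTransition> transitions)
    {
        return transitions
            .Where(t => t.FromStatusId == currentStatusId)
            .Select(t => statuses[t.ToStatusId])
            .ToList();
    }
}
```

Adding or removing an allowed transition is then a single-row insert or delete, with no string parsing involved.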
I've been tasked with an enhancement to our order system that will require importing segmented GL account codes for assignment on individual line items of an order.
I need to support querying the codes by segment1, segment2, etc in order to load cascading dropdown boxes for assignment by the user. The GL codes will have one or more segments delimited by a character. An example of a code is "1010.1034001.99.01".
I've loaded several thousand codes into a table for testing where the entire string value exists in one column (delimited by a character). I've created two variations of functions that return rows where segment1 value is equal to some parameter. The query also supports further querying by providing additional parameters for other segment values.
I intend to support these queries from the table using Entity Framework 6, but used SQL functions to get a feel for what the performance may be when the GL account codes are stored in one column. Performance was not as good as I had hoped.
Does anyone have recommendations on how best to store this data (there may be 200,000 codes)? Do you feel that I can query using EF and expect performant results?
Would a hierarchy organization make more sense for this data? Our team was hopeful to store the delimited values on one column.
Thanks in advance.
If you would use a table with three columns you could store the values cascading, enabling you to make your queries a lot easier and probably faster. Why would your team hope to store it in one column, what advantage does that have?
if you have
ID
Code
ParentCodeId
where ID is a unique key and ParentCodeId is a nullable reference to that unique ID, you can split your example code as follows:
ID  Code     Parent
1   1010     null
2   1034001  1
3   99       2
4   01       3
By applying some logic when importing your codes, you can check whether a code already exists as a parent on the needed level, so you don't have to repeat them. That way you could get all codes that start with 1010 by selecting on ParentCodeId 1.
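The import logic described above could look roughly like the following C# sketch. It keeps the rows in memory rather than in a database table, and GlCodeTree, CodeNode, Import and ChildCodes are hypothetical names. IDs are assigned sequentially here purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One row of the (ID, Code, ParentCodeId) table.
public sealed class CodeNode
{
    public int Id { get; set; }
    public string Code { get; set; }
    public int? ParentCodeId { get; set; }
}

public sealed class GlCodeTree
{
    private readonly List<CodeNode> nodes = new List<CodeNode>();

    // Lookup by (parent ID, segment) so that a segment is stored only once per level.
    private readonly Dictionary<(int? ParentId, string Code), CodeNode> byParentAndCode =
        new Dictionary<(int? ParentId, string Code), CodeNode>();

    public IReadOnlyList<CodeNode> Nodes { get { return nodes; } }

    // Splits a full code such as "1010.1034001.99.01" and adds only the missing segments.
    public void Import(string fullCode, char delimiter = '.')
    {
        int? parentId = null;
        foreach (var segment in fullCode.Split(delimiter))
        {
            var key = (parentId, segment);
            CodeNode node;
            if (!byParentAndCode.TryGetValue(key, out node))
            {
                node = new CodeNode { Id = nodes.Count + 1, Code = segment, ParentCodeId = parentId };
                nodes.Add(node);
                byParentAndCode.Add(key, node);
            }
            parentId = node.Id;
        }
    }

    // Values for the next cascading dropdown: all segments directly under a parent.
    public List<string> ChildCodes(int? parentId)
    {
        return nodes.Where(n => n.ParentCodeId == parentId).Select(n => n.Code).ToList();
    }
}
```

Calling `ChildCodes(null)` yields the first-level segments, and passing the ID of a selected segment yields the options for the next dropdown.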
I am developing a project which accesses a SQL Server 2012 database through C# and performs CRUD operations on it. Here is the main form:
Both listboxes on the right are used to handle information stored in intermediate tables (many-to-many relationships). Here is how they work: you choose types and abilities from the comboboxes, then click 'add', and they are added to the respective listboxes. To delete items from the listboxes, you select one item and then click 'delete'.
Here's another screenshot to clear up any doubts:
In the first screenshot I've provided here, you will see the data for 'Bulbasaur'. PokémonID = 1 represents 'Bulbasaur'; TypeID = 1 and 12 are 'Grass' and 'Poison', respectively; and AbilityID = 1 is 'Overgrow'.
I was trying to create an update function (update_click) using SQL queries (SqlCommand, SqlDataReader and so on), but without deleting all of a pokémon's associations with its types (and abilities) and then re-adding them based on the new contents of the listboxes. I want to avoid that in order to save some memory in cases where a pokémon may have thousands of types and abilities.
Is it possible? If necessary, I can send you my C# project for more details.
I would suggest a combination of:
1) Use table-valued parameters to send all the data (in its present state in your listboxes) to your T-SQL query or stored procedure at once
2) Consider using the EXCEPT and/or INTERSECT operators (as well as any necessary LEFT or RIGHT JOIN) to compare the contents of your table-valued parameter (essentially a table itself) with the data currently in the underlying tables
3) UPDATE/DELETE/INSERT accordingly
Essentially, it sounds like you'd like to "send only the changes" to the database:
add any abilities that were not there before;
remove any abilities that were in the database but have been removed.
If that's the case, then what you need are simple set operations:
Set Union
Set Intersect
Set Difference
While you can perform these operations using simple arrays or lists, it is much more efficient to use an actual set implementation such as a generic HashSet<>. With a correct implementation using sets or hash tables you can achieve linear-time performance.
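As a minimal sketch of the set-difference approach, the following C# compares the IDs currently stored for a pokémon with the IDs shown in the listbox and returns only what has to be inserted or deleted. AssociationDiff and Compute are hypothetical names, and the actual INSERT and DELETE statements are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AssociationDiff
{
    // Returns the IDs to insert (in the listbox but not in the database)
    // and the IDs to delete (in the database but no longer in the listbox).
    public static (List<int> ToInsert, List<int> ToDelete) Compute(
        IEnumerable<int> inDatabase,
        IEnumerable<int> inListBox)
    {
        var current = new HashSet<int>(inDatabase);
        var desired = new HashSet<int>(inListBox);

        // Each Contains call is a hash lookup, so both passes are linear overall.
        var toInsert = desired.Where(id => !current.Contains(id)).ToList();
        var toDelete = current.Where(id => !desired.Contains(id)).ToList();
        return (toInsert, toDelete);
    }
}
```

IDs present on both sides (the intersection) are left untouched, so an unchanged association never triggers a delete followed by a re-insert.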
I hope this helps point you in the right direction.