Imagine a table and a button that adds new rows to it. On each click of the button, a new row is inserted at the end of the table. The button's click event works as follows:
First, it takes a reference row to copy.
All the controls and text inside this reference row are copied to a DataTable. Since a DataTable cannot hold controls, I convert their values to strings and save them like that.
At the end, the DataTable is stored in the cache.
Finally, on each Page_Init event I re-create the table from the data inside the DataTable. Everything works fine.
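For illustration, here is a minimal sketch of the pattern described above (this is not the original code; the cache key, the CreateStorageTable helper and the three-column layout are all assumptions):

// Sketch of the described approach, assuming a Web Forms page with a
// Table control (Table1) whose first row serves as the reference row.
private DataTable CreateStorageTable()
{
    var dt = new DataTable();
    for (int i = 0; i < 3; i++)                     // one string column per cell
        dt.Columns.Add("col" + i, typeof(string));
    return dt;
}

protected void AddRowButton_Click(object sender, EventArgs e)
{
    DataTable dt = (DataTable)Cache["Table1Rows"] ?? CreateStorageTable();
    TableRow reference = Table1.Rows[0];            // the reference row to copy

    // A DataTable cannot hold controls, so store their text as strings.
    DataRow dr = dt.NewRow();
    for (int i = 0; i < reference.Cells.Count; i++)
        dr[i] = reference.Cells[i].Text;
    dt.Rows.Add(dr);

    Cache["Table1Rows"] = dt;                       // one cache entry per table
}

protected void Page_Init(object sender, EventArgs e)
{
    var dt = Cache["Table1Rows"] as DataTable;
    if (dt == null) return;

    foreach (DataRow dr in dt.Rows)                 // re-create the saved rows
    {
        var row = new TableRow();
        foreach (object value in dr.ItemArray)
            row.Cells.Add(new TableCell { Text = (string)value });
        Table1.Rows.Add(row);
    }
}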
However, I'm curious. I have 3 to 5 tables on the page, each stored in a different cache entry with a different DataTable, and all of them are re-created during the page life-cycle events. Could this cause any problems in the future? By the way, please note that the cache is deleted once the user leaves the page.
I did not want to paste the whole code here, since it's a bit long and might put people off reading the question. But I can give some statistics so that you can comment on it.
The class I've written is 118 lines long.
During the re-creation of the table there are 3 nested for/foreach loops, but they are not that long (each averages about 5 to 10 iterations).
And finally, as mentioned above, the table is re-created from the DataTable saved in the cache.
So, to ask the question again: the code works perfectly, but is building it this way performance-friendly?
It depends entirely on the amount of data in the table (number of rows/columns).
If it's small, say a list of 10 users and their logins and passwords, it will work just fine with no performance issues.
But if it grows to thousands and thousands of records, it will probably start to have performance issues.
Edit: write a script to fill the database to a worst-case expected amount of data, and then see how it performs.
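For instance, a throwaway harness along these lines can do that measurement (RebuildTableFrom is a placeholder for the re-creation code under test, not a real method):

// Fill a DataTable to the expected worst case and time the rebuild.
var dt = new DataTable();
dt.Columns.Add("col0", typeof(string));
for (int i = 0; i < 10000; i++)           // assumed worst-case row count
    dt.Rows.Add("row " + i);

var sw = System.Diagnostics.Stopwatch.StartNew();
RebuildTableFrom(dt);                     // placeholder for the rebuild logic
sw.Stop();
Console.WriteLine("Rebuild took {0} ms", sw.ElapsedMilliseconds);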
This may be a dumb question, but I wanted to be sure. I am creating a WinForms app and using a C# OleDbConnection to connect to an MS Access database. Right now I am using "SELECT * FROM table_name" and looping through each row to see if it is the row with the criteria I want, then breaking out of the loop if it is. Would performance improve if I used something like "SELECT * FROM table_name WHERE id=something", i.e. a WHERE clause, instead of looping through every row?
The best way to validate the performance of anything is to test. Otherwise, a lot of assumptions are made about what is the best versus the reality of performance.
With that said, using a WHERE clause will be better than retrieving the data and then filtering via a loop 100% of the time. There are a few reasons for this, but ultimately you are filtering the data on a column before retrieving all of the columns, versus retrieving all of the columns and then filtering out the data. Relational data should be dealt with using set logic, which is how a WHERE clause works: it operates on the data set. The loop is not set logic; it compares each individual row, expensively, discarding those that don't meet the criteria.
Don’t take my word for it though. Try it out. Especially try it out when your app has a lot of data in the table.
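For reference, the filtered query could look something like this sketch (the connection string, the id value and the column name are assumptions):

// requires: using System.Data.OleDb;
// Let the database filter the rows instead of looping over all of them.
using (var conn = new OleDbConnection(connectionString))
using (var cmd = new OleDbCommand(
    "SELECT * FROM table_name WHERE id = ?", conn))
{
    cmd.Parameters.AddWithValue("?", someId);   // OleDb parameters are positional
    conn.Open();
    using (OleDbDataReader reader = cmd.ExecuteReader())
    {
        if (reader.Read())
        {
            // only the matching row was sent over the connection
        }
    }
}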
Yes, of course.
Say you have an Access database file shared on a folder, and you deploy your .NET desktop application to each workstation.
And furthermore, say the table has 1 million rows.
If you do this:
SELECT * FROM tblInvoice WHERE InvoiceNumber = 123245
then ONLY one row is pulled down the network pipe, and this holds true EVEN if the table has 1 million rows. Traversing and pulling 1 million rows is going to take a HUGE amount of time, but if you add criteria to your SELECT, then in this case it would be about 1 million times faster to pull one row as opposed to the whole table.
And what if this is multi-user? Then again, even over a network, ONLY the ONE record that meets your criteria will be pulled. The only requirement for this "one row pull" over the network is that the Access data engine has a usable index on the criteria column. By default the PK column (ID) always has that index, so no worries there. But if, as above, we are pulling invoice numbers from a table, then an index on that column (InvoiceNumber) is required for the data engine to pull only one row. If no index can be used, then behind the scenes all rows are pulled until a match occurs; over a network that means significant amounts of data will be pulled without that index (or, if local, pulled from the file on disk).
Suppose we have the following situation:
We have 2-3 tables in the database with a huge amount of data (say 50-100 million records) and we want to add 2,000 new records. But before adding them we need to check the database for duplicates: if any of the 2,000 records already exist in the database, we should ignore them. And to find out whether a new record is a duplicate or not, we need information from both tables (for example, we need to make a left join).
The idea of the solution: one task or thread creates the data suitable for comparison and pushes it into a queue (in batches, not record by record), so our queue (or ConcurrentQueue) is a global variable. A second thread takes a batch from the queue and looks through it. But there's a problem: memory keeps growing...
How can I free the memory after I've worked through a batch?
P.S. If somebody has another idea of how to optimize this process, please describe it...
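For reference, the two-thread setup described above might look like this sketch. One common way to stop the queue from growing without limit is a bounded BlockingCollection instead of a plain ConcurrentQueue, so the producer blocks when the consumer falls behind; the Record type and both helper methods here are placeholders:

// requires: using System.Collections.Concurrent; using System.Threading.Tasks;
// Bounded to 10 pending batches: unread batches cannot pile up on the heap.
var queue = new BlockingCollection<List<Record>>(boundedCapacity: 10);

var producer = Task.Run(() =>
{
    // ReadCandidateBatches is a placeholder for building comparison batches.
    foreach (List<Record> batch in ReadCandidateBatches(2000))
        queue.Add(batch);                 // blocks while 10 batches are pending
    queue.CompleteAdding();
});

var consumer = Task.Run(() =>
{
    foreach (List<Record> batch in queue.GetConsumingEnumerable())
    {
        InsertNonDuplicates(batch);       // placeholder for the duplicate check
        // the batch goes out of scope here, so the GC can reclaim it
    }
});

Task.WaitAll(producer, consumer);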
This is not a specific answer to the question you are asking, because what you are asking doesn't really make sense to me.
If you are looking to update specific rows:
INSERT INTO tablename (UniqueKey,columnname1, columnname2, etc...)
VALUES (UniqueKeyValue,value1,value2, etc....)
ON DUPLICATE KEY
UPDATE columnname1=value1, columnname2=value2, etc...
If not, simply leave off the ON DUPLICATE KEY UPDATE clause.
This would be darn fast, considering it would use the unique index on whatever field you want to be unique and just do an insert or update. There is no need to validate against a separate table or anything.
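Note that ON DUPLICATE KEY UPDATE is MySQL syntax. Executed from C# with the MySql.Data client, it might look like this sketch (the table, column and record names are assumptions):

// requires: using MySql.Data.MySqlClient;
const string sql =
    "INSERT INTO tablename (UniqueKey, columnname1, columnname2) " +
    "VALUES (@key, @v1, @v2) " +
    "ON DUPLICATE KEY UPDATE columnname1 = @v1, columnname2 = @v2";

using (var conn = new MySqlConnection(connectionString))
using (var cmd = new MySqlCommand(sql, conn))
{
    conn.Open();
    cmd.Parameters.Add("@key", MySqlDbType.Int32);
    cmd.Parameters.Add("@v1", MySqlDbType.VarChar);
    cmd.Parameters.Add("@v2", MySqlDbType.VarChar);

    foreach (var rec in newRecords)       // the 2,000 incoming records
    {
        cmd.Parameters["@key"].Value = rec.Key;
        cmd.Parameters["@v1"].Value = rec.Value1;
        cmd.Parameters["@v2"].Value = rec.Value2;
        cmd.ExecuteNonQuery();            // inserts, or updates on duplicate key
    }
}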
Problem summary:
- C# (MVC), Entity Framework 5.0 and Oracle.
- I have a couple of million rows in a view which joins two tables.
- I need to populate dropdownlists with filter possibilities.
- The options in these dropdownlists should reflect the actual contents of the view for that column, distinct.
- I want to update the dropdownlists whenever you select something, so that the new options reflect the filtered content, preventing you from choosing something that would give 0 results.
- It's slow.
Question: what's the right way of getting these dropdownlists populated?
Now for more detail.
-- Goal of the page --
The user is presented with some dropdownlists that filter the data in a grid below. The grid represents a view (see "Database") whose results are filtered.
Each dropdownlist represents a filter for a column of the view. Once something is selected, the rest of the page updates. The other dropdownlists now contain the possible values for their corresponding columns that comply with the filter that was just applied in the first dropdownlist.
Once the user has selected a couple of filters, he/she presses the search button and the grid below the dropdownlists updates.
-- Database --
I have a view that selects almost all columns from two tables, nothing fancy there. Like this:
SELECT tbl1.blabla, tbl2.blabla etc etc
FROM table1 tbl1, table2 tbl2
WHERE tbl1.bvz_id = tbl2.id AND tbl1.einddatum IS NULL;
There is a total of 22 columns: 13 VARCHARs (mostly small, 1-20 characters, though one of them has a size of 2000!), 6 DATEs and 3 NUMBERs (one of them size 38 and one of them 15,2).
There are a couple of indexes on the tables, among which are the relevant IDs for the WHERE clause.
Important thing to know: I cannot change the database. Maybe set an index here and there, but nothing major.
-- Entity Framework --
I created a database-first EDMX in my solution and also mapped the view. There are also classes for both tables, but since I need data from both of them, I don't know whether I need those. The problem with selecting things from either table alone is that you can't apply half of the filtering, but maybe there are smart ways I haven't thought of yet.
-- View --
My view is strongly bound to a ViewModel. In there I have an IEnumerable for each dropdownlist. The getters for these get their data from a single IEnumerable called NameOfViewObjects. Like this:
public string SelectedColumn1 { get; set; }

private IEnumerable<SelectListItem> column1Options;
public IEnumerable<SelectListItem> Column1Options
{
    get
    {
        if (column1Options == null)
        {
            column1Options = NameOfViewObjects.Select(item => item.Column1).Distinct()
                .Select(item => new SelectListItem
                {
                    Value = item,
                    Text = item,
                    Selected = item.Equals(SelectedColumn1, StringComparison.InvariantCultureIgnoreCase)
                });
        }
        return column1Options;
    }
}
The two solutions I've tried are:
- 1 -
Selecting, in a LINQ query, all the columns I need for the dropdownlists (the 2000-character VARCHAR is not one of them and there are only 2 date columns), doing a Distinct on them and putting the results into a HashSet. Then I point NameOfViewObjects at this HashSet. I have to wait about 2 minutes for that to complete, but after that, populating the dropdownlists is almost instant (maybe a second for each of them).
model.Beslissingen = new HashSet<NameOfViewObject>(dbBes.NameOfViewObject
.DistinctBy(item => new
{
item.VarcharColumn1,
item.DateColumn1,
item.DateColumn2,
item.VarcharColumn2,
item.VarcharColumn3,
item.VarcharColumn4,
item.VarcharColumn5,
item.VarcharColumn6,
item.VarcharColumn7,
item.VarcharColumn8
}
)
);
The big problem here is that the NameOfViewObject objects are quite large, and even though Distinct brings this down to fewer than 100,000 results, it still uses over 500 MB of memory for them. This is unacceptable, because there will be a lot of users on this screen (where "a lot" is... 10 max, 5 on average simultaneously).
- 2 -
The other solution is to use the same LINQ query and point NameOfViewObjects at the IQueryable it produces. This means that every time the view binds a dropdownlist to an IEnumerable, it fires a query that finds the distinct values for that column in a table with millions of rows, where the column it pulls the values from is most likely not indexed. This takes around 1 minute for each dropdownlist (I have 10), so it takes ages.
Don't forget: I need to update the dropdownlists every time one of them has its selection changed.
-- Question --
So I'm probably going about this the wrong way. Maybe one of these solutions should be combined with indexing all of the columns I use, or maybe I should store the data in memory in some other way that uses only a little of it. But there must be someone out there who has done this before and figured out something smart. Can you please tell me the best way to handle a situation like this?
Acceptable performance:
- having to wait for a while (2 minutes) while the page loads, but everything is fast after that;
- having to wait a couple of seconds every time a dropdownlist changes;
- the page does not use more than 500 MB of memory.
Of course you should have indexes on all columns and combinations used in WHERE clauses. No index means a table scan and O(N) query times. That cannot scale under any circumstances.
You do not need millions of entries in a drop down. You need to be smarter about filtering the database down to manageable numbers of entries.
I'd take a page from Google: their type-ahead helps narrow the entire Internet down to groups of 25 or 50 per page, with the most likely matches at the top. Maybe you could manage that, too.
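That type-ahead narrowing could be as small as this query over the mapped view (the context variable, the prefix and the column names are assumptions):

// Narrow one column down to the 25 most relevant distinct values
// instead of loading every distinct value into the drop-down.
List<string> options = db.NameOfViewObject
    .Where(item => item.Column1.StartsWith(prefix))  // what the user typed so far
    .Select(item => item.Column1)                    // project only this column
    .Distinct()
    .OrderBy(value => value)
    .Take(25)
    .ToList();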
Perhaps a better answer is something like a search engine. If you were a Java developer you might try Lucene/SOLR and indexing. I don't know what the .NET equivalent is.
The first point you need to check is your database: make sure you have the right indexes and entity relations in place.
Next, if you want to dynamically build your filter options, you need to run the query with the existing filters applied to find out what the next filter's options can be. There are several ways to do this.
Firstly, you can query the data and extract the filter values from what comes back. This has a huge load time and wastes time returning data you don't want (unless you are live-updating the results with the filter and don't have paging, in which case you might as well just get all the data and use LINQ to Objects to filter).
A second option is to run a parallel query for each filter that returns its possible values: filter A = all possible values of A in the data, filter B = all possible values of B when the data is filtered by A, C = all possible values of C when the data is filtered by A & B, and so on. This is better than the first option, but not by much.
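A sketch of that second option in EF LINQ (the context, entity and column names are assumptions):

// Possible values for column 2, given the filter already chosen for column 1.
// Only the single column is projected, so no full rows are materialized.
IQueryable<NameOfViewObject> filtered = db.NameOfViewObject;
if (!string.IsNullOrEmpty(selectedColumn1))
    filtered = filtered.Where(item => item.Column1 == selectedColumn1);

List<string> column2Options = filtered
    .Select(item => item.Column2)
    .Distinct()
    .OrderBy(value => value)
    .ToList();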
Another option is to use aggregates to speed things up, i.e. run the parallel queries as above, but instead of returning the data, return how many records would be returned. Aggregate functions are always quicker, so this will cut your load time dramatically, but you are still repeatedly querying a huge dataset, so it won't be exactly nippy.
You can tweak this further using EXISTS to just return a 0 or 1. In this case you would look at a table of all possible filters and then remove the ones for which the parallel query returned no values.
The next option, which will be the fastest by a mile, is to cache the filters in the database in a separate table. Then you can query that and say: from Cache, where filter = ABC, select D. The problem with this is maintaining the cache, which you would have to do in the database as part of the save functions, triggers, etc.
In addition to the previous suggestions, you can use the /*+ result_cache */ hint if your version of Oracle supports it (Oracle 11g or later). If the output of the query is small enough for a drop-down list, then when a user enters criteria that match the criteria another user already used, the results are returned in a few milliseconds instead of a few seconds or minutes. The result cache is wonderful for queries that return a small set of rows out of millions.
select /*+ result_cache */ item_desc from some_table where item_id ...
The result cache is automatically flushed when any insert/updates/deletes occur on the database tables.
I've done something 'kind of' similar in the past - if you can add a table to the database then I'd explore introducing a 'scratchpad' type table where results are temporarily stored as the user refines their search. Since multiple users could be working simultaneously the table would have to have an additional column for identifying the user.
I'd think you'd see some performance benefit since all processing is kept server-side and your app would simply be pulling data from this table. Since you're adding this table you would also have total control over it.
Essentially, I'd imagine the program flow would go something like:
1. User selects some filters and clicks 'Search'.
2. Server populates the scratchpad table with the results of that search.
3. App populates the results grid from the scratchpad table.
4. User refines the search further and clicks 'Search'.
5. Server removes/adds rows in the scratchpad table as necessary.
6. App populates the results grid from the scratchpad table.
And so on.
Rather than having all the users results in one 'scratchpad' table you could possibly explore having temporary 'scratchpad' tables per user.
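Under that design, the refine step might look like this sketch, using EF's raw-SQL support (the SCRATCHPAD table, its columns, THE_VIEW and the GridRow class are all invented names):

// Replace this user's scratchpad rows with the newly filtered result set,
// then read the grid contents back from the scratchpad only.
db.Database.ExecuteSqlCommand(
    "DELETE FROM SCRATCHPAD WHERE USER_ID = {0}", userId);

db.Database.ExecuteSqlCommand(
    "INSERT INTO SCRATCHPAD (USER_ID, COL1, COL2) " +
    "SELECT {0}, v.COL1, v.COL2 FROM THE_VIEW v WHERE v.COL1 = {1}",
    userId, filterValue);

var gridRows = db.Database.SqlQuery<GridRow>(
    "SELECT COL1, COL2 FROM SCRATCHPAD WHERE USER_ID = {0}", userId).ToList();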
I have developed an eCommerce application in C# and ASP.NET. For the admin users' "dashboard" landing page, I would like to give them a GridView that shows them the total sales dollar amount for a couple of different time ranges, which would be my columns (i.e. last day, last week, last month, last year, total ever). I would like to give these values for orders in different statuses (i.e. complete, paid but not shipped, in progress). Something similar to this:
|OrderStatus|Today|LastWeek|LastMonth|
|Processed |$10 |$100 |$34000 |
|PaidNotShip|$4 |$12 |$45 |
My question: what is the best/most efficient way to do this? I know that I could write separate SQL statements, union them together and bind the GridView to a SqlDataSource:
(select amountForYesterday, amountForLastWeek from sales where orderStatus = processed)
UNION
(select amountForYesterday, amountForLastWeek from sales where orderStatus = paidnotshipped)
But that seems like a pain and very inefficient, since I would effectively be writing a separate query for each value.
I could also do this in the code-behind on page load and programmatically populate the GridView row by row.
This GridView would only show information for the user's specific organization, so it would have to filter based on that as well.
I'm kind of at a loss as to how to do this without writing a massive query and continually hitting that query and database each time the page is viewed.
Any ideas?
I prefer using LINQ to work with data and/or GridViews (accessing the rows, etc.). Have a look at a project I have on GitHub which does exactly what I am describing here, as an example. Note that it is just a sandbox I previously used for illustration purposes.
GitHub Repo
https://github.com/pauloosthuysen/int
Other useful info:
http://www.codeproject.com/Articles/33685/Simple-GridView-Binding-using-LINQ-to-SQL
The sales figures for LastWeek and LastMonth do not change very often. You could store them in a static Dictionary indexed by organization, or summarize them in a separate table for faster access. That way you will not need to select the same huge number of rows to get the same numbers over and over again. Unless there are special demands, I would stick with the Dictionary solution because it is simple, but a combination of the two could also be a good solution.
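A sketch of that Dictionary idea (the one-hour freshness window and the QuerySalesSummary helper are assumptions):

// requires: using System.Collections.Concurrent;
// Cache the slow-changing summary per organization; recompute at most
// once per hour instead of on every dashboard view.
private static readonly ConcurrentDictionary<int, Tuple<DateTime, DataTable>>
    summaryCache = new ConcurrentDictionary<int, Tuple<DateTime, DataTable>>();

public DataTable GetSalesSummary(int organizationId)
{
    Tuple<DateTime, DataTable> cached;
    if (summaryCache.TryGetValue(organizationId, out cached) &&
        DateTime.UtcNow - cached.Item1 < TimeSpan.FromHours(1))
    {
        return cached.Item2;              // fresh enough: no database hit
    }

    DataTable summary = QuerySalesSummary(organizationId);  // the big query
    summaryCache[organizationId] = Tuple.Create(DateTime.UtcNow, summary);
    return summary;
}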
There is no direct way of doing it.
However, instead of hitting the database for the sum of every column, you can compute the sums on the DataTable that is used for binding to your grid.
All you need to do is use:
int sumSal = Convert.ToInt32(StudentTable.Compute("SUM(sal)", ""));
You can do the same for the other columns.
Once this is done, just add a new row to your DataTable with all the summed values in it, and then you can bind it to your grid.
Optionally, you can put some text such as "Total:" in the first column of the new row.
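Put together, that might look like this sketch (the column names are assumptions):

// Compute the totals from the already-loaded DataTable and append
// them as a final "Total:" row before binding.
int sumSal = Convert.ToInt32(StudentTable.Compute("SUM(sal)", ""));
int sumBonus = Convert.ToInt32(StudentTable.Compute("SUM(bonus)", ""));

DataRow totalRow = StudentTable.NewRow();
totalRow["name"] = "Total:";              // assumed first column
totalRow["sal"] = sumSal;
totalRow["bonus"] = sumBonus;
StudentTable.Rows.Add(totalRow);

GridView1.DataSource = StudentTable;
GridView1.DataBind();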
I have a text box that contains lines of data similar to this:
sheep
haggis
red squirrels
chickens
rabbits
and a SQL CE table with one column, titled foods, that stores these values. What is the best way to sync these two data sources? For example, if a user deletes red squirrels and chickens from the textbox and adds carrots, the table rows that contain red squirrels and chickens should be deleted and a new row inserted that contains carrots.
Right now, my solution is to compare the old list with the new list and perform two actions. 1) Find the items that have been removed by comparing the lists using Except:
List<String> removedFoods = oldList.Except(newList).ToList();
These foods are removed with DELETE. Another Except statement finds the foods that have been added, which are then added with INSERT.
List<String> addedFoods = newList.Except(oldList).ToList();
Is there a better way, involving a single C#/SQL statement, that I'm missing (or a better way in general)?
Your question is about performance, right? I don't see much room for improvement in performance over what you are doing already. No matter what, you have to do the deletion and insertion.
As Tim asked in his reply to your question, it depends on the number of foods in your table.
You could clear the table data like the following:
TRUNCATE TABLE foods
MSDN: TRUNCATE TABLE is similar to the DELETE statement with no WHERE clause; however, TRUNCATE TABLE is faster and uses fewer system and transaction log resources
and then use the text box data to insert the correct data back in.
It will be easier to make a DataGrid that looks like a multi-line textbox than to do all the parsing and handling yourself.
Keeping track of the lines/rows will be much easier, and so will deleting and updating.