Just a random query regarding Microsoft Velocity.
Scenario:
Say I want ALL Orders from my database. In SQL this is fine: I can do SELECT OrderId, TotalCost... FROM Orders. That is one round trip to my database, and everyone is happy.
Now, if I'm using Memcached or (as I'm using now) Microsoft Velocity (CTP3), there is no easy way to do this. The best I can see is something like this (in pseudocode):
FOR EACH OrderId
    Order = cache.TryGet(OrderId)
    IF Order IS NULL
        Order = db.Get(OrderId)
    END IF
END FOR EACH
which would be LOADS of roundtrips.
Also, consider that I want to get orders by customer:
SQL: SELECT OrderId...TotalCost FROM Orders WHERE CustomerId = MyCustomerId
One round trip, everyone is happy.
Using a cached solution, there are really only two solutions I see:
Solution 1:
IF CustomerOrderIdsForCustomerId DOES NOT EXIST
    POPULATE CustomerOrderIdsForCustomerId FROM DATABASE
END IF
FOR EACH OrderId IN CustomerOrderIdsForCustomerId
    Order = cache.TryGet(OrderId)
    IF Order IS NULL
        Order = db.Get(OrderId)
    END IF
END FOR EACH
Solution 2 is to hold a serialized list of all the customer's orders in its own cache object. That reduces round trips, but just seems lame.
Can someone shed light on this situation please?
Just because you have a cache doesn't mean you have to use it for every query! In this instance, as you've already identified, it's not really helping you, and I'd probably go straight to the database for this sort of thing.
It depends a bit on your application though - if you think customers are regularly going to be looking at their order history, or you have some function that's analysing orders to see what products are hot, then you might want to use some caching to keep load off your SQL server. In that case, I'd probably go with holding in the cache either a DataTable of the orders, or a collection of Orders and query it with LINQ to show the orders for a customer.
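For instance, here's a minimal sketch of the collection approach; the cache.Get/cache.Put calls and the db.LoadAllOrders helper are hypothetical stand-ins for whatever your cache and data access actually expose:

// Look for the whole collection under one cache key; fall back to the DB on a miss.
var orders = (List<Order>)cache.Get("AllOrders");
if (orders == null)
{
    orders = db.LoadAllOrders();     // one round trip to the database
    cache.Put("AllOrders", orders);  // one round trip to the cache
}

// Filtering is then an in-memory LINQ query, with no further round trips.
var customerOrders = orders.Where(o => o.CustomerId == customerId).ToList();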
Keep in mind that a cache is not supposed to be the permanent store for any data (orders, in your case). The cache can help take some load off your DB server, but something has to load the orders into the cache before you can retrieve them. With that said, here are a couple of options to consider if you are using Velocity that avoid looping through a collection. You will, however, always have to figure out a way to deal with data that is not in the cache.
Option 1: Use Regions
You can create a region and get all the objects from that region with one call. In your scenario, you could create an Orders region to store all the orders and then use the GetObjectsInRegion method to get every order in the cache. Note, however, that this brings back all the orders in the cache... which may or may not be all the orders you have in the database.
Option 2: Use Regions And Tags
Velocity lets you tag the objects you put in cache regions and then retrieve them by those tags. So, in your scenario, you could tag the order objects with an "order" tag and then use the GetObjectsByTag method to retrieve them. Since objects can carry multiple tags, you could also tag each order with its customer id and pull them out that way.
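A rough sketch of the region-and-tag approach is below. GetObjectsInRegion and GetObjectsByTag are the documented method names; the factory setup, the Put overload and the key/tag values shown are approximations from the later AppFabric-era API, so check them against your CTP:

// Create (or reuse) a region that will hold every order.
var factory = new DataCacheFactory();
var cache = factory.GetCache("default");
cache.CreateRegion("Orders");

// Tag each order with a generic "order" tag plus a per-customer tag.
var tags = new List<DataCacheTag>
{
    new DataCacheTag("order"),
    new DataCacheTag("customer:" + order.CustomerId)
};
cache.Put("order:" + order.OrderId, order, tags, "Orders");

// One call for everything in the region, or one call per customer.
var allOrders = cache.GetObjectsInRegion("Orders");
var customerOrders = cache.GetObjectsByTag(
    new DataCacheTag("customer:" + customerId), "Orders");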
These 2 options come with some caveats, so be sure to read up on the documentation:
Velocity Tag-Based Methods
I am building an app with a SQL Server database. I have a main table of products (tblProducts) with a column that holds the quantity on hand (quantity). Another table holds the orders (tblOrders) that come in from the supplier.
When an order comes in, I add it to tblOrders and then edit tblProducts to add the newly received quantity to the quantity column.
So far, everything is good.
My question: after, let's say, a year of many, many orders and a lot of edits to the quantity, do you periodically check all the orders to verify that the quantity in the main tblProducts table is correct? Or do you just assume that it is always correct?
What procedures do you use for maintaining this kind of database? Do you sum all the orders every time you need the quantity on hand?
Thanks!
This is really up to how you want to implement it.
Trusting that the values will always check out (with adequate testing to ensure only stable code reaches production) is the easiest and fastest way, but it is vulnerable to data corruption and thus not really recommended.
Always summing up the orders is the safest and most correct way, but it becomes increasingly slow as your tables grow. If that is not an issue for you, this is the recommended option.
What I consider a good intermediate method is to have a separate tblProductLogs table that stores the stock of an item at a specific timestamp. You sum the inventory at set intervals (daily, hourly, up to you), and when you want the current stock you only need to sum the order values registered after the last log entry for that item, which saves query time. This can be made safer by disabling UPDATE operations on the log table, since you never need to modify entries there. It is faster than the second option and somewhat more robust than the first.
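As a rough sketch of that lookup, assuming hypothetical tblProductLogs(ProductId, Quantity, LoggedAt) and tblOrders(ProductId, Quantity, ReceivedAt) columns:

// Current stock = last logged snapshot + orders received since that snapshot.
// Assumes at least one snapshot row exists for the product.
using System.Data.SqlClient;

int GetStockOnHand(string connectionString, int productId)
{
    const string sql = @"
        SELECT l.Quantity + ISNULL(SUM(o.Quantity), 0)
        FROM tblProductLogs l
        LEFT JOIN tblOrders o
            ON o.ProductId = l.ProductId AND o.ReceivedAt > l.LoggedAt
        WHERE l.ProductId = @productId
          AND l.LoggedAt = (SELECT MAX(LoggedAt) FROM tblProductLogs
                            WHERE ProductId = @productId)
        GROUP BY l.Quantity;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@productId", productId);
        conn.Open();
        return (int)cmd.ExecuteScalar();
    }
}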
How do you efficiently implement cursor-based pagination with EF? Traditionally, Take and Skip are the common way to do it, but in scenarios where data is added and removed frequently, traditional offset pagination is not the best way to go.
To put things in context, suppose you need to list a huge catalogue of products. You can store the last product id and use a WHERE clause asking for ids greater than (or less than) the stored value. Things get complicated when you need to sort by criteria like price or date added, where many items can have equal values; then greater-than or less-than alone is not enough.
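For example, a cursor over (Price, Id) uses the primary key as a tiebreaker, so equal prices no longer break the paging. This is a sketch against a hypothetical Products set, with lastPrice/lastId/pageSize coming from the caller's cursor:

// Fetch the page after the cursor (lastPrice, lastId); the two columns together
// make the ordering total, so rows with equal prices are never skipped or repeated.
var page = db.Products
    .Where(p => p.Price > lastPrice
             || (p.Price == lastPrice && p.Id > lastId))
    .OrderBy(p => p.Price)
    .ThenBy(p => p.Id)
    .Take(pageSize)
    .ToList();

// The cursor for the next page is simply the (Price, Id) pair of the last row.

Everything in this query translates to SQL, so it runs on the server.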
LINQ has SkipWhile and TakeWhile, but those work over objects, not over SQL; still, I could go with them if a decent solution comes to mind, or with a smart answer/comment. I am trying to implement GraphQL pagination as per Relay.js.
Thanks in advance
Problem summary:
C# (MVC), Entity Framework 5.0 and Oracle.
I have a couple of million rows in a view that joins two tables.
I need to populate dropdownlists with filter possibilities. The options in these dropdownlists should reflect the actual contents of the view for that column, distinct.
I want to update the dropdownlists whenever you select something, so that the new options reflect the filtered content, preventing you from choosing something that would give 0 results.
It's slow.
Question: what's the right way of getting these dropdownlists populated?
Now for more detail.
-- Goal of the page --
The user is presented with some dropdownlists that filter the data in a grid below. The grid represents a view (see "Database") whose results are filtered.
Each dropdownlist represents a filter for a column of the view. Once something is selected, the rest of the page updates: the other dropdownlists now contain the possible values for their corresponding columns that comply with the filter just applied in the first dropdownlist.
Once the user has selected a couple of filters, he/she presses the search button and the grid below the dropdownlists updates.
-- Database --
I have a view that selects almost all columns from two tables, nothing fancy there. Like this:
SELECT tbl1.blabla, tbl2.blabla etc etc
FROM table1 tbl1, table2 tbl2
WHERE tbl1.bvz_id = tbl2.id AND tbl1.einddatum IS NULL;
There are 22 columns in total: 13 VARCHARs (mostly small, 1-20, though one of them has a size of 2000!), 6 DATEs and 3 NUMBERs (one of size 38 and one of 15,2).
There are a couple of indexes on the tables, among which are the relevant IDs for the WHERE clause.
Important thing to know: I cannot change the database. Maybe add an index here and there, but nothing major.
-- Entity Framework --
I created a database-first EDMX in my solution and mapped the view in it. There are also classes for both tables, but since I need data from both of them, I don't know if I need those. The problem with selecting from either table directly is that you can't apply half of the filtering, but maybe there are smart ways I haven't thought of yet.
-- View --
My view is strongly typed to a viewmodel. In it I have an IEnumerable for each dropdownlist. The getters for these get their data from a single IEnumerable called NameOfViewObjects, like this:
public string SelectedColumn1 { get; set; }

private IEnumerable<SelectListItem> column1Options;
public IEnumerable<SelectListItem> Column1Options
{
    get
    {
        if (column1Options == null)
        {
            column1Options = NameOfViewObjects
                .Select(item => item.Column1)
                .Distinct()
                .Select(item => new SelectListItem
                {
                    Value = item,
                    Text = item,
                    Selected = item.Equals(SelectedColumn1,
                        StringComparison.InvariantCultureIgnoreCase)
                });
        }
        return column1Options;
    }
}
The two solutions I've tried are:
- 1 -
Selecting, in a LINQ query, all the columns I need for the dropdownlists (the 2000-character VARCHAR is not one of them, and there are only 2 date columns), doing a Distinct on them and putting the results into a HashSet, then pointing NameOfViewObjects at this hashset. I have to wait about 2 minutes for that to complete, but after that, populating the dropdownlists is almost instant (maybe a second for each of them).
model.Beslissingen = new HashSet<NameOfViewObject>(dbBes.NameOfViewObject
    .DistinctBy(item => new
    {
        item.VarcharColumn1,
        item.DateColumn1,
        item.DateColumn2,
        item.VarcharColumn2,
        item.VarcharColumn3,
        item.VarcharColumn4,
        item.VarcharColumn5,
        item.VarcharColumn6,
        item.VarcharColumn7,
        item.VarcharColumn8
    })
);
The big problem here is that the NameOfViewObject objects are probably quite large, and even with the Distinct, which leaves fewer than 100,000 results, this still uses over 500 MB of memory. That is unacceptable, because there will be a lot of users on this screen (a lot being... 10 max, 5 simultaneously on average).
- 2 -
The other solution is to use the same LINQ query and point NameOfViewObjects at the IQueryable it produces. This means that every time the view binds a dropdownlist to an IEnumerable, it fires a query that finds the distinct values for that column in a table with millions of rows, where most likely the column it reads is not indexed. This takes around 1 minute per dropdownlist (I have 10), so it takes ages.
Don't forget: I need to update the dropdownlists every time one of them has its selection changed.
-- Question --
So I'm probably going about this the wrong way, or maybe one of these solutions should be combined with indexing all of the columns I use. Maybe I should store the data in memory some other way so it only takes a little, but there must be someone out there who has done this before and figured out something smart. Can you please tell me what would be the best way to handle a situation like this?
Acceptable performance:
- having to wait for a while (2 minutes) while the page loads, but everything is fast after that
- having to wait for a couple of seconds every time a dropdownlist changes
- the page does not use more than 500 MB of memory
Of course you should have indexes on all columns and combinations of columns that appear in WHERE clauses. No index means a table scan and O(N) query times. That cannot scale under any circumstances.
You do not need millions of entries in a drop down. You need to be smarter about filtering the database down to manageable numbers of entries.
I'd take a page from Google. Their type-ahead helps narrow the entire Internet graph down into groups of 25 or 50 per page, with the most likely at the top. Maybe you could manage that too.
Perhaps a better answer is something like a search engine. If you were a Java developer you might try Lucene/SOLR and indexing. I don't know what the .NET equivalent is.
The first point to check is your DB: make sure you have the right indexes and entity relations in place.
Next, if you want to dynamically build your filter options, you need to run the query with the existing filters applied to work out what the next filter can be. There are several ways to do this.
Firstly, you can query the data and extract the filter values from what comes back. This has a huge load time and wastes time returning data you don't want (unless you are live-updating the results as filters change and don't have paging, in which case you might as well just get all the data and use LINQ to Objects to filter it).
A second option is to run a parallel query for each filter that returns its possible values: filter A = all possible values of A in the data, filter B = all possible values of B when filtered by A, C = all possible values of C when filtered by A and B, and so on. This is better than the first, but not by much.
Another option is to use aggregates to speed things up, i.e. you run the parallel queries as above, but instead of returning the data you return how many records would be returned. Aggregate functions are always quicker, so this cuts your load time dramatically, but you are still repeatedly querying a huge dataset, so it won't be exactly nippy.
You can tweak this further using EXISTS to just return a 0 or 1.
In that case you would start from a table of all possible filter values and remove the ones the parallel query finds no rows for, as in the sketch below.
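For instance, here is a sketch of the count-per-value variant in EF, reusing the view and column names from the question as placeholders:

// For each candidate value of the next filter, count the rows that survive
// the filters already applied; values with a zero count can be hidden or disabled.
var column2Options = db.NameOfViewObject
    .Where(v => v.VarcharColumn1 == selectedColumn1)  // filters applied so far
    .GroupBy(v => v.VarcharColumn2)
    .Select(g => new { Value = g.Key, Count = g.Count() })
    .ToList();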
The next option, which will be the fastest by a mile, is to cache the filter values in the DB in a separate table.
Then you can query that table directly, e.g. "from Cache where filter = ABC, select D". The problem with this is maintaining the cache, which you would have to do in the DB as part of the save functions, triggers, etc.
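A sketch of what that lookup could look like from EF, with a hypothetical FilterCache table:

// FilterCache is assumed to hold one row per precomputed (FilterA, FilterB, Value)
// combination, maintained by triggers or by the save logic.
var nextOptions = db.Database.SqlQuery<string>(
    "SELECT DISTINCT Value FROM FilterCache WHERE FilterA = @p0 AND FilterB = @p1",
    selectedA, selectedB).ToList();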
Another solution that can be added in addition to the previous suggestions is to use the /*+ result_cache */ hint, if your version of Oracle supports it (Oracle version 11g or later). If the output of the query is small enough for a drop-down list, then when a user enters criteria that matches the same criteria another user used, the results are returned in a few milliseconds instead of a few seconds or minutes. Result cache is wonderful for queries that return a small set of rows out of millions.
select /*+ result_cache */ item_desc from some_table where item_id ...
The result cache is automatically flushed when any insert/updates/deletes occur on the database tables.
I've done something 'kind of' similar in the past. If you can add a table to the database, I'd explore introducing a 'scratchpad'-type table where results are temporarily stored as the user refines their search. Since multiple users could be working simultaneously, the table would need an additional column to identify the user.
I'd think you'd see some performance benefit since all processing is kept server-side and your app would simply be pulling data from this table. Since you're adding this table you would also have total control over it.
Essentially I'd imagine the program flow would go something like:
1. User selects some filters and clicks 'Search'.
2. Server populates the scratchpad table with results from that search.
3. App populates the results grid from the scratchpad table.
4. User further refines the search and clicks 'Search'.
5. Server removes/adds rows in the scratchpad table as necessary.
6. App repopulates the results grid from the scratchpad table.
And so on.
Rather than keeping all users' results in one scratchpad table, you could also explore having a temporary scratchpad table per user.
I want to find the fastest way to do select queries.
I have a table that contains two million rows, and I want to add country information to each row.
For example, the table:
strain(id, name, sequenceinformations, depositor, numberofsequences)
and I want to add the country information: country(id, name, code).
What is the fastest way: storing it in the same table, or adding the country table and storing just the id of the country?
I know that separate tables are better for design and much better for maintenance, but in my case I only care about speed.
The age old normalization vs denormalization debate. At first glance, a separate table (the normalized approach) seems like the logical choice. However, for country data (which tends to be relatively static), adding it directly to the first table is a viable option. On the rare occasion when a country changes its name, the amount of maintenance is fairly minimal. Sure, it takes up more space, but space is cheap.
That said, for relatively small databases, the performance difference is probably negligible. Therefore, the best approach is whatever you find easiest to understand and maintain.
Also consider whether the country information is likely to be used in other tables: if you're not careful, maintenance could become difficult and error-prone.
So, to address your specific question: yes, a denormalized approach will, in most cases, be technically faster for select queries, but slower in update queries. Whether the difference is sufficient to justify it is another question.
As an aside, I saw an interesting approach recently where a separate table with country data was kept for populating dropdown lists etc., but the country name itself was added to the other tables. Obviously this approach isn't as robust as full normalization, but it certainly helped enforce a certain level of consistency.
Since your country table will never have more rows than there are countries in the world, it will be a small table, so you can keep the country data separate and use a join to get at it.
A hash join would probably be a better option, but MySQL resolves all joins using nested-loop joins. In a nested-loop join, the driving table is read once, and for each row in the driving table the inner table is processed once; the smaller the inner result set, the better the performance. So you want the country table as the inner input, and if that input is indexed it will be faster still.
In the end it depends on how often your main table is updated versus selected: with more updates, go for the separate table; with mostly reads, the other approach becomes more attractive.
I really need an expert's help to answer my query.
Here is the scenario:
I'm using a SQL select query to retrieve a million records.
I need to perform sorting and grouping on the resulting records, which I'm storing in a DataTable (in one execution) and looping through for grouping and sorting.
I know this is childish and not the right way to process it.
How can I manage the million records effectively and apply the grouping and sorting?
I really need help here. I've heard of executing the select query batch-wise, but how do I implement the grouping and sorting when I don't have the entire data in hand?
I cannot go for SQL ORDER BY and GROUP BY directly; that's against my requirements.
Here is what I'm doing right now:
I have the following objects, i.e. the column names for grouping and sorting:
List<Group> groupList;  // column names to group by
List<Sort> sortList;    // column names to sort by
DataTable reportData;   // holds the entire record set from the DB
I'm looping through reportData row by row and comparing the current and previous rows for the custom grouping and sorting. I'd like to know how the same can be done with batch-wise execution, or whether there is any alternative solution.
I need to perform sorting and grouping on the resulting records, which I'm storing in a DataTable (in one execution) and looping through for grouping and sorting.
What for?
Seriously.
Do not pull the data and then try to play smart with a dumb object model on top (and DataSets are not particularly smart, sorry).
Group and sort in your SELECT statement, pull the data already grouped and sorted, and be done with it.
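For instance, a minimal sketch with plain ADO.NET (the table and column names are hypothetical):

// Let the server group and sort; the reader then streams rows that are
// already in their final order, so no in-memory pass is needed.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    @"SELECT DocumentTypeID, COUNT(*) AS DocumentCount
      FROM Documents
      GROUP BY DocumentTypeID
      ORDER BY DocumentTypeID", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            int typeId = reader.GetInt32(0);
            int count = reader.GetInt32(1);
            // consume one already-grouped, already-sorted row at a time
        }
    }
}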
A million records was a small amount of data for SQL Server when the original version was released (4.2, a port of Sybase SQL Server) 17 or so years ago. These days it likely fits into a processor's third-level cache and is nothing a proper SQL Server instance even registers as work.
SQL is particularly good at doing projections, and ever since they introduced MARS you can even run multiple queries over one connection, which comes in handy here.
So, go back, throw away the DataSet and the "I'll program a sort algorithm myself" approach, and write proper SQL statements that pull the data as you need it.
It sounds like you should implement partition pruning. Partitioning allows for the kind of separation of content you are asking for, giving you faster queries.
If I understood correctly, in your case I would create a temporary database table with the structure I want, specifically built to cover the grouping.
Then I would select the records from the main tables and insert them into the temporary one, applying all modifications, including the grouping.
An index matching the desired sort order should also be applied.
After that, just select from this table, do what you have to do, and finally, if the data is no longer needed, drop the temporary table.
I would choose this solution because a million records in memory smells like trouble to me...
For example:
(Note that these assume reportData has been materialized as a typed collection; a raw DataTable would need AsEnumerable() and Field<T>() calls first.)
1. Let's assume that you would like to group them by their DocumentTypeID:
var groupByType = reportData.GroupBy(g => g.DocumentTypeID);
2. Sorting alphabetically:
var sortAlphabetically = reportData.OrderBy(g => g.DocumentName);
3. Grouping and sorting (ordering the documents inside each group):
var groupAndSort = reportData.GroupBy(g => g.DocumentTypeID)
                             .Select(grp => grp.OrderBy(d => d.DocumentName));
4. Sort and group (each group keeps the sorted order):
var sortAndGroup = reportData.OrderBy(g => g.DocumentName)
                             .GroupBy(g => g.DocumentTypeID);
5. Multiple grouping and sorting (using a composite key):
var multipleGroupAndSort = reportData
    .GroupBy(g => new { g.DocumentTypeID, Month = g.CreatedOnDate.Month })
    .Select(grp => grp.OrderBy(d => d.DocumentName));
so on and so forth...
But I would still discourage bringing a million rows into the application; it will cost memory. There are, of course, ways to manage that, through stored procedures etc.