I hope the term 'lookup table' is well chosen. What I mean is, for example, a rate table (lookup) with the following rates:
cheap: $15,-
medium: $30,-
expensive: $45,-
We're in the situation that for a given entity (we call it a 'fault'; it is a malfunction of a device: airco, elevator, crane, toilet, etc.) a contractor is hired to fix that device.
That contractor has these three (made-up) rates: cheap, medium and expensive.
When the contractor fixes the fault, he enters the hours worked and the rate ('expensive' when a senior has done the job, 'cheap' when a junior has done it).
Technically, we then add a FK from the Fault table to the Rates table.
So when the invoice has to be printed, we get the rate via the FK and the hours worked from the fault record.
The problem is that when the contractor changes his rates and you recalculate an old invoice months later, different amounts are calculated for the invoice because the rate record has changed.
So we have to construct some kind of history, and that's the question: how is that done?
What I've come up with is two different approaches, and the question is: is one of these a good one, or are there better ways?
1. Add valid-from and valid-until fields to the rate table, so when you edit a value, you in fact create a new record with new valid dates. The downside is that you always have to fetch rates with a specific date in mind, which is not necessary for the current situation (the actual rate at this moment).
2. Don't put a FK from fault to rate, but when you set a rate on a fault, just copy the VALUE from rate to fault. The downside is that while the fault is still editable, editing the rate does not update the fault's rate. Also, when you edit a fault, you get a dropdown box with three values to choose from, none of which are the same as the current value.
At this point thanks already for reading this entire post!
I don't like #2; I never like replacing relationships with actual values (denormalizing) if I can help it. Also, it makes auditing a lot harder; if there's a weird value for the rate, where did it come from?
The problem with #1, though, is that if for some reason you change the date of the invoice, it should probably still have the same rate that it had when it was originally created.
For these reasons, I'd recommend doing the part of #1 where a rate change always creates a new row, but then link from each fault to the rate that was actually applied (i.e. rather than relying on the date to join to a rate, actually store a rate id with the fault).
One approach to finding the current rate is just to look for the one that has no end date. Or alternately, don't use end dates at all (the start date of the next rate is treated as the end date of the previous rate), and just sort by date and take the last one.
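A minimal sketch of that recommendation, using Python's sqlite3 module so it runs anywhere. The table and column names (`rate`, `fault`, `amount_cents`, `valid_from`) and the helper functions are made up for illustration; rates carry only a start date, and each fault stores the id of the rate row that was applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rate (
    id           INTEGER PRIMARY KEY,
    name         TEXT    NOT NULL,  -- 'cheap', 'medium' or 'expensive'
    amount_cents INTEGER NOT NULL,  -- hourly rate, stored in cents
    valid_from   TEXT    NOT NULL   -- ISO date; no end date is stored
);
CREATE TABLE fault (
    id      INTEGER PRIMARY KEY,
    hours   REAL    NOT NULL,
    rate_id INTEGER NOT NULL REFERENCES rate(id)  -- the rate actually applied
);
""")


def change_rate(name, amount_cents, valid_from):
    """A rate change never updates a row; it always inserts a new one."""
    conn.execute(
        "INSERT INTO rate (name, amount_cents, valid_from) VALUES (?, ?, ?)",
        (name, amount_cents, valid_from),
    )


def current_rate_id(name):
    """The current rate is the row with the latest start date."""
    row = conn.execute(
        "SELECT id FROM rate WHERE name = ? ORDER BY valid_from DESC LIMIT 1",
        (name,),
    ).fetchone()
    return row[0]


def log_fault(hours, rate_name):
    """Store the id of the rate row in force now, not a date to join on."""
    cur = conn.execute(
        "INSERT INTO fault (hours, rate_id) VALUES (?, ?)",
        (hours, current_rate_id(rate_name)),
    )
    return cur.lastrowid


def invoice_cents(fault_id):
    """Recalculate an invoice from the rate row linked to the fault."""
    row = conn.execute(
        "SELECT f.hours * r.amount_cents FROM fault f "
        "JOIN rate r ON r.id = f.rate_id WHERE f.id = ?",
        (fault_id,),
    ).fetchone()
    return row[0]


change_rate("cheap", 1500, "2010-01-01")
old_fault = log_fault(4, "cheap")
change_rate("cheap", 2000, "2010-06-01")  # the contractor raises the rate
new_fault = log_fault(4, "cheap")
```

Recalculating the invoice for `old_fault` after the rate change still uses the 1500-cent row, because the fault points at that row directly.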
There was a good discussion of this over on Programmers.SE:
How to Store Prices That Have Effective Dates
It's a well-known problem and using effective dates is the best way to do it.
I'd suggest keeping a table of contractor rates, ordered by date. When a contractor's rates change, instead of changing the existing rate add a new entry. When you need to get the current rate, sort by the timestamp descending and limit 1. Add the date entry for the current rate entry to each job record and then you can perform a simple join to get all the information at once.
I am building an app with a SQL Server database. I have a main table of products (tblProducts) with a column that holds the quantity in hand (quantity). Another table holds the orders (tblOrders) that come from the supplier.
When an order comes in, I add the order to my database (tblOrders) and then I edit tblProducts to add the newly received quantity to the quantity column.
So far, everything is good.
My question: after, let's say, 1 year of many, many orders, with a lot of edits to quantity, do you periodically check all orders to verify that the quantity in the main table tblProducts is correct? Or do I just assume that it is always correct?
What procedures do you use for updating this kind of database? Do you sum all orders every time when you need quantity in hand?
Thanks!
This is really up to how you want to implement it.
Trusting that the values will always check out (with adequate testing to ensure only stable code will see production) is the easiest and the fastest way, but might be vulnerable to data corruption, and thus, not that recommended.
Always summing up the orders is the safest and most correct way, but it will become increasingly slower as the size of your tables grows. If this is not an issue for you, then this is the recommended option.
What I consider a good intermediate method is to have a separate tblProductLogs table which stores the stock of an item at a specific timestamp. You can sum the inventory at set periods (daily, hourly, up to you), and when you want to retrieve the current inventory stock you only need to sum the values that were registered after the last log entry for that item, saving you query time. This could be made safer if update operations were disabled on the log table, since you won't need to modify the entries there. This is faster than the second option, and somewhat more stable than the first.
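The intermediate method could look roughly like the following sketch, written with Python's sqlite3 module. The column names, the helper functions and the ISO-string timestamps are assumptions for illustration; it also assumes orders are never backdated to before the latest log entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblOrders (
    id          INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    received_at TEXT    NOT NULL   -- ISO timestamp
);
CREATE TABLE tblProductLogs (
    product_id INTEGER NOT NULL,
    stock      INTEGER NOT NULL,   -- stock level at logged_at
    logged_at  TEXT    NOT NULL    -- ISO timestamp
);
""")


def add_order(product_id, quantity, received_at):
    conn.execute(
        "INSERT INTO tblOrders (product_id, quantity, received_at) VALUES (?, ?, ?)",
        (product_id, quantity, received_at),
    )


def stock_as_of(product_id, as_of):
    """Latest log entry up to `as_of`, plus the orders received after it."""
    snap = conn.execute(
        "SELECT stock, logged_at FROM tblProductLogs "
        "WHERE product_id = ? AND logged_at <= ? "
        "ORDER BY logged_at DESC LIMIT 1",
        (product_id, as_of),
    ).fetchone()
    base, since = snap if snap else (0, "")  # "" sorts before any timestamp
    delta = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM tblOrders "
        "WHERE product_id = ? AND received_at > ? AND received_at <= ?",
        (product_id, since, as_of),
    ).fetchone()[0]
    return base + delta


def take_snapshot(product_id, now):
    """Periodic job: append a log row; existing log rows are never updated."""
    conn.execute(
        "INSERT INTO tblProductLogs (product_id, stock, logged_at) VALUES (?, ?, ?)",
        (product_id, stock_as_of(product_id, now), now),
    )


add_order(1, 10, "2020-01-01T09:00")
add_order(1, 5, "2020-01-02T09:00")
take_snapshot(1, "2020-01-03T00:00")
add_order(1, 7, "2020-01-04T09:00")
on_hand = stock_as_of(1, "2020-01-05T00:00")  # sums only the one order after the log entry
```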
Should the user's Account balance be stored in the database or calculated dynamically?
For accurate results, calculating it dynamically makes sense, but might that become a problem when there are many users and the database grows very large?
Transaction
    Id (PK)
    AccountId
    Type
    DateTime
    Amount
    etc.

AccountBalance
    TransactionId (PK/FK)
    BalanceAmount
In order to keep accurate auditing you should make a record of every transaction that affects the user's account balance. This means you can calculate the balance dynamically; however, for performance reasons I would have the balance stored as well. To ensure the balance is correct, though, I would have a daily job run that recalculates the balance from scratch.
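As a rough sketch of that idea (stored balance plus a recalculation job), here is one way it might look with Python's sqlite3 module. The `txn` and `account_balance` tables and the `post` and `reconcile` helpers are hypothetical names, and amounts are assumed to be signed integers in cents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn (
    id         INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL,
    amount     INTEGER NOT NULL   -- signed, in cents
);
CREATE TABLE account_balance (
    account_id INTEGER PRIMARY KEY,
    balance    INTEGER NOT NULL
);
""")


def post(account_id, amount):
    """Record the transaction and adjust the stored balance atomically."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO txn (account_id, amount) VALUES (?, ?)",
            (account_id, amount),
        )
        cur = conn.execute(
            "UPDATE account_balance SET balance = balance + ? WHERE account_id = ?",
            (amount, account_id),
        )
        if cur.rowcount == 0:  # first transaction for this account
            conn.execute(
                "INSERT INTO account_balance (account_id, balance) VALUES (?, ?)",
                (account_id, amount),
            )


def reconcile():
    """Daily job: recompute every balance from scratch and repair mismatches.

    Returns (account_id, stored_balance, recomputed_balance) for each repair.
    """
    mismatches = conn.execute("""
        SELECT b.account_id, b.balance, COALESCE(SUM(t.amount), 0)
        FROM account_balance b
        LEFT JOIN txn t ON t.account_id = b.account_id
        GROUP BY b.account_id, b.balance
        HAVING b.balance <> COALESCE(SUM(t.amount), 0)
    """).fetchall()
    with conn:
        for account_id, _stored, actual in mismatches:
            conn.execute(
                "UPDATE account_balance SET balance = ? WHERE account_id = ?",
                (actual, account_id),
            )
    return mismatches


post(1, 100)
post(1, -30)
```

Logging what `reconcile()` returns, rather than silently fixing it, keeps the audit trail the answer asks for.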
You need to ask yourself a few questions:
1) Who will OWN the calculation?
2) Who will NEED the result?
Now if the owner of the calculation is the only one who will need it - or if anyone else who needs it will get it from the owner, then there is no need to store the calculation.
However, in most applications that actually run for a long time, the calculated result will probably end up being needed somewhere else. For instance, a reporting application like SQLReportingServices will need the result of the calculation, so if the owner of the calculation is a web application, you have a problem. How will reporting services (which talks to the database only) get the result?
Solution in that case - either store the calculation OR make the database the owner OF the calculation and have a sql function that returns the result.
Personally, I tend to go for the non-purist approach - I store calculated results in the database. Space is cheap, and response time is faster on a read than on a read+function call.
I think this is a good question. Calculating every time is obviously easy to do but would probably result in a lot of needless calculations with the resultant performance hit.
But storing the current balance in some other table can lead to data concurrency issues, where the data that builds the aggregate is modified out of sync with the aggregate.
Perhaps a happy medium is to have a sql trigger on the transaction table that updates the aggregate value on an insert or update for that user.
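A sketch of the trigger idea, shown with SQLite through Python's sqlite3 module rather than SQL Server, so the trigger syntax differs from T-SQL. The table names are hypothetical, and the update trigger assumes a transaction is never moved to a different account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn (
    id         INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL,
    amount     INTEGER NOT NULL   -- signed, in cents
);
CREATE TABLE account_balance (
    account_id INTEGER PRIMARY KEY,
    balance    INTEGER NOT NULL DEFAULT 0
);

-- Keep the aggregate in step with every inserted transaction.
CREATE TRIGGER txn_after_insert AFTER INSERT ON txn
BEGIN
    INSERT OR IGNORE INTO account_balance (account_id, balance)
        VALUES (NEW.account_id, 0);
    UPDATE account_balance
        SET balance = balance + NEW.amount
        WHERE account_id = NEW.account_id;
END;

-- Apply only the difference when an existing amount is edited.
CREATE TRIGGER txn_after_update AFTER UPDATE OF amount ON txn
BEGIN
    UPDATE account_balance
        SET balance = balance - OLD.amount + NEW.amount
        WHERE account_id = NEW.account_id;
END;
""")

conn.execute("INSERT INTO txn (id, account_id, amount) VALUES (1, 1, 100)")
conn.execute("INSERT INTO txn (id, account_id, amount) VALUES (2, 1, 50)")
conn.execute("UPDATE txn SET amount = 80 WHERE id = 1")
balance = conn.execute(
    "SELECT balance FROM account_balance WHERE account_id = 1"
).fetchone()[0]
```

Because the trigger runs inside the same database transaction as the insert or update, the aggregate cannot drift out of sync with the rows that build it.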
the current balance is already available!
it is the balance in the last transaction for the account:
select top 1 [Balance]
from dbo.Trans
where [AccountID] = @AccountID
order by [TranID] desc
the balance has to be calculated and stored as part of every transaction
otherwise the system won't scale ...
also if you don't store the balance you have no checks and balances (as in balance must equal previous balance plus new credits less new debits)
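The "checks and balances" rule can be sketched as a small verification pass. This assumes each row carries a signed amount (credits positive, debits negative) and that the account starts from a zero balance; the function name and the tuple layout are made up for illustration.

```python
def verify_running_balances(rows, opening_balance=0):
    """Check one account's rows, ordered by TranID.

    rows: iterable of (tran_id, amount, balance) tuples.
    Returns the tran_ids whose stored balance does not equal
    the previous stored balance plus the row's amount.
    """
    bad = []
    previous = opening_balance
    for tran_id, amount, balance in rows:
        if balance != previous + amount:
            bad.append(tran_id)
        previous = balance  # continue from the stored value
    return bad


rows = [
    (1, 100, 100),
    (2, -30, 70),
    (3, 50, 125),   # should have been 120
    (4, 5, 130),    # consistent with the stored 125
]
broken = verify_running_balances(rows)
```

Continuing from the stored balance rather than the recomputed one means a single bad row is reported once instead of flagging every later row.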
If your application is not already retrieving the data needed for the balance calculation from the database at the point where it needs the balance, I would suggest that you either calculate the balance or store it in the database.
If you need the updated balance frequently and it changes dynamically based on more than one table, then you should use a view instead of a trigger.
This is more of an architectural question than a specific code problem, as I've hit a major block in how to proceed with this project.
I'm building financial scanning software that filters stock picks on specific criteria. For example, if, out of 8000 stocks, a stock's closing price today is above the SMA 100 and its closing price 10 days ago was below the SMA 100, then return that stock's symbol to me.
However, note that the SMA (Simple Moving Average) is calculated from the last 100 days of data in the above example, but we could change the 100 days to another value, let's say 105 or 56; it could be anything.
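To make the filter concrete, here is a minimal in-memory sketch in Python (not Entity Framework). It assumes each symbol's closing prices are available as a list ordered oldest to newest; the function names and the `window` and `lookback` parameters are made up for illustration.

```python
def sma(closes, window, end):
    """Simple moving average of the `window` closes ending at index `end`.

    Returns None when there is not enough history.
    """
    start = end - window + 1
    if start < 0:
        return None
    return sum(closes[start:end + 1]) / window


def crossed_above(closes, window=100, lookback=10):
    """True if the close is above its SMA today and was below it `lookback` days ago."""
    today = len(closes) - 1
    past = today - lookback
    if past < 0:
        return False
    sma_today = sma(closes, window, today)
    sma_past = sma(closes, window, past)
    if sma_today is None or sma_past is None:
        return False
    return closes[today] > sma_today and closes[past] < sma_past


def scan(closes_by_symbol, window=100, lookback=10):
    """Return the symbols that pass the filter, sorted alphabetically."""
    return sorted(
        symbol
        for symbol, closes in closes_by_symbol.items()
        if crossed_above(closes, window, lookback)
    )


# Tiny illustrative data set with a 3-day window and a 2-day lookback.
sample = {
    "AAA": [10, 10, 10, 9, 9, 12],
    "BBB": [10, 10, 10, 10, 10, 10],
}
hits = scan(sample, window=3, lookback=2)
```

Since the window is just a parameter, changing 100 to 105 or 56 only changes the call, not the stored data.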
In my database I have a table called EODData with a few columns; here is the definition:
EODData
    sSymbol nvarchar(6)
    mOpen money
    mClose money
    mHigh money
    mLow money
    Date datetime
The table will hold 3 years of End Of Day Data for the American Stock Exchange so that is approximately 6,264,000 rows, no problem for MS SQL 2008 R2.
Now, I'm currently using Entity Framework to retrieve data from my database. What would be the best way to run or create my filter, given that the SMA must be calculated for each symbol (underlying stock ticker) every time a scan is performed, because the 100-day variable can change?
Should I convert from Entity Objects to a DataSet for in-memory filtering, etc.?
I've not worked with DataSets or DataTables much so I am looking for pointers.
Note that the SMA is just one of the filters, I have another algorithm that calculates the EMA (Exponential Moving Average, which is a much more complicated formula) and MACD (Moving Average Convergence Divergence).
Any opinions?
What about putting the calculations in the database? You have your EODData table, which is great. Create another table that is your SummaryData, something like:
SummaryData
    stockSymbol varchar(6) -- don't need nvarchar, since AMEX doesn't use characters outside the normal English alphabet
    SMA decimal
    MACD decimal
    EMA decimal
Then you can write some stored procedures that run on close of day and update this one table based on the data in your EODData table. Or you could write a trigger so that each insert of the EODData table updates the summary data in your database.
One downside to this is that you're putting some business logic in the database. Another downside is that you will be updating statistical data on a stock symbol that you might not need to do. For example, if nobody ever wants to see what XYZZ did, then the calculation on it is pointless.
However, this second issue is mitigated by the fact that 1. you're running SPs on the server which MSSQL can optimize and 2. You can run these after hours when everyone is at home, so if it takes a little bit of time you're not affected. (To be honest, I'd assume if they're calculations like rolling averages, min/max etc, SQL won't be that slow.)
One upside is your queries should be wicked fast, because you could write
select stockSymbol from SummaryData where SMA > 10
since you've already done the calculation.
Another upside is that the data will only change once per day (at the close of the business day) but you might be querying several times throughout the day. For example, you want to run several different queries today for all the data up to and including yesterday. If you run 10 queries, and your partner runs the same 10 queries, you've just done the same calculation over. (Essentially, write once, read many.)
I was thinking of formatting it like this
TYYYYMMDDNNNNNNNNNNX
(1 character + 19 digits)
Where
T is type
YYYY is year
MM is month
DD is day
N is sequential number
X is check digit
The problem is, how do I generate the sequential number? My primary key is not an auto-increment integer value; if it were, I would use that, but it's not.
EDIT: Can I have the sequential number reset itself after 1 day (24 hours)?
P201012080000000001X <-- first transaction of 2010/12/08
P201012080000000002X <-- second transaction of 2010/12/08
P201012090000000001X <-- first transaction of 2010/12/09
(X is the check digit)
The question is meaningless without a context. Others have commented on your question. Please answer the comments. What is the "transaction number" for; where is it used; what is the "transaction" that you need an external identifier for.
Identity or auto-increment columns may have some use internally, but they are quite useless outside the database.
If we had the full schema, knowing which components are PKs that will not change, etc, we could provide a more meaningful answer.
At first glance, without the info requested, I see no point in recording the date in the "transaction" (the date is already stored in the transaction row).
You seem to have the formula for your transaction number; the only question you really have is how to generate a sequence number that resets each day.
You can consider the following options:
Use a database sequence and a scheduled job that resets it.
Use a sequence from outside the database (for instance, a file or memory structure).
With the proper isolation level, you should be able to include (SELECT COALESCE(MAX(Seq), 0) + 1 FROM Table WHERE DateCol = CURRENT_DATE) as a value expression in your INSERT statement; the COALESCE covers the first transaction of the day, when MAX returns NULL.
Also note that there's probably no real reason to actually store the transaction number in the database as it's easy to derive it from the information it encodes. All you need to store is the sequential number.
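A sketch of the MAX-plus-one option combined with deriving the number on demand, using Python's sqlite3 module. The table layout and function names are hypothetical. The question does not say how the check digit is computed, so the Luhn (mod 10) algorithm is used here purely as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE txn (
    txn_type TEXT    NOT NULL,   -- the single-character T
    txn_date TEXT    NOT NULL,   -- 'YYYYMMDD'
    seq      INTEGER NOT NULL,   -- restarts at 1 for each date
    PRIMARY KEY (txn_date, seq)
)
""")


def next_sequence(txn_type, txn_date):
    """Insert a row with the next per-day sequence number and return it."""
    with conn:  # one database transaction
        conn.execute(
            "INSERT INTO txn (txn_type, txn_date, seq) "
            "SELECT ?, ?, COALESCE(MAX(seq), 0) + 1 FROM txn WHERE txn_date = ?",
            (txn_type, txn_date, txn_date),
        )
        return conn.execute(
            "SELECT MAX(seq) FROM txn WHERE txn_date = ?", (txn_date,)
        ).fetchone()[0]


def luhn_check_digit(digits):
    """Luhn check digit for a string of decimal digits (assumed algorithm)."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:  # double every second digit, starting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def transaction_number(txn_type, txn_date, seq):
    """Derive TYYYYMMDDNNNNNNNNNNX from the stored parts; nothing extra is stored."""
    digits = f"{txn_date}{seq:010d}"
    return f"{txn_type}{digits}{luhn_check_digit(digits)}"


first = next_sequence("P", "20101208")
second = next_sequence("P", "20101208")
next_day = next_sequence("P", "20101209")   # the sequence starts over
number = transaction_number("P", "20101208", first)
```

The composite primary key also acts as a safety net: if two writers ever computed the same sequence number for a date, the second insert would fail instead of creating a duplicate.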
You can track the auto-incs separately.
Or, as you get ready to add a new transaction, first poll the DB for the newest transaction, break that apart to find the number, and increment it.
Or add an auto-inc field, but don't use it as a key.
You can use a UUID generator so that you don't have to worry about a sequence and you can be sure there are no collisions between transactions.
e.g.
in Java:
java.util.UUID.randomUUID()
05f4c168-083a-4107-84ef-10346fad6f58
5fb202f1-5d2a-4d59-bbeb-5bcabd513520
31836df6-d4ee-457b-a47a-d491d5960530
3aaaa3c2-c1a0-4978-9ca8-be1c7a0798cf
in PHP:
echo uniqid();
4d00fe31232b6
4d00fe4eeefc2
4d00fe575c262
There is a UUID generator in nearly all languages.
A primary key that big is a very, very bad idea. You will waste huge amounts of table space unnecessarily and make your table very slow to query and manage. Make your primary key a small, simple incrementing int and store the transaction date in a separate field. When necessary in a query you can select a transaction number for that day with:
SELECT ROW_NUMBER() OVER (PARTITION BY TxnDate ORDER BY TxnID), TxnDate, ...
Please read this regarding good primary key selection criteria. http://www.sqlskills.com/BLOGS/KIMBERLY/category/Indexes.aspx
I am trying to implement a ratings strategy for a hotel search website. Ratings are based on the number of clicks by users who view different hotels. I am trying to assign a number of points to each click.
Suppose I have a POINTS column in the hotels table which increases with the number of clicks on each hotel. The problem I face is this: suppose hotel1 was introduced in week 1 and gains a considerable number of points by week 2. If I then introduce a new hotel2 in week 2, even though this new hotel is gaining points competitively, it can't easily catch up with hotel1 because of the difference in the weeks in which they were introduced.
I thought of solving the above problem by introducing a DAYS column, so that I can divide the POINTS of each hotel by the number of days and get a fair rating for each hotel.
I need your help with how to get the number of days that have passed into the DAYS column after a new hotel is added to the table.
It would probably be better to have a CreateDate column, and then in client-side code do something along the lines of:
int days = DateTime.Now.Subtract(hotel.CreateDate).Days;
This will cause fewer updates to your database too, as the date only needs to be set on create.
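The same idea as a small Python sketch: keep only the creation date and derive the day count when the rating is needed. The function name is made up, and counting a hotel added today as one day old (to avoid dividing by zero) is an assumption, not something the question specifies.

```python
from datetime import date


def points_per_day(points, create_date, today=None):
    """Points divided by the number of days since the hotel was added."""
    if today is None:
        today = date.today()
    days = max((today - create_date).days, 1)  # a brand-new hotel counts as 1 day
    return points / days


older = points_per_day(140, date(2020, 1, 1), today=date(2020, 1, 15))
newer = points_per_day(30, date(2020, 1, 15), today=date(2020, 1, 15))
```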
I'm not going to go into the statistical theory of what you're doing, but let me state for the record that I think your stats are going to be misleading.
Be that as it may, just to achieve what you say you want to achieve, I would do what @Andy just said you should do. (Beat me to it!)
As I understand it, you are asking people to "vote" on the hotels, but the only possible votes are "yes" and "no vote".
Whether this is statistically valid will depend on when people are asked to vote. If every time someone visits a hotel, you ask him whether he was satisfied with his stay, and people consistently do this, I suppose it would be generally valid. But if the scoring system is such that users do not perceive a need to re-vote on a hotel that they have already voted on, then hotels that are on the list longer will see their ratings tend to sink. If it reached a point where every user who had visited a hotel has voted (or not), and no one saw a need to vote again, then that hotel's score would gradually sink.
Also, this system would be biased in favor of big hotels. If hotel A has 500 rooms and hotel B has 10 rooms, it would be very tough for hotel B to ever get as many votes as hotel A.
I think you'd be better off asking people to rate the hotel on some scale -- 1 to 5 stars or whatever -- and then presenting the average score. Probably along with the number of ratings, as people can probably figure out that if there's only one rating and it's the highest possible, that might be the owner rating his own hotel.
An alternative to calculating the days in code would be to use a computed column in the database (assuming that by the sql tag you meant SQL Server). As the other posters have said, add a CreateDate column for the hotel and then add a computed column that returns the date difference.