ASP.NET storing DateTime/long as the smallest comparable value - C#

I am working on a rather large SQL database project that requires some history tracking. I am aware that SQL Server has features like Change Data Capture, but I need more control than just storing backup copies of the data, along with other requirements.
Here is what I am trying to do. I found some similar questions on here, like
get-the-smallest-datetime-value-for-each-day-in-sql-database
and
how-can-i-truncate-a-datetime-in-sql-server
What I am trying to avoid is exactly the kind of approach mentioned in those answers: they all require some sort of conversion during the query, i.e. truncating the DateTime value. What I would like to do instead is take DateTime.Ticks, or some other DateTime property, and do a one-way conversion to a smaller type, creating a "Version" of each record that can be queried quickly, without any kind of conversion after the database update/insert.
My concern about simply storing the long from DateTime.Ticks as a Version, or using something like a Base36 string, is the size of the field and the time required to compare it during queries.
Can anyone point me in the right direction on how to SAFELY convert DateTime.Ticks or a long into something that can be compared directly during queries? By this I mean I would like to be able to locate the history records using something like:
int versionToFind = GetVersion(DateTime.Now);
var result = from rec in db.Records
             where rec.version <= versionToFind
             select rec;
or
int versionToFind = record.version;
var result = from rec in db.Records
             where rec.version >= versionToFind
             select rec;
One thing to mention here is that I am not opposed to using some other method of quickly tracking the History of the data. I just need to end up with the quickest and smallest solution to be able to generate and compare Versions for each record.
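For illustration, the kind of one-way conversion I have in mind for GetVersion is roughly the following (just a sketch; the epoch date and the one-second resolution are arbitrary choices on my part):

// Hypothetical GetVersion: collapse a DateTime to whole seconds since a fixed epoch
// so the result fits in an int (good for roughly 68 years after the epoch).
static readonly DateTime VersionEpoch = new DateTime(2010, 1, 1, 0, 0, 0, DateTimeKind.Utc);

static int GetVersion(DateTime when)
{
    long seconds = (when.ToUniversalTime() - VersionEpoch).Ticks / TimeSpan.TicksPerSecond;
    return checked((int)seconds);   // one-way, ordered, directly comparable in the database
}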

Related

SQL - Better two queries instead of one big one

I am working on a C# application, which loads data from an MS SQL 2008 or 2008 R2 database. The table looks something like this:
ID | binary_data | Timestamp
I need to get only the last entry and only the binary data. Entries are added to this table irregularly by another program, so I have no way of knowing when there is a new entry.
Which version is better (performance etc.) and why?
// Always a query, which might not be needed
public void ProcessData()
{
    byte[] data = /* query: get latest binary data from db */;
}
vs
// Always a smaller check-query, and sometimes two queries
public void ProcessData()
{
    DateTime timestamp = /* query: get latest timestamp from db */;
    if (timestamp > old_timestamp)
        data = /* query: get latest binary data from db */;
}
The binary_data field size will be around 30 kB. The ProcessData function will be called several times per minute, but sometimes it can be called every 1-2 seconds. This is only a small part of a bigger program with lots of threading/database access, so I want the "lightest" solution. Thanks.
Luckily, you can have both:
SELECT TOP 1 binary_data
FROM myTable
WHERE Timestamp > @last_timestamp
ORDER BY Timestamp DESC
If there is no record newer than @last_timestamp, nothing is returned and thus no data transmission takes place (= fast). If there are new records, the binary data of the newest one is returned immediately (= no need for a second query).
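From C#, the call could look roughly like this (a sketch using System.Data.SqlClient; the table and column names come from the question, and the query also returns Timestamp so the caller can remember the newest value it has seen):

using System;
using System.Data.SqlClient;

static byte[] TryGetNewData(string connectionString, ref DateTime lastTimestamp)
{
    byte[] data = null;
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT TOP 1 binary_data, Timestamp FROM myTable " +
        "WHERE Timestamp > @last_timestamp ORDER BY Timestamp DESC", conn))
    {
        cmd.Parameters.AddWithValue("@last_timestamp", lastTimestamp);
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            if (reader.Read())                         // a row exists only if there is newer data
            {
                data = (byte[])reader["binary_data"];
                lastTimestamp = reader.GetDateTime(1); // remember the newest timestamp
            }
        }
    }
    return data;                                       // null means nothing new since lastTimestamp
}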
I would suggest you test both methods, as the answer depends on your usage; simulate the expected behaviour.
I would say, though, that you are probably fine just doing the first query. Do what works; don't optimise prematurely. If the single query is too slow, try your second, two-query approach.
The two-step approach is more efficient from the point of view of the system's overall workload:
Get informed that you need to query new data
Query new data
There are several ways to implement this approach. Here are two of them.
Using Query Notifications, built-in SQL Server functionality that is supported in .NET (a rough sketch follows below).
Using an implicit method of being notified of a table update, e.g. the one described in this article on the SQL Authority blog.
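For the Query Notifications route, a minimal sketch with SqlDependency might look like this (it assumes Service Broker is enabled on the database and that the table dbo.myTable and its columns match the question; error handling omitted):

using System.Data.SqlClient;

// One-time setup per application domain.
SqlDependency.Start(connectionString);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT binary_data, Timestamp FROM dbo.myTable", conn))   // two-part table name is required
{
    var dependency = new SqlDependency(cmd);
    dependency.OnChange += (sender, e) =>
    {
        // Fires once when the result set changes: re-run the query and re-subscribe here.
    };

    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        // Executing the command registers the notification subscription.
    }
}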
I think the better path is a stored procedure that keeps the logic inside the database: something with an output parameter holding the required data and a return value (TRUE/FALSE) to signal the presence of new data.
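Calling such a procedure from C# could look roughly like this (the procedure name GetLatestBinaryData and its parameters are hypothetical):

using System;
using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetLatestBinaryData", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;

    var dataParam = cmd.Parameters.Add("@binary_data", SqlDbType.VarBinary, -1);
    dataParam.Direction = ParameterDirection.Output;

    var hasNewData = cmd.Parameters.Add("@return_value", SqlDbType.Int);
    hasNewData.Direction = ParameterDirection.ReturnValue;

    conn.Open();
    cmd.ExecuteNonQuery();

    if (Convert.ToInt32(hasNewData.Value) != 0)      // TRUE/FALSE signalled via the return value
    {
        var data = (byte[])dataParam.Value;
        // process data ...
    }
}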

How to store a sparse boolean vector in a database?

Let's say I have a book with ~2^40 pages. Each day, I read a random chunk of contiguous pages (sometimes including some pages I've already read). What's the smartest way to store and update the information of "which pages I've read" in a (SQLite) database?
My current idea is to store [firstChunkPage, lastChunkPage] entries in a table, but I'm not sure how to update this efficiently.
Should I first check for every possible overlap and then update?
Should I just insert my new range and then merge overlapping entries (perhaps multiple times, because multiple overlaps can occur)? I'm not sure how to build such a SQL query.
This looks like a pretty common problem, so I'm wondering if anyone knows a 'recognized' solution for it.
Any help or idea is welcome!
EDIT: The reading isn't actually random; the number of chunks is expected to be pretty much constant and very small compared to the number of pages.
Your idea of storing (firstChunkPage, lastChunkPage) ranges should work if the data is relatively sparse.
Unfortunately, queries like the one you mentioned:
SELECT count(*) FROM table
WHERE firstChunkPage <= page AND page <= lastChunkPage
cannot work efficiently unless you use a spatial index.
For SQLite, you should use the R-Tree module, which implements support for this kind of index. Quote:
An R-Tree is a special index that is designed for doing range queries. R-Trees are most commonly used in geospatial systems where each entry is a rectangle with minimum and maximum X and Y coordinates. ... For example, suppose a database records the starting and ending times for a large number of events. A R-Tree is able to quickly find all events, for example, that were active at any time during a given time interval, or all events that started during a particular time interval, or all events that both started and ended within a given time interval.
With an R-Tree, you can very quickly identify all overlaps before inserting a new range and replace them with a new combined entry.
To create your R-Tree index, use something like this:
CREATE VIRTUAL TABLE demo_index USING rtree(
    id, firstChunkPage, lastChunkPage
);
For more information, read the documentation.
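A sketch of the insert-with-merge step from C# (using Microsoft.Data.Sqlite against the demo_index table above; it assumes your SQLite build has the R-Tree module enabled, and it also merges ranges that merely touch each other):

using System;
using System.Collections.Generic;
using Microsoft.Data.Sqlite;

// Merge a newly read chunk [first, last] into demo_index, combining it with any
// stored range it overlaps or touches.
static void AddChunk(SqliteConnection conn, long id, long first, long last)
{
    using (var tx = conn.BeginTransaction())
    {
        // 1. Find every stored range that overlaps or is adjacent to the new one.
        var find = conn.CreateCommand();
        find.Transaction = tx;
        find.CommandText = "SELECT id, firstChunkPage, lastChunkPage FROM demo_index " +
                           "WHERE firstChunkPage <= $max AND lastChunkPage >= $min";
        find.Parameters.AddWithValue("$min", first - 1);
        find.Parameters.AddWithValue("$max", last + 1);

        var obsolete = new List<long>();
        using (var reader = find.ExecuteReader())
        {
            while (reader.Read())
            {
                obsolete.Add(reader.GetInt64(0));
                first = Math.Min(first, reader.GetInt64(1));   // grow the new range
                last = Math.Max(last, reader.GetInt64(2));
            }
        }

        // 2. Remove the merged-away rows and insert the single combined range.
        var del = conn.CreateCommand();
        del.Transaction = tx;
        del.CommandText = "DELETE FROM demo_index WHERE id = $id";
        var delId = del.Parameters.Add("$id", SqliteType.Integer);
        foreach (var oldId in obsolete)
        {
            delId.Value = oldId;
            del.ExecuteNonQuery();
        }

        var ins = conn.CreateCommand();
        ins.Transaction = tx;
        ins.CommandText = "INSERT INTO demo_index (id, firstChunkPage, lastChunkPage) " +
                          "VALUES ($id, $first, $last)";
        ins.Parameters.AddWithValue("$id", id);
        ins.Parameters.AddWithValue("$first", first);
        ins.Parameters.AddWithValue("$last", last);
        ins.ExecuteNonQuery();

        tx.Commit();
    }
}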

Designing a data-driven logic system

I'm developing a tax calculation system that applies various taxes based on a set of supplied criteria.
The information frequently changes, so I'm trying to create a way to store all these logic rules in the database.
As you can imagine, there is a lot of compound logic involved in applying taxes.
For example, a tax might only apply if A is true, B is less than 100, and C equals 7.
My current design is terrible.
I have a few database columns for very common criteria filtering, such as location and tax year.
For more complex logic, I have a column that holds JavaScript, and in code, I run an interpreter to filter the results. Performance and maintainability suck.
I'd like to improve this design by making the logic entirely data-driven, but I'm having trouble figuring out how to correctly represent this logic within a relational database. What is a good way to model this logic in the database?
I have worked on a similar issue for over a year now, for a manufacturing cost generation application. It takes in loads of product design data and, based on the design and other inventory considerations such as quantity, bulk purchase options, part supplier, electrical ratings, etc., produces a list of direct materials, labour and costs.
I knew from the outset that what I needed was some kind of query language rather than a computational one, and that it had to be scripted, not compiled. But I have yet to find a perfect solution:
METHOD 1 - SQL
I created tables that represent my objects and columns that represent their properties, and then manually typed all the required SQL SELECT statements into an item_rules table. What I did was first save the object into the database, and then do roughly this:
// Pseudocode: run every stored rule query against the object that was just saved
rules = SELECT * FROM item_rules
foreach (rule in rules)
{
    // rule.select_statement is e.g. "SELECT * FROM my_object WHERE A=5 AND B>10"
    count = SELECT COUNT(*) FROM (rule.select_statement) AS T1
    if (count > 0)    // a positive count means the object satisfies the rule
        itemlist.Add(rule.item_that_satisfy_rule)
}
What it does is take each rule in the item_rules table and run it against my object, which is now in the tables, e.g. SELECT * FROM my_object WHERE A=5 AND B>10. If the rule successfully picks it up, I get a positive count, and then I know I should include the corresponding rule item in my items list.
METHOD 2 - NCALC
Instead of storing the queries in SQL format, I found the NCalc open-source expression parsing library. NCalc takes a string expression and optional variables and computes a result. The string expressions can be stored as plain text on the filesystem.
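A rule evaluated with NCalc might look roughly like this (a sketch; the rule text follows the example from the question, and the parameter values are invented for illustration):

using NCalc;

// A compound rule stored as plain text, e.g. in a file or a database column.
var rule = new Expression("A = true and B < 100 and C = 7");

// Feed the object's property values in as parameters.
rule.Parameters["A"] = true;
rule.Parameters["B"] = 42;
rule.Parameters["C"] = 7;

bool applies = (bool)rule.Evaluate();   // true => the corresponding tax/item rule applies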
METHOD 3 - EXCEL
Excel is actually a very good piece of software for doing data lookups. You can create the formulas in Excel, feed data from your application into it, and let Excel run the formulas to give you the results. The advantage is that many people know how to use Excel, so different people can maintain it.
But as I said, none of these is perfect for me. I am just sharing, and hopefully we can get better recommendations.
If you are going to go with Jake's approach, you can use dynamic SQL too.

Reducing Parse Calls in Oracle

I am noticing the parse_calls are equal to the number of executions in our Oracle 11g database.
select parse_calls, executions
from v$sql order by parse_calls desc;
Running the above query gives the following result.
"PARSE_CALLS" "EXECUTIONS"
87480 87480
87475 87476
87044 87044
26662 26662
21870 21870
21870 21870
As far as I'm aware, this is a major performance drawback. All of these SQL statements are either stored procedures or use bind variables. I'm also reusing the command objects that call the stored procedures from C#.
How do I reduce the number of parse calls here?
Also, is there some way to distinguish between hard parses and soft parses?
EDIT:
As @DCookie suggested, I ran the following query on the database.
SELECT s2.name, SUM(s1.value)
FROM v$sesstat s1 join v$statname s2 on s1.statistic# = s2.statistic#
WHERE s2.name LIKE '%parse count%'
GROUP BY s2.name
ORDER BY 1,2;
The result is as below
"NAME" "SUM(S1.VALUE)"
"parse count (describe)" 0
"parse count (failures)" 29
"parse count (hard)" 258
"parse count (total)" 11471
So the number of hard parses seems to be very low compared to the total number of parses. Thanks to everyone for their responses :)
FINAL UPDATE:
The main cause of the parsing turned out to be that we had connection pooling turned off in the connection string. After turning connection pooling on, I was able to resolve the parsing problem completely.
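For reference, the change amounted to something like this in the connection string (the values are placeholders; the attribute is called Pooling in ODP.NET and most Oracle ADO.NET providers):

// Before: pooling explicitly disabled, so every call paid for a new session and fresh parses
// "User Id=app_user;Password=secret;Data Source=ORCL;Pooling=false"

// After: pooling enabled (the default), so sessions and their cached cursors are reused
var connectionString = "User Id=app_user;Password=secret;Data Source=ORCL;Pooling=true";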
Start with this:
SELECT s2.name, SUM(s1.value)
FROM v$sesstat s1 join v$statname s2 on s1.statistic# = s2.statistic#
WHERE s2.name LIKE '%parse count%'
GROUP BY s2.name
ORDER BY 1,2;
This will give you the number of hard parses and total parses. The parse_calls value in your query is total parses, hard and soft.
What does your SQL do? Not much cursor processing, mostly single statements? You are getting pretty much a 1-1 ratio of executions to parses, which, if they are soft parses, means you are not doing much cursor processing.
EDIT:
Unless you can modify your code to open and hang on to a cursor for each of your SQL statements, and reuse them as much as possible within a session, I don't think you can avoid the parses.
A parse call has to occur any time a new cursor is created, even if the statement is in the library cache. It is the parse call that checks the library cache. If the statement is found in the library cache, it is a soft parse.
@DCookie has given an answer to your question about checking hard vs. soft parse counts. I expect you will find that most parse calls are soft parses. Note that you shouldn't expect the counts returned from v$sysstat to be very close to the total parse calls from v$sql, since the former is a count since instance startup and the latter only shows the statements currently in the library cache.
The only way to avoid parse calls entirely is to keep a handle to an existing cursor and execute it when needed, binding in new values when appropriate. This sometimes happens automatically through cursor caching -- it's out of your explicit control, although I believe there are parameters you can change to affect it. In PL/SQL you can explicitly hold onto cursors that you create and manage using the DBMS_SQL package. I would expect that C# has corresponding capabilities.
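In C#, that roughly corresponds to creating the command once and only re-binding values between executions. A sketch with ODP.NET (the table and column names are illustrative; ODP.NET statement caching, controlled by the Statement Cache Size connection-string attribute, helps keep the underlying cursors open between executions):

using Oracle.ManagedDataAccess.Client;

// Create the connection and command once, and keep them for the whole session.
var conn = new OracleConnection(connectionString);
conn.Open();

var cmd = new OracleCommand("SELECT binary_data FROM my_table WHERE id = :id", conn);
cmd.BindByName = true;
var idParam = cmd.Parameters.Add("id", OracleDbType.Int32);

// Re-execute with new bind values instead of building a new command each time.
idParam.Value = 42;
using (var reader = cmd.ExecuteReader())
{
    // ... read results ...
}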
In any case, what you're looking at is probably not a problem. Just because counts seem high in no way implies that parsing is a bottleneck on your system.
First of all, you should check whether the SQL statements with those high parse counts are even in your control. When I did a modified version of your query on one of my systems:
select parse_calls, executions, parsing_schema_name,sql_text
FROM v$sql
ORDER BY parse_calls DESC;
I found that the statements with the highest number of parse calls were all recursive SQL parsed by SYS. This may not be the case for you depending on your usage, but it is something to check.

How do I compare two datasets for equality

I have two datasets, each with one data table pulled from a different source, and I need to know if there are any differences in the data contained in the data tables. I'm trying to avoid looping and comparing each individual record or column, although there may be no other way. All I need to know is whether there is a difference in the data; I do not need to know the details of any difference.
I have tried the code below, but it appears that DataSet.Merge does not update the row status, so DataSet.HasChanges() always returns false. Any help is appreciated:
var currentDataSet = GetSomeData();
var historicalDataSet = GetSomeHistoricalData();
historicalDataSet.Merge(currentDataSet);
if (historicalDataSet.HasChanges()) DoSomeStuff();
I don't know of any built-in support for this, and I wouldn't expect it either. So you'll have to do this yourself in some way.
The most obvious way would be a brute-force, table-by-table and row-by-row approach.
If you can rely on certain factors being the same, i.e. exactly the same naming, ordering of records, etc., then saving both as XML and comparing the results might be an efficient trick.
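That check could be as simple as something like the following (a sketch; it only says "equal or not", and it only works if schemas, column order, and row order really do match):

// Quick-and-dirty equality test via XML serialization; any difference in
// naming, ordering, or values makes the two strings differ.
bool identical = string.Equals(
    currentDataSet.GetXml(),
    historicalDataSet.GetXml(),
    StringComparison.Ordinal);

if (!identical) DoSomeStuff();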
