Summarizing data by multiple columns


My boss is asking me to code a report that has the following components:
A pie chart of employee count by state
A pie chart of employee count by age bracket (10 year brackets)
A pie chart of employee length of service (5 year brackets)
A pie chart of employee Male/Female breakdown
A pie chart of employee count by salary band (computer generates brackets).
There may be others.
I know I can do this by writing 5 different SQL statements. However, it seems like this would generate 5 table scans for one report.
I could switch gears and do one table scan, analyse each record on the front end, increment counters, and probably accomplish this in one pass.
Which way would the collective wisdom at Stack Overflow go on this?
Is there a way to accomplish this with the CUBE or ROLLUP clauses in T-SQL?
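The one-pass idea from the question could look roughly like the sketch below. It assumes each employee row has already been fetched as a dict with hypothetical keys state, age, years_of_service and gender; the salary band chart is left out because its brackets are generated elsewhere.

```python
from collections import Counter


def bracket(value, width):
    """Return a label such as '30-39' for the bracket that contains value."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def summarize(employees):
    """Fill the counters for every chart in a single pass over the rows."""
    charts = {
        "state": Counter(),
        "age": Counter(),
        "service": Counter(),
        "gender": Counter(),
    }
    for emp in employees:
        charts["state"][emp["state"]] += 1
        charts["age"][bracket(emp["age"], 10)] += 1                  # 10 year brackets
        charts["service"][bracket(emp["years_of_service"], 5)] += 1  # 5 year brackets
        charts["gender"][emp["gender"]] += 1
    return charts
```

Each Counter then maps a pie slice label to its employee count. Whether this beats several indexed GROUP BY queries depends on how many rows have to be shipped to the front end.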

If your data is properly indexed, those reports may not require any table scans at all.
Really, for a problem like this you should code up the reports the simple way, and then see whether the performance meets the business requirements. If not, then look at optimisation strategies.

If you want 5 pie charts and need to summarize, then you need 5 SQL statements, since the grouping (the GROUP BY clause) is different for each.

You may have some performance gains by storing intermediate results in a table variable or temp table, then running more aggregation against it. Example with only two result sets:
SELECT COUNT(*) as cnt, State, AgeBracket
INTO #t
FROM YourTable
GROUP BY State, AgeBracket;
SELECT SUM(cnt) AS cnt, State FROM #t GROUP BY State;
SELECT SUM(cnt) AS cnt, AgeBracket FROM #t GROUP BY AgeBracket;
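To try the pre-aggregation idea without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. SQLite has no SELECT ... INTO #t, so CREATE TEMP TABLE ... AS is used as the closest equivalent; the function name rollups and the in-memory database are assumptions for illustration only.

```python
import sqlite3


def rollups(rows):
    """rows: iterable of (State, AgeBracket) pairs, one per employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (State TEXT, AgeBracket TEXT)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)

    # One pass over the base table: counts per (State, AgeBracket) pair.
    conn.execute(
        "CREATE TEMP TABLE t AS "
        "SELECT COUNT(*) AS cnt, State, AgeBracket "
        "FROM YourTable GROUP BY State, AgeBracket"
    )

    # Further aggregation reads only the small intermediate table.
    by_state = dict(conn.execute("SELECT State, SUM(cnt) FROM t GROUP BY State"))
    by_age = dict(
        conn.execute("SELECT AgeBracket, SUM(cnt) FROM t GROUP BY AgeBracket")
    )
    conn.close()
    return by_state, by_age
```

The intermediate table has at most one row per distinct (State, AgeBracket) combination, which is why the second and third queries are cheap.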

Related

Ranking Database Entries

I have a database that contains:
user_id | category_id | liked_id | disliked_id
(thanks to stack overflow users for helping me get my database setup properly in the first place!!)
Last time I used food as an example but this time I'm going to use people.
The user is given 2 images (male vs male or female vs female) and he/she simply chooses which one he/she thinks is more attractive. The user repeats this process as long as he/she wishes. Each selection is entered into the database showing which person they liked and which they disliked (also a button would be available if you think the two are similar).
Now that I have my table full of entries, I'm trying to develop an algorithm that will take all of those "votes" and translate it into a ranked list of who the user finds most attractive (based on hundreds or maybe even thousands of ranking entries).
I've been at the drawing board for hours and can't seem to think of an effective way of doing this.
Any help would be appreciated.
P.S.: The idea is also to have this be a multi-user thing, where other users can see your "like" tables and also have globally averaged tables showing how all users in general rank things.

So you posted your question in the C# group. However, I want to give you a solution that is implemented in the database, making it more independent of your program.
What you probably want to do first is to get the number of times an image has been liked and disliked. This SQL statement should do that for you (if you are using a database supporting grouping sets it would probably be easier to write):
SELECT t1.liked_id as id, t1.c_liked, t2.c_disliked
FROM
(SELECT liked_id, COUNT(*) as c_liked FROM table GROUP BY liked_id) t1
LEFT JOIN
(SELECT disliked_id, COUNT(*) c_disliked FROM table GROUP BY disliked_id) t2
ON
t1.liked_id = t2.disliked_id
Then it's up to you what you do with the numbers. In the outermost SELECT statement you could put a more complicated function, e.g. one that weighs the dislikes less than the likes. To give you an idea of a very simple function (COALESCE treats a missing dislike count as 0, and multiplying by 1.0 avoids integer division):
SELECT t1.liked_id as id,
(t1.c_liked - COALESCE(t2.c_disliked, 0)) * 1.0 / (t1.c_liked + COALESCE(t2.c_disliked, 0)) as score
This returns values in [-1, 1] (which you could normalize to [0, 1] if you like, but don't have to), which you can then sort, highest score first, as in this example. Note that an image that was never liked does not appear in t1 and is therefore missing from the result.
SELECT t1.liked_id as id,
(t1.c_liked - COALESCE(t2.c_disliked, 0)) * 1.0 / (t1.c_liked + COALESCE(t2.c_disliked, 0)) as score
FROM
(SELECT liked_id, COUNT(*) as c_liked FROM table GROUP BY liked_id) t1
LEFT JOIN
(SELECT disliked_id, COUNT(*) c_disliked FROM table GROUP BY disliked_id) t2
ON
t1.liked_id = t2.disliked_id
ORDER BY score DESC
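For comparison, the same net score, (likes - dislikes) / (likes + dislikes), can be computed in application code. The sketch below assumes the votes have been loaded as (liked_id, disliked_id) pairs; the function name rank is hypothetical. Unlike a join that starts from the liked side, it also covers images that were only ever disliked.

```python
from collections import Counter


def rank(votes):
    """votes: iterable of (liked_id, disliked_id) pairs.

    Returns (id, score) tuples, best first, with scores in [-1, 1].
    """
    liked, disliked = Counter(), Counter()
    for liked_id, disliked_id in votes:
        liked[liked_id] += 1
        disliked[disliked_id] += 1

    scores = {}
    for item in set(liked) | set(disliked):
        total = liked[item] + disliked[item]  # at least 1 for every item seen
        scores[item] = (liked[item] - disliked[item]) / total

    # Highest score first; ties are broken by id to keep the order stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For a per-user ranking, pass only that user's votes; for the global table, pass everyone's.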

Database structure, Users + User Types where Users can be of more than one Type

I currently have a User table, tblUser and a User Types table, tblUserTypes.
The two are linked by means of a foreign key link in tblUser... fkUserTypeID.
Hence at the moment a user can be of only one type.
BUT, there are circumstances where the user can be of multiple types... say for example, a Customer as well as a Supplier.
The obvious solution to me is to create a new table in between tblUser and tblUserTypes, tblUser_UserTypes which is a bridging table:
[tblUser] ---< [tblUser_UserTypes] >--- [tblUserTypes]
BUT, I can see complexities arising from this... for example when exporting a list of users joined onto their user types, with a straightforward join I'm going to end up with multiple rows for those users. It could be possible to bring each user record back to a single row using a PIVOT query perhaps? (more below on this)
Importing Users into the system also seems problematic... I am currently using BCP (the bulk copy program) from a file to import users directly into the user table... the import file contains a single field "user type" which works in the existing model because each user can currently only be of one type. BUT, with multiple user types I can't see how a BCP directly into the user table could work.
Adding to the complexity is that user types are not currently fixed... the table tblUserTypes is dynamic ... part of the system is to allow creation of any number of user types. However, there are some types of users that I need to know about to be able to define business logic at a higher level.... e.g. "Only allow users of type=x in this area"... so it has been suggested that in the user types table there is a series of flags that define what type of type the user types are (e.g. IsCustomer, IsSupplier)
This is feeling like an overcomplicated mess and I'm losing sleep over how to move forward.
I would love to bring the user types back into the table tblUser and do away with the other two tables entirely... a series of checkboxes in the user table (e.g. IsCustomer, IsSupplier)... because that makes importing and exporting straightforward. BUT then the user types wouldn't be dynamic. Interestingly though the user types are not COMPLETELY dynamic... because as mentioned above there are some user types I need to know about when it comes to business logic.
Hmmm, should it be a hybrid of the two? Am I trying to squash two features into one? Perhaps I could have checkbox / boolean types in the user table for the types that correlate to business logic (e.g. IsCustomer, IsSupplier) and rename the context of the "User Types" to be "User Groups" or something like that.
A major concern for me is the impact on importing, exporting and search results when considering a structure where a straightforward join is going to result in users being replicated... one row for each user type they belong to. I would have to do a PIVOT query to bring this back to one record per user, with a column for each user type, wouldn't I? A realistic example is a User table with 3 million records and importing 10,000 records at a time... or exporting 10,000 records at a time... or searching across those 3 million records to retrieve 3,000 matches and having that rendered on a web page in a paginated fashion where they can flick through the search result pages (I use ROWNUM in my search query to work with pagination, I don't return the whole lot every time).
This is my first question on Stack Overflow, I'm sorry if it's a bit convoluted or there are already answers listed... I tried to search but couldn't come up with examples handling the complexities of working with Users that can be of multiple Types.
Oh, in case it matters... this is a C# ASP.NET application working with SQL Server.
After thinking it through and reading responses I'm going to go all the way and use the bridging table... the requirements say that users can be of multiple types so that's how it will be. Consequences on existing code are dramatic, but better now than down the track.
I played around with the table structure, and the queries required to get data out in a flat structure are a bit fiddly and ultimately require dynamic SQL (because the list of user types is dynamic), which I'm not a fan of, but I can't see another way to do it.
In the examples below companies fetched are filtered by an 'Event ID' i.e. fkEventID
If there is a better way to do the 'flattening' I would be very appreciative of any help :-)
Straightforward join (multiple rows per company if they are of more than one type)
select * from tblCompany
left join tblCompany_CompanyType on fkCompanyID = pkCompanyID
left join tblCompanyType on fkCompanyTypeID = pkCompanyTypeID
where tblCompany.fkEventID = 1
Hard Coded pivot query (single rows per company if they are of more than one type, but the company types are not dynamic)
select * from (
select tblCompany.*,tblCompanyType.CompanyType from tblCompany left join
tblCompany_CompanyType on fkCompanyID = pkCompanyID
left join tblCompanyType on fkCompanyTypeID = pkCompanyTypeID
where tblCompany.fkEventID = 1
) AS sourcequery
Pivot (count(CompanyType) for CompanyType IN ([Customer],[Supplier],[Something Else])) as CompanyTypeName
Dynamic Pivot Query (single row per company and handles dynamic company types)
DECLARE @cols AS NVARCHAR(MAX)
DECLARE @sql AS NVARCHAR(MAX)
SET @cols = STUFF(
(SELECT N',' + QUOTENAME(CompanyType) AS [text()]
FROM (
select CompanyType from tblCompanyType
where fkEventID = 1
) AS Y
FOR XML PATH('')),
1, 1, N'');
SET @sql = N'SELECT * FROM (
select tblCompany.*,tblCompanyType.CompanyType from tblCompany left join tblCompany_CompanyType on fkCompanyID = pkCompanyID
left join tblCompanyType on fkCompanyTypeID = pkCompanyTypeID
where tblCompany.fkEventID = 1
) AS sourcequery
Pivot (count(CompanyType) for CompanyType IN (' + @cols + ')) as CompanyTypeName
order by pkCompanyID'
EXEC sp_executesql @sql;

You truly do have a many to many relationship between users and user types, and I suggest you go ahead and implement it that way.
If you have a need to see it flattened out in some instances, you can accommodate that with a view or stored procedure.
If you want to continue to import using BCP, you can always BCP into a staging table and then use a stored proc to fill out your 3 tables. It's probably safer to do it that way anyway.
Keeping to fully implementing the many to many relationship will give you the most flexibility in your app, and will prevent you from needing to continually modify your user table as you get new requirements for new security roles.
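A rough sketch of the staging-table idea, using Python's sqlite3 module in place of SQL Server and BCP. It assumes the import file yields one staging row per (user, type) pair and that users are matched by name; the staging table and the column names pkUserID, UserName, pkUserTypeID, UserType, fkUserID and fkUserTypeID are hypothetical and only follow the naming style in the question.

```python
import sqlite3


def make_schema(conn):
    """Create the three tables plus a staging table (hypothetical columns)."""
    conn.executescript("""
        CREATE TABLE tblUser (
            pkUserID INTEGER PRIMARY KEY, UserName TEXT UNIQUE NOT NULL);
        CREATE TABLE tblUserTypes (
            pkUserTypeID INTEGER PRIMARY KEY, UserType TEXT UNIQUE NOT NULL);
        CREATE TABLE tblUser_UserTypes (
            fkUserID INTEGER NOT NULL REFERENCES tblUser (pkUserID),
            fkUserTypeID INTEGER NOT NULL REFERENCES tblUserTypes (pkUserTypeID),
            PRIMARY KEY (fkUserID, fkUserTypeID));
        CREATE TABLE staging (UserName TEXT NOT NULL, UserType TEXT NOT NULL);
    """)


def load_from_staging(conn):
    """Distribute bulk-loaded staging rows over the three real tables."""
    with conn:  # one transaction: all three tables are filled, or none
        conn.execute(
            "INSERT INTO tblUserTypes (UserType) "
            "SELECT DISTINCT UserType FROM staging "
            "WHERE UserType NOT IN (SELECT UserType FROM tblUserTypes)")
        conn.execute(
            "INSERT INTO tblUser (UserName) "
            "SELECT DISTINCT UserName FROM staging "
            "WHERE UserName NOT IN (SELECT UserName FROM tblUser)")
        # OR IGNORE skips links that already exist in the bridging table.
        conn.execute(
            "INSERT OR IGNORE INTO tblUser_UserTypes (fkUserID, fkUserTypeID) "
            "SELECT u.pkUserID, t.pkUserTypeID FROM staging s "
            "JOIN tblUser u ON u.UserName = s.UserName "
            "JOIN tblUserTypes t ON t.UserType = s.UserType")
        conn.execute("DELETE FROM staging")
```

In SQL Server the three INSERT statements would live in the stored procedure, and the staging table would be the BCP target.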

limiting the records inserted in a sql table

I have a requirement to build an ASP.NET sign-up form which will allow students to register for a training. So far I have built a database in SQL Server with 3 tables: student, training & studenttraining
My question is: how can I stop the form from displaying the available dates once a particular training gets full, or maybe how can I check the tables to prevent the user from registering?

Select count(st.TrainingKey) as SeatsFilled, t.TrainingKey, t.TrainingDate
From Training t
Left Join StudentTraining st on t.TrainingKey = st.TrainingKey
Group By t.TrainingKey, t.TrainingDate, t.TotalSeats
Having count(st.TrainingKey) < t.TotalSeats
TotalSeats is a column in the Training table that specifies how many seats the training provides. I assumed StudentTraining is a many-to-many bridge table between Students and Training. The LEFT JOIN keeps trainings that have no registrations yet, and TotalSeats is listed in the GROUP BY so that the HAVING clause can reference it.
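A small runnable sketch of both halves of the question, using Python's sqlite3 module in place of SQL Server: listing only trainings with free seats, and refusing a registration once a training is full. The StudentKey column, the demo data, and the function names are assumptions for illustration.

```python
import sqlite3

AVAILABLE_SQL = """
SELECT t.TrainingKey, t.TrainingDate,
       t.TotalSeats - COUNT(st.StudentKey) AS SeatsLeft
FROM Training t
LEFT JOIN StudentTraining st ON st.TrainingKey = t.TrainingKey
GROUP BY t.TrainingKey, t.TrainingDate, t.TotalSeats
HAVING COUNT(st.StudentKey) < t.TotalSeats
ORDER BY t.TrainingDate
"""


def make_demo_db():
    """In-memory database with two trainings (2 seats and 1 seat)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Training (
            TrainingKey INTEGER PRIMARY KEY, TrainingDate TEXT, TotalSeats INTEGER);
        CREATE TABLE StudentTraining (
            StudentKey INTEGER, TrainingKey INTEGER,
            PRIMARY KEY (StudentKey, TrainingKey));
        INSERT INTO Training VALUES (1, '2024-01-10', 2), (2, '2024-02-10', 1);
    """)
    return conn


def available_trainings(conn):
    """Trainings that still have at least one free seat, for the form."""
    return conn.execute(AVAILABLE_SQL).fetchall()


def register(conn, student_key, training_key):
    """Insert the registration only while seats remain; True if inserted."""
    with conn:
        cur = conn.execute(
            "INSERT INTO StudentTraining (StudentKey, TrainingKey) "
            "SELECT ?, ? WHERE "
            "(SELECT COUNT(*) FROM StudentTraining WHERE TrainingKey = ?) < "
            "(SELECT TotalSeats FROM Training WHERE TrainingKey = ?)",
            (student_key, training_key, training_key, training_key))
    return cur.rowcount == 1
```

The check inside register matters even when the form hides full trainings, because two students can load the form before either submits. On SQL Server the same guard would need suitable transaction isolation or locking to be safe under concurrency.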

You'll need to establish what "full" is first. Then, you can do a simple
SELECT COUNT(id) FROM table to determine if the full amount is already reached.

I guess you could have a MaxTraining column in the training table. When you get the data for your form, you can count the entries for each training in studenttraining, and if the count equals MaxTraining, don't return that training, because it's already full.

Which approach is better to retrieve data from a database

I am unsure which of two approaches to choose.
Scenario
There are two tables, Table 1 and Table 2. Table 1 contains users' data, for example first name, last name, etc.
Table 2 contains the cars each user has, with their descriptions, i.e. color, registration no., etc.
Now, if I want all the information for all users, which approach completes in the minimum time?
Approach 1.
Query for all rows in Table 1 and store them all in a list, for example.
Then loop through the list and, for each user saved in the first step, query Table 2 for that user's data.
Approach 2
Query for all rows and, while saving each row, get all of its values from Table 2 and save them too.
In terms of system processing, I think they might be the same, because the same number of records is processed in both approaches.
If there is any better idea, please let me know.

Your two approaches will have about the same performance (slow because of N+1 queries). It would be faster to do a single query like this:
select *
from T1
left join T2 on ...
order by T1.PrimaryKey
Your client app can then interpret the results, having fetched all the data in a single query. An alternative would be:
select *, 1 as Tag
from T1
union all
select *, 2 as Tag
from T2
order by T1.PrimaryKey, Tag
This is just pseudocode, but you could make it work.
The union-all query will have surprisingly good performance because SQL Server will do a "merge union", which works like a merge join. This pattern also works for multi-level parent-child relationships, although not as well.
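Here is a minimal sketch of how a client might interpret the single LEFT JOIN result, using Python's sqlite3 module. The table and column names (Users, Cars, UserId, FirstName, RegistrationNo, Color) are assumptions based on the question's description, not a real schema.

```python
import sqlite3
from itertools import groupby


def users_with_cars(conn):
    """Fetch every user and their cars with one query instead of N+1."""
    rows = conn.execute(
        "SELECT u.UserId, u.FirstName, c.RegistrationNo, c.Color "
        "FROM Users u LEFT JOIN Cars c ON c.UserId = u.UserId "
        "ORDER BY u.UserId, c.RegistrationNo"
    )
    result = []
    # Rows arrive sorted by user, so consecutive rows belong to the same user.
    for (user_id, first_name), group in groupby(rows, key=lambda r: (r[0], r[1])):
        # A user without cars yields one row whose car columns are NULL.
        cars = [(reg, color) for _, _, reg, color in group if reg is not None]
        result.append({"id": user_id, "name": first_name, "cars": cars})
    return result
```

The ORDER BY is what makes the grouping a simple streaming pass; without it the client would have to collect rows into a dictionary keyed by user.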

Best way of acquiring information from several database tables

I have a medical database that keeps different types of data on patients: examinations, lab results, x-rays... each type of record exists in a separate table. I need to present this data on one table to show the patient's history with a particular clinic.
My question: what is the best way to do it? Should I do a SELECT from each table where the patient ID matches, order them by date, and then keep them in some artificial list-like structure (ordered by date)? Or is there a better way of doing this?
I'm using WPF and SQL Server 2008 for this app.

As others have said, JOIN is the way you'd normally do this. However, if there are multiple rows in one table for a patient then there's a chance you'll get data in some columns repeated across multiple rows, which often you don't want. In that case it's sometimes easier to use UNION or UNION ALL.
Let's say you have two tables, examinations and xrays, each with a PatientID, a Date and some extra details. You could combine them like this:
SELECT PatientID, ExamDate [Date], ExamResults [Details]
FROM examinations
WHERE PatientID = @patient
UNION ALL
SELECT PatientID, XrayDate [Date], XrayComments [Details]
FROM xrays
WHERE PatientID = @patient
Now you have one big result set with PatientID, Date and Details columns. I've found this handy for "merging" multiple tables with similar, but not identical, data.
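A runnable sketch of the same UNION ALL pattern with Python's sqlite3 module, sorted by date so the result reads as a patient history. The Source column and the EventDate alias are additions for illustration (an assumption, not part of the original query); they let the UI tell which table each row came from.

```python
import sqlite3

HISTORY_SQL = """
SELECT PatientID, ExamDate AS EventDate, 'examination' AS Source,
       ExamResults AS Details
FROM examinations
WHERE PatientID = :patient
UNION ALL
SELECT PatientID, XrayDate AS EventDate, 'xray' AS Source,
       XrayComments AS Details
FROM xrays
WHERE PatientID = :patient
ORDER BY EventDate
"""


def patient_history(conn, patient_id):
    """Return one date-ordered list of rows drawn from both tables."""
    return conn.execute(HISTORY_SQL, {"patient": patient_id}).fetchall()
```

Each additional record type (lab results, for example) would be one more SELECT in the union, as long as it can be mapped onto the same four columns.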

If this is something you're going to be doing often, I'd be tempted to create a denormalized view on all of patient data (join the appropriate tables) and index the appropriate column(s) in the view. Then use the appropriate method (stored procedure, etc) to retrieve the data for a passed-in patientID.

Use a JOIN to get data from several tables.

You can use a join (can't remember which type exactly) to get all the records from each table for a specific patient. The way this works depends on your database design.

I'd do it with separate SELECT statements, since a simple JOIN probably won't do due to the fact that some tables might have more than 1 row for the patient.
So I would retrieve multiple result sets into a simple DataSet, add a DataRelation, cache the object and query it down the line (by date, by exam type, subsets, ...)
The main point is that you have all the data handy, even cached if needed, in a structure which is easily queried and filtered.
