Ranking Database Entries


I have a database that contains:
user_id | category_id | liked_id | disliked_id
(thanks to stack overflow users for helping me get my database setup properly in the first place!!)
Last time I used food as an example but this time I'm going to use people.
The user is given 2 images (male vs male or female vs female) and he/she simply chooses which one he/she thinks is more attractive. The user repeats this process as long as he/she wishes. Each selection is entered into the database showing which person they liked and which they disliked (also a button would be available if you think the two are similar).
Now that I have my table full of entries, I'm trying to develop an algorithm that will take all of those "votes" and translate them into a ranked list of who the user finds most attractive (based on hundreds or maybe even thousands of ranking entries).
I've been at the drawing board for hours and can't seem to think of an effective way of doing this.
Any help would be appreciated.
P.S.: The idea is also to have this be a multi-user thing, where other users can see your "like" tables and also have globally averaged tables showing how all users in general rank things.

You posted your question under the C# tag. However, I want to give you a solution that is implemented in the database, which makes it more independent of your program.
What you probably want to do first is to get the number of times an image has been liked and disliked. This SQL statement should do that for you (if you are using a database supporting grouping sets it would probably be easier to write):
-- COALESCE turns the NULL produced by the LEFT JOIN (image never disliked) into 0
SELECT t1.liked_id AS id, t1.c_liked, COALESCE(t2.c_disliked, 0) AS c_disliked
FROM
(SELECT liked_id, COUNT(*) AS c_liked FROM table GROUP BY liked_id) t1
LEFT JOIN
(SELECT disliked_id, COUNT(*) AS c_disliked FROM table GROUP BY disliked_id) t2
ON
t1.liked_id = t2.disliked_id
Then it's up to you what you do with the numbers. In the outermost SELECT-statement, you could put a very complicated function, e.g. you could choose to weigh the dislikes less than the likes. To give you an idea of a possible very simple function:
-- 1.0 * forces decimal division; COALESCE treats "never disliked" as 0
SELECT t1.liked_id AS id,
(1.0 * (t1.c_liked - COALESCE(t2.c_disliked, 0)) / (t1.c_liked + COALESCE(t2.c_disliked, 0))) AS score
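This returns values between -1 and 1 (which you could normalize to [0, 1] if you like, but don't have to). Note that images that were never liked do not appear in t1, so they are missing from the result. You can then sort by the score, as in this example: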
-- 1.0 * forces decimal division; COALESCE treats "never disliked" as 0
SELECT t1.liked_id AS id,
(1.0 * (t1.c_liked - COALESCE(t2.c_disliked, 0)) / (t1.c_liked + COALESCE(t2.c_disliked, 0))) AS score
FROM
(SELECT liked_id, COUNT(*) AS c_liked FROM table GROUP BY liked_id) t1
LEFT JOIN
(SELECT disliked_id, COUNT(*) AS c_disliked FROM table GROUP BY disliked_id) t2
ON
t1.liked_id = t2.disliked_id
-- highest score (most liked) first
ORDER BY score DESC
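The same score can also be computed in application code, which makes it easy to include images that were only ever disliked. A minimal sketch in Python, assuming the votes have already been read from the table as (liked_id, disliked_id) pairs; the function name `rank_items` and the sample votes are made up for illustration:

```python
from collections import Counter

def rank_items(votes):
    """Rank item ids from (liked_id, disliked_id) pairs, best first."""
    likes = Counter(liked for liked, _ in votes)
    dislikes = Counter(disliked for _, disliked in votes)
    scores = {}
    # Include every item that appears on either side of a vote
    for item in set(likes) | set(dislikes):
        total = likes[item] + dislikes[item]
        # (likes - dislikes) / (likes + dislikes), always within [-1, 1]
        scores[item] = (likes[item] - dislikes[item]) / total
    # Sort by score descending; break ties by id for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample votes: (liked_id, disliked_id)
votes = [(1, 2), (1, 3), (2, 3), (3, 2), (1, 2)]
ranking = rank_items(votes)
for item, score in ranking:
    print(item, round(score, 3))
```

A ratio like this treats an item with one vote the same as an item with a hundred votes, so in practice you may want to require a minimum number of votes before ranking an item.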

Related

Create an ordered table

I am not sure if I titled this question correctly; here is my question. I have a table of Products, and these products have various details including buying price, selling price, initial stock quantity, number of items sold, and number of items remaining. I then have another table with sales information based on the location of the buyer; let's call it LocationSales. I need to create a table that will show the products and the location information like in the snapshot below.
I made the representation using Excel. I had already achieved this while working on a project earlier last year, but it involved a lot of hard-coding and stack-overflow exceptions. I would like to achieve this more effectively. This is a hypothetical representation of my actual problem. I have tried hierarchical tables before, using Telerik UI, which worked well but did not display the information quite as required by my superiors. I would really appreciate it if someone would point me in the right direction. I understand it's impossible to address the entire problem here, but I would appreciate someone including a link to a blog post, video or some literature that can assist me. PS. I have tried database views as well; they didn't work as required.
I am using ASP.NET C# MVC5 with Visual Studio 2013 and the database as SQL Server 2012
EDIT://Tables I am using
EDIT 2:///
As per my research so far, I have managed to obtain the following
SQL CODE://
select * from
(select
locationName
,productName as [PRODUCT]
,roadTransfer
,airTransfer
,initialQty
,soldQty
,remQty
from
Location as l inner join
Sales as s on l.locationID=s.locationID
inner join
Product as p on p.productID=s.productID
)as BaseData
pivot(
count(roadTransfer)
for locationName
in([Germany]
,[Kenya])
) as SummaryTable
order by Product asc
I used a pivot table. I am currently trying to get both the road and air transfer. I will try using a GROUP BY clause and see what I can do. I will post an update when I have the full solution.

So at this point (Edit 2) you have created a pivot for road transfer. Unfortunately you can't aggregate more than one value in a pivot, so the way round it is to do two pivots and join the results together. Your result set is therefore built from 3 query parts:
A list of products
Pivoted Road Transfer amounts by country
Pivoted Air Transfer amounts by country
If you left join all of them together in a single query on the product ID you will get close to your query, but the column order won't be quite right i.e. all the Road Transfers will come before the Air Transfers.
I have created a SQLFiddle demonstrating this
Select p.prodName AS Product,
p.initialQty AS [Initial Quantity],
p.sold AS [Quantity Sold],
p.remaining AS [Quantity Remaining],
AirT.[Air Transfer - Germany],
RoadT.[Road Transfer - Germany],
AirT.[Air Transfer - Kenya],
RoadT.[Road Transfer - Kenya]
FROM Products p
LEFT JOIN (
SELECT productID, [Germany] AS [Air Transfer - Germany], [Kenya] AS [Air Transfer - Kenya]
FROM (select productID, l.locName, qty
FROM sales s
INNER JOIN Locations l on l.locID = s.locID
where transfer = 'Air Transfer') as SourceTable
PIVOT (SUM(qty)
FOR locName IN ([Germany], [Kenya])
) As AirTransfers
) AirT on AirT.productID = p.ProductID
LEFT JOIN (
SELECT productID, [Germany] AS [Road Transfer - Germany], [Kenya] AS [Road Transfer - Kenya]
FROM (select productID, l.locName, qty
FROM sales s
INNER JOIN Locations l on l.locID = s.locID
where transfer = 'Road Transfer') as SourceTable
PIVOT (SUM(qty)
FOR locName IN ([Germany], [Kenya])
) As RoadTransfers
) RoadT on RoadT.productID = p.ProductID
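If the pivoting is easier to do in application code, the two-measure layout can be built in one pass over the unpivoted sales rows. A minimal sketch in Python, assuming each row is a (product, location, transfer, qty) tuple; the function name `pivot_sales` and the sample data are hypothetical and not taken from the SQLFiddle:

```python
from collections import defaultdict

def pivot_sales(sales):
    """Sum qty per product into one column per transfer/location pair."""
    table = defaultdict(lambda: defaultdict(int))
    for product, location, transfer, qty in sales:
        # Same column naming as the SQL aliases, e.g. "Air Transfer - Germany"
        column = f"{transfer} - {location}"
        table[product][column] += qty
    # Convert nested defaultdicts to plain dicts for the caller
    return {product: dict(cols) for product, cols in table.items()}

# Hypothetical sample rows: (product, location, transfer, qty)
sales = [
    ("Widget", "Germany", "Air Transfer", 5),
    ("Widget", "Germany", "Road Transfer", 2),
    ("Widget", "Kenya", "Air Transfer", 1),
    ("Widget", "Germany", "Air Transfer", 3),
    ("Gadget", "Kenya", "Road Transfer", 7),
]
pivoted = pivot_sales(sales)
print(pivoted)
```

Because the columns are keyed by name, the display order (for example Air and Road side by side per country) can be chosen freely when rendering.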

Database structure, Users + User Types where Users can be of more than one Type

I currently have a User table, tblUser and a User Types table, tblUserTypes.
The two are linked by means of a foreign key link in tblUser... fkUserTypeID.
Hence at the moment a user can be of only one type.
BUT, there are circumstances where the user can be of multiple types... say for example, a Customer as well as a Supplier.
The obvious solution to me is to create a new table in between tblUser and tblUserTypes, tblUser_UserTypes which is a bridging table:
[tblUser] ---< [tblUser_UserTypes] >--- [tblUserTypes]
BUT, I can see complexities arising from this... for example when exporting a list of users joined onto their user types, with a straight forward join I'm going to end up with multiple rows of those users. It could be possible to bring each user record back to a single row using a PIVOT query perhaps? (more below on this)
Importing Users into the system also seems problematic... I am currently using BCP (Bulk Copy Program) from a file to import users directly into the user table... the import file contains a single field "user type" which works in the existing model because each user can currently only be of one type. BUT, with multiple user types I can't see how a direct BCP into the user table could work.
Adding to the complexity is that user types are not currently fixed... the table tblUserTypes is dynamic ... part of the system is to allow creation of any number of user types. However, there are some types of users that I need to know about to be able to define business logic at a higher level.... e.g. "Only allow users of type=x in this area"... so it has been suggested that in the user types table there is a series of flags that define what type of type the user types are (e.g. IsCustomer, IsSupplier)
This is feeling like an overcomplicated mess and I'm losing sleep over how to move forward.
I would love to bring the user types back into the table tblUser and do away with the other two tables entirely... a series of checkboxes in the user table (e.g. IsCustomer, IsSupplier)... because that makes importing and exporting straightforward. BUT then the user types wouldn't be dynamic. Interestingly though the user types are not COMPLETELY dynamic... because as mentioned above there are some user types I need to know about when it comes to business logic.
Hmmm, should it be a hybrid of the two? Am I trying to squash two features into one? Perhaps I could have checkbox / boolean types in the user table for the types that correlate to business logic (e.g. IsCustomer, IsSupplier) and rename the context of the "User Types" to be "User Groups" or something like that.
A major concern for me is impact on importing, exporting and search results when considering a structure where a straight forward join is going to result in users being replicated... one row for each user type they belong to. I would have to do a PIVOT query to bring this back to one record per user, with a column for each user type, wouldn't I? A realistic example is a User table with 3 million records and importing 10,000 records at a time... or exporting 10,000 records at a time... or searching across those 3 million records to retrieve 3,000 matches and having that rendered on a web page in a paginated fashion where they can flick through the search result pages (I use ROWNUM in my search query to work with pagination, I don't return the whole lot every time).
This is my first question on Stack Overflow, I'm sorry if it's a bit convoluted or there are already answers listed... I tried to search but couldn't come up with examples handling the complexities of working with Users that can be of multiple Types.
Oh, in case it matters... this is a C# ASP.NET application working with SQL Server.
After thinking it through and reading responses I'm going to go all the way and use the bridging table... the requirements say that users can be of multiple types so that's how it will be. Consequences on existing code are dramatic, but better now than down the track.
I played around with the table structure, and the queries required to get data out in a flat structure are a bit fiddly and ultimately require dynamic SQL (because the list of user types is dynamic), which I'm not a fan of, but I can't see another way to do it.
In the examples below companies fetched are filtered by an 'Event ID' i.e. fkEventID
If there is a better way to do the 'flattening' I would be very appreciative of any help :-)
Straight forward join (multiple rows per company if they are of more than one type)
select * from tblCompany
left join tblCompany_CompanyType on fkCompanyID = pkCompanyID
left join tblCompanyType on fkCompanyTypeID = pkCompanyTypeID
where tblCompany.fkEventID = 1
Hard Coded pivot query (single rows per company if they are of more than one type, but the company types are not dynamic)
select * from (
select tblCompany.*,tblCompanyType.CompanyType from tblCompany left join
tblCompany_CompanyType on fkCompanyID = pkCompanyID
left join tblCompanyType on fkCompanyTypeID = pkCompanyTypeID
where tblCompany.fkEventID = 1
) AS sourcequery
Pivot (count(CompanyType) for CompanyType IN ([Customer],[Supplier],[Something Else])) as CompanyTypeName
Dynamic Pivot Query (single row per company and handles dynamic company types)
DECLARE @cols AS NVARCHAR(MAX)
DECLARE @sql AS NVARCHAR(MAX)
SET @cols = STUFF(
(SELECT N',' + QUOTENAME(CompanyType) AS [text()]
FROM (
select CompanyType from tblCompanyType
where fkEventID = 1
) AS Y
FOR XML PATH('')),
1, 1, N'');
SET @sql = N'SELECT * FROM (
select tblCompany.*,tblCompanyType.CompanyType from tblCompany left join tblCompany_CompanyType on fkCompanyID = pkCompanyID
left join tblCompanyType on fkCompanyTypeID = pkCompanyTypeID
where tblCompany.fkEventID = 1
) AS sourcequery
Pivot (count(CompanyType) for CompanyType IN (' + @cols + ')) as CompanyTypeName
order by pkCompanyID'
EXEC sp_executesql @sql;
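One possible alternative to dynamic SQL is to run the straightforward join and do the flattening in application code, where a dynamic list of types is not a problem. A minimal sketch in Python (the real application is C#, so this only illustrates the idea), assuming the join returns (company_id, company_type) rows and that every type in the rows also appears in the list of all types; `flatten_types` and the sample data are hypothetical:

```python
def flatten_types(rows, all_types):
    """Collapse (company_id, company_type) rows into one record per company.

    Each record maps every known type to a count (0 or 1 in practice),
    mirroring the columns the PIVOT query would produce.
    """
    flat = {}
    for company_id, company_type in rows:
        # First time we see a company, start with a 0 for every known type
        record = flat.setdefault(company_id, {t: 0 for t in all_types})
        # A NULL type comes from the LEFT JOIN: company has no types at all
        if company_type is not None:
            record[company_type] += 1
    return flat

# Hypothetical result of the straightforward join
rows = [(1, "Customer"), (1, "Supplier"), (2, "Customer"), (3, None)]
all_types = ["Customer", "Supplier", "Something Else"]
flat = flatten_types(rows, all_types)
print(flat)
```

The trade-off is that more rows cross the wire (one per company/type pair), so for paginated searches it may still be worth keeping the pivot in the database.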

You truly do have a many to many relationship between users and user types, and I suggest you go ahead and implement it that way.
If you need to see it flattened out in some instances, you can accommodate that with a view or stored procedure.
If you want to continue to import using BCP, you can always BCP into a staging table and then use a stored proc to fill out your 3 tables. It's probably safer to do it that way anyway.
Keeping to fully implementing the many to many relationship will give you the most flexibility in your app, and will prevent you from needing to continually modify your user table as you get new requirements for new security roles.
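The staging-table import described above could look roughly like this. This is only a sketch: it uses Python with an in-memory SQLite database as a stand-in for SQL Server, the bulk load is simulated with plain inserts, and the column names (`user_name`, `type_name`) and the staging layout (one row per user/type pair) are assumptions, not part of the original schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (user_name TEXT, user_type TEXT);
    CREATE TABLE tblUser (pkUserID INTEGER PRIMARY KEY, user_name TEXT UNIQUE);
    CREATE TABLE tblUserTypes (pkUserTypeID INTEGER PRIMARY KEY, type_name TEXT UNIQUE);
    CREATE TABLE tblUser_UserTypes (
        fkUserID INTEGER REFERENCES tblUser(pkUserID),
        fkUserTypeID INTEGER REFERENCES tblUserTypes(pkUserTypeID),
        PRIMARY KEY (fkUserID, fkUserTypeID)
    );
""")

# Stand-in for the bulk load: one staging row per (user, type) pair
conn.executemany("INSERT INTO staging VALUES (?, ?)", [
    ("alice", "Customer"), ("alice", "Supplier"), ("bob", "Customer"),
])

# Fan the staging rows out into the three real tables, then clear staging.
# INSERT OR IGNORE (SQLite syntax) skips users/types/links that already exist.
conn.executescript("""
    INSERT OR IGNORE INTO tblUser (user_name)
        SELECT DISTINCT user_name FROM staging;
    INSERT OR IGNORE INTO tblUserTypes (type_name)
        SELECT DISTINCT user_type FROM staging;
    INSERT OR IGNORE INTO tblUser_UserTypes (fkUserID, fkUserTypeID)
        SELECT u.pkUserID, t.pkUserTypeID
        FROM staging s
        JOIN tblUser u ON u.user_name = s.user_name
        JOIN tblUserTypes t ON t.type_name = s.user_type;
    DELETE FROM staging;
""")

links = conn.execute("""
    SELECT u.user_name, t.type_name
    FROM tblUser_UserTypes ut
    JOIN tblUser u ON u.pkUserID = ut.fkUserID
    JOIN tblUserTypes t ON t.pkUserTypeID = ut.fkUserTypeID
    ORDER BY u.user_name, t.type_name
""").fetchall()
print(links)
```

In SQL Server the second step would live in a stored procedure, and the staging table also gives you a place to validate the file before anything touches the real tables.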

limiting the records inserted in a sql table

I have a requirement to build an ASP.NET sign-up form which will allow students to register for a training. So far I have built a database in SQL Server with 3 tables: student, training & studenttraining
My question is, how can I stop the form from displaying the available dates once a particular training gets full, or maybe how can I check the tables to prevent the user from registering?

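-- Left Join keeps trainings that have no registrations yet (SeatsFilled = 0)
Select count(st.TrainingKey) as SeatsFilled, t.TrainingKey, t.TrainingDate
From Training t
Left Join StudentTraining st on t.TrainingKey = st.TrainingKey
-- TotalSeats must be in the Group By to be usable in the Having clause
Group By t.TrainingKey, t.TrainingDate, t.TotalSeats
Having count(st.TrainingKey) < t.TotalSeats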
TotalSeats is a column in the Training table that specifies how many seats the training provides. I assumed StudentTraining is a many-to-many bridge table between Students and Training.
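To see the behaviour on concrete data, here is a small sketch that runs an equivalent query against an in-memory SQLite database from Python. SQLite stands in for SQL Server here, and the `StudentKey` column and the sample rows are assumptions made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Training (
        TrainingKey INTEGER PRIMARY KEY, TrainingDate TEXT, TotalSeats INTEGER);
    CREATE TABLE StudentTraining (StudentKey INTEGER, TrainingKey INTEGER);
    INSERT INTO Training VALUES
        (1, '2024-01-10', 2), (2, '2024-02-10', 2), (3, '2024-03-10', 1);
    -- Training 1 is full, training 2 has one seat left, training 3 is empty
    INSERT INTO StudentTraining VALUES (100, 1), (101, 1), (100, 2);
""")

# Only trainings with at least one free seat are returned
available = conn.execute("""
    SELECT t.TrainingKey, t.TrainingDate, COUNT(st.TrainingKey) AS SeatsFilled
    FROM Training t
    LEFT JOIN StudentTraining st ON st.TrainingKey = t.TrainingKey
    GROUP BY t.TrainingKey, t.TrainingDate, t.TotalSeats
    HAVING COUNT(st.TrainingKey) < t.TotalSeats
    ORDER BY t.TrainingKey
""").fetchall()
print(available)
```

Filtering the form is not enough on its own: two students can submit at the same moment, so the same check should be repeated inside the transaction that inserts the registration.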

You'll need to establish what "full" is first. Then, you can do a simple
SELECT COUNT(id) FROM table to determine if the full amount is already reached.

I guess you could have a MaxTraining column in the training table, and when you get the data for your form, you can count the training entries in studenttraining. If the count equals MaxTraining, don't return that training entry, because it is already full.

Which approach is better to retrieve data from a database

I am unsure which of two approaches to choose.
Scenario
There are two tables, Table 1 and Table 2. Table 1 contains user data, for example first name, last name, etc.
Table 2 contains the cars each user has, with their descriptions, i.e. color, registration number, etc.
Now, if I want all the information for all users, which approach completes in the least time?
Approach 1
Query all rows in Table 1 and store them in a list, for example.
Then loop through the list and, for each user saved in the first step, query Table 2 for that user's data.
Approach 2
Query all rows and, while saving each row, fetch all its values from Table 2 and save them too.
In terms of processing I think the two might be the same, because the same number of records is processed in both approaches.
If there is a better idea, please let me know.

Your two approaches will have about the same performance (slow because of N+1 queries). It would be faster to do a single query like this:
select *
from T1
left join T2 on ...
order by T1.PrimaryKey
Your client app can then interpret the results and get all the data with a single query. An alternative would be:
select *, 1 as Tag
from T1
union all
select *, 2 as Tag
from T2
order by T1.PrimaryKey, Tag
This is just pseudo code but you could make it work.
The union-all query will have surprisingly good performance because sql server will do a "merge union" which works like a merge-join. This pattern also works for multi-level parent-child relationships, although not as well.
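The single-join approach, with the client regrouping the rows, might look like the following. It is a sketch in Python against an in-memory SQLite database; the table and column names (`users`, `cars`, `first_name`, `color`) are placeholders for Table 1 and Table 2, not names from the question:

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE cars (car_id INTEGER PRIMARY KEY, user_id INTEGER, color TEXT);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO cars VALUES (10, 1, 'red'), (11, 1, 'blue'), (12, 3, 'green');
""")

# One round trip instead of one query per user (the N+1 pattern)
rows = conn.execute("""
    SELECT u.user_id, u.first_name, c.color
    FROM users u
    LEFT JOIN cars c ON c.user_id = u.user_id
    ORDER BY u.user_id, c.car_id
""").fetchall()

# Rows arrive sorted by user, so consecutive rows belong to the same user.
# A user without cars yields a single row with color = None.
users_with_cars = {
    name: [color for _, _, color in group if color is not None]
    for (_, name), group in groupby(rows, key=lambda r: (r[0], r[1]))
}
print(users_with_cars)
```

The ORDER BY matters: `groupby` only merges adjacent rows, which is the same reason the queries above sort by the parent's primary key.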

Summarizing data by multiple columns

My boss is asking me to code a report that has the following components:
A pie chart of employee count by state
A pie chart of employee count by age bracket (10 year brackets)
A pie chart of employee length of service (5 year brackets)
A pie chart of employee Male/Female breakdown
A pie chart of employee count by salary band (computer generates brackets).
There may be others.
I know I can do this by writing 5 different SQL statements. However, it seems like this would generate 5 table scans for one report.
I could switch gears and do one table scan and analyse each record on the front end and increment counters and probably accomplish this with one-pass.
Which way would the collective wisdom at stackoverflow go on this?
Is there a way to accomplish this with the CUBE or ROLLUP clauses in T-SQL?

If your data is properly indexed, those reports may not require any table scans at all.
Really, for a problem like this you should code up the reports the simple way, and then see whether the performance meets the business requirements. If not, then look at optimisation strategies.

If you want 5 pie charts and need to summarize, then you need 5 SQL statements, since your WHERE clause is different for each.

You may have some performance gains by storing intermediate results in a table variable or temp table, then running more aggregation against it. Example with only two result sets:
SELECT COUNT(*) as cnt, State, AgeBracket
INTO #t
FROM YourTable
GROUP BY State, AgeBracket;
SELECT SUM(cnt) AS cnt, State FROM #t GROUP BY State;
SELECT SUM(cnt) AS cnt, AgeBracket FROM #t GROUP BY AgeBracket;
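The same "aggregate once, then roll up" idea also works on the front end if you prefer a single pass over the rows. A minimal sketch in Python with made-up employee records; the field names and the `age_bracket` helper are assumptions for illustration:

```python
from collections import Counter

def age_bracket(age):
    """Map an age to a 10-year bracket label, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Hypothetical rows as they might come back from one table scan
employees = [
    {"state": "NY", "age": 34},
    {"state": "NY", "age": 38},
    {"state": "CA", "age": 41},
    {"state": "CA", "age": 29},
    {"state": "NY", "age": 52},
]

# Step 1: one pass over the data, counting each (state, bracket) combination
combined = Counter((e["state"], age_bracket(e["age"])) for e in employees)

# Step 2: roll the small combined result up along each dimension
by_state = Counter()
by_age = Counter()
for (state, bracket), cnt in combined.items():
    by_state[state] += cnt
    by_age[bracket] += cnt

print(dict(by_state))
print(dict(by_age))
```

With five dimensions the combined table can grow large (one row per distinct combination), so this only pays off when the number of combinations is much smaller than the number of employees.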
