I need to bring some role-based content (several pieces combined into one object) from the database to a service layer. Access to each content piece depends on the role the user has. If the user's role has no access to a piece, that piece will be empty. This is not a huge application, just a web service method exposing some content based on roles.
I could do this in three ways.
1) Based on the user role, make a separate database call for each piece of content.
Pros - Roles are managed in the code, which helps the business logic (?) stay in the code. Only brings back the data that is needed.
Cons - Multiple db calls. Code needs to be modified when a new role is added (unless some really complicated business logic is used).
2) Make a single db call and bring back all the content pieces as separate result sets. Loop through the sets and pick out pieces based on the user role.
Pros - Single db call. Roles are managed within the code.
Cons - Code needs to be modified for new roles. Extra data is brought back even though it may not be needed; those unneeded queries can add a couple of seconds.
3) Send the roles to the db and get each piece based on the role access.
Pros - Single db call. Only brings back what is needed. No need to change code for new roles, as only the stored procedure needs to change.
Cons - Business logic in the database?
It looks to me that #3 > #2 > #1 (overloading > to mean "better than").
Does anyone have any insights into which approach may be better?
Update - based on some comments, some more details are below.
The user role is obtained from another system. #3 would ideally pass it to the db; in crude terms, the db would return data as: if user_role = "admin", get all pieces; if user_role = "editor", get content pieces 1, 3 and 55. Again, this is not a big application where role management is done in the db. It's a web service method to expose some data for several companies.
We obviously cannot use this model for managing roles across an application. But for method-level access control, as in this scenario, I believe #3 is the best way. Since the roles come from a different system than the one where the content resides, the logic controlling access to the different content pieces has to live somewhere, and the database looks like the right place for a maintainable, scalable, low-hassle solution in this particular scenario. Perhaps even create a lookup table in the content db to hold roles and content-piece access, to give a sense of separating "data" from "logic", rather than having a UDF perform the logic.
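In rough T-SQL, that lookup-table idea might look something like this (all table, column and procedure names here are invented for illustration):

-- Which role may see which content piece; an "admin" role would simply
-- get a row for every piece.
CREATE TABLE RoleContentAccess (
    RoleName       varchar(50) NOT NULL,
    ContentPieceID int         NOT NULL,
    PRIMARY KEY (RoleName, ContentPieceID)
);

INSERT INTO RoleContentAccess VALUES ('editor', 1), ('editor', 3), ('editor', 55);
GO

-- The service passes in the role it received from the other system.
CREATE PROCEDURE GetContentForRole @RoleName varchar(50)
AS
    SELECT cp.ContentPieceID, cp.Body
    FROM ContentPiece cp
    JOIN RoleContentAccess rca ON rca.ContentPieceID = cp.ContentPieceID
    WHERE rca.RoleName = @RoleName;
GO

Adding a new role is then just new rows in RoleContentAccess; neither the code nor the procedure changes.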
If no one can think of a valid case against #3, I think I'll go ahead with it.
I would always pick option 3 and enforce it in the database itself.
Security is best handled at the point closest to the actual data itself, for a lot of reasons. Look at it this way: it is more common for an additional application in a different language to be added than it is to toss out a database model. When that happens, all of your role-handling code has to be duplicated.
Or let's say the application is completely bypassed during a hack. The database should still enforce its security.
Finally, although people like separating "business logic" from their data, the reality is that most data has no meaning without that logic. Furthermore, "security logic" isn't the same thing as regular "business logic" anyway: it is there to protect you and your clients. But that's my $0.02.
Looking at your other options:
2) You are sending too much data back to the client. This is both a security and a performance no-no. What if your app isn't the only one making the data request? What if a slight bug in your app shows too much to the user?
1 and 2) Both require a redeploy for even slight logic changes (such as fixing the mythical bug above). That might not be desirable. Personally, I prefer making minor adjustments to stored procedures over redeploying code. On a sizeable enough project it can be difficult to know exactly what is being deployed, and a redeploy generally carries a higher potential for problems.
UPDATE
Based on your additional info, I still suggest sticking with #3.
It depends on how your database is structured.
If you can manage access rights in the database, you might have a table design along the lines of
Content (ContentID, Content)
Role (RoleID, RoleName)
ContentAccess (ContentID, RoleID)
Then passing in the role as a query parameter is absolutely not "business logic in the database". You would simply write a query joining Content and ContentAccess, retrieving those rows in Content that have a matching ContentAccess record for the current user's role.
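A minimal version of that query, using the tables sketched above (the parameter is whatever role your application passes in):

SELECT c.ContentID, c.Content
FROM Content c
JOIN ContentAccess ca ON ca.ContentID = c.ContentID
JOIN Role r ON r.RoleID = ca.RoleID
WHERE r.RoleName = @UserRole;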
If, instead, your application uses code to determine whether a user is allowed to see a specific piece of content, that doesn't work. The crudest example of this would be "if user_role = "admin", get all content; if user_role = "editor", get items 1, 3 and 55". I'd argue that this is not really a maintainable design, but you say the application is not really that big to begin with, so it might not be a huge deal.
Ideally, I'd want to refactor the application to "manage access rights as data, not code", because you do mention maintainability as a requirement.
If you don't want to do that, option 1 is the way to go; you could perhaps refine it to an "in" query rather than multiple separate queries. So you run whatever logic determines which content a user role can see, and then execute a query along the lines of "select * from content where content_id in (1, 3, 55)".
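A rough C# sketch of that refinement, assuming System.Data.SqlClient (the class, method and parameter names are made up):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public static class ContentQueries
{
    // The role logic has already run in code and produced the visible ids;
    // one parameterised IN query then fetches exactly those rows.
    public static SqlCommand BuildContentQuery(SqlConnection conn, IList<int> visibleIds)
    {
        var names = visibleIds.Select((id, i) => "@p" + i).ToArray();
        var cmd = new SqlCommand(
            "select * from content where content_id in (" + string.Join(", ", names) + ")",
            conn);
        for (int i = 0; i < visibleIds.Count; i++)
            cmd.Parameters.AddWithValue(names[i], visibleIds[i]);
        return cmd;
    }
}

Parameterising each id (rather than concatenating the list into the SQL) keeps the query injection-safe.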
Different people have different feelings about stored procedures; my view is to avoid them unless you have a proven, measurable performance requirement that can only be met by a stored procedure. They are hard to test, hard to debug, it's relatively rare to find developers who are great at both Transact-SQL and C# (or whatever), and version control etc. is usually a pain.
How many new roles per year do you anticipate? If few roles, then stick everything in code if it makes the code simpler. If a lot, use option #3.
If you really dislike multiple calls, you can always do a SELECT ... UNION or defer the retrieval to a simple stored procedure.
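For example, something along these lines (table names invented for illustration):

-- One round trip for several pieces; each SELECT must return the same shape.
SELECT 'header' AS Piece, Body FROM HeaderContent WHERE CompanyID = @CompanyID
UNION ALL
SELECT 'body', Body FROM BodyContent WHERE CompanyID = @CompanyID
UNION ALL
SELECT 'footer', Body FROM FooterContent WHERE CompanyID = @CompanyID;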
Alternatively, consider just grabbing one of the myriad RBAC frameworks and letting it take care of the problem.
Coming from a SQL background: when I want to limit access to data based on certain attributes of a user, I can create a view and use that view as a filter to limit what data a user sees, based on the criteria in the view. This approach relies on relationships, and so far it has worked for me. Looking at NoSQL and the shift in strategy and concepts, I am confused about how to implement this, considering the nature of NoSQL. What is the NoSQL approach to a problem like this, where users are only privy to certain rows based on their user type? For example, say an administrator can see all of the records for a particular group, while a generic user can only see their own records and certain group-level items (group photos, group messaging, etc.) that are public within the group. I am really trying not to think in terms of the SQL approach to this problem, but I am new to NoSQL, so that has been a challenge.
NoSQL databases are conceptually different from relational databases in many respects, and authorization and security in general are not their primary focus. But most of them have evolved in that area and offer fine-grained authorization; basically, it depends on the particular database.
For example, Cassandra has column-level permissions planned (https://issues.apache.org/jira/browse/CASSANDRA-12859), and HBase has cell-level permissions (https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_hbase_authorization.html). On the other hand, MongoDB is generally schemaless and has a different (more complex) document-oriented data model, which makes low-level access control hard to implement; MongoDB does, however, have views.
So, if the DBMS you are using doesn't have built-in authorization at the expected level, it has to be implemented inside the application that interacts with the db (if there is more than one application, things can get tricky, and some usage rules have to be established). Using denormalized models is a common approach: different roles/groups interact with different tables/collections which contain only the data that role/group can see (basically a simulation of RDBMS views). This denormalized approach usually takes more space and requires keeping the copies in sync. If the DBMS supports projection, a subset of columns/fields can be presented to different roles/groups (that way, at least some of the processing is done on the db side).
I hope this helps, though it's a late answer.
I'm not sure if what I'm attempting to do is simply incorrect/impossible, or if there is an easier way and I'm missing the point.
I'm using SQL Server 2012
What I would like to do is have a table that can store rows with values relating to stored properties in another table: basically, key/value pairs. The thing is, I would like to determine which keys can be used by which entities.
For example,
I would like one table listing various companies; another storing 'files' created for each company (used to hold historical information); another listing the various production departments (stages in production); another listing production figures (KGs, Units, etc.); and one storing the actual production captured against these figures for each month. There are also tables in place to show which production departments can use which production figures, as well as which company has which production departments.
Some of the companies have the same stages in production as well as additional stages that the others don't.
These figures are captured on a monthly basis ONLY, so I have a table describing all the months of a year.
Each production department may have similar types of recordings to be captured, though they don't all have the same production readings.
Here's a link to a graphical representation of the table layouts:
http://tinypic.com/r/30a51mx/8
My end result is to auto-populate/update the table with newly added figures as the user enters this section of the program (by passing in the FileID), and to allow the user to edit this using a DataGridView (or at least select a value to be edited from the DataGridView).
I will then need to write reports that pivot on this information.
Any help or suggestions would be greatly appreciated.
Thanks
For an effective DB design, it is very important to weigh two major requirements against each other:
Should the DB design favour ease of use from the application's point of view, or efficient storage?
Which one wins is by and large decided by the following factors:
How much data are we going to store? We need some idea of the cost of storage, factoring in redundancy; a well-normalised DB reduces redundancy.
Your DB is normalised very well, but is that really needed? The cost of storage is quite low these days, so a slightly more redundant design should be OK, unless of course you plan to use the Standard edition of SQL Server, which has its own limitations in terms of DB size.
Is data retrieval and update slow or fast? The more normalised the DB, the more JOINs are expected. In your case, if you want to return values for multiple properties, say n, in a single result, you'd need n joins on the ProductionProperty table, which will reduce query performance and hence slow the user experience (see the sketch after these points). So if your UI is not very demanding, and your users can live with a small lag, go ahead with a normalised DB design.
ORM mismatch: the relational database model and the object model (assuming your programming language follows OOP concepts) usually mismatch, and they will mismatch heavily in a normalised scenario like this; you'll need to spend more hours coding through or troubleshooting scenarios which may make you squirm in pain when making changes to either of these models. I suggest you use a good ORM framework to counter this, and be aware of the common ORM mismatch scenarios.
Will you have a separate reporting DB or reporting tables? Basically, is your DB an OLTP database or a reporting database? If this DB is going to be worked on heavily by data-entry people day in and day out, the normalised form is suitable, provided point #1 is satisfied. If, however, reporting is a major need, then a de-normalised form should be preferred (which means you would not need so many separate tables).
PS: Master data should be kept in a table of its own. Months is definitely master data, and so is UoM, unless you plan to do CRUD on the UoM measures too. Also note that it hardly matters keeping Month in a separate table, especially when the same business logic/constraints can be enforced on columns directly in SQL.
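To illustrate the join-cost point above: with a key/value layout, pulling n properties into one row takes n joins. A sketch with invented names (your real ones are in the diagram):

SELECT pc.CaptureMonth,
       kg.Value AS KgProduced,
       un.Value AS UnitsProduced
FROM ProductionCapture pc
JOIN ProductionProperty kg ON kg.CaptureID = pc.CaptureID AND kg.PropertyName = 'KGs'
JOIN ProductionProperty un ON un.CaptureID = pc.CaptureID AND un.PropertyName = 'Units'
WHERE pc.FileID = @FileID;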
I have somewhat of a thought problem, where I'm not sure whether what I already built can be done a lot more efficiently, so I'll share my 'problem' here. (To be clear, everything I have built works; I'm just looking to make it more efficient.)
I have a webapp made with MVC & SQL which has a log-in system etc for users.
A user has a status, which is an enum, and can be active, blocked etc and is stored in the database (along with other user-data).
Within my webapp I have made a custom AuthorizeAttr. for authorizing users on every call made (applied as a global filter).
However, the 'normal' authentication is based on the cookie, which does not change when I change the user's status in the database. For instance, an Admin can de-activate another user in the same group. Such database changes do not take immediate effect, since by default the authorization only verifies the cookie, and the cookie is based on the status at login.
To fix this issue, I added some additional logic to my authorization attribute which, on every request, calls the database for the user's current status (the enum) and then simply checks whether the user is allowed to continue or a redirect is required.
Calling the database on every request (even just for one enum) seems a bit taxing on the server/db, especially if the webapp were to grow in popularity (= lots of users).
One idea I had was to cache the enum in the session cache for short periods (say 60 seconds). This would save some database calls, but obviously a user could still use the webapp for up to 60 seconds after being de-activated.
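A rough sketch of that idea using System.Runtime.Caching (names are illustrative; getStatusFromDb stands in for the per-request lookup you already have):

using System;
using System.Runtime.Caching;

public enum UserStatus { Active, Blocked, Deactivated }

public static class UserStatusCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static UserStatus GetStatus(string userId, Func<string, UserStatus> getStatusFromDb)
    {
        string key = "user-status:" + userId;
        object cached = Cache.Get(key);
        if (cached != null)
            return (UserStatus)cached;          // served from cache, no db call

        UserStatus status = getStatusFromDb(userId);
        Cache.Set(key, status, DateTimeOffset.UtcNow.AddSeconds(60));  // 60s window
        return status;
    }

    // If the de-activation happens through this same app, evicting the entry
    // at that moment closes the 60-second window for that case.
    public static void Invalidate(string userId)
    {
        Cache.Remove("user-status:" + userId);
    }
}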
I could be wrong in thinking that these database calls are actually that taxing of course.
Any ideas for improvement?
How do you know that checking the status per request is too expensive? Did you measure the performance cost of checking the user status in the database? Have you created your custom cache without actually measuring the cost of the simple solution? Do you use an ORM like Hibernate? Those have a 2nd-level cache built in, so often there will be no round trip to the database.
I think it's way better to stick to the KISS principle than to create a custom solution for a difficult problem. Even if your database does become the bottleneck, buying additional hardware once is usually cheaper than maintaining an overcomplicated solution for years.
If your application grows, the first thing you throw away is the relational database.
Have you considered using ADO.NET DataSets for your requirement? If you don't have multiple front-ends, you could read the login statuses into a DataSet initially, make all read/write operations against it, and save the changes back to the actual database later. If you do have multiple front-ends, would it be possible to restrict all read/write/modify operations of one group to a single front-end instance? In that case I guess you could use the DataSet approach as well.
This is a beginner pattern question for a web forms-over-data sort of thing. I read Exposing database IDs - security risk? and the accepted answer has me thinking that this is a waste of time, but wait...
I have an MVC project referencing a business logic library, and an assembly of NHibernate SQL repositories referencing the same. If something forced my hand to go and reference those repositories directly from my controller codebase, I'd know what went wrong. But when those controllers talk in URL parameters with the database record IDs, does it only seem wrong?
I can't conceive of those IDs ever becoming un-consumable (by MVC actions). I don't think I'd ever need two UI entities corresponding to the same row in the database. I don't intend for the controller to interpret the ID in any way. Surrogate keys would make zero difference. Still, I want to have the problem, because assumptions about the relational design aren't any better than layer-skipping dependencies.
How would you make a web application that only references the business logic assembly and talks in BL objects and GUIDs that only have meaning for that session, while the assembly persists transactions using database IDs?
You can encrypt or hash your IDs if you want, using the session ID as a salt. It depends on the context. On a public shopping site you want the catalog pages to be clear and easily copyable; for user account admin, it's fine to encrypt the IDs so users can't URL-hack into someone else's account.
I would not consider this to be security by obscurity. If a malicious user has one compromised account, they can look at all the form fields, URL IDs, and cookie values set while logged in as that user, and then try using those while logged in as a different user to escalate permissions. But by protecting them with the session ID as a salt, you have locked that data down so it's only useful in one session; the pages can't even be bookmarked. Could they figure out your protection? Possibly. But they'd likely just move on to another site. Locking your car door doesn't actually keep anyone out of your car if they really want in, but it makes it harder, so everyone does it.
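A rough sketch of what that could look like, using an HMAC keyed with the session ID rather than a plain salted hash (all names invented):

using System;
using System.Security.Cryptography;
using System.Text;

public static class IdProtector
{
    // "42" becomes "42.xxxx..." where the tag is only valid for this session.
    public static string Protect(int id, string sessionId)
    {
        using (var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(sessionId)))
        {
            byte[] tag = hmac.ComputeHash(Encoding.UTF8.GetBytes(id.ToString()));
            return id + "." + Convert.ToBase64String(tag);  // base64 may need URL-encoding
        }
    }

    public static bool TryUnprotect(string token, string sessionId, out int id)
    {
        id = 0;
        int dot = token.IndexOf('.');
        if (dot < 0 || !int.TryParse(token.Substring(0, dot), out id))
            return false;
        // Recompute and compare (a constant-time comparison is better in practice).
        return Protect(id, sessionId) == token;
    }
}

A tampered or replayed token from another session fails the comparison, so the raw database ID never has to be trusted on its own.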
I'm no security expert, but I have no problem exposing certain IDs to the user: Product IDs, User IDs, and anything else the user can normally read. If I display a product to the user, displaying its Product ID is not a problem.
Things that are internal to the system and that users do not directly interact with, like Transaction IDs, I do not display to the user, not for fear of them editing them somehow, but simply because that is not information that is useful to them.
Quite often in forms, I have the action point to "mysite.com/messages/view/5", where 5 is the message they want to view. In all of these actions (view, modify or delete, whichever functionality is required) I always ensure that the user has access, with a simple database check that the logged-in user is the message's owner.
Be very, very careful, as parameter tampering can lead to data modification. Rules on who can access which IDs must be very carefully built into your application when you expose these IDs.
For instance, if you are updating an Order based on OrderId, include a check against the logged-in customer in your where clause for both loads and updates (parameter names are illustrative):
where Order.OrderId = @passedInOrderId and Order.CustomerId = @loggedInCustomerId
I developed an extension to help with stored ids in MVC, available here:
http://mvcsecurity.codeplex.com/
Also I talk about this a bit in my security course at: Hack Proofing your ASP.NET MVC and Web Forms Applications
Other than those responses, sometimes it's good to use obvious IDs so people can hack the URL for the information they want, for example www.music.com/artist/acdc or www.music.com/artist/smashing-pumpkins. If it's meaningful to your users, and you can increase the information the user gets from the page through the URL, then all the better; and especially if your market segment is young or tech-savvy, use the ID to your advantage. This will also boost your SEO.
I would say that when it's not of use, encode it. It only takes one developer making one mistake (not checking a customer ID against the session) to expose your entire customer base.
But of course, your unit tests should catch that!
While you will find some people who say that IDs are just an implementation detail, in most systems you need a way of uniquely identifying a domain entity, and most likely you will generate an ID for that identifier. The fact that the ID is generated by the database is an implementation detail; but once it has been generated it becomes an attribute of the domain entity, and it is therefore perfectly reasonable to use it wherever you need to reference the entity.
In our organization we have the need to let employees filter data in our web application by supplying WHERE clauses. It's worked great for a long time, but we occasionally run into users providing queries that require full table scans on large tables or inefficient joins, etc.
Some clown might write something like:
select * from big_table where
Name in (select name from some_table where name like '%search everything%')
or name in ('a', 'b', 'c')
or price < 20
or price > 40
or exists (select 1 from some_other_table where col1 + col2 + col3 = 4)
or exists (select 1 from table_a, table_b)
Obviously, this is not a great way to query these tables: computed values, non-indexed columns, lots of ORs, and an unrestricted join between table_a and table_b.
But for a user, this may make total sense.
So what's the best way, if any, to allow internal users to supply a query to the database while ensuring that it won't lock a dozen tables and hang the webserver for 5 minutes?
I'm guessing there's a programmatic way in C#/SQL Server to get the execution plan for a query before it runs. If so, what factors contribute to cost? Estimated I/O cost? Estimated CPU cost? What would be reasonable limits at which to tell the user that his query's no good?
EDIT: We're a market research company. We have thousands of surveys, each with their own data. We have dozens of researchers who want to slice that data in arbitrary ways. We have tools that let them construct "valid" filters using a GUI, but some "power users" want to supply their own queries. I realize this isn't standard or best practice, but how else can I let dozens of users query tables for the rows they want using arbitrarily complex, ever-changing conditions?
The premise of your question states:
In our organization we have the need to let employees filter data in our web application by supplying WHERE clauses.
I find this premise to be flawed on its face. I can't imagine a situation where I would allow users to do this. In addition to the problems you have already identified, you are opening yourself up to SQL Injection attacks.
I would highly recommend reassessing your requirements to see if you can't build a safer, more focused way of allowing your users to search.
However, if your users really are sophisticated (and trusted!) enough to be supplying WHERE clauses directly, they need to be educated on what they can and can't submit as a filter.
You can try using the following:
SET SHOWPLAN_ALL ON   -- compile only; plan rows come back instead of results
GO
SET FMTONLY ON        -- return metadata only, no data rows
GO
<<< Your SQL code here >>>
GO
SET FMTONLY OFF
GO
SET SHOWPLAN_ALL OFF
GO
Then you can parse through what you get back. As to where to draw the line on various things, that will take some experience. There are some things to watch for, but nothing that is cut and dried; examining query plans is often more of an art than a science.
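If you want to drive that from C#, something along these lines should work. This is a sketch assuming System.Data.SqlClient; it reads TotalSubtreeCost from the first (statement-level) plan row, which holds the total estimated cost:

using System;
using System.Data.SqlClient;

public static class QueryCostGate
{
    public static double EstimateCost(string connectionString, string sql)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var on = new SqlCommand("SET SHOWPLAN_ALL ON", conn))
                on.ExecuteNonQuery();

            double rootCost = 0;
            using (var cmd = new SqlCommand(sql, conn))
            using (var reader = cmd.ExecuteReader())   // returns plan rows, not data
            {
                if (reader.Read())
                    rootCost = Convert.ToDouble(reader["TotalSubtreeCost"]);
            }

            using (var off = new SqlCommand("SET SHOWPLAN_ALL OFF", conn))
                off.ExecuteNonQuery();                 // important with pooled connections

            return rootCost;
        }
    }
}

You could then refuse to run anything whose estimated cost exceeds a threshold you tune from experience.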
As others have pointed out, though, I think your problem goes deeper than the technology implications. The fact that you let unqualified people access your database in such a way is the underlying problem. From past experience, I often see this in companies that are too lazy or too inexperienced to properly capture their application's requirements. I'm not saying that this is necessarily the case with your corporate environment, but that's what I've seen.
In addition to trying to control what the users enter (which is a losing battle; there will always be a new hire who comes up with an imaginative query), I'd look into Resource Governor; see Managing SQL Server Workloads with Resource Governor. You put the ad-hoc queries into a separate pool and cap the allocated resources. This way you mitigate the problem by limiting the amount of damage a bad query can do to other tasks.
You should also consider giving access to the data by other means, like Power Pivot, and letting users massage their data as hard as they want in their own Excel. Business power users love that, and the impact on the transaction processing server is minimal.
Instead of allowing employees to directly write (append to) queries and then trying to calculate the query cost before running them, why not create some kind of Advanced Search or filter feature that is NOT writing SQL you cannot control?
In very large enterprise organizations this is a common practice for internal applications. Often during the design phase you will limit the criteria or put sensible limits on data ranges, but once the business gets hold of the app there will be calls from business unit management to remove the restrictions. In my organization this is a management problem, not an engineering issue.
What we did was profile all of the criteria and find the largest offenders: both which users and which types of queries caused the most problems. We put limitations on some of the queries, and some very expensive queries that were used on a regular basis were added to the app, which cached the results and ran the queries when load was low. We also created canned, optimized queries for standard users and gave only specified users the ability to search for anything. Just a couple of ideas.
You could make a data model for your database and allow users to use SQL Server Reporting Services' Report Builder. It's GUI-based and doesn't require writing WHERE clauses, so there should be a limit to how much damage they can do.
Or you could warehouse a copy of the db for the purpose of user queries, update the db every hour or so, and let them go to town... :)
I have worked at a few places where this also came up. What we ended up doing was NOT allowing users unconstrained access, and promising that IT would do its best to provide queries when needed. The issue was that the database is fairly complicated, and even if users could write grammatically and syntactically correct SQL, they don't necessarily understand the relationships between the tables. In other words, even if they could write their own SQL, they would get the wrong answers. We convinced the users that the risk of making the wrong decision based on a flawed or incomplete understanding of the 200 tables in the database was too high. Better to get the right answer after a day than the wrong one instantly.
The other part of this is what IT does when user A writes a query and gets one answer, then user B writes what he thinks is the same query and gets a different answer. Is it IT's job to find the differences? To fix both pieces of SQL? The bottom line is that I would not allow them access. I would load the system with predefined queries, as others have mentioned, and try to convince management why that is the only way it will work in the long run.
If you have that much data and you want to give your customers the ability to analyse and view the information as they want, I strongly recommend thinking about OLAP technologies.
I guess you've never heard of SQL injection attacks? What if the user enters a DROP DATABASE command after the WHERE clause?
This is the reason that direct SELECT permission is almost never given to users in the vast majority of applications.
A far better approach would be to engineer your application around use cases so that you are able to cover a reasonable percentage of requirements with specifically designed filters/aggregation/layout options.
There are myriad ways to do this, so some analysis of your specific problem domain will definitely be required, together with research into viable methods.
Whilst direct SQL access is the most flexible option for your users, long-running queries are likely to be just the start of your headaches. SQL injection is a big concern here, whether its source is malicious or simply misguided.
(Chad mentioned this in a comment, but I think it deserves to be an answer.)
Maybe you should copy data that needs to be queried ad-hoc into a separate database, to isolate any problems from the majority of users.