ASP.NET Custom Membership Provider for very large application

ASP.NET Custom Membership Provider for very large application - c#

I have to come up with a membership solution for a very large website. The site will be built using ASP.NET MVC 2 and a MS SQL2008 database.
The current Membership provider seems like a BIG overkill, there's way too much functionality.
All I want to store is email/password and basic profile information such as First/LastName, Phone number. I will only ever need 2 roles, administrators & users.
What are your recommendations on this type of scenario, considering there might be millions of users registered? What does StackOverflow use?
I've used the existing Membership API a lot in the past and have extended it to store additional information etc. But there's tables such as
aspnet_Applications
aspnet_Paths
aspnet_SchemaVersions
aspnet_WebEvent_Events
aspnet_PersonalizationAllUsers
aspnet_PersonalizationPerUser
which are extremely redundant and I've never found use for.
Edit
Just to clarify a few other redundancies after #drachenstern's answer, there are also extra columns which I have no use for in the Membership/Users table, but which would add to the payload of each select/insert statements.
MobilePIN
PasswordQuestion/PasswordAnswer (I'll do email based password recovery)
IsApproved (user will always be approved)
Comment
MobileAlias
Username/LoweredUsername (or Email/LoweredEmail) [email IS the username so only need 1 of these]
Furthermore, I've heard that GUID's aren't all that fast, and would prefer to have integers instead (like Facebook does) which would also be publicly exposed.
How do I go about creating my own Membership Provider, re-using some of the Membership APIs (validation, password encryption, login cookie, etc) but only with tables that meet my requirements?
Links to articles and existing implementations are most welcome, my Google searches have returned some very basic examples.
Thanks in advance
Marko

#Marko I can certainly understand that the standard membership system may contain more functionality than you need, but the truth is that it really isn't going to matter. There are parts of the membership system that you aren't going to use just like there are parts of .Net that you aren't going to use. There are plenty of things that .Net can do that you are never, ever going to use, but you aren't going to go through .Net and strip out that functionality are you? Of course not. You have to focus on the things that are important to what you are trying to accomplish and work from there. Don't get caught up in the paralysis of analysis. You will waste your time, spin your wheels and not end up with anything better than what has already been created for you. Now Microsoft does get it wrong sometimes, but they do get a lot of things right. You don't have to embrace everything they do to accomplish your goals - you just have to understand what is important for your needs.
As for the Guids and ints as primary keys, let me explain something. There is a crucial difference between a primary key and a clustered index. You can add a primary key AND a clustered index on columns that aren't a part of the primary key! That means that if it is more important to have your data arranged by a name (or whatever), you can customize your clustered index to reflect exactly what you need without it affecting your primary key. Let me say it another way - a primary key and a clustered index are NOT one in the same. I wrote a blog post about how to add a clustered index and then a primary key to your tables. The clustered index will physically order the table rows the way you need them to be and the primary key will enforce the integrity that you need. Have a look at my blog post to see exactly how you can do it.
Here is the link - http://iamdotnetcrazy.blogspot.com/2010/09/primary-keys-do-not-or-should-not-equal.html.
It is really simple, you just add the clustered index FIRST and then add the primary key SECOND. It must be done in that order or you won't be able to do it. This assumes, of course, that you are using Sql Server. Most people don't realize this because SQL Server will create a clustered index on your primary key by default, but all you have to do is add the clustered index first and then add the primary key and you will be good to go. Using ints as a primary key can become VERY problematic as your database and server system scales out. I would suggest using Guids and adding the clustered index to reflect the way you actually need your data stored.
Now, in summary, I just want to tell you to go create something great and don't get bogged down with superficial details that aren't going to give you enough of a performance gain to actually matter. Life is too short. Also, please remember that your system can only be as fast as its slowest piece of code. So make sure that you look at the things that ACTUALLY DO take up a lot of time and take care of those.
And one more additional thing. You can't take everything you see on the web at face value. Technology changes over time. Sometimes you may view an answer to a question that someone wrote a long time ago that is no longer relevant today. Also, people will answer questions and give you information without having actually tested what they are saying to see if it is true or not. The best thing you can do for your application is to stress test it really well. If you are using ASP.Net MVC you can do this in your tests. One thing you can do is to add a for loop that adds users to your app in your test and then test things out. That is one idea. There are other ways. You just have to give it a little effort to design your tests well or at least well enough for your purposes.
Good luck to you!

The current Membership provider seems like a BIG overkill, there's way too much functionality.
All I want to store is email/password and basic profile information such as First/LastName, Phone number. I will only ever need 2 roles, administrators & users.
Then just use that part. It's not going to use the parts that you don't use, and you may find that you have a need for those other parts down the road. The classes are already present in the .NET framework so you don't have to provide any licensing or anything.
The size of the database is quite small, and if you do like I do, and leave aspnetdb to itself, then you're not really taking anything from your other databases.
Do you have a compelling reason to use a third-party component OVER what's in the framework already?
EDIT:
there are also extra columns which I
have no use for in the
Membership/Users table, but which
would add to the payload of each
select/insert statements.
MobilePIN
PasswordQuestion/PasswordAnswer (I'll
do email based password recovery)
IsApproved (user will always be
approved)
Comment MobileAlias
Username/LoweredUsername (or
Email/LoweredEmail) [email IS the
username so only need 1 of these]
This sounds like you're trying to microoptimize. Passing empty strings is virtually without cost (ok, it's there, but you have to profile to know just how much it's costing you. It won't be THAT much per user). We already routinely don't use all these fields in our apps either, but we use the membership system with no measurable detrimental impact.
Furthermore, I've heard that Guid's aren't all that fast, and would prefer to have integers instead (like Facebook does) which would also be publicly exposed.
I've heard that the cookiemonster likes cookies. Again, without profiling, you don't know if that's detrimental. Usually people use GUIDs because they want it to be absolutely (well to a degree of absoluteness) unique, no matter when it's created. The cost of generating it ONCE per user isn't all that heavy. Not when you're already creating them a new account.
Since you are absolutely set on creating a MembershipProvider from scratch, here are some references:
http://msdn.microsoft.com/en-us/library/system.web.security.membershipprovider.aspx
https://web.archive.org/web/20211020202857/http://www.4guysfromrolla.com/articles/120705-1.aspx
http://msdn.microsoft.com/en-us/library/f1kyba5e.aspx
http://www.amazon.com/ASP-NET-3-5-Unleashed-Stephen-Walther/dp/0672330113
Stephen Walther goes into detail on that in his book and it's a good reference for you to have as is.

My recommendation would be for you to benchmark it. Add as many records as you think you will have in production and submit a similar number of requests as you would get in production and see how it performs for your environment.
My guess is that it would be OK, the overhead that you are talking about would be insignificant.

Related

CQRS and primary key: guid or not?

For my project, which is a potentially big web site, I have chosen to separate the command interface from the query interface. As a result, submitting commands are one-way operations that don't return a result. This means that the client has to provide the key, for example:
service.SubmitCommand(new AddUserCommand() { UserId = key, ... });
Obviously I can't use an int for the primary key, so a Guid is a logical choice - except that I read everywhere about the performance impact it has, which scares me :)
But then I also read about COMB Guids, and how they provide the advantages of Guid's while still having a good performance. I also found an implementation here: Sequential GUID in Linq-to-Sql?.
So before I take this important decision: does someone have experience with this matter, of advice?
Thanks a lot!
Lud

First of all, I use sequential GUIDs as a primary key and I don't have any problems with performance.
Most of tests Sequential GUID vs INT as primary key operates with batch insert and selects data from idle database. But in a real life selects and updates happen in SAME time.
As you are applying CQRS, you will not have batch inserts and burden for opening and closing transactions will take much more time than 1 write query. As you have separated read storage, your select operations on a table with GUID PK will be much faster than they would be on a table with INT PK in a unified storage.
Besides, asynchrony, that gives you messaging, allows your applications scale much better than systems with blocking RPC calls can do.
In consideration of aforesaid, choosing GUIDs vs INTs seems to me as be penny-wise and pound-foolish.

You didn't specify which database engine you are using, but since you mentioned LINQ to SQL, I guess it's MS SQL Server.
If yes, then Kimberly Tripp has some advice about that:
Disk space is cheap...
GUIDs as PRIMARY KEYs and/or the clustering key
To summarize the two links in a few words:
sequential GUIDs perform better than random GUIDs, but still worse than numeric autoincrement keys
it's very important to choose the right clustered index for your table, especially when your primary key is a GUID

Instead of supplying a Guid to a command (which is probably meaningless to the domain), you probably already have a natural key like username which serves to uniquely identify the user. This natural key make a lot more sense for the user commands:
When you create a user, you know the username because you submitted it as part of the command.
When you're logging in, you know the username because the user submitted it as part of the login command.
If you index the username column properly, you may not need the GUID. The best way to verify this is to run a test - insert a million user records and see how CreateUser and Login perform. If you really to see a serious performance hit that you have verified adversely affects the business and can't be solved by caching, then add a Guid.
If you're doing DDD, you'll want to focus hard on keeping the domain clean so the code is easy to understand and reflects the actual business processes. Introducing an artificial key is contrary to that goal, but if you're sure that it provides actual value to the business, then go ahead.

Routes with 16-bit Guids seem Crazy?

I just looked at my Database Schema from my DBA and its using 16bit unique Identifiers as the Primary Key. The question I have how do I used this in the routing for MVC.
Something like http://www.app.com/project/21212/product/212121
This is a midsize enterprise application, why would you need a GUID for our tables anyway?
I know we can create a friendly ID field, but I know MVC Routing doesn’t recommend using database IDs in Routes..
So I guess my questions are:
Why would we need 16 Bit Guids for our Primary Key?
How could I use that in the Route. The route isn’t supposed to contain and Database IDs.

On the DB part of the question
The decision of what kind of DB keys you are going to use should be completely independent of your MVC routes. DBAs might chose to use whatever value they think is appropriate to your application without having to worry about how you are going to craft your routes. I couldn't tell whether they make sense for your domain or not.
On the route/URL part of the question
Depending on what you are trying to do adding a GUID to a route might not be the best idea for a route/URL. The authors of "ASP.NET MVC in Action" (page 95) give some good guidelines on how URLs should be:
Simple and clean
Hackable
Allow URLs parameters to clash
Short
Avoid exposing database IDs where possible
Consider adding unnecessary information
If you have GUIDs as database IDs see if you can use another value to craft the route to each resource/record. For example the name of the product plus the last 4 digits of the db ID, or another unique and user friendly (see guidelines) value that you can come up with based on the information that you are trying to access.

Lets look at this very page as an example. I think we can all agree that StackOverflow is a successful MVC application...
https://stackoverflow.com/questions/4079861/routes-with-16-bit-guids-seem-crazy
What is that "4079861" in there? A database ID?
Note that the database ID is really the only important part as these links also arrive at the same location:
https://stackoverflow.com/questions/4079861/
https://stackoverflow.com/questions/4079861/Foo
So, the short answer is: yes, your routes will probably have a big ugly Guid in them. Go talk to your DBA if you have a problem with that.

You should ask yourself: Is there another way to uniquely identify my [insert name] ?
#Mark gives the StackOverflow example. The nice part is that it's a number, even if it's a long one. Numbers are nicer than GUIDs.
Your options are probably:
create a simple number to GUID mapping in the database, creating a redundant unique identifier, for routing purposes
actually display the GUID as part of your routing
find some other way you could uniquely identify your record (i.e. date + name combination, which blogs use) - although here you have to make sure you don't allow duplicate entries of your routing identifier.
What you end up using will depend entirely on your situation and requirements.

Assistance with URL structure for accept/decline links

I am in the process of creating an app in which a customer can add email addresses to an event. This means that each email address is sent 2 urls via email when added to the list, 1 url to accept and the other to decline. The url is made up of a number of query parmatters, id's etc.
The issue I have is that I want to prevent the scenario in which someone could "guess" another persons url - as such guest the combination of parametters etc. While this is very unlikely, I still want to prevent such.
I have seen several scenarios to help prevent this, ie. add a hash value, encrypt the url etc. However I am looking for the most secure and best practise approach to this and would like any possible feedback.
As an aside I am coding in C# but I dont believe the solution to this is language specific.
Thanks in advance.

I agree this is not language specific. I had a situation very similar to this within the last few years. It needed to be extremely secure due to children and parents receiving the communications. The fastest solution was something like the following:
First store the information that you would use in the URL as parameters somewhere in a database. This should be relatively quick and simple.
Create two GUIDs.
Associate the first GUID with the data in the database that you would have used for processing an "acceptance".
Associate the second GUID for a "decline" record in the database.
Create the two URL's with only the GUID's as parameters.
If the Acceptance URL is clicked, use the database data associated with it to process the acceptance.
If the Decline is clicked, delete the data out of the database, or archive it, or whatever.
After a timeframe, is no URL is clicked, delete or archive the data associated with those GUID's so that they can no longer be used.
GUID's are extremely hard to guess, and the likelihood of guessing one that is actually usable would be so unlikely it is nearly impossible.

I'm guessing you are saving these email addresses somewhere. So it's quite easy to make a secure identifier for each entry you have. Whether that is a hash or some encryption technique, doesn't really matter. But I guess a hash is easier to implement and actually meant for this job.
So you hash for example the emailaddress, the PK value of the record, with the timestamp of when it was added, and some really impossible to guess salt. Just concatenate the various fields together and hash them.
In the end, you send nothing but the hashed key to the server. So when you send those two links, they could look as follows:
http://www.url.com/newsletter/acceptsubscription.aspx?id=x1r15ff2svosdf4r2s0f1
http://www.url.com/newsletter/cancelsubscription.aspx?id=x1r15ff2svosdf4r2s0f1
When the user clicks such a link, your server looks in the database for the record which contains the supplied key. Easy to implement, and really safe if done right. No way in hell someone can guess another persons key. Just bear in mind the standard things when doing something with hashing. Such as:
Do not forget to add salt.
Pick a really slow, and really secure, hashing algorithm.
Just make sure that no one can figure out their own hash, from information they can possess.
If you are really scared of people doing bad things, make sure to stop bruteforcing by adding throttle control to the website. Only allow X number of requests per minute for example. Or some form of banning on an IP-address.
I'm not an expert at these things, so there might be room for improvement. However I think this should point you in the right direction.
edit: I have to add; the solution provided by Tim C is also good. GUID's are indeed very useful for situations like these, and work effectively the same as my hashed solution above.

Sometimes Connected CRUD application DAL

I am working on a Sometimes Connected CRUD application that will be primarily used by teams(2-4) of Social Workers and Nurses to track patient information in the form of a plan. The application is a revisualization of a ASP.Net app that was created before my time. There are approx 200 tables across 4 databases. The Web App version relied heavily on SP's but since this version is a winform app that will be pointing to a local db I see no reason to continue with SP's. Also of note, I had planned to use Merge Replication to handle the Sync'ing portion and there seems to be some issues with those two together.
I am trying to understand what approach to use for the DAL. I originally had planned to use LINQ to SQL but I have read tidbits that state it doesn't work in a Sometimes Connected setting. I have therefore been trying to read and experiment with numerous solutions; SubSonic, NHibernate, Entity Framework. This is a relatively simple application and due to a "looming" verion 3 redesign this effort can be borderline "throwaway." The emphasis here is on getting a desktop version up and running ASAP.
What i am asking here is for anyone with any experience using any of these technology's(or one I didn't list) to lend me your hard earned wisdom. What is my best approach, in your opinion, for me to pursue. Any other insights on creating this kind of App? I am really struggling with the DAL portion of this program.
Thank you!

If the stored procedures do what you want them to, I would have to say I'm dubious that you will get benefits by throwing them away and reimplementing them. Moreover, it shouldn't matter if you use stored procedures or LINQ to SQL style data access when it comes time to replicate your data back to the master database, so worrying about which DAL you use seems to be a red herring.
The tricky part about sometimes connected applications is coming up with a good conflict resolution system. My suggestions:
Always use RowGuids as your primary keys to tables. Merge replication works best if you always have new records uniquely keyed.
Realize that merge replication can only do so much: it is great for bringing new data in disparate systems together. It can even figure out one sided updates. It can't magically determine that your new record and my new record are actually the same nor can it really deal with changes on both sides without human intervention or priority rules.
Because of this, you will need "matching" rules to resolve records that are claiming to be new, but actually aren't. Note that this is a fuzzy step: rarely can you rely on a unique key to actually be entered exactly the same on both sides and without error. This means giving weighted matches where many of your indicators are the same or similar.
The user interface for resolving conflicts and matching up "new" records with the original needs to be easy to operate. I use something that looks similar to the classic three way merge that many source control systems use: Record A, Record B, Merged Record. They can default the Merged Record to A or B by clicking a header button, and can select each field by clicking against them as well. Finally, Merged Records fields are open for edit, because sometimes you need to take parts of the address (say) from A and B.
None of this should affect your data access layer in the slightest: this is all either lower level (merge replication, provided by the database itself) or higher level (conflict resolution, provided by your business rules for resolution) than your DAL.

If you can install a db system locally, go for something you feel familiar with. The greatest problem I think will be the syncing and merging part. You must think of several possibilities: Changed something that someone else deleted on the server. Who does decide?
Never used the Sync framework myself, just read an article. But this may give you a solid foundation to built on. But each way you go with data access, the solution to the businesslogic will probably have a much wider impact...

There is a sample app called issueVision Microsoft put out back in 2004.
http://windowsclient.net/downloads/folders/starterkits/entry1268.aspx
Found link on old thread in joelonsoftware.com. http://discuss.joelonsoftware.com/default.asp?joel.3.25830.10
Other ideas...
What about mobile broadband? A couple 3G cellular cards will work tomorrow and your app will need no changes sans large pages/graphics.
Excel spreadsheet used in the field. DTS or SSIS to import data into application. While a "better" solution is created.
Good luck!

If by SP's you mean stored procedures... I'm not sure I understand your reasoning from trying to move away from them. Considering that they're fast, proven, and already written for you (ie. tested).
Surely, if you're making an app that will mimic the original, there are definite merits to keeping as much of the original (working) codebase as possible - the least of which is speed.
I'd try installing a local copy of the db, and then pushing all affected records since the last connected period to the master db when it does get connected.

Difference between creating Guid keys in C# vs. the DB

We use Guids as primary keys for entities in the database. Traditionally, we've followed a pattern of letting the database set the ID for an entity during the INSERT, I think mostly because this is typically how you'd handle things using an auto-increment field or whatever.
I'm finding more and more that it's a lot handier to do key assignment in code during object construction, for two main reasons:
you know that once an object's constructor has run, all of it's fields have been initialized. You never have "half-baked" objects kicking around.
if you need to do a batch of operations, some of which depend on knowing an object's key, you can do them all at once without round-tripping to the database.
Are there any compelling reasons not to do things this way? That is, when using Guids as keys, is there a good reason to leave key assignment up to the database?
Edit:
A lot of people have strong opinions on whether or not Guids should be used for PKs (which I knew), but that wasn't really the point of my question.
Aside from the clustering issue (which doesn't seem to be a problem if you set your indexes up properly), I haven't seen a compelling reason to avoid creating keys in the application layer.

I think you are doing just fine by creating them on the client side. As you mentioned, if you let the db do it, you have to find some way (can't think of any really) to get that key. If you were using an identity, there are calls you can use to get the latest one created for a table, but I'm not sure if such exists for a guid.

By doing it in C# you might run the risk of reassigning the GUID and saving it back to the database. By having the database be responsible for it, you're guaranteed that this PK will not change, that is, if you set up the proper constraints. Having said that, you could set similar constraints in your C# code that prevent changing a unique id once it has been assigned, but you'd have to do the same in all of your applications...In my opinion, having it in C# sounds like is more maintenance than the database, since databases already have built in methods to prevent changing primary keys.

Interesting question.
Traditionally I too used the DB assigned guid but recently I was working on a Windows Mobile application and the SQL CE database doesn't allow for newguid so I had to do it in code.
I use SQL replication to get the data from the mobile devices to the server. Over the last 6 months I have had 40 SQL CE clients synchronise back over 100000 records to a SQL 2005 server without one missed or duplicated guid.
The additional coding required was negligible and the benefit of knowing the guid before inserting has in fact cut down on some of the complexity.
I haven't done any performance checking so performance aside I cannot see any reason not to implement guid handling as you suggest.

GUIDs are horrible for performance
I would leave it in the database especially now that SQL Server has NEWSEQUENTIALID() which doesn't cause page splits on inserts anymore because the values are random, every NEWSEQUENTIALID created will be greater than the previous one...only caviat is that it can only be used as a default value

If you ever have to do an insert outside of the GUI (think import from another vendor or data from a company you bought and have to merge with your data), then the GUID would not automatically be assigned. It's not an insurmountable issue, but it is something to consider nonetheless.

I let an empty Guid be an indicator that this object, although constructed, has not yet been inserted into (or retrieved from) the database.

As SQLMenace noted, standard GUIDs negatively affects indexing & paging. In C# you can generate sequential GUIDs like NEWSEQUENTIALID() using a little P/Invoke fun.
[DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(out Guid guid);
This way you can at least keep using GUIDs, but get more flexibility with how and where they are generated.

Ok, time to chime in. I would say that generated GUIDs client-side for saving to the database is the best way to do things -- provided you happen to be using GUIDs as your PKs, which I only recommend in one scenario: disconnected environment.
When you are using a disconnected model for your data propagation (i.e. PDA/cellphone apps, laptop apps intended for limited connectivity scenarios, etc), GUIDs as PKs generated client-side are the best way to do it.
For every other scenario, you're probably better off with auto-increment identity PKs.
Why? Well, a couple reasons. First, you really do get a big performance boost by using a row-spanning clustered PK index. A GUID PK and a clustered index do not play well together -- even with NEWSEQUENTIALID, which, by the way, I think totally misses the point of GUIDs. Second, unless your situation forces you not to (i.e. you have to use a disconnected model) you really want to keep everything transactional and insert as much interrelated data together at the same time.

Aside from the clustering issue (which doesn't seem to be a problem if you set your indexes up properly),
GUID as indexes will always be terribly cluttered - there's no "proper" setup to avoid that (unless you use the NEWSEQUENTIALGUID function in the SQL Server engine).
The biggest drawback IMHO is size - a GUID is 16 byte, an INT is 4. The PK is not only stored in the tree of the primary key, but also ON EVERY non-clustered index entry.
With a few thousand entries, that might not make a big difference - but if you have a table with millions or billions of entries and several non-clustered indices, using a 16-byte GUID vs. a 4-byte INT as PK might make a HUGE difference in space needed - on disk and in RAM.
Marc

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.