143 lookup tables in EF CORE [closed] - c#

I'm currently redesigning an existing program (C#, .NET Core 3.0 & EF) which uses one big master lookup table containing multiple kinds of values.
Many of these values rarely change, and I would normally put them in a C# enum.
Some examples: Language, Sex, ReceiptStatus, RiskType, RelationType, SignatureStatus, CommunicationType, PartKind, LegalStatute, ...
The list goes on and on and currently has 143 different categories, each with its own values and two translations per value.
My company wants the values in the database so that a non-programmer can change them when needed.
However, it doesn't feel right at all. I would love to split the table, but creating 143 tables seems like overkill. If it were only 5-10 lookup tables it would have been fine.
Any advice? Stick to one lookup table? That feels wrong to me. Multiple tables?
Or convince my company we should just use C# enums, which work perfectly well but rule out the possibility of a non-programmer editing them?

Based on your inclination to use enums, I'm going to assume that these lookup values do not change often.
Buckle up because a lot of hard-fought knowledge about maintainability is embedded in the analysis below. Let me break down the approaches you are considering:
Pure enums: This is the least flexible approach because it closes a lot of doors. As you said, changing values requires a developer and a deployment. What's your strategy if you eventually have other tables that need to relate to one of your many, many values? To me this is far too restrictive, especially since with either of the other approaches you could create a .t4 template that generates enums based on the data. Then if the data changes, you just re-generate. I do this a lot.
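As a rough sketch of that generation step (the table, column, and connection-string names here are invented, and it assumes the Microsoft.Data.SqlClient package), the core logic could live in a small console tool or inside a T4 template:

using System.Text;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient on older stacks

// Hypothetical: reads one lookup table and emits a C# enum as source text.
class EnumGenerator
{
    static void Main()
    {
        const string connectionString = "Server=.;Database=MyApp;Integrated Security=true;TrustServerCertificate=true";
        var sb = new StringBuilder("public enum LegalStatute\n{\n");

        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var command = new SqlCommand(
            "SELECT Id, Code FROM dbo.LegalStatute ORDER BY Id", connection);
        using var reader = command.ExecuteReader();
        while (reader.Read())
        {
            // The Code column is assumed to hold a valid C# identifier.
            sb.AppendLine($"    {reader.GetString(1)} = {reader.GetInt32(0)},");
        }

        sb.AppendLine("}");
        System.IO.File.WriteAllText("LegalStatute.generated.cs", sb.ToString());
    }
}

Run it whenever the lookup data changes and commit the regenerated file alongside the rest of the code.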
One giant lookup table: Not as flexible as it may seem! This trades added complexity, the single responsibility principle, and referential integrity for less repetition/table spam, and is probably an expression of the Big Ball of Mud anti-pattern. You could add a column to this table that controls where a given value can be used, which would allow you to have sane drop-down lists, but that isn't as good as referential integrity. If other tables need to relate to a lookup, you have to relate against this entire table, which is much less clear, and you have to be careful to enforce your own layer of referential integrity since the database can't help you. Finally, and this is a big deal, if any of your 143 values has or will ever have extra complexity and could really benefit from an additional column, cognitive load begins to escalate. If five of the 143 need their own columns, you now have to hold all five columns in your mind to understand any one column... That is agony. Here's a thought experiment if I'm not getting my point across: why not build your entire project as one giant table?
143 tables: The most flexible approach, and all things considered, the easiest to maintain by a massive margin. It does not close any doors; down the road you can still create a UI for editing any value you want. If you want to relate other tables to a lookup value, that relationship will be easy to understand because you can relate to LegalStatus instead of GiantEverythingTable, and enjoy the benefits of referential integrity, never having to worry about corrupting your own data. You can also script table and index creation with something like NimbleText (a great tool and a hidden gem). There will be a huge number of tables, which is itself a minor maintenance problem, but it's one that doesn't actually break anything and doesn't lead to cognitive load. This is an acceptable trade-off. I would go this way and generate enums using t4.
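To make the referential-integrity point concrete, here is a minimal sketch with invented entity and property names of one dedicated lookup entity and a foreign-key relationship to it in EF Core (assumes the Microsoft.EntityFrameworkCore.SqlServer provider):

using Microsoft.EntityFrameworkCore;

// Hypothetical lookup entity: one small table per category.
public class LegalStatute
{
    public int Id { get; set; }
    public string NameNl { get; set; } = "";  // two translations, as in the question
    public string NameFr { get; set; } = "";
}

public class Relation
{
    public int Id { get; set; }

    // Real foreign key: the database guarantees the value exists.
    public int LegalStatuteId { get; set; }
    public LegalStatute LegalStatute { get; set; } = null!;
}

public class AppDbContext : DbContext
{
    public DbSet<LegalStatute> LegalStatutes => Set<LegalStatute>();
    public DbSet<Relation> Relations => Set<Relation>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("Server=.;Database=MyApp;Integrated Security=true");
}

Relating to LegalStatute instead of a giant everything-table keeps the relationship self-documenting, and the database enforces it for you.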
The thing about most software projects of any size is that you may look at my objections and say they don't apply, and you might be right. But if this thing is going to be in active development, you have to ask: are you sure? Do you really know what's going to happen in a year?
When considering trade-offs, I've learned to assign a lot of weight to the most flexible/simple decision. Maintainability problems are what kill software projects. They are the enemy.
Hope that helps!

Best practice for populating primary key values in SQL Server using Visual Studio [closed]

I am working on a project for college and stumbled upon a problem which I am not sure how to handle in the best way.
I have a SQL Server database that was created manually, and a WinForms project in Visual Studio written in C#. The application should do CRUD operations on the database.
My question is: what is the best way to manage the primary key columns in the tables? Should I define them in the database as auto-incrementing integers and let the database management system handle the primary keys, or should I define them just as int and populate them programmatically within the Visual Studio project, and if so, how?
I am not looking for a complete solution, just a hint about the best way of doing this.
I am very much a beginner, so please be gentle...
In general, auto-incremented (or identity or serial) primary keys are the way to go. You don't want your application worrying about things like whether a value has already been used.
If your application is multi-threaded -- that is, multiple users at the same time -- then the database will take care of any conflicts. That is quite convenient.
I am a fan of surrogate keys created by the database. In databases that cluster (sort) the rows by the primary key, it is much more efficient to have an automatically incremented value. And the database can take care of that.
There are some cases where you want a natural key. Of course, that is also permissible. But if you are going to invent a primary key for a table, let the database do the work.
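As a small illustration (the table, column, and connection string are hypothetical), with an identity column the application simply omits the key on insert and reads back the value the database generated:

using Microsoft.Data.SqlClient;

// Hypothetical table:
// CREATE TABLE dbo.Customer (Id INT IDENTITY PRIMARY KEY, Name NVARCHAR(100) NOT NULL)
class InsertExample
{
    static int InsertCustomer(string connectionString, string name)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        // OUTPUT INSERTED.Id returns the identity value the database just assigned.
        using var command = new SqlCommand(
            "INSERT INTO dbo.Customer (Name) OUTPUT INSERTED.Id VALUES (@name)", connection);
        command.Parameters.AddWithValue("@name", name);

        return (int)command.ExecuteScalar();
    }
}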
When defining database structures to back CRUD operations, you need to ask yourself:
Does it matter that my primary key is highly predictable?
By that I mean: if I am sending a user to a screen such as "whatever.com/something/edit/1",
aside from the obvious security concerns, does it help or harm the business process that a user can manipulate the URL and inject 2 or 3 or 4 into the path?
If it doesn't matter, then absolutely set it as an auto-incrementing int on the DB side and offload that area of responsibility to the database. You no longer have to concern yourself with key generation.
If it does matter, then set the primary key as a uniqueidentifier. In code, when adding a new record, generate a new GUID and set it as the primary key (Guid.NewGuid()). This prevents the user from traversing your data in an uncontrolled manner, since randomly guessing GUIDs would be problematic for them. For example:
New path: "whatever.com/something/edit/0f8fad5b-d9cb-469f-a165-70867728950e"
Not saying it is impossible to stumble upon things, but the regular person using your application would not be inclined to go exploring via URL manipulation, since they would waste 99.99% of their time on invalid requests trying to guess a valid GUID that is registered in your DB.
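A minimal sketch of that approach, with an invented table and entity, where the key is generated client-side with Guid.NewGuid() before the insert:

using System;
using Microsoft.Data.SqlClient;

// Hypothetical table:
// CREATE TABLE dbo.Document (Id UNIQUEIDENTIFIER PRIMARY KEY, Title NVARCHAR(200) NOT NULL)
class GuidKeyExample
{
    static Guid InsertDocument(string connectionString, string title)
    {
        var id = Guid.NewGuid(); // key is generated in code, not by the database

        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var command = new SqlCommand(
            "INSERT INTO dbo.Document (Id, Title) VALUES (@id, @title)", connection);
        command.Parameters.AddWithValue("@id", id);
        command.Parameters.AddWithValue("@title", title);
        command.ExecuteNonQuery();

        return id; // e.g. used to build /something/edit/{id}
    }
}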
As an added comment, if you decide to keep the primary key as an int but not use auto-increment, you are just setting yourself up for a ton of unnecessary work; I have personally never seen any real return on investment for the logic you would write to check whether a value is already in use. And think about tracking history: you would be setting yourself up for a world of pain if you ever decided to remove records from the table and then reuse their keys. That is a whole other set of concerns you would have to manage on top of what you are doing.

Is this a good case to use EAV or no [closed]

I have different product types that have different attributes. They cannot be stored in a single table, as the attributes are too distinct. There are a couple of options I'm currently looking at: EAV and a table for each type.
My situation is that, at the moment, there are only a few types (let's say 8), but in the near future, with almost 100% certainty, this will grow. The growth is controlled by me; it's not defined by users. It will be up to me to add new product types.
I'm currently inclined to use EAV (because I think I can cover the growth easily), but I am not sure, as I'm concerned about performance as well as modelling them in my language of choice (C#). My question is: given the scenario above, is it better for me to create a single table for each product type and add more as necessary, or would this be a good case (or not even good, let's say acceptable) for EAV?
There's no short good-or-bad answer to this concern, because it depends on many things.
Do you have a lot of product types?
How do you think each of them will evolve (think about what will happen when you add new fields to products)?
Do you need to handle "variants" of the products?
Do you intend to add entirely new types of products?
Etc.
EAV is probably a good way to go if you answer "yes" to some or all of these questions.
Regarding C#, I have implemented an EAV data catalog with it in the past, using Entity Framework over SQL Server (so an RDBMS).
It worked nicely for me.
But if you need to handle a lot of products, performance can quickly become an issue. You could also look at a "NoSQL" solution; have you thought about that?
Just keep in mind that your object model does not have to match your data model.
For example, you could perfectly well have a strongly typed object for each type of product if you need to.
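As a rough sketch of what such an EAV model might look like in C# (class and property names are invented; this is only one possible shape):

using System.Collections.Generic;

// Hypothetical EAV entities: every product type shares the same three tables.
public class Product
{
    public int Id { get; set; }
    public string ProductType { get; set; } = "";       // e.g. "Cog", "Screw"
    public List<ProductAttributeValue> Attributes { get; } = new List<ProductAttributeValue>();
}

public class AttributeDefinition
{
    public int Id { get; set; }
    public string Name { get; set; } = "";               // e.g. "TeethCount", "Length"
    public string DataType { get; set; } = "string";     // how to interpret Value
}

public class ProductAttributeValue
{
    public int Id { get; set; }
    public int ProductId { get; set; }
    public int AttributeDefinitionId { get; set; }
    public AttributeDefinition Definition { get; set; } = null!;
    public string Value { get; set; } = "";               // stored as text, converted in code
}

A strongly typed Cog or Screw class can still be materialized from these rows in the mapping layer, which is the "object model does not have to match the data model" point above.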
Much depends on the operations that will be performed on the entities. If you will:
often add new attributes to products;
add a lot of product types;
implement search across all product types (or other features that span every product type);
then I recommend using EAV.
I have implemented an EAV data structure in the past with ADO.NET and MS SQL and didn't have any problems with performance.
Also, Morten Bork above recommends using "sub types". But if you want to implement features that span every product type, I think that will be more difficult than with a pure EAV model.
EAV doesn't really play well with a relational database, so if that is what you are doing (i.e. connecting to SQL), then I would say no. Take the hit in development time and design a table per type of product, or make an aggregate table that holds the various properties for a product type and then connect the properties to the relevant tables.
So if a product contains "Cogs", then you have a table with "TeethCount", "Radius", etc.
Another product type has "Screws" with properties "Length", "Threading", etc.
And if a product type has both cogs and screws, it simply has a relation to each of these subtypes.
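A rough sketch of that subtype layout, with illustrative names that are not from the answer:

// Hypothetical subtype tables: shared columns live on Product,
// type-specific columns live in their own tables.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = "";

    public Cog Cog { get; set; }        // optional one-to-one per subtype
    public Screw Screw { get; set; }
}

public class Cog
{
    public int ProductId { get; set; }  // PK and FK to Product
    public int TeethCount { get; set; }
    public decimal Radius { get; set; }
}

public class Screw
{
    public int ProductId { get; set; }  // PK and FK to Product
    public decimal Length { get; set; }
    public string Threading { get; set; } = "";
}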

What is the best way to implement record updates on an ORM? [closed]

I'm writing a simple code generator in C# for automating common tasks in business applications, such as data binding, model and viewmodel generation, and record updating.
The generated code uses a data mapper that implements equality by reference comparison (without an id) and flag properties for transient state (whether the object was created but not yet persisted).
For updating the object properties I have 3 options:
In the property setter, immediately call an UPDATE for that one column only. This would provide instant persistence without any other mechanism managed by the final programmer, but it would require an unnecessary number of UPDATE calls.
Maintain a Frozen state on all entities which would prevent any property set, plus BeginModification and EndModification methods which would enable the property setters and UPDATE all modified columns on EndModification. This would require the programmer to call these methods, which is undesirable for the code generator, because code simplicity and minimizing programmer intervention are its primary goals.
Maintain a timer for each entity (which can be implemented as a global timer and local counters) and give entities a certain "dirty time". When a property is set, its dirty time is reset to 0, and when its local clock reaches a certain value, an UPDATE of the modified columns is made. This wouldn't require any external code from the final programmer and would group several property sets into a single UPDATE, because contiguous property sets have almost zero time between them.
The timer approach can be combined with a CommitChanges method that calls the UPDATE immediately if desired.
My preferred way is the local dirty timer, because of the possibility of zero programmer intervention besides property sets. The question is: is it possible that this timer approach would lead to data inconsistency?
If you're writing this as an educational exercise or as a means of further honing your design skills, then great! If you're writing this because you actually need an ORM, I would suggest that looking at one of the many existing ORMs would be a much wiser idea. These products--Entity Framework, NHibernate, etc.--already have people dedicated to maintaining them, so they are a much more viable option than trying to roll your own ORM.
That said, I would shy away from any automatic database updates. Most existing ORMs follow a pattern of storing state information at the entity level (typically an entity represents a single row in a table, though entities can relate to other entities, of course), and changes are committed by the developer explicitly calling a function to do so. This is similar to your timer approach, but without the...well...timer. It can be nice to have changes committed automatically if you're writing something like a WinForms application and the user is updating properties through data binding, but that is generally better accomplished by having a utility class (such as a custom binding list implementation) that detects changes and commits them automatically.
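As a rough sketch of that last idea (a hypothetical helper, not part of any existing ORM's API): a BindingList<T> subclass that records which bound items changed, so a data layer can commit just those in one explicit call:

using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical helper: tracks which items raised PropertyChanged while data-bound,
// so changes can be committed explicitly (e.g. on a Save button) instead of by a timer.
public class ChangeTrackingBindingList<T> : BindingList<T>
{
    private readonly HashSet<T> _dirty = new HashSet<T>();

    public IReadOnlyCollection<T> DirtyItems => _dirty;

    protected override void OnListChanged(ListChangedEventArgs e)
    {
        // BindingList raises ItemChanged when an item implementing
        // INotifyPropertyChanged reports a property change.
        if (e.ListChangedType == ListChangedType.ItemChanged && e.NewIndex >= 0)
            _dirty.Add(this[e.NewIndex]);

        base.OnListChanged(e);
    }

    public void ClearDirty() => _dirty.Clear();
}

A WinForms grid could bind to this list, and a save handler would loop over DirtyItems, issue the UPDATEs, and then call ClearDirty.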

multiple resultsets vs multiple calls for performance [closed]

I'm working on a fairly high performance application, and I know database connections are usually one of the more expensive operations. I have a task that runs pretty frequently, and in the course of business it has to select data from Table1 and Table2. I have two options:
Keep making two Entity Framework queries like I am right now: select from Table1 and select from Table2 in LINQ queries (what I'm currently doing).
Create a stored procedure that returns both result sets in one call, using multiple resultsets.
I'd imagine the cost to SQL Server is the same: the same IO is being performed. I'm curious whether anyone can speak to the performance bump that may exist in a "hot" codepath where milliseconds matter.
"and I know database connections are usually one of the more expensive operations"
Unless you turn off connection pooling, as long as there are connections already established in the pool and available to use, obtaining a connection is pretty cheap. It also really shouldn't matter here anyway.
When it comes to two queries (whether EF or not) vs one query with two result sets (and using NextResult on the data reader) then you will gain a little, but really not much. Since there's no need to re-establish a connection either way, there's only a very small reduction in the overhead of one over the other, that will be dwarfed by the amount of actual data if the results are large enough for you to care much about this impact. (Less overhead again if you could union the two resultsets, but then you could do that with EF too anyway).
If you mean the bytes going to and fro over the connection after it's been established, then you should be able to send slightly less to the database (but we're talking a handful of bytes) and receive about the same coming back, assuming that your query obtains only what is actually needed. That is, you do something like from t in Table1Repository select new {t.ID, t.Name} if you need IDs and names, rather than pulling back complete entities for each row.
EntityFramework does a whole bunch of things, and doing anything costs, so taking on more of the work yourself should mean you can be tighter. However, as well as introducing new scope for error over the tried and tested, you also introduce new scope for doing things less efficiently than EF does.
Any seeking of commonality between different pieces of database-handling code gets you further and further along the sort of path that ends up with you producing your own version of EntityFramework, but with the efficiency of all of it being up to you. Any attempt to streamline a particular query brings you in the opposite direction of having masses of similar, but not identical, code with slightly different bugs and performance hits.
In all, you are likely better off taking the EF approach first, and if a particular query proves particularly troublesome when it comes to performance then first see if you can improve it while remaining with EF (optimise the linq, use AsNoTracking when appropriate and so on) and if it is still a hotspot then try to hand-roll with ADO for just that part and measure. Until then, saying "yes, it would be slightly faster to use two resultsets with ADO.NET" isn't terribly useful, because just what that "slightly" is depends.
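For reference, a minimal sketch of the two-result-set approach with plain ADO.NET; the stored procedure name and columns are hypothetical:

using Microsoft.Data.SqlClient;

// Hypothetical procedure: dbo.GetTables1And2 runs
//   SELECT Id, Name FROM Table1; SELECT Id, Amount FROM Table2;
class MultipleResultSetsExample
{
    static void ReadBoth(string connectionString)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var command = new SqlCommand("dbo.GetTables1And2", connection)
        {
            CommandType = System.Data.CommandType.StoredProcedure
        };

        using var reader = command.ExecuteReader();

        while (reader.Read())                 // first result set: Table1
        {
            var id = reader.GetInt32(0);
            var name = reader.GetString(1);
        }

        reader.NextResult();                  // advance to the second result set

        while (reader.Read())                 // second result set: Table2
        {
            var id = reader.GetInt32(0);
            var amount = reader.GetDecimal(1);
        }
    }
}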
If the query is a simple read from Table1 and Table2, then LINQ queries should give similar performance to executing a stored procedure (plain SQL). But if the query runs across different databases, then plain SQL is better, since you can UNION the result sets and bring back the data from all databases in one go.
In MySQL, the EXPLAIN statement can be used to understand the performance of a query. See this link:
http://www.sitepoint.com/using-explain-to-write-better-mysql-queries/
Another useful technique is to check the SQL generated for your LINQ query in the Output window of Microsoft Visual Studio. You can execute this query directly in a SQL editor and check its performance.

Why should we avoid public methods? Benefits of encapsulation [closed]

Before down-voting, let me explain my question. I have a little experience in designing architectures and am trying to improve. Once, when I was fixing a bug, I came to the conclusion that we needed to make a private method public and then use it. That was the fastest way to get my job done and have the bug fixed. I went to my team leader and told him so. After getting a grimace from him, I was told that every public method is a very expensive pleasure: every public method has to be supported throughout the lifetime of a project. And much more.
I started wondering. Indeed! Why wasn't it so clear to me when I was looking at the code? It also wasn't so evident when I designed my own architectures. I remember my thoughts about it:
"Ahh, I will leave this method public; who knows, maybe it will come in useful when the system grows."
I was confused. I thought I was making scalable systems, but in fact I got tons of garbage in my interfaces.
My question:
How can you tell whether a method is really important and worth making public? Are there any counterexamples to check against? How do you get trained to make the private/public choice without spending hours agonizing over it?
I suggest you read up on YAGNI http://c2.com/cgi/wiki?YouArentGonnaNeedIt
You should write code to suit actual requirements because writing code to suit imagined requirements leads to bloated code which is harder to maintain.
My favourite quote
Perfection is achieved, not when there is nothing more to add, but
when there is nothing left to take away.
-- Antoine de Saint-Exupery French writer (1900 - 1944)
This question needs a deep and thorough discussion of OOP design, but my simple answer is: anything with public visibility can be used by other classes. Hence, if you're not building a method for others to use, do not make it public.
One pitfall of unnecessarily making a private method public is that once other classes use it, it becomes harder for you to refactor or change the method; you have to maintain those downstream usages (think about what happens if this occurs in hundreds of classes).
But nevertheless, this discussion may never end. You should spend more time reading books on OOP design patterns; they will give you heaps more ideas.
There are a few questions you can ask yourself about the domain in which the object exists:
Does this member (method, property, etc.) need to be accessed by other objects?
Do other objects have any business accessing this member?
Encapsulation is often referred to as "data hiding" or "hiding members" which I believe leads to a lot of confusion. Inexperienced developers would rightfully ask, "Why would I want to hide anything from the rest of my code? If it's there, I should be able to use it. It's my code after all."
And while I'm not really convinced by the way your team leader worded his response, he has a very good point. When you have too many connection points between your objects, they become more and more tightly coupled and fuse into one big unsupportable mess.
Clearly and strictly maintaining a separation of concerns throughout the architecture can significantly help prevent this. When you design your objects, think in terms of what their public interfaces would look like. What kind of outwardly-visible attributes and functionality would they have? Anything which wouldn't reasonably be expected as part of that functionality shouldn't be public.
For example, consider an object called a Customer. You would reasonably expect some attributes which describe a Customer, such as:
Name
Address
Phone Number
List of orders processed
etc.
You might also expect some functionality available:
Process Payment
Hold all Orders
etc.
Suppose you also have some technical considerations within that Customer. For example, maybe the methods on the Customer object directly access the database via a class-level connection object. Should that connection object be public? Well, in the real world, a customer doesn't have a database connection associated with it. So, clearly, no it should not be public. It's an internal implementation concern which isn't part of the outwardly-visible interface for a Customer.
This is a pretty obvious example, of course, but illustrates the point. Whenever you expose a public member, you add to the outwardly-visible "contract" of functionality for that object. What if you need to replace that object with another one which satisfies the same contract? In the above example, suppose you wanted to create a version of the system which stores data in XML files instead of a database. If other objects outside of the Customer are using its public database connection, that's a problem. You'd have to change a lot more about the overall design than just the internal implementation of the Customer.
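A small sketch of that idea, with invented class and member names: the data-access dependency stays private behind an interface, so only the business-level members form the public contract:

using System.Collections.Generic;

// Hypothetical: the storage mechanism is a private detail behind an interface,
// so it can be swapped (database, XML files, ...) without touching callers.
public interface ICustomerStore
{
    void SaveOrder(string customerName, string orderId);
}

public class Customer
{
    private readonly ICustomerStore _store;          // private: not part of the contract
    private readonly List<string> _orders = new List<string>();

    public Customer(string name, ICustomerStore store)
    {
        Name = name;
        _store = store;
    }

    public string Name { get; }                      // public: callers reasonably need this
    public IReadOnlyList<string> Orders => _orders;

    public void ProcessOrder(string orderId)         // public behaviour, private plumbing
    {
        _orders.Add(orderId);
        _store.SaveOrder(Name, orderId);
    }
}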
As a general rule it's usually best to prefer the strictest member visibilities first and open them up as needed. Combine that guideline with an approach of thinking of your objects in terms of what real-world entities they represent and what functionality would be visible on those entities and you should be able to determine the correct course of action for any given situation.
