How to deal with deprecated values in (country-)code lists - c#

Let's say we have a code list of all the countries including their country codes. The country code is the primary key of the Countries table and it is used as a foreign key in many places in the database. In my application the countries are usually displayed as dropdowns on multiple forms.
Some countries that existed in the past no longer exist, for example Serbia and Montenegro, which had the country code SCG.
I have two objectives:
don't allow the user to use these old values (so these values should not be visible in dropdowns when inserting data)
the user should still be able to (readonly) open old stuff and in this case the deprecated values should be visible in dropdowns.
I see two options:
Rename deprecated values, for instance from 'CountryName' to '!!!!!CountryName'. This approach is the easiest to implement, but with obvious drawbacks.
Add an IsActive column to the Countries table and set it to false for all deprecated values and true for all others. On all the forms where the user can insert data, display only active values. On the read-only forms we can display all values (including deprecated ones), so the user will be able to view old data. But on some of my forms the user should also be able to edit data, which means that the deprecated values should be hidden from him. That means each dropdown needs some initialization logic like this: if the data displayed is read-only, include deprecated values in the dropdown; if the data can also be edited, exclude them. But this is a lot of work and error-prone too.
Any other ideas?

I deal with this scenario a lot, and use the 'Active' flag to solve the problem, much as you described. When I populate a drop-down list with values, I load only 'active' data and include up to one deprecated value, but only if it is being used. (For example, if I am looking at a person record and that person has a deprecated country, then that country is included in the drop-down list along with the active countries.) I do this in read-only AND in edit modes because, in my cases, if a person record has a deprecated country listed, they can continue to use it; but once they change it to a non-deprecated country and save, they can never switch back (your use case may vary).
So the key difference is that even in read-only mode I don't add all the deprecated countries to the DDL, just the deprecated country that applies to the record I am looking at, and even then only if that record already uses it.
Here is an example of the logic I use when loading the drop down list:
protected void LoadSourceDropdownList(bool AddingNewRecord, int ExistingCode)
{
    using (Entities db = new Entities())
    {
        if (AddingNewRecord) // when we are adding a new record, only show 'active' items in the drop-down list.
            ddlSource.DataSource = (from q in db.zLeadSources where (q.Active == true) select q);
        else // for existing records, show all active items AND the current value.
            ddlSource.DataSource = (from q in db.zLeadSources where ((q.Active == true) || (q.Code == ExistingCode)) select q);
        ddlSource.DataValueField = "Code";
        ddlSource.DataTextField = "Description";
        ddlSource.DataBind();
        ddlSource.Items.Insert(0, "--Select--");
        ddlSource.Items[0].Value = "0";
    }
}

If you are displaying the record as read-only, why bother loading the standing data at all?
Here's what I would do:
The record will contain the country code in any case. I would also propose returning the country description with it (which admittedly makes things slightly less efficient). When the user loads "old stuff", the business service recognises that the record will be read-only and doesn't bother loading the country list (which makes things more efficient).
In my presentation service I then generally check whether the list of countries is null. If it is not (read/write), load the list into the list box; if it is (read-only), populate the list box from the data in the record. A single entry in the list equals read-only.

You can filter with CollectionViewSource (see the CollectionViewSource class documentation), or you could just create a public IEnumerable property that filters the full list using LINQ.
In the LINQ version, FieldDef.DispSearch is the active condition. Returning IEnumerable performs a little better than building a List.
public IEnumerable<FieldDefApplied> FieldDefsAppliedSearch
{
    get
    {
        return fieldDefsApplied.Where(df => df.FieldDef.DispSearch).OrderBy(df => df.FieldDef.DispName);
    }
}

Why would you still want to display (for instance) customer-addresses with their OLD country-code?
If I understand correctly, you currently have 'address' records that still point to 'Serbia and Montenegro'. I think if you solve that problem, your current question would be nonexistent.
The term "country" is perhaps a little misleading: not all the "countries" in ISO 3166 are actually independent. Rather, many of them are geographically separate territories that are legally portions or dependencies of other countries.
Also note that 'withdrawn country-codes' are reserved for 5 years, meaning that after 5 years they may be reused. So moving away from using the country-code itself as primary key would make sense to me, especially if for historical reasons you would need to back-track previous country-codes.
So why not add a 'withdrawn' field/table that points to the new country ids? You can still check (in SQL for instance, since you were already using a table) whether this field is empty to get a true/false check if you need it.
The way I see it: "country" codes may change, countries may merge and countries may divide.
If countries change or merge, you can update your address-records with a simple query.
If countries divide, you need a way to determine which address is part of which country.
You could use some automated system to do this (and write lengthy books about it).
OR
(when it is a forum-like site) you could ask the users who still have a withdrawn country pointing to multiple alternatives in their account to update their country entry at login, where they can choose only from the list of new countries specified in the withdrawn field.
Think of this simplified country-table setup:
id  cc  cn                      withdrawn
1   DE  Germany
2   CS  Serbia and Montenegro   6,7
3   RH  Southern Rhodesia       5
4   NL  The Netherlands
5   ZW  Zimbabwe
6   RS  Serbia
7   ME  Montenegro
In this example, address-records with country-id 3 get updated to country-id 5 with a query; no user interaction (or other solution) is needed.
But users whose address-records specify country-id 2 will be asked to select country-id 6 or 7 (of course, the text presented to the user shows the country name), or those records are selected for your custom automated update routine.
Also note: 'withdrawn' is a repeating group and as such you could/should make it into a separate table.
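To make the single-successor versus multiple-successor distinction concrete, here is a minimal C# sketch of the lookup, assuming the ids from the sample table above. The names WithdrawnCountries, Resolve and Choices are hypothetical, and the dictionary stands in for what would be the separate 'withdrawn' table in a real database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the ids mirror the simplified sample table above.
public static class WithdrawnCountries
{
    // Withdrawn country id -> ids of the countries that replaced it.
    private static readonly Dictionary<int, int[]> Successors = new Dictionary<int, int[]>
    {
        { 2, new[] { 6, 7 } }, // Serbia and Montenegro -> Serbia, Montenegro
        { 3, new[] { 5 } },    // Southern Rhodesia -> Zimbabwe
    };

    // Returns the id an address-record should use, or null when the user has to choose.
    public static int? Resolve(int countryId)
    {
        int[] next;
        if (!Successors.TryGetValue(countryId, out next))
            return countryId;  // not withdrawn: keep as-is
        if (next.Length == 1)
            return next[0];    // single successor: can be updated by a query
        return null;           // several successors: ask the user (or run a custom routine)
    }

    // The list of countries the user may pick from for a withdrawn id (empty if not withdrawn).
    public static IReadOnlyList<int> Choices(int countryId)
    {
        int[] next;
        return Successors.TryGetValue(countryId, out next) ? next : new int[0];
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(WithdrawnCountries.Resolve(1));          // 1
        Console.WriteLine(WithdrawnCountries.Resolve(3));          // 5
        Console.WriteLine(WithdrawnCountries.Resolve(2).HasValue); // False
    }
}
```

A null result from Resolve is the signal to show the 'ask user to update country' prompt with the ids from Choices.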
Implementing this idea (without downtime) in your scenario:
SQL statement to build a new country-table with numerical ids as primary key.
SQL statement to add a new 'country-id' field to the address-records and fill it with the country-id from the new country-table that corresponds to the country-code specified in that record's address field.
(SQL statement to) create the withdrawn table and populate it with the correct data.
Then rewrite the SQL statements that supply your forms with data.
Add the check and the 'ask user to update country' routine.
Let the new forms go live.
Wait and watch for unintended bugs.
Delete the old country-table and the (now unused) country-code column from the "address" table.
I am very curious what other experts think about this idea!!

Related

Remove list values based on series of other values

I have a situation wherein a List object is built from values pulled from an MSSQL database. However, this particular table is mysteriously getting an errant record or two tossed in. Removing the records causes trouble even though they have no referential links to any other tables, and they still get recreated without any known user action. This puts unwanted values on display, which adds confusion. The specific issue is that this is a platform that allows users to run a search for quotes, and the filtering allows for sales rep selection. The select/dropdown field is showing these errant values, and they need to be removed.
Given that deleting the offending table rows does not provide a desirable result, I was thinking that the best course of action might be to modify the code where the List object is created and either filter the values out or remove them after the object is populated. I'd like to do this in a clean, scalable fashion by providing some kind of appendable data object where I could just add a new string value if something else cropped up, as opposed to doing something clunky that adds new code to find and remove the value each time.
My thought was to create a string array, and somehow loop through that to remove bad List values, but I wasn't entirely certain that was the best way to approach this, and I could not for the life of me think of a clean approach for this. I would think that the best way would be to add a filter within the Find arguments, but I don't know how to add in an array or list that way. Otherwise I figured to loop through the values either before or after the sorting of the List and remove any matches that way, but I wasn't sure that was the best choice of actions.
I have attached the current code, and would appreciate any suggestions.
int licenseeID = Helper.GetLicenseeIdByLicenseeShortName(Membership.ApplicationName);
List<User> listUsers;
if (Roles.IsUserInRole("Admin"))
{
    //get all users
    listUsers = User.Find(x => x.LicenseeID == licenseeID).ToList();
}
else
{
    //get only the current user
    listUsers = User.Find(x => (x.LicenseeID == licenseeID && x.EmailAddress == Membership.GetUser().Email)).ToList();
}
listUsers.Sort((x, y) => string.Compare(x.FirstName, y.FirstName));
-- EDIT --
I neglected to mention that I did not develop this; I merely inherited its maintenance after the original developer(s) disappeared and my coworker who was assigned to it left the company. I'm not really skilled at handling ASP.NET sites. Many object sources are hidden and unavailable for edit, I assume because they are defined in a DLL somewhere. So, for any of these objects that are sourced from database tables, altering the tables will not help, since I would not be able to get the new data anyway.
However, I did try the following to filter out the undesirable data:
List<String> exclude = new List<String>(new String[] { "value1" , "value2" });
listUsers = User.Find(x => x.LicenseeID == licenseeID && !exclude.Contains(x.FirstName)).ToList();
Unfortunately it only resulted in an error being displayed to the page.
-- EDIT #2 --
I got the server set up to accept a new event viewer source so I could write info to the Application log to see what was happening. It looks like this installation of ASP.NET does not accept "Contains" as an action on a List object. An error gets kicked out stating that the method is not available.
I would probably add a bit column to the table to flag errant rows and then skip them when querying the table, something like
&& !ErrantData
Another way, which requires a bit more upkeep but doesn't require a DB change, would be to keep a text file that gets periodically updated; you read it and remove users from the list based on it.
The bigger issue is unknown rows creeping into your database. Changing user credentials and adding creation timestamps may help you narrow down the search scope.
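Since the question reports that Contains inside the Find expression is rejected, one workaround is to apply the exclusion in memory, after ToList() has already pulled the rows. The following is only a sketch: the User class here is a stand-in for the real type (which is not shown in the question), UserFilter and RemoveExcluded are hypothetical names, and the case-insensitive comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the real User type, which lives in an unavailable DLL.
public class User
{
    public string FirstName { get; set; }
}

public static class UserFilter
{
    // Values to hide; append a new entry when another errant row appears.
    // This set could equally be loaded from a text file or config setting.
    private static readonly HashSet<string> Excluded =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "value1", "value2" };

    // Runs on an in-memory sequence, so no query provider has to translate Contains.
    public static List<User> RemoveExcluded(IEnumerable<User> users)
    {
        return users.Where(u => !Excluded.Contains(u.FirstName ?? "")).ToList();
    }
}

public static class Demo
{
    public static void Main()
    {
        var users = new List<User>
        {
            new User { FirstName = "Alice" },
            new User { FirstName = "value1" },
            new User { FirstName = "Bob" },
        };
        foreach (User u in UserFilter.RemoveExcluded(users))
            Console.WriteLine(u.FirstName); // Alice, then Bob
    }
}
```

In the posted code this would be called on listUsers right after the if/else, before the Sort.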

Millions of rows in the database, only so much needed

Problem summary:
C# (MVC), Entity Framework 5.0 and Oracle.
I have a couple of million rows in a view which joins two tables.
I need to populate dropdownlists with filter possibilities.
The options in these dropdownlists should reflect the actual contents of the view for that column, distinct.
I want to update the dropdownlists whenever you select something, so that the new options reflect the filtered content, preventing you from choosing something that would give 0 results.
It's slow.
Question: what's the right way of getting these dropdownlists populated?
Now for more detail.
-- Goal of the page --
The user is presented with some dropdownlists that filter the data in a grid below. The grid represents a view (see "Database") where the results are filtered.
Each dropdownlist represents a filter for a column of the view. Once something is selected, the rest of the page updates. The other dropdownlists now contain the possible values for their corresponding columns that comply with the filter that was just applied in the first dropdownlist.
Once the user has selected a couple of filters, he/she presses the search button and the grid below the dropdownlists updates.
-- Database --
I have a view that selects almost all columns from two tables, nothing fancy there. Like this:
SELECT tbl1.blabla, tbl2.blabla etc etc
FROM table1 tbl1, table2 tbl2
WHERE bsl.bvz_id = bvz.id AND bsl.einddatum IS NULL;
There is a total of 22 columns: 13 VARCHARs (mostly small, 1 - 20, but one of them has a size of 2000!), 6 DATEs and 3 NUMBERs (one of size 38 and one of 15,2).
There are a couple of indexes on the tables, among which the relevant ID's for the WHERE clause.
Important thing to know: I cannot change the database. Maybe set an index here and there, but nothing major.
-- Entity Framework --
I created a Database First EDMX in my solution and also mapped the view. There are also classes for both tables, but I need data from both of them, so I don't know if I need them. The problem with selecting from either table alone would be that you can't apply half of the filtering, but maybe there are smart ways I didn't think of yet.
-- View --
My view is strongly bound to a viewModel. In there I have an IEnumerable for each dropdownlist. The getter for each of these gets its data from a single IEnumerable called NameOfViewObjects. Like this:
public string SelectedColumn1 { get; set; }
private IEnumerable<SelectListItem> column1Options;
public IEnumerable<SelectListItem> Column1Options
{
    get
    {
        if (column1Options == null)
        {
            column1Options = NameOfViewObjects.Select(item => item.Column1).Distinct()
                .Select(item => new SelectListItem
                {
                    Value = item,
                    Text = item,
                    Selected = item.Equals(SelectedColumn1, StringComparison.InvariantCultureIgnoreCase)
                });
        }
        return column1Options;
    }
}
The two solutions I've tried are:
- 1 -
Selecting all the columns I need for the dropdownlists in a LINQ query (the 2000 varchar is not one of them and there are only 2 date columns), doing a distinct on them and putting the results into a HashSet. Then I set NameOfViewObjects to point to this HashSet. I have to wait about 2 minutes for that to complete, but after that, populating the dropdownlists is almost instant (maybe a second for each of them).
model.Beslissingen = new HashSet<NameOfViewObject>(dbBes.NameOfViewObject
    .DistinctBy(item => new
    {
        item.VarcharColumn1,
        item.DateColumn1,
        item.DateColumn2,
        item.VarcharColumn2,
        item.VarcharColumn3,
        item.VarcharColumn4,
        item.VarcharColumn5,
        item.VarcharColumn6,
        item.VarcharColumn7,
        item.VarcharColumn8
    })
);
The big problem here is that the NameOfViewObject object is probably quite large, and even though the distinct reduces this to fewer than 100,000 results, it still uses over 500 MB of memory. This is unacceptable, because there will be a lot of users using this screen (a lot would be... 10 max, 5 on average simultaneously).
- 2 -
The other solution is to use the same LINQ query and point NameOfViewObjects at the IQueryable it produces. This means that every time the view wants to bind a dropdownlist to an IEnumerable, it fires a query that finds the distinct values for that column in a table with millions of rows, where the column it's getting the values from is most likely not indexed. This takes around 1 minute for each dropdownlist (I have 10), so that takes ages.
Don't forget: I need to update the dropdownlists every time one of them has its selection changed.
-- Question --
So I'm probably going at this the wrong way, or maybe one of these solutions should be combined with indexing all of the columns I use, maybe I should use another way to store the data in memory, so it's only a little, but there must be someone out there who has done this before and figured out something smart. Can you please tell me what would be the best way to handle a situation like this?
Acceptable performance:
having to wait for a while (2 minutes) while the page loads, but everything is fast after that
having to wait for a couple of seconds every time a dropdownlist changes
the page does not use more than 500 MB of memory
Of course you should have indexes on all columns and combinations in WHERE clauses. No index means table scan and O(N) query times. Those cannot scale under any circumstance.
You do not need millions of entries in a drop down. You need to be smarter about filtering the database down to manageable numbers of entries.
I'd take a page from Google. Their type ahead helps narrow down the entire Internet graph into groups of 25 or 50 per page, with the most likely at the top. Maybe you could manage that, too.
Perhaps a better answer is something like a search engine. If you were a Java developer you might try Lucene/SOLR and indexing. I don't know what the .NET equivalent is.
The first thing to check is your DB: make sure you have the right indexes and entity relations in place.
Next, if you want to build your filter options dynamically, you need to run the query with the existing filters to find out what the next filter can be. There are several ways to do this.
Firstly, you can query the data and extract the values from the result. This has a huge load time and wastes time returning data you don't want (unless you are live-updating the results with the filter and don't have paging, in which case you might as well just get all the data and use LINQ to Objects to filter).
A second option is to run parallel queries, one per filter, that return the possible values: filter A = all possible values of A in the data, filter B = all possible values of B when filtered by A, filter C = all possible values of C when filtered by A and B, etc. This is better than the first option, but not by much.
Another option is to use aggregates to speed things up, i.e. you run parallel queries as above, but instead of returning the data you return how many records match. Aggregate functions are generally quicker, so this should cut your load time considerably, but you are still repeatedly querying a huge dataset, so it won't be exactly nippy.
You can tweak this further by using EXISTS to just return a 0 or 1.
In this case you would start from a table of all possible filter values and then remove the ones for which the parallel query finds no rows.
The next option, which will be the fastest by a mile, is to cache the filters in the DB in a separate table.
Then you can query that and say: from Cache where filter = ABC select D. The problem with this is maintaining the cache, which you would have to do in the DB as part of the save functions, triggers, etc.
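To illustrate the "from Cache where filter = ABC select D" idea, here is a small C# sketch that works on a cached set of distinct filter combinations. It keeps the cache in memory rather than in a DB table, which is a simplification on my part; the column names (Region, Status) and the FilterOptions / OptionsFor names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterOptions
{
    // Each row of 'combos' is one distinct combination of filter-column values
    // (column name -> value). Returns the values still available for 'column',
    // given the selections already made in the OTHER columns.
    public static List<string> OptionsFor(
        IEnumerable<Dictionary<string, string>> combos,
        string column,
        IDictionary<string, string> selected)
    {
        return combos
            .Where(row => selected.All(s => s.Key == column || row[s.Key] == s.Value))
            .Select(row => row[column])
            .Distinct()
            .OrderBy(v => v, StringComparer.Ordinal)
            .ToList();
    }
}

public static class Demo
{
    public static void Main()
    {
        var combos = new List<Dictionary<string, string>>
        {
            new Dictionary<string, string> { { "Region", "North" }, { "Status", "Open" } },
            new Dictionary<string, string> { { "Region", "North" }, { "Status", "Closed" } },
            new Dictionary<string, string> { { "Region", "South" }, { "Status", "Open" } },
        };
        var selected = new Dictionary<string, string> { { "Status", "Closed" } };

        // Only North has a Closed row, so South drops out of the Region dropdown.
        Console.WriteLine(string.Join(", ", FilterOptions.OptionsFor(combos, "Region", selected))); // North
    }
}
```

A column's own selection is deliberately ignored when computing its options, so the user can still switch to a different value in the dropdown they just used.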
Another solution that can be added in addition to the previous suggestions is to use the /*+ result_cache */ hint, if your version of Oracle supports it (Oracle version 11g or later). If the output of the query is small enough for a drop-down list, then when a user enters criteria that matches the same criteria another user used, the results are returned in a few milliseconds instead of a few seconds or minutes. Result cache is wonderful for queries that return a small set of rows out of millions.
select /*+ result_cache */ item_desc from some_table where item_id ...
The result cache is automatically flushed when any insert/updates/deletes occur on the database tables.
I've done something 'kind of' similar in the past - if you can add a table to the database then I'd explore introducing a 'scratchpad' type table where results are temporarily stored as the user refines their search. Since multiple users could be working simultaneously the table would have to have an additional column for identifying the user.
I'd think you'd see some performance benefit since all processing is kept server-side and your app would simply be pulling data from this table. Since you're adding this table you would also have total control over it.
Essentially I'd imagine the program flow would go something like:
User selects some filters and clicks 'Search'.
Server populates scratchpad table with results from that search.
App populates results grid from scratchpad table.
User further refines search and clicks 'Search'.
Server removes/adds rows to scratchpad table as necessary.
App populates results grid from scratchpad table.
And so on.
Rather than having all the users' results in one 'scratchpad' table, you could possibly explore having temporary 'scratchpad' tables per user.

Using a SQL table to handle state transitions

I am working on an Order (MVC) system where the orders transition to different states, e.g. new order, paid, shipped, etc. Each state can have multiple transitions. Originally I thought I would have a status table with an ID and Description, and then a transition table holding the current status and the status it can transition to, with each transition on a single row. In order to populate a selection box, I would have to do a join to get the descriptions. Now I am thinking I could do it all in one table and add a comma-separated column listing the possible transitions. Is this a good idea or is there a better way?
Any RDBMS promotes database normalization. There are 6 forms of database normalization; normally, if you can get to the first three, that is good enough.
The First Normal Form states that you should have only one piece of information in a column, and a column should store only one piece/type of information.
Now, in your case, you are trying to save a comma-delimited list of transitions. What if you have to pick only the records with a particular transition state? It will be a messy query.
Also imagine a scenario where you have to update the column for a particular record when a transition state changes: again a very messy, error-prone, performance-killing query.
Therefore follow the basic rules of database normalization and stick to your first idea, which was to create a separate table, use IDs to define transition states, and add a new row whenever a transition changes.
My Suggestion
Simply have one column for [current status] and one column for [transition], and add a new row every time any of the values change.
Also add a datetime column with a default value of the current datetime, which will allow you to go back in history and see the different status and transition states of a record at a point in time.
Store this information in only one column of only one table, and reference that column from other tables if you need to.
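For what it's worth, the one-row-per-transition layout from the question also maps cleanly onto code. The C# sketch below models the transition table as a list of rows and looks up the allowed next states for a selection box. The statuses and the allowed transitions are assumptions for illustration (Cancelled is not in the question), and OrderWorkflow, NextStates and CanMove are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed statuses; only New, Paid and Shipped are mentioned in the question.
public enum OrderStatus { New, Paid, Shipped, Cancelled }

// One instance corresponds to one row of the transition table.
public class Transition
{
    public OrderStatus Current { get; set; }
    public OrderStatus Next { get; set; }
}

public static class OrderWorkflow
{
    // In a real system these rows would be loaded from the transition table.
    private static readonly List<Transition> Transitions = new List<Transition>
    {
        new Transition { Current = OrderStatus.New,  Next = OrderStatus.Paid },
        new Transition { Current = OrderStatus.New,  Next = OrderStatus.Cancelled },
        new Transition { Current = OrderStatus.Paid, Next = OrderStatus.Shipped },
        new Transition { Current = OrderStatus.Paid, Next = OrderStatus.Cancelled },
    };

    // The values to offer in the selection box for an order in 'current'.
    public static List<OrderStatus> NextStates(OrderStatus current)
    {
        return Transitions.Where(t => t.Current == current).Select(t => t.Next).ToList();
    }

    // Server-side check that a requested change is actually allowed.
    public static bool CanMove(OrderStatus current, OrderStatus next)
    {
        return Transitions.Any(t => t.Current == current && t.Next == next);
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(string.Join(", ", OrderWorkflow.NextStates(OrderStatus.New))); // Paid, Cancelled
        Console.WriteLine(OrderWorkflow.CanMove(OrderStatus.Shipped, OrderStatus.New));  // False
    }
}
```

With a comma-separated column, both lookups would first need string splitting and parsing, which is the messiness the answer warns about.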

Defining Status of data via Enum or a relation table

I have an application which has rows of data in a relational database. The table needs a status, which will always be one of:
Not Submitted, Awaiting Approval, Approved, Rejected
Now since these will never change, I am trying to decide the best way to implement them. I can think of either a Status enum with the values and an int assigned, where the int is stored in the status column of the table row,
or a status table linked to the table, from which the user selects one as the current status.
I can't decide which is the better option. I currently have an enum in place with these values, which the approval pages use to populate the dropdown etc. and to set up the SQL (it currently uses two bools, Approved and Submitted For Approval, but this is dirty for various reasons and needs to be changed).
I'm wondering what your thoughts on this are and whether I should go for one or the other.
If it makes any difference, I am using Entity Framework.
I would go with the Enum if it never changes since this will be more performant (no join to get the status). Also, it's the simpler solution :).
Now since these will never change...
You can count on this assumption being false, and sooner than you think.
I would use a lookup table. It's far easier to add or change values in a lookup table than to change the definition of an enum.
You can use a natural primary key in the lookup table so you don't need to do a join to get the value. Yes a string takes a bit more space than an integer id, but if your goal is to avoid the join this will accomplish that goal.
I use Enums and use the [Description("asdf")] attribute to bind meaningful sentences or other things that aren't allowed in Enums. Then use the Enum text itself as a value in drop downs and the Description as the visible text.
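As a rough sketch of the Description approach, the C# below reads System.ComponentModel.DescriptionAttribute from an enum member via reflection and falls back to the member name when no attribute is present. The ApprovalStatus enum uses the four statuses from the question; the EnumText and Describe names are just for illustration.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

public enum ApprovalStatus
{
    [Description("Not Submitted")]
    NotSubmitted = 0,
    [Description("Awaiting Approval")]
    AwaitingApproval = 1,
    Approved = 2,   // no attribute: the member name is already readable
    Rejected = 3
}

public static class EnumText
{
    // Returns the [Description] text of an enum member, or its name if none is set.
    public static string Describe(Enum value)
    {
        FieldInfo field = value.GetType().GetField(value.ToString());
        DescriptionAttribute attr =
            field == null ? null : field.GetCustomAttribute<DescriptionAttribute>();
        return attr != null ? attr.Description : value.ToString();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Enum name as the dropdown value, description as the visible text.
        foreach (ApprovalStatus s in Enum.GetValues(typeof(ApprovalStatus)))
            Console.WriteLine("{0} -> {1}", s, EnumText.Describe(s));
    }
}
```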

Saving multiple items per single database cell

I have a list of countries. Each user can check multiple countries. Once saved, this "user country list" will be used to determine whether other users fit into the countries a certain user chose.
The question is what would be the most efficient approach to this problem...
I have one idea: save the user's selection as a delimited list like Canada,USA,France ... in a single varchar(max) field. The problem is what happens once a user from Germany enters the page I perform this check on. To search for Germany I would need to get all items and un-delimit each field to check against the value, or use SQL 'like', which again is pretty damn slow.
If you have a better solution or some tips, I would be glad to hear them.
Just to be clear: many users will have their own selections of countries, and they want only users from those countries to land on their page, while millions of users will reach those pages. So the faster the approach, the better.
Technology: MSSQL and ASP.NET
Thanks
You should not store a list of values in one cell. Consider having a separate table that stores each of the selected countries with a foreign key reference to the user table. This is standard Database Normalization.
PLEASE don't go down the route you're thinking of, storing multiple entries in one field. I've had to re-write more applications because of bad database design than for any other reason, and that is a bad design.
Added
I have this poster on my wall at work: http://www.informationqualitysolutions.com/FreeStuff/rettigNormalizationPoster.pdf
One of my predecessors was a newbie to DB Design, and this helped her a lot. I keep it for any new hires that may need it. It explains normalization very nicely, with examples.
Do not save delimited fields into your database. Your database will not be normalized.
You need a many-to-many table for users and countries:
UserId
CountryId
If you do start using a delimited field, you end up needing to parse it (either in SQL or your Code). It is more difficult to query and optimize.
In this case, you will want to create a table called UserCountries (or some such) which stores the UserID and CountryID. This is a standard relational construct. To beginners it seems strange and too involved, but this structure makes it very easy and very fast to write flexible queries against this type of data. No delimiting required!
I think it would be better to use a UserCountry table, which contains a link to the User and the Country table. This creates a lot more possibilities to query against the database. Example queries that are much simpler this way:
Number of Countries per user
All users which selected a particular country
Sort all popular countries
Do not store multiple countries in a single field. Add 2 additional tables - Countries (ID, Name) and UserCountries (UserID, CountryID)
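To show why the junction table makes the example queries simple, here is a C# sketch that models UserCountries rows in memory and runs two of them with LINQ. In the real application these would be SQL or Entity Framework queries against the table; the class and method names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One instance corresponds to one row of the UserCountries table.
public class UserCountry
{
    public int UserId { get; set; }
    public int CountryId { get; set; }
}

public static class UserCountryQueries
{
    // All users who selected a particular country: a plain equality filter, no string parsing.
    public static List<int> UsersWhoSelected(IEnumerable<UserCountry> rows, int countryId)
    {
        return rows.Where(r => r.CountryId == countryId)
                   .Select(r => r.UserId)
                   .Distinct()
                   .OrderBy(id => id)
                   .ToList();
    }

    // Number of countries per user.
    public static Dictionary<int, int> CountriesPerUser(IEnumerable<UserCountry> rows)
    {
        return rows.GroupBy(r => r.UserId).ToDictionary(g => g.Key, g => g.Count());
    }
}

public static class Demo
{
    public static void Main()
    {
        var rows = new List<UserCountry>
        {
            new UserCountry { UserId = 1, CountryId = 10 },
            new UserCountry { UserId = 1, CountryId = 20 },
            new UserCountry { UserId = 2, CountryId = 20 },
        };
        Console.WriteLine(string.Join(", ", UserCountryQueries.UsersWhoSelected(rows, 20))); // 1, 2
        Console.WriteLine(UserCountryQueries.CountriesPerUser(rows)[1]);                     // 2
    }
}
```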
