So I am making an inventory system as a school project and I want to generate unique IDs for the items, so that each ID identifies a particular item.
Is it possible to generate an ID which is based on certain fields such as the item name, the price, the expiration.
Is there some existing library I can use for this?
EDIT: It is okay for my system to have duplicate ids because it will mean that the item already exists in the system and does not need to be added again.
Is it possible to generate an ID which is based on certain fields such as the item name, the price, the expiration.
Yes: a hash.
For example:
// Requires: using System; using System.Globalization; using System.Security.Cryptography; using System.Text;
// It's very important to use `InvariantCulture` and "o" to ensure consistent formatting on all computers
String infoText = item.Name + item.Price.ToString( "C2", CultureInfo.InvariantCulture ) + item.Expiration.ToString( "o" );
Byte[] infoBytes = Encoding.UTF8.GetBytes( infoText );
// SHA256.Create() is used instead of SHA256Cng so the snippet also runs outside the .NET Framework
using( SHA256 sha = SHA256.Create() )
{
    Byte[] hash = sha.ComputeHash( infoBytes );
    String hashText = Convert.ToBase64String( hash );
    Console.WriteLine( "{0} = {1}", infoText, hashText );
}
A hash (also known as a "digest") will always be the same for the same input, and with a cryptographic hash such as SHA-256 it will, for all practical purposes, be different for different input (collisions are possible in theory but astronomically unlikely).
So if you have an SKU containing $10 worth of apples that expires on 2019-10-09, and you feed that into the code above, then it will generate a unique code you can use without needing to store the mapping between that SKU and that unique code (also known as "content-based addressing").
...so if you come across another $10 worth of apples that also expire on 2019-10-09, then it will have the same unique code, even though it's a different object, and you didn't need to memorize that unique code you generated earlier.
But if you come across $10 worth of pears that expire on 2019-10-09, or $20 worth of apples that also expire on the same day, they'll have a different code.
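The apples-and-pears behaviour described above can be checked with a small helper. This is only a sketch: the `ItemId` class, its `Compute` method and the `|` separator between fields are assumptions added for illustration and are not part of the answer's code. The separator is there so that adjacent fields cannot run together and produce the same text for different items.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: derives an ID from the item's own fields.
public static class ItemId
{
    public static string Compute(string name, decimal price, DateTime expiration)
    {
        // Invariant formatting keeps the text identical on every machine.
        // "F2" writes the price with exactly two decimals (10 -> "10.00").
        string infoText = string.Join("|",
            name,
            price.ToString("F2", CultureInfo.InvariantCulture),
            expiration.ToString("o", CultureInfo.InvariantCulture));

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(infoText));
            return Convert.ToBase64String(hash);
        }
    }
}
```

Calling `ItemId.Compute("Apples", 10m, expiry)` twice with the same values returns the same string, while changing the name, the price or the date changes it. A name that itself contains `|` could still be ambiguous, so a real system might want a stricter encoding.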
You could create the ID by concatenating all of the information together as a big string or even hashing that string.
However, this can still give you duplicates if all of that information is the same several times. There doesn't seem to be any need to use the existing information to generate the ID. If you are basing it on existing data then there will always be a chance of duplicates, unless there is some sort of constraint on one of the fields requiring it to be unique. But if that field is unique then you could just use it as the ID in the first place.
If you are storing this data in a database, I would suggest creating an ID field, making it the primary key and defining it as an identity column. This will automatically increment the ID so it will always be unique.
You could also just generate a GUID by using Guid.NewGuid() and use that as the ID.
Is it possible to generate an ID which is based on certain fields such as the item name, the price, the expiration.
Joining these three strings, or even hashing them, still leaves a chance of duplicates.
A simple question needs a simple solution: why not use a running number? Since you have an inventory system, the first item in it would be ID#1 (or you can start with any number you like).
But if you insist on using information from certain fields, I would suggest combining all of the following:
combination of fields information
timestamp
user id(person who perform the insert)
maybe your favorite colors
hashing all information above
I'm working on an ASP.NET app that is not really a distributed application per se, but at some point will have all of its data synchronized to the master node.
In order to be able to store the data from different nodes in one single table
without having a collision of ids, the approach that was taken was:
I. Not use auto generated ids
II. The row Id would be composed of a concatenation of the NodeId + NextRowId
The NextRowId is generated by:
Selecting the highest id from one specific node,
Splitting it into 2 parts, the first part being the NodeId and the second being the LastDocumentId
Incrementing the LastDocumentId
Concatenate the NodeId with the incremented LastDocumentId
Eg
Id = 20099, split into (NodeId = 200, LastDocumentId = 99)
LastDocumentId + 1 = 100
NextRowId = 200100
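The steps above can be sketched as a single function. This is a minimal illustration, not the application's actual code: the `RowIdSketch` class and the `nodeIdDigits` parameter are assumptions, since the split only works if the number of digits in the NodeId is known (three for node 200 in the example).

```csharp
using System;
using System.Globalization;

// Hypothetical sketch of the NodeId + LastDocumentId scheme.
public static class RowIdSketch
{
    public static long NextRowId(long highestId, int nodeIdDigits)
    {
        string text = highestId.ToString(CultureInfo.InvariantCulture);

        // Split into the node prefix and the per-node document counter.
        string nodeId = text.Substring(0, nodeIdDigits);
        long lastDocumentId = long.Parse(text.Substring(nodeIdDigits), CultureInfo.InvariantCulture);

        // Increment the counter and glue the two parts back together.
        string next = nodeId + (lastDocumentId + 1).ToString(CultureInfo.InvariantCulture);
        return long.Parse(next, CultureInfo.InvariantCulture);
    }
}
```

Note that the function is deterministic: two callers that read the same highest id get the same result.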
This works perfectly in theory, or if the requests are processed in a sequential way. However, if multiple requests are processed at same time they often end up generating the same id.
So in practice there is a collision of ids when multiple users try to update the same table at the same time.
I have had a look at the best practices on generating unique ids for distributed systems. However, none of them is a viable option at this point in time, as they would require a rethinking of the whole architecture and lots and lots of refactoring. Both require time which management will not allow me to take.
So what are the other ways that I can ensure that ids generated are unique or that the requests are processed in a sequential way? All this, ideally without having to restructure the application or cause performance bottlenecks.
Create a unique constraint on your key column. If you happen to insert the same id twice, catch the exception and regenerate your id.
You probably want to use Guids instead.
That said, if you need to know which node your data is associated with, you should model your database accordingly: have 2 columns, NodeId and DocumentId. You can also create a unique constraint across multiple columns.
I know similar questions have been asked, but I have a rather different scenario here.
I have a SQL Server database which will store TicketNumber and other details. This TicketNumber is generated randomly from a C# program, which is passed to the database and stored there. The TicketNumber must be unique, and can be from 000000000-999999999.
Currently, what I do is: I will do a select statement to query all existing TicketNumber from the database:
Select TicketNumber from SomeTable
After that, I will load all the TicketNumber into a List:
List<int> temp = new List<int>();
//foreach loop to add all numbers to the List
Random random = new Random();
int randomNumber = random.Next(0, 1000000000);
if (!temp.Contains(randomNumber))
{
    //Add this new number to the database
}
There is no problem with the code above; however, as the dataset gets larger, the performance deteriorates (I have close to a hundred thousand records now). I'm wondering if there is a more efficient way of handling this?
I can do this from either the C# application or the SQL Server side.
This answer assumes you can't change the requirements. If you can use a hi/lo scheme to generate unique IDs which aren't random, that would be better.
I assume you've already set this as a primary key in the database. Given that you've already got the information in the database, there's little sense (IMO) in fetching it to the client as well. That goes double if you've got multiple clients (which seems likely - if not now then in the future).
Instead, just try to insert a record with a random ID. If it works, great! If not, generate a new random number and try again.
After 1000 days at roughly 1,000 inserts a day, you'll have a million records out of a billion possible numbers, so roughly one in a thousand inserts will fail. That's only one a day - unless you've got some hard limit on the insertion time, that seems pretty reasonable to me.
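A minimal sketch of the insert-and-retry idea follows. The `TicketNumbers` class, the `tryInsert` callback and the `maxAttempts` limit are all assumptions made for illustration: `tryInsert` stands in for whatever code performs the INSERT and reports a unique-key violation, which depends on the data access layer in use.

```csharp
using System;

// Hypothetical sketch: generate a random ticket number and let the database's
// unique key decide whether it is free.
public static class TicketNumbers
{
    // tryInsert should attempt the INSERT and return false when the database
    // rejects the row because the number already exists.
    public static int InsertWithRandomNumber(Random random, Func<int, bool> tryInsert, int maxAttempts = 10)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            int candidate = random.Next(0, 1000000000); // 0 to 999,999,999 inclusive
            if (tryInsert(candidate))
            {
                return candidate;
            }
        }

        throw new InvalidOperationException(
            "No free ticket number found after " + maxAttempts + " attempts.");
    }
}
```

For experimenting without a database, a `HashSet<int>` can play the role of the unique key, because its `Add` method returns false for a value that is already present: `TicketNumbers.InsertWithRandomNumber(new Random(), used.Add)`.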
EDIT: I've just thought of another solution, which would take a bunch of storage, but might be quite reasonable otherwise... create a table with two columns:
NaturalID ObfuscatedID
Prepopulate that with a billion rows, which you generate by basically shuffling all the possible ticket IDs. It may take quite a while, but it's a one-off cost.
Now, you can use an auto-incrementing ID for your ticket table, and then either copy the corresponding obfuscated ID into the table as you populate it, or join into it when you need the ticket ID.
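The shuffled lookup table can be illustrated in memory with a Fisher-Yates shuffle. This is a sketch under assumptions: the `ObfuscatedIds` class is hypothetical, the array index plays the role of NaturalID and the stored value the role of ObfuscatedID, and a real billion-row table would be generated once and bulk-loaded rather than held in an array.

```csharp
using System;

// Hypothetical sketch of the NaturalID -> ObfuscatedID mapping.
public static class ObfuscatedIds
{
    // Returns an array where index = NaturalID and value = ObfuscatedID.
    public static int[] BuildMapping(int count, Random random)
    {
        int[] ids = new int[count];
        for (int i = 0; i < count; i++)
        {
            ids[i] = i;
        }

        // Fisher-Yates shuffle: every ID appears exactly once, in random order.
        for (int i = count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1); // 0 <= j <= i
            int tmp = ids[i];
            ids[i] = ids[j];
            ids[j] = tmp;
        }

        return ids;
    }
}
```

Because the result is a permutation, uniqueness comes for free and no retry is needed at insert time. `System.Random` is not cryptographically secure, so if ticket numbers must be hard to guess, a stronger random source would be worth considering.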
You can create a separate table with only one column. Let's just name it UniqueID for now. Populate that column with UniqueID = 000000000-999999999. Every time you want to generate a random number, do something like
SELECT TOP 1 UniqueID FROM (Table) WHERE UniqueID NOT IN (SELECT ID FROM (YOUR TABLE)) ORDER BY NEWID()
Code has not been tested but just to show the idea
I have a table and it has one of the attribute set as identity. I want to get the value of the identity attribute that would be generated after I enter a value to the database.
I have EmpTable made of EmpID and EmpName. EmpID is set as Identity. I want to fetch the EmpID value before inserting a new row to the database.
I would advise against trying to do this with a table that is set up to use an integer column as the primary key. You will run into concurrency problems if you simply fetch the previous ID and increment it. Instead you should use a GUID (uniqueidentifier in SQL) as your primary key.
This will allow you to generate a new GUID in your code that can safely be saved to the database at a later stage.
http://msdn.microsoft.com/en-us/library/system.guid.newguid.aspx
http://msdn.microsoft.com/en-us/library/ms187942.aspx
Sure the server knows where the auto-increment count is in its sequence, but there is almost nothing useful you can do with that information. Imagine you go to the Post Office and they hand out numbered tickets so they can serve customers in order. Of course you could ask them what the next number they'll give out is, but since anyone can walk in at any time you don't know you'll get that number. If you don't know that you'll get it, you can't do anything with it - e.g. writing it as a reference number on a form would be a mistake.
Depending on what you're trying to do, your two main options are:
Use a client-generated guid as your identifier. This kind of messes up the order so the analogy isn't great, but imagine if each customer who walked in could generate a random number that they are sure would never have been used before. They could use that to fill out forms before taking a number.
Take a number, but do it in a transaction with the other operations. A customer can take a number and use it to fill out some paperwork. If they realize they left their money at home, they just throw everything away and you never call their number.
Why do you think you need this information? Can you use either of these strategies instead?
I am entering the student id as a random number into the DB
int num = r.Next(1000);
Session["number"] = "SN" + (" ") + num.ToString();
But is there any chance of getting a duplicate number? How can I avoid this?
EDIT: I have an identity column and the student id is separate from the ID; I am going to enter a random student id into the DB from the UI.
It is a very common task to have a column in a DB that is merely an integer unique ID. So much so that every database I've ever worked with has a specific column type, function, etc. for dealing with it. It will vary based on whatever specific database you use, but you should figure out what that is and use it.
You need a value that is unique, not random. The two are different. Random numbers repeat; they aren't unique. Unique numbers also aren't necessarily random. For example, if you just increment numbers up from 0 they will be unique, but that's not in any way random.
You could use a GUID, which would be unique, but it would be 128 bits. That's pretty big. Most databases will just have a counter that they increment every time you add an item, so 32 bits is usually enough. This will save you a lot of space. Incrementing a counter is also quicker than calculating a GUID's new value. For DB operations that tend to involve adding lots of items, that could matter.
As Jodrell mentions in the comments, you should also consider the size of the index if you use a GUID or other large field. Storing and maintaining that index will be much more expensive (in both time and space) with a column that needs that many more bits.
If you try to do something yourself there's a good chance you'll do it wrong. Either your algorithm won't be entirely unique, it will have race conditions due to improper synchronization, it will be less performant because of excessive synchronization, it will be significantly larger because that's what it took to reduce the risk of collisions, etc. At the end of the day the database will have access to tools that you don't; let it take care of it so you don't need to worry about what you could mess up.
Yes, there is a very good chance that you will get a duplicate number. Next(1000) just gives you a number from 0 to 999, and there is no guarantee that it has not already been returned by an earlier call.
If you are trying to work with unique values, look into using Guids instead of integers, or use a constantly increasing integer value instead of a random number. Here is the reference page on Guid:
http://msdn.microsoft.com/en-us/library/system.guid.aspx
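To put a number on how likely a duplicate is, here is a small sketch of the standard "birthday problem" calculation. The `DuplicateOdds` class is hypothetical, and the calculation assumes every value is equally likely, which is what `Random.Next(1000)` is meant to approximate.

```csharp
using System;

// Hypothetical helper: chance that at least two of `draws` random picks
// from `possibleValues` equally likely values are the same.
public static class DuplicateOdds
{
    public static double ChanceOfDuplicate(int draws, int possibleValues)
    {
        double allDistinct = 1.0;
        for (int i = 0; i < draws; i++)
        {
            // The (i+1)-th pick must avoid the i values already taken.
            allDistinct *= (double)(possibleValues - i) / possibleValues;
            if (allDistinct <= 0)
            {
                return 1.0; // more picks than values: a duplicate is certain
            }
        }

        return 1.0 - allDistinct;
    }
}
```

With 1,000 possible values the chance of a duplicate passes 50% at around 38 students, long before the range is exhausted, and it becomes a certainty once there are more than 1,000.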
You can use Guids instead of a random int; for practical purposes they are always going to be unique.
There is no way to guarantee an int is unique unless you check every one that already exists, and even then, as the comments say, you are guaranteed duplicates once you pass 1000 ids.
EDIT:
I mention that I think Guids are best here because of the question. First, indexing the table is not going to take long at all: it is assumed that there are going to be fewer than 1000 students because of the range of the random number, and 128 bits is fine in a table with fewer than 1000 rows.
Guids are a good thing to learn, even though they are not always the most efficient way.
Creating a unique Guid in C# has the benefit that you can keep using and displaying that id, as in the question, without another trip to the DB to figure out which unique id was assigned to the student.
Yes, you will get duplicates. If you want a truly unique item, you will need to use Guid. If you still want to use numbers, then you will need to keep track of the numbers you have already used, similar to identity column in database.
Yes, you will certainly get duplicates. You could use a GUID instead:
Guid g = Guid.NewGuid();
GUIDs are theoretically "Globally Unique".
You can try to generate id using Guid:
Session["number"] = "SN" + (" ") + Guid.NewGuid().ToString();
It will greatly decrease the chance of getting a duplicate id.
If you are using random numbers then no, there is no way of avoiding it. There will always be a chance of a collision.
I think what you are probably looking for is an Identity column, or whatever the equivalent is for your database server.
In LINQ to SQL it is possible to mark a column as a database-generated primary key like this:
[Column ( IsPrimaryKey = true, IsDbGenerated = true )]
public int ID { get; set; }
I don't know if it helps you in ASP, but maybe it is a good hint...
Yes there is a chance of course.
Quick solution:
Check if it is a duplicate number first and try again until it is no longer a duplicate number.
Let's say we have a code list of all the countries including their country codes. The country code is primary key of the Countries table and it is used as a foreign key in many places in the database. In my application the countries are usually displayed as dropdowns on multiple forms.
Some of the countries that used to exist in the past don't exist any more, for example Serbia and Montenegro, which had the country code SCG.
I have two objectives:
don't allow the user to use these old values (so these values should not be visible in dropdowns when inserting data)
the user should still be able to (readonly) open old stuff and in this case the deprecated values should be visible in dropdowns.
I see two options:
Rename deprecated values, for instance from 'CountryName' to '!!!!!CountryName'. This approach is the easiest to implement, but with obvious drawbacks.
Add an IsActive column to the Countries table and set it to false for all deprecated values and true for all others. On all the forms where the user can insert data, display only values which are active. On the readonly forms we can display all values (including deprecated ones) so the user will be able to display old data. But on some of my forms the user should also be able to edit data, which means that the deprecated values should be hidden from them. That means that each dropdown should have some initialization logic like this: if the data displayed is readonly, then include deprecated values in the dropdown, and if the data is also editable, then exclude them. But this is a lot of work and error prone too.
Any other ideas?
I deal with this scenario a lot, and use the 'Active' flag to solve the problem, much as you described. When I populate a drop-down list with values, I only load 'active' data and include up to one deprecated value, but only if it is being used (i.e. if I am looking at a person record, and that person has a deprecated country, then that country would be included in the drop-down list along with the active countries). I do this in read-only AND in edit modes, because in my cases, if a person record (for example) has a deprecated country listed, they can continue to use it, but once they change it to a non-deprecated country and save it, they can never switch back (your use case may vary).
So the key difference is that even in read-only mode I don't add all the deprecated countries to the DDL, just the deprecated country that applies to the record I am looking at, and even then only if it was already in use on that record.
Here is an example of the logic I use when loading the drop down list:
protected void LoadSourceDropdownList(bool AddingNewRecord, int ExistingCode)
{
    using (Entities db = new Entities())
    {
        if (AddingNewRecord) // when we are adding a new record, only show 'active' items in the drop-downlist.
            ddlSource.DataSource = (from q in db.zLeadSources where (q.Active == true) select q);
        else // for existing records, show all active items AND the current value.
            ddlSource.DataSource = (from q in db.zLeadSources where ((q.Active == true) || (q.Code == ExistingCode)) select q);
        ddlSource.DataValueField = "Code";
        ddlSource.DataTextField = "Description";
        ddlSource.DataBind();
        ddlSource.Items.Insert(0, "--Select--");
        ddlSource.Items[0].Value = "0";
    }
}
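The same filtering rule can be tried without a database or a web form. The sketch below is an in-memory illustration using LINQ to Objects; the `Country` class, the `CountryDropdown` helper and the convention that a null `currentCode` means "adding a new record" are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of a code-list entry.
public sealed class Country
{
    public string Code { get; set; }
    public string Name { get; set; }
    public bool Active { get; set; }
}

public static class CountryDropdown
{
    // Returns all active countries plus, for an existing record, the one
    // deprecated country that the record currently uses (if any).
    // Pass null as currentCode when adding a new record.
    public static List<Country> ItemsFor(IEnumerable<Country> all, string currentCode)
    {
        return all
            .Where(c => c.Active || (currentCode != null && c.Code == currentCode))
            .OrderBy(c => c.Name)
            .ToList();
    }
}
```

For a record that still points at SCG, `ItemsFor(countries, "SCG")` returns the active countries plus Serbia and Montenegro, while `ItemsFor(countries, null)` returns only the active ones.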
If you are displaying the record as read-only, why bother loading the standing data at all?
Here's what I would do:
The record will contain the country code in any case. I would also propose returning the country description (which admittedly makes things less efficient), but when the user loads "old stuff", the business service recognises that this record will be read-only, and you don't bother loading the country list (which makes things more efficient).
In my presentation service I will then generally do a check to see whether the list of countries is null. If not (r/w), load the data into the list box; if so (r/o), populate the list box from the data in the record - a single entry in the list equals read-only.
You can filter with CollectionViewSource, or you could just create a public IEnumerable property that filters the full list using LINQ.
CollectionViewSource Class
LINQ: FieldDef.DispSearch is the active condition. Returning an IEnumerable performs a little better than building a List.
public IEnumerable<FieldDefApplied> FieldDefsAppliedSearch
{
    get
    {
        return fieldDefsApplied.Where(df => df.FieldDef.DispSearch).OrderBy(df => df.FieldDef.DispName);
    }
}
Why would you still want to display (for instance) customer-addresses with their OLD country-code?
If I understand correctly, you currently still have 'address'-records that point to 'Serbia and Montenegro'. I think if you solve that problem, your current question would be non-existent.
The term "country" is perhaps a little misleading: not all the "countries" in ISO 3166 are actually independent. Rather, many of them are geographically separate territories that are legally portions or dependencies of other countries.
Also note that 'withdrawn country-codes' are reserved for 5 years, meaning that after 5 years they may be reused. So moving away from using the country-code itself as primary key would make sense to me, especially if for historical reasons you would need to back-track previous country-codes.
So why not make a 'withdrawn' field/table that points to the new country-ids? You can still check (in SQL for instance, since you were already using a table) whether this field is empty or not, to get a true/false check if you need it.
The way I see it: "country" codes may change, countries may merge and countries may divide.
If countries change or merge, you can update your address-records with a simple query.
If countries divide, you need a way to determine which address is part of which country.
You could use some automated system to do this (and write lengthy books about it).
OR
(when it is a forum-like site), you could ask the users that still have a withdrawn country that points to multiple alternatives in their account to update their country-entry at login, where they can only choose from the list of new countries that are specified in the withdrawn field.
Think of this simplified country-table setup:
id  cc  cn                      withdrawn
1   DE  Germany
2   CS  Serbia and Montenegro   6,7
3   RH  Southern Rhodesia       5
4   NL  The Netherlands
5   ZW  Zimbabwe
6   RS  Serbia
7   ME  Montenegro
In this example, address-records with country-id 3, get updated with a query to country-id 5, no user interaction (or other solution) needed.
But address-records that specify country-id 2 will be asked to select country-id 6 or 7 (of course in the text presented to the user you use the country-name) or are selected to perform your custom automated update routine on.
Also note: 'withdrawn' is a repeating group and as such you could/should make it into a separate table.
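The decision described above (update automatically when there is one successor, ask the user when there are several) can be sketched as follows. The `WithdrawnCountries` class and its method names are hypothetical, and the dictionary simply hard-codes the 'withdrawn' values from the example table; in the real system this data would live in the separate table.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the 'withdrawn' lookup from the example table.
public static class WithdrawnCountries
{
    // Withdrawn country-id -> ids of the countries that replaced it.
    public static readonly Dictionary<int, int[]> Successors = new Dictionary<int, int[]>
    {
        { 2, new[] { 6, 7 } }, // Serbia and Montenegro -> Serbia, Montenegro
        { 3, new[] { 5 } },    // Southern Rhodesia -> Zimbabwe
    };

    // Returns the ids an address-record may use: the same id for a current
    // country, otherwise the successor ids of the withdrawn country.
    public static int[] Candidates(int countryId)
    {
        int[] successors;
        return Successors.TryGetValue(countryId, out successors)
            ? successors
            : new[] { countryId };
    }

    // True when the record cannot be updated by a query alone.
    public static bool NeedsUserChoice(int countryId)
    {
        return Candidates(countryId).Length > 1;
    }
}
```

Here `Candidates(3)` yields only id 5, so the record can be updated by a query, whereas `NeedsUserChoice(2)` is true and the user would be asked to pick between ids 6 and 7.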
Implementing this idea (without downtime) in your scenario:
SQL statement to build a new country-table with numerical ids as primary key.
SQL statement to add a new field 'country-id' to the address-records and fill it with the country-id from the new country-table that corresponds to the country-code specified in that record's address-field.
(SQL statement to) create the withdrawn table and populate it with the correct data.
Then rewrite the SQL statements that supply your forms with data.
Add the check and the 'ask user to update country'-routine.
Let the new forms go live.
Wait and watch for unintended bugs.
Delete the old country-table and the (now unused) country-code column from the "address"-table.
I am very curious what other experts think about this idea!!