C# Concurrent application - Unique custom generated Ids

I'm working on an ASP.NET app that is not really a distributed application per se, but at some point will have all of its data synchronized to the master node.
In order to store the data from different nodes in one table
without id collisions, the approach that was taken was:
I. Not use auto generated ids
II. The row Id would be composed by concatenating the NodeId and the NextRowId
The NextRowId is generated by:
Selecting the highest id from one specific node,
Splitting it into 2 parts, the first part being the NodeId and the second being the LastDocumentId,
Incrementing the LastDocumentId
Concatenating the NodeId with the incremented LastDocumentId
E.g.
Id = 20099, split into (NodeId = 200, LastDocumentId = 99)
LastDocumentId + 1 = 100
NextRowId = 200100
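For reference, here is a minimal sketch of that generation scheme as I understand it (the class/method names and the assumption that the NodeId is a known fixed-length prefix are mine, not taken from the original code):
static class RowIdGenerator
{
    // WARNING: this mirrors the read-increment-write pattern described above, which is
    // exactly what races under concurrent requests: two requests that read the same
    // highest id will both produce 200100.
    public static long GenerateNextRowId(string nodeId, long highestIdForNode)
    {
        // e.g. nodeId = "200", highestIdForNode = 20099
        string asText = highestIdForNode.ToString();
        long lastDocumentId = long.Parse(asText.Substring(nodeId.Length)); // 99
        long nextDocumentId = lastDocumentId + 1;                          // 100
        return long.Parse(nodeId + nextDocumentId);                        // 200100
    }
}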
This works perfectly in theory, or if the requests are processed sequentially. However, if multiple requests are processed at the same time they often end up generating the same id.
So in practice there is a collision of ids whenever multiple users try to update the same table at the same time.
I have had a look at the best practices on generating unique ids for distributed systems. However, none of them is a viable option at this point in time, as they would require a rethinking of the whole architecture and lots and lots of refactoring. Both require time which management will not allow me to take.
So what are the other ways that I can ensure that ids generated are unique or that the requests are processed in a sequential way? All this, ideally without having to restructure the application or cause performance bottlenecks.

Create a unique constraint on your key column. If you happen to insert the same id twice, catch the exception and regenerate your id.
You probably want to use Guids instead.
That said, if you need to know which node your data is associated with, you should model your database accordingly: have two columns, NodeId and DocumentId. You can also create a unique constraint over multiple columns.
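A minimal sketch of that last suggestion, assuming SQL Server and made-up table/column names; the pair (NodeId, DocumentId) is enforced as unique by the database, so a duplicate surfaces as an exception you can catch and retry as described above:
using System.Data.SqlClient;

static class DocumentSchema
{
    // One-off setup: model the node and the per-node counter as separate columns
    // and let the database enforce uniqueness of the pair.
    public static void CreateDocumentsTable(SqlConnection connection)
    {
        const string ddl = @"
            CREATE TABLE Documents (
                NodeId     INT NOT NULL,
                DocumentId INT NOT NULL,
                Data       NVARCHAR(MAX) NULL,
                CONSTRAINT UQ_Documents_Node_Doc UNIQUE (NodeId, DocumentId)
            );";

        using (var command = new SqlCommand(ddl, connection))
        {
            command.ExecuteNonQuery();
        }
    }
}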

Related

C# Winforms Fastest Way To Query MS Access

This may be a dumb question, but I wanted to be sure. I am creating a Winforms app, and using a C# OleDbConnection to connect to an MS Access database. Right now, I am using a "SELECT * FROM table_name" and looping through each row to see if it is the row with the criteria I want, then breaking out of the loop if it is. I wonder if the performance would be improved if I used something like "SELECT * FROM table_name WHERE id=something" - so basically using a "WHERE" clause instead of looping through every row?
The best way to validate the performance of anything is to test. Otherwise, a lot of assumptions are made about what is the best versus the reality of performance.
With that said, 100% of the time using a WHERE clause will be better than retrieving the data and then filtering via a loop. This is for a few different reasons, but ultimately you are filtering the data on a column before retrieving all of the rows, versus retrieving all of the rows and then filtering the data out. Relational data should be dealt with according to set logic, which is how a WHERE clause works: it operates on the data set. The loop is not set logic; it compares each individual row, expensively, discarding those that don’t meet the criteria.
Don’t take my word for it though. Try it out. Especially try it out when your app has a lot of data in the table.
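To make the comparison concrete, here is a hedged sketch of the filtered, parameterized version (the connection string, table and column names are placeholders):
using System.Data.OleDb;

static class FilteredLookup
{
    public static void PrintMatchingRow(string connectionString, int id)
    {
        using (var connection = new OleDbConnection(connectionString))
        using (var command = new OleDbCommand(
            "SELECT * FROM table_name WHERE id = ?", connection))
        {
            // OleDb uses positional "?" parameters rather than named ones.
            command.Parameters.AddWithValue("@id", id);

            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                // Only the matching row comes back; no client-side loop over the
                // whole table is needed.
                if (reader.Read())
                {
                    System.Console.WriteLine(reader[0]);
                }
            }
        }
    }
}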
Yes, of course.
Say you have an Access database file shared in a folder, and you deploy your .NET desktop application to each workstation.
And furthermore, say the table has 1 million rows.
If you do this:
SELECT * from tblInvoice WHERE InvoiceNumber = 123245
Then ONLY one row is pulled down the network pipe - and this holds true EVEN if the table has 1 million rows. To traverse and pull 1 million rows is going to take a HUGE amount of time, but if you add criteria to your select, then it would be in this case about 1 million times faster to pull one row as opposed to the whole table.
And say if this is/was multi-user? Then again, even on a network - again ONLY ONE record that meets your criteria will be pulled. The only requirement for this "one row pull" over the network? The Access data engine needs to have a usable index on that criteria. Of course by default the PK column (ID) always has that index - so no worries there. But if, as per above, we are pulling invoice numbers from a table - then having an index on that column (InvoiceNumber) is required for the data engine to only pull one row. If no index can be used - then all rows behind the scenes are pulled until a match occurs - and over a network, this means significant amounts of data will be pulled without that index across that network (or if local - then pulled from the file on disk).

`BatchStatement` occasionally gets data out of sync

"Cassandra: The Definitive Guide, 2nd Edition" says:
Cassandra’s batches are a good fit for use cases such as making multiple updates to a single partition, or keeping multiple tables in sync. A good example is making modifications to denormalized tables that store the same data for different access patterns.
The last statement above applies to the following attempt, where all the Save... are insert statements for different tables
var bLogged = new BatchStatement();
var now = DateTimeOffset.UtcNow;
var uuidNow = TimeUuid.NewId(now);
bLogged.Add(SaveMods.Bind(id, uuidNow, data1)); // 1
bLogged.Add(SaveMoreMods.Bind(id, uuidNow, data2)); // 2
bLogged.Add(SaveActivity.Bind(now.ToString("yyyy-MM-dd"), id, now)); // 3
await GetSession().ExecuteAsync(bLogged);
We'll focus on statements 1 and 2 (the 3rd one is just to signify there's one more statement in the batch).
Statement 1 writes to table1, partitioned by id with uuidNow as a descending clustering key.
Statement 2 writes to table2, partitioned by id only, so it holds just the tip (the latest row) of table1 for the same id.
More often than I'd like, the two tables get out of sync, in the sense that table2 does not have the tip of table1. It ends up one or two mods behind within a few milliseconds.
While looking for a resolution I found that most advice on the web is against batches altogether, which prompted my solution that eliminated all mismatches:
await Task.WhenAll(
GetSession().ExecuteAsync(SaveMods.Bind(id, uuidNow, data1)),
GetSession().ExecuteAsync(SaveMoreMods.Bind(id, uuidNow, data2)),
GetSession().ExecuteAsync(SaveActivity.Bind(now.ToString("yyyy-MM-dd"), id, now))
);
The question is: what are batches good for, just the first statement in the quote? In that case how do I ensure modifications to different tables are in sync?
Using a higher consistency level (i.e. QUORUM) on reads/writes may help, but there is always a possibility of inconsistencies between the tables/partitions.
Batch statements will try to ensure that the mutations in the batch either all happen or none do. They do not guarantee that all the mutations occur in an instant (there is no isolation; you can do a read where the first mutation has been applied but the others haven't). Also, batch statements will not provide a consistent view of all the data across all the nodes. For linearizable consistency you should consider using paxos (lightweight transactions) for conditional updates, and try to limit the things that require linearizability to a single partition.
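To illustrate that last point, here is a rough sketch of a conditional insert (lightweight transaction) with the DataStax C# driver, reusing the question's GetSession() helper; the table and column names are assumptions, not taken from the question:
// Assumed schema: CREATE TABLE table2 (id uuid PRIMARY KEY, tip timeuuid, data text);
var insertIfAbsent = await GetSession().PrepareAsync(
    "INSERT INTO table2 (id, tip, data) VALUES (?, ?, ?) IF NOT EXISTS");

var rs = await GetSession().ExecuteAsync(insertIfAbsent.Bind(id, uuidNow, data2));

// A conditional statement returns an [applied] column telling you whether the
// paxos round accepted the write (requires using System.Linq for First()).
bool applied = rs.First().GetValue<bool>("[applied]");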

Generate Unique Random Number

I know similar questions have been asked, but I have a rather different scenario here.
I have a SQL Server database which will store TicketNumber and other details. This TicketNumber is generated randomly from a C# program, which is passed to the database and stored there. The TicketNumber must be unique, and can be from 000000000-999999999.
Currently, what I do is: I will do a select statement to query all existing TicketNumber from the database:
Select TicketNumber from SomeTable
After that, I will load all the TicketNumbers into a List:
List<int> temp = new List<int>();
// foreach loop over the query results to add each existing TicketNumber to the list
Random random = new Random();
int randomNumber = random.Next(0, 1000000000);
if (!temp.Contains(randomNumber))
{
    // Add this new number to the database
}
There is no problem with the code above; however, as the dataset gets larger, the performance deteriorates. (I have close to a hundred thousand records now.) I'm wondering if there is a more effective way of handling this?
I can do this from either the C# application or the SQL Server side.
This answer assumes you can't change the requirements. If you can use a hi/lo scheme to generate unique IDs which aren't random, that would be better.
I assume you've already set this as a primary key in the database. Given that you've already got the information in the database, there's little sense (IMO) in fetching it to the client as well. That goes double if you've got multiple clients (which seems likely - if not now then in the future).
Instead, just try to insert a record with a random ID. If it works, great! If not, generate a new random number and try again.
After 1000 days, you'll have a million records, so roughly one in a thousand inserts will fail. That's only one a day - unless you've got some hard limit on the insertion time, that seems pretty reasonable to me.
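A sketch of that insert-and-retry idea with ADO.NET (the table name and connection handling are assumptions); 2627 and 2601 are SQL Server's duplicate-key error numbers:
using System;
using System.Data.SqlClient;

static class TicketGenerator
{
    private static readonly Random Rng = new Random();

    public static int InsertNewTicket(SqlConnection connection)
    {
        while (true)
        {
            int candidate = Rng.Next(0, 1000000000);
            try
            {
                using (var command = new SqlCommand(
                    "INSERT INTO SomeTable (TicketNumber) VALUES (@ticket)", connection))
                {
                    command.Parameters.AddWithValue("@ticket", candidate);
                    command.ExecuteNonQuery();
                }
                return candidate; // the unique constraint guaranteed this is unused
            }
            catch (SqlException ex) when (ex.Number == 2627 || ex.Number == 2601)
            {
                // Duplicate key: rare until the table fills up, so just pick again.
            }
        }
    }
}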
EDIT: I've just thought of another solution, which would take a bunch of storage, but might be quite reasonable otherwise... create a table with two columns:
NaturalID ObfuscatedID
Prepopulate that with a billion rows, which you generate by basically shuffling all the possible ticket IDs. It may take quite a while, but it's a one-off cost.
Now, you can use an auto-incrementing ID for your ticket table, and then either copy the corresponding obfuscated ID into the table as you populate it, or join into it when you need the ticket ID.
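A toy sketch of that mapping idea at a much smaller scale (in practice you would prepopulate the table server-side in batches rather than shuffling a billion values in memory like this):
using System;
using System.Linq;

static class ObfuscatedIdDemo
{
    static void Main()
    {
        const int range = 1_000_000;  // the real ticket range would be 1_000_000_000
        int[] obfuscated = Enumerable.Range(0, range).ToArray();

        // Fisher-Yates shuffle: obfuscated[i] becomes the ObfuscatedID for NaturalID = i.
        var rng = new Random();
        for (int i = obfuscated.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (obfuscated[i], obfuscated[j]) = (obfuscated[j], obfuscated[i]);
        }

        // NaturalID 42 (e.g. the ticket table's identity value) maps to a stable,
        // random-looking nine-digit ticket number.
        Console.WriteLine($"Ticket number for row 42: {obfuscated[42]:D9}");
    }
}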
You can create a separate table with only one column. Let's just name it UniqueID for now. Populate that column with the values 000000000-999999999. Every time you want to generate a random number, do something like
SELECT TOP 1 UniqueID From (Table) WHERE UniqueID NOT IN (SELECT ID FROM (YOUR TABLE))
The code has not been tested; it's just to show the idea.

SQL Server recommendation for storing segmented gl account codes in database

I've been tasked with an enhancement to our order system that will require importing segmented GL account codes for assignment on individual line items of an order.
I need to support querying the codes by segment1, segment2, etc in order to load cascading dropdown boxes for assignment by the user. The GL codes will have one or more segments delimited by a character. An example of a code is "1010.1034001.99.01".
I've loaded several thousand codes into a table for testing where the entire string value exists in one column (delimited by a character). I've created two variations of functions that return rows where segment1 value is equal to some parameter. The query also supports further querying by providing additional parameters for other segment values.
I intend to support these queries from the table using Entity Framework 6, but used SQL functions to get a feel for what the performance might be when the GL account codes are stored in one column. Performance was not as good as I had hoped.
Does anyone have recommendations on how best to store this data (there may be 200,000 codes)? Do you feel that I can query using EF and expect performant results?
Would a hierarchical organization make more sense for this data? Our team was hoping to store the delimited values in one column.
Thanks in advance.
If you used a table with three columns you could store the values in a cascading fashion, making your queries a lot easier and probably faster. Why would your team hope to store it in one column - what advantage does that have?
If you have
ID
Code
ParentCodeId
where ID is a unique key and ParentCodeId is a nullable reference to that unique Id, you can split your example code as follows:
ID Code Parent
1 1010 null
2 1034001 1
3 99 2
4 01 3
By applying some logic when importing your codes, you can check if a code already exists as a parent on the needed level so you don't have to repeat them, and that way you could get all codes that start with 1010 by selecting on ParentCodeId = 1.
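A rough sketch of that import-time logic; the two data-access helpers at the bottom are hypothetical placeholders for whatever query/insert mechanism you end up using:
static class GlCodeImporter
{
    // Splits "1010.1034001.99.01" into segments and walks/creates the hierarchy,
    // reusing an existing row when the same code already exists under the same parent.
    public static void Import(string delimitedCode)
    {
        int? parentId = null;
        foreach (string segment in delimitedCode.Split('.'))
        {
            int? existingId = FindByCodeAndParent(segment, parentId);
            parentId = existingId ?? InsertCode(segment, parentId);
        }
    }

    // Hypothetical helpers: look up / insert a row in the (ID, Code, ParentCodeId) table.
    static int? FindByCodeAndParent(string code, int? parentId) { /* SELECT ... */ return null; }
    static int InsertCode(string code, int? parentId) { /* INSERT ..., return new ID */ return 0; }
}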

C# SQL Server - More Efficient for Multiple Database accesses or multiple loops through data?

In part of my application I have to get the last ID of a table where a condition is met
For example:
SELECT MAX(ID) FROM TABLE WHERE Num = 2
So I can either grab the whole table and loop through it looking for Num = 2, or I can grab the data from the table where Num = 2. In the latter, I know the last item will be the MAX ID.
Either way, I have to do this around 50 times...so would it be more efficient grabbing all the data and looping through the list of data looking for a specific condition...
Or would it be better to grab the data several times based on the condition..where I know the last item in the list will be the max id
I have 6 conditions I will have to base the queries on
I'm just wondering which is more efficient: looping through a list of around 3500 items several times, or hitting the database several times where I can already have the data broken down like I need it.
I can speak for SQL Server. If you create a stored procedure where Num is a parameter that you pass, you will get the best performance, because the optimizer caches and reuses the stored procedure's execution plan. Of course, an index on that field is mandatory.
Let the database do this work, it's what it is designed to do.
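For what it's worth, a sketch of that filtered lookup from C# with a parameterized command (a stored procedure taking Num would be invoked the same way with CommandType.StoredProcedure); the table name is a placeholder:
using System;
using System.Data.SqlClient;

static class MaxIdLookup
{
    public static int? GetMaxId(SqlConnection connection, int num)
    {
        using (var command = new SqlCommand(
            "SELECT MAX(ID) FROM dbo.MyTable WHERE Num = @num", connection))
        {
            command.Parameters.AddWithValue("@num", num);
            object result = command.ExecuteScalar();

            // MAX over an empty set comes back as DBNull rather than a .NET null.
            return result == DBNull.Value ? (int?)null : (int)result;
        }
    }
}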
Does this table have a high insert frequency? Does it have a high update frequency, specifically on the column that you're applying the MAX function to? If the answer is no, you might consider adding an IS_MAX BIT column and setting it using an insert trigger. That way, the row you want is essentially cached, and it's trivial to look up.
