I have data in IBM DB2 and I am trying to retrieve it using Entity Framework.
One column holds TIMESTAMP values in this format: dd.MM.yyyy hh:mm:ss.ffffff, and it is the primary key of that table.
When I get data from the DB2 database, the time part of the timestamp is lost.
On the other side I have an entity with a string property for that Id:
public string Id { get; set; }
This is the method that returns a specific merchant. Only one result is expected, because the timestamps (aka Ids) are unique. But when the data gets pulled and the time part is lost, I get 9 merchants with the same Id and, of course, an exception:
MerchantEntity IMerchantRepository.GetMerchantByRegistrationNumber(string companyRegistrationNumber)
{
    var db = merchantDataContextProvider.GetMerchantContext();
    var merchant = db.Merchants.Include(x => x.MerchantProperty)
        .SingleOrDefault(x => x.MerchantProperty.CompanyRegistrationNumber == companyRegistrationNumber);
    return merchant;
}
I have tried using the DateTime type for the Id; the only difference was that it cut off the last 3 digits instead of the whole time part.
Has anyone had the same or a similar problem? How can I get accurate data when using DB2?
Timestamps as a primary key... not a great idea. If you do, however, want to use time as the basis for creating the merchant's ID, you could do this:
var newId = string.Format("{0:D19}", DateTime.Now.Ticks);
which will give you a 19-digit number based on the current time. The problem, of course, is that if the clock on the PC resets, or you run the program on a different computer, in a different time zone, etc., this strategy will fail.
If you need something that uniquely identifies a row in a table, use GUIDs. They're effectively unique, and the chance of two different processes generating the same one is very close to zero.
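For example, a minimal sketch (the string conversion is just one option, assuming you keep the string Id from the question):
var id = Guid.NewGuid();            // e.g. 3f2504e0-4f89-11d3-9a0c-0305e82c3301
var idAsString = id.ToString("N");  // 32 hex digits without dashes, if a string key is needed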
Related
I have a situation where I want to store the count of messages a particular user posts in a chat on a particular day.
To make the Id unique, I thought I should combine these ids together, and it became ~20 characters long, consisting of ChatId + DDMMYY + UserId:
public class UserContributions
{
    [BsonId]
    public string ChatIdDateUserId { get; set; }
    public int Count { get; set; }
    // the rest
}
But I guess an ID of this length is not good for performance. Is this how I should make a compound ID?
Thanks.
The length should not be too much of a problem. Furthermore, you can have a compound _id:
{
    _id: {
        ChatID: "someId",
        Date: ISODate("2017-10-30T00:00:00.000Z"),
        UserID: "someUID"
    }
}
Some notes first: do NOT use strings to denote a date. First of all, ISODates are stored as 64-bit integers; while a date stored as 6 characters saves a little space, you lose all capabilities for date operations in aggregations as well as normal date comparisons. Bad idea.
Second, your model is prone to collisions. The same user on the same date could post only one message to a specific chat; a second message that day would have exactly the same values and hence would violate the uniqueness constraint. So you would actually have to use the full ISODate, down to the millisecond.
And even then there is a small chance of collision (say the date is generated on two application servers that are slightly off time-wise). There is a reason why there is an additional counter in ObjectId.
Here is how I would model it:
{
    _id: new ObjectId(),
    ChatId: "someChat",
    UserId: "someUser"
}
Reason: the ObjectId contains a timestamp by which you can query (I do not know C# well enough to give an example, hence I will make this a wiki answer to give others the opportunity to add one). It eliminates unneeded complexity, and with indices on both ChatId and UserId it is fast enough.
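To take up that invitation, a minimal sketch with a recent MongoDB C# driver (the database and collection names are assumptions, not from the question):
using System;
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var messages = client.GetDatabase("chat").GetCollection<BsonDocument>("messages");
// ObjectIds embed their creation time in the most significant bytes, so an
// ObjectId built from midnight is a lower bound for everything created that day.
var startOfDay = new DateTime(2017, 10, 30, 0, 0, 0, DateTimeKind.Utc);
var minId = ObjectId.GenerateNewId(startOfDay);
// Count one user's messages in one chat since the start of that day.
var filter = Builders<BsonDocument>.Filter.Gte("_id", minId)
           & Builders<BsonDocument>.Filter.Eq("ChatId", "someChat")
           & Builders<BsonDocument>.Filter.Eq("UserId", "someUser");
long count = messages.CountDocuments(filter);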
I am trying to retrieve the newest row created in the Primary Minute Metrics tables that are automatically created by Azure. Is there any way to do this without scanning through the whole table? The partition key is basically the timestamp in a different format. For example:
20150811T1250
However, there is no way for me to tell what the latest PartitionKey is, so I can't just query by partition. Also, the RowKey is useless since all the rows have the same RowKey. I am completely stumped on how I would do this, even though it seems like a really basic thing to do. Any ideas?
An example of a few partition keys of rows in the table:
20150813T0623
20150813T0629
20150813T0632
20150813T0637
20150813T0641
20150813T0646
20150813T0650
20150813T0654
EDIT: As a follow-up question: is there a way to scan the table backwards? That would allow me to just take the first row scanned, since it would be the latest row.
When it comes to querying data, Azure Tables offer very limited choices. Given that you know how the PartitionKey gets assigned (yyyyMMddTHHmm format), one possible solution would be to query from the current date/time (in UTC) minus some offset up to the current date/time, and go from there.
For example, assume the current time is 03-Dec-2015 00:00:00. You could try to fetch data from 02-Dec-2015 23:00:00 to 03-Dec-2015 00:00:00 and see if any records are returned. If records are returned, simply take the last entry in the result set; that is your latest entry. If no records are found, move back by 1 hour (i.e. 02-Dec-2015 22:00:00 to 02-Dec-2015 23:00:00), fetch again, and repeat until you find a matching result.
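A rough sketch with the classic Azure Storage SDK (the table name, the connection string, and the one-hour window are assumptions; the PartitionKey format comes from the question):
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

string connectionString = "...";  // your storage account connection string
var table = CloudStorageAccount.Parse(connectionString)
    .CreateCloudTableClient()
    .GetTableReference("$MetricsMinutePrimaryTransactionsBlob"); // example name
DateTime end = DateTime.UtcNow;
DynamicTableEntity latest = null;
while (latest == null)
{
    DateTime start = end.AddHours(-1);
    string filter = TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey",
            QueryComparisons.GreaterThanOrEqual, start.ToString("yyyyMMdd'T'HHmm")),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("PartitionKey",
            QueryComparisons.LessThanOrEqual, end.ToString("yyyyMMdd'T'HHmm")));
    latest = table.ExecuteQuery(new TableQuery<DynamicTableEntity>().Where(filter))
                  .LastOrDefault();
    end = start; // no rows in this window: slide it back one hour and retry
}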
Yet another idea (though a bit convoluted) is to create another table and periodically copy the data from the main table into this new table. When you copy the data, take the PartitionKey value, create a date/time object out of it, and subtract that from DateTime.MaxValue. Use the ticks of this new value as the PartitionKey for the new entity (you would need to convert the ticks into a string and pad it so that all values have the same length). Now the latest entries will always be at the top.
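The key derivation might look like this (a sketch; the parse format matches the sample PartitionKeys above):
using System;
using System.Globalization;

// "20150813T0623" -> a 19-digit key that sorts newest-first.
DateTime parsed = DateTime.ParseExact(
    "20150813T0623", "yyyyMMdd'T'HHmm", CultureInfo.InvariantCulture);
string reversedKey = (DateTime.MaxValue.Ticks - parsed.Ticks).ToString("D19");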
I know similar questions have been asked, but I have a rather different scenario here.
I have a SQL Server database which will store a TicketNumber and other details. This TicketNumber is generated randomly by a C# program, which passes it to the database to be stored there. The TicketNumber must be unique, and can be anywhere from 000000000 to 999999999.
Currently, what I do is run a select statement to query all existing TicketNumbers from the database:
SELECT TicketNumber FROM SomeTable
After that, I load all the TicketNumbers into a List and check a random candidate against it:
List<int> temp = new List<int>();
// foreach loop to add all existing numbers to the List
Random random = new Random();
int randomNumber = random.Next(0, 1000000000);
if (!temp.Contains(randomNumber))
{
    // Add this new number to the database
}
There is no problem with the code above; however, as the dataset gets larger, performance deteriorates (I have close to a hundred thousand records now). I'm wondering if there is a more efficient way of handling this.
I can do this from either the C# application or the SQL Server side.
This answer assumes you can't change the requirements. If you can use a hi/lo scheme to generate unique IDs which aren't random, that would be better.
I assume you've already set this as a primary key in the database. Given that you've already got the information in the database, there's little sense (IMO) in fetching it to the client as well. That goes double if you've got multiple clients (which seems likely - if not now then in the future).
Instead, just try to insert a record with a random ID. If it works, great! If not, generate a new random number and try again.
At that rate (roughly a thousand new tickets a day), after 1000 days you'll have a million records, so roughly one insert in a thousand will fail. That's only one failure a day; unless you've got some hard limit on the insertion time, that seems pretty reasonable to me.
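A minimal sketch of the try-insert loop with plain ADO.NET (the table and column names come from the question; the connection string and the single-column insert are assumptions):
using System;
using System.Data.SqlClient;

static int InsertNewTicket(string connectionString)
{
    var random = new Random();
    while (true)
    {
        int candidate = random.Next(0, 1000000000);
        try
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO SomeTable (TicketNumber) VALUES (@ticket)", conn))
            {
                cmd.Parameters.AddWithValue("@ticket", candidate);
                conn.Open();
                cmd.ExecuteNonQuery();
                return candidate; // the number was free
            }
        }
        catch (SqlException ex) when (ex.Number == 2627 || ex.Number == 2601)
        {
            // Primary key / unique index violation: the number is taken, retry.
        }
    }
}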
EDIT: I've just thought of another solution, which would take a bunch of storage, but might be quite reasonable otherwise... create a table with two columns:
NaturalID ObfuscatedID
Prepopulate that with a billion rows, which you generate by basically shuffling all the possible ticket IDs. It may take quite a while, but it's a one-off cost.
Now, you can use an auto-incrementing ID for your ticket table, and then either copy the corresponding obfuscated ID into the table as you populate it, or join into it when you need the ticket ID.
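A conceptual sketch of generating the shuffled mapping (scaled down to 10 IDs instead of a billion, which you would generate in batches rather than in memory; the Fisher-Yates shuffle is my choice of shuffle, not something prescribed above):
using System;
using System.Linq;

int n = 10; // a billion for the real thing
int[] obfuscated = Enumerable.Range(0, n).ToArray();
var rng = new Random();
for (int i = n - 1; i > 0; i--) // Fisher-Yates shuffle
{
    int j = rng.Next(i + 1);
    (obfuscated[i], obfuscated[j]) = (obfuscated[j], obfuscated[i]);
}
for (int naturalId = 0; naturalId < n; naturalId++)
    Console.WriteLine($"{naturalId} -> {obfuscated[naturalId]:D9}"); // rows to bulk-insert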
You can create a separate table with only one column. Let's just name the column UniqueID for now. Populate it with the values 000000000-999999999. Every time you want to generate a random number, do something like:
SELECT TOP 1 UniqueID
FROM UniqueIdTable
WHERE UniqueID NOT IN (SELECT TicketNumber FROM SomeTable)
ORDER BY NEWID()
The code has not been tested, but it shows the idea (UniqueIdTable stands for whatever you name the new table; ORDER BY NEWID() makes the pick random).
Using EF, I need to pull records from one database, do some initialization and mapping, and insert the records into another DB. I need to do this only for records that have not already been imported. My model in the secondary DB looks like:
public class Loan
{
    public int Id { get; set; }
    // this is the id of this loan's record from the original DB
    public int OriginalDbId { get; set; }
    // the loan's last date of attendance
    public DateTime LDA { get; set; }
    ...
}
Approach 1:
I added a model called ImportHistory that just saved the time of the last import. I used this date to pull loans whose LDA was greater than the last import time. The problem is that a user can edit the LDA of a loan, causing some loans to be missed and vice versa. For example, if a loan was created after the last import but edited so that its LDA falls before the last import, the new loan would not be imported.
Approach 2:
I retrieved all OriginalDbIds from the secondary DB and then pulled all records whose Id was not contained in the collection of OriginalDbIds. Note that there are a few more selection criteria in the query.
var allIds = _2ndDbLoanRepo.Query()
    .Select(m => m.OriginalDbId)
    .ToList();
var newRecords = _1stDbLoanRepo.Query()
    .Where(other criteria ...)
    .Where(m => !allIds.Contains(m.Id))
    .ToList();
This approach will end up being way too slow when there are 100k+ records already imported. It's already throwing an EntityCommandExecutionException with the message "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding" when there are only 10k records.
So what's the best way to approach this?
The fastest way would be if the data never left the DB server: the server process updates a log table, and the front end works off that log table. – T.S.
1 - Servers can be linked, and the mapping can be saved as metadata to be used by the server process.
2 - Have you thought of doing it export/import style? Export to a text file, import from the text file. This is how databases do it; why should your program be different?
I have a table with one of its attributes set as an identity. I want to get the value of the identity attribute that would be generated after I insert a value into the database.
I have an EmpTable consisting of EmpID and EmpName, where EmpID is set as the identity. I want to fetch the EmpID value before inserting a new row into the database.
I would advise against trying to do this with a table that is set up to use an integer column as the primary key. You will run into concurrency problems if you simply fetch the previous ID and increment it. Instead you should use a GUID (uniqueidentifier in SQL) as your primary key.
This will allow you to generate a new GUID in your code that can safely be saved to the database at a later stage.
http://msdn.microsoft.com/en-us/library/system.guid.newguid.aspx
http://msdn.microsoft.com/en-us/library/ms187942.aspx
Sure, the server knows where the auto-increment counter is in its sequence, but there is almost nothing useful you can do with that information. Imagine you go to the post office, and they hand out numbered tickets so they can serve customers in order. Of course you could ask what the next number to be given out is, but since anyone can walk in at any time, you don't know that you'll actually get that number. And if you don't know that you'll get it, you can't do anything with it - e.g. writing it as a reference number on a form would be a mistake.
Depending on what you're trying to do, your two main options are:
Use a client-generated GUID as your identifier. This kind of messes up the order, so the analogy isn't great, but imagine each customer who walked in could generate a random number that they were sure had never been used before. They could use that to fill out forms before taking a number.
Take a number, but do it in a transaction with the other operations. A customer can take a number and use it to fill out some paperwork. If they realize they left their money at home, they just throw everything away and you never call their number.
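A sketch of that second option with plain ADO.NET (EmpTable, EmpID, and EmpName come from the question; the connection string and the sample name are assumptions):
using System;
using System.Data.SqlClient;

using (var conn = new SqlConnection("your-connection-string"))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        // Insert the row and read back the generated identity inside one transaction.
        var cmd = new SqlCommand(
            "INSERT INTO EmpTable (EmpName) VALUES (@name); SELECT CAST(SCOPE_IDENTITY() AS int);",
            conn, tx);
        cmd.Parameters.AddWithValue("@name", "Alice");
        int empId = (int)cmd.ExecuteScalar();
        // ... use empId for the rest of the paperwork in the same transaction ...
        tx.Commit(); // or tx.Rollback() to "throw the number away"
    }
}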
Why do you think you need this information? Can you use either of these strategies instead?