I am trying to read all new rows that are added to the database on a timer.
First I read the entire database and save it to a local data table, but I want to read all new rows that are added to the database. Here is how I'm trying to read new rows:
string accessDB1 = string.Format("SELECT * FROM {0} ORDER BY ID DESC", tableName);
setupaccessDB(accessDB1);
int dTRows = localDataTable.Rows.Count + 1;
localDataTable.Rows.Add();
using (readNext = command.ExecuteReader())
{
    while (readNext.Read())
    {
        for (int xyz = 0; xyz < localDataTable.Columns.Count; xyz++)
        {
            // Code
        }
        break;
    }
}
If only one row is added within the timer interval this works fine, but when multiple rows are added it only reads the latest row.
So is there any way I can read all of the added rows?
I am using an OleDbDataReader.
Thanks in advance
For most tables the primary key is based on an incremental value. This can be a very simple integer that is incremented by one, but it could also be a datetime-based GUID.
Anyway, if you know the id of the last record, you can simply ask for all records that have a 'higher' id. That way you get the new records, but what about updated records? If you also want those, you might want to use a column that contains a datetime value.
A little trickier are records that are deleted from the database. You can't retrieve those with a basic query. You could solve that by setting a TTL for each record you retrieve from the database, much like a cache. When the record has 'expired', you try to retrieve it again.
Some databases, like Microsoft SQL Server, also provide more advanced options in this regard. You can use query notifications via the broker services or enable change tracking on your database. The latter can even indicate what the last action per record was (insert, update or delete).
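As a rough sketch of the 'higher id' idea with OleDb (lastSeenId is a hypothetical variable holding the highest ID processed so far, and the ID column name is an assumption):

// Sketch: fetch only the rows added since the last poll, assuming an
// auto-incrementing ID column and a lastSeenId value kept between timer ticks.
string query = string.Format("SELECT * FROM {0} WHERE ID > ? ORDER BY ID ASC", tableName);
using (var command = new OleDbCommand(query, connection))
{
    command.Parameters.AddWithValue("?", lastSeenId); // OleDb parameters are positional
    using (OleDbDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // ... copy the row into localDataTable ...
            lastSeenId = Convert.ToInt32(reader["ID"]);
        }
    }
}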
Your immediate problem lies here:
while (readNext.Read())
{
    doSomething();
    break;
}
This is what your loop basically boils down to. That break is going to exit the loop after processing the first item, regardless of how many items there are.
The first item, in this case, will probably be the last one added (as you state it is) since you're sorting by descending ID.
In terms of reading only newly added rows, there are a variety of ways to do it, some of which will depend on the DBMS that you're using.
Perhaps the simplest and most portable would be to add an extra column processed which is set to false when a row is first added.
That way, you can simply have a query that looks for those records and, for each, process them and set the column to true.
In fact, you could use triggers to do this (force the flag to false on insertion) which opens up the possibility for doing it with updates as well.
Tracking deletions is a little more difficult but still achievable. You could have a trigger which actually writes the record to a separate table before deleting it so that your processing code has access to those details as well.
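As a rough illustration of the 'processed' flag idea (assuming a boolean column named processed has been added to the table; the names and the OleDb dialect are assumptions):

// Sketch: pick up unprocessed rows, handle them, then mark them as done.
string selectQuery = string.Format("SELECT * FROM {0} WHERE processed = false", tableName);
var idsToMark = new List<int>();
using (var selectCommand = new OleDbCommand(selectQuery, connection))
using (OleDbDataReader reader = selectCommand.ExecuteReader())
{
    while (reader.Read())
    {
        // ... process the row ...
        idsToMark.Add(Convert.ToInt32(reader["ID"]));
    }
}
foreach (int id in idsToMark)
{
    string updateQuery = string.Format("UPDATE {0} SET processed = true WHERE ID = ?", tableName);
    using (var updateCommand = new OleDbCommand(updateQuery, connection))
    {
        updateCommand.Parameters.AddWithValue("?", id);
        updateCommand.ExecuteNonQuery();
    }
}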
The following works
using (readNext = command.ExecuteReader())
{
    while (readNext.Read())
    {
        int fieldCount = readNext.FieldCount;
        // Note: starting at 1 skips the first column (index 0).
        for (int s = 1; s < fieldCount; s++)
        {
            var nextValue = readNext.GetValue(s);
            // ... store nextValue in localDataTable ...
        }
    }
}
The inner for loop reads the columns of the current row, and the while loop then moves on to the next row.
Related
In this system one program creates table records and a second program updates them. I want the update program to see the new records. I have seen lots of questions and answers on this, but so far none have worked for me. One suggestion was to clear and reload the dataset table. The following code recreates the dataset, but I can't include the auto-incremented primary key bookId: if it is there I get an overload error even though the field count is correct, and if I remove it the dataset.booking table is loaded but the bookId values are wrong (negative numbers) and I can't update the dataset.booking table because it no longer matches the database table.
tclDataSet3.booking.AcceptChanges();
tclDataSet3.booking.Clear();
bookingBindingSource.ResetBindings(false);
dataGridView1.ClearSelection();
var bkas1 = tcdb.bookings.Where(b => b.approvalStatus == 1);
foreach (booking bk in bkas1)
{
    tclDataSet3.booking.AddbookingRow(
        (int)bk.bookId,
        (int)bk.bookYear,
        (int)bk.bookMonth,
        (int)bk.bookDay,
        (int)bk.workOrder,
        (int)bk.customerNum,
        bk.firstName,
        bk.lastName,
        bk.vehicle,
        (int)bk.serviceCar,
        bk.repairType,
        (bool)bk.isCompleted,
        (bool)bk.isPickedUp,
        bk.outYear,
        bk.outMonth,
        bk.outDay,
        (bool)bk.isDeleted,
        (int)bk.isUpdated,
        bk.bookingTime,
        (int)bk.approvalStatus);
}
Program requirements:
display datagridview of dataset.booking table where as_code = 1
update rows in the datagridview to change as_code = 2
remove updated rows from datagridview (bookingBindingSource.RemoveCurrent(); works well)
Refresh datagridview to see all dataset.booking table rows where as_code = 1
Currently the refresh only sees the records that already exist in the datagridview.
Is there a better way to do this?
After much trial and error I decided to rewrite the code to manually build the dataGridView rather than use any data binding. I created a subroutine to clear the dataGridView, read the base table and rebuild the dataGridView.
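For reference, a hypothetical sketch of that rebuild routine (the grid columns are assumed to be defined in the designer, and only some of the booking fields are shown):

// Sketch: clear the unbound grid, re-query the base table, and add rows by hand.
private void RebuildBookingGrid()
{
    dataGridView1.Rows.Clear();
    var pending = tcdb.bookings.Where(b => b.approvalStatus == 1).ToList();
    foreach (booking bk in pending)
    {
        // Rows.Add takes the cell values in the same order as the grid columns.
        dataGridView1.Rows.Add(
            bk.bookId,
            bk.workOrder,
            bk.customerNum,
            bk.firstName,
            bk.lastName,
            bk.vehicle,
            bk.repairType);
    }
}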
I need to go through a database finding all text-like fields, checking for a particular set of URLs that might be in these fields and modify it using some regex.
The actual text manipulation part is fine, but since I'm going through tables generically, some of which seem to have no primary keys, I'm wondering how to update these rows once I've read them. I'll give a dummy example below.
foreach (var matchingTable in tables)
{
    foreach (var matchingColumn in columns)
    {
        // Table names cannot be passed as SQL parameters, so the (validated)
        // identifier is substituted into the query text directly.
        string query = string.Format("SELECT * FROM [{0}];", matchingTable);
        using (SqlCommand currentCommand = new SqlCommand(query, connection))
        using (SqlDataReader reader = currentCommand.ExecuteReader())
        {
            while (reader.Read())
            {
                if (/* logic to check reader[matchingColumn] */)
                {
                    /* edit row to change column matchingColumn, a row which I
                       can't be sure has any uniquely identifying factor */
                }
            }
        }
    }
}
Is it possible to edit this arbitrary row or will I have to change how I'm doing this?
If a table does not have a guaranteed unique id, then a duplicated record carries no extra meaning, so no harm is done if a duplicate gets updated as well.
You can treat all of the fields together as one composite key and perform the update against that.
One thing to keep in mind is that any duplicate records will also be updated.
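A hedged sketch of that idea (table, column, and variable names are placeholders, and a second connection is used because the reader still holds the first; in practice you would repeat the AND clause for every remaining column in the row):

// Sketch: update a row that has no key by matching on the values that were read.
// Any exact duplicates of the row will be updated as well.
string sql = string.Format(
    "UPDATE [{0}] SET [{1}] = @newValue WHERE [{1}] = @oldValue",
    matchingTable, matchingColumn);
using (var updateCommand = new SqlCommand(sql, updateConnection))
{
    updateCommand.Parameters.AddWithValue("@newValue", rewrittenText); // text after the regex
    updateCommand.Parameters.AddWithValue("@oldValue", originalText);  // text as it was read
    updateCommand.ExecuteNonQuery();
}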
This might be easier to solve inside the database. You can write a cursor equivalent to the two for-loops to select the table and column. From there you can write a simple UPDATE/WHERE using your regular expression.
I have a DataTable object that I need to fill based on data stored in a stream of columns - i.e. the stream initially contains the schema of the DataTable, and subsequently, values that should go into it organised by column.
At present, I'm taking the rather naive approach of
Create enough empty rows to hold all data values.
Fill those rows per cell.
The result is a per-cell iteration, which is not especially quick to say the least.
That is:
// Create rows first...
// Then populate...
foreach (var col in table.Columns.Cast<DataColumn>())
{
    List<object> values = GetValuesfromStream(theStream);
    // Actual method has some DBNull checking here, but should
    // be immaterial to any solution.
    for (var i = 0; i < values.Count; i++)
        table.Rows[i][col] = values[i];
}
My guess is that the backing DataStorage items for each column aren't being expanded when the rows are added, but rather as values are added to each column, though I'm far from certain. Any tips for loading this kind of data?
NB that loading all lists first and then reading in by row is probably not sensible - this approach is being taken in the first place to mitigate potential out of memory exceptions that tend to result when serializing huge DataTable objects, so grabbing a clone of the entire data grid and reading it in would probably just move the problem elsewhere. There's definitely enough memory for the original table and another column of values, but there probably isn't for two copies of the DataTable.
Whilst I haven't found a way to avoid iterating cells, as per the comments above, I've found that writing to DataRow items that have already been added to the table turns out to be a bad idea, and was responsible for the vast majority of the slowdown I observed.
The final approach I used ended up looking something like this:
List<DataRow> rows = null;
// Start population...
var cols = table.Columns.Cast<DataColumn>().Where(c => string.IsNullOrEmpty(c.Expression));
foreach (var col in cols)
{
    List<object> values = GetValuesfromStream(theStream);
    // Create rows first if required.
    if (rows == null)
    {
        rows = new List<DataRow>();
        for (var i = 0; i < values.Count; i++)
            rows.Add(table.NewRow());
    }
    // Actual method has some DBNull checking here, but should
    // be immaterial to any solution.
    for (var i = 0; i < values.Count; i++)
        rows[i][col] = values[i];
}
rows.ForEach(r => table.Rows.Add(r));
This approach addresses two problems:
If you try to add an empty DataRow to a table that has null-restrictions or similar, then you'll get an error. This approach ensures all the data is there before it's added, which should address most such issues (although I haven't had need to check how it works with auto-incrementing PK columns).
Where expressions are involved, these are evaluated when row state changes for a row that has been added to a table. Consequently, where before I had re-calculation of all expressions taking place every time a value was added to a cell (expensive and pointless), now all calculation takes place just once after all base data has been added.
There may of course be other complications with writing to a table that I've not yet encountered because the tables I am making use of don't use those features of the DataTable class/model. But for simple cases, this works well.
I have some .NET code wrapped up in a repeatable read transaction that looks like this:
using (
    var transaction = new TransactionScope(
        TransactionScopeOption.Required,
        new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead },
        TransactionScopeAsyncFlowOption.Enabled))
{
    int theNextValue = await GetNextValueFromTheDatabase();
    var entity = new MyEntity
    {
        Id = Guid.NewGuid(),
        PropertyOne = theNextValue, // An identity column
        PropertyTwo = Convert.ToString(theNextValue),
        PropertyThree = theNextValue,
        ...
    };
    DbSet<MyEntity> myDbSet = GetEntitySet();
    myDbSet.Add(entity);
    await this.databaseContext.Entities.SaveChangesAsync();
    transaction.Complete();
}
The first method, GetNextValueFromTheDatabase, retrieves the max value stored in a column in a table in the database. I'm using repeatable read because I don't want two users to read and use the same value. Then, I simply create an Entity in memory and call SaveChangesAsync() to write the values to the database.
Sporadically, I see that the values of entity.PropertyOne, entity.PropertyTwo, and entity.PropertyThree do not match each other. For example, entity.PropertyOne has a value of 500, but entity.PropertyTwo and entity.PropertyThree have a value of 499. How is that possible? Even if the code weren't wrapped in a transaction, I would expect the values to match (just maybe duplicated across the Entities if two users ran at the same time).
I am using Entity Framework 6 and SQL Server 2008 R2.
Edit:
Here is the code for GetNextValueFromTheDatabase
public async Task<int> GetNextValueFromTheDatabase()
{
    return await myQuerable
        .OrderByDescending(x => x.PropertyOne) // PropertyOne is an identity column (surprise!)
        .Select(x => x.PropertyOne)
        .Take(1)
        .SingleAsync() + 1;
}
So this question could not be definitively answered at first because GetNextValueFromTheDatabase was not shown. I'm going off of what you said it does:
REPEATABLE READ in SQL Server S-locks rows that you have read. When you read the current maximum, presumably from an index, that row is S-locked. Now, if a new maximum appears that row is unaffected by the lock. That's why the lock does not prevent other, competing maximum values from appearing.
You need SERIALIZABLE isolation if you obtain the maximum by reading the largest values from a table. This will result in deadlocks in your specific case. That can be solved through locking hints or retries.
You could also keep a separate table that stores the current maximum value. REPEATABLE READ is enough here because you always access the same row of that table. You will be seeing deadlocks here as well even with REPEATABLE READ without locking hints.
Retries are a sound solution to deadlocks.
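As an illustration of the separate-table idea, here is a rough sketch that reserves the next value atomically with a single UPDATE (the Counters table and its columns are assumptions, not something from the question):

// Sketch: a single-row counter table updated atomically, so there is no
// read-then-insert race to worry about. Table and column names are hypothetical.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    var command = new SqlCommand(
        @"UPDATE Counters
          SET CurrentValue = CurrentValue + 1
          OUTPUT inserted.CurrentValue
          WHERE CounterName = @name;", connection);
    command.Parameters.AddWithValue("@name", "MyEntityNumber");
    int theNextValue = (int)command.ExecuteScalar();
}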
I think that you are basically experiencing a phantom read.
Consider two transactions T1 and T2 scheduled for execution as shown below. The thing is that T1's first read does not see the value (X) inserted by transaction T2, while the second read does return the value (X). This is the scary nature of REPEATABLE READ: it does not block inserts into the whole table while some of its rows are being read; it only locks the existing rows.

T1                                     T2
SELECT A.X FROM WeirdTable
                                       INSERT INTO WeirdTable (A) VALUES (X)
SELECT A.X FROM WeirdTable
UPDATE
It seems that this answer turned out to be irrelevant for this specific question. It is related to the REPEATABLE READ isolation level, matches the keywords of this question, and is not conceptually wrong though, so I will leave it here.
I finally figured this out. As described in usr's response, multiple transactions can read the same max value at the same time (S-lock). The problem was that one of the columns is an identity column. EF allows you to specify an identity column's value when inserting but ignores the value you specify. So the identity column seemed to update with the expected value most of the time, but in fact the value specified in the domain entity just happened to match what the database was generating internally.
So, for example, let's say the current max number is 499, and transaction A and transaction B both read 499. When transaction A finishes, it successfully writes 500 to all three properties. Transaction B then attempts to write 500 to all three columns. The non-identity columns are updated to 500, but the identity column's value is incremented to the next available value automatically (without throwing an error).
A few solutions
The solution I used is to not set the value for any of the columns when inserting the record. Once the record is inserted, update the other two columns with the database-assigned identity column's value.
Another option would be to change the column's option to .HasDatabaseGeneratedOption(DatabaseGeneratedOption.None)
...which would perform better than the first option, but would require the changes usr suggested to mitigate the lock issues.
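For the second option, a minimal sketch of the EF6 fluent mapping (MyEntity and PropertyOne come from the question; the OnModelCreating override is assumed to live in your DbContext):

// Sketch: turn off identity generation for PropertyOne so the value assigned
// in code is the value that actually gets stored.
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<MyEntity>()
        .Property(e => e.PropertyOne)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.None);
}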
I am using this code to insert 1 million records into an empty table in the database. OK, so without much code, I will start from the point where I have already interacted with the data and read the schema into a DataTable:
So:
DataTable returnedDtViaLocalDbV11 = DtSqlLocalDb.GetDtViaConName(strConnName, queryStr, strReturnedDtName);
And now that we have returnedDtViaLocalDbV11, let's create a new DataTable to be a clone of the source database table:
DataTable NewDtForBlkInsert = returnedDtViaLocalDbV11.Clone();
Stopwatch SwSqlMdfLocalDb11 = Stopwatch.StartNew();
NewDtForBlkInsert.BeginLoadData();
for (int i = 0; i < 1000000; i++)
{
    NewDtForBlkInsert.LoadDataRow(new object[] { null, "NewShipperCompanyName" + i.ToString(), "NewShipperPhone" }, false);
}
NewDtForBlkInsert.EndLoadData();
DBRCL_SET.UpdateDBWithNewDtUsingSQLBulkCopy(NewDtForBlkInsert, tblClients._TblName, strConnName);
SwSqlMdfLocalDb11.Stop();
var ResSqlMdfLocalDbv11_0 = SwSqlMdfLocalDb11.ElapsedMilliseconds;
This code populates 1 million records into an embedded SQL database (localDb) in about 5200 ms. The rest of the code just implements the bulk copy, but I will post it anyway.
public string UpdateDBWithNewDtUsingSQLBulkCopy(DataTable TheLocalDtToPush, string TheOnlineSQLTableName, string WebConfigConName)
{
    // Open a connection to the database.
    using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings[WebConfigConName].ConnectionString))
    {
        connection.Open();

        // Perform an initial count on the destination table.
        SqlCommand commandRowCount = new SqlCommand("SELECT COUNT(*) FROM " + TheOnlineSQLTableName + ";", connection);
        long countStart = System.Convert.ToInt32(commandRowCount.ExecuteScalar());

        var nl = "\r\n";
        string retStrReport = "";
        retStrReport = string.Concat(string.Format("Starting row count = {0}", countStart), nl);
        retStrReport += string.Concat("==================================================", nl);

        // Create a table with some rows.
        //DataTable newCustomers = TheLocalDtToPush;

        // Create the SqlBulkCopy object.
        // Note that the column positions in the source DataTable
        // match the column positions in the destination table so
        // there is no need to map columns.
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = TheOnlineSQLTableName;
            try
            {
                // Write from the source to the destination.
                for (int colIndex = 0; colIndex < TheLocalDtToPush.Columns.Count; colIndex++)
                {
                    bulkCopy.ColumnMappings.Add(colIndex, colIndex);
                }
                bulkCopy.WriteToServer(TheLocalDtToPush);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

        // Perform a final count on the destination
        // table to see how many rows were added.
        long countEnd = System.Convert.ToInt32(commandRowCount.ExecuteScalar());
        retStrReport += string.Concat("Ending row count = ", countEnd, nl);
        retStrReport += string.Concat("==================================================", nl);
        retStrReport += string.Concat((countEnd - countStart), " rows were added.", nl);
        retStrReport += string.Concat("New Customers Was updated successfully", nl, "END OF PROCESS !");
        //Console.ReadLine();
        return retStrReport;
    }
}
Trying it via a connection to SQL Server was around 7000 ms (at best) and ~7700 ms on average. Going via a random key-value NoSQL database took around 40 seconds (I did not even keep exact records, since it took more than twice as long as the SQL variants). So... is there a faster way than what I was testing in my code?
Edit
I am using Win7 x64 with 8 GB of RAM and, most importantly I think, an i5 at 3 GHz, which is not so great by now.
The 3x 500 GB WD drives in RAID-0 do the job even better.
But I am just saying: if you check it on your own PC, just compare it to any other method in your configuration.
Have you tried SSIS? I have never written an SSIS package with a localdb connection, but this is the sort of activity SSIS should be well suited for.
If your data source is a SQL Server, another idea would be setting up a linked server. I'm not sure if this would work with localdb. If you can set up a linked server, you could bypass the C# altogether and load your data with an INSERT ... SELECT ... FROM ... SQL statement.
You can use Dapper.NET.
Dapper is a micro-ORM: it executes a query and maps the results to a strongly typed list.
Object-relational mapping (ORM, O/RM, and O/R mapping) in computer software is a programming technique for converting data between incompatible type systems in object-oriented programming languages. This creates, in effect, a “virtual object database” that can be used from within the programming language.
For more info:
check out https://code.google.com/p/dapper-dot-net/
GitHub repository: https://github.com/SamSaffron/dapper-dot-net
Hope it helps.
Remove the looping... In SQL, try to make a table with 1 million rows and left join it; use that for an INSERT/SELECT of the data.
Try sending it without storing it in a DataTable.
See the example at the end of this post, which allows you to do it with an enumerator: http://www.developerfusion.com/article/122498/using-sqlbulkcopy-for-high-performance-inserts/
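For example, if the source rows can come from a query, here is a hedged sketch of streaming them straight into SqlBulkCopy without building a DataTable first (connection strings and table names are placeholders):

// Sketch: WriteToServer accepts an IDataReader, so rows can be streamed
// from source to destination without being materialized in memory.
using (var source = new SqlConnection(sourceConnectionString))
using (var destination = new SqlConnection(destinationConnectionString))
{
    source.Open();
    destination.Open();
    using (var reader = new SqlCommand("SELECT * FROM SourceTable;", source).ExecuteReader())
    using (var bulkCopy = new SqlBulkCopy(destination))
    {
        bulkCopy.DestinationTableName = "DestinationTable";
        bulkCopy.BatchSize = 10000;     // commit in batches
        bulkCopy.WriteToServer(reader); // no intermediate DataTable
    }
}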
If you are just creating nonsense data, create a stored procedure and just call it through .NET.
If you are passing real data, again passing it to a stored proc would be quicker but you would be best off dropping the table and recreating it with the data.
If you insert one row at a time, it will take longer than inserting it all at once. It will take even longer if you have indexes to write.
Create a single XML file for all of the rows you want to save into the database. Pass this XML to a SQL stored procedure and save all of the records in one call.
But the stored procedure must be written so that it can read all of the records and then insert them into the table.
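A rough sketch of that approach, assuming a hypothetical stored procedure dbo.SaveShippersFromXml that accepts a single XML parameter and shreds it into the table server-side:

// Sketch: build one XML document for all rows and pass it to the database
// in a single call. Procedure and parameter names are assumptions.
var xml = new StringBuilder("<Shippers>");
for (int i = 0; i < 1000000; i++)
{
    xml.AppendFormat("<Shipper Name=\"NewShipperCompanyName{0}\" Phone=\"NewShipperPhone\" />", i);
}
xml.Append("</Shippers>");

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.SaveShippersFromXml", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add("@shippersXml", SqlDbType.Xml).Value = xml.ToString();
    connection.Open();
    command.ExecuteNonQuery();
}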
If this is a new project, I recommend you use Entity Framework. In this case you can create a List<> of objects with all the data you need and then simply add it entirely to the corresponding table.
This way you quickly get the needed data and then send it to the database all at once.
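A rough Entity Framework 6 sketch of that idea (MyDbContext and Shipper are illustrative names, not from the question). Note that EF still issues one INSERT per row under the covers, so for a million rows it will generally be slower than SqlBulkCopy:

// Sketch: build the list in memory, add it in one go, save once.
using (var context = new MyDbContext())
{
    context.Configuration.AutoDetectChangesEnabled = false; // speeds up bulk Add
    var shippers = new List<Shipper>();
    for (int i = 0; i < 1000000; i++)
    {
        shippers.Add(new Shipper
        {
            CompanyName = "NewShipperCompanyName" + i,
            Phone = "NewShipperPhone"
        });
    }
    context.Shippers.AddRange(shippers);
    context.SaveChanges(); // single SaveChanges call, many INSERT statements
}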
I agree with Mike on SSIS, but it may not suit your environment; however, for ETL processes that involve cross-server calls and general data flow it is a great built-in tool and highly integrated.
With 1 million rows you will likely have to do a bulk insert. Depending on the row size, you would not really be able to use a stored procedure unless you did this in batches. A DataTable will fill memory pretty quickly, again depending on the row size. You could make a stored procedure that takes a table type and call it every X number of rows, but why would we do this when you already have a better, more scalable solution? That million rows could be 50 million next year.
I have used SSIS a bit, and if it is an organizational fit I would suggest looking at it, but it wouldn't be a one-time answer and wouldn't be worth the dependencies.