How to delete duplicate rows from an MS Access database (C#)

I have been going through various sites and codes, but nothing seems to end my misery. Either they help to find and remove duplicates for a specific column or they remove only from the datatable, not the underlying database itself. I want to delete duplicate rows from table "table1" of my mdb file.
To make my requirements clearer:
there's no primary key set for any column in the table (and I can't afford to add one)
I want to delete all duplicate rows but one! (order has no significance)
I would rather delete the duplicates from the database than check whether such a row already exists before each update (if that's the last resort, which it can't be, then that's welcome)
by duplicate rows I mean rows that are not distinct. For example, in the following table only the 3rd and 5th rows are duplicates, and I want to delete either one of them.
Name1 Name2 Name3
tom dick harry
tom dick mike
ann sara mike
sara ann mike
ann sara mike
The duplicate rows should be deleted from database with a button click as follows
private void button1_Click(object sender, EventArgs e)
{
deletedupes();
}
private void deletedupes()
{
OleDbConnection con = new OleDbConnection("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");
DataSet ds = new DataSet();
OleDbDataAdapter da = new OleDbDataAdapter("select * from table1", con);
con.Open();
da.Fill(ds, "table1");
// what could be rest of the code??
}
Thanks in advance. Yes I'm a novice..

If you haven't realized it already, database engines tend to think in absolutes. If you want one to delete a row, you have to tell it how to identify that row. Thus, primary keys.
Having said that, there are generally, but not always, two ways you can do this:
Find out if Access supports syntax to tell DELETE to only consider the "first N rows", similar to DELETE TOP 1 FROM ...
Grab a distinct dataset from your table, delete all the rows in it, and insert the distinct rows back into it
The first might be possible, but it depends on whether Access supports any syntax that makes it possible. For instance, Microsoft SQL Server supports executing a statement SET ROWCOUNT 1 before a DELETE, and then DELETE will delete only 1 row and stop. I don't know if Access will do that.
The second will be a pain if you have foreign keys, but I'm going to go out on a limb here and assume that since you don't have primary keys, you don't have foreign keys, so data integrity is not a real problem here.
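The second approach might look roughly like the sketch below. It rests on several assumptions: the scratch table name table1_distinct and the helper names are made up for illustration, the table has no Memo or OLE Object columns (these may not work with DISTINCT), and nothing else is writing to the table while it runs. SELECT ... INTO is the Access make-table syntax.

```csharp
using System.Data.OleDb;

static class DuplicateRemover
{
    // Hypothetical helper: rebuilds table1 so that only distinct rows remain.
    // "table1_distinct" is an assumed scratch table name that must not exist yet.
    public static void RemoveDuplicates(string connectionString)
    {
        using (var con = new OleDbConnection(connectionString))
        {
            con.Open();

            // 1. Copy one instance of every distinct row into the scratch table.
            Execute(con, null, "SELECT DISTINCT * INTO table1_distinct FROM table1");

            // 2. Empty the original table and refill it inside one transaction,
            //    so a failure in between does not leave table1 empty.
            using (OleDbTransaction tx = con.BeginTransaction())
            {
                try
                {
                    Execute(con, tx, "DELETE FROM table1");
                    Execute(con, tx, "INSERT INTO table1 SELECT * FROM table1_distinct");
                    tx.Commit();
                }
                catch
                {
                    tx.Rollback();
                    throw;
                }
            }

            // 3. Remove the scratch table again.
            Execute(con, null, "DROP TABLE table1_distinct");
        }
    }

    // Runs a single non-query statement, optionally inside a transaction.
    private static void Execute(OleDbConnection con, OleDbTransaction tx, string sql)
    {
        using (var cmd = new OleDbCommand(sql, con, tx))
        {
            cmd.ExecuteNonQuery();
        }
    }
}
```

From the button handler in the question this could be called as DuplicateRemover.RemoveDuplicates("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");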

Here is an article discussing several approaches for deleting duplicate rows in SQL Server, but I suspect it would apply to MS Access, as well: Removing Duplicates from a Table in SQL Server

Ok, this is a complete hack, but it sounds like that's your only option...
Do a SELECT DISTINCTROW from your table. Delete all records from your table. Insert the distinct rows back in.
DISTINCTROW Syntax.

I have a similar problem where the values in the rows are identical, but I should keep only 1 row per combination of 2 columns. I was thinking about using COUNT() and GROUP BY with HAVING COUNT() > 1 to get the combinations of these columns that occur more than once in the table, then using a Recordset object from DAO to fetch the rows, skip the first and delete the rest. This is slow and cumbersome but works without adding a primary key.

As none of the answers were satisfactory for me (I'm just a tad too novice to understand the succinct and slightly technical way more knowledgeable and experienced people speak here), I tried my own variant to get this done. I could not follow what had to be done with commands like DISTINCT, SET ROWCOUNT, DELETE FROM, etc. Nowhere could I find complete code in an example. So I tried this, from scratch.
int id;
private void button2_Click(object sender, EventArgs e)
{
OleDbConnection con = new OleDbConnection("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");
DataSet ds = new DataSet();
OleDbDataAdapter da = new OleDbDataAdapter("select * from table2", con);
con.Open();
da.Fill(ds, "table2");
for (int i = 0; i < ds.Tables["table2"].Rows.Count; i++)
{
DataRow row = ds.Tables["table2"].Rows[i];
// compare only with the rows after row i; starting at i + 1 (instead of a counter field) keeps this working on repeated clicks
for (int j = i + 1; j < ds.Tables["table2"].Rows.Count; j++)
{
DataRow row2 = ds.Tables["table2"].Rows[j];
if (row.ItemArray.GetValue(1).ToString() == row2.ItemArray.GetValue(1).ToString())
{
if (row.ItemArray.GetValue(3).ToString() == row2.ItemArray.GetValue(3).ToString())
{
id = int.Parse(row2.ItemArray.GetValue(0).ToString());
deletedupes(id);
}
}
}
}
con.Close();
}
private void deletedupes(int num)
{
OleDbConnection con = new OleDbConnection("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");
con.Open();
OleDbCommand c = new OleDbCommand("Delete from table2 where id =?", con);
c.Parameters.AddWithValue("id", num);
c.ExecuteNonQuery();
con.Close();
}
Edit: Sorry, I forgot to mention that I did use a unique column with a primary key to get this done. Nevertheless, this can be done without one as well. Just a matter of choice. And for unknown reasons, this method seems quite fast too.
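The nested loops above compare every pair of rows and may ask for the same id to be deleted more than once. As an alternative sketch, a single pass with a HashSet can collect the ids of rows whose compared columns were already seen. The class and method names are hypothetical, column 0 is assumed to hold an integer id, and the separator character is assumed not to occur in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class DuplicateFinder
{
    // Returns the id (column 0, assumed to be an integer) of every row whose
    // values in the compared columns already appeared in an earlier row.
    public static List<int> FindDuplicateIds(DataTable table, params int[] comparedColumns)
    {
        var seen = new HashSet<string>();
        var duplicateIds = new List<int>();

        foreach (DataRow row in table.Rows)
        {
            var parts = new List<string>();
            foreach (int c in comparedColumns)
            {
                parts.Add(row[c].ToString());
            }

            // '\u001F' (unit separator) is assumed not to occur in the data.
            string key = string.Join("\u001F", parts);

            // Add returns false when the key is already present: a duplicate.
            if (!seen.Add(key))
            {
                duplicateIds.Add(Convert.ToInt32(row[0]));
            }
        }
        return duplicateIds;
    }
}
```

With the code above, the loops could then shrink to: foreach (int dupId in DuplicateFinder.FindDuplicateIds(ds.Tables["table2"], 1, 3)) deletedupes(dupId);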

Related

DataAdapter_RowUpdated Event's row changes aren't reflected in DataSet and DataTable

My situation involves batch updates to individual tables in an SQLite database through ADO.NET objects. I use the DataAdapter.Update() method to push the changes which works well:
DataTable changes = dataset.Tables[table].GetChanges();
if (changes == null) return 0;
SQLiteCommandBuilder scb = new SQLiteCommandBuilder(adapter);
scb.ConflictOption = ConflictOption.CompareRowVersion;
int cnt = adapter.Update(changes);
return cnt;
However each time a record is inserted I also want the local DataSet tables to reflect with the newly inserted row id. For this I use the adapter_RowUpdated event :
static void adapter_RowUpdated(object sender,
System.Data.Common.RowUpdatedEventArgs e)
{
if (e.StatementType == StatementType.Insert)
{
SQLiteCommand cmd = new SQLiteCommand("select last_insert_rowid();", conn);
e.Row["id"] = cmd.ExecuteScalar();
}
}
The above fetches last_insert_rowid() because I'm able to see it when I debug by putting a breakpoint. However, the assignment statement to e.Row["id"] isn't working. The id change isn't reflected in my original DataSet and DataTable objects. For example when I test the following value (N refers to the specific row index), it still has a DBNull value. What is going wrong here? How can I ensure that the specific row which just got inserted is updated with its corresponding id field value?
dataset.Tables["projects"].Rows[N]["id"];
After a little experimenting, I found the solution to this myself.
Strange as it may sound, it looks like adapter.Update() needs to be given the dataset along with the actual table name for this to work. Until then I had been passing the table object returned by DataTable.GetChanges(), which did update the database but failed in this particular scenario. The moment I switched, the inserted id started showing up in the rows all over the dataset!
//int cnt = adapter.Update(changes); // doesn't work
int cnt = adapter.Update(dataset, tableName); // works perfectly!
edit
Lo and behold! It even works when I pass just the table instead of the entire dataset. It only caused a problem when I passed the changes table (obtained from dataset.Tables[tableName].GetChanges()).
int cnt = adapter.Update(dataset.Tables[tableName]); // works perfectly!
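A likely explanation, which is an inference rather than something stated in the post: DataTable.GetChanges() returns a new DataTable holding copies of the changed rows, so values written to e.Row during Update(changes) land in the copy and never reach the original table. The small sketch below shows that copy behaviour without any database; the table and column names are only examples.

```csharp
using System;
using System.Data;

static class GetChangesDemo
{
    // Returns { id in the GetChanges() copy, id in the original table }.
    public static object[] Run()
    {
        var table = new DataTable("projects");
        table.Columns.Add("id", typeof(long));
        table.Columns.Add("name", typeof(string));

        // A freshly added row whose id is not known yet.
        table.Rows.Add(DBNull.Value, "demo");

        // GetChanges() builds a separate table containing copies of the changed rows.
        DataTable changes = table.GetChanges();

        // Writing to the copy (as the RowUpdated handler effectively did)...
        changes.Rows[0]["id"] = 42L;

        // ...leaves the row in the original table untouched.
        return new object[] { changes.Rows[0]["id"], table.Rows[0]["id"] };
    }
}
```

Here the first returned value is 42 while the second is still DBNull, which matches the symptom described in the question.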

Programmatically erasing a row in SQL within a foreach loop?

I have a relational database connecting meal_ingredients and ingredient nutritional values (see here for further information), utilising PostgreSQL. Within a WinForms application, there is a button that takes values from a DataGridView and then places each row into an array.
Each row in the DataGridView is an ingredient with its nutritional values. Above the DataGridView, a textbox takes a string for the meal name. Upon clicking the button (code below), the array values do one of two things:
If the meal name (meal_name being the PK) already exists in the meal_ingredients table, all rows in the database containing this string are removed. The rows from the DataGridView are then inserted, effectively 'overwriting' the ingredients for that meal.
If the meal name does not exist in the meal_ingredients table, the rows plus the meal name, entered into the textbox, are simply appended to the table.
In my code, as you can see once the data is placed into the array, a connection is made with the database, and the results from the SELECT query loaded into a DataTable.
The loop which follows triggers a MessageBox if the meal_name field matches the string value in the textbox. This works fine.
My issue is as follows. For however many rows exist in the DataGridView, the MessageBox will fire off that many times; so with two rows, I will see two MessageBoxes, for example. This, per se, is not a problem, unless replacing this MessageBox with DELETE and INSERT statements would throw an error.
In place of MessageBox.Show("test");, I would instead place a SQL statement to remove any records where meal_name == txtMealName.Text and then a second SQL statement to insert new records based upon the DataGridView rows. Of course, if the MessageBox fires off according to the number of rows, I expect the SQL would also occur that many times. Again, this is fine in principle. But I am just wondering if this would cause a conflict of any kind (that is, for example, the SQL throwing an exception because there are no remaining rows to delete)?
private void btnMealAdd_Click(object sender, EventArgs e)
{
if (txtMealName.Text != "")
{
foreach (DataGridViewRow row in dgvMealIngredient.Rows)
{
List<string> macroList = new List<string>();
macroList.Add(row.Cells[0].Value.ToString());
macroList.Add(row.Cells[9].Value.ToString());
macroList.Add(txtMealName.Text);
String[] str = macroList.ToArray();
NpgsqlConnection conn = new NpgsqlConnection(Globals.connectionString());
conn.Open();
NpgsqlCommand comm = new NpgsqlCommand();
comm.Connection = conn;
comm.CommandType = CommandType.Text;
comm.CommandText = "SELECT * FROM meal_ingredients";
NpgsqlDataReader dr = comm.ExecuteReader();
DataTable dt = new DataTable();
dt.Load(dr);
foreach (DataRow dataRow in dt.Rows)
{
if (dataRow[0].ToString() == txtMealName.Text)
{
MessageBox.Show("test");
}
Debug.WriteLine(dataRow[0]);
}
}
}
else
{
MessageBox.Show("error: enter a meal name");
}
}
A simplified form of the database relation (note that qty is one of the fields in the DataGridView):
just whether replacing that with INSERT or DELETE will throw an error
An SQL DELETE can typically be run multiple times without error. There either will be some rows for it to delete or there will not, but it will normally only result in an error if there are dependent records in another table and no arrangement for them to be deleted or disconnected in cascade fashion. It is not an error for a DELETE statement to affect 0 rows.
An SQL INSERT can typically only be run multiple times when it is not subsequently (after the first run) inhibited by the presence of a unique constraint on one or more of the columns. As most tables you design should really have a primary key, you can only insert a row with a unique value for the key column. If you aren't devolving generation of the value to the database, then re-running an identical INSERT will fail on the second run. If the table depends on another table to have a related row and a foreign key constraint backs this up, then an insert that doesn't relate to a row in the parent table will fail on the first run.
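To make the "overwrite" behaviour from the question concrete, the sketch below issues the DELETE once, outside any per-row loop, and then one INSERT per ingredient, all in a single transaction. It is only an illustration under assumptions: the column names ingredient_name and qty, the numeric type of qty, and the class and method names are guesses and not taken from the actual schema.

```csharp
using System.Collections.Generic;
using Npgsql;

static class MealWriter
{
    // Hypothetical helper: replaces all ingredient rows of one meal.
    // Column names ingredient_name and qty are assumed for illustration.
    public static void ReplaceMeal(string connectionString, string mealName,
        IEnumerable<KeyValuePair<string, decimal>> ingredients)
    {
        using (var conn = new NpgsqlConnection(connectionString))
        {
            conn.Open();
            using (NpgsqlTransaction tx = conn.BeginTransaction())
            {
                // One DELETE for the whole meal; affecting 0 rows is not an error.
                using (var del = new NpgsqlCommand(
                    "DELETE FROM meal_ingredients WHERE meal_name = @meal", conn, tx))
                {
                    del.Parameters.AddWithValue("meal", mealName);
                    del.ExecuteNonQuery();
                }

                // One INSERT per ingredient row taken from the grid.
                foreach (KeyValuePair<string, decimal> item in ingredients)
                {
                    using (var ins = new NpgsqlCommand(
                        "INSERT INTO meal_ingredients (meal_name, ingredient_name, qty) " +
                        "VALUES (@meal, @ingredient, @qty)", conn, tx))
                    {
                        ins.Parameters.AddWithValue("meal", mealName);
                        ins.Parameters.AddWithValue("ingredient", item.Key);
                        ins.Parameters.AddWithValue("qty", item.Value);
                        ins.ExecuteNonQuery();
                    }
                }

                // If anything above throws, disposing tx without Commit rolls back.
                tx.Commit();
            }
        }
    }
}
```

Because the DELETE no longer sits inside the loop over grid rows, it cannot wipe rows that an earlier iteration has just inserted.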

C#: Fill DataGridView with DataTable creates empty table

I searched the web and Stack Overflow and found lots of descriptions on how to fill a DataGridView with the content of a DataTable. But still it does not work for me. My DataGridView shows the correct number of columns and rows, but they appear empty.
I use following method:
public void ShowDataInGrid(ref DataTable table)
{
BindingSource sBind = new BindingSource();
dbView.Columns.Clear();
dbView.AutoGenerateColumns = false;
sBind.DataSource = table;
dbView.DataSource = sBind; //Add table to DataGridView
dbView.Columns.Add("Date", "Date");
}
Before this I created a DataGridView named "dbView" via the designer. I am not even sure whether I need sBind. Without it I can bind the table directly to dbView, with the same bad result.
I suspect my table is the problem. It originates from a database (SQLite) and has several columns and rows (one of the columns has the name "Date"). It is definitely filled with readable data.
I mainly read the table in using following commands (after this I manipulate the data in several different steps, like changing strings and adding numbers...):
string sql = "select * from Bank";
SQLiteCommand command = new SQLiteCommand(sql, m_dbConnection);
SQLiteDataReader reader = command.ExecuteReader();
table.Load(reader);
reader.Close();
table.AcceptChanges();
I think the problem might be, that the table entries are stored as objects and not as string, and hence can't be shown. That's why I tried to force the content to be strings with the following change to my table:
DataTable dbTableClone = new DataTable();
dbTableClone.Load(reader);
reader.Close();
dbTableClone.AcceptChanges();
string[] dBHeader = ReadHeaderFromDataTable(dbTableClone); // own function, which reads the header
// first create table as an empty clone, so the DataType of each column can be set
DataTable table = dbTableClone.Clone();
for (int col = 0; col < dBHeader.Length; col++) // first set all columns to string
{
table.Columns[col].DataType = typeof(string);
}
foreach (DataRow row in dbTableClone.Rows)
{
table.ImportRow(row);
}
This did not help me either.
Another idea: I found some comments on similar problems, where it got apparently solved with quote: "I designed columns in the VS datagridview designer. Not the column name, but the column DataPropertyName must match with fields in database." Unfortunately I don't seem to be able to do/understand this.
Following you see one row of my input table.
Try fetching and setting to GridView this way
// placeholder connection string; change it according to your DB
SQLiteConnection con = new SQLiteConnection(@"Data Source=(LocalDB)\v11.0;AttachDbFilename=DB.mdf;Integrated Security=True");
con.Open();
SQLiteDataAdapter adap = new SQLiteDataAdapter("select * from Bank", con);
DataSet ds = new System.Data.DataSet();
adap.Fill(ds);
dataGridView1.DataSource = ds.Tables[0];
Comment out everything you've done so far, try this and let me know whether it works for you. Change the connection according to your DB.
I solved the problem.
The DataTable was fine. The problem was the setup of my DataGridView dbView. I set up dbView in the designer and somehow gave it a datasource. Now I set the datasource to "none" (In "DataGridView Tasks") and my data appears as intended.
Thanks to M Adeel Khalid for looking at my stuff. His assurance that my code for the binding was right eventually led me to the solution.
At the end I really only needed to use a single line:
dbView.DataSource = table;
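For cases where AutoGenerateColumns has to stay false, the remark quoted earlier about DataPropertyName can be sketched as follows. This is an illustration rather than the poster's final code: the helper name is made up, and "Date" is the column name mentioned in the question.

```csharp
using System.Data;
using System.Windows.Forms;

static class GridBinding
{
    // Hypothetical helper: shows only the "Date" column of the table in the grid.
    public static void ShowDateColumn(DataGridView dbView, DataTable table)
    {
        dbView.DataSource = null;          // drop any data source set in the designer
        dbView.Columns.Clear();
        dbView.AutoGenerateColumns = false;

        var dateColumn = new DataGridViewTextBoxColumn
        {
            Name = "Date",                 // name of the grid column
            HeaderText = "Date",           // caption shown in the header
            DataPropertyName = "Date"      // must match the DataTable column name
        };
        dbView.Columns.Add(dateColumn);

        dbView.DataSource = table;         // bind after the columns are defined
    }
}
```

A column added with dbView.Columns.Add("Date", "Date") has no DataPropertyName, so it stays unbound and its cells remain empty even when the table holds data.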

C# Manually added dataset, how to retrieve data to text columns

I have added a dataset to the solution (Windows Forms) via "Add" -> "New Item" -> "DataSet" and created a new TableAdapter query at design time that fetches the desired data for a passed parameter.
Now I want to assign the data filled by the TableAdapter to a few textboxes when a button is clicked.
How can I achieve this?
I think I found an answer specific to my situation. I am not sure whether it is the best or the standard approach; as I am not getting much help from searches, I am accepting my own finding as the solution.
private void getnameid2()
{
PersonDataSet newPersonDataSet = new PersonDataSet(); //PersonDataSet is the manually created dataset
PersonDataSetTableAdapters.L_PEOPLETableAdapter newPersonDataSetTableAdapter = new PersonDataSetTableAdapters.L_PEOPLETableAdapter();
DataTable mytable = new DataTable();
mytable = newPersonDataSetTableAdapter.GetData(decimal.Parse(this.civilidTextbox.Text.ToString()));
//foreach (DataRow row in newPersonDataSetTableAdapter.GetData(decimal.Parse(this.civilidTextbox.Text.ToString()))
foreach (DataRow row in mytable.Rows)
{
nameTextBox.Text = row["FIRST_NAME"].ToString();
personidTextBox.Text = row["PERSON_ID"].ToString();
}
// if (mytable.Rows.Count > 0) { MessageBox.Show(mytable.Rows.Count.ToString()); }
}
Now I call this procedure while the cell is validating. To avoid already saved transactions being updated while browsing the records, I check the transaction id column and return early.
Hope this helps someone else out there or brings the attention of experts who can devise a better approach :)

Get Inserted Row after .Net Dataset Row Added

I have a .NET dataset and am adding a row to a table. This works and the record is saved to the database. How do I get the updated version of my row after the insert? Or, alternatively, how do I know the ID of the item that was added (so that I can then use it in a subsequent child table insert)?
MyDataSet.ProjectRow r = dsMyDataSet.Projects.AddProjectRow(txtTitle.Text);
m_daProjects.Update(dsMyDataSet.Projects);
// What is the ID of the new item here?
If the column is an identity column you can find the new ID's in the inserted rows.
You: "thanks. which object maintains a list of inserted rows?"
You can use DataTable.GetChanges(DataRowState.Added) to get a DataTable with all DataRows which are going to be added. You need to use it before AcceptChanges is called. If I remember correctly, TableAdapter.Update calls AcceptChanges at the end, so you need to use it before m_daProjects.Update(dsMyDataSet.Projects):
DataTable addedRows = ds.modModel.GetChanges(DataRowState.Added);
MyDataSet.ProjectRow r = dsMyDataSet.Projects.AddProjectRow(txtTitle.Text);
m_daProjects.Update(dsMyDataSet.Projects);
// now addedRows contains all DataRows with the new identity value in each row
foreach(DataRow addedRow in addedRows.Rows)
Console.WriteLine("New ID: {0}", addedRow.Field<int>("IdColumn"));
Update: However, in your case it's simpler. You already have the single row that you want to insert, so you don't need to call DataTable.GetChanges at all.
You can see the new identity value in the (typed DataRow) ProjectRow r after Update.
Thanks to Tim Schmelter. In the link he posted there's a reference to an article on Beth Massi's blog with a complete walkthrough of the solution. It worked for me.
http://blogs.msdn.com/bethmassi/archive/2009/05/14/using-tableadapters-to-insert-related-data-into-an-ms-access-database.aspx
The basic steps are:
1) Add a RowUpdated event handler on the strongly typed table adapter. This event handler issues a new OleDbCommand to the database to retrieve @@IDENTITY and then assigns the integer to the primary key column of the row.
public void _adapter_RowUpdated(object sender, System.Data.OleDb.OleDbRowUpdatedEventArgs e)
{
HMUI.Classes.AccessIDHelper.SetPrimaryKey(this.Connection, e);
}
public static void SetPrimaryKey(OleDbConnection trans, OleDbRowUpdatedEventArgs e)
{
if (e.Status == System.Data.UpdateStatus.Continue && e.StatementType == System.Data.StatementType.Insert)
{
// the primary key columns of the table the inserted row belongs to
DataColumn[] pk = e.Row.Table.PrimaryKey;
if (pk != null && pk.Length == 1)
{
OleDbCommand cmdGetIdentity = new OleDbCommand("SELECT @@IDENTITY", trans);
// Execute the post-update query to fetch the new @@IDENTITY
e.Row[pk[0]] = Convert.ToInt32(cmdGetIdentity.ExecuteScalar());
e.Row.AcceptChanges();
}
}
}
2) In the constructor of the form using the dataset and table adapter I attach the function in step 1 to the RowUpdated event on the table adapter's internal data adapter.
// Event to handle inserted records and retrieve the primary key ID
m_daDataSources.Adapter.RowUpdated += new System.Data.OleDb.OleDbRowUpdatedEventHandler(m_daDataSources._adapter_RowUpdated);
