how to get distinct records in datatable? - c#

I am using C#, VS2008, .NET, ASP.NET, IIS 7.0, ADO.NET, and SQL Server 2008. I have an ADO.NET DataTable and I want to filter out duplicate records. My rule for duplicates is this: rows that have the same value in a particular string column count as duplicates, and I want to keep only one row from each such group.
The output needs to be a DataTable; it may be the same DataTable object if the filtering can be done in place.
What is the most efficient solution?

Are you using .NET 3.5? If you cast your data rows, you can use LINQ to Objects:
var distinctRows = table.Rows.Cast<DataRow>().Distinct(new E());
...
public class E : IEqualityComparer<DataRow>
{
    bool IEqualityComparer<DataRow>.Equals(DataRow x, DataRow y)
    {
        // The indexer returns object, so == would compare references, not values.
        return object.Equals(x["colA"], y["colA"]);
    }
    int IEqualityComparer<DataRow>.GetHashCode(DataRow obj)
    {
        return obj["colA"].GetHashCode();
    }
}
Or an even simpler way, since you're basing it on a single column's values:
var distinct = from r in table.Rows.Cast<DataRow>()
group r by (string)r["colA"] into g
select g.First();
If you need to make a new DataTable out of these distinct rows, you can do this:
// Clone copies the schema (columns) but no rows. A DataColumn or DataRow
// can belong to only one table, so the originals cannot be added directly.
var t2 = table.Clone();
foreach (var r in distinct)
{
    t2.ImportRow(r);
}
Or if it would be more handy to work with business objects, you can do an easy conversion:
var persons = (from r in distinct
select new PersonInfo
{
EmpId = (string)r["colA"],
FirstName = (string)r["colB"],
LastName = (string)r["colC"],
}).ToList();
...
public class PersonInfo
{
public string EmpId {get;set;}
public string FirstName {get;set;}
public string LastName {get;set;}
}
Update
Everything you can do in LINQ to Objects can also be done without it: it just takes more code. For example:
// "table" is the existing source DataTable.
var rowSet = new HashSet<DataRow>(new E());
var newTable = table.Clone(); // same columns, no rows
foreach (DataRow row in table.Rows)
{
    // HashSet.Add returns false if an equal row is already in the set.
    if (rowSet.Add(row))
    {
        newTable.ImportRow(row);
    }
}
You could also use a similar strategy to simply remove duplicate rows from the original table instead of creating a new table.
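As a minimal sketch of that in-place variant: the helper below keeps the first row for each distinct value of one column and removes the rest from the same table. The class and method names (DataTableDedup, RemoveDuplicates) are made up for illustration, and the column name is passed in rather than hard-coded as "colA". Duplicates are collected first because a DataRowCollection cannot be modified while it is being enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataTableDedup
{
    // Removes rows whose value in columnName has already been seen,
    // keeping the first occurrence. Returns the number of rows removed.
    public static int RemoveDuplicates(DataTable table, string columnName)
    {
        var seen = new HashSet<object>();
        var duplicates = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            // Add returns false when an equal value is already in the set.
            if (!seen.Add(row[columnName]))
            {
                duplicates.Add(row);
            }
        }

        // Remove after the enumeration has finished.
        foreach (DataRow row in duplicates)
        {
            table.Rows.Remove(row);
        }

        return duplicates.Count;
    }
}
```

Usage would look like DataTableDedup.RemoveDuplicates(table, "colA"). Note that Rows.Remove drops the row outright; if the table is later sent back to a database through a data adapter, marking rows with Delete() may be the better fit.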

You can do a SELECT INTO with a GROUP BY clause, so that no duplicates are created. Then drop the old table and rename the table you selected into to the original table name.

I would do this in the database layer:
SELECT Distinct...
FROM MyTable
Or if you need aggregates:
SELECT SUM(Field1), ID FROM MyTable
GROUP BY ID
Put the SELECT statement in a stored procedure. Then, in .NET, open a connection to the database and call the stored procedure. Load the rows into a DataTable with a SqlDataAdapter (Fill) or with ExecuteReader(), because ExecuteNonQuery() does not return rows, and return the DataTable to your UI.

Related

How do I use LINQ to update a datatable with a SqlDataReader?

I am trying to merge data from two separate queries using C#. The data is located on separate servers or I would just combine the queries. I want to update the data in one of the columns of the first data set with the data in one of the columns of the second data set, joining on a different column.
Here is what I have so far:
ds.Tables[3].Columns[2].ReadOnly = false;
List<object> table = new List<object>();
table = ds.Tables[3].AsEnumerable().Select(r => r[2] = reader.AsEnumerable().Where(s => r[3] == s[0])).ToList();
The ToList() is just for debugging. To summarize, ds.Tables[3].Columns[2] is the column I want to update. ds.Tables[3].Columns[3] contains the key I want to join to.
In the reader, the first column contains the key that matches ds.Tables[3].Columns[3], and the second column contains the data with which I want to update ds.Tables[3].Columns[2].
The error I keep getting is
Unable to cast object of type 'WhereEnumerableIterator`1[System.Data.IDataRecord]' to type 'System.IConvertible'. Couldn't store <System.Linq.Enumerable+WhereEnumerableIterator`1[System.Data.IDataRecord]> in Quoting Dealers Column. Expected type is Int32.
Where am I going wrong with my LINQ?
EDIT:
I updated the line where the updating is happening
table = ds.Tables[3].AsEnumerable().Select(r => r[2] = reader.AsEnumerable().First(s => r[3] == s[0])[1]).ToList();
but now I keep getting
Sequence contains no matching element
For the record, the sequence does contain a matching element.
You can use the following sample to do the join and the update. Suppose there are two DataTables: tbl1, with columns such as ID, name1, and age, and tbl2, with columns ID and name2.
The method joins the two tables on ID and replaces the value of column "name1" of tbl1 with the value of column "name2" of tbl2.
public DataTable JoinAndUpdate(DataTable tbl1, DataTable tbl2)
{
// for demo purpose I have created a clone of tbl1.
// you can define a custom schema, if needed.
DataTable dtResult = tbl1.Clone();
var result = from dataRows1 in tbl1.AsEnumerable()
join dataRows2 in tbl2.AsEnumerable()
on dataRows1.Field<int>("ID") equals dataRows2.Field<int>("ID") into lj
from reader in lj
select new object[]
{
dataRows1.Field<int>("ID"), // ID from table 1
reader.Field<string>("name2"), // Updated column value from table 2
dataRows1.Field<int>("age")
// .. here comes the rest of the fields from table 1.
};
// Load the results in the table
result.ToList().ForEach(row => dtResult.LoadDataRow(row, false));
return dtResult;
}
After considering what @DStanley said about LINQ, I abandoned it and went with a foreach statement. See the code below:
ds.Tables[3].Columns[2].ReadOnly = false;
while (reader.Read())
{
foreach (DataRow item in ds.Tables[3].Rows)
{
if ((Guid)item[3] == reader.GetGuid(0))
{
item[2] = reader.GetInt32(1);
}
}
}
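The loop above rescans every row of the table for each record in the reader. A possible refinement, sketched here under assumptions, reads the reader once into a dictionary and then updates the table in a single pass. The names ReaderMerge and UpdateFromReader are hypothetical; the reader layout (Guid key in column 0, Int32 value in column 1) follows the code above. Comparing typed Guid values also sidesteps the problem in the earlier LINQ attempt, where r[3] == s[0] compared two boxed objects by reference and so never matched.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ReaderMerge
{
    // Copies values from the reader into target[valueColumn] for every row
    // whose Guid in target[keyColumn] matches the reader's first column.
    // Returns the number of rows updated.
    public static int UpdateFromReader(DataTable target, IDataReader reader,
                                       int keyColumn, int valueColumn)
    {
        // Read the forward-only reader exactly once.
        var lookup = new Dictionary<Guid, int>();
        while (reader.Read())
        {
            lookup[reader.GetGuid(0)] = reader.GetInt32(1);
        }

        int updated = 0;
        foreach (DataRow row in target.Rows)
        {
            int value;
            // Rows with a null or non-Guid key are skipped.
            if (row[keyColumn] is Guid && lookup.TryGetValue((Guid)row[keyColumn], out value))
            {
                row[valueColumn] = value;
                updated++;
            }
        }
        return updated;
    }
}
```

With the table from the question, the call would be ReaderMerge.UpdateFromReader(ds.Tables[3], reader, 3, 2), after setting ReadOnly to false on column 2 as before.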

Populate a List with all the table names which is having specified columns through sql in c#

I have a SQL Server database with a few tables in it.
I need to populate a listbox with the names of the tables in the database that contain a specified column name, say 'special'.
I have tried something like this:
using (SqlConnection connection = new SqlConnection(connectionString))
{
connection.Open();
List<string> tables = new List<string>();
DataTable dt = connection.GetSchema("Tables");
foreach (DataRow row in dt.Rows)
{
string tablename = (string)row[2];
tables.Add(tablename);
}
listbox1.ItemsSource = tables;
connection.Close();
}
but it shows all the tables in the database.
I want the list to contain only those tables that have the specific column.
Kindly suggest a way. :)
You can use this linq query (now tested):
List<string> tNames= new List<string>(); // fill it with some table names
List<string> columnNames = new List<string>() { "special" };
// ...
IEnumerable<DataRow> tableRows = con.GetSchema("Tables").AsEnumerable()
.Where(r => tNames.Contains(r.Field<string>("TABLE_NAME"), StringComparer.OrdinalIgnoreCase));
foreach (DataRow tableRow in tableRows)
{
String database = tableRow.Field<String>("TABLE_CATALOG");
String schema = tableRow.Field<String>("TABLE_SCHEMA");
String tableName = tableRow.Field<String>("TABLE_NAME");
String tableType = tableRow.Field<String>("TABLE_TYPE");
IEnumerable<DataRow> columns = con.GetSchema("Columns", new[] { database, schema, tableName }).AsEnumerable()
.Where(r => columnNames.Contains(r.Field<string>("COLUMN_NAME"), StringComparer.OrdinalIgnoreCase));
if (columns.Any())
{
tables.Add(tableName);
}
}
IMHO you should simply query the INFORMATION_SCHEMA.COLUMNS view instead of trying to filter the returned schema. Retrieving the whole schema first, only to throw most of the data away, is very inefficient.
SELECT c.TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.COLUMN_NAME = 'YourLovelyColumnName'
Assuming you are working on SQL Server:
IF COL_LENGTH('table_name','column_name') IS NOT NULL
BEGIN
/*Column exists */
END
See more:
How to check if column exists in SQL Server table

How to copy all the rows in a datatable to a datarow array?

I have two tables:
tbl_ClassFac:
ClassFacNo (Primary Key),
FacultyID,
ClassID
tbl_EmpClassFac:
EmpID (Primary Key),
DateImplement (Primary Key),
ClassFacNo
I want to find all the employees who are on a specific ClassFacNo, i.e. all EmpIDs with a specific ClassFacNo. What I do is first search tbl_EmpClassFac for the EmpID supplied by the user and store the matching DataRows. Then I use the ClassFacNo from these DataRows to search tbl_ClassFac.
The following is my code.
empRowsCF = ClassFacDS.Tables["EmpClassFac"].Select("EmpID='" + txt_SearchValueCF.Text + "'");
int maxempRowsCF = empRowsCF.Length;
if (maxempRowsCF > 0)
{
foundempDT = ClassFacDS.Tables["ClassFac"].Clone();
foreach (DataRow dRow in empRowsCF)
{
returnedRowsCF = ClassFacDS.Tables["ClassFac"].Select("ClassFacNo='" + dRow[2].ToString() + "'");
foundempDT.ImportRow(returnedRowsCF[0]);
}
}
dataGrid_CF.DataSource = null;
dataGrid_CF.DataSource = foundempDT.DefaultView;
***returnedRowsCF = foundempDT.Rows;*** // so NavigateRecordsCF can be used
NavigateRecordsCF("F"); // function to display data in textboxes (no importance here)
I know the code is not very good, but it is all I can think of. If anyone has any suggestions, please tell me. If not, how do I copy all the rows in a DataTable to a DataRow array?
"How to copy all the rows in a datatable to a datarow array?"
If that helps, use the overload of Select without a parameter
DataRow[] rows = table.Select();
DataTable.Select()
Gets an array of all DataRow objects.
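For illustration, here is a small sketch of two ways to get a DataRow[] from a table. The assignment returnedRowsCF = foundempDT.Rows in the question does not compile because Rows is a DataRowCollection, not an array. The class name RowArrays is made up for this example.

```csharp
using System;
using System.Data;

public static class RowArrays
{
    // Option 1: Select() with no arguments returns the rows as an array.
    public static DataRow[] ViaSelect(DataTable table)
    {
        return table.Select();
    }

    // Option 2: copy the DataRowCollection into a pre-sized array.
    public static DataRow[] ViaCopyTo(DataTable table)
    {
        var rows = new DataRow[table.Rows.Count];
        table.Rows.CopyTo(rows, 0);
        return rows;
    }
}
```

Both return references to the table's own rows, not copies, so editing an element of the array edits the table.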
As for the rest of your question, it's not clear what you are asking.
But I assume you want to filter the first table by the value of a field in the second (related) table. You can use this concise LINQ to DataSet query:
var rows = from cfrow in tbl_ClassFac.AsEnumerable()
join ecfRow in tbl_EmpClassFac.AsEnumerable()
on cfrow.Field<int>("ClassFacNo") equals ecfRow.Field<int>("ClassFacNo")
where ecfRow.Field<int>("EmpId") == EmpId
select cfrow;
// if you want a new DataTable from the filtered tbl_ClassFac-DataRows:
var tblResult = rows.CopyToDataTable();
Note that CopyToDataTable throws an exception if the sequence of DataRows is empty, i.e. if the filter returned no rows. You can avoid it this way:
var tblResult = rows.Any() ? rows.CopyToDataTable() : tbl_ClassFac.Clone(); // empty table with same columns as source table

How to get my datasource only once for combobox?

I use telerik:RadComboBox
Like this :
<telerik:RadComboBox runat="server" ID="RadComboBox1" EnableLoadOnDemand="true"
ShowMoreResultsBox="true" EnableVirtualScrolling="true" CollapseDelay="0" Culture="ar-EG" ExpandDelay="0" Filter="StartsWith" ItemsPerRequest="100"
MarkFirstMatch="true" Skin="Outlook" ValidationGroup="L" Width="202px" EnableAutomaticLoadOnDemand="True"
EmptyMessage="-Enter user name-"
EnableItemCaching="true" >
<WebServiceSettings Path="../WebService/Employees.asmx" Method="LoadData" />
</telerik:RadComboBox>
and my web service :
[System.Web.Script.Services.ScriptService]
public class Employees : System.Web.Services.WebService
{
[WebMethod(EnableSession = true)]
public RadComboBoxData LoadData(RadComboBoxContext context)
{
RadComboBoxData result = new RadComboBoxData();
DataTable dt = FollowsDAL.GetAllEmployees();
var allEmployees = from r in dt.AsEnumerable()
orderby r.Field<string>("name")
select new RadComboBoxItemData
{
Text = r.Field<string>("name").ToString().TrimEnd()
};
string text = context.Text;
if (!String.IsNullOrEmpty(text))
{
allEmployees = allEmployees.Where(item => item.Text.StartsWith(text));
}
//Perform the paging
// - first skip the amount of items already populated
// - take the next 100 items
int numberOfItems = context.NumberOfItems;
var employees = allEmployees.Skip(numberOfItems).Take(100);
result.Items = employees.ToArray();
int endOffset = numberOfItems + employees.Count();
int totalCount = allEmployees.Count();
//Check if all items are populated (this is the last page)
if (endOffset == totalCount)
result.EndOfItems = true;
//Initialize the status message
result.Message = String.Format("Items <b>1</b>-<b>{0}</b> out of <b>{1}</b>",
endOffset, totalCount);
return result;
}}
My problem is:
Although this control is very fast, every time I type a name it first fetches all 20000 employees into the DataTable dt, and it does so with every character typed.
My questions are:
Why is it this fast despite this bad behavior?
Is there some way to get all the employees only once?
How can I improve the performance?
It is always better to use server-side filtering, because you should not need to retrieve 20000 records to the web server just to return 10 or 20 items.
http://demos.telerik.com/aspnet-ajax/combobox/examples/populatingwithdata/autocompletesql/defaultcs.aspx
Your DAL should have a method to filter the results based on the sent text, then you add them to the combobox. My DAL is Telerik OpenAccess ORM (Linq2SQL) but you could also write a stored procedure to filter the results as well.
Here is an example of one of my asmx services that populates a radcombobox:
[WebMethod]
public RadComboBoxData FindEmployee(RadComboBoxContext context)
{
RadComboBoxData comboData = new RadComboBoxData();
using (DataBaseContext dbc = new DataBaseContext())
{
IQueryable<Employee> Employees = dbc.FindEmployee(context.Text);
int itemOffset = context.NumberOfItems;
int endOffset = Math.Min(itemOffset + 10, Employees.Count());
List<RadComboBoxItemData> result = new List<RadComboBoxItemData>();
var AddingEmployees = Employees.Skip(itemOffset).Take(endOffset - itemOffset);
foreach (var Employee in AddingEmployees)
{
RadComboBoxItemData itemData = new RadComboBoxItemData();
itemData.Text = Employee.Person.FullName;
itemData.Value = Employee.EmployeeID.ToString();
result.Add(itemData);
}
comboData.EndOfItems = endOffset == Employees.Count();
comboData.Items = result.ToArray();
if (Employees.Count() <= 0)
comboData.Message = "No matches";
else
comboData.Message = String.Format("Items <b>1</b>-<b>{0}</b> out of <b>{1}</b>", endOffset, Employees.Count());
return comboData;
}
}
and in case you are wondering what my FindEmployee method is:
public IQueryable<Employee> FindEmployee(string SearchString, bool IncludeInactive = false)
{
return from e in this.Employees
where
(e.EmployeeID.ToString() == SearchString ||
e.Person.FirstName.Contains(SearchString) ||
e.Person.MiddleName.Contains(SearchString) ||
e.Person.LastName.Contains(SearchString) ||
(e.Person.FirstName + " " + e.Person.LastName).Contains(SearchString) ||
(e.Person.FirstName + " " + e.Person.MiddleName).Contains(SearchString) ||
(e.Person.FirstName + " " + e.Person.MiddleName + " " + e.Person.LastName).Contains(SearchString)) &&
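// Apply the Inactive check only when inactive employees are excluded;
// the original condition returned no rows at all when IncludeInactive was true.
(IncludeInactive || e.Inactive == false || e.Inactive == null)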
select e;
}
As I understand it, sending the same request to the database over and over again is not good for the application's health.
There are basically two ways to make the process fast:
Bring the data from your database in the form of a DataTable.
Bring the data from your database in the form of a DataSet.
DataTable Approach
Fetch all the records from the database when the form loads. Keep them in ViewState, not in Session; please take care on this point.
To filter, read the value from ViewState, cast it to DataTable, and call the extension method below.
public static class GetFilteredData
{
public static DataTable FilterDataTable(this DataTable Dt, string FilterExpression)
{
using (DataView Dv = new DataView(Dt))
{
Dv.RowFilter = FilterExpression;
return Dv.ToTable();
}
}
}
DataTableObject.FilterDataTable("Search Expression or your string variable")
This returns the filtered DataTable. Reassign it to the control without any database trips. Execute this step whenever you have to filter the records.
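The filter expression itself is not spelled out above. As a sketch, assuming the goal is a "starts with" match on one text column, the expression can be built as a LIKE pattern. PrefixFilter, EscapeLikeValue, and StartsWith are names invented for this example, and the column name is assumed not to contain a closing bracket. User input is escaped because quotes and wildcard characters have special meaning in a RowFilter.

```csharp
using System;
using System.Data;
using System.Text;

public static class PrefixFilter
{
    // Escapes characters that are special inside a RowFilter LIKE pattern:
    // wildcards and brackets are wrapped in [], single quotes are doubled.
    public static string EscapeLikeValue(string value)
    {
        var sb = new StringBuilder();
        foreach (char c in value)
        {
            if (c == '*' || c == '%' || c == '[' || c == ']')
                sb.Append('[').Append(c).Append(']');
            else if (c == '\'')
                sb.Append("''");
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Returns a new table holding the rows whose column value starts with prefix.
    public static DataTable StartsWith(DataTable source, string columnName, string prefix)
    {
        using (var view = new DataView(source))
        {
            view.RowFilter = string.Format("[{0}] LIKE '{1}%'",
                                           columnName, EscapeLikeValue(prefix));
            return view.ToTable();
        }
    }
}
```

A call such as PrefixFilter.StartsWith(employees, "name", context.Text) would then give the rows to bind, where employees is the cached DataTable.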
DataSet Approach
This process returns 26 DataTables from your database. I know it looks very heavy, but as you have already mentioned, there are about 20,000 records in total, and they will be divided among these tables. See the explanation below.
The text column of the ComboBox can start with 26 different characters, so you divide the records by their first character. Records starting with A go into the first table, records starting with B into the second table, records starting with C into the third table, and so on until records starting with Z go into the 26th table.
Please note that your UDT query is first used to insert all records into a local temporary table. That temporary table is then read by 26 SELECT statements, one per starting character.
Below is the Sample Stored Proc.
Create Proc ProcName
As
Create Table #Temp
(
ColumnName Varchar(50)
)
Insert into #Temp(ColumnName)
Select ColumnName from YourTableName
Select ColumnName From #Temp Where ColumnName like 'a%'
Select ColumnName From #Temp Where ColumnName like 'b%'
Select ColumnName From #Temp Where ColumnName like 'c%'
--UpTo Z
Now you finally have 26 tables, and the data is returned as a DataSet from your BLL.
Keep it in ViewState only. When filtering the data, use the function below.
public static class GetFilteredData
{
public static DataTable FilterDataTable(this DataSet Dt, string FilterExpression)
{
string Lowercase = FilterExpression.ToLower();
Int16 TableID = 0;
if (Lowercase.StartsWith("a"))
{
TableID = 0;
}
else if (Lowercase.StartsWith("b"))
{
TableID = 1;
}
else if (Lowercase.StartsWith("c"))
{
TableID = 2;
}
//upTo Z
using (DataView Dv = new DataView(Dt.Tables[TableID]))
{
Dv.RowFilter = FilterExpression;
return Dv.ToTable();
}
}
}
So the significance of the DataSet technique is that the records are further divided into sub-nodes in the form of tables. Your search expression is applied to one of the split tables of the DataSet rather than to the whole data set.
Code Modification as Per mentioned in the Original Query
Add the following in your Web Application/WebSite only.
public static class GetFilteredData
{
public static DataTable FilterDataTable(this DataTable Dt, string FilterExpression)
{
using (DataView Dv = new DataView(Dt))
{
Dv.RowFilter = FilterExpression;
return Dv.ToTable();
}
}
}
Add the following property to the WebForm itself. If the ViewState entry is null, the property loads the result set from the database and stores it in ViewState; otherwise it returns the data preserved in ViewState.
public DataTable Employees
{
    get
    {
        if (ViewState["Employees"] == null)
        {
            // Store the result so that later reads do not hit the database again.
            ViewState["Employees"] = FollowsDAL.GetAllEmployees();
        }
        return (DataTable)ViewState["Employees"];
    }
    set
    {
        ViewState["Employees"] = value;
    }
}
Now you can access this ViewState data in the WebForm that contains the ComboBox control. In my opinion you should go for the DataSet approach.
Please note that the web service is not required in this context.
I would create a method that loaded the values from your database and then stored them in cache. Subsequent calls to this method should return the cached version. Then set the DataSource to this method. That should give you a very nice performance boost.
http://msdn.microsoft.com/en-us/library/system.web.caching.cache.aspx
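A minimal sketch of that load-once idea follows. To stay self-contained it uses a plain static field with an expiry time as a stand-in for System.Web.Caching.Cache; in an ASP.NET application the same pattern would more likely go through HttpRuntime.Cache. EmployeeCache and GetOrLoad are hypothetical names, and the loader delegate is where a call such as FollowsDAL.GetAllEmployees would be passed in.

```csharp
using System;
using System.Data;

public static class EmployeeCache
{
    private static readonly object Gate = new object();
    private static DataTable _cached;
    private static DateTime _loadedAtUtc;

    // Returns the cached table. The loader runs only when nothing is cached
    // yet or the cached copy is older than maxAge.
    public static DataTable GetOrLoad(Func<DataTable> loader, TimeSpan maxAge)
    {
        lock (Gate)
        {
            if (_cached == null || DateTime.UtcNow - _loadedAtUtc > maxAge)
            {
                _cached = loader();
                _loadedAtUtc = DateTime.UtcNow;
            }
            return _cached;
        }
    }
}
```

In the web method, DataTable dt = EmployeeCache.GetOrLoad(FollowsDAL.GetAllEmployees, TimeSpan.FromMinutes(10)) would replace the direct DAL call; the ten-minute lifetime is an arbitrary choice. The cached DataTable is shared between requests, so it should be treated as read-only.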
I think your solution should be a mix of the answers by @PraVn and @nurgent. Write a stored procedure that filters records by the search string. Have your DAL call this stored procedure from a method that is in turn called from your existing web method, public RadComboBoxData LoadData(RadComboBoxContext context).

How to query a DataTable in memory to fill another data table

I am trying to update a Microsoft report. It reports how many clients were excluded from a conversion process and for what reason. Currently the program writes all of the deleted clients back to the server and then queries the server to fill a specialty table with the results.
Here is the current query:
SELECT DeletedClients.Reason,
COUNT(DeletedClients.Reason) AS Number,
CAST(CAST(COUNT(DeletedClients.Reason) AS float)
/ CAST(t.Total AS float)
* 100 AS numeric(4, 1)) AS percentage
FROM DeletedClients CROSS JOIN
(SELECT COUNT(*) AS Total
FROM DeletedClients AS DeletedClients_1
WHERE (ClinicID = @ClinicID)) AS t
WHERE (DeletedClients.ClinicID = @ClinicID)
AND (DeletedClients.TotalsIdent = @ident)
GROUP BY DeletedClients.Reason, t.Total
ORDER BY Number DESC
What I would like to do is not write DeletedClients to the server, as it already exists in memory in my program as a DataTable; writing it out just slows down the report and fills the database with information we do not need to save.
My main question is this, either:
How do I query a DataTable to make a new in-memory DataTable that has the same results as if I had written it out to the SQL server and read it back in with the query above?
OR
How in Microsoft Reports do you do a group by clause for items in a Tablix to turn =Fields!Reason.Value =Fields!Number.Value =Fields!percentage.Value into something similar to the returned result from the query above?
You can use DataTable.Select to query the DataTable.
DataTable table = GetDataTableResults();
DataTable results = table.Select("SomeIntColumn > 0").CopyToDataTable();
Or for more complex queries, you can use LINQ to query the DataTable:
DataTable dt = GetDataTableResults();
var results = from row in dt.AsEnumerable()
              group row by new { SomeIDColumn = row.Field<int>("SomeIDColumn") } into rowgroup
              select new
              {
                  SomeID = rowgroup.Key.SomeIDColumn,
                  SomeTotal = rowgroup.Sum(r => r.Field<decimal>("SomeDecimalColumn"))
              };
// The result table needs columns before rows can be added to it.
DataTable queryResults = new DataTable();
queryResults.Columns.Add("SomeID", typeof(int));
queryResults.Columns.Add("SomeTotal", typeof(decimal));
foreach (var result in results)
    queryResults.Rows.Add(new object[] { result.SomeID, result.SomeTotal });
There are two ways that I can think of to query the data table. Below is an example using both ways.
using System;
using System.Data;
using System.Linq; // needed for Where
namespace WindowsFormsApplication1
{
static class Program
{
[STAThread]
static void Main()
{
var deletedClients = GetDataTable();
// Using linq to create the new DataTable.
var example1 = deletedClients.AsEnumerable()
.Where(x => x.Field<int>("ClinicId") == 1)
.CopyToDataTable();
// Using the DefaultView RowFilter to create a new DataTable.
deletedClients.DefaultView.RowFilter = "ClinicId = 1";
var rowFilterExample = deletedClients.DefaultView.ToTable();
}
static DataTable GetDataTable()
{
var dataTable = new DataTable();
// Assumes ClinicId is an int...
dataTable.Columns.Add("ClinicId", typeof(int));
dataTable.Columns.Add("Reason");
dataTable.Columns.Add("Number", typeof(int));
dataTable.Columns.Add("Percentage", typeof(float));
for (int counter = 0; counter < 10; counter++)
{
dataTable.Rows.Add(counter, "Reason" + counter, counter, counter);
}
return dataTable;
}
}
}
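Neither example above reproduces the grouping and percentage from the SQL query in the question. The following sketch attempts that in memory with LINQ, under stated assumptions: ClinicID and TotalsIdent are int columns, Reason is a non-null string column, and the names DeletedClientSummary and Summarize are invented here. As in the SQL, the total counts all rows for the clinic, while the groups count only rows that also match the ident.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DeletedClientSummary
{
    // Builds a table with Reason, Number, and percentage columns,
    // ordered by Number descending.
    public static DataTable Summarize(DataTable deletedClients, int clinicId, int ident)
    {
        // All rows for the clinic: this is the denominator (t.Total in the SQL).
        var clinicRows = deletedClients.AsEnumerable()
            .Where(r => r.Field<int>("ClinicID") == clinicId)
            .ToList();
        int total = clinicRows.Count;

        var result = new DataTable();
        result.Columns.Add("Reason", typeof(string));
        result.Columns.Add("Number", typeof(int));
        result.Columns.Add("percentage", typeof(decimal));

        var groups = clinicRows
            .Where(r => r.Field<int>("TotalsIdent") == ident)
            .GroupBy(r => r.Field<string>("Reason"))
            .Select(g => new { Reason = g.Key, Number = g.Count() })
            .OrderByDescending(g => g.Number);

        foreach (var g in groups)
        {
            // total is at least 1 here, because every group comes from clinicRows.
            decimal pct = Math.Round(100m * g.Number / total, 1, MidpointRounding.AwayFromZero);
            result.Rows.Add(g.Reason, g.Number, pct);
        }
        return result;
    }
}
```

The returned table can be handed to the report as its data source in place of the round trip through the server. Rounding at exact midpoints may differ slightly from the SQL CAST, so the one-decimal values are worth spot-checking against the existing report.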
