LINQ - how to remove duplicate rows in table


After a certain process, I want to remove duplicates from the table and commit the changes, so that only single values remain.
I have three criteria for removal:
Name
date
status (is always 1)
So if there are records with the same Name, the same date and the same status, remove one. It does not matter which one.
I have:
dbContext.tbl_mytable

Since you are talking about deleting records, you need to test this first.
I'm assuming you want to remove all but one, i.e., if you have three records with the same details, you remove two and leave one.
If so, you should be able to identify the duplicates by grouping by { Name, date, status } and then selecting all except the first record in each group,
i.e., something like:
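// Group on the three criteria, then take every row except the first in each group.
var duplicates = (from r in dbContext.tbl_mytable
                  group r by new { r.Name, r.date, r.status } into results
                  select results.Skip(1)
                 ).SelectMany(a => a);

// Mark the duplicate rows for deletion and commit the changes.
dbContext.tbl_mytable.DeleteAllOnSubmit(duplicates);
dbContext.SubmitChanges();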
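Because the answer stresses testing before deleting, here is a minimal LINQ-to-Objects sketch that only computes which rows the group-and-`Skip(1)` approach would delete, without touching a DataContext. The tuple rows and their `Id`, `Name`, `Date` and `Status` fields are hypothetical stand-ins for `tbl_mytable`, and the added `OrderBy(r => r.Id)` is an assumption so that the lowest `Id` in each group is the one kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for tbl_mytable (Id assumed to be the primary key).
var rows = new List<(int Id, string Name, DateTime Date, int Status)>
{
    (1, "alice", new DateTime(2020, 1, 1), 1),
    (2, "alice", new DateTime(2020, 1, 1), 1), // duplicate of 1
    (3, "alice", new DateTime(2020, 1, 2), 1), // different date: kept
    (4, "bob",   new DateTime(2020, 1, 1), 1),
    (5, "alice", new DateTime(2020, 1, 1), 1), // duplicate of 1
};

// Group on the three criteria, keep the lowest Id in each group,
// and treat every other row in the group as a duplicate.
var duplicates = rows
    .GroupBy(r => new { r.Name, r.Date, r.Status })
    .SelectMany(g => g.OrderBy(r => r.Id).Skip(1))
    .ToList();

// Rows that would remain after the delete.
var survivors = rows.Except(duplicates).ToList();

Console.WriteLine("Would delete: " + string.Join(", ", duplicates.Select(d => d.Id)));
```

Without an explicit order inside each group, which row survives depends on the order the source returns, which the question says is acceptable.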

Related

Full match against a List of strings per id

I need a LINQ query that will return null if not all the rows for a hardware_id have a matching string in a List<string>.
I have the following table:
id (int) - Primary Key
name (string)
user_id (int)
hardware_id (int)
I have a List<string> that contains phrases. I want the query to return the hardware_id if all the phrases in the List have matching strings in the name column. If any one of the phrases doesn't have a name match, it should return null; if all the phrases exist for a hardware_id, the query should return the list of hardware_ids that fully match all the phrases in the List.
In other words, return a list of hardware_ids where each id has all of its names matching the ones in the List<string>.
I thought about running a separate query for each id, but that isn't efficient. Maybe you know a good query to tackle this.
I'm using Entity Framework 6 / C# / MySQL
Note: the query is run only per user id, so I first filter the table by user_id and then need to find the matching hardware_ids that satisfy the condition.

Group on hardware_id and then check each group against the phrases in the List:
// Keeps a hardware_id only if every row's name appears in phrases.
table.GroupBy(x => x.hardware_id)
     .Where(x => x.All(s => phrases.Contains(s.name)))
     .Select(x => x.Key);
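The question and the answer read the match in opposite directions: the question asks that every phrase have a matching name for the hardware_id, while the answer's `All(...)` keeps a group only if every name is one of the phrases. A minimal in-memory sketch of both readings follows, with the per-user filter applied first. The tuple rows, the sample values and the choice to return null only when no hardware_id qualifies are assumptions; whether Entity Framework 6 can translate `phrases.All(...)` over a local list for MySQL is not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows: (Id, Name, UserId, HardwareId).
var table = new List<(int Id, string Name, int UserId, int HardwareId)>
{
    (1, "alpha", 7, 100),
    (2, "beta",  7, 100),
    (3, "alpha", 7, 200), // hardware 200 has no "beta"
    (4, "alpha", 7, 300),
    (5, "beta",  7, 300),
    (6, "gamma", 7, 300), // extra name that is not a phrase
    (7, "alpha", 8, 400), // different user: filtered out
    (8, "beta",  8, 400),
};
var phrases = new List<string> { "alpha", "beta" };
int userId = 7;

// Filter by user first, then group once per hardware_id.
var byHardware = table
    .Where(r => r.UserId == userId)
    .GroupBy(r => r.HardwareId)
    .ToList();

// Reading A (the question): every phrase has at least one matching name in the group.
List<int> everyPhraseMatched = byHardware
    .Where(g => phrases.All(p => g.Any(r => r.Name == p)))
    .Select(g => g.Key)
    .ToList();

// Reading B (the answer's All): every name in the group is one of the phrases.
List<int> everyNameIsAPhrase = byHardware
    .Where(g => g.All(r => phrases.Contains(r.Name)))
    .Select(g => g.Key)
    .ToList();

// Assumed null rule: null when no hardware_id qualifies.
List<int>? result = everyPhraseMatched.Count > 0 ? everyPhraseMatched : null;

Console.WriteLine(result == null ? "null" : string.Join(", ", result));
```

With this data the two readings disagree: hardware 200 passes only reading B, and hardware 300 passes only reading A.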

Sqlite get last item where condition does apply in .Net

How do I get the last item in the database for which a particular condition applies?
var History = (from c in conn.Table<HistoryItem>() select c.Done); //how to get last Item where c.Done is true?

You would need to specify an ordering for "last" to make any sense. Otherwise, it would simply be Last() or LastOrDefault() (or you can order in the opposite direction and take First()):
// Keep only the Done items, newest first, and take the first one (null if none).
var lastDone = conn.Table<HistoryItem>()
    .Where(c => c.Done)
    .OrderByDescending(c => c.DateCreated)
    .FirstOrDefault();
This selects the newest history item whose Done is true (assuming HistoryItem has a DateCreated column). Again, ordering is the important point: whether that's the last or the first item is a matter of perspective. It's first if you view the change log newest to oldest, but last in terms of date added.
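To make the ordering point concrete, here is a minimal in-memory sketch contrasting `LastOrDefault` (last in whatever order the source yields) with an explicit date ordering. The tuple rows are hypothetical, and `DateCreated` is the same assumed column the answer uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HistoryItem rows; DateCreated is an assumed column.
var history = new List<(int Id, bool Done, DateTime DateCreated)>
{
    (1, true,  new DateTime(2021, 3, 4)),
    (2, false, new DateTime(2021, 3, 5)),
    (3, true,  new DateTime(2021, 3, 2)),
};

// "Last" with no ordering: the last Done item in source order.
var lastInSourceOrder = history.LastOrDefault(h => h.Done);

// "Last" by date: the newest Done item.
var newestDone = history
    .Where(h => h.Done)
    .OrderByDescending(h => h.DateCreated)
    .FirstOrDefault();

Console.WriteLine($"source order: {lastInSourceOrder.Id}, by date: {newestDone.Id}");
```

The two answers differ here (item 3 versus item 1), which is exactly why the query needs an explicit order.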

Iterating through two identical data sources

I have data with the same schema in a pipe delimited text file and in a database table, including the primary key column.
I have to check whether each row in the file is present in the table and, if not, generate an INSERT statement for that row.
The table has 30 columns, but here I've simplified for this example:
ID Name Address1 Address2 City State Zip
ID is the identity column, so if a particular ID value from the file is found in the table, no INSERT statement should be generated for it.
Here's my attempt, which doesn't feel correct:
foreach (var item in RecipientsInFile)
{
if (!RecipientsInDB.Any(u => u.ID == item.ID ))
{
Console.WriteLine(GetInsertSql(item));
}
}
Console.ReadLine();
EDIT: Sorry, I forgot to ask the actual question: how should I do this?
Thank you very much for all the help.
EDIT: The table has over a million rows, while the file has 50K rows. This is a one-time thing, not a permanent project.

I would add all the RecipientsInDB IDs to a HashSet and then test whether the set contains each item's ID.
// Load the database IDs into a set once (assuming ID is an int); each lookup is then O(1) on average.
var recipientsInDBIds = new HashSet<int>(RecipientsInDB.Select(u => u.ID));
foreach (var item in RecipientsInFile)
{
    if (!recipientsInDBIds.Contains(item.ID))
    {
        Console.WriteLine(GetInsertSql(item));
    }
}
Console.ReadLine();
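With a million-plus table rows and 50K file rows, calling `Any()` on the database list for every file row can take up to 1,000,000 × 50,000 = 5 × 10^10 comparisons, whereas the HashSet is filled once and each lookup is O(1) on average. A minimal runnable sketch of the same idea, using hypothetical in-memory tuples in place of `RecipientsInDB` and `RecipientsInFile` and printing only the missing IDs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplified rows (ID, Name, City) for the table and the file.
var recipientsInDB = new List<(int ID, string Name, string City)>
{
    (1, "Ann", "Austin"),
    (2, "Bob", "Boston"),
};
var recipientsInFile = new List<(int ID, string Name, string City)>
{
    (2, "Bob", "Boston"),
    (3, "Cy",  "Chicago"),
};

// Build the set of existing IDs once.
var dbIds = new HashSet<int>(recipientsInDB.Select(r => r.ID));

// Anti-join: keep file rows whose ID is not in the set.
var missing = recipientsInFile.Where(r => !dbIds.Contains(r.ID)).ToList();

foreach (var r in missing)
    Console.WriteLine($"missing ID: {r.ID}");
```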

Try comparing the ID lists using .Except():
List<int> dbIDs = RecipientsInDB.Select(x => x.ID).ToList();
List<int> fileIDs = RecipientsInFile.Select(x => x.ID).ToList();
// IDs that are in the file but not in the database
List<int> toBeInserted = fileIDs.Except(dbIDs).ToList();
toBeInserted.ForEach(x => Console.WriteLine(GetInsertSqlStatementForID(x)));
For the pedantic and trollish among us in the comments, please remember the above code (like any source code you find on the interwebs) shouldn't be copy/pasted into your production code. Try this refactoring:
foreach (var id in RecipientsInFile.Select(x => x.ID)
                                   .Except(RecipientsInDB.Select(x => x.ID)))
{
    Console.WriteLine(GetInsertSqlStatementForID(id));
}
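`Except` returns only the IDs, so building an INSERT statement still needs the full row from the file. A minimal sketch, assuming hypothetical in-memory tuples, that maps the missing IDs back to their rows with a dictionary:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplified rows (ID, Name) for the table and the file.
var recipientsInDB = new List<(int ID, string Name)> { (1, "Ann"), (2, "Bob") };
var recipientsInFile = new List<(int ID, string Name)> { (2, "Bob"), (3, "Cy"), (4, "Di") };

// Distinct IDs present in the file but not in the table, in file order.
var missingIds = recipientsInFile.Select(r => r.ID)
                                 .Except(recipientsInDB.Select(r => r.ID));

// Look the full rows back up by ID (ToDictionary throws if the file repeats an ID).
var fileById = recipientsInFile.ToDictionary(r => r.ID);
var rowsToInsert = missingIds.Select(id => fileById[id]).ToList();

foreach (var row in rowsToInsert)
    Console.WriteLine($"{row.ID} {row.Name}");
```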

There are lots of ways to accomplish this; yours is one.
Another would be to always generate SQL, but generate it in the following manner:
if not exists (select 1 from Recipients where ID = 1234)
insert Recipients (...) values (...)
if not exists (select 1 from Recipients where ID = 1235)
insert Recipients (...) values (...)
Another would be to load all the database IDs into a HashSet beforehand and then check only that HashSet to see whether each ID exists. That takes a little longer to get started, but is faster for each record.
Any of these three techniques would work; which one to choose depends on how big your database table and your file are. If both are relatively small (maybe 10,000 records or so), any of them should work fine.
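A minimal sketch of the "always generate guarded SQL" option, assuming SQL Server (T-SQL) and a hypothetical simplified `Recipients (ID, Name, City)` table. String values are escaped by doubling single quotes. Since ID is an identity column, SQL Server would also require `SET IDENTITY_INSERT Recipients ON` before inserting explicit ID values; that is omitted here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplified file rows; real code would cover all 30 columns.
var recipientsInFile = new List<(int ID, string Name, string City)>
{
    (1234, "Ann", "Austin"),
    (1235, "O'Neil", "Boston"),
};

// Escape a string as a T-SQL literal by doubling single quotes.
static string Lit(string s) => "'" + s.Replace("'", "''") + "'";

// One guarded statement per row: the INSERT runs only if the ID is not already present.
string Guarded((int ID, string Name, string City) r) =>
    $"IF NOT EXISTS (SELECT 1 FROM Recipients WHERE ID = {r.ID})\n" +
    $"    INSERT INTO Recipients (ID, Name, City) VALUES ({r.ID}, {Lit(r.Name)}, {Lit(r.City)});";

var script = string.Join("\n", recipientsInFile.Select(Guarded));
Console.WriteLine(script);
```

This keeps the C# side trivial and pushes the existence check to the server, at the cost of one lookup per generated statement.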
EDIT
And there's always option D: Insert all records from the file into a temporary table (could be a real table or a SQL temp table, doesn't really matter) in the database, then use SQL to join the two tables together and retrieve the differences (using not exists or in or whatever technique you want), and insert the missing records that way.

Collecting metadata into table

I have tabular data that passes through a C# program, and I need to collect some metadata on it before finishing. The metadata is always counts based on fields of the data, and I need them all grouped by one field in the data. Periodically, I need to add new counts to this collection of metadata.
I've been researching it for a little while, and I think what makes sense is to rework my program to store the data as a DataTable, then run LINQ queries on the table. The problem I'm having is being able to put the different counts into one table-like structure and then write that out.
I might run a query like this:
var query01 =
from record in records.AsEnumerable()
group record by record.Field<String>("Association Key") into associationsGroup
select new { AssociationKey = associationsGroup.Key, Count = associationsGroup.Count<DataRow>() };
This gets a count of all the records grouped by the Association Key field. I'm going to want another count, grouped in the same way:
var query02 =
from record in records.AsEnumerable()
where record.Field<String>("Number 9") == "yes"
group record by record.Field<String>("Association Key") into associationsGroup
select new { AssociationKey = associationsGroup.Key, Number9Count = associationsGroup.Count<DataRow>() };
And so on.
I thought about chaining the queries with Union, but I had trouble getting them to union because I'm projecting into anonymous types, and I couldn't figure out how to do it differently to make a union work.
So, how can I collect my metadata into one table-like structure?

They won't union because the two anonymous types are different. Add both Number9Count and Count to both anonymous types and try the union again.
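Following that suggestion, a minimal sketch in which both projections share the shape `{ AssociationKey, Count, Number9Count }`, so they are the same anonymous type and can be combined; the rows are then merged per key by summing. It uses hypothetical in-memory tuples instead of a DataTable, and `Concat` in place of `Union` (Union would additionally remove identical rows, which is not needed here).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical records: (AssociationKey, Number9) instead of DataRows.
var records = new List<(string AssociationKey, string Number9)>
{
    ("A", "yes"), ("A", "no"), ("A", "yes"), ("B", "no"),
};

// Both projections use the same property names, types and order, so they share one anonymous type.
var query01 = records
    .GroupBy(r => r.AssociationKey)
    .Select(g => new { AssociationKey = g.Key, Count = g.Count(), Number9Count = 0 });
var query02 = records
    .Where(r => r.Number9 == "yes")
    .GroupBy(r => r.AssociationKey)
    .Select(g => new { AssociationKey = g.Key, Count = 0, Number9Count = g.Count() });

// Combine the two result sets, then merge them into one row per key.
var metadata = query01.Concat(query02)
    .GroupBy(x => x.AssociationKey)
    .Select(g => new
    {
        AssociationKey = g.Key,
        Count = g.Sum(x => x.Count),
        Number9Count = g.Sum(x => x.Number9Count),
    })
    .ToList();

foreach (var m in metadata)
    Console.WriteLine($"{m.AssociationKey}: Count={m.Count}, Number9Count={m.Number9Count}");
```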

I ended up solving the problem by creating a class that holds the set of records I need as a DataTable. A user can add queries through a method, taking an argument Func<DataRow, bool>. The method constructs the query supplying that argument as the where clause, maintaining the same grouping and properties in the resulting anonymous-typed object.
When retrieving the results, the class iterates over each query stored and enters the results into a new DataTable.
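A minimal sketch in the spirit of that approach (not the author's actual class): named `Func<DataRow, bool>` predicates are registered, the rows are grouped once by "Association Key", and each predicate is counted per group into a new DataTable. The `counts` list, its `Name`/`Filter` fields and the sample data are hypothetical; rows are read with `Rows.Cast<DataRow>()` and the indexer rather than `AsEnumerable()`/`Field<T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical source table.
var records = new DataTable();
records.Columns.Add("Association Key", typeof(string));
records.Columns.Add("Number 9", typeof(string));
records.Rows.Add("A", "yes");
records.Rows.Add("A", "no");
records.Rows.Add("B", "yes");

// Hypothetical registry of named counts; each Filter acts as the where clause.
var counts = new List<(string Name, Func<DataRow, bool> Filter)>
{
    ("Count", _ => true),
    ("Number9Count", r => (string)r["Number 9"] == "yes"),
};

// Result table: one row per association key, one column per registered count.
var result = new DataTable();
result.Columns.Add("Association Key", typeof(string));
foreach (var c in counts) result.Columns.Add(c.Name, typeof(int));

// Group once, then evaluate every registered filter against each group.
foreach (var group in records.Rows.Cast<DataRow>().GroupBy(r => (string)r["Association Key"]))
{
    DataRow outRow = result.NewRow();
    outRow["Association Key"] = group.Key;
    foreach (var c in counts)
        outRow[c.Name] = group.Count(c.Filter);
    result.Rows.Add(outRow);
}

foreach (DataRow row in result.Rows)
    Console.WriteLine($"{row["Association Key"]}: {row["Count"]}, {row["Number9Count"]}");
```

Adding a new count is then one more entry in `counts`; the grouping and output shape stay the same.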

group by first letter of the string

I have a Person table with a huge number of records, and I want to group the duplicate persons in it.
By one of the requirements, persons are duplicates if they have the same family name and the first letters of their first names are equal, so I want to group by first name and the first letter of the family name. Is there a way to group like this in SQL? I need this in C#, so some code processing could be done, but the number of persons is huge, so it should be a fast algorithm.

If I understand you correctly, in SQL Server you can do something like:
SELECT DISTINCT
Surname,
LEFT(FirstName,1) FirstNameLetter
FROM Persons
Other than that, we will need a bit more detail: table schema, expected result set, etc.

SELECT MEMBER.MEMBER_FIRSTNAME, COUNT(MEMBER.MEMBER_LASTNAME)
FROM dbo.MEMBER
GROUP BY MEMBER.MEMBER_FIRSTNAME, SUBSTRING(MEMBER.MEMBER_LASTNAME, 1,1)
HAVING COUNT(MEMBER.MEMBER_LASTNAME) > 1
This query returns one row (the first name and a count) for each group of members that share a first name and the first letter of their last name, keeping only groups with more than one member. In other words, the duplicates as you've defined them.
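For the C# side, a minimal LINQ-to-Objects sketch of the question's first definition (same family name, same first letter of the first name). `GroupBy` builds its groups in a single hash-based pass; for a huge table, though, letting the database do the GROUP BY as in the queries above avoids loading every row. The tuple rows are hypothetical, and comparing initials case-insensitively via `char.ToUpperInvariant` is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person rows.
var persons = new List<(int Id, string FirstName, string FamilyName)>
{
    (1, "John",  "Smith"),
    (2, "Jane",  "Smith"), // same family name and initial as John
    (3, "Mary",  "Smith"),
    (4, "James", "Brown"),
    (5, "",      "Brown"), // empty first name: grouped under '\0'
};

// Key: family name plus the (upper-cased) first letter of the first name.
var duplicateGroups = persons
    .GroupBy(p => new
    {
        p.FamilyName,
        Initial = p.FirstName.Length > 0 ? char.ToUpperInvariant(p.FirstName[0]) : '\0',
    })
    .Where(g => g.Count() > 1)           // only groups that actually contain duplicates
    .Select(g => g.Select(p => p.Id).ToList())
    .ToList();

foreach (var ids in duplicateGroups)
    Console.WriteLine(string.Join(", ", ids));
```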
