Text mining on a table column: removing duplicates delimited by "##" - C#

Let's assume I have a table in SQL Server that holds employee information, for example:
I want to do text mining on the Degree column to remove the duplicate entries, which are separated by the "##" notation.
LINQ to SQL
I am using LINQ to SQL, so I am planning to load this data into a C# variable through the context, perform the string operations, and store the result back in the same location.
Rules: I need to either update the data in place or generate a new table.
Is this the right way to do it, and is it possible? I'd appreciate suggestions on this approach, and alternative suggestions are welcome.

So it looks like you need to break up the string on the "##" delimiters, take the distinct items, and put them back together, comma-delimited this time? The String.Split method breaks up the string, and LINQ's Distinct extension method then gives you just the unique items.
Assuming you've got the text of the degree in a variable somewhere:
var uniques = degree
.Split(new String[] { "##" }, StringSplitOptions.None)
.Distinct();
String.Split is usually called with a single-character delimiter, but there's an overload that accepts an array of string delimiters, so you'll have to use that one.
Then you can use String.Join to comma-delimit the unique items, or whatever else you need to do.
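As a self-contained sketch of that split/distinct/join pipeline (C# 9+ top-level statements): the helper name DedupeDegrees is hypothetical, and it goes slightly beyond the original by assuming that blank entries should be dropped, stray whitespace trimmed, and duplicates compared case-insensitively.

```csharp
using System;
using System.Linq;

// Hypothetical helper: turns a "##"-delimited cell such as "BSc##MSc##BSc"
// into a comma-delimited list of its unique entries ("BSc,MSc").
static string DedupeDegrees(string degree)
{
    // Leave null/empty cells untouched.
    if (string.IsNullOrEmpty(degree)) return degree;

    return string.Join(",", degree
        .Split(new[] { "##" }, StringSplitOptions.RemoveEmptyEntries) // string-delimiter overload
        .Select(d => d.Trim())                                        // assumption: ignore stray spaces
        .Where(d => d.Length > 0)                                     // drop blank entries
        .Distinct(StringComparer.OrdinalIgnoreCase));                 // assumption: "bsc" equals "BSc"
}

Console.WriteLine(DedupeDegrees("BSc##MSc##BSc##PhD")); // prints BSc,MSc,PhD
```

Inside the SubmitChanges loop this would be used as d.Degree = DedupeDegrees(d.Degree); drop the Trim/comparer lines if exact, case-sensitive matching is what you want.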
Edit: Apologies, I thought your original question was more about how to eliminate the duplicates than how to use LINQ to SQL.
Assuming you've got your DataContext and object model set up, you just need to select your object(s) from the database using LINQ to SQL, make the changes you need, and then call SubmitChanges() on the DataContext.
For example:
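// Load the Employee rows; LINQ to SQL tracks changes made to these objects.
var employees = from e in context.GetTable<Employee>() select e;
foreach (var e in employees)
{
    if (e.Degree == null) continue; // nothing to clean up
    // Split on "##", drop duplicates, and store the result comma-delimited.
    e.Degree = String.Join(",", e.Degree
        .Split(new String[] { "##" }, StringSplitOptions.None)
        .Distinct());
}
// Send all pending changes to the database in one batch.
context.SubmitChanges();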
If you're new to LINQ to SQL, it may be worthwhile to run through a tutorial or two first. Here's part 1 of a pretty good series:
Lastly, you mentioned in your edit that you have the option of creating a new table after making your changes. If that's the case, I'd consider storing the individual degrees in a separate table that links back to the employee record, rather than storing them as comma-separated values. It depends on your needs, of course, but SQL is designed to work with tables and sets, so the less string parsing and processing you have to do, the better.
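A rough sketch of that normalization step, done in memory before inserting the rows: the (EmployeeId, Degree) tuple shape and the idea of an "EmployeeDegree" link table are assumptions for illustration, not part of the original schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: (EmployeeId, Degree) pairs as read from the old table.
var employees = new List<(int EmployeeId, string Degree)>
{
    (1, "BSc##MSc##BSc"),
    (2, "PhD"),
};

// Flatten each "##"-delimited cell into one row per distinct degree,
// i.e. the shape of a hypothetical EmployeeDegree link table.
var employeeDegrees = employees
    .SelectMany(e => (e.Degree ?? "")
        .Split(new[] { "##" }, StringSplitOptions.RemoveEmptyEntries)
        .Distinct()
        .Select(d => (e.EmployeeId, Degree: d)))
    .ToList();

foreach (var row in employeeDegrees)
    Console.WriteLine($"{row.EmployeeId}\t{row.Degree}");
```

Each resulting pair would become one row keyed by the employee, so later queries can filter on a degree with a plain WHERE instead of string parsing.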
Good luck!

Related

Expand a start/end interval entry into separate rows using LINQ/EF

Given an object (or data row, in SQL terms) representing a range of numbers, with Start and End properties, how do I "expand" it into the individual numbers using LINQ, in a way that EF Core can translate?
This is easy to do in memory with an iterator:
for (var number = range.Start; number <= range.End; number++)
yield return number;
But of course I can't do this with a queryable, since iterating over the original query already executes it against the database.
How do I achieve the same behavior in the database? Say I have an IQueryable<Range> and want to convert it into an IQueryable<int> where, for each range, the resulting query contains every number in the range as a separate row.
I found a few examples of how to do this directly in SQL, but I want to make sure it is possible using LINQ operators, in a way that EF Core can translate into the appropriate SQL statement.
Can something like this be coded without using an iterator?
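One commonly used technique (not taken from this thread) is to join the ranges against a pre-populated numbers ("tally") table, which needs no iterator. The sketch below runs the query in memory via AsQueryable(); `numbers` stands in for a hypothetical Numbers table/DbSet that must cover at least 0..max(End), and the anonymous objects stand in for Range entities. Whether EF Core turns this exact shape into a single SQL statement is untested here.

```csharp
using System;
using System.Linq;

// Stand-in for a hypothetical "Numbers" table (one row per integer 0..999).
// In EF Core this would be a DbSet, and the table must cover the largest End.
IQueryable<int> numbers = Enumerable.Range(0, 1000).AsQueryable();

// Stand-in for IQueryable<Range> with Start/End properties.
var ranges = new[]
{
    new { Start = 3, End = 5 },
    new { Start = 10, End = 11 },
}.AsQueryable();

// Cross-join each range with the numbers table and keep the numbers inside
// [Start, End]: one output row per number, built only from LINQ operators.
IQueryable<int> expanded =
    from r in ranges
    from n in numbers
    where n >= r.Start && n <= r.End
    select n;

Console.WriteLine(string.Join(",", expanded)); // prints 3,4,5,10,11
```

In SQL terms this is a join with a range filter; the main design cost is maintaining a numbers table large enough for the widest range.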

Return Values That Are In Lowercase

We recently discovered a bug in our system whereby any serial numbers that have been entered in lowercase have not been processed correctly.
To correct this, we need to add a one-off function that will run through the database and re-process all items with lowercase serial numbers.
In LINQ, is there a query I can run that will return a list of such items?
Note: I am not asking how to convert lowercase to uppercase or the reverse, which is all Google returns. I need to generate a list of all database entries where the serial number has been entered in lowercase.
EDIT: I am using LINQ to MS SQL, which appears to be case-insensitive.
Yes, there is. You can try something like this:
var result = serialnumber.Any(c => char.IsLower(c));
[EDIT]
Well, in the case of LINQ to Entities...
As stated here: Regex in Linq (EntityFramework), String processing in database, there are a few ways to work around it:
Change the database table structure, e.g. create a Foo_Filter table that links your entities to filters, and then a Filters table that contains the filter data.
Execute the query in memory and use LINQ to Objects. This option will be slow, because you have to fetch all the data from the database into memory.
For example:
var result = context.Serials.ToList()
    .Where(sn => sn.SerialNumber.Any(c => char.IsLower(c))); // SerialNumber: placeholder for your string column
Another way is to use the SqlMethods.Like method.
Finally, I'd strongly recommend reading this: Case sensitive search using Entity Framework and Custom Annotation
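A small in-memory sketch of that LINQ to Objects filter: the (Id, SerialNumber) tuples are hypothetical stand-ins for the rows a ToList() call would bring back from the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows, standing in for entities already loaded into memory.
var serials = new List<(int Id, string SerialNumber)>
{
    (1, "ABC123"),
    (2, "abc123"),
    (3, "AbC999"),
    (4, "123456"),
};

// char.IsLower runs in .NET, not in SQL, so the database collation's
// case-insensitivity does not affect this check.
var needsReprocessing = serials
    .Where(s => s.SerialNumber != null && s.SerialNumber.Any(char.IsLower))
    .ToList();

foreach (var s in needsReprocessing)
    Console.WriteLine($"{s.Id}: {s.SerialNumber}");
```

Here rows 2 and 3 are selected: any serial with at least one lowercase letter counts, and digits-only serials are left alone.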

How to use Linq to check if string property contains any of string in a list collection

I have a string with a comma-separated list of emails:
var emails="a#gmail.com,b#gmail.com,c#gmail.com,d#gmail.com,e#gmail.com,";
var list=new List<string>();
list.Add("c#gmail.com");
list.Add("d#gmail.com");
How can I write a LINQ query to find out whether the string (emails) contains any email address from the collection (list)?
I am using EF; the emails string is a property of a class, and list is an independent collection.
This solution uses LINQ. Because your email addresses are comma-separated (and the string ends with a comma), we can check whether Any() of the items in the list is contained in the emails string. I used ToLower() to make the comparison case-insensitive (as email addresses typically are).
var hasMatch = list.Any(item => ("," + emails.ToLower()).Contains("," + item.ToLower() + ","));
emails.Split(',').Any(e=>list.Contains(e));
alternatively:
emails.Split(',').Intersect(list).Any();
If you are using it to find database records, then you can do this:
db.MyTable.Where(l=>list.Any(e=>l.emails.StartsWith(e+",")) ||
list.Any(e=>l.emails.EndsWith(","+e)) ||
list.Any(e=>l.emails.Contains(","+e+",")) ||
list.Any(e=>l.emails==e)
)
or you can simplify it with:
db.MyTable.Where(l=>list.Any(e=>(","+l.emails+",").Contains(","+e+",")))
The 3rd option may perform better if you are looking for the first record, as it might use any index you have on emails to locate the record quickly, but it will generate some really big SQL statements if the list is big (current implementations of the LINQ to SQL provider unfortunately translate this to a CHARINDEX function instead of LIKE 'email%', but that could change).
The 4th option will generate simpler SQL, and will likely perform better if you want to find all the matching records instead of just the first one.
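To see why the comma wrapping in the last query matters, here is the same check in memory; `ContainsWholeEntry` is a hypothetical helper name. Wrapping both sides in the delimiter means only whole entries match, whereas a plain substring test can report a false positive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var emails = "a#gmail.com,b#gmail.com,c#gmail.com,d#gmail.com,e#gmail.com,";
var list = new List<string> { "c#gmail.com", "d#gmail.com" };

// Same idea as (","+l.emails+",").Contains(","+e+","): a match must be a
// complete comma-delimited entry, not just any substring.
static bool ContainsWholeEntry(string csv, string item) =>
    ("," + csv + ",").Contains("," + item + ",");

bool hasMatch = list.Any(e => ContainsWholeEntry(emails, e));

// A naive substring test wrongly matches "c#gmail.com" inside "abc#gmail.com".
bool naive = "abc#gmail.com,".Contains("c#gmail.com");
bool wrapped = ContainsWholeEntry("abc#gmail.com,", "c#gmail.com");

Console.WriteLine($"{hasMatch} {naive} {wrapped}"); // prints True True False
```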
You can do it the following way:
emails.Split(',').Any (y => list.Contains (y));
var emailsList = emails.Split(',');
list.Any( x => emailsList.Contains(x));
Another approach:
var emailSet = new HashSet<string>(emails.Split(','), StringComparer.OrdinalIgnoreCase);
list.Any(s => emailSet.Contains(s));
Use a HashSet: its Contains lookup is faster (if the emails string can be big), and it removes duplicates.
Passing a comparer to the HashSet also lets you compare strings case-insensitively without creating new strings with ToLower().
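A runnable version of the HashSet approach, assuming case-insensitive matching is wanted: passing StringComparer.OrdinalIgnoreCase to the HashSet constructor makes its Contains both hash-based and case-insensitive, with no ToLower() copies. Calling Contains(s, comparer) on the set instead would bind to LINQ's Enumerable.Contains, which enumerates the set element by element rather than using the hash lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var emails = "a#gmail.com,b#gmail.com,c#gmail.com,d#gmail.com,e#gmail.com,";
var list = new List<string> { "C#gmail.com", "x#gmail.com" };

// Build the set once; the comparer controls both hashing and equality.
// RemoveEmptyEntries drops the empty item produced by the trailing comma.
var emailSet = new HashSet<string>(
    emails.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries),
    StringComparer.OrdinalIgnoreCase);

// Average O(1) lookup per candidate; "C#gmail.com" matches "c#gmail.com".
bool hasMatch = list.Any(s => emailSet.Contains(s));

Console.WriteLine(hasMatch); // prints True
```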

Search Substring on a Integer Value

Let's say we have a MongoDB collection whose documents contain an int attribute, like: {"MyCollectionAttribute": 12345}
How can I search for the string "234" inside the int using the Query<T> syntax?
For now it seems to work (as explained here) using a raw query like:
var query = new QueryDocument("$where", "/234/.test(this.MyCollectionAttribute)");
myCollection.Find(query);
Is it preferable to store the values directly as strings instead of integers, given that a regex match will be slow? How do you approach these situations?
Edit
Context: a company can have internal codes that are numbers. In SQL Server they can be stored in an int column to get data integrity at the database level, and then queried from LINQ to SQL with something like:
.Where(item => item.CompanyCode.ToString().Contains("234"))
This way there is both data integrity at the database level and type safety in the query.
I asked the question to see how this scenario can be implemented in MongoDB.
What you are asking does not make much sense.
Regular expressions are for searching within strings, not within integers.
If you want to perform a substring search (for whatever reason), then store your numbers as strings and not as integers, obviously.

How to Pass a list as a param in .NET

I have a query that is basically like this:
SELECT foo
FROM bar
where bar.Id in (1,2,3);
I would like to pass the list of IDs as a single param with IDbDataParameter, where the query is formatted:
SELECT foo
FROM bar
where bar.Id in (?ListOfID);
and then have a single param that is a list, rather than having to do something like this:
SELECT foo
FROM bar
where bar.Id in (?id1, ?id2, ?id3);
I know this is possible with other data providers; can I do this with the standard System.Data classes?
P.S. The reason I want a single list param rather than a series of params is that, as the number of params changes, MySQL sees the query as new and we lose some of the caching optimizations. MySQL basically ends up with one query per number of IDs. This is also why I don't just want to manipulate the base SQL as a string, because then I end up with one query per VALUE, which would be worse.
Is it possible to use:
string[] myParameters = new string[2];
myParameters[0] = "id1";
myParameters[1] = "id2";
After creating the array and filling it however you want:
string sql = "SELECT foo FROM bar WHERE bar.Id IN (" + string.Join(", ", myParameters) + ")";
I'm not totally sure if that's what you were asking, but it's what I understood from your post.
If I understand correctly, you want to pass one parameter into your query and have it split into something that can be used with the IN operator. What I've done in the past is use a string parameter filled with a comma-delimited list (the delimiter can be anything), then create a SQL function that turns the delimited list into a table, and use the table returned by that function with the IN operator.
I was using MS SQL Server, so I'm not sure whether this is possible in MySQL. If it is, it would look something like:
SELECT foo
FROM bar
where bar.Id in (SELECT * FROM ConvertToTableFunctionOrProc(?DelimitedList, ?Delimiter));
The function creates a table with one column and one row for each value in the delimited list. I don't know if this can be done in MySQL.
You can pass in a delimited list and use a table-valued function to 'split' the list into rows.
I've posted an example here.
It is unclear to me whether you are talking about LINQ, but if you are, you should flip it around and use Contains().
var whatever = from bar in bars
where ListOfID.Contains(bar.Id)
select bar;
Yes, the list is converted to dynamic SQL, but at least you don't have to mess around with escaping or string manipulation yourself.
As far as I know, SQL can't accept arrays as parameters to functions/procs at all, so I doubt cached execution plans can even express the idea of an array.
If you have a reasonable number of items, you could "overload" the sproc for each parameter count up to a certain number, beyond which you fall back to the regular dynamic approach.
StuffWithList1 one
StuffWithList2 one two
StuffWithList3 one two three
Or just use a single sproc with room for, say, twenty parameters:
StuffWithList one two three four five six seven eight nine ten eleven ... twenty
and pass nulls for the unneeded parameters:
StuffWithList 8 9 3 null null null null null null null null ... null
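A sketch combining the fixed-parameter-count idea with ordinary placeholders: the ID list is padded up to the next bucket size (a power of two here, which is an assumption) so the server only ever sees a handful of distinct statement texts. BuildInQuery, the ?idN names, and padding with NULL are illustrative choices; NULL never matches inside IN, but the same padding would break a NOT IN query. Each (Name, Value) pair would then be added to an IDbCommand as an IDbDataParameter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds "... IN (?id0, ?id1, ...)" with the placeholder
// count rounded up to a power of two, plus the matching parameter values.
static (string Sql, List<(string Name, object Value)> Parameters) BuildInQuery(IReadOnlyList<int> ids)
{
    if (ids.Count == 0) throw new ArgumentException("At least one id is required.", nameof(ids));

    // Round the parameter count up to the next power of two (1, 2, 4, 8, ...).
    int bucket = 1;
    while (bucket < ids.Count) bucket *= 2;

    var parameters = new List<(string Name, object Value)>();
    for (int i = 0; i < bucket; i++)
    {
        // Unused slots get DBNull.Value, which never matches in an IN list.
        object value = i < ids.Count ? (object)ids[i] : DBNull.Value;
        parameters.Add(($"?id{i}", value));
    }

    string placeholders = string.Join(", ", parameters.Select(p => p.Name));
    return ($"SELECT foo FROM bar WHERE bar.Id IN ({placeholders})", parameters);
}

var (sql, ps) = BuildInQuery(new[] { 1, 2, 3 });
Console.WriteLine(sql); // prints SELECT foo FROM bar WHERE bar.Id IN (?id0, ?id1, ?id2, ?id3)
```

With this bucketing, 1 to 1000 IDs produce at most 11 distinct statement texts instead of one per count.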
