remove strings from linq query - c#

I have a linq query that returns Names like:
Joe
Joe (1)
Joe Bloggs
Joe (2)
Joe Bloggs (1)
What is the best way to remove the Joe Bloggs entries from the list? Duplicate names always have the format (n), where n is the incremented number for that name, so my result set would be:
Joe
Joe (1)
Joe (2)
I was using the below to attempt to remove the elements I needed
newResults = results.Distinct().ToList();
However, it appears this would only remove duplicate entries. I was hoping to use a regular expression in the LINQ query to find any entries that are Joe or that start with Joe (, but I'm not sure whether a regex is the correct way to implement this.
I have also plugged in a linq query as below:
var searchName = "Joe";
var results = names.Where(name => name.Equals(searchName)).ToList();
Again, this only returns 1 entry. Can you include a regex in the .Equals to find any names where the name equals the search name, or equals the search name + space + (?

You can use the Regex.IsMatch method to get only the strings that start with Joe followed by an optional (1), (2), etc.:
// requires: using System.Text.RegularExpressions;
var pattern = @"Joe(?=(\W?\(\d+\))|$)";
var results = names.Where(name => Regex.IsMatch(name, pattern)).ToList();
Based on the sample input you provided results will be {"Joe", "Joe (1)", "Joe (2)"}.
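If the search name comes from a variable rather than being hard-coded, the same idea can be sketched roughly as below. This is only a sketch under assumptions: the class and method names (NameFilter, FilterByBaseName) are made up for illustration, and it uses an anchored pattern (exactly the name, optionally followed by a space and (n)) instead of the lookahead above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameFilter
{
    // Hypothetical helper: keeps "<searchName>" and "<searchName> (n)", drops everything else.
    public static List<string> FilterByBaseName(IEnumerable<string> names, string searchName)
    {
        // Regex.Escape guards against regex metacharacters inside the search name.
        var pattern = "^" + Regex.Escape(searchName) + @"( \(\d+\))?$";
        return names.Where(name => Regex.IsMatch(name, pattern)).ToList();
    }

    public static void Main()
    {
        var names = new[] { "Joe", "Joe (1)", "Joe Bloggs", "Joe (2)", "Joe Bloggs (1)" };
        // Prints: Joe, Joe (1), Joe (2)
        Console.WriteLine(string.Join(", ", FilterByBaseName(names, "Joe")));
    }
}
```

Anchoring with ^ and $ also rejects names such as "BigJoe", which an unanchored pattern would let through.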

You need to refine your query to return only the data that you want. If this is not possible, then you must apply a further query to the collection once obtained. Barring this, try using a loop structure to remove unwanted entries by comparing each item in the collection and then responding to the matches with desired code.

If I understand you correctly, you want to exclude records that contain the word "Bloggs", so...
List<string> joes = new List<string>{"Joe", "Joe (1)", "Joe Bloggs", "Joe (2)", "Joe Bloggs (1)"};
var qry = joes.Where(j=>!j.Contains("Bloggs"));
result:
Joe
Joe (1)
Joe (2)

Related

How to construct a lucene search query with multiple parameters

I am new to Lucene.Net. I would like to know how to build a Lucene search query that works almost like an SQL query. Let me give more detail.
I have a set of parameter values, much like the parameters of a stored procedure. Now I want to build a query with all of these parameters.
searchParams.UseLast = Convert.ToBoolean(base.Arguments["UseLast"]);
searchParams.LastEditedFrom= Convert.ToDateTime(base.Arguments["LastEditedFrom"]);
searchParams.LastEditedTo = Convert.ToDateTime(base.Arguments["LastEditedTo"]);
searchParams.Reviewed = Convert.ToBoolean(base.Arguments["Reviewed"]);
searchParams.Approved = Convert.ToBoolean(base.Arguments["Approved"]);
searchParams.Include = Convert.ToBoolean(base.Arguments["Include"]);
searchParams.IsVisibleToUser = Convert.ToBoolean(base.Arguments["IsVisibleToUser"]);
searchParams.IsEntry = Convert.ToBoolean(base.Arguments["IsEntry"]);
searchParams.UserId = Convert.ToInt32(base.Arguments["UserId"]);
IEnumerable Categories = base.Arguments["Categories"] as IEnumerable;
IEnumerable Departments = base.Arguments["Departments"] as IEnumerable;
String mQuery = "How to construct it ....!!!"; // Need help in this
var query = queryParser.Parse(mQuery);
indexSearcher.Search(query, collector);
Here I want to fetch all records from lucene index which has the value for all the above fields.
I'm unclear what you are using searchParams for; however, in general you can construct your query string (mQuery) in this case with any of the features of the Lucene query syntax. See the Query Parser Syntax documentation for Lucene.Net version 4.8.
In general, when multiple words are listed in the query, they are combined with a logical OR, but documents that contain all the terms rank higher than documents with only one. For example, white dog would match docs containing white dog, white, or dog. To match only docs that contain every term, join the terms with AND (the operator must be upper-case), for example small AND white AND dog.
To search a specific field, list the field name followed by a colon. For example, you can search UserId:ron AND Categories:dogs. There is much more to the Lucene query syntax, but hopefully that will get you started. For more details see the Lucene query syntax doc I referred to.
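As a rough illustration of the field:value and AND syntax described above, a query string could be assembled from the parameters along these lines. This is only a sketch: LuceneQueryText and its methods are hypothetical helpers (they are not part of Lucene.Net), the field names and values are assumptions, and whether a quoted value matches depends on how each field was analyzed and indexed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LuceneQueryText
{
    // Wraps a value in double quotes, escaping backslashes and quotes inside it.
    static string Quote(string value) =>
        "\"" + value.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";

    // A single clause: field:"value"
    public static string Term(string field, string value) => field + ":" + Quote(value);

    // Any of several values for one field: (field:"a" OR field:"b")
    // Assumes values is non-empty; an empty list would produce an invalid "()".
    public static string AnyOf(string field, IEnumerable<string> values) =>
        "(" + string.Join(" OR ", values.Select(v => Term(field, v))) + ")";

    // All clauses must match; the operator has to be upper-case.
    public static string All(IEnumerable<string> clauses) => string.Join(" AND ", clauses);

    public static void Main()
    {
        var mQuery = All(new[]
        {
            Term("UserId", "42"),
            Term("Reviewed", "true"),
            AnyOf("Categories", new[] { "dogs", "cats" })
        });
        // Prints: UserId:"42" AND Reviewed:"true" AND (Categories:"dogs" OR Categories:"cats")
        Console.WriteLine(mQuery);
    }
}
```

The resulting string would then be handed to queryParser.Parse(mQuery) as in the question.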

Merging queries of same table N times

I have a table of words, and a lookup table that records which documents those words are found in and the number of times each word appears in each document. So there might be a record that says Alpha exists 5 times in document X, while Beta exists 3 times in document X, and another for Beta existing twice in document Y.
The user can enter a number of words to search, so "quick brown fox" is three queries, but "quick brown fox jumped" is four queries. While I can get a scored result set of each word in turn, what I actually want is to add the number of occurrences together for each word, such that the top result is the highest occurrence count for all words.
A document might have hundreds of "quick" and "brown" occurrences but no "fox" occurrences. The results should still be included as it could score higher than a document with only one each of "quick", "brown", and "fox".
The problem I can't work out is how to amalgamate the 1 to N queries with the occurrences summed. I think I need to use GROUP BY and SUM() but I'm not certain. LINQ preferred but SQL would be OK. MS SQL 2016.
I want to pass the results on to a page indexer so a for-each over the results wouldn't work, plus we're talking 80,000 word records, 3 million document-word records, and 100,000 document records.
// TextIndexDocument:
// Id | WordId | Occurences | DocumentId | (more)
//
// TextIndexWord:
// Id | Word
foreach (string word in words)
{
string lword = word.ToLowerInvariant();
var results = from docTable in db.TextIndexDocuments
join wordTable in db.TextIndexWords on docTable.WordId equals wordTable.Id
where wordTable.Word == lword
orderby docTable.Occurences descending
select docTable;
// (incomplete)
}
More information
I understand that full text searching is recommended. The problem then is how to rank the results from a half dozen unrelated tables (searching in forum posts, articles, products...) into one unified result set - let's say record Id, record type (article/product/forum), and score. The top result might be a forum post while the next best hits are a couple of articles, then a product, then another forum post and so on. The TextIndexDocument table already has this information across all the relevant tables.
Let's assume that you can create a navigation property TextIndexDocuments in Document:
public virtual ICollection<TextIndexDocument> TextIndexDocuments { get; set; }
and a navigation property in TextIndexDocument:
public virtual TextIndexWord TextIndexWord { get; set; }
(highly recommended)
Then you can use the properties to get the desired results:
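// lwords: the lower-cased search words, e.g. words.Select(w => w.ToLowerInvariant()).ToList()
var results =
(
    from doc in db.Documents
    select new
    {
        doc,
        TotalOccurrences =
            doc.TextIndexDocuments
                .Where(tid => lwords.Contains(tid.TextIndexWord.Word))
                // Property spelled as in the question's table (Occurences).
                // The int? cast lets a document with no matching words sum to 0 instead of failing on NULL.
                .Sum(tid => (int?)tid.Occurences) ?? 0
    }
).OrderByDescending(x => x.TotalOccurrences);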
As far as I know this cannot be accomplished in LINQ, or at least not easily, and especially not in any kind of performant way.
What you really should consider, assuming your DBA will allow it, is full-text indexing of your documents stored in SQL Server. From my understanding, the RANK value that full-text queries return is exactly what you are looking for, and it has been highly optimized for full-text search.
In response to your comment (sorry for not noticing that):
You'll need either a series of subqueries or common table expressions (CTEs). CTEs are a bit hard to get used to writing at first, but once you are used to them they are far more elegant than the corresponding query written with subqueries. Either way the query execution plan will generally be the same, so there is no performance gain from going the CTE route.
You want to add up the occurrences of the words per document. So group by document ID, use SUM and order by the total descending:
select documentid, sum(occurences)
from doctable
where wordid in (select id from wordtable where word in ('quick', 'brown', 'fox'))
group by documentid
order by sum(occurences) desc;
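Since LINQ was preferred, the same group-and-sum shape can be sketched over in-memory collections as below. The row classes and OccurrenceRanking are stand-ins I made up for the question's tables, so treat this as an illustration of the query shape rather than the actual EF model; against a real context you would run the equivalent query on the DbSets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the TextIndexWord and TextIndexDocument tables.
public class WordRow { public int Id; public string Word; }
public class DocWordRow { public int WordId; public int DocumentId; public int Occurences; }

public static class OccurrenceRanking
{
    // Returns (DocumentId, total occurrences) pairs, highest total first.
    public static List<KeyValuePair<int, int>> Rank(
        IEnumerable<DocWordRow> docWords, IEnumerable<WordRow> words, IEnumerable<string> searchWords)
    {
        var wanted = new HashSet<string>(searchWords.Select(w => w.ToLowerInvariant()));
        var wordIds = new HashSet<int>(words.Where(w => wanted.Contains(w.Word)).Select(w => w.Id));

        return docWords
            .Where(dw => wordIds.Contains(dw.WordId))   // where wordid in (...)
            .GroupBy(dw => dw.DocumentId)               // group by documentid
            .Select(g => new KeyValuePair<int, int>(g.Key, g.Sum(dw => dw.Occurences)))
            .OrderByDescending(p => p.Value)            // order by sum(...) desc
            .ToList();
    }

    public static void Main()
    {
        var words = new[]
        {
            new WordRow { Id = 1, Word = "quick" },
            new WordRow { Id = 2, Word = "brown" },
            new WordRow { Id = 3, Word = "fox" }
        };
        var docWords = new[]
        {
            new DocWordRow { WordId = 1, DocumentId = 10, Occurences = 100 },
            new DocWordRow { WordId = 2, DocumentId = 10, Occurences = 50 },
            new DocWordRow { WordId = 1, DocumentId = 20, Occurences = 1 },
            new DocWordRow { WordId = 2, DocumentId = 20, Occurences = 1 },
            new DocWordRow { WordId = 3, DocumentId = 20, Occurences = 1 }
        };
        foreach (var p in Rank(docWords, words, new[] { "Quick", "brown", "fox" }))
            Console.WriteLine(p.Key + ": " + p.Value);   // 10: 150, then 20: 3
    }
}
```

Document 10 has no "fox" at all but still ranks first, which is the behaviour the question asks for.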

How to query part of a field in Lucene.NET

Say I have an index containing a collection of Users, storing their full name in a Name field. Some of these users are of the format "Firstname Lastname", and some are "Firstname Middlename(s) Lastname"
e.g.
Joe Bloggs
Joe Fred Bloggs
Joe John Paul Bloggs
If I search for "Joe Bloggs", I need it to return all users listed above.
I've tried using a PhraseQuery , but this will only return 'Joe Bloggs' (I presume due to terms needing to be in the correct order).
Is my only option to use a WildcardQuery? I wouldn't want 'Joe Smith' or 'John Bloggs' to be returned. Also, I can't rework the index to split the full name into separate fields.
How best should I form my query to get things to work as required?
Thanks
Take care which analyzer you use.
You probably just want a "whitespace" analyzer to break words at whitespace, plus "lower case" so that "fred" matches "Fred".
Your query should simply be "name:joe AND name:bloggs" (or the equivalent if you are constructing your query objects manually).
This says that the name field MUST contain both words

Full match against a List of strings per id

I need a Linq query that will return null if not all the rows have matching string from within a List<string> for each hardware_id column.
I have the following table:
id (int) - Primary Key
name (string)
user_id (int)
hardware_id (int)
I have a List<string> that contains phrases. I want the query to return a hardware_id only if all the phrases in the List have matching strings in the name column for that hardware_id. If even one of the phrases has no name match, it should return null. The result should be the list of hardware_id's that each have a full match with all the phrases in the List.
Or in other words, return a list of hardware_id's where, for each id, all of its names match the ones in the List<string>.
I thought about iterating each Id in a different query but it's not an effective way to do it. Maybe you know a good query to tackle this.
I'm using Entity Framework 6 / C# / MySQL
Note: the query is done only per user id. So I filter the table first by the user id and then need to find the matching hardware_id's that satisfy the condition.
Group on hardware_id and then check that every name in the group exists in the List:
table.GroupBy(x=>x.hardware_id)
.Where(x=> x.All(s=> phrases.Contains(s.name)))
.Select(x=>x.Key);
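The question can be read two ways, so here is a small in-memory sketch of both. The tuple rows and the HardwareMatch class are made up for illustration and are not the EF entities. The first method is the check the query above performs (every name of a hardware_id is in the list); the second is the reverse reading (every phrase in the list appears among the names of the hardware_id).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HardwareMatch
{
    // Every name belonging to the hardware_id occurs in phrases.
    public static List<int> AllNamesInPhrases(
        IEnumerable<(string Name, int HardwareId)> rows, ICollection<string> phrases) =>
        rows.GroupBy(r => r.HardwareId)
            .Where(g => g.All(r => phrases.Contains(r.Name)))
            .Select(g => g.Key)
            .ToList();

    // Every phrase occurs among the names of the hardware_id.
    public static List<int> AllPhrasesInNames(
        IEnumerable<(string Name, int HardwareId)> rows, ICollection<string> phrases) =>
        rows.GroupBy(r => r.HardwareId)
            .Where(g => phrases.All(p => g.Any(r => r.Name == p)))
            .Select(g => g.Key)
            .ToList();

    public static void Main()
    {
        var rows = new[] { ("cpu", 1), ("gpu", 1), ("cpu", 2), ("fan", 2), ("cpu", 3) };
        var phrases = new List<string> { "cpu", "gpu" };

        Console.WriteLine(string.Join(",", AllNamesInPhrases(rows, phrases)));  // 1,3
        Console.WriteLine(string.Join(",", AllPhrasesInNames(rows, phrases)));  // 1
    }
}
```

If an empty result should surface as null, as the question describes, the caller can map an empty list to null after the query.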

Dynamic Parameter Count for SQL with C#

So I was thinking about creating a dynamic SQL query, meaning that I want the number of parameters to be dynamic.
Having looked at this: Parameterize an SQL IN clause, I was thinking that using like '%x%' is SLOW and not good.
What I actually would like to do is use the IN keyword and have it somewhat like this:
select UserId from NameTable where Name IN (@List_of_names)
Where @List_of_names could contain, e.g.:
Filip Ekberg
Filip
Ekberg Filip
Ekberg
Johan Ekberg
Johan
(my second name is Johan, that's why it's in there ;) )
So all these should match up with Johan Filip Ekberg.
I want to be using either LINQ to SQL or LINQ to SQL ( Stored Procedure ) using C#.
Suggestions and thoughts please!
----------------- Clarification of scenario and tables -------------------------
Imagine I have the following: a table with Users, a table with Names, and a table that connects the Names to a certain user.
So by having
[User Table]
User ID    Full Name
1          Johan Filip Ekberg
2          Anders Ekberg

[Names]
Name ID    Name
1          Filip
2          Ekberg
3          Johan
4          Anders

[Connect Names]
Name ID    User ID
1          1
2          1
3          1
2          4
2          2
So if i want to look for: Ekberg
The return should be:
Johan Filip Ekberg
Anders Ekberg
If i want to search for Johan Ekberg
Johan Filip Ekberg
If i want to search for Anders
Anders Ekberg
The number of "search names" is unbounded. If I have the name Johan Anders Carl Filip Ekberg (this is just 1 person with many names) and I just pass in "Carl Filip", I want to find that user.
Does this make everything clearer?
It looks like SQL 2008 is an option, so passing a table parameter is probably your best bet. Another possible solution for those who can't move to 2008 yet, is to write a table-valued UDF that generates a table from a delimited string. Then you can do something like this:
SELECT DISTINCT
CN.user_id
FROM
dbo.Names N
INNER JOIN dbo.Connect_Names CN ON CN.name_id = N.name_id
INNER JOIN dbo.GetTableFromNameList(@names) T ON T.name = N.name
This will give you user IDs where ANY of the passed names match ANY of the user's names. If you want to change it so that it only gives a match when ALL of the passed names match one of the user's names then you could do something like this:
SELECT
CN.user_id
FROM
dbo.Names N
INNER JOIN dbo.Connect_Names CN ON CN.name_id = N.name_id
INNER JOIN dbo.GetTableFromNameList(@names) T ON T.name = N.name
GROUP BY CN.user_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM dbo.GetTableFromNameList(@names))
I haven't tested that second query, so you may need to fiddle with it. Also, you might want to create a local table and fill it with the function so that you don't have to run the function multiple times.
I can post my own version of the string to table function if you need it. Also, if you want to use LIKE rather than exact matches, you can do that.
I hear that SQL Server 2008 has a feature called table-valued parameters, so you can pass a table as a parameter to a function or stored procedure.
Until then, you could use the classic:
SELECT UserId FROM NameTable WHERE CHARINDEX( '|' + Name + '|', '|Filip Ekberg|Filip|Ekberg Filip|') > 0
This means that column name can be any of the values you have in the list.
You can also pass an XML parameter into the stored procedure and then use it as a table in your code, via the OPENXML command.
You say you want to use IN? IN looks for exact matches, so if you use IN('Ekberg')
you will only find Ekberg and not Anders Ekberg.
If you want the above to find Anders Ekberg you need Like '%Ekberg'.
In fact, in your second example, where you have the name A B C and you pass A C, you want it to find A B C.
Really you want a Like on every word you pass. There are other ways to achieve what it sounds like you require. Mind if I ask which database you are running against, first of all? :) Thanks
From the similar SQL-only question here: SQL query: Simulating an "AND" over several rows instead of sub-querying
Try this.
List<string> targets = new List<string>() {"Johan", "Ekberg"};
int targetCount = targets.Count;
//
List<Users> result =
dc.Names
.Where(n => targets.Contains(n.Name))
.SelectMany(n =>
dc.ConnectNames.Where(cn => cn.NameId == n.NameId)
)
.GroupBy(cn => cn.UserId)
.Where(g => g.Count() == targetCount)
.SelectMany(g =>
dc.Users.Where(u => u.UserId == g.Key)
)
.ToList();
This is freehand code, so there are probably some syntax errors in there.
