Linq with Regex - c#

I have the matches of a regex pattern and I'm having some difficulties designing the Linq around it to produce the desired output.
The data is fixed lengths: 1231234512341234567
Lengths in this case are: 3, 5, 4, 7
The regex pattern used is: (.{3})(.{5})(.{4})(.{7})
This all works perfectly fine and the matched results of the pattern are as expected, however, the desired output is proving to be somewhat difficult. In fact, I'm not even certain what it would be called in SQL terms - except maybe a pivot query. The desired output is to take all the values from each of the groups at a given position and concatenate them so for example:
field1:value1;value2;value3;valueN;field2:value2;value3;valueN;
Using the below Linq expression, I was able to get field1-value1, field2-value2, etc...
var matches = Regex.Matches(data, re).Cast<Match>();
var xmlResults = from m in matches
from e in elements
select string.Format("<{0}>{1}</{0}>", e.Name, m.Groups[e.Ordinal].Value);
but I can't seem to figure out how to get all the values at position 1 from "Groups" using the element's Ordinal, then all the values at position 2 and so on.
The "elements" in this example is a collection of field names and ordinal positions (starting at 1). So, it would look like this:
public class Element
{
public string Name { get; set; }
public int Ordinal { get; set; }
}
var elements = new List<Element>{
new Element { Name="Field1", Ordinal=1 },
new Element { Name="Field2", Ordinal=2 }
};
I've reviewed a bunch of various Linq expressions and dug into some pivot type Linq expressions, but none of them get me close - they all use the join operator which I don't think is possible.
Does anyone have any idea how to make this Linq?

You should be able to do this by changing the query to select from elements only, and bring in the matches through string.Join, like this:
// Use ToList to avoid iterating matches multiple times
var matches = Regex.Matches(data, re).Cast<Match>().ToList();
// For each element, join all matches, and pull in the value for e.Ordinal
var xmlResults = elements.Select(e =>
string.Format(
"<{0}>{1}</{0}>"
, e.Name
, string.Join(";", matches.Select(m => m.Groups[e.Ordinal].Value))
));
Note: this is not the best way of formatting XML. You would be better off using one of .NET's libraries for building XML, such as LINQ to XML.
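As a rough sketch of what the note about using an XML library could look like, the same pivot can be built with XElement from System.Xml.Linq, which takes care of escaping characters such as & and <. The PivotXml class and its Build method are hypothetical names used only for illustration; Element is the class from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public class Element
{
    public string Name { get; set; }
    public int Ordinal { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class PivotXml
{
    // Builds one XElement per field. Each element holds the values of that
    // field's regex group from every match, joined with ';'.
    public static List<XElement> Build(string data, string pattern, IEnumerable<Element> elements)
    {
        // Materialize the matches once so they are not re-enumerated per element.
        var matches = Regex.Matches(data, pattern).Cast<Match>().ToList();

        return elements
            .Select(e => new XElement(
                e.Name,
                string.Join(";", matches.Select(m => m.Groups[e.Ordinal].Value))))
            .ToList();
    }
}
```

Calling ToString() on each returned XElement gives markup of the same shape as the string.Format version, one element per field name.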

Related

How to do this kind of search in ASP.net MVC?

I have an ASP.NET MVC web application.
The SQL table has one column ProdNum and it contains data such as 4892-34-456-2311.
The user needs a form to search the database that includes this field.
The problem is that the user wants to have 4 separate fields in the UI razor view whereas each field should match with the 4 parts of data above between -.
For example ProdNum1, ProdNum2, ProdNum3 and ProdNum4 field should match with 4892, 34, 456, 2311.
Since the entire search form contains many fields including these 4 fields, the search logic is based on a predicate which is inherited from the PredicateBuilder class.
Something like this:
...other field to be filtered
if (!string.IsNullOrEmpty(ProdNum1)) {
    predicate = predicate.And(
        t => t.ProdNum.ToString().Split('-')[0].Contains(ProdNum1));
}
...other fields to be filtered
But the above code fails with a run-time error:
The LINQ expression node type 'ArrayIndex' is not supported in LINQ to Entities
Does anybody know how to resolve this issue?
Thanks a lot for all the responses. I finally found an easy way to resolve it.
Instead of rebuilding the models and changing the database tables, I just add the separators to the search strings so they match the search criteria. Since the data format is always 4892-34-456-2311, I use StartsWith(ProdNum1) to search the first part, Contains("-" + ProdNum2 + "-") to search the second part (and the same with ProdNum3 for the third part), and EndsWith("-" + ProdNum4) to search the fourth part. This way I don't need to change anything else, and it stays simple.
Again, thanks a lot for all the responses, much appreciated.
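A minimal in-memory sketch of the StartsWith / Contains / EndsWith approach described above. ProdNumSearch and Filter are hypothetical names, and the method works on a plain list of strings rather than on the database query, so it only illustrates the matching logic.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper used only to illustrate the matching rules.
public static class ProdNumSearch
{
    // Applies one filter per non-empty search part.
    // Empty or null parts are skipped, as in the predicate-based search form.
    public static List<string> Filter(IEnumerable<string> prodNums,
        string part1, string part2, string part3, string part4)
    {
        IEnumerable<string> query = prodNums;

        if (!string.IsNullOrEmpty(part1))
            query = query.Where(p => p.StartsWith(part1));
        if (!string.IsNullOrEmpty(part2))
            query = query.Where(p => p.Contains("-" + part2 + "-"));
        if (!string.IsNullOrEmpty(part3))
            query = query.Where(p => p.Contains("-" + part3 + "-"));
        if (!string.IsNullOrEmpty(part4))
            query = query.Where(p => p.EndsWith("-" + part4));

        return query.ToList();
    }
}
```

One thing the sketch makes visible: Contains("-34-") matches a value whether 34 sits in the second or the third position, so the two middle fields are not tied to a position.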
If I understand this correctly, you have one column that you want to act like 4 different columns? This isn't worth it. For that, you need to split each row's column data, create a class to handle the split data, and finally use a List of that class. That's a useless workaround; I would rather suggest you use 4 columns instead.
But if you still want to go with your existing approach, you first need to split the data as I mentioned earlier. Here's an example:
// command is assumed to be an existing SqlCommand that selects the rows
public void Test(SqlCommand command)
{
    using (SqlDataReader datareader = command.ExecuteReader())
    {
        while (datareader.Read())
        {
            string[] parts = datareader[1].ToString().Split('-');
            string part1 = parts[0]; // the 1st part of your column data
            string part2 = parts[1]; // the 2nd part of your column data
        }
    }
}
Now, as mentioned in the comments, you can instead use a class to hold all the data. For example, let's call it MyData:
public class MyData
{
    public string Part1;
    public string Part2;
    public string Part3;
    public string Part4;
}
Now, within the while loop of the SqlDataReader, create a new instance of this class and assign the values to it. An example:
// command is assumed to be an existing SqlCommand that selects the rows
public void Test(SqlCommand command)
{
    using (SqlDataReader datareader = command.ExecuteReader())
    {
        while (datareader.Read())
        {
            string[] parts = datareader[1].ToString().Split('-');
            MyData allData = new MyData();
            allData.Part1 = parts[0];
            allData.Part2 = parts[1];
        }
    }
}
Create a list of the class at class level:
public class MyForm
{
    List<MyData> storedData = new List<MyData>();
}
Within the while loop of the SqlDataReader, add this at the end:
storedData.Add(allData);
So finally, you have a list of all the split data, and you can write your filtering logic easily :)
As already mentioned in a comment, the error means that accessing data by index (see [0]) is not supported when translating your expression to SQL. Split('-') is not supported either, so you have to resort to the supported functions Substring() and IndexOf().
You could do something like the following to first transform the string into 4 number strings ...
.Select(t => new {
t.ProdNum,
FirstNumber = t.ProdNum.Substring(0, t.ProdNum.IndexOf("-")),
Remainder = t.ProdNum.Substring(t.ProdNum.IndexOf("-") + 1)
})
.Select(t => new {
t.ProdNum,
t.FirstNumber,
SecondNumber = t.Remainder.Substring(0, t.Remainder.IndexOf("-")),
Remainder = t.Remainder.Substring(t.Remainder.IndexOf("-") + 1)
})
.Select(t => new {
t.ProdNum,
t.FirstNumber,
t.SecondNumber,
ThirdNumber = t.Remainder.Substring(0, t.Remainder.IndexOf("-")),
FourthNumber = t.Remainder.Substring(t.Remainder.IndexOf("-") + 1)
})
... and then you could simply write something like
if (!string.IsNullOrEmpty(ProdNum3)) {
    predicate = predicate.And(
        t => t.ThirdNumber.Contains(ProdNum3));
}

How to construct a LINQ Query to test a list against another list where elements start with the elements from the other

Basically, I am constructing an autocomplete textbox to search for name fragments. (Yes, I could have used Lucene etc., but for many non-technical reasons I am not using it.)
public IEnumerable<ContactAutoComplete> SelectActiveContactsAutoCompleteForMailingList(string fullName)
{
//Search query fullname e.g. James Francis Cameron is decomposed
//into a list comprising James, Francis, Cameron
IEnumerable<string> fragment = fullName.Trim().Split();
return _db.Contacts.Where(contact => contact.Status == Statuses.Activated &&
    fragment.All(c => contact.FullName.Trim().Split().Any(frag =>
        frag.StartsWith(c))));
}
What I need in the above context is a clause to:
Apply .Trim() and .Split() to the FullName field of each contact
Test the obtained list of text fragments (contact.FullName.Trim().Split()) against the text fragments (fragment) obtained from the search query
Check that each text fragment (fragment) appears at the start of at least one of the fragments obtained from contact.FullName.Trim().Split()
Examples:
In the database, a contact has the FullName, James Francis Cameron
Searching for
"Fra Cam" - OK
"Cam Fra" - OK (because in Asia, name ordering convention is inconsistent)
"Cis Ron" - not OK
Many thanks!
Linq can't translate methods like string.Split() to SQL, so you can't do it in a super general way like this, and you can't easily split the Full Name into two fields while doing a query. If you have your name in two fields for first and last name you can do this:
var fragments = fullName.Split(' ');
var first = fragments.FirstOrDefault().Trim();
var last = (fragments.Skip(1).FirstOrDefault() ?? "").Trim(); // "" when only one fragment was typed
var r = db.Contacts.Where(x => (x.First.StartsWith(first) && x.Last.StartsWith(last))
|| (x.First.StartsWith(last) && x.Last.StartsWith(first))
);
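If the names stay in a single FullName field, one option is to do the fragment matching in memory after the candidate rows have been loaded (for example after filtering by status on the server). The following is only a sketch of the matching rule from the question; NameMatcher is a hypothetical name, and the case-insensitive comparison is an assumption, not something the question specifies.

```csharp
using System;
using System.Linq;

// Hypothetical helper: runs in memory (LINQ to Objects), not translated to SQL.
public static class NameMatcher
{
    // Returns true when every fragment of the search text is a prefix
    // of at least one part of the contact's full name, in any order.
    public static bool Matches(string contactFullName, string searchText)
    {
        char[] separators = { ' ' };
        string[] nameParts = contactFullName.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        string[] fragments = searchText.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        return fragments.All(fragment =>
            nameParts.Any(part => part.StartsWith(fragment, StringComparison.OrdinalIgnoreCase)));
    }
}
```

With the contact "James Francis Cameron", the searches "Fra Cam" and "Cam Fra" match and "Cis Ron" does not, which is the behaviour asked for in the question. The cost is that the filtering happens on the client, so it only suits result sets small enough to load.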

Minimum scores in Lucene.net/Lucene?

Is it possible to set a minimum score for which to return results in Lucene?
I have this function:
public Tuple<int,ICollection<Guid>> Search(string searchQuery,int maxResults)
{
var booleanQuery = new BooleanQuery();
var s1 = new TermQuery(new Term("companyName", searchQuery));
booleanQuery.Add(s1, Occur.SHOULD);
using (var searcher = new IndexSearcher(this.Directory))
{
TopDocs hits = searcher.Search(booleanQuery, maxResults);
var ids = new List<Guid>();
for (int i = 0; i < hits.ScoreDocs.Count(); i++)
{
var idString = searcher.Doc(hits.ScoreDocs[i].Doc).Get("id");
ids.Add(new Guid(idString));
}
return new Tuple<int, ICollection<Guid>>(hits.TotalHits, ids);
}
}
The function searches my index and returns the IDs of the companies that match the searchQuery, along with the total number of companies that matched the search - so I can write 'Showing 1-20 of 245 matching companies'.
My problem is that the threshold for a match is very low. If the user enters "accountant" the search returns meaningful results, but if they enter "adasdfsdf" it returns results that are not relevant. I would rather display a message like "Sorry, no companies match your query" if the results are not relevant enough.
Is it possible to set a minimum score for the matches? Will the TopDocs.TotalHits property respect this score?
In short, no. You can't really create a minimum score cutoff point in Lucene. Here is one discussion of why not. Note that the cases discussed there are a bit different from what you're asking for, but the difficulties are much the same (and, in fact, providing a reasonable cut-off point to be used on different, independent queries introduces greater, though closely related, difficulties).
The better way to address this is to design your queries such that you don't get irrelevant results. In your example, I don't really see why you would see a lot of irrelevant results coming up, so I'll assume there are other terms being added to the query. In that case, if you only want to get those documents for which new Term("companyName", searchQuery) is a match, you should add it with the Occur.MUST booleanClause, like:
var booleanQuery = new BooleanQuery();
var s1 = new TermQuery(new Term("companyName", searchQuery));
booleanQuery.Add(s1, Occur.MUST);
To explain further, the Occur.MUST and Occur.SHOULD are your problem there. If you have a query like:
category:type1 companyName:asdfdas
And have no results on companyName, then you would just see the results for the query category:type1. If you did have a match on companyName, those results would be judged to have much higher relevance, and would be displayed first, but it would still bring up everything that matched the category as well, just lower on the list. Both terms, in that example, are added with the BooleanClause.Occur.SHOULD, and so both are optional (although at least one matching term must still be found in any result).
If you wish to display only those documents that match both the category and the companyName, you should make both of them required terms in your query, by using BooleanClause.Occur.MUST. Using the query syntax, this would look like:
+category:type1 +companyName:asdfdas
Or, building the BooleanQuery:
var booleanQuery = new BooleanQuery();
var s1 = new TermQuery(new Term("companyName", "asdfdas"));
booleanQuery.Add(s1, Occur.MUST);
var s2 = new TermQuery(new Term("category", "type1"));
booleanQuery.Add(s2, Occur.MUST);

How can I check if a string in sql server contains at least one of the strings in a local list using linq-to-sql?

In my database table I have a Positions field, which contains a space-separated list of position codes. I need to add criteria to my query that check whether any of the locally specified position codes match at least one of the position codes in the field.
For example, I have a local list that contains "RB" and "LB". I want a record that has a Positions value of OL LB to be found, as well as records with a position value of RB OT but not records with a position value of OT OL.
With AND clauses I can do this easily via
foreach (var str in localPositionList)
    query = query.Where(x => x.Positions.Contains(str));
However, I need this to be chained together as or clauses. If I wasn't dealing with Linq-to-sql (all normal collections) I could do this with
query = query.Where(x => x.Positions.Split(' ').Any(y => localPositionList.Contains(y)));
However, this does not work with Linq-to-sql, as an exception occurs due to it not being able to translate Split into SQL.
Is there any way to accomplish this?
I am trying to resist splitting this data out of this table and into other tables, as the sole purpose of this table is to give an optimized "cache" of data that requires the minimum amount of tables in order to get search results (eventually we will be moving this part to Solr, but that's not feasible at the moment due to the schedule).
I was able to get a test version working by using separate queries and running a Union on the result. My code is rough, since I was just hacking, but here it is...
List<string> db = new List<string>() {
"RB OL",
"OT LB",
"OT OL"
};
List<string> tests = new List<string> {
"RB", "LB", "OT"
};
IEnumerable<string> result = db.Where(d => d.Contains(tests[0]));
for (int i = 1; i < tests.Count; i++) {
string val = tests[i];
result = result.Union(db.Where(d => d.Contains(val)));
}
result.ToList().ForEach(r => Console.WriteLine(r));
Console.ReadLine();
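Another way to get OR semantics, instead of a Union of separate queries, is to build a single predicate expression whose body chains one Contains call per code with OrElse. This is only a sketch: Player and PositionFilter are hypothetical names, and whether a given provider translates the resulting expression is an assumption to verify, although string.Contains on its own is typically translated to a LIKE.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity standing in for the cached search row.
public class Player
{
    public string Positions { get; set; }
}

public static class PositionFilter
{
    // Builds: x => x.Positions.Contains(c1) || x.Positions.Contains(c2) || ...
    public static Expression<Func<Player, bool>> ContainsAny(IEnumerable<string> codes)
    {
        ParameterExpression x = Expression.Parameter(typeof(Player), "x");
        MemberExpression positions = Expression.Property(x, nameof(Player.Positions));
        MethodInfo contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });

        Expression body = null;
        foreach (string code in codes)
        {
            Expression call = Expression.Call(positions, contains, Expression.Constant(code));
            body = body == null ? call : Expression.OrElse(body, call);
        }

        // With no codes at all, nothing can match.
        return Expression.Lambda<Func<Player, bool>>(body ?? Expression.Constant(false), x);
    }
}
```

The result can be passed to Where on an IQueryable<Player>, or compiled with Compile() to test it against in-memory objects. With the codes "RB" and "LB" it accepts "OL LB" and "RB OT" and rejects "OT OL".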

Linq question about grouping something that can change?

I have a list of strings and I need to operate on them according to the suffix they have. The only thing that does not change is the beginning of the string (it will always be ManifestXXX.txt, FileNameItems1XXX...). The suffix at the end of the string is different every time. Here is what I have so far (LINQPad):
var filesName = new[] { "ManifestSUFFIX.txt",
"FileNameItems1SUFFIX.txt",
"FileNameItems2SUFFIX.txt",
"FileNameItems3SUFFIX.txt",
"FileNameItems4SUFFIX.txt",
"ManifestWOOT.txt",
"FileNameItems1WOOT.txt",
"FileNameItems2WOOT.txt",
"FileNameItems3WOOT.txt",
"FileNameItems4WOOT.txt",
}.AsQueryable();
var query =
from n in filesName
group n by n.EndsWith("SUFFIX.txt") into ere
select new{ere} ;
query.Dump();
The condition in the GROUP is not good. I am thinking of trying to get all possible suffixes with a nested SELECT in the group, but I can't find a way to do it.
How can I get 3 different groups, grouped by their suffix, with Linq? Is it possible?
*Jimmy's answer is great but still doesn't work the way I want. Any fix?
Group by the suffix rather than by whether it matches any particular one.
...
group n by GetSuffix(n) into ere
...
string GetSuffix(string n) {
return Regex.Replace(n,"^Manifest|^FileNameItems[0-9]+", "");
}
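Put together as a self-contained sketch, the grouping could look like the following. SuffixGrouping and GroupBySuffix are hypothetical names; GetSuffix is the helper from the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SuffixGrouping
{
    // Strips the fixed beginning of the file name, leaving only the suffix.
    public static string GetSuffix(string name)
    {
        return Regex.Replace(name, "^Manifest|^FileNameItems[0-9]+", "");
    }

    // Groups the file names by suffix: key = suffix, value = names sharing it.
    public static Dictionary<string, List<string>> GroupBySuffix(IEnumerable<string> names)
    {
        return names
            .GroupBy(n => GetSuffix(n))
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

For the ten file names in the question this yields two groups, keyed "SUFFIX.txt" and "WOOT.txt", with five names each. Any further suffix in the input simply becomes another group without changes to the code.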
