LINQ to count occurrences - c#

I have the following query which works great:
string[] Words = {"search","query","example"};
... Snip ...
var Results = (
from a in q
from w in Words
where
(
a.Title.ToLower().Contains(w)
|| a.Body.ToLower().Contains(w)
)
select new
{
a,
Count = 0
}).OrderByDescending(x=> x.Count)
.Distinct()
.Take(Settings.ArticlesPerPage);
What I need it to do is return Count, the total number of occurrences of the words. I also want to weight it in favour of the title, for example:
Count = (OccurrencesInTitle * 5) + (OccurrencesInBody)
I'm assuming I need to use LINQ's Count, but I'm not sure how to apply it in this instance.

This is what I came up with:
var query =
from a in q
from w in Words
let title = a.Title.ToLower()
let body = a.Body.ToLower()
let replTitle = Regex.Replace(title, string.Format("\\b{0}\\b", w), string.Empty)
let replBody = Regex.Replace(body, string.Format("\\b{0}\\b", w), string.Empty)
let titleOccurences = (title.Length - replTitle.Length) / w.Length
let bodyOccurences = (body.Length - replBody.Length) / w.Length
let score = titleOccurences * 5 + bodyOccurences
where score > 0
select new { Article = a, Score = score };
var results = query.GroupBy(r => r.Article)
.OrderByDescending(g => g.Sum(r => r.Score))
.Take(Settings.ArticlesPerPage);
Occurrences are counted with a quick and dirty (but surprisingly effective) method: replace each occurrence with string.Empty and work out the count from the change in string length. After the score for each article and each word is calculated, I group by article, order by the sum of the scores over all the words, and take one page of results.
I didn't fire up the compiler, so please excuse any obvious mistakes.
Update: This version uses regexes as in
Regex.Replace(title, string.Format("\\b{0}\\b", w), string.Empty)
instead of the original version's
title.Replace(w, string.Empty)
so that it now matches only whole words (the string.Replace version would also match word fragments).
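As an alternative to the length arithmetic above, whole-word hits can be counted directly with Regex.Matches(...).Count. The following is only a sketch: the sample title and body are made up, CountWholeWord is a hypothetical helper name, and it runs in memory (LINQ to Objects) rather than against a database provider.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample data standing in for one article.
string title = "Search query basics";
string body = "A search can run more than one query. Searching is not counted.";
string[] words = { "search", "query", "example" };

// Count whole-word, case-insensitive occurrences of word in text.
// Regex.Escape guards against search words that contain regex metacharacters.
static int CountWholeWord(string text, string word) =>
    Regex.Matches(text, @"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase).Count;

int score = 0;
foreach (string w in words)
{
    // Title hits are weighted five times as heavily as body hits.
    score += CountWholeWord(title, w) * 5 + CountWholeWord(body, w);
}

Console.WriteLine(score); // 2 title hits * 5 + 2 body hits = 12
```

"Searching" does not count as a hit because \b requires a word boundary directly after "search".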

Related

LINQ conditional selection and formatting

I have a string of characters and I'm trying to set up a query that'll substitute a specific sequence of similar characters into a character count. Here's an example of what I'm trying to do:
agd69dnbd555bdggjykbcx555555bbb
In this case, I'm trying to isolate and count ONLY the occurrences of the number 5, so my output should read:
agd69dnbd3bdggjykbcx6bbb
My current code is the following, where GroupAdjacentBy is a function that groups and counts the character occurrences as above.
var res = text
.GroupAdjacentBy((l, r) => l == r)
.Select(x => new { n = x.First(), c = x.Count()})
.ToArray();
The problem is that the above function groups and counts EVERY SINGLE character in my string, not just the one character I'm after. Is there a way to conditionally perform that operation on ONLY the character I need counted?
Regex is a better tool for this job than LINQ.
Have a look at this:
string input = "agd69dnbd555bdggjykbcx555555bbb";
string pattern = @"5+"; // Match 1 or more adjacent 5 characters
string output = Regex.Replace(input, pattern, match => match.Length.ToString());
// output = agd69dnbd3bdggjykbcx6bbb
Not sure if you're intending to replace every 5 character, or only runs of more than one adjacent 5.
If it's the latter, just change your pattern to:
string pattern = @"5{2,}"; // Match 2 or more adjacent 5's
The best answer was already given by Johnathan Barclay. But in case you need something similar using LINQ, here is an alternative solution:
var charToCombine = '5';
var res = text
.GroupAdjacentBy((l, r) => l == r )
.SelectMany(x => x.Count() > 1 && x.First() == charToCombine ? x.Count().ToString() : x)
.ToArray();
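GroupAdjacentBy is not shown in the question, so here is a self-contained sketch of the same idea. GroupAdjacent below is a hypothetical stand-in written as a local function, not the asker's actual extension method, and this version replaces every run of the target character, including runs of length one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for GroupAdjacentBy: yields runs of neighbouring
// characters for which sameGroup(previous, current) returns true.
static IEnumerable<List<char>> GroupAdjacent(IEnumerable<char> source, Func<char, char, bool> sameGroup)
{
    var run = new List<char>();
    foreach (char c in source)
    {
        // Close the current run when the next character does not belong to it.
        if (run.Count > 0 && !sameGroup(run[run.Count - 1], c))
        {
            yield return run;
            run = new List<char>();
        }
        run.Add(c);
    }
    if (run.Count > 0) yield return run;
}

string text = "agd69dnbd555bdggjykbcx555555bbb";
char charToCombine = '5';

// Runs of the target character become their length; other runs are kept as-is.
var pieces = GroupAdjacent(text, (l, r) => l == r)
    .Select(run => run[0] == charToCombine ? run.Count.ToString() : new string(run.ToArray()));
string result = string.Concat(pieces);

Console.WriteLine(result); // agd69dnbd3bdggjykbcx6bbb
```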

Remove keywords and operators from string expression

I need your help.
I have a string like
a1 + b1 + ( v1 + g1 ) * 10
and I need to retrieve only a1, b1, v1 and g1.
Any ideas?
I would use RegEx to filter the desired output.
Assuming that each result is always a lowercase letter followed by a digit, i.e. [a-z][0-9]:
string input = "a1 + b1 + ( v1 + g1 ) * 10";
List<string> Result = Regex.Matches(input, @"[a-z][0-9]")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
I'd go after it using a recursive descent parser. It may seem like overkill at first, but it will work for all sorts of expressions.
Here's a good introduction to the theory.

Manipulating Matched values in Regex C#

I have read a text file and matched the data I am interested in. My question is, what is the best way to manipulate the data I have matched?
The code I use to read the text file is:
OpenFileDialog dialog = new OpenFileDialog();
dialog.Filter =
"All files (*.*)|*.*";
//dialog.InitialDirectory = "C:\\";
dialog.Title = "Select a text file";
if (dialog.ShowDialog() == DialogResult.OK)
{
string fname = dialog.FileName; // selected file
label1.Text = fname;
if (String.IsNullOrEmpty(richTextBox1.Text))
{
var matches1 = Regex.Matches(System.IO.File.ReadAllText(fname), @"L10 P\d\d\d R \S\S\S\S\S\S\S")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
richTextBox1.Lines = matches1.ToArray();
}
}
The result now looks like:
L10 P015 R +4.9025
and I need it to look like this:
#2015=4.9025
L10 is excluded, P015 turns into #2015, R and + turn into =, and the number stays the same.
Use capturing groups:
First change your regex to:
L10 P(?<key>\d{3}) R \S(?<val>\S{6})
The (?<name>...) syntax lets you declare a named capturing group. You can later retrieve the value that was matched by this group.
Next, when you have a match object, you can extract the matching group contents with match.Groups["key"].Value and match.Groups["val"].Value, like this:
.Select(m => string.Format("#2{0}={1}", m.Groups["key"].Value, m.Groups["val"].Value))
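Put together, the named-group approach might look like the sketch below. The two-line sample string stands in for the file contents, which are not shown in the question, so treat it as an assumption.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text standing in for the file contents (an assumption).
string text = "L10 P015 R +4.9025\nL10 P016 R -1.2500\n";

var lines = Regex.Matches(text, @"L10 P(?<key>\d{3}) R \S(?<val>\S{6})")
    .Cast<Match>()
    // "key" holds the three digits after P, "val" the six characters after the sign.
    .Select(m => string.Format("#2{0}={1}", m.Groups["key"].Value, m.Groups["val"].Value))
    .ToList();

foreach (string line in lines)
    Console.WriteLine(line);
// #2015=4.9025
// #2016=1.2500
```

Note that the single \S before the val group discards the sign character, so a negative value loses its minus sign. Whether that is acceptable depends on the data.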
var matches = Regex.Matches(System.IO.File.ReadAllText(fname), @"L10 P\d\d\d R \S\S\S\S\S\S\S")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
// Each match is a whole line such as "L10 P015 R +4.9025"; split it into its four tokens.
string[] tokens = matches[0].Split(' ');
string num1 = "2" + tokens[1].Substring(1); // "2" + "015"
string num2 = tokens[3].Substring(1); // "4.9025"
string finalValue = "#" + num1 + "=" + num2; // "#2015=4.9025"
richTextBox1.Text = finalValue;
I believe that this should work, based on your single example.
This assumes that we are simply always ignoring the first character of the P015 item and the first character of the +4.9025 item.
Why don't you simply split the received string? Your rules are basic, so there is no need for a regex.
string receivingStream = "L10 P015 R +4.9025";
string[] tokens = receivingStream.Split(new char[] { ' ' });
// tokens[0] == "L10"
// tokens[1] == "P015"
// tokens[2] == "R"
// tokens[3] == "+4.9025"
You want to be using Regex.Replace to mutate the string once instead of going through all of this matching. You'll want to add grouping to the regex, and use substitutions in the replacement string.
see:
https://msdn.microsoft.com/en-us/library/xwewhkd1(v=vs.110).aspx
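A minimal sketch of that idea, assuming the same line layout as the single example in the question: the pattern captures the two interesting parts in named groups, and the replacement string refers to them with ${name} substitutions.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "L10 P015 R +4.9025";

// ${key} and ${val} in the replacement string are filled in from the named groups.
string output = Regex.Replace(
    input,
    @"L10 P(?<key>\d{3}) R \S(?<val>\S{6})",
    "#2${key}=${val}");

Console.WriteLine(output); // #2015=4.9025
```

Text that the pattern does not match is left untouched, so running this over a whole file keeps any unrelated lines.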

Split string pattern

I have a string that I need to split in an array of string. All the values are delimited by a pipe | and are separated by a comma.
|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||
The array should have the following 8 values after the split
111
2,2
room 1
13'2'' x 13'8''
""
""
""
""
by "" I simply mean an empty string. Please note that the value can also have a comma e.g 2,2. I think probably the best way to do this is through Regex.Split but I am not sure how to write the correct regular expression. Any suggestions or any better way of achieving this will be really appreciated.
You can use Regex.Matches() to get the values instead of Split(), as long as the values between the pipe characters don't contain the pipe character itself:
(?<=\|)[^|]*(?=\|)
This will match zero or more non-pipe characters [^|]* which are preceded (?<=\|) and followed by a pipe (?=\|).
In C#:
var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");
foreach (Match match in results)
Console.WriteLine("Found '{0}' at position {1}",
match.Value, match.Index);
EDIT: Since commas always separate the values that sit between pipe characters, the separator commas are also matched and always land at the odd indexes of the match collection. So we can walk only the even indexes to get the true values, like this:
var input = "|room 1|,|,|,||,||,||,||,||,||";
var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");
for (int i = 0; i < results.Count; i+=2)
Console.WriteLine("Found '{0}'", results[i].Value);
This can be also used in the first example above.
Assuming all fields are enclosed by a pipe and delimited by a comma you can use |,| as the delimiter, removing the leading and trailing |
Dim data = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||"
Dim delim = New String() {"|,|"}
Dim results = data.Substring(1, data.Length - 2).Split(delim, StringSplitOptions.None)
For Each s In results
Console.WriteLine(s)
Next
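Since the question is about C#, here is a sketch of the same approach in that language. It relies on the same assumption as the VB code: every field is wrapped in pipes and fields are separated by exactly one comma.

```csharp
using System;

string data = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";

// Drop the leading and trailing pipe, then split on the three-character delimiter "|,|".
string[] results = data
    .Substring(1, data.Length - 2)
    .Split(new[] { "|,|" }, StringSplitOptions.None);

foreach (string s in results)
    Console.WriteLine(s);
```

StringSplitOptions.None is what keeps the four empty fields at the end; RemoveEmptyEntries would drop them.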
Output:
111
2,2
room 1
13'2'' x 13'8''
""
""
""
""
No need to use a regex, remove the pipes and split the string on the comma:
var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var parts = input.Split(',').Select(x => x.Replace("|", string.Empty));
or
var parts = input.Replace("|", string.Empty).Split(',');
EDIT: OK, in that case, use a while loop to parse the string:
var values = new List<string>();
var str = @"|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
while (str.Length > 0)
{
    // Find the pipes that open and close the next value.
    var open = str.IndexOf('|');
    var close = str.IndexOf('|', open + 1);
    var value = str.Substring(open + 1, close - open - 1);
    values.Add(value);
    // Skip the closing pipe and the comma that follows it, if any.
    str = close < str.Length - 1
        ? str.Substring(close + 2)
        : string.Empty;
}
You could try this:
string a = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
string[] result = a.Split('|').Where(s => !s.Contains(",")).Select(s => s.Replace("|",String.Empty)).ToArray();
Hmm, maybe this works for you:
var data = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|", "").Split(',');
Regards.,
k
EDIT: You can use a placeholder character that never occurs in the data:
string data = "|111|,|2,2|,|,3|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|,|", "¬").Replace("|", "").Split('¬');
Regards.,
k
Check, if this fits your needs...
var str = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
// Iterate through all the matches (anything between | and |, non-greedy)
foreach (Match m in Regex.Matches(str, @"\|(.*?)\|"))
{
// Groups[0] is the entire match, including both | symbols; Groups[1] is the text captured by ()
Console.WriteLine(m.Groups[1].Value);
}
Though, to find anything between | and |, you could, and probably should, use [^|] instead of the . character.
At least, for specified use case it gives the result you're expecting.
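For reference, a sketch of the variant with the negated character class. Because each match consumes both of its pipes, the separator commas fall between matches and are never captured.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string str = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";

// [^|]* matches any run of non-pipe characters, including an empty one.
var values = Regex.Matches(str, @"\|([^|]*)\|")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

foreach (string value in values)
    Console.WriteLine("'{0}'", value);
```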

Splitting a string and adding to list using Regex and C#

I have a string where the number of words might vary. Like:
string a_string = " one two three four five six seven etc etc etc ";
How do I split the string into groups of 5 words and add each group to a list, so that I end up with a list of strings, each containing 5 words? I think a list would be better because the number of words in the string can vary, so the list can grow or shrink accordingly.
I tried using Regex to get the first 5 words with the line of code below:
Regex.Match(rawMessage, @"(\w+\s+){5}").ToString().Trim();
but I'm a bit unsure how to proceed and add to the list dynamically and robustly. I guess Regex can help further, or some awesome string/list function? Can you please guide me a bit?
Eventually, I would want list[0] to contain "one two three four five" and list[1] to contain "six seven etc etc etc", and so on.. Thanks.
var listOfWords = Regex.Matches(a_string, @"(\w+\s+){1,5}")
.Cast<Match>()
.Select(i => i.Value.Trim())
.ToList();
Splitting for words does not require regex, string provides this capability:
var list = str.Split(' ').ToList();
ToList() is a LINQ extension method for converting IEnumerable<T> objects to lists (Split() method returns an array of strings).
To group a list by 5 words, use this code snippet:
var res = list
.Select((s, i) => new { Str = s, Index = i })
.GroupBy(p => p.Index/5)
.Select(g => string.Join(" ", g.Select(v => v.Str)));
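Here is the grouping applied to the string from the question, as a runnable sketch. One deliberate difference from the snippet above: it splits with StringSplitOptions.RemoveEmptyEntries, because the sample string has leading and trailing spaces that would otherwise produce empty "words".

```csharp
using System;
using System.Linq;

string a_string = " one two three four five six seven etc etc etc ";

// RemoveEmptyEntries discards the empty strings caused by extra spaces.
string[] words = a_string.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

var groups = words
    .Select((s, i) => new { Str = s, Index = i })
    .GroupBy(p => p.Index / 5) // integer division: words 0-4 get key 0, words 5-9 get key 1, ...
    .Select(g => string.Join(" ", g.Select(v => v.Str)))
    .ToList();

foreach (string group in groups)
    Console.WriteLine(group);
// one two three four five
// six seven etc etc etc
```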
You can use simple
a_string.Split(' ');
And then loop through the resulting array and fill your list as you want, for example:
var list = new List<string>();
int numOfWords = 0;
foreach (var str in a_string.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
{
    if (numOfWords % 5 == 0)
        list.Add(str); // every fifth word starts a new entry
    else
        list[list.Count - 1] += " " + str; // otherwise append to the current entry
    numOfWords++;
}
