Problems highlighting substring in a text

I have a problem with some code that I do not know how to solve.
I want to highlight the substrings found in a text. For this I have written the following code:
texts.ForEach((a) =>
{
if (a.Content.Contains(word))
{
RenderResultSubString(a, word);
}
});
The method with which I render the text is the following:
public TextGramaticaItemPublic RenderResultSubString(TextGramaticaItemPublic a, string substr) {
a.Content = Regex.Replace(a.Content, String.Format(@"\b{0}\b", substr), new MatchEvaluator(ReplaceKeyWords), RegexOptions.IgnoreCase);
return a;
}
And the delegate of the MatchEvaluator that adds the HTML to cause the highlighting effect is this:
public string ReplaceKeyWords(Match m) {
return "<mark><b>" + m.Value + "</b></mark>";
}
This works when the search term is a whole word, but not when it is a substring inside a word. I think I'm on the right track, but something escapes me and I can't quite get it right.
I've done a lot of research, but I can't see my mistake! :(

SOLVED:
The code must be:
public TextGramaticaItemPublic RenderResultSubString(TextGramaticaItemPublic a, string substr) {
a.Content = Regex.Replace(a.Content, String.Format(substr), new MatchEvaluator(ReplaceKeyWords), RegexOptions.IgnoreCase);
return a;
}
As @Harshad Raval and @Jamiec pointed out in the comments, the @"\b{0}\b" pattern passed to String.Format() is not needed here: \b only matches at word boundaries, so it prevented matches inside a word. It was inherited code...
Thanks to everybody!
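For reference, here is a minimal sketch of the substring-highlighting idea described above. The class and method names (Highlighter, HighlightSubstring) are hypothetical, and the call to Regex.Escape is an added assumption that the search term should be matched literally rather than interpreted as a pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class Highlighter
{
    // Wraps every case-insensitive occurrence of 'term' in <mark><b>...</b></mark>.
    // No \b anchors are used, so matches inside a longer word are highlighted too.
    public static string HighlightSubstring(string text, string term)
    {
        if (string.IsNullOrEmpty(term)) return text;
        return Regex.Replace(
            text,
            Regex.Escape(term), // assumption: treat the term as literal text
            m => "<mark><b>" + m.Value + "</b></mark>",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(HighlightSubstring("Highlighting substrings", "light"));
        // prints: High<mark><b>light</b></mark>ing substrings
    }
}
```

Because m.Value is reused in the replacement, the original casing of the matched text is preserved.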

Related

Regular expressions - Equal number of characters in left and right

So I have this regular expression
[a+][a-z-[a]]{1}[a+]
which will match string "aadaa"
but it will also match string "aaaaaaaadaa"
Is there any way to force it to match only those strings in which left side a's and right side a's occurrence count should be same?
so that it will match only "aadaa" and not this "aaaaaaaadaa"
Edit
With the help of Peter's answer I could make it working, this is the working version for my requirement
(a+)[a-z-[a]]{1}\1

You can use a back reference, as follows:
console.log(check("ada"));
console.log(check("aadaa"));
console.log(check("aaaaaaaadaa"));
console.log(check("aaadaaaaaaa"));
function check(str) {
var re = /^(.*).\1$/;
return re.test(str);
}
Or to only match a's and d's:
console.log(check("aca"));
console.log(check("aadaa"));
console.log(check("aaaaaaaadaa"));
console.log(check("aaadaaaaaaa"));
function check(str) {
var re = /^(a*)d\1$/;
return re.test(str);
}
Or to only match a's that surround not-an-a:
console.log(check("aca"));
console.log(check("aadaa"));
console.log(check("aaaaaaaadaa"));
console.log(check("aaadaaaaaaa"));
function check(str) {
var re = /^(a*)[b-z]\1$/;
return re.test(str);
}
I realize all the above is javascript, which was easy for quick demoing within the context of SO.
I made a working DotNetFiddle with the following C# code that is similar to all the above:
public static Regex re = new Regex(@"^(a+)[b-z]\1$");
public static void Main()
{
check("aca");
check("ada");
check("aadaa");
check("aaddaa");
check("aadcaa");
check("aaaaaaaadaa");
check("aadaaaaaaaa");
}
public static void check(string str)
{
Console.WriteLine(str + " -> " + re.IsMatch(str));
}

You can also use the following regex for the same purpose, although I would prefer the one suggested by @PeterB:
console.log(check("aca"));
console.log(check("aadaa"));
console.log(check("aaaaaaaadaa"));
console.log(check("aaadaaaaaaa"));
function check(str) {
var re = /^(\w+)[A-Za-z]\1$/;
return re.test(str);
}
The code is similar to the one in Peter B's answer, but the regex is the one changed by me.
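Returning to the pattern from the question's edit, the sketch below shows why anchors matter in .NET: without ^ and $, IsMatch succeeds as soon as any substring such as "aadaa" matches. The class and method names (BalancedA, IsBalanced) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class BalancedA
{
    // ^ and $ force the whole string to match; \1 must repeat exactly what
    // (a+) captured, so both runs of a's have the same length.
    // [a-z-[a]] is .NET character-class subtraction: any of a-z except 'a'.
    private static readonly Regex Pattern = new Regex(@"^(a+)[a-z-[a]]\1$");

    public static bool IsBalanced(string s)
    {
        return Pattern.IsMatch(s);
    }

    public static void Main()
    {
        foreach (var s in new[] { "aadaa", "aaaaaaaadaa", "aadaaaaaaaa" })
        {
            Console.WriteLine(s + " -> " + IsBalanced(s));
        }
    }
}
```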

C# Method to Check if a String Contains Certain Letters

I'm trying to create a method which takes two parameters, "word" and "input". The aim of the method is to print any word where all of its characters can be found in "input" no more than once (this is why the character is removed if a letter is found).
Not all the letters from "input" must be in "word" - eg, for input = "cacten" and word = "ace", word would be printed, but if word = "aced" then it would not.
However, when I run the program it produces unexpected results (words longer than "input", or containing letters not found in "input"). I have coded the solution several ways, all with the same outcome. This has stumped me for hours and I cannot work out what's going wrong. Any and all help will be greatly appreciated, thanks. My full code for the method is written below.
static void Program(string input, string word)
{
int letters = 0;
List<string> remaining = new List<string>();
foreach (char item in input)
{
remaining.Add(item.ToString());
}
input = remaining.ToString();
foreach (char letter in word)
{
string c = letter.ToString();
if (input.Contains(c))
{
letters++;
remaining.Remove(c);
input = remaining.ToString();
}
}
if (letters == word.Length)
{
Console.WriteLine(word);
}
}

OK, so just to go through where you are going wrong.
Firstly, when you assign remaining.ToString() to your input variable, what you actually assign is System.Collections.Generic.List`1[System.String]. Calling ToString on a List just gives you the name of the list's type. It doesn't join all your characters back up. That's probably the main thing that is causing you issues.
Also, you are forcing everything into string types when you often don't need to. Because string already implements IEnumerable<char>, you can get your string as a list of chars by just calling myString.ToList() (with using System.Linq).
So there is no need for this:
foreach (char item in input)
{
remaining.Add(item.ToString());
}
Things like string.Contains can take a char (as an overload on newer .NET versions, or through the LINQ Contains extension otherwise), so again there is no need to convert to string here:
foreach (char letter in word)
{
string c = letter.ToString();
if (input.Contains(c))
{
letters++;
remaining.Remove(c);
input = remaining.ToString();
}
}
You can just use the letter variable of type char and pass that into Contains, and because remaining is now a List<char> you can remove a char from it.
Again, don't reassign remaining.ToString() back into input. Use string.Join like this:
string.Join(string.Empty, remaining);
As someone else has posted, there are probably better ways of doing this, but I hope that what I've put here helps you understand what was going wrong and helps you learn.
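Putting those points together, a minimal sketch of the corrected method could look like this. The names LetterCheck and CanBuild are hypothetical, and returning a bool instead of printing is a choice made here so the result is easy to check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class LetterCheck
{
    // Returns true when every character of 'word' can be taken from 'input',
    // using each character of 'input' at most once.
    public static bool CanBuild(string input, string word)
    {
        List<char> remaining = input.ToList();
        foreach (char letter in word)
        {
            // List<T>.Remove returns false when the item is not present,
            // so one call both checks for the letter and consumes it.
            if (!remaining.Remove(letter)) return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(CanBuild("cacten", "ace"));  // True
        Console.WriteLine(CanBuild("cacten", "aced")); // False
    }
}
```

Working on the List<char> directly avoids rebuilding a string after every removal.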

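You can also use a regular expression, which was created for such scenarios. Note that this pattern only checks that every character of word occurs in input; it does not limit how many times each character is used.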
bool IsMatch(string input, string word)
{
var pattern = string.Format("\\b[{0}]+\\b", input);
var r = new Regex(pattern);
return r.IsMatch(word);
}
I created a sample code for you on DotNetFiddle.
You can check what the pattern does at Regex101. It has a pretty "Explanation" and "Quick Reference" panel.

There are a lot of ways to achieve that, here is a suggestion:
static void Main(string[] args)
{
Func("cacten","ace");
Func("cacten", "aced");
Console.ReadLine();
}
static void Func(string input, string word)
{
bool isMatch = true;
foreach (Char s in word)
{
if (!input.Contains(s.ToString()))
{
isMatch = false;
break;
}
}
// success
if (isMatch)
{
Console.WriteLine(word);
}
// no match
else
{
Console.WriteLine("No Match");
}
}

Not really an answer to your question, but it's always fun to do this sort of thing with LINQ:
static void Print(string input, string word)
{
if (word.All(ch => input.Contains(ch) &&
word.GroupBy(c => c)
.All(g => g.Count() <= input.Count(c => c == g.Key))))
Console.WriteLine(word);
}
Functional programming is all about what you want without all the pesky loops, ifs and what nots... Notice that this code does what you'd do in your head without needing to painfully specify step by step how you'd actually do it:
Make sure all characters in word are in input.
Make sure all characters in word are used at most as many times as they are present in input.
Still, getting the basics right is a must; I posted this answer as additional info.

Regex C# is it possible to use a variable in substitution?

I have a bunch of strings in a text, which look something like this:
h1. this is the Header
h3. this one the header too
h111. and this
And I have a function that is supposed to process this text depending on the iteration it is called from:
public void ProcessHeadersInText(string inputText, int atLevel = 1)
so the output should look like the one below when it is called as
ProcessHeadersInText(inputText, 2)
Output should be:
<h3>this is the Header</h3>
<h5>this one the header too</h5>
<h9>and this</h9>
(The last one looks like this because if the value after the letter h is more than 9, it is supposed to be 9 in the output.)
So, I started to think about using a regex.
Here's the example: https://regex101.com/r/spb3Af/1/
(As you can see, I came up with a regex like this (^(h([\d]+)\.+?)(.+?)$) and tried to use the substitution <h$3>$4</h$3> on it.)
It's almost what I'm looking for, but I need to add some logic to work with the heading level.
Is it possible to work with variables in a substitution?
Or do I need to find another way? (Extract all headings first, replace them considering the function variables and the value of the header, and only then use the regex I wrote?)

The regex you may use is
^h(\d+)\.+\s*(.+)
If you need to make sure the match does not span across line, you may replace \s with [^\S\r\n]. See the regex demo.
When replacing inside C#, parse Group 1 value to int and increment the value inside a match evaluator inside Regex.Replace method.
Here is the example code that will help you:
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.IO;
public class Test
{
// Demo: https://regex101.com/r/M9iGUO/2
public static readonly Regex reg = new Regex(@"^h(\d+)\.+\s*(.+)", RegexOptions.Compiled | RegexOptions.Multiline);
public static void Main()
{
var inputText = "h1. Topic 1\r\nblah blah blah, because of bla bla bla\r\nh2. PartA\r\nblah blah blah\r\nh3. Part a\r\nblah blah blah\r\nh2. Part B\r\nblah blah blah\r\nh1. Topic 2\r\nand its cuz blah blah\r\nFIN";
var res = ProcessHeadersInText(inputText, 2);
Console.WriteLine(res);
}
public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
return reg.Replace(inputText, m =>
string.Format("<h{0}>{1}</h{0}>", (int.Parse(m.Groups[1].Value) > 9 ?
9 : int.Parse(m.Groups[1].Value) + atLevel), m.Groups[2].Value.Trim()));
}
}
See the C# online demo
Note I am using .Trim() on m.Groups[2].Value as . matches \r. You may use TrimEnd('\r') to get rid of this char.

You can use a regex like the one below to fix your issue.
Regex.Replace(s, @"^(h\d+)\.(.*)$", @"<$1>$2</$1>", RegexOptions.Multiline)
Let me explain what I am doing:
// This will capture the header number which is followed
// by a '.' but ignore the . in the capture
(h\d+)\.
// This will capture the remaining of the string till the end
// of the line (see the multi-line regex option being used)
(.*)$
The parentheses capture the matched text into groups that can be referenced as "$1" for the first capture and "$2" for the second capture.
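A replacement string can only copy captured text back verbatim, so arithmetic on the heading level needs a MatchEvaluator instead. The sketch below contrasts the two; the names HeaderDemo, Verbatim and Shifted are hypothetical, and capping the level after adding atLevel is an assumption about the intended behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class HeaderDemo
{
    // Group 1 = digits after 'h', group 2 = header text.
    private const string Pattern = @"^h(\d+)\.\s*(.*?)\r?$";

    // Plain substitution: $1 and $2 are copied as-is, no arithmetic is possible.
    public static string Verbatim(string text)
    {
        return Regex.Replace(text, Pattern, "<h$1>$2</h$1>", RegexOptions.Multiline);
    }

    // MatchEvaluator: arbitrary code runs for each match, so the level can be
    // shifted by atLevel and then capped at 9 (assumed order of operations).
    public static string Shifted(string text, int atLevel)
    {
        return Regex.Replace(text, Pattern, m =>
        {
            int level = Math.Min(9, int.Parse(m.Groups[1].Value) + atLevel);
            return "<h" + level + ">" + m.Groups[2].Value + "</h" + level + ">";
        }, RegexOptions.Multiline);
    }

    public static void Main()
    {
        string text = "h1. this is the Header\nh111. and this";
        Console.WriteLine(Verbatim(text));
        Console.WriteLine(Shifted(text, 2));
    }
}
```

Note that int.Parse throws on a digit run too long for an int, so very long header numbers would need int.TryParse or similar handling.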

Try this:
private static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
// Group 1 = value after 'h'
// Group 2 = Content of header without leading whitespace
string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
return Regex.Replace(inputText, pattern, match => EvaluateHeaderMatch(match, atLevel), RegexOptions.Multiline);
}
private static string EvaluateHeaderMatch(Match m, int atLevel)
{
int hVal = int.Parse(m.Groups[1].Value) + atLevel;
if (hVal > 9) { hVal = 9; }
return $"<h{hVal}>{m.Groups[2].Value}</h{hVal}>";
}
Then just call
ProcessHeadersInText(input, 2);
This uses the Regex.Replace(string, string, MatchEvaluator, RegexOptions) overload with a custom evaluator function.
You could of course streamline this solution into a single function with an inline lambda expression:
public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
return Regex.Replace(inputText, pattern,
match =>
{
int hVal = int.Parse(match.Groups[1].Value) + atLevel;
if (hVal > 9) { hVal = 9; }
return $"<h{hVal}>{match.Groups[2].Value}</h{hVal}>";
},
RegexOptions.Multiline);
}

There are a lot of good solutions in this thread, but I don't think you really need a regex for your problem. For fun and as a challenge, here is a non-regex solution:
using System;
using System.Linq;
public class Program
{
public static void Main()
{
string extractTitle(string x) => x.Substring(x.IndexOf(". ") + 2);
string extractNumber(string x) => x.Remove(x.IndexOf(". ")).Substring(1);
string build(string n, string t) => $"<h{n}>{t}</h{n}>";
var inputs = new [] {
"h1. this is the Header",
"h3. this one the header too",
"h111. and this" };
foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
{
Console.WriteLine(line);
}
}
}
I use C# 7 local functions and C# 6 interpolated strings. If you want, I can use more legacy C#. The code should be easy to read; I can add comments if needed.
C# 5 version:
using System;
using System.Linq;
public class Program
{
static string extractTitle(string x)
{
return x.Substring(x.IndexOf(". ") + 2);
}
static string extractNumber(string x)
{
return x.Remove(x.IndexOf(". ")).Substring(1);
}
static string build(string n, string t)
{
return string.Format("<h{0}>{1}</h{0}>", n, t);
}
public static void Main()
{
var inputs = new []{
"h1. this is the Header",
"h3. this one the header too",
"h111. and this"
};
foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
{
Console.WriteLine(line);
}
}
}

Remove BR tag from the beginning and end of a string

How can I use something like
return Regex.Replace("/(^)?(<br\s*\/?>\s*)+$/", "", source);
to handle these cases:
<br>thestringIwant => thestringIwant
<br><br>thestringIwant => thestringIwant
<br>thestringIwant<br> => thestringIwant
<br><br>thestringIwant<br><br> => thestringIwant
thestringIwant<br><br> => thestringIwant
It can have multiple br tags at the beginning or end, but I don't want to remove any br tag in the middle.

A couple of loops would solve the issue and be easier to read and understand (use a regex, and tomorrow you will look at your own code wondering what the heck is going on):
while(source.StartsWith("<br>"))
source = source.Substring(4);
while(source.EndsWith("<br>"))
source = source.Substring(0,source.Length - 4);
return source;

Looking at your regular expression, it seems that spaces could be allowed within the br tag.
So you can try something like:
string s = Regex.Replace(input,@"\<\s*br\s*\/?\s*\>","");

There is no need to use a regular expression for this.
You can simply use
yourString.Replace("<br>", "");
This will remove all occurrences of <br> from your string.
EDIT:
To keep the tags in the middle of the string, use the following:
var regex = new Regex(Regex.Escape("<br>"));
// Replace only the first occurrence with an empty string.
var newText = regex.Replace("<br>thestring<br>Iwant<br>", "", 1);
// Cut the string at the last occurrence.
newText = newText.Substring(0, newText.LastIndexOf("<br>"));
Response.Write(newText);
This removes only the first and last occurrence of <br> from your string.

How about doing it in two passes:
result1 = Regex.Replace(source, @"^(<br\s*/?>\s*)+", "");
then feed the result of that into
result2 = Regex.Replace(result1, @"(<br\s*/?>\s*)+$", "");
It's a bit of added overhead, I know, but it simplifies things enormously and saves trying to match everything in the middle that isn't a BR.
Note the subtle difference between those two: one matches the tags at the start and the other matches them at the end. Doing it this way keeps a regular expression that allows for the general formatting of BR tags rather than being too strict.
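The two anchored patterns can also be joined with an alternation so that a single Replace call handles both ends. This is only a sketch under that assumption; the names BrTrimmer and TrimBreaks are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class BrTrimmer
{
    // Left alternative: one or more <br> variants anchored at the start.
    // Right alternative: one or more <br> variants anchored at the end.
    // <br\s*/?> accepts <br>, <br/> and <br />; \s* tolerates whitespace around tags.
    private static readonly Regex EdgeBreaks = new Regex(
        @"^(?:\s*<br\s*/?>)+\s*|\s*(?:<br\s*/?>\s*)+$",
        RegexOptions.IgnoreCase);

    public static string TrimBreaks(string source)
    {
        return EdgeBreaks.Replace(source, "");
    }

    public static void Main()
    {
        Console.WriteLine(TrimBreaks("<br><br>thestringIwant<br><br>"));
        // prints: thestringIwant
    }
}
```

Tags in the middle are left alone because neither alternative can match without touching the start or the end of the string.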

If you also want it to work with
<br />
then you could use
return Regex.Replace(source, @"((?:<br\s*/?>)*<br\s*/?>$|^<br\s*/?>(?:<br\s*/?>)*)", "");
EDIT:
Now it should also take care of multiple
<br\s*/?>
at the start and end of the string.

You can write extension methods for this:
public static string TrimStart(this string value, string stringToTrim)
{
if (value.StartsWith(stringToTrim, StringComparison.CurrentCultureIgnoreCase))
{
return value.Substring(stringToTrim.Length);
}
return value;
}
public static string TrimEnd(this string value, string stringToTrim)
{
if (value.EndsWith(stringToTrim, StringComparison.CurrentCultureIgnoreCase))
{
return value.Substring(0, value.Length - stringToTrim.Length);
}
return value;
}
You can call them like this:
string example = "<br> some <br> test <br>";
example = example.TrimStart("<br>").TrimEnd("<br>"); //output some <br> test

I believe that one should not ignore the power of regex. If you name the regular expression appropriately, it will not be difficult to maintain in the future.
I have written a sample program that does your task using a regex. It also ignores character case and white space at the beginning and end. You can try it with other source strings you have.
Most importantly, it is likely to be faster.
using System;
using System.Text.RegularExpressions;
namespace ConsoleDemo
{
class Program
{
static void Main(string[] args)
{
string result;
var source = @"<br><br>thestringIwant<br><br> => thestringIwant<br/> same <br/> <br/> ";
result = RemoveStartEndBrTag(source);
Console.WriteLine(result);
Console.ReadKey();
}
private static string RemoveStartEndBrTag(string source)
{
const string replaceStartEndBrTag = @"(^(<br>[\s]*)+|([\s]*<br[\s]*/>)+[\s]*$)";
return Regex.Replace(source, replaceStartEndBrTag, "", RegexOptions.IgnoreCase);
}
}
}

How to check if a word starts with a given character?

I have a list of SharePoint items: each item has a title, a description and a type.
I successfully retrieved it, I called it result. I want to first check if there is any item in result which starts with A then B then C, etc. I will have to do the same for each alphabet character and then if I find a word starting with this character I will have to display the character in bold.
I initially display the characters using this function:
private string generateHeaderScripts(char currentChar)
{
string headerScriptHtml = "$(document).ready(function() {" +
"$(\"#myTable" + currentChar.ToString() + "\") " +
".tablesorter({widthFixed: true, widgets: ['zebra']})" +
".tablesorterPager({container: $(\"#pager" + currentChar.ToString() +"\")}); " +
"});";
return headerScriptHtml;
}
How can I check if a word starts with a given character?

To check one value, use:
string word = "Aword";
if (word.StartsWith("A"))
{
// do something
}
You can make a little extension method to pass a list with A, B, and C:
public static bool StartsWithAny(this string source, IEnumerable<string> strings)
{
foreach (var valueToCheck in strings)
{
if (source.StartsWith(valueToCheck))
{
return true;
}
}
return false;
}
if (word.StartsWithAny(new List<string>() { "A", "B", "C" }))
{
// do something
}
AND as a bonus, if you want to know what your string starts with, from a list, and do something based on that value:
public static bool StartsWithAny(this string source, IEnumerable<string> strings, out string startsWithValue)
{
startsWithValue = null;
foreach (var valueToCheck in strings)
{
if (source.StartsWith(valueToCheck))
{
startsWithValue = valueToCheck;
return true;
}
}
return false;
}
Usage:
string word = "AWord";
string startsWithValue;
if (word.StartsWithAny(new List<string>() { "A", "B", "C" }, out startsWithValue))
{
switch (startsWithValue)
{
case "A":
// Do Something
break;
// etc.
}
}

You could do something like this to check for a specific character.
public bool StartsWith(string value, string currentChar) {
return value.StartsWith(currentChar, true, null);
}
The StartsWith method has an option to ignore the case. The third parameter is to set the culture. If null, it just uses the current culture. With this method, you can loop through your words, run the check and process the word to highlight that first character as needed.
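As a rough sketch of that loop for the whole alphabet, the code below collects the first letter of every title once and then marks the letters that occur in bold. The names AlphabetIndex and Build, the space-separated output and the use of <b> tags are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the original answers.
public static class AlphabetIndex
{
    // Builds "A B <b>C</b> ..." where letters that start at least one title are bold.
    public static string Build(IEnumerable<string> titles)
    {
        // Collect the upper-cased first letter of every non-empty title once,
        // so each of the 26 checks below is a simple set lookup.
        var initials = new HashSet<char>(
            titles.Where(t => !string.IsNullOrEmpty(t))
                  .Select(t => char.ToUpperInvariant(t[0])));

        var sb = new StringBuilder();
        for (char c = 'A'; c <= 'Z'; c++)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(initials.Contains(c) ? "<b>" + c + "</b>" : c.ToString());
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build(new[] { "apple", "Cat" }));
    }
}
```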

Assuming the properties you're checking are string types, you can use the String.StartsWith() method.. for example: -
if(item.Title.StartsWith("A"))
{
//do whatever
}
Rinse and repeat

Try the following below. You can do either StartsWith or Substring 0,1 (first letter)
if (Word.Substring(0,1) == "A") {
}

You can simply check the first character:
string word = "AWord";
if (word[0] == 'A')
{
// do something
}
Remember that character comparison is more efficient than string comparison.

To return the first character in a string, use:
Word.Substring(0,1) //where word is a string

You could use regular expressions. They are quite powerful: once you design your expression, it accomplishes the task for you.
For example, they can find a number, a letter, a word, and so on; they are quite expressive and flexible.
They have a really great tutorial on them here:
An example of such an expression would be:
string input = "Some additional string to compare against.";
Match match = Regex.Match(input, @"\ba\w*\b", RegexOptions.IgnoreCase);
That finds the first word that starts with an "a", regardless of case; Regex.Matches returns all of them. You can even use lambdas and LINQ to make them flow even better.
Hopefully that helps.
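A small sketch of collecting every such word with Regex.Matches and LINQ follows. The names WordFinder and WordsStartingWith are hypothetical, and escaping the character is an added precaution in case it is a regex metacharacter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class WordFinder
{
    // Returns every whole word in 'input' that begins with 'first', ignoring case.
    public static string[] WordsStartingWith(string input, char first)
    {
        string pattern = @"\b" + Regex.Escape(first.ToString()) + @"\w*\b";
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "Some additional string to compare against.";
        Console.WriteLine(string.Join(", ", WordsStartingWith(input, 'a')));
        // prints: additional, against
    }
}
```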
