I'm trying to refresh my knowledge of C# and came across this problem:
Have the function StringChallenge(str) take the str parameter being passed and return a compressed version of the string using the Run-length encoding algorithm. This algorithm works by taking the occurrence of each repeating character and outputting that number along with a single character of the repeating sequence. For example: "wwwggopp" would return 3w2g1o2p. The string will not contain any numbers, punctuation, or symbols.
and my code is
using System;
using System.Text;
class MainClass {
public static string StringChallenge(string str) {
// code goes here
var newString = new StringBuilder();
var result = new StringBuilder();
foreach (var c in str){
if (newString.Length == 0 || newString[newString.Length - 1] == c){
newString.Append(c);
}
else{
result.Append($"{newString.Length}{newString[0]}");
newString.Clear();
newString.Append(c);
}
}
if (newString.Length > 0){
result.Append($"{newString.Length}{newString[0]}");
}
return result;
}
static void Main() {
// keep this function call here
Console.WriteLine(StringChallenge(Console.ReadLine()));
}
}
please help. thank you!
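One thing that stands out in the code above is `return result;`. `result` is a `StringBuilder`, and C# has no implicit conversion from `StringBuilder` to `string`, so that line is the likely culprit; `return result.ToString();` should fix it. For comparison, here is a minimal sketch of the same run-length idea that keeps a counter instead of a second builder. The class and method names (`RunLength`, `Encode`) are made up for illustration and are not part of the challenge.

```csharp
using System;
using System.Text;

public static class RunLength
{
    // Hypothetical helper: emits "<count><char>" for every run of equal characters.
    public static string Encode(string str)
    {
        if (string.IsNullOrEmpty(str)) return string.Empty;

        var result = new StringBuilder();
        char current = str[0]; // character of the run being counted
        int count = 1;         // length of that run so far

        for (int i = 1; i < str.Length; i++)
        {
            if (str[i] == current)
            {
                count++;
            }
            else
            {
                // The run ended: flush it and start a new one.
                result.Append(count).Append(current);
                current = str[i];
                count = 1;
            }
        }

        // Flush the last run.
        result.Append(count).Append(current);
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Encode("wwwggopp")); // 3w2g1o2p
    }
}
```

Counting with an `int` avoids building a throwaway string for every run, but the two-builder version from the question works just as well once it returns a string.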
Related
I have a bunch of strings in a text that look something like this:
h1. this is the Header
h3. this one the header too
h111. and this
And I have a function that is supposed to process this text depending on the level (iteration, let's say) it is called with:
public void ProcessHeadersInText(string inputText, int atLevel = 1)
So when it is called as
ProcessHeadersInText(inputText, 2)
the output should be:
<h3>this is the Header</h3>
<h5>this one the header too</h5>
<h9>and this</h9>
(the last one looks like this because if the value after the letter h is more than 9, it is supposed to be 9 in the output)
So, I started to think about using regex.
Here's the example https://regex101.com/r/spb3Af/1/
(As you can see, I came up with a regex like this, (^(h([\d]+)\.+?)(.+?)$), and tried to use the substitution <h$3>$4</h$3> on it.)
It's almost what I'm looking for, but I need to add some logic to handle the heading level.
Is it possible to do any work with variables in the substitution?
Or do I need to find another way? (Extract all headings first, replace them considering the function's arguments and the value of the header, and only then use the regex I wrote?)
The regex you may use is
^h(\d+)\.+\s*(.+)
If you need to make sure the match does not span across line, you may replace \s with [^\S\r\n]. See the regex demo.
When replacing in C#, parse the Group 1 value to an int and add the level offset to it inside a match evaluator passed to Regex.Replace.
Here is the example code that will help you:
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.IO;
public class Test
{
// Demo: https://regex101.com/r/M9iGUO/2
public static readonly Regex reg = new Regex(@"^h(\d+)\.+\s*(.+)", RegexOptions.Compiled | RegexOptions.Multiline);
public static void Main()
{
var inputText = "h1. Topic 1\r\nblah blah blah, because of bla bla bla\r\nh2. PartA\r\nblah blah blah\r\nh3. Part a\r\nblah blah blah\r\nh2. Part B\r\nblah blah blah\r\nh1. Topic 2\r\nand its cuz blah blah\r\nFIN";
var res = ProcessHeadersInText(inputText, 2);
Console.WriteLine(res);
}
public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
return reg.Replace(inputText, m =>
string.Format("<h{0}>{1}</h{0}>",
// Add the offset first, then cap the result at 9 (so h8 + 2 gives h9, not h10)
Math.Min(9, int.Parse(m.Groups[1].Value) + atLevel), m.Groups[2].Value.Trim()));
}
}
See the C# online demo
Note I am using .Trim() on m.Groups[2].Value as . matches \r. You may use TrimEnd('\r') to get rid of this char.
You can use a Regex like the one used below to fix your issues.
Regex.Replace(s, @"^(h\d+)\.(.*)$", @"<$1>$2</$1>", RegexOptions.Multiline)
Let me explain what I am doing:
// This will capture the header number which is followed
// by a '.' but ignore the . in the capture
(h\d+)\.
// This will capture the remaining of the string till the end
// of the line (see the multi-line regex option being used)
(.*)$
The parentheses capture the matched text into groups that can be referenced as "$1" for the first capture and "$2" for the second capture.
Try this:
private static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
// Group 1 = value after 'h'
// Group 2 = Content of header without leading whitespace
string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
return Regex.Replace(inputText, pattern, match => EvaluateHeaderMatch(match, atLevel), RegexOptions.Multiline);
}
private static string EvaluateHeaderMatch(Match m, int atLevel)
{
int hVal = int.Parse(m.Groups[1].Value) + atLevel;
if (hVal > 9) { hVal = 9; }
return $"<h{hVal}>{m.Groups[2].Value}</h{hVal}>";
}
Then just call
ProcessHeadersInText(input, 2);
This uses the Regex.Replace(string, string, MatchEvaluator, RegexOptions) overload with a custom evaluator function.
You could of course streamline this solution into a single function with an inline lambda expression:
public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
return Regex.Replace(inputText, pattern,
match =>
{
int hVal = int.Parse(match.Groups[1].Value) + atLevel;
if (hVal > 9) { hVal = 9; }
return $"<h{hVal}>{match.Groups[2].Value}</h{hVal}>";
},
RegexOptions.Multiline);
}
There are a lot of good solutions in this thread, but I don't think you really need a regex for your problem. For fun and as a challenge, here is a non-regex solution:
Try it online!
using System;
using System.Linq;
public class Program
{
public static void Main()
{
string extractTitle(string x) => x.Substring(x.IndexOf(". ") + 2);
string extractNumber(string x) => x.Remove(x.IndexOf(". ")).Substring(1);
string build(string n, string t) => $"<h{n}>{t}</h{n}>";
var inputs = new [] {
"h1. this is the Header",
"h3. this one the header too",
"h111. and this" };
foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
{
Console.WriteLine(line);
}
}
}
I use C# 7 local functions and C# 6 string interpolation. If you want, I can use older C#. The code should be easy to read, but I can add comments if needed.
C#5 version
using System;
using System.Linq;
public class Program
{
static string extractTitle(string x)
{
return x.Substring(x.IndexOf(". ") + 2);
}
static string extractNumber(string x)
{
return x.Remove(x.IndexOf(". ")).Substring(1);
}
static string build(string n, string t)
{
return string.Format("<h{0}>{1}</h{0}>", n, t);
}
public static void Main()
{
var inputs = new []{
"h1. this is the Header",
"h3. this one the header too",
"h111. and this"
};
foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
{
Console.WriteLine(line);
}
}
}
I'm trying to find a regex pattern to match a word with some given characters. But each character should be used only once. For example if I'm given "yrarbil" (library backwards), it should match these:
library
rar
lib
rarlib
But it should not match the following
libraryy ("y" is used more times than given)
libraries ("i" is used more times than given, and also "es" are not given at all)
I've searched all around but best I could find was code to match a word but the same character is used more than the amount of times it was given. Thank you.
P.S: If this can't be done in regex (I'm a noob at it as you can see) what would be the best way to match a word like this programmatically?
"library" is a confusing example because it has two letter r's, but in my opinion it is solvable.
Simply create a map<char, int> that stores the count of each character in the pattern. Then build a second map<char, int> for the word to check, which holds the count of each of its characters. Finally, iterate over the second map: if any character has a higher count than the same character in the pattern's map, or is not found in the pattern's map at all, the word does not match.
As requested, here is the code in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static bool Match(string pattern, string toMatch)
{
Dictionary<char, int> patternMap = new Dictionary<char, int>();
Dictionary<char, int> toMatchMap = new Dictionary<char, int>();
foreach (char ch in pattern)
{
if (patternMap.ContainsKey(ch))
++patternMap[ch];
else
patternMap[ch] = 1;
}
foreach (char ch in toMatch)
{
if (toMatchMap.ContainsKey(ch))
++toMatchMap[ch];
else
toMatchMap[ch] = 1;
}
foreach (var item in toMatchMap)
{
if (!patternMap.ContainsKey(item.Key) || patternMap[item.Key] < item.Value)
return false;
}
return true;
}
static void Main(string[] args)
{
string pattern = "library";
string[] test = { "lib", "rarlib", "rarrlib", "ll" };
foreach (var item in test)
{
if(Match(pattern, item))
Console.WriteLine("Match item : {0}", item);
else
Console.WriteLine("Failed item : {0}", item);
}
Console.ReadKey();
/*
Match item : lib
Match item : rarlib
Failed item : rarrlib
Failed item : ll
*/
}
}
}
A regex won't work for that. A solution would be to simply count the characters of your list.
For example in JavaScript:
function count(str){
return str.split('').reduce(function(m,c){
m[c] = (m[c]||0)+1;
return m;
},{})
}
function check(str, reference){
var ms = count(str), mr = count(reference);
for (var k in ms) {
if (!(ms[k]<=mr[k])) return false;
}
return true;
}
// what follows is only for demonstration in a snippet
$('button').click(function(){
$('#r').text(check($('#a').val(), "library") ? "OK":"NOT OK");
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<input id=a value="rarlib">
<button>CHECK</button>
<div id=r></div>
I do not understand why you want to do this with a regexp when a very simple and straightforward solution is available.
You just count how many times each letter appears in the given word and in the word you are testing. Then you check that each letter in the tested word appears no more times than in the given word.
for ch in given_word
cnt[ch]++
for ch in test_word
cnt[ch]--
for ch='a'..'z'
if cnt[ch]<0
answer is no
if for all letters cnt[ch]>=0
answer is yes
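The pseudocode above can be sketched in C# roughly like this. It uses a single dictionary of remaining counts instead of an array indexed by 'a'..'z', so it does not assume lowercase ASCII input. The names `LetterBudget` and `CanBuild` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LetterBudget
{
    // Hypothetical helper: true if testWord can be spelled using each
    // character of givenWord at most as often as it occurs there.
    public static bool CanBuild(string givenWord, string testWord)
    {
        // Count how many times each character is available.
        var remaining = new Dictionary<char, int>();
        foreach (char ch in givenWord)
        {
            remaining.TryGetValue(ch, out int n); // n is 0 when ch is not present yet
            remaining[ch] = n + 1;
        }

        // Spend one unit per character of the tested word.
        foreach (char ch in testWord)
        {
            if (!remaining.TryGetValue(ch, out int n) || n == 0)
                return false; // character not given, or already used up
            remaining[ch] = n - 1;
        }
        return true;
    }

    public static void Main()
    {
        foreach (string word in new[] { "library", "rar", "lib", "rarlib", "libraryy", "libraries" })
            Console.WriteLine("{0}: {1}", word, CanBuild("yrarbil", word));
    }
}
```

Returning as soon as a count would drop below zero gives the same answer as the final 'a'..'z' scan in the pseudocode, just earlier.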
I am trying to sanitize a string so that it can be put in a URL. This is just for display in the URL. Until now I was using this function in PHP, which worked fine:
$CleanString = IconV('UTF-8', 'ASCII//TRANSLIT//IGNORE', $String);
$CleanString = Preg_Replace("/[^a-zA-Z0-9\/_|+ -]/", '', $CleanString);
$CleanString = StrToLower( Trim($CleanString, '-') );
$CleanString = Preg_Replace("/[\/_|+ -]+/", $Delimiter, $CleanString);
Now I am trying to port this to C#. The regexes are no problem, but the first line is a bit tricky. What is a safe way to replace characters such as é á ó with their plain equivalents e a o?
For example, the code above would transform:
The cát ís running & getting away
to
the-cat-is-running-getting-away
The CharUnicodeInfo.GetUnicodeCategory(c) method can tell you if a character is a "Non spacing mark". This can only be used when the string is in a form where accents ("diacritics") are separated from their letter, which can be obtained with Normalize(NormalizationForm.FormD).
Here is the full string extension method:
using System.Text;
using System.Globalization;
...
public static string RemoveDiacritics(this string strThis)
{
if (strThis == null)
return null;
var sb = new StringBuilder();
foreach (char c in strThis.Normalize(NormalizationForm.FormD))
{
if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
sb.Append(c);
}
return sb.ToString();
}
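To tie this back to the PHP snippet in the question, here is a rough sketch of how the remaining steps might look in C#. It repeats the diacritics removal from the answer above as a plain static method and then applies the same two regexes. The names `SlugSketch` and `ToSlug` and the default delimiter are assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Same technique as above: decompose, then drop the non-spacing marks.
    public static string RemoveDiacritics(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Hypothetical helper mirroring the PHP steps from the question.
    public static string ToSlug(string text, string delimiter = "-")
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        string clean = RemoveDiacritics(text);
        // Drop everything that is not a letter, digit or separator candidate.
        clean = Regex.Replace(clean, @"[^a-zA-Z0-9/_|+ -]", "");
        clean = clean.Trim('-').ToLowerInvariant();
        // Collapse each run of separator characters into one delimiter.
        return Regex.Replace(clean, @"[/_|+ -]+", delimiter);
    }

    public static void Main()
    {
        Console.WriteLine(ToSlug("The cát ís running & getting away"));
        // the-cat-is-running-getting-away
    }
}
```

Note that this is not a full equivalent of iconv's transliteration: characters without a canonical decomposition, such as ø or ß, are not converted by the normalization step and are simply removed by the first regex in this sketch.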
I'd hate to reinvent something that was already written, so I'm wondering if there is a ReadWord() function somewhere in the .NET Framework that extracts words from some text delimited by white space and line breaks.
If not, do you have an implementation that you'd like to share?
string data = "Four score and seven years ago";
List<string> words = new List<string>();
WordReader reader = new WordReader(data);
while (true)
{
string word = reader.ReadWord();
if (string.IsNullOrEmpty(word)) return;
//additional parsing logic goes here
words.Add(word);
}
Not that I'm aware of directly. If you don't mind getting them all in one go, you could use a regular expression:
Regex wordSplitter = new Regex(@"\W+");
string[] words = wordSplitter.Split(data);
If you have leading/trailing whitespace you'll get an empty string at the beginning or end, but you could always call Trim first.
A different option is to write a method which reads a word based on a TextReader. It could even be an extension method if you're using .NET 3.5. Sample implementation:
using System;
using System.IO;
using System.Text;
public static class Extensions
{
public static string ReadWord(this TextReader reader)
{
StringBuilder builder = new StringBuilder();
int c;
// Ignore any trailing whitespace from previous reads
while ((c = reader.Read()) != -1)
{
if (!char.IsWhiteSpace((char) c))
{
break;
}
}
// Finished?
if (c == -1)
{
return null;
}
builder.Append((char) c);
while ((c = reader.Read()) != -1)
{
if (char.IsWhiteSpace((char) c))
{
break;
}
builder.Append((char) c);
}
return builder.ToString();
}
}
public class Test
{
static void Main()
{
// Give it a few challenges :)
string data = @"Four score and
seven years ago ";
using (TextReader reader = new StringReader(data))
{
string word;
while ((word = reader.ReadWord()) != null)
{
Console.WriteLine("'{0}'", word);
}
}
}
}
Output:
'Four'
'score'
'and'
'seven'
'years'
'ago'
Not as such, however you could use String.Split to split the string into an array of string based on a delimiting character or string. You can also specify multiple strings / characters for the split.
If you'd prefer to do it without loading everything into memory then you could write your own stream class that does it as it reads from a stream but the above is a quick fix for small amounts of data word splitting.
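As a small sketch of the String.Split suggestion: the overload that takes a `char[]` and `StringSplitOptions.RemoveEmptyEntries` splits on several delimiters at once and drops the empty entries that consecutive or leading/trailing whitespace would otherwise produce. The class name and the delimiter set here are just assumptions for the example.

```csharp
using System;

public static class SplitSketch
{
    // Hypothetical helper: splits on spaces, tabs and line breaks,
    // discarding the empty strings between adjacent delimiters.
    public static string[] SplitWords(string data)
    {
        char[] delimiters = { ' ', '\t', '\r', '\n' };
        return data.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        foreach (string word in SplitWords("Four score and\r\n  seven years ago  "))
        {
            Console.WriteLine("'{0}'", word);
        }
    }
}
```

This reads the whole input into memory and returns all words at once, which is the trade-off mentioned above.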
Hi everyone, I've got the function below, which returns true if the input is a bad word:
public bool isAdultKeyword(string input)
{
if (input == null || input.Length == 0)
{
return false;
}
else
{
Regex regex = new Regex(@"\b(badword1|badword2|anotherbadword)\b");
return regex.IsMatch(input);
}
}
The function above only matches whole words, i.e. if the input is badword it won't match, but it will when the input is badword1.
What I'm trying to do is get a match when any part of the input contains one of the bad words.
So under your logic, would you match as to ass?
Also, remember the classic place Scunthorpe - your adult filter needs to be able to allow this word through.
You probably don't have to do it in such a complex way, but you could try implementing Knuth-Morris-Pratt. I tried using it in one of my failed (totally my fault) OCR enhancer modules.
Try:
Regex regex = new Regex(@"(\bbadword1\b|\bbadword2\b|\banotherbadword\b)");
return regex.IsMatch(input);
Your method seems to be working fine. Can you clarify what is wrong with it? My tester program below shows it passing a number of tests with no failures.
using System;
using System.Text.RegularExpressions;
namespace CSharpConsoleSandbox {
class Program {
public static bool isAdultKeyword(string input) {
if (input == null || input.Length == 0) {
return false;
} else {
Regex regex = new Regex(@"\b(badword1|badword2|anotherbadword)\b");
return regex.IsMatch(input);
}
}
private static void test(string input) {
string matchMsg = "NO : ";
if (isAdultKeyword(input)) {
matchMsg = "YES: ";
}
Console.WriteLine(matchMsg + input);
}
static void Main(string[] args) {
// These cases should match
test("YES badword1");
test("YES this input should match badword2 ok");
test("YES this input should match anotherbadword. ok");
// These cases should not match
test("NO badword5");
test("NO this input will not matchbadword1 ok");
}
}
}
Output:
YES: YES badword1
YES: YES this input should match badword2 ok
YES: YES this input should match anotherbadword. ok
NO : NO badword5
NO : NO this input will not matchbadword1 ok
Is \b the word boundary in a regular expression?
In that case your regular expression is only looking for entire words.
Removing these will match any occurrence of the bad words, including where one appears as part of a larger word.
Regex regex = new Regex(@"(bad|awful|worse)", RegexOptions.IgnoreCase);
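To make the difference concrete, here is a small sketch that builds both variants from the same word list: one with \b on either side (whole words only) and one without (matches anywhere, including inside a larger word). The class name, the method names and the placeholder word list are assumptions; `Regex.Escape` is used so that any special characters in the words are treated literally.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordFilter
{
    // Placeholder list taken from the question; not a real word list.
    private static readonly string[] Words = { "badword1", "badword2", "anotherbadword" };

    // "badword1|badword2|anotherbadword", with each word escaped.
    private static readonly string Alternation =
        string.Join("|", Words.Select(w => Regex.Escape(w)));

    // With \b: the word must stand on its own.
    private static readonly Regex WholeWord =
        new Regex(@"\b(" + Alternation + @")\b", RegexOptions.IgnoreCase);

    // Without \b: the word may appear anywhere, even inside another word.
    private static readonly Regex Anywhere =
        new Regex("(" + Alternation + ")", RegexOptions.IgnoreCase);

    public static bool MatchesWholeWord(string input) =>
        !string.IsNullOrEmpty(input) && WholeWord.IsMatch(input);

    public static bool MatchesAnywhere(string input) =>
        !string.IsNullOrEmpty(input) && Anywhere.IsMatch(input);

    public static void Main()
    {
        string input = "this input will not matchbadword1 ok";
        Console.WriteLine(MatchesWholeWord(input)); // False
        Console.WriteLine(MatchesAnywhere(input));  // True
    }
}
```

Keep in mind that the substring variant is exactly the one that runs into the Scunthorpe problem mentioned earlier in this thread, so which of the two you want depends on how many false positives you can tolerate.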