How to replace words ending with -ing with a static string - c#

In my block of code below, I am trying to replace words ending with -ing with a static text "------". However, this doesn't seem to work and puts --- all over the place. What am I doing wrong?
string ingString = "I like programming, running, jobs and swimming.";
string ingWords = @"[^\\b\\w+(ing\\b)$]";
string staticLine = "------";
string replaceString = Regex.Replace(ingString, ingWords, staticLine);
It should read "I like ------, ------, jobs and ------."
Thanks

This will do it:
string ingString = "I like programming, running, jobs and swimming.";
string ingWords = @"\w+ing\b";
string staticLine = "------";
Console.WriteLine(Regex.Replace(ingString, ingWords, staticLine));
Output:
I like ------, ------, jobs and ------.
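Now, to answer your question:
What am I doing wrong?
Your regex:
[^\\b\\w+(ing\\b)$]
Square brackets [...] define a character class, i.e. a set of characters, and a leading ^ negates it. So instead of matching a whole word, the engine matches any single character that is not in your set, and each of those characters gets replaced with ------. That's why the dashes end up all over the place.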
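To make the difference concrete, here is a minimal sketch that runs the word-based pattern from this answer next to a small negated character class. The short input "sing a b" and the single-dash replacement are made up for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

string ingString = "I like programming, running, jobs and swimming.";

// Word-based pattern: one or more word characters, then "ing" at a word boundary.
string wordResult = Regex.Replace(ingString, @"\w+ing\b", "------");
Console.WriteLine(wordResult);   // I like ------, ------, jobs and ------.

// A negated character class matches ONE character at a time:
// [^ing] means "any single character except i, n or g".
// (Hypothetical demo input, not from the question.)
string classResult = Regex.Replace("sing a b", @"[^ing]", "-");
Console.WriteLine(classResult);  // -ing----
```

Every character outside the set is replaced individually, which is the same effect the original pattern had on the full sentence.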

Your regex isn't right. Try this: \w*ing\b
Taken from another question but modified to suit your need.

Related

Display string after certain words are found

Basically, what I'm trying to do is find the first string that starts with "/Game/Mods", but the problem is: how do I tell the program where to end the string? Here's an example of what a string can look like: string example
As you can see, the string starts with "/Game/Mods" and I want it to end after the word "TamingSedative". The problem is that the ending word (TamingSedative) is different for every file it has to check, for example: example 2
There you can see that the ending word is now "WeapObsidianSword" (instead of TamingSedative), so basically the string has to end when it comes across the "NUL". How do I specify that in C# code?
This is a simple example using Regex (the code is VB.NET, but the same Regex class is available in C#):
Dim yourString As String = "/Game/Mods/TamingSedative/PrimalItemConsumable_TamingSedative"
Dim M As System.Text.RegularExpressions.Match = System.Text.RegularExpressions.Regex.Match(yourString, "/Game/Mods/(.+?)/")
MessageBox.Show(M.Groups(0).Value) 'This should show /Game/Mods/TamingSedative/
MessageBox.Show(M.Groups(1).Value) 'This should show TamingSedative
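Since the question asks about C#, here is a rough C# equivalent of the VB.NET snippet above. It uses the same pattern and sample string; Console.WriteLine stands in for MessageBox.Show so the sketch runs as a console program.

```csharp
using System;
using System.Text.RegularExpressions;

string yourString = "/Game/Mods/TamingSedative/PrimalItemConsumable_TamingSedative";

// (.+?) is lazy, so it stops at the first "/" after "/Game/Mods/".
Match m = Regex.Match(yourString, "/Game/Mods/(.+?)/");

string wholeMatch = m.Groups[0].Value;  // "/Game/Mods/TamingSedative/"
string modName = m.Groups[1].Value;     // "TamingSedative"

Console.WriteLine(wholeMatch);
Console.WriteLine(modName);
```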
Since you need only the first occurrence, this is the simplest solution I could think of (the code was posted as an image, which is not reproduced here):
EDIT:
In case the existence of a path like this is not guaranteed in the string, you can do an additional check before proceeding to use Substring, like this:
int exists = fullString.IndexOf("/Game/Mods");
if (exists == -1) return null;
Note: I have included "ENDED" in order to see whether any NUL chars (or white spaces) have been included.
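The original code for this answer is not available, so the following is only a sketch of one way to combine the IndexOf check above with Substring. The helper name FirstModPath, the sample text, and the choice of terminator characters (NUL and common white space) are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper: returns the first run of characters that starts with
// "/Game/Mods" and ends just before a NUL or white-space character.
string? FirstModPath(string fullString)
{
    int start = fullString.IndexOf("/Game/Mods", StringComparison.Ordinal);
    if (start == -1) return null;   // no such path in the input

    // Assumed terminators: NUL plus space, tab and line breaks.
    char[] terminators = { '\0', ' ', '\t', '\r', '\n' };
    int end = fullString.IndexOfAny(terminators, start);

    return end == -1
        ? fullString.Substring(start)                // path runs to the end of the input
        : fullString.Substring(start, end - start);  // stop before the terminator
}

// Made-up sample data with NUL characters around the path.
string sample = "junk\0/Game/Mods/TamingSedative/PrimalItemConsumable_TamingSedative\0more";
Console.WriteLine(FirstModPath(sample) + "ENDED");
```

Appending "ENDED" when printing, as the answer suggests, makes it easy to spot a stray trailing character.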
From your comments: "the string just has to start at /Game/Mods and end when it reaches the whitespace".
In that case, you can easily get the matches using LINQ, like this (assuming filePath is a string that has the path to your file):
var text = File.ReadAllText(filePath);
var matches = text.Split(null).Where(s => s.StartsWith("/Game/Mods"));
And, if you only need the first occurrence, it would be:
var firstMatch = matches.Any() ? matches.First() : null;
Check this post.

How to strip a string from the point a hyphen is found within the string C#

I'm currently trying to strip a string of data that may contain the hyphen symbol.
E.g. Basic logic:
string stringin = "test - 9894"; OR Data could be == "test";
if (string contains a hyphen "-"){
Strip stringin;
output would be "test" deleting from the hyphen.
}
Console.WriteLine(stringin);
The current C# code I'm trying to get to work is shown below:
string Details = "hsh4a - 8989";
var regexItem = new Regex("^[^-]*-?[^-]*$");
string stringin;
stringin = Details.ToString();
if (regexItem.IsMatch(stringin)) {
stringin = stringin.Substring(0, stringin.IndexOf("-") - 1); //Strip from the ending chars and - once - is hit.
}
Details = stringin;
Console.WriteLine(Details);
But it throws an error when the string does not contain any hyphens.
How about just doing this?
stringin.Split('-')[0].Trim();
You could even specify the maximum number of substrings using an overload of the Split method. Use a count of 2, because a count of 1 returns the whole string unsplit:
stringin.Split(new[] { '-' }, 2)[0].Trim();
Your regex asks for "zero or one repetition of -", which means that it matches even if your input does NOT contain a hyphen. Thereafter you do this:
stringin.Substring(0, stringin.IndexOf("-") - 1)
IndexOf returns -1 when there is no hyphen to find, so the length passed to Substring is negative and it throws an ArgumentOutOfRangeException.
Make a simple change to your regex and it works with or without a hyphen: ask for "one or more hyphens":
var regexItem = new Regex("^[^-]*-+[^-]*$");
here -----------------------------^
It seems that you want the substring after the dash ('-') if the original string contains one, or the original string if it doesn't have a dash.
If that's your case:
String Details = "hsh4a - 8989";
Details = Details.Substring(Details.IndexOf('-') + 1);
I wouldn't use regex for this case if I were you; it makes the solution much more complex than it needs to be.
For a string that I am sure will have no more than a couple of dashes, I would use this code, because it is a one-liner and very simple:
string str = entryString.Split(new [] {'-'}, StringSplitOptions.RemoveEmptyEntries)[0];
If you know that a string might contain a large number of dashes, this approach is not recommended: it creates a large number of strings even though you are only looking for the first one. In that case, the solution would look something like this:
int firstDashIndex = entryString.IndexOf("-");
string str = firstDashIndex > -1 ? entryString.Substring(0, firstDashIndex) : entryString;
You don't need a regex for this. A simple IndexOf call will give you the index of the hyphen, then you can clean it up from there.
This is also a great place to start writing unit tests. They are very good for stuff like this.
Here's what the code could look like:
string inputString = "ho-something";
string outPutString = inputString;
var hyphenIndex = inputString.IndexOf('-');
if (hyphenIndex > -1)
{
outPutString = inputString.Substring(0, hyphenIndex);
}
return outPutString;
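In the spirit of the unit-test suggestion above, here is a small sketch that wraps the IndexOf approach in a function and runs it against a few inputs. The name StripFromHyphen and the extra Trim call (to drop the space left before the hyphen in "test - 9894") are assumptions, not part of the answer.

```csharp
using System;

// Hypothetical helper: keeps everything before the first hyphen,
// or the whole string when there is no hyphen.
string StripFromHyphen(string input)
{
    int hyphenIndex = input.IndexOf('-');
    string head = hyphenIndex > -1 ? input.Substring(0, hyphenIndex) : input;
    return head.Trim();  // assumption: surrounding white space is not wanted
}

Console.WriteLine(StripFromHyphen("test - 9894"));   // test
Console.WriteLine(StripFromHyphen("test"));          // test
Console.WriteLine(StripFromHyphen("hsh4a - 8989"));  // hsh4a
```

Inputs without a hyphen pass through unchanged, which is the case that made the original Substring call throw.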

Find and replace file lines

I have a text file with over 12,000 lines. In that file I need to replace certain lines.
Some lines begin with a ;, some have random words, some start with space. However, I am only concerned with the two types of lines I describe below.
I have a line like
SET avariable:0 ;Comments
and I need to replace it to look like
set aDIFFvariable:0 :Integer // comments
The only case that matters is in the word Integer: the I needs to be capitalized.
I also have
String aSTRING(7) ;Comment
that needs to look like
STRING aSTRING(7) :array [0..7] of AnsiChar; // Comments
I need to keep all the spacing the same.
Here is what I have so far
static void Main(string[] args)
{
string text = File.ReadAllText("C:\\old.txt");
text = text.Replace("old text", "new text");
File.WriteAllText("C:\\new.txt", text);
}
I think I need to use REGEX, which I have tried to make for my first example:
\s\s[set]\s*{4}.*[:0]\s*[;].* <-- I now know this is invalid - please advise
I need help with properly setting up my program to find and replace those lines. Should I read one line at a time and if it matches then do something? I am confused really as to where to start.
BRIEF pseudo code of what I want to do
//open file
//step through file
//if line == [regex] then add/replace as needed
//else, go to next line
//if EOF, close file
Taking a stab at this separately because each line is so radically different that capturing both in the same expression will be a nightmare.
To match your first example and replace it:
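String input = "SET avariable:0 ;Comments";
// IgnoreCase so that "set" in the pattern also matches "SET"
if (Regex.IsMatch(input, @"\s?(set)\s*(\w+):?(\d)\s+;?(.*)?", RegexOptions.IgnoreCase))
{
    input = Regex.Replace(input, @"\s?(set)\s*(\w+):?(\d)\s+;?(.*)?", "$1 $2:$3 :Integer // $4", RegexOptions.IgnoreCase);
}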
Give that a shot (Play with it here: http://regex101.com/r/zY7hV2)
To match your second example and replace it:
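String input = "String aSTRING(7) ;Comments";
// IgnoreCase so that "string" in the pattern also matches "String"
if (Regex.IsMatch(input, @"\s?(string)\s*(\w+)\((\d)\)\s*;(.*)", RegexOptions.IgnoreCase))
{
    input = Regex.Replace(input, @"\s?(string)\s*(\w+)\((\d)\)\s*;(.*)", "$1 $2($3) :array [0..$3] of AnsiChar; // $4", RegexOptions.IgnoreCase);
}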
And play around with this one here: http://regex101.com/r/jO5wP5
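The pseudo code in the question (open the file, step through it, rewrite matching lines, copy the rest) could be sketched roughly as follows. The patterns here are variations on the ones above, anchored to the whole line and written to keep the original spacing; the helper names ConvertLine and ConvertFile are made up, and the assumption is that every line of interest has exactly the shape shown in the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed line shapes: "SET name:0 ;comment" and "String name(7) ;comment".
// Group 1 keeps the declaration as written; the white space before ';' is
// captured separately so the original spacing is preserved.
var setLine = new Regex(@"^(\s*set\s+\w+:\d+)(\s*);(.*)$", RegexOptions.IgnoreCase);
var stringLine = new Regex(@"^(\s*string\s+\w+\((\d+)\))(\s*);(.*)$", RegexOptions.IgnoreCase);

// Rewrites one line if it matches either shape; otherwise returns it unchanged.
string ConvertLine(string line)
{
    if (setLine.IsMatch(line))
        return setLine.Replace(line, "$1$2:Integer // $3");
    if (stringLine.IsMatch(line))
        return stringLine.Replace(line, "$1$3:array [0..$2] of AnsiChar; // $4");
    return line;
}

// Streams the input file line by line and writes the converted lines.
void ConvertFile(string inputPath, string outputPath)
{
    File.WriteAllLines(outputPath, File.ReadLines(inputPath).Select(line => ConvertLine(line)));
}

Console.WriteLine(ConvertLine("SET avariable:0 ;Comments"));
Console.WriteLine(ConvertLine("String aSTRING(7) ;Comment"));
Console.WriteLine(ConvertLine("; a comment-only line stays as it is"));
```

With the paths from the question, the whole file would be converted with ConvertFile("C:\\old.txt", "C:\\new.txt"). File.ReadLines avoids loading all 12,000 lines into one string first.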

Match Multiline & IgnoreSome

I'm trying to extract some information from a JCL source using regex in C#
Basically, this is a string I can have:
//JOBNAME0 JOB (BLABLABLA),'SOME TEXT',MSGCLASS=YES,ILIKE=POTATOES, GRMBL
// IALSOLIKE=TOMATOES, ANOTHER GARBAGE
// FINALLY=BYE
//OTHER STUFF
So I need to extract the jobname JOBNAME0, the info (BLABLABLA), the description 'SOME TEXT' and the other parms MSGCLASS=YES ILIKE=POTATOES IALSOLIKE=TOMATOES FINALLY=BYE.
I must ignore everything that comes after the space, like GRMBL or ANOTHER GARBAGE.
I must continue to the next line if my last valid char was a , and stop if there was none.
So far, I have successfully managed to get the jobname, the info and the description, which was pretty easy. For the other parms, I'm able to get all the parms and to split them, but I don't know how to get rid of the garbage.
Here is my code:
var regex = "//([^\\s]*) JOB (\\([^)]*\\))?,?(\\'[^']*\\')?,?([^,]*[,|\\s|$])*";
Match match2 = Regex.Match(test5, regex,RegexOptions.Singleline);
string CarteJob2 = match2.Groups[0].Value;
string JobName2 = match2.Groups[1].Value;
string JobInfo2 = match2.Groups[2].Value;
string JobDesc2 = match2.Groups[3].Value;
IEnumerable<string> parms = match2.Groups[4].Captures.OfType<Capture>().Select(x => x.Value);
string JobParms2 = String.Join("|", parms);
Console.WriteLine(CarteJob2 + "|");
Console.WriteLine(JobName2 + "|");
Console.WriteLine(JobInfo2 + "|");
Console.WriteLine(JobDesc2 + "|");
Console.WriteLine(JobParms2 + "|");
The output I get is this one:
//JOBNAME0 JOB (BLABLABLA),'SOME TEXT',MSGCLASS=YES,ILIKE=POTATOES, GRMBL
// IALSOLIKE=TOMATOES, ANOTHER GARBAGE
// FINALLY=BYE
//OTHER |
JOBNAME0|
(BLABLABLA)|
'SOME TEXT'|
MSGCLASS=YES,|ILIKE=POTATOES,| GRMBL
// IALSOLIKE=TOMATOES,| ANOTHER GARBAGE
// FINALLY=BYE
//OTHER |
The output I would like to see is:
//JOBNAME0 JOB (BLABLABLA),'SOME TEXT',MSGCLASS=YES,ILIKE=POTATOES, GRMBL
// IALSOLIKE=TOMATOES, ANOTHER GARBAGE
// FINALLY=BYE|
JOBNAME0|
(BLABLABLA)|
'SOME TEXT'|
MSGCLASS=YES|ILIKE=POTATOES|IALSOLIKE=TOMATOES|FINALLY=BYE|
Is there a way to get what I want?
I think I'd try and do this with two Regex expressions.
The first one to get all the starting information from the beginning of the string - job name, info, description.
The second one to get all the parameters, which all seem to have a simple pattern of <param name>=<param value>.
The first Regex might look like this:
^//(?<job>[\d\w]+)[ ]+JOB[ ]+\((?<info>[\d\w]+)\),'(?<description>[\d\w ]+)'
I don't know if rules permit whitespaces to appear in the job name, info or description - adjust as needed. Also, I'm assuming this is the start of the file using the ^ char. Finally, this Regex has groups already defined, so getting values should be easier in C#.
The second Regex might be something like this:
(?<param>[\w\d]+)=(?<value>[\w\d]+)
Again, grouping is added to help get the parameter names and values.
Hope this helps.
EDIT:
A small tip - you can use the @ sign before a string in C# (a verbatim string literal) so that backslashes don't need escaping, which makes such Regex patterns easier to write. For example:
Regex reg = new Regex(@"(?<param>[\w\d]+)=(?<value>[\w\d]+)");
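Put together, the two expressions from this answer could be applied to the sample from the question roughly like this. It is only a sketch: it relies on the garbage text never containing an = sign, and it does not implement the "continue only after a trailing comma" rule, so parameters on unrelated later lines would also be picked up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string jcl =
    "//JOBNAME0 JOB (BLABLABLA),'SOME TEXT',MSGCLASS=YES,ILIKE=POTATOES, GRMBL\n" +
    "// IALSOLIKE=TOMATOES, ANOTHER GARBAGE\n" +
    "// FINALLY=BYE\n" +
    "//OTHER STUFF";

// First expression: job name, info and description from the start of the text.
Match header = Regex.Match(
    jcl, @"^//(?<job>[\d\w]+)[ ]+JOB[ ]+\((?<info>[\d\w]+)\),'(?<description>[\d\w ]+)'");
string jobName = header.Groups["job"].Value;
string jobInfo = header.Groups["info"].Value;
string jobDesc = header.Groups["description"].Value;

// Second expression: every name=value pair, joined with '|'.
string jobParms = string.Join("|",
    Regex.Matches(jcl, @"(?<param>[\w\d]+)=(?<value>[\w\d]+)")
         .Cast<Match>()
         .Select(m => m.Value));

Console.WriteLine(jobName + "|");
Console.WriteLine(jobInfo + "|");
Console.WriteLine(jobDesc + "|");
Console.WriteLine(jobParms + "|");
```

Note that the named groups return the info and description without their surrounding parentheses and quotes, unlike the output requested in the question.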

Replace Bad words using Regex

I am trying to create a bad word filter method that I can call before every insert and update to check the string for any bad words and replace with "[Censored]".
I have an SQL table which has a list of bad words. I want to bring them back, add them to a List or string array, check through the string of text that has been passed in, and if any bad words are found, replace them and return a filtered string.
I am using C# for this.
Please see this "clbuttic" (or for your case cl[Censored]ic) article before doing a string replace without considering word boundaries:
http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html
Update
Obviously not foolproof (see article above - this approach is so easy to get around or produce false positives...) or optimized (the regular expressions should be cached and compiled), but the following will filter out whole words (no "clbuttics") and simple plurals of words:
const string CensoredText = "[Censored]";
const string PatternTemplate = @"\b({0})(s?)\b";
const RegexOptions Options = RegexOptions.IgnoreCase;
string[] badWords = new[] { "cranberrying", "chuffing", "ass" };
IEnumerable<Regex> badWordMatchers = badWords.
Select(x => new Regex(string.Format(PatternTemplate, x), Options));
string input = "I've had no cranberrying sleep for chuffing chuffings days - " +
    "the next door neighbour is playing classical music at full tilt!";
string output = badWordMatchers.
Aggregate(input, (current, matcher) => matcher.Replace(current, CensoredText));
Console.WriteLine(output);
Gives the output:
I've had no [Censored] sleep for [Censored] [Censored] days - the next door neighbour is playing classical music at full tilt!
Note that "classical" does not become "cl[Censored]ical", as whole words are matched with the regular expression.
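The remark above about caching and compiling could be addressed along these lines: build one alternation from the whole word list and reuse a single compiled Regex. This is a sketch under the same limitations as the code above; escaping each word with Regex.Escape is an added precaution for words that contain regex metacharacters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] badWords = { "cranberrying", "chuffing", "ass" };

// One pattern for all words: \b(word1|word2|...)s?\b
string alternation = string.Join("|", badWords.Select(w => Regex.Escape(w)));
var matcher = new Regex(
    @"\b(" + alternation + @")s?\b",
    RegexOptions.IgnoreCase | RegexOptions.Compiled);

string input = "I've had no cranberrying sleep for chuffing chuffings days - " +
    "the next door neighbour is playing classical music at full tilt!";

// The same matcher instance can be kept and reused for every insert or update.
string output = matcher.Replace(input, "[Censored]");
Console.WriteLine(output);
```

A single pass over the text replaces the per-word loop, which matters more as the word list from the database grows.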
Update 2
And to demonstrate a flavour of how this (and in general basic string\pattern matching techniques) can be easily subverted, see the following string:
"I've had no cranberryıng sleep for chuffıng chuffıngs days - the next door neighbour is playing classical music at full tilt!"
I have replaced the "i"'s with Turkish lower case undotted "ı"'s. Still looks pretty offensive!
Although I'm a big fan of Regex, I think it won't help you here. You should fetch your bad words into a string List or string array and use System.String.Replace on your incoming message.
Maybe better, use System.String.Split and .Join methods:
string mayContainBadWords = "... bla bla ...";
string[] badWords = new string[]{"bad", "worse", "worst"};
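// Split is an instance method; None keeps empty pieces so a bad word at the start or end is still marked
string[] temp = mayContainBadWords.Split(badWords, StringSplitOptions.None);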
string cleanString = string.Join("[Censored]", temp);
In the sample, mayContainBadWords is the string you want to check; badWords is a string array, you load from your bad word sql table and cleanString is your result.
You can use the String.Replace() method or the Regex class.
There is also a nice article about it which can be found here
With a little HTML-parsing skill, you can get a large list of swear words from noswear
