How to encode strings for Regular Expression in .NET? - c#

I need to dynamically build a Regex to catch the given keywords, like
string regex = "(some|predefined|words";
foreach (Product product in products)
regex += "|" + product.Name; // Need to encode product.Name because it can include special characters.
regex += ")";
Is there some kind of Regex.Encode that does this?

You can use Regex.Escape. For example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
public class Test
{
static void Main()
{
string[] predefined = { "some", "predefined", "words" };
string[] products = { ".NET", "C#", "C# (2)" };
IEnumerable<string> escapedKeywords =
predefined.Concat(products)
.Select(Regex.Escape);
Regex regex = new Regex("(" + string.Join("|", escapedKeywords) + ")");
Console.WriteLine(regex);
}
}
Output:
(some|predefined|words|\.NET|C\#|C\#\ \(2\))
Or without the LINQ, but using string concatenation in a loop (which I try to avoid) as per your original code:
string regex = "(some|predefined|words";
foreach (Product product in products)
regex += "|" + Regex.Escape(product.Name);
regex += ")";
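To see why the escaping matters, here is a minimal sketch that compares an escaped and an unescaped pattern. The class name KeywordRegexSketch and the sample inputs are made up for illustration; only Regex.Escape and string.Join come from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordRegexSketch
{
    // Builds "(kw1|kw2|...)" with every keyword escaped so it is matched literally.
    public static Regex Build(IEnumerable<string> keywords)
    {
        return new Regex("(" + string.Join("|", keywords.Select(Regex.Escape)) + ")");
    }

    public static void Main()
    {
        Regex regex = Build(new[] { "some", ".NET", "C#" });

        Console.WriteLine(regex.IsMatch("I write C# daily")); // True
        Console.WriteLine(regex.IsMatch("dotNET"));           // False: the dot must be a literal '.'

        // Without escaping, '.' matches any character, so "tNET" is accepted.
        Console.WriteLine(Regex.IsMatch("dotNET", "(.NET)")); // True
    }
}
```

If the keywords should only match whole words, the pattern would need extra boundary handling, which this sketch leaves out.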

Related

Parsing a list of values with option to empty list

I'm trying to parse an array of items using the Sprache library for C#. I have working code that goes like this:
public static Parser<string> Array =
from open in OpenBrackets.Named("open bracket")
from items in Literal.Or(Identifier).Or(Array).DelimitedBy(Comma).Optional()
from close in CloseBrackets.Named("close bracket")
select open + (items.IsDefined ? string.Join(", ", items.Get()) : " ") + close;
where "Literal" is a parser for numbers or strings, "Identifier" is a parser for a variable identifier and "Comma" is a parser for a comma token. But if I want the array to allow being empty "[ ]" I need to add the Optional() property and verify if "items" is defined:
select open + (items.IsDefined ? string.Join(", ", items.Get()) : " ") + close;
Is there a cleaner way to parse a list of items separated by a separator character, where the list may be empty, so that I can reuse it for other lists of items?
Sample of input data structure:
[Literal/Identifier/Array] => Value;
[Value] [,Value]* => Array
[public/private] [identifier]; => Declaration;
[public/private] [identifier] [[=] [Value]] => Initialization;
A little cleaner way can be accomplished by GetOrElse method.
select open + string.Join(", ", items.GetOrElse(new string[0])) + close;
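For the reuse part of the question, one option is to wrap the DelimitedBy/Optional/GetOrElse combination in an extension method. The following is only a sketch: ListOf and NumberArray are hypothetical names, and Parse.Number and Parse.Char stand in for the question's Literal and Comma parsers.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sprache;

public static class ListParsers
{
    // Hypothetical helper: zero or more items separated by `separator`.
    // An absent list yields an empty sequence instead of an undefined option.
    public static Parser<IEnumerable<T>> ListOf<T, U>(this Parser<T> item, Parser<U> separator)
    {
        return item.DelimitedBy(separator)
                   .Optional()
                   .Select(found => found.GetOrElse(Enumerable.Empty<T>()));
    }

    // Example use: a bracketed, possibly empty, list of numbers.
    public static readonly Parser<List<string>> NumberArray =
        from open in Parse.Char('[').Token()
        from items in Parse.Number.Token().ListOf(Parse.Char(',').Token())
        from close in Parse.Char(']').Token()
        select items.ToList();
}
```

With this in place, ListParsers.NumberArray.Parse("[ ]") should give an empty list and Parse("[1, 2, 3]") three items, and the same ListOf call can be applied to any other item and separator parsers.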
Try using Regex, as in the code below:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
StreamReader reader = new StreamReader(FILENAME);
string pattern = @"\[(?'bracketData'[^\]]+)\](?'repeat'[*+])?";
string line = "";
while ((line = reader.ReadLine()) != null)
{
line = line.Trim();
if (line.Length > 0)
{
string suffix = line.Split(new string[] {"=>"}, StringSplitOptions.None).Select(x => x.Trim()).Last();
MatchCollection matches = Regex.Matches(line, pattern);
var brackets = matches.Cast<Match>().Select(x => new { bracket = x.Groups["bracketData"].Value, repeat = x.Groups["repeat"].Value }).ToArray();
Console.WriteLine("Brackets : '{0}'; Suffix : '{1}'", string.Join(",", brackets.Select(x => "(" + x.bracket + ")" + x.repeat )), suffix);
}
}
Console.ReadLine();
}
}
}

How to read a specific line and text from a text file

string lot = "RU644276G01";
var year = "201" + lot.Substring(2, 1);
var folder = @"\\sinsdn38.ap.infineon.com\ArchView\03_Reports\" + year +
@"\" + lot.Substring(3, 2) + @"\" + lot.Substring(0,8) + @"\";
DirectoryInfo di = new DirectoryInfo(folder);
foreach (var fi in di.GetFiles("*.TLT"))
{
var file = fi.FullName;
string line;
using (StreamReader sr = new StreamReader(file))
{
while ((line = sr.ReadLine()) != null)
{
if (line.StartsWith("TEST-END"))
{
timeStampTextBox.Text = line;
}
}
}
}
This is my code currently.
I want to read from a specific line (for example line 8) and the line starts with "Test-End". However, line 8 contains all these
"TEST-END : 2017-01-08 15:51 PROGRAM : TLE8888QK-B2 BAU-NR : 95187193"
but I only want to read "2017-01-08 15:51".
How do I change my code to get that? Currently I'm getting the whole line instead of the specific timestamp that I want.
Edit
How do I change the code so that string lot is not fixed to RU644276G01 but can be any lot number typed in by the user? I have already created a textbox for that input.
You need to extract the timestamp from the line. The line follows a regular pattern, so a regular expression can help:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var line = "TEST-END : 2017-01-08 15:51 PROGRAM : TLE8888QK-B2 BAU-NR : 95187193";
Regex re = new Regex(@"^(?:TEST-END : )(.*?\d{4}-\d{2}-\d{2} \d{2}:\d{2})");
var match = re.Match(line);
Console.WriteLine(match.Groups[1]);
Console.ReadLine(); // leave console open
}
}
Output:
2017-01-08 15:51 // this is group 1, group 0 is the full capture including TEST-END :
Use this to check it in regexr: https://regexr.com/3l1sf (if you hover over the text, it will display your capturing groups).
The regex means:
^ start of the string
(?:TEST-END : ) non capturing group, text must be present
( a group
.*? as few (0-n) anythings as possible
\d{4}-\d{2}-\d{2} \d{2}:\d{2} 4 digits-2 digits-2digits 2digits:2digits
) end of group
More about regular expressions:
RegEx-Class
a regex Tester (one of many, the one I use): https://regexr.com/
Here is my answer using Regular Expressions.
if (line.StartsWith("TEST-END"))
{
Regex re = new Regex(@"\d{4}-\d{2}-\d{2} \d{2}:\d{2}");
var match = re.Match(line);
if (match.Success)
{
timeStampTextBox.Text = match.Value;
}
}
Output: 2017-01-08 15:51
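If the timestamp is needed as a DateTime rather than as text, the regex match can be combined with DateTime.TryParseExact. This is a sketch under the assumption that the line always has the "TEST-END : yyyy-MM-dd HH:mm" layout shown in the question; the class and method names are made up.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TimestampSketch
{
    // Anchored at the line start so only TEST-END lines are accepted.
    private static readonly Regex Stamp =
        new Regex(@"^TEST-END\s*:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2})");

    // Returns the parsed timestamp, or null when the line does not match
    // or the captured digits do not form a valid date.
    public static DateTime? Extract(string line)
    {
        Match match = Stamp.Match(line);
        if (!match.Success)
        {
            return null;
        }

        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            match.Groups[1].Value,
            "yyyy-MM-dd HH:mm",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);
        return ok ? parsed : (DateTime?)null;
    }
}
```

Inside the question's loop this could be used as `var stamp = TimestampSketch.Extract(line);` followed by a null check before writing to timeStampTextBox.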
You can split the line on ":", like this:
var value = line.Split(':');
and build your date like this (Trim removes the spaces left around the pieces):
var date = (value[1] + ":" + value[2].Replace("PROGRAM", "")).Trim();
The statement above means
date = "2017-01-08 15" + ":" + "51"
if (line.StartsWith("TEST-END"))
{
var value = line.Split(':');
var date = (value[1] + ":" + value[2].Replace("PROGRAM", "")).Trim();
timeStampTextBox.Text = date;
}
This is not the most robust answer; it depends on the line having exactly the layout you posted.
I finally got all three parameters out of the last line
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Dictionary<string, string> dict = new Dictionary<string, string>();
string pattern = @"(?'name'[^\s]+)\s:\s(?'value'[\w\s\-]*|\d{4}-\d{2}-\d{2}\s\d{2}:\d{2})";
string line = "TEST-END : 2017-01-08 15:51 PROGRAM : TLE8888QK-B2 BAU-NR : 95187193";
MatchCollection matches = Regex.Matches(line, pattern, RegexOptions.RightToLeft);
foreach (Match match in matches)
{
Console.WriteLine("name : '{0}', value : '{1}'", match.Groups["name"].Value, match.Groups["value"].Value);
dict.Add(match.Groups["name"].Value, match.Groups["value"].Value);
}
DateTime date = DateTime.Parse(dict["TEST-END"]);
Console.ReadLine();
}
}
}
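None of the answers covers the edit about taking the lot number from a textbox. One way to approach it is to move the path construction into a method that receives the lot as a parameter. This is a rough sketch: BuildFolder, the root argument, and the lotTextBox control mentioned below are assumptions; the Substring offsets are copied from the question's code.

```csharp
using System;

public static class LotFolderSketch
{
    // Hypothetical helper mirroring the question's path layout:
    // <root>\<"201" + 3rd char>\<4th and 5th chars>\<first 8 chars>\
    public static string BuildFolder(string root, string lot)
    {
        if (lot == null)
        {
            throw new ArgumentNullException("lot");
        }

        lot = lot.Trim();
        if (lot.Length < 8)
        {
            throw new ArgumentException("Lot number must have at least 8 characters.", "lot");
        }

        string year = "201" + lot.Substring(2, 1);
        return root + @"\" + year + @"\" + lot.Substring(3, 2) + @"\" + lot.Substring(0, 8) + @"\";
    }
}
```

In the form, the call would then look something like `var folder = LotFolderSketch.BuildFolder(reportsRoot, lotTextBox.Text);`, where reportsRoot is the share path without a trailing backslash.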

Converting Wordpress sanitize filename function from PHP to C#

I'm trying to convert the Wordpress sanitize_file_name function from PHP to C# so I can use it to generate unicode slugs for my site's articles in a web app that I built myself.
This is my class:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Web;
namespace MyProject.Helpers
{
public static class Slug
{
public static string SanitizeFileName(string filename)
{
string[] specialChars = { "?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}" };
filename = MyStrReplace(filename, specialChars, "");
filename = Regex.Replace(filename, @"/[\s-]+/", "-");
filename.TrimEnd('-').TrimStart('-');
filename.TrimEnd('.').TrimStart('.');
filename.TrimEnd('_').TrimStart('_');
return filename;
}
private static string MyStrReplace(string strToCheck, string[] strToReplace, string newValue)
{
foreach (string s in strToReplace)
{
strToCheck = strToCheck.Replace(s, newValue);
}
return strToCheck;
}
// source: http://stackoverflow.com/questions/166855/c-sharp-preg-replace
public static string PregReplace(string input, string[] pattern, string[] replacements)
{
if (replacements.Length != pattern.Length)
throw new ArgumentException("Replacement and Pattern Arrays must be balanced");
for (int i = 0; i < pattern.Length; i++)
{
input = Regex.Replace(input, pattern[i], replacements[i]);
}
return input;
}
}
}
I put a title like: "let's say that I have --- in there what to do" but I get the same results only with the single apostrophe trimmed (let's -> lets), nothing else changed.
I want the same equivalent conversion as Wordpress'. Using ASP.NET 4.5 / C#
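In C#, regex options are passed through RegexOptions (or inline groups such as (?i)) rather than as trailing modifiers, so a pattern has no delimiters. The / characters in your pattern are therefore matched literally.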
The solution is simply to remove / symbols from the pattern:
filename = Regex.Replace(filename, @"[\s-]+", "-");
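Note also that strings in C# are immutable, so the TrimEnd/TrimStart calls in the question have no effect unless their result is assigned. Putting both fixes together, a minimal sketch could look like this. It only covers the steps shown in the question and does not claim to reproduce everything WordPress's sanitize_file_name does; SlugSketch is a made-up name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Same characters as the question's specialChars array.
    private const string SpecialChars = "?[]/\\=<>:;,'\"&$#*()|~`!{}";

    public static string SanitizeFileName(string filename)
    {
        // Drop every character listed in SpecialChars.
        filename = new string(filename.Where(c => SpecialChars.IndexOf(c) < 0).ToArray());

        // Collapse runs of whitespace and hyphens into a single hyphen (no / delimiters).
        filename = Regex.Replace(filename, @"[\s-]+", "-");

        // Trim returns a new string, so the result has to be returned or assigned.
        return filename.Trim('.', '-', '_');
    }
}
```

For the title from the question, "let's say that I have --- in there what to do", this gives "lets-say-that-I-have-in-there-what-to-do".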

How to check if a string contains substring with wildcard? like abc*xyz

When I parse lines in text file, I want to check if a line contains abc*xyz, where * is a wildcard. abc*xyz is a user input format.
You can generate Regex and match using it
string searchPattern = "abc*xyz";
string inputText = "SomeTextAndabc*xyz";
public bool Contains(string searchPattern,string inputText)
{
string regexText = WildcardToRegex(searchPattern);
Regex regex = new Regex(regexText , RegexOptions.IgnoreCase);
if (regex.IsMatch(inputText ))
{
return true;
}
return false;
}
public static string WildcardToRegex(string pattern)
{
// No ^ and $ anchors here: Contains should find the pattern anywhere in the input.
return Regex.Escape(pattern)
.Replace(@"\*", ".*")
.Replace(@"\?", ".");
}
Here is the source
and Here is a similar issue
If asterisk is the only wildcard character that you wish to allow, you could replace all asterisks with .*?, and use regular expressions:
var filter = "[quick*jumps*lazy dog]";
var parts = filter.Split('*').Select(s => Regex.Escape(s)).ToArray();
var regex = string.Join(".*?", parts);
This produces \[quick.*?jumps.*?lazy\ dog] regex, suitable for matching inputs.
Demo.
Use Regex
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string prefix = "abc";
string suffix = "xyz";
string pattern = string.Format("{0}.*{1}", prefix, suffix);
string input = "abc123456789xyz";
bool results = Regex.IsMatch(input, pattern);
}
}
}
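Since the wildcard comes from user input, the literal parts should be escaped before they go into the pattern, otherwise characters such as "." or "(" change its meaning. A compact sketch that combines the Split and Regex.Escape ideas from the answers above (WildcardSketch and its method names are made up, and only "*" is treated as a wildcard):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardSketch
{
    // Converts a user pattern where '*' means "any run of characters" into a regex.
    // Everything between the asterisks is escaped and therefore matched literally.
    public static string ToRegex(string wildcard)
    {
        return string.Join(".*", wildcard.Split('*').Select(Regex.Escape));
    }

    // True when the line contains a substring matching the wildcard pattern.
    public static bool ContainsWildcard(string line, string wildcard)
    {
        return Regex.IsMatch(line, ToRegex(wildcard));
    }
}
```

For example, ContainsWildcard("SomeTextAndabc123xyzTail", "abc*xyz") is true, while ContainsWildcard("price is abc", "a.c") is false because the dot is treated as a literal character.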

Writing into .txt file without erasing previous data C#

I am trying to split a string from a .txt file by commas (,) into a string[] and then rewrite every item of the string[] in another format, for example:
"Marko Kostic, Faculty of Technical Sciences, University of Novi Sad,
Trg D. Obradovica 6, 21125 Novi Sad, Serbia"
I want to split this string at the commas between the words, put every value on a separate line like a list, and then wrap each value in tags, so that "Marko Kostic" becomes
<addr-line>Marko Kostic</addr-line>
The problem is that the writer writes only the last value of the string[] and erases the previous values.
Any suggestions?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;
using Microsoft.Office.Interop;
using Microsoft.Office.Interop.Word;
using System.Diagnostics;
using System.Reflection;
using System.Collections;
using System.Runtime.InteropServices;
namespace AffiliationParser
{
class Program
{
static void Main(string[] args)
{
Microsoft.Office.Interop.Word.Application oWord = new Microsoft.Office.Interop.Word.Application();
object missing = System.Reflection.Missing.Value;
object isVisible = false;
using (StreamReader batch = new StreamReader(@"D:\Developing\REF\AffiliationParser\AffiliationParser\AffiliationParser\bin\Debug\Run.bat"))
{
string bat;
while (!batch.EndOfStream)
{
bat = batch.ReadLine();
// do your processing with batch command
if (bat == "pause")
{
continue;
}
string fpath = bat.Substring(bat.IndexOf(" \""));
string path = fpath.Replace("\"", "").Replace(" ","");
string[] name = Directory.GetFiles(path, "*.txt");
string words = name.Min();
string word = words.Substring(words.LastIndexOf("\\")).Replace("\\", "");
Console.WriteLine("Processing........");
Console.WriteLine(word);
string Npath = path + @"\Arr" + word;
if (File.Exists(Npath))
{
System.Windows.Forms.MessageBox.Show("The file Arr" + word + " already exists in " + path);
continue;
}
else
{
File.Copy(words, Npath);
StreamReader temp = new StreamReader(Npath, Encoding.UTF8);
string tempstring = temp.ReadToEnd();
string[] temp3 = tempstring.Split(',');
temp.Close();
foreach (string item in temp3)
{
string Nitem = item.TrimStart().TrimEnd();
//Match MatchCont = Regex.Match(Nitem, @"Afganistan|Albania|Algeria|American\s+Samoa|Andorra|Angola|Anguilla|Antarctica|Antigua\s+and\s+Barbuda|Argentina|Armenia|Aruba|Australia|Austria|Azerbaijan|Bahamas|Bahrain|Bangladesh|Barbados|Belarus|Belgium|Belize|Benin|Bermuda|Bhutan|Bolivia|Bosnia\s+and\s+Herzegovina|Botswana|Bouvet\s+Island|Brazil|British\s+Indian\s+Ocean\s+Territory|Brunei\s+Darussalam|Bulgaria|Burkina\s+Faso|Burundi|Cambodia|Cameroon|Canada|Cape\s+Verde|Cayman\s+Islands|Central\s+African\s+Republic|Chad|Chile|China|Christmas\s+Island|Cocos\s+\(Keeling\)\s+Islands|Colombia|Comoros|Democratic\s+People's\s+Republic\s+of\s+Korea|Democratic\s+Republic\s+of\s+Congo|Cook\s+Islands|Costa\s+Rica|Cote\s+D'Ivoire|Croatia|Cuba|Cyprus|Czech\s+Republic|Republic\s+of\s+Korea|Denmark|Djibouti|Dominica|Dominican\s+Republic|East\s+Timor|Ecuador|Egypt|El\s+Salvador|Equatorial\s+Guinea|Eritrea|Estonia|Ethiopia|Falkland\s+Islands\s+\(Malvinas\)|Faroe\s+Islands|Fiji|Finland|France\s+Metropolitan|France|French\s+Guiana|French\s+Polynesia|French\s+Southern\s+Territories|Gabon|Gambia|Georgia|Germany|Ghana|Gibraltar|Greece|Greenland|Grenadaf|Guadeloupe|Guam|Guatemala|Guinea|Guinea\-Bissau|Guyana|Haiti|Heard\s+Island\s+and\s+McDonald\s+Island|Honduras|Hong\s+Kong|Hungary|Iceland|India|Indonesia|Iran|Iraq|Ireland|Northern\s+Ireland|Isle\s+Of\s+Man|Israel|Italy|Jamaica|Japan|Jordan|Kazakhstan|Kenya|Kiribati|Kuwait|Kyrgyzstan|Lao\s+People'S\s+Democratic\s+Republic|Latvia|Lebanon|Lesotho|Liberia|Libya|Liechtenstein|Lithuania|Luxembourg|Macau|Macedonia|Madagascar|Malawi|Malaysia|Maldives|Mali|Malta|Marshall\s+Islands|Martinique|Mauritania|Mauritius|Mayotte|Mexico|Micronesia|Moldova|Monaco|Mongolia|Montserrat|Morocco|Mozambique|Myanmar|Namibia|Nauru|Nepal|Netherlands\s+Antilles|New\s+Caledonia|New\s+Zealand|Nicaragua|Nigeria|Niger|Niue|Norfolk\s+Island|Northern\s+Mariana\s+Islands|Norway|Oman|Pakistan|Palau|Palestine|Panama|Papua\s+New\s+Guinea|Paraguay|Peru|Philippines|Pitcairn|Poland|Portugal|Puerto\s+Rico|Qatar|Reunion|Romania|Russia|Rwanda|Saint\s+Kitts\s+and\s+Nevis|Saint\s+Lucia|Saint\s+Vincent\s+and\s+The\s+Grenadines|Samoa|San\s+Marino|Sao\s+Tome\s+and\s+Principe|Saudi\s+Arabia|Scotland|Senegal|Serbia|Kosovo|Montenegro|Seychelles|Sierra\s+Leone|Singapore|Slovakia|Slovenia|Solomon\s+Islands|Somalia|South\s+Africa|South\s+Georgia\s+and\s+The\s+South\s+Sandwich\s+Islands|Spain|Sri\s+Lanka|St.\s+Helena|St.\s+Pierre\s+and\s+Miquelon|Sudan|Suriname|Svalbard\s+and\s+Jan\s+Mayen\s+Islands|Swaziland|Sweden|Switzerland|Syria|Taiwan|Tajikistan|Tanzania|Thailand|The\s+Netherlands|Togo|Tokelau|Tonga|Trinidad\s+and\s+Tobago|Tunisia|Turkey|Turkmenistan|Turks\s+and\s+Caicos\s+Islands|Tuvalu|Uganda|Ukraine|United\s+Arab\s+Emirates|UAE|UK|United\s+States\s+Minor\s+Outlying\s+Islands|Uruguay|USA|Uzbekistan|Vanuatu|Vatican\s+City\s+State\s+\(Holy\s+See\)|Venezuela|Vietnam|British\s+Virgin\s+Islands|USA\s+Virgin\s+Islands|Wallis\s+and\s+Futuna\s+Islands|Western\s+Sahara|West\s+Indies|Yemen|Zambia|Zimbabwe|Abkhazia|Afghanistan|Akrotiri\s+and\s+Dhekelia|Aland|Ascension\s+Island|The\s+Bahamas|Brunei|Central\s+Africa|Cocos|Congo|Cote\s+d'lvoire|Czech|Dominican|Falkland\s+Islands|Cambia,\s+The|Grenada|Guemsey|Isle\s+of\s+Man|Jersey|Korea|Laos|Macao|Nagorno\-Karabakh|Netherlands|Northern\s+Cyprus|Pitcaim\s+Islands|Sahrawi\s+Arab\s+Democratic|Saint\-Barthelemy|Saint\s+Helena|Saint\s+Martin|Saint\s+Pierre\s+and\s+Miquelon|Saint\s+Vincent\s+and\s+Grenadines|Samos|Somaliland|South\s+Ossetia|Svalbard|Transnistria|Tristan\s+da\s+Cunha|United\s+Kingdom|Vatican\s+City|Virgin\s+Islands|Wallis\s+and\s+Futuna|España|Witsch|United\s+States|Prague\s+Czech\s+Republic", RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);
//if (MatchCont.Success==true)
//{
// MatchCont.Result(@"<country>" + Nitem + @"</country>");
//}
}
}
}
}
}
}
}
Try to include code in your question; it's not best practice to simply hand out answers. That being said, you'll want to look at the String.Split method, String.Trim and the File.AppendText method.
Simple ways to do this:
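string[] stuff = data.Split(',');
// AppendText adds to the end of the file; the using block flushes and closes the writer.
using (StreamWriter sW = File.AppendText(pathToFile))
{
    foreach (string part in stuff)
    {
        sW.WriteLine(part.Trim());
    }
}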
Very, very basic, and not giving you the answer without some work on your part. Good luck!
Here are some references: File.AppendText and String.Trim
string input="a,b,c,d";
string[] parts = input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
List<string> output=new List<string>();
foreach(string s in parts)
{
// transform each item as needed
var newStr="<abc>"+s+"</abc>";
output.Add(newStr);
}
return output.ToArray();
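Putting the two answers together, a sketch for the question's actual case might look like the following. It assumes the intended closing tag is </addr-line>; AddrLineSketch and its method names are made up, and File.AppendAllLines is used so that existing file content is kept.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class AddrLineSketch
{
    // Splits on commas, trims each part (including line breaks),
    // and wraps every non-empty part in <addr-line> tags.
    public static List<string> ToAddrLines(string text)
    {
        return text.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(part => part.Trim())
                   .Where(part => part.Length > 0)
                   .Select(part => "<addr-line>" + part + "</addr-line>")
                   .ToList();
    }

    // Appends the tagged lines to the file instead of overwriting what is already there.
    public static void AppendTo(string path, string text)
    {
        File.AppendAllLines(path, ToAddrLines(text));
    }
}
```

The key point for the original problem is that all lines are written in one append call (or through one writer opened outside the loop), rather than reopening the file in overwrite mode for each item.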
