Converting the WordPress sanitize_file_name function from PHP to C#

I'm trying to convert WordPress's sanitize_file_name function from PHP to C# so I can use it to generate Unicode slugs for my site's articles in a web app that I built myself.
This is my class:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Web;
namespace MyProject.Helpers
{
public static class Slug
{
public static string SanitizeFileName(string filename)
{
string[] specialChars = { "?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}" };
filename = MyStrReplace(filename, specialChars, "");
filename = Regex.Replace(filename, @"/[\s-]+/", "-");
filename.TrimEnd('-').TrimStart('-');
filename.TrimEnd('.').TrimStart('.');
filename.TrimEnd('_').TrimStart('_');
return filename;
}
private static string MyStrReplace(string strToCheck, string[] strToReplace, string newValue)
{
foreach (string s in strToReplace)
{
strToCheck = strToCheck.Replace(s, newValue);
}
return strToCheck;
}
// source: http://stackoverflow.com/questions/166855/c-sharp-preg-replace
public static string PregReplace(string input, string[] pattern, string[] replacements)
{
if (replacements.Length != pattern.Length)
throw new ArgumentException("Replacement and Pattern Arrays must be balanced");
for (int i = 0; i < pattern.Length; i++)
{
input = Regex.Replace(input, pattern[i], replacements[i]);
}
return input;
}
}
}
I pass a title like "let's say that I have --- in there what to do", but I get the same text back with only the apostrophe removed (let's -> lets); nothing else changes.
I want a conversion equivalent to WordPress's. I'm using ASP.NET 4.5 / C#.

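In C#, regex options are passed as RegexOptions instead of as PHP-style modifiers after a closing delimiter, so .NET patterns have no delimiters; the / characters in your pattern are matched as literal slashes.
The solution is simply to remove the / symbols from the pattern:
filename = Regex.Replace(filename, @"[\s-]+", "-");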
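Note also that TrimEnd and TrimStart return new strings, so the three trimming lines in the question throw their results away. Below is a minimal corrected sketch of the method. It assumes the question's character list (with the stray spaces after <, > and & dropped) and does not try to reproduce everything WordPress's sanitize_file_name does:

```csharp
using System.Text.RegularExpressions;

public static class Slug
{
    // Characters removed outright (assumption: the question's list, not WordPress's full list).
    private static readonly string[] SpecialChars =
    {
        "?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"",
        "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}"
    };

    public static string SanitizeFileName(string filename)
    {
        foreach (string c in SpecialChars)
        {
            // Replace returns a new string, so assign it back.
            filename = filename.Replace(c, "");
        }

        // .NET patterns take no /.../ delimiters: collapse each run of whitespace and hyphens into one hyphen.
        filename = Regex.Replace(filename, @"[\s-]+", "-");

        // Trim also returns a new string; strip leading/trailing dots, hyphens and underscores.
        return filename.Trim('.', '-', '_');
    }
}
```

For the sample title in the question this returns lets-say-that-I-have-in-there-what-to-do.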

Related

Parsing a list of values with option to empty list

I'm trying to parse an array of items using the Sprache library for C#. I have working code that goes like this:
public static Parser<string> Array =
from open in OpenBrackets.Named("open bracket")
from items in Literal.Or(Identifier).Or(Array).DelimitedBy(Comma).Optional()
from close in CloseBrackets.Named("close bracket")
select open + (items.IsDefined ? string.Join(", ", items.Get()) : " ") + close;
where "Literal" is a parser for numbers or strings, "Identifier" is a parser for a variable identifier, and "Comma" is a parser for a comma token. But if I want to allow the array to be empty ("[ ]"), I need to add the Optional() combinator and check whether "items" is defined:
select open + (items.IsDefined ? string.Join(", ", items.Get()) : " ") + close;
Is there a better, cleaner way to parse a list of items separated by a separator character, where the list can be empty? Ideally something I can reuse with other lists of items.
Sample of input data structure:
[Literal/Identifier/Array] => Value;
[Value] [,Value]* => Array
[public/private] [identifier]; => Declaration;
[public/private] [identifier] [[=] [Value]] => Initialization;
A slightly cleaner way is to use the GetOrElse method:
select open + string.Join(", ", items.GetOrElse(new string[0])) + close;
Try using a Regex as in the code below:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
StreamReader reader = new StreamReader(FILENAME);
string pattern = @"\[(?'bracketData'[^\]]+)\](?'repeat'[*+])?";
string line = "";
while ((line = reader.ReadLine()) != null)
{
line = line.Trim();
if (line.Length > 0)
{
string suffix = line.Split(new string[] {"=>"}, StringSplitOptions.None).Select(x => x.Trim()).Last();
MatchCollection matches = Regex.Matches(line, pattern);
var brackets = matches.Cast<Match>().Select(x => new { bracket = x.Groups["bracketData"].Value, repeat = x.Groups["repeat"].Value }).ToArray();
Console.WriteLine("Brackets : '{0}'; Suffix : '{1}'", string.Join(",", brackets.Select(x => "(" + x.bracket + ")" + x.repeat )), suffix);
}
}
Console.ReadLine();
}
}
}
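To try the pattern without a file, the per-line logic can be pulled into a method that takes a string. This is only a sketch; BracketParser and Describe are hypothetical names, and the output format is the same as the console program's:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketParser
{
    // Same pattern as above: the text inside [...] plus an optional * or + repeat marker.
    private static readonly Regex BracketPattern =
        new Regex(@"\[(?'bracketData'[^\]]+)\](?'repeat'[*+])?");

    public static string Describe(string line)
    {
        line = line.Trim();

        // Everything after the last "=>" is the rule name.
        string suffix = line.Split(new[] { "=>" }, StringSplitOptions.None)
                            .Select(x => x.Trim())
                            .Last();

        var brackets = BracketPattern.Matches(line)
            .Cast<Match>()
            .Select(m => "(" + m.Groups["bracketData"].Value + ")" + m.Groups["repeat"].Value);

        return string.Format("Brackets : '{0}'; Suffix : '{1}'", string.Join(",", brackets), suffix);
    }
}
```

For example, Describe("[Value] [,Value]* => Array") returns Brackets : '(Value),(,Value)*'; Suffix : 'Array'.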

String.Replace() will not change the source string of the variable [duplicate]

This question already has answers here:
C# string replace does not actually replace the value in the string [duplicate]
(3 answers)
Closed 6 years ago.
I have the following code in my C# application:
string[] excludechar = { "|", "\\", "\"", "'", "/", "[", " ]", ":", "<", " >", "+", "=", ",", ";", "?", "*", " #" };
var currentgroupname = curItemSiteName;
for (int i = 0; i < excludechar.Length; i++)
{
if (currentgroupname.Contains(excludechar[i]))
currentgroupname.Replace(excludechar[i], "");
}
site.RootWeb.SiteGroups.Add(currentgroupname);
In the code above, the currentgroupname variable that I pass to the .Add method still contains all the special characters I tried to replace in my for loop. Can anyone advise how I can modify my code so that .Replace actually changes the original currentgroupname string?
You are not actually assigning the replaced string to currentgroupname. Strings in .NET are immutable, so Replace returns a new string and leaves the original unchanged:
string[] excludechar = { "|", "\\", "\"", "'", "/", "[", " ]", ":", "<", " >", "+", "=", ",", ";", "?", "*", " #" };
var currentgroupname = curItemSiteName;
for (int i = 0; i < excludechar.Length; i++)
{
if (currentgroupname.Contains(excludechar[i]))
currentgroupname = currentgroupname.Replace(excludechar[i], "");
}
site.RootWeb.SiteGroups.Add(currentgroupname);
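The same loop can be wrapped in a reusable helper. This is a sketch; GroupNameCleaner and Clean are hypothetical names, and the exclusion list is copied from the question as-is. The Contains check can be dropped, because Replace simply returns the string unchanged when the character is not present:

```csharp
public static class GroupNameCleaner
{
    // The question's exclusion list, copied as-is.
    private static readonly string[] ExcludeChars =
        { "|", "\\", "\"", "'", "/", "[", " ]", ":", "<", " >", "+", "=", ",", ";", "?", "*", " #" };

    public static string Clean(string name)
    {
        foreach (string c in ExcludeChars)
        {
            // Replace never modifies name in place; it returns a new string that we assign back.
            name = name.Replace(c, "");
        }
        return name;
    }
}
```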
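Isn't it easier with a regex? \W matches any non-word character, so this removes every character that is not a letter, digit or underscore (spaces included):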
var input = "<>+--!I/have:many#invalid\\|\\characters";
var result = Regex.Replace(input, @"\W", "");
Console.Write(result); //Ihavemanyinvalidcharacters
Don't forget the using directive:
using System.Text.RegularExpressions;

How to check if a string contains substring with wildcard? like abc*xyz

When I parse lines in a text file, I want to check whether a line contains abc*xyz, where * is a wildcard. abc*xyz is a user-supplied pattern.
You can generate a Regex from the pattern and match with it:
string searchPattern = "abc*xyz";
string inputText = "SomeTextAndabc*xyz";
public bool Contains(string searchPattern, string inputText)
{
// WildcardToRegex anchors the pattern with ^ and $, so this tests whether the whole input matches,
// not whether it contains a match; with the sample inputText above it returns false.
string regexText = WildcardToRegex(searchPattern);
Regex regex = new Regex(regexText, RegexOptions.IgnoreCase);
return regex.IsMatch(inputText);
}
public static string WildcardToRegex(string pattern)
{
return "^" + Regex.Escape(pattern)
.Replace(@"\*", ".*")
.Replace(@"\?", ".")
+ "$";
}
Here is the source, and here is a similar issue.
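Because WildcardToRegex adds ^ and $, the Contains method above only returns true when the whole input matches the pattern. For a real substring test, here is a sketch that leaves the anchors out (Wildcard and ToUnanchoredRegex are hypothetical names):

```csharp
using System.Text.RegularExpressions;

public static class Wildcard
{
    // Converts "abc*xyz" into "abc.*xyz": * becomes .*, ? becomes ., everything else is escaped.
    // No ^/$ anchors, so the regex can match anywhere in the input.
    public static string ToUnanchoredRegex(string pattern)
    {
        return Regex.Escape(pattern)
            .Replace(@"\*", ".*")
            .Replace(@"\?", ".");
    }

    public static bool Contains(string searchPattern, string inputText)
    {
        return Regex.IsMatch(inputText, ToUnanchoredRegex(searchPattern), RegexOptions.IgnoreCase);
    }
}
```

With this version, Contains("abc*xyz", "SomeTextAndabc123xyz") returns true.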
If the asterisk is the only wildcard character you want to allow, you could replace all asterisks with .*? and use regular expressions:
var filter = "[quick*jumps*lazy dog]";
var parts = filter.Split('*').Select(s => Regex.Escape(s)).ToArray();
var regex = string.Join(".*?", parts);
This produces the regex \[quick.*?jumps.*?lazy\ dog], which is suitable for matching inputs.
Use a Regex:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string prefix = "abc";
string suffix = "xyz";
string pattern = string.Format("{0}.*{1}", Regex.Escape(prefix), Regex.Escape(suffix));
string input = "abc123456789xyz";
bool results = Regex.IsMatch(input, pattern);
}
}
}

Writing into .txt file without erasing previous data C#

I am trying to split a string in a .txt file by commas (,) into a string[] and then replace every item of the string[] with a different format, for example:
"Marko Kostic, Faculty of Technical Sciences, University of Novi Sad,
Trg D. Obradovica 6, 21125 Novi Sad, Serbia"
I want to split this string at the commas between the words, put every value on a separate line like a list, and then transform every value, so that "Marko Kostic" becomes
<addr-line>Marko Kostic</addr-line>
The problem is that the writer writes only the last value of the string[] and erases the previous values.
Any suggestions?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;
using Microsoft.Office.Interop;
using Microsoft.Office.Interop.Word;
using System.Diagnostics;
using System.Reflection;
using System.Collections;
using System.Runtime.InteropServices;
namespace AffiliationParser
{
class Program
{
static void Main(string[] args)
{
Microsoft.Office.Interop.Word.Application oWord = new Microsoft.Office.Interop.Word.Application();
object missing = System.Reflection.Missing.Value;
object isVisible = false;
using (StreamReader batch = new StreamReader(@"D:\Developing\REF\AffiliationParser\AffiliationParser\AffiliationParser\bin\Debug\Run.bat"))
{
string bat;
while (!batch.EndOfStream)
{
bat = batch.ReadLine();
// do your processing with batch command
if (bat == "pause")
{
continue;
}
string fpath = bat.Substring(bat.IndexOf(" \""));
string path = fpath.Replace("\"", "").Replace(" ","");
string[] name = Directory.GetFiles(path, "*.txt");
string words = name.Min();
string word = words.Substring(words.LastIndexOf("\\")).Replace("\\", "");
Console.WriteLine("Processing........");
Console.WriteLine(word);
string Npath = path + @"\Arr" + word;
if (File.Exists(Npath))
{
System.Windows.Forms.MessageBox.Show("The file Arr" + word + " already exists in " + path);
continue;
}
else
{
File.Copy(words, Npath);
StreamReader temp = new StreamReader(Npath, Encoding.UTF8);
string tempstring = temp.ReadToEnd();
string[] temp3 = tempstring.Split(',');
temp.Close();
foreach (string item in temp3)
{
string Nitem = item.TrimStart().TrimEnd();
//Match MatchCont = Regex.Match(Nitem, @"Afganistan|Albania|Algeria|American\s+Samoa|Andorra|Angola|Anguilla|Antarctica|Antigua\s+and\s+Barbuda|Argentina|Armenia|Aruba|Australia|Austria|Azerbaijan|Bahamas|Bahrain|Bangladesh|Barbados|Belarus|Belgium|Belize|Benin|Bermuda|Bhutan|Bolivia|Bosnia\s+and\s+Herzegovina|Botswana|Bouvet\s+Island|Brazil|British\s+Indian\s+Ocean\s+Territory|Brunei\s+Darussalam|Bulgaria|Burkina\s+Faso|Burundi|Cambodia|Cameroon|Canada|Cape\s+Verde|Cayman\s+Islands|Central\s+African\s+Republic|Chad|Chile|China|Christmas\s+Island|Cocos\s+\(Keeling\)\s+Islands|Colombia|Comoros|Democratic\s+People's\s+Republic\s+of\s+Korea|Democratic\s+Republic\s+of\s+Congo|Cook\s+Islands|Costa\s+Rica|Cote\s+D'Ivoire|Croatia|Cuba|Cyprus|Czech\s+Republic|Republic\s+of\s+Korea|Denmark|Djibouti|Dominica|Dominican\s+Republic|East\s+Timor|Ecuador|Egypt|El\s+Salvador|Equatorial\s+Guinea|Eritrea|Estonia|Ethiopia|Falkland\s+Islands\s+\(Malvinas\)|Faroe\s+Islands|Fiji|Finland|France\s+Metropolitan|France|French\s+Guiana|French\s+Polynesia|French\s+Southern\s+Territories|Gabon|Gambia|Georgia|Germany|Ghana|Gibraltar|Greece|Greenland|Grenadaf|Guadeloupe|Guam|Guatemala|Guinea|Guinea\-Bissau|Guyana|Haiti|Heard\s+Island\s+and\s+McDonald\s+Island|Honduras|Hong\s+Kong|Hungary|Iceland|India|Indonesia|Iran|Iraq|Ireland|Northern\s+Ireland|Isle\s+Of\s+Man|Israel|Italy|Jamaica|Japan|Jordan|Kazakhstan|Kenya|Kiribati|Kuwait|Kyrgyzstan|Lao\s+People'S\s+Democratic\s+Republic|Latvia|Lebanon|Lesotho|Liberia|Libya|Liechtenstein|Lithuania|Luxembourg|Macau|Macedonia|Madagascar|Malawi|Malaysia|Maldives|Mali|Malta|Marshall\s+Islands|Martinique|Mauritania|Mauritius|Mayotte|Mexico|Micronesia|Moldova|Monaco|Mongolia|Montserrat|Morocco|Mozambique|Myanmar|Namibia|Nauru|Nepal|Netherlands\s+Antilles|New\s+Caledonia|New\s+Zealand|Nicaragua|Nigeria|Niger|Niue|Norfolk\s+Island|Northern\s+Mariana\s+Islands|Norway|Oman|Pakistan|Palau|Palestine|Panama|Papua\s+New\s+Guinea|Paraguay|Peru|Philippines|Pitcairn|Poland|Portugal|Puerto\s+Rico|Qatar|Reunion|Romania|Russia|Rwanda|Saint\s+Kitts\s+and\s+Nevis|Saint\s+Lucia|Saint\s+Vincent\s+and\s+The\s+Grenadines|Samoa|San\s+Marino|Sao\s+Tome\s+and\s+Principe|Saudi\s+Arabia|Scotland|Senegal|Serbia|Kosovo|Montenegro|Seychelles|Sierra\s+Leone|Singapore|Slovakia|Slovenia|Solomon\s+Islands|Somalia|South\s+Africa|South\s+Georgia\s+and\s+The\s+South\s+Sandwich\s+Islands|Spain|Sri\s+Lanka|St.\s+Helena|St.\s+Pierre\s+and\s+Miquelon|Sudan|Suriname|Svalbard\s+and\s+Jan\s+Mayen\s+Islands|Swaziland|Sweden|Switzerland|Syria|Taiwan|Tajikistan|Tanzania|Thailand|The\s+Netherlands|Togo|Tokelau|Tonga|Trinidad\s+and\s+Tobago|Tunisia|Turkey|Turkmenistan|Turks\s+and\s+Caicos\s+Islands|Tuvalu|Uganda|Ukraine|United\s+Arab\s+Emirates|UAE|UK|United\s+States\s+Minor\s+Outlying\s+Islands|Uruguay|USA|Uzbekistan|Vanuatu|Vatican\s+City\s+State\s+\(Holy\s+See\)|Venezuela|Vietnam|British\s+Virgin\s+Islands|USA\s+Virgin\s+Islands|Wallis\s+and\s+Futuna\s+Islands|Western\s+Sahara|West\s+Indies|Yemen|Zambia|Zimbabwe|Abkhazia|Afghanistan|Akrotiri\s+and\s+Dhekelia|Aland|Ascension\s+Island|The\s+Bahamas|Brunei|Central\s+Africa|Cocos|Congo|Cote\s+d'lvoire|Czech|Dominican|Falkland\s+Islands|Cambia,\s+The|Grenada|Guemsey|Isle\s+of\s+Man|Jersey|Korea|Laos|Macao|Nagorno\-Karabakh|Netherlands|Northern\s+Cyprus|Pitcaim\s+Islands|Sahrawi\s+Arab\s+Democratic|Saint\-Barthelemy|Saint\s+Helena|Saint\s+Martin|Saint\s+Pierre\s+and\s+Miquelon|Saint\s+Vincent\s+and\s+Grenadines|Samos|Somaliland|South\s+Ossetia|Svalbard|Transnistria|Tristan\s+da\s+Cunha|United\s+Kingdom|Vatican\s+City|Virgin\s+Islands|Wallis\s+and\s+Futuna|España|Witsch|United\s+States|Prague\s+Czech\s+Republic", RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);
//if (MatchCont.Success==true)
//{
// MatchCont.Result(@"<country>" + Nitem + @"<\country>");
//}
}
}
}
}
}
}
}
Try to include code in your question; it's not best practice to simply hand out answers. That being said, you'll want to look at the String.Split method, String.Trim, and the File.AppendText method.
A simple way to do this:
string[] stuff = data.Split(',');
// The using block disposes the writer, which flushes the buffered text to the file.
using (StreamWriter sW = File.AppendText(pathToFile))
{
foreach (string part in stuff)
{
sW.WriteLine(part.Trim());
}
}
Very, very basic, and it doesn't give you the full answer without some work on your part. Good luck!
Here are some references: File.AppendText and String.Trim.
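Putting the pieces together for the affiliation example: the sketch below splits on commas, trims each part, wraps it in <addr-line> tags and appends the lines with File.AppendAllLines, which adds to the end of the file instead of overwriting it. AffiliationWriter, ToAddrLines and Append are hypothetical names:

```csharp
using System;
using System.IO;
using System.Linq;

public static class AffiliationWriter
{
    // Splits a comma-separated affiliation, trims each part, drops blanks and wraps each in <addr-line> tags.
    public static string[] ToAddrLines(string affiliation)
    {
        return affiliation
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(part => part.Trim())
            .Where(part => part.Length > 0)
            .Select(part => "<addr-line>" + part + "</addr-line>")
            .ToArray();
    }

    // File.AppendAllLines creates the file if needed and never truncates existing content.
    public static void Append(string path, string affiliation)
    {
        File.AppendAllLines(path, ToAddrLines(affiliation));
    }
}
```

Calling Append once per input string keeps every earlier line in the file, which is the behaviour the question is after.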
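string input = "a,b,c,d";
// The char[] overload of Split also works on .NET Framework 4.5.
string[] parts = input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
List<string> output = new List<string>();
foreach (string s in parts)
{
// do something you like with each part, e.g. wrap it in a tag
var newStr = "<abc>" + s + "</abc>";
output.Add(newStr);
}
return output.ToArray();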

How to encode strings for Regular Expression in .NET?

I need to dynamically build a Regex to catch the given keywords, like
string regex = "(some|predefined|words";
foreach (Product product in products)
regex += "|" + product.Name; // Need to encode product.Name because it can include special characters.
regex += ")";
Is there some kind of Regex.Encode that does this?
You can use Regex.Escape. For example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
public class Test
{
static void Main()
{
string[] predefined = { "some", "predefined", "words" };
string[] products = { ".NET", "C#", "C# (2)" };
IEnumerable<string> escapedKeywords =
predefined.Concat(products)
.Select(Regex.Escape);
Regex regex = new Regex("(" + string.Join("|", escapedKeywords) + ")");
Console.WriteLine(regex);
}
}
Output:
(some|predefined|words|\.NET|C\#|C\#\ \(2\))
Or, without LINQ and using string concatenation in a loop (which I try to avoid), as in your original code:
string regex = "(some|predefined|words";
foreach (Product product in products)
regex += "|" + Regex.Escape(product.Name);
regex += ")";
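To see why the escaping matters: without Regex.Escape, the keyword .NET would match any character followed by NET. A small sketch (KeywordRegex and Build are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordRegex
{
    // Builds "(kw1|kw2|...)" with every keyword escaped, so '.', '#', '(' and spaces are matched literally.
    public static Regex Build(IEnumerable<string> keywords)
    {
        return new Regex("(" + string.Join("|", keywords.Select(Regex.Escape)) + ")");
    }
}
```

Build(new[] { "some", ".NET", "C# (2)" }) produces (some|\.NET|C\#\ \(2\)), which matches the text "C# (2)" literally but does not match "xNETx".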
