I am writing a method which processes a large number of SQL procedures written by our previous SQL developer.
I am trying to search the files for the following strings: CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION, CREATE TRIGGER.
The search needs to be case-insensitive
and should match any amount of whitespace between the two words, e.g.
CREATE VIEW (one space) or CREATE    VIEW (several spaces).
When it finds a match it needs to replace the CREATE with CREATE OR ALTER.
The script shall ignore occurrences such as CREATE TABLE.
The script shall ignore occurrences such as CREATE OR ALTER PROCEDURE.
I started by writing a procedure to process the files line by line (this is because the text to search is always contained within the line), but I got stuck...
/// <summary>
/// This method processes each individual line, executing the replacement where necessary
/// </summary>
/// <param name="line"></param>
/// <returns></returns>
private static string ProcessLine(string line)
{
// how do I perform the logic here?
return line;
}
/// <summary>
/// This method will process each individual file and create a new file with the _new suffix
/// </summary>
/// <param name="file"></param>
public static void ProcessSqlFile(FileInfo file)
{
StringBuilder sb = new StringBuilder();
var lines = File.ReadAllLines(file.FullName);
for (var i = 0; i < lines.Length; i += 1)
{
sb.Append(ProcessLine(lines[i]));
sb.Append(Environment.NewLine);
}
var outputName = Path.Combine(file.DirectoryName, file.Name +"_new");
File.WriteAllText(outputName, sb.ToString());
}
static void Main(string[] args)
{
var inputPath = new DirectoryInfo(@"...");
var files = inputPath.GetFiles("*.sql");
foreach (var fileInfo in files)
{
ProcessSqlFile(fileInfo);
}
}
You may use Regular Expressions (AKA, Regex) for this. For example, you may use the following pattern:
\bcreate\s+(view|procedure|function|trigger)\b
..and replace with:
CREATE OR ALTER $1
Regex demo.
Regex pattern details:
\b - Ensure a word boundary (avoid matching partial words).
\s+ - Match one or more whitespace characters.
(view|procedure|function|trigger) - Match any of the listed words and capture it in group 1.
\b - Ensure a word boundary.
Replacement:
CREATE OR ALTER - Literal string.
$1 - Whatever was captured in group 1.
Full C# example:
string input = "I am trying to search the files for the following strings " +
"CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION, CREATE TRIGGER";
string output = Regex.Replace(input, @"\bcreate\s+(view|procedure|function|trigger)\b",
@"CREATE OR ALTER $1", RegexOptions.IgnoreCase);
Console.WriteLine(output);
Try it online.
Disclaimer:
As GSerg and Charlieface indicated in the comments, this (and similar solutions) would match false positives in string literals. If you might have those, you'd be better off using an SQL parser as a regex pattern would be overly complicated, in this case, if we wish to cover all edge cases.
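To connect this back to the ProcessLine stub from the question, a minimal sketch could look like the following. It reuses the pattern above; the class name SqlRewriter and the cached Regex field are illustrative choices, not part of the original code, and the string-literal caveat from the disclaimer still applies.

```csharp
using System.Text.RegularExpressions;

public static class SqlRewriter
{
    // Same pattern as in the answer above, built once and reused for every line.
    private static readonly Regex CreateStatement = new Regex(
        @"\bcreate\s+(view|procedure|function|trigger)\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string ProcessLine(string line)
    {
        // "CREATE TABLE ..." and "CREATE OR ALTER ..." do not match the pattern,
        // so such lines are returned unchanged.
        return CreateStatement.Replace(line, "CREATE OR ALTER $1");
    }
}
```

Because the replacement always emits an upper-case CREATE OR ALTER, a line such as "create   view dbo.V" comes back as "CREATE OR ALTER view dbo.V"; the captured keyword keeps its original casing.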
The solution that seems simplest to me does not use regex, just plain text processing.
I ran the following method on a few SQL files and the results looked good to me.
private static string ProcessLine(string line)
{
if (!line.ToUpper().Contains("CREATE"))
{
return line;
}
var wordArray = line.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
for (var i = 0; i < wordArray.Length - 1; i++)
{
if (wordArray[i].ToUpper() != "CREATE" ||
(wordArray[i + 1].ToUpper() != "VIEW" && wordArray[i + 1].ToUpper() != "PROCEDURE" && wordArray[i + 1].ToUpper() != "FUNCTION" && wordArray[i + 1].ToUpper() != "TRIGGER")) continue;
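// Replace the matched word itself, whatever its casing, so that "create view" is rewritten too.
int index = line.IndexOf(wordArray[i], StringComparison.Ordinal);
return line.Substring(0, index) + "CREATE OR ALTER" + line.Substring(index + wordArray[i].Length);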
}
return line;
}
I found this question, which achieves what I am looking for, however I only have one problem: the "start" and "end" of the substring are the same character.
My string is:
.0.label unicode "Area - 110"
and I want to extract the text between the inverted commas ("Area - 110").
In the linked question, the answers are all using specific identifiers, and IndexOf solutions. The problem is that if I do the same, IndexOf will likely return the same value.
Additionally, if I use Split methods, the text I want to keep is not a fixed length - it could be one word, it could be seven; so I am also having issues specifying the indexes of the first and last word in that collection as well.
The problem is that if I do the same, IndexOf will likely return the same value.
A common trick in this situation is to use LastIndexOf to find the location of the closing double-quote:
int start = str.IndexOf('"');
int end = str.LastIndexOf('"');
if (start >= 0 && end > start) {
// We have two separate locations
Console.WriteLine(str.Substring(start+1, end-start-1));
}
Demo.
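If the string may contain more than one quoted section, LastIndexOf would span from the first quote to the very last one. A hedged alternative is the IndexOf overload that takes a start index, so the second search begins just past the opening quote. The helper name FirstQuoted below is made up for illustration.

```csharp
using System;

public static class QuoteExtractor
{
    // Returns the text between the first double quote and the next one after it,
    // or null when the input does not contain a complete quoted section.
    public static string FirstQuoted(string input)
    {
        if (input == null) return null;

        int start = input.IndexOf('"');
        if (start < 0) return null;

        // Start the second search one character past the opening quote,
        // so the same position cannot be returned twice.
        int end = input.IndexOf('"', start + 1);
        if (end < 0) return null;

        return input.Substring(start + 1, end - start - 1);
    }
}
```

For the string in the question this returns Area - 110, the same result as the LastIndexOf version.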
I would do it like this:
string input = ".0.label unicode \"Area - 110\"";
string str = input.Substring(input.IndexOf("\"") + 1);
str = str.Substring(0, str.IndexOf("\""));
In fact, this is one of my most used helper methods/extensions, because it is quite versatile:
/// <summary>
/// Isolates the text in between the parameters, exclusively, using invariant, case-sensitive comparison.
/// Both parameters may be null to skip either step. If specified but not found, a FormatException is thrown.
/// </summary>
public static string Isolate(this string str, string entryString, string exitString)
{
if (!string.IsNullOrEmpty(entryString))
{
int entry = str.IndexOf(entryString, StringComparison.InvariantCulture);
if (entry == -1) throw new FormatException($"String.Isolate failed: \"{entryString}\" not found in string \"{str.Truncate(80)}\".");
str = str.Substring(entry + entryString.Length);
}
if (!string.IsNullOrEmpty(exitString))
{
int exit = str.IndexOf(exitString, StringComparison.InvariantCulture);
if (exit == -1) throw new FormatException($"String.Isolate failed: \"{exitString}\" not found in string \"{str.Truncate(80)}\".");
str = str.Substring(0, exit);
}
return str;
}
You'd use that like this:
string str = ".0.label unicode \"Area - 110\"";
string output = str.Isolate("\"", "\"");
My code should translate a phrase into Pig Latin. The first letter of each word should be moved to the end of the word, followed by "ay",
e.g. wall = "allway".
Any ideas? This is the easiest way I could think of..
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace english_to_pig_latin
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("THIS IS A English to Pig Latin translator");
Console.WriteLine("ENTER Phrase");
string[] phrase = Console.ReadLine().Split(' ');
int words = phrase.Length;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < words; i++)
{
//to add ay in the end
/*sb.Append(phrase[i].ToString());
sb.Append("ay ");
Console.WriteLine(sb);*/
}
Console.ReadLine();
}
}
}
First you need to define your Pig Latin rules; your description lacks the real ones. For instance, English "sharp" is correctly "Pig-Latinized" as 'arpshay', not 'harpsay' as your description implies. (I prefer 'arp-sh-ay', because hyphens make Pig Latin easier to read and make it possible to translate back into English.) I suggest you first find a set of rules for Pig Latin. Your code is a good start: it now separates a phrase into (almost) words. Note, though, that it will turn "Please, Joe" into "Please," and "Joe", and you probably do not want that comma sent to your word-by-word translator.
When defining your rules, I suggest you consider how to Pig-Latinize these words:
hello --> 'ellohay' (a normal word),
string --> 'ingstray' ('str' is the whole consonant string moved to the end),
apple --> 'appleway', 'appleay', or 'appleyay', (depending on your dialect of Pig-Latin),
queen --> 'eenquay' ('qu' is the consonant string here),
yellow --> 'ellowyay' (y is consonant here),
rhythm --> 'ythmrhay' (y is vowel here),
sky --> 'yskay' (y is vowel here).
Note that any word that starts with 'qu' (like 'queen') is a special case that needs to be handled too. Note also that y is probably a consonant when it begins an English word, but a vowel in the middle or at the end of a word.
The hyphenated Pig Latin versions of these words would be:
ello-h-ay, ing-str-ay, ('apple-way', 'apple-ay', or 'apple-yay'), 'een-qu-ay', 'ellow-y-ay', 'ythm-rh-ay', and 'y-sk-ay'. The hyphenation allows both easier reading as well as an ability to reverse the Pig Latin back into English by a computer parser. But unfortunately, many people just cram the Pig Latin word together without showing any hyphenation separation, so reversing the translation cannot be done simply without ambiguity.
Real Pig Latin goes by the sound of the word, not the spelling, so doing it properly would need a very complex word-to-phoneme system, which is far too difficult here. Most (good) Pig Latin translators handle the cases above and ignore other exceptions, because English spelling is a poor guide to pronunciation.
So my first suggestion is to get a set of rules. My second suggestion is to use two functions, PigLatinizePhrase() and PigLatinizeWord(), where PigLatinizePhrase() parses a phrase into words (and punctuation) and calls PigLatinizeWord() for each word, excluding any punctuation. You can loop through each character and test it with char.IsLetter. If it's a letter, add it to a StringBuilder and move to the next character. If it's not a letter and the StringBuilder is not empty, send that word to your word translator, then add the non-letter to your result. That is the logic of PigLatinizePhrase(). Here is my code which does just that:
/// <summary>
/// </summary>
/// <param name="eng">English text, paragraphs, etc.</param>
/// <param name="suffixWithNoOnset">Used to differentiate between Pig Latin dialects.
/// Known dialects may use any of: "ay", "-ay", "way", "-way", "yay", or "-yay".
/// Corresponding translations for 'egg' will yield: "eggay", "egg-ay", "eggway", "egg-way", "eggyay", "egg-yay".
/// Or for 'I': "Iay", "I-ay", "Iway", "I-way", "Iyay", "I-yay".
/// </param>
/// <returns></returns>
public static string PigLatinizePhrase(string eng, string suffixWithNoOnset = "-ay")
{
if (eng == null) { return null; } // don't break if null
var word = new StringBuilder(); // only current word, built char by char
var pig = new StringBuilder(); // pig latin text
char prevChar = '\0';
foreach (char thisChar in eng)
{
// the "'" test is so "I'll", "can't", and "Ashley's" will work right.
if (char.IsLetter(thisChar) || thisChar == '\'')
{
word.Append(thisChar);
}
else
{
if (word.Length > 0)
{
pig.Append(PigLatinizeWord(word.ToString(), suffixWithNoOnset));
word = new StringBuilder();
}
pig.Append(thisChar);
}
prevChar = thisChar;
}
if (word.Length > 0)
{
pig.Append(PigLatinizeWord(word.ToString(), suffixWithNoOnset));
}
return pig.ToString();
} // public static string PigLatinizePhrase(string eng, string suffixWithNoOnset = "-ay")
The suffixWithNoOnset variable is simply passed directly to the PigLatinizeWord() method and it determines exactly which 'dialect' of Pig Latin will be used. (See the XML comment before the method in the source code for more clarity.)
For the PigLatinizeWord() method, I found when actually programming it that it was convenient to split the functionality into two methods: one to parse the English word into the 2 parts that Pig Latin cares about, and another to do what is desired with those 2 parts, depending on which version of Pig Latin is wanted. Here's the source code for these two functions:
/// <summary>
/// </summary>
/// <param name="eng">English word before being translated to Pig Latin.</param>
/// <param name="suffixWithNoOnset">Used to differentiate between Pig Latin dialects.
/// Known dialects may use any of: "ay", "-ay", "way", "-way", "yay", or "-yay".
/// Corresponding translations for 'egg' will yield: "eggay", "egg-ay", "eggway", "egg-way", "eggyay", "egg-yay".
/// Or for 'I': "Iay", "I-ay", "Iway", "I-way", "Iyay", "I-yay".
/// </param>
/// <returns></returns>
public static string PigLatinizeWord(string eng, string suffixWithNoOnset = "-ay")
{
if (eng == null || eng.Length == 0) { return eng; } // don't break if null or empty
string[] onsetAndEnd = GetOnsetAndEndOfWord(eng);
// string h = string.Empty;
string o = onsetAndEnd[0]; // 'Onset' of first syllable that gets moved to end of word
string e = onsetAndEnd[1]; // 'End' of word, without the onset
bool hyphenate = suffixWithNoOnset.Contains('-');
// if (hyphenate) { h = "-"; }
var sb = new StringBuilder();
if (e.Length > 0) { sb.Append(e); if (hyphenate && o.Length > 0) { sb.Append('-'); } }
if (o.Length > 0) { sb.Append(o); if (hyphenate) { sb.Append('-'); } sb.Append("ay"); }
else { sb.Append(suffixWithNoOnset); }
return sb.ToString();
} // public static string PigLatinizeWord(string eng)
public static string[] GetOnsetAndEndOfWord(string word)
{
if (word == null) { return null; }
// string[] r = ",".Split(',');
string uppr = word.ToUpperInvariant();
if (uppr.StartsWith("QU")) { return new string[] { word.Substring(0,2), word.Substring(2) }; }
int x = 0; if (word.Length <= x) { return new string[] { string.Empty, string.Empty }; }
if ("AOEUI".Contains(uppr[x])) // tests first letter/character
{ return new string[] { word.Substring(0, x), word.Substring(x) }; }
while (++x < word.Length)
{
if ("AOEUIY".Contains(uppr[x])) // tests each character after first letter/character
{ return new string[] { word.Substring(0, x), word.Substring(x) }; }
}
return new string[] { string.Empty, word };
} // public static string[] GetOnsetAndEndOfWord(string word)
I have written a PigLatinize() method in JavaScript before, which was a lot of fun for me. :) I enjoyed making my C# version with more features, giving it the ability to translate to 6 various 'dialects' of Pig Latin, especially since C# is my favorite (programming) language. ;)
I think you need this transformation: phrase[i].Substring(1) + phrase[i][0] + "ay"
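Applied to the whole phrase, that transformation might be sketched as below. This only implements the simple "move the first letter" rule from the question, not the fuller rules discussed above; the class and method names are illustrative, and punctuation is not handled.

```csharp
using System;
using System.Linq;

public static class SimplePigLatin
{
    // Moves the first character of each word to the end and appends "ay".
    public static string Translate(string phrase)
    {
        // Splitting with RemoveEmptyEntries avoids empty words when the input has repeated spaces.
        var words = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(w => w.Substring(1) + w[0] + "ay"));
    }
}
```

With this sketch, Translate("wall") gives "allway", matching the example in the question.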
Is there any way to format a string by name rather than position in C#?
In python, I can do something like this example (shamelessly stolen from here):
>>> print '%(language)s has %(#)03d quote types.' % \
{'language': "Python", "#": 2}
Python has 002 quote types.
Is there any way to do this in C#? Say for instance:
String.Format("{some_variable}: {some_other_variable}", ...);
Being able to do this using a variable name would be nice, but a dictionary is acceptable too.
There is no built-in method for handling this.
Here's one method
string myString = "{foo} is {bar} and {yadi} is {yada}".Inject(o);
Here's another
Status.Text = "{UserName} last logged in at {LastLoginDate}".FormatWith(user);
A third improved method partially based on the two above, from Phil Haack
Update: This is now built-in as of C# 6 (released in 2015).
String Interpolation
$"{some_variable}: {some_other_variable}"
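As a small sketch of how this maps onto the Python example from the question: a format specifier can follow the name after a colon, just as in String.Format. The class and parameter names here are illustrative.

```csharp
public static class InterpolationDemo
{
    public static string Describe(string language, int count)
    {
        // The part after the colon is an ordinary format specifier;
        // D3 pads the integer to at least three digits.
        return $"{language} has {count:D3} quote types.";
    }
}
```

Describe("Python", 2) yields "Python has 002 quote types.". Note that the names are resolved by the compiler, so this only helps when the template is written in source code; a template loaded at run time (from a file or database, say) still needs one of the dictionary or reflection based approaches.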
I have an implementation I just posted to my blog here: http://haacked.com/archive/2009/01/04/fun-with-named-formats-string-parsing-and-edge-cases.aspx
It addresses some issues that these other implementations have with brace escaping. The post has details. It does the DataBinder.Eval thing too, but is still very fast.
Interpolated strings were added into C# 6.0 and Visual Basic 14
Both were introduced through new Roslyn compiler in Visual Studio 2015.
C# 6.0:
return $"{someVariable} and also {someOtherVariable}"
(Early previews used return "\{someVariable} and also \{someOtherVariable}" instead; that syntax was dropped before release.)
source: what's new in C#6.0
VB 14:
return $"{someVariable} and also {someOtherVariable}"
source: what's new in VB 14
Noteworthy features (in Visual Studio 2015 IDE):
syntax coloring is supported - variables contained in strings are highlighted
refactoring is supported - when renaming, variables contained in strings get renamed, too
actually not only variable names, but expressions are supported - e.g. not only {index} works, but also {(index + 1).ToString().Trim()}
Enjoy! (& click "Send a Smile" in the VS)
You can also use anonymous types like this:
public string Format(string input, object p)
{
foreach (PropertyDescriptor prop in TypeDescriptor.GetProperties(p))
input = input.Replace("{" + prop.Name + "}", (prop.GetValue(p) ?? "(null)").ToString());
return input;
}
Of course it would require more code if you also want to parse formatting, but you can format a string using this function like:
Format("test {first} and {another}", new { first = "something", another = "something else" })
There doesn't appear to be a way to do this out of the box. Though, it looks feasible to implement your own IFormatProvider that links to an IDictionary for values.
var Stuff = new Dictionary<string, object> {
{ "language", "Python" },
{ "#", 2 }
};
var Formatter = new DictionaryFormatProvider();
// Interpret {0:x} where {0}=IDictionary and "x" is hash key
Console.WriteLine(string.Format(Formatter, "{0:language} has {0:#} quote types", Stuff));
Outputs:
Python has 2 quote types
The caveat is that you can't mix FormatProviders, so the fancy text formatting can't be used at the same time.
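DictionaryFormatProvider is not a framework type, so it has to be written by hand. One possible implementation, offered as a sketch under that assumption, combines IFormatProvider with ICustomFormatter and treats the format string of each item as the dictionary key:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical implementation; the name matches the usage above but the type is not built in.
public sealed class DictionaryFormatProvider : IFormatProvider, ICustomFormatter
{
    public object GetFormat(Type formatType)
    {
        // string.Format asks for an ICustomFormatter; hand back this instance.
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        // For {0:language}, 'format' is "language" and 'arg' is the dictionary.
        var dictionary = arg as IDictionary<string, object>;
        if (dictionary != null && !string.IsNullOrEmpty(format))
        {
            object value;
            if (!dictionary.TryGetValue(format, out value))
                throw new FormatException("No entry named '" + format + "' in the dictionary.");
            return Convert.ToString(value, CultureInfo.CurrentCulture);
        }

        // Anything that is not a dictionary falls back to normal formatting.
        var formattable = arg as IFormattable;
        if (formattable != null)
            return formattable.ToString(format, CultureInfo.CurrentCulture);
        return arg == null ? string.Empty : arg.ToString();
    }
}
```

Since the format-string slot is used up by the key, this sketch shows the caveat directly: there is no room left for a specifier such as C or D3 on a dictionary value.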
The framework itself does not provide a way to do this, but you can take a look at this post by Scott Hanselman. Example usage:
Person p = new Person();
string foo = p.ToString("{Money:C} {LastName}, {ScottName} {BirthDate}");
Assert.AreEqual("$3.43 Hanselman, {ScottName} 1/22/1974 12:00:00 AM", foo);
This code by James Newton-King is similar and works with sub-properties and indexes,
string foo = "Top result for {Name} was {Results[0].Name}".FormatWith(student);
James's code relies on System.Web.UI.DataBinder to parse the string and requires referencing System.Web, which some people don't like to do in non-web applications.
EDIT: Oh and they work nicely with anonymous types, if you don't have an object with properties ready for it:
string name = ...;
DateTime date = ...;
string foo = "{Name} - {Birthday}".FormatWith(new { Name = name, Birthday = date });
See https://stackoverflow.com/questions/271398?page=2#358259
With the linked-to extension you can write this:
var str = "{foo} {bar} {baz}".Format(foo=>"foo", bar=>2, baz=>new object());
and you'll get "foo 2 System.Object".
I think the closest you'll get is an indexed format:
String.Format("{0} has {1} quote types.", "C#", "1");
There's also String.Replace(), if you're willing to do it in multiple steps and take it on faith that you won't find your 'variables' anywhere else in the string:
string MyString = "{language} has {n} quote types.";
MyString = MyString.Replace("{language}", "C#").Replace("{n}", "1");
Expanding this to use a List:
List<KeyValuePair<string, string>> replacements = GetFormatDictionary();
foreach (KeyValuePair<string, string> item in replacements)
{
MyString = MyString.Replace(item.Key, item.Value);
}
You could do that with a Dictionary<string, string> too by iterating its .Keys collection, but by using a List<KeyValuePair<string, string>> we can take advantage of the List's .ForEach() method and condense it back to a one-liner:
replacements.ForEach(delegate(KeyValuePair<string,string> item) { MyString = MyString.Replace(item.Key, item.Value);});
A lambda would be even simpler, but I'm still on .Net 2.0. Also note that the .Replace() performance isn't stellar when used iteratively, since strings in .Net are immutable. Also, this requires the MyString variable be defined in such a way that it's accessible to the delegate, so it's not perfect yet.
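On a newer compiler the delegate becomes a lambda, e.g. replacements.ForEach(item => MyString = MyString.Replace(item.Key, item.Value));. To sidestep both the repeated string allocations and the captured-variable issue, one option is to do the replacements on a StringBuilder inside a helper. The following is a sketch only; NamedReplace and Apply are made-up names.

```csharp
using System.Collections.Generic;
using System.Text;

public static class NamedReplace
{
    // Applies every key/value replacement to the template and returns the result.
    public static string Apply(string template, IEnumerable<KeyValuePair<string, string>> replacements)
    {
        // StringBuilder.Replace edits the buffer in place instead of
        // creating a new string for each replacement.
        var sb = new StringBuilder(template);
        foreach (var item in replacements)
        {
            sb.Replace(item.Key, item.Value);
        }
        return sb.ToString();
    }
}
```

Because Dictionary<string, string> is itself a sequence of KeyValuePair<string, string>, it can be passed directly as well as a List. The original caveat remains: the placeholders must not occur elsewhere in the text or inside replacement values.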
My open source library, Regextra, supports named formatting (amongst other things). It currently targets .NET 4.0+ and is available on NuGet. I also have an introductory blog post about it: Regextra: helping you reduce your (problems){2}.
The named formatting bit supports:
Basic formatting
Nested properties formatting
Dictionary formatting
Escaping of delimiters
Standard/Custom/IFormatProvider string formatting
Example:
var order = new
{
Description = "Widget",
OrderDate = DateTime.Now,
Details = new
{
UnitPrice = 1500
}
};
string template = "We just shipped your order of '{Description}', placed on {OrderDate:d}. Your {{credit}} card will be billed {Details.UnitPrice:C}.";
string result = Template.Format(template, order);
// or use the extension: template.FormatTemplate(order);
Result:
We just shipped your order of 'Widget', placed on 2/28/2014. Your {credit} card will be billed $1,500.00.
Check out the project's GitHub link (above) and wiki for other examples.
private static Regex s_NamedFormatRegex = new Regex(@"\{(?!\{)(?<key>[\w]+)(:(?<fmt>(\{\{|\}\}|[^\{\}])*)?)?\}", RegexOptions.Compiled);
public static StringBuilder AppendNamedFormat(this StringBuilder builder,IFormatProvider provider, string format, IDictionary<string, object> args)
{
if (builder == null) throw new ArgumentNullException("builder");
var str = s_NamedFormatRegex.Replace(format, (mt) => {
string key = mt.Groups["key"].Value;
string fmt = mt.Groups["fmt"].Value;
object value = null;
if (args.TryGetValue(key,out value)) {
return string.Format(provider, "{0:" + fmt + "}", value);
} else {
return mt.Value;
}
});
builder.Append(str);
return builder;
}
public static StringBuilder AppendNamedFormat(this StringBuilder builder, string format, IDictionary<string, object> args)
{
if (builder == null) throw new ArgumentNullException("builder");
return builder.AppendNamedFormat(null, format, args);
}
Example:
var builder = new StringBuilder();
builder.AppendNamedFormat(
@"你好,{Name},今天是{Date:yyyy/MM/dd}, 这是你第{LoginTimes}次登录,积分{Score:{{ 0.00 }}}",
new Dictionary<string, object>() {
{ "Name", "wayjet" },
{ "LoginTimes",18 },
{ "Score", 100.4 },
{ "Date",DateTime.Now }
});
Output:
你好,wayjet,今天是2011-05-04, 这是你第18次登录,积分{ 100.40 }
Check this one:
public static string StringFormat(string format, object source)
{
var matches = Regex.Matches(format, @"\{(.+?)\}");
List<string> keys = (from Match matche in matches select matche.Groups[1].Value).ToList();
return keys.Aggregate(
format,
(current, key) =>
{
int colonIndex = key.IndexOf(':');
return current.Replace(
"{" + key + "}",
colonIndex > 0
? DataBinder.Eval(source, key.Substring(0, colonIndex), "{0:" + key.Substring(colonIndex + 1) + "}")
: DataBinder.Eval(source, key).ToString());
});
}
Sample:
string format = "{foo} is a {bar} is a {baz} is a {qux:#.#} is a really big {fizzle}";
var o = new { foo = 123, bar = true, baz = "this is a test", qux = 123.45, fizzle = DateTime.Now };
Console.WriteLine(StringFormat(format, o));
Performance is pretty ok compared to other solutions.
I doubt this will be possible. The first thing that comes to mind is how are you going to get access to local variable names?
There might be some clever way using LINQ and Lambda expressions to do this however.
Here's one I made a while back. It extends String with a Format method taking a single argument. The nice thing is that it'll use the standard string.Format if you provide a simple argument like an int, but if you use something like anonymous type it'll work too.
Example usage:
"The {Name} family has {Children} children".Format(new { Children = 4, Name = "Smith" })
Would result in "The Smith family has 4 children."
It doesn't do crazy binding stuff like arrays and indexers. But it is super simple and high performance.
public static class AdvancedFormatString
{
/// <summary>
/// An advanced version of string.Format. If you pass a primitive object (string, int, etc), it acts like the regular string.Format. If you pass an anonymous type, you can name the parameters by property name.
/// </summary>
/// <param name="formatString"></param>
/// <param name="arg"></param>
/// <returns></returns>
/// <example>
/// "The {Name} family has {Children} children".Format(new { Children = 4, Name = "Smith" })
///
/// results in
/// "The Smith family has 4 children"
/// </example>
public static string Format(this string formatString, object arg, IFormatProvider format = null)
{
if (arg == null)
return formatString;
var type = arg.GetType();
if (Type.GetTypeCode(type) != TypeCode.Object || type.IsPrimitive)
return string.Format(format, formatString, arg);
var properties = TypeDescriptor.GetProperties(arg);
return formatString.Format((property) =>
{
var value = properties[property].GetValue(arg);
return Convert.ToString(value, format);
});
}
public static string Format(this string formatString, Func<string, string> formatFragmentHandler)
{
if (string.IsNullOrEmpty(formatString))
return formatString;
Fragment[] fragments = GetParsedFragments(formatString);
if (fragments == null || fragments.Length == 0)
return formatString;
return string.Join(string.Empty, fragments.Select(fragment =>
{
if (fragment.Type == FragmentType.Literal)
return fragment.Value;
else
return formatFragmentHandler(fragment.Value);
}).ToArray());
}
private static Fragment[] GetParsedFragments(string formatString)
{
Fragment[] fragments;
if ( parsedStrings.TryGetValue(formatString, out fragments) )
{
return fragments;
}
lock (parsedStringsLock)
{
if ( !parsedStrings.TryGetValue(formatString, out fragments) )
{
fragments = Parse(formatString);
parsedStrings.Add(formatString, fragments);
}
}
return fragments;
}
private static Object parsedStringsLock = new Object();
private static Dictionary<string,Fragment[]> parsedStrings = new Dictionary<string,Fragment[]>(StringComparer.Ordinal);
const char OpeningDelimiter = '{';
const char ClosingDelimiter = '}';
/// <summary>
/// Parses the given format string into a list of fragments.
/// </summary>
/// <param name="format"></param>
/// <returns></returns>
static Fragment[] Parse(string format)
{
int lastCharIndex = format.Length - 1;
int currFragEndIndex;
Fragment currFrag = ParseFragment(format, 0, out currFragEndIndex);
if (currFragEndIndex == lastCharIndex)
{
return new Fragment[] { currFrag };
}
List<Fragment> fragments = new List<Fragment>();
while (true)
{
fragments.Add(currFrag);
if (currFragEndIndex == lastCharIndex)
{
break;
}
currFrag = ParseFragment(format, currFragEndIndex + 1, out currFragEndIndex);
}
return fragments.ToArray();
}
/// <summary>
/// Finds the next delimiter from the starting index.
/// </summary>
static Fragment ParseFragment(string format, int startIndex, out int fragmentEndIndex)
{
bool foundEscapedDelimiter = false;
FragmentType type = FragmentType.Literal;
int numChars = format.Length;
for (int i = startIndex; i < numChars; i++)
{
char currChar = format[i];
bool isOpenBrace = currChar == OpeningDelimiter;
bool isCloseBrace = isOpenBrace ? false : currChar == ClosingDelimiter;
if (!isOpenBrace && !isCloseBrace)
{
continue;
}
else if (i < (numChars - 1) && format[i + 1] == currChar)
{//{{ or }}
i++;
foundEscapedDelimiter = true;
}
else if (isOpenBrace)
{
if (i == startIndex)
{
type = FragmentType.FormatItem;
}
else
{
if (type == FragmentType.FormatItem)
throw new FormatException("Two consecutive unescaped { format item openers were found. Either close the first or escape any literals with another {.");
//curr character is the opening of a new format item. so we close this literal out
string literal = format.Substring(startIndex, i - startIndex);
if (foundEscapedDelimiter)
literal = ReplaceEscapes(literal);
fragmentEndIndex = i - 1;
return new Fragment(FragmentType.Literal, literal);
}
}
else
{//close bracket
if (i == startIndex || type == FragmentType.Literal)
throw new FormatException("A } closing brace existed without an opening { brace.");
string formatItem = format.Substring(startIndex + 1, i - startIndex - 1);
if (foundEscapedDelimiter)
formatItem = ReplaceEscapes(formatItem);//a format item with a { or } in its name is crazy but it could be done
fragmentEndIndex = i;
return new Fragment(FragmentType.FormatItem, formatItem);
}
}
if (type == FragmentType.FormatItem)
throw new FormatException("A format item was opened with { but was never closed.");
fragmentEndIndex = numChars - 1;
string literalValue = format.Substring(startIndex);
if (foundEscapedDelimiter)
literalValue = ReplaceEscapes(literalValue);
return new Fragment(FragmentType.Literal, literalValue);
}
/// <summary>
/// Replaces escaped brackets, turning '{{' and '}}' into '{' and '}', respectively.
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
static string ReplaceEscapes(string value)
{
return value.Replace("{{", "{").Replace("}}", "}");
}
private enum FragmentType
{
Literal,
FormatItem
}
private class Fragment
{
public Fragment(FragmentType type, string value)
{
Type = type;
Value = value;
}
public FragmentType Type
{
get;
private set;
}
/// <summary>
/// The literal value, or the name of the fragment, depending on fragment type.
/// </summary>
public string Value
{
get;
private set;
}
}
}
here is a simple method for any object:
using System.Text.RegularExpressions;
using System.ComponentModel;
public static string StringWithFormat(string format, object args)
{
Regex r = new Regex(@"\{([A-Za-z0-9_]+)\}");
MatchCollection m = r.Matches(format);
var properties = TypeDescriptor.GetProperties(args);
foreach (Match item in m)
{
try
{
string propertyName = item.Groups[1].Value;
format = format.Replace(item.Value, properties[propertyName].GetValue(args).ToString());
}
catch
{
throw new FormatException("The format string is not valid");
}
}
return format;
}
And here how to use it:
DateTime date = DateTime.Now;
string dateString = StringWithFormat("{Month}/{Day}/{Year}", date);
output : 2/27/2012
I implemented this as a simple class that duplicates the functionality of String.Format (except for when using classes). You can either use a dictionary or a type to define fields.
https://github.com/SergueiFedorov/NamedFormatString
C# 6.0 is adding this functionality right into the language spec, so NamedFormatString is for backwards compatibility.
I solved this in a slightly different way to the existing solutions.
It does the core of the named item replacement (not the reflection bit that some have done). It is extremely fast and simple...
This is my solution:
/// <summary>
/// Formats a string with named format items given a template dictionary of the items values to use.
/// </summary>
public class StringTemplateFormatter
{
private readonly IFormatProvider _formatProvider;
/// <summary>
/// Constructs the formatter with the specified <see cref="IFormatProvider"/>.
/// This is defaulted to <see cref="CultureInfo.CurrentCulture">CultureInfo.CurrentCulture</see> if none is provided.
/// </summary>
/// <param name="formatProvider"></param>
public StringTemplateFormatter(IFormatProvider formatProvider = null)
{
_formatProvider = formatProvider ?? CultureInfo.CurrentCulture;
}
/// <summary>
/// Formats a string with named format items given a template dictionary of the items values to use.
/// </summary>
/// <param name="text">The text template</param>
/// <param name="templateValues">The named values to use as replacements in the formatted string.</param>
/// <returns>The resultant text string with the template values replaced.</returns>
public string FormatTemplate(string text, Dictionary<string, object> templateValues)
{
var formattableString = text;
var values = new List<object>();
foreach (KeyValuePair<string, object> value in templateValues)
{
var index = values.Count;
formattableString = ReplaceFormattableItem(formattableString, value.Key, index);
values.Add(value.Value);
}
return String.Format(_formatProvider, formattableString, values.ToArray());
}
/// <summary>
/// Convert named string template item to numbered string template item that can be accepted by <see cref="string.Format(string,object[])">String.Format</see>
/// </summary>
/// <param name="formattableString">The string containing the named format item</param>
/// <param name="itemName">The name of the format item</param>
/// <param name="index">The index to use for the item value</param>
/// <returns>The formattable string with the named item substituted with the numbered format item.</returns>
private static string ReplaceFormattableItem(string formattableString, string itemName, int index)
{
return formattableString
.Replace("{" + itemName + "}", "{" + index + "}")
.Replace("{" + itemName + ",", "{" + index + ",")
.Replace("{" + itemName + ":", "{" + index + ":");
}
}
It is used in the following way:
[Test]
public void FormatTemplate_GivenANamedGuid_FormattedWithB_ShouldFormatCorrectly()
{
// Arrange
var template = "My guid {MyGuid:B} is awesome!";
var templateValues = new Dictionary<string, object> { { "MyGuid", new Guid("{A4D2A7F1-421C-4A1D-9CB2-9C2E70B05E19}") } };
var sut = new StringTemplateFormatter();
// Act
var result = sut.FormatTemplate(template, templateValues);
//Assert
Assert.That(result, Is.EqualTo("My guid {a4d2a7f1-421c-4a1d-9cb2-9c2e70b05e19} is awesome!"));
}
Hope someone finds this useful!
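The three Replace calls above exist so that alignment (`{Name,-6}`) and format specifiers (`{Total:F2}`) survive the rename from a named item to a numbered one. Below is a condensed sketch of that idea; the class name NamedFormatSketch and its Demo method are hypothetical and only there for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class NamedFormatSketch
{
    // Rewrites {Name}, {Name,align} and {Name:format} to their numbered
    // equivalents, then hands the result to string.Format.
    public static string Format(IFormatProvider provider, string template,
                                IDictionary<string, object> values)
    {
        var args = new List<object>();
        foreach (var pair in values)
        {
            string index = args.Count.ToString(CultureInfo.InvariantCulture);
            template = template
                .Replace("{" + pair.Key + "}", "{" + index + "}")
                .Replace("{" + pair.Key + ",", "{" + index + ",")
                .Replace("{" + pair.Key + ":", "{" + index + ":");
            args.Add(pair.Value);
        }
        return string.Format(provider, template, args.ToArray());
    }

    // Hypothetical demo: left-align the name in 6 columns, two decimals for the total.
    public static string Demo()
    {
        var values = new Dictionary<string, object> { { "Name", "Ann" }, { "Total", 3.5 } };
        return Format(CultureInfo.InvariantCulture, "{Name,-6}|{Total:F2}", values);
    }
}
```

With the invariant culture, Demo() returns "Ann   |3.50": the alignment and the F2 specifier are passed through untouched to String.Format.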
Even though the accepted answer gives some good examples, the .Inject as well as some of the Haack examples do not handle escaping. Many also rely heavily on Regex (slower), or DataBinder.Eval which is not available on .NET Core, and in some other environments.
With that in mind, I've written a simple state-machine-based parser that streams through the characters, writing to a StringBuilder output character by character. It is implemented as String extension methods and accepts either a Dictionary<string, object> or an object with parameters as input (using reflection).
It handles unlimited levels of {{{escaping}}} and throws FormatException when input contains unbalanced braces and/or other errors.
public static class StringExtension {
/// <summary>
/// Extension method that replaces keys in a string with the values of matching object properties.
/// </summary>
/// <param name="formatString">The format string, containing keys like {foo} and {foo:SomeFormat}.</param>
/// <param name="injectionObject">The object whose properties should be injected in the string</param>
/// <returns>A version of the formatString string with keys replaced by (formatted) key values.</returns>
public static string FormatWith(this string formatString, object injectionObject) {
return formatString.FormatWith(GetPropertiesDictionary(injectionObject));
}
/// <summary>
/// Extension method that replaces keys in a string with the values of matching dictionary entries.
/// </summary>
/// <param name="formatString">The format string, containing keys like {foo} and {foo:SomeFormat}.</param>
/// <param name="dictionary">An <see cref="IDictionary"/> with keys and values to inject into the string</param>
/// <returns>A version of the formatString string with dictionary keys replaced by (formatted) key values.</returns>
public static string FormatWith(this string formatString, IDictionary<string, object> dictionary) {
char openBraceChar = '{';
char closeBraceChar = '}';
return FormatWith(formatString, dictionary, openBraceChar, closeBraceChar);
}
/// <summary>
/// Extension method that replaces keys in a string with the values of matching dictionary entries.
/// </summary>
/// <param name="formatString">The format string, containing keys like {foo} and {foo:SomeFormat}.</param>
/// <param name="dictionary">An <see cref="IDictionary"/> with keys and values to inject into the string</param>
/// <returns>A version of the formatString string with dictionary keys replaced by (formatted) key values.</returns>
public static string FormatWith(this string formatString, IDictionary<string, object> dictionary, char openBraceChar, char closeBraceChar) {
string result = formatString;
if (dictionary == null || formatString == null)
return result;
// start the state machine!
// ballpark output string as two times the length of the input string for performance (avoids reallocating the buffer as often).
StringBuilder outputString = new StringBuilder(formatString.Length * 2);
StringBuilder currentKey = new StringBuilder();
bool insideBraces = false;
int index = 0;
while (index < formatString.Length) {
if (!insideBraces) {
// currently not inside a pair of braces in the format string
if (formatString[index] == openBraceChar) {
// check if the brace is escaped
if (index < formatString.Length - 1 && formatString[index + 1] == openBraceChar) {
// add a brace to the output string
outputString.Append(openBraceChar);
// skip over braces
index += 2;
continue;
}
else {
// not an escaped brace, set state to inside brace
insideBraces = true;
index++;
continue;
}
}
else if (formatString[index] == closeBraceChar) {
// handle case where closing brace is encountered outside braces
if (index < formatString.Length - 1 && formatString[index + 1] == closeBraceChar) {
// this is an escaped closing brace, this is okay
// add a closing brace to the output string
outputString.Append(closeBraceChar);
// skip over braces
index += 2;
continue;
}
else {
// this is an unescaped closing brace outside of braces.
// throw a format exception
throw new FormatException($"Unmatched closing brace at position {index}");
}
}
else {
// the character has no special meaning, add it to the output string
outputString.Append(formatString[index]);
// move onto next character
index++;
continue;
}
}
else {
// currently inside a pair of braces in the format string
// found an opening brace
if (formatString[index] == openBraceChar) {
// check if the brace is escaped
if (index < formatString.Length - 1 && formatString[index + 1] == openBraceChar) {
// there are escaped braces within the key
// this is illegal, throw a format exception
throw new FormatException($"Illegal escaped opening braces within a parameter - index: {index}");
}
else {
// not an escaped brace, we have an unexpected opening brace within a pair of braces
throw new FormatException($"Unexpected opening brace inside a parameter - index: {index}");
}
}
else if (formatString[index] == closeBraceChar) {
// handle case where closing brace is encountered inside braces
// don't attempt to check for escaped braces here - always assume the first brace closes the braces
// since we cannot have escaped braces within parameters.
// set the state to be outside of any braces
insideBraces = false;
// jump over brace
index++;
// at this stage, a key is stored in current key that represents the text between the two braces
// do a lookup on this key
string key = currentKey.ToString();
// clear the stringbuilder for the key
currentKey.Clear();
object outObject;
if (!dictionary.TryGetValue(key, out outObject)) {
// the key was not found as a possible replacement, throw exception
throw new FormatException($"The parameter \"{key}\" was not present in the lookup dictionary");
}
// we now have the replacement value, add the value to the output string
outputString.Append(outObject);
// jump to next state
continue;
} // if }
else {
// character has no special meaning, add it to the current key
currentKey.Append(formatString[index]);
// move onto next character
index++;
continue;
} // else
} // if inside brace
} // while
// after the loop, if all braces were balanced, we should be outside all braces
// if we're not, the input string was misformatted.
if (insideBraces) {
throw new FormatException("The format string ended before the parameter was closed.");
}
return outputString.ToString();
}
/// <summary>
/// Creates a Dictionary from an object's properties, with the Key being the property's
/// name and the Value being the property's value (of type object)
/// </summary>
/// <param name="properties">An object whose properties will be used</param>
/// <returns>A <see cref="Dictionary"/> of property values </returns>
private static Dictionary<string, object> GetPropertiesDictionary(object properties) {
Dictionary<string, object> values = null;
if (properties != null) {
values = new Dictionary<string, object>();
PropertyDescriptorCollection props = TypeDescriptor.GetProperties(properties);
foreach (PropertyDescriptor prop in props) {
values.Add(prop.Name, prop.GetValue(properties));
}
}
return values;
}
}
Ultimately, all the logic boils down to ten cases: the state machine is either outside or inside a pair of braces, and in each state the next character is an open brace, an escaped open brace, a closed brace, an escaped closed brace, or an ordinary character. Each case is handled individually as the loop progresses, adding characters to either an output StringBuilder or a key StringBuilder. When a parameter is closed, the value of the key StringBuilder is used to look up the parameter's value in the dictionary, which is then appended to the output StringBuilder. At the end, the contents of the output StringBuilder are returned.
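The same states can be folded into a shorter loop. The following is a condensed sketch of the technique, not a replacement for the full method above: the names BraceTemplateSketch and Fill are hypothetical, and it assumes fixed '{' and '}' delimiters and a non-null template and dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BraceTemplateSketch
{
    public static string Fill(string template, IDictionary<string, object> values)
    {
        var output = new StringBuilder(template.Length * 2);
        var key = new StringBuilder();
        bool insideBraces = false;

        for (int i = 0; i < template.Length; i++)
        {
            char c = template[i];
            // True when the next character repeats the current one ("{{" or "}}").
            bool doubled = i + 1 < template.Length && template[i + 1] == c;

            if (!insideBraces)
            {
                if ((c == '{' || c == '}') && doubled)
                {
                    output.Append(c); // escaped brace: emit one and skip its twin
                    i++;
                }
                else if (c == '{') insideBraces = true;
                else if (c == '}') throw new FormatException($"Unmatched closing brace at position {i}");
                else output.Append(c);
            }
            else if (c == '{')
            {
                throw new FormatException($"Unexpected opening brace inside a parameter - index: {i}");
            }
            else if (c == '}')
            {
                // The first closing brace always ends the parameter; look the key up.
                if (!values.TryGetValue(key.ToString(), out object value))
                    throw new FormatException($"The parameter \"{key}\" was not present in the lookup dictionary");
                output.Append(value);
                key.Clear();
                insideBraces = false;
            }
            else key.Append(c);
        }

        if (insideBraces)
            throw new FormatException("The format string ended before the parameter was closed.");
        return output.ToString();
    }
}
```

For example, with name = "x" and n = 2, the template "{{{name}}} = {n}" yields "{x} = 2": the outer doubled braces are emitted literally and the inner pair delimits the key.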
string language = "Python";
int numquotes = 2;
string output = language + " has "+ numquotes + " language types.";
Edit:
What I should have said was, "No, I don't believe what you want to do is supported by C#. This is as close as you are going to get."
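For readers on newer compilers: C# 6 and later offer string interpolation, which allows named expressions inside a literal that is known at compile time. It does not help when the template itself only arrives at run time. A minimal sketch (the class and method names are hypothetical):

```csharp
using System;

public static class InterpolationSketch
{
    // Produces the same text as the concatenation above, using C# 6 interpolation.
    public static string Describe(string language, int numQuotes)
    {
        return $"{language} has {numQuotes} language types.";
    }
}
```

Calling InterpolationSketch.Describe("Python", 2) gives the same string as the concatenation version.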
Is there a built-in mechanism in .NET to match patterns other than Regular Expressions? I'd like to match using UNIX style (glob) wildcards (* = any number of any character).
I'd like to use this for an end-user-facing control. I fear that permitting all RegEx capabilities will be very confusing.
I like my code a little more semantic, so I wrote this extension method:
using System.Text.RegularExpressions;
namespace Whatever
{
public static class StringExtensions
{
/// <summary>
/// Compares the string against a given pattern.
/// </summary>
/// <param name="str">The string.</param>
/// <param name="pattern">The pattern to match, where "*" means any sequence of characters, and "?" means any single character.</param>
/// <returns><c>true</c> if the string matches the given pattern; otherwise <c>false</c>.</returns>
public static bool Like(this string str, string pattern)
{
return new Regex(
"^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$",
RegexOptions.IgnoreCase | RegexOptions.Singleline
).IsMatch(str);
}
}
}
(change the namespace and/or copy the extension method to your own string extensions class)
Using this extension, you can write statements like this:
if (File.Name.Like("*.jpg"))
{
....
}
Just sugar to make your code a little more legible :-)
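To see what the extension does under the hood, it can help to look at the intermediate regex pattern. The sketch below exposes it through a hypothetical helper, ToRegexPattern; the conversion itself is the same Escape-then-Replace chain used above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardSketch
{
    // Escape everything, then turn the escaped wildcards back into regex wildcards.
    public static string ToRegexPattern(string wildcard)
    {
        return "^" + Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
    }

    // Case-insensitive match of the whole input against the wildcard.
    public static bool IsLike(string input, string wildcard)
    {
        return Regex.IsMatch(input, ToRegexPattern(wildcard),
                             RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Here ToRegexPattern("*.jpg") returns ^.*\.jpg$ and ToRegexPattern("photo?.jpg") returns ^photo.\.jpg$. The anchors are what make "photo.jpg.bak" fail against "*.jpg", while "HOLIDAY.JPG" matches because of IgnoreCase.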
Just for the sake of completeness: since 2016 there has been a NuGet package for .NET Core called Microsoft.Extensions.FileSystemGlobbing that supports advanced globbing paths. (NuGet package)
Some examples are searches over wildcard nested folder structures and files, which is very common in web development scenarios.
wwwroot/app/**/*.module.js
wwwroot/app/**/*.js
This works in much the same way as the patterns that .gitignore files use to determine which files to exclude from source control.
I found the actual code for you:
Regex.Escape( wildcardExpression ).Replace( @"\*", ".*" ).Replace( @"\?", "." );
The 2- and 3-argument variants of the listing methods like GetFiles() and EnumerateDirectories() take a search string as their second argument that supports filename globbing, with both * and ?.
class GlobTestMain
{
static void Main(string[] args)
{
string[] exes = Directory.GetFiles(Environment.CurrentDirectory, "*.exe");
foreach (string file in exes)
{
Console.WriteLine(Path.GetFileName(file));
}
}
}
would yield
GlobTest.exe
GlobTest.vshost.exe
The docs state that there are some caveats with matching extensions. They also state that 8.3 file names are matched (which may be generated automatically behind the scenes), which can result in "duplicate" matches for some patterns.
The methods that support this are GetFiles(), GetDirectories(), and GetFileSystemEntries(). The Enumerate variants also support this.
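As a small illustration of the 3-argument form, the sketch below builds a throwaway directory tree under the temp folder and lists the "*.txt" files recursively with Directory.EnumerateFiles. The class name, method name, and file names are made up for the example.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SearchPatternSketch
{
    public static string[] FindTextFiles()
    {
        // Throwaway directory so the example does not depend on existing files.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string sub = Path.Combine(root, "sub");
        Directory.CreateDirectory(sub);
        try
        {
            File.WriteAllText(Path.Combine(root, "a.txt"), "");
            File.WriteAllText(Path.Combine(root, "b.log"), "");
            File.WriteAllText(Path.Combine(sub, "c.txt"), "");

            // The search pattern filters file names; SearchOption controls recursion.
            return Directory
                .EnumerateFiles(root, "*.txt", SearchOption.AllDirectories)
                .Select(path => Path.GetFileName(path))
                .OrderBy(name => name, StringComparer.Ordinal)
                .ToArray();
        }
        finally
        {
            Directory.Delete(root, recursive: true);
        }
    }
}
```

FindTextFiles() returns a.txt and c.txt. Note that the search pattern only applies to the file name, so wildcards in directory segments (such as "**") are not available through these methods.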
If you want to avoid regular expressions this is a basic glob implementation:
public static class Globber
{
public static bool Glob(this string value, string pattern)
{
int pos = 0;
while (pattern.Length != pos)
{
switch (pattern[pos])
{
case '?':
break;
case '*':
for (int i = value.Length; i >= pos; i--)
{
if (Glob(value.Substring(i), pattern.Substring(pos + 1)))
{
return true;
}
}
return false;
default:
if (value.Length == pos || char.ToUpper(pattern[pos]) != char.ToUpper(value[pos]))
{
return false;
}
break;
}
pos++;
}
return value.Length == pos;
}
}
Use it like this:
Assert.IsTrue("text.txt".Glob("*.txt"));
If you use VB.NET, you can use the Like operator, which has glob-like syntax.
http://www.getdotnetcode.com/gdncstore/free/Articles/Intoduction%20to%20the%20VB%20NET%20Like%20Operator.htm
I have written a globbing library for .NETStandard, with tests and benchmarks. My goal was to produce a library for .NET, with minimal dependencies, that doesn't use Regex, and outperforms Regex.
You can find it here:
github.com/dazinator/DotNet.Glob
https://www.nuget.org/packages/DotNet.Glob/
I wrote a FileSelector class that does selection of files based on filenames. It also selects files based on time, size, and attributes. If you just want filename globbing then you express the name in forms like "*.txt" and similar. If you want the other parameters then you specify a boolean logic statement like "name = *.xls and ctime < 2009-01-01" - implying an .xls file created before January 1st 2009. You can also select based on the negative: "name != *.xls" means all files that are not xls.
Check it out.
Open source. Liberal license.
Free to use elsewhere.
Based on previous posts, I threw together a C# class:
using System;
using System.Text.RegularExpressions;
public class FileWildcard
{
Regex mRegex;
public FileWildcard(string wildcard)
{
string pattern = string.Format("^{0}$", Regex.Escape(wildcard)
.Replace(@"\*", ".*").Replace(@"\?", "."));
mRegex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
}
public bool IsMatch(string filenameToCompare)
{
return mRegex.IsMatch(filenameToCompare);
}
}
Using it would go something like this:
FileWildcard w = new FileWildcard("*.txt");
if (w.IsMatch("Doug.Txt"))
Console.WriteLine("We have a match");
The matching is NOT the same as the System.IO.Directory.GetFiles() method, so don't use them together.
From C# you can use .NET's LikeOperator.LikeString method. That's the backing implementation for VB's LIKE operator. It supports patterns using *, ?, #, [charlist], and [!charlist].
You can use the LikeString method from C# by adding a reference to the Microsoft.VisualBasic.dll assembly, which is included with every version of the .NET Framework. Then you invoke the LikeString method just like any other static .NET method:
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
...
bool isMatch = LikeOperator.LikeString("I love .NET!", "I love *", CompareMethod.Text);
// isMatch should be true.
https://www.nuget.org/packages/Glob.cs
https://github.com/mganss/Glob.cs
A GNU Glob for .NET.
You can get rid of the package reference after installing and just compile the single Glob.cs source file.
And since it's an implementation of GNU Glob, it's cross-platform, and cross-language too once you find a similar implementation for another language. Enjoy!
I don't know if the .NET framework has glob matching, but couldn't you replace the * with .*? and use regexes?
Just out of curiosity I glanced at Microsoft.Extensions.FileSystemGlobbing. It was dragging in quite large dependencies on quite a few libraries, so I decided to try writing something similar myself.
Well, easier said than done. I quickly noticed that it is not such a trivial function after all: for example, "*.txt" should match files only in the current directory, while "**.txt" should also harvest subfolders.
Microsoft also tests some odd pattern sequences like "./*.txt". I'm not sure who actually needs the "./" prefix, since it is removed anyway during processing.
(https://github.com/aspnet/FileSystem/blob/dev/test/Microsoft.Extensions.FileSystemGlobbing.Tests/PatternMatchingTests.cs)
Anyway, I've coded my own function. There are two copies of it: one in SVN (which I might bug-fix later on) and a sample copied here for demo purposes. I recommend copying from the SVN link.
SVN Link:
https://sourceforge.net/p/syncproj/code/HEAD/tree/SolutionProjectBuilder.cs#l800
(Search for the matchFiles function if the link does not jump to it.)
And here is a local copy of the function:
/// <summary>
/// Matches files from folder _dir using glob file pattern.
/// In glob pattern matching, * matches any file or folder name and ** matches any path (including sub-folders).
/// ? matches any single character.
///
/// There is also a third-party library for similar matching, 'Microsoft.Extensions.FileSystemGlobbing',
/// but it was dragging in a lot of dependencies, so I decided to do without it.
/// </summary>
/// <returns>List of files matching your selection</returns>
static public String[] matchFiles( String _dir, String filePattern )
{
if (filePattern.IndexOfAny(new char[] { '*', '?' }) == -1) // Speed up matching: with no asterisk / wildcard, the pattern is simply a file path.
{
String path = Path.Combine(_dir, filePattern);
if (File.Exists(path))
return new String[] { filePattern };
return new String[] { };
}
String dir = Path.GetFullPath(_dir); // Make it absolute, so we can extract relative paths later on.
String[] pattParts = filePattern.Replace("/", "\\").Split('\\');
List<String> scanDirs = new List<string>();
scanDirs.Add(dir);
//
// By default glob pattern matching specifies "*" to any file / folder name,
// which corresponds to any character except folder separator - in regex that's "[^\\]*"
// glob matching also allows a double asterisk "**", which also recurses into subfolders.
// We split here each part of match pattern and match it separately.
//
for (int iPatt = 0; iPatt < pattParts.Length; iPatt++)
{
bool bIsLast = iPatt == (pattParts.Length - 1);
bool bRecurse = false;
String regex1 = Regex.Escape(pattParts[iPatt]); // Escape special regex control characters ("*" => "\*", "." => "\.")
String pattern = Regex.Replace(regex1, @"\\\*(\\\*)?", delegate (Match m)
{
if (m.ToString().Length == 4) // "**" => "\*\*" (escaped) - we need to recurse into sub-folders.
{
bRecurse = true;
return ".*";
}
else
return @"[^\\]*";
}).Replace(@"\?", ".");
if (pattParts[iPatt] == "..") // Special kind of control, just to scan upper folder.
{
for (int i = 0; i < scanDirs.Count; i++)
scanDirs[i] = scanDirs[i] + "\\..";
continue;
}
Regex re = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
int nScanItems = scanDirs.Count;
for (int i = 0; i < nScanItems; i++)
{
String[] items;
if (!bIsLast)
items = Directory.GetDirectories(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
else
items = Directory.GetFiles(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
foreach (String path in items)
{
String matchSubPath = path.Substring(scanDirs[i].Length + 1);
if (re.Match(matchSubPath).Success)
scanDirs.Add(path);
}
}
scanDirs.RemoveRange(0, nScanItems); // Remove the items that we have just scanned.
} //for
// Make relative and return.
return scanDirs.Select( x => x.Substring(dir.Length + 1) ).ToArray();
} //matchFiles
If you find any bugs, I'll be glad to fix them.
I wrote a solution that does it. It does not depend on any library and it does not support "!" or "[]" operators. It supports the following search patterns:
C:\Logs\*.txt
C:\Logs\**\*P1?\**\asd*.pdf
/// <summary>
/// Finds files for the given glob path. It supports ** * and ? operators. It does not support !, [] or ![] operators
/// </summary>
/// <param name="path">the path</param>
/// <returns>The files that match the glob</returns>
private ICollection<FileInfo> FindFiles(string path)
{
List<FileInfo> result = new List<FileInfo>();
//The name of the file can be any but the following chars '<','>',':','/','\','|','?','*','"'
const string folderNameCharRegExp = @"[^\<\>:/\\\|\?\*" + "\"]";
const string folderNameRegExp = folderNameCharRegExp + "+";
//We obtain the file pattern
string filePattern = Path.GetFileName(path);
List<string> pathTokens = new List<string>(Path.GetDirectoryName(path).Split('\\', '/'));
//We obtain the root path from where the rest of files will obtained
string rootPath = null;
bool containsWildcardsInDirectories = false;
for (int i = 0; i < pathTokens.Count; i++)
{
if (!pathTokens[i].Contains("*")
&& !pathTokens[i].Contains("?"))
{
if (rootPath != null)
rootPath += "\\" + pathTokens[i];
else
rootPath = pathTokens[i];
pathTokens.RemoveAt(0);
i--;
}
else
{
containsWildcardsInDirectories = true;
break;
}
}
if (Directory.Exists(rootPath))
{
//We build the regular expression that the folders should match
string regularExpression = rootPath.Replace("\\", "\\\\").Replace(":", "\\:").Replace(" ", "\\s");
foreach (string pathToken in pathTokens)
{
if (pathToken == "**")
{
regularExpression += string.Format(CultureInfo.InvariantCulture, @"(\\{0})*", folderNameRegExp);
}
else
{
regularExpression += @"\\" + pathToken.Replace("*", folderNameCharRegExp + "*").Replace(" ", "\\s").Replace("?", folderNameCharRegExp);
}
}
Regex globRegEx = new Regex(regularExpression, RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
string[] directories = Directory.GetDirectories(rootPath, "*", containsWildcardsInDirectories ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
foreach (string directory in directories)
{
if (globRegEx.Matches(directory).Count > 0)
{
DirectoryInfo directoryInfo = new DirectoryInfo(directory);
result.AddRange(directoryInfo.GetFiles(filePattern));
}
}
}
return result;
}
Unfortunately the accepted answer will not handle escaped input correctly, because string .Replace("\*", ".*") fails to distinguish between "*" and "\*" - it will happily replace "*" in both of these strings, leading to incorrect results.
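To make the failure concrete, here is a small sketch of the naive conversion applied to a glob containing an escaped star. It assumes the backslash is meant as the glob's escape character, so that a\*b should match only the literal text a*b; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedGlobSketch
{
    // The Escape-then-Replace conversion discussed above.
    public static string NaiveGlobToRegex(string glob)
    {
        return "^" + Regex.Escape(glob).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
    }

    public static bool NaiveMatch(string input, string glob)
    {
        return Regex.IsMatch(input, NaiveGlobToRegex(glob));
    }
}
```

NaiveGlobToRegex(@"a\*b") produces ^a\\.*b$, that is, a literal backslash followed by "anything". As a result NaiveMatch("a*b", @"a\*b") is false, while NaiveMatch(@"a\xyzb", @"a\*b") is true, which is the opposite of what the escaped glob asks for.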
Instead, a basic tokenizer can be used to convert the glob path into a regex pattern, which can then be matched against a filename using Regex.Match. This is a more robust and flexible solution.
Here is a method to do this. It handles ?, *, and **, and surrounds each of these globs with a capture group, so the values of each glob can be inspected after the Regex has been matched.
static string GlobbedPathToRegex(ReadOnlySpan<char> pattern, ReadOnlySpan<char> dirSeparatorChars)
{
StringBuilder builder = new StringBuilder();
builder.Append('^');
ReadOnlySpan<char> remainder = pattern;
while (remainder.Length > 0)
{
int specialCharIndex = remainder.IndexOfAny('*', '?');
if (specialCharIndex >= 0)
{
ReadOnlySpan<char> segment = remainder.Slice(0, specialCharIndex);
if (segment.Length > 0)
{
string escapedSegment = Regex.Escape(segment.ToString());
builder.Append(escapedSegment);
}
char currentCharacter = remainder[specialCharIndex];
char nextCharacter = specialCharIndex < remainder.Length - 1 ? remainder[specialCharIndex + 1] : '\0';
switch (currentCharacter)
{
case '*':
if (nextCharacter == '*')
{
// We have a ** glob expression
// Match any character, 0 or more times.
builder.Append("(.*)");
// Skip over **
remainder = remainder.Slice(specialCharIndex + 2);
}
else
{
// We have a * glob expression
// Match any character that isn't a dirSeparatorChar, 0 or more times.
if(dirSeparatorChars.Length > 0) {
builder.Append($"([^{Regex.Escape(dirSeparatorChars.ToString())}]*)");
}
else {
builder.Append("(.*)");
}
// Skip over *
remainder = remainder.Slice(specialCharIndex + 1);
}
break;
case '?':
builder.Append("(.)"); // Regex equivalent of ?
// Skip over ?
remainder = remainder.Slice(specialCharIndex + 1);
break;
}
}
else
{
// No more special characters, append the rest of the string
string escapedSegment = Regex.Escape(remainder.ToString());
builder.Append(escapedSegment);
remainder = ReadOnlySpan<char>.Empty;
}
}
builder.Append('$');
return builder.ToString();
}
To use it:
string testGlobPathInput = "/Hello/Test/Blah/**/test*123.fil?";
string globPathRegex = GlobbedPathToRegex(testGlobPathInput, "/"); // Could use "\\/" directory separator chars on Windows
Console.WriteLine($"Globbed path: {testGlobPathInput}");
Console.WriteLine($"Regex conversion: {globPathRegex}");
string testPath = "/Hello/Test/Blah/All/Hail/The/Hypnotoad/test_somestuff_123.file";
Console.WriteLine($"Test Path: {testPath}");
var regexGlobPathMatch = Regex.Match(testPath, globPathRegex);
Console.WriteLine($"Match: {regexGlobPathMatch.Success}");
for(int i = 0; i < regexGlobPathMatch.Groups.Count; i++) {
Console.WriteLine($"Group [{i}]: {regexGlobPathMatch.Groups[i]}");
}
Output:
Globbed path: /Hello/Test/Blah/**/test*123.fil?
Regex conversion: ^/Hello/Test/Blah/(.*)/test([^/]*)123\.fil(.)$
Test Path: /Hello/Test/Blah/All/Hail/The/Hypnotoad/test_somestuff_123.file
Match: True
Group [0]: /Hello/Test/Blah/All/Hail/The/Hypnotoad/test_somestuff_123.file
Group [1]: All/Hail/The/Hypnotoad
Group [2]: _somestuff_
Group [3]: e
I have created a gist here as a canonical version of this method:
https://gist.github.com/crozone/9a10156a37c978e098e43d800c6141ad