Search list of filenames with wildcards - c#

I have a list of filenames and one DOS-style wildcard as the search parameter.
List<string> filenames = FileNames;
How can I receive only files which match my wildcard ("p*.doc", "w*.*", "??r.doc?", and so on)?
Yes, I know about Directory.GetFiles, but I don't have a directory. Only filenames.

You could build a regex from the DOS-style wildcard specification and then select the filenames in the list that match it, something like:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static string DosWildcardToRegex(string dwc)
{
string s = "";
foreach (char c in dwc)
{
if (c == '.') { s += @"\."; }
else if (c == '*') { s += ".*?"; }
else if (c == '?') { s += "."; }
else s += Regex.Escape(c.ToString());
}
return "^" + s + "$";
}
static void Main(string[] args)
{
string userWildcardSpec = "p*.xls*";
string reWildcard = DosWildcardToRegex(userWildcardSpec);
Regex re = new Regex(reWildcard, RegexOptions.IgnoreCase);
var sampleFilenames = new List<string>() { "a1.xls", "p2.txt", "p1.xls", "p234.xls", "p7.xlsx" };
var matchedFilenames = sampleFilenames.Where(f => re.IsMatch(f));
Console.WriteLine(string.Join("\r\n", matchedFilenames));
Console.ReadLine();
}
}
}
Which outputs:
p1.xls
p234.xls
p7.xlsx
In a regex, a . is a special character which means "any character", so it needs to be escaped as \..
In a DOS wildcard, * means zero-or-more of any character, which is .*? in a regex (the ? makes it "non-greedy").
In a DOS wildcard, ? means one character, which is . in a regex.
Any other special regex characters are escaped where necessary.
The ^ and $ mean the start and end of the string respectively, so it won't get an unwanted match from the middle of a filename.
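The same translation can also be sketched without a character loop: escape the whole wildcard first, then turn the escaped `\*` and `\?` back into their regex equivalents. The names `WildcardFilter`, `ToRegex` and `Filter` are made up for this illustration, and the sketch assumes the simplified semantics described above (`?` is exactly one character), not every quirk of real DOS matching.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardFilter
{
    // Hypothetical helper: escape everything, then re-enable the two wildcards.
    public static Regex ToRegex(string wildcard)
    {
        string body = Regex.Escape(wildcard)
            .Replace(@"\*", ".*")   // wildcard '*' -> any run of characters
            .Replace(@"\?", ".");   // wildcard '?' -> exactly one character
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
    }

    // Returns only the names that match the wildcard as a whole.
    public static List<string> Filter(IEnumerable<string> names, string wildcard)
    {
        Regex re = ToRegex(wildcard);
        return names.Where(n => re.IsMatch(n)).ToList();
    }
}
```

For example, `WildcardFilter.Filter(sampleFilenames, "p*.xls*")` should keep p1.xls, p234.xls and p7.xlsx from the sample list above.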

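The easiest way (it needs a reference to the Microsoft.VisualBasic assembly; note that Like patterns also treat # and [...] specially):
using Microsoft.VisualBasic; // CompareMethod
using Microsoft.VisualBasic.CompilerServices; // LikeOperator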
...
var filenames = new List<string>() { "b.pdf", "al.rtf", "mel.doc"};
var matched = filenames.Where(f => LikeOperator.LikeString(f, "*.d*", CompareMethod.Text));

Related

Split string into multiple alpha and numeric segments

I have a string like "ABCD232ERE44RR". How can I split it into separate segments of letters and numbers? I need:
Segment1: ABCD
Segment2: 232
Segment3: ERE
Segment4: 44
There could be any number of segments. I am thinking of using Regex, but I don't understand how to write it properly.
You can do it like this:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var substrings = Regex.Split("ABCD232ERE44RR", @"[^A-Z0-9]+|(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])");
Console.WriteLine(string.Join(",",substrings));
}
}
Output : ABCD,232,ERE,44,RR
I suggest thinking of this as finding matches to a target pattern rather than splitting into the parts you want. Splitting gives significance to the delimiters whereas matching gives significance to the tokens.
You can use Regex.Matches:
Searches the specified input string for all occurrences of a specified regular expression.
var matches = Regex.Matches("ABCD232ERE44RR", "[A-Z]+|[0-9]+");
foreach (Match match in matches) {
Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);
}
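As a small sketch of the matching approach, the matches can be collected into a list instead of being printed. The class and method names here are invented for the example, and the pattern is widened to lowercase letters, which is an assumption beyond the original input.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Segmenter
{
    // Returns each maximal run of letters or of digits, in order of appearance.
    // Anything that is neither a letter nor a digit is skipped.
    public static List<string> Segments(string input)
    {
        return Regex.Matches(input, "[A-Za-z]+|[0-9]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

`Segmenter.Segments("ABCD232ERE44RR")` should yield ABCD, 232, ERE, 44, RR.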
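Try something like:
([A-Z]+)(\d*)
Each match then holds a run of letters in group 1 and the digits that follow it (possibly none) in group 2.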
If you decide not to use regex, you can always go the manual route.
// Requires: using System.Collections.Generic; using System.Linq; using System.Text;
const string str = "ABCD232ERE44RR1SGGSG3333GSDGSDG";
var result = new List<StringBuilder> { new StringBuilder() };
result.Last().Append(str[0]);
bool isLastNum = Char.IsDigit(str[0]);
for (int i = 1; i < str.Length; i++)
{
    char ch = str[i];
    // Start a new segment whenever the character kind (digit vs. letter) changes.
    if (!((Char.IsDigit(ch) && isLastNum) || (Char.IsLetter(ch) && !isLastNum)))
    {
        result.Add(new StringBuilder());
    }
    result.Last().Append(ch);
    isLastNum = Char.IsDigit(ch);
}

C# - Read between parentheses in RichTextBox

So I want to find out what is in between specific parentheses in a richtextbox. For example:
if (richTextBox1.Text.Contains("testname(")) {
// Find what is in brackets of testname()
String outcome = //what is in brackets of testname()
}
This may be hard to understand, but let's say this is the richtextbox:
testname(name)
Then the string outcome would be name.
You can use a regular expression to get the value between the parentheses.
string text = richTextBox1.Text; // testname("some text")
string value = Regex.Match(text, @"\(([^)]*)\)").Groups[1].Value;
Written from an iPad, so not tested, but something along these lines should get you there:
string prefix = "testname(";
string text = richTextBox1.Text; // e.g. "testname(Val)"
int start = text.IndexOf(prefix);
if (start != -1)
{
    start += prefix.Length;
    int end = text.IndexOf(')', start);
    if (end != -1)
    {
        string outcome = text.Substring(start, end - start); // "Val"
    }
}
string outcome = richTextBox1.Text;
int open = outcome.IndexOf("(");
outcome = outcome.Substring(open + 1, outcome.IndexOf(")") - open - 1);
Regex would be the way to go.
Here I am matching exactly a parenthesis followed by any character one or more times followed by another parenthesis.
You have to escape the parentheses.
string yourText = richTextBox1.Text;
//string cow = "this is some text(this is the text you want)";
Regex regex = new Regex(@"\((.+)\)");
Match match = regex.Match(yourText);
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value); // with cow -> this is the text you want
} // can have else
You may want to consider using * instead of + in case there is nothing between the ( ). Example: Regex regex = new Regex(@"\((.*)\)");
Now it matches any character 0 or more times.
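One thing to keep in mind with `.+` or `.*` between the parentheses is that they are greedy, so with several parenthesised groups on one line they run to the last closing parenthesis. The sketch below compares that with the negated class `[^)]*` used in other answers here; `ParenDemo` and its method names are made up for the illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenDemo
{
    // Greedy: captures from the first '(' up to the LAST ')' on the line.
    public static List<string> Greedy(string text) => Inner(text, @"\((.*)\)");

    // Bounded: captures up to the NEXT ')' because ')' itself is excluded.
    public static List<string> Bounded(string text) => Inner(text, @"\(([^)]*)\)");

    private static List<string> Inner(string text, string pattern) =>
        Regex.Matches(text, pattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();
}
```

For the input `a(b) c(d)`, `Greedy` should return the single value `b) c(d`, while `Bounded` should return `b` and `d`.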
You can achieve this with Linq and Regular Expressions.
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Windows.Forms;
using System.Text.RegularExpressions;
namespace Propress
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void richTextBox1_TextChanged(object sender, EventArgs e)
{
string text = richTextBox1.Text;
ICollection<string> matches =
Regex.Matches(text.Replace(Environment.NewLine, ""), @"\(([^)]*)\)")
.Cast<Match>()
.Select(x => x.Groups[1].Value)
.ToList();
foreach (string match in matches)
MessageBox.Show(match);
}
}
}
string textBox = richTextBox1.Text;
textBox = textBox.Replace("testname(", "").Replace(")", "");
// Now textBox has the string you are looking for (assuming the text is exactly "testname(...)")

C# Pig Latin with Regex Replace

First off: this is a homework problem, just getting that out there. I'm trying to build a Pig Latin translator in C#. We have to use Regex replace, but I'm having some issues. We are not allowed to use the Split method to obtain an array of words; we have to use the static method Replace of type Regex. Whitespace, punctuation, line breaks, etc. should be preserved. Capitalized words should remain so. For those unfamiliar with the rules of Pig Latin:
If the string begins with a vowel, add "way" to the string. (vowels are a,e,i,o,u)
Examples: Pig-Latin for "orange" is "orangeway", Pig-Latin for “eating” is “eatingway”
Otherwise, find the first occurrence of a vowel, move all the characters before the vowel to the end of the word, and add "ay".
(in the middle of the word ‘y’ also counts as a vowel, but NOT at the beginning)
Examples: Pig-Latin for "story" is "orystay" since the characters "st" occur before the first vowel; Pig-Latin for "crystal" is "ystalcray", but Pig-Latin for "yellow" is "ellowyay".
If there are no vowels, add "ay". Examples: Pig-Latin for "mph" is "mphay", Pig-Latin for "RPM" is "RPMay".
I've got a ton of commented out code, so I'll remove that for reading ease.
My test sentence is "Eat monkey poo." I'm getting "Ewayaayt moaynkeayy poayoay."
I know Regex is 'greedy', but I can't figure out how to get it to stop with just the first vowel it finds. Using Textboxes as well.
namespace AssignmentPigLatin
{
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
OriginalTb.Text = "Eat monkey poo.";
}
private void translateButton_Click(object sender, RoutedEventArgs e)
{
string vowels = "[AEIOUaeiou]";
var regex = new Regex(vowels);
var translation = regex.Replace(OriginalTb.Text, TranslateToPigLatin);
PigLatinTb.Text = translation;
}
private void ClearButton_Click(object sender, RoutedEventArgs e)
{
OriginalTb.Text = "";
PigLatinTb.Text = "";
}
static string TranslateToPigLatin(Match match)
{
string word = match.ToString();
string firstLetters = word.Substring(0, match.Length);
string restLetters = word.Substring(firstLetters.Length - 1, word.Length-1);
string newWord;
if (match.Index == 0)
{
return word + "way";
}
else
{
return restLetters + firstLetters + "ay";
}
}
}
}
The question was interesting to answer. Don't forget to attribute me ;)
Add this method to your MainWindow class (in the AssignmentPigLatin namespace):
private string PigLatinTranslator(string s)
{
    // Words starting with a vowel just get "way" appended.
    s = Regex.Replace(s, @"(\b[aeiou]\w+)", "$1way", RegexOptions.IgnoreCase);
    List<string> words = new List<string>();
    foreach (Match v in Regex.Matches(s, @"\w+"))
    {
        string result;
        if (!v.Value.EndsWith("way"))
        {
            // Move the leading consonants behind the rest of the word and add "ay".
            result = Regex.Replace(v.Value, @"([^aeiou]*)([aeiou])(\w+)", "$2$3$1ay", RegexOptions.IgnoreCase);
            words.Add(result);
        }
        else { words.Add(v.Value); }
    }
    s = string.Join(" ", words);
    words.Clear();
    foreach (Match v in Regex.Matches(s, @"\w+"))
    {
        // Words without any vowel only get "ay" appended.
        string result = Regex.Replace(v.Value, @"\b([^aeiou]+)\b", "$1ay", RegexOptions.IgnoreCase);
        words.Add(result);
    }
    s = string.Join(" ", words);
    return s;
}
Call it like this:
string test = "MPH Eat monkey poo."; // Added MPH, so that you can test my method works or not.
string result = PigLatinTranslator(test);
Console.WriteLine(result); // MPHay Eatway onkeymay oopay (the final period is lost because the words are re-joined with spaces)
An easier and clearer solution is to use Regex.Replace with a lambda (this needs using System.Linq for Concat and Contains).
static string TranslateToPigLatin(string input)
{
char[] vowels = new[] { 'A', 'E', 'I', 'O', 'U', 'a', 'e', 'i', 'o', 'u' };
char[] vowelsExtended = vowels.Concat(new[] { 'Y', 'y' }).ToArray();
string output = Regex.Replace(input, @"\w+", m =>
{
string word = m.Value;
if (vowels.Contains(word[0]))
return word + "way";
else
{
int indexOfVowel = word.IndexOfAny(vowelsExtended, 1);
if (indexOfVowel == -1)
return word + "ay";
else
return word.Substring(indexOfVowel) + word.Substring(0, indexOfVowel) + "ay";
}
});
return output;
}
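If the assignment really requires plain static Regex.Replace calls, the rules from the question can be sketched with two replacements and no callback. This is only one possible reading of the rules: the class name `PigLatin` is made up, digits and underscores are treated like consonants, and a leading capital simply travels with its letter.

```csharp
using System.Text.RegularExpressions;

public static class PigLatin
{
    public static string Translate(string text)
    {
        // Rule 1: a word that starts with a vowel just gets "way" appended.
        // This runs first so that rule 2 never sees a vowel-initial word.
        text = Regex.Replace(text, @"\b([aeiou]\w*)", "$1way", RegexOptions.IgnoreCase);

        // Rules 2 and 3: group 1 is the first letter plus everything up to the
        // first vowel (y counts as a vowel after the first letter); group 2 is
        // the rest, which is empty when the word has no vowel at all.
        return Regex.Replace(text, @"\b([^aeiou\W][^aeiouy\W]*)(\w*)", "$2$1ay",
                             RegexOptions.IgnoreCase);
    }
}
```

Because only the word characters are rewritten, whitespace and punctuation stay where they were: `PigLatin.Translate("Eat monkey poo.")` should give `Eatway onkeymay oopay.`.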

C# Split string into array based on prior character

I need to take a string and split it into an array wherever the type of a character does not match the one preceding it.
So if you have "asd fds 1.4#3" this would split into array as follows
stringArray[0] = "asd";
stringArray[1] = " ";
stringArray[2] = "fds";
stringArray[3] = " ";
stringArray[4] = "1";
stringArray[5] = ".";
stringArray[6] = "4";
stringArray[7] = "#";
stringArray[8] = "3";
Any recommendations on the best way to achieve this? Of course I could create a loop based on .ToCharArray(), but I was looking for a better way.
Thank you
Using a combination of regular expressions and LINQ, you can do the following.
using System.Text.RegularExpressions;
using System.Linq;
var str="asd fds 1.4#3";
var regex=new Regex("([A-Za-z]+)|([0-9]+)|([.#]+)|(.+?)");
var result=regex.Matches(str).OfType<Match>().Select(x=>x.Value).ToArray();
Add additional capture groups to capture other differences. The last capture (.+?) is a non-greedy "everything else", so every character it captures is treated as a separate item (even when the same character occurs twice in a row).
Update - new revision of regex
var regex=new Regex(@"(?:[A-Za-z]+)|(?:[0-9]+)|(?:[#.]+)|(?:(?:(.)\1*)+?)");
This now uses non-capturing groups so that \1 can be used in the final capture. This means that repeats of the same character are grouped together if they fall into the catch-all group.
e.g. before, the string "asd  fsd" would create 4 strings (each space would be considered different); now the result is 3 strings, as the 2 adjacent spaces are combined.
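A variation on the same idea, as a hedged sketch: use Unicode categories for the letter and digit runs and let a backreference group repeats of any other single character. `RunSplitter` is a hypothetical name, and treating `.` and `#` as separate items (rather than one punctuation run) is an assumption taken from the expected output in the question.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RunSplitter
{
    // Letters group together, digits group together, and any other
    // character groups only with immediate repeats of itself.
    public static string[] Split(string text)
    {
        return Regex.Matches(text, @"\p{L}+|\p{Nd}+|(.)\1*", RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

`RunSplitter.Split("asd fds 1.4#3")` should produce the nine items listed in the question, and two adjacent spaces come back as one two-character item.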
Use regex:
var mc = Regex.Matches("asd fds 1.4#3", @"([a-zA-Z]+)|.");
var res = new string[mc.Count];
for (var i = 0; i < mc.Count; i++)
{
res[i] = mc[i].Value;
}
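This program produces exactly the output you want, but I am not sure whether it's generic enough for your goal.
// Requires: using System.Collections.Generic; using System.Linq; using System.Text;
class Program
{
    private static void Main(string[] args)
    {
        var parts = Split("asd fds 1.4#3").ToArray();
    }
    public static IEnumerable<string> Split(string text)
    {
        StringBuilder result = new StringBuilder();
        foreach (var ch in text)
        {
            if (char.IsLetter(ch))
            {
                result.Append(ch);
                continue;
            }
            // A non-letter ends the current run of letters (if any) and is returned on its own.
            if (result.Length > 0)
            {
                yield return result.ToString();
                result.Clear();
            }
            yield return ch.ToString();
        }
        // Flush a trailing run of letters.
        if (result.Length > 0)
            yield return result.ToString();
    }
}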

Find substring ignoring specified characters

Do any of you know of an easy/clean way to find a substring within a string while ignoring some specified characters to find it. I think an example would explain things better:
string: "Hello, -this- is a string"
substring to find: "Hello this"
chars to ignore: "," and "-"
found the substring, result: "Hello, -this"
Using Regex is not a requirement for me, but I added the tag because it feels related.
Update:
To make the requirement clearer: I need the resulting substring with the ignored chars, not just an indication that the given substring exists.
Update 2:
Some of you are reading too much into the example, sorry. I'll give another scenario that should work:
string: "?A&3/3/C)412&"
substring to find: "A41"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)41"
And as a bonus (not required per se), it would be great if the substring to find were also allowed to contain the ignored chars, e.g. given the last example we should be able to do:
substring to find: "A3C412&"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)412&"
Sorry if I wasn't clear before, or still I'm not :).
Update 3:
Thanks to everyone who helped!, this is the implementation I'm working with for now:
http://www.pastebin.com/pYHbb43Z
And here are some tests:
http://www.pastebin.com/qh01GSx2
I'm using some custom extension methods that I'm not including, but I believe they should be self-explanatory (I will add them if you like).
I've taken a lot of your ideas for the implementation and the tests, but I'm giving the answer to @PierrOz because he was one of the first and pointed me in the right direction.
Feel free to keep giving suggestions as alternative solutions or comments on the current state of the impl. if you like.
In your example you would do:
string input = "Hello, -this-, is a string";
string ignore = "[-,]*";
Regex r = new Regex(string.Format("H{0}e{0}l{0}l{0}o{0} {0}t{0}h{0}i{0}s{0}", ignore));
Match m = r.Match(input);
return m.Success ? m.Value : string.Empty;
Dynamically, you would build the [-,] part from all the characters to ignore and insert it between all the characters of your query.
Take care with '-' inside the [] class: put it at the beginning or at the end.
So more generically, it would give something like:
public string Test(string query, string input, char[] ignoreList)
{
    // Build a class such as "[,-]*" (assumes ignoreList is not empty).
    // '-' goes last so that it is not read as a range.
    string ignorePattern = "[";
    bool hasDash = false;
    foreach (char c in ignoreList)
    {
        if (c == '-')
            hasDash = true;
        else if (c == ']' || c == '\\' || c == '^')
            ignorePattern += "\\" + c; // these are special inside a class
        else
            ignorePattern += c;
    }
    ignorePattern += (hasDash ? "-" : "") + "]*";

    // Interleave the ignore class with every (escaped) character of the query.
    string pattern = "";
    for (int i = 0; i < query.Length; i++)
    {
        pattern += Regex.Escape(query[i].ToString()) + ignorePattern;
    }
    Regex r = new Regex(pattern);
    Match m = r.Match(input);
    return m.Success ? m.Value : string.Empty;
}
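For the bonus case in the question (the query itself may contain ignored characters), one possible sketch is to drop the ignored characters from the query first and then place the "skip" pattern only between the remaining characters. `IgnoringSearch.Find` is a hypothetical helper; it uses an alternation of escaped characters instead of a character class, purely to sidestep the `-` and `]` pitfalls mentioned above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class IgnoringSearch
{
    // Returns the matched span of input, ignored characters included,
    // or null when the query cannot be found.
    public static string Find(string input, string query, char[] ignore)
    {
        // Remove ignored characters from the query so it is allowed to contain them.
        string[] wanted = query.Where(c => !ignore.Contains(c))
                               .Select(c => Regex.Escape(c.ToString()))
                               .ToArray();
        if (wanted.Length == 0) return null;

        // Zero or more ignored characters, written as (?:a|b|c)*.
        string skip = ignore.Length == 0
            ? ""
            : "(?:" + string.Join("|", ignore.Select(c => Regex.Escape(c.ToString()))) + ")*";

        // The skip pattern goes BETWEEN wanted characters only.
        Match m = Regex.Match(input, string.Join(skip, wanted));
        return m.Success ? m.Value : null;
    }
}
```

With the question's data this should return `Hello, -this` and `A&3/3/C)41`. For the bonus query `A3C412&` it returns `A&3/3/C)412`, without the trailing `&`, because the skip pattern is never appended after the last wanted character; that is a limitation of this sketch.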
Here's a non-regex string extension option:
public static class StringExtensions
{
public static bool SubstringSearch(this string s, string value, char[] ignoreChars, out string result)
{
if (String.IsNullOrEmpty(value))
throw new ArgumentException("Search value cannot be null or empty.", "value");
bool found = false;
int matches = 0;
int startIndex = -1;
int length = 0;
for (int i = 0; i < s.Length && !found; i++)
{
if (startIndex == -1)
{
if (s[i] == value[0])
{
startIndex = i;
++matches;
++length;
}
}
else
{
if (s[i] == value[matches])
{
++matches;
++length;
}
else if (ignoreChars != null && ignoreChars.Contains(s[i]))
{
++length;
}
else
{
// Mismatch: backtrack and retry from the character after the failed start.
i = startIndex;
startIndex = -1;
matches = 0;
length = 0;
}
}
found = (matches == value.Length);
}
if (found)
{
result = s.Substring(startIndex, length);
}
else
{
result = null;
}
return found;
}
}
EDIT: here's an updated solution addressing the points in your recent update. The idea is the same, except that if you have a single substring, the ignore pattern is inserted between each of its characters. If the substring contains spaces, it is split on the spaces and the ignore pattern is inserted between the words. If you don't need the latter functionality (which was more in line with your original question), you can remove the Split and the if check that builds that pattern.
Note that this approach is not going to be the most efficient.
string input = @"foo ?A&3/3/C)412& bar A341C2";
string substring = "A41";
string[] ignoredChars = { "&", "/", "3", "C", ")" };
// builds up the ignored pattern and ensures a dash char is placed at the end to avoid unintended ranges
string ignoredPattern = String.Concat("[",
String.Join("", ignoredChars.Where(c => c != "-")
.Select(c => Regex.Escape(c)).ToArray()),
(ignoredChars.Contains("-") ? "-" : ""),
"]*?");
string[] substrings = substring.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string pattern = "";
if (substrings.Length > 1)
{
pattern = String.Join(ignoredPattern, substrings);
}
else
{
pattern = String.Join(ignoredPattern, substring.Select(c => c.ToString()).ToArray());
}
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine("Index: {0} -- Match: {1}", match.Index, match.Value);
}
Try this solution out:
string input = "Hello, -this- is a string";
string[] searchStrings = { "Hello", "this" };
string pattern = String.Join(@"\W+", searchStrings);
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine(match.Value);
}
The \W+ will match one or more non-word characters (anything other than letters, digits, and underscore). If you feel like specifying them yourself, you can replace it with a character class of the characters to ignore, such as [ ,.-]+ (always place the dash character at the start or end to avoid unintended range specifications). Also, if you need case to be ignored, use RegexOptions.IgnoreCase:
Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
If your substring is in the form of a complete string, such as "Hello this", you can easily get it into an array form for searchStrings in this way:
string[] searchStrings = substring.Split(new[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);
This code will do what you want, although I suggest you modify it to fit your needs better:
string resultString = null;
try
{
resultString = Regex.Match(subjectString, "Hello[, -]*this", RegexOptions.IgnoreCase).Value;
}
catch (ArgumentException ex)
{
// Syntax error in the regular expression
}
You could do this with a single Regex but it would be quite tedious as after every character you would need to test for zero or more ignored characters. It is probably easier to strip all the ignored characters with Regex.Replace(subject, "[-,]", ""); then test if the substring is there.
Or the single Regex way
Regex.IsMatch(subject, "H[-,]*e[-,]*l[-,]*l[-,]*o[-,]* [-,]*t[-,]*h[-,]*i[-,]*s[-,]*")
Here's a non-regex way to do it using string parsing.
private string GetSubstring()
{
    string searchString = "Hello, -this- is a string";
    string searchStringWithoutUnwantedChars = searchString.Replace(",", "").Replace("-", "");
    string desiredString = string.Empty;
    if (searchStringWithoutUnwantedChars.Contains("Hello this"))
    {
        int start = searchString.IndexOf("Hello");
        int end = searchString.IndexOf("this", start) + "this".Length;
        // Substring takes a length, not an end index.
        desiredString = searchString.Substring(start, end - start);
    }
    return desiredString;
}
You could do something like this, since almost all of these answers require rebuilding the string in some form.
myString is the string you want to look through.
//Create a List<string> that contains the ignored characters
List<string> ignoredCharacters = new List<string>();
//Add all of the characters you wish to ignore in the method you choose
//Use a function here to get a return
public bool subStringExist(List<string> ignoredCharacters, string myString, string toMatch)
{
//Copy Your string to a temp
string tempString = myString;
bool match = false;
//Replace Everything that you don't want
foreach (string item in ignoredCharacters)
{
tempString = tempString.Replace(item, "");
}
//Check if your substring exist
if (tempString.Contains(toMatch))
{
match = true;
}
return match;
}
You could always use a combination of RegEx and string searching
public class RegExpression {
public static void Example(string input, string ignore, string find)
{
string output = string.Format("Input: {1}{0}Ignore: {2}{0}Find: {3}{0}{0}", Environment.NewLine, input, ignore, find);
if (SanitizeText(input, ignore).ToString().Contains(SanitizeText(find, ignore)))
Console.WriteLine(output + "was matched");
else
Console.WriteLine(output + "was NOT matched");
Console.WriteLine();
}
public static string SanitizeText(string input, string ignore)
{
Regex reg = new Regex("[^" + ignore + "]");
StringBuilder newInput = new StringBuilder();
foreach (Match m in reg.Matches(input))
{
newInput.Append(m.Value);
}
return newInput.ToString();
}
}
Usage would be like
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this"); //Should match
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this2"); //Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A41"); // Should match
RegExpression.Example("?A&3/3/C) 412&", "&/3C\\)", "A41"); // Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A3C412&"); // Should match
Output
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this
was matched
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this2
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C\)
Find: A41
was matched
Input: ?A&3/3/C) 412&
Ignore: &/3C\)
Find: A41
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C\)
Find: A3C412&
was matched
