Evaluating a regular expression range

Evaluating a regular expression range - c#

Is there a nice way to evaluate a regular expression range, say, for a url such as
http://example.com/[a-z]/[0-9].htm
This would be converted into:
http://example.com/a/0.htm
http://example.com/a/1.htm
http://example.com/a/2.htm
...
http://example.com/a/9.htm
...
http://example.com/z/0.htm
http://example.com/z/1.htm
http://example.com/z/2.htm
...
http://example.com/z/9.htm
I've been scratching my head about this, and there's no pretty way of doing it without going through the alphabet and looping through numbers.
Thanks in advance!

If you really need to do this, it's not that hard to generate the strings using recursion. Here's a snippet to do just that in Java:
public class Explode {
static void dfs(String prefix, String suffix) {
final int k = suffix.indexOf('[');
if (k == -1) {
System.out.println(prefix + suffix);
} else {
prefix += suffix.substring(0, k);
char from = suffix.charAt(k+1);
char to = suffix.charAt(k+3);
suffix = suffix.substring(k+5);
for (char ch = from; ch <= to; ch++) {
dfs(prefix + ch, suffix);
}
}
}
public static void main(String[] args) {
String template = "http://example.com/[a-c]/[0-2][x-z].htm";
dfs("", template);
}
}
(see full output)
This is a standard recursive tuple generator, but with some string infixing in between. It's trivial to port to C#. You'd want to use a mutable StringBuilder-like class for better performance.

I guess there is no way to expand regular expressions in general. Your example
http://foo.com/[a-z]/[0-9].htm
is a very easy regex without * or + for instance. How would you expand such a regex?
In your case you might get away with some loops, but as I said - this is a untypical (easy) regex.

Related

Regex for string without spacial characters or spaces [duplicate]

How do I check a string to make sure it contains numbers, letters, or space only?

In C# this is simple:
private bool HasSpecialChars(string yourString)
{
return yourString.Any(ch => ! char.IsLetterOrDigit(ch));
}

The easiest way it to use a regular expression:
Regular Expression for alphanumeric and underscores
Using regular expressions in .net:
http://www.regular-expressions.info/dotnet.html
MSDN Regular Expression
Regex.IsMatch
var regexItem = new Regex("^[a-zA-Z0-9 ]*$");
if(regexItem.IsMatch(YOUR_STRING)){..}

string s = #"$KUH% I*$)OFNlkfn$";
var withoutSpecial = new string(s.Where(c => Char.IsLetterOrDigit(c)
|| Char.IsWhiteSpace(c)).ToArray());
if (s != withoutSpecial)
{
Console.WriteLine("String contains special chars");
}

Try this way.
public static bool hasSpecialChar(string input)
{
string specialChar = #"\|!#$%&/()=?»«#£§€{}.-;'<>_,";
foreach (var item in specialChar)
{
if (input.Contains(item)) return true;
}
return false;
}

String test_string = "tesintg#$234524##";
if (System.Text.RegularExpressions.Regex.IsMatch(test_string, "^[a-zA-Z0-9\x20]+$"))
{
// Good-to-go
}
An example can be found here: http://ideone.com/B1HxA

If the list of acceptable characters is pretty small, you can use a regular expression like this:
Regex.IsMatch(items, "[a-z0-9 ]+", RegexOptions.IgnoreCase);
The regular expression used here looks for any character from a-z and 0-9 including a space (what's inside the square brackets []), that there is one or more of these characters (the + sign--you can use a * for 0 or more). The final option tells the regex parser to ignore case.
This will fail on anything that is not a letter, number, or space. To add more characters to the blessed list, add it inside the square brackets.

Use the regular Expression below in to validate a string to make sure it contains numbers, letters, or space only:
[a-zA-Z0-9 ]

You could do it with a bool. I've been learning recently and found I could do it this way. In this example, I'm checking a user's input to the console:
using System;
using System.Linq;
namespace CheckStringContent
{
class Program
{
static void Main(string[] args)
{
//Get a password to check
Console.WriteLine("Please input a Password: ");
string userPassword = Console.ReadLine();
//Check the string
bool symbolCheck = userPassword.Any(p => !char.IsLetterOrDigit(p));
//Write results to console
Console.WriteLine($"Symbols are present: {symbolCheck}");
}
}
}
This returns 'True' if special chars (symbolCheck) are present in the string, and 'False' if not present.

A great way using C# and Linq here:
public static bool HasSpecialCharacter(this string s)
{
foreach (var c in s)
{
if(!char.IsLetterOrDigit(c))
{
return true;
}
}
return false;
}
And access it like this:
myString.HasSpecialCharacter();

private bool isMatch(string strValue,string specialChars)
{
return specialChars.Where(x => strValue.Contains(x)).Any();
}

Create a method and call it hasSpecialChar with one parameter
and use foreach to check every single character in the textbox, add as many characters as you want in the array, in my case i just used ) and ( to prevent sql injection .
public void hasSpecialChar(string input)
{
char[] specialChar = {'(',')'};
foreach (char item in specialChar)
{
if (input.Contains(item)) MessageBox.Show("it contains");
}
}
in your button click evenement or you click btn double time like that :
private void button1_Click(object sender, EventArgs e)
{
hasSpecialChar(textbox1.Text);
}

While there are many ways to skin this cat, I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. When using extension methods, you can also avoid RegEx as it is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package. It makes this check as simple as:
Add the [https://www.nuget.org/packages/Extensions.cs][1] package to your project.
Add "using Extensions;" to the top of your code.
"smith23#".IsAlphaNumeric() will return False whereas "smith23".IsAlphaNumeric() will return True. By default the .IsAlphaNumeric() method ignores spaces, but it can also be overridden such that "smith 23".IsAlphaNumeric(false) will return False since the space is not considered part of the alphabet.
Every other check in the rest of the code is simply MyString.IsAlphaNumeric().

Based on #prmph's answer, it can be even more simplified (omitting the variable, using overload resolution):
yourString.Any(char.IsLetterOrDigit);

No special characters or empty string except hyphen
^[a-zA-Z0-9-]+$

string manipulation check and replace with fastest method

I have some strings like below:
string num1 = "D123_1";
string num2 = "D123_2";
string num3 = "D456_11";
string num4 = "D456_22";
string num5 = "D_123_D";
string num5 = "_D_123";
I want to make a function that will do the following actions:
1- Checks if given string DOES HAVE an Underscore in it, and this underscore should be after some Numbers and Follow with some numbers: in this case 'num5' and 'num6' are invalid!
2- Replace the numbers after the last underscore with any desired string, for example I want 'num1 = "D123_1"' to be changed into 'D123_2'
So far I came with this idea but it is not working :( First I dont know how to check for criteria 1 and second the replace statement is not working:
private string CheckAndReplace(string given, string toAdd)
{
var changedString = given.Split('_');
return changedString[changedString.Length - 1] + toAdd;
}
Any help and tips will be appriciated

What you are looking for is a regular expression. This is (mostly) from the top of my head. But it should easily point you in the right direction. The regular expression works fine.
public static Regex regex = new Regex("(?<character>[a-zA-Z]+)(?<major>\\d+)_(?<minor>\\d+)",RegexOptions.CultureInvariant | RegexOptions.Compiled);
Match m = regex.Match(InputText);
if (m.Succes)
{
var newValue = String.Format("{0}{1}_{2}"m.Groups["character"].Value, m.Groups["major"].Value, m.Groups["minor"].Value);
}

In your code you split the String into an array of strings and then access the wrong index of the array, so it isn't doing what you want.
Try working with a substring instead. Find the index of the last '_' and then get the substring:
private string CheckAndReplace(string given, string toAdd) {
int index = given.LastIndexOf('_')+1;
return given.Substring(0,index) + toAdd;
}
But before that check the validity of the string (see other answers). This code fragment will break when there's no '_' in the string.

You could use a regular expression (this is not a complete implementation, only a hint):
private string CheckAndReplace(string given, string toAdd)
{
Regex regex = new Regex("([A-Z]*[0-9]+_)[0-9]+");
if (regex.IsMatch(given))
{
return string.Concat(regex.Match(given).Groups[1].Value, toAdd);
}
else
{
... do something else
}
}

Use a good regular expression implementation. .NET has standard implementation of them

How to lowercase a string except for first character with C#

How do convert a string to lowercase except for the first character?
Can this be completed with LINQ?
Thanks

If you only have one word in the string, you can use TextInfo.ToTitleCase. No need to use Linq.
As #Guffa noted:
This will convert any string to title case, so, "hello world" and "HELLO WORLD" would both be converted to "Hello World".
To achieve exectly what you asked (convert all characters to lower, except the first one), you can do the following:
string mostLower = myString.Substring(0, 1) + myString.Substring(1).ToLower();

This can be done with simple string operations:
s = s.Substring(0, 1) + s.Substring(1).ToLower();
Note that this does exactly what you asked for, i.e. it converts all characters to lower case except the first one that is left unchanged.
If you instead also want to change the first character to upper case, you would do:
s = s.Substring(0, 1).ToUpper() + s.Substring(1).ToLower();
Note that this code assumes that there is at least two characters in the strings. If there is a possibility that it's shorter, you should of course test for that first.

String newString = new String(str.Select((ch, index) => (index == 0) ? ch : Char.ToLower(ch)).ToArray());

Use namespace: using System.Globalization;
...
string value = CultureInfo.CurrentCulture.TextInfo.ToTitleCase("hello");
EDIT
This code work only if its single word .For convert all character into lower except first letter check Guffa Answer.
string value = myString.Substring(0, 1) + myString.Substring(1).ToLower();

Not sure you can do it in linq here is a non-linq approach:
public static string FirstCap(string value)
{
string result = String.Empty;
if(!String.IsNullOrEmpty(value))
{
if(value.Length == 1)
{
result = value.ToUpper();
}
else
{
result = value.Substring(0,1).ToString().ToUpper() + value.Substring(1).ToLower();
}
}
return result;
}

based on guffa's example above (slightly amended). you could convert that to an extension method (please pardon the badly named method :)):
public static string UpperFirst(this string source)
{
return source.ToLower().Remove(0, 1)
.Insert(0, source.Substring(0, 1).ToUpper());
}
usage:
var myNewString = myOldString.UpperFirst();
// or simply referenced as myOldString.UpperFirst() where required
cheers guffa

var initialString = "Hello hOW r u?";
var res = string.Concat(initialString..ToUpper().Substring(0, 1), initialString.ToLower().Substring(1));

You can use an extension method:
static class StringExtensions
{
public static string ToLowerFirst(this string text)
=> !string.IsNullOrEmpty(text)
? $"{text.Substring(0, 1).ToLower()}{text.Substring(1)}"
: text;
}
Unit tests as well (using FluentAssertions and Microsoft UnitTesting):
[TestClass]
public class StringExtensionsTests
{
[TestMethod]
public void ToLowerFirst_ShouldReturnCorrectValue()
=> "ABCD"
.ToLowerFirst()
.Should()
.Be("aBCD");
[TestMethod]
public void ToLowerFirst_WhenStringIsEmpty_ShouldReturnCorrectValue()
=> string.Empty
.ToLowerFirst()
.Should()
.Be(string.Empty);
}

BestPractice - Transform first character of a string into lower case

I'd like to have a method that transforms the first character of a string into lower case.
My approaches:
1.
public static string ReplaceFirstCharacterToLowerVariant(string name)
{
return String.Format("{0}{1}", name.First().ToString().ToLowerInvariant(), name.Substring(1));
}
2.
public static IEnumerable<char> FirstLetterToLowerCase(string value)
{
var firstChar = (byte)value.First();
return string.Format("{0}{1}", (char)(firstChar + 32), value.Substring(1));
}
What would be your approach?

I would use simple concatenation:
Char.ToLowerInvariant(name[0]) + name.Substring(1)
The first solution is not optimized because string.Format is slow and you don't need it if you have a format that will never change. It also generates an extra string to covert the letter to lowercase, which is not needed.
The approach with "+ 32" is ugly / not maintainable as it requires knowledge of ASCII character value offsets. It will also generate incorrect output with Unicode data and ASCII symbol characters.

Depending on the situation, a little defensive programming might be desirable:
public static string FirstCharacterToLower(string str)
{
if (String.IsNullOrEmpty(str) || Char.IsLower(str, 0))
return str;
return Char.ToLowerInvariant(str[0]) + str.Substring(1);
}
The if statement also prevents a new string from being built if it's not going to be changed anyway. You might want to have the method fail on null input instead, and throw an ArgumentNullException.
As people have mentioned, using String.Format for this is overkill.

Just in case it helps anybody who happens to stumble across this answer.
I think this would be best as an extension method, then you can call it with yourString.FirstCharacterToLower();
public static class StringExtensions
{
public static string FirstCharacterToLower(this string str)
{
if (String.IsNullOrEmpty(str) || Char.IsLower(str, 0))
{
return str;
}
return Char.ToLowerInvariant(str[0]) + str.Substring(1);
}
}

The fastest solution I know without abusing c#:
public static string LowerCaseFirstLetter(string value)
{
if (value?.Length > 0)
{
var letters = value.ToCharArray();
letters[0] = char.ToLowerInvariant(letters[0]);
return new string(letters);
}
return value;
}

Mine is
if (!string.IsNullOrEmpty (val) && val.Length > 0)
{
return val[0].ToString().ToLowerInvariant() + val.Remove (0,1);
}

I like the accepted answer, but beside checking string.IsNullOrEmpty I would also check if Char.IsLower(name[1]) in case you are dealing with abbreviation. E.g. you would not want "AIDS" to become "aIDS".

If you care about performance I would go with
public static string FirstCharToLower(this string str)
=> string.Create(str.Length, str, (output, input) =>
{
input.CopyTo(output);
output[0] = char.ToLowerInvariant(input[0]);
});
I did some quick benchmarking and it seems to be at least twice as fast as the fastest solution posted here and 3.5 times faster than the worst one across multiple input lengths.
There is no input checking as it should be the responsibility of the caller. Allowing you to prepare your data in advance and do fast bulk processing not being slowed down by having branches in your hot path if you ever need it.

With range operator C# 8.0 or above you can do this:
Char.ToLowerInvariant(name[0]) + name[1..];

Combined a few and made it a chainable extension. Added short-circuit on whitespace and non-letter.
public static string FirstLower(this string input) =>
(!string.IsNullOrWhiteSpace(input) && input.Length > 0
&& char.IsLetter(input[0]) && !char.IsLower(input[0]))
? input[0].ToString().ToLowerInvariant() + input.Remove(0, 1) : input;

This is a little extension method using latest syntax and correct validations
public static class StringExtensions
{
public static string FirstCharToLower(this string input)
{
switch (input)
{
case null: throw new ArgumentNullException(nameof(input));
case "": throw new ArgumentException($"{nameof(input)} cannot be empty", nameof(input));
default: return input.First().ToString().ToLower() + input.Substring(1);
}
}
}

Use This:
string newName= name[0].ToString().ToLower() + name.Substring(1);

If you don't want to reference your string twice in your expression you could do this using System.Linq.
new string("Hello World".Select((c, i) => i == 0 ? char.ToLower(c) : c).ToArray())
That way if your string comes from a function, you don't have to store the result of that function.
new string(Console.ReadLine().Select((c, i) => i == 0 ? char.ToLower(c) : c).ToArray())

It is better to use String.Concat than String.Format if you know that format is not change data, and just concatenation is desired.

Is there a way of making strings file-path safe in c#?

My program will take arbitrary strings from the internet and use them for file names. Is there a simple way to remove the bad characters from these strings or do I need to write a custom function for this?

Ugh, I hate it when people try to guess at which characters are valid. Besides being completely non-portable (always thinking about Mono), both of the earlier comments missed more 25 invalid characters.
foreach (var c in Path.GetInvalidFileNameChars())
{
fileName = fileName.Replace(c, '-');
}
Or in VB:
'Clean just a filename
Dim filename As String = "salmnas dlajhdla kjha;dmas'lkasn"
For Each c In IO.Path.GetInvalidFileNameChars
filename = filename.Replace(c, "")
Next
'See also IO.Path.GetInvalidPathChars

To strip invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars
var validFilename = new string(filename.Where(ch => !invalidFileNameChars.Contains(ch)).ToArray());
To replace invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and an _ for invalid ones
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? '_' : ch).ToArray());
To replace invalid characters (and avoid potential name conflict like Hell* vs Hell$):
static readonly IList<char> invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and replaces invalid chars with a unique letter (Moves the Char into the letter range of unicode, starting at "A")
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? Convert.ToChar(invalidFileNameChars.IndexOf(ch) + 65) : ch).ToArray());

This question has been asked many times before and, as pointed out many times before, IO.Path.GetInvalidFileNameChars is not adequate.
First, there are many names like PRN and CON that are reserved and not allowed for filenames. There are other names not allowed only at the root folder. Names that end in a period are also not allowed.
Second, there are a variety of length limitations. Read the full list for NTFS here.
Third, you can attach to filesystems that have other limitations. For example, ISO 9660 filenames cannot start with "-" but can contain it.
Fourth, what do you do if two processes "arbitrarily" pick the same name?
In general, using externally-generated names for file names is a bad idea. I suggest generating your own private file names and storing human-readable names internally.

I agree with Grauenwolf and would highly recommend the Path.GetInvalidFileNameChars()
Here's my C# contribution:
string file = #"38?/.\}[+=n a882 a.a*/|n^%$ ad#(-))";
Array.ForEach(Path.GetInvalidFileNameChars(),
c => file = file.Replace(c.ToString(), String.Empty));
p.s. -- this is more cryptic than it should be -- I was trying to be concise.

Here's my version:
static string GetSafeFileName(string name, char replace = '_') {
char[] invalids = Path.GetInvalidFileNameChars();
return new string(name.Select(c => invalids.Contains(c) ? replace : c).ToArray());
}
I'm not sure how the result of GetInvalidFileNameChars is calculated, but the "Get" suggests it's non-trivial, so I cache the results. Further, this only traverses the input string once instead of multiple times, like the solutions above that iterate over the set of invalid chars, replacing them in the source string one at a time. Also, I like the Where-based solutions, but I prefer to replace invalid chars instead of removing them. Finally, my replacement is exactly one character to avoid converting characters to strings as I iterate over the string.
I say all that w/o doing the profiling -- this one just "felt" nice to me. : )

Here's the function that I am using now (thanks jcollum for the C# example):
public static string MakeSafeFilename(string filename, char replaceChar)
{
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
filename = filename.Replace(c, replaceChar);
}
return filename;
}
I just put this in a "Helpers" class for convenience.

If you want to quickly strip out all special characters which is sometimes more user readable for file names this works nicely:
string myCrazyName = "q`w^e!r#t#y$u%i^o&p*a(s)d_f-g+h=j{k}l|z:x\"c<v>b?n[m]q\\w;e'r,t.y/u";
string safeName = Regex.Replace(
myCrazyName,
"\W", /*Matches any nonword character. Equivalent to '[^A-Za-z0-9_]'*/
"",
RegexOptions.IgnoreCase);
// safeName == "qwertyuiopasd_fghjklzxcvbnmqwertyu"

Here's what I just added to ClipFlair's (http://github.com/Zoomicon/ClipFlair) StringExtensions static class (Utils.Silverlight project), based on info gathered from the links to related stackoverflow questions posted by Dour High Arch above:
public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
{
return Regex.Replace(s,
"[" + Regex.Escape(new String(System.IO.Path.GetInvalidPathChars())) + "]",
replacement, //can even use a replacement string of any length
RegexOptions.IgnoreCase);
//not using System.IO.Path.InvalidPathChars (deprecated insecure API)
}

static class Utils
{
public static string MakeFileSystemSafe(this string s)
{
return new string(s.Where(IsFileSystemSafe).ToArray());
}
public static bool IsFileSystemSafe(char c)
{
return !Path.GetInvalidFileNameChars().Contains(c);
}
}

Why not convert the string to a Base64 equivalent like this:
string UnsafeFileName = "salmnas dlajhdla kjha;dmas'lkasn";
string SafeFileName = Convert.ToBase64String(Encoding.UTF8.GetBytes(UnsafeFileName));
If you want to convert it back so you can read it:
UnsafeFileName = Encoding.UTF8.GetString(Convert.FromBase64String(SafeFileName));
I used this to save PNG files with a unique name from a random description.

private void textBoxFileName_KeyPress(object sender, KeyPressEventArgs e)
{
e.Handled = CheckFileNameSafeCharacters(e);
}
/// <summary>
/// This is a good function for making sure that a user who is naming a file uses proper characters
/// </summary>
/// <param name="e"></param>
/// <returns></returns>
internal static bool CheckFileNameSafeCharacters(System.Windows.Forms.KeyPressEventArgs e)
{
if (e.KeyChar.Equals(24) ||
e.KeyChar.Equals(3) ||
e.KeyChar.Equals(22) ||
e.KeyChar.Equals(26) ||
e.KeyChar.Equals(25))//Control-X, C, V, Z and Y
return false;
if (e.KeyChar.Equals('\b'))//backspace
return false;
char[] charArray = Path.GetInvalidFileNameChars();
if (charArray.Contains(e.KeyChar))
return true;//Stop the character from being entered into the control since it is non-numerical
else
return false;
}

From my older projects, I've found this solution, which has been working perfectly over 2 years. I'm replacing illegal chars with "!", and then check for double !!'s, use your own char.
public string GetSafeFilename(string filename)
{
string res = string.Join("!", filename.Split(Path.GetInvalidFileNameChars()));
while (res.IndexOf("!!") >= 0)
res = res.Replace("!!", "!");
return res;
}

I find using this to be quick and easy to understand:
<Extension()>
Public Function MakeSafeFileName(FileName As String) As String
Return FileName.Where(Function(x) Not IO.Path.GetInvalidFileNameChars.Contains(x)).ToArray
End Function
This works because a string is IEnumerable as a char array and there is a string constructor string that takes a char array.

Many anwer suggest to use Path.GetInvalidFileNameChars() which seems like a bad solution to me. I encourage you to use whitelisting instead of blacklisting because hackers will always find a way eventually to bypass it.
Here is an example of code you could use :
string whitelist = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.";
foreach (char c in filename)
{
if (!whitelist.Contains(c))
{
filename = filename.Replace(c, '-');
}
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Evaluating a regular expression range - c#

I guess there is no way to expand regular expressions in general. Your example http://foo.com/[a-z]/[0-9].htm is a very easy regex without * or + for instance. How would you expand such a regex? In your case you might get away with some loops, but as I said - this is a untypical (easy) regex.

Related

Regex for string without spacial characters or spaces [duplicate]

string manipulation check and replace with fastest method

How to lowercase a string except for first character with C#

BestPractice - Transform first character of a string into lower case

Is there a way of making strings file-path safe in c#?

Categories

Resources