Inserting my own illegal characters into Path.GetInvalidFileNameChars() in C#

How can I extend the Path.GetInvalidFileNameChars to include my own set of characters that is illegal in my application?
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
If I wanted to add the '&' as an illegal character, could I do that?

typeof(Path).GetField("InvalidFileNameChars", BindingFlags.NonPublic | BindingFlags.Static).SetValue(null, new[] { 'o', 'v', 'e', 'r', '9', '0', '0', '0' });

Try this:
var invalid = Path.GetInvalidFileNameChars().Concat(new [] { '&' });
This will yield an IEnumerable<char> with all invalid characters, including yours.
Here is a full example:
using System.IO;
using System.Linq;
class Program
{
static void Main()
{
// This is the sequence of characters
var invalid = Path.GetInvalidFileNameChars().Concat(new[] { '&' });
// If you want them as an array you can do this
var invalid2 = invalid.ToArray();
// If you want them as a string you can do this
var invalid3 = new string(invalid.ToArray());
}
}

You can't modify an existing function, but you can write a wrapper function that returns Path.GetInvalidFileNameChars() and your illegal characters.
public static char[] GetInvalidFileNameChars() {
// MY_INVALID_FILENAME_CHARS is your own char[] of extra illegal characters
return Path.GetInvalidFileNameChars().Concat(MY_INVALID_FILENAME_CHARS).ToArray();
}
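The wrapper idea can be taken one step further by caching the combined set and exposing a validity check. The following is only a sketch: the class name FileNameRules, the field ExtraInvalidChars and the method IsValidFileName are hypothetical names chosen for illustration, and '&' is simply the character from the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileNameRules
{
    // Hypothetical application-specific additions; adjust to your needs.
    private static readonly char[] ExtraInvalidChars = { '&' };

    // Built once: the framework's set plus the application's own characters.
    private static readonly HashSet<char> AllInvalidChars =
        new HashSet<char>(Path.GetInvalidFileNameChars().Concat(ExtraInvalidChars));

    // Wrapper that returns the combined set as an array.
    public static char[] GetInvalidFileNameChars() => AllInvalidChars.ToArray();

    // True when the name is non-empty and contains none of the combined characters.
    public static bool IsValidFileName(string name) =>
        !string.IsNullOrEmpty(name) && !name.Any(AllInvalidChars.Contains);
}
```

A HashSet<char> is used here so that each character lookup does not have to scan the whole array; for a handful of file names the difference is unlikely to matter.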

An extension method is your best bet here.
public static class Extensions
{
public static char[] GetApplicationInvalidChars(this char[] input)
{
//Your list of invalid characters goes below.
var invalidChars = new [] { '%', '#', 't' };
// Requires using System.Linq; String.Concat would call ToString() on each array instead.
return input.Concat(invalidChars).ToArray();
}
}
Then use it as follows:
char[] invalid = Path.GetInvalidFileNameChars().GetApplicationInvalidChars();
It will concatenate your invalid characters to what's already in there.

First create a helper class "SanitizeFileName.cs"
public class SanitizeFileName
{
public static string ReplaceInvalidFileNameChars(string fileName, char? replacement = null)
{
if (fileName != null && fileName.Length != 0)
{
var sb = new StringBuilder();
var badChars = new[] { ',', ' ', '^', '°' };
var inValidChars = Path.GetInvalidFileNameChars().Concat(badChars).ToList();
foreach (var @char in fileName)
{
if (inValidChars.Contains(@char))
{
if (replacement.HasValue)
{
sb.Append(replacement.Value);
}
continue;
}
sb.Append(@char);
}
return sb.ToString();
}
return null;
}
}
Then, use it like this:
var validFileName = SanitizeFileName.ReplaceInvalidFileNameChars(filename, '_');
In my case, I had to clean up the "filename" in the "Content-Disposition" response header in a C# download method.
Response.AddHeader("Content-Disposition", "attachment;filename=" + validFileName);


Extract multiple substring in the same line

I'm trying to build a log parser, but I'm stuck.
Right now my program goes through multiple files in a directory and reads each file line by line.
I was able to identify the substring I was looking for, "fct=", and extract the value next to the "=" using a delimiter, but I noticed that when a line has more than one "fct=", it doesn't see the others.
So I restarted my code and found a way to get the index of every occurrence of fct= in the same line, using an extension method that puts the indexes in a list, but I don't see how I can use this list to get the value next to the "=" using my delimiter.
How can I extract the value next to the "=", knowing the start position of "fct=" and the delimiter at the end of the wanted value?
I'm just starting out in C#, so let me know if I can give you more information.
Thanks,
Here's an example of what i would like to parse:
<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
<dat>XN=KEY,CN=RT</dat></logurl>
I would like to retrieve 10019, 666 and 4515.
namespace LogParserV1
{
class Program
{
static void Main(string[] args)
{
int counter = 0;
string[] dirs = Directory.GetFiles(@"C:/LogParser/LogParserV1", "*.txt");
string fctnumber;
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
foreach (string fileName in dirs)
{
StreamReader sr = new StreamReader(fileName);
{
String lineRead;
while ((lineRead = sr.ReadLine()) != null)
{
if (lineRead.Contains("fct="))
{
List<int> list = MyExtensions.GetPositions(lineRead, "fct");
//int start = lineRead.IndexOf("fct=") + 4;
// int end = lineRead.IndexOfAny(enddelimiter, start);
//string result = lineRead.Substring(start, end - start);
//fctnumber = result;
//System.Console.WriteLine(fctnumber);
list.ForEach(Console.WriteLine);
}
// prints every line: System.Console.WriteLine(lineRead);
counter++;
}
System.Console.WriteLine(fileName);
sr.Close();
}
}
// Suspend the screen.
System.Console.ReadLine();
}
}
}
namespace ExtensionMethods
{
public class MyExtensions
{
public static List<int> GetPositions(string source, string searchString)
{
List<int> ret = new List<int>();
int len = searchString.Length;
int start = -len;
while (true)
{
start = source.IndexOf(searchString, start + len);
if (start == -1)
{
break;
}
else
{
ret.Add(start);
}
}
return ret;
}
}
}
You could simplify your code a lot by using Regex pattern matching instead.
The following pattern: (?<=FCT=)[0-9]* will match any group of digits preceded by FCT=.
This enables us to do the following:
string input = "<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>...";
string pattern = "(?<=FCT=)[0-9]*";
var values = Regex.Matches(input, pattern).Cast<Match>().Select(x => x.Value);
I have tested this solution with your data, and it gives me the expected results (10019,666 and 4515)
string data = @"<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
<dat>XN=KEY,CN=RT</dat></logurl>";
char[] delimiters = { '<', ',', '&', ':', ' ', '\\', '\'' };
Regex regex = new Regex("fct=(.+)", RegexOptions.IgnoreCase);
var values = data.Split(delimiters).Select(x => regex.Match(x).Groups[1].Value);
values = values.Where(x => !string.IsNullOrWhiteSpace(x));
values.ToList().ForEach(Console.WriteLine);
I hope my solution will be helpful, let me know.
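To tie the lookbehind pattern back to the line-by-line reading loop from the question, it can be wrapped in a small helper that returns every value found in one line. This is a sketch under two assumptions: the values after "fct=" are always digits (as in the sample data), and the names FctExtractor and ExtractAll are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FctExtractor
{
    // "fct=" followed by one or more digits; IgnoreCase covers both "fct=" and "FCT=".
    private static readonly Regex FctPattern =
        new Regex(@"(?<=fct=)[0-9]+", RegexOptions.IgnoreCase);

    // Returns all values in the line, in order; an empty list if there are none.
    public static List<string> ExtractAll(string line) =>
        FctPattern.Matches(line).Cast<Match>().Select(m => m.Value).ToList();
}
```

Inside the while loop, calling FctExtractor.ExtractAll(lineRead) would replace both the Contains check and the position list. The quantifier is + rather than * so that an "fct=" with nothing after it does not produce an empty match.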
The code below is useful for extracting the repeated words in a text with LINQ:
string text = "Hi Naresh, How are you. You will be next Super man";
IEnumerable<string> strings = text.Split(' ').ToList();
var result = strings.AsEnumerable().Select(x => new {str = Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", ""), count = Regex.Matches(text.ToLowerInvariant(), @"\b" + Regex.Escape(Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", "")) + @"\b").Count}).Where(x=>x.count>1).GroupBy(x => x.str).Select(x => x.First());
foreach(var item in result)
{
Console.WriteLine(item.str +" = "+item.count.ToString());
}
As always, break the problem down into smaller bits. See if the following methods help in any way. Tying them into your code is left as an exercise.
private const string Prefix = "fct=";
//make delimiter look up fast
private static HashSet<char> endDelimiters =
new HashSet<char>(new [] { '<', ',', '&', ':', ' ', '\\', '\'' });
private static string[] GetAllFctFields(string line) =>
line.Split(new string[] { Prefix }, StringSplitOptions.None);
private static bool TryGetValue(string delimitedString, out string value)
{
var buffer = new StringBuilder(delimitedString.Length);
foreach (var c in delimitedString)
{
if (endDelimiters.Contains(c))
break;
buffer.Append(c);
}
//I'm assuming that no end delimiter is a format error.
//Modify according to requirements
if (buffer.Length == delimitedString.Length)
{
value = null;
return false;
}
value = buffer.ToString();
return true;
}
Something like:
class Program
{
static void Main(string[] args)
{
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
var fct = "fct=";
var lineRead = "fct=value1,useless text fct=vfct=alue2,fct=value3";
var values = new List<string>();
int start = lineRead.IndexOf(fct);
while(start != -1)
{
start += fct.Length;
int end = lineRead.IndexOfAny(enddelimiter, start);
if (end == -1)
end = lineRead.Length;
string result = lineRead.Substring(start, end - start);
values.Add(result);
start = lineRead.IndexOf(fct, end);
}
values.ForEach(Console.WriteLine);
}
}
You can split the line by string[]
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
while ((lineRead = sr.ReadLine()) != null)
{
string[] parts1 = lineRead.Split(new string[] { "fct=" },StringSplitOptions.None);
if(parts1.Length > 0)
{
foreach(string _ar in parts1)
{
if(!string.IsNullOrEmpty(_ar))
{
if(_ar.IndexOfAny(enddelimiter) > 0)
{
MessageBox.Show(_ar.Substring(0, _ar.IndexOfAny(enddelimiter)));
}
else
{
MessageBox.Show(_ar);
}
}
}
}
}

How can I trim strings down to the first occurrence of a ";" or a "["

I have c# strings that look like this:
"かたむく;かたぶく[ok]"
"そば[側,傍];そく[側];はた"
"くすり"
"おととい[gikun];おとつい[gikun];いっさくじつ"
How can I trim these down so that the output has only text up to the first occurrence of a ";" character (not a normal semicolon) or a "[" or if neither are present then the new string would be the same as the existing.
"かたむく"
"そば"
"くすり"
"おととい"
Is that something that would be best done with Regex or should I use some indexOf type of code to do this?
You don't need a Regex, just string.IndexOfAny. Something like:
var inputs = new[]
{
"かたむく;かたぶく[ok]",
"そば[側,傍];そく[側];はた",
"くすり",
"おととい[gikun];おとつい[gikun];いっさくじつ"
};
var separators = new[] {' ', '['};
foreach (var input in inputs)
{
var separatorPosition = input.IndexOfAny(separators);
if (separatorPosition >= 0)
{
Debug.WriteLine($"Split: {input.Substring(0, separatorPosition)}");
}
else
{
Debug.WriteLine($"No Split: {input}");
}
}
I get the following output from your inputs:
Split: かたむく;かたぶく
Split: そば
No Split: くすり
Split: おととい
It doesn't quite match what you show, but I think it's correct (and what you show isn't)
Expanding on my comment, "IndexOf can be used to find the first index of the [ character, and Substring can return the string up to that point."
public static string GetSubstringToChar(string input, char delimeter = '[')
{
if (input == null || !input.Contains(delimeter)) return input;
return input.Substring(0, input.IndexOf(delimeter));
}
To make this work with multiple delimeters, we can pass in an array of delimeter characters and use IndexOfAny:
public static string GetSubstringToChar(string input, char[] delimeters)
{
if (input == null || !input.Any(delimeters.Contains)) return input;
return input.Substring(0, input.IndexOfAny(delimeters));
}
You could then call this like:
var strings = new List<string>
{
"かたむく;かたぶく[ok]",
"そば[側,傍];そく[側];はた",
"くすり",
"おととい[gikun];おとつい[gikun];いっさくじつ",
};
var delimeters = new[] { ';', '[' };
foreach (var str in strings)
{
Console.WriteLine(GetSubstringToChar(str, delimeters));
}
An extension method with a little validation will do the job.
public static string GetUntil(this string input, char[] delimiters)
{
if (input == null || input.IndexOfAny(delimiters) == -1)
return input;
else
return input.Split(delimiters)[0];
}
then call like:
var test = "かたむく;かたぶく[ok]".GetUntil(new char[] { ' ', '[' });

How do I make letters to uppercase after each of a set of specific characters

I have a collection of characters (',', '.', '/', '-', ' ') then I have a collection of strings (about 500).
What I want to do as fast as possible is: after each of the characters I want to make the next letter uppercase.
I want the first capitalized as well and many of the strings are all uppercase to begin with.
EDIT:
I modified tdragon's answer to this final result:
public static String CapitalizeAndStuff(string startingString)
{
startingString = startingString.ToLower();
char[] chars = new[] { '-', ',', '/', ' ', '.'};
StringBuilder result = new StringBuilder(startingString.Length);
bool makeUpper = true;
foreach (var c in startingString)
{
if (makeUpper)
{
result.Append(Char.ToUpper(c));
makeUpper = false;
}
else
{
result.Append(c);
}
if (chars.Contains(c))
{
makeUpper = true;
}
}
return result.ToString();
}
Then I call this method for all my strings.
string a = "fef-aw-fase-fes-fes,fes-,fse--sgr";
char[] chars = new[] { '-', ',' };
StringBuilder result = new StringBuilder(a.Length);
bool makeUpper = true;
foreach (var c in a)
{
if (makeUpper)
{
result.Append(Char.ToUpper(c));
makeUpper = false;
}
else
{
result.Append(c);
}
if (chars.Contains(c))
{
makeUpper = true;
}
}
public static string Capitalise(string text, string targets, CultureInfo culture)
{
bool capitalise = true;
var result = new StringBuilder(text.Length);
foreach (char c in text)
{
if (capitalise)
{
result.Append(char.ToUpper(c, culture));
capitalise = false;
}
else
{
if (targets.Contains(c))
capitalise = true;
result.Append(c);
}
}
return result.ToString();
}
Use it like this:
string targets = ",./- ";
string text = "one,two.three/four-five six";
Console.WriteLine(Capitalise(text, targets, CultureInfo.InvariantCulture));
char[] chars = new char[] { ',', '.', '/', '-', ' ' };
string input = "Foo bar bar foo, foo, bar,foo-bar.bar_foo zz-";
string result = input[0] + new string(input.Skip(1).Select((c, i) =>
chars.Contains(input[i]) ? char.ToUpper(input[i + 1]) : input[i + 1]
).ToArray());
Console.WriteLine(result);
Or you can use a simple regular expression:
var result = Regex.Replace(str, @"([.,-][a-z]|\b[a-z])", m => m.Value.ToUpper());
You can split your whole string multiple times, once for each delimiter, rinse and repeat, and then uppercase each block.
char[] tempChar = {',','-'};
List<string> tempList = new List<string>();
tempList.Add(yourstring);
foreach(var currentChar in tempChar)
{
List<string> tempSecondList = new List<string>();
foreach(var tempString in tempList)
{
foreach(var tempSecondString in tempString.Split(currentChar))
{
tempSecondList.Add(tempSecondString);
}
}
tempList = tempSecondList;
}
I hope I counted the braces correctly. Afterwards, uppercase each entry in tempList.

How to convert Turkish chars to English chars in a string?

string strTurkish = "ÜST";
How do I make the value of strTurkish "UST"?
var text = "ÜST";
var unaccentedText = String.Join("", text.Normalize(NormalizationForm.FormD)
.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
You can use the following method for solving your problem. The other methods do not convert "Turkish Lowercase I (\u0131)" correctly.
public static string RemoveDiacritics(string text)
{
Encoding srcEncoding = Encoding.UTF8;
Encoding destEncoding = Encoding.GetEncoding(1252); // Latin alphabet
text = destEncoding.GetString(Encoding.Convert(srcEncoding, destEncoding, srcEncoding.GetBytes(text)));
string normalizedString = text.Normalize(NormalizationForm.FormD);
StringBuilder result = new StringBuilder();
for (int i = 0; i < normalizedString.Length; i++)
{
if (!CharUnicodeInfo.GetUnicodeCategory(normalizedString[i]).Equals(UnicodeCategory.NonSpacingMark))
{
result.Append(normalizedString[i]);
}
}
return result.ToString();
}
I'm not an expert on this sort of thing, but I think you can use string.Normalize to do it, by decomposing the value and then effectively removing any non-ASCII characters:
using System;
using System.Linq;
using System.Text;
class Test
{
static void Main()
{
string text = "\u00DCST";
string normalized = text.Normalize(NormalizationForm.FormD);
string asciiOnly = new string(normalized.Where(c => c < 128).ToArray());
Console.WriteLine(asciiOnly);
}
}
It's entirely possible that this does horrible things in some cases though.
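One such case is the Turkish dotless ı (U+0131) mentioned in another answer: it has no canonical decomposition, so FormD leaves it untouched and the c < 128 filter drops it entirely. A possible middle ground, sketched below, maps that one character by hand and lets normalization handle the rest. The names TurkishText and ToAscii are hypothetical, and only the Turkish letters discussed in this thread were considered.

```csharp
using System.Globalization;
using System.Text;

public static class TurkishText
{
    public static string ToAscii(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        // Dotless i (U+0131) does not decompose under FormD, so replace it explicitly first.
        string decomposed = text.Replace('\u0131', 'i').Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep base letters; skip the combining marks (dots, breves, cedillas).
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Unlike the c < 128 filter, this keeps any other non-ASCII characters that have no decomposition instead of silently deleting them, which may or may not be what you want.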
public string TurkishCharacterToEnglish(string text)
{
char[] turkishChars = {'ı', 'ğ', 'İ', 'Ğ', 'ç', 'Ç', 'ş', 'Ş', 'ö', 'Ö', 'ü', 'Ü'};
char[] englishChars = {'i', 'g', 'I', 'G', 'c', 'C', 's', 'S', 'o', 'O', 'u', 'U'};
// Match chars
for (int i = 0; i < turkishChars.Length; i++)
text = text.Replace(turkishChars[i], englishChars[i]);
return text;
}
This is not a problem that requires a general solution. There are only 12 special characters in the Turkish alphabet that have to be normalized: ı,İ,ö,Ö,ç,Ç,ü,Ü,ğ,Ğ,ş,Ş. You can write 12 rules to replace them with their English counterparts: i,I,o,O,c,C,u,U,g,G,s,S.
Public Function Ceng(ByVal _String As String) As String
Dim Source As String = "ığüşöçĞÜŞİÖÇ"
Dim Destination As String = "igusocGUSIOC"
For i As Integer = 0 To Source.Length - 1
_String = _String.Replace(Source(i), Destination(i))
Next
Return _String
End Function
public static string TurkishChrToEnglishChr(this string text)
{
if (string.IsNullOrEmpty(text)) return text;
Dictionary<char, char> TurkishChToEnglishChDic = new Dictionary<char, char>()
{
{'ç','c'},
{'Ç','C'},
{'ğ','g'},
{'Ğ','G'},
{'ı','i'},
{'İ','I'},
{'ş','s'},
{'Ş','S'},
{'ö','o'},
{'Ö','O'},
{'ü','u'},
{'Ü','U'}
};
return text.Aggregate(new StringBuilder(), (sb, chr) =>
{
if (TurkishChToEnglishChDic.ContainsKey(chr))
sb.Append(TurkishChToEnglishChDic[chr]);
else
sb.Append(chr);
return sb;
}).ToString();
}

Get Substring - everything before certain char

I'm trying to figure out the best way to get everything before the - character in a string. Some example strings are below. The length of the string before - varies and can be any length
223232-1.jpg
443-2.jpg
34443553-5.jpg
so I need the value that's from the start index of 0 to right before -. So the substrings would turn out to be 223232, 443, and 34443553
.Net Fiddle example
class Program
{
static void Main(string[] args)
{
Console.WriteLine("223232-1.jpg".GetUntilOrEmpty());
Console.WriteLine("443-2.jpg".GetUntilOrEmpty());
Console.WriteLine("34443553-5.jpg".GetUntilOrEmpty());
Console.ReadKey();
}
}
static class Helper
{
public static string GetUntilOrEmpty(this string text, string stopAt = "-")
{
if (!String.IsNullOrWhiteSpace(text))
{
int charLocation = text.IndexOf(stopAt, StringComparison.Ordinal);
if (charLocation > 0)
{
return text.Substring(0, charLocation);
}
}
return String.Empty;
}
}
Results:
223232
443
34443553
344
34
Use the split function.
static void Main(string[] args)
{
string s = "223232-1.jpg";
Console.WriteLine(s.Split('-')[0]);
s = "443-2.jpg";
Console.WriteLine(s.Split('-')[0]);
s = "34443553-5.jpg";
Console.WriteLine(s.Split('-')[0]);
Console.ReadKey();
}
If your string doesn't have a - then you'll get the whole string.
String str = "223232-1.jpg";
int index = str.IndexOf('-');
if(index > 0) {
return str.Substring(0, index);
}
Things have moved on a bit since this thread started.
Now, you could use
string.Concat(s.TakeWhile((c) => c != '-'));
One way to do this is to use String.Substring together with String.IndexOf:
int index = str.IndexOf('-');
string sub;
if (index >= 0)
{
sub = str.Substring(0, index);
}
else
{
sub = ... // handle strings without the dash
}
Starting at position 0, return all text up to, but not including, the dash.
Slightly modified and refreshed Fredou's solution for C# ≥ 8
Uses range operator syntax (..)
Uses local function
/// <summary>
/// Get substring until first occurrence of given character has been found. Returns the whole string if character has not been found.
/// </summary>
public static string GetUntil(this string that, char @char)
{
return that[..(IndexOf() == -1 ? that.Length : IndexOf())];
int IndexOf() => that.IndexOf(@char);
}
Tests:
[TestCase("", ' ', ExpectedResult = "")]
[TestCase("a", 'a', ExpectedResult = "")]
[TestCase("a", ' ', ExpectedResult = "a")]
[TestCase(" ", ' ', ExpectedResult = "")]
[TestCase("/", '/', ExpectedResult = "")]
[TestCase("223232-1.jpg", '-', ExpectedResult = "223232")]
[TestCase("443-2.jpg", '-', ExpectedResult = "443")]
[TestCase("34443553-5.jpg", '-', ExpectedResult = "34443553")]
[TestCase("34443553-5-6.jpg", '-', ExpectedResult = "34443553")]
public string GetUntil(string input, char until) => input.GetUntil(until);
The LINQy way
String.Concat( "223232-1.jpg".TakeWhile(c => c != '-') )
(But, you do need to test for null ;)
Building on BrainCore's answer:
string str = "223232-1.jpg";
int index = 0;
//Assuming we trust str isn't null
if (str.Contains('-'))
{
index = str.IndexOf('-');
}
if(index > 0) {
return str.Substring(0, index);
}
else {
return str;
}
You can use regular expressions for this purpose, but it's good to avoid extra exceptions when the input string does not match the regular expression.
First, to avoid the extra headache of escaping the regex pattern by hand, we can use a function for that purpose:
String reStrEnding = Regex.Escape("-");
I know that this does not do anything here, since Regex.Escape("-") == "-", but it will make a difference if, for example, the character is @"\".
Then we need to match from the beginning of the string up to the ending string or, if the ending is not found, match nothing (empty string).
Regex re = new Regex("^(.*?)" + reStrEnding);
If your application is performance-critical, construct the Regex once on a separate line and reuse it; if not, you can put everything on one line.
And finally match against string and extract matched pattern:
String matched = re.Match(str).Groups[1].ToString();
After that you can either write a separate function, as was done in another answer, or write an inline lambda function. I've written it below using both notations: an inline lambda function (which does not allow a default parameter) and a separate function call.
using System;
using System.Text.RegularExpressions;
static class Helper
{
public static string GetUntilOrEmpty(this string text, string stopAt = "-")
{
return new Regex("^(.*?)" + Regex.Escape(stopAt)).Match(text).Groups[1].Value;
}
}
class Program
{
static void Main(string[] args)
{
Regex re = new Regex("^(.*?)-");
Func<String, String> untilSlash = (s) => { return re.Match(s).Groups[1].ToString(); };
Console.WriteLine(untilSlash("223232-1.jpg"));
Console.WriteLine(untilSlash("443-2.jpg"));
Console.WriteLine(untilSlash("34443553-5.jpg"));
Console.WriteLine(untilSlash("noEnding(will result in empty string)"));
Console.WriteLine(untilSlash(""));
// Throws exception: Console.WriteLine(untilSlash(null));
Console.WriteLine("443-2.jpg".GetUntilOrEmpty());
}
}
By the way, changing the regex pattern to "^(.*?)(-|$)" will pick up everything up to "-" or, if "-" is not found, everything up to the end of the string.
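As a rough illustration of that alternative pattern, the helper below returns the whole input when the stop string is missing. The names UntilHelper and GetUntilOrAll are invented for this sketch, and it assumes single-line input: without RegexOptions.Singleline, "." does not match a newline, so multi-line strings may yield an empty result.

```csharp
using System.Text.RegularExpressions;

public static class UntilHelper
{
    // Returns the text before the first occurrence of stopAt,
    // or the whole text if stopAt does not occur (single-line input assumed).
    public static string GetUntilOrAll(string text, string stopAt = "-")
    {
        var re = new Regex("^(.*?)(" + Regex.Escape(stopAt) + "|$)");
        return re.Match(text).Groups[1].Value;
    }
}
```

The lazy quantifier .*? stops at the first "-" when there is one; otherwise it keeps expanding until $ matches at the end of the string.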
