C# Remove Invalid Characters from Filename

I have data coming from an nvarchar field of a SQL Server database via EF 3.5. The string is used to create a file name, so I need to remove invalid characters. I tried the following options, but none of them seems to work. Can anyone explain why? Am I doing anything wrong?
I went through almost all of the related questions on this site, and I am now posting a consolidated question built from the suggestions and answers in those questions.
UPD: The issue was unrelated. All of these options do work, so I am posting this as community wiki.
public static string CleanFileName1(string filename)
{
string file = filename;
file = string.Concat(file.Split(System.IO.Path.GetInvalidFileNameChars(), StringSplitOptions.RemoveEmptyEntries));
if (file.Length > 250)
{
file = file.Substring(0, 250);
}
return file;
}
public static string CleanFileName2(string filename)
{
var builder = new StringBuilder();
var invalid = System.IO.Path.GetInvalidFileNameChars();
foreach (var cur in filename)
{
if (!invalid.Contains(cur))
{
builder.Append(cur);
}
}
return builder.ToString();
}
public static string CleanFileName3(string filename)
{
string regexSearch = string.Format("{0}{1}",
new string(System.IO.Path.GetInvalidFileNameChars()),
new string(System.IO.Path.GetInvalidPathChars()));
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
string file = r.Replace(filename, "");
return file;
}
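public static string CleanFileName4(string filename)
{
// Where keeps repeated characters; Except is a set operation and would also drop duplicates.
var invalid = System.IO.Path.GetInvalidFileNameChars();
return new String(filename.Where(c => !invalid.Contains(c)).ToArray());
}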
public static string CleanFileName5(string filename)
{
string file = filename;
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
file = file.Replace(c, '_');
}
return file;
}
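A caveat when filtering characters with LINQ: Enumerable.Except is a set operation and yields each distinct element only once, so besides removing the invalid characters it also collapses repeated valid ones. The sketch below contrasts it with a Where filter, which keeps every valid character in order. The class and method names are illustrative and not taken from the posts on this page.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileNameFilterDemo
{
    // Keeps every character that is not in the invalid set, duplicates included.
    public static string RemoveInvalidWithWhere(string filename)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        return new string(filename.Where(c => !invalid.Contains(c)).ToArray());
    }

    // Set difference: removes invalid characters but also drops repeated characters.
    public static string RemoveInvalidWithExcept(string filename)
    {
        return new string(filename.Except(Path.GetInvalidFileNameChars()).ToArray());
    }

    public static void Main()
    {
        string input = "report/2020.txt";
        Console.WriteLine(RemoveInvalidWithWhere(input));  // report2020.txt
        Console.WriteLine(RemoveInvalidWithExcept(input)); // repot20.x
    }
}
```

The second result shows why a set difference is rarely what you want for file names: the repeated r, 2, 0 and t are gone.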

Here is a function I use in a static common class:
public static string RemoveInvalidFilePathCharacters(string filename, string replaceChar)
{
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
return r.Replace(filename, replaceChar);
}

Try this
filename = Regex.Replace(filename, @"[\\/?:*""<>|]+", "", RegexOptions.Compiled);

None of the invalid characters returned by System.IO.Path.GetInvalidFileNameChars() are being removed. – Bhuvan
The first method you posted works OK for the characters in Path.GetInvalidFileNameChars(), here it is at work:
static void Main(string[] args)
{
string input = "abc<def>ghi\\1234/5678|?9:*0";
string output = CleanFileName1(input);
Console.WriteLine(output); // this prints: abcdefghi1234567890
Console.Read();
}
I suspect, though, that your problem is with some language-specific special characters. You can troubleshoot this by printing out the numeric codes of the characters in your string:
string stringFromDatabase = "/5678|?9:*0"; // here you get it from the database
foreach (char c in stringFromDatabase)
Console.WriteLine((int)c);
and consulting the ASCII table: http://www.asciitable.com/
I suspect that you'll see characters with codes above 127, that is, outside the ASCII range; if so, you can exclude those from your string.
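To see exactly which characters the database value contains, it can help to print each character's code next to whether GetInvalidFileNameChars() lists it. This is a minimal sketch: DescribeChars is a hypothetical helper name, and the sample string stands in for the value read from the database. Note that a C# char is a UTF-16 code unit, so a code above 127 only means the character is non-ASCII, not that it is invalid in a file name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CharInspector
{
    // Returns one line per character: its code in hex and whether
    // Path.GetInvalidFileNameChars() contains it on the current platform.
    public static List<string> DescribeChars(string value)
    {
        var invalid = new HashSet<char>(Path.GetInvalidFileNameChars());
        var lines = new List<string>();
        foreach (char c in value)
        {
            lines.Add(string.Format("U+{0:X4} invalid={1}", (int)c, invalid.Contains(c)));
        }
        return lines;
    }

    public static void Main()
    {
        // Sample input (assumption): an ASCII letter, a slash, and an accented letter.
        foreach (string line in DescribeChars("a/\u00E9"))
        {
            Console.WriteLine(line);
        }
    }
}
```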

Related

How to add string padding in textbox by importing textfile that has 2 values

This is my code for importing a text file into a TextBox (it works). My question is how to add string padding like this:
dean harris...........dean.harris@outlook.com
Right now it shows just:
dean harris, dean.harris@outlook.com
I searched a lot but didn't find a good answer. I tried following the documentation but couldn't figure it out: https://learn.microsoft.com/en-us/dotnet/standard/base-types/padding
Thanks in advance!
private void BtnInlezen_Click(object sender, RoutedEventArgs e)
{
try
{
using (StreamReader file = new StreamReader(txtFile))
{
StringBuilder builder = new StringBuilder();
while (!file.EndOfStream)
{
builder.Append(file.ReadLine().Replace("\"", ""));
builder.AppendLine();
}
TxtResultaat.Text = builder.ToString();
}
}
catch (Exception ex)
{
if (!File.Exists(txtFile))
{
MessageBox.Show(ex.Message, "File not found");
}
else
{
MessageBox.Show("an unknown error has occurred");
}
return;
}
}
I'm not sure how you want to pad the string so here are two ways. The first center pads the string so they all have the same length and the second aligns the email addresses.
public static string CenterPad(string s, int maxLength, string delimiter, char replaceWith='.')
{
int numDots = maxLength - s.Length + delimiter.Length;
return s.Replace(delimiter, new string(replaceWith, numDots));
}
public static string AlignSecond(string s, int maxLength, string delimiter, char replaceWith='.')
{
string [] parts = s.Split(new string[]{delimiter}, StringSplitOptions.None);
return parts[0].PadRight(maxLength, replaceWith) + parts[1];
}
public static void Main()
{
string [] tests = {"dean harris, dean.harris@outlook.com",
"john smith, john@example.com",
"sally washington, sallywashington@example.com"};
foreach (var s in tests) {
Console.WriteLine(CenterPad(s, 50, ", "));
}
Console.WriteLine();
foreach (var s in tests) {
Console.WriteLine(AlignSecond(s, 25, ", "));
}
}
Output:
dean harris................dean.harris@outlook.com
john smith........................john@example.com
sally washington.......sallywashington@example.com

dean harris..............dean.harris@outlook.com
john smith...............john@example.com
sally washington.........sallywashington@example.com
If you want to pad a string, you can use the padding methods of the String class (https://learn.microsoft.com/en-us/dotnet/standard/base-types/padding) and manipulate each string before assigning the result to the Text property of TxtResultaat.
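As a rough illustration of that suggestion, the sketch below pads the name part of each "name, email" line with dots using String.PadRight. PadLine and its nameWidth parameter are hypothetical names, and the sketch assumes that the first comma separates the name from the address.

```csharp
using System;

public static class PaddingDemo
{
    // Turns "name, email" into "name.....email" by padding the name with dots
    // up to a fixed column width. Lines without a comma are returned unchanged.
    public static string PadLine(string line, int nameWidth)
    {
        string[] parts = line.Split(new[] { ',' }, 2);
        if (parts.Length < 2)
        {
            return line;
        }
        return parts[0].Trim().PadRight(nameWidth, '.') + parts[1].Trim();
    }

    public static void Main()
    {
        Console.WriteLine(PadLine("dean harris, dean.harris@outlook.com", 22));
    }
}
```

In the click handler this would be applied to each line before it is appended to the StringBuilder. The columns only line up visually if the TextBox uses a monospaced font.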

Dealing with multiple '.' in a file extension

I have this string that contains a filename
string filename = "C:\\Users\\me\\Desktop\\filename.This.Is.An.Extension";
I tried using the conventional
string modifiedFileName = System.IO.Path.GetFileNameWithoutExtension(filename);
but it only gets me:
modifiedFileName = "C:\\Users\\me\\Desktop\\filename.This.Is.An"
In order for me to get "C:\\Users\\me\\Desktop\\filename" I would have to use System.IO.Path.GetFileNameWithoutExtension several times, and that's just not efficient.
What better way is there to take my file name and have it return the directory + filename with no extensions?
Many thanks in advance!
If you want to stop at the first period, you will have to handle it yourself:
Path.Combine(Path.GetDirectoryName(filepath), Path.GetFileName(filepath).UpTo("."))
using this string extension, which returns the whole string when the stopper is absent:
public static string UpTo(this string s, string stopper)
{
int i = s.IndexOf(stopper, StringComparison.Ordinal);
return i < 0 ? s : s.Substring(0, i);
}
Take the directory and the base name:
var directoryPath = Path.GetDirectoryName(filename);
var baseName = Path.GetFileName(filename);
Strip the base name’s “extensions”:
var baseNameWithoutExtensions = baseName.Split(new[] {'.'}, 2)[0];
Recombine them:
var modifiedFileName = Path.Combine(directoryPath, baseNameWithoutExtensions);
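Put together, the three steps above could look like the following sketch. StripAllExtensions and PathHelper are hypothetical names. The sketch assumes the path uses the separator of the platform it runs on, so a Windows-style path is only split correctly on Windows.

```csharp
using System;
using System.IO;

public static class PathHelper
{
    // Returns the directory plus the file name cut at its first '.'.
    public static string StripAllExtensions(string path)
    {
        // GetDirectoryName returns null for a root path; treat that as "no directory".
        string directory = Path.GetDirectoryName(path) ?? string.Empty;
        string baseName = Path.GetFileName(path);
        // Split into at most two parts and keep everything before the first dot.
        string stem = baseName.Split(new[] { '.' }, 2)[0];
        return Path.Combine(directory, stem);
    }

    public static void Main()
    {
        string sample = Path.Combine("reports", "filename.This.Is.An.Extension");
        Console.WriteLine(StripAllExtensions(sample));
    }
}
```

One design consequence: a name that starts with a dot, such as ".gitignore", has an empty stem, so only the directory comes back. Whether that is acceptable depends on the files you expect.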
Without built in function:
public static void Main(string[] args)
{
string s = "C:\\Users\\me\\Desktop\\filename.This.Is.An.Extension";
string newString="";
for(int i=0;i<s.Length;i++)
{
if(s[i]=='.'){
break;
}else{
newString += s[i].ToString();
}
}
Console.WriteLine(newString); //writes "C:\Users\me\Desktop\filename"
}

Delete all but {x} C# string

I'm trying to cycle through a .txt to build a test function for another application I'm building.
I've got a list of UK based lat/long values that are formatted like this:
Latitude: 57°39′55″N 57.665198
Longitude: 6°57′27″W -6.95739395
Distance: 184.8338 mi Bearing: 329.815°
with the intended result of this small application being just the lat/long values:
57.665198
-6.95739395
So far I've got a StreamReader working with a myString.StartsWith("Latitude") {} but I'm stuck.
How do I detect a separator of two spaces ("  ") inside a string and delete everything before it? My code so far is this:
static void Main(string[] args)
{
string text = "";
using (var streamReader = new StreamReader(@"c:\mb\latlong.txt", Encoding.UTF8))
{
text = streamReader.ReadToEnd();
if (text.Trim().StartsWith("Latitude: "))
{
text.Split()
} else if (text.StartsWith("Distance: "))
{
} else if (text.StartsWith(""))
{
}
streamReader.ReadLine();
}
Console.ReadKey();
}
Thanks in advance
You can try using regular expressions
var result = File
.ReadLines(@"C:\MyFile.txt")
.SelectMany(line => Regex
.Matches(line, @"(?<=\s)-?[0-9]+(\.[0-9]+)*$")
.OfType<Match>()
.Select(match => match.Value));
Test
// 57.665198
// -6.95739395
Console.Write(String.Join(Environment.NewLine, result));
Use string.IndexOf("  ") (note the two spaces) to find the position of the double space in the string. Then you can use string.Substring(position) to get the string after that point.
In your code:
if (text.Trim().StartsWith("Latitude: "))
{
var positionOfTwoSpaces = text.IndexOf("  ");
var latString = text.Substring(positionOfTwoSpaces);
var latValue = double.Parse(latString); // double keeps all the decimal places
}
You can try the regular expression solution. (You might need to adjust the number of spaces in the regex definitions.)
static void Main(string[] args)
{
    Regex lat = new Regex("Latitude: .+? (.+)");
    Regex lon = new Regex("Longitude: .+? (.+)");
    using (var streamReader = new StreamReader(@"c:\mb\latlong.txt", Encoding.UTF8))
    {
        string line;
        while ((line = streamReader.ReadLine()) != null)
        {
            if (lat.IsMatch(line))
                Console.WriteLine(lat.Match(line).Groups[1].Value); // latitude
            else if (lon.IsMatch(line))
                Console.WriteLine(lon.Match(line).Groups[1].Value); // longitude
        }
    }
    Console.ReadKey();
}
A simple solution would be
string[] fileLines = System.IO.File.ReadAllLines("input file path");
List<string> resultLines = new List<string>();
foreach (string line in fileLines) {
    // Split on a double space; the string[] overload works on all .NET versions.
    string[] parts = line.Split(new[] { "  " }, StringSplitOptions.None);
    if (parts.Length > 1) {
        string lastPart = parts.LastOrDefault();
        if (!string.IsNullOrEmpty(lastPart)) {
            resultLines.Add(lastPart);
        }
    }
}
System.IO.File.WriteAllLines("output file path", resultLines.ToArray());
As I already suggested in my comment, you can look for the last occurrence of a space and take the substring from there.
using System;
using System.IO;
using System.Text;
public class Test
{
    public static void Main()
    {
        using (var streamReader = new StreamReader(@"c:\mb\latlong.txt", Encoding.UTF8))
        {
            string line;
            // Read until the end of the file; a blank line does not stop the loop.
            while ((line = streamReader.ReadLine()) != null)
            {
                if (line.StartsWith("Latitude:"))
                {
                    line = line.Substring(line.LastIndexOf(' ') + 1);
                    Console.WriteLine(line);
                }
            }
        }
        Console.ReadKey();
    }
}
I didn't provide all the code because the longitude case is just copy and paste. I think you can do that on your own. :)
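For completeness, here is a small self-contained sketch of the "last space" approach that covers both the latitude and the longitude lines. ExtractDecimals is a hypothetical helper name, and the sample lines are copied from the question rather than read from a file.

```csharp
using System;
using System.Collections.Generic;

public static class LatLongExtractor
{
    // For lines starting with "Latitude:" or "Longitude:", returns the text
    // after the last space, i.e. the decimal value at the end of the line.
    public static List<string> ExtractDecimals(IEnumerable<string> lines)
    {
        var values = new List<string>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.StartsWith("Latitude:", StringComparison.Ordinal) ||
                line.StartsWith("Longitude:", StringComparison.Ordinal))
            {
                values.Add(line.Substring(line.LastIndexOf(' ') + 1));
            }
        }
        return values;
    }

    public static void Main()
    {
        string[] sample =
        {
            "Latitude: 57°39′55″N 57.665198",
            "Longitude: 6°57′27″W -6.95739395",
            "Distance: 184.8338 mi Bearing: 329.815°"
        };
        foreach (string value in ExtractDecimals(sample))
        {
            Console.WriteLine(value);
        }
    }
}
```

To process a real file, the sample array could be replaced by File.ReadLines with the path from the question.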

Parsing string C#

So here is my problem: I'm trying to get the content of a text file as a string and then parse it. What I want is an array containing each word and only words (no blanks, no whitespace, no \n ...). I use a function, LireFichier, that returns a string containing the text from the file (this works, because the text is displayed correctly), but when I try to parse it, the parsing fails and seems to do random concatenation on my string, and I don't understand why.
Here is the content of the text file I'm using:
truc,
ohoh,
toto, tata, titi, tutu,
tete,
and here's my final string :
;tete;;titi;;tata;;titi;;tutu;
which should be:
truc;ohoh;toto;tata;titi;tutu;tete;
Here is the code I wrote (all using are ok):
namespace ConsoleApplication1{
class Program
{
static void Main(string[] args)
{
string chemin = "MYPATH";
string res = LireFichier(chemin);
Console.WriteLine("End of reading...");
Console.WriteLine("{0}",res);// The result at this point is good
Console.WriteLine("...starting parsing");
res = parseString(res);
Console.WriteLine("Chaine finale : {0}", res);//The result here is awful
Console.ReadLine();//pause
}
public static string LireFichier(string FilePath) //Read the file, send back a string with the text
{
StreamReader streamReader = new StreamReader(FilePath);
string text = streamReader.ReadToEnd();
streamReader.Close();
return text;
}
public static string parseString(string phrase)//is supposed to parse the string
{
string fin="\n";
char[] delimiterChars = { ' ','\n',',','\0'};
string[] words = phrase.Split(delimiterChars);
TabToString(words);//I check the content of my tab
for(int i=0;i<words.Length;i++)
{
if (words[i] != null)
{
fin += words[i] +";";
Console.WriteLine(fin);//help for debug
}
}
return fin;
}
public static void TabToString(string[] montab)//display the content of my tab
{
foreach(string s in montab)
{
Console.WriteLine(s);
}
}
}//Fin de la class Program
}
I think your main issue is the empty entries that Split returns; tell it to drop them:
string[] words = phrase.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
You could try using the string splitting option to remove empty entries for you:
string[] words = phrase.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
See the documentation for StringSplitOptions.RemoveEmptyEntries.
Try this:
class Program
{
static void Main(string[] args)
{
var inString = LireFichier(@"C:\temp\file.txt");
Console.WriteLine(ParseString(inString));
Console.ReadKey();
}
public static string LireFichier(string FilePath) //Read the file, send back a string with the text
{
using (StreamReader streamReader = new StreamReader(FilePath))
{
// Leaving the using block disposes (and closes) the reader.
return streamReader.ReadToEnd();
}
}
public static string ParseString(string input)
{
input = input.Replace(Environment.NewLine,string.Empty);
input = input.Replace(" ", string.Empty);
string[] chunks = input.Split(',');
StringBuilder sb = new StringBuilder();
foreach (string s in chunks)
{
sb.Append(s);
sb.Append(";");
}
return sb.ToString(0, sb.Length - 1);
}
}
Or this:
public static string ParseFile(string FilePath)
{
using (var streamReader = new StreamReader(FilePath))
{
return streamReader.ReadToEnd().Replace(Environment.NewLine, string.Empty).Replace(" ", string.Empty).Replace(',', ';');
}
}
Your main problem is that you are splitting on \n, but the line breaks read from your file are \r\n.
Your output string does contain all of your items, but the \r characters left in it cause later "lines" to overwrite earlier "lines" on the console.
(\r is a "return to start of line" instruction; without the \n "move to the next line" instruction, your words from line 1 are overwritten by those from line 2, then line 3 and line 4.)
You need to split on \r as well as \n, and check that a string is not null or empty before adding it to your output (or, preferably, use StringSplitOptions.RemoveEmptyEntries as others have mentioned).
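The fix described above can be sketched as follows. JoinWords is a hypothetical name, and the sample text reproduces the question's file content with Windows line breaks (an assumption about how the file was saved).

```csharp
using System;

public static class WordSplitter
{
    // Splits on spaces, commas, tabs, and both halves of a Windows line break,
    // dropping the empty entries produced by adjacent delimiters.
    public static string JoinWords(string text)
    {
        char[] delimiters = { ' ', ',', '\r', '\n', '\t' };
        string[] words = text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
        return words.Length == 0 ? string.Empty : string.Join(";", words) + ";";
    }

    public static void Main()
    {
        string text = "truc,\r\nohoh,\r\ntoto, tata, titi, tutu,\r\ntete,";
        Console.WriteLine(JoinWords(text)); // truc;ohoh;toto;tata;titi;tutu;tete;
    }
}
```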
string ParseString(string filename) {
    return string.Join(";", System.IO.File.ReadAllLines(filename)
        .SelectMany(line => line.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        .Select(word => word.Trim())
        .Where(word => word.Length > 0)) + ";";
}

Format String "Hello\World" to "HelloWorld"

I loop over the value in the first column of each row of a DataGridView. The values have a "\" in the middle. How do I convert the string so that it has no "\"?
e.g.
"Hello\World" to "HelloWorld"
"Hi\There" to "HiThere"
etc
String handling
string hello = "Hello\\World";
string helloWithoutBackslashes = hello.Replace("\\",string.Empty);
or, using the @ verbatim string prefix
string hi = @"Hi\There";
string hiWithoutBackslashes = hi.Replace(@"\",string.Empty);
I thought I would mix it up a bit.
public class StringCleaner
{
private readonly string dirtyString;
public StringCleaner(string dirtyString)
{
this.dirtyString = dirtyString;
}
public string Clean()
{
using (var sw = new System.IO.StringWriter())
{
foreach (char c in dirtyString)
{
if (c != '\\') sw.Write(c);
}
return sw.ToString();
}
}
}
