C# Check string for specific length of numbers

I have the ability to search and return files in a given file location. I also have the ability to return a number sequence from the file name as such:
public List<AvailableFile> GetAvailableFiles(string rootFolder)
{
List<AvailableFile> files = new List<AvailableFile>();
if (Directory.Exists(rootFolder))
{
Log.Info("Checking folder: " + rootFolder + " for files");
try
{
foreach (string f in Directory.GetFiles(rootFolder))
{
files = FileUpload.CreateFileList(f);
var getNumbers = new String(f.Where(Char.IsDigit).ToArray());
System.Diagnostics.Debug.WriteLine(getNumbers);
}
}
catch (System.Exception excpt)
{
Log.Fatal("GetAvailableFiles failed: " + excpt.Message);
}
}
return files;
}
What I want to do now is return only a sequence of digits that is exactly 8 characters long. For example, for a file named New File1 12345678 123, I only care about getting 12345678 back.
How can I modify my method to achieve this?

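A regex seems to be good for this:
var r = new Regex("(?<!\\d)(\\d{8})(?!\\d)"); // exactly 8 digits, not part of a longer run
foreach (string f in Directory.GetFiles(rootFolder))
{
files = FileUpload.CreateFileList(f);
var match = r.Match(Path.GetFileName(f)); // search the file name only, not the whole path
if (match.Success)
{
Console.WriteLine(match.Groups[1]); // be aware that index zero contains the entire matched string
}
}
The regex matches the first run of exactly 8 digits and puts it into the match's Groups collection. The lookarounds (?<!\d) and (?!\d) keep it from matching part of a longer number.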

You could use a regular expression:
var match = Regex.Match(input, @"\d{8}");
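Note that a bare \d{8} also matches the first 8 digits of a longer number. A minimal sketch of one way to require exactly 8 digits follows, using lookarounds. The class and method names (DigitRuns, FindExact) are made up for illustration and are not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class DigitRuns
{
    // Returns the first run of exactly `length` digits in `input`,
    // or null when there is no such run.
    public static string FindExact(string input, int length)
    {
        // (?<!\d) and (?!\d) reject a match that is part of a longer digit run.
        string pattern = @"(?<!\d)\d{" + length + @"}(?!\d)";
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : null;
    }
}
```

Calling DigitRuns.FindExact("New File1 12345678 123", 8) would give back 12345678, while a name that only contains a 9-digit number gives null.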

Related

I want to read text and replace parts that are surrounded by signals

I have a chunk of text for example
string OriginalText = "Hello my name is <!name!> and I am <!age!> years old";
I'm struggling to write a function that takes this text and returns the same string, except with the values surrounded by the tags "<!" and "!>" replaced with actual values. I have some code written but don't know how to progress any further.
if(OriginalText.Contains("<!")) //Checks if Change is necessary
{
string[] Total = OriginalText.Split(
new Char[] { '<', '!' },
StringSplitOptions.RemoveEmptyEntries);
if(Total[1].Contains("!>")) //Checks if closing tag exists
{
string ExtTag = Total[1].Split(
new Char[] { '<', '!' },
StringSplitOptions.RemoveEmptyEntries)[0];
ExtData.Add(Total[1].Split(
new Char[] { '<', '!' },
StringSplitOptions.RemoveEmptyEntries)[0]);
return Total[1];
}
}
The desired output would be
"Hello my name is James and I am 21 years old"
I am currently getting this text from a database, so this function's purpose would be to read that text and insert the correct information.
Edit: I figured out how to do it, so I'm including it below. However, I'm writing this in a program called mattersphere, so there will be references to functions that aren't standard C#; I have put comments next to them explaining what they do.
private string ConvertCodeToExtData(string OriginalText) //Accepts text with the identifying symbols as placeholders
{
string[] OriginalWords = OriginalText.Split(' '); //Creates array of individual words
string ConvertedText = string.Empty;
int Index = 0;
foreach(string OriginalWord in OriginalWords) //Go through each word in the array
{
if(OriginalWord.Substring(0,1).Equals("<") && OriginalWord.Substring(OriginalWord.Length-1 ,1).Equals(">")) //Checks if Change is necessary
{
string[] ExtDataCodeAndSymbols = OriginalWord.Substring(1, OriginalWord.Length-2).Split('.'); //Decided to create 4 different parts inbetween the <> tags it goes Symbol(e.g £, $, #) . area to look . field . symbol //separates the Code Identifier and the ExtData and Code
try
{
foreach(ExtendedData ex in this.CurrentSession.CurrentFile.ExtendedData) //Search through All data connected to the file, Extended data is essentially all the data from the database that is specific to the current user
{
if(ex.Code.ToLower() == ExtDataCodeAndSymbols[1].ToLower())
{
OriginalWords[Index] = ExtDataCodeAndSymbols[0] + ex.GetExtendedData(ExtDataCodeAndSymbols[2]).ToString() + ExtDataCodeAndSymbols[3]; //Replace code with new data
break;
}
}
}
catch (Exception ex)
{
System.Windows.Forms.MessageBox.Show("Extended Data Field " + ExtDataCodeAndSymbols[1] + "." + ExtDataCodeAndSymbols[2] + " Not found, please speak to your system administrator"); //Handles Error if Ext Data is not found
}
}
Index++;
}
foreach(string Word in OriginalWords)
{
ConvertedText += Word + " "; //Adds all words into a single string and adds space
}
ConvertedText = ConvertedText.Remove(ConvertedText.Length - 1, 1); //Removes Last Space (strings are immutable, so the result must be assigned back)
return ConvertedText;
}
The text goes in "Hello my name is <.Person.name.> and I have <£.Account.Balance.> in my bank account" and comes out "Hello my name is James and I have £100 in my bank account"
The symbols are optional but the "." are necessary as they are used to split the strings early in the function
If you have to use <!...!> placeholders, I suggest regular expressions:
using System.Text.RegularExpressions;
...
string OriginalText = "Hello my name is <!name!> and I am <!age!> years old";
Dictionary<string, string> substitutes =
new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
{ "name", "John" },
{ "age", "108"},
};
string result = Regex
.Replace(OriginalText,
@"<!([A-Za-z0-9]+)!>", // let placeholder contain letter and digits
match => substitutes[match.Groups[1].Value]);
Console.WriteLine(result);
Outcome:
Hello my name is John and I am 108 years old
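The Regex.Replace call above throws a KeyNotFoundException when the text contains a placeholder that has no entry in the dictionary. A small sketch of a more forgiving variant follows, which leaves unknown placeholders untouched. The names Placeholders and Fill are hypothetical, and leaving unknown tags in place is an assumption about the desired behaviour.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Placeholders
{
    // Replaces every <!key!> in `template` with values[key].
    // Placeholders without a matching key are left as they are.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Regex.Replace(template, @"<!([A-Za-z0-9]+)!>", match =>
        {
            string key = match.Groups[1].Value;
            return values.TryGetValue(key, out string value) ? value : match.Value;
        });
    }
}
```

Passing a dictionary created with StringComparer.OrdinalIgnoreCase, as in the answer above, keeps the lookup case-insensitive.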
Assuming you are stuck with that format, and assuming you know the list of fields ahead of time, you can compose a dictionary of replacement strings and, well, replace them.
//Initialize fields dictionary
var fields = new Dictionary<string, string>();
fields.Add("name", "John");
fields.Add("age", "18");
//Replace each field if it is found
string text = OriginalText;
foreach (var key in fields.Keys)
{
string searchFor = "<!" + key + "!>";
text = text.Replace(searchFor, fields[key]);
}
If the values for the replacement fields come from a domain object, you could just iterate over the properties using reflection:
class Person
{
public string Name { get; set; }
public int Age { get; set; }
}
class Program
{
const string OriginalText = "Hello my name is <!name!> and I am <!age!> years old";
public static void Main()
{
var p = new Person();
p.Age = 18;
p.Name = "John";
//Initialize fields dictionary
var fields = new Dictionary<string, string>();
foreach (var prop in typeof(Person).GetProperties(BindingFlags.Public | BindingFlags.Instance))
{
fields.Add(prop.Name, prop.GetValue(p).ToString());
}
///etc....
And if you need the tag check to be case insensitive, you can use this instead of String.Replace():
string searchFor = @"\<\!" + Regex.Escape(key) + @"\!\>"; // escape the key in case it contains regex metacharacters
text = Regex.Replace(text, searchFor, fields[key], RegexOptions.IgnoreCase);
I think you're looking for this:
var str = string.Format("Hello my name is {0} and I am {1} years old", name, age);
Or, since C# 6, you can just use this:
var str = $"Hello my name is {name} and I am {age} years old";

C# Edit string in file - delete a character (000)

I am a rookie in C#, but I need to solve one problem.
I have several text files in a folder, and each text file has this structure:
IdNr 000000100
Name Name
Lastname Lastname
Sex M
.... etc...
Loading all the files from the folder is no problem, but I need to delete the leading zeros in IdNr, i.e. delete 000000 and leave 100, and then save the file. Each file has a different IdNr, which makes it harder :(
Yes, it is possible to edit each file manually, but with 3000 files that is not practical :)
Is there a C# algorithm that could delete the 000000 and leave only the number 100?
Thank you All.
Vaclav
So, thank you ALL !
But in the End I have this Code :-) :
using System.IO;
namespace name
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Browse_Click(object sender, EventArgs e)
{
DialogResult dialog = folderBrowserDialog1.ShowDialog();
if (dialog == DialogResult.OK)
TP_zdroj.Text = folderBrowserDialog1.SelectedPath;
}
private void start_Click(object sender, EventArgs e)
{
try
{
foreach (string file in Directory.GetFiles(TP_zdroj.Text, "*.txt"))
{
string text = File.ReadAllText(file, Encoding.Default);
text = System.Text.RegularExpressions.Regex.Replace(text, "IdNr 000*", "IdNr ");
File.WriteAllText(file, text, Encoding.Default);
}
}
catch
{
MessageBox.Show("Warning...!");
return;
}
{
MessageBox.Show("Done");
}
}
}
}
Thank you ALL ! ;)
You can use int.Parse:
int number = int.Parse("000000100");
String withoutzeros = number.ToString();
Regarding your read/save file issue: do the files contain more than one record? Is that the header, or is each record a list of key/value lines like "IdNr 000000100"? It's difficult to answer without this information.
Edit: Here's a simple but efficient approach which should work if the format is strict:
var files = Directory.EnumerateFiles(path, "*.txt", SearchOption.TopDirectoryOnly);
foreach (var fPath in files)
{
String[] oldLines = File.ReadAllLines(fPath); // load into memory is faster when the files are not really huge
String key = "IdNr ";
if (oldLines.Length != 0)
{
IList<String> newLines = new List<String>();
foreach (String line in oldLines)
{
String newLine = line;
if (line.Contains(key))
{
int numberRangeStart = line.IndexOf(key) + key.Length;
int numberRangeEnd = line.IndexOf(" ", numberRangeStart);
if (numberRangeEnd < 0) numberRangeEnd = line.Length; // the number runs to the end of the line
String numberStr = line.Substring(numberRangeStart, numberRangeEnd - numberRangeStart);
int number = int.Parse(numberStr);
String withoutZeros = number.ToString();
newLine = line.Replace(key + numberStr, key + withoutZeros);
}
newLines.Add(newLine); // add each line exactly once, modified or not
}
File.WriteAllLines(fPath, newLines);
}
}
Use TrimStart
var trimmedText = number.TrimStart('0');
This should do it. It assumes your files have a .txt extension, and it removes all occurrences of "000000" from each file.
foreach (string fileName in Directory.GetFiles(folderPath, "*.txt")) // folderPath: the folder that holds the files
{
File.WriteAllText(fileName, File.ReadAllText(fileName).Replace("000000", ""));
}
These are the steps you would want to take:
Loop each file
Read file line by line
for each line split on " " and remove leading zeros from 2nd element
write the new line back to a temp file
after all lines processed, delete original file and rename temp file
do next file
(you can avoid the temp file part by reading each file in full into memory, but depending on your file sizes this may not be practical)
You can remove the leading zeros with something like this:
string s = "000000100";
s = s.TrimStart('0');
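The steps listed above (loop over the files, read line by line, trim the zeros, write to a temp file, then swap the files) can be sketched roughly as follows. This is only an illustration under a few assumptions: the names IdNrCleaner, CleanLine, CleanFile and CleanFolder are made up, only lines whose first field is IdNr are changed, the temp file is the original path plus ".tmp", and the files are read with the default UTF-8 encoding, which may not suit every input.

```csharp
using System.IO;

public static class IdNrCleaner
{
    // Strips leading zeros from the second space-separated field of an "IdNr" line.
    // Other lines are returned unchanged.
    public static string CleanLine(string line)
    {
        string[] parts = line.Split(' ');
        if (parts.Length > 1 && parts[0] == "IdNr" && parts[1].Length > 0)
        {
            string trimmed = parts[1].TrimStart('0');
            parts[1] = trimmed.Length == 0 ? "0" : trimmed; // keep one zero for "000"
        }
        return string.Join(" ", parts);
    }

    // Writes the cleaned lines to a temp file, then replaces the original with it.
    public static void CleanFile(string path)
    {
        string tempPath = path + ".tmp"; // assumed temp file name
        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(CleanLine(line));
            }
        }
        File.Delete(path);
        File.Move(tempPath, path);
    }

    // Processes every .txt file directly inside `folder`.
    public static void CleanFolder(string folder)
    {
        foreach (string file in Directory.GetFiles(folder, "*.txt"))
        {
            CleanFile(file);
        }
    }
}
```

Because only one line is held in memory at a time, this variant should also cope with files that are too large to load in full.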
Simply, read every token from the file and use this method:
var token = "000000100";
var result = token.TrimStart('0');
You can write a function similar to this one:
static IEnumerable<string> ModifiedLines(string file) {
string line;
using(var reader = File.OpenText(file)) {
while((line = reader.ReadLine()) != null) {
string[] tokens = line.Split(new char[] { ' ' });
line = string.Empty;
foreach (var token in tokens)
{
line += token.TrimStart('0') + " ";
}
yield return line;
}
}
}
Usage:
File.WriteAllLines(file, ModifiedLines(file).ToList()); // ToList() (System.Linq) reads the whole file before it is overwritten

The best way to split a string without a separator

I have string:
MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938
From the string above you can see that it is separated by colons (:), but in the actual environment it does not have a standard layout.
The only standard part is the field names, for example MONEY-ID and MONEY-STAT.
How can I split it the right way and get the value that follows each field name?
Something like that should work:
string s = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
Regex regex = new Regex(@"MONEY-ID(?<moneyId>.*?)\:MONEY-STAT(?<moneyStat>.*?)\:MONEY-PAYetr-(?<moneyPaetr>.*?)$");
Match match = regex.Match(s);
if (match.Success)
{
Console.WriteLine("Money ID: " + match.Groups["moneyId"].Value);
Console.WriteLine("Money Stat: " + match.Groups["moneyStat"].Value);
Console.WriteLine("Money Paetr: " + match.Groups["moneyPaetr"].Value);
}
Console.WriteLine("hit <enter>");
Console.ReadLine();
UPDATE
To answer the additional question: if we're not sure of the format, then something like the following could be used:
string s = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
var itemsToExtract = new List<string> { "MONEY-STAT", "MONEY-PAYetr-", "MONEY-ID", };
string regexFormat = @"{0}(?<{1}>[\d]*?)[^\w]";//sample - MONEY-ID(?<moneyId>.*?)\:
foreach (var item in itemsToExtract)
{
string input = s + ":";// quick barbarian fix of lack of my knowledge of regex. Sorry
var match = Regex.Match(input, string.Format(regexFormat, item, "match"));
if (match.Success)
{
Console.WriteLine("Value of {0} is:{1}", item, match.Groups["match"]);
}
}
Console.WriteLine("hit <enter>");
Console.ReadLine();
As Andre said, I would personally go with regular expressions.
Use groups of something like,
"MONEY-ID(?<moneyid>.*)MONEY-STAT(?<moneystat>.*)MONEY-PAYetr(?<moneypay>.*)"
See this post for how to extract the groups.
Probably followed by a private method that trims off illegal characters in the matched group (e.g. : or -).
Check this out:
string regex = @"^(?i:money-id)(?<moneyid>.*)(?i:money-stat)(?<moneystat>.*)(?i:money-pay)(?<moneypay>.*)$";
string input = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
Match regexMatch = Regex.Match(input, regex);
string moneyID = regexMatch.Groups["moneyid"].Captures[0].Value.Trim();
string moneyStat = regexMatch.Groups["moneystat"].Captures[0].Value.Trim();
string moneyPay = regexMatch.Groups["moneypay"].Captures[0].Value.Trim();
Try
string data = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
data = data.Replace("MONEY-", ";");
string[] myArray = data.Split(';');
foreach (string s in myArray)
{
if (!string.IsNullOrEmpty(s))
{
if (s.StartsWith("ID"))
{
}
else if (s.StartsWith("STAT"))
{
}
else if (s.StartsWith("PAYetr"))
{
}
}
}
results in
ID123456:
STAT43:
PAYetr-1232832938
For example, using regular expressions,
(?<=MONEY-ID)(\d)*
It will extract
123456
from your string.
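The same lookbehind idea can be applied to every known field name, which works regardless of the order of the fields or the separator between them. Below is a rough sketch; FieldExtractor and Extract are illustrative names, and it assumes each value is a run of digits that directly follows its field name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldExtractor
{
    // For each field name, captures the digits that directly follow it in `input`.
    // Field names that do not occur (or have no digits after them) are skipped.
    public static Dictionary<string, string> Extract(string input, IEnumerable<string> fieldNames)
    {
        var result = new Dictionary<string, string>();
        foreach (string name in fieldNames)
        {
            // (?<=NAME) is a lookbehind: the match starts right after the field name.
            Match match = Regex.Match(input, "(?<=" + Regex.Escape(name) + @")\d+");
            if (match.Success)
            {
                result[name] = match.Value;
            }
        }
        return result;
    }
}
```

For the sample string, asking for MONEY-ID, MONEY-STAT and MONEY-PAYetr- would yield 123456, 43 and 1232832938 respectively.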

Problem with Existing File Name & Creating a Unique File Name

I have this code:
public void FileCleanup(List<string> paths)
{
string regPattern = (@"[~#&!%+{}]+");
string replacement = "";
string replacement_unique = "_";
Regex regExPattern = new Regex(regPattern);
List<string> existingNames = new List<string>();
StreamWriter errors = new StreamWriter(@"C:\Documents and Settings\jane.doe\Desktop\SharePointTesting\Errors.txt");
StreamWriter resultsofRename = new StreamWriter(@"C:\Documents and Settings\jane.doe\Desktop\SharePointTesting\Results of File Rename.txt");
foreach (string files2 in paths)
try
{
string filenameOnly = Path.GetFileName(files2);
string pathOnly = Path.GetDirectoryName(files2);
string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
string sanitized = Path.Combine(pathOnly, sanitizedFileName);
if (!System.IO.File.Exists(sanitized))
{
existingNames.Add(sanitized);
try
{
foreach (string names in existingNames)
{
string filename = Path.GetFileName(names);
string filepath = Path.GetDirectoryName(names);
string cleanName = regExPattern.Replace(filename, replacement_unique);
string scrubbed = Path.Combine(filepath, cleanName);
System.IO.File.Move(names, scrubbed);
//resultsofRename.Write("Path: " + pathOnly + " / " + "Old File Name: " + filenameOnly + "New File Name: " + sanitized + "\r\n" + "\r\n");
resultsofRename = File.AppendText("Path: " + filepath + " / " + "Old File Name: " + filename + "New File Name: " + scrubbed + "\r\n" + "\r\n");
}
}
catch (Exception e)
{
errors.Write(e);
}
}
else
{
System.IO.File.Move(files2, sanitized);
resultsofRename.Write("Path: " + pathOnly + " / " + "Old File Name: " + filenameOnly + "New File Name: " + sanitized + "\r\n" + "\r\n");
}
}
catch (Exception e)
{
//write to streamwriter
}
}
}
}
What i'm trying to do here is rename "dirty" filenames by removing invalid chars (defined in the Regex), replace them with "". However, i noticed if i have duplicate file names, the app does not rename them. I.e. if i have ##test.txt and ~~test.txt in the same folder, they'd be renamed to test.txt. So, i created another foreach loop that instead replaces the invalid char with a "_" versus a blank space.
Problem is, whenever i try to run this, nothing ends up happening! None of the files are renamed!
Can someone tell me if my code is incorrect and how to fix it?
ALSO-- does anybody know how i could replace the invalid char in the 2nd foreach loop with a different char everytime? That way if there are multiple instances of i.e. %Test.txt, ~Test.txt and #test.txt (all to be renamed to test.txt), they can somehow be uniquely named with a different char?
However, would you know how to replace the invalid char with a different unique character every time so that each filename remains unique?
This is one way:
char[] uniques = ",'%".ToCharArray(); // whatever chars you want
foreach (string file in files)
{
foreach (char c in uniques)
{
string replaced = regexPattern.Replace(file, c.ToString());
if (File.Exists(replaced)) continue; // name already taken, try the next character
// create file
break; // stop at the first free name
}
}
You may of course want to refactor this into its own method. Take note also that the maximum number of files only differing by unique character is limited to the number of characters in your uniques array, so if you have a lot of files with the same name only differing by the special characters you listed, it might be wise to use a different method, such as appending a digit to the end of the file name.
how would i append a digit to the end of the file name (with a different # everytime?)
A slightly modified version of Josh's suggestion would work that keeps track of the modified file names mapped to the number of times the same file name has been generated after the replacement:
var filesCount = new Dictionary<string, int>();
string replaceSpecialCharsWith = "_"; // or "", whatever
foreach (string file in files)
{
string sanitizedPath = regexPattern.Replace(file, replaceSpecialCharsWith);
if (filesCount.ContainsKey(sanitizedPath))
{
filesCount[sanitizedPath]++; // the dictionary is keyed by the sanitized path, not the original one
}
else
{
filesCount.Add(sanitizedPath, 0);
}
string newFileName = String.Format("{0}{1}{2}",
Path.GetFileNameWithoutExtension(sanitizedPath),
filesCount[sanitizedPath] != 0
? filesCount[sanitizedPath].ToString()
: "",
Path.GetExtension(sanitizedPath));
string newFilePath = Path.Combine(Path.GetDirectoryName(sanitizedPath),
newFileName);
// create file...
}
just a suggestion
after removing/replacing the special characters append timestamp to the file name. timestamps are unique so appending them to filenames will give you a unique filename.
How about maintaining a dictionary of all renamed files, checking each file against it, and if already existing add a number to the end of it?
In response to the answer @Josh Smeaton gave, here's some sample code using a dictionary to keep track of the file names:
class Program
{
private static readonly Dictionary<string,int> _fileNames = new Dictionary<string, int>();
static void Main(string[] args)
{
var fileName = GetUniqueFileName("filename.txt");
Console.WriteLine(fileName);
fileName = GetUniqueFileName("someotherfilename.txt");
Console.WriteLine(fileName);
fileName = GetUniqueFileName("filename.txt");
Console.WriteLine(fileName);
fileName = GetUniqueFileName("adifferentfilename.txt");
Console.WriteLine(fileName);
fileName = GetUniqueFileName("filename.txt");
Console.WriteLine(fileName);
fileName = GetUniqueFileName("adifferentfilename.txt");
Console.WriteLine(fileName);
Console.ReadLine();
}
private static string GetUniqueFileName(string fileName)
{
// If not already in the dictionary add it otherwise increment the counter
if (!_fileNames.ContainsKey(fileName))
_fileNames.Add(fileName, 0);
else
_fileNames[fileName] += 1;
// Now return the new name, prefixing the counter if required (0 means it's just been added)
int count = _fileNames[fileName];
return count == 0 ? fileName : count + fileName; // avoids stripping the zeros out of counters such as 10 or 20
}
}

RegEx -- getting rid of double whitespaces?

I have an app that goes in, replaces "invalid" chars (as defined by my Regex) with a blankspace. I want it so that if there are 2 or more blank spaces in the filename, to trim one. For example:
Deal A & B.txt after my app runs, would be renamed to Deal A   B.txt (3 spaces b/w A and B). What i want is really this: Deal A B.txt (one space between A and B).
I'm trying to determine how to do this--i suppose my app will have to run through all filenames at least once to replace invalid chars and then run through filenames again to get rid of extraneous whitespace.
Can anybody help me with this?
Here is my code currently for replacing the invalid chars:
public partial class CleanNames : Form
{
public CleanNames()
{
InitializeComponent();
}
public void Sanitizer(List<string> paths)
{
string regPattern = (@"[~#&$!%+{}]+");
string replacement = " ";
Regex regExPattern = new Regex(regPattern);
StreamWriter errors = new StreamWriter(@"S:\Testing\Errors.txt", true);
var filesCount = new Dictionary<string, int>();
dataGridView1.Rows.Clear();
try
{
foreach (string files2 in paths)
{
string filenameOnly = System.IO.Path.GetFileName(files2);
string pathOnly = System.IO.Path.GetDirectoryName(files2);
string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
string sanitized = System.IO.Path.Combine(pathOnly, sanitizedFileName);
if (!System.IO.File.Exists(sanitized))
{
DataGridViewRow clean = new DataGridViewRow();
clean.CreateCells(dataGridView1);
clean.Cells[0].Value = pathOnly;
clean.Cells[1].Value = filenameOnly;
clean.Cells[2].Value = sanitizedFileName;
dataGridView1.Rows.Add(clean);
System.IO.File.Move(files2, sanitized);
}
else
{
if (filesCount.ContainsKey(sanitized))
{
filesCount[sanitized]++;
}
else
{
filesCount.Add(sanitized, 1);
}
string newFileName = String.Format("{0}{1}{2}",
System.IO.Path.GetFileNameWithoutExtension(sanitized),
filesCount[sanitized].ToString(),
System.IO.Path.GetExtension(sanitized));
string newFilePath = System.IO.Path.Combine(System.IO.Path.GetDirectoryName(sanitized), newFileName);
System.IO.File.Move(files2, newFilePath);
sanitized = newFileName;
DataGridViewRow clean = new DataGridViewRow();
clean.CreateCells(dataGridView1);
clean.Cells[0].Value = pathOnly;
clean.Cells[1].Value = filenameOnly;
clean.Cells[2].Value = newFileName;
dataGridView1.Rows.Add(clean);
}
}
}
catch (Exception e)
{
errors.Write(e);
}
}
private void SanitizeFileNames_Load(object sender, EventArgs e)
{ }
private void dataGridView1_CellContentClick(object sender, DataGridViewCellEventArgs e)
{
}
private void button1_Click(object sender, EventArgs e)
{
Application.Exit();
}
}
The problem is, that not all files after a rename will have the same amount of blankspaces. As in, i could have Deal A&B.txt which after a rename would become Deal A B.txt (1 space b/w A and B--this is fine). But i will also have files that are like: Deal A & B & C.txt which after a rename is: Deal A   B   C.txt (3 spaces between A,B and C--not acceptable).
Does anybody have any ideas/code for how to accomplish this?
Do the local equivalent of:
s/\s+/ /g;
Just add a space to your regPattern. Any collection of invalid characters and spaces will be replaced with a single space. You may waste a little bit of time replacing a space with a space, but on the other hand you won't need a second string manipulation call.
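As a rough illustration of that suggestion, the sketch below adds a space to the character class from the question, so that any mix of invalid characters and spaces collapses into one space in a single pass. NameSanitizer and Sanitize are hypothetical names used only for this example.

```csharp
using System.Text.RegularExpressions;

public static class NameSanitizer
{
    // Same character class as in the question, plus a space at the end.
    private static readonly Regex InvalidOrSpace = new Regex(@"[~#&$!%+{} ]+");

    // Replaces every run of invalid characters and/or spaces with one space.
    public static string Sanitize(string fileName)
    {
        return InvalidOrSpace.Replace(fileName, " ");
    }
}
```

A name that starts or ends with an invalid character would still be left with a leading or trailing space, so a Trim() afterwards may be worth considering.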
Does this help?
var regex = new System.Text.RegularExpressions.Regex("\\s{2,}");
var result = regex.Replace("Some text with a lot of spaces, and 2\t\ttabs.", " ");
Console.WriteLine(result);
output is:
Some text with a lot of spaces, and 2 tabs.
It just replaces any sequence of 2 or more whitespace characters with a single space...
Edit:
To clarify, I would just perform this regex right after your existing one:
public void Sanitizer(List<string> paths)
{
string regPattern = (@"[~#&$!%+{}]+");
string replacement = " ";
Regex regExPattern = new Regex(regPattern);
Regex regExPattern2 = new Regex(@"\s{2,}");
and:
foreach (string files2 in paths)
{
string filenameOnly = System.IO.Path.GetFileName(files2);
string pathOnly = System.IO.Path.GetDirectoryName(files2);
string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
sanitizedFileName = regExPattern2.Replace(sanitizedFileName, replacement); // clean up whitespace
string sanitized = System.IO.Path.Combine(pathOnly, sanitizedFileName);
I hope that makes more sense.
you can perform another regex replace after your first one
@" +" -> " "
As Fosco said, with formatting:
while (mystring.Contains("  ")) mystring = mystring.Replace("  ", " ");
// Contains and the first Replace argument are two spaces; the second Replace argument is one space
After you're done sanitizing it your way, simply replace 2 spaces with 1 space, while 2 spaces exist in the string.
while (mystring.Contains("  ")) mystring = mystring.Replace("  ", " ");
I think that's the right syntax...
