Search for files that have a common naming convention - c#

I have a folder, full of 38,000+ .pdf files. I was not the genius to put them all into one folder, but I now have the task of separating them. The files that are of value to us, all have the same basic naming convention, for example:
123456_20130604_NEST_IV
456789_20120209_VERT_IT
What I'm trying to do, if possible, is search the folder for only those files with that particular naming convention. As in, search only for files that have 6 digits, an underscore, and then 8 digits followed by another underscore. Kind of like *****_********. I've searched online but I haven't had much luck. Any help would be great!

var regex = new Regex(#"^\d{6}_\d{8}_", RegexOptions.Compiled);
string[] files = Directory.GetFiles(folderPath)
.Where(path => regex.Match(Path.GetFileName(path)).Success)
.ToArray();
files would contain paths to a files, that match criteria.
For my example C:\Temp\123456_20130604_NEST_IV 456789_20120209_VERT_IT.pdf, which I've added beforehand.
As a bonus, here is PowerShell script to do this (assuming you are in the correct folder, otherwise use gc "C:\temp" instead of dir):
dir | Where-Object {$_ -match "^\d{6}_\d{8}_"}

? - single character
* - multiple characters
So, I would say use ?????? _ ???????? _ ???? _ ??.* to get all your files
You can use move or copy command from a command prompt to do that.
If you want to do advanced searches such as pattern matching, use windows grep: http://www.wingrep.com/

Are you familiar with regular expressions? If not, they are a generalized way to search for strings of a special format. I see you tagged your question with C# so assuming you are writing a C# script you might try the .NET regular expression module.
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
If you are a beginner, you may want to start here.
http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial

There are numerous ways to handle this. What I like to do is to divide work into different steps with clear output/data in each step. Hence I would tackle this in the following way (since this seems easier for me instead of writing a master program in c# that does everything):
Open windows command prompt (start/run/cmd), navigate to correct
folder and then "dir *.pdf > pdf_files.txt". This would give you a
file containing all pdf-files inside the specific folder.
open up the txt-file (pdf_files.txt) in Notepad++ and then press "ctrl + f
(find)" activate radio button "regular expressions"
type: [0-9]{6}_[0-9]{8}_.*\.pdf and press "Find all in current document"
Copy results and save to new .txt-file
Now you have a text file containing all pdf-files that you can do what you want with (create a c# program that parses the files and move them depending on their name or whatever)

Related

sony vegas script: probleme to get directory path

I d like to create a blind test generator with a script using Sony vegas 14. For this I must make my script in C#.
I don’t have many experiences in C# so maybe my problem is a very basic one.
To do my script I must use a library class (.dll) and execute my script by Sony vegas. To test my code easily I create a console app where I try my code and can easily print in the console what my code does.
For my program y need to get the path of all subdirectory in a Directory in a string.
My problem is the next one.
the command "Directory.GetDirectories" don't work
When I use the next code to check what in my array/list I get a coherent result if I use it in the console app version on my script (the number of subdirectories in my directory)
string[] dirs = Directory.GetDirectories(myDirectorypath, "", SearchOption.TopDirectoryOnly);// get all directory path in dirs
Console.WriteLine("the number of element in your array is "+ dirs.Length);
List<string> listdedossier = new List<string>(dirs); // convert the array in a list
Console.WriteLine("the number of element in your list is " + listdedossier.Count);
But when in paste my code in my dll project nothing is written in my array or my list. I notice this because when I want to print the number of elements in the list /array that return me 0
.
do you have any idea of what happen i my code?
thanks
You should check the online Microsoft documentation for GetDirectories. The 2nd argument is supposed to be a pattern to search for that conforms to Windows file name patterns. Essentially, all or part of a file name is allowed with * being a wildcard (The .* from regex meaning "match any character any number of times") and ? being a single character wildcard (regex .). You are providing an empty string, so you get nothing back. The pattern *.exe will match all executables in a folder (if you are using GetFiles, while the pattern pattern* matches any files/folders that start with pattern. If you want all directories, do this:
string[] dirs = Directory.GetDirectories(myDirectoryPath, "*", SearchOption.TopDirectoryOnly);
Next point, the path you provide can be either a relative path (e.g., "relative\path\to\folder"), an absolute path (e.g., "D:\path\to\folder"), or a fully qualified domain name (FQDN, e.g. "\servername.gov.edu.com\drive$\path\to\folder"). If you supply a relative path, you'll need to look up Windows' rules for path resolution. It is very easy using a relative path to search the wrong folder, or even a non-existent location (though you should get an exception in that case). Also remember: Windows path names are NOT case-sensitive.
Finally, when writing text with arguments, I HIGHLY recommend you use this format:
Console.WriteLine("The number of elements in your array is {0}", dirs.Length);
This uses a place holder in the string itself which has a numeric value in it. The number indicates what argument after the format string to use (0 is the first argument after the format string). You can use as many placeholders as you want, and use the same place holder in multiple locations. This is a more type-safe way to doing string printing in C# than using the + operator, which requires that an operator be defined that takes a string and whatever type you provided. When you use placeholders, WriteLine will use the built-in ToString method which is defined for all types in the Object class. Placeholders will always work, while using + will only sometimes work.

Regular Expression For File Name in C#

I am trying to find a regular expression to parse two sections out of the file name for the .resx files in my project. There is one main file called "UiText.resx" and then many translation .resx files with convention "UiText.ja-JP.resx". I need both the "UiText" and the "ja-JP" out of the latter string, as we do have other resx files that don't have to be for UiText (e.g. I have some files named "ExceptionText.resx").
The pattern I'm using right now (which works, it just requires a little extra coding after) is "(?<=\.)((.*?)(?=\.resx))". For the example above, "UiText.ja-JP.resx" gets me a match set in C# of "UiText.", "ja-JP.", "ja-JP.", ".resx"
Of course I am able to just take the first occurrence of "ja-JP." and "UiText." from this set and massage it to what I want, but I'd rather just have a cleaner "UiText" "ja-JP" and be done with it.
I figure I'll probably have to have at least two different patterns for this, so that is OK. Thank you in advance!
Since UiText seems to be constant you can use this regex to extract just js-JP into $1:
^UiText\.(.+?)\.resx$
https://regex101.com/r/XKvwHA/1/
If I'm understanding your needs correctly, then the main reason you need "UiText" is not because you have any value for the term itself, but rather because you need to filter your files. The real term you need to play around with is "ja-JP", which changes for the files you need.
If I'm correct, try this regex:
(?<=UiText\.).+(?=\.resx)
Used in C# as follows:
var fileName = "UiText.ja-JP.resx";
var result = new Regex(#"(?<=^UiText\.).+(?=\.resx$)").Match(fileName).Value;
A little explanation:
(?<=^UiText\.) Start of string must begin exactly with "UiText."
.+ Any number of characters (but at least one)
(?=\.resx$) End of string must end with ".resx"
Any file that doesn't meet your criteria will return an empty string for 'result'.

How can I replicate FileSystemWatcher Filter functionality with Regex in C#

I have a single FileSystemWatcher object for a c# project, and multiple patterns to match when events occur.
As a project requirement, I can't create multiple FileSystemObjects and use individual filter methods to match my patterns. So have to have one filter set to *.* to
monitor all files, and when events are created I have to apply a Regex to process matching files per pattern in a loop. My patterns will contain * characters, same as how we pass it to Filter method.
How can I replicate the built in filter functionality with single Regex statement?
Some of the many FileSystemWatcher filter scenarios;
pi*.txt (These should match: piano.txt, pie.txt)
*.doc (Anything with the .doc extension should match)
04*2013.txt (any text file and it should be any day in the month of april)
pi*ar*.jpg (prefix is pi, contains ar after prefix anywhere in the file name and has to be a jpg extension)
pre*ar*pre (any file with pre prefix, followed by ar and followed by another pre)
*.* (Anything is accepted)
Few time back I wrote something similar for my work
public static string WildcardPatternToRegexPattern(string pattern)
{
return string.Format("^{0}$", Regex.Escape(pattern.Replace('/', Path.DirectorySeparatorChar)).Replace(#"\*", ".*").Replace(#"\?", "."));
}
If you are looking to learn regex statements, you can start by looking at http://txt2re.com/. if you paste in a piece of text, there is a point and click interface to generate regex expressions for you. i would also recommend just googling "preg".

Perform find & replace in excel doc in C#

I am trying to create a postage label that is populated dynamically in C#.
What I thought I could do was create the label in Excel so I could get the layout perfect, then use OpenXML to replace the fields. i.e. I have fields called things like XXAddressLine1XX, where I want to edit the XML and replace that text with the actual Address Line 1 from the database.
Has anyone actually done something similar to this before and could post some code up that I could try? I've used OpenXML to do this with word documents before, but I can't seem to find the XML data for the Excel document when using OpenXML in c# so am struggling to make progress.
Either that, or are there any better methods I could try to accomplish this?
Every now and then, there's a programming problem that is best solved with a non-programming solution. Businesses have had the need for printing mailing labels en masse for a long time, and in recent decades programs like Microsoft Word make this really simple via the "mail merge" feature. See http://office.microsoft.com/en-us/word-help/demo-use-mail-merge-to-format-and-print-mailing-labels-HA001190394.aspx for an example.
Word's mail merge will allow you to connect to a variety of data sources. The given example uses an Excel spreadsheet, but you can also use Access or SQL Databases, etc.
The purely academic answer is: it's possible. I wrote a set of template parsing classes that search through a PowerPoint presentation replacing tags from a mark-up language I invented with charts and dynamic text objects fetched from a database. The trickiest part of the string replacement bit was handling tags that occurred across Run elements inside a Paragraph element. This occurs usually if you use special characters such as '{' or spaces in your tags. I was able to solve it by storing the text of the entire TextBody element in a gigantic character array (in your case it would be the contents of a Cell element), storing a list of extents that enumerated where in the character array each Run element began and ended, and then walking the character array while paying attention to the Run boundaries appropriately. Mind that if your tag spans across multiple Run elements you'd need to remove any extras and snip content across boundaries before you inserted the replacement Run. Unfortunately I cannot post any code because the work was done for a company, but that's the general idea of how to achieve it. I was not able to handle any newline cases (i.e. a tag occurs with a newline in it) because that would require writing a cross Paragraph indexer, which was beyond the scope of what I wanted to achieve. That could be done as well, but it would be significantly more difficult I think.
The Powershell script extension PSExcel have an search and replace feature:
PSExcel: https://github.com/RamblingCookieMonster/PSExcel
Import-Module d:\psexcel\trunk\PSExcel\PSExcel
# Get commands in the module
Get-Command -Module PSExcel
# Get help for a command
Get-Help Import-XLSX -Full
# Modify the path of the Excel file to your local svn ceck-out
Get-ChildItem "C:\tmp\Source" -Filter *.xlsx | Foreach-Object{
$fileName = $_.FullName
# Open an existing XLSX to search and set cells within
$Excel = New-Excel -Path $_.FullName
$Workbook = $Excel | Get-Workbook
#$Excel | Get-Worksheet
$Worksheet = $Workbook | Get-Worksheet -Name "Sheet1"
#Search for any cells like 'Chris'. Set them all to Chris2
$Worksheet | Search-CellValue {$_ -like 'Chris'} -As Passthru | Set-CellValue -Value "Chris2"
#Save your changes and close the ExcelPackage
$Excel | Save-Excel -Close
}

How to get files in a relative path in C#

If I have an executable called app.exe which is what I am coding in C#, how would I get files from a folder loaded in the same directory as the app.exe, using relative paths?
This throws an illegal characters in path exception:
string [ ] files = Directory.GetFiles ( "\\Archive\\*.zip" );
How would one do this in C#?
To make sure you have the application's path (and not just the current directory), use this:
http://msdn.microsoft.com/en-us/library/system.diagnostics.process.getcurrentprocess.aspx
Now you have a Process object that represents the process that is running.
Then use Process.MainModule.FileName:
http://msdn.microsoft.com/en-us/library/system.diagnostics.processmodule.filename.aspx
Finally, use Path.GetDirectoryName to get the folder containing the .exe:
http://msdn.microsoft.com/en-us/library/system.io.path.getdirectoryname.aspx
So this is what you want:
string folder = Path.GetDirectoryName(Process.GetCurrentProcess().MainModule.FileName) + #"\Archive\";
string filter = "*.zip";
string[] files = Directory.GetFiles(folder, filter);
(Notice that "\Archive\" from your question is now #"\Archive\": you need the # so that the \ backslashes aren't interpreted as the start of an escape sequence)
Hope that helps!
string currentDirectory = Path.GetDirectoryName(Assembly.GetEntryAssembly().Location);
string archiveFolder = Path.Combine(currentDirectory, "archive");
string[] files = Directory.GetFiles(archiveFolder, "*.zip");
The first parameter is the path. The second is the search pattern you want to use.
Write it like this:
string[] files = Directory.GetFiles(#".\Archive", "*.zip");
. is for relative to the folder where you started your exe, and # to allow \ in the name.
When using filters, you pass it as a second parameter. You can also add a third parameter to specify if you want to search recursively for the pattern.
In order to get the folder where your .exe actually resides, use:
var executingPath = Path.GetDirectoryName(Assembly.GetEntryAssembly().Location);
As others have said, you can/should prepend the string with # (though you could also just escape the backslashes), but what they glossed over (that is, didn't bring it up despite making a change related to it) was the fact that, as I recently discovered, using \ at the beginning of a pathname, without . to represent the current directory, refers to the root of the current directory tree.
C:\foo\bar>cd \
C:\>
versus
C:\foo\bar>cd .\
C:\foo\bar>
(Using . by itself has the same effect as using .\ by itself, from my experience. I don't know if there are any specific cases where they somehow would not mean the same thing.)
You could also just leave off the leading .\ , if you want.
C:\foo>cd bar
C:\foo\bar>
In fact, if you really wanted to, you don't even need to use backslashes. Forwardslashes work perfectly well! (Though a single / doesn't alias to the current drive root as \ does.)
C:\>cd foo/bar
C:\foo\bar>
You could even alternate them.
C:\>cd foo/bar\baz
C:\foo\bar\baz>
...I've really gone off-topic here, though, so feel free to ignore all this if you aren't interested.
Pretty straight forward, use relative path
string[] offerFiles = Directory.GetFiles(Server.MapPath("~/offers"), "*.csv");

Categories