How can I delimitate a string at runtime? - c#

I am looking to write a utility to batch rename a bunch of files at once using a regular expression. The files that I will be renaming all at once follow a certain naming convention, and I want to alter them to a new naming convention using data that's already in the filenames; but not all my files follow the same convention currently.
So I want to be able to write a general use program that lets me input into a textbox during runtime the pattern of the filename, and what tokens I want to extract from the filename to use for renaming.
For example - Assume I have one file named [Coalgirls]_Suite_Precure_02_(1280x720_Blu-Ray_FLAC)_[33D74D55].mkv. I want to be able to rename this file to Suite Precure - Ep 02 [Coalgirls][33D74D55].mkv
This means I would preferably be able to enter into my program before renaming something akin to [%group%]_Suite_Precure_%ep%_(...)_[%crc%].mkv and it would populate the local variables group, ep, and crc to use in the batch rename.
One particular program I'm thinking of that does this is mp3tag, used for converting file names to id3 tags. It lets you put something like %artist% - %album% - %tracknumber% - %title%, and it takes those 4 tokens and puts them into the respective id3 tags.
How can I make a system similar to this without having to make the user know regex syntax?

As mentioned by usr, you can extract all the named placeholders in the search string using %(?<name>[^%]+)%. This will get you "group", "ep", and "crc".
Now you need to scan all the fragments between the placeholders and put a capture at each placeholder in the regex. I'd iterate through the matches from above (you can get start offset and length of each match to navigate through the non-placeholder fragments).
(There are mistakes in your example, I'll assume the last part is correct and I'm dropping the mysterious (...))
It would build a regex that looks like this:
^(?<group>.*?)_Suite_Precure_(?<ep>.*?)_(?<crc>.*?)\.mkv$
Pass the literal fragments to Regex.Escape before using it in the regex to handle troublesome characters properly.
Now, for each filename, you try to match the regex against it. If it matches, you get the values of the placeholders for this file. Then you take those placeholder values and merge them into the output pattern, replacing the placeholders appropriately. This gives you the new name, and you can do the rename.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
namespace renamer
{
class RenameImpl
{
public static IEnumerable<Tuple<string,string>> RenameWithPatterns(
string path, string curpattern, string newpattern,
bool caseSensitive)
{
var placeholderNames = new List<string>();
// Extract all the cur_placeholders from the user's input pattern
var input_regex = new Regex(@"(%[^%]+%)");
var cur_matches = input_regex.Matches(curpattern);
var new_matches = input_regex.Matches(newpattern);
var regex_pattern = new StringBuilder();
if (!caseSensitive)
regex_pattern.Append("(?i)");
regex_pattern.Append('^');
// Do a pass over the matches and grab info about each capture
var cur_placeholders = new List<Tuple<string, int, int>>();
var new_placeholders = new List<Tuple<string, int, int>>();
for (var i = 0; i < cur_matches.Count; ++i)
{
var m = cur_matches[i];
cur_placeholders.Add(new Tuple<string, int, int>(
m.Value, m.Index, m.Length));
}
for (var i = 0; i < new_matches.Count; ++i)
{
var m = new_matches[i];
new_placeholders.Add(new Tuple<string, int, int>(
m.Value, m.Index, m.Length));
}
// Build the regular expression
for (var i = 0; i < cur_placeholders.Count; ++i)
{
var ph = cur_placeholders[i];
// Get the literal before the first capture if it is the first
if (i == 0 && ph.Item2 > 0)
regex_pattern.Append(Regex.Escape(
curpattern.Substring(0, ph.Item2)));
// Generate the capture for the placeholder
regex_pattern.AppendFormat("(?<{0}>.*?)",
ph.Item1.Replace("%", ""));
// The literal after the placeholder
if (i + 1 == cur_placeholders.Count)
regex_pattern.Append(Regex.Escape(
curpattern.Substring(ph.Item2 + ph.Item3)));
else
regex_pattern.Append(Regex.Escape(
curpattern.Substring(ph.Item2 + ph.Item3,
cur_placeholders[i + 1].Item2 - (ph.Item2 + ph.Item3))));
}
regex_pattern.Append('$');
var re = new Regex(regex_pattern.ToString());
foreach (var pathname in Directory.EnumerateFileSystemEntries(path))
{
var file = Path.GetFileName(pathname);
var m = re.Match(file);
if (!m.Success)
continue;
// New name is initially same as target pattern
var newname = newpattern;
// Iterate through the placeholder names
for (var i = new_placeholders.Count; i > 0; --i)
{
// Target placeholder name
var tn = new_placeholders[i-1].Item1.Replace("%", "");
// Get captured value for this capture
var ct = m.Groups[tn].Value;
// Perform the replacement
newname = newname.Remove(new_placeholders[i - 1].Item2,
new_placeholders[i - 1].Item3);
newname = newname.Insert(new_placeholders[i - 1].Item2, ct);
}
newname = Path.Combine(path, newname);
yield return new Tuple<string, string>(pathname, newname);
}
}
}
}

Use the regex pattern %(?<name>[^%]+)%. It captures every token in the string that is surrounded by percent signs.
Then, use Regex.Replace to replace them:
var replaced = Regex.Replace(input, pattern, (Match m) => EvaluateToken(m.Groups["name"].Value));
Regex.Replace can take a callback that allows you to provide a dynamic value.
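EvaluateToken is not defined in the answer above. As a minimal sketch, assuming the token values are already available in a dictionary, the callback could simply look each name up. The class name TokenFiller, the method name Fill, and the choice to leave unknown tokens untouched are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TokenFiller
{
    // Replaces every %name% token in the template with its value from the dictionary.
    // Unknown tokens are left untouched (an assumption) so that typos stay visible.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Regex.Replace(template, @"%(?<name>[^%]+)%", m =>
        {
            string value;
            return values.TryGetValue(m.Groups["name"].Value, out value) ? value : m.Value;
        });
    }

    static void Main()
    {
        var values = new Dictionary<string, string>
        {
            { "group", "Coalgirls" },
            { "ep", "02" },
            { "crc", "33D74D55" }
        };
        Console.WriteLine(Fill("Suite Precure - Ep %ep% [%group%][%crc%].mkv", values));
        // Prints: Suite Precure - Ep 02 [Coalgirls][33D74D55].mkv
    }
}
```

The dictionary would typically be filled from the named groups of a successful match against the source filename.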

Related

How to get all files ending with the extension "_<fileNum>of<totalFileNum>" and sometimes without? [duplicate]

A user specifies a file name that can be either in the form "<name>_<fileNum>of<fileNumTotal>" or simply "<name>". I need to somehow extract the "<name>" part from the full file name.
Basically, I am looking for a solution to the method "ExtractName()" in the following example:
string fileName = "example_File"; // This variable is specified by the user
string extractedName = ExtractName(fileName); // Must return "example_File"
fileName = "example_File2_1of5";
extractedName = ExtractName(fileName); // Must return "example_File2"
fileName = "examp_File_3of15";
extractedName = ExtractName(fileName); // Must return "examp_File"
fileName = "example_12of15";
extractedName = ExtractName(fileName); // Must return "example"
Edit: Here's what I've tried so far:
string ExtractName(string fullName)
{
    return fullName.Substring(0, fullName.LastIndexOf('_'));
}
But this clearly does not work for the case where the full name is just "<name>".
Thanks
This would be easier to parse using Regex, because you don't know how many digits either number will have.
var inputs = new[]
{
"example_File",
"example_File2_1of5",
"examp_File_3of15",
"example_12of15"
};
var pattern = new Regex(@"^(.+)(_\d+of\d+)$");
foreach (var input in inputs)
{
var match = pattern.Match(input);
if (!match.Success)
{
// file doesn't end with "#of#", so use the whole input
Console.WriteLine(input);
}
else
{
// it does end with "#of#", so use the first capture group
Console.WriteLine(match.Groups[1].Value);
}
}
This code returns:
example_File
example_File2
examp_File
example
The Regex pattern has three parts:
^ and $ are anchors to ensure you capture the entire string, not just a subset of characters.
(.+) - match everything, be as greedy as possible.
(_\d+of\d+) - match "_#of#", where "#" can be any number of consecutive digits.
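The answer above prints the results rather than filling in the ExtractName() method the question asks for. Here is a minimal sketch of that method. Note that it uses a slightly different technique from the answer: instead of capturing the name, it strips the optional suffix with Regex.Replace. The class name FileNameParser is an assumption made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class FileNameParser
{
    // Optional "_<digits>of<digits>" suffix at the very end of the name.
    private static readonly Regex Suffix = new Regex(@"_\d+of\d+$");

    // Returns the name with the suffix stripped; names without a suffix come back unchanged.
    public static string ExtractName(string fileName)
    {
        return Suffix.Replace(fileName, "");
    }

    static void Main()
    {
        Console.WriteLine(ExtractName("example_File"));       // example_File
        Console.WriteLine(ExtractName("example_File2_1of5")); // example_File2
        Console.WriteLine(ExtractName("examp_File_3of15"));   // examp_File
        Console.WriteLine(ExtractName("example_12of15"));     // example
    }
}
```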

How would I access a txt file and split the links

I have a program that grabs links from a website and puts them into a text file, but the links aren't separated onto their own lines, and I need to fix that without editing the file by hand. Here is the code that grabs the links from the website, writes them to a text file, and then reads the text file back.
private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
var client = new WebClient();
string text = client.DownloadString("https://currentlinks.com");
File.WriteAllText("C:/ProgramData/oof.txt", text);
string searchKeyword = "https://foobar.to/showthread.php";
string fileName = "C:/ProgramData/oof.txt";
string[] textLines = File.ReadAllLines(fileName);
List<string> results = new List<string>();
foreach (string line in textLines)
{
if (line.Contains(searchKeyword))
{
results.Add(line);
}
var sb = new StringBuilder();
foreach (var item in results)
{
sb.Append(item);
}
textBox1.Text = sb.ToString();
var parsed = textBox1;
TextWriter tw = new StreamWriter("C:/ProgramData/parsed.txt");
// write lines of text to the file
tw.WriteLine(parsed);
// close the stream
tw.Close();
}
}
You are getting all the links (URLs) in one single string. There is no straightforward way to get the URLs individually without making some assumptions.
With the sample data you shared, I assume that the URLs in the string follow a simple format without anything fancy in them: each one starts with http, and no URL contains another http inside it.
With these assumptions, I suggest the following code.
// Sample data as shared by the OP
string data = "https://forum.to/showthread.php?tid=22305https://forum.to/showthread.php?tid=22405https://forum.to/showthread.php?tid=22318";
//Splitting the string by string `http`
var items = data.Split(new [] {"http"},StringSplitOptions.RemoveEmptyEntries).ToList();
//At this point all the strings in items collection will be without "http" at the start.
//So they will look like as following.
// s://forum.to/showthread.php?tid=22305
// s://forum.to/showthread.php?tid=22405
// s://forum.to/showthread.php?tid=22318
//So we need to add "http" at the start of each of the item as following.
items = items.Select(i => "http" + i).ToList();
// After this they will become like following.
// https://forum.to/showthread.php?tid=22305
// https://forum.to/showthread.php?tid=22405
// https://forum.to/showthread.php?tid=22318
//Now we need to create a single string with newline character between two items so
//that they represent a single line individually.
var text = String.Join("\r\n", items);
// Then write the text to the file.
File.WriteAllText("C:/ProgramData/oof.txt", text);
This should help you resolve your issue.
.Split way
Could you use yourString.Split("https://");?
Example:
//This simple example assumes that all links are https (not http)
string contents = "https://www.example.com/dogs/poodles/poodle1.htmlhttps://www.example.com/dogs/poodles/poodle2.html";
const string Prefix = "https://";
var linksWithoutPrefix = contents.Split(Prefix, StringSplitOptions.RemoveEmptyEntries);
//using System.Linq
var linksWithPrefix = linksWithoutPrefix.Select(l => Prefix + l);
foreach (var match in linksWithPrefix)
{
Console.WriteLine(match);
}
Regex way
Another option is to use a regular expression.
This attempt failed: I could not find or write the right regex, and I have to go now.
string contents = "http://www.example.com/dogs/poodles/poodle1.htmlhttp://www.example.com/dogs/poodles/poodle2.html";
//From https://regexr.com/
var rgx = new Regex(@"(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*");
var matches = rgx.Matches(contents);
foreach(var match in matches )
{
Console.WriteLine(match);
}
//This finds 'http://www.example.com/dogs/poodles/poodle1.htmlhttp' (note the htmlhttp at the end)
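One way the regex approach could be made to work is a lazy match with a lookahead that stops right before the next link. The sketch below is not the pattern from the answer above. It relies on the same assumption as the Split-based answers (no URL contains another "http://" or "https://" inside it) and additionally assumes the links contain no whitespace. The class name LinkSplitter is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkSplitter
{
    // Lazily consume non-whitespace characters until the next "http://" or "https://",
    // a whitespace character, or the end of the input.
    private static readonly Regex Link = new Regex(@"https?://\S+?(?=https?://|\s|$)");

    public static List<string> Split(string text)
    {
        var links = new List<string>();
        foreach (Match m in Link.Matches(text))
        {
            links.Add(m.Value);
        }
        return links;
    }

    static void Main()
    {
        string contents = "http://www.example.com/dogs/poodles/poodle1.htmlhttp://www.example.com/dogs/poodles/poodle2.html";
        foreach (string link in Split(contents))
        {
            Console.WriteLine(link);
        }
        // Prints:
        // http://www.example.com/dogs/poodles/poodle1.html
        // http://www.example.com/dogs/poodles/poodle2.html
    }
}
```

Joining the returned list with Environment.NewLine gives one link per line, ready for File.WriteAllText.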

Reading a string from and to a certain character

I would like to know if it is possible to read a certain string of n-length, from x character to y character without having to split it into smaller pieces.
I have the following string, for a path in AD CN=someName,OU=someGroup,OU=groups,DC=some,DC=domain,DC=com, and I would like to be able to just read the someName part of it, without splitting by = or , first. How do I achieve that?
The reason is that I would rather not do the group comparison the way I am doing it right now:
SearchResult t1 = search.FindOne();
foreach (string s in t1.Properties["memberof"])
{
foreach (string g in groups)
{
if (s.ToLower().Contains(g.ToLower()))
{
// do something
}
}
}
I would rather make the if clause an equality check, but I do not want to split the path and the groups into arrays every time. How do I do that?
Using simple string manipulation with IndexOf and Substring:
string s = "CN=someName,OU=someGroup,OU=groups,DC=some,DC=domain,DC=com";
const string prefix = "CN=";
int start = s.IndexOf(prefix);
if (start >= 0)
{
// The length is the distance from the end of the prefix to the next comma
string value = s.Substring(start + prefix.Length, s.IndexOf(',', start) - start - prefix.Length);
Console.WriteLine(value);
}
Note that this simple example would fail if the CN= entry was the last in the line (since it’s not terminated by a comma). You could check that first by looking at the return value of the second IndexOf call though.
But in this case, CN= will usually be the first thing anyway.
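The check on the second IndexOf call mentioned above could look roughly like this. It is a minimal sketch, assuming the value contains no escaped commas and that "CN=" does not occur inside an earlier component. The names DnParser and ExtractCn are made up for illustration.

```csharp
using System;

static class DnParser
{
    // Returns the value of the first "CN=" component, or null when there is none.
    public static string ExtractCn(string distinguishedName)
    {
        const string prefix = "CN=";
        int start = distinguishedName.IndexOf(prefix, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return null;
        }

        int valueStart = start + prefix.Length;
        int end = distinguishedName.IndexOf(',', valueStart);

        // No comma after the prefix: the CN component runs to the end of the string.
        return end < 0
            ? distinguishedName.Substring(valueStart)
            : distinguishedName.Substring(valueStart, end - valueStart);
    }

    static void Main()
    {
        Console.WriteLine(ExtractCn("CN=someName,OU=someGroup,OU=groups,DC=some,DC=domain,DC=com")); // someName
        Console.WriteLine(ExtractCn("CN=someName")); // someName
    }
}
```

With a helper like this, the comparison in the question could become an equality check against the extracted name, for example with string.Equals and StringComparison.OrdinalIgnoreCase.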
If you are doing group comparisons, I would use the System.DirectoryServices.AccountManagement namespace:
PrincipalContext Context = new PrincipalContext(ContextType.Domain, "");
UserPrincipal Usr = UserPrincipal.FindByIdentity(Context, "User");
GroupPrincipal G = GroupPrincipal.FindByIdentity(Context, "Group");
if(Usr.IsMemberOf(G)) {
}
You can use String.IndexOf to find the correct offset, then use String.Substring to read the part you want.
const string input = "CN=someName,OU=someGroup,OU=groups,DC=some,DC=domain,DC=com";
const string start = "CN=";
const string stop = ",";
int startIndex = input.IndexOf(start, 0);
int stopIndex = input.IndexOf(stop, startIndex);
var extracted = input.Substring(startIndex + start.Length, stopIndex - startIndex - start.Length);
Console.WriteLine(extracted);
PS: for your special use case, maybe also take a look at "Is there a .NET class that can parse CN= strings out of LDAP?"
With Split you can check if any of the given keys equals to the search value.
var val = "CN=someName,OU=someGroup,OU=groups,DC=some,DC=domain,DC=com";
var prefix = "CN";
var searchValue = "someName";
var contains = val.Split(',').Any(value => value.Split('=')[0] == prefix && value.Split('=')[1] == searchValue);
Instead of checking whether the value is equal to the search value, you can also just return the value.
var val = "CN=someName,OU=someGroup,OU=groups,DC=some,DC=domain,DC=com";
var prefix = "CN";
var foundValue = val.Split(',').FirstOrDefault(value => value.Split('=')[0] == prefix)?.Split('=')[1];
I still used Split even though you said you don't want to use it, as I think it makes a nice one-liner.

When using IndexOf and Substring, how do I find the right start and end indexes? And how do I encode Hebrew chars?

I have this code:
string firstTag = "Forums2008/forumPage.aspx?forumId=";
string endTag = "</a>";
index = forums.IndexOf(firstTag, index1);
if (index == -1)
continue;
var secondIndex = forums.IndexOf(endTag, index);
result = forums.Substring(index + firstTag.Length + 12, secondIndex - (index + firstTag.Length - 50));
The string I want to extract from is, for example:
הנקה
What I want to get is only the word after the title: הנקה
The second problem is that when I extract it, I see gibberish like this instead of Hebrew: ������
One powerful way to do this is to use Regular Expressions instead of trying to find a starting position and use a substring. Try out this code, and you'll see that it extracts the anchor tag's title:
var input = "הנקה";
var expression = new System.Text.RegularExpressions.Regex(@"title=\""([^\""]+)\""");
var match = expression.Match(input);
if (match.Success) {
Console.WriteLine(match.Groups[1]);
}
else {
Console.WriteLine("not found");
}
And for the curious, here is a version in JavaScript:
var input = 'הנקה';
var expression = new RegExp('title=\"([^\"]+)\"');
var results = expression.exec(input);
if (results) {
document.write(results[1]);
}
else {
document.write("not found");
}
Okay, here is a solution using String.Substring(), String.Split() and String.IndexOf():
String str = "הנקה"; // Assume this is the string being passed in. Yes, unusual escape sequences were added.
int splitStart = str.IndexOf("title="); // Where to start splitting
int splitEnd = str.LastIndexOf("</a>"); // Where to end
/* What we try to extract is this: title="הנקה">הנקה
* (given without escape sequences)
*/
String extracted = str.Substring(splitStart, splitEnd - splitStart); // Extract the required portion
String[] splitted = extracted.Split('"'); // Now split on "
Console.WriteLine(splitted[1]); // Prints ???? on the console, but put a breakpoint here and check the values in the split array
The catch is that I had to use escape sequences in an unusual way in the string literal. You can ignore that, since you are simply passing in the string to scan.
This actually works, but you cannot see the result with the provided Console.WriteLine(splitted[1]);
If you put a breakpoint there and inspect the split array, you can see that the text was extracted.

Search and replace values in text file with C#

I have a text file with a certain format. First comes an identifier followed by three spaces and a colon. Then comes the value for this identifier.
ID1 :Value1
ID2 :Value2
ID3 :Value3
What I need to do is search for, e.g., ID2 : and replace Value2 with a new value, NewValue2. What would be a way to do this? The files I need to parse won't get very large. The largest will be around 150 lines.
If the file isn't that big, you can call File.ReadAllLines to get a collection of all the lines and then replace the line you're looking for, like this:
using System.IO;
using System.Linq;
using System.Collections.Generic;
List<string> lines = new List<string>(File.ReadAllLines("file"));
int lineIndex = lines.FindIndex(line => line.StartsWith("ID2 :"));
if (lineIndex != -1)
{
lines[lineIndex] = "ID2 :NewValue2";
File.WriteAllLines("file", lines);
}
Here's a simple solution which also creates a backup of the source file automatically.
The replacements are stored in a Dictionary object. They are keyed on the line's ID, e.g. 'ID2' and the value is the string replacement required. Just use Add() to add more as required.
StreamWriter writer = null;
Dictionary<string, string> replacements = new Dictionary<string, string>();
replacements.Add("ID2", "NewValue2");
// ... further replacement entries ...
using (writer = File.CreateText("output.txt"))
{
foreach (string line in File.ReadLines("input.txt"))
{
bool replacementMade = false;
foreach (var replacement in replacements)
{
if (line.StartsWith(replacement.Key))
{
writer.WriteLine(string.Format("{0} :{1}",
replacement.Key, replacement.Value));
replacementMade = true;
break;
}
}
if (!replacementMade)
{
writer.WriteLine(line);
}
}
}
File.Replace("output.txt", "input.txt", "input.bak");
You'll just have to replace input.txt, output.txt and input.bak with the paths to your source, destination and backup files.
Ordinarily, for any text searching and replacement, I'd suggest some sort of regular expression work, but if this is all you're doing, that's really overkill.
I would just open the original file and a temporary file; read the original a line at a time, and just check each line for "ID2 :"; if you find it, write your replacement string to the temporary file, otherwise, just write what you read. When you've run out of source, close both, delete the original, and rename the temporary file to that of the original.
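The temporary-file approach described above can be sketched as follows. This is a minimal sketch under stated assumptions: the temporary file name (the original path plus ".tmp"), the class name ValueReplacer, and the sample file created in Main are all made up for illustration, and the line prefix is passed in exactly as it appears in the file.

```csharp
using System;
using System.IO;

static class ValueReplacer
{
    // Copies the file line by line into a temporary file, swapping in the new value
    // on every line that starts with linePrefix, then replaces the original file.
    public static void SetValue(string path, string linePrefix, string newValue)
    {
        string tempPath = path + ".tmp"; // hypothetical temporary file name

        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line.StartsWith(linePrefix, StringComparison.Ordinal)
                    ? linePrefix + newValue
                    : line);
            }
        }

        // Both streams are closed here: delete the original and rename the temporary file.
        File.Delete(path);
        File.Move(tempPath, path);
    }

    static void Main()
    {
        // Demo on a throwaway file so the sketch can run on its own.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "ID1 :Value1", "ID2 :Value2", "ID3 :Value3" });

        SetValue(path, "ID2 :", "NewValue2");

        Console.WriteLine(File.ReadAllText(path));
        File.Delete(path);
    }
}
```

Matching on the full prefix including the colon avoids accidentally rewriting a line such as ID20 when looking for ID2.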
Something like this should work. It's very simple, not the most efficient thing, but for small files, it would be just fine:
private void setValue(string filePath, string key, string value)
{
string[] lines= File.ReadAllLines(filePath);
for(int x = 0; x < lines.Length; x++)
{
string[] fields = lines[x].Split(':');
if (fields[0].TrimEnd() == key)
{
lines[x] = fields[0] + ':' + value;
File.WriteAllLines(filePath, lines);
break;
}
}
}
You can use regex and do it in 3 lines of code
string text = File.ReadAllText("sourcefile.txt");
text = Regex.Replace(text, @"(?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$)", "NewValue2",
RegexOptions.Multiline);
File.WriteAllText("outputfile.txt", text);
In the regex, (?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$) means: find a line that starts with id2 (case-insensitively), followed by a colon with any amount of whitespace before and after it, and replace the value that follows (word characters only, no punctuation) all the way to the end of the line. If you want to include punctuation, replace \w*? with .*?
You can use regexes to achieve this.
Regex re = new Regex(@"^ID\d+ :Value(\d+)\s*$", RegexOptions.IgnoreCase | RegexOptions.Compiled);
string[] lines = File.ReadAllLines("mytextfile");
foreach (string line in lines) {
    string replaced = re.Replace(line, processMatch);
    // Now do whatever you need to do with the replaced value
}
string processMatch(Match m)
{
var number = m.Groups[1].Value;
return String.Format("ID{0} :NewValue{0}", number);
}
