I need to extract dates from file names whose general pattern is [XXXX_BBBB]_YYYY-MM-DD[.fileExtension], for example Sales_person_2019-05-03.xlsx.
I am using C# in an SSIS Script Task to do this.
Below is my code:
public void Main()
{
// TODO: Add your code here
string pat;
string date;
string filename = 'Sales_person_2019-05-03.xlsx'
// Get the Date part from the file name only
pat = @"[0-9]{2}[0-9]{2}[0-9]{4}";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
date = r.Match(filename);
MessageBox.Show(date.ToString());}
Dts.TaskResult = (int)ScriptResults.Success;
}
But this is not working. Can someone help, please? I am new to C#.
You can achieve this without regular expressions; just use string functions (IndexOf() and Substring()).
Since you are handling a fixed pattern, [XXXX_BBBB]_YYYY-MM-DD[.fileExtension], just retrieve the 10 characters located after the second underscore.
public void Main()
{
string filename = "Sales_person_2019-05-03.xlsx";
// Get the Date part from the file name only
string filedate = filename.Substring(filename.IndexOf('_',filename.IndexOf('_') + 1) + 1,10);
DateTime dt = DateTime.ParseExact(filedate, "yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture, System.Globalization.DateTimeStyles.None);
Dts.TaskResult = (int)ScriptResults.Success;
}
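If you would still rather use a regular expression, as in the question, here is a minimal sketch. It assumes the date always appears as YYYY-MM-DD somewhere in the file name; the class and method names (FileNameDate, TryExtract) are made up for illustration and are not part of SSIS.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FileNameDate
{
    // Matches a date written as YYYY-MM-DD anywhere in the name.
    private static readonly Regex DatePattern = new Regex(@"\d{4}-\d{2}-\d{2}");

    // Returns true and sets 'date' when a valid date is found in the file name.
    public static bool TryExtract(string fileName, out DateTime date)
    {
        date = default(DateTime);
        Match match = DatePattern.Match(fileName);
        return match.Success &&
               DateTime.TryParseExact(match.Value, "yyyy-MM-dd",
                   CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }

    public static void Main()
    {
        DateTime date;
        if (TryExtract("Sales_person_2019-05-03.xlsx", out date))
        {
            Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        }
    }
}
```

Unlike the Substring() version, this does not depend on the number of underscores before the date, and TryParseExact rejects digit runs that are not real dates.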
Related
I have to parse a log file and am not sure how best to extract the different pieces of each line. The problem I am facing is that the original developer used ':' to delimit tokens, which was a bit idiotic since each line contains timestamps, which themselves contain ':'!
A sample line looks something like this:
transaction_date_time:[systemid]:sending_system:receiving_system:data_length:data:[ws_name]
2019-05-08 15:03:13:494|2019-05-08 15:03:13:398:[192.168.1.2]:ABC:DEF:67:cd71f7d9a546ec2b32b,AACN90012001000012,OPNG:[WebService.SomeName.WebServiceModule::WebServiceName]
I have no problem reading the log file and accessing each line, but I am not sure how to parse the pieces.
Since the input string is not cleanly splittable, because the delimiter character is also part of the content, a simple regular expression can be used instead.
Simple but probably fast enough, even with the default settings.
The different parts of the input string can be separated with these capturing groups:
string pattern = @"^(.*?)\|(.*?):\[(.*?)\]:(.*?):(.*?):(\d+):(.*?):\[(.*)\]$";
This will give you 8 groups + 1 (Groups[0]), which contains the whole string.
Using the Regex class, simply pass a string to parse (named line, here) and the regex (named pattern) to the Match() method, using default settings:
var result = Regex.Match(line, pattern);
The Value property of each entry in Groups returns the text of that capturing group. For example, the two dates (note HH for the 24-hour clock and fff for the milliseconds):
var dateEnd = DateTime.ParseExact(result.Groups[1].Value, "yyyy-MM-dd HH:mm:ss:fff", CultureInfo.InvariantCulture);
var dateStart = DateTime.ParseExact(result.Groups[2].Value, "yyyy-MM-dd HH:mm:ss:fff", CultureInfo.InvariantCulture);
The IpAddress is extracted with: \[(.*?)\].
You could give a name to this grouping, so it's more clear what the value refers to. Simply add a string, prefixed with ? and enclosed in <> or single quotes ' to name the grouping:
...\[(?<IpAddress>.*?)\]...
Note, however, that naming a group will modify the Regex.Groups indexing: the un-named groups will be inserted first, the named groups after. So, naming only the IpAddress group will cause it to become the last item, Groups[8]. Of course you can name all the groups and the indexing will be preserved.
var hostAddress = IPAddress.Parse(result.Groups["IpAddress"].Value);
This pattern should allow a mid-range machine to parse 130,000~150,000 strings per second.
You'll have to test it to find the perfect pattern. For example, the first match (corresponding to the first date), (.*?)\|, is much faster if non-greedy (using the *? lazy quantifier). The opposite is true for the last match: \[(.*)\]. The pattern used by jdweng is even faster than the one used here.
See Regex101 for a detailed description on the use and meaning of each token.
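Putting the pieces above together, here is a sketch with every group named, so the indexing question goes away. The group names and the LogLineParser class are my own choices for illustration; the pattern is the same one as above, and the sample line is the one from the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // 24-hour clock (HH) and a colon before the milliseconds (fff), as in the log.
    private const string TimestampFormat = "yyyy-MM-dd HH:mm:ss:fff";

    // Same pattern as above, with a name on every capturing group.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<end>.*?)\|(?<start>.*?):\[(?<systemid>.*?)\]:(?<sending>.*?):" +
        @"(?<receiving>.*?):(?<length>\d+):(?<data>.*?):\[(?<wsname>.*)\]$");

    public static Match Parse(string line)
    {
        return LinePattern.Match(line);
    }

    // Converts one of the two timestamp groups ("start" or "end") to a DateTime.
    public static DateTime Timestamp(Match match, string groupName)
    {
        return DateTime.ParseExact(match.Groups[groupName].Value,
            TimestampFormat, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        string line = "2019-05-08 15:03:13:494|2019-05-08 15:03:13:398:[192.168.1.2]:ABC:DEF:67:" +
                      "cd71f7d9a546ec2b32b,AACN90012001000012,OPNG:" +
                      "[WebService.SomeName.WebServiceModule::WebServiceName]";
        Match match = Parse(line);
        if (match.Success)
        {
            Console.WriteLine(Timestamp(match, "start").ToString("o"));
            Console.WriteLine(match.Groups["systemid"].Value);
            Console.WriteLine(match.Groups["wsname"].Value);
        }
    }
}
```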
Using Regex I was able to parse everything. It looks like the data came from Excel, because the fraction of seconds has a colon instead of a period. C# does not like the colon, so I had to replace the colon with a period. I also parsed from right to left to get around the colon issues.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
namespace ConsoleApplication3
{
class Program1
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
string line = "";
int rowCount = 0;
StreamReader reader = new StreamReader(FILENAME);
string pattern = @"^(?'time'.*):\[(?'systemid'[^\]]+)\]:(?'sending'[^:]+):(?'receiving'[^:]+):(?'length'[^:]+):(?'data'[^:]+):\[(?'ws_name'[^\]]+)\]";
while ((line = reader.ReadLine()) != null)
{
line = line.Trim();
if (line.Length > 0)
{
if (++rowCount != 1) //skip header row
{
Log_Data newRow = new Log_Data();
Log_Data.logData.Add(newRow);
Match match = Regex.Match(line, pattern, RegexOptions.RightToLeft);
newRow.ws_name = match.Groups["ws_name"].Value;
newRow.data = match.Groups["data"].Value;
newRow.length = int.Parse(match.Groups["length"].Value);
newRow.receiving_system = match.Groups["receiving"].Value;
newRow.sending_system = match.Groups["sending"].Value;
newRow.systemid = match.Groups["systemid"].Value;
//end data is first then start date is second
string[] date = match.Groups["time"].Value.Split(new char[] {'|'}).ToArray();
string replacePattern = @"(?'leader'.+):(?'trailer'\d+)";
string stringDate = Regex.Replace(date[1], replacePattern, "${leader}.${trailer}", RegexOptions.RightToLeft);
newRow.startDate = DateTime.Parse(stringDate);
stringDate = Regex.Replace(date[0], replacePattern, "${leader}.${trailer}", RegexOptions.RightToLeft);
newRow.endDate = DateTime.Parse(stringDate );
}
}
}
}
}
public class Log_Data
{
public static List<Log_Data> logData = new List<Log_Data>();
public DateTime startDate { get; set; } //transaction_date_time:[systemid]:sending_system:receiving_system:data_length:data:[ws_name]
public DateTime endDate { get; set; }
public string systemid { get; set; }
public string sending_system { get; set; }
public string receiving_system { get; set; }
public int length { get; set; }
public string data { get; set; }
public string ws_name { get; set; }
}
}
I am trying to create a new folder named with today's date at a specific path:
string LocalDirectory = Directory.CreateDirectory(
DateTime.Now.ToString("I:\\test\\final test\\snaps\\dd-MM-yyyy"));
But I receive this error:
Cannot implicitly convert type 'System.IO.DirectoryInfo' to 'string'
As per the documentation for Directory.CreateDirectory, CreateDirectory returns a DirectoryInfo object, not a string.
So do this:
DirectoryInfo localDirectory = Directory.CreateDirectory(...
or this:
var localDirectory = Directory.CreateDirectory(...
(which will basically do the same thing)
The code can be written as :
String Todaysdate = DateTime.Now.ToString("dd-MMM-yyyy");
if(!Directory.Exists("I:\\test\\final test\\snaps\\" + Todaysdate))
{
Directory.CreateDirectory("I:\\test\\final test\\snaps\\" + Todaysdate);
}
Directory.CreateDirectory returns a DirectoryInfo, not a string.
You can try something like this:
DirectoryInfo LocalDirectory = Directory.CreateDirectory(string.Format("I:\\test\\final test\\snaps\\{0}-{1}-{2}", DateTime.Now.Day, DateTime.Now.Month, DateTime.Now.Year));
To get the path as a string:
string strLocalDir = LocalDirectory.FullName;
Here is about the simplest way of creating a new folder named with today's date.
using System;
using System.IO;
namespace CreateNewFolder
{
class Program
{
static void Main(string[] args)
{
string Todaysdate = DateTime.Now.ToString("-dd-MM-yyyy-(hh-mm-ss)");
{
Directory.CreateDirectory("c:/Top-Level Folder/Subfolder/Test" + Todaysdate);
}
}
}
}
Output of New folder name:
Test-02-05-2018-(11-05-02)
I put the hours, minutes and seconds inside some parentheses for clarity.
You can take out any part of the format string to keep only the date/time portion you want in the folder name. If you don’t want to call it “Test-02-05-2018-(11-05-02)” but simply have today's date as the name, like “02-05-2018”, then remove “Test” from the CreateDirectory line but leave a blank space between Subfolder/ and the closing quotation mark, like this:
Directory.CreateDirectory("c:/Top-Level Folder/Subfolder/ " + Todaysdate);
Notice that I added a hyphen between the date parameters. This is just a visual separator for the date; you could also use a space as the separator.
I know this thread is about 4 years old, but maybe this will help another newbie just starting out in C#.
Enjoy and share.
Take into account the culture
var rootOutputDir = @"I:\test\final test\snaps";
var Todaysdate = DateTime.Now.ToString(CultureInfo.CurrentUICulture.DateTimeFormat.ShortDatePattern.Replace("/", "-"));
Directory.CreateDirectory(Path.Combine(rootOutputDir, Todaysdate));
string path = Server.MapPath(@"/Content/");
path = Path.Combine(path, DateTime.Now.ToString("ddMMyyyy")); // MM is the month; mm would be minutes
if (!Directory.Exists(path))
{
Directory.CreateDirectory(path);
}
I wanted to create directories for the year and then the month inside the year folder.
Here's what worked for me:
public void CreateDirectory()
{
    // Year folder first, then the month folder inside it
    string strArchiveFolder = (@"\\fullpath" + DateTime.Now.Year.ToString() + "\\" +
        DateTime.Now.Month.ToString());
    if (!Directory.Exists(strArchiveFolder))
    {
        Directory.CreateDirectory(strArchiveFolder);
    }
}
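A variation on the year/month idea using Path.Combine, as a rough sketch. ArchiveFolders, BuildPath and the "archive-demo" folder under the temp directory are placeholder names of mine, and the zero-padded month ("05" rather than "5") is my own choice so that the folders sort correctly.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class ArchiveFolders
{
    // Builds <root>/<yyyy>/<MM>; Path.Combine inserts the separators.
    public static string BuildPath(string root, DateTime date)
    {
        return Path.Combine(root,
            date.ToString("yyyy", CultureInfo.InvariantCulture),
            date.ToString("MM", CultureInfo.InvariantCulture));
    }

    public static void Main()
    {
        // Demo root under the temp folder so the sketch is safe to run anywhere.
        string root = Path.Combine(Path.GetTempPath(), "archive-demo");
        string folder = BuildPath(root, DateTime.Now);

        // CreateDirectory creates any missing parent folders and does nothing
        // if the folder already exists.
        Directory.CreateDirectory(folder);
        Console.WriteLine(folder);
    }
}
```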
I want to save my data to a text file, but the file name must contain 2 different strings. Here's what I've done so far:
string input = "Name_";
string input2 = string.Format("stats-{0:yyyy-MM-dd}.txt",
DateTime.Now);
and I can't figure out how to add input here: string.Format(input, "stats...
and the file name must be like:
*Name_stats-2013-11-27.txt*
Strings can be concatenated simply by using the + operator:
string filename = input + input2;
Also, you can add multiple tags to your format-operation:
string format = string.Format("{0}stats-{1:yyyy-MM-dd}.txt", input, DateTime.Now);
Just do,
string input = "Name_";
string input2 = string.Format("stats-{0:yyyy-MM-dd}.txt",
DateTime.Now);
var fileName = input + input2;
or alternatively,
var fileName = string.Format(
"{0}stats-{1:yyyy-MM-dd}.txt",
"Name_", // Or an actual name
DateTime.Now);
With Format you start counting at 0 and then continue to count up each placeholder. So your text would be
string result = string.Format("{0}stats-{1:yyyy-MM-dd}.txt", input, DateTime.Now);
Why not try this? Make your life easier...
string input2 = string.Format("{0}stats-{1:yyyy-MM-dd}.txt", input, DateTime.Now);
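For completeness, a small self-contained sketch of the same string.Format idea wrapped in a helper. StatsFileName and Build are hypothetical names; passing CultureInfo.InvariantCulture is my addition, so that the digits and the calendar do not depend on the machine's culture.

```csharp
using System;
using System.Globalization;

public static class StatsFileName
{
    // {0} is the prefix, {1} is the date formatted as yyyy-MM-dd.
    public static string Build(string prefix, DateTime date)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0}stats-{1:yyyy-MM-dd}.txt", prefix, date);
    }

    public static void Main()
    {
        Console.WriteLine(Build("Name_", new DateTime(2013, 11, 27)));
    }
}
```

Running it on the date from the question gives Name_stats-2013-11-27.txt.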
I have a string of the format MASTER CARD 01/01/2012, I need to grab the date part separately.
Sometimes it could be VISA 01/01/2012, I have tried splitting by the space but got stuck when there are two spaces like in the case of MASTER CARD 01/01/2012.
Any help would be much appreciated.
string date = e.Brick.Text;
string[] dates = date.Split(' ');
Given the way your strings look, the date will be the last element of the array.
//dates[dates.Length-1] should have date
string date = "MASTER CARD 01/01/2012";
string[] dates = date.Split(' ');
Console.WriteLine(dates[dates.Length - 1]);
A more robust solution is to check each item with DateTime.TryParse, along the following lines:
DateTime tempDs = new DateTime();
foreach (string str in dates)
{
if (DateTime.TryParse(str, out tempDs))
{
Console.WriteLine("Found Date");
}
}
Assuming all the dates for the various cards have similar formatting, Regular Expressions could be a viable alternative.
using System.Text.RegularExpressions;
Match mDate = Regex.Match(e.Brick.Text, @"\b(?<date>(?:\d{1,2}[\\/-]){2}\d{4})\b", RegexOptions.Compiled);
if (mDate.Success)
{
MessageBox.Show(string.Format("Date: {0}", mDate.Groups["date"].Value));
}
Split by spaces and use the DateTime.TryParse method to parse the dates. The method should fail for VISA, MASTER, and CARD; but it will succeed for the date parts of the string.
You can use your code.
If the date is always at the end of the string, you can do something like
string lastPart = dates[dates.Length - 1];
and then split lastPart on '/' to get the day, month and year.
Here is another alternative:
string date = e.Brick.Text.Substring(e.Brick.Text.LastIndexOf(' ')+1);
This should do the trick.
public string ExtractDateTimeString(string s){
return s.Split(' ').Where(x =>
{
DateTime o;
return DateTime.TryParse(x, out o);
}).FirstOrDefault();
}
Or another way:
string text = "MASTER CARD 4.5.2012";
string[] split = text.Split(' ');
string mc = "";
string date = ""; //when you get this value, you can easily convert to date if you need it
foreach (string str in split)
{
if (char.IsNumber(str[0]))
{
date = str;
mc = mc.Remove(mc.Length - 1, 1);
}
else
mc += str + " ";
}
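Combining two of the ideas above (take everything after the last space, then validate it), here is a rough sketch. It assumes the date is always the last token and is written as dd/MM/yyyy; the sample 01/01/2012 does not reveal whether the day or the month comes first, so adjust the format string if needed. CardText and TryExtractDate are names I made up.

```csharp
using System;
using System.Globalization;

public static class CardText
{
    // Assumption: the date is the last space-separated token, written as dd/MM/yyyy.
    public static bool TryExtractDate(string text, out DateTime date)
    {
        string trimmed = text.Trim();
        // LastIndexOf returns -1 when there is no space, so Substring(0) keeps the whole string.
        string lastToken = trimmed.Substring(trimmed.LastIndexOf(' ') + 1);
        return DateTime.TryParseExact(lastToken, "dd/MM/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }

    public static void Main()
    {
        DateTime date;
        if (TryExtractDate("MASTER CARD 01/01/2012", out date))
        {
            Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        }
    }
}
```

Because only the last token is examined, it does not matter how many words the card name has.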
I have these strings as a response from a FTP server:
02-17-11 01:39PM <DIR> dec
04-06-11 11:17AM <DIR> Feb 2011
05-10-11 07:09PM 87588 output.xlsx
06-10-11 02:52PM 3462 output.xlsx
where the pattern is: [datetime] [length or <dir>] [filename]
Edit: my code was- @"^\d{2}-\d{2}-\d{2}(\s)+(<DIR>|(\d)+)+(\s)+(.*)+"
I need to parse these strings in this object:
class Files{
DateTime modifiedTime;
bool ifTrueThenFile;
string name;
}
Please note that, filename may have spaces.
I am not good at regex matching, can you help?
Regex method
One approach is to use this regex:
@"(\d{2}-\d{2}-\d{2} \d{2}:\d{2}(?:PM|AM)) (<DIR>|\d+) (.+)";
I am capturing groups, so
// Group 1 - Matches the DateTime
(\d{2}-\d{2}-\d{2} \d{2}:\d{2}(?:PM|AM))
Notice the syntax (?:xx), it means that the content here will not be caught in a group, we need to match PM or AM but this group alone doesn't matter.
Next I match the file size or <DIR> with
// Group 2 - Matches the file size or <DIR>
(<DIR>|\d+)
Catching the result in a group.
The last part matches directory names or file names
// Group 3 - Matches the dir/file name
(.+)
Now that we captured all groups we can parse the values:
DateTime.Parse(g[1].Value); // be careful with current culture
// a different culture may not work
To check if the captured entry is a file or not you can just check if it is <DIR> or a number.
IsFile = g[2].Value != "<DIR>"; // it is a file if it is not <DIR>
And the name is just what is left
Name = g[3].Value; // returns a string
Then you can use the groups to build the object, an example:
public class Files
{
public DateTime ModifiedTime { get; set; }
public bool IsFile { get; set; }
public string Name { get; set; }
public Files(GroupCollection g)
{
ModifiedTime = DateTime.Parse(g[1].Value);
IsFile = g[2].Value != "<DIR>";
Name = g[3].Value;
}
}
static void Main(string[] args)
{
var p = @"(\d{2}-\d{2}-\d{2} \d{2}:\d{2}(?:PM|AM)) (<DIR>|\d+) (.+)";
var regex = new Regex(p, RegexOptions.IgnoreCase);
var m1 = regex.Match("02-17-11 01:39PM <DIR> dec");
var m2 = regex.Match("05-10-11 07:09PM 87588 output.xlsx");
// DateTime: 02-17-11 01:39PM
// IsFile : false
// Name : dec
var file1 = new Files(m1.Groups);
// DateTime: 05-10-11 07:09PM
// IsFile : true
// Name : output.xlsx
var file2 = new Files(m2.Groups);
}
Further reading
Regex class
Regex groups
String manipulation method
Another way to achieve this is to split the string which can be much faster:
public class Files
{
public DateTime ModifiedTime { get; set; }
public bool IsFile { get; set; }
public string Name { get; set; }
public Files(string line)
{
// Gets the date part and parse to DateTime
ModifiedTime = DateTime.Parse(line.Substring(0, 16));
// Gets the file information part and split
// in two parts
var fileBlock = line.Substring(17).Split(new char[] { ' ' }, 2);
// first part tells if it is a file
IsFile = fileBlock[0] != "<DIR>";
// second part tells the name
Name = fileBlock[1];
}
}
static void Main(string[] args)
{
// DateTime: 02-17-11 01:39PM
// IsFile : false
// Name : dec
var file3 = new Files("02-17-11 01:39PM <DIR> dec");
// DateTime: 05-10-11 07:09PM
// IsFile : true
// Name : out put.xlsx
var file4 = new Files("05-10-11 07:09PM 87588 out put.xlsx");
}
Further reading
String split
String.Split Method (Char[], Int32)
You can try with something like:
^(\d\d-\d\d-\d\d)\s+(\d\d:\d\d[AP]M)\s+(\S+)\s+(.*)$
The first capture group will contain the date, the second the time, the third the size (or <DIR>), and the last everything else (which will be the filename).
(Note that this is probably not portable, the time format is locale dependent.)
Here you go:
(\d{2})-(\d{2})-(\d{2}) (\d{2}):(\d{2})([AP]M) (<DIR>|\d+) (.+)
I used a lot of subexpressions, so it catches all the relevant parts like year, hour, minute etc. Maybe you don't need them all; just remove the parentheses around the ones you don't need.
try this
String regexTemp = @"(?<Date>\d\d-\d\d-\d\d\s*\d\d:\d\d[AP]M)\s*(?<LengthOrDir><DIR>|\d+)\s*(?<Name>.*)";
// inputLine is one line of the FTP listing (placeholder name, not in the original snippet)
Match mExprStatic = Regex.Match(inputLine, regexTemp, RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (mExprStatic.Success || !string.IsNullOrEmpty(mExprStatic.Value))
{
    DateTime _date = DateTime.Parse(mExprStatic.Groups["Date"].Value);
    String lengthOrDir = mExprStatic.Groups["LengthOrDir"].Value;
    String Name = mExprStatic.Groups["Name"].Value;
}
A lot of good answers, but I like regex puzzles, so I thought I'd contribute a slightly different version...
^([\d\- :]{14}[AP]M)\s+(<DIR>|\d+)\s(.+)$
For help in testing, I always use this site : http://www.myregextester.com/index.php
You don't need to use regex here. Why don't you split the string by spaces with a number_of_elements limit:
var split = yourEntryString.Split(new string []{" "}, 4,
StringSplitOptions.RemoveEmptyEntries);
var date = string.Join(" ", new string[] {split[0], split[1]});
var length = split[2];
var filename = split[3];
This is of course assuming that the pattern is correct and none of the entries are empty.
I like the regex Leif posted.
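However, I'll give you another solution which people will probably hate: a fast and dirty solution which I am coming up with just as I am typing:
string[] allParts = inputText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
DateTime modified = DateTime.Parse(allParts[0] + " " + allParts[1]); // allParts[0..1]: date and time
string dirOrSize = allParts[2]; // <DIR> or size
string filename = string.Join(" ", allParts.Skip(3)); // allParts[3..n]: file name (needs using System.Linq)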
There are some checks missing there, but you get the idea.
Is it nice code? Probably not. Will it work? With the right amount of time, surely.
Is it more readable? I tend to think "yes", but others might disagree.
You should be able to implement this with a simple string.Split, an if statement and the Parse/ParseExact methods to convert the values. If it is a file, just concatenate the remaining string tokens so you can reconstruct a filename that contains spaces.
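A sketch of what that could look like, assuming the listing always has the shape [date] [time] [size or <DIR>] [name] and that the timestamp is written as MM-dd-yy hh:mmtt (month first is an assumption: 02-17-11 can only be read that way, but check your server). FtpEntry is a hypothetical name. Splitting with a limit of 4 keeps any spaces inside the file name, so no re-joining is needed.

```csharp
using System;
using System.Globalization;

public sealed class FtpEntry
{
    public DateTime ModifiedTime { get; private set; }
    public bool IsFile { get; private set; }
    public string Name { get; private set; }

    public static FtpEntry Parse(string line)
    {
        // At most 4 parts: date, time, size or <DIR>, and the rest (the name).
        string[] parts = line.Split(new[] { ' ' }, 4, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 4)
        {
            throw new FormatException("Unexpected listing line: " + line);
        }

        return new FtpEntry
        {
            // An explicit format avoids depending on the current culture.
            ModifiedTime = DateTime.ParseExact(parts[0] + " " + parts[1],
                "MM-dd-yy hh:mmtt", CultureInfo.InvariantCulture),
            IsFile = parts[2] != "<DIR>",
            Name = parts[3].Trim()
        };
    }

    public static void Main()
    {
        FtpEntry entry = Parse("05-10-11 07:09PM 87588 out put.xlsx");
        Console.WriteLine("{0:yyyy-MM-dd HH:mm} file={1} name={2}",
            entry.ModifiedTime, entry.IsFile, entry.Name);
    }
}
```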