I have the text line below, and I want to extract the date after the "," (i.e. 1 Sep 2015):
Allocation/bundle report 10835.0000 Days report step 228, 1 Sep 2015
I wrote the regex code below, but it returns no matches.
Regex regexdate = new Regex(@"\Allocation/bundle\s+\report\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\,\+(\S)+\s+(\S)+\s+(\S)"); // to get dates
MatchCollection matchesdate = regexdate.Matches(text);
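Can you advise what's wrong with the regex I wrote?
\A is an anchor that asserts the start of the string; you must have meant a literal A. (\S)+ must be turned into (\S+): a repeated capturing group only keeps its last iteration, so (\S)+ captures a single character. Also, \r matches a carriage return; again, remove the backslash to turn \r into a literal r. Finally, \+ after the comma matches a literal plus sign, so use \s+ to match the space there.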
Use
@"Allocation/bundle\s+report\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\,\s+(\S+)\s+(\S+)\s+(\S+)"
Note that the last part of the regex may be made a bit more specific to match 1+ digits, then some letters and then 4 digits: (\S+)\s+(\S+)\s+(\S+) -> (\d+)\s+(\p{L}+)\s+(\d{4})
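A minimal sketch of the corrected pattern with the more specific tail, reading the three groups back out. The class and method names (ReportDateRegex, TryExtract) are made up for illustration, and it assumes the line always has exactly five tokens between "report" and the comma, as in the sample.

```csharp
using System.Text.RegularExpressions;

public static class ReportDateRegex
{
    // Corrected pattern with the more specific tail:
    // (\d+) = day, (\p{L}+) = month name, (\d{4}) = year.
    private static readonly Regex Pattern = new Regex(
        @"Allocation/bundle\s+report\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+,\s+(\d+)\s+(\p{L}+)\s+(\d{4})");

    // Returns false (and empty strings) when the line does not match.
    public static bool TryExtract(string text, out string day, out string month, out string year)
    {
        Match m = Pattern.Match(text);
        day = m.Success ? m.Groups[1].Value : string.Empty;
        month = m.Success ? m.Groups[2].Value : string.Empty;
        year = m.Success ? m.Groups[3].Value : string.Empty;
        return m.Success;
    }
}
```

For the sample line this yields "1", "Sep" and "2015" in the three out parameters.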
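Could you do it without a regex? Here's an example using a bit of help from LINQ.
using System;
using System.Linq;
var text = "Allocation/bundle report 10835.0000 Days report step 228, 1 Sep 2015";
var parts = text.Split(',');
// Last() would return the whole line if there were no comma, so check the length first
var sDate = parts.Length > 1 ? parts.Last().Trim() : string.Empty;
if (string.IsNullOrEmpty(sDate))
{
Console.WriteLine("No date found.");
}
else
{
Console.WriteLine(sDate); // Prints "1 Sep 2015"
}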
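If you also need a DateTime rather than a string, the text after the comma can be handed to DateTime.TryParseExact. This is a sketch assuming English month names, either short ("Sep") or full ("September"); ReportDateParser and ParseTrailingDate are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ReportDateParser
{
    // Takes the text after the last comma and tries to read it as e.g. "1 Sep 2015".
    public static DateTime? ParseTrailingDate(string text)
    {
        string[] parts = text.Split(',');
        if (parts.Length < 2)
        {
            return null; // no comma, so no date part
        }

        string sDate = parts.Last().Trim();
        // InvariantCulture supplies English month names regardless of the machine's culture.
        string[] formats = { "d MMM yyyy", "d MMMM yyyy" };
        return DateTime.TryParseExact(sDate, formats, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out DateTime date)
            ? date
            : (DateTime?)null;
    }
}
```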
I have a string in the format:
خصم بقيمة 108 بتاريخ 31-01-2021 (roughly: "discount in the amount of 108, dated 31-01-2021")
I want to replace the digits between the words بقيمة ("in the amount of") and بتاريخ ("dated") with a "?" character, while keeping the digits in the date part of the string.
I tried using this regular expression: (?<=بقيمة)(.*?)(?=بتاريخ)
It works on https://regex101.com/, but when I use it in C# with the Regex.Replace function, it has no effect with the Arabic words:
e.Row.Cells[3].Text = Regex.Replace(e.Row.Cells[3].Text, "(?<=بقيمة)(.*?)(?=بتاريخ)", "?");
But it works if I use Latin letters:
e.Row.Cells[3].Text = Regex.Replace(e.Row.Cells[3].Text, "(?<=X)(.*?)(?=Y)", "?");
Is there any way to make the function work with Arabic characters?
Or is there a better approach I can take to achieve the desired result, for example by excluding the date part?
Since the digits you need (the ones without "-") are surrounded by spaces, just use \s(\d+)\s.
var txt = "خصم بقيمة 108 بتاريخ 12-31-2021";
var pattern = @"\s(\d+)\s";
Console.WriteLine( Regex.Match(txt, pattern).Groups[1].Value ); // 108 (.Value alone would include the surrounding spaces)
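The answer above finds the amount but does not replace it. A sketch of the replacement anchored on the two Arabic words, as the question asked: .NET allows variable-length lookbehind, so \s* can go inside (?<=...), and only the digits between the words are replaced. AmountMasker is a hypothetical name, and the sketch assumes the text contains the Arabic words literally, as in the sample. In .NET, \d also matches other Unicode decimal digits, such as Arabic-Indic ones.

```csharp
using System.Text.RegularExpressions;

public static class AmountMasker
{
    // Digits preceded by بقيمة (plus optional whitespace) and followed by
    // optional whitespace and بتاريخ; the date after بتاريخ is never touched.
    private static readonly Regex AmountPattern = new Regex(@"(?<=بقيمة\s*)\d+(?=\s*بتاريخ)");

    public static string Mask(string text) => AmountPattern.Replace(text, "?");
}
```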
While looping through the elements of a List<string>, I want to extract the first date/time.
Here is a sample line. In this case the month is a single digit; it can also be two digits, e.g. 12, but it is never zero-padded like 01.
string line = "\\\\SomeServer\\HTTP\\demo1\\index.cfm 4 KB CFM " +
"File 2/19/2019 3:48:21 PM " +
"2/19/2019 1:05:53 PM 2/19/2019 1:05:53 PM 5";
The expected result would be
2/19/2019 3:48:21 PM
I have looked at various regular expression samples here. The following one only handles two-digit months and does not return the time portion (as I don't know what pattern to use for it).
var line = "\\\\SomeServer\\HTTP\\FolderName\\index.cfm 4 KB CFM " +
"File 02/19/2019 3:48:21 PM " +
"2/19/2019 1:05:53 PM 2/19/2019 1:05:53 PM 5";
var match = Regex.Match(line,
@"\d{2}\/\d{2}\/\d{4}");
var dateValue = match.Value;
if (!string.IsNullOrWhiteSpace(dateValue))
{
// "/" in a format string is the culture's date separator, so parse with the invariant culture
var dateTime = DateTime.ParseExact(dateValue,
"MM/dd/yyyy",
CultureInfo.InvariantCulture);
Console.WriteLine(
dateTime.ToString(CultureInfo.InvariantCulture));
}
In closing, I've looked at the recommended questions that came up when posting this question, and I have virtually no expertise creating regular expressions. I'd appreciate any recommendations or code samples to point me in the right direction.
You may use
\b\d{1,2}/\d{1,2}/\d{4}\s\d{1,2}:\d{2}:\d{2}\s?[AP]M\b
Regex.Match returns the first match.
Details
\b - word boundary
\d{1,2}/\d{1,2}/\d{4} - one or two digits, /, one or two digits, /, four digits
\s - a whitespace
\d{1,2}:\d{2}:\d{2} - 1 or 2 digits, :, 2 digits, :, 2 digits
\s? - an optional whitespace
[AP]M - AM or PM
\b - word boundary.
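A sketch that uses this pattern and then parses the match into a DateTime. It assumes US-style month/day order and English AM/PM designators (hence CultureInfo.InvariantCulture); FirstDateTimeFinder and its methods are hypothetical names. Because the pattern also accepts 3:48:21PM without a space, the match is normalized first so one format string covers both forms.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FirstDateTimeFinder
{
    // The pattern from the answer above, unchanged.
    private static readonly Regex DateTimePattern = new Regex(
        @"\b\d{1,2}/\d{1,2}/\d{4}\s\d{1,2}:\d{2}:\d{2}\s?[AP]M\b");

    // Returns the first date/time in a single line, or null if there is none.
    public static DateTime? FindFirstInLine(string line)
    {
        Match m = DateTimePattern.Match(line); // Match stops at the first occurrence
        if (!m.Success)
        {
            return null;
        }

        // Put exactly one space before AM/PM so "3:48:21PM" and "3:48:21 PM" parse alike.
        string normalized = Regex.Replace(m.Value, @"\s?([AP]M)$", " $1");

        // "M", "d" and "h" accept one or two digits when parsing.
        if (DateTime.TryParseExact(normalized, "M/d/yyyy h:mm:ss tt",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime result))
        {
            return result;
        }
        return null;
    }

    // Loops over the list and returns the first date/time found in any line.
    public static DateTime? FindFirstInList(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            DateTime? found = FindFirstInLine(line);
            if (found.HasValue)
            {
                return found;
            }
        }
        return null;
    }
}
```

If the first match on a line is not a real date (e.g. 2/30/2019), FindFirstInLine returns null rather than looking further along that line.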
I'm pretty bad at regex (C#), and my attempts at the following give nonsense results.
Given string: 058:09:07
where only the last two digits are guaranteed, I need the result of:
"58y 9m 7d"
The needed rules are:
The last two digits ("07") are the days group and are always present. If they are "00", only a single "0" is printed.
The group immediately to the left of "07", which ends with ":", holds the months and is only present if there are enough days to roll over into months. Again, if it is "00", only a single "0" is printed.
The group immediately to the left of "09:", which ends with ":", holds the years and is only present if more than 12 months are needed.
In each group, a leading "0" is dropped.
(This is the result of an age calculation where 058:09:07 means 58 years, 9 months, and 7 days old. The ":" (colon) is always used to separate years from months from days.)
Example:
058:09:07 --> 58y 9m 7d
01:00 --> 1m 0d
08:00:00 --> 8y 0m 0d
00 --> 0d
Any help is most appreciated.
Well, you can pretty much do this without regex.
var str = "058:09:07";
var integers = str.Split(':').Select(int.Parse).ToArray();
var result = "";
switch(integers.Length)
{
case 1:
result = string.Format("{0}d", integers[0]); break;
case 2:
result = string.Format("{0}m {1}d", integers[0], integers[1]); break;
case 3:
result = string.Format("{0}y {1}m {2}d", integers[0], integers[1], integers[2]); break;
}
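The same split approach wrapped in a reusable method and checked against the examples from the question. AgeFormatter is a hypothetical name, and throwing on more than three groups is an assumption about how bad input should be handled.

```csharp
using System;
using System.Linq;

public static class AgeFormatter
{
    // Splits on ':' and formats from the right: days always, months and years when present.
    public static string Format(string age)
    {
        int[] parts = age.Split(':').Select(int.Parse).ToArray(); // int.Parse drops leading zeros
        switch (parts.Length)
        {
            case 1: return $"{parts[0]}d";
            case 2: return $"{parts[0]}m {parts[1]}d";
            case 3: return $"{parts[0]}y {parts[1]}m {parts[2]}d";
            default: throw new FormatException($"Unexpected age format: {age}");
        }
    }
}
```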
If you want to use regex so badly that it hurts, you can use this instead:
var integers = Regex.Matches(str, @"\d+").Cast<Match>().Select(x => int.Parse(x.Value)).ToArray();
But that adds overhead, of course. Regex is not a parsing language; it is a pattern-matching language and should be used as one, for example to find substrings in strings. If you can get the final substrings simply by splitting on a character, why not do that?
DISCLAIMER: I am posting this answer for educational purposes. If the whole string represents the time span, the easiest and most correct way is eocron06's answer.
The point here is that you have optional parts that go in a specific order. To match them all correctly you may use the following regex:
\b(?:(?:0*(?<h>\d+):)?0*(?<m>\d+):)?0*(?<d>\d+)\b
Details:
\b - initial word boundary
(?: - start of a non-capturing optional group (see the ? at the end below)
(?:0*(?<h>\d+):)? - a nested non-capturing optional group that matches zero or more zeros (to trim leading zeros from this part), then captures 1+ digits (the years) into Group "h" and matches a :
0*(?<m>\d+): - again, matches zero or more 0s, then captures one or more digits (the months) into Group "m" and matches a :
)? - end of the first optional group
0*(?<d>\d+) - same as the first two above, but captures 1+ digits (days) into Group "d"
\b - trailing word boundary
The C# code below builds the final string based on which groups matched:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
var pattern = @"\b(?:(?:0*(?<h>\d+):)?0*(?<m>\d+):)?0*(?<d>\d+)\b";
var strs = new List<string>() {"07", "09:07", "058:09:07", "01:00", "08:00:00", "00" };
foreach (var s in strs)
{
var result = Regex.Replace(s, pattern, m =>
m.Groups["h"].Success && m.Groups["m"].Success ?
string.Format("{0}y {1}m {2}d", m.Groups["h"].Value, m.Groups["m"].Value, m.Groups["d"].Value) :
m.Groups["m"].Success ?
string.Format("{0}m {1}d", m.Groups["m"].Value, m.Groups["d"].Value) :
string.Format("{0}d", m.Groups["d"].Value)
);
Console.WriteLine(result);
}
}
}
Let me preface this by saying I am new to regex and C#, so I am still trying to figure them out. I also realize that regex is a deep subject that takes time to understand. I have done a little research, but I don't have the time needed to properly study regex syntax, as I need this program finished tomorrow. (No, this is not homework; it is for my job.)
I am using C# to search through a text file line by line, and I am trying to use a regex to check whether any line contains a date in the current month in the format MM-DD. The regex is used within a method that is passed each line of the file.
Here is the method I am currently using:
private bool CheckTransactionDates(string line)
{
// in the actual code this is dynamically set based on other variables
string month = "12";
Regex regExPattern = new Regex(@"\s" + month + @"-\d(0[1-9]|[1-2][0-9]|3[0-1])\s");
Match match = regExPattern.Match(line);
return match.Success;
}
Essentially, I need it to match only when the date is preceded by a space and followed by a space, and only if it is the current month (in this case 12), a hyphen, and a valid day of the month (" 12-01 " should match, but " 12-99 " should not). There are always two digits on either side of the hyphen.
This regex (the only one I can get to match) works, but it also picks up values outside the valid range:
Regex regExPattern = new Regex(@"\s" + month + @"-\d{2}\s");
I have also tried this without success:
Regex regExPattern = new Regex(@"\s" + month + @"-\d[01-30]{2}\s");
Can anyone tell me what I need to change to get the results I need?
Thanks in advance.
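For reference, a sketch of the first attempt with its main problem fixed. In @"\s" + month + @"-\d(0[1-9]|...)", the extra \d consumes one digit before the day group, so " 12-01 " can never match; removing it is enough. (Also, [01-30] is a character class matching a single character from 0 to 3, not a numeric range.) Passing month as a parameter is an assumption, and this version still accepts 31 for 30-day months, which the answers below handle by validating the day separately.

```csharp
using System.Text.RegularExpressions;

public static class TransactionDateCheck
{
    // The question's first pattern without the stray \d after the hyphen.
    // month is expected as a two-digit string such as "12".
    public static bool CheckTransactionDates(string line, string month)
    {
        var regExPattern = new Regex(@"\s" + Regex.Escape(month) + @"-(0[1-9]|[12][0-9]|3[01])\s");
        return regExPattern.IsMatch(line);
    }
}
```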
If you just need to find out if the line contains any valid match, something like this will work:
private bool CheckTransactionDates(string line)
{
// in the actual code this is dynamically set based on other variables
int month = DateTime.Now.Month;
int daysInMonth = DateTime.DaysInMonth(DateTime.Today.Year, month);
Regex pattern = new Regex(string.Format(@"{0:00}-(?<DAY>[0123][0-9])", month));
int day = 0;
foreach (Match match in pattern.Matches(line))
{
if (int.TryParse(match.Groups["DAY"].Value, out day))
{
// "00" also matches [0123][0-9], so reject day 0 explicitly
if (day >= 1 && day <= daysInMonth)
{
return true;
}
}
}
return false;
}
Here's how it works:
You determine the month to search for (here, I use the current month), and the number of days in that month.
Next, the regex pattern is built using a string.Format function that puts the left-zero-padded month, followed by dash, followed by any two digit number 00 to 39 (the [0123] for the first digit, the [0-9] for the second digit). This narrows the regex matches, but not conclusively for a date. The (?<DAY>...) that surrounds it creates a regex group, which will make processing it later easier. Note that I didn't check for a whitespace, in case the line begins with a valid date. You could easily add a space to the pattern, or modify the pattern to your specific needs.
Next, we check all possible matches on that line (pattern.Matches) in a loop.
If a match is found, we then try to parse it as an integer (it should always work, based on the pattern we are matching). We use the DAY group of that match that we defined in the pattern.
After parsing that match into an integer day, we check to see if that day is a valid number for the month specified. If it is, we return true from the function, as we found a valid date.
Finally, if we found no matches, or if none of the matches is valid, we return false from the function (only if we hadn't returned true earlier).
One thing to note is that \s matches any whitespace character, not just a space:
\s matches any whitespace character, such as [\r\n\t\f\v ] (in .NET it also matches other Unicode whitespace)
A regex that looks for a literal space, such as " (12-\d{2}) " with a space on each side, would not match tabs or line breaks. That said, I agree with the rest of the community on what to do with the matches: you need to go through every match and validate the date with a better approach:
var input = string.Format(
" 11-20 2690 E 28.76 12-02 2468 E* 387.85{0}11-15 3610 E 29.34 12-87 2534 E",
Environment.NewLine);
var pattern = string.Format(@" ({0}-\d{{2}}) ", DateTime.Now.ToString("MM"));
var lines = new List<string>();
foreach (var line in input.Split(new string[] { Environment.NewLine },
StringSplitOptions.RemoveEmptyEntries))
{
var m = Regex.Match(line, pattern);
if (!m.Success)
{
continue;
}
DateTime dt;
if (!DateTime.TryParseExact(m.Value.Trim(),
"MM-dd",
null,
DateTimeStyles.None,
out dt))
{
continue;
}
lines.Add(line);
}
I went through the lines one at a time because presumably you need to know which lines are good and which are bad. My logic may not exactly match what you need, but you can easily modify it.
I have a regular expression that matches a date format like: 26 August 2011
I'm trying to read each line in a file and capture the lines that contain a date in the above format, but it does not seem to be working:
Regex test = new Regex(@"^((31(?!\ (Feb(ruary)?|Apr(il)?|June?|(Sep(?=\b|t)t?|Nov)(ember)?)))|((30|29)(?!\ Feb(ruary)?))|(29(?=\ Feb(ruary)?\ (((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))|(0?[1-9])|1\d|2[0-8])\ (Jan(uary)?|Feb(ruary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sep(?=\b|t)t?|Nov|Dec)(ember)?)\ ((1[6-9]|[2-9]\d)\d{2})$");
StreamReader file = new StreamReader(outputFile);
while ((line2 = file.ReadLine()) != null)
{
lines.Add(line2);
foreach (Match match in test.Matches(line2))
{
v += match.Value;
}
}
OK, so this is the scenario:
1st - If the line is just "26 August 2011", it returns that date.
2nd - If the line contains "some text etc 26 August 2011", it returns nothing.
Any idea how this issue can be tackled?
The leading ^ character in your regular expression says, "match starting at the beginning of the line." And the last character is $, meaning that the line has to end with the expression. So if your line contains anything other than a date in the format you specified, the regular expression isn't going to match.
Remove the ^ at the front and the $ at the end.
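A sketch of the question's pattern with ^ and $ replaced by \b rather than simply deleted. The word boundaries are an addition, not part of the answer above: without them, "29 February 2011" would still yield "9 February 2011", and "126 August 2011" would yield "26 August 2011". FileDateMatcher and FirstDate are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class FileDateMatcher
{
    // The question's pattern with the leading ^ and trailing $ replaced by \b, so a date
    // can appear anywhere in the line but not in the middle of a longer number.
    private static readonly Regex Test = new Regex(@"\b((31(?!\ (Feb(ruary)?|Apr(il)?|June?|(Sep(?=\b|t)t?|Nov)(ember)?)))|((30|29)(?!\ Feb(ruary)?))|(29(?=\ Feb(ruary)?\ (((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))|(0?[1-9])|1\d|2[0-8])\ (Jan(uary)?|Feb(ruary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sep(?=\b|t)t?|Nov|Dec)(ember)?)\ ((1[6-9]|[2-9]\d)\d{2})\b");

    // Returns the first date found in the line, or null if there is none.
    public static string FirstDate(string line)
    {
        Match match = Test.Match(line);
        return match.Success ? match.Value : null;
    }
}
```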
I'm guessing test is defined as Regex test=new Regex("26 August 2011");
Try this
StreamReader file = new StreamReader(outputFile);
while ((line2 = file.ReadLine()) != null)
{
lines.Add(line2);
if (test.IsMatch(line2))
{
v += line2;
}
}
That said, you probably want to use a StringBuilder for performance (e.g. v = new StringBuilder()) and then call v.Append(line2) instead of v += line2.
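A sketch of that StringBuilder version, reading the file with File.ReadLines instead of a StreamReader loop. The class and method names are hypothetical, and using AppendLine (rather than Append) is a choice so the collected lines stay separated.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class MatchingLineCollector
{
    // Collects every line of the file that the regex matches anywhere in the line.
    public static string CollectMatchingLines(string path, Regex test)
    {
        var v = new StringBuilder();
        foreach (string line in File.ReadLines(path)) // reads the file lazily, line by line
        {
            if (test.IsMatch(line))
            {
                v.AppendLine(line);
            }
        }
        return v.ToString();
    }
}
```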
--UPDATE
Reading your updated question with the provided regex: if you just use your existing code and remove the ^ at the beginning of the regex and the $ at the end, your code will find all dates within the file regardless of their position, if that is what you are after.