Formatting long datetime string to remove T character - c#

I have a number of XML nodes which output a datetime object as string.
The problem is that when both the date and the time are output, they are joined by a T character.
Here is an example
2016-01-13T23:59:59
Of course all of the nodes in the XML are of a different type, so grouping by name or type is out of the question. I'm thinking my only option is to match a pattern with regex and resolve the problem that way.
Below is an example of how the XML would work. You can see that each element is named differently but they all follow a similar pattern, where the T between the date and the time must be removed and replaced with a space.
<dates>
<1stDate> 2016-01-13T23:59:59 </1stDate>
<2ndDate> 2017-01-13T23:55:57 </2ndDate>
<3rdDate> 2018-01-13T23:22:19 </3rdDate>
</dates>
Ideal solution to output like this
2016-01-13 23:59:59
2017-01-13 23:55:57
2018-01-13 23:22:19
I haven't had to use regex before, but I know what it is. I have been trying to decode what this cheat sheet means http://regexlib.com/CheatSheet.aspx?AspxAutoDetectCookieSupport=1 but to no avail.
UPDATE
//How each node is output
foreach (XText node in nodes)
{
node.Value = node.Value.Replace("T"," "); // Where a date occurs, replace T with space.
}
The <date> elements provided in the example may contain dates in my XML but may not include the word date as a name.
e.g.
<Start> 2017-01-13T23:55:57 </Start>
<End> 2018-01-13T23:22:19 </End>
<FirstDate> 2018-01-13T23:22:19 </FirstDate>
The main reason I would have liked a regex solution is that I need to match the string against a pattern that can determine whether it is a date or not; then I can apply formatting.

Why not parse that (perfectly valid ISO-8601) date time into a DateTime, and then use the built in string formatting to produce a presentable human readable date time?
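// Requires: using System; using System.Globalization;
if (!string.IsNullOrWhiteSpace(node.Value))
{
    DateTime date;
    // DateTimeStyles.None keeps the parsed time as written; AssumeUniversal
    // would convert it to local time and shift the output.
    if (DateTime.TryParseExact(node.Value.Trim(),
                               @"yyyy-MM-dd\THH:mm:ss",
                               CultureInfo.InvariantCulture,
                               DateTimeStyles.None,
                               out date))
    {
        node.Value = date.ToString("yyyy-MM-dd HH:mm:ss");
    }
}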
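To apply this to every text node regardless of the element name, something along these lines might work. It is only a sketch: the class and method names are made up for illustration, and it assumes the document is loaded with LINQ to XML (the question's loop over XText nodes suggests that) and that element names are valid XML names such as Start and End.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original question or answer.
public static class XmlDateNormalizer
{
    public static string Normalize(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Visit every text node; element names are never inspected.
        foreach (XText node in doc.DescendantNodes().OfType<XText>().ToList())
        {
            DateTime date;
            // Only text that parses as yyyy-MM-ddTHH:mm:ss is rewritten;
            // anything else (including words containing "T") is left alone.
            if (DateTime.TryParseExact(node.Value.Trim(),
                                       @"yyyy-MM-dd\THH:mm:ss",
                                       CultureInfo.InvariantCulture,
                                       DateTimeStyles.None,
                                       out date))
            {
                node.Value = date.ToString("yyyy-MM-dd HH:mm:ss",
                                           CultureInfo.InvariantCulture);
            }
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Note that the surrounding whitespace inside a date element is dropped by this sketch, because the node value is replaced with the formatted date.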

I would use:
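DateTime parsed;
if (DateTime.TryParse(yourString, out parsed))
{
    // Strings are immutable, so the result of Replace must be assigned back.
    yourString = yourString.Replace("T", " ");
}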
EDIT
If you would only like to replace the first instance of the letter "T" like I think you are suggesting in your UPDATE. You could use this extension method:
public static string ReplaceFirst(this string text, string search, string replace)
{
int pos = text.IndexOf(search);
if (pos < 0)
{
return text;
}
return text.Substring(0, pos) + replace + text.Substring(pos + search.Length);
}
and you would use it like:
yourString = yourString.ReplaceFirst("T", " ");

If you still want to do this with regex, the following expression should do the trick:
# Positive lookbehind for date part which consists of numbers and dashes
(?<=[0-9-]+)
# Match the T in between
T
# Positive lookahead for time part which consists of numbers and colons
(?=[0-9:]+)
EDIT
The regex above will NOT check if the string is in date/time format. It is a generic pattern. To impose the format for your strings use this pattern:
# Positive lookbehind for date part
(?<=\d{4}(-\d{2}){2})
# Match the T
T
# Positive lookahead for time part
(?=\d{2}(:\d{2}){2})
Again, this will match exactly the strings you have, but you should not use it to validate date/time values because it will also match invalid dates like 2015-15-10T24:12:10; to validate date/time values use the DateTime.Parse() or DateTime.TryParse() methods.
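For completeness, here is roughly how the stricter pattern could be used from C#. This is a sketch only; the class and method names are invented for the example, and the pattern is written on one line without the comments.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class IsoSeparator
{
    // A "T" preceded by dddd-dd-dd and followed by dd:dd:dd.
    // This checks the shape of the string only, not whether the date is valid.
    private static readonly Regex DateTimeT =
        new Regex(@"(?<=\d{4}(-\d{2}){2})T(?=\d{2}(:\d{2}){2})");

    public static string ReplaceT(string text)
    {
        // Lookarounds consume nothing, so only the "T" itself is replaced.
        return DateTimeT.Replace(text, " ");
    }
}
```

Because the lookbehind and lookahead must both succeed, a "T" inside ordinary text is left untouched, which a plain Replace("T", " ") would not guarantee.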

Related

Extract value from a string in C# from a specific position

I have bunch of files in a folder and I am looping through them.
How do I extract the value from the below example? I need the value 0519 only.
DOC 75-20-0519-1.PDF
The below code gives the complete part include -1.
Convert.ToInt32(Path.GetFileNameWithoutExtension(objFile).Split('-')[2]);
Appreciate any help.
You can try regular expressions in order to match the value.
pattern:
[0-9]+ - one or more digits
(?=[^0-9][0-9]+$) - followed by not a digit and one or more digits and end of string
code:
using System.Text.RegularExpressions;
...
string file = "DOC 75-20-0519-1.PDF";
// "0519"
string result = Regex
.Match(Path.GetFileNameWithoutExtension(file), @"[0-9]+(?=[^0-9][0-9]+$)")
.Value;
If Split('-') fails, and you have an entire string as a result, it seems that you have a wrong delimiter. It can be, say, one of the dashes:
"DOC 75–20–0519–1.PDF"; // en dash
"DOC 75—20—0519—1.PDF"; // em dash
You can use REGEX for this
Match match = Regex.Match("DOC 75-20-0519-1.PDF", @"DOC\s+\d+\-\d+\-(\d+)\-\d+", RegexOptions.IgnoreCase);
string data = match.Groups[1].Value;

C# Extract part of the string that starts with specific letters

I have a string which I extract from an HTML document like this:
var elas = htmlDoc.DocumentNode.SelectSingleNode("//a[@class='a-size-small a-link-normal a-text-normal']");
if (elas != null)
{
//
_extractedString = elas.Attributes["href"].Value;
}
The HREF attribute contains this part of the string:
gp/offer-listing/B002755TC0/
And I'm trying to extract the B002755TC0 value, but the problem here is that the string will vary by its length and I cannot simply use Substring method that C# offers to extract that value...
Instead I was wondering whether there's a clever way to do this, perhaps by matching the part of the string that comes just before what I'm searching for?
For example I know for a fact that each href has this structure like I've shown, So I would simply match these keywords:
offer-listing/
So I would find this keyword and start extracting the part of the string B002755TC0 until the next " / " sign ?
Can someone help me out with this ?
This is a perfect job for a regular expression :
string text = "gp/offer-listing/B002755TC0/";
Regex pattern = new Regex(@"offer-listing/(\w+)/");
Match match = pattern.Match(text);
string whatYouAreLookingFor = match.Groups[1].Value;
Explanation : we just match the exact pattern you need.
'offer-listing/'
followed by any combination of (at least one) 'word characters' (letters, digits and underscore; note that a hyphen is not a word character),
followed by a slash.
The parenthesis () mean 'capture this group' (so we can extract it later with match.Groups[1]).
EDIT: if you want to extract also from this : /dp/B01KRHBT9Q/
Then you could use this pattern :
Regex pattern = new Regex(@"/(\w+)/$");
which will match both this string and the previous. The $ stands for the end of the string, so this literally means :
capture the characters in between the last two slashes of the string
Though there is already an accepted answer, I thought of sharing another solution that doesn't use regex. Find the position of your pattern in the input and add its length; the wanted text starts at that index. To find the end, search for the first "/" after the beginning of the wanted text:
string input = "gp/offer-listing/B002755TC0/";
string pat = "offer-listing/";
int begining = input.IndexOf(pat)+pat.Length;
int end = input.IndexOf("/",begining);
string result = input.Substring(begining,end-begining);
If your desired output is always the last piece, you could also use split and get the last non-empty piece:
string result2 = input.Split(new string[]{"/"},StringSplitOptions.RemoveEmptyEntries)
.ToList().Last();

Using RegEx to match Month-Day in C#

Let me preface this by saying I am new to Regex and C# so I am still trying to figure it out. I also realize that Regex is a deep subject that takes time to understand. I have done a little research to figure this out but I don't have the time needed to properly study the art of Regex syntax as I need this program finished tomorrow. (no this is not homework, it is for my job)
I am using c# to search through a text file line by line and I am trying to use a Regex expression to check whether any lines contain any dates of the current month in the format MM-DD. The Regex expression is used within a method that is passed each line of the file.
Here is the method I am currently using:
private bool CheckTransactionDates(string line)
{
// in the actual code this is dynamically set based on other variables
string month = "12";
Regex regExPattern = new Regex(@"\s" + month + @"-\d(0[1-9]|[1-2][0-9]|3[0-1])\s");
Match match = regExPattern.Match(line);
return match.Success;
}
Essentially I need it to match only if it is preceded by a space and followed by a space, and only if it is the current month (in this case 12), a hyphen, and a day of the month (" 12-01 " should match but not " 12-99 "). It should always be 2 digits on either side of the hyphen.
This Regex (The only thing I can make match) will work, but also picks up items outside the necessary range:
Regex regExPattern = new Regex(@"\s" + month + @"-\d{2}\s");
I have also tried this without success:
Regex regExPattern = new Regex(@"\s" + month + @"-\d[01-30]{2}\s");
Can anyone tell me what I need to change to get the results I need?
Thanks in advance.
If you just need to find out if the line contains any valid match, something like this will work:
private bool CheckTransactionDates(string line)
{
// in the actual code this is dynamically set based on other variables
int month = DateTime.Now.Month;
int daysInMonth = DateTime.DaysInMonth(DateTime.Today.Year, DateTime.Today.Month);
Regex pattern = new Regex(string.Format(@"{0:00}-(?<DAY>[0123][0-9])", month));
int day = 0;
foreach (Match match in pattern.Matches(line))
{
if (int.TryParse(match.Groups["DAY"].Value, out day))
{
if (day >= 1 && day <= daysInMonth) // the pattern also matches day "00", which is not a valid day
{
return true;
}
}
}
return false;
}
Here's how it works:
You determine the month to search for (here, I use the current month), and the number of days in that month.
Next, the regex pattern is built using a string.Format function that puts the left-zero-padded month, followed by dash, followed by any two digit number 00 to 39 (the [0123] for the first digit, the [0-9] for the second digit). This narrows the regex matches, but not conclusively for a date. The (?<DAY>...) that surrounds it creates a regex group, which will make processing it later easier. Note that I didn't check for a whitespace, in case the line begins with a valid date. You could easily add a space to the pattern, or modify the pattern to your specific needs.
Next, we check all possible matches on that line (pattern.Matches) in a loop.
If a match is found, we then try to parse it as an integer (it should always work, based on the pattern we are matching). We use the DAY group of that match that we defined in the pattern.
After parsing that match into an integer day, we check to see if that day is a valid number for the month specified. If it is, we return true from the function, as we found a valid date.
Finally, if we found no matches, or if none of the matches is valid, we return false from the function (only if we hadn't returned true earlier).
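If you do want the whitespace requirement from the question, one possible variation is sketched below. The names are made up for the example, and the year and month are passed in rather than read from the clock so the behaviour is easy to check. It uses lookarounds instead of \s so that a date at the very start or end of the line still counts.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical variation for illustration.
public static class TransactionDates
{
    public static bool ContainsDateInMonth(string line, int year, int month)
    {
        int daysInMonth = DateTime.DaysInMonth(year, month);

        // (?<!\S) = not preceded by a non-whitespace character,
        // (?!\S)  = not followed by one; both also succeed at the line edges.
        // {{2}} is an escaped {2} because of string.Format.
        Regex pattern = new Regex(
            string.Format(@"(?<!\S){0:00}-(?<DAY>[0-9]{{2}})(?!\S)", month));

        foreach (Match match in pattern.Matches(line))
        {
            int day = int.Parse(match.Groups["DAY"].Value);
            if (day >= 1 && day <= daysInMonth)
            {
                return true;
            }
        }
        return false;
    }
}
```

With this version a token such as 12-02 only counts when it stands on its own, so the 12-01 inside 2012-01-15 is ignored.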
One thing to note is that \s matches any white space character, not just a space:
\s match any white space character [\r\n\t\f ]
However, a regex that literally looks for a space would not, for example one like this: " (12-\d{2}) " with a literal space on each side. That said, I've got to go with the rest of the community a bit on what to do with the matches. You're going to need to go through every match and validate the date with a better approach:
var input = string.Format(
" 11-20 2690 E 28.76 12-02 2468 E* 387.85{0}11-15 3610 E 29.34 12-87 2534 E",
Environment.NewLine);
var pattern = string.Format(@" ({0}-\d{{2}}) ", DateTime.Now.ToString("MM"));
var lines = new List<string>();
foreach (var line in input.Split(new string[] { Environment.NewLine },
StringSplitOptions.RemoveEmptyEntries))
{
var m = Regex.Match(line, pattern);
if (!m.Success)
{
continue;
}
DateTime dt;
if (!DateTime.TryParseExact(m.Value.Trim(),
"MM-dd",
null,
DateTimeStyles.None,
out dt))
{
continue;
}
lines.Add(line);
}
The reason I went through the lines one at a time is because presumably you need to know what line is good and what line is bad. My logic may not exactly match what you need but you can easily modify it.

how to extract date from a string using regex

I'm looking for a regex which can extract the date from the following HTML
<p>British Medical Journal, 29.9.12, pp.37-41.</p>
and convert it to the format 29/09/12
Match this pattern: -
(\d+)[.](\d+)[.](\d+)
and replace with: -
$1/$2/$3
\d is used to match digits. Using it with quantifier (+), you would match one or more digits.
Now, in regex, a dot(.) is a metacharacter, that matches any character. To match a period literally, you would need to either escape it, or use it inside a character class.
To convert to a specific Date Format, e.g.: - convert 9 -> 09, you can make use of a MatchEvaluator: -
string input = "British Medical Journal, 29.9.12, pp.37-41.";
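Regex reg = new Regex(@"(\d+)[.](\d+)[.](\d+)");
string result = reg.Replace(input, delegate(Match m) {
    // Parse the matched text as day.month.two-digit-year, then re-emit it zero-padded.
    // ParseExact throws if the match is not a real date (requires System.Globalization).
    DateTime date = DateTime.ParseExact(m.Value, "d.M.yy", CultureInfo.InvariantCulture);
    return date.ToString("dd/MM/yy", CultureInfo.InvariantCulture);
});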
You can check whether it works or not.
Here is the regex pattern: \d{1,2}\.\d{1,2}\.\d{1,2}
And here is the example of how to parse this string to DateTime:
DateTime.ParseExact("29.9.12", "d.M.yy", CultureInfo.InvariantCulture);
Use the regex (\d{4})[-](\d{2})[-](\d{2}) to pick out dates in the 2017-01-23 format.

Can someone help me out with some regex?

I'm trying to get an int[] from the following string. I was initially doing a Regex().Split() but as you can see there isn't a definite way to split this string. I could split on [a-z] but then I have to remove the commas afterwards. It's late and I can't think of a nice way to do this.
"21,false,false,25,false,false,27,false,false,false,false,false,false,false,false"
Any suggestions?
Split on the following:
[^\d]+
You may get an empty element at the beginning if your string doesn't begin with a number and/or an empty element at the end if your string doesn't end with a number. If this happens you can trim those elements. Better yet, if Regex().Split() supports some kind of flag to not return empty strings, use that.
A simple solution here is to match the string with the pattern \d+.
Sometimes it is easier to split by the values you don't want to match, but here that would be an anti-pattern:
MatchCollection numbers = Regex.Matches(s, @"\d+");
Another solution is to use regular String.Split and some LINQ magic. First we split by commas, and then check that all letters in a given token are digits:
var numbers = s.Split(',').Where(word => word.All(Char.IsDigit));
It is a commonly ignored fact that \d in .NET matches all Unicode digits, and not just [0-9] (try ١٢٣ in your array, just for fun). A more robust solution is to try to parse each token according to your defined culture, and return the valid numbers. This can easily be adapted to support decimals, exponents or culture-specific numeric formats:
static IEnumerable<int> GetIntegers(string s)
{
CultureInfo culture = CultureInfo.InvariantCulture;
string[] tokens = s.Split(',');
foreach (string token in tokens)
{
int number;
if (Int32.TryParse(token, NumberStyles.Integer, culture, out number))
yield return number;
}
}
