Most loose way to parse a date/time in C#? - c#

I'm parsing a broad range of RSS feeds, and apparently they all use their own way to show the timestamp of the article.
Now we have even found one that uses local words, like Donderdag 17 juli 2018.
At the moment we have a fallback mechanism where we just fall back to DateTime.UtcNow when we can't parse the date.
Still, I would like to make a best attempt. What is the best way to really loosely parse a DateTime in C#, so that it can handle, e.g.:
13-11-2018 14.32
donderdag 13 november 2018, 14:32
13 nov 2018
14:32 13.11.2018
2018-11-13T16:32:00+2:00
etc. I know that this would not be foolproof, but I would still like to make a best attempt.
Is there any recommended way? Or do I have to roll my own?

You could use DateTime.TryParseExact and include all the expected formats.
DateTime result;
if (DateTime.TryParseExact(
        input,
        new[] { "dd-MM-yyyy HH.mm", "dddd dd MMMM yyyy, HH:mm", "more formats here" },
        CultureInfo.CreateSpecificCulture("nl-NL"),
        DateTimeStyles.None,
        out result))
{
    Console.WriteLine("Succeeded " + result);
}
The only big "gotcha" here is date formats where the day and month are in ambiguous positions. I do not see any in your examples, but if you were to mix cultures in one stream it could become a problem. For example, the U.S. generally starts a formatted date with the month, while the Netherlands starts it with the day of the month. If this is a problem, there is no way to handle it dynamically in your use case unless you also get the culture from the RSS stream, in which case you could try to create a set of culture-specific parsing rules.
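As a rough sketch of what such culture-specific rules could look like, assuming the feed tells you its culture (for example through a language tag): the class name, the format table and its entries are hypothetical and would need to be filled with the formats you actually encounter.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CultureAwareDateParser
{
    // Hypothetical table: the exact formats to try for each feed culture.
    private static readonly Dictionary<string, string[]> FormatsByCulture =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "nl-NL", new[] { "dd-MM-yyyy HH.mm", "dd-MM-yyyy" } },
            { "en-US", new[] { "MM-dd-yyyy HH.mm", "MM-dd-yyyy" } },
        };

    // Tries only the formats registered for the given culture name.
    public static bool TryParse(string input, string cultureName, out DateTime result)
    {
        result = default(DateTime);
        string[] formats;
        if (!FormatsByCulture.TryGetValue(cultureName, out formats))
        {
            return false; // unknown culture: let the caller decide on a fallback
        }
        return DateTime.TryParseExact(input, formats,
            CultureInfo.GetCultureInfo(cultureName), DateTimeStyles.None, out result);
    }
}
```

With this table, the same text "02-01-2018" is read as 2 January for an nl-NL feed and as 1 February for an en-US feed, which is exactly the ambiguity that cannot be resolved without knowing the culture.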

This suggestion is not specific to date times, but you could try to use parser combinators, especially if you decide to roll your own solution.
There are multiple libraries for .NET, Sprache for example.

Loosely parsing date times from mixed sources of data is probably not a good idea. Some things like Microsoft's text-to-speech may try, but it can sometimes have the effect of reading consecutive dates as October first, November first, December first, January thirteenth, etc.
The only way loose parsing can be made somewhat reliable is if one can use other cues to associate dates with whatever wrote them. If you have a bunch of dates that occur at the top level of a particular feed, and you find that all parsing patterns that work for all of them yield the same results, then it's likely that that parsing pattern is parsing the dates correctly. The biggest parts of such an endeavor, however, will likely not be parsing the dates, but rather (1) ensuring that dates that are written in different formats get grouped separately, and (2) providing a means by which an operator can assist the program in places where it has trouble.
Incidentally, I don't know if any date parsing programs make use of attached weekdays as part of format validation, but they could often help. For example, "2-1-2018" could either be January 2 or February 1, but "Thursday 2-1-2018" could only be the latter. It may be helpful when parsing numeric dates from a source whose format isn't fully established to determine what the weekday would be with each method of parsing and check whether the input contains something that looks like a weekday matching one but not the other.
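A minimal sketch of that weekday check, assuming only two candidate layouts (d-M-yyyy and M-d-yyyy) and that the weekday has already been extracted from the input; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class WeekdayDisambiguator
{
    // The two competing readings of a numeric date such as "2-1-2018".
    private static readonly string[] CandidateFormats = { "d-M-yyyy", "M-d-yyyy" };

    // Returns every reading of numericDate whose weekday equals statedWeekday.
    public static List<DateTime> Resolve(string numericDate, DayOfWeek statedWeekday)
    {
        var matches = new List<DateTime>();
        foreach (string format in CandidateFormats)
        {
            DateTime parsed;
            if (DateTime.TryParseExact(numericDate, format, CultureInfo.InvariantCulture,
                    DateTimeStyles.None, out parsed)
                && parsed.DayOfWeek == statedWeekday
                && !matches.Contains(parsed))
            {
                matches.Add(parsed);
            }
        }
        return matches;
    }
}
```

Exactly one match means the weekday settled the ambiguity. Zero matches suggests the input is inconsistent (or uses another layout), and the caller can fall back to whatever default it already has.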

You can use the TryParse method to try to parse the strings, while looping through all cultures to capture any culture differences in the string. The following method will parse all standard formats for all cultures and return the date in the out parameter if it's found.
Note that the danger here is that some dates will have ambiguous month and day values (any number less than 13 could be a month or a day). In that case, the result will be the first culture found that matches, which may not be correct.
Here's the code:
public static bool TryParseAllCultures(string formattedDate,
    out DateTime result)
{
    // First try in our local culture
    if (DateTime.TryParse(formattedDate, out result)) return true;
    foreach (var cultureInfo in CultureInfo.GetCultures(CultureTypes.AllCultures))
    {
        if (DateTime.TryParse(formattedDate, cultureInfo, DateTimeStyles.None,
            out result))
        {
            return true;
        }
    }
    return false;
}
Sample usage
Note: I modified one of your dates because the date itself was invalid! The second date used to be "donderdag 13 november 2018", except the 13th is a dinsdag (Tuesday), not a donderdag (Thursday).
private static void Main()
{
    // Sample inputs from the question (the second one has a corrected day number)
    var dateFormats = new List<string>
    {
        "13-11-2018 14.32",
        "donderdag 15 november 2018, 14:32",
        "13 nov 2018",
        "14:32 13.11.2018",
        "2018-11-13T16:32:00+2:00"
    };
    DateTime result;
    foreach (var dateFormat in dateFormats)
    {
        if (TryParseAllCultures(dateFormat, out result))
        {
            Console.ForegroundColor = ConsoleColor.Green;
            Console.WriteLine($"SUCCESS: {dateFormat.PadRight(36, '.')} {result}");
        }
        else
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine($"ERROR: Unable to parse format: {dateFormat}");
        }
        Console.ResetColor();
    }
    // Wait for a key press so the console window stays open
    Console.Write("\nDone! Press any key to exit...");
    Console.ReadKey();
}

Related

How do I parse iso8601-2004 datetimes into datetime following current iso8601 standard [duplicate]

I noticed quite an interesting error when parsing some times.
DateTime fails to parse 24:00:00. Under some Googling and Stacking, I found out that DateTime only recognizes 00 - 23 (what the?????), so if your input is 24:00:00, you're out of luck. You would think someone would put in a condition to equate 24:00:00 as 00:00:00 (the midnight), but not yet..
My question is, how do I allow DateTime to allow me to parse 24:00:00?
Unfortunately I cannot use NodaTime for specification reasons (sorry Jon, I love your library though).
Experimentation below:
An input of 2014-03-18 24:00:00 would present the following error. Expected.
An input of 2014-03-18 23:59:59 would successfully parse. Expected.
An input of 2014-03-19 00:00:00 would successfully parse. Expected.
There is no "24th hour" support in the DateTime class.
The hour (HH/H, 24-hour clock) must be 0-23, inclusive. This is why 00:00:00 is valid, but 24:00:00 is not.
Change 24:00:00 to 00:00:00 (before parsing) and, if needed, advance the day as appropriate (after parsing).
The following will work on times in the provided format (but only up to the 24th hour) although it doesn't account for an arbitrary format. Supporting different format strings only adds additional complications.
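DateTime ParseWithTwentyFourthHourToNextDay(string input)
{
    // Rewrite a trailing "24:mm:ss" to "00:mm:ss" so that DateTime can parse it
    var wrapped = Regex.Replace(input, @"24:(\d\d:\d\d)$", "00:$1");
    var res = DateTime.ParseExact(wrapped, "yyyy-MM-dd HH:mm:ss", null);
    // If the hour was rewritten, the time belongs to the following day
    return wrapped != input
        ? res.AddDays(1)
        : res;
}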
24:00:00 doesn't exist. The valid range is 00:00:00 - 23:59:59.
Why would you want to parse 24:00:00 as a valid time expression? It would be like saying 09:05:60. The upper bound for a time of day is 23:59:59.999..., and after that it rolls over to 00:00:00.
Before parsing, do a simple search and replace - replace '24:00:00' with '00:00:00' and then parse as usual.
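Convert to minutes:
// Assumes t is a TimeSpan that may be negative; wrap it into the 0-24 hour range
if (t.TotalMinutes < 0)
{
    double minutes = 1440 + t.TotalMinutes; // 1440 minutes = 24 hours
    t = TimeSpan.FromMinutes(minutes);
}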

What is the longest string that would convert to a valid DateTime?

I am writing a data parser and trying to work out if a field is a number, a date, a string etc.
The .NET DateTime.TryParse is understandably slow when checking many records (as it checks many different date formats). Therefore, I want to shortcut the processing if possible. A simple check I can do initially is look at the length of the string and reject it if it falls outside of some bounds.
The shortest date I think I should reasonably expect is 6 characters long (e.g. d/M/yy) so I can make the following check:
if (fieldValue.Length < 6)
{
    // no datetime is shorter than 6 chars (e.g. d/M/yy is the shortest I can think of)
    return false;
}
What is the longest string that still represents a parse-able DateTime?
(For example, "Wednesday, 30th September 2020 12:34:56" is pretty long but I bet there are longer examples!)
A few points:
I am not looking for tricksy answers where the date is padded out with white space or something like that.
I am focused on English dates initially but would be interested if other cultures can throw up longer examples.
What is the longest string that still represents a parse-able
DateTime?
Take a look at the list of custom format specifiers for a DateTime, and take all of those into account.
For instance, this:
DateTime dt = DateTime.Now;
string strNow = dt.ToString("dddd, MMMM dd, yyyy gg hh:mm:ss.fffffff tt K");
Console.WriteLine(strNow);
Gives:
Tuesday, June 16, 2020 A.D. 08:47:02.2667911 AM -06:00
But those different types of values can be output differently based on the information in the DateTime. Look CLOSELY at all the different possible outputs for each specifier in the documentation to see what I mean.

How does DateTime.TryParse know the date format?

Here is a scenario.
You have a string that represents a date, e.g. "Jan 25 2016 10:10 AM".
You want to know whether it represents a date in a specific culture.
You want to know what dateTime pattern satisfies this date string.
Example:
Date string is "Jan 25 2016 10:10 AM"
Culture is en-US
The POSSIBLE format for it could be "MMM dd yyyy HH:mm tt"
Implementation:
To get the list of all dateTime patterns, you can call CultureInfo.DateTimeFormat.GetAllDateTimePatterns().
Then try the overloaded version of DateTime.TryParseExact(dateString, pattern, culture, DateTimeStyles.None, out resultingDate) for each of the patterns above and see whether it can parse a date.
That should give you the needed dateTime pattern.
HOWEVER if we iterate all those patterns it will not find any matches!
This is even more weird if you try and use a DateTime.TryParse(dateString, culture, DateTimeStyles.None, out resultingDate) and it DOES parse the correct date!
So the question is how come the DateTime.TryParse knows the pattern of a date string when this info is not a part of CultureInfo and how to get to this info in a culture?
Thanks!
I agree with xanatos: there is no perfect solution for that, and you can't assume that every format GetAllDateTimePatterns returns can be parsed by the Parse or TryParse methods.
From DateTimeFormatInfo.GetAllDateTimePatterns:
You can use the custom format strings in the array returned by the
GetAllDateTimePatterns method in formatting operations. However, if
you do, the string representation of a date and time value returned in
that formatting operation cannot always be parsed successfully by the
Parse and TryParse methods. Therefore, you cannot assume that the
custom format strings returned by the GetAllDateTimePatterns method
can be used to round-trip date and time values.
If you see the Remarks section on that page, only 42 of the 96 formats that the GetAllDateTimePatterns method returns for the it-IT culture, for example, can be parsed by the TryParse method.
Tarek Mahmoud Sayed responded:
Parse/TryParse are implemented as finite state machine so it doesn’t
really use the date patterns in parsing. It just split the parsed
string into tokens and try to find if the token match specific part of
the date (like Month, day, day of week…etc.). in the other hand
ParseExact/TryParseExact will just parse the string according to the
passed format pattern.
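To see this difference for a given input, a small probe along these lines might help. It is only a sketch: the class and method names are made up, and it simply reports which of the culture's patterns accept the string under TryParseExact, so an empty list alongside a successful TryParse shows the mismatch described above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class PatternProbe
{
    // Returns the patterns from GetAllDateTimePatterns() that parse dateString exactly.
    public static List<string> FindMatchingPatterns(string dateString, CultureInfo culture)
    {
        var matches = new List<string>();
        foreach (string pattern in culture.DateTimeFormat.GetAllDateTimePatterns())
        {
            DateTime ignored;
            if (DateTime.TryParseExact(dateString, pattern, culture,
                    DateTimeStyles.None, out ignored)
                && !matches.Contains(pattern))
            {
                matches.Add(pattern);
            }
        }
        return matches;
    }
}
```

Calling PatternProbe.FindMatchingPatterns("Jan 25 2016 10:10 AM", CultureInfo.GetCultureInfo("en-US")) and comparing the result with DateTime.TryParse on the same string is one way to reproduce the situation from the question.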
In short, Parsing is really hard because there are a lot of things that can trip it up. And someone in some government could suddenly decide that country X should use D/M/Y instead of M/D/Y, or could have someone entering data used to the other format.
I talk a little about this on a blog post (toward the bottom-ish) https://web.archive.org/web/20190110065542/https://blogs.msdn.microsoft.com/shawnste/2005/04/05/culture-data-shouldnt-be-considered-stable-except-for-invariant/
DateTime.Parse attempts to guess what the input might be based on the pattern(s) and separators it sees in the specified culture. Unfortunately, some cultures are REALLY hard to guess at. For example, . has been used for time formats in some locales, so is 1.1.1 12.12.12 the 12th day of December 2012? Or the 1st day of January 2001?
ParseExact (as the other answers suggest) is more reliable as you can tell it exactly what you're looking for - even better, you can also tell the user exactly what to enter. (Hopefully this is human input). Unfortunately it requires the user to follow the template.
This is also why most date controls you encounter, especially on the web, have separate fields for month, day & year.
For machine-readable formats it's best to spit it out in some standard format and read it back in with that exact same format. We've had customers send data from one country to another using the CurrentCulture and wonder why their vendor can't read it ;-)
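One way to follow that advice, sketched here with hypothetical helper names, is to write with the round-trip "o" standard format and the invariant culture, and to read back with exactly the same format:

```csharp
using System;
using System.Globalization;

public static class MachineReadableDate
{
    // Writes a culture-independent string, e.g. 2018-11-13T14:32:00.0000000Z for a UTC value.
    public static string Write(DateTime value)
    {
        return value.ToString("o", CultureInfo.InvariantCulture);
    }

    // Reads back a string produced by Write, preserving the DateTimeKind.
    public static DateTime Read(string text)
    {
        return DateTime.ParseExact(text, "o", CultureInfo.InvariantCulture,
            DateTimeStyles.RoundtripKind);
    }
}
```

Because neither side consults CurrentCulture, the sender's and receiver's regional settings no longer matter.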

Transform between datetime formats

I am facing a problem in which I need to transform dates in a given input format into a target one. Is there any standard way to do this in C#?
As an example say we have yyyy.MM.dd as the source format and the target format is MM/dd/yyy (current culture).
The problem arises since I am using a parsing strategy that gives priority to the current culture and then, if that fails, tries to parse from a list of known formats. Now say we have two equivalent dates, one in the source culture above (2015.12.9) and the other in the current culture (9/12/2015). Then if we attempt to parse these two dates, the month will be 12 in the first case and 9 in the second, so we have an inconsistency (they were supposed to be the exact same date).
I believe that if existing it should be something as
DateTime.Convert(2015.12.9, 'yyyy/MM/dd', CultureInfo.CurrentCulture).
Any ideas?
EDIT:
Thank you all for your ideas and suggestions, however the interpretation most of you gave to my question was not quite right. What most of you have answered is a direct parse in the given format and then a conversion to the CurrentCulture.
DateTime.ParseExact("2015.12.9", "yyyy.MM.dd", CultureInfo.CurrentCulture)
This will still return 12 as the month, although it is in the CurrentCulture format. My question thus was: is there any standard way to transform the date in yyyy.MM.d to the format MM/dd/yyy so that the month is now in the correct place, and THEN parse it in the target culture? Such a function probably does not exist.
DateTime.ParseExact is what you are looking for:
DateTime parsedDate = DateTime.ParseExact("2015.12.9", "yyyy.MM.d", CultureInfo.InvariantCulture);
Or, alternatively, DateTime.TryParseExact if you're not confident about the input string.
I know it's late, but let me try to explain in a little more depth.
I am facing a problem in which I need to transform dates in any format
to a target one.
There is no such thing as dates in any format. A DateTime does not have any implicit format. It just has date and time values. It looks like you have a string which is formatted as a date and you want to convert it to another string with a different format.
Is there any standard way to do this in C#?
Yes. You can first parse your string to a DateTime with DateTime.ParseExact or DateTime.TryParseExact and a specific format, and then generate its string representation with a different format.
As an example say we have yyyy.MM.dd as the source format and the
target format is MM/dd/yyy (current culture).
I didn't understand what current culture means in this sentence, and I assume you want yyyy, not yyy, but you can generate it as I described above, like this:
string source = "2015.12.9";
DateTime dt = DateTime.ParseExact(source, "yyyy.MM.d", CultureInfo.InvariantCulture);
string target = dt.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture); // 12/09/2015
The problem arises since I am using a parsing strategy that gives
priority to the current culture and then if it fails it tries to parse
from a list of known formats.
Since you didn't show your parsing strategy and there is no DateTime.Convert method in the .NET Framework, I can't comment on this.
Now say we have two equivalent dates one in the source culture above
(2015.12.9) and the other in the current culture (9/12/2015). Then if
we attempt to parse this two dates the month will be 12 and in the
second will be 9, so we have an inconsistency.
Again, you don't have DateTimes. You have strings. And those formatted strings can't belong to any culture. Sure, different cultures might parse or generate different string representations with the same format, but a format does not belong to any culture.
I assume you have two strings that are formatted differently and you want to parse the input no matter which one arrives. In such a case, you can use the DateTime.TryParseExact overload that takes a string array of all possible formats as a parameter. Then generate its string representation with the MM/dd/yyyy format and a culture that has / as its DateSeparator, like InvariantCulture.
string s = "2015.12.9"; // or 9/12/2015
string[] formats = { "yyyy.MM.d", "d/MM/yyyy" };
DateTime dt;
if (DateTime.TryParseExact(s, formats, CultureInfo.InvariantCulture,
        DateTimeStyles.None, out dt))
{
    Console.WriteLine(dt.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture));
}
The simple and best way to do it is to use the .ToString() method.
See this code:
DateTime x = DateTime.Now;
To convert it, just write:
x.ToString("yyyyMMdd") // 20151210
x.ToString("yyyy/MM/dd") // 2015/12/10 (the "/" is replaced by the current culture's date separator)
x.ToString("yyyy/MMM/dd") // 2015/Dec/10 in en-US; note that M must be uppercase for the month (lowercase m means minutes)
Hope this helps.

Converting the WhenChanged attribute (Generalized-Time) in LDAP to a DateTime in C#

I recently switched from using the S.DS namespace (which uses ADSI) to the S.DS.Protocols namespace. The only problem is that ADSI handled the conversion of Generalized-Time to a DateTime for me. Now I'm getting back a value of "20070828085401.0Z" for the WhenChanged attribute. DateTime.Parse() will not convert this, so is there another way?
The format you are getting is close to the round trip date time pattern ("o") and universal sortable round trip date time pattern ("u") standard date time format strings as described here.
One kludgy solution would be to massage the string you get to fit the pattern and then use the "o" or "u" standard format string with ParseExact.
A better way would be to construct a custom format string that matches the data you are already getting. In the "How Standard Format Strings Work" section of the standard date time format strings page you'll see the full custom formatting strings equivalent to "o" and "u". That should give you a good start.
EDIT: Add code
string format = "yyyyMMddHHmmss.f'Z'";
string target = "20070828085401.0Z";
DateTime d = DateTime.ParseExact(target, format, CultureInfo.InvariantCulture);
In the comments lixonn observes that, using the format string above, ParseExact will not successfully parse a time string like 199412160532-0500.
It also won't parse a number of other valid strings such as times without the trailing 'Zulu' indicator (20070828085401.0); times without a fractional part (20070828085401Z) and times that represent minutes and seconds as a fractional hour (2007082808.90028Z).
The format string can be made slightly more forgiving by replacing the hard-coded 'Z' with the K custom specifier which will accept 'Z', an offset like -0500, and nothing. Whether that additional flexibility is a good thing will depend on your application.
Note that even with the K specifier Lixonn's string won't be parsed successfully since it lacks a fractional part to match the .f component of the format string.
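As a sketch of the K variant just described (the helper name is made up, and DateTimeStyles.AdjustToUniversal is an added assumption so that values carrying a Z or an offset come back as UTC):

```csharp
using System;
using System.Globalization;

public static class GeneralizedTimeParser
{
    // Same format as above, but with K instead of the hard-coded 'Z'.
    private const string Format = "yyyyMMddHHmmss.fK";

    // Parses values such as 20070828085401.0Z; input with a Z or an offset is returned as UTC.
    public static DateTime Parse(string value)
    {
        return DateTime.ParseExact(value, Format, CultureInfo.InvariantCulture,
            DateTimeStyles.AdjustToUniversal);
    }
}
```

Strings without a fractional part still fail with this format, as noted above, so a real implementation would probably pass several formats to ParseExact.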
You'll have to use DateTime.ParseExact() specifying the exact format.
You might have to play with the format a little bit but it would be something like this.
DateTime result;
CultureInfo provider = CultureInfo.InvariantCulture;
string format = "yyyyMMddHHmmss.0Z"; // HH = 24-hour clock; hh (12-hour clock) would reject hours above 12
result = DateTime.ParseExact(dateString, format, provider);
You can use datetime's .strptime().
import datetime
# The trailing Z denotes UTC (the .0 is the fractional second), so you can drop
# both here and apply the timezone later if you would like
time_string = "20070828085401.0Z".split('.')[0]
time_object = datetime.datetime.strptime(time_string, "%Y%m%d%H%M%S")
time_object should output as datetime.datetime(2007, 8, 28, 8, 54, 1). I believe it will be timezone naive, and equivalent to UTC time.
// WIN32 FILETIME is a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (UTC).
// While the unix timestamp represents the seconds since January 1, 1970 (UTC).
private static long Win32FileTimeToUnixTimestamp(long fileTime)
{
    //return fileTime / 10000L - 11644473600000L;
    return DateTimeOffset.FromFileTime(fileTime).ToUnixTimeSeconds();
}
// The GeneralizedTime follows ASN.1 format, something like: 20190903130100.0Z and 20190903160100.0+0300
private static long GeneralizedTimeToUnixTimestamp(string generalizedTime)
{
    var formats = new string[] { "yyyyMMddHHmmss.fZ", "yyyyMMddHHmmss.fzzz" };
    return DateTimeOffset.ParseExact(generalizedTime, formats, System.Globalization.CultureInfo.InvariantCulture).ToUnixTimeSeconds();
}
