Capture a Date value from a string - c#

I have a string like below,
var text1 = "TEST 01DEC22 test";
I want to capture only the "01DEC22" date from the string. My attempt works only when the text contains nothing but the date, as shown below.
var text = "01DEC22";
var results = Regex.Matches(text, @"^(([0-9])|([0-2][0-9])|([3][0-1]))(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\d{2}$").Cast<Match>().Select(x => x.Value).FirstOrDefault();
How can I retrieve the value when it is contained within a longer string?

If the day is always written with two digits, you can use the regex below:
((0[1-9])|([12]\d)|(3[01]))(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\d\d

As stated in the comments, just remove the ^ and $. You check the day of the month carefully instead of just using \d?\d, but your pattern still accepts 0DEC22 as a date.
You can simplify the regex to the following, which only accepts day numbers from 1 to 31:
(0?[1-9]|[12]\d|3[01])(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\d\d
This would be even simpler if you aren't worried about invalid dates:
(\d?\d)(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\d\d
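To make the "remove the ^ and $" advice concrete, here is a minimal sketch that pulls the first date token out of a longer string. The class and method names (DateTokenDemo, ExtractDate) are made up for illustration; the pattern is the simplified one above, used without anchors.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateTokenDemo
{
    // Simplified pattern from the answer, without ^ and $,
    // so it can match anywhere inside the input.
    private static readonly Regex DateToken = new Regex(
        @"(0?[1-9]|[12]\d|3[01])(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\d\d");

    // Hypothetical helper: returns the first date-like token, or null if none is found.
    public static string ExtractDate(string input)
    {
        Match m = DateToken.Match(input);
        return m.Success ? m.Value : null;
    }
}

// Usage:
// DateTokenDemo.ExtractDate("TEST 01DEC22 test") returns "01DEC22"
```

Note that without word boundaries the pattern can also match inside a longer run of characters, so adding \b on both sides may be worth considering if the surrounding text can contain digits.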

Related

Check if string starts with Regex number

I'm really new to Regex, so I'm looking for a way to check whether my string starts with text matching a certain regex. I found this on the Internet, but I can't make it work with a custom regex.
I need to know if a specific line starts with
3.3. XXX
which is how German dates are formatted. The regex shouldn't match only this form, but also
17.4. XXX
In both cases, I need to know whether the input string starts with a date (which can have either of the two notations above). So it should return true for both, but not for this:
15G
(for example).
Which regex is good to go for this?
Regex is not good at parsing number ranges, so it can get pretty messy: https://stackoverflow.com/a/15504877/1383168
To check whether a string is a valid date, you can use DateTime.TryParseExact:
string s = "17.4. XXX";
DateTime d;
if (DateTime.TryParseExact(s.Split(' ')[0], "d.M.",
        System.Globalization.CultureInfo.InvariantCulture,
        System.Globalization.DateTimeStyles.None, out d))
{
    // matches
    Debug.Print($"{d}"); // "4/17/2017 12:00:00 AM" (the missing year defaults to the current year)
}
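The same idea can be wrapped in a small reusable check. This is only a sketch: the names GermanDatePrefix and StartsWithDate are hypothetical, and it assumes the date is separated from the rest of the line by a space, as in the question's examples.

```csharp
using System;
using System.Globalization;

public static class GermanDatePrefix
{
    // Hypothetical helper: true if the first space-separated token
    // parses as a day.month. date such as "3.3." or "17.4.".
    public static bool StartsWithDate(string line)
    {
        if (string.IsNullOrWhiteSpace(line))
        {
            return false;
        }

        string firstToken = line.Split(' ')[0];
        DateTime parsed;
        return DateTime.TryParseExact(firstToken, "d.M.",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
    }
}

// Usage:
// GermanDatePrefix.StartsWithDate("3.3. XXX")  returns true
// GermanDatePrefix.StartsWithDate("17.4. XXX") returns true
// GermanDatePrefix.StartsWithDate("15G")       returns false
```

Because the format has no year, the parser fills in the current year, so an input such as "29.2." may be rejected in non-leap years.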
If you want a regex for detecting a d.M. style date at the start of a string, you can use this:
string testregex = @"^([0-2][0-9]|3[0-1]|[0-9])\.(0[1-9]|1[0-2]|[0-9])\.";
Regex.IsMatch() returns true when the string matches, so the statements in the if block will execute.
string text = "17.4. xxxxxx";
if (Regex.IsMatch(text, testregex))
{
    // do something
}

Short Time with DateTime.ParseExact

I’m trying to parse a time. I’ve seen this question asked/answered here many times but not for this specific scenario. Here’s my code:
var time1 = DateTime.ParseExact("919", "Hmm", CultureInfo.InvariantCulture);
also
var time2 = DateTime.ParseExact("919", "Hmm", null);
both of these throw the same
"String was not recognized as a valid DateTime"
What I want is 9:19 AM.
For further info I also need to parse “1305” as 1:05 PM, this is working fine.
It seems to me I’m using the correct format. What am I overlooking?
I'm not sure there is any format that can handle this. The problem is that "H" can be either one digit or two, so if there are two digits available, it will grab both - in this case parsing it as hour 91, which is clearly invalid.
Ideally, you'd change the format to HHmm - zero-padding the value where appropriate - so "0919" would parse fine. Alternatively, use a colon in the format, to distinguish between the hours and the minutes. I don't believe there's any way of making DateTime parse a value of "919" as you want it to... so you'll need to adjust the string somehow before parsing it. (We don't have enough context to recommend a particular way of doing that.)
Yes, your format is right, but since the H specifier can consume two characters, the ParseExact method tries to parse 91 as the hour. That is an invalid hour, which is why you get a FormatException in both cases.
I contacted the Microsoft team about this situation 4 months ago. Take a look:
DateTime conversion from string C#
They suggest using the two-digit form in your string or inserting a separator between the hours and minutes.
var time1 = DateTime.ParseExact("0919", "Hmm", CultureInfo.InvariantCulture);
or
var time1 = DateTime.ParseExact("9:19", "H:mm", CultureInfo.InvariantCulture);
You can't omit the leading 0 of the hour. This works:
var time1 = DateTime.ParseExact("0919", "Hmm", CultureInfo.InvariantCulture);
Perhaps you want to just prefix 3-character times with a leading zero before parsing.
Much appreciated for all the answers. I don’t have control of the text being created so the simplest solution for me seemed to be prefixing a zero as opposed to adding a colon in the middle.
var text = "919";
var time = DateTime.ParseExact(text.PadLeft(4, '0'), "Hmm", null);

How to determine if a string contains a specific date in any format?

I have an object with property A of type DATE and property B of type string. A requirement is that property B may not contain the string representation of property A.
right now I have the very simplistic approach of validating:
public function Validate() as boolean
if b.contains(a.toString("[format1]", CultureInfo....)
return false
end if
if b.contains(a.toString("[format2]", CultureInfo...)
return false
end if
'etc
return true
end function
But something in that approach feels wrong. Date.TryParse won't work because B may have more in it than JUST A.
Is there some approach that would let me validate B without typing out every possible datetime format (in a variety of Cultures) for A?
I don't care if the solution is VB.NET or C#.
Clarification:
There are a few format restrictions on property B. It won't allow the typical date delimiters of forward slash or dot or even space. So I expect to see the date in something like mmddyyyy and ddmmyyyy or even MONddyyyy, etc. I'm not worried about anything except for month, day, and year.
I could just keep a list of possible formats and iterate through it, though my concern is that I might overlook a potential format that way.
Additional Clarification
Property A is a date value, not a string. So format is not determined by the user - it is determined by my validation process. So in the following examples, B should NOT validate.
A = Date(1962, 01, 22)
B = 01221962MyNewString
or
B = Mystring19620122value
or
B = Jan221962mystring
etc.
There are many possibilities for the string representation that I would need to exclude. Although I suppose, I don't need to exclude "the22ofjanuary1962".
I could use regular expressions - but the same issue exists. I simply have to think of every possible string representation and check for it. I was hoping something in the .Net framework already existed and I could use it. But sounds like no such luck.
ANSWER:
I marked Blam's post as the answer. It got me really close, especially once I added a regular expression. I iterate through all possible cultures. I then go through all the relevant standard datetime formats (http://msdn.microsoft.com/en-us/library/az4se3k1(v=vs.110).aspx), strip all non-alphanumeric characters, and do the compare.
I'm not saying it caught everything, but all my unit tests pass, so it caught all the ones I could think of today. I could always add custom datetime formats later if the need becomes apparent.
Here's my solution.
Dim allCultures As CultureInfo()
allCultures = CultureInfo.GetCultures(CultureTypes.AllCultures)
Dim rgx = New Regex("[^a-zA-Z0-9]")
For Each ci As CultureInfo In allCultures
If B.Contains(rgx.Replace(Me.A.ToString("d", ci), "")) Or _
B.Contains(rgx.Replace(Me.A.ToString("D", ci), "")) Or _
B.Contains(rgx.Replace(Me.A.ToString("M", ci), "")) Or _
B.Contains(rgx.Replace(Me.A.ToString("s", ci), "")) Or _
B.Contains(rgx.Replace(Me.A.ToString("u", ci), "")) Or _
B.Contains(rgx.Replace(Me.A.ToString("Y", ci), "")) Or _
B.Contains(rgx.Replace(Me.A.ToString("g", ci), "")) Then
Return False
End If
Next
Return True
Could just enumerate cultures
CultureTypes[] mostCultureTypes = new CultureTypes[] {
CultureTypes.NeutralCultures,
CultureTypes.SpecificCultures,
CultureTypes.InstalledWin32Cultures,
CultureTypes.UserCustomCulture,
CultureTypes.ReplacementCultures,
};
CultureInfo[] allCultures;
DateTime dt = new DateTime(1962, 01, 22);
// Get and enumerate all cultures.
allCultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
foreach (CultureInfo ci in allCultures)
{
// Display the name of each culture.
Console.WriteLine("Culture: {0}", ci.Name);
Thread.CurrentThread.CurrentCulture = ci;
Console.WriteLine("Displaying short date for {0} culture:",
Thread.CurrentThread.CurrentCulture.Name);
Console.WriteLine(" {0} (Short Date String)",
dt.ToShortDateString());
Console.WriteLine();
}
Even with that, it appears you are not allowing spaces, so you would need to remove spaces.
You are not allowing certain delimiters, so you would need to remove them too.
You ask about any date format, but you are really looking for specific date formats.
But you don't want to enumerate the date formats because you might miss one.
I suppose one option is to keep your list of possible formats somewhere, e.g. some constants or in a config file, then you can iterate over that list and make a new list with your date in each of those formats. Once you have that, you can check if your string contains any of those values.
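A minimal sketch of that list-based approach might look like the following. The format list, the class name DateInStringCheck and the method ContainsDate are all assumptions for illustration; a real list would need whatever formats the validation rules call for, and the invariant culture only produces English month abbreviations.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class DateInStringCheck
{
    // Hypothetical list of delimiter-free formats; extend as needed
    // or load it from configuration.
    private static readonly string[] Formats =
    {
        "MMddyyyy", "ddMMyyyy", "yyyyMMdd", "MMMddyyyy"
    };

    // Returns true if any formatted representation of the date occurs in text.
    public static bool ContainsDate(string text, DateTime date)
    {
        return Formats
            .Select(format => date.ToString(format, CultureInfo.InvariantCulture))
            .Any(candidate =>
                text.IndexOf(candidate, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}

// Usage, with A = new DateTime(1962, 1, 22):
// DateInStringCheck.ContainsDate("01221962MyNewString", A)   returns true
// DateInStringCheck.ContainsDate("Mystring19620122value", A) returns true
// DateInStringCheck.ContainsDate("Jan221962mystring", A)     returns true
// DateInStringCheck.ContainsDate("the22ofjanuary1962", A)    returns false
```

The validation from the question would then return false whenever ContainsDate returns true.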

How do I keep the 0's in a Date

I am trying to figure out how I can keep the 0's, or add them, when I grab a date.
What I'm getting is this:
6/15/2010
What I'm trying to get is:
06/15/2010
I have added a check on the length: if it's less than 6 (I'm stripping the "/"), it pads the left side. That solves the issue when the month is a single digit, but what about when the day is a single digit?
My ultimate goal is to have a date such as:
1/1/2010
read out like:
01/01/2010
Any suggestions would be greatly appreciated.
Use a custom format : dd/MM/yyyy, or in your case MM/dd/yyyy. Note the capital M, the small m gets you the minutes.
string s = DateTime.Now.ToString("MM/dd/yyyy");
You need to use a custom DateTime format string:
string str = someDate.ToString("dd/MM/yyyy");
It depends on the format of date you are using.
For instance, dd/MM/yyyy will produce 01/05/2009 and d/M/yyyy would produce 1/5/2009
A complete reference can be found here: http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx
You want something like this:
string myDate = "1/1/2010";
DateTime date = DateTime.Parse(myDate);
string formattedDate = date.ToString("MM/dd/yyyy");
If the starting date is some other unrecognized format you could use DateTime.ParseExact();
Use DateTime.ParseExact() to parse the string into a valid datetime object and then use DateTime.ToString("dd/MM/yyyy") to get result in desired format.
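Putting the parse and format steps together, a small sketch could look like this. The helper name PadDate is hypothetical, and it assumes the input is always in month/day/year order. Passing CultureInfo.InvariantCulture to ToString keeps the "/" in the output literal instead of letting the current culture substitute its own date separator.

```csharp
using System;
using System.Globalization;

public static class DatePadding
{
    // Hypothetical helper: parses an unpadded month/day/year date
    // and re-emits it with two-digit month and day.
    public static string PadDate(string input)
    {
        DateTime date = DateTime.ParseExact(input, "M/d/yyyy",
            CultureInfo.InvariantCulture);
        return date.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture);
    }
}

// Usage:
// DatePadding.PadDate("6/15/2010") returns "06/15/2010"
// DatePadding.PadDate("1/1/2010")  returns "01/01/2010"
```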

.NET: Why is TryParseExact failing on Hmm and Hmmss?

I'm trying out the DateTime.TryParseExact method, and I have come across a case that I just don't get. I have some formats and some subjects to parse, each of which should match one of the formats perfectly:
var formats = new[]
{
"%H",
"HH",
"Hmm",
"HHmm",
"Hmmss",
"HHmmss",
};
var subjects = new[]
{
"1",
"12",
"123",
"1234",
"12345",
"123456",
};
I then try to parse them all and print out the results:
foreach(var subject in subjects)
{
DateTime result;
DateTime.TryParseExact(subject, formats,
CultureInfo.InvariantCulture,
DateTimeStyles.NoCurrentDateDefault,
out result);
Console.WriteLine("{0,-6} : {1}",
subject,
result.ToString("T", CultureInfo.InvariantCulture));
}
I get the following:
1 : 01:00:00
12 : 12:00:00
123 : 00:00:00
1234 : 12:34:00
12345 : 00:00:00
123456 : 12:34:56
And to my question... why is it failing on 123 and 12345? Shouldn't those have become 01:23:00 and 01:23:45? What am I missing here? And how can I get it to work as I would expect it to?
Update: It seems we have more or less figured out why this is failing. The H is actually grabbing two digits and leaving just one for the mm, which then fails. But does anyone have a good idea of how I can change this code to get the result I am looking for?
Another update: I think I've found a reasonable solution now and added it as an answer. I will accept it in 2 days unless someone else comes up with an even better one. Thanks for the help!
Ok, so I think I have figured this all out now thanks to more reading, experimenting and the other helpful answers here. What's happening is that H, m and s each grab two digits when they can, even if there won't be enough digits left for the rest of the format. For example, with the format Hmm and the digits 123, H grabs 12 and only a 3 is left. Since mm requires two digits, parsing fails. Tadaa.
So, my solution is currently to instead use just the following three formats:
var formats = new[]
{
"%H",
"Hm",
"Hms",
};
With the rest of the code from my question staying the same, I will then get this as a result:
1 : 01:00:00
12 : 12:00:00
123 : 12:03:00
1234 : 12:34:00
12345 : 12:34:05
123456 : 12:34:56
Which I think should be both reasonable and acceptable :)
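If the originally expected results are wanted instead (123 as 01:23:00 and 12345 as 01:23:45), one alternative is to left-pad odd-length input with a zero and parse with the wide specifiers only. This is a sketch under that assumption, not the approach from the answer above; the names CompactTime and TryParse are hypothetical.

```csharp
using System;
using System.Globalization;

public static class CompactTime
{
    // Only the widest form of each specifier, so no digit grabbing is ambiguous.
    private static readonly string[] Formats = { "HH", "HHmm", "HHmmss" };

    // Hypothetical helper: pads odd-length digit strings with a leading zero,
    // then parses them as hours, hours+minutes or hours+minutes+seconds.
    public static bool TryParse(string digits, out TimeSpan time)
    {
        time = TimeSpan.Zero;
        if (string.IsNullOrEmpty(digits))
        {
            return false;
        }

        string padded = digits.Length % 2 == 1 ? "0" + digits : digits;
        DateTime parsed;
        if (!DateTime.TryParseExact(padded, Formats,
                CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed))
        {
            return false;
        }

        time = parsed.TimeOfDay;
        return true;
    }
}

// Usage:
// CompactTime.TryParse("123", out t)   gives t = 01:23:00
// CompactTime.TryParse("12345", out t) gives t = 01:23:45
```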
0123
012345
I'm guessing it looks for a length of 2/4/6 when it finds a string of numbers like that. Is 123 supposed to be AM or PM? 0123 isn't ambiguous like that.
If you do not use date or time separators in a custom format pattern, use the invariant culture for the provider parameter and the widest form of each custom format specifier. For example, if you want to specify hours in the pattern, specify the wider form, "HH", instead of the narrower form, "H".
cite:
http://msdn.microsoft.com/en-us/library/ms131044.aspx
As others have pointed out, H is ambiguous because it can consume either one or two digits, whereas HH always consumes exactly two.
To quote from MSDN's Using Single Custom Format Specifiers:
A custom date and time format string consists of two or more characters. For example, if the format string consists only of the specifier h, the format string is interpreted as a standard date and time format specifier. However, in this particular case, an exception is thrown because there is no h standard date and time format specifier.
To use a single custom date and time format specifier, include a space before or after the date and time specifier, or include a percent (%) format specifier before the single custom date and time specifier. For example, the format strings "h " and "%h" are interpreted as custom date and time format strings that display the hour represented by the current date and time value. Note that, if a space is used, it appears as a literal character in the result string.
So, should that have been % H in the first element in the formats array?
Hope this helps,
Best regards,
Tom.
I could be wrong, but I suspect it may have to do with the ambiguity inherent in the "H" part of your format string -- i.e., given the string "123", you could be dealing with hour "1" (01:00) or hour "12" (12:00); and since TryParseExact doesn't know which is correct, it returns false.
As for why the method does not supply a "best guess": the documentation is not on your side on this one, I'm afraid. From the MSDN documentation on DateTime.TryParseExact (emphasis mine):
When this method returns, contains the DateTime value equivalent to the date and time contained in s, if the conversion succeeded, or DateTime.MinValue if the conversion failed. The conversion fails if either the s or format parameter is null, is an empty string, or does not contain a date and time that correspond to the pattern specified in format. This parameter is passed uninitialized.
"123" and "12345" seem to be ambiguous with respect to the TryParseExact method. "12345" could be either 12:34:05 or 01:23:45, for instance. Just a guess though.
