My problem is that my CSV file has data stored in the form of JSON. I now need to extract that data in the most efficient way to create objects that store it.
My CSV file looks like this:
1. 2022-09-19,"{
2. "timestamp": 41202503,
3. "machineId": 3567,
4. "status": 16,
5. "isActive": false,
6. "scanWidth": 5.0,
7. }"
8. 2022-09-19,"{
9. "timestamp": 41202505,
10. "machineId": 3568,
11. "status": 5,
12. "isActive": true,
13. "scanWidth": 1.4,
14. }"
15. 2022-09-19,"{
16. "timestamp": 41202507,
17. "machineId": 3569,
18. "status": 12,
19. "isActive": false,
20. "scanWidth": 6.2,
21. }"
In my project I would have a class called "MachineData" with all the relevant properties.
My question now is: how can I extract the data stored in this CSV file?
Thanks again for helping!
Create a type to represent this data:
class ResultItem
{
public string Date { get; set; }
public string Timestamp { get; set; }
public string MachineId { get; set; }
public string Status { get; set; }
public string IsActive { get; set; }
public string ScanWidth { get; set; }
}
Use Regex and Newtonsoft.Json to extract the data:
//Remove the line number
csvText = Regex.Replace(csvText, @"\d+\. ", "");
//Match items with separating the date and the json in different groups
var matches = Regex.Matches(csvText, @"(?<date>\d{4}-\d{2}-\d{2}),""(?<json>(.+\n){6}})", RegexOptions.CultureInvariant | RegexOptions.Multiline);
var results = new List<ResultItem>();
foreach (Match match in matches)
{
//Getting values from json group
var item = JsonConvert.DeserializeObject<ResultItem>(match.Groups["json"].Value);
//Getting value from date group
item.Date = match.Groups["date"].Value;
results.Add(item);
}
I wouldn't normally recommend parsing json with Regular Expressions, but this is a special case, so here is a solution entirely based on RegEx:
string pattern = @"(?<date>[0-9-]+),""{\s+""timestamp"":\s+(?<timestamp>[0-9]+),\s+""machineId"":\s+(?<machineId>[0-9]+),\s+""status"":\s+(?<status>[0-9]+),\s+""isActive"":\s+(?<isActive>(true|false)),\s+""scanWidth"":\s+(?<scanWidth>[0-9\.]+),\s+}""";
Regex rg = new Regex(pattern, RegexOptions.Multiline);
foreach (Match match in rg.Matches(File.ReadAllText("nameoffile.csv")))
{
Console.WriteLine(match.Groups["date"].Value);
Console.WriteLine(match.Groups["timestamp"].Value);
Console.WriteLine(match.Groups["machineId"].Value);
Console.WriteLine(match.Groups["status"].Value);
Console.WriteLine(match.Groups["isActive"].Value);
Console.WriteLine(match.Groups["scanWidth"].Value);
}
Note that this will fail if the input differs in the slightest (negative values, additional or missing white space, etc.).
EDIT
If the line numbers are part of the input, you need to add [0-9]+\.\s+ in the beginning of the Regex to swallow the line number, the dot, and the white space, giving:
string pattern = @"[0-9]+\.\s+(?<date>[0-9-]+),""{\s+""timestamp"":\s+(?<timestamp>[0-9]+),\s+""machineId"":\s+(?<machineId>[0-9]+),\s+""status"":\s+(?<status>[0-9]+),\s+""isActive"":\s+(?<isActive>(true|false)),\s+""scanWidth"":\s+(?<scanWidth>[0-9\.]+),\s+}""";
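Since the question asks for objects rather than console output, here is a minimal sketch of how the captured groups could be converted into typed values. The property names and types of MachineData are assumptions (the question only names the class), and the pattern is a slightly more tolerant variant of the one above: it accepts an optional line-number prefix and a negative or integer scanWidth.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Property names and types are assumptions; the question only names the class.
public class MachineData
{
    public DateTime Date { get; set; }
    public long Timestamp { get; set; }
    public int MachineId { get; set; }
    public int Status { get; set; }
    public bool IsActive { get; set; }
    public double ScanWidth { get; set; }
}

public static class MachineDataParser
{
    // Optional "12. " line-number prefix in front of each line.
    private const string N = @"(?:\d+\.\s+)?";

    private static readonly Regex Record = new Regex(
        N + @"(?<date>\d{4}-\d{2}-\d{2}),""\{\s*" +
        N + @"""timestamp"":\s*(?<timestamp>\d+),\s*" +
        N + @"""machineId"":\s*(?<machineId>\d+),\s*" +
        N + @"""status"":\s*(?<status>\d+),\s*" +
        N + @"""isActive"":\s*(?<isActive>true|false),\s*" +
        N + @"""scanWidth"":\s*(?<scanWidth>-?\d+(?:\.\d+)?),?\s*" +
        N + @"\}""",
        RegexOptions.CultureInvariant);

    public static List<MachineData> Parse(string csvText)
    {
        var inv = CultureInfo.InvariantCulture;
        var results = new List<MachineData>();
        foreach (Match m in Record.Matches(csvText))
        {
            results.Add(new MachineData
            {
                Date = DateTime.ParseExact(m.Groups["date"].Value, "yyyy-MM-dd", inv),
                Timestamp = long.Parse(m.Groups["timestamp"].Value, inv),
                MachineId = int.Parse(m.Groups["machineId"].Value, inv),
                Status = int.Parse(m.Groups["status"].Value, inv),
                IsActive = bool.Parse(m.Groups["isActive"].Value),
                // Invariant culture so "5.0" is read with '.' as the decimal separator.
                ScanWidth = double.Parse(m.Groups["scanWidth"].Value, inv)
            });
        }
        return results;
    }
}
```

Calling MachineDataParser.Parse(File.ReadAllText("nameoffile.csv")) would then give one MachineData per record. Records that do not match the pattern are silently skipped, so it may be worth comparing the result count against the expected number of rows.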
Related
I'm using Regex to match characters from a file, but I want to match 2 different strings from that file but they appear more than once, that's why I am using a loop. I can match with a single string but not with 2 strings.
Regex celcius = new Regex(@"""temp"":\d*\.?\d{1,3}");
foreach (Match match in celcius.Matches(htmlcode))
{
Regex date = new Regex(@"\d{4}-\d{2}-\d{2}");
foreach (Match match1 in date.Matches(htmlcode))
{
string date1 = Convert.ToString(match1.Value);
string temperature = Convert.ToString(match.Value);
Console.Write(temperature + "\t" + date1);
}
}
htmlcode:
{"temp":287.05,"temp_min":286.932,"temp_max":287.05,"pressure":1019.04,"sea_level":1019.04,"grnd_level":1001.11,"humidity":89,"temp_kf":0.12},"weather":[{"id":804,"main":"Clouds","description":"overcast
clouds","icon":"04n"}],"clouds":{"all":100},"wind":{"speed":0.71,"deg":205.913},"sys":{"pod":"n"},"dt_txt":"2019-09-22
21:00:00"},{"dt":1569196800,"main":{"temp":286.22,"temp_min":286.14,"temp_max":286.22,"pressure":1019.27,"sea_level":1019.27,"grnd_level":1001.49,"humidity":90,"temp_kf":0.08},"weather":[{"id":804,"main":"Clouds","description":"overcast
clouds","icon":"04n"}],"clouds":{"all":99},"wind":{"speed":0.19,"deg":31.065},"sys":{"pod":"n"},"dt_txt":"2019-09-23
00:00:00"},{"dt":1569207600,"main":{"temp":286.04,"temp_min":286,"temp_max":286.04,"pressure":1019.38,"sea_level":1019.38,"grnd_level":1001.03,"humidity":89,"temp_kf":0.04},"weather":
You can use a single Regex pattern with two capturing groups for temperature and date. The pattern can look something like this:
("temp":\d*\.?\d{1,3}).*?(\d{4}-\d{2}-\d{2})
Regex demo.
C# example:
string htmlcode = // ...
var matches = Regex.Matches(htmlcode, @"(""temp"":\d*\.?\d{1,3}).*?(\d{4}-\d{2}-\d{2})");
foreach (Match m in matches)
{
Console.WriteLine(m.Groups[1].Value + "\t" + m.Groups[2].Value);
}
Output:
"temp":287.05 2019-09-22
"temp":286.22 2019-09-23
Try it online.
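If typed values are more useful than raw strings, the same idea can be sketched with named groups. This is only an illustration under a few assumptions: the class and method names are made up, the temperature group captures just the number, and RegexOptions.Singleline is added in case the text really contains line breaks between a temperature and its date.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TempReadings
{
    // "temp" with its closing quote, so "temp_min" and "temp_max" are not matched.
    private static readonly Regex Pair = new Regex(
        @"""temp"":(?<temp>\d+(?:\.\d+)?).*?(?<date>\d{4}-\d{2}-\d{2})",
        RegexOptions.Singleline);

    public static List<(DateTime Date, double Temp)> Extract(string text)
    {
        var inv = CultureInfo.InvariantCulture;
        var readings = new List<(DateTime Date, double Temp)>();
        foreach (Match m in Pair.Matches(text))
        {
            double temp = double.Parse(m.Groups["temp"].Value, inv);
            DateTime date = DateTime.ParseExact(m.Groups["date"].Value, "yyyy-MM-dd", inv);
            readings.Add((date, temp));
        }
        return readings;
    }
}
```

Each temperature is paired with the first date that follows it, which is the same pairing the single pattern above produces.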
I don't think you have HTML. I think you have a collection of something called JSON (JavaScript Object Notation), which is a way to pass data efficiently.
So, this is one of your "HTML" objects.
{
"temp":287.05,
"temp_min":286.932,
"temp_max":287.05,
"pressure":1019.04,
"sea_level":1019.04,
"grnd_level":1001.11,
"humidity":89,
"temp_kf":0.12},
"weather":[{
"id":804,
"main":"Clouds",
"description":"overcast clouds",
"icon":"04n"
}],
"clouds":{
"all":100
},
"wind":{
"speed":0.71,"deg":205.913
},
"sys":{
"pod":"n"
},
"dt_txt":"2019-09-22 21:00:00"
}
So, I would recommend converting the line using the C# web helpers and parsing the objects directly.
//include this library
using System.Web.Helpers;
//parse your htmlcode using this loop
//(assumes htmlcode is a collection of strings, each holding one JSON object)
foreach(var line in htmlcode)
{
dynamic data = Json.Decode(line);
//the values are numbers, so convert them rather than cast to string
string temperature = Convert.ToString(data["temp"]);
string date = Convert.ToDateTime(data["dt_txt"]).ToString("yyyy-MM-dd");
Console.WriteLine($"temperature: {temperature} date: {date}");
}
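For projects where System.Web.Helpers is not available, the same approach can be sketched with System.Text.Json instead (a swapped-in library, not the one used above). The sketch assumes the input is a complete, valid JSON array in which each entry has a "main" object containing "temp" and a "dt_txt" string, as the later entries in the question's sample do; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

public static class ForecastJson
{
    // Assumes jsonArray is a JSON array of entries shaped like
    // {"main":{"temp":286.22,...},"dt_txt":"2019-09-23 00:00:00"}.
    public static List<(DateTime Time, double Temp)> Read(string jsonArray)
    {
        var result = new List<(DateTime Time, double Temp)>();
        using (JsonDocument doc = JsonDocument.Parse(jsonArray))
        {
            foreach (JsonElement entry in doc.RootElement.EnumerateArray())
            {
                double temp = entry.GetProperty("main").GetProperty("temp").GetDouble();
                string text = entry.GetProperty("dt_txt").GetString();
                DateTime time = DateTime.ParseExact(
                    text, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
                result.Add((time, temp));
            }
        }
        return result;
    }
}
```

GetProperty throws if a property is missing, so malformed entries surface as exceptions rather than being skipped silently; TryGetProperty could be used instead if missing fields are expected.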
I have to parse a log file and am not sure how best to extract the different pieces of each line. The problem I am facing is that the original developer used ':' to delimit tokens, which was a bit idiotic since the line contains a timestamp which itself contains ':'!
A sample line looks something like this:
transaction_date_time:[systemid]:sending_system:receiving_system:data_length:data:[ws_name]
2019-05-08 15:03:13:494|2019-05-08 15:03:13:398:[192.168.1.2]:ABC:DEF:67:cd71f7d9a546ec2b32b,AACN90012001000012,OPNG:[WebService.SomeName.WebServiceModule::WebServiceName]
I have no problem reading the log file and accessing each line, but I am not sure how to parse the pieces.
Since the input string is not exactly splittable, because the delimiter char is also part of the content, a simple regular expression can be used instead.
It is simple but probably fast enough, even with the default settings.
The different parts of the input string can be separated with these capturing groups:
string pattern = @"^(.*?)\|(.*?):\[(.*?)\]:(.*?):(.*?):(\d+):(.*?):\[(.*)\]$";
This will give you 8 groups + 1 (Groups[0]), which contains the whole string.
Using the Regex class, simply pass a string to parse (named line, here) and the regex (named pattern) to the Match() method, using default settings:
var result = Regex.Match(line, pattern);
The Groups.Value property returns the result of each capturing group. For example, the two dates:
var dateEnd = DateTime.ParseExact(result.Groups[1].Value, "yyyy-MM-dd HH:mm:ss:fff", CultureInfo.InvariantCulture);
var dateStart = DateTime.ParseExact(result.Groups[2].Value, "yyyy-MM-dd HH:mm:ss:fff", CultureInfo.InvariantCulture);
The IpAddress is extracted with: \[(.*?)\].
You could give a name to this grouping, so it's more clear what the value refers to. Simply add a string, prefixed with ? and enclosed in <> or single quotes ' to name the grouping:
...\[(?<IpAddress>.*?)\]...
Note, however, that naming a group will modify the Regex.Groups indexing: the un-named groups will be inserted first, the named groups after. So, naming only the IpAddress group will cause it to become the last item, Groups[8]. Of course you can name all the groups and the indexing will be preserved.
var hostAddress = IPAddress.Parse(result.Groups["IpAddress"].Value);
This pattern should allow a medium machine to parse 130,000~150,000 strings per second.
You'll have to test it to find the perfect pattern. For example, the first match (corresponding to the first date): (.*?)\|, is much faster if non-greedy (using the *? lazy quantifier). The opposite is true for the last match: \[(.*)\]. The pattern used by jdweng is even faster than the one used here.
See Regex101 for a detailed description on the use and meaning of each token.
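Putting the pieces together, a minimal sketch with every group named might look like the following. The LogEntry type, its property names, and the TryParse helper are made up for illustration; the pattern is the one above with names added, and the timestamps are assumed to use a 24-hour clock with milliseconds after the last colon.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical container type; the names are not from the question.
public class LogEntry
{
    public DateTime End { get; set; }
    public DateTime Start { get; set; }
    public IPAddress SystemId { get; set; }
    public string Sender { get; set; }
    public string Receiver { get; set; }
    public int DataLength { get; set; }
    public string Data { get; set; }
    public string WsName { get; set; }
}

public static class LogLineParser
{
    private const string DateFormat = "yyyy-MM-dd HH:mm:ss:fff";

    private static readonly Regex Line = new Regex(
        @"^(?<end>.*?)\|(?<start>.*?):\[(?<ip>.*?)\]:(?<sender>.*?):" +
        @"(?<receiver>.*?):(?<length>\d+):(?<data>.*?):\[(?<ws>.*)\]$");

    // Returns false when the line does not have the expected shape.
    // A badly formatted date or address still throws.
    public static bool TryParse(string line, out LogEntry entry)
    {
        entry = null;
        Match m = Line.Match(line);
        if (!m.Success) return false;

        var inv = CultureInfo.InvariantCulture;
        entry = new LogEntry
        {
            End = DateTime.ParseExact(m.Groups["end"].Value, DateFormat, inv),
            Start = DateTime.ParseExact(m.Groups["start"].Value, DateFormat, inv),
            SystemId = IPAddress.Parse(m.Groups["ip"].Value),
            Sender = m.Groups["sender"].Value,
            Receiver = m.Groups["receiver"].Value,
            DataLength = int.Parse(m.Groups["length"].Value, inv),
            Data = m.Groups["data"].Value,
            WsName = m.Groups["ws"].Value
        };
        return true;
    }
}
```

With all groups named, the code no longer depends on group indexes, so reordering or adding groups later does not silently shift the values.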
Using Regex I was able to parse everything. It looks like the data came from Excel because the fraction of seconds has a colon instead of a period. C# does not like the colon, so I had to replace the colon with a period. I also parsed from right to left to get around the colon issues.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
namespace ConsoleApplication3
{
class Program1
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
string line = "";
int rowCount = 0;
StreamReader reader = new StreamReader(FILENAME);
string pattern = @"^(?'time'.*):\[(?'systemid'[^\]]+)\]:(?'sending'[^:]+):(?'receiving'[^:]+):(?'length'[^:]+):(?'data'[^:]+):\[(?'ws_name'[^\]]+)\]";
while ((line = reader.ReadLine()) != null)
{
line = line.Trim();
if (line.Length > 0)
{
if (++rowCount != 1) //skip header row
{
Log_Data newRow = new Log_Data();
Log_Data.logData.Add(newRow);
Match match = Regex.Match(line, pattern, RegexOptions.RightToLeft);
newRow.ws_name = match.Groups["ws_name"].Value;
newRow.data = match.Groups["data"].Value;
newRow.length = int.Parse(match.Groups["length"].Value);
newRow.receiving_system = match.Groups["receiving"].Value;
newRow.sending_system = match.Groups["sending"].Value;
newRow.systemid = match.Groups["systemid"].Value;
//end date is first, then start date is second
string[] date = match.Groups["time"].Value.Split(new char[] {'|'}).ToArray();
string replacePattern = @"(?'leader'.+):(?'trailer'\d+)";
string stringDate = Regex.Replace(date[1], replacePattern, "${leader}.${trailer}", RegexOptions.RightToLeft);
newRow.startDate = DateTime.Parse(stringDate);
stringDate = Regex.Replace(date[0], replacePattern, "${leader}.${trailer}", RegexOptions.RightToLeft);
newRow.endDate = DateTime.Parse(stringDate );
}
}
}
}
}
public class Log_Data
{
public static List<Log_Data> logData = new List<Log_Data>();
public DateTime startDate { get; set; } //transaction_date_time:[systemid]:sending_system:receiving_system:data_length:data:[ws_name]
public DateTime endDate { get; set; }
public string systemid { get; set; }
public string sending_system { get; set; }
public string receiving_system { get; set; }
public int length { get; set; }
public string data { get; set; }
public string ws_name { get; set; }
}
}
This question already has answers here:
RegEx for matching UK Postcodes
(33 answers)
Closed 3 years ago.
I need to check a UK postcode against a list.
UK postcodes have a standard format, but the list only contains the outward section that I need to check against.
The list contains a series of outward codes, each with some related data, for example:
AL St Albans
B Birmingham
BT Belfast
TR Taunton
TR21 Taunton X
TR22 Taunton Y
My aim is that when I get a postcode, for example B20 7TP, I can search and find Birmingham.
Any ideas??
The question is different to the ones referred to as possible answers, but in my case I need to check a full postcode against just the outward postcode.
If you have the whole postcode and only want to use the outcode, remove the last three characters and use what remains. All postcodes end with the pattern digit-alpha-alpha, so removing those characters will give the outcode; any string that does not fit that pattern or that does not give a valid outcode after removing that substring is not a valid postcode. (Source)
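As a rough sketch of that rule combined with the list from the question: strip the three-character incode, then try progressively shorter prefixes of the outcode until one is found in the list. The class and method names are hypothetical, and the dictionary is assumed to hold the upper-case codes from the question as keys.

```csharp
using System;
using System.Collections.Generic;

public static class OutcodeLookup
{
    private static bool IsAsciiLetter(char c)
    {
        return c >= 'A' && c <= 'Z';
    }

    // Returns the outcode, or null if the input does not end in digit-letter-letter.
    public static string GetOutcode(string postcode)
    {
        string compact = postcode.Replace(" ", "").ToUpperInvariant();
        int n = compact.Length;
        // Outcodes are 2 to 4 characters, the incode is always 3.
        if (n < 5 || n > 7) return null;
        bool incodeOk = char.IsDigit(compact[n - 3])
            && IsAsciiLetter(compact[n - 2])
            && IsAsciiLetter(compact[n - 1]);
        return incodeOk ? compact.Substring(0, n - 3) : null;
    }

    // Longest-prefix match: "TR21" is tried before "TR2" and "TR".
    public static string FindRegion(string postcode, IReadOnlyDictionary<string, string> regions)
    {
        string outcode = GetOutcode(postcode);
        if (outcode == null) return null;
        for (int len = outcode.Length; len > 0; len--)
        {
            string region;
            if (regions.TryGetValue(outcode.Substring(0, len), out region))
            {
                return region;
            }
        }
        return null;
    }
}
```

With the list from the question, "B20 7TP" would first try "B20", then "B2", then "B", and return Birmingham. Unlike a prefix regex, this rejects inputs whose last three characters are not a valid incode.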
If you're willing to take on an external (and Internet-based) dependency, you could look at using something like https://postcodes.io, in particular the outcodes section of that API. I have no affiliation with postcodes.io; I just found it after a Google.
Per the documentation, /outcodes will return
the outcode
the eastings
the northings
the administrative counties under the code
the district/unitary authorities under the code
the administrative/electoral areas under the code
the WGS84 longitude
the WGS84 latitude
the countries included in the code
the parish/communities in the code
For reference, a call to /outcodes/TA1 returns:
{
"status": 200,
"result": {
"outcode": "TA1",
"longitude": -3.10297767924529,
"latitude": 51.0133987332761,
"northings": 124359,
"eastings": 322721,
"admin_district": [
"Taunton Deane"
],
"parish": [
"Taunton Deane, unparished area",
"Bishop's Hull",
"West Monkton",
"Trull",
"Comeytrowe"
],
"admin_county": [
"Somerset"
],
"admin_ward": [
"Taunton Halcon",
"Bishop's Hull",
"Taunton Lyngford",
"Taunton Eastgate",
"West Monkton",
"Taunton Manor and Wilton",
"Taunton Fairwater",
"Taunton Killams and Mountfield",
"Trull",
"Comeytrowe",
"Taunton Blackbrook and Holway"
],
"country": [
"England"
]
}
}
If you have the whole postcode, the /postcodes endpoint will return considerably more detailed information which I will not include here, but it does include the outcode and the incode as separate fields.
I would, of course, recommend caching the results of any call to a remote API.
Build a regular expression from the list of known codes. Note that the order of the known codes in the regular expression matters: longer codes must come before shorter codes.
private void button1_Click(object sender, EventArgs e)
{
textBoxLog.Clear();
var regionList = BuildList();
var regex = BuildRegex(regionList.Keys);
TryMatch("B20 7TP", regionList, regex);
TryMatch("BT1 1AB", regionList, regex);
TryMatch("TR21 1AB", regionList, regex);
TryMatch("TR0 00", regionList, regex);
TryMatch("XX123", regionList, regex);
}
private static IReadOnlyDictionary<string, string> BuildList()
{
Dictionary<string, string> result = new Dictionary<string, string>();
result.Add("AL", "St Albans");
result.Add("B", "Birmingham");
result.Add("BT", "Belfast");
result.Add("TR", "Taunton");
result.Add("TR21", "Taunton X");
result.Add("TR22", "Taunton Y");
return result;
}
private static Regex BuildRegex(IEnumerable<string> codes)
{
// Sort the code by length descending so that for example TR21 is sorted before TR and is found by regex engine
// before the shorter match
codes = from code in codes
orderby code.Length descending
select code;
// Escape the codes to be used in the regex
codes = from code in codes
select Regex.Escape(code);
// create Regex Alternatives
string codesAlternatives = string.Join("|", codes.ToArray());
// A regex that starts with any of the codes and then has any data following
string lRegExSource = "^(" + codesAlternatives + ").*";
return new Regex(lRegExSource, RegexOptions.IgnoreCase | RegexOptions.Singleline);
}
/// <summary>
/// Try to match the postcode to a region
/// </summary>
private bool CheckPostCode(string postCode, out string identifiedRegion, IReadOnlyDictionary<string, string> regionList, Regex regex)
{
// Check whether we have any match at all
Match match = regex.Match(postCode);
bool result = match.Success;
if (result)
{
// Take region code from first match group
// and use it in dictionary to get region name
string regionCode = match.Groups[1].Value;
identifiedRegion = regionList[regionCode];
}
else
{
identifiedRegion = "";
}
return result;
}
private void TryMatch(string code, IReadOnlyDictionary<string, string> regionList, Regex regex)
{
string region;
if (CheckPostCode(code, out region, regionList, regex))
{
AppendLog(code + ": " + region);
}
else
{
AppendLog(code + ": NO MATCH");
}
}
private void AppendLog(string log)
{
textBoxLog.AppendText(log + Environment.NewLine);
}
Produces this output:
B20 7TP: Birmingham
BT1 1AB: Belfast
TR21 1AB: Taunton X
TR0 00: Taunton
XX123: NO MATCH
For your information, the regex built here is ^(TR21|TR22|AL|BT|TR|B).*
I'm trying to get some field value from a text file using a streamReader.
To read my custom value, I'm using split() method. My separator is a colon ':' and my text format looks like:
Title: Mytitle
Manager: Him
Thema: Free
.....
Main Idea: best idea ever
.....
My problem is, when I try to get the first field, which is title, I use:
string title= text.Split(':')[1];
I get title = MyTitle Manager
instead of just: title= MyTitle.
Any suggestions would be nice.
My text looks like this:
My mail : ........................text............
Manager mail : ..................text.............
Entity :.......................text................
Project Title :...............text.................
Principal idea :...................................
Scope of the idea : .........text...................
........................text...........................
Description and detail :................text.......
..................text.....
Cost estimation :..........
........................text...........................
........................text...........................
........................text...........................
Advantage for us :.................................
.......................................................
Direct Manager IM :................................
Updated per your post
//I would create a class to use if you haven't
//Just cleaner and easier to read
public class Entry
{
public string MyMail { get; set; }
public string ManagerMail { get; set; }
public string Entity { get; set; }
public string ProjectTitle { get; set; }
// ......etc
}
//in case your format location ever changes only change the index value here
public enum EntryLocation
{
MyMail = 0,
ManagerMail = 1,
Entity = 2,
ProjectTitle = 3
}
//return the entry
private Entry ReadEntry()
{
string s =
string.Format("My mail: test@test.com{0}Manager mail: test2@test2.com{0}Entity: test entity{0}Project Title: test project title", Environment.NewLine);
//in case you change your delimiter only need to change it once here
char delimiter = ':';
//your entry contains newline so lets split on that first
string[] split = s.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
//populate the entry
Entry entry = new Entry()
{
//use the enum makes it cleaner to read what value you are pulling
MyMail = split[(int)EntryLocation.MyMail].Split(delimiter)[1].Trim(),
ManagerMail = split[(int)EntryLocation.ManagerMail].Split(delimiter)[1].Trim(),
Entity = split[(int)EntryLocation.Entity].Split(delimiter)[1].Trim(),
ProjectTitle = split[(int)EntryLocation.ProjectTitle].Split(delimiter)[1].Trim()
};
return entry;
}
That is because split returns strings delimited by the sign you've specified. In your case:
Title
Mytitle Manager
Him
1. You can change your data format to get the value you need, for example:
Title: Mytitle:Manager: Him
Then every second element will be a value.
text.Split(':')[1] == " Mytitle";
text.Split(':')[3] == " Him";
2. Or you can call text.Split(' ', ':') to get a similar list of name-value pairs without changing the format.
3. Also, if each entry is on its own line in the file, like:
Title: Mytitle
Manager: Him
and your content is read into a single string, then you can also do:
text.Split(new string[] {Environment.NewLine, ":"}, StringSplitOptions.None);
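For the multi-line layout shown in the question, one possible sketch is to go line by line, cut each line at its first colon only, and treat a line without a colon as a continuation of the previous field. The class and method names are hypothetical, and the sketch assumes field names never repeat and continuation lines never contain a colon.

```csharp
using System;
using System.Collections.Generic;

public static class FieldParser
{
    public static Dictionary<string, string> Parse(string text)
    {
        var fields = new Dictionary<string, string>();
        string currentKey = null;
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        foreach (string rawLine in lines)
        {
            string line = rawLine.Trim();
            if (line.Length == 0) continue;

            int colon = line.IndexOf(':');
            if (colon > 0)
            {
                // Only the first colon separates name and value.
                currentKey = line.Substring(0, colon).Trim();
                fields[currentKey] = line.Substring(colon + 1).Trim();
            }
            else if (currentKey != null)
            {
                // No colon: the line continues the previous field's value.
                fields[currentKey] = (fields[currentKey] + " " + line).Trim();
            }
        }
        return fields;
    }
}
```

After var fields = FieldParser.Parse(text), the title would be read with fields["Title"], which avoids depending on the position of each field in the file.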
I have these strings as a response from a FTP server:
02-17-11 01:39PM <DIR> dec
04-06-11 11:17AM <DIR> Feb 2011
05-10-11 07:09PM 87588 output.xlsx
06-10-11 02:52PM 3462 output.xlsx
where the pattern is: [datetime] [length or <dir>] [filename]
Edit: my code was- @"^\d{2}-\d{2}-\d{2}(\s)+(<DIR>|(\d)+)+(\s)+(.*)+"
I need to parse these strings in this object:
class Files{
DateTime modifiedTime;
bool ifTrueThenFile;
string name;
}
Please note that, filename may have spaces.
I am not good at regex matching, can you help?
Regex method
One approach is using this regex
@"(\d{2}-\d{2}-\d{2} \d{2}:\d{2}(?:PM|AM)) (<DIR>|\d+) (.+)";
I am capturing groups, so
// Group 1 - Matches the DateTime
(\d{2}-\d{2}-\d{2} \d{2}:\d{2}(?:PM|AM))
Notice the syntax (?:xx), it means that the content here will not be caught in a group, we need to match PM or AM but this group alone doesn't matter.
Next I match the file size or <DIR> with
// Group 2 - Matches the file size or <DIR>
(<DIR>|\d+)
Catching the result in a group.
The last part matches directory names or file names
// Group 3 - Matches the dir/file name
(.+)
Now that we captured all groups we can parse the values:
DateTime.Parse(g[1].Value); // be careful with current culture
// a different culture may not work
To check if the captured entry is a file or not you can just check if it is <DIR> or a number.
IsFile = g[2].Value != "<DIR>"; // it is a file if it is not <DIR>
And the name is just what is left
Name = g[3].Value; // returns a string
Then you can use the groups to build the object, an example:
public class Files
{
public DateTime ModifiedTime { get; set; }
public bool IsFile { get; set; }
public string Name { get; set; }
public Files(GroupCollection g)
{
ModifiedTime = DateTime.Parse(g[1].Value);
IsFile = g[2].Value != "<DIR>";
Name = g[3].Value;
}
}
static void Main(string[] args)
{
var p = @"(\d{2}-\d{2}-\d{2} \d{2}:\d{2}(?:PM|AM)) (<DIR>|\d+) (.+)";
var regex = new Regex(p, RegexOptions.IgnoreCase);
var m1 = regex.Match("02-17-11 01:39PM <DIR> dec");
var m2 = regex.Match("05-10-11 07:09PM 87588 output.xlsx");
// DateTime: 02-17-11 01:39PM
// IsFile : false
// Name : dec
var file1 = new Files(m1.Groups);
// DateTime: 05-10-11 07:09PM
// IsFile : true
// Name : output.xlsx
var file2 = new Files(m2.Groups);
}
Further reading
Regex class
Regex groups
String manipulation method
Another way to achieve this is to split the string which can be much faster:
public class Files
{
public DateTime ModifiedTime { get; set; }
public bool IsFile { get; set; }
public string Name { get; set; }
public Files(string line)
{
// Gets the date part and parse to DateTime
ModifiedTime = DateTime.Parse(line.Substring(0, 16));
// Gets the file information part and split
// in two parts
var fileBlock = line.Substring(17).Split(new char[] { ' ' }, 2);
// first part tells if it is a file
IsFile = fileBlock[0] != "<DIR>";
// second part tells the name
Name = fileBlock[1];
}
}
static void Main(string[] args)
{
// DateTime: 02-17-11 01:39PM
// IsFile : false
// Name : dec
var file3 = new Files("02-17-11 01:39PM <DIR> dec");
// DateTime: 05-10-11 07:09PM
// IsFile : true
// Name : out put.xlsx
var file4 = new Files("05-10-11 07:09PM 87588 out put.xlsx");
}
Further reading
String split
String.Split Method (Char[], Int32)
You can try with something like:
^(\d\d-\d\d-\d\d)\s+(\d\d:\d\d[AP]M)\s+(\S+)\s+(.*)$
The first capture group will contain the date, the second the time, the third the size (or <DIR>), and the last everything else (which will be the filename).
(Note that this is probably not portable, the time format is locale dependent.)
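To sidestep the locale issue, one option is to parse the date with an explicit format and the invariant culture instead of DateTime.Parse. Below is a minimal sketch built on the pattern above; the FtpEntry type and the TryParse helper are hypothetical, and the dates are assumed to be month-day-year with a 12-hour clock, as in the question's listing.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical result type for one line of the listing.
public class FtpEntry
{
    public DateTime ModifiedTime { get; set; }
    public bool IsFile { get; set; }
    public long? Size { get; set; }   // null for directories
    public string Name { get; set; }
}

public static class FtpListingParser
{
    private static readonly Regex Line = new Regex(
        @"^(\d{2}-\d{2}-\d{2})\s+(\d{2}:\d{2}[AP]M)\s+(<DIR>|\d+)\s+(.*)$");

    public static bool TryParse(string line, out FtpEntry entry)
    {
        entry = null;
        Match m = Line.Match(line);
        if (!m.Success) return false;

        var inv = CultureInfo.InvariantCulture;
        DateTime modified;
        // Explicit format: does not depend on the current culture.
        if (!DateTime.TryParseExact(m.Groups[1].Value + " " + m.Groups[2].Value,
                "MM-dd-yy hh:mmtt", inv, DateTimeStyles.None, out modified))
        {
            return false;
        }

        bool isFile = m.Groups[3].Value != "<DIR>";
        entry = new FtpEntry
        {
            ModifiedTime = modified,
            IsFile = isFile,
            Size = isFile ? long.Parse(m.Groups[3].Value, inv) : (long?)null,
            Name = m.Groups[4].Value.TrimEnd()
        };
        return true;
    }
}
```

The \s+ separators tolerate the column padding that FTP listings usually contain, and the file name keeps any spaces it has because the last group takes the rest of the line.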
Here you go:
(\d{2})-(\d{2})-(\d{2}) (\d{2}):(\d{2})([AP]M) (<DIR>|\d+) (.+)
I used a lot of subexpressions, so it catches all the relevant parts like year, hour, minute, etc. Maybe you don't need them all; in that case just remove the brackets.
try this
String regexTemp = @"(?<Date>\d\d-\d\d-\d\d\s*\d\d:\d\d[AP]M)\s*(?<LengthOrDir><DIR>|\d+)\s*(?<Name>.*)";
// inputLine is the line of the FTP listing to parse
Match mExprStatic = Regex.Match(inputLine, regexTemp, RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (mExprStatic.Success)
{
DateTime _date = DateTime.Parse(mExprStatic.Groups["Date"].Value);
String lengthOrDir = mExprStatic.Groups["LengthOrDir"].Value;
String Name = mExprStatic.Groups["Name"].Value;
}
A lot of good answers, but I like regex puzzles, so I thought I'd contribute a slightly different version...
^([\d\- :]{14}[AP]M)\s+(<DIR>|\d+)\s(.+)$
For help in testing, I always use this site : http://www.myregextester.com/index.php
You don't need to use regex here. Why don't you split the string by spaces with a number_of_elements limit:
var split = yourEntryString.Split(new string []{" "}, 4,
StringSplitOptions.RemoveEmptyEntries);
var date = string.Join(" ", new string[] {split[0], split[1]});
var length = split[2];
var filename = split[3];
This is, of course, assuming that the pattern is correct and none of the entries would be empty.
I like the regex Leif posted.
However, i'll give you another solution which people will probably hate: fast and dirty solution which i am coming up with just as i am typing:
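// needs using System.Linq for Skip()
string[] allParts = inputText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
DateTime modified = DateTime.Parse(allParts[0] + " " + allParts[1]); // allParts[0-1] = your DateTime
string dirOrSize = allParts[2]; // allParts[2] = <DIR> or Size
string filename = string.Join(" ", allParts.Skip(3)); // allParts[3-n] = your filename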
There are some checks missing there, but you get the idea.
Is it nice code? Probably not. Will it work? With the right amount of time, surely.
Is it more readable? I tend to think "yes", but others might disagree.
You should be able to implement this with a simple string.Split, an if statement and the Parse/ParseExact methods to convert the values. If it is a file, just concatenate the remaining string tokens so you can reconstruct a filename that contains spaces.