Split string into class - c#

I have an array which contains following values:
str[0]= "MeterNr 29202"
str[1]="- 20111101: position 61699 (Previous calculation) "
str[2]="- 20111201: position 68590 (Calculation) consumption 6891 kWh"
str[3]="- 20111101: position 75019 (Previous calculation) "
str[4]="MeterNr 50273"
str[5]="- 20111101: position 18103 (Previous reading) "
str[6]="- 20111201: position 19072 (Calculation) consumption 969 kWh "
I want to split the rows in logical order so that I can store them in the following Reading class. I have problems with splitting the values. Everything in brackets () is the ItemDescription.
I will be thankful for the quick answer.
public class Reading
{
public string MeterNr { get; set; }
public string ItemDescription { get; set; }
public string Date { get; set; }
public string Position { get; set; }
public string Consumption { get; set; }
}

You should parse the values one by one.
If you have a string, which starts with "MeterNr", you should save it as currentMeterNumber and parse the values further.
Otherwise, you can parse the values with Regex:
var dateRegex = new Regex(@"(?<=-\s)(?<year>\d{4})(?<month>\d{2})(?<day>\d{2})");
var positionRegex = new Regex(@"(?<=position\s+)(\d+)");
var descriptionRegex = new Regex(@"(?<=\()(?<description>[^)]+)(?=\))");
var consumptionRegex = new Regex(@"(?<=consumption\s+)(?<consumption>(?<consumptionValue>\d+)\s(?<consumptionUom>\w+))");
I hope you will be able to build the final algorithm and understand how each of these expressions works. A final step could be to combine them all into a single Regex; doing that yourself is a good way to improve your skills.
P.S.: There are plenty of regex tutorials on the Internet.
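For reference, here is a minimal sketch of one way the pieces could be combined, assuming every reading line has the shape "- date: position N (description)" with an optional "consumption N unit" tail. The ReadingParser class, the two patterns and the group names are illustrative and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Reading
{
    public string MeterNr { get; set; }
    public string ItemDescription { get; set; }
    public string Date { get; set; }
    public string Position { get; set; }
    public string Consumption { get; set; }
}

// Hypothetical helper: one pattern for meter lines, one for reading lines.
public static class ReadingParser
{
    private static readonly Regex MeterLine = new Regex(@"^\s*MeterNr\s+(?<nr>\d+)");

    private static readonly Regex ReadingLine = new Regex(
        @"^\s*-\s*(?<date>\d{8}):\s*position\s+(?<position>\d+)\s*" +
        @"\((?<description>[^)]+)\)" +
        @"(?:\s*consumption\s+(?<consumption>\d+\s+\w+))?");

    public static List<Reading> Parse(IEnumerable<string> lines)
    {
        var readings = new List<Reading>();
        string currentMeterNr = null;

        foreach (var line in lines)
        {
            // A "MeterNr" line only updates the current meter number.
            var meter = MeterLine.Match(line);
            if (meter.Success)
            {
                currentMeterNr = meter.Groups["nr"].Value;
                continue;
            }

            var m = ReadingLine.Match(line);
            if (!m.Success)
                continue; // skip lines that fit neither shape

            var consumption = m.Groups["consumption"];
            readings.Add(new Reading
            {
                MeterNr = currentMeterNr,
                Date = m.Groups["date"].Value,
                Position = m.Groups["position"].Value,
                ItemDescription = m.Groups["description"].Value,
                // The consumption part is optional, so the group may not participate.
                Consumption = consumption.Success ? consumption.Value : null
            });
        }
        return readings;
    }
}
```

Calling ReadingParser.Parse(str) on the array from the question should give one Reading per "- ..." row, each carrying the meter number of the closest preceding "MeterNr" row.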

I would just use a for loop and string indexes etc, but then I am a bit simple like that! Not sure of your data (i.e. if things might be missing) but this would work on the data you have posted...
var readings = new List<Reading>();
int meterNrLength = "MeterNr".Length;
int positionLength = "position".Length;
int consumptionLength = "consumption".Length;
string meterNr = null;
foreach(var s in str)
{
int meterNrIndex = s.IndexOf("MeterNr",
StringComparison.OrdinalIgnoreCase);
if (meterNrIndex != -1)
{
meterNr = s.Substring(meterNrIndex + meterNrLength).Trim();
continue;
}
var reading = new Reading {MeterNr = meterNr};
string rest = s.Substring(0, s.IndexOf(':'));
reading.Date = rest.Substring(1).Trim();
rest = s.Substring(s.IndexOf("position") + positionLength);
int bracketIndex = rest.IndexOf('(');
reading.Position = rest.Substring(0, bracketIndex).Trim();
rest = rest.Substring(bracketIndex + 1);
reading.ItemDescription = rest.Substring(0, rest.IndexOf(")"));
int consumptionIndex = rest.IndexOf("consumption",
StringComparison.OrdinalIgnoreCase);
if (consumptionIndex != -1)
{
reading.Consumption = rest.Substring(consumptionIndex + consumptionLength).Trim();
}
readings.Add(reading);
}

public static List<Reading> Parser(this string[] str)
{
List<Reading> result = new List<Reading>();
string meterNr = "";
Reading reading;
foreach (string s in str)
{
MatchCollection mc = Regex.Matches(s, "\\d+|\\((.*?)\\)");
if (mc.Count == 1)
{
meterNr = mc[0].Value;
continue;
}
reading = new Reading()
{
MeterNr = meterNr,
Date = mc[0].Value,
Position = mc[1].Value,
ItemDescription = mc[2].Value.TrimStart('(').TrimEnd(')')
};
if (mc.Count == 4)
reading.Consumption = mc[3].Value;
result.Add(reading);
}
return result;
}


binary search in a sorted list in c#

I am retrieving client id\ drum id from a file and storing them in a list.
then taking the client id and storing it in another list.
I need to display the client id that the user specifies (input_id) on a Datagrid.
I need to get all the occurrences of this specific id using binary search.
the file is already sorted.
I need first to find the occurrences of input_id in id_list.
The question is: how to find all the occurrences of input_id in the sorted list id_list using binary search?
using(StreamReader sr= new StreamReader(path))
{
List<string> id_list = new List<string>();
List<string> all_list= new List<string>();
List<int> indexes = new List<int>();
string line = sr.ReadLine();
line = sr.ReadLine();
while (line != null)
{
all_list.Add(line);
string[] break1 = line.Split('/');
id_list.Add(break1[0]);
line = sr.ReadLine();
}
}
string input_id = textBox1.Text;
Data in the file:
client id/drum id
-----------------
123/321
231/3213
321/213123 ...
If the requirement was to use binary search I would create a custom class with a comparer, and then find an element and loop forward/backward to get any other elements. Like:
static void Main(string[] args)
{
var path = @"file path...";
// read all the Ids from the file.
var id_list = File.ReadLines(path).Select(x => new Drum
{
ClientId = x.Split('/').First(),
DrumId = x.Split('/').Last()
}).OrderBy(o => o.ClientId).ToList();
var find = new Drum { ClientId = "231" };
var index = id_list.BinarySearch(find, new DrumComparer());
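if (index >= 0) // BinarySearch returns a negative number (the bitwise complement of the insertion point) when nothing is found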
{
List<Drum> matches = new List<Drum>();
matches.Add(id_list[index]);
//get previous matches
for (int i = index - 1; i >= 0; i--) // include index 0
{
if (id_list[i].ClientId == find.ClientId)
matches.Add(id_list[i]);
else
break;
}
//get forward matches
for (int i = index + 1; i < id_list.Count; i++)
{
if (id_list[i].ClientId == find.ClientId)
matches.Add(id_list[i]);
else
break;
}
}
}
public class Drum
{
public string DrumId { get; set; }
public string ClientId { get; set; }
}
public class DrumComparer : Comparer<Drum>
{
public override int Compare(Drum x, Drum y) =>
x.ClientId.CompareTo(y.ClientId);
}
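As an alternative to walking outwards from a single hit, the range of equal ids can be found with two binary searches, one for the first matching index and one for the index just past the last match. This is only a sketch: the SortedListSearch class and its method names are made up for illustration, and it assumes id_list is sorted in ordinal string order, which is also the comparison used here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public static class SortedListSearch
{
    // First index whose element is >= key, or sorted.Count if there is none.
    public static int LowerBound(IList<string> sorted, string key)
    {
        int lo = 0, hi = sorted.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(sorted[mid], key) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    // First index whose element is > key, or sorted.Count if there is none.
    public static int UpperBound(IList<string> sorted, string key)
    {
        int lo = 0, hi = sorted.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(sorted[mid], key) <= 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    // Indexes of every element equal to key (empty when the key is absent).
    public static List<int> FindAllIndexes(IList<string> sorted, string key)
    {
        var indexes = new List<int>();
        int first = LowerBound(sorted, key);
        int end = UpperBound(sorted, key);
        for (int i = first; i < end; i++)
            indexes.Add(i);
        return indexes;
    }
}
```

With the lists from the question, SortedListSearch.FindAllIndexes(id_list, input_id) would give the positions to look up in all_list. Both searches take logarithmic time, even when there are many duplicates.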
If I understand your question correctly, this should be a simple Where statement.
// read all the Ids from the file.
var Id_list = File.ReadLines(path).Select(x => new {
ClientId = x.Split('/').First(),
DrumId = x.Split('/').Last()
}).ToList();
var foundIds = Id_list.Where(x => x.ClientId == input_id);

How to parse nested parenthesis only in first level in C#

I would like to write C# code that parses nested parenthesis to array elements, but only on first level. An example is needed for sure:
I want this string:
"(example (to (parsing nested paren) but) (first lvl only))"
to be parsed into:
["example", "(to (parsing nested paren) but)", "(first lvl only)"]
I was thinking about using regex but can't figure out how to properly use them without implementing this behaviour from scratch.
In the case of malformed inputs I would like to return an empty array, or an array ["error"]
I developed a parser for your example. I also checked some other examples which you can see in the code.
using System;
using System.Collections;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
string str = "(example (to (parsing nested paren) but) (first lvl only))"; // => [example , (to (parsing nested paren) but) , (first lvl only)]
//string str = "(first)(second)(third)"; // => [first , second , third]
//string str = "(first(second)third)"; // => [first , (second) , third]
//string str = "(first(second)(third)fourth)"; // => [first , (second) , (third) , fourth]
//string str = "(first((second)(third))fourth)"; // => [first , ((second)(third)) , fourth]
//string str = "just Text"; // => [ERROR]
//string str = "start with Text (first , second)"; // => [ERROR]
//string str = "(first , second) end with text"; // => [ERROR]
//string str = ""; // => [ERROR]
//string str = "("; // => [ERROR]
//string str = "(first()(second)(third))fourth)"; // => [ERROR]
//string str = "(((extra close pareanthese))))"; // => [ERROR]
var res = Parser.parse(str);
showRes(res);
}
static void showRes(ArrayList res)
{
var strings = res.ToArray();
var theString = string.Join(" , ", strings);
Console.WriteLine("[" + theString + "]");
}
}
public class Parser
{
static Dictionary<TokenType, TokenType> getRules()
{
var rules = new Dictionary<TokenType, TokenType>();
rules.Add(TokenType.OPEN_PARENTHESE, TokenType.START | TokenType.OPEN_PARENTHESE | TokenType.CLOSE_PARENTHESE | TokenType.SIMPLE_TEXT);
rules.Add(TokenType.CLOSE_PARENTHESE, TokenType.SIMPLE_TEXT | TokenType.CLOSE_PARENTHESE);
rules.Add(TokenType.SIMPLE_TEXT, TokenType.SIMPLE_TEXT | TokenType.CLOSE_PARENTHESE | TokenType.OPEN_PARENTHESE);
rules.Add(TokenType.END, TokenType.CLOSE_PARENTHESE);
return rules;
}
static bool isValid(Token prev, Token cur)
{
var rules = Parser.getRules();
return rules.ContainsKey(cur.type) && ((prev.type & rules[cur.type]) == prev.type);
}
public static ArrayList parse(string sourceText)
{
ArrayList result = new ArrayList();
int openParenthesesCount = 0;
Lexer lexer = new Lexer(sourceText);
Token prevToken = lexer.getStartToken();
Token currentToken = lexer.readNextToken();
string tmpText = "";
while (currentToken.type != TokenType.END)
{
if (currentToken.type == TokenType.OPEN_PARENTHESE)
{
openParenthesesCount++;
if (openParenthesesCount > 1)
{
tmpText += currentToken.token;
}
}
else if (currentToken.type == TokenType.CLOSE_PARENTHESE)
{
openParenthesesCount--;
if (openParenthesesCount < 0)
{
return Parser.Error();
}
if (openParenthesesCount > 0)
{
tmpText += currentToken.token;
}
}
else if (currentToken.type == TokenType.SIMPLE_TEXT)
{
tmpText += currentToken.token;
}
if (!Parser.isValid(prevToken, currentToken))
{
return Parser.Error();
}
if (openParenthesesCount == 1 && tmpText.Trim() != "")
{
result.Add(tmpText);
tmpText = "";
}
prevToken = currentToken;
currentToken = lexer.readNextToken();
}
if (openParenthesesCount != 0)
{
return Parser.Error();
}
if (!Parser.isValid(prevToken, currentToken))
{
return Parser.Error();
}
if (tmpText.Trim() != "")
{
result.Add(tmpText);
}
return result;
}
static ArrayList Error()
{
var er = new ArrayList();
er.Add("ERROR");
return er;
}
}
class Lexer
{
string _txt;
int _index;
public Lexer(string text)
{
this._index = 0;
this._txt = text;
}
public Token getStartToken()
{
return new Token(-1, TokenType.START, "");
}
public Token readNextToken()
{
if (this._index >= this._txt.Length)
{
return new Token(-1, TokenType.END, "");
}
Token t = null;
string txt = "";
if (this._txt[this._index] == '(')
{
txt = "(";
t = new Token(this._index, TokenType.OPEN_PARENTHESE, txt);
}
else if (this._txt[this._index] == ')')
{
txt = ")";
t = new Token(this._index, TokenType.CLOSE_PARENTHESE, txt);
}
else
{
txt = this._readText();
t = new Token(this._index, TokenType.SIMPLE_TEXT, txt);
}
this._index += txt.Length;
return t;
}
private string _readText()
{
string txt = "";
int i = this._index;
while (i < this._txt.Length && this._txt[i] != '(' && this._txt[i] != ')')
{
txt = txt + this._txt[i];
i++;
}
return txt;
}
}
class Token
{
public int position
{
get;
private set;
}
public TokenType type
{
get;
private set;
}
public string token
{
get;
private set;
}
public Token(int position, TokenType type, string token)
{
this.position = position;
this.type = type;
this.token = token;
}
}
[Flags]
enum TokenType
{
START = 1,
OPEN_PARENTHESE = 2,
SIMPLE_TEXT = 4,
CLOSE_PARENTHESE = 8,
END = 16
}
well, regex will do the job:
var text = @"(example (to (parsing nested paren) but) (first lvl only))";
var pattern = @"\(([\w\s]+) (\([\w\s]+ \([\w\s]+\) [\w\s]+\)) (\([\w\s]+\))\)*";
try
{
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = r.Match(text);
string group_1 = m.Groups[1].Value; //example
string group_2 = m.Groups[2].Value; //(to (parsing nested paren) but)
string group_3 = m.Groups[3].Value; //(first lvl only)
return new string[]{group_1,group_2,group_3};
}
catch(Exception ex){
return new string[]{"error"};
}
hopefully this helps, tested here in dotnetfiddle
Edit:
this might get you started into building the right expression according to whatever patterns you are falling into and maybe build a recursive function to parse the rest into the desired output :)
RegEx is not recursive. You either count bracket level, or recurse.
A non-recursive parser loop that I tested against the example you show:
string SplitFirstLevel(string s)
{
List<string> result = new List<string>();
int p = 0, level = 0;
for (int i = 0; i < s.Length; i++)
{
if (s[i] == '(')
{
level++;
if (level == 1) p = i + 1;
if (level == 2)
{
result.Add('"' + s.Substring(p, i - p) + '"');
p = i;
}
}
if (s[i] == ')')
if (--level == 0)
result.Add('"' + s.Substring(p, i - p) + '"');
}
return "[" + String.Join(",", result) + "]";
}
Note: after some more testing, I see your specification is unclear. How to delimit orphaned level 1 terms, that is terms without bracketing ?
For example, my parser translates
(example (to (parsing nested paren) but) (first lvl only))
to:
["example ","(to (parsing nested paren) but) ","(first lvl only)"]
and
(example (to (parsing nested paren)) but (first lvl only))
to:
["example ","(to (parsing nested paren)) but ","(first lvl only)"]
In either case, "example" gets a separate term, while "but" is grouped with the first term. In the first example this is logical, it is in the bracketing, but it may be unwanted behaviour in the second case, where "but" should be separated, like "example", which also has no bracketing (?)
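One possible way to resolve that ambiguity is sketched below. It makes two assumptions that the question does not confirm: the input is a single outer bracket pair, and every run of unbracketed text at the first level becomes its own trimmed term. Malformed input gives an empty array. The FirstLevelSplitter name is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits the contents of one outer "( ... )" into first-level terms.
public static class FirstLevelSplitter
{
    private static readonly string[] Malformed = new string[0];

    public static string[] Split(string s)
    {
        if (string.IsNullOrEmpty(s) || s[0] != '(')
            return Malformed;

        var result = new List<string>();
        int depth = 0;
        int start = 0; // start of the term currently being collected

        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '(')
            {
                depth++;
                if (depth == 1)
                {
                    start = i + 1;
                }
                else if (depth == 2)
                {
                    // Plain text before a nested group is a term of its own.
                    AddIfNotBlank(result, s.Substring(start, i - start));
                    start = i;
                }
            }
            else if (s[i] == ')')
            {
                depth--;
                if (depth < 0)
                    return Malformed;
                if (depth == 1)
                {
                    // A complete nested group, brackets included.
                    result.Add(s.Substring(start, i - start + 1));
                    start = i + 1;
                }
                else if (depth == 0)
                {
                    AddIfNotBlank(result, s.Substring(start, i - start));
                    if (i != s.Length - 1)
                        return Malformed; // something follows the outer group
                }
            }
        }
        return depth == 0 ? result.ToArray() : Malformed;
    }

    private static void AddIfNotBlank(List<string> terms, string text)
    {
        var trimmed = text.Trim();
        if (trimmed.Length > 0)
            terms.Add(trimmed);
    }
}
```

Under these assumptions the first example gives three terms and the second gives four, with "but" as a separate term. An input such as "(first)(second)" is treated as malformed, because it is not a single outer group.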

How to Split and Sum Members of a String Value

I have a database column that is a text field, and this text field contains values that look like
I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109
and can vary sometimes to look like:
I=29;A=20009.34;D=20190712;F=300|I=29;A=2259.34;D=20190714;F=300
Where 'I' represents the invoice Id, 'A' the invoice amount, 'D' the date in YYYYMMDD format and 'F' the original foreign currency value if the invoice was from a foreign supplier.
I am fetching that column and binding it to a datagrid which has a button labelled "Show Amount". On button click, it fetches the selected row and splits the string to extract "A"
I need to fetch all the sections with A= within the column result... i.e
A=97920
A=77360
A=43975
Then sum them all together and display the result on a label.
I have tried splitting using '|' first, extracting the substring 'A=' then splitting it using ';' to get the amount after "=".
string cAlloc;
string[] amount;
string InvoiceTotal;
string SupplierAmount;
string BalanceUnpaid;
DataRowView dv = invoicesDataGrid.SelectedItem as DataRowView;
if (dv != null)
{
cAlloc = dv.Row.ItemArray[7].ToString();
InvoiceTotal = dv.Row.ItemArray[6].ToString();
if (invoicesDataGrid.Columns[3].ToString() == "0")
{
lblAmount.Foreground = Brushes.Red;
lblAmount.Content = "No Amount Has Been Paid Out to the Supplier";
}
else
{
amount = cAlloc.Split('|');
foreach (string i in amount)
{
string toBeSearched = "A=";
string code = i.Substring(i.IndexOf(toBeSearched) + toBeSearched.Length);
string[] res = code.Split(';');
SupplierAmount = res[0];
float InvTotIncl = float.Parse(InvoiceTotal, CultureInfo.InvariantCulture.NumberFormat);
float AmountPaid = float.Parse(SupplierAmount, CultureInfo.InvariantCulture.NumberFormat);
float BalUnpaid = InvTotIncl - AmountPaid;
BalanceUnpaid = Convert.ToString(BalUnpaid);
if (BalUnpaid == 0)
{
lblAmount.Content = "Amount Paid = " + SupplierAmount + " No Balance Remaining, Supplier Invoice Paid in Full";
}
else if (BalUnpaid < 0)
{
lblAmount.Content = "Amount Paid = " + SupplierAmount + " Supplier Paid an Excess of " + BalanceUnpaid;
}
else
{
lblAmount.Content = "Amount Paid = " + SupplierAmount + " You Still Owe the Supplier a Total of " + BalanceUnpaid; ;
}
}
}
But I am only able to extract A=43975, the very last "A=". Instead of all three, plus I have not figured out how to sum the strings. Somebody help... please.
Regex is the preferred solution. Alternatively, split, split and split.
var cAlloc = "I=29;A=20009.34;D=20190712;F=300|I=29;A=2259.34;D=20190714;F=300";
var amount = cAlloc.Split('|');
decimal sum = 0;
foreach (string i in amount)
{
foreach (var t in i.Split(';'))
{
var p = t.Split('=');
if (p[0] == "A")
{
var s = decimal.Parse(p[1], CultureInfo.InvariantCulture);
sum += s;
break;
}
}
}
var in1 = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
var in2 = "I=29;A=20009.34;D=20190712;F=300|I=29;A=2259.34;D=20190714;F=300";
var reg = @"A=(\d+(\.\d+)?)";
Regex.Matches(in1, reg).OfType<Match>().Sum(m => double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture));
Regex.Matches(in2, reg).OfType<Match>().Sum(m => double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture));
You're doing too much work for something like this. Here's a simpler solution using Regex.
If the invoice amount is always located as a second value in the set you can access it directly by index after split:
var str = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
var invoices = str.Trim().Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
var totalSum = 0M;
foreach (var invoice in invoices)
{
var invoiceParts = invoice.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var invoiceAmount = decimal.Parse(invoiceParts[1].Trim().Substring(2));
totalSum += invoiceAmount;
}
Otherwise, you can use a little more "flexible" solution like this:
var str = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
var invoices = str.Trim().Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
var totalSum = 0M;
foreach (var invoice in invoices)
{
var invoiceParts = invoice.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var invoiceAmount = decimal.Parse(invoiceParts.First(ip => ip.Trim().ToLower().StartsWith("a=")).Substring(2));
totalSum += invoiceAmount;
}
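The same idea can be written without relying on the position of the "A=" pair and without throwing on bad values. The sketch below assumes that amounts always use "." as the decimal separator (hence the invariant culture) and that unparsable or missing amounts should simply count as 0. InvoiceAmounts and SumAmounts are illustrative names.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: sums every "A=<amount>" pair in a string like
// "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117".
public static class InvoiceAmounts
{
    public static decimal SumAmounts(string allocations)
    {
        if (string.IsNullOrWhiteSpace(allocations))
            return 0m;

        return allocations
            .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries) // one entry per invoice
            .SelectMany(invoice => invoice.Split(';'))                   // "key=value" pairs
            .Select(pair => pair.Split(new[] { '=' }, 2))
            .Where(kv => kv.Length == 2 && kv[0].Trim() == "A")
            .Sum(kv => decimal.TryParse(kv[1].Trim(), NumberStyles.Number,
                                        CultureInfo.InvariantCulture, out var amount)
                ? amount
                : 0m); // assumption: an unparsable amount counts as zero
    }
}
```

For the two sample strings in the question this should give 219255 and 22268.68 respectively. The result can then be compared with the invoice total once, after the loop, instead of once per allocation.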
Import the input: "Deserialisation"
With the following input, we have a list of objects with the properties I, A and D.
var input = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
Given this simple class:
public class inputClass
{
public decimal I { get; set; }
public decimal A { get; set; }
public decimal D { get; set; }
}
Parsing it will look like:
var inputItems =
input.Split('|')
.Select(
x =>
x.Split(';')
.ToDictionary(
y => y.Split('=')[0],
y => y.Split('=')[1]
)
)
.Select(
x => //Manual parsing from dictionary to inputClass.
//If the dictionary keys match the object's property names, we could use something more generic.
new inputClass
{
I = decimal.Parse(x["I"], CultureInfo.InvariantCulture.NumberFormat),
A = decimal.Parse(x["A"], CultureInfo.InvariantCulture.NumberFormat),
D = decimal.Parse(x["D"], CultureInfo.InvariantCulture.NumberFormat),
}
)
.ToList();
Does it look complex? Let's give inputClass the responsibility of initialising itself from a string of the form
PropertyName=Value[; PropertyName=Value]:
public inputClass(string input, NumberFormatInfo numberFormat)
{
var dict = input
.Split(';')
.ToDictionary(
y => y.Split('=')[0],
y => y.Split('=')[1]
);
I = decimal.Parse(dict["I"], numberFormat);
A = decimal.Parse(dict["A"], numberFormat);
D = decimal.Parse(dict["D"], numberFormat);
}
Then the parsing is simple:
var inputItems = input.Split('|').Select(x => new inputClass(x, CultureInfo.InvariantCulture.NumberFormat));
Once we have a more usable structure (a list of objects), we can easily compute Sum, Avg, Max, Min:
var sumA = inputItems.Sum(x => x.A);
Producing the output: "Serialisation"
To produce the output, we define a class similar to the input class:
public class outputClass
{
public decimal I { get; set; }
public decimal A { get; set; }
public decimal D { get; set; }
public decimal F { get; set; }
The class should be able to produce the string PropertyName=Value[; PropertyName=Value]:
public override string ToString()
{
return $"I={I};A={A};D={D};F={F}";
}
Then we produce the string ("serialisation") after computing the output list from the input list:
//process The input into the output.
var outputItems = new List<outputClass>();
foreach (var item in inputItems)
{
// compute whatever is needed to create the next output item
item.A++;
outputItems.Add(
new outputClass { A = item.A, D = item.D, I = item.I, F = 42 }
);
}
// "Serialisation"
var outputString = String.Join("|", outputItems);
Online Demo. https://dotnetfiddle.net/VcEQmf
Long story short:
Define a class with the properties you will use/display.
Add a constructor that takes a string like "I=5212;A=97920;D=20181121".
NB: the string may contain properties that will not be mapped to the object.
Override ToString() so that it can easily produce its serialisation.
NB: properties and values that are not stored in the object will not appear in the serialisation result.
Now you simply have to split on your line/object separator "|", and you are ready to work with real objects without having to care about that odd string any more.
PS:
There was a small misunderstanding about your two types of input: I mentally saw them as input and output. Don't mind those names. It can be the same class; it doesn't change anything in this answer.

Split a string into three seperate parts

I have a URL string coming into an API e.g. c1:1=25.
*http://mysite/api/controllername?serial=123&c1:=25*
I want to split it into the channel name (c1), the channel reading number (1) after the colon and the value (25).
There are also occasions, where there is no colon as it is a fixed value such as a serial number (serial=123).
I have created a class:
public class UriDataModel
{
public string ChannelName { get; set; }
public string ChannelNumber { get; set; }
public string ChannelValue { get; set; }
}
I am trying to use an IEnumerable with some LINQ and not getting very far.
var querystring = HttpContext.Current.Request.Url.Query;
querystring = querystring.Substring(1);
var urldata = new UrlDataList
{
UrlData = querystring.Split('&').ToList()
};
IEnumerable<UriDataModel> uriData =
from x in urldata.UrlData
let channelname = x.Split(':')
from y in urldata.UrlData
let channelreading = y.Split(':', '=')
from z in urldata.UrlData
let channelvalue = z.Split('=')
select new UriDataModel()
{
ChannelName = channelname[0],
ChannelNumber = channelreading[1],
ChannelValue = channelvalue[2]
};
List<UriDataModel> udm = uriData.ToList();
I feel as if I am over complicating things here.
In summary, I want to split the string into three parts and where there is no colon split it into two.
Any pointers will be great. TIA
You can use regex. I think you switched the channel number and the colon in your example, so my code reflects this assumption.
public static (string channelName, string channelNumber, string channelValue) ParseUrlData(string urlData)
{
var regex = new Regex(@"serial=(\d+)(&c(:\d+)?=(\d+))?");
var matches = regex.Match(urlData);
string name = null;
string number = null;
string value = null;
if (matches.Success)
{
name = matches.Groups[1].Value;
if (matches.Groups.Count == 5) number = matches.Groups[3].Value.TrimStart(':');
if (matches.Groups.Count >= 4) value = matches.Groups[matches.Groups.Count - 1].Value;
}
Console.WriteLine($"[{name}] [{number}] [{value}]");
return (name, number, value);
}
Then you can call it like this
(var channelName, var channelNumber, var channelValue) = ParseUrlData("serial=123&c:1=25");
(var channelName, var channelNumber, var channelValue) = ParseUrlData("serial=123&c=25");
(var channelName, var channelNumber, var channelValue) = ParseUrlData("serial=123");
and it'll return (and print)
[123] [1] [25]
[123] [] [25]
[123] [] []
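If the parameter names are not fixed in advance, a more general variant is to split the query on "&" and match each part against a "name[:number]=value" pattern. This is a sketch under the assumption that the channel number follows the colon, as in c1:1=25, and that the query string has already been URL-decoded. QueryChannelParser and the group names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class UriDataModel
{
    public string ChannelName { get; set; }
    public string ChannelNumber { get; set; }
    public string ChannelValue { get; set; }
}

// Hypothetical helper: turns "?serial=123&c1:1=25" into one model per parameter.
public static class QueryChannelParser
{
    // name, optional ":number", then "=value"
    private static readonly Regex Pair =
        new Regex(@"^(?<name>[^:=&]+)(?::(?<number>[^=&]*))?=(?<value>[^&]*)$");

    public static List<UriDataModel> Parse(string query)
    {
        var result = new List<UriDataModel>();
        var parts = query.TrimStart('?')
                         .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var part in parts)
        {
            var m = Pair.Match(part);
            if (!m.Success)
                continue; // ignore parts that are not "name=value"

            var number = m.Groups["number"];
            result.Add(new UriDataModel
            {
                ChannelName = m.Groups["name"].Value,
                // No colon means a fixed value such as serial=123.
                ChannelNumber = number.Success ? number.Value : null,
                ChannelValue = m.Groups["value"].Value
            });
        }
        return result;
    }
}
```

For "?serial=123&c1:1=25" this should give two models: serial with no number and value 123, and c1 with number 1 and value 25.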

C# - Regex subtitle file (.srt) to get text content?

I have a srt file
1
00:00:07,000 --> 00:00:09,000
Time to amaze the world..
create by Hazy
2
00:00:11,000 --> 00:00:12,200
show them
3
00:00:15,000 --> 00:00:16,500
an impossible feat
i want to get text content
Time to amaze the world..
create by Hazy,
show them,
an impossible feat
My regex:
string[] souceSrt = Regex.Split(inputText.Text, @"\n*\d+\n\d\d:\d\d:\d\d,\d\d\d --> \d\d:\d\d:\d\d,\d\d\d\n");
but it's not working. What should i do??
Your approach wasn't bad, I think your pattern doesn't work because of newlines (that are probably CRLF):
(?:\r?\n)*\d+\r?\n\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\r?\n
Note that your first approach is safer than searching for all lines that contain letters (imagine a character that says "how old are you?").
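A minimal sketch of how that pattern could be used with Regex.Split, assuming the whole file content is available as one string. The SrtText class is an illustrative name, and empty pieces (such as the one before the first header) are dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of each subtitle block.
public static class SrtText
{
    // Block number line plus timestamp line; \r?\n accepts both LF and CRLF.
    private static readonly Regex Header = new Regex(
        @"(?:\r?\n)*\d+\r?\n\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\r?\n");

    public static string[] ExtractText(string srt)
    {
        return Header.Split(srt)
            .Select(block => block.Trim())     // drop trailing line breaks
            .Where(block => block.Length > 0)  // drop the empty piece before the first header
            .ToArray();
    }
}
```

Each returned element is the text of one subtitle block. A block with two lines, like the first one in the question, keeps its internal line break.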
using RegexHero
string strRegex = @"^.*([a-zA-Z]).*$";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
//grab line
}
}
unless there's something I've missed, the lines you don't want will never have an alphabetic character in them.
A solution to parse an SRT without RegEx
Create a class to Deserialize the SRT
public class SrtContent
{
public string Text { get; set; }
public string StartTime { get; set; }
public string EndTime { get; set; }
public string Segment { get; set; }
}
Now here is the method that will parse the SRT
private static void ParseSRT(string srtFilePath)
{
var fileContent = File.ReadAllLines(srtFilePath);
if (fileContent.Length <= 0)
return;
var content = new List<SrtContent>();
var segment = 1;
for (int item = 0; item < fileContent.Length; item++)
{
if (segment.ToString() == fileContent[item])
{
content.Add(new SrtContent
{
Segment = segment.ToString(),
StartTime = fileContent[item + 1].Substring(0, fileContent[item + 1].LastIndexOf("-->")).Trim(),
EndTime = fileContent[item + 1].Substring(fileContent[item + 1].LastIndexOf("-->") + 3).Trim(),
Text = fileContent[item + 2]
});
// The block numbers of SRT like 1, 2, 3, ... and so on
segment++;
// Iterate one block at a time
item += 3;
}
}
}
