Regex doesn't read multiple lines correctly in C#


I ran into a few problems trying to capture information from a text file with a regex in C#.
Here is an example of the code and the string:
string pattern = @"([\w\d]+) Hand #([\d]+): Tournament #([\d]+), ([$€])([.\d]+)\+[$€]([.\d]+).+\s\(([\d]+\/[\d]+)\).+([\d]+\/[\d]+\/[\d]+ )([\d:]+) ET" +
@"(\n|\r|\r\n)Table '([\s\d]+)' (.+) Seat #(\d) is the button" +
@"((?:(?:\n|\r|\r\n)^Seat ([\d]+): (.+) \(([\d]+) in chips\))*)";
MatchCollection matches = Regex.Matches(_hh, pattern, RegexOptions.Multiline);
PokerStars Hand #232702710836: Tournament #3332581238, $9.22+$0.78 USD Hold'em No Limit - Level IV (40/80) - 2021/12/31 22:34:19 ET
Table '3332581238 1' 9-max Seat #3 is the button
Seat 1: mpolishuk (2018 in chips)
Seat 3: Kevin2049 (1154 in chips)
Seat 4: IPray2Buddha (1030 in chips)
Seat 5: Sakura2892 (1499 in chips)
Seat 7: Lillien66 (2141 in chips)
Seat 9: owlie45 (5658 in chips)
mpolishuk: posts the ante 10
Kevin2049: posts the ante 10
IPray2Buddha: posts the ante 10
Sakura2892: posts the ante 10
Lillien66: posts the ante 10
owlie45: posts the ante 10
IPray2Buddha: posts small blind 40
Sakura2892: posts big blind 80
*** HOLE CARDS ***
Dealt to IPray2Buddha [6c 6h]
Lillien66: folds
owlie45: folds
mpolishuk: folds
Kevin2049: folds
IPray2Buddha: calls 40
Sakura2892: raises 1409 to 1489 and is all-in
IPray2Buddha: calls 940 and is all-in
Uncalled bet (469) returned to Sakura2892
*** FLOP *** [8d 7c Ts]
*** TURN *** [8d 7c Ts] [6s]
*** RIVER *** [8d 7c Ts 6s] [Ah]
*** SHOW DOWN ***
IPray2Buddha: shows [6c 6h] (three of a kind, Sixes)
Sakura2892: shows [Kd Qh] (high card Ace)
IPray2Buddha collected 2100 from pot
*** SUMMARY ***
Total pot 2100 | Rake 0
Board [8d 7c Ts 6s Ah]
Seat 1: mpolishuk folded before Flop (didn't bet)
Seat 3: Kevin2049 (button) folded before Flop (didn't bet)
Seat 4: IPray2Buddha (small blind) showed [6c 6h] and won (2100) with three of a kind, Sixes
Seat 5: Sakura2892 (big blind) showed [Kd Qh] and lost with high card Ace
Seat 7: Lillien66 folded before Flop (didn't bet)
Seat 9: owlie45 folded before Flop (didn't bet)
The regex doesn't seem to recognize \n, ^ or $ even though RegexOptions.Multiline is enabled.
It also reads only the first occurrence of a repeated expression. I tried both "*" and copying the same expression two or more times without *; either way only the first occurrence is read.

Perhaps there is a misunderstanding of groups and captures and what they contain. The RegEx appears to be trying to get data from the "Seat ... in chips" lines. But parts of these are enclosed in non-capturing groups. However, the main values from these lines are captured (see the output, shown below, from the captures).
The code below uses the RegEx from the question, with string input initialised to the multi-line text shown in the question.
MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.Multiline);
for (int ii = 0; ii < matches.Count; ii++)
{
Console.WriteLine("Match[{0}] // of 0..{1}:", ii, matches.Count - 1);
DisplayMatchResults(matches[ii]);
}
This gives the lines below (there is much more output) from the "Seat ... in chips" lines. Note that the function DisplayMatchResults is taken from this StackOverflow answer.
match.Groups[15].Captures[0].Value == "1"
match.Groups[15].Captures[1].Value == "3"
match.Groups[15].Captures[2].Value == "4"
match.Groups[15].Captures[3].Value == "5"
match.Groups[15].Captures[4].Value == "7"
match.Groups[15].Captures[5].Value == "9"
match.Groups[16].Captures[0].Value == "mpolishuk"
match.Groups[16].Captures[1].Value == "Kevin2049"
match.Groups[16].Captures[2].Value == "IPray2Buddha"
match.Groups[16].Captures[3].Value == "Sakura2892"
match.Groups[16].Captures[4].Value == "Lillien66"
match.Groups[16].Captures[5].Value == "owlie45"
match.Groups[17].Captures[0].Value == "2018"
match.Groups[17].Captures[1].Value == "1154"
match.Groups[17].Captures[2].Value == "1030"
match.Groups[17].Captures[3].Value == "1499"
match.Groups[17].Captures[4].Value == "2141"
match.Groups[17].Captures[5].Value == "5658"
Note that the regex is overcomplicated. [\w\d] is the same as \w. [\d]+ is the same as \d+. There is no need to escape /, so replace \/ with /. The dates and times are treated differently, namely ([\d]+\/[\d]+\/[\d]+ ) versus ([\d:]+). Perhaps the date could be simplified to ([\d/]+ )? Also, does the space need to be captured as part of the date? When matching line breaks I normally use [\r\n]+, unless the specific pattern of CRs and LFs is important. There are lots of capture groups in the regex; are they all needed? Note that adding or removing capture groups will change the numbers of all subsequent groups.

It seems the file, as read in C#, contains multiple \r\n sequences at the end of some lines, which was causing the problem. Changing the newline expression to (\r\n)* solved the problem.
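A minimal sketch of the two points made in the answers above: matching line breaks tolerantly, and reading every repetition of a group through Captures. The class name SeatParser, the named groups and the shortened pattern (only the "Table" line and the "Seat" lines) are assumptions made for illustration; this is not the full pattern from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SeatParser
{
    // Hypothetical, shortened pattern. (?:\r?\n)+ accepts LF or CRLF line
    // breaks, including several in a row, so no RegexOptions.Multiline or ^
    // anchor is needed here.
    private static readonly Regex TablePattern = new Regex(
        @"Table '(?<table>[\d ]+)' .+? Seat #(?<button>\d+) is the button" +
        @"(?:(?:\r?\n)+Seat (?<seat>\d+): (?<player>.+?) \((?<chips>\d+) in chips\))*");

    // Returns one (seat, player, chips) tuple per "Seat ... in chips" line.
    public static List<Tuple<int, string, int>> ParseSeats(string handHistory)
    {
        var seats = new List<Tuple<int, string, int>>();
        Match match = TablePattern.Match(handHistory);
        if (!match.Success) return seats;

        // A group inside a repeated construct keeps every iteration in
        // Captures; Group.Value alone only exposes the last iteration.
        CaptureCollection seatNumbers = match.Groups["seat"].Captures;
        CaptureCollection players = match.Groups["player"].Captures;
        CaptureCollection chips = match.Groups["chips"].Captures;
        for (int i = 0; i < seatNumbers.Count; i++)
        {
            seats.Add(Tuple.Create(
                int.Parse(seatNumbers[i].Value),
                players[i].Value,
                int.Parse(chips[i].Value)));
        }
        return seats;
    }
}
```

Calling SeatParser.ParseSeats on the hand history from the question should yield six tuples, one per seat, whether the file uses \n, \r\n or doubled \r\n line endings.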

Related

C# Search for string in text file and return a different string

I have many text files formatted like this:
tag(1008)<EX-->
------- Critical Item -------
Point taken at 06:00
NAME: OUTPUT_EXH_1_PLAN14_POINT3
Y -43.842 -43.850 0.100 0.100 0.008 +
tag(1009)<EX-->
------- Critical Item -------
Point taken at 09:00
NAME: OUTPUT_EXH_1_PLAN14_POINT4
Y -43.825 -43.850 0.100 0.100 0.025 ++
tag(1010)<EX-->
------- Critical Item -------
Y = ITEM 4
NAME: OUTPUT_EXH_1_PLAN14
Y -43.838 -43.850 0.100 0.100 0.012 +
tag(1011)<EX-->
EXH_1 Zero hole Cast to machine location
NAME: OUTPUT_EXH_1_CIRC30
Z 0.041 0.000 0.150 0.150 0.041 ++
X -0.035 0.000 0.150 0.150 -0.035 -
tag(1012)<EX-->
Point taken at 06:00
NAME: OUTPUT_EXH_1_PLAN15_POINT1
Y -23.555 -23.500 0.100 0.100 -0.055 ---
The actual text files may be several hundreds of lines (but less than 1000 lines). Above is just an example of some of the lines. I am new to C# and I have been searching online for hours for how to do what I want to do and have found many different methods... some seem simple... some seem complicated... I don't know which method is "better" for my application. Regardless, everything I have found either needs to be tweaked to do what I need or only shows part of the code I need and assumes I am skilled enough to figure out the rest. Can someone please help me by posting a complete working example.
What I need...
If the above text file is "D:\myFile.txt"
I want to search for the string "tag(1010)"
Then I want to get the first number after the "Y" after the string "tag(1010)"
So the number I would get would be "myNumber = -43.838"
As far as I have gotten was:
var myString = File.ReadAllLines("D:\myFile.txt")
.SkipWhile(myString => !myString.Contains("tag(1010)<EX-->"))
.Skip(1) // optional
.TakeWhile(myString => !myString.Contains("tag(1011)<EX-->h"));
Then I was going to try to add more code to extract the "-43.838" out of myString... But of course the above code doesn't work.

string myNumber = input.Split(new string[] {Environment.NewLine},StringSplitOptions.RemoveEmptyEntries)
.SkipWhile((str) => !str.Trim().StartsWith("tag(1010)"))
.FirstOrDefault((str) => str.Trim().StartsWith("Y"))
.Split(new string[] {" "}, StringSplitOptions.RemoveEmptyEntries)[1];
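Replace input in input.Split with the contents of your file (for example, the string returned by File.ReadAllText).
This splits the text by newline, skips all lines until it finds tag(1010), finds and returns the next line starting with Y, splits that line on spaces and returns the 2nd item [1], as the first item is the Y itself. Be aware that in the sample above the line "Y = ITEM 4" also starts with Y, so for tag(1010) the second item would be "=" unless that line is filtered out.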
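A hedged variation on the same idea, written as a small helper. The names TagLookup and FirstYValueAfterTag are hypothetical. The sketch assumes that each block starts with a line beginning "tag(" and that the wanted value is the second whitespace-separated field of a line whose first field is exactly "Y". Because the sample also contains the line "Y = ITEM 4", it only accepts a line whose second field parses as a number.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class TagLookup
{
    // Returns the first number on a "Y" line inside the block that follows
    // the given tag, or null when the tag or such a line is not found.
    public static double? FirstYValueAfterTag(IEnumerable<string> lines, string tag)
    {
        IEnumerable<string> block = lines
            .SkipWhile(line => !line.TrimStart().StartsWith(tag, StringComparison.Ordinal))
            .Skip(1) // step past the tag line itself
            .TakeWhile(line => !line.TrimStart().StartsWith("tag(", StringComparison.Ordinal));

        foreach (string line in block)
        {
            string[] fields = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            double value;
            // Skips lines such as "Y = ITEM 4", whose second field is not numeric.
            if (fields.Length > 1 && fields[0] == "Y" &&
                double.TryParse(fields[1], NumberStyles.Float, CultureInfo.InvariantCulture, out value))
            {
                return value;
            }
        }
        return null;
    }
}
```

With the file from the question this could be called as TagLookup.FirstYValueAfterTag(File.ReadLines(@"D:\myFile.txt"), "tag(1010)"), which needs using System.IO. File.ReadLines streams the file, so the search stops reading once the value is found.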

C# - break out large string into multiple smaller strings for export to a database

C# newb here - I have a script written in C# which takes the contents of several fields of the internal database of an application (Contoso Application, in this case) and exports them to a SQL Server Database table.
Here is the code:
using System;
using System.IO;
using System.Data.SqlClient;
using Contoso.Application.Api;
using Contoso.Application.Commands;
using System.Linq;
public class Script
{
public static bool ExportData(DataExportArguments args)
{
try
{
var sqlStringTest = new SqlConnectionStringBuilder();
sqlStringTest.DataSource = "SQLserverName";
sqlStringTest.InitialCatalog = "TableName";
sqlStringTest.IntegratedSecurity = true;
sqlStringTest.UserID = "userid";
sqlStringTest.Password = "password";
using (var sqlConnection = new SqlConnection(sqlStringTest.ConnectionString))
{
sqlConnection.Open();
using (IExportReader dataReader = args.Data.GetTable())
{
while (dataReader.Read())
{
using (var sqlCommand = new SqlCommand())
{
sqlCommand.Connection = sqlConnection;
sqlCommand.CommandText =
@"INSERT INTO [dbo].[Table] (
Id,
Url,
articleText)
VALUES (
@Id,
@Url,
@articleText)";
sqlCommand.Parameters.AddWithValue("@Id", dataReader.GetStringValue("Id"));
sqlCommand.Parameters.AddWithValue("@Url", dataReader.GetStringValue("Url"));
sqlCommand.Parameters.AddWithValue("@articleText",
dataReader.Columns.Any(x => x.Name == "articleText")
? dataReader.GetStringValue("articleText")
: (object)DBNull.Value);
}
}
}
}
}
catch (Exception exp)
{
args.WriteDebug(exp.ToString(), DebugMessageType.Error);
return false;
}
return true;
}
}
FYI - articleText is of type nvarchar(max)
What I'm trying to accomplish: sometimes the data in the articleText field is short, sometimes it is very long. What I need to do is break out a record into multiple records when the string in a given articleText field is greater than 10,000 characters. So if a given articleText field is 25,000 characters, there would be 3 records exported: first one would have an articleText field of 10,000 characters, 2nd, 10,000 characters, 3rd, 5,000 characters.
Further to this requirement, I need to ensure that if the character cutoff for each record falls in the middle of a word (which will likely happen most of the time) that I account for that.
Therefore, as an example, if we have a record in the application's internal database with Id of 1, Url of www.contoso.com, and articleText of 28,000 characters, I would want to export 3 records to SQL Server as such:
Record 1:
Id: 1
Url: www.contoso.com
articleText: if articleText greater than 10,000 characters, export characters 1-10,000, else export entirety of articleText.
Record 2:
Id: 1
Url: www.contoso.com
articleText: assuming Record 2 only exists if Record 1 was greater than 10k characters, export characters 9,990-20,000 (start at character 9,990 in case Record 1 cuts off in the middle of a word).
Record 3:
Id: 1
Url: www.contoso.com
articleText: export characters 19,900-28,000 (or alternatively, 19,900 through end of string).
For any given export session, there are thousands of records in the internal database to be exported (hence the while loop). Approximately 20% of the records will meet the criteria of articleText exceeding 10k characters, so for any that don't, we absolutely only want to export one record. Further, although my example above only goes to 28k characters, this script needs to be able to accommodate any size.
I'm a bit stumped at how one would go about accomplishing something like this. I believe the first step is to get a character count for articleText to determine how many records need to be exported. From there, I feel I've gone down a rabbit hole. Any suggestions on how to go about this would be greatly appreciated.
EDIT #1: to clarify on the cutoff requirement - the reason the above is the approach I'm suggesting to handle the cutoff is because the article may have a person's name in it. Simply finding a space and cutting it off there wouldn't work because it's possible you would split between a first and last name. The approach I mention above would meet our requirements because the word or name only needs to exist in its entirety in one of the records.
Further, reassembly of the separated records in SQL Server is not a requirement and therefore not necessary.

This might be a start: it's not very efficient, admittedly, but just to illustrate how it might be done:
void Main()
{
string text = "012345 6789012 3456789012 34567890 1234567" +
"0123 456789 01234567 8901234567 8901234567" +
"012345 67890123456 78901234567890123456" +
"0123456 7890123456 789012345 6789012345" +
"012345 678901234 5678901234 5678901234" +
"01234567 89012345678 901234567890123" +
"ABCDEFGHI JLMNOPQES TUVWXYZ";
int startingPoint = 0;
int chunkSize = 50;
int padding = 10;
List<string> chunks = new List<string>();
do
{
if (startingPoint == 0)
{
chunks.Add(new string(text.Take(chunkSize).ToArray()));
}
else
{
chunks.Add(new string(text.Skip(startingPoint).Take(chunkSize).ToArray()));
}
startingPoint = startingPoint + chunkSize - padding;
}
while (startingPoint < text.Length);
Console.WriteLine("Original length: {0}", text.Length);
Console.WriteLine("Chunk count: {0}", chunks.Count);
Console.WriteLine("Expected new length: {0}", text.Length + (chunks.Count -1) * padding);
Console.WriteLine("Actual new length: {0}", chunks.Sum(c => c.Length));
Console.WriteLine();
Console.WriteLine("Chunks:");
foreach (var chunk in chunks)
{
Console.WriteLine(chunk);
}
}
Output:
Original length: 263
Chunk count: 7
Expected new length: 323
Actual new length: 323
Chunks:
012345 6789012 3456789012 34567890 12345670123 456
670123 456789 01234567 8901234567 8901234567012345
4567012345 67890123456 789012345678901234560123456
4560123456 7890123456 789012345 6789012345012345 6
45012345 678901234 5678901234 567890123401234567 8
01234567 89012345678 901234567890123ABCDEFGHI JLMN
EFGHI JLMNOPQES TUVWXYZ
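The same overlapping split can be sketched with Substring instead of Skip/Take, which avoids re-enumerating the string for every chunk. The names TextChunker and SplitWithOverlap are hypothetical, and the loop condition is an assumption that differs slightly from the loop above: it stops once no text remains beyond the overlap, so it never emits a final chunk that only repeats the end of the previous one.

```csharp
using System;
using System.Collections.Generic;

public static class TextChunker
{
    // Splits text into pieces of at most chunkSize characters. Each piece
    // after the first starts `overlap` characters before the end of the
    // previous piece. Text no longer than chunkSize yields exactly one piece.
    public static List<string> SplitWithOverlap(string text, int chunkSize, int overlap)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException("overlap");

        var chunks = new List<string>();
        int step = chunkSize - overlap;
        int start = 0;
        do
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
            start += step;
        }
        while (start + overlap < text.Length); // continue only while new text remains
        return chunks;
    }
}
```

In the export loop from the question, each returned chunk would become one INSERT with the same Id and Url; for example TextChunker.SplitWithOverlap(articleText, 10000, 10) under the sizes the question describes.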

You are going to have to tokenize the input to be able split it sensibly. In order to do that, you have to be able to make some assumptions about the input.
For example, you could split the input on the last end-of-sentence that occurs prior to the 10K character boundary. But, you have to be able to make concrete assumptions with the input about what constitutes an end-of-sentence. If you can assume that the input is well-punctuated and grammatically correct, then a simple regex like [^.!?]+[.!?] {1,2}[A-Z] can be used to detect the end of a sentence, where the sentence ends with ".", "!", or "?", is followed by at least one but no more than two spaces, and the next character is a capital letter. Since the following capital letter is included in the match, you just drop back one character position and split.
The exact process will depend on the specific assumptions you can make about the input.
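As a rough illustration of the sentence-boundary idea, the sketch below finds where to cut a string so that the cut falls just after the last sentence end at or before a given limit. The names SentenceSplitter and FindCut are hypothetical. The pattern is a variation on the one above: it uses a lookahead for the capital letter, so there is no need to drop back one character. It relies on the same assumption of well-punctuated input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // ".", "!" or "?" followed by one or two spaces, with a capital letter next.
    // The lookahead keeps the capital letter out of the match.
    private static readonly Regex SentenceEnd = new Regex(@"[.!?] {1,2}(?=[A-Z])");

    // Returns the index at which to cut text: just after the last sentence end
    // that lies at or before limit. Falls back to limit when no sentence end is
    // found, and to text.Length when the text already fits.
    public static int FindCut(string text, int limit)
    {
        if (text.Length <= limit) return text.Length;

        int cut = limit;
        foreach (Match match in SentenceEnd.Matches(text))
        {
            int end = match.Index + match.Length;
            if (end > limit) break;
            cut = end;
        }
        return cut;
    }
}
```

text.Substring(0, cut) is then the first record and text.Substring(cut) is what remains to be split again.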

Regex access elements within string

I have the following regex code:
@"(N[0-9][EHPULMAVRYGBWK123670]{4}[N]{1}PF[0]{1}[0-9]{1})";
The [EHPULMAVRYGBWK123670] within the regex refer to specific button types or colours.
There are four buttons in total, and the order they are in the part number denotes the order that they are in the product (from top left to top right).
For example if the part number contained:
RGBY - Red (Top Left), Green (Top Right), Blue (Bottom Left), Yellow (Bottom Right)
GBYR - Green (Top Left), Blue (Top Right), Yellow (Bottom Left), Red (Bottom Right)
After the buttons, there is always the letter N, and a PF number.
What I want to do is extract the 4 letter combination for the colors. The {4} in the regex is what captures those letters. I then need to make a decision based on the order of the letters.
How would I go about doing this?

You need to modify your regex slightly so that it captures the four letters in question. You can then decide what to do with them:
var pattern = @"(N[0-9]([EHPULMAVRYGBWK123670]{4})[N]{1}PF[0]{1}[0-9]{1})";
var test = "NO4A6SRP11N2UBWYNPF05";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var result = regex.Match(test);
if(result.Success)
{
var value = result.Groups[2].Value;
switch (value)
{
case "UBWY":
//Do something
break;
case "RBYG":
//Do something
break;
default:
break;
}
}
Notice the parentheses around the part of the pattern that matches the four letters.
There are more elegant approaches to deciding what to do with the four letter code. In this case I have provided a simple switch statement for illustration purposes.
Alternatively, you can examine the string you capture letter by letter:
//Character by character, in order
for (int i = 0; i < value.Length; i++)
{
char letter = value[i];
//Decide what to do here.
}
//Or check positions by index
if(value[0] == 'U')
{
//Decide what to do here.
}
Depending on how many combinations are possible, you might want to consider using a state machine.
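One possible way to work with the order of the letters, short of a full state machine, is to pair each captured letter with its position. The sketch below is only an illustration: the names PartNumber and ButtonsByPosition are hypothetical, a named group replaces the numbered one, and the position labels follow the RGBY example in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PartNumber
{
    // Same shape as the pattern above, with the four button codes in a named group.
    private static readonly Regex Pattern = new Regex(
        @"N[0-9](?<buttons>[EHPULMAVRYGBWK123670]{4})NPF0[0-9]",
        RegexOptions.IgnoreCase);

    // Order taken from the question's example: first letter is top left, and so on.
    private static readonly string[] Positions =
        { "Top Left", "Top Right", "Bottom Left", "Bottom Right" };

    // Maps each position to its button code, or returns null when the part
    // number does not match the pattern.
    public static Dictionary<string, char> ButtonsByPosition(string partNumber)
    {
        Match match = Pattern.Match(partNumber);
        if (!match.Success) return null;

        string buttons = match.Groups["buttons"].Value.ToUpperInvariant();
        var result = new Dictionary<string, char>();
        for (int i = 0; i < Positions.Length; i++)
        {
            result[Positions[i]] = buttons[i];
        }
        return result;
    }
}
```

A caller can then ask for a single position, for example PartNumber.ButtonsByPosition(test)["Top Left"], instead of switching on every four-letter combination.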

Regex - 1st group 1 time, 2nd group Multiple times

I have data like this:
06deepaksharma
I need a regex to split the data as:
06 > then multiple groups of 06 characters each
So it is going to be: first 2 digits, then multiple groups, each with the length given by the value of the first 2 digits.
01DE > 01 D E (01, then 2 groups of 1 character each)
02DE > 02 DE (02, then 1 group of 2 characters)
02DESH > 02 DE SH (02, then 2 groups of 2 characters each)
03DEESHA > 03 DEE SHA (03, then 2 groups of 3 characters each)
01DEESHA > 01 D E E S H A (01, then 6 groups of 1 character each)
I hope it is now clear what I want.
I can't work out how to fix the length of the second group on the basis of the first group's value, or how to specify that the second group may occur N times.
UPDATE BELOW ---
If we cannot apply the length to the second group, can we get all the chunks if the length of the second group is fixed?
That is, if the length is always 2 for the character groups:
01DE > 01 DE
01DEEPAK > 01 DE EP AK
XXDEEP > XX DE EP
So if the length is always 2, can we get the desired result as stated in the UPDATE part?

You can achieve what you described in the beginning of your question with both regex and LINQ:
var input = "03DEESHA";
var result = new List<string>();
var mtch = Regex.Match(input, @"^(\d+)(.*)"); // Get the Match object with captured texts
result.Add(mtch.Groups[1].Value); // Add the number to the resulting list
var chunks = Regex.Matches(mtch.Groups[2].Value, // Get all chunks
string.Format(".{{{0}}}", int.Parse(mtch.Groups[1].Value)))
.Cast<Match>()
.Select(p => p.Value)
.ToList();
result.AddRange(chunks);
The regex ^(\d+)(.*) matches one or more digits at the beginning (Group 1), and then captures the rest of a single-line string into Group 2. The . does not match newlines; if you want to support them, add the RegexOptions.Singleline flag to Regex.Match.
Result of the above code execution: the list contains "03", "DEE" and "SHA".
If you have strings where the number of the letters cannot be divided by the initial number without a remainder, instead of ".{{{0}}}" use ".{{1,{0}}}".

I don't think you can use a regex alone here, as you would need a back-reference with a variable value.
However, you may consider a simple loop over the characters:
// first get the number of characters to read
int num = Convert.ToInt32(myString.Substring(0, 2));
// now a simple loop on the characters
for (int i = 2; i < myString.Length; i += num) result.Add(myString.Substring(i, num));
Or, if you really want a regex, parse the number first and THEN apply your regex:
var r = "([a-zA-Z]{" + num + "})";
var res = new Regex(r).Split(new string(myString.Skip(2).ToArray()));
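The loop approach can be wrapped in a small helper that also copes with a trailing group shorter than the prefix value. This is a sketch under stated assumptions: the names PrefixChunker and Split are hypothetical, the prefix is always exactly two digits, and a prefix of 00 is rejected.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class PrefixChunker
{
    // "03DEESHA" -> "03", "DEE", "SHA"
    public static List<string> Split(string input)
    {
        int size;
        if (input == null || input.Length < 2 ||
            !int.TryParse(input.Substring(0, 2), NumberStyles.None, CultureInfo.InvariantCulture, out size) ||
            size <= 0)
        {
            throw new FormatException("Expected a two-digit, non-zero length prefix.");
        }

        var result = new List<string> { input.Substring(0, 2) };
        for (int i = 2; i < input.Length; i += size)
        {
            // Math.Min keeps the last group from running past the end of the string.
            result.Add(input.Substring(i, Math.Min(size, input.Length - i)));
        }
        return result;
    }
}
```

For the fixed-length case in the update, the same loop works with size set to 2 and the prefix check removed.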

TextReader.ReadLine() Fails to Read Entire Line

I've got a Comma Delimited Text file that I am trying to read in.
I read in 1 line at a time, and process that information.
Using the code snippet and file fragment below, my error comes when I get to the line that starts with 841 - it only pulls in 147 characters.
Question: What is causing the TextReader to stop pulling in this line? Is there some special sequence in it?
Code Snippet:
int lastNum = -1;
int num = 1;
using (TextReader reader = File.OpenText(filename)) {
do {
string line = reader.ReadLine();
if (!String.IsNullOrEmpty(line)) {
string[] split = line.Split(',');
int indexer = Convert.ToInt32(split[0]);
Console.WriteLine("#{0}: ID '{1}' Line Length = {2}", num++, split[0], line.Length);
}
} while (-1 < reader.Peek());
reader.Close();
}
File Fragment (from line 0 to ProblemLine + 1):
ID,Line,[Date],WO,Module,DSO,Integer,Unit,,Contact,Category,Problem,Solution,Action,Actor,Acted
824,,1/4/2011,589259,,170966,JC,V3A,,Tom Read,WO.3,"The unit is stainless steel, but the coil connection plates that were on the work order were not stainless steel",MTF # 264698 to take off CC500 AND CC875 and added XCC500 AND XCC875,,,
825,,1/4/2011,588779,,171102,JC,V3A,,,W.4,Changing from a 310AJ motor to a 310AX,MTF # 46746 to fan assembly and motor,,,
826,,1/4/2011,588948,,170941,JC,V3B,,,W.4,Changing from a 310AJ motor to a 310AX,MTF # 241092 and 241093 to change fan assemly and EBM motor,,,
827,,1/4/2011,588206,,171143,JC,H3A,,,WO.2,Potentiometer was missing from the work order,MTF # 264851 to add 29278,,,
828,,1/4/2011,584741 584742 584748 584747 584749,,171009,BF,V2B,,"Carlos, Laura",,Johnson units. Motors would not fit correctly using the motor mounts already installed.,MTF# S264510 to remove 006-300 motor mounts from work orders. MTF# S264699 to add 006-033 motor mounts to work orders.,,,
829,,1/4/2011,586519,,170891-1-2,DB,H3B,,"Carlos, Laura",WO.2,"1"" bushing not on BOM.",MTF# 264769 added 28614,,,
830,,1/4/2011,583814,,170804-1-3,DB,V3B,,"Carlos, Laura",WO.3,Wrong pulley (26710) and wrong Belt A-41 (29725) appear on WO.,MTF# 264570 removed those and put on an A-33 (26768) and pulley 27005. Two units so Qty 2 for each item.,,,
831,,1/5/2011,584742,,171009,JC,V2B,,,,there was an extra overload relay on the work order because it had been changed and the original was never taken off.,MTF # 241926 to take off 7- 27167 overload relay,,,
832,,1/5/2011,591742,,170965,JC,H3C,,"Carlos, Laura",WO.3,Belt was too short,MTF # 241729 to take off 30737 (BX42) and put on 28589 (BX52). Center to center distance was 19 3/8 in,,,
833,,1/5/2011,584749,,171009,JC,H2A,,Joe ,E.3,Did a motor change in order for the motor to work on the unit,MTF # 264854 to add 28918 and take off 28095 motor and SP01204 pulley,,,
834,,1/5/2011,588945,,171157,JC,V3B,,Alex,D,Stainless steel unit needed a stainless steel power entering cover plate.,Spoke with Alex and he designed X302-905 and MTF # 241094 was done to add to this work order.,,,
835,,1/5/2011,589259,,170966,JC,V3A,,Alex,D,Stainless steel unit needed a stainless steel power entering cover plate.,Spoke with Alex and he designed X302-905 and MTF # 241094 was done to add to this work order.,,,
836,,1/5/2011,584749,,171009,JC,H2A,,,,Changed overload relay because changed motor,MTF # 264857 to change overload relay. Took off 27169 and added 26736,,,
837,,1/6/2011,583815,,170804,JC,V3B,,"Carlos, Laura",WO.3,bore hole on the pulley was too big ,MTF # 241096 to take off 26710 7/8 pull and put on 27005 5/8 pulley,,,
838,,1/6/2011,583816,,170804,JC,V3B,,"Carlos, Laura",WO.3,bore hole on the pulley was too big ,MTF # 241096 to take off 26710 7/8 pull and put on 27005 5/8 pulley,,,
839,,1/6/2011,587632,,171143,BF,M2,,"Carlos, Laura",WO.2,H302-850 blank off #3 not on WO.,MTF# S242648 to add (1) H302-850,,,
840,,1/6/2011,583816,,170804,BF,M2,,"Carlos, Laura",WO.3,A41 Belt too large,"MTF# S241706 to remove A41 (29725) and add A33 (26780). C-C distance 12.5",,,
841,,1/7/2011,588945,,171157,JC,V3B,,Tom Read ,D,"Assembly drawing AD-V3B-162C-EPSSTLDR had a 7/8 distributor connecting to a 5/8 opening on a tee.
",MTF # 264653 to to add bushing 27256 and 28997 tee in order to use a tee that would fit into the distributor.,,,
842,,1/7/2011,589257,,170966,JC,V3C,,Everyone ,WO.2,heat exchanger was missing from the work order ,MTF # 264858 to add the heat exchanger on work order and one was ordered.,,,
LOOK! ^^^ S.O.'s reader did it too!
Here is the exact text of the line that starts with 841:
841,,1/7/2011,588945,,171157,JC,V3B,,Tom Read ,D,"Assembly drawing AD-V3B-162C-EPSSTLDR had a 7/8 distributor connecting to a 5/8 opening on a tee.
",MTF # 264653 to to add bushing 27256 and 28997 tee in order to use a tee that would fit into the distributor.,,,
FYI: I am developing in C# against .NET Framework 4.
[Solved] I was able to figure this out using Rob Parker's comment and using a raw Stream instead of the prettier TextReader class. It turns out my Rogue character was an inserted Carriage Return (\n).
using (Stream fs = File.Open(filename, FileMode.Open, FileAccess.Read)) {
byte[] data = new byte[1024];
int len;
do {
len = fs.Read(data, 0, data.Length);
for (int n = 0; n < len; n++) {
if ((n + 3) < len) {
string strId = string.Format("{0}{1}{2}", (char)data[n + 1], (char)data[n + 2], (char)data[n + 3]);
int numeric = Convert.ToInt32(strId);
if (numeric == 841) {
char[] suspects = new char[50];
int n2 = n;
int n3 = 0;
while (n2 < len) {
if ((n + 130 < n2) && (n2 < n + 160)) {
suspects[n3++] = (char)data[n2];
}
n2++;
}
Console.WriteLine("Wait Here!");
break;
}
}
}
num++;
} while (0 < len);
}
Thanks everyone for your help!

TextReader treats any of the following characters as an end-of-line delimiter (it tries to play nice with the various end-of-line conventions out there):
CR. The old MacOS (pre-OS X) end-of-line convention: "\r".
CR+LF. The Microsoft Windows/DOS end-of-line convention: "\r\n".
LF. The *nix end-of-line convention: "\n".
My suspicion is that you've got a spurious \r (CR) floating around in there somewhere.
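One way to look for such a stray character without stepping through raw bytes is to scan the text for line-break characters that are not part of a CR+LF pair. The sketch below assumes the file is expected to use CR+LF throughout; the names LineBreakInspector and FindLoneLineBreaks are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LineBreakInspector
{
    // Returns the offsets of every CR that is not followed by LF and every
    // LF that is not preceded by CR, i.e. breaks that are not a CR+LF pair.
    public static List<int> FindLoneLineBreaks(string text)
    {
        var offsets = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            bool loneCr = text[i] == '\r' && (i + 1 >= text.Length || text[i + 1] != '\n');
            bool loneLf = text[i] == '\n' && (i == 0 || text[i - 1] != '\r');
            if (loneCr || loneLf)
            {
                offsets.Add(i);
            }
        }
        return offsets;
    }
}
```

It could be run on the result of File.ReadAllText(filename), which needs using System.IO; each reported offset points at a character that ReadLine() would treat as the end of a line.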

Since it turned out to be particularly helpful...
Have you checked what character(s) there are between the period and doublequote character at the point where it's splitting the line?
If ReadLine() doesn't include the line-break characters in what it returns you might have to do a little work to get to it/them. But if you can get the FileStream object used by the TextReader (not sure if it's exposed) you could add code to detect the problem line (starting "841,") and hit a breakpoint (or Debugger.Break()) and then use the underlying FileStream to back up the Position and read the raw bytes to see what's there.
