C# Processing Fixed Width Files - Solution Not Working

C# Processing Fixed Width Files - Solution Not Working - c#

I have implemented Cuong's solution here:
C# Processing Fixed Width Files
Here is my code:
var lines = File.ReadAllLines(#fileFull);
var widthList = lines.First().GroupBy(c => c)
.Select(g => g.Count())
.ToList();
var list = new List<KeyValuePair<int, int>>();
int startIndex = 0;
for (int i = 0; i < widthList.Count(); i++)
{
var pair = new KeyValuePair<int, int>(startIndex, widthList[i]);
list.Add(pair);
startIndex += widthList[i];
}
var csvLines = lines.Select(line => string.Join(",",
list.Select(pair => line.Substring(pair.Key, pair.Value))));
File.WriteAllLines(filePath + "\\" + fileName + ".csv", csvLines);
#fileFull = File Path & Name
The issue I have is the first line of the input file also contains digits. So it could be AAAAAABBC111111111DD2EEEEEE etc. For some reason the output from Cuong's code gives me CSV headings like 1111RRRR and 222223333.
Does anyone know why this is and how I would fix it?
Header row example:
AAAAAAAAAAAAAAAABBBBBBBBBBCCCCCCCCDEFCCCCCCCCCGGGGGGGGHHHHHHHHIJJJJJJJJKKKKLLLLMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOPPPPQQQQ1111RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR222222222333333333444444444555555555666666666777777777888888888999999999S00001111TTTTTTTTTTTTUVWXYZ!"£$$$$$$%&
Converted header row:
AAAAAAAAAAAAAAAA BBBBBBBBBB CCCCCCCCDEFCCCCCC C C C GGGGGGGG HHHHHHHH I JJJJJJJJ KKKK LLLL MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPP QQQQ 1111RRRR RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR2222 222223333 333334444 444445555 555556666 666667777 777778888 888889999 99999S000 0 1111 TTTTTTTTTTTT U V W X Y Z ! ",ï¿½,$$$$$$,%,&,"
Jodrell - I implemented your suggestion but the header output is like:
BBBBBBBBBBCCCCCC CCCCCCCCD DEFCCCC GGGGGGGG HHHHHHH IJJJJJJ KKKKLLL LLL MMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPPQQQQ1111RRRRRRRRRRRRRRRRR QQQ 111 RRR 33333333 44444444 55555555 66666666 77777777 88888888 99999999 S0000111 111 TTT UVWXYZ!"ï¿½$$ %&

As Jodrell already mentioned, your code doesn't work because it assumed that the character representing each column header is distinct. Change the code that parse the header widths would fix it.
Replace:
var widthList = lines.First().GroupBy(c => c)
.Select(g => g.Count())
.ToList();
With:
var widthList = new List<int>();
var header = lines.First().ToArray();
for (int i = 0; i < header.Length; i++)
{
if (i == 0 || header[i] != header[i-1])
widthList.Add(0);
widthList[widthList.Count-1]++;
}
Parsed header columns:
AAAAAAAAAAAAAAAA BBBBBBBBBB CCCCCCCC D E F CCCCCCCCC GGGGGGGG HHHHHHHH I JJJJJJJJ KKKK LLLL MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPP QQQQ 1111 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 222222222 333333333 444444444 555555555 666666666 777777777 888888888 999999999 S 0000 1111 TTTTTTTTTTTT U V W X Y Z ! " £ $$$$$$ % &

EDIT
Because the problem annoyed me I wrote some code that handles " and ,. This code replaces the header row with comma delimited alternating zeros and ones. Any commas or double quotes in the body are appropriately escaped.
static void FixedToCsv(string sourceFile)
{
if (sourceFile == null)
{
// Throw exception
}
var dir = Path.GetDirectory(sourceFile)
var destFile = string.Format(
"{0}{1}",
Path.GetFileNameWithoutExtension(sourceFile),
".csv");
if (dir != null)
{
destFile = Path.Combine(dir, destFile);
}
if (File.Exists(destFile))
{
// Throw Exception
}
var blocks = new List<KeyValuePair<int, int>>();
using (var output = File.OpenWrite(destFile))
{
using (var input = File.OpenText(sourceFile))
{
var outputLine = new StringBuilder();
// Make header
var header = input.ReadLine();
if (header == null)
{
return;
}
var even = false;
var lastc = header.First();
var counter = 0;
var blockCounter = 0;
foreach(var c in header)
{
counter++;
if (c == lastc)
{
blockCounter++;
}
else
{
blocks.Add(new KeyValuePair<int, int>(
counter - blockCounter - 1,
blockCounter));
blockCounter = 1;
outputLine.Append(',');
even = !even;
}
outputLine.Append(even ? '1' : '0');
lastc = c;
}
blocks.Add(new KeyValuePair<int, int>(
counter - blockCounter,
blockCounter));
outputLine.AppendLine();
var lineBytes = Encoding.UTF.GetBytes(outputLine.ToString());
outputLine.Clear();
output.Write(lineBytes, 0, lineBytes.Length);
// Process Body
var inputLine = input.ReadLine();
while (inputLine != null)
{
foreach(var block in block.Select(b =>
inputLine.Substring(b.Key, b.Value)))
{
var sanitisedBlock = block;
if (block.Contains(',') || block.Contains('"'))
{
santitisedBlock = string.Format(
"\"{0}\"",
block.Replace("\"", "\"\""));
}
outputLine.Append(sanitisedBlock);
outputLine.Append(',');
}
outputLine.Remove(outputLine.Length - 1, 1);
outputLine.AppendLine();
lineBytes = Encoding.UTF8.GetBytes(outputLne.ToString());
outputLine.Clear();
output.Write(lineBytes, 0, lineBytes.Length);
inputLine = input.ReadLine();
}
}
}
}
1 is repeated in your header row, so your two fours get counted as one eight and everything goes wrong from there.
(There is a block of four 1s after the Qs and another block of four 1s after the 0s)
Essentialy, your header row is invalid or, at least, doesen't work with the proposed solution.
Okay, you could do somthing like this.
public void FixedToCsv(string fullFile)
{
var lines = File.ReadAllLines(fullFile);
var firstLine = lines.First();
var widths = new List<KeyValuePair<int, int>>();
var innerCounter = 0;
var outerCounter = 0
var firstLineChars = firstLine.ToCharArray();
var lastChar = firstLineChars[0];
foreach(var c in firstLineChars)
{
if (c == lastChar)
{
innerCounter++;
}
else
{
widths.Add(new KeyValuePair<int, int>(
outerCounter
innerCounter);
innerCounter = 0;
lastChar = c;
}
outerCounter++;
}
var csvLines = lines.Select(line => string.Join(",",
widths.Select(pair => line.Substring(pair.Key, pair.Value))));
// Get filePath and fileName from fullFile here.
File.WriteAllLines(filePath + "\\" + fileName + ".csv", csvLines);
}

Related

Getting the sum per hour from 2 datatable

I'm writing a txt file from 2 data table.
Following is the 2 data table.
dt1
Transaction No. Time Amount Date
1 10:00:00 200.00 03/05/2020
2 10:30:11 250.00 03/05/2020
3 11:05:22 140.00 03/05/2020
4 11:45:33 230.00 03/05/2020
5 12:15:10 220.00 03/05/2020
dt2
Transaction No. Added Amount Date
1 40.00 03/05/2020
2 25.00 03/05/2020
3 40.00 03/05/2020
4 30.00 03/05/2020
5 30.00 03/05/2020
following is my code
using (StreamWriter sw = File.AppendText(fileName))
{
for (int a = 6; a <= 23; a++)
{
string aa = a.ToString().PadLeft(2, '0');
double salex = double.Parse(dt1.Rows[0]["Amount"].ToString());
if (salex.Equals(""))
{
salex = 0;
}
else
{
salex = double.Parse(dt1.Rows[0]["Amount"].ToString());
}
double vatx = double.Parse(dt2.Rows[0]["Added Amount"].ToString());
if (vatx.Equals(""))
{
vatx = 0;
}
else
{
vatx = double.Parse(dt2.Rows[0]["Added Amount"].ToString());
}
double dailysaleHRLY = -salex + -vatx;
sw.Write(dtpDate.Value.ToString("MM/dd/yyyy") + ",");
sw.Write(aa + ":00" + ",");
sw.Write(dailysaleHRLY.ToString("0.00") + ",");
}
for (int a = 0; a <= 5; a++)
{
string aa = a.ToString().PadLeft(2, '0');
double salex = double.Parse(dt1.Rows[0]["Amount"].ToString());
if (salex.Equals(""))
{
salex = 0;
}
else
{
salex = double.Parse(dt1.Rows[0]["Amount"].ToString());
}
double vatx = double.Parse(dt2.Rows[0]["Added Amount"].ToString());
if (vatx.Equals(""))
{
vatx = 0;
}
else
{
vatx = double.Parse(dt2.Rows[0]["Added Amount"].ToString());
}
double dailysaleHRLY = -salex + -vatx;
sw.Write(dtpDate.Value.ToString("MM/dd/yyyy") + ",");
sw.Write(aa + ":00" + ",");
sw.Write(dailysaleHRLY.ToString("0.00") + ",");
}
MessageBox.Show("Txt File succesfully created!", "SYSTEM", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
This is the output of my code.
Date, Time, Sum
03/05/2020,06:00,515.00
03/05/2020,07:00,515.00
03/05/2020,08:00,515.00
03/05/2020,09:00,515.00
03/05/2020,10:00,515.00
03/05/2020,11:00,515.00
03/05/2020,12:00,515.00
03/05/2020,13:00,515.00
03/05/2020,14:00,515.00
03/05/2020,15:00,515.00
03/05/2020,16:00,515.00
03/05/2020,17:00,515.00
03/05/2020,18:00,515.00
03/05/2020,19:00,515.00
03/05/2020,20:00,515.00
03/05/2020,21:00,515.00
03/05/2020,22:00,515.00
03/05/2020,23:00,515.00
03/05/2020,00:00,515.00
03/05/2020,01:00,515.00
03/05/2020,02:00,515.00
03/05/2020,03:00,515.00
03/05/2020,04:00,515.00
03/05/2020,05:00,515.00
I just want to get the sum of Amount and Added Amount base on hour. Like this.
Date, Time, Sum
03/05/2020,06:00,0.00
03/05/2020,07:00,0.00
03/05/2020,08:00,0.00
03/05/2020,09:00,0.00
03/05/2020,10:00,515.00
03/05/2020,11:00,440.00
03/05/2020,12:00,250.00
03/05/2020,13:00,0.00
03/05/2020,14:00,0.00
03/05/2020,15:00,0.00
03/05/2020,16:00,0.00
03/05/2020,17:00,0.00
03/05/2020,18:00,0.00
03/05/2020,19:00,0.00
03/05/2020,20:00,0.00
03/05/2020,21:00,0.00
03/05/2020,22:00,0.00
03/05/2020,23:00,0.00
03/05/2020,00:00,0.00
03/05/2020,01:00,0.00
03/05/2020,02:00,0.00
03/05/2020,03:00,0.00
03/05/2020,04:00,0.00
03/05/2020,05:00,0.00

Assuming that you have two DataTable-s and you have them filled with the mentioned data.
var dt1 = new DataTable();
var dt2 = new DataTable();
dt1.Columns.AddRange(new[]
{
new DataColumn("Transaction No.", typeof(int)),
new DataColumn("Time", typeof(DateTime)),
new DataColumn("Amount", typeof(decimal)),
new DataColumn("Date", typeof(DateTime)),
});
dt2.Columns.AddRange(new[]
{
new DataColumn("Transaction No.", typeof(int)),
new DataColumn("Added Amount", typeof(decimal)),
new DataColumn("Date", typeof(DateTime)),
});
Note: The double types have been replaced with decimal types since its the right type to be used when dealing with money.
As I understand the problem, you want to group the rows of dt1 by hour part of the Time field, sum the Amount, and add to the sum the Added Amount from dt2 rows where their Transaction No. equals to any Transaction No. of the grouped rows of dt1.
This will do:
var group = dt1.AsEnumerable().GroupBy(x => x.Field<DateTime>(1).Hour);
var sb = new StringBuilder();
sb.Append("Date,");
sb.Append("Time,".PadLeft(12, ' '));
sb.AppendLine("Sum".PadLeft(5, ' '));
//if PadLeft is not required in the output, then just:
//sb.AppendLine($"Date, Time, Sum");
foreach (var g in group)
{
var sum = 0M;
foreach (var r in g)
sum += r.Field<decimal>(2) + dt2.AsEnumerable()
.Where(x => x.Field<int>(0) == r.Field<int>(0))
.Sum(x => x.Field<decimal>(1));
sb.AppendLine($"{g.First().Field<DateTime>(3).ToString("MM/dd/yyyy")}, {g.Key.ToString("00")}:00, {sum.ToString("0.00")}");
}
Note: You can use the fields names instead of their indexes.
The output is:
Date, Time, Sum
03/05/2020, 10:00, 515.00
03/05/2020, 11:00, 440.00
03/05/2020, 12:00, 250.00
I don't know whether the DataTable-s already contain the required data to generate the output mentioned in the last quote block or you want to append the rest before writing to the text file. In case of the second scenario, you can do something like:
var group = dt1.AsEnumerable().GroupBy(x => x.Field<DateTime>(1).Hour);
var sb = new StringBuilder();
sb.AppendLine($"Date, Time, Sum");
for (var i = 0; i < 24; i++)
{
var g = group.FirstOrDefault(x => x.Key == i);
if (g != null)
{
var sum = 0M;
foreach (var r in g)
sum += r.Field<decimal>(2) + dt2.AsEnumerable()
.Where(x => x.Field<int>(0) == r.Field<int>(0))
.Sum(x => x.Field<decimal>(1));
sb.AppendLine($"{g.First().Field<DateTime>(3).ToString("MM/dd/yyyy")}, {g.Key.ToString("00")}:00, {sum.ToString("0.00")}");
}
else
sb.AppendLine($"{group.First().First().Field<DateTime>(3).ToString("MM/dd/yyyy")}, {i.ToString("00")}:00, 0.00");
}
If you need to preserve the same order of the hours:
for (var ii = 6; ii < 30; ii++)
{
var i = ii > 23 ? ii % 24 : ii;
var g = group.FirstOrDefault(x => x.Key == i);
if (g != null)
{
//The same...
}
Finally, to create or overwrite the text file (fileName):
File.WriteAllText(fileName, sb.ToString());
Or to append the output:
File.AppendAllText(fileName, sb.ToString());

C# Character encoding issues

I'm trying to make a transport agent that parses the body of an email to find the pertinent pieces of information and replaces the generic subject line with specific details. My problem is that a subject line that should read like:
ABC Co Error: No status reason code (0123456)
instead shows up as
A B C C o E r r o r : N o s t a t u s r e a s o n c o d e ( 0 1 2 3 4 5 6 )
The email is plain text and encoded in us-ascii according to the email header. My problem is that It is my understanding from this question and this question that C# uses UTF-16 as the default string encoding. The spaces between each character lead me to believe that my code is somehow implicitly converting the ASCII to UTF-16, but I don't know where this would be happening. Any ideas on how to make this work properly?
void OnSubmittedMessageHandler(SubmittedMessageEventSource source, QueuedMessageEventArgs args)
{
this.mailItem = args.MailItem;
for (int intCounter = this.mailItem.Recipients.Count - 1; intCounter >= 0; intCounter--)
{
// Check if the email was sent to automated#mydomain.com
string msgRecipientP1 = this.mailItem.Recipients[intCounter].Address.LocalPart;
if (msgRecipientP1.ToLower() == "automated")
{
// Read the body of the email
string line = "";
Dictionary<string, string> EDIErrors = new Dictionary<string, string>();
Body body = this.mailItem.Message.Body;
Stream originalBodyContent = body.GetContentReadStream();
StreamReader streamReader = new StreamReader(originalBodyContent, System.Text.Encoding.ASCII, true);
while ((line = streamReader.ReadLine()) != null)
{
if (line.IndexOf("Partner:") > 0)
{
line.Replace(": ", ":");
string[] lineParts = line.Split(new[] { " " }, StringSplitOptions.None);
foreach (string EDIErrorPart in lineParts)
{
int idx = EDIErrorPart.IndexOf(':');
int qidx = EDIErrorPart.IndexOf('"');
if (idx > 0)
{
EDIErrors[EDIErrorPart.Substring(0, idx).ToLower()] = EDIErrorPart.Substring(idx + 1).ToLower();
}
else if (qidx > 0)
{
EDIErrors["Message"] = EDIErrorPart.Replace("\"", string.Empty);
}
}
}
}
if (originalBodyContent != null)
{
originalBodyContent.Close();
}
// Build the new Subject line and the recipient groups
string sOrder;
string sMessage;
string sDistroGroup;
EDIErrors.TryGetValue("Order", out sOrder);
EDIErrors.TryGetValue("Message", out sMessage);
EDIErrors.TryGetValue("Partner", out sDistroGroup);
string NewSubject = sPartner + " Error: " + sMessage + "(" + sOrder + ")";
this.mailItem.Message.Subject = NewSubject;
if (IsTicketable)
{
this.mailItem.Recipients.Add(new RoutingAddress("helpdesk#mydomain.com"));
}
}
}
return;
}

PDF Multi-level break and print

I am trying print documents by simplex/duplex and then by envelope type (pressure seal or regular)
I have Boolean fields for Simplex and for PressureSeal in my Record class.
All pressure seal are simplex, then there are regular simplex and duplex documents.
I can currently print the pressure seal documents separate from the regular simplex. I need to be able to create the regular duplex documents.
I have some lines commented out that caused all documents to be duplicated.
So, I am looking for something that works like so:
if (Simplex)
if (pressureseal)
create output file
else
create regular simplex output file
else
create duplex output file
Here is my existing code
#region Mark Records By Splits
//splits - 3,7,13
var splits = Properties.Settings.Default.Splits.Split(',');
Dictionary<int, int> splitRanges = new Dictionary<int, int>();
int lastSplit = 0;
foreach (var split in splits)
{
// Attempt to convert split into a integer and skip it if we can't.
int splitNum;
if (!int.TryParse(split, out splitNum))
continue;
splitRanges.Add(lastSplit, splitNum);
lastSplit = Math.Max(lastSplit, splitNum + 1);
}
// Assign record splits.
foreach (var range in splitRanges)
{
var recordsInRange = NoticeParser.records
.Where(x => x.Sheets >= range.Key && x.Sheets <= range.Value)
.ToList();
recordsInRange.ForEach(x => x.Split = string.Format("{0}-{1}", range.Key, range.Value));
}
var unassignedRecords = NoticeParser.records.Where(x => x.Sheets >= lastSplit).ToList();
unassignedRecords.ForEach(x => x.Split = string.Format("{0}up", lastSplit));
#endregion
#region Sort out Pressure Seal records
var recordsGroupedByPressureSeal = NoticeParser.records
.GroupBy(x=>x.PressureSeal);
//var recordsGroupedBySimplex = NoticeParser.records.GroupBy(x => x.Simplex);
#endregion
int fCount = 0;
int nsCount = 0;
//foreach (var simdupGroup in recordsGroupedBySimplex)
//{
// var recordsGroupedBySimDup = simdupGroup.GroupBy(x => x.Split).OrderBy(x => x.Key).ToDictionary(x => x.Key, x => x.ToList());
foreach (var pressureGroup in recordsGroupedByPressureSeal)
{
var recordsGroupedBySplit = pressureGroup.GroupBy(x => x.Split).OrderBy(x => x.Key).ToDictionary(x => x.Key, x => x.ToList());
foreach (var recordsInSplit in recordsGroupedBySplit.Values)
{
string processingExecutable = Path.Combine(Properties.Settings.Default.RootFolder, Properties.Settings.Default.ProcessingExecutable);
string toProcessingFile = string.Format(Properties.Settings.Default.OutputFolder + "{0}_" + "toBCC.txt", fCount);
string fromProcessingFile = string.Format(Properties.Settings.Default.OutputFolder + "IBC_LN_Sort_FromBCC.txt");
// If a sortation executable is specified, run it.
if (recordsInSplit.Count >= Properties.Settings.Default.MinimumSortationCount &&
File.Exists(processingExecutable))
{
// log.Info("Sorting records...");
var processedRecords = recordsInSplit.ProcessAddresses<Record, RecordMap>(
processingExecutable,
toProcessingFile,
fromProcessingFile);
// Update records with the sortation fields.
recordsInSplit.UpdateAddresses(processedRecords);
}
else
{
toProcessingFile = string.Format(Properties.Settings.Default.OutputFolder + "{0}_no_sort_toBCC.txt", nsCount);
fromProcessingFile = string.Format(Properties.Settings.Default.OutputFolder + "IBC_LN_NoSort_FromBCC.txt");
//var processedRecords = recordsInSplit.ProcessAddresses<Record, RecordMap>(
// processingExecutable,
// toProcessingFile,
// fromProcessingFile);
// Update records with the sortation fields.
// recordsInSplit.UpdateAddresses(processedRecords);
// If not sorted, provide our own sequence number.
int sequence = 1;
recordsInSplit.ForEach(x => x.SequenceNumber = sequence++);
recordsInSplit.ForEach(x => x.TrayNumber = 1);
nsCount++;
}
fCount++;
}
}
//}
NoticeWriter noticeWriter = new NoticeWriter(noticeParser.reader);
#region Print by PressureSeal or Regular
//foreach (var simdupGroup in recordsGroupedBySimplex)
//{
// string printType = null;
// if (simdupGroup.Key)
// printType = "Simplex";
// else
// printType = "Duplex";
foreach (var splitGroup in recordsGroupedByPressureSeal)
{
string envType = ""; // envelope type
if (splitGroup.Key)
envType = "PressureSeal";
else
envType = "Regular";
var recordsGroupedBySplit = splitGroup.GroupBy(x => x.Split).OrderBy(x => x.Key).ToDictionary(x => x.Key, x => x.ToList());
foreach (var recordsInSplit in recordsGroupedBySplit)
{
string outputName = string.Format("IBC_Daily_Notices_{0}_{1}",envType, /*printType,*/ recordsInSplit.Key);
noticeWriter.WriteOutputFiles(Properties.Settings.Default.OutputFolder, outputName, recordsInSplit.Value, Properties.Settings.Default.RecordsPerBatch);
}
}
//}
#endregion

String Split to String Array without seperator

Code:
string animals = "cat98dog75";
What i try to achieve :
string a = "cat98";
string b = "dog75";
Question :
How do i split the string using some range delimiter?
example :
animals.split();

I suggest matching with a help of regular expressions:
using System.Text.RegularExpressions;
...
string animals = "cat98dog75";
string[] items = Regex
.Matches(animals, "[a-zA-Z]+[0-9]*")
.OfType<Match>()
.Select(match => match.Value)
.ToArray();
string a = items[0];
string b = items[1];
Concole.Write(string.Join(", ", items));
Outcome:
cat98, dog75
In case you want to split the initial string by equal size chunks:
int size = 5;
string[] items = Enumerable
.Range(0, animals.Length / size + (animals.Length % size > 0 ? 1 : 0))
.Select(index => (index + 1) * size <= animals.Length
? animals.Substring(index * size, size)
: animals.Substring(index * size))
.ToArray();
string a = items[0];
string b = items[1];

This might do the trick for you
string animals = "cat98dog75";
string[] DiffAnimals = Regex.Split(animals, "(?<=[0-9]{2})")
.Where(s => s != String.Empty) //Just to Remove Empty Entries.
.ToArray();

If you want to split the name of the animal and number, try following..
I know its too long....
private static void SplitChars()
{
string animals = "cat98dog75";
Dictionary<string, string> dMyobject = new Dictionary<string, string>();
string sType = "",sCount = "";
bool bLastOneWasNum = false;
foreach (var item in animals.ToCharArray())
{
if (char.IsLetter(item))
{
if (bLastOneWasNum)
{
dMyobject.Add(sType, sCount);
sType = ""; sCount = "";
bLastOneWasNum = false;
}
sType = sType + item;
}
else if (char.IsNumber(item))
{
bLastOneWasNum = true;
sCount = sCount + item;
}
}
dMyobject.Add(sType, sCount);
foreach (var item in dMyobject)
{
Console.WriteLine(item.Key + "- " + item.Value);
}
}
You will get output as
cat - 98
dog - 75
Basically, you are getting type and numbers so if you want to use the count, you don't need to split again...

How to make the custom parser for text file

Actually I set four columns using data table and I want this column retrieve value from text file. I used regex for remove the particular line from the text file.
My objective is that I want to show text file on the grid using data table so first I am trying to create data table and remove the line (show at the program) using regex.
Here I post my full code.
namespace class
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
StreamReader sreader = File.OpenText(#"C:\FareSearchRegex.txt");
string line;
DataTable dt = new DataTable();
DataRow dr;
dt.Columns.Add("PTC");
dt.Columns.Add("CUR");
dt.Columns.Add("TAX");
dt.Columns.Add("FARE BASIS");
while ((line = sreader.ReadLine()) != null)
{
var pattern = "---------- RECOMMENDATION 1 OF 3 IN GROUP 1 (USD 168.90)----------";
var result = Regex.Replace(line,pattern," ");
dt.Rows.Add(line);
}
}
}
class Class1
{
string PTC;
string CUR;
float TAX;
public string gsPTC
{
get{ return PTC; }
set{ PTC = value; }
}
public string gsCUR
{
get{ return CUR; }
set{ CUR = value; }
}
public float gsTAX
{
get{ return TAX; }
set{ TAX = value; }
}
}
}

If your format is strict(e.g. always 4 columns) and you want to remove only this complete line i don't see any reason to use regex:
var rows = File.ReadLines(#"C:\FareSearchRegex.txt")
.Where(l => l != "---------- RECOMMENDATION 1 OF 3 IN GROUP 1 (USD 168.90)----------")
.Select(l => new { line = l, items = l.Split(','), row = dt.Rows.Add() });
foreach (var x in rows)
x.row.ItemArray = x.items;
(assumed that the fields are separated by comma)
Edit: This works with your pastebin:
string header = " PTC CUR TAX FARE BASIS";
bool takeNextLine = false;
foreach (string line in File.ReadLines(#"C:\FareSearchRegex.txt"))
{
if (line.StartsWith(header))
takeNextLine = true;
else if (takeNextLine)
{
var tokens = line.Split(new[] { #" " }, StringSplitOptions.RemoveEmptyEntries);
dt.Rows.Add().ItemArray = tokens.Where((t, i) => i != 2).ToArray();
takeNextLine = false;
}
}
(since you have an empty column which you want to exclude from the result i've used the clumsy and possibly error-prone(?) query Where((t, i) => i != 2))

To parse the file you'll need to:
Split the text of the file into data chunks. A chunk, in your case can be identified by the header PTC CUR TAX FARE BASIS and by the TOTAL line. To split the text you'll need to tokenize the input as follows> (i) define a regular expression to match the headers, (ii) define a regular expression to match the Total lines (footers); Using (i) and (ii) you can join them by the order of appearance index and determine the total size of each chunk (see the line with (x,y)=>new{StartIndex = x.Match.Index, EndIndex = y.Match.Index + y.Match.Length}) below). Use String.Substring method to separate the chunks.
Extract the data from each individual chunk. Knowing that data is split by lines you just have to iterate through all lines in a chunk (ignoring header and footer) and process each line.
This code should help:
string file = #"C:\FareSearchRegex.txt";
string text = File.ReadAllText(file);
var headerRegex = new Regex(#"^(\)>)?\s+PTC\s+CUR\s+TAX\s+FARE BASIS$", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var totalRegex = new Regex(#"^\s+TOTAL[\w\s.]+?$",RegexOptions.IgnoreCase | RegexOptions.Multiline);
var lineRegex = new Regex(#"^(?<Num>\d+)?\s+(?<PTC>[A-Z]+)\s+\d+\s(?<Cur>[A-Z]{3})\s+[\d.]+\s+(?<Tax>[\d.]+)",RegexOptions.IgnoreCase | RegexOptions.Multiline);
var dataIndices =
headerRegex.Matches(text).Cast<Match>()
.Select((m, index) => new{ Index = index, Match = m })
.Join(totalRegex.Matches(text).Cast<Match>().Select((m, index) => new{ Index = index, Match = m }),
x => x.Index,
x => x.Index,
(x, y) => new{ StartIndex = x.Match.Index, EndIndex = y.Match.Index + y.Match.Length });
var items = dataIndices
.Aggregate(new List<string>(), (list, x) =>
{
var item = text.Substring(x.StartIndex, x.EndIndex - x.StartIndex);
list.Add(item);
return list;
});
var result = items.SelectMany(x =>
{
var lines = x.Split(new string[]{Environment.NewLine, "\r", "\n"}, StringSplitOptions.RemoveEmptyEntries);
return lines.Skip(1) //Skip header
.Take(lines.Length - 2) // Ignore footer
.Select(line =>
{
var match = lineRegex.Match(line);
return new
{
Ptc = match.Groups["PTC"].Value,
Cur = match.Groups["Cur"].Value,
Tax = Convert.ToDouble(match.Groups["Tax"].Value)
};
});
});

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# Processing Fixed Width Files - Solution Not Working - c#

Related

Getting the sum per hour from 2 datatable

C# Character encoding issues

PDF Multi-level break and print

String Split to String Array without seperator

How to make the custom parser for text file

Categories

Resources