C# Character encoding issues

I'm trying to make a transport agent that parses the body of an email to find the pertinent pieces of information and replaces the generic subject line with specific details. My problem is that a subject line that should read like:
ABC Co Error: No status reason code (0123456)
instead shows up as
A B C C o E r r o r : N o s t a t u s r e a s o n c o d e ( 0 1 2 3 4 5 6 )
The email is plain text and encoded in us-ascii according to the email header. It is my understanding from this question and this question that C# uses UTF-16 as the default string encoding. The spaces between the characters lead me to believe that my code is somehow implicitly converting the ASCII to UTF-16, but I don't know where this would be happening. Any ideas on how to make this work properly?
void OnSubmittedMessageHandler(SubmittedMessageEventSource source, QueuedMessageEventArgs args)
{
this.mailItem = args.MailItem;
for (int intCounter = this.mailItem.Recipients.Count - 1; intCounter >= 0; intCounter--)
{
// Check if the email was sent to automated@mydomain.com
string msgRecipientP1 = this.mailItem.Recipients[intCounter].Address.LocalPart;
if (msgRecipientP1.ToLower() == "automated")
{
// Read the body of the email
string line = "";
Dictionary<string, string> EDIErrors = new Dictionary<string, string>();
Body body = this.mailItem.Message.Body;
Stream originalBodyContent = body.GetContentReadStream();
StreamReader streamReader = new StreamReader(originalBodyContent, System.Text.Encoding.ASCII, true);
while ((line = streamReader.ReadLine()) != null)
{
if (line.IndexOf("Partner:") > 0)
{
line.Replace(": ", ":");
string[] lineParts = line.Split(new[] { " " }, StringSplitOptions.None);
foreach (string EDIErrorPart in lineParts)
{
int idx = EDIErrorPart.IndexOf(':');
int qidx = EDIErrorPart.IndexOf('"');
if (idx > 0)
{
EDIErrors[EDIErrorPart.Substring(0, idx).ToLower()] = EDIErrorPart.Substring(idx + 1).ToLower();
}
else if (qidx > 0)
{
EDIErrors["Message"] = EDIErrorPart.Replace("\"", string.Empty);
}
}
}
}
if (originalBodyContent != null)
{
originalBodyContent.Close();
}
// Build the new Subject line and the recipient groups
string sOrder;
string sMessage;
string sDistroGroup;
EDIErrors.TryGetValue("Order", out sOrder);
EDIErrors.TryGetValue("Message", out sMessage);
EDIErrors.TryGetValue("Partner", out sDistroGroup);
string NewSubject = sPartner + " Error: " + sMessage + "(" + sOrder + ")";
this.mailItem.Message.Subject = NewSubject;
if (IsTicketable)
{
this.mailItem.Recipients.Add(new RoutingAddress("helpdesk@mydomain.com"));
}
}
}
return;
}
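A possible explanation, offered as an assumption rather than something the question confirms: the symptom looks less like ASCII being widened to UTF-16 and more like UTF-16 bytes being decoded as ASCII. Each UTF-16 code unit for an ASCII character carries a zero byte, which an ASCII decoder turns into a NUL character that many viewers render as a space. The sketch below reproduces that effect with plain System.Text calls only; EncodingMismatchDemo and Decode are hypothetical names and no Exchange API is involved.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Decodes raw bytes with the given encoding, which is essentially
    // what a StreamReader does with the encoding it is constructed with.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string subject = "ABC Co Error";

        // Assumption for this demo: the body bytes are UTF-16 (little endian, no BOM).
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(subject);

        // Decoding with the wrong encoding leaves a NUL after every character.
        string wrong = Decode(utf16Bytes, Encoding.ASCII);

        // Decoding with the matching encoding restores the original text.
        string right = Decode(utf16Bytes, Encoding.Unicode);

        Console.WriteLine(wrong.Length);              // twice the original length
        Console.WriteLine(wrong.Replace('\0', ' '));  // NULs shown as spaces
        Console.WriteLine(right);
    }
}
```

If this is what is happening, constructing the StreamReader with the encoding the body is actually stored in, rather than Encoding.ASCII, would be the first thing to try.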

Related

get line number of string in file C#

This code filters the files with a regular expression and shapes the results. But how can I get the line number of each match in the file?
I have no idea how to do it.
Please help.
class QueryWithRegEx
IEnumerable<System.IO.FileInfo> fileList = GetFiles(startFolder);
System.Text.RegularExpressions.Regex searchTerm =
new System.Text.RegularExpressions.Regex(@"Visual (Basic|C#|C\+\+|Studio)");
var queryMatchingFiles =
from file in fileList
where file.Extension == ".htm"
let fileText = System.IO.File.ReadAllText(file.FullName)
let matches = searchTerm.Matches(fileText)
where matches.Count > 0
select new
{
name = file.FullName,
matchedValues = from System.Text.RegularExpressions.Match match in matches
select match.Value
};
Console.WriteLine("The term \"{0}\" was found in:", searchTerm.ToString());
foreach (var v in queryMatchingFiles)
{
string s = v.name.Substring(startFolder.Length - 1);
Console.WriteLine(s);
foreach (var v2 in v.matchedValues)
{
Console.WriteLine(" " + v2);
}
}
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
static IEnumerable<System.IO.FileInfo> GetFiles(string path)
{
...}
Give this a go:
var queryMatchingFiles =
from file in Directory.GetFiles(startFolder)
let fi = new FileInfo(file)
where fi.Extension == ".htm"
from line in File.ReadLines(file).Select((text, index) => (text, index))
let match = searchTerm.Match(line.text)
where match.Success
select new
{
name = file,
line = line.text,
number = line.index + 1
};
You can see how from these examples:
Read line number 15 from a file:
string line = File.ReadLines(FileName).Skip(14).Take(1).First();
Read the text at a given line number from a string:
string GetLine(string text, int lineNo)
{
string[] lines = text.Replace("\r","").Split('\n');
return lines.Length >= lineNo ? lines[lineNo-1] : null;
}
Retrieving the line number from a Regex match can be tricky, I'd use the following approach:
private static void FindMatches(string startFolder)
{
var searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|Studio)");
foreach (var file in Directory.GetFiles(startFolder, "*.htm"))
{
using (var reader = new StreamReader(file))
{
int lineNumber = 1;
string lineText;
while ((lineText = reader.ReadLine()) != null)
{
foreach (var match in searchTerm.Matches(lineText).Cast<Match>())
{
Console.WriteLine($"Match found: <{match.Value}> in file <{file}>, line <{lineNumber}>");
}
lineNumber++;
}
}
}
}
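If the whole file has already been read into one string, as in the question's ReadAllText query, another option is to derive the line number from the match position by counting the newlines that precede Match.Index. The following is a minimal sketch of that idea; MatchLineNumbers and LineNumberAt are hypothetical names, and the sample text is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchLineNumbers
{
    // Returns the 1-based number of the line that contains the character at `index`.
    public static int LineNumberAt(string text, int index)
    {
        int line = 1;
        for (int i = 0; i < index && i < text.Length; i++)
        {
            if (text[i] == '\n') line++;
        }
        return line;
    }

    public static void Main()
    {
        // Made-up sample text standing in for File.ReadAllText(file).
        string text = "intro\nVisual Basic here\nnothing\nVisual Studio and Visual C#";
        var searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|Studio)");

        foreach (Match match in searchTerm.Matches(text))
        {
            Console.WriteLine($"{match.Value}: line {LineNumberAt(text, match.Index)}");
        }
    }
}
```

Rescanning from the start for every match is quadratic in the worst case, so for large files with many matches the line-by-line reader above is likely the better choice.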

How to parse nested parenthesis only in first level in C#

I would like to write C# code that parses nested parenthesis to array elements, but only on first level. An example is needed for sure:
I want this string:
"(example (to (parsing nested paren) but) (first lvl only))"
to be parsed into:
["example", "(to (parsing nested paren) but)", "(first lvl only)"]
I was thinking about using regex but can't figure out how to properly use them without implementing this behaviour from scratch.
In the case of malformed inputs I would like to return an empty array, or an array ["error"]
I developed a parser for your example. I also checked some other examples which you can see in the code.
using System;
using System.Collections;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
string str = "(example (to (parsing nested paren) but) (first lvl only))"; // => [example , (to (parsing nested paren) but) , (first lvl only)]
//string str = "(first)(second)(third)"; // => [first , second , third]
//string str = "(first(second)third)"; // => [first , (second) , third]
//string str = "(first(second)(third)fourth)"; // => [first , (second) , (third) , fourth]
//string str = "(first((second)(third))fourth)"; // => [first , ((second)(third)) , fourth]
//string str = "just Text"; // => [ERROR]
//string str = "start with Text (first , second)"; // => [ERROR]
//string str = "(first , second) end with text"; // => [ERROR]
//string str = ""; // => [ERROR]
//string str = "("; // => [ERROR]
//string str = "(first()(second)(third))fourth)"; // => [ERROR]
//string str = "(((extra close pareanthese))))"; // => [ERROR]
var res = Parser.parse(str);
showRes(res);
}
static void showRes(ArrayList res)
{
var strings = res.ToArray();
var theString = string.Join(" , ", strings);
Console.WriteLine("[" + theString + "]");
}
}
public class Parser
{
static Dictionary<TokenType, TokenType> getRules()
{
var rules = new Dictionary<TokenType, TokenType>();
rules.Add(TokenType.OPEN_PARENTHESE, TokenType.START | TokenType.OPEN_PARENTHESE | TokenType.CLOSE_PARENTHESE | TokenType.SIMPLE_TEXT);
rules.Add(TokenType.CLOSE_PARENTHESE, TokenType.SIMPLE_TEXT | TokenType.CLOSE_PARENTHESE);
rules.Add(TokenType.SIMPLE_TEXT, TokenType.SIMPLE_TEXT | TokenType.CLOSE_PARENTHESE | TokenType.OPEN_PARENTHESE);
rules.Add(TokenType.END, TokenType.CLOSE_PARENTHESE);
return rules;
}
static bool isValid(Token prev, Token cur)
{
var rules = Parser.getRules();
return rules.ContainsKey(cur.type) && ((prev.type & rules[cur.type]) == prev.type);
}
public static ArrayList parse(string sourceText)
{
ArrayList result = new ArrayList();
int openParenthesesCount = 0;
Lexer lexer = new Lexer(sourceText);
Token prevToken = lexer.getStartToken();
Token currentToken = lexer.readNextToken();
string tmpText = "";
while (currentToken.type != TokenType.END)
{
if (currentToken.type == TokenType.OPEN_PARENTHESE)
{
openParenthesesCount++;
if (openParenthesesCount > 1)
{
tmpText += currentToken.token;
}
}
else if (currentToken.type == TokenType.CLOSE_PARENTHESE)
{
openParenthesesCount--;
if (openParenthesesCount < 0)
{
return Parser.Error();
}
if (openParenthesesCount > 0)
{
tmpText += currentToken.token;
}
}
else if (currentToken.type == TokenType.SIMPLE_TEXT)
{
tmpText += currentToken.token;
}
if (!Parser.isValid(prevToken, currentToken))
{
return Parser.Error();
}
if (openParenthesesCount == 1 && tmpText.Trim() != "")
{
result.Add(tmpText);
tmpText = "";
}
prevToken = currentToken;
currentToken = lexer.readNextToken();
}
if (openParenthesesCount != 0)
{
return Parser.Error();
}
if (!Parser.isValid(prevToken, currentToken))
{
return Parser.Error();
}
if (tmpText.Trim() != "")
{
result.Add(tmpText);
}
return result;
}
static ArrayList Error()
{
var er = new ArrayList();
er.Add("ERROR");
return er;
}
}
class Lexer
{
string _txt;
int _index;
public Lexer(string text)
{
this._index = 0;
this._txt = text;
}
public Token getStartToken()
{
return new Token(-1, TokenType.START, "");
}
public Token readNextToken()
{
if (this._index >= this._txt.Length)
{
return new Token(-1, TokenType.END, "");
}
Token t = null;
string txt = "";
if (this._txt[this._index] == '(')
{
txt = "(";
t = new Token(this._index, TokenType.OPEN_PARENTHESE, txt);
}
else if (this._txt[this._index] == ')')
{
txt = ")";
t = new Token(this._index, TokenType.CLOSE_PARENTHESE, txt);
}
else
{
txt = this._readText();
t = new Token(this._index, TokenType.SIMPLE_TEXT, txt);
}
this._index += txt.Length;
return t;
}
private string _readText()
{
string txt = "";
int i = this._index;
while (i < this._txt.Length && this._txt[i] != '(' && this._txt[i] != ')')
{
txt = txt + this._txt[i];
i++;
}
return txt;
}
}
class Token
{
public int position
{
get;
private set;
}
public TokenType type
{
get;
private set;
}
public string token
{
get;
private set;
}
public Token(int position, TokenType type, string token)
{
this.position = position;
this.type = type;
this.token = token;
}
}
[Flags]
enum TokenType
{
START = 1,
OPEN_PARENTHESE = 2,
SIMPLE_TEXT = 4,
CLOSE_PARENTHESE = 8,
END = 16
}
well, regex will do the job:
var text = @"(example (to (parsing nested paren) but) (first lvl only))";
var pattern = @"\(([\w\s]+) (\([\w\s]+ \([\w\s]+\) [\w\s]+\)) (\([\w\s]+\))\)*";
try
{
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = r.Match(text);
string group_1 = m.Groups[1].Value; //example
string group_2 = m.Groups[2].Value; //(to (parsing nested paren) but)
string group_3 = m.Groups[3].Value; //(first lvl only)
return new string[]{group_1,group_2,group_3};
}
catch(Exception ex){
return new string[]{"error"};
}
hopefully this helps, tested here in dotnetfiddle
Edit:
this might get you started into building the right expression according to whatever patterns you are falling into and maybe build a recursive function to parse the rest into the desired output :)
RegEx is not recursive. You either count bracket level, or recurse.
A non-recursive parser loop that I tested with the example you show is:
string SplitFirstLevel(string s)
{
List<string> result = new List<string>();
int p = 0, level = 0;
for (int i = 0; i < s.Length; i++)
{
if (s[i] == '(')
{
level++;
if (level == 1) p = i + 1;
if (level == 2)
{
result.Add('"' + s.Substring(p, i - p) + '"');
p = i;
}
}
if (s[i] == ')')
if (--level == 0)
result.Add('"' + s.Substring(p, i - p) + '"');
}
return "[" + String.Join(",", result) + "]";
}
Note: after some more testing, I see your specification is unclear. How should orphaned level 1 terms, that is, terms without bracketing, be delimited?
For example, my parser translates
(example (to (parsing nested paren) but) (first lvl only))
to:
["example ","(to (parsing nested paren) but) ","(first lvl only)"]
and
(example (to (parsing nested paren)) but (first lvl only))
to:
["example ","(to (parsing nested paren)) but ","(first lvl only)"]
In either case, "example" gets a separate term, while "but" is grouped with the first term. In the first example this is logical, it is in the bracketing, but it may be unwanted behaviour in the second case, where "but" should be separated, like "example", which also has no bracketing (?)
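One way to settle the ambiguity described above is to treat every run of bare level 1 text as its own trimmed term and every level 2 group as its own term. The sketch below does that with a depth counter and returns ["error"] for malformed input, as the question asks. It is only one possible reading of the specification; FirstLevelParser, Parse and AddText are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class FirstLevelParser
{
    // Splits "(a (b (c)) d)" into its level 1 terms: "a", "(b (c))", "d".
    // Returns ["error"] when the input is not one balanced outer group.
    public static List<string> Parse(string s)
    {
        var error = new List<string> { "error" };
        var result = new List<string>();
        if (string.IsNullOrEmpty(s) || s[0] != '(') return error;

        int depth = 0, start = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '(')
            {
                depth++;
                if (depth == 1) start = i + 1;
                else if (depth == 2) { AddText(result, s, start, i); start = i; }
            }
            else if (s[i] == ')')
            {
                depth--;
                if (depth < 0) return error;
                if (depth == 1)
                {
                    // A complete level 2 group: keep it with its parentheses.
                    result.Add(s.Substring(start, i - start + 1));
                    start = i + 1;
                }
                else if (depth == 0)
                {
                    AddText(result, s, start, i);
                    // Anything after the outer group closes is malformed.
                    if (i != s.Length - 1) return error;
                }
            }
        }
        return depth == 0 ? result : error;
    }

    // Adds the bare text in [start, end) as a term if it is not just whitespace.
    static void AddText(List<string> result, string s, int start, int end)
    {
        string text = s.Substring(start, end - start).Trim();
        if (text.Length > 0) result.Add(text);
    }
}
```

With this reading, the second example above yields four terms, with "but" separated from the group before it.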

String formatting in C#?

I'm having trouble formatting strings from a List<string>.
Here's a picture of the List values:
Now I managed to manipulate some of the values but others not, here's what I used to manipulate:
string prepareStr(string itemToPrepare) {
string first = string.Empty;
string second = string.Empty;
if (itemToPrepare.Contains("\"")) {
first = itemToPrepare.Replace("\"", "");
}
if (first.Contains("-")) {
int beginIndex = first.IndexOf("-");
second = first.Remove(beginIndex, first.Length - beginIndex);
}
return second;
}
Here's a picture of the Result:
I need to get the clean path without the extras (-startup, -minimized, MSRun, double quotes).
What am I doing wrong here?
EDIT my updated code:
void getStartUpEntries() {
var startEntries = StartUp.getStartUp();
if (startEntries != null && startEntries.Count != 0) {
for (int i = 0; i < startEntries.Count; i++) {
var splitEntry = startEntries[i].Split(new string[] { "||" }, StringSplitOptions.None);
var str = splitEntry[1];
var match = Regex.Match(str, @"\|\|""(?<path>(?:\""|[^""])*)""");
var finishedPath = match.Groups["path"].ToString();
if (!string.IsNullOrEmpty(finishedPath)) {
if (File.Exists(finishedPath) || Directory.Exists(finishedPath)) {
var _startUpObj = new StartUp(splitEntry[0], finishedPath,
"Aktiviert: ", new Uri("/Images/inWatch.avOK.png", UriKind.RelativeOrAbsolute),
StartUp.getIcon(finishedPath));
_startUpList.Add(_startUpObj);
}
else {
var _startUpObjNo = new StartUp(splitEntry[0], finishedPath,
"Aktiviert: ", new Uri("/Images/inWatch.avOK.png", UriKind.RelativeOrAbsolute),
StartUp.getIcon(string.Empty));
_startUpList.Add(_startUpObjNo);
}
}
var _startUpObjLast = new StartUp(splitEntry[0], splitEntry[1],
"Aktiviert: ", new Uri("/Images/inWatch.avOK.png", UriKind.RelativeOrAbsolute),
StartUp.getIcon(string.Empty));
_startUpList.Add(_startUpObjLast);
}
lstStartUp.ItemsSource = _startUpList.OrderBy(item => item.Name).ToList();
}
You could use a regex to extract the path:
var str = @"0Raptr||""C:\Program Files (x86)\Raptr\raptrstub.exe"" --startup";
var match = Regex.Match(str, @"\|\|""(?<path>(?:\""|[^""])*)""");
Console.WriteLine(match.Groups["path"]);
This will match any (even empty) text (either an escaped quote, or any character which is not a quote) between two quote characters preceded by two pipe characters.
Similarly, you could simply split on the double quotes as I see that's a repeating occurrence in your examples and take the second item in the split array:
var path = new Regex("\"").Split(s)[1];
This is an update to your logic without using any Regex:
private string prepareStr(string itemToPrepare)
{
string result = null;
string startString = @"\""";
string endString = @"\""";
int startPoint = itemToPrepare.IndexOf(startString);
if (startPoint >= 0)
{
startPoint = startPoint + startString.Length;
int EndPoint = itemToPrepare.IndexOf(endString, startPoint);
if (EndPoint >= 0)
{
result = itemToPrepare.Substring(startPoint, EndPoint - startPoint);
}
}
return result;
}
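The answers above all assume the path is wrapped in double quotes. A minimal sketch that also copes with unquoted entries is shown below. It assumes entries have the form name||command, and that an unquoted path ends at the first " -" argument; both are assumptions drawn from the single example in this thread. StartupEntryParser and ExtractPath are hypothetical names.

```csharp
using System;

public static class StartupEntryParser
{
    // Extracts the executable path from an entry such as
    //   0Raptr||"C:\Program Files (x86)\Raptr\raptrstub.exe" --startup
    public static string ExtractPath(string entry)
    {
        // Drop the "name||" prefix if it is present.
        int sep = entry.IndexOf("||", StringComparison.Ordinal);
        string command = (sep >= 0 ? entry.Substring(sep + 2) : entry).Trim();

        if (command.StartsWith("\"", StringComparison.Ordinal))
        {
            // Quoted path: take everything up to the closing quote.
            int close = command.IndexOf('"', 1);
            return close > 0 ? command.Substring(1, close - 1) : command.Substring(1);
        }

        // Unquoted path (assumption): arguments start at the first " -".
        int arg = command.IndexOf(" -", StringComparison.Ordinal);
        return arg >= 0 ? command.Substring(0, arg).TrimEnd() : command;
    }
}
```

An unquoted path that itself contains " -" would be cut short by this heuristic, so the quoted case is the only one that is reliable.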

Sorting lines in winforms richtextbox preserving RTF formatting

Is there any way to sort lines in winforms richtextbox preserving RTF formatting?
var lines = edit.Lines.OrderBy(s => s);
edit.Lines = lines.ToArray();
does the job fine but, obviously, loses all RTF formatting.
I have slightly changed the snippet of TaW:
1. Adding "unique" might break the very first line formatting
2. Besides "\par" tag there is also "\pard"
Here is a snippet (thanks again to TaW!):
private void cmdSort_Click(object sender, EventArgs e)
{
const string PARD = "\\pard";
var pard = Guid.NewGuid().ToString();
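// Check the IndexOf result before adding the tag length; otherwise the "not found" case (-1) is never detected
var pardIndex = edit.Rtf.IndexOf(PARD, StringComparison.Ordinal);
if (pardIndex < 0) return;
var pos1 = pardIndex + PARD.Length;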
var header = edit.Rtf.Substring(0, pos1);
var body = edit.Rtf.Substring(pos1);
body = body.Replace("\\pard", pard);
var lines = body.Split(new[] { "\\par" }, StringSplitOptions.None);
var lastFormat = "";
var sb = new StringBuilder();
var rtfLines = new SortedList<string, string>();
foreach (var line in lines)
{
var ln = line.Replace(pard, "\\pard");
var temp = ln.Replace("\r\n", "").Trim();
if (temp.Length > 0 && temp[0] != '\\')
{
rtfLines.Add(temp.Trim(), lastFormat + " " + ln);
}
else
{
var pos2 = temp.IndexOf(' ');
if (pos2 < 0)
{
rtfLines.Add(temp.Trim(), ln);
}
else
{
rtfLines.Add(temp.Substring(pos2).Trim(), ln);
lastFormat = temp.Substring(0, pos2);
}
}
}
foreach (var key in rtfLines.Keys.Where(key => key != "}"))
{
sb.Append(rtfLines[key] + "\\par");
}
edit.Rtf = header + sb;
}
Here is a code snippet that seems to work if the file has neither images nor tables embedded..
It uses two RTF boxes. In my tests they sorted alright and kept all formatting intact.
private void button4_Click(object sender, EventArgs e)
{
string unique = Guid.NewGuid().ToString() ;
richTextBox1.SelectionStart = 0;
richTextBox1.SelectionLength = 0;
richTextBox1.SelectedText = unique;
int pos1 = richTextBox1.Rtf.IndexOf(unique);
if (pos1 >= 0)
{
string header = richTextBox1.Rtf.Substring(0, pos1);
string header1 = "";
string header2 = "";
int pos0 = header.LastIndexOf('}') + 1;
if (pos0 > 1) { header1 = header.Substring(0, pos0); header2 = header.Substring(pos0); }
// after the header comes a string of formats to start the document
string[] formats = header2.Split('\\');
string firstFormat = "";
string lastFormat = "";
// we extract a few important character formats (bold, italic, underline, font, color)
// to keep with the first line which will be sorted away
// the lastFormat variable holds the last formatting encountered
// so we can add it to all lines without formatting
// (and of course we are really talking about paragraphs)
foreach (string fmt in formats)
if (fmt[0]=='b' || ("cfiu".IndexOf(fmt[0]) >= 0 && fmt.Substring(0,2)!="uc") )
lastFormat += "\\" + fmt; else firstFormat += "\\" + fmt;
// add the rest to the header
header = header1 + firstFormat;
// now we remove our marker from the body and split it into paragraphs
string body = richTextBox1.Rtf.Substring(pos1);
string[] lines = body.Replace(unique, "").Split(new string[] { "\\par" }, StringSplitOptions.None);
StringBuilder sb = new StringBuilder();
// the sorted list will contain the unformatted text as key and the formatted text as value
SortedList<string, string> rtfLines = new SortedList<string, string>();
foreach (string line in lines)
{
// cleanup
string line_ = line.Replace("\r\n", "").Trim();
if (line_[0] != '\\' ) rtfLines.Add(line_, lastFormat + " " + line);
else
{
int pos2 = line_.IndexOf(' ');
if (pos2 < 0) rtfLines.Add(line_, line);
else
{
rtfLines.Add(line_.Substring(pos2).Trim(), line);
lastFormat = line_.Substring(0, pos2);
}
}
}
foreach (string key in rtfLines.Keys) if (key != "}") sb.Append(rtfLines[key] + "\\par");
richTextBox2.Rtf = header + sb.ToString();
}
Of course this is really q&d and not ready for serious production; but it looks like a start.
EDIT 2: I changed the code to fix a bug with the first line's format and added some comments. This should work a lot better, but is still a hack that must be adapted to the real input files..
RichTextBox has a property Rtf that would keep RTF formatting.
[BrowsableAttribute(false)]
public string Rtf { get; set; }

C# Processing Fixed Width Files - Solution Not Working

I have implemented Cuong's solution here:
C# Processing Fixed Width Files
Here is my code:
var lines = File.ReadAllLines(@fileFull);
var widthList = lines.First().GroupBy(c => c)
.Select(g => g.Count())
.ToList();
var list = new List<KeyValuePair<int, int>>();
int startIndex = 0;
for (int i = 0; i < widthList.Count(); i++)
{
var pair = new KeyValuePair<int, int>(startIndex, widthList[i]);
list.Add(pair);
startIndex += widthList[i];
}
var csvLines = lines.Select(line => string.Join(",",
list.Select(pair => line.Substring(pair.Key, pair.Value))));
File.WriteAllLines(filePath + "\\" + fileName + ".csv", csvLines);
@fileFull = File Path & Name
The issue I have is the first line of the input file also contains digits. So it could be AAAAAABBC111111111DD2EEEEEE etc. For some reason the output from Cuong's code gives me CSV headings like 1111RRRR and 222223333.
Does anyone know why this is and how I would fix it?
Header row example:
AAAAAAAAAAAAAAAABBBBBBBBBBCCCCCCCCDEFCCCCCCCCCGGGGGGGGHHHHHHHHIJJJJJJJJKKKKLLLLMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOPPPPQQQQ1111RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR222222222333333333444444444555555555666666666777777777888888888999999999S00001111TTTTTTTTTTTTUVWXYZ!"£$$$$$$%&
Converted header row:
AAAAAAAAAAAAAAAA BBBBBBBBBB CCCCCCCCDEFCCCCCC C C C GGGGGGGG HHHHHHHH I JJJJJJJJ KKKK LLLL MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPP QQQQ 1111RRRR RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR2222 222223333 333334444 444445555 555556666 666667777 777778888 888889999 99999S000 0 1111 TTTTTTTTTTTT U V W X Y Z ! ",�,$$$$$$,%,&,"
Jodrell - I implemented your suggestion but the header output is like:
BBBBBBBBBBCCCCCC CCCCCCCCD DEFCCCC GGGGGGGG HHHHHHH IJJJJJJ KKKKLLL LLL MMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPPQQQQ1111RRRRRRRRRRRRRRRRR QQQ 111 RRR 33333333 44444444 55555555 66666666 77777777 88888888 99999999 S0000111 111 TTT UVWXYZ!"�$$ %&
As Jodrell already mentioned, your code doesn't work because it assumes that the character representing each column header is distinct. Changing the code that parses the header widths fixes it.
Replace:
var widthList = lines.First().GroupBy(c => c)
.Select(g => g.Count())
.ToList();
With:
var widthList = new List<int>();
var header = lines.First().ToArray();
for (int i = 0; i < header.Length; i++)
{
if (i == 0 || header[i] != header[i-1])
widthList.Add(0);
widthList[widthList.Count-1]++;
}
Parsed header columns:
AAAAAAAAAAAAAAAA BBBBBBBBBB CCCCCCCC D E F CCCCCCCCC GGGGGGGG HHHHHHHH I JJJJJJJJ KKKK LLLL MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPP QQQQ 1111 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 222222222 333333333 444444444 555555555 666666666 777777777 888888888 999999999 S 0000 1111 TTTTTTTTTTTT U V W X Y Z ! " £ $$$$$$ % &
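The replacement loop above is a run-length count over adjacent characters, so a character that reappears later in the header (such as the two separate blocks of 1s) starts a new column instead of being merged with the earlier one. A self-contained sketch of the idea, together with slicing a data line by the resulting widths, follows. FixedWidth, ColumnWidths and ToCsvLine are hypothetical names, and the guard for short lines is an addition not present in the original code.

```csharp
using System;
using System.Collections.Generic;

public static class FixedWidth
{
    // Each run of identical adjacent characters in the header is one column.
    public static List<int> ColumnWidths(string header)
    {
        var widths = new List<int>();
        for (int i = 0; i < header.Length; i++)
        {
            if (i == 0 || header[i] != header[i - 1])
                widths.Add(0);              // a new run starts here
            widths[widths.Count - 1]++;     // extend the current run
        }
        return widths;
    }

    // Cuts a line into fields of the given widths and joins them with commas.
    // Lines shorter than the header produce shortened or empty trailing fields.
    public static string ToCsvLine(string line, IList<int> widths)
    {
        var fields = new List<string>();
        int start = 0;
        foreach (int width in widths)
        {
            int available = Math.Max(0, Math.Min(width, line.Length - start));
            fields.Add(start < line.Length ? line.Substring(start, available) : "");
            start += width;
        }
        return string.Join(",", fields);
    }
}
```

For a header such as AAB11C11 this gives the widths 2, 1, 2, 1, 2, whereas grouping by character would merge both runs of 1s into a single column of width 4.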
EDIT
Because the problem annoyed me I wrote some code that handles " and ,. This code replaces the header row with comma delimited alternating zeros and ones. Any commas or double quotes in the body are appropriately escaped.
static void FixedToCsv(string sourceFile)
{
if (sourceFile == null)
{
// Throw exception
}
var dir = Path.GetDirectoryName(sourceFile);
var destFile = string.Format(
"{0}{1}",
Path.GetFileNameWithoutExtension(sourceFile),
".csv");
if (dir != null)
{
destFile = Path.Combine(dir, destFile);
}
if (File.Exists(destFile))
{
// Throw Exception
}
var blocks = new List<KeyValuePair<int, int>>();
using (var output = File.OpenWrite(destFile))
{
using (var input = File.OpenText(sourceFile))
{
var outputLine = new StringBuilder();
// Make header
var header = input.ReadLine();
if (header == null)
{
return;
}
var even = false;
var lastc = header.First();
var counter = 0;
var blockCounter = 0;
foreach(var c in header)
{
counter++;
if (c == lastc)
{
blockCounter++;
}
else
{
blocks.Add(new KeyValuePair<int, int>(
counter - blockCounter - 1,
blockCounter));
blockCounter = 1;
outputLine.Append(',');
even = !even;
}
outputLine.Append(even ? '1' : '0');
lastc = c;
}
blocks.Add(new KeyValuePair<int, int>(
counter - blockCounter,
blockCounter));
outputLine.AppendLine();
var lineBytes = Encoding.UTF8.GetBytes(outputLine.ToString());
outputLine.Clear();
output.Write(lineBytes, 0, lineBytes.Length);
// Process Body
var inputLine = input.ReadLine();
while (inputLine != null)
{
foreach(var block in blocks.Select(b =>
inputLine.Substring(b.Key, b.Value)))
{
var sanitisedBlock = block;
if (block.Contains(',') || block.Contains('"'))
{
sanitisedBlock = string.Format(
"\"{0}\"",
block.Replace("\"", "\"\""));
}
outputLine.Append(sanitisedBlock);
outputLine.Append(',');
}
outputLine.Remove(outputLine.Length - 1, 1);
outputLine.AppendLine();
lineBytes = Encoding.UTF8.GetBytes(outputLine.ToString());
outputLine.Clear();
output.Write(lineBytes, 0, lineBytes.Length);
inputLine = input.ReadLine();
}
}
}
}
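The escaping rule used in the body loop above can be isolated into a small helper: a field containing a comma or a double quote is wrapped in double quotes, and any embedded double quote is doubled. The sketch below mirrors only what the code above does; CsvEscaping and EscapeField are hypothetical names, and fields containing line breaks are not handled because ReadLine never produces them here.

```csharp
using System;

public static class CsvEscaping
{
    // Quotes a field only when it contains a comma or a double quote.
    public static string EscapeField(string field)
    {
        if (field.IndexOf(',') >= 0 || field.IndexOf('"') >= 0)
        {
            // Double each embedded quote, then wrap the whole field in quotes.
            return "\"" + field.Replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void Main()
    {
        Console.WriteLine(EscapeField("plain"));
        Console.WriteLine(EscapeField("a,b"));
        Console.WriteLine(EscapeField("say \"hi\""));
    }
}
```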
The character 1 is repeated in your header row, so your two blocks of four get counted as one block of eight and everything goes wrong from there.
(There is a block of four 1s after the Qs and another block of four 1s after the 0s.)
Essentially, your header row is invalid or, at least, doesn't work with the proposed solution.
Okay, you could do something like this.
public void FixedToCsv(string fullFile)
{
var lines = File.ReadAllLines(fullFile);
var firstLine = lines.First();
var widths = new List<KeyValuePair<int, int>>();
var innerCounter = 0;
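var outerCounter = 0;
var firstLineChars = firstLine.ToCharArray();
var lastChar = firstLineChars[0];
foreach(var c in firstLineChars)
{
if (c == lastChar)
{
innerCounter++;
}
else
{
// Close the previous block as (start index, width)
widths.Add(new KeyValuePair<int, int>(
outerCounter - innerCounter,
innerCounter));
// The current character already belongs to the new block
innerCounter = 1;
lastChar = c;
}
outerCounter++;
}
// Add the final block, which the loop never closes
widths.Add(new KeyValuePair<int, int>(
outerCounter - innerCounter,
innerCounter));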
var csvLines = lines.Select(line => string.Join(",",
widths.Select(pair => line.Substring(pair.Key, pair.Value))));
// Get filePath and fileName from fullFile here.
File.WriteAllLines(filePath + "\\" + fileName + ".csv", csvLines);
}
