Speedily Read and Parse Data - c#
As of now, I am using this code to open a file and read it into a list and parse that list into a string[]:
string CP4DataBase =
"C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
string[] splitCP4DataBaseLines = CP4DataBaseRTB.Text.Split('\n');
List<string> tempCP4List = new List<string>();
string[] line1CP4Components;
foreach (var line in splitCP4DataBaseLines)
tempCP4List.Add(line + Environment.NewLine);
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
concattedUnitPart = concattedUnitPart + line;
line1CP4PartLines++;
}
line1CP4Components = new Regex("\"UNIT\",\"PARTS\"", RegexOptions.Multiline)
.Split(concattedUnitPart)
.Where(c => !string.IsNullOrEmpty(c)).ToArray();
I am wondering if there is a quicker way to do this. This is just one of the files I am opening, so this is repeated a minimum of 5 times to open and properly load the lists.
The minimum file size being imported right now is 257 KB. The largest file is 1,803 KB. These files will only get larger as time goes on as they are being used to simulate a database and the user will continually add to them.
So my question is, is there a quicker way to do all of the above code?
EDIT:
***CP4***
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106536"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:11"
"PMABAR",""
"COMMENT",""
"PTPNAME","R160805"
"CMPNAME","R160805"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",180
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.25
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.40
"PTPTLCL",10
"PTPTLPX",0.30
"PTPTLPY",0.30
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",100
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106653"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:42"
"PMABAR",""
"COMMENT",""
"PTPNAME","0603R"
"CMPNAME","0603R"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",18
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.23
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.34
"PTPTLCL",0
"PTPTLPX",0.60
"PTPTLPY",0.40
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",80
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
... the file goes on and on and on.. and will only get larger.
The REGEX is placing each "UNIT PARTS" and the following code until the NEXT "UNIT PARTS" into a string[].
After this, I am checking each string[] to see if the "NAME" section exists in a different list. If it does exist, I am outputting that "UNIT PARTS" at the end of a textfile.
This bit is a potential performance killer:
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
concattedUnitPart = concattedUnitPart + line;
line1CP4PartLines++;
}
(See this article for why.) Use a StringBuilder for repeated concatenation:
// No need to use tempCP4List at all
StringBuilder builder = new StringBuilder();
foreach (var line in splitCP4DataBaseLines)
{
    builder.AppendLine(line);
    line1CP4PartLines++;
}
string concattedUnitPart = builder.ToString();
Or even just:
string concattedUnitPart = string.Join(Environment.NewLine,
splitCP4DataBaseLines);
Now the regex part may well also be slow - I'm not sure. It's not obvious what you're trying to achieve, whether you need regular expressions at all, or whether you really need to do the whole thing in one go. Can you definitely not just process it line by line?
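If processing line by line is an option, a minimal sketch could look like the one below. It assumes every record starts with a line beginning "UNIT","PARTS" (as in the sample data) and that the marker line stays part of the record, which differs from Regex.Split (that drops the marker). UnitPartsReader, ReadRecords and ReadRecordsFromFile are made-up names for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class UnitPartsReader
{
    // Marker line that starts each record in the sample data.
    private const string RecordStart = "\"UNIT\",\"PARTS\"";

    // Lazily yields one string per record. Lines before the first marker
    // (such as the ***CP4*** banner) and empty lines are skipped.
    public static IEnumerable<string> ReadRecords(IEnumerable<string> lines)
    {
        StringBuilder current = null;
        foreach (string line in lines)
        {
            if (line.StartsWith(RecordStart, StringComparison.Ordinal))
            {
                // A new record begins: emit the one collected so far.
                if (current != null)
                    yield return current.ToString();
                current = new StringBuilder();
            }
            if (current != null && line.Length > 0)
                current.AppendLine(line);
        }
        // Emit the last record, if any.
        if (current != null)
            yield return current.ToString();
    }

    // Hypothetical convenience wrapper that streams straight from disk.
    public static IEnumerable<string> ReadRecordsFromFile(string path)
    {
        return ReadRecords(File.ReadLines(path));
    }
}
```

File.ReadLines streams the file instead of loading it all at once, so the RichTextBox and the big concatenated string are not needed. If an array is still required, calling ToArray() (from System.Linq) on the result of ReadRecordsFromFile(CP4DataBase) should give one.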
You could achieve the same output list 'line1CP4Components' using the following:
Regex StripEmptyLines = new Regex(@"^\s*$", RegexOptions.Multiline);
Regex UnitPartsMatch = new Regex(@"(?<=\n)""UNIT"",""PARTS"".*?(?=(?:\n""UNIT"",""PARTS"")|$)", RegexOptions.Singleline);
string CP4DataBase =
"C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
List<string> line1CP4Components = new List<string>(
UnitPartsMatch.Matches(StripEmptyLines.Replace(CP4DataBaseRTB.Text, ""))
.OfType<Match>()
.Select(m => m.Value)
);
return line1CP4Components.ToArray();
You may be able to skip StripEmptyLines, but your original code removes empty entries via Where(c => !string.IsNullOrEmpty(c)), so I kept an equivalent step. Also, your original code causes the '\r' part of each "\r\n" (carriage return/line feed) pair to be duplicated. I assumed this was an accident and not intentional?
Also you don't seem to be using the value in 'line1CP4PartLines' so I omitted the creation of the value. It was seemingly inconsistent with the omission of empty lines later so I guess you're not depending on it. If you need this value a simple regex can tell you how many new lines are in the string:
int linecount = new Regex("^", RegexOptions.Multiline).Matches(CP4DataBaseRTB.Text).Count;
// example of what your code will look like
string CP4DataBase = "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
List<string> Cp4DataList = new List<string>(File.ReadAllLines(CP4DataBase));
//or create a Dictionary<int,string[]> object
string strData = string.Empty;//hold the line item data which is read in line by line
string[] strStockListRecord = null;//string array that holds information from the TFE_Stock.txt file
Dictionary<int, string[]> dctStockListRecords = null; //dictionary object that will hold the KeyValuePair of text file contents in a DictList
List<string> lstStockListRecord = null;//Generic list that will store all the lines from the .prnfile being processed
if (File.Exists(strExtraLoadFileLoc + strFileName))
{
try
{
lstStockListRecord = new List<string>();
List<string> lstStrLinesStockRecord = new List<string>(File.ReadAllLines(strExtraLoadFileLoc + strFileName));
dctStockListRecords = new Dictionary<int, string[]>(lstStrLinesStockRecord.Count());
int intLineCount = 0;
foreach (string strLineSplit in lstStrLinesStockRecord)
{
lstStockListRecord.Add(strLineSplit);
dctStockListRecords.Add(intLineCount, lstStockListRecord.ToArray());
lstStockListRecord.Clear();
intLineCount++;
}//foreach (string strlineSplit in lstStrLinesStockRecord)
lstStrLinesStockRecord.Clear();
lstStrLinesStockRecord = null;
lstStockListRecord.Clear();
lstStockListRecord = null;
//Alter the code to fit what you are doing..
}
catch (IOException)
{
//handle or log the failed read here
}
}
Related
Read text from a text file with specific pattern
Hi there. I have a requirement where I need to read content from a text file. The sample text content is as below:
Name=Check_Amt
Public=Yes
DateName=pp
Name=DBO
I need to read the text and extract only the value that comes after Name='whatever text'. So I am expecting the output Check_Amt, DBO. I need to do this in C#.
When querying data (e.g. file lines), Linq is often a convenient tool; if the file has lines in name=value format, you can query it like this:
1. Read file lines
2. Split each line into a name, value pair
3. Filter pairs by their names
4. Extract the value from each pair
5. Materialize the values into a collection
Code:
using System.Linq;
...
// string[] {"Check_Amt", "DBO"}
var values = File
    .ReadLines(@"c:\MyFile.txt")
    .Select(line => line.Split(new char[] { '=' }, 2)) // split into name, value pairs
    .Where(items => items.Length == 2)                 // to be on the safe side
    .Where(items => items[0] == "Name")                // name == "Name" only
    .Select(items => items[1])                         // value from name=value
    .ToArray();                                        // let's have an array
Finally, if you want a comma-separated string, Join the values:
// "Check_Amt,DBO"
string result = string.Join(",", values);
Another way:
var str = @"Name=Check_Amt
Public=Yes
DateName=pp
Name=DBO";
var find = "Name=";
var result = new List<string>();
using (var reader = new StringReader(str)) // Change to StreamReader to read from file
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.StartsWith(find))
            result.Add(line.Substring(find.Length));
    }
}
You can use LINQ to select what you need:
var names = File.ReadLines("my file.txt")
    .Select(l => l.Split('='))
    .Where(t => t.Length == 2)
    .Where(t => t[0] == "Name")
    .Select(t => t[1]);
I think that the best case would be a regex.
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"(?<=Name=).*?(?=Public)";
        string input = @"Name=Check_Amt Public=Yes DateName=pp Name=DBO";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
EDIT: My answer was written before your question was corrected. While it still works, the LINQ answer would be better IMHO.
Extract a pattern from a group of filenames and place into listbox
I have a folder with a lot of files like this:
2016-01-02-03-abc.txt
2017-01-02-03-defjh.jpg
2018-05-04-03-hij.txt
2022-05-04-03-klmnop.jpg
I need to extract the pattern from each group of filenames. For example, I need the pattern 01-02-03 from the first two files placed in a list. I also need the pattern 05-04-03 placed in the same list. So, my list will look like this:
01-02-03
05-04-03
Here is what I have so far. I can successfully remove the characters, but getting one instance of a pattern back into a list is beyond my pay grade:
public void GetPatternsToList()
{
    //Get all filenames with characters removed and place in listbox.
    List<string> files = new List<string>(Directory.EnumerateFiles(folderBrowserDialog1.SelectedPath));
    foreach (var file in files)
    {
        var removeallbeforefirstdash = file.Substring(file.IndexOf("-") + 1); // removes everything before the dash in the filename
        var finalfile = removeallbeforefirstdash.Substring(0, removeallbeforefirstdash.LastIndexOf("-")); // removes everything after dash in name -- will crash if file without dash is in folder (not sure how to fix this either)
        string[] array = finalfile.ToArray(); // I need to do the above with each file in the list and then place it back in an array to display in a listbox
        List<string> filesList = array.ToList();
        listBox1.DataSource = filesList;
    }
}
You could do it this way:
public void GetPatternsToList()
{
    var files = Directory.GetFiles(folderBrowserDialog1.SelectedPath);
    var patterns = new HashSet<string>();
    foreach (var file in files)
    {
        // Split only the file name, so dashes in the folder path are ignored
        var splitFileName = Path.GetFileName(file).Split('-').Skip(1).Take(3);
        var joinedFileName = string.Join("-", splitFileName);
        if (!string.IsNullOrEmpty(joinedFileName))
            patterns.Add(joinedFileName);
    }
    // DataSource needs an IList, so copy the set into a list
    listBox1.DataSource = patterns.ToList();
}
I used a HashSet<string> in order to avoid adding duplicate patterns to the DataSource.
A few remarks that aren't related to your question, but to your code in general:
1. I would pass the SelectedPath as a string to the method.
2. I would let the method return the HashSet.
3. If you implement the above, please also name the method accordingly.
All of the above is of course optional, but would improve your code quality.
Try this:
public void GetPatternsToList()
{
    List<string> files = new List<string>(Directory.EnumerateFiles(folderBrowserDialog1.SelectedPath));
    List<string> resultFiles = new List<string>();
    foreach (var file in files)
    {
        var removeallbeforefirstdash = file.Substring(file.IndexOf("-") + 1); // removes everything before the dash in the filename
        var finalfile = removeallbeforefirstdash.Substring(0, removeallbeforefirstdash.LastIndexOf("-")); // removes everything after dash in name -- will crash if file without dash is in folder (not sure how to fix this either)
        resultFiles.Add(finalfile);
    }
    listBox1.DataSource = resultFiles.Distinct().ToList();
}
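Both approaches above run into trouble when a name has fewer than two dashes or when the folder path itself contains a dash. A rough sketch that guards against both might look like this; it assumes the pattern is everything between the first and last dash of the file name only. FilePatterns, ExtractPattern and DistinctPatterns are hypothetical names, not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FilePatterns
{
    // Returns the text between the first and last dash of the file name,
    // or null when the name has fewer than two dashes.
    public static string ExtractPattern(string path)
    {
        // Look at the file name only, so dashes in folder names are ignored.
        string name = Path.GetFileName(path);
        int first = name.IndexOf('-');
        int last = name.LastIndexOf('-');
        if (first < 0 || last <= first)
            return null;
        return name.Substring(first + 1, last - first - 1);
    }

    // Collects each pattern once, skipping files that have no pattern.
    public static List<string> DistinctPatterns(IEnumerable<string> paths)
    {
        return paths.Select(p => ExtractPattern(p))
                    .Where(p => !string.IsNullOrEmpty(p))
                    .Distinct()
                    .ToList();
    }
}
```

In the form, the result of DistinctPatterns(Directory.EnumerateFiles(folderBrowserDialog1.SelectedPath)) could then be assigned to listBox1.DataSource once, after the loop, instead of on every iteration.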
Split does not work as expected with commas
I need to write a CSV parser. I am now trying to separate the fields to manipulate them. Sample CSV:
mitarbeiter^tagesdatum^lohnart^kostenstelle^kostentraeger^menge^betrag^belegnummer
11005^23.01.2018^1^^31810020^5,00^^
11081^23.01.2018^1^^31810020^5,00^^
As you can see, there are several empty cells. I am doing the following:
using (CsvFileReader reader = new CsvFileReader(path))
{
    CsvRow row = new CsvRow();
    while (reader.ReadRow(row))
    {
        foreach (string s in row)
        {
            csvROW.Add(new aCSVROW());
            string[] items = s.Split(new char[] { '^' }, StringSplitOptions.None);
            csvROW[0].mitarbeiter = items[0];
            csvROW[0].tagesdatum = items[1];
            csvROW[0].lohnart = items[2];
            csvROW[0].kostenstelle = items[3];
            csvROW[0].kostentraeger = items[4];
            csvROW[0].menge = items[5];
            csvROW[0].betrag = items[6];
            csvROW[0].belegnummer = items[7];
        }
    }
}
Problem: It seems that Split stops after the comma (5,00). The separator is ^ ... is there a reason why? I tried several things without success... Thank you so much!
CsvFileReader reads rows from a CSV file and then strings within that row. What else do you expect the CsvFileReader to do than separating the row? After reading the second line, row will have the contents
11005^23.01.2018^1^^31810020^5
and
00^^
When you split the first row by ^, the last entry of the resulting array will be "5". Anyway, your code will throw, because you are trying to access items exceeding the bounds of the array.
I don't know CsvFileReader. Maybe you can pass ^ as a separator and spare the splitting of the string. Anyway, you could use a StreamReader, too. This will work much more like you expected.
using (StreamReader reader = new StreamReader(path))
{
    while (!reader.EndOfStream)
    {
        var csvLine = reader.ReadLine();
        csvROW.Add(new aCSVROW());
        string[] items = csvLine.Split(new char[] { '^' }, StringSplitOptions.None);
        csvROW[0].mitarbeiter = items[0];
        csvROW[0].tagesdatum = items[1];
        csvROW[0].lohnart = items[2];
        csvROW[0].kostenstelle = items[3];
        csvROW[0].kostentraeger = items[4];
        csvROW[0].menge = items[5];
        csvROW[0].betrag = items[6];
        csvROW[0].belegnummer = items[7];
    }
}
Is CsvRow meant to be the data of all rows, or of one row? Because as it is, you keep adding a new aCSVROW object into csvROW for each read line, but you keep replacing the data on just csvROW[0], the first inserted aCSVROW. This means that in the end, you will have a lot of rows that all have no data in them, except for the one on index 0, which had its properties overwritten on each iteration and ends up containing the data of the last read row.
Also, despite using a CsvReader class, you are using plain String.Split to actually separate the fields. Surely that's what the CsvReader class is for?
Personally, I always use the TextFieldParser, from the Microsoft.VisualBasic.FileIO namespace. It has the advantage that it's completely native in the .Net framework, and you can simply tell it which separator to use.
This function can get the data out of it as a simple List<String[]>:
A: Using C# to search a CSV file and pull the value in the column next to it
Once you have your data, you can paste it into objects however you want.
List<String[]> lines = SplitFile(path, textEncoding, "^");
// I assume "CsvRow" is some kind of container for multiple rows?
// Looks like pretty bad naming to me...
CsvRow allRows = new CsvRow();
foreach (String[] items in lines)
{
    // Create new object, and add it to list.
    aCSVROW row = new aCSVROW();
    allRows.Add(row);
    // Fill the actual newly created object, not the first object in allRows.
    // Consider adding index checks here though to avoid index out of range exceptions.
    row.mitarbeiter = items[0];
    row.tagesdatum = items[1];
    row.lohnart = items[2];
    row.kostenstelle = items[3];
    row.kostentraeger = items[4];
    row.menge = items[5];
    row.betrag = items[6];
    row.belegnummer = items[7];
}
// Done. All rows added to allRows.
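The SplitFile helper referenced above isn't shown in this thread. As a rough sketch only, a TextFieldParser-based reader could look like the following; DelimitedFile and ReadRows are made-up names, and the choice to turn off quote handling is an assumption based on the sample data, which has no quoted fields.

```csharp
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualBasic.FileIO;

public static class DelimitedFile
{
    // Reads every non-blank line of the file and splits it on the given
    // delimiter. Empty cells come back as empty strings.
    public static List<string[]> ReadRows(string path, Encoding encoding, string delimiter)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(path, encoding))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            // Assumption: fields are never wrapped in quotes in this file.
            parser.HasFieldsEnclosedInQuotes = false;
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

With "^" as the delimiter, the comma in 5,00 is treated as ordinary text. Note that the header line (mitarbeiter^tagesdatum^...) is returned as the first row, so it would need to be skipped before mapping values onto objects.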
CsvRow row = new CsvRow();
while (reader.ReadRow(row))
{
    foreach (string s in row)
    {
        csvROW.Add(new aCSVROW());
        string[] items = s.Split('^');
        csvROW[0].mitarbeiter = items[0];
        csvROW[0].tagesdatum = items[1];
        csvROW[0].lohnart = items[2];
        csvROW[0].kostenstelle = items[3];
        csvROW[0].kostentraeger = items[4];
        csvROW[0].menge = items[5];
        csvROW[0].betrag = items[6];
        csvROW[0].belegnummer = items[7];
    }
}
How to split a server address and store in a list
I have a List<string> with some 10 strings. The values are as follows:
\\Server\Site\MySite\File1.xml
\\Server\Site\MySite\File2.xml
\\Server\Site\MySite\File2.xml
.......................
\\Server\Site\MySite\File10.xml
I need to extract \MySite\File1.xml to \MySite\File10.xml and store them in another list. I tried to use Split, with another list to hold the split strings, but it doesn't seem to give the correct answer. Below is the code:
for(int index=0;index<list.Count;list++)
{
    string[] myArray=list[index].Split('\\');
    for(int innerIndex=0;innerIndex<myArray.Length;innerIndex++)
    {
        anotherList[innerIndex]=myArray[2]+"\\"+myArray[3];
    }
}
Experts please help.
You don't need to work too hard if you know the input of all the strings:
str.Substring(str.IndexOf("\\MySite"))
One word: LINQ!
var results = (from x in source
               let parts = x.Split('\\')
               // keep only the last two segments, e.g. \MySite\File1.xml
               select "\\" + String.Join("\\", parts.Skip(parts.Length - 2))).ToArray();
You can use the following code.
List<string> source = new List<string>();
source.Add(@"\\Server\Site\MySite\File1.xml");
source.Add(@"\\Server\Site\MySite\File2.xml");
source.Add(@"\\Server\Site\MySite\File2.xml");
source.Add(@"\\Server\Site\MySite\File10.xml");
foreach (string s in source)
{
    string[] parts = s.Split(new string[] { Path.DirectorySeparatorChar.ToString() }, StringSplitOptions.RemoveEmptyEntries);
    // folder first, then file name: \MySite\File1.xml
    Console.WriteLine(Path.DirectorySeparatorChar + parts[parts.Length - 2] + Path.DirectorySeparatorChar + parts[parts.Length - 1]);
}
I would just remove anything before \MySite and get the rest.
Test data used:
List<string> source = new List<string>
{
    @"\\Server\Site\MySite\File1.xml",
    @"\\Server\Site\MySite\File2.xml",
    @"\\Server\Site\MySite\File2.xml",
    @"\\Server\Site\MySite\File10.xml",
};
Query:
var result = source
    // Start removing at 0 and remove everything before '\MySite'
    .Select(x => x.Remove(0, x.IndexOf("\\MySite")))
    .ToList();
C# reading variables into static string from text file
I have seen several posts giving examples of how to read from text files, and examples on how to make a string 'public' (static or const), but I haven't been able to combine the two inside a 'function' in a way that is making sense to me.
I have a text file called 'MyConfig.txt'. In that, I have 2 lines.
MyPathOne=C:\TestOne
MyPathTwo=C:\TestTwo
I want to be able to read that file when I start the form, making both MyPathOne and MyPathTwo accessible from anywhere inside the form, using something like this:
ReadConfig("MyConfig.txt");
The way I am trying to do that now, which is not working, is this:
public voice ReadConfig(string txtFile)
{
    using (StreamReader sr = new StreamResder(txtFile))
    {
        string line;
        while ((line = sr.ReadLine()) !=null)
        {
            var dict = File.ReadAllLines(txtFile)
                .Select(l => l.Split(new[] { '=' }))
                .ToDictionary( s => s[0].Trim(), s => s[1].Trim());
        }
        public const string MyPath1 = dic["MyPathOne"];
        public const string MyPath2 = dic["MyPathTwo"];
    }
}
The txt file will probably never grow over 5 or 6 lines, and I am not stuck on using StreamReader or dictionary. As long as I can access the path variables by name from anywhere, and it doesn't add like 400 lines of code or something, then I am OK with doing whatever would be best, safest, fastest, easiest.
I have read many posts where people say the data should be stored in XML, but I figure that part really doesn't matter so much because reading the file and getting the variables part would be almost the same either way. That aside, I would rather be able to use a plain txt file that somebody (end user) could edit without having to understand XML. (Which means of course lots of checks for blank lines, does the path exist, etc... I am OK with doing that part, just wanna get this part working first.)
I have read about different ways using ReadAllLines into an array, and some say to create a new separate 'class' file (which I don't really understand yet.. but working on it).
Mainly I want to find a 'stable' way to do this. (Project is using .Net4 and Linq by the way.) Thanks!!
The code you've provided doesn't even compile. Instead, you could try this:
public string MyPath1;
public string MyPath2;
public void ReadConfig(string txtFile)
{
    using (StreamReader sr = new StreamReader(txtFile))
    {
        // Declare the dictionary outside the loop:
        var dict = new Dictionary<string, string>();
        // (This loop reads every line until EOF or the first blank line.)
        string line;
        while (!string.IsNullOrEmpty((line = sr.ReadLine())))
        {
            // Split each line around '=':
            var tmp = line.Split(new[] { '=' }, StringSplitOptions.RemoveEmptyEntries);
            // Add the key-value pair to the dictionary:
            dict[tmp[0]] = tmp[1];
        }
        // Assign the values that you need:
        MyPath1 = dict["MyPathOne"];
        MyPath2 = dict["MyPathTwo"];
    }
}
To take into account:
You can't declare public fields inside methods.
You can't initialize const fields at run-time. Instead, you provide a constant value for them at compilation time.
Got it. Thanks!
public static string Path1;
public static string Path2;
public static string Path3;
public void ReadConfig(string txtFile)
{
    using (StreamReader sr = new StreamReader(txtFile))
    {
        var dict = new Dictionary<string, string>();
        string line;
        while (!string.IsNullOrEmpty((line = sr.ReadLine())))
        {
            dict = File.ReadAllLines(txtFile)
                .Select(l => l.Split(new[] { '=' }))
                .ToDictionary( s => s[0].Trim(), s => s[1].Trim());
        }
        Path1 = dict["PathOne"];
        Path2 = dict["PathTwo"];
        Path3 = Path1 + @"\Test";
    }
}
You need to define the variables outside the function to make them accessible to other functions.
public string MyPath1; // (Put these at the top of the class.)
public string MyPath2;
public void ReadConfig(string txtFile)
{
    var dict = File.ReadAllLines(txtFile)
        .Select(l => l.Split(new[] { '=' }))
        .ToDictionary( s => s[0].Trim(), s => s[1].Trim()); // read the entire file into a dictionary.
    MyPath1 = dict["MyPathOne"];
    MyPath2 = dict["MyPathTwo"];
}
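Since the file is meant to be edited by end users, the ToDictionary versions above will throw on a blank line or a line without '='. A small sketch that tolerates those cases is below. ConfigParser and Parse are hypothetical names, and the decision to ignore malformed lines silently and to treat keys as case-insensitive is an assumption, not something the question asked for.

```csharp
using System;
using System.Collections.Generic;

public static class ConfigParser
{
    // Turns name=value lines into a dictionary, skipping blank lines and
    // lines that have no '=' or no name before it.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var settings = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0)
                continue;
            // Split on the first '=' only, so values may contain '='.
            int eq = line.IndexOf('=');
            if (eq <= 0)
                continue;
            string key = line.Substring(0, eq).Trim();
            string value = line.Substring(eq + 1).Trim();
            settings[key] = value; // a later duplicate overwrites an earlier one
        }
        return settings;
    }
}
```

In ReadConfig this could be called as ConfigParser.Parse(File.ReadAllLines(txtFile)), followed by TryGetValue("MyPathOne", out ...) rather than the indexer, so a missing key can be reported instead of throwing.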
This question is similar to Get parameters out of text file (I put an answer there. I "can't" paste it here.) (Unsure whether I should "flag" this question as duplicate. "Flagging" "closes".) (Do duplicate questions ever get consolidated? Each can have virtues in the wording of the [often lame] question or the [underreaching and overreaching] answers. A consolidated version could have the best of all, but consolidation is rarely trivial.)