Read text from a text file with specific pattern


Hi there, I have a requirement where I need to read content from a text file. The sample content is below.
Name=Check_Amt
Public=Yes
DateName=pp
Name=DBO
I need to read the text and extract only the value that comes after Name= on each line.
So I expect the output to be Check_Amt, DBO.
I need to do this in C#.

When querying data (e.g. file lines), LINQ is often a convenient tool. If the file has lines in
name=value
format, you can query it with the following steps:
Read file lines
Split each line into name, value pair
Filter pairs by their names
Extract value from each pair
Materialize values into a collection
Code:
using System.IO;
using System.Linq;
...
// string[] {"Check_Amt", "DBO"}
var values = File
    .ReadLines(@"c:\MyFile.txt")
    .Select(line => line.Split(new char[] { '=' }, 2)) // split into name, value pairs
    .Where(items => items.Length == 2)                 // to be on the safe side
    .Where(items => items[0] == "Name")                // name == "Name" only
    .Select(items => items[1])                         // value from name=value
    .ToArray();                                        // let's have an array
Finally, if you want a comma-separated string, Join the values:
// "Check_Amt,DBO"
string result = string.Join(",", values);

Another way:
var str = @"Name=Check_Amt
Public=Yes
DateName=pp
Name=DBO";
var find = "Name=";
var result = new List<string>();
using (var reader = new StringReader(str)) // Change to StreamReader to read from a file
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.StartsWith(find))
            result.Add(line.Substring(find.Length));
    }
}

You can use LINQ to select what you need:
var names = File.ReadLines("my file.txt")
    .Select(l => l.Split('='))
    .Where(t => t.Length == 2)
    .Where(t => t[0] == "Name")
    .Select(t => t[1]);

I think that the best case would be a regex.
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"(?<=Name=).*?(?=Public)";
        string input = @"Name=Check_Amt Public=Yes DateName=pp Name=DBO";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
EDIT: My answer was written before your question was corrected. While it still works on the original single-line input, the LINQ answer would be better, IMHO.
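For the corrected input, where each entry sits on its own line, a line-anchored pattern may be closer to what is needed. The following is only a minimal sketch, not the answerer's code; the class and method names (NameExtractor, ExtractNames) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameExtractor
{
    // Matches lines that start with "Name=" and captures the rest of the line.
    // [^\r\n]* is used instead of .* so that a trailing '\r' from Windows
    // line endings is not captured. "DateName=pp" is skipped because the
    // pattern is anchored to the start of the line.
    private static readonly Regex NameLine =
        new Regex(@"^Name=([^\r\n]*)", RegexOptions.Multiline);

    public static List<string> ExtractNames(string text)
    {
        var names = new List<string>();
        foreach (Match m in NameLine.Matches(text))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }
}
```

To use it on a file, something like string.Join(", ", NameExtractor.ExtractNames(File.ReadAllText(path))) should work, where path is a placeholder for the real file location.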

Related

Dynamically concatenate value in list if pattern matched

I have a list of strings and an array of patterns:
List<string> filePaths = Directory.GetFiles(dir, filter).ToList();
string[] prefixes = { "0.", "1.", "2.", "3.", "4.", "5.", "6.", "7.", "8.", "9." };
I want to replace value in filePaths for example like this:
"1. fileA" becomes "01. fileA"
"2. fileB" becomes "02. fileB"
"10. fileC" becomes "10. fileC" (since "10." is not in prefixes list)
Is there a way to do this without looping?

You can do the following, using Select:
using System.IO;
using System.Linq;
class Program
{
    static void Main(string[] args)
    {
        string[] prefixes = { "0.", "1.", "2.", "3.", "4.", "5.", "6.", "7.", "8.", "9." };
        var result = Directory.GetFiles(dir, filter)
            .Select(s => prefixes.Contains(s.Substring(0, 2)) ? "0" + s : s)
            .ToList();
    }
}
Select enumerates the values and checks each one against the prefixes: if the first two characters match a prefix, a "0" is prepended; otherwise the original value is returned.

No need for a prefixes list; you can just pad left with zeros using a regex:
string input = "1. fileA";
string result = Regex.Replace(input, @"^\d+", m => m.Value.PadLeft(2, '0'));
To use it on the whole list:
var filePaths = Directory.GetFiles(dir, filter).Select(s => Regex.Replace(s, @"^\d+", m => m.Value.PadLeft(2, '0'))).ToList();
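One caveat: Directory.GetFiles returns paths that include the directory portion, so a check anchored at the start of the string sees the directory, not the file name. A minimal sketch that pads only the file-name part might look like this; the class and helper names (FileNamePadding, PadLeadingNumber) are hypothetical.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class FileNamePadding
{
    // Pads a leading run of digits in the file name (not the directory part)
    // to at least two characters, e.g. "1. fileA" becomes "01. fileA".
    // Names that already start with two or more digits are left unchanged.
    public static string PadLeadingNumber(string path)
    {
        string directory = Path.GetDirectoryName(path);
        string name = Path.GetFileName(path);
        string padded = Regex.Replace(name, @"^\d+", m => m.Value.PadLeft(2, '0'));
        return string.IsNullOrEmpty(directory) ? padded : Path.Combine(directory, padded);
    }
}
```

With using System.Linq in scope, it could be applied to the whole list as Directory.GetFiles(dir, filter).Select(p => FileNamePadding.PadLeadingNumber(p)).ToList().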

Unable to order a list using both OrderBy or Sort

I am trying to sort the contents of a file.
The text file looks something like this:
%[TIMESTAMP=1441737006376][EVENT=agentStateEvent][queue=79651][agentID=61871][extension=22801][state=2][reason=0]%
%[TIMESTAMP=1441737006102][EVENT=agentStateEvent][queue=79654][agentID=62278][extension=22828][state=2][reason=0]%
%[TIMESTAMP=1441737006105][EVENT=CallControlTerminalConnectionTalking][callID=2619][ucid=10000026191441907765][deviceType=1][deviceName=21775][Queue=][Trunk=384:82][TrunkType=1][TrunkState=1][Cause=100][CalledDeviceID=07956679058][CallingDeviceID=21775][extension=21775]%
Basically, I want the end result to contain only the unique timestamp values. I have used Substring to get rid of the excess text, and the output looks fine, as shown below:
[TIMESTAMP=1441737006376]
[TIMESTAMP=1441737006102]
[TIMESTAMP=1441737006105]
However, I want the output in the following order (ascending numeric order):
[TIMESTAMP=1441737006102]
[TIMESTAMP=1441737006105]
[TIMESTAMP=1441737006376]
I have tried .Sort and .OrderBy but am not having any joy. I would have thought that sorting before doing any substring formatting would have sufficed, but clearly not.
Code is as follows:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace FedSorter
{
    class Program
    {
        static void Main(string[] args)
        {
            int counter = 0;
            string line;
            string readIn = "C:\\Users\\xxx\\Desktop\\Files\\ex1.txt";
            System.IO.TextWriter writeOut = new StreamWriter("C:\\Users\\xxx\\Desktop\\Files\\ex1_new.txt");
            List<String> list = new List<String>();
            // Read the file and display it line by line.
            System.IO.StreamReader file = new System.IO.StreamReader(readIn);
            string contents = "";
            string checkValues = "";
            while ((line = file.ReadLine()) != null)
            {
                string text = line;
                text = text.Substring(1, 25);
                if (!checkValues.Contains(text))
                {
                    list.Add(text);
                    Console.WriteLine(text);
                    writeOut.WriteLine(text);
                    counter++;
                }
                contents = text;
                checkValues += contents + ",";
            }
            list = list.OrderBy(x => x).ToList();
            writeOut.Close();
            file.Close();
            orderingFile();
        }
        public static void orderingFile()
        {
            string line = "";
            string readIn = "C:\\Users\\xxx\\Desktop\\Files\\ex1_new.txt";
            System.IO.TextWriter writeOut = new StreamWriter("C:\\Users\\xxx\\Desktop\\Files\\ex1_new2.txt");
            List<String> ordering = new List<String>();
            // Read the file and display it line by line.
            System.IO.StreamReader file = new System.IO.StreamReader(readIn);
            while ((line = file.ReadLine()) != null)
            {
                ordering.OrderBy(x => x).ToList();
                ordering.Add(line);
                writeOut.WriteLine(line);
            }
            writeOut.Close();
            file.Close();
        }
    }
}

You are creating a new list and you need to assign it to the variable
list = list.OrderBy(x => x).ToList();
However it doesn't look like you even use list after you create and sort it. Additionally you have the same issue in the orderingFile method with
ordering.OrderBy(x => x).ToList();
However instead of sorting and creating a new list on each line it would be better to use a SortedList<TKey, TValue> that will keep the contents sorted as you add to it.
But again, you are not actually using the ordering list after you finish adding to it in the while loop. If you want to read the values from a file, sort them, and then write them to another file, you need to do it in that order.
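The read, sort, then write order described above can be sketched as follows. This is an illustration under a few assumptions, not the asker's code: it uses a SortedSet<long> rather than SortedList<TKey, TValue> because only the keys are needed, it parses the digits so that ordering is numeric rather than textual, and the pattern \[TIMESTAMP=([0-9]+)\] is inferred from the sample lines. The names TimestampSorter and UniqueSorted are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TimestampSorter
{
    // Assumed shape of the timestamp field, based on the sample lines.
    private static readonly Regex Timestamp = new Regex(@"\[TIMESTAMP=([0-9]+)\]");

    // Returns the unique timestamps found in the lines, in ascending numeric
    // order, formatted as in the question: "[TIMESTAMP=...]".
    public static List<string> UniqueSorted(IEnumerable<string> lines)
    {
        var seen = new SortedSet<long>(); // keeps values unique and sorted
        foreach (string line in lines)
        {
            Match m = Timestamp.Match(line);
            if (m.Success)
            {
                seen.Add(long.Parse(m.Groups[1].Value));
            }
        }
        return seen.Select(t => "[TIMESTAMP=" + t + "]").ToList();
    }
}
```

Writing the result is then a single call such as File.WriteAllLines(outputPath, TimestampSorter.UniqueSorted(File.ReadLines(inputPath))), where inputPath and outputPath stand in for the real file locations.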

Aside from @juharr's correct answer, you would do well to take advantage of LINQ to simplify your code greatly.
string readIn = "C:\\Users\\xxx\\Desktop\\Files\\ex1.txt";
var timestamps = File.ReadAllLines(readIn)
    .Select(l => l.Substring(1, 25))
    .Distinct()
    .OrderBy(t => t)
    .ToArray();
To write out the values, you can either use a foreach on timestamps and write out each line to your TextWriter, or you can use the File class again:
string readOut = "C:\\Users\\xxx\\Desktop\\Files\\ex1_new.txt";
File.WriteAllLines(readOut, timestamps);
//notice I've changed it to ToArray in the first part instead of ToList.

Check whether a string is in a list at any order in C#

If we have a list of strings, as in the following code:
List<string> XAll = new List<string>();
XAll.Add("#10#20");
XAll.Add("#20#30#40");
string S = "#30#20"; // same as "#20#30" and contained in "#20#30#40", so S should count as existing in the list
// Check the unordered string S = "#30#20":
// if its parts are contained in any order (#30#20 or #20#30 ...), return true
if (XAll.Contains(S))
{
    Console.WriteLine("Your String is exist");
}
I would prefer to use LINQ to check that S exists in this sense: regardless of order, some entry in XAll must contain at least both #30 and #20.
I am currently using:
var c = item2.Intersect(item1);
if (c.Count() == item1.Length)
{
    return true;
}

You should represent your data in a more meaningful way. Don't rely on strings.
For example I would suggest creating a type to represent a set of these numbers and write some code to populate it.
But there are already set types, such as HashSet, which is possibly a good match and has built-in functions for testing for subsets.
This should get you started:
var input = "#20#30#40";
var hashSetOfNumbers = new HashSet<int>(input
    .Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(s => int.Parse(s)));

This works for me:
Func<string, string[]> split =
    x => x.Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries);
if (XAll.Any(x => split(x).Intersect(split(S)).Count() == split(S).Count()))
{
    Console.WriteLine("Your String is exist");
}
Now, depending on how you want to handle duplicates, this might even be a better solution:
Func<string, HashSet<string>> split =
    x => new HashSet<string>(x.Split(
        new[] { '#' },
        StringSplitOptions.RemoveEmptyEntries));
if (XAll.Any(x => split(S).IsSubsetOf(split(x))))
{
    Console.WriteLine("Your String is exist");
}
This second approach uses pure set operations, so duplicates are ignored.

Speedily Read and Parse Data

As of now, I am using this code to open a file and read it into a list and parse that list into a string[]:
string CP4DataBase =
    "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
string[] splitCP4DataBaseLines = CP4DataBaseRTB.Text.Split('\n');
List<string> tempCP4List = new List<string>();
string[] line1CP4Components;
foreach (var line in splitCP4DataBaseLines)
    tempCP4List.Add(line + Environment.NewLine);
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
    concattedUnitPart = concattedUnitPart + line;
    line1CP4PartLines++;
}
line1CP4Components = new Regex("\"UNIT\",\"PARTS\"", RegexOptions.Multiline)
    .Split(concattedUnitPart)
    .Where(c => !string.IsNullOrEmpty(c)).ToArray();
I am wondering if there is a quicker way to do this. This is just one of the files I am opening, so this is repeated a minimum of 5 times to open and properly load the lists.
The minimum file size being imported right now is 257 KB. The largest file is 1,803 KB. These files will only get larger as time goes on as they are being used to simulate a database and the user will continually add to them.
So my question is, is there a quicker way to do all of the above code?
EDIT:
***CP4***
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106536"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:11"
"PMABAR",""
"COMMENT",""
"PTPNAME","R160805"
"CMPNAME","R160805"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",180
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.25
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.40
"PTPTLCL",10
"PTPTLPX",0.30
"PTPTLPY",0.30
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",100
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106653"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:42"
"PMABAR",""
"COMMENT",""
"PTPNAME","0603R"
"CMPNAME","0603R"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",18
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.23
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.34
"PTPTLCL",0
"PTPTLPX",0.60
"PTPTLPY",0.40
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",80
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
... the file goes on and on and on.. and will only get larger.
The regex places each "UNIT","PARTS" section, up to the next "UNIT","PARTS" line, into an element of a string[].
After this, I check each element to see whether its "NAME" value exists in a different list. If it does, I write that "UNIT","PARTS" section to the end of a text file.

This bit is a potential performance killer:
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
    concattedUnitPart = concattedUnitPart + line;
    line1CP4PartLines++;
}
(See this article for why.) Use a StringBuilder for repeated concatenation:
// No need to use tempCP4List at all
StringBuilder builder = new StringBuilder();
foreach (var line in splitCP4DataBaseLines)
{
    builder.AppendLine(line);
    line1CP4PartLines++;
}
string concattedUnitPart = builder.ToString();
Or even just:
string concattedUnitPart = string.Join(Environment.NewLine,
    splitCP4DataBaseLines);
Now the regex part may well also be slow - I'm not sure. It's not obvious what you're trying to achieve, whether you need regular expressions at all, or whether you really need to do the whole thing in one go. Can you definitely not just process it line by line?
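Line-by-line processing, as asked about above, could look roughly like the sketch below. It is not the asker's code: UnitSplitter and SplitUnits are hypothetical names, and the marker line "UNIT","PARTS" is taken from the sample data. Unlike the regex split, each resulting block keeps its marker line, and any lines before the first marker (such as ***CP4***) are skipped; both are assumptions about what is wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class UnitSplitter
{
    // The line that starts each unit in the sample data.
    private const string Marker = "\"UNIT\",\"PARTS\"";

    // Groups lines into one string per unit, in a single pass.
    public static List<string> SplitUnits(IEnumerable<string> lines)
    {
        var units = new List<string>();
        StringBuilder current = null; // null until the first marker is seen
        foreach (string line in lines)
        {
            if (line.Trim() == Marker)
            {
                // A new unit starts: store the previous one, if any.
                if (current != null)
                {
                    units.Add(current.ToString());
                }
                current = new StringBuilder();
            }
            if (current != null)
            {
                current.AppendLine(line);
            }
        }
        if (current != null)
        {
            units.Add(current.ToString());
        }
        return units;
    }
}
```

Calling UnitSplitter.SplitUnits(File.ReadLines(path)) reads the file lazily, so neither the RichTextBox nor the repeated string concatenation is needed; path is a placeholder.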

You could achieve the same output list 'line1CP4Components' using the following:
Regex StripEmptyLines = new Regex(@"^\s*$", RegexOptions.Multiline);
Regex UnitPartsMatch = new Regex(@"(?<=\n)""UNIT"",""PARTS"".*?(?=(?:\n""UNIT"",""PARTS"")|$)", RegexOptions.Singleline);
string CP4DataBase =
    "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
List<string> line1CP4Components = new List<string>(
    UnitPartsMatch.Matches(StripEmptyLines.Replace(CP4DataBaseRTB.Text, ""))
        .OfType<Match>()
        .Select(m => m.Value)
);
return line1CP4Components.ToArray();
You may be able to drop StripEmptyLines, but your original code does the equivalent via Where(c => !string.IsNullOrEmpty(c)). Also, your original code causes the '\r' part of each "\r\n" carriage return/line feed pair to be duplicated. I assumed this was an accident and not intentional?
Also you don't seem to be using the value in 'line1CP4PartLines' so I omitted the creation of the value. It was seemingly inconsistent with the omission of empty lines later so I guess you're not depending on it. If you need this value a simple regex can tell you how many new lines are in the string:
int linecount = new Regex("^", RegexOptions.Multiline).Matches(CP4DataBaseRTB.Text).Count;

// example of what your code will look like
string CP4DataBase = "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
List<string> Cp4DataList = new List<string>(File.ReadAllLines(CP4DataBase));
//or create a Dictionary<int,string[]> object
string strData = string.Empty;//hold the line item data which is read in line by line
string[] strStockListRecord = null;//string array that holds information from the TFE_Stock.txt file
Dictionary<int, string[]> dctStockListRecords = null; //dictionary object that will hold the KeyValuePair of text file contents in a DictList
List<string> lstStockListRecord = null;//Generic list that will store all the lines from the .prnfile being processed
if (File.Exists(strExtraLoadFileLoc + strFileName))
{
    try
    {
        lstStockListRecord = new List<string>();
        List<string> lstStrLinesStockRecord = new List<string>(File.ReadAllLines(strExtraLoadFileLoc + strFileName));
        dctStockListRecords = new Dictionary<int, string[]>(lstStrLinesStockRecord.Count());
        int intLineCount = 0;
        foreach (string strLineSplit in lstStrLinesStockRecord)
        {
            lstStockListRecord.Add(strLineSplit);
            dctStockListRecords.Add(intLineCount, lstStockListRecord.ToArray());
            lstStockListRecord.Clear();
            intLineCount++;
        }//foreach (string strlineSplit in lstStrLinesStockRecord)
        lstStrLinesStockRecord.Clear();
        lstStrLinesStockRecord = null;
        lstStockListRecord.Clear();
        lstStockListRecord = null;
        //Alter the code to fit what you are doing..

Get values from textfile with C#

I've got a textfile which contains the following data:
name = Very well sir
age = 23
profile = none
birthday= germany
manufacturer = Me
I want to get the profile, birthday and manufacturer values but can't seem to get it right. I succeeded in reading the file into my program, but that is where I am stuck. I just can't figure out how to parse the text file.
Here's my current code: http://sv.paidpaste.com/HrQXbg

using System;
using System.IO;
using System.Linq;
class Program
{
    static void Main()
    {
        var data = File
            .ReadAllLines("test.txt")
            .Select(x => x.Split('='))
            .Where(x => x.Length > 1)
            // Trim both key and value so the spaces around '=' are dropped
            .ToDictionary(x => x[0].Trim(), x => x[1].Trim());
        Console.WriteLine("profile: {0}", data["profile"]);
        Console.WriteLine("birthday: {0}", data["birthday"]);
        Console.WriteLine("manufacturer: {0}", data["manufacturer"]);
    }
}

I would suggest instead of using ReadToEnd, reading each line and doing a string.Split('=') and then a string.Trim() on each line text. You should be left with 2 values per line, the first being the key and the second, the value.
For example, in your reading loop:
List<string[]> myList = new List<string[]>();
string[] splits = nextLine.Split('=');
if (splits.Length == 2)
    myList.Add(splits);

You need to split into lines first and then split the lines:
int age = 0;
using (StreamReader reader = new StreamReader(filePath))
{
    string line;
    // ReadLine returns null at the end of the file
    while ((line = reader.ReadLine()) != null)
    {
        string[] splitLine = line.Split('=');
        // Skip lines that are not in key=value form
        if (splitLine.Length != 2)
            continue;
        // Find specific items based on the key in splitLine[0]
        switch (splitLine[0].ToLower().Trim())
        {
            case "age":
                age = int.Parse(splitLine[1]);
                break;
        }
    }
}
This should make a good start for you.
