Read Specific Strings from Text File - c#

I'm trying to get certain strings out of a text file and put them in variables.
This is what the structure of the text file looks like. Keep in mind that this is just one line; every line looks like this, and the lines are separated by a blank line:
Date: 8/12/2013 12:00:00 AM Source Path: \\build\PM\11.0.64.1\build.11.0.64.1.FileServerOutput.zip Destination Path: C:\Users\Documents\.NET Development\testing\11.0.64.1\build.11.0.55.5.FileServerOutput.zip Folder Updated: 11.0.64.1 File Copied: build.11.0.55.5.FileServerOutput.zip
I wasn't entirely sure what to use as a delimiter for this text file, or even whether I should be using a delimiter at all, so that part is subject to change.
As a quick example of what I want to happen: I want to go through and grab the Destination Path and store it in a variable such as strDestPath.
Overall, the code I have come up with so far is this:
//find the variables from the text file
string[] lines = File.ReadAllLines(GlobalVars.strLogPath);
Yeah, not much, but I thought I could read one line at a time and search that line for what I'm looking for. Honestly, I'm not 100% sure whether I should stick with that approach or not...

If you are worried about how large your file is, you should use ReadLines, which uses deferred execution, instead of ReadAllLines:
var lines = File.ReadLines(GlobalVars.strLogPath);
The ReadLines and ReadAllLines methods differ as follows:
When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings to be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.
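To make the deferred-execution point concrete, here is a minimal sketch that pulls the Destination Path out of each line as the file is streamed. The class and method names (LogReader, DestinationPaths, DestinationPathsFromFile) are made up for illustration, and the sketch assumes that "Folder Updated:" always follows "Destination Path:" on a line, as in the sample from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LogReader
{
    // Yields the text between "Destination Path:" and "Folder Updated:" for each line.
    // Lines that lack either label (including the blank separator lines) are skipped.
    public static IEnumerable<string> DestinationPaths(IEnumerable<string> lines)
    {
        const string startLabel = "Destination Path:";
        const string endLabel = "Folder Updated:";

        foreach (string line in lines)
        {
            int start = line.IndexOf(startLabel, StringComparison.Ordinal);
            if (start < 0) continue;
            start += startLabel.Length;

            int end = line.IndexOf(endLabel, start, StringComparison.Ordinal);
            if (end < 0) continue;

            yield return line.Substring(start, end - start).Trim();
        }
    }

    // File.ReadLines is lazy, so the file is only read as far as the caller enumerates.
    public static IEnumerable<string> DestinationPathsFromFile(string path)
    {
        return DestinationPaths(File.ReadLines(path));
    }
}
```

Calling LogReader.DestinationPathsFromFile(GlobalVars.strLogPath) and stopping at the first result reads no more of the file than necessary.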

As weird as it might sound, you should take a look at Log Parser. If you are free to choose the file format, you could use one that fits Log Parser and, believe me, it will make your life a lot easier.
Once you load the file with Log Parser you can use queries to get the information you want. If you don't mind using interop in your project, you can even add a COM reference and use it from any .NET project.
This sample reads a HUGE CSV file and bulk-copies it to the DB, where the final steps are performed. This is not really your case, but it shows how easy this is to do with Log Parser:
COMTSVInputContextClass logParserTsv = new COMTSVInputContextClass();
COMSQLOutputContextClass logParserSql = new COMSQLOutputContextClass();
logParserTsv.separator = ";";
logParserTsv.fixedSep = true;
logParserSql.database = _sqlDatabaseName;
logParserSql.server = _sqlServerName;
logParserSql.username = _sqlUser;
logParserSql.password = _sqlPass;
logParserSql.createTable = false;
logParserSql.ignoreIdCols = true;
// query shortened for clarity purposes
string SelectPattern = @"Select TO_STRING(UserName),TO_STRING(UserID) INTO {0} From {1}";
string query = string.Format(SelectPattern, _sqlTable, _csvPath);
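// note: logParser is never declared in this snippet; it is presumably the Log Parser COM query object
logParser.ExecuteBatch(query, logParserTsv, logParserSql);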
Log Parser is one of those hidden gems Microsoft has that most people don't know about. I have used it to read IIS logs, CSV files, txt files, etc. You can even generate graphics!
Just check it here http://support.microsoft.com/kb/910447/en

Looks like you need to create a Tokenizer. Try something like this:
Define a list of token values:
List<string> gTkList = new List<string>() {"Date:","Source Path:" }; //...etc.
Create a Token class:
public class Token
{
private readonly string _tokenText;
private string _val;
private int _begin, _end;
public Token(string tk, int beg, int end)
{
this._tokenText = tk;
this._begin = beg;
this._end = end;
this._val = String.Empty;
}
public string TokenText
{
get{ return _tokenText; }
}
public string Value
{
get { return _val; }
set { _val = value; }
}
public int IdxBegin
{
get { return _begin; }
}
public int IdxEnd
{
get { return _end; }
}
}
Create a method to Find your Tokens:
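List<Token> FindTokens(string str)
{
    List<Token> retVal = new List<Token>();
    if (!String.IsNullOrWhiteSpace(str))
    {
        foreach(string cd in gTkList)
        {
            int fIdx = str.IndexOf(cd);
            if(fIdx > -1)
                retVal.Add(new Token(cd, fIdx, fIdx + cd.Length));
        }
    }
    //order by position so that neighbouring tokens delimit each value
    retVal.Sort((a, b) => a.IdxBegin.CompareTo(b.IdxBegin));
    return retVal;
}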
Then just do something like this:
foreach(string ln in lines)
{
    //returns ordered list of tokens
    var tkns = FindTokens(ln);
    for(int i=0; i < tkns.Count; i++)
    {
        //the value runs from the end of this token to the start of the next one (or to the end of the line)
        int len = (i == tkns.Count - 1) ? ln.Length - tkns[i].IdxEnd : tkns[i+1].IdxBegin - tkns[i].IdxEnd;
        tkns[i].Value = ln.Substring(tkns[i].IdxEnd, len).Trim();
    }
    //Do something with the gathered values
    foreach(Token tk in tkns)
    {
        //stuff
    }
}
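A more compact sketch of the same label-based idea, returning the values in a dictionary keyed by label. The LabelParser name is made up for illustration, and the label list is taken from the sample line in the question; it assumes no label text ever appears inside a value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelParser
{
    // Labels as they appear in the sample log line from the question.
    private static readonly string[] Labels =
    {
        "Date:", "Source Path:", "Destination Path:", "Folder Updated:", "File Copied:"
    };

    public static Dictionary<string, string> Parse(string line)
    {
        // Locate every label that is present and order the hits by position.
        var hits = Labels
            .Select(label => new { Label = label, Index = line.IndexOf(label, StringComparison.Ordinal) })
            .Where(h => h.Index >= 0)
            .OrderBy(h => h.Index)
            .ToList();

        var result = new Dictionary<string, string>();
        for (int i = 0; i < hits.Count; i++)
        {
            // A value starts right after its label and ends where the next label starts.
            int valueStart = hits[i].Index + hits[i].Label.Length;
            int valueEnd = (i == hits.Count - 1) ? line.Length : hits[i + 1].Index;
            string key = hits[i].Label.TrimEnd(':');
            result[key] = line.Substring(valueStart, valueEnd - valueStart).Trim();
        }
        return result;
    }
}
```

With this, strDestPath would simply be LabelParser.Parse(line)["Destination Path"].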

Related

Custom Class to CSV

I have a requirement to output some of our ERP data to a very specific CSV format with an exact number of fields per record, most of which we won't be providing at this time (or which have default values). To support future changes, I decided to write out the CSV format as a custom class of strings (all fields are strings), mark each of the strings we are not currently using as readonly, and default in the values that should go into those; most are String.Empty. So the class looks something like this:
private class CustomClass
{
public string field1 = String.Empty;
public readonly string field2 = String.Empty; //Not going to be used
public string field3 = String.Empty;
public readonly string field4 = "N/A"; //Not going to be used
...
}
Now, after I populate the used fields, I need to take this data and export a specifically formatted comma delimited string. So using other posts on StackOverflow I came up with the following function to add to the class:
public string ToCsvFields()
{
StringBuilder sb = new StringBuilder();
foreach (var f in typeof(CustomClass).GetFields())
{
if (sb.Length > 0)
sb.Append(",");
var x = f.GetValue(this);
if (x != null)
sb.Append("\"" + x.ToString() + "\"");
}
return sb.ToString();
}
This works and gives me the exact CSV output I need for each line when I call CustomClass.ToCsvFields(), and makes it pretty easy to maintain if the consumer of the CSV changes their column definition. But this line in particular makes me feel like something could go wrong with production code: var x = f.GetValue(this);
I understand what it is doing, but I generally shy away from "this" in my code; am I just being paranoid and this is totally acceptable code for this purpose?
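On the "this" concern: FieldInfo.GetValue takes the object whose field is being read, so passing this only means "the current instance". The sketch below is a hedged variant of the function above, using a hypothetical CsvRecord class in place of CustomClass. Doubling embedded quotes is an addition that is not in the original function. Note also that the .NET documentation states GetFields does not return fields in a guaranteed order, which matters when the column order is fixed.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for CustomClass from the question.
public class CsvRecord
{
    public string field1 = String.Empty;
    public readonly string field2 = String.Empty; // not used yet
    public string field3 = String.Empty;
    public readonly string field4 = "N/A";        // not used yet

    public string ToCsvFields()
    {
        // "this" is the instance whose field values are read via reflection.
        var values = GetType()
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .Select(f => f.GetValue(this) as string ?? String.Empty)
            // Quote every value and double any embedded quote characters.
            .Select(v => "\"" + v.Replace("\"", "\"\"") + "\"");
        return String.Join(",", values);
    }
}
```

If the column order must be exact, listing the fields explicitly (or ordering them by a custom attribute) would be a safer design than relying on reflection order.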

Find comments in text and replace them using Regex

I currently go through all my source files and read their text with File.ReadAllLines, and I want to filter out all comments with one regex, basically all comment possibilities. I tried several regex solutions I found on the internet, such as this one:
@"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/"
And the top result when I google:
string blockComments = @"/\*(.*?)\*/";
string lineComments = @"//(.*?)\r?\n";
string strings = @"""((\\[^\n]|[^""\n])*)""";
string verbatimStrings = @"@(""[^""]*"")+";
See: Regex to strip line comments from C#
The second solution won't recognize any comments.
That's what I currently do:
public static List<string> FormatList(List<string> unformattedList, string dataType)
{
List<string> formattedList = unformattedList;
string blockComments = @"/\*(.*?)\*/";
string lineComments = @"//(.*?)\r?\n";
string strings = @"""((\\[^\n]|[^""\n])*)""";
string verbatimStrings = @"@(""[^""]*"")+";
string regexCS = blockComments + "|" + lineComments + "|" + strings + "|" + verbatimStrings;
//regexCS = @"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/";
string regexSQL = "";
if (dataType.Equals("cs"))
{
for(int i = 0; i < formattedList.Count;i++)
{
string line = formattedList[i];
line = line.Trim(' ');
if(Regex.IsMatch(line, regexCS))
{
line = "";
}
formattedList[i] = line;
}
}
else if(dataType.Equals("sql"))
{
}
else
{
throw new Exception("Unknown DataType");
}
return formattedList;
}
The first Method recognizes the comments, but also finds things like
string[] bla = text.Split('\\\\');
Is there any solution to this problem, so that the regex excludes matches that are inside a string or char literal? If you have any other links I should check out, please let me know!
I tried a lot and can't figure out why this won't work for me.
[I also tried these links]
https://blog.ostermiller.org/find-comment
https://codereview.stackexchange.com/questions/167582/regular-expression-to-remove-comments
Regex to find comment in c# source file
Doing this with regexes will be very difficult, as stated in the comments. However, a fine way to eliminate comments would be to use a CSharpSyntaxWalker. The syntax walker knows about all language constructs and won't make hard-to-investigate mistakes (as regexes do).
Add a reference to the Microsoft.CodeAnalysis.CSharp NuGet package and inherit from CSharpSyntaxWalker.
class CommentWalker : CSharpSyntaxWalker
{
// The depth must be Trivia: with the default Node depth the walker never reaches VisitTrivia.
public CommentWalker(SyntaxWalkerDepth depth = SyntaxWalkerDepth.Trivia) : base(depth)
{
}
public override void VisitTrivia(SyntaxTrivia trivia)
{
if (trivia.IsKind(SyntaxKind.MultiLineCommentTrivia)
|| trivia.IsKind(SyntaxKind.SingleLineCommentTrivia))
{
// Do something with the comments
// For example, find the comment location in the file, so you can replace it later.
// Make a List as a public property, so you can iterate the list of comments later on.
}
}
}
Then you can use it like so:
// Get the program text from your .cs file
SyntaxTree tree = CSharpSyntaxTree.ParseText(programText);
CompilationUnitSyntax root = tree.GetCompilationUnitRoot();
var walker = new CommentWalker();
walker.Visit(root);
// Now iterate your list of comments (probably backwards) and remove them.
Further reading:
Syntax walkers
Checking for big blocks of comments in code (NDepend, Roslyn)
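If pulling in Roslyn is not an option, the first regex from the question can still be made to behave, with caveats. The sketch below applies it to the whole file text with Regex.Replace and puts back capture group 1 (string and char literals), so only comments disappear. The CommentStripper name is made up for illustration; interpolated or raw string literals are not handled, and on files with \r\n line endings the trailing \r of a line comment is removed along with the comment.

```csharp
using System.Text.RegularExpressions;

public static class CommentStripper
{
    // Group 1 captures verbatim strings, regular strings and char literals.
    // The remaining alternatives match line comments and block comments.
    private static readonly Regex Pattern = new Regex(
        @"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/");

    public static string Strip(string source)
    {
        // "$1" re-emits a matched literal unchanged; for a comment match the
        // group is empty, so the comment is replaced by nothing.
        return Pattern.Replace(source, "$1");
    }
}
```

The important difference from the code in the question is that the pattern runs over the complete text (for example from File.ReadAllText) instead of Regex.IsMatch on individual lines, which blanks any line that merely contains a string literal.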

Use continue keyword to proceed with the loop

I am reading data from an Excel file (which is actually a comma-separated CSV file) line by line; this file is sent by an external entity. Among the columns to be read is the time, which is in 00.00 format, so a split method is used to read all the different columns. However, the file sometimes comes with extra columns (commas between the elements), so the split elements are not always correct. Below is the code used to read and split the different columns; these elements will be saved in the database.
public void SaveFineDetails()
{
List<string> erroredFines = new List<string>();
try
{
log.Debug("Start : SaveFineDetails() - Saving Downloaded files fines..");
if (!this.FileLines.Any())
{
log.Info(string.Format("End : SaveFineDetails() - DataFile was Empty"));
return;
}
using (RAC_TrafficFinesContext db = new RAC_TrafficFinesContext())
{
this.FileLines.RemoveAt(0);
this.FileLines.RemoveAt(FileLines.Count - 1);
int itemCnt = 0;
int errorCnt = 0;
int duplicateCnt = 0;
int count = 0;
foreach (var line in this.FileLines)
{
count++;
log.DebugFormat("Inserting {0} of {1} Fines..", count.ToString(), FileLines.Count.ToString());
string[] bits = line.Split(',');
int bitsLength = bits.Length;
if (bitsLength == 9)
{
string fineNumber = bits[0].Trim();
string vehicleRegistration = bits[1];
string offenceDateString = bits[2];
string offenceTimeString = bits[3];
int trafficDepartmentId = this.TrafficDepartments.Where(tf => tf.DepartmentName.Trim().Equals(bits[4], StringComparison.InvariantCultureIgnoreCase)).Select(tf => tf.DepartmentID).FirstOrDefault();
string proxy = bits[5];
decimal fineAmount = GetFineAmount(bits[6]);
DateTime fineCreatedDate = DateTime.Now;
DateTime offenceDate = GetOffenceDate(offenceDateString, offenceTimeString);
string username = Constants.CancomFTPServiceUser;
bool isAartoFine = bits[7] == "1" ? true : false;
string fineStatus = "Sent";
try
{
var dupCheck = db.GetTrafficFineByNumber(fineNumber);
if (dupCheck != null)
{
duplicateCnt++;
string ExportFileName = (base.FileName == null) ? string.Empty : base.FileName;
DateTime FileDate = DateTime.Now;
db.CreateDuplicateFine(ExportFileName, FileDate, fineNumber);
}
else
{
var adminFee = db.GetAdminFee();
db.UploadFTPFineData(fineNumber, fineAmount, vehicleRegistration, offenceDate, offenceDateString, offenceTimeString, trafficDepartmentId, proxy, false, "Imported", username, adminFee, isAartoFine, dupCheck != null, fineStatus);
}
itemCnt++;
}
catch
{
errorCnt++;
}
}
else
{
erroredFines.Add(line);
continue;
}
}
Now the problem is that this file doesn't always come with 9 elements as we expect. For example, in the sample data below the lines are not the same (ignore the first line, it contains the headers).
On the first line, FM is supposed to be part of 36DXGP instead of being a separate element, which means there is now an extra column. This brings us to the issue at hand, the time element: because of the extra comma, the time is now read as 20161216, so the split on the time element does not work at all. So what I did was read the incorrect line and check its length; if the length is not 9, add it to the error list and continue.
But my continue keyword doesn't seem to work: it gets into the else part and then goes back to read the very same error line.
I have checked the answers on Break vs Continue and they provide good examples of how continue works. I introduced the else because the format in those examples did not work for me (the else did not make any difference either). Here is the sample data.
NOTE the first line to be read starts with 96
H,1789,,,,,,,,
96/17259/801/035415,FM,36DXGP,20161216,17.39,city hall-cape town,Makofane,200,0,0
MA/80/034808/730,CA230721,20170117,17.43,malmesbury,PATEL,200,0,0,
What is it that I am doing wrong here?
I have found a way to solve my problem. There was an issue with the length of the line: the trailing comma caused an empty element. I got rid of this empty element with the following code and determined the new length:
bits = bits.Where(x => !string.IsNullOrEmpty(x)).ToArray();
int length = bits.Length;
All is well now.
I suggest you use the following overload for performance and readability reasons:
line.Split(new char[] {','}, StringSplitOptions.RemoveEmptyEntries)
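A small sketch of how the splitting options differ on the sample data; the FineLineSplitter name and its methods are made up for illustration. One thing to keep in mind: RemoveEmptyEntries drops every empty field, including empty ones in the middle of a line, which shifts the later columns. Trimming only the trailing comma is a narrower alternative.

```csharp
using System;

public static class FineLineSplitter
{
    // Plain split: a trailing comma produces an extra empty element at the end.
    public static string[] SplitKeepingEmpties(string line) => line.Split(',');

    // Drops all empty elements, wherever they occur in the line.
    public static string[] SplitDroppingEmpties(string line) =>
        line.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);

    // Removes only trailing commas, so empty fields in the middle keep their position.
    public static string[] SplitWithoutTrailingComma(string line) =>
        line.TrimEnd(',').Split(',');
}
```

For the second sample line the plain split gives 10 elements and both alternatives give 9. The first sample line still has 10 elements with every variant because of the extra FM column, so it would still end up in the error list.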

Search a string from 500k entries in txt

I have a .txt file which has about 500k entries, each separated by new line. The file size is about 13MB and the format of each line is the following:
SomeText<tab>Value<tab>AnotherValue<tab>
My problem is to find a certain "string" with the input from the program, from the first column in the file, and get the corresponding Value and AnotherValue from the two columns.
The first column is not sorted, but the second and third column values in the file are actually sorted. But, this sorting is of no good use to me.
The file is static and does not change. I was thinking to use the Regex.IsMatch() here but I am not sure if that's the best approach here to go line by line.
If the lookup time would increase drastically, I could probably go for rearranging the first column (and hence un-sorting the second & third column). Any suggestions on how to implement this approach or the above approach if required?
After locating the string, how should I fetch those two column values?
EDIT
I realized that there will be quite a few searches in the file for at least one request by the user. If I have an array of values to be found, how can I return some kind of dictionary holding the corresponding values of the found matches?
Maybe with this code:
var myLine = File.ReadAllLines(filePath) // filePath: placeholder for the path to your .txt file
.Select(line => line.Split(new [] {' ', '\t'}, StringSplitOptions.RemoveEmptyEntries))
.Single(s => s[0] == "string to find");
myLine is an array of strings that represents a row. You may also use .AsParallel() extension method for better performance.
How many times do you need to do this search?
Is the cost of some pre-processing on startup worth it if you save time on each search?
Is loading all the data into memory at startup feasible?
Parse the file into objects and stick the results into a hashtable?
I don't think Regex will help you more than any of the standard string options. You are looking for a fixed string value, not a pattern, but I stand to be corrected on that.
Update
Presuming that the "SomeText" is unique, you can use a dictionary like this
Data represents the values coming in from the file.
MyData is a class to hold them in memory.
public IEnumerable<string> Data = new List<string>() {
"Text1\tValue1\tAnotherValue1\t",
"Text2\tValue2\tAnotherValue2\t",
"Text3\tValue3\tAnotherValue3\t",
"Text4\tValue4\tAnotherValue4\t",
"Text5\tValue5\tAnotherValue5\t",
"Text6\tValue6\tAnotherValue6\t",
"Text7\tValue7\tAnotherValue7\t",
"Text8\tValue8\tAnotherValue8\t"
};
public class MyData {
public String SomeText { get; set; }
public String Value { get; set; }
public String AnotherValue { get; set; }
}
[TestMethod]
public void ParseAndFind() {
var dictionary = Data.Select(line =>
{
var pieces = line.Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
return new MyData {
SomeText = pieces[0],
Value = pieces[1],
AnotherValue = pieces[2],
};
}).ToDictionary<MyData, string>(dat =>dat.SomeText);
Assert.AreEqual("AnotherValue3", dictionary["Text3"].AnotherValue);
Assert.AreEqual("Value7", dictionary["Text7"].Value);
}
hth,
Alan
var firstFoundLine = File.ReadLines("filename").FirstOrDefault(s => s.StartsWith("string"));
if (firstFoundLine != null) // FirstOrDefault returns null when no line matches
{
char yourColumnDelimiter = '\t';
var columnValues = firstFoundLine.Split(new []{yourColumnDelimiter});
var secondColumn = columnValues[1];
var thirdColumns = columnValues[2];
}
File.ReadLines is better than File.ReadAllLines because you won't need to read the whole file, only until the matching string is found: http://msdn.microsoft.com/en-us/library/dd383503.aspx
Parse this monstrosity into some sort of database.
SQL Server/MySQL would be preferable, but if you can't use them for various reasons, SQLite or even Access or Excel could work.
Doing that a single time is not hard.
After you are done with that, searching will become easy and fast.
GetLines(inputPath).FirstOrDefault(p=>p.Split('\t')[0]=="SearchText")
private static IEnumerable<string> GetLines(string inputFile)
{
string filePath = Path.Combine(Directory.GetCurrentDirectory(),inputFile);
return File.ReadLines(filePath);
}
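For the case raised in the EDIT (many lookups per request), one option is to load the file once into a dictionary and then resolve a whole array of keys against it. This is a minimal sketch under the assumptions that the first column is unique (the last duplicate wins otherwise) and that the file fits in memory; the LookupTable, Build and Find names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LookupTable
{
    // Builds an in-memory index: first column -> { second column, third column }.
    public static Dictionary<string, string[]> Build(IEnumerable<string> lines)
    {
        var table = new Dictionary<string, string[]>(StringComparer.Ordinal);
        foreach (string line in lines)
        {
            string[] cols = line.Split('\t');
            if (cols.Length < 3) continue;               // skip malformed lines
            table[cols[0]] = new[] { cols[1], cols[2] }; // last duplicate wins
        }
        return table;
    }

    // Returns only the keys that were found, each with its two column values.
    public static Dictionary<string, string[]> Find(
        Dictionary<string, string[]> table, IEnumerable<string> keys)
    {
        var found = new Dictionary<string, string[]>();
        foreach (string key in keys)
        {
            string[] values;
            if (table.TryGetValue(key, out values))
                found[key] = values;
        }
        return found;
    }
}
```

The table would be built once at startup, for example with LookupTable.Build(File.ReadLines(path)), after which each lookup is a hash lookup instead of a scan over 500k lines.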

Searching a String for a certain thing, then removing up to a certain point to a list

I'm working on an automatic downloader of sorts for personal use. So far I have managed to set up the program to store the source of the provided link in a string; the links to the downloads are written in plain text in the source. So what I need to do is search the string for, say, "http://media.website.com/folder/" and have it return all occurrences as a list. The problem is that I also need the unique ID given for each file after the /folder/ to be stored with each occurrence. Any ideas? I'm using Visual C#.
Thanks!!!
Steven
Maybe something like this?
Dictionary<string, string> dictionary = new Dictionary<string, string>();
string searchText = "Text to search here";
string textToFind = "Text to find here";
string fileID = "";
bool finished = false;
int foundIndex = 0;
while (!finished)
{
    foundIndex = searchText.IndexOf(textToFind, foundIndex);
    if (foundIndex == -1)
    {
        finished = true;
    }
    else
    {
        //get fileID, change to whatever logic makes sense, in this example
        //it assumes a 2 character identifier following the search text
        fileID = searchText.Substring(foundIndex + textToFind.Length, 2);
        //indexer instead of Add, so a repeated ID does not throw
        dictionary[fileID] = textToFind;
        //move past this occurrence, otherwise the same match is found forever
        foundIndex += textToFind.Length;
    }
}
Use Regex to get the matches; that will give you a list of all the matches. Use a pattern for the numeric value that differs between matches, so you can parse it out.
I'm not great with Regex, but it'd be something like,
Regex.Matches(<your string>, @"(http://media\.website\.com/folder/)(\d+)")
Or
var textToFind = "http://media.website.com/folder/";
var ids = from l in listOfUrls where l.StartsWith(textToFind) select new { RawUrl = l, ID=l.Substring(textToFind.Length)};
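The regex suggestion can be fleshed out roughly as follows. This is a sketch that assumes the ID after /folder/ consists only of digits, which the question does not confirm; the LinkFinder and FindLinks names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkFinder
{
    // Returns one (ID, full URL) pair per occurrence, in the order they appear in the page source.
    public static List<KeyValuePair<string, string>> FindLinks(string pageSource)
    {
        // Assumption: the ID is a run of digits directly after "/folder/".
        var pattern = new Regex(@"http://media\.website\.com/folder/(\d+)");
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in pattern.Matches(pageSource))
        {
            links.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Value));
        }
        return links;
    }
}
```

A list of pairs is used rather than a dictionary so that the same ID appearing twice in the page does not cause an error.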
