C#: Reading a file and pulling out specific lines

Thanks in advance. What I currently have is a file dialog box for selecting a text file. The text file contents will look like example.txt, and they need to look like output.txt. NOTE: the CoreDBConnectString= entry is a single line that runs all the way to ;Database=Source_DB.
example.txt
[Information]
Date=
CreatedBy=
Unique=eqwe-asd-123-as12-3
CoreDataSource=
CoreDBConnectString=Provider=SQLOLEDB.1;Server=Server;Integrated Security=SSPI;Database=Source_DB
NoteDataSource=SQLServer
NoteDBConnectString=Provider=Provider=SQLOLEDB.1;Server=Server;Integrated Security=SSPI;Database=Source_DB
CoreDBCaseID=99
NoteDBCaseID=99
Output.txt
Table=99 (comes from CoreDBCaseID)
Server=Server (comes from the string CoreDBConnectString=)
Security=SSPI (comes from the string CoreDBConnectString=)
Database=Source_DB (comes from the string CoreDBConnectString=)

You can do something like this:
using System;
using System.IO;
using System.Text.RegularExpressions;

// Load the file contents
string contents = File.ReadAllText("example.txt");
// Obtain the data using regular expressions
string id = Regex.Match(
    contents,
    @"CoreDBCaseID=(?<id>\d+)").Groups["id"].Value;
string server = string.Empty; // Add regex here
string security = string.Empty; // Add regex here
string database = string.Empty; // Add regex here
// Save the data in the new format
string[] data = new string[] {
    String.Format("Table={0}", id),
    String.Format("Server={0}", server),
    String.Format("Security={0}", security),
    String.Format("Database={0}", database)
};
File.WriteAllLines("output.txt", data);
And a quick way to learn those regular expressions:
Regular Expression Cheat Sheet
Regular-Expressions.info
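You can use a positive lookahead to stop matching at the ; character. Using [^;]+ rather than \D+ for the value keeps a greedy match from running past the first ; (for example, Server=... would otherwise swallow everything up to the last ; in the line). Something like this:
@"Security=(?<security>[^;]+)(?=;)"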
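Putting those pieces together, here is a sketch that fills in the three "Add regex here" placeholders. It is an illustrative variant, not the answerer's code: the names IniExtractor, Extract and ConnectValue are hypothetical. It first isolates the CoreDBConnectString line so that the identically keyed NoteDBConnectString entry is ignored, and it uses [^;]* without a lookahead, because Database= is the last entry and has no trailing ;.

```csharp
using System.Text.RegularExpressions;

public static class IniExtractor
{
    // Builds the four output lines (Table, Server, Security, Database) from example.txt's contents.
    public static string[] Extract(string contents)
    {
        // CoreDBCaseID=99 -> "99" (^ with Multiline anchors the key at the start of a line)
        string id = Regex.Match(contents, @"^CoreDBCaseID=(?<id>\d+)", RegexOptions.Multiline)
                         .Groups["id"].Value;

        // Take only the CoreDBConnectString value, stopping before any \r or \n.
        string connect = Regex.Match(contents, @"^CoreDBConnectString=(?<cs>[^\r\n]*)", RegexOptions.Multiline)
                              .Groups["cs"].Value;

        return new[]
        {
            "Table=" + id,
            "Server=" + ConnectValue(connect, "Server"),
            "Security=" + ConnectValue(connect, "Security"), // matches inside "Integrated Security=SSPI"
            "Database=" + ConnectValue(connect, "Database"),
        };
    }

    // Returns the text after "key=" up to the next ';' or the end of the connection string.
    private static string ConnectValue(string connect, string key)
    {
        return Regex.Match(connect, Regex.Escape(key) + @"=(?<v>[^;]*)").Groups["v"].Value;
    }
}
```

Usage: File.WriteAllLines("output.txt", IniExtractor.Extract(File.ReadAllText("example.txt")));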

That is an INI file, which is very old school. Nowadays, we programming folks use XML instead. Consequently, there is no built-in support for INI files in C#.
You can P/Invoke GetPrivateProfileString to read the data.
You can use WritePrivateProfileString to write the new data out, if you don't mind the [Information] section header, like so:
[Information]
Table=99
Server=Server
Security=SSPI
Database=Source_DB
This CodeProject article on INI-file handling with C# may help.

I would use a combination of StreamReader.ReadLine and Regex to read each line and extract the appropriate information.
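A minimal sketch of that combination, assuming the file only contains [Section] headers and simple Key=Value lines. The names IniLineReader, Read and ParseConnectionString are hypothetical. Each line is read with ReadLine and matched against a Key=Value regex; the connection string is then split on ; and =.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class IniLineReader
{
    // Key is everything before the first '='; section headers like [Information] never match.
    private static readonly Regex KeyValue = new Regex(@"^(?<key>[^=\[\];]+)=(?<value>.*)$");

    // Reads the file one line at a time into a case-insensitive Key -> Value map.
    public static Dictionary<string, string> Read(TextReader reader)
    {
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match m = KeyValue.Match(line.Trim());
            if (m.Success)
                values[m.Groups["key"].Value.Trim()] = m.Groups["value"].Value.Trim();
        }
        return values;
    }

    // Splits "A=1;B=2" into a case-insensitive map (no support for quoted values).
    public static Dictionary<string, string> ParseConnectionString(string connect)
    {
        var parts = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string pair in connect.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] kv = pair.Split(new[] { '=' }, 2); // split on the first '=' only
            if (kv.Length == 2)
                parts[kv[0].Trim()] = kv[1].Trim();
        }
        return parts;
    }
}
```

For example, pass new StreamReader(path) to Read, take values["CoreDBCaseID"] for Table=, and ParseConnectionString(values["CoreDBConnectString"]) gives "Server", "Integrated Security" and "Database". For connection strings with quoted values, System.Data.Common.DbConnectionStringBuilder is a sturdier choice than the plain split.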

Open the file and read it in one line at a time. You can use Regular Expressions (cheat sheet) to match and parse only the text you want to find.

INI files are old school, so someone has already written a class for them: http://www.codeproject.com/KB/cs/readwritexmlini.aspx. The linked class reads XML, INI, Registry, and Config files.

Related

What is the best way to check a .TXT extension file for CSV format data?

I need to export and import TXT files filled with CSV-format data, and I want to do it in MVC4. What is the best approach?
The TXT files can contain a large amount of CSV-format data.
Just run it through a CSV parser (I've used this one in the past - worked fine) and check that it makes semantic sense, and has the same number of columns on each row. That would be very unlikely if it wasn't CSV data. Note: columns != commas - you need to watch out for quoted data "like, this", and line-breaks - both of which a parser will help you with. You cannot just Split by ',' or use line-endings to detect rows - CSV is more complex than that.
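The answer doesn't name its parser here, so as an assumption this sketch uses TextFieldParser from Microsoft.VisualBasic.FileIO (part of .NET Framework, and available in .NET Core 3.0 and later) to apply the check described above: every record must parse, and every row must have the same number of columns. The class name CsvCheck and the minColumns threshold are hypothetical.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser

public static class CsvCheck
{
    // True when every record parses as CSV and all records have the same number of
    // fields (at least minColumns, so a plain one-column text file does not pass).
    public static bool LooksLikeCsv(TextReader reader, int minColumns = 2)
    {
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // "like, this" stays a single field

            int expected = -1;
            try
            {
                while (!parser.EndOfData)
                {
                    string[] fields = parser.ReadFields();
                    if (fields == null) break;
                    if (expected < 0) expected = fields.Length;   // first row sets the column count
                    else if (fields.Length != expected) return false;
                }
            }
            catch (MalformedLineException)
            {
                return false; // e.g. a quoted field that is never closed
            }
            return expected >= minColumns;
        }
    }
}
```

Call it with a StreamReader over the uploaded file. Passing this check only says the file has a consistent CSV shape; you still need to check that the values make semantic sense.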
If all you want is to check the file extension, then endsWith or split will do the trick (the examples below are Java).
Using endsWith
String myFile = "some.file.txt";
System.out.println(myFile.endsWith(".txt"));
Using split
String myFile = "some.file.txt";
String[] myFileArray = myFile.split("\\.(?=[^\\.]+$)");
if (myFileArray[myFileArray.length - 1].equalsIgnoreCase("txt")) {
System.out.println("Ends with .txt");
}
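Since the question is about C# (MVC4), a rough C# equivalent of the extension check is Path.GetExtension compared case-insensitively. A short sketch; the ExtensionCheck class name is hypothetical:

```csharp
using System;
using System.IO;

public static class ExtensionCheck
{
    // True when the path ends in ".txt", ignoring case ("DATA.TXT" counts too).
    public static bool IsTxt(string path)
    {
        return string.Equals(Path.GetExtension(path), ".txt", StringComparison.OrdinalIgnoreCase);
    }
}
```

Like the Java version, this only looks at the file name and says nothing about whether the contents are CSV.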

Data processing puzzle/headache

I have a CSV file I need to process which is a bit of a nightmare. Essentially, it is the following:
"Id","Name","Description"
"1","Test1","Test description text"
"2","Test2","<doc><style>body{font-family:"Calibri","sans-serif";}</style><p class="test_class"
name="test_name">Lots of word xdoc content here.</p></doc>"
"guid-xxxx-xxxx-xxxx-xxxx","Test3","Test description text 3"
I'm using the FileHelpers library to process the CSV rather than reinvent the wheel. However, because the description field contains unescaped Word xdoc XML with quotes in it, the parser gets confused about where each record starts and ends.
The following is an example mapping class.
using FileHelpers;

[DelimitedRecord(","), IgnoreFirst(1), IgnoreEmptyLines()]
public class CSVDoc
{
    #region Properties
    [FieldQuoted('"', QuoteMode.AlwaysQuoted), FieldTrim(TrimMode.Both)]
    public string Id;
    [FieldQuoted('"', QuoteMode.AlwaysQuoted), FieldTrim(TrimMode.Both)]
    public string Name;
    [FieldQuoted('"', QuoteMode.AlwaysQuoted), FieldTrim(TrimMode.Both)]
    public string Description;
    #endregion
}
I considered (despite my hate of regex for this kind of task) replacing all " with ' and then using the pattern ((?<=(^|',))'|'(?=($|,'))) to replace ' with " at the start and end of lines and wherever they are formatted ','. However, the dirty file contains some lines that end with a " and some CSS style attributes that are formatted ",".
So now I'm left scratching my head trying to figure out how to do this and how it can be automated.
Any ideas?
You're going to have to re-invent the wheel, because that's not valid CSV or indeed a reasonable file at all - it doesn't have any sort of provably consistent escaping rules (e.g. we don't know if the plain-text columns are escaped correctly or not).
Your best bet is to ask whoever produces this file to fix the bug; the line should look like this, for example:
"2","Test2","<doc><style>body{font-family:""Calibri"",""sans-serif"";}</style><p class=""test_class""
name=""test_name"">Lots of word xdoc content here.</p></doc>"
Which your parser should handle fine, and which should not be hard for them to produce in a simple and efficient manner.
Failing that, you'll have to hand-code the parser to:
Read a line.
Check for an unescaped " (any " that isn't followed by another ", a comma, or whitespace).
If none found, parse as CSV.
If any found, parse as this horrible thing until you hit the line ending with "
It may be easier to look for < if that is consistently not used in the other lines. Or perhaps for <doc if it consistently identifies the correct rows.
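Step 2 of the list above can be sketched as follows. This is a heuristic, not a full CSV parser. Besides the rule in the list (a " must be followed by another ", a comma, or whitespace), it also treats a " at the start of the line or right after a comma as an opening quote, which the list leaves implicit. The names DirtyCsv and HasUnescapedQuote are hypothetical.

```csharp
public static class DirtyCsv
{
    // Returns true if the line contains a bare " inside a field, i.e. a quote that is
    // not an opening quote, not a doubled "" escape, and not a closing quote.
    public static bool HasUnescapedQuote(string line)
    {
        for (int i = 0; i < line.Length; i++)
        {
            if (line[i] != '"') continue;

            bool atEnd = i + 1 >= line.Length;
            char next = atEnd ? '\0' : line[i + 1];

            if (next == '"') { i++; continue; }                              // "" escape: skip the pair
            if (i == 0 || line[i - 1] == ',') continue;                      // opening quote of a field
            if (atEnd || next == ',' || char.IsWhiteSpace(next)) continue;   // closing quote
            return true;                                                     // bare " mid-field
        }
        return false;
    }
}
```

Lines for which it returns false can go straight to the normal CSV parser; lines for which it returns true need the special handling described in the list. A field that begins with an escaped quote (for example ,"""x""") is reported as a false positive.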
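If you don't mind doing some pre-processing first, you can change the first and second "," on each line to "|" and then use FileHelpers to parse the file normally (assuming you don't have | in the last column, where the HTML tags are).
The pre-processing could be something like this (simple sketch):
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

string[] textFileLines = File.ReadAllLines("data.csv"); // hypothetical path
var sb = new StringBuilder();
var regex = new Regex("\",\"");
foreach (string line in textFileLines)
{
    // Replace only the first two "," separators; later "," inside the HTML stay untouched
    sb.AppendLine(regex.Replace(line, "\"|\"", 2));
}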
I worked on the CSV-1203 File Format standard a few months ago, so the first thing to realise is that you're not dealing with a CSV file - even though it's named "xyz.CSV".
As said by others here, it will be easier to write your own reader, they're not too difficult. I too have a hatred of everything regex, but the good news is you can code any solution without ever using it.
A couple of things: There's a really weird thing Excel does to CSV files that begin with the two capital letters ID (without quotes). It thinks your CSV is a corrupted SYLK file! Try it.
For details of this issue and a detailed CSV File Format specification, please refer to http://mastpoint.curzonnassau.com/csv-1203

Inserting a line at a specific place in a .txt file using .NET

The process I currently use to insert a string into a text file is to read the file and modify it as I write the file out again which seems to be how everyone is doing it.
Since .NET has such a diverse library, I was wondering if there was a specific command for inserting a string at a specific line in a .txt file.
I would imagine it would look something like this:
dim file as file
file.open("filePath")
file.insert(string, location) 'this isn't actually an option, I tried
No, there's nothing specifically to do that in .NET.
You can do it very simply if you're happy to read all the lines of text in:
using System.IO;
using System.Linq;
var lines = File.ReadAllLines("file.txt").ToList();
lines.Insert(location, text);
File.WriteAllLines("file.txt", lines);
It would be more efficient in terms of memory to write code to copy a line at a time, inserting the new line at the right point while you copy, but that would be more work. If you know your file will be small, I'd go for the above.
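The more memory-efficient approach, copying a line at a time, can be sketched like this. It is a sketch under assumptions: the LineInserter name and the path + ".tmp" temp-file naming are hypothetical, and it appends the text when location is past the last line, whereas List<T>.Insert would throw an ArgumentOutOfRangeException.

```csharp
using System.IO;

public static class LineInserter
{
    // Inserts `text` as a new line before the existing line at zero-based `location`,
    // copying the file one line at a time so it is never fully loaded into memory.
    public static void InsertLine(string path, int location, string text)
    {
        string tempPath = path + ".tmp"; // hypothetical temp-file name next to the original
        bool inserted = false;

        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            int index = 0;
            while ((line = reader.ReadLine()) != null)
            {
                if (index == location)
                {
                    writer.WriteLine(text);
                    inserted = true;
                }
                writer.WriteLine(line);
                index++;
            }
            if (!inserted)
            {
                writer.WriteLine(text); // location is past the last line (or negative): append
            }
        }

        // Swap the rewritten copy in place of the original.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

StreamWriter writes UTF-8 by default, so a file in another encoding should be read and written with a matching Encoding.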
You could simply read all the text into a string, then use String.Insert (note that startIndex is a character offset into the text, not a line number), e.g.
File.WriteAllText("file.txt", File.ReadAllText("file.txt").Insert(startIndex, "value"));

Edit resource files and search string in .cs files.

How can I automate searching for strings in all .cs files and adding code for localization, so that I can use a key from the resource files? Let's say there is a
string s = "A";
in the .cs files. I need to change it to something like
string s = ("A","ResourceFileKey")
and then add keys to the resource file with country-specific values. Is there any tool available? Presently, I am using macros and searching ...
If you just want to get all string literals out of your C# code to put them into your resource file, I suggest parsing not your C# code but the IL code generated by the C# compiler; that is much (!) easier.
Here is a helpful link with some code showing how to parse IL code:
http://www.codeproject.com/KB/cs/sdilreader.aspx
That, of course, does not solve your problem how to modify your existing code.
You can write your own. It's a simple String.Replace call.
Read your file with a StreamReader and call its ReadToEnd method to get the contents as a string. Then call String.Replace on it, which returns a new, modified string. Finally, overwrite the file with the new string.
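Applied across a whole project, the steps above can be sketched like this. SourceRewriter and ReplaceInCsFiles are hypothetical names, and the replacement text (for example a property such as Resources.ResourceFileKey) is an assumption that depends on how your resources are accessed.

```csharp
using System.IO;

public static class SourceRewriter
{
    // Replaces every occurrence of oldText with newText in all .cs files under root
    // (including subfolders) and returns how many files were changed.
    public static int ReplaceInCsFiles(string root, string oldText, string newText)
    {
        int changed = 0;
        foreach (string file in Directory.GetFiles(root, "*.cs", SearchOption.AllDirectories))
        {
            string contents;
            using (var reader = new StreamReader(file))
            {
                contents = reader.ReadToEnd();
            }

            string updated = contents.Replace(oldText, newText);
            if (updated != contents)
            {
                File.WriteAllText(file, updated); // only rewrite files that actually changed
                changed++;
            }
        }
        return changed;
    }
}
```

A plain String.Replace also rewrites matches inside comments and unrelated literals, so review the result before committing. File.WriteAllText writes UTF-8 without a byte-order mark, which may differ from how the files were originally saved.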

String manipulation in C#: split on `/`

I need to extract headstone data from an inscription file (structured text file). From this file I am supposed to extract the name of a deceased person, date of birth (or age) and also personal messages. The application should be able to analyse a raw text file and then extract information and display it in tabular form.
The raw text file looks like this:
In loving memory of/JOHN SMITH/who died on 13.02.07/at age 40/we will
miss you/In loving memory of/JANE AUSTIN/who died on 10.06.98/born on
19.12.80/hope you are well in heaven.
Basically, / is a delimiter, and the name of a deceased person is always in capital letters. I have tried to use String.Split() and Substring methods, but I can't get them to work for me; I can only get the raw data without the delimiter (Environment.NewLine), and I don't know how to extract specific information.
You'll need something like:
Open your data file (System.IO)
For each line (tip: pick a stream where you can read line by line):
Split it by "/", getting a string[]
Arrays in C# start at 0, so splitted[1] will be the name, and so on
Store that information in a managed collection (perhaps a generic List<T>)
Display that collection in a tabular view
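The steps above can be sketched as follows, with two assumptions drawn from the sample rather than from any specification. First, records may wrap across physical lines (the sample breaks "we will miss you" across two lines), so the sketch joins the lines before splitting on '/' instead of processing each line separately. Second, each record opens with the fixed phrase "In loving memory of", which it skips. Headstone, InscriptionParser, Parse and After are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One deceased person's details; any field may be missing from the inscription.
public class Headstone
{
    public string Name;
    public string DiedOn;
    public string BornOn;
    public string Age;
    public List<string> Messages = new List<string>();
}

public static class InscriptionParser
{
    private const string Intro = "In loving memory of"; // assumed fixed opening phrase

    public static List<Headstone> Parse(string text)
    {
        // Rejoin wrapped lines with a space, then split on the '/' delimiter.
        string flat = string.Join(" ", text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries));

        var result = new List<Headstone>();
        Headstone current = null;
        foreach (string raw in flat.Split('/'))
        {
            string part = raw.Trim();
            if (part.Length == 0 || part.StartsWith(Intro, StringComparison.OrdinalIgnoreCase))
                continue;

            string value;
            if (part.Any(char.IsLetter) && part == part.ToUpperInvariant())
            {
                // A segment written entirely in capitals is a name and starts a new record.
                current = new Headstone { Name = part };
                result.Add(current);
            }
            else if (current == null) continue;                                   // text before the first name
            else if ((value = After(part, "who died on ")) != null) current.DiedOn = value;
            else if ((value = After(part, "born on ")) != null) current.BornOn = value;
            else if ((value = After(part, "at age ")) != null) current.Age = value;
            else current.Messages.Add(part);                                      // anything else is a personal message
        }
        return result;
    }

    // Returns the rest of s after prefix, or null if s does not start with prefix.
    private static string After(string s, string prefix)
    {
        return s.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) ? s.Substring(prefix.Length).Trim() : null;
    }
}
```

The resulting List<Headstone> can then be bound to a grid or printed row by row for the tabular display.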
Check out How to: Use Regular Expressions to Extract Data Fields. I know the example is in C++, but the Regex and Match classes should help you get started.
Here is another good example of parsing out individual sentences.
I would have thought something like this would do the trick
using System.IO;

private static void GetHeadStoneData()
{
    using (StreamReader read = new StreamReader("YourFile.txt"))
    {
        const char Delimiter = '/';
        string data;
        while ((data = read.ReadLine()) != null)
        {
            string[] headStoneInformation = data.Split(Delimiter);
            // Do your stuff here, e.g. Console.WriteLine("{0}", headStoneInformation[0]);
        }
    }
}
