Eead text file with fixed columns in C# - c#

is there any way to read text files with fixed columns in C # without using regex and substring?
I want to read a file with fixed columns and transfer the column to an excel file (.xlsx)
example 1
POPULACAO
MUNICIPIO UF CENSO 2010
AC 78.507
AC 15.100
Rio Branco AC 336.038
Sena Madureira AC 38.029
example 2
POPULACAO
MUNICIPIO UF CENSO 2010
AC 78.507
Epitaciolândia AC 15.100
Rio Branco AC 336.038
Sena Madureira AC 38.029
remembering that I have a case as in the second example where a column is blank, I can get the columns and the values ​​using regex and / or substring, but if it appears as a file in Example 2, with the regex line of the file is ignored, so does substring.

Assuming you mean "fixed columns" extremely literally, and every single non-terminal column is exactly the same width, each column is separated by exactly one space, yes, you can get away with using neither regex or substring. If that's the case - and bear in mind that's also suggesting that every single person in the database has a name that's exactly four letters long - then you can just read the file in by lines. Id would be line[0].ToString(), name would be new string(new char[] { line[2], line[3], line[4], line[5]), etc.
Or, for any given value:
var str = new StringBuilder();
for (int i = firstIndex; i < lastIndex; i++)
{
str.Append(line[i]);
}
But this is basically just performing the exact function of Substring. Substring isn't your problem - handling empty values in the first (city) column is. So, for any given line, you need to check whether the line is empty:
foreach (line in yourLines)
{
if (line.Substring(cityStartIndex, cityEndIndex).IsNullOrWhitespace) == "")
{
continue;
}
}
Alternately, if you're sure the city name will always be at the very first index of the line:
foreach (line in yourLines)
{
if (line[0] == ' ') { continue; }
}
And if the value you got from the city cell was valid, you'd store that value and continue on to using Substring with the indices of the rest of the values in the row.

If for whatever reason you don't want to use a regular expression or Substring(), you have a couple of other options:
String.Split, e.g. var columns = line.Split(' ');
String.Chars, using the known widths of each column to build your output;

Why not just use string.Split()?
Something like:
using (StreamReader stream = new StreamReader(file)) {
while (!stream.EndOfStream) {
string line = stream.ReadLine();
if (string.IsNullOrWhitespace(line))
continue;
string[] fields = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
int ID = -1, age = -1;
string name = null, training = null;
ID = int.Parse(fields[0]);
if (fields.Length > 1)
name = fields[1];
if (fields.Length > 2)
age = int.Parse(fields[2]);
if (fields.Length > 3)
training = fields[3];
// do stuff
}
}
Only downside to this is that it will allow fields of arbitrary length. And spaces in fields will break the fields.
As for regular expressions being ignored in the last case, try something like:
Match m = Regex.Match(line, #"^(.{2}) (.{4}) (.{2})( +.+?)?$");

First - define a variable for each column in the file. Then go through the file line by line and assign each column to the correct variable. Substitute the correct start positions and lengths. This should be enough information to get you started parsing your file.
private string id;
private string name;
private string age;
private string training;
while((line = file.ReadLine()) != null)
{
id = line.Substring(0, 3)
name = line.Substring(3, 10)
age = line.Substring(12, 2)
training = line.Substring(14, 10)
...
if (string.IsNullOrWhiteSpace(name))
{
// ignore this line if the name is blank
}
else
{
// do something useful
}
counter++;
}

Related

Use continue key word to processed with the loop

I am reading data from excel file(which is actually a comma separated csv file) columns line-by-line, this file gets send by an external entity.Among the columns to be read is the time, which is in 00.00 format, so a split method is used read all the different columns, however the file sometimes comes with extra columns(commas between the elements) so the split elements are now always correct. Below is the code used to read and split the different columns, this elements will be saved in the database.
public void SaveFineDetails()
{
List<string> erroredFines = new List<string>();
try
{
log.Debug("Start : SaveFineDetails() - Saving Downloaded files fines..");
if (!this.FileLines.Any())
{
log.Info(string.Format("End : SaveFineDetails() - DataFile was Empty"));
return;
}
using (RAC_TrafficFinesContext db = new RAC_TrafficFinesContext())
{
this.FileLines.RemoveAt(0);
this.FileLines.RemoveAt(FileLines.Count - 1);
int itemCnt = 0;
int errorCnt = 0;
int duplicateCnt = 0;
int count = 0;
foreach (var line in this.FileLines)
{
count++;
log.DebugFormat("Inserting {0} of {1} Fines..", count.ToString(), FileLines.Count.ToString());
string[] bits = line.Split(',');
int bitsLength = bits.Length;
if (bitsLength == 9)
{
string fineNumber = bits[0].Trim();
string vehicleRegistration = bits[1];
string offenceDateString = bits[2];
string offenceTimeString = bits[3];
int trafficDepartmentId = this.TrafficDepartments.Where(tf => tf.DepartmentName.Trim().Equals(bits[4], StringComparison.InvariantCultureIgnoreCase)).Select(tf => tf.DepartmentID).FirstOrDefault();
string proxy = bits[5];
decimal fineAmount = GetFineAmount(bits[6]);
DateTime fineCreatedDate = DateTime.Now;
DateTime offenceDate = GetOffenceDate(offenceDateString, offenceTimeString);
string username = Constants.CancomFTPServiceUser;
bool isAartoFine = bits[7] == "1" ? true : false;
string fineStatus = "Sent";
try
{
var dupCheck = db.GetTrafficFineByNumber(fineNumber);
if (dupCheck != null)
{
duplicateCnt++;
string ExportFileName = (base.FileName == null) ? string.Empty : base.FileName;
DateTime FileDate = DateTime.Now;
db.CreateDuplicateFine(ExportFileName, FileDate, fineNumber);
}
else
{
var adminFee = db.GetAdminFee();
db.UploadFTPFineData(fineNumber, fineAmount, vehicleRegistration, offenceDate, offenceDateString, offenceTimeString, trafficDepartmentId, proxy, false, "Imported", username, adminFee, isAartoFine, dupCheck != null, fineStatus);
}
itemCnt++;
}
catch
{
errorCnt++;
}
}
else
{
erroredFines.Add(line);
continue;
}
}
Now the problem is, this file doesn't always come with 9 elements as we expect, for example on this image, the lines are not the same(ignore first line, its headers)
On first line FM is supposed to be part of 36DXGP instead of being two separated elements. This means the columns are now extra. Now this brings us to the issue at hand, which is the time element, beacuse of extra coma, the time is now something else, is now read as 20161216, so the split on the time element is not working at all. So what I did was, read the incorrect line, check its length, if the length is not 9 then, add it to the error list and continue.
But my continue key word doesn't seem to work, it gets into the else part and then goes back to read the very same error line.
I have checked answers on Break vs Continue and they provide good example on how continue works, I introduced the else because the format on this examples did not work for me(well the else did not made any difference neither). Here is the sample data,
NOTE the first line to be read starts with 96
H,1789,,,,,,,,
96/17259/801/035415,FM,36DXGP,20161216,17.39,city hall-cape town,Makofane,200,0,0
MA/80/034808/730,CA230721,20170117,17.43,malmesbury,PATEL,200,0,0,
what is it that I am doing so wrong here
I have found a way to solve my problem, there was an issue with the length of the line because of the trailing comma which caused an empty element, I then got rid of this empty element with this code and determined the new length
bits = bits.Where(x => !string.IsNullOrEmpty(x)).ToArray();
int length = bits.Length
All is well now
I suggest you use the following overload for performance and readability reasons:
line.Split(new char[] {','}, StringSplitOptions.RemoveEmptyEntries)l

C# reading multiple lines into a single variable

I have looked but failed to find a tutorial that would address my question. Perhaps I didn't word my searches correctly. In any event, I am here.
I have taken over a handyman company and he had about 150 customers. He had a program that he bought that produces records for his customers. I have the records, but he wouldn't sell me the program as it is commercial and he's afraid of going to prison for selling something like that... whatever... They are written in a text file. The format appears to be:
string name
string last_job_description
int job_codes
string job_address
string comments
The text file looks like this
*Henderson*
*Cleaned gutters, fed the lawn, added mulch to the front flower beds,
cut the back yard, edged the side walk, pressure washed the driveway*
*04 34 32 1 18 99 32 22 43 72 11 18*
*123 Anywhere ave*
*Ms.always pays cash no tip. Mr. gives you a check but always tips*
Alright.. My question is in C# I want to write a program to edit these records, add new customers and delete some I may lose, moves, or dies... But the 2nd entry is broken over two lines, sometimes 3 and 4 lines. They all start and end with *. So, how do I read the 2 to 4 lines and get them into the last_job_description string variable? I can write the class, I can read lines, I can trim away the asterisks. I can find nothing on reading multiple lines into a single variable.
Let's do it right!
First define the customer model:
public class Customer
{
public string Name { get; set; }
public string LastJobDescription { get; set; }
public List<int> JobCodes { get; set; }
public string JobAddress { get; set; }
public string Comments { get; set; }
}
Then, we need a collection of customers:
var customers = new List<Customer>();
Fill the collection with data from the file:
string text = File.ReadAllText("customers.txt");
string pattern = #"(?<= ^ \*) .+? (?= \* \r? $)";
var options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled
| RegexOptions.Singleline | RegexOptions.Multiline;
var matches = Regex.Matches(text, pattern, options);
for (int i = 0; i < matches.Count; i += 5)
{
var customer = new Customer
{
Name = matches[i].Value,
LastJobDescription = matches[i + 1].Value,
JobCodes = matches[i + 2].Value.Split().Select(s => int.Parse(s)).ToList(),
JobAddress = matches[i + 3].Value,
Comments = matches[i + 4].Value
};
customers.Add(customer);
}
I'm using a regular expression that allows to have the * character in the middle of the lines.
Now we can comfortably work with this collection.
Examples of usage.
Remove the first customer:
customers.RemoveAt(0);
Add a comment to the latest client:
customers.Last().Comments += " Very generous.";
Find the first record for a client by the name of Henderson and add the code of the job performed:
customers.Find(c => c.Name == "Henderson").JobCodes.Add(42);
Add new customer:
var customer = new Customer
{
Name = "Chuck Norris",
LastJobDescription= "Saved the world.",
JobCodes = new List<int>() { 1 },
JobAddress = "UN",
Comments = "Nice guy!"
};
customers.Add(customer);
And so on.
To save data to a file, use the following:
var sb = new StringBuilder();
foreach (var customer in customers)
{
sb.Append('*').Append(customer.Name).Append('*').AppendLine();
sb.Append('*').Append(customer.LastJobDescription).Append('*').AppendLine();
sb.Append('*').Append(string.Join(" ", customer.JobCodes)).Append('*').AppendLine();
sb.Append('*').Append(customer.JobAddress).Append('*').AppendLine();
sb.Append('*').Append(customer.Comments).Append('*').AppendLine();
}
File.WriteAllText("customers.txt", sb.ToString());
You probably need a graphical user interface. If so, I suggest you to ask a new question where you specify what you use: WinForms, WPF, Web-application or something else.
if you want to read all the lines in a file into a single variable then you need to do this
var txt = File.ReadAllText(.. your file location here ..);
or you can do
var lines = File.ReadAllLines(.. your file location here); and then you can iterate through each line and remove the blanks of leave as it is.
But based on your question the first line is what you're really after
when you should read last_job_description read second line. if it starts with * it means that this line is job_codes otherwise append it to pervous read line. do this fo every lines until you find job_codes.
using (var fileReader = new StreamReader("someFile"))
{
// read some pervous data here
var last_job_description = fileReader.ReadLine();
var nextLine = fileReader.ReadLine();
while (nextLine.StartsWith("*") == false)
{
last_job_description += nextLine;
nextLine = fileReader.ReadLine();
}
//you have job_codes in nextline, so parse it now and read next data
}
or you may even use the fact that each set of data starts and ENDS!!! with * so you may create function that reads each "set" and returns it as singleline string, no matter how much lines it really was. let's assume that reader variable you pass to this function is the same StreamReader as in upper code
function string readSingleValue(StreamReader fileReader){
var result = "";
do
{
result += fileReader.ReadLine().Trim();
} while (result.EndsWith("*") == false);
return result;
}
I think your problem is little complex. What i am trying to say is it doesn't mention to a specific problem. It is contained at least 3 different programming aspects that you need to know or learn to reach your goal. The steps that you must take are described below.
unfortunately you need to consider your current file as huge string type.
Then you are able to process this string and separate different parts.
Next move is defining a neat, reliable and robust XML file which can hold your data.
Fill your xml with your manipulated string.
If you have a xml file you can simply update it.
There is not built-in method that will parse your file exactly like you want, so you have to get your hands dirty.
Here's an example of a working code for your specific case. I assumed you wanted your job_codes in an array since they're multiple integers separated by spaces.
Each time we encounter a line starting with '*', we increase a counter that'll tell us to work with the next of your properties (name, last job description, etc...).
Each line is appended to the current property.
And obviously, we remove the stars from the beginning and ending of every line.
string name = String.Empty;
string last_job_description = String.Empty;
int[] job_codes = null;
string job_address = String.Empty;
string comments = String.Empty;
int numberOfPropertiesRead = 0;
var lines = File.ReadAllLines("C:/yourfile.txt");
for (int i = 0; i < lines.Count(); i++)
{
var line = lines[i];
bool newProp = line.StartsWith("*");
bool endOfProp = line.EndsWith("*");
if (newProp)
{
numberOfPropertiesRead++;
line = line.Substring(1);
}
if (endOfProp)
line = line.Substring(0, line.Length - 1);
switch (numberOfPropertiesRead)
{
case 1: name += line; break;
case 2: last_job_description += line; break;
case 3:
job_codes = line.Split(' ').Select(el => Int32.Parse(el)).ToArray();
break;
case 4: job_address += line; break;
case 5: comments += line; break;
default:
throw new ArgumentException("Wow, that's too many properties dude.");
}
}
Console.WriteLine("name: " + name);
Console.WriteLine("last_job_description: " + last_job_description);
foreach (int job_code in job_codes)
Console.Write(job_code + " ");
Console.WriteLine();
Console.WriteLine("job_address: " + job_address);
Console.WriteLine("comments: " + comments);
Console.ReadLine();

Parse Text File Into Dictionary

I have a text file that has several hundred configuration values. The general format of the configuration data is "Label:Value". Using C# .net, I would like to read these configurations, and use the Values in other portions of the code. My first thought is that I would use a string search to look for the Labels then parse out the values following the labels and add them to a dictionary, but this seems rather tedious considering the number of labels/values that I would have to search for. I am interested to hear some thoughts on a possible architecture to perform this task. I have included a small section of a sample text file that contains some of the labels and values (below). A couple of notes: The Values are not always numeric (as seen in the AUX Serial Number); For whatever reason, the text files were formatted using spaces (\s) rather than tabs (\t). Thanks in advance for any time you spend thinking about this.
Sample Text:
AUX Serial Number: 445P000023 AUX Hardware Rev: 1
Barometric Pressure Slope: -1.452153E-02
Barometric Pressure Intercept: 9.524336E+02
This is a nice little brain tickler. I think this code might be able to point you in the right direction. Keep in mind, this fills a Dictionary<string, string>, so there are no conversions of values into ints or the like. Also, please excuse the mess (and the poor naming conventions). It was a quick write-up based on my train of thought.
Dictionary<string, string> allTheThings = new Dictionary<string, string>();
public void ReadIt()
{
// Open the file into a streamreader
using (System.IO.StreamReader sr = new System.IO.StreamReader("text_path_here.txt"))
{
while (!sr.EndOfStream) // Keep reading until we get to the end
{
string splitMe = sr.ReadLine();
string[] bananaSplits = splitMe.Split(new char[] { ':' }); //Split at the colons
if (bananaSplits.Length < 2) // If we get less than 2 results, discard them
continue;
else if (bananaSplits.Length == 2) // Easy part. If there are 2 results, add them to the dictionary
allTheThings.Add(bananaSplits[0].Trim(), bananaSplits[1].Trim());
else if (bananaSplits.Length > 2)
SplitItGood(splitMe, allTheThings); // Hard part. If there are more than 2 results, use the method below.
}
}
}
public void SplitItGood(string stringInput, Dictionary<string, string> dictInput)
{
StringBuilder sb = new StringBuilder();
List<string> fish = new List<string>(); // This list will hold the keys and values as we find them
bool hasFirstValue = false;
foreach (char c in stringInput) // Iterate through each character in the input
{
if (c != ':') // Keep building the string until we reach a colon
sb.Append(c);
else if (c == ':' && !hasFirstValue)
{
fish.Add(sb.ToString().Trim());
sb.Clear();
hasFirstValue = true;
}
else if (c == ':' && hasFirstValue)
{
// Below, the StringBuilder currently has something like this:
// " 235235 Some Text Here"
// We trim the leading whitespace, then split at the first sign of a double space
string[] bananaSplit = sb.ToString()
.Trim()
.Split(new string[] { " " },
StringSplitOptions.RemoveEmptyEntries);
// Add both results to the list
fish.Add(bananaSplit[0].Trim());
fish.Add(bananaSplit[1].Trim());
sb.Clear();
}
}
fish.Add(sb.ToString().Trim()); // Add the last result to the list
for (int i = 0; i < fish.Count; i += 2)
{
// This for loop assumes that the amount of keys and values added together
// is an even number. If it comes out odd, then one of the lines on the input
// text file wasn't parsed correctly or wasn't generated correctly.
dictInput.Add(fish[i], fish[i + 1]);
}
}
So the only general approach that I can think of, given the format that you're limited to, is to first find the first colon on the line and take everything before it as the label. Skip all whilespace characters until you get to the first non-whitespace character. Take all non-whitespace characters as the value of the label. If there is a colon after the end of that value take everything after the end of the previous value to the colon as the next value and repeat. You'll also probably need to trim whitespace around the labels.
You might be able to capture that meaning with a regex, but it wouldn't likely be a pretty one if you could; I'd avoid it for something this complex unless you're entire development team is very proficient with them.
I would try something like this:
While string contains triple space, replace it with double space.
Replace all ": " and ": " (: with double space) with ":".
Replace all " " (double space) with '\n' (new line).
If line don't contain ':' than skip the line. Else, use string.Split(':'). This way you receive arrays of 2 strings (key and value). Some of them may contain empty characters at the beginning or at the end.
Use string.Trim() to get rid of those empty characters.
Add received key and value to Dictionary.
I am not sure if it solves all your cases but it's a general clue how I would try to do it.
If it works you could think about performance (use StringBuilder instead of string wherever it is possible etc.).
This is probably the dirtiest function I´ve ever written, but it works.
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] data = line.Split(':');
if (line != String.Empty)
{
for (int i = 0; i < data.Length - 1; i++)
{
if (i != 0)
{
bool isPair;
if (i % 2 == 0)
{
isPair = true;
}
else
{
isPair = false;
}
if (isPair)
{
string keyOdd = data[i].Trim();
try { keyOdd = keyOdd.Substring(keyOdd.IndexOf(' ')).TrimStart(); }
catch { }
string valueOdd = data[i + 1].TrimStart();
try { valueOdd = valueOdd.Remove(valueOdd.IndexOf(' ')); } catch{}
yourDic.Add(keyOdd, valueOdd);
}
else
{
string keyPair = data[i].TrimStart();
keyPair = keyPair.Substring(keyPair.IndexOf(' ')).Trim();
string valuePair = data[i + 1].TrimStart();
try { valuePair = valuePair.Remove(valuePair.IndexOf(' ')); } catch { }
yourDic.Add(keyPair, valuePair);
}
}
else
{
string key = data[i].Trim();
string value = data[i + 1].TrimStart();
try { value = value.Remove(value.IndexOf(' ')); } catch{}
yourDic.Add(key, value);
}
}
}
}
How does it works?, well splitting the line you can know what you can get in every position of the array, so I just play with the even and odd values.
You will understand me when you debug this function :D. It fills the Dictionary that you need.
I have another idea. Does values contain spaces? If not you could do like this:
Ignore white spaces until you read some other char (first char of key).
Read string until ':' occures.
Trim key that you get.
Ignore white spaces until you read some other char (first char of value).
Read until you get empty char.
Trim value that you get.
If it is the end than stop. Else, go back to step 1.
Good luck.
Maybe something like this would work, be careful with the ':' character
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
yourDic.Add(line.Split(':')[0], line.Split(':')[1]);
}
Anyway, I recommend to organize that file in some way that you´ll always know in what format it comes.

How to format and read CSV file?

Here is just an example of the data I need to format.
The first column is simple, the problem the second column.
What would be the best approach to format multiple data fields in one column?
How to parse this data?
Important*: The second column needs to contain multiple values, like in an example below
Name Details
Alex Age:25
Height:6
Hair:Brown
Eyes:Hazel
A csv should probably look like this:
Name,Age,Height,Hair,Eyes
Alex,25,6,Brown,Hazel
Each cell should be separated by exactly one comma from its neighbor.
You can reformat it as such by using a simple regex which replaces certain newline and non-newline whitespace with commas (you can easily find each block because it has values in both columns).
A CSV file is normally defined using commas as field separators and CR for a row separator. You are using CR within your second column, this will cause problems. You'll need to reformat your second column to use some other form of separator between multiple values. A common alternate separator is the | (pipe) character.
Your format would then look like:
Alex,Age:25|Height:6|Hair:Brown|Eyes:Hazel
In your parsing, you would first parse the comma separated fields (which would return two values), and then parse the second field as pipe separated.
This is an interesting one - it can be quite difficult to parse specific format files which is why people often write specific classes to deal with them. More conventional file formats like CSV, or other delimited formats are [more] easy to read because they are formatted in a similar way.
A problem like the above can be addressed in the following way:
1) What should the output look like?
In your instance, and this is just a guess, but I believe you are aiming for the following:
Name, Age, Height, Hair, Eyes
Alex, 25, 6, Brown, Hazel
In which case, you have to parse out this information based on the structure above. If it's repeated blocks of text like the above then we can say the following:
a. Every person is in a block starting with Name Details
b. The name value is the first text after Details, with the other columns being delimited in the format Column:Value
However, you might also have sections with addtional attributes, or attributes that are missing if the original input was optional, so tracking the column and ordinal would be useful too.
So one approach might look like the following:
public void ParseFile(){
String currentLine;
bool newSection = false;
//Store the column names and ordinal position here.
List<String> nameOrdinals = new List<String>();
nameOrdinals.Add("Name"); //IndexOf == 0
Dictionary<Int32, List<String>> nameValues = new Dictionary<Int32 ,List<string>>(); //Use this to store each person's details
Int32 rowNumber = 0;
using (TextReader reader = File.OpenText("D:\\temp\\test.txt"))
{
while ((currentLine = reader.ReadLine()) != null) //This will read the file one row at a time until there are no more rows to read
{
string[] lineSegments = currentLine.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
if (lineSegments.Length == 2 && String.Compare(lineSegments[0], "Name", StringComparison.InvariantCultureIgnoreCase) == 0
&& String.Compare(lineSegments[1], "Details", StringComparison.InvariantCultureIgnoreCase) == 0) //Looking for a Name Details Line - Start of a new section
{
rowNumber++;
newSection = true;
continue;
}
if (newSection && lineSegments.Length > 1) //We can start adding a new person's details - we know that
{
nameValues.Add(rowNumber, new List<String>());
nameValues[rowNumber].Insert(nameOrdinals.IndexOf("Name"), lineSegments[0]);
//Get the first column:value item
ParseColonSeparatedItem(lineSegments[1], nameOrdinals, nameValues, rowNumber);
newSection = false;
continue;
}
if (lineSegments.Length > 0 && lineSegments[0] != String.Empty) //Ignore empty lines
{
ParseColonSeparatedItem(lineSegments[0], nameOrdinals, nameValues, rowNumber);
}
}
}
//At this point we should have collected a big list of items. We can then write out the CSV. We can use a StringBuilder for now, although your requirements will
//be dependent upon how big the source files are.
//Write out the columns
StringBuilder builder = new StringBuilder();
for (int i = 0; i < nameOrdinals.Count; i++)
{
if(i == nameOrdinals.Count - 1)
{
builder.Append(nameOrdinals[i]);
}
else
{
builder.AppendFormat("{0},", nameOrdinals[i]);
}
}
builder.Append(Environment.NewLine);
foreach (int key in nameValues.Keys)
{
List<String> values = nameValues[key];
for (int i = 0; i < values.Count; i++)
{
if (i == values.Count - 1)
{
builder.Append(values[i]);
}
else
{
builder.AppendFormat("{0},", values[i]);
}
}
builder.Append(Environment.NewLine);
}
//At this point you now have a StringBuilder containing the CSV data you can write to a file or similar
}
private void ParseColonSeparatedItem(string textToSeparate, List<String> columns, Dictionary<Int32, List<String>> outputStorage, int outputKey)
{
if (String.IsNullOrWhiteSpace(textToSeparate)) { return; }
string[] colVals = textToSeparate.Split(new[] { ":" }, StringSplitOptions.RemoveEmptyEntries);
List<String> outputValues = outputStorage[outputKey];
if (!columns.Contains(colVals[0]))
{
//Add the column to the list of expected columns. The index of the column determines it's index in the output
columns.Add(colVals[0]);
}
if (outputValues.Count < columns.Count)
{
outputValues.Add(colVals[1]);
}
else
{
outputStorage[outputKey].Insert(columns.IndexOf(colVals[0]), colVals[1]); //We append the value to the list at the place where the column index expects it to be. That way we can miss values in certain sections yet still have the expected output
}
}
After running this against your file, the string builder contains:
"Name,Age,Height,Hair,Eyes\r\nAlex,25,6,Brown,Hazel\r\n"
Which matches the above (\r\n is effectively the Windows new line marker)
This approach demonstrates how a custom parser might work - it's purposefully over verbose as there is plenty of refactoring that could take place here, and is just an example.
Improvements would include:
1) This function assumes there are no spaces in the actual text items themselves. This is a pretty big assumption and, if wrong, would require a different approach to parsing out the line segments. However, this only needs to change in one place - as you read a line at a time, you could apply a reg ex, or just read in characters and assume that everything after the first "column:" section is a value, for example.
2) No exception handling
3) Text output is not quoted. You could test each value to see if it's a date or number - if not, wrap it in quotes as then other programs (like Excel) will attempt to preserve the underlying datatypes more effectively.
4) Assumes no column names are repeated. If they are, then you have to check if a column item has already been added, and then create an ColName2 column in the parsing section.

c#: regex how to differentiate between two variations of a string

This is tough to explain enough to ask the question, but i'll try:
I have two possibilities of user input:
S01E05 or 0105 (two different input strings)
which both translate to season 01, episode 05
but if they user inputs it backwards E05S01 or 0501, i need to be able to return the same result, Season 01 Episode 05
The control for this would be the user defining the format of the original filename with something like this:
"SssEee" -- uppercase 'S' denoting that the following lowercase 's' belong to Season and uppercase 'E' denoting that the following lowercase 'e' belong to Episode. So if the user decides to define the format as EeeSss then my function should still return the same result since it knows which numbers belong to season or episode.
I don't have anything working quite yet to share, but what I was toying with is a loop that builds the regex pattern. The function, so far, accepts the user format and the file name:
public static int(string userFormat, string fileName)
{
}
the userFormat would be a string and look something like this:
t.t.t.SssEee
or even
t.SssEee
where t is for title, and the rest you know.
The file name might look like this:
battlestar.galactica.S01E05.mkv
Ive got the function that extracts the title from the file name by using the userFormat to build the regex string
public static string GetTitle(string userFormat, string fileName)
{
string pattern = "^";
char positionChar;
string fileTitle;
for (short i = 0; i < userFormat.Length; i++)
{
positionChar = userFormat[i];
//build the regex pattern
if (positionChar == 't')
{
pattern += #"\w+";
}
else if (positionChar == '#')
{
pattern += #"\d+";
}
else if (positionChar == ' ')
{
pattern += #"\s+";
}
else
pattern += positionChar;
}
//pulls out the title with or without the delimiter
Match title = Regex.Match(fileName, pattern, RegexOptions.IgnoreCase);
fileTitle = title.Groups[0].Value;
//remove the delimiter
string[] tempString = fileTitle.Split(#"\/.-<>".ToCharArray());
fileTitle = "";
foreach (string part in tempString)
{
fileTitle += part + " ";
}
return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(fileTitle);
}
but im kind of stumped on how to do the extraction of the episode and season numbers. In my head im thinking the process would look something like:
Look through the userFormat string to find the uppercase S
Determine how many lowercase 's' are following the uppercase S
Build the regex expression that describes this
Search through the file name and find that pattern
Extract the number from that pattern
Sounds simple enough but im having a hard time putting it into actions. The complication being the the fact that the format in the filename could be S01E05 or it could be simply 0105. Either scenario would be identified by the user when they define the format.
Ex 1. the file name is battlestar.galactica.S01E05
the user format submitted will be t.t.?ss?ee
Ex 2. the file name is battlestar.galactica.0105
the user format submitted will be t.t.SssEee
Ex 3. the file name is battlestar.galactica.0501
the user format submitted will be t.t.EeeSss
Sorry for the book... the concept is simple, the regex function should be dynamic, allowing the user to define the format of a file name to where my method can generate the expression and use it to extract information from the file name. Something is telling me that this is simpler than it seems... but im at a loss. lol... any suggestions?
So if I read this right, you know where the the Season/Episode number is in the string because the user has told you. That is, you have t.t.<number>.more.stuff. And <number> can take one of these forms:
SssEee
EeeSss
ssee
eess
Or did you say that the user can define how many digits will be used for season and episode? That is, could it be S01E123?
I'm not sure you need a regex for this. Since you know the format, and it appears that things are separated by periods (I assume that there can't be periods in the individual fields), you should be able to use String.Split to extract the pieces, and you know from the user's format where the Season/Episode is in the resulting array. So you now have a string that takes one of the forms above.
You have the user's format definition and the Season/Episode number. You should be able to write a loop that steps through the two strings together and extracts the necessary information, or issues an error.
string UserFormat = "SssEee";
string EpisodeNumber = "0105";
int ifmt = 0;
int iepi = 0;
int season = 0;
int episode = 0;
while (ifmt <= UserFormat.Length && iepi < EpisodeNumber.Length)
{
if ((UserFormat[ifmt] == "S" || UserFormat[ifmt] == "E"))
{
if (EpisodeNumber[iepi] == UserFormat[ifmt])
{
++iepi;
}
else if (!char.IsDigit(EpisodeNumber[iepi]))
{
// Error! Chars didn't match, and it wasn't a digit.
break;
}
++ifmt;
}
else
{
char c = EpisodeNumber[iepi];
if (!char.IsDigit(c))
{
// error. Expected digit.
}
if (UserFormat[ifmt] == 'e')
{
episode = (episode * 10) + (int)c - (int)'0';
}
else if (UserFormat[ifmt] == 's')
{
season = (season * 10) + (int)c - (int)'0';
}
else
{
// user format is broken
break;
}
++iepi;
++ifmt;
}
}
Note that you'll probably have to do some checking to see that the lengths are correct. That is, the code above will accept S01E1 when the user's format is SssEee. There's a bit more error handling that you can add, depending on how worried you are about bad input. But I think this gives you the gist of the idea.
I have to think that's going to be a whole lot easier than trying to dynamically build regular expressions.
After #Sinaesthetic answered my question we can reduce his original post to:
The challenge is to receive any of these inputs:
0105 (if your input is 0105 you assume SxxEyy)
S01E05
E05S01 OR
1x05 (read as season 1 episode 5)
and transform any of these inputs into: S01E05
At this point title and file format are irrelevant, they just get tacked on to the ends.
Based on that the following code will always result in 'Battlestar.Galactica.S01E05.mkv'
static void Main(string[] args)
{
string[] inputs = new string[6] { "E05S01", "S01E05", "0105", "105", "1x05", "1x5" };
foreach (string input in inputs)
{
Console.WriteLine(FormatEpisodeTitle("Battlestar.Galactica", input, "mkv"));
}
Console.ReadLine();
}
private static string FormatEpisodeTitle(string showTitle, string identifier, string fileFormat)
{
//first make identifier upper case
identifier = identifier.ToUpper();
//normalize for SssEee & EeeSee
if (identifier.IndexOf('S') > identifier.IndexOf('E'))
{
identifier = identifier.Substring(identifier.IndexOf('S')) + identifier.Substring(identifier.IndexOf('E'), identifier.IndexOf('S'));
}
//now get rid of S and replace E with x as needed:
identifier = identifier.Replace("S", string.Empty).Replace("E", "X");
//at this point, if there isn't an "X" we need one, as in 105 or 0105
if (identifier.IndexOf('X') == -1)
{
identifier = identifier.Substring(0, identifier.Length - 2) + "X" + identifier.Substring(identifier.Length - 2);
}
//now split by the 'X'
string[] identifiers = identifier.Split('X');
// and put it back together:
identifier = 'S' + identifiers[0].PadLeft(2, '0') + 'E' + identifiers[1].PadLeft(2, '0');
//tack it all together
return showTitle + '.' + identifier + '.' + fileFormat;
}

Categories