Optimizing processing of data stored in a flat file - c#

At my office we use an old third-party tool to handle some data processing and export work. The output of this tool is unfortunately in a really clunky format, so for us to put it into a meaningful form and work with it, we have to have an intermediate processing step between the raw export of this data and our ability to act further on it.
This problem was one that I pretty concisely solved some time ago in Python with itertools, but for reasons, I need to relocate this work into an existing C# application.
I've super-generalized and simplified the example data that I've posted here (and the corresponding code), but it's representative of the way the real data is set up. The raw data spit out by the tool looks like this, with some caveats (which I'll explain):
Zip Code: 11111
First Name: Joe
Last Name: Smith
ID: 1
Phone Number: 555-555-1111
Zip Code: 11111
First Name: John
Last Name: Doe
ID: 2
Phone Number: 555-555-1112
Zip Code: 11111
First Name: Mike
Last Name: Jones
ID: 3
Phone Number: 555-555-1113
There are no unique separators between records. They're just listed one right after the other. A valid and actionable record contains all five items ("Zip Code", "First Name", "Last Name", "ID", "Phone Number").
We only need first/last name, ID, and phone number for our purposes. Each unique record always begins with Zip Code, but thanks to some quirks in the underlying process and the third-party tool, I have some things I need to account for:
Records missing a phone number are invalid, and will show up with a value of "(n/a)" in the "Phone Number" line. We need to ignore the whole record in this case.
Records (rarely) may be missing a line (such as "Last Name") if the record was not entered correctly prior to processing. We ignore these cases, too.
If there was an error with some linked information to the underlying data, the record will contain a line beginning with "Error". Its exact position among the other items in a record varies. If a record contains an error, we ignore it.
The way I solved this in C# is to start with the first line and check to see if it begins with "Zip Code". If so, I drop into a further loop where I build a dictionary of keys and values (splitting on the first ":") until I hit the next "Zip Code" line. It then repeats and rolls through the process again while current line < (line count - 5).
private void CrappilyHandleExportLines(List<string> RawExportLines)
{
int lineNumber = 0;
while (lineNumber < (RawExportLines.Count - 5))
{
// The lineGroup dict will represent the record we're currently processing
Dictionary<string, string> lineGroup = new Dictionary<string, string>();
// If the current line begins with "Zip Code", this means we've reached another record to process
if (RawExportLines[lineNumber++].StartsWith("Zip Code"))
{
// If the line does NOT begin with "Zip Code", we assume it's another part of the record we're already
// working on.
while (lineNumber < RawExportLines.Count && !RawExportLines[lineNumber].StartsWith("Zip Code"))
{
// Append everything except "Error" lines to the record we're working on, as stored in lineGroup
if (!RawExportLines[lineNumber].StartsWith("Error"))
{
string[] splitLine = RawExportLines[lineNumber].Split(new[] { ":" }, 2, StringSplitOptions.None);
lineGroup[splitLine[0].Trim()] = splitLine[1].Trim();
}
lineNumber++;
}
}
// Validate the record before continuing. verifyAllKeys is just a method that does a check of the key list
// against a list of expected keys using Except to make sure all of the items that we require are present.
if (verifyAllKeys(new List<string>(lineGroup.Keys)) && (lineGroup["Phone Number"] != "(n/a)"))
{
// The record is good! Now we can do something with it:
WorkOnProcessedRecord(lineGroup);
}
}
}
This works (from my initial testing, at least). The problem is that I really dislike this code. I know there's a better way to do it, but I'm not as strong in C# as I'd like to be so I think I'm missing out on some ways that would allow me to more elegantly and safely get the desired result.
Can anyone lend a hand to point me in the right direction as to how I can implement a better solution? Thank you!

This may help you. The idea is to group the entries by their ID in a dictionary; then you can validate the entries with the appropriate conditions:
static void Main(string[] args)
{
string path = @"t.txt";
var text = File.ReadAllLines(path, Encoding.UTF8);
var dict = new Dictionary<string, Dictionary<string, string>>();
var id = "";
var rows = text
.Select(l => new { prop = l.Split(':')[0], val = l.Split(':')[1].Trim() })
.ToList();
foreach (var row in rows)
{
if (row.prop == "ID")
{
id = row.val;
}
else if (dict.ContainsKey(id))
{
dict[id].Add(row.prop, row.val);
}
else
{
dict[id] = new Dictionary<string, string>();
dict[id].Add(row.prop, row.val);
}
}
//get valid entries
var validEntries = dict.Where(e =>e.Value.Keys.Intersect(new List<string> { "Zip Code", "First Name", "Last Name", "Phone Number" }).Count()==4 && e.Value["Phone Number"] != "(n/a)").ToDictionary(x=>x.Key, x => x.Value);
}
If the ID belongs to the properties that precede it (that is, it appears after them in the record), use the code below as the if block instead:
if (row.prop == "ID")
{
var values=dict[id];
dict.Remove(id);
dict.Add(row.val,values);
id = "";
}

I would try to solve the problem in a bit more of an object oriented manner using a factory-ish pattern.
//Define a class to hold all people we get, which might be empty or have problems in them.
public class PersonText
{
public string FirstName { get; set; }
public string LastName { get; set; }
public string PhoneNumber { get; set; }
public string ID { get; set; }
public string ZipCode { get; set; }
public bool Error { get; set; }
public bool Anything { get; set; }
}
//A class to hold a key ("First Name"), and a way to set the respective item on the PersonText class correctly.
public class PersonItemGetSets
{
public string Key { get; }
public Func<PersonText, string> Getter { get; }
public Action<PersonText, string> Setter { get; }
public PersonItemGetSets(string key, Action<PersonText, string> setter, Func<PersonText, string> getter)
{
Getter = getter;
Key = key;
Setter = setter;
}
}
//This will get people from the lines of text
public static IEnumerable<PersonText> GetPeople(IEnumerable<string> lines)
{
var itemGetSets = new List<PersonItemGetSets>()
{
new PersonItemGetSets("First Name", (p, s) => p.FirstName = s, p => p.FirstName),
new PersonItemGetSets("Last Name", (p, s) => p.LastName = s, p => p.LastName),
new PersonItemGetSets("Phone Number", (p, s) => p.PhoneNumber = s, p => p.PhoneNumber),
new PersonItemGetSets("ID", (p, s) => p.ID = s, p => p.ID),
new PersonItemGetSets("Zip Code", (p, s) => p.ZipCode = s, p => p.ZipCode),
};
foreach (var person in GetRawPeople(lines, itemGetSets, "Error"))
{
if (IsValidPerson(person, itemGetSets))
yield return person;
}
}
//Used to determine if a PersonText is valid and if it is worth processing.
private static bool IsValidPerson(PersonText p, IReadOnlyList<PersonItemGetSets> itemGetSets)
{
if (itemGetSets.Any(x => x.Getter(p) == null))
return false;
if (p.Error)
return false;
if (!p.Anything)
return false;
if (p.PhoneNumber.Length != 12) // "555-555-5555".Length = 12
return false;
return true;
}
//Read through each line, and return all potential people, but don't validate whether they're correct at this time.
private static IEnumerable<PersonText> GetRawPeople(IEnumerable<string> lines, IReadOnlyList<PersonItemGetSets> itemGetSets, string errorToken)
{
var person = new PersonText();
foreach (var line in lines)
{
var parts = line.Split(':');
bool valid = false;
if (parts.Length == 2)
{
var left = parts[0];
var right = parts[1].Trim();
foreach (var igs in itemGetSets)
{
if (left.Equals(igs.Key, StringComparison.OrdinalIgnoreCase))
{
valid = true;
person.Anything = true;
if (igs.Getter(person) != null)
{
yield return person;
person = new PersonText();
}
igs.Setter(person, right);
}
}
}
else if (parts.Length == 1)
{
if (parts[0].Trim().Equals(errorToken, StringComparison.OrdinalIgnoreCase))
{
person.Error = true;
}
}
if (!valid)
{
if (person.Anything)
{
yield return person;
person = new PersonText();
}
continue;
}
}
if (person.Anything)
yield return person;
}
Have a look at the code working here: https://dotnetfiddle.net/xVnATX
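As a further sketch of the rules listed in the question (a record starts at "Zip Code"; records with an "Error" line, a missing key, or a "(n/a)" phone number are dropped), something like the following could work. The names RecordParser, SplitIntoRecords, TryParseRecord and ParseValidRecords are made up for illustration, and it assumes every data line has the "Key: value" shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordParser
{
    // Keys a usable record must contain (taken from the question).
    private static readonly string[] RequiredKeys =
        { "Zip Code", "First Name", "Last Name", "ID", "Phone Number" };

    // Hypothetical helper: one chunk of lines per record, a new chunk at every "Zip Code" line.
    public static IEnumerable<List<string>> SplitIntoRecords(IEnumerable<string> lines)
    {
        List<string> current = null;
        foreach (var line in lines)
        {
            if (line.StartsWith("Zip Code", StringComparison.Ordinal))
            {
                if (current != null) yield return current;
                current = new List<string>();
            }
            // Lines before the first "Zip Code" belong to no record and are skipped.
            current?.Add(line);
        }
        if (current != null) yield return current;
    }

    // Hypothetical helper: returns the parsed record, or null if it has to be ignored.
    public static Dictionary<string, string> TryParseRecord(List<string> recordLines)
    {
        // An "Error" line anywhere in the record invalidates the whole record.
        if (recordLines.Any(l => l.StartsWith("Error", StringComparison.Ordinal)))
            return null;

        var record = new Dictionary<string, string>();
        foreach (var line in recordLines)
        {
            var parts = line.Split(new[] { ':' }, 2); // split on the first ':' only
            if (parts.Length != 2) return null;       // assumption: malformed line => skip record
            record[parts[0].Trim()] = parts[1].Trim();
        }

        if (!RequiredKeys.All(k => record.ContainsKey(k))) return null; // a line is missing
        if (record["Phone Number"] == "(n/a)") return null;             // no phone number
        return record;
    }

    public static List<Dictionary<string, string>> ParseValidRecords(IEnumerable<string> lines)
    {
        return SplitIntoRecords(lines)
            .Select(r => TryParseRecord(r))
            .Where(r => r != null)
            .ToList();
    }
}
```

Splitting first and validating afterwards keeps the "where does a record end" logic separate from the "is this record usable" logic, so each rule from the question is a single line.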

Related

c# use one variable value to set a second from a fixed list

I'm parsing a CSV file in a C# .NET Windows Forms app, taking each line into a class I've created. However, I only need access to some of the columns, and the files being taken in are not standardized. That is to say, the number of fields present could be different and the columns could appear in any order.
CSV Example 1:
Position, LOCATION, TAG, NAME, STANDARD, EFFICIENCY, IN USE,,
1, AFT-D3, P-D3101A, EQUIPMENT 1, A, 3, TRUE
2, AFT-D3, P-D3103A, EQUIPMENT 2, B, 3, FALSE
3, AFT-D3, P-D2301A, EQUIPMENT 3, A, 3, TRUE
...
CSV Example 2:
Position, TAG, STANDARD, NAME, EFFICIENCY, LOCATION, BACKUP, TESTED,,
1, P-D3101A, A, EQUIPMENT 1, 3, AFT-D3, FALSE, TRUE
2, P-D3103A, A, EQUIPMENT 2, 3, AFT-D3, TRUE, FALSE
3, P-D2301A, A, EQUIPMENT 3, 3, AFT-D3, FALSE, TRUE
...
As you can see, I will never know the format of the file I have to analyse, the only thing I know for sure is that it will always contain the few columns that I need.
My solution to this was to ask the user to enter the required columns, stored as strings, and then convert each entry to a corresponding integer that I could use as a location.
string standardInpt = "";
string nameInpt = "";
string efficiencyInpt = "";
The user would then enter a value from A to ZZ.
int standardLocation = 0;
int nameLocation = 0;
int efficiencyLocation = 0;
When the form is submitted, the ints get their final value by running through an if/else statement:
if(standard == "A")
{
standardLocation = 0;
}
else if(standard == "B")
{
standardLocation = 1;
}
...
and so on, all the way up to "ZZ"; the same code is then repeated for each of the other variables.
My class would partially look like:
class Equipment
{
public string Standard { get; set;}
public string Name { get; set; }
public int Efficiency { get; set; }
static Equipment FromLine(string line)
{
var data = line.Split(',');
return new Equipment()
{
Standard = data[standardLocation],
Name = data[nameLocation],
Efficiency = int.Parse(data[efficiencyLocation]),
};
}
}
I've got more code in there, but I think this highlights where I would use the variables to set the indexes.
I'm very new to this and I'm hoping there is a significantly better way to achieve this without having to write so much repetitive if/else logic. I'm thinking of some kind of lookup table maybe, but I can't figure out how to implement this. Any pointers on where I could look?
You could make it automatic by finding the indexes of the columns in the header, and then use them to read the values from the correct place from the rest of the lines:
class EquipmentParser {
public IList<Equipment> Parse(string[] input) {
var result = new List<Equipment>();
var header = input[0].Split(',').Select(t => t.Trim().ToLower()).ToList();
var standardPosition = GetIndexOf(header, "std", "standard", "st");
var namePosition = GetIndexOf(header, "name", "nm");
var efficiencyPosition = GetIndexOf(header, "efficiency", "eff");
foreach (var s in input.Skip(1)) {
var line = s.Split(',');
result.Add(new Equipment {
Standard = line[standardPosition].Trim(),
Name = line[namePosition].Trim(),
Efficiency = int.Parse(line[efficiencyPosition])
});
}
return result;
}
private int GetIndexOf(IList<string> input, params string[] needles) {
return Array.FindIndex(input.ToArray(), needles.Contains);
}
}
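One thing to watch with this approach: Array.FindIndex returns -1 when nothing matches, so a missing column only surfaces later as an out-of-range index. A minimal sketch of a lookup that fails early instead, assuming a simple comma-separated header without quoted fields (HeaderLookup and RequireColumn are hypothetical names):

```csharp
using System;
using System.Linq;

public static class HeaderLookup
{
    // Returns the index of the first header cell that matches one of the aliases
    // (case-insensitive), or throws if the file does not contain such a column.
    public static int RequireColumn(string headerLine, params string[] aliases)
    {
        var header = headerLine.Split(',').Select(h => h.Trim()).ToArray();
        int index = Array.FindIndex(
            header, h => aliases.Contains(h, StringComparer.OrdinalIgnoreCase));
        if (index < 0)
            throw new FormatException(
                "None of these columns were found: " + string.Join(", ", aliases));
        return index;
    }
}
```

With the first example header from the question, RequireColumn(header, "standard", "std") gives 4 and RequireColumn(header, "name") gives 3.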
You can use reflection and attributes.
List the possible header names for each property, comma-separated, in a DisplayName attribute.
First call GetIndexes with the CSV header string as its parameter to get the dictionary that maps class properties to CSV field indexes.
Then call FromLine with each line and the mapping dictionary you just got.
class Equipment
{
[DisplayName("STND, STANDARD, ST")]
public string Standard { get; set; }
[DisplayName("NAME")]
public string Name { get; set; }
[DisplayName("EFFICIENCY, EFFI")]
public int Efficiency { get; set; }
// You can add any other property
public static Equipment FromLine(string line, Dictionary<PropertyInfo, int> map)
{
var data = line.Split(',').Select(t => t.Trim()).ToArray();
var ret = new Equipment();
Type type = typeof(Equipment);
foreach (PropertyInfo property in type.GetProperties())
{
int index = map[property];
property.SetValue(ret, Convert.ChangeType(data[index],
property.PropertyType));
}
return ret;
}
public static Dictionary<PropertyInfo, int> GetIndexes(string headers)
{
var headerArray = headers.Split(',').Select(t => t.Trim()).ToArray();
Type type = typeof(Equipment);
var ret = new Dictionary<PropertyInfo, int>();
foreach (PropertyInfo property in type.GetProperties())
{
var fieldNames = property.GetCustomAttribute<DisplayNameAttribute>()
.DisplayName.Split(',').Select(t => t.Trim()).ToArray();
for (int i = 0; i < headerArray.Length; ++i)
{
if (!fieldNames.Contains(headerArray[i])) continue;
ret[property] = i;
break;
}
}
return ret;
}
}
Try this, if it helps:
public int GetIndex(string input)
{
input = input.ToUpper();
char low = input[input.Length - 1];
char? high = input.Length == 2 ? input[0] : (char?)null;
int indexLow = low - 'A';
int? indexHigh = high.HasValue ? high.Value - 'A' : (int?)null;
return (indexHigh.HasValue ? (indexHigh.Value + 1) * 26 : 0) + indexLow;
}
You can use the ASCII code for that, so there is no need to add an if/else every time (this handles single letters A to Z only). For example:
byte[] ASCIIValues = Encoding.ASCII.GetBytes(standard);
standardLocation = ASCIIValues[0]-65;
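The two snippets above cover at most two letters. As a sketch, the same arithmetic generalizes to any number of letters by treating the input like a spreadsheet column name (A = 0, Z = 25, AA = 26). ColumnLetters.ToIndex is a hypothetical name, not part of any library:

```csharp
using System;

public static class ColumnLetters
{
    // Converts a column name such as "A", "Z" or "AA" to a zero-based index.
    public static int ToIndex(string letters)
    {
        if (string.IsNullOrWhiteSpace(letters))
            throw new ArgumentException("Column name is empty.", nameof(letters));

        int value = 0;
        foreach (char c in letters.Trim().ToUpperInvariant())
        {
            if (c < 'A' || c > 'Z')
                throw new ArgumentException("Only the letters A to Z are allowed.", nameof(letters));
            // Each letter is a digit from 1 to 26 in a base-26 number without a zero digit.
            value = value * 26 + (c - 'A' + 1);
        }
        return value - 1; // shift from 1-based to 0-based
    }
}
```

Validating the input here also means a typo in the form is reported immediately rather than silently mapping to the wrong column.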

Is there a way to include null values in a list in C#?

I have a text file filled with several lines of data, and I would like to split it into 5 different elements like so..
I am successfully reading in the data and putting it into an array. Now I would like to split each part of the text up into different lists so I can compare the data against one another.
I have currently managed to read in the first 4 elements of each line into their relevant lists but the 5th one is throwing me the error "System.IndexOutOfRangeException" which I can only assume is because the first line it reads in has no value for the 5th element?
So my question is, is there a way to populate null values when writing them to a number of lists?
I've tried manually assigning the size of the array and lists but I still get the same error.
Here is my code:
class Program
{
static void Main(string[] args)
{
// Reading in file containing data from BT Code Evaluation sheet (for testing purposes).
// Each line gets stored into a string array, each element is one line of the data.txt file.
//String[] lines = System.IO.File.ReadAllLines(@"C:\Users\Ad\Desktop\data.txt");
String[] lines = new String[5] {"monitorTime", "localTime", "actor", "action", "actor2"};
lines = System.IO.File.ReadAllLines(@"C:\Users\Ad\Desktop\data.txt");
char delimiter = ' ';
List<String> monitorTime = new List<String>();
List<String> localTime = new List<String>();
List<String> actor = new List<String>();
List<String> action = new List<String>();
List<String> actor2 = new List<String>();
// Foreach loop displays the lines of text in the data file.
foreach (String line in lines)
{
// Writes the data to the console.
Console.WriteLine(line);
String[] data = new String[5] { "monitorTime", "localTime", "actor", "action", "actor2" };
data = line.Split(delimiter);
monitorTime.Add(data[0]);
localTime.Add(data[1]);
actor.Add(data[2]);
action.Add(data[3]);
actor2.Add(data[4]);
}
foreach (String time in monitorTime)
{
Console.WriteLine(time);
}
foreach (String time in localTime)
{
Console.WriteLine(time);
}
foreach (String name in actor)
{
Console.WriteLine(name);
}
foreach (String actions in action)
{
Console.WriteLine(actions);
}
foreach (String name in actor2)
{
if (name != null)
{
Console.WriteLine("UNKNOWN");
}
else
{
Console.WriteLine(actor2);
}
}
// Creates an empty line between the data and the following text.
Console.WriteLine("");
// Displays message in console.
Console.WriteLine("Press any key to analyse data and create report...");
Console.ReadKey();
}
}
You need to check the bounds of your array before you try to add. If there aren't enough items, you can add null instead.
For example:
actor2.Add(data.Length > 4 ? data[4] : null);
(Note: You could do the same type of check on the other items as well, unless you are positive that the last item is the only one that might be null)
This is using the ternary operator, but you could also use a simple if/else, but it'll be more verbose. It's equivalent to:
if (data.Length > 4)
{
actor2.Add(data[4]);
}
else
{
actor2.Add(null);
}
This, along with Console.WriteLine(name); instead of Console.WriteLine(actor2);, should fix your immediate problem.
However, a much better design here would be to have a single list of objects with MonitorTime, LocalTime, Actor, Action and Actor2 properties. That way you don't ever have to worry that the 5 parallel arrays might get out of sync.
For example, create a class like this:
public class DataItem
{
public string MonitorTime { get; set; }
public string LocalTime { get; set; }
public string Actor { get; set; }
public string Action { get; set; }
public string Actor2 { get; set; }
}
Then instead of your 5 List<String>, you have one List<DataItem>:
List<DataItem> dataList = new List<DataItem>();
Then in your loop to populate it you'd do something like:
data = line.Split(delimiter);
dataList.Add(new DataItem()
{
MonitorTime = data[0],
LocalTime = data[1],
Actor = data[2],
Action = data[3],
Actor2 = data.Length > 4 ? data[4] : null
});
Then you can access them later with something like:
foreach (var item in dataList)
{
Console.WriteLine(item.MonitorTime);
//...
}
In your foreach loop you should check whether the index exists before populating the list.
foreach (String line in lines)
{
// Writes the data to the console.
Console.WriteLine(line);
String[] data = new String[5] { "monitorTime", "localTime", "actor", "action", "actor2" };
data = line.Split(delimiter);
monitorTime.Add(data[0]);
localTime.Add(data[1]);
actor.Add(data[2]);
action.Add(data[3]);
if (data.Length > 4) {
actor2.Add(data[4]);
}
}
There are better ways to do this, but this is a simple solution for now.
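One such alternative, as a sketch: pad the split result to the expected number of fields, so missing trailing fields become null and all five lists stay the same length. Array.Resize leaves the added slots at their default value, which is null for strings. LineSplitter.SplitPadded is a hypothetical helper name:

```csharp
using System;

public static class LineSplitter
{
    // Splits a line and pads the result with nulls up to fieldCount elements.
    // Lines that already have fieldCount or more fields are returned unchanged.
    public static string[] SplitPadded(string line, char delimiter, int fieldCount)
    {
        string[] data = line.Split(delimiter);
        if (data.Length < fieldCount)
            Array.Resize(ref data, fieldCount); // new slots are null
        return data;
    }
}
```

In the loop from the question this would replace data = line.Split(delimiter); with data = LineSplitter.SplitPadded(line, delimiter, 5);, after which data[4] is always safe to read.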

Pick Random string from List<string> with exclusions and non-repetitive pick

I'm trying to write a program that would let a user:
Load a set of strings.
Loop through the set, and pick another string from the same set.
Prevent a picked string from being picked again.
Prevent specific strings from picking specified strings.
Example is table below:
And below table is a sample scenario:
How am I supposed to do this the easy way?
I have the code below, but it takes forever to generate a valid set, because it restarts everything whenever there is nothing left to pick, without eliminating that possibility.
private List<Participant> _participants;
AllOverAgain:
var pickedParticipants = new List<Participant>();
var participantPicks = new List<ParticipantPick>();
foreach(var participant in _participants)
{
var pickedParticipantNames = from rp in participantPicks select rp.PickedParticipant;
var picks = (from p in _participants where p.Name != participant.Name & !Utilities.IsInList(p.Name, pickedParticipantNames) select p).ToList();
var pick = picks[new Random().Next(0, picks.Count())];
if(pick == null)
{
UpdateStatus($"No Available Picks left for {participant.Name}, Restarting...");
goto AllOverAgain;
}
var exclusions = participant.Exclusions.Split(',').Select(p => p.Trim()).ToList();
if(exclusions.Contains(pick.Name))
{
UpdateStatus($"No Available Picks left for {participant.Name}, Restarting...");
goto AllOverAgain;
}
participantPicks.Add(new ParticipantPick(participant.Name, pick.Name, participant.Number));
}
return participantPicks; // Returns the final output result
The Participant Class consists of these Properties:
public string Name { get; set; }
public string Number { get; set; }
public string Exclusions { get; set; }
The ParticipantPick Class consists of these Properties:
public string Participant { get; set; }
public string PickedParticipant { get; set; }
public string Number { get; set; }
One way you can solve this is by using a dictionary, using a composite key of a tuple and the matching value of a datatype bool.
Dictionary<Tuple<string, string>, bool>
The composite key Tuple<string, string> will contain every permutation of participants and match them to their appropriate bool value.
For example, the dictionary filled with values such as:
Dictionary<Tuple<"Judith","James">, true>
...would be indicating that Judith picking James is valid.
So let's create a dictionary with every single possible combination of participants, and set the value of each to true (valid) at the start of the program.
This can be accomplished by a cartesian join using an array with itself.
Dictionary<Tuple<string, string>, bool> dictionary = participants.SelectMany(left => participants, (left, right) => new Tuple<string, string>(left, right)).ToDictionary(item=> item, item=>true);
After getting every permutation of possible picks and setting them to true, we can go through the "not allowed to pick" lists and change the dictionary value for that composite key to false.
dictionary[new Tuple<string, string>(personNotAllowing, notAllowedPerson)] = false;
You can remove a participant from picking itself by using a loop in the following way:
for(int abc=0;abc<participants.Length;abc++)
{
//remove clone set
Tuple<string, string> clonePair = Tuple.Create(participants[abc], participants[abc]);
dictionary.Remove(clonePair);
}
Or by simply changing the value of the clone pair to false.
for(int abc=0;abc<participants.Length;abc++)
{
dictionary[Tuple.Create(participants[abc],participants[abc])] = false;
}
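A more compact variant of the same idea, as a sketch: instead of storing every pair with a bool, store only the forbidden pairs in a HashSet of value tuples and treat everything else as allowed. This swaps in a different data structure from the one used in this answer. PickRules, BuildForbidden and IsAllowed are hypothetical names, and the pair order (picker, picked) is an assumption based on the "Judith picking James" example.

```csharp
using System.Collections.Generic;

public static class PickRules
{
    // Builds the set of forbidden (picker, picked) pairs from each person's not-allowed list.
    public static HashSet<(string Picker, string Picked)> BuildForbidden(
        IDictionary<string, string[]> notAllowed)
    {
        var forbidden = new HashSet<(string Picker, string Picked)>();
        foreach (var entry in notAllowed)
            foreach (var picked in entry.Value)
                forbidden.Add((entry.Key, picked));
        return forbidden;
    }

    // A pick is allowed if it is not a self-pick and the pair is not forbidden.
    public static bool IsAllowed(
        HashSet<(string Picker, string Picked)> forbidden, string picker, string picked)
    {
        return picker != picked && !forbidden.Contains((picker, picked));
    }
}
```

Value tuples compare by value, so no custom key class is needed, and the self-pick rule becomes a plain comparison rather than a set of removed keys.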
In this example program, I create a string[] of participants, and a string[] for the respective list of people they do not allow. I then perform a cartesian join, the participants array with itself. This leads to every permutation, with an initial true boolean value.
I change the dictionary where the participants are not allowed to false, and display the example dictionary.
Afterward, I create 20 instances of random participants who are picking other random participants and test if it would be valid.
Every time a participant picks another participant, I check that composite key to see if it has a value of true.
If it does result in a valid pick, then every combination of the resulting participant who was picked gets set to false.
for(int j=0; j<participants.Length;j++)
{
//Make the partner never be able to be picked again
Tuple<string, string> currentPair2 = Tuple.Create(partner, participants[j]);
try
{
dictionary[currentPair2] = false;
}
catch
{
}
}
This concept is better illustrated with running the code.
The demo:
static void Main(string[] args)
{
//Create participants set
string[] participants = {"James","John","Tyrone","Rebecca","Tiffany","Judith"};
//Create not allowed lists
string[] jamesNotAllowedList = {"Tiffany", "Tyrone"};
string[] johnNotAllowedList = {};
string[] tyroneNotAllowedList = {};
string[] rebeccaNotAllowedList ={"James", "Tiffany"};
string[] tiffanyNotAllowedList = {};
string[] judithNotAllowedList = {};
//Create list of not allowed lists, in the same order as the participants array
string[][] notAllowedLists = { jamesNotAllowedList, johnNotAllowedList, tyroneNotAllowedList, rebeccaNotAllowedList, tiffanyNotAllowedList, judithNotAllowedList};
//Create dictionary<Tuple<string,string>, bool> from participants array by using cartesian join on itself
Dictionary<Tuple<string, string>, bool> dictionary = participants.SelectMany(left => participants, (left, right) => new Tuple<string, string>(left, right)).ToDictionary(item=> item, item=>true);
//Loop through each person who owns a notAllowedList
for (int list = 0; list < notAllowedLists.Length; list++)
{
//Loop through each name on the not allowed list
for (int person = 0; person<notAllowedLists[list].Length; person++)
{
string personNotAllowing = participants[list];
string notAllowedPerson = notAllowedLists[list][person];
//Change the boolean value matched to the composite key
dictionary[new Tuple<string, string>(personNotAllowing, notAllowedPerson)] = false;
Console.WriteLine(personNotAllowing + " did not allow " + notAllowedPerson);
}
}
//Then since a participant cant pick itself
for(int abc=0;abc<participants.Length;abc++)
{
//remove clone set
Tuple<string, string> clonePair = Tuple.Create(participants[abc], participants[abc]);
dictionary.Remove(clonePair);
}
//Display whats going on with this Dictionary<Tuple<string,string>, bool>
Console.WriteLine("--------Allowed?--Dictionary------------\n");
Console.WriteLine(string.Join(" \n", dictionary));
Console.WriteLine("----------------------------------------\n\n");
//Create Random Object
Random rand = new Random();
//Now that the data is organized in a dictionary..
//..Let's have random participants pick random participants
//For this demonstration let's try it 20 times
for (int i=0;i<20;i++)
{
//Create a new random participant
int rNum = rand.Next(participants.Length);
string randomParticipant = participants[rNum];
//Random participant picks a random participant
string partner = participants[rand.Next(participants.Length)];
//Create composite key for the current pair
Tuple<string, string> currentPair = Tuple.Create(partner,randomParticipant);
//Check if it's a valid choice
try
{
if (dictionary[currentPair])
{
Console.WriteLine(randomParticipant + " tries to pick " + partner);
Console.WriteLine("Valid.\n");
//add to dictionary
for(int j=0; j<participants.Length;j++)
{
//Make the partner never be able to be picked again
Tuple<string, string> currentPair2 = Tuple.Create(partner, participants[j]);
try
{
dictionary[currentPair2] = false;
}
catch
{
}
}
}
else
{
Console.WriteLine(randomParticipant + " tries to pick " + partner);
Console.WriteLine(">>>>>>>>Invalid.\n");
}
}
catch
{
//otherwise exception happens because the random participant
//And its partner participant are the same person
//You can also handle the random participant picking itself differently
//In this catch block
//Make sure the loop continues as many times as necessary
//by acting like this instance never existed
i = i - 1;
}
}
Console.ReadLine();
}
This code will always give you output that adheres to your criteria:
public static class Program
{
public static void Main(string[] args)
{
var gathering = new Gathering();
gathering.MakeSelections();
foreach (var item in gathering.participants)
{
Console.WriteLine(item.name + ":" + item.selectedParticipant);
}
}
public class Participant
{
public string name;
public List<string> exclusions;
public string selectedParticipant;
}
public class Gathering
{
public List<Participant> participants;
public List<string> availableParticipants;
public List<string> usedNames;
public Dictionary<string, string> result;
public Gathering()
{
//initialize participants
participants = new List<Participant>();
participants.Add(new Participant
{
name = "James",
exclusions = new List<string> { "Tiffany", "Tyrone" }
});
participants.Add(new Participant
{
name = "John",
exclusions = new List<string> { }
});
participants.Add(new Participant
{
name = "Judith",
exclusions = new List<string> { }
});
participants.Add(new Participant
{
name = "Rebecca",
exclusions = new List<string> { "James", "Tiffany" }
});
participants.Add(new Participant
{
name = "Tiffany",
exclusions = new List<string> { }
});
participants.Add(new Participant
{
name = "Tyrone",
exclusions = new List<string> { }
});
//prevent participants from selecting themselves
foreach (Participant p in participants)
{
p.exclusions.Add(p.name);
}
//create list of all the names (all available participants at the beginning)
availableParticipants = participants.Select(x => x.name).ToList();
}
public void MakeSelections()
{
Participant currentParticipant;
Random randy = new Random();
//Sort Participants by the length of their exclusion lists, in descending order.
participants.Sort((p1, p2) => p2.exclusions.Count.CompareTo(p1.exclusions.Count));
//Get the first participant in the list which hasn't selected someone yet
currentParticipant = participants.FirstOrDefault(p => p.selectedParticipant == null);
while (currentParticipant != null)
{
//of the available participants, create a list to choose from for the current participant
List<string> listToChooseFrom = availableParticipants.Where(x => !currentParticipant.exclusions.Contains(x)).ToList();
//select a random participant from the list of eligible ones to be matched with the current participant
string assignee = listToChooseFrom[randy.Next(listToChooseFrom.Count)];
currentParticipant.selectedParticipant = assignee;
//remove the selected participant from the list of available participants
availableParticipants.RemoveAt(availableParticipants.IndexOf(assignee));
//remove the selected participant from everyone's exclusion lists
foreach (Participant p in participants)
if (p.exclusions.Contains(assignee))
p.exclusions.RemoveAt(p.exclusions.IndexOf(assignee));
//Resort Participants by the length of their exclusion lists, in descending order.
participants.Sort((p1, p2) => p2.exclusions.Count.CompareTo(p1.exclusions.Count));
//Get the first participant in the list which hasn't selected someone yet
currentParticipant = participants.FirstOrDefault(p => p.selectedParticipant == null);
}
//finally, sort by alphabetical order
participants.Sort((p1, p2) => p1.name.CompareTo(p2.name));
}
}
}
In the simpler version, the items can just be shuffled:
string[] source = { "A", "B", "C", "D", "E", "F" };
string[] picked = source.ToArray(); // copy
var rand = new Random();
for (int i = source.Length - 1, r; i > 0; --i)
{
var pick = picked[r = rand.Next(i)]; // pick random item less than the current one
picked[r] = picked[i]; // and swap with the current one
picked[i] = pick;
Console.WriteLine(i + " swapped with " + r);
}
Console.WriteLine("\nsource: " + string.Join(", ", source) +
"\npicked: " + string.Join(", ", picked));
sample result:
5 swapped with 4
4 swapped with 2
3 swapped with 0
2 swapped with 1
1 swapped with 0
source: A, B, C, D, E, F
picked: F, D, B, A, C, E
Alternatively, the source can optionally be shuffled, and each person can pick the person that is next in the list.
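The "pick the next person in the list" variant could look roughly like this. It is only a sketch: NextInListPicker is a hypothetical name, the names are assumed to be unique with at least two entries, and exclusion lists are not handled at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NextInListPicker
{
    // Shuffles a copy of the names (Fisher-Yates), then lets each person pick the
    // person after them in the shuffled order; the last person wraps around to the first.
    public static Dictionary<string, string> Pick(IReadOnlyList<string> names, Random rand)
    {
        var shuffled = names.ToArray();
        for (int i = shuffled.Length - 1; i > 0; --i)
        {
            int r = rand.Next(i + 1); // 0 <= r <= i
            var tmp = shuffled[r];
            shuffled[r] = shuffled[i];
            shuffled[i] = tmp;
        }

        var picks = new Dictionary<string, string>();
        for (int i = 0; i < shuffled.Length; ++i)
            picks[shuffled[i]] = shuffled[(i + 1) % shuffled.Length];
        return picks;
    }
}
```

Because the picks form one big cycle, nobody picks themselves and nobody is picked twice, with no retries needed.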

How to compare two csv files by 2 columns?

I have 2 csv files
1.csv
spain;russia;japan
italy;russia;france
2.csv
spain;russia;japan
india;iran;pakistan
I read both files and add data to lists
var lst1= File.ReadAllLines("1.csv").ToList();
var lst2= File.ReadAllLines("2.csv").ToList();
Then I find all unique strings from both lists and add it to result lists
var rezList = lst1.Except(lst2).Union(lst2.Except(lst1)).ToList();
rezlist contains this data
[0] = "italy;russia;france"
[1] = "india;iran;pakistan"
Now I want to compare the rows (Except and Union) by the second and third columns only.
1.csv
spain;russia;japan
italy;russia;france
2.csv
spain;russia;japan
india;iran;pakistan
I think I need to split all rows on the ';' character and perform all 3 operations (Except, Distinct and Union), but I cannot work out how.
rezlist must contain
india;iran;pakistan
I added class
class StringLengthEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
...
}
public int GetHashCode(string obj)
{
...
}
}
StringLengthEqualityComparer stringLengthComparer = new StringLengthEqualityComparer();
var rezList = lst1.Except(lst2,stringLengthComparer ).Union(lst2.Except(lst1,stringLengthComparer),stringLengthComparer).ToList();
Your question is not very clear: for instance, is india;iran;pakistan the desired result primarily because russia is at element[1]? Isn't it also included because element [2] pakistan does not match france and japan? Even though that's unclear, I assume the desired result comes from either situation.
Then there is this: "find all unique strings from both lists", which changes the nature of the problem dramatically. So I take it that india;iran;pakistan is the desired result because "iran" appears nowhere else in column[1] in either file, and even if it did, that row would still be unique due to "pakistan" in col[2].
Also note that a data sample of 2 leaves room for a fair amount of error.
Trying to do it in one step makes it very confusing. Since eliminating dupes found in 1.CSV is pretty easy, do it first:
// parse "1.CSV"
List<string[]> lst1 = File.ReadAllLines(@"C:\Temp\1.csv").
    Select(line => line.Split(';')).
    ToList();
// parse "2.CSV"
List<string[]> lst2 = File.ReadAllLines(@"C:\Temp\2.csv").
    Select(line => line.Split(';')).
    ToList();
// extracting once speeds things up in the next step
// and leaves open the possibility of iterating in a method
List<List<string>> tgts = new List<List<string>>();
tgts.Add(lst1.Select(z => z[1]).Distinct().ToList());
tgts.Add(lst1.Select(z => z[2]).Distinct().ToList());
var tmpLst = lst2.Where(x => !tgts[0].Contains(x[1]) ||
                             !tgts[1].Contains(x[2])).
    ToList();
That yields the rows of 2.CSV that have no counterpart in 1.CSV (the text in Col[1] or in Col[2] does not occur in the corresponding column of 1.CSV). If that is really all you need, you are done.
Getting unique rows within 2.CSV is trickier because you have to actually count the number of times each Col[1] item occurs to see if it is unique; then repeat for Col[2]. This uses GroupBy:
var unique = tmpLst.
    GroupBy(g => g[1], (key, values) =>
        new GroupItem(key,
            values.ToArray()[0],
            values.Count())
    ).Where(q => q.Count == 1).
    GroupBy(g => g.Data[2], (key, values) => new
        {
            Item = string.Join(";", values.ToArray()[0]),
            Count = values.Count()
        }
    ).Where(q => q.Count == 1).Select(s => s.Item).
    ToList();
The GroupItem class is trivial:
class GroupItem
{
public string Item { set; get; } // debug aide
public string[] Data { set; get; }
public int Count { set; get; }
public GroupItem(string n, string[] d, int c)
{
Item = n;
Data = d;
Count = c;
}
public override string ToString()
{
return string.Join(";", Data);
}
}
It starts with tmpLst and gets the rows with a unique element at [1]. It uses a class for storage since at this point we need the array data for further review.
The second GroupBy acts on those results, this time looking at col[2]. Finally, it selects the joined string data.
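As an alternative to chaining GroupBy calls, the "count how often each value occurs" idea can also be sketched with one dictionary of counts per column. This is a different technique from the query above: it counts occurrences over the whole list, whereas the chained query counts Col[2] only among the rows that survived the Col[1] step, so the two can return different rows. UniqueRows and UniqueIn are hypothetical names, and every row is assumed to have all the listed columns.

```csharp
using System.Collections.Generic;
using System.Linq;

static class UniqueRows
{
    // Hypothetical helper: keeps the rows whose value in each listed column
    // occurs exactly once in that column across ALL rows.
    public static List<string[]> UniqueIn(IReadOnlyList<string[]> rows, params int[] columns)
    {
        List<string[]> kept = rows.ToList();
        foreach (int col in columns)
        {
            // occurrences of each value in this column, over the full input
            Dictionary<string, int> counts = rows
                .GroupBy(r => r[col])
                .ToDictionary(g => g.Key, g => g.Count());
            kept = kept.Where(r => counts[r[col]] == 1).ToList();
        }
        return kept;
    }
}
```

For the rows a;x;1, b;y;2, c;x;2 and d;z;3, calling UniqueIn(rows, 1, 2) keeps only d;z;3, because "x" repeats in column 1 and "2" repeats in column 2.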
Results
Using 50,000 random items in File1 (1.3 MB) and 15,000 in File2 (390 KB). There were no naturally occurring unique items, so I manually made 8 rows unique in 2.CSV and copied 2 of them into 1.CSV. The copies in 1.CSV should eliminate 2 of the 8 unique rows in 2.CSV, making the expected result 6 unique rows.
NepalX and ItalyX were the repeats in both files and they correctly eliminated each other.
With each step it is scanning and working with less and less data, which seems to make it pretty fast for 65,000 rows / 130,000 data elements.
Your GetHashCode() method in the equality comparer is buggy. Fixed version:
public int GetHashCode(string obj)
{
return obj.Split(';')[1].GetHashCode();
}
Now the result is correct:
// one result: "india;iran;pakistan"
By the way, "StringLengthEqualityComparer" is not a good name ;-)
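For reference, a complete comparer keyed on selected columns might look like the following minimal sketch. ColumnComparer and ColumnDiff.SymmetricDifference are hypothetical names, not part of the question's code. The sketch assumes no null rows and that every row has at least as many ';'-separated columns as the highest index used.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two rows are equal when the chosen columns match.
sealed class ColumnComparer : IEqualityComparer<string>
{
    private readonly int[] _columns;

    public ColumnComparer(params int[] columns)
    {
        _columns = columns;
    }

    // Builds the comparison key from the selected columns only.
    private string Key(string row)
    {
        string[] parts = row.Split(';');
        return string.Join(";", _columns.Select(c => parts[c]));
    }

    public bool Equals(string x, string y)
    {
        return Key(x) == Key(y);
    }

    // Must hash exactly what Equals compares, otherwise Except/Union misbehave.
    public int GetHashCode(string row)
    {
        return Key(row).GetHashCode();
    }
}

static class ColumnDiff
{
    // Same Except/Union combination as in the question, with a custom comparer.
    public static List<string> SymmetricDifference(
        IEnumerable<string> first, IEnumerable<string> second, IEqualityComparer<string> comparer)
    {
        List<string> a = first.ToList();
        List<string> b = second.ToList();
        return a.Except(b, comparer).Union(b.Except(a, comparer), comparer).ToList();
    }
}
```

With the sample files, new ColumnComparer(1) (second column only) leaves just india;iran;pakistan, while new ColumnComparer(1, 2) also keeps italy;russia;france, because "russia;france" has no counterpart in 2.csv.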
// For each row of lst1, finds the first row of lst2 containing ";col2;col3"
// and collects those rows without duplicates.
private List<string> GetUnion(List<string> lst1, List<string> lst2)
{
    List<string> lstUnion = new List<string>();
    foreach (string value in lst1)
    {
        // split once instead of once per column
        string[] columns = value.Split(';');
        string valueColumn2 = columns[1];
        string valueColumn3 = columns[2];
        string result = lst2.FirstOrDefault(s => s.Contains(";" + valueColumn2 + ";" + valueColumn3));
        if (result != null && !lstUnion.Contains(result))
        {
            lstUnion.Add(result);
        }
    }
    // the original built the list but never returned it
    return lstUnion;
}
class Program
{
static void Main(string[] args)
{
var lst1 = File.ReadLines(@"D:\test\1.csv").Select(x => new StringWrapper(x)).ToList();
var lst2 = File.ReadLines(@"D:\test\2.csv").Select(x => new StringWrapper(x));
var set = new HashSet<StringWrapper>(lst1);
set.SymmetricExceptWith(lst2);
foreach (var x in set)
{
Console.WriteLine(x.Value);
}
}
}
struct StringWrapper : IEquatable<StringWrapper>
{
public string Value { get; }
private readonly string _comparand0;
private readonly string _comparand14;
public StringWrapper(string value)
{
Value = value;
var split = value.Split(';');
_comparand0 = split[0];
_comparand14 = split[14];
}
public bool Equals(StringWrapper other)
{
return string.Equals(_comparand0, other._comparand0, StringComparison.OrdinalIgnoreCase)
&& string.Equals(_comparand14, other._comparand14, StringComparison.OrdinalIgnoreCase);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
return obj is StringWrapper && Equals((StringWrapper) obj);
}
public override int GetHashCode()
{
unchecked
{
return ((_comparand0 != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(_comparand0) : 0)*397)
^ (_comparand14 != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(_comparand14) : 0);
}
}
}

Is there a way to check the data in advance of runtime if it's not the correct type?

I occasionally get data that is not completely clean, and during runtime I get error messages because the data doesn't match the expected type. For example, sometimes the data has a string where there should be an int, or an int where there should be a date.
Is there a way to scan the data first for bad data, so that I can fix it all at once instead of finding out during run-time and fixing it iteratively?
Here's my code which works:
class TestScore{
public string Name;
public int Age;
public DateTime Date;
public DateTime Time;
public double Score;
}
//read data
var Data = File.ReadLines(FilePath).Select(line => line.Split('\t')).ToArray();
//select data
var query = (from x in Data
             select new { Name = x[3], Age = x[1], Date = x[2], Time = x[5], Score = x[7] }).ToList();
//create List and put data into List
List<TestScore> Results = new List<TestScore>();
for (int i = 0; i < query.Count; i++)
{
    TestScore TS = new TestScore();
    TS.Name = query[i].Name;
    // each Parse call throws a FormatException at run time on malformed input
    TS.Age = int.Parse(query[i].Age);
    TS.Date = DateTime.Parse(query[i].Date);
    TS.Time = DateTime.Parse(query[i].Time);
    TS.Score = double.Parse(query[i].Score);
    Results.Add(TS);
}
"Is there a way to scan the data first for bad data, so that I can fix it all at once instead of finding out during run-time and fixing it iteratively?"
Scanning is a runtime operation. However, it's fairly straightforward to implement a solution that gives you enough information to "fix it all at once".
The following code shows a pattern for validating the file in its entirety, and doesn't attempt to load any data unless it completely succeeds.
If it fails, a collection of all errors encountered is returned.
internal sealed class ParseStatus
{
internal bool IsSuccess;
internal IReadOnlyList<string> Messages;
}
private ParseStatus Load()
{
string filePath = "foo";
var data = File.ReadLines( filePath ).Select( line => line.Split( '\t' ) ).ToArray();
var results = from x in data
select new { Name = x[3], Age = x[1], Date = x[2], Time = x[5], Score = x[7] };
var errors = new List<string>();
int row = 0;
// first pass: look for errors by testing each value
foreach( var line in results )
{
row++;
int dummy;
if( !int.TryParse( line.Age, out dummy ) )
{
errors.Add( "Age couldn't be parsed as an int on line " + row );
}
// etc...use exception-free checks on each property
}
if( errors.Count > 0 )
{
// quit, and return errors list
return new ParseStatus { IsSuccess = false, Messages = errors };
}
// otherwise, it is safe to load all rows
// TODO: second pass: load the data
return new ParseStatus { IsSuccess = true };
}
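The "etc..." checks in the first pass could be filled in along these lines. This is only a sketch: RowValidator and Validate are hypothetical names, the column positions are taken from the question's query (Age = x[1], Date = x[2], Time = x[5], Score = x[7]), and parsing with the invariant culture is an assumption about the file format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class RowValidator
{
    // Hypothetical helper: returns one message per bad value, without throwing.
    public static List<string> Validate(IEnumerable<string> lines)
    {
        var errors = new List<string>();
        var inv = CultureInfo.InvariantCulture;
        int row = 0;
        foreach (string line in lines)
        {
            row++;
            string[] x = line.Split('\t');
            if (x.Length < 8)
            {
                errors.Add("Line " + row + ": expected at least 8 columns, found " + x.Length);
                continue; // cannot index the columns safely
            }
            int age;
            DateTime when;
            double score;
            if (!int.TryParse(x[1], NumberStyles.Integer, inv, out age))
                errors.Add("Line " + row + ": Age couldn't be parsed as an int");
            if (!DateTime.TryParse(x[2], inv, DateTimeStyles.None, out when))
                errors.Add("Line " + row + ": Date couldn't be parsed as a DateTime");
            if (!DateTime.TryParse(x[5], inv, DateTimeStyles.None, out when))
                errors.Add("Line " + row + ": Time couldn't be parsed as a DateTime");
            if (!double.TryParse(x[7], NumberStyles.Float, inv, out score))
                errors.Add("Line " + row + ": Score couldn't be parsed as a double");
        }
        return errors;
    }
}
```

An empty list means every row passed, so the second pass can load the data without hitting a parse exception.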
To avoid discovering the errors at run time, the best thing I can think of would be to correct the data manually before your program runs.
But since we are trying to be constructive, I think using a static readonly field to flag a data error would be helpful. The following is a simple example that just skips the failed items; you might want to modify it if you need more advanced handling.
public partial class TestScore {
public static TestScore Parse(String plainText) {
var strings=plainText.Split('\t');
var result=new TestScore();
if(
strings.Length<5
||
!double.TryParse(strings[4], out result.Score)
||
!DateTime.TryParse(strings[3], out result.Time)
||
!DateTime.TryParse(strings[2], out result.Date)
||
!int.TryParse(strings[1], out result.Age)
)
return TestScore.Error;
result.Name=strings[0];
return result;
}
public String Name;
public int Age;
public DateTime Date;
public DateTime Time;
public double Score;
public static readonly TestScore Error=new TestScore();
}
public static partial class TestClass {
public static void TestMethod() {
var path=@"some tab splitted file";
var lines=File.ReadAllLines(path);
var format=""
+"Name: {0}; Age: {1}; "
+"Date: {2:yyyy:MM:dd}; Time {3:hh:mm}; "
+"Score: {4}";
var list=(
from line in lines
where String.Empty!=line
let result=TestScore.Parse(line)
where TestScore.Error!=result
select result).ToList();
foreach(var item in list) {
Console.WriteLine(
format,
item.Name, item.Age, item.Date, item.Time, item.Score
);
}
}
}
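If the failed items should be kept for later fixing rather than skipped, one possible variation is to collect them in a second list. The sketch below uses a simplified, hypothetical ScoreRow type with three tab-separated fields (name, age, score) instead of the full TestScore class, and assumes the score uses the invariant culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Simplified stand-in for TestScore (hypothetical).
sealed class ScoreRow
{
    public string Name;
    public int Age;
    public double Score;
}

static class Partitioner
{
    // Exception-free parse of one line: name, age, score.
    public static bool TryParse(string line, out ScoreRow row)
    {
        row = null;
        string[] s = line.Split('\t');
        int age;
        double score;
        if (s.Length < 3
            || !int.TryParse(s[1], out age)
            || !double.TryParse(s[2], NumberStyles.Float, CultureInfo.InvariantCulture, out score))
        {
            return false;
        }
        row = new ScoreRow { Name = s[0], Age = age, Score = score };
        return true;
    }

    // Sorts every non-empty line into either the parsed or the rejected list.
    public static void Split(IEnumerable<string> lines, List<ScoreRow> parsed, List<string> rejected)
    {
        foreach (string line in lines)
        {
            if (line.Length == 0) continue;
            ScoreRow row;
            if (TryParse(line, out row)) parsed.Add(row);
            else rejected.Add(line);
        }
    }
}
```

After the call, the rejected list holds the raw lines that need attention, so they can all be reviewed in one go.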
