Split multiple csv files by value from one csv file with c#

I need to open a CSV file, then filter the rows and generate one output file for each distinct value.
Example:
• Input file = "full list.csv"
NAME CITY
Mark Venezia
John New York
Lisa San Miguel
Emily New York
Amelia New York
Nicolas Venezia
Bill San Miguel
Steve Venezia
Output will be =
• file1 = "full list_Venezia.csv"
NAME CITY
Mark Venezia
Nicolas Venezia
Steve Venezia
• file2 = "full list_New York.csv"
NAME CITY
John New York
Emily New York
Amelia New York
• file3 = "full list_San Miguel.csv"
NAME CITY
Lisa San Miguel
Bill San Miguel
I'm using C# in a Console Application in Visual Studio, and I started by reading the input file like this:
string inputFile = "full list.csv";
string outputFile;
string line;
string titles = File.ReadLines(inputFile).First();
System.IO.StreamReader file = new System.IO.StreamReader(inputFile);
while ((line = file.ReadLine()) != null)
{
}
file.Close();
System.IO.StreamWriter fileOut = new System.IO.StreamWriter(outputFile);
foreach (DatiOutput objOut in listOutput)
{
}
fileOut.Close();
Is there an algorithm that allows me to filter the data I need?

Here's a non-LINQy approach using a Dictionary to keep a reference to each output file based on the city name as the Key (there's nothing wrong with LINQ, though!):
string[] values;
string header;
string line, city, outputFileName;
string inputFile = "full list.csv";
Dictionary<string, System.IO.StreamWriter> outputFiles = new Dictionary<string, System.IO.StreamWriter>();
using (System.IO.StreamReader file = new System.IO.StreamReader(inputFile))
{
header = file.ReadLine();
while ((line = file.ReadLine()) != null)
{
values = line.Split(",".ToCharArray());
city = values[1];
if (!outputFiles.ContainsKey(city))
{
outputFileName = "full list_" + city + ".csv";
outputFiles.Add(city, new System.IO.StreamWriter(outputFileName));
outputFiles[city].WriteLine(header);
}
outputFiles[city].WriteLine(line);
}
}
foreach(System.IO.StreamWriter outputFile in outputFiles.Values)
{
outputFile.Close();
}

You have written most of the good parts yourself; now you need to fill in the blanks.
Breaking down the steps:
Read the CSV into a collection
Group the collection by City
Write each group to a separate file
The first step, of course, is to read the input file.
var listOutput = new List<DatiOutput>();
while ((line = file.ReadLine()) != null)
{
var data = line.Split(new []{";"},StringSplitOptions.RemoveEmptyEntries);
if(!data[0].Trim().Equals("NAME"))
listOutput.Add(new DatiOutput{ Name = data[0].Trim(), City = data[1].Trim()});
}
I have assumed your DatiOutput looks like the following, as it was not given.
public class DatiOutput
{
public string City{get;set;}
public string Name{get;set;}
}
The next step is to group the collection by City and then write each group to a file. You can use LINQ to group the collection by City.
listOutput.GroupBy(c=>c.City)
Once you have the result, you can create a file name with the corresponding city name appended and write the data to it.
foreach (var objOut in listOutput.GroupBy(c=>c.City))
{
var filePath = $"{Path.Combine(Path.GetDirectoryName(inputFile),Path.GetFileNameWithoutExtension(inputFile))}_{objOut.First().City}.csv";
using(System.IO.StreamWriter fileOut = new System.IO.StreamWriter(File.Open(filePath, FileMode.Create, FileAccess.Write))) // FileMode.Create truncates an existing file; OpenOrCreate would leave stale trailing data
{
fileOut.WriteLine($"NAME;CITY");
foreach(var items in objOut)
{
fileOut.WriteLine($"{items.Name};{items.City}");
}
}
}
This produces the desired result.

foreach (var g in File.ReadAllLines("full list.csv")
.Skip(1)
.Select(l => new {
Name = l.Substring(0, l.IndexOf(',')),
City = l.Substring(l.IndexOf(',') + 1) })
.GroupBy(l => l.City))
{
File.WriteAllLines($"full list_{g.Key}.csv", new[] { "NAME,CITY" }
.Concat(g.Select(l => $"{l.Name},{l.City}")));
}
The key part your example was missing was GroupBy, which lets you split the data you have read into groups based on a certain criterion (in our case, City).
GroupBy is a powerful LINQ extension method for partitioning data by key. The example above reads in all the lines, skips the header, and uses Select to transform each line into an instance of an anonymous type holding the name and city. GroupBy then groups these instances by city, and for each group the data is written to a new file.
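To see the grouping step in isolation, here is a minimal sketch that works on in-memory rows instead of files. The class and method names (CityGrouping, GroupNamesByCity) are made up for illustration, and it assumes plain "NAME,CITY" rows without quoted commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CityGrouping
{
    // Hypothetical helper: groups "NAME,CITY" rows by city and returns
    // city -> names. GroupBy keeps the names in their input order.
    public static Dictionary<string, List<string>> GroupNamesByCity(IEnumerable<string> rows)
    {
        return rows
            .Select(row => row.Split(','))
            .Where(parts => parts.Length == 2)           // ignore malformed rows
            .GroupBy(parts => parts[1].Trim(),           // key: city
                     parts => parts[0].Trim())           // element: name
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

Each dictionary entry corresponds to one output file: the key supplies the city for the file name and the list supplies the rows to write.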

I would take @TVOHM's answer in a slightly cleaner direction by keeping the same code style throughout the whole solution.
File.ReadAllLines("full list.csv") // Read the input file
.Skip(1) // Skip the header row
.Select(row => row.Split(',')) // Split each row to array of name and city
.GroupBy(row => row[1], row => row[0]) // Group by cities, selecting names
.ToList() // To list, so .ForEach is possible
.ForEach(group => File.WriteAllLines($"full list_{group.Key}.csv", group)); // Create file for each group and write the names

Related

C# - check which element in a csv is not in an other csv and then write the elements to another csv

My task is to check which elements of a column in one CSV are not included in the elements of a column in another CSV. There is a country column in both CSVs, and the task is to find the countries that are in the first CSV but not in the second.
I guess I have to solve it with Lists after I read the strings from the two CSVs. But I don't know how to check which items in the first list are not in the other list and then put them into a third list.

There are many ways to achieve this. For many real-world CSV applications it is helpful to read the CSV input into a typed in-memory store, and there are standard libraries that can assist with this, such as CsvHelper, as explained in this canonical post: Parsing CSV files in C#, with header
However, for this simple requirement we only need to parse the Country values from the master list, in this case the second CSV. We don't need to manage, validate or parse any of the other fields in the CSVs.
Build a list of unique Country values from the second csv
Iterate the first csv
Get the Country value
Check against the list of countries from the second csv
Write to the third csv if the country was not found
You can test the following code on .NET Fiddle
NOTE: this code uses StringWriter and StringReader because their interfaces are the same as those of the file readers and writers in the System.IO namespace, which lets us avoid the complexity of file access for this simple requirement.
string inputcsv = @"Id,Field1,Field2,Country,Field3
1,one,two,Australia,three
2,one,two,New Zealand,three
3,one,two,Indonesia,three
4,one,two,China,three
5,one,two,Japan,three";
string masterCsv = @"Field1,Country,Field2
one,Indonesia,...
one,China,...
one,Japan,...";
string errorCsv = "";
// For all in inputCsv where the country value is not listed in the masterCsv
// Write to errorCsv
// Step 1: Build a list of unique Country values
bool csvHasHeader = true;
int countryIndexInMaster = 1;
char delimiter = ',';
List<string> countries = new List<string>();
using (var masterReader = new System.IO.StringReader(masterCsv))
{
string line = null;
if (csvHasHeader)
{
line = masterReader.ReadLine();
// an example of how to find the column index from first principles
if(line != null)
countryIndexInMaster = line.Split(delimiter).ToList().FindIndex(x => x.Trim('"').Equals("Country", StringComparison.OrdinalIgnoreCase));
}
while ((line = masterReader.ReadLine()) != null)
{
string country = line.Split(delimiter)[countryIndexInMaster].Trim('"');
if (!countries.Contains(country))
countries.Add(country);
}
}
// Read the input CSV, if the country is not in the master list "countries", write it to the errorCsv
int countryIndexInInput = 3;
csvHasHeader = true;
var outputStringBuilder = new System.Text.StringBuilder();
using (var outputWriter = new System.IO.StringWriter(outputStringBuilder))
using (var inputReader = new System.IO.StringReader(inputcsv))
{
string line = null;
if (csvHasHeader)
{
line = inputReader.ReadLine();
if (line != null)
{
countryIndexInInput = line.Split(delimiter).ToList().FindIndex(x => x.Trim('"').Equals("Country", StringComparison.OrdinalIgnoreCase));
outputWriter.WriteLine(line);
}
}
while ((line = inputReader.ReadLine()) != null)
{
string country = line.Split(delimiter)[countryIndexInInput].Trim('"');
if(!countries.Contains(country))
{
outputWriter.WriteLine(line);
}
}
outputWriter.Flush();
errorCsv = outputWriter.ToString();
}
// dump output to the console
Console.WriteLine(errorCsv);

Since you write about solving it with lists, I assume you can load those values from the CSV to the lists, so let's start with:
List<string> countriesIn1st = LoadDataFrom1stCsv();
List<string> countriesIn2nd = LoadDataFrom2ndCsv();
Then you can easily solve it with linq:
List<string> countriesNotIn2nd = countriesIn1st.Where(country => !countriesIn2nd.Contains(country)).ToList();
Now you have your third list with countries that are in first, but not in the second list. You can save it.
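As a variation on the Where/Contains query, the sketch below uses a HashSet for the lookup so each check does not rescan the second list. The names CountryDiff and MissingFromSecond are hypothetical, and the case-insensitive comparison is an assumption about how country names should match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountryDiff
{
    // Hypothetical helper: returns the countries that appear in 'first'
    // but not in 'second', without duplicates, in their original order.
    public static List<string> MissingFromSecond(IEnumerable<string> first, IEnumerable<string> second)
    {
        // Assumption: "japan" and "Japan" count as the same country.
        var known = new HashSet<string>(second, StringComparer.OrdinalIgnoreCase);

        return first
            .Where(country => !known.Contains(country))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

With the lists from above, this could be called as CountryDiff.MissingFromSecond(countriesIn1st, countriesIn2nd). LINQ's built-in Except gives a similar result when exact, case-sensitive matching is enough.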

Read delimited text files dynamically

I want to read a textfile dynamically based on the headers. Consider an example like this
name|email|phone|othername|company
john|john@example.com|1234||example
doe|doe@example.com||pin
jane||98485|
The values to be read like this for the following records
name email phone othername company
john john@example.com 1234 example
doe doe@example.com pin
jane 98485
I tried using this
using (StreamReader sr = new StreamReader(new MemoryStream(textFile)))
{
while (sr.Peek() >= 0)
{
string line = sr.ReadLine(); //Using readline method to read text file.
string[] strlist = line.Split('|'); //using string.split() method to split the string.
Obj obj = new Obj();
obj.Name = strlist[0].ToString();
obj.Email = strlist[1].ToString();
obj.Phone = strlist[2].ToString();
obj.othername = strlist[3].ToString();
obj.company = strlist[4].ToString();
}
}
The code above works if every line contains all the delimiters, but it fails when trailing fields are omitted, as in the sample above. Is there a possible solution for this?

If you have any control over this, you should use a better serialization technology, or at least use a CSV parser that can deal with this sort of format. However, if you want to use string.Split, you can also take advantage of ElementAtOrDefault:
Returns the element at a specified index in a sequence or a default
value if the index is out of range.
Given
public class Data
{
public string Name { get; set; }
public string Email { get; set; }
public string Phone { get; set; }
public string OtherName { get; set; }
public string Company { get; set; }
}
Usage
var results = File
.ReadLines(SomeFileName) // stream the lines from a file
.Skip(1) // skip the header
.Select(line => line.Split('|')) // split on pipe
.Select(items => new Data() // populate some funky class
{
Name = items.ElementAtOrDefault(0),
Email = items.ElementAtOrDefault(1),
Phone = items.ElementAtOrDefault(2),
OtherName = items.ElementAtOrDefault(3),
Company = items.ElementAtOrDefault(4)
});
foreach (var result in results)
Console.WriteLine($"{result.Name}, {result.Email}, {result.Phone}, {result.OtherName}, {result.Company}");
Output
john, john@example.com, 1234, , example
doe, doe@example.com, , pin,
jane, , 98485, ,

When you split the line with string[] strlist = line.Split('|'); you can get undesired results.
For example, jane||98485| generates an array of just 4 elements, as you can check online here: https://rextester.com/WBOT6074
You should check the strlist array after generating it, for example by checking its length before indexing into it.
As you haven't given clear details about the problem, I cannot give a more specific answer.
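One way to apply that length check is to read the header line first and pad every short row with empty strings, so each record always has a value for every header. This is only a sketch: PipeFile and Parse are hypothetical names, and it assumes the first line is the header and that values never contain the '|' character.

```csharp
using System;
using System.Collections.Generic;

public static class PipeFile
{
    // Hypothetical helper: maps every data line onto the header names.
    // Fields missing at the end of a line become empty strings.
    public static List<Dictionary<string, string>> Parse(IEnumerable<string> lines)
    {
        var records = new List<Dictionary<string, string>>();
        string[] headers = null;

        foreach (var line in lines)
        {
            var fields = line.Split('|');
            if (headers == null)
            {
                headers = fields;   // first line defines the column names
                continue;
            }

            var record = new Dictionary<string, string>();
            for (int i = 0; i < headers.Length; i++)
            {
                // Guard against rows that are shorter than the header.
                record[headers[i]] = i < fields.Length ? fields[i] : "";
            }
            records.Add(record);
        }
        return records;
    }
}
```

Because values are looked up by header name (for example record["phone"]), the code keeps working if columns are added or reordered in the file.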

C# Reading From A File

I have an assignment to read text in from a text file. The text is an inventory with department names followed by the quantity of items in the department and then items underneath the separate departments with the item name, quantity, and price. A part of the text file is shown here:
Stationary, 4
Notebook, 20, .99
Pens, 50, .50
Pencils, 25, 0.09
Post It Notes, 30, 4.99
Tools, 6
Band Saw, 3, 299.99
Cresent Wrench, 12, 8.49
Circular Saw, 5, 89.99
Tile Cutter, 2, 149.99
Screwdriver, 70, 2.99
Measuring Tape, 34, 10.99
I'm able to load the text file in just fine. My task is to take user input to let them decide which department they want to shop in. How can I display just the departments, and then just the items of the department the user chose? I have a method to output all of the departments and items, shown below. This is my first time working with text files in C#, so I have no idea what I am doing.
static void ReadDepartments(out List<Dept> s)
{
string line; // detail line read from file
string[] tokens; // break line up into tokens
string deptName; // name of department
int deptQuan; // quan of different items in dept
s = new List<Dept>();
try
{
using (StreamReader sr = new StreamReader(@"..\..\inventory.txt"))
{
while (sr.Peek() >=0)
{
List<Item> myItemList = new List<Item>(); // new instance of tmp List
line = sr.ReadLine();
tokens = line.Split(',');
deptName = tokens[0];
deptQuan = Convert.ToInt32(tokens[1]);
for (int i=0; i< deptQuan; i++)
{
// read each line of dept and build a list of items
line = sr.ReadLine();
tokens = line.Split(',');
Item myItem = new Item(tokens[0], Convert.ToInt32(tokens[1]), Convert.ToDouble(tokens[2]));
myItemList.Add(myItem);
}
s.Add(new Dept(deptName,deptQuan, myItemList));
}
}
}
catch (Exception e)
{
Console.WriteLine("Can't open file because {0}", e.Message);
}
}
static void PrintInventory(List<Dept> s)
{
foreach (Dept d in s)
{
Console.WriteLine("Dept: {0,-20} [{1} items]", d.Name, d.NumItems);
for (int i = 0; i < d.NumItems; i++)
Console.WriteLine(" {0,-15} {1,4} {2,7:$,##0.00}", d.GetItem(i).Name,
d.GetItem(i).Quan, d.GetItem(i).PriceEach);
}
}
I started a method to check if the desired department is a valid department shown below. Is there an easier way to implement the valid[] variable instead of including all of the department names? I will have to error check for valid items and that seems like it would be very tedious.
static string GetDepartment(string prompt)
{
string[] valid = {"BOOKS", "FOOD", "VIDEO", "SPORTS", "STATIONARY", "TOOLS"};
string ans = GetString(prompt, valid, "Invalid response. Please choose a department.");
return ans;
}
static string GetString(string prompt, string[] valid, string error)
{
string response;
bool OK = false;
do
{
Console.Write(prompt);
response = Console.ReadLine().ToUpper();
foreach (string s in valid) if (response == s) OK = true;
if (!OK) Console.WriteLine(error);
}
while (!OK);
return response;
}

Your method that reads from the text file results in you having a List<Dept>. So you can generate a list of valid department names by going through the list of departments that you have read from the text file.
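A minimal sketch of that idea follows. DepartmentPrompt and ValidNames are hypothetical names; the helper takes plain strings so it stays independent of the Dept class, and it upper-cases the names because GetString upper-cases the user's response before comparing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DepartmentPrompt
{
    // Hypothetical helper: builds the 'valid' array from whatever
    // department names were read from the file.
    public static string[] ValidNames(IEnumerable<string> departmentNames)
    {
        return departmentNames
            .Select(name => name.Trim().ToUpper())  // match the ToUpper() in GetString
            .Distinct()
            .ToArray();
    }
}
```

Inside GetDepartment this could replace the hard-coded array, for example string[] valid = DepartmentPrompt.ValidNames(departments.Select(d => d.Name)); assuming the List<Dept> is passed in as departments.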

LINQ is great for searching through data and checking whether items exist.
Since you have all of your departments in a List, you can query it in several ways. Either search your raw data directly:
using System.Linq;
...
List<Dept> departments;
...
departments.Any(dept => dept.Name == response);
Or if you want to send the names to your GetString method:
GetString(prompt, departments.Select(dept => dept.Name), ...);
...
string GetString(string prompt, IEnumerable<string> valids, string error)
...
valids.Any(valid => valid == response);
If you want the Dept object itself, you can use FirstOrDefault (which also takes a predicate) and check for null in case the item does not exist:
Dept found = departments.FirstOrDefault(dept => dept.Name == response);
if (found == null) //department name does not exist

If everything else in your code works, you can add an if statement that checks whether the current department is the one the user asked for before printing it. I didn't check the whole code. You could also solve this with LINQ, which would be more concise, but your code looks like beginner code, so this version may be slightly less efficient but should solve your problem.
static void PrintInventory(List<Dept> s,string userInputDepartmentName)
{
if(s == null || s.Count <= 0) // use ||: with && a null list would throw on s.Count
return;
foreach (Dept d in s)
{
if(d.Name.Equals(userInputDepartmentName))
{
Console.WriteLine("Dept: {0,-20} [{1} items]", d.Name, d.NumItems);
for (int i = 0; i < d.NumItems; i++)
Console.WriteLine("{0,-15} {1,4} {2,7:$,##0.00}", d.GetItem(i).Name,d.GetItem(i).Quan, d.GetItem(i).PriceEach);
}
}
}

MailMerge TableStart-TableEnd: Add enter on end of page with multiline rows

We have a MailMerge docx which has the following table:
_____________________________________________________________________________
Date Id Description Amount
_____________________________________________________________________________
{{TableStart {{Id}} {{Description}} € {{Amount
:Lines}}{{Da \# 0,00}}{{
te \#"dd-MM- TableEnd:Li
yyyy"}} nes}}
_____________________________________________________________________________
Total € {{Total \#
0,00}}
_____________________________________________________________________________
Here is an example result row:
____________________________________________________________________________
Date Id Description Amount
____________________________________________________________________________
03-09-2015 0001 Company Name € 25,00
Buyer Name 1, Buyer Name 2
Product description
Extra description line
As you can see, the description has multiple lines. When the end of a page is reached, it just continues on the next page. So with the example above, the line could be like this at the end of page 1:
03-09-2015 0001 Company Name € 25,00
Buyer Name 1, Buyer Name 2
And like this at the start of page 2:
Product description
Extra description line
What I'd like instead is the following: When an item doesn't fit on the page anymore, the entire item must go to the start of the next page. Basically I want to prevent items from splitting between pages. Is there any way to accomplish this with MailMerge?
Also, we use C# in our project. I think it's a bit too ambitious to ask whether there is a setting in the MailMerge libraries that allows the behavior I want. Anyway, here is the code we use to convert the data and docx to a PDF:
var pdf = _documentService.CreateTableFile(new TableFileData(date, companyId,
dataList.Select(x => new TableRowData
{
Description = x.Description,
Amount = x.Amount,
Date = x.Date,
Id = x.Id
}).ToList()));
var path = Path.Combine(FileService.GetTemporaryPath(), Path.GetRandomFileName());
var file = Path.ChangeExtension(path, "pdf");
using (var fs = File.OpenWrite(file))
{
fs.Write(pdf, 0, pdf.Length);
}
Process.Start(file);
With CreateTableFile-method:
public byte[] CreateTableFile(TableFileData data)
{
if (data == null) throw new ArgumentNullException("data");
const string fileName = "TableFile.docx";
var path = Path.Combine(_templatePath, fileName);
using (var fs = File.OpenRead(path))
{
var dataSource = new DocumentDataSource(data);
return GenerateDocument(fs, dataSource);
}
}
With GenerateDocument-method:
private static byte[] GenerateDocument(Stream template, DocumentDataSource dataSource, IFieldMergingCallback fieldMergingCallback = null)
{
var doc = new Document(template);
doc.MailMerge.FieldMergingCallback = fieldMergingCallback;
doc.MailMerge.UseNonMergeFields = true;
doc.MailMerge.CleanupOptions = MailMergeCleanupOptions.RemoveContainingFields |
MailMergeCleanupOptions.RemoveUnusedFields |
MailMergeCleanupOptions.RemoveUnusedRegions |
MailMergeCleanupOptions.RemoveEmptyParagraphs;
doc.MailMerge.Execute(dataSource);
doc.MailMerge.ExecuteWithRegions((IMailMergeDataSourceRoot)dataSource);
doc.UpdateFields();
using (var ms = new MemoryStream())
{
var options = new PdfSaveOptions { WarningCallback = new AsposeWarningCallback() };
doc.Save(ms, options);
return ms.ToArray();
}
}

Following @bibadia's suggestion in the first comment on the question, I unchecked the suggested checkbox in the table settings of the docx (in Word this is most likely "Allow row to break across pages" in the table row properties).
This did the trick, so thanks a lot bibadia!

Parse file with multiples values lines C#

I have to parse a file that is constructed like this :
User: jcruz Name: Jules Last: Cruz Email: Some@email.com
User: jdoe Name: John Last: Doe Email: Some@email.com
User: pmartin Name: Pete Last: Martin Email: Some@email.com
User: rrichard Name: Reed Last: Richard Email: Some@email.com
I need to split every line taking just Name, Last Name and Email into an object of the type
var contact = new Contact {
Name = fieldFromLine,
Last= fieldFromLine,
Email = fieldFromLine
}
So my question is which tool to use, String.Split or Regex.Split, and how to implement it.
Thank you very much...
This is what I have done so far:
String archivo = ((FileDialog)sender).FileName;
using (TextReader sr = new StreamReader(archivo,Encoding.UTF8))
{
String line = String.Empty;
while ((line = sr.ReadLine()) != null )
{
string[] result = Regex.Split(line,"User:");
//How to get the other fields...
}
}

var result =File.ReadLines(fileName)
.Select(line => line.Split(new string[]{"User:", "Name:", "Last:", "Email:"}, StringSplitOptions.RemoveEmptyEntries))
.Select(parts => new Contact { Name = parts[1].Trim(), Last = parts[2].Trim(), Email = parts[3].Trim() }) // parts[0] is the user name; Trim removes the spaces left around each value
.ToArray();

Try this:
public class contact
{
public string Name { get; set; }
public string Lname { get; set; }
public string Email { get; set; }
}
List<contact> contacts = new List<contact>();
private void split()
{
var lines = File.ReadAllLines(@"txt file address");
foreach (var line in lines)
{
// Splitting on ':' leaves the next label at the end of each value, e.g. " Jules Last"
var splitline = line.Split(':');
string name = splitline[2].Replace("Last", "").Trim();
string lname = splitline[3].Replace("Email", "").Trim();
contacts.Add(new contact { Name = name, Lname = lname, Email = splitline[4].Trim() });
}
}

Answer: neither.
Use a simple finite-state machine (FSM) parser to read the file. Unless you can guarantee that the text values will never be "Name:" or "Last:" or "Email:", you'll run into problems with string splitting. FSM-based parsers are also significantly faster than string splitting, as there is no extraneous string allocation.
I don't have the time to write out an entire parser, but here's the simple logic:
enum State { InUser, InName, InLast, InEmail }
State currentState = State.InUser; // you start off with the 'cursor' in the "User" section
StringBuilder sb = new StringBuilder(); // this holds the current string element
foreach(Char c in entireTextFile) { // presumably using `StreamReader.Read()`
switch( currentState ) {
case State.InUser:
switch( c ) {
// state transition logic here
}
// append the character to the StringBuilder until you've identified and reached the next field, then save the sb value to the appropriat
case State.InName:
// and so on...
}
}
Of course, an FSM parser is fundamentally the same thing as a regular-expression parser, but you code the state transitions yourself instead of using regex syntax, and the hand-written version is faster.
If your project is small, you don't care about performance, and you can guarantee certain data formatting rules, then I'd go with regex.
But never, ever, use String.Split to read a file.
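For the regex route mentioned above, a minimal sketch could look like the following. ContactLineParser and TryParse are hypothetical names, and the pattern assumes every line carries the four labels in the order User, Name, Last, Email, and that the email contains no spaces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContactLineParser
{
    // Lazy groups (.+?) stop at the next label, so multi-word values
    // such as "De la Cruz" are captured whole.
    private static readonly Regex LinePattern = new Regex(
        @"^\s*User:\s*(?<user>.+?)\s+Name:\s*(?<name>.+?)\s+Last:\s*(?<last>.+?)\s+Email:\s*(?<email>\S+)\s*$",
        RegexOptions.Compiled);

    // Returns false (and empty values) when the line does not match.
    public static bool TryParse(string line, out string name, out string last, out string email)
    {
        name = last = email = string.Empty;

        Match match = LinePattern.Match(line);
        if (!match.Success)
            return false;

        name = match.Groups["name"].Value;
        last = match.Groups["last"].Value;
        email = match.Groups["email"].Value;
        return true;
    }
}
```

Lines that fail to match can be logged or skipped by the caller, which gives a clearer failure mode than indexing into the result of a split.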

Regex is overkill. Also note that some last names contain spaces.
Contact c = new Contact();
string[] tokens = input.Split(':');
if (tokens.Length < 5)
return; // error
// now strip the last word (the next field's label) from each token
c.Name = tokens[2].Substring(0, tokens[2].LastIndexOf(' ')).Trim();
c.Last = tokens[3].Substring(0, tokens[3].LastIndexOf(' ')).Trim();
c.Email = tokens[4].Trim();
