How to turn a string into a 2d string array - c#

As the title suggests, I am looking for guidance on how to turn a string (csvData) into a 2D string array by splitting it twice, on ';' and ',' respectively.
Currently I am at the stage where I am able to split it once into rows and turn it into an array, but I cannot figure out how to instead create a 2D array where the columns divided by ',' are also separate.
string[] Sep = csvData.Split(';').Select(csvData => csvData.Replace(" ","")).Where(csvData => !string.IsNullOrEmpty(csvData)).ToArray();
I have tried various things, like:
string[,] Sep = csvData.Split(';',',').Select(csvData => csvData.Replace(" ","")).Where(csvData => !string.IsNullOrEmpty(csvData)).ToArray();
naively thinking that C# would understand what I was trying to achieve, but since I am here it's obvious that it didn't: I got the error "Cannot implicitly convert type 'string[]' to 'string[*,*]'".
Note that I have not coded for a while, so if my thinking is completely wrong and you do not understand what I am trying to convey with this question, I apologize in advance.
Thanks!

In a strongly-typed language like C#, the compiler makes no assumptions about what you intend to do with your data. You must make your intent explicit through your code. Something like this should work:
string csvData = "A,B;C,D";
string[][] sep = csvData.Split(';') // Returns string[] {"A,B","C,D"}
.Select(str => str.Split(',')) // Returns IEnumerable<string[]> {{"A","B"},{"C","D"}}
.ToArray(); // Returns string[][] {{"A","B"},{"C","D"}}
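The snippet above produces a jagged array (string[][]) rather than the string[,] from the question. If you really need a rectangular array, one option is to copy the jagged result into one. This is a minimal sketch, assuming rows may have different lengths (cells past the end of a short row are left null) and that an empty entry after a trailing ';' should be ignored; CsvGrid and ToRectangular are hypothetical names.

```csharp
using System;
using System.Linq;

public static class CsvGrid
{
    // Splits rows on ';' and columns on ',', then copies the jagged result
    // into a rectangular array sized to the widest row.
    public static string[,] ToRectangular(string csvData)
    {
        string[][] jagged = csvData
            .Split(';', StringSplitOptions.RemoveEmptyEntries) // drop the empty entry after a trailing ';'
            .Select(row => row.Split(','))
            .ToArray();

        int width = jagged.Length == 0 ? 0 : jagged.Max(row => row.Length);
        var grid = new string[jagged.Length, width];

        for (int r = 0; r < jagged.Length; r++)
        {
            for (int c = 0; c < jagged[r].Length; c++)
            {
                grid[r, c] = jagged[r][c]; // cells beyond a short row's end stay null
            }
        }
        return grid;
    }
}
```

For example, CsvGrid.ToRectangular("A,B;C,D") returns a 2 × 2 array whose [1, 0] element is "C".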

Rows are separated by semicolon, columns by comma?
Splitting by ';' gives you an array of rows. Splitting a row by ',' gives you an array of values.
If your data has a consistent schema, as in each CSV you process has the same columns, you could define a class to represent the entity to make the data easier to work with.
Let's say it's customer data:
John,Smith,8675309,johnsmith@gmail.com;
You could make a class with those properties:
public class Customer
{
public string FirstName { get; set; }
public string LastName { get; set; }
public string Phone { get; set; }
public string Email { get; set; }
}
Then:
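// RemoveEmptyEntries skips the empty entry produced by a trailing ';'
var rows = csvData.Split(';', StringSplitOptions.RemoveEmptyEntries);
List<Customer> customers = new();
foreach (var row in rows)
{
var fields = row.Split(',');
customers.Add(new()
{
FirstName = fields[0],
LastName = fields[1],
Phone = fields[2],
Email = fields[3]
});
}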
Now you have a list of customers to do whatever it is you do with customers.

Here is an answer that presents a few alternative ideas and things you can do with C#, more for educational/academic purposes than anything else. These days, to consume a CSV, we'd use a CSV library.
If your data is definitely regularly formed, you can get away with just one Split. The following code splits on either char to make one long array. It then stands to reason that every 4 elements is a new customer, the data of the customer being given by n+0, n+1, n+2 and n+3. Because we know how many data items we will consume, dividing it by 4 gives us the number of customers, so we can presize our 2D array.
var bits = csvData.Split(';',',');
var twoD = new string[bits.Length/4,4];
// x + 3 < bits.Length ignores leftovers, e.g. the empty item after a trailing ';'
for(int x = 0; x + 3 < bits.Length; x+=4){
twoD[x/4,0] = bits[x+0];
twoD[x/4,1] = bits[x+1];
twoD[x/4,2] = bits[x+2];
twoD[x/4,3] = bits[x+3];
}
I don't think I'd use 2D arrays, though, and I commend the other answer's advice to create a class to hold the related data; you can use the same technique:
var custs = new List<Customer>();
for(int x = 0; x + 3 < bits.Length;){ // stop before an incomplete trailing group
custs.Add(new()
{
FirstName = bits[x++],
LastName = bits[x++],
Phone = bits[x++],
Email = bits[x++]
});
}
Here we aren't incrementing x in the loop header; every time a bit of info is assigned, x is bumped up by 1 in the loop body. We could have kept the same approach as before, jumping it by 4; this just demos another approach that lends itself well here.
I mentioned that these days we probably wouldn't read a CSV and split it manually: what if the data contains a comma or a semicolon? It wrecks the file structure.
There are a boatload of libraries that read CSV files; CsvHelper is a popular one, and you'd use it like this:
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using var reader = new StreamReader("path\\to\\file.csv");
using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
var custs = csv.GetRecords<Customer>().ToList();
...
Your file would have a header line with column names that match your property names in C#. If it doesn't, you can use attributes on the properties to tell CsvHelper which column should be mapped to which property: https://joshclose.github.io/CsvHelper/getting-started/

Here's the simplest way I know to produce a 2d array by splitting a string.
string csvData = "A,B,C;D,E,F,G";
var temporary =
csvData
.Split(';')
.SelectMany((xs, i) => xs.Split(',').Select((x, j) => new { x, i, j }))
.ToArray();
int max_i = temporary.Max(x => x.i);
int max_j = temporary.Max(x => x.j);
string[,] array = new string[max_i + 1, max_j + 1];
foreach (var t in temporary)
{
array[t.i, t.j] = t.x;
}
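I purposely chose csvData to be missing a value, so the first row has only three items.
temporary holds one { x, i, j } entry per value: A (0,0), B (0,1), C (0,2), D (1,0), E (1,1), F (1,2), G (1,3).
The final array is a 2 × 4 string[,]: the first row is A, B, C, null and the second row is D, E, F, G.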

Related

Is there a way to filter a CSV file for data validation without for loops? (Lumenworks CSVReader)

I want to be able to filter a CSV file and perform data validation on the filtered data. I imagine using for loops, but the file has 2 million cells and it would take a long time. I am using Lumenworks CSVReader to access the file from C#.
I found the method csvfile.Where<>, but I have no idea what to put in the parameters. Sorry, I am still new to coding as well.
[EDIT] This is my code for loading the file. Thanks for all the help!
//Creating C# table from CSV data
var csvTable = new DataTable();
var csvReader = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true);
csvTable.Load(csvReader);
//grabs header from the CSV data table
string[] headers = csvReader.GetFieldHeaders(); //this method gets the headers of the CSV file
string filteredData[] = csvReader.Where // this is where I would want to implement the where method, or some sort of way to filter the data
//I can access the rows and columns with this
csvTable.Rows[0][0]
csvTable.Columns[0][0]
//After filtering (maybe even multiple filters) I want to add up all the filtered data (assuming they are integers)
var dataToValidate = 0;
foreach (var data in filteredData){
dataToValidate += data;
}
if (dataToValidate == 123)
//data is validated
I would read some of the documentation for the package you are using:
https://github.com/phatcher/CsvReader
https://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
To answer the filtering question specifically, so that the result contains only the data you are searching for, consider the following:
var filteredData = new List<List<string>>();
using (CsvReader csv = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true))
{
string searchTerm = "foo";
while (csv.ReadNextRecord())
{
var row = new List<string>();
for (int i = 0; i < csv.FieldCount; i++)
{
if (csv[i].Contains(searchTerm))
{
row.Add(csv[i]);
}
}
filteredData.Add(row);
}
}
This gives you a list of lists of strings that you can enumerate over to do your validation:
int dataToValidate = 0;
foreach (var row in filteredData)
{
foreach (var data in row)
{
// do the thing
}
}
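If the validation is the one described in the question (adding up the filtered values, assuming they are integers, and comparing the total with 123), the loop body could look like the following sketch. Validation, SumMatches and the choice to skip cells that don't parse as integers are assumptions for illustration, not part of the LumenWorks API.

```csharp
using System.Collections.Generic;

public static class Validation
{
    // Adds up every cell that parses as an integer (others are skipped)
    // and reports whether the total equals the expected value.
    public static bool SumMatches(List<List<string>> filteredData, int expectedTotal)
    {
        int dataToValidate = 0;
        foreach (var row in filteredData)
        {
            foreach (var data in row)
            {
                if (int.TryParse(data, out int value))
                {
                    dataToValidate += value;
                }
            }
        }
        return dataToValidate == expectedTotal;
    }
}
```

Whether non-numeric cells should be skipped or treated as a validation failure depends on your data; int.Parse would throw instead of skipping.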
--- Old Answer ---
Without seeing the code you are using to load the file, it might be a bit difficult to give you a full answer; ~2 million cells may be slow no matter what.
Your .Where comes from System.Linq:
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where?view=net-6.0
A simple example using .Where
//Read the file and return the records that match the where clause
public List<CsvFileStructure> ReadCSV()
{
List<CsvFileStructure> data = File.ReadLines(@"C:\Users\Public\Documents\test.csv")
.Select(line => line.Split(','))
// tokens[x] where x is the column number, assumes ID is column 0
.Select(tokens => new CsvFileStructure { Id = long.Parse(tokens[0]), Value = tokens[1] })
// Where filters based on whatever you are looking for in the CSV
.Where(csvFileStructure => csvFileStructure.Id == 1)
.ToList();
return data;
}
// Map of your data structure
public class CsvFileStructure
{
public long Id { get; set; }
public string Name { get; set; }
public string Value { get; set; }
}
Modified from this answer:
https://stackoverflow.com/a/10332737/7366061
Where is not a method defined by the CSV reader itself; it is part of LINQ in C#. The link below shows an example of computing column values in a CSV file using LINQ:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/how-to-compute-column-values-in-a-csv-text-file-linq
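As a small illustration of Where applied to CSV-style records, here is a sketch that assumes each record has already been read as a string[]; RecordFilter, SumWhere and the column positions are hypothetical, and int.Parse will throw if a summed cell isn't an integer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordFilter
{
    // Keeps records whose value in filterColumn equals filterValue,
    // then sums sumColumn across the kept records.
    public static int SumWhere(IEnumerable<string[]> records, int filterColumn, string filterValue, int sumColumn)
    {
        return records
            .Where(fields => fields.Length > Math.Max(filterColumn, sumColumn)) // skip short records
            .Where(fields => fields[filterColumn] == filterValue)
            .Sum(fields => int.Parse(fields[sumColumn]));
    }
}
```

Each Where call adds another filter, so several conditions can be chained before the final Sum, and nothing is evaluated until Sum runs.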

Read delimited text files dynamically

I want to read a text file dynamically based on its headers. Consider an example like this:
name|email|phone|othername|company
john|john@example.com|1234||example
doe|doe@example.com||pin
jane||98485|
The values should be read like this for these records:
name  email             phone  othername  company
john  john@example.com  1234              example
doe   doe@example.com          pin
jane                    98485
I tried using this
using (StreamReader sr = new StreamReader(new MemoryStream(textFile)))
{
while (sr.Peek() >= 0)
{
string line = sr.ReadLine(); //Using readline method to read text file.
string[] strlist = line.Split('|'); //using string.split() method to split the string.
Obj obj = new Obj();
obj.Name = strlist[0].ToString();
obj.Email = strlist[1].ToString();
obj.Phone = strlist[2].ToString();
obj.othername = strlist[3].ToString();
obj.company = strlist[4].ToString();
}
}
The above code works if all the delimiters are present, but fails with an IndexOutOfRangeException when a line has fewer fields, as in the example above. Is there any possible solution for this?
If you have any control over this, you should use a better serialization technology, or at least use a CSV parser that can deal with this sort of format. However, if you want to use string.Split, you can also take advantage of ElementAtOrDefault:
Returns the element at a specified index in a sequence or a default value if the index is out of range.
Given
public class Data
{
public string Name { get; set; }
public string Email { get; set; }
public string Phone { get; set; }
public string OtherName { get; set; }
public string Company { get; set; }
}
Usage
var results = File
.ReadLines(SomeFileName) // stream the lines from a file
.Skip(1) // skip the header
.Select(line => line.Split('|')) // split on pipe
.Select(items => new Data() // populate some funky class
{
Name = items.ElementAtOrDefault(0),
Email = items.ElementAtOrDefault(1),
Phone = items.ElementAtOrDefault(2),
OtherName = items.ElementAtOrDefault(3),
Company = items.ElementAtOrDefault(4)
});
foreach (var result in results)
Console.WriteLine($"{result.Name}, {result.Email}, {result.Phone}, {result.OtherName}, {result.Company}");
Output
john, john@example.com, 1234, , example
doe, doe@example.com, , pin,
jane, , 98485, ,
When you split the line like string[] strlist = line.Split('|'); you can get undesired results.
For example, jane||98485| generates an array of just 4 elements, as you can check online here: https://rextester.com/WBOT6074
You should check your strlist array after splitting, for example by checking its Length.
As you haven't given clear details about the problem, I cannot give a more specific answer.
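The question asks to read the file based on its headers. One way to do that with plain string.Split is to take the keys from the header line and fill in missing trailing fields with an empty string. This is a minimal sketch, assuming '|' as the default delimiter and no quoted or escaped fields; DelimitedReader and Parse are hypothetical names.

```csharp
using System.Collections.Generic;

public static class DelimitedReader
{
    // Uses the first non-blank line as the header and maps each later line to a
    // dictionary keyed by header name; missing trailing fields become "",
    // and extra fields beyond the header count are ignored.
    public static List<Dictionary<string, string>> Parse(IEnumerable<string> lines, char delimiter = '|')
    {
        var result = new List<Dictionary<string, string>>();
        string[] headers = null;

        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            var fields = line.Split(delimiter);
            if (headers == null)
            {
                headers = fields; // first line holds the column names
                continue;
            }

            var record = new Dictionary<string, string>();
            for (int i = 0; i < headers.Length; i++)
            {
                record[headers[i]] = i < fields.Length ? fields[i] : "";
            }
            result.Add(record);
        }
        return result;
    }
}
```

You could call it as DelimitedReader.Parse(File.ReadLines(path)), where path is your file; for the sample data, the record for doe has "pin" under othername and an empty company.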

Merge 2 Object Array Data Series That Share the Same [0] Index

So let's say that you have 2 series of data (both object arrays, either as serialized JSON strings or as the actual objects).
For instance:
string str1 = @"{""datapoints"":[[""02/28/2019"",146800.0],[""02/27/2019"",147700.0],[""02/26/2019"",153900.0]]}";
Then, you have a second series that is very similar...
string str2 = @"{""datapoints"":[[""02/28/2019"",145600.0],[""02/27/2019"",143600.0],[""02/26/2019"",152200.0]]}";
Note: the object arrays inside are both a length of "2", and both contain the same "date" as the [0] index.
How does one merge the 2 object arrays into 1, to yield the following output...
string str3 = @"{""datapoints"":[[""02/28/2019"",146800.0,145600.0],[""02/27/2019"",147700.0,143600.0],[""02/26/2019"",153900.0,152200.0]]}";
For clarity, I'm interested in using the [0] index once, and merging the [1] indexes (the number values) together.
Extra credit if this can be a loop, or can be done with any number of series.
Using Json.NET, you can deserialize each JSON sample to an object that contains a datapoints property that is an enumerable of object arrays, then merge them using the LINQ methods GroupBy() and Aggregate().
Say the JSON samples to be merged are in a string[] jsonSeriesList like so:
string str1 = @"{""datapoints"":[[""02/28/2019"",146800.0],[""02/27/2019"",147700.0],[""02/26/2019"",153900.0]]}";
string str2 = @"{""datapoints"":[[""02/28/2019"",145600.0],[""02/27/2019"",143600.0],[""02/26/2019"",152200.0]]}";
var jsonSeriesList = new[] { str1, str2 }; // Add others as required
Then you can create a combined series as follows:
var merged = jsonSeriesList.Aggregate(
new { datapoints = Enumerable.Empty<object[]>() },
(m, j) => new
{
datapoints = m.datapoints.Concat(JsonConvert.DeserializeAnonymousType(j, m).datapoints)
// Group them by the first array item.
// This will throw an exception if any of the arrays are empty.
.GroupBy(i => i[0])
// And create a combined array consisting of the key (the first item from all the grouped arrays)
// concatenated with all subsequent items in the grouped arrays.
.Select(g => new[] { g.Key }.Concat(g.SelectMany(i => i.Skip(1))).ToArray())
});
var mergedJson = JsonConvert.SerializeObject(merged);
Notes:
I am deserializing to an anonymous type for brevity. You could create an explicit data model if you prefer.
I am assuming the individual datapoint arrays all have at least one item.
There is no attempt to sort the resulting merged JSON by series date. You could do that if necessary.
The solution assumes that you will never have multiple component arrays in the same series with the same first item, e.g. "02/28/2019" repeated twice. If so, they will get merged also.
Sample .Net fiddle here.
Here is a simplified example (simplified because validation might be required, but I hope it gives you a place to start):
Convert the JSON into .NET objects.
Then go through each data point; for each data point, go through all the series, adding the values.
//container object for the json series
public class Container
{
public List<List<object>> datapoints;
}
//Input series in JSON
string[] inputSeries = new string[]
{
"{\"datapoints\": [[\"02/28/2019\", 146800.0],[\"02/27/2019\", 147700.0],[\"02/26/2019\", 153900.0]]}",
"{\"datapoints\": [[\"02/28/2019\", 145600.0],[\"02/27/2019\", 143600.0],[\"02/26/2019\", 152200.0]]}"
};
//Container for input series in dot net object
List<Container> con = new List<Container>();
foreach (var series in inputSeries)
{
con.Add(JsonConvert.DeserializeObject<Container>(series));
}
// output container
Container output = new Container();
output.datapoints = new List<List<object>>();
// assuming all series have equal number of data points.....might not be so
for (int i = 0; i < con[0].datapoints.Count; i++)
{
output.datapoints.Add(new List<object>());
// inner loop is to go across series for the same datapoint....
for (int j = 0; j < con.Count; j++)
{
// add the date if this is the first series....after that only add the values
// right now the assumption is that the dates are in order and match....validation logic might be required
if (j == 0)
{
output.datapoints[i].Add(con[j].datapoints[i][0]);
output.datapoints[i].Add(con[j].datapoints[i][1]);
}
else
{
output.datapoints[i].Add(con[j].datapoints[i][1]);
}
}
}
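Both answers above use Json.NET for the (de)serialization. Once the data points are in memory, the merge itself can also be done with a dictionary keyed by date, which handles any number of series and doesn't require the dates to be in the same order in every series. This is a sketch assuming each series has already been converted to a list of (date, value) tuples; SeriesMerger and that tuple shape are hypothetical, not the Container type above. Dates come out in the order they are first seen.

```csharp
using System.Collections.Generic;

public static class SeriesMerger
{
    // Merges any number of series keyed by date: each output row is the date
    // followed by one value per series that contains that date.
    public static List<List<object>> Merge(IEnumerable<List<(string Date, double Value)>> seriesList)
    {
        var rowsByDate = new Dictionary<string, List<object>>();
        var order = new List<string>(); // remembers first-seen date order

        foreach (var series in seriesList)
        {
            foreach (var (date, value) in series)
            {
                if (!rowsByDate.TryGetValue(date, out var row))
                {
                    row = new List<object> { date }; // the date goes in once, at index 0
                    rowsByDate[date] = row;
                    order.Add(date);
                }
                row.Add(value);
            }
        }

        var merged = new List<List<object>>();
        foreach (var date in order)
        {
            merged.Add(rowsByDate[date]);
        }
        return merged;
    }
}
```

The resulting rows have the same shape as the datapoints arrays, so they could be assigned to a container and serialized back to JSON with whichever serializer you already use.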

Search a List of string array to find a value in matching element and return another element in same array

So I have
List<string[]> listy = new List<string[]>();
listy.add('a','1','blue');
listy.add('b','2','yellow');
And I want to search through the whole list to find the index of the array containing 'yellow', and return its first element, in this case 'b'.
Is there a way to do this with built-in functions, or am I going to need to write my own search?
I'm relatively new to C# and not aware of good practice or all the built-in functions. I'm OK with lists and arrays, but lists of arrays baffle me somewhat.
Thanks in advance.
As others have already suggested, the easiest way to do this involves a very powerful C# feature called LINQ (Language-Integrated Query). It gives you a SQL-like syntax for querying collections of objects (or databases, or XML documents, or JSON documents).
To make LINQ work, you will need to add this at the top of your source code file:
using System.Linq;
Then you can write:
IEnumerable<string> yellowThings =
from stringArray in listy
where stringArray.Contains("yellow")
select stringArray[0];
Or equivalently:
IEnumerable<string> yellowThings =
listy.Where(strings => strings.Contains("yellow"))
.Select(strings => strings[0]);
At this point, yellowThings is an object containing a description of the query that you want to run. You can write other LINQ queries on top of it if you want, and it won't actually perform the search until you ask to see the results.
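The deferred execution described above can be seen in a small sketch (DeferredDemo and Run are hypothetical names): the query is defined before the yellow item is added to the list, yet it still finds it, because the search only runs when ToList() is called.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Builds the query before the matching item exists, then adds it.
    public static List<string> Run()
    {
        var listy = new List<string[]> { new[] { "a", "1", "blue" } };

        IEnumerable<string> yellowThings = listy
            .Where(strings => strings.Contains("yellow"))
            .Select(strings => strings[0]);

        listy.Add(new[] { "b", "2", "yellow" }); // added after the query was defined

        return yellowThings.ToList(); // the search actually runs here and finds "b"
    }
}
```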
You now have several options...
Loop over the yellow things:
foreach(string thing in yellowThings)
{
// do something with thing...
}
(Don't do this more than once, otherwise the query will be evaluated repeatedly.)
Get a list or an array:
List<string> listOfYellowThings = yellowThings.ToList();
string[] arrayOfYellowThings = yellowThings.ToArray();
If you expect to have exactly one yellow thing:
string result = yellowThings.Single();
// Will throw an exception if the number of matches is zero or greater than 1
If you expect to have either zero or one yellow things:
string result = yellowThings.SingleOrDefault();
// result will be null if there are no matches.
// An exception will be thrown if there is more than one match.
If you expect to have one or more yellow things, but only want the first one:
string result = yellowThings.First();
// Will throw an exception if there are no yellow things
If you expect to have zero or more yellow things, but only want the first one if it exists:
string result = yellowThings.FirstOrDefault();
// result will be null if there are no yellow things.
Based on the problem you described, here is the solution I would suggest:
List<string[]> listy = new List<string[]>();
listy.Add(new string[] { "a", "1", "blue"});
listy.Add(new string[] { "b", "2", "yellow"});
var target = listy.FirstOrDefault(item => item.Contains("yellow"));
if (target != null)
{
Console.WriteLine(target[0]);
}
This should solve your issue. Let me know if I am missing any use case here.
You might consider changing the data structure and having a class for your data, as follows:
public class Myclas
{
public string name { get; set; }
public int id { get; set; }
public string color { get; set; }
}
And then,
static void Main(string[] args)
{
List<Myclas> listy = new List<Myclas>();
listy.Add(new Myclas { name = "a", id = 1, color = "blue" });
listy.Add(new Myclas { name = "b", id = 2, color = "yellow" });
var result = listy.FirstOrDefault(t => t.color == "yellow");
}
Your current situation is:
List<string[]> listy = new List<string[]>();
listy.Add(new string[]{"a","1","blue"});
listy.Add(new string[]{"b","2","yellow"});
With LINQ methods, this is what you're trying to do:
var result = listy.FirstOrDefault(x => x.Contains("yellow"))?[0];
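The question also asks for the index of the matching array. List<T>.FindIndex returns it directly, or -1 if nothing matches. A minimal sketch; ListySearch and FirstElementContaining are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ListySearch
{
    // Finds the first array containing 'color' and returns its first element,
    // or null when no array matches; 'index' receives the position (or -1).
    public static string FirstElementContaining(List<string[]> listy, string color, out int index)
    {
        index = listy.FindIndex(arr => Array.IndexOf(arr, color) >= 0);
        return index >= 0 ? listy[index][0] : null;
    }
}
```

With the listy from the question, FirstElementContaining(listy, "yellow", out int i) returns "b" and sets i to 1.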

Table of strings in C#

Is there a way of creating a table in C# with each cell containing a string?
The closest thing I found is multidimensional arrays (string[,] names;), but it seems their length needs to be defined up front, which is a problem for me.
Here is what my code looks like:
string[] namePost;
int[] numbPage;
string post="";
string newPost;
int i=0;
int j=0;
foreach (var line in File.ReadLines(path).Where(line => regex1.Match(line).Success))
{
newPost = regex1.Match(line).Groups[1].Value;
if (String.Compare(newPost, post) == 0)
{
j = j + 1;
}
else
{
namePost[i] = post;
numbPage[i] = j;
post = newPost;
j = 1;
i = i + 1;
}
}
Each iteration of the foreach loop writes the name of the new "post" into a cell of namePost. In the end, the namePost table stores the names of all the posts that are different from one another.
What is the best way to achieve that ?
If you are simply trying to store the posts, you can use the List class from the System.Collections.Generic namespace:
using System.Collections.Generic;
List<String> namePost = new List<String>();
Then, instead of namePost[i] = post;, use
namePost.Add(post);
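Applied to the whole loop from the question, a sketch might look like this. It assumes the matched post names have already been extracted (for example with the question's regex1) and passed in as a sequence; PostRuns and CountRuns are hypothetical names. Besides growing the lists with Add, it records the last run (which the original loop never stores) and doesn't add the initial empty post.

```csharp
using System.Collections.Generic;

public static class PostRuns
{
    // Groups consecutive equal post names: namePost gets each name once per run
    // and numbPage gets how many consecutive lines that run covered.
    public static (List<string> namePost, List<int> numbPage) CountRuns(IEnumerable<string> posts)
    {
        var namePost = new List<string>();
        var numbPage = new List<int>();

        foreach (var newPost in posts)
        {
            int last = namePost.Count - 1;
            if (last >= 0 && namePost[last] == newPost)
            {
                numbPage[last]++; // same post as the previous line
            }
            else
            {
                namePost.Add(newPost); // lists grow as needed, no length up front
                numbPage.Add(1);
            }
        }
        return (namePost, numbPage);
    }
}
```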
DataTable
https://msdn.microsoft.com/en-us/library/system.data.datatable(v=vs.110).aspx
Use this; there is no need to define a length at all.
Useful guide and examples:
http://www.dotnetperls.com/datatable
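A minimal sketch of the DataTable approach, assuming two columns, the post name and a page count; the table name, column names and sample rows are made up for illustration.

```csharp
using System.Data;

public static class PostTable
{
    // A DataTable grows row by row, so no length is needed up front.
    public static DataTable Build()
    {
        var table = new DataTable("Posts");
        table.Columns.Add("Post", typeof(string));
        table.Columns.Add("Pages", typeof(int));

        table.Rows.Add("first post", 2); // values are given in column order
        table.Rows.Add("second post", 1);
        return table;
    }
}
```

Cells can then be read back by row index and column name, e.g. table.Rows[0]["Post"].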
You can just use a
var table = new List<List<string>>();
This would give you a dynamic 2D table of strings.
The following method gives you all your unique posts. If you want the result as a list, just call .ToList() on the result.
static IEnumerable<string> AllPosts(Regex regex, string filePath)
{
return File.ReadLines (filePath)
.Where (line => regex.Match (line).Success)
.Select (line => regex.Match (line).Groups [1].Value)
.Distinct ();
}
