I'm trying to create a concise LINQ query to split a CSV file and convert it to an XML file, using an array of column names I have gleaned from an XSD file.
It's all working well, except I just can't get the Counter to reset to zero after each row. It should go 0,1,2,3,4 then 0,1,2,3,4, but it's going 0,1,2,3,4 then 5,6,7,8,9.
I'm new to LINQ so hopefully this is simple for someone with a bit of experience, thanks!
string[] columns = {"COL1","COL2","COL3","COL4","COL5"};
int Counter = 0;
XElement cust = new XElement("Root",
from str in source.Skip(1)
let fields = str.Split(',')
select new XElement("Records",
from c in columns
select new XElement(c, fields[Counter++])
)
);
That is not the way to do this. It is extremely bad practice to have side-effecting functions (like an incrementor) inside a LINQ select clause, particularly because of things like parallelization. If you were doing this manually with a foreach, I might be tempted to just suggest use of a mod:
fields[(Counter++) % fields.Length]
But even that would still be a little weird.
This is a more acceptable way, which uses the Zip method to find column names by matching them up by index.
string[] columns = {"COL1","COL2","COL3","COL4","COL5"};
var rows = source.Skip(1)
    .Select(line => columns.Zip(line.Split(','),
        (column, value) => new
        {
            Column = column,
            Value = value
        }));
var elements = rows.Select(row => new XElement("Records",
    row.Select(x => new XElement(x.Column, x.Value))));
return new XElement("Root", elements);
That all said, it's important to note that this is not currently generalizable, and will fail when the columns contain quoted values with commas in them. You might want to look into third party libraries. I've had luck with CsvHelper myself.
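Another option, not mentioned above, is the Select overload that also passes the element's index. The index restarts at zero for every row, so no shared counter is needed. This is only a minimal sketch: the sample rows in source are made up, and it still assumes plain comma-separated values without quoting.

```csharp
// C# 9+ top-level statements; the sample data is made up for illustration.
using System;
using System.Linq;
using System.Xml.Linq;

string[] columns = { "COL1", "COL2", "COL3" };
string[] source =
{
    "COL1,COL2,COL3",   // header row, skipped below
    "a,b,c",
    "d,e,f"
};

XElement root = new XElement("Root",
    from line in source.Skip(1)
    let fields = line.Split(',')
    select new XElement("Records",
        // (name, index) overload: index is the position of name in columns
        columns.Select((name, index) => new XElement(name, fields[index]))));

Console.WriteLine(root);
```

Like the original query, this throws if a row has fewer fields than there are columns; Zip simply stops at the shorter sequence instead.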
A little question about a simple LINQ query. This is my first time with LINQ and I still don't understand all of its mechanisms.
My structure is something like this
List<string> baseData = new List<string>{"\"10\";\"Texte I need\";\"Texte\"",
"\"50\";\"Texte I need\";\"Texte\"",
"\"1000\";\"Texte I need\";\"Texte\"",
"\"100\";\"Texte I need\";\"Texte\""};
Each line of data is built with the field separator ";" and each field is enclosed in double quotes.
I have another list containing the values I have to find in my first list, and I have the position of the field in the line that I have to search. The position matters because "Texte I need" could itself be equal to a value I am searching for.
List<string> valueINeedToFind = new List<string>{"50","100"};
char fieldSeparator = ';';
int fieldPositionInBaseDataForSearch = 0;
int fieldPositionInBaseDataToReturn = 1;
I wrote a first LINQ query to extract only the lines that interest me.
List<string> linesINeedInAllData = baseData.Where(Line => valueINeedToFind.Any(Line.Split(fieldSeparator)[fieldPositionInBaseDataForSearch].Trim('"').Contains)).ToList();
This first query works great, and now I have only the lines that interest me.
My problem is that I don't want the whole line, only a list of the "Texte I need" values at position fieldPositionInBaseDataToReturn.
Do I have to write another LINQ query, or can I modify my first one to get what I need directly?
Since you will be using the split version of each line more than once, separate out the Split operation and then work on the resulting array:
List<string> linesINeedInAllData = baseData.Select(Line => Line.Split(fieldSeparator))
.Where(splitLine => valueINeedToFind.Any(splitLine[fieldPositionInBaseDataForSearch].Trim('"').Contains))
.Select(splitLine => splitLine[fieldPositionInBaseDataToReturn])
.ToList();
List<string> linesINeedInAllData = baseData.Where(Line => valueINeedToFind.Any(Line.Split(fieldSeparator)[fieldPositionInBaseDataForSearch].Trim('"').Equals))
    .Select(Line => Line.Split(fieldSeparator)[fieldPositionInBaseDataToReturn].Trim('"'))
    .ToList();
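The two answers differ in one detail worth seeing side by side: Contains does a substring match, while Equals requires the whole field to match. Here is a small sketch with made-up sample lines (the A/B/C/D texts are not from the question) that splits each line once and then filters both ways.

```csharp
// C# 9+ top-level statements; the sample lines are made up for illustration.
using System;
using System.Collections.Generic;
using System.Linq;

var baseData = new List<string>
{
    "\"10\";\"A\";\"x\"",
    "\"50\";\"B\";\"x\"",
    "\"1000\";\"C\";\"x\"",
    "\"100\";\"D\";\"x\""
};
var wanted = new HashSet<string> { "50", "100" };
const char fieldSeparator = ';';
const int searchPosition = 0;
const int returnPosition = 1;

// Split and unquote every line once, then reuse the resulting arrays.
var rows = baseData
    .Select(line => line.Split(fieldSeparator).Select(f => f.Trim('"')).ToArray())
    .ToList();

// Substring match: "1000" contains "100", so row C is returned as well.
List<string> bySubstring = rows
    .Where(fields => wanted.Any(value => fields[searchPosition].Contains(value)))
    .Select(fields => fields[returnPosition])
    .ToList();

// Exact match: only the rows whose search field is exactly "50" or "100".
List<string> byExactValue = rows
    .Where(fields => wanted.Contains(fields[searchPosition]))
    .Select(fields => fields[returnPosition])
    .ToList();

Console.WriteLine(string.Join(", ", bySubstring));   // B, C, D
Console.WriteLine(string.Join(", ", byExactValue));  // B, D
```

Which of the two is right depends on whether partial matches are wanted; the HashSet is just a convenient way to do the exact lookup.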
I have a List<Dictionary<string,string>> something like this:
[0] key1 val,key2 val,key3 val
[1] key1 val,key2 val,key3 val
[2] key1 val,key2 val,key3 val
And I have a list of column names in the same order as the columns in the DataTable.
I want to keep only those dictionary keys that appear in the list, and insert their values in the proper order.
I'm able to filter the required keys, but how do I insert the values in the proper order with LINQ?
var colList = new List<string>() { "key3", "key1"};
dict.ForEach(p => jsonDataTable.Rows.Add(p.Where(q => colList.Contains(q.Key)).Select(r => r.Value).ToArray()));
I cannot do it like this, because the number of columns will vary and the method must work for any list of column names:
foreach(var item in dict)
    jsonDataTable.Rows.Add(item[colList[0]], item[colList[1]]);
Please suggest some ways.
LINQ will never ever change the input sources. You can only extract data from it.
Divide problems in subproblems
The only way to change the input sources is by using the extracted data to update your sources. Make sure that before you update the source you have materialized your query (= ToList() etc)
You can divide your problem into subproblems:
Convert the table into a sequence of columns in the correct order.
Convert the sequence of columns into a sequence of column names (still in the correct order).
Use the column names and the dictionary to fetch the requested data.
By separating your problem into these steps, you prepare your solution for reusability. If in future you change your table to a DataGridView, or a table in an entity framework database, or a CSV file, or maybe even JSON, you can reuse the latter steps. If in future you need to use the column names for something else, you can still use the earlier steps.
To be able to use the code in a LINQ-like way, my advice would be to create extension methods. If you are unfamiliar with extension methods, read Extension Methods Demystified.
You will be more familiar with the layout of your table (System.Data.DataTable? Windows.Forms.DataGridView? DataGrid in Windows.Controls?) and your columns, so you'll have to create the first ones yourself. In the example I use MyTable and MyColumn; replace them with your own Table and Column classes.
public static IEnumerable<MyColumn> ToColumns(this MyTable table)
{
// TODO: return the columns of the table
}
public static IEnumerable<string> ToColumnNames(this IEnumerable<MyColumn> columns)
{
return columns.Select(column => ...);
}
If the column name is just a property of the column, I wouldn't bother creating the second procedure. However, the nice thing is that it hides where you get the name from. So to be future-changes-proof, maybe create the method anyway.
You said these columns were sorted. If you want to be able to use ThenBy(...) consider returning an IOrderedEnumerable<MyColumn>. If you won't sort the sorted result, I wouldn't bother.
Usage:
MyTable table = ...
IEnumerable<string> columnNames = table.ToColumns().ToColumnNames();
or:
IEnumerable<string> columnNames = table.ToColumns()
.Select(column => column.Name);
The third subproblem is the interesting one.
Join and GroupJoin
In LINQ, whenever you have two tables and you want to use a property of the elements in one table to match them with a property of the elements in the other table, consider using (Group-)Join.
If you only want items of the first table that match exactly one item of the other table, use Join: "Customer with his Address", "Product with its Supplier", "Book with its Author".
On the other hand, if you expect that one item of the first table matches zero or more items from the other table, use GroupJoin: "Schools, each with their Students", "Customers, each with their Orders", "Authors, each with their Books"
Some people still think in database terms. They tend to use some kind of Left Outer Join to fetch "Schools with their Students". The disadvantage of this is that if a School has 2000 Students, then the same data of the School is transferred 2000 times, once for every Student. GroupJoin will transfer the data of the School only once, and the data of every Student only once.
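To make the difference concrete, here is a small in-memory sketch. The schools and students are made-up sample data, and the anonymous types only stand in for whatever classes you really have.

```csharp
// C# 9+ top-level statements; schools and students are made-up sample data.
using System;
using System.Linq;

var schools = new[]
{
    new { Id = 1, Name = "North" },
    new { Id = 2, Name = "South" }   // has no students
};
var students = new[]
{
    new { Name = "Ann", SchoolId = 1 },
    new { Name = "Bob", SchoolId = 1 }
};

// Join: one result per matching (school, student) pair; "South" disappears.
var pairs = schools.Join(students,
        school => school.Id,
        student => student.SchoolId,
        (school, student) => school.Name + ": " + student.Name)
    .ToList();

// GroupJoin: one result per school, each with its (possibly empty) students.
var schoolsWithStudents = schools.GroupJoin(students,
        school => school.Id,
        student => student.SchoolId,
        (school, studentsOfSchool) => new
        {
            school.Name,
            Students = studentsOfSchool.Select(student => student.Name).ToList()
        })
    .ToList();

Console.WriteLine(string.Join("; ", pairs));
foreach (var entry in schoolsWithStudents)
    Console.WriteLine($"{entry.Name}: {entry.Students.Count} student(s)");
```

Join yields two results here (one per student of "North"), while GroupJoin yields two schools, the second with an empty student list.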
Back to your question
In your problem: every column name is the key of exactly one item in the Dictionary.
What do you want to do with column names without keys? If you want to discard them, use Join. If you still want to use the column names that have nothing in the Dictionary, use GroupJoin.
IEnumerable<string> columnNames = ...
var result = columnNames.Join(myDictionary,
    columnName => columnName,               // from every columnName take the columnName
    dictionaryItem => dictionaryItem.Key,   // from every dictionary keyValuePair take the key
    // parameter resultSelector: from every columnName and its matching dictionary keyValuePair
    // make one new object:
    (columnName, keyValuePair) => new
    {
        // Select the properties that you want:
        Name = columnName,
        // take the whole dictionary value:
        Value = keyValuePair.Value,
        // or select only the properties that you plan to use:
        Address = new
        {
            Street = keyValuePair.Value.Street,
            City = keyValuePair.Value.City,
            PostCode = keyValuePair.Value.PostCode
            ...
        },
    });
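If you use this more often, consider creating an extension method for it.
Note: with LINQ to Objects, Enumerable.Join preserves the order of the outer sequence (here: the column names), so the result is already in column order. Other LINQ providers give no such guarantee; if you need a specific order there, sort the join result with OrderBy.
Usage:
MyTable myTable = ...
var result = myTable.ToColumns()
    .Select(column => column.Name)
    .Join(...)
    .OrderBy(joinResult => joinResult.Name) // only if you need the result sorted by name
    .ToList();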
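Applied to the List<Dictionary<string, string>> from the question, the Join step could be sketched roughly as follows. ValuesInColumnOrder is a hypothetical helper name, written as a local function rather than an extension method so the sample stays a single file, and the dictionaries are made-up sample data.

```csharp
// C# 9+ top-level statements; helper name and sample data are illustrative only.
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: the values of one dictionary, in the order of columnNames.
// Column names without a key in the dictionary are dropped, as with any Join.
static object[] ValuesInColumnOrder(
    IEnumerable<string> columnNames, IDictionary<string, string> item)
{
    return columnNames
        .Join(item,
            columnName => columnName,
            pair => pair.Key,
            (columnName, pair) => (object)pair.Value)
        .ToArray();
}

var colList = new List<string> { "key3", "key1" };
var dict = new List<Dictionary<string, string>>
{
    new Dictionary<string, string> { ["key1"] = "a1", ["key2"] = "a2", ["key3"] = "a3" },
    new Dictionary<string, string> { ["key1"] = "b1", ["key2"] = "b2", ["key3"] = "b3" }
};

var jsonDataTable = new DataTable();
foreach (var column in colList)
    jsonDataTable.Columns.Add(column, typeof(string));

foreach (var item in dict)
    jsonDataTable.Rows.Add(ValuesInColumnOrder(colList, item));

foreach (DataRow row in jsonDataTable.Rows)
    Console.WriteLine(string.Join(", ", row.ItemArray));
```

Because missing keys are dropped, a dictionary that lacks one of the columns would produce a shorter row with shifted values; if that can happen, a GroupJoin (or TryGetValue) with an explicit placeholder is the safer choice.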
Instead of filtering the List<Dictionary<string, string>>, filter the colList, so that you get the values in the same order, and only for the columns that are actually present in the dictionaries.
This is based on my understanding of the question; please comment if you need the result in any other way.
var dictAllKeys = dict.SelectMany(x => x.Keys).Distinct().ToList();
// Now you can filter the colList using the above keys
var filteredList = colList.Where(x => dictAllKeys.Contains(x)).ToList();
// then add one row per dictionary, taking the values in colList order
foreach (var item in dict)
    jsonDataTable.Rows.Add(filteredList
        .Select(col => item.TryGetValue(col, out var value) ? (object)value : DBNull.Value)
        .ToArray());
Hello, I'm new to LINQ and lambdas.
I have two lists
fl.LocalOpenFiles ...
List<string> f....
Each item in the first list has a string property Path, for example at index 0:
fl.LocalOpenFiles[0].Path
I want to select all items from the first list, fl.LocalOpenFiles, whose Path starts with one of the strings in the List<string> f.
I finally got this...
List<LocalOpenFile> lof = new List<LocalOpenFile>();
lof = fl.LocalOpenFiles.Join(
folders,
first => first.Path,
second => second,
(first, second) => first)
.ToList();
But it only selects the files that meet the requirement first.Path == second, and I couldn't find a way to get the data that I want, which is anything meeting this "braindump" requirement:
f[<any>] == fl.LocalOpenFiles[<any>].Path.Substring(0, f[<any>].Length)
Another Example...
List<string> f = new List<string>{ "abc", "def" };
List<LocalOpenFile> lof = new List<LocalOpenFile>{
new LocalOpenFile("abc"),
new LocalOpenFile("abcc"),
new LocalOpenFile("abdd"),
new LocalOpenFile("defxsldf") };
// Result should be
// abc
// abcc
// defxsldf
I hope I explained it in an understandable way :)
Thank you for your help
Do you mean something like this :
List<LocalOpenFile> result =
lof.Where(file => f.Any(prefix => file.Path.StartsWith(prefix)))
.ToList();
You can use a regular where instead of a join, which will give you more straightforward control over the selection criteria:
var result =
from file in lof
from prefix in f
where file.Path.StartsWith(prefix)
select file.Path; // ...or just file if you want the LocalOpenFile objects
Note that a file matching multiple prefixes may show up more than once. If that is a problem, you can just add a call to Distinct to eliminate duplicates.
EDIT:
If you, as it seems in this case, only want to know the matching path and not the prefix it matches (i.e. you only want data from one collection), I'd go for @har07's Any solution instead.
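A quick sketch of the duplicate issue, using plain strings instead of LocalOpenFile objects and two deliberately overlapping prefixes (made-up data, not the lists from the question):

```csharp
// C# 9+ top-level statements; prefixes overlap on purpose to show the duplicates.
using System;
using System.Collections.Generic;
using System.Linq;

var prefixes = new List<string> { "abc", "ab" };
var paths = new List<string> { "abc", "abcc", "abdd", "defxsldf" };

// Two from clauses: one result per (path, prefix) pair that matches.
var crossJoin =
    (from path in paths
     from prefix in prefixes
     where path.StartsWith(prefix, StringComparison.Ordinal)
     select path).ToList();

// Distinct removes the repeats afterwards.
var distinct = crossJoin.Distinct().ToList();

// Any stops at the first matching prefix, so each path appears at most once.
var withAny = paths
    .Where(path => prefixes.Any(prefix => path.StartsWith(prefix, StringComparison.Ordinal)))
    .ToList();

Console.WriteLine(string.Join(", ", crossJoin)); // abc, abc, abcc, abcc, abdd
Console.WriteLine(string.Join(", ", distinct));  // abc, abcc, abdd
Console.WriteLine(string.Join(", ", withAny));   // abc, abcc, abdd
```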
In my database table I have a Positions field, which contains a space-separated list of position codes. I need to add criteria to my query that check whether any of the locally specified position codes match at least one of the position codes in the field.
For example, I have a local list that contains "RB" and "LB". I want a record that has a Positions value of OL LB to be found, as well as records with a position value of RB OT but not records with a position value of OT OL.
With AND clauses I can do this easily via
foreach (var str in localPositionList)
    query = query.Where(x => x.Positions.Contains(str));
However, I need this to be chained together as OR clauses. If I weren't dealing with LINQ to SQL (just normal collections) I could do this with
query = query.Where(x => x.Positions.Split(' ').Any(y => localPositionList.Contains(y)));
However, this does not work with LINQ to SQL: an exception occurs because it cannot translate Split into SQL.
Is there any way to accomplish this?
I am trying to resist splitting this data out of this table and into other tables, as the sole purpose of this table is to give an optimized "cache" of data that requires the minimum amount of tables in order to get search results (eventually we will be moving this part to Solr, but that's not feasible at the moment due to the schedule).
I was able to get a test version working by using separate queries and running a Union on the result. My code is rough, since I was just hacking, but here it is...
List<string> db = new List<string>() {
"RB OL",
"OT LB",
"OT OL"
};
List<string> tests = new List<string> {
"RB", "LB", "OT"
};
IEnumerable<string> result = db.Where(d => d.Contains("RB"));
for (int i = 1; i < tests.Count(); i++) {
string val = tests[i];
result = result.Union(db.Where(d => d.Contains(val)));
}
result.ToList().ForEach(r => Console.WriteLine(r));
Console.ReadLine();
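An alternative to the Union approach is to build a single OR predicate as an expression tree, so the provider sees one Where with Contains(...) || Contains(...). The sketch below is untested against a real database: WhereContainsAny is a hypothetical helper name, the in-memory array stands in for the table, and it assumes the provider can translate string.Contains (LINQ to SQL normally turns it into LIKE).

```csharp
// C# 9+ top-level statements; the in-memory data stands in for the database table.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper: keeps items where
// selector(x).Contains(v1) || selector(x).Contains(v2) || ...
static IQueryable<T> WhereContainsAny<T>(
    IQueryable<T> source,
    Expression<Func<T, string>> selector,
    IEnumerable<string> values)
{
    var contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });
    Expression body = null;
    foreach (var value in values)
    {
        // Reuse the selector body so every call refers to the same parameter x.
        Expression call = Expression.Call(selector.Body, contains, Expression.Constant(value));
        body = body == null ? call : Expression.OrElse(body, call);
    }
    if (body == null)
        return source.Where(x => false);   // no values given: nothing can match
    return source.Where(Expression.Lambda<Func<T, bool>>(body, selector.Parameters));
}

var localPositionList = new List<string> { "RB", "LB" };
var table = new[]
{
    new { Positions = "OL LB" },
    new { Positions = "RB OT" },
    new { Positions = "OT OL" }
}.AsQueryable();

var matches = WhereContainsAny(table, x => x.Positions, localPositionList)
    .Select(x => x.Positions)
    .ToList();

Console.WriteLine(string.Join(", ", matches));   // OL LB, RB OT
```

As with the Contains calls in the question, this is a substring match, so a code such as "B" would also match "RB" and "LB".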
Just getting my head around all this LINQ stuff and it seems I'm stuck at the first hurdle.
I have a datatable as such:
OrderNo LetterGroup Filepath
----------- ----------- --------------------------------------------------
0 0 Letters/SampleImage.jpg
0 0 Letters/UKPC7_0.jpg
0 0 Letters/UKPC8_0.jpg
What I need is to get all of the filepaths from the Filepath column into a String array. I thought LINQ would be perfect for this (am I right?), but can't seem to construct the correct query.
Can anyone provide some code samples that would point me in the right direction? I have searched around - but don't seem to be getting anywhere.
There are extension methods which make working with data sets much easier:
using System.Data;  // AsEnumerable() and Field<T>() (assembly System.Data.DataSetExtensions)
using System.Linq;
var filePaths =
from row in dataTable.AsEnumerable()
select row.Field<string>("Filepath");
var filePathsArray = filePaths.ToArray();
You can also use the method syntax to put it in one statement:
var filePaths = dataTable
.AsEnumerable()
.Select(row => row.Field<string>("Filepath"))
.ToArray();
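For reference, a self-contained sketch that builds a DataTable shaped like the one in the question and pulls the Filepath column into a string array. The table is constructed in code only so the sample can run on its own; on .NET Framework the project also needs a reference to System.Data.DataSetExtensions.

```csharp
// C# 9+ top-level statements; the table is built in code only for the demo.
using System;
using System.Data;
using System.Linq;

var dataTable = new DataTable();
dataTable.Columns.Add("OrderNo", typeof(int));
dataTable.Columns.Add("LetterGroup", typeof(int));
dataTable.Columns.Add("Filepath", typeof(string));
dataTable.Rows.Add(0, 0, "Letters/SampleImage.jpg");
dataTable.Rows.Add(0, 0, "Letters/UKPC7_0.jpg");
dataTable.Rows.Add(0, 0, "Letters/UKPC8_0.jpg");

// AsEnumerable() exposes the rows to LINQ; Field<T> reads a typed column value.
string[] filePaths = dataTable
    .AsEnumerable()
    .Select(row => row.Field<string>("Filepath"))
    .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, filePaths));
```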
string[] filePaths = (from DataRow row in yourDataTable.Rows
select row["Filepath"].ToString()).ToArray();
If you want to use LINQ all the way, set up your database and create a context object. Then you should be able to do something like this:
var filepaths = from order in _context.Orders
select order.Filepath;
This assumes your table is named Orders, which I'm guessing from your first column, OrderNo. If you also wanted to return the order numbers, to know later where each file path came from, you could do something like this:
var results = from order in _context.Orders
select new
{
order.OrderNo,
order.Filepath
};
This would give you a new anonymous type that contained both those values as properties.