I'm working on a project where various files are to be uploaded to a platform. The platform has over 100 DocumentTypeIds. The files have no set naming convention. So, to determine a file's DocumentTypeId, I'm currently doing this in a method that returns a string:
if (fileName.Contains("401k") || fileName.Contains("401(k)") || fileName.Contains("457b") || fileName.Contains("457(b)") || fileName.Contains("retire"))
{
return "401k-and-retirement";
}
else if (fileName.Contains("aflac"))
{
return "aflac";
}
else if ( fileName.Contains("beneficiary form") || fileName.Contains("beneficiaries") || fileName.Contains("beneficiary")
)
{
return "beneficiary-forms";
}
else if (fileName.Contains("benefit enrollment") || fileName.Contains("benefits enrollment") || fileName.Contains("benefit form") || fileName.Contains("benefits form") || fileName.Contains("benefit paperwork") || fileName.Contains("qualifying event") || fileName.Contains("enrollment") || fileName.Contains("b enroll") || fileName.Contains("benefit enrollnent")) //yes, the typo is on purpose. there are typos in some of the file names to import
{
return "benefits-election";
}
//etc
As you can imagine, this method is disgustingly ugly and long (~300 lines). I want to refactor this and make use of a database. I'm thinking of having a table with two fields DocumentTypeId and FileNameContains where FileNameContains is a comma-separated list of the OR case strings. This would allow for adding any cases without doing any code changes.
What I'm unsure about is how to do a string.Contains() comparison on a database. I'm aware of LIKE, but that's not quite the same as string.Contains(). I've also thought about querying the database, converting the FileNameContains field into an array or List for each record, and writing an extension method that loops through and does a string.Contains(). But that doesn't seem very efficient or fast.
Am I approaching this wrong? I just know there has to be a better way than a bunch of else if statements with OR cases. And I really think having a database would make this more elegant and scalable without any code changes and purely SQL UPDATE statements. Some help and input would be greatly appreciated.
I'd use a dictionary, or a list of KeyValuePair, with the key being "find this" and the value being "the file type":
var d = new Dictionary<string, string>{
{ "401k", "401k-and-retirement" },
{ "401(k)", "401k-and-retirement" },
{ "457b", "401k-and-retirement" },
{ "457(b)", "401k-and-retirement" },
{ "retire", "401k-and-retirement" },
{ "aflac", "aflac" },
...
};
foreach(var kvp in d)
if(filename.Contains(kvp.Key)) return kvp.Value;
Add more entries to your list/dict, or even fill it from a db
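One caveat with the dictionary version: the documentation for Dictionary<TKey, TValue> leaves the enumeration order unspecified, and the original if/else chain relies on "first match wins". A minimal sketch of the same idea with an ordered list and a case-insensitive match follows. The class name DocumentTypeMatcher, the method name Match and the choice to return null on no match are all assumptions made for illustration, not part of the question.

```csharp
using System;
using System.Collections.Generic;

public static class DocumentTypeMatcher
{
    // Ordered (search text, DocumentTypeId) pairs; earlier rules win.
    // The entries are a subset copied from the question.
    private static readonly List<KeyValuePair<string, string>> Rules =
        new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("401k", "401k-and-retirement"),
            new KeyValuePair<string, string>("401(k)", "401k-and-retirement"),
            new KeyValuePair<string, string>("retire", "401k-and-retirement"),
            new KeyValuePair<string, string>("aflac", "aflac"),
            new KeyValuePair<string, string>("beneficiary", "beneficiary-forms"),
        };

    // Returns the DocumentTypeId of the first rule whose text occurs in
    // fileName, or null when no rule matches (an assumed convention).
    public static string Match(string fileName)
    {
        foreach (var rule in Rules)
        {
            // IndexOf with OrdinalIgnoreCase acts as a case-insensitive Contains.
            if (fileName.IndexOf(rule.Key, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return rule.Value;
            }
        }
        return null;
    }
}
```

Whether matching should ignore case depends on how the file names are normalized before this method is called; the question does not say.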
What I'm unsure about is how to do a string.Contains() comparison on a database
Well, you could transport this same concept into the db and store your values like this in your table:
Find, Ret
%401k%, 401k-and-retirement
%401(k)%, 401k-and-retirement
And query like:
SELECT ret FROM table WHERE @pFilename LIKE Find
With a C#-side parameter of
//adjust type and size to match your column
command.Parameters.Add("@pFilename", SqlDbType.VarChar, 50).Value = "my401k.txt";
Or whatever equivalent you'll use in Dapper, EF etc..
context.FindRets.FirstOrDefault(fr => EF.Functions.Like(filename, fr.Find))
For the love of a relevant deity, please don't store a CSV in a table column. It will bite you sooner rather than later.
I would usually do something like this:
var contains = new []
{
new
{
find = new [] { "401k", "401(k)", "457b", "457(b)", "retire" },
result = "401k-and-retirement"
},
new { find = new [] { "aflac" }, result = "aflac" },
new
{
find = new [] { "beneficiary form", "beneficiaries", "beneficiary" },
result = "beneficiary-forms"
},
new
{
find = new []
{
"benefit enrollment", "benefits enrollment", "benefit form", "benefits form", "benefit paperwork",
"qualifying event", "enrollment", "b enroll", "benefit enrollnent"
},
result = "benefits-election"
},
};
return
contains
.Where(x => x.find.Any(f => fileName.Contains(f)))
.Select(x => x.result)
.FirstOrDefault();
The advantage is that it's easier to add and maintain the items you're looking for. It's all on one portion of the screen.
You could go one step further and save this away in a text file that looks like this:
401k-and-retirement
 401k
 401(k)
 457b
 457(b)
 retire
aflac
 aflac
beneficiary-forms
 beneficiary form
 beneficiaries
 beneficiary
benefits-election
 benefit enrollment
 benefits enrollment
 benefit form
 benefits form
 benefit paperwork
 qualifying event
 enrollment
 b enroll
 benefit enrollnent
Then you can do this:
var contains =
File
.ReadLines("config.txt")
.Aggregate(
new[] { new { find = new List<string>(), result = "" } }.ToList(),
(a, x) =>
{
if (x.StartsWith(' '))
{
a.Last().find.Add(x.Substring(1));
}
else
{
a.Add(new { find = new List<string>(), result = x });
}
return a;
}, a => a.Skip(1).ToArray());
contains.Dump(); // LINQPad-only helper that prints the parsed rules; remove it outside LINQPad
return
contains
.Where(x => x.find.Any(f => fileName.Contains(f)))
.Select(x => x.result)
.FirstOrDefault();
Now you can just add more items to the config file as you need.
I am writing a small program that takes in a .csv file as input with about 45k rows. I am trying to compare the contents of this file with the contents of a table on a database (SQL Server through dynamics CRM using Xrm.Sdk if it makes a difference).
In my current program (which takes about 25 minutes to compare - the file and database are the exact same here both 45k rows with no differences), I have all existing records on the database in a DataCollection<Entity> which inherits Collection<T> and IEnumerable<T>
In my code below I am filtering using the Where method and then doing a logic based the count of matches. The Where seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age);
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.
Himbrombeere is right, you should execute the query first and put the result into a collection before you use Any, Count, AddRange or whatever method will execute the query again. In your code it's possible that the query is executed 5 times in every loop iteration.
Watch out for the term deferred execution in the documentation. A method implemented that way only builds up a LINQ query (so you can chain it with other methods and end up with one query). The query actually runs only when you call a method that does not use deferred execution, such as Count, Any or ToList (or when you enumerate it with a plain foreach). If you don't want the whole query to be executed every time and you need its result more than once, store the result in a collection (for example with ToList).
However, you could use a different approach which should be much more efficient: a Lookup<TKey, TElement>, which is similar to a dictionary and can be used with an anonymous type as key:
var lookup = existingRecords.Entities.ToLookup(r => new
{
fund = r["field_1"].ToString(),
bps = Convert.ToDecimal(r["field_2"]),
withdrawalPct = Convert.ToDecimal(r["field_3"]),
// int, so the key type matches the anonymous key built in the loop
percentile = Convert.ToInt32(r["field_4"]),
age = Convert.ToInt32(r["field_5"])
});
Now you can access this lookup in the loop very efficiently.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = lookup[new {fund, bps, withdrawalPct, percentile, age}].ToList();
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
Note that this will work even if the key does not exist (an empty sequence is returned, which ToList turns into an empty list).
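To make the lookup behaviour concrete, here is a small self-contained sketch using made-up rows. The names LookupDemo, Run, Fund, Age and Term are hypothetical; only ToLookup and the indexer come from the answer. It relies on the fact that two anonymous-type instances with the same property names, types and order compare equal by value.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Returns (matches for an existing key, matches for a missing key).
    public static (int Hits, int Misses) Run()
    {
        // Made-up sample data standing in for existingRecords.Entities.
        var rows = new[]
        {
            new { Fund = "A", Age = 30, Term = 1.5m },
            new { Fund = "A", Age = 30, Term = 2.0m },
            new { Fund = "B", Age = 40, Term = 3.0m },
        };

        // Build the lookup once; the composite key is an anonymous type.
        var lookup = rows.ToLookup(r => new { r.Fund, r.Age });

        // Indexing with an equal anonymous instance finds both "A"/30 rows.
        int hits = lookup[new { Fund = "A", Age = 30 }].Count();

        // A missing key yields an empty sequence instead of throwing.
        int misses = lookup[new { Fund = "Z", Age = 99 }].Count();

        return (hits, misses);
    }
}
```

The key types must match exactly (for example int versus decimal), otherwise the indexer call does not compile.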
Add a ToList after your Convert.ToDecimal(r["field_5"]) == age);-line to force an immediate execution of the query.
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age)
.ToList();
The Where doesn't actually execute your query, it just prepares it. The actual execution is deferred until the results are enumerated. In your case that happens when calling Count, which iterates the entire collection of items. If the first condition fails, the second one is checked, and its Count call iterates the complete collection a second time. In that case you execute the query a third time when calling matchingRows.First().
By forcing immediate execution you execute the query only once, and thus iterate the entire collection only once, which will decrease your overall time.
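The effect is easy to observe by counting predicate calls. The following sketch is purely illustrative (the names DeferredExecutionDemo and CountPredicateCalls are made up): it runs the same filter over ten numbers, once left deferred and once materialized with ToList, and calls Count twice on each.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how often the Where predicate ran in each variant.
    public static (int Deferred, int Materialized) CountPredicateCalls()
    {
        var numbers = Enumerable.Range(1, 10).ToList();

        // Deferred: every Count() re-runs the filter over all 10 items.
        int deferredCalls = 0;
        IEnumerable<int> deferred =
            numbers.Where(n => { deferredCalls++; return n % 2 == 0; });
        deferred.Count();
        deferred.Count();

        // Materialized: the filter runs once, inside ToList().
        int materializedCalls = 0;
        List<int> materialized =
            numbers.Where(n => { materializedCalls++; return n % 2 == 0; }).ToList();
        materialized.Count();
        materialized.Count();

        return (deferredCalls, materializedCalls);
    }
}
```

With these inputs the deferred variant evaluates the predicate 20 times and the materialized one 10 times; in the question's loop the same doubling or tripling happens for each of the 45k input rows.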
Another option, which is basically along the same lines as the other answers, is to prepare your data first, so that you're not repeatedly calling things like r["field_2"] (which are relatively slow to look up).
This is a (1) clean your data, (2) query/join your data, (3) process your data approach.
Do this:
(1)
var inputs =
inputDataLines
.Select(record =>
{
var fields = record.Split(',');
return new
{
fund = fields[0],
bps = Convert.ToDecimal(fields[1]),
withdrawalPct = Convert.ToDecimal(fields[2]),
percentile = Convert.ToInt32(fields[3]),
age = Convert.ToInt32(fields[4]),
bombOutTerm = Convert.ToDecimal(fields[5]),
record
};
})
.ToArray();
var entities =
existingRecords
.Entities
.Select(entity => new
{
fund = entity["field_1"].ToString(),
bps = Convert.ToDecimal(entity["field_2"]),
withdrawalPct = Convert.ToDecimal(entity["field_3"]),
percentile = Convert.ToInt32(entity["field_4"]),
age = Convert.ToInt32(entity["field_5"]),
bombOutTerm = Convert.ToDecimal(entity["field_6"]),
entity
})
.ToArray()
.GroupBy(x => new
{
x.fund,
x.bps,
x.withdrawalPct,
x.percentile,
x.age
}, x => new
{
x.bombOutTerm,
x.entity,
});
(2)
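// Group join ("into"): inputs without any match are kept with an empty matchingRows,
// so the Count() == 0 branch in step (3) can actually be reached.
var query =
from i in inputs
join e in entities on new { i.fund, i.bps, i.withdrawalPct, i.percentile, i.age } equals e.Key into groups
select new { input = i, matchingRows = groups.SelectMany(g => g).ToList() };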
(3)
foreach (var x in query)
{
entitiesFound.AddRange(x.matchingRows.Select(y => y.entity));
if (x.matchingRows.Count() == 0)
{
rowsToAdd.Add(x.input.record);
}
else if (x.matchingRows.Count() == 1)
{
if (x.matchingRows.First().bombOutTerm != x.input.bombOutTerm)
{
rowsToUpdate.Add(x.input.record);
entitiesToUpdate.Add(x.matchingRows.First().entity);
}
}
else
{
entitiesToDelete.AddRange(x.matchingRows.Select(y => y.entity));
rowsToAdd.Add(x.input.record);
}
}
I would suspect that this will be among the fastest approaches presented.
I am trying to check if an entity in the database has any foreign key relations, so that I can inform the user the entity can or cannot be deleted.
I understand this can be done in a rolled back transaction, however I would like to inform the user how many references and where they are to assist in their decision to delete the entity.
I am trying to avoid loading the entire navigation collection into memory to get this data as it may be large. So, in light of this, I can formulate this simple query to firstly determine if there are any references:
private bool CanDeleteComponent(int compId)
{
var query = _Context.Components.Where(c => c.ComponentID == compId)
.Select(comp => new
{
// "||": a single referencing row in any collection is enough to block deletion
References = comp.Incidents.Any() ||
comp.Drawings.Any() ||
comp.Documents.Any() ||
comp.Tasks.Any() ||
comp.Images.Any() ||
comp.Instructions.Any()
});
var result = query.FirstOrDefault();
if (result != null)
{
return !result.References;
}
return true;
}
This performs a series of SELECT COUNT(*) FROM <TABLE> WHERE... queries.
Now, I would like to provide some further information on the number of references. Ideally I would like to return a Dictionary with the referenced data's name, and the associated count. This way I can loop through the result, rather than access individual properties of an anonymous type. However, what I have tried results in an exception:
var query = _Context.Components.Where(c => c.ComponentID == compId)
.Select(comp => new Dictionary<string, int>
{
{"Events", comp.Incidents.Count()},
{"Drawings", comp.Drawings.Count()},
{"Documents", comp.Documents.Count()},
{"Tasks", comp.Tasks.Count()},
{"Images", comp.Images.Count()},
{"Instructions", comp.Instructions.Count()},
});
var result = query.FirstOrDefault();
return query.Any(fk => fk.Value > 0);
The exception that is raised is:
A first chance exception of type 'System.NotSupportedException' occurred in EntityFramework.SqlServer.dll
Additional information: Only list initializer items with a single element are supported in LINQ to Entities.
Is there any way around this, such that I can return some sort of IEnumerable rather than an anonymous type?
Thanks
EDIT
I currently have lazy loading disabled on my context. If there is a solution without turning Lazy loading on that would be appreciated.
You can't build a Dictionary<K,V> inside the Select projection of a LINQ to Entities query; that's why you get the System.NotSupportedException. You can fetch the single Component first and then build the dictionary in memory.
var comp = _Context.Components.SingleOrDefault(c => c.ComponentID == compId);
var dict = new Dictionary<string, int>()
{
{ "Events", comp.Incidents.Count()},
{ "Drawings", comp.Drawings.Count()},
{ "Documents", comp.Documents.Count()},
{ "Tasks", comp.Tasks.Count()},
{ "Images", comp.Images.Count()},
{ "Instructions", comp.Instructions.Count()}
};
EDIT If you are not using lazy loading, you can explicitly .Include the properties in the query:
var comp = _Context.Components
.Include(c => c.Incidents)
...
.SingleOrDefault(c => c.ComponentID == compId);
Is there any way around this, such that I can return some sort of IEnumerable rather than an anonymous type?
Actually there is, although I'm not sure you'll like the generated SQL (compared to the one using anonymous type).
var query = _Context.Components.Where(c => c.ComponentID == compId)
.SelectMany(comp => new []
{
new { Key = "Events", Value = comp.Incidents.Count() },
new { Key = "Drawings", Value = comp.Drawings.Count() },
new { Key = "Documents", Value = comp.Documents.Count() },
new { Key = "Tasks", Value = comp.Tasks.Count() },
new { Key = "Images", Value = comp.Images.Count() },
new { Key = "Instructions", Value = comp.Instructions.Count() },
}.ToList());
var result = query.ToDictionary(e => e.Key, e => e.Value);
// check the in-memory dictionary instead of running the query a second time
return result.Any(fk => fk.Value > 0);
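A possible middle ground, sketched below under stated assumptions: keep the anonymous-type projection (which the question already uses successfully with Any()) so the counting happens in the query, and only build the Dictionary afterwards in memory. The Component class here is a cut-down stand-in with two collections, and ReferenceCounter and CountReferences are hypothetical names. The sketch takes an IQueryable<Component> so it can be exercised with an in-memory list via AsQueryable(); how EF translates the Count() calls to SQL is not verified here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Cut-down stand-in for the question's entity (only two collections).
public class Component
{
    public int ComponentID { get; set; }
    public List<string> Incidents { get; set; } = new List<string>();
    public List<string> Drawings { get; set; } = new List<string>();
}

public static class ReferenceCounter
{
    public static Dictionary<string, int> CountReferences(
        IQueryable<Component> components, int compId)
    {
        // Step 1: project the counts into an anonymous type inside the query.
        var counts = components
            .Where(c => c.ComponentID == compId)
            .Select(c => new
            {
                Events = c.Incidents.Count(),
                Drawings = c.Drawings.Count(),
            })
            .FirstOrDefault();

        // Unknown component: return an empty dictionary (an assumed convention).
        if (counts == null)
        {
            return new Dictionary<string, int>();
        }

        // Step 2: build the dictionary in memory from the projected values.
        return new Dictionary<string, int>
        {
            { "Events", counts.Events },
            { "Drawings", counts.Drawings },
        };
    }
}
```

With a real context the call would presumably look like CountReferences(_Context.Components, compId), and the caller can then loop over the dictionary to report which references block the delete.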
I'm trying to remove duplicated code and run into an issue here:
I've got five very similar entities (different asset types, e.g. Bonds, Stocks). The methods I'm trying to condense return some statistics about these assets. The statistics are obtained with the help of Linq, the queries are almost identical.
Before, I had five separate methods in my controller (e.g. BondStatistics, StockStatistics). One of these would look like this (db is my database context which has each asset type defined):
public JsonResult BondStatistics()
{
var items = db.Bonds.ToList();
var result = new[]
{
new
{
key = "Bonds",
values = items.Select(i =>
new {
x = i.priceChangeOneDayInEuro,
y = i.priceChangeTotalInEuro,
size = i.TotalValueInEuro,
toolTip = i.Description
}
)
},
};
return Json(result, JsonRequestBehavior.AllowGet);
}
I googled that one way to rewrite these into just one method could be using reflection. However, I thought I could use a dirty shortcut, something like this:
public JsonResult Scatter(string asset)
{
if (asset == "Stocks") { var items = db.Stocks.ToList(); };
if (asset == "Bonds") { var items = db.Bonds.ToList(); };
if (asset == "Futures") { var items = db.Futures.ToList(); };
if (asset == "Options") { var items = db.Options.ToList(); };
if (asset == "Funds") { var items = db.Funds.ToList(); }
var result = new[]
{
new
{
key = asset,
values = items.Select(i =>
new {
x = i.priceChangeOneDayInEuro,
y = i.priceChangeTotalInEuro,
size = i.TotalValueInEuro,
toolTip = i.Description
}
)
},
};
return Json(result, JsonRequestBehavior.AllowGet);
}
This leads to the problem that the type of "items" is not known in the Linq query at design time.
What would be a good way to overcome this problem? Should I use a totally different pattern, use reflection, or is there an easy fix?
EDIT
As suggested, I created an Interface and let the BaseAsset-class implement it. Then, changing the condensed method to
List<IScatter> items = new List<IScatter>();
if (asset == "Stocks") { items = db.Stocks.ToList<IScatter>(); };
if (asset == "Bonds") { items = db.Bonds.ToList<IScatter>(); };
if (asset == "Futures") { items = db.Futures.ToList<IScatter>(); };
if (asset == "Options") { items = db.Options.ToList<IScatter>(); };
if (asset == "Funds") { items = db.Funds.ToList<IScatter>(); }
works, at design time at least. Thank you very much!
You are putting everything into var, but what exactly is the type of the items you are processing?
If it is List<Stock> for db.Stocks.ToList() and List<Bond> for db.Bonds.ToList(), you can simply define an interface (e.g. IHasPriceInformation) that has the properties you are using in the LINQ query. Then let Stock, Bond and the others implement this interface (or provide an abstract base implementation for them) and simply run your LINQ query on a List<IHasPriceInformation>.
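A minimal sketch of that interface approach, using in-memory objects instead of the database context. The property names are taken from the question's projection; their types (decimal and string), the ScatterPoint class and the ScatterBuilder.Build method are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Exposes exactly the members the LINQ projection reads.
public interface IHasPriceInformation
{
    decimal priceChangeOneDayInEuro { get; }
    decimal priceChangeTotalInEuro { get; }
    decimal TotalValueInEuro { get; }
    string Description { get; }
}

public class Stock : IHasPriceInformation
{
    public decimal priceChangeOneDayInEuro { get; set; }
    public decimal priceChangeTotalInEuro { get; set; }
    public decimal TotalValueInEuro { get; set; }
    public string Description { get; set; }
}

public class Bond : IHasPriceInformation
{
    public decimal priceChangeOneDayInEuro { get; set; }
    public decimal priceChangeTotalInEuro { get; set; }
    public decimal TotalValueInEuro { get; set; }
    public string Description { get; set; }
}

// Hypothetical result type replacing the anonymous type in the question.
public class ScatterPoint
{
    public decimal X { get; set; }
    public decimal Y { get; set; }
    public decimal Size { get; set; }
    public string ToolTip { get; set; }
}

public static class ScatterBuilder
{
    // One projection for every asset type that implements the interface.
    public static List<ScatterPoint> Build(IEnumerable<IHasPriceInformation> items)
    {
        return items
            .Select(i => new ScatterPoint
            {
                X = i.priceChangeOneDayInEuro,
                Y = i.priceChangeTotalInEuro,
                Size = i.TotalValueInEuro,
                ToolTip = i.Description
            })
            .ToList();
    }
}
```

Because IEnumerable<T> is covariant, a List<Bond> or List<Stock> can be passed to Build directly without converting the list first.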
I have the following class:
public static IEnumerable<SelectListItem> GetDatastore()
{
return new[]
{
new SelectListItem { Value = "DEV", Text = "Development" },
new SelectListItem { Value = "DC1", Text = "Production" },
};
}
What I need is to execute a function to return the Datastore name. Something like
var abc = getDatastoreName("DEV").
Do I need to do this with LINQ or is there some easy way? How could I code this function?
public static string getDatastoreName(string name)
{
var result = GetDatastore().SingleOrDefault(s => s.Value == name);
if (result != null)
{
return result.Text;
}
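// the exception type is only a suggestion; throw whatever fits your application
throw new ArgumentException("No datastore found for value '" + name + "'.", nameof(name));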
}
The Value property of SelectListItem is usually unique and hence I have SingleOrDefault(). If that is not the case then you can switch to using FirstOrDefault().
A simple LINQ query can find the value you want:
var val = dataStore.Where(d => d.Value == "DEV").FirstOrDefault();
//`val` will be the item, or null if the item doesn't exist in the list
But this is only good for small lists of items: the search is O(n) in the worst case.
If you wanted a faster search, you could store your data as a dictionary keyed by the item values, for example, and databind against that rather than against a list of SelectListItems. That would allow you to look up the values in constant time.
For most cases, simple LINQ will be fine. If you have a big list, or you're querying that list frequently... consider an alternative.
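For illustration, a dictionary-backed version of the lookup might look like the sketch below. The class name Datastores and the decision to return null for an unknown value are assumptions; the two entries come from the question.

```csharp
using System.Collections.Generic;

public static class Datastores
{
    // Value -> display name, same pairs as in GetDatastore().
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string>
        {
            { "DEV", "Development" },
            { "DC1", "Production" },
        };

    // Returns the display name, or null if the value is unknown.
    public static string GetDatastoreName(string value)
    {
        return Names.TryGetValue(value, out var name) ? name : null;
    }
}
```

For a two-item list the difference is negligible; the dictionary only pays off with many entries or very frequent lookups.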
Maybe you are looking for something like this.
I have a "Details" page that works perfectly:
@Html.DisplayFor(model => model.Code1dItems.SingleOrDefault(m => m.Value == model.Code1Id.ToString()).Text, new { @class = "Width100P" })
In my model:
Code1Id is an int value that comes from the database. Code1dItems is an IEnumerable
of items like the ones 'GetDatastore' returns: Value (a string) holds the ID and matches Code1Id, and Text (a string) holds the display text.
For your question you should use:
string abc = GetDatastore().SingleOrDefault(m => m.Value == "DEV").Text;
If you get the value from the database, you should use my code example above.
I'm trying to write a dynamic sort of command line processor where I have a dictionary with keys being possible parameters, and the value being an Action<string> where the string is the text between the parameters passed on the command line. I want to be able to add parameters just by adding the params array and writing the action in the dictionary.
Yes, I realize this is a pointless exercise in overcomplicating implementation to simplify maintenance. Mostly I'm just trying to stress myself to learn more LINQ.
Here's my dictionary:
private static Dictionary<string[], Action<string>> _commandLineParametersProcessor = new Dictionary<string[], Action<string>>()
{
{
new string[] {"-l", "--l", "-log", "--log"},
(logFile) =>
{
_blaBla.LogFilePath = logFile;
}
},
{
new string[] { "-s", "--s", "-server", "--server" },
(server) =>
{
ExecuteSomething(server);
_blaBla.Server = server;
}
}
};
What's the most elegant mechanism to take string[] args and not just correlate the members that appear in any of the dictionary's key arrays, but also Aggregate((x,y) => string.Format("{0} {1}", x, y)) the sequence of elements in between those switches (I was thinking TakeWhile() fits in here somehow), and hand the result to the action stored as the respective key's value?
We have all written these little command line processors countless times, and while obviously a simple loop and switch is always more than adequate, this is again as I said an exercise trying to stress my linq skills. So please no complaints that I'm overengineering, that part is obvious.
Update:
To make this maybe a little easier, here is a non-linq way of doing what I'm looking for (may be imperfect, this is just winging it):
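// initialized so the compiler does not reject the reads below as "use of unassigned local variable"
Action<string> currentAction = null;
string currentActionParameter = "";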
for(int i = 0; i < e.Args.Length; i++)
{
bool isParameterSwitch = _commandLineParametersProcessor.Keys.Any((parameterChoices) => parameterChoices.Contains(e.Args[i]));
if (isParameterSwitch)
{
if (!string.IsNullOrEmpty(currentActionParameter) && currentAction != null)
{
currentAction(currentActionParameter);
currentAction = null;
currentActionParameter = "";
}
currentAction = _commandLineParametersProcessor[_commandLineParametersProcessor.Keys.Single((parameterChoices) => parameterChoices.Contains(e.Args[i]))];
}
else
{
currentActionParameter = string.Format("{0} {1}", currentActionParameter, e.Args[i]);
}
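}
// run the action for the last switch, which the loop itself never flushes
if (!string.IsNullOrEmpty(currentActionParameter) && currentAction != null)
{
currentAction(currentActionParameter);
}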
This is not an altogether bad approach, I just wonder if anyone can maybe simplify it a little using linq or otherwise, though this may be the simplest form i guess..
Borrowing half of Adam Robinson's answer (+1 btw), but realizing that the Dictionary will never be accessed by key, and you just want to run the Actions instead of building up a string...
var inputCommands = args
.Select((value, idx) => new { Value = value, Group = idx / 2 })
.GroupBy(x => x.Group)
.Select(g => new
{
Command = g.First().Value,
Argument = g.Last().Value
}).ToList();
inputCommands.ForEach(x =>
{
Action<string> theAction =
(
from kvp in commands
where kvp.Key.Contains(x.Command)
select kvp.Value
).FirstOrDefault();
if (theAction != null)
{
theAction(x.Argument);
}
});
kvp.Key.Contains really defeats the whole point of Dictionary. I'd re-design that to be a Dictionary<string, Action<string>>. Then you could say
inputCommands.ForEach(x =>
{
if (commands.ContainsKey(x.Command))
{
commands[x.Command](x.Argument);
}
});
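If the alias arrays from the question are worth keeping for readability, one option is to declare them as in the question and flatten them once at startup. This is only a sketch; CommandTable and Flatten are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommandTable
{
    // Expands { aliases[] -> action } into one { alias -> action } entry
    // per alias, so later lookups are plain dictionary key lookups.
    public static Dictionary<string, Action<string>> Flatten(
        Dictionary<string[], Action<string>> byAliases)
    {
        return byAliases
            .SelectMany(kvp => kvp.Key.Select(alias => new { alias, action = kvp.Value }))
            .ToDictionary(x => x.alias, x => x.action);
    }
}
```

ToDictionary throws an ArgumentException if the same alias appears under two actions, which surfaces a misconfigured table immediately.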
PS: I can recall much more obtuse C# code that I have written than this.
I must admit the possibility that you want to collect the actions, instead of running them. Here is that code:
var todo =
(
from x in inputCommands
let theAction =
(
from kvp in commands
where kvp.Key.Contains(x.Command)
select kvp.Value
).FirstOrDefault()
where theAction != null
select new { TheAction = theAction, Argument = x.Argument }
).ToList();
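Assuming you know that every command has a corresponding argument, so 'args' will always be in the format of
cmd arg (repeated)
and assuming each command returns a string (the code below appends the result of value.Command(...), so it needs something like a Func<string, string> rather than the question's Action<string>), you could do something ridiculous like this...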
var output = args.Select((value, idx) => new { Value = value, Group = idx / 2 })
.GroupBy(x => x.Group)
.Select(g => new
{
Command = commands.FirstOrDefault(kvp =>
kvp.Key.Contains(g.First().Value)).Value,
Argument = g.Last().Value
})
.Where(c => c.Command != null)
.Aggregate(
new StringBuilder(),
(builder, value) =>
{
builder.AppendLine(value.Command(value.Argument));
return builder;
}).ToString();
But that is, frankly, the most obtuse bit of C# that I can recall ever writing, and not a very good way to teach yourself LINQ. Nonetheless, it will do what you're asking.
EDIT
Just realized (thanks to David B) that your key is a string[], not just a string, so I added some even more obtuse code that deals with that.