I am new to LINQ, and this keeps throwing an exception on a null volume field. The file is unpredictable and this will happen again, so I would like to put a 0 in wherever there is an exception. Is there a quick and easy way to do it?
var qry =
from line in File.ReadAllLines("C:\\temp\\T.txt")
let myRecX = line.Split(',')
select new myRec()
{
price = Convert.ToDecimal( myRecX[0].Replace("price = ", "")) ,
volume = Convert.ToInt32(myRecX[1].Replace("volume =", "")),
dTime = Convert.ToDateTime( myRecX[2].Replace("timestamp =", ""))
};
If you would like to use a default when the incoming data is null, empty, or consists entirely of whitespace characters, you can do it like this:
volume = string.IsNullOrWhiteSpace(myRecX[1])
? defaultVolume // <<== You can use any constant here
: Convert.ToInt32(myRecX[1].Replace("volume =", ""))
However, this is a "quick and dirty" way of achieving what you need, because the position of each named parameter remains hardcoded. A more robust way would be writing a mini-parser that pays attention to the names of attributes specified in the file, rather than replacing them with an empty string.
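The mini-parser idea could be sketched roughly as follows. This is only an illustration under assumptions: the class name RecordLineParser and its methods are hypothetical, the "name = value" layout is taken from the question, and values are assumed not to contain commas.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the original question or answer.
public static class RecordLineParser
{
    // Splits "price = 1.5, volume = 10" into name/value pairs keyed by name,
    // so the position of each attribute in the line no longer matters.
    public static Dictionary<string, string> ParseFields(string line)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var part in line.Split(','))
        {
            int eq = part.IndexOf('=');
            if (eq < 0) continue; // not a "name = value" pair, skip it
            fields[part.Substring(0, eq).Trim()] = part.Substring(eq + 1).Trim();
        }
        return fields;
    }

    // Returns the parsed volume, or the fallback when it is missing or not a number.
    public static int GetVolume(Dictionary<string, string> fields, int fallback = 0)
    {
        return fields.TryGetValue("volume", out var text)
            && int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out var value)
            ? value
            : fallback;
    }
}
```

Inside the query this might be used as `let fields = RecordLineParser.ParseFields(line)` followed by `volume = RecordLineParser.GetVolume(fields)`. TryParse avoids the exception entirely instead of catching it.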
You could use something like this, which offers an expressive way to write what you want:
static TOutput Convert<TInput, TOutput>(
TInput value,
params Func<TInput, TOutput>[] options)
{
foreach (var option in options) {
try { return option(value); }
catch { }
}
throw new InvalidOperationException("No option succeeded.");
}
Used like:
select new myRec()
{
price = Convert(myRecX[0].Replace("price = ", ""),
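input => System.Convert.ToDecimal(input), // qualified, because a bare "Convert" binds to the helper method when it lives in the same class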
or => 0M),
...
};
The function indirection and implicit array construction may incur a slight performance penalty, but it gives you a nice syntax with which to specify a number of possible conversions, where the first successful one is taken.
I think there's an issue here beyond the use of LINQ.
In general, it is bad practice to manipulate file data before sanitizing it.
Even though the following question is about the file name (rather than its content), it is a good starting point for understanding the concept of sanitizing input:
C# Sanitize File Name
After all, you say yourself that your code has no control over the file content, so before calling:
let myRecX = line.Split(',')
I suggest defining a private method like:
string SanitizeInputLine(string input) {
// here do whatever is needed to bring back input to
// a valid format in a way that subsequent calls will not
// fail
return input;
}
Applying it is straightforward:
let myRecX = SanitizeInputLine(line).Split(',')
As a general rule, never trust input.
Let me quote Chapter 10, "All Input Is Evil!", of Writing Secure Code by Howard/LeBlanc:
...you should never trust data until data is validated. Failure to do
so will render your application vulnerable. Or, put another way: all
input is evil until proven otherwise.
I have this simple query where I need to identify all tickets within start and end number of a specific TicketBook object on api side in EF Core.
var ticketBook = await Context.TicketBooks.FirstOrDefaultAsync(x=>x.Id == query.TicketBookId);
if (ticketBook != null)
{
dbTickets = dbTickets.Where(x => ConvertTicketNumberToInt(x, ticketBook));
}
private bool ConvertTicketNumberToInt(Ticket t, TicketBook tb)
{
try
{
var numberOnly = new string(t.Number.Where(c => char.IsDigit(c)).ToArray());
var tNumber = Convert.ToInt64(numberOnly);
return tNumber >= tb.StartIntNumber && tNumber <= tb.EndIntNumber;
}
catch(OverflowException)
{
return false;
}
}
The problem is that the "Number" property in the Ticket class is nvarchar (string), but I need to convert it to an integer for this particular query only, so I have written a small method that does it for me. But as you can see, it is very time consuming and not efficient at all, so my API call just times out.
I am trying to figure out how to do this in LINQ without writing extra methods like this. The catch is that the "Number" property can sometimes contain a few letters, which throws an exception when converting it to an integer, so I need to remove those non-digit characters before the comparison. That's why I had to write this dedicated method for it.
As already mentioned, you are facing some performance issues storing nvarchar instead of long.
Anyway, what you're doing in your code is not that bad - you have fairly simple method for the job which keeps your LINQ code clean and tidy. But since you want to have a single LINQ query, try the following (it can be done shorter but I've chosen this way for readability):
var ticketBook = await Context.TicketBooks.FirstOrDefaultAsync(x=>x.Id == query.TicketBookId);
if (ticketBook != null)
{
dbTickets = dbTickets
.Select(t => new { Ticket = t, Number = new string(t.Number.Where(n => char.IsDigit(n)).ToArray()) })
.Select(t =>
{
// TryParse writes 0 on failure, so set the long.MinValue sentinel explicitly.
if (!long.TryParse(t.Number, out long ticketNumber))
ticketNumber = long.MinValue;
return new { Ticket = t.Ticket, Number = ticketNumber };
})
.Where(t => t.Number >= ticketBook.StartIntNumber && t.Number <= ticketBook.EndIntNumber)
.Select(t => t.Ticket);
}
What it does:
In the first pass, all your varchars are stripped of letters and reduced to strings containing only digits; an anonymous type holding the complete Ticket object is returned along with this string.
The strings are parsed to long. I've abused long.MinValue to indicate a failed conversion (since you're using char.IsDigit(c), I assume you don't expect any negative values in your results; you might as well use ulong for twice the positive range and abuse the value 0), and again an anonymous type is returned.
Those anonymous objects are filtered with the condition you provided.
Finally, only the original Ticket object is returned.
If you're concerned about the number of passes over the initial results - I've run several performance tests to find out whether having a number of Selects with short operations inside is slower than having one pass with an elaborate operation and I haven't observed any significant difference.
Your best bet is to do most of the conversion in the database.
If you have access to the context, you can do this:
dbTickets = Context.Tickets
.FromSqlRaw("SELECT * FROM Tickets WHERE CAST(CASE WHEN PATINDEX('%[^0-9]%',Number) = 0 THEN Number ELSE LEFT(Number,PATINDEX('%[^0-9]%',Number)-1) END as int) BETWEEN {0} AND {1}", ticketBook.StartIntNumber, ticketBook.EndIntNumber)
.ToList();
This will strip off any trailing letters from the Number column and convert it to an int, then use that to make sure it is between your StartIntNumber and EndIntNumber.
That said, I would highly suggest you add an additional column to your tickets table that uses a derivative of the above to calculate an integer, and make it a persisted computed column. Then you can index that column. Very little (if anything) should need to change in your code if you do this, and the performance benefit will be huge.
This is based on your comment that said sometimes Number has additional letters at the end, like 123A. The above would need to be modified if Number can have letters at the start or in the middle like A123 or 1A23. Currently, it would treat A123 as 0, and 1A23 as 1.
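To make the behaviour described above concrete, here is a small C# sketch that mirrors the SQL expression (keep the digits before the first non-digit, then convert). The names TicketNumbers and LeadingDigitsToInt are hypothetical and only serve to illustrate the three cases; like the CAST, it fails on digit runs too large for an int.

```csharp
using System;

// Hypothetical helper that mimics the PATINDEX/LEFT/CAST expression in memory.
public static class TicketNumbers
{
    public static int LeadingDigitsToInt(string number)
    {
        // Find the end of the leading run of ASCII digits.
        int end = 0;
        while (end < number.Length && number[end] >= '0' && number[end] <= '9')
            end++;

        // An empty prefix converts to 0, matching CAST('' AS int) in SQL Server.
        return end == 0 ? 0 : int.Parse(number.Substring(0, end));
    }
}
```

With this, "123A" gives 123, "A123" gives 0, and "1A23" gives 1, which is why the query only suits numbers whose letters come at the end.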
While going through the new C# 7.0 features, I got stuck on the discard feature. It says:
Discards are local variables which you can assign but cannot read
from. i.e. they are “write-only” local variables.
and, then, an example follows:
if (bool.TryParse("TRUE", out bool _))
What is a real use case where this is beneficial? I mean, what if I had defined it in the normal way, say:
if (bool.TryParse("TRUE", out bool isOK))
The discards are basically a way to intentionally ignore local variables which are irrelevant for the purposes of the code being produced. It's like when you call a method that returns a value but, since you are interested only in the underlying operations it performs, you don't assign its output to a local variable defined in the caller method, for example:
public static void Main(string[] args)
{
// I want to modify the records but I'm not interested
// in knowing how many of them have been modified.
ModifyRecords();
}
public static Int32 ModifyRecords()
{
Int32 affectedRecords = 0;
for (Int32 i = 0; i < s_Records.Count; ++i)
{
Record r = s_Records[i];
if (String.IsNullOrWhiteSpace(r.Name))
{
r.Name = "Default Name";
++affectedRecords;
}
}
return affectedRecords;
}
Actually, I would call it a cosmetic feature, in the sense that it's a design-time feature (the computations concerning the discarded variables are performed anyway) that helps keep the code clear, readable and easy to maintain.
I find the example shown in the link you provided kinda misleading. If I try to parse a String as a Boolean, chances are I want to use the parsed value somewhere in my code. Otherwise I would just try to see if the String corresponds to the text representation of a Boolean (a regular expression, for example... even a simple if statement could do the job if casing is properly handled). I'm far from saying that this never happens or that it's a bad practice, I'm just saying it's not the most common coding pattern you may need to produce.
The example provided in this article, on the other hand, really shows the full potential of this feature:
public static void Main()
{
var (_, _, _, pop1, _, pop2) = QueryCityDataForYears("New York City", 1960, 2010);
Console.WriteLine($"Population change, 1960 to 2010: {pop2 - pop1:N0}");
}
private static (string, double, int, int, int, int) QueryCityDataForYears(string name, int year1, int year2)
{
int population1 = 0, population2 = 0;
double area = 0;
if (name == "New York City")
{
area = 468.48;
if (year1 == 1960) {
population1 = 7781984;
}
if (year2 == 2010) {
population2 = 8175133;
}
return (name, area, year1, population1, year2, population2);
}
return ("", 0, 0, 0, 0, 0);
}
From what I can see reading the above code, discards seem to have a higher synergy with other paradigms introduced in the most recent versions of C#, like tuple deconstruction.
For Matlab programmers, discards are far from being a new concept because the language has implemented them for a very long time (probably since the beginning, but I can't say for sure). The official documentation describes them as follows (link here):
Request all three possible outputs from the fileparts function:
helpFile = which('help');
[helpPath,name,ext] = fileparts(helpFile);
The current workspace now contains three variables from fileparts: helpPath, name, and ext. In this case, the variables are small. However, some functions return results that use much more memory. If you do not need those variables, they waste space on your system.
Ignore the first output using a tilde (~):
[~,name,ext] = fileparts(helpFile);
The only difference is that, in Matlab, inner computations for discarded outputs are normally skipped because output arguments are flexible and you can know how many and which one of them have been requested by the caller.
I have seen discards used mainly with methods that return Task<T> when you don't want to await the result.
So in the example below, we don't want to await the output of SomeOtherMethod() so we could do something like this:
//myClass.cs
public async Task<bool> Example() => await SomeOtherMethod();
// example.cs
Example();
Except this will generate the following warning:
CS4014 Because this call is not awaited, execution of the
current method continues before the call is completed. Consider
applying the 'await' operator to the result of the call.
To suppress this warning and essentially assure the compiler that we know what we are doing, you can use a discard:
//myClass.cs
public async Task<bool> Example() => await SomeOtherMethod();
// example.cs
_ = Example();
No more warnings.
To add another use case to the above answers.
You can use a discard in conjunction with a null coalescing operator to do a nice one-line null check at the start of your functions:
_ = myParam ?? throw new MyException();
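As a sketch of how that one-liner sits inside a real member, consider the following. The Greeter class is hypothetical, and ArgumentNullException is used here in place of MyException simply because it is the standard exception for this case.

```csharp
using System;

// Hypothetical class used only to show the guard in context.
public class Greeter
{
    private readonly string _name;

    public Greeter(string name)
    {
        // The discard turns the expression into a valid statement;
        // the throw only fires when name is null.
        _ = name ?? throw new ArgumentNullException(nameof(name));
        _name = name;
    }

    public string Greet() => $"Hello, {_name}";
}
```

When the value is needed anyway, the same expression can assign directly (`_name = name ?? throw ...`); the discard form is for parameters that are only validated at that point.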
Many times I've done code along these lines:
TextBox.BackColor = int.TryParse(TextBox.Text, out int _) ? Color.LightGreen : Color.Pink;
Note that this would be part of a larger collection of data, not a standalone thing. The idea is to provide immediate feedback on the validity of each field of the data they are entering.
I use light green and pink rather than the green and red one would expect--the latter colors are dark enough that the text becomes a bit hard to read and the meaning of the lighter versions is still totally obvious.
(In some cases I also have a Color.Yellow to flag something which is not valid but neither is it totally invalid. Say the parser will accept fractions and the field currently contains "2 1". That could be part of "2 1/2" so it's not garbage, but neither is it valid.)
Discard pattern can be used with a switch expression as well.
string result = shape switch
{
Rectangle r => "Rectangle",
Circle c => "Circle",
_ => "Unknown Shape"
};
For a list of patterns with discards refer to this article: Discards.
Consider this:
5 + 7;
This "statement" performs an evaluation but is not assigned to something. It will be immediately highlighted with the CS error code CS0201.
// Only assignment, call, increment, decrement, and new object expressions can be used as a statement
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/compiler-messages/cs0201?f1url=%3FappId%3Droslyn%26k%3Dk(CS0201)
A discard variable used here will not change the fact that it is an unused expression, rather it will appear to the compiler, to you, and others reviewing your code that it was intentionally unused.
_ = 5 + 7; //acceptable
It can also be used in lambda expressions when having unused parameters:
builder.Services.AddSingleton<ICommandDispatcher>(_ => dispatcher);
I don't understand the use case of var patterns in C#7. MSDN:
A pattern match with the var pattern always succeeds. Its syntax is
expr is var varname
where the value of expr is always assigned to a local variable named
varname. varname is a static variable of the same type as expr.
The example on MSDN is pretty useless in my opinion, especially because the if is redundant:
object[] items = { new Book("The Tempest"), new Person("John") };
foreach (var item in items) {
if (item is var obj)
Console.WriteLine($"Type: {obj.GetType().Name}, Value: {obj}");
}
Here I don't see any benefit; you get the same result by accessing the loop variable item directly, which is also of type Object. The if is confusing as well because it's never false.
I could use var otherItem = item or use item directly.
Can someone explain the use case better?
The var pattern was very frequently discussed in the C# language repository given that it’s not perfectly clear what its use case is and given the fact that is var x does not perform a null check while is T x does, making it appear rather useless.
However, it is actually not meant to be used as obj is var x. It is meant to be used when the left hand side is not a variable on its own.
Here are some examples from the specification. They all use features that are not in C# yet, but this just shows that the var pattern was introduced primarily in preparation for those things, so they won’t have to touch it again later.
The following example declares a function Deriv to construct the derivative of a function using structural pattern matching on an expression tree:
Expr Deriv(Expr e)
{
switch (e) {
// …
case Const(_): return Const(0);
case Add(var Left, var Right):
return Add(Deriv(Left), Deriv(Right));
// …
}
}
Here, the var pattern can be used inside the structures to “pull out” elements from the structure. Similarly, the following example simplifies an expression:
Expr Simplify(Expr e)
{
switch (e) {
case Mult(Const(0), _): return Const(0);
// …
case Add(Const(0), var x): return Simplify(x);
}
}
As gafter writes here, the idea is also to have property pattern matching, allowing the following:
if (o is Point {X is 3, Y is var y})
{ … }
Without checking the design notes on GitHub, I'd guess this was added more for consistency with switch and as a stepping stone for more advanced pattern matching cases.
From the original What’s New in C# 7.0 post :
Var patterns of the form var x (where x is an identifier), which always match, and simply put the value of the input into a fresh variable x with the same type as the input.
And the recent dissection post by Sergey Teplyakov :
if you know what exactly is going on you may find this pattern useful. It can be used for introducing a temporary variable inside the expression:
This pattern essentially creates a temporary variable using the actual type of the object.
public void VarPattern(IEnumerable<string> s)
{
if (s.FirstOrDefault(o => o != null) is var v
&& int.TryParse(v, out var n))
{
Console.WriteLine(n);
}
}
The warning right before that snippet is also significant:
It is not clear why the behavior is different in the Release mode only. But I think all the issues falls into the same bucket: the initial implementation of the feature is suboptimal. But based on this comment by Neal Gafter, this is going to change: "The pattern-matching lowering code is being rewritten from scratch (to support recursive patterns, too). I expect most of the improvements you seek here will come for "free" in the new code. But it will be some time before that rewrite is ready for prime time.".
According to Christian Nagel :
The advantage is that the variable declared with the var keyword is of the real type of the object,
Only thing I can think of offhand is if you find that you've written two identical blocks of code (in say a single switch), one for expr is object a and the other for expr is null.
You can combine the blocks by switching to expr is var a.
It may also be useful in code generation scenarios where, for whatever reason, you've already written yourself into a corner and always expect to generate a pattern match but now want to issue a "match all" pattern.
In most cases it is true that the benefit of the var pattern is not clear, and it can even be a bad idea. However, as a way of capturing anonymous types in a temporary variable it works great.
Hopefully this example can illustrate this:
Note that the null case is placed first: the var pattern also matches null, so handling null before it means chosen is never null and no further null check is required.
var sample = new (int id, string name, int age)[] {
(1, "jonas", 50),
(2, "frank", 48) };
var f48 = from s in sample
where s.age == 48
select new { Name = s.name, Age = s.age };
switch(f48.FirstOrDefault())
{
case null:
WriteLine("not found");
break;
case var chosen when chosen.Name == "frank":
WriteLine(chosen.Age);
break;
}
I've ended up writing my own helper-class to concatenate objects: ConcatHelper.cs.
You see some examples in the gist, but also in the following snippet:
model.Summary = new ConcatHelper(", ")
.Concat(diploma.CodeProfession /* can be any object, will be null checked and ToString() called */)
.BraceStart() // if everything between the braces is null or empty, the braces will not appear
.Concat(diploma.CodeDiplomaType)
.Concat(label: DiplomaMessage.SrkRegisterId, labelSeparator: " ", valueDecorator: string.Empty, valueToAdd: diploma.SrkRegisterId)
.BraceEnd()
.Concat(diploma.CodeCountry)
.BraceStart()
.Concat(diploma.DateOfIssue?.Year.ToString(CultureInfo.InvariantCulture)) // no separator will be added if concatenated string is null or empty (no ", ,")
.BraceEnd()
.Concat(DiplomaMessage.Recognition, " ", string.Empty, diploma.DateOfRecognition?.Year.ToString(CultureInfo.InvariantCulture))
.ToString(); // results in something like: Drogist / Drogistin (Eidgenössischer Abschluss, SRK-Registrierungsnummer 3099-239), Irland (1991)
Benefits:
Does the null checks for you, avoids if/else branches.
Supports labeling, decorating and delimiting values. Doesn't add a label if the value will be null.
Joins everything, fluent notation: less code.
Good to do summaries of domain-objects.
Contra:
Rather slow:
I measured 7ms for the above example
I measured 0.01026ms per concatenation in a real-life example (see unit-test gist)
It's not static (could it be?)
Needs a list to keep track of everything.
Probably an overkill.
So as I am now starting to override a lot of ToString() methods of domain objects, I am unsure, if there is a better way.
By better I basically mean:
Is there a library that already does the stuff I need?
If not, can it be sped up without losing the convenient fluent notation?
So I would be happy if you show me either a convenient way to achieve the same result without my helper, or helping me improving this class.
Greetings,
flo
Update:
Look at this gist for a real-life UnitTest.
I do not see any real problem with your code. But I would prefer a more streamlined syntax. It may look like this in the end:
string result = ConcatHelper.Concat(
diploma.CodeProfession,
new Brace(
diploma.CodeDiplomaType,
new LabeledItem(label: DiplomaMessage.SrkRegisterId, labelSeparator: " ",
valueDecorator: string.Empty, valueToAdd: diploma.SrkRegisterId)
),
diploma.CodeCountry,
new Brace(
diploma.DateOfIssue?.Year.ToString(CultureInfo.InvariantCulture)
),
DiplomaMessage.Recognition
).ToString();
No wall of text.
You do not have to repeat Concat over and over again.
No chance to mix up the braces.
Concat() would be of the type static ConcatHelper Concat(params object[] objs) in this case. Brace and LabeledItem need to be handled by ConcatHelper, of course (if (obj is LabeledItem) { ... }).
Regarding your contras:
It should be fast enough (10 µs per call should be okay). If you really need it faster, you should probably use a single String.Format().
Concat can be static. Just create the ConcatHelper-object inside the Concat-call.
Yes, it needs a list. Is there a problem?
It may be overkill, it may not. If you use this type of code regularly, Utility classes can save you much time and make the code more readable.
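A minimal sketch of the static, params-based Concat suggested above might look like this. It is an illustration, not the gist's implementation: ConcatSketch and this Brace type are hypothetical, LabeledItem is left out for brevity, and the rule that a brace attaches to the preceding item with a space is an assumption read off the sample output in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical marker type: its non-empty items are joined and wrapped in parentheses.
public sealed class Brace
{
    public object[] Items { get; }
    public Brace(params object[] items) { Items = items ?? Array.Empty<object>(); }
}

public static class ConcatSketch
{
    public static string Concat(string separator, params object[] items)
    {
        var parts = new List<string>();
        foreach (var item in items ?? Array.Empty<object>())
        {
            // Braces recurse; everything else is null-checked and stringified.
            string text = item is Brace brace
                ? Wrap(Concat(separator, brace.Items))
                : item?.ToString();

            if (string.IsNullOrEmpty(text)) continue; // nulls and empty braces vanish

            if (item is Brace && parts.Count > 0)
                parts[parts.Count - 1] += " " + text; // attach brace to the preceding item
            else
                parts.Add(text);
        }
        return string.Join(separator, parts);
    }

    private static string Wrap(string inner) =>
        string.IsNullOrEmpty(inner) ? string.Empty : "(" + inner + ")";
}
```

A call such as `ConcatSketch.Concat(", ", "Drogist", new Brace("Abschluss", null, "3099"), "Irland", new Brace(1991))` yields "Drogist (Abschluss, 3099), Irland (1991)". The list is still there, but it is local to the call, so nothing needs to be instantiated by the caller.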
Variant 1 - stream like
var sb = new StringBuilder();
const string delimiter = ", ";
var first = true;
Action<object> append = _ => {
if(null!=_){
if(!first){ sb.Append(delimiter);}
first = false;
sb.Append(_.ToString());
}
};
append(diploma.X);
append(diploma.Y);
...
Another one - with collection
var data = new List<object>();
data.Add(diploma.X);
data.Add(diploma.Y);
...
var result = string.Join(", ",data.Where(_=>null!=_).Select(_=>_.ToString()));
It's not very efficient, but it gives you an additional step between data preparation and joining in which to do something with the collection itself.
I'd like to do something like this in C#:
int i = 0;
foreach ( Item item in _Items )
{
foreach (Field theField in doc.Form.Fields)
{
switch (theField.Name)
{
case "Num" + i++.ToString(): // Number of Packages
theField.Value = string.Empty;
break;
}
}
}
I have 20 or so fields named Num1, Num2, etc. If I can do this all in one statement/block, I'd prefer to do so.
But the compiler complains that the case statements need to be constant values. Is there a way to use dynamic variables in the case statement so I can avoid repeating code?
I just thought I'd mention, the purpose of this method is to populate the fields in PDF form, with naming conventions which I can not control. There are 20 rows of fields, with names like "Num1" - "Num20". This is why string concatenation would be helpful in my scenario.
No. This is simply part of the language. If the values aren't constants, you'll have to use if/else if or a similar solution. (If we knew more details about what you were trying to achieve, we may be able to give more details of the solution.)
Fundamentally, I'd question a design which has a naming convention for the fields like this - it sounds like really you should have a collection to start with, which is considerably easier to work with.
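When the naming convention cannot be changed, one way to get the benefit of a collection is to index the existing fields by name once and look up each generated name. The sketch below assumes field names are unique; the Field class is a hypothetical stand-in for the PDF library's type, and FieldClearer is an invented name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PDF library's field type.
public class Field
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public static class FieldClearer
{
    // Clears prefix1..prefix{count} and returns how many fields were found.
    public static int ClearNumberedFields(IEnumerable<Field> fields, string prefix, int count)
    {
        // Build the lookup once; ToDictionary throws if two fields share a name.
        var byName = fields.ToDictionary(f => f.Name);
        int cleared = 0;
        for (int i = 1; i <= count; i++)
        {
            if (byName.TryGetValue(prefix + i, out var field))
            {
                field.Value = string.Empty;
                cleared++;
            }
        }
        return cleared;
    }
}
```

This replaces the nested loop and the switch with a single pass over the numbers 1 to 20, for example `FieldClearer.ClearNumberedFields(doc.Form.Fields, "Num", 20)` if the field collection is enumerable.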
Yes, case values must be evaluable at compile time. How about this instead:
foreach (Field theField in doc.Form.Fields)
{
if(theField.Name == ("Num" + i++))
{
theField.Value = string.Empty;
}
}
How about:
int i = 0;
foreach ( Item item in _Items )
doc.Form.Fields.First(f=>f.Name == "Num" + i++.ToString()).Value = string.Empty;
Not sure what the purpose of item is in your code though.
No. There is no way.
How about replacing the switch with:
if (theField.Name.StartsWith("Num"))
theField.Value = string.Empty;
or some similar test?
var matchingFields =
from item in _Items
join field in doc.Form.Fields
on "Num" + item.PackageCount equals field.Name
select field;
foreach (var field in matchingFields)
{
field.Value = string.Empty;
}
For more efficiency include a DistinctBy on field Name after getting the matching fields (from MoreLINQ or equivalent).
Also consider that every time you concatenate two or more strings, memory is allocated for each of the component strings and then again for the final string. This is memory intensive; for very large strings, such as error reports, it can even become a performance problem, and in long-running programs it can lead to memory fragmentation. Perhaps not so much in this case, but it is good to build such best practices into your usual development routine.
So instead of writing like this:
"Num" + i++.ToString()
Consider writing like this:
string.Format("{0}{1}", "Num", i++.ToString());
Also you may want to consider putting strings like "Num" into a separate constants class. Having string constants in your code can lead to program rigidity, and limit program flexibility over time as your program grows.
So you might have something like this at the beginning of your program:
using SysConst = MyNamespace.Constants.SystemConstants;
Then your code would look like this:
string.Format("{0}{1}", SysConst.Num, i++.ToString());
And in your SystemConstants class, you'd have something like this:
/// <summary>System Constants</summary>
public static class SystemConstants
{
/// <summary>The Num string.</summary>
public static readonly string Num = @"Num";
}
That way if you need to use the "Num" string any place else in your program, then you can just use the 'SysConst.Num'
Furthermore, any time you decide to change "Num" to "Number", say perhaps per a customer request, then you only need to change it in one place, and not a big find-replace in your system.