Converting C# to idiomatic R

Originally, I was using a short C# program I wrote to average some numbers. But now I want to do more extensive analysis so I converted my C# code to R. However, I really don't think that I am doing it the proper way in R or taking advantage of the language. I wrote the R in the exact same way I did the C#.
I have a CSV with two columns. The first column identifies the row's type (one of three values: C, E, or P) and the second column has a number. I want to average the numbers grouped on the type (C, E, or P).
My question is, what is the idiomatic way of doing this in R?
C# code:
string path = "data.csv";
string[] lines = File.ReadAllLines(path);
int cntC = 0; int cntE = 0; int cntP = 0; //counts
double totC = 0; double totE = 0; double totP = 0; //totals
foreach (string line in lines)
{
String[] cells = line.Split(',');
if (cells[1] == "NA") continue; //skip missing data
if (cells[0] == "C")
{
totC += Convert.ToDouble(cells[1]);
cntC++;
}
else if (cells[0] == "E")
{
totE += Convert.ToDouble(cells[1]);
cntE++;
}
else if (cells[0] == "P")
{
totP += Convert.ToDouble(cells[1]);
cntP++;
}
}
Console.WriteLine("C found " + cntC + " times with a total of " + totC + " and an average of " + totC / cntC);
Console.WriteLine("E found " + cntE + " times with a total of " + totE + " and an average of " + totE / cntE);
Console.WriteLine("P found " + cntP + " times with a total of " + totP + " and an average of " + totP / cntP);
R code:
dat = read.csv("data.csv", header = TRUE)
cntC = 0; cntE = 0; cntP = 0 # counts
totC = 0; totE = 0; totP = 0 # totals
for(i in 1:nrow(dat))
{
if(is.na(dat[i,2])) # missing data
next
if(dat[i,1] == "C"){
totC = totC + dat[i,2]
cntC = cntC + 1
}
if(dat[i,1] == "E"){
totE = totE + dat[i,2]
cntE = cntE + 1
}
if(dat[i,1] == "P"){
totP = totP + dat[i,2]
cntP = cntP + 1
}
}
sprintf("C found %d times with a total of %f and an average of %f", cntC, totC, (totC / cntC))
sprintf("E found %d times with a total of %f and an average of %f", cntE, totE, (totE / cntE))
sprintf("P found %d times with a total of %f and an average of %f", cntP, totP, (totP / cntP))

I would use the data.table package since it has group by functionality built in.
library(data.table)
dat <- data.table(dat)
dat[, mean(COL_NAME_TO_TAKE_MEAN_OF, na.rm = TRUE), by=COL_NAME_TO_GROUP_BY]
# na.rm = TRUE skips the missing (NA) values
# no quotes for the column names
If you would like to take the mean (or apply another function) over multiple columns, still by group, use:
dat[, lapply(.SD, mean), by=COL_NAME_TO_GROUP_BY]
Alternatively, if you want to use Base R, you could use something like
by(dat, dat[, 1], lapply, mean)
# to convert the results to a data.frame, use
do.call(rbind, by(dat, dat[, 1], lapply, mean) )

I would do something like this :
dat = dat[complete.cases(dat),] ## The R way to remove missing data
dat[,2] <- as.numeric(dat[,2]) ## convert to numeric as you do in c#
by(dat[,2],dat[,1],mean) ## compute the mean by group
Of course, to collect the result in a single table you can use the classic do.call/rbind idiom, but I don't think that is necessary here since it is just a list of 3 values:
do.call(rbind, by(dat[,2],dat[,1],mean))
EDIT1
Another option here is to use the elegant ave :
ave(dat[,2],dat[,1])
But the result is different here, in the sense that you will get a vector of the same length as your original data.
EDIT2 To include more results you can elaborate your anonymous function:
by(dat[,2],dat[,1],function(x) c(min(x),max(x),mean(x),sd(x)))
Or return a data.frame, which is more suitable for the rbind call and has column names:
by(dat[,2],dat[,1],function(x)
data.frame(min=min(x),max=max(x),mean=mean(x),sd=sd(x)))
Or use the elegant built-in summary function (you can also define your own):
by(dat[,2],dat[,1],summary)

One way:
library(plyr)
ddply(dat, .(columnOneName), summarize, Average = mean(columnTwoName))

Related

C# for loop that gives sum of i/(i+1) through user specified number

I need to display sum(i) = 1/2 + 2/3 + 3/4 + ... i/(i+1) where the final i is specified by the user. For some reason I am getting error "use of unassigned variable" for the second "total" in this line of code:
double total = (double) total + (i / (i + 1));
I tried to declare total outside the for loop but then it always comes out equal to 0.
Here is the full code:
public static void DisplaySums(int lastNum)
{
Console.WriteLine("i\tSum(i)");
for (int i=1; i<=(lastNum); i++)
{
double total = (double) total + (i / (i + 1));
Console.WriteLine(i + "\t" + total);
}
}
static void Main(string[] args)
{
Console.Write("Enter an integer: ");
int n = Convert.ToInt32(Console.ReadLine());
DisplaySums(n);
}
This is my first time ever asking a question on StackOverflow so I hope this makes sense. I can clarify if needed!
Thank you :)
First, you have to declare total outside of the loop. Else you are not summing up any intermediate results.
Second, you're getting zero because
(i / (i + 1))
performs integer division which is automatically truncated. To keep the decimal number, use a double literal:
(i / (i + 1.0))
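Putting both points together, a minimal sketch could look like the following. The class and method names (PartialSums, Sum) are made up for illustration and are not from the question; the method returns the sum instead of printing it so it is easy to check.

```csharp
using System;

public static class PartialSums
{
    // Hypothetical helper: returns 1/2 + 2/3 + ... + lastNum/(lastNum + 1).
    public static double Sum(int lastNum)
    {
        double total = 0;               // declared once, outside the loop
        for (int i = 1; i <= lastNum; i++)
        {
            total += i / (i + 1.0);     // 1.0 forces floating-point division
        }
        return total;
    }
}
```

To get the table from the question, you would print i and the running total inside the loop instead of returning only the final value.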
In this line
double total = (double) total + (i / (i + 1));
You are essentially saying total is equal to itself plus something else. However the compiler doesn't know what total is as you just declared it. You need to assign the variable before you can use it.
Also (i / (i + 1)) is integer division. Which from the docs:
When you divide two integers, the result is always an integer
For any positive integer i, i / (i + 1) is less than 1, so the fractional part is thrown away and 0 is returned. To fix this, change (i / (i + 1)) to (i / (i + 1.0))
There are two issues here; the biggest obviously being your compile-time error, use of unassigned variable. This happens because you are reading total in its own initializer, before it has been assigned a value. A good idea is to initialize it to zero. You should also do this outside of your loop so the value is retained across iterations.
double total = 0;
Console.WriteLine("i\tSum(i)");
for (int i = 1; i <= lastNum; i++) {
total += (i / (i + 1));
Console.WriteLine(i + "\t" + total);
}
The next issue you are having is the result of zero every time. This is because the division is performed on integers, so the fractional part is discarded and a value such as 0.5 becomes 0. If you revise your loop and your method's parameter to use the double type instead, this issue will be resolved:
private static void DisplaySums(double lastNum) {
double total = 0;
Console.WriteLine("i\tSum(i)");
for (double d = 1; d <= lastNum; d++) {
total += (d / (d + 1));
Console.WriteLine(d + "\t" + total);
}
}

C# - Loop formatting [closed]

Having a bit of trouble with an assignment of mine. It's supposed to prompt the user to enter a range (2 integers), then, using the format below, display the equations that are within the range.
Example:
Enter minimum integer: 3
Enter maximum integer: 7
All: 3 + 4 + 5 + 6 + 7 = 25
Even: 4 + 6 = 10
Odd: 3 + 5 + 7 = 15
Not asking for the entire solution, just a bit of the loop formatting issue. Any help would be appreciated.
Console.Write("Enter minimum integer: ");
string min = Console.ReadLine();
Console.Write("Enter maximum integer: ");
string max = Console.ReadLine();
int min32 = int.Parse(min);
int max32 = int.Parse(max);
for (int i = min32; i <= max32; i++)
Console.Write(i + " + ");
The simplest approach would be to write the numbers to the console one at a time, always checking whether this is the last number to be output (if so, don't print the + after the number).
Console.Write("All: ");
int sum = 0;
for (int i = min32; i <= max32; i++)
{
if(i != max32) //Only add " + " after the number if this is not the end of the for loop
Console.Write(i + " + ");
else
Console.Write(i);
sum += i;
}
Console.WriteLine(" = " + sum);
//Outputs for min32 = 3 and max32 = 7:
//3 + 4 + 5 + 6 + 7 = 25
Bonus round: LINQ queries on an IEnumerable<int> returned from Enumerable.Range(), filtered using some Where clauses and concatenated using string.Join():
using System;
using System.Linq;
public class Program
{
public static void Main()
{
Console.Write("Min: ");
int min = int.Parse(Console.ReadLine());
Console.Write("Max: ");
int max = int.Parse(Console.ReadLine());
var sequence = Enumerable.Range(min, max - min + 1);
string all = "All: " + string.Join(" + ", sequence);
string even = "Even: " + string.Join(" + ", sequence.Where(a => a % 2 == 0));
string odd = "Odd: " + string.Join(" + ", sequence.Where(a => a % 2 != 0)); // != 0 also matches negative odd numbers
Console.WriteLine(all + " = " + sequence.Sum());
Console.WriteLine(even + " = " + sequence.Where(a => a % 2 == 0).Sum());
Console.WriteLine(odd + " = " + sequence.Where(a => a % 2 != 0).Sum());
}
}
Perhaps keep a string for each collection of numbers. When adding to a total, also append the number to the string. Another option might be string.Join() on a list of strings, using " + " as the separator.
Once the loop finishes, print the joined lists along with the appropriate sum.
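A rough sketch of that idea, assuming a helper that collects the matching numbers as strings while summing them. The names RangeFormatter, Describe and keep are invented for this example and are not part of the assignment.

```csharp
using System;
using System.Collections.Generic;

public static class RangeFormatter
{
    // Hypothetical helper: builds a line such as "Even: 4 + 6 = 10" for the
    // numbers in [min, max] that satisfy the given predicate.
    public static string Describe(string label, int min, int max, Func<int, bool> keep)
    {
        var parts = new List<string>();
        int sum = 0;
        for (int i = min; i <= max; i++)
        {
            if (!keep(i)) continue;     // skip numbers that are not in this group
            parts.Add(i.ToString());    // remember the number for the printed equation
            sum += i;                   // and add it to the running total
        }
        // string.Join only puts " + " between items, so there is no trailing separator.
        return label + ": " + string.Join(" + ", parts) + " = " + sum;
    }
}
```

Calling it three times with different predicates (for example n => true, n => n % 2 == 0 and n => n % 2 != 0) would produce the All, Even and Odd lines.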

Looping through multiple arrays to concatenate

I have 3 arrays. Two are arrays of strings and one is of date/time. I pulled all 3 from user input. Each array is always going to have the same exact amount of entries, so what I want to do is be able to loop through all 3 at once to make a string.
I was trying:
List<string> results = new List<string>();
// select
foreach (string line in array1)
{
foreach (string lines in array2)
{
foreach (DateTime date in datearray1)
{
results.Add("select * from table1 d, table2 c where d.specheader = c.specheader and c.true_false = true and d.number = " + lines.ToString() + " and d.date = '" + date.ToShortDateString() + "' and d.specnum like '%" + line.ToString() + "';");
}
}
}
results.ToArray();
foreach (string line in results)
{
MessageBox.Show(line);
}
The user types in information into 3 boxes and I'm just trying to concatenate sql statements based on the input. However when I tried doing it this way it looped through 6 times when I had only 2 entries. Is there a way to concatenate a string using all 3 arrays at the same time (so like entry 1 of array 1, entry 1 of array 2, entry 1 of array 3 - Then move on to creating the next string, entry 2 of array 1, entry 2 of array 2, entry 2 of array 3, etc.)
Any input would be appreciated. Thank you!
As the first commenter said (Yuck), don't concatenate strings into your SQL like that. You will want to set up an SQL command and then pass in parameters.
That is however beside the point as you are asking about rolling together data from multiple arrays into 1 string.
Iterate over the indices of one of the arrays; since they all have the same count, each index gives you the matching entry from all three.
for(int i = 0; i < array1.Length; i++)
{
results.Add(string.Format("Hello you! {0} , {1}, {2}", array1[i], array2[i], datearray1[i]));
}
This will get your desired result but your code is open to vulnerabilities as it stands. You need to change your approach.
Because your loops are nested, you're getting every value of array2 combined with every value in array1 (and similarly with datearray1). That's why you get too many results.
Your loops would work as intended like this (I've used similar local variables to avoid retyping the results.Add line, and to make clear how the code differs from yours):
for (int i = 0; i < array1.Length; i++)
{
string line = array1[i];
string lines = array2[i];
DateTime date = datearray1[i];
results.Add("select * from table1 d, table2 c where d.specheader = c.specheader and c.true_false = true and d.number = " + lines.ToString() + " and d.date = '" + date.ToShortDateString() + "' and d.specnum like '%" + line.ToString() + "';");
}
As a side note: building a database query in this manner is inefficient and very insecure (try reading up on "SQL injection" to understand why). You would see better results if you used a parameterized query or a stored procedure instead.
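To illustrate the side note, here is a sketch of what separating the SQL text from the user input might look like. It does not talk to a database; QueryBuilder, BuildParameters and the @number/@date/@specnum placeholder names are assumptions made for this example, and so are the parameter types (the real column types are not shown in the question).

```csharp
using System;
using System.Collections.Generic;

public static class QueryBuilder
{
    // One fixed command text; user input never becomes part of the SQL itself.
    public const string Sql =
        "select * from table1 d, table2 c " +
        "where d.specheader = c.specheader and c.true_false = true " +
        "and d.number = @number and d.date = @date and d.specnum like @specnum;";

    // Pairs entry i of each array into one set of parameter values
    // (hypothetical helper, one dictionary per query to run).
    public static List<Dictionary<string, object>> BuildParameters(
        string[] array1, string[] array2, DateTime[] datearray1)
    {
        var result = new List<Dictionary<string, object>>();
        for (int i = 0; i < array1.Length; i++)
        {
            result.Add(new Dictionary<string, object>
            {
                { "@number", array2[i] },
                { "@date", datearray1[i] },
                { "@specnum", "%" + array1[i] },   // wildcard goes in the value, not the SQL
            });
        }
        return result;
    }
}
```

With ADO.NET, each name/value pair would then be added to the command's Parameters collection (for example via SqlCommand.Parameters.AddWithValue) before executing the same command text once per entry.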
If the number of entries is going to be the same for all of them, you can simply use a for loop:
for (int i = 0; i < datearray1.Length; i++)
{
results.Add("select * from table1 d, table2 c"
+ " where d.specheader = c.specheader and c.true_false = true"
+ " and d.number = " + array2[i]
+ " and d.date = '" + datearray1[i].ToShortDateString() + "'"
+ " and d.specnum like '%" + array1[i] + "';");
}

Line separation / Line break in a listbox

I want to display 2 sets of data in one list box; for example, I would want to display the 7 times table and the 8 times table in the same listbox. Here is how I get the first set of data displaying:
int awnser = 0;
int z;
z = int.Parse(textBox1.Text);
for (int i = 0; i < 11; i++)
{
awnser = z * i;
listBox6.Items.Add(z + " * " + i + " = " + awnser.ToString());
}
But how do I get a line break or separation so I can put the 8 times table just underneath?
How about this?
EDIT Insert it AFTER your loop
listBox6.Items.Add(z + " * " + i + " = " + awnser.ToString());
}
listBox6.Items.Add("--------------------");
In WPF this is easy to do using a custom template, but in WinForms I think you must do it by rendering the list items yourself.
Look at this example where they override the OnDrawItem method: http://www.syncfusion.com/FAQ/windowsforms/faq_c87c.aspx#q627q

C# is there a problem with division?

This is a piece of my code, it is called every second, after about 10 seconds the values start to become weird (see below):
double a;
double b;
for (int i = 0; i < currAC.Length; i++ )
{
a = currAC[i];
b = aveACValues[i];
divisor = (a/b);
Console.WriteLine("a = " + a.ToString("N2") + "\t" + "b = " + b.ToString("N2"));
Console.WriteLine("divisor = " + divisor);
Console.WriteLine("a+b = " + (a+b));
}
and the output:
a = -0.05 b = 0.00
divisor = 41
a+b = -0.0524010372273268
currAC and aveACValues are double[]
what on earth is going on???? The addition result is correct every time, but the division value is wrong, yet it is reading a and b correctly??
EDIT: '41' is the value of the first calculation, ie when a = currAC[0], but this should not remain???
if b == -0.001219512195122, then a/b==41, and a+b==-0.051219512195122 - so something around those areas (rounding etc) sounds feasible...
Also; note that for some arithmetic, it is possible it is using values that are still in registers. Registers may exhibit slightly different accuracy (and so give different results) than local variables.
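A small sketch of the first point: the "N2" format rounds to two decimals, so a tiny non-zero b is displayed as zero even though the division uses its real value. DivisionCheck, Ratio and Full are names invented for this example, and the value of b is the one guessed above, not something taken from the question's data.

```csharp
using System;
using System.Globalization;

public static class DivisionCheck
{
    // Plain double division, as in the question's loop.
    public static double Ratio(double a, double b)
    {
        return a / b;
    }

    // The "R" (round-trip) format prints enough digits to recover the double,
    // unlike "N2", which rounds to two decimal places.
    public static string Full(double value)
    {
        return value.ToString("R", CultureInfo.InvariantCulture);
    }
}
```

Printing a and b with Full(...) instead of ToString("N2") would show whether b really is a small number around -0.0012 rather than zero, in which case DivisionCheck.Ratio(-0.05, -0.001219512195122) being approximately 41 is exactly what you would expect.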
double can give imprecise results, due to the specification (see MSDN). You might want to use decimal instead; it offers more precision.
What happens if the variables are declared as close to where they are used as is possible?
for (int i = 0; i < currAC.Length; i++ )
{
double a = currAC[i];
double b = aveACValues[i];
double divisor = (a/b);
Console.WriteLine("a = " + a.ToString("N2") + "\t" + "b = " + b.ToString("N2"));
Console.WriteLine("divisor = " + divisor.ToString());
Console.WriteLine("a+b = " + (a+b).ToString());
}
This would ensure that you have a fresh "divisor" each time and will not be affected by other scopes.
