Hi, I was wondering why my Parse query is only returning 100 objects when there are over 3,000 rows in the Parse DB. I am using this in a Xamarin.iOS application and it is only getting the first 99 objects back; any ideas? Help is appreciated. And yes, I did debug the code: it is only retrieving the first 99 objects.
public async void populateFromParseLocalDB()
{
    var query = ParseObject.GetQuery("clinics");
    IEnumerable<ParseObject> results = await query.FindAsync();
    int i = 0;
    foreach (var record in results)
    {
        i++;
        Console.WriteLine("in for each");
        var name = record.Get<string>("Name");
        Console.WriteLine(name);
    }
    int mycount = i;
}
From the Parse Docs:
You can limit the number of results by calling Limit. By default, results are limited to 100, but anything from 1 to 1000 is a valid limit.
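Since there are over 3,000 rows, one option is to raise the limit to the documented maximum of 1000 and page through the results with Skip. A minimal sketch, assuming the same "clinics" class as above (the OrderBy key and the page-size constant are only illustrative choices):
// Fetch all "clinics" rows in pages of 1000 (the documented maximum page size).
var allRecords = new List<ParseObject>();
const int pageSize = 1000;   // illustrative; matches the documented maximum
int skip = 0;
while (true)
{
    var page = await ParseObject.GetQuery("clinics")
                                .OrderBy("objectId")   // a stable order so Skip is deterministic
                                .Limit(pageSize)
                                .Skip(skip)
                                .FindAsync();
    var pageList = page.ToList();
    allRecords.AddRange(pageList);
    if (pageList.Count < pageSize)
        break;                                          // last page reached
    skip += pageSize;
}
Console.WriteLine("Total objects fetched: " + allRecords.Count);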
Related
I have an API in C# that returns data from a DB and a frontend that paints that data in a table.
My approach was to read the data from the DB with an sqlReader, iterate through this reader adding each result to a list and return that list to the frontend.
Seems easy enough, until I receive massive query data. My solution was to return this data chunk by chunk but I'm stuck with it, this is the code I'm working with:
var sqlCommand = db.InitializeSqlCommand(query);
try
{
using (var reader = sqlCommand.ExecuteReader())
{
var results = new List<List<string>>();
var headers = new List<string>();
var rows = new List<string>();
for (var i = 0; i < reader.FieldCount; i++)
{
headers.Add(reader.GetName(i));
}
results.Add(headers);
while (reader.Read())
{
for (var i = 0; i < reader.FieldCount; i++)
{
rows.Add((reader[reader.GetName(i)]).ToString());
}
results.Add(rows);
var str = JsonConvert.SerializeObject(results);
var buffer = Encoding.UTF8.GetBytes(str);
//Thread.Sleep(1000);
await outputStream.WriteAsync(buffer, 0, buffer.Length);
rows.Clear();
results.Clear();
outputStream.Flush();
}
}
}
catch (HttpException ex)
{
if (ex.ErrorCode == -2147023667) // The remote host closed the connection.
{
}
}
finally
{
outputStream.Close();
db.Dispose();
}
With this, I'm able to return the data row by row (tested with the Thread.Sleep), but I'm stuck on how to return a specific amount at a time, say 200 rows or 1000; it really should not matter.
Any idea on how to proceed?
Thanks in advance.
Mese.
I think controlling the query is the better way, since that determines what is fetched from the database. You can increase the OFFSET for every subsequent run. Example: after the ORDER BY clause, add OFFSET 200 ROWS FETCH NEXT 200 ROWS ONLY to skip 200 rows and get the next 200.
However, since you've mentioned that you have no control over the query, you can do something like the following to filter the results on your end. The key trick here is to use reader.Cast<IDataRecord>().Skip(offset).Take(200) to choose which rows to process. Update the offset passed to Skip() in every iteration to process the data accordingly.
// offset decides how many rows to skip; the outer loop keeps requesting
// batches (0, 200, 400, 600, ...) until no more rows are returned.
const int batchSize = 200;
bool hasMoreData = true;
int offset = 0;
while (hasMoreData)
{
    // SQL data reader and other setup as in the original code.
    using (var reader = sqlCommand.ExecuteReader())
    {
        int processed = 0;
        foreach (IDataRecord row in reader.Cast<IDataRecord>().Skip(offset).Take(batchSize))
        {
            // Processing operations go here.
            processed++;
        }
        // Fewer rows than a full batch means we've reached the end.
        hasMoreData = processed == batchSize;
    }
    offset += batchSize;
}
Another thing to keep in mind: when you pull the data in batches, the query executes multiple times, and if a record is added or deleted in the meantime, the batches will not line up correctly. To get past this, you can do two things:
Validate a Unique ID of every record with unique ID's of already fetched records to make sure the same record isn't pulled twice (edge case due to record addition/deletion)
Add a buffer to your offset, such as
Skip(0).Take(100) // Pulls 0 - 100 records
Skip(90).Take(100) // Pulls 90 - 190 records (overlap of 10 to cater for additions/deletions)
Skip(180).Take(100) // Pulls 180 - 280 records (overlap of 10 to cater for additions/deletions)
and so on...
Hope this helps!
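Applied directly to the reader loop from the question, the buffering idea could also look roughly like this (reusing the outputStream, sqlCommand and JsonConvert names from the question; the chunk size of 200 is just an example):
const int chunkSize = 200;                  // example size; 1000 would work the same way
var chunk = new List<List<string>>();
using (var reader = sqlCommand.ExecuteReader())
{
    while (reader.Read())
    {
        var row = new List<string>();
        for (var i = 0; i < reader.FieldCount; i++)
        {
            row.Add(reader[i].ToString());
        }
        chunk.Add(row);
        if (chunk.Count == chunkSize)
        {
            // Serialize and send one full chunk, then start the next one.
            var buffer = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(chunk));
            await outputStream.WriteAsync(buffer, 0, buffer.Length);
            await outputStream.FlushAsync();
            chunk.Clear();
        }
    }
    if (chunk.Count > 0)
    {
        // Send whatever is left over after the last full chunk.
        var buffer = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(chunk));
        await outputStream.WriteAsync(buffer, 0, buffer.Length);
        await outputStream.FlushAsync();
    }
}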
I have a task in which I need to query a large amount of data. I created a method for the queries:
public List<T> Query(FilterDefinition<T> filter, SortDefinition<T> sort, int limit)
{
var query = Collection.Find(filter).Sort(sort).Limit(limit);
var result = query.ToList();
return result;
}
In the main method:
List<Cell> cells = MyDatabaseService.Query(filter, sort, 100000);
This List will contain 100,000 values, which is quite large.
On the other hand I can also use:
public async Task<IAsyncCursor<T>> QueryAsync(FilterDefinition<T> filter, SortDefinition<T> sort, int limit)
{
FindOptions<T> options = new FindOptions<T> { Sort = sort, Limit = limit };
var queryCursor = await Collection.FindAsync(filter, options);
return queryCursor;
}
In the main method, I then use a while loop to iterate the cursor.
IAsyncCursor<Cell> cursor = await MyDatabaseService.QueryAsync(filter, sort, 100000);
while (await cursor.MoveNextAsync())
{
    var batch = cursor.Current;
    foreach (var document in batch)
    {
        // Process each document here.
    }
}
So, considering I have a lot of data to query, is it a good idea to use the second implementation? Thanks for any reply.
It really depends what you are planning to do with the documents once you've retrieved them from the server.
If you need to perform an operation that requires all 100,000 documents to be in the program's memory then the two methods will essentially do the same thing.
On the other hand, if you are using the returned documents one by one, the second method is better: the first will essentially process every document twice (once to retrieve it along with all other documents and once to act on it); the second will process it once (retrieve and act immediately).
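For the one-by-one case, recent versions of the .NET MongoDB driver also expose a ForEachAsync extension on the cursor, so you don't have to write the MoveNextAsync/Current loop yourself. A minimal sketch, reusing the Collection, filter and sort names from the question:
// Stream the results and handle each document as it arrives,
// instead of building a 100,000-element list first.
var options = new FindOptions<Cell> { Sort = sort, Limit = 100000 };
using (IAsyncCursor<Cell> cursor = await Collection.FindAsync(filter, options))
{
    await cursor.ForEachAsync(document =>
    {
        // Act on a single document here.
    });
}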
I am implementing the Parse Unity SDK in order to have a high-score system. I run a query on my data to get the top ten players and their scores (it should be sorted by score). For some reason, when my code runs, I get a blank string for the name and a 0 for the score, even though my data has real values in it.
Here is the query:
int[] scores = new int[10];
string[] names = new string[10];
int i = 0;
var query = ParseObject.GetQuery ("HighScores").OrderByDescending ("score").Limit (10);
query.FindAsync().ContinueWith (t =>
{
IEnumerable<ParseObject> results = t.Result;
foreach (var obj in results)
{
scores[i] = obj.Get<int>("score");
names[i] = obj.Get<string>("playerName");
i++;
}
});
The class name is "HighScores" and I am trying to access the score ("score") and player name ("playerName") of each saved entry.
EDIT:
I found that there are zero results returned so it must be something with the query. I don't see what could be wrong with it.
8/17/15
I still have not found out what is going on with my query. Any ideas?
It turns out that I was getting the data from the query; the query was fine all along. The actual issue was that I was trying to output my newly fetched scores to a string that was being built before the query had finished getting the data, since the query is an asynchronous call. Instead of doing it this way, I let the query run fully, and once it finishes I set a static bool called finishedRunningQuery to true. Now in the Update() method I check: if (finishedRunningQuery), then update the high-score text. This fixes the issue.
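A rough sketch of that pattern in a Unity MonoBehaviour (the class name and the place where the score text gets updated are illustrative, not from the original project):
// Requires: using Parse; using UnityEngine;
public class HighScoreDisplay : MonoBehaviour
{
    static bool finishedRunningQuery = false;
    static int[] scores = new int[10];
    static string[] names = new string[10];

    void Start()
    {
        var query = ParseObject.GetQuery("HighScores").OrderByDescending("score").Limit(10);
        query.FindAsync().ContinueWith(t =>
        {
            int i = 0;
            foreach (var obj in t.Result)
            {
                scores[i] = obj.Get<int>("score");
                names[i] = obj.Get<string>("playerName");
                i++;
            }
            finishedRunningQuery = true;   // signal the main thread that the data is ready
        });
    }

    void Update()
    {
        if (finishedRunningQuery)
        {
            finishedRunningQuery = false;
            // Update the high-score text here on the main thread (UI component is illustrative).
        }
    }
}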
Iterating through a DataTable that contains about 40,000 records using a for loop takes almost 4 minutes. Inside the loop I'm just reading the value of a specific column of each row and concatenating it to a string.
I'm not opening any DB connections or anything; it's a function which receives a DataTable, iterates through it, and returns a string.
Is there any faster way of doing this?
Code goes here:
private string getListOfFileNames(DataTable listWithFileNames)
{
    string whereClause = "";
    if (listWithFileNames.Columns.Contains("Filename"))
    {
        whereClause = "where filename in (";
        for (int j = 0; j < listWithFileNames.Rows.Count; j++)
            whereClause += " '" + listWithFileNames.Rows[j]["Filename"].ToString() + "',";
        whereClause = whereClause.Remove(whereClause.Length - 1, 1); // drop the trailing comma
        whereClause += ")";
    }
    return whereClause;
}
Are you using a StringBuilder to concat the strings rather than just regular string concatenation?
Are you pulling back any more columns from the database than you really need? If so, try not to. Only pull back the column(s) that you need.
Are you pulling back any more rows from the database than you really need? If so, try not to. Only pull back the row(s) that you need.
How much memory does the computer have? Is it maxing out when you run the program, or getting close to it? Is the processor maxed out much, or at all? If you're using too much memory then you may need to do more streaming. This means not pulling the whole result set into memory (i.e. a DataTable) but reading each line one at a time. It also might mean that rather than concatenating the results into a string (or StringBuilder), you might need to append them to a file so as not to take up so much memory.
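For the string-building part specifically, a StringBuilder version of the method from the question might look roughly like this (same method name and column as the original; only a sketch):
// Requires: using System.Data; using System.Text;
private string getListOfFileNames(DataTable listWithFileNames)
{
    if (!listWithFileNames.Columns.Contains("Filename"))
        return string.Empty;

    var whereClause = new StringBuilder("where filename in (");
    for (int j = 0; j < listWithFileNames.Rows.Count; j++)
    {
        if (j > 0)
            whereClause.Append(",");
        whereClause.Append(" '").Append(listWithFileNames.Rows[j]["Filename"]).Append("'");
    }
    whereClause.Append(")");
    return whereClause.ToString();
}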
The following LINQ statement has a where clause on the first column and concatenates the third column into a variable.
string CSVValues = String.Join(",", dtOutput.AsEnumerable()
.Where(a => a[0].ToString() == value)
.Select(b => b[2].ToString()));
Step 1: run it through a profiler; make sure you're looking at the right thing when optimizing.
Case in point: we had an issue we were sure was slow database interactions, and when we ran the profiler the DB barely showed up.
That said, possible things to try:
If you have the memory available, convert the query to a list; this will force a full DB read. Otherwise the LINQ will probably load in chunks, doing multiple DB queries.
Push the work to the DB: if you can create a query that trims down the data you are looking at, or even calculates the string for you, that might be faster.
If this is something where the query is run often but the data rarely changes, consider copying the data to a local DB (e.g. SQLite) if you're using a remote DB.
If you're using the local SQL Server, try SQLite; it's faster for many things.
var values = dataTable
    .AsEnumerable()
    .Select(row => row.Field<string>("columnName"));
var colValueStr = string.Join(",", values.ToArray());
Try adding a dummy column in your table with an expression. Something like this:
DataColumn dynColumn = new DataColumn();
dynColumn.ColumnName = "FullName";
dynColumn.DataType = System.Type.GetType("System.String");
// DataColumn expressions concatenate with '+'; string literals go in single quotes.
dynColumn.Expression = "LastName + ' ' + 'ABC'";
UserDataSet.Tables[0].Columns.Add(dynColumn);
Later in your code you can use this dummy column instead. You don't need to run any loop to concatenate the string.
Try using a parallel loop.
Here's some sample code:
var sync = new object(); // str must be protected: multiple threads append to it concurrently
Parallel.ForEach(dataTable.AsEnumerable(),
    row => { lock (sync) { str += row["ColumnName"].ToString(); } });
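Note that the lock serializes the appends, which limits the benefit of the parallel loop. A lock-free sketch with PLINQ that gathers the values in parallel and joins them once at the end (AsOrdered is only needed if row order matters):
var str = string.Concat(
    dataTable.AsEnumerable()
             .AsParallel()
             .AsOrdered()                        // keep the original row order
             .Select(row => row["ColumnName"].ToString()));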
I've separated the job into small pieces and let each piece be handled by its own thread. You can fine-tune the number of threads by varying the nthreads value. Try it with different numbers so you can see the difference in performance.
private string getListOfFileNames(DataTable listWithFileNames)
{
string whereClause = String.Empty;
if (listWithFileNames.Columns.Contains("Filename"))
{
int nthreads = 8; // You can play with this parameter to fine tune and get your best time.
int load = listWithFileNames.Rows.Count / nthreads; // How many items each thread must process.
List<ManualResetEvent> mres = new List<ManualResetEvent>(); // These let the method know when the work is done.
List<StringBuilder> sbuilders = new List<StringBuilder>(); // Each thread builds its piece of the big string here.
for (int i = 0; i < nthreads; i++)
{
sbuilders.Add(new StringBuilder()); // Create a new string builder
mres.Add(new ManualResetEvent(false)); // Create a non-signaled ManualResetEvent.
if (i == 0) // We know where to put the very beginning of the where clause
{
sbuilders[0].Append("where filename in (");
}
// Calculate the last item to be processed by the current thread
int end = i == (nthreads - 1) ? listWithFileNames.Rows.Count : i * load + load;
// Create a new thread to deal with a part of the big table.
Thread t = new Thread(new ParameterizedThreadStart((x) =>
{
// This is the inside of the thread, we must unbox the parameters
object[] vars = x as object[];
int lIndex = (int)vars[0];
int uIndex = (int)vars[1];
ManualResetEvent ev = vars[2] as ManualResetEvent;
StringBuilder sb = vars[3] as StringBuilder;
bool coma = false;
// Concatenate the rows in the string builder
for (int j = lIndex; j < uIndex; j++)
{
if (coma)
{
sb.Append(", ");
}
else
{
coma = true;
}
sb.Append("'").Append(listWithFileNames.Rows[j]["Filename"]).Append("'");
}
// Tell the parent Thread that your job is done.
ev.Set();
}));
// Start the thread with the calculated params
t.Start(new object[] { i * load, end, mres[i], sbuilders[i] });
}
// Wait for all child threads to finish their job
WaitHandle.WaitAll(mres.ToArray());
// Concatenate the big string.
for (int i = 1; i < nthreads; i++)
{
sbuilders[0].Append(", ").Append(sbuilders[i]);
}
sbuilders[0].Append(")"); // Close your where clause
// Return the finished where clause
return sbuilders[0].ToString();
}
// Returns empty
return whereClause;
}
I'm iterating through a smallish (~10GB) table with a foreach / IQueryable and LINQ-to-SQL.
Looks something like this:
using (var conn = new DbEntities() { CommandTimeout = 600*100})
{
var dtable = conn.DailyResults.Where(dr => dr.DailyTransactionTypeID == 1);
foreach (var dailyResult in dtable)
{
//Math here, results stored in-memory, but this table is very small.
//At the very least compared to stuff I already have in memory. :)
}
}
The Visual Studio debugger throws an out-of-memory exception after a short while at the base of the foreach loop. I'm assuming that the rows of dtable are not being flushed. What should I do?
The IQueryable<DailyResult> dtable will attempt to load the entire query result into memory when enumerated, before any iteration of the foreach loop runs; it does not load one row per iteration. If you want that streaming behavior, use a DataReader.
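A rough sketch of the DataReader route (the connection string, table name and column shown here are placeholders, not taken from the original model):
// Requires: using System.Data.SqlClient;
// Stream rows one at a time instead of materializing the whole result set.
using (var connection = new SqlConnection("<connection string>"))                   // placeholder
using (var command = new SqlCommand(
    "SELECT * FROM DailyResults WHERE DailyTransactionTypeID = 1", connection))     // table name assumed from the entity name
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Read only the columns the math needs, e.g. (illustrative column):
            // var amount = reader.GetDecimal(reader.GetOrdinal("Amount"));
        }
    }
}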
You call ~10GB smallish? You have a nice sense of humor!
You might consider loading rows in chunks, aka pagination.
conn.DailyResults.Where(dr => dr.DailyTransactionTypeID == 1).Skip(x).Take(y);
Using a DataReader is a step backward unless there is a way to use it within LINQ. I thought we were trying to get away from ADO.
The solution suggested above works, but it's truly ugly. Here is my code:
int iTake = 40000;
int iSkip = 0;
int iLoop;
ent.CommandTimeout = 6000;
while (true)
{
iLoop = 0;
IQueryable<viewClaimsBInfo> iInfo = (from q in ent.viewClaimsBInfo
where q.WorkDate >= dtStart &&
q.WorkDate <= dtEnd
orderby q.WorkDate
select q)
.Skip(iSkip).Take(iTake);
foreach (viewClaimsBInfo qInfo in iInfo)
{
iLoop++;
if (lstClerk.Contains(qInfo.Clerk.Substring(0, 3)))
{
/// Various processing....
}
}
if (iLoop < iTake)
break;
iSkip += iTake;
}
You can see that I have to check whether I've run out of records, because otherwise the foreach loop will end at 40,000 records. Not good.
Updated 6/10/2011: Even this does not work. At around 2,000,000 records I get an out-of-memory exception. It is also excruciatingly slow. When I modified it to use OleDB, it ran in about 15 seconds (as opposed to 10+ minutes) and didn't run out of memory. Does anyone have a LINQ solution that works and runs quickly?
Use .AsNoTracking() - it tells DbEntities not to cache retrieved rows
using (var conn = new DbEntities() { CommandTimeout = 600*100})
{
var dtable = conn.DailyResults
.AsNoTracking() // <<<<<<<<<<<<<<
.Where(dr => dr.DailyTransactionTypeID == 1);
foreach (var dailyResult in dtable)
{
//Math here, results stored in-memory, but this table is very small.
//At the very least compared to stuff I already have in memory. :)
}
}
I would suggest using SQL instead to modify this data.