DynamoDB: Deleting an item without using primary key - c#

I'm implementing DynamoDB in our project. We have to put large data strings into database so we are splitting data into small pieces and inserting multiple rows with only one attribute value changed - part of string. One column (range key) contains a number of part. Inserting and selecting data works perfectly fine for small and large strings. The problem is deleting an item. I read that when you want to delete an item you need to specify primary key for such item (hash key or hash key and range key - depends on table). But what if I want to delete items that have particular value for one of attributes? Do I need to scan (scan, not query) entire table and for each row run delete or batch delete? Or is there some another solution without using two queries? What I'm trying to do is to avoid scanning entire table. I think we will have about 100 - 1000 milions of rows in such table, so scanning will be very slow.
Thanks for help.

There are no way to delete an arbitrary element in DynamoDB. You indeed need to know the hash_key and the range_key.
If query does not fit your needs for this (ie. you even do not know the hash_key), then you're stuck.
The best would be to re-thing your data modeling. Build a custom index or do 'lazy delete'.
To achieve 'lazy delete', use a table as a queue of element to delete. Periodically, run an EMR on it to do all the delete in the batch in a single scan operation. It's really not the best solution but the only way I can think of to avoid re-modeling.
TL;DR: There is no real way but workarounds. I highly recommend that you re-model at least part of your data.

Related

C# Winforms Fastest Way To Query MS Access

This may be a dumb question, but I wanted to be sure. I am creating a Winforms app, and using c# oledbconnection to connect to a MS Access database. Right now, i am using a "SELECT * FROM table_name" and looping through each row to see if it is the row with the criteria I want, then breaking out of the loop if it is. I wonder if the performance would be improved if I used something like "SELECT * FROM table_name WHERE id=something" so basically use a "WHERE" statement instead of looping through every row?
The best way to validate the performance of anything is to test. Otherwise, a lot of assumptions are made about what is the best versus the reality of performance.
With that said, 100% of the time using a WHERE clause will be better than retrieving the data and then filtering via a loop. This is for a few different reasons, but ultimately you are filtering the data on a column before retrieving all of the columns, versus retrieving all of the columns and then filtering out the data. Relational data should be dealt with according to set logic, which is how a WHERE clause works, according to the data set. The loop is not set logic and compares each individual row, expensively, discarding those that don’t meet the criteria.
Don’t take my word for it though. Try it out. Especially try it out when your app has a lot of data in the table.
yes, of course.
if you have a access database file - say shared on a folder. Then you deploy your .net desktop application to each workstation?
And furthermore, say the table has 1 million rows.
If you do this:
SELECT * from tblInvoice WHERE InvoiceNumber = 123245
Then ONLY one row is pulled down the network pipe - and this holds true EVEN if the table has 1 million rows. To traverse and pull 1 million rows is going to take a HUGE amount of time, but if you add criteria to your select, then it would be in this case about 1 million times faster to pull one row as opposed to the whole table.
And say if this is/was multi-user? Then again, even on a network - again ONLY ONE record that meets your criteria will be pulled. The only requirement for this "one row pull" over the network? Access data engine needs to have a useable index on that criteria. Of course by default the PK column (ID) always has that index - so no worries there. But if as per above we are pulling invoice numbers from a table - then having a index on that column (InvoiceNumber) is required for the data engine to only pull one row. If no index can be used - then all rows behind the scenes are pulled until a match occurs - and over a network, then this means significant amounts of data will be pulled without that index across that network (or if local - then pulled from the file on the disk).

Clean memory after getting an element from queue

Suppose we have the following situation:
We have 2-3 tables in database with a huge amount of data (let it be 50-100mln of records) and we want to add 2k of new records. But before adding them we need to check our db on duplicates. So if this 2k contains records which we have in our DB we should ignore them. But to find out whether new record is a duplicate or not we need info from both tables (for example we need to make left join).
The idea of solution is: one task or thread create a suitable data for comparison and pushes data into queue (by batches, not record by record), so our queue(or concurrentQueue) is a global variable. The second thread gets batch from queue and look it through. But there's a problem - memory is growing...
How can I clean memory after I've surfed through the batch?
P.S. If smb has another idea how to optimize this process - please describe it...
This is not the specific answer to the question you are asking, because what you are asking, doesn't really make sense to me.
if you are looking to update specific rows:
INSERT INTO tablename (UniqueKey,columnname1, columnname2, etc...)
VALUES (UniqueKeyValue,value1,value2, etc....)
ON DUPLICATE KEY
UPDATE columnname1=value1, columnname2=value2, etc...
If not, simply ignore/remove the update statement.
This would be darn fast, considering, it would use the unique index of whatever field you want to be unique, and just do an insert or update. No need to validate in a separate table or anything.

SQLXML Bulk Load or manual iteration?

I am looking to insert a 20-25MB xml file into a database on a daily basis. The issue is that each entry needs an extra column added with a calculated value. So what I am wondering is if the most efficient way to do this would be using the SQLXML Bulk Load tools after editing the xml file, running through the xml file and add the new column then loading each item, or using the Bulk Load followed by going through the database adding the new column values.
Comments = answer
There is no need to store this value seperate. Since it's a calculated value with the data you need to calculate it on each record, you can calculate this on the fly instead of storing it as it's own unique value. A mix of where and/or having clauses will allow for filtering (searching) of results based on that calculated value.

How to sync the lines in a text box with the rows of a database?

I have a text box that contains lines of data similar to this:
sheep
haggis
red squirrels
chickens
rabbits
and a SQL CE table with one column, titled foods, that stores these values. What is the best way to sync these two data sources? For example, if a user deletes red squirrels and chickens from the textbox and adds carrots, the table rows that contains red squirrels and chickens should be deleted and a new row inserted that contains carrots.
Right now, my solution is to compare the old list with the new list and perform two actions. 1) Find the items that have been removed by comparing the lists using Except:
List<String> removedFoods = oldList.Except(newList).ToList();
These tags are removed with DELETE. Another Except statement finds the tags that have been added, which are then added with INSERT.
List<String> addedFoods = newList.Except(oldList).ToList();
Is their a better way that involves one C#/SQL statement that I'm missing (or a better way in general)?
Your question is about performance right? I don't see much room for improvement in performance from what you are doing already. No matter what, you have to do deletion and insertion.
As Tim asked in his reply to your question, it depends on the number of foods in your table.
You could clear the table data like following:
TRUNCATE TABLE foods
MSDN: TRUNCATE TABLE is similar to the DELETE statement with no WHERE clause; however, TRUNCATE TABLE is faster and uses fewer system and transaction log resources
and then use the text box data to insert the correct data back in.
It will be easier to make a datagrid that looks like a multi line textbox than to do all the parsing and handling yourself.
Keeping track of the lines/rows will be much easier.
Deleting and updating too.

Best way to save a ordered List to the Database while keeping the ordering

I was wondering if anyone has a good solution to a problem I've encountered numerous times during the last years.
I have a shopping cart and my customer explicitly requests that it's order is significant. So I need to persist the order to the DB.
The obvious way would be to simply insert some OrderField where I would assign the number 0 to N and sort it that way.
But doing so would make reordering harder and I somehow feel that this solution is kinda fragile and will come back at me some day.
(I use C# 3,5 with NHibernate and SQL Server 2005)
Thank you
Ok here is my solution to make programming this easier for anyone that happens along to this thread. the trick is being able to update all the order indexes above or below an insert / deletion in one update.
Using a numeric (integer) column in your table, supported by the SQL queries
CREATE TABLE myitems (Myitem TEXT, id INTEGER PRIMARY KEY, orderindex NUMERIC);
To delete the item at orderindex 6:
DELETE FROM myitems WHERE orderindex=6;
UPDATE myitems SET orderindex = (orderindex - 1) WHERE orderindex > 6;
To swap two items (4 and 7):
UPDATE myitems SET orderindex = 0 WHERE orderindex = 4;
UPDATE myitems SET orderindex = 4 WHERE orderindex = 7;
UPDATE myitems SET orderindex = 7 WHERE orderindex = 0;
i.e. 0 is not used, so use a it as a dummy to avoid having an ambiguous item.
To insert at 3:
UPDATE myitems SET orderindex = (orderindex + 1) WHERE orderindex > 2;
INSERT INTO myitems (Myitem,orderindex) values ("MytxtitemHere",3)
Best solution is a Doubly Linked list. O(1) for all operations except indexing. Nothing can index SQL quickly though except a where clause on the item you want.
0,10,20 types fail. Sequence column ones fail. Float sequence column fails at group moves.
Doubly Linked list is same operations for addition, removal, group deletion, group addition, group move. Single linked list works ok too. Double linked is better with SQL in my opinion though. Single linked list requires you to have the entire list.
FWIW, I think the way you suggest (i.e. committing the order to the database) is not a bad solution to your problem. I also think it's probably the safest/most reliable way.
How about using a linked list implementation? Having one column the will hold the value (order number) of the next item. I think it's by far the easiest to use when doing insertion of orders in between. No need to renumber.
Unfortunately there is no magic bullet for this. You cannot guarentee the order of any SELECT statement WITHOUT an order by clause. You need to add the column and program around it.
I don't know that I'd recommend adding gaps in the order sequence, depending on the size of your lists and the hits on the site, you might gain very little for the over head of handling the logic (you'd still need to cater for the occasion where all the gaps have been used up). I'd take a close look to see what benifits this would give you in your situation.
Sorry I can't offer anything better, Hope this helped.
I wouldn't recommend the A, AA, B, BA, BB approach at all. There's a lot of extra processing involved to determine hierarchy and inserting entries in between is not fun at all.
Just add an OrderField, integer. Don't use gaps, because then you have to either work with a non-standard 'step' on your next middle insert, or you will have to resynchronize your list first, then add a new entry.
Having 0...N is easy to reorder, and if you can use Array methods or List methods outside of SQL to re-order the collection as a whole, then update each entry, or you can figure out where you are inserting into, and +1 or -1 each entry after or before it accordingly.
Once you have a little library written for it, it'll be a piece of cake.
I would just insert an order field. Its the simplest way. If the customer can reorder the fields or you need to insert in the middle then just rewrite the order fields for all items in that batch.
If down the line you find this limiting due to poor performance on inserts and updates then it is possible to use a varchar field rather than an integer. This allows for quite a high level of precision when inserting. eg to insert between items 'A' and 'B' you can insert an item ordered as 'AA'. This is almost certainly overkill for a shopping cart though.
On a level of abstraction above the cart Items let's say CartOrder (that has 1-n with CartItem) you can maintain a field called itemOrder which could be just a comma - separated list of id(PK) of cartItem records relevant . It will be at application layer that you require to parse that and arrange your item models accordingly . The big plus for this approach will be in case of order reshufflings , there might not be changes on individual objects but since order is persisted as an index field inside the order item table rows you will have to issue an update command for each one of the rows updating their index field.
Please let me know your criticisms on this approach, i am curious to know in which ways this might fail.
I solved it pragmatically like this:
The order is defined in the UI.
The backend gets a POST request that contains the IDs and the corresponding Position of every item in the list.
I start a transaction and update the position for every ID.
Done.
So ordering is expensive but reading the ordered list is super cheap.
I would recommend keeping gaps in the order number, so instead of 1,2,3 etc, use 10,20,30... If you need to just insert one more item, you could put it at 15, rather than reordering everything at that point.
Well, I would say the short answer is:
Create a primary key of autoidentity in the cartcontents table, then insert rows in the correct top-down order. Then by selecting from the table with order by the primary key autoidentity column would give you the same list. By doing this you have to delete all items and reinsert then in case of alterations to the cart contents. (But that is still quite a clean way of doing it) If that's not feasible, then go with the order column like suggested by others.
When I use Hibernate, and need to save the order of a #OneToMany, I use a Map and not a List.
#OneToMany(fetch = FetchType.EAGER, mappedBy = "rule", cascade = CascadeType.ALL)
#MapKey(name = "position")
#OrderBy("position")
private Map<Integer, RuleAction> actions = LazyMap.decorate(new LinkedHashMap<>(), FactoryUtils.instantiateFactory(RuleAction.class, new Class[] { Rule.class }, new Object[] { this }));
In this Java example, position is an Integer property of RuleAction so the order is persisted that way. I guess in C# this would look rather similar.

Categories