I have a MySQL database in which there are 50 columns of detail:
Detail1, Detail2, Detail3 ... Detail50.
I have the website locally, so I am scraping from my own copy. The site has no order, no tags, and no names; the data is just text, line by line, so the only option was to take what I get line by line and save it to the DB. Every line gets a column from 1 to 50.
Some pages have 10 columns filled, others have 50, and the data is in no order. Now that I have the DB, how can I sort it? Any suggestion or idea is welcome.
For example, sometimes "Inner Diameter" is in Detail4 and sometimes in Detail1. These are just examples; I would have hard-coded the mapping, but there are too many possibilities. The repeating fields all have the same starting words, only the values differ. Is there any chance to get at least 50% of the data in order, the ones whose first 4-5 words are the same, like
part, inner diameter, oil filter, etc.?
Any suggestions or ideas? Can it be done in MySQL or in C# code?
Thank you
Your approach is totally wrong, but if you want to go this way, just make a table with two columns, 'id' and 'details', and do one insert per detail line for the specific product ID.
After that you can use a SELECT like this:
select SUBSTRING(details, 15) from products where details like 'Inner Diameter%' and id = 'my_product_id'; -- 'Inner Diameter' is 14 characters, so the value starts at position 15
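In C#, the same approach could look roughly like this (a sketch using MySQL Connector/NET; the table name, connection string, and helper names are assumptions, not your actual schema):

using MySql.Data.MySqlClient;

// Sketch of the two-column (id/details) approach described above.
class DetailStore
{
    // placeholder connection string
    const string ConnStr = "server=localhost;database=mydb;uid=me;pwd=secret";

    // Insert one scraped line as its own row for the given product.
    public static void SaveLine(int productId, string line)
    {
        using (var conn = new MySqlConnection(ConnStr))
        using (var cmd = new MySqlCommand(
            "INSERT INTO products (id, details) VALUES (@id, @details)", conn))
        {
            cmd.Parameters.AddWithValue("@id", productId);
            cmd.Parameters.AddWithValue("@details", line);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

    // Pull the value for a known label such as "Inner Diameter".
    public static string GetDetail(int productId, string label)
    {
        using (var conn = new MySqlConnection(ConnStr))
        using (var cmd = new MySqlCommand(
            "SELECT SUBSTRING(details, @skip) FROM products " +
            "WHERE id = @id AND details LIKE CONCAT(@label, '%')", conn))
        {
            cmd.Parameters.AddWithValue("@skip", label.Length + 1); // skip the label itself
            cmd.Parameters.AddWithValue("@id", productId);
            cmd.Parameters.AddWithValue("@label", label);
            conn.Open();
            return cmd.ExecuteScalar() as string;
        }
    }
}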
I am working on an ASP.NET MVC application. This application is used by 200 users. These users constantly (every 5 minutes) search for an item from a list of 100,000 items (this list is going to grow by 1-2% every month). The list of 100,000 items is stored in a SQL Server table.
The search is a wildcard search, e.g.:
Select itemCode, itemName, ItemDesc
from tblItems
Where itemName like '%SearchWord%'
The searching needs to be really fast, since the main business relies on searching and selecting the item.
I would like to know how to get the best performance. The search results have to come up instantaneously.
What I have tried -
I tried pre-loading the entire 100,000 records into memcache and then reading from the memcache. I was trying to avoid the calls to SQL Server for every search.
This takes a lot of time. Every time a user searches for an item, we retrieve all 100,000 records from the memcache and then do the search. This takes almost 2-3 times longer than direct SQL searches.
I tried doing a direct search on the SQL Server table, but limiting the results to only 50 records at a time (using TOP 50).
This seems to be OK, but it is still nowhere near the performance we are seeking.
I would like to hear the possible solutions and links to any articles/code.
Thanks in advance
Run SQL Profiler with the Tuning template and feed the trace to the Database Engine Tuning Advisor. It will recommend indexes to create on your database.
Also, a query such as the following would be worth a try.
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER ( ORDER BY ColumnA) AS RowNumber, itemCode, itemName, ItemDesc
FROM tblItems
WHERE itemName LIKE '%FooBar%'
) AS RowResults
WHERE RowNumber >= 1 AND RowNumber <= 50
ORDER BY RowNumber
EDIT: Updated query to reflect your real scenario.
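If it helps, this is roughly how that paged query might be executed from C#, with a parameter instead of concatenating the search word into the SQL (a sketch; the connection string handling is an assumption):

using System;
using System.Data.SqlClient;

static class ItemSearch
{
    // Sketch: run the ROW_NUMBER-paged search and print the matches.
    public static void PrintTopMatches(string connStr, string searchWord)
    {
        const string sql = @"
            SELECT itemCode, itemName, ItemDesc
            FROM (
                SELECT ROW_NUMBER() OVER (ORDER BY itemName) AS RowNumber,
                       itemCode, itemName, ItemDesc
                FROM tblItems
                WHERE itemName LIKE @search
            ) AS RowResults
            WHERE RowNumber BETWEEN 1 AND 50
            ORDER BY RowNumber";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@search", "%" + searchWord + "%");
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine(reader["itemName"]);
        }
    }
}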
How about having a search without the leading wildcard as your primary search....
Where itemName like 'SearchWord%'
and then having a "More Results" button that loads
Where itemName like '%SearchWord%'
(alternatively exclude results from the first result set)
Where itemName not like 'SearchWord%' and itemName like '%SearchWord%'
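A sketch of how the two phases could be wired up from C# (tblItems and itemName come from the question; everything else here is an assumption):

using System.Data;
using System.Data.SqlClient;

static class TwoPhaseSearch
{
    // Sketch: fast prefix search first; the "More Results" button runs the
    // contains-search minus the rows the prefix search already returned.
    public static DataTable Search(string connStr, string word, bool moreResults)
    {
        string sql = moreResults
            ? "SELECT TOP 50 itemCode, itemName, ItemDesc FROM tblItems " +
              "WHERE itemName NOT LIKE @prefix AND itemName LIKE @contains"
            : "SELECT TOP 50 itemCode, itemName, ItemDesc FROM tblItems " +
              "WHERE itemName LIKE @prefix"; // no leading wildcard: can use an index on itemName

        var table = new DataTable();
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@prefix", word + "%");
            if (moreResults)
                cmd.Parameters.AddWithValue("@contains", "%" + word + "%");
            new SqlDataAdapter(cmd).Fill(table); // Fill opens/closes the connection
        }
        return table;
    }
}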
A weird alternative which might work, though it depends on several assumptions. Sorry it's not fully explained, but I am on an iPad, so it's hard to type. (And yes, this solution has been used in high-transaction commercial systems.)
This assumes
That your query is CPU-constrained, not I/O-constrained
That itemName is not too long, so that it can hold all the letters and numbers
That the search word, in total, contains enough selective characters and isn't just highly common characters
That your selection predicates are constrained by a '%like%'
The basic idea is to expand your query to help the optimiser know which rows need the like scanning.
Step 1. Setup your table
Create an additional 26 or 36 columns, one for each letter/digit. When I've done this for real it has always been a separate table, but putting them on the source table should be OK for a small volume like 100k. Let's call the columns trig_a, trig_b, etc.
Create a trigger for each insert/update/delete that puts a 1 or 0 into the trig_a field depending on whether itemName contains an 'a', and do the same for all 26/36 columns. The trigger to do this is complex, but possible (at least using Oracle). If you get stuck I'm sure SO'ers can create it, or I can dig it out.
At this point, we have a series of columns that indicate whether a field contains a letter/digit etc.
Step 2. Helping you query
With this extra info, we are in the position to help the optimiser. Add the following to your query
Select ... Where .... And
((trig_a > 0) or (@searchword not like '%a%')) and
((trig_b > 0) or (@searchword not like '%b%')) and
... repeat for all monitored columns ...
If the optimiser behaves, it can use the (hopefully) lower-cost trig_? > 0 predicates to reduce the number of LIKE predicates that have to be evaluated.
Notes.
You may need to force the optimiser to scan the trig_? fields first
Indexes can help on the trig_? fields, especially if they are in the source table
I haven't shown how to handle upper/lower case; don't forget to handle this
You might find that doing just a few letters is all you need
This technique doesn't offer performance gains for every use of LIKE, so it isn't a general-purpose technique for everywhere you use a LIKE
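As a rough illustration, the flag columns and the expanded predicate could also be maintained from application code instead of a trigger (a sketch; the trig_a..trig_z column names follow the scheme above, and the case handling is the simple lower-casing the notes warn about):

using System.Linq;
using System.Text;

static class TrigFlags
{
    static readonly char[] Letters = "abcdefghijklmnopqrstuvwxyz".ToCharArray();

    // 1 or 0 per letter for a given item name; write these into
    // trig_a..trig_z whenever the row is inserted or updated.
    public static int[] FlagsFor(string itemName)
    {
        string lower = itemName.ToLowerInvariant();
        return Letters.Select(c => lower.IndexOf(c) >= 0 ? 1 : 0).ToArray();
    }

    // For each letter present in the search word, require trig_x > 0.
    // Letters absent from the search word add no predicate, because the
    // "or (@searchword not like '%x%')" branch is always true for them.
    public static string PredicateFor(string searchWord)
    {
        string lower = searchWord.ToLowerInvariant();
        var sb = new StringBuilder("itemName LIKE @pattern");
        foreach (char c in Letters)
            if (lower.IndexOf(c) >= 0)
                sb.Append(" AND trig_" + c + " > 0");
        return sb.ToString();
    }
}

For example, PredicateFor("FooBar") yields itemName LIKE @pattern AND trig_a > 0 AND trig_b > 0 AND trig_f > 0 AND trig_o > 0 AND trig_r > 0.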
I am converting a VB6 app to C# with a SQL Server back end. The app includes a very general query editor that allows the user to write any select query and view the results in a grid control. Some of the tables have several hundred columns (poor design, I know, but I have no control over this). A typical use case for an admin user would be to
select * from A_Table_With_Many_Columns
However, while they want to be able to view all the data, they are particularly interested in 2 columns and they want these to be displayed as the first 2 columns in the grid (instead of 67th and 99th for example) so instead they execute the following statement:
select First_Interesting_Field, Second_Interesting_Field, *
from A_Table_With_Many_Columns
Then they will go and modify the data in the grid. However, saving this data results in a concurrency violation (DBConcurrencyException). This worked fine with the connected RecordSets of VB6, but not so well in C#. I have tried a myriad of solutions, to no avail.
Does anyone know how to handle this exception in a generic way? (Remember, the user can type ANY select statement or join etc. into the query editor)
Does anyone know how I might manipulate the columns returned so that I can delete the two columns that appear again further on in the list? My difficulty is this: if the column name in the database is EMail and I do select EMail, * from Blah, the two pertinent columns returned are EMail and, from the * portion of the query, a second EMail column that ADO.NET or C# aliases as EMail1, so I am not able to detect the second column as a duplicate and remove it.
Does anyone have an alternate solution I have not thought of?
Thank you very much
Actually, you could rename the explicitly selected columns to something like EMail_userdefined by doing something like this:
SELECT First_Interesting_Field as First_Interesting_Field_userdefined, Second_Interesting_Field as Second_Interesting_Field_userdefined, *
from A_Table_With_Many_Columns
Replace _userdefined with whatever suffix you want: an order number or anything else acceptable to the user.
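With that naming convention in place, the duplicates coming from the * portion can then be dropped generically on the client side. A minimal sketch (the _userdefined suffix is the convention from above):

using System.Data;

static class GridHelper
{
    // Sketch: after aliasing the explicit columns with a known suffix,
    // remove the duplicate columns that came from the '*' part of the query.
    public static void RemoveStarDuplicates(DataTable table, string suffix)
    {
        for (int i = table.Columns.Count - 1; i >= 0; i--)
        {
            string name = table.Columns[i].ColumnName;
            // If an aliased twin exists (e.g. EMail -> EMail_userdefined),
            // this column is the duplicate produced by '*', so drop it.
            if (!name.EndsWith(suffix) && table.Columns.Contains(name + suffix))
                table.Columns.Remove(table.Columns[i]);
        }
    }
}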
I have developed an eCommerce application in C# and ASP.NET. For the Admin users' "dashboard" landing page, I would like to give them a GridView that shows the total sales dollar amount for a couple of different time ranges; these would be my columns (i.e. last day, last week, last month, last year, total ever). I would like to give these values for orders in different statuses (i.e. complete, paid but not shipped, in progress). Something similar to this:
|OrderStatus|Today|LastWeek|LastMonth|
|Processed |$10 |$100 |$34000 |
|PaidNotShip|$4 |$12 |$45 |
My question: what is the best/most efficient way to do this? I know that I could write separate SQL statements, union them together, and bind the GridView to a SqlDataSource:
(select amountForYesterday, amountForLastWeek from sales where orderStatus = processed)
UNION
(select amountForYesterday, amountForLastWeek from sales where orderStatus = paidnotshipped)
But that seems like a pain and very inefficient, since I would effectively be writing a separate query for each value.
I could also do this in the .cs code-behind on load and programmatically populate the GridView row by row.
This GridView would only show information for the user's specific organization, so it would have to filter based on that as well.
I'm kind of at a loss as to how to do this without writing a massive query and continually hitting that query and database each time the page is viewed.
Any ideas?
I prefer using LINQ to work with data and/or GridViews (accessing the rows, etc.). Have a look at a project I have on GitHub, which does exactly what I am describing here, as an example. Note that it is just a sandbox I previously used for illustration purposes.
GitHub Repo
https://github.com/pauloosthuysen/int
Other useful info:
http://www.codeproject.com/Articles/33685/Simple-GridView-Binding-using-LINQ-to-SQL
The sales figures for LastWeek and LastMonth do not change very often. You could store them in a static Dictionary indexed by organization, or summarize them in a separate table for faster access. This way you will not need to select the same huge number of rows to get the same numbers over and over again. Unless there are special demands I would stick with the Dictionary solution because it is simple, but a combination could also be a good solution.
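A minimal sketch of that Dictionary idea (SalesSummary and LoadSummaryFromDb are placeholders for whatever your data layer actually provides, and the 15-minute expiry is an arbitrary choice):

using System;
using System.Collections.Concurrent;

class SalesSummary
{
    public decimal Today, LastWeek, LastMonth; // per-status breakdown omitted
}

static class DashboardCache
{
    static readonly ConcurrentDictionary<int, Tuple<DateTime, SalesSummary>> cache =
        new ConcurrentDictionary<int, Tuple<DateTime, SalesSummary>>();

    static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(15);

    public static SalesSummary Get(int organizationId)
    {
        Tuple<DateTime, SalesSummary> entry;
        if (cache.TryGetValue(organizationId, out entry) &&
            DateTime.UtcNow - entry.Item1 < MaxAge)
            return entry.Item2; // still fresh: no database hit

        SalesSummary summary = LoadSummaryFromDb(organizationId); // the big query
        cache[organizationId] = Tuple.Create(DateTime.UtcNow, summary);
        return summary;
    }

    static SalesSummary LoadSummaryFromDb(int organizationId)
    {
        // run the real summary query, filtered by organization, here
        throw new NotImplementedException();
    }
}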
There is no direct way of doing it.
However, instead of hitting the DB for the sum of every column, you can do the work in the DataTable that is bound to your grid.
All you need to do is use
int iSumSal = Convert.ToInt32(StudentTable.Compute("SUM(sal)", "")); // empty filter = all rows
Similarly, you can do this for the other columns.
Once this is done, just add a new row to your DataTable with all the summed values in it.
Then you can bind it to your grid.
Optional: you can put a text value such as "Total:" in the first column of your new row.
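Putting those steps together (a sketch; it assumes the first column holds labels and the remaining columns are numeric):

using System.Data;

static class GridTotals
{
    // Sketch: compute each column's total with DataTable.Compute,
    // append a "Total:" row, then bind the table to the grid as usual.
    public static void AppendTotalsRow(DataTable table)
    {
        DataRow totals = table.NewRow();
        totals[0] = "Total:"; // label in the first column
        foreach (DataColumn col in table.Columns)
        {
            if (col.Ordinal == 0) continue; // skip the label column
            // empty filter string = sum over all rows
            totals[col] = table.Compute("SUM(" + col.ColumnName + ")", "");
        }
        table.Rows.Add(totals);
        // then: myGridView.DataSource = table; myGridView.DataBind();
    }
}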
thanks
rahul
I have a page with 26 sections, one for each letter of the alphabet. I'm retrieving a list of manufacturers from the database and, for each one, creating a link using a different field in the database. So currently I leave the connection open, then do a new SELECT for each letter, WHERE the Name is LIKE that letter. It's very slow, though.
What's a better way to do this?
TIA
Since you are going to fetch them all anyway, you might find it faster to fetch them in one go and split them into letter-groups in the code.
Looking at it from the other end, why do you need to fetch all the lists just to build a set of links? Shouldn't you fetch a single letter when its link is clicked?
It sounds like you are doing up to 26 queries, which will never be fast. Often a single db query can take at least 40 ms, due to network latency, establishing connection, etc. So, doing this 26 times means that it will take around 40 x 26 ms, or more than one second. Of course, it can take much longer depending on your schema, data set, hardware, etc., but this is a rule of thumb that gives you a rough idea of the impact of queries on overall page render time.
One way I deal with this kind of situation is to use a DataTable. Fetch all the records into the DataTable, and then you can iterate through the alphabet, and use the Select method to filter.
DataTable myData = GetMyData();
foreach (string letter in lettersOfTheAlphabet)
{
    DataRow[] rows = myData.Select(String.Format("Name LIKE '{0}%'", letter));
    // create your links from 'rows' here
}
Depending on your model layer you may wish to filter in a different way, but this is the basic idea that should improve the performance a lot.
Assuming you are querying to determine which letters are used, so that you know which links to render, you could actually just query for the letters themselves, like this:
select distinct substring(ManufacturerName, 1, 1) as FirstCharacter
from MyTable
order by 1
Get one result set from one query and split that up. There is quite a lot of overhead in going out to the database 26 times to do basically the same work!
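The split could look like this (a sketch; the Manufacturer type and its fields are placeholders for whatever your single query returns):

using System.Collections.Generic;
using System.Linq;

class Manufacturer
{
    public string Name; // assumed non-empty
    public string Url;  // the "different field" used to build the link
}

static class ManufacturerIndex
{
    // Sketch: one query, then group client-side by first letter.
    public static Dictionary<char, List<Manufacturer>> SplitByLetter(
        IEnumerable<Manufacturer> all)
    {
        return all
            .GroupBy(m => char.ToUpperInvariant(m.Name[0]))
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}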
You could probably do it smarter with a stored procedure. Let the SP return all the information you need in one call, and suddenly you only have one database interaction instead of 26...
Bring back all the items in one set (a DataSet, etc.), either through a stored procedure or a query, including the field left(col1,1), and sort by that field:
select left(col1,1) as LetterGroup, col1, url_column from table1 order by left(col1,1)
Then look through the whole resultset, changing sections when the letter changes.
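That loop could look roughly like this (a sketch; it uses the query above, assumes the connection is already open, and the HTML strings are stand-ins for your real markup):

using System.Data.SqlClient;
using System.Text;

static class LetterSections
{
    // Sketch: walk the letter-sorted result set once and start a new
    // section whenever LetterGroup changes.
    public static string BuildSections(SqlConnection conn)
    {
        var html = new StringBuilder();
        string currentLetter = null;
        using (var cmd = new SqlCommand(
            "select left(col1,1) as LetterGroup, col1, url_column " +
            "from table1 order by left(col1,1)", conn))
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                string letter = reader.GetString(0);
                if (letter != currentLetter) // letter changed: new section
                {
                    html.AppendFormat("<h2>{0}</h2>", letter);
                    currentLetter = letter;
                }
                html.AppendFormat("<a href=\"{0}\">{1}</a><br/>",
                    reader.GetString(2), reader.GetString(1));
            }
        }
        return html.ToString();
    }
}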
The first letter of the name sucks (sorry) as a discriminator. You do not need to split them, actually (you could just ask for "where name like 'a%'"), but whatever you run for that gives you on average only a 1/26 or so split of the names. Not extremely efficient.
What do you mean by "creating a link - using a different field in the Database"? This sounds like a bad design to me.
There are a couple of ways you can do this. 1) Create a view in your DB that has all the manufacturers and their website links and then continue to hit the view for each letter. 2) Select all the manufacturers once, store them in a .NET DataSet, and then use that DataSet to populate your links.
This seems dirty to me, but you could create a first-letter CHAR column and a trigger to populate it. Have the first letter of the manufacturer name stored in that column and index it. Then select * from table where FirstLetter = 'A'.
Or create a lookup table with rows A - Z and set up foreign key in the manufacturer table. Again you would probably need a trigger to update this information. Then you could inner join the lookup table to the manufacturer table.
Then instead of putting 26 datasets in the page, have a list of links (A-Z) which select and show each dataset one at a time.
If I read you right, you're making a query for every manufacturer to get the "different field" you need to construct the link. If so, that's your problem, not the 26 alphabetic queries (though those don't help either).
In a case like that, the faster way is this one query:
SELECT manufacturer_name, manufacturer_id, different_field
FROM manufacturers m
INNER JOIN different_field_table d
ON m.manufacturer_id = d.manufacturer_id
ORDER BY manufacturer_name
In your server code, loop through the records as usual. If you want, emit a heading when the first letter of the manufacturer_name changes.
For additional speed:
Put that in a stored procedure.
Index different_field_table on manufacturer_id.
I have a calendar that I generate. Currently it makes the entire month and fills each cell with a number(representing the date).
Now I want to grab values from a database and fill in the cells. How could I do this efficiently?
Right now I can only think of grabbing the data from the database, then going through that data with essentially 30-odd if statements to determine which cell each item should go into.
That just seems like a very bad way, so I am looking for better ones. I am wondering if anyone else has any ideas.
I am using ASP.NET MVC. I generate the body of the calendar (which is just a table) in my controller and pass it along as one string of HTML rows and cells.
So basically, in the controller, I generate all 6 rows of 7 cells (42 cells, with a couple of cells for the previous month and the remaining cells for the next month; it basically looks like the Windows 7 calendar) with the TagBuilder and return that as one big string.
So it is while building the cells that I would have to put the if statements to do the checking.
I am using LINQ to SQL, by the way, so I am not sure if that helps or not.
Edit
Another way I was thinking of, but am not sure how to do: somehow get all the dates in the range, then take those results and do some grouping on them. I am not sure how to do that kind of grouping, though. It probably would not be too bad if I do the grouping on the first result set rather than doing a request for each date and grouping that; otherwise I am looking at something like 42 requests to the database to group everything.
You're having to loop anyway, to build the rows and columns I assume, so why not pull the data down first, for that month, put the data into an array (old fashioned I know), and check the offset in that array as you increment through the cell rendering?
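A rough sketch of that idea, using a lookup keyed by day-of-month instead of a raw array (CalendarEvent and the query that produces monthEvents are placeholders for your LINQ to SQL entities):

using System;
using System.Collections.Generic;
using System.Linq;

class CalendarEvent
{
    public DateTime Date;
    public string Title;
}

static class CalendarBuilder
{
    // Sketch: one query for the month, grouped once by day, then an O(1)
    // lookup per cell while the 42 cells are rendered.
    public static void RenderMonth(int year, int month,
        IEnumerable<CalendarEvent> monthEvents)
    {
        // day-of-month -> events on that day
        Dictionary<int, List<CalendarEvent>> byDay = monthEvents
            .GroupBy(e => e.Date.Day)
            .ToDictionary(g => g.Key, g => g.ToList());

        int daysInMonth = DateTime.DaysInMonth(year, month);
        for (int day = 1; day <= daysInMonth; day++)
        {
            List<CalendarEvent> events;
            byDay.TryGetValue(day, out events); // events == null for empty days
            // ... build the cell for 'day', appending event Titles if any ...
        }
    }
}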