Find all table names in one query or SQL command - C#

I would like to find all the table names in my T-SQL command. The command can be a SELECT, UPDATE, DELETE, INSERT, MERGE or TRUNCATE.
I can use C#, but I don't really know how to find them, because there are a lot of possibilities.
For example, I might have a SELECT like the one below:
SELECT
<Schemaname>.<TableName1>.Field1,
<TableName2>.Field2,
Field3,
Field4 = ( Select .. FROM <TableName6> WHERE ... )
FROM
<TableName1> , <TableName2>
INNER JOIN
<TableName3> AS TableName4 ON .....
WHERE ....
<TableName2>.Field3 in ( SELECT ... FROM TableName5 )
The list I am looking for should contain these table names:
TableName1, TableName2, TableName3, TableName5, TableName6
TableName4 is an alias in this case and does not represent a real table in the database.
In my command, one table is referenced with its schema name and another without, and some tables have aliases; an alias can even look like the name of a table that really exists in my database.

Doing this purely with T-SQL is incredibly difficult, and by difficult I mean nearly impossible. It would take days and days to get a T-SQL script that even comes close to being accurate; there are just too many possibilities. Unless the table name you are looking for is so unique that you would feel comfortable simply searching your code for it, anything else is only going to get you close. Good luck!
This is impossible to get 100% correct and exhaustive. What happens if you have a function? And that function pulls data from a view? And that view pulls from other tables?
Even in your example above there are SO many possibilities. You would be better off creating this as a stored procedure and then looking at sysdepends (or sys.sql_dependencies).
But even that isn't exhaustive, as it only goes as deep as this query, not the dependencies of the other objects being referenced.

Not too sure I really understand what you are trying to achieve, but here are my 2 cents.
Assuming that you can extract the SQL command from your many SSIS packages using some code (can't help if you need THAT code), I would:
Use a reference list of all the database objects and search for each of them through your SqlCommand text (extracted from your SSIS package); a rough C# sketch of this matching step follows after these steps.
Build a list of all the name strings found (a list of objects, which could be tables, views, functions or stored procedures).
Then use that list to find the dependent objects in your DB (if you need to go that deep):
SELECT DISTINCT s1.class_desc, s1.object_id, s1.referenced_major_id,
    OBJECT_NAME(s1.object_id) AS ObjName, OBJECT_NAME(s2.object_id) AS ObjName2
FROM sys.sql_dependencies s1
INNER JOIN sys.objects s2 ON s2.object_id = s1.referenced_major_id
Put all of that into your result table, then move on to the next SSIS package.
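A minimal C# sketch of that first matching step, assuming you already have the SQL text extracted from the package; the connection string and the object-type filter are assumptions, and plain word matching like this will still report aliases that happen to share a real object's name:
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text.RegularExpressions;

static List<string> FindReferencedObjects(string sqlCommandText, string connectionString)
{
    // Build the reference list of object names (tables, views, functions, procedures).
    var objectNames = new List<string>();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT name FROM sys.objects WHERE type IN ('U','V','FN','IF','TF','P')", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                objectNames.Add(reader.GetString(0));
    }

    // Search the extracted command text for each name as a whole word.
    var found = new List<string>();
    foreach (var name in objectNames)
        if (Regex.IsMatch(sqlCommandText, @"\b" + Regex.Escape(name) + @"\b", RegexOptions.IgnoreCase))
            found.Add(name);

    return found;
}
This only answers "which known objects appear in the text", which carries the same caveat raised above: aliases and string literals can produce false positives.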
Would that work for you?
B


Entity Framework - how can I optimize “Contains” statement?

In our current application we have performance issues with some of our queries. Usually we have something like:
List<int> idList = some data here…;
var query = (from a in someTable where idList.Contains(a.Id) select a);
While this is acceptable for simple queries, it becomes a bottleneck when there are more items in idList (in some queries we have about 700 IDs to check, for example).
Is there any way to use something other than Contains? We are thinking of using temporary tables to first insert the IDs and then execute a join instead of Contains, but it would seem Entity Framework does not support such operations (creating temporary tables in code) :(
What else can we try?
I suggest using LINQPad; it offers a "Transform to SQL" option which lets you see your query in SQL syntax.
There is a chance that what you already have is the optimal solution (if you're not into messy stuff).
You might also try holding idList as a sorted array and replacing the Contains call with a binary search (you can implement your own extension).
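If the filtering ends up happening in memory anyway (LINQ to Objects rather than a translated SQL IN), a rough sketch of the sorted-array idea, reusing the question's idList and someTable; note that forcing the set into memory with AsEnumerable() is only sensible when the data is small or already materialized:
// using System; using System.Linq;
int[] sortedIds = idList.OrderBy(id => id).ToArray();    // sort once up front

var query = someTable
    .AsEnumerable()                                      // switches to LINQ to Objects (in-memory only)
    .Where(a => Array.BinarySearch(sortedIds, a.Id) >= 0);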
You can try this:
var query = someTable.Where(a => idList.Any(id => id == a.Id));
If you don't mind having a physical table you could use a semi-temporary table. The basic idea is:
Create a physical table with a "query id" column
Generate a unique ID (not random, but unique)
Insert data into the table tagging the records with the query ID
Pass the query id to the main query, using it to join to the link table
Once the query is complete, delete the temporary records
At worst if something goes wrong you will have orphaned records in the link table (which is why you use a unique query ID).
It's not the cleanest solution but it will be faster than using Contains if you have a lot of values to check against.
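A rough ADO.NET sketch of that idea; the link table (dbo.QueryIds), its columns and SomeTable are all made-up names for illustration:
// One-time setup (T-SQL): CREATE TABLE dbo.QueryIds (QueryId UNIQUEIDENTIFIER NOT NULL, Value INT NOT NULL);
Guid queryId = Guid.NewGuid();                           // the unique tag for this batch of IDs

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1. Insert the IDs, tagged with the query id (batch this with SqlBulkCopy in real code).
    foreach (var id in idList)
    using (var insert = new SqlCommand("INSERT INTO dbo.QueryIds (QueryId, Value) VALUES (@q, @v)", conn))
    {
        insert.Parameters.AddWithValue("@q", queryId);
        insert.Parameters.AddWithValue("@v", id);
        insert.ExecuteNonQuery();
    }

    // 2. Run the main query joined to the link table instead of using Contains.
    using (var select = new SqlCommand(
        "SELECT t.* FROM SomeTable t INNER JOIN dbo.QueryIds q ON q.Value = t.Id AND q.QueryId = @q", conn))
    {
        select.Parameters.AddWithValue("@q", queryId);
        using (var reader = select.ExecuteReader())
            while (reader.Read()) { /* map rows */ }
    }

    // 3. Delete the temporary records once the query is complete.
    using (var cleanup = new SqlCommand("DELETE FROM dbo.QueryIds WHERE QueryId = @q", conn))
    {
        cleanup.Parameters.AddWithValue("@q", queryId);
        cleanup.ExecuteNonQuery();
    }
}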
When Entity Framework starts being a performance bottleneck, generally it's time to write actual SQL.
So what you could do, for example, is build a table-valued function that takes a table-valued parameter (your list of IDs). The function would just return the result of your JOIN.
The table-valued function feature requires EF5, so it might not be an option if you're really stuck with EF4.
The idea is to refactor your queries to get rid of idList altogether.
For example, suppose you need to return the list of orders placed by male users aged 18-25 from France. If you filter the Users table by age, sex and country to get an idList of users, you end up with 700+ IDs. Instead, join the Orders table with Users and apply the filters to the Users table. That way you don't have two requests (one for the IDs and one for the orders), and it works much faster because the join can use indexes (a rough sketch follows below).
Makes sense?
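A hedged LINQ sketch of that refactoring; db, Orders, Users and the property names are invented to match the example:
// Before: two steps - materialize 700+ user IDs, then Contains() over them.
// var idList = db.Users
//     .Where(u => u.Sex == "M" && u.Age >= 18 && u.Age <= 25 && u.Country == "France")
//     .Select(u => u.Id).ToList();
// var orders = db.Orders.Where(o => idList.Contains(o.UserId)).ToList();

// After: one query - the filter sits inside the join, runs entirely on the server
// and can use the indexes on Users.
var orders = (from o in db.Orders
              join u in db.Users on o.UserId equals u.Id
              where u.Sex == "M" && u.Age >= 18 && u.Age <= 25 && u.Country == "France"
              select o).ToList();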

Best way to check if multiple records exist in database

I am creating an application that takes data from a text file containing sales data from the Amazon marketplace. The marketplace has items with different names compared to the data in our main database. The application accepts the text file as input and needs to check whether each item exists in our database. If an item is not present, I should offer the option to save it to a Master table, or to a Sub item table mapped to a master item. My question is: if the text file has 100+ items, should I hit the database each time to check whether the data exists there? Is there a better way to do this so that we can minimize the database hits?
I have two options that I have used earlier:
Hit the database and check if the item exists in the table.
Fill the data into a DataTable and use DataTable.Select to check if it exists.
Can someone tell me the best way to do this? I have to check two tables (the master table and the sub-item table), maybe one at a time. Thanks.
Update:
Downvoters, please add a comment.
I am not asking how to check whether an item exists in the database; I want to know the best way of doing it. Should I be hitting the database 1,000 times if a file has 1,000 items? That's my question.
The current query I use:
if exists (select * from [table] where itemname= [itemname] )
select 'True'
else
select 'False'
return
(From Chat)
I would create a stored procedure which takes a table-valued parameter containing all the items that you want to check. You can then use a join (a couple of options here)* to return a result set of items and whether each one exists or not. You can pass TVPs from ADO.NET; a sketch follows below.
It will certainly handle the 100 to 1,000 row range mentioned in your post. To be honest, I haven't used it in the 1M+ range.
In newer versions of SQL Server I would prefer TVPs over an XML input parameter, as it is really quite cumbersome to pack the XML in your .NET code and then unpack it again in your stored procedure.
(*) Re joins: with the result set, you can either inner join the TVP to your items/product table and check in .NET for the rows that don't come back, or you can do a left outer join with the TVP as the left table and, for example, ISNULL() missing items to 0/'false'.
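A sketch of what that round trip could look like; the type, procedure, table and column names (ItemNameList, CheckItemsExist, MasterTable, ItemName) are placeholders:
// Assumed one-time T-SQL setup:
//   CREATE TYPE dbo.ItemNameList AS TABLE (ItemName NVARCHAR(200) NOT NULL);
//   CREATE PROCEDURE dbo.CheckItemsExist @Items dbo.ItemNameList READONLY AS
//   SELECT i.ItemName,
//          CASE WHEN m.ItemName IS NULL THEN 0 ELSE 1 END AS ExistsInMaster
//   FROM @Items i
//   LEFT OUTER JOIN MasterTable m ON m.ItemName = i.ItemName;

// .NET side: pack every item from the file into one DataTable and make a single call.
// using System; using System.Data; using System.Data.SqlClient;
var tvp = new DataTable();
tvp.Columns.Add("ItemName", typeof(string));
foreach (var item in itemsFromFile)                      // itemsFromFile: the parsed text file
    tvp.Rows.Add(item);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.CheckItemsExist", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.AddWithValue("@Items", tvp);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.ItemNameList";

    conn.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            Console.WriteLine("{0} exists: {1}", reader.GetString(0), reader.GetInt32(1) == 1);
}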
Send the items to the database in batches of, say, 100. A stored procedure will probably help, since repetitive queries have to be fired. If the data does not change frequently, you can also consider caching. I assume you will be making service calls from your .NET application, so send the data (for example as XML) to the back end in batches, and consider increasing the batch size based on the file size.
If your entire application is local, the batch size can be much higher, as there is no network overhead, but still don't make 100 separate calls to the DB.
Try an EXISTS check wrapped in a CASE, so the query returns a simple flag:
SELECT CASE WHEN EXISTS (SELECT 1 FROM table1 WHERE itemname = @itemname) THEN 1 ELSE 0 END

Using Dapper to populate objects from a T-SQL View

I'm trying out Dapper for my data access (in ASP.NET MVC3, FWIW). I have a T-SQL view (in SQL Server) which is something like this:
SELECT s.*, c.CompanyId AS BreakPoint, c.Name AS CompanyName
FROM tblStaff AS s
INNER JOIN tblCompanies AS c ON c.CompanyId = s.CompanyId
So pretty simple: essentially a list of staff, each of whom has a single company.
The problem I'm having is that I'm trying to map the output of this query onto my POCOs, but because each field in the view has to be unique (i.e. CompanyName instead of Name, which already exists in tblStaff), the mapping to POCOs isn't working.
Here's the code:
var sql = @"select * from qryStaff";
var people = _db.Query<Person, Company, Person>(sql, (person, company) => {person.Company = company; return person;}, splitOn: "BreakPoint");
Any advice how I might solve this puzzle? I'm open to changing the way I do views as right now I'm stumped about how to progress.
You should explicitly list all the fields returned from your view (no asterisks!) and, where the field names are not unique, make use of aliases to deduplicate. As an example:
SELECT
s.CompanyName as CompanyName1,
s.BreakPoint as BreakPoint1,
...
c.CompanyId AS BreakPoint,
c.Name AS CompanyName
FROM tblStaff AS s
INNER JOIN tblCompanies AS c ON c.CompanyId = s.CompanyId
The fields listed and the aliases you might use depend, of course, entirely on your code. Typically you adjust the aliases in your query to match the property names of the POCO.
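For example (the POCO shapes here are guesses at what the question's classes might look like), the view could expose explicit, deduplicated column names and the Dapper call could split on CompanyId instead of an alias:
public class Company
{
    public int CompanyId { get; set; }
    public string CompanyName { get; set; }
}

public class Person
{
    public int StaffId { get; set; }
    public string Name { get; set; }
    public Company Company { get; set; }
}

// qryStaff now lists its columns explicitly, e.g.:
//   SELECT s.StaffId, s.Name,
//          c.CompanyId, c.Name AS CompanyName
//   FROM tblStaff AS s
//   INNER JOIN tblCompanies AS c ON c.CompanyId = s.CompanyId

var people = _db.Query<Person, Company, Person>(
    "select * from qryStaff",
    (person, company) => { person.Company = company; return person; },
    splitOn: "CompanyId");                               // Dapper starts mapping Company at this column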
Also, as a general rule of thumb, it's good to stay away from wildcards in SQL queries precisely because they introduce issues like this. Here's a decent article on SQL query best practices.
Excerpt:
Using explicit column names in your SELECT statements within your code has a number of advantages. First, SQL Server returns only the data your application needs, not a bunch of additional data that your application will not use. By returning only the data you need, you minimize the amount of work SQL Server has to do to gather all the columns of information you require. By not using the asterisk (*) nomenclature you also minimize the amount of network traffic (number of bytes) required to send the data associated with your SELECT statement to your application.
Additionally, by explicitly naming your columns you insulate your application from potential failures related to schema changes in any table referenced by your SELECT statement. If you were to use the asterisk (*) nomenclature and someone added a new column to a table, your application would start receiving data for that additional column without any change to your application code. If your application expected only a specific number of columns to be returned, it would fail as soon as someone added an additional column to one of your referenced tables. By explicitly naming the columns in your SELECT statement, your application will always get the same number of columns back, even if someone adds a new column to any of the tables it references.

Is there any way to get the table hierarchy from a connection string in c#?

I have a requirement to determine the table hierarchy from a SQL statement within C#. For example, consider the following SQL statement:
Select Table1.*, Table2.* from Table1
left join table2 on Table1.parentCol = Table2.childCol
That might return 7 columns: 3 for Table1 and 4 for Table2. I need to know the column names and, ideally (though it is not mandatory), their types.
I have no control over what SQL statement will be used, as this is a user-entered field. In C# it's a very basic task to open a connection and create a SqlCommand using that statement. I am free to run the SQL through a SqlDataReader, or any other System.Data.SqlClient class if necessary, but I cannot find any combination that returns the columns themselves rather than the actual column values.
Is anyone able to help?
Many thanks and best regards
You cannot do what you are asking (easily).
More to the point, do not let users enter arbitrary T-SQL (you will regret it at some point...).
Instead, create a 'Search' form that allows entering various parameters and use a parameterised query against a view that joins all the tables/columns required.
There's no direct way. You'll need to parse the names of all the tables from the SQL query.
Once you have done that, you'll need to write a few queries against INFORMATION_SCHEMA to get the raw data you are looking for.
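For the second step, once you have parsed out a table name, something like this returns its column names and types (the connection string is assumed):
// using System; using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS " +
    "WHERE TABLE_NAME = @table ORDER BY ORDINAL_POSITION", conn))
{
    cmd.Parameters.AddWithValue("@table", "Table1");     // one of the parsed table names
    conn.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            Console.WriteLine("{0} ({1})", reader.GetString(0), reader.GetString(1));
}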
If you are on SQL Server, you may also want to use the catalog views, for example:
SELECT * FROM sys.tables WHERE [name] = 'MyTable'

How to read the result of SELECT * from joined tables with duplicate column names in .NET

I am a PHP/MySQL developer slowly venturing into the realm of C#/SQL Server, and I am having a problem in C# when it comes to reading a SQL Server query that joins two tables.
Given the two tables:
TableA:
int:id
VARCHAR(50):name
int:b_id
TableB:
int:id
VARCHAR(50):name
And given the query
SELECT * FROM TableA,TableB WHERE TableA.b_id = TableB.id;
Now in C# I normally read query data in the following fashion:
SqlDataReader data_reader= sql_command.ExecuteReader();
data_reader["Field"];
Except in this case I need to differentiate from TableA's name column, and TableB's name column.
In PHP I would simply ask for the field "TableA.name" or "TableB.name" accordingly but when I try something like
data_reader["TableB.name"];
in C#, my code errors out.
How can I fix this? And how can I read a query over multiple tables in C#?
The result set only sees the returned data/column names, not the underlying table. Change your query to something like
SELECT TableA.Name as Name_TA, TableB.Name as Name_TB from ...
Then you can refer to the fields like this:
data_reader["Name_TA"];
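A short sketch of the full round trip with those aliases, using the two tables from the question (the connection string is assumed):
// using System.Data.SqlClient;
var sql = "SELECT TableA.id, TableA.name AS Name_TA, TableA.b_id, " +
          "TableB.id AS B_Id, TableB.name AS Name_TB " +
          "FROM TableA INNER JOIN TableB ON TableA.b_id = TableB.id";

using (var conn = new SqlConnection(connectionString))
using (var sql_command = new SqlCommand(sql, conn))
{
    conn.Open();
    using (SqlDataReader data_reader = sql_command.ExecuteReader())
    {
        while (data_reader.Read())
        {
            string nameA = (string)data_reader["Name_TA"];   // TableA.name
            string nameB = (string)data_reader["Name_TB"];   // TableB.name
        }
    }
}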
To those posting that it is wrong to use SELECT *: I strongly disagree. There are many real-world cases where SELECT * is necessary, and absolute statements about its "wrong" use may lead someone astray from what is a legitimate solution.
The problem here does not lie with the use of SELECT *, but with a constraint in ADO.NET.
As the OP points out, in PHP you can index a data row via the "TABLE.COLUMN" syntax, which is also how raw SQL handles column name conflicts:
SELECT table1.ID, table2.ID FROM table1, table2;
Why DataReader is not implemented this way I do not know...
That said, a workable solution is to build your SQL statement dynamically by:
querying the schema of the tables you're selecting from, and
building your SELECT clause by iterating through the column names in that schema.
In this way you can build a query like the following without having to know in advance which columns currently exist in the tables you're selecting from (a sketch follows the example):
SELECT TableA.Name as Name_TA, TableB.Name as Name_TB from ...
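A sketch of that dynamic approach using ADO.NET's schema API; the TableName_ColumnName alias scheme is just one possible convention:
// using System.Collections.Generic; using System.Data; using System.Data.SqlClient;
// conn is assumed to be an already-open SqlConnection.
static string BuildSelect(SqlConnection conn, params string[] tables)
{
    var columns = new List<string>();
    foreach (var table in tables)
    {
        // Restrictions for the "Columns" collection: catalog, owner, table, column.
        DataTable schema = conn.GetSchema("Columns", new string[] { null, null, table, null });
        foreach (DataRow row in schema.Rows)
        {
            string column = (string)row["COLUMN_NAME"];
            columns.Add(string.Format("{0}.{1} AS {0}_{1}", table, column));
        }
    }
    return "SELECT " + string.Join(", ", columns);
}

// BuildSelect(conn, "TableA", "TableB") produces something like:
//   SELECT TableA.id AS TableA_id, TableA.name AS TableA_name, ..., TableB.name AS TableB_name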
You could try reading the values by index (a number) rather than by key.
name = data_reader[4];
You will have to experiment to see how the numbers correspond.
Welcome to the real world. In the real world, we don't use "SELECT *". Specify which columns you want, from which tables, and with which alias, if required.
Although it is better to use a column list to avoid duplicate columns, if for any reason you want to keep SELECT *, then just use
rdr["duplicate_column_name"]
This returns the first matching column's value; when the duplicated column is the join key, the inner join guarantees both copies hold the same value, so this accomplishes the task.
Ideally you should never have duplicate column names across a database schema, so if you can, rename things so there are no conflicting names.
That rule exists for this very situation: once you've done your join, the result is just a new recordset, and in general the table names do not travel with it.
