C# script to parse stored procedures and extract metadata


I have around 500 stored procedures that are used for our ETL process. I have been asked to identify all the source and target tables used by each stored procedure. So, a stored procedure could have a connection to an Oracle linked server, or another SQL Server. It could also be using an OPENQUERY to extract data from our transactional systems.
Since I have some basic .NET/C# programming chops, I was hoping to leverage the .NET Regex class to get started. However, I am looking for suggestions on how I should approach this. I'd rather not reinvent the wheel if someone already has a solution for this.
For context, we are working on implementing PowerDesigner as our metadata repository. So, we are looking to extract metadata from our BI reports (mapping reports to their source tables/views) and from our Informatica and T-SQL ETL scripts.
Thanks

I'd suggest a dual approach. Firstly, I'd avoid using regex for something as complex as SQL query parsing, especially since there are tools in place for this kind of thing.
https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.dependencywalker.aspx
The SMO library exposes a class that will let you connect to a server and retrieve a dependency tree for a given stored procedure. How to do this exactly is left as an exercise for the reader :)
However, this class won't pick up dependencies that are introduced via dynamic SQL or through OPENQUERY. If the number of procedures that do this is small, I'd recommend handling them manually and then merging the results. You could use the SMO scripting capabilities to pick up all instances of either OPENQUERY or EXEC/sp_executesql; at least then you would have an idea of 'suspect' pieces of code.
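A minimal sketch of that 'suspect code' scan in C#, assuming you already have each procedure's text as a string (for example from SMO scripting, or from the definition column of sys.sql_modules). The class and method names are made up for illustration, and the regex is a heuristic rather than a parser: it strips comments, then flags OPENQUERY, sp_executesql and EXEC(...)-style dynamic execution.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Heuristic scanner (hypothetical, not part of SMO): flags procedure text that
// uses constructs DependencyWalker cannot resolve, so it can be reviewed by hand.
public static class DynamicSqlScanner
{
    // OPENQUERY, sp_executesql, or EXEC/EXECUTE followed by "(" as in EXEC(@sql).
    // A plain "EXEC dbo.SomeProc" is not flagged, since static calls are tracked anyway.
    private static readonly Regex Suspect = new Regex(
        @"\bOPENQUERY\b|\bsp_executesql\b|\bEXEC(UTE)?\s*\(",
        RegexOptions.IgnoreCase);

    // Removes -- line comments and /* */ block comments so commented-out code is
    // skipped. Naive: a "--" inside a string literal is treated as a comment too,
    // and nested block comments are not understood.
    private static readonly Regex Comments = new Regex(
        @"--[^\r\n]*|/\*.*?\*/",
        RegexOptions.Singleline);

    // Returns the suspect keywords found, upper-cased, in order of appearance.
    public static List<string> FindSuspects(string procedureText)
    {
        string code = Comments.Replace(procedureText, " ");
        var hits = new List<string>();
        foreach (Match m in Suspect.Matches(code))
            hits.Add(m.Value.ToUpperInvariant());
        return hits;
    }
}
```

Anything it reports goes on the manual-review list. It will miss some patterns (for example EXEC @procName), so treat its output as a starting point rather than a complete inventory.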
Merging the results will be tricky. Not only do you have to manually update dependencies for procedures containing dynamic dependencies, but you have to update procedures that depend on procedures containing dynamic dependencies.
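The propagation step described in that paragraph is a reverse-reachability problem. A minimal sketch, assuming the dependencies have already been collected into a dictionary of procedure name to directly referenced object names (that shape and the names below are hypothetical): reverse the edges, then walk upwards from every procedure known to use dynamic SQL or OPENQUERY.

```csharp
using System;
using System.Collections.Generic;

public static class DependencyMerge
{
    // dependsOn: procedure -> objects it references directly (e.g. as reported by
    // DependencyWalker or sys.dm_sql_referenced_entities).
    // dynamicProcs: procedures found to use dynamic SQL or OPENQUERY.
    // Returns every procedure whose automatic dependency list is incomplete: the
    // dynamic ones plus everything that calls them, directly or indirectly.
    public static ISet<string> ProceduresNeedingReview(
        IDictionary<string, List<string>> dependsOn,
        IEnumerable<string> dynamicProcs)
    {
        // Reverse the edges: referenced object -> procedures that reference it.
        var callers = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (var entry in dependsOn)
        {
            foreach (var target in entry.Value)
            {
                if (!callers.TryGetValue(target, out var list))
                    callers[target] = list = new List<string>();
                list.Add(entry.Key);
            }
        }

        // Breadth-first walk up the call graph, starting from the dynamic procedures.
        var result = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var queue = new Queue<string>();
        foreach (var p in dynamicProcs)
            if (result.Add(p)) queue.Enqueue(p);

        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            if (!callers.TryGetValue(current, out var direct)) continue;
            foreach (var caller in direct)
                if (result.Add(caller)) queue.Enqueue(caller);
        }
        return result;
    }
}
```

The HashSet doubles as the visited set, so cycles in the call graph cannot cause an infinite loop.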

You can use the dynamic management function sys.dm_sql_referenced_entities to get some dependency information from SQL Server itself, but there are some limitations. I'm not sure whether the Dependency Walker leverages this DMV, but the pros and cons are very similar.
The main limitation that I know of, and have experienced, is that you won't get any dependency information for an object that is referenced through dynamic SQL. Our usage of dynamic SQL is very contained, so I feel pretty confident leveraging this DMV and manually accounting for the objects hit by those specific procs.
We don't use linked servers, but my understanding is that those would show up in this DMV. I don't know about OPENQUERY: I did a little research but did not test it, and I am guessing those references would not be surfaced by this DMV. As the previous poster said, you may need a two-pronged approach to get everything you're looking for.
And just for reference, a simple example of using that DMV:
SELECT DISTINCT
[database] = COALESCE(r.referenced_database_name, DB_NAME())
, [schema] = r.referenced_schema_name
, name = r.referenced_entity_name
, r.referenced_id
FROM sys.dm_sql_referenced_entities('dbo.procName_sp', 'OBJECT') AS r
WHERE r.referenced_id IS NOT NULL;

I wouldn't use C# for this. However, maybe something like this will do the job.
SELECT *
FROM DatabaseName.INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE';

SELECT name, type
FROM dbo.sysobjects
WHERE type IN (
    'P',  -- stored procedures
    'FN', -- scalar functions
    'IF', -- inline table-valued functions
    'TF'  -- multi-statement table-valued functions
)
ORDER BY type, name;
Or, if you want SProcs and parameters:
select * from information_schema.parameters
Finally, this link looks pretty helpful for your situation.
http://blog.sqlauthority.com/2010/02/04/sql-server-get-the-list-of-object-dependencies-sp_depends-and-information_schema-routines-and-sys-dm_sql_referencing_entities/

Related

SQL Server class in C# for all requests?

I'm looking for a class for SQL Server. I need to do inserts, updates, deletes, selects (retrieving many rows and columns) and execute stored procedures.
I didn't find a sample of this sort of class and I didn't want to reinvent the wheel.
Can somebody give me one?

You sound like you may be looking for an ORM (Object Relational Mapper). There are a great number available, some built right into the .NET Framework itself. Look at the various websites and see if you can find one that fits your needs.

There's not a single class that does this, but instead a set of a few classes you need to know:
SQL Server specific:
System.Data.SqlClient.SqlConnection
System.Data.SqlClient.SqlCommand
System.Data.SqlClient.SqlDataReader
System.Data.SqlClient.SqlDataAdapter
System.Data.SqlClient.SqlParameter
System.Data.SqlDbType (enum)
Used by all database types:
System.Data.DataTable
System.Data.DataSet
There are others as well, but these are the main ones. Together, these make up the ADO.NET API and the SQL Server provider for the ADO.NET API.
Additionally, there are a number of Object Relational Mappers that build on top of ADO.NET to try to make this easier. Entity Framework, LINQ to SQL, and NHibernate are a few of the more common options. One common characteristic of ORMs is that they try to free you from even needing to know SQL. If you want to write your own SELECT/INSERT/UPDATE/DELETE queries, which it sounds like you do, you should start at the native ADO.NET level.
To put your data access in one object, create your own class that makes use of these other types. Don't try to build a new public method that accepts a SQL string. Build individual methods for each query you will want to run, with the needed SQL as part of the method, and have those methods use these types to change or return data.

You might be interested in this tutorial.
There is built-in functionality (System.Data.SqlClient) for simple access to SQL Server.

There is no single class that can do everything you need. Whatever you choose, you will necessarily need to deal with multiple classes.
Look at it this way: in order to get data from SQL Server you typically need to do the following:
Open connection
Create SQL query
Execute SQL query
Accept results
Close connection
Putting all this functionality into a single class would make the class way too complex.
Here is some good reading material for what you need:
Beginners guide to accessing SQL Server through C#

Where to put the SQL logic

I have an existing SQL Server database whose structure I can't really change, although I can add stored procedures or new tables if I want. I have to write a stand-alone program to access the DB, process the data and produce some reports. I've chosen C# and Visual Studio as we're pretty much an MS shop.
I've made a start at exploring using VS 2008 to create said program. I'm trying to decide where to put some of the SQL logic. My primary aims are to keep the development as simple as possible and for the program to perform quickly.
Should I put the SQL logic into a stored procedure and simply call the stored procedure and have SQL Server do the grunt work and hand me the results? Or am I better off keeping the SQL query in my code, creating the corresponding command and executing it against the SQL Server?
I have a feeling the former might perform better, but I've then got to manage the stored procedure separately to the rest of my code base, don't I?
UPDATE: It's been pointed out the performance should be the same if it's the same SQL code in a C# program or a stored procedure. If this is the case, which is the easiest to maintain?
2009-10-02: I had to really think about which answer to select. At the time of writing, there were 8 answers, basically split 5-3 in favour of putting the SQL logic in the application. On the other hand, there were 11 up-votes, split 9-2 in favour of putting the SQL logic in stored procedures (along with a couple of warnings about going this way). So I'm torn. In the end I'm going with the up-votes. However, if I run into trouble I'm going to come back and change my selected answer :)

If it is heavy data manipulation, keep it in the database in stored procedures. If the queries might change, the database is the better place too; otherwise a redeploy might be required for each change.

Keeping the mainstay of the work in stored procedures has the advantage of flexibility - I find it easier to modify a procedure than implement a program change. Unfortunately flexibility is a double-edged sword; it's much easier to make an ill-advised change as well.

I suggest taking a look at LINQ to Entities, which provides an Object Relational Mapping wrapper around any SQL statements (CRUD), abstracting away the logic needed to write to the database and allowing you to write OO code instead of using SqlConnections and SqlCommands.
OO code (the save method does not exist but you get the gist of it):
// this adds a new car to the Car table in SQL, without using ANY SQL code
Car car = new Car();
car.BrandName = "Audi";
car.Save(); // Save is called something else and lives on the
            // data context the car is in, but for brevity's sake...
SQL code as string in SqlCommand:
// open a SQL connection in your app and
// create a command that inserts the car
using (SqlConnection conn = new SqlConnection(connstring))
using (SqlCommand comm = new SqlCommand("INSERT INTO CAR...", conn))
{
    conn.Open();
    comm.ExecuteNonQuery(); // execute
}

Versioning and maintaining stored procedures is a nightmare. If you don't hit serious performance issues (ones you think stored procedures would resolve), I think it will be better to implement the logic in your C# code (LINQ, SubSonic or anything like that).

With regard to your point concerning performance variation between embedding your code in .NET source or within SQL Server stored procedures, you should actually see no difference between the two methods!
This is because the same execution plan will be generated by SQL server, provided the data access T-SQL within the two different sources is the same.
You can see this in action by running a SQL Server Profiler trace and comparing the execution plans that are generated by the two different T-SQL query sources.
In light of this, and back to the main point of your question, your choice of implementation should be determined by ease of development and your future extensibility requirements. As you appear to be the only person working on the project, go with what you prefer, which I suspect is to keep the code centralised, i.e. within a Visual Studio Data Access Layer (DAL).
Stored Procedures can come into their own however when you have separate development functions within your organisation/team. For example, you may have database developers on your team who can create your data access code for you and do so independently of the application, freeing you to work on other code modules.

Update deployment: if you need to update the procedure, you can update a stored procedure without your users even knowing and without taking the server offline. Updating the C# means pushing out a new EXE to all your users!

Have a look at EntitySpaces. It's a code generation tool, but it'll do more.
There's a small amount of legwork to do in learning the tool, but once you're up and running you'll never look back. It saves hours of work. (I don't work for them, BTW!)

Should I put the SQL logic into a stored procedure
Well that depends on what the “SQL logic” is, doesn't it? If it's purely database-related, a stored procedure might be most appropriate. If it's ‘business logic’, the rules that decide how your application operates, it definitely belongs in your application.
which is the easiest to maintain?
Personally I find application-side code easier as modern languages like C# have much more expressive power than SQL. Doing any significant processing in T-SQL quickly becomes tedious and difficult to read.

How to retrieve the body of an Oracle procedure or function

What I'd like to be able to do is retrieve the schema information for subprograms, functions, package specifications and package bodies from an Oracle 9i database so that I can present them to the user in a C# client using the classes in the System.Data.OracleClient namespace.
So far, I've been able to display the high level schema data far faster than Java applications can, but the packages and functions are beyond my grasp. I can show the columns, their types, the indexes, table- and column level comments, and all sorts of really useful information in really useful ways. Now, if I could just get to the procedures.

Query the data dictionary view ALL_SOURCE: http://download.oracle.com/docs/cd/B10501_01/server.920/a96536/ch2124.htm#1300946

Does this help? It's not clear whether you wanted to get this via System.Data.OracleClient or via SQL.
SELECT TEXT
FROM ALL_SOURCE
WHERE NAME = <proc_name>
AND OWNER = <schema>
ORDER BY TYPE, LINE
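Whichever way the rows are fetched, ALL_SOURCE returns one row per source line, so the client has to stitch the text back together. A rough sketch of that step, assuming the TYPE, LINE and TEXT columns have already been read into a list (the SourceLine record and OracleSource class are invented for the example). Grouping by TYPE matters because a package specification and its body share the same NAME, and sorting by LINE on the client means the result does not depend on the row order the query happens to return.

```csharp
using System.Collections.Generic;
using System.Linq;

// One ALL_SOURCE row: TYPE ("PACKAGE", "PACKAGE BODY", "PROCEDURE", ...),
// the 1-based LINE number, and the TEXT of that line.
public record SourceLine(string Type, int Line, string Text);

public static class OracleSource
{
    // Groups rows by TYPE so a package spec and body stay separate, sorts each
    // group by LINE, and joins the lines with "\n" after trimming any line break
    // already present at the end of TEXT.
    public static Dictionary<string, string> Reassemble(IEnumerable<SourceLine> rows) =>
        rows.GroupBy(r => r.Type)
            .ToDictionary(
                g => g.Key,
                g => string.Join("\n", g.OrderBy(r => r.Line)
                                        .Select(r => r.Text.TrimEnd('\r', '\n'))));
}
```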

Calling stored procedures

I have a C# application that interfaces with the database only through stored procedures. I have tried various techniques for calling stored procedures. At the root is the SqlCommand class; however, I would like to achieve several things:
make the interface between C# and SQL smoother, so that procedure calls look more like C# function calls
have an easy way to determine whether a given stored procedure is called anywhere in code
make the creation of a procedure call quick and easy
I have explored various avenues. In one, I had a project whose namespace structure mirrored the name structure of the stored procedures; that way I could generate the name of the stored procedure from the name of the class, and I could tell whether a given stored procedure was in use by finding it in the namespace tree. What are some other experiences?
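A minimal sketch of the namespace-mirroring convention described in the question, assuming (hypothetically) that each procedure is represented by a class whose namespace ends with the schema name, so that Procs.dbo.GetOrg stands for dbo.GetOrg. Reflection then yields the procedure name, so no string literal is needed at the call site.

```csharp
using System;

namespace Procs.dbo
{
    // Hypothetical class standing in for the stored procedure dbo.GetOrg.
    public sealed class GetOrg { }
}

public static class ProcNames
{
    // Maps a class to "schema.procedure": the last namespace segment is taken as
    // the schema and the class name as the procedure, e.g. Procs.dbo.GetOrg -> dbo.GetOrg.
    public static string For(Type procClass)
    {
        string ns = procClass.Namespace
            ?? throw new ArgumentException("The type must be declared in a namespace.");
        string schema = ns.Substring(ns.LastIndexOf('.') + 1);
        return schema + "." + procClass.Name;
    }
}
```

Checking whether a procedure is used then becomes a search for references to its class (for example with Find All References in Visual Studio) rather than a text search for the procedure name.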

You should try LINQ to SQL.

When stored procedures are the interface to the database, I tend to wrap them in classes which reflect the problem domain, so that most of the application code is using these objects and not calling stored procedures, and not even knowing about the stored procedures or the database connection. The application objects typically play amongst themselves.
I think it's a mistake to mirror the SPs in your application, as, typically, your relational model is not 1-1 with your application domain object model.
For example, typically I do not have application objects which represent link tables or other artifacts of database design and normalization. Those are collections of objects either contained in or returned by other objects.
A lot is made of the impedance mismatch, but I think it's horses for courses - let databases do what they are good at and OO models do what they are good at.

Have you looked into using the Enterprise Library from MS? It allows you to easily call stored procedures. I generally set up a class per database that is only for calling these stored procs. You can then have something similar to this (sorry, it's VB.NET and not C#):
Public Shared Function GetOrg(ByVal OrgID As Integer) As System.Data.DataSet
Return db.ExecuteDataSet("dbo.cp_GetOrg", OrgID)
End Function
Where db is defined as:
Dim db As Microsoft.Practices.EnterpriseLibrary.Data.Database = DatabaseFactory.CreateDatabase()
You then have this one function that is used to call the stored procedure. You can then search your code for this one function.

When building my current product, one of the tools that I very much wanted to implement was a database class (like DatabaseFactory - only I didn't care for that one) that would simplify my development and remove some of the "gotchas." Within that class, I wanted to be able to call stored procedures as true C# functions using a function-to-sproc mapping like this:
public int Call_MySproc(int paramOne, bool paramTwo, ref int outputParam)
{
    // ...parameter handling and sproc call here
}
The biggest issue you face when trying to do this, however, lies in the work needed to create C# functions that implement the sproc calls. Fortunately, it is easy to create a code generator to do this in T-SQL. I started with one created originally by Paul McKenzie and then modified it in various ways to generate C# code as I wanted it.
You can either Google Paul McKenzie and look for his original code generator or, if you'd like to write to me at mark -at- BSDIWeb.com, I'll bundle up the source for my SQL class library and the associated sproc code generator and place it on our web site. If I get a request or two, I'll post it and then come back and edit this response to point others to the source as well.
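The generator described in this answer is written in T-SQL; purely to illustrate the idea, here is a C# sketch of the same kind of generator. It assumes the parameter list (name, SQL type, OUTPUT or not) has already been read from the catalog, for example from INFORMATION_SCHEMA.PARAMETERS. The SprocParameter and WrapperGenerator names are hypothetical, the type map is deliberately tiny, and only the method signature is emitted, in the Call_MySproc style shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parameter metadata, e.g. one row of INFORMATION_SCHEMA.PARAMETERS.
public record SprocParameter(string Name, string SqlType, bool IsOutput);

public static class WrapperGenerator
{
    // Small, deliberately incomplete SQL Server -> C# type map; an unmapped
    // type makes the lookup below throw KeyNotFoundException.
    private static readonly Dictionary<string, string> TypeMap =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["int"] = "int", ["bigint"] = "long", ["bit"] = "bool",
            ["datetime"] = "DateTime", ["decimal"] = "decimal",
            ["varchar"] = "string", ["nvarchar"] = "string",
        };

    // Emits a wrapper signature: "dbo.MySproc" becomes Call_MySproc, "@paramOne"
    // becomes paramOne, and OUTPUT parameters become ref parameters. The int
    // return type is copied from the Call_MySproc example.
    public static string Signature(string procName, IEnumerable<SprocParameter> parameters)
    {
        string methodName = "Call_" + procName.Split('.').Last();
        var args = parameters.Select(p =>
            (p.IsOutput ? "ref " : "") + TypeMap[p.SqlType] + " " + p.Name.TrimStart('@'));
        return $"public int {methodName}({string.Join(", ", args)})";
    }
}
```

A full generator would also emit the method body (building the SqlParameter objects and copying OUTPUT values back into the ref arguments); the signature alone shows the mapping.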

The simplest solution for what you want [and I'm not saying that it is better or worse than the other solutions] is to create a DataSet and drag the stored procedures from Server Explorer onto the DataSet designer surface. This will create methods in the adapter that you can call and check for references.

Although they aren't very fashionable, we use Typed DataSets as a front-end to all of our stored procedures.

Microsoft's new Entity Framework provides just what you're asking for. EF is normally used to create proxy classes for database objects, but one thing a lot of people don't realize is that it also creates proxy methods for stored procedures (auto-generated, of course). This allows you to use your SPs just as though they were regular method calls.
Check it out!

How can I leverage an ORM for a database whose schema is unknown until runtime?

I am trying to leverage ORM given the following requirements:
1) Using .NET Framework (latest Framework is okay)
2) Must be able to use Sybase, Oracle, MSSQL interchangeably
3) The schema is mostly static, BUT there are dynamic parts.
I am somewhat familiar with SubSonic and NHibernate, but not deeply.
I get the nagging feeling that the ORM can do what I want, but I don't know how to leverage it at the moment.
SubSonic probably isn't optimal, since it doesn't currently support Sybase, and writing my own provider for it is beyond my resources and ability right now.
For #3 (above), there are a couple of metadata tables, which describe tables which the vendors can "staple on" to the existing database.
Let's call these MetaTables, and MetaFields.
There is a base static schema, which the ORM (NHibernate ATM) handles nicely.
However, a vendor can add a table to the database (physically) as long as they also add the data to the metadata tables to describe their structure.
What I'd really like is for me to be able to somehow "feed" the ORM with that metadata (in a way that it understands) and have it at that point allow me to manipulate the data.
My primary goal is to reduce the amount of generic SQL statement building I have to do on these dynamic tables.
I'd also like to avoid having to worry about the differences in SQL being sent to Sybase, Oracle, or MSSQL.
My primary problem is that I don't have a way to let the ORM know about the dynamic tables until runtime, when I'll have access to the metadata.
Edit: An example of the usage might be like the one outlined here:
IDataReader rdr=new Query("DynamicTable1").WHERE("ArbitraryId",2).ExecuteReader();
(However, it doesn't look like SubSonic will work, as there is no Sybase provider; see above.)
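One way to keep the SQL building for the vendor tables small, with or without an ORM, is to validate names against the metadata and leave values to bound parameters. A minimal sketch, assuming the MetaTables/MetaFields rows for a table have been loaded into memory; the MetaTable record, the Dialect enum and the parameter prefixes (@ for SQL Server and Sybase, : for Oracle) are assumptions to check against the ADO.NET providers you actually use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical dialect switch; only the parameter prefix differs in this sketch.
public enum Dialect { SqlServer, Sybase, Oracle }

// Hypothetical in-memory copy of one MetaTables row and its MetaFields rows.
public record MetaTable(string Name, IReadOnlyCollection<string> Fields);

public static class DynamicQuery
{
    // Builds "SELECT * FROM <table> WHERE <field> = <param>" for a table known only
    // through metadata. The column must be listed in the metadata, so no identifier
    // comes straight from user input; the value itself is bound as a parameter.
    public static string SelectWhere(MetaTable table, string field, Dialect dialect)
    {
        if (!table.Fields.Any(f => string.Equals(f, field, StringComparison.OrdinalIgnoreCase)))
            throw new ArgumentException($"{field} is not a field of {table.Name}", nameof(field));

        // ":" for Oracle providers, "@" for SQL Server and Sybase (assumption).
        string prefix = dialect == Dialect.Oracle ? ":" : "@";
        return $"SELECT * FROM {table.Name} WHERE {field} = {prefix}{field}";
    }
}
```

Used like the SubSonic example above, you would build the string for DynamicTable1/ArbitraryId, add a parameter named ArbitraryId with value 2 to the provider's command, and call ExecuteReader.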

According to this blog, you can in fact use NHibernate with dynamic mapping. It takes a bit of tweaking, though...

We did some of this using NHibernate; however, we stopped the project since it didn't provide us with the ROI we wanted. We ended up writing our own ORM/SQL layer, which worked very well (I say "worked" since I no longer work there; I'm guessing it still works).
Our system used an open-source project to generate the SQL (I don't remember the name any more) and we built all our queries in our own XML-based language (Query Markup Language, QML). We could then build an XmlDocument with selects, wheres, groups etc. and send that to the SqlEngine, which would turn it into a SQL statement and execute it. We discussed, but never implemented, a cache in all of this. That would've allowed us to cache the QMLs for frequently used queries.
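The QML format itself isn't described here, so the element and attribute names below are invented; this is only a sketch of the general technique of turning an XML query description into a parameterized SQL string, using System.Xml.Linq. Where clauses are combined with AND and each value becomes a @parameter.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class QmlEngine
{
    // Translates a hypothetical document such as
    //   <query table="Orders"><select column="CustomerId"/><where column="Region"/></query>
    // into SQL. Table and column names are copied verbatim, so this assumes the
    // documents are authored by the application, never by end users.
    public static string ToSql(string qml)
    {
        XElement query = XElement.Parse(qml);
        string[] Columns(string element) =>
            query.Elements(element).Select(e => (string)e.Attribute("column")).ToArray();

        string[] selects = Columns("select");
        string[] wheres = Columns("where");
        string[] groups = Columns("group");

        string selectList = selects.Length == 0 ? "*" : string.Join(", ", selects);
        string sql = "SELECT " + selectList + " FROM " + (string)query.Attribute("table");
        if (wheres.Length > 0)
            sql += " WHERE " + string.Join(" AND ", wheres.Select(c => c + " = @" + c));
        if (groups.Length > 0)
            sql += " GROUP BY " + string.Join(", ", groups);
        return sql;
    }
}
```

Because the output is a plain string per document, caching it keyed on the document text would be one straightforward way to get the QML cache mentioned above.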

I am a little confused as to how the ORM would be used at runtime, then. If the ORM dynamically built something at runtime, how would the runtime code know what the ORM did?
"have it at that point allow me to manipulate the data" - what does manipulating the data involve?
I may be missing something here, and I apologize if that's the case. (I have only really used a bottom-up approach with an ORM.)

IDataReader doesn't map anything to an object, you know. So your example should be written using a classic query builder.

Have you looked into using the ADO.NET Entity Framework?
MSDN: LINQ to Entities
It allows you to map database tables to an object model in such a manner that you can code without thinking about which database vendor is being used, and without worrying about minor variations made by a DBA to the actual tables. The mapping is kept in configuration files that can be modified when the db tables are modified without requiring a recompile.
Also, using LINQ to Entities, you can build queries in an OO manner, so you aren't writing actual SQL query strings.
