Using ANTLR to walk a C# project

I have some ANTLR/C# classes capable of parsing a T-SQL stored procedure, walking its dependencies and identifying the required permissions to execute said stored procedure.
I now need to analyse a couple of relatively large C# .NET projects to identify what stored procedures are called.
It is not enough to just locate ALL stored procedure calls; I need to start from a given method and identify the stored procedure calls reachable from it.
We can assume that stored procedures are the only means of invoking SQL.
I suppose ANTLR would assist again, but I am uncertain about a few things:
1. Is ANTLR the only approach to walking the chain of method calls?
2. If I use ANTLR, where do I find the 'best' C# .g4 grammar files?
3. How would I walk from a method call in one unit to its declaration in another? This is the point on which I am most unsure at the moment.
To simplify the problem, let us suppose that all stored procedure calls take the form:
new SqlCommand()
{
    Connection = GetConnection(),
    CommandText = "dbo.SomeProcName",
    CommandType = CommandType.StoredProcedure
};
For my purposes that is almost true.
My principal concern is point 3 above.
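On point 3, one way to frame it: a grammar only yields a parse tree per file, so linking a call site to a declaration in another unit needs a separate name-resolution pass, whatever tool does the parsing. Once that pass exists, the walk itself is an ordinary graph traversal. The sketch below assumes (hypothetically) that an earlier pass has already produced two maps keyed by fully qualified method name: the methods each method calls directly, and the stored procedure names found directly in each method body. ProcReachability and every method and procedure name used with it are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ProcReachability
{
    // calls: method -> methods it invokes directly (assumed output of an earlier analysis pass).
    // procs: method -> stored procedure names that appear directly in its body.
    public static SortedSet<string> ProceduresReachableFrom(
        string entryMethod,
        IReadOnlyDictionary<string, string[]> calls,
        IReadOnlyDictionary<string, string[]> procs)
    {
        var found = new SortedSet<string>(StringComparer.OrdinalIgnoreCase);
        var visited = new HashSet<string> { entryMethod };
        var pending = new Queue<string>();
        pending.Enqueue(entryMethod);

        while (pending.Count > 0)
        {
            string method = pending.Dequeue();

            // Collect procedures referenced directly by this method.
            if (procs.TryGetValue(method, out var direct))
                found.UnionWith(direct);

            // Methods with no entry are external or unresolved: nothing further to follow.
            if (!calls.TryGetValue(method, out var callees))
                continue;

            foreach (string callee in callees)
            {
                // Add returns false for methods already seen, which stops cycles.
                if (visited.Add(callee))
                    pending.Enqueue(callee);
            }
        }

        return found;
    }
}
```

Calling ProceduresReachableFrom("OrderService.Submit", calls, procs) returns only the procedures reachable from that entry point, not every procedure in the project. Interface dispatch, delegates and reflection would need extra edges in the calls map; this sketch does not attempt to discover them.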

Related

C# script to parse stored procedures and extract meta data

I have around 500 stored procedures that are used for our ETL process. I have been asked to identify all the source and target tables used by each stored procedure. So, a stored procedure could have a connection to an Oracle linked server, or another SQL Server. It could also be using an OPENQUERY to extract data from our transactional systems.
Since I have some basic .NET/C# programming chops, I was hoping to leverage the .NET RegEx class to get started. However, I am looking for suggestions on how I should approach this. I really don't have to reinvent the wheel if someone already has a solution for this.
For context, we are working on implementing PowerDesigner as our metadata repository. So, we are looking to extract metadata from our BI reports (mapping each report to its source tables/views) and from our Informatica and T-SQL ETL scripts.
Thanks

I'd suggest a dual-approach. Firstly, I'd avoid using regex for something as complex as SQL Query parsing, especially since there are tools in place for this kind of thing.
https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.dependencywalker.aspx
The SMO library exposes a class that will let you connect to a server and retrieve a dependency tree for a given stored procedure. How to do this exactly is left as an exercise for the reader :)
However, this class won't pick up dependencies that are introduced via dynamic SQL or through OPENQUERY. If the number of procedures that do this is small, I'd recommend handling those manually and then merging the results. You could use the SMO scripting capabilities to pick up all instances of either OPENQUERY or exec/sp_executesql; at least then you would have an idea of the 'suspect' pieces of code.
Merging the results will be tricky. Not only do you have to manually update dependencies for procedures containing dynamic dependencies, but you have to update procedures that depend on procedures containing dynamic dependencies.
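The merge described above can be sketched as two steps: union the direct dependencies from both sources, then follow them transitively so that a procedure also inherits the hand-recorded dependencies of anything it calls. This is only an illustration under assumptions: toolDeps stands for whatever the automated pass reported, manualDeps for the hand-maintained entries for dynamic SQL and OPENQUERY, and DependencyMerge and all object names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DependencyMerge
{
    public static Dictionary<string, SortedSet<string>> MergeAndPropagate(
        IReadOnlyDictionary<string, string[]> toolDeps,
        IReadOnlyDictionary<string, string[]> manualDeps)
    {
        // Step 1: union the direct dependencies reported by both sources.
        var direct = new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (var source in new[] { toolDeps, manualDeps })
        {
            foreach (var pair in source)
            {
                if (!direct.TryGetValue(pair.Key, out var set))
                    direct[pair.Key] = set = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
                set.UnionWith(pair.Value);
            }
        }

        // Step 2: for each object, follow dependencies transitively so that
        // callers pick up the manual entries of the procedures they depend on.
        var result = new Dictionary<string, SortedSet<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (string root in direct.Keys)
        {
            var all = new SortedSet<string>(StringComparer.OrdinalIgnoreCase);
            var stack = new Stack<string>(direct[root]);
            while (stack.Count > 0)
            {
                string dependency = stack.Pop();
                if (!all.Add(dependency))
                    continue; // already recorded, so cycles terminate
                if (direct.TryGetValue(dependency, out var next))
                    foreach (string item in next)
                        stack.Push(item);
            }
            result[root] = all;
        }
        return result;
    }
}
```

With this shape, adding one manual entry for a procedure that uses dynamic SQL is enough; every procedure that depends on it picks the entry up when the merge is rerun.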

You can use the dynamic management view sys.dm_sql_referenced_entities to get some dependency information from SQL Server itself, but there are some limitations. I'm not sure whether the Dependency Walker leverages this view, but the pros and cons are very similar.
The main limitation that I know of and have experienced is the same: you won't get any dependency information for an object that is referenced through dynamic SQL. We have very contained usages of dynamic SQL, so I feel pretty confident leveraging this DMV and manually accounting for the objects hit by those specific procs.
We don't do linked servers, but my understanding is that those would show in this DMV. I don't know about OPENQUERY: I did a little research but did not test it, and my guess is that those references would not be surfaced by the view. Like the previous poster said, you may need a two-pronged approach to get everything you're looking for.
And just for reference, a simple example of using that DMV:
SELECT DISTINCT
[database] = COALESCE(r.referenced_database_name, DB_NAME())
, [schema] = r.referenced_schema_name
, name = r.referenced_entity_name
, r.referenced_id
FROM sys.dm_sql_referenced_entities('dbo.procName_sp', 'OBJECT') AS r
WHERE r.referenced_id IS NOT NULL;

I wouldn't use C# for this. However, maybe something like this will do the job.
select *
from DatabaseName.information_schema.routines
where routine_type = 'PROCEDURE';
SELECT name, type
FROM dbo.sysobjects
WHERE type IN (
    'P',  -- stored procedures
    'FN', -- scalar functions
    'IF', -- inline table-valued functions
    'TF'  -- table-valued functions
)
ORDER BY type, name;
Or, if you want SProcs and parameters:
select * from information_schema.parameters
Finally, this link looks pretty helpful for your situation.
http://blog.sqlauthority.com/2010/02/04/sql-server-get-the-list-of-object-dependencies-sp_depends-and-information_schema-routines-and-sys-dm_sql_referencing_entities/

SQL Server CLR hardcoded SQL statements

I am converting an existing T-SQL stored procedure into CLR C# .NET. It has been drilled into me that hardcoding SQL statements in .NET application source code is evil. Is a CLR stored procedure an exception to this rule? What other alternatives do I have? I can't very well call a T-SQL stored procedure instead...

I can't very well call a T-SQL stored procedure instead...
I'm not sure what logic you need to hold inside your CLR component; however, you can certainly call stored procedures from the CLR component to retrieve the data you want for processing. You can also call stored procedures to update the data after you've processed it.
It has been drilled into me that hardcoding SQL statements in .NET application source code is evil. Is a CLR stored procedure an exception to this rule?
There are many reasons not to put hard-coded SQL statements into compiled code and to use stored procedures instead. We could easily list and debate the reasoning behind it, but I suggest that if you currently have this rule, then, yes, it applies to CLR code as well, if for no other reason than to be consistent.

Having two methods of doing the same thing

I have some SQL scalar-valued functions and some stored procedures that perform tasks which I call from many other stored procedures and from SqlCommands within C# code. Some time ago I started using EF in some parts, and now I have some methods written in C# in a helper class that do the same thing as those common SQL functions and stored procedures. So basically I have many methods written twice, in T-SQL and in C# (EF+Linq). I cannot let go of the SQL ones since they are used in other stored procedures!
What's the best practice in this case to avoid the double work and to convert the old TSQL into Linq (if I should do that)?

Put that functionality in one SQL function that everyone calls. In your C# code, you can have a helper function that calls that function for you which would be used everywhere as well.

Where to put the SQL logic

I have an existing SQL Server database whose structure I can't really change, although I can add stored procedures or new tables if I want. I have to write a stand-alone program to access the DB, process the data and produce some reports. I've chosen C# and Visual Studio as we're pretty much an MS shop.
I've made a start at exploring using VS 2008 to create said program. I'm trying to decide where to put some of the SQL logic. My primary aims are to keep the development as simple as possible and to perform quickly.
Should I put the SQL logic into a stored procedure and simply call the stored procedure and have SQL Server do the grunt work and hand me the results? Or am I better off keeping the SQL query in my code, creating the corresponding command and executing it against the SQL Server?
I have a feeling the former might perform better, but I've then got to manage the stored procedure separately to the rest of my code base, don't I?
UPDATE: It's been pointed out the performance should be the same if it's the same SQL code in a C# program or a stored procedure. If this is the case, which is the easiest to maintain?
2009-10-02: I had to really think about which answer to select. At the time of writing, there were 8 answers, basically split 5-3 in favour of putting the SQL logic in the application. On the other hand, there were 11 up-votes, split 9-2 in favour of putting the SQL logic in stored procedures (along with a couple of warnings about going this way). So I'm torn. In the end I'm going with the up-votes. However, if I run into trouble I'm going to come back and change my selected answer :)

If it is heavy data manipulation, keep it on the db in stored procedures. If the queries might change some, the better place would be in the db too, otherwise a redeploy might be required for each change.

Keeping the mainstay of the work in stored procedures has the advantage of flexibility - I find it easier to modify a procedure than implement a program change. Unfortunately flexibility is a double-edged sword; it's much easier to make an ill-advised change as well.

I suggest taking a look at LINQ to Entities, which provides an Object Relational Mapping wrapper around the usual SQL statements (CRUD), abstracting away the logic needed to write to the database and allowing you to write OO code instead of using SqlConnection and SqlCommand objects.
OO code (the save method does not exist but you get the gist of it):
// this adds a new car to the Car table in SQL, without using ANY SQL code
Car car = new Car();
car.BrandName = "Audi";
car.Save(); // save is called something else and is on the
            // datacontext the car is in, but for brevity's sake...
SQL code as string in SqlCommand:
// open sql connection in your app and
// create Command that inserts car
SqlConnection conn = new SqlConnection(connstring);
SqlCommand comm = new SqlCommand("INSERT INTO CAR...", conn);
// execute

Versioning and maintaining stored procedures is a nightmare. Unless you hit serious performance issues (that you think will be resolved by using stored procedures), I think it is better to implement the logic in your C# code (LINQ, SubSonic or anything like that).

With regard to your point concerning performance variation between embedding your code in .NET source or within SQL Server stored procedures, you should actually see no difference between the two methods!
This is because the same execution plan will be generated by SQL server, provided the data access T-SQL within the two different sources is the same.
You can see this in action by running a SQL Server Profiler trace and comparing the execution plans that are generated by the two different T-SQL query sources.
In light of this, and back to the main point of your question, your choice of implementation should be determined by ease of development and your future extensibility requirements. As you appear to be the sole individual working on the project, go with what you prefer, which I suspect is to keep the code centralised, i.e. within a Visual Studio Data Access Layer (DAL).
Stored Procedures can come into their own however when you have separate development functions within your organisation/team. For example, you may have database developers on your team who can create your data access code for you and do so independently of the application, freeing you to work on other code modules.

Update deployment: If you need to change the logic, you can update a stored procedure without your users ever knowing and without taking the server offline. Updating the C# means pushing out a new EXE to all your users!

Have a look at Entity Spaces. It's a code generation tool - but it'll do more.
There's a small amount of leg work to do in learning the tool, but once you're up and running you'll never look back. Saves hours of work. (I don't work for them BTW!)

Should I put the SQL logic into a stored procedure
Well that depends on what the “SQL logic” is, doesn't it? If it's purely database-related, a stored procedure might be most appropriate. If it's ‘business logic’, the rules that decide how your application operates, it definitely belongs in your application.
which is the easiest to maintain?
Personally I find application-side code easier as modern languages like C# have much more expressive power than SQL. Doing any significant processing in T-SQL quickly becomes tedious and difficult to read.

Calling stored procedures

I have a C# application that interfaces with the database only through stored procedures. I have tried various techniques for calling stored procedures. At the root is the SqlCommand class; however, I would like to achieve several things:
Make the interface between C# and SQL smoother, so that procedure calls look more like C# function calls.
Have an easy way to determine whether a given stored procedure is called anywhere in code.
Make the creation of a procedure call quick and easy.
I have explored various avenues. In one, I had a project whose namespace structure mirrored the name structure of the stored procedures; that way I could generate the name of the stored procedure from the name of the class, and I could tell whether a given stored procedure was in use by finding it in the namespace tree. What are some other experiences?
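For readers unfamiliar with the namespace-mirroring idea described above, here is a minimal sketch of how a procedure name could be derived from a class. The mapping is an assumption, not the poster's actual scheme: a root namespace called Procs, one child namespace per schema, and one empty marker class per procedure. ProcNaming, Procs.Dbo and GetOrg are all hypothetical names.

```csharp
using System;

namespace Procs.Dbo
{
    // Hypothetical marker class standing in for the procedure dbo.GetOrg.
    public sealed class GetOrg { }
}

public static class ProcNaming
{
    // Assumed root of the namespace tree that mirrors the database.
    private const string RootNamespace = "Procs";

    public static string NameFor(Type procType)
    {
        string ns = procType.Namespace ?? "";
        if (!ns.StartsWith(RootNamespace + ".", StringComparison.Ordinal))
            throw new ArgumentException(
                "Type is not under the " + RootNamespace + " namespace.", nameof(procType));

        // Everything after the root namespace is treated as the schema name.
        string schema = ns.Substring(RootNamespace.Length + 1).ToLowerInvariant();
        return schema + "." + procType.Name;
    }
}
```

Under these assumptions, ProcNaming.NameFor(typeof(Procs.Dbo.GetOrg)) yields "dbo.GetOrg", and a "find all references" on the GetOrg class shows whether the procedure is used anywhere in code.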

You should try LINQ to SQL.

When stored procedures are the interface to the database, I tend to wrap them in classes which reflect the problem domain, so that most of the application code uses these objects rather than calling stored procedures, and does not even know about the stored procedures or the database connection. The application objects typically play amongst themselves.
I think it's a mistake to mirror the SPs in your application, as, typically, your relational model is not 1-1 with your application domain object model.
For example, typically I do not have application objects which represent link tables or other artifacts of database design and normalization. Those are collections of objects either contained in or returned by other objects.
A lot is made of the impedance mismatch, but I think it's horses for courses - let databases do what they are good at and OO models do what they are good at.

Have you looked into using the Enterprise Library from MS? It allows you to easily call stored procedures. I generally set up a class per database that is only for calling these stored procs. You can then have something similar to this (sorry it's VB.NET and not C#):
Public Shared Function GetOrg(ByVal OrgID As Integer) As System.Data.DataSet
    Return db.ExecuteDataSet("dbo.cp_GetOrg", OrgID)
End Function
Where db is defined as:
Dim db As Microsoft.Practices.EnterpriseLibrary.Data.Database = DatabaseFactory.CreateDatabase()
You then have this one function that is used to call the stored procedure. You can then search your code for this one function.

When building my current product, one of the tools that I very much wanted to implement was a database class (like DatabaseFactory - only I didn't care for that one) that would simplify my development and remove some of the "gotchas." Within that class, I wanted to be able to call stored procedures as true C# functions using a function-to-sproc mapping like this:
public int Call_MySproc(int paramOne, bool paramTwo, ref int outputParam)
{
    // ...parameter handling and sproc call here
}
The biggest issue you face when trying to do this, however, lies in the work needed to create C# functions that implement the sproc calls. Fortunately, it is easy to create a code generator to do this in T-SQL. I started with one created originally by Paul McKenzie and then modified it in various ways to generate C# code as I wanted it.
You can either Google Paul McKenzie and look for his original code generator or, if you'd like to write to me at mark -at- BSDIWeb.com, I'll bundle up the source for my SQL class library and the associated sproc code generator and place it on our web site. If I get a request or two, I'll post it and then come back and edit this response to point others to the source as well.

The simplest solution for what you want (and I'm not saying that it is better or worse than the other solutions) is to create a dataset and drag the stored procedures from the Server Explorer onto the dataset designer surface. This will create methods in the adapter that you can call and check for references.

Although they aren't very fashionable, we use Typed DataSets as a front-end to all of our stored procedures.

Microsoft's new Entity Framework provides just what you're asking for. EF is normally used to create proxy classes for database objects, but one thing a lot of people don't realize is that it also creates proxy methods for stored procedures (auto-generated, of course). This allows you to use your SPs just as though they were regular method calls.
Check it out!
