Use INFORMATION_SCHEMA to find data about your database

Have you ever been in this situation?  You’re the new developer for a database powered application and you have to figure out what tables contain a specific column?  Or maybe you’ve got a legacy database that needs it’s stored procedures and user defined functions cleaned up?  Or perhaps you’ve been assigned a unique database inventory task that requires you to report on the structure of your database, rather than the contents of your database.  Besides cracking open your favorite SQL editor and clicking through your database by hand, you might find the data you need from your database’s INFORMATION_SCHEMA System View.

All of the RDMS systems I’ve worked with have built in System Views ( or tables ) that contain data about the structure and contents of your database.  These views are there to help you with non data tasks.  For the sake of this post we’re only going mention the INFORMATION_SCHEMA, which is the system view I use most in SQL Server, and MySQL.  For your reference, here are links to INFORMATION_SCHEMA docs for four popular RDMS systems.

  1. SQL Server – http://msdn.microsoft.com/en-us/library/ms186778.aspx
  2. MySQL – http://dev.mysql.com/doc/refman/5.5/en/information-schema.html
  3. Oracle – http://download.oracle.com/docs/cd/E17952_01/refman-5.0-en/information-schema.html
  4. DB2 – http://www.tar.hu/sqlbible/sqlbible0100.html

If you work with SQL Server, you’ll be happy to know you can see all of the System Views in Management Studio by default.

SQL Server Management Studio shows all System Views

System Views in the AdventureWorks database

If you work with MySQL, you can see the INFORMATION_SCHEMA database in MySQL Workbench, but you’ll have to enable it in Preferences first.

Image

Edit > Preferences > SQL Editor > Check 'Show Metadata Schemata'

Now for the MySQL users, turning on the Metadata Schemata is not required in order to query those system tables.  Assuming your user has appropriate permissions, you can always fire up Workbench and fire a query like this.

[sql]
SELECT *
FROM INFORMATION_SCHEMA.Columns
WHERE Column_Name LIKE ‘%user%’
[/sql]

Image

MySQL has INFORMATION_SCHEMA tables too!

Now that you know about INFORMATION_SCHEMA, you should be happy to know this is part of the SQL-92 standard. Which means whatever sql scripts you write in MySQL, will most likely work with SQL Server as well. So go ahead and start querying your database, you’ll probably find uses for the INFORMATION_SCHEMA in your daily life really quickly.

I’m going to assume you’ve got the general idea here, so here are a few sql scripts that I’ve used over the years.  The first three are informational queries, and the last three are stored procedures that generate .NET or Coldfusion code based off of a table name.

Find all tables with the column EmployeeID

[sql]
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE ‘EmployeeID’
[/sql]

Image

Which tables have an EmployeeID column?

How many tables have the word employee in the name

[sql]
SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE LOWER(TABLE_NAME) LIKE ‘%employee%’
[/sql]

Image

How many %employee% tables are in the DB?

How many SPROCs and UDFs does our database contain?

[sql]
SELECT s.SPROCs, f.UDFs
FROM
(
SELECT COUNT(ROUTINE_NAME) AS ‘SPROCs’, NULL AS ‘UDFs’
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = ‘PROCEDURE’
) AS s,
(
SELECT NULL AS ‘SPROCs’, COUNT(ROUTINE_NAME) AS ‘UDFs’
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = ‘FUNCTION’
) AS f
[/sql]

Image

How many stored procedures and user defined functions are in the DB?

The stored procedures listed below all accept an incoming table name, then reads data from INFORMATION_SCHEMA to generate code for the specified database table.  Instead of listing the raw SQL, I’m just showing a sample result and allowing you to download the raw SQL sprocs.

Generate C# ADO.NET to VO Stored Procedure

Image

Build custom C# ADO.NET to custom VO fillerup code

download stored procedure


Generate .NET VO class Stored Procedure

Generate C# VO class for DB table name

download stored procedure

Generate Coldfusion CFFunction with CFQUERY Stored Procedure

Image

Generate Coldfusion CFFunction with CFQUERY for specified table name

download stored procedure

I use these System Views almost daily, and there are a lot of scenarios where my job would take a lot longer to do if I didn’t know about this information.  The INFORMATION_SCHEMA also feeds my passion to build code that makes code, just love it.  Hopefully this nickel tour was enough information to help somebody out.

Selecting random ids using TOP and a CTE

While testing visualizations in a Flex application, I needed to do some underlying data cleanup in SQL Server.  One of my tasks was to manually update an entity table and set the status column to one of three possibilities.  Status group A and B both needed to be roughly 20% of my tables total record count, and status group C would be the remaining rows that weren’t touched by status A or status B.  Oh and there’s one more thing, the ids in each status group can not be in sequential order, they have to be random.

At first I thought no sweat.  My dataset is still small ( only 2000 rows ), so if we want uber control I could do the math and generate my id lists by hand.  Yes, hand crafting is possible and under a deadline that kind of logic almost makes sense.  However, I already know the table I’m working with will grow in the future, and I’ll probably have to do this data update again, so why not do this right?  While playing around with different select statements I had a “EUREKA!” moment.  SQL Server’s TOP operator supports PERCENT, not just number.  I couldn’t believe it.  I use TOP at least once a week and I always forget about TOP PERCENT.  Since I already know how to select random rows via CTE, it was time to put it all together.

Before giving you the final SQL, here are the important parts to be familiar with.  Also, for the sake of example I’m using the AdventureWorks database so you can play along at home.

TOP PERCENT

If you just need 50% of the rows in a table, but you’re not concerned about the sequence returned, you can fire this query.  This will give you a sequential listing of ProductIDs

[sql]
SELECT TOP 50 PERCENT ProductID
FROM Production.Product
ORDER BY ProductID
[/sql]

Which will look something like this.

SQL's TOP operator returns rows sequentially

COMMON TABLE EXPRESSION

Now let’s say you want to randomly pull all rows from a table.  This can be achieved using this CTE.

[sql]
WITH data( ProductID ) AS (
SELECT ProductID
FROM Production.Product
)
SELECT ProductID
FROM data
ORDER BY NEWID()
[/sql]

Which will look like this

Common Table Expressions in SQLSERVER are super helpful

If you’re looking to randomly select values from a pre-determined list, see my CTE sample here.

So now that you’ve seen TOP PERCENT and CTE in action, it’s time to put these together and solve my initial task of creating randomly selected groups of ids, of a percent size.

RANDOMLY SELECT TOP PERCENT

Putting it all together, here is the query I used to create my first status group.

[sql]
WITH data( ProductID ) AS (
SELECT ProductID
FROM Production.Product
)
SELECT TOP 20 PERCENT ProductID
FROM data
ORDER BY NEWID()
[/sql]

Which gives me a dataset that is 20% of all rows in Production.Product, and the ids are in random order.

And there you have it. Randomly selecting a percent sized data set from a table in SQL Server. The SQL here is really pretty simple, but for some reason I always forget TOP PERCENT. I’m hoping this post will help me remember TOP PERCENT, and maybe even help somebody else with some TSQL.