Query MongoDB from SQL Server

Problem

The problem is that half of my customers data is stored is in a NoSQL database, MongoDB.  The other half is stored in Microsoft SQL Server.  My customers would like to report on their data and do not care how I have stored the data.  Without writing a custom reporting solution, how can I get the NoSQL data to appear as a view in the Microsoft SQL Server database?

Solution

The solution of course was to ask MongoDB to serve up its data using the simple MongoDB SQL daemon.  This allows you to create an Open Database Connectivity or ODBC connection using the MongoDB driver.  Once you have accomplished this, then you can create a linked server in the SQL Server, and you are off to the races.

Let me break down the solution into manageable chunks so that you can understand how to accomplish this feat.

Software

You will need to following software pieces to make this work.

BI Connector

First install the MongoDB Connector for BI.  It should install into the C:\Program Files\MongoDB\Connector for BI\2.14\bin folder on your hard drive.  You will need to open a command prompt to that location.  You will see the following executable files listed here.

Schema

MongoDB cannot just serve up the hieratical objects without help.  You will need to create a schema to export from MongoDB for SQL Server to read.  The mongodrdl is a relational schema management tool for the MongoDB Connector for BI.  The mongodrdl binary can:

  • Produce a schema based on the contents of one or more MongoDB collections and write them out into .drdl files used by mongosqld.
  • Upload a .drdl file’s schema to the MongoDB deployment your BI Connector is reading from.
  • Download stored schemas from your MongoDB deployment.
  • Delete stored schemas from your MongoDB deployment.
  • Name stored schemas in your MongoDB deployment for easier management.

My collection of employees and each employee can have one or more vehicles.  Here is the classes that I will be using to store each item.

class Employee
{
	public int EmployeeId { get; set; }
	public string FirstName { get; set; }
	public string LastName { get; set; }
	public string Email { get; set; }
	public Gender Gender { get; set; }
	public string Country { get; set; }
	public Vehicle[] Vehicles { get; set; }
}

class Vehicle
{
	public int VehicleId { get; set; }
	public string Make { get; set; }
	public string Model { get; set; }
	public string Color { get; set; }
	public string License { get; set; }
}

public enum Gender { Male, Female };
 

I have created several instances of the Employee class with their vehicles.  I then add these items to my Employees collection in MongoDB.  I have not show this step for brevity sake.

MongoDB Compass showing the Employees collection as a tree

Now that I have data in my MongoDB collection, I will use the Mongo Schema tool to export out the basic schema based on the existing data.  The .drdl file is a YAML based file so treat it as such.

You should write out the schema to a text file using the following command:

mongodrdl.exe /db MongoIdentityTestDb /collection Employees /out schema.drdl

The contents of the schema.drdl are in YAML format and should look like the following:

schema:
- db: MongoIdentityTestDb
  tables:
  - table: Employees
    collection: Employees
    pipeline: []
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: Country
      MongoType: string
      SqlName: Country
      SqlType: varchar
    - Name: Email
      MongoType: string
      SqlName: Email
      SqlType: varchar
    - Name: EmployeeId
      MongoType: int
      SqlName: EmployeeId
      SqlType: int
    - Name: FirstName
      MongoType: string
      SqlName: FirstName
      SqlType: varchar
    - Name: Gender
      MongoType: int
      SqlName: Gender
      SqlType: int
    - Name: LastName
      MongoType: string
      SqlName: LastName
      SqlType: varchar
  - table: Employees_Vehicles
    collection: Employees
    pipeline:
    - $match:
        Vehicles:
          $ne: []
    - $unwind:
        includeArrayIndex: Vehicles_idx
        path: $Vehicles
        preserveNullAndEmptyArrays: true
    - $addFields:
        Vehicles:
          $cond:
          - $eq:
            - $type: $Vehicles
            - object
          - null
          - $Vehicles
        Vehicles_DOT_Color:
          $cond:
          - $eq:
            - $type: $Vehicles
            - object
          - $Vehicles.Color
          - null
        Vehicles_DOT_License:
          $cond:
          - $eq:
            - $type: $Vehicles
            - object
          - $Vehicles.License
          - null
        Vehicles_DOT_Make:
          $cond:
          - $eq:
            - $type: $Vehicles
            - object
          - $Vehicles.Make
          - null
        Vehicles_DOT_Model:
          $cond:
          - $eq:
            - $type: $Vehicles
            - object
          - $Vehicles.Model
          - null
        Vehicles_DOT_VehicleId:
          $cond:
          - $eq:
            - $type: $Vehicles
            - object
          - $Vehicles.VehicleId
          - null
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: Vehicles
      MongoType: string
      SqlName: Vehicles
      SqlType: varchar
    - Name: Vehicles_DOT_Color
      MongoType: string
      SqlName: Vehicles.Color
      SqlType: varchar
    - Name: Vehicles_DOT_License
      MongoType: string
      SqlName: Vehicles.License
      SqlType: varchar
    - Name: Vehicles_DOT_Make
      MongoType: string
      SqlName: Vehicles.Make
      SqlType: varchar
    - Name: Vehicles_DOT_Model
      MongoType: string
      SqlName: Vehicles.Model
      SqlType: varchar
    - Name: Vehicles_DOT_VehicleId
      MongoType: int
      SqlName: Vehicles.VehicleId
      SqlType: int
    - Name: Vehicles_idx
      MongoType: int
      SqlName: Vehicles_idx
      SqlType: int
 

SQL Daemon

Now that I have created a basic schema file for MongoDB to use, I need to allow MongoDB to accept SQL commands.  The SQL Daemon listens on a port for incoming requests and it then proxies those requests to your MongoDB server. 

Start the daemon with the following command:

mongosqld.exe /schema employees-schema.drdl /maxVarcharLength 4000

If you fail to start the daemon with the /maxVarcharLength 4000 you will get the following error when you attempt to query the server.

“OLE DB provider "MSDASQL" for linked server "MONGODB" returned message "Requested conversion is not supported.".
Msg 7341, Level 16, State 2, Line 1
Cannot get the current row value of column "[MSDASQL]._id" from OLE DB provider "MSDASQL" for linked server "MONGODB".
Completion time: 2020-09-04T17:07:47.1322678-05:00”

While the error message is not particularly helpful I want to thank Jack Vamvas for finding the solution to the issue.  You should read his article, A great way to Link SQL Server and MongoDB with BI Connector where he explains why this is an issue.

ODBC Driver

Next install the MongoDB ODBC Driver.  Once you have, open the ODBC Data Source Administrator (64-bit) and click the drivers tab.  You should see the following drivers.

Showing the correct unicode driver highlighted.

Make sure your Unicode driver is installed.  Click on the System DSN tab and create a new data source.  Make sure that you choose the MongoDB ODBC 1.4.x Unicode Driver and click finish.

Showing sample MongoDB ODBC Data Source Configuration

You will fill out the information here that corresponds to your MongoDB configuration.  Once you are done, then press the test button to make sure that you have a connection.

Linked Server

Now that MongoDB is accepting SQL Connections and we have a System Data Source to our MongoDB instance, now we can open SQL Server Management Studio and create a Linked Server.  Open the Object Explorer and navigate down to Server Objects.  Expand Server Objects, Linked Servers, and Providers.  Double click on the MSDASQL provider as shown below and enable the “Allow inprocess” option.

OLE DB Provider, allows in process checked

Next, right click on the Linked Servers level in the tree and select Add Linked Server.  Give your linked server a name.  Add the Product name of MongoDB and then enter the name of your data source.  See the image below:

Linked Server Properties showing configuration

Once your linked server has been created now it is time to explore things and check it out.  You should be able to now expand your object explorer to show the database and table that you have exported.

Table schema definition display

You will see two tables have been exported from the single collection.  This is of course because SQL Server does not handle nested objects so by default it will surface the vehicles collections of each employee to its own “table”. 

Let us query the employee table to surface the columns out.  We are going to use the T-SQL function OPENQUERY to select the information.  The OPENQUERY executes the specified pass-through query on the specified linked server. This server is an OLE DB data source. OPENQUERY can be referenced in the FROM clause of a query as if it were a table name. OPENQUERY can also be referenced as the target table of an INSERT, UPDATE, or DELETE statement. This is subject to the capabilities of the OLE DB provider.

Unfortunately the MongoDB SQL Daemon does not allow for INSERT, UPDATE, or DELETE commands but it does allow you to surface the existing MongoDB data. This can be used for selection or reporting purposes.

Although the query may return multiple result sets, OPENQUERY returns only the first one.

SELECT * 
FROM OPENQUERY(MONGODB, 'SELECT * FROM Employees')

As you can see this brings back the very simple collection of employees

Select * from Employees results

You can see all of the columns that you defined in the schema file earlier. Next, select specific columns to get a different set of results.

SELECT * 
FROM OPENQUERY(MONGODB, 'SELECT EmployeeId, FirstName, LastName FROM Employees')
 

You can see the results change shape just as you would expect.

Select Id, First, Last from Employees

Next we are going to join with the Employees_Vehicles table that is really just the collection of vehicles for each person

SELECT * 
FROM OPENQUERY(MONGODB, 'SELECT E.EmployeeId, E.FirstName, E.LastName, V.*
						 FROM Employees E
							JOIN Employees_Vehicles V ON E._id = V._id')
 

Now we can view this nested information as we would see it in SQL Server

Select * From Employees and Vehicles results

Use of a linked server allows you to alleviate the need to duplicate data . Duplicated data can easily become out of sync with the source data and usually requires some external ecosystem to support such duplication.

As always, thank you for your time and attention.   Remember to like, comment, share and subscribe. I’ll see all y’all later.

Leave a comment