Extracting Substrings Dynamically in SQL Server

String manipulation in SQL Server can sometimes be tricky, especially when you don’t know exactly where the piece of text you need begins or ends. You might have data where the structure isn’t perfectly consistent, and you can’t rely on fixed positions. That’s where dynamic substring extraction comes in handy.

In this article we’ll look at how we can dynamically extract substrings from a string in SQL Server when we don’t know the start or end positions of those substrings, or their lengths.

Example Scenario

Imagine we’re storing API request logs in a table, and each log includes a “RequestInfo” field containing several key-value pairs inside a single string. It’s not JSON, unfortunately (legacy systems happen), but we need to extract certain values out of it.

For example, here’s what a few records might look like:

userId=342;action=Login;ip=192.168.1.14;
userId=501;action=Purchase;ip=10.2.1.3;
userId=878;action=Logout;ip=172.20.0.4;

Now, let’s say we want to extract the value of action dynamically from each string. Problem is, we don’t know where it appears, and the length of the value changes. But that’s OK because we can use dynamic substring logic to do it.

Setup Script

Here’s a SQL script to create and populate the table:

-- Create a table to hold our logs
CREATE TABLE ApiLogs (
    LogId INT IDENTITY PRIMARY KEY,
    RequestInfo NVARCHAR(200)
);
GO

-- Insert some sample data
INSERT INTO ApiLogs (RequestInfo)
VALUES 
('userId=342;action=Login;ip=192.168.1.14;'),
('userId=501;action=Purchase;ip=10.2.1.3;'),
('userId=878;action=Logout;ip=172.20.0.4;'),
('userId=215;ip=192.168.1.22;action=ResetPassword;'),
('ip=10.10.10.5;userId=990;action=Signup;');
GO

Extracting the Action Value

To extract the value of action, we’ll find:

  1. The position where the word action= begins.
  2. The position of the next semicolon (;) after action=.
  3. The substring in between.

Here’s the query:

SELECT 
    LogId,
    RequestInfo,
    SUBSTRING(
        RequestInfo,
        CHARINDEX('action=', RequestInfo) + LEN('action='),
        CHARINDEX(';', RequestInfo + ';', CHARINDEX('action=', RequestInfo)) 
            - (CHARINDEX('action=', RequestInfo) + LEN('action='))
    ) AS ExtractedAction
FROM ApiLogs;

Result:

LogId  RequestInfo                                       ExtractedAction
----- ------------------------------------------------ ---------------
1 userId=342;action=Login;ip=192.168.1.14; Login
2 userId=501;action=Purchase;ip=10.2.1.3; Purchase
3 userId=878;action=Logout;ip=172.20.0.4; Logout
4 userId=215;ip=192.168.1.22;action=ResetPassword; ResetPassword
5 ip=10.10.10.5;userId=990;action=Signup; Signup

We can see that the ExtractedAction column contains the string that was dynamically extracted from the RequestInfo column.

In this query, we used the SUBSTRING() function to extract the required string. But we also passed some other functions to this function in order to find exactly where the substring starts and finishes. Here’s what we used:

  • CHARINDEX('action=', RequestInfo) finds the starting position of the keyword.
  • LEN('action=') moves the pointer to just after the equal sign.
  • CHARINDEX(';', RequestInfo + ';', CHARINDEX('action=', RequestInfo)) finds the position of the next semicolon after action=.
  • The difference between these two positions gives us the length of the value we want.

Notice the use of RequestInfo + ';' in the second CHARINDEX(). That ensures there’s always at least one semicolon even if the string ends without one, preventing a null length or error.

Adapting This Technique

The nice part is that we can reuse it for any key in our data. Just swap 'action=' for whatever field we need, like 'ip=' or 'userId='. We could even wrap this logic into a user-defined function (UDF) that accepts a key name as a parameter and returns the corresponding value.

For example:

CREATE FUNCTION dbo.fn_ExtractValue (
    @input NVARCHAR(200),
    @key NVARCHAR(50)
)
RETURNS NVARCHAR(100)
AS
BEGIN
    DECLARE @pattern NVARCHAR(60) = @key + '=';
    DECLARE @start INT = CHARINDEX(@pattern, @input);
    DECLARE @result NVARCHAR(100);

    IF @start = 0 RETURN NULL;

    SET @start = @start + LEN(@pattern);
    DECLARE @end INT = CHARINDEX(';', @input + ';', @start);

    SET @result = SUBSTRING(@input, @start, @end - @start);
    RETURN @result;
END;
GO

Then use it like this:

SELECT 
    LogId,
    dbo.fn_ExtractValue(RequestInfo, 'action') AS ActionName,
    dbo.fn_ExtractValue(RequestInfo, 'ip') AS IPAddress
FROM ApiLogs;

Output:

LogId  ActionName     IPAddress   
----- ------------- ------------
1 Login 192.168.1.14
2 Purchase 10.2.1.3
3 Logout 172.20.0.4
4 ResetPassword 192.168.1.22
5 Signup 10.10.10.5

Wrapping Up

Dynamic substring extraction in SQL Server is all about locating the start and end markers rather than relying on fixed positions. With functions like CHARINDEX() and SUBSTRING(), you can parse semi-structured text effectively, even when the layout isn’t perfect.

It’s not a substitute for having properly structured JSON or XML, but it’s a handy technique for dealing with legacy systems or log data where flexibility matters more than formality.