Schema of raw data v2.3.0
Note
The Parse.ly Data Pipeline schema is additive only. This means that columns will never be removed and only new columns will be added. Parse.ly Data Pipeline customers receive notifications about upcoming additive schema updates.
JSON format
Raw data accessed via S3 (bulk) or Kinesis (streaming) consists of lines of JSON objects, also known as JSONLines. This format is easy to parse in programming languages, cloud SQL engines, and big data tools.
This page describes the schema of these JSON records (keys and values) for interpreting raw events.
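Here is a minimal Python sketch of reading this format. The hypothetical helper read_events parses one JSON object per line from any text stream, such as a local file already downloaded from S3. It assumes the stream is uncompressed UTF-8 text.

```python
import io
import json
from typing import Iterator, TextIO


def read_events(stream: TextIO) -> Iterator[dict]:
    """Yield one event dict per non-blank line of a JSONLines stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, such as a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_number}: {exc}") from exc


sample = io.StringIO(
    '{"action": "pageview", "apikey": "example.com"}\n'
    '{"action": "conversion", "apikey": "example.com"}\n'
)
print([event["action"] for event in read_events(sample)])  # ['pageview', 'conversion']
```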
Example JSON page view record
The following example shows a page view record from a Parse.ly site, with keys listed in roughly alphabetical order.
{
"action" : "pageview",
"apikey" : "example.com",
"campaign_id" : "facebook",
"channel" : "website",
"display" : true,
"display_avail_height" : 735,
"display_avail_width" : 1280,
"display_pixel_depth" : 24,
"display_total_height" : 800,
"display_total_width" : 1280,
"engaged_time_inc" : null,
"event_id" : "0xe6508eda93d5598367b18555ae9b828d",
"extra_data" : {"subscriber_type" : "premium"},
"flags_is_amp" : false,
"ip_city" : "Newark",
"ip_continent" : "NA",
"ip_country" : "US",
"ip_lat" : 37.5147,
"ip_lon" : -122.0423,
"ip_postal" : "94560",
"ip_subdivision" : "CA",
"ip_timezone" : "America/Los_Angeles",
"ip_market_name" : "New York",
"ip_market_nielsen" : "501",
"ip_market_doubleclick" : "3",
"metadata" : true,
"metadata_authors" : [
"Laura Vitto"
],
"metadata_canonical_url" : "http://example.com/2020/08/07/airpods-ftw/",
"metadata_custom_metadata" : "{\"page\" : 1,\"omnitureData\" : {\"channel\" : \"watercooler\",\"content_type\" : \"article\",\"v_buy\" : null,\"v_buy_i\" : null,\"h_pub\" : 0.0,\"h_buy\" : null,\"h_pub_buy\" : null,\"v_cur\" : 0.0,\"v_max\" : 0.0,\"v_cur_i\" : 0,\"v_max_i\" : 0,\"events\" : \"event51,event61\",\"top_channel\" : \"watercooler\",\"content_source_type\" : \"Internal - Editorial Series\",\"content_source_name\" : \"Apple iPhone 7 Event\",\"author_name\" : \"Laura Vitto\",\"age\" : \"0\",\"pub_day\" : 7,\"pub_month\" : 9,\"pub_year\" : 2020,\"pub_date\" : \"08/07/2020\",\"sourced_from\" : \"Internal\",\"isPostView\" : true,\"post_lead_type\" : \"No Lead Image\",\"topics\" : \"Apple,Gadgets,iPhone 7,Watercooler\",\"campaign\" : null,\"display_mode\" : null,\"viral_video_type\" : null,\"standalone_video_show\" : null,\"b_flag\" : false}}",
"metadata_duration" : null,
"metadata_full_content_word_count" : 174,
"metadata_image_url" : "http://a.amz.mshcdn.com/media/ZgkyMDE2LzA5LzA3LzU2L0NyeFhpNjNYRUFBSnZwRS5lNDAyMy5qcGcKcAl0aHVtYgkxMjAweDYzMAplCWpwZw/156d0173/3ae/CrxXi63XEAAJvpE.jpg",
"metadata_page_type" : "post",
"metadata_post_id" : "http://example.com/2020/08/07/airpods-ftw/",
"metadata_pub_date_tmsp" : 1473275118000,
"metadata_save_date_tmsp" : 1473275204000,
"metadata_section" : "watercooler",
"metadata_share_urls" : null,
"metadata_tags" : [
"parsely_smart:entity:Breathability",
"parsely_smart:entity:Lyocell",
"parsely_smart:entity:Perspiration",
"parsely_smart:entity:Textile",
"parsely_smart:iab:Needlework",
"sleep week 2022",
"underscored explore",
"underscored lifestyle"
],
"metadata_data_source" : "crawl",
"metadata_thumb_url" : "https://images.parsely.com/xY9xNBMulGDKRMzfKaUQzs7A9PA=/160x160/smart/http%3A//a.amz.mshcdn.com/media/ZgkyMDE2LzA5LzA3LzU2L0NyeFhpNjNYRUFBSnZwRS5lNDAyMy5qcGcKcAl0aHVtYgkxMjAweDYzMAplCWpwZw/156d0173/3ae/CrxXi63XEAAJvpE.jpg",
"metadata_title" : "Everyone has the same fear about Apple's new earbuds",
"metadata_urls" : [
"http://example.com/2020/08/07/airpods-ftw/"
],
"pageload_id" : "b510edbe-84eb-47b6-aa35-9843b5d3b579",
"pageview_id" : "ae2badca-d81f-467d-b5fc-5d8y45f08ff6",
"ref_category" : "internal",
"ref_clean" : "http://example.com/",
"ref_domain" : "example.com",
"ref_fragment" : "",
"ref_netloc" : "example.com",
"ref_params" : "",
"ref_path" : "/",
"ref_query" : "",
"ref_scheme" : "http",
"referrer" : "http://example.com/",
"session" : true,
"session_id" : 6,
"session_initial_referrer" : "http://example.com/",
"session_initial_url" : "http://example.com/",
"session_last_session_timestamp" : 1473271351611,
"session_timestamp" : 1473277747806,
"schema_version" : "2.3.0",
"slot" : false,
"sref_category" : "internal",
"sref_clean" : "http://example.com/",
"sref_domain" : "example.com",
"sref_fragment" : "",
"sref_netloc" : "example.com",
"sref_params" : "",
"sref_path" : "/",
"sref_query" : "",
"sref_scheme" : "http",
"surl_clean" : "http://example.com/",
"surl_domain" : "example.com",
"surl_fragment" : "",
"surl_netloc" : "example.com",
"surl_params" : "",
"surl_path" : "/",
"surl_query" : "",
"surl_scheme" : "http",
"surl_utm_campaign" : "facebook_campaign",
"surl_utm_term" : "8908",
"surl_utm_medium" : "partners",
"surl_utm_source" : "facebook",
"surl_utm_content" : "sports/baseball",
"timestamp_info" : true,
"timestamp_info_nginx_ms" : 1473277850000,
"timestamp_info_override_ms" : null,
"timestamp_info_pixel_ms" : 1473277850017,
"ts_action" : "2020-08-07 19:50:50",
"ts_session_current" : "2020-08-07 19:49:07",
"ts_session_last" : "2020-08-07 18:02:31",
"ua_browser" : "Safari",
"ua_browserversion" : "9.1.2",
"ua_device" : "Other",
"ua_devicebrand" : null,
"ua_devicemodel" : null,
"ua_devicetouchcapable" : false,
"ua_devicetype" : "desktop",
"ua_os" : "Mac OS X",
"ua_osversion" : "10.10.5",
"url" : "http://example.com/2020/08/07/airpods-ftw/#L.eZPflSGqq5",
"url_clean" : "http://example.com/2020/08/07/airpods-ftw/",
"url_domain" : "example.com",
"url_fragment" : "L.eZPflSGqq5",
"url_netloc" : "example.com",
"url_params" : "",
"url_path" : "/2020/08/07/airpods-ftw/",
"url_query" : "",
"url_scheme" : "http",
"user_agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7",
"utm_campaign" : "facebook_campaign",
"utm_term" : "8908",
"utm_medium" : "partners",
"utm_source" : "facebook",
"utm_content" : "sports/baseball",
"version" : 1,
"videostart_id" : "be0badca-d81e-467d-a5fc-5d7a45f08ff5",
"visitor" : true,
"visitor_ip" : "108.225.131.20",
"visitor_network_id" : "None",
"visitor_site_id" : "zp94fd56-a400-8210-4b23-zb4348207c43"
}
Most values are strings, but some are numbers, booleans (true/false), null, arrays (such as metadata_authors), or nested objects (such as extra_data).
Example JSON conversion record
Apart from the action value and the contents of extra_data, this record matches the page view example (the sample metadata_tags values also differ). As before, the keys are listed in roughly alphabetical order.
{
"action" : "conversion",
"apikey" : "example.com",
"campaign_id" : "facebook",
"channel" : "website",
"display" : true,
"display_avail_height" : 735,
"display_avail_width" : 1280,
"display_pixel_depth" : 24,
"display_total_height" : 800,
"display_total_width" : 1280,
"engaged_time_inc" : null,
"event_id" : "0xe6508eda93d5598367b18555ae9b828d",
"extra_data" : {
"_conversion_type" : "newsletter_signup",
"_conversion_label" : "Weekly Email Newsletter"
},
"flags_is_amp" : false,
"ip_city" : "Newark",
"ip_continent" : "NA",
"ip_country" : "US",
"ip_lat" : 37.5147,
"ip_lon" : -122.0423,
"ip_postal" : "94560",
"ip_subdivision" : "CA",
"ip_timezone" : "America/Los_Angeles",
"ip_market_name" : "New York",
"ip_market_nielsen" : "501",
"ip_market_doubleclick" : "3",
"metadata" : true,
"metadata_authors" : [
"Laura Vitto"
],
"metadata_canonical_url" : "http://example.com/2020/08/07/airpods-ftw/",
"metadata_custom_metadata" : "{\"page\" : 1,\"omnitureData\" : {\"channel\" : \"watercooler\",\"content_type\" : \"article\",\"v_buy\" : null,\"v_buy_i\" : null,\"h_pub\" : 0.0,\"h_buy\" : null,\"h_pub_buy\" : null,\"v_cur\" : 0.0,\"v_max\" : 0.0,\"v_cur_i\" : 0,\"v_max_i\" : 0,\"events\" : \"event51,event61\",\"top_channel\" : \"watercooler\",\"content_source_type\" : \"Internal - Editorial Series\",\"content_source_name\" : \"Apple iPhone 7 Event\",\"author_name\" : \"Laura Vitto\",\"age\" : \"0\",\"pub_day\" : 7,\"pub_month\" : 9,\"pub_year\" : 2020,\"pub_date\" : \"08/07/2020\",\"sourced_from\" : \"Internal\",\"isPostView\" : true,\"post_lead_type\" : \"No Lead Image\",\"topics\" : \"Apple,Gadgets,iPhone 7,Watercooler\",\"campaign\" : null,\"display_mode\" : null,\"viral_video_type\" : null,\"standalone_video_show\" : null,\"b_flag\" : false}}",
"metadata_duration" : null,
"metadata_full_content_word_count" : 174,
"metadata_image_url" : "http://a.amz.mshcdn.com/media/ZgkyMDE2LzA5LzA3LzU2L0NyeFhpNjNYRUFBSnZwRS5lNDAyMy5qcGcKcAl0aHVtYgkxMjAweDYzMAplCWpwZw/156d0173/3ae/CrxXi63XEAAJvpE.jpg",
"metadata_page_type" : "post",
"metadata_post_id" : "http://example.com/2020/08/07/airpods-ftw/",
"metadata_pub_date_tmsp" : 1473275118000,
"metadata_save_date_tmsp" : 1473275204000,
"metadata_section" : "watercooler",
"metadata_share_urls" : null,
"metadata_tags" : [
"gadgets",
"iphone-7",
"watercooler",
"apple"
],
"metadata_data_source" : "crawl",
"metadata_thumb_url" : "https://images.parsely.com/xY9xNBMulGDKRMzfKaUQzs7A9PA=/160x160/smart/http%3A//a.amz.mshcdn.com/media/ZgkyMDE2LzA5LzA3LzU2L0NyeFhpNjNYRUFBSnZwRS5lNDAyMy5qcGcKcAl0aHVtYgkxMjAweDYzMAplCWpwZw/156d0173/3ae/CrxXi63XEAAJvpE.jpg",
"metadata_title" : "Everyone has the same fear about Apple's new earbuds",
"metadata_urls" : [
"http://example.com/2020/08/07/airpods-ftw/"
],
"pageload_id" : "b510edbe-84eb-47b6-aa35-9843b5d3b579",
"pageview_id" : "ae2badca-d81f-467d-b5fc-5d8y45f08ff6",
"ref_category" : "internal",
"ref_clean" : "http://example.com/",
"ref_domain" : "example.com",
"ref_fragment" : "",
"ref_netloc" : "example.com",
"ref_params" : "",
"ref_path" : "/",
"ref_query" : "",
"ref_scheme" : "http",
"referrer" : "http://example.com/",
"session" : true,
"session_id" : 6,
"session_initial_referrer" : "http://example.com/",
"session_initial_url" : "http://example.com/",
"session_last_session_timestamp" : 1473271351611,
"session_timestamp" : 1473277747806,
"schema_version" : "2.3.0",
"slot" : false,
"sref_category" : "internal",
"sref_clean" : "http://example.com/",
"sref_domain" : "example.com",
"sref_fragment" : "",
"sref_netloc" : "example.com",
"sref_params" : "",
"sref_path" : "/",
"sref_query" : "",
"sref_scheme" : "http",
"surl_clean" : "http://example.com/",
"surl_domain" : "example.com",
"surl_fragment" : "",
"surl_netloc" : "example.com",
"surl_params" : "",
"surl_path" : "/",
"surl_query" : "",
"surl_scheme" : "http",
"surl_utm_campaign" : "facebook_campaign",
"surl_utm_term" : "8908",
"surl_utm_medium" : "partners",
"surl_utm_source" : "facebook",
"surl_utm_content" : "sports/baseball",
"timestamp_info" : true,
"timestamp_info_nginx_ms" : 1473277850000,
"timestamp_info_override_ms" : null,
"timestamp_info_pixel_ms" : 1473277850017,
"ts_action" : "2020-08-07 19:50:50",
"ts_session_current" : "2020-08-07 19:49:07",
"ts_session_last" : "2020-08-07 18:02:31",
"ua_browser" : "Safari",
"ua_browserversion" : "9.1.2",
"ua_device" : "Other",
"ua_devicebrand" : null,
"ua_devicemodel" : null,
"ua_devicetouchcapable" : false,
"ua_devicetype" : "desktop",
"ua_os" : "Mac OS X",
"ua_osversion" : "10.10.5",
"url" : "http://example.com/2020/08/07/airpods-ftw/#L.eZPflSGqq5",
"url_clean" : "http://example.com/2020/08/07/airpods-ftw/",
"url_domain" : "example.com",
"url_fragment" : "L.eZPflSGqq5",
"url_netloc" : "example.com",
"url_params" : "",
"url_path" : "/2020/08/07/airpods-ftw/",
"url_query" : "",
"url_scheme" : "http",
"user_agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7",
"utm_campaign" : "facebook_campaign",
"utm_term" : "8908",
"utm_medium" : "partners",
"utm_source" : "facebook",
"utm_content" : "sports/baseball",
"version" : 1,
"videostart_id" : "be0badca-d81e-467d-a5fc-5d7a45f08ff5",
"visitor" : true,
"visitor_ip" : "108.225.131.20",
"visitor_network_id" : "None",
"visitor_site_id" : "zp94fd56-a400-8210-4b23-zb4348207c43"
}
For details on how to set up conversions and exactly what data can be sent, see the conversion integration documentation.
Base event fields
| name | description | example value |
|---|---|---|
| action | Event type identifier | “pageview” |
| apikey | Site identifier | “example.com” |
| channel | The channel source, such as website, fbia (Facebook Instant Articles), amp, or apln-rta (Apple News Realtime) | “website” |
| referrer | Raw referring URL | “https://www.facebook.com/instantarticles#v1” |
| user_agent | Raw User-Agent (UA) string | “Mozilla/5.0 (iPhone; CPU … Safari/601.1” |
| url | Raw URL on which the action occurred | “http://example.com/2020/08/07/airpods-ftw#id=1” |
| visitor_site_id | Visitor first-party site identifier | “0beabdd1-7b0c-423b-9fae-660101fc8953” |
| engaged_time_inc | Engaged time in seconds; only available where action = heartbeat or vheartbeat | 10 |
These required fields come from integration with Parse.ly’s data collection infrastructure, whether that’s:
- basic integration for standard web pages
- dynamic tracking of custom events/data
- mobile SDKs for iOS or Android
These fields appear in every event, regardless of event type or source. The session_id and visitor_ip fields may be excluded in some cases, although all Parse.ly integrations attempt to populate them whenever possible.
On one-time historical imports
One-time imports of historical page view (or other) event data from legacy web analytics systems are possible, but require custom work on Parse.ly’s side. Equivalents for the “Base Event” fields must exist to make sense of historical data.
Timestamp fields
Parse.ly records two raw timestamps per event. One comes from Parse.ly’s data collection servers, and the other comes from Parse.ly’s client-side trackers. Both are stored as numbers representing milliseconds since the UNIX epoch (UNIX time at millisecond resolution). Parse.ly’s server clocks are in UTC. Because a calendar day in a visitor’s local timezone can span two UTC dates, timezone conversions may require pulling data that spans multiple days.
| name | description | example value |
|---|---|---|
| timestamp_info | Flag to indicate if timestamp info is available | true |
| timestamp_info_nginx_ms | The automatic server-side event timestamp | 1493598778000 |
| timestamp_info_pixel_ms | The automatic client-side event timestamp | 1493598778538 |
| timestamp_info_override_ms | A client-side override timestamp | 1493598778000 |
| ts_action | Date/time of the event. This is a formatted date/time of the timestamp_info_nginx_ms in UTC | 2020-08-07 00:32:58 |
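| ts_session_current | Date/time (UTC) of the start of the current session, derived from session_timestamp (the client-side timestamp of the session’s first page view) | 2020-08-07 00:30:00 |
| ts_session_last | Date/time (UTC) of the start of the previous session, derived from session_last_session_timestamp | 2020-08-07 20:22:47 |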
Parse.ly’s server-side timestamp is generally more reliable than the client-side timestamp. However, the client-side timestamp provides greater precision in certain scenarios.
Parse.ly’s nginx (server-side) timestamp is at second resolution, whereas Parse.ly’s pixel (client-side) timestamp is at millisecond resolution. When a pixel timestamp is within a few seconds of the corresponding nginx timestamp, it is likely more accurate since it represents when the event was sent (at millisecond resolution) rather than when the event was received (at second resolution). With Parse.ly’s standard JavaScript tracker, both nginx and pixel are always captured together, so combining them makes JavaScript tracker-based events as accurate as possible.
In the mobile SDKs for iOS and Android, it is common to “batch” events while devices are offline. These are also known as “late-arriving” events. In these cases, neither the auto-generated server-side timestamp (nginx) nor the auto-generated client-side timestamp (pixel) can be trusted; the client-side override timestamp is likely a more accurate representation of when the event happened. The mobile SDK sets this override by filling a ts field in the data key-value object sent with every event.
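One way to apply this guidance when picking a single event time is sketched below. The order of preference is: override first, then pixel when it is close to nginx, then nginx. That order is an interpretation of this section, not Parse.ly’s documented logic. The 5-second threshold standing in for “a few seconds” is an assumption, and the helper name is hypothetical.

```python
from typing import Optional

# Assumption: "a few seconds" is interpreted as 5 seconds; tune for your data.
MAX_PIXEL_SKEW_MS = 5_000


def best_event_time_ms(event: dict) -> int:
    """Pick the most plausible event time (ms since the UNIX epoch) from the raw fields."""
    override: Optional[int] = event.get("timestamp_info_override_ms")
    if override is not None:
        # Set by the mobile SDKs, e.g. for batched / late-arriving events.
        return override
    nginx: int = event["timestamp_info_nginx_ms"]
    pixel: Optional[int] = event.get("timestamp_info_pixel_ms")
    if pixel is not None and abs(pixel - nginx) <= MAX_PIXEL_SKEW_MS:
        # Client and server agree closely, so keep the client's millisecond precision.
        return pixel
    # Otherwise fall back to the server-side receive time (second resolution).
    return nginx


sample = {
    "timestamp_info_nginx_ms": 1473277850000,
    "timestamp_info_pixel_ms": 1473277850017,
    "timestamp_info_override_ms": None,
}
print(best_event_time_ms(sample))  # 1473277850017
```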
On timezones
Parse.ly’s JavaScript tracker populates the client-side timestamp using new Date().getTime(), which returns milliseconds since the UNIX epoch and therefore does not depend on the browser’s local timezone. Parse.ly’s server clocks are in UTC, making these timestamps comparable. Note, however, that UNIX time itself does not embed any timezone information; it simply counts the time elapsed since a fixed UTC instant in the past, the UNIX epoch. The visitor’s local timezone can be inferred from their IP address based on estimated geography, and Parse.ly exposes it as ip_timezone. Combining these fields allows interpretation of the visitor’s local time.
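Here is a minimal Python sketch of that combination. It assumes Python 3.9+ (for zoneinfo) and an IANA timezone database on the system; some platforms need the tzdata package. The helper name is hypothetical. Because ip_timezone is a GeoIP estimate, the result is only as accurate as that lookup.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from zoneinfo import ZoneInfo

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def local_event_time(epoch_ms: int, ip_timezone: Optional[str]) -> datetime:
    """Convert epoch milliseconds to an aware datetime in the visitor's GeoIP timezone."""
    # timedelta arithmetic keeps millisecond precision exactly (no float rounding).
    utc_time = UNIX_EPOCH + timedelta(milliseconds=epoch_ms)
    if not ip_timezone:
        return utc_time  # no GeoIP timezone available: stay in UTC
    return utc_time.astimezone(ZoneInfo(ip_timezone))
```

For example, local_event_time(1473277850017, "America/Los_Angeles") yields 12:50:50.017 on 2016-09-07 Pacific Daylight Time, which is 19:50:50.017 UTC. The visitor’s local calendar date can differ from the UTC date, which is why grouping by local day may require more than one UTC day of data.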
ID field
| name | description | example value |
|---|---|---|
| event_id | Unique event identifier string | “0xe6508eda93d5598367b18555ae9b828d” |
A unique, hex-encoded ID string is also generated for each event. It can be used to deduplicate events during ingestion and processing.
This unique ID is generated by hashing the values of apikey, action, url, timestamp (internal, generated property), visitor_site_id, and timestamp_info_pixel_ms. To ensure that each event_id is truly unique, all events sent to Parse.ly must provide these required fields (excluding timestamp, which Parse.ly generates) at an appropriate level of cardinality and granularity.
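Here is a minimal sketch of that deduplication, assuming the events fit in memory. The hypothetical helper keeps the first event seen for each event_id and drops later copies. For large volumes, the same idea is usually expressed in the downstream database instead, which this sketch does not cover.

```python
from typing import Iterable, Iterator


def dedupe_by_event_id(events: Iterable[dict]) -> Iterator[dict]:
    """Yield each event once, keeping the first occurrence of every event_id."""
    seen = set()  # event_id values already emitted
    for event in events:
        event_id = event["event_id"]
        if event_id in seen:
            continue  # duplicate delivery of an event already emitted
        seen.add(event_id)
        yield event
```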
The following examples show the relationship between event_id, pageload_id, pageview_id, and videostart_id:
Identifier fields and correct usage
| name | description | example value |
|---|---|---|
| pageload_id | Unique identifier string | “b510edbe-84eb-47b6-aa35-9843b5d3b579” |
| pageview_id | Unique identifier string | “ae2badca-d81f-467d-b5fc-5d8y45f08ff6” |
| videostart_id | Unique identifier string | “be0badca-d81e-467d-a5fc-5d7a45f08ff5” |
Factors such as browser, device type, and integration age may impact the uniqueness of these identifiers. To avoid overlaps, combine these identifiers with the following fields to ensure they remain distinct across events.
pageview_unique_key = hash(pageview_id, apikey, date(ts_action), visitor_site_id, session_id)
pageload_unique_key = hash(pageload_id, apikey, date(ts_action), visitor_site_id, session_id)
videostart_unique_key = hash(videostart_id, apikey, date(ts_action), visitor_site_id, session_id)
The following table shows how to match videos and page views with their corresponding engaged time:
The following table shows how to match page views and engaged time for slide shows where a page reload is not triggered. This example can also apply to infinite scroll pages:
Visitors
| name | description | example value |
|---|---|---|
| visitor | Flag to indicate if visitor info is available | true |
| visitor_site_id | Visitor first-party site identifier | “0beabdd1-7b0c-423b-9fae-660101fc8953” |
| visitor_network_id | [Deprecated] | NULL |
The visitor_site_id is set by a first-party cookie and is unique to each browser. The visitor_network_id was formerly set by a third-party cookie but has been removed due to privacy concerns. The field remains for backwards compatibility and will always be NULL.
Session enrichments
Parse.ly’s JavaScript tracker automatically creates useful session information for user session analysis. For example, Parse.ly’s session_id doubles as a “number of visits” value: it is an auto-incrementing integer that starts at 1 and increases by one for every new visit by a visitor with the same visitor_site_id.
Note that these enrichments are performed client-side by Parse.ly’s JavaScript tracker; they will not apply to events received via other integrations.
The other fields stored with the session are described below:
| name | description | example value |
|---|---|---|
| session_id | Auto-incrementing visit number for this visitor_site_id, starting at 1 | 1 |
| session_initial_referrer | The raw referring URL of the first page view event of this session | “http://facebook.com” |
| session_initial_url | The raw URL of the first page view event of this session | “http://example.com/1234#d3d” |
| session_last_session_timestamp | Timestamp (milliseconds since the UNIX epoch) of the visitor’s previous visit, or 0 if none | 0 |
| session_timestamp | Timestamp (milliseconds since the UNIX epoch) of the first page view event of this session | 1466214847371 |
| session | Flag to indicate if session info is available | true |
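Here is a sketch of turning these fields into visit-level attributes. It assumes the session flag marks events where the JavaScript tracker populated them. The helper and its output keys are hypothetical. is_returning simply treats a session_id greater than 1 as a repeat visit by the same visitor_site_id.

```python
from typing import Optional

MS_PER_DAY = 86_400_000  # session timestamps are in milliseconds


def visit_summary(event: dict) -> Optional[dict]:
    """Summarize visit-level fields, or return None when session info is unavailable."""
    if not event.get("session"):
        return None  # e.g. events from integrations other than the JavaScript tracker
    last_ms = event.get("session_last_session_timestamp") or 0  # 0 means no earlier visit
    days_since_last = None
    if last_ms:
        days_since_last = (event["session_timestamp"] - last_ms) / MS_PER_DAY
    return {
        "visit_number": event["session_id"],  # auto-incrementing, starts at 1
        "is_returning": event["session_id"] > 1,
        "days_since_last_visit": days_since_last,
    }
```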
Timestamp enrichments
Based on the timestamp fields, Parse.ly creates an important field called ts_action. This field reinterprets timestamp_info_nginx_ms (Parse.ly’s server time) as a formatted date string that is highly compatible with many systems. For example, it is the same format expected by Amazon Redshift and Google BigQuery’s JSON value parsers.
"ts_action" : "2025-08-07 02:03:24"
This value is derived from the epoch time 1754532204 seconds (1754532204000 as a millisecond value in timestamp_info_nginx_ms). Like the raw timestamps, it carries no timezone information and should be interpreted as UTC. Including timezone information, as the “full” ISO 8601 standard allows, makes this string incompatible with some SQL engines, so Parse.ly uses a maximally compatible format instead.
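To reproduce the ts_action format from a raw millisecond timestamp, for instance when comparing it with timestamp_info_pixel_ms, a sketch like the following works. It assumes sub-second precision is simply truncated; how Parse.ly handles sub-second values is not stated here. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ts_action(epoch_ms: int) -> str:
    """Format epoch milliseconds as a timezone-less UTC string in the ts_action style."""
    moment = UNIX_EPOCH + timedelta(milliseconds=epoch_ms)
    # strftime drops the milliseconds and adds no UTC offset, matching ts_action.
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(to_ts_action(1_754_532_204_000))  # 2025-08-07 02:03:24
```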
Geo IP enrichments
Based on the visitor_ip field, Parse.ly enriches the following:
| name | description | example value |
|---|---|---|
| ip_continent | Continent from GeoIP | “NA” |
| ip_country | Country from GeoIP | “US” |
| ip_city | City from GeoIP | “New York” |
| ip_lat | Latitude from GeoIP (postal code granularity) | 40.676 |
| ip_lon | Longitude from GeoIP (postal code granularity) | -73.963 |
| ip_postal | Postal code from GeoIP | “11238” |
| ip_subdivision | Subdivision (e.g. US state) from GeoIP | “NY” |
| ip_timezone | Time Zone of visitor based on GeoIP | “America/New_York” |
| ip_market_name | Nielsen DMA name (see note below) | “New York” |
| ip_market_nielsen | Nielsen DMA ID (see note below) | “501” |
| ip_market_doubleclick | Google DoubleClick DMA ID (see note below) | “3” |
On Nielsen-designated market areas (DMA)
ip_market_name, ip_market_nielsen, and ip_market_doubleclick all refer to Nielsen Designated Market Areas, which are only defined in the United States. This means these fields are only populated for events that originate from U.S.-based IP addresses.
URL and referrer enrichments
Based on the url, referrer, session_initial_url, and session_initial_referrer fields, Parse.ly provides several enrichments. The following examples use these values:
| field | value |
|---|---|
| url | “https://www.example.com/article-1234?campaignid=1234#fragment” |
| referrer | “https://www.google.ca/” |
| session_initial_url | “https://www.example.com/article-1234?campaignid=1234#fragment” |
| session_initial_referrer | “https://www.google.ca/” |
On URL parsing
Components of parsed URLs, such as fragment, netloc, params, query, and scheme, follow RFC 1808.
| name | description | example value |
|---|---|---|
| url_clean | Cleaned url (strip query/fragment) | “https://www.example.com/article-1234” |
| url_domain | url parsed domain, matched against TLD list | “example.com” |
| url_fragment | Fragment portion of url | “fragment” |
| url_netloc | Netloc portion of url | “www.example.com” |
| url_params | Params portion of url | “” |
| url_path | Path portion of url | “/article-1234” |
| url_query | Query portion of url | “campaignid=1234” |
| url_scheme | Scheme portion of url | “https” |
| ref_category | Referrer category (traffic source categorization) | “search” |
| ref_clean | Clean referrer URL (strip query/fragment) | “https://www.google.ca/” |
| ref_domain | referrer parsed domain, matched against TLD list | “google.ca” |
| ref_fragment | Fragment portion of referrer | “” |
| ref_netloc | Netloc portion of referrer | “www.google.ca” |
| ref_params | Params portion of referrer | “” |
| ref_path | Path portion of referrer | “/” |
| ref_query | Query portion of referrer | “” |
| ref_scheme | Scheme portion of referrer | “https” |
| surl_clean | Cleaned session_initial_url (strip query/fragment) | “https://www.example.com/article-1234” |
| surl_domain | session_initial_url parsed domain, matched against TLD list | “example.com” |
| surl_fragment | Fragment portion of session_initial_url | “fragment” |
| surl_netloc | Netloc portion of session_initial_url | “www.example.com” |
| surl_params | Params portion of session_initial_url | “” |
| surl_path | Path portion of session_initial_url | “/article-1234” |
| surl_query | Query portion of session_initial_url | “campaignid=1234” |
| surl_scheme | Scheme portion of session_initial_url | “https” |
| sref_category | Session referrer category (traffic source categorization) | “search” |
| sref_clean | Clean session referrer URL (strip query/fragment) | “https://www.google.ca/” |
| sref_domain | Session referrer parsed domain, matched against TLD list | “google.ca” |
| sref_fragment | Fragment portion of session_initial_referrer | “” |
| sref_netloc | Netloc portion of session_initial_referrer | “www.google.ca” |
| sref_params | Params portion of session_initial_referrer | “” |
| sref_path | Path portion of session_initial_referrer | “/” |
| sref_query | Query portion of session_initial_referrer | “” |
| sref_scheme | Scheme portion of session_initial_referrer | “https” |
| surl_utm_campaign | The utm_campaign specified in the session_initial_url | “subscriber_newsletter” |
| surl_utm_content | The utm_content specified in the session_initial_url | “template_a” |
| surl_utm_medium | The utm_medium specified in the session_initial_url | “email” |
| surl_utm_source | The utm_source specified in the session_initial_url | “newsletter_2020-08-07” |
| surl_utm_term | The utm_term specified in the session_initial_url | “footer” |
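Python’s urllib.parse.urlparse splits a URL into the same six components (scheme, netloc, path, params, query, fragment), so it can approximate these enrichments. This is a sketch, not Parse.ly’s implementation, and the helper name is hypothetical. The clean field is approximated by dropping params, query, and fragment. The domain field is omitted because it requires a TLD list such as the Public Suffix List. The prefix argument selects url, ref, surl, or sref.

```python
from urllib.parse import urlparse


def url_parts(url: str, prefix: str) -> dict:
    """Split a URL into RFC 1808-style components, keyed like the url_/ref_/surl_/sref_ fields."""
    parsed = urlparse(url)
    return {
        f"{prefix}_scheme": parsed.scheme,
        f"{prefix}_netloc": parsed.netloc,
        f"{prefix}_path": parsed.path,
        f"{prefix}_params": parsed.params,
        f"{prefix}_query": parsed.query,
        f"{prefix}_fragment": parsed.fragment,
        # Approximation of the *_clean field: keep scheme, netloc, and path only.
        f"{prefix}_clean": f"{parsed.scheme}://{parsed.netloc}{parsed.path}",
    }


print(url_parts("https://www.google.ca/", "ref")["ref_clean"])  # https://www.google.ca/
```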
Metadata
Whether crawled via JSON-LD or meta tags or passed directly in pixels (as is the case in Parse.ly’s video integration), metadata associated with the url field is passed along in a series of metadata_ fields:
| name | description | example value |
|---|---|---|
| metadata | Flag to indicate if metadata is available | true |
| metadata_authors | Array of authors for the post/video | [“Albert Einstein”, “Richard Feynman”] |
| metadata_canonical_url | The canonical URL of a post or, in the case of videos, the video ID | “http://www.example.com/article-1234” |
| metadata_pub_date_tmsp | Publish date of the post in milliseconds since the UNIX epoch | 1471392000000 |
| metadata_custom_metadata | String of optional custom metadata (for more information, see the integration docs) | “{“internal_post_id”: “2134”}” |
| metadata_section | Section the post/video was published in | “Physics” |
| metadata_tags | Array of tags associated with the post/video | [“science”, “physics”, “quantum mechanics”] |
| metadata_title | Title of the post/video | “Thoughts on Quantum Electrodynamics” |
| metadata_image_url | URL of the image for the post/video | “https://www.evernote.com/l/AAFSrhKOoExCqKji3f9BS9YKfZEC-yerafgB/image.png” |
| metadata_full_content_word_count | Word count of the post (not applicable to videos) | 1562 |
| metadata_data_source | How the metadata was collected, e.g., ‘crawl’ or ‘pixel’ | “crawl” |
| metadata_urls | The aliased URLs that the post lives on (e.g., Google AMP, http://m., main page) that reference the metadata_canonical_url | [“https://m.google.com/article”] |
| metadata_post_id | The post ID of the article; this is the unique ID of a post when metadata exists | 99999 |
| metadata_share_urls | The social share URLs of the post, from Facebook, LinkedIn, Pinterest, and Twitter | [“http://example.com/post”, “http://example.com/post”, “http://example.com/post”, “http://example.com/post”, “http://example.com/post”] |
| metadata_page_type | Type of page (e.g., post, section, frontpage) | “post” |
| metadata_save_date_tmsp | Save date of the post in milliseconds since the UNIX epoch | 1471392000000 |
| metadata_thumb_url | The URL of the thumbnail image for the post | “https://images.example.com/imagelocation” |
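In the example records above, metadata_custom_metadata holds a JSON document serialized as a string rather than a nested object, so it needs a second decoding step. Here is a minimal sketch, assuming the string is valid JSON when present. The helper name is hypothetical.

```python
import json
from typing import Any


def custom_metadata(event: dict) -> Any:
    """Decode the metadata_custom_metadata string, or return None if it is absent or invalid."""
    raw = event.get("metadata_custom_metadata")
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed custom metadata is skipped rather than raising


event = {"metadata_custom_metadata": '{"page": 1, "omnitureData": {"channel": "watercooler"}}'}
print(custom_metadata(event)["omnitureData"]["channel"])  # watercooler
```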
UA and device enrichments
Based on the user_agent field, Parse.ly enriches the following:
| name | description | example value |
|---|---|---|
| ua_browser | Browser derived from UA | “Mobile Safari” |
| ua_browserversion | Browser version derived from UA | “9.1.2” |
| ua_devicebrand | Device brand derived from UA | “Apple” |
| ua_devicemodel | Device model derived from UA | “iPhone” |
| ua_devicetouchcapable | Flag to indicate if the device is touch-capable | true |
| ua_devicetype | Device type (mobile/tablet/desktop) from UA | “mobile” |
| ua_os | Device operating system from UA | “iOS” |
| ua_osversion | Device operating system version from UA | “9.3” |
Parse.ly also provides information regarding the display of the device:
| name | description | example value |
|---|---|---|
| display | Flag to indicate if display info is available | true |
| display_avail_height | Available height of the display, in pixels (equivalent to JavaScript’s screen.availHeight property) | 877 |
| display_avail_width | Available width of the display, in pixels (equivalent to JavaScript’s screen.availWidth property) | 1436 |
| display_pixel_depth | Color resolution (in bits per pixel) | 24 |
| display_total_height | Total height of the display, in pixels | 900 |
| display_total_width | Total width of the display, in pixels | 1440 |
| slot | Flag to indicate if the slot position on the page is available | true |
UTM parameter enrichments
Based on the url field, Parse.ly enriches the following from its query parameters. Note that UTM parameters are a web-wide de facto standard for campaign tracking, first introduced by Urchin and Google Analytics. Google runs a free tool called the URL builder to build URLs with this format, but many tools will automatically add these parameters to allow for easier tracking, especially in places where HTTP referrers are not automatically set.
In this example, the article URL http://example.com/1234 was clicked from an email newsletter. The link might then have had query parameters like the following:
https://example.com/1234?utm_source=newsletter_2020-08-07&utm_medium=email&utm_term=footer&utm_content=template_a&utm_campaign=subscriber_newsletter
This URL would be parsed as follows:
| name | description | example value |
|---|---|---|
| campaign_id | Campaign identifier or name | “subscribers_email” |
| utm_campaign | Campaign identifier or name | “subscriber_newsletter” |
| utm_content | Template or style (e.g. for A/B tests) | “template_a” |
| utm_medium | Medium campaign ran on (e.g. email, social) | “email” |
| utm_source | The specific identifier for the source content | “newsletter_2020-08-07” |
| utm_term | A keyword or term associated with the click | “footer” |
UTM parameter tracking is powerful because it allows grouping, rollup, and slice-and-dice of campaigns, which often have associated costs and can be part of an ROI calculation. It also helps tremendously with decoding “direct” traffic; e.g., in many email service providers, the above click from an email newsletter would have no HTTP referrer set, and thus UTM parameters would be the only way to understand this traffic.
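Here is a sketch of extracting the five UTM parameters from the newsletter URL above with Python’s standard urllib.parse. The helper name is hypothetical. It does not reproduce campaign_id, whose derivation is not described here. Parse.ly’s own parsing may also differ in details such as case handling or repeated parameters.

```python
from urllib.parse import parse_qs, urlparse

UTM_KEYS = ("utm_campaign", "utm_content", "utm_medium", "utm_source", "utm_term")


def utm_fields(url: str) -> dict:
    """Extract the five standard UTM parameters from a URL's query string."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key; keep the first value, or None if absent.
    return {key: query.get(key, [None])[0] for key in UTM_KEYS}


newsletter_url = (
    "https://example.com/1234?utm_source=newsletter_2020-08-07&utm_medium=email"
    "&utm_term=footer&utm_content=template_a&utm_campaign=subscriber_newsletter"
)
print(utm_fields(newsletter_url)["utm_medium"])  # email
```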
Extra data
Arbitrary key-value pairs can be passed via Parse.ly’s dynamic tracking or Parse.ly’s implementation for custom segments. Such custom data may include subscriber information or IDs for use in joining to other data sources. In these situations, key/value pairs appear as a nested JSON object in the extra_data field.
As part of an ETL process, these fields can be “flattened” into the root document format for inclusion in downstream databases storing Parse.ly raw data.
"action" : "_scroll", "extra_data" : {"_y" : 1430}
In this example, a custom event (_scroll) was sent to Parse.ly’s Data Pipeline with associated custom data {"_y": 1430} representing 1,430 pixels on the y-axis of scroll-depth within the browser. This kind of raw data could be used to implement scroll depth tracking.
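Here is a minimal sketch of the flattening step described above. The helper name and the extra_data_ prefix for promoted keys are assumptions; the prefix avoids collisions with built-in columns such as action. Only the top level of extra_data is promoted, and nested values are kept as they are.

```python
from typing import Any, Dict


def flatten_extra_data(event: Dict[str, Any], prefix: str = "extra_data_") -> Dict[str, Any]:
    """Return a copy of event with extra_data keys promoted to top-level columns."""
    flat = {key: value for key, value in event.items() if key != "extra_data"}
    for key, value in (event.get("extra_data") or {}).items():
        # The prefix (an assumption) keeps custom keys apart from built-in columns.
        flat[f"{prefix}{key}"] = value
    return flat


print(flatten_extra_data({"action": "_scroll", "extra_data": {"_y": 1430}}))
# {'action': '_scroll', 'extra_data__y': 1430}
```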
Other possibilities
This raw data schema is already quite rich and supports many queries not available in the Parse.ly Dashboard or APIs. Additional possibilities for storing data in raw events include:
- subscriber identifiers, to do detailed loyalty analysis
- more granular information about on-page or in-app activities
- a specialized set of query parameters for social virality modeling
- ad impression or revenue data
- and other custom data
Next steps
Read on for Code Examples.
Or, get help from Parse.ly:
- For existing Parse.ly customers, contact Parse.ly to discuss advanced use cases for raw data.
- For organizations that are not yet Parse.ly customers, start with the basic integration or schedule a demo to learn about advanced use cases Parse.ly customers have implemented.
Looking for a previous schema version?
This documentation refers to the latest version of Parse.ly’s Data Pipeline schema, v2.3.0. For earlier versions of the Data Pipeline (data prior to October 2019), see the legacy schema documentation.
Last updated: December 31, 2025