<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Aakash Nand on Medium]]></title>
        <description><![CDATA[Stories by Aakash Nand on Medium]]></description>
        <link>https://medium.com/@aakashnand?source=rss-bf34303c5541------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*w8qOaEfNa5jzO5oSJfovVw.jpeg</url>
            <title>Stories by Aakash Nand on Medium</title>
            <link>https://medium.com/@aakashnand?source=rss-bf34303c5541------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 20 Apr 2026 06:41:45 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@aakashnand/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Thanks for the feedback. I will update it soon]]></title>
            <link>https://aakashnand.medium.com/thanks-for-the-feedback-i-will-update-it-soon-1f00a9b178dc?source=rss-bf34303c5541------2</link>
            <guid isPermaLink="false">https://medium.com/p/1f00a9b178dc</guid>
            <dc:creator><![CDATA[Aakash Nand]]></dc:creator>
            <pubDate>Tue, 08 Feb 2022 14:41:44 GMT</pubDate>
            <atom:updated>2022-02-08T14:41:44.100Z</atom:updated>
            <content:encoded><![CDATA[<p>Thanks for the feedback. I will update it soon</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Integrating Trino and Apache Ranger]]></title>
            <link>https://medium.com/data-science/integrating-trino-and-apache-ranger-b808f6b96ad8?source=rss-bf34303c5541------2</link>
            <guid isPermaLink="false">https://medium.com/p/b808f6b96ad8</guid>
            <category><![CDATA[trinos]]></category>
            <category><![CDATA[apache-ranger]]></category>
            <category><![CDATA[data-security]]></category>
            <category><![CDATA[data-analysis]]></category>
            <category><![CDATA[data-engineering]]></category>
            <dc:creator><![CDATA[Aakash Nand]]></dc:creator>
            <pubDate>Mon, 27 Sep 2021 09:52:26 GMT</pubDate>
            <atom:updated>2022-05-20T08:54:39.875Z</atom:updated>
            <content:encoded><![CDATA[<h4>Understanding how to configure Apache Ranger and Trino for data security.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*-ytswvVrCNbU6XR9" /><figcaption>Photo by <a href="https://unsplash.com/@flyd2069?utm_source=medium&amp;utm_medium=referral">FLY:D</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4acldmX1IGkzOzzJSj0vcg.png" /><figcaption>Image by author, inspired by logos that are trademarks of Trino and the Apache Software Foundation (left to right)</figcaption></figure><h3>Part-I: Concepts and Ideas</h3><h4>Background</h4><p>As demand for data grows day by day, the requirement for data security in an enterprise setup is increasing as well. In the Hadoop ecosystem, Apache Ranger has been a promising framework for data security with extensive plugins such as HDFS, Solr, Yarn, Kafka, Hive and many more. <a href="https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+2.1.0+-+Release+Notes">Apache Ranger added a plugin for prestosql in version 2.1.0</a>, but <a href="https://trino.io/blog/2020/12/27/announcing-trino.html">PrestoSQL was recently rebranded as Trino</a>, and that broke the working prestosql plugin for Apache Ranger.</p><p>I have submitted a patch for this issue, and there is already an open JIRA issue <a href="https://issues.apache.org/jira/browse/RANGER-3182">here</a>, but that will not stop us from integrating Trino with Apache Ranger. For this tutorial, I have built Apache Ranger 2.1.0 with the Trino plugin. 
If you want to build Apache Ranger from source, including the Trino plugin, you can refer to <a href="https://github.com/aakashnand/ranger/tree/ranger-2.1.0-trino">this</a> GitHub repository on the branch ranger-2.1.0-trino. For this tutorial, we will use <a href="https://github.com/aakashnand/trino-ranger-demo">this</a> GitHub repository.</p><blockquote>Update: 2022-05-20</blockquote><blockquote>The Trino plugin is now officially available in the Ranger repository and is released in Apache Ranger 2.3: <a href="https://github.com/apache/ranger/tree/ranger-2.3">https://github.com/apache/ranger/tree/ranger-2.3</a></blockquote><h4>Introduction to Components and Key Ideas</h4><p>Apache Ranger has three key components: ranger-admin, ranger-usersync and ranger-audit. Let us get introduced to these components.</p><p><em>Note: Configuring ranger-usersync is out of scope for this tutorial, and we will not use the usersync component.</em></p><h4>Ranger Admin</h4><p>The Ranger Admin component is a UI through which we can create policies for different access levels. Ranger Admin requires a backend database; in our case we are using Postgres as the backend database for the Ranger Admin UI.</p><h4>Ranger Audit</h4><p>The Ranger Audit component collects and displays logs for each access event on a resource. Ranger supports two audit methods, solr and elasticsearch. We will use elasticsearch to store the Ranger audit logs, which will then be displayed in the Ranger Audit UI as well.</p><h4>Trino</h4><p>Trino is a fast distributed query engine. It can connect to several data sources such as hive, postgres, oracle and so on. You can read more about Trino and Trino connectors in the official documentation <a href="https://trino.io/docs/current/">here</a>. 
For this tutorial, we will use the default catalog tpch, which comes with dummy data.</p><h4>Trino-Ranger-Plugin</h4><p>Apache Ranger supports many plugins such as HDFS, Hive, Yarn, Trino etc. Each of these plugins needs to be configured on the host which is running that process. Trino-Ranger-Plugin is the component that communicates with Ranger Admin to check and download the access policies, which are then synced with the Trino server. The downloaded policies are stored as JSON files on the Trino server under the path /etc/ranger/&lt;service-name&gt;/policycache, so in this case the policy path is /etc/ranger/trino/policycache.</p><p>The communication between the above components is explained in the following diagram.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5NkxKbCfavJSQcGsonpm8A.png" /><figcaption>Image by author</figcaption></figure><p>The docker-compose file connects all of the above components.</p><p>Important points about docker-compose.yml:</p><ol><li>We have used named Docker volumes (e.g. ranger-es-data, ranger-pg-data) to persist the data of services such as elasticsearch and postgres even after a container restart.</li><li>The pre-built tar files of Ranger-Admin and Ranger-Trino Plugin are available as release assets on this demo repository <a href="https://github.com/aakashnand/trino-ranger-demo/releases/tag/trino-ranger-demo-v1.0">here</a>.</li><li>The Ranger-Admin <a href="https://cwiki.apache.org/confluence/display/RANGER/Ranger+Installation+Guide#RangerInstallationGuide-Prerequisites">process requires a minimum of 1.5 GB of memory</a>. The Ranger-Admin tar file contains install.properties and setup.sh. The setup.sh script reads the configuration from install.properties. 
The following patch file describes the configuration changes made to install.properties, compared to the default version of install.properties, for the Ranger-Admin component.</li></ol><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2a985cbf60363983af61556bef69cda8/href">https://medium.com/media/2a985cbf60363983af61556bef69cda8/href</a></iframe><p>4. The Ranger-Trino-Plugin tar file also contains install.properties and the enable-trino-plugin.sh script. <strong>One important point to note about the Trino docker environment is that the configuration files and the plugin directory are in different directory locations</strong>. The configuration is read from /etc/trino, whereas plugins are loaded from /usr/lib/trino/plugins. These two directories are important when configuring install.properties for Trino-Ranger-Plugin, and hence some extra customization is required to the default enable-trino-plugin.sh script that comes with the Trino-Ranger-Plugin tar file to make it work with dockerized Trino. These changes are highlighted in the following patch file. Basically, these changes introduce two new custom variables, INSTALL_ENV and COMPONENT_PLUGIN_DIR_NAME, which can be configured in install.properties.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/7d36a5c8f6e826da08ecd7c30834ad3a/href">https://medium.com/media/7d36a5c8f6e826da08ecd7c30834ad3a/href</a></iframe><p>5. The install.properties file for Trino-Ranger-Plugin needs to be configured as shown in the following patch file. Please note that we are using the two newly introduced custom variables to inform the enable-plugin script that Trino is deployed in the docker environment.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2e5fd63d903875a18f3948679d434a75/href">https://medium.com/media/2e5fd63d903875a18f3948679d434a75/href</a></iframe><p>6. 
Finally, we put it all together in docker-compose.yml as shown below. This file is also available in the GitHub repository <a href="https://github.com/aakashnand/trino-ranger-demo/blob/main/docker-compose.yml">here</a>.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/54404417eeb7415854a3e917af2239ca/href">https://medium.com/media/54404417eeb7415854a3e917af2239ca/href</a></iframe><h3>Part-II: Setup and Initializing</h3><p>In this part, we will deploy the docker-compose services and confirm the status of each component.</p><h4>Step 1: Cloning repository</h4><pre>git clone <a href="https://github.com/aakashnand/trino-ranger-demo.git">https://github.com/aakashnand/trino-ranger-demo.git</a></pre><h4>Step 2: Deploy docker-compose</h4><pre>$ cd trino-ranger-demo<br>$ docker-compose up -d</pre><p>Once we deploy the services using docker-compose, we should be able to see four running services. We can confirm this by running docker-compose ps.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XzJrrx4UzC8TEDWUbQMsUA.png" /></figure><h4>Step 3: Confirm Services</h4><p>Let’s confirm that the Trino, Ranger-Admin and Elasticsearch services are accessible on the following URLs:</p><p>Ranger Admin: <a href="http://localhost:6080">http://localhost:6080</a></p><p>Trino: <a href="http://localhost:8080">http://localhost:8080</a></p><p>Elasticsearch: <a href="http://localhost:9200">http://localhost:9200</a></p><h4>Step 4: Create Trino service from Ranger-Admin</h4><p>Let&#39;s access the Ranger-Admin UI and log in as the admin user. We configured the admin user password as rangeradmin1 in the above ranger-admin-install.properties file. As we can see in the following screenshot, by default, there is no trino service. Therefore, let&#39;s create a service with the name trino. 
<strong>The service name should match the name defined in install.properties for Ranger-Admin.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2OUlNxXgSc7PE7yGoQvFRg.png" /></figure><p><strong>Please note the hostname in the JDBC string. From the ranger-admin container, Trino is reachable at my-localhost-trino, hence the hostname is configured as my-localhost-trino.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/937/1*IQ8xPQakHAWtjr213avISA.png" /></figure><p>If we click on Test Connection we will get a <em>Connection Failed</em> error as shown below. This is because the Ranger-Admin process is already running and is still looking for a service with the name trino, which we have not created yet. It will be created once we click Add.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3CCTPmfLyZHE59VD-N0BEA.png" /></figure><p>So let&#39;s add the trino service and then click Test Connection again.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qJSIAlXGJYh4KgKZNslsfw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/969/1*Pp4z1lq1lWPwV3q3v4hT-Q.png" /></figure><p>Now Ranger-Admin is successfully connected to Trino 🎉</p><h4>Step 5: Confirm Ranger-Audit Logs</h4><p>To check the audit logs, click Audit in the top navigation bar. We can see that audit logs are displayed 🎉. Ranger-Admin and Elasticsearch are working correctly.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oNW9kH_wRTZR_QJeeZ3dWA.png" /></figure><h3>Part-III: Seeing it in Action</h3><p>Now that we have finished the setup, it is time to create actual access policies and see them in action.</p><ul><li>When creating the trino service we used ranger-admin as the username in the connection information. 
This creates default policies with this username, and thus the ranger-admin user will have super privileges.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UT9I0-xMq7TpahDKI6a-7g.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*K89Ak-1MtCF_Htr1jxtZdg.png" /></figure><p>To understand the access scenario and create an access policy we need to create a test user. The Ranger usersync service syncs users, groups, and group memberships from various sources, such as Unix, File, or AD/LDAP into Ranger. Ranger usersync provides a set of rich and flexible configuration properties to sync users, groups, and group memberships from AD/LDAP supporting a wide variety of use cases. In this tutorial, we will manually create a test user from the Ranger-Admin UI.</p><h4>Step 1: Create test-user from Ranger-Admin</h4><p>To create a user, let’s navigate to Settings → Users/Groups/Roles → Add New User.</p><p>When creating a user we can choose different roles.</p><ul><li>The User role is the normal user.</li><li>The Admin role can create and manage policies from the Ranger Admin UI.</li><li>The Auditor role is a read-only user role.</li></ul><p>For the time being, let’s create a user with the Admin role.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/844/1*bsYoAOLkge645EKVQv_Vog.png" /></figure><h4>Step 2: Confirm access for test-user and ranger-admin</h4><p>Let&#39;s confirm access for the user ranger-admin.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SecBiC49A52LGeQJjOlkAg.png" /></figure><p>As we can see, the ranger-admin user can access all the tables under the schema tpch.sf10.</p><p>Since we have not configured any policy for test-user, if we try to access any catalog or execute any query, we should see an <em>access denied</em> message. 
Let&#39;s confirm this by executing queries from the <a href="https://trino.io/docs/current/installation/cli.html">Trino CLI</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*732kryA_exaURWuIc5X92g.png" /></figure><h4>Step 3: Allow access to test-user to all tables under schema tpch.sf10</h4><p>Let’s create a policy that allows test-user access to <strong>all</strong> tables under tpch.sf10.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uAPrRiwGTaoFiHdPIjvTYw.png" /></figure><p>We can also assign specific permissions on each policy, but for the time being let&#39;s create a policy with all permissions. After creating this policy, we have the following active policies.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Dj-3muiXK2X5e7SvTBt1vA.png" /></figure><p>Now let’s confirm the access again.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4paodwK3WPYVxz8GuKvaIg.png" /></figure><p>We are still getting an <em>access denied</em> message. This is because Trino Ranger policies need to be configured at each object level: for example, a catalog-level policy, a catalog+schema-level policy, a catalog+schema+table-level policy and an information_schema policy. Let&#39;s add a policy for the catalog level.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*F5xdrQ9QO0KXlXOW-O2KFA.png" /></figure><p>Let&#39;s confirm again with the Trino CLI.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KZe-LY_V8g3dBPsE4Y3kAg.png" /></figure><p>We are still getting an error, but the error message is different. Let&#39;s navigate to the Ranger Audit section to understand more about this.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*khwvQA0gy56owRcjddoHrg.png" /></figure><p>We can see an entry that denied permission to a resource called tpch.information_schema.tables.table_schema. 
In Trino, information_schema is the schema that contains metadata about tables and table columns, so it is necessary to add a policy for information_schema as well. Access to information_schema is required for any user to execute a query in Trino; therefore, we can use the {USER} variable in the Ranger policy, which gives access to all users.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LN_biqBYHM3EpRA5r5BSmQ.png" /></figure><p>Let us confirm the access from the Trino CLI again.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*InMXWfcjIfqwLgMc2bKkvQ.png" /></figure><p>We still get <em>access denied</em> if we try to execute any SQL function. In the default policies section, the all-functions policy (ID:3) is the policy that allows access to execute any SQL function. Since executing SQL functions is a requirement for all users, let’s edit the all-functions policy (ID:3) and add all users using the {USER} variable to give access to functions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OEstyHiTuT2ULbPsgoKakA.png" /></figure><p>So to summarize, to give test-user access to <strong>ALL</strong> tables under sf10 we added three new policies and edited the default all-functions policy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Z0sA_g1c7rjp3d3eZtgv-Q.png" /></figure><p>Now we can access and execute queries for all tables in the sf10 schema.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PArWWMXV1KI1DKAW1fTkCg.png" /></figure><p>In the next step, let’s understand how to give test-user access to a specific table under the schema sf10.</p><h4>Step 4: Giving access to a specific table under sf10 schema</h4><p>In the previous step, we configured policies to give access to <strong>ALL</strong> tables under the sf10 schema, and therefore a schema-level policy was not necessary. 
To give access to a specific table, we need to add a schema-level policy and then configure the table-level policy. So let us add a schema-level policy for tpch.sf10.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7djM2JKqnzC0ndSsgoz03Q.png" /></figure><p>Now let us edit sf10-all-tables-policy from all tables to a specific table. We will configure a policy that allows access to only the nation table.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Fj3a3JtIGguRJDKs9d69LQ.png" /></figure><p>So finally we have the following active policies.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qygAoag1xpr9Lyaui9fuhw.png" /></figure><p>Now let&#39;s execute queries from the Trino CLI again for test-user.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KaQb8DRWdx8j1-rL_9Y9uA.png" /></figure><p>test-user can now access only the nation table from the tpch.sf10 schema, as desired.</p><p>If you have followed all the steps and reached the end, congratulations ㊗️, you now understand how to configure Trino and Apache Ranger.</p><h4>Part-IV: Key Takeaways and Conclusion</h4><ul><li>After the rebranding from PrestoSQL to Trino, the default plugin from Apache Ranger’s GitHub repository will NOT work with the new Trino as it is still referencing the old io.prestosql packages. You can track this issue on JIRA <a href="https://issues.apache.org/jira/browse/RANGER-3182">here</a>.</li><li>The rebranded Trino plugin will not be made available in the new Ranger version 2.2.0. So meanwhile, please feel free to use <a href="https://github.com/aakashnand/ranger/tree/ranger-2.1.0-trino">this</a> GitHub repository for building Apache Ranger from source code and <a href="https://github.com/aakashnand/trino-ranger-demo">this</a> GitHub repository for getting started with Trino-Ranger integration.</li><li>Configuring Ranger policies for Trino is not so intuitive because we need to configure access policies for each level. 
There is an open issue regarding this on Trino’s repository <a href="https://github.com/trinodb/trino/issues/1076">here</a>.</li><li>Nonetheless, it is recommended to configure some basic policies such as information_schema and all-functions with the {USER} variable, as these policies are necessary for any user to execute queries.</li></ul><p>Due to the lack of good documentation and the unintuitive nature of the integration process, integrating Apache Ranger and Trino can be painful, but I hope this article makes it a bit easier. If you are using Trino, I highly recommend joining the <a href="https://trino.io/slack.html">Trino Community Slack</a> for more detailed discussions. Thank you for reading.</p><hr><p><a href="https://medium.com/data-science/integrating-trino-and-apache-ranger-b808f6b96ad8">Integrating Trino and Apache Ranger</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>