<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Data Querying - Medium]]></title>
        <description><![CDATA[Modernizing &amp; Simplifying how to Query Data - Medium]]></description>
        <link>https://medium.com/data-querying?source=rss----6a073e14e4e---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Data Querying - Medium</title>
            <link>https://medium.com/data-querying?source=rss----6a073e14e4e---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 07 Jun 2026 15:45:50 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/data-querying" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Calling a Transformer ML Model directly via SQL to predict sentiments]]></title>
            <link>https://medium.com/data-querying/calling-a-transformer-ml-model-directly-via-sql-to-predict-sentiments-70996245ebfc?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/70996245ebfc</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[hugging-face]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[sql]]></category>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Sat, 07 Jan 2023 06:29:40 GMT</pubDate>
            <atom:updated>2023-01-18T13:43:52.438Z</atom:updated>
            <content:encoded><![CDATA[<h4>Tutorial on applying a Hugging Face Machine Learning model directly to some table data via Spark SQL UDFs and MLflow</h4><p><a href="https://mlflow.org/">MLflow</a> and <a href="https://spark.apache.org/">Apache Spark</a> shine for manipulating your data. Let’s focus on showing how an already existing ML model, here the popular <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english">distilbert</a>, can be made available in SQL.</p><p>Democratize your ML: SQL is much simpler to use than regular Python. What if the model were easily available to your SQL user base?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/528/0*gfjZ8hvkwH9whsuu.png" /><figcaption>Applying prediction directly on columns in a table</figcaption></figure><p>Here is the high level architecture:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/669/0*dUSmpfmrFK_kyh-E.png" /><figcaption>Registering the model into MLflow then Spark</figcaption></figure><p>First we pull the model from the Hugging Face Hub and register it in MLflow:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5b525a3b07ebea995e0a581ea765cf58/href">https://medium.com/media/5b525a3b07ebea995e0a581ea765cf58/href</a></iframe><p>Then via this Notebook we demo how to make the model available as a function that can directly be called in SQL queries.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d411e36faa1593547c3c7434642dc65f/href">https://medium.com/media/d411e36faa1593547c3c7434642dc65f/href</a></iframe><p>The code is available in this <a href="https://github.com/romainr/data-demo/tree/master/ml2sql/huggingface2sql">demo repository</a>. 
Next time we will see how to build our own model!</p><p>And that’s it, happy predicting!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=70996245ebfc" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-querying/calling-a-transformer-ml-model-directly-via-sql-to-predict-sentiments-70996245ebfc">Calling a Transformer ML Model directly via SQL to predict sentiments</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Serving a Transformer model converting Text to SQL with Huggingface and MLflow]]></title>
            <link>https://medium.com/data-querying/serving-a-transformer-model-converting-text-to-sql-with-huggingface-and-mlflow-be831ae6213c?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/be831ae6213c</guid>
            <category><![CDATA[smart-querying]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[mlflow]]></category>
            <category><![CDATA[hugging-face]]></category>
            <category><![CDATA[tutorial]]></category>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Sun, 24 Oct 2021 05:23:01 GMT</pubDate>
            <atom:updated>2023-01-07T16:29:02.811Z</atom:updated>
            <content:encoded><![CDATA[<p>As machine learning continues to mature, here is an intro on how to use a <a href="https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html">T5 model</a> to generate SQL queries from text questions and serve it via a REST API.</p><p><strong>Update</strong>: Follow-up post about <a href="https://medium.com/data-querying/calling-a-transformer-ml-model-directly-via-sql-to-predict-sentiments-70996245ebfc">using the Model directly in SQL</a></p><p>Machine Learning for code completion got a lot of press with the release of <a href="https://openai.com/blog/openai-codex/">OpenAI Codex</a> which powers <a href="https://copilot.github.com/">GitHub Copilot</a>. Many companies are tackling this problem and progress is now quicker thanks to better tooling and techniques.</p><p>In the <a href="https://medium.com/data-querying/10-years-of-data-querying-experience-evolution-with-hue-b005382f5685">10 years of evolution</a> of the Hue SQL Editor, investing and switching to a <a href="https://gethue.com/brand-new-autocompleter-for-hive-and-impala/">parser based autocomplete</a> was one of the top three best decisions. The <a href="https://docs.gethue.com/developer/components/parsers/">parsers</a> have even been reused by most of the competitors. 
This was done five years ago and now new (complementary) approaches are worth investigating.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/961/1*lPVd-NSLZfoEd0TH1KNUPQ.gif" /><figcaption>Starting the MLflow server and calling the model to generate a corresponding SQL query to the text question</figcaption></figure><p>Here are three SQL topics that could be simplified via ML:</p><ul><li>Text to SQL → a text question gets converted into an SQL query</li><li>SQL to Text → getting help on understanding what a SQL query is doing</li><li>Table Question Answering → literally ask questions on a grid dataset</li></ul><p>Let’s start with the generation of an SQL query from a text question.</p><p>For this we pick an existing model named <a href="https://huggingface.co/dbernsohn/t5_wikisql_SQL2en">dbernsohn/t5_wikisql_SQL2en</a>.</p><p>Most of the <em>difficult work</em> has already been done by building the model and fine-tuning it on the <a href="https://github.com/salesforce/WikiSQL">WikiSQL dataset</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/949/1*r4ko9h8cAZQOjYt4c0UlbQ.png" /><figcaption>Invocation of the prediction service REST API via curl</figcaption></figure><p>Let’s run the model with a simple question:</p><pre>&gt; python text2sql.py predict --query=&quot;How many people live in the USA?&quot;<br><br>&quot;SELECT COUNT Live FROM table WHERE Country = united states AND Name: text&quot;</pre><p>Bonus: this quick CLI based on a <a href="https://medium.com/data-querying/quickly-building-a-command-line-interface-for-your-web-service-ef08253f9a12">previous tutorial</a> makes it easy to interact with the model</p><p>Obviously the results are not pixel perfect and a lot more can be done but this is a good start. 
Now let’s see how serving the model as an API works:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/599/1*WXBS72ZU2rpmijZLVDK7lA.png" /><figcaption>Pulling a trained Text2SQL model M2 from Huggingface Hub and using MLflow to register it as experiments and serve them via a REST API</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/949/1*r4ko9h8cAZQOjYt4c0UlbQ.png" /><figcaption>curl command asking the model to predict the SQL from a text question</figcaption></figure><p>For this we will use MLflow which provides a lot of the glue to automate the tedious engineering management of <a href="https://mlflow.org/docs/latest/models.html#model-customization">ML models</a>.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/3e50adaac9c053e62ad28b5573d0d167/href">https://medium.com/media/3e50adaac9c053e62ad28b5573d0d167/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ccb77c71a1a5b4ea1286f69e3a4927a2/href">https://medium.com/media/ccb77c71a1a5b4ea1286f69e3a4927a2/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b96ad2d22b877c03f19d89bf3bbdc581/href">https://medium.com/media/b96ad2d22b877c03f19d89bf3bbdc581/href</a></iframe><p>The API only runs locally here but MLflow can <a href="https://mlflow.org/docs/latest/models.html#built-in-deployment-tools">automate the pushes and deploys</a> of the models in production environments. 
In our case we just want to register it:</p><pre>python text2sql.py train</pre><p>And after starting the mlflow ui we can see the experiment:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FZZzDPo4_chH3vpGfuvM7w.png" /><figcaption>Registering the small size model</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*E4QcvlLok-kt43BrQVk4MQ.png" /><figcaption>Seeing some of the model metadata as well as how to load it. Note that more options like Schemas and registering in the Model Registry are available.</figcaption></figure><p>Now we select the iteration we want to serve:</p><pre>mlflow models serve -m /home/romain/projects/romain/text2sql/mlruns/0/efec45c930714e3581033699e011df51/artifacts/model -p 5001</pre><p>And then we can directly query it!</p><pre>curl -X POST -H &quot;Content-Type:application/json; format=pandas-split&quot; --data &#39;{&quot;columns&quot;:[&quot;text&quot;],&quot;data&quot;:[[&quot;How many people live in the USA?&quot;]]}&#39; http://127.0.0.1:5001/invocations<br><br>&quot;SELECT COUNT Live FROM table WHERE Country = united states AND Name: text&quot;</pre><p>And that’s it!</p><p>The project is in a <a href="https://github.com/romainr/query-demo/tree/master/huggingface-mlflow">GitHub repo</a>. 
As a follow-up you can also find a detailed example of how to <a href="https://databricks.com/blog/2021/10/18/mlflow-for-bayesian-experiment-tracking.html">manage a Bayesian Model with MLflow</a>.</p><p>In the next episodes we will see how to integrate the ML API into your own <a href="https://medium.com/data-querying/build-your-own-sql-editor-byoe-in-5-minutes-805ba7534e37">SQL Editor</a> and improve the model!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=be831ae6213c" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-querying/serving-a-transformer-model-converting-text-to-sql-with-huggingface-and-mlflow-be831ae6213c">Serving a Transformer model converting Text to SQL with Huggingface and MLflow</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Hosting a Static Website]]></title>
            <link>https://medium.com/data-querying/hosting-a-static-website-e53e6d321bb?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/e53e6d321bb</guid>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Wed, 06 Oct 2021 17:51:50 GMT</pubDate>
            <atom:updated>2021-10-06T17:51:50.313Z</atom:updated>
            <content:encoded><![CDATA[<p>In 2021, here are some quick and efficient solutions for hosting.</p><p>These days serving a basic website should not take much of your time. Note that what worked for me does not necessarily mean that this is the best for you and vice versa!</p><p>Here is what I tried lately:</p><h4>Kubernetes and Let’s Encrypt</h4><p>It is what has been done for gethue.com and all its services like demo.gethue.com, docs.gethue.com, cdn, helm… It is overkill but a great way to <a href="https://github.com/cloudera/hue/tree/master/tools/kubernetes/helm/website">understand</a> how services can be operated and also 100% self contained (SSL included, <a href="https://medium.com/data-querying/website-updates-without-downtime-c252f8784276">seamless</a> <a href="https://medium.com/data-querying/performing-automated-upgrades-of-services-after-a-code-change-f4c0a92933f8">auto upgrade</a> after a change) which is very handy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IQ52cF6gDts_OMyJTnVp1A.png" /><figcaption><a href="http://gethue.com">gethue.com</a></figcaption></figure><h4>Google Cloud Storage</h4><p>Similar to the AWS S3 public hosting (there are lots of other solutions in AWS too) and looked easy to try despite a non-intuitive setup. But there is no way to get HTTPS simply or for free for a custom domain name, so I dropped it (but it is good for a CDN).</p><h4>Netlify</h4><p>Should be one of the easiest. Indeed, it was very quick to sign up, then just drag &amp; drop the files and transfer the domain. It is famous for integrating with GitHub. 
Probably what I would recommend for an open source website.</p><h4>Firebase</h4><p>I did not know about it but the Google Storage docs mention it as an alternative for easy support of HTTPS and custom domain name.</p><p><a href="https://firebase.google.com/docs/hosting/serverless-overview">Serve dynamic content and host microservices using Firebase Hosting</a></p><p>And it was very easy to use:</p><pre>firebase login<br>firebase projects:list<br>firebase init</pre><p>And preview/deploy:</p><pre>firebase emulators:start<br>firebase deploy</pre><p>Next step will be to fully automate the push of the updates to the live Website. We will see how the <a href="https://firebase.google.com/docs/hosting/github-integration">Github Action</a> performs in practice or re-poke at Netlify!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e53e6d321bb" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-querying/hosting-a-static-website-e53e6d321bb">Hosting a Static Website</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[To the Next Generation of Data Querying!]]></title>
            <link>https://medium.com/data-querying/to-the-next-generation-of-data-querying-eca78cd52a64?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/eca78cd52a64</guid>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Tue, 05 Oct 2021 04:29:04 GMT</pubDate>
            <atom:updated>2021-10-12T16:31:17.178Z</atom:updated>
            <content:encoded><![CDATA[<p>After close to <a href="https://medium.com/data-querying/10-years-of-data-querying-experience-evolution-with-hue-b005382f5685">10 years of evolution</a> on the <a href="https://gethue.com/">Hue project</a> at Cloudera (and many <a href="https://gethue.com/team-retreat-in-the-phillipines/">team</a> <a href="https://gethue.com/team-retreat-in-the-caribbean-curacao/">retreats</a> <a href="https://gethue.com/team-retreat-in-vietnam/">over</a> <a href="https://gethue.com/team-retreat-in-nicaragua-and-belize/">the</a> <a href="https://gethue.com/hue-team-retreat-thailand/">world</a>) I am joining <a href="https://databricks.com/">Databricks</a> to help make Querying Data Ubiquitous and Simple.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*2BbC_GHb23x70a9w.png" /></figure><p>Interestingly, one of the first <a href="https://spark.apache.org/sql/">Spark SQL</a> querying experiences was pioneered in the early days with the <a href="https://gethue.com/spark-notebook-and-livy-rest-job-server-improvements/">Livy API in Hue</a> and promoted in the first <a href="https://gethue.com/spark-summit-europe-building-a-rest-job-server-for-interactive-spark-as-a-service/">Spark Summits</a>.</p><p>After all these years, we also got better at shipping a robust core of SQL functionalities and developing software in a <a href="https://medium.com/data-querying/tagged/query-service">much more efficient</a> way via automation, API and <a href="https://medium.com/data-querying/build-your-own-sql-editor-byoe-in-5-minutes-805ba7534e37">Components</a>.</p><p><a href="https://gethue.com/">Hue SQL</a> + <a href="https://redash.io/">Redash</a> pave the way for modern Querying.</p><blockquote>Query Flow: Smarter Data Querying bridging SQL to ML</blockquote><p>On top of this, AI matured to power the next Generation of Editors and Smart Autocompletes (e.g. 
<a href="https://copilot.github.com/">GitHub Copilot</a>, <a href="https://www.tabnine.com/">Tabnine</a>…), and Data Warehousing can provide easy data access for training ML models and executing inferences via SQL itself.</p><p>The direction of Hue is still to be determined with regard to the new role and <a href="mailto:hello@getromain.com">any feedback</a> is welcome!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/230/0*QHB-WyIbqpYXVlHV.png" /><figcaption>The Hue logo</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5yZAG_BD-0Y2bjC9cOVFzg.png" /><figcaption>50% of Hue <a href="https://github.com/cloudera/hue/graphs/contributors">contributions</a> over the years while growing the Team/Project</figcaption></figure><blockquote>Passion is strong, only Experience can beat it, and now Passion + Experience should help deliver the next Level of Query Flow!</blockquote><p>In the meantime, check out the current <a href="https://databricks.com/product/databricks-sql"><strong>DB SQL</strong></a>!</p><p>Happy Querying!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XNUMz3wlvWINfx1g7zPQwQ.jpeg" /><figcaption>Onwards!</figcaption></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=eca78cd52a64" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-querying/to-the-next-generation-of-data-querying-eca78cd52a64">To the Next Generation of Data Querying!</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Spark Summit Europe: Building a REST Job Server for interactive Spark as a service]]></title>
            <link>https://medium.com/data-querying/spark-summit-europe-building-a-rest-job-server-for-interactive-spark-as-a-service-70143f941127?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/70143f941127</guid>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Tue, 28 Sep 2021 17:59:26 GMT</pubDate>
            <atom:updated>2021-09-28T17:58:57.650Z</atom:updated>
            <content:encoded><![CDATA[<p><em>Initially published on </em><a href="https://gethue.com/spark-summit-europe-building-a-rest-job-server-for-interactive-spark-as-a-service/"><em>https://gethue.com/spark-summit-europe-building-a-rest-job-server-for-interactive-spark-as-a-service/</em></a><em> on 28 October 2015.</em></p><h3><a href="https://spark-summit.org/eu-2015/events/building-a-rest-job-server-for-interactive-spark-as-a-service/">Building a REST Job Server for interactive Spark as a service</a></h3><p>Livy is a new open source Spark REST Server for submitting and interacting with your Spark jobs from anywhere. Livy is conceptually based on the incredibly popular IPython/Jupyter, but implemented to better integrate into the Hadoop ecosystem with multi users. Spark can now be offered as a service to anyone in a simple way: Spark shells in Python or Scala can be run by Livy in the cluster while the end user is manipulating them at their own convenience through a REST API. Regular non-interactive applications can also be submitted. The output of the jobs can be introspected and returned in a tabular format, which makes it visualizable in charts. Livy can point to a unique Spark cluster and create several contexts by users. With YARN impersonation, jobs will be executed with the actual permissions of the users submitting them. Livy also enables the development of Spark Notebook applications. Those are ideal for quickly doing interactive Spark visualizations and collaboration from a Web browser! This talk is technical and details the architecture and design decisions taken for developing this server, as well as its internals. It also describes the alternatives we tried and the challenges that were faced. 
The capabilities of Livy will then be demoed live in Hue’s Notebook Application through a real life scenario.</p><p>Examples:</p><ul><li><a href="https://gethue.com/how-to-use-the-livy-spark-rest-job-server-for-interactive-spark/">Interactive shells</a></li><li><a href="https://gethue.com/how-to-use-the-livy-spark-rest-job-server-api-for-sharing-spark-rdds-and-contexts/">Sharing RDDs</a></li><li><a href="https://gethue.com/how-to-use-the-livy-spark-rest-job-server-api-for-submitting-batch-jar-python-and-streaming-spark-jobs/">Batch jobs</a></li><li><a href="https://gethue.com/new-notebook-application-for-spark-sql/">Notebook</a></li></ul><p><a href="https://www.slideshare.net/gethue/spark-summit-europe-building-a-rest-job-server-for-interactive-spark-as-a-service"><strong>Spark Summit Europe: Building a REST Job Server for interactive Spark as a service</strong></a> from <a href="https://www.slideshare.net/gethue"><strong>gethue</strong></a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*nY75w61Ja1xU1Eiz.jpg" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*wz5-sXQLVqSwoyw4.jpg" /></figure><p><a href="https://gethue.com/spark-summit-europe-building-a-rest-job-server-for-interactive-spark-as-a-service/#"><strong>Share on Facebook</strong></a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=70143f941127" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-querying/spark-summit-europe-building-a-rest-job-server-for-interactive-spark-as-a-service-70143f941127">Spark Summit Europe: Building a REST Job Server for interactive Spark as a service</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Hadoop / Spark Notebook and Livy REST Job Server improvements!]]></title>
            <link>https://medium.com/data-querying/hadoop-spark-notebook-and-livy-rest-job-server-improvements-2dcc15388b8b?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/2dcc15388b8b</guid>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Tue, 28 Sep 2021 17:57:25 GMT</pubDate>
            <atom:updated>2021-09-28T17:56:26.525Z</atom:updated>
            <content:encoded><![CDATA[<p><em>Initially published on </em><a href="https://gethue.com/spark-notebook-and-livy-rest-job-server-improvements/"><em>https://gethue.com/spark-notebook-and-livy-rest-job-server-improvements/</em></a><em> on 24 August 2015.</em></p><p>The Notebook application as well as the REST Spark Job Server are being revamped. The goal of these two components is to let users execute <a href="https://spark.apache.org/">Spark</a> in their browser or from anywhere. They are still in beta but the next version of Hue will have them graduate. Here is a list of the improvements and a video demo:</p><ul><li>Revamp of the snippets of the Notebook UI</li><li>Support for Spark 1.3, 1.4, 1.5</li><li>Impersonation with YARN</li><li>Support for R shell</li><li>Support for submitting jars or python apps</li></ul><p>How to play with it?</p><p>See in this post how to use the <a href="https://gethue.com/new-notebook-application-for-spark-sql/">Notebook UI</a> and on this page how to use the <a href="https://github.com/cloudera/hue/tree/master/apps/spark/java#welcome-to-livy-the-rest-spark-server">REST Spark Job Server</a> named Livy. The architecture of Livy was recently detailed in a <a href="https://gethue.com/big-data-scala-by-the-bay-interactive-spark-in-your-browser/">presentation</a> at Big Data Scala by the Bay. Next updates will be at the <a href="https://www.eventbrite.com/e/spark-lightning-night-at-shutterstock-nyc-tickets-17590432457">Spark meetup</a> before Strata NYC and <a href="https://spark-summit.org/eu-2015/events/building-a-rest-job-server-for-interactive-spark-as-a-service/">Spark Summit</a> in Amsterdam.</p><h3>Slicker snippets interface</h3><p>The snippets now have a new code editor, autocomplete and syntax highlighting. 
Shortcut links to HDFS paths and Hive tables have been added.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*FkmeWoZ5jJxDY_G1.png" /></figure><h3>R support</h3><p>The SparkR shell is now available, and plots can be displayed inline</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/906/0*BhpUzJHgQK4gE2UV.png" /></figure><h3>Support for closing session and specifying Spark properties</h3><p>All the spark-submit, spark-shell, pyspark, sparkR properties of jobs &amp; shells can be added to the sessions of a Notebook. This will for example let you add files, modules and tweak the memory and number of executors.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*sMLGFtDZOhvHln4b.png" /></figure><p>So give this new Spark integration a try and feel free to send feedback on the <a href="https://groups.google.com/a/cloudera.org/group/hue-user">hue-user</a> list or <a href="https://twitter.com/gethue">@gethue</a>!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=2dcc15388b8b" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-querying/hadoop-spark-notebook-and-livy-rest-job-server-improvements-2dcc15388b8b">Hadoop / Spark Notebook and Livy REST Job Server improvements!</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Performing automated upgrades of Services after a code change]]></title>
            <link>https://medium.com/data-querying/performing-automated-upgrades-of-services-after-a-code-change-f4c0a92933f8?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/f4c0a92933f8</guid>
            <category><![CDATA[docker]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[ci-cd-pipeline]]></category>
            <category><![CDATA[query-service]]></category>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Fri, 24 Sep 2021 19:23:01 GMT</pubDate>
            <atom:updated>2021-09-24T19:22:44.372Z</atom:updated>
            <content:encoded><![CDATA[<p>Efficient CI/CD by leveraging GitHub, DockerHub, Keel and webhooks.</p><p>This is part of a series of posts describing how the Hue <a href="http://gethue.com/">Query Service</a> is being built.</p><p>Following up on the concept of “<a href="https://medium.com/data-querying/website-updates-without-downtime-c252f8784276">no downtime while upgrading</a>”, scheduled daily refreshes are a good first step, but shortening the development-release feedback loop even more can provide an even better return on investment i.e.:</p><ul><li>Did we introduce an evident bug in the latest code change?</li><li>Is the new functionality available right away to test/use for real?</li></ul><p>This is obviously possible only if the building and testing of the artifact is fully automated and quick.</p><p>Note: one of the goals is to avoid custom scripting as much as possible and stay simple</p><p>After a Pull Request is merged, a new container is automatically built and will replace the currently running ones in our Kubernetes cluster with a Keel deployment:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/699/1*6ciU9v07nNiGZG7_-UmU3w.png" /><figcaption>From sending a code change to building the artifact and serving it</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2QvXH-1ADbjaolpd4Q5HvA.png" /><figcaption>Docker Hub auto building feature</figcaption></figure><p>Note: the Docker Hub auto build feature now requires a paid plan. 
Some other companies like Google Cloud still offer it for free</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/775/1*7FM6pUWhr-575GAS2_ryDQ.png" /><figcaption>Our API pod freshly re-created with the new image</figcaption></figure><p>Caveat: auto rolling upgrades with versioning (instead of “latest” tag) are the way to go for a safe rollout in case of shipping a critical container</p><p>Many more options are described on <a href="https://keel.sh/">https://keel.sh/</a>.</p><p>What if we want something even lighter than above? We will look at some Serverless options in another episode!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f4c0a92933f8" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-querying/performing-automated-upgrades-of-services-after-a-code-change-f4c0a92933f8">Performing automated upgrades of Services after a code change</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Quickly Building a Command Line Interface for your Web Service]]></title>
            <link>https://medium.com/data-querying/quickly-building-a-command-line-interface-for-your-web-service-ef08253f9a12?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/ef08253f9a12</guid>
            <category><![CDATA[query-service]]></category>
            <category><![CDATA[cli]]></category>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Fri, 10 Sep 2021 22:08:33 GMT</pubDate>
            <atom:updated>2021-09-10T22:08:33.229Z</atom:updated>
            <content:encoded><![CDATA[<p>Make your service more accessible and enforce good design principles.</p><p>A <a href="https://en.wikipedia.org/wiki/Command-line_interface">Command Line Interface</a> (CLI) is the antithesis of a modern Web interface. The <a href="http://gethue.com/">Hue Query Assistant</a> already provides a visual way to Query Data and manipulate files, and is getting simpler and smarter with each release.</p><p>So why provide a CLI?</p><p>In short, we found a CLI was:</p><blockquote>Helpful for simplifying certain operations, very quick to pay off, a way to reinforce clean designs and an opening for more creativity.</blockquote><p>Philosophy</p><p>The CLI targets more advanced users and provides direct access to the Query Service from their desktop or favorite machine as long as it can talk to it via HTTP.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/620/1*F4_4TzA7m2HyVfRp1cvlOQ.png" /><figcaption>Still the same interaction as via a Web browser but via a Bash terminal</figcaption></figure><p>The first goal is to augment the current <a href="https://docs.gethue.com/developer/api/rest/">API</a> for its most important operations:</p><ul><li>Execute an SQL statement or saved query</li><li>List, download, upload files</li></ul><p>It was decided to focus on the new secure <a href="https://medium.com/data-querying/object-file-storage-public-rest-api-bd34c215bbb3?source=collection_home---7------1-----------------------">Storage API</a> (handy to manipulate files from a shell) and not blindly support all the possible operations (skip the clutter) of the recent public <a href="https://docs.gethue.com/developer/api/rest/">REST API</a> powering the SQL Scratchpad and the File Browsers.</p><p>The CLI leverages a lot of existing pieces and it only took 2 days to design/implement a first version with one operation. 
It is also straightforward for the Open Source Community and <a href="https://github.com/cloudera/hue/blob/master/CONTRIBUTING.md">Hue Contributors</a> to add extra operations by following the existing commands.</p><p>Last but not least, there was a lot of learning and inspiration cascading down from this first version, in particular on how to design a CLI with Typer instead of the traditional <a href="https://docs.python.org/3/library/argparse.html">argparse</a>:</p><p><a href="https://typer.tiangolo.com/alternatives/#click">Alternatives, Inspiration and Comparisons</a></p><p>Typer provides exemplary documentation, is designed for simplicity, is built on top of <a href="https://click.palletsprojects.com/">Click</a> and leverages Python 3 type hints.</p><p>But now let’s give this CLI a quick try!</p><p>The <a href="https://github.com/gethue/compose">CLI project</a> is part of the <a href="https://github.com/gethue/compose">Compose</a> repository and is automatically bundled into the <a href="https://pypi.org/project/gethue/">Gethue</a> package.</p><p>Let’s install the latest version:</p><pre>pip install gethue</pre><p>See the current commands:</p><pre>&gt; compose --help</pre><pre>Usage: compose [OPTIONS] COMMAND [ARGS]...</pre><pre>Query your Data Easily</pre><pre>Options:<br>--install-completion  Install completion for the current shell.<br>--show-completion     Show completion for the current shell, to copy it or customize the installation.<br>--help                Show this message and exit.</pre><pre>Commands:<br>auth     Configure the CLI<br>query    Execute queries, list databases, tables<br>storage  Manipulate data files</pre><p>And point to the demo service API:</p><pre>&gt; compose auth<br><br>Api url [https://demo.gethue.com]:<br>Username [demo]:<br>Password [demo]:<br>Auth: success 200<br>Token: 
eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjMxMjE5MDkxLCJqdGkiOiJkNGJkY2Q5M2NjMjg0MDlkYWJlYWZhNGRlNjlkOTMzMyIsInVzZXJfaWQiOjJ9.Gr8bW_JaZ8yzQ3eEZYp3jKbdsSgLAXxqvSRbeU6jhLg</pre><p>And list the content of a remote directory:</p><pre>&gt; compose storage list --path s3a://demo-gethue</pre><pre>s3a://demo-gethue/data (https://demo.gethue.com/hue/filebrowser/view=s3a%3A%2F%2Fdemo-gethue%2Fdata)</pre><pre>s3a://demo-gethue/data/web_logs (https://demo.gethue.com/hue/filebrowser/view=s3a%3A%2F%2Fdemo-gethue%2Fdata%2Fweb_log</pre><p>Et voilà!</p><p>The <a href="https://github.com/gethue/compose/tree/master/cli">new CLI</a> is paving the way for the Hue 5 Query Editor Service.</p><p>We also got new ideas along the way, like decoupling the Python modules even more, introducing design patterns from the Typer project, getting familiar with Python 3 typing… which already paid back the time spent on creating the CLI.</p><p>We bet that the user community will also come back with new usage feedback! (hint: like scheduling queries ;)</p><hr><p><a href="https://medium.com/data-querying/quickly-building-a-command-line-interface-for-your-web-service-ef08253f9a12">Quickly Building a Command Line Interface for your Web Service</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Seamless integration of a SQL Scratchpad component into your own Web app]]></title>
            <link>https://medium.com/data-querying/seamless-integration-of-a-sql-scratchpad-component-into-your-own-web-app-e0688aa409f5?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/e0688aa409f5</guid>
            <category><![CDATA[query-service]]></category>
            <category><![CDATA[query]]></category>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Tue, 24 Aug 2021 17:13:58 GMT</pubDate>
            <atom:updated>2021-08-24T18:21:02.986Z</atom:updated>
            <content:encoded><![CDATA[<p>How to authenticate an external Web Component with your application.</p><p>Now that we have these decoupled and reusable <a href="https://medium.com/data-querying/build-your-own-sql-editor-byoe-in-5-minutes-805ba7534e37">SQL Web Components</a> and <a href="https://medium.com/data-querying/object-file-storage-public-rest-api-bd34c215bbb3">REST API</a>, how do we link them together into a separate Web application?</p><p>This post describes strategies for having them interact properly, in particular how to handle authentication.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/941/1*xUyDFiWFGek9o5zV9LvgNA.png" /><figcaption>High level: authentication between Web Component and API and the Database to query</figcaption></figure><p>The <a href="https://docs.gethue.com/developer/components/scratchpad/">SQL Scratchpad</a> component is injected into a Web page. This Web page can either be served by Hue or be completely independent, e.g. we want to leverage the advanced <a href="https://docs.gethue.com/user/querying/#autocomplete">SQL autocomplete</a> and query execution of Hue from within another completely independent application (an existing Notebook app or a custom Popup functionality).</p><h4>Same Authentication as the Hue API</h4><p>This is the “traditional” authentication, the same as signing in from the main Hue login page itself.</p><p>Hue authentication supports <a href="https://docs.gethue.com/administrator/configuration/server/#authentication">multiple auth backends</a> (with some of them providing out-of-the-box SSO like LDAP or SAML). When using the Web interface, the browser currently forwards an HTTP cookie to the API. 
When using the public API, it is a JWT token.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/671/1*xogNSF48HpwXPtW2f7YJNQ.png" /></figure><p>This is pretty straightforward and brings us to the second strategy, where Hue is purely seen as an external SQL Editor service.</p><h4>Authentication external to the Hue API</h4><p>In the real world, the Web page displaying the SQL Scratchpad has already authenticated itself via another Authentication service (e.g. company SSO) and got a cookie or JWT identifying the logged-in user. We also don’t want yet another login box showing up in the component and asking the user to authenticate.</p><p>Similarly to providing custom authentication login backends, Hue also supports providing your <a href="https://github.com/cloudera/hue/blob/master/desktop/core/src/desktop/auth/api_authentications.py">own authentication</a> for the public REST API itself (thanks to <a href="https://www.django-rest-framework.org/api-guide/authentication/">Django REST Framework</a>’s own pluggability).</p><p>For example a single page Web app can provide its own token to the SQL Editor component:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/691/1*hKyaraYKH688doNGPnxOUw.png" /><figcaption>The Web page, already authenticated with the Central Authentication service and holding a token, forwards this token to the Scratchpad Component, which forwards it to the Query Service API, which can decode it, usually by leveraging a public API key</figcaption></figure><p>There are multiple ways to pick up and provide this JWT token. 
It depends on how it is stored in the main application, which could be:</p><ul><li>hardcoded</li><li>a cookie</li><li>in storage</li><li>in memory</li></ul><p>In <a href="https://demo.gethue.com/">demo.gethue.com</a>, the authentication is, well, realistic only for a demo, as the credentials are set in clear text in the page:</p><pre>&lt;div style=&quot;position: absolute; height: 100%; width: 100%&quot;&gt; <br> &lt;sql-scratchpad api-url=&quot;https://demo.gethue.com&quot; username=&quot;demo&quot; password=&quot;demo&quot; dialect=&quot;mysql&quot; /&gt;<br>&lt;/div&gt;</pre><p>A more realistic way is to <a href="https://github.com/cloudera/hue/blob/master/desktop/core/src/desktop/js/webComponents/QueryEditorComponents.ts#L41">provide the token</a> to the Component via its <em>setBearerToken()</em> method (other hooks are currently in design).</p><p>Note: we are not discussing here the possible CSRF/XSS vulnerabilities of the above methods as these are not specific to the Web component, but this is something to be aware of.</p><p>One advantage of this method is that the Query Service can be seen as its own “headless” entity and the interactions are simplified: if the token can be validated and so trusted, it can also be forwarded between services, e.g. the Query API can forward the end user token to the Database engine.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/941/1*YwRyGiUARpn1e7DAl4s5xw.png" /><figcaption>The same token is used across the platform services</figcaption></figure><p>Interested in helping build better SQL components (Editor, Parsers, Formatter, APIs..)? 
Feel free to follow up on the <a href="https://docs.gethue.com/developer/components/">development section</a>!</p><hr><p><a href="https://medium.com/data-querying/seamless-integration-of-a-sql-scratchpad-component-into-your-own-web-app-e0688aa409f5">Seamless integration of a SQL Scratchpad component into your own Web app</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Object/File Storage public REST API]]></title>
            <link>https://medium.com/data-querying/object-file-storage-public-rest-api-bd34c215bbb3?source=rss----6a073e14e4e---4</link>
            <guid isPermaLink="false">https://medium.com/p/bd34c215bbb3</guid>
            <category><![CDATA[query-service]]></category>
            <category><![CDATA[api]]></category>
            <category><![CDATA[azure-storage]]></category>
            <category><![CDATA[aws-s3]]></category>
            <dc:creator><![CDATA[Romain Rigaux]]></dc:creator>
            <pubDate>Fri, 13 Aug 2021 17:17:21 GMT</pubDate>
            <atom:updated>2021-08-13T19:40:37.939Z</atom:updated>
            <content:encoded><![CDATA[<p>Leverage a <a href="https://docs.gethue.com/developer/api/rest/">REST API</a> to simplify your data file interactions like list, upload and download in the public object storage clouds.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PoC1uiyQ2pnh5QS5JiOoMg.png" /><figcaption>Same file operations as in the Web App available as REST API calls</figcaption></figure><p>This post comes with a live tutorial of the <a href="https://docs.gethue.com/user/browsing/#data">Hue file listing API</a> via the demo environment <a href="https://demo.gethue.com/">demo.gethue.com</a>.</p><p>Background: the <a href="https://gethue.com/">Hue SQL Editor</a> project has been evolving for more than <a href="http://localhost:1313/blog/2020-01-28-ten-years-data-querying-ux-evolution/">10 years</a> and allows you to query any Database or Data Warehouse.</p><p>Recently: as previously described in the <a href="http://localhost:1313/blog/2021-05-29-create-own-sql-editor-via-webcomponent-and-public-api/">SQL Editor API post</a>, all the end user functionalities and the under-the-cover grunt work of integration can now simply be reused programmatically (freeing up time to let you focus on the data work itself instead).</p><p>The main use cases for the File API are to upload data and create an SQL Table on top of it, or to retrieve those pesky file URIs:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*E8xoFQDJrd5J_gM_.png" /><figcaption>Quick Path copy or open file in the Create Table Wizard</figcaption></figure><p>The API leverages the standard credentials of your users (SSO via LDAP, SAML…) and behaves the same as if they were interacting via the Web UI directly. 
As a bonus, it is cloud agnostic, so nobody is required to learn the intricacies of each provider and everyone can simply use an interface they are already familiar with.</p><h3>API Demo</h3><p>The simplest operation is to list the content of your buckets or directories (also known as “list dir”).</p><p>Start by authenticating and asking for an API access token (a JWT):</p><pre>curl -X POST https://demo.gethue.com/api/token/auth -d &#39;username=demo&amp;password=demo&#39;</pre><pre>{&quot;refresh&quot;:&quot;eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoicmVmcmVzaCIsImV4cCI6MTYyOTQ3MTE0MiwianRpIjoiYjNkMDUzN2I1OGU5NDNlZGE0OTJiYzVmOTkzMDEwOTEiLCJ1c2VyX2lkIjoyfQ._MXo09PzisvqY7-1NMVIaLiUCVksYx2ZA5v_PWTk0TY&quot;,&quot;<strong>access</strong>&quot;:&quot;eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc&quot;}</pre><p>Then provide this access value in each of the following calls. 
In your case, update the examples below with your own:</p><pre>Authorization: Bearer &lt;Your &quot;access&quot; value here&gt;</pre><p>Here is how to list the content of a path, here the S3 bucket s3a://demo-gethue:</p><pre>curl -X GET https://demo.gethue.com/api/storage/view=s3a://demo-gethue -H &quot;Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc&quot;</pre><pre>{<br>  &quot;path&quot;: &quot;s3a://demo-gethue&quot;,<br>  &quot;breadcrumbs&quot;: [<br>    {<br>      &quot;url&quot;: &quot;s3a%3A%2F%2F&quot;,<br>      &quot;label&quot;: &quot;s3a://&quot;<br>    },<br>    {<br>      &quot;url&quot;: &quot;s3a%3A%2F%2Fdemo-gethue&quot;,<br>      &quot;label&quot;: &quot;demo-gethue&quot;<br>    }<br>  ],<br>  &quot;current_request_path&quot;: &quot;/filebrowser/view=s3a%3A%2F%2Fdemo-gethue&quot;,<br>  &quot;is_trash_enabled&quot;: false,<br>  &quot;files&quot;: [<br>    {<br>      &quot;path&quot;: &quot;s3a://&quot;,<br>      &quot;name&quot;: &quot;..&quot;,<br>      &quot;stats&quot;: {<br>        &quot;path&quot;: &quot;s3a://&quot;,<br>        &quot;size&quot;: 0,<br>        &quot;atime&quot;: null,<br>        &quot;mtime&quot;: null,<br>        &quot;mode&quot;: 16895,<br>        &quot;user&quot;: &quot;&quot;,<br>        &quot;group&quot;: &quot;&quot;,<br>        &quot;aclBit&quot;: false<br>      },<br>      &quot;mtime&quot;: &quot;&quot;,<br>      &quot;humansize&quot;: &quot;0 bytes&quot;,<br>      &quot;type&quot;: &quot;dir&quot;,<br>      &quot;rwx&quot;: &quot;drwxrwxrwx&quot;,<br>      &quot;mode&quot;: &quot;40777&quot;,<br>      &quot;url&quot;: &quot;/filebrowser/view=s3a%3A%2F%2F&quot;,<br>      &quot;is_sentry_managed&quot;: false<br>    },<br>    {<br>      &quot;path&quot;: &quot;s3a://demo-gethue&quot;,<br>      &quot;name&quot;: &quot;.&quot;,<br>      
&quot;stats&quot;: {<br>        &quot;path&quot;: &quot;s3a://demo-gethue&quot;,<br>        &quot;size&quot;: 0,<br>        &quot;atime&quot;: 1628866612,<br>        &quot;mtime&quot;: 1628866612,<br>        &quot;mode&quot;: 16895,<br>        &quot;user&quot;: &quot;&quot;,<br>        &quot;group&quot;: &quot;&quot;,<br>        &quot;aclBit&quot;: false<br>      },<br>      &quot;mtime&quot;: &quot;August 13, 2021 02:56 PM&quot;,<br>      &quot;humansize&quot;: &quot;0 bytes&quot;,<br>      &quot;type&quot;: &quot;dir&quot;,<br>      &quot;rwx&quot;: &quot;drwxrwxrwx&quot;,<br>      &quot;mode&quot;: &quot;40777&quot;,<br>      &quot;url&quot;: &quot;/filebrowser/view=s3a%3A%2F%2Fdemo-gethue&quot;,<br>      &quot;is_sentry_managed&quot;: false<br>    },<br>    {<br>      &quot;path&quot;: &quot;s3a://demo-gethue/data&quot;,<br>      &quot;name&quot;: &quot;data&quot;,<br>      &quot;stats&quot;: {<br>        &quot;path&quot;: &quot;s3a://demo-gethue/data/&quot;,<br>        &quot;size&quot;: 0,<br>        &quot;atime&quot;: null,<br>        &quot;mtime&quot;: null,<br>        &quot;mode&quot;: 16895,<br>        &quot;user&quot;: &quot;&quot;,<br>        &quot;group&quot;: &quot;&quot;,<br>        &quot;aclBit&quot;: false<br>      },<br>      &quot;mtime&quot;: &quot;&quot;,<br>      &quot;humansize&quot;: &quot;0 bytes&quot;,<br>      &quot;type&quot;: &quot;dir&quot;,<br>      &quot;rwx&quot;: &quot;drwxrwxrwx&quot;,<br>      &quot;mode&quot;: &quot;40777&quot;,<br>      &quot;url&quot;: &quot;/filebrowser/view=s3a%3A%2F%2Fdemo-gethue%2Fdata&quot;,<br>      &quot;is_sentry_managed&quot;: false<br>    }<br>  ],<br>  &quot;page&quot;: {<br>    &quot;number&quot;: 1,<br>    &quot;num_pages&quot;: 1,<br>    &quot;previous_page_number&quot;: 0,<br>    &quot;next_page_number&quot;: 0,<br>    &quot;start_index&quot;: 1,<br>    &quot;end_index&quot;: 1,<br>    &quot;total_count&quot;: 1<br>  },<br>  &quot;pagesize&quot;: 30,<br>  &quot;home_directory&quot;: null,<br>  
&quot;descending&quot;: null,<br>  &quot;cwd_set&quot;: true,<br>  &quot;file_filter&quot;: &quot;any&quot;,<br>  &quot;current_dir_path&quot;: &quot;s3a://demo-gethue&quot;,<br>  &quot;is_fs_superuser&quot;: false,<br>  &quot;groups&quot;: [],<br>  &quot;users&quot;: [],<br>  &quot;superuser&quot;: null,<br>  &quot;supergroup&quot;: null,<br>  &quot;is_sentry_managed&quot;: false,<br>  &quot;apps&quot;: [<br>    &quot;filebrowser&quot;,<br>    &quot;metastore&quot;,<br>    &quot;useradmin&quot;,<br>    &quot;indexer&quot;,<br>    &quot;notebook&quot;<br>  ],<br>  &quot;show_download_button&quot;: true,<br>  &quot;show_upload_button&quot;: true,<br>  &quot;is_embeddable&quot;: false,<br>  &quot;s3_listing_not_allowed&quot;: &quot;&quot;<br>}</pre><p>Some of the parameters:</p><ul><li>pagesize=45 (number of items to return)</li><li>pagenum=1 (pagination)</li><li>filter=file names text to match, can be empty</li><li>sortby=name (field to use for sorting)</li><li>descending=false (keep sorting alphabetical)</li></ul><p>e.g. 
pagesize=45&amp;pagenum=1&amp;filter=&amp;sortby=name&amp;descending=false</p><p>Then peek at the data of the s3a://demo-gethue/data/web_logs/index_data.csv file:</p><pre>curl -X GET https://demo.gethue.com/api/storage/view=s3a://demo-gethue/data/web_logs/index_data.csv -H &quot;Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc&quot;</pre><pre>{<br>  &quot;show_download_button&quot;: true,<br>  &quot;is_embeddable&quot;: false,<br>  &quot;editable&quot;: false,<br>  &quot;mtime&quot;: &quot;October 31, 2016 03:34 PM&quot;,<br>  &quot;rwx&quot;: &quot;-rw-rw-rw-&quot;,<br>  &quot;path&quot;: &quot;s3a://demo-gethue/data/web_logs/index_data.csv&quot;,<br>  &quot;stats&quot;: {<br>  &quot;size&quot;: 6199593,<br>  &quot;aclBit&quot;: false,<br>  ...............<br>  &quot;contents&quot;: &quot;code,protocol,request,app,user_agent_major,region_code,country_code,id,city,subapp,latitude,method,client_ip,  user_agent_family,bytes,referer,country_name,extension,url,os_major,longitude,device_family,record,user_agent,time,os_family,country_code3<br>    200,HTTP/1.1,GET /metastore/table/default/sample_07 HTTP/1.1,metastore,,00,SG,8836e6ce-9a21-449f-a372-9e57641389b3,Singapore,table,1.2931000000000097,GET,128.199.234.236,Other,1041,-,Singapore,,/metastore/table/default/sample_07,,103.85579999999999,Other,&quot;demo.gethue.com:80 128.199.234.236 - - [04/May/2014:06:35:49 +0000] &quot;&quot;GET /metastore/table/default/sample_07 HTTP/1.1&quot;&quot; 200 1041 &quot;&quot;-&quot;&quot; &quot;&quot;Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org)&quot;&quot;<br>    &quot;,Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org),2014-05-04T06:35:49Z,Other,SGP<br>    200,HTTP/1.1,GET /metastore/table/default/sample_07 
HTTP/1.1,metastore,,00,SG,6ddf6e38-7b83-423c-8873-39842dca2dbb,Singapore,table,1.2931000000000097,GET,128.199.234.236,Other,1041,-,Singapore,,/metastore/table/default/sample_07,,103.85579999999999,Other,&quot;demo.gethue.com:80 128.199.234.236 - - [04/May/2014:06:35:50 +0000] &quot;&quot;GET /metastore/table/default/sample_07 HTTP/1.1&quot;&quot; 200 1041 &quot;&quot;-&quot;&quot; &quot;&quot;Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org)&quot;&quot;<br>    &quot;,Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org),2014-05-04T06:35:50Z,Other,SGP<br>  ...............<br>}</pre><p>Some of the parameters:</p><ul><li>offset=0</li><li>length=204800</li><li>compression=none</li><li>mode=text</li></ul><p>e.g. ?offset=0&amp;length=204800&amp;compression=none&amp;mode=text</p><p>And then decide to download it:</p><pre>curl -X GET https://demo.gethue.com/api/storage/download=s3a://demo-gethue/data/web_logs/index_data.csv -H &quot;Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc&quot;</pre><p>It is also possible to upload your data directly (if you have the proper write permissions in the remote destination folder).</p><p>Here we send the local file README.md to the remote s3a://demo-gethue/web_log_data/ directory:</p><pre>curl -X POST https://demo.gethue.com/api/storage/upload/file?dest=s3a://demo-gethue/web_log_data/ --form hdfs_file=@README.md</pre><p>Note: the hdfs_file parameter is a relative or absolute path to a local file. The name is currently confusing; it should be read more like local_file (i.e. it is not related to HDFS only).</p><h3>Then what?</h3><p>When the data is stored in the cloud, it becomes easy to create a SQL table and query it. 
One way is to open up the <a href="http://localhost:1313/blog/2021-08-10-open-in-importer-and-copy-path-options-in-filebrowser/">File Browser</a> and copy the path of the data into a CREATE TABLE statement, or just go via the Create table wizard which will do all the work for you.</p><p>Note that small data files don’t even need to go via the cloud storage and can be <a href="http://localhost:1313/blog/2021-08-13-public-api-object-file-storage/blog/2021-07-26-create-sql-tables-on-the-fly-with-zero-clicks/">directly uploaded via drag &amp; drop</a> in the Web interface or via the <a href="https://docs.gethue.com/developer/api/rest/#file-import">Importer API</a>. This will be demoed next time, so stay tuned!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*m0B7Sg0oj0BYwXOq.gif" /><figcaption>Directly uploading a file and getting a SQL table ready to query</figcaption></figure><h3>Proper security</h3><p>The timing is also good. The <a href="https://docs.gethue.com/user/browsing/#data">file listing</a> (for <a href="https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">HDFS</a>, the Hadoop file system) has been present since day one. Later on, AWS S3, Azure Storage and Google Cloud Storage (beta) were added but were lacking fine grained security (i.e. all the users were using the same credentials, so not good).</p><p>This is not true anymore, as the shared signed URL technology of these cloud storages is now being leveraged under the hood to have each user perform file operations under their own distinct credentials. This allows true self service instead of restricting data uploads to admins only. Users can be trusted to upload their own files and analyze them without contacting anybody else. 
Another bottleneck removed!</p><p>If interested in more technical details, read more about <a href="http://localhost:1313/blog/2021-06-30-how-to-use-azure-storage-rest-api-with-shared-access-sginature-sas-tokens/">Azure Shared Access Signature (SAS) tokens</a> or <a href="http://localhost:1313/blog/2021-04-23-s3-file-access-without-any-credentials-and-signed-urls/">AWS S3 signed URLs</a>.</p><figure><img alt="Open the Create Table Wizard or copy a file URI" src="https://cdn-images-1.medium.com/proxy/1*hqkJ2QR1SdLf4Af0Z4ABHw.png" /><figcaption>Hue or Compose app contacting a middleware service that converts raw calls to object storages into custom signed URLs in order to provide fine grained authorization</figcaption></figure><h3>Sum-up</h3><p>Now there are no excuses to not be data driven and provide self service analytics to your hungry users ;)</p><p>Using GCP or other storages? Let us know!</p><p>And in case you missed it, the coolest API is actually the <a href="https://docs.gethue.com/developer/api/rest/#execute-a-query">Execute a SQL query</a>, play with it!</p><p>Onwards!</p><p>Romain</p><hr><p><a href="https://medium.com/data-querying/object-file-storage-public-rest-api-bd34c215bbb3">Object/File Storage public REST API</a> was originally published in <a href="https://medium.com/data-querying">Data Querying</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>