<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by The Sensible Code Company on Medium]]></title>
        <description><![CDATA[Stories by The Sensible Code Company on Medium]]></description>
        <link>https://medium.com/@sensiblecode?source=rss-a47c6b913af------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*_rctCMAc4fv4A0ZX-kWHGQ.png</url>
            <title>Stories by The Sensible Code Company on Medium</title>
            <link>https://medium.com/@sensiblecode?source=rss-a47c6b913af------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 13 Apr 2026 21:23:32 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@sensiblecode/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Digital Marketing Manager — 100% Remote, UK & Ireland]]></title>
            <link>https://sensiblecode.medium.com/digital-marketing-manager-100-remote-uk-ireland-7fdd8d062fc?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/7fdd8d062fc</guid>
            <category><![CDATA[saas-sales]]></category>
            <category><![CDATA[b2b-marketing]]></category>
            <category><![CDATA[marketing]]></category>
            <category><![CDATA[remote-working]]></category>
            <category><![CDATA[trends]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Tue, 15 Jun 2021 09:30:45 GMT</pubDate>
            <atom:updated>2021-09-20T15:15:40.807Z</atom:updated>
            <content:encoded><![CDATA[<h3>Digital Marketing Manager — 100% Remote, Ireland &amp; UK</h3><p><em>Full or part-time, remote or semi-remote</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/945/1*qRNg_FbYRaw9MJco_IQMLQ.png" /></figure><p>This is a remote or semi-remote role; full-time or part-time with use of office space in Belfast.</p><h3>The Sensible Code Company uses modern software techniques to transform the processing and publication of data and statistics.</h3><p>Over 10 years ago, we created ScraperWiki: one of the first collaborative in-browser coding environments focused on scraping and wrangling messy data.</p><p>We run <a href="https://pdftables.com/">PDFTables.com</a> which has converted over 100 million pages of PDF files to clean tabular data for its thousands of users.</p><p>Most recently, we’ve built <a href="https://cantabular.com">Cantabular</a>: a software framework for the protection and publication of statistical data. We’ve just used it to <a href="https://cantabular.com/blog/republishing-historic-1911-irish-census-interactive-dataset/">republish the 1911 Ireland census</a> and the Office for National Statistics are using it to automate much of the dissemination for the 2021 England and Wales census.</p><p>We’re a friendly, supportive and experienced remote team focused on delivering high quality software. We are all based within Ireland or UK and have been working remotely as a company since 2018. We collaborate using Slack, Trello, GSuite, Highrise and GitHub. 
We have daily informal virtual chats over coffee and organise quarterly company meetups in locations across the UK &amp; Europe.</p><h3>About this role</h3><p>We’re looking for a Digital Marketing Manager who will help us to grow and develop our European and international customer base.</p><h3>Key things to know about this job:</h3><ul><li>Work with the product team to plan for strategic growth</li><li>Understand how the marketing plan contributes to revenue objectives</li><li>Relish being empowered to make decisions</li><li>Drive B2B customer acquisition and retention</li><li>Lead the digital presence (SEO, PPC, email, social, etc.)</li><li>Research, write and publish blogs, case studies, and white papers</li><li>Analyse campaigns, consumer trends and the competitive landscape</li><li>Influence, educate and prompt colleagues on the benefits of marketing</li><li>Lead generation and processing</li></ul><h3>You’ll be responsible for:</h3><ul><li>Creating and executing end-to-end marketing campaigns</li><li>Developing and delivering technical and business-level messaging</li><li>Brand identity, awareness and recognition</li><li>Implementing our go-to-market strategy</li><li>Acquisition and lead processing for enterprise deals</li><li>Managing virtual and in-person events</li></ul><h3>Your skills are:</h3><ul><li>Bachelor’s degree in a relevant subject</li><li>Full or part Chartered Institute of Marketing qualification preferable; however, we’d also consider other similar qualifications</li><li>Experience working with HTML or CSS</li><li>Solid understanding of modern digital marketing techniques (PPC, SEO, website optimisation)</li><li>Experience using automation and modern social media analytics tools</li><li>Confident creating content in English</li><li>Energy, enthusiasm and commercial awareness</li><li>Self-management and accountability skills</li><li>Understanding of a flexible, start-up-like environment</li></ul><h3>Bonus points if:</h3><ul><li>2+ years B2B 
technology-based marketing experience, international experience</li><li>Data analysis experience (Excel, Python, SQL or similar)</li></ul><h3>Pay and benefits:</h3><ul><li>Salary up to £35,000 (pro rata for part-time), based on experience</li><li>Flexible working times to support a healthy work/life balance</li><li>You can be located anywhere in Ireland or the UK</li><li>We offer a generous 30 days of holiday plus public holidays (38 total)</li><li>A contribution of £1,330 per year to home office costs</li><li>Learning and training supported for career development</li><li>Expenses covered for internal and external meetings</li><li>Share options available</li></ul><h3>How to apply:</h3><p>Email jobs@sensiblecode.io quoting scjob26 in the subject line with the following information:</p><ul><li><strong>Cover letter: </strong>tell us a bit about why you’re interested in this role</li><li><strong>CV or resume: </strong>your professional experience</li><li>Your telephone number</li></ul><p><strong>No agencies please.</strong></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Automating disclosure checks with our disclosure rules language]]></title>
            <link>https://sensiblecode.medium.com/automating-disclosure-checks-with-our-disclosure-rules-language-5a993e3f17fe?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/5a993e3f17fe</guid>
            <category><![CDATA[data-analytics]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[r]]></category>
            <category><![CDATA[privacy]]></category>
            <category><![CDATA[census]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Wed, 11 Nov 2020 11:00:16 GMT</pubDate>
            <atom:updated>2020-11-11T11:13:46.127Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*erVqlTKQ7b1wnZhdBRMNKQ.jpeg" /><figcaption>Photo by <a href="https://unsplash.com/@ffstop?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Fotis Fotopoulos</a> on <a href="https://unsplash.com/?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure><h4>Last week we released a new version of <a href="https://cantabular.com/">Cantabular</a> with a big new feature: a disclosure rules language.</h4><p>The disclosure rules language, or DRL, is a tool to help statisticians automate decisions about table publication which they might previously have made using manual analysis techniques. It does this by letting its users encode their own confidentiality rules in a language designed for this purpose.</p><p>It works with our API and user interfaces to automatically check requests for tables built from confidential datasets for disclosure risks such as identity disclosure, attribute disclosure and sparsity.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1004/0*EASiBPZA8sTHgy2Q" /><figcaption><strong>Figure 1:</strong> A schematic of the operation of the DRL within Cantabular.</figcaption></figure><h4>Statistical disclosure control at speed and scale for the England and Wales Census</h4><p>The ONS is deep in preparations for the Census next year, and is working to make sure that business, government and wider society get as much value from the next Census as possible. 
One path to accomplish this is by allowing people to make their own queries of the data gathered in 2021, instead of relying on a smaller set of predetermined tables.</p><p>To help do this, we’ve been working with them to automate some of the privacy protections they have designed for Census 2021 outputs, based around our product Cantabular.</p><p>Cantabular works by programmatically adding noise, and hence uncertainty, to outputs and screening queries and output tables for disclosure risks, in real-time. The performance of our software means that table protection and production can be automated and used to power flexible dissemination tools or as part of a repeatable statistical production pipeline.</p><h4>Benefits of a disclosure rules language</h4><p>In Censuses past, checking of tables before publication involved a manual or semi-manual process of evaluating the disclosivity of tables by checking for unsafe cells.</p><p>Each published table had to be checked for its own inherent risks, and evaluated alongside all previously published tables to also understand the cumulative risk. This was a hands-on, time-consuming process, but necessary to ensure confidentiality.</p><p>For Census 2021, the ONS’s SDC team decided to explore the possibility of automating these checks, and saving their eyeballs. 
The DRL we’ve created is one of the results of our collaboration with them on this problem and promises a number of benefits:</p><ul><li><strong>Saving time</strong> by removing the need for manual or semi-manual processes and freeing up thinking time for gnarlier problems.</li><li><strong>Enabling the release of thousands or millions of output tables</strong> by collapsing the time taken to check a table to milliseconds.</li><li><strong>Allowing the creation of custom rules</strong> specifically written to match the structure, content and nuance of a particular dataset.</li><li><strong>Giving methodologists autonomy</strong> to write automated disclosure checks that don’t involve any changes to the underlying software.</li></ul><h4>What can you do with a disclosure rules language?</h4><p>Our new DRL has two elements to it. First, it checks that a query received by our API is allowed. Second, it checks the disclosivity of the output table produced for each geographic area in the output to evaluate whether it is safe enough to be released.</p><p>These elements, and the flexibility inherent in the language to interrogate the properties of an output table, can support a wide range of use cases.</p><p>Here are a few of them, largely inspired by a paper published by the ONS’s Statistical Disclosure Control team <a href="https://www.ons.gov.uk/file?uri=/methodology/methodologicalpublications/generalmethodology/surveymethodologybulletin/smb79combined2.pdf">(Survey Methodology Bulletin 79, Office for National Statistics)</a>:</p><ul><li><strong>Set maximum variables:</strong> block queries that will lead to overly sparse outputs before they’re even run by limiting the number of variables that can be added to a query.</li><li><strong>Limit queries for sensitive variables</strong>: queries made including particularly sensitive variables could be limited to a smaller number of variables.</li><li><strong>Attribute disclosure</strong>: individual or group attribute 
disclosure in a table can be detected and tables with too many instances can be blocked.</li><li><strong>Identity disclosure:</strong> tables containing too many values of one can also be blocked.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/714/1*OzL0_-uvn0RSdZsiu583pw.png" /></figure><h3>Beyond Census</h3><p>We’re expecting that the DRL will allow the use of Cantabular for other datasets and by other clients with different needs and in different data domains.</p><p>To that end, our language definition is open, to ensure it can be shared, discussed and understood, and to allow anyone to make an implementation.</p><h3>Want to find out more?</h3><p>If you’d like to find out more about how our product Cantabular works, we’d love to hear from you. Drop us an email at <a href="mailto:hello@sensiblecode.io">hello@sensiblecode.io</a>.</p><h4>We’re hiring for a <a href="https://sensiblecode.medium.com/statistical-disclosure-control-specialist-full-time-8cc6a866759c">senior methodologist role</a>.</h4>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Press Release]]></title>
            <link>https://sensiblecode.medium.com/press-release-cf3d7d752bdf?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/cf3d7d752bdf</guid>
            <category><![CDATA[census]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[government]]></category>
            <category><![CDATA[dissemination]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Wed, 08 Jul 2020 07:53:13 GMT</pubDate>
            <atom:updated>2020-07-08T07:53:13.877Z</atom:updated>
            <content:encoded><![CDATA[<h3>Press Release: Gerry O’Hanlon, Former Director General CSO Ireland joins Sensible Code as Non Exec Director</h3><h3>The Sensible Code Company is pleased to announce the appointment of Gerry O’Hanlon, former Director General of the Central Statistics Office Ireland, as a non-executive director.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/300/1*YFDa-fxWwuXbgsrUaByFEg.jpeg" /></figure><p>Sensible Code recently signed a £1.3 million contract with the UK Office for National Statistics for Cantabular, data privacy technology that delivers real-time statistical disclosure control.</p><blockquote><strong>Gerry O’Hanlon said: </strong>“I am delighted to be joining The Sensible Code Company as a Non-Executive Director. As a career statistician, an ever-present key challenge was to maximise the statistical potential of available data to meet the needs of all users while ensuring that the principle of statistical confidentiality was fully adhered to in respect of all data subjects. The Cantabular software provides a cutting edge solution to assist Statistical Offices and other data holders in meeting this challenge in a timely and safe manner in an era of ever-increasing data availability and a concomitant focus on data protection.”</blockquote><blockquote><strong>CEO Aidan McGuire added:</strong> “We are delighted to welcome Gerry O’Hanlon to Sensible Code. His arrival comes at a key moment; the Coronavirus pandemic shows the importance of timely granular statistical data for accurate modelling and planning. 
Cantabular has the capacity to transform and accelerate data dissemination to enable better policy &amp; economic decision making.”</blockquote><h3>Press contacts</h3><p>Aine McGUIRE <br><a href="mailto:aine@sensiblecode.io">aine@sensiblecode.io</a> <br>+44 (0)7710 377929<br>Ormeau Baths, 18 Ormeau Avenue, Belfast BT2 8HS</p><h3><strong>Note to editors</strong></h3><p>Gerry was a top-level manager in the Irish Central Statistics Office (CSO) for over twenty years prior to his retirement as Director-General in 2012. From 2013 to 2019, he was a member of the Good Practice Advisory Committee of the Greek Statistical System, serving as Chairperson of the Committee from 2013 to 2017. In recent years he has also led high-level teams in conducting extensive reviews of national statistical systems in over ten EU and other European countries. He has a BSc in Mathematics and Mathematical Statistics from University College Cork and a Master’s in Strategic Management (Public Service) from Trinity College Dublin.</p><h3><strong>About The Sensible Code Company</strong></h3><p>The Sensible Code Company is a digital start-up with venture capital backing. It has won several awards for innovative technology products. Its groundbreaking technology, Cantabular(TM), modernises the processing and dissemination of data. The software is designed to support statisticians and data controllers and to help improve business operations that require the processing of confidential data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*O_2Ur6w3v0WczQZg" /><figcaption><a href="https://cantabular.com/?utm_source=gohanlon-pr">Cantabular.com</a></figcaption></figure>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[ONS hack day Census data unplugged]]></title>
            <link>https://medium.com/swlh/ons-hack-day-census-data-unplugged-f575728d75cc?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/f575728d75cc</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[big-data]]></category>
            <category><![CDATA[privacy]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Thu, 25 Jun 2020 08:22:21 GMT</pubDate>
            <atom:updated>2020-06-29T19:32:48.120Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*HOkqVfm4U3CFtzSR" /><figcaption>Talking through the ideas</figcaption></figure><p><strong>We’d planned a hack day at the Office for National Statistics for </strong><a href="https://medium.com/@SensibleCode/engineering-privacy-for-census-2021-more-data-more-quickly-and-for-more-areas-a24439847973?source=---------7------------------"><strong>our work on Census 2021</strong></a><strong> towards the end of May. But the arrival of Covid-19 meant we had to pivot to making the event virtual.</strong></p><p><strong>The objective was to allow ONS data scientists and analysts the opportunity to road test an innovation that will happen when anonymised, high level Census 2021 data are released. It’s called the flexible table builder and is powered by our software </strong><a href="https://cantabular.com/"><strong>Cantabular</strong></a><strong>.</strong></p><p><strong>Cantabular applies privacy protections as a query is processed in order to produce safe tables of aggregated data. Queries are facilitated by an API that has two design objectives: security and speed.</strong></p><h3>What is a flexible table builder?</h3><p>We’ve been working with the ONS to modernise the way Census data are protected and disseminated. At the heart of the census is the principle of keeping information safe, confidential, secure and private: no one can find out an individual’s details for 100 years.</p><p>However, aggregated census data are essential to inform decisions nationally and locally on vital services and issues like diversity. 
And the ONS has made a commitment to provide more timely information and analysis.</p><p>At the start of the hack day, our sponsor introduced the concept and explained how it had the potential to dramatically change how Census data can be made available.</p><p><strong><em>Before</em></strong><em>:</em> For Census 2011, there was a 16 month lag between the final day of collection and the first output table appearing on the ONS website. It took up to 4 years to produce all the output tables. It was a time-consuming process that required significant human intervention and expert decision making.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ci3btYh26cPLqR5u" /></figure><p><strong><em>After</em></strong><em>:</em> the ONS wants to collapse the time frame, producing first results within 12 months of data collection and making as many census statistics as possible available soon after that first publication date, aiming to do so within a year. This will increase the contribution census data make to the economy and provide much higher value to ONS customers.</p><h3>Modern tools and a large synthetic dataset</h3><p>We’ve been working on a Python API client that makes it easy for data scientists to query the data as part of their applications and visualizations. 
We created extra documentation and ran a short tutorial at the start of the event to help the analysts get started and see some useful code snippets.</p><p>We spun up a server instance running a hosted <a href="https://jupyter.org/">Jupyter</a> notebook to support anyone who was unable to install the Python package because of security restrictions, and this worked really well.</p><p>The hack day participants worked with a 57 million row artificial dataset that mimicked the 2011 census, which gave a real sense of the lightning speed with which a query can return results.</p><h3>Imagining new uses for census data</h3><p>Despite the challenges of working remotely, the assembled company managed to organise into six groups around an impressive variety of ideas and almost all managed to show a working prototype within a few hours.</p><blockquote>The judges said <em>“The teams worked well together. The fact that it was a remote event did not detract from the quality of work produced; we were impressed with the range of ideas; the speed with which they were realised as well as the potential for these to be developed further.</em>”</blockquote><p><strong>Winner: Map areas where my query is too disclosive</strong></p><p>The winning team approached an issue that is likely to present itself when disclosure control is applied to data. If generated data for an area are too disclosive, the flexible table builder suppresses the output. It may be possible to get more aggregated data by performing the same query at a higher geographic area (e.g. Local Authority (LA) instead of Middle Super Output Area (MSOA)) or by using variables with fewer categories (e.g. ages grouped into 5 year bands instead of individual years). The program highlights these missing areas using a choropleth map. The team produced a working proof of concept. 
It demonstrated great collaboration between the data science campus and the census outputs team.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/497/1*43nFrcW7NEZY0uf-tyS3Xg.png" /><figcaption>Choropleth map highlighting missing areas where a query is too disclosive</figcaption></figure><p><strong>Where should I move to?</strong></p><p>The initial idea was to create a tool to help someone find places they might like to live based on factors that were important to them such as good health or level of education.</p><p>A user would select factors that were important to them, and the tool would perform queries using the API to find areas that met their desired criteria.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/0*zcP4-FAw-IWQU9uM" /><figcaption>Tool to help people find places to live based on factors important to them</figcaption></figure><p><strong>Where is similar to where I live today?</strong></p><p>The purpose of this tool was to identify similar areas based on a set of chosen variables. The tool used a similarity matrix to find the most similar areas to a specified Local Authority (LA). To improve this program, the team suggests developing a UI and generating a map on output, highlighting similar LAs. If an LA had a problem, such as low educational attainment, they could use this tool to find other LAs with a similar population but high educational attainment. Planners could then engage with those other LAs to identify any approaches that might benefit their own area.</p><p><strong>Mapping postcodes to census geographies</strong></p><p>This idea looked at linking Census data to a particular postcode. The tool would map the postcode to areas in the census geographic hierarchy, e.g. ‘What Lower Super Output Area includes my postcode?’ A UI was developed that enabled the user to input a postcode into a search box at the top of a page; when the postcode is submitted, a list of anonymised information would be output. 
This approach could be applied to alternative geographies such as school districts.</p><p><strong>The commonalities quiz</strong></p><p>This team worked on an idea to create a fun game using census data. A commonalities quiz was produced and presented in an easy-to-use UI. Random shuffling was used to make the quiz different each time it was run. The questions are based around identifying areas that are similar according to different characteristics. The project showed that a dataset can be used to return random questions. A quiz like this could be tailored for different audiences, e.g. school children.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/610/1*7hxQ7z_Hze39oczz0qbaiA.png" /><figcaption>2021 Fake Census Data Quiz</figcaption></figure><p><strong>Identifying Covid-19 risk factors</strong></p><p>This idea was inspired by a recent ONS publication around ethnicity and the risk of being a victim of Covid-19. The team looked for a link between several variables (such as commuting distance and mode of transport) and Covid-19 deaths. Their hypothesis was that public transport was a contributing factor in the spread of Covid-19: the longer the time spent on transport, the higher the risk of being infected. The team produced a program that estimated the average time a commuter spent on public transport using a number of basic assumptions. They feel confident that the queries could be combined with more accurate commuter data to establish whether commute time is a Covid-19 risk factor. A similar approach could be applied to other variables.</p><h3>A virtual hack day requires lots of planning!</h3><p>We’d originally planned for an on-site, face-to-face hack day, for which we had experience. Covid lockdown put that idea on ice. We prepared for weeks: we tested and retested the systems. We had a group of users who downloaded the Python client and made sure they were able to access the hosted synthetic dataset. 
We also had fantastic support to mobilise a bunch of analysts from the data science campus, census outputs and digital publishing, including a data analyst in Beirut. We used a designated <strong>Slack</strong> workspace with channels for announcements, team communications and tech support. We used <strong>Google Meet</strong> for presentations and for the API tutorial.</p><h3>What happens next?</h3><p>We’re hoping that some of the ideas presented on the day will be considered for further development and will inspire analysts inside the ONS. We’re developing a Go client and we’re responding to user feedback on how the API worked for the participants.</p><p>We’d like to thank everyone at the ONS who participated and who helped make the event successful.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/370/0*gA4ekFU7baM1a2aB" /><figcaption>Hack day sticker</figcaption></figure><hr><p><a href="https://medium.com/swlh/ons-hack-day-census-data-unplugged-f575728d75cc">ONS hack day Census data unplugged</a> was originally published in <a href="https://medium.com/swlh">The Startup</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Go Cantabular!]]></title>
            <link>https://sensiblecode.medium.com/go-cantabular-f117a98adf02?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/f117a98adf02</guid>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[go]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[data]]></category>
            <category><![CDATA[golang]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Wed, 15 Jan 2020 11:27:43 GMT</pubDate>
            <atom:updated>2020-01-17T15:31:00.131Z</atom:updated>
            <content:encoded><![CDATA[<p>The UK based Office for National Statistics has selected Cantabular to allow flexible dissemination for Census 2021 data. The UK Census is a significant project in both scale and budget (<a href="https://researchbriefings.parliament.uk/ResearchBriefing/Summary/SN05230">estimate for 2011 Census £482 million over a decade</a>).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*Angtk-6nK2sUMKMYwYDOHA.png" /><figcaption>Cantabular — Real-time flexible data dissemination with powerful privacy protection</figcaption></figure><p><a href="https://cantabular.com?utm_source=go-cantabular-blog">Cantabular</a> is designed for organisations that want to share statistics derived from sensitive and potentially personally identifiable data, whilst protecting privacy. It is designed to modernise statistics.</p><p><a href="https://sensiblecode.io/">Sensible Code</a> decided to use the Go programming language (also known as Golang) to build the latest version of its product. The Cantabular team have been working on it for three years and found Go a great fit; almost everyone developing in Go at Sensible Code learned it on the project. Go has a small set of keywords to learn, and a compact and readable language <a href="https://golang.org/ref/spec">specification</a>.</p><h3>Strong tooling and library support</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*yaIstUSx_k-7NL6c" /><figcaption>Picture: Marco Franssen <a href="https://marcofranssen.nl/about/">https://marcofranssen.nl/about/</a></figcaption></figure><p>The toolkit provided by the language core helps all developers, whether new to the language or not. A standout is the <em>gofmt</em> tool that formats valid Go code into a uniform standard. By defining a standard, there are no time-wasting debates about how the code should be formatted. 
It enables our entire Go codebase to be formatted in this standard way, making the code look more uniform and easier to read. In addition <em>gofmt</em> is used to support the continuous integration process in Cantabular and to ensure the code is formatted correctly for every build.</p><p>Furthermore, Go features a test framework (<a href="https://golang.org/cmd/go/#hdr-Test_packages"><em>go test</em></a>) and has performance tooling (<a href="https://blog.golang.org/profiling-go-programs"><em>pprof</em></a>) for profiling; both of these are used in Cantabular. Go reduces the friction to getting started by bundling the core development tools with the language. This is especially important for new developers and it’s worth mentioning the useful defaults included.</p><p>Go’s <a href="https://golang.org/pkg/">extensive official library</a> enabled the Cantabular team <strong>to minimise external dependencies </strong>for this enterprise scale project. As <a href="https://queue.acm.org/detail.cfm?id=3344149">this excellent paper</a> by Russ Cox (<a href="https://twitter.com/_rsc"><strong>@</strong>_rsc</a>) highlights, the convenience of including external code has an overhead associated with <a href="https://www.owasp.org/index.php/Component_Analysis">auditing and managing dependencies</a>. This is a concern in security-critical environments; a software’s attack surface can increase with large numbers of dependencies, both direct and transitive.</p><h3>Safety features</h3><p>Security is an important consideration for Cantabular since it processes sensitive data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/0*_w07LINRp7CWccDt" /><figcaption>Picture: Renee French <a href="https://blog.golang.org/10years">The Go Blog</a></figcaption></figure><p>Go is a memory safe language and includes features that avoid security bugs that might occur if Cantabular was developed in another language. 
<a href="https://research.swtch.com/gorace">Although data races can circumvent this memory safety</a>, there is also an in-built <a href="https://golang.org/doc/articles/race_detector.html">race checker</a> to help catch them.</p><p>Go’s static type system helps too. Data types of values are checked at compilation time, before the code is even run. This process helps catch bugs, rather than discovering a problem while the code is actually running. Additionally, Go’s static typing helps Cantabular offer protection against accidentally publishing raw data by defining distinct types for publication-safe and publication-unsafe values.</p><p>The built-in Go tooling discussed earlier helps with safety too; <a href="https://golang.org/cmd/vet/"><em>go vet</em></a> is a built-in static analysis tool for checking for mistakes or bad coding practices. A large range of third-party <a href="https://github.com/mre/awesome-static-analysis#go">static analysis tools</a> supplement the built-in tooling. As these analysis tools are often written in Go too, they are easy to deploy — see the next section. This makes it simple to incorporate these tools into continuous integration pipelines for providing automated code quality checking that can help highlight <a href="https://blog.trailofbits.com/2019/11/07/attacking-go-vr-ttps/">potential security flaws</a>.</p><h3>Rapid build and deployment</h3><p>Cantabular customers run our code in their secure environments; we do not have access to their systems to debug deployment problems. Customers shouldn’t need to configure prerequisite software to use Cantabular because Go compiles code to standalone binary executables. One hitch is that getting <a href="https://github.com/golang/go/issues/26492">static Go binaries is not quite as simple as it could be</a>, but if this is not yet <em>simple</em>, then at least it is <em>possible</em>. 
Static binaries remove guesswork about which libraries and packages our customers have installed, and save us from imposing additional software requirements (e.g. Docker, Ansible or others).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/0*7sOMRJCEklmg_w_Y" /><figcaption>Picture: Ying Kit Yuen</figcaption></figure><p>Go code can be compiled for multiple platforms from a single operating system, further reducing barriers for a customer looking to run our software. Enterprise customers have varying computing requirements. Some customers are Windows-only shops and the ability to offer a Windows installation option with minimal effort is a great benefit to them. Internally, we’ve benefited from Go’s cross-platform support too: most developers at Sensible Code use Linux, but developers on our Go projects sometimes use OS X.</p><p>Last, and by no means least, quick builds were a <a href="https://talks.golang.org/2009/go_talk-20091030.pdf">must-have</a> for the Go language developers from the outset, and at Sensible Code we benefit directly from this by not waiting long for our code to compile!</p><h3>Going forward</h3><p>Cantabular is a strategic product for the company; we must provide long-term support to all our customers since the product will be operational for some years. We therefore need to consider Go’s plans for forward compatibility.</p><blockquote><a href="https://golang.org/doc/go1compat">It is intended that programs written to the Go 1 specification will continue to compile and run correctly, unchanged, over the lifetime of that specification.</a></blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/881/0*HaI8huMzDZfGzrk0" /></figure><p>Being able to rebuild the source with minimal effort, to incorporate subsequent security fixes to the Go compiler and libraries over a support period, reduces code maintenance overhead. It also means that future Go performance improvements can be passed on to customers. 
Hopefully, this compatibility promise will also hold when Go 2 is finally released, as this post suggests:</p><blockquote><a href="https://blog.golang.org/toward-go2">Go 2 must also bring along all the existing Go 1 source code. We must not split the Go ecosystem.</a></blockquote><h3>Where do we Go next?</h3><p>Choosing a programming language involves trade-offs: a team has to find a language that suits the application domain while supporting developer comfort and productivity. Our positive experiences with Go certainly make it a candidate for future SensibleCode projects.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ofZvsrOk_3DX3BoA" /><figcaption>Picture: Steven Maude, Golang Meetup April 2019</figcaption></figure><p>This post is based on a presentation that <a href="https://www.stevenmaude.co.uk/">Steven Maude</a> gave on behalf of SensibleCode at <a href="https://github.com/goamsterdam/meetups">Golang Amsterdam</a> in 2019.</p><p>Thanks to <a href="https://twitter.com/brocaar">Orne Brocaar</a> and the rest of the great team of organisers for inviting us to speak and for hosting us in the evening.</p><p>Want to know more about Cantabular? <a href="https://cantabular.com?utm_source=go-cantabular-blog">Visit the Cantabular site</a>!</p><p>SensibleCode are looking for a <a href="https://medium.com/@SensibleCode/statistical-disclosure-control-specialist-full-time-8cc6a866759c">Statistical Disclosure Control Specialist</a> to help build core expertise in the discipline.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f117a98adf02" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Cantabular — privacy preserving technology]]></title>
            <link>https://sensiblecode.medium.com/cantabular-privacy-preserving-technology-1d44eddc20a0?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/1d44eddc20a0</guid>
            <category><![CDATA[data]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[government]]></category>
            <category><![CDATA[census]]></category>
            <category><![CDATA[privacy]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Wed, 15 Jan 2020 11:27:07 GMT</pubDate>
            <atom:updated>2020-01-17T16:01:09.789Z</atom:updated>
            <content:encoded><![CDATA[<h3>Cantabular product launch</h3><p>The team at <a href="https://sensiblecode.io/">Sensible Code</a> have been busy for a few years working on an innovative privacy-preserving technology called <a href="https://cantabular.com?utm_source=privacy-cantabular-blog">Cantabular</a>. Its highly performant disclosure control algorithm protects data in real time as a user or researcher makes a query.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*Angtk-6nK2sUMKMYwYDOHA.png" /><figcaption>Cantabular — Modernising statistics by automating disclosure control processes and putting more power in the hands of end-users.</figcaption></figure><p>The UK-based Office for National Statistics has selected Cantabular to allow flexible dissemination for anonymised Census 2021 data.</p><blockquote>We’re delighted to be working with the ONS, given its international reputation as the gold standard in statistical practice. Our technology will transform the way Census 2021 data is disseminated and deliver higher value to the economy, through better policy, better business decisions and valuable research. The software applies robust statistical disclosure control techniques in real time. The ONS is able to compute millions of tables of data at high speed whilst protecting anonymity and to ensure data are non-disclosive.<strong><br><em>Aine McGUIRE, Commercial Director, SensibleCode.</em></strong></blockquote><h3><strong>It’s a journey for the ONS</strong></h3><p>The outputs team at the ONS have been in consultation with users since 2017. From day one the statistical disclosure control professionals within the ONS have been engaged in the process. 
To give some sense of this, we’ve included references to some public events and consultations.</p><blockquote>United Nations Economic and Social Council, Economic Commission for Europe Conference of European Statisticians (ECE/CES/GE.41/2018/18)</blockquote><blockquote><strong>Abstract:</strong> The Office for National Statistics has been working to ensure 2021 Census outputs are more flexible, timely and accessible compared to the 2011 Census. This document outlines our strategic vision for the dissemination of 2021 Census outputs. We also set out the approach we have taken to gather feedback from a spectrum of users on our design and content and how we are planning to incorporate this feedback into our future research. In early 2018, we held a public consultation to outline our vision for the content and design of 2021 Census outputs.</blockquote><blockquote>This included our plans to disseminate the majority of census data through a single point of access via the ONS website using a flexible dissemination system. This will be enabled through an innovative combination of statistical disclosure control methods, which include targeted record swapping, and an automated layer of light-touch perturbation and final disclosure checks. 
We also set out our plans for the design and dissemination of specialist products, including microdata samples and origin-destination (flow) data products.</blockquote><blockquote>Source: <a href="https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.41/2018/Meeting-Geneva-Sept/ECE_CES_GE.41_2018_18-1811914E.pdf">Plans for disseminating 2021 Census data for England and Wales</a></blockquote><h3>Applying Cell-Key Perturbation to 2021 Census Outputs</h3><blockquote><strong>By Iain Dove, Stephanie Blanchard, and Keith Spicer</strong></blockquote><blockquote>“In preparation for 2021, the disclosure control team is investigating several methods of protection, including use of targeted record swapping plus cell key perturbation, illustrated below.</blockquote><blockquote>This would specifically protect against disclosure by differencing and allow user-defined outputs to be distributed through an online table builder. The protection and checking will have been applied before the tables are made, so anything available to be built will not need to be checked, and tables will not need to be re-designed. Most protection would still come from targeted record swapping as before, with cell perturbation also protecting against differencing.”</blockquote><blockquote>Source: <a href="https://gss.civilservice.gov.uk/wp-content/uploads/2017/01/ExN-Disclosure-control-methodology-in-2021-Census-outputs-Spicer-Blanchard-Dove-ONS.docx">Applying Cell-Key Perturbation to 2021 Census Outputs</a></blockquote><h3>…and it’s a journey for SensibleCode too</h3><p>We’ve been working with the ONS since 2016. This is an enterprise-scale digital transformation and is driven by user needs. Census is a critically important dataset for the ONS and information assurance is paramount.</p><p>We’ve included some earlier posts that explain the approach to the product design. 
The product used to be called TableBuilder.</p><ul><li><a href="https://medium.com/@SensibleCode/modernising-statistics-keeping-data-safe-f2a783d40fff">Modernising statistics — keeping data safe</a></li><li><a href="https://medium.com/@SensibleCode/engineering-privacy-for-census-2021-more-data-more-quickly-and-for-more-areas-a24439847973">Engineering confidentiality for Census 2021 — more data, more quickly and for more areas</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1d44eddc20a0" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Statistical Disclosure Control Specialist (full-time)]]></title>
            <link>https://sensiblecode.medium.com/statistical-disclosure-control-specialist-full-time-8cc6a866759c?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/8cc6a866759c</guid>
            <category><![CDATA[privacy]]></category>
            <category><![CDATA[golang]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Wed, 15 Jan 2020 10:58:01 GMT</pubDate>
            <atom:updated>2020-12-02T15:16:43.651Z</atom:updated>
            <content:encoded><![CDATA[<h3>Remote: Senior Statistical Methodologist/Privacy Expert</h3><p>The Sensible Code Company makes digital products that automate processing, privacy protection and publication of data.</p><p>We’re looking for a senior methodologist to develop our understanding of the statistical domain and to help inform how we enhance and extend <a href="https://cantabular.com/">Cantabular</a>.</p><h3>Key things to know about this job:</h3><ul><li>could be a full-time or part-time role</li><li>supporting our technical team in designing approaches to SDC</li><li>supporting our work with large enterprise customers like the Office for National Statistics (ONS) and other statistical institutes and agencies</li><li>participating and speaking at international conferences</li><li>writing technical papers</li><li>must be within a two-hour flight of Belfast, Northern Ireland</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*wfOKbAyF4BinXzMMUTeRpA.jpeg" /></figure><h3>Your skills are:</h3><ul><li>knowledge of, and experience in, implementing SDC methods</li><li>keen enthusiasm for stats packages, e.g. R, SAS, Python</li><li>an ability to interpret and describe the latest SDC methods</li><li>an ability to install and experiment with commercial and open source SDC software</li><li>an ability to convey your ideas in writing and verbally</li></ul><h3>You’ll be responsible for:</h3><ul><li>identifying use cases and associated SDC feature requirements</li><li>creating a knowledge bank on SDC from across the world</li><li>analysing existing SDC software</li><li>documenting algorithms</li><li>finding the highest-value events at which to talk about our SDC software</li><li>installing and configuring software on Windows and Linux, including in the cloud, e.g. 
on AWS</li></ul><h3>Facts about The Sensible Code Company</h3><ul><li>we all have balanced lives; exact working times are flexible</li><li>we offer a generous 30 days’ holiday plus public holidays (38 total)</li><li>we have a remote team across the EU</li><li>we are centred in Belfast</li><li>we use lots of Go, Python, AWS, Docker and more</li></ul><h3>Salary</h3><p>Up to €75,000, based on experience and pro rata for part-time</p><h3>To apply, send the following:</h3><ul><li>Examples of papers you’ve published, if you have any;</li><li>Your CV;</li><li>Your telephone number.</li></ul><h3>How to apply</h3><p>Email <a href="mailto:jobs@sensiblecode.io"><strong>jobs@sensiblecode.io</strong></a><strong> quoting scjob23 </strong>in the subject line (no agencies).</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8cc6a866759c" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Press Release]]></title>
            <link>https://sensiblecode.medium.com/press-release-a3b7d2ae0b68?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/a3b7d2ae0b68</guid>
            <category><![CDATA[government]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[census]]></category>
            <category><![CDATA[dissemination]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Tue, 14 Jan 2020 16:50:29 GMT</pubDate>
            <atom:updated>2020-01-15T11:31:05.326Z</atom:updated>
            <content:encoded><![CDATA[<h3>Press Release: ONS UK selects Cantabular for Census 2021</h3><h4><strong>The UK-based Office for National Statistics has selected Belfast company SensibleCode and its privacy-preserving technology Cantabular for disseminating anonymised Census 2021 data.</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*Angtk-6nK2sUMKMYwYDOHA.png" /></figure><p>The three-year contract, valued at £1.3m, is a significant commercial win for the company, which specialises in products for modernising statistics.</p><p>The ONS is improving the way it disseminates data in order to add value to its community of users. Cantabular applies data anonymisation in real time as a user makes a census query. Data can be published sooner, users can self-serve their queries and researchers can get access to a high-speed API for seamless data processing.</p><blockquote><strong><em>Aine McGUIRE, Commercial Director, SensibleCode.</em></strong> “We’re delighted to be working with the ONS, given its international reputation as the gold standard in statistical practice. Our technology will transform the way Census 2021 data is disseminated and deliver higher value to the economy, through better policy, better business decisions and valuable research. The software applies robust statistical disclosure control techniques in real time. The ONS is able to compute millions of tables of data at high speed whilst protecting anonymity and to ensure data are non-disclosive”.</blockquote><h3><strong>About Office for National Statistics, UK</strong></h3><p>The Office for National Statistics (ONS; Welsh: Swyddfa Ystadegau Gwladol) is the executive office of the UK Statistics Authority, a non-ministerial department which reports directly to the UK Parliament. 
It is charged with the collection and publication of statistics related to the economy, population and society of the UK; responsibility for some areas of statistics in Scotland, Northern Ireland and Wales is devolved to the respective governments of those areas.</p><h3>About SensibleCode</h3><p><a href="https://sensiblecode.io/">The Sensible Code Company</a> is a digital start-up with venture capital backing. It has won several awards for innovative technology products. Its groundbreaking technology modernises the processing and dissemination of data. The software is designed to support statisticians and data controllers, and to help improve business operations that require the processing of confidential data.</p><h3>Press contacts</h3><p>Aine McGUIRE <br><a href="mailto:aine@sensiblecode.io">aine@sensiblecode.io</a> <br>+44 (0)7710 377929</p><p><a href="https://cantabular.com?utm_source=cantabular-pr">cantabular.com</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/430/0*_L1js2JDnPON1uGT" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a3b7d2ae0b68" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Modernising statistics — keeping data safe]]></title>
            <link>https://sensiblecode.medium.com/modernising-statistics-keeping-data-safe-f2a783d40fff?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/f2a783d40fff</guid>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[gdpr]]></category>
            <category><![CDATA[economics]]></category>
            <category><![CDATA[privacy]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Thu, 13 Dec 2018 13:33:32 GMT</pubDate>
            <atom:updated>2018-12-14T14:15:30.451Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Modernising statistics — keeping data safe</strong></h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/711/0*ij2NPcqdB3_OdwMs" /><figcaption>Pipeline for processing personal data</figcaption></figure><p>Statistics professionals within public sector agencies take great care in how they process and protect personal data and this is reflected in the trust and respect they enjoy from their customers and the public at large. GDPR has thrown a further spotlight on governance around data confidentiality.</p><p>A significant increase in the number of new data sources and a drive to make use of <a href="https://en.wikipedia.org/wiki/Administration_Data">admin data</a> means more data is being processed and published. Getting data released as close as possible to their collection date results in higher value to the economy. Transparency and open data are also driving change. There is an appetite for innovation and projects that focus on data confidentiality are happening across government.</p><p>There are two routes to getting data:</p><ul><li>aggregate tables are published on a website; these have had manual disclosure control techniques applied</li><li>access to research data is made available to those who meet specific criteria and sign legal agreements. Where data is highly sensitive the research is often conducted in a locked and secure room.</li></ul><h3><strong>Common challenges</strong></h3><p><strong><em>Data consumers</em></strong></p><ul><li>want more data, sooner.</li><li>want to be able to access customised tables as their needs cannot always be served by precomputed published tables.</li><li>require more metadata to support statistical releases.</li><li>public sector data consumers worry about a 100% push to APIs. 
They fully support open data, but are under cost pressure and do not have the resources to exploit APIs downstream.</li></ul><p><strong><em>Process</em></strong></p><ul><li>Myriad disclosure control techniques exist; these are manual processes.</li><li>Legal agreements to protect data access are complex and time-consuming to negotiate and agree.</li><li>Data controllers want users to be able to make ‘meaningful’ queries so that data cannot be misinterpreted.</li><li>A pragmatic risk-utility balance needs to be achieved and a precedent set if automation is to be successful.</li></ul><p><strong><em>Culture</em></strong></p><ul><li>Microdata access is still the norm for research purposes for a percentage of government datasets</li></ul><p>There is no simple solution to address all of the challenges; it seems likely that a variety of different approaches will be used to address the tension between publication and statistical disclosure control (SDC).</p><h3>Open source software for SDC</h3><p>There are numerous tried and tested open source solutions in the market.</p><p><a href="https://archive.fo/OapRI"><strong>sdcMicro</strong></a></p><p>sdcMicro is a free, R-based open-source package for the generation of protected microdata for researchers and public use.</p><blockquote>Multiple options for reducing disclosure risk and for assessing information loss.</blockquote><blockquote>Multiple methods for assessing the re-identification risk (k-anonymity, individual, and global re-identification risk).</blockquote><blockquote>Graphical user interface available for users with no or limited knowledge of the R language.</blockquote><p>Author: <em>Matthias Templ</em></p><p><strong>τ-ARGUS</strong></p><p>Tau-Argus is a software tool which enables statistical disclosure control to be carried out to protect tabular output. 
It can be run in either interactive or batch mode and can import tables or microdata, allowing the user to create tables. Tau-Argus supports either frequency or magnitude data types and once imported along with a metadata file, the user can apply a number of confidentiality rules.</p><p>Typically for magnitude tables, safety rules such as threshold and dominance rules are set by the user, and cells failing these rules are highlighted, allowing the user to select them for suppression. In order to avoid disclosure by differencing, secondary suppression can be applied using a variety of techniques. For frequency tables, controlled rounding is commonly applied. This method rounds cell values to the nearest multiple of a user specified base, whilst maintaining the table additivity.</p><ul><li>Author: <em>Numerous authors. Peter-Paul de Wolf </em>— Statistics Netherlands.</li></ul><p><a href="https://archive.fo/mpZXS#selection-263.0-271.82"><strong>SUDA</strong></a></p><p>Software tool for use with statistical disclosure control for microdata. It provides a per record risk measure which not only tells the user how much risk they have but also where in the file (by record, by variable, by variable value) the risk is located. 
This enables the user to make principled decisions and to target disclosure control, which in turn maximises the residual utility.</p><p>Author: <em>Mark Elliot</em>, University of Manchester</p><h3>Protecting personal data with TableBuilder</h3><p>TableBuilder is designed to help statistics professionals to modernise the way they process and disseminate data whilst ensuring data is kept confidential.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/887/0*-O0WcGKfHI1DqA3N" /><figcaption>Pipeline for processing personal data with TableBuilder</figcaption></figure><blockquote><strong>It allows flexible data dissemination through real-time application of disclosure control techniques in response to user queries</strong>.</blockquote><h3>The benefits</h3><ul><li>More granular data</li><li>More speed as data are released closer to collection date</li><li>More flexible as users can make their own queries</li></ul><p>There is a standard user interface and an API. <strong>Data controllers can use TableBuilder as an internal tool to model disclosure risk.</strong></p><p>The <strong>data controller administration</strong> module lets data controllers experiment with disclosure parameters and see the results in real time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3gy2jh_k6UiftvT0O0jGhQ.png" /></figure><h3>Components</h3><ul><li>Statistical disclosure control module (cell key method with automatic preservation of structural zeros)</li><li>Data controller administrator module</li><li>End user interface module</li></ul><p>Poster presented at the International Association of Statistics in Paris in September 2018.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/555/1*Rq9ehBWni5wRkN6JJ24c5g.png" /><figcaption><a href="https://sensiblecode.io/resources/case-study-ons.pdf">https://sensiblecode.io/resources/case-study-ons.pdf</a></figcaption></figure><img 
src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f2a783d40fff" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Engineering confidentiality for Census 2021 — more data, more quickly and for more areas.]]></title>
            <link>https://sensiblecode.medium.com/engineering-privacy-for-census-2021-more-data-more-quickly-and-for-more-areas-a24439847973?source=rss-a47c6b913af------2</link>
            <guid isPermaLink="false">https://medium.com/p/a24439847973</guid>
            <category><![CDATA[census]]></category>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[gdpr]]></category>
            <category><![CDATA[privacy]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[The Sensible Code Company]]></dc:creator>
            <pubDate>Thu, 01 Mar 2018 11:15:09 GMT</pubDate>
            <atom:updated>2018-03-02T11:43:59.497Z</atom:updated>
            <content:encoded><![CDATA[<iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fr5sDksDYY7k%3Ffeature%3Doembed&amp;url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dr5sDksDYY7k&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2Fr5sDksDYY7k%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/bad9f5a71fc42b994032ea6d6cfcd036/href">https://medium.com/media/bad9f5a71fc42b994032ea6d6cfcd036/href</a></iframe><p><strong>Over the past nine months we’ve added substantial capability to TableBuilder. The end game is to allow the Office for National Statistics, England and Wales to publish more 2021 Census data tables, much sooner, while ensuring that data is kept confidential.</strong></p><p>The Office for National Statistics is a trusted organisation: it commits to ensuring that the personal information people provide will be kept safe and secure. It also has to comply with confidentiality requirements set out in the Statistics and Registration Service Act 2007, the Data Protection Act 1998 and the forthcoming General Data Protection Regulation.</p><h3><strong>Publishing more tables for diverse areas</strong></h3><p>Access to finer grained data across England and Wales will mean that decisions based on census data can be better informed. Disclosure control rules are complex and historically they’ve been applied manually. This meant they were mostly applied uniformly across different geographical areas because there are too many output areas to consider individually: about 180,000 in England and Wales. 
This gave rise to a problem: tables that could be created for diverse areas (like Barnet in London) were not published because a similar table would be too disclosive in, for example, Northumberland.</p><p>Using TableBuilder the ONS can create a set of rules which are applied uniformly across the country. However, the effect of the rules is not uniform because they depend on data which varies across the country. The rules are evaluated independently for every area in a user’s query (up to 180,000 areas). Data is only published for those areas where all the rules pass.</p><h3>With TableBuilder we can now:</h3><ul><li>Process more complex input data</li><li>Set disclosure rules to determine which tables can be published</li><li>Automatically preserve ‘structural zeros’</li><li>Demonstrate the user interface</li><li>Generate tables orders of magnitude faster than a conventional database</li></ul><h3>Process more complex input data</h3><p>We’ve transformed TableBuilder so that it can handle the amount of data generated by the Census (over 56 million people living in more than 23 million households according to the last Census). We’ve loaded a full geographic hierarchy for England and Wales (180,000 output areas), added mappings (a.k.a. grouping or banding) and loaded ONS-supplied perturbation parameters.</p><h3>Set Disclosure Rules</h3><p>The ONS has developed a set of disclosure rules which determine whether a table can be published. These rules are configurable with numeric threshold parameters that determine their sensitivity.</p><p>We created a disclosure rules editor to allow the ONS team to experiment with adjusting the rule parameters. Using this tool they can immediately see the impact on the number of different tables that could be published across the country.</p><h4>How it works</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/640/0*ymN3URTAz5tC4U4b." 
/></figure><p>TableBuilder Architecture</p><h3>Preserve Structural zeros</h3><p>Some combinations of categories should never coexist, for example: people under 16 who are in full time employment. Zeros in the output table with these combinations are deemed “structural”.</p><p>In order to avoid exposing the workings of the perturbation algorithm we need to make sure that “structural” zeros in the output table are not perturbed. TableBuilder does this by first running the user’s query at a higher level geography to determine which zeros must not be changed.</p><h3><strong>Demonstrate the user interface</strong></h3><p>We’ve built a test interface to allow the ONS to show its users how the system works and to facilitate user research.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/640/0*NYjZrZQealfighZ9." /></figure><h3>Why performance matters</h3><p>TableBuilder has been engineered from the ground up to be performant: we can deliver results to queries in sub-second response times. In this time-frame we scan 56,000,000 rows, compute results at two different geographic levels, execute the cell key perturbation and evaluate multiple business rules on up to 180,000 areas.</p><h3><strong>What happens next?</strong></h3><p>The dissemination team at the ONS will show the system to some of their users and conduct more user research and testing.</p><p><em>“Users have always wanted the data as soon as possible after the Census. We expect this desire to be strongly reflected again in our forthcoming public consultation on outputs [launching on 28 February this year]. This work paves the way for us to be able to meet this demand”</em>.</p><p><strong>Suzie Dunsmith (Head of 2021 Census Outputs at ONS)</strong></p><h3>The backstory….</h3><p>This blog tells you how we got here! 
<a href="https://medium.com/@SensibleCode/expose-the-value-protect-the-data-51b701ebfed4">Expose the value — Protect the data</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a24439847973" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>