{"id":9288,"date":"2023-03-16T17:29:10","date_gmt":"2023-03-17T01:29:10","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/python\/?p=9288"},"modified":"2023-05-24T09:57:23","modified_gmt":"2023-05-24T17:57:23","slug":"data-wrangler-release","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/python\/data-wrangler-release\/","title":{"rendered":"Introducing the Data Wrangler extension for Visual Studio Code"},"content":{"rendered":"<p>We&#8217;re excited to announce the launch of Data Wrangler, a revolutionary tool for data scientists and analysts who work with tabular data in Python. Data Wrangler is an extension for <a href=\"https:\/\/code.visualstudio.com\/\" target=\"_blank\" rel=\"noopener\">VS Code<\/a> and the first step towards our vision of simplifying and expediting the data preparation process on Microsoft platforms.<\/p>\n<p>Data preparation, cleaning, and visualization is a time-consuming task for many data scientists, but with Data Wrangler we&#8217;ve developed a solution that simplifies this process. Our goal is to make this process more accessible and efficient for everyone, to free up your time to focus on other parts of the data science workflow. To try Data Wrangler today, go to the Extension Marketplace tab in VS Code and search for &#8220;Data Wrangler&#8221;. To learn more about Data Wrangler, check out the documentation here: <a href=\"https:\/\/aka.ms\/datawrangler\">https:\/\/aka.ms\/datawrangler<\/a>.<\/p>\n<p style=\"text-align: center;\"><iframe title=\"YouTube video player\" src=\"\/\/www.youtube.com\/embed\/KrzcV1c1W1U\" width=\"562\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p>With Data Wrangler, you can seamlessly clean and explore your data in <a href=\"https:\/\/code.visualstudio.com\/\">VS Code<\/a>. It offers a variety of features that will help you quickly identify and fix errors, inconsistencies, and missing data. 
You can perform data profiling and data quality checks, visualize data distributions, and easily transform data into the format you need. Plus, Data Wrangler comes with a library of built-in transformations and visualizations, so you can focus on your data, not the code. As you make changes, the tool generates code using open-source Python libraries for the data transformation operations you perform. This means you can write better data preparation programs faster and with fewer errors. The code also keeps Data Wrangler transparent and helps you verify the correctness of the operation as you go.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/user-images.githubusercontent.com\/15910920\/222234663-5e8178f8-851f-4326-8719-9d90039ce023.gif\" alt=\"Data Wrangler operation\" \/><\/p>\n<p>In a recent study, Python data scientists using the Pandas dataframe library report spending the majority (~51%) of their time preparing, cleaning and visualizing data for their models (<a href=\"https:\/\/www.anaconda.com\/state-of-data-science-report-2022\">Anaconda State of Data Science Report 2022<\/a>). This activity is critical to the success of their projects, as poor data quality directly impacts the quality of the predictions made by their models. Furthermore, this activity is not predictable: the industry even calls it <em>exploratory data analysis<\/em> to capture the fact that it is often highly creative, requiring experimentation, visualization, comparison and iteration. However, despite the activity being creative and iterative, the individual operations are not \u2013 they involve writing small code snippets that drop columns, remove missing values, etc. 
But today there isn\u2019t tooling support that makes these operations easier; in our research with data scientists, we regularly see them searching for and copy-pasting snippets of code from Stack Overflow into their programs.<\/p>\n<h2>Data Wrangler Interface<\/h2>\n<p>With Data Wrangler, we&#8217;ve developed an interactive UI that writes the code for you. As you inspect and visualize your Pandas dataframes using Data Wrangler, generating the code for your desired operations is easy. For instance, if you want to remove a column, you can right-click on the column heading and delete it, and Data Wrangler will generate the Python code to do that. If you want to remove rows containing missing values or substitute them with a computed default value, you can do that directly from the UI. If you want to reformat a categorical column by one-hot encoding it to make it suitable for machine learning algorithms, you can do so with a single command.<\/p>\n<h2>Create column from examples<\/h2>\n<p>Data scientists often need to create a new derived column from existing columns in their Pandas dataframe, which usually involves writing custom code that can easily become a source of bugs. With Data Wrangler, all you need to do is provide examples of how you want the data in the derived column to look, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/prose-framework\/\">PROSE<\/a>, our AI-powered program synthesis technology (the same technology that powers Microsoft Excel&#8217;s Flash Fill feature), will write the Python code for you. If you find an error in the results, you can correct it with a new example, and PROSE will rewrite the Python code to produce a better result. 
You can even modify the generated code yourself.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/user-images.githubusercontent.com\/15910920\/222234636-26075b4c-386f-48d5-8e4a-280935ea7876.gif\" alt=\"Extract first name by example\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>How to try Data Wrangler<\/h2>\n<p>To start using Data Wrangler today in <a href=\"https:\/\/code.visualstudio.com\">Visual Studio Code<\/a>, just download the Data Wrangler extension from the marketplace and visit our <a href=\"https:\/\/aka.ms\/datawrangler\">getting started page<\/a> to try it out! You can then launch Data Wrangler from any Pandas dataframe output in a Jupyter Notebook, or by right-clicking any CSV or Parquet file in VS Code and selecting &#8220;Open in Data Wrangler&#8221;.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/user-images.githubusercontent.com\/15910920\/222234652-0f1d0f93-cbb7-4928-b83e-5d9c101e32cd.gif\" alt=\"Data Wrangler entrypoint\" \/><\/p>\n<p>This is the first release of Data Wrangler so we are looking for feedback as we iterate on the product. Please provide any product feedback <a href=\"https:\/\/aka.ms\/datawranglerfeedback\">here<\/a>. 
If you run into any issues, please file a bug report in our GitHub repo <a href=\"https:\/\/github.com\/microsoft\/vscode-data-wrangler\/issues\/new\/choose\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft announces the launch of Data Wrangler, a data-centric user interface that generates Python code to help data scientists complete their data preparation tasks faster and with fewer errors.<\/p>\n","protected":false},"author":8347,"featured_media":9289,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,6],"tags":[1032,17],"class_list":["post-9288","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","category-visual-studio-code","tag-data-science","tag-python"],"acf":[],"blog_post_summary":"<p>Microsoft announces the launch of Data Wrangler, a data-centric user interface that generates Python code to help data scientists complete their data preparation tasks faster and with fewer 
errors.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/posts\/9288","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/users\/8347"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/comments?post=9288"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/posts\/9288\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/media\/9289"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/media?parent=9288"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/categories?post=9288"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/tags?post=9288"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}