{"id":62120,"date":"2024-04-30T10:49:08","date_gmt":"2024-04-30T10:49:08","guid":{"rendered":"https:\/\/www.askpython.com\/?p=62120"},"modified":"2025-04-10T20:28:49","modified_gmt":"2025-04-10T20:28:49","slug":"survival-analysis-python","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/survival-analysis-python","title":{"rendered":"Survival Analysis in Python: A Comprehensive Guide with Examples"},"content":{"rendered":"\n<p>Survival analysis is a statistical method for investigating the time until an event of interest occurs, making it invaluable in fields such as medical sciences, engineering, and beyond. With its extensive libraries like <a href=\"https:\/\/www.askpython.com\/python-modules\/numpy\/numpy-frexp\" data-type=\"post\" data-id=\"59605\">NumPy<\/a> and <a href=\"https:\/\/www.askpython.com\/python-modules\/matplotlib\/python-matplotlib\" data-type=\"post\" data-id=\"3182\">Matplotlib<\/a>, Python provides an ideal platform for implementing survival analysis. <\/p>\n\n\n\n<p>From generating random survival data to calculating survival probabilities using the Kaplan-Meier method and visualizing survival curves, Python empowers us to unravel the mysteries of survival analysis.<\/p>\n\n\n\n<p>In this article, we will discuss the concept of survival analysis and observe a simple case.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Survival analysis is a statistical method used to calculate the time until an event of interest occurs. It is applied in various fields, such as engineering and medical sciences, to evaluate the viability of different approaches or treatments. 
Python provides tools like NumPy and Matplotlib to generate random survival data, calculate survival probabilities using the Kaplan-Meier method, and visualize survival curves for comparison.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p><strong><em>Recommended: <a href=\"https:\/\/www.askpython.com\/python\/examples\/non-parametric-statistics-in-python\">Non-Parametric Statistics in Python: Exploring Distributions and Hypothesis Testing<\/a><\/em><\/strong><\/p>\n\n\n\n<p><strong><em>Recommended: <a href=\"https:\/\/www.askpython.com\/python\/examples\/joint-probability-distribution\">Understanding Joint Probability Distribution with Python<\/a><\/em><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction to Survival Analysis: Concept and Implementation in Python<\/h2>\n\n\n\n<p>Survival analysis models the time until an event of interest occurs. It is used in many fields, such as engineering and medical sciences. A key feature is censoring: for some subjects the event is not observed before the study ends, so only a lower bound on their survival time is known. The code below simulates durations and event indicators with NumPy and computes the Kaplan-Meier estimate, which multiplies the survival probability by (1 - d\/n) at each event time, where d is the number of events and n is the number of subjects still at risk.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n# --- Generate Random Survival Data with NumPy ---\n\n# Set random seed (optional, for reproducibility)\nnp.random.seed(42)\n\n# Simulate random durations (time) between 0 and 10\ndurations = np.random.uniform(low=0, high=10, size=100)\n\n# Simulate event indicators (1 = event observed, 0 = censored)\n# Adjust probability (p) to control the number of events\nevents = np.random.binomial(n=1, p=0.3, size=100)  # Assuming 30% event probability\n\n# Sort together by duration (ascending)\ndata = np.array(&#x5B;durations, events]).T\ndata = data&#x5B;data&#x5B;:, 0].argsort()]\n\n# Separate sorted data\nsorted_durations = data&#x5B;:, 0]\nsorted_events = data&#x5B;:, 1]\n\n# Initialize variables for Kaplan-Meier calculation\nn_total = len(sorted_durations)\nn_alive = n_total  # Number of subjects still at risk\nt = &#x5B;]\ns = &#x5B;]  # Survival probability\n\nfor i in range(n_total):\n    if sorted_events&#x5B;i] == 1:\n        # Kaplan-Meier step: multiply the previous estimate by (1 - 1\/n_alive)\n        previous = s&#x5B;-1] if s else 1.0\n        t.append(sorted_durations&#x5B;i])\n        s.append(previous * (n_alive - 1) \/ n_alive)\n    n_alive -= 1  # Subject leaves the risk set (event or censored)\n\n# --- Plot Kaplan-Meier Curve (DIY) ---\n\nplt.figure(figsize=(8, 6))\nplt.step(t, s, where=&#039;post&#039;)  # Step function for Kaplan-Meier curve\nplt.xlabel(&quot;Time&quot;)\nplt.ylabel(&quot;Probability of Survival&quot;)\nplt.grid(True)\nplt.title(&quot;Kaplan-Meier Survival Curve (DIY with NumPy)&quot;)\nplt.show()\n\n# --- Note on Cox Proportional Hazards Model ---\n\nprint(&quot;A Cox Proportional Hazards model is beyond the scope of this NumPy-only example.&quot;)\nprint(&quot;Consider using libraries like lifelines or statsmodels for this analysis.&quot;)\n\n<\/pre><\/div>\n\n\n<p>Let us look at the output of the code above.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"700\" height=\"547\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/04\/Survival-analysis-output.png\" alt=\"Survival Analysis Output\" class=\"wp-image-62203\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/04\/Survival-analysis-output.png 700w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/04\/Survival-analysis-output-300x234.png 300w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><figcaption class=\"wp-element-caption\"><strong><em>Plotting Kaplan-Meier Survival Curve using NumPy<\/em><\/strong><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Real-World Example: Comparing Survival Probabilities of New Drug vs. Standard Treatment<\/h2>\n\n\n\n<p>Let us look at a simple example of how survival analysis is used in practice. Suppose there are two treatments for a disease: a standard drug and a new drug. We estimate a Kaplan-Meier survival curve for each group to see which treatment performs better. 
We have created random data to support our research activity.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n# Set random seed (optional, for reproducibility)\nnp.random.seed(42)\n\n# Sample sizes (adjust for desired imbalance)\nn_drug = 50\nn_standard = 70\n\n# Simulate durations (potentially affected by treatment)\n# Drug group might have slightly longer survival times on average\ndrug_durations = np.random.normal(loc=5, scale=1.5, size=n_drug)\nstandard_durations = np.random.normal(loc=4, scale=1, size=n_standard)\n\n# Simulate events (considering potential treatment effect)\ndrug_events = np.random.binomial(n=1, p=0.4, size=n_drug)  # Assuming lower event probability for drug\nstandard_events = np.random.binomial(n=1, p=0.6, size=n_standard)  # Assuming higher event probability for standard\n\n# Simulate treatment assignment (0: Standard, 1: Drug)\ntreatment = np.concatenate((np.zeros(n_standard), np.ones(n_drug)))\n\n# Combine data into arrays\ndurations = np.concatenate((standard_durations, drug_durations))\nevents = np.concatenate((standard_events, drug_events))\nsorted_data = np.array(&#x5B;durations, events, treatment]).T  # Combine and sort by duration\nsorted_data = sorted_data&#x5B;sorted_data&#x5B;:, 0].argsort()]\n\n# Separate sorted data\nsorted_durations = sorted_data&#x5B;:, 0]\nsorted_events = sorted_data&#x5B;:, 1]\ntreatments = sorted_data&#x5B;:, 2]  # Separate treatment assignments\n\n# Initialize variables for Kaplan-Meier calculation (per treatment group)\nn_total_drug = np.sum(treatments == 1)\nn_total_standard = np.sum(treatments == 0)\nn_alive_drug = n_total_drug\nn_alive_standard = n_total_standard\nt_drug = &#x5B;]\ns_drug = &#x5B;]  # Survival probability (Drug)\nt_standard = &#x5B;]\ns_standard = &#x5B;]  # Survival probability (Standard)\n\ncurrent_time = sorted_durations&#x5B;0]\nevent_index = 0\n\nwhile 
event_index &lt; len(sorted_events):\n    # Process each observation once, in order of increasing duration\n    current_time = sorted_durations&#x5B;event_index]\n    is_event = sorted_events&#x5B;event_index] == 1\n    if treatments&#x5B;event_index] == 1:\n        if is_event:\n            # Kaplan-Meier step (drug group): multiply by (1 - 1\/n_alive_drug)\n            previous = s_drug&#x5B;-1] if s_drug else 1.0\n            t_drug.append(current_time)\n            s_drug.append(previous * (n_alive_drug - 1) \/ n_alive_drug)\n        n_alive_drug -= 1  # Leaves the risk set (event or censored)\n    else:\n        if is_event:\n            # Kaplan-Meier step (standard group): multiply by (1 - 1\/n_alive_standard)\n            previous = s_standard&#x5B;-1] if s_standard else 1.0\n            t_standard.append(current_time)\n            s_standard.append(previous * (n_alive_standard - 1) \/ n_alive_standard)\n        n_alive_standard -= 1  # Leaves the risk set (event or censored)\n    event_index += 1\n\n# --- Plot Kaplan-Meier curves (DIY) ---\n\nplt.figure(figsize=(8, 6))\nplt.step(t_drug, s_drug, where=&#039;post&#039;, label=&#039;New Drug&#039;)\nplt.step(t_standard, s_standard, where=&#039;post&#039;, label=&#039;Standard Treatment&#039;)\nplt.xlabel(&quot;Time&quot;)\nplt.ylabel(&quot;Probability of Survival&quot;)\nplt.grid(True)\nplt.title(&quot;Kaplan-Meier Survival Curves (DIY with NumPy)&quot;)\nplt.legend()\nplt.show()\n\n# --- Note on Statistical Comparison ---\n\nprint(&quot;This example demonstrates Kaplan-Meier curves. 
Consider tests&quot;)\nprint(&quot;like the Log-Rank test to statistically compare survival between groups.&quot;)\n\n<\/pre><\/div>\n\n\n<p>Let us look at the output of this code.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"709\" height=\"547\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/04\/Survival-analysis-case-study.png\" alt=\"Survival Analysis Case Study\" class=\"wp-image-62224\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/04\/Survival-analysis-case-study.png 709w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/04\/Survival-analysis-case-study-300x231.png 300w\" sizes=\"auto, (max-width: 709px) 100vw, 709px\" \/><figcaption class=\"wp-element-caption\"><strong><em>Survival Analysis Case Study<\/em><\/strong><\/figcaption><\/figure>\n\n\n\n<p>The simulated data give the new drug a lower event probability (0.4 versus 0.6) and a longer average duration (5 versus 4), so its survival curve is expected to lie above that of the standard treatment. Because the data are randomly generated, this illustrates the method and is not a real clinical result. To decide whether the difference between the curves is statistically significant, use a test such as the log-rank test.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>This article introduced survival analysis and a basic implementation in Python. We explored the concept, generated random survival data using NumPy, plotted Kaplan-Meier survival curves, and worked through a case study comparing a new drug against a standard treatment.<\/p>\n\n\n\n<p>Survival analysis is a powerful tool for evaluating the viability of different approaches in various domains. With this knowledge, you can apply survival analysis to your projects and make data-driven decisions. 
So, are you ready to put your survival analysis skills to the test and uncover valuable insights?<\/p>\n\n\n\n<p><strong><em>Recommended: <a href=\"https:\/\/www.askpython.com\/python\/examples\/probability-distributions\">Probability Distributions with Python (Implemented Examples)<\/a><\/em><\/strong><\/p>\n\n\n\n<p><strong><em>Recommended: <a href=\"https:\/\/www.askpython.com\/python-modules\/pyjanitor-miscellaneous-functions\">10 PyJanitor\u2019s Miscellaneous Functions for Enhancing Data Cleaning<\/a><\/em><\/strong><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Survival analysis is a statistical method for investigating the time until an event of interest occurs, making it invaluable in fields such as medical sciences, engineering, and beyond. With its extensive libraries like NumPy and Matplotlib, Python provides an ideal platform for implementing survival analysis. From generating random survival data to calculating survival probabilities using [&hellip;]<\/p>\n","protected":false},"author":80,"featured_media":63863,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-62120","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/62120","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/80"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=62120"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/62120\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href"
:"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/63863"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=62120"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=62120"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=62120"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}