{"id":1003,"date":"2026-01-21T06:03:40","date_gmt":"2026-01-21T06:03:40","guid":{"rendered":"https:\/\/www.askpython.com\/?p=1003"},"modified":"2026-01-25T15:03:18","modified_gmt":"2026-01-25T15:03:18","slug":"python-set","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/set\/python-set","title":{"rendered":"Python Set &#8211; Things You MUST Know"},"content":{"rendered":"<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Create a set\nnumbers = {1, 2, 3, 4, 5}\nempty_set = set()  # Not {}, that&#039;s a dictionary\nunique_items = set(&#x5B;1, 2, 2, 3, 3, 4])  # {1, 2, 3, 4}\n\n<\/pre><\/div>\n\n\n<p>Sets are Python&#8217;s built-in data structure for storing unique, unordered collections. <\/p>\n\n\n\n<p>They automatically eliminate duplicates and provide fast membership testing. If you&#8217;ve ever needed to remove duplicate entries from a list or check whether something exists in a collection without iterating through every item, sets solve that problem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Creating and initializing Python sets<\/h2>\n\n\n\n<p>You can create Python sets in three ways. 
The curly brace syntax works for non-empty sets, while the <code>set()<\/code> constructor handles empty sets and conversions from other iterables.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Direct creation\nfruits = {&#039;apple&#039;, &#039;banana&#039;, &#039;cherry&#039;}\n\n# From a list (duplicates removed automatically)\nnumbers = set(&#x5B;1, 2, 2, 3, 3, 3, 4])\nprint(numbers)  # {1, 2, 3, 4}\n\n# From a string (each character becomes an element)\nletters = set(&#039;hello&#039;)\nprint(letters)  # {&#039;h&#039;, &#039;e&#039;, &#039;l&#039;, &#039;o&#039;} (order may vary)\n\n# Empty set (careful here)\nempty = set()  # Correct\nempty_dict = {}  # This creates a dictionary, not a set\n\n<\/pre><\/div>\n\n\n<p>The gotcha that trips up newcomers is the empty set syntax. You can&#8217;t use <code>{}<\/code> because Python reserves that for dictionaries. This makes sense once you know that sets came later to the language than dictionaries, so the curly brace literal was already taken.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Basic operations with Python sets that actually matter<\/h2>\n\n\n\n<p>Sets shine when you need to add, remove, or check for items. 
The operations are straightforward, and membership checks and removal by value are typically faster than the equivalent list operations.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nanimals = {&#039;cat&#039;, &#039;dog&#039;, &#039;bird&#039;}\n\n# Adding items\nanimals.add(&#039;fish&#039;)\nprint(animals)  # {&#039;cat&#039;, &#039;dog&#039;, &#039;bird&#039;, &#039;fish&#039;}\n\n# Adding won&#039;t duplicate\nanimals.add(&#039;cat&#039;)\nprint(animals)  # Still {&#039;cat&#039;, &#039;dog&#039;, &#039;bird&#039;, &#039;fish&#039;}\n\n# Removing items (raises KeyError if not found)\nanimals.remove(&#039;dog&#039;)\n\n# Safer removal (no error if missing)\nanimals.discard(&#039;elephant&#039;)  # Does nothing, no error\n\n# Remove and return an arbitrary item (raises KeyError if the set is empty)\narbitrary_animal = animals.pop()\n\n<\/pre><\/div>\n\n\n<p>The difference between <code>remove()<\/code> and <code>discard()<\/code> matters in production code. If you&#8217;re not certain an item exists, <code>discard()<\/code> saves you from handling exceptions. I&#8217;ve seen codebases littered with try-except blocks around <code>remove()<\/code> calls when <code>discard()<\/code> would have been cleaner.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Membership testing is why Python sets exist<\/h2>\n\n\n\n<p>This is where sets actually earn their keep. Checking if an item exists in a set takes constant time on average, O(1), while a list has to be scanned item by item, O(n).<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Slow with lists\nitems_list = list(range(10000))\nprint(9999 in items_list)  # Scans the list until it finds a match\n\n# Fast with sets\nitems_set = set(range(10000))\nprint(9999 in items_set)  # Hash lookup, no scan\n\n<\/pre><\/div>\n\n\n<p>The performance difference isn&#8217;t academic. 
If you&#8217;re checking membership repeatedly in a loop, using a list instead of a set can turn a millisecond operation into minutes. I&#8217;ve optimized code from 30 seconds to under a second just by converting a membership check from a list to a set.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Python set mathematics for practical problems<\/h2>\n\n\n\n<p>Sets implement mathematical operations that solve real problems. Union combines sets, intersection finds common elements, and difference shows what&#8217;s unique to one set.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndevelopers = {&#039;Alice&#039;, &#039;Bob&#039;, &#039;Charlie&#039;}\ndesigners = {&#039;Bob&#039;, &#039;Diana&#039;, &#039;Eve&#039;}\n\n# Union: everyone on either team\nall_people = developers | designers\nprint(all_people)  # {&#039;Alice&#039;, &#039;Bob&#039;, &#039;Charlie&#039;, &#039;Diana&#039;, &#039;Eve&#039;}\n\n# Intersection: people on both teams\nboth_teams = developers &amp; designers\nprint(both_teams)  # {&#039;Bob&#039;}\n\n# Difference: only developers\nonly_devs = developers - designers\nprint(only_devs)  # {&#039;Alice&#039;, &#039;Charlie&#039;}\n\n# Symmetric difference: people on exactly one team\none_team_only = developers ^ designers\nprint(one_team_only)  # {&#039;Alice&#039;, &#039;Charlie&#039;, &#039;Diana&#039;, &#039;Eve&#039;}\n\n<\/pre><\/div>\n\n\n<p>These operations have method equivalents that read more clearly in some contexts.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Method versions of the same operations\nall_people = developers.union(designers)\nboth_teams = developers.intersection(designers)\nonly_devs = developers.difference(designers)\none_team_only = developers.symmetric_difference(designers)\n\n<\/pre><\/div>\n\n\n<p>The method versions accept any iterable as an argument, not just sets. 
That flexibility helps when you&#8217;re working with mixed collection types.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nnumbers = {1, 2, 3}\n# This works\nresult = numbers.union(&#x5B;4, 5, 6])\n# This doesn&#039;t: the operators require a set on both sides\n# result = numbers | &#x5B;4, 5, 6]  # TypeError\n\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">Practical use cases for Python sets that come up constantly<\/h2>\n\n\n\n<p>Sets solve a few specific problems better than lists or dictionaries do. Removing duplicates from user input, finding common elements between datasets, and tracking unique visitors all become simple with sets.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Remove duplicates from a list (sorted() because set order is arbitrary)\nuser_ids = &#x5B;101, 102, 101, 103, 102, 104]\nunique_ids = sorted(set(user_ids))\nprint(unique_ids)  # &#x5B;101, 102, 103, 104]\n\n# Find common interests\nuser_a_interests = {&#039;python&#039;, &#039;golang&#039;, &#039;rust&#039;, &#039;javascript&#039;}\nuser_b_interests = {&#039;python&#039;, &#039;java&#039;, &#039;javascript&#039;, &#039;c++&#039;}\nshared_interests = user_a_interests &amp; user_b_interests\nprint(shared_interests)  # {&#039;python&#039;, &#039;javascript&#039;}\n\n# Track unique visitors\nvisitors = set()\nvisitors.add(&#039;user_123&#039;)\nvisitors.add(&#039;user_456&#039;)\nvisitors.add(&#039;user_123&#039;)  # Duplicate, ignored\nprint(len(visitors))  # 2\n\n<\/pre><\/div>\n\n\n<p>One pattern I use constantly is filtering a large dataset based on a smaller set of valid identifiers. 
Converting the identifier list to a set makes the filtering operation dramatically faster.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Slow approach with lists\nvalid_ids = &#x5B;1, 2, 3, 4, 5]\nrecords = &#x5B;{&#039;id&#039;: i, &#039;data&#039;: &#039;value&#039;} for i in range(1000)]\nfiltered = &#x5B;r for r in records if r&#x5B;&#039;id&#039;] in valid_ids]\n\n# Fast approach with sets\nvalid_ids_set = {1, 2, 3, 4, 5}\nrecords = &#x5B;{&#039;id&#039;: i, &#039;data&#039;: &#039;value&#039;} for i in range(1000)]\nfiltered = &#x5B;r for r in records if r&#x5B;&#039;id&#039;] in valid_ids_set]\n\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">Updating Python sets in place<\/h2>\n\n\n\n<p>Sets provide methods that modify the set directly rather than creating new ones. These operations are faster when you&#8217;re working with large datasets and don&#8217;t need to preserve the original.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ntags = {&#039;python&#039;, &#039;programming&#039;}\n\n# Add multiple items\ntags.update(&#x5B;&#039;web&#039;, &#039;backend&#039;, &#039;api&#039;])\nprint(tags)  # {&#039;python&#039;, &#039;programming&#039;, &#039;web&#039;, &#039;backend&#039;, &#039;api&#039;}\n\n# Intersection update (keep only common elements)\nallowed_tags = {&#039;python&#039;, &#039;web&#039;, &#039;mobile&#039;}\ntags.intersection_update(allowed_tags)\nprint(tags)  # {&#039;python&#039;, &#039;web&#039;}\n\n# Difference update (remove elements found in another set)\ntags = {&#039;python&#039;, &#039;web&#039;, &#039;mobile&#039;, &#039;backend&#039;}\ndeprecated = {&#039;mobile&#039;, &#039;backend&#039;}\ntags.difference_update(deprecated)\nprint(tags)  # {&#039;python&#039;, &#039;web&#039;}\n\n<\/pre><\/div>\n\n\n<p>The naming convention helps clarify what&#8217;s happening. 
Methods ending in <code>_update<\/code> modify the set in place, while their counterparts without the suffix return new sets. Plain <code>update()<\/code> is the in-place counterpart of <code>union()<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Immutable Python sets with frozenset<\/h2>\n\n\n\n<p>Sometimes you need a set that can&#8217;t change. Frozen sets work as dictionary keys or elements in other sets, which regular sets can&#8217;t do because they&#8217;re mutable.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Create a frozen set\nimmutable = frozenset(&#x5B;1, 2, 3])\n\n# Can&#039;t modify it\n# immutable.add(4)  # AttributeError\n\n# Can use as dictionary key\ncache = {}\nkey = frozenset(&#x5B;&#039;python&#039;, &#039;tutorial&#039;])\ncache&#x5B;key] = &#039;cached_result&#039;\n\n# Can nest in other sets\nset_of_sets = {frozenset(&#x5B;1, 2]), frozenset(&#x5B;3, 4])}\nprint(set_of_sets)  # {frozenset({1, 2}), frozenset({3, 4})}\n\n<\/pre><\/div>\n\n\n<p>Frozen sets come up less often than regular sets, but they&#8217;re essential when you need hashable collections. 
Storing configuration data that shouldn&#8217;t change and building composite cache keys are the main use cases.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Comprehensions for building Python sets<\/h2>\n\n\n\n<p>Set comprehensions follow the same syntax as list comprehensions but produce sets with automatic deduplication.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Create a set of squared numbers\nsquares = {x**2 for x in range(10)}\nprint(squares)  # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}\n\n# Extract unique words from text\ntext = &quot;the quick brown fox jumps over the lazy dog&quot;\nunique_words = {word for word in text.split()}\nprint(unique_words)  # {&#039;the&#039;, &#039;quick&#039;, &#039;brown&#039;, &#039;fox&#039;, &#039;jumps&#039;, &#039;over&#039;, &#039;lazy&#039;, &#039;dog&#039;}\n\n# Filter and transform in one operation\nnumbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}\neven_squares = {x**2 for x in numbers if x % 2 == 0}\nprint(even_squares)  # {4, 16, 36, 64, 100}\n\n<\/pre><\/div>\n\n\n<p>The deduplication happens automatically, which makes set comprehensions perfect for extracting unique transformed values from larger datasets.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Performance characteristics that matter<\/h2>\n\n\n\n<p>Sets use hash tables internally, which gives them average-case constant-time operations for adding, removing, and checking membership. That&#8217;s the whole reason they exist. 
Lists can&#8217;t match that performance for membership checks or removal by value.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport time\n\n# Compare membership testing\nlarge_list = list(range(100000))\nlarge_set = set(range(100000))\n\n# Test with list (99999 is the last element, the worst case for a scan)\nstart = time.perf_counter()\nfor _ in range(1000):\n    99999 in large_list\nlist_time = time.perf_counter() - start\n\n# Test with set\nstart = time.perf_counter()\nfor _ in range(1000):\n    99999 in large_set\nset_time = time.perf_counter() - start\n\nprint(f&quot;List: {list_time:.4f}s&quot;)\nprint(f&quot;Set: {set_time:.4f}s&quot;)\n\n<\/pre><\/div>\n\n\n<p>The tradeoff is memory. Sets consume more memory per element than lists because of the hash table overhead. For small collections, the difference doesn&#8217;t matter. For millions of items, it adds up.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">When Python sets aren&#8217;t the answer<\/h2>\n\n\n\n<p>Sets lose ordering information. If you need to maintain the sequence of items, sets will frustrate you because they&#8217;re fundamentally unordered collections.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nnumbers = {5, 1, 3, 2, 4}\nprint(numbers)  # Order is arbitrary, so don&#039;t rely on it\n\n<\/pre><\/div>\n\n\n<p>Sets also only work with hashable types. You can&#8217;t store lists, dictionaries, or other sets as set elements because these types are mutable and therefore not hashable.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# This fails\n# bad_set = {&#x5B;1, 2, 3]}  # TypeError: unhashable type: &#039;list&#039;\n\n# This works\ngood_set = {(1, 2, 3)}  # Tuples of hashable items are hashable\n\n<\/pre><\/div>\n\n\n<p>If you need both uniqueness and ordering, you have two options. 
Use a list and manually check for duplicates, or use Python 3.7+ dictionaries, which maintain insertion order and can simulate sets through their keys.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Ordered unique collection\nordered_unique = list(dict.fromkeys(&#x5B;3, 1, 2, 1, 3, 2]))\nprint(ordered_unique)  # &#x5B;3, 1, 2]\n\n<\/pre><\/div>\n\n\n<p>Sets are a specialized tool that excels at specific tasks. Understanding when to reach for them versus lists or dictionaries separates developers who write slow code from those who write fast code.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sets are Python&#8217;s built-in data structure for storing unique, unordered collections. They automatically eliminate duplicates and provide fast membership testing. If you&#8217;ve ever needed to remove duplicate entries from a list or check whether something exists in a collection without iterating through every item, sets solve that problem. 
Creating and initializing Python sets You can [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":65738,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[478],"class_list":["post-1003","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-set","tag-python-set"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/1003","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=1003"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/1003\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/65738"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=1003"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=1003"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=1003"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}