Posts

Showing posts with the label python

Count frequency of items in a list using Python

In this post I am going to show you how we can write a program in Python that can count the frequency of items in a list. First I shall implement it using the dictionary data structure in Python. Then I shall make it shorter (and nicer) using the defaultdict and Counter data structures from the collections module . Here is our code that we will use to run and test the program - if __name__ == "__main__": li = [2, 6, 9, 2, 8, 2, 9, 9, 3, 1, 4, 5, 7, 1, 8, 10, 2, 10, 10, 5] freq = frequency_counter(li) assert freq[1] == 2 assert freq[2] == 4 assert freq[3] == 1 assert freq[11] == 0 Now let's look at the implementation using dictionary. We will iterate over all the items and then check if it exists in the dictionary. If it exists, we shall increase the count. And if it doesn't, then we shall set the count to 1. def frequency_counter(li): freq_dt = dict() for item in li: if item in freq_dt: freq_dt[item] += 1 ...

Python Program to Find Prime Factors of a Number

 In this post I am going to write a program in python that finds all the prime factors of a number. I am going to start by writing an empty function and a test. def get_prime_factors(number): prime_factors = [] return prime_factors if __name__ == "__main__": n = 8 expected = [2, 2, 2] result = get_prime_factors(n) assert expected == result, result Now, if you run the program above, it will give an AssertionError, as we are returning an empty list. Let's write some code to make our test pass. def get_prime_factors(number): prime_factors = [] while number % 2 == 0: prime_factors.append(2) number = number // 2 return prime_factors The program will work for multiple of 2's. Our next task is to find the other prime factors. For example, prime factors of 10 are 2 and 5. We shall add this test case first and then write code to pass this test....

divmod - division and modulus together in Python

divmod is a wonderful, nice, little built-in function in Python that allows you to do division and modulus operation together. Using it, you can make your code slightly beautiful, thus Pythonic! Check the following example - >>> 10 / 3 3.3333333333333335 >>> 10 // 3 3 >>> 10 % 3 1 >>> divmod(10, 3) (3, 1)

Python all any built-in function

I have been using Python for a long time, still learning lot of simple and new things. Recently I came to know about two built-in functions named all and any . Today while I was solving a hackerrank problem , I used both functions which makes the code a bit nicer, in my opinion. The functions compares two singly linked lists and if they are equal, returns 1, otherwise returns 0. Here is my code - def compare_lists(llist1, llist2): while all([llist1, llist2]): if llist1.data != llist2.data: return 0 llist1 = llist1.next llist2 = llist2.next if any([llist1, llist2]): return 0 return 1 Sometimes, this kind of simple thing brings joy to me. Let me know your thoughts. :)

Create 2D Array using List in Python

If for some reason you want to create a two dimensional array in Python and don't want to use external packages like numpy etc., the most obvious thing is to write a code like this : >>> n = 3 >>> m = 4 >>> ara = [[0] * m] * n >>> ara [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] But there is a problem. Check the code below- >>> ara[0][0] = 1 >>> ara [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]] Actually, when you do [0] * m, it returns a reference to a list and when you multiply it with n, the same reference is duplicated. Hence you see the change in a way that you didn't expect. The right way to do it is to append [0] * m, n times in the array. You can code it in multiple ways. My preferred way is: >>> ara = [[0] * m for _ in range(n)]

Table Driven Unit Test in Python

Now days, table driven tests are pretty much industry standard. In my workplace, we use table driven tests when we write unit tests (in golang though). Here I shall share a simple code example using pytest that shows how to write table driven tests in Python. In table driven test, what you need to do is, to gather all the tests cases together in a single table. We can use dictionary for each test case and a list to store all the test cases. Instead of discussing it further, let me show you an example : def average(L): if not L: return None return sum(L)/len(L) def test_average(): test_cases = [ { "name": "simple case 1", "input": [1, 2, 3], "expected": 2.0 }, { "name": "simple case 2", "input": [1, 2, 3, 4], "expected": 2.5 }, { "name": "list w...

global and nonlocal variable in Python

Most of us are already familiar with global variables in Python. If we declare those variables in a module, the functions inside that module (can read python file or .py file) can access the variable. For example, check the code below : x = 5 def myfnc(): print("inside myfnc", x) def myfnc2(): print("inside myfnc2", x) myfnc2() myfnc() It will print :  inside myfnc 5 inside myfnc2 5 If you change your code like this : x = 5 def myfnc(): print("inside myfnc", x) def myfnc2(): print("inside myfnc2", x) x = 10 print("x = ", x) myfnc2() myfnc() You will get an error :   File "program.py", line 6, in myfnc2     print("inside myfnc2", x) UnboundLocalError: local variable 'x' referenced before assignment The moment you wrote x = 10, Python assume that x is a local variable, and inside the print function, it is giving this error. Because local variables are determi...

Python Code Measure Time

How to measure time of a Python code snippet? We can do it using the timeit module . It has a function called timeit that runs a particular code snippet for N times (one million by default) and gives you the runtime. Today I just used it for a small purpose and thought of writing a note here. >>> from timeit import timeit >>> timeit('"c" >= "a" and "c" <= "z"') 0.06282545300200582 >>> timeit('"c".isalpha()') 0.06563570606522262 Here I compared the performance between two snippets : one uses isalpha() to determine if a (lowercase) character is an alphabet or not, another just compares using operators. There is no noticeable performance difference. Please go through the module doc to get some more ideas of how to use the timeit module , if will often come handy when you quickly want to measure running time of different Python code.

Code School Python Courses

As Python is getting more and more popular and becoming the de facto standard for starting to learn programming, we can see increasing amount of online content to learn Python. Last week got an email form Code School team to check out their couple of new Python courses. I spent around 30 minutes watching the videos, checkout out the exercises. The exercises are good, and can be done online. They also prepared the videos with care. But one thing that bothers me, the lecture is kind of robotic. I didn't feel anything special, in fact didn't feel the connection to the teacher that I used to feel in other courses in Coursera and Udacity. Anyway, Python beginners can check those courses: 1. Try Python 2. Flying through Python

How to get image size from url

Today I needed to get image size (width and height) from image urls. I am sharing my python code here. import requests from PIL import Image from io import BytesIO def get_image_size(url): data = requests.get(url).content im = Image.open(BytesIO(data)) return im.size if __name__ == "__main__": url = "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQiI_ipTLsmwY5rfnlGkMdp0Bq2bwLmlhtXRuQMJjCYxojb1bxaZ-LRGdiMlP0XiBWlgG0HCF3GbHQAlrMVi16isP7lwWhBySdTtipnnuWk6UrYiztsKtwStN_0-pyR_mlnMbt9_5rCx14/s1600/sbn_pic_2.jpg" width, height = get_image_size(url) print width, height I used pillow - which is a PIL fork. Feel free to comment if you have any suggestion to improve my code.

crawler controller

I work on a project where I have written 20+ crawlers and the crawlers are running 24/7 (with good amount of sleep). Sometimes, I need to update / restart the server. Then I have to start all the crawlers again. So, I have written a script that will control all the crawlers. It will first check if the crawler is already running, and if not, then it will start the crawler and the crawler will run in the background. I also saved the pid of all the crawlers in a text file so that I can kill a particular crawler immediately when needed. Here is my code : import shlex from subprocess import Popen, PIPE site_dt = {'Site1 Name' : ['site1_crawler.py', 'site1_crawler.out'],  'Site2 Name' : ['site2_crawler.py', 'site2_crawler.out']} location = "/home/crawler/" pidfp = open('pid.txt', 'w') def is_running(pname): p1 = Popen(["ps", "ax"], stdout=PIPE) p2 = Popen(["grep", pname...

finding maximum element of a list - divide and conquer approach

Here I share my Python implementation of divide and conquer approach of finding maximum element from a list : """ Divide and conquer approach to find the maximum from a list """ def find_max(li, left, right): if left == right: return li[left] mid = (left + right) / 2 max1 = find_max(li, left, mid) max2 = find_max(li, mid+1, right) return max1 if max1 > max2 else max2 def main(): li = [1, 5, 2, 9, 3, 7, 5, 2, 10] print "Maximum element of the list is", find_max(li, 0, len(li)-1) if __name__ == "__main__": main()

pow() function returns float or integer?

Today, someone posted this code and asked why x is float ? >>> from math import pow >>> x = pow(5, 2) >>> x 25.0 It's float because the pow function of math package returns float. But there is a built-in pow function in Python which will return integer in this case. You need to restart the python interpreter before you run the following code : >>> a = 5 >>> x = pow(a, 2) >>> x 25 >>> type(x) <type 'int'> So, don't get confused. Happy coding! :)

AttributeError: 'module' object has no attribute

Are you getting this type of error in your Python program? AttributeError: 'module' object has no attribute 'whatever' ?  Well, then you might need to change the name of your python file. For example, save the following code in a file named json.py : import json print json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}]) Then try to run it, and you will get this error : $ python json.py Traceback (most recent call last):   File "json.py", line 1, in     import json   File "/home/work/practice/json.py", line 3, in     print json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}]) AttributeError: 'module' object has no attribute 'dumps' But of course the json module has 'dumps' attribute! What went wrong? When you wrote import json, it tries to find the file json.py in the current directory first and it got the file that you wrote (you saved the file as json.py) a...

fastest list index searching

Someone was looking for fastest list index searching (in Python) in stackoverflow. There are only four elements in the list, still it was as issue for him, as it was used in an inner-loop and the profiler identified that, this part was executing the most. He tried several approaches. 1. This is the most obvious one: if value in mylist: return mylist.index(value) Here the problem is, in the condition used in if, it takes O(n) time to check if the value is in the list, and the return statement - mylist.index(value), also takes O(n) time. 2. The second approach - I call it smart approach : try: return mylist.index(value) except ValueError: return None This code gets rid of the condition check. But the try-except thing takes some time for the Python interpreter. I like this approach. In fact, if the list was large, I would prefer this. 3. The third approach - I guess it's more Pythonic? for i, x in enumerate(mylist): if x == value: return i return N...

Python list of lists

I just stumbled upon a post/problem in stackoverflow. Here I discuss it and the solution. First check the following code segment : >>>visited = [[False]*4]*4 >>>visited[0][1] = True >>>print visited >>> [[False, True, False, False], [False, True, False, False], [False, True, False, False], [False, True, False, False]] Do you see the problem, second element of all the list changed to True! So it means [[False]*4]*4 creates a list that contains reference to the same list [False, False, False, False] You can avoid this by following this code: >>> n = 4 >>> m = 4 >>> visited = [] >>> for x in xrange(n): ... visited.append([False] * m) ... >>> visited [[False, False, False, False], [False, False, False, False], [False, False, False, False], [False, False, False, False]] >>> visited[0][1] = True >>> visited [[False, True, False, False],...

turtle triangle fun

Image
Just Wrote this piece of code. First just copy-paste the code, run and have fun. Then try to write it yourself and have more fun. :) import turtle def draw_triangle(side_length, depth): if depth == 0: return counter = 0 while counter < 3: counter += 1 brad.forward(side_length/2) if depth > 1: brad.left(120) draw_triangle(side_length/2, depth-1) brad.forward(side_length/2) brad.right(120) brad.left(240) if __name__ == "__main__": window = turtle.Screen() window.bgcolor("white") brad = turtle.Turtle() brad.shape("turtle") brad.color("green", "green") brad.speed(5) brad.begin_fill() brad.right(120) brad.forward(128) brad.left(120) brad.forward(256) brad.left(120) brad.forward(256) brad.left(120) brad.forward(128) brad.left(120) brad.end_...

Heap Sort Implementation in Python

Here I share my Python implementation of heap sort algorithm . Heap sort is O(n log n) sorting algorithm though quicksort is used more frequently. You should learn heap sort to implement priority queue or at least learn the internals of priority queue. :) In the Python code, I directly followed the heap sort algorithm presented in Introduction to Algorithms by CLRS. # heapsort def max_heapify(li, i, heap_size): l = i * 2 #left(i) r = l + 1 #right(i) if l li[i]: largest = l else: largest = i if r li[largest]: largest = r if i != largest: li[i], li[largest] = li[largest], li[i] max_heapify(li, largest, heap_size) def build_max_heap(li, heap_size): for i in range(heap_size/2 - 1, 0, -1): max_heapify(li, i, heap_size) def heap_sort(li, heap_size): build_max_heap(li, heap_size) for i in range(len(li[1:]), 1, -1): li[1], li[i] = li[i], li[1] heap_size = heap_size - 1 max_heapify(li, 1, heap_size) def main(): # we shall consi...

Merge Sort Implementation in Python

Today I post the python implementation of merge sort. This is straight-forward implementation of the algorithm presented in the book Introduction to Algorithms [CLRS]. If you think the implementation can be more Pythonic, feel free to comment. Here is my code: def merge(li, start, mid, end): left_li = [] left_li.extend(li[start : mid+1]) right_li = [] right_li.extend(li[mid+1: end+1]) left_li.append(2147483647) right_li.append(2147483647) i = 0 j = 0 for k in range(start, end+1): if left_li[i] You can test the merge sort implementation against this problem: http://www.spoj.com/problems/MERGSORT/

recursive binary search in python

In my previous post, I showed the iterative way to implement binary search algorithm in Python. Here comes the recursive way. def binary_search_recursive(li, left, right, key): while True: if left > right: return -1 mid = (left + right) / 2 if li[mid] == key: return mid if li[mid] > key: right = mid - 1 else: left = mid + 1 return binary_search_recursive(li, left, right, key) if __name__ == "__main__": li = [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12] left = 0 right = len(li) for key in [8, 6, 1, 12, 7]: index = binary_search_recursive(li, left, right, key) print key, index