I am new to Python 3 and to working with text files. I am trying to extract all filenames from a log file that end in the JavaScript (.js) extension. The file also contains other file extensions. I want to return only the filename and not the path, sort the output alphabetically, and display only unique values, as there are repeats in the log entries.
Examples from the log file are:
72.133.47.242 - - [25/Apr/2013:15:45:28 -0700] "GET /include/jquery.js HTTP/1.1" 200 25139
22.133.47.242 - - [25/Apr/2013:15:45:28 -0700] "GET /include/jquery.jshowoff.js HTTP/1.1" 200 25139
In this case I just want to return jquery.js and jquery.jshowoff.js and not the HTTP request and other log data.
This is my code so far:
filepath = '/home/user/Documents/access_log.txt'
with open(filepath, 'r') as access_log:
    contents = access_log.readlines()
    for line in contents:
        if ".js" in line:
            print(line)
My output contains only the lines that have .js in them, but I don't know how to extract just the filename from each line. I have tried to match it with a regex but have not been successful, as I'm also new to regular expressions. Any help would be greatly appreciated.
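One possible approach, offered as a minimal sketch rather than a definitive solution: pull the requested path out of each line with a regex, keep only the last path component, collect the names in a set to drop repeats, and sort at the end. It assumes every request appears in the quoted form shown in the examples above ("GET /path HTTP/1.1"). The names REQUEST_RE and js_filenames are hypothetical, chosen only for illustration, and the sample list simply reuses the two example log lines with one repeated.

```python
import re
from pathlib import PurePosixPath

# Assumption: requests look like "GET /some/path HTTP/1.1", as in the sample lines.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def js_filenames(lines):
    """Return the sorted, unique .js filenames requested in the given log lines."""
    names = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line in the assumed format
        # Drop any query string, then keep only the last path component.
        path = match.group(1).split("?", 1)[0]
        name = PurePosixPath(path).name
        if name.endswith(".js"):
            names.add(name)
    return sorted(names)


sample = [
    '72.133.47.242 - - [25/Apr/2013:15:45:28 -0700] "GET /include/jquery.js HTTP/1.1" 200 25139',
    '22.133.47.242 - - [25/Apr/2013:15:45:28 -0700] "GET /include/jquery.jshowoff.js HTTP/1.1" 200 25139',
    '72.133.47.242 - - [25/Apr/2013:15:45:28 -0700] "GET /include/jquery.js HTTP/1.1" 200 25139',
]
print(js_filenames(sample))
```

To run this against the real log, the open file can be passed in directly, for example js_filenames(access_log) inside the with block, since a file object yields its lines one at a time.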