I have a Git repository URL and a branch name.
Using GitPython, how do I get all the commits from the branch?
From https://gitpython.readthedocs.io/en/stable/tutorial.html
Meet the Repo type
The first step is to create a git.Repo object to represent your repository.
from git import Repo
# rorepo is a Repo instance pointing to the git-python repository.
# For all you know, the first argument to Repo is a path to the repository
# you want to work with
repo = Repo(self.rorepo.working_tree_dir)
assert not repo.bare
In the above example, the directory self.rorepo.working_tree_dir equals /Users/mtrier/Development/git-python and is my working repository which contains the .git directory. You can also initialize GitPython with a bare repository.
...
...
The Commit object
Commit objects contain information about a specific commit. Obtain commits using references as done in Examining References or as follows.
Obtain commits at the specified revision
repo.commit('master')
repo.commit('v0.8.1')
repo.commit('HEAD~10')
...
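Once you have a Commit object, its attributes carry the details most people want from a log. The following is a minimal sketch, not taken from the tutorial: commit_info is a hypothetical helper name, and it assumes the commit exposes hexsha, author.name, authored_datetime, summary and parents, as GitPython's Commit objects do.

```python
def commit_info(commit):
    """Collect a few commonly used fields from a GitPython Commit-like object."""
    return {
        "sha": commit.hexsha,                           # full 40-character hash
        "author": commit.author.name,                   # author's display name
        "date": commit.authored_datetime.isoformat(),   # timezone-aware datetime
        "summary": commit.summary,                      # first line of the message
        "is_merge": len(commit.parents) > 1,            # more than one parent
    }


# Usage with a real repository (requires GitPython and a local clone):
#   import git
#   repo = git.Repo("path/to/directory")
#   print(commit_info(repo.commit("master")))
```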
I would suggest reading the tutorial I quoted, and at least the entire "The Commit object" section of it (https://gitpython.readthedocs.io/en/stable/tutorial.html#the-commit-object).
See also the Advanced Repo Usage section of the Meet the Repo type section, and you should also check out this relevant documentation: gitpython.readthedocs.io/en/stable/…
in shell: clone remote to local repo
git clone url
in Python: list all commits
r = git.Repo("path/to/directory")
all_commits = list(r.iter_commits(branch_name))
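Since the question starts from a URL rather than a local checkout, the clone step can also be done from Python. This is a minimal sketch, assuming GitPython's Repo.clone_from and that its extra keyword arguments (here branch) are forwarded to git clone. repo_dir_name and clone_and_list_commits are hypothetical helper names, not part of GitPython.

```python
import os
import re
import tempfile


def repo_dir_name(url):
    """Derive a directory name from a clone URL, e.g. '.../project.git' -> 'project'."""
    name = re.split(r"[/:]", url.rstrip("/"))[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_and_list_commits(url, branch_name, parent_dir=None):
    """Clone `url`, check out `branch_name`, and return its commits, newest first."""
    import git  # GitPython (third-party); imported here so repo_dir_name works without it

    parent_dir = parent_dir or tempfile.mkdtemp()
    dest = os.path.join(parent_dir, repo_dir_name(url))
    # branch=... becomes `git clone --branch <name>`, so that branch is checked out
    repo = git.Repo.clone_from(url, dest, branch=branch_name)
    return list(repo.iter_commits(branch_name))
```

Cloning into a temporary directory keeps the sketch side-effect free. Pass parent_dir if you want to keep the clone around.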
If you only want the commits that differ between your branch and another (those reachable from one of the two branches but not from both), you can use
all_commits = list(r.iter_commits(f"{base_branch}...{branch_name}"))
instead.
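If what you need is only the commits that branch_name adds on top of base_branch, Git's two-dot range is the usual form. The three-dot form also includes commits that exist only on base_branch. A small sketch contrasting the two, with hypothetical function names and assuming repo is a git.Repo:

```python
def commits_only_on_branch(repo, base_branch, branch_name):
    """Commits reachable from branch_name but not from base_branch (two-dot range)."""
    return list(repo.iter_commits(f"{base_branch}..{branch_name}"))


def commits_on_either_side(repo, base_branch, branch_name):
    """Commits reachable from exactly one of the two branches (three-dot range)."""
    return list(repo.iter_commits(f"{base_branch}...{branch_name}"))


# Usage (requires GitPython and a local clone):
#   import git
#   r = git.Repo("path/to/directory")
#   new_work = commits_only_on_branch(r, "main", "feature")
```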
If you want, for example, monthly commit counts per author, you can use something like this:
import git
from datetime import datetime

def get_commits_by_month(repo_path):
    try:
        # Initialize the Git repository object
        repo = git.Repo(repo_path)
        # Dictionary to store commits: {user: {year_month: count}}
        commits_by_user = {}
        # Get all commits reachable from the current HEAD
        commits = list(repo.iter_commits())
        # Process each commit
        for commit in commits:
            # Get author name and commit date
            author = commit.author.name
            commit_date = datetime.fromtimestamp(commit.authored_date)
            # Format year-month key (e.g., "2023-02")
            year_month = commit_date.strftime("%Y-%m")
            # Increment counter for this user and month
            if author not in commits_by_user:
                commits_by_user[author] = {}
            if year_month not in commits_by_user[author]:
                commits_by_user[author][year_month] = 0
            commits_by_user[author][year_month] += 1
        # Print results sorted by author
        print("\nCommit Counts by User and Month:")
        print("-" * 40)
        for author in sorted(commits_by_user.keys()):
            print(f"\nAuthor: {author}")
            # Sort months chronologically
            for year_month in sorted(commits_by_user[author].keys()):
                count = commits_by_user[author][year_month]
                print(f"  {year_month}: {count} commits")
            # Print total for this author
            total_commits = sum(commits_by_user[author].values())
            print(f"  Total commits: {total_commits}")
        # Print overall statistics
        total_all_commits = sum(sum(months.values()) for months in commits_by_user.values())
        print("\n" + "-" * 40)
        print(f"Total commits across all users: {total_all_commits}")
        print(f"Number of contributors: {len(commits_by_user)}")
    except git.exc.InvalidGitRepositoryError:
        print(f"Error: '{repo_path}' is not a valid Git repository")
    except Exception as e:
        print(f"An error occurred: {str(e)}")
Feel free to tweak this to fit how you want to use the commits.
This is another good option:
import git
from datetime import datetime
from dateutil.relativedelta import relativedelta
from collections import defaultdict

def get_user_commits_by_month(repo_path):
    commit_data = defaultdict(lambda: defaultdict(int))  # Nested defaultdict for easier handling
    repo = git.Repo(repo_path)
    today = datetime.today()
    one_year_ago = today - relativedelta(years=1)
    # Use `since` to limit commit retrieval to the last year
    commits = repo.iter_commits(since=one_year_ago.strftime("%Y-%m-%d"))
    for commit in commits:
        commit_date = commit.authored_datetime
        commit_month = commit_date.strftime('%Y-%m')  # Use YYYY-MM format for proper grouping
        commit_author = commit.author.name
        commit_data[commit_author][commit_month] += 1
    # Print results in a readable format
    print(f"{'Author':<20} {'Month':<10} {'Commits':<10}")
    print("-" * 40)
    for author in sorted(commit_data):
        for month in sorted(commit_data[author]):
            print(f"{author:<20} {month:<10} {commit_data[author][month]:<10}")

# Call the function with your repo path
repo_path = "<path>"  # placeholder: replace with the path to your local repository
get_user_commits_by_month(repo_path)
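Both functions above call iter_commits() without a revision, which walks history from the current HEAD rather than from a named branch, and both print instead of returning data. A possible variation, sketched here under the assumption that each commit exposes author.name and authored_datetime, takes any iterable of commits so the caller decides which branch to walk. count_commits_by_month is a hypothetical name.

```python
from collections import Counter


def count_commits_by_month(commits):
    """Return a Counter keyed by (author name, 'YYYY-MM') for the given commits."""
    counts = Counter()
    for commit in commits:
        month = commit.authored_datetime.strftime("%Y-%m")
        counts[(commit.author.name, month)] += 1
    return counts


# Usage for one specific branch (requires GitPython and a local clone):
#   import git
#   repo = git.Repo("path/to/directory")
#   counts = count_commits_by_month(repo.iter_commits(branch_name))
#   for (author, month), n in sorted(counts.items()):
#       print(f"{author:<20} {month:<10} {n}")
```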
To get only the most recent commits on a branch, for example the last three:
commits = list(repo.iter_commits(f"{branch_name}~3..{branch_name}", max_count=3))
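The range form above depends on the branch having at least three ancestors along its first-parent line. On a shorter history, Git may reject the ~3 revision. A sketch that relies only on the max_count keyword, which GitPython forwards to git rev-list as --max-count, avoids that assumption. last_commits is a hypothetical helper name.

```python
def last_commits(repo, branch_name, n=3):
    """Return up to `n` most recent commits reachable from `branch_name`."""
    return list(repo.iter_commits(branch_name, max_count=n))


# Usage (requires GitPython and a local clone):
#   import git
#   repo = git.Repo("path/to/directory")
#   for commit in last_commits(repo, "master", n=3):
#       print(commit.hexsha[:7], commit.summary)
```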