PyDigger

About PyDigger

The goal of this project is to analyze Python packages uploaded to PyPI and find relatively easy ways to contribute to the Python ecosystem. Some of the smallest contributions could be improvements to the meta-data of the project, which is usually stored in the pyproject.toml file. For example, many packages do not include a link to their public VCS (Version Control System). Adding it makes it possible for PyPI to link to the repository and makes it easier for the next potential contributor to find the source code and contribute to the package. See how to add VCS to a Python project.
This is the "new" PyDigger. It still does not have many of the features the old one had, but it is being improved. Feel free to open issues on the repository of the backend that collects the data and the frontend that displays it.
Main developer: Gabor Szabo.

Recent uploads

Licenses

License Count Percentage
{{item.license}} {{item.count}} {{((item.count / data.total) * 100).toFixed(2)}}%

No License

No License: {{data.license.no_license_count}}

Unrecognized License

Unrecognized License: {{data.license.bad_license_count}}

Long License

Total: {{data.license.long_license_count}}

VCS

{{host}} {{count}}

No VCS

Each distribution uploaded to PyPI can contain a link to the public VCS of the project. That is the GitHub, GitLab, etc. repository of the project. There are several ways to do that.
No VCS: {{data.vcs.no_vcs_count}}

Unrecognized VCS

Unrecognized VCS: {{data.vcs.bad_vcs_count}}

GitHub Projects

Total: {{data.vcs.github_count}}

GitLab Projects

Total: {{data.vcs.gitlab_count}}

Has GitHub Actions

Total: {{data.vcs.has_github_actions_count}}

No GitHub Actions

Total: {{data.vcs.no_github_actions_count}}

Has Dependabot

Total: {{data.vcs.has_dependabot_count}}

No Dependabot

Total: {{data.vcs.no_dependabot_count}}

Has GitLab Pipeline

Total: {{data.vcs.has_gitlab_pipeline_count}}

No GitLab Pipeline

Total: {{data.vcs.no_gitlab_pipeline_count}}
The goal is to improve the Python ecosystem step by step, starting with really small contributions such as updating some of the meta-data of the projects, through slightly more involved contributions such as setting up CI for the project, to adding tests to projects. See our Statistics
Total: {{data.total}}
Category Count Percentage
{{item.label}} {{item.getValue(data)}} {{((item.getValue(data) / data.total) * 100).toFixed(2)}}%

Version Control System (VCS)

Nowadays most developers use some kind of a version control system for their project, usually with a publicly hosted version with read access to everyone. The most commonly used Version Control System is Git. The most popular public Git hosting services are GitHub, GitLab, and BitBucket.
In order to make it easier for the users of the project, many people include a link to the version control system of the project in the meta-data of the project. In most cases they use the home_page field. In other cases the link is embedded in the description field in the JSON file.
Our database indicates that about 32% of the packages have no VCS. However, we have to take into account that our process only checks the home_page field (not the description), and that until June 12, 2020 the system only recognized GitHub URLs. Since then it also recognizes GitLab and BitBucket URLs.
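The check described above can be sketched roughly as follows. This is only an illustration, not the actual PyDigger code: the function name classify_home_page and the KNOWN_HOSTS table are assumptions made for this example.

```python
from urllib.parse import urlparse

# Hosts recognized in this sketch; the list used by PyDigger itself may differ.
KNOWN_HOSTS = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "bitbucket.org": "bitbucket",
}


def classify_home_page(home_page):
    """Return a label for the VCS host found in home_page, or None."""
    if not home_page:
        return None
    # hostname is None when the value has no scheme or is not a URL at all
    host = (urlparse(home_page).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return KNOWN_HOSTS.get(host)


print(classify_home_page("https://github.com/user/project"))
print(classify_home_page("https://example.com/project"))
```

A package whose home_page yields None here would be counted as having no recognized VCS, even if the link is present somewhere in the description field.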
In order to make it easier for everyone to contribute to their favorite project, we would like to encourage more project owners to include the link to their public VCS in their meta-data. Ideally there would be a specialized field for this (instead of re-using the home_page field), but for now the home_page field will do.

How can you help fix the situation?

Check the list of packages where we have not found a VCS. First make sure the package indeed does not have a link to its VCS. Our code might have bugs or the package might have been indexed before some of our latest improvements to our indexing.
If there really is no VCS link, then finding and adding it is probably not an easy task. One thing that can be done is to see if the same author has multiple packages; maybe some of the other packages have VCS links. Then the VCS of this package might be in a similar place. In addition, one could just manually search the major Git hosting services.

setup.py

Here is the section of the setup.py file:
            url='https://github.com/user/project',
        

pyproject.toml

        [project.urls]
        source = "https://github.com/me/spam.git"
        
Though it can also be "Repository".
See Writing your pyproject.toml
See also the well known labels.

Suggestions for commit message and PR message

         chore: Add repository URL to pyproject.toml
         To allow PyPI to link to it and to make it easier for people to find it.
        
         I found your project through [PyDigger](https://pydigger.code-maven.com/),
         which helps me find Python projects where I can make a small contribution.
        

Author field

Background

The old PyDigger had a report about missing author fields. This new one does not have one yet.
The PyDigger site monitors the uploads to PyPI and collects meta-data about the packages. It shows various statistics and points you to packages that could be improved by adding some simple meta-data to them. One of the easiest to fix is the missing author field.

Adding Author

Each package can have a field called author that contains the name of, well, the author. According to our stats about 4.6% of the packages have no author field. About 1% have a VCS but no author field, making them very easy to fix. (Locating the missing VCS-es (aka Version Control Systems) will be the material of another post.)
The most common way to include it in the package is to add a field called author to the setup function in the setup.py of the project. In the video we checked the eririn project to see how it is done and then fixed the arduino-udev project. This is the pull request you can see in the video. Within a few minutes the Pull-request was accepted.

setup.py

Here is the section of the setup.py file:
            author="Foo Bar",
        
Another project, mr.flagly, was using a pyproject.toml file, which is documented here. I sent the pull-request from another account because I was demonstrating it to the OSDC students.

pyproject.toml

Here is the section of the pyproject.toml file:
        authors = [
            { name="Foo Bar", email="[email protected]" },
        ]
        

Steps you can take

  • Visit the list of packages that have VCS but no author field and pick one.
  • Visit the source code of the package. (Link is on the page of the package)
  • In the VCS (GitHub, GitLab, Bitbucket, etc.) look for the name of the author of the package.
  • Locate the setup.py in the root of the repository
  • Edit the file and add the author field.
  • Save it. (This creates a commit)
  • Send a Pull-Request
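Before editing, it can help to confirm that the setup.py really lacks the field. The following is a minimal sketch using the standard ast module. The function name setup_has_keyword is an assumption for this example. It only inspects keywords written literally in the first setup(...) call it finds, so values passed via **kwargs or a setup.cfg file are not detected.

```python
import ast


def setup_has_keyword(setup_py_source, keyword):
    """Return True if the first setup(...) call passes the given keyword."""
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handles both setup(...) and setuptools.setup(...)
        if isinstance(func, ast.Name):
            name = func.id
        else:
            name = getattr(func, "attr", None)
        if name == "setup":
            return any(kw.arg == keyword for kw in node.keywords)
    return False


source = 'from setuptools import setup\nsetup(name="demo", version="0.1")\n'
print(setup_has_keyword(source, "author"))
```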

Adding the License field

Adding the license is technically very similar to adding the author field, but knowing what to write is a bit more challenging. You'll have to check what the real license of the project is. Most probably there is a file called "LICENSE" in the root of the project with the full text of the license. Issue #20 calls for the collection of that information.
Alternatively you can locate the bug-tracking system of the project (it is usually in the same place as the version control system) and open a ticket or an issue, depending on the word the particular system uses.

Shortening the license field

We can list packages with long licenses (over 50 characters), but we don't have a way to list the ones that also have a VCS. In general it would be better to use a short and standard name for each license. That will make them more comparable.
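The 50-character rule can be illustrated with a short sketch. The function name and the returned labels are assumptions for this example, and the separate question of whether a short value is a recognized license name is deliberately left out.

```python
LONG_LICENSE_LIMIT = 50  # threshold mentioned in the text above


def classify_license(license_value):
    """Classify a license field as missing, long, or short."""
    text = (license_value or "").strip()
    if not text:
        return "no_license"
    if len(text) > LONG_LICENSE_LIMIT:
        return "long_license"
    return "short_license"


print(classify_license("MIT"))
print(classify_license("Permission is hereby granted, free of charge, to any person obtaining a copy"))
```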

The technical details

The JSON file of each package can have a field called license that contains the short name of the license. The most common way to include it in the package is to add a field called license to the setup function in the setup.py of the project, or to add the same field to the pyproject.toml file: under [tool.poetry] when Poetry is the build backend, as in the example below, or under [project] for backends that read the standard metadata table. It does not belong under [build-system], which only describes the build backend. Ticket #19 further discusses the issues.
Here is the section of the setup.py file:
          import setuptools

          setuptools.setup(
              name="name of the project",
              version="1.1.7",
              author="Foo Bar",
              author_email="[email protected]",
              license="MIT",
          )
        
Here is the section of the pyproject.toml file:
          [tool.poetry]
          license = "Apache-2.0"

          [build-system]
          requires = ["poetry>=0.12"]
          build-backend = "poetry.masonry.api"
        

OSI approved licenses

One of the goals is to make it easy to check whether a package has an OSI approved license. The short names should probably be either the names or the URLs of OSI approved licenses, or the list of accepted short names should be hosted on the PyPI web site.

Packaging

setup.cfg / setup.py

pyproject.toml

Adding Summary

Adding the summary is technically very similar to adding the author field, but knowing what to write is a bit more challenging. You'll have to understand what the project is about and you must be able to write a one-line description.
Alternatively you can locate the bug-tracking system of the project (it is usually in the same place as the version control system) and open a ticket or an issue, depending on the word the particular system uses.

The technical details

The JSON file of each package can have a field called summary that contains the short description of the package. The most common way to include it in the packages is to add a field called description to the setup function in the setup.py of the project. (Not to be confused with the long_description.)
In setup use the field description to provide a one-line description that will become the summary field in the JSON file.
In setup use the field long_description to provide, well, a long description that will become the description field in the JSON file.
Yes, I know the naming is a bit confusing.
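The renaming can be written down as a small lookup table. This is only a sketch to make the mapping explicit. The helper to_json_fields is hypothetical and covers just the two fields discussed here.

```python
# setup() keyword -> field name in the JSON file of the package
SETUP_TO_JSON = {
    "description": "summary",
    "long_description": "description",
}


def to_json_fields(setup_kwargs):
    """Translate the two description keywords to their JSON field names."""
    return {
        json_name: setup_kwargs[name]
        for name, json_name in SETUP_TO_JSON.items()
        if name in setup_kwargs
    }


fields = to_json_fields({
    "description": "One-line summary of the project",
    "long_description": "A much longer text, often the README.",
})
print(fields["summary"])
```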

{{projectData.name}} {{projectData.version}}

Author: {{projectData.author}} missing author
Maintainer: {{projectData.maintainer}} no maintainer
License: {{projectData.license}}
Home page: {{projectData.home_page}} no homepage
Summary: {{projectData.summary}}
Published: {{new Date(projectData.pub_date*1000)}}
Loading...

Project not found

{{projectError}}