Page 70 - Cyber Defense eMagazine February 2024
* How widespread is secrets sprawl in PyPI?
At GitGuardian, we worked with security researcher Tom Forbes to scan every PyPI project for embedded
secrets. PyPI, the Python Package Index, is the Python community's official repository for third-party
packages. We analyzed over 450,000 projects containing more than 9.4 million files across 5 million
released versions. This is what we found:
- Total unique secrets found: 3,938
- Unique secrets found to be valid: 768
- Total occurrences of secrets across all releases: 56,866
- Projects containing at least one unique secret: 2,922
- Individual types of secrets detected: 151
Caption: Distinct secrets by detector over time
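A few ratios follow directly from the totals above. The short Python sketch below derives them from the published figures only; the constant names are illustrative. Because the project count is given only as "over 450,000", the affected-project share it computes is an upper bound.

```python
# Totals as reported in the PyPI scan (not recomputed from raw data).
TOTAL_UNIQUE_SECRETS = 3_938
VALID_UNIQUE_SECRETS = 768
TOTAL_OCCURRENCES = 56_866
PROJECTS_WITH_SECRET = 2_922
PROJECTS_SCANNED = 450_000  # lower bound: the article says "over 450,000"


def summarize() -> dict[str, float]:
    """Derive simple ratios from the published totals."""
    return {
        # Share of unique secrets that were found to be valid.
        "valid_share": VALID_UNIQUE_SECRETS / TOTAL_UNIQUE_SECRETS,
        # Average number of occurrences of each unique secret across all releases.
        "occurrences_per_secret": TOTAL_OCCURRENCES / TOTAL_UNIQUE_SECRETS,
        # Upper bound on the share of projects affected (denominator is a lower bound).
        "max_affected_project_share": PROJECTS_WITH_SECRET / PROJECTS_SCANNED,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4f}")
```

Run as a script, it reports that roughly 19.5% of unique secrets were valid, that each unique secret appears about 14.4 times on average across releases, and that at most about 0.65% of scanned projects contained a secret.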
* The files containing the most secrets
Given that the research focused on Python code, it is no surprise that files with the `.py` extension
were the top source of hardcoded credentials. The next most common were configuration and
documentation files such as `.json` and `.yml` files. We also found valid secrets in some unexpected
places, such as 209 README files, and test folders containing 675 unique secrets.
Caption: Most common file types other than .py containing a hardcoded secret in PyPI packages
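To make the file-type breakdown concrete, here is a minimal Python sketch that scans an unpacked package directory with two illustrative regular expressions and groups the unique matches by file extension, with README files and test folders bucketed separately. The patterns, the `scan_tree` and `file_category` helpers, and the bucketing rules are hypothetical; they are not GitGuardian's 151 detectors, and no validity checking is performed.

```python
import re
from collections import defaultdict
from pathlib import Path

# Illustrative patterns only (hypothetical); real detectors are far more extensive.
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Generic `name = "value"` or `name: "value"` assignments with a suspicious name.
    "generic_assignment": re.compile(
        r"""(?i)\b(?:api_key|secret|token|password)\s*[:=]\s*["']([^"']{8,})["']"""
    ),
}


def file_category(path: Path) -> str:
    """Bucket a file by location or extension (hypothetical grouping rules)."""
    if any(part.lower() in ("test", "tests") for part in path.parts[:-1]):
        return "test folder"
    if path.name.lower().startswith("readme"):
        return "README"
    return path.suffix.lower() or "(no extension)"


def scan_tree(root: Path) -> dict[str, set[str]]:
    """Return unique secret-like strings found under root, grouped by file category."""
    found: dict[str, set[str]] = defaultdict(set)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for pattern in PATTERNS.values():
            for match in pattern.finditer(text):
                # Use the captured value when the pattern has a group, else the whole match.
                secret = match.group(1) if pattern.groups else match.group(0)
                found[file_category(path.relative_to(root))].add(secret)
    return dict(found)
```

Calling `scan_tree(Path("unpacked_package"))` returns a mapping from category to a set of unique candidate strings, so the size of each set gives a per-file-type tally. Using sets means a secret repeated within one category is counted once, which mirrors the distinction between unique secrets and total occurrences.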
Copyright © 2024, Cyber Defense Magazine. All rights reserved worldwide.