About & Methodology
What is the Electric Capital Developer Report?
Electric Capital's Developer Report is an effort to quantify developer activity across crypto ecosystems. The report draws on an analysis of 200M+ crypto-related git commits across 350K+ crypto code repositories. The initiative started in 2018, and each report is published after the end of the year to capture the insights we consider most meaningful.
In 2022, we launched a public live dashboard with the most important metrics and charts in crypto, along with a breakdown of the individual ecosystems that make up crypto. This dashboard is built by Electric Capital and uses the same data that underpins the annual Electric Capital Developer Report.
Why measure developer activity?
An early and leading indicator of value creation in emerging technology ecosystems is developer engagement. Developers build killer applications that deliver value to end users, which attracts more customers, which then draws more developers.
Because crypto is largely open source, we have a unique and unprecedented ability to understand an emerging industry that may one day reach billions of users.
This dashboard provides a comprehensive view of the activity and contributions of crypto developers. It tracks the number and types of code commits, pull requests, and other contributions made by developers across various crypto projects.
We share this data publicly in the hope of giving the community a clearer understanding of our collective progress. We are grateful to everyone in the community who helps by contributing to the Crypto Ecosystems Repository on GitHub, the foundations who help us validate our analysis, and the friends who offer feedback on drafts.
How can I contribute to the Taxonomy?
To check whether your projects are in the taxonomy, search for them in our official repository. If they are missing, the README explains how to contribute.
How do you find the data being used to produce the charts?
We use a combination of proprietary tools and crowdsourcing to track and analyze developer activity. We divide the process into 5 steps:
- Sourcing Code Repositories: We draw on public contributions from new ecosystems and/or their code repositories, code searches, and public project list crawling. All results gathered during this stage become what we call “candidate repositories.”
  - Public contributions: Individuals, foundations, hackathon organizers, and researchers have been submitting PRs (Pull Requests) to our taxonomy for years. The team at Electric reviews each PR; once it is merged, its repositories start counting in the metrics.
  - Code searches: For each ecosystem, we search for keywords that indicate the usage of certain libraries, frameworks, or blockchains that can be mapped to a specific ecosystem. For example, repositories that depend on and use solana/web3.js are likely to belong to the Solana ecosystem.
  - Project list crawling: We obtain lists of candidate code repositories from websites such as ecosystem project sites, general crypto aggregators, and hackathon websites.
- Classifying Candidate Repositories: The next step is to assign each candidate repository to the ecosystems it belongs to. Our classifiers use ad-hoc heuristics such as:
  - Keyword counts in code, commit messages, the README, and the repository description.
  - Presence of certain file extensions in the repository file tree.
  - Files with certain names and structures in the repository file tree.
  - Dependency analysis.
- Crawling Source Code: We aggregate data from version control systems like Git to track code changes, commit logs, repository metadata, etc. This allows us to attribute original authors based on fingerprinted code, detect copied and pasted code, and understand which ecosystems are actually interconnected.
- Cleaning Data: We use a combination of automated techniques to detect duplicated code, so we can mark duplicate commits even when they do not result from forking. We also employ manual tools and human review to identify potential issues with the data.
- QA and Feedback: We are in constant communication with the largest ecosystems in crypto. Their teams help validate the data, assist us in identifying projects that our searches may have missed, pinpoint data quality issues, and aid in cross-checking our numbers with their internal metrics.
We've adjusted the repository classifiers to prioritize higher Precision over Recall, aiming to minimize the inclusion of repositories in ecosystems where they don't belong.
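As an illustration only, a precision-oriented classifier built from the heuristics listed above might be sketched as follows. The `CandidateRepo` fields, the `SOLANA_SIGNALS` values, and the `min_hits` threshold are assumptions made for this example, not the production classifiers.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateRepo:
    """Hypothetical container for the signals named in the heuristics list."""
    url: str
    readme: str = ""
    file_paths: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)


# Illustrative signals for one ecosystem; the values are assumptions.
SOLANA_SIGNALS = {
    "keywords": ["solana"],
    "extensions": [".rs"],
    "file_names": ["Anchor.toml"],
    "dependencies": ["solana/web3.js"],
}


def score(repo: CandidateRepo, signals: dict) -> int:
    """Count how many independent heuristics fire for this repository."""
    text = repo.readme.lower()
    hits = 0
    hits += any(k in text for k in signals["keywords"])
    hits += any(p.endswith(tuple(signals["extensions"])) for p in repo.file_paths)
    hits += any(p.split("/")[-1] in signals["file_names"] for p in repo.file_paths)
    hits += any(s in d for s in signals["dependencies"] for d in repo.dependencies)
    return hits


def classify(repo: CandidateRepo, signals: dict, min_hits: int = 3) -> bool:
    """Accept only when several heuristics agree, trading recall for precision."""
    return score(repo, signals) >= min_hits
```

Raising `min_hits` rejects more borderline repositories, which is one simple way to favor precision over recall.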
In addition, to ensure high-quality data, we manually review every candidate repository that has a large number of contributors or a high probability of belonging to numerous ecosystems.
The preceding steps form an iterative loop. We are constantly sourcing, reviewing, crawling, cleaning, and QA-ing the data. If your ecosystem is not represented in our taxonomy, please follow the instructions here and open a PR on GitHub.
Which commits and developers are counted towards an ecosystem's total?
We strive to credit only the original authors and original ecosystems that produce code. This is achieved by fingerprinting the first instance of code created by the original author.
- Forks: Only new code counts towards developer activity. We exclude code and developer activity from merging changes from an upstream codebase.
- Original Code: We eliminate copy/pasted code across libraries by fingerprinting it and crediting the original author. Factors considered include the file, lines changed, commit message, committer, author, and associated dates to determine the fingerprint.
- Branches: We analyze commits from all branches (master/main/development, etc.) and tags. Our approach is far more in-depth than other data sources and offers a more comprehensive view than GitHub's default view.
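The fingerprinting idea can be sketched roughly as below. This is a simplified illustration under stated assumptions: the `Commit` fields, the use of SHA-256, and the rule that commits arrive ordered by when each repository first contained them are all hypothetical choices for the example, not a description of the actual pipeline.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    repo: str
    author: str
    committer: str
    message: str
    authored_at: str   # ISO 8601 timestamp
    committed_at: str  # ISO 8601 timestamp
    diff: str          # files and lines changed, as text


def fingerprint(c: Commit) -> str:
    """Hash the factors named in the text; the repository is left out on purpose."""
    parts = [c.author, c.committer, c.message, c.authored_at, c.committed_at, c.diff]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def original_commits(commits):
    """Keep the first commit seen per fingerprint; later matches are copies.

    Assumes `commits` is ordered by when each repository first contained them.
    """
    seen = set()
    originals = []
    for c in commits:
        fp = fingerprint(c)
        if fp not in seen:
            seen.add(fp)
            originals.append(c)
    return originals
```

Because the repository name is not part of the hash, the same commit merged from an upstream codebase into a fork produces the same fingerprint and is credited only once.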
Note: Currently, our focus is only on open-source repositories. Many repositories are not yet open-source. We are progressively enriching our open-source code data with on-chain smart contract-related data.
Why are some ecosystems not showing up on the website? What's the selection criteria?
We filter down ecosystems in our taxonomy to create the list of ecosystems that show up in the “Top Ecosystems Monthly Active Developers” table and that have their own subsection page with charts. The triple-filter is as follows:
- The ecosystem must have a token.
- It should be ranked within the Top 200 by network value of the previous year.
- It must be among the top 100 by the number of full-time developers.
- More than 50 developers at the end of 2023.
Eligible if one of the following is true:
- Top 200 by network value in the previous year.
- A private market valuation over 500M.
- More than 50 developers at the end of 2023.
On the website: the top 100 by developer count from the eligible set.
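As a rough sketch, the "eligible if one of the following is true" reading of the criteria, followed by the top-100 cut, could look like this. The `Ecosystem` fields are hypothetical, and the valuation is assumed to be expressed in the same (unstated) unit as the 500M threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ecosystem:
    name: str
    developers: int                            # developer count at the cutoff date
    network_value_rank: Optional[int] = None   # rank in the previous year, if any
    private_valuation: Optional[float] = None  # same unit as the 500M threshold


def is_eligible(e: Ecosystem) -> bool:
    """True when at least one of the three eligibility conditions holds."""
    return (
        (e.network_value_rank is not None and e.network_value_rank <= 200)
        or (e.private_valuation is not None and e.private_valuation > 500_000_000)
        or e.developers > 50
    )


def shown_on_site(ecosystems, limit=100):
    """Eligible ecosystems, largest developer count first, cut to `limit`."""
    eligible = [e for e in ecosystems if is_eligible(e)]
    return sorted(eligible, key=lambda e: e.developers, reverse=True)[:limit]
```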
How often is the Top Ecosystems updated?
The list of top ecosystems is updated at least once prior to the publication of each annual report. Additional updates may occur if there are significant changes to the list.
What are Monthly Active Devs, Full-Time, Part-Time and One-Time developers?
- Developers - We count original code authors as developers. This means that a developer who merges a pull request is not counted as an active developer on the project, but the original authors of the commits are.
- Monthly Active Developers - We count a developer as active for the 28 days following each commit, which makes the data more stable.
- Full-Time Contributors - Contributed code 10+ days out of a month.
- Part-Time Contributors - Contributed code fewer than 10 days out of a month.
- One-Time Contributors - Contributed code once in a rolling 3-month window.
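The definitions above can be illustrated with a small sketch. It assumes a 28-day window for "a month", treats the rolling 3-month window as 84 days, and checks the one-time rule before the full-time and part-time rules; these choices, and the function name `segment`, are assumptions for the example.

```python
from datetime import date, timedelta


def segment(commit_days, as_of, window_days=28, one_time_days=84):
    """Classify one developer from the dates on which they authored commits."""

    def days_in(span):
        # Distinct commit days in the half-open window (as_of - span, as_of].
        start = as_of - timedelta(days=span)
        return {d for d in commit_days if start < d <= as_of}

    recent = len(days_in(window_days))
    if recent == 0:
        return "inactive"
    if len(days_in(one_time_days)) == 1:
        return "one-time"
    return "full-time" if recent >= 10 else "part-time"
```

For example, a developer with commits on ten distinct days in the last 28 days is classified as full-time, while a developer with a single commit day in the last 84 days is classified as one-time.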
How does the analysis filter out noise from gamed active commits and developer numbers that teams might manipulate knowing they're being monitored?
We take several steps to filter the noise:
- We fingerprint each commit so that we can remove commits that come from forks. Fingerprinting is the technique we use to identify commits originating from upstream projects. We look at the files and lines changed, the commit message, committer, author, and associated dates. This prevents copy/pasted code from counting towards an ecosystem. To do this, we have fingerprinted 483M commits to date.
- We identify "bots" that are committing code. "Bots" are not always malicious. Often, they are automations that a team puts in place. We detect and remove those.
- We dedupe developers. Developers often commit code from multiple accounts. We detect and collapse those accounts into a single developer "entity" so as not to overcount developers.
- We segment developers based on the number of days they contribute code. We segment developers by "Full Time", "Part Time", and "One-Time" counts. A "Full-Time" developer is someone who commits code on 10 or more days in a rolling 28-day period.
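Collapsing multiple accounts into one developer "entity" can be sketched with a union-find structure. How accounts are actually linked is not described here, so matching on shared identifiers (for example, a commit e-mail address) is purely an assumption for this illustration, as is the function name.

```python
def dedupe_developers(accounts):
    """Group accounts that share any identifier, using union-find.

    `accounts` maps an account name to the set of identifiers seen for it.
    Returns a list of sets, one set of account names per developer entity.
    """
    parent = {a: a for a in accounts}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    owner = {}  # identifier -> first account seen with it
    for account, identifiers in accounts.items():
        for ident in identifiers:
            if ident in owner:
                # Shared identifier: merge the two accounts' groups.
                parent[find(account)] = find(owner[ident])
            else:
                owner[ident] = account

    groups = {}
    for a in accounts:
        groups.setdefault(find(a), set()).add(a)
    return list(groups.values())
```

Links are transitive in this sketch: if account A shares an identifier with B, and B shares a different one with C, all three collapse into a single entity.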
No technique is 100% foolproof, and we are constantly updating our methodology to account for issues. These are techniques that we uniquely created. Some metrics, like "Full Time", "Part Time", etc., have been adopted by others in the industry because they are a better, more nuanced way of understanding developer activity. As far as we know, no other team currently goes to the same depth to remove noise from the data, since doing so takes significant infrastructure investment. Our infrastructure has evolved significantly in the 5 years since we first published the report.
We are always open to suggestions. If you know of ways teams are gaming the data, please let us know, and we will review those cases and potentially improve our methods.
What's the difference between “Single-chain” and “Multi-chain” developers?
In previous years, developers were only counted towards a chain if they met at least 2 of the following conditions:
- The project was first launched on this chain.
- The project's governance tokens were hosted on this chain.
- The primary medium of exchange for the project used this chain's tokens.
- The project's governance activities primarily occurred on this chain.
- The project's team or website publicly expressed support exclusively for this chain.
This approach, however, became inadequate as crypto projects increasingly embraced multiple chains, focusing on interoperability rather than allegiance to a specific chain.
To address this, “Multi-chain” and “Single-chain” categories were introduced at the end of 2023. Multi-chain developers are involved with projects that support multiple chains, or work on various projects across different chains. Single-chain (exclusive) developers, on the other hand, work on projects supporting a single ecosystem.
Repositories are evaluated based on their chain connections too. Repos linked to multiple chains are Multi-chain; those connected to just one chain are Single-chain. Developer activities are categorized accordingly, except for “First party” organization repos, which are always classified as exclusive (Single-chain) to their specific chain (e.g., 'https://github.com/ethereum' for Ethereum).
If a project like Lido (before they sunsetted Solana) is involved in multiple chains, any developer working on Lido would be counted towards Solana's Multi-chain developer count. Conversely, if a project is exclusively part of Solana, like Solend, then developers working only on Solend would be categorized as Single-chain. This would change if they also work on a project that is part of another chain, like Uniswap.
In summary, our current attribution model distinguishes between 2 categories:
- Multi-chain developers: Work on a project that supports multiple chains, OR work on many projects, each supporting a different chain.
- Single-chain developers: Work only on projects that exclusively support a single chain (Solana, in the example above).
Note: Figures that do not mention Single-chain or Multi-chain use the Total (Single-chain + Multi-chain).
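The two categories can be expressed as a short sketch. The function name and the chain sets used in the usage note are illustrative assumptions, and the "First party" exception is not modeled here.

```python
def developer_category(projects):
    """Categorize a developer from the projects they work on.

    `projects` is a list of sets, each holding the chains one project supports.
    A developer is multi-chain if any project spans several chains, or if their
    projects together span more than one chain.
    """
    if not projects:
        raise ValueError("developer has no projects")
    all_chains = set().union(*projects)
    return "multi-chain" if len(all_chains) > 1 else "single-chain"
```

With illustrative chain sets, `developer_category([{"Solana"}])` returns `"single-chain"`, while `developer_category([{"Solana"}, {"Ethereum"}])` and `developer_category([{"Ethereum", "Solana"}])` both return `"multi-chain"`.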
How do you deal with taxonomy changes over time?
Increasingly, projects support and are deployed on multiple chains at once. To attribute activity fairly in these situations, we categorize dev activity into 2 buckets: 1) Multichain and 2) Exclusive.
Multichain/Exclusive Classification: Repositories are evaluated based on their immediate chain connections (either directly to the repo or through the ecosystem they are attached to). Repos linked to multiple chains are classified as multichain, and those connected to a single chain are exclusive. Any developer activity for a repo is then categorized according to the classification of the repo. The only exception to this rule is repos for 'first_party' organizations, which are always exclusive to their specific chain (e.g., 'https://github.com/ethereum' for Ethereum).
Historical Data Attribution: We incorporate a time dimension to assess chain-repo connections. Activities are attributed to an ecosystem only from the established connection date forward. These dates are also used in determining when repos become multichain. Connection dates can be set both between different ecosystems and between ecosystems and repositories. This approach ensures accurate historical attribution, with pre-multichain activities classified as exclusive and post-multichain as multichain.
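A minimal sketch of date-aware attribution, assuming each repo carries a mapping from chain to connection date. The function name, the `connections` structure, and the `first_party` flag are hypothetical names chosen for this example.

```python
from datetime import date


def classify_activity(activity_date, connections, first_party=False):
    """Classify repo activity on a given date as exclusive or multichain.

    `connections` maps a chain name to the date the repo was connected to it.
    Returns None when no connection existed yet on `activity_date`.
    """
    # Only connections established on or before the activity date count.
    active = [chain for chain, since in connections.items() if since <= activity_date]
    if not active:
        return None
    if first_party or len(active) == 1:
        return "exclusive"
    return "multichain"
```

Under this sketch, activity that predates a repo's second chain connection stays exclusive, and only activity from that connection date forward is classified as multichain.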
I've created a custom dashboard using your taxonomy, but the metrics displayed do not match the numbers in your Developer Report. Why might this be?
First of all, when processing the TOML files from the taxonomy, it is important to recursively attribute the activity of sub-ecosystems to the ecosystem being processed.
It is also important to note that we run some internal data processing steps that may produce numbers different from yours. See “How do you find the data being used to produce the charts?” and “Which commits and developers are counted towards an ecosystem's total?” on this page to learn more about that pipeline.
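Recursive attribution of sub-ecosystems can be sketched as below. The example works on already-parsed data and assumes each ecosystem is a dict with `sub_ecosystems` (a list of names) and `repos` (a list of URLs); these field names and the function name are assumptions about the parsed form, so adapt them to the files you load.

```python
def all_repos(name, ecosystems, _seen=None):
    """Collect repo URLs for `name` and, recursively, every sub-ecosystem."""
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against cycles and repeated visits
        return set()
    seen.add(name)
    eco = ecosystems[name]
    repos = set(eco.get("repos", []))
    for sub in eco.get("sub_ecosystems", []):
        repos |= all_repos(sub, ecosystems, seen)
    return repos
```

Counting only the repos listed directly in a parent ecosystem's file, without descending into its sub-ecosystems, is a common reason for totals that come out lower than expected.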
What are the limitations of your approach?
- Code Quality and Complexity - Some code commits may be routine changes, whereas others represent hours of accumulated research and analysis. Despite these caveats, we consider the analysis in this report directionally accurate.
- Undercounting of Total Crypto Developers - There are many more developers than accounted for in our report. Some teams are working on important closed source projects. Some teams will open-source their code later. We also undercount developers in roles such as testing or release engineering as their efforts may not result in unique code contributions. It will require more than just software engineers to build products and reach mainstream adoption, so this is likely a dramatic undercounting of the number of people building in crypto & Web3.
Is there a paid tier to get more data?
We do not have a paid tier for additional access.
Do you have an API to access the data?
We do not have an API yet. If you are interested, please email us at email@example.com and describe your use case. We will gather requirements and scope a potential API as we hear from prospective users.
Do you have disclosures?
Disclosures / Not Financial Advice - You should not construe any such information or other material as legal, tax, investment, financial, or other advice. Nothing contained in this report constitutes a solicitation, recommendation, endorsement, or offer by Electric Capital or any third party service provider to buy or sell any securities or other financial instruments in this or in any other jurisdiction in which such solicitation or offer would be unlawful under the securities laws of such jurisdiction.
Can I use this data? How should I cite this data?
You can use the data for non-commercial purposes only. If you use this data, you must cite Electric Capital as the originator and link back to www.developerreport.com as the source immediately adjacent to the location that showcases the data. This means that you may not hide a link to this data as a citation in a secondary page, generate a graph to show to your users, and imply that this data is your own.
If you directly use one of the graphs presented on this site, say in a screenshot, you must leave the watermark in place, cite Electric Capital, and link back to www.developerreport.com as the source of the data.
Taking this data and re-selling it or including it as part of a paid product is expressly forbidden. This is a public good that should be used by the community and to advance crypto as a whole.