About & Methodology

What is the Electric Capital Developer Report?

Electric Capital's Developer Report is an effort to quantify the developer activity happening in crypto ecosystems. The report analyzes billions of crypto-related git commits across millions of crypto code repositories. This initiative was started in 2018 and has been published annually to capture the insights we consider most meaningful.

In 2022, we launched this public live dashboard with the most important metrics and charts in crypto, including breakdowns of the different ecosystems that compose crypto. The dashboard provides a comprehensive view of the activity and contributions of crypto developers, tracking the number and types of code commits, pull requests, and other contributions made by developers across various crypto projects.

This dashboard is built by Electric Capital and uses data from Open Dev Data, our open source platform for measuring developer activity across crypto and the decentralized web.

Why measure developer activity?

An early and leading indicator of value creation in emerging technology ecosystems is developer engagement. Developers build killer applications that deliver value to end users, which attracts more customers, which then draws more developers.

Because crypto is largely open source, we have a unique and unprecedented ability to understand an emerging industry that may one day reach billions of users. We share this data publicly in the hope of giving the community a clearer understanding of our collective progress.

How to contribute?

We are grateful to everyone in the community who helps by contributing to the Open Dev Data taxonomy on GitHub, the foundations who help us validate our analysis, and the friends who offer feedback on drafts.

If you want to check if your projects are included in our analysis or contribute missing projects, visit the Open Dev Data taxonomy tool or learn how to contribute in the GitHub repository.

How do you find the data being used to produce the charts?

We use a combination of proprietary tools and crowdsourcing to track and analyze the activity of developers. We divide the entire process into 5 steps:

  1. Sourcing code repositories: We use public contributions from new ecosystems and/or their code repositories, code searches, and public project list crawling. All the results gathered during this stage are converted into what we call “candidate repositories.”
    • Public contributions: Individuals, foundations, hackathon organizers, and researchers have been submitting PRs (Pull Requests) to our taxonomy for years. The team at Electric reviews each PR. Once a PR is merged, the repositories it adds start counting toward the metrics.
    • Code searches: For each ecosystem, we search for keywords that indicate the usage of certain libraries, frameworks, or blockchains that can be mapped to a specific ecosystem. For example, repositories with a dependency and usage of solana/web3.js are likely to belong to the Solana ecosystem.
    • Project list crawling: We obtain lists of candidate code repositories from websites like ecosystem project sites, general crypto aggregators, hackathon websites, etc.
  2. Classifying Candidate Repositories: The next step is to assign each candidate repository to the ecosystems it belongs to. Our classifiers use ad-hoc heuristics like:
    • Keyword count in code, commit messages, README, repository description.
    • File extension presence in the repository file tree.
    • Files with certain name and structure in the repository file tree.
    • Dependency analysis.

    We've adjusted the repository classifiers to prioritize higher Precision over Recall, aiming to minimize the inclusion of repositories in ecosystems where they don't belong.

    As an exception, and to ensure high-quality data, we manually review every candidate repository that has a large number of contributors or a high probability of belonging to numerous ecosystems.

  3. Crawling Source Code: We aggregate data from version control systems like Git to track code changes, commit logs, repository metadata, etc. This allows us to attribute original authors based on fingerprinted code, detect copied and pasted code, and understand which ecosystems are actually interconnected.
  4. Cleaning Data: We use a combination of automated techniques to detect code duplicates so we can mark duplicate commits that don't necessarily result from forking, and employ manual tools and human review to identify potential issues with the data.
  5. QA and Feedback: We are in constant communication with the largest ecosystems in crypto. Their teams help validate the data, assist us in identifying projects that our searches may have missed, pinpoint data quality issues, and aid in cross-checking our numbers with their internal metrics.

The preceding steps form a loop and an iterative process. We are constantly sourcing, reviewing, crawling, cleaning, and QA-ing the data. If your ecosystem is not represented in our taxonomy, please follow the instructions here and open a PR on GitHub.
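
As a rough illustration of the classification step, the Python sketch below scores a candidate repository against a single ecosystem using two of the heuristics listed above (dependency analysis and keyword counts). It assigns the ecosystem only when the score clears a deliberately high threshold, which is one simple way to favor precision over recall. The signal lists, weights, threshold, and all names are hypothetical and are not Electric Capital's actual classifier.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateRepo:
    """Minimal stand-in for a crawled candidate repository (hypothetical shape)."""
    url: str
    dependencies: set[str] = field(default_factory=set)
    text: str = ""  # README, description, and commit messages joined together


# Hypothetical signals for one ecosystem; a real classifier would use many more.
SOLANA_DEPENDENCIES = {"@solana/web3.js"}
SOLANA_KEYWORDS = ("solana", "lamports")


def score_repo(repo: CandidateRepo) -> float:
    """Combine a dependency signal and a keyword signal into a score in [0, 1]."""
    dependency_hit = 1.0 if repo.dependencies & SOLANA_DEPENDENCIES else 0.0
    lowered = repo.text.lower()
    keyword_hits = sum(lowered.count(keyword) for keyword in SOLANA_KEYWORDS)
    keyword_signal = min(keyword_hits / 5, 1.0)  # saturate after 5 mentions
    return 0.7 * dependency_hit + 0.3 * keyword_signal  # assumed weights


def classify(repo: CandidateRepo, threshold: float = 0.8) -> bool:
    """Accept only high-scoring repos: fewer false positives, at the cost of
    missing some true members (precision over recall)."""
    return score_repo(repo) >= threshold
```

With these assumed weights, a repository that only mentions the keywords, or that only declares the dependency, is rejected. Both signals have to agree before the repository is assigned.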

Which commits and developers are counted towards an ecosystem's total?

We strive to credit only the original authors and original ecosystems that produce code. This is achieved by fingerprinting the first instance of code created by the original author.

  • Forks: Only new code counts towards developer activity. We exclude code and developer activity from merging changes from an upstream codebase.
  • Original Code: We eliminate copy/pasted code across libraries by fingerprinting it and crediting the original author. Factors considered include the file, lines changed, commit message, committer, author, and associated dates to determine the fingerprint.
  • Branches: We analyze commits from all branches (master/main/development, etc.) and tags. This goes deeper than other data sources and offers a more comprehensive view than GitHub's default view.
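
To make the fingerprinting idea concrete, here is a minimal Python sketch. It hashes the factors named above (files, lines changed, commit message, committer, author, and dates) and credits the repository where a fingerprint was first observed. The `Commit` shape, the `observed_at` field (the time a crawler first saw the commit in a given repository), and the use of SHA-256 are assumptions for illustration. The actual fingerprinting method is not described in detail here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    repo: str
    author: str
    committer: str
    message: str
    authored_at: str   # ISO 8601 timestamp
    committed_at: str  # ISO 8601 timestamp
    changes: tuple[tuple[str, str], ...]  # (file path, changed lines) pairs
    observed_at: str   # hypothetical: when this commit was first seen in `repo`


def fingerprint(commit: Commit) -> str:
    """Hash the commit's content and metadata. The repo and observation time
    are left out so the same commit found in a fork hashes identically."""
    parts = [commit.author, commit.committer, commit.message,
             commit.authored_at, commit.committed_at]
    for path, lines in sorted(commit.changes):
        parts.extend([path, lines])
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(part.encode("utf-8"))
        hasher.update(b"\x00")  # separator so adjacent fields cannot run together
    return hasher.hexdigest()


def credit_original_repos(commits: list[Commit]) -> dict[str, str]:
    """Map each fingerprint to the repo where it was observed first.
    Assumes all `observed_at` values share one format and time zone."""
    first_seen: dict[str, str] = {}
    for commit in sorted(commits, key=lambda c: c.observed_at):
        first_seen.setdefault(fingerprint(commit), commit.repo)
    return first_seen
```

Under this sketch, a commit that reaches a fork by merging upstream changes has the same fingerprint as the upstream commit, so only the upstream repository is credited for it.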

Note: Currently, our focus is only on open-source repositories. Many repositories are not yet open-source. We are progressively enriching our open-source code data with on-chain smart contract-related data.

Why are some ecosystems not showing up on the website? What's the selection criteria?

We filter down ecosystems in our taxonomy to create the list of ecosystems that show up in the "Top Ecosystems Monthly Active Developers" table and that have their own subsection page with charts. The selection criteria are evaluated monthly and include:

Eligibility Requirements - An ecosystem is eligible if it meets one of the following:

  • Top 200 by network value
  • Private market valuation over $500 million
  • More than 50 monthly active developers

Display on Website - From the eligible set, the top 100 ecosystems by full-time developer count are displayed on the website with their own subsection pages and charts.
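
The two-stage selection above can be sketched in Python as follows. The thresholds come from the criteria listed in this section. The `Ecosystem` record and its field names are hypothetical, and using `None` for a missing rank or valuation is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ecosystem:
    name: str
    network_value_rank: Optional[int]       # None if there is no ranked network value
    private_valuation_usd: Optional[float]  # None if unknown
    monthly_active_devs: int
    full_time_devs: int


def is_eligible(eco: Ecosystem) -> bool:
    """An ecosystem qualifies if it meets at least one of the three criteria."""
    return (
        (eco.network_value_rank is not None and eco.network_value_rank <= 200)
        or (eco.private_valuation_usd is not None
            and eco.private_valuation_usd > 500_000_000)
        or eco.monthly_active_devs > 50
    )


def displayed_ecosystems(ecosystems: list[Ecosystem], limit: int = 100) -> list[Ecosystem]:
    """Keep eligible ecosystems, then take the top `limit` by full-time developers."""
    eligible = [eco for eco in ecosystems if is_eligible(eco)]
    return sorted(eligible, key=lambda eco: eco.full_time_devs, reverse=True)[:limit]
```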

How often is the Top Ecosystems updated?

The list of top ecosystems is updated on a monthly basis. This ensures the website reflects current developer activity trends and includes emerging ecosystems that meet the eligibility criteria.

What are Monthly Active Devs, Full-Time, Part-Time and One-Time developers?

Developers - We count original code authors as developers. This means that a developer who merges a pull request is not an active developer on the project, but the original authors of the commits are.

  • Monthly Developer - We use a 28-day rolling window to track developer activity, which provides more stable metrics over time.
  • Full-Time Contributors - Developers who are consistently active across multiple weeks based on sustained activity patterns over an 84-day rolling window.
  • Part-Time Contributors - Developers who are intermittently active with regular contributions over an 84-day rolling window.
  • One-Time Contributors - Developers with minimal or sporadic activity over an 84-day rolling window.
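
As a small illustration of the 28-day rolling window, the Python sketch below returns the set of developers with at least one authored commit in the 28 days ending on a given date. The input shape (pairs of author and commit date) and the choice to include both endpoints of the window are assumptions.

```python
from datetime import date, timedelta


def monthly_active_developers(
    commits: list[tuple[str, date]], as_of: date, window_days: int = 28
) -> set[str]:
    """Return authors with at least one commit in the window ending on `as_of`.

    The window covers `window_days` calendar days, including `as_of` itself.
    """
    window_start = as_of - timedelta(days=window_days - 1)
    return {author for author, day in commits if window_start <= day <= as_of}
```

Because 28 days is exactly four weeks, every window contains the same number of each weekday, unlike calendar months of varying length.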

How does the analysis filter out noise from gamed active commits and developer numbers that teams might manipulate knowing they're being monitored?

We take several steps to filter the noise:

  1. We fingerprint each commit so we can remove any commits that come from forks. Fingerprinting is the technique we use to identify commits originating from upstream projects. We look at the files and lines changed, the commit message, committer, author, and associated dates. This prevents copy/pasted code from counting towards an ecosystem. To do this, we have fingerprinted 483M commits to date.
  2. We identify "bots" that are committing code. "Bots" are not always malicious. Often, they are automations that a team puts in place. We detect and remove those.
  3. We dedupe developers. Often developers will commit code from multiple accounts. We detect and collapse accounts into a single developer "entity" so as not to overcount developers.
  4. We segment developers based on the number of days they contribute code. We segment developers by "Full-Time", "Part-Time", and "One-Time" counts. A "Full-Time" developer is someone who commits code 10 or more days in a rolling 28-day period.
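
Steps 3 and 4 can be sketched together in Python: collapse accounts into developer entities, count the distinct days each entity committed code in a rolling 28-day window, and apply the "10 or more days" rule for "Full-Time". The `aliases` mapping is hypothetical (how accounts are actually matched is not described here), and only the "Full-Time" threshold is taken from the text above.

```python
from datetime import date, timedelta


def active_days_by_developer(
    commits: list[tuple[str, date]],
    aliases: dict[str, str],
    as_of: date,
    window_days: int = 28,
) -> dict[str, int]:
    """Count distinct commit days per developer entity in the window.

    `aliases` maps an account name to a canonical developer id (hypothetical).
    Accounts missing from it are treated as their own entity.
    """
    window_start = as_of - timedelta(days=window_days - 1)
    days: dict[str, set[date]] = {}
    for account, day in commits:
        if window_start <= day <= as_of:
            entity = aliases.get(account, account)
            days.setdefault(entity, set()).add(day)  # a set ignores repeat commits per day
    return {entity: len(active) for entity, active in days.items()}


def is_full_time(active_days: int, threshold: int = 10) -> bool:
    """Apply the '10 or more days in a rolling 28-day period' rule."""
    return active_days >= threshold
```

Counting days rather than commits means that pushing many commits on a single day does not move a developer into a higher segment.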

No technique is 100% foolproof, and we are constantly updating our methodology to account for issues. These are techniques that we created ourselves. Some metrics, like "Full-Time" and "Part-Time", have been adopted by others in the industry since they are a better, more nuanced way of understanding developer activity. As far as we know, no other team currently goes to the same depth to remove noise from the data, since doing so takes significant infrastructure investment. Our infrastructure has evolved significantly in the 5 years since we first published the report.

We are always open to suggestions. If you know of ways teams are gaming the data, please let us know and we will review those cases and potentially improve our methods.

What's the difference between “Single-chain” and “Multi-chain” developers?

Developer classification is determined on a daily basis. A developer's multi-chain status can change from day to day based on their contributions.

A developer is classified as Multi-chain on a specific day if they:

  1. Commit to a repository connected to 2+ chains, OR
  2. Commit to a repository in infrastructure categories (Wallets, Bridges, Oracles, or RPC providers), OR
  3. Commit to multiple repositories that, in aggregate, connect to 2+ chains on that day

A developer is classified as Single-chain (Exclusive) on a specific day if they:

  1. Only commit to repositories connected to the same single chain, OR
  2. Commit to a "first party" organization repository for that chain (e.g., OffchainLabs for Arbitrum, ethereum/go-ethereum for Ethereum)

Important: Multi-chain developers still count toward individual chain developer totals. For example, a developer working on a wallet that supports both Bitcoin and Ethereum counts as a multi-chain developer for both Bitcoin AND Ethereum.

Infrastructure Categories: Developers working on Wallets, Bridges, Oracles, and RPC providers are automatically classified as multi-chain developers because these services inherently support multiple ecosystems, even if they maintain separate chain attributions.

Ecosystem-Level Classification: When an ecosystem supports 10+ chains, it may be reclassified under the "Multichain (Category)" ecosystem, and its developers no longer attribute to individual chains. However, individual repositories within that ecosystem can still maintain direct connections to specific chains and those attributions continue to count.

Examples:

  • A developer working on a DeFi protocol that supports multiple chains is counted as a multi-chain developer but still attributes to each specific chain the protocol supports.
  • If a developer commits only to an Arbitrum-specific repo on Monday and only to an Ethereum-specific repo on Tuesday, they are single-chain on each of those days, because classification is evaluated per day.
  • If a developer only commits to Arbitrum repos all week, they are single-chain for those days.
  • A developer working on the core Ethereum client (first party) who also contributes to a Solana project on the same day is classified as multi-chain.
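
The daily rules above can be sketched in Python as a single function that classifies one developer for one day, given the repositories they committed to. The `Repo` shape and field names are hypothetical. Treating a first-party repository as connected only to its own chain is one reading of the rules above, and checking the infrastructure category first is an assumption about precedence.

```python
from dataclasses import dataclass
from typing import Optional

INFRASTRUCTURE_CATEGORIES = {"Wallets", "Bridges", "Oracles", "RPC providers"}


@dataclass(frozen=True)
class Repo:
    name: str
    chains: frozenset[str]
    category: Optional[str] = None
    first_party_chain: Optional[str] = None  # set for first-party organization repos


def effective_chains(repo: Repo) -> frozenset[str]:
    """First-party repos count only toward their own chain."""
    if repo.first_party_chain is not None:
        return frozenset({repo.first_party_chain})
    return repo.chains


def classify_day(repos_committed_to: list[Repo]) -> str:
    """Return "multi-chain" or "single-chain" for one developer on one day."""
    if not repos_committed_to:
        raise ValueError("a developer with no commits that day is not classified")
    chains_touched: set[str] = set()
    for repo in repos_committed_to:
        if repo.category in INFRASTRUCTURE_CATEGORIES:
            return "multi-chain"  # infrastructure work is always multi-chain
        chains_touched |= effective_chains(repo)
    # One repo with 2+ chains, or several repos spanning 2+ chains, both land here.
    return "multi-chain" if len(chains_touched) >= 2 else "single-chain"
```

Because the function only sees one day's repositories, the same developer can receive different labels on different days.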

In summary, our current attribution model distinguishes between 2 categories:

Multi-chain developers

Developers who commit to repositories that support multiple chains, work on infrastructure projects (Wallets, Bridges, Oracles, RPC), or contribute to multiple chains on the same day.

Single-chain developers

Developers who commit exclusively to repositories connected to the same single chain on a given day.

Note: Figures that do not mention Single-chain or Multi-chain use the Total (Single-chain + Multi-chain).

How do you deal with taxonomy changes over time?

Increasingly, projects support and are deployed on multiple chains at once. To attribute fairly in these situations, we categorize developer activity into 2 buckets: 1) Multichain and 2) Exclusive.

Multichain/Exclusive Classification: Repositories are evaluated based on their immediate chain connections (either directly to the repo or through the ecosystem they are attached to). Repos linked to multiple chains are classified as multichain, and those connected to a single chain are exclusive. Any developer activity for a repo is then categorized according to the classification of the repo. The only exception to this rule is repos for 'first_party' organizations, which are always exclusive to their specific chain (e.g., 'https://github.com/ethereum' for Ethereum).

Historical Data Attribution: We incorporate a time dimension to assess chain-repo connections. Activities are attributed to an ecosystem only from the established connection date forward. These dates are also used in determining when repos become multichain. Connection dates can be set both between different ecosystems and between ecosystems and repositories. This approach ensures accurate historical attribution, with pre-multichain activities classified as exclusive and post-multichain as multichain.
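
The time dimension can be illustrated with a short Python sketch. Given the dates on which a repository became connected to each chain, a commit is credited only to chains whose connection date is on or before the commit date, and it is labeled multichain only if two or more chains were connected by then. The input shape and the "unattributed" label are hypothetical, and the first-party exception is left out for brevity.

```python
from datetime import date


def classify_commit(
    commit_day: date, connections: dict[str, date]
) -> tuple[set[str], str]:
    """Return the chains credited for a commit and its classification.

    `connections` maps a chain name to the date the repo (directly, or through
    its parent ecosystem) became connected to that chain.
    """
    credited = {chain for chain, since in connections.items() if since <= commit_day}
    if not credited:
        return credited, "unattributed"  # commit predates every connection
    label = "multichain" if len(credited) >= 2 else "exclusive"
    return credited, label
```

For a repository connected to Ethereum first and to a second chain later, commits made between the two connection dates stay exclusive to Ethereum, and only later commits are treated as multichain.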

I've created a custom dashboard using your taxonomy, but the metrics displayed do not match the numbers in your Developer Report. Why might this be?

First, when processing the TOML files from the taxonomy, it is important to recursively attribute the activity of each sub-ecosystem to the ecosystem being processed.

Second, we run some internal data processing steps that may produce numbers that differ from yours. Check the next section to learn more about this pipeline.
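
A minimal Python sketch of the recursive attribution is shown below. It assumes the taxonomy files have already been loaded into a dictionary keyed by ecosystem name, and that each entry lists its own repositories and the names of its sub-ecosystems. The keys `repos` and `sub_ecosystems` are hypothetical here, so check the taxonomy repository for the actual schema.

```python
from typing import Optional


def collect_repos(
    name: str, taxonomy: dict[str, dict], seen: Optional[set[str]] = None
) -> set[str]:
    """Return repo URLs for `name` plus all of its sub-ecosystems, recursively."""
    seen = set() if seen is None else seen
    if name in seen or name not in taxonomy:
        return set()
    seen.add(name)  # guards against cycles and revisiting shared sub-ecosystems
    entry = taxonomy[name]
    repos = set(entry.get("repos", []))
    for child in entry.get("sub_ecosystems", []):
        repos |= collect_repos(child, taxonomy, seen)
    return repos
```

Returning a set means a repository that appears under several sub-ecosystems is counted once for the parent.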

What are the limitations of your approach?

  • Code Quality and Complexity - Some code commits may be routine changes, whereas others represent hours of accumulated research and analysis. Despite these caveats, we consider the analysis in this report directionally accurate.
  • Undercounting of Total Crypto Developers - There are many more developers than accounted for in our report. Some teams are working on important closed source projects. Some teams will open-source their code later. We also undercount developers in roles such as testing or release engineering as their efforts may not result in unique code contributions. It will require more than just software engineers to build products and reach mainstream adoption, so this is likely a dramatic undercounting of the number of people building in crypto & Web3.

Is there a paid tier to get more data?

We do not have a paid tier for additional access.

Do you have an API to access the data?

We do not have an API yet. If you are interested, please email us at devreport@electriccapital.com and let us know your use case. We will gather requirements to scope a potential API as we hear from potential users of the API.

Do you have disclosures?

Disclosures / Not Financial Advice - You should not construe any such information or other material as legal, tax, investment, financial, or other advice. Nothing contained in this report constitutes a solicitation, recommendation, endorsement, or offer by Electric Capital or any third party service provider to buy or sell any securities or other financial instruments in this or in any other jurisdiction in which such solicitation or offer would be unlawful under the securities laws of such jurisdiction.

There are lots of other details in our Terms of Service and Privacy Policy.

Can I use this data? How should I cite this data?

This data is licensed under Creative Commons Attribution 4.0 (CC BY 4.0), which permits both commercial and non-commercial use. Please give appropriate credit to Electric Capital and link back to www.developerreport.com as the source immediately adjacent to the location that showcases the data. This means that you may not hide a link to this data as a citation in a secondary page, generate a graph to show to your users, and imply that this data is your own.

If you directly use one of the graphs presented on this site, say in a screenshot, you must leave the watermark in place, cite Electric Capital, and link back to www.developerreport.com as the source of the data.