Methodology

Getting Started

At Electricity Maps, we’re data scientists, first and foremost.

Data comes in from many sources and in many formats. We ingest and harmonize it, apply our models, and make it available to the world. This is the place to learn more about our data: read our FAQs, or take a deep dive into our methodology.

Frequently Asked Questions

How good is your data?

What Emission Factors do you use?

How do you provide live data?

What time horizons do you cover?

Are your numbers different from other sources of electricity data?

How do you calculate the grid carbon intensity? What emission factors do you use?

Do you provide electricity price as a signal?

Is your data auditable?

How are you forecasting your signals?

How often does your data get updated?

How long until the data is final?

Historical & Real Time

First, let’s have a look at the Historical time frame.

I. Foundational Methodological Choices

For accurate and verifiable data that most closely represents the physical reality of electricity grids, Electricity Maps rests on three foundational methodological choices: our data is attributional, location-based, and consumption-based.

Attributional Accounting Approach

We align with international standards, such as the GHG Protocol Scope 2 Guidance, which track GHG emissions and removals within a defined organizational and operational boundary over time. Attributional accounting is the primary method, required by regulations and standards, for reporting companies' and individuals' emissions.

Location-Based Method

Our data reflects the physical reality of the grid. A location-based method considers the electricity available on grids where energy consumption occurs and does not include contracts or certificates traded.

Consumption-based Calculation

We provide grid signals (electricity mix, carbon intensity, ...) for the electricity available (or consumed) in a grid, rather than merely what was produced locally. This crucial distinction mandates accounting for electricity flows across grids, which is achieved through our flow-tracing algorithm.

II. Defining Granularity (Space and Time)

To offer actionable data, we support different spatial and temporal aggregations on top of the most granular data.

  • Spatial Granularity: Our spatial units represent a physical network that connects generators to consumers. They typically correspond to an electricity grid controlled by a single responsible operator. We aim to display the smallest subdivision of electricity grids for which reliable data is available, ensuring the highest accuracy. We also provide data aggregated at a country level.

  • Temporal Granularity: All our data can be delivered with a 5-minute, 15-minute, and hourly granularity to ensure the highest temporal fidelity and accuracy. We also provide data aggregated daily, monthly, quarterly, and yearly.

III. Ingestion: Parser System

High-quality data starts with reliable sourcing and mandatory standardization.

  • Trusted Data Acquisition: We prioritize obtaining data from the highest-quality, most credible organizations globally, including government agencies (like the EIA in the US), Transmission System Operators (TSOs like ENTSO-E in Europe), and large utility companies. Currently, we have 75 active parsers for real-time electricity mix data and 38 active parsers for exchange data.


  • Multiple time frequencies: We integrate with data sources that support different time granularities. Some parsers run with high frequency to ingest hourly or more granular data, while others run less frequently and ingest monthly or yearly data.

  • The Parser System: We use an open-source parser system to ingest raw data and transform it into a standardized format. This critical step maps disparate raw data inputs (e.g., ENTSO-E's 21 specific modes) into our fixed, harmonized set of 12 distinct production modes. This standardization ensures consistency and comparability across all global zones.

ENTSO-E example — mapping disparate raw data to our harmonized set:

  • Fossil Brown Coal / Lignite, Fossil Hard Coal, Fossil Oil Shale, Fossil Peat → Coal

  • Fossil Oil → Oil

  • Fossil Coal-derived Gas, Fossil Gas → Gas

  • Geothermal → Geothermal

  • Solar → Solar

  • Hydro run-of-river & poundage, Hydro water reservoir → Hydro

  • Hydro Pumped Storage → Hydro Storage

  • Wind Offshore, Wind Onshore → Wind

  • Biomass, Waste → Biomass

  • Energy Storage → Battery Storage

  • Nuclear → Nuclear

  • Marine, Other, Other renewable → Other

IV. Automated Quality Validation and Outlier Detection

To prevent flawed data from impacting calculations, every ingested data point undergoes immediate and rigorous quality validation.

  • Outlier Detection Pipeline: Our system automatically detects and flags outliers using an Apache Beam pipeline that runs every 15 minutes.

  • Validation Rules: Every component of every data point is subject to multiple configurable validation rules that run in parallel. Some of these checks ensure physical plausibility, such as verifying that production levels do not exceed capacity, or that expected modes are not missing. If any component fails one of the validation rules, the data point is immediately flagged as invalid.

  • Manual Correction: Recognizing that automatic validation may not catch all faults, we maintain a manual outlier detection process to flag faulty data points and subsequently re-trigger the estimation and flow-tracing pipelines.

[Diagram: the validation pipeline explodes each original_event into per-mode events (gas, wind, hydro, ...), runs the validation rules (Capacity, Expected modes, Range mode, Zero production, Range total) and a correction step in parallel, and emits 1 for a valid data point or 0 for an invalid one.]
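As a minimal illustration of the rules above, a data point could be validated like this. Rule names and thresholds are assumptions for this sketch, not Electricity Maps' actual configuration:

```python
# Minimal sketch of parallel validation rules; rule names and thresholds
# are illustrative assumptions, not the actual configuration.

def check_capacity(production, capacity):
    # Production per mode must not exceed installed capacity.
    return all(mw <= capacity.get(mode, float("inf"))
               for mode, mw in production.items())

def check_expected_modes(production, expected_modes):
    # Modes known to exist in the zone must be present in the data point.
    return expected_modes <= set(production)

def validate(production, capacity, expected_modes):
    # Rules run independently; a single failure flags the whole data point.
    results = [
        check_capacity(production, capacity),
        check_expected_modes(production, expected_modes),
    ]
    return 1 if all(results) else 0  # 1 - valid datapoint, 0 - invalid

production = {"gas": 120.0, "wind": 40.0, "hydro": 15.0}
capacity = {"gas": 200.0, "wind": 100.0, "hydro": 50.0}
flag = validate(production, capacity, {"gas", "wind", "hydro"})  # 1 (valid)
```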

V. The Flow-Tracing Algorithm: Accounting for Electricity Flows across Interconnected Grids

The electricity mix produced in a given area is often not an accurate reflection of what is actually available on the grid, primarily because electricity grids are highly interconnected. Electricity is constantly exchanged between grids through interconnectors. These imports and exports create complicated electricity flows that confound simple production-based accounting.

Our flow-tracing method addresses this fundamental difficulty. This peer-reviewed scientific approach traces electricity flows across all interconnected grids to calculate the electricity mix truly available at each location of the grid.

The methodology is based on two core principles regarding electricity behavior:

  1. Proportional Mixing: When electricity from various sources combines, the sources mix proportionally to their share of the power supplied.

  2. Irreversibility: Once mixed, the electricity cannot be unmixed to select a specific source (analogous to unmixing a smoothie).

By applying these principles, the algorithm mathematically solves the entire network's flows to precisely determine the origin of power consumed on the grid.

If you want to dive deeper into Flow-Tracing, you can read our peer-reviewed paper: Real-time carbon accounting method for the European electricity markets
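Under the two principles above, the consumed mix in each zone is the solution of a simple balance: local production plus imports, where each import carries its exporter's already-blended mix. A toy two-zone sketch using fixed-point iteration, assuming lossless interconnectors and made-up numbers (not the paper's exact formulation):

```python
# Toy flow-tracing sketch under the two principles above. Assumptions for
# illustration only: lossless interconnectors, made-up production and flow
# numbers, fixed-point iteration instead of a direct linear solve.

def trace_flows(production, imports, iterations=100):
    """Compute the consumed electricity mix per zone.

    production: {zone: {mode: MW produced locally}}
    imports:    {zone: {from_zone: MW imported}}
    Returns {zone: {mode: share of consumed electricity}}.
    """
    modes = sorted({m for p in production.values() for m in p})
    mix = {z: {m: 0.0 for m in modes} for z in production}
    for _ in range(iterations):
        new_mix = {}
        for zone in production:
            # Proportional mixing: local production plus each import,
            # the import carrying its exporter's blended mix.
            inflow = {m: production[zone].get(m, 0.0) for m in modes}
            for src, mw in imports.get(zone, {}).items():
                for m in modes:
                    inflow[m] += mw * mix[src][m]
            total = sum(inflow.values())
            new_mix[zone] = {m: inflow[m] / total for m in modes}
        mix = new_mix  # irreversibility: downstream zones see blended shares
    return mix

production = {"A": {"wind": 80.0, "gas": 20.0}, "B": {"coal": 50.0}}
imports = {"B": {"A": 30.0}}  # A exports 30 MW to B
mix = trace_flows(production, imports)
# Zone B consumes 80 MW: 50 coal + 30 imported at A's 80/20 wind/gas split,
# giving shares wind 0.3, gas 0.075, coal 0.625.
```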

VI. Precision in Emissions Factors: Direct vs. Lifecycle Emissions

To calculate Carbon Intensity (CI), we match the flow-traced electricity mix with technology-specific emission factors. We offer two primary types:

  • Life-Cycle Emission Factors: These provide a thorough "cradle-to-grave" accounting. They include emissions from building the power plant, operating it, extracting fuel, and disposal at the end of its life.

  • Direct Emission Factors (Operational): These only count the emissions released directly from the operation of the power plant (like burning fuel).

We ensure precision by employing both globally recognized standards and highly specific regional factors.

  • Global Default Factors: Electricity Maps uses the globally recognized and peer-reviewed 2014 IPCC Fifth Assessment Report emission factors as the default for most electricity grids worldwide.

  • Regional Emission Factors: To provide superior accuracy, we have developed advanced methodologies for computing regional emission factors in regions like the US and the EU where power-plant-level data is made available.


  • US: We compute grid-specific emission factors using data from the US Environmental Protection Agency's Emissions & Generation Resource Integrated Database (eGRID), which includes generation and emissions data for most power plants in the US.

  • EU: We compute grid-specific emission factors using emissions data from the European Union’s EU Emissions Trading Scheme (EU-ETS) and power-plant generation data from ENTSO-E.

This multi-faceted approach ensures our emission factors capture regional specificities like fuel type and plant efficiency, and temporal evolutions, resulting in data that is highly specific and often more up-to-date than generalized global standards.
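Once the flow-traced mix is known, the carbon intensity is the mix-weighted average of per-technology emission factors. A sketch with illustrative round factors in the spirit of IPCC AR5 lifecycle medians — not Electricity Maps' exact values:

```python
# Carbon intensity as the mix-weighted average of technology-specific
# emission factors. The factors below are illustrative round numbers in the
# spirit of IPCC AR5 lifecycle medians (gCO2eq/kWh), not Electricity Maps'
# exact values.
LIFECYCLE_EF = {"coal": 820, "gas": 490, "hydro": 24, "nuclear": 12,
                "wind": 11, "solar": 48}

def carbon_intensity(mix, factors):
    """mix: {mode: share of consumed electricity}, shares summing to 1."""
    return sum(share * factors[mode] for mode, share in mix.items())

consumed_mix = {"wind": 0.3, "gas": 0.075, "coal": 0.625}  # e.g. flow-traced
ci = carbon_intensity(consumed_mix, LIFECYCLE_EF)
# 0.3*11 + 0.075*490 + 0.625*820 = 552.55 gCO2eq/kWh
```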

How does this compare to Emission Factors used by the IEA?

We differ in a few, but noteworthy, ways:

  • IEA uses IPCC 2006 emission factors, whereas we use IPCC 2014 emission factors, in combination with more granular emissions data when available. We consider our approach more accurate.

  • Electricity Maps uses more granular data for electricity exchanges between zones and countries (hourly, compared to the IEA's yearly balance), upstream emission factors, and finer spatial and temporal granularity.

  • IEA and Electricity Maps use different methodologies for the allocation of emissions of combined heat and power (CHP) plants, and inclusion of T&D losses.

    It's important to note that while our emission factors differ in principle from the IEA's, they are generally within the same order of magnitude, and reflect similar overall trends.


[Diagram: Electricity Maps emission factors (EF) — global default factors, US regional EFs, and European regional EFs.]

VII. The Refetching Policy for Definitive Accuracy

Real-time data sources often consolidate, adjust, or finalize their initial readings over time, meaning the instant real-time value is often preliminary. To ensure that Electricity Maps provides the most accurate primary data possible, we implement an automatic refetching policy.

  • Refetch Schedule: Once per day, we refetch data covering a 48-hour period for the current day, and for one week, one month, and three months in the past.

  • Impact of Refetching: This systematic process ensures that we capture source updates. While significant changes can occur immediately, data stabilizes rapidly. For most zones, the magnitude of updates to the Renewable Energy Percentage (RE%) decreases considerably after six hours. For 50% of zones, the RE% value can be considered definitive (updates of less than 0.5 percentage points) after 72 hours. For 90% of zones, updates after 72 hours differ by less than 2 percentage points from the real-time values.

[Diagram: each daily run refetches four 48-hour windows — ending now, one week ago, one month ago, and three months ago.]
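The schedule above can be sketched as follows. Approximating a month as 30 days is an assumption for illustration, not the exact production rule:

```python
# Sketch of the refetch schedule: each daily run refetches four 48-hour
# windows ending now, one week ago, one month ago, and three months ago.
# The 30-day month is an illustrative assumption.
from datetime import datetime, timedelta

def refetch_windows(now):
    offsets = [timedelta(0), timedelta(days=7),
               timedelta(days=30), timedelta(days=90)]
    return [(now - off - timedelta(hours=48), now - off) for off in offsets]

windows = refetch_windows(datetime(2024, 1, 15))
# First window covers 2024-01-13 .. 2024-01-15.
```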

VIII. Strategic Estimation to Guarantee Global Coverage

Data gaps—due to invalid points, delays, or sparse reporting—must be filled to provide complete, continuous, granular data. We manage this through a tiered system based on data availability.

Tier A Zones

  • High granularity: measured hourly

  • Original data source

  • Gaps filled using TSA

These zones have measured hourly data available for the full electricity mix from the original source. Any potential gaps are filled using the Time Slicer Average (TSA) estimation model. TSA is efficient for immediate gap filling, as it operates without a dedicated training phase while maintaining continuity.

Tier B Zones

  • Partial granularity: measured hourly

  • Original data source

  • Missing info estimated using zone-specific est. models

These zones have partial measured hourly data available from the original source. Since the full production mix breakdown may be missing, we develop zone-specific estimation models to fill these gaps. These custom models are designed to leverage all measured hourly data available and estimate the missing parts leveraging weather parameters.

Tier C Zones

  • Limited granularity: monthly/yearly totals

  • Hourly values modelled with the General Purpose Zone Development model

  • Reconciliation with the original data source on monthly and yearly totals

These zones do not have measured hourly data available, only aggregate monthly or yearly totals. For these regions, hourly values are estimated using the General Purpose Zone Development (GPZD) model. GPZD was specifically developed to provide hourly estimated grid data by breaking down yearly or monthly production figures into plausible hourly estimates, using weather data and geographic information.

IX. Operational Excellence and Incident Management

The continuous delivery of trusted real-time data requires highly structured monitoring and incident response.

  • Observability Stack: Our alerting is supported by a robust tooling set, including Grafana, Prometheus, Big Query, and Sentry. This setup provides constant visibility into product-wide Service Level Objectives (SLOs).

  • Incident Response: We use a formal incident management playbook for responding to and resolving incidents in real time. This playbook ensures a fast, structured, and coordinated response when issues arise, supported by an on-call system (via Grafana OnCall and Slack).

  • Traceability: Every data point is stored with its full data lineage, including the original source or estimation model, ensuring complete data traceability should an auditor need to retrace results independently.

X. Data Publication and Versioning for Audit Readiness

Ensuring data traceability is a key objective. We guarantee that users can access and verify the exact data used for their calculations.

New historical datasets are typically published every January for the previous calendar year. Should major updates or data source improvements occur throughout the year, the data is updated, and these changes are fully versioned. This policy allows users to access and reference previous snapshots, ensuring complete audit readiness and traceability. We maintain a complete data lineage, tracking the value and origin (source or estimation model) of every data point over time.

Datasets

Zone name

Zone

Zone

Year

Version date

France

FR

2023

Jul 3,

2023

LATEST

France

FR

2023

Apr 3,

2023

France

FR

2023

Jan 27th,

2023

XI. Validation Against Global Data Sources

To continually reinforce the trustworthiness of our methodology, our production-based historical data is rigorously validated against highly regarded external sources.

  • Global Comparison (IEA and Ember): When comparing our production-based Renewable Energy Percentage (RE%) and Carbon-Free Energy Percentage (CFE%) data against worldwide sources like the International Energy Agency (IEA) and Ember (which do not include electricity flows), we find a strong correlation (0.99 for RE%). Across 59 countries, the median absolute difference for RE% data against both Ember and IEA remained below 3.2 percentage points for 2023. Similarly, the CFE% comparison shows consistency, with the median absolute differences remaining low.

  • Regional Comparison (Eurostat): Validation against regional authoritative sources, such as Eurostat (the statistical office of the European Union), also confirms consistency. For the 33 countries compared in 2022, the median absolute difference for RE% was 2.4 percentage points, demonstrating that our data is consistent with Eurostat's authoritative figures.

Median difference      vs. IEA     vs. EMBER
Median Abs.            2.9 pp      2.1 pp
Median                 -1.1 pp     0.3 pp
Electricity Maps data for the CFE% over 2023 is consistent with values provided by EMBER and the IEA.

XII. How Electricity Maps provides Real-Time data

In reality, even the best public sources (TSOs in Europe, for example) only provide data with a slight delay, because they report what happened in the last reporting interval. On top of that, there are technical delays in delivering the data.

So how can Electricity Maps provide real-time data for decision-making? If you’ve been looking carefully through our map, you will have noticed that the real-time view uses two labels: Preliminary, and Synthetic.

"Preliminary" is used when our models estimate the values, but they will be replaced with actual values from the source.

"Synthetic" is used when the sources are not granular enough, or updated often enough, or simply don't exist. In those cases, our models estimate the values, and they will not be replaced with actual values.


XIII. Tier A zones

We have briefly introduced our tiering system on this page, which categorizes zones into Tier A, B, and C.

Tier A zones are zones with measured hourly or sub-hourly data available. For these zones, we complement our data sources with the Time Slicer Average model to guarantee that data remains real-time and complete.

Time Slicer Average (TSA) is an estimation method for Tier A zones that fills short gaps or delays in otherwise reliable hourly production data. For every missing timestamp, TSA takes the average of available observations at the same time of day across other days within the same month, then aligns the filled values to ensure continuity with the data immediately before and after the gap. For weather-sensitive modes like solar and wind, TSA can be complemented with external or internal forecasts to improve realism.

This is the data our map labels as “Preliminary”, and our API responds with “Estimated”, along with the Time Slicer Average as the estimation method.
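Under simplifying assumptions, the core of TSA can be sketched as follows; the continuity-alignment and forecast-blending steps described above are omitted:

```python
# Minimal sketch of Time Slicer Average (TSA) gap filling: a missing hour is
# filled with the mean of values observed at the same hour of day on other
# days of the month. The real model also aligns filled values with
# neighboring observations; that step is omitted here.
from statistics import mean

def tsa_fill(series):
    """series: {(day, hour): MW or None}. Returns a fully filled copy."""
    filled = dict(series)
    for (day, hour), value in series.items():
        if value is None:
            same_slice = [v for (d, h), v in series.items()
                          if h == hour and d != day and v is not None]
            filled[(day, hour)] = mean(same_slice)
    return filled

series = {(1, 12): 100.0, (2, 12): 110.0, (3, 12): None}
filled = tsa_fill(series)  # (3, 12) -> 105.0
```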

XIV. Tier B and C zones

Tier B and C zones are zones where we get partial hourly data, and zones with no hourly data at all. Data in these zones will always be estimated to some extent, and will always be labelled as such in the App and in the API.

For Tier B zones, we leverage all hourly data available and use zone-specific models to estimate, on an hourly granularity, the data that is only available at a daily or lower granularity. These models usually leverage time and weather parameters to break down original values into hourly granularity.

For Tier C zones, we have developed a model called General Purpose Zone Development (GPZD) that estimates hourly electricity production by mode in zones where only low-frequency data exists, such as yearly or monthly aggregates. It aims for plausible, smooth hourly profiles that exactly reconcile to reported monthly or yearly totals per mode, prioritising global coverage and stability over perfect accuracy. The model is trained on zones with both hourly and yearly data to learn realistic patterns and then applied where high-frequency data is missing.

It works in two stages: first, it derives monthly production per mode either by using existing monthly data or by disaggregating yearly totals into months using monthly weather signals and capacity bounds. Second, it converts monthly to hourly using hourly weather, geographic cues like sunrise and sunset, capacity limits, and an optimization step that enforces ramping and non-negativity constraints.
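Both stages rely on the same idea: allocate a reported total proportionally to a weather-derived weight so the parts reconcile exactly. A heavily simplified sketch — the irradiance index is made up, and the capacity bounds, ramping constraints, and optimization step of the real model are omitted:

```python
# Heavily simplified sketch of the proportional allocation behind both GPZD
# stages. The irradiance index is made up; capacity bounds, ramping
# constraints, and the optimization step are omitted.

def disaggregate(total, weights):
    # Allocate `total` proportionally to non-negative weights; the parts
    # sum back to the reported total by construction.
    s = sum(weights)
    return [total * w / s for w in weights]

yearly_solar_gwh = 1200.0  # reported yearly total for one mode
monthly_irradiance = [2, 3, 5, 7, 9, 10, 10, 9, 7, 5, 3, 2]  # illustrative
monthly = disaggregate(yearly_solar_gwh, monthly_irradiance)
# The same function would then split each month into hours using an hourly
# weather signal (stage 2).
```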

This is the data our map labels as “Synthetic”, and our API responds with “Estimated”, along with the corresponding estimation method.

Forecasting

This documentation addresses the methodology ensuring the quality of our forecasts.

I. The Strategic Imperative for Grid Forecasting

As the global energy landscape rapidly electrifies and relies more heavily on highly variable low-carbon sources like solar and wind, anticipating the grid's future state is essential.

Our forecasting engine provides a comprehensive prediction for the future state of grids worldwide, typically spanning up to 72 hours. The goal is to produce accurate and actionable forecasts for all of our signals, including all carbon and pricing signals supported, enabling users to optimize consumption patterns for lower carbon emissions or lower costs.

[Chart: forecasted production per mode — biomass, solar, wind, coal.]

II. Ensuring Coherency through Flow-Tracing Predictions

Forecasting individual grid components (like solar production or net flows) is complex enough, but the greatest challenge is ensuring these thousands of individual predictions result in a physically coherent network state. Since all models are intertwined through flow tracing, changing one forecast (e.g., geothermal production in California) can affect the forecasted grid state in distant, interconnected zones.

We build a physically coherent prediction of the future state of all interconnected grids by applying our fundamental flow-tracing algorithm to each individual production and power-flow forecast (e.g., solar production, nuclear baseline, and net-flow predictions). This results in the most accurate prediction of the electricity mix and, subsequently, the future Carbon Intensity (CI) in each grid globally.

[Diagram: individual forecasts for US-CAL-CISO and US-SW-WALC are combined through flow tracing into a coherent forecast of both zones' grid states.]

III. Scalable Architecture and the General-Purpose Model

The dimensionality of the predictions we make is relatively large: we have to predict about 20 signals, across more than 200 zones, over horizons of more than 72 hours. This forces us to avoid hand-crafting models for a particular zone, signal, or horizon, as that would hinder our scalability.

Instead, we prefer to iterate on a single general-purpose model that can cope with the varying degrees of availability and robustness of the features it ingests while being robust to many error sources, including those we don’t yet know about.

Depending on the type of forecasts we want to generate, different sets of features will be most relevant. For example, features describing weather patterns are essential to forecast solar power production, while features engineered to provide useful information about the expected future make-up of the power grid are relevant to forecast net flows between regions.

These features can further be pre-processed in a multitude of fashions. Choosing to standardize them or imputing missing values can have a significant impact on the behavior of the predictions.
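For instance, two of the preprocessing choices mentioned above, missing-value imputation and standardization, could look like this. The feature values are made up, and this is not Electricity Maps' actual pipeline:

```python
# Illustrative feature preprocessing: mean imputation followed by
# standardization (zero mean, unit variance). Values are made up.
from statistics import mean, pstdev

def impute(values):
    # Replace missing values with the mean of the observed ones.
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

wind_speed = [4.0, None, 8.0, 6.0]  # hypothetical weather feature
features = standardize(impute(wind_speed))
# imputed: [4.0, 6.0, 8.0, 6.0]; mean 6.0, population stdev sqrt(2)
```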


IV. Automation, Traceability, and Version Control

To manage the complexity of thousands of interconnected predictions globally, we rely on core engineering principles: automation and guaranteed traceability.

  • Automated Lifecycle: We automate all operations within our model’s lifecycle, including training, testing, and deployment, to ensure high speed and reliability while avoiding reliance on manual intervention.

  • Guaranteed Traceability: We guarantee users access to a fully traceable release version of our engine. This means all information describing the features, preprocessors, trainers, and model class used is frozen under a release, enabling us to explain exactly where our forecasts originate.

  • Version Control Environments: We maintain three distinct environments to manage model deployment risk:

[Diagram: the three environments — Nightly, Latest, and Support — each pin versioned Features, Models, and Trainers.]

Nightly: Used for high-risk experimentation and testing new model classes, mainly for internal use.

Latest: The current production version used by commercial services.

Support: Holds a stable backup release matching the previous version of "Latest".

When a major model release occurs, a dedicated service promotes the model configurations from Nightly to Latest, and from Latest to Support in a version-controlled system.
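The promotion step amounts to shifting configurations down the chain; a sketch with hypothetical version labels:

```python
# Sketch of a major release promotion: Nightly's configuration becomes
# Latest, and the previous Latest is retained as Support. Version labels
# are hypothetical.
def promote(envs):
    promoted = dict(envs)
    promoted["Support"] = envs["Latest"]
    promoted["Latest"] = envs["Nightly"]
    return promoted

envs = {"Nightly": "v3.0-rc", "Latest": "v2.4", "Support": "v2.3"}
envs = promote(envs)
# After promotion: Latest == "v3.0-rc", Support == "v2.4"
```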

V. Monitoring and Scalable Analytics Setup

Trust in forecast data is maintained through dedicated, scalable analytics that continuously monitor model performance.

  • Scalable Analytics Setup: Our system utilizes BigQuery and Dataform to define and compute complex system metrics. Dataform is crucial as it implements software-engineering best practices (version control and testing) within our analytics engine, ensuring the metrics we report are inherently trustworthy.

  • Monitoring & Transparency: Forecast metrics and key observability data are exposed via Looker Studio dashboards, which allows internal teams to build confidence in the forecast quality and ensures that the grid forecasts team is not distracted by recurring inquiries. Furthermore, completeness metrics are scraped and integrated into tools like Prometheus and Grafana for continuous health monitoring.

[Diagram: the release configuration is exported to a BigQuery config tracker, while the operational database feeds a metrics exporter scraped by Prometheus and visualized in Grafana.]
