Methodology
At Electricity Maps, we’re data scientists, first and foremost.
Data comes in from many sources and in many formats. We ingest and harmonize it, apply our models to it, and make it available to the world. This is the place to learn more about our data: read the FAQs, or dive deeper into our methodology.
Frequently Asked Questions
Historical & Real Time
First, let's have a look at the Historical time frame.
I. Foundational Methodological Choices
For accurate and verifiable data that most closely represents the physical reality of electricity grids, Electricity Maps relies on three foundational methodological choices: our data is attributional, location-based, and consumption-based.
Attributional Accounting Approach
We align with international standards, such as the GHG Protocol Scope 2 Guidance, which track GHG emissions and removals within a defined organizational and operational boundary over time. This is the primary method, required by regulations and standards, for reporting the emissions of companies and individuals.
Location Based Method
Our data reflects the physical reality of the grid. A location-based method considers the electricity available on the grids where energy consumption occurs and does not take traded contracts or certificates into account.
Consumption-based Calculation
We provide grid signals (electricity mix, carbon intensity, ...) for the electricity available (or consumed) in a grid, rather than merely what was produced locally. This crucial distinction mandates accounting for electricity flows across grids, which is achieved through our flow-tracing algorithm.
II. Defining Granularity (Space and Time)
To offer actionable data, we support different spatial and temporal aggregations on top of the most granular data available.
Spatial Granularity: Our spatial units represent a physical network that connects generators to consumers. They typically correspond to an electricity grid controlled by a single responsible operator. We aim to display the smallest subdivision of electricity grids for which reliable data is available, ensuring the highest accuracy. We also provide data aggregated at a country level.
Temporal Granularity: All our data can be delivered with a 5-minute, 15-minute, and hourly granularity to ensure the highest temporal fidelity and accuracy. We also provide data aggregated daily, monthly, quarterly, and yearly.
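As an illustration of how an hourly signal can be rolled up to coarser aggregations, here is a minimal pandas sketch (the column name, values, and the use of a plain mean are assumptions for illustration, not our production aggregation logic):

```python
import pandas as pd

# Hypothetical hourly carbon-intensity series for a single zone.
hourly = pd.DataFrame(
    {"carbon_intensity_gco2_per_kwh": [120.0, 135.0, 150.0, 110.0]},
    index=pd.date_range("2024-01-01 00:00", periods=4, freq="h", tz="UTC"),
)

# Roll up to daily and monthly granularity; a consumption-weighted average
# would be more faithful if consumption volumes were also available.
daily = hourly.resample("D").mean()
monthly = hourly.resample("MS").mean()
print(daily)
```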
III. Ingestion: Parser System
High-quality data starts with reliable sourcing and mandatory standardization.
Trusted Data Acquisition: We prioritize obtaining data from the highest-quality, most credible organizations globally, including government agencies (like the EIA in the US), Transmission System Operators (TSOs like ENTSO-E in Europe), and large utility companies. Currently, we have 75 active parsers for real-time electricity mix data and 38 active parsers for exchange data.
Multiple time frequencies: We integrate with data sources that support different time granularities. Some parsers run with high frequency to ingest hourly or more granular data, while others run less frequently and ingest monthly or yearly data.
The Parser System: We use an open-source parser system to ingest raw data and transform it into a standardized format. This critical step maps disparate raw data inputs (e.g., ENTSO-E's 21 specific modes) into our fixed, harmonized set of 12 distinct production modes. This standardization ensures consistency and comparability across all global zones.
ENTSO-E example: mapping disparate raw data to the harmonized set

| Disparate raw data (ENTSO-E) | Harmonized set |
| --- | --- |
| Fossil Brown Coal / Lignite, Fossil Hard Coal, Fossil Oil Shale, Fossil Peat | Coal |
| Fossil Oil | Oil |
| Fossil Coal-derived Gas, Fossil Gas | Gas |
| Geothermal | Geothermal |
| Solar | Solar |
| Hydro run-of-river & poundage, Hydro water reservoir | Hydro |
| Hydro Pumped Storage | Hydro Storage |
| Wind Offshore, Wind Onshore | Wind |
| Biomass, Waste | Biomass |
| Energy Storage | Battery Storage |
| Nuclear | Nuclear |
| Marine, Other, Other renewable | Other |
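In code, this harmonization amounts to a static mapping applied by each parser. A minimal sketch, using the raw names exactly as they appear in the table above (the function and the dictionary are illustrative, not taken from the open-source parser code):

```python
# Illustrative mapping from ENTSO-E production types to the harmonized modes,
# following the table above (not the actual parser implementation).
ENTSOE_TO_HARMONIZED = {
    "Fossil Brown Coal / Lignite": "coal",
    "Fossil Hard Coal": "coal",
    "Fossil Oil Shale": "coal",
    "Fossil Peat": "coal",
    "Fossil Oil": "oil",
    "Fossil Coal-derived Gas": "gas",
    "Fossil Gas": "gas",
    "Geothermal": "geothermal",
    "Solar": "solar",
    "Hydro run-of-river & poundage": "hydro",
    "Hydro water reservoir": "hydro",
    "Hydro Pumped Storage": "hydro storage",
    "Wind Offshore": "wind",
    "Wind Onshore": "wind",
    "Biomass": "biomass",
    "Waste": "biomass",
    "Energy Storage": "battery storage",
    "Nuclear": "nuclear",
    "Marine": "other",
    "Other": "other",
    "Other renewable": "other",
}

def harmonize(raw_breakdown: dict[str, float]) -> dict[str, float]:
    """Sum raw ENTSO-E production values (MW) into the 12 harmonized modes."""
    harmonized: dict[str, float] = {}
    for raw_mode, mw in raw_breakdown.items():
        mode = ENTSOE_TO_HARMONIZED.get(raw_mode, "other")
        harmonized[mode] = harmonized.get(mode, 0.0) + mw
    return harmonized
```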
IV. Automated Quality Validation and Outlier Detection
To prevent flawed data from impacting calculations, every ingested data point undergoes immediate and rigorous quality validation.
Outlier Detection Pipeline: Our system automatically detects and flags outliers using an Apache Beam pipeline that runs every 15 minutes.
Validation Rules: Every component of every data point is subject to multiple configurable validation rules that run in parallel. Some of these checks ensure physical plausibility, such as verifying that production levels do not exceed installed capacity or that expected production modes are not missing. If any component fails one of the validation rules, the data point is immediately flagged as invalid (a simplified sketch follows the diagram below).
Manual Correction: Recognizing that automatic validation may not catch all faults, we maintain a manual outlier detection process to flag faulty data points and subsequently re-trigger the estimation and flow-tracing pipelines.
[Diagram: validation pipeline. An original event is exploded into per-mode components (e.g., gas, wind, hydro); each component is checked against the validation rules (capacity, expected modes, range per mode, zero production, range of the total), corrected where needed, and the data point is marked 1 (valid datapoint) or 0 (invalid datapoint).]
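As a simplified illustration of the checks described above, here is a plain-Python sketch (the production system runs these rules in an Apache Beam pipeline; the rule thresholds and mode names below are invented for the example):

```python
# Simplified illustration of per-mode validation rules (not the Beam pipeline).
# A data point is "exploded" into one record per production mode, each record is
# checked against every rule, and the point is valid only if all checks pass.

EXPECTED_MODES = {"gas", "wind", "hydro"}                      # assumed for this example
CAPACITY_MW = {"gas": 500.0, "wind": 300.0, "hydro": 200.0}    # assumed installed capacities

def validate_datapoint(production_mw: dict[str, float]) -> bool:
    checks = []
    # Expected modes: no expected mode should be missing from the breakdown.
    checks.append(EXPECTED_MODES.issubset(production_mw))
    for mode, mw in production_mw.items():
        # Range per mode: production must be non-negative.
        checks.append(mw >= 0.0)
        # Capacity: production must not exceed installed capacity.
        checks.append(mw <= CAPACITY_MW.get(mode, float("inf")))
    # Zero production: the breakdown should not be identically zero.
    checks.append(sum(production_mw.values()) > 0.0)
    return all(checks)  # True: valid datapoint, False: invalid datapoint

print(validate_datapoint({"gas": 250.0, "wind": 80.0, "hydro": 40.0}))  # valid
print(validate_datapoint({"gas": 900.0, "wind": 80.0, "hydro": 40.0}))  # invalid: above capacity
```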
V. The Flow-Tracing Algorithm: Accounting for Electricity Flows across Interconnected Grids
The electricity mix produced in a given area is often not an accurate reflection of what is actually available on the grid, primarily because electricity grids are highly interconnected. Electricity is constantly exchanged between grids through interconnectors. These imports and exports create complicated electricity flows that confound simple production-based accounting.
Our flow-tracing method addresses this fundamental difficulty. This peer-reviewed scientific approach traces electricity flows across all interconnected grids to calculate the electricity mix truly available at each location of the grid.
The methodology is based on two core principles regarding electricity behavior:
Proportional Mixing: When electricity from various sources combines, the sources mix proportionally to their share of the power supplied.
Irreversibility: Once mixed, the electricity cannot be unmixed to select a specific source (analogous to unmixing a smoothie).
By applying these principles, the algorithm mathematically solves the entire network's flows to precisely determine the origin of power consumed on the grid.
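A minimal numerical sketch of these two principles on a hypothetical three-zone network (zone names, generation figures, and flows are invented; the production algorithm solves the same kind of balance over the full global network):

```python
import numpy as np

# Each zone's consumed mix is the proportional blend of its local generation and
# the already-mixed electricity it imports, which cannot be unmixed afterwards.
modes = ["solar", "wind", "gas"]

# Local generation per zone by mode, in MW (hypothetical numbers).
generation = {
    "A": np.array([100.0, 50.0, 50.0]),
    "B": np.array([0.0, 200.0, 100.0]),
    "C": np.array([20.0, 0.0, 80.0]),
}

# Power flows between zones, in MW: {(exporter, importer): MW} (hypothetical).
flows = {("A", "B"): 50.0, ("B", "C"): 120.0}

zones = list(generation)
idx = {z: i for i, z in enumerate(zones)}

# Total electricity handled by each zone = local generation + imports.
inflow = np.array([generation[z].sum() for z in zones])
for (src, dst), mw in flows.items():
    inflow[idx[dst]] += mw

# Proportional mixing gives one linear equation per zone and mode:
#   share[i] * inflow[i] = generation[i] + sum over j of flow(j -> i) * share[j]
A = np.diag(inflow)
for (src, dst), mw in flows.items():
    A[idx[dst], idx[src]] -= mw
G = np.vstack([generation[z] for z in zones])
shares = np.linalg.solve(A, G)  # rows: zones; columns: share of each mode in the consumed mix

for z in zones:
    print(z, dict(zip(modes, shares[idx[z]].round(3))))
```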
If you want to dive deeper into Flow-Tracing, you can read our peer-reviewed paper: Real-time carbon accounting method for the European electricity markets
VI. Electricity Maps Emission Factors (EF)
We maintain several sets of emission factors: a default set, US regional EFs, and European regional EFs.
VII. The Refetching Policy for Definitive Accuracy
Real-time data sources often consolidate, adjust, or finalize their initial readings over time, meaning the instant real-time value is often preliminary. To ensure that Electricity Maps provides the most accurate primary data possible, we implement an automatic refetching policy.
Refetch Schedule: Once per day, we refetch 48-hour windows of data located at the current day, one week, one month, and three months in the past (see the sketch below).
Impact of Refetching: This systematic process ensures that we capture source updates. While significant changes can occur immediately after publication, the data stabilizes rapidly. For most zones, the magnitude of updates to the Renewable Energy Percentage (RE%) decreases considerably after six hours. For 50% of zones, the RE% value can be considered definitive (updates of less than 0.5 percentage points) after 72 hours. For 90% of zones, updates after 72 hours differ by less than 2 percentage points from the real-time values.
[Timeline: 48-hour refetch windows located now, one week, one month, and three months in the past.]
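A small sketch of how the refetch windows described above could be computed (the helper function is illustrative, not our scheduler; one month is approximated as 30 days):

```python
from datetime import datetime, timedelta, timezone

# Offsets of the four daily refetches: now, one week, one month, three months back.
REFETCH_OFFSETS = [timedelta(0), timedelta(weeks=1), timedelta(days=30), timedelta(days=90)]
WINDOW = timedelta(hours=48)

def refetch_windows(now: datetime) -> list[tuple[datetime, datetime]]:
    """Return the (start, end) pairs of the 48-hour windows to refetch today."""
    return [(now - offset - WINDOW, now - offset) for offset in REFETCH_OFFSETS]

for start, end in refetch_windows(datetime.now(timezone.utc)):
    print(start.isoformat(), "->", end.isoformat())
```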
VIII. Strategic Estimation to Guarantee Global Coverage
Data gaps—due to invalid points, delays, or sparse reporting—must be filled to provide complete, continuous, granular data. We manage this through a tiered system based on data availability.
Tier A Zones
High granularity: measured hourly data from the original data source, with gaps filled using TSA.
These zones have measured hourly data available for the full electricity mix from the original source. Any potential gaps are filled using the Time Slicer Average (TSA) estimation model. TSA is efficient for immediate gap filling, as it operates without a dedicated training phase and maintains continuity with the surrounding data.
Tier B Zones
Partial granularity: measured hourly data from the original data source, with missing information estimated using zone-specific estimation models.
These zones have partial measured hourly data available from the original source. Since the full production mix breakdown may be missing, we develop zone-specific estimation models to fill these gaps. These custom models are designed to leverage all measured hourly data available and to estimate the missing parts using weather parameters.
Tier C Zones
Limited granularity: monthly or yearly totals only, with hourly values modelled using the General Purpose Zone Development model and reconciled with the original data source on monthly and yearly totals.
These zones do not have measured hourly data available, only aggregate monthly or yearly totals. For these regions, hourly values are estimated using the General Purpose Zone Development (GPZD) model. GPZD was specifically developed to provide hourly estimated grid data by breaking down yearly or monthly production figures into plausible hourly estimates, using weather data and geographic information.
IX. Operational Excellence and Incident Management
The continuous delivery of trusted real-time data requires highly structured monitoring and incident response.
Observability Stack: Our alerting is supported by a robust tooling set, including Grafana, Prometheus, BigQuery, and Sentry. This setup provides constant visibility into product-wide Service Level Objectives (SLOs).
Incident Response: We use a formal incident management playbook for responding to and resolving incidents in real time. This playbook ensures a fast, structured, and coordinated response when issues arise, supported by an on-call system (via Grafana OnCall and Slack).
Traceability: Every data point is stored with its full data lineage, including the original source or estimation model, ensuring complete data traceability should an auditor need to retrace results independently.
X. Data Publication and Versioning for Audit Readiness
Ensuring data traceability is a key objective. We guarantee that users can access and verify the exact data used for their calculations.
New historical datasets are typically published every January for the previous calendar year. Should major updates or data source improvements occur throughout the year, the data is updated, and these changes are fully versioned. This policy allows users to access and reference previous snapshots, ensuring complete audit readiness and traceability. We maintain a complete data lineage, tracking the value and origin (source or estimation model) of every data point over time.
Datasets
[Table: published dataset versions by year and version date; the most recent snapshot (Jul 3) is marked LATEST, with earlier versions dated Apr 3 and Jan 27.]
XI. Validation Against Global Data Sources
To continually reinforce the trustworthiness of our methodology, our production-based historical data is rigorously validated against highly regarded external sources.
Global Comparison (IEA and Ember): When comparing our production-based Renewable Energy Percentage (RE%) and Carbon-Free Energy Percentage (CFE%) data against worldwide sources like the International Energy Agency (IEA) and Ember (which do not include electricity flows), we find a strong correlation (0.99 for RE%). Across 59 countries, the median absolute difference for RE% data against both Ember and IEA remained below 3.2 percentage points for 2023. Similarly, the CFE% comparison shows consistency, with the median absolute differences remaining low.
Regional Comparison (Eurostat): Validation against regional authoritative sources, such as Eurostat (the statistical office of the European Union), also confirms consistency. For the 33 countries compared in 2022, the median absolute difference for RE% was 2.4 percentage points, demonstrating that our data is consistent with Eurostat's authoritative figures.
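For reference, the comparison metric is simply the median of per-country absolute differences, expressed in percentage points; a toy sketch with made-up values:

```python
import numpy as np

# Toy example of the comparison metric: per-country absolute difference in RE%
# between Electricity Maps and a reference dataset (all values are made up).
re_electricity_maps = np.array([45.2, 61.0, 30.5, 78.9])
re_reference = np.array([44.0, 63.5, 29.0, 80.1])

median_abs_diff_pp = np.median(np.abs(re_electricity_maps - re_reference))
print(f"median absolute difference: {median_abs_diff_pp:.1f} pp")
```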
[Figure: distribution of differences between Electricity Maps CFE% and the values reported by Ember and the IEA for 2023, with medians close to zero. Electricity Maps data for the CFE% over 2023 is consistent with values provided by Ember and the IEA.]
XII. How Electricity Maps provides Real-Time data
In reality, even the best public sources (TSOs in Europe, for example) only provide data with a slight delay, because they report what happened in the last reporting interval. On top of this, there are technical delays in delivering the data.
So how can Electricity Maps provide real-time data for decision-making? If you’ve been looking carefully through our map, you will have noticed that the real-time view uses two labels: Preliminary, and Synthetic.
"Preliminary" is used when our models estimate the values, but they will be replaced with actual values from the source.
"Synthetic" is used when the sources are not granular enough, or updated often enough, or simply don't exist. In those cases, our models estimate the values, and they will not be replaced with actual values.
XIII. Tier A zones
We have briefly introduced our tiering system on this page, which categorizes zones into Tier A, B, and C.
Tier A zones are zones with measured hourly or sub-hourly data available. For these zones, we complement our data sources with the Time Slicer Average model to guarantee that the data is complete and available in real time.
Time Slicer Average (TSA) is an estimation method for Tier A zones that fills short gaps or delays in otherwise reliable hourly production data. For every missing timestamp, TSA takes the average of available observations at the same time of day across other days within the same month, then aligns the filled values to ensure continuity with the data immediately before and after the gap. For weather-sensitive modes like solar and wind, TSA can be complemented with external or internal forecasts to improve realism.
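A minimal pandas sketch of the averaging step (the continuity alignment at the gap edges and the forecast blending for weather-sensitive modes are omitted; column and variable names are illustrative):

```python
import pandas as pd

def tsa_fill(series: pd.Series) -> pd.Series:
    """Fill gaps with the month's average value for the same hour of day.

    Simplified Time Slicer Average: the alignment that ensures continuity with
    the data immediately before and after the gap is left out of this sketch.
    """
    by_time_slice = series.groupby([series.index.month, series.index.hour]).transform("mean")
    return series.fillna(by_time_slice)

# Usage on a hypothetical hourly production series with a 4-hour gap.
idx = pd.date_range("2024-03-01", periods=72, freq="h", tz="UTC")
production = pd.Series(range(72), index=idx, dtype="float")
production.iloc[30:34] = None
filled = tsa_fill(production)
```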
This is the data our map labels as “Preliminary”, and our API responds with “Estimated”, along with the Time Slicer Average as the estimation method.
XIV. Tier B and C zones
Tier B and C zones are zones where we get partial hourly data, and zones with no hourly data at all. Data in these zones will always be estimated to some extent, and will always be labelled as such in the App and in the API.
For Tier B zones, we leverage all hourly data available and use zone-specific models to estimate, on an hourly granularity, the data that is only available at a daily or lower granularity. These models usually leverage time and weather parameters to break down original values into hourly granularity.
For Tier C zones, we have developed a model called General Purpose Zone Development (GPZD) that estimates hourly electricity production by mode in zones where only low-frequency data exists, such as yearly or monthly aggregates. It aims for plausible, smooth hourly profiles that exactly reconcile to reported monthly or yearly totals per mode, prioritising global coverage and stability over perfect accuracy. The model is trained on zones with both hourly and yearly data to learn realistic patterns and then applied where high-frequency data is missing.
It works in two stages: first, it derives monthly production per mode either by using existing monthly data or by disaggregating yearly totals into months using monthly weather signals and capacity bounds. Second, it converts monthly to hourly using hourly weather, geographic cues like sunrise and sunset, capacity limits, and an optimization step that enforces ramping and non-negativity constraints.
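As a rough illustration of the second stage, here is a sketch that spreads a monthly solar total across hours proportionally to an hourly weather proxy, clips to installed capacity, and rescales so the hourly values reconcile with the monthly total (the real model instead uses an optimization step with ramping and non-negativity constraints; all numbers below are invented):

```python
import numpy as np

def break_down_monthly_to_hourly(
    monthly_total_mwh: float,
    hourly_weather_proxy: np.ndarray,   # e.g. irradiance, one value per hour of the month
    capacity_mw: float,
) -> np.ndarray:
    """Crude monthly-to-hourly disaggregation (illustrative, not the GPZD model)."""
    weights = np.clip(hourly_weather_proxy, 0.0, None)
    if weights.sum() == 0.0:
        weights = np.ones_like(weights)
    hourly = monthly_total_mwh * weights / weights.sum()
    hourly = np.clip(hourly, 0.0, capacity_mw)   # respect capacity limits
    # Re-impose exact reconciliation with the reported total; a single rescale can
    # slightly re-violate the cap, which is why GPZD uses an optimization step instead.
    hourly *= monthly_total_mwh / hourly.sum()
    return hourly

# Hypothetical usage: a 720-hour month, a day/night irradiance proxy, 100 MW of solar.
hours = np.arange(720)
proxy = np.clip(np.sin((hours % 24 - 6) / 12 * np.pi), 0.0, None)
profile = break_down_monthly_to_hourly(15_000.0, proxy, capacity_mw=100.0)
```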
This is the data our map labels as “Synthetic”, and our API responds with “Estimated”, along with the corresponding estimation method.
Forecasting
This documentation addresses the methodology ensuring the quality of our forecasts.
I. The Strategic Imperative for Grid Forecasting
As the global energy landscape rapidly electrifies and relies more heavily on highly variable low-carbon sources like solar and wind, anticipating the grid's future state is essential.
Our forecasting engine provides a comprehensive prediction for the future state of grids worldwide, typically spanning up to 72 hours. The goal is to produce accurate and actionable forecasts for all of our signals, including all carbon and pricing signals supported, enabling users to optimize consumption patterns for lower carbon emissions or lower costs.
[Chart: forecasted production by mode, e.g., biomass, solar, wind, and coal.]
II. Ensuring Coherency through Flow-Tracing Predictions
Forecasting individual grid components (like solar production or net flows) is complex enough, but the greatest challenge is ensuring these thousands of individual predictions result in a physically coherent network state. Since all models are intertwined through flow tracing, changing one forecast (e.g., geothermal production in California) can affect the forecasted grid state in distant, interconnected zones.
We build a physically coherent prediction of the future state of all interconnected grids by applying our fundamental flow-tracing algorithm to each individual production and power flow forecast (e.g., solar production, nuclear baseline, and net flow predictions). This results in the most accurate prediction of the electricity mix and, subsequently, the future Carbon Intensity (CI) in each grid globally.
[Diagram: individual forecasts for US-CAL-CISO and US-SW-WALC are combined through flow-tracing into a coherent forecast of both zones' grid states.]
III. Scalable Architecture and the General-Purpose Model
The dimensionality of the predictions we make is relatively large: we have to predict about 20 signals, across more than 200 zones, over horizons of more than 72 hours. This forces us to avoid hand-crafting models for a particular zone, signal, or horizon, as doing so would hinder scalability.
Instead, we prefer to iterate on a single general-purpose model that can cope with the varying degrees of availability and robustness of the features it ingests while being robust to many error sources, including those we don’t yet know about.
Depending on the type of forecasts we want to generate, different sets of features will be most relevant. For example, features describing weather patterns are essential to forecast solar power production, while features engineered to provide useful information about the expected future make-up of the power grid are relevant to forecast net flows between regions.
These features can be pre-processed in many different ways. Choosing to standardize them or to impute missing values can have a significant impact on the behavior of the predictions, as the sketch below illustrates.
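For instance, a choice like "impute missing values, then standardize" can be captured in a reusable pipeline. A generic scikit-learn sketch (the feature values and the gradient-boosting regressor are placeholders, not our actual model class or features):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import HistGradientBoostingRegressor

# Generic preprocessing + model pipeline: impute missing feature values,
# standardize them, then fit a regressor (a stand-in for the general-purpose model).
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", HistGradientBoostingRegressor()),
])

# Hypothetical training data: weather/grid features vs. solar production (MW).
X = np.array([[210.0, 12.5, np.nan], [540.0, 9.0, 0.31], [60.0, 15.0, 0.12]])
y = np.array([150.0, 420.0, 35.0])
pipeline.fit(X, y)
print(pipeline.predict([[300.0, 10.0, 0.2]]))
```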
[Diagram: data and parameters feed the model to produce a forecast.]
IV. Automation, Traceability, and Version Control
To manage the complexity of thousands of interconnected predictions globally, we rely on core engineering principles: automation and guaranteed traceability.
Automated Lifecycle: We automate all operations within our model’s lifecycle, including training, testing, and deployment, to ensure high speed and reliability while avoiding reliance on manual intervention.
Guaranteed Traceability: We guarantee users access to a fully traceable release version of our engine. This means all information describing the features, preprocessors, trainers, and model class used is frozen under a release, enabling us to explain exactly where our forecasts originate.
Version Control Environments: We maintain three distinct environments to manage model deployment risk:
[Diagram: the Nightly, Latest, and Support environments, each with its own versioned features, models, and trainers.]
Nightly: Used for high-risk experimentation and testing new model classes, mainly for internal use.
Latest: The current production version used by commercial services.
Support: Holds a stable backup release matching the previous version of "Latest".
When a major model release occurs, a dedicated service promotes the model configurations from Nightly to Latest, and from Latest to Support in a version-controlled system.
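A schematic of that promotion step (the in-memory config store and the function below are hypothetical; only the Nightly to Latest to Support ordering follows the text above):

```python
# Schematic promotion of model configurations between environments.
def promote_release(config_store: dict[str, dict]) -> None:
    """On a major release, shift configurations one environment down the chain."""
    config_store["support"] = config_store["latest"]    # previous production becomes the backup
    config_store["latest"] = config_store["nightly"]    # the experimental release goes live

config_store = {
    "nightly": {"features": "v42", "models": "v42", "trainers": "v42"},
    "latest": {"features": "v41", "models": "v41", "trainers": "v41"},
    "support": {"features": "v40", "models": "v40", "trainers": "v40"},
}
promote_release(config_store)
print(config_store["latest"])   # {'features': 'v42', 'models': 'v42', 'trainers': 'v42'}
```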
V. Monitoring and Scalable Analytics Setup
Trust in forecast data is maintained through dedicated, scalable analytics that continuously monitor model performance.
Scalable Analytics Setup: Our system utilizes BigQuery and Dataform to define and compute complex system metrics. Dataform is crucial as it implements software-engineering best practices (version control and testing) within our analytics engine, ensuring the metrics we report are inherently trustworthy.
Monitoring & Transparency: Forecast metrics and key observability data are exposed via Looker Studio dashboards, which allows internal teams to build confidence in the forecast quality and ensures that the grid forecasts team is not distracted by recurring inquiries. Furthermore, completeness metrics are scraped and integrated into tools like Prometheus and Grafana for continuous health monitoring.
[Diagram: release configurations are exported to a BigQuery config tracker, while metrics from the operational database are exported to Prometheus and visualized in Grafana.]
