Methodology

Getting Started

At Electricity Maps, we’re data scientists, first and foremost.

Data comes in from many sources and in many formats. We ingest and harmonize it, apply our models, and make it available to the world. This is the place to learn more about our data: read our FAQs, or take a deep dive into our methodology.

Frequently Asked Questions

How good is your data?

What Emission Factors do you use?

How do you provide live data?

What time horizons do you cover?

Are your numbers different from other sources of electricity data?

How do you calculate the grid carbon intensity? What emission factors do you use?

Do you provide electricity price as a signal?

Is your data auditable?

How are you forecasting your signals?

How often does your data get updated?

How long until the data is final?

Historical & Real Time

First, let’s have a look at the Historical time frame.

I. Foundational Methodological Choices

For accurate and verifiable data that most closely represents the physical reality of electricity grids, Electricity Maps rests on three foundational methodological choices: our data is attributional, location-based, and consumption-based.

Attributional Accounting Approach

We align with international standards, such as the GHG Protocol Scope 2 Guidance, which track GHG emissions and removals within a defined organizational and operational boundary over time. Attributional accounting is the primary method, required by regulations and standards, for reporting companies' and individuals' emissions.

Location-Based Method

Our data reflects the physical reality of the grid. A location-based method considers the electricity available on grids where energy consumption occurs and does not include contracts or certificates traded.

Consumption-based Calculation

We provide grid signals (electricity mix, carbon intensity, ...) for the electricity available (or consumed) in a grid, rather than merely what was produced locally. This crucial distinction mandates accounting for electricity flows across grids, which is achieved through our flow-tracing algorithm.

II. Defining Granularity (Space and Time)

To offer actionable data, we support different spatial and temporal aggregations on top of the most granular data.

  • Spatial Granularity: Our spatial units represent a physical network that connects generators to consumers. They typically correspond to an electricity grid controlled by a single responsible operator. We aim to display the smallest subdivision of electricity grids for which reliable data is available, ensuring the highest accuracy. We also provide data aggregated at a country level.

  • Temporal Granularity: All our data can be delivered with a 5-minute, 15-minute, and hourly granularity to ensure the highest temporal fidelity and accuracy. We also provide data aggregated daily, monthly, quarterly, and yearly.

III. Ingestion: Parser System

High-quality data starts with reliable sourcing and mandatory standardization.

  • Trusted Data Acquisition: We prioritize obtaining data from the highest-quality, most credible organizations globally, including government agencies (like the EIA in the US), Transmission System Operators (TSOs like ENTSO-E in Europe), and large utility companies. Currently, we have 75 active parsers for real-time electricity mix data and 38 active parsers for exchange data.


  • Multiple time frequencies: We integrate with data sources that support different time granularities. Some parsers run with high frequency to ingest hourly or more granular data, while others run less frequently and ingest monthly or yearly data.

  • The Parser System: We use an open-source parser system to ingest raw data and transform it into a standardized format. This critical step maps disparate raw data inputs (e.g., ENTSO-E's 21 specific modes) into our fixed, harmonized set of 12 distinct production modes. This standardization ensures consistency and comparability across all global zones.

ENTSO-E example — mapping disparate raw data to our harmonized set:

  • Fossil Brown Coal / Lignite, Fossil Hard Coal, Fossil Oil Shale, Fossil Peat → Coal

  • Fossil Oil → Oil

  • Fossil Coal-derived Gas, Fossil Gas → Gas

  • Geothermal → Geothermal

  • Solar → Solar

  • Hydro run-of-river & poundage, Hydro water reservoir → Hydro

  • Hydro Pumped Storage → Hydro Storage

  • Wind Offshore, Wind Onshore → Wind

  • Biomass, Waste → Biomass

  • Energy Storage → Battery Storage

  • Nuclear → Nuclear

  • Marine, Other, Other renewable → Other

IV. Automated Quality Validation and Outlier Detection

To prevent flawed data from impacting calculations, every ingested data point undergoes immediate and rigorous quality validation.

  • Outlier Detection Pipeline: Our system automatically detects and flags outliers using an Apache Beam pipeline that runs every 15 minutes.

  • Validation Rules: Every component of every data point is subject to multiple configurable validation rules that run in parallel. Some of these checks ensure physical plausibility, such as verifying that production levels do not exceed capacity, or that expected modes are not missing. If any component fails one of the validation rules, the data point is immediately flagged as invalid.

  • Manual Correction: Recognizing that automatic validation may not catch all faults, we maintain a manual outlier detection process to flag faulty data points and subsequently re-trigger the estimation and flow-tracing pipelines.

[Diagram: the validation pipeline explodes each original_event into per-mode events (gas, wind, hydro, ...), runs the validation rules (Capacity, Expected modes, Range mode, Zero production, Range total) and a correction step in parallel, and emits 1 for a valid data point or 0 for an invalid one.]
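As a minimal illustration of the rules above, a data point could be validated like this. Rule names and thresholds are assumptions for this sketch, not Electricity Maps' actual configuration:

```python
# Minimal sketch of parallel validation rules; rule names and thresholds
# are illustrative assumptions, not the actual configuration.

def check_capacity(production, capacity):
    # Production per mode must not exceed installed capacity.
    return all(mw <= capacity.get(mode, float("inf"))
               for mode, mw in production.items())

def check_expected_modes(production, expected_modes):
    # Modes known to exist in the zone must be present in the data point.
    return expected_modes <= set(production)

def validate(production, capacity, expected_modes):
    # Rules run independently; a single failure flags the whole data point.
    results = [
        check_capacity(production, capacity),
        check_expected_modes(production, expected_modes),
    ]
    return 1 if all(results) else 0  # 1 - valid datapoint, 0 - invalid

production = {"gas": 120.0, "wind": 40.0, "hydro": 15.0}
capacity = {"gas": 200.0, "wind": 100.0, "hydro": 50.0}
flag = validate(production, capacity, {"gas", "wind", "hydro"})  # 1 (valid)
```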

V. The Flow-Tracing Algorithm: Accounting for Electricity Flows across Interconnected Grids

The electricity mix produced in a given area is often not an accurate reflection of what is actually available on the grid, primarily because electricity grids are highly interconnected. Electricity is constantly exchanged between grids through interconnectors. These imports and exports create complicated electricity flows that confound simple production-based accounting.

Our flow-tracing method addresses this fundamental difficulty. This peer-reviewed scientific approach traces electricity flows across all interconnected grids to calculate the electricity mix truly available at each location of the grid.

The methodology is based on two core principles regarding electricity behavior:

  1. Proportional Mixing: When electricity from various sources combines, the sources mix proportionally to their share of the power supplied.

  2. Irreversibility: Once mixed, the electricity cannot be unmixed to select a specific source (analogous to unmixing a smoothie).

By applying these principles, the algorithm mathematically solves the entire network's flows to precisely determine the origin of power consumed on the grid.

If you want to dive deeper into Flow-Tracing, you can read our peer-reviewed paper: Real-time carbon accounting method for the European electricity markets
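Under the two principles above, the consumed mix in each zone is the solution of a simple balance: local production plus imports, where each import carries its exporter's already-blended mix. A toy two-zone sketch using fixed-point iteration, assuming lossless interconnectors and made-up numbers (not the paper's exact formulation):

```python
# Toy flow-tracing sketch under the two principles above. Assumptions for
# illustration only: lossless interconnectors, made-up production and flow
# numbers, fixed-point iteration instead of a direct linear solve.

def trace_flows(production, imports, iterations=100):
    """Compute the consumed electricity mix per zone.

    production: {zone: {mode: MW produced locally}}
    imports:    {zone: {from_zone: MW imported}}
    Returns {zone: {mode: share of consumed electricity}}.
    """
    modes = sorted({m for p in production.values() for m in p})
    mix = {z: {m: 0.0 for m in modes} for z in production}
    for _ in range(iterations):
        new_mix = {}
        for zone in production:
            # Proportional mixing: local production plus each import,
            # the import carrying its exporter's blended mix.
            inflow = {m: production[zone].get(m, 0.0) for m in modes}
            for src, mw in imports.get(zone, {}).items():
                for m in modes:
                    inflow[m] += mw * mix[src][m]
            total = sum(inflow.values())
            new_mix[zone] = {m: inflow[m] / total for m in modes}
        mix = new_mix  # irreversibility: downstream zones see blended shares
    return mix

production = {"A": {"wind": 80.0, "gas": 20.0}, "B": {"coal": 50.0}}
imports = {"B": {"A": 30.0}}  # A exports 30 MW to B
mix = trace_flows(production, imports)
# Zone B consumes 80 MW: 50 coal + 30 imported at A's 80/20 wind/gas split,
# giving shares wind 0.3, gas 0.075, coal 0.625.
```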

VI. Precision in Emissions Factors: Direct vs. Lifecycle Emissions

To calculate Carbon Intensity (CI), we match the flow-traced electricity mix with technology-specific emission factors. We offer two primary types:

  • Life-Cycle Emission Factors: These provide a thorough "cradle-to-grave" accounting. They include emissions from building the power plant, operating it, extracting fuel, and disposal at the end of its life.

  • Direct Emission Factors (Operational): These only count the emissions released directly from the operation of the power plant (like burning fuel).

We ensure precision by employing both globally recognized standards and highly specific regional factors.

  • Global Default Factors: Electricity Maps uses the globally recognized and peer-reviewed 2014 IPCC Fifth Assessment Report emission factors as the default for most electricity grids worldwide.

  • Regional Emission Factors: To provide superior accuracy, we have developed advanced methodologies for computing regional emission factors in regions like the US and the EU where power-plant-level data is made available.


  • US: We compute grid-specific emission factors using data from the US Environmental Protection Agency's Emissions & Generation Resource Integrated Database (eGRID), which includes generation and emissions data for most power plants in the US.

  • EU: We compute grid-specific emission factors using emissions data from the European Union’s EU Emissions Trading Scheme (EU-ETS) and power-plant generation data from ENTSO-E.

This multi-faceted approach ensures our emission factors capture regional specificities like fuel type and plant efficiency, and temporal evolutions, resulting in data that is highly specific and often more up-to-date than generalized global standards.
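Once the flow-traced mix is known, the carbon intensity is the mix-weighted average of per-technology emission factors. A sketch with illustrative round factors in the spirit of IPCC AR5 lifecycle medians — not Electricity Maps' exact values:

```python
# Carbon intensity as the mix-weighted average of technology-specific
# emission factors. The factors below are illustrative round numbers in the
# spirit of IPCC AR5 lifecycle medians (gCO2eq/kWh), not Electricity Maps'
# exact values.
LIFECYCLE_EF = {"coal": 820, "gas": 490, "hydro": 24, "nuclear": 12,
                "wind": 11, "solar": 48}

def carbon_intensity(mix, factors):
    """mix: {mode: share of consumed electricity}, shares summing to 1."""
    return sum(share * factors[mode] for mode, share in mix.items())

consumed_mix = {"wind": 0.3, "gas": 0.075, "coal": 0.625}  # e.g. flow-traced
ci = carbon_intensity(consumed_mix, LIFECYCLE_EF)
# 0.3*11 + 0.075*490 + 0.625*820 = 552.55 gCO2eq/kWh
```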

How does this compare to Emission Factors used by the IEA?

We differ in a few, but noteworthy, ways:

  • IEA uses IPCC 2006 emission factors, whereas we use IPCC 2014 emission factors, in combination with more granular emissions data when available. We consider our approach more accurate.

  • Electricity Maps uses more granular data for electricity exchanges between zones and countries (hourly, compared to the IEA's yearly balance), upstream emission factors, and finer spatial and temporal granularity.

  • IEA and Electricity Maps use different methodologies for the allocation of emissions of combined heat and power (CHP) plants, and inclusion of T&D losses.

    It's important to note that while our emission factors differ in principle from the IEA's, they are generally within the same order of magnitude, and reflect similar overall trends.


[Diagram: Electricity Maps emission factors (EF) — global default factors, US regional EFs, and European regional EFs.]

VII. The Refetching Policy for Definitive Accuracy

Real-time data sources often consolidate, adjust, or finalize their initial readings over time, meaning the instant real-time value is often preliminary. To ensure that Electricity Maps provides the most accurate primary data possible, we implement an automatic refetching policy.

  • Refetch Schedule: Once per day, we refetch data covering a 48-hour period for the current day, and for one week, one month, and three months in the past.

  • Impact of Refetching: This systematic process ensures that we capture source updates. While significant changes can occur immediately, data stabilizes rapidly. For most zones, the magnitude of updates to the Renewable Energy Percentage (RE%) decreases considerably after six hours. For 50% of zones, the RE% value can be considered definitive (updates of less than 0.5 percentage points) after 72 hours. For 90% of zones, updates after 72 hours differ by less than 2 percentage points from the real-time values.

[Diagram: each daily run refetches four 48-hour windows — ending now, one week ago, one month ago, and three months ago.]
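The schedule above can be sketched as follows. Approximating a month as 30 days is an assumption for illustration, not the exact production rule:

```python
# Sketch of the refetch schedule: each daily run refetches four 48-hour
# windows ending now, one week ago, one month ago, and three months ago.
# The 30-day month is an illustrative assumption.
from datetime import datetime, timedelta

def refetch_windows(now):
    offsets = [timedelta(0), timedelta(days=7),
               timedelta(days=30), timedelta(days=90)]
    return [(now - off - timedelta(hours=48), now - off) for off in offsets]

windows = refetch_windows(datetime(2024, 1, 15))
# First window covers 2024-01-13 .. 2024-01-15.
```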

VIII. Strategic Estimation to Guarantee Global Coverage

Data gaps—due to invalid points, delays, or sparse reporting—must be filled to provide complete, continuous, granular data. We manage this through a tiered system based on data availability.

Tier A Zones

  • High granularity: measured hourly

  • Original data source

  • Gaps filled using TSA

These zones have measured hourly data available for the full electricity mix from the original source. Any potential gaps are filled using the Time Slicer Average (TSA) estimation model. TSA is efficient for immediate gap filling, as it operates without a dedicated training phase while maintaining continuity.

Tier B Zones

  • Partial granularity: measured hourly

  • Original data source

  • Missing info estimated using zone-specific est. models

These zones have partial measured hourly data available from the original source. Since the full production mix breakdown may be missing, we develop zone-specific estimation models to fill these gaps. These custom models are designed to leverage all measured hourly data available and estimate the missing parts leveraging weather parameters.

Tier C Zones

  • Limited granularity: monthly/yearly totals

  • Hourly values modelled with the General Purpose Zone Development model

  • Reconciliation with the original data source on monthly and yearly totals

These zones do not have measured hourly data available, only aggregate monthly or yearly totals. For these regions, hourly values are estimated using the General Purpose Zone Development (GPZD) model. GPZD was specifically developed to provide hourly estimated grid data by breaking down yearly or monthly production figures into plausible hourly estimates, using weather data and geographic information.

IX. Operational Excellence and Incident Management

The continuous delivery of trusted real-time data requires highly structured monitoring and incident response.

  • Observability Stack: Our alerting is supported by a robust tooling set, including Grafana, Prometheus, Big Query, and Sentry. This setup provides constant visibility into product-wide Service Level Objectives (SLOs).

  • Incident Response: We use a formal incident management playbook for responding to and resolving incidents in real time. This playbook ensures a fast, structured, and coordinated response when issues arise, supported by an on-call system (via Grafana OnCall and Slack).

  • Traceability: Every data point is stored with its full data lineage, including the original source or estimation model, ensuring complete data traceability should an auditor need to retrace results independently.

X. Data Publication and Versioning for Audit Readiness

Ensuring data traceability is a key objective. We guarantee that users can access and verify the exact data used for their calculations.

New historical datasets are typically published every January for the previous calendar year. Should major updates or data source improvements occur throughout the year, the data is updated, and these changes are fully versioned. This policy allows users to access and reference previous snapshots, ensuring complete audit readiness and traceability. We maintain a complete data lineage, tracking the value and origin (source or estimation model) of every data point over time.

Datasets

Zone name

Zone

Zone

Year

Version date

France

FR

2023

Jul 3,

2023

LATEST

France

FR

2023

Apr 3,

2023

France

FR

2023

Jan 27th,

2023

XI. Validation Against Global Data Sources

To continually reinforce the trustworthiness of our methodology, our production-based historical data is rigorously validated against highly regarded external sources.

  • Global Comparison (IEA and Ember): When comparing our production-based Renewable Energy Percentage (RE%) and Carbon-Free Energy Percentage (CFE%) data against worldwide sources like the International Energy Agency (IEA) and Ember (which do not include electricity flows), we find a strong correlation (0.99 for RE%). Across 59 countries, the median absolute difference for RE% data against both Ember and IEA remained below 3.2 percentage points for 2023. Similarly, the CFE% comparison shows consistency, with the median absolute differences remaining low.

  • Regional Comparison (Eurostat): Validation against regional authoritative sources, such as Eurostat (the statistical office of the European Union), also confirms consistency. For the 33 countries compared in 2022, the median absolute difference for RE% was 2.4 percentage points, demonstrating that our data is consistent with Eurostat's authoritative figures.

Median difference      vs. IEA     vs. EMBER
Median Abs.            2.9 pp      2.1 pp
Median                 -1.1 pp     0.3 pp
Electricity Maps data for the CFE% over 2023 is consistent with values provided by EMBER and the IEA.

XII. How Electricity Maps provides Real-Time data

In reality, even the best public sources (TSOs in Europe, for example) only provide data with a slight delay, because they report what happened in the last reporting interval. On top of that, there are technical delays in delivering the data.

So how can Electricity Maps provide real-time data for decision-making? If you’ve been looking carefully through our map, you will have noticed that the real-time view uses two labels: Preliminary, and Synthetic.

"Preliminary" is used when our models estimate the values, but they will be replaced with actual values from the source.

"Synthetic" is used when the sources are not granular enough, or updated often enough, or simply don't exist. In those cases, our models estimate the values, and they will not be replaced with actual values.


XIII. Tier A zones

We have briefly introduced our tiering system on this page, which categorizes zones into Tier A, B, and C.

Tier A zones are zones with measured hourly or sub-hourly data available. For these zones, we complement our data sources with the Time Slicer Average model to guarantee that data remains real-time and complete.

Time Slicer Average (TSA) is an estimation method for Tier A zones that fills short gaps or delays in otherwise reliable hourly production data. For every missing timestamp, TSA takes the average of available observations at the same time of day across other days within the same month, then aligns the filled values to ensure continuity with the data immediately before and after the gap. For weather-sensitive modes like solar and wind, TSA can be complemented with external or internal forecasts to improve realism.

This is the data our map labels as “Preliminary”, and our API responds with “Estimated”, along with the Time Slicer Average as the estimation method.
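Under simplifying assumptions, the core of TSA can be sketched as follows; the continuity-alignment and forecast-blending steps described above are omitted:

```python
# Minimal sketch of Time Slicer Average (TSA) gap filling: a missing hour is
# filled with the mean of values observed at the same hour of day on other
# days of the month. The real model also aligns filled values with
# neighboring observations; that step is omitted here.
from statistics import mean

def tsa_fill(series):
    """series: {(day, hour): MW or None}. Returns a fully filled copy."""
    filled = dict(series)
    for (day, hour), value in series.items():
        if value is None:
            same_slice = [v for (d, h), v in series.items()
                          if h == hour and d != day and v is not None]
            filled[(day, hour)] = mean(same_slice)
    return filled

series = {(1, 12): 100.0, (2, 12): 110.0, (3, 12): None}
filled = tsa_fill(series)  # (3, 12) -> 105.0
```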

XIV. Tier B and C zones

Tier B and C zones are zones where we get partial hourly data, and zones with no hourly data at all. Data in these zones will always be estimated to some extent, and will always be labelled as such in the App and in the API.

For Tier B zones, we leverage all hourly data available and use zone-specific models to estimate, on an hourly granularity, the data that is only available at a daily or lower granularity. These models usually leverage time and weather parameters to break down original values into hourly granularity.

For Tier C zones, we have developed a model called General Purpose Zone Development (GPZD) that estimates hourly electricity production by mode in zones where only low-frequency data exists, such as yearly or monthly aggregates. It aims for plausible, smooth hourly profiles that exactly reconcile to reported monthly or yearly totals per mode, prioritising global coverage and stability over perfect accuracy. The model is trained on zones with both hourly and yearly data to learn realistic patterns and then applied where high-frequency data is missing.

It works in two stages: first, it derives monthly production per mode either by using existing monthly data or by disaggregating yearly totals into months using monthly weather signals and capacity bounds. Second, it converts monthly to hourly using hourly weather, geographic cues like sunrise and sunset, capacity limits, and an optimization step that enforces ramping and non-negativity constraints.
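Both stages rely on the same idea: allocate a reported total proportionally to a weather-derived weight so the parts reconcile exactly. A heavily simplified sketch — the irradiance index is made up, and the capacity bounds, ramping constraints, and optimization step of the real model are omitted:

```python
# Heavily simplified sketch of the proportional allocation behind both GPZD
# stages. The irradiance index is made up; capacity bounds, ramping
# constraints, and the optimization step are omitted.

def disaggregate(total, weights):
    # Allocate `total` proportionally to non-negative weights; the parts
    # sum back to the reported total by construction.
    s = sum(weights)
    return [total * w / s for w in weights]

yearly_solar_gwh = 1200.0  # reported yearly total for one mode
monthly_irradiance = [2, 3, 5, 7, 9, 10, 10, 9, 7, 5, 3, 2]  # illustrative
monthly = disaggregate(yearly_solar_gwh, monthly_irradiance)
# The same function would then split each month into hours using an hourly
# weather signal (stage 2).
```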

This is the data our map labels as “Synthetic”, and our API responds with “Estimated”, along with the corresponding estimation method.

Forecasting

This documentation addresses the methodology ensuring the quality of our forecasts.

I. The Strategic Imperative for Grid Forecasting

As the global energy landscape rapidly electrifies and relies more heavily on highly variable low-carbon sources like solar and wind, anticipating the grid's future state is essential.

Our forecasting engine provides a comprehensive prediction for the future state of grids worldwide, typically spanning up to 72 hours. The goal is to produce accurate and actionable forecasts for all of our signals, including all carbon and pricing signals supported, enabling users to optimize consumption patterns for lower carbon emissions or lower costs.

[Chart: forecasted production per mode — biomass, solar, wind, coal.]

II. Ensuring Coherency through Flow-Tracing Predictions

Forecasting individual grid components (like solar production or net flows) is complex enough, but the greatest challenge is ensuring these thousands of individual predictions result in a physically coherent network state. Since all models are intertwined through flow tracing, changing one forecast (e.g., geothermal production in California) can affect the forecasted grid state in distant, interconnected zones.

We build a physically coherent prediction of the future state of all interconnected grids by applying our fundamental flow-tracing algorithm to each individual production and power-flow forecast (e.g., solar production, nuclear baseline, and net-flow predictions). This results in the most accurate prediction of the electricity mix and, subsequently, the future Carbon Intensity (CI) in each grid globally.

[Diagram: individual forecasts for US-CAL-CISO and US-SW-WALC are combined through flow tracing into a coherent forecast of both zones' grid states.]

III. Scalable Architecture and the General-Purpose Model

The dimensionality of the predictions we make is relatively large: we have to predict about 20 signals, across more than 200 zones, over horizons of more than 72 hours. This forces us to avoid hand-crafting models for a particular zone, signal, or horizon, as that would hinder our scalability.

Instead, we prefer to iterate on a single general-purpose model that can cope with the varying degrees of availability and robustness of the features it ingests while being robust to many error sources, including those we don’t yet know about.

Depending on the type of forecasts we want to generate, different sets of features will be most relevant. For example, features describing weather patterns are essential to forecast solar power production, while features engineered to provide useful information about the expected future make-up of the power grid are relevant to forecast net flows between regions.

These features can further be pre-processed in a multitude of fashions. Choosing to standardize them or imputing missing values can have a significant impact on the behavior of the predictions.
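For instance, two of the preprocessing choices mentioned above, missing-value imputation and standardization, could look like this. The feature values are made up, and this is not Electricity Maps' actual pipeline:

```python
# Illustrative feature preprocessing: mean imputation followed by
# standardization (zero mean, unit variance). Values are made up.
from statistics import mean, pstdev

def impute(values):
    # Replace missing values with the mean of the observed ones.
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

wind_speed = [4.0, None, 8.0, 6.0]  # hypothetical weather feature
features = standardize(impute(wind_speed))
# imputed: [4.0, 6.0, 8.0, 6.0]; mean 6.0, population stdev sqrt(2)
```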


IV. Automation, Traceability, and Version Control

To manage the complexity of thousands of interconnected predictions globally, we rely on core engineering principles: automation and guaranteed traceability.

  • Automated Lifecycle: We automate all operations within our model’s lifecycle, including training, testing, and deployment, to ensure high speed and reliability while avoiding reliance on manual intervention.

  • Guaranteed Traceability: We guarantee users access to a fully traceable release version of our engine. This means all information describing the features, preprocessors, trainers, and model class used is frozen under a release, enabling us to explain exactly where our forecasts originate.

  • Version Control Environments: We maintain three distinct environments to manage model deployment risk:

[Diagram: the three environments — Nightly, Latest, and Support — each pin versioned Features, Models, and Trainers.]

Nightly: Used for high-risk experimentation and testing new model classes, mainly for internal use.

Latest: The current production version used by commercial services.

Support: Holds a stable backup release matching the previous version of "Latest".

When a major model release occurs, a dedicated service promotes the model configurations from Nightly to Latest, and from Latest to Support in a version-controlled system.
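The promotion step amounts to shifting configurations down the chain; a sketch with hypothetical version labels:

```python
# Sketch of a major release promotion: Nightly's configuration becomes
# Latest, and the previous Latest is retained as Support. Version labels
# are hypothetical.
def promote(envs):
    promoted = dict(envs)
    promoted["Support"] = envs["Latest"]
    promoted["Latest"] = envs["Nightly"]
    return promoted

envs = {"Nightly": "v3.0-rc", "Latest": "v2.4", "Support": "v2.3"}
envs = promote(envs)
# After promotion: Latest == "v3.0-rc", Support == "v2.4"
```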

V. Monitoring and Scalable Analytics Setup

Trust in forecast data is maintained through dedicated, scalable analytics that continuously monitor model performance.

  • Scalable Analytics Setup: Our system utilizes BigQuery and Dataform to define and compute complex system metrics. Dataform is crucial as it implements software-engineering best practices (version control and testing) within our analytics engine, ensuring the metrics we report are inherently trustworthy.

  • Monitoring & Transparency: Forecast metrics and key observability data are exposed via Looker Studio dashboards, which allows internal teams to build confidence in the forecast quality and ensures that the grid forecasts team is not distracted by recurring inquiries. Furthermore, completeness metrics are scraped and integrated into tools like Prometheus and Grafana for continuous health monitoring.

[Diagram: the release configuration is exported to a BigQuery config tracker, while the operational database feeds a metrics exporter scraped by Prometheus and visualized in Grafana.]
