Data Evaluation

🚧 Coming Soon: This feature is launching soon — stay tuned.

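Before using data for forecasting or model training, it is important to understand its quality. The evaluation scores data along three dimensions: whether the data is complete and regularly sampled (integrity), whether it has learnable patterns (forecastability), and whether multiple series are related (correlation). For integrity and forecastability, higher scores indicate better quality.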

Three evaluation dimensions

Integrity

This score tells you whether data points are missing or irregularly timed.

Sensor outages, network jitter, and duplicate reports can introduce gaps, duplicates, or misalignment in time-series data. The integrity score reflects how severe these issues are.
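The three issues named above can be checked directly on a list of timestamps. The sketch below is a minimal illustration, not the product's integrity scoring. It assumes a single series with a known, fixed sampling interval, and `find_integrity_issues` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_integrity_issues(timestamps, interval):
    """Find duplicate, missing, and misaligned timestamps in one series.

    Hypothetical helper: assumes a fixed sampling `interval` and anchors the
    expected time grid at the earliest timestamp.
    """
    if not timestamps:
        return {"duplicates": [], "missing": [], "misaligned": []}
    counts = Counter(timestamps)
    start, end = min(counts), max(counts)

    # Duplicates: the same timestamp reported more than once.
    duplicates = sorted(t for t, n in counts.items() if n > 1)

    # Misaligned: timestamps that fall between grid points.
    misaligned = sorted(t for t in counts if (t - start) % interval)

    # Missing: grid points between start and end with no report at all.
    expected = set()
    t = start
    while t <= end:
        expected.add(t)
        t += interval
    missing = sorted(expected - set(counts))

    return {"duplicates": duplicates, "missing": missing, "misaligned": misaligned}


# Hourly data with a duplicate at 01:00, no report at 02:00,
# and an off-grid report at 03:30.
base = datetime(2024, 1, 1)
hour = timedelta(hours=1)
sample = [base, base + hour, base + hour, base + 3 * hour,
          base + 3 * hour + timedelta(minutes=30), base + 4 * hour]
report = find_integrity_issues(sample, hour)
for issue, times in report.items():
    print(issue, [t.strftime("%H:%M") for t in times])
```

Counting these lists relative to the number of expected grid points gives a rough sense of severity. How the product weights them into its 0–100 score is not documented here.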

| Score | Meaning | Suggestion |
|---|---|---|
| 80–100 | Continuous data with regular timestamps | Safe to use directly |
| 40–80 | Some missing points or anomalies | Clean the data first |
| 0–40 | Many missing points or severe anomalies | Investigate data sources carefully |

A low integrity score can affect forecasting and analytics: gaps can cause models to learn incorrect patterns or become biased. Consider fixing data issues before proceeding.
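A common way to fix these issues before modeling is to deduplicate, drop off-grid points, and interpolate only short gaps. The sketch below assumes a single series of `(timestamp, value)` pairs with a fixed interval. `clean_series`, its keep-the-last-report rule for duplicates, and the `max_fill` limit are illustrative choices, not the product's behavior. Long gaps are left as `None` so they can be investigated rather than silently invented.

```python
from datetime import datetime, timedelta


def clean_series(points, interval, max_fill=2):
    """Return a regular series of (timestamp, value) pairs.

    Hypothetical helper. Duplicates keep the last reported value, off-grid
    points are dropped, and interior runs of at most `max_fill` missing
    points are linearly interpolated; longer gaps stay None.
    """
    by_time = {}
    for t, v in points:
        by_time[t] = v  # a later duplicate in the input overwrites an earlier one
    if not by_time:
        return []
    start = min(by_time)
    aligned = {t: v for t, v in by_time.items() if not (t - start) % interval}
    end = max(aligned)

    # Rebuild the series on a regular grid; missing points become None.
    grid = []
    t = start
    while t <= end:
        grid.append([t, aligned.get(t)])
        t += interval

    # Fill short interior gaps by linear interpolation between neighbors.
    i = 0
    while i < len(grid):
        if grid[i][1] is not None:
            i += 1
            continue
        j = i
        while j < len(grid) and grid[j][1] is None:
            j += 1
        gap = j - i
        if 0 < i and j < len(grid) and gap <= max_fill:
            left, right = grid[i - 1][1], grid[j][1]
            for k in range(i, j):
                frac = (k - i + 1) / (gap + 1)
                grid[k][1] = left + (right - left) * frac
        i = j
    return [(t, v) for t, v in grid]


base = datetime(2024, 1, 1)
hour = timedelta(hours=1)
raw = [(base, 0.0), (base + hour, 10.0), (base + hour, 12.0),
       (base + 3 * hour, 30.0), (base + 3 * hour + timedelta(minutes=30), 99.0),
       (base + 4 * hour, 40.0)]
for t, v in clean_series(raw, hour):
    print(t.strftime("%H:%M"), v)
```

Here the duplicate at 01:00 keeps 12.0, the off-grid 03:30 report is dropped, and the missing 02:00 point is interpolated to 21.0.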

Forecastability

This score tells you whether the series has learnable patterns.

Some series are naturally regular (e.g., hourly load with a daily cycle), while others are closer to random fluctuation. The forecastability score reflects how strong the pattern is.

| Score | Meaning | Suggestion |
|---|---|---|
| 50–100 | Strong patterns; easier to forecast | Good candidate for modeling |
| 30–50 | Some patterns with noticeable volatility | Try modeling, but treat results as a rough reference |
| 0–30 | Weak patterns, close to random noise | Forecasts may be poor; review the data or strategy |

A low forecastability score does not necessarily mean the data is wrong; the series may simply be highly volatile and hard to predict. Use domain knowledge to decide whether to add covariates or adjust expectations.
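The exact formula behind the forecastability score is not documented on this page. As a rough, independent check of whether a series has a repeating pattern, you can compute its autocorrelation at the expected seasonal lag (for example, 24 for hourly data with a daily cycle). The sketch below uses the standard sample autocorrelation estimator; `autocorrelation` is a hypothetical helper, and its output is not on the same scale as the product's 0–100 score.

```python
import math
import random


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at `lag` (standard biased estimator).

    Values near 1 mean the series repeats itself every `lag` steps; values
    near 0 mean no linear relationship at that lag.
    """
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    total_var = sum((x - mean) ** 2 for x in values)
    if total_var == 0:
        return 0.0  # undefined for a constant series; return 0.0 by convention
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / total_var


# Two weeks of hourly data: a clean daily cycle versus pure noise.
daily_cycle = [math.sin(2 * math.pi * h / 24) for h in range(24 * 14)]
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(24 * 14)]

print(round(autocorrelation(daily_cycle, 24), 2))  # strong daily pattern
print(round(autocorrelation(noise, 24), 2))        # close to 0
```

A clear peak at the seasonal lag suggests a learnable pattern. Autocorrelation that stays near zero across lags is consistent with the near-random behavior described for low scores.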

Correlation

This score tells you whether multiple series are related.

When you collect multiple signals (e.g., temperature, humidity, pressure), correlation helps you see which series move together and which are independent. This is useful for multivariate forecasting and feature selection.

If two series are highly correlated (score close to 100), they likely carry similar information, so keeping only one may be enough to reduce redundancy. If a series has low correlation with the target series, it may contribute little to the forecast and can be removed.

Quick Reference

| Metric | Key question | What a low score indicates |
|---|---|---|
| Integrity | Are timestamps continuous? Any duplicates or gaps? | Missing data, duplicates, or time-ordering issues |
| Forecastability | Does the series have cyclic or trend patterns? | Near-random series; hard to extrapolate from history |
| Correlation | Are multiple series linearly related? | Little linear relationship between series; limited value as covariates |