Data Evaluation

🚧 Coming Soon: This feature is launching soon — stay tuned.

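Before you use data for forecasting or model training, it is important to understand its quality. The evaluation scores the data on three dimensions: integrity (is the data complete, with regular timestamps?), forecastability (does the series have learnable patterns?), and correlation (are multiple series related to one another?). Higher scores generally indicate better quality.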

Three evaluation dimensions

Integrity

This score tells you whether data points are missing, duplicated, or irregularly timestamped.

Sensor outages, network jitter, and duplicate reports can introduce gaps, duplicates, or misalignment in time-series data. The integrity score reflects how severe these issues are.

Score	Meaning	Suggestion
80–100	Continuous data with regular timestamps	Safe to use directly
40–80	Some missing points or anomalies	Clean the data first
0–40	Many missing points or severe anomalies	Investigate data sources carefully

If the integrity score is low, forecasting and analytics may suffer: gaps and duplicates can lead models to learn incorrect patterns or become biased. Consider fixing the data issues before proceeding.
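The kinds of problems the integrity score reflects can be counted with a short diagnostic. The sketch below is illustrative only and is not the product's scoring formula, which this page does not describe. The function name `timestamp_issues` and its `step` parameter are hypothetical, and the sketch assumes the series is meant to be sampled at a fixed interval.

```python
from datetime import datetime, timedelta


def timestamp_issues(timestamps, step):
    """Count duplicate, out-of-order, and missing timestamps.

    Hypothetical helper: assumes a fixed sampling interval `step`.
    """
    seen = set()
    duplicates = 0
    out_of_order = 0
    previous = None
    for ts in timestamps:
        if ts in seen:
            duplicates += 1  # same timestamp reported more than once
        seen.add(ts)
        if previous is not None and ts < previous:
            out_of_order += 1  # arrived earlier than the preceding point
        previous = ts

    unique = sorted(seen)
    if not unique:
        return {"duplicates": 0, "out_of_order": 0, "missing": 0}

    # Points a regular grid would contain between the first and last timestamp.
    expected = round((unique[-1] - unique[0]) / step) + 1
    missing = max(expected - len(unique), 0)
    return {"duplicates": duplicates, "out_of_order": out_of_order, "missing": missing}


# Hourly series with one duplicate, one late arrival, and two missing hours.
start = datetime(2024, 1, 1)
hour = timedelta(hours=1)
stamps = [start + i * hour for i in (0, 1, 1, 3, 2, 6)]
report = timestamp_issues(stamps, hour)
print(report)
```

The missing count is only approximate when timestamps do not fall on the regular grid, so treat it as a starting point for investigation rather than an exact figure.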

Forecastability

This score tells you whether the series has learnable patterns.

Some series are naturally regular (e.g., hourly load with daily cycles), while others behave closer to random fluctuations. The forecastability score reflects how strong the pattern is.

Score	Meaning	Suggestion
50–100	Strong patterns; easier to forecast	Good candidate for modeling
30–50	Some patterns with noticeable volatility	Try modeling; treat results as reference
0–30	Weak patterns, close to random noise	Forecasting may be poor; review data or strategy

A low forecastability score does not necessarily mean the data is wrong; the series may simply be highly volatile and hard to predict. Use domain knowledge to decide whether to add covariates or adjust expectations.
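One common way to see the difference between a regular series and a near-random one is autocorrelation at the expected cycle length. The sketch below uses that as a stand-in; it is an assumption for illustration, not the method behind the forecastability score, which this page does not specify. The function name `autocorrelation` and the synthetic data are hypothetical.

```python
import math
import random


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given positive lag."""
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        return 0.0  # a constant series has no pattern to measure
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Ten days of synthetic hourly data with a clean daily cycle.
cyclic = [math.sin(2 * math.pi * t / 24) for t in range(240)]

# The same number of points drawn as independent Gaussian noise.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(240)]

cyclic_acf = autocorrelation(cyclic, 24)
noise_acf = autocorrelation(noise, 24)
print(f"daily-cycle series: {cyclic_acf:.2f}")
print(f"noise series:       {noise_acf:.2f}")
```

A value near 1 at the cycle length suggests a strong repeating pattern, while a value near 0 suggests there is little structure for a model to learn at that lag.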

Correlation

This score tells you whether multiple series are related.

When you collect multiple signals (e.g., temperature, humidity, pressure), correlation helps you see which series move together and which are independent. This is useful for multivariate forecasting and feature selection.

If two series are highly correlated (score close to 100), they likely carry similar information, so keeping just one may be enough. If other series have low correlation with the target series, they may contribute little to the forecast and are candidates for removal.
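The redundancy check described above can be sketched with the Pearson correlation coefficient. This is a minimal illustration under stated assumptions: the page does not say how the 0 to 100 correlation score is derived, so the sketch works directly with the coefficient (between -1 and 1). The helper names, the 0.95 threshold, and the sample readings are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (norm_x * norm_y)


def redundant_pairs(series, threshold=0.95):
    """Return name pairs whose absolute correlation meets the threshold.

    The threshold is an illustrative choice, not a product default.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(series), 2)
        if abs(pearson(series[a], series[b])) >= threshold
    ]


# Made-up readings: the Fahrenheit series is a linear rescaling of Celsius.
temperature_c = [20, 21, 23, 22, 25, 24]
signals = {
    "temperature_c": temperature_c,
    "temperature_f": [t * 9 / 5 + 32 for t in temperature_c],
    "humidity": [60, 45, 70, 50, 65, 40],
}
pairs = redundant_pairs(signals)
print(pairs)
```

Here the two temperature series are flagged as a redundant pair, so one of them could be dropped. The same `pearson` function can be applied between each candidate covariate and the target series to spot covariates with little linear relationship to it.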

Quick Reference

Metric	Key Question	Low Score Indicates
Integrity	Are timestamps continuous? Any duplicates or gaps?	Missing data, duplicates, or time-ordering issues
Forecastability	Does the series have cyclic or trend patterns?	Near-random series, hard to extrapolate from history
Correlation	Are multiple series linearly related?	Little linear relationship between series; covariate value is limited