Issue has been resolved.
The root cause of the issues was a failure in one of our services relying on Elasticache mainly for auth caching - this resulted in failures for expired user tokens to load screens/login/connect external data sources/access API's (all of which rely on the authentication mechanism)
The main data pipeline is decoupled from the auth service and was not affected. The issue was resolves after ±15min and monitored for the next hour before closing and notifying on full resolved status.
The team is working on making the required changes to prevent such issues from re-occurring in the future.
Jun 14, 14:03 IDT
Some services are still experiencing sporadic timeouts - the team is working to stabilize the problematic component.
Jun 14, 13:16 IDT
The system is now functional - we are monitoring the incident.
Jun 14, 12:55 IDT
The team is currently investigating an issue with the query service - Log searching is unavailable.
Alerts/LiveTail/Archive queries/Data ingestion are all functional - we will continue to update.
Jun 14, 12:49 IDT