EVENFLOW Scalability Toolkit

Scalable AI infrastructure for high-volume data environments

What is it about?

The Scalability Toolkit is a set of four open-source components designed to improve the scalability and efficiency of Machine Learning (ML) workflows. Built for distributed, high-volume environments, it streamlines both training and prediction. Its four integrated components are:

  • Synopses-based Training Optimisation, which accelerates ML training using data summaries and Bayesian optimisation.
  • Synopses Data Engine as a Service (SDEaaS), which provides efficient stream processing on Apache Flink and Dask for real-time data summarisation.
  • Advanced Distributed-Parallel Training, which reduces training time and communication lag via data-driven synchronisation.
  • SuBiTO, which integrates all components into one solution for production-grade ML pipelines.
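To illustrate the core idea behind synopses-based training, here is a minimal, generic sketch (not the toolkit's actual API): a reservoir sample is one of the simplest data synopses, keeping a fixed-size uniform sample of an unbounded stream so that a model can train on far fewer points than the full data. The function name and sizes below are illustrative assumptions.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a fixed-size uniform sample (a simple 'synopsis') of a stream.

    Algorithm R: the first k items fill the reservoir; item i (0-based,
    i >= k) then replaces a uniformly chosen slot with probability k/(i+1).
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Draw a slot in [0, i]; hit the reservoir with prob. k/(i+1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

# A model trained on the synopsis processes 1,000 points instead of
# 1,000,000 -- the source of the training-time reduction.
stream = range(1_000_000)
synopsis = reservoir_sample(stream, k=1_000)
print(len(synopsis))  # 1000
```

The toolkit goes well beyond this sketch (richer synopsis types, Bayesian optimisation of the training configuration, and distributed execution on Flink/Dask), but the trade-off it exploits is the same: a compact summary stands in for the full stream.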

Who is it for?

  • Data Scientists and ML Engineers who handle large datasets (or even continuous data streams)
  • Enterprises that deploy AI models at scale in production environments
  • Research labs that work on AI model efficiency and scalability
  • DevOps teams that want to reduce infrastructure load during AI training

Why use it?

  • Reduces training times without sacrificing model accuracy
  • Improves resource efficiency and reduces CPU usage
  • Compatible with widely used platforms (Apache Flink, Kafka)
  • Modular design: use one component or all of them, depending on your needs

How to access the tool?

The toolkit is fully open-source and publicly available:

🔗 EVENFLOW Scalability Toolkit on GitHub

Each component is accompanied by:

  • Implementation documentation
  • APIs and usage examples
  • Instructions for integrating with real-time data systems

Who is involved?

Developed and maintained by Athena Research Centre (ARC) within the EVENFLOW project framework.

Related publications:

And Synopses for All: a Synopses Data Engine for Extreme Scale Analytics-as-a-Service
Antonios Kontaxakis, Nikos Giatrakos, Dimitris Sacharidis, Antonios Deligiannakis
Information Systems, Volume 116, Article 102221, June 2023.

Data-driven Synchronization Protocols for Data-parallel Neural Learning over Streaming Data
George Klioumis, Nikos Giatrakos
In Proceedings of the 2024 IEEE International Conference on Big Data (IEEE BigData’24)
Washington DC, USA, December 2024.

SuBiTO: Synopsis-based Training Optimization for Continuous Real-Time Neural Learning over Big Streaming Data (Demo Paper – SuBiTO Website)
Errikos Streviniotis, George Klioumis, Nikos Giatrakos
In Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI’25)
Philadelphia, Pennsylvania, USA, March 2025.

NeuroFlinkCEP: Neurosymbolic Complex Event Recognition Optimized across IoT Platforms (Demo Paper, to appear)
Ourania Ntouni, Dimitrios Banelas, Nikos Giatrakos
In Proceedings of the 51st International Conference on Very Large Data Bases (VLDB’25)
London, United Kingdom, September 2025.
