EVENFLOW Scalability Toolkit

Scalable AI infrastructure for high-volume data environments

What is it about?

The Scalability Toolkit is a set of open-source components that enhance the scalability and efficiency of Machine Learning (ML) workflows. Built for distributed, high-volume environments, it streamlines both training and prediction. Its four integrated components are:

  • Synopses-based Training Optimisation, which accelerates ML training by pairing compact data summaries (synopses) with Bayesian optimisation; a conceptual sketch of the synopsis idea follows this list.
  • Synopses Data Engine as a Service (SDEaaS), which provides efficient stream processing on Apache Flink and Dask for real-time data summarisation.
  • Advanced Distributed-Parallel Training, which reduces training time and communication overhead via data-driven synchronisation; see the second sketch after this list.
  • Subito, which integrates all components into one solution for production-grade ML pipelines.
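
To make the synopsis idea concrete, here is a minimal, toolkit-agnostic Python sketch: it maintains a fixed-size reservoir sample of a data stream and then fits a model on that summary instead of the raw data. Everything in it (the function names, the sample size, the least-squares fit standing in for ML training) is illustrative and assumes nothing about the toolkit's actual API.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of size k from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # The new item replaces a stored one with probability k / (i + 1),
            # which keeps the sample uniform over everything seen so far.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Synthetic (feature, label) stream following y = 2x + 1: one million points,
# summarised down to a 1,000-point synopsis.
stream = ((float(x), 2.0 * x + 1.0) for x in range(1_000_000))
synopsis = reservoir_sample(stream, k=1_000)

# "Train" on the synopsis: closed-form least squares stands in for a real ML step.
xs = [p[0] for p in synopsis]
ys = [p[1] for p in synopsis]
n = len(synopsis)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(f"fit on 0.1% of the data: y = {slope:.3f}x + {my - slope * mx:.3f}")  # close to 2x + 1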
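A second minimal sketch illustrates the synchronisation idea behind the distributed-parallel training component: workers train locally and only exchange parameters once a drift threshold is crossed, so communication scales with how much the models actually change rather than with the number of steps. The threshold, the local update rule, and the averaging scheme are all assumptions made for illustration, not the toolkit's protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, steps = 4, 8, 200
drift_threshold = 0.5  # sync only when local drift exceeds this (assumed value)

global_model = np.zeros(dim)
local = [global_model.copy() for _ in range(n_workers)]
syncs = 0

for step in range(steps):
    for w in range(n_workers):
        local[w] += 0.05 * rng.normal(size=dim)  # stand-in for a local gradient step
    # Each worker measures its drift from the last synchronised model; a full
    # parameter average happens only when some worker exceeds the threshold.
    if max(np.linalg.norm(m - global_model) for m in local) > drift_threshold:
        global_model = np.mean(local, axis=0)
        local = [global_model.copy() for _ in range(n_workers)]
        syncs += 1

print(f"{syncs} synchronisations in {steps} steps (vs. {steps} with per-step averaging)")
```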

Who is it for?

  • Data Scientists and ML Engineers who handle large datasets or continuous data streams
  • Enterprises that deploy AI models at scale in production environments
  • Research labs that work on AI model efficiency and scalability
  • DevOps teams that want to reduce infrastructure load during AI training

Why use it?

  • Reduces training times without sacrificing model accuracy
  • Improves resource efficiency and CPU utilisation
  • Compatible with widely used platforms such as Apache Flink and Apache Kafka
  • Modular design: use a single component or the full toolkit, depending on your needs

How to access the tool?

The toolkit is fully open-source and publicly available:

🔗 EVENFLOW Scalability Toolkit on GitHub

Each component is accompanied by:

  • Implementation documentation
  • API references and usage examples
  • Instructions for integrating with real-time data systems

Who is involved?

Developed and maintained by Athena Research Centre (ARC) as part of the EVENFLOW project.

