EVENFLOW Scalability Toolkit
Scalable AI infrastructure for high-volume data environments
What is it about?
The Scalability Toolkit is a set of four open-source components designed to enhance the scalability and efficiency of Machine Learning (ML) workflows. Built for distributed, high-volume environments, it streamlines both training and prediction. Its four integrated components are:
- Synopses-based Training Optimisation, which accelerates ML training using data summaries and Bayesian optimisation (see the first sketch after this list).
- Synopses Data Engine as a Service (SDEaaS), which provides efficient stream processing on Apache Flink and Dask for real-time data summarisation.
- Advanced Distributed-Parallel Training, which reduces training time and communication lag via data-driven synchronisation (see the second sketch after this list).
- Subito, which integrates all components into one solution for production-grade ML pipelines.
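To make the synopsis idea concrete, here is a minimal, self-contained sketch in Python. It is not the toolkit's API: the reservoir_sample function, the synthetic row_stream data, and the plain least-squares model are illustrative stand-ins, and the Bayesian optimisation layer the toolkit adds on top for hyperparameter tuning is omitted for brevity.

```python
import numpy as np

def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace an existing item with probability k / (i + 1).
            j = rng.integers(0, i + 1)
            if j < k:
                reservoir[j] = item
    return np.array(reservoir)

rng = np.random.default_rng(0)

# Synthetic "large" dataset: rows (x1, x2, y) with y = 3*x1 - 2*x2 + noise.
def row_stream(n):
    for _ in range(n):
        x = rng.normal(size=2)
        y = 3.0 * x[0] - 2.0 * x[1] + rng.normal(scale=0.1)
        yield np.array([x[0], x[1], y])

# Build a small synopsis of a 200k-row stream, then fit on the synopsis only.
synopsis = reservoir_sample(row_stream(200_000), k=2_000, rng=rng)
X, y = synopsis[:, :2], synopsis[:, 2]

# Ordinary least squares on the synopsis approximates the full-data fit.
coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
print("coefficients ~", coef)  # close to [3, -2, 0]
```

Reservoir sampling is just one kind of synopsis; its appeal here is that it runs in a single pass with bounded memory, so the training set never has to fit in RAM at once.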
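Similarly, one common realisation of data-driven synchronisation is to let workers take local steps and exchange parameters only when their models drift apart, rather than after every step. The sketch below simulates this with NumPy; the two-worker setup, the grad helper, the drift threshold, and the averaging scheme are illustrative assumptions, not the toolkit's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([3.0, -2.0])

def grad(w, n=32):
    """Stochastic least-squares gradient on a fresh mini-batch."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return 2.0 / n * X.T @ (X @ w - y)

workers = [np.zeros(2), np.zeros(2)]
lr, drift_threshold, syncs = 0.05, 0.05, 0

for step in range(200):
    # Each worker takes a local SGD step on its own data.
    for i in range(len(workers)):
        workers[i] = workers[i] - lr * grad(workers[i])
    # Synchronise (average) only when the models have drifted apart,
    # instead of communicating after every step.
    if np.linalg.norm(workers[0] - workers[1]) > drift_threshold:
        mean = (workers[0] + workers[1]) / 2.0
        workers = [mean.copy(), mean.copy()]
        syncs += 1

print(f"final model ~ {workers[0]}, synchronisations: {syncs}/200")
```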
Who is it for?
- Data Scientists and ML Engineers who handle large datasets (or even continuous data streams)
- Enterprises that deploy AI models at scale in production environments
- Research labs that work on AI model efficiency and scalability
- DevOps teams that want to reduce infrastructure load during AI training
Why use it?
- Reduces training times without sacrificing model accuracy
- Improves resource efficiency and CPU utilisation
- Compatible with widely used platforms (Apache Flink, Kafka)
- The modular design lets you use one component or all of them, depending on your needs
How to access the tool?
The toolkit is fully open-source and publicly available:
🔗 EVENFLOW Scalability Toolkit on GitHub
Each component is accompanied by:
- Implementation documentation
- APIs and usage examples
- Instructions for integrating with real-time data systems (an illustrative example follows this list)
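As a flavour of what such an integration can look like, the sketch below consumes events from Kafka with the kafka-python client and maintains a count-min sketch, a fixed-memory frequency synopsis, over the stream. The topic name, broker address, and the CountMinSketch class are assumptions for illustration; consult each component's own documentation for the actual integration points.

```python
import hashlib

import numpy as np
from kafka import KafkaConsumer  # pip install kafka-python


class CountMinSketch:
    """Fixed-size frequency synopsis: answers "how often did this key appear?"
    approximately, in O(width * depth) memory regardless of stream length."""

    def __init__(self, width=2048, depth=5):
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width), dtype=np.int64)

    def _indexes(self, key: bytes):
        for row in range(self.depth):
            # One independent hash per row, derived via a per-row salt.
            digest = hashlib.blake2b(key, salt=row.to_bytes(8, "little")).digest()
            yield row, int.from_bytes(digest[:8], "little") % self.width

    def add(self, key: bytes):
        for row, col in self._indexes(key):
            self.table[row, col] += 1

    def estimate(self, key: bytes) -> int:
        return min(self.table[row, col] for row, col in self._indexes(key))


# Illustrative topic and broker; replace with your deployment's settings.
consumer = KafkaConsumer("events", bootstrap_servers="localhost:9092")
sketch = CountMinSketch()

for message in consumer:
    sketch.add(message.value)  # summarise the stream as it arrives
    # Queries stay cheap no matter how much data has flowed past, e.g.:
    # sketch.estimate(b"some-event-key")
```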
Who is involved?
Developed and maintained by Athena Research Centre (ARC) within the EVENFLOW project framework.