Szilveszter Tóth
Data Scientist/Engineer
I enjoy working with machine learning and AI algorithms to develop solutions that automate processes and improve efficiency. I focus on providing practical, profitable solutions and simplifying tasks through IT. With experience in handling and processing large datasets, I aim to extract valuable insights and use data more effectively to support business goals.
Skills
Data Engineer
Szallas Group Zrt., Budapest, Hungary
2025 MAY - PRESENT
As a Data Engineer, I built and automated data platforms on Google Cloud (GCP). I specialized in designing scalable BigQuery data warehouses using Terraform for infrastructure and dbt for data modeling. I focused on improving team efficiency by automating repetitive manual tasks, most notably by developing a RAG-based tool that utilized Large Language Models to handle routine SQL data requests. My work centered on creating reliable, automated pipelines that transformed raw data into high-quality, governed datasets.
Projects
- Developed a RAG-based automation tool for Jira, utilizing Large Language Models to resolve frequent SQL data requests autonomously.
- Enhanced LLM accuracy by injecting DDL (Data Definition Language) schemas into the RAG context, ensuring syntactically correct query generation.
- Streamlined ticket operations by automating the end-to-end Jira workflow, including ticket ingestion, processing, and resolution uploading.
- Architected and deployed a scalable DWH on BigQuery, utilizing Terraform for 100% Infrastructure-as-Code (IaC) management.
- Engineered high-fidelity ingestion pipelines using Sequin for Change Data Capture (CDC) and Airbyte for snapshot-based synchronization from PostgreSQL.
- Orchestrated containerized ETL workloads on a self-managed GKE cluster, leveraging Argo Workflows for complex dependency management.
- Optimized transformation layers using dbt to implement modular Data Marts and Star Schema architectures.
- Automated CI/CD lifecycles via GitLab and Artifact Registry to streamline Docker image deployments and dbt model updates.
- Established Data Governance frameworks by integrating Google Cloud Dataplex with automated Metadata tagging via Terraform.
- Designed ingestion workflows with Cloud Run Jobs and Cloud Scheduler.
- Automated ETL deployments using GitLab CI/CD and Artifact Registry.
- Provisioned and managed infrastructure entirely with Terraform.
- Built dbt-based data pipelines with dependency-aware models on BigQuery.
Tech Stack
Programming Languages
Tools
DevOps
Cloud
Data Scientist
Zenitech Ltd., Budapest, Hungary
2019 FEB - 2025 APR
In my role as a Data Scientist, I leveraged advanced analytics and machine learning to extract valuable insights from complex datasets. I'm passionate about building predictive models and creating data-driven solutions to challenging business problems for a variety of clients.
Projects
- Selected DW3000-based UWB modules suited for subway environments.
- Implemented SS-TWR algorithm on embedded devices to measure distances accurately.
- Applied least squares multilateration to compute initiator's 2D position.
- Built a live display interface to show anchor and initiator positions on a subway map.
- Validated system performance in conditions with multipath and partial line-of-sight.
- Developed and optimized MLOps tools to accelerate the deployment of data scientists' models to production environments.
- Implemented a remote execution framework for efficient GPU utilization on Google Cloud, enabling remote invocation of functions.
- Authored Requests for Comments (RFCs) to clarify the architecture of tools for customers, enhancing their understanding.
- Designed and implemented a REST-based machine learning service framework to streamline the deployment of models created by data scientists.
- Created a machine learning service abstraction layer using Open Inference Protocol v2, integrated OpenTelemetry for observability, and developed an intuitive interface for service creation.
- Automated deployment of machine learning services to Google Kubernetes Engine (GKE) using GitHub Actions for Continuous Integration (CI).
- Built metrics dashboards with PromQL to monitor and visualize service performance.
- Integrated machine learning services with Triton Inference Server to support efficient model serving.
- Making human resource use more efficient by developing a support agent built on OpenAI APIs and the company's internal knowledge base
- Getting a deeper understanding of Large Language Models
- Building a framework for using OpenAI ChatGPT
- Building an efficient in-context learning solution based on internal knowledge, incorporating prompt engineering best practices
- Implementing a chat front-end application using the Svelte framework
- Applying data exploration, cleaning, and anonymization techniques (especially for PII data), preparing an efficient form of support ticket data for fine-tuning LLMs
- Implementing a custom data extractor for Zendesk
- Implementing an ETL and machine learning pipeline in Databricks
- Deploying the Svelte front-end and FastAPI-powered backend to Azure using GitHub Actions CI
- Applying the CRISP-DM methodology to the project
- Analyzing and understanding the business domain, exploring potentially important datasets
- Performing in-depth exploratory data analysis (EDA) on the relevant datasets with Python, Jupyter Notebook, and scikit-learn
- Creating connections between different data marts (SAP, Excel, Worx)
- Loading data (ETL) into a Hadoop Impala-based data warehouse
- Creating Power BI reports useful to the business side as a by-product
- Applying feature engineering based on the in-depth data exploration
- Modeling with ARIMA, SARIMAX, Random Forest, NN (neural network), Exponential Smoothing algorithms, and Fourier Transform algorithms
- Carrying out a partial installation on the business side (data updates are supported by developers)
- Successfully addressed the challenge of rapid company growth using Power BI, providing high-level views of the organization's current and past states, enabling management to make informed decisions for planning, forecasting, and workflow improvement.
- Collaborated with management to define and prioritize KPIs, emphasizing continuous and direct communication throughout the reporting project, with real-time dashboards powered by Power BI.
- Developed detailed visualizations using Power BI, including monthly, site-specific, seniority-based, and billability breakdowns, to present complex calculations and measurements for the board, uncovering utilization issues and anomalies.
- Identified and resolved company-wide issues related to projectless free capacity using Power BI, leading to a more efficient administration flow and eliminating redundant processes.
- Developed monthly operational reports for delivery managers and people leads using Power BI, simplifying administration processes and improving information accuracy.
- Advocated for continuous improvement by eliminating redundant administration and unsynchronized data sources, ensuring that Power BI reports became a reliable and indispensable tool for management decision-making and planning.
- Emphasized the importance of communicating with stakeholders, providing diverse data representations, and deep domain understanding for effective decision support.
- Defined and implemented key performance indicators (KPIs), with a focus on Billable Utilization, leading to optimized layoff strategies and improved resource allocations, all powered by the capabilities of Power BI.
- Combining the different sales pipeline stages in Power BI reports to give a much better overall picture of the company's sales pipeline and thereby ease decision-making
- Getting familiar with trading basics
- Getting familiar with applicable algorithmic trading frameworks: Freqtrade, Backtrader, Zipline
- Developing an efficient ETL pipeline for the source data
- Implementing customized trading strategies based on CO2 quota data exploration
- Injecting ML algorithms into specific strategies
- Exploring, comparing, and using similar but much larger datasets (like cryptos)
- Efficiently handling high-frequency quote and trade datasets
- Building efficient model evaluation using the MLflow framework
- Implementing sophisticated neural networks, based on publications, in the PyTorch framework
- Developing various risk-reward strategies using the model output and other tools like stop loss/take profit or stake setups
- Implementing an automated continuous retraining solution using Azure technologies and Azure ML Studio
- Implementing a decoupled, multi-container, WebSocket-based solution for processing level 1 order book data, running the neural network inference, and applying the results to the trading strategy
- Implementing a Svelte-based web app for the traders
- Getting familiar with the football world, especially the factors that match attendance depends on
- Exploratory data analysis on Fradi attendance data
- Using a simple feature selection mechanism with Boruta and random forest feature importances
- Building and training models (such as random forest or XGBoost) to predict the importance of upcoming matches, based on our exploratory data analysis
- Building an easy-to-integrate solution with Docker Compose and exposing the service through a REST interface (documented with Swagger)
- Implementing an efficient solution for face detection and recognition using dlib
- Developing warning signals based on the results of the face recognition service
- Building an easy-to-integrate architecture with Docker, RabbitMQ, and Flask
- Getting to know the basics of RabbitMQ and the AMQP protocol
- Implementing event-driven endpoints using the AMQP protocol
- Determining the most applicable deep learning-based object detector networks
- Gaining deeper knowledge of the YOLOv4 network and its training
- Implementing a basic alert system based on a simple image annotator tool
- Getting to know and applying deep learning-based anomaly detectors
- Integrating simple tracker algorithms into the processing
- Implementing an efficient algorithm for comparing multiple flight tracks (for detecting object differences)
- Gaining deeper knowledge of Tesseract OCR and implementing its training and testing procedures
- Implementing efficient card detection by applying the SIFT detector and its keypoints
- Participating in the development of the image processing pipeline and testing environment
- Developing pre- and post-processing algorithms to significantly increase accuracy and efficiency
- Implementing an effective DevOps pipeline for fast development and delivery using Docker and GitLab CI
- Performing in-depth exploratory data analysis (EDA) on the relevant datasets with Python, Jupyter Notebook, and scikit-learn
- Analyzing and understanding the business and industrial domain in depth
- Collecting, analyzing, and applying household datasets from several foreign countries
- Implementing an efficient algorithm for constructing the dataset most similar to Hungarian conditions
- Developing a simple modeling framework
- Modeling with ARIMA, SARIMAX, Random Forest, NN (neural network), Exponential Smoothing algorithms
- Applying feature engineering on the datasets
- Participating in the implementation of a data warehouse for reporting
- Creating data marts for specific reports and report groups
- Implementing and optimizing efficient data extraction, transformation, and loading (ETL) from source systems
- Optimizing SQL queries, inspecting execution plans, applying appropriate indexes
- Designing a star schema
- Participating in .NET backend development
- Optimizing the performance of the electronic delivery system's report module
Tech Stack
Programming Languages
Tools
DevOps
Cloud
Machine Learning Engineer
Idaso Ltd., Ireland
2016 FEB - 2018 DEC
Gained foundational knowledge in Deep Learning, specifically focusing on convolutional neural networks. Responsible for training and examining neural network-based object detectors and classifiers on both public and proprietary datasets.
Projects
- Examined and applied SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once), and Faster R-CNN object detectors for vehicle detection.
- Utilized AlexNet, Inception, and ResNet classification networks to categorize vehicles.
- Studied and implemented SORT (Simple Online and Realtime Tracking) and Deep SORT object trackers for vehicle tracking.
- Created an end-to-end automated vehicle counting process using a Serverless architecture in Microsoft Azure.
- Developed automatic processing solutions using Azure Batch Service.
- Ran and optimized the SSD detector and Deep SORT tracker components on the NVIDIA Jetson TX2 module for real-time vehicle counting.
- Collected and generated training data for object detection and classification using a modified Grand Theft Auto (GTA) V game and RenderDoc graphics debugger.
Tech Stack
Programming Languages
Tools
Cloud
Software Engineer Master's Degree
Budapest University of Technology and Economics
2017 – 2019
Development of a general-purpose framework to support performing industrial processes using mixed reality techniques
Software Engineer Bachelor's Degree
Budapest University of Technology and Economics
2013 – 2017
Computer vision-based vehicle detection and tracking
Let's Get in Touch
I'm always open to discussing new projects, creative ideas, or opportunities to be part of an amazing team.