Szilveszter Tóth

Data Scientist/Engineer

I enjoy working with machine learning and AI algorithms to develop solutions that automate processes and improve efficiency. I focus on providing practical, profitable solutions and simplifying tasks through IT. With experience in handling and processing large datasets, I aim to extract valuable insights and use data more effectively to support business goals.


Work Experience
My Professional Journey.

Data Engineer

Szallas Group Zrt., Budapest, Hungary

2025 MAY - PRESENT

As a Data Engineer, I built and automated data platforms on Google Cloud (GCP). I specialized in designing scalable BigQuery data warehouses using Terraform for infrastructure and dbt for data modeling. I focused on improving team efficiency by automating repetitive manual tasks, most notably by developing a RAG-based tool that used Large Language Models to handle routine SQL data requests. My work centered on creating reliable, automated pipelines that transformed raw data into high-quality, governed datasets.

Projects

  • Developed a RAG-based automation tool for Jira, utilizing Large Language Models to resolve frequent SQL data requests autonomously.
  • Enhanced LLM accuracy by injecting DDL (Data Definition Language) schemas into the RAG context, ensuring syntactically correct query generation.
  • Streamlined ticket operations by automating the end-to-end Jira workflow, including ticket ingestion, processing, and resolution uploading.
  • Architected and deployed a scalable DWH on BigQuery, utilizing Terraform for 100% Infrastructure-as-Code (IaC) management.
  • Engineered high-fidelity ingestion pipelines using Sequin for Change Data Capture (CDC) and Airbyte for snapshot-based synchronization from PostgreSQL.
  • Orchestrated containerized ETL workloads on a self-managed GKE cluster, leveraging Argo Workflows for complex dependency management.
  • Optimized transformation layers using dbt to implement modular Data Marts and Star Schema architectures.
  • Automated CI/CD lifecycles via GitLab and Artifact Registry to streamline Docker image deployments and dbt model updates.
  • Established Data Governance frameworks by integrating Google Cloud Dataplex with automated Metadata tagging via Terraform.
  • Designed ingestion workflows with Cloud Run Jobs and Cloud Scheduler.
  • Automated ETL deployments using GitLab CI/CD and Artifact Registry.
  • Provisioned and managed infrastructure entirely with Terraform.
  • Built dbt-based data pipelines with dependency-aware models on BigQuery.

Tech Stack

Programming Languages
Python, SQL
Tools
Jira, Large Language Models, Sequin, Airbyte, dbt, PostgreSQL
DevOps
Terraform, GitLab, CI/CD, Docker, Argo Workflows
Cloud
GCP, BigQuery, GKE, Cloud Run, Cloud Scheduler, Dataplex

Data Scientist

Zenitech Ltd., Budapest, Hungary

2019 FEB - 2025 APR

In my role as a Data Scientist, I leveraged advanced analytics and machine learning to extract valuable insights from complex datasets. I built predictive models and created data-driven solutions to challenging business problems for a variety of clients.

Projects

  • Selected DW3000-based UWB modules suited for subway environments.
  • Implemented SS-TWR algorithm on embedded devices to measure distances accurately.
  • Applied least squares multilateration to compute the initiator's 2D position.
  • Built a live display interface to show anchor and initiator positions on a subway map.
  • Validated system performance in conditions with multipath and partial line-of-sight.
  • Developed and optimized MLOps tools to accelerate the deployment of data scientists' models to production environments.
  • Implemented a remote execution framework for efficient GPU utilization on Google Cloud, enabling remote invocation of functions.
  • Authored Request for Comments (RFCs) to clarify the architecture of tools for customers, enhancing their understanding.
  • Designed and implemented a REST-based machine learning service framework to streamline the deployment of models created by data scientists.
  • Created a machine learning service abstraction layer using Open Inference Protocol v2, integrated OpenTelemetry for observability, and developed an intuitive interface for service creation.
  • Automated deployment of machine learning services to Google Kubernetes Engine (GKE) using GitHub Actions for Continuous Integration (CI).
  • Built metrics dashboards with PromQL to monitor and visualize service performance.
  • Integrated machine learning services with Triton Inference Server to support efficient model serving.
  • Made human resource use more efficient by developing a support agent built on OpenAI APIs and the company's internal knowledge base.
  • Gained a deeper understanding of Large Language Models.
  • Built a framework for using OpenAI ChatGPT.
  • Built an efficient in-context learning solution based on internal knowledge and prompt engineering best practices.
  • Implemented a chat front-end application using the Svelte framework.
  • Applied data exploration, cleaning, and anonymization techniques (especially for PII) to prepare support ticket data in an efficient form for fine-tuning LLMs.
  • Implemented a custom data extractor for Zendesk.
  • Implemented an ETL and machine learning pipeline in Databricks.
  • Deployed the Svelte front-end and FastAPI-powered backend to Azure using GitHub Actions CI.
  • Applied the CRISP-DM methodology to the project.
  • Analyzed the business domain and explored potentially important datasets.
  • Performed in-depth exploratory data analysis (EDA) on the relevant datasets with Python, Jupyter Notebook, and scikit-learn.
  • Created connections between different data marts (SAP, Excel, Worx).
  • Loaded data (ETL) into a Hadoop Impala-based data warehouse.
  • Created Power BI reports useful to the business side as a by-product.
  • Applied feature engineering informed by the in-depth data exploration.
  • Modeled with ARIMA, SARIMAX, Random Forest, neural network (NN), Exponential Smoothing, and Fourier Transform algorithms.
  • Carried out a partial installation on the business side (data updates are supported by developers).
  • Successfully addressed the challenge of rapid company growth using Power BI, providing high-level views of the organization's current and past states and enabling management to make informed decisions for planning, forecasting, and workflow improvement.
  • Collaborated with management to define and prioritize KPIs, emphasizing continuous and direct communication throughout the reporting project, with real-time dashboards powered by Power BI.
  • Developed detailed visualizations using Power BI, including monthly, site-specific, seniority-based, and billability breakdowns, to present complex calculations and measurements for the board, uncovering utilization issues and anomalies.
  • Identified and resolved company-wide issues related to projectless free capacity using Power BI, leading to a more efficient administration flow and eliminating redundant processes.
  • Developed monthly operational reports for delivery managers and people leads using Power BI, simplifying administration processes and improving information accuracy.
  • Advocated for continuous improvement by eliminating redundant administration and unsynchronized data sources, ensuring that Power BI reports became a reliable and indispensable tool for management decision-making and planning.
  • Emphasized the importance of communicating with stakeholders, providing diverse data representations, and deep domain understanding for effective decision support.
  • Defined and implemented key performance indicators (KPIs), with a focus on Billable Utilization, leading to optimized layoff strategies and improved resource allocation, all powered by Power BI.
  • Joined the different sales pipeline stages into Power BI reports to give a much better overall picture of the company's sales pipeline and ease decision-making.
  • Learned the basics of trading.
  • Became familiar with applicable algorithmic trading frameworks: Freqtrade, Backtrader, and Zipline.
  • Built an efficient ETL pipeline for the source data.
  • Implemented customized trading strategies based on exploration of CO2 quota data.
  • Integrated ML algorithms into specific strategies.
  • Explored, compared, and used similar but much larger datasets (such as cryptocurrencies).
  • Efficiently handled high-frequency quote and trade datasets.
  • Built efficient model evaluation using the MLflow framework.
  • Implemented sophisticated neural networks from publications in PyTorch.
  • Developed various risk-reward strategies using the model output and other tools such as stop loss/take profit or stake setups.
  • Implemented an automated continuous retraining solution using Azure technologies and Azure ML Studio.
  • Implemented a decoupled, multi-container, WebSocket-based solution for processing level 1 order book data, running the neural network inference, and applying the results to the trading strategy.
  • Implemented a Svelte-based web app for the traders.
  • Became familiar with the football domain, especially the factors that match attendance depends on.
  • Performed exploratory data analysis on Fradi attendance data.
  • Applied a simple feature selection mechanism using Boruta and random forest feature importances.
  • Built and trained models (such as random forest or XGBoost) to predict the importance of upcoming matches, based on our exploratory data analysis.
  • Built an easy-to-integrate solution with Docker Compose and exposed the service through a REST interface (documented with Swagger).
  • Implemented an efficient solution for face detection and recognition using Dlib.
  • Developed warning signals based on the results of the face recognition service.
  • Built an easy-to-integrate architecture with Docker, RabbitMQ, and Flask.
  • Learned the basics of RabbitMQ and the AMQP protocol.
  • Implemented event-driven endpoints using the AMQP protocol.
  • Identified the most applicable deep learning-based object detection networks.
  • Gained deeper knowledge of the YOLOv4 network and its training.
  • Implemented a basic alert system based on a simple image annotator tool.
  • Studied and applied deep learning-based anomaly detectors.
  • Integrated simple tracking algorithms into the processing.
  • Implemented an efficient algorithm for comparing multiple flight tracks (to detect object differences).
  • Gained deeper knowledge of Tesseract OCR and implemented its training and testing procedures.
  • Implemented efficient card detection using the SIFT detector and its key feature points.
  • Participated in developing the image processing pipeline and testing environment.
  • Developed pre- and postprocessing algorithms to significantly increase accuracy and efficiency.
  • Implemented an effective DevOps pipeline for fast development and delivery using Docker and GitLab CI.
  • Performed in-depth exploratory data analysis (EDA) on the relevant datasets with Python, Jupyter Notebook, and scikit-learn.
  • Analyzed the business and industrial domain in depth.
  • Collected, analyzed, and applied household datasets from several foreign countries.
  • Implemented an efficient algorithm for constructing the dataset most similar to Hungarian conditions.
  • Developed a simple modeling framework.
  • Modeled with ARIMA, SARIMAX, Random Forest, neural network (NN), and Exponential Smoothing algorithms.
  • Applied feature engineering to the datasets.
  • Participated in implementing a data warehouse for reporting.
  • Created data marts for specific reports and report groups.
  • Implemented and optimized efficient data extraction, transformation, and load (ETL) from source systems.
  • Optimized SQL queries, inspected execution plans, and applied appropriate indexes.
  • Designed a star schema.
  • Participated in .NET backend development.
  • Optimized the performance of the electronic delivery system's report module.

Tech Stack

Programming Languages
Python, SQL, PromQL, C#, TypeScript, DAX
Tools
.NET, UWB modules, MLOps, Open Inference Protocol v2, Triton Inference Server, OpenAI APIs, Svelte, FastAPI, Zendesk, CRISP-DM, Jupyter, scikit-learn, SAP, Power BI, ARIMA, Random Forest, Freqtrade, Backtrader, Zipline, MLflow, PyTorch, XGBoost, Swagger, Dlib, Flask, YOLOv4, Tesseract OCR, SIFT, RabbitMQ, AMQP
DevOps
GKE, GitHub Actions, CI/CD, Docker, GitLab
Cloud
Google Cloud, OpenTelemetry, Databricks, Azure, Azure ML Studio

Machine Learning Engineer

Idaso Ltd., Ireland

2016 FEB - 2018 DEC

Gained foundational knowledge in Deep Learning, specifically focusing on convolutional neural networks. Responsible for training and examining neural network-based object detectors and classifiers on both public and proprietary datasets.

Projects

  • Examined and applied SSD (Single Shot Multibox Detector), YOLO (You Only Look Once), and Faster R-CNN object detectors for vehicle detection.
  • Utilized AlexNet, Inception, and ResNet classification networks to categorize vehicles.
  • Studied and implemented SORT (Simple Online and Realtime Tracking) and Deep SORT object trackers for vehicle tracking.
  • Created an end-to-end automated vehicle counting process using a Serverless architecture in Microsoft Azure.
  • Developed automatic processing solutions using Azure Batch Service.
  • Ran and optimized the SSD detector and Deep SORT tracker components on the NVIDIA Jetson TX2 module for real-time vehicle counting.
  • Collected and generated training data for object detection and classification using a modified Grand Theft Auto (GTA) V game and RenderDoc graphics debugger.

Tech Stack

Programming Languages
Python, C, C++, CUDA
Tools
SSD, YOLO, Faster R-CNN, AlexNet, Inception, ResNet, SORT, Deep SORT, NVIDIA Jetson TX2
Cloud
Microsoft Azure, Azure Batch Service

Education
My Academic Background.

Software Engineer Master's Degree

Budapest University of Technology and Economics

2017 – 2019

Development of a general-purpose framework to support performing industrial processes using mixed reality techniques

Software Engineer Bachelor's Degree

Budapest University of Technology and Economics

2013 – 2017

Computer vision-based vehicle detection and tracking


Let's Get in Touch

I'm always open to discussing new projects, creative ideas, or opportunities to be part of an amazing team.

totszilveszter@gmail.com