Szilveszter Tóth

Data Scientist/Engineer

I enjoy working with machine learning and AI algorithms to develop solutions that automate processes and improve efficiency. I focus on providing practical, profitable solutions and simplifying tasks through IT. With experience in handling and processing large datasets, I aim to extract valuable insights and use data more effectively to support business goals.

Skills

Skill Overview

Programming

C, C++, Python, C#, TypeScript, SQL, Power Query, DAX, CUDA, Terraform

Cloud & Devops

Azure Batch Service, Azure Functions, .NET, Docker, Linux, Bash, OpenTelemetry, GitHub Actions, GitLab CI, Kubernetes, Power BI, Google Cloud (Workflows, Scheduler, Run, BigQuery), Azure services, AWS services (Lambda, Glue, Athena), Terraform

Data Engineering

ETL/ELT Design, CDC Ingestion, Data Modeling, Data Warehousing, PostgreSQL, BigQuery, dbt

ML & AI

NVIDIA TensorRT, NVIDIA DIGITS, Caffe, TensorFlow, Jupyter Notebook, Scikit-Learn, NumPy, Azure ML Studio, PyTorch, Databricks, Grafana, Prometheus, MLflow, Helm, Argo Workflows, LLM Applications, RAG Systems, Knowledge Graphs, Computer Vision, Segmentation, LangChain

Collaboration

Solution design, Estimation, Requirement translation

Languages

English (B2), Hungarian (Native)

Work Experience

My Professional Journey.

Data Engineer

Szallas Group Zrt., Budapest, Hungary

2025 MAY - PRESENT

As a Data Engineer, I built and automated data platforms on Google Cloud (GCP). I specialized in designing scalable BigQuery data warehouses using Terraform for infrastructure and dbt for data modeling. I focused on improving team efficiency by automating repetitive manual tasks, most notably by developing a RAG-based tool that used Large Language Models to handle routine SQL data requests. My work centered on creating reliable, automated pipelines that transformed raw data into high-quality, governed datasets.

Projects

Developed a RAG-based automation tool for Jira, utilizing Large Language Models to resolve frequent SQL data requests autonomously.
Enhanced LLM accuracy by injecting DDL (Data Definition Language) schemas into the RAG context, ensuring syntactically correct query generation.
Streamlined ticket operations by automating the end-to-end Jira workflow, including ticket ingestion, processing, and resolution uploading.
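
The schema-injection idea above can be sketched in a few lines. This is a minimal illustration, not the production tool: the function names, the keyword-overlap retrieval, the prompt wording, and the example tables are all assumptions made for the sketch, and the call to the language model itself is left out.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tables(question, ddl_by_table, top_k=2):
    """Rank tables by word overlap between the question and each DDL.

    Naive stand-in for the retrieval step (assumption); a real RAG
    system would more likely use embedding similarity.
    """
    question_tokens = tokenize(question)
    scored = []
    for table, ddl in ddl_by_table.items():
        overlap = len(question_tokens & tokenize(ddl))
        if overlap > 0:
            scored.append((overlap, table))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table for _, table in scored[:top_k]]


def build_prompt(question, ddl_by_table, top_k=2):
    """Inject the DDL of the selected tables into the prompt context."""
    tables = select_tables(question, ddl_by_table, top_k)
    schema_block = "\n\n".join(ddl_by_table[t] for t in tables)
    return (
        "Write one SQL query that answers the request.\n"
        "Use only the tables and columns defined in these schemas:\n\n"
        f"{schema_block}\n\n"
        f"Request: {question}\n"
        "SQL:"
    )


# Hypothetical schemas, for illustration only.
EXAMPLE_DDL = {
    "bookings": "CREATE TABLE bookings (booking_id INT64, hotel_id INT64, created_at TIMESTAMP)",
    "users": "CREATE TABLE users (user_id INT64, email STRING)",
}

if __name__ == "__main__":
    print(build_prompt("How many bookings were created per hotel last week?", EXAMPLE_DDL))
```

Restricting the model to the column names that actually appear in the injected DDL is what reduces queries against non-existent fields.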

Architected and deployed a scalable DWH on BigQuery, utilizing Terraform for 100% Infrastructure-as-Code (IaC) management.
Engineered high-fidelity ingestion pipelines using Sequin for Change Data Capture (CDC) and Airbyte for snapshot-based synchronization from PostgreSQL.
Orchestrated containerized ETL workloads on a self-managed GKE cluster, leveraging Argo Workflows for complex dependency management.
Optimized transformation layers using dbt to implement modular Data Marts and Star Schema architectures.
Automated CI/CD lifecycles via GitLab and Artifact Registry to streamline Docker image deployments and dbt model updates.
Established Data Governance frameworks by integrating Google Cloud Dataplex with automated Metadata tagging via Terraform.

Designed ingestion workflows with Cloud Run Jobs and Cloud Scheduler.
Automated ETL deployments using GitLab CI/CD and Artifact Registry.
Provisioned and managed infrastructure entirely with Terraform.
Built dbt-based data pipelines with dependency-aware models on BigQuery.

Tech Stack

Programming Languages

Python, SQL

Tools

Jira, Large Language Models, Sequin, Airbyte, dbt, PostgreSQL

DevOps

Terraform, GitLab, CI/CD, Docker, Argo Workflows

Cloud

GCP, BigQuery, GKE, Cloud Run, Cloud Scheduler, Dataplex

Data Scientist

Zenitech Ltd., Budapest, Hungary

2019 FEB - 2025 APR

In my role as a Data Scientist, I leverage advanced analytics and machine learning to extract valuable insights from complex datasets. I'm passionate about building predictive models and creating data-driven solutions to challenging business problems for a variety of clients.

Projects

Selected DW3000-based UWB modules suited for subway environments.
Implemented SS-TWR algorithm on embedded devices to measure distances accurately.
Applied least squares multilateration to compute initiator's 2D position.
Built a live display interface to show anchor and initiator positions on a subway map.
Validated system performance in conditions with multipath and partial line-of-sight.
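
The ranging and positioning steps above can be illustrated with a short sketch. It is a simplified model under stated assumptions, not the embedded implementation: the function names are hypothetical, clock-drift correction and antenna delays are ignored, and the least squares problem is linearized by subtracting the first anchor's range equation from the others.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def ss_twr_distance(t_round_s, t_reply_s):
    """Single-sided two-way ranging: time of flight is half of
    (round-trip time minus the responder's reply delay)."""
    return SPEED_OF_LIGHT_M_S * (t_round_s - t_reply_s) / 2.0


def multilaterate_2d(anchors, distances):
    """Least squares 2D position from anchor coordinates and ranges.

    Subtracting the first range equation from the rest yields linear
    rows a*x + b*y = r; the 2x2 normal equations are then solved
    directly.
    """
    if len(anchors) < 3 or len(anchors) != len(distances):
        raise ValueError("need at least 3 anchors with one distance each")
    x0, y0 = anchors[0]
    k0 = x0 * x0 + y0 * y0 - distances[0] ** 2
    s_aa = s_ab = s_bb = s_ar = s_br = 0.0
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        r = (xi * xi + yi * yi - di * di) - k0
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ar += a * r
        s_br += b * r
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    x = (s_ar * s_bb - s_ab * s_br) / det
    y = (s_aa * s_br - s_ab * s_ar) / det
    return x, y


if __name__ == "__main__":
    # Made-up anchor layout (metres) and noise-free ranges to a test point.
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    target = (3.0, 4.0)
    ranges = [math.dist(a, target) for a in anchors]
    print(multilaterate_2d(anchors, ranges))
```

With noisy ranges, such as those affected by multipath, the same solver returns the position that minimizes the squared residual of the linearized equations, which is why using more than three anchors helps.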

Developed and optimized MLOps tools to accelerate the deployment of data scientists' models to production environments.
Implemented a remote execution framework for efficient GPU utilization on Google Cloud, enabling remote invocation of functions.
Authored Request for Comments (RFCs) to clarify the architecture of tools for customers, enhancing their understanding.
Designed and implemented a REST-based machine learning service framework to streamline the deployment of models created by data scientists.
Created a machine learning service abstraction layer using Open Inference Protocol v2, integrated OpenTelemetry for observability, and developed an intuitive interface for service creation.
Automated deployment of machine learning services to Google Kubernetes Engine (GKE) using GitHub Actions for Continuous Integration (CI).
Built metrics dashboards with PromQL to monitor and visualize service performance.
Integrated machine learning services with Triton Inference Server to support efficient model serving.

Making the use of human resources more efficient by developing a support agent built on OpenAI APIs and the company's internal knowledge base
Getting a deeper understanding of Large Language Models
Building a framework for using OpenAI ChatGPT
Building an efficient solution for in-context learning based on internal knowledge, incorporating prompt engineering best practices
Implementing a chat front-end application using the Svelte framework
Applying data exploration, cleaning, and anonymization techniques (especially for PII data) to prepare support ticket data in an efficient form for fine-tuning LLMs
Implementing a custom data extractor for Zendesk
Implementing an ETL and machine learning pipeline in Databricks
Deploying the Svelte front-end and FastAPI-powered backend to Azure using GitHub Actions CI

Applying the CRISP-DM methodology to the project
Analyzing and understanding the business domain, exploring potentially important datasets
Performing in-depth exploratory data analysis (EDA) on the relevant datasets with Python, Jupyter Notebook, and scikit-learn
Creating connections between different data marts (SAP, Excel, Worx)
Loading data (ETL) into a Hadoop Impala-based data warehouse
Creating Power BI reports useful to the business side as a by-product
Applying feature engineering based on the in-depth data exploration
Modeling with ARIMA, SARIMAX, Random Forest, neural network (NN), and Exponential Smoothing algorithms, as well as Fourier Transform algorithms
Setting up a partial installation on the business side (data updates are supported by developers)

Successfully addressed the challenge of rapid company growth using PowerBI, providing high-level views of the organization's current and past states, enabling management to make informed decisions for planning, forecasting, and workflow improvement.
Collaborated with management to define and prioritize KPIs, emphasizing continuous and direct communication throughout the reporting project, with real-time dashboards powered by PowerBI.
Developed detailed visualizations using PowerBI, including monthly, site-specific, seniority-based, and billability breakdowns, to present complex calculations and measurements for the board, uncovering utilization issues and anomalies.
Identified and resolved company-wide issues related to projectless free capacity using PowerBI, leading to a more efficient administration flow and eliminating redundant processes.
Developed monthly operational reports for delivery managers and people leads using PowerBI, simplifying administration processes and improving information accuracy.
Advocated for continuous improvement by eliminating redundant administrations and unsynchronized data sources, ensuring that PowerBI reports became a reliable and indispensable tool for management decision-making and planning.
Emphasized the importance of communicating with stakeholders, providing diverse data representations, and understanding the domain deeply for effective decision support.
Defined and implemented key performance indicators (KPIs), with a focus on Billable Utilization, leading to optimized layoff strategies and improved resource allocations, all powered by the capabilities of PowerBI.
Combined the different sales pipeline stages into Power BI reports to give a much clearer overall picture of the company's sales pipeline and thereby ease decision-making.

Getting familiar with trading basics
Getting familiar with applicable algorithmic trading frameworks: Freqtrade, Backtrader, Zipline
Developing an efficient ETL pipeline for the source data
Implementing customized trading strategies based on CO2 quota data exploration
Injecting ML algorithms into specific strategies
Exploring, comparing, and using similar but much larger datasets (like cryptos)
Efficiently handling high-frequency quote and trade datasets
Building efficient model evaluation using the MLflow framework
Implementing sophisticated neural networks from published research in the PyTorch framework
Developing various risk-reward strategies using the model output and other tools like stop loss/take profit or stake setups
Implementing an automated continuous retraining solution using Azure technologies and Azure ML Studio
Implementing a decoupled, multi-container, WebSocket-based solution for processing level 1 order book data, running the neural network inference, and applying the results to the trading strategy
Implementing a Svelte-based web app for the traders

Getting familiar with the football world, especially the factors that match attendance depends on
Exploratory data analysis on Fradi attendance data
Using a simple feature selection mechanism with Boruta and random forest feature importances
Building and training models (such as random forest or XGBoost) to predict the importance of upcoming matches, based on our exploratory data analysis
Building an easy-to-integrate solution with Docker Compose and exposing the service through a REST interface (documented with Swagger)

Implementing an efficient solution for face detection and recognition using DLIB
Developing warning signals based on the results of the face recognition service
Building an easy-to-integrate architecture with Docker, RabbitMQ, and Flask
Getting to know the basics of RabbitMQ and the AMQP protocol
Implementing event-driven endpoints using the AMQP protocol

Determining the most applicable deep learning-based object detector networks
Gaining deeper knowledge of the YOLOv4 network and its training
Implementing a basic alert system based on a simple image annotator tool
Getting to know and applying deep learning-based anomaly detectors
Integrating simple tracker algorithms into the processing
Implementing an efficient algorithm for comparing multiple flight tracks (for detecting object difference)

Gaining deeper knowledge of Tesseract OCR and implementing its training and testing procedures
Implementing efficient card detection by applying the SIFT detector and its key feature points
Participating in the development of the image processing pipeline and testing environment
Developing pre- and postprocessing algorithms to significantly increase accuracy and efficiency
Implementing an effective DevOps pipeline for fast development and delivery using Docker and GitLab CI

Performing in-depth exploratory data analysis (EDA) on the relevant datasets with Python, Jupyter Notebook, and scikit-learn
Analyzing and understanding the business and industrial domain in depth
Collecting, analyzing, and applying household datasets from several foreign countries
Implementing an efficient algorithm for creating the dataset most similar to Hungarian conditions
Developing a simple modeling framework
Modeling with ARIMA, SARIMAX, Random Forest, neural network (NN), and Exponential Smoothing algorithms
Applying feature engineering to the datasets

Participating in the implementation of a data warehouse for reporting
Creating data marts for specific reports and report groups
Implementing and optimizing efficient data extraction, transformation, and load (ETL) from source systems
Optimizing SQL queries, inspecting execution plans, applying appropriate indexes
Designing a star schema

Participating in .NET backend development
Optimizing the performance of the electronic delivery system's report module

Tech Stack

Programming Languages

Python, SQL, PromQL, C#, TypeScript, DAX

Tools

.NET, UWB modules, MLOps, Open Inference Protocol v2, Triton Inference Server, OpenAI APIs, Svelte, FastAPI, Zendesk, CRISP-DM, Jupyter, Scikit-Learn, SAP, Power BI, ARIMA, Random Forest, Freqtrade, Backtrader, Zipline, MLflow, PyTorch, XGBoost, Swagger, DLIB, Flask, YOLOv4, Tesseract OCR, SIFT, RabbitMQ, AMQP

DevOps

GKE, GitHub Actions, CI/CD, Docker, GitLab

Cloud

Google Cloud, OpenTelemetry, Databricks, Azure, Azure ML Studio

Machine Learning Engineer

Idaso Ltd., Ireland

2016 FEB - 2018 DEC

Gained foundational knowledge in Deep Learning, specifically focusing on convolutional neural networks. Responsible for training and examining neural network-based object detectors and classifiers on both public and proprietary datasets.

Projects

Examined and applied SSD (Single Shot Multibox Detector), YOLO (You Only Look Once), and Faster R-CNN object detectors for vehicle detection.
Utilized AlexNet, Inception, and ResNet classification networks to categorize vehicles.
Studied and implemented SORT (Simple Online and Realtime Tracking) and Deep SORT object trackers for vehicle tracking.
Created an end-to-end automated vehicle counting process using a Serverless architecture in Microsoft Azure.
Developed automatic processing solutions using Azure Batch Service.
Ran and optimized SSD Detector and Deep SORT tracker components on the NVIDIA Jetson TX 2 Module for real-time vehicle counting.
Collected and generated training data for object detection and classification using a modified Grand Theft Auto (GTA) V game and RenderDoc graphics debugger.

Tech Stack

Programming Languages

Python, C, C++, CUDA

Tools

SSD, YOLO, Faster R-CNN, AlexNet, Inception, ResNet, SORT, Deep SORT, NVIDIA Jetson TX 2

Cloud

Microsoft Azure, Azure Batch Service

Education

My Academic Background.

Software Engineer Master's Degree

Budapest University of Technology and Economics

2017 – 2019

Development of a general-purpose framework to support performing industrial processes using mixed reality techniques

Software Engineer Bachelor's Degree

Budapest University of Technology and Economics

2013 – 2017

Computer vision-based vehicle detection and tracking

Let's Get in Touch

I'm always open to discussing new projects, creative ideas, or opportunities to be part of an amazing team.

totszilveszter@gmail.com
