Mikuláš Bankovič

Heidelberg 69129, DE · +421 917 384 791 · mikulas.bankovic27@gmail.com

I am a data scientist and computer vision researcher with a Master's degree in Artificial Intelligence and Data Processing from Masaryk University. My master's thesis was written as a collaboration between the German Cancer Research Center (DKFZ) and the Centre for Biomedical Image Analysis at Masaryk University. I apply machine learning and deep learning to extract insights from data and turn them into practical, real-world solutions.

My experience includes leading the development of a custom OCR system for a low-resource language, conducting machine learning research at Seznam.cz, and improving the digitization of ancient Hussite texts as part of the AHISTO project. I am proficient in Python, computer vision, deep learning, and natural language processing, with expertise in PyTorch.

My colleagues would describe me as driven and passionate, with a positive, proactive attitude in the face of adversity. I thrive in collaborative environments, where I focus on delivering innovative solutions, and I enjoy learning and presenting new ideas. Connect with me on LinkedIn.


Experience

Computer Vision Research Engineer

Masaryk University

As part of the Intelligent Back-Office project, a university research project aimed at challenging startups in automated document processing, particularly of scanned invoices, I led the implementation of our custom OCR system. Working in a low-resource language setting, specifically Czech, I handled annotated documents, preprocessed the data, and trained the models. I also designed post-processing steps to correct errors, such as a second OCR run on fields that NER classification had identified as numbers. The result was a robust pipeline capable of processing approximately 70% of invoices originating from low-quality scans and photos.

March 2022 - May 2023

Machine Learning Researcher

Seznam.cz

Within the research team at Mapy.cz, a project aimed at offering an enhanced alternative to Google Maps, I developed a ranking model to improve search results. I processed a large volume of user interactions into a comprehensive dataset and trained machine learning and deep learning models on it. The results were evaluated in an A/B test, which showed improved ranking performance on crucial samples while closely maintaining overall quality.

September 2020 - December 2021

Student Research Assistant

Masaryk University

On the AHISTO project, our objective was to digitize ancient Hussite texts. I investigated whether super-resolution techniques could improve the performance of an openly available, open-source OCR (Optical Character Recognition) system. Implementing this idea yielded a significant improvement in OCR accuracy, likely because it challenged assumptions built into the OCR systems. This contribution helped advance the project's goals.

January 2011 - May 2013

Student Research Assistant

University of Heidelberg

As a HiWi, I processed openly accessible CT scans and performed image registration across the breathing phases of patients, using the SimpleITK and Elastix libraries. The work involved extensive hyperparameter exploration and a range of visualization techniques. Finally, I selected the top-performing models to establish speed and quality benchmarks within a university-developed registration framework.

January 2023 - April 2023

Machine Learning Engineer

Simple Finance

In this role, I developed a predictive model for housing prices in the USA. I built a custom PyTorch neural network from scratch, which involved extensive research, coding, and testing, to compete with existing gradient boosting models. Along the way, I took the initiative to build my skills with AWS servers and data storage, gaining practical experience in AWS infrastructure, data management, and advanced machine learning techniques.

January 2020 - February 2020

Junior Python Developer

Resideo

I scraped product reviews from platforms such as Amazon and BestBuy, bypassing APIs and CAPTCHA restrictions, with the goal of performing sentiment analysis for quality assurance. I kept the scrapers running through regular maintenance as the sites changed. Although the sentiment analysis required manual validation and had some complexities, it aided in identifying product issues. This data-driven approach improved product quality and customer satisfaction.

October 2018 - October 2019

Education

DKFZ German Cancer Research Center

My primary objective was to conduct a comparative analysis of generative models for medical image synthesis. I developed custom preprocessing techniques tailored to CT scans from 300 patients, including table removal and background replacement. Using GAN and diffusion model libraries, I trained these models on the newly created dataset. Both methods proved capable of generating results resembling the original samples, each with its own advantages and limitations.

October 2022 - July 2023

Masaryk University Brno

Master of Science
Artificial Intelligence and Data Processing

Grade: 1.82

September 2020 - June 2023

University of Bergen

Erasmus+ Exchange
January 2022 - July 2022

Masaryk University Brno

Bachelor of Science
Artificial Intelligence and Natural Language Processing

Grade: 2

September 2016 - June 2020

University of Tartu

Erasmus+ Exchange
September 2019 - December 2019

Skills

Programming Languages & Tools
  • Python

    I have been programming in Python since my first year of university, and all of my projects have been written primarily in Python. I make efficient use of new language features and idioms to improve code speed and readability.

  • Data Preprocessing and Data Engineering

    I can process a variety of data formats, from images to spatial data, using Python libraries such as NumPy and pandas. Preprocessing usually includes exploratory analysis, feature engineering, data curation, and standardization. I can build efficient pipelines that avoid unnecessary data inflation or transfers. I can work with big data, collecting it from Hadoop clusters or SQL databases.

  • Deep Learning and Machine Learning

    I have extensive experience in evaluating and testing both classical machine learning methods and modern approaches such as neural networks and gradient-boosted trees. I use the scikit-learn framework to evaluate experiments appropriately, and I take care to avoid common data analysis pitfalls so that my findings are accurate and reliable. I also track metrics and training progress efficiently with tools such as Weights & Biases (wandb).

  • Natural Language Processing and Computer Vision

    I regularly use PyTorch and the Hugging Face model hub to save my models and load pre-trained ones. I am skilled in using these tools for state-of-the-art Natural Language Processing with Large Language Models and for Computer Vision, including Generative Adversarial Networks and diffusion models. I have also worked extensively with the NLTK and OpenCV libraries.

  • Data Visualisation

    I am skilled in creating insightful visualizations with tools such as Matplotlib, seaborn, and Plotly. I can create interactive plots as well as simple graphics that present and support insights from data.

  • Reproducibility and Version Control

    I version data and models with tools such as Git LFS to ensure a smooth collaborative workflow and reproducibility. I can containerise solutions with Docker and deploy them through CI/CD pipelines for scalable product delivery.

  • Linux

    I am continuing to deepen my Linux skills. I can script efficiently in bash, use everyday commands, and put available open-source tools to work in my daily tasks.


Interests

When I'm not working as a data scientist, I love spending time outdoors. In the city of Heidelberg, I often go slacklining or swimming outside during the warmer months.

On indoor days, you can often find me in the local bouldering gym. I am also often immersed in music, either strumming my guitar or playing bass in spontaneous jam sessions with my roommates. I also enjoy board games, and I devote a substantial portion of my free time to staying updated on the latest developments in deep learning.


Awards & Certifications

  • Deep Learning A-Z™: Hands-On Artificial Neural Networks
  • Introduction to Self-Driving Cars
  • State Estimation and Localization for Self-Driving Cars
  • Visual Perception for Self-Driving Cars
  • 2nd Place - DTSE Telekom - NLP hackathon 2019
  • 3rd Place - ŠKODA - UnIT Big Data hackathon 2019