Hi, I'm Tobias Sterbak

I'm a freelance data scientist and machine learning engineer focusing on natural language processing (NLP). I like to build smart and useful applications, with and without machine learning.

Tobias Sterbak | Data Science & Machine Learning Consultant

I like to work on:

Topics I'm interested in

Machine learning & AI solutions from prototype to production

I support you and your team in developing useful, functioning AI & machine learning systems and bringing them to production.
Techstack: scikit-learn, keras, pytorch, tensorflow, fastAPI, jupyter, pandas, langchain

Machine Learning Operations (MLOps) and LLMOps

I help you and your team build reliable and maintainable machine learning systems. I help you put processes in place to keep track of the machine learning life cycle.

Techstack: docker, MLFlow, AWS, Azure, git, langsmith

Natural language processing (NLP)

Analyze text data and build software solutions with text. Common use cases include named entity recognition (NER), document classification, intent detection, sentiment analysis, search, text clustering, and text generation.
Techstack: huggingface-transformers, sentence-transformers, langchain, llms, scikit-learn, spacy, OpenAI, regex

Explainable AI (XAI)

I help you understand what your machine learning system is doing and how certain decisions are made. I also support you in building transparent, understandable machine learning systems.
Techstack: ELI5, lime, shap, biaslyze

You need help with something? Drop me a mail.

Projects

Some open source and commercial projects I work(ed) on.

Bookkeeping automation

I worked on building a bookkeeping automation system based on machine learning to handle large numbers of transactions per month.

Domain-dependent information retrieval

I worked on a client project to search through a database of domain-specific documents and find semantically close matches.

OpenAndroidInstaller

The project helps keep smartphones up to date with free software. A graphical installer guides users through the installation of free Android operating systems.

Biaslyze

Biaslyze helps to get started with the analysis of bias within NLP models and offers a concrete entry point for further impact assessments and mitigation measures.

Legal Review of Rental Contracts

Together with dida, I used different methods from the field of NLP to create software that spots errors in rental contracts.

Beyond AI Collective e.V.

The Beyond AI Collective is a non-profit association that works to prevent discrimination mediated by the use of algorithmic systems.

Public speaking

Occasionally, I talk about things I work on or give workshops on different topics.
Some recordings of public talks and tutorials can be found here.

Blog

Selected articles from depends-on-the-definition.com.

Data validation for NLP applications with topic models

In a recent article, we saw how to implement a basic validation pipeline for text data. Once a machine learning model has been deployed, its behavior must be monitored. The predictive performance is expected to degrade over time as the environment changes.

Latent Dirichlet allocation from scratch

Today, I’m going to talk about topic models in NLP. Specifically, we will see how the Latent Dirichlet allocation model works and implement it from scratch in NumPy.

Named entity recognition with Bert

In 2018 we saw the rise of pretraining and finetuning in natural language processing. Large neural networks have been trained on general tasks like language modeling and then fine-tuned for classification tasks. One of the latest milestones in this development is the release of BERT.

NLP & ML Office Hour

Are you stuck with an NLP or ML project?