Jaroslaw Szlichta, PhD - University of Ontario Institute of Technology. Oshawa, ON, CA

Jaroslaw Szlichta, PhD

Assistant Professor, Faculty of Science | University of Ontario Institute of Technology

Oshawa, ON, CANADA

Award-winning big data analytics expert examines big data cleaning to improve accuracy of predictors and trends



With vast amounts of data moving across global networks, big data analytics is expanding quickly, not only as a means to process and understand abundant information, but also as a key method for predicting trends in social and economic behaviour. While the availability of data continues to grow, the challenge lies in ensuring its accuracy.

Human error produces ‘dirty data’, which triggers incorrect analytics and leads to inaccurate business decisions. Jaroslaw Szlichta, PhD, Assistant Professor in the Faculty of Science, is focused on data analytics, business intelligence and big data cleaning. His latest research aims to increase the proportion of clean data, which would significantly improve data accuracy and lead to more precise predictions and trend analysis.

Awarded a post-doctoral fellowship by Mitacs Elevate in 2014, Dr. Szlichta focused his research on big data integration and continuous data cleaning. He developed an algorithm to automatically integrate and clean all data before any analytics are performed, to ensure more accurate outcomes. In 2013, Dr. Szlichta was appointed a post-doctoral fellow in the Department of Computer Science at the University of Toronto, before joining UOIT in July 2014. He brings award-winning big data analytics expertise to the university and has developed an undergraduate course on the subject.

Applying his interest in math to computer science, Dr. Szlichta earned his Master of Science in Engineering from the Faculty of Electronics and Information Science at the Warsaw University of Technology in Warsaw, Poland, in 2009, and received his Doctorate in Computer Science from the Department of Computer Science and Engineering at York University in Toronto, Ontario, in 2013. During his doctoral studies, he was appointed to a three-year research fellowship at the IBM Centre for Advanced Studies (CAS) in Markham, Ontario, and in 2012 he received the IBM CAS Research Student of the Year Award.

A former software developer for Comarch Research & Development in Warsaw, he developed the WYSIWYG reporting system OCEAN GenRap, a novel data analytics reporting solution. Recognized for his collaborative work, Dr. Szlichta received the prestigious CeBIT Business Award. He is also a member of the Big Data Benchmark Community, a global community group aimed at developing a data set that may be used as a benchmark for evaluating research.

Industry Expertise (3)



Information Technology and Services

Areas of Expertise (13)

Big Data

Business Intelligence

Data Analytics

Information Integration

Heterogeneous Computing

Web Search

Machine Learning

Data Curation

Data Cleaning

Cloud Computing

Data Mining

Optimization of Queries for Business Intelligence

Data Science

Accomplishments (5)

Mitacs Elevate Post-doctoral Fellowship Program (professional)


Awarded $57,000 over one year to support his research, Dr. Szlichta focused on big data integration and continuous data cleaning.

Post-doctoral Fellow, Department of Computer Science, University of Toronto (professional)


Appointed post-doctoral fellow to continue his research into big data analytics and data cleaning.

IBM CAS Research Student of the Year Award (professional)


Awarded to a student who has shown outstanding insight and perspective that has contributed to IBM in a matter of great importance. During his research fellowship, Dr. Szlichta worked closely with IBM on order dependencies in databases, and proved to be a key resource in developing a prototype to exploit order optimization in DB2, and to optimize date predicates using generated subqueries.

IBM CAS Research Fellowship (professional)


During his doctoral studies, Dr. Szlichta was appointed to a three-year research fellowship and awarded $102,000 within this highly competitive worldwide program, which honours exceptional doctoral students who have an interest in solving problems that are important in practice (and to IBM) and fundamental to innovation in many academic disciplines and areas of study.

2007 CeBIT Business Award (professional)


Awarded to Dr. Szlichta for his collaborative work in designing and implementing the OCEAN GenRap system, an innovative data analytics reporting solution. CeBIT is the world's largest international computer exhibition.

Education (2)

York University: PhD, Computer Science 2013

Warsaw University of Technology: MSE, Computer Science 2009

Languages (2)

  • English
  • Polish

Event Appearances (7)

Expressiveness and Complexity of Order Dependencies

University of Waterloo, Database Research Group Meeting  Waterloo, Ontario


Fundamentals of Order Optimization

Invited Talk, University of Waterloo  Waterloo, Ontario


Fundamentals and Applications of Order Dependencies

University of Toronto DB Seminar  Toronto, Ontario


Chasing Order Dependencies

Invited Talk, Carleton University  Ottawa, Ontario


Applications for Order Dependencies in IBM DB2

Invited Talk, IBM Research Almaden  San Jose, California


Optimizing Business-Intelligence Queries in DB2 with Order Dependencies

Invited Talk, AT&T Labs  New Jersey, United States


Queries With Dates

Invited Talk, Warsaw University of Technology  Warsaw, Poland


Research Grants (1)

Big Data Cleaning

NSERC Discovery Grant $90,000


As principal investigator, Dr. Szlichta leads a five-year, international research project on big data cleaning in partnership with the University of Waterloo, Ontario, AT&T in New York, and IBM CAS in Markham, Ontario.


Courses (5)

Computers and Media

1200U, 1st Year, Undergraduate Course (Elective)


Software Design and Analysis

2040U, 2nd Year, Undergraduate Course


Database Systems and Concepts

3030U, 3rd Year, Undergraduate Course


Big Data Analytics

4030U, 4th Year, Undergraduate Course


Advanced Topics in Information Science

CSCI 6720G, Graduate Course


Articles (5)

MeanKS: Meaningful Keyword Search in Relational Databases with Complex Schema

SIGMOD '14, Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data


This research demonstrates MeanKS, a new system for meaningful keyword search over relational databases. The system first captures the user's interest by determining the roles of the keywords. Then, it uses schema-based ranking to rank join trees that cover the keyword roles. This uses the relevance of relations and foreign-key relationships in the schema over the information content of the database.


Continuous Data Cleaning

2014 IEEE 30th International Conference on Data Engineering (ICDE)


In declarative data cleaning, data semantics are encoded as constraints and errors arise when the data violates the constraints. This research introduces a continuous data cleaning framework that can be applied to dynamic data and constraint environments. The approach permits both the data and its semantics to evolve and suggests repairs based on the accumulated evidence to date. Importantly, the approach uses not only the data and constraints as evidence, but also considers the past repairs chosen and applied by a user (user repair preferences).
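The declarative idea in this abstract can be illustrated with a minimal Python sketch: one constraint is encoded as a functional dependency (rows that agree on `zip` should agree on `city`), and rows that violate it are reported as errors. The sample rows, the column names and the `find_fd_violations` helper are hypothetical illustrations, not the published framework, which additionally handles evolving constraints and user repair preferences.

```python
from collections import defaultdict


def find_fd_violations(rows, lhs, rhs):
    """Return groups of rows that agree on `lhs` but disagree on `rhs`.

    A non-empty result means the functional dependency lhs -> rhs is violated.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key].append(row)

    violations = {}
    for key, members in groups.items():
        # Collect the distinct right-hand-side values seen for this key.
        rhs_values = {tuple(r[a] for a in rhs) for r in members}
        if len(rhs_values) > 1:
            violations[key] = members
    return violations


# Made-up sample data: the second row contains a misspelled city.
rows = [
    {"zip": "L1G 0C5", "city": "Oshawa"},
    {"zip": "L1G 0C5", "city": "Oshwa"},
    {"zip": "M5S 2E4", "city": "Toronto"},
]

violations = find_fd_violations(rows, lhs=["zip"], rhs=["city"])
for key, members in violations.items():
    print(key, [m["city"] for m in members])
```

A repair step would then choose which conflicting value to keep; in the approach described above, that choice is informed by the repairs a user has accepted in the past.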


Expressiveness and Complexity of Order Dependencies

Proceedings of the VLDB Endowment (journal)


Dependencies play an important role in databases. This paper studies order dependencies (ODs), and unidirectional order dependencies (UODs), a proper subclass of ODs, which describe the relationships among lexicographical orderings of sets of tuples. Lexicographical ordering, as produced by the order-by operator in SQL, is considered because it is the notion of order used in SQL and within query optimization. The main goal is to investigate the inference problem for ODs, both in theory and in practice. The paper also shows the usefulness of ODs in query optimization.
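To make the notion concrete, here is a small Python sketch that checks whether an order dependency holds on a table instance, assuming the usual reading that a list of attributes X orders a list Y when, for every pair of tuples s and t, s preceding or tying with t lexicographically on X implies the same on Y. The `od_holds` helper and the sample rows are hypothetical, and the pairwise check is deliberately naive; it is not the inference procedure studied in the paper.

```python
from itertools import product


def od_holds(rows, xs, ys):
    """Check the order dependency xs -> ys on `rows` by comparing all pairs.

    Python compares tuples lexicographically, which matches the ordering
    produced by an SQL ORDER BY over the same attribute list.
    """
    def proj(row, attrs):
        return tuple(row[a] for a in attrs)

    return all(
        proj(s, ys) <= proj(t, ys)
        for s, t in product(rows, repeat=2)
        if proj(s, xs) <= proj(t, xs)
    )


# Made-up sample data; ISO date strings sort in calendar order.
rows = [
    {"date": "2013-11-30", "year": 2013, "month": 11},
    {"date": "2013-12-01", "year": 2013, "month": 12},
    {"date": "2014-01-15", "year": 2014, "month": 1},
]

print(od_holds(rows, ["date"], ["year", "month"]))  # date orders (year, month)
print(od_holds(rows, ["date"], ["month"]))          # date does not order month alone
```

When a dependency such as the first one is known to hold, a query optimizer can, for example, satisfy an ordering on year and month from data already sorted by date instead of sorting again.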


Fundamentals of Order Dependencies

Proceedings of the VLDB Endowment (journal)


Dependencies have played a significant role in database design for many years. They have also been shown to be useful in query optimization. This paper discusses dependencies between lexicographically ordered sets of tuples. It formally introduces the concept of order dependency and presents a set of axioms (inference rules) for order dependencies. Additionally, it shows how query rewrites based on these axioms can be used for query optimization.


Queries on Dates: Fast yet not Blind

Proceedings of the 14th International Conference on Extending Database Technology


Data warehouses are repositories of electronically stored data which are designed to support reporting and analysis. The analysis of historical data often involves aggregation over time. Thus, time is critical in the design of a data warehouse. This research describes novel techniques for storing date information and optimization of queries that reference the date dimension. It shows how to embed intelligence into the date key and how to exploit monotonic dependencies. This research presents the value of these techniques for the improvement of performance when combined with partitioning and indexes.
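One way to picture "embedding intelligence into the date key" is a key whose integer value encodes the calendar date, so that key order matches date order and a date-range predicate can be evaluated directly on the key. The Python sketch below assumes a `yyyymmdd` integer encoding; that encoding, the helper names and the sample rows are assumptions made for illustration and are not taken from the paper.

```python
from datetime import date


def date_key(d: date) -> int:
    """Encode a date as the integer yyyymmdd, so key order matches date order."""
    return d.year * 10000 + d.month * 100 + d.day


def month_key_range(year: int, month: int) -> tuple[int, int]:
    """Inclusive key bounds that cover every possible day of the given month."""
    base = year * 10000 + month * 100
    return base + 1, base + 31


# Made-up fact rows keyed by the encoded date instead of an opaque surrogate.
sales = [
    (date(2013, 11, 30), 120.0),
    (date(2013, 12, 1), 80.0),
    (date(2013, 12, 24), 45.5),
]
fact_rows = [(date_key(d), amount) for d, amount in sales]

# A predicate on the month becomes a range predicate on the key, so the
# fact table can be filtered without joining to a separate date dimension.
low, high = month_key_range(2013, 12)
december_total = sum(amount for key, amount in fact_rows if low <= key <= high)
print(december_total)
```

Because the key is monotonic in the date, the same range can also drive partition pruning or an index range scan, which is where the abstract reports the performance benefit.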
