Work Experience

Data Scientist

2018 - Present

As the Data Scientist in the Analytics team, I deliver insights to various stakeholders within the organisation, as well as tools that help our customers become better investors. This work takes different forms, such as building recommender systems using matrix factorisation models, helping with campaign evaluation using Causal Impact modeling, or performing significance testing of marketing campaigns.
I have implemented regression and classification models for different tasks, such as budget allocation. These models are typically built using frameworks such as scikit-learn, CatBoost, XGBoost and Keras.
During a platform upgrade, I helped evaluate the expected performance of the system using Prophet forecasting.
I developed a conversion tool that converts files from the closed QlikView data format (QVD) to the open Parquet format, to ease migration tasks.
I have been part of the Non-Maturity Deposit risk modeling team.
During this time, I have also worked with the Google Cloud Platform, where I have built ETL pipelines using, among other components, Cloud Functions, BigQuery, AI Notebooks (Jupyter), Cloud Composer (Apache Airflow) and Cloud Dataflow (Apache Beam). Increasingly, we use Terraform to keep our growing cloud infrastructure in version-controlled code.

    Data Engineer (Consultant)

    2018 - 2018

    As part of the Exploratory Analytics team, I worked on building a platform to enable Data Scientists and Analysts by giving them access to data from various sources in the Azure cloud platform.
    The service was built using components such as Azure Functions, Azure Data Factory, Azure Data Lake Analytics and Azure SQL Data Warehouse. Development consisted mostly of SQL (T-SQL, U-SQL with C#/LINQ and Databricks/Spark SQL), with some programming in NodeJS.

      Data Scientist (Consultant)

      2017 - 2018
      The work consisted of building prediction models relating to patient admission and discharge data at two acute medical units in Sweden. Using Python and various machine learning frameworks, such as scikit-learn, XGBoost, CatBoost and Keras, we built models using different algorithms, from linear regression and k-nearest neighbour to gradient boosting and recurrent neural networks.

        Lead in Big Data Analysis and Visualisation (Consultant)

        2017 - 2017

        Part of a team working with cloud solutions in the adtech field, developing a new metric ("Seen") within digital marketing funnels.
        Responsible for setting up a scalable solution for analysis of user behavioural data for reporting and visualisation, with GDPR compliance in mind.
        Development in AWS, using SQL, Python and NodeJS, and AWS services: Kinesis, Athena and Lambda. Development followed a "GitOps/infrastructure-as-code" methodology.

          Consultant in Data Science

          2017 - 2018

          (Consulting highlights at various companies in Sweden are described separately; see above.)
          Work also included shorter consultations involving design and implementation of ETL pipelines on the Azure cloud platform. One example is a real-time pipeline visualising patient blood pressure data using Azure Functions, Azure Stream Analytics and Power BI.

            Data Scientist / Information Analyst

            2013 - 2017
            I was responsible for turning our data into one of our most valued assets while respecting the integrity of our customers. The work involved mining for patterns and correlations in transactional data, including basket analysis on receipt data using the Apriori algorithm, and using bandit algorithms as a more dynamic and adaptive alternative to A/B testing within the SEQR mobile app. The work also included the development of an analytics platform, including ETL pipelines.

              Software Developer

              2012 - 2013
              As part of a cross-functional team developing the CSCF component (a core component of Ericsson's LTE offering), I worked on adapting the build system and the testing frameworks (functional and system) to a Continuous Integration (Jenkins) environment.

                The overall research goal was to enhance the performance and usability of the automatic program discovery system called Grammatical Evolution.

                Highlights

                • Evaluation of various target language backends.
                • Substantial speedup: run time reduced from over an hour to single-digit minutes.
                • Code was mainly written in C/C++.
                • Evaluation using a large Beowulf cluster.
                • Analysis of large data sets.

                Assistant Lecturer & Teaching Assistant

                2002 - 2007
                Teaching students in various courses in the field of computer science and information technology, including the introductory course. The tasks involved lecturing, tutoring, construction of exam questions and correcting exams.

                  Software Developer

                  2000 - 2002
                  Designed and implemented a customer-praised GUI for a classification, regression and data mining application (Virtual Predict). The GUI included preprocessing features, such as feature selection and principal component analysis, and also a visualization framework.

                  Highlights

                  • Roles: Software Architect, Developer, Tester.
                  • Designed and implemented a client/server protocol.
                  • Implementation mostly in Java and Swing.
                  • Agile development model.
                  • First to migrate to Linux and OS X.
                  • Introduced open-source tools to the team (CVS, Doxygen, NetBeans).

                  Publications

                  Candidate Oversampling Prefers Two to Tango

                  GECCO '11: Proceedings of the 13th annual conference companion on Genetic and Evolutionary Computation
                  David Wallin, Conor Ryan and R. Muhammad Atif Azad.

                  Evaluation of Population Partitioning Schemes in Bayesian Classifier EDAs

                  Proceedings of the 11th Annual conference on Genetic and Evolutionary Computation
                  David Wallin and Conor Ryan.

                  Using Over-Sampling in a Bayesian Classifier EDA to Solve Deceptive and Hierarchical Problems

                  2009 IEEE Congress on Evolutionary Computation
                  David Wallin and Conor Ryan.

                  Diversity in Discrete EDAs on Real-Valued and Dynamic Problems

                  Soft Computing, Springer Verlag.
                  David Wallin and Conor Ryan.

                  Maintaining Diversity in EDAs for Real-Valued Optimisation Problems

                  Proceedings of Frontiers in the Convergence of Bioscience and Information Technologies 2007 (FBIT2007)
                  David Wallin and Conor Ryan. Nominated for Best Paper award.

                  On the Diversity of Diversity

                  Proceedings of the 2007 Congress on Evolutionary Computation (CEC 2007)
                  David Wallin and Conor Ryan.

                  Effect of Endosymbiosis in the Symbiogenetic Coevolutionary Algorithm

                  Proceedings of the 7th International Conference on Artificial Evolution (EA'05).
                  David Wallin, Conor Ryan and R. Muhammad Atif Azad

                  Symbiogenetic Coevolution

                  Proceedings of the 2005 IEEE International Conference on Evolutionary Computation
                  David Wallin, Conor Ryan and R. Muhammad Atif Azad

                  Non-stationary Function Optimization Using Polygenic Inheritance

                  Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2003), Part II
                  Conor Ryan, J. J. Collins and David Wallin

                  Adaptation of Hyper Objects for Classification

                  Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), Graduate Student Workshop Program
                  David Wallin.

                  Skills

                  Machine learning


                  Keywords:
                  • Estimation of Distribution Algorithms
                  • Evolutionary Algorithms
                  • Genetic Algorithms
                  • Genetic Programming
                  • Artificial Neural Networks
                  • Grammatical Evolution
                  • Bayesian Networks
                  • Recommender Systems
                  • Bandit Algorithms
                  • Keras
                  • Jupyter

                  Programming


                  Keywords:
                  • Python
                  • R
                  • C/C++
                  • Common Lisp
                  • Java
                  • NodeJS/JavaScript
                  • [Julia]

                  Databases


                  Keywords:
                  • Postgres
                  • MySQL
                  • Apache Drill
                  • AWS Athena
                  • Azure Stream Analytics
                  • Azure SQL Data Warehouse
                  • BigQuery
                  • Oracle

                  Tools


                  Keywords:
                  • Google Cloud Platform
                  • AWS
                  • Azure
                  • Docker
                  • Ansible
                  • Terraform
                  • git
                  • Emacs