Democratizing AI for Data Science

Artificial Intelligence will play a key role in the future of data science, technology, and business, and this intelligence must be accessible to those who want it.

When we look at the most influential companies in the world, such as Microsoft, Facebook, Google, Amazon and Netflix, they share a key similarity: they all own a lot of data and have mastered the art of applying AI to it!

As much as everyone talks about AI, using buzzwords like “Data Science”, “Machine Learning” and “Deep Learning”, only a small number of companies are really using AI as part of their core business.

A study by research-led venture capital firm MMC Ventures showed that, in Europe, only 60 percent of start-ups classified as AI companies were actually using AI in a way that’s material to their value proposition[1]. If we were to include all start-ups and established companies, this percentage would drop even further.

Initially, this may seem surprising. But is it really? Implementing AI solutions requires a wide range of highly complex skills, including coding, statistics, data analysis and domain-specific knowledge. There aren’t many people qualified to do this work, and those who are come at a high cost. On top of that, large amounts of data are needed to train AI models. Setting up the required data infrastructure demands an initial investment that many companies are still reluctant to make, despite the business benefits the technology brings.

To leverage the full power of AI and address these challenges, the technology needs to become accessible to a wider range of businesses; in other words, we need to tackle the “democratization of AI”. This article will address this democratization (especially the Machine Learning part of it), the steps required to get there, and how to mitigate the risks it brings along.

Some terminology first

Before we dive straight into the discussion, let’s clarify a few key terms and how they relate to each other. Some of the buzzwords are actually just subsets of one another: Deep Learning is a subset of Neural Networks, which are a subset of Machine Learning, which is in turn a subset of AI. AI is a broad term that describes any technique that attempts to approximately mimic human capabilities. Predictive models, which we will discuss later on, fall specifically into the category of Machine Learning, where an algorithm learns from historical data and discovers patterns in order to make predictions based on new data.
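To make that learning loop concrete, here is a minimal sketch in Python, assuming scikit-learn and a synthetic dataset standing in for real business records: a model is fitted on historical, labeled data and then used to score data it has never seen.

```python
# A minimal sketch of the Machine Learning loop: learn patterns from
# historical (labeled) data, then predict outcomes for new data.
# Synthetic data stands in for a real business dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=5, random_state=42)

# Hold out part of the "history" to check how well the patterns generalize.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)                 # learning from historical data
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Once trained, the model can score rows it has never seen before.
new_predictions = model.predict(X_test[:5])
```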

“Democratization of Artificial Intelligence” – what does it mean and why is it needed?

The democratization of AI means making it more accessible to a wider range of businesses and business users. Currently, not many people have the background to understand AI applications, but everyone should be able to benefit from the power of AI because, in the end, “knowledge is power.” That power currently sits in the hands of a select few, which is why it must be spread out to reach more people.

Making AI accessible will increase the number of people that can interact with it. This expansion allows applications to spread to new sectors and frees up AI experts’ time to work on cutting-edge developments.


Steps towards the democratization of Artificial Intelligence

1) Data accessibility and quality

“Data is the new oil”, “Your results are only as good as your data”, “Garbage in, garbage out”, etc. – we’ve heard these statements many times, but the reality is that maintaining the organization and quality of business data is still a big challenge. Collecting large amounts of data has become gradually easier and more affordable over the years. However, most companies are still in one of these three situations:

  • They have limited data availability, making it hard to build accurate AI models.
  • They have poor data quality, making the resulting AI models unstable and potentially misleading.
  • Their data is poorly managed and badly organized, making it time-consuming and costly to automate processes and produce AI models.

AI models only give reliable results if they learn from the right data. Democratizing the modeling part of the process will not be enough if we cannot also democratize the data management part of it.
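As an illustration, even a handful of automated checks can surface all three situations before the data ever reaches a model. The sketch below uses pandas on a toy table; the column names are hypothetical stand-ins for your own business data, which you would load instead (e.g. from a CSV file).

```python
# A sketch of a quick data-quality audit with pandas. The inline frame
# is a toy stand-in; in practice you would load your own data,
# e.g. df = pd.read_csv("sales_data.csv").
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "revenue": [100.0, np.nan, 250.0, 250.0],
    "region": ["EU", "EU", None, "US"],
    "source_system": ["crm", "crm", "crm", "crm"],  # constant, uninformative
})

# 1) Limited availability: is there enough data to model at all?
print(f"rows: {len(df)}, columns: {df.shape[1]}")

# 2) Poor quality: share of missing values per column, and duplicates.
print(df.isna().mean().sort_values(ascending=False))
print("duplicate customer ids:", df["customer_id"].duplicated().sum())

# 3) Poor organization: constant columns that carry no information.
constant_cols = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
print("constant columns:", constant_cols)
```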

It’s essential that a culture of data preservation, quality assurance and organization is taught at all levels, from basic to higher education, extending beyond the field of data science and percolating into other areas, particularly the business world. Training courses for companies on how to value and properly treat their data are a viable shortcut, while the education of the rest of the population might take decades.

2) User-friendly interfaces

How many people could operate computers when they were first invented? Not many! Nowadays, even toddlers can engage with iPads, and the one big difference that made this possible is the evolution of the user interface over the years.

The average Data Scientist relies heavily on coding skills throughout an analytics project. Since coding can be intimidating at first, simpler, user-friendly interfaces on top of the coding tools allow the less tech-savvy to interact with their data as well. With large cloud providers like Microsoft leading the way, more and more self-service analytics platforms (e.g. Azure ML) are appearing on the market, making interaction with AI more intuitive.

3) Explanation of results

Let’s assume you have overcome the initial challenges of accessing, collecting and cleaning your data, and you have built your first predictive model. You now have to go back to the business and present your results. How do you persuade someone to trust and act on the predictions of your model? Explaining the results is key! You may need to sacrifice some model accuracy in favor of easy-to-understand outputs.

Using a black-box model – a model where you know the input and you get an output, but you don’t know what happens in between – might not be a feasible solution, since you not only want the business to trust your results, you also want them to act on them. Understanding the drivers of a certain behavior is what allows you to take the correct actions and create a real impact on your business. So, instead of focusing only on the data that directly describes what you’re investigating, think of additional information that could drive the behavior. This is why it’s important to work in cross-functional teams with subject matter experts from the business unit.
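One common way to trade a little accuracy for explainability is to fit an inherently transparent model and read its drivers directly. Below is a minimal sketch, assuming scikit-learn; the feature names and synthetic labels are hypothetical stand-ins for the business drivers a subject matter expert would suggest.

```python
# A sketch of an interpretable alternative to a black-box model.
# Feature names and labels are hypothetical, synthetic stand-ins.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, 500),
    "support_tickets": rng.integers(0, 10, 500),
    "discount_used": rng.integers(0, 2, 500),
})
df["churned"] = (df["support_tickets"] > 5).astype(int)  # synthetic label

features = ["tenure_months", "support_tickets", "discount_used"]
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(df[features], df["churned"])

# Standardized coefficients: sign and magnitude show each driver's
# direction and relative weight - output a black box cannot offer.
coefs = pd.Series(model.named_steps["logisticregression"].coef_[0],
                  index=features)
print(coefs.sort_values())
```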

Mitigating the risks of AI democratization

As always, innovation and change don’t come without risk. But this shouldn’t stop us from widening our horizons. We just need to learn how to control the risks.

Let’s fully automate the whole process from data processing to modeling and let everyone build their own personal predictive models. “What could go wrong?” you might think. Well, as mentioned above, a wide range of highly technical skills is required to get viable results from your data. Certain parts can be facilitated or automated using simple drag-and-drop, out-of-the-box modeling functionalities; however, in the end, a machine just follows rules. If you cheat in the way you set up your problem (and this can happen unintentionally), the machine will learn and apply the wrong rules and therefore produce unusable results.

Catching these mistakes, or sometimes even identifying the limitations of an algorithm, requires specialized data science knowledge. Making AI more easily accessible to people without that knowledge may lead to a false interpretation of results. There is no single metric that states which model to use and how well it will perform. It’s a matter of experimenting and comparing different algorithms with different parameters and metrics, and, in the end, it all comes down to experience and specialized know-how.
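A classic example of such unintentional “cheating” is data leakage, where information from the evaluation data bleeds into training. The minimal sketch below, using scikit-learn on toy data, contrasts the common mistake of fitting preprocessing before the train/test split with the correct order:

```python
# A sketch of data leakage: fitting preprocessing on ALL the data lets
# test-set statistics leak into training and inflates evaluation scores.
# Toy data stands in for a real dataset.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 3)), rng.integers(0, 2, size=200)

# WRONG: the scaler sees the test rows before the split.
X_leaky = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_leaky, y, random_state=0)

# RIGHT: split first, then fit preprocessing on the training rows only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```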

So, what can we do to avoid this? First of all, it’s important to share the knowledge of data science and ensure that people who interact with self-service analytics platforms have a basic understanding of what goes on behind the simple and user-friendly interface provided by the platform.

Second, instead of trying to democratize AI in general, we should focus on a few applications and make those as accessible as possible. Automating some of the more straightforward and widely used use cases, such as predicting churn or credit default, would allow business users to focus their efforts on learning how to correctly interpret and assess the results, while businesses could focus their limited, experienced data science resources on more complex use cases.
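As a sketch of what such a standardized, automated use case could run behind the scenes, the snippet below, assuming scikit-learn and toy churn-like data, cross-validates a few candidate models and reports a single comparable metric, leaving the interpretation of the scores to a trained business user:

```python
# A sketch of an automated use case: cross-validate several candidate
# models on the same data and report one comparable metric for each.
# Toy data stands in for a real churn dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {scores.mean():.3f}")
```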


Conclusion

Democratizing AI isn’t an easy and straightforward process that will happen overnight, and it definitely doesn’t come without risks. However, one thing is certain: one way or another, it’s going to happen. So, if you want your business to join the circle of the most influential companies in the world and succeed, you need to commit to embedding AI across your business functions.

[1] MMC Ventures, “The State of AI 2019: Divergence”, p. 99. https://www.mmcventures.com/wp-content/uploads/2019/02/The-State-of-AI-2019-Divergence.pdf

*This blog post was first published at IoT for All.
