Are Startups Abusing the AI Concept?

In this column, I comment on the ways startups use the concept of AI, whether they abuse it, and whether the consequences are harmless.

Startups deploy a wide range of information technologies, and it is legitimate for them to call themselves AI-based companies even when there is no cutting-edge technology, no conceptual innovation, and no massive use of computational resources behind the label. However, there is evidence that some startups abuse AI labeling as a marketing strategy to impress, without AI playing any role in their alleged success.

My point is that it is legitimate to call a product AI as long as it relies on techniques usually referred to as AI and as long as those techniques help accomplish a business goal. The real dangers are that focusing too much on AI as a buzzword crowds out discussion of the more relevant issues arising from these technologies and confuses laypeople.

There are many definitions of AI, and it is not my goal here to trace a taxonomy or comment on historical aspects. It is essential, nonetheless, to distinguish between AI and the more ambitious Artificial General Intelligence (AGI) program, whose outcomes have not yet fully manifested (some would even argue that they never will).

Hereafter I will talk about vanilla AI. My opinion is influenced by a recent (perhaps controversial) definition of AI in a European Commission proposal that aims to define a regulatory framework for AI technologies to protect citizens:

“‘Artificial intelligence system’ (AI system) means software that is developed with one or more of the techniques and approaches listed in Annex I and can, for a given set of human-defined objectives, generate outputs such as content, predictions, recommendations, or decisions influencing the environments they interact with.”

The techniques described in Annex I are (a) machine learning approaches, (b) logic- and knowledge-based approaches, and (c) statistical and optimization approaches.

I praise the pragmatism of this definition, which avoids vagueness and science-fiction discourse. And there are reasons for it: although the technologies in (b) have historically been important, today they are not the most popular, and the preeminence of AI can be attributed mainly to the machine learning technologies in (a), which in turn are heavily based on and influenced by the fundamental mathematical and statistical approaches in (c).

Much of today’s discussion about whether a technology qualifies as AI-based has to do with the tension between (a) and (c). A comic by SandSerif eloquently points to this tension, a usual source of conflict.

Some members of the public, and academics working in less trendy research areas, constantly argue that machine learning is only glorified statistics: a hype that somehow ‘gentrifies’ more traditional fields without differing substantially from traditional theory and methods.

Similarly, critics complain that even simple techniques, like a regression that finds the line best fitting a set of points in order to relate variables, end up being sold as AI, which they consider dishonest. But according to the EU proposal, running a simple regression would still qualify as AI, as long as the final product of the inference influences the environment it interacts with.
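To make the point concrete, here is a minimal sketch (with invented numbers and a hypothetical approval rule; nothing here comes from a real product) of a plain least-squares regression whose prediction is turned into a recommendation, exactly the kind of output the proposal mentions:

```python
# A minimal sketch: ordinary least squares turned into a "recommendation".
# The data, budget figures, and approval threshold are invented for
# illustration only.
import numpy as np

# Hypothetical history: ad spend (in $1k) vs. monthly sign-ups.
ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
signups = np.array([12.0, 19.0, 33.0, 41.0, 48.0])

# Fit the line that best fits the points.
slope, intercept = np.polyfit(ad_spend, signups, deg=1)

# Generate a prediction and let it influence a decision.
proposed_budget = 6.0  # $6k
predicted = slope * proposed_budget + intercept
print(f"Predicted sign-ups at ${proposed_budget:.0f}k: {predicted:.0f}")
if predicted > 50:
    print("Recommendation: approve the budget increase.")
```

Read literally, the proposal’s definition would cover even this script: it is statistical, it serves a human-defined objective, and its output influences a decision.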

Criticisms of the hype around machine learning are certainly justified, but they sometimes become too extreme. Machine learning has become established as a scholarly discipline, and its constructions cannot be reduced to statistics because of the relevance it gives to computational aspects. Arguably, it is the explosion in computational power over the past years that best explains the recent success of AI and its preeminence in society. This means that machine learning and statistics are not the same, neither in their foundations nor in their scope.

An apparent paradox arises once we group both under the same AI umbrella; calling two different things by the same name does not serve the distinction. But there is no problem in saying that linear regression is AI if its outcome will help humans in cognitively demanding tasks. Besides, there is rising advocacy for making AI methods as simple as possible in order to preserve interpretability.

Even a seemingly simple regression can be quite a complex object when stated in more modern setups. For example, a fundamental problem in statistical genetics is to determine the extent to which different variants of a gene may increase the risk of certain diseases. This question is typically stated in terms of regressions: one aims to find how individual variants influence disease risk. However, answering it is not straightforward, because ‘false discoveries,’ or spurious findings, readily appear when a large number of regressions are inferred at the same time.
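Here is a minimal sketch of that pitfall on simulated data in which no variant truly affects the trait: running one regression per variant produces a flood of spurious ‘significant’ hits at the usual 5% level, which a standard multiple-testing correction (Benjamini–Hochberg, chosen here purely for illustration) largely removes:

```python
# A minimal sketch of the multiple-testing problem, on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_individuals, n_variants = 500, 2000

# Simulated genotypes (0, 1, or 2 copies of a variant) with NO true
# effect on the trait, which is drawn independently.
genotypes = rng.integers(0, 3, size=(n_individuals, n_variants))
trait = rng.normal(size=n_individuals)

# One simple regression per variant: trait ~ variant.
p_values = np.array([
    stats.linregress(genotypes[:, j], trait).pvalue
    for j in range(n_variants)
])

raw_hits = np.sum(p_values < 0.05)  # spurious "discoveries" at 5%

# Benjamini-Hochberg: keep the largest k with p_(k) <= 0.05 * k / m.
ranks = np.arange(1, n_variants + 1)
below = np.sort(p_values) <= 0.05 * ranks / n_variants
bh_hits = ranks[below].max() if below.any() else 0

print(f"Raw p < 0.05: {raw_hits} of {n_variants} (all false positives)")
print(f"After Benjamini-Hochberg correction: {bh_hits}")
```

With these settings, roughly a hundred variants look ‘significant’ before correction, even though none has any real effect; the correction brings that to (essentially) zero.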

Hype

Finally, there is a more practical aspect: is it legitimate to call my startup AI-based if I am only using the same linear regression I would have used ten years ago? Here, it is essential to understand the incentives: speaking about AI makes you look cool, and you immediately become an agent of the current technological revolution. A few years ago, AI pioneer Andrew Ng enthusiastically declared that “AI is the new electricity,” fueling an arms race to be “the first to use AI for” whatever comes next.

In fact, selling technologies as AI-based has been a useful marketing strategy. Reportedly, a few years ago, startups that labeled themselves as being in the AI field obtained between 15% and 50% more money in funding rounds than startups in any other field.

But we have also learned about the scope and limitations of these technologies: it was also revealed that nearly half of the startups that claimed to be using AI were not using it in any substantial part of their production process. Besides, there have been rather scandalous cases. For example, it was pointed out that, on prediction tasks in medicine (e.g., from electronic health records), the most sophisticated deep learning methods were not superior to much simpler methods, indicating that there may not have been such a need for computational sophistication.

All this suggests that it should no longer be easy to appeal to the AI buzzword for fundraising purposes, although it possibly remains an effective means to impress friends, family, and laypeople. Because of the natural delay between developed and emerging economies in adopting such technologies, there might still be room among the latter (e.g., Latin America) for profiting from the hype, but this should not be the case for much longer.

The question of the legitimacy of the AI label also has to do with the problem being solved, with how the methods (simple or not) are instrumental in solving it, and with the means used to achieve these goals. For example, if someone develops an AI-based technology that detects skin cancer from images, then even more important than the neural network architecture is having privileged access to a sufficiently large training dataset.

After all, the value of an AI technology is not determined by whether it is based on a linear regression or a transformer. In the case of NotCo, one of the most successful startups in Latin America and one that claims to be AI-based, more important than Giuseppe (as they named their AI) is the existence of an entire food development platform, so that all suggestions made by the AI can materialize quickly.

(Image: Adobe Stock)

To cite a controversial case: there has been recent concern about Clearview AI, a facial recognition company, after it was disclosed that, in order to achieve high accuracy levels, the company had to scrape billions of images from the web in a potentially illegal way. We can argue that the AI corresponds not only to the algorithms but also to their “fuel,” i.e., the images.

Cultural aspects

It is no coincidence that Rosalía mentions AI in one of her recent songs; it shows how deeply AI has penetrated culture. In this realm, the problems associated with language abuse finally manifest as misinformation. For example, there is a widespread belief that the impressive progress of Boston Dynamics’ robots can be attributed to their use of AI, when in reality it is due to improvements in classical tools of control and mechanical engineering. This suggests that, to avoid contributing to such misinformation, it is better not to use words merely because they are buzzwords.

Michael I. Jordan, one of the pioneers of statistics and machine learning, wrote an influential 2019 article, “Artificial Intelligence: The Revolution Hasn’t Happened Yet.” His point is that the biggest problem with the abuse of the concept is that it distracts us from more important matters: speaking of AI frames the debate around exuberant sci-fi fantasy and inhibits discussion of the real problems arising from these technologies (for him, AI is simply a new type of human-centered engineering): reliability, privacy, robustness, fairness, and so on. Three years later, there do not seem to have been significant changes in this direction.
