Zen and the Art of Dissatisfaction  – Part 22

Big Data, Deep Context

In this post, we explore what artificial intelligence (AI) algorithms, or rather – large language models – are, how they learn, and their growing impact on sectors such as medicine, marketing and digital infrastructure. We look into some prominent real‑world examples from the recent past—IBM’s Watson, Google Flu Trends, and the Hadoop ecosystem—and discuss how human involvement remains vital even as machine learning accelerates. Finally, we reflect on both the promise and the risks of entrusting complex decision‑making to algorithms.

Originally published in Substack: https://substack.com/inbox/post/168617753

Artificial intelligence algorithms function by ingesting training data, which guides their learning. How this data is acquired and labelled marks the key differences between various types of AI algorithms. An AI algorithm receives training data and uses it to learn. Once trained, the algorithm performs new tasks using that data as the basis for its future decisions.

AI in Healthcare: From Watson to Robot Doctors

Some algorithms are capable of learning autonomously, continuously integrating new information to adjust and refine their future actions. Others require a programmer’s intervention from time to time. AI algorithms fall into three main categories: supervised learning, unsupervised learning and reinforcement learning. The primary differences between these approaches lie in how they are trained and how they operate.

Algorithms learn to identify patterns in data streams and make assumptions about correct and incorrect choices. They become more effective and accurate the more data they receive—a process known as deep learning, based on artificial neural networks that distinguish between right and wrong answers, enabling them to draw better and faster conclusions. Deep learning is widely used in speech, image and text recognition and processing.

Modern AI and machine learning algorithms have empowered practitioners to notice things they might otherwise have missed. Herbert Chase, a professor of clinical medicine at Columbia University in New York, observed that doctors sometimes have to rely on luck to uncover underlying issues in a patient’s symptoms. Chase served as a medical adviser to IBM during the development of Watson, the AI diagnostic assistant.

IBM’s concept involved a doctor inputting, for example, three patient‑described symptoms into Watson; the diagnostic assistant would then suggest a list of possible diagnoses, ranked from most to least likely. Despite the impressive hype surrounding Watson, it proved inadequate at diagnosing actual patients. IBM therefore announced that Watson would be phased out by the end of 2023 and its clients encouraged to transition to its newer services.

One genuine advantage of AI lies in the absence of a dopamine response. A human doctor, operating via biological algorithms, experiences a rush of dopamine when they arrive at what feels like a correct diagnosis—but that diagnosis can be wrong. When doubts arise, the dopamine fades and frustration sets in. In discouragement, the doctor may choose a plausible but uncertain diagnosis and send the patient home.

An AI‑algorithm‑based “robot‑doctor” does not experience dopamine. All of its hypotheses are treated equally. A robot‑doctor would be just as enthused about a novel idea as about its billionth suggestion. It is likely that doctors will initially work alongside AI‑based robot doctors. The human doctor can review AI‑generated possibilities and make their own judgement. But how long will it be before human doctors become obsolete?

AI in Action: Data, Marketing, and Everyday Decisions

Currently, AI algorithms trained on large datasets drive actions and decision‑making across multiple fields. Robot‑doctors assisting human physicians and the self‑driving cars under development by Google or Tesla are two visible examples of near‑future possibilities—assuming the corporate marketing stays honest.

AI continues to evolve. Targeted online marketing, driven by social media data, is an example of a seemingly trivial yet powerful application that contributes to algorithmic improvement. Users may tolerate mismatched adverts on Facebook, but may become upset if a robot‑doctor recommends an incorrect, potentially expensive or risky test. The outcome is all about data—its quantity, how it is evaluated and whether quantity outweighs quality.

According to MIT economists Erik Brynjolfsson and Andrew McAfee (2014), in the 1990s only about one‑fifth of a company’s activities left a digital trace. Today, almost all corporate activities are digitised, and companies have begun to produce reports in language intelligible to algorithms. It is now more important that a company’s operations are understood by AI algorithms than by its human employees.

Nevertheless, vast amounts of data are still analysed using tools built by humans. Facebook is perhaps the most well‑known example of how our personal data is structured, collected, analysed and used to influence and manipulate opinions and behaviour.

Big Data Infrastructure

Jeff Hammerbacher—in a 2015 interview with Steve Lohr—helped introduce Hadoop in 2008 to manage the ever‑growing volume of data. Hadoop, developed by Mike Cafarella and Doug Cutting, is an open‑source variant of Google’s own distributed computing system. Initially named after Cutting’s child’s yellow toy elephant, Hadoop could process two terabits of data in two days. Two years later it could perform the same task in mere minutes.

At Facebook, Hammerbacher and his team constructed Hive, an application running on Hadoop. Now available as Apache Hive, it allows users without a computer science degree to query large processed datasets. During the writing of this post, generative AI applications such as ChatGPT (by OpenAI), Claude (Anthropic), Gemini (Google DeepMind), Mistral & Mixtral (Mistral AI), and LLaMA (Meta) have become available for casual users on ordinary computers.

A widely cited example of public‑benefit predictive data analysis is Google Flu Trends (GFT). Launched in 2008, GFT aimed to predict flu outbreaks faster than official healthcare systems by analysing popular Google search terms related to flu.

GFT successfully detected the H1N1 virus before official bodies in 2009, marking a major achievement. However, in the winter of 2012–2013, media coverage of flu induced a massive spike in related searches, causing GFT’s estimates to be almost twice the real figures. The Science article “The Parable of Google Flu” (Lazer et al., 2014) accused Google of “big‑data hubris”, although it conceded that GFT was never intended as a standalone forecasting tool, but rather as a supplementary warning signal (Raising the bar, Wikipedia).

Google’s miscalculation lay in its failure to interpret context. Steve Lohr (2015) emphasises that context involves understanding associations—a shift from raw data to meaningful information. IBM’s Watson was touted as capable of such contextual understanding, capable of linking words to appropriate contexts .

Watson: From TV champion to Clinical Tool, and sold for scraps!

David Ferrucci, a leading AI researcher at IBM, headed the DeepQA team responsible for Watson . Named after IBM’s founder Thomas J. Watson, Watson gained prominence after winning £1 million on Jeopardy! in 2011, defeating champions Brad Rutter and Ken Jennings.

Jennifer Chu‑Carroll, one of Watson’s Jeopardy! coaches, told Steve Lohr (2015) that Watson sometimes made comical errors. When asked “Who was the first female astronaut?”, Watson repeatedly answered “Wonder Woman,” failing to distinguish between fiction and reality.

Ken Jennings reflected that:

“Just as manufacturing jobs were removed in the 20th century by assembly‑line robots, Brad and I were among the first knowledge‑industry workers laid off by the new generation of ‘thinking’ machines… The Jeopardy! contestant profession may be the first Watson‑displaced profession, but I’m sure it won’t be the last.”

In February 2013, IBM announced that Watson’s first commercial application would focus on lung cancer treatment and other medical diagnoses—a real‑world “Dr Watson”—with 90% of oncology nurses reportedly following its recommendations at the time. The venture ultimately collapsed under the weight of unmet expectations and financial losses. In January 2022, IBM quietly sold the core assets of Watson Health to private equity firm Francisco Partners—reportedly for about $1 billion, a fraction of the estimated $4 billion it had invested—effectively signalling the death knell of its healthcare ambitions. The sale marked the end of Watson’s chapter as a medical innovator; the remaining assets were later rebranded under the name Merative, a standalone company focusing on data and analytics rather than AI‑powered diagnosis. Slate described the move as “sold for scraps,” characterising the downfall as a cautionary tale of over‑hyped technology failing to deliver on bold promises in complex fields like oncology.

Conclusion

Artificial intelligence algorithms are evolving rapidly, and while they offer significant benefits in fields like medicine, marketing, and data analysis, they also bring challenges. Data is not neutral: volume must be balanced with quality and contextual understanding. Tools such as Watson, Hadoop and Google Flu Trends underscore that human oversight remains indispensable. Ultimately, AI should augment human decision‑making rather than replace it—at least for now.


References

Brynjolfsson, E., & McAfee, A. (2014). The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. W. W. Norton & Company.

Ferrucci, D. A., Brown, E., Chu‑Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., … Welty, C. (2011). Building Watson: An overview of the DeepQA project. AI Magazine, 31(3), 59–79. (IBM Research)

Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203–1205. (Wikipedia)

Lohr, S. (2015). Data‑ism. HarperBusiness.

Mintz‑Oron, O. (2010). Smart Machines: IBM’s Watson and the Era of Cognitive Computing. Columbia Business School Publishing. [Referenced via IBM Watson bibliography] (TIME, Wikipedia)

Jätä kommentti