Zen and the Art of Dissatisfaction – Part 22

Big Data, Deep Context

In this post, we explore what artificial intelligence (AI) algorithms, or rather large language models, are, how they learn, and their growing impact on sectors such as medicine, marketing and digital infrastructure. We look into some prominent real‑world examples from the recent past—IBM’s Watson, Google Flu Trends, and the Hadoop ecosystem—and discuss how human involvement remains vital even as machine learning accelerates. Finally, we reflect on both the promise and the risks of entrusting complex decision‑making to algorithms.

Originally published on Substack: https://substack.com/inbox/post/168617753

Artificial intelligence algorithms function by ingesting training data, which guides their learning. How this data is acquired and labelled marks the key difference between various types of AI algorithms. Once trained, an algorithm applies what it has learned to new tasks, using the training data as the basis for its future decisions.

AI in Healthcare: From Watson to Robot Doctors

Some algorithms are capable of learning autonomously, continuously integrating new information to adjust and refine their future actions. Others require a programmer’s intervention from time to time. AI algorithms fall into three main categories: supervised learning, unsupervised learning and reinforcement learning. The primary differences between these approaches lie in how they are trained and how they operate.
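To make those three categories concrete, here is a minimal sketch. The use of scikit-learn and the toy numbers are illustrative choices on our part; the post names no specific tools:

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Supervised learning: every training example carries a human-provided label.
X = [[0.0], [0.5], [2.5], [3.0]]
y = [0, 0, 1, 1]                          # the labels guide the training
clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.8]]))               # decision on a new, unseen input

# Unsupervised learning: no labels; the algorithm finds structure on its own.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                         # groupings discovered in the data

# Reinforcement learning (not shown): the algorithm acts in an environment
# and adjusts its behaviour according to the rewards it receives.
```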

Algorithms learn to identify patterns in data streams and make assumptions about correct and incorrect choices, becoming more effective and accurate the more data they receive. Deep learning takes this further: artificial neural networks, built from many stacked layers, learn to distinguish between right and wrong answers, enabling them to draw better and faster conclusions. Deep learning is widely used in speech, image and text recognition and processing.
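A tiny, self-contained sketch of that idea: a small neural network nudging its weights until its answers match the labelled "right" answers. The XOR task and all numbers here are invented for illustration; real deep-learning systems stack many more layers over vastly more data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # the "right" answers (XOR)

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)    # hidden layer of 8 units
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)    # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10_000):                  # repeated exposure to the data
    h = sigmoid(X @ W1 + b1)             # hidden activations
    out = sigmoid(h @ W2 + b2)           # the network's current answers
    d_out = (out - y) * out * (1 - out)  # error signal at the output
    d_h = d_out @ W2.T * h * (1 - h)     # error pushed back a layer
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round().ravel())               # typically converges to [0, 1, 1, 0]
```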

Modern AI and machine learning algorithms have empowered practitioners to notice things they might otherwise have missed. Herbert Chase, a professor of clinical medicine at Columbia University in New York, observed that doctors sometimes have to rely on luck to uncover underlying issues in a patient’s symptoms. Chase served as a medical adviser to IBM during the development of Watson, the AI diagnostic assistant.

IBM’s concept involved a doctor inputting, for example, three patient‑described symptoms into Watson; the diagnostic assistant would then suggest a list of possible diagnoses, ranked from most to least likely. Despite the impressive hype surrounding Watson, it proved inadequate at diagnosing actual patients. IBM therefore announced that Watson would be phased out by the end of 2023 and its clients encouraged to transition to its newer services.
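In spirit, the concept is easy to sketch. Everything below is hypothetical: the symptom weights are invented, and IBM's actual DeepQA pipeline was incomparably more elaborate.

```python
# Hypothetical toy illustration of the concept: symptoms in, ranked
# diagnoses out. All names and scores are invented.
def rank_diagnoses(symptoms):
    # Invented symptom -> diagnosis evidence weights.
    evidence = {
        "fever":    {"influenza": 0.6, "pneumonia": 0.3, "migraine": 0.05},
        "cough":    {"influenza": 0.4, "pneumonia": 0.5, "migraine": 0.0},
        "headache": {"influenza": 0.3, "pneumonia": 0.1, "migraine": 0.7},
    }
    scores = {}
    for s in symptoms:
        for diagnosis, weight in evidence.get(s, {}).items():
            scores[diagnosis] = scores.get(diagnosis, 0.0) + weight
    # Most likely first; every hypothesis stays in the list.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_diagnoses(["fever", "cough", "headache"]))
```

Note that every hypothesis remains in the ranked output, which is exactly the point made below about the absence of a dopamine response.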

One genuine advantage of AI lies in the absence of a dopamine response. A human doctor, operating via biological algorithms, experiences a rush of dopamine when they arrive at what feels like a correct diagnosis—but that diagnosis can be wrong. When doubts arise, the dopamine fades and frustration sets in. In discouragement, the doctor may choose a plausible but uncertain diagnosis and send the patient home.

An AI‑algorithm‑based “robot‑doctor” does not experience dopamine. All of its hypotheses are treated equally. A robot‑doctor would be just as enthused about a novel idea as about its billionth suggestion. It is likely that doctors will initially work alongside AI‑based robot doctors. The human doctor can review AI‑generated possibilities and make their own judgement. But how long will it be before human doctors become obsolete?

AI in Action: Data, Marketing, and Everyday Decisions

Currently, AI algorithms trained on large datasets drive actions and decision‑making across multiple fields. Robot‑doctors assisting human physicians and the self‑driving cars under development by Google and Tesla are two visible examples of near‑future possibilities—assuming the corporate marketing stays honest.

AI continues to evolve. Targeted online marketing, driven by social media data, is an example of a seemingly trivial yet powerful application that contributes to algorithmic improvement. Users may tolerate mismatched adverts on Facebook, but may become upset if a robot‑doctor recommends an incorrect, potentially expensive or risky test. The outcome is all about data—its quantity, how it is evaluated and whether quantity outweighs quality.

According to MIT economists Erik Brynjolfsson and Andrew McAfee (2014), in the 1990s only about one‑fifth of a company’s activities left a digital trace. Today, almost all corporate activities are digitised, and companies have begun to produce reports in language intelligible to algorithms. It is now more important that a company’s operations are understood by AI algorithms than by its human employees.

Nevertheless, vast amounts of data are still analysed using tools built by humans. Facebook is perhaps the most well‑known example of how our personal data is structured, collected, analysed and used to influence and manipulate opinions and behaviour.

Big Data Infrastructure

As Jeff Hammerbacher recounted in a 2015 interview with Steve Lohr, he helped introduce Hadoop in 2008 to manage the ever‑growing volume of data. Hadoop, developed by Mike Cafarella and Doug Cutting, is an open‑source counterpart to Google’s in‑house distributed computing system. Named after the yellow toy elephant belonging to Cutting’s son, Hadoop could initially process two terabytes of data in two days; two years later it could perform the same task in mere minutes.

At Facebook, Hammerbacher and his team constructed Hive, an application running on Hadoop. Now available as Apache Hive, it allows users without a computer science degree to query large processed datasets. At the time of writing, generative AI applications such as ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google DeepMind), Mistral & Mixtral (Mistral AI) and LLaMA (Meta) have become available to casual users on ordinary computers.
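The point of Hive is that analysts write familiar SQL-style queries (HiveQL) instead of low-level MapReduce jobs. The sketch below conveys the flavour using Python's built-in sqlite3 as a stand-in, since we cannot assume a Hadoop cluster here; the table and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [(1, "/home"), (2, "/home"), (1, "/about"), (3, "/home")],
)

# A HiveQL query over a warehouse table would read essentially like this SQL.
for url, views in conn.execute(
    "SELECT url, COUNT(*) AS views FROM page_views "
    "GROUP BY url ORDER BY views DESC"
):
    print(url, views)
```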

A widely cited example of public‑benefit predictive data analysis is Google Flu Trends (GFT). Launched in 2008, GFT aimed to predict flu outbreaks faster than official healthcare systems by analysing popular Google search terms related to flu.

GFT successfully detected the H1N1 virus before official bodies in 2009, marking a major achievement. However, in the winter of 2012–2013, media coverage of flu induced a massive spike in related searches, causing GFT’s estimates to run almost twice as high as the real figures. The Science article “The Parable of Google Flu” (Lazer et al., 2014) accused Google of “big‑data hubris”, although it conceded that GFT was never intended as a standalone forecasting tool, but rather as a supplementary warning signal.
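GFT's core mechanism can be caricatured in a few lines: fit official flu counts to search volumes, then extrapolate. The numbers below are invented, but they reproduce precisely the failure mode just described, where a spike in searches driven by media coverage rather than illness inflates the estimate.

```python
import numpy as np

searches = np.array([100, 150, 200, 250, 300], dtype=float)  # weekly queries
cases    = np.array([ 10,  16,  19,  26,  30], dtype=float)  # official counts

slope, intercept = np.polyfit(searches, cases, 1)   # simple linear fit

print(slope * 310 + intercept)   # plausible estimate for a normal week
# A news-driven doubling of searches (with no extra illness) doubles
# the estimate: the model cannot tell hype from flu.
print(slope * 600 + intercept)
```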

Google’s miscalculation lay in its failure to interpret context. Steve Lohr (2015) emphasises that context involves understanding associations—a shift from raw data to meaningful information. IBM’s Watson was touted as capable of exactly this kind of contextual understanding, linking words to their appropriate contexts.

Watson: From TV Champion to Clinical Tool, Sold for Scraps

David Ferrucci, a leading AI researcher at IBM, headed the DeepQA team responsible for Watson. Named after IBM’s founder Thomas J. Watson, Watson gained prominence after winning $1 million on Jeopardy! in 2011, defeating champions Brad Rutter and Ken Jennings.

Jennifer Chu‑Carroll, one of Watson’s Jeopardy! coaches, told Steve Lohr (2015) that Watson sometimes made comical errors. When asked “Who was the first female astronaut?”, Watson repeatedly answered “Wonder Woman,” failing to distinguish between fiction and reality.

Ken Jennings reflected:

“Just as manufacturing jobs were removed in the 20th century by assembly‑line robots, Brad and I were among the first knowledge‑industry workers laid off by the new generation of ‘thinking’ machines… The Jeopardy! contestant profession may be the first Watson‑displaced profession, but I’m sure it won’t be the last.”

In February 2013, IBM announced that Watson’s first commercial application would focus on lung cancer treatment and other medical diagnoses—a real‑world “Dr Watson”—with 90% of oncology nurses reportedly following its recommendations at the time. The venture ultimately collapsed under the weight of unmet expectations and financial losses. In January 2022, IBM quietly sold the core assets of Watson Health to private equity firm Francisco Partners—reportedly for about $1 billion, a fraction of the estimated $4 billion it had invested—effectively sounding the death knell for its healthcare ambitions. The sale marked the end of Watson’s chapter as a medical innovator; the remaining assets were later rebranded under the name Merative, a standalone company focusing on data and analytics rather than AI‑powered diagnosis. Slate described the move as “sold for scraps”, characterising the downfall as a cautionary tale of over‑hyped technology failing to deliver on bold promises in complex fields like oncology.

Conclusion

Artificial intelligence algorithms are evolving rapidly, and while they offer significant benefits in fields like medicine, marketing, and data analysis, they also bring challenges. Data is not neutral: volume must be balanced with quality and contextual understanding. Tools such as Watson, Hadoop and Google Flu Trends underscore that human oversight remains indispensable. Ultimately, AI should augment human decision‑making rather than replace it—at least for now.


References

Brynjolfsson, E., & McAfee, A. (2014). The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. W. W. Norton & Company.

Ferrucci, D. A., Brown, E., Chu‑Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., … Welty, C. (2010). Building Watson: An overview of the DeepQA project. AI Magazine, 31(3), 59–79.

Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.

Lohr, S. (2015). Data‑ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else. New York: Harper Business.

Kelly, J. E., III, & Hamm, S. (2013). Smart Machines: IBM’s Watson and the Era of Cognitive Computing. New York: Columbia Business School Publishing.

Zen and the Art of Dissatisfaction – Part 21

Data: The Oil of the Digital Age

Data applications rely fundamentally on data—its extraction, collection, storage, interpretation, and monetisation—making them arguably the most significant feature of our contemporary world. Often referred to as “the new oil”, data is, from the perspective of capitalists determined to keep growth going, a valuable resource capable of sustaining economic expansion even after conventional natural reserves have been exhausted. This new form of capitalism has been termed surveillance capitalism (Zuboff, 2019).

Originally published on Substack: https://substack.com/@mikkoijas

Data matters more than opinions. For developers of data applications, the key goal is that we browse online, click “like”, follow links, spend time on their platforms and accept cookies. What we actually think does not matter; what matters is the behavioural surplus, the digital trace we leave, and our consent to tracking. That footprint has become immensely valuable: companies are willing to pay for it, and sometimes break laws to get it.

Cookies and Consumer Privacy in Europe

European legislation like the General Data Protection Regulation (GDPR) ensures some personal protection, but we still leave traces even if we refuse to share personal data. Websites are legally obligated to request our cookie consent, making privacy violations more visible. Rejecting cookies and clearing them out later becomes a time-consuming and frustrating chore.

In stark contrast, China’s data laws are much more relaxed, granting companies broader operational freedom. The more data a company gathers, the more fine-tuned its predictive algorithms can be. It’s much like environmental regulation: European firms are barred from drilling for oil in protected areas, which reduces profit but protects nature; Chinese firms, unrestrained by such limits, may harm ecosystems while driving profits. In the data realm, restrictive laws narrow the available datasets. Because Chinese firms can harvest data freely, they may gain a major competitive edge that could help them lead the global AI market.

Data for Good: Jeff Hammerbacher’s Vision

American data scientist Jeff Hammerbacher is one of the field’s most influential figures. As journalist Steve Lohr (2015) reports, Hammerbacher started on Wall Street and later helped build Facebook’s data infrastructure. Today, he directs data collection and interpretation toward improving human lives—a fundamental ethos across the data industry. According to Hammerbacher, we must understand the current data landscape to predict the future. Practically, this means equipping everything we care about with sensors that collect data. His current focus? Transforming medicine by centring it on data. Data science is one of the most promising fields, one where evidence trumps intuition.

Hammerbacher has been particularly interested in mental health and how data can improve psychological wellbeing. His close friend and former classmate, Steven Snyder, tragically died by suicide after struggling with bipolar disorder. This event, combined with Hammerbacher’s own breakdown at age 27—after being diagnosed with bipolar disorder and generalised anxiety disorder—led him to rethink his life. He notes that mental illness is a major cause of workforce dropout and ranks third among causes of early death. Researchers are now collecting neurobiological data from those with mental health conditions. Hammerbacher calls this “one of the most necessary and challenging data problems of our time.”

Pharmaceuticals haven’t solved the issue. Selective serotonin reuptake inhibitors (SSRIs), introduced in the 1980s, have failed to deliver a breakthrough for mood disorders. These remain a leading cause of death: roughly 90% of suicides involve untreated or poorly treated mood disorders, and about 50% of Western populations are affected at some point. The greater challenge lies in defining mental wellness—should people simply adapt to lives that do not fit them?

“Bullshit Jobs” and Social Systems

Anthropologist David Graeber (2018) reported that 37–40% of Western workers view their jobs as “bullshit”—work they see as socially pointless. The problem, then, isn’t merely psychological; our entire social structure normalises employment that values output over wellbeing.

Data should guide smarter decisions. Yet as our world digitises, data accumulates faster than our ability to interpret it. As Steve Lohr (2015) notes, a 20-bed intensive care unit can generate around 160,000 data points per second—a torrent demanding constant vigilance. Still, this data deluge offers positive outcomes: continuous patient monitoring enables proactive, personalised care.

Data-driven forecasting is set to reshape society, concentrating power and wealth. Not long ago, anyone could found a company; now a single corporation with superior data could dominate an entire sector. A case in point is the partnership between McKesson and IBM. In 2009, IBM researcher Kaan Katircioglu sought data for predictive modelling. He found it at McKesson: clean datasets recording medication inventory, prices, and logistics. IBM used this to build a predictive model, enabling McKesson to optimise its warehouse near Memphis and improve delivery accuracy from 90% to 99%.
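As a hypothetical flavour of such predictive modelling (IBM's actual model is not public in this detail, and every number here is invented): forecast demand from recent history and reorder before stock runs short.

```python
def forecast_demand(history, window=4):
    """Moving-average forecast of next-period demand."""
    recent = history[-window:]
    return sum(recent) / len(recent)

weekly_units = [120, 135, 128, 140, 150, 146]   # invented shipment history
stock_on_hand = 160
lead_time_weeks = 1

expected = forecast_demand(weekly_units)
reorder_point = expected * lead_time_weeks * 1.2   # 20% safety buffer
print(f"forecast: {expected:.0f} units/week")
if stock_on_hand < reorder_point:
    print("reorder now to protect delivery accuracy")
```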

At present, data-mining algorithms behave as clever tools. An algorithm is simply a set of steps for solving problems—think cooking recipes or coffee machine programming. Even novices can produce impressive outcomes by following a good set of instructions.
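The recipe analogy translates directly into code. A trivial, invented example:

```python
# An algorithm in the recipe sense: explicit steps anyone (or any
# machine) can follow. This one "brews coffee" by checking its
# preconditions in a fixed order.
def brew_coffee(water_ml, grounds_g):
    if water_ml < 200:
        return "add more water"
    if grounds_g < 12:
        return "add more coffee grounds"
    # Heat, pour, wait: the fixed order of steps is the algorithm.
    return "heat water, pour over grounds, wait four minutes: coffee"

print(brew_coffee(250, 15))
```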

Historian Yuval Noah Harari (2015) provocatively suggests we are ourselves algorithms. Unlike machines, our algorithms run through emotions, perceptions, and thoughts—biological processes shaped by evolution, environment, and culture.

Summary

Personal data is the new source of extraction and exploitation—vital for technological progress yet governed by uneven regulations that determine competitive advantage. Pioneers like Jeff Hammerbacher highlight its potential for social good, especially in mental health, while revealing our complex psychology. We collect data abundantly, yet face the challenge of interpreting it effectively. Predictive systems can drive efficiency, but they can also foster monopolies. Ultimately, whether data serves or subsumes us depends on navigating its ethical, legal, and societal implications.


References

Graeber, D. (2018). Bullshit Jobs: A Theory. New York: Simon & Schuster.
Hammerbacher, J. (2015). [Interview in Lohr, 2015].
Harari, Y. N. (2015). Homo Deus: A Brief History of Tomorrow. New York: Harper.
Lohr, S. (2015). Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else. New York: Harper Business.
Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: PublicAffairs.