Zen and the Art of Dissatisfaction – Part 22

Big Data, Deep Context

In this post, we explore what artificial intelligence (AI) algorithms (or rather, large language models) are, how they learn, and their growing impact on sectors such as medicine, marketing and digital infrastructure. We look into some prominent real‑world examples from the recent past (IBM’s Watson, Google Flu Trends, and the Hadoop ecosystem) and discuss how human involvement remains vital even as machine learning accelerates. Finally, we reflect on both the promise and the risks of entrusting complex decision‑making to algorithms.

Originally published in Substack: https://substack.com/inbox/post/168617753

Artificial intelligence algorithms learn from training data. How that data is acquired and labelled marks the key difference between the various types of AI algorithm. Once trained, an algorithm applies what it has learned to new tasks, using the training data as the basis for its future decisions.

AI in Healthcare: From Watson to Robot Doctors

Some algorithms are capable of learning autonomously, continuously integrating new information to adjust and refine their future actions. Others require a programmer’s intervention from time to time. AI algorithms fall into three main categories: supervised learning, unsupervised learning and reinforcement learning. The primary differences between these approaches lie in how they are trained and how they operate.
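As a rough illustration of how training differs between the first two categories, the Python sketch below uses made-up one-dimensional measurements. The function names, the data and the labels are all assumptions chosen for illustration; reinforcement learning, which learns from rewards rather than from a fixed dataset, is left out for brevity.

```python
from statistics import mean

def train_supervised(samples):
    """Supervised: every sample comes with a human-provided label.
    Learns one average value (centroid) per label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Assign a new value to the label whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def cluster_unsupervised(values, rounds=10):
    """Unsupervised: no labels. Splits the values into two groups by
    repeatedly moving two centres to the mean of their nearest points
    (a bare-bones 2-means clustering)."""
    low, high = min(values), max(values)
    for _ in range(rounds):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:  # all values identical, nothing to split
            break
        low, high = mean(near_low), mean(near_high)
    return low, high

# Hypothetical toy data: (measurement, label)
labelled = [(1.0, "normal"), (1.2, "normal"), (4.8, "abnormal"), (5.1, "abnormal")]
centroids = train_supervised(labelled)

# The same measurements without labels
unlabelled = [1.0, 1.2, 4.8, 5.1]
centres = cluster_unsupervised(unlabelled)
```

With labels, `predict(centroids, 4.0)` can name the group a new measurement belongs to. Without labels, the clustering step still finds two groups, but it cannot say what they mean; a human has to interpret them.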

Algorithms learn to identify patterns in data streams and make assumptions about correct and incorrect choices. In general, they become more effective and accurate the more data they receive. Deep learning is one widely used approach: it relies on artificial neural networks with many layers, which are trained by comparing their outputs with known right and wrong answers, enabling them to draw better and faster conclusions. Deep learning is widely used in speech, image and text recognition and processing.

Modern AI and machine learning algorithms have empowered practitioners to notice things they might otherwise have missed. Herbert Chase, a professor of clinical medicine at Columbia University in New York, observed that doctors sometimes have to rely on luck to uncover underlying issues in a patient’s symptoms. Chase served as a medical adviser to IBM during the development of Watson, the AI diagnostic assistant.

IBM’s concept involved a doctor inputting, for example, three patient‑described symptoms into Watson; the diagnostic assistant would then suggest a list of possible diagnoses, ranked from most to least likely. Despite the impressive hype surrounding Watson, it proved inadequate at diagnosing actual patients. IBM therefore announced that Watson would be phased out by the end of 2023 and its clients encouraged to transition to its newer services.
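The idea of turning a handful of symptoms into a ranked list can be sketched in a few lines of Python. This is only a toy, not how Watson worked: the scoring rule (share of a condition's known symptoms that the patient reports), the function name and the placeholder conditions are all assumptions made for illustration, and none of it is medical data.

```python
def rank_diagnoses(symptoms, knowledge_base):
    """Rank candidate conditions by the share of their known symptoms
    that the patient reports, highest first."""
    reported = set(symptoms)
    scores = []
    for condition, known in knowledge_base.items():
        overlap = len(reported & known)
        if overlap:  # skip conditions with no matching symptom
            scores.append((condition, overlap / len(known)))
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical placeholder data, not real medical knowledge
KNOWLEDGE_BASE = {
    "condition_a": {"fever", "cough", "fatigue"},
    "condition_b": {"fever", "rash"},
    "condition_c": {"headache", "nausea", "fatigue", "dizziness"},
}

ranking = rank_diagnoses(["fever", "cough", "fatigue"], KNOWLEDGE_BASE)
for condition, score in ranking:
    print(f"{condition}: {score:.2f}")
```

Even this toy shows why real patients are harder than a lookup table: the ranking is only as good as the knowledge base behind it, and symptoms described in everyday language rarely match its entries exactly.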

One genuine advantage of AI lies in the absence of a dopamine response. A human doctor, operating via biological algorithms, experiences a rush of dopamine when they arrive at what feels like a correct diagnosis—but that diagnosis can be wrong. When doubts arise, the dopamine fades and frustration sets in. In discouragement, the doctor may choose a plausible but uncertain diagnosis and send the patient home.

An AI‑algorithm‑based “robot‑doctor” does not experience dopamine. All of its hypotheses are treated equally. A robot‑doctor would be just as enthused about a novel idea as about its billionth suggestion. It is likely that doctors will initially work alongside AI‑based robot doctors. The human doctor can review AI‑generated possibilities and make their own judgement. But how long will it be before human doctors become obsolete?

AI in Action: Data, Marketing, and Everyday Decisions

Currently, AI algorithms trained on large datasets drive actions and decision‑making across multiple fields. Robot‑doctors assisting human physicians and the self‑driving cars under development by Google or Tesla are two visible examples of near‑future possibilities—assuming the corporate marketing stays honest.

AI continues to evolve. Targeted online marketing, driven by social media data, is an example of a seemingly trivial yet powerful application that contributes to algorithmic improvement. Users may tolerate mismatched adverts on Facebook, but may become upset if a robot‑doctor recommends an incorrect, potentially expensive or risky test. The outcome is all about data—its quantity, how it is evaluated and whether quantity outweighs quality.

According to MIT economists Erik Brynjolfsson and Andrew McAfee (2014), in the 1990s only about one‑fifth of a company’s activities left a digital trace. Today, almost all corporate activities are digitised, and companies have begun to produce reports in language intelligible to algorithms. It is now more important that a company’s operations are understood by AI algorithms than by its human employees.

Nevertheless, vast amounts of data are still analysed using tools built by humans. Facebook is perhaps the most well‑known example of how our personal data is structured, collected, analysed and used to influence and manipulate opinions and behaviour.

Big Data Infrastructure

According to Steve Lohr (2015), Jeff Hammerbacher helped introduce Hadoop in 2008 to manage the ever‑growing volume of data. Hadoop, developed by Mike Cafarella and Doug Cutting, is an open‑source variant of Google’s own distributed computing system. Named after a yellow toy elephant belonging to Cutting’s child, Hadoop could initially process two terabits of data in two days. Two years later it could perform the same task in mere minutes.

At Facebook, Hammerbacher and his team constructed Hive, an application running on Hadoop. Now available as Apache Hive, it allows users without a computer science degree to query large processed datasets. During the writing of this post, generative AI applications such as ChatGPT (by OpenAI), Claude (Anthropic), Gemini (Google DeepMind), Mistral & Mixtral (Mistral AI), and LLaMA (Meta) have become available for casual users on ordinary computers.

A widely cited example of public‑benefit predictive data analysis is Google Flu Trends (GFT). Launched in 2008, GFT aimed to predict flu outbreaks faster than official healthcare systems by analysing popular Google search terms related to flu.

GFT’s early estimates tracked official flu figures closely, although the original model missed the nonseasonal H1N1 pandemic of 2009 and had to be revised (Lazer et al., 2014). Then, in the winter of 2012–2013, media coverage of flu induced a massive spike in related searches, causing GFT’s estimates to be almost twice the real figures. The Science article “The Parable of Google Flu” (Lazer et al., 2014) accused Google of “big‑data hubris”, although it conceded that GFT was never intended as a standalone forecasting tool, but rather as a supplementary warning signal.
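The failure mode is easy to reproduce in miniature. The Python sketch below fits a straight line from search volume to flu cases and then feeds it a week in which news coverage doubles the searches while actual illness stays the same. It is not Google's model: the numbers are invented, the single-variable least-squares fit is an assumption chosen for simplicity, and the variable names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Invented history: weekly flu-related searches (thousands)
# against recorded flu cases (% of doctor visits)
searches = [10, 20, 30, 40, 50]
cases = [1.0, 2.0, 3.0, 4.0, 5.0]
slope, intercept = fit_line(searches, cases)

def estimate(search_volume):
    """Predict flu cases from search volume alone."""
    return slope * search_volume + intercept

true_rate = 4.0             # actual flu level in both weeks below
normal_week = estimate(40)  # searches driven by illness
media_week = estimate(80)   # same illness, searches doubled by news coverage

print(f"normal week estimate: {normal_week:.1f}")
print(f"media week estimate:  {media_week:.1f} (true rate {true_rate})")
```

The model has no way of knowing why people are searching. When the reason changes, the estimate drifts away from reality even though the arithmetic is still correct.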

Google’s miscalculation lay in its failure to interpret context. Steve Lohr (2015) emphasises that context involves understanding associations, a shift from raw data to meaningful information. IBM’s Watson was touted as capable of such contextual understanding, able to link words to their appropriate contexts.

Watson: From TV Champion to Clinical Tool, and Sold for Scraps

David Ferrucci, a leading AI researcher at IBM, headed the DeepQA team responsible for Watson. Named after IBM’s founder Thomas J. Watson, the system gained prominence after winning $1 million on Jeopardy! in 2011, defeating champions Brad Rutter and Ken Jennings.

Jennifer Chu‑Carroll, one of Watson’s Jeopardy! coaches, told Steve Lohr (2015) that Watson sometimes made comical errors. When asked “Who was the first female astronaut?”, Watson repeatedly answered “Wonder Woman,” failing to distinguish between fiction and reality.

Ken Jennings reflected that:

“Just as manufacturing jobs were removed in the 20th century by assembly‑line robots, Brad and I were among the first knowledge‑industry workers laid off by the new generation of ‘thinking’ machines… The Jeopardy! contestant profession may be the first Watson‑displaced profession, but I’m sure it won’t be the last.”

In February 2013, IBM announced that Watson’s first commercial application would focus on lung cancer treatment and other medical diagnoses—a real‑world “Dr Watson”—with 90% of oncology nurses reportedly following its recommendations at the time. The venture ultimately collapsed under the weight of unmet expectations and financial losses. In January 2022, IBM quietly sold the core assets of Watson Health to private equity firm Francisco Partners—reportedly for about $1 billion, a fraction of the estimated $4 billion it had invested—effectively signalling the death knell of its healthcare ambitions. The sale marked the end of Watson’s chapter as a medical innovator; the remaining assets were later rebranded under the name Merative, a standalone company focusing on data and analytics rather than AI‑powered diagnosis. Slate described the move as “sold for scraps,” characterising the downfall as a cautionary tale of over‑hyped technology failing to deliver on bold promises in complex fields like oncology.

Conclusion

Artificial intelligence algorithms are evolving rapidly, and while they offer significant benefits in fields like medicine, marketing, and data analysis, they also bring challenges. Data is not neutral: volume must be balanced with quality and contextual understanding. Tools such as Watson, Hadoop and Google Flu Trends underscore that human oversight remains indispensable. Ultimately, AI should augment human decision‑making rather than replace it—at least for now.


References

Brynjolfsson, E., & McAfee, A. (2014). The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. W. W. Norton & Company.

Ferrucci, D. A., Brown, E., Chu‑Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., … Welty, C. (2010). Building Watson: An overview of the DeepQA project. AI Magazine, 31(3), 59–79.

Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203–1205.

Lohr, S. (2015). Data‑ism. HarperBusiness.

Mintz‑Oron, O. (2010). Smart Machines: IBM’s Watson and the Era of Cognitive Computing. Columbia Business School Publishing. [Referenced via IBM Watson bibliography]

Zen and the Art of Dissatisfaction – Part 21

Data: The Oil of the Digital Age

Data applications rely fundamentally on data (its extraction, collection, storage, interpretation, and monetisation), making them arguably the most significant feature of our contemporary world. Often referred to as “the new oil,” data is, from the perspective of persistent capitalists, a valuable resource capable of sustaining economic growth even after conventional natural reserves have been exhausted. This new form of capitalism has been termed surveillance capitalism (Zuboff 2019).

Originally published in Substack: https://substack.com/@mikkoijas

Data matters more than opinions. For developers of data applications, the key goal is that we browse online, click “like,” follow links, spend time on their platforms, and accept cookies. What we think or do does not matter; what matters is the digital behavioural surplus, a trace we leave and our consent to tracking. That footprint has become immensely valuable—companies are willing to pay for it, and sometimes break laws to get it.

Cookies and Consumer Privacy in Europe

European legislation like the General Data Protection Regulation (GDPR) ensures some personal protection, but we still leave traces even if we refuse to share personal data. Websites are legally obligated to request our cookie consent, making privacy violations more visible. Rejecting cookies and clearing them out later becomes a time-consuming and frustrating chore.

In stark contrast, China’s data laws are much more relaxed, granting companies broader operational freedom. The more data a company gathers, the more fine-tuned its predictive algorithms can be. It’s much like environmental regulation: European firms are restricted from drilling for oil in protected areas, which reduces profit but protects nature. Chinese firms, unrestrained by such limits, may harm ecosystems while driving profits. In the data realm, restrictive laws narrow the available datasets. Because Chinese firms can harvest data freely, they might gain a major competitive edge that could help them lead the global AI market.

Data for Good: Jeff Hammerbacher’s Vision

American data scientist Jeff Hammerbacher is one of the field’s most influential figures. As journalist Steve Lohr (2015) reports, Hammerbacher started on Wall Street and later helped build Facebook’s data infrastructure. Today, he curates data collection and interpretation for the purpose of improving human lives—a fundamental ethos across the data industry. According to Hammerbacher, we must understand the current data landscape to predict the future. Practically, this means equipping everything we care about with sensors that collect data. His current focus? Transforming medicine by centring it on data. Data science is one of the most promising fields, where evidence trumps intuition.

Hammerbacher has been particularly interested in mental health and how data can improve psychological wellbeing. His close friend and former classmate, Steven Snyder, tragically died by suicide after struggling with bipolar disorder. This event, combined with Hammerbacher’s own breakdown at age 27—after being diagnosed with bipolar disorder and generalised anxiety disorder—led him to rethink his life. He notes that mental illness is a major cause of workforce dropout and ranks third among causes of early death. Researchers are now collecting neurobiological data from those with mental health conditions. Hammerbacher calls this “one of the most necessary and challenging data problems of our time.”

Pharmaceuticals haven’t solved the issue. Selective serotonin reuptake inhibitors (SSRIs), introduced in the 1980s, have failed to deliver a breakthrough for mood disorders. Mood disorders remain a leading cause of death; roughly 90% of suicides involve untreated or poorly treated mood disorders, and about 50% of Western populations are affected at some point. The greater challenge lies in defining mental wellness: should people simply adapt to lives that feel unfit?

“Bullshit Jobs” and Social Systems

Investigative anthropologist David Graeber (2018) reported that 37–40% of Western workers view their jobs as “bullshit”—work they see as socially pointless. Thus, the problem isn’t merely psychological; our entire social structure normalises employment that values output over wellbeing.

Data should guide smarter decisions. Yet as our world digitises, data accumulates faster than our ability to interpret it. As Steve Lohr (2015) notes, a 20-bed intensive care unit can generate around 160,000 data points per second—a torrent demanding constant vigilance. Still, this data deluge offers positive outcomes: continuous patient monitoring enables proactive, personalised care.

Data-driven forecasting is set to reshape society, concentrating power and wealth. Not long ago, anyone could found a company; now a single corporation could dominate an entire sector with superior data. A case in point is the partnership between McKesson and IBM. In 2009, Kaan Katircioglu (IBM researcher) sought data for predictive modelling. He found it at McKesson—clean datasets recording medication inventory, prices, and logistics. IBM used this to build a predictive model, enabling McKesson to optimise its warehouse near Memphis and improve delivery accuracy from 90% to 99%.

At present, data-mining algorithms behave as clever tools. An algorithm is simply a set of steps for solving problems—think cooking recipes or coffee machine programming. Even novices can produce impressive outcomes by following a good set of instructions.

Historian Yuval Noah Harari (2015) provocatively suggests we are ourselves algorithms. Unlike machines, our algorithms run through emotions, perceptions, and thoughts—biological processes shaped by evolution, environment, and culture.

Summary

Personal data is the new source of extraction and exploitation—vital for technological progress yet governed by uneven regulations that determine competitive advantage. Pioneers like Jeff Hammerbacher highlight its potential for social good, especially in mental health, while revealing our complex psychology. We collect data abundantly, yet face the challenge of interpreting it effectively. Predictive systems can drive efficiency, but they can also foster monopolies. Ultimately, whether data serves or subsumes us depends on navigating its ethical, legal, and societal implications.


References

Graeber, D. (2018). Bullshit Jobs: A Theory. New York: Simon & Schuster.
Hammerbacher, J. (n.d.). [Interview in Lohr 2015].
Harari, Y. N. (2015). Homo Deus: A History of Tomorrow. New York: Harper.
Lohr, S. (2015). Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else. New York: Harper Business.
Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. PublicAffairs.

Zen and the Art of Dissatisfaction – Part 20

The Triple Crisis of Civilisation

“At the time I climbed the mountain or crossed the river, I existed, and the time should exist with me. Since I exist, the time should not pass away. […] The ‘three heads and eight arms’ pass as my ‘sometimes’; they seem to be over there, but they are now.”

Dōgen

Introduction

This blog post explores the intertwining of ecology, technology, politics and data collection through the lens of modern civilisation’s crises. It begins with a quote by the Japanese Zen master Dōgen, drawing attention to the temporal nature of human existence. From climate emergency to digital surveillance, from Brexit to barcodes, the post analyses how personal data has become the currency of influence and control.


Originally published in Substack: https://mikkoijas.substack.com/

The climate emergency currently faced by humanity is only one of the pressing concerns regarding the future of civilisation. A large-scale ecological crisis is an even greater problem—one that is also deeply intertwined with social injustice. A third major concern is the rapidly developing situation created by technology, which is also connected to problems related to nature and the environment.

Cracks in the System: Ecology, Injustice, and the Digital Realm

The COVID-19 pandemic revealed new dimensions of human interaction. We are dependent on technology-enabled applications to stay connected to the world through computers and smart devices. At the same time, tech giants are generating immense profits while all of humanity struggles with unprecedented challenges.

Brexit finally came into effect at the start of 2021. On Epiphany of that same year, angry supporters of Donald Trump stormed the United States Capitol. Both Brexit and Trump are children of the AI era. Using algorithms developed by Cambridge Analytica, the Brexit campaign and Trump’s 2016 presidential campaign were able to identify voters who were unsure of their decisions. These individuals were then targeted via social media with marketing and curated news content to influence their opinions. While the data for this manipulation was gathered online, part of the campaigning also happened offline, as campaign offices knew where undecided voters lived and how to sway them.

I have no idea how much I am being manipulated when browsing content online or spending time on social media. As I move from one website to another, cookies are collected, offering me personalised content and tailored ads. Algorithms working behind websites monitor every click and search term, and AI-based systems form their own opinion of who I am.

Surveillance and the New Marketplace

A 2013 study applied a statistical analysis algorithm to the likes of 58,000 Facebook users. The algorithm guessed users’ sexual orientation with 88% accuracy, skin colour with 95% accuracy, and political orientation with 85% accuracy. It also guessed with 75% accuracy whether a user was a smoker (Kosinski et al., 2013).

Companies like Google and Meta Platforms—which includes Facebook, Instagram, Messenger, Threads, and WhatsApp—compete for users’ attention and time. Their clients are not individuals like me, but advertisers. These companies operate under an advertising-based revenue model. Individuals like me are the users whose attention and time are being competed for.

Facebook and other similar companies that collect data about users’ behaviour will presumably have a competitive edge in future AI markets. Data is the oil of the future. Steve Lohr, long-time technology journalist at the New York Times, wrote in 2015 that data-driven applications will transform our world and behaviour just as telescopes and microscopes changed our way of observing and measuring the universe. The main difference with data applications is that they will affect every possible field of action. Moreover, they will create entirely new fields that have not previously existed.

In computing, the word “data” refers to numbers, letters or images as such, without specific meaning. A data point is an individual unit of information. Generally, any single fact can be considered a data point. In a statistical or analytical context, a data point is derived from a measurement or a study. A data point is often the same as a datum, the singular form of “data”.

From Likes to Lives: How Behaviour Becomes Prediction

Decisions and interpretations are created from data points through a variety of processes and methods, enabling individual data points to form applicable information for some purpose. This process is known as data analysis, through which the aim is to derive interesting and comprehensible high-level information and models from collected data, allowing for various useful conclusions to be drawn.

A good example of a data point is a Facebook like. A single like is not much in itself and cannot yet support major interpretations. But if enough people like the same item, even a single like begins to mean something significant. The 2016 United States presidential election brought social media data to the forefront. The British data analytics firm Cambridge Analytica gained access to the profile data of millions of Facebook users.

The data analysts hired by Cambridge Analytica could make highly reliable stereotypical conclusions based on users’ online behaviour. For example, men who liked the cosmetics brand MAC were slightly more likely to be homosexual. One of the best indicators of heterosexuality was liking the hip-hop group Wu-Tang Clan. Followers of Lady Gaga were more likely to be extroverted. Each such data point is too weak to provide a reliable prediction. But when there are tens, hundreds or thousands of data points, reliable predictions about users’ thoughts can be made. Based on 270 likes, social media knows as much about a user as their spouse does.
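A short Python sketch shows why many weak signals add up. It is not the method used by Cambridge Analytica or in the study cited earlier; it assumes, purely for illustration, that every like is an independent piece of evidence that makes a trait 1.2 times more likely (a naive Bayes assumption), and the function name is hypothetical.

```python
import math

def combine_weak_signals(prior, likelihood_ratios):
    """Update a prior probability with independent pieces of evidence
    by adding their log-likelihood ratios (naive Bayes assumption)."""
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))

# Assumed strength of a single like: 1.2 times more likely with the trait
one_like = combine_weak_signals(0.5, [1.2])
many_likes = combine_weak_signals(0.5, [1.2] * 50)

print(f"after 1 like:   {one_like:.3f}")
print(f"after 50 likes: {many_likes:.5f}")
```

One like barely moves the estimate away from a coin toss, while fifty such likes push it close to certainty. Real likes are correlated with one another, so the actual gain is smaller than this idealised calculation suggests, but the direction of the effect is the same.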

The collection of data is a problem. Another issue is the indifference of users. A large portion of users claim to be concerned about their privacy, while simultaneously worrying about what others think of them on social platforms that routinely violate their privacy. This contradiction is referred to as the Privacy Paradox. Many people claim to value their privacy, yet are unwilling to pay for alternatives to services like Facebook or Google’s search engine. These platforms operate under an advertising-based revenue model, generating profits by collecting user data to build detailed behavioural profiles. While they do not sell these profiles directly, they monetise them by selling highly targeted access to users through complex ad systems—often to the highest bidder in real-time auctions. This system turns user attention into a commodity, and personal data into a tool of influence.

The Privacy Paradox and the Illusion of Choice

German psychologist Gerd Gigerenzer, who has studied the use of bounded rationality and heuristics in decision-making, writes in his excellent book How to Stay Smart in a Smart World (2022) that targeted ads usually do not even reach consumers, as most people find ads annoying. For example, eBay no longer pays Google for targeted keyword advertising because they found that 99.5% of their customers came to their site outside paid links.

Gigerenzer calculates that Facebook could charge users for its service. Facebook’s ad revenue in 2022 was about €103.04 billion. The platform had approximately 2.95 billion users. So, if each user paid €2.91 per month for using Facebook, its income would match what it currently earns from ads. In fact, it would make significantly more profit because it would no longer need to hire staff to sell ad space, collect user data, or develop new analysis tools for ad targeting.
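The arithmetic behind the €2.91 figure can be checked in a few lines of Python, using only the revenue and user numbers quoted above. The function name is an assumption for illustration.

```python
def monthly_fee_to_match_ads(annual_ad_revenue, users):
    """Monthly per-user fee that would equal the annual ad revenue."""
    return annual_ad_revenue / users / 12

# Figures quoted in the text: €103.04 billion a year, 2.95 billion users
fee = monthly_fee_to_match_ads(103.04e9, 2.95e9)
print(f"Break-even fee: €{fee:.2f} per user per month")  # prints €2.91
```

The calculation assumes that every user would stay and pay, which is a generous assumption for a service that is currently free.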

According to Gigerenzer’s study, 75% of people would prefer that Meta Platforms’ services remain free, despite privacy violations, targeted ads, and related risks. Of those surveyed, 18% would be willing to pay a maximum of €5 per month, 5% would be willing to pay €6–10, and only 2% would be willing to pay more than €10 per month.

But perhaps the question is not about money in the sense that Facebook would forgo ad targeting in exchange for a subscription fee. Perhaps data is being collected for another reason. Perhaps the primary purpose isn’t targeted advertising. Maybe it is just one step toward something more troubling.

From Barcodes to Control Codes: The Birth of Modern Data

But how did we end up here? Today, data is collected everywhere. A good everyday example of our digital world is the barcode. In 1948, Bernard Silver, a technology student in Philadelphia, overheard a local grocery store manager asking his professors whether they could develop a system that would allow purchases to be scanned automatically at checkout. Silver and his friend Norman Joseph Woodland began developing a visual code based on Morse code that could be read with a light-based scanner. Their research only became standardised as the current barcode system in the early 1970s. Barcodes have enabled a new form of logistics and more efficient distribution of products. Products have become data, whose location, packaging date, expiry date, and many other attributes can be tracked and managed by computers in large volumes.

Conclusion

We are living in a certain place in time, as Dōgen described—an existence with a past and a future. Today, that future is increasingly built on data: on clicks, likes, and digital traces left behind.

As ecological, technological, and political threats converge, it is critical that we understand the tools and structures shaping our lives. Data is no longer neutral or static—it has become currency, a lens, and a lever of power.


References

Gigerenzer, G. (2022). How to stay smart in a smart world: Why human intelligence still beats algorithms. Penguin.

Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behaviour. Proceedings of the National Academy of Sciences, 110(15), 5802–5805. https://doi.org/10.1073/pnas.1218772110

Lohr, S. (2015). Data-ism: The revolution transforming decision making, consumer behavior, and almost everything else. HarperBusiness.

Dōgen / Sōtō Zen Text Project. (2023). Treasury of the True Dharma Eye: Dōgen’s Shōbōgenzō (Vols. I–VII, Annotated trans.). Sōtōshū Shūmuchō, Administrative Headquarters of Sōtō Zen Buddhism.