What ChatGPT Teaches About Agile Work Design Experiments

February 22, 2023

Many of us have conducted at least one experiment, asking ChatGPT a question.  You can try it here.

HR discussions about generative AI often focus on how it will change HR systems, such as offering answers to employee benefit questions by reading the company employee handbook and other documents.  For me, an equally fascinating issue is this:

If leaders are willing to risk millions in market capitalization and very public and disturbing failures in the interests of agile product/software design experiments, then why are they so reluctant to embrace agile work design experiments?

The saga of the early release of generative AI prototypes, like ChatGPT and Bard, offers HR leaders a good opportunity to help leaders see the value, logic and rigor of embracing uncertainty about the future of work through agile work design experiments.

Here, I’ll use the analogies between work and software development experiments, to spur more productive conversations about the future of work.

“We Don’t Know” Is The Only Rational Future-Of-Work Policy

In February 2021, Pete Ramstad and I asserted that organizations should undertake agile work design experiments.  Since then, I have become increasingly convinced that the only rational approach to the future of work is to embrace uncertainty.  I still believe that the only rational broad future-of-work policy must begin with “We Don’t Know the Future of Work.”

Does that sound like abdication, a failure to lead, or a recipe for chaos?  Many leaders seem to think so.  But Pete and I described the analogy to agile development of software, products and services.  “We Don’t Know the Future of Our Products and Services” is already a foundation for agile development, and leaders are quite capable and willing to embrace it.  The agile approach includes releasing products early, even with imperfections, and relying on careful observation of user experiences to identify the most valuable new product features, the most vital imperfections, and what features and imperfections NOT to address.  When it comes to products/software, these decisions are hardly chaotic and unsystematic.  They reflect well-accepted frameworks like Kano analysis.

Pete and I encouraged applying these same tools to work, with “agile work design experiments.” We encouraged HR leaders to step up and make the HR function the “hub” for agile work experiments, like a software or product-development function. The HR work-experiment hub would apply rigorous tools and frameworks to user-driven work experiments, to identify what works, and what doesn’t.  This would build more evidence-based, inclusive, agile, flexible, personalized, and mutually beneficial work experience for workers, managers, leaders and organizations.

This approach embraces uncertainty, rather than fixating only on what is certain, or attempting to avoid uncertainty with fixed and generalized work policies.  Even better, agile work experiments can draw on the lessons from agile software/product development, which has decades of practice and research on how you can embrace uncertainty with transparency, rigor and collaboration. Agile experiment systems apply consistent frameworks and approaches, to embrace the fact that products are constantly both obsolete and newly-upgraded, just as work is also perpetually upgraded.

Slow Progress Toward Agile Work Experiments

Yet, leaders still rarely approach work this way.  This is evident in many examples, such as asserting simplistic relationships between presence in the office and things like innovation, collaboration, culture and productivity to justify rigid return-to-office requirements that oblige large swathes of workers to be in the office some number of days a week.  This often seems designed to avoid risks like lost productivity, innovation, collaboration or culture.  However, the disruptions of the last 4 years have repeatedly shown that workers and managers can often actually increase these things with work arrangements that were virtually unheard of in the past.  Emerging evidence also suggests, not surprisingly, that the optimal approach is more personalized, nuanced and intentional.  Getting it right requires deep understanding about different work arrangements and experiences.  Just as with software and products, that understanding requires experiments.

It’s not that experiments aren’t happening.  In fact, one could say that we are currently in the most experimental period of work ever.  Organization and HR leaders describe to me many different work arrangements, such as negotiations between managers and their worker teams, increasingly personalized work arrangements, internal talent marketplaces, skills-based taxonomies, and work automation.  It is humbling and astonishing to see their skill and dedication.  Yet, these efforts often lack a systematic approach, similar to agile product development, that explicitly and rigorously treats these as true experiments, carefully observing their effects, comparing them to baseline arrangements, testing the effects of alternatives, and systematically understanding which experimental arrangements worked and which didn’t, and then perpetually fashioning new work designs and experiences that build on successes, and winnow away failures.

Typically, new work features are implemented not as experiments, but as established policies, expected to last a long time, and to “scale” across an entire organization.

My HR colleagues often suggest that it’s not possible to do rigorous agile work design experiments.  The reasons include “our top leaders believe (often with little evidence) that collaboration, innovation, culture, etc., can only happen when we force workers to be together in an office,” or “our legal counsel says that allowing work experiments will create risks of perceived or real inequality, lawsuits, etc.,” or “allowing workers to experiment will raise expectations of personalized work features that we might have to roll back if the experiment doesn’t work,” or “anything but a blanket approach to work arrangements will be too expensive to administer.”

These are all valid concerns, but what’s striking to me is that the same concerns apply to the release of new products like ChatGPT and Bard.  We can see this in the rapid adjustments that Microsoft and other companies made within days of the release, including experiments with limits on the number of sessions per day, and going silent when asked about “feelings.”  The early experiments have prompted healthy discussions about the ethics of such releases.

Work experiments test new approaches with managers and employees, who are arguably less likely to search for the most extreme or embarrassing outcomes.  The risks may be far greater for generative AI, which is so new that little evidence and few frameworks exist.  Work experiments can draw upon decades of rigorous research and frameworks.  Indeed, one notable source for such research sits within Microsoft itself, in its WorkLab, and in the insights of Microsoft’s HR leaders, such as Kathleen Hogan and Jared Spataro.

Which brings us back to my question:

If leaders are willing to risk millions in market capitalization and very public and disturbing failures in the interests of agile product/software design experiments, then why are they so reluctant to embrace agile work design experiments?

Are ChatGPT and Bard “Failed” Software Design Experiments?

Microsoft (ChatGPT) and Google (Bard) released generative AI enhanced search engines, and reporters went to work to discover where generative AI made mistakes or gave troubling answers.  It didn’t take long to discover that these product releases were indeed quite imperfect.

Kevin Roose of the New York Times reported on February 16, 2023 that “As we got to know each other, Sydney told me about its dark fantasies (which included hacking computers and spreading misinformation), and said it wanted to break the rules that Microsoft and OpenAI had set for it and become a human. At one point, it declared, out of nowhere, that it loved me. It then tried to convince me that I was unhappy in my marriage, and that I should leave my wife and be with it instead.” (Read the full transcript of the conversation here.)

What a great example of agile product development, with users invited to find imperfections to speed improvements in the software, products and services.  David Leonhardt’s opinion column quotes Kevin Scott, Microsoft’s chief technology officer, who said that Roose’s chat with Bing was “part of the learning process” as the company readies its A.I. for wider release. “This is exactly the sort of conversation we need to be having, and I’m glad it’s happening out in the open,” Scott said. “These are things that would be impossible to discover in the lab.”  The Times quoted Sam Altman, the CEO of the company that developed ChatGPT, who said “ChatGPT is a horrible product.”

Does that mean the product is a mistake that should have been avoided?  No.  These leaders know that agile product development experiments reveal early versions that are “horrible,” on the way to future versions that are better.

Risks Are Essential for Agile Experiments

Companies must compete on fast innovation, and often the best way to do that is to allow users to test products, potentially uncovering very public and visible flaws, and learn from those tests.

This carries risks.  When Google released Bard, it was so notably “rough” that Google’s market capitalization fell by $100 billion, after its ad on Twitter showed Bard making an obvious mistake.  Companies take big risks in releasing beta products.  Yet, Google and Microsoft persist because agile product development requires risk-taking, unearthing unanticipated imperfections in new products/services/software.  It is precisely such experiments that allow fast and perpetual upgrades.  “We could never learn these things in a lab.”

Leaders should approach the future of work the same way.  Agile work design experiments are arguably no riskier than user-driven software/product experiments, and should be just as common.

The Rising Chorus Encouraging Work Design Experiments

I’m encouraged that some of the most respected thought-leaders have also embraced work experiments as the future.

Adam Grant’s 2021 book, “Think Again:  The Power of Knowing What You Don’t Know,” is a terrific example of a call to embrace uncertainty by experimenting. His recent Wall Street Journal interview applies this to the future of work:  “Too many leaders feel like their decisions are permanent. As opposed to saying, ‘We’re going to test and learn.’”  Adam offers examples of work experiments that worked out well (such as giving some workers Fridays off).  I’d love more examples of experiments that didn’t work, and what was learned.

Lynda Gratton’s 2022 book “Redesigning Work” offers four steps: (1) Understand what matters; (2) Reimagine new ways of operating; (3) Model and test new ways of working; and (4) Act and create.  She has expertly applied an experimental test-retest-scale approach to work design.  She also offers marvelous success stories that might be extended to describe failures and their lessons, like those of ChatGPT and Bard.

Dan Pink’s 2022 bestseller, “The Power of Regret:  How Looking Backward Moves Us Forward,” includes this conclusion:  “These seventy years of research distill to two simple yet urgent conclusions: Regret makes us human. Regret makes us better.”  Experimentation, even to the point of regret, is key to improving.

Perhaps leaders should be more willing to risk “regret” in work design experiments, in service of significant learning.

In McKinsey’s January 2023 article, “When the times get tough, the tough get innovative and create paths to future growth,” Matt Banholzer, Michael Birshan, Rebecca Doherty, and Laura LaBerge note:  “Our research and experience show that companies tend to fall behind if they focus solely on avoiding the downside. … To capture growth opportunities while creating more strategic options in a fast-changing environment, innovation is key. Many companies are already acting: In our 2021 New Business Building Survey, respondents reported that, on average, they expect half of their revenues in the next five years to come from entirely new products, services, and businesses.”

Should leaders expect half of their work productivity in the next five years to come from entirely new work designs, organization structures and work arrangements?

Agile experiments need not mean unfettered and chaotic change, if they are used with care. Amazon veterans Colin Bryar and Bill Carr, in their HBR article “Have We Taken Agile Too Far?”, suggest combining agile development with Amazon’s “working backwards” approach:  “The working backwards approach requires a fully realized vision of a proposed product, embodied in a written press release for the product’s launch. This felt wrong, even unnatural, to software developers and product managers who wanted to get going on coding already. Teams typically spent weeks, if not months, hashing out this press release — along with an FAQ that explained to colleagues, customers, and senior management how Amazon could create this wonderful offering at an affordable yet profitable price. Only when the executives were satisfied with these documents could anyone start writing code and actually assemble the product.”

Imagine combining “working backwards” with “agile” in work design.

How to Use the ChatGPT and Bard Examples to “Retool” Work Design

“Retooling HR” means helping non-HR leaders be more engaged and intelligent about HR issues by reframing HR using tools in which leaders are already smart and well-trained.  You can retool the ChatGPT and Bard examples to engage leaders with agile work design experiments.

Ask your leaders to imagine software/product agile design experiments, the risk and rigor they accept, and the logical frameworks they use.  Then, ask them to substitute the words “work and organization” for the words “software/product” and the words “workers/leaders/managers” for the word “users”. 

Have your leaders asserted work policies that restrict how workers and managers are allowed to experiment?  Do they keep work arrangements secret until they believe they are “perfect”?  ChatGPT and Bard were not developed this way.

Work experiments carry risk, but they are probably less risky than releasing a product that reduces market capitalization by $100 billion in one week.  Is the upside of worker-driven work experiments as substantial as the upside from user-driven product experiments?

You can’t learn enough about work innovation by keeping experiments under wraps, restricting them to only what you know for sure, or by keeping them in a laboratory of tightly controlled work designers. As with products and software, allowing experiments to flourish, even if they reveal flaws, may be the most effective way to stay competitive in the talent market, create sustainable work relationships, attract and retain talent, build skills quickly, and achieve other important future work outcomes.

As Boards and leadership teams consider future of work policies, try substituting “product” for “work” and “users” for “workers.”  Ask yourself if your current approach to work design would make sense if it were an agile product strategy.  I think you’ll often find that your mindset about work experimentation is less systematic and nuanced than your mindset about product experimentation, and that you can improve it using the tools you already know from agile product/software design.