[TALK] AI talk at Open Science & Societal Impact conference #3598

Closed
8 tasks done
penyuan opened this issue Apr 6, 2024 · 8 comments
Labels
events (Coordinating workshops, book dashes and any other events), newsletter (items that can be added in the newsletters), talks-and-workshops (Any talk and workshop that is delivered in association with The Turing Way)

Comments


penyuan commented Apr 6, 2024

Date of talk

2024-04-25

Details of the talk

I've been invited to give a talk at the Open Science & Societal Impact online conference. It's scheduled for 16:30 UTC on 25 April 2024, and @dingaaling has kindly agreed to be a co-author.

The meeting is about various facets of open science and policy, and the session I'm asked to speak in is on open science and AI (noting that "AI" can be a problematic term!).

After a couple of stimulating meetings with @dingaaling, here's the general structure of the talk (which is still in development):

  • Among all the AI hype right now, the term "open/open source AI" has been thrown around by many of the big power players, from Meta to OpenAI. Often what they mean by "open source" is far from how the term has been defined for software.
  • This is indeed a problem, and coming up with a clear definition for open source AI is an important conversation. At the same time, we believe defining open source AI is necessary but not sufficient for addressing the challenges around AI.
  • In addition, we should think about the specific outcomes we want to see in a world with AI. What else needs to happen to realise those outcomes? Here are some initiatives worth looking at. (see below for details)
  • Let's look at some problems in scientific research. For example, there are now numerous peer-reviewed papers that include obvious AI-generated content, from those with the text "Certainly, here is a summary of..." to that infamous figure of a lab rat with giant gonads.
  • What's important is that AI is often not the problem! Instead, AI highlights existing problems in our institutions, e.g. the peer review system had been broken for a long time before AI came along. AI is not corrupting, it highlights existing corruption.
  • To learn more, join us at the Turing Way to continue the conversation!

OK, that's the gist of it. More coming!

Checklist

  • Schedule your talk and let the community know in TTW Slack. If you’d like feedback and/or to schedule a practice talk, ask in the TTW Slack!
  • Download template(s) from the promotion pack to ensure stylistic consistency
  • Generate a DOI on Zenodo and upload your slides when ready, preferably in the original format along with a PDF or any other format you are using. Tag with "the-turing-way" under Communities. Zenodo allows versioning, so we encourage you to upload your slides before your talk and add additional file(s) with any changes afterwards (a rough API sketch follows this checklist)
  • Check all acknowledgements (bottom right corner of each slide) and add the DOI for your presentation and your personal contact info if desired
  • If you re-used slides from a specific talk, please acknowledge the original author of the slides
  • Double check acknowledgements slide for TTW team, license info, Scriberia link, any additional acknowledgements
  • Double check contact info for TTW links (book, Twitter, GitHub, Slack, newsletter), your own contact info
  • Share the Zenodo link and a recording (if available) in TTW Slack!
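For anyone ticking off the Zenodo item above programmatically, here is a minimal sketch using the Zenodo REST API. It is illustrative rather than an official Turing Way workflow: the token, file names, title, and author details are placeholders, and the metadata fields you need may differ for your deposit.

```python
# Rough sketch: create a Zenodo deposition, upload slides, tag the
# "the-turing-way" community, and publish. All names/values below are
# placeholders; requires a personal access token with deposit scopes.
import requests

params = {"access_token": "YOUR-ZENODO-TOKEN"}  # placeholder token
base = "https://zenodo.org/api/deposit/depositions"

# 1. Create an empty deposition (this also reserves the DOI for your slides).
dep = requests.post(base, params=params, json={}).json()

# 2. Upload the slides in the original format plus a PDF, per the checklist.
for filename in ["slides.odp", "slides.pdf"]:  # placeholder file names
    with open(filename, "rb") as fh:
        requests.put(f"{dep['links']['bucket']}/{filename}", data=fh, params=params)

# 3. Add metadata, including the Turing Way community.
metadata = {
    "metadata": {
        "title": "AI talk at Open Science & Societal Impact conference",
        "upload_type": "presentation",
        "description": "Slides for the talk.",
        "creators": [{"name": "Surname, Given Name"}],  # placeholder author
        "communities": [{"identifier": "the-turing-way"}],
    }
}
requests.put(dep["links"]["self"], params=params, json=metadata)

# 4. Publish; later revisions can be uploaded as new versions under the same concept DOI.
requests.post(dep["links"]["publish"], params=params)
```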
@penyuan added the newsletter and talks-and-workshops labels on Apr 6, 2024

penyuan commented Apr 6, 2024

A few more things I'm thinking about:

  • Besides the peer review example, there are also anecdotes of prospective PhD students using generative AI to write their cover letters from a list of bullet points, and overworked professors using the same AI to reduce the cover letter back to bullet points to save time. Is this a good example to include?
  • IMO "AI" is a loaded term that can be misleading and often misunderstood. What's the state of the art on explaining this confusion?
  • Other than OpenAI's closed models and Meta's Llama (which is not really open source), are there other examples of people calling their "AI" thing "open source" while actually muddying the waters?


dingaaling commented Apr 8, 2024

Thanks for starting this issue @penyuan! Here is the Medium post with the writeup: https://jending.medium.com/c57ccdbce896

A few thoughts on how to weave this talk preparation and scoping into TTW:

  • Use the open-ai.jpg from Scriberia created at a past book dash: https://zenodo.org/records/7587336
  • Get feedback and open the conversation at the 17 April TTW Collab Cafe, 4PM GMT (share & outreach)
  • Identify engagement opportunities with the talk attendees (e.g. invite them to another future Collab Cafe)
  • Explore if there are elements of the presentation that we can write up for the upcoming Book Dash

@aleesteele added the events label on Apr 17, 2024

penyuan commented Apr 17, 2024

I've fleshed out the content of the talk a bit more into three sections with contents in bullet point form:

1. "Open AI" is often neither open, articificial, nor intelligent

  • We've seen over the past year egregious examples of the use of "AI" technology in scientific practice. For example, we have seen:
    • Generative AI being used to write papers and reviews of those papers. There's even that infamous paper from February 2024 with a prominent figure featuring an AI-generated lab rat with giant gonads. Somehow all of this passed peer review.
    • In undergraduate education, not only are students using generative AI to write essays, instructors are using the same technologies to assess them.
  • Part of the ensuing discussion involves arguments for open source, or "open", AI that provides transparency, security, accountability, and reproducibility for the underlying technologies.
  • But among all the AI hype right now, the term "open/open source AI" has been thrown around by many of the big power players, from Meta to OpenAI. Often what they mean by "open source" is far from how the term has been historically defined for other domains like software. For example:
    • Meta's Llama LLM was pitched as "open source" but in fact came with restrictions that don't fit well-established meanings of the term. Intentions aside, this dilutes the meaning of open source and complicates conversations when we use the term to mean different things.
  • For scientific research, the "openness" of many allegedly open source AI technologies does not enable reuse and reproducibility as we have come to expect of science.
  • On top of this, as Kate Crawford wrote in their book, what we call "AI" is often "neither artificial nor intelligent."
  • Critically, ambiguity around key terms risks adding to already dwindling public/societal trust in scientific institutions.
  • These are indeed problems, and coming up with a clear definition for open source AI is an important conversation. A good example is the process led by the Open Source Initiative. Scientific researchers and open science practitioners should be part of this conversation.

2. Moving towards an outcomes-based approach to AI

  • At the same time, we believe defining open source AI is necessary but not sufficient for addressing the challenges around AI.
  • We should think about the specific outcomes we want to see in a world with AI. What else needs to happen to realise those outcomes?
  • @dingaaling's post goes here

3. What outcomes do we want for open science?

  • Coming back to scientific research, let's look at the examples I gave at the beginning.
  • Let's look at those published peer-reviewed papers that include obvious AI-generated content, from those that start with "Certainly, here is a summary of..." to that infamous figure of a lab rat with giant gonads. What are the outcomes we really want here?
  • Initially, we might think that AI is creating these problems, and an obvious solution is to prohibit the use of AI for writing papers and peer reviews.
  • In my view, doing so would paper over a deeper underlying problem, i.e. applying a bandaid to the gushing wound that is the corruption of peer review itself.
  • Even before "AI" came along, the publish-or-perish culture pressures academics into publishing as many papers as possible when we are already overworked. AI exacerbates that problem.
  • On the other side of this is the expectation to provide peer reviews or be editors on journals with no recognition when we are already overworked. AI exacerbates this problem, too.
  • When put together, they lead to corrupt outcomes.
  • But if we "simply" ban the use of AI, that doesn't tackle the underlying corruption of our institutions. Not to mention that a ban might have unintended consequences, such as preventing using AI to let us quickly understand a new field of knowledge, i.e. generating a review of the literature.
  • There are ongoing efforts to reform scientific publishing, from open peer review, preprints, and Registered Reports to post-publication peer review. Supporting these efforts strikes at the core of the problems.
  • The message here is that AI is often not the problem! Instead, AI highlights existing problems in our institutions. AI is (often) not corrupting in itself, it highlights existing corruption.

So, in summary:

  1. Words matter, and clearly defining key terms will enable effective discourse on the role of AI in open science. We should participate in this process but know that this is a necessary but insufficient step.
  2. In addition to that, we should adopt an outcomes-based approach to thinking about AI issues.
  3. With the understanding that AI is often not the problem per se, but rather a technology that highlights long-existing problems in scientific research. It is a reminder for us to reflect deeply on what we really care about and tackle those problems.

dingaaling commented:

@penyuan some suggested text for slides 13/14 for the interlude of section 2, "Moving towards an outcomes-based approach to AI"!

  • slide 13:
    • We should think about the specific outcomes we want to see in a world with AI. What else needs to happen to realise those outcomes?
  • slide 14:
    • Some of these outcomes may include AI that is safe, democratic, trustworthy, and inclusive, and to achieve those different outcomes we need different tools. All of the positive outcomes we hope for from AI cannot be achieved through the single dimension of openness, and certainly not through an open source AI definition or license alone.
    • A stronger Open Source AI License won’t enable “Transparent” or “Democratic” AI, even though some of the marketing hype we saw in the earlier slide might try to sell that dream. But more openly licensed AI artefacts can support these outcomes, by making it easier for more people to audit or reuse them.
    • Open Source is an enabler of many different outcomes. By being more specific for what outcome we are aiming for (whether that’s promoting freedom, facilitating access/reuse, or building public trust), we can have a better conversation about “what next?” And what new tools should be built to work alongside an Open Source AI definition or license.


penyuan commented Apr 19, 2024

That sounds great @dingaaling thank you!

Continuing the thread in section 2 about outcomes, your post inspired me to think more about specific examples to demonstrate this approach that (1) could be connected to scientific research; and (2) demonstrate the point of section 3, i.e. that dealing with "AI" (let alone "open" AI) often misses the point/deeper problems.

Let me know what you think of this:


The colloquially ambiguous use of the term "AI" can misdirect our attention, as what it refers to is often neither artificial nor intelligent. Not only does it perpetuate AI as pixie dust that you sprinkle onto things to give them a magical sheen, it also entrenches deep systemic problems we've had since long before this popular term came along.

For example, ChatGPT comes across as an autonomous, independent entity that you can have a human(-like) conversation with and that can do tasks for you. In fact, the development of its underlying statistical models and "intelligent" facade is built on traumatised sweatshop labourers - often in African countries - who manually provide training data (such as reported here, here, here, here, here, or here). Calling this "artificial intelligence" further distances us from the inequitable labour and colonialism that have long been deeply problematic.

An outcomes-based approach to AI means that in addition to defining key terms, we consider the outcomes we want to see for these underlying issues and think about what tools we need to achieve that.

What does this have to do with scientific research?

In the past few years, I've peer reviewed several scientific papers where academic researchers crowdsource the labelling of their big datasets to an army of online volunteers, who provide training data to machine learning algorithms. Some researchers like to call this "citizen science" (I disagree), and stress in their papers how crowdsourcing (menial) work to volunteers saves money and is efficient for achieving their scientific aims. Much of the conversation is about how to ensure that these low-skilled volunteers provide scientifically rigorous results. In contrast, relatively little ink is spilled on what the activity means for this labour, or on how this labour is not counted among the "costs" these academic scientists claim to save.

In my view, those of us in the scientific community must engage with broader discourse - including non-academic circles - on the outcomes we'd like to see in a world with AI.

[lead into section 3]


Apparently my presentation has to be 17-18 minutes, so fitting everything in is a challenge, but I'd appreciate input from @dingaaling or @everyone on whether this fits with the outline above!


penyuan commented Apr 23, 2024

I've published a "release candidate" iteration of the slides to Zenodo:

https://doi.org/10.5281/zenodo.11051128

I'm still tweaking a few things, and the final version used for the talk this Thursday will use the same DOI.


penyuan commented Apr 26, 2024

With many thanks to @dingaaling I "shipped" the final talk yesterday.

Slides and video recording:

https://doi.org/10.5281/zenodo.11051128

Recording on Internet Archive:

https://archive.org/details/AI-is-not-the-problem-2024-04-25

Transcript:

https://write.as/naclscrg/talk-ai-is-not-the-problem

There are about 100 tabs in my browser with further reading from the development process, some of which was kindly suggested by @dingaaling. I'll try to dump those somewhere, maybe in one of the documents above...

Thanks everyone!


penyuan commented May 1, 2024

Quick update from the 1 May 2024 Turing Way Collaboration Cafe.

Notes from the cafe pad

  • Open Source AI - Open Science & Societal Impact (Room 8)
  • https://arxiv.org/html/2404.06484v1
  • Following Pen's presentation, we are eager to find ways to continue the conversation, for example:
    • Position Paper summarising OpenUK & OSSI presentations
    • "Full Stack" approach to open source AI - Thinking about this from a full stack angle, rather than just focusing on models, etc., "a full stack approach to open source"
    • Connecting in a multidisciplinary, open-umbrella approach to inform our understanding of what open source AI is and what it could/should be
    • Case studies from across the AI stack/pipeline, where stack is not limited to the technical stack
    • Book Dash documentation of the different "tools in the toolbox" for open practices in AI
    • Mapping tools and practices across the stack
    • Organise a Fireside Chat tying different open communities into open source AI (data, software, hardware, environmental) - @aleesteele suggests that we open an issue (using the relevant template) to discuss this, and schedule a meeting to chat about what the topic might be.
  • We also discussed an interesting, broad research project idea
    • There are past (and ongoing) efforts to characterise and evaluate the health of open source communities and the collaboration that happens within them, e.g. CHAOSS.
    • Now, there is not just work on modelling openness for AI but also similar thinking on how we can examine AI collaboration
    • There's potential for a mixed methods approach to looking at AI collaboration, combining data mining/scraping of projects (such as on Hugging Face) with in-depth interviews with individuals representing the different actors in that system (a rough data-mining sketch follows this list)
  • And who knows, maybe in the future we can present something at a meeting, such as this one: https://metascience.info/
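To make the data-mining half of that mixed-methods idea concrete, here is a rough sketch (not a worked-out study design) using the huggingface_hub Python library to pull a small sample of popular models and their basic engagement signals; which fields to analyse, and how to pair them with the interview strand, would still need to be worked out.

```python
# Rough sketch: list a sample of the most-downloaded models on the Hugging Face
# Hub as a starting point for studying collaboration around "open" AI projects.
from huggingface_hub import HfApi

api = HfApi()

# Fetch the 20 most-downloaded models; each entry carries simple engagement
# signals (downloads, likes) that could seed a quantitative analysis.
for model in api.list_models(sort="downloads", direction=-1, limit=20):
    print(model.id, model.downloads, model.likes)
```

Repository-level signals such as commit history, discussions, and pull requests could be layered on top of this and then triangulated with the in-depth interviews mentioned above.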

