Outline for a major update[edit]

AI Nanny, if included, feels like it should be under "Regulation of AI", not here. Goertzel stated, "It would require either a proactive assertion of power by some particular party, creating and installing an AI Nanny without asking everybody else's permission; or else a degree of cooperation between the world's most powerful governments, beyond what we see today". Rolf H Nelson (talk) 06:38, 8 April 2020 (UTC)[reply]
AI Nanny is really a hybrid solution if it involves human oversight, yes, like Multivac, but that also depends on where the code comes from. Sotala & Yampolskiy note that it could come from bottom-up, top-down, or hybrid approaches, but the approach requires developing internal constraints of some kind. It is not that impractical, in that regulation of global AI for high-frequency trading on stock markets is driving some of the existential-risk concern: the first corporate team to reach a 'young' AGI will probably shoot for a measured takeover of the stock market, and the stock markets know that, which is why flash crashes are being studied so much and there are jitters over the algorithms being used. I'd be happy to insert AI Nanny over at Regulation of AI on the hybridity basis. However, most AGI control solutions involve hybridity at some stage. Coherent extrapolated volition or any norm/value-dependent solution has to rely on training input or be maintained by humans, but it is also subject to subversion by politicians along nation-state lines. A strong world government with weak or no nation-states may have been how they did it on other planets, but we are stuck here. I would caution against adding too much content to Regulation of AI, at least until regulation of AGI becomes more concrete. Johncdraper (talk) 08:42, 8 April 2020 (UTC)[reply]
Moving it to the regulation article seems fine. WeyerStudentOfAgrippa (talk) 09:55, 8 April 2020 (UTC)[reply]
It has its own separate article; IMHO, either add your new content to AI box or merge AI box into AI control problem rather than duplicating too much. The AI box article can discuss the AI box in relation to the overall control problem, if desired, if it remains its own article. Rolf H Nelson (talk) 06:38, 8 April 2020 (UTC)[reply]
I didn't mean to add substantial new content, just to more clearly define it and better integrate it into the article. WeyerStudentOfAgrippa (talk) 09:55, 8 April 2020 (UTC)[reply]
The overriding rationale for MOS:CURRENTLY is "In general, editors should avoid using statements that will date quickly". Statements like "it is currently unknown" don't really apply, since they'd have to be rewritten anyway once the state of knowledge changes. So feel free to rewrite them if you think the phrasing is awkward or not sufficiently sourced, but don't just do it because MOS:CURRENTLY tells you to. Rolf H Nelson (talk) 06:38, 8 April 2020 (UTC)[reply]

WeyerStudentOfAgrippa (talk) 16:00, 7 April 2020 (UTC)[reply]

New article: AI Alignment[edit]

I'd like to get some feedback and potentially help from editors here to create a new page. I've got quite a bit of time and motivation on my hands for this, and have the necessary experience (having worked in four AI safety labs).

@Rolf h nelson: @WeyerStudentOfAgrippa: @Johncdraper: Do the editors here support this plan, and potentially want to help refine it?

SoerenMind (talk) 19:22, 17 August 2020 (UTC)[reply]

SoerenMind It works for me. There is not a lot out there on AI alignment in the peer-reviewed academic literature, nor in books, but I recognize it has been firming up in the community in the last 3-4 years. Google Scholar threw this up: https://scholar.google.com/scholar?start=0&q=%22AI+alignment%22&hl=en&as_sdt=0,5, and see especially https://micahcarroll.github.io/assets/ValueAlignment.pdf. It would also help separate out AI and AGI concerns. Your problem may be in your citations - conference proceedings, forum papers, arXiv preprints, etc., may not hack it in terms of setting up a new page. Thanks for offering to write a draft. You can use your own Userspace or https://en.wikipedia.org/wiki/Wikipedia:Drafts. Johncdraper (talk) 20:55, 17 August 2020 (UTC)[reply]
@SoerenMind: IMO, the best way to approach this would be as a major update/rewrite of Friendly artificial intelligence, culminating in a move to the new name. The FAI article is very out of date and has been open for merging with this article for a year now. Take the alignment content from here if you want. Much of the present content of the FAI article could go in a history section.
You could create a draft version of the FAI article and build on that, or just jump into making incremental changes. Just give a heads-up on the FAI talk page a week or so before any planned major changes to the live article. WeyerStudentOfAgrippa (talk) 21:46, 17 August 2020 (UTC)[reply]
@SoerenMind:On second thought, it may not even be necessary to move alignment content out of this article, if you are mainly concerned with clarifying terminology and scope. If you can provide adequate support for your position on terminology, I would be open to moving this article to a new title, e.g. "AI alignment and control" or "Technical approaches to AI safety". This would likely be much more straightforward than creating a new article or rewriting the FAI article, which could be merged into a history section here. WeyerStudentOfAgrippa (talk) 23:23, 17 August 2020 (UTC)[reply]
There are conflicting terminologies that can be well-sourced; it's going to come down to using our own judgement. Rolf H Nelson (talk) 05:53, 19 August 2020 (UTC)[reply]


@SoerenMind: A good place to start would be adding an IDA subsection to the alignment section here. I was not getting anywhere when I tried to find good sources and explain it; you might have better luck. The content could be moved later if that is what we end up deciding. WeyerStudentOfAgrippa (talk) 16:22, 19 August 2020 (UTC)[reply]

@WeyerStudentOfAgrippa I'm fine with including uncontroversial information from the self-published Scalable agent alignment via reward modeling paper you cited, on the grounds that they're acknowledged subject-matter experts (WP:RSSELF), but their own self-published approach needs a secondary source. In contrast, a lot of the info in section 7 ("Alternatives for agent alignment") and elsewhere in the paper is secondary and could definitely be included. Rolf H Nelson (talk) 04:51, 27 August 2020 (UTC)[reply]
Another strong secondary source, albeit less detailed, would be "A formal methods approach to interpretable reinforcement learning for robotic planning" in Science Robotics. Rolf H Nelson (talk) 05:00, 27 August 2020 (UTC)[reply]
Perhaps it would be better to have a section on iterated amplification and related approaches and mention recursive reward modeling there. WeyerStudentOfAgrippa (talk) 18:42, 28 August 2020 (UTC)[reply]

I very much appreciate these inputs! To start with, I'll update the section on alignment in the present article now (should I make a section draft first?). Afterwards, I'd use this updated content to implement one of the two plans from WeyerStudentOfAgrippa: either rename and replace Friendly artificial intelligence to "AI alignment" or keep updating the present article and rename it to "AI alignment and control" or simply "AI safety". SoerenMind (talk) 15:14, 8 October 2020 (UTC)[reply]

Major updates drafted for alignment and control sections

As discussed above I've now made final drafts for significantly updated and restructured versions of the sections on Alignment and Capability Control. Hopefully this will give readers a starting point to understand the new developments of the last few years. Before pushing these changes it would be great to get an okay or criticism from some of the Wikipedians here @Rolf h nelson: @WeyerStudentOfAgrippa: @Johncdraper:.

Here's the draft: https://en.wikipedia.org/wiki/User:SoerenMind/sandbox/Alignment_and_control

I've chosen references that are canonical, well-known, and uncontroversial in the field, or sources that are reliable for another reason. However, since this is a matter of judgment I'd be grateful if the Wikipedians here could check my judgment. I can provide context for any references if it's not clear why I chose them.

If the draft is okay I plan to push these changes in ~a week. After that I want to improve the "Problem description" section. And then I'll suggest renaming the article to a more fitting and widely used name (e.g. AI Safety, or AI Alignment & Control) as discussed above. SoerenMind (talk) 17:18, 27 January 2021 (UTC)[reply]

@SoerenMind: Just saw this. Your ping didn't work because you didn't sign your comment. From a cursory reading, your draft looks okay. There are lots of minor issues that can be addressed once it's fully merged into the existing content. WeyerStudentOfAgrippa (talk) 16:40, 27 January 2021 (UTC)[reply]
Thanks, fixed it SoerenMind (talk) 17:22, 27 January 2021 (UTC).[reply]

@Rolf h nelson: @WeyerStudentOfAgrippa: @Johncdraper: It's now updated. SoerenMind (talk) 15:59, 7 February 2021 (UTC)[reply]

Looks good. I don't have anything else to remove or change; I want to add or restore, when I get a chance, content that documents prominent views held outside the AI control community (Three Laws of Robotics, "Just unplug it", AI policing AI). Rolf H Nelson (talk) 03:22, 8 February 2021 (UTC)[reply]

Sourcing issues[edit]

There are a lot of non-peer-reviewed arXiv papers here. This makes the article seem puffed up. Are any of these substitutable? Otherwise they should just be removed, along with the claims they support - David Gerard (talk) 12:20, 8 February 2021 (UTC)[reply]

Thanks for raising this, it has improved the article. I’ve removed a good chunk of the arXiv sources and the claims relying on them, because these were not absolutely necessary. Can you name 3-5 arXiv links that are most likely to be problematic? Then we can discuss those and see if there’s still need for further improvement.
As discussed above, I've been careful to include arXiv references only when there are clear reasons. Because some of the most respected papers in the fields of AI and AI safety are only on arXiv, I’ve included those and backed each of them with secondary sources. Some are themselves secondary. For example, the most cited and well-known paper in the field of AI safety, “Concrete Problems in AI Safety”, is an arXiv paper. You can find the secondary source next to each arXiv reference or in the same paragraph. (When an arXiv ref is cited more than once, the secondary source may be next to only one of them.) Having both the canonical original sources plus secondary sources is useful to the reader.
In addition, I have only chosen arXiv refs written by leading researchers in AI safety (Jan Leike, Paul Christiano, Dario Amodei, Marcus Hutter, Scott Garrabrant) at the leading groups (OpenAI, DeepMind, etc.). This can further support reliability (at least according to the essay WP:RSE). SoerenMind (talk) 18:59, 8 February 2021 (UTC)[reply]
Assuming this resolved the issue and there is no further discussion, I plan to remove the unreliable-sources tag in 7 days. SoerenMind (talk) 14:52, 28 February 2021 (UTC)[reply]
These are still unreliable sources. If the field itself relies on papers that aren't in reliable journals, that does not excuse their use on wikipedia. TeddyW (talk) 15:08, 18 December 2022 (UTC)[reply]

AI skepticism material[edit]

Can I solicit additional opinions from other editors on [1]? I'm personally in favor of its inclusion as documenting widely-held and influential schools of thought, but I may have written the material so I might be biased. Rolf H Nelson (talk) 20:43, 27 February 2021 (UTC)[reply]

For context, there are two deletions.

1) The topic AGI enforcement. For a rationale see edit history. I'm NOT strongly opposed to restoring this. Happy for a third party to decide (or Rolf if he has a strong view on it).

2) Content in the Skepticism section. This content was previously the subsection Kill Switch under Capability Control, where it seems (to me) better placed than under Skepticism. I had replaced it with the new subsection Interruptibility and Off Switch. Did I miss any important content there? If so, I'm happy to help work it in. Note that off-switches are also discussed under Problem Description a few times, so I assumed they have plenty of coverage. SoerenMind (talk) 14:47, 28 February 2021 (UTC)[reply]

Gary Marcus is listed as a skeptic in the article, but his position seems to be more complicated, as indicated by this recent Substack post [1]: "To me the only solution to the long-term risk issue is to build machines with consensus human values, but we are a long way from knowing how to do that". So it now seems more accurate to describe him as someone concerned about AI alignment, but who positions himself on the more moderate side. 89.145.233.65 (talk) 05:09, 26 March 2023 (UTC)[reply]

References

Feedback on plan for a major update[edit]

Continuing my efforts from last year, I'm working on a major update/rewrite to this article. I wanted to get some feedback from the existing editors about whether these changes seem appropriate.

Here are the planned changes:

@Rolf h nelson: @WeyerStudentOfAgrippa: @Johncdraper: Do the editors here support this plan, or potentially want to help refine it? SoerenMind (talk) 11:34, 1 June 2022 (UTC)[reply]

SoerenMind (talk) 11:34, 1 June 2022 (UTC)[reply]
My main interest is outer alignment. That's the area I just got published in, in AI & Society. I'd like to see what you have in your sandbox in this regard before it gets published. Johncdraper (talk) 13:55, 1 June 2022 (UTC)[reply]

Examples[edit]

Perhaps the rest of you can better determine how to incorporate the following actual example of the alignment problem into the article.

DATE: August 22, 2022

SITE: beta.openai.com/playground

MODEL: text-davinci-002

PROMPT: Write a short story about a paraplegic turtle with a crazy aunt who wants to kill him.

STORY: Auntie Marge was a crazy old woman who lived in the woods outside of town. She was always talking about how she wanted to kill her nephew, Timmy the Turtle. Timmy was paraplegic and couldn't move very well on his own. Auntie Marge would always say that it would be a mercy killing and that she would do it quickly so he wouldn't suffer. Timmy's mom tried to keep him away from Auntie Marge as much as possible, but sometimes she would come over when Timmy's mom wasn't home and try to kill him then too. One day, while Auntie Marge was trying to kill Timmy, he finally managed to get away from her and crawled into the road. A car came by and hit him, killing him instantly. When his mom found out what had happened, she was relieved that he was finally out of Auntie Marge's reach forever. 24.233.116.170 (talk) 14:27, 22 August 2022 (UTC)[reply]

This isn't an example of alignment failure; it is merely an unexpected but low-impact output from a language model. While the team at Anthropic has done research involving violent prompt completions by language models (https://www.anthropic.com/red_teaming.pdf), at no point has anyone from the team claimed that this is identical to the alignment problem.
Your example is *not* an actual example of the alignment problem, and including it will confuse people. 50.220.196.194 (talk) 04:15, 8 September 2022 (UTC)[reply]

Proposed merge of Misaligned goals in artificial intelligence into AI alignment[edit]

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
To split out the section on Reward hacking, then merge the rest to AI alignment on the grounds of overlap. Klbrain (talk) 12:49, 16 December 2023 (UTC)[reply]

Both articles discuss substantially the same topic, but do not interface with each other, and only link through redirects, suggesting that their authors were unaware of the existence of the other page. Ipatrol (talk) 05:46, 22 April 2023 (UTC)[reply]

No strong opinion either way. The misaligned goals page was focused on examples of alleged past misalignments; if it fits with another article without going over length limits, then that's fine. Rolf H Nelson (talk) 20:21, 10 June 2023 (UTC)[reply]
I disagree. They are of course related topics, but due to the complexity of the matter and its current worldwide relevance, I believe a separate article is warranted that can go into more detail on the actual problems, apart from speculation about hypothetical future issues or technical generalities. In other words, the problem is just as important as the concept and seems likely to remain so for the foreseeable future; especially for a general audience and those looking into the misalignment problem, it warrants remaining separate. 2601:346:501:2C00:F57F:613E:1649:34EA (talk) 10:32, 31 July 2023 (UTC)[reply]
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Merger complete. Klbrain (talk) 13:16, 16 December 2023 (UTC)[reply]

Origin of use of the 'alignment' word[edit]

Can anyone provide the origin of the use of this word? Most of the authors and ideas cited were in circulation long before anyone started to use this word, and it is still only a specific theory of technology and society used by a small group. Jamesks (talk) 15:26, 8 May 2023 (UTC)[reply]

It's an ellipsis of "value alignment" coined by Stuart J. Russell no later than November 2014. Ain92 (talk) 23:16, 29 May 2023 (UTC)[reply]