Talk:OpenAI Codex

From Wikipedia, the free encyclopedia

Did you know nomination

The following is an archived discussion of the DYK nomination of the article below. Please do not modify this page. Subsequent comments should be made on the appropriate discussion page (such as this nomination's talk page, the article's talk page or Wikipedia talk:Did you know), unless there is consensus to re-open the discussion at this page. No further edits should be made to this page.

The result was: promoted by Theleekycauldron (talk) 06:30, 8 September 2021 (UTC)

  • ... that the artificial intelligence model OpenAI Codex can help with "probably the least fun part of programming"? Source: OpenAI blog, as quoted by VentureBeat
    • ALT1:... that OpenAI Codex, an artificial intelligence model based on GPT-3, has been trained on 159 gigabytes of code in addition to text? Source: VentureBeat "Codex was trained on 54 million public software repositories hosted on GitHub [...] The final training dataset totaled 159GB."
    • ALT2:... that OpenAI Codex's use of licensed code as training data has raised questions about the copyright status of machine learning models? Source: InfoWorld, quoting FSF: "Is a trained AI/ML model copyrighted? Who holds the copyright?"
    • ALT3:... that OpenAI Codex's use of licensed code as training data has drawn similarities to Google Books's digitization of many in-print books, in terms of its copyright implications? Source: The Register "We're reminded of the Authors Guild vs Google case.", WIRED "There are many ways of transforming a work, like using it for parody or criticism or summarizing it—or, as courts have repeatedly found, using it as the fuel for algorithms. In one prominent case, a federal court rejected a lawsuit brought by a publishing group against Google Books"
  • Comment: First nomination, so no QPQ.

Created by Eviolite (talk). Self-nominated at 03:51, 4 September 2021 (UTC).

  • Date, size, etc. are fine. What I am a bit concerned about is whether the topic is notable (it's borderline, very new) and to what degree we should consider this a form of promotion and use the main page for it. In other words, this is technically OK, but I am a bit concerned that it is just not appropriate for the front page. Either concern alone wouldn't matter much, and I'd give this a pass, but taken together they worry me, so I'll mark this as 'maybe' and request a second reviewer (ping User:BlueMoonset, maybe there's a way to mark this better in such a case? Or maybe my concerns are not relevant at DYK level?). --Piotr Konieczny aka Prokonsul Piotrus| reply here 05:32, 4 September 2021 (UTC)
@Piotrus: Thanks for the comments. I personally would expect that the topic is notable; there is plenty of SIGCOV that seems to show that it will have lasting impact, but WP:CRYSTAL. On the other hand I completely get the promotion/neutrality concern and I wonder if it would be better to use something like "... that OpenAI Codex has raised questions about the copyright status of machine learning models and training data?" based on the InfoWorld source (currently the last in the article), though it would obviously have to be added into the article (it's late for me so I'll try to expand tomorrow). Thanks, eviolite (talk) 06:28, 4 September 2021 (UTC)
Either that or ALT1 is more neutral, although I do think the main hook is the 'most interesting' (good marketing should draw people's attention...). Sigh. --Piotr Konieczny aka Prokonsul Piotrus| reply here 07:15, 4 September 2021 (UTC)
I have expanded that section of the article a bit and have proposed that as ALT2, and added a possible ALT3 as well. eviolite (talk) 17:39, 4 September 2021 (UTC)
To answer Piotrus's question, the {{DYK?again}} (red arrow) icon should be used if you want a second opinion. A few general points: the original hook does read as promotional to me, and ALT2's "OpenAI Codex has raised questions" construction is problematic, since Codex itself has not raised questions (which an AI might do), but its existence and how it works have resulted in questions being raised. ALT3 is a bit vague: "has been compared to" is effectively meaningless when you don't know the context and whether the comparison is that they are similar or far from it. BlueMoonset (talk) 00:24, 5 September 2021 (UTC)
@BlueMoonset: Thanks. I have modified ALT2 and ALT3 to clarify and give a bit more context, though I'm worried about going too far and making them too clunky. (For reference, the originals: "... that OpenAI Codex has raised questions about the copyright status of machine learning models and training data?" and "... that OpenAI Codex has been compared to Google Books in terms of its copyright implications?") eviolite (talk) 02:46, 5 September 2021 (UTC)
@Piotrus: to see if these are more satisfactory... eviolite (talk) 21:06, 5 September 2021 (UTC)
I think the revised hooks, particularly ALT2 and 3, are GTG. In the future, for clarity, please add them as ALT2a and such rather than modify the old ones. --Piotr Konieczny aka Prokonsul Piotrus| reply here 02:33, 6 September 2021 (UTC)
Thank you, I'll make sure to do so in the future. eviolite (talk) 03:16, 6 September 2021 (UTC)
ALT2 to T:DYK/P4