Artificial intelligence (AI) use is on the rise in agriculture and elsewhere, thanks to generative AI platforms like OpenAI’s ChatGPT. AI is often described as a Black Box: data is fed into the box and decisions come out, without anyone knowing exactly how the Black Box arrived at the result. If AI decisions are made in secrecy, is there room for a discussion of transparency when talking about AI? Yes.
AI tools require vast amounts of training data to make informed, accurate decisions. Where that training data originated, and from whom, matters when building an AI tool. We are starting to see lawsuits arising from AI tools allegedly trained on unlicensed data. The New York Times has sued Microsoft and OpenAI, alleging that their tools used Times articles without permission to train the software companies’ AI models. Times articles are copyrighted and cannot be used without permission, except in cases of “fair use,” a well-established legal doctrine allowing limited use of copyrighted works in certain contexts (such as education).
Data need not be copyrighted to be protected from use by AI tools without permission. Ag data platforms typically license farmers’ agricultural data for use in their platforms. Often such use is limited by the user agreement (as it should be) to specific uses authorized by the farmer within the platform. In other instances, the license is broad and gives the tech company the right to use ag data for any lawful purpose. Either way, the time has come for tech platforms to explain to users whether their data will be used to train AI models.
The Ag Data Transparent organization recently undertook a review and update of the Core Principles for Ag Data Use. One of these core principles is “Transparency.” Transparency requires tech providers to inform users of how data will be used. When it comes to AI, this means explaining whether data is used to train AI models. Here is the entire principle:
Transparency:
[Ag Tech] Providers shall inform farmers about the purposes for which they collect and use ag data. Providers shall provide information about how farmers can contact the providers with any inquiries or complaints. Providers shall explain the choices the providers offer for limiting its use and disclosure. A provider’s operating principles, policies and practices should be transparent and fully consistent with the terms and conditions in their published legal contracts. Providers should explain whether ag data will be used in training machine learning or artificial intelligence models.
No doubt many tech companies will not take this advice and will simply (mis)appropriate data already obtained from users to train their AI models, with the users never knowing their data was fed into the Black Box. This is a dangerous game, as cases like the Times versus OpenAI are showing. Transparency with data requires more.