University of Michigan is selling student data to AI companies

Do you want to buy some student data for your AI? The University of Michigan can help. It appears that representatives of the school or its partners are cold-emailing tech employees at Google and other companies, offering data on University of Michigan students to train large language models. The data includes recordings of lectures, student discussions and office hours, as well as essays written by seniors and graduate students, which are available for a nominal license fee. It is not clear whether the students gave their consent or not.

The story surfaced in an X/Twitter post by an employee of Google DeepMind, the company’s AI research hub. Susan Zhang, an engineer at DeepMind, said she received a sponsored LinkedIn message that provided information and offered a free sample of data from the University of Michigan to prove its usefulness.

“I am contacting you because, based on your profile, you may be working with large language models (LLM) or natural language processing,” the sales message said. “I wanted to let you know that the University of Michigan is licensing academic speech data and student papers that could be very useful for training or tuning the LLM.”

The message offers data from 85 hours of lectures, discussion sections, and interviews for $15,595, a second set of 829 papers written by University of Michigan students in a variety of disciplines for $12,595, or a discounted package of both data sets for $25,000.

“I think it’s worth finding out which universities are selling student data and what the terms are,” Zhang told Gizmodo in a message on X. The creators won’t get a penny, while the reseller who stores the data will capture all the profits.

The university appears to be working with an organization called Catalyst Research Alliance, which also claims to have a partnership with North Carolina State University. The website offers a sample data set, which comes with an essay titled “The Democratic Inadequacies of the European Union” and what appears to be a recording of a classroom discussion section.

Catalyst Research Alliance and North Carolina State University did not immediately respond to requests for comment. A representative from the University of Michigan said they were preparing a statement. We’ll update this article when we hear back.

Training large language models, the software that runs chatbots like ChatGPT and Bard, requires massive, clearly labeled data sets across a variety of topics and subjects. While the University of Michigan data set is small, the well-organized material on a narrow set of topics may be useful for tuning some models, especially tools designed for specific purposes related to education or formal communication, or for improving performance in individual areas of subject-matter expertise when training more general AI.
