      Film Companies and Reddit Clash Again in Court over Anonymous Piracy Comments

      news.movim.eu / TorrentFreak • 11 January 2024 • 4 minutes

    Millions of people regularly pirate movies and music. While this is against the law, most don’t get into trouble.

    Some people feel so comfortable about their piracy habits that they openly discuss them online, for example on Reddit, where most people use a semi-anonymous handle.

    Admissions of anonymous Redditors typically go unnoticed by copyright holders and, even when observed, it’s rare for companies to take matters further or ask any questions. A group of independent film companies in the United States set out to buck that trend last year.

    The film companies and their attorney Kerry Culpepper say they’re not planning to take Reddit users to court. However, they do want to use public piracy-related comments as evidence in a lawsuit against these users’ Internet provider.

    These lawsuits don’t target individual pirates. Instead, the film companies have sued Internet providers including RCN, Grande, and Frontier, for failing to disconnect ‘repeat infringer’ customers.

    Trying Once, Twice….

    The independent film companies first reached out to Reddit roughly a year ago, asking the company to disclose names, IP addresses, and emails of various users. These people all made piracy-related claims in comments on Reddit, with several indicating that their ISP did little to stop this activity.

    Reddit was unhappy with the subpoena, characterizing it as overbroad and more akin to a fishing expedition than regular evidence gathering. Reddit only handed over the details of one user whose comment mentioned RCN directly, denying the rest to protect users’ First Amendment right to anonymous speech.

    The court eventually agreed with this defense, concluding that Redditors’ First Amendment right to anonymous speech outweighs the interest of rightsholders. According to Magistrate Judge Laurel Beeler, the filmmakers have other options to obtain this type of information, including through RCN itself.

    After this setback, the rightsholders filed an adjusted request in their case against ISP Grande. This second attempt wasn’t successful either, as Judge Beeler again concluded that the Redditors’ right to anonymous speech outweighs the rightsholders’ need for additional evidence.

    Third Time’s The Charm?

    This week, attorney Kerry Culpepper returned to the California federal court with a request for Reddit to comply with another subpoena. This time, it’s part of the case against Internet provider Frontier Communications.

    Broadly speaking, the request looks similar to those seen before. The film companies, Voltage Holdings and Screen Media Ventures, highlight several comments from Redditors that could help to prove that the ISP didn’t take proper action against repeat infringers, or that lax enforcement acted as a draw to potential pirates.

    “In the Reddit ‘Piracy’ forum, Reddit user ‘Cyb3rR****’ admits to using Frontier’s service to pirate from the notorious piracy websites 1337x and PirateBay and that ‘I’ve been torrenting unprotected for like a decade and never gotten [a DMCA notice]’,” the companies write.

    Other Redditors made similar remarks, suggesting that using the ISP to pirate movies shouldn’t result in any trouble.

    ‘This Time is Different’

    In common with the earlier cases, Reddit is refusing to comply. Last week, it objected to the subpoena, arguing that the request violates its users’ rights to anonymous speech. This prompted the movie companies to go to court again, with a new plan of attack.

    The requested information is relevant and proportional to the needs of the case, they argue before the court. Contrary to the earlier cases, the subpoena no longer seeks any names and email addresses, only the IP address logs.

    “Reddit asserts that the information Movants request is not permissible under the First Amendment. However, Movants’ subpoena does not request anonymous users’ identities.

    “Rather, the subpoena is limited to requesting the Reddit users’ IP address logs. Accordingly, the analysis of Reddit I and Reddit II is not applicable,” the movie companies add.

    ‘Users Are not the Target’

    According to the rightsholders, Reddit didn’t identify any potential harm to the affected users. They further note that there are no plans to go after these people directly.

    “Movants are not seeking to retaliate economically or officially against these Reddit users. Rather, Movants wish to use their comments as evidence that Frontier has no meaningful policy for terminating repeat infringers and this lax or no policy was a draw for using Frontier’s service.”

    Reddit previously sent a letter to the movie companies’ attorney questioning whether the IP addresses are relevant to the copyright infringement claims. In addition, it suggested that there might be other sources of evidence available to prove the same.

    The request doesn’t disclose why IP addresses are needed, since the anonymous comments are public. One theory would be that the rightsholders will check these addresses for repeat infringements, which might add extra weight to their claims.

    Reddit has yet to respond in court but, based on its earlier responses, it will do all it can to keep users’ information private.

    This results in a standoff similar to the earlier ones, albeit with a twist. Since Frontier is not planning to disclose customer identification information, the filmmakers see these comments as important evidence, and this time they hope that the court agrees.

    Ultimately, it will be up to the court to decide whether it’s indeed different this time, or not.

    A copy of the motion to compel Reddit to respond to the subpoena is available here (pdf). A copy of Reddit’s letter to the attorney can be found here (pdf).

    From: TF , for the latest news on copyright battles, piracy and more.

      Meta Admits Use of ‘Pirated’ Book Dataset to Train AI

      news.movim.eu / TorrentFreak • 11 January 2024 • 4 minutes

    In recent months, rightsholders of all ilks have filed lawsuits against companies that develop AI models.

    The list includes record labels, individual authors, visual artists, and more recently the New York Times. These rightsholders all object to the presumed use of their work without proper compensation.

    Several of the lawsuits filed by book authors include a piracy component as well. The cases allege that tech companies, including Meta and OpenAI, used the controversial Books3 dataset to train their models.

    The Books3 dataset has a clear piracy angle. It was created by AI researcher Shawn Presser in 2020, who scraped the library of ‘pirate’ site Bibliotik. This book archive was publicly hosted by digital archiving collective ‘The Eye’ at the time, alongside various other data sources.

    Bibliotik and other sources previously hosted at The Eye

    The general vision was that the plaintext collection of more than 195,000 books, which is nearly 37GB in size, could help AI enthusiasts build better models, which would spur innovation.

    AI Boom Triggers Copyright Troubles

    Presser wasn’t wrong, but the dataset didn’t just help garage AI startups. Several of the world’s largest tech companies discovered it too and used it to improve their own language models.

    For years, Books3 continued to be freely and widely available, aiding AI researchers and enthusiasts around the world. However, when the AI boom reached the mainstream last year, book authors and publishers took notice, then took retaliatory action.

    For example, Danish anti-piracy group Rights Alliance demanded that The Eye remove its copy of Books3, which it did. The dataset also disappeared from the website of AI company Hugging Face, which cited reported copyright infringement, while others considered their options.

    As previously reported by Wired, Bloomberg informed Rights Alliance that it doesn’t plan to train future versions of its BloombergGPT model using Books3, and other companies likely made similar decisions behind closed doors.

    Meta Admits Books3 Use

    These are noteworthy developments but not all complaints can be resolved with promises. Several lawsuits against OpenAI and Meta remain ongoing, accusing the companies of using the Books3 dataset to train their models.

    While OpenAI and Meta are very cautious about discussing the subject in public, Meta provided more context in a California federal court this week.

    Responding to a lawsuit from writer/comedian Sarah Silverman, author Richard Kadrey, and other rights holders, the tech giant admits that “portions of Books3” were used to train the Llama AI model before its public release.

    “Meta admits that it used portions of the Books3 dataset, among many other materials, to train Llama 1 and Llama 2,” Meta writes in its answer.

    This admission doesn’t come as a massive surprise as several sources, including research papers, basically reached the same conclusion. While the use of Books3 is not contested by Meta, the question remains whether the company was in the wrong when it did so.

    Meta Denies Copyright Infringement

    Meta’s answer admits the use of Books3 but denies various other allegations and claims. For example, the authors alleged that Meta trained its AI on copyrighted works without permission. The answer doesn’t directly deny this but notes that consent or compensation is not necessarily required.

    “To the extent a response is deemed required, Meta denies that its use of copyrighted works to train Llama required consent, credit, or compensation,” Meta writes.

    The authors further stated that, as far as their books appear in the Books3 database, they are referred to as “infringed works”. This prompted Meta to respond with yet another denial. “Meta denies that it infringed Plaintiffs’ alleged copyrights,” the company writes.

    Fair Use

    Meta’s response doesn’t provide much additional detail and the full defense will be revealed as the case progresses. It is clear, however, that the company plans to rely on a fair use defense, at least in part.

    “To the extent that Meta made any unauthorized copies of any Plaintiffs’ registered copyrighted works, such copies constitute fair use under 17 U.S.C. § 107,” Meta notes.

    The fair use angle is expected to be a key part of this and other AI lawsuits. This doesn’t only apply to ‘pirate’ sources but also to the use of content that’s published through official channels, but used without explicit permission.

    These legal battles are still in their early stages, but may ultimately find their way to the Supreme Court if needed. AI companies have stressed that progress will be hampered if rules and regulations are too strict.

    Earlier this week, OpenAI mentioned that fair use is both necessary and critical to building competitive AI models, noting that news organizations can opt out if they wish. Needless to say, this option didn’t previously exist, certainly not for the Books3 database.

    We presume that when Presser created Books3, he never envisioned the dataset to be at the center of landmark lawsuits that could define the future of AI. However, the stakes have changed, and the well-intended ‘archiving’ effort is now part of a major copyright clash.

    A copy of Meta’s response to the authors’ first consolidated amended complaint is available here (pdf).

    From: TF , for the latest news on copyright battles, piracy and more.

    • To chevron_right

      Meta Admits Use of ‘Pirated’ Book Dataset to Train AI

      news.movim.eu / TorrentFreak • 11 January 2024 • 4 minutes

    meta logo In recent months, rightsholders of all ilks have filed lawsuits against companies that develop AI models.

    The list includes record labels, individual authors, visual artists, and more recently the New York Times. These rightsholders all object to the presumed use of their work without proper compensation.

    Several of the lawsuits filed by book authors include a piracy component as well. The cases allege that tech companies, including Meta and OpenAI, used the controversial Books3 dataset to train their models.

    The Books3 dataset has a clear piracy angle. It was created by AI researcher Shawn Presser in 2020, who scraped the library of ‘pirate’ site Bibliotik. This book archive was publicly hosted by digital archiving collective ‘ The Eye ‘ at the time, alongside various other data sources.

    Bibliotik and other sources previously hosted at The Eye

    the eye

    The general vision was that the plaintext collection of more than 195,000 books, which is nearly 37GB in size, could help AI enthusiasts build better models, which would spur innovation.

    AI Boom Triggers Copyright Troubles

    Presser wasn’t wrong, but the dataset didn’t just help garage AI startups. Several of the world’s largest tech companies discovered it too and used it to improve their own language models.

    For years, Books3 continued to be freely and widely available, aiding AI researchers and enthusiasts around the world. However, when the AI boom reached the mainstream last year, book authors and publishers took notice, then took retaliatory action.

    For example, Danish anti-piracy group Rights Alliance demanded The Eye to remove their copy of Books3, which it did. The dataset also disappeared from the website of AI company Huggingface, citing reported copyright infringement , while others considered their options.

    As previously reported by Wired, Bloomberg informed Rights Alliance that it doesn’t plan to train future versions of its BloombergGPT model using Books3, and other companies likely made similar decisions behind closed doors.

    Meta Admits Books3 Use

    These are noteworthy developments but not all complaints can be resolved with promises. Several lawsuits against OpenAI and Meta remain ongoing, accusing the companies of using the Books3 dataset to train their models.

    While OpenAI and Meta are very cautious about discussing the subject in public, Meta provided more context in a California federal court this week.

    Responding to a lawsuit from writer/comedian Sarah Silverman, author Richard Kadrey, and other rights holders, the tech giant admits that “portions of Books3” were used to train the Llama AI model before its public release.

    “Meta admits that it used portions of the Books3 dataset, among many other materials, to train Llama 1 and Llama 2,” Meta writes in its answer.

    meta books3 answer

    This admission doesn’t come as a massive surprise as several sources, including research papers, basically reached the same conclusion. While the use of Books3 is not contested by Meta, the question remains whether the company was in the wrong when it did so.

    Meta Denies Copyright Infringement

    Meta’s answer admits the use of Books3 but denies various other allegations and claims. For example, the authors alleged that Meta trained its AI on copyrighted works without permission. The answer doesn’t directly deny this but notes that consent or compensation is not necessarily required.

    “To the extent a response is deemed required, Meta denies that its use of copyrighted works to train Llama required consent, credit, or compensation,” Meta writes.

    The authors further stated that, as far as their books appear in the Books3 database, they are referred to as “infringed works”. This prompted Meta to respond with yet another denial. “Meta denies that it infringed Plaintiffs’ alleged copyrights,” the company writes.

    Fair Use

    Meta’s response doesn’t provide much additional detail and the full defense will be revealed as the case progresses. It is clear, however, that the company plans to rely on a fair use defense, at least in part.

    “To the extent that Meta made any unauthorized copies of any Plaintiffs’ registered copyrighted works, such copies constitute fair use under 17 U.S.C. § 107,” Meta notes.

    The fair use angle is expected to be a key part of this and other AI lawsuits. This doesn’t only apply to ‘pirate’ sources but also to the use of content that’s published through official channels, but used without explicit permission.

    These legal battles are still in their early stages, but may ultimately find their way to the Supreme Court if needed. AI companies have stressed that progress will be hampered if rules and regulations are too strict.

    Earlier this week, OpenAI mentioned that fair use is both necessary and critical to building competitive AI models , noting that news organizations can opt out if they wish. Needless to say, this option didn’t previously exist, certainly not for the Books3 database.

    We presume that when Presser created Books3, he never envisioned the dataset to be at the center of landmark lawsuits that could define the future of AI. However, the stakes have changed, and the well-intended ‘archiving’ effort is now part of a major copyright clash.

    A copy of Meta’s response to the author’s first consolidated amended complaint is available here (pdf)

    From: TF , for the latest news on copyright battles, piracy and more.

    • To chevron_right

      Meta Admits Use of ‘Pirated’ Book Dataset to Train AI

      news.movim.eu / TorrentFreak • 11 January 2024 • 4 minutes

    meta logo In recent months, rightsholders of all ilks have filed lawsuits against companies that develop AI models.

    The list includes record labels, individual authors, visual artists, and more recently the New York Times. These rightsholders all object to the presumed use of their work without proper compensation.

    Several of the lawsuits filed by book authors include a piracy component as well. The cases allege that tech companies, including Meta and OpenAI, used the controversial Books3 dataset to train their models.

    The Books3 dataset has a clear piracy angle. It was created by AI researcher Shawn Presser in 2020, who scraped the library of ‘pirate’ site Bibliotik. This book archive was publicly hosted by digital archiving collective ‘ The Eye ‘ at the time, alongside various other data sources.

    Bibliotik and other sources previously hosted at The Eye

    the eye

    The general vision was that the plaintext collection of more than 195,000 books, which is nearly 37GB in size, could help AI enthusiasts build better models, which would spur innovation.

    AI Boom Triggers Copyright Troubles

    Presser wasn’t wrong, but the dataset didn’t just help garage AI startups. Several of the world’s largest tech companies discovered it too and used it to improve their own language models.

    For years, Books3 continued to be freely and widely available, aiding AI researchers and enthusiasts around the world. However, when the AI boom reached the mainstream last year, book authors and publishers took notice, then took retaliatory action.

    For example, Danish anti-piracy group Rights Alliance demanded The Eye to remove their copy of Books3, which it did. The dataset also disappeared from the website of AI company Huggingface, citing reported copyright infringement , while others considered their options.

    As previously reported by Wired, Bloomberg informed Rights Alliance that it doesn’t plan to train future versions of its BloombergGPT model using Books3, and other companies likely made similar decisions behind closed doors.

    Meta Admits Books3 Use

    These are noteworthy developments but not all complaints can be resolved with promises. Several lawsuits against OpenAI and Meta remain ongoing, accusing the companies of using the Books3 dataset to train their models.

    While OpenAI and Meta are very cautious about discussing the subject in public, Meta provided more context in a California federal court this week.

    Responding to a lawsuit from writer/comedian Sarah Silverman, author Richard Kadrey, and other rights holders, the tech giant admits that “portions of Books3” were used to train the Llama AI model before its public release.

    “Meta admits that it used portions of the Books3 dataset, among many other materials, to train Llama 1 and Llama 2,” Meta writes in its answer.

    This admission doesn’t come as a massive surprise as several sources, including research papers, basically reached the same conclusion. While the use of Books3 is not contested by Meta, the question remains whether the company was in the wrong when it did so.

    Meta Denies Copyright Infringement

    Meta’s answer admits the use of Books3 but denies various other allegations and claims. For example, the authors alleged that Meta trained its AI on copyrighted works without permission. The answer doesn’t directly deny this but notes that consent or compensation is not necessarily required.

    “To the extent a response is deemed required, Meta denies that its use of copyrighted works to train Llama required consent, credit, or compensation,” Meta writes.

    The authors further stated that their books, to the extent they appear in the Books3 database, are referred to as “infringed works”. This prompted Meta to respond with yet another denial. “Meta denies that it infringed Plaintiffs’ alleged copyrights,” the company writes.

    Fair Use

    Meta’s response doesn’t provide much additional detail and the full defense will be revealed as the case progresses. It is clear, however, that the company plans to rely on a fair use defense, at least in part.

    “To the extent that Meta made any unauthorized copies of any Plaintiffs’ registered copyrighted works, such copies constitute fair use under 17 U.S.C. § 107,” Meta notes.

    The fair use angle is expected to be a key part of this and other AI lawsuits. This doesn’t only apply to ‘pirate’ sources but also to the use of content that’s published through official channels, but used without explicit permission.

    These legal battles are still in their early stages, but may ultimately find their way to the Supreme Court if needed. AI companies have stressed that progress will be hampered if rules and regulations are too strict.

    Earlier this week, OpenAI mentioned that fair use is both necessary and critical to building competitive AI models, noting that news organizations can opt out if they wish. Needless to say, this option didn’t previously exist, certainly not for the Books3 database.

    We presume that when Presser created Books3, he never envisioned the dataset to be at the center of landmark lawsuits that could define the future of AI. However, the stakes have changed, and the well-intended ‘archiving’ effort is now part of a major copyright clash.

    A copy of Meta’s response to the authors’ first consolidated amended complaint is available here (pdf)

    From: TF , for the latest news on copyright battles, piracy and more.

    • To chevron_right

      Google Sees DMCA Takedown Requests Surge to New Highs

      news.movim.eu / TorrentFreak • 10 January 2024 • 3 minutes

    In 2012, Google expanded its transparency report with a new section dedicated to DMCA takedown requests.

    For the first time, outsiders were able to see which URLs were being targeted by copyright holders and in what quantity.

    The decision to make this information public was in part triggered by a rapid increase in removal requests. The increased activity impacted the “free flow of information”, the search engine argued.

    According to Fred von Lohmann, Google’s Senior Copyright Counsel at the time, the volume of DMCA notices was skyrocketing. At times, the company was processing over 250,000 takedown requests a week, more than previously received in an entire year.

    Today, that weekly figure of 250,000 requests has increased to well over 30 million, a new record. While Google has set plenty of records in the past, the recent resurgence in DMCA takedowns is somewhat atypical.

    From Millions to Over a Billion

    When Google first made the numbers public it was processing a few million DMCA takedown requests in a year. That number swiftly increased to hundreds of millions and eventually reached a billion yearly DMCA requests in 2016.

    The exponential growth curve eventually flattened out and, around 2017, the takedown volume started to decline. The decrease was in part due to various anti-piracy algorithms making pirated content less visible in search results.

    By downranking pirate sites, infringing content became harder to find. As a result, Google processed fewer takedown notices, a welcome change for both rightsholders and the search engine.

    DMCA Resurgence

    Today, Google continues to make pirate sites less visible in search, but the reduction in takedown notices didn’t last. On the contrary, over the past several months, Google search processed a record number of DMCA notices.

    Last summer, the search giant reached a new milestone when it recorded the 7 billionth takedown request and, five months later, it can add more than 700 million new ones to this tally.

    The company is now handling removal requests at a rate of more than 1.6 billion per year, a new record. This is more than 30 million takedown requests per week and roughly 50 every second.
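
    The conversions behind these figures can be checked with a few lines of Python. This is only a rough sketch: it treats the reported "more than 1.6 billion" and "more than 700 million" as exact values, which is an assumption, and the function names are made up for illustration.

```python
# Figures quoted in the article, treated as exact for illustration only.
YEARLY_REQUESTS = 1_600_000_000      # "more than 1.6 billion per year"
RECENT_REQUESTS = 700_000_000        # "more than 700 million" new requests
RECENT_MONTHS = 5                    # reported over "five months"

WEEKS_PER_YEAR = 52
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def per_week(yearly: int) -> float:
    """Convert a yearly total into an average weekly rate."""
    return yearly / WEEKS_PER_YEAR


def per_second(yearly: int) -> float:
    """Convert a yearly total into an average per-second rate."""
    return yearly / SECONDS_PER_YEAR


def annualized(count: int, months: int) -> float:
    """Scale a count observed over a number of months to a full year."""
    return count * 12 / months


if __name__ == "__main__":
    print(f"per week:   {per_week(YEARLY_REQUESTS):,.0f}")
    print(f"per second: {per_second(YEARLY_REQUESTS):.1f}")
    print(f"annualized: {annualized(RECENT_REQUESTS, RECENT_MONTHS):,.0f}")
```

    Under these assumptions the weekly rate lands just above 30 million and the per-second rate just above 50, while 700 million requests in five months extrapolates to a little under 1.7 billion per year, in line with the reported yearly pace.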

    The graph below illustrates how these numbers have grown over time, with the most recent uptick on the right.

    Google Search Takedown Notices (2012-2024)

    Will it Last?

    We noticed that the volume of takedown requests had begun to increase again last August. At the time, we suggested that this could be a temporary uptick since the increase in volume could in large part be attributed to adult company MG Premium, which reported hundreds of millions of URLs in just a few months.

    Since MG Premium scaled down its efforts last summer, volumes should have normalized. What we didn’t foresee was several other rightsholders stepping in to take over.

    Over the past few months, takedown outfits Link-Busters.com and Comeso have increased their efforts. Together, they now submit roughly two-thirds of the recent DMCA notices to Google. If that persists, this would be good for a billion yearly requests.
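
    The "billion yearly requests" estimate follows from applying the two-thirds share to the current yearly pace. A minimal sketch, assuming the share is exactly two-thirds and the pace exactly 1.6 billion per year (both are approximations in the reporting, and the function name is hypothetical):

```python
from fractions import Fraction

# Approximate figures from the article, treated as exact for illustration.
TOTAL_YEARLY_REQUESTS = 1_600_000_000   # current overall yearly pace
COMBINED_SHARE = Fraction(2, 3)         # Link-Busters and Comeso together


def yearly_from_share(total: int, share: Fraction) -> int:
    """Return the yearly request count attributable to a given share."""
    return int(total * share)


if __name__ == "__main__":
    combined = yearly_from_share(TOTAL_YEARLY_REQUESTS, COMBINED_SHARE)
    print(f"combined yearly requests: {combined:,}")
```

    With these inputs the two outfits together would account for slightly more than a billion reported URLs per year.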

    The two companies work with a variety of rightsholders. Link-Busters mostly works with major publishers, including Penguin Random House, HarperCollins, and Hachette. Comeso, in turn, has sent most takedowns on behalf of KakaoPage, a major webtoon publisher.

    In the past, video and music rightsholders were responsible for the bulk of DMCA requests, but this has now switched to publishers. How Google’s takedown volume develops going forward, and whether any new records will be broken in the near future, will largely depend on these players.

    Then again, it’s also possible that an entirely new anti-piracy outfit will surface and take over. There’s never a dull moment in takedown land.

    For background, this article refers to the number of URLs reported in DMCA takedown requests to Google. The search engine can remove the URLs from its index in response, or place them on a preemptive blacklist if they are not yet indexed. Finally, a small number of notices don’t link to infringing material, requiring no response from Google.

    From: TF , for the latest news on copyright battles, piracy and more.
