Movim

Why Google Gemini’s Pokémon success isn’t all it’s cracked up to be

news.movim.eu / ArsTechnica • 5 May 2025 • 1 minute

Earlier this year, we took a look at how and why Anthropic's Claude LLM was struggling to beat Pokémon Red (a game, let's remember, designed for young children). But while Claude 3.7 is still struggling to make consistent progress at the game weeks later, a similar Twitch-streamed effort using Google's Gemini 2.5 model managed to finally complete Pokémon Blue this weekend across over 106,000 in-game actions, earning accolades from followers, including Google CEO Sundar Pichai.

Before you start using this achievement as a way to compare the relative performance of these two AI models—or even the advancement of LLM capabilities over time—there are some important caveats to keep in mind. As it happens, Gemini needed some fairly significant outside help on its path to eventual Pokémon victory.

Strap in to the agent harness

Gemini Plays Pokémon developer JoelZ (who's unaffiliated with Google) will be the first to tell you that Pokémon is ill-suited as a reliable benchmark for LLM models. As he writes on the project's Twitch FAQ, "please don't consider this a benchmark for how well an LLM can play Pokémon. You can't really make direct comparisons—Gemini and Claude have different tools and receive different information. ... Claude's framework has many shortcomings so I wanted to see how far Gemini could get if it were given the right tools."

After two court losses, DOGE asks Supreme Court for Social Security data access

news.movim.eu / ArsTechnica • 5 May 2025

The Trump administration filed an emergency application on Friday asking the Supreme Court to restore DOGE's access to Social Security Administration records. A lower-court order that prohibited DOGE's access is causing "irreparable harm to the executive branch" and thwarting DOGE's attempts to "eliminate waste and fraud," US Solicitor General John Sauer wrote in the appeal.

"The government cannot eliminate waste and fraud if district courts bar the very agency personnel with expertise and the designated mission of curtailing such waste and fraud from performing their jobs," Sauer told the Supreme Court. The preliminary injunction that is currently in place halted "the Executive Branch's critically important efforts to improve its information-technology infrastructure and eliminate waste," the brief said.

The appeal was lodged in a case filed by the American Federation of State, County and Municipal Employees; the Alliance for Retired Americans; and the American Federation of Teachers. Chief Justice John Roberts asked them to file a response to the US by May 12.

Software update makes HDR content “unwatchable” on Roku TVs

news.movim.eu / ArsTechnica • 5 May 2025

An update to Roku OS has resulted in colors looking washed out in HDR content viewed on Roku apps, like Disney+.

Complaints started surfacing on Roku's community forum a week ago. On May 1, a company representative posted that Roku was “investigating the Disney Plus HDR content that was washed out after the recent update.” However, based on user feedback, it seems that HDR on additional Roku apps, including Apple TV+ and Netflix, is also affected. Roku’s representative has been asking users to share their experiences so that Roku can dig deeper into the problem.

One user, going by "Squinky" on the forum, reported having a TCL TV with the problem and shared the following photo comparison:
