Unleashing the Data Kraken: Japan’s Unique Stance on AI Training

Nyari Dori
Jun 15, 2023

As debate intensifies over the ethics and legality of training AI models on publicly accessible data, Japan has taken a clear position, affirming the right of machine learning practitioners to use any data they can find.

In a significant move, a senior Japanese official has clarified that the nation’s laws allow AI developers to train their models on copyrighted works.

The statement came from Keiko Nagaoka, a cabinet minister in the Japanese government, during a session of the House of Representatives. She explained that current law allows machine learning researchers to use copyrighted works regardless of whether the resulting model is deployed commercially and regardless of what it is ultimately used for.

The law does bar the use of unlawfully obtained material, but Nagaoka acknowledged that the origins of large volumes of data are difficult to trace, which makes this restriction hard to enforce in practice. Moreover, the law gives copyright holders no legal recourse against the use of their works for data analysis, including AI training, unless that use unreasonably harms their interests.

Japan amended its Copyright Act in 2018 to permit the use of copyrighted works for AI training, provided the purpose is not to enjoy the thoughts or sentiments expressed in the work.

Nevertheless, opposition parties and creators, including visual artists and musicians, have called for the law to be tightened, arguing that training AI on their works without consent could threaten their livelihoods.

Japan stands out in explicitly allowing copyrighted materials to be used for commercial AI development. The European Union, by contrast, permits the use of copyrighted works for research, and its proposed AI Act would require developers of generative AI to disclose their use of copyrighted works in training. The United Kingdom likewise limits the use of copyrighted materials for AI training to research.

In the United States, the “fair use” doctrine permits some uses of copyrighted works without the owner’s permission, particularly uses that substantially transform the work and do not undermine the copyright holder’s interests. Whether fair use covers the training of machine learning models remains unsettled, and ongoing lawsuits may resolve the question.

The debate has global implications. Member nations of the Group of Seven (G7), an informal group of industrialized democracies that includes Japan, recently proposed developing compatible regulations and standards for generative AI. Japan’s position diverges from those of its peers, but the gap may narrow as a shared approach takes shape.

Generative AI raises difficult questions about what is fair and, therefore, what the legal standard should be, and different regions have answered them in different ways. Despite this complexity, the G7’s effort to harmonize its members’ laws is a promising step: it could simplify the legal landscape for developers around the world and encourage work that benefits everyone.
