Pap2Pat: Benchmarking Outline-Guided Long-Text Patent Generation with Patent-Paper Pairs

Note

Currently, this repository contains only the Pap2Pat dataset. All code for the dataset creation, outline-guided generation and evaluation will be added later.

Abstract

Dealing with long and highly complex technical text is a challenge for Large Language Models (LLMs), which have yet to realize their potential in supporting expensive and time-intensive processes like patent drafting. Within patents, the description constitutes more than 90% of the document on average. Yet, its automatic generation remains understudied. When drafting patent applications, patent attorneys typically receive invention reports (IRs), which are usually confidential, hindering research on LLM-supported patent drafting. Often, pre-publication research papers serve as IRs. We leverage this duality to build PAP2PAT, an open and realistic benchmark for patent drafting consisting of 1.8k patent-paper pairs describing the same inventions. To address the complex long-document patent generation task, we propose chunk-based outline-guided generation using the research paper as invention specification. Our extensive evaluation using PAP2PAT and a human case study show that LLMs can effectively leverage information from the paper, but still struggle to provide the necessary level of detail. Fine-tuning leads to more patent-style language, but also to more hallucination. We release our data and code.

Contents of this Repository

This repository comprises three main parts:

Pap2Pat: Dataset and evaluation code
Outline_Guided_Generation: Implementation of chunk-based outline-guided generation
Pap2Pat_Dataset_Creation: Code for the creation of Pap2Pat
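
As a rough illustration of the idea behind chunk-based outline-guided generation, the sketch below splits a patent outline into chunks and generates one chunk at a time, passing the paper text as the invention specification in each prompt. This is not the implementation in Outline_Guided_Generation: the function names (`chunk_outline`, `generate_description`, `echo_model`), the prompt wording, and the fixed-size chunking rule are all assumptions made for illustration, and `echo_model` is a hypothetical stand-in for a real LLM call.

```python
from typing import Callable, List


def chunk_outline(outline: List[str], chunk_size: int) -> List[List[str]]:
    """Split outline items into consecutive chunks of at most chunk_size items.

    Fixed-size chunking is an assumption; other chunking rules are possible.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [outline[i:i + chunk_size] for i in range(0, len(outline), chunk_size)]


def generate_description(
    paper_text: str,
    outline: List[str],
    generate: Callable[[str], str],
    chunk_size: int = 2,
) -> str:
    """Generate a description chunk by chunk, guided by the outline.

    `generate` is any callable mapping a prompt string to generated text.
    """
    parts: List[str] = []
    for chunk in chunk_outline(outline, chunk_size):
        # Hypothetical prompt layout: paper first, then the outline items to cover.
        prompt = (
            "Paper (invention specification):\n" + paper_text + "\n\n"
            "Write the patent description sections for these outline items:\n"
            + "\n".join("- " + item for item in chunk)
        )
        parts.append(generate(prompt))
    return "\n\n".join(parts)


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: echoes the requested outline items."""
    return "\n".join(line[2:] for line in prompt.splitlines() if line.startswith("- "))
```

Calling `generate_description(paper_text, outline, echo_model)` exercises the chunking loop without any model. Swapping `echo_model` for a function that queries an actual LLM is the only change needed to try the pattern on real data.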

License

The code is released under the MIT license, see LICENSE.

The data is released under CC-BY.
