This study of sentence embedding models for patent analysis addresses the challenge of calculating technological similarity between patents. It tests embedding models such as PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings for accuracy on patent classification tasks. The research stresses that the appropriate model depends on which sections of the patent data are being used.
The study also proposes a standard library and dataset for assessing these models and highlights variations in model performance across different classes. The survey can guide researchers in choosing an embedding model for in-depth patent analysis.
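To make the idea of similarity between patents concrete, here is a minimal sketch using only the Python standard library. It is not the study's method or library: it uses plain TF-IDF bag-of-words vectors with cosine similarity as a simple stand-in for the embedding models named above, and the three abstracts, the function names, and the tokenizer are all hypothetical, made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return [t.strip(".,;:()").lower() for t in text.split() if t.strip(".,;:()")]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps weights positive even for terms in every document.
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical patent abstracts, invented for this example.
abstracts = [
    "A lithium battery electrode with a silicon anode coating.",
    "An electrode coating for a lithium battery using silicon particles.",
    "A method for routing network packets through a wireless mesh.",
]

vectors = tfidf_vectors(abstracts)
sim_battery_pair = cosine_similarity(vectors[0], vectors[1])
sim_unrelated_pair = cosine_similarity(vectors[0], vectors[2])
print(f"battery vs battery: {sim_battery_pair:.3f}")
print(f"battery vs network: {sim_unrelated_pair:.3f}")
```

The two battery abstracts share most of their vocabulary, so they score higher than the battery and networking pair. A sentence embedding model would replace `tfidf_vectors` with dense vectors, while the cosine comparison step would stay the same.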
Why it’s important: This survey shows how NLP can be applied to understanding and categorizing patents. It simplifies the process of selecting an effective model, which can improve the precision of innovation research and technological mapping. The findings could support advances in AI-driven patent analysis and more informed strategies for intellectual property management.