Journal Article

Citation

Islamaj Doğan R, Lu Z. Bioinformatics 2010; 26(21): 2767-2775.

Copyright

(Copyright © 2010, Oxford University Press)

DOI

10.1093/bioinformatics/btq459

PMID

unavailable

Abstract

Motivation: Recognizing words that are key to a document is important for ranking relevant scientific documents. Traditionally, important words in a document are either nominated subjectively by authors and indexers or selected objectively by some statistical measures. As an alternative, we propose to use documents' words popularity in user queries to identify click-words, a set of prominent words from the users' perspective. Although they often overlap, click-words differ significantly from other document keywords.

Results: We developed a machine learning approach to learn the unique characteristics of click-words. Each word was represented by a set of features that included different types of information, such as semantic type, part of speech tag, term frequency-inverse document frequency (TF-IDF) weight and location in the abstract. We identified the most important features and evaluated our model using 6 months of PubMed click-through logs. Our results suggest that, in addition to carrying high TF-IDF weight, click-words tend to be biomedical entities, to exist in article titles, and to occur repeatedly in article abstracts. Given the abstract and title of a document, we are able to accurately predict the words likely to appear in user queries that lead to document clicks.

Contact: luzh@ncbi.nlm.nih.gov

Supplementary information: Supplementary data are available at Bioinformatics online.
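
To illustrate the kind of per-word feature representation the abstract describes, the following is a minimal sketch, not the authors' implementation. It computes only the features that need no external resources (TF-IDF weight, presence in the title, repetition in the abstract, and first position in the abstract); semantic type and part-of-speech tag are omitted. The function names, the tokenizer, the smoothing-free IDF formula, and the small background corpus are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_features(title, abstract, corpus):
    """Return a feature dict for each distinct word in the title and abstract.

    `corpus` is a list of other documents' texts, used only to estimate
    document frequency. All names here are hypothetical.
    """
    title_tokens = tokenize(title)
    abstract_tokens = tokenize(abstract)
    counts = Counter(title_tokens + abstract_tokens)
    abstract_counts = Counter(abstract_tokens)
    corpus_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus_sets) + 1  # background corpus plus the current document

    features = {}
    for word, tf in counts.items():
        # Document frequency counts the current document once.
        df = 1 + sum(word in doc_words for doc_words in corpus_sets)
        idf = math.log(n_docs / df)
        in_abstract = abstract_counts[word] > 0
        features[word] = {
            "tfidf": tf * idf,
            "in_title": word in title_tokens,
            "abstract_count": abstract_counts[word],
            # Token offset of the first occurrence in the abstract, or -1 if absent.
            "first_abstract_position": abstract_tokens.index(word) if in_abstract else -1,
        }
    return features
```

A classifier of the kind the abstract mentions would take feature vectors like these, together with labels derived from click-through logs, as its training input; that step is not shown here.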
