OATAO - Open Archive Toulouse Archive Ouverte Open Access Week

Location extraction from tweets

Hoang, Thi Bich Ngoc and Mothe, Josiane Location extraction from tweets. (2018) Information Processing and Management, 54 (2). 129-144. ISSN 0306-4573

(Document in English)

PDF (Author's version) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader

Official URL: https://doi.org/10.1016/j.ipm.2017.11.001


Five hundred million tweets are posted daily, making Twitter a major social media platform from which topical information on events can be extracted. These events are represented by three main dimensions: time, location and entity-related information. The focus of this paper is location, which is an essential dimension for geo-spatial applications, either when helping rescue operations during a disaster or when used for contextual recommendations. While the first type of application needs high recall, the second is more precision-oriented. This paper studies the recall/precision trade-off, combining different methods to extract locations. In the context of short posts, applying tools that have been developed for natural language is not sufficient given the nature of tweets which are generally too short to be linguistically correct. Also bearing in mind the high number of posts that need to be handled, we hypothesize that predicting whether a post contains a location or not could make the location extractors more focused and thus more effective. We introduce a model to predict whether a tweet contains a location or not and show that location prediction is a useful pre-processing step for location extraction. We define a number of new tweet features and we conduct an intensive evaluation. Our findings are that (1) combining existing location extraction tools is effective for precision-oriented or recall-oriented results, (2) enriching tweet representation is effective for predicting whether a tweet contains a location or not, (3) words appearing in a geography gazetteer and the occurrence of a preposition just before a proper noun are the two most important features for predicting the occurrence of a location in tweets, and (4) the accuracy of location extraction improves when it is possible to predict that there is a location in a tweet.

Item Type:Article
HAL Id:hal-02640811
Audience (journal):International peer-reviewed journal
Uncontrolled Keywords:
Institution:French research institutions > Centre National de la Recherche Scientifique - CNRS (FRANCE)
Université de Toulouse > Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
Université de Toulouse > Université Toulouse III - Paul Sabatier - UT3 (FRANCE)
Université de Toulouse > Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université de Toulouse > Université Toulouse 1 Capitole - UT1 (FRANCE)
Other partners > University of Economics – The University of Danang (VIETNAM)
Laboratory name:
Deposited On:30 Apr 2020 13:50

Repository Staff Only: item control page