AMiner (database)

AMiner
Type of site	Bibliographic database
Owner	Tsinghua University
URL	www.aminer.org
Registration	Optional
Launched	March 2006; 17 years ago
Current status	Active

Last updated November 26, 2023

AMiner (formerly ArnetMiner) is a free online service used to index, search, and mine big scientific data.

Overview

AMiner (ArnetMiner) is designed to search and perform data mining operations against academic publications on the Internet, using social network analysis to identify connections between researchers, conferences, and publications.^[1] This allows it to provide services such as expert finding, geographic search, trend analysis, reviewer recommendation, association search, course search, academic performance evaluation, and topic modeling.
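
As a rough illustration of how social network analysis can surface connections between researchers, the sketch below builds a co-authorship graph from publication author lists and finds the shortest chain of co-authors linking two people. It is not AMiner's implementation: the function names `build_coauthor_graph` and `association_path`, the data layout, and the sample names are all hypothetical.

```python
from collections import defaultdict, deque

def build_coauthor_graph(publications):
    """Map each author to the set of people they have co-authored with."""
    graph = defaultdict(set)
    for authors in publications:
        for a in authors:
            graph[a]  # touching the key makes sure single-author papers still add a node
            for b in authors:
                if a != b:
                    graph[a].add(b)
    return graph

def association_path(graph, source, target):
    """Breadth-first search for the shortest co-authorship chain, or None."""
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

# Hypothetical sample data: each inner list is one publication's author list.
publications = [["Alice", "Bob"], ["Bob", "Carol"], ["Dave"]]
graph = build_coauthor_graph(publications)
print(association_path(graph, "Alice", "Carol"))  # ['Alice', 'Bob', 'Carol']
print(association_path(graph, "Alice", "Dave"))   # None
```

Breadth-first search is used here only because it returns a shortest path in an unweighted graph; a production system would likely weight edges, for example by the number of joint papers.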

AMiner was created as a research project in social influence analysis, social network ranking, and social network extraction. Several peer-reviewed papers have arisen from the development of the system. It has been in operation since March 2006 and has indexed 130 million researchers and more than 265 million publications.^[2] The research was funded by the Chinese National High-tech R&D Program and the National Science Foundation of China.

AMiner is commonly used in academia to identify relationships between, and draw statistical correlations about, research and researchers. It has attracted more than 10 million independent IP accesses from 220 countries and regions. The product has been used in Elsevier's SciVerse platform^[3] and by academic conferences such as SIGKDD, ICDM, PKDD, and WSDM.

Operation

AMiner automatically extracts researcher profiles from the web. It collects and identifies the relevant pages, then uses a unified approach to extract data from the identified documents. It also extracts publications from online digital libraries using heuristic rules.

It then integrates the extracted researcher profiles with the extracted publications, using the researcher name as the identifier. Because different researchers can share a name, a probabilistic framework has been proposed to deal with the name ambiguity problem during integration. The integrated data is stored in a researcher network knowledge base (RNKB).
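
The integration step can be illustrated with a minimal sketch. It is not AMiner's implementation: the record layouts, the functions `integrate` and `ambiguous_names`, and the sample data are hypothetical, and the probabilistic disambiguation framework is not modelled. The sketch only shows why using the name alone as the identifier leads to ambiguity.

```python
from collections import defaultdict

def integrate(profiles, publications):
    """Group profiles and publications under the researcher name (hypothetical layout)."""
    kb = defaultdict(lambda: {"profiles": [], "publications": []})
    for profile in profiles:
        kb[profile["name"]]["profiles"].append(profile)
    for pub in publications:
        for author in pub["authors"]:
            kb[author]["publications"].append(pub["title"])
    return dict(kb)

def ambiguous_names(kb):
    """Names attached to more than one extracted profile need disambiguation."""
    return sorted(name for name, entry in kb.items() if len(entry["profiles"]) > 1)

# Hypothetical sample data: two different people share the name "John Smith".
profiles = [
    {"name": "John Smith", "affiliation": "University A"},
    {"name": "John Smith", "affiliation": "University B"},
    {"name": "Jane Doe", "affiliation": "University C"},
]
publications = [
    {"title": "Paper X", "authors": ["John Smith", "Jane Doe"]},
    {"title": "Paper Y", "authors": ["John Smith"]},
]

kb = integrate(profiles, publications)
print(ambiguous_names(kb))  # ['John Smith']
```

In this toy knowledge base both papers are attached to the single key "John Smith", even though they may belong to different people. Deciding which profile each paper belongs to is the problem a disambiguation step has to solve.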

The other principal products in this area are Google Scholar, Elsevier's Scirus, and the open-source project CiteSeer.

History

AMiner was initiated and created by Professor Jie Tang of Tsinghua University, China, and was first launched in March 2006. The major updates since then are listed below:

March 2006, Version 0.1: functions include researcher profiling, expert search, conference search, and publication search. The system was developed in Perl.
August 2006, Version 1.0: the system was re-implemented in Java.
July 2007, Version 2.0: new functions include researcher interest mining, association search, and survey paper finding (now unavailable).
April 2008, Version 3.0: new functions include query understanding, a new GUI, and search log analysis.
November 2008, Version 4.0: new functions include graph search, topic modeling, and NSF/NSFC funding information extraction.
April 2009, Version 5.0: new functions include profile editing, an open API service, Bole search, and course search (now unavailable).
December 2009, Version 6.0: new functions include academic performance evaluation, user feedback, and conference analysis.
May 2010, Version 7.0: new functions include name disambiguation, paper-reviewer recommendation, and ArnetPage creation.
March 2012, Version II: renamed AMiner, with all code rewritten and the GUI redesigned. New functions include geographic search and the ArnetAPP platform.
June 2014, Version II: renamed AMiner, with all code rewritten and the GUI redesigned. New functions include geographic search and the ArnetAPP platform.
December 2015: a completely new version went online.
May 2017: a professional version went online.
April 2018: new functions include trend analysis^[4] and deep-learning-based name disambiguation.^[5]

Resources

AMiner has published several datasets for academic research, including Open Academic Graph,^[6] DBLP+citation^[7] (a dataset that adds citation relationships to the DBLP data from the Digital Bibliography & Library Project), Name Disambiguation,^[8] and Social Tie Analysis.^[9] Further datasets and source code for research are listed on the "Open Data and Codes by ArnetMiner" page.^[10]

References

1. Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su (2008). "ArnetMiner". Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM. pp. 990–998. doi:10.1145/1401890.1402008. ISBN 9781605581934. S2CID 3348552.
2. "Arnetminer: introduction". Retrieved 17 December 2020.
3. "SciVerse - HUB - Home". Archived from the original on 9 September 2012. Retrieved 24 April 2012.
4. "Trend Analysis". Retrieved 24 December 2018.
5. Yutao Zhang; Fanjin Zhang; Peiran Yao; Jie Tang (2018). "Name Disambiguation in AMiner". Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London: ACM. pp. 1002–1011. doi:10.1145/3219819.3219859. ISBN 9781450355520. S2CID 207579405.
6. "Open Academic Graph". Retrieved 24 December 2018.
7. "DBLP Papers + Citation Relationship". Retrieved 24 December 2018.
8. "Name Disambiguation". Retrieved 24 April 2012.
9. "Inferring Social Ties in Large Networks". Retrieved 24 April 2012.
10. "Open Data and Codes by ArnetMiner". Retrieved 24 April 2012.

External links

AMiner.org (Arnetminer.org is now archived)
AMiner.cn